Elsa/Oink/Cqual++: Open-Source Static Analysis For C++
CodeCon 2006
Goals
• Build extensible infrastructure to find certain categories of bugs
– Exhaustively, within some constraints
• At compile time
• In real-world C and C++ programs
• Using composable analyses
Components
• Elkhound: Generalized LR Parser Generator
• Pipeline (figure): possibly annotated, preproc’d source → Lexer → token stream → Parser → ambiguous AST → Type Checker → unambiguous AST → Post Process → final AST
• Type Checker:
– Compute types
– Resolve overloading
– Instantiate templates
Disambiguation
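Ambiguous syntax example: return (x)(y);
• If x names a type, (x)(y) casts y to type x; if x names a function, it calls x with argument y
• The parser keeps both readings; the type checker picks one by looking up x
[Figure: AST with an S_return node whose expr child is ambiguous between the two readings of x and y]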
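A minimal C++ sketch of why the parser alone cannot settle this: the very same statement compiles as a cast in one scope and as a function call in another, so only name lookup (which, per the slide, happens during type checking) tells them apart. The namespaces and functions below are hypothetical, chosen only for illustration.

```cpp
// Reading 1: x names a type, so (x)(y) is a C-style cast of y to x.
namespace as_cast {
typedef int x;
inline int f(double y) { return (x)(y); }  // same as (int)y
}  // namespace as_cast

// Reading 2: x names a function, so (x)(y) calls x with argument y.
namespace as_call {
inline int x(double v) { return static_cast<int>(v * 2); }
inline int f(double y) { return (x)(y); }  // same as x(y)
}  // namespace as_call
```

With y = 3.7, the cast reading yields 3 while the call reading yields 7, even though both function bodies are textually identical.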
Lowered Output: Simplified C++
• Either the original or the lowered form can be printed
• Lowering that is always done:
– Templates are instantiated
– Implicit type conversions are inserted
• Lowering that is optionally done:
– Implicit member functions are created
– Implicit ctor/dtor calls are inserted
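To make "lowering" concrete, here is a hand-written sketch of the same code before and after these transformations. It is illustrative only and assumes nothing about Elsa's actual printed output, whose naming and layout will differ; the names `twice`, `Point`, `scaled`, and `copyOf` are hypothetical.

```cpp
// Original: relies on the compiler for a template instantiation,
// an implicit conversion, and implicitly generated members.
namespace original {
template <class T> T twice(T v) { return v + v; }

struct Point { int x, y; };  // default ctor, copy ctor, operator=, dtor are implicit

inline double scaled(int i) { return twice(i); }   // instantiates twice<int>; int -> double is implicit
inline Point copyOf(const Point &p) { return p; }  // uses the implicit copy ctor
}  // namespace original

// Hand-lowered equivalent (illustrative, not real Elsa output).
namespace lowered {
template <class T> T twice(T v);                            // primary template, declared only
template <> inline int twice<int>(int v) { return v + v; }  // the instantiation, spelled out

struct Point {
  int x, y;
  Point() {}                                 // was the implicit default ctor
  Point(const Point &o) : x(o.x), y(o.y) {}  // was the implicit copy ctor
  Point &operator=(const Point &o) { x = o.x; y = o.y; return *this; }  // was implicit
  ~Point() {}                                // was the implicit dtor
};

inline double scaled(int i) { return static_cast<double>(twice<int>(i)); }  // conversion made explicit
inline Point copyOf(const Point &p) { return Point(p); }  // copy ctor call made explicit
}  // namespace lowered
```

Both versions behave the same; the lowered one simply leaves nothing for an analysis to infer about which conversions, instantiations, or member functions are involved.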
C++ or XML, In and Out
• Input: C++ or XML → Elsa → Output: C++ or XML
printf(x);
}
Feature: Polymorphic Dataflow
int a = f(t);

p->s = read_from_network();
use_in_untrusting_way(p->s);
// does p == q still??
q->s = "innocuous";
use_in_trusting_way(p->s);   // $tainted??
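"Polymorphic" dataflow roughly means that a function's flow is considered separately at each call site, so a tainted argument at one call of f does not taint the result of every other call. A hypothetical toy model (not Cqual++'s algorithm) that treats f as the identity on taint shows the difference between merging all call sites and keeping them apart:

```cpp
#include <vector>

// One call site of f, described only by whether its argument is tainted.
struct CallSite {
  bool argTainted;
};

// Monomorphic (context-insensitive): a single summary for f, joined over
// all call sites, so one tainted argument taints every call's result.
inline std::vector<bool> monomorphic(const std::vector<CallSite> &sites) {
  bool anyTainted = false;
  for (const CallSite &s : sites) anyTainted = anyTainted || s.argTainted;
  return std::vector<bool>(sites.size(), anyTainted);
}

// Polymorphic (context-sensitive): f's summary is instantiated per call
// site, so each result is tainted only if its own argument was.
inline std::vector<bool> polymorphic(const std::vector<CallSite> &sites) {
  std::vector<bool> out;
  for (const CallSite &s : sites) out.push_back(s.argTainted);
  return out;
}
```

With one tainted and one clean call, the monomorphic model reports both results as tainted (a false positive on the second), while the polymorphic model reports only the first.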
What Exactly Is ‘Data-Flow’?
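#include <stdlib.h>
#include <string.h>

/* Copies `in` one bit at a time through control flow: no byte of `in`
   is ever assigned to `out`, yet `out` ends up equal to `in`. */
char *launderString(const char *in) {
  size_t len = strlen(in);
  char *out = (char *)malloc(len + 1);
  if (out == NULL)
    return NULL;
  for (size_t i = 0; i < len; ++i) {
    out[i] = 0;
    for (int j = 0; j < 8; ++j)
      if (in[i] & (1 << j))    /* branch on bit j of the input ... */
        out[i] |= (1 << j);    /* ... then set the same bit of the output */
  }
  out[len] = '\0';
  return out;
}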
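The question the slide raises is whether taint should follow control dependences as well as assignments. In launderString the output is built only from constants, under branches that test the input, so an analysis that tracks only explicit data flow sees nothing move from `in` to `out`. A hypothetical toy model makes this concrete; the `TChar` type and `launderChar` function are illustrative and not part of Cqual++.

```cpp
// A character tagged with a taint bit. The operators propagate the bit
// (explicit data flow); reading .v in a branch condition does not
// (implicit flow through a control dependence).
struct TChar {
  unsigned char v;
  bool tainted;
};

inline TChar operator&(TChar a, unsigned char mask) {
  return TChar{static_cast<unsigned char>(a.v & mask), a.tainted};
}

inline TChar operator|(TChar a, TChar b) {
  return TChar{static_cast<unsigned char>(a.v | b.v), a.tainted || b.tainted};
}

// Same structure as launderString, applied to a single character.
inline TChar launderChar(TChar in) {
  TChar out{0, false};  // starts as an untainted constant
  for (int j = 0; j < 8; ++j) {
    unsigned char bit = static_cast<unsigned char>(1u << j);
    if ((in & bit).v)                 // taint is dropped here: only the branch depends on `in`
      out = out | TChar{bit, false};  // only untainted constants flow into `out`
  }
  return out;
}
```

Laundering a tainted 'A' returns an 'A' whose taint bit is clear, even though a plain `c & 0x41` on the same input stays tainted.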
Application: Finding Format-String Vulnerabilities
• printf() is an interpreter
• the format string is a program
– %n writes the number of characters output so far to the int pointed to by the corresponding argument
– ex: printf("stuff%n", p) means *p = 5
• if there is no argument p, printf() writes through whatever pointer value happens to be on the stack in its place
– so never allow untrusted data in the first (format) argument to printf
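The fix the slide implies is to keep the format string a trusted literal and pass untrusted text only as an argument. A minimal sketch, where `echoSafely` is a hypothetical helper and the unsafe form appears only in a comment:

```cpp
#include <cstdio>
#include <string>

// UNSAFE (do not do this): std::snprintf(buf, sizeof buf, untrusted);
// Any %n or %s inside `untrusted` would be interpreted by printf.
//
// SAFE: the format is the literal "%s"; `untrusted` is only data.
// Input longer than 127 characters is truncated by snprintf.
inline std::string echoSafely(const char *untrusted) {
  char buf[128];
  std::snprintf(buf, sizeof buf, "%s", untrusted);
  return std::string(buf);
}
```

An attacker string such as "100%n done" comes back verbatim instead of triggering a write. A taint analysis enforces the same rule statically by flagging any path from untrusted input to a format parameter.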
Application: Finding User-Kernel Vulnerabilities
• The kernel must check that user pointers are valid
– they must point to memory mapped into the user process’s address space
– otherwise the user could make the kernel read or manipulate kernel data
• This is also a dataflow/taint analysis
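The rule "never dereference a user pointer without checking it" can be mimicked in ordinary C++ by giving user pointers their own type that cannot be dereferenced. This is a hypothetical user-space analogy, loosely modeled on the kernel's copy_from_user(); `UserRegion`, `UserPtr`, and `checkedCopyFromUser` are illustrative names, not kernel APIs, and Cqual++ reaches the same goal with qualifiers instead of wrapper types.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model: the "user" part of the address space is [base, base + size).
struct UserRegion {
  std::uintptr_t base;
  std::size_t size;
};

// A pointer that came from user space. It deliberately has no operator*
// or operator->, so code that dereferences it directly does not compile.
class UserPtr {
 public:
  explicit UserPtr(std::uintptr_t addr) : addr_(addr) {}
  std::uintptr_t addr() const { return addr_; }

 private:
  std::uintptr_t addr_;
};

// True iff [p, p + n) lies inside the user region (written to avoid overflow).
inline bool rangeInUser(const UserRegion &r, std::uintptr_t p, std::size_t n) {
  return p >= r.base && n <= r.size && p - r.base <= r.size - n;
}

// The only sanctioned way to read through a UserPtr: check, then copy.
// Returns false without touching memory if the check fails.
inline bool checkedCopyFromUser(void *dst, UserPtr src, std::size_t n,
                                const UserRegion &r) {
  if (!rangeInUser(r, src.addr(), n)) return false;
  std::memcpy(dst, reinterpret_cast<const void *>(src.addr()), n);
  return true;
}
```

The design choice is that the check and the copy live in one function, so there is no way to obtain the raw memory without passing the range test.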
Rob’s Cqual Linux User-Kernel Results
• 2.4.20, full config: 7 bugs, 275 false positives
• 2.4.23, full config: 6 bugs, 264 false positives
• including other trials on the same kernels:
– found 17 different security vulnerabilities
– found bugs missed both by other tools and by manual inspection
– all but one bug confirmed exploitable
– significant “bug churn” across kernel versions
Linus’s “Sparse” Tool for User-Kernel Vulnerabilities
• Linus also has a tool using type qualifiers
– it requires manual annotation of every variable
• In contrast, Cqual++ infers the qualifiers
– only sources and sinks need to be annotated
– plus any “sanitizer” functions
• Linus says this “is not the C way”
– OK, then he can write all the annotations
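The difference between annotating every variable and inferring qualifiers can be sketched as a reachability problem: only sources, sinks, and sanitizer outputs carry annotations, and the taint of every other variable is computed by propagating along assignments. This is a hypothetical toy worklist over a flow graph, not Cqual++'s actual constraint solver; `Program` and `taintedSinks` are illustrative names.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Only these three sets are annotated; everything else is inferred.
struct Program {
  std::vector<std::pair<std::string, std::string>> flows;  // (from, to) means "to = from"
  std::set<std::string> sources;     // annotated $tainted
  std::set<std::string> sinks;       // annotated $untainted
  std::set<std::string> sanitizers;  // outputs of sanitizer functions: never tainted
};

// Returns the sinks that tainted data can reach.
inline std::set<std::string> taintedSinks(const Program &p) {
  std::map<std::string, std::vector<std::string>> succ;
  for (const auto &f : p.flows) succ[f.first].push_back(f.second);

  std::set<std::string> tainted(p.sources.begin(), p.sources.end());
  std::vector<std::string> work(p.sources.begin(), p.sources.end());
  while (!work.empty()) {
    std::string v = work.back();
    work.pop_back();
    for (const auto &w : succ[v]) {
      if (p.sanitizers.count(w) || tainted.count(w)) continue;  // stop at sanitizers
      tainted.insert(w);  // inferred qualifier: no annotation needed on w
      work.push_back(w);
    }
  }

  std::set<std::string> bad;
  for (const auto &s : p.sinks)
    if (tainted.count(s)) bad.insert(s);
  return bad;
}
```

For example, with flows net → buf → fmt and buf → clean → fmt2, where clean is a sanitizer output, only fmt is reported; the intermediate variable buf never needed an annotation.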
Future Application: Finding Character-Set Confusions
• Microsoft code confuses ASCII and UCS-2
• Mozilla has 20-ish different character sets
• they should only flow together through conversion functions
• if the array sizes differ, confusions can be a security hole too (e.g., a buffer overflow)
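A type-based analogue of the rule "character sets meet only through conversion functions" can be sketched in plain C++ by giving each encoding its own type. This is a substitute technique for illustration (Cqual++ would use qualifiers rather than wrapper types), and `AsciiString`, `Ucs2String`, `asciiToUcs2`, and `bytesNeeded` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Distinct types per character set: `Ucs2String u = someAscii;` does not
// compile, so the only way across is the conversion function below.
struct AsciiString { std::string bytes; };
struct Ucs2String  { std::u16string units; };

// The single sanctioned path from ASCII to UCS-2:
// each ASCII byte widens to one 16-bit code unit.
inline Ucs2String asciiToUcs2(const AsciiString &in) {
  Ucs2String out;
  for (unsigned char c : in.bytes) out.units.push_back(static_cast<char16_t>(c));
  return out;
}

// The size confusion: the same character count needs more bytes in UCS-2,
// so a buffer sized in ASCII bytes is too small for the UCS-2 text.
inline std::size_t bytesNeeded(const Ucs2String &s) {
  return s.units.size() * sizeof(char16_t);
}
```

Three ASCII characters become three UCS-2 code units occupying 3 × sizeof(char16_t) bytes (6 on common platforms), which is exactly the mismatch that turns a charset mix-up into an overflow.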
Oink Vision: Composable Analysis Tools
• Compilers refuse to compile bugs
– well, some classes of bugs
– and you may have to wait until tomorrow morning to find out
• Correctness analysis is expected as part of any compiler toolchain
• The analyses are composable and extensible