Elsa/Oink/Cqual++: Open-Source Static Analysis For C++



Scott McPeak, Daniel Wilkerson


work with Rob Johnson

CodeCon 2006
Goals
• Build extensible infrastructure to
• Find certain categories of bugs
– Exhaustively, within some constraints
• At compile time
• In real-world C and C++ programs
• Using composable analyses
Components
• Elkhound: Generalized LR Parser Generator

• Elsa: C++ Parser

• Oink: Whole-program dataflow

• Cqual++: Type qualifier analysis


Elkhound: GLR Parser Generator
• GLR eliminates the pain of LALR(1)
– Unbounded lookahead
– Allows ambiguous grammars!
• 10x faster than other GLR implementations
– Novel combination of GLR and LALR(1)
• User-defined disambiguation
– Early: during parsing
– Late: after generating AST w/ambiguities
Example: ‘>’ ambiguity

new C < 3 > + 4 > + 5 ;

• Correct: the Type ends at the first ‘>’, giving C<3>
– the Expr reads ((new C<3>) + 4) > (+5)
• Incorrect: the Type extends to the second ‘>’, giving C< 3 > +4 >
– this puts an unparenthesized ‘>’ symbol inside the template
argument list
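
The rule behind the correct parse can be checked with any C++ compiler. The following is a minimal sketch, assuming a hypothetical class template C with one integer parameter (the talk does not define C): the first unparenthesized ‘>’ closes the template argument list, and parentheses are needed to use ‘>’ as an operator inside it.

```cpp
// Hypothetical class template, used only to illustrate the '>' rule.
template <int N>
struct C {
    static constexpr int value = N;
};

// The first unparenthesized '>' closes the template argument list,
// so this parses as (C<5>::value) > 4, a comparison.
constexpr bool closedEarly = C<5>::value > 4;

// Parentheses keep '>' inside the argument list: N is (5 > 4), i.e. 1.
constexpr int parenthesized = C<(5 > 4)>::value;
```
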
Example: Type vs. Variable
• In C & C++, sometimes hard to tell whether
a name refers to a type or a variable
• (a) & (b) can parse two ways:
– Expr & Expr: bitwise AND of a and b
– (Type) Expr: cast of &(b) to type a
int a; // hidden
class C {
int f(int b)
{ return (a) & (b); }
typedef int a; // visible
};
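
Both readings of (a) & (b) are legal C++, depending only on what a names. The sketch below puts each reading in its own namespace so both compile side by side; the namespace names and the use of std::intptr_t as the typedef target are assumptions made for illustration, not part of the slide.

```cpp
#include <cstdint>

namespace a_is_variable {
    int a = 6;
    // Here (a) & (b) is a bitwise AND of two expressions.
    int f(int b) { return (a) & (b); }
}

namespace a_is_type {
    typedef std::intptr_t a;
    // Here (a) & (b) is a C-style cast of the address &b to type a.
    std::intptr_t f(int &b) { return (a) & (b); }
}
```

The token sequence in the two function bodies is identical; only name lookup tells the parser which tree to build.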
Elsa: Extensible C++ Front-end
• Parses ANSI C++ with GNU extensions
• Uses GLR to handle the ambiguities
• Extensible components:
– flex lexer
– Elkhound parser
– AST defined with custom tool
– Type checker
The Elsa Block Diagram

preproc’d source → Lexer → token stream → Parser → possibly ambiguous AST
→ Type Checker → annotated unambiguous AST → Post Process → final AST

No lexer feedback hack!


Extending the Syntax
• ANSI or GNU? Both!
– Declarative language
– Extend simply by concatenating

ANSI Base:

nonterm ConditionalExp {
  -> Exp {...}
  -> Exp "?" Exp ":" Exp {...}
}

GNU Extension:

nonterm ConditionalExp {
  -> Exp "?" ":" Exp {...}
}
Declarative Abstract Syntax
class Statement (SourceLoc loc) {   // superclass name, superclass ctor parameter
  // subclass names, each followed by its ctor parameters:
  -> S_compound(ASTList<Statement> stmts);     // subclass ctor list parameter
  -> S_if(Condition cond, Statement thenBranch,
          Statement elseBranch);
  -> S_while(Condition cond, Statement body);  // subclass ctor parameter
  // ...
}
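
To make the declarative form concrete, here is a rough sketch of the kind of C++ class hierarchy such a description corresponds to. It is hand-written for illustration: the stand-in SourceLoc and Condition types, the kindName() method, and the use of std::unique_ptr are assumptions, and the code Elsa's AST tool actually generates may look quite different.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for Elsa types.
struct SourceLoc { int line = 0; };
struct Condition { std::string text; };

// Superclass: holds the ctor parameter shared by every statement.
struct Statement {
    SourceLoc loc;
    explicit Statement(SourceLoc l) : loc(l) {}
    virtual ~Statement() = default;
    virtual std::string kindName() const = 0;
};

// One subclass per "->" line in the description.
struct S_compound : Statement {
    std::vector<std::unique_ptr<Statement>> stmts;  // the list parameter
    explicit S_compound(SourceLoc l) : Statement(l) {}
    std::string kindName() const override { return "S_compound"; }
};

struct S_while : Statement {
    Condition cond;
    std::unique_ptr<Statement> body;
    S_while(SourceLoc l, Condition c, std::unique_ptr<Statement> b)
        : Statement(l), cond(std::move(c)), body(std::move(b)) {}
    std::string kindName() const override { return "S_while"; }
};
```
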
Extending the Abstract Syntax
• ANSI or GNU? Both!
– Declarative language
– Extend simply by concatenating

ANSI Base:

class Statement {
  -> S_decl(Declaration decl);
  -> S_expr(Expression expr);
  -> S_if(...);
  -> S_for(...);
}

GNU Extension:

class Statement {
  -> S_function(Function f);   // GNU nested functions
}
Semantic Analysis
• Disambiguate

• Compute types

• Resolve overloading

• Insert implicit conversions

• Instantiate templates
Disambiguation
Ambiguous syntax example: return (x)(y);

S_return
  expr: two alternatives joined by an ambiguity link
    E_cast
      type: TypeId (x)
      expr: E_variable (y)
    E_funCall
      func: E_variable (x)
      arg: E_variable (y)
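
Late disambiguation can be pictured as type-checking every alternative and keeping the one that survives. The sketch below is a toy model under stated assumptions: the Alternative struct, typeChecks(), and disambiguate() are hypothetical names, and the only check performed is whether the first name denotes a type. Elsa's real type checker does far more.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of an ambiguous expression node: each alternative is one
// reading of "(x)(y)". Names are illustrative, not Elsa's API.
enum class Kind { E_cast, E_funCall };

struct Alternative {
    Kind kind;
    std::string first;   // the type (cast) or the function (call)
    std::string second;  // the operand or the argument
};

// An alternative checks only if 'first' names the right sort of entity:
// a type for a cast, a non-type for a function call.
bool typeChecks(const Alternative &alt, const std::set<std::string> &typeNames) {
    bool firstIsType = typeNames.count(alt.first) > 0;
    return alt.kind == Kind::E_cast ? firstIsType : !firstIsType;
}

// Keep the alternatives that survive checking; exactly one should remain.
std::vector<Alternative> disambiguate(const std::vector<Alternative> &alts,
                                      const std::set<std::string> &typeNames) {
    std::vector<Alternative> survivors;
    for (const Alternative &alt : alts)
        if (typeChecks(alt, typeNames))
            survivors.push_back(alt);
    return survivors;
}
```
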
Lowered Output: Simplified C++
• Original or Lowered output can be printed
• Lowering always done:
– Templates are instantiated
– Implicit type conversions inserted
• Lowering optionally done:
– Implicit member functions created
– Implicit ctor/dtor calls inserted
C++ or XML, In and Out

C++ or XML in → Elsa → C++ or XML out

First pass renders to a canonical form.


Serialization commutes with lowering.
Cqual++: Dataflow
• Dataflow Analysis on Type Qualifiers
• Successor to Cqual: Jeff Foster, Alex Aiken

char $tainted *getenv();
void printf(char $untainted *fmt, ...);

int main() {
  char *x = getenv("foo");
  printf(x);   // $tainted data reaches a $untainted parameter
}
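
One way to picture the analysis is as reachability in a graph whose edges record where values may flow: an error is reported when a $tainted source can reach a $untainted sink. The sketch below is a toy under stated assumptions. The FlowGraph type and the node names such as "getenv.return" are invented for illustration and do not reflect Cqual++'s internal representation.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Toy dataflow graph: an edge a -> b means a value may flow from a to b.
struct FlowGraph {
    std::map<std::string, std::vector<std::string>> edges;

    void addFlow(const std::string &from, const std::string &to) {
        edges[from].push_back(to);
    }

    // Breadth-first search: can data at 'source' reach 'sink'?
    bool reaches(const std::string &source, const std::string &sink) const {
        std::set<std::string> seen{source};
        std::queue<std::string> work;
        work.push(source);
        while (!work.empty()) {
            std::string node = work.front();
            work.pop();
            if (node == sink) return true;
            auto it = edges.find(node);
            if (it == edges.end()) continue;
            for (const std::string &next : it->second)
                if (seen.insert(next).second) work.push(next);
        }
        return false;
    }
};

// Graph for the slide's example; node names are made up for illustration.
FlowGraph slideExample() {
    FlowGraph g;
    g.addFlow("$tainted", "getenv.return");  // char $tainted *getenv()
    g.addFlow("getenv.return", "x");         // char *x = getenv("foo")
    g.addFlow("x", "printf.fmt");            // printf(x)
    g.addFlow("printf.fmt", "$untainted");   // char $untainted *fmt
    return g;
}
```

Note that only getenv and printf carry annotations; the qualifier on x is inferred from the edges.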
Feature: Polymorphic Dataflow

int f(int x) {return x;}


int main() {
int $tainted t = ...;

int a = f(t);

int $untainted u = f(3);


}
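
The example above is only accepted if the two calls to f are kept apart. The sketch below contrasts a monomorphic graph, where both calls share one copy of f, with a polymorphic one built by giving each call site its own copy. Copying is just one simple way to get the effect; the slides do not say how Cqual++ implements polymorphism, and all names here (Graph, addCall, the "@call1" suffixes) are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first reachability over flow edges.
bool reaches(const Graph &g, const std::string &from, const std::string &to,
             std::set<std::string> &seen) {
    if (from == to) return true;
    if (!seen.insert(from).second) return false;
    auto it = g.find(from);
    if (it == g.end()) return false;
    for (const std::string &next : it->second)
        if (reaches(g, next, to, seen)) return true;
    return false;
}

bool flows(const Graph &g, const std::string &from, const std::string &to) {
    std::set<std::string> seen;
    return reaches(g, from, to, seen);
}

// Add the flow for one call "result = f(arg)". The suffix selects which
// copy of f's parameter and return nodes the call uses.
void addCall(Graph &g, const std::string &arg, const std::string &result,
             const std::string &suffix) {
    g[arg].push_back("f.x" + suffix);
    g["f.x" + suffix].push_back("f.ret" + suffix);  // return x;
    g["f.ret" + suffix].push_back(result);
}

// Monomorphic: both calls share one copy of f, so their flows merge.
Graph monomorphic() {
    Graph g;
    addCall(g, "t", "a", "");
    addCall(g, "3", "u", "");
    return g;
}

// Polymorphic (by copying): each call site gets its own instance of f.
Graph polymorphic() {
    Graph g;
    addCall(g, "t", "a", "@call1");
    addCall(g, "3", "u", "@call2");
    return g;
}
```

In the monomorphic graph the tainted t appears to reach the untainted u, a false positive; in the polymorphic graph it does not.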
Feature: “Funky Qualifiers”:
Fake Function Bodies
char $_1_2 *strcat(char $_1_2 *dest,
                   const char $_1 *src);
// {1} ⊆ {1,2}
int main() {
char $tainted *x;
char $untainted *y;
strcat(y, x);
}
Feature: Separate Compilation
for Scalability
• “Compile” each file to a dataflow graph
– only flow behavior between external symbols
matters
– compress by finding smaller graph with same
flow behavior; typically saves factor of 12
• “Link” each graph
– AST is gone at linking so we save even more
space
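
The "compile" step can be sketched as summarizing a per-file graph down to its external symbols. The version below simply adds a direct edge wherever the original graph had a path between two externals. This preserves flow behavior between externals but is not necessarily the smallest such graph, and it is an assumption for illustration rather than the compression Oink actually uses; the names Graph, collect, and summarize are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;

// Collect every node reachable from 'node' by following one or more edges.
void collect(const Graph &g, const std::string &node, std::set<std::string> &seen) {
    auto it = g.find(node);
    if (it == g.end()) return;
    for (const std::string &next : it->second)
        if (seen.insert(next).second) collect(g, next, seen);
}

// Summarize a per-file graph: keep only external symbols, with a direct
// edge wherever the original graph had a path between two of them.
std::set<std::pair<std::string, std::string>>
summarize(const Graph &g, const std::set<std::string> &externals) {
    std::set<std::pair<std::string, std::string>> summary;
    for (const std::string &src : externals) {
        std::set<std::string> seen;
        collect(g, src, seen);
        for (const std::string &dst : seen)
            if (dst != src && externals.count(dst) > 0)
                summary.emplace(src, dst);
    }
    return summary;
}
```

Internal temporaries disappear from the summary, which is why the linked graphs can be much smaller than the per-file ASTs.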
Non-Feature: Cqual++ Is Not
Flow-Sensitive
q = p;
... time passes ...

p->s = read_from_network();
use_in_untrusting_way(p->s);

// does p == q still??
q->s = "innocuous";   // $tainted??
use_in_trusting_way(p->s);
What Exactly Is ‘Data-Flow’?
char *launderString(char *in) {
int len = strlen(in);
char *out = malloc(len+1);
for (int i=0; i<len; ++i) {
out[i] = 0;
for (int j=0; j<8; ++j)
if (in[i] & (1<<j))
out[i] |= (1<<j);
}
out[len] = '\0';
return out;
}
Application: Finding Format-
String Vulnerabilities
• printf() is an interpreter
• the format string is a program
– %n writes the number of bytes written so far to the memory
pointed to by the arg
– ex: printf("stuff%n", p) means *p = 5
• if no argument p, printf() writes through
some pointer on the stack
– do not allow untrusted data in first arg to printf
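
The usual fix is to pass untrusted text as an argument to a fixed format string, so it is treated as data and never interpreted. A minimal sketch, assuming a hypothetical helper named renderSafely and an arbitrary 256-byte buffer:

```cpp
#include <cstdio>
#include <string>

// Format untrusted text as data: the format string is the fixed literal
// "%s", so any '%' directives inside 'untrusted' are never interpreted.
// Output longer than the buffer is truncated by snprintf.
std::string renderSafely(const std::string &untrusted) {
    char buffer[256];
    std::snprintf(buffer, sizeof buffer, "%s", untrusted.c_str());
    return std::string(buffer);
}
```

In qualifier terms, the literal "%s" is the $untainted first argument, and the tainted string only flows into a later argument.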
Application: Finding User-Kernel
Vulnerabilities
• Kernel must check user pointers are valid
– must point to memory mapped into user
process’s address space
– otherwise could manipulate the kernel data
• This is also a dataflow/taint analysis
Rob’s Cqual Linux
User-Kernel Results
• 2.4.20, full config, 7 bugs, 275 false pos.
• 2.4.23, full config, 6 bugs, 264 false pos.
• including other trials on same kernels:
– found 17 different security vulnerabilities
– found bugs missed by other tools and by manual review
– all but one bug confirmed exploitable
– significant “bug churn” across kernel versions
Linus’s “Sparse” Tool
for User-Kernel Vulnerabilities
• Linus also has a tool using type qualifiers
– it requires manual annotation of every var
• In contrast, Cqual++ infers the qualifiers
– only sources and sinks need be annotated
– and any “sanitizer” functions
• Linus says this “is not the C way”
– ok, he can write all the annotations
Future Application: Finding
Character-Set Confusions
• Microsoft confusing ASCII and UCS2
• Mozilla has 20-ish different character sets
• they should only flow together through
conversion functions
• if array sizes differ, confusions can be a
security hole too
Oink Vision:
Composable Analysis Tools
• Compilers refuse to compile bugs
– well, some classes of bugs
– and you may have to wait until tomorrow
morning to find out
• Correctness analysis is expected as part of
any compiler toolchain
• The analyses are composable and extensible
