LLVM Tutorial
Transformation
ANDREW RUEF
UNIVERSITY OF MARYLAND
COMPUTER SCIENCE
LLVM Overview
Research project at UIUC
Modular compiler tool chain
Integrated in many open source and commercial
projects
Licensed under an open-source license
Introduction
Components of LLVM
Mid-level compiler Intermediate
Representation (IR)
C/C++ compiler frontend (clang)
Target-specific (X86, ARM, etc) code
generators
Divide between clang and LLVM
Clang is a C/C++ compiler with an
LLVM backend
LLVM is everything else
Today's Agenda
We'll talk about existing LLVM tools
We'll do a few demos using those tools
We'll talk about how to build tools on top of LLVM
We'll build two analysis tools
We'll look at a program re-writing tool
LLVM bitcode
opt - Analyze and transform LLVM bitcode
llc - Code generator for LLVM bitcode to native code
infrastructure is doing
C to un-optimized bitcode
Optimized bitcode
Machine code
Executable
instruction
http://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm/
LLVM Intermediate
Representation
accepting routine
If it is, retrieve the first parameter
If the first parameter is not a constant global, raise
an alert
on it
Produce a bitcode file using clang -c -emit-llvm
Using the driver might seem clunky, but it is easier
than integrating with opt
The pass can later be integrated with opt
cd tutorial
mkdir build
cd build
cmake -DLLVM_ROOT=/usr/local ..
make
CMake
CMake is a meta make
Why? Why not
CMake generates your build environment
Makefiles
XCode solution
Visual Studio solution
CMake has its own build specification system for
autoconf
a value
add - a binary instruction
nsw - no signed wrap
Types
No implicit casting in LLVM IR, all values must be
explicitly converted
All values have a static type
Integers are specified at arbitrary bitwidth
pointers, structures
Structures have types like {i32, i32, i8}
Pointers have types like pointer to i32
wrapping in LLVM IR (
http://code.google.com/p/wrapped-intervals/ )
Operations are interpreted as signed or unsigned
based on instructions they are used in
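The point that signedness lives in the instruction rather than in the value can be illustrated outside LLVM. The following is a minimal C++ sketch, not LLVM code: the helpers udiv32 and sdiv32 are hypothetical names that mimic what the IR instructions udiv and sdiv do to the same 32-bit pattern.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit pattern, standing in for an LLVM i32 value.
// The value itself carries no sign; the operation chooses how to read it.
constexpr std::uint32_t kBits = 0xFFFFFFF8u;

// Models `udiv i32 %x, %y`: both operands are read as unsigned.
// Hypothetical helper; the caller must not pass y == 0.
inline std::uint32_t udiv32(std::uint32_t x, std::uint32_t y) {
    return x / y;
}

// Models `sdiv i32 %x, %y`: the same bits are read as two's complement.
// Hypothetical helper; the caller must avoid y == 0 and INT32_MIN / -1.
// The unsigned-to-signed conversion is modular in C++20 and behaves the
// same way on mainstream compilers in earlier standards.
inline std::uint32_t sdiv32(std::uint32_t x, std::uint32_t y) {
    const std::int32_t sx = static_cast<std::int32_t>(x);
    const std::int32_t sy = static_cast<std::int32_t>(y);
    return static_cast<std::uint32_t>(sx / sy);
}
```

Dividing kBits by 2 gives 0x7FFFFFFC through udiv32 and 0xFFFFFFFC (that is, -4) through sdiv32, even though the input bits are identical.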
Memory Model
LLVM has a low level view of memory
Just a key -> value map
Keys are pointer values
Values stored in LLVM memory must be integers, floating
point, pointers, vectors, structures, or arrays
LLVM has a concept of creating function-local
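The "key -> value map" view of memory can be sketched as a toy in C++. This is an illustration of the idea only, under the assumption that every stored value is a 64-bit integer; ToyMemory and its member names are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy memory: keys are pointer values, values are whatever was stored.
// Only 64-bit integers are stored here to keep the sketch short.
class ToyMemory {
public:
    using Pointer = std::uint64_t;

    // Hands out a fresh, previously unused key.
    Pointer allocate() { return next_++; }

    // Associates a value with a key, overwriting any earlier value.
    void store(Pointer p, std::int64_t v) { cells_[p] = v; }

    // Returns the value stored at a key. Reading a key that was never
    // stored returns 0 in this toy; that is a simplification.
    std::int64_t load(Pointer p) const {
        const auto it = cells_.find(p);
        return it == cells_.end() ? 0 : it->second;
    }

private:
    std::map<Pointer, std::int64_t> cells_;
    Pointer next_ = 1;
};
```

A store followed by a load of the same key returns the stored value, and distinct allocations never alias in this model.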
The Module
Highest level concept
Contains a set of global values
Global variables
Functions
The Function
Name
Argument list
Return type
Calling convention
Extends from GlobalValue, has properties of
linkage visibility
The BasicBlock
Contains a list of Instructions
All BasicBlocks must end in a TerminatorInst
BasicBlocks descend from values, and are used as
The Instruction
Terminator instructions
Binary instructions
Bitwise instructions
Aggregate instructions
Memory instructions
Type conversion instructions
Control and misc instructions
Language By Example
Produced with opt -dot-cfg -o fib.bc fib.bc and graphviz
Simple function
int foo(int a, int b) {
  int i = a;
  int j = b;

  return i+j+1;
}
Pre-SSA
define i32 @foo(i32 %a, i32 %b) nounwind uwtable ssp {
entry:
  %a.addr = alloca i32, align 4
  %b.addr = alloca i32, align 4
  %i = alloca i32, align 4
  %j = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  store i32 %b, i32* %b.addr, align 4
  %0 = load i32* %a.addr, align 4
  store i32 %0, i32* %i, align 4
  %1 = load i32* %b.addr, align 4
  store i32 %1, i32* %j, align 4
  %2 = load i32* %i, align 4
  %3 = load i32* %j, align 4
  %add = add nsw i32 %2, %3
  %add1 = add nsw i32 %add, 1
  ret i32 %add1
}
Post-SSA
define i32 @foo(i32 %a, i32 %b) nounwind uwtable ssp {
entry:
  %add = add nsw i32 %a, %b
  %add1 = add nsw i32 %add, 1
  ret i32 %add1
}
The Phi-Node
To support conditional assignments, we introduce an
imaginary function
Phi defines a value and accepts a list of tuples as an
argument
Each tuple is a (BasicBlock, Value) pair
Interpret the phi node as defining a value
conditionally based on the previous basic block
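The selection behavior of a phi node can be modeled in a few lines of C++. This is a toy sketch, not LLVM's PHINode class: ToyPhi, evaluate, and makeExamplePhi are hypothetical names, and blocks are identified by plain strings.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy phi node: a list of (predecessor block, incoming value) pairs.
struct ToyPhi {
    std::vector<std::pair<std::string, int>> incoming;

    // Returns the value paired with the block that control came from.
    // Throws if the phi has no entry for that predecessor.
    int evaluate(const std::string& previousBlock) const {
        for (const auto& entry : incoming) {
            if (entry.first == previousBlock) return entry.second;
        }
        throw std::invalid_argument("no incoming value for " + previousBlock);
    }
};

// Models the IR instruction: %x = phi i32 [ 1, %then ], [ 2, %else ]
inline ToyPhi makeExamplePhi() {
    ToyPhi phi;
    phi.incoming.push_back(std::make_pair(std::string("then"), 1));
    phi.incoming.push_back(std::make_pair(std::string("else"), 2));
    return phi;
}
```

Arriving from the "then" block yields 1 and arriving from "else" yields 2, which is exactly the conditional definition described above.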
LLVM CFG
Well-Formed LLVM
There are specific rules as to what constitutes
well-formed LLVM
Phi-nodes dominate their uses
Instruction arguments are defined before use
All blocks end in a terminator
All branch targets are defined values
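One of these rules, that every block ends in a terminator, is simple enough to check in a toy model. The sketch below is not the LLVM verifier: ToyBlock, isTerminator, and endsInTerminator are hypothetical names, blocks are lists of opcode strings, and the terminator list is a deliberately small subset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: just a list of opcode names.
using ToyBlock = std::vector<std::string>;

// Recognizes a small subset of terminator opcodes (assumption: the
// real instruction set has more, which this sketch ignores).
inline bool isTerminator(const std::string& op) {
    return op == "ret" || op == "br" || op == "switch" || op == "unreachable";
}

// True if the block is non-empty, its last instruction is a terminator,
// and no terminator appears earlier in the block.
inline bool endsInTerminator(const ToyBlock& block) {
    if (block.empty()) return false;
    for (std::size_t i = 0; i + 1 < block.size(); ++i) {
        if (isTerminator(block[i])) return false;
    }
    return isTerminator(block.back());
}
```

A block such as add, add, ret passes the check; an empty block, a block that falls off the end, or a block with a ret in the middle does not.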
C++ API
Value Hierarchy
Value has a very rich class hierarchy
LLVM API allows the manipulation of every Value
Any degree of transformation is possible
Value
This allows for some useful APIs
Def-Use / Use-Def iteration
Replace any Value with another Value
Sub
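Def-use tracking and whole-value replacement can be sketched with two small structs. This is a toy model of the idea, not LLVM's Value and User classes: ToyDef, ToyUser, and addOperand are hypothetical, and the replaceAllUsesWith function here only borrows the name of the LLVM operation it imitates.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ToyUser;

// Toy definition that remembers who uses it (its def-use chain).
struct ToyDef {
    std::vector<ToyUser*> users;
};

// Toy user whose operand slots point back at definitions (use-def).
struct ToyUser {
    std::vector<ToyDef*> operands;
};

// Records the edge in both directions.
inline void addOperand(ToyUser& user, ToyDef& def) {
    user.operands.push_back(&def);
    def.users.push_back(&user);
}

// Rewrites every use of `from` so that it refers to `to` instead,
// and moves the user records over to `to`.
inline void replaceAllUsesWith(ToyDef& from, ToyDef& to) {
    ToyDef* const oldDef = &from;
    ToyDef* const newDef = &to;
    for (ToyUser* user : from.users) {
        std::replace(user->operands.begin(), user->operands.end(), oldDef, newDef);
        to.users.push_back(user);
    }
    from.users.clear();
}
```

Because each definition knows its users, the replacement touches only the instructions that actually refer to the old value instead of scanning the whole program.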
LLVM Context
Frequent argument to LLVM API functions
These can normally be retrieved from a Value via
getContext
LLVM objects
When writing passes, you use LLVM specific helpers
isa<T> - True or false if pointer/reference is of type T
cast<T> - Checked cast, asserts on failure if not type T
dyn_cast<T> - Checking cast, returns null if not type T
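How the three helpers differ can be shown with a stripped-down imitation. The sketch below is not LLVM's Casting.h: the toy_ prefixed templates and the ToyValue hierarchy are hypothetical stand-ins that follow the same kind-tag and classof convention.

```cpp
#include <cassert>

// Toy hierarchy with a kind tag, used in place of C++ RTTI.
struct ToyValue {
    enum Kind { K_Constant, K_Instruction };
    explicit ToyValue(Kind k) : kind(k) {}
    const Kind kind;
};

struct ToyConstant : ToyValue {
    ToyConstant() : ToyValue(K_Constant) {}
    static bool classof(const ToyValue* v) { return v->kind == K_Constant; }
};

struct ToyInstruction : ToyValue {
    ToyInstruction() : ToyValue(K_Instruction) {}
    static bool classof(const ToyValue* v) { return v->kind == K_Instruction; }
};

// Like isa<T>: answers whether the object really is a T.
template <typename T>
bool toy_isa(const ToyValue* v) { return T::classof(v); }

// Like cast<T>: asserts that the object is a T, then converts.
template <typename T>
T* toy_cast(ToyValue* v) {
    assert(toy_isa<T>(v) && "toy_cast<T>() argument of incompatible type");
    return static_cast<T*>(v);
}

// Like dyn_cast<T>: checks first and returns null on a mismatch.
template <typename T>
T* toy_dyn_cast(ToyValue* v) {
    return toy_isa<T>(v) ? static_cast<T*>(v) : nullptr;
}
```

The null result from the dyn_cast-style helper is what makes the common `if (T *n = dyn_cast<T>(foo))` idiom work: the body runs only when the conversion succeeded.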
Common Patterns
Iterate over BasicBlock in a Function
Use begin(), end() iterators of Function
Iterate over Instructions in a Function
Use inst_iterator
Iterate over Def-Use chains
Use use_begin, use_end
InstVisitor
Pattern to avoid giant blocks of
if(T *n = dyn_cast<T>(foo))
Inherit from InstVisitor class and define a visitTInst
method
Could work for your purposes
Could confuse control flow even more
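The visitor pattern described here can be imitated with a small CRTP base class. This is a toy sketch rather than LLVM's InstVisitor: ToyInstVisitor, ToyInst, ToyOpcode, LoadCounter, and countLoads are all hypothetical names, and only three opcodes are modeled.

```cpp
#include <cassert>
#include <vector>

enum class ToyOpcode { Add, Load, Ret };

struct ToyInst {
    ToyOpcode op;
};

// CRTP base: dispatches on the opcode and calls the derived class's
// handler. Handlers that the derived class does not define fall back
// to the do-nothing defaults below.
template <typename Derived>
class ToyInstVisitor {
public:
    void visit(const ToyInst& inst) {
        Derived& self = static_cast<Derived&>(*this);
        switch (inst.op) {
            case ToyOpcode::Add:  self.visitAdd(inst);  break;
            case ToyOpcode::Load: self.visitLoad(inst); break;
            case ToyOpcode::Ret:  self.visitRet(inst);  break;
        }
    }
    void visitAdd(const ToyInst&) {}
    void visitLoad(const ToyInst&) {}
    void visitRet(const ToyInst&) {}
};

// A client defines only the cases it cares about.
class LoadCounter : public ToyInstVisitor<LoadCounter> {
public:
    void visitLoad(const ToyInst&) { ++loads; }
    int loads = 0;
};

// Visits every instruction and returns how many loads were seen.
inline int countLoads(const std::vector<ToyInst>& insts) {
    LoadCounter counter;
    for (const ToyInst& inst : insts) counter.visit(inst);
    return counter.loads;
}
```

The single switch in the base class replaces a chain of dyn_cast tests in every client, at the cost of moving the control flow into the dispatch mechanism.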
Passes
In the previous lab, we wrote a
pass
Compiling is the act of passing
over and analyzing/transforming
IR
Most things that happen in LLVM
happen in the context of a pass
Passes can have complicated
actions
Pass Dependencies
Passes can depend on the output of other passes
Analysis passes for alias analysis
Passes note their dependencies on other passes
By overriding the getAnalysisUsage method
PassManager figures out the dependency graph
It also attempts to optimize the traversal of the graph
Each pass's run method returns a bool that reports
whether the pass modified the IR
Pass Manager
PassManager performs dependency maintenance
Note that PassManager invocations could be multi-threaded!
Importance of multiple LLVMContexts
PassManager also performs optimizations of pass
ordering
PassManager defines different kinds of Passes that
can be run
ModulePass - Run on entire module
FunctionPass - Run on individual functions
BasicBlockPass - Run on individual basic blocks
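The dependency handling a pass manager performs can be approximated by a depth-first ordering. The sketch below is a toy, not LLVM's PassManager: ToyDeps, scheduleOne, and schedule are hypothetical names, passes are plain strings, and the pass names used in the example are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy dependency table: pass name -> names of the passes it requires.
using ToyDeps = std::map<std::string, std::vector<std::string>>;

// Depth-first walk that emits each requirement before the pass that
// needs it. A pass with no table entry is treated as having no
// requirements. Each pass is scheduled at most once.
inline void scheduleOne(const ToyDeps& deps, const std::string& pass,
                        std::set<std::string>& done,
                        std::vector<std::string>& order) {
    if (!done.insert(pass).second) return;  // already scheduled
    const auto it = deps.find(pass);
    if (it != deps.end()) {
        for (const std::string& required : it->second) {
            scheduleOne(deps, required, done, order);
        }
    }
    order.push_back(pass);
}

// Returns a run order for the requested passes plus everything they
// transitively require.
inline std::vector<std::string> schedule(const ToyDeps& deps,
                                         const std::vector<std::string>& requested) {
    std::set<std::string> done;
    std::vector<std::string> order;
    for (const std::string& pass : requested) {
        scheduleOne(deps, pass, done, order);
    }
    return order;
}
```

Requesting a single pass pulls in its analyses automatically and runs each of them once, which is the bookkeeping a pass author would otherwise do by hand.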
Pass Rules
Non-analysis passes should not remember any
return or store
Traverse the set checking for alloca-ed values in
the Values descending from the escapes
analysis
lldb - LLVM debugger
klee - symbolic execution for LLVM
FreeBSD compiles with clang, soon will switch to
Conclusion
LLVM enables powerful transformations
Includes an industry grade C/C++ frontend
clang is default compiler on OSX, supported by Apple
Can compile much of Linux userspace
Well-defined Intermediate Language
Modular and pluggable framework for analysis and
transformation
Project Documentation
Good documentation online
http://www.llvm.org/docs
Documentation covers many aspects of the LLVM
project
Programmer's Manual details finer points of the C++ API
Language Reference is the ultimate source for language details and
semantics