PLDI Week 03 Irs
PLDI Week 03 Irs
PLDI Week 03 Irs
Ilya Sergey
[email protected]
ilyasergey.net/CS4212/
From the Last week:
Comparisons and Conditioning
2
X86lite State: Flags & Condition Codes
• X86 instructions set flags as a side effect
• X86lite has only 3 flags:
– OF: “overflow” set when the result is too big/small to fit in 64-bit reg.
– SF: “sign” set to the sign or the result (0=positive, 1 = negative)
– ZF: “zero” set when the result is 0
• E.g., cmpq SRC1, SRC2 computes SRC1 – SRC2 to set the flags
• Now we can check conditional codes:
– eq equality holds when ZF is set
– neq inequality holds when (not ZF)
– lt less than holds when SF <> OF
• Equivalently: ((SF && not OF) || (not SF && OF))
– ge greater or equal holds when (not lt) holds, i.e. (SF = OF)
– le than or equal holds when SF <> OF or ZF
– gt greater than holds when (not le) holds,
• i.e. (SF = OF) && not(ZF)
3
Conditional Instructions
• Example:
4
Code Blocks & Labels
label1:
<instruction>
<instruction>
…
<instruction>
label2:
<instruction>
<instruction>
…
<instruction>
Labels indicate code locations that can be jump targets (either through conditional branch instructions or function calls).
• Labels are translated away by the linker and loader – instructions live in the heap in the “code segment”
5
Basic Control Flow
6
Jumps, Calls, and Return
7
Loop-based Factorial in Assembly
.globl _program
_program:
int program() {
movq $1, %rax
int acc = 1;
movq $6, %rdi
int n = 6;
loop:
while (0 < n) {
cmpq $0, %rdi
acc = acc * n;
je exit
n = n - 1;
imulq %rdi, %rax
}
decq %rdi
return acc;
jmp loop
}
exit:
retq
8
Demo: Hand-Coded x86Lite
• https://github.com/cs4212/week-02-x86lite
9
Compiling, Linking, Running
• To use hand-coded X86:
1. Compile OCaml program main1.ml to the executable
by running make
10
Implementing Functions &
C Calling Conventions
11
X86 Schematic
Processor Memory
Instruction
Decoder
Larger Addresses
Heap
OF
ALU
Control SF ALU
ZF
Flags
rax rbx rcx rdx
rsi rdi rbp rsp
r08 r09 r10 r11 Stack
r12 r13 r14 r15
Registers
3 parts of the C memory model
Memory
• The code & data (or "text") segment Code & Data
– contains compiled code, constant strings, etc.
• The Heap
– Stores dynamically allocated objects
– Allocated via "malloc"
Larger Addresses
Heap
– Deallocated via "free"
– C runtime system
• The Stack
– Stores local variables
– Stores the return address of a function
Stack
• In practice, most languages use this model.
13
Local/Temporary Variable Storage
• Need space to store:
– Global variables
– Values passed as arguments to procedures
– Local variables (either defined in the source program or introduced by the compiler)
• In practice on X86:
– Registers are limited (and have restrictions)
– Divide memory into regions including the stack and the heap
14
Calling Conventions
• Specify the locations (e.g. register or stack) of arguments passed to a function and returned by the function
int64_t f(int64_t x) {
int64_t ans = g(3,4) + x;
return ans;
}
g is a callee
• Designate registers either:
– Caller Save – e.g., freely usable by the called code
– Callee Save – e.g., must be restored by the called code
• Define the protocol for deallocating stack-allocated arguments
– Caller cleans up
– Callee cleans up (makes variable number arguments harder — the callee doesn’t know how many are those)
15
x64 Calling Conventions: Caller Protocol
f: …
… # set up arguments
%rip f …
callq g
lret: …
…
Machine state when
executing in function f.
%rdx arg3
%rcx arg4
%r8 arg5
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
16
x64 Calling Conventions: Caller Protocol
f: …
… # set up arguments
%rip …
callq g
lret: …
Calling conventions:
…
args 1…6 in registers
as shown below.
%rsi arg2
%rdx arg3
%rcx arg4
%r8 arg5
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
17
x64 Calling Conventions: Caller Protocol
f: …
… # set up arguments
%rip …
callq g
lret: …
args > 6 pushed onto
…
the stack (from right to left)
Note: %rsp changes
%rsi arg2
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
18
Call Instruction
f: …
… # set up arguments
%rip …
callq g
lret: …
To execute the call:
…
1. push the return
address
%rdi arg1
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
19
Call Instruction
g: pushq %rbp
movq %rsp, %rbp
%rip subq $128, %rsp
…
To execute the call:
…
2. set rip to address g
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
20
Callee Function Prologue
g: pushq %rbp
movq %rsp, %rbp
%rip subq $128, %rsp
…
Callee protocol:
…
1. store the old %rbp
g’s
frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
21
Callee Function Prologue
g: pushq %rbp
movq %rsp, %rbp
%rip subq $128, %rsp
…
Callee protocol:
…
2. adjust the %rbp to
point to the new “base” g’s
(%rbp is the “base pointer”) frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
22
Callee Function Prologue
g: pushq %rbp
movq %rsp, %rbp
%rip subq $128, %rsp
…
Callee protocol:
…
3. allocate 128 bytes of
“scratch” stack space g’s
frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
23
Callee Invariants: Function Arguments
g: pushq %rbp
movq %rsp, %rbp
%rip subq $128, %rsp
…
Now g’s body can run...
• its arguments are accessible
…
either in registers or as
g’s
offsets from %rbp
frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
24
Callee Invariants: Callee Same Registers
g: pushq %rbp
movq %rsp, %rbp
%rip subq $128, %rsp
…
…
g’s
frame
25
Callee Epilogue (Return Protocol)
g: …
movq ANS, %rax
addq $128, %rsp
%rip
popq %rbp
retq
Step 1: Move the result
…
(if any) into %rax.
g’s
%rax ANS frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
26
Callee Epilogue (Return Protocol)
g: …
movq ANS, %rax
addq $128, %rsp
%rip
popq %rbp
Step 2: deallocate the retq
scratch space
…
g’s
%rax ANS frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
27
Callee Epilogue (Return Protocol)
g: …
movq ANS, %rax
addq $128, %rsp
%rip
popq %rbp
Step 3: restore the caller’s retq
%rbp
…
g’s
%rax ANS frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
28
Callee Epilogue (Return Protocol)
g: …
movq ANS, %rax
addq $128, %rsp
%rip
popq %rbp
Step 4: the return instruction retq
…
g’s
%rax ANS frame
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
29
Callee Epilogue (Return Protocol)
f: …
… # set up arguments
…
%rip
callq g
lret: …
Step 4: the return instruction
pops the stack into %rip
…
%rax ANS
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
30
Back in f
f: …
… # set up arguments
…
%rip
callq g
lret: …
At this point, f has the result of
…
g in %rax. It should clean up its
stack as needed.
%rax ANS
%r9 arg6
%rsp f ’s
%rbp frame
registers old %rbp
(not all of them)
31
X86-64 SYSTEM v AMD 64 ABI
• More modern variant of C calling conventions
– used on Linux, Solaris, BSD, OS X
• 128 byte "red zone" – scratch pad for the callee's data
– typical of C compilers, not required
– can be optimised away
32
Announcements
• HW2: X86lite
• Due: Sunday, September 11 at 23:59
• Pair Programming:
• Use GitHub Classroom link to create a new team for the project or join an existing one
• Choose a funny group name!
• Submission by any group member done on Canvas counts for the group
Demo: Directly Compiling Expressions to X86lite
• https://github.com/cs4212/week-02-x86lite
34
Directly Translating AST to Assembly
• Key Challenges:
– storing intermediate values needed to compute complex expressions
– some instructions use specific registers (e.g. shift)
One Simple Strategy
• Invariants:
– Compilation of an expression yields its result in %rax
– Argument (Xi) is stored in a dedicated operand register
– Intermediate values are pushed onto the stack
– Stack slot is popped after use (so the space is reclaimed)
• Resulting code is wrapped (e.g., with retq) to comply with cdecl calling conventions
But…
• The resulting code quality is poor.
• Richer source language features are hard to encode
– Structured data types, objects, first-class functions, etc.
• It’s hard to optimize the resulting assembly code.
– The representation is too concrete – e.g. it has committed to using certain registers and the stack
– Only a fixed number of registers
– Some instructions have restrictions on where the operands are located
• Control-flow is not structured:
– Arbitrary jumps from one code block to another
– Implicit fall-through makes sequences of code non-modular
(i.e. you can’t rearrange sequences of code easily)
• Retargeting the compiler to a new architecture is hard.
– Target assembly code is hard-wired into the translation
Intermediate Representations (IR’s)
• Abstract machine code: hides details of the target architecture
x86
Java
AST IR Byte-
code
Optimization Arm
Multiple IR’s
• Goal: get program closer to machine code without losing the
information needed to do analysis and optimizations
Java
AST HIR MIR Byte-
code
Optimizations
What makes a good IR?
• Easy translation target (from the level above)
• Easy to translate (to the level below)
• Narrow interface
– Fewer constructs means simpler phases/optimizations
• Example: Source language might have “while”, “for”, and “foreach” loops
(and maybe more variants)
– IR might have only “while” loops and sequencing
– Translation eliminates “for” and “foreach”
– Here the notation ⟦cmd⟧ denotes the “translation” or “compilation” of the command cmd.
IR’s at the extreme
• High-level IR’s
– Abstract syntax + new node types not generated by the parser
• e.g. Type checking information or disambiguated syntax nodes
– Typically preserves the high-level language constructs
• Structured control flow, variable names, methods, functions, etc.
• May do some simplification (e.g. convert for to while)
– Allows high-level optimizations based on program structure
• e.g. inlining “small” functions, reuse of constants, etc.
– Useful for semantic analyses like type checking
• Low-level IR’s
– Machine dependent assembly code + extra pseudo-instructions
• e.g. a pseudo instruction for interfacing with garbage collector or memory allocator (parts of the language runtime
system)
• e.g. (on x86) a imulq instruction that doesn’t restrict register usage
– Source structure of the program is lost:
• Translation to assembly code is straightforward
– Allows low-level optimizations based on target architecture
• e.g. register allocation, instruction selection, memory layout, etc.
• What’s in between?
Mid-level IR’s: Many Varieties
• Many examples:
– Triples: OP a b
• Useful for instruction selection on X86 via “graph tiling” (a way to better utilise registers)
– Quadruples: a = b OP c (RISC-like “three address form”)
– SSA: variant of quadruples where each variable is assigned exactly once
• Easy dataflow analysis for optimization
• e.g. LLVM: industrial-strength IR, based on SSA
– Stack-based:
• Easy to generate
• e.g. Java Bytecode, UCODE
Growing an IR
IR ?
• Idea: name intermediate values, make order of evaluation explicit.
– No nested operations.
Translation to SLL
Add(Add(Const 1, Var X4),
• Given this: Add(Const 3, Mul(Var X1,
Const 5)))
• https://github.com/cs4212/week-03-intermediate-2023
• Definitions: ir<X>.ml
Intermediate Representations
• IR1: Expressions
– simple arithmetic expressions, immutable global variables
• IR2: Commands
– global mutable variables
– commands for update and sequencing
• LLVM