01 Lecture02


Basic exploitation techniques

20200123
Outline

A primer on x86 assembly

Memory segments

Stack-based buffer overflows

Heap-based buffer overflows

Format strings

A primer on x86 assembly
Introduction

Verily, when the developer herds understand the tools
that drive them to their cubicled pastures every day,
then shall the 0day be depleted — but not before.
– Pastor Manul Laphroaig

It’s a trap!

• ≈ 1000 instructions . . .
• No time to know them all :-)
This overview is meant as a first help
Multiple syntaxes
• AT&T
• Intel

Basics

In general
Mnemonics take from 0 to 3 arguments.
Two-argument mnemonics have the form (Intel syntax)

m dst, src

which roughly means

dst ← dst ⊙ src

where ⊙ is the semantics of m

Endianness

x = 0xdeadbeef

Endianness
byte address                  0x00  0x01  0x02  0x03
byte content (big-endian)     0xde  0xad  0xbe  0xef
byte content (little-endian)  0xef  0xbe  0xad  0xde

Architectures

Big endian
PowerPC, Sparc, 68000

Little endian
Intel, AMD

Bi-endian
ARM, RISC-V
These usually default to little endian.

Resources

• Cheat sheet
• Opcode and Instruction Reference
• Intel full instruction set reference

Basic registers (16/32/64 bits)

64-bit  32-bit  16-bit  name (8086) / use

rax     eax     ax      accumulator
rbx     ebx     bx      base address
rcx     ecx     cx      count
rdx     edx     dx      data
rsi     esi     si      source index
rdi     edi     di      destination index
rbp     ebp     bp      base pointer
rsp     esp     sp      stack pointer
rip     eip     ip      instruction pointer

• esp (e = extended) is the 32-bit stack pointer
• rsp (r = register) is the 64-bit one

Less basic registers (64 bits)

x86-64 adds the extended general-purpose registers r8 to r15. For example:

• r8d accesses the lower 32 bits of r8;
• r8w accesses the lower 16 bits;
• r8b accesses the lower 8 bits.

The full story

Register flags (partial)

of overflow flag
cf carry flag
zf zero flag
sf sign flag
df direction flag
pf parity flag
af adjust flag

Signed vs unsigned

At machine-level, every value is a bitvector.

Bitvectors can be seen through different lenses:

• unsigned value
• signed value
• float (will not talk about it)

Transfer

Move

mov dst, src dst := src

xchg o1, o2 tmp:= o1; o1 := o2; o2 := tmp

Arithmetic

All 4 arithmetic operations are present (@ denotes concatenation)

add dst, src    dst ← dst + src
sub dst, src    dst ← dst - src

div src         t64 ← edx @ eax
                eax ← t64 / src
                edx ← t64 % src

mul src         t64 ← eax * src
                edx ← t64{32,63}
                eax ← t64{0,31}

Arithmetic

inc dst             dst ← dst + 1
dec dst             dst ← dst - 1
sal/sar dst, src    arithmetic shift left / right

Sign preservation
mov ax, 0xff00   # unsigned: 65280, signed: -256
                 # ax=1111.1111.0000.0000
sal ax, 2        # unsigned: 64512, signed: -1024
                 # ax=1111.1100.0000.0000
sar ax, 5        # unsigned: 65504, signed: -32
                 # ax=1111.1111.1110.0000

Basic logical operators

Basic semantics

and dst, src    dst ← dst & src
or  dst, src    dst ← dst | src
xor dst, src    dst ← dst ^ src
not dst         dst ← ~dst

Examples
xor ax, ax       # ax = 0x0000
not ax           # ax = 0xffff
mov bx, 0x5500   # bx = 0x5500
xor ax, bx       # ax = 0xaaff

Logical shifts

Shift

shl dst, src    logical shift left
shr dst, src    logical shift right

Logical and arithmetic left shifts are the same.

Example
mov ax, 0xff00   # unsigned: 65280, signed: -256
                 # ax=1111.1111.0000.0000
shl ax, 2        # unsigned: 64512, signed: -1024
                 # ax=1111.1100.0000.0000
shr ax, 5        # unsigned: 2016, signed: 2016
                 # ax=0000.0111.1110.0000

Comparison and test instructions

Comparison
cmp dst, src: sets the flags according to dst − src (the result itself is discarded)

Test
test dst, src: sets the flags according to dst & src (the result itself is discarded)

Stack manipulation

Push

push src    sp := sp - word size; @[sp] := src

Pop

pop dst     dst := @[sp]; sp := sp + word size

Nops

The nop instruction does nothing (it’s skip!).
There are lots of nop instructions.

Assembly                                      Byte sequence

66 NOP                                        66 90H
NOP DWORD ptr [EAX]                           0F 1F 00H
NOP DWORD ptr [EAX + 00H]                     0F 1F 40 00H
NOP DWORD ptr [EAX + EAX*1 + 00H]             0F 1F 44 00 00H
66 NOP DWORD ptr [EAX + EAX*1 + 00H]          66 0F 1F 44 00 00H
NOP DWORD ptr [EAX + 00000000H]               0F 1F 80 00 00 00 00H
NOP DWORD ptr [EAX + EAX*1 + 00000000H]       0F 1F 84 00 00 00 00 00H
66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]    66 0F 1F 84 00 00 00 00 00H

Misc

Lea (load effective address)

lea dst, [src] dst := src

mov dst, [src] dst := @[src]

Int

int n runs interrupt number n

Unconditional jump instructions

Call

call address
call *op

call pushes eip (the return address) and then jumps

Jmp

jmp *op
jmp address

jmp only jumps

Extra jumps

Leave
esp := ebp; ebp := pop();

Ret
esp := esp + 4; eip := @[esp - 4];

Unsigned jumps

jump    if       n version   e version

ja      above    ✓           ✓
jb      below    ✓           ✓
jc      carry    ✓           ✗

Reading
"ja has n and e versions" means that the mnemonics

• jna (not above),
• jae (above or equal),
• jnae (not above or equal)

exist as well.

Signed jumps

jump    if         n version   e version

jg      greater    ✓           ✓
jl      less       ✓           ✓
jo      overflow   ✓           ✗
js      sign       ✓           ✗

Addressing modes

The addressing mode determines, for an instruction that accesses a
memory location, how the address of that location is specified.

Mode                               Intel
Immediate                          mov ax, 16h
Direct                             mov ax, [1000h]
Register Direct                    mov bx, ax
Register Indirect (indexed)        mov ax, [di]
Based Indexed                      mov ax, [bx + di]
Based Indexed with Displacement    mov eax, [ebx + edi + 2]

The semantics of instructions may seem intuitive but is complex

Instructions do have side effects

// 04 16 / add al, 0x16
0: res8 := (eax(32){0,7} + 22(8))
1: OF := ((eax(32){0,7}{7} = 22(8){7}) &
         (eax(32){0,7}{7} != res8(8){7}))
2: SF := (res8(8) <s 0(8))
3: ZF := (res8(8) = 0(8))
4: AF := ((extu eax(32){0,7}{0,7} 9) + 22(9)){8}
5: PF := !
   ((((((((res8(8){0} ^ res8(8){1}) ^ res8(8){2}) ^
   res8(8){3}) ^ res8(8){4}) ^ res8(8){5}) ^
   res8(8){6}) ^ res8(8){7}))
6: CF := ((extu eax(32){0,7} 9) + 22(9)){8}
7: eax{0, 7} := res8(8)

Real behavior of conditions

Mnemonic   Flags           cmp x y    sub x y    test x y

ja         ¬CF ∧ ¬ZF       x >u y     x' ≠ 0     x & y ≠ 0
jnae       CF              x <u y     x' ≠ 0     ⊥
je         ZF              x = y      x' = 0     x & y = 0
jge        OF = SF         x ≥ y      ⊤          x ≥ 0 ∨ y ≥ 0
jle        ZF ∨ OF ≠ SF    x ≤ y      ⊤          x & y = 0 ∨ (x < 0 ∧ y < 0)

(x' denotes the value of x after the sub)

Shift left

The OF flag is affected only on 1-bit shifts. For left shifts, the OF
flag is set to 0 if the most-significant bit of the result is the same
as the CF flag (that is, the top two bits of the original operand were
the same); otherwise, it is set to 1. For the SAR instruction, the OF
flag is cleared for all 1-bit shifts. For the SHR instruction, the OF
flag is set to the most-significant bit of the original operand.

The OF flag is affected only for 1-bit shifts (see "Description"
above); otherwise, it is undefined.

Memory segments
General overview

A compiled program has 5 segments:

1. code (text)
2. stack
3. data segments
3.1 data
3.2 bss
3.3 heap

Execution

1. Read instruction i @ eip
2. Add byte length of i to eip
3. Execute i
4. Goto 1

Graphically speaking

stack function, locals

the hole

the break

heap malloc, free

bss globals

data
text

Text segment

• The text segment (aka code segment) is where the code resides.
• It is not writable. Any attempt to write to it will kill the program.
• As it is read-only, it can be shared among processes.
• It has a fixed size.

Data & bss segments

• The data segment is filled with initialized global and static variables.
• The bss segment contains the uninitialized ones. It is zeroed on program startup.
• These segments are (of course) writable.
• They have a fixed size.

Heap segment

• The heap segment is directly controlled by the programmer.
• Blocks can be allocated or freed and used for anything.
• It is writable.
• It can grow toward higher memory addresses, or shrink, as needed.

Stack segment

• The stack segment is a temporary scratch pad for functions.
• Since eip changes on function calls, the stack is used to remember the previous state (return address, calling function base, arguments, . . . ).
• It is writable.
• It grows toward lower memory addresses as functions are called.

In C

void test_function(int a, int b, int c, int d)
{
    int flag;            /* locals live in the stack frame of the function */
    char buffer[10];
    flag = 31337;
    buffer[0] = 'A';
}

int main()
{
    test_function(1, 2, 3, 4);
}

Stack-based buffer overflows
C low-level responsibility

In C, the programmer is responsible for data integrity.
This means there are no guards to ensure that data is freed, or that
the contents of a variable fit into the memory reserved for it.
This leaves programs exposed to memory leaks and buffer overflows.

Reminder : stack layout for x86

Stack (stack frame f, then stack frame g)
  return address f
  saved frame pointer f
  locals f
  arguments g
  return address g
  saved frame pointer g
  locals g (pointer to data, buffer)

Code
  f: ...
     call g
     ...

Data
  val1
  val2

Vulnerability reason

• When an array a is declared in C, space is reserved for it.
• a is manipulated through offsets from its base pointer.
• At run-time, no information about the size of the array is present.
• Thus, nothing prevents a program from copying data beyond the end of a.

A rich history

1972 First documented attack
1988 Morris worm
1995 NCSA httpd 1.3
1996 Smashing the Stack for Fun & Profit

Basic exploitation

Stack (stack frame f, then stack frame g)
  return address f
  saved frame pointer f
  locals f
  arguments g
  return address g         (overwritten so that it points to the injected code)
  saved frame pointer g
  locals g (pointer to data, buffer holding the injected code)

Code
  f: ...
     call g
     ...

Data
  val1
  val2

Frame pointer overwriting

Stack (stack frame f, then stack frame g)
  return address f
  saved frame pointer f
  locals f
  arguments g
  return address g
  saved frame pointer g    (overwritten so that it points into the buffer)
  locals g (pointer to data, buffer holding a forged "return address f /
            saved frame pointer f" pair and the injected code)

Code
  f: ...
     call g
     ...

Data
  val1
  val2

Indirect pointer overwriting

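Stack (stack frame f, then stack frame g)
  return address f
  saved frame pointer f
  locals f
  arguments g
  return address g         (later overwritten through the corrupted pointer)
  saved frame pointer g
  locals g (pointer to data, overwritten so that it points to the return
            address; buffer holding the injected code)

Code
  f: ...
     call g
     ...

Data
  val1
  val2

Example 1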
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];
    /* No bounds check: overflows when password needs more than 16 bytes. */
    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) { printf("Usage: %s <password>\n", argv[0]); exit(0); }
    if (check_authentication(argv[1])) printf("\nAccess Granted.\n");
    else printf("\nAccess Denied.\n");
}
Example 2
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    char password_buffer[16]; /* Putting buffers before variables to impede ... */
    int auth_flag = 0;
    /* Still no bounds check: the copy can run past password_buffer. */
    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (argc < 2) { printf("Usage: %s <password>\n", argv[0]); exit(0); }
    if (check_authentication(argv[1])) printf("\nAccess Granted.\n");
    else printf("\nAccess Denied.\n");
}
Constraints

Needs
• Hardware willing to execute data as code
• No null bytes

Variants
• Frame pointer corruption
• Causing an exception to execute a specific function
pointer

Statistics # (https://nvd.nist.gov/vuln)

Statistics % (https://nvd.nist.gov/vuln)

Heap-based buffer overflows
Vulnerability

Heap memory is dynamically allocated at runtime.
Arrays on the heap overflow just as well as those on the stack.

Warning
The heap grows towards higher addresses instead of lower addresses.
This is the opposite of the stack.

Basic exploitation

Overwriting heap-based function pointers located after the buffer

Overwriting virtual function pointers

1998 IE4 heap overflow
2002 Slapper worm (Linux, OpenSSL)
CVE-2007-1365 OpenBSD: second remote exploit in 10 years
CVE-2017-11779 Windows DNS client

Overwriting heap-based function pointers

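#include <string.h>

#define MAX_LEN 64   /* assumed value: the slide does not define MAX_LEN */

typedef struct _vulnerable_struct
{
    char buff[MAX_LEN];
    int (*cmp)(char*,char*);   /* function pointer stored right after the buffer */
} vulnerable;

int is_file_foobar_using_heap(vulnerable* s, char* one, char* two)
{
    /* Neither copy checks MAX_LEN, so both can write past buff into cmp. */
    strcpy( s->buff, one );
    strcat( s->buff, two );
    return s->cmp(s->buff, "foobar");
}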

Constraints

• Ability to determine the address of the heap
• If string-based, no null bytes

Variants
• Corrupt pointers in other (adjacent) data structures
• Corrupt heap metadata

Statistics # (https://nvd.nist.gov/vuln)

Statistics % (https://nvd.nist.gov/vuln)

Format strings
About format strings vulnerabilities

They were the 'spork' of exploitation. ASLR? PIE?
NX Stack/Heap? No problem, fmt had you covered.

Vulnerability

Format functions are variadic.

int printf(const char *format, ...);

How it works
• The format string is copied to the output unless '%' is encountered.
• Then the format specifier will manipulate the output.
• When an argument is required, it is expected to be on the stack.

Caveat
And so ...
If an attacker is able to specify the format string, they can
control what the function pops from the stack and can make the
program write to arbitrary memory locations.

CVEs

Software                                     CVE
Zend                                         2015-8617
latex2rtf                                    2015-8106
VMware 8x                                    2012-3569
WuFTPD (providing remote root since 1994)    2000

Good & Bad

Good ✓
int f (char *user) {
    printf("%s", user);
}

Bad ✗
int f (char *user) {
    printf(user);
}

Exploitation

Badly formatted format parameters can lead to:

• arbitrary memory read (data leak)
• arbitrary memory write
• rewriting the .dtors section
• overwriting the Global Offset Table (.got)

Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char text[1024];
    static int test_val = 65;
    if (argc < 2) {
        printf("Usage: %s <text to print>\n", argv[0]);
        exit(0);
    }
    strcpy(text, argv[1]);
    printf("The right way to print user-controlled input:\n");
    printf("%s", text);
    printf("\nThe wrong way to print user-controlled input:\n");
    printf(text);
    // Debug output
    printf("\n[*] test_val @ 0x%08x = %d 0x%08x\n",
           &test_val, test_val, test_val);
    exit(0);
}

Stack situation

fmt
...
argn
...
arg1
&fmt

Reading from arbitrary addresses

The %s format specifier can be used to read from arbitrary addresses.
First, a chain of %08x shows that the fourth word read from the stack
is the beginning of the format string itself (0x41414141 is "AAAA"):

$ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way to print user-controlled input:
AAAA%08x.%08x.%08x.%08x
The wrong way to print user-controlled input:
AAAAffffcbc0.f7ffcfd4.565555c7.41414141
[*] test_val @ 0x56557028 = 65 0x00000041

Printing local variable

$ ./fmt_vuln $(printf "\x28\x70\x55\x56")%08x.%08x.%08x.%s
The right way to print user-controlled input:
(pUV%08x.%08x.%08x.%s
The wrong way to print user-controlled input:
(pUVffffcbc0.f7ffcfd4.565555c7.A
[*] test_val @ 0x56557028 = 65 0x00000041

65 is the ASCII value of 'A'

Writing to arbitrary memory

Just as %s reads from an arbitrary address, %n writes to one: it stores
the number of characters printed so far at the address it is given.

$ ./fmt_vuln $(printf "\x28\x70\x55\x56")%08x.%08x.%08x.%n
The right way to print user-controlled input:
(pUV%08x.%08x.%08x.%n
The wrong way to print user-controlled input:
(pUVffffcbc0.f7ffcfd4.565555c7.
[*] test_val @ 0x56557028 = 31 0x0000001f

Here 31 = 4 address bytes + 3 × 9 characters (each %08x plus its dot).

It may be unintentional

• printf("100% dave") prints the stack entry above the saved eip
• printf("%s") prints the bytes pointed to by that stack entry
• printf("%d %d %d ...") prints a series of stack entries as integers
• printf("%08x %08x %08x ...") does the same, but as hexadecimal values
• printf("100% no way") writes 3 to the address pointed to by that stack entry

Statistics # (https://nvd.nist.gov/vuln)

Statistics % (https://nvd.nist.gov/vuln)

Looking back

                Buffer overflow       Format string

public since    ≈ 1985                1999
dangerous       1990s                 2000
# exploits      thousands             dozens
considered      security threat       programming bug
techniques      evolved & advanced    basic
visibility      sometimes hard        easy

Play (exploitation) games

https://microcorruption.com

Questions ?

https://rbonichon.github.io/teaching/2020/asi36/
