cs107 Reader
cs107 Reader
cs107 Reader
S TA N F O R D C O M P U T E R S C I E N C E D E PA R T M E N T
Copyright © 2022
tufte-latex.googlecode.com
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in com-
pliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/
LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the
License is distributed on an “as is” basis, without warranties or conditions of any kind, ei-
ther express or implied. See the License for the specific language governing permissions and limitations
under the License.
Contents
C Primer 26
gdb 46
Bibliography 144
Index 145
4
List of Figures
List of Tables
List of URLs
http://stanford.edu/~cgregg/107-Reader/107-Reader-code.zip
. . . . . . . .. . . . . . . . . . . . . . p. 11
http://vxer.org/lib/pdf/Reflections%20on%20Trusting%20Trust.
pdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 13
http://web.stanford.edu/class/cs107/ . . . . . . . . . . . . . . . p. 9
http://www.apache.org/licenses/LICENSE-2.0 . . . . . . . . . . p. 2
http://www.cplusplus.com/reference/cstdio/printf/ . . . p. 28
http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.
htm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 107
https://books.google.com/books?id=KfM2rgEACAAJ pp. 9, 129,
144
https://books.google.com/books?id=Yi5FI5QcdmYC . pp. 9, 144
https://cdecl.org . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 44
https://docs.oracle.com/cd/E19957-01/816-2464/ncg_math.html
. . . . . . . .. . . . . . . . . . . . . . p. 99
https://en.wikipedia.org . . . . . . . . . .. . . . . . . . . . . . . . p. 11
https://en.wikipedia.org/wiki/C_(programming_language) . . p.
13
https://en.wikipedia.org/wiki/C_standard_library . . . . p. 29
https://en.wikipedia.org/wiki/Dennis_Ritchie . . . . . . . p. 13
https://en.wikipedia.org/wiki/Donald_Knuth . . . . . . . . . p. 13
https://en.wikipedia.org/wiki/Free_Software_Foundation . . p.
14
https://en.wikipedia.org/wiki/Git . .. . . . . . . . . . . . . . p. 16
https://en.wikipedia.org/wiki/Gratis_versus_libre . . . p. 14
https://en.wikipedia.org/wiki/IEEE_754 . . . . . . . . . . . . p. 93
https://en.wikipedia.org/wiki/IEEE_754#Roundings_to_nearest
. . . . . . . .. . . . . . . . . . . . . . p. 98
https://en.wikipedia.org/wiki/Ken_Thompson . . . . . . . . . p. 13
https://en.wikipedia.org/wiki/Linus_Torvalds . . . . . . . p. 13
https://en.wikipedia.org/wiki/Make_(software) . . . . . . p. 15
https://en.wikipedia.org/wiki/Setun .. . . . . . . . . . . . . . p. 19
https://en.wikipedia.org/wiki/TeX . .. . . . . . . . . . . . . . p. 13
https://en.wikipedia.org/wiki/The_Art_of_Computer_
Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 13
https://en.wikipedia.org/wiki/Two%27s_complement . . . . p. 62
https://en.wikipedia.org/wiki/Unicode . . . . . . . . . . . . . p. 73
https://en.wikipedia.org/wiki/Unix .. . . . . . . . . . . . . . p. 13
https://en.wikipedia.org/wiki/Unix_philosophy . . . . . . p. 14
https://en.wikipedia.org/wiki/William_Kahan . . . . . . . . p. 95
https://en.wikipedia.org/wiki/X86 . .. . . . . . . . . . . . . p. 107
https://github.com . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 16
https://randomascii.wordpress.com/2012/02/25/comparing-
floating-point-numbers-2012-edition/ . . . . . . . . . . . p. 103
https://software.intel.com/en-us/articles/intel-sdm p. 107
https://stackoverflow.com/a/1641963/561677 . . . . . . . . . p. 32
8
https://stackoverflow.com/a/8029624/561677 . . . . . . . . p. 137
https://stallman.org . . . . . . . . . . . . .. . . . . . . . . . . . . . p. 14
https://stanford.edu/~cgregg/107-Reader/float/convert.html
. . . . . . . .. . . . . . . . . . pp. 95, 97, 98
https://wikimediafoundation.org/wiki/Ways_to_Give . . . p. 11
https://www.bell-labs.com/usr/dmr/www/chist.pdf pp. 82, 144
https://www.h-schmidt.net/FloatConverter/IEEE754.html p. 98
https://www.strchr.com/x86_machine_code_statistics . p. 111
https://www.tutorialspoint.com/cprogramming/c_preprocessors.
htm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 27
https://www.wired.com/2011/10/thedennisritchieeffect/ p. 13
mailto:cgregg@stanford.edu . . . . . . . . . . . . . . . . . . . . . . p. 11
9
Introduction
http://web.stanford.edu/class/cs107/
the course material. That said, searching online for the meaning of
error messages, or for syntax questions is encouraged.
A frequent source of questions about the course material come
in a form that could easily be tested by the student with a bit of C
code. For example, a student might ask, “I know that pointer arith-
metic cannot be done on void* pointers. So... if we have two void* point-
ers, void* x and void* y, can we find the difference between their ad-
dresses by doing y - x or would that count as pointer arithmetic?” This
is a great question! However, it is also a question that can be tested
with a little bit of C code, as shown in Figure 1. The student who
can produce C code to answer questions such as this one have truly
learned to swim in the programming ocean!
int main ()
{
int m = 0;
int n = 1;
void * x = & m ;
void * y = & n ;
return 0;
}
$ gcc -g -O0 -std=gnu99 -Wall -Wfloat-equal -Wtype-limits \
-Wpointer-arith -Wlogical-op -Wshadow \
-fno-diagnostics-show-option void_star_test.c \
-o void_star_test
void_star_test.c: In function `main':
void_star_test.c:12:29: warning: pointer of type `void *'
used in subtraction
printf("x - y: %lun",x - y);
^
$ cd
$ wget http://stanford.edu/~cgregg/107-Reader/107-Reader-code.zip
$ unzip 107-Reader-code.zip
$ cd code
$ make
foreign to you, you are not alone – many students who take CS 107
have never used the terminal, nor have they directly used any of the
tools used to turn their programs into runnable applications.
In order to get up to speed using the Linux terminal, you should
see the course website, which has a number of tutorial videos that
describe how to set up your computer to log into the Myth com-
puters at Stanford. If you are going through this reader and not at
Stanford, unfortunately you won’t have access to the Myth comput-
ers, but much of the online material can be used on a generic Linux
command line. If you don’t have a computer that natively runs
Linux as its operating system, I suggest installing a virtual machine
for Linux.8 Once you install a virtual machine, you should install 8
An online search for “how to install a
the latest version of gcc and gdb, which will enable you to follow Linux VM” will lead to many results.
Unix is an operating system that dates from the early 1970s and
was originally created primarily by Ken Thompson9 and Dennis 9
Thompson’s Reflections on Trusting
Ritchie10 at Bell Labs. It was the first operating system that was Trust is a terrifying look at why
you can never completely trust your
written to be portable, and it was completely written in the C pro- compiler
gramming language11 , also created by Dennis Ritchie. In the early 10
See this Wired Magazine tribute
1990s, Linus Torvalds began an open-source version of Unix that he about Dennis Ritchie to understand
why he deserves more name recogni-
called Linux, and Linux has become a standard, free, open source tion
operating system used on millions of computers around the world. 11
Except for the machine specific parts,
which were written in Assembly Lan-
guage for each particular computer.
But, the bulk of the operating system
was written in C
cs 107 reader 14
ple, the cat program has one job, and that is to read a file and print
the file to the terminal. It is a simple program, and it does the job
well. We will spend a lot of time in CS 107 writing programs that
do one thing well – sometimes you may think we abuse the notion
of simple (because the programs we write can be complex to write),
but all of the programs you write for class will do a single thing.13 13
Possibly in different ways. For
example, you will write a sorting
program that will sort based on
Compiling programs using gcc options, such as reverse-ordering, or
based on the length of a line.
$ make sat
$ make clean
$ make
cs 107 reader 16
Using git
You will work on all of your projects for CS 107 in your local direc-
tory on the Stanford computer network. However, your code will
be copied from a different directory, also located on the network.
Specifically, for each assignment, you will run a command similar
to the following (which is the command for assignment 0):
The first command commits the utf8.c file to your local reposi-
tory, with a message (-m) that says what the change accomplished18 18
Good commit messages are impor-
The second command pushes the change to the master reposi- tant for finding old versions!
tory, where it originally came from. All previous commits are stored,
as well.
Let’s say that you made another change and committed it, but
it turns out that you deleted an important function, accidentally. If
you need to go back to a previous commit temporarily, you can do
so as follows:
$ git log
commit d0fcf945876592e2bb74e26f31fcad7ffb850451
Author: Chris Gregg <tofergregg@gmail.com>
Date: Fri Dec 15 14:38:27 2017 -0800
commit fb8f84f57672b0b9ded5cd4dd737141650cf7aea
Author: Chris Gregg <tofergregg@gmail.com>
Date: Fri Dec 15 14:33:34 2017 -0800
...
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
cgregg@myth32:~/cs107/assign1$ tools/sanitycheck
Base 10
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ..., 97, 98, 99, 100, 101, ..., 998, 999, 1000, ...
11
178
+ 456
634
When subtracting two base 10 numbers, we borrow when we are
trying to subtract a digit from a smaller digit, treating the subtrac-
tion as the 10 + dsmaller − dlarger , and reducing the digit that was
carried from by one:
2 14
63
4
− 416
218
Two definitions you should become familiar with are least signifi-
cant digit and most significant digit:
The least significant digit is the digit that matters the least in a
number. In the number 456, the 6 represents just 6, and the 4 repre-
sents 400, so the 6 is least significant and the 4 is most significant.
Base 2 (binary)
Base 2, or binary notation, has only the digits 0 and 1. Just like
counting in base 10, when the digits are exhausted, you must start
with a new 1:
0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, 1101, ...
As you can see, binary numbers can get long quickly, as there are
only two digits to write with!
Base 2 notation is comprised of powers of 2 (instead of powers of
10), so a base 2 number with four digits, b3 b2 b1 b0 is represented as
follows:
b3 × 23 + b2 × 22 + b1 × 21 + b0 × 20
For example, 1101b is: To differentiate between binary and
decimal representations, we use b and
1 × 23 + 1 × 22 + 0 × 21 + 1 × 20 = 8 + 4 + 0 + 1 = 13d d following each number
11
1011
+ 1011
10110
When subtracting two binary numbers, the borrow happens any
time a 1 is subtracted from a 0:
10
1
0110
− 1010
1100
Base 16 (hexadecimal)
Base 16 may at first seem like an odd choice for a base for comput-
ing. It has more digits than decimal, and computers are fundamen-
tally binary. However, one of the downsides of binary is that binary
cs 107 reader 22
Because both the base for both binary and hexadecimal formats
are divisible by 2, the conversion between them is straightforward.
Every four binary digits (or bits) is equivalent to one hexadecimal
digit. Tables 1 and 2 show the conversion from hex digit to binary
(and decimal). You should memorize the binary representations for
each hex digit. One trick is to memorize A (1010), C (1100), and F
(1111), and the others are easy to figure out.
Hex digit 0 1 2 3 4 5 6 7
Decimal value 0 1 2 3 4 5 6 7
Binary value 0000 0001 0010 0011 0100 0101 0110 0111
Table 1: Hexadecimal to binary conversion for 0 - 7
cs 107 reader 23
Hex digit 8 9 A B C D E F
Decimal value 8 9 10 11 12 13 14 15
Binary value 1000 1001 1010 1011 1100 1101 1110 1111
Table 2: Hexadecimal to binary conversion for 8 - 15
0x1 = 0001
0x3 = 0011
0x5 = 0101
0xa = 1010
0xf = 1111
0x2 = 0010
For example:
NAME
ascii - ASCII character set encoded in octal, decimal, and hexadecimal
DESCRIPTION
ASCII is the American Standard Code for Information Interchange.
It is a 7-bit code. Many 8-bit codes (such as ISO 8859-1, the
Linux default character set) contain ASCII as their lower half.
The international counterpart of ASCII is known as ISO 646.
C Primer
As you begin CS 107, we expect that you have had a good intro- Figure 3: The C Programming Language
by Brian Kernighan and Dennis
duction to programming course, and a good follow-on course that
Ritchie (also known as “K&R”) is the
covers typical data structures content for a second programming definitive text on C and should be in
course. At Stanford, CS 106A and CS 106B are excellent prepa- every C programmer’s library.
ration for CS 107 and they are taught in Java and C++, which are
both based on C syntax, and both have similar function structure,
loop constructs, variable naming and scoping. The transition to C
after coding in Java and/or C++ is relatively straightforward, and
you should quickly feel at home. If your background is in a lan-
guage like Python, Ruby, R, PHP, or a functional language like Lisp or
Haskell, you may need more time to assimilate to C.
Whatever your background, C does come with its own intricacies,
some of which are historical in nature, and some of which might
be, to some, a bit dated. You will write a great deal of of pointer-
based code that provides very little safety, and this is just how C
rolls. C does not have any notion of classes, and memory manage-
ment is neither garbage collected like in Java, nor object-based
like in C++. But, C is actually a language that is simple enough that
you can become an expert in the language – once you learn a few
crucial details, C becomes a language that allows you to do ex-
actly what you want, and fast, though sometimes at the expense of
“ease.”
As discussed in Unix, the Command Line, gcc, and Makefiles, C was
invented in the 1970s by Dennis Ritchie as a systems language that
could be used to write low-level, portable code that could be run on
any computer with a C compiler. Ritche and Ken Thompson wrote
Unix in C, and the popularity of Unix cemented C into a key role in
systems programming ever since.
This chapter is designed to provide an introduction to program-
ming in C -vs- other C-syntax based languages – we primarily want
to show you some of the nuances in C that you will need to get
used to as you start CS 107. C++ actually shares many of the fea-
tures described in the chapter (and Java shares some, as well), but
in CS 106B we generally focus on other C++ features than the ones
discussed here. The chapter isn’t in any particular order, and can be
used as a reference to some of the syntax as you begin CS 107.
cs 107 reader 27
C Basics
# include < stdio .h > These are the two typical library
include statements, for input/output
# include < stdlib .h > and the C standard library.
void scalearray ( int * input , const int * scale , Function prototypes declare functions
size_t nelems ) ; before they are defined. A function
must be declared before it can be used
void converttohours ( int * input , size_t nelems ) ; in C, and we normally put function
int workyearstohours ( int days ); declarations at the top of a file so we
can use them from any function below.
int main ( int argc , char * argv []) All C programs begin at the main
{ function, which must return an int.
See below for more information about
// jobdays initially holds the argc and argv command line
// the number of days worked per job arguments.
int jobdays [] = {1 , 5 , 7 , 2 , 4 , 8 , 3}; C arrays can be declared with the
“[]” notation and an initial comma
separated list of values. The compiler
// workersperday holds the number of workers will determine the size of the array
accordingly.
// who worked each day on a job
int workersperday [] = {2 , 5 , 7 , 1 , 9 , 4 , 5};
The first parameter is the format string, which contains the string
that will be printed, along with optional embedded format specifiers,
which start with the “%” character. For every format specifier, there
is an additional parameter that provides the value to be format-
ted.23 Most often, the return value for printf is ignored, but the 23
for a comprehensive list, see this
return value is the number of characters that were printed. For reference.
example,
int x = 5;
char c = 'A ';
char * str = " some words " ;
printf ( " x : %d , a character : %c , a string : % s \ n " ,x ,c , str ) ;
Special characters in the format string are prepended with a
backslash, “\”. The most common special character is the newline
cs 107 reader 29
5000000000
$ man atoi
which brings up the manual entry for the function from the
stdlib library. All the standard library functions are in section 3
cs 107 reader 30
int main ()
{
int arr [] = {1 ,3 ,4 ,2 ,7 ,9};
return 0;
}
4
9
42
sizeof(arr) in main: 24
sizeof(arr) in function: 8
int main ()
{
char cstr1 [] = " A string " ;
char * cstr2 = " Another string " ;
If you took CS 106B, you have worked with pointers before. If you
only have Java experience, you may not have seen pointers before.
A pointer is, simply, a variable that holds a memory address. On
the Myth machines, all pointers are 8 bytes (64 bits) long. Although
under the hood all pointers are identical, C enforces a type on point-
ers so the compiler can calculate the element offsets for arrays.31 31
This has an added benefit of provid-
ing some type safety when compiling,
and the compiler can warn you when
you use a pointer of the incorrect type.
cs 107 reader 34
int main ()
{
int values [] = {1 ,3 ,5 ,2 ,4 ,6};
print_array ( values , sizeof ( values ) /
sizeof ( values [0]) ) ;
return 0;
}
*x = *y;
* y = tmp ;
}
int main ()
{
int a = 5;
int b = 12;
printf ( " a : %d , b : % d \ n " ,a , b) ;
swap (& a ,& b ) ;
printf ( " a : %d , b : % d \ n " ,a , b) ;
return 0;
}
int main ()
{
int arr [] = {5 , 12};
a: 5, b: 12
a: 12, b: 5
It is helpful to draw pictures for the situation covered in swap2.
This is a critical understanding point in CS 107 – if you are at all in
doubt about pointers or pointers-to-pointers, draw a picture! Figure
4 shows a possible memory layout for swap2; the addresses are
made up for the purpose of this example. The arrows graphically
show the pointer references, but often it is instructive to see the
numbers, just to make it more concrete. Recall from above that
pointers are 8 bytes, and ints are 4 bytes. Also note that once in
swap2, the function no longer has access to the variables from main.
Memory Memory
Address Value Address Value Address Value Address Value Address Value
0x2024 0x1004 12 0x300c 0x2020 0x2024 0x1004 12
0x1004 (y)
0x2020 (bptr) 0x1000 5 (arr) 0x3008 0x2020 0x1004 0x1000 5
0x201c 0x3004 0x2010 0x201c
(x)
0x2018 0x3000 0x2018
0x2014 0x2014
0x1000
0x2010 (aptr) 0x2010 0x1000
Figure 4a: Before the swap, in main Figure 4b: Before the swap, in swap2
Memory Memory
Address Value Address Value Address Value Address Value Address Value
0x300c 0x2020 0x2024 0x1004 12 0x2024 0x1004 12
(y) 0x1000
0x3008 0x2020 0x1000 0x1000 5 0x2020 (pbtr) 0x1000 5 (arr)
0x3004 0x2010 0x201c 0x201c
(x)
0x3000 0x2018 0x2018
0x2014 0x2014
0x1004
0x2010 0x1004 0x2010 (aptr)
Figure 4c: After the swap, in swap2 Figure 4d: After the swap, in main
Memory Memory
Address Value Address Value Address Value Address Value Address Value
0x2024 0x1004 12 0x300c 0x2020 0x2024 0x1004 12
0x1004 (y)
0x2020 (bptr) 0x1000 5 (arr) 0x3008 0x2020 0x1004 0x1000 5
0x201c 0x3004 0x2010 0x201c
(x)
0x2018 0x3000 0x2018
0x2014 0x2014
0x1000
0x2010 (aptr) 0x2010 0x1000
Figure 5a: Before the swap, in main Figure 5b: Before the swap, in swap2
Memory Memory
Address Value Address Value Address Value Address Value Address Value
0x300c 0x2020 0x2024 0x1004 5 0x2024 0x1004 5
(y) 0x1004
0x3008 0x2020 0x1004 0x1000 12 0x2020 (aptr) 0x1000 12 (arr)
0x3004 0x2010 0x201c 0x201c
(x)
0x3000 0x2018 0x2018
0x2014 0x2014
0x1000
0x2010 0x1000 0x2010 (bptr)
Figure 5c: After the swap, in swap2 Figure 5d: After the swap, in main
any type information associated with the data it points to. This
necessarily means that when faced with a void * pointer, the com-
piler must be told to intrepret the pointer as a particular type.34 34
But not necessarily the exact under-
This means that you must cast void * pointers to some type. Often, lying type!
we will not have enough information about the type to tell exactly
what it is, so we have to rely on simply walking through the bytes
one-by-one. In order to walk an array one byte at a time, the com-
piler must be told to interpret the pointer as pointing to a 1-byte
data type. The only 1-byte data type we have in C is the char type,
so we you will frequently see casts to char * pointers, even though
the underlying data has nothing at all to do with “characters,” and
are simply bytes of another data type. For example, the following
(annotated) program prints out the raw bytes (in hex) of any array,
which is passed in as a void * array. Notice that we must have the
length of the array in bytes:
// file : rawbytes . c
# include < stdio .h >
# include < stdlib .h >
...
tag var ; // declare var to be of type struct tag
To access a struct variable’s fields, you use dot notation (e.g.,
var.a), and to access a pointer to a struct variable’s fields, you
use arrow notation (e.g., var->a). Here is a complete example to
demonstrate how to use structs:
// file : fraction . c
# include < stdio .h >
# include < stdlib .h >
struct fraction {
int numerator ;
int denominator ;
};
printfrac (& f1 ) ;
printfrac (& f2 ) ;
return 0;
}
// reduce
int fgcd = gcd ( f . numerator ,f . denominator ) ;
f . numerator /= fgcd ;
f . denominator /= fgcd ;
return f ;
}
sired. Heap allocated memory does not suffer from the same size
restrictions as memory from the stack, so it also allows you to re-
quest large blocks of memory.
It is good practice to free memory that is no longer needed, to
avoid memory leaks. The operating system will reclaim any unfreed
memory when a program ends, but you should ensure that your
program frees all of its memory before the program ends38 38
And you will lose points in CS
The details for all of the dynamic memory allocation functions 107 for not freeing memory. We will
discuss the use of valgrind to find
are located on malloc’s man page39 , but here are the function signa- memory errors and leaks as the course
tures: progresses.
39
man malloc
cs 107 reader 42
values = new_values ;
num_ints *= 2;
// clean up
free ( values ) ;
return 0;
}
Using cdecl
In the first case, the chars themselves may not be changed, in the
second case, the pointer to char may not be changed, and in the fi-
nal case, the pointer to pointer to the char may not be changed. The
cdecl program can come in handy when you start to get confused
by type declarations!
Boolean Values
Final Thoughts
using the man pages to look up library functions, you will be ready
to excel in CS 107.
cs 107 reader 46
gdb
Probably the most important tool you will use during CS 107
is the command line debugger, gdb. Gdb allows you to step through
your code line-by-line, investigate variables while your program is
running, set breakpoints to stop your code at particular lines, look
at the assembly code that gcc generates, and many more functions
that will help you write and debug your code. It is imperative that
you become proficient at gdb early during CS 107!
For the next part of this chapter, we will use the following buggy
example program to demonstrate the use of gdb:
// file : digits_buggy . c
return 0;
}
We can compile our program as follows:
$ ./digits_buggy 8675309
eight six seven five three zero nine
$ ./digits_buggy 8675309
Segmentation fault (core dumped)
$ gdb digits_buggy
The target architecture is assumed to be i386:x86-64
Reading symbols from digits_buggy...done.
(gdb) run 8675309
The run command starts the program.
Starting program:
It can take command line arguments
/afs/.ir.stanford.edu/users/c/g/cgregg/cs107 just as if you were running the pro-
/reader/digits_buggy 8675309 gram on the command line.
46 }
47 printf("\n");
48
49 return 0;
We are investigating the number and i
(gdb) p number variables.
$3 = 8675309
(gdb) p i
$4 = 4
The call command will run a function.
(gdb) call digit_place(number,i) Here, we are confirming that the
$10 = 86 digit_place function is giving us an
incorrect result (it should be 5 instead
(gdb) break 44 if i==4 of 86).
Breakpoint 1 at 0x4006f2: file digits_buggy.c, line 44. The break command tells gdb to
(gdb) run stop running at a particular line; in
this case, line 44. We have also told
The program being debugged has been started already. gdb to stop when i is equal to 4, so
Start it from the beginning? (y or n) y we can continue at that point (the
condition is optional). When we run
the program again, it uses the last
Starting program: /afs/.ir.stanford.edu/users/c/g/cgregg/cs107 command line arguments, and it starts
/reader/gdb/digits_buggy 8675309 the program over. Then, it breaks on
the line or function where we have set
a breakpoint.
Breakpoint 1, main (argc=2, argv=0x7fffffffeaa8)
at digits_buggy.c:44
44 int digit = digit_place(number,i);
(gdb) p i
$3 = 4 The program stopped at line 44 when
(gdb) step i was equal to 4, before that line runs,
and we use the step command to run
digit_place (d=8675309, tensplace=4) at digits_buggy.c:30
the next line in the program, which
30 int p10 = powerof10(tensplace); steps into the digit_place function. We
(gdb) next use the next command to run the next
line but to run the entire function (in
32 return d / (p10 * 10) % p10; this case, the powerof10 function).
(gdb) p d
$4 = 8675309 Now we can start to analyze the bug.
(gdb) p p10 We print out the d and p10 variables,
and see that they are what we expect.
$5 = 10000
(gdb) p d / (p10 * 10) % p10 When we print out the details of the
line that is about to be returned, we
$6 = 86
realize that this is incorrect. We then
(gdb) p d % (p10 * 10) / p10 (hopefully!) recognize that we have the
$7 = 7 modulus (%) and division swapped,
so we print out the expression with
(gdb) quit the operators in the correct position,
A debugging session is active. and see that we get the correct answer,
which is the 7 extracted from 8675309.
We have found a bug, and we can quit
Inferior 1 [process 14432] will be killed. gdb to go fix it.
Quit anyway? (y or n) y
cs 107 reader 50
$ ./digits_buggy 8675309
six seven five three zero nine
This is better! But, we are missing the eight that we are expect-
ing. Back to gdb!
$ gdb digits_buggy
The target architecture is assumed to be i386:x86-64
Reading symbols from digits_buggy...done.
(gdb) run 8675309
If we run the program from gdb, we
Starting program: /afs/.ir.stanford.edu/users/c/g/cgregg/cs107 find see that it is missing the first digit
/reader/gdb/digits_buggy 8675309 (eight).
six seven five three zero nine
[Inferior 1 (process 22983) exited normally]
(gdb) l
29 { We list the program (l) to determine
30 int p10 = powerof10(tensplace); where to break.
31
32 //return d / (p10 * 10) % p10; // fix: swap mod and div
33 return d % (p10 * 10) / p10;
34 }
35
36 int main(int argc, char **argv)
37 {
38 char *digit_str[] = {"zero","one","two","three",
(gdb)
39 "four","five","six","seven",
40 "eight","nine"};
41
42 int number = atoi(argv[1]); // from command line
43 int ndigits = numdigits(number);
44 for (int i=ndigits-1; i >= 0; i--) {
45 int digit = digit_place(number,i);
46 printf("%s ",digit_str[digit]);
47 }
48 printf("\n");
(gdb) break 45 We decide to break on line 45 to
investigate some more.
Breakpoint 1 at 0x4006f2: file digits_buggy.c, line 45.
(gdb) run
Starting program: /afs/.ir.stanford.edu/users/c/g/cgregg/cs107
/reader/gdb/digits_buggy 8675309
cs 107 reader 51
The commands in the above gdb traces are probably the most useful
commands, and you will use almost all of the them frequently. The
list below has other commands that you will want to know when
debugging C code, as well.
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char), s(string)
and z(hex, zero padded on the left).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".
The x command is also useful if you want to print out the mem-
ory contents of a variable. For example, let’s say we had the fol-
lowing function, which simply increments the value of the int that
xptr points to:
void add1 ( int * xptr )
{
(* xptr ) ++;
}
In gdb, you could investigate xptr in a number of different ways:
00
01
10
11
Three bits gives us eight representations: 000, 001, 010, 011, 100, 101, 110, 111.
It follows that adding an extra bit produces two times more states,
and therefore for n bits, we have 2n states available to us.
As you can see, we are actually just counting, assigning each
state a different number. In this way, we can encode anything we
want with bits. We can encode text characters in the ASCII format,
for instance. The character value for “A” is 65d, or 01000001b42 We 42
(See Tables 1 and 2 for the decimal
encode images, music, and video numerically, and therefore, using and hexadecimal ASCII character
values.
bits.
In this chapter, we will discuss how computers represent and
perform calculations on integers43 , and the imporant role bits play 43
We will tackle non-integers, or
in those calculations. We will also delve into the way C handles in- floating point numbers in a later chapter
Signed Integers: these are the negative and positive integers, and
zero.
// file : multtest . c
int main () {
int a = 200;
int b = 300;
int c = 400;
int d = 500;
int answer = a * b * c * d ;
printf ( " % d \ n " , answer ) ;
return 0;
}
$ ./multtest
-884901888
Why did we get a negative result? The answer is that we just did
not have enough bits to represent the number we were trying to
calculate:
cs 107 reader 56
Information Storage
7 2 8 3 14 99 -6 3 45 11
Figure 6: Memory address for an array
values (ints):
of 4-byte ints
address (decimal): 200d 204d 208d 212d 216d 220d 224d 228d 232d 236d
address (hex): 0xc8 0xcc 0xd0 0xd4 0xd8 0xdc 0xe0 0xe4 0xe8 0xec
You cannot address a particular bit, and you must address at the
byte level.
How much memory can our computers address? The limit hap-
pens to be the underlying word size of the computer, and the Myth
computers have a word size of 64 bits.47 In other words, an address 47
They are 64-bit machines.
is allocated a 64-bit value, and because each byte is addressable, the
Myth machines could reference up to 264 bytes of memory, which
is a lot of memory. The Linux operating system imposes a memory
limit of 248 bytes, which is still a tremendous amount of memory.48 48
A high-end computer these days
Because a byte is made up of 8 bits, the range of a byte is 00000000 might have 64 Gigabytes (GB) of
memory, or 68, 719, 476, 736 bytes of
to 11111111. As discussed in Numerical Formats Used in CS 107, it is memory (“Giga” in this case referring
more convenient to represnt bytes in hexadecimal, and the hexadec- to 230 bytes). Linux is capable of
addressing over four thousand times
imal range for a byte is 0x00 to 0xff. more memory than 64GB. It will be
As discussed above, the Myth computers represent ints with 32 a long time before Linux needs to
bits. Table 4 shows the other representations used by gcc on the modify the 248 byte limit.
Myth computers.
cs 107 reader 57
C Declaration Bytes
Signed Unsigned
char unsigned char 1
short unsigned short 2
int unsigned int 4
long unsigned long 8
int32_t uint32_t 4
int64_t uint64_t 8
char * 8
float 4
double 8
Table 4: C Data Sizes on the Myth Computers
0x01234567
0x 01 23 45 67
There are two natural orderings for storing those bytes into
memory:
Big Endian: the order in memory is from the “big end” of the num-
ber (i.e., the most significant byte) at the lowest memory location,
to the “little end” of the number (i.e., the least significant byte)
at the highest memory location. Writing out our number from
above, big endian format would look like Figure 7:
Little Endian: the order in memory is from the little end of the
number at the lowest memory location, to the big end of the
number at the highest memory location. Writing out our number
from above, little endian format would look like Figure 8, and
this is the form in which integers on the Myth computers are
stored in:
The bitwise operators apply to all the bits on a number at once. Figure 12: The NOT truth table.
For example: INPUT OUTPUT
A ~A
0 1
1 0
cs 107 reader 59
10011001
&10100111
10000001
The AND operator is applied to the corresponding bits in both
numbers to produce the resulting number.
Bit Masking
x == 0b10001001101010111100110111101111
0xFF == 0b00000000000000000000000011111111
When the AND operator is appied, the only bits that remain are
the lowest eight bits, and we have “masked” x to get the lower eight
bits alone.
Example: write an expression that sets the most significant byte
of a 4-byte integer, n to all ones, and all other bytes of the number
left unchanged E.g.
0x87654321 → 0xFF654321
x | 0xFF000000
0x87654321 → 0x879abcde
x ^ ~0xFF000000
Shift Operations
There are actually two flavors of right shift, which work differ-
ently depending on the value and type of the number you are shift-
ing.
A logical right shift moves the values to the right, replacing the
upper bits with zeros.
An arithmetic right shift moves the values to the right, replacing
the upper bits with a copy of the most significant bit.51 51
This may seem weird! But, we will
Example logical shift for an 8-bit binary number: see why this is useful soon!
There are two important things you need to consider when using
the shift operators:
Integer Representations in C
0 001 = 1 1 001 = -1
0 010 = 2 1 010 = -2
0 011 = 3 1 011 = -3
0 100 = 4 1 100 = -4
0 101 = 5 1 101 = -5
0 110 = 6 1 110 = -6
0 111 = 7 1 111 = -7
0 000 = ? 1 000 = ?
It seems that we are left with two representations for zero: posi-
tive zero and negative zero. For integers, this does not make sense,
and we would rather have a single zero representation.53 Addi- 53
In the Floating Point chapter, we
tionally, and perhaps more imporantly, performing calculations on will see that we do have two zero
representations for non-integers.
positive and negative numbers in the representation above is not
particularly easy, because there is an inherent need to subtract with
numbers that are negative.
Instead of the method outlined above, signed integers in a
computer are represented in a format called Two’s compliment.
The two’s complement of an n-bit number is defined to be the
compliment of the number with respect to 2n . For example, the
four bit number 0110 has a two’s complement of 1010 because
cs 107 reader 62
holds.
The standard method for calcluating the two’s complement of a
number is to invert all the bits55 , and then add 1. For the example 55
called the one’s complement, or simply
above: the bitwise NOT operation.
~0110 = 1001
1001 + 1 = 1010
~1010 = 0101
0101 + 1 = 0110
If a number overflows to the left past the bit length, we just ignore
the bits that overflowed. For example, to find the two’s complement
of the four-bit number, 0000:
~0000 = 1111
1111 + 1 = 1
0000
~1000 = 0111
Figure 13: The two’s complement circle
0111 + 1 = 1000 for a signed 4-bit number
0
-1 1
We will soon see why this is the case!
-2 1111 0000 0001 2
1110 0010
The Two’s Complement Circle -3
1101 0011
3
Figure 13 shows the “two’s complement circle” for for a 4-bit -4 1100 0100 4
signed integer. The figure shows the correspondence between the 1011 0101
decimal and binary numbers in the 4-bit range. There are a few key -5 5
1010 0110
ideas to take away from the diagram: -6
1001 0111
1000 6
1001 0111
5. Overflow happens at the bottom of the circle. If we try to add 10 1000 6
two positive numbers together that are bigger than 2n−1 , the 9
8
7
Figure 14 shows the two’s complement circle for for a 4-bit un-
signed integer. Note the differences between this signed and un-
signed circle:
1. All numbers are the binary equivalent of their value.
So far, you may be asking yourself what the big deal is with two’s
complement numbers. It turns out that the use of two’s comple-
ment to store signed integers leads to a number of interesting and
helpful properties:
1. There is only one representation for 0. As we saw earlier, a sim-
ple sign-bit representation leads to having two zeros, and two’s
complement avoids this.
1101 −3
+0111 7
10100
4
and indeed, we do!
cs 107 reader 64
0011 3
+1011 −5
1110 −2
From a hardware perspective, the ben-
efit of two’s complement artithmetic
5. Multiplication is also a matter of ignoring the fact that one or is that there is no need for separate
subtraction hardware, and performing
both numbers is negative, and simply throwing away overflow a two’s compliment conversion is a
bits.57 For example: −2 × 3: simple hardare circuit.
57
One caveat: multiplication is much
1110 −2 more likely to produce actual overflow
mathematically. But as long as the
×0011 3 product fits into the number of bits
1110 in the format, the multiplaction is
1110 straighforward.
0000
0000
0101010
−6
1111 −1
×1110 −2
0000
1111
1111
1111
10010010
2
In C, we can cast between different number types either explic- Figure 16: An implicit cast between
signed and unsigned ints.
itly, or implicitly. Figure 15 demonstrates an explicit cast, using
parenthesized type notation for the cast. Figure 16 demonstrates an int tx , ty ;
unsigned ux , uy ;
implicit cast, where the compiler performs the cast for us. ...
When casting, the underlying bits do not change. In other tx = ux ; // cast to signed
uy = ty ; // cast to unsigned
words, casting between an unsigned integer and a signed integer
cs 107 reader 65
int main ()
{
char tx ;
unsigned char ux ;
tx = -5;
ux = tx ;
%u : unsigned int
int main ()
{
int d = -1;
unsigned int u = 1 < <31;
return 0;
}
-1 < 0U Unsigned 0
2147483647 > -2147483647 - 1 Signed 1
2147483647U > -2147483647 - 1 Unsigned 0
2147483647 > (int)2147483648U Signed 1
TRUE Signed 1
(unsigned)-1 > -2 Unsigned 1
cs 107 reader 67
bytes that the pointer itself occupies, which is 8-bytes on the Myth
machines.
unsigned short s = 4;
// short is a 16-bit format, so s = 0000 0000 0000 0100b
unsigned int i = s;
// conversion to 32-bit int, so i = 0000 0000 0000 0000 0000 0000 0000 0100b
// file : sizeof . c
# include < stdio .h >
# include < stdlib .h >
int main () {
int iarray [] = {5 , 4 , 3 , 2 , 1};
char * iarr = ( char *) iarray ;
return 0;
}
Unsigned Integer Two’s Complement Representation Table 7: Four and eight bit signed
two’s complement representation.
Decimal value 8-bit value 16-bit value
5 00000101 0000000000000101
82 01010010 0000000001010010
127 01111111 0000000011111111
-5 11111011 1111111111111011
-82 10101110 1111111110101110
-127 10000001 1111111110000001
E.g., for positive numbers:
short s = 4;
// short is a 16-bit format, so s = 0000 0000 0000 0100b
int i = s;
// conversion to 32-bit int, so i = 0000 0000 0000 0000 0000 0000 0000 0100b
short s = -4;
// short is a 16-bit format, so s = 1111 1111 1111 1100b
int i = s;
// conversion to 32-bit int, so i = 1111 1111 1111 1111 1111 1111 1111 1100b
Truncating Numbers
int main ()
{
int x = 53191; // 53191
short sx = ( short ) x ; // -12345
int y = sx ;
int main ()
{
int x = -3;
short sx = ( short ) x ;
int y = sx ;
The value of sx is 62, 464, which has lost the information about
the most significant bit in the 32-bit version.
When integer operations overflow in C, the runtime does not pro-
duce an error. According to the C specification, unsigned integers
will overflow in a defined behavior (as we saw on the number cir-
cle), although you should be wary of using calculations that could Figure 22: The following is one way to
determine if addition of two unsigned
potentially overflow, and you should be careful with your code if
ints will overflow
that is a possibility. Figure 22 demonstrates one way to determine if
# include < limits .h >
addition on two unsigned ints will overflow. unsigned int a = < something >;
According to the C specification, signed integers produce undefined unsigned int x = < something >;
if ( a > UINT_MAX - x )
behavior when they overflow. On most machines, the overflow will
/* 'a + x ' would
behave as we saw in the number circle, however, the compiler is overflow */ ;
free to ignore this and can assume that numbers will not overflow.
So, you should write your programs to be careful not to overflow
Figure 23: The following is one way
with signed numbers, or to check if that will be the case. Figure 23 to determine if addition of two signed
demonstrates one way to check for overflow addition. ints will overflow
# include < limits .h >
int a = < something >;
Final Thoughts int x = < something >;
if (( x > 0) && ( a > INT_MAX
Having an excellent grasp of how integers are stored and manip- - x))
/* 'a + x ' would
ulated in a computer is fundamental to understanding how C han-
overflow */ ;
dles integers in programs. The two’s complement format does take if (( x < 0) && ( a < INT_MIN
- x))
/* 'a + x ' would
underflow */ ;
cs 107 reader 72
// convert to uppercase
while (* string ) {
if ( isalpha (* string ) ) {
printf ( " % c " , toupper (* string ) ) ;
} else {
printf ( " % c " ,* string ) ;
}
string ++;
}
printf ( " \ n " ) ;
return 0;
}
cs 107 reader 75
C Strings in Memory
return 0;
}
return 0;
}
strcpy : Copies src to dst, including the null byte. The caller is
responsible for ensuring that there is enough space in dst to hold
the entire copy. The strings may not overlap.68 Prototype: 68
Be careful! strcpy is responsible
for many buffer overruns, which are a
char *strcpy(char *dst, const char *src); primary attack vector for malicious
programmers.
strncpy : Similar to strcpy, except that at most n bytes will be
copied. If there is no null byte in the first n bytes of src, then dst
will not be null-terminated!69 Prototype: 69
strcpy and strncpy are not re-
placements for memcpy, although they
char *strncpy(char *dst, const char *src, size_t n); perform similar functions.
may not overlap, and the caller is responsible for ensureing that
there is enough space in dst to hold the resulting string.71 Proto- 71
This is another relatively unsafe
type: function!
while (* s ) {
// count how many digits are at the front of s
size_t prefixDigits = strspn (s , digits ) ;
// clean up
free ( numstr ) ;
// update s
cs 107 reader 79
return 0;
}
Final Thoughts
Pointers
type * varname ;
where type is any type (e.g., int, char, etc.). When declared as
above, the initial value of the pointer is undefined. Pointers that
have the value 0 are called NULL pointers.
The value of a pointer can be set to the address of another vari-
able by using the “&” character:
int x = 7;
int * xptr = & x ; // xptr points to x
To reitierate, the value of xptr in this case is the address of x in
memory, whatever that may be.
If a pointer has a non-NULL value that points to a memory loca-
tion a program can access, the value at that location can be derefer-
enced using the * operator:73 73
Yes, this is an overloaded meaning for
the asterisk character. It can be used
printf ( " % d \ n " ,* xptr ) ; // prints 7 to define a pointer, or to dereference a
pointer (and it can also be used as the
When compiling the above code, gcc knows that xptr points to multiplication operator).
an int, and therfore it knows that when it dereferences the value,
it needs to get four bytes from the address pointed to by xptr. This
is a key benefit of having typed pointers: if gcc only knew that
cs 107 reader 81
xptr held an address, but it didn’t know what that address pointed
to, it could not successfully determine how to encode the printf
function parameters.74 74
Well, it probably could use the
The type information that is embedded into the pointer is also printf string as a guide, knowing
that %d expects an int. But, take the
necessary for performing pointer arithmetic on arrays. Take the following example:
following example: long z = *xptr;
gcc could assume that xptr pointed to
void printarr ( long * array , size_t nelems ) a long, but that would be incorrect.
{
Figure 25: An array of longs in mem-
for ( size_t i =0; i < nelems ; i ++) {
ory
printf ( " % lu " , array [ i ]) ;
i == nelems - 1 ? printf ( " \ n " ) : printf ( " , " ) ;
Address Value
}
}
Figure 25 shows the layout of memory – note that each long 8
0x128
takes up 8 bytes of memory. Also note that in hexadecimal, 0x8 +
0x8 is 0x10. 3
Another way to express the printf statement from above would 0x120
be:
9
printf ( " % lu " , *( array + i ) ) ; 0x118
}
In order to dereference the proper element in the array, gcc needs -4
0x110
to know that each element in the array has a width of 8 bytes, and
assuming the value of array is 0x100 (as in Figure 25), the calcula- 2
tion of the address of the ith element of the array is: 0x108
array
array + i * sizeof(long) 0x100 5
0x100
which would be 0x100 + i * 8, leading to the addresses seen in
Figure 25.
// file : array_vs_pointer . c
# include < stdio .h >
# include < stdlib .h >
that only work on a specific type of data. Take the following pro-
gram, which swaps the beginning and end elements of an array, as
an example:
// file : nongeneric . c
# include < stdio .h >
# include < stdlib .h >
return 0;
}
// file : generic . c
# include < stdio .h >
# include < stdlib .h >
# include < string .h >
return 0;
}
Pointers to Functions
return 0;
}
cs 107 reader 87
void print_array ( void * arr , size_t nelems , int width , void (* pr_func ) ( void *) )
{
for ( int i =0; i < nelems ; i ++) {
void * element = ( char *) arr + i * width ;
pr_func ( element ) ;
i == nelems - 1 ? printf ( " \ n " ) : printf ( " , " ) ;
}
}
return 0;
}
struct coordinate {
double lat ;
cs 107 reader 89
double lon ;
};
void print_array ( void * arr , size_t nelems , int width , void (* pr_func ) ( void *) )
{
for ( int i =0; i < nelems ; i ++) {
void * element = ( char *) arr + i * width ;
pr_func ( element ) ;
}
}
printf ( " % f Latitude , % f Longitude =\ n " , coord - > lat , coord - > lon ) ;
printf ( " Latitude : % d deg % d min % f sec \ n " , lat_deg , lat_min , lat_sec ) ;
printf ( " Longitude : % d deg %d min % f sec \ n \ n " , lon_deg , lon_min , lon_sec ) ;
}
return 0;
}
You must be careful with the arguments and return values for
functions to pointers, and we will show an additional example that
demonstrates this.
A frequent use of functions to pointers is the comparison function.
The standard C comparison function compares to elements, and
returns a negative value if the first element is less than the second
element, zero if the elements are the same, and a positive value if
the first element is larger than the second element. The definition of
“less than,” “equal to,” and “greater than” depends on the things
that are being compared77 One example of a standard library func- although it is normally based on a
77
// file : compare_words . c
# include < stdio .h >
# include < stdlib .h >
Students tend to gloss over some of the critical details of this ex-
ample, and it is instructive to take a good look at the various point-
ers and the memory footprint of what is happening here. Figure 27
shows the layout in memory of the command line arguments.
0xf8a5 0xf8a5 p e a r \0
0x130
0xf89f 0xf89f p e a c h \0
0x128
0xf898 o r a n g e \0
0xf898
0x120
0xf891 b a n a n a \0
0xf891
0x118
0xf887 n e c t a r i n e \0
0xf887
argc 0x110
0xf881 a p p l e \0
7 0xf881
0x108
argv
0xf838 c o m p a r e w o r d s \0
0x100 0xf838
0x100
0xf898 0xf89f p e a c h \0
0x128
0xf898 o r a n g e \0
0xf891
0x120
0xf891 b a n a n a \0
0xf881
0x118
0xf887 n e c t a r i n e \0
0xf89f
argc 0x110
0xf881 a p p l e \0
7 0xf8a5
0x108
argv
0xf838 c o m p a r e w o r d s \0
0x100 0xf838
0x100
Final Thoughts
and in this chapter we will take a look at the format, and we will
investigate some of the decisions that were made and the trade-offs
that this implies. We will learn how to convert between a decimal
representation and the binary IEEE Floating Point representation,
and you should become familiar enough with the format to convert
some categories of numbers83 between the two formats. 83
For example, powers of two, and
other small numbers such as 1.25.
123.45
+ 678.90
802.35
and
100.22
∗ 1.08
80176
000000
1002200
1082376 = 108.2376 = 108.24 (rounded)
Fixed point format has one glaring problem: the range of our
numbers is severely limited. For our 5-digit example above, if we
set the decimal point to the left of the most significant digit, we
could only represent numbers in the range of 0 to 0.99999. Al-
though fixed point format has its place,84 it is too limiting for a 84
It can provide better accuracy in
general format for real numbers. some cases.
N = x × 10y (decimal)
or
N = x × 2y (binary)
We will concentrate on the binary representation from now on,
as it is the way we will eventually encode the data. When repre-
senting binary numbers, digits after the binary point are represented
by negative powers of two:
1 × 22 + 0 × 21 + 1 × 20 + 1 × 2−1 + 1 × 2−2
1 1 3
= 4+0+1+ + =5
2 4 4
We have written an online binary to decimal (and vice-versa)
calculator that you can play with: https://stanford.edu/~cgregg/
107-Reader/float/convert.html. If you type in 101.11 into the
Binary to Decimal converter, you will find that it is 5.75, as we
calculated above.
Just like in decimal format, binary cannot represent numbers
such as 13 and 16 exactly, nor can binary represent some other num-
1
bers that can be represented in base 10 exactly, like 10 . If you type
0.1 into the Decimal to Binary converter mentioned above, you will
get:
0.0001100110011 . . .
V = (−1)s × M × 2E
where:
• s is a sign bit (0 for positive, 1 for negative, with the sign for
numerical zero as a special case)
s exp fraction
Normalized Floats
The IEEE Floating Point format has three different possible in-
trepetations, depending on the value of the exponent field: normal-
ized, denormalized, and exceptional. Most of our time will be spent
learning about the normalized form, which handles the vast major-
ity of the numbers represented by the format. The denormalized
numbers are a special case for numbers with the very smallest mag-
nitude, and exceptional numbers are numbers such as infinity and
NaN.
A float is considered to have a normalized value if the exponent
is not all 0s, and not all 1s. In a normalized number, the exponent
is encoded as a signed integer88 that is in biased form. The exponent 88
Not a two’s complement number!
has a value of exp − bias, where bias is 2k−1 − 1, and where k is It is an 8-bit binary number that is
intrepeted as a signed number, based
the number of bits in the exponent. For a 32-bit number, there are on the bias.
cs 107 reader 97
0 01111110 00000000000000000000000
The sign bit, 0 means that the number is positive.
The exp field has the value of 01111110, or 126 decimal, and
it has been biased by 127. Therefore, the exponent value we will
intrepet exp as will be 126 − 127 = −1.
The f raction field has a value of all 0s, but we intrepet this as the
binary value of 1.000 · · · .
Therefore, the number represents:
0 10000100 01010000000000000000000
The sign bit, 0 means that the number is positive.
The exp field has the value of 10000100, or 132 decimal, and
it has been biased by 127. Therefore, the exponent value we will
intrepet exp as will be 132 − 127 = 5.
The f raction field has a value of 01010000000000000000000, but
we intrepet this as the binary value of 1.01010 · · · : We can check this at the website
from above: https://stanford.edu/
1 × 20 + 0 × 2−1 + 1 × 2−2 + 0 × 2−3 + 1 × 2−4 = 1.3125 ~cgregg/107-Reader/float/convert.
html
Therefore, the number represents:
+1.3125 × 25 = 42.0
cs 107 reader 98
0.011001100110011001100110011
This is a non-terminating rational number, so we won’t be able
to represent it exactly. Next, we find the most significant 1 in the
number, and then take the following 23 bits after that, in order to
form the significand (in bold, below):
0.011001100110011001100110011
In this case, we have one more tiny step to do: we must round
the number. Notice that the number after the 23rd digit is a 1. Be-
cause 0.1 in binary is 12 , the rest of the number is greater than one
half. Therefore, just like when we round up in decimal when digits
when the digit after the rounding place is 5, we round up when the
number after the 23rd digit is 1 and there are any more 1s after it
(and we do not round up if the number is 0, or if there is a single
191 . In order to round up a number, add 1 to it. So, we have: 91
This is called rounding towards
nearest, ties to even. There are actually
five rounding rules, but they are
0.0110011001100110011001101 beyond the scope of CS 107. See the
Wikipedia page for more information.
The number in bold above will be the 23 bits of the significand.
Next, we have to scale the number so that it falls between 1.0 and
2, which would look like this:
1.10011001100110011001101
Notice that we had to shift the number two binary places to the
right. Therefore, the exponent of two that we will multiply by is go-
ing to be −2. We need to bias this number by 127, so the exponent
will be −2 + 127 = 125, or 01111101, which is the exponent field for
the number.
Finally, 0.4
we know that the sign bit will be 0 because it is a pos-
itive number. The following depicts the bits in the floating point
representation for 0.4:
31 30 23 22 0
0 01111101 10011001100110011001101
There are websites to check if you are correct92 , or you can write 92
This is an outstanding site: https:
a quick program and run it in gdb. The gdb command “x/tw &f” //www.h-schmidt.net/FloatConverter/
IEEE754.html
means, print the memory located at the address of f in binary (“t”) and
in word size (“w”, or 4-bytes).
// file : floatingpt . c
# include < stdio .h >
# include < stdlib .h >
cs 107 reader 99
Denormalized Floats
Exceptional Floats
When the exponent bits in an IEEE float are all 1s, the number is
considered exceptional. Two interesting exceptional numbers are ∞,
which is 0 11111111 000000000000000000000000 (for a 32-bit float),
and −∞, which is 1 11111111 000000000000000000000000. Positive
infinity can can arise when a floating point number overflows, and
negative infinity can arise when a number underflows (even smaller
than the denormalized values), and other calculations such as 10 can
also produce infinity.
The other exceptional numbers95 are either NaNs, or representa- 95
and there are over 8-million of them!
tions that have a special meaning for a particular processor.
cs 107 reader 101
Unrepresentable Numbers
1000000000000000000000001
Next, we put a binary point to the right of the most significant 1:
1.000000000000000000000001
Next, we count off the 23-bits after the binary point, which will
become our significand (in bold, below):
1.000000000000000000000001
This forms the significand, and it has lost information, so we
cannot represent it exactly.
To finish off the conversion, we calculate the exponent, which
is 24 (because we have to shift left by 24 places to get a number
between 1 and 2), and after biasing, our exponent bits are 24 +
127 = 151, or 10010111. The sign bit is 0 because the number is
positive.
Below are the binary representations of 16, 777, 216, 16, 777, 217,
and 16, 777, 218 – notice that the representations for 16, 777, 216 and
16, 777, 217 are identical:
31 30 23 22 0
31 30 23 22 0
31 30 23 22 0
printf ( " c == d ? % s \ n " , ( diff_cd <= max_cd_percent ) ? " true " : " false " ) ;
printf ( " c < d ? % s \ n " , ( diff_cd <= max_cd_percent ) ? " false " : " true " ) ;
return 0;
}
1010110101111000111011000000000000000000000000000000000000000000011.0010001111010111000011
1010110101111000111011000000000000000000000000000000000000000000011.0010001111010111000011
Notice that the significand from the result of our addition is the
same as the significand for 1 × 1020 , and therefore, the calculation
loses the significant digits from 3.14.
Much like the integers, there are defined maximum and minimum
values for floating point numbers, which can be derived from the
cs 107 reader 105
We can generate those values directly in a roundabout way, as FLT_MAX 0 1111110 11111111111111111111111
31 30 23 22 0
// file : float_extremities . c
# include < stdio .h >
# include < stdlib .h >
# include < float .h >
Final Thoughts
The IEEE Floating Point Standard was a carefully thought out way
to get the most out of a discrete set of bits. It may not be simple,
but it is a great study in good engineering design. Floating point
numbers represent a very large range, in a limited number of bits.
A 32-bit float can only hold a bit over 4 billion numbers but it has
a range of −3.4 × 1038 to +3.4 × 1038 . Not only is this literally an
infinite number of reals that the format must try and represent, but
that is a phenomenal range of numbers. The 64-bit double range is
−1.7 × 10308 to +1.7 × 10308 , which is enormous. Most numbers are,
cs 107 reader 106
named locations that store 64-bit values, plus other registers that
hold information you cannot directly access. Registers are the
fastest memory on your computer. Registers can hold addresses,
or integer data. Some registers are used to keep track of your
program’s state, and others hold temporary data. Registers are
used for arithmetic, local variables, and return values for func-
tions, and your entire processor core shares the small number of
registers.
int main ()
{
int i = 1;
printf ( " Hello , World % d !\ n " , i ) ;
return 0;
}
Data Formats
Intel assembly language has four basic data width formats, to de-
note byte widths of 1 (b), 2 (w), 4 (l), and 8 (q). Because of its 16-bit
origins, Intel uses word to mean 16-bits, or 2 bytes. Thirty-two bit Table 8: Intel Data Types
words are referred to double words, and 64-bit words are referred to C type Suf. B Intel data type
as quad words. Table 8 shows how the assembly code suffix aligns char b 1 Byte
with common C data types. short w 2 Word
int l 4 Double word
long q 8 Quad word
char * q 8 Quad word
float s 4 Sing. precision
double l 8 Dbl. precision
cs 107 reader 111
x86-64 Registers
x86 CPUs have 16 general purpose registers, which store 64-bit val-
ues. Registers store integer data and pointers. Register names begin
with r, but the naming is historical: the original 16-bit registers
were %ax, %bx, %cx, %dx, %si, %di, %bp, and %sp. Each register had a
purpose, and were named as such.
When 32-bit x86 arrived, the register names expanded to 32-
bits each, and changed to %eax, %ebx, etc. When x86-64 arrived, the
registers were again renamed to %rax, %rbx, etc., and expanded to
64-bits. Additionally, eight more registers were added, %r8 - %r15.
Figure 31 shows the integer registers, which are nested such that
(for example), %rax is the full 64-bit register, and %eax is the low 32
bits of the register, %ax is the low 16 bits of the register, and %al is
the low 8 bits of the register.
Imm(rb ,ri ,s) The general form of a scaled indexed operand. The
value is computed with the following linear formula:
Imm + rb + s * ri
also be the proper width for the instruction (see the examples be-
low), and in most cases, only the bits in the specified width are
modified in a 64-bit register. For example, if you are moving 1 byte
(with the b specifier), only the lowest byte of a destination register
will be affected, and the rest will remain the same. There is one
exception: the movl instruction moves 32-bits into the low bits of a
register, and clears the upper 32-bits with zeros.
If the operand is in the form Imm(rb ,ri ,s), then the location is
a memory address, and the copy will be from the location calculated
with the linear equation above.
Examples:
movl $0x20,%eax copies the immediate value 0x20 into the low 32 bits of
%rax, and clears the upper 32 bits to zero.
movw %bx, (%rdx,%rax,4) copies two bytes from %bx into the memory location
%rdx + 4 * %rax
movl %eax,%edx copies the low 32-bits (four bytes) of %eax into the
low 32-bits of %rdx, and clears the upper 32 bits to zero.
movb (,%rdx,8),%al copies one byte from 8 * %rdx to the lowest 8 bits of %rax.
movq %rcx, (%rax, %rdx) copies the 64-bit value in %rcx into the memory location
at %rax + %rdx.
There are two additional mov instructions, which are used to copy
a smaller source into a larger destination (which must be a register):
movz and movs. These instructions perform either a zero-fill of the
remaining bytes, or a sign-extension of the remaining bytes. There
are six ways to move a 1- or 2-byte operand into a 2-, 4- or 8-byte
operand. There is not an explicit instruction to zero-extend a 4-
byte source into an 8-byte destination, because the movl instruction
already accomplishes the zero-fill operation. Note that the instruc-
tions have the size designations embedded in each instruction:
leaq (%rax,%rdx,4), %rax copies the value of %rax + 4 * %rdx into %rax
Often, you will see the leaq instruction used to perform arith-
metic on values in registers, and the registers do not have to hold
addresses.102 102
Keep this in mind! You may see the
leaq instruction in what seem like odd
sections of code (where there are no
memory references), but in those cases
it is being used for arithmetic.
cs 107 reader 115
The Linux Address Space, and the Stack and the Heap Figure 32: The Linux address space
(not to scale)
At this point, we will take a brief detour into the land of the stack
and the heap in a program. We have discussed stack variables (local
variables and local arrays), and heap allocation (using malloc and
free), but as we get into assembly language, we need to discuss
how the stack and heap are layed out in memory.
Figure 32 shows the layout of the Linux address space. 0x7ffffffff000
Stack 8MB
reserved
The following is a brief introduction to each section of memory:
shared library text and data The C libaries (and other shared libraries) only doles out the memory as needed.
It actually infrequently holds the
104
are placed into a location accessable by all programs. Each pro-
paramaters, as we will see later in the
gram gets its own data segment for shared libraries, but the code chapter
is completely shared.
global data Any global or static variables get stored in global mem-
ory.
text The text segment (in the 0x400000 range is where a program’s
code resides. You will see these addresses when you analyze
code in more detail.
low addresses The low addresses are used by the operating system
and hardware.
the reverse process: it saves the value at the location of %rsp into a
register, and then it increments %rsp by the amount of bytes that it
popped. Figure 33 demonstrates a push followed by a pop.
Initially pushq %rax popq %rdx Figure 33: A push followed by a pop.
%rax 0x123 %rax 0x123 %rax 0x123 Notice that after the push %rax, the
%rdx 0 %rdx 0 %rdx 0x123 stack pointer has been decremented
%rsp 0x108 %rsp 0x100 %rsp 0x108 by 8, and after the pop %rdx, the stack
pointer has been incremented by 8.
Stack "bottom" Stack "bottom" Stack "bottom" Also note that the value on the stack
after the pop has not been cleared.
. . .
Increasing . Increasing . Increasing .
address address address
. . .
0x108 0x108 0x108
Stack "top" 0x100 0x123 0x100 0x123
Stack "top" Stack "top"
Arithmetic in Assembly
sub src,dst : Subtracts src from dst and puts the result into dst.106 . 106
This syntax can be tricky. addq
Example: subl %eax,%edx %rax, %rdx is read “subtract %rax
from %rdx.”
imul src,dst : Peforms a signed multiply on src and dst, and puts
the result into dst. There is a possibility of losing information, as
cs 107 reader 117
lower half of the product is the same. Example: imulb %al,%dl imul that does not lose information.
and src,dst : Performs the AND operation on src and dst and puts
the result into dst. Example: andw %ax,%dx
The shift instructions are defined as follows. Note that the first
operand can be either an immediate value or %cl (and only %cl):
sal k,dst or shl src,dst : Performs a left shift of k bits on dst and
puts the result into dst.108 Example: shlq $4,%rdx 108
The reason there are two instruc-
tions that do the exact same thing is
sar k,dst : Performs an arithmetic right shift of k bits on dst and simply to be analogous to the right
shift instructions, which have a logical
puts the result into dst. Example: sarb $2,%dl and artihmetic form.
shr k,dst : Performs an logical right shift of k bits on dst and puts
the result into dst. Example: shrq %cl,8(%rax)
// multiply64
// x : the first multiplicand
// y : the second multiplicand
// result : a pointer to an array of 2 longs
// that will be in little - endian format
void multiply64 ( unsigned long x , unsigned long y , unsigned long * result ) ;
multiply64 (x ,y , result ) ;
# File : multiply64 . s
# ---------------
# Demonstrates the 1 - operand imul / mul
# instructions with two signed 64 - bit multiplicands
# and the result placed into % rdx :% rax
. section . text
placed into %rax, and the remainder of the integer division is placed
into %rdx. If the quotient ends up as too large for the destination
register, %rax), the runtime throws an exception.109 The divide in- 109
One way to check for this potential
structions are explained below: case is to ensure that %rdx is less than
the divisor operand.
Control
For example, if %rax contains 5 and %rdx contains −5, the in-
struction add %rax,%rdx will set the ZF flag, because the result of
the addition was 0. If, instead, %rdx holds −20, then the same in-
struction would set the SF flag, because the result of the addition is
negative.
The leaq and mov instructions do not set any condition codes,
but the arithmetic instructions do set them, as do the logical in-
structions110 . For shift instructions, the carry flag is set to the last 110
The logical instructions, e.g., xor set
bit shifted out, while the overflow flag is set to zero. For historical the carry and overflow flags to 0.
reasons, the inc and dec instructions do not set the carry flag, but
do set the overflow and zero flags.
There are two types of instructions, cmp and test that we can use
to set condition codes without altering any other registers:
cmpx s1,s2: Compares two numbers by subtracting s2-s1.111 The x Be careful to get the order correct!
111
movl $5,%eax
movl $1,%edx
cs 107 reader 121
cmpl %eax,%edx # sets the sign flag because 1-5 is less than 0
# also sets the carry flag, because adding
# -1 to 5 produces a carry-out
movl $0,%eax
test %eax,%eax # sets the zero flag because 0 AND 0 == 0
cmpq $0x5,%rax
jg 0x821 <main+193>
cmpw %dx,%ax
cmovne %rcx,%r8
To read this, say, “move %rcx to %r8 if %ax is not equal to %dx.”
The jmp instruction is unconditional, meaning that it always per-
forms the requested jump to a different instruction:
void loop ()
{
int i = 0;
while ( i < 100) {
printf ( " i :% d \ n " ,i ) ;
i ++;
}
}
int main ( int argc , char ** argv )
{
loop () ;
return 0;
}
The unconditional jmp on line <+6> sets the instruction pointer to soon!
instruction <+33> (address 0x400567), where the initial while loop
check for i < 100 is completed.
The jle instruction on line <+36> jumps back to line <+8> if %rbx
is less than or equal to 99 (0x63), at which point the function enters
into the while loop body.
Lines <+8> through <+25> set up and call the printf function.
On line <+30>, the counter is incremented, and the code falls
through to the loop counter check again. When the counter has
reached a value of 99, the jle instruction does not branch, and
the code continues with line <+38>, restoring %rbx (line <+38> and
returning from the function (line <+39>.
return diffab ;
}
return 0;
}
$ gdb conditionalmov
cs 107 reader 125
Notice that the compiler has discovered that both of the loops
can be re-written using the cmovl instruction, without needing to
use any conditional jumps.
while ( test_expr )
body \ _statement
Gcc often translates a while loop into assembly that looks like
this:
The next code transformation we will look at is the for loop. The
general form of a for loop in C is as follows:
cs 107 reader 127
init_expression
jmp test // unconditional jump
loop:
body_statement instructions
update_expression
test:
cmp ... // comparison instruction
j.. loop // conditional jump based on comparison
// can think of as "if test, goto loop"
init_expression
cmp ... // comparison instruction
j.. done // conditional jump based on comparison
// can think of as "if not test, goto done"
loop:
body_statement instructions
update_expression
cmp ... // comparison instruction
j.. loop // "if test, goto loop"
done:
return result ;
}
3. There must be a way to save the state of the calling function, and
to recover the saved state when control is returned to the calling
function.
cs 107 reader 129
As we have seen earler, the stack holds local data that a function
uses for variables, arrays, etc. Figure 34118 shows the stack with R.E. Bryant and D.R. O’Hallaron.
118
more detail than we have seen before – in particular, it shows the Computer Systems : A Programmer’s
Perspective. Pearson, 2015. ISBN
“stack frames” for two functions, a calling function (P), and an 9781292101767. URL https://books.
executing function (Q). google.com/books?id=KfM2rgEACAAJ
The calling function, P, generally stores its arguments in regis-
ters, but if there are more than six arguments to a function, they Figure 34: The stack frame structure,
borrowed from Figure 3.25, Bryant and
get placed onto the stack before calling the function, Q. Addition- O’Halloran
ally, the return address of the next instruction after the instruction Stack "bottom"
.
that calls Q is stored on the stack, and when Q is called, the stack . Earlier Frames
.
pointer points to the location on the stack where that return ad- .
.
dress is stored. The arguments passed on the stack and the return .
below the return address (labeled as “Frame for executing function Saved Registers
Frame for
Q in Figure 34). Function Q allocates this space by either pushing Local Variables executing
Stack
and popping values onto and off of the stack, or by decrementing function Q
pointer Argument build
%rsp area
the stack pointer appropriately. When function Q ends, it must re- Stack "top"
turn the stack pointer to point to the location of the return address
(thereby cleaning up after itself).
As an example of the stack frame structure, we will use the fol-
lowing program, which has three functions (main, first_function,
and leaf). The program listing is shown along with a partial listing
of the dissassembly (to just show the relevant functions):
// file : run_time_stack . c
# include < stdio .h >
# include < stdlib .h >
return a ;
}
int main ( int argc , char ** argv )
{
int n = 10;
n = first_function ( n ) ;
printf ( " n : % d \ n " ,n ) ;
return 0;
}
000000000040054a <first_function>:
40054a: 83 ef 01 sub $0x1,%edi
40054d: e8 f4 ff ff ff callq 400546 <leaf>
400552: 83 c0 03 add $0x3,%eax
400555: c3 retq
0000000000400556 <main>:
400556: 48 83 ec 08 sub $0x8,%rsp
40055a: bf 0a 00 00 00 mov $0xa,%edi
40055f: e8 e6 ff ff ff callq 40054a <first_function>
400564: 89 c2 mov %eax,%edx
400566: be 14 06 40 00 mov $0x400614,%esi
40056b: bf 01 00 00 00 mov $0x1,%edi
400570: b8 00 00 00 00 mov $0x0,%eax
400575: e8 b6 fe ff ff callq 400430 <__printf_chk@plt>
40057a: b8 00 00 00 00 mov $0x0,%eax
40057f: 48 83 c4 08 add $0x8,%rsp
400583: c3 retq
Table 10 shows a stack trace for the above code, starting at in-
struction 40055f:
Instruction State values (at beginning) Table 10: Stack Trace Example
PC Instruction %rdi %rax %rsp *%rsp Description
0x40055f callq 10 – 0x7fffffffe940 – Call call first_function(10)
0x40054a sub 10 – 0x7fffffffe938 0x400564 Entry of first_function
0x40054d callq 9 – 0x7fffffffe938 0x400564 Cal leaf(9)
0x400546 lea 9 – 0x7fffffffe930 0x400552 Entry of leaf
0x400549 retq 9 27 0x7fffffffe930 0x400552 Return 27 from leaf
0x400552 add 9 27 0x7fffffffe938 0x400564 Resume first_function
0x400551 retq 9 30 0x7fffffffe938 0x400564 Return 30 from first_function
0x400564 mov 9 30 0x7fffffffe940 – Resume main
Procedure calls can pass data as arguments (either via the stack or
registers), and a function can return a value to the calling function
cs 107 reader 131
as well.119 Most often, arguments can fit into registers, so we don’t 119
Normally through the %rax register
need to involve the stack. However, if there are more than six argu- for integer return values, and through
a floating point register for floats, or
ments (or the argument is a struct120 , the calling function allocates via the stack for structs.
space on the stack for those arguments, in a well-defined way. The 120
though not a struct pointer, of
course – that is sent as the value of the
first six arguments are placed into the following registers, in the address of the struct.
order of the arguments to the function: %rdi, %rsi, %rdx, %rcx, %r8,
and %r9. Arguments seven and above are placed onto the stack in 8-
byte chunks, even if smaller values are passed.121 The return value 121
e.g., a char argument will take up 8
is (for integer values) placed in the %rax register. bytes on the stack.
long sum = a1 + a2 + a3 + * a4 + * a5 + * a6 + a7 + * a8 ;
* a4 = * a5 = * a6 = 0;
* a8 = 'y ';
return sum ;
}
int main ( int argc , char ** argv )
{
long a = 0 xabcdefabcdef ;
int b = 0 x1234567 ;
short c = 0 xabcd ;
char d = 'x ';
return 0;
}
In x86-64 assembly, the stack can be used to hold data for a function
aside from the arguments. When the “address of” operator, & is
applied to a local variable, the value must be copied onto the stack
because registers do not have addresses, and a proper address
needs to be generated that points to the variable. Also, if the local
variables are arrays or structs, these must also be placed onto the
stack. Whenever a function uses the stack, it must return the stack
pointer to its original value when the function was called in order
to properly return to the calling function.122 122
Remember, this is because at the
In the following example, main puts the address values onto the end of the function, the stack pointer
must point to the return address of the
stack, which get used by the settomin function: calling function.
// file : localstack . c
# include < stdio .h >
# include < stdlib .h >
}
cs 107 reader 133
Your programs have two areas of main memory: the stack and
the heap. On a Linux system, programs have (by default) 8MB
of stack space that they must manage based on the conventions
we discussed in x86-64 Assembly Language. The heap, on the other
hand, is ultimately controlled by the operating system, and a heap
allocator maintains the heap as a collection of contiguous memory
blocks that are either free or allocated.
An allocated block has been reserved for a particular application.
When a program calls malloc, it has access to an allocated block,
and only that program can modify or read the values in that block.
Allocated blocks remain allocated for the rest of the program, or
until the program frees them. If the program ends, the heap alloca-
tor frees the block.
As we have discussed before, stack memory is limited and serves
as a scratch-pad for functions, and it is continually being re-used by
a program’s functions. Stack memory isn’t persistent, but because it
is already allocated to a program, it is fast.
Heap memory takes more time to set up (a program has to go
through the heap allocator), but it is unlimited,125 and persistent for 125
For all intents and purposes it is
the rest of a program. unlimited. Operating systems and
hardware are excellent at allocating
memory, and they will even use (slow)
hard disk or SSD memory to fulfill
Heap Allocator Requirements requests if necessary.
We will investigate heap allocation for the following program: f 0xffffe828 0xfeed
e 0xffffe820 0xabcde
// file : heapalloc_ex1 . c d 0xffffe818 0xf0123
c 0xffffe810 0x0
# include < stdio .h > b 0xffffe808 0x0
# include < stdlib .h > a 0xffffe800 0xbeef
# include < string .h >
Table 13: Stack after a = malloc(16)
Variable Address Value
int main ( int argc , char ** argv )
{ f 0xffffe828 0xfeed
e 0xffffe820 0xabcde
void *a , *b , *c , *d , *e , * f; d 0xffffe818 0xf0123
a = malloc (16) ; c 0xffffe810 0x0
b 0xffffe808 0x0
memset (a , 'a ' ,16) ; a 0xffffe800 0x100
} f 0xffffe828 0x0
e 0xffffe820 0x100
The heap we will look at will be limited to 96 bytes (and yes, d 0xffffe818 0x130
c 0xffffe810 0x118
this is a very small heap!). The initial heap is shown in Figure 35. b 0xffffe808 0x110
The numbers in Figure 35 are left-aligned to their start locations, a 0xffffe800 0x100
and each minor hash-mark represents 4 bytes. Table 12 shows the
stack after the pointer variables are declared in the program, and
note that the variables have uninitialized values. We will investigate
partitioning the heap as if we were simply filling in locations in our
block of memory – we will soon see that this is not the way most
heaps are organized.
(free)
aaaaaaaa (free)
0x100 0x108 0x110 0x118 0x120 0x128 0x130 0x138 0x140 0x148 0x150 0x158
Figure 40 shows the heap after the free(a); line. Note that the
only information the free line provides is the pointer, and the heap
allocator must keep track of the amount of space allocated to a in
order to free it.
Because c was already freed, the heap allocator was able to extend
b’s allocation, and returned the same pointer.
same memory that keeps track of the used and free information.
Figure 46 shows a 96-byte, completely free heap. The first eight
bytes make up the block header, and the rest of the bytes are free.
88
The block header in this case has both the amount of free bytes
following it (88, in this case), and an “F” to designate that it is free.
Figure 47 shows how the heap memory changes after a a =
malloc(16); statement is executed. Memory for the requested bytes
is allocated to a after the first block header, and the header is up-
dated to reflect the amount of memory (16 bytes), and the fact that
it is now used (“U”). After the allocated memory, another block
header is created in order to designate the smaller free block that is
left after the allocation. This second block header uses an additional
8 bytes from the available memory, so there are only 64 bytes in the
free block.
16
64
aaaaaaaa
U F
16
8
48
aaaaaaaa bbbb
U U F
16
8
24
16
16
8
24
16
bbbb cccccccccccc
F U U F
16
8
24
16
bbbb
F U F F
16
8
48
bbbb
F U F
cs 107 reader 144
Bibliography
Index