Ch2 Programming Background


Linux Technology

2 Programming Background
2.3 Program Development
2.3.1 Program Development Steps
(1). Create source files
t1.c and t2.c are the source files of a C program

2.3.2 Variables in C
Variables in C programs can be classified as global, local, static,
automatic, register, etc., as shown in Fig. 2.3.
• Global variables are defined outside of any function.
• Local variables are defined inside functions.
• Global variables are unique and have only one copy.
• Static globals are visible only to the file in which they are defined.
• Non-static globals are visible to all the files of the same program.
• Initialized globals are assigned values at compile time.
• Uninitialized globals are cleared to 0 when the program execution starts.
• Local variables are visible only to the function in which they are
defined.
• By default, local variables are automatic; they come into existence when
the function is entered and they logically disappear when the function
exits.
• For register variables, the compiler tries to allocate them in CPU
registers.
• Since automatic local variables do not have any allocated memory
space until the function is entered, they cannot be initialized at
compile time.
• Static local variables are permanent and unique, so they can be
initialized at compile time.
• C also supports volatile variables, which are used as memory-mapped
I/O locations or global variables that are accessed by interrupt
handlers or multiple execution threads.
• The volatile keyword prevents the C compiler from optimizing the
code that operates on such variables.
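The variable classes listed above can be seen side by side in a small C file. This is only a minimal sketch: all names (g_init, g_uninit, ready_flag, file_private, count_calls, fresh_local, last_count) are made up for illustration and do not come from the original slides.

```c
#include <stdio.h>

int g_init = 10;            /* initialized global: value assigned at compile time */
int g_uninit;               /* uninitialized global: cleared to 0 at program start */
volatile int ready_flag;    /* volatile: accesses are not optimized away */
static int file_private;    /* static global: visible only inside this file */

/* Returns how many times it has been called so far. */
int count_calls(void)
{
    static int calls = 0;   /* static local: one permanent copy, initialized once */
    int step = 1;           /* automatic local: exists only while the function runs */
    calls += step;
    file_private = calls;   /* remember the latest count in the file-private global */
    return calls;
}

/* An automatic local starts over on every call, so this always returns 1. */
int fresh_local(void)
{
    int n = 0;
    n++;
    return n;
}

/* Gives other code read access to the file-private global. */
int last_count(void)
{
    return file_private;
}
```

Calling count_calls() repeatedly yields 1, 2, 3, ... because the static local keeps its value between calls, while fresh_local() yields 1 every time.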
2.3.3 Compile-Link in GCC
(2). Use gcc to convert the source files into a binary executable, as in
gcc t1.c t2.c
which generates a binary executable file named a.out (assembler output).
(3). What is gcc? gcc is a program whose work consists of three major steps,
as shown in Fig. 2.4.
Step 1. Convert C source files to assembly code files :
The C COMPILER translates .c files into .s files containing assembly
code of the target machine.

Step 2. Convert assembly code to OBJECT code:
The second step of gcc is to invoke the ASSEMBLER to translate .s
files to .o files. The resulting .o files are called OBJECT code.
Each .o file consists of
. a header containing sizes of CODE, DATA and BSS sections
. a CODE section containing machine instructions
. a DATA section containing initialized global and initialized static local
variables
. a BSS ( Block Started by Symbol ) section containing uninitialized
global and uninitialized static local variables
. relocation information for pointers in CODE and offsets in DATA and
BSS
. a Symbol Table containing non-static globals, function names and their
attributes.
Step 3. LINKING:
The last step of gcc is to invoke the LINKER, which combines all the .o
files and the needed library functions into a single binary executable
file. More specifically, the LINKER does the following:
Combine all the CODE sections of the .o files into a single Code
section. For C programs, the combined Code section begins with the
default C startup code crt0.o, which calls main(). This is why every C
program must have a unique main() function.
Combine all the DATA sections into a single Data section. The
combined Data section contains only initialized globals and initialized
static locals.
Combine all the BSS sections into a single BSS section.
Use the relocation information in the .o files to adjust pointers in the
combined Code section and offsets in the combined Data and bss
sections.
Use the Symbol Tables to resolve cross references among the
individual .o files.

If all the cross references can be resolved successfully, the linker writes
the resulting combined file as a.out, which is the binary executable file.
2.3.4 Static vs. Dynamic Linking
• In static linking, which uses a static library, the linker includes all the
needed library function code and data into a.out. This makes a.out
complete and self-contained but usually very large.
• In dynamic linking, which uses a shared library, the library functions
are not included in a.out but calls to such functions are recorded in
a.out as directives.
• When executing a dynamically linked a.out file, the operating system
loads both a.out and the shared library into memory and makes the
loaded library code accessible to a.out during execution.
The main advantages of dynamic linking are:
• The size of every a.out is reduced.
• Many executing programs may share the same library functions in
memory.
• Modifying library functions does not require re-compiling the source
files.

Libraries used for dynamic linking are known as Dynamic Linking
Libraries (DLLs). They are called Shared Libraries (.so files) in Linux.
Dynamically loaded (DL) libraries are shared libraries which are loaded
only when they are needed. DL libraries are useful as plug-ins and
dynamically loaded modules.
2.3.5 Executable File Format
Although the default binary executable is named a.out, most C
compilers and linkers can generate executable files in several different
formats, which include :
(1) Flat binary executable:
A flat binary executable file consists only of executable code and
initialized data. It is intended to be loaded into memory in its entirety
for execution directly. For example, bootable operating system images
are usually flat binary executables, which simplifies the boot-loader.
(2) a.out executable file: A traditional a.out file consists of a header,
followed by code, data and bss sections.
(3) ELF executable file: An Executable and Linking Format (ELF) file
consists of one or more program sections. Each program section can be
loaded to a specific memory address. In Linux, the default binary
executables are ELF files, which are better suited to dynamic linking.
2.3.6 Contents of a.out File
An a.out file consists of the following sections:
(1) header: the header contains loading information and sizes of the
a.out file, where
tsize = size of Code section;
dsize = size of Data section containing initialized globals and static
locals;
bsize = size of bss section containing uninitialized globals and static
locals;
total_size = total size of a.out to load.
(2) Code Section: also called the Text section, which contains
executable code of the program. It begins with the standard C startup
code crt0.o, which calls main().
(3) Data Section: The Data section contains initialized global and static
data.
(4) Symbol table: optional, needed only for run-time debugging.

Note that the bss section, which contains uninitialized global and static
local variables, is not in the a.out file. Only its size is recorded in the
a.out file header. Also, automatic local variables are not in a.out.
• Figure 2.5 shows the layout of an a.out file.

• _brk is a symbolic mark indicating the end of the bss section. The total
loading size of a.out is usually equal to _brk, i.e. equal to
tsize+dsize+bsize. If desired, _brk can be set to a higher value for a
larger loading size. The extra memory space above the bss section is
the HEAP area for dynamic memory allocation during execution.
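To connect the Data and bss sections to actual C declarations, the following sketch declares one variable of each kind and checks that the bss data really starts out as 0. The names (initialized_global, uninitialized_global, big_buffer, bss_is_zeroed) are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

int initialized_global = 5;     /* stored in the Data section of the executable */
int uninitialized_global;       /* bss: only its size is recorded in the file */
static char big_buffer[4096];   /* bss: adds 4096 bytes to bsize, not to the file */

/* Returns 1 if every byte of the bss buffer is 0, otherwise 0. */
int bss_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof big_buffer; i++)
        if (big_buffer[i] != 0)
            return 0;
    return 1;
}
```

On Linux, the size command (from binutils) applied to the resulting executable prints the text, data and bss sizes, which is one way to observe that big_buffer enlarges bss rather than the data stored in the file.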
2.3.7 Program Execution
• Under a Unix-like operating system, the sh command line
a.out one two three
executes a.out with the token strings as command-line parameters.

• Sh forks a child process and waits for the child to terminate.


• When the child process runs, it uses a.out to create a new execution
image by the following steps.
(1) Read the header of a.out to determine the total memory size needed,
which includes the size of a stack space:
TotalSize = _brk + stackSize
stackSize is usually a default value chosen by the OS kernel for the
program to start.

• There is no way of knowing how much stack space a program will ever
need.
• For example, the trivial C program
main(){ main(); }
will usually generate a segmentation fault due to stack overflow, unless
the compiler optimizes the recursion away.
(2) It allocates a memory area of TotalSize for the execution image.
Conceptually, we may assume that the allocated memory area is a single
piece of contiguous memory. It loads the Code and Data sections of
a.out into the memory area, with the stack area at the high address end.
It clears the bss section to 0, so that all uninitialized globals and static
locals begin with the initial value 0. During execution, the stack grows
downward toward low address.
(3) Then it abandons the old image and begins to execute the new
image, which is shown in Fig. 2.6.
(4) Execution begins from crt0.o, which calls main(), passing argc and
argv as parameters to main(), which can be written as
int main(int argc, char *argv[]) { . . . . }
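As a small illustration of how argc and argv arrive in a program, the helper below prints every command-line token. It is a sketch only: show_args is a hypothetical name, and main() would simply pass its own argc and argv to it.

```c
#include <stdio.h>

/* Prints each command-line token and returns how many were printed. */
int show_args(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return argc;
}
```

For the command line a.out one two three, argc is 4 and argv[0] to argv[3] point to the strings a.out, one, two and three.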
2.3.8 Program Termination
A process executing a.out may terminate in two possible ways.
(1). Normal Termination:
If the program executes successfully, main() eventually returns to
crt0.o, which calls the library function exit(0) to terminate the process.
The exit(value) function does some clean-up work first, such as flushing
stdout, closing I/O streams, etc. Then it issues an _exit(value) system call,
which causes the process to enter the OS kernel to terminate.
A 0 exit value usually means normal termination.
If desired, a process may call exit(value) directly without going back to
crt0.o.
Even more drastically, a process may issue an _exit(value) system call
to terminate immediately without doing the clean-up work first.
When a process terminates in the kernel, it records the value in the
_exit(value) system call as the exit status in the process structure,
notifies its parent and becomes a ZOMBIE.
The parent process can find the ZOMBIE child, get its pid and exit
status by the system call pid = wait(int *status);
which also releases the ZOMBIE child process structure as FREE,
allowing it to be reused for another process.
(2). Abnormal Termination:
While executing a.out the process may encounter an error condition,
such as invalid address, illegal instruction, privilege violation, etc.
which is recognized by the CPU as an exception.
When a process encounters an exception, it is forced into the OS kernel
by a trap.
The kernel’s trap handler converts the trap error type to a magic number,
called a SIGNAL, and delivers the signal to the process, causing it to
terminate.
In this case, the exit status of the ZOMBIE process is the signal number,
and we may say that the process has terminated abnormally.
In addition to trap errors, signals may also originate from hardware or
from other processes.
For example, pressing the Control_C key generates a hardware interrupt,
which sends the number 2 signal SIGINT to all foreground processes on that
terminal, causing them to terminate.
Alternatively, a user may use the command
kill -s signal_number pid # signal_number = 1 to 31
to send a signal to a target process identified by pid.
For most signal numbers, the default action of a process is to terminate.
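Both kinds of termination can be observed from a parent process with the standard POSIX calls fork(), waitpid() and the status macros from <sys/wait.h>. The sketch below is an illustration under those assumptions; the function names child_exit_value and child_signal_number are hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Normal termination: the child ends with _exit(value).
   Returns the exit value collected by the parent, or -1 on failure. */
int child_exit_value(int value)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(value);               /* child: terminate at once, no stdio clean-up */
    if (waitpid(pid, &status, 0) != pid)
        return -1;                  /* parent: could not collect the ZOMBIE child */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Abnormal termination: the child sends signal sig to itself.
   Returns the signal number seen by the parent, or -1 on failure. */
int child_signal_number(int sig)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(sig, SIG_DFL);       /* make sure the default action applies */
        raise(sig);
        _exit(0);                   /* reached only if the signal did not kill us */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child uses _exit() rather than exit() so that buffered output inherited from the parent is not flushed a second time. WIFEXITED and WIFSIGNALED tell the parent which of the two termination paths the child took.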
2.4 Function Call in C
• The following discussions apply to running C programs on 32-bit
Intel x86 processors.
• On these machines, the C compiler generated code passes parameters
on the stack in function calls.
• During execution, it uses a special CPU register (ebp) to point at the
stack frame of the current executing function.
2.4.1 Run-Time Stack Usage in 32-Bit GCC
Consider the following C program,
(1) When executing a.out, a process image is created in memory, which
looks (logically) like the diagram shown in Fig. 2.7, where Data
includes both initialized data and bss.

(2) Every CPU has the following registers or equivalent, where the
entries in parentheses denote registers of the x86 CPU:
PC (IP): point to next instruction to be executed by the CPU.
SP (SP): point to top of stack.
FP (BP): point to the stack frame of current active function.
Return Value Register (AX): register for function return value.
(3) In every C program, main() is called by the C startup code crt0.o.
When crt0.o calls main(), it pushes the return address (the current PC
register) onto stack and replaces PC with the entry address of main(),
causing the CPU to enter main().
When control enters main(), the stack contains the saved return PC on
top, as shown in Fig. 2.8, in which XXX denotes the stack contents
before crt0.o calls main(), and SP points to the saved return PC from
where crt0.o calls main().
(4) Upon entry, the compiled code of every C function does the
following:
• push FP onto stack # this saves the CPU's FP register on stack.
• let FP point at the saved FP # establish stack frame
• shift SP downward to allocate space for automatic local variables on
stack
• the compiled code may shift SP farther down to allocate some
temporary working space on the stack, denoted by temps.
• After entering main(), the stack contents become as shown in Fig.
2.9, in which the spaces of a, b, c are allocated but their contents are
still undefined.
(5) Then the CPU starts to execute the code a=1; b=2; c=3; which put
the values 1, 2, 3 into the memory locations of a, b, c, respectively.
Assume that sizeof(int) is 4 bytes. The locations of a, b, c are at -4, -8,
-12 bytes from where FP points at. These are expressed as -4(FP),
-8(FP), -12(FP) in assembly code, where FP is the stack frame pointer.
For example, in 32-bit Linux the assembly code for b=2 in C is
movl $2, -8(%ebp) # b=2 in C
where $2 means the value of 2 and %ebp is the ebp register.
(6) main() calls sub() by c = sub(a, b); The compiled code of the
function call consists of
• Push parameters in reverse order, i.e. push values of b=2 and a=1 into
stack.
• Call sub, which pushes the current PC onto stack and replaces PC with
the entry address of sub, causing the CPU to enter sub().
• When control first enters sub(), the stack contains a return address at
the top, preceded by the parameters, a, b, of the caller, as shown in
Fig. 2.10.
(7) Since sub() is written in C, its actions are exactly the same as those of
main(), i.e. it
• Pushes FP and lets FP point at the saved FP;
• Shifts SP downward to allocate space for local variables u, v.
• The compiled code may shift SP farther down for some temp space on
stack.
• The stack contents become as shown in Fig. 2.11.
2.4.2 Stack Frames
While execution is inside a function, such as sub(), it can only access
global variables, parameters passed in by the caller and local variables.
Global and static local variables are in the combined Data section,
which can be referenced by a fixed base register.
So the problem is: how to reference parameters and automatic locals?
• The stack area visible to a function, i.e. parameters and automatic
locals, is called the Stack Frame of the function. FP is called the Stack
Frame Pointer.
What would happen if we have a sequence of function calls, e.g.
crt0.o --> main() --> A(par_a) --> B(par_b) --> C(par_c)
In that case, the function call sequence is maintained in the stack as a
linked list, as shown in Fig. 2.13.

• By convention, the CPU’s FP = 0 when crt0.o is entered from the OS
kernel. When a function returns, its stack frame is deallocated and the
stack shrinks back.
2.4.3 Return From Function Call
When sub() executes the C statement return x+y+u+v, it evaluates the
expression and puts the resulting value in the return value register (AX). Then
it deallocates the local variables by
• copy FP into SP; # SP now points to the saved FP in stack.
• pop stack into FP; # this restores FP, which now points to the caller's
# stack frame, leaving the return PC on the stack top.
(On the x86 CPU, the above operations are equivalent to the leave instruction.)
• Then, it executes the RET instruction, which pops the stack top into PC
register, causing the CPU to execute from the saved return address of the
caller.
(8) Upon return, the caller function catches the return value in the return
register (AX).
Then it cleans the parameters a, b, from the stack (by adding 8 to SP).
This restores the stack to the original situation before the function call.
Then it continues to execute the next instruction.
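The program that steps (1) to (8) walk through is not reproduced in this extract. A minimal sketch consistent with the names used in the text (a, b, c in main(); x, y, u, v in sub()) could look as follows. The values assigned to u and v are assumptions, and caller() stands in for main() so that the fragment can be called from other code.

```c
#include <stdio.h>

/* sub() has two parameters (x, y) and two automatic locals (u, v). */
int sub(int x, int y)
{
    int u, v;
    u = 4;                  /* assumed value, not given in the text */
    v = 5;                  /* assumed value, not given in the text */
    return x + y + u + v;   /* the result goes back in the return value register */
}

/* Stands in for main() in the walkthrough; returns the final value of c. */
int caller(void)
{
    int a, b, c;
    a = 1; b = 2; c = 3;
    c = sub(a, b);          /* 32-bit code pushes b, then a, then calls sub */
    printf("c = %d\n", c);
    return c;
}
```

Compiling such a file with gcc -m32 -S and reading the generated .s file is one way to compare the actual prolog, parameter pushes and epilog with the steps described above.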
2.4.4 Long Jump
In a sequence of function calls, such as
main() --> A() --> B() --> C();
it is possible to return directly to an earlier function in the calling
sequence by a long jump.
The principle of long jump is very simple. When a function finishes, it
returns by the (callerPC, callerFP) in the current stack frame, as shown
in Fig. 2.15.
If we replace (callerPC, callerFP) with (savedPC, savedFP) of an earlier
function in the calling sequence, execution would return to that function
directly.
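In C, long jumps are available through the standard library functions setjmp() and longjmp() in <setjmp.h>. The sketch below illustrates the principle with a call chain A() --> B() --> C(); the names env, deepest and long_jump_demo are hypothetical.

```c
#include <setjmp.h>

static jmp_buf env;     /* saved execution context of the setjmp() caller */
static int deepest;     /* records how far the call chain got */

static void C(void)
{
    deepest = 3;
    longjmp(env, 1);    /* return directly to the setjmp() point */
}

static void B(void) { deepest = 2; C(); deepest = -2; /* skipped */ }
static void A(void) { deepest = 1; B(); deepest = -1; /* skipped */ }

/* Returns the depth recorded just before the long jump. */
int long_jump_demo(void)
{
    if (setjmp(env) == 0) {     /* first pass: setjmp() returns 0 */
        A();
        return -1;              /* not reached: C() never returns normally */
    }
    return deepest;             /* second pass: arrived here via longjmp() */
}
```

Because C() jumps straight back to long_jump_demo(), the statements after the calls in B() and A() never run, which is exactly the effect of replacing (callerPC, callerFP) with the saved values of an earlier function.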
2.4.5 Run-Time Stack Usage in 64-Bit GCC
• In 64-bit mode, the CPU registers are expanded to rax, rbx, rcx, rdx,
rbp, rsp, rsi, rdi, r8 to r15, all 64-bit wide.
• When calling a function, the first 6 parameters are passed in rdi, rsi,
rdx, rcx, r8, r9, in that order.
• Any extra parameters are passed through the stack as they are in 32-bit
mode.
• Upon entry, a called function first establishes the stack frame (using
rbp) as usual. Then it may shift the stack pointer (rsp) downward for
local variables and working spaces on the stack.
• The GCC compiler generated code may keep the stack pointer fixed,
with a default reserved Red Zone stack area of 128 bytes, while
execution is inside a function, making it possible to access stack
contents by using rsp as the base register.
• The GCC compiler generated code still uses the stack frame pointer rbp
to access both parameters and locals.

Example: Function Call Convention in 64-Bit Mode
(1) The following t.c file contains a main() function in C, which defines
9 local int (32-bit) variables, a to i. It calls a sub() function with 8 int
parameters.
(2) Under 64-bit Linux, compile t.c to generate a t.s file in 64-bit
assembly by
gcc -S t.c # generate t.s file
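The t.c listing itself is missing from this extract. A sketch matching the description (9 local int variables a to i, and a function with 8 int parameters) might look like this. The function names sub8 and caller8, the locals u, v, w and all values are assumptions for illustration; in the text the functions are called sub() and main().

```c
/* Eight int parameters: per the convention above, the first six travel in
   registers (rdi, rsi, rdx, rcx, r8, r9) and the last two (g, h) on the stack. */
int sub8(int a, int b, int c, int d, int e, int f, int g, int h)
{
    int u, v, w;            /* locals with assumed values */
    u = 9; v = 10; w = 11;
    return a + b + c + d + e + f + g + h + u + v + w;
}

/* Stands in for main(): defines 9 locals, a to i, and calls sub8(). */
int caller8(void)
{
    int a, b, c, d, e, f, g, h, i;
    a = 1; b = 2; c = 3; d = 4; e = 5; f = 6; g = 7; h = 8; i = 9;
    i = sub8(a, b, c, d, e, f, g, h);
    return i;
}
```

In the generated t.s file, the call site should show six register loads followed by two stack pushes or stores for g and h, although the exact instructions depend on the GCC version and optimization level.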
2.5 Link C Program with Assembly Code
2.5.1 Programming in Assembly
(1) C code to Assembly Code
======= compile a.c file into 32-bit assembly code ======
cc -m32 -S a.c ===> a.s file

The assembly code generated by GCC consists of three parts:
(1) Entry: also called the prolog, which establishes the stack frame and
allocates local variables and working space on the stack
(2) Function body, which performs the function task with the return value in
the AX register
(3) Exit: also called the epilog, which deallocates the stack space and returns
to the caller
A: # A() start code location
(1). Entry Code:
pushl %ebp
movl %esp, %ebp # establish stack frame
The stack contents become
subl $24, %esp
Shift SP downward 24 bytes to allocate space for local variables and
working area.
(2). Function Body Code:
movl $4, -20(%ebp) // d=4
movl $5, -16(%ebp) // e=5
movl $6, -12(%ebp) // f=6
After assigning values to the local variables, the stack contents become
# call B(d,e): push parameters d, e in reverse order:
subl $8, %esp # create 8 bytes TEMP slots on stack
pushl -16(%ebp) # push e
pushl -20(%ebp) # push d

call B # B() will grow stack to the RIGHT


# when B() returns:
addl $16, %esp # clean stack
movl %eax, -12(%ebp) # f = return value in AX
(3). Exit Code:
# leave
movl %ebp, %esp # SAME as leave
popl %ebp
ret # pop retPC on stack top into PC
2.5.2 Implement Functions in Assembly
Example 1: Get CPU registers. Since these functions are simple, they do
not need to establish and deallocate stack frames.
Example 2: mysum.c mysum.s
# (1) Entry: (establish stack frame)
pushl %ebp
movl %esp, %ebp
# (2) Function Body Code of mysum: compute x+y in AX register
movl 8(%ebp), %eax # AX = x
addl 12(%ebp), %eax # AX += y
# (3) Exit Code: (deallocate stack space and return)
movl %ebp, %esp
popl %ebp
ret
2.5.3 Call C functions from Assembly
Example 3: Access global variables and call printf().
2.6 Link Library
In Linux, there are two kinds of link libraries;
• static link library for static linking,
• dynamic link library for dynamic linking.
• Assume that we have a function
// mysum.c file
int mysum(int x, int y){ return x + y; }
• We would like to create a link library containing the object code of the mysum() function, which
can be called from different C programs, e.g.
// t.c file
int mysum(int x, int y); // declared here, defined in mysum.c
int main()
{
int sum = mysum(123,456);
}
2.6.1 Static Link Library
The following steps show how to create and use a static link library.
(1). gcc -c mysum.c # compile mysum.c into mysum.o
(2). ar rcs libmylib.a mysum.o # create static link library with member mysum.o
(3). gcc -static t.c -L. -lmylib # static compile-link t.c with libmylib.a as link library
(4). a.out # run a.out as usual
-L. specifies the library path (current directory), and -l specifies the library.
Note that the library (mylib) is specified without the prefix lib, as well as the
suffix .a
2.6.2 Dynamic Link Library
The following steps show how to create and use a dynamic link library.
(1). gcc -c -fPIC mysum.c # compile to Position Independent Code mysum.o
(2). gcc -shared -o libmylib.so mysum.o # create shared libmylib.so with mysum.o
(3). gcc t.c -L. -lmylib # generate a.out using shared library libmylib.so
(4). export LD_LIBRARY_PATH=./ # to run a.out, must export LD_LIBRARY_PATH=./
(5). a.out # run a.out. The dynamic loader ld.so will load libmylib.so
2.7 Makefile
2.7.1 Makefile Format
• A makefile consists of a set of targets, dependencies and rules.
• A target is usually a file to be created or updated. It may also be a
directive to, or a label to be referenced by, the make program.
• A target depends on a set of source files, object files or even other
targets, which are described in a Dependency List.
• Rules are the necessary commands to build the target by using the
Dependency List.
Makefile format
2.7.2 The make Program
• When the make program reads a makefile, it determines which targets
to build by comparing the timestamp of each target with the timestamps
of the files in its Dependency List.
• If any dependency has a newer timestamp than the target, make will
execute the rule associated with the target.

• Assume that we have a C program consisting of three source files:
(1). type.h file: // header file
int mysum(int x, int y); // types, constants, etc
(2). mysum.c file: // function in C
#include <stdio.h>
#include "type.h"
int mysum(int x, int y)
{
return x+y;
}
(3). t.c file: // main() in C
#include <stdio.h>
#include "type.h"
int main()
{
int sum = mysum(123,456);
printf("sum = %d\n", sum);
}
Normally, we would use the sh command
gcc -o myt t.c mysum.c
to generate a binary executable named myt.
2.7.3 Makefile Examples
Makefile Example 1
(1). Create a makefile named mk1 containing:
myt: type.h t.c mysum.c # target: dependency list
	gcc -o myt t.c mysum.c # rule: line MUST begin with a TAB
(2). Run make using mk1 as the makefile: make normally uses the
default makefile or Makefile, whichever is present in the current
directory. It can be directed to use a different makefile by the -f flag, as
in
make -f mk1
(3). Run the make command again. It will show the message
make: ‘myt’ is up to date
(4). On the other hand, make will execute the rule command again if any
of the files in the dependency list has changed. A simple way to modify
a file is by the touch command, which changes the timestamp of the file.
So if we enter the sh commands
touch type.h # or touch *.h, touch *.c, etc.
make -f mk1
make will recompile-link the source files to generate a new myt file.
(5). If we delete some of the file names from the dependency list, make
will not execute the rule command even if such files are changed.
Makefile Example 2: Macros in Makefile
(1). Create a makefile named mk2 containing:
CC = gcc # define CC as gcc
CFLAGS = -Wall # define CFLAGS as flags to gcc
OBJS = t.o mysum.o # define Object code files
INCLUDE = -Ipath # define path as an INCLUDE directory
myt: type.h $(OBJS) # target: dependency: type.h and .o files
	$(CC) $(CFLAGS) -o myt $(OBJS) $(INCLUDE)
• Macro-defined symbols are replaced with their values by $(symbol),
e.g. $(CC) is replaced with gcc, $(CFLAGS) is replaced with -Wall, etc.
• For each .o file in the dependency list, make will compile the
corresponding .c file into a .o file. This works only for .c files.
• Since all the .c files depend on .h files, we have to explicitly include
type.h (or any other .h files) in the dependency list also.
• Alternatively, we may define additional targets to specify the
dependency of .o files on .h files, as in
t.o: t.c type.h # t.o depend on t.c and type.h
	gcc -c t.c
mysum.o: mysum.c type.h # mysum.o depend on mysum.c and type.h
	gcc -c mysum.c
(2). Run make using mk2 as the makefile:
make -f mk2
The simple makefiles of Examples 1 and 2 are sufficient to compile-link
most small C programs.
Makefile Example 3: Make Target by Name
• When make runs on a makefile, it normally tries to build the first
target in the makefile.
• The behavior of make can be changed by specifying a target name,
which causes make to build the specific named target.
# CC, CFLAGS, OBJS and INCLUDE are assumed to be defined as in mk2
all: myt install # build all listed targets: myt, install
myt: t.o mysum.o # target: dependency list of .o files
	$(CC) $(CFLAGS) -o myt $(OBJS) $(INCLUDE)
t.o: t.c type.h # t.o depend on t.c and type.h
	gcc -c t.c
mysum.o: mysum.c type.h # mysum.o depend on mysum.c and type.h
	gcc -c mysum.c
install: myt # depend on myt: make will build myt first
	echo install myt to /usr/local/bin
	sudo mv myt /usr/local/bin/ # install myt to /usr/local/bin/
run: install # depend on install, which depends on myt
	echo run executable image myt
	myt || /bin/true # no make error 1 if main() return non-zero
clean:
	rm -f *.o 2> /dev/null # rm all *.o files
	sudo rm -f /usr/local/bin/myt # rm myt
Test the mk3 file by entering the following make commands:
(1). make all -f mk3 # build all targets: myt and install
(2). make install -f mk3 # build target myt and install myt
(3). make run -f mk3 # run /usr/local/bin/myt
(4). make clean -f mk3 # remove all listed files
Makefile Variables
• % is a wildcard variable similar to * in sh.
• Automatic variables are set by make after a rule is matched.
They provide access to elements from the target and dependency lists.
• $@ : name of current target.
• $< : name of first dependency
• $^ : names of all dependencies
• $* : stem of the current target, i.e. the part matched by % (the name without extension)
• $? : list of dependencies changed more recently than current target.
make also supports suffix rules, which are not targets but directives to
the make program.

DEPS = type.h # list ALL needed .h files
%.o: %.c $(DEPS) # for all .o files: if its .c or .h file changed
	$(CC) -c -o $@ $< # compile corresponding .c file again

%.o stands for all .o files and $@ is set to the current target name, i.e.
the current .o file name. This avoids defining separate targets for
individual .o files.
Makefile Example 4: Use make variables and suffix rules
CC = gcc
CFLAGS = -I.
OBJS = t.o mysum.o
AS = as # assume we have .s files in assembly also
DEPS = type.h # list all .h files in DEPS
.s.o: # for each fname.o, assemble fname.s into fname.o
	$(AS) -a $< -o $@ # -o $@ REQUIRED for .s files
.c.o: # for each fname.o, compile fname.c into fname.o
	$(CC) -c $< -o $@ # -o $@ optional for .c files
%.o: %.c $(DEPS) # for all .o files: if its .c or .h file changed
	$(CC) -c -o $@ $< # compile corresponding .c file again
myt: $(OBJS)
	$(CC) $(CFLAGS) -o $@ $^
The lines .s.o: and .c.o: are not targets but directives to the make program
by the suffix rule.
For each .o file, there should be a corresponding .s or .c file to build
if their timestamps differ.
$@ means the current target.
$< means the first file in the dependency list.
$^ means all files in the dependency list.

We may use make variables to write very general and compact
makefiles.
The downside is that such makefiles are rather hard to understand,
especially for beginning programmers.
Makefiles in Subdirectories
• The source files of a large project are usually organized into different
levels of directories, each with its own makefile.
• We can let make go into a subdirectory to execute the local makefile in
that directory by the command
(cd DIR; $(MAKE)) OR cd DIR && $(MAKE)
• After executing the local makefile in a subdirectory, control returns to
the current directory from where make continues.
Makefile Example 5: PMTX System Makefiles
• PMTX (Wang 2015) is a Unix-like operating system designed for the
Intel x86 architecture in 32-bit protected mode.
• The source files of PMTX are organized in three subdirectories:
Kernel : PMTX kernel files; a few GCC assembly files, mostly in C
Fs : file system source files; all in C
Driver : device driver source files; all in C

Top level Makefile, Kernel Makefile, Fs Makefile, Driver Makefile


2.8 The GDB Debugger
• The GNU Debugger (GDB) is an interactive debugger, which can
debug programs written in C, C++ and several other languages.
• In Linux, the command man gdb displays the manual pages of gdb,
which provides a brief description of how to use GDB.
2.8.1 Use GDB in Emacs IDE
1. Source Code: Under X-window, open a pseudo-terminal. Use
EMACS to create a Makefile, as shown below.
Makefile:
t: t.c
	gcc -g -o t t.c
2. Compile Source Code: When EMACS is running, it displays a menu and a
tool bar at the top of the edit window (Fig. 2.17).

Open the EMACS Tools menu and select Compile. EMACS will show a prompt
line at the bottom of the edit window
make -k
and wait for user response.
EMACS normally compile-links the source code by a makefile. If the
reader already has a makefile in the same directory as shown above, press the
Enter key to let EMACS continue.
Instead of using a makefile, the reader may also enter the command line manually.
3. Start up GDB: Open EMACS Tools menu and select Debugger.
EMACS will show a prompt line at the bottom of the edit window and
wait for user response.
gdb -i=mi t
Press Enter to start up the GDB debugger.
GDB will run in the upper window and display a menu and a
tool bar at the top of the EMACS edit window, as shown in Fig. 2.18.
The user may now enter GDB commands to debug the program. For
example, to set break points, enter the GDB commands
b main # set break point at main
b sub # set break point at sub
b 10 # set break point at line 10 in program

When the user enters the Run (r) command (or choose Run in the tool
bar), GDB will display the program code in the same GDB window.
Other frames/windows can be activated through the submenu
GDB-Frames or GDB-Windows.
4. GDB in Multi-Windows: From the GDB menu, choose Gud => GDB-
MI => Display Other Windows.
GDB will display GDB buffers in different windows, as shown in Fig.
2.19.
Figure 2.19 shows six (6) GDB windows, each displaying a specific GDB
buffer.
• Gud-t: GDB buffer for user commands and GDB messages
• t.c: Program source code to show progress of execution
• Stack frames: show stack frames of function calling sequence
• Locals/Registers: show local variables in the current executing function
• Input/output: for program I/O
• Breakpoints: display current break points settings
It also shows some of the commonly used GDB commands in a tool
bar, e.g. Run, Continue, Next line, Step line.
Figure 2.20 shows that the program execution is now inside sub() and
the execution already passed the statements before
printf("return from sub\n");
(5). Additional GDB Commands:
• At each break point or while executing in single line mode, the user
may enter GDB commands
• either manually,
• by the GDB tool bar or by choosing submenu items in the Gud menu,
which includes all the commands in the GDB tool bar.
• The following lists some additional GDB commands and their
meanings.
2.9 Structures in C
A structure is a composite data type containing a collection of variables
or data objects.

Assume that we need a node structure containing the following
fields.
next : a pointer to the next node structure;
key : an integer;
name : an array of 64 chars;
Such a structure may be defined as
struct node{
struct node *next;
int key;
char name[64];
};
Then “struct node” can be used as a derived type to define variables of that
type, as in
struct node x, *nodePtr;

Alternatively, we may define “struct node” as a derived type by the
typedef statement.
typedef struct node{
struct node *next;
int key;
char name[64];
}NODE;
NODE is a derived type, which can be used to define variables of that
type.
NODE x, *nodePtr;
The following summarizes the properties of C structures.
(1). When defining a C structure, every field of the structure must have
a type already known to the compiler, except for the self-referencing
pointers.
This is because pointers are always the same size, e.g. 4 bytes in 32-bit
architecture.
In the above NODE type, the field next is a
struct node *next;
The compiler knows struct node is a type (despite being incomplete yet)
and how many bytes to allocate for the next pointer.
In contrast, in the following statements
typedef struct node{
NODE *next; // error
int key;
char name[64];
}NODE;
the compiler does not yet know what the NODE type is, even though next is a
pointer.
(2). Each C structure data object is allocated a piece of contiguous
memory.
The individual fields of a C structure are accessed by using the .
operator.
At run time, each field is accessed as an offset from the beginning
address of the structure.
(3). The size of a structure can be determined by sizeof(struct type). The
C compiler will calculate the size in total number of bytes of the
structure.
Due to memory alignment constraints, the C compiler may pad some of
the fields of a structure with extra bytes.
If needed, the user may define C structures with the PACKED attribute,
which prevents the C compiler from padding the fields with extra bytes.
typedef struct node{
struct node *next;
int key;
char name[2];
}__attribute__((packed, aligned(1))) NODE;
In this case, on a 32-bit machine with 4-byte pointers, the size of the
NODE structure will be 10 bytes.
Without the packed attribute, it would be 12 bytes because the C
compiler would pad the name field with 2 extra bytes, making every
NODE object a multiple of 4 bytes for memory alignment.
Alternatively, we can also use
#pragma pack(1) or __attribute__((packed))
(4). Assume that NODE x, y; are two structures of the same type. Rather
than copying the individual fields of a structure, we can assign x to y by
the C statement y = x.
(5). Unions in C are similar to structures.
union node{
int *ptr; // pointer to integer
int ID; // 4-byte integer
char name[32]; // 32 chars
}x; // x is a union of 3 fields
Whereas each member in a structure has a unique memory area, all members
of a union share the same memory area.
The size of a union is determined by the largest member.
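The effects of padding, packing and unions on object size can be checked directly with sizeof and offsetof. The sketch below assumes GCC or Clang, since __attribute__((packed)) is a compiler extension; the type and function names (struct padded, struct packed, union sample, padded_size and so on) are hypothetical.

```c
#include <stddef.h>

struct padded {             /* the compiler may add padding after name[] */
    struct padded *next;
    int key;
    char name[2];
};

struct packed {             /* packed: no padding bytes are inserted */
    struct packed *next;
    int key;
    char name[2];
} __attribute__((packed));

union sample {              /* all members share the same storage */
    int *ptr;
    int ID;
    char name[32];
};

size_t padded_size(void) { return sizeof(struct padded); }
size_t packed_size(void) { return sizeof(struct packed); }
size_t union_size(void)  { return sizeof(union sample); }

/* Offset of the key field from the beginning of the structure. */
size_t key_offset(void)  { return offsetof(struct padded, key); }
```

With 4-byte pointers and 4-byte ints this gives 12 and 10 bytes for the padded and packed structures, matching the figures above. With 8-byte pointers the typical values are 16 and 14 bytes.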
2.10.10 Open-Ended C Structures
Consider the following structure, in which the last member is an array of
unspecified size.
struct node{
struct node *next;
int ID;
char name[ ]; // unspecified array size
};
• name[ ] denotes an incompletely specified field, which must be the
last entry.
• The size of an open-ended structure is determined by the specified
fields.
• For the above example, the structure size is 8 bytes on a 32-bit machine.
• The user must allocate the needed memory for the actual structure, as
in
struct node *sp = malloc(sizeof(struct node) + 32);
strcpy(sp->name, "this is a test string");
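Instead of reserving a fixed 32 bytes, the allocation can be sized to the actual string. The following sketch wraps that idea in a helper; make_node is a hypothetical name, and the caller is assumed to release the node with free().

```c
#include <stdlib.h>
#include <string.h>

struct node {
    struct node *next;
    int ID;
    char name[];            /* open-ended array member, must be the last field */
};

/* Allocates a node with just enough room for a copy of name.
   Returns NULL if malloc() fails; the caller frees the node. */
struct node *make_node(int id, const char *name)
{
    size_t len = strlen(name) + 1;      /* include the terminating '\0' */
    struct node *sp = malloc(sizeof(struct node) + len);
    if (sp == NULL)
        return NULL;
    sp->next = NULL;
    sp->ID = id;
    memcpy(sp->name, name, len);        /* copy the string into the extra space */
    return sp;
}
```

Sizing the allocation from strlen() avoids both wasted space for short names and buffer overflow for names longer than a fixed limit.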
