Ch3 Process Management
Unix/Linux
3.1 Multitasking
• In general, multitasking refers to the ability to perform several
independent activities at the same time.
• In computing, multitasking refers to the execution of several
independent tasks at the same time.
• In a uniprocessor (single CPU) system, only one task can execute at a
time, so tasks take turns on the CPU. This logical parallelism is called
concurrency.
• In multiprocessor systems with many CPUs or processor cores, tasks
can execute on different CPUs in parallel in real time.
• Multitasking is the basis of all operating systems. It is also the basis of
concurrent programming in general.
3.2 The Process Concept
• An execution image is a memory area containing the execution’s
code, data and stack.
• A process is the execution of an image.
• It is a sequence of executions regarded by the OS kernel as a single
entity for using system resources.
• System resources include memory space, I/O devices and, most
importantly, CPU time.
• Each process is represented by a unique data structure, called the
Process Control Block (PCB) or Task Control Block (TCB)
• We shall simply call it the PROC structure.
• In a real OS, the PROC structure may contain many fields and be quite large.
• To begin with, we shall define a very simple PROC structure to represent
processes.
typedef struct proc{
    struct proc *next;   // next proc pointer
    int *ksp;            // saved sp: at byte offset 4
    int pid;             // process ID
    int ppid;            // parent process pid
    int status;          // PROC status=FREE|READY, etc.
    int priority;        // scheduling priority
    int kstack[1024];    // process execution stack
}PROC;
• The field ksp is the saved stack pointer.
• When a process gives up the CPU, it saves the execution context in
stack and saves the stack pointer in PROC.ksp for resumption later.
• kstack is the process stack when it is executing.
• An OS kernel usually defines a finite number of PROC structures in
its data area, denoted by
PROC proc[NPROC]; // NPROC a constant, e.g. 64
In a single CPU system, only one process can be executing at a time.
The OS kernel usually uses a global PROC pointer, running or current,
to point at the PROC that is currently executing.
• In a multiprocessor OS with multiple CPUs, many processes may be
executing on different CPUs in parallel in real time.
• In an MP system, running[NCPU] may be an array of pointers, each
pointing at a PROC running on a specific CPU.
3.3 A Multitasking System
• We begin with a programming example, which is designed to illustrate
the principles of multitasking, context switching and processes.
• The program implements a multitasking environment, which simulates
the kernel mode operations of an operating system.
• The multitasking system, denoted by MT, consists of the following
components.
The type.h file defines system constants and a simple PROC structure to
represent processes.
The ts.s file implements process context switching in 32-bit GCC
assembly code.
The queue.c file implements queue and list operation functions.
The t.c file defines the MT system data structures, system initialization
code and process management functions.
Figure 3.1 shows the sample outputs of
running the MT multitasking system.
3.3.5 Explanations of the Multitasking System Code
(1). The Virtual CPU: The MT system is compile-linked under Linux as
gcc -m32 t.c ts.s
The entire MT system runs as a Linux process in user mode.
Within the Linux process, we create independent execution entities
called tasks and schedule them to run inside the Linux process
by our own scheduling algorithm.
To the tasks in the MT system, the Linux process behaves as a
virtual CPU.
(2). init(): When the MT system starts, main() calls init() to initialize the
system.
init() initializes the PROC structures and enters them into a freeList. It
also initializes the readyQueue to empty.
Then it uses proc[0] to create P0 as the initial running process. P0 has
the lowest priority 0.
All other tasks will have priority 1, so that they will take turns to run
from the readyQueue.
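The initialization described above can be sketched as follows. This is a minimal sketch, not the MT system's actual t.c: the table size NPROC = 9, the status constants FREE and READY, and the helper list_length() are assumptions made for illustration.

```c
#include <stddef.h>

#define NPROC 9                 /* assumed table size for this sketch */
#define SSIZE 1024              /* stack size in cells, as in PROC.kstack */
enum { FREE, READY };           /* assumed status values */

typedef struct proc {
    struct proc *next;          /* next proc pointer */
    int *ksp;                   /* saved stack pointer */
    int pid;
    int ppid;
    int status;
    int priority;
    int kstack[SSIZE];
} PROC;

PROC proc[NPROC];
PROC *freeList, *readyQueue, *running;

/* Link all PROCs into freeList, then take proc[0] off the list as P0. */
void init(void)
{
    for (int i = 0; i < NPROC; i++) {
        proc[i].pid = i;
        proc[i].status = FREE;
        proc[i].priority = 0;
        proc[i].next = (i + 1 < NPROC) ? &proc[i + 1] : NULL;
    }
    freeList = &proc[0];
    readyQueue = NULL;          /* readyQueue starts empty */

    PROC *p = freeList;         /* allocate proc[0] from freeList */
    freeList = p->next;
    p->next = NULL;
    p->ppid = 0;                /* P0 is its own parent */
    p->status = READY;
    p->priority = 0;            /* lowest priority */
    running = p;                /* P0 is the initial running process */
}

/* Hypothetical helper for inspection: count the PROCs on a list. */
int list_length(PROC *list)
{
    int n = 0;
    for (PROC *p = list; p; p = p->next)
        n++;
    return n;
}
```

After init() returns, running points at proc[0] and the freeList holds the remaining NPROC - 1 entries.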
(3). P0 calls kfork() to create a child process P1 with priority 1 and enter
it into the ready queue.
Then P0 calls tswitch(), which will switch task to run P1.
(4). tswitch(): The tswitch() function implements process context
switching.
It acts as a process switch box, where one process goes in and, in
general, another process emerges.
tswitch() consists of three separate steps.
(4).1. SAVE part of tswitch(): When an executing task calls tswitch(), it
saves the return address on stack and enters tswitch() in assembly code.
In tswitch(), the SAVE part saves CPU registers into the calling task’s
stack and saves the stack pointer into proc.ksp.
The Intel x86 CPU in 32-bit mode has many registers, but only the
registers eax, ebx, ecx, edx, ebp, esi, edi and eflags are
visible to a Linux process in user mode, which is the virtual CPU of the
MT system.
So we only need to save and restore these registers of the virtual CPU.
(4).2. scheduler(): After executing the SAVE part of tswitch(), the task
calls scheduler() to pick the next running task.
In scheduler(), if the calling task is still READY to run, it calls
enqueue() to put itself into the readyQueue by priority.
Otherwise, it will not be in the readyQueue, which makes it non-
runnable.
Then it calls dequeue(), which returns the first PROC removed from
readyQueue as the new running task.
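A minimal sketch of the queue operations and scheduler() described in (4).2. The PROC here is reduced to the fields the queue needs, and the rule that equal-priority tasks are served first-in-first-out is an assumption, chosen because it makes priority 1 tasks take turns.

```c
#include <stddef.h>

enum { FREE, READY };           /* assumed status values */

typedef struct proc {           /* reduced PROC for this sketch */
    struct proc *next;
    int pid;
    int priority;
    int status;
} PROC;

PROC *running, *readyQueue;

/* Insert p by priority, highest first; FIFO among equal priorities. */
void enqueue(PROC **queue, PROC *p)
{
    PROC **pp = queue;
    while (*pp && (*pp)->priority >= p->priority)
        pp = &(*pp)->next;
    p->next = *pp;
    *pp = p;
}

/* Remove and return the first PROC of the queue, or NULL if empty. */
PROC *dequeue(PROC **queue)
{
    PROC *p = *queue;
    if (p) {
        *queue = p->next;
        p->next = NULL;
    }
    return p;
}

/* Pick the next running task. */
void scheduler(void)
{
    if (running->status == READY)       /* still runnable: back into queue */
        enqueue(&readyQueue, running);
    running = dequeue(&readyQueue);     /* highest-priority READY task */
}
```

Because P0 has priority 0, it always sits at the end of the readyQueue and is selected only when no other task is READY.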
(4).3. RESUME part of tswitch(): When execution returns from
scheduler(), running may have changed to point at the PROC of a
different task. Whichever PROC running points at is now the current
running task.
The RESUME part of tswitch() sets the CPU’s stack pointer to the saved
stack pointer of the current running task.
Then it pops the saved registers, followed by RET, causing the current
running task to return to where it called tswitch() earlier.
(5). kfork(): The kfork() function creates a child task and enters it into
the readyQueue.
Every newly created task begins execution from the same body()
function.
Although the new task never existed before, we may pretend that it not
only existed before but also ran before.
It is not running now because it called tswitch() to
give up the CPU earlier.
If so, its stack must contain a frame saved by the SAVE part of
tswitch(), and its saved ksp must point at the stack top.
Since the new task never really ran before, we may assume that its stack
was empty and all CPU register contents are 0 when it called tswitch().
Thus, in kfork(), we initialize the stack of the new task as follows.
• These are done by the following code segment in kfork().
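A minimal sketch of such a stack initialization. It assumes the saved frame has nine cells: the return address plus the eight registers listed in (4).1, in that order. The helper name init_kstack() is hypothetical, and the stack cells are intptr_t rather than int (an assumption) so that a code address fits on both 32-bit and 64-bit hosts.

```c
#include <stdint.h>

#define SSIZE 1024

typedef struct proc {
    struct proc *next;
    intptr_t *ksp;              /* saved stack pointer */
    int pid;
    intptr_t kstack[SSIZE];     /* stack grows from high index to low */
} PROC;

/* Placeholder for the function every new task starts in. */
void body(void)
{
}

/*
 * Build the frame that the SAVE part of tswitch() would have left
 * (assumed layout):
 *   index: SSIZE-1 -2   -3   -4   -5   -6   -7   -8   -9
 *          retPC   eax  ebx  ecx  edx  ebp  esi  edi  eflags
 */
void init_kstack(PROC *p, void (*entry)(void))
{
    for (int i = 1; i <= 9; i++)            /* all saved registers = 0 */
        p->kstack[SSIZE - i] = 0;
    p->kstack[SSIZE - 1] = (intptr_t)entry; /* RET will jump to entry */
    p->ksp = &p->kstack[SSIZE - 9];         /* ksp points at the stack top */
}
```

In kfork(), the call would be init_kstack(p, body) for the newly allocated PROC p.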
When the new task starts to run, it begins by executing the RESUME
part of tswitch(),
which causes it to return to the entry address of the body() function.
When execution first enters body(), the task’s stack is logically empty.
In practice, the body() function never returns, so there is no need for a
return address in the stack.
(5). body(): For demonstration purpose, all tasks are created to execute the
same body() function.
This shows the difference between processes and programs. Many
processes may execute the same program code, but each process executes
in its own context.
For instance, all (automatic) local variables in body() are private to the
process since they are all allocated in the per process stack.
If the body() function calls other functions, all the calling sequences are
maintained in the per process stack.
While executing in body(), a process prompts for a command char = [f|s|q],
where
f : kfork a new child process to execute body();
s : switch process;
q : terminate and return the PROC as FREE to freeList.
(6). The Idle Task P0: P0 is special in that it has the lowest priority
among all the tasks.
After system initialization, P0 creates P1 and switches to run P1. P0 will
run again if and only if there are no runnable tasks.
In that case, P0 simply loops. It will switch to another task whenever
readyQueue becomes nonempty.
In the base MT system, P0 runs again if all other processes have
terminated.
To end the MT system, the user may enter Control-C to kill the Linux
process.
(7). Running the MT Multitasking System: Under Linux, enter
gcc -m32 t.c ts.s
to compile-link the MT system and run the resulting a.out.
3.4 Process Synchronization
• Process synchronization refers to the rules and mechanisms used to
control and coordinate process interactions to ensure their proper
executions.
• The simplest tools for process synchronization are sleep and wakeup
operations.
3.4.1 Sleep Operation
To implement the sleep operation, we can add an event field to the
PROC structure and
implement a ksleep(int event) function, which lets a process go to sleep.
PROC structure is modified to include the added fields shown in bold
face.
/************ Algorithm of ksleep(int event) **************/
1. record event value in PROC.event: running->event = event;
2. change status to SLEEP: running->status = SLEEP;
3. for ease of maintenance, enter caller into a PROC *sleepList
enqueue(&sleepList, running);
4. give up CPU: tswitch();
Since a sleeping process is not in the readyQueue, it’s not runnable until
it is woken up by another process.
3.4.2 Wakeup Operation
Many processes may sleep on the same event, which is natural since all
of them may need the same resource.
In that case, all such processes would go to sleep on the same event
value.
When an awaited event occurs, another execution entity, which may be
either a process or an interrupt handler, will call kwakeup(event), which
wakes up ALL the processes sleeping on the event value.
• It is noted that an awakened process may not run immediately.
• It is only put into the readyQueue, waiting for its turn to run.
• When an awakened process runs, it must try to get the resource again
if it was trying to get a resource before the sleep.
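The sleep and wakeup operations can be sketched together as follows. This is a minimal sketch: the PROC is reduced to the fields needed, tswitch() is an empty stand-in for the assembly context switch, and the return value of kwakeup() (the number of processes woken) is an addition made here so the result can be inspected.

```c
#include <stddef.h>

enum { FREE, READY, SLEEP };    /* assumed status values */

typedef struct proc {           /* reduced PROC for this sketch */
    struct proc *next;
    int pid;
    int status;
    int priority;
    int event;                  /* event value the process sleeps on */
} PROC;

PROC *running, *readyQueue, *sleepList;

/* Insert p by priority, highest first; FIFO among equal priorities. */
void enqueue(PROC **queue, PROC *p)
{
    PROC **pp = queue;
    while (*pp && (*pp)->priority >= p->priority)
        pp = &(*pp)->next;
    p->next = *pp;
    *pp = p;
}

/* Stand-in for the real tswitch(); no context switch happens here. */
void tswitch(void)
{
}

void ksleep(int event)
{
    running->event = event;         /* 1. record event value */
    running->status = SLEEP;        /* 2. change status to SLEEP */
    enqueue(&sleepList, running);   /* 3. enter caller into sleepList */
    tswitch();                      /* 4. give up CPU */
}

/* Wake up ALL processes sleeping on event; return how many were woken. */
int kwakeup(int event)
{
    int n = 0;
    PROC **pp = &sleepList;
    while (*pp) {
        PROC *p = *pp;
        if (p->event == event) {
            *pp = p->next;              /* unlink from sleepList */
            p->status = READY;
            enqueue(&readyQueue, p);    /* runnable, but not running yet */
            n++;
        } else {
            pp = &p->next;
        }
    }
    return n;
}
```

Note that kwakeup() only moves processes to the readyQueue; it never switches process itself.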
3.5 Process Termination
A process may terminate in two possible ways:
• Normal termination: The process calls exit(value), which issues the
_exit(value) system call to execute kexit(value) in the OS kernel.
• Abnormal termination: The process terminates abnormally due to a
signal.
• In either case, when a process terminates, it eventually calls kexit() in
the OS kernel.
3.5.1 Algorithm of kexit()
/**************** Algorithm of kexit(int exitValue) *****************/
1. Erase process user-mode context, e.g. close file descriptors,
release resources, deallocate user-mode image memory, etc.
2. Dispose of children processes, if any
3. Record exitValue in PROC.exitCode for parent to get
4. Become a ZOMBIE (but do not free the PROC)
5. Wakeup parent and, if needed, also the INIT process P1
• All processes in the MT system run in the simulated kernel mode of an
OS.
• They do not have any user mode context.
• In Unix/Linux, processes only have the very loose parent-child
relation but their execution environments are all independent.
• Thus, in Unix/Linux a process may die any time.
• If a process with children dies first, all the children processes would
have no parent anymore, i.e. they become orphans.
What to do with such orphans?
• There must be a process which should not die if there are other
processes still existing.
• In all Unix-like systems, the process P1, which is also known as the
INIT process, is chosen to play this role.
• When a process dies, it sends all the orphaned children, dead or alive,
to P1, i.e. they become P1's children.
• We shall also designate P1 in the MT system as such a process.
3.5.2 Process Family Tree
The process family tree is implemented as a binary tree by a pair of
child and sibling pointers in each PROC, as in
PROC *child, *sibling, *parent;
child points to the first child of a process.
sibling points to a list of other children of the same parent.
parent points to its parent.
The process tree shown on the left-hand side of Fig. 3.2 can be
implemented as the binary tree shown on the right-hand side.
Each PROC has an exitCode field, which is the process exitValue when
it terminates.
After recording exitValue in PROC.exitCode, the process changes its
status to ZOMBIE but does not free the PROC structure.
• Then the process calls kwakeup(event) to wake up its parent, where
event must be the same unique value used by both the parent and child
processes, e.g. the address of the parent PROC structure or the parent
pid.
• It also wakes up P1 if it has sent any orphans to P1.
• The final act of a dying process is to call tswitch() for the last time.
• After these, the process is essentially dead but still has a dead body in
the form of a ZOMBIE PROC, which will be buried (set FREE) by the
parent process through the wait operation.
3.5.3 Wait for Child Process Termination
At any time, a process may call the kernel function
pid = kwait(int *status)
to wait for a ZOMBIE child process.
If successful, the returned pid is the ZOMBIE child's pid and
status contains the exitCode of the ZOMBIE child.
In addition, kwait() also releases the ZOMBIE child PROC back to the
freeList for reuse.
When a process terminates, it must issue
kwakeup(running->parent);
to wake up the parent.
Note that each kwait() call handles only one ZOMBIE child.
If a process has many children, it may have to call kwait() multiple
times to dispose of all the dead children.
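The bookkeeping in kexit() and kwait() can be sketched on the child/sibling tree as follows. This is a simplified sketch, not the MT system's code: the function names kexit_sketch() and kwait_sketch() are hypothetical, the wakeups and the final tswitch() are omitted, and where a real kwait() would sleep until a child dies, this version returns 0 instead.

```c
#include <stddef.h>

enum { FREE, READY, SLEEP, ZOMBIE };    /* assumed status values */

typedef struct proc {                   /* reduced PROC for this sketch */
    struct proc *next;
    struct proc *child, *sibling, *parent;
    int pid, status, exitCode;
} PROC;

PROC *freeList;

/* Append c to the end of parent's child list. */
void add_child(PROC *parent, PROC *c)
{
    PROC **pp = &parent->child;
    while (*pp)
        pp = &(*pp)->sibling;
    c->sibling = NULL;
    c->parent = parent;
    *pp = c;
}

/* Steps 2 to 4 of kexit(): send children to p1, record exitValue,
 * become a ZOMBIE without freeing the PROC. */
void kexit_sketch(PROC *p, PROC *p1, int exitValue)
{
    while (p->child) {
        PROC *c = p->child;
        p->child = c->sibling;
        add_child(p1, c);           /* orphan, dead or alive, goes to P1 */
    }
    p->exitCode = exitValue;
    p->status = ZOMBIE;
}

/* Returns a ZOMBIE child's pid, -1 if p has no children, or 0 where a
 * real kernel would sleep waiting for a child to die. */
int kwait_sketch(PROC *p, int *status)
{
    if (!p->child)
        return -1;
    for (PROC **pp = &p->child; *pp; pp = &(*pp)->sibling) {
        PROC *c = *pp;
        if (c->status == ZOMBIE) {
            int pid = c->pid;
            *status = c->exitCode;
            *pp = c->sibling;       /* unlink from the child list */
            c->sibling = NULL;
            c->parent = NULL;
            c->status = FREE;       /* bury the dead body */
            c->next = freeList;
            freeList = c;
            return pid;
        }
    }
    return 0;
}
```

Each call disposes of at most one ZOMBIE child, which is why a parent with several dead children must call it repeatedly.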
A process may terminate first without waiting for any dead child.
When a process dies, all of its children become children of P1.
In a real system, P1 executes in an infinite loop, in which it repeatedly
waits for dead children, including adopted orphans.
In a Unix-like system, the INIT process P1 wears many hats.
• It is the ancestor of all processes except P0. In particular, it is the
grand daddy of all user processes since all login processes are children
of P1.
• All orphans are sent to his house and call him Papa.
• It keeps looking for ZOMBIEs to bury their dead bodies.
• In a Unix-like system, if the INIT process P1 dies or gets stuck, the
system would stop functioning because no user can log in again and the
system will soon be full of rotten corpses.
3.7 Processes in Unix/Linux
3.7.1 Process Origin
• When an operating system starts, the OS kernel’s startup code creates
an initial process with PID=0 by brute force, i.e. by allocating a PROC
structure, usually proc[0].
• It initializes the PROC contents and lets running point at proc[0]. From
then on, the system is executing the initial process P0.
• P0 continues to initialize the system, which includes both the system
hardware and kernel data structures.
• Then it mounts a root file system to make files available to the system.
• After initializing the system, P0 forks a child process P1 and switches
process to run P1 in user mode.
3.7.2 INIT and Daemon Processes
• When the process P1 starts to run, it changes its execution image to the
INIT program.
• Thus, P1 is commonly known as the INIT process because its execution
image is the init program.
• P1 starts to fork many children processes. Most children processes of P1 are
intended to provide system services.
• They run in the background and do not interact with any user. Such
processes are called daemon processes.
syslogd: log daemon process
inetd : Internet service daemon process
httpd : HTTP server daemon process etc.
3.7.3 Login Processes
P1 also forks many LOGIN processes, one on each terminal, for users to
login.
Each LOGIN process opens three FILE streams associated with its own
terminal.
The three file streams are stdin for standard input, stdout for standard
output and stderr for standard error messages.
Each file stream is a pointer to a FILE structure in the process HEAP
area.
Each FILE structure records a file descriptor (number), which is 0 for
stdin, 1 for stdout and 2 for stderr.
Each LOGIN process displays a login: prompt on its stdout, waiting for users to
log in.
User accounts are maintained in the files /etc/passwd and /etc/shadow.
Each user account has a line in the /etc/passwd file of the form
name:x:uid:gid:description:home:program
name is the user login name
x means check password during login
uid is the user ID
gid is the user’s group ID
description is a free-form comment, such as the user’s full name
home is the user’s home directory
program is the initial program to execute after the user login
Additional user account information is maintained in the /etc/shadow
file.
Each line of the shadow file contains the encrypted user password,
followed by optional aging limit information, such as expiration date
and time, etc.
When a user tries to login with a login name and password, Linux will
check both the /etc/passwd and /etc/shadow files to authenticate the
user.
3.7.4 Sh Process
When a user logs in successfully, the LOGIN process acquires the user’s
gid and uid, thus becoming the user’s process.
It changes directory to the user’s home directory and executes the listed
program, which is usually the command interpreter sh.
The user process now executes sh, so it is commonly known as
the sh process.
It prompts the user for commands to execute. Some special commands,
such as cd (change directory), exit, logout, etc. are performed by sh
itself directly.
Most other commands are executable files in the various bin directories,
such as /bin, /sbin, /usr/bin, /usr/local/bin, etc.
For each (executable file) command, sh forks a child process and waits
for the child to terminate.
The child process changes its execution image to the command file and
executes the command program.
When the child process terminates, it wakes up the parent sh, which
collects the child process termination status, frees the child PROC
structure and prompts for another command, etc.
In addition to simple commands, sh also supports I/O redirections and
multiple commands connected by pipes.
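The fork, exec and wait cycle that sh performs for each executable command can be sketched as follows. This is a minimal sketch, not the source of any real shell; the function name run_command() and the use of exit value 127 for a failed exec are choices made here for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the command argv[0] with its arguments and wait for it.
 * Returns the child's exit value, or -1 on failure or if the child
 * was terminated by a signal. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed: no child created */

    if (pid == 0) {                 /* child */
        execvp(argv[0], argv);      /* change image to the command file */
        _exit(127);                 /* reached only if exec failed */
    }

    int status;                     /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_command() with the argv array { "ls", "-l", NULL } runs ls -l and returns its exit value once it terminates.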
3.7.5 Process Execution Modes
In Unix/Linux, a process may execute in two different modes: Kernel
mode and User mode.
In each mode, a process has an execution image, as shown in Fig. 3.4.
In general, the Umode images of processes are all different.
In Kmode they share the same Kcode, Kdata and Kheap,
which are those of the OS Kernel, but each process has its own Kstack.
A process migrates between Kmode and Umode many times during its
life time.
Every process comes into existence and begins execution in Kmode.
While in Kmode, it can come to Umode very easily by changing CPU's
status register from K to U mode.
However, once in Umode it cannot change CPU's status arbitrarily for
obvious reasons.
A Umode process may enter Kmode only by one of three possible ways:
(1). Interrupts: Interrupts are signals from external devices to the CPU,
requesting for CPU service.
While executing in Umode, the CPU’s interrupts are enabled so that it
will respond to any interrupts.
When an interrupt occurs, the CPU will enter Kmode to handle the
interrupt, which causes the process to enter Kmode.
(2). Traps: Traps are error conditions, such as invalid address, illegal
instruction, divide by 0, etc., which are recognized by the CPU as
exceptions, causing it to enter Kmode to deal with the error.
In Unix/Linux, the kernel trap handler converts the trap reason to a
signal number and delivers the signal to the process.
For most signals, the default action of a process is to terminate.
(3). System Calls: A system call, or syscall for short, is a mechanism
which allows a Umode process to enter Kmode to execute Kernel
functions.
When a process finishes executing Kernel functions, it returns to
Umode with the desired results and a return value, which is normally 0
for success or -1 for error.
In case of error, the external global variable errno (in errno.h) contains
an ERROR code which identifies the error. The user may use the library
function
perror("error message");
to print an error message.
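A short sketch of the error convention just described, using open() as the failing syscall. The function name try_open() is hypothetical; errno, perror(), open() and close() are the standard library interfaces.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to open path for reading.
 * Returns 0 on success, or the errno value if the syscall failed. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);  /* returns -1 on error */
    if (fd < 0) {
        int err = errno;            /* save it: later calls may change errno */
        perror(path);               /* prints "path: reason" to stderr */
        return err;
    }
    close(fd);
    return 0;
}
```

Opening a file that does not exist makes try_open() return ENOENT and print a message ending in "No such file or directory".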
Every time a process enters Kmode, it may not return to Umode
immediately.
In some cases, it may not return to Umode at all.
For example, the _exit() syscall and most traps would cause the process
to terminate in kernel, so that it never returns to Umode again.
When a process is about to exit Kmode, the OS kernel may switch
process to run another process of higher priority.
3.8 System Calls for Process Management
The following system calls in Linux are related to process management.
fork(), wait(), exec(), exit()
Each is a library function which issues an actual syscall
int syscall(int a, int b, int c, int d);
where a is the syscall number and b, c, d are parameters to the
corresponding kernel function.
In Intel x86 based Linux, syscall is implemented by the assembly
instruction INT 0x80. Newer x86 processors also provide the faster
sysenter/sysexit and syscall/sysret instruction pairs for the same purpose.
3.8.1 fork()
Usage: int pid = fork();
fork() creates a child process and returns the child's pid or -1 if fork()
fails.
Figure 3.5 shows the actions of fork().
(1). The left-hand side of Fig. 3.5 shows the images of a process Pi,
which issues the syscall pid=fork() in Umode.
(2). Pi goes to Kmode to execute the corresponding kfork() function in
kernel, in which it creates a child process PROCj with its own Kmode
stack and Umode image, as shown in the right-hand side of the figure.
The Umode image of Pj is an IDENTICAL copy of Pi's Umode
image. Therefore, Pj's code section also has the statement
pid=fork();
Furthermore, kfork() lets the child inherit all opened files of the parent.
Thus, both the parent and child can get inputs from stdin and display to
the same terminal of stdout and stderr.
(3). After creating a child, Pi returns to the statement
pid = fork(); // parent returns child PID
in its own Umode image with the child's pid = j. It returns -1 if fork()
failed, in which case no child is created.
(4). When the child process Pj runs, it exits Kmode and returns to the
same statement
pid = fork(); // child returns 0
in its own Umode image with a 0 return value.
Thus the program code should be written as
int pid = fork();
if (pid < 0)
{
    // fork() failed: no child was created
}
else if (pid)
{
    // parent executes this part
}
else
{
    // child executes this part
}
3.8.2 Process Execution Order
After fork(), which process will run next depends on their scheduling
priorities, which change dynamically.
In addition to sleep(seconds), which suspends a calling process for a
number of seconds, Unix/Linux also provides the following syscalls,
which may affect the execution order of processes.
• nice(int inc): nice() increases the process priority value by a specified
value, which lowers the process scheduling priority (larger priority
value means lower priority).
• In a non-preemptive kernel, process switch may not occur
immediately. It occurs only when the executing process is about to exit
Kmode to return to Umode.
• sched_yield(void): sched_yield() causes the calling process to
relinquish the CPU, allowing other processes of higher priority to run
first. However, if the calling process still has the highest priority, it
will continue to run.
3.8.3 Process Termination
A process executing a program image may terminate in two possible
ways.
(1). Normal Termination: the main() function of every C program is
called by the C startup code crt0.o.
If the program executes successfully, main() eventually returns to
crt0.o, which calls the library function exit(0) to terminate the process.
The exit(value) function does some clean-up work first, such as
flushing stdout, closing I/O streams, etc. Then it issues an _exit(value)
system call, which causes the process to enter the OS kernel to
terminate.
A 0 exit value usually means normal termination.
If desired, a process may call exit(value) directly from anywhere inside
a program without going back to crt0.o.
Even more drastically, a process may issue a _exit(value) system call to
terminate immediately without doing the clean-up work first.
When a process terminates in kernel, it records the value in the
_exit(value) system call as the exit status in the process PROC structure,
notifies its parent and becomes a ZOMBIE.
The parent process can find the ZOMBIE child, get its pid and exit
status by the
pid = wait(int *status);
system call, which also releases the ZOMBIE child PROC structure as
FREE, allowing it to be reused for another process.
(2). Abnormal Termination: While executing a program, the process
may encounter an error condition, such as illegal address, privilege
violation, divide by zero, etc. which is recognized by the CPU as an
exception.
When a process encounters an exception, it is forced into the OS
kernel by a trap.
The kernel’s trap handler converts the trap error type to a magic number,
called SIGNAL, and delivers the signal to the process, causing it to
terminate.
In this case, the process terminates abnormally and the exit status of the
ZOMBIE process is the signal number.
In addition to trap errors, signals may also originate from hardware or
from other processes.
For example, pressing the Control-C key generates a hardware
interrupt, which sends a number 2 signal (SIGINT) to all processes on
the terminal, causing them to terminate.
Alternatively, a user may use the command
kill -s signal_number pid # signal_number=1 to 31
to send a signal to a target process identified by pid.
For most signal numbers, the default action of a process is to terminate.
In either case, when a process terminates, it eventually calls a kexit()
function in the OS kernel. The Unix/Linux kernel will also erase the user
mode image of the terminating process.
In Linux, each PROC has a 2-byte exitCode field, which records the
process exit status.
The high byte of exitCode is the exitValue in the _exit(exitValue)
syscall, if the process terminated normally.
The low byte is the signal number that caused it to terminate
abnormally.
Since a process can only die once, only one of the bytes has meaning.
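Abnormal termination can be observed from the parent with a small sketch like the one below. The function name status_of_signaled_child() is hypothetical. The child sends itself a signal whose default action is to terminate, and the parent returns the raw status reported by waitpid(), which can then be decoded with the standard WIFSIGNALED() and WTERMSIG() macros.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that is terminated by signal sig.
 * Returns the child's raw wait status, or -1 on error. */
int status_of_signaled_child(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child */
        signal(sig, SIG_DFL);       /* make sure the default action applies */
        raise(sig);                 /* deliver the signal to ourselves */
        _exit(0);                   /* not reached if the signal terminates us */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

With SIGINT, the signal number 2 appears in the low bits of the status, while the exit-value byte stays 0.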
3.8.4 Wait for Child Process Termination
At any time, a process may use the
int pid = wait(int *status);
system call, to wait for a ZOMBIE child process.
If successful, wait() returns the ZOMBIE child PID and status contains
the exitCode of the ZOMBIE child.
In addition, wait() also releases the ZOMBIE child PROC as FREE for
reuse.
The wait() syscall invokes the kwait() function in kernel.
The example program C3.3 demonstrates wait and exit system calls.
When running the Example 3.3 program, the child termination status
will be 0x6400, in which the high byte is the child’s exit value 100.
A process may use the syscall
int pid = waitpid(int pid, int *status, int options);
to wait for a specific ZOMBIE child specified by the pid parameter with
several options.
For instance, wait(&status) is equivalent to waitpid(-1, &status, 0).
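The relation between the child's exit value and the status seen by the parent can be checked with a sketch like the following. The function name child_status() is hypothetical, and it uses waitpid() on the specific child rather than wait() so that it cannot pick up an unrelated child of the caller.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that terminates normally with exit_value.
 * Returns the raw status reported to the parent, or -1 on error. */
int child_status(int exit_value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(exit_value);          /* child: terminate at once */

    int status;
    if (waitpid(pid, &status, 0) != pid)    /* also frees the ZOMBIE */
        return -1;
    return status;
}
```

On Linux, child_status(100) yields 0x6400: the high byte holds the exit value 100 (0x64) and the low byte is 0 because no signal was involved. Portable code should decode the status with WIFEXITED() and WEXITSTATUS() instead of shifting bits by hand.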
3.8.5 Subreaper Process in Linux
Since kernel version 3.4, Linux handles orphan processes in a slightly
different way. A process may define itself as a subreaper by the syscall
prctl(PR_SET_CHILD_SUBREAPER, 1);
The init process P1 will no longer be the parent of orphan processes.
Instead, the nearest living ancestor process that is marked as a subreaper
will become the new parent.
If there is no living subreaper process, orphans still go to the INIT
process as usual.
The reason to implement this mechanism is as follows.
Many user space service managers, such as upstart, systemd, etc. need to
track their started services.
Such services usually create daemons by forking twice but let the
intermediate child exit immediately, which elevates the grandchild to be a
child of P1.
The drawback of this scheme is that the service manager can no longer
receive the SIGCHLD (death_of_child) signals from the service daemons,
nor can it wait for any ZOMBIE children.
All information about the children will be lost when P1 cleans up the
re-parented processes.
With the subreaper mechanism, a service manager can mark itself as a
"sub-init", and is now able to stay as the parent for all orphaned processes
created by the started services.
Example 3.4: The example program C3.4 demonstrates subreaper
processes in Linux.
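A minimal sketch of the subreaper mechanism, which is not the C3.4 program itself. It is Linux-specific and assumes a kernel that supports PR_SET_CHILD_SUBREAPER. The function name orphan_adopted_by_subreaper() and the struct report are hypothetical; the grandchild reports its new parent through a pipe.

```c
#include <sched.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct report { pid_t self; pid_t new_parent; };

/* Returns 1 if an orphaned grandchild was re-parented to the caller,
 * 0 if it went elsewhere (e.g. to init), -1 on error. */
int orphan_adopted_by_subreaper(void)
{
    int fd[2];
    struct report r = { -1, -1 };

    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0)
        return -1;
    if (pipe(fd) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {                   /* intermediate child */
        pid_t mid = getpid();
        if (fork() == 0) {              /* grandchild */
            close(fd[0]);
            while (getppid() == mid)    /* poll until re-parented */
                sched_yield();
            r.self = getpid();
            r.new_parent = getppid();
            if (write(fd[1], &r, sizeof r) != (ssize_t)sizeof r)
                _exit(1);
            _exit(0);
        }
        _exit(0);                       /* die first: orphan the grandchild */
    }

    close(fd[1]);
    waitpid(child, NULL, 0);            /* bury the intermediate child */
    ssize_t n = read(fd[0], &r, sizeof r);
    close(fd[0]);
    if (n != (ssize_t)sizeof r)
        return -1;
    waitpid(r.self, NULL, 0);           /* possible only if we adopted it */
    return r.new_parent == getpid();
}
```

Without the prctl() call, the grandchild would normally be re-parented to the INIT process (or to another subreaper higher in the tree), and the function would return 0.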
3.8.6 exec(): Change Process Execution Image
A process may use exec() to change its Umode image to a different
(executable) file. The exec() library functions have several members:
int execl( const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg,...,char *const envp[]);
int execv( const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
All of these are wrapper functions, which prepare the parameters and
eventually issue the syscall
int execve(const char *filename, char *const argv[ ], char *const envp[ ]);
The first parameter filename is either relative to the Current Working
Directory (CWD) or an absolute pathname.
The parameter argv[ ] is a NULL terminated array of string pointers,
each points to a command line parameter string.
By convention, argv[0] is the program name and other argv[ ] entries are
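command line parameters to the program. As an example, for the command line
a.out one two three
argv[0] points to "a.out", argv[1] to "one", argv[2] to "two" and argv[3] to
"three", and argv[4] is the terminating NULL pointer.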
3.8.7 Environment Variables
Environment variables are variables that are defined for the current sh,
which are inherited by children sh or processes.
Environment variables are set in the login profiles and .bashrc script
files when sh starts.
Each Environment variable is defined as
KEYWORD=string
Within a sh session the user can view the environment variables by
using the env or printenv command.
Some of the important environment variables:
SHELL: This specifies the sh that will be interpreting any user commands.
TERM : This specifies the type of terminal to emulate when running the sh.
USER : The current logged in user.
PATH : A list of directories that the system will check when looking for
commands.
HOME: home directory of the user. In Linux, user home directories are usually
under /home
While in a sh session, environment variables can be set to new (string)
values, as in
HOME=/home/newhome
which can be passed to descendant sh by the export command, as in
export HOME
They can also be unset by setting them to null strings.
Within a process, environment variables are passed to C programs via
the env[ ] parameter, which is a NULL terminated array of string
pointers, each points to an environment variable.
Environment variables define the execution environment of subsequent
programs.
Both command line parameters and environment variables must be
passed to an executing program. This is the basis of the main() function
in all C programs, which can be written as
int main(int argc, char *argv[ ], char *env[ ])
If successful, exec("filename",....) replaces the process Umode image
with a new image from the executable filename.
It's still the same process but with a new Umode image.
The old Umode image is abandoned and therefore never returned to,
unless exec() fails.
In general, after exec(), all opened files of the process remain open.
Opened file descriptors that are marked as close-on-exec are closed.
Most signals of the process are reset to default.
If the executable file has the setuid bit turned on, the process effective
uid/gid are changed to the owner of the executable file, which will be
reset back to the saved process uid/gid when the execution finishes.
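How argv[ ] and env[ ] reach the new image can be sketched with a direct execve() call. The function name exec_with_env() and the variable name GREETING are hypothetical, and the sketch assumes /bin/sh exists and provides test as a built-in. The child's entire environment is the single string built here, and the new image reports through its exit value whether it saw the expected value.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run /bin/sh in a child with GREETING=value as its only environment
 * variable. The shell command exits with 0 if $GREETING is "hello",
 * otherwise 1. Returns that exit value, or -1 on error. */
int exec_with_env(const char *value)
{
    char kv[128];
    snprintf(kv, sizeof kv, "GREETING=%s", value);

    /* Both arrays are NULL terminated, as execve() requires. */
    char *argv[] = { "sh", "-c", "test \"$GREETING\" = hello", NULL };
    char *envp[] = { kv, NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);  /* same process, new Umode image */
        _exit(127);                     /* reached only if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The statement after execve() runs only when the call fails, which matches the rule that a successful exec never returns to the old image.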
3.9 I/O Redirection
3.9.1 FILE Streams and File Descriptors
The sh process has three FILE streams for terminal I/O: stdin, stdout
and stderr.
Each is a pointer to a FILE structure in the execution image’s HEAP
area, as shown below.
Each FILE stream corresponds to an opened file in the Linux kernel.
Each opened file is represented by a file descriptor (number).
The file descriptors of stdin, stdout, stderr are 0, 1, 2, respectively.
When a process forks a child, the child inherits all the opened files of the
parent.
3.9.2 FILE Stream I/O and System Call
When a process executes the library function
scanf("%s", item);
it tries to input a (string) item from stdin, which points to a FILE
structure.
If the FILE structure's fbuf[ ] is empty, it issues a READ system call to
the Linux kernel to read data from the file descriptor 0, which
is mapped to the keyboard of a terminal (/dev/ttyX) or a
pseudo-terminal (/dev/pts/#).
3.9.3 Redirect stdin
If we replace the file descriptor 0 with a newly opened file, inputs
would come from that file rather than the original input device.
The syscall close(0) closes the file descriptor 0, making 0 an unused file
descriptor.
The open() syscall opens a file and uses the lowest unused descriptor
number as the file descriptor.
In this case, the file descriptor of the opened file would be 0.
Thus the original fd 0 is replaced by the newly opened file.
We may also open the file first to get a file descriptor fd, close(0),
and then call dup(fd).
The syscall dup(fd) duplicates fd into the lowest numbered and unused
file descriptor, which is 0 here, allowing both fd and 0 to access the
same opened file.
In addition, the syscall
dup2(fd1, fd2)
duplicates fd1 into fd2, closing fd2 first if it was already open.
After any one of the above operations, every scanf() call will get inputs
from the opened file.
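A minimal sketch of stdin redirection, assuming a hypothetical helper named redirect_stdin. It uses dup2() instead of the close(0)/open() pair so that closing and replacing descriptor 0 happen in one step; the effect on later scanf() calls is the same.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Make file descriptor 0 refer to the file at path.
 * Returns 0 on success, -1 on failure. Hypothetical helper. */
int redirect_stdin(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fd != 0) {
        if (dup2(fd, 0) < 0) {     /* closes the old fd 0, then copies fd */
            perror("dup2");
            close(fd);
            return -1;
        }
        close(fd);                 /* fd 0 now refers to the file */
    }
    return 0;
}
```

The redirection should be done before the first read from stdin, since data already sitting in the stdin buffer would still come from the old input.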
3.9.4 Redirect stdout
When a process executes the library function
printf("format=%s\n", items);
it tries to write to the fbuf[ ] in the stdout FILE structure, which is line
buffered when stdout is a terminal.
If fbuf[ ] has a complete line, it issues a WRITE syscall to write data
from fbuf[ ] to file descriptor 1, which is mapped to the terminal screen.
To redirect the standard outputs to a file, do as follows.
close(1);
open("filename", O_WRONLY|O_CREAT, 0644);
The outputs to stdout will go to that file instead of the screen.
We may also redirect stderr to a file.
When a process terminates (in kernel mode), it closes all opened files.
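The stdout case can be sketched the same way. The helpers redirect_stdout and restore_stdout are hypothetical names, and O_TRUNC is an addition to the flags shown above so that old file contents do not linger. Because stdout is buffered in user space, the sketch calls fflush() before each switch of descriptor 1.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Redirect fd 1 to path. Returns a saved copy of the old fd 1,
 * or -1 on failure. Hypothetical helper. */
int redirect_stdout(const char *path)
{
    fflush(stdout);                       /* drain pending output first */
    int saved = dup(1);                   /* keep the old stdout */
    if (saved < 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0 || dup2(fd, 1) < 0) {
        if (fd >= 0)
            close(fd);
        close(saved);
        return -1;
    }
    close(fd);                            /* fd 1 now refers to the file */
    return saved;
}

/* Undo redirect_stdout(): flush, then put the saved descriptor back on 1. */
void restore_stdout(int saved)
{
    fflush(stdout);
    dup2(saved, 1);
    close(saved);
}
```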
3.10 Pipes
Pipes are unidirectional inter-process communication channels for
processes to exchange data.
A pipe has a read end and a write end.
Data written to the write end of a pipe can be read from the read end of
the pipe.
Pipes have been incorporated into almost all operating systems, with many
variations.
Some systems allow pipes to be bidirectional.
Ordinary pipes are for related processes. Named pipes are FIFO
communication channels between unrelated processes.
Read and write operations on pipes are usually synchronous and blocking.
Some systems support non-blocking and asynchronous read/write
operations on pipes.
For the sake of simplicity, we shall consider a pipe as a finite-sized
FIFO communication channel between a set of related processes.
Reader and writer processes of a pipe are synchronized in the following
manner.
When a reader reads from a pipe, if the pipe has data, the reader reads as
much as it needs (up to the pipe size) and returns the number of bytes
read.
If the pipe has no data but still has writers, the reader waits for data.
When a writer writes data to a pipe, it wakes up the waiting readers,
allowing them to continue.
If the pipe has no data and also no writer, the reader returns 0.
Since readers wait for data if the pipe still has writers, a 0 return value
means only one thing, namely the pipe has no data and also no writer.
When a writer writes to a pipe, if the pipe has room, it writes as much as
it needs to or until the pipe is full.
If the pipe has no room but still has readers, the writer waits for room.
When a reader reads data from the pipe to create more room, it wakes
up the waiting writers, allowing them to continue.
If a pipe has no more readers, the writer must detect this as a broken
pipe error and abort.
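The reader-side rules can be observed within a single process. The sketch below is illustrative only; the function name pipe_eof_demo is hypothetical. It writes a short message, closes the write end so that no writer remains, and records the results of two read() calls. The first returns the data; the second finds no data and no writer, so it returns 0.

```c
#include <unistd.h>

/* Write len bytes (assumed to be at most 256) to a fresh pipe, close the
 * write end, then read twice. The two read() results are stored in *first
 * and *second. Returns 0 on success, -1 if setup failed. */
int pipe_eof_demo(const char *msg, size_t len, ssize_t *first, ssize_t *second)
{
    int pd[2];
    char buf[256];

    if (pipe(pd) < 0)
        return -1;
    if (write(pd[1], msg, len) != (ssize_t)len) {
        close(pd[0]);
        close(pd[1]);
        return -1;
    }
    close(pd[1]);                             /* no writers remain */
    *first  = read(pd[0], buf, sizeof buf);   /* pipe has data */
    *second = read(pd[0], buf, sizeof buf);   /* no data, no writer */
    close(pd[0]);
    return 0;
}
```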
3.10.1 Pipe Programming in Unix/Linux
In Unix/Linux, pipes are supported by a set of pipe related syscalls.
The syscall pipe(pd), with pd declared as int pd[2], creates a pipe in the
kernel and returns two file descriptors in pd[2].
pd[0] is for reading from the pipe and pd[1] is for writing to the pipe.
A process should use a pipe as either a reader or a writer, but not both.
Figure 3.7 shows the system model of pipe operations.
The writer process may terminate first when it has no more data to
write, in which case the reader may continue to read as long as the PIPE
still has data.
However, if the reader terminates first, the writer should see a broken
pipe error and also terminate.
Note that the broken pipe condition is not symmetrical. It is a condition
that there are writers but no reader.
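The broken pipe condition can be reproduced in a few lines. In Linux, a write to a pipe with no reader raises SIGPIPE, which terminates the writer by default. The sketch below, with the hypothetical name write_without_reader, ignores SIGPIPE temporarily so that write() fails with errno set to EPIPE and the error can be inspected.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Returns the errno seen when writing to a pipe that has no reader,
 * 0 if the write unexpectedly succeeded, or -1 on setup failure. */
int write_without_reader(void)
{
    int pd[2];
    if (pipe(pd) < 0)
        return -1;

    /* Ignore SIGPIPE so write() reports the error instead of killing us. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    if (old == SIG_ERR) {
        close(pd[0]);
        close(pd[1]);
        return -1;
    }
    close(pd[0]);                       /* no readers remain */
    ssize_t n = write(pd[1], "x", 1);   /* writer with no reader */
    int err = (n < 0) ? errno : 0;
    close(pd[1]);
    signal(SIGPIPE, old);               /* restore previous disposition */
    return err;
}
```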
Example 3.7: The example program C3.7 demonstrates pipe operations.
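As a stand-in illustration (not the program C3.7 itself), the sketch below shows a typical parent and child sharing a pipe. The function name pipe_from_child is hypothetical. The child acts as the writer and the parent as the reader, and each closes the end it does not use.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes msg into a pipe; parent reads it into buf (size n).
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_from_child(const char *msg, char *buf, size_t n)
{
    int pd[2];
    if (pipe(pd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(pd[0]);
        close(pd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: pipe WRITER */
        close(pd[0]);                   /* writer closes the read end */
        if (write(pd[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(pd[1]);
        _exit(0);
    }
    close(pd[1]);                       /* parent: pipe READER */
    ssize_t total = 0, r = 0;
    while ((size_t)total < n &&
           (r = read(pd[0], buf + total, n - total)) > 0)
        total += r;                     /* r == 0: no data and no writer */
    close(pd[0]);
    waitpid(pid, NULL, 0);
    return r < 0 ? -1 : total;
}
```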
3.10.2 Pipe Command Processing
The command line
cmd1 | cmd2
contains a pipe symbol ‘|’. Sh will run cmd1 by one process and cmd2 by
another process, which are connected by a PIPE, so that the
outputs of cmd1 become the inputs of cmd2, e.g.
ps x | grep "httpd" # show lines of ps x containing httpd
cat filename | more # display one screen of text at a time
3.10.3 Connect PIPE writer to PIPE reader
(1). When sh gets the command line cmd1 | cmd2, it forks a child sh and
waits for the child sh to terminate as usual.
(2). Child sh: scan the command line for | symbol. In this case,
cmd1 | cmd2
has a pipe symbol |. Divide the command line into head=cmd1,
tail=cmd2.
(3). Then the child sh executes the following code segment
int pd[2];
pipe(pd);                  // creates a PIPE
pid = fork();              // fork a child (to share the PIPE)
if (pid)
{                          // parent as pipe WRITER
    close(pd[0]);          // WRITER MUST close pd[0]
    close(1);              // close 1
    dup(pd[1]);            // replace 1 with pd[1]
    close(pd[1]);          // close pd[1]
    exec(head);            // change image to cmd1
}
else
{                          // child as pipe READER
    close(pd[1]);          // READER MUST close pd[1]
    close(0);              // close 0
    dup(pd[0]);            // replace 0 with pd[0]
    close(pd[0]);          // close pd[0]
    exec(tail);            // change image to cmd2
}
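Steps (1) to (3) can be put together into a compilable sketch. The function name run_pipeline is hypothetical; execvp() stands in for the generic exec() used above, and dup2() replaces each close/dup pair. The commands are passed as already-split argument vectors, so the scanning in step (2) is assumed to have been done.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "head | tail". Returns the exit status of the child sh,
 * or -1 on error. */
int run_pipeline(char *const head[], char *const tail[])
{
    pid_t shpid = fork();               /* step (1): fork a child sh */
    if (shpid < 0)
        return -1;
    if (shpid == 0) {                   /* child sh: step (3) */
        int pd[2];
        if (pipe(pd) < 0)
            _exit(126);
        pid_t pid = fork();             /* fork a child to share the PIPE */
        if (pid < 0)
            _exit(126);
        if (pid) {                      /* parent as pipe WRITER */
            close(pd[0]);
            dup2(pd[1], 1);             /* replace 1 with pd[1] */
            close(pd[1]);
            execvp(head[0], head);      /* change image to cmd1 */
        } else {                        /* child as pipe READER */
            close(pd[1]);
            dup2(pd[0], 0);             /* replace 0 with pd[0] */
            close(pd[0]);
            execvp(tail[0], tail);      /* change image to cmd2 */
        }
        _exit(127);                     /* reached only if exec failed */
    }
    int status;                         /* sh waits for the child sh */
    if (waitpid(shpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this arrangement the child sh itself becomes cmd1, so the waiting sh sees the exit status of cmd1, and cmd2 may still be running when the wait returns.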
3.10.4 Named pipes
Named pipes are also called FIFOs. They have "names" and exist as
special files within the file system.
They exist until they are removed with rm or unlink.
They can be used with unrelated processes, not just descendants of the
pipe creator.
Examples of named pipes
(1). From the sh, create a named pipe by the mknod command
mknod mypipe p
(2). OR from C program, issue the mknod() syscall
int r = mknod("mypipe", S_IFIFO | 0644, 0);
Either (1) or (2) creates a special file named mypipe in the current
directory. Entering
ls -l mypipe
will show it as
prw-r--r-- 1 root root 0 time mypipe
where the file type p means it’s a pipe, link count = 1 and size = 0.
(3). Processes may access named pipes as if they were ordinary files.
However, writes to and reads from named pipes are synchronized by the
Linux kernel.
The following diagram shows the interactions of writer and reader
processes on a named pipe via sh commands.
It shows that the writer stops if no one is reading from the pipe. The
reader stops if the pipe has no data.
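The same interaction can be sketched in C. This is an illustration under stated assumptions, not the program C3.8: the function name fifo_roundtrip is hypothetical, and the POSIX call mkfifo() is used in place of mknod(). Opening a FIFO for writing blocks until some process opens it for reading, and vice versa, which is the synchronization described above.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a FIFO at path, let a child write msg into it, and read the
 * message back in the parent. Returns bytes read, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg, char *buf, size_t n)
{
    unlink(path);                        /* remove a stale FIFO, if any */
    if (mkfifo(path, 0644) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                      /* writer process */
        int wfd = open(path, O_WRONLY);  /* blocks until a reader opens */
        if (wfd < 0 || write(wfd, msg, strlen(msg)) < 0)
            _exit(1);
        close(wfd);
        _exit(0);
    }
    int rfd = open(path, O_RDONLY);      /* blocks until a writer opens */
    ssize_t total = 0, r = 0;
    if (rfd < 0) {
        kill(pid, SIGKILL);              /* writer would block forever */
    } else {
        while ((size_t)total < n &&
               (r = read(rfd, buf + total, n - total)) > 0)
            total += r;                  /* r == 0: writer has closed */
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);                        /* FIFOs persist until removed */
    return (rfd < 0 || r < 0) ? -1 : total;
}
```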
Example 3.8: The example program C3.8 demonstrates named pipe
operations. It shows how to open a named pipe for read/write by
different processes.
(3).1 Writer process program:
(3).2. Reader process program: