Unix System Programming


Chapter 1 : Introduction
UNIX AND ANSI STANDARDS
UNIX is a computer operating system originally
developed in 1969 by a group of AT&T employees
at Bell Labs, including Ken Thompson, Dennis
Ritchie, Douglas McIlroy and Joe Ossanna.

Today UNIX systems are split into various
branches, developed over time by AT&T as well
as various commercial vendors and non-profit
organizations.
The ANSI C Standard

In 1989, the American National Standards Institute (ANSI)
proposed the C programming language standard X3.159-1989 to
standardize the language constructs and libraries.

This is termed the ANSI C standard.

It attempts to unify the implementation of the C language
supported on all computer systems.
The major differences between ANSI C and K&R C
[Kernighan and Ritchie] are as follows:

• Function prototyping
• Support of the const and volatile data type
qualifiers.
• Support wide characters and internationalization.
• Permit function pointers to be used without
dereferencing.
Function prototyping
ANSI C adopts the C++ function prototype technique, where function
definitions and declarations include function names, arguments’ data
types, and return value data types.

This enables ANSI C compilers to check for function
calls in user programs that pass an invalid number of
arguments or incompatible argument data types.
This fixes a major weakness of K&R C compilers:
invalid function calls in user programs often pass compilation
but cause programs to crash when they are executed.

Eg:
unsigned long f1(char *fmt, double data)
{
    /* body of f1 */
}
External declaration of this function f1 is
unsigned long f1(char *fmt, double data);

A trailing ellipsis (...) in a prototype specifies a variable number of arguments.
Eg: int printf(const char *fmt, ...);


Support of the const and volatile data type qualifiers
The const keyword declares that some data cannot be
changed.

Eg: int printf(const char *fmt, ...);

This declares a fmt argument of the const char * data type,
meaning that the function printf cannot modify data in any
character array that is passed as an actual argument value to
fmt.

The volatile keyword specifies that the values of some variables
may change asynchronously, giving a hint to the compiler’s
optimization algorithm not to remove any “redundant”
statements that involve “volatile” objects.
eg:
char get_io()
{
    volatile char *io_port = (volatile char *)0x7777;
    char ch = *io_port;   /* read first byte of data */
    ch = *io_port;        /* read second byte of data */
    return ch;
}
If the io_port variable is not declared to be volatile when the
program is compiled, the compiler may eliminate the second
ch = *io_port statement, as it is considered redundant with
respect to the previous statement.
The const and volatile data type qualifiers are also
supported in C++.
Support wide characters and internationalisation

ANSI C supports internationalisation by allowing C programs
to use wide characters. Wide characters use more than one
byte of storage per character.

ANSI C defines the setlocale function, which allows users to
specify the format of date, monetary and real number
representations.

For eg: most countries display the date in dd/mm/yyyy format
whereas the US displays it in mm/dd/yyyy format.
The function prototype of the setlocale function is:
#include <locale.h>
char *setlocale(int category, const char *locale);
The setlocale function prototype and possible values of the
category argument are declared in the <locale.h> header. The
category values specify which format class(es) are to be
changed.
Some of the possible values of the category argument are:
Category value   Effect on standard C functions/macros
LC_CTYPE     ⇒ Affects behavior of the <ctype.h> macros
LC_TIME      ⇒ Affects date and time format
LC_NUMERIC   ⇒ Affects number representation format
LC_MONETARY  ⇒ Affects monetary values format
LC_ALL       ⇒ Combines the effect of all the above
The locale argument value is a character string that defines which locale
to use.
Possible values may be C, POSIX, en_US, etc., which refer to the UNIX,
POSIX and US locales respectively.
By default, all processes on an ANSI C or POSIX compliant system
execute the equivalent of the following call at their process start-up
time:
setlocale(LC_ALL, "C");
Thus all processes start up with a known locale.

If the locale value is NULL, the setlocale function returns the current
locale value of the calling process.
If the locale value is "" (an empty string), the setlocale function looks for the
environment variable LC_ALL, then an environment variable with the same
name as the category argument value, and finally the LANG
environment variable, in that order, for the value of the locale argument.
The setlocale function is part of the ANSI C standard and is also adopted by POSIX.1.
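The two special locale values described above (NULL to query, "" to take the locale from the environment) can be sketched as follows. This is only an illustration: the helper names in_default_locale and use_environment_locale are hypothetical and are not part of any standard.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the calling process is currently in
   the "C" locale. Passing NULL as the locale argument only queries the
   current setting and changes nothing. */
int in_default_locale(void)
{
    const char *current = setlocale(LC_ALL, NULL);
    return current != NULL && strcmp(current, "C") == 0;
}

/* Hypothetical helper: switches the whole process to the locale named by
   the environment (LC_ALL, the per-category variables, then LANG).
   Returns 0 on success, -1 if that locale is not available. */
int use_environment_locale(void)
{
    const char *chosen = setlocale(LC_ALL, "");
    if (chosen == NULL) {
        fprintf(stderr, "setlocale: locale named by the environment is not available\n");
        return -1;
    }
    printf("locale now in effect: %s\n", chosen);
    return 0;
}
```

A program would typically call use_environment_locale() once, early in main, before it formats any dates or numbers.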
Permit function pointers to be used without dereferencing
ANSI C specifies that a function pointer may be used like a
function name.
No dereferencing is needed when calling a function whose
address is contained in the pointer.
For example, the following statements define a function pointer funptr,
which contains the address of the function f1.
extern void f1(double xyz, const char *ptr);
void (*funptr)(double, const char *) = f1;
The function f1 may be invoked either by calling f1 directly or via funptr.
f1(12.78, "Hello world");
funptr(12.78, "Hello world");
K&R C requires funptr to be dereferenced to call f1.
(*funptr)(13.48, "Hello usp");
ANSI C also defines a set of C preprocessor (cpp) symbols,
which may be used in user programs.

These symbols are assigned actual values at compilation time.

cpp symbol   Use
__STDC__     Feature test macro. Value is 1 if a compiler is ANSI C, 0 otherwise.
__LINE__     Evaluates to the physical line number of a source file.
__FILE__     Value is the file name of a module that contains this symbol.
__DATE__     Value is the date that a module containing this symbol is compiled.
__TIME__     Value is the time that a module containing this symbol is compiled.
The following test_ansi_c.c program illustrates the use of these symbols:
#include <stdio.h>
int main(void)
{
#if __STDC__ == 0
    printf("cc is not ANSI C compliant\n");
#else
    /* __FILE__, __DATE__ and __TIME__ are strings; __LINE__ is an integer */
    printf("%s compiled at %s:%s. This statement is at line %d\n",
           __FILE__, __DATE__, __TIME__, __LINE__);
#endif
    return 0;
}
Finally, ANSI C defines a set of standard library functions and
associated headers. These headers are a subset of the C
libraries available on most systems that implement K&R C.
The ANSI/ISO C++ Standard

Compilers that conform to the ANSI/ISO C++ standard support C++ classes,
derived classes, virtual functions and operator overloading.

Furthermore, they should also support template classes, template
functions, exception handling and the iostream library classes.
Differences between ANSI C and C++

ANSI C: Uses the K&R C default function declaration for any functions that
are referred to before their declaration in the program.
C++: Requires that all functions be declared or defined before they
can be referenced.

ANSI C: Treats int f1(); as an old C function declaration and interprets it
as if it were declared int f1(...); meaning that f1 may be called with any
number of actual arguments.
C++: Treats int f1(); as int f1(void); meaning that f1 may not accept
any argument when it is called.

ANSI C: Does not employ the type-safe linkage technique and does not catch
user errors.
C++: Encrypts external function names for type-safe linkage, and thus
reports any user errors.
The POSIX standards

Because many versions of UNIX exist today and each
of them provides its own set of API functions, it is
difficult for system developers to create applications
that can be easily ported to different versions of UNIX.

To overcome this problem, the IEEE formed a
special task force called POSIX in the 1980s to create a
set of standards for operating system interfacing.
•POSIX, or “Portable Operating System Interface”, is the name
of a family of related standards specified by the IEEE to define
the application programming interface (API), along with the shell
and utilities interfaces, for software compatible with
variants of the UNIX operating system.

•Some of the subgroups of POSIX, such as POSIX.1, POSIX.1b and
POSIX.1c, are concerned with the development of a set of
standards for system developers.
POSIX.1
This committee proposes a standard for a base operating
system API; this standard specifies APIs for the manipulation
of files and processes.
It is formally known as IEEE standard 1003.1-1990 and it
was also adopted by the ISO as the international standard
ISO/IEC 9945-1:1990.

POSIX.1b
This committee proposes a set of standard APIs for a real-time
OS interface; these include IPC (inter-process
communication).
This standard is formally known as IEEE standard 1003.1b-1993
(it was drafted under the working name 1003.4).
POSIX.1c
This standard specifies a multi-threaded programming
interface. This is the newest POSIX standard.
These standards are proposed for a generic OS that
is not necessarily a UNIX system.
E.g.: VMS from Digital Equipment Corporation, OS/2
from IBM, & Windows NT from Microsoft Corporation
are POSIX-compliant, yet they are not UNIX systems.
To ensure a user program conforms to the POSIX.1
standard, the user should either define the
manifest constant _POSIX_SOURCE at the
beginning of each source module of the program
(before the inclusion of any header) as:

#define _POSIX_SOURCE
or specify the -D_POSIX_SOURCE option to a C++
compiler (CC) in a compilation:
% CC -D_POSIX_SOURCE *.C
POSIX.1b defines a different manifest constant to check
the conformance of a user program to that standard.

The new macro is _POSIX_C_SOURCE, and its value indicates the
POSIX version to which a user program conforms.

Its value can be:

_POSIX_C_SOURCE value   Meaning
198808L                 First version of POSIX.1 compliance
199009L                 Second version of POSIX.1 compliance
199309L                 POSIX.1 and POSIX.1b compliance
Program to check and display the _POSIX_VERSION constant of the system
on which it is run

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 199309L
#include <iostream>
#include <unistd.h>
using namespace std;
int main()
{
#ifdef _POSIX_VERSION
    // _POSIX_VERSION is defined by <unistd.h> on POSIX conforming systems
    cout << "System conforms to POSIX: " << _POSIX_VERSION << endl;
#else
    cout << "_POSIX_VERSION undefined\n";
#endif
    return 0;
}
The POSIX Environment
Although POSIX was developed on UNIX, a POSIX compliant system is
not necessarily a UNIX system.
A few UNIX conventions have different meanings according to the POSIX
standards.
Most C and C++ header files are stored under the /usr/include
directory in any UNIX system and each of them is referenced by

#include <header-file-name>

This method is adopted in POSIX. However, for each name specified in a #include
statement, there need not be a physical file of that name existing on a
POSIX conforming system. In fact, the data that should be contained in
that named object may be built into a compiler, or stored by some other
means on a given system.
Thus, in a POSIX environment, included files are called simply headers
instead of header files.
Another difference between POSIX and UNIX is the concept of
superuser.

In UNIX, a superuser has the privilege to access all system resources
and functions. The superuser ID is always zero.

The POSIX standards do not mandate that all POSIX conforming
systems support the concept of a superuser, nor does the user ID
of zero require any special privileges.
The POSIX Feature Test Macros
POSIX.1 defines a set of feature test macros which, if defined
on a system, mean that the system has implemented the
corresponding features. All these test macros are defined in the
<unistd.h> header.
Feature test macro        Effects if defined
_POSIX_JOB_CONTROL        The system supports BSD-style job control.
_POSIX_SAVED_IDS          Each process running on the system keeps the
                          saved set-UID and the saved set-GID, so that it can
                          change its effective user ID and group ID to those
                          values via the seteuid and setegid APIs.
_POSIX_CHOWN_RESTRICTED   If the defined value is -1, users may
                          change ownership of files owned by
                          them; otherwise only users with special
                          privilege may change ownership of any file
                          on the system.
_POSIX_NO_TRUNC           If the defined value is -1, any long
                          pathname passed to an API is silently
                          truncated to NAME_MAX bytes; otherwise
                          an error is generated.
_POSIX_VDISABLE           If the defined value is -1, there is no disabling
                          character for special characters for all
                          terminal device files. Otherwise the value
                          is the disabling character value.
Program to print POSIX-defined configuration options supported on
any given system.

/* show_test_macros.C */

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 199309L
#include <iostream>
#include <unistd.h>
using namespace std;
int main()
{
#ifdef _POSIX_JOB_CONTROL
    cout << "system supports job control\n";
#else
    cout << "system does not support job control\n";
#endif
#ifdef _POSIX_SAVED_IDS
    cout << "system supports saved set-UID and set-GID\n";
#else
    cout << "system does not support saved set-UID and set-GID\n";
#endif
#ifdef _POSIX_CHOWN_RESTRICTED
    cout << "chown_restricted option is: " << _POSIX_CHOWN_RESTRICTED << endl;
#else
    cout << "system does not support chown_restricted option\n";
#endif
#ifdef _POSIX_NO_TRUNC
    cout << "pathname trunc option is: " << _POSIX_NO_TRUNC << endl;
#else
    cout << "system does not support system-wide pathname trunc option\n";
#endif
#ifdef _POSIX_VDISABLE
    // cast so that the value is printed as a number, not as a character
    cout << "disable char. for terminal files is: " << (int)_POSIX_VDISABLE << endl;
#else
    cout << "system does not support _POSIX_VDISABLE\n";
#endif
    return 0;
}
Limits checking at Compile time and at Run time

POSIX.1 and POSIX.1b define a set of system configuration limits in the
form of manifest constants in the <limits.h> header.

The following is a list of POSIX.1-defined constants in the <limits.h>
header.

Compile time limit    Min. value   Meaning
_POSIX_CHILD_MAX      6            Maximum number of child processes that
                                   may be created at any one time by a
                                   process.
_POSIX_OPEN_MAX       16           Maximum number of files that a process
                                   can open simultaneously.
_POSIX_STREAM_MAX     8            Maximum number of I/O streams opened
                                   by a process simultaneously.
_POSIX_ARG_MAX        4096         Maximum size, in bytes, of arguments
                                   that may be passed to an exec function.
_POSIX_NGROUPS_MAX    0            Maximum number of supplemental
                                   groups to which a process may belong.
_POSIX_PATH_MAX       255          Maximum number of characters allowed
                                   in a path name.
_POSIX_NAME_MAX       14           Maximum number of characters allowed
                                   in a file name.
_POSIX_LINK_MAX       8            Maximum number of links a file may have.
_POSIX_PIPE_BUF       512          Maximum size of a block of data
                                   that may be atomically read
                                   from or written to a pipe.
_POSIX_MAX_INPUT      255          Maximum capacity of a terminal’s
                                   input queue (bytes).
_POSIX_MAX_CANON      255          Maximum size of a terminal’s
                                   canonical input queue.
_POSIX_SSIZE_MAX      32767        Maximum value that can be
                                   stored in a ssize_t-typed object.
_POSIX_TZNAME_MAX     3            Maximum number of characters in
                                   a time zone name.
The following is a list of POSIX.1b-defined constants:

Compile time limit       Min. value   Meaning
_POSIX_AIO_MAX           1            Number of simultaneous
                                      asynchronous I/O.
_POSIX_AIO_LISTIO_MAX    2            Maximum number of operations in
                                      one listio.
_POSIX_TIMER_MAX         32           Maximum number of timers that
                                      can be used simultaneously by a
                                      process.
_POSIX_DELAYTIMER_MAX    32           Maximum number of overruns
                                      allowed per timer.
_POSIX_MQ_OPEN_MAX       2            Maximum number of message
                                      queues that may be accessed
                                      simultaneously per process.
_POSIX_MQ_PRIO_MAX       2            Maximum number of message
                                      priorities that can be assigned to
                                      the messages.
_POSIX_RTSIG_MAX         8            Maximum number of real time
                                      signals.
_POSIX_SIGQUEUE_MAX      32           Maximum number of real time
                                      signals that a process may queue at
                                      any time.
_POSIX_SEM_NSEMS_MAX     256          Maximum number of semaphores
                                      that may be used simultaneously
                                      per process.
_POSIX_SEM_VALUE_MAX     32767        Maximum value that may be
                                      assigned to a semaphore.
POSIX-defined constants specify only the minimum values for some
system configuration limits. A POSIX conforming system may be
configured with higher values for these limits.

To find out the actual implemented configuration limits system-wide or
on individual objects, one can use the sysconf, pathconf and fpathconf
functions to query these limits’ values at run time.

These functions are defined by POSIX.1. The sysconf function is used to query
general system-wide configuration limits that are implemented on a
given system; pathconf and fpathconf are used to query file-related
configuration limits.

The latter two functions do the same thing; the only difference is that pathconf
takes a file’s pathname as argument, whereas fpathconf takes a file
descriptor as argument.
Prototypes of these functions are:
#include <unistd.h>

long sysconf(const int limit_name);
long pathconf(const char *pathname, int flimit_name);
long fpathconf(const int fd, int flimit_name);

The limit_name argument value is a manifest constant as defined in
the <unistd.h> header.

The possible values and the corresponding data returned by the sysconf
function are:
Limit value            Sysconf return data
_SC_ARG_MAX            Maximum size of argument values (in bytes) that
                       may be passed to an exec API call
_SC_CHILD_MAX          Maximum number of child processes that may be
                       owned by a process simultaneously
_SC_OPEN_MAX           Maximum number of opened files per process
_SC_NGROUPS_MAX        Maximum number of supplemental groups
                       per process
_SC_CLK_TCK            The number of clock ticks per second
_SC_JOB_CONTROL        The _POSIX_JOB_CONTROL value
_SC_SAVED_IDS          The _POSIX_SAVED_IDS value
_SC_VERSION            The _POSIX_VERSION value
_SC_TIMERS             The _POSIX_TIMERS value
_SC_DELAYTIMER_MAX     Maximum number of overruns allowed
                       per timer
_SC_RTSIG_MAX          Maximum number of real time signals
_SC_MQ_OPEN_MAX        Maximum number of message queues
                       per process
_SC_MQ_PRIO_MAX        Maximum priority value assignable to a
                       message
_SC_SEM_NSEMS_MAX      Maximum number of semaphores per
                       process
_SC_SEM_VALUE_MAX      Maximum value assignable to a
                       semaphore
_SC_SIGQUEUE_MAX       Maximum number of real time signals that
                       a process may queue at any one time
_SC_AIO_LISTIO_MAX     Maximum number of operations in one
                       listio
_SC_AIO_MAX            Number of simultaneous asynchronous
                       I/O
All constants used as a sysconf argument value have the _SC_ prefix.

Similarly, the flimit_name argument value is a manifest constant
defined by the <unistd.h> header.

These constants all have the _PC_ prefix.

Following is a list of some of the constants and their corresponding
return values from either the pathconf or fpathconf function for a named
file object.
Limit value             Pathconf return data
_PC_CHOWN_RESTRICTED    The _POSIX_CHOWN_RESTRICTED value
_PC_NO_TRUNC            The _POSIX_NO_TRUNC value
_PC_VDISABLE            The _POSIX_VDISABLE value
_PC_PATH_MAX            Maximum length of a pathname (in bytes)
_PC_NAME_MAX            Maximum length of a filename (in bytes)
_PC_LINK_MAX            Maximum number of links a file may have
_PC_PIPE_BUF            Maximum size of a block of data that may be
                        read from or written to a pipe
_PC_MAX_CANON           Maximum size of a terminal’s canonical input
                        queue
_PC_MAX_INPUT           Maximum capacity of a terminal’s input queue
The _POSIX_ limit constants may be used at compile time, such as in the following:

char pathname[_POSIX_PATH_MAX + 1];

for (int i = 0; i < _POSIX_OPEN_MAX; i++)
    close(i);   // close all file descriptors
The following test_config.C illustrates the use of sysconf, pathconf and
fpathconf:
#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <iostream>
#include <unistd.h>
using namespace std;
int main()
{
    long res;   // all three functions return a long
    if ((res = sysconf(_SC_OPEN_MAX)) == -1)
        perror("sysconf");
    else
        cout << "OPEN_MAX:" << res << endl;
    if ((res = pathconf("/", _PC_PATH_MAX)) == -1)
        perror("pathconf");
    else
        cout << "max path name:" << (res + 1) << endl;
    if ((res = fpathconf(0, _PC_CHOWN_RESTRICTED)) == -1)
        perror("fpathconf");
    else
        cout << "chown_restricted for stdin:" << res << endl;
    return 0;
}
The POSIX.1 FIPS Standard
FIPS stands for Federal Information Processing Standard. The FIPS
standard is a restriction of the POSIX.1 – 1988 standard, and it requires
the following features to be implemented in all FIPS-conforming
systems:
•Job control.
•Saved set-UID and saved set-GID.
•Long path name is not supported.
•The _POSIX_CHOWN_RESTRICTED must be defined.
•The _POSIX_VDISABLE symbol must be defined.
•The NGROUPS_MAX symbol’s value must be at least 8.
•The read and write APIs should return the number of bytes that have
been transferred after the APIs have been interrupted by signals.
•The group ID of a newly created file must inherit the group ID of its
containing directory.
The FIPS standard is thus a more restrictive version of the POSIX.1 standard.
The X/OPEN Standards
The X/Open organization was formed by a group of European
companies to propose a common operating system interface for their
computer systems.
The X/Open portability guides specify a set of common facilities and C
application program interface functions to be provided on all UNIX
based open systems.

In 1993, a group of computer vendors initiated a project called
“common open software environment” (COSE).
The goal of the project was to define a single UNIX programming
interface specification that would be supported by all types of vendors.
Applications that conform to ANSI C and POSIX also conform to the
X/Open standards, but not necessarily vice versa.
UNIX AND POSIX APIs
API: a set of application programming interface functions that can be
called by user programs to perform system specific functions.
More generally, it is a set of functions and procedures that allow the creation of applications
which access the features or data of an operating system, application, or
other service.

Most UNIX systems provide a common set of APIs to perform the
following functions.
•Determine the system configuration and user information.
•File manipulation.
•Process creation and control.
•Inter-process communication.
•Signals and daemons.
•Network communication.
The POSIX APIs
In general, the uses and behaviors of POSIX APIs are similar to those of UNIX
APIs. However, user programs should define _POSIX_SOURCE or
_POSIX_C_SOURCE to enable the POSIX API
declarations in the header files that they include.

The UNIX and POSIX Development Environment
POSIX provides portability at the source level.
This means that you transport your source program to the target
machine, compile it with the standard C compiler using conforming
headers and link it with the standard libraries.
Some commonly used POSIX.1 and UNIX APIs are declared in the <unistd.h>
header.
Most of the POSIX.1, POSIX.1b and UNIX API object code is stored in the
libc.a and libc.so libraries.
API Common Characteristics
Many APIs return an integer value which indicates the termination
status of their execution.
An API returns -1 to indicate that the execution has failed, and the global
variable errno is set with an error code.
A user process may call the perror function to print a diagnostic message
of the failure to the standard error, or it may call the strerror function and
give it errno as the actual argument value; the strerror function returns
a diagnostic message string and the user process may print that
message in its preferred way.
The possible error status codes that may be assigned to errno by any
API are defined in the <errno.h> header.
Following is a list of commonly occurring error status codes and their
meanings:
Error status code   Meaning
EACCES              A process does not have access permission to perform an
                    operation via an API.
EPERM               An API was aborted because the calling process does not
                    have the superuser privilege.
ENOENT              An invalid filename was specified to an API.
EBADF               An API was called with an invalid file descriptor.
EINTR               An API execution was aborted due to a signal interruption.
EAGAIN              An API was aborted because some system resource it
                    requested was temporarily unavailable. The API should
                    be called again later.
ENOMEM              An API was aborted because it could not allocate dynamic
                    memory.
EIO                 An I/O error occurred in an API execution.
EPIPE               An API attempted to write data to a pipe which has no
                    reader.
EFAULT              An API was passed an invalid address in one of its
                    arguments.
ENOEXEC             An API could not execute a program via one of the exec APIs.
ECHILD              A process does not have any child process which it can
                    wait on.
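The error-reporting convention described above (a return value of -1, errno set, then strerror or perror for the message) might be used as in the following minimal sketch. The helper name try_open is hypothetical; open, close and strerror are the standard calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: tries to open a file read-only.
   On failure it prints the strerror text for errno and returns the errno
   value; on success it closes the file again and returns 0. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;   /* save errno before any other call can change it */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

Calling try_open with the name of a file that does not exist is expected to return ENOENT, matching the table above.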
UNIT 2 UNIX FILES
Files are the building blocks of any operating system.

When you execute a command in UNIX, the UNIX kernel fetches the
corresponding executable file from a file system, loads its instruction
text to memory, and creates a process to execute the command on your
behalf.

In the course of execution, a process may read from or write to files.
All these operations involve files.

Thus, the design of an operating system always begins with an efficient
file management system.
File Types
A file in a UNIX or POSIX system may be one of the following
types:
Regular file
Directory file
FIFO file
Character device file
Block device file
Regular file
A regular file may be either a text file or a binary file
These files may be read or written to by users with the
appropriate access permission
Regular files may be created, browsed through and modified
by various means such as text editors or compilers, and they
can be removed by specific system commands
Directory file
It is like a folder that contains other files, including sub-
directory files.
It provides a means for users to organize their files into
some hierarchical structure based on file relationship or uses.
Ex: /bin directory contains all system executable programs,
such as cat, rm, sort
A directory may be created in UNIX by the mkdir command
o Ex: mkdir /usr/foo/xyz
A directory may be removed via the rmdir command
o Ex: rmdir /usr/foo/xyz
The content of directory may be displayed by the ls
command
Device file
Block device file
It represents a physical device that transmits data a block at a
time.
Ex: hard disk drives and floppy disk drives

Character device file
It represents a physical device that transmits data in a
character-based manner.
Ex: line printers, modems, and consoles
An application program may choose to transfer data
in either a character-based manner (via a character device file) or a
block-based manner (via a block device file).
A device file is created in UNIX via the mknod command
o Ex: mknod /dev/cdsk c 115 5
Here, c - character device file
115 - major device number
5 - minor device number
o For a block device file, use argument ‘b’ instead of ‘c’.
Major device number
An index to a kernel table that contains the addresses of all
device driver functions known to the system. Whenever a
process reads data from or writes data to a device file, the
kernel uses the device file’s major number to select and
invoke a device driver function to carry out the actual data
transfer with a physical device.
Minor device number
An integer value to be passed as an argument to a device
driver function when it is called. It tells the device driver
function what actual physical device it is talking to and the I/O
buffering scheme to be used for data transfer.
FIFO file

It is a special pipe device file which provides a temporary
buffer for two or more processes to communicate by writing
data to and reading data from the buffer.
The size of the buffer is fixed to PIPE_BUF.
Data in the buffer is accessed in a first-in-first-out manner.
The buffer is allocated when the first process opens the
FIFO file for read or write.
The buffer is discarded when all processes close their
references (stream pointers) to the FIFO file.
Data stored in a FIFO buffer is temporary.
A FIFO file may be created via the mkfifo
command.
o The following command creates a FIFO file (if it
does not exist)
mkfifo /usr/prog/fifo_pipe
o The following mknod command, with the p argument, does the same
mknod /usr/prog/fifo_pipe p

FIFO files can be removed using the rm command.
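From a program, the same effect as the mkfifo command can be obtained with the POSIX mkfifo function. The sketch below is illustrative only: the helper name ensure_fifo and the owner-only permission bits are assumptions chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates a FIFO file at path (read/write for the
   owner) unless something already exists there. Returns 0 if a FIFO file
   exists at path afterwards, -1 otherwise. */
int ensure_fifo(const char *path)
{
    struct stat st;

    if (mkfifo(path, S_IRUSR | S_IWUSR) == -1 && errno != EEXIST) {
        perror("mkfifo");
        return -1;
    }
    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    /* an existing file of another type at this path is reported as failure */
    return S_ISFIFO(st.st_mode) ? 0 : -1;
}
```

The FIFO file created this way stays in the file system until it is removed, for example with unlink from a program or rm from the shell.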


Symbolic link file
BSD UNIX and System V Release 4 (SVR4) define a symbolic link file.
A symbolic link file contains a path name which references
another file in either the local or a remote file system.
POSIX.1 does not support the symbolic link file type.
A symbolic link may be created in UNIX via the ln command
Ex: ln -s /usr/divya/original /usr/raj/slink
It is possible to create a symbolic link to reference another
symbolic link.
The rm and mv commands operate only on the
symbolic link arguments directly and not on the files that they
reference; chmod, by contrast, generally acts on the referenced file.
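On systems that provide symbolic links, a program can create one with symlink, read the stored path name with readlink, and test the link itself with lstat. These calls come from BSD and were standardized only in later POSIX revisions, which is why the sketch below assumes _POSIX_C_SOURCE 200809L. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates link_path as a symbolic link that stores
   the path name target, then reads the stored path name back into buf.
   Returns the length of the stored path name, or -1 on error. */
long make_and_read_symlink(const char *target, const char *link_path,
                           char *buf, size_t bufsize)
{
    ssize_t n;

    if (bufsize == 0)
        return -1;
    if (symlink(target, link_path) == -1) {
        perror("symlink");
        return -1;
    }
    n = readlink(link_path, buf, bufsize - 1);
    if (n == -1) {
        perror("readlink");
        return -1;
    }
    buf[n] = '\0';   /* readlink does not append a terminating null byte */
    return (long)n;
}

/* Hypothetical helper: returns 1 if path itself is a symbolic link.
   lstat examines the link, whereas stat would follow it. */
int is_symlink(const char *path)
{
    struct stat st;
    return lstat(path, &st) == 0 && S_ISLNK(st.st_mode);
}
```

Note that the target does not need to exist when the link is created; the link only stores a path name.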
The UNIX and POSIX File Systems
Files in UNIX or POSIX systems are stored in a tree-like hierarchical file
system.
The root of a file system is the root (“/”) directory.
The leaf nodes of a file system tree are either empty directory files or
other types of files.
The absolute path name of a file consists of the names of all the
directories, starting from the root.
Ex: /usr/divya/a.out
A relative path name may consist of the “.” and “..” characters. These
are references to the current and parent directories respectively.
Ex: ../../.login denotes the .login file which may be found 2 levels up
from the current directory
A file name may not exceed NAME_MAX characters (14 bytes) and the
total number of characters of a path name may not exceed PATH_MAX
(1024 bytes).
POSIX.1 defines _POSIX_NAME_MAX and _POSIX_PATH_MAX in the
<limits.h> header.

A file name may use characters from the following set only:
A to Z a to z 0 to 9 _ .

The path name of a file is called a hard link.

A file may be referenced by more than one path name if a user creates
one or more hard links to the file using the ln command.

ln /usr/foo/path1 /usr/prog/new/n1
If the -s option is used, then it is a symbolic (soft) link.
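The effect of a hard link can be observed from a program: after link(path1, path2) both path names refer to the same file, and the file's hard link count goes up by one. The following is a minimal sketch; the helper names link_count and same_file are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the hard link count of path,
   or -1 if stat fails. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    return (long)st.st_nlink;
}

/* Hypothetical helper: returns 1 if both path names refer to the same
   file, i.e. they have the same file system ID and inode number. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, after a successful link("/usr/foo/path1", "/usr/prog/new/n1"), same_file on the two path names would be expected to return 1.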
The following files are commonly defined in most UNIX
systems
File           Use
/etc           Stores system administrative files and programs
/etc/passwd    Stores all user information
/etc/shadow    Stores user passwords
/etc/group     Stores all group information
/bin           Stores all the system programs like cat,
               rm, cp, etc.
/dev           Stores all character device and block
               device files
/usr/include   Stores all standard header files
/usr/lib       Stores standard libraries
/tmp           Stores temporary files created by
               programs
The UNIX and POSIX File Attributes
The general file attributes for each file in a file system are:

1) File type - specifies what type of file it is.
2) Access permission - the file access permission for owner, group and
others.
3) Hard link count - the number of hard links of the file.
4) Uid - the file owner user id.
5) Gid - the file group id.
6) File size - the file size in bytes.
7) Inode no - the system inode no of the file.
8) File system id - the file system id where the file is stored.
9) Last access time - the time the file was last accessed.
10) Last modified time - the time the file was last modified.
11) Last change time - the time the file's status (inode) was last changed.
In addition to the above attributes, UNIX systems also store
the major and minor device numbers for each device file.

All the above attributes are assigned by the kernel to a file
when it is created.
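Most of the attributes listed above can be read from a program with the stat function, which fills in a struct stat. The sketch below prints them in the same order as the list; the helper names file_type_name and show_attributes are hypothetical, and the times are printed as raw seconds since the Epoch for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: maps the file type bits of st_mode to a name. */
const char *file_type_name(mode_t mode)
{
    if (S_ISREG(mode))  return "regular file";
    if (S_ISDIR(mode))  return "directory file";
    if (S_ISFIFO(mode)) return "FIFO file";
    if (S_ISCHR(mode))  return "character device file";
    if (S_ISBLK(mode))  return "block device file";
    return "other";
}

/* Hypothetical helper: prints the general attributes of one file.
   Returns 0 on success, -1 if stat fails. */
int show_attributes(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    printf("file type:          %s\n", file_type_name(st.st_mode));
    printf("access permission:  %03o\n", (unsigned)(st.st_mode & 0777));
    printf("hard link count:    %ld\n", (long)st.st_nlink);
    printf("uid, gid:           %ld, %ld\n", (long)st.st_uid, (long)st.st_gid);
    printf("file size (bytes):  %ld\n", (long)st.st_size);
    printf("inode number:       %lu\n", (unsigned long)st.st_ino);
    printf("file system id:     %lu\n", (unsigned long)st.st_dev);
    printf("last access time:   %ld\n", (long)st.st_atime);
    printf("last modified time: %ld\n", (long)st.st_mtime);
    printf("last change time:   %ld\n", (long)st.st_ctime);
    return 0;
}
```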

The attributes that are constant for any file are:

File type
File inode number
File system ID
Major and minor device number
The other attributes are changed by the following UNIX commands or
system calls
UNIX command   System call   Attributes changed
chmod          chmod         Changes access permission, last
                             change time
chown          chown         Changes UID, last change time
chgrp          chown         Changes GID, last change time
touch          utime         Changes last access time, modification
                             time
ln             link          Increases hard link count
rm             unlink        Decreases hard link count. If the hard link
                             count is zero, the file will be removed from
                             the file system.
vi, emacs                    Changes the file size, last access time, last
                             modification time
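As an illustration of the chmod row in the table above, the following sketch changes a file's access permission with the chmod system call and then reads the stored permission bits back with stat. The helper name make_read_only and the choice of mode 0444 are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: sets the access permission of path to read-only
   for owner, group and others (0444). Returns the permission bits now
   stored for the file, or -1 on error. */
int make_read_only(const char *path)
{
    struct stat st;

    if (chmod(path, S_IRUSR | S_IRGRP | S_IROTH) == -1) {
        perror("chmod");
        return -1;
    }
    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    return (int)(st.st_mode & 0777);
}
```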
Inodes in UNIX System V
In UNIX System V, a file system has an inode table, which keeps track
of all files. Each entry of the inode table is an inode record which
contains all the attributes of a file, including the inode # and the
physical disk address where the data of the file is stored.

For any operation, if the kernel needs to access information about a
file with inode # 15, it scans the inode table to find the entry that
contains inode # 15 in order to access the necessary data.

An inode # is unique only within a file system. A file inode record is
therefore identified by a file system ID and an inode #.

Generally an OS does not keep the name of a file in its inode record,
because the mapping of file names to inode # is done via directory
files, i.e. a directory file contains a list of names and their
respective inode # for all files stored in that directory.
Ex: a sample directory file content
Inode number File name
115 .
89 ..
201 xyz
346 a.out
201 xyz_ln1
To access a file, for example /usr/divya, the UNIX kernel starts from
the inode # of the "/" (root) directory, which it always knows for any
process. It scans the "/" directory file to find the inode # of the usr
file. Once it has the usr file inode #, it reads the contents of the usr
directory file and looks there for the inode # of the divya file.
Whenever a new file is created in a directory, the UNIX kernel
allocates a new entry in the inode table to store the information of the
new file.
It assigns a unique inode # to the file and adds the new file name
and inode # to the directory file that contains it.
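The component-by-component lookup described above can be imitated from user space by calling stat on each prefix of a path. This is only a rough sketch under stated assumptions: the function name print_path_inodes is hypothetical, and the real resolution happens inside the kernel, not through repeated stat calls.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Prints the inode number of every prefix of an absolute path, for
   example "/", "/usr" and "/usr/divya". Returns the number of prefixes
   examined successfully, or -1 if the path is not absolute or too long. */
int print_path_inodes(const char *path)
{
    char prefix[1024];
    struct stat sb;
    size_t len = strlen(path);
    size_t i;
    int count = 0;

    if (path[0] != '/' || len >= sizeof prefix)
        return -1;

    for (i = 0; i < len; i++) {
        size_t n;
        int at_slash = (path[i] == '/');
        int at_end = (i + 1 == len);

        if (!at_slash && !at_end)
            continue;                 /* still inside a component */
        if (i > 0 && at_slash && path[i - 1] == '/')
            continue;                 /* skip repeated slashes */

        if (i == 0)
            n = 1;                    /* the root directory itself */
        else if (at_slash)
            n = i;                    /* prefix up to, not including, this '/' */
        else
            n = i + 1;                /* the whole path */

        memcpy(prefix, path, n);
        prefix[n] = '\0';
        if (stat(prefix, &sb) == -1) {
            perror(prefix);
            break;                    /* a missing component ends the walk */
        }
        printf("%-20s inode %lu\n", prefix, (unsigned long)sb.st_ino);
        count++;
    }
    return count;
}
```

For a path such as /usr/divya, the function would print one line each for "/", "/usr" and "/usr/divya", provided all three exist.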
Application Program Interface to Files
The general interfaces to files on UNIX and POSIX systems are:
Files are identified by pathnames.
Files must be created before they can be used. The various
commands and system calls to create files are listed below.

File type            Commands          System calls
Regular file         vi, pico, emacs   open, creat
Directory file       mkdir             mkdir, mknod
FIFO file            mkfifo            mkfifo, mknod
Device file          mknod             mknod
Symbolic link file   ln -s             symlink
For an application to access a file, the file must first be opened,
generally with the open system call. The returned value is an integer
which is termed the file descriptor.

There is a limit on the number of files a process can have open: at
most OPEN_MAX files at a time. The value is defined in the <limits.h>
header.
Data transfer on any opened file is carried out by the read and write
system calls.
File hard links can be increased by the link system call and decreased
by the unlink system call.
File attributes can be changed by the chown, chmod and link system
calls.
File attributes can be queried (found out or retrieved) by the stat and
fstat system calls.
UNIX and POSIX.1 define a structure of data type stat, which is
declared in the <sys/stat.h> header file.

It contains the user-accessible attributes of a file.

The definition of the structure can differ among implementations,
but it could look like:
struct stat
{
    dev_t   st_dev;   /* file system ID */
    ino_t   st_ino;   /* file inode number */
    mode_t  st_mode;  /* contains file type and permission */
    nlink_t st_nlink; /* hard link count */
    uid_t   st_uid;   /* file user ID */
    gid_t   st_gid;   /* file group ID */
    dev_t   st_rdev;  /* contains major and minor device # */
    off_t   st_size;  /* file size in bytes */
    time_t  st_atime; /* last access time */
    time_t  st_mtime; /* last modification time */
    time_t  st_ctime; /* last status change time */
};
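As an illustration of how these members are read in practice, the sketch below calls stat and prints the main fields. The function names file_type and print_file_attributes are assumptions introduced for this example; the casts are there only because the exact integer types behind dev_t, ino_t and the others vary among implementations.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <stdio.h>
#include <sys/stat.h>

/* Maps the file type bits of st_mode to a readable name. */
static const char *file_type(mode_t mode)
{
    if (S_ISREG(mode))  return "regular file";
    if (S_ISDIR(mode))  return "directory";
    if (S_ISCHR(mode))  return "character device";
    if (S_ISBLK(mode))  return "block device";
    if (S_ISFIFO(mode)) return "FIFO";
    return "other";
}

/* Queries the attributes of path with stat and prints them.
   Returns 0 on success, -1 if stat fails. */
int print_file_attributes(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1) {
        perror(path);
        return -1;
    }
    printf("file type      : %s\n", file_type(sb.st_mode));
    printf("permission     : %03o\n", (unsigned)(sb.st_mode & 0777));
    printf("file system ID : %lu\n", (unsigned long)sb.st_dev);
    printf("inode number   : %lu\n", (unsigned long)sb.st_ino);
    printf("hard links     : %ld\n", (long)sb.st_nlink);
    printf("UID / GID      : %ld / %ld\n", (long)sb.st_uid, (long)sb.st_gid);
    printf("size in bytes  : %ld\n", (long)sb.st_size);
    printf("last access    : %ld\n", (long)sb.st_atime);
    printf("last modified  : %ld\n", (long)sb.st_mtime);
    printf("last change    : %ld\n", (long)sb.st_ctime);
    return 0;
}
```

The three time values are printed as raw seconds since the epoch; a function such as ctime could be used to format them.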
UNIX Kernel Support for Files
In UNIX System V, the kernel maintains a file table that has an
entry for every opened file, and an inode table that contains a
copy of the file inodes that were most recently accessed.

A process, which is created when a command is executed, has its
own data space (data structure) which holds its file descriptor
table. The file descriptor table has a maximum of OPEN_MAX
entries.

Whenever the process calls the open function to open a file to
read or write, the kernel resolves the pathname to the file
inode number.
The steps involved are:
1. The kernel searches the process file descriptor table for the
first unused entry. If an entry is found, that entry is
designated to reference the file. The index of the entry is
returned to the process as the file descriptor of the opened
file.

2. The kernel scans the file table in its kernel space to find
an unused entry that can be assigned to reference the file.
If an unused entry is found, the following events occur:
o The process file descriptor table entry is set to point to
this file table entry.
o The file table entry is set to point to the inode table
entry where the inode record of the file is stored.
o The file table entry holds the current file pointer of the
open file. This is an offset from the beginning of the file
where the next read or write will occur.
o The file table entry holds an open mode that specifies
whether the file is opened for read only, write only, read and
write, etc. This is specified in the open function call.
o The reference count (rc) in the file table entry is set to 1.
The reference count keeps track of how many file descriptors
from any process refer to the entry.
o The reference count of the in-memory inode of the file is
increased by 1. This count specifies how many file table
entries point to that inode.

If either (1) or (2) fails, the open system call returns -1
(failure/error).
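Step 1 above, the search for the first unused descriptor table entry, can be observed from a program: after a descriptor is closed, the next open gets the same number back. The following is a small sketch assuming a single-threaded process; the function name reopen_reuses_descriptor is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Opens path, closes it, and opens it again. Returns 1 if both opens
   produced the same descriptor number, 0 if not, -1 if open failed. */
int reopen_reuses_descriptor(const char *path)
{
    int first, second;

    first = open(path, O_RDONLY);
    if (first == -1) {
        perror(path);
        return -1;
    }
    printf("first open  -> descriptor %d\n", first);
    close(first);   /* the descriptor table entry becomes unused again */

    second = open(path, O_RDONLY);
    if (second == -1) {
        perror(path);
        return -1;
    }
    printf("second open -> descriptor %d\n", second);
    close(second);

    return first == second;
}
```

In a process where only the three standard descriptors are open, both calls would typically report descriptor 3.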
Data Structure for File Manipulation
Normally the reference count in the file table entry is 1.
It can be increased by the fork, dup and dup2 system
calls, which make additional file descriptors refer to
the same file table entry.

When an open system call succeeds, its return value is
an integer (the file descriptor).
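Because a descriptor created by dup refers to the same file table entry, it shares the file pointer with the original, whereas a second open of the same file gets a file table entry (and file pointer) of its own. The sketch below checks this; the function name dup_shares_offset and the scratch file it creates are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Creates a 6-byte scratch file at path, then reads 3 bytes through one
   descriptor. Returns 1 if a dup'ed descriptor sees offset 3 while a
   separately opened descriptor still sees offset 0; 0 otherwise;
   -1 on error. */
int dup_shares_offset(const char *path)
{
    char buf[3];
    int fd, copy, other;
    int result = -1;

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        unlink(path);
        return -1;
    }

    copy = dup(fd);                 /* same file table entry, rc becomes 2 */
    other = open(path, O_RDONLY);   /* new file table entry, own offset */

    if (copy != -1 && other != -1 && read(fd, buf, sizeof buf) == 3) {
        off_t copy_off = lseek(copy, 0, SEEK_CUR);
        off_t other_off = lseek(other, 0, SEEK_CUR);

        printf("offset via dup : %ld\n", (long)copy_off);
        printf("offset via open: %ld\n", (long)other_off);
        result = (copy_off == 3 && other_off == 0);
    }

    if (copy != -1)
        close(copy);
    if (other != -1)
        close(other);
    close(fd);
    unlink(path);
    return result;
}
```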
Whenever the process wants to read or write data from the file, it
passes the file descriptor as one of the arguments. The following
events occur whenever a process calls the close function to close
an opened file:

1. The kernel sets the corresponding file descriptor table entry to
unused.
2. It decrements the rc in the corresponding file table entry by 1. If
rc is not equal to 0, go to step 6.
3. The file table entry is marked as unused.
4. The rc in the corresponding file inode table entry is decremented
by 1. If rc is not equal to 0, go to step 6.
5. If the hard link count of the inode is not zero, it returns to the
caller with a success status; otherwise it marks the inode table entry
as unused and de-allocates all the physical disk storage of the file.
6. It returns to the process with a 0 (success) status.
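Step 5 explains a well-known behaviour: a file whose last hard link has been removed stays readable through an open descriptor, and its storage is released only when that descriptor is closed. The sketch below demonstrates this on a local file system (behaviour on network file systems may differ); the function name read_after_unlink is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a scratch file at path, unlinks it while it is still open,
   and then reads the data back through the open descriptor.
   Returns 1 if the name is gone, the hard link count is 0 and the data
   is still readable; 0 otherwise; -1 on a setup error. */
int read_after_unlink(const char *path)
{
    char buf[5];
    struct stat sb;
    int fd, ok;

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (unlink(path) == -1) {       /* hard link count drops to 0 */
        close(fd);
        return -1;
    }

    /* The name no longer resolves, but the in-memory inode still has
       one reference (this descriptor), so the data is kept. */
    ok = (stat(path, &sb) == -1)
      && (fstat(fd, &sb) == 0) && (sb.st_nlink == 0)
      && (lseek(fd, 0, SEEK_SET) == 0)
      && (read(fd, buf, sizeof buf) == 5);

    printf("data readable after unlink: %s\n", ok ? "yes" : "no");

    close(fd);  /* last reference gone: the kernel can free the storage */
    return ok;
}
```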
Relationship of C Stream Pointers and File Descriptors
The major differences between stream pointers and file
descriptors are as follows:
The file descriptor associated with a stream pointer can be
extracted by the fileno macro, which is declared in the <stdio.h>
header.

int fileno(FILE *stream_pointer);

To convert a file descriptor to a stream pointer, we can use the
fdopen C library function:

FILE *fdopen(int file_descriptor, const char *open_mode);
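The two conversions can be combined in a short round trip: open a file to get a descriptor, wrap it in a stream with fdopen, and recover the descriptor with fileno. This is a minimal sketch; the function name descriptor_round_trip and the scratch file it creates are assumptions for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fdopen and fileno under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Opens a scratch file at path, converts the descriptor to a stream and
   back. Returns 1 if fileno gives back the original descriptor,
   0 if not, -1 on error. */
int descriptor_round_trip(const char *path)
{
    int fd, same;
    FILE *fp;

    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    fp = fdopen(fd, "w");   /* wrap the descriptor in a buffered stream */
    if (fp == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }

    same = (fileno(fp) == fd);
    fprintf(fp, "written through the stream on descriptor %d\n", fileno(fp));

    fclose(fp);             /* flushes the buffer and closes fd as well */
    unlink(path);
    return same;
}
```

Note that the open mode given to fdopen ("w" here) has to be compatible with the mode the descriptor was opened with, and that after fclose the descriptor must not be closed a second time.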


The following lists some C library functions and the underlying
UNIX APIs they use to perform their functions:

C library function              UNIX system call used
fopen                           open
fread, fgetc, fscanf, fgets     read
fwrite, fputc, fprintf, fputs   write
fseek, ftell, rewind            lseek
fclose                          close
Directory Files
A directory file is a record-oriented file.
Each record contains the information of a file
residing in that directory.
The record data type is struct dirent in UNIX System
V and POSIX.1, and struct direct in BSD UNIX.
The record content is implementation-dependent.
All implementations contain 2 essential member fields:
o File name
o Inode number
Its usage is to map file names to the corresponding inode numbers.

Directory function   Purpose
opendir              Opens a directory file
readdir              Reads next record from the file
closedir             Closes a directory file
rewinddir            Sets file pointer to beginning of file
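The functions in the table are typically used together in a simple loop. The sketch below prints the two essential fields of each record, using the struct dirent members d_ino and d_name; the function name list_directory is an assumption for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std modes */
#include <dirent.h>
#include <stdio.h>

/* Prints the inode number and file name of every record in dirpath.
   Returns the number of records read, or -1 if the directory
   cannot be opened. */
int list_directory(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL) {
        perror(dirpath);
        return -1;
    }
    /* readdir returns NULL when there are no more records. */
    while ((entry = readdir(dir)) != NULL) {
        printf("%10lu  %s\n", (unsigned long)entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

On most systems the output includes the "." and ".." records, in the same inode number / file name layout as the sample directory content shown earlier.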
Hard and Symbolic Links
A hard link is a UNIX pathname for a file. Most UNIX files have only
one hard link.

To create a hard link, we use the command ln.

Example: Consider a file /usr/divya/old. We can create a hard link to
it with ln /usr/divya/old /usr/divya/new. After this we can refer to
the file by either /usr/divya/old or /usr/divya/new.

A symbolic link can be created by the same command ln, but with the
option -s.
Example: ln -s /usr/divya/old /usr/divya/new

The ln command differs from the cp (copy) command in that cp creates a
duplicated copy of a file in another file with a different pathname,
whereas the ln command creates a new directory entry to reference a
file.
Let's visualize the content of a directory file after the
execution of the command ln.
Case 1: for a hard link, ln /usr/divya/abc /usr/raj/xyz
In the directory files /usr/divya and /usr/raj, both
/usr/divya/abc and /usr/raj/xyz refer to the same inode
number 201, thus no new file is created.

Case 2: For the same operation, if the ln -s command is used,
then a new inode will be created.

ln -s /usr/divya/abc /usr/raj/xyz

The directory files divya and raj will then hold different
inode numbers for abc and xyz.

If the cp command was used, the data contents would be
identical and the 2 files would be separate objects in the
file system, whereas with ln -s the data of the new file
contains only the path name.
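The two cases can be checked from a program with the link and symlink system calls: a hard link has the same inode number as the original, while a symbolic link has an inode of its own whose data is the path name. The following is a rough sketch; the function name compare_links and the three scratch path names supplied by the caller are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose symlink and lstat under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates target, a hard link to it and a symbolic link to it, then
   compares their inode numbers. Returns 1 if the hard link shares the
   target's inode and the symbolic link has a different one; 0 if not;
   -1 on error. All three names are removed again before returning. */
int compare_links(const char *target, const char *hard, const char *soft)
{
    struct stat t, h, s;
    int fd, result = -1;

    unlink(hard);                   /* ignore errors: names may not exist */
    unlink(soft);
    fd = open(target, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    if (link(target, hard) == 0 && symlink(target, soft) == 0 &&
        stat(target, &t) == 0 && stat(hard, &h) == 0 &&
        lstat(soft, &s) == 0) {     /* lstat examines the link itself */
        printf("target    inode %lu, links %ld\n",
               (unsigned long)t.st_ino, (long)t.st_nlink);
        printf("hard link inode %lu\n", (unsigned long)h.st_ino);
        printf("sym link  inode %lu, size %ld\n",
               (unsigned long)s.st_ino, (long)s.st_size);
        result = (h.st_ino == t.st_ino) && (s.st_ino != t.st_ino)
              && S_ISLNK(s.st_mode) && (t.st_nlink == 2);
    }

    unlink(soft);
    unlink(hard);
    unlink(target);
    return result;
}
```

The size reported for the symbolic link is usually the length of the path name it stores, which matches the statement that its data contains only the path name.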
Limitations of hard link:

1. A user cannot create hard links for directories, unless the user
has super-user privileges.
2. A user cannot create a hard link on a file system that
references files on a different file system, because an inode
number is unique only within a file system.

Differences between hard link and symbolic link are listed below:
