Coos Unit V Part 1&2
Coos Unit V Part 1&2
Coos Unit V Part 1&2
Part-1 Part-2
File System Interface: The Concept of a File File System Implementation:
Access Methods File System Structure
Directory Structure File system Implementation
File System Mounting Directory Implementation
File Sharing Allocation Methods
Protection. Free-Space Management
Page 1 of 16
6. Truncating a file. The user may want to erase the contents of a file but keep its attributes. Rather than
forcing the user to delete the file and then recreate it, this function allows all attributes to remain unchanged
(except for file length) but lets the file be reset to length zero and its file space released.
File types-
File Type Usual extension Function
Executable exe, com, bin or none ready-to-run machine- language program
Object obj, o complied, machine language, not linked
Source code c. p, pas, 177, asm, a source code in various languages
Batch bat, sh Series of commands to be executed
Text txt, doc textual data documents
Word processor doc,docs, tex, rrf, etc. various word-processor formats
Library lib, a,so,dll libraries of routines
Related files grouped into one file, sometimes
Archive arc, zip, tar
compressed.
Print or view ASCII or binary file in a format for printing of
Ps,pdf,jpg
viewing
File structure
A File Structure needs to be predefined format in such a way that an operating system understands. It has an
exclusively defined structure, which is based on its type.
Three types of files structure in OS:
A text file: It is a series of characters that is organized in lines.
An object file: It is a series of bytes that is organized into blocks.
A source file: It is a series of functions and processes.
Internal File Structure
Files can be structured in any of several ways. Three common possibilities are depicted in Fig. .
Figure: Three kinds of files. (a) Byte sequence. (b) Record sequence. (c) Tree.
Stream of Bytes. The file in Fig. a is an unstructured sequence of bytes. In effect, the operating system does
not know or care what is in the file. All it sees are bytes. Both UNIX and Windows use this approach.
Records. The first step up in structure is shown in Fig. b. A file is a sequence of fixed-length records, each
with some internal structure.
Tree of Records. The third kind of file structure is shown in Fig. c. In this organization, a file consists of a tree
of records, not necessarily all the same length, each containing a key field in a fixed position in the record.
Page 2 of 16
1. Sequential Access:
It is the simplest access method. Information in the file is processed in order, one record after the other.
This mode of access is by far the most common; for example, editor and compiler usually access the file
in this fashion.
Read and write make up the bulk of the operation on a file. A read operation -read next- read the next
position of the file and automatically advance a file pointer, which keeps track I/O location. Similarly, for
the -write next- append to the end of the file and advance to the newly written material.
Key points:
1. Data is accessed one record right after another record in an order.
2. When we use read command, it move ahead pointer by one
3. When we use write command, it will allocate memory and move the pointer to the end of the file
4. Such a method is reasonable for tape.
2. Direct Access:
Another method is direct access method also known as relative access method. A filed-length logical record
that allows the program to read and write record rapidly. in no particular order. The direct access is based on
the disk model of a file since disk allows random access to any file block. For direct access, the file is
viewed as a numbered sequence of block or record. Thus, we may read block 14 then block 59, and then we
can write block 17. There is no restriction on the order of reading and writing for a direct access file.
A block number provided by the user to the operating system is normally a relative block number; the first
relative block of the file is 0 and then 1 and so on.
It is the other method of accessing a file that is built on the top of the sequential access method. These
methods construct an index for the file. The index, like an index in the back of a book, contains the pointer
to the various blocks. To find a record in the file, we first search the index, and then by the help of pointer
we access the file directly.
Key points:
1. It is built on top of Sequential access.
2. It controls the pointer by using index.
Directory Structure:
A directory is a container that is used to contain folders and files. It organizes files and folders in a
hierarchical manner.
Page 3 of 16
There are several logical structures of a directory, these are given below.
Single-level directory:
The single-level directory is the simplest directory structure. In it, all files are contained in the same
directory which makes it easy to support and understand.
A single level directory has a significant limitation, however, when the number of files increases or when the
system has more than one user. Since all the files are in the same directory, they must have a unique name. If
two users call their dataset test, then the unique name rule violated.
Advantages:
1. Since it is a single directory, so its implementation is very easy.
2. If the files are smaller in size, searching will become faster.
3. The operations like file creation, searching, deletion, updating is very easy in such a directory
structure.
Disadvantages:
1. There may change of name collision because two files can have the same name.
2. Searching will become time taking if the directory is large.
3. This cannot group the same type of files together.
Two-level directory –
As we have seen, a single level directory often leads to confusion of files names among different users. The
solution to this problem is to create a separate directory for each user. In the two-level directory structure,
each user has their own user files directory (UFD). The UFDs have similar structures, but each lists only the
files of a single user. System’s master file directory (MFD) is searches whenever a new user id=s logged in.
The MFD is indexed by username or account number, and each entry points to the UFD for that user.
Advantages:
1. We can give full path like /User-name/directory-name/.
2. Different users can have the same directory as well as the file name.
3. Searching of files becomes easier due to pathname and user-grouping.
Disadvantages:
1. A user is not allowed to share files with other users.
2. Still, it not very scalable, two files of the same type cannot be grouped together in the same user.
Page 4 of 16
Tree-structured directory –
Once we have seen a two-level directory as a tree of height 2, the natural generalization is to extend the
directory structure to a tree of arbitrary height.
This generalization allows the user to create their own subdirectories and to organize their files accordingly.
A tree structure is the most common directory structure. The tree has a root directory, and every file in the
system has a unique path.
Advantages:
1. Very general, since full pathname can be given.
2. Very scalable, the probability of name collision is less.
3. Searching becomes very easy, we can use both absolute paths as well as relative.
Disadvantages:
1. Every file does not fit into the hierarchical model; files may be saved into multiple directories.
2. We cannot share files.
3. It is inefficient, because accessing a file may go under multiple directories.
An acyclic graph is a graph with no cycle and allows us to share subdirectories and files. The same file or
subdirectories may be in two different directories. It is a natural generalization of the tree-structured
directory.
It is used in the situation like when two programmers are working on a joint project and they need to access
files. The associated files are stored in a subdirectory, separating them from other projects and files of other
programmers since they are working on a joint project so they want the subdirectories to be into their own
directories. The common subdirectories should be shared. So here we use acyclic directories.
It is the point to note that the shared file is not the same as the copy file. If any programmer makes some
changes in the subdirectory it will reflect in both subdirectories.
Page 5 of 16
Advantages:
1. We can share files.
2. Searching is easy due to different-different paths.
Disadvantages:
1. We share the files via linking, in case deleting it may create the problem,
2. If the link is a soft link then after deleting the file we left with a dangling pointer.
3. In the case of a hard link, to delete a file we have to delete all the references associated with it.
Advantages:
1. It allows cycles.
2. It is more flexible than other directories structure.
Disadvantages:
1. It is more costly than others.
2. It needs garbage collection.
Operating system organizes files on a disk using file system. And these file systems differ with operating
systems. We have greater number of file systems available for Linux. One of the great benefits of Linux is that
we can access data on many different file systems even if these file systems are from different operating system.
Page 6 of 16
In order to access the file system in the Linux first we need to mount it. Mounting refers to making a group of
files in a file system structure accessible to user or group of users. It is done by attaching a root directory from
one file system to that of another. This ensures that the other directory or device appears as a directory or
subdirectory of that system.
Mounting may be local or remote. In local mounting it connects disk drivers as one machine such that they
behave as single logical system, while remote mount uses NFS(Network file system) to connect to directories on
other machine so that they can be used as if they are the part of the users file system. Opposite of mounting is
unmounting [commands for the same will be discussed later] and it is removal of access of those files/folders. It
was the overview of what a mounting is.
In UNIX based system there is a single directory tree, and all the accessible storage must have an associated
location in a single directory tree. And for making the storage accessible mounting is used.
File Sharing:
1. Multiple Users:
• When an operating system accommodates multiple users, the issues of file sharing, file naming and file
protection become preeminent.
• The system either can allow user to access the file of other users by default, or it may require that a user
specifically grant access to the files.
• These are the issues of access control and protection.
• To implementing sharing and protection, the system must maintain more file and directory attributes than a
on a single-user system.
• The owner is the user who may change attributes, grand access, and has the most control over the file or
directory.
• The group attribute of a file is used to define a subset of users who may share access to the file.
• Most systems implement owner attributes by managing a list of user names and associated user identifiers
(user Ids).
• When a user logs in to the system, the authentication stage determines the appropriate.
• User ID for the user. That user ID is associated with all of user’s processes and threads. When they need to
be user readable, they are translated, back to the user name via the user name list.
• Likewise, group functionality can be implemented as a system wide list of group names and group
identifiers.
• Every user can be in one or more groups, depending upon operating system design decisions. The
user’s group Ids is also included in every associated process and thread.
Page 7 of 16
2. Remote File System:
Networks allowed communications between remote computers.
• Networking allows the sharing or resource spread within a campus or even around the world. User
manually transfer files between machines via programs like ftp.
• A distributed file system (DFS) in which remote directories is visible from the local machine.
• · The World Wide Web: A browser is needed to gain access to the remote file and separate operations
(essentially a wrapper for ftp) are used to transfer files.
3. The client-server Model:
Remote file systems allow a computer to a mount one or more file systems from one or more remote
machines. A server can serve multiple clients, and a client can use multiple servers, depending on the
implementation details of a given client –server facility. Client identification is
more difficult. Clients can be specified by their network name or other identifier, such as IP address, but
these can be spoofed (or imitate). An unauthorized client can spoof the server into deciding that it is
authorized, and the unauthorized client could be allowed access.
4. Distributed Information systems:
Distributed information systems, also known as distributed naming service, have been devised to provide a
unified access to the information needed for remote computing.
i. Domain name system (DNS) provides host-name-to-network address translations for their entire
Internet (including the World Wide Web). Before DNS was invented and became widespread, files
containing the same information were sent via e-mail of ftp between all networked hosts.
5. Failure Modes:
· Redundant arrays of inexpensive disks (RAID) can prevent the loss of a disk from resulting in the loss of
data. Remote file system has more failure modes. By nature of the complexity of networking system and the
required interactions between remote machines, many more problems can interfere with the proper operation of
remote file systems.
6. Consistency Semantics:
It is characterization of the system that specifies the semantics of multiple users accessing a shared file
simultaneously.
• These semantics should specify when modifications of data by one user are observable by other users.
• The semantics are typically implemented as code with the file system.
• A series of file accesses (that is reads and writes) attempted by a user to the same file is always enclosed
between the open and close operations.
• The series of access between the open and close operations is a file session.
7. UNIX Semantics:
The UNIX file system uses the following consistency semantics:
1. Writes to an open file by a user are visible immediately to other users that have this file open at the same
time.
2. One mode of sharing allows users to share the pointer of current location into the file. Thus, the advancing
of the pointer by one user affects all sharing users.
8. Session Semantics:
The Andrew file system (AFS) uses the following consistency semantics:
1. Writes to an open file by a user are not visible immediately to other users that have the same file open
simultaneously.
2. Once a file is closed, the changes made to it are visible only in sessions starting later. Already open
instances of the file do not reflect this change.
9. Immutable –shared File Semantics:
Once a file is declared as shared by its creator, it cannot be modified.
a. An immutable file has two key properties:
b. Its name may not be reused and its contents may not be altered.
Page 8 of 16
Protection in File System
• Users want to protect the information stored in the file system from improper access and physical
damage.
• To protect our information, one can make duplicate copies of the files; some systems automatically
copy the files to protect the user from losing important information if the original files are
accidentally destroyed.
• There can be situations of hardware errors like extreme temperature, vandalism, head crashes,
failures, etc that may cause damage to the files.
Types of access:
File sharing is the method of providing partial or full access to the users of the file system, as multiple
users have access to the same data; there is a need for protection.
One way to protect file systems can be to prohibit access, but it is an extreme scenario and is not of
practical use. What is needed is controlled access. It may depend on various factors.
These are the following operations that can be controlled:
1. Delete: Delete the file and free space for possible reuse. 4. Read: Read from the file.
2. Append: Write new information at the end of the file. 5. Write: Write or rewrite the file.
3. Execute: Load the file into memory and execute it. 6. List: List the name and attributes of the file.
However, protection is provided only at a lower level. For example, copying a file may be implemented
by simply providing a sequence of read requests. In this scenario, a user with read access can copy the
file and print it, and so on.
Access Control
There are numerous ways to access any file, one of the prominent ones is to associate identity-
dependent access with all files and directories. A list is created called the access-control list which
enlists the names of users and the type of access granted to them. However, it is very long as all the
users need to be listed down. This process can often be tedious and unrewarding, especially if one does
not have the list of users before the task.
To resolve this situation and condense the length of access-control list, the following classifications are
used:
*****************************************************************************************
Page 9 of 16
Part-2:
File System Implementation:
File is a collection of related information. The file system resides on secondary storage and provides
efficient and convenient access to the disk by allowing data to be stored, located, and retrieved.
File system organized in many layers:
Page 10 of 16
Directory Structure –
They store file names and associated inode numbers. In UNIX, includes file names and associated file names
and in NTFS, it is stored in master file table.
Per-File FCB –
It contains details about files and it has a unique identifier number to allow association with directory entry.
In NTFS it is stored in master file table.
2. In-Memory Structure:
They are maintained in main-memory and these are helpful for file system management for caching. Several
in-memory structures given below:
Mount Table –
It contains information about each mounted volume.
Directory-Structure cache –
This cache holds the directory information of recently accessed directories.
System wide open-file table –
It contains the copy of FCB of each open file.
Per-process open-file table –
It contains information opened by that particular process and it maps with appropriate system wide open-file.
Directory Implementation:
Linear List –
It maintains a linear list of filenames with pointers to the data blocks. It is time-consuming also. To create a
new file, we must first search the directory to be sure that no existing file has the same name then we add a
file at end of the directory. To delete a file, we search the directory for the named file and release the space.
To reuse the directory entry either we can mark the entry as unused or we can attach it to a list of free
directories.
Hash Table –
The hash table takes a value computed from the file name and returns a pointer to the file. It decreases the
directory search time. The insertion and deletion process of files is easy. The major difficulty is hash tables
are its generally fixed size and hash tables are dependent on hash function on that size.
Page 11 of 16
File Allocation Methods
The allocation methods define how the files are stored in the disk blocks. There are three main disk space or
file allocation methods.
1. Contiguous Allocation
2. Linked Allocation
3. Indexed Allocation
The main idea behind these methods is to provide:
• Efficient disk space utilization.
• Fast access to the file blocks.
All the three methods have their own advantages and disadvantages as discussed below:
1. Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a file requires n
blocks and is given a block b as the starting location, then the blocks assigned to the file will be: b, b+1,
b+2,……b+n-1. This means that given the starting block address and the length of the file (in terms of blocks
required), we can determine the blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
Address of starting block
Length of the allocated portion.
The file ‘mail’ in the following figure starts from the block 19 with length = 6 blocks. Therefore, it
occupies 19, 20, 21, 22, 23, 24 blocks.
Advantages:
Both the Sequential and Direct Accesses are supported by this. For direct access, the address of the kth block
of the file which starts at block b can easily be obtained as (b+k).
This is extremely fast since the number of seeks are minimal because of contiguous allocation of file blocks.
Disadvantages:
• This method suffers from both internal and external fragmentation. This makes it inefficient in terms of
memory utilization.
• Increasing file size is difficult because it depends on the availability of contiguous memory at a particular
instance.
Page 12 of 16
The directory entry contains a pointer to the starting and the ending file block. Each block contains a pointer
to the next block occupied by the file.
The file ‘jeep’ in following image shows how the blocks are randomly distributed. The last block (25)
contains -1 indicating a null pointer and does not point to any other block.
Advantages:
• This is very flexible in terms of file size. File size can be increased easily since the system does not have
to look for a contiguous chunk of memory.
• This method does not suffer from external fragmentation. This makes it relatively better in terms of
memory utilization.
Disadvantages:
• Because the file blocks are distributed randomly on the disk, a large number of seeks are needed to access
every block individually. This makes linked allocation slower.
• It does not support random or direct access. We can not directly access the blocks of a file. A block k of a
file can be accessed by traversing k blocks sequentially (sequential access ) from the starting block of the
file via block pointers.
• Pointers required in the linked allocation incur some extra overhead.
3. Indexed Allocation
In this scheme, a special block known as the Index block contains the pointers to all the blocks occupied by
a file. Each file has its own index block. The ith entry in the index block contains the disk address of the ith
file block. The directory entry contains the address of the index block as shown in the image:
Page 13 of 16
Advantages:
• This supports direct access to the blocks occupied by the file and therefore provides fast access to the file
blocks.
• It overcomes the problem of external fragmentation.
Disadvantages:
• The pointer overhead for indexed allocation is greater than linked allocation.
• For very small files, say files that expand only 2-3 blocks, the indexed allocation would keep one entire
block (index block) for the pointers which is inefficient in terms of memory utilization. However, in
linked allocation we lose the space of only 1 pointer per block.
For files that are very large, single index block may not be able to hold all the pointers.
Following mechanisms can be used to resolve this:
1. Linked scheme: This scheme links two or more index blocks together for holding the pointers. Every
index block would then contain a pointer or the address to the next index block.
2. Multilevel index: In this policy, a first level index block is used to point to the second level index blocks
which inturn points to the disk blocks occupied by the file. This can be extended to 3 or more levels
depending on the maximum file size.
3. Combined Scheme: In this scheme, a special block called the Inode (information Node) contains all the
information about the file such as the name, size, authority, etc and the remaining space of Inode is used
to store the Disk Block addresses which contain the actual file as shown in the image below. The first few
of these pointers in Inode point to the direct blocks i.e the pointers contain the addresses of the disk
blocks that contain data of the file. The next few pointers point to indirect blocks. Indirect blocks may be
single indirect, double indirect or triple indirect. Single Indirect block is the disk block that does not
contain the file data but the disk address of the blocks that contain the file data. Similarly, double
indirect blocks do not contain the file data but the disk address of the blocks that contain the address of
the blocks containing the file data.
Page 14 of 16
Free space management:
The system keeps tracks of the free disk blocks for allocating space to files when they are created. Also, to
reuse the space released from deleting the files, free space management becomes crucial. The system
maintains a free space list which keeps track of the disk blocks that are not allocated to some file or
directory. The free space list can be implemented mainly as:
1. Bitmap or Bit vector –
A Bitmap or Bit Vector is series or collection of bits where each bit corresponds to a disk block. The bit can
take two values: 0 and 1: 0 indicates that the block is allocated and 1 indicates a free block.
The given instance of disk blocks on the disk in Figure 1 (where green blocks are allocated) can be
represented by a bitmap of 16 bits as: 0000111000000110.
Advantages:
• Simple to understand.
• Finding the first free block is efficient. It requires scanning the words (a group of 8 bits) in a bitmap
for a non-zero word. (A 0-valued word has all bits 0). The first free block is then found by scanning
for the first 1 bit in the non-zero word.
The block number can be calculated as:
(number of bits per word) *(number of 0-values words) + offset of bit first bit 1 in the non-zero word .
For the Figure-1, we scan the bitmap sequentially for the first non-zero word.
The first group of 8 bits (00001110) constitute a non-zero word since all bits are not 0. After the non-0
word is found, we look for the first 1 bit. This is the 5th bit of the non-zero word. So, offset = 5.
Therefore, the first free block number = 8*0+5 = 5.
2. Linked List –
In this approach, the free disk blocks are linked together i.e. a free block contains a pointer to the next
free block. The block number of the very first disk block is stored at a separate location on disk and is
also cached in memory.
Page 15 of 16
In Figure-2, the free space list head points to Block 5 which points to Block 6, the next free block and so
on. The last free block would contain a null pointer indicating the end of free list.
A drawback of this method is the I/O required for free space list traversal.
3. Grouping –
This approach stores the address of the free blocks in the first free block. The first free block stores the
address of some, say n free blocks. Out of these n blocks, the first n-1 blocks are actually free and the last
block contains the address of next free n blocks.
An advantage of this approach is that the addresses of a group of free disk blocks can be found easily.
4. Counting –
This approach stores the address of the first free disk block and a number n of free contiguous disk blocks
that follow the first block.
Every entry in the list would contain:
1. Address of first free disk block
2. A number n
For example, in Figure-1, the first entry of the free space list would be: ([Address of Block 5], 2), because
2 contiguous free blocks follow block 5.
Page 16 of 16