OSY Chapter 6 SSP


Chapter 6

File Management
• A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.

• Commonly, files represent programs (source and object forms) and data. Data files may be
numeric, alphabetic, alphanumeric or binary.

• In general, a file is a sequence of bits, bytes, lines or records whose meaning is defined by
the file’s creator and user.

• Many different types of information may be stored in a file: Source programs, Object
programs, Executable programs, Numeric Data, Text, Payroll records, Graphic Images,
Sound recording and so on.

File Attributes:
• Name: The symbolic file name is the only information kept in human readable form.

• Type: This information is needed for those systems that support different types.

• Location: This information is a pointer to a device and to the location of the file on that
device.

• Size: The current size of the file (in bytes, words or blocks) and possibly the maximum
allowed size are included in this attribute.

• Protection: Access-control information determines who can read, write, execute and so
on.

• Time, Date and User Identification: This information may be kept for creation, Last
modification and last use. These data can be useful for protection, security and usage
monitoring.

• Identifier: The file system assigns a unique tag, usually a number, that identifies the file
within the file system and is used to refer to the file internally.

• Creator or Owner: A creator is a user or a person who has created that file and the owner
is a person who owns that file currently.

File Operations:
• Creating a file: Two steps are necessary to create a file. First space in the file system must
be found for the file. Second an entry for the new file must be made in the directory. The
directory entry records the name of the file and the location in the file system.

• Writing a file: To write a file, we make a system call specifying both the name of the file
and the information to be written to the file. Given the name of the file, the system searches
the directory to find the location of the file. The system keeps a write pointer to the location
in the file where the next write is to take place, and this pointer must be updated whenever
a write occurs.

• Reading a file: To read from a file, we use a system call that specifies the name of the file
and where (in memory) the next block of the file should be put. The system needs to keep a
read pointer to the location in the file where the next read is to take place. Once the read has
taken place, the read pointer is updated.

• Repositioning within a file: The directory is searched for the appropriate entry, and the
current file position is set to a given value. Repositioning within a file does not need to
involve any actual I/O. This file operation is also known as a file seek.

• Deleting a file: To delete a file, we search the directory for the named file. Having found
the associated directory entry, we release all file space and erase the directory entry.

• Truncating a file: When the user wants to erase the contents of a file but keep its
attributes, truncation avoids deleting the file and then recreating it. All attributes remain
unchanged except the file length, which is reset to zero.

• Other common operations include appending new information to the end of an existing
file, and renaming an existing file.
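
The operations above map closely onto the file calls that most languages expose. The following is a minimal sketch in Python, assuming a temporary directory and the hypothetical file names notes.txt and renamed.txt; it shows one way to exercise each operation, not how the operating system implements them.

```python
import os
import tempfile

def demo_file_operations(directory):
    """Run create, write, append, read, seek, truncate, rename and delete on one file."""
    path = os.path.join(directory, "notes.txt")       # hypothetical file name

    # Create + write: a directory entry is made and the write pointer advances.
    with open(path, "w") as f:
        f.write("first line\n")

    # Append: new information goes to the end of the existing file.
    with open(path, "a") as f:
        f.write("second line\n")

    with open(path, "r") as f:
        first = f.readline()      # a read advances the read pointer
        f.seek(0)                 # reposition (file seek) back to the start
        again = f.readline()      # the same line is read a second time

    # Truncate: name and attributes stay, the length becomes zero.
    with open(path, "r+") as f:
        f.truncate(0)
    size_after_truncate = os.path.getsize(path)

    # Rename, then delete: the directory entry is changed, then erased.
    new_path = os.path.join(directory, "renamed.txt")  # hypothetical file name
    os.rename(path, new_path)
    os.remove(new_path)

    return first, again, size_after_truncate, os.path.exists(new_path)

with tempfile.TemporaryDirectory() as tmp:
    print(demo_file_operations(tmp))
```

The returned tuple holds the line read before the seek, the line read after the seek, the file size after truncation, and whether the file still exists after deletion.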

File Types:
A file name is split into two parts: a name and an extension.

File Structure:

A) Serial File:

• The least complicated form of file organization is the serial file or pile. Data are collected
in the order in which they arrive.

• The purpose of this file is simply to accumulate a mass of data and save it. Records may
have different fields, or similar fields in different orders. Thus, each field should be
self-describing, including a field name as well as a value.

Advantages of Serial File:

1. Simple organization.

2. Data usually stored prior to processing.

3. Less complexity and good efficiency for variable sized record.

4. Utilizes space very well for varying data structure.

Disadvantages of Serial File:

1. Because there is no structure to these files, record access is very difficult.

2. Requires more search time.

3. Records are not arranged in any particular order.

B) Sequential file access:

• The simplest access method is sequential access. Information in the file is processed in
order, one record after the other.

• The bulk of the operations on a file are reads and writes. The read and write operations on
the sequential file are done in sequential order.

• A read operation reads the next portion of the file and automatically advances a file
pointer, which tracks the I/O location. Similarly a write appends to the end of the file and
advances to the end of the newly written material.

• Such a file can be reset to the beginning and on some systems a program may be able to
skip forward or backward ‘n’ records for some integer ‘n’.
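
The read, write, reset and skip behaviour described above can be sketched with a small in-memory model. The class name SequentialFile and its method names are hypothetical, chosen only for illustration; a real system keeps the pointer inside the operating system.

```python
class SequentialFile:
    """Toy in-memory sequential file: reads go in order, writes append."""

    def __init__(self):
        self.records = []
        self.pointer = 0                    # index of the next record to read

    def write_next(self, record):
        self.records.append(record)         # a write appends to the end of the file
        self.pointer = len(self.records)    # pointer ends up after the new material

    def read_next(self):
        if self.pointer >= len(self.records):
            return None                     # end of file
        record = self.records[self.pointer]
        self.pointer += 1                   # a read automatically advances the pointer
        return record

    def reset(self):
        self.pointer = 0                    # rewind to the beginning

    def skip(self, n):
        # Move forward (n > 0) or backward (n < 0), clamped to the file bounds.
        self.pointer = max(0, min(len(self.records), self.pointer + n))


f = SequentialFile()
for rec in ["a", "b", "c"]:
    f.write_next(rec)
f.reset()
print(f.read_next())    # first record
f.skip(1)               # skip over the second record
print(f.read_next())    # third record
```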

Advantages of Sequential File:

1. Easy to access the next record.

2. Data organization is very simple.

3. Absence of data structures.

4. They are easily stored on tapes as well as disks.

5. Automatic backup copy is created.

Disadvantages of Sequential File:

1. Wastage of memory space because of master file and transaction file.

2. For interactive applications that involve queries and/or updates of individual records, the
sequential file provides poor performance.

3. It is more time consuming, since reading, writing and searching always start from the
beginning of the file.

C) Index sequential file:

Indexed sequential access is built on top of direct access and involves the construction of an index for the file. The index,
like an index in the back of a book, contains pointers to the various blocks. To find an entry in the
file we first search the index and then use the pointer to directly access the file and find the desired
entry. From this search we would know exactly which block contains the desired entry and access
that block. This structure allows us to search a large file with very little Input/Output.

With large files, the index file itself may become too large to be kept in memory. One
solution is then to create an index for the index file. The primary index file would contain pointers
to secondary index files, which then point to the actual data items.
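
A minimal sketch of the index lookup described above, assuming records are sorted by key and that the index stores the first key of each block. The sample blocks, keys and values are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical data file: each block holds (key, value) records sorted by key.
blocks = [
    [(3, "ant"), (8, "bee")],
    [(15, "cat"), (21, "dog")],
    [(34, "eel"), (55, "fox")],
]

# The index holds the first key stored in each block, in block order.
index = [block[0][0] for block in blocks]

def lookup(key):
    """Search the index first, then read only the one block it points to."""
    pos = bisect_right(index, key) - 1      # last block whose first key <= key
    if pos < 0:
        return None                         # key is smaller than every stored key
    for k, value in blocks[pos]:            # a single block access
        if k == key:
            return value
    return None

print(lookup(21))
print(lookup(20))
```

Only the index is searched in full; the data file is touched once per lookup, which is why a large file can be searched with very little I/O.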

Advantages:

• Variable length records are allowed.

• Indexed sequential file may be updated in sequential or random mode

• Very fast operation

Disadvantages:

• The major disadvantage of the index sequential file is that, as the file grows, performance
deteriorates rapidly because of overflows, so periodic reorganization becomes necessary.
Reorganization is an expensive process and the file is unavailable while it runs.

• When a new record is added to the main file, all of the index files must be updated.

• Consumes large memory space for maintaining index files.

D) Direct Access:

For direct access the file is viewed as a numbered sequence of blocks or records. A block is generally
a fixed length quantity, defined by the operating system. A block may be a byte, 512 words, 1024 bytes
or some other quantity, depending upon the system.

A direct access file allows arbitrary blocks to be read or written. Thus we may read block 14, then
read block 50 and then write block 7. There are no restrictions on the order of reading or writing
for a direct access file.

Direct access files are of great use for immediate access to large amounts of information. When a
query concerning a particular subject arrives, we compute which block contains the answer and
then read the block directly to provide the desired information. Not all operating systems
support both sequential and direct access to files. Some systems allow only sequential file access;
others allow only direct access.
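
Direct access can be sketched by computing a byte offset from the block number and seeking to it. The sketch below assumes a block size of 512 bytes and uses an in-memory io.BytesIO object as a stand-in for a file on disk; the function names are hypothetical.

```python
import io

BLOCK_SIZE = 512    # assumed block size in bytes

def read_block(f, block_no):
    """Read one block by jumping straight to its offset."""
    f.seek(block_no * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

def write_block(f, block_no, data):
    """Overwrite one block in place."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    f.seek(block_no * BLOCK_SIZE)
    f.write(data)

# A 64-block file filled with zero bytes stands in for a file on disk.
disk_file = io.BytesIO(bytes(64 * BLOCK_SIZE))

# Blocks can be touched in any order: read 14, read 50, then write 7.
block_14 = read_block(disk_file, 14)
block_50 = read_block(disk_file, 50)
write_block(disk_file, 7, b"\x07" * BLOCK_SIZE)
```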

Advantages of Direct File Access:

1. Using this method we can access any records randomly.

2. It gives fastest retrieval of records.

Disadvantages of Direct File Access:

1. Wastage of storage space, if hashing algorithm is not chosen properly.

2. This method is complex and expensive.

6.2 Access Methods


• From the user’s point of view, a file is an abstract data type. It can be created, opened,
written, read, closed and deleted without any real concern for its implementation. The
implementation of a file is a problem for the operating system.

• The main problem is how to allocate space to these files so that disk space is effectively
utilized and files can be quickly accessed.

• Three major methods of allocating disk space are in wide use:

a) Contiguous

b) Linked

c) Indexed

a) Contiguous Allocation
• The contiguous allocation method requires each file to occupy a set of contiguous
addresses on the disk. Disk addresses define a linear ordering on the disk. Contiguous
allocation of a file is defined by the disk address of the first block and its length. If the file
is ‘n’ blocks long and starts at location ‘b’, then it occupies blocks b, b+1, b+2, ..., b+n-1.
The directory entry for each file indicates the address of the starting block and the
length of the area allocated for this file.
• Contiguous allocation supports both sequential and direct access.
• For direct access to block ‘i’ of a file (counting from 0), which starts at block ‘b’, we can
immediately access block b+i.
• The difficulty with contiguous allocation is finding space for a new file.
• If the file to be created is ‘n’ blocks long, we must search the free space list for ‘n’ free
contiguous blocks.
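
The search for ‘n’ free contiguous blocks and the b+i address calculation can be sketched as follows. This is a minimal sketch assuming a first-fit search over a list of booleans (True means free); the function names and the sample free map are hypothetical.

```python
def find_contiguous(free_map, n):
    """First fit: return the first start block with n free blocks in a row, else None."""
    run_start, run_len = 0, 0
    for block, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = block
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0
    return None

def allocate(free_map, n):
    """Mark n contiguous blocks as used and return the directory entry (start, length)."""
    b = find_contiguous(free_map, n)
    if b is None:
        return None
    for block in range(b, b + n):
        free_map[block] = False
    return (b, n)

def block_address(entry, i):
    """Direct access: logical block i (counting from 0) lives at disk block b + i."""
    b, n = entry
    if not 0 <= i < n:
        raise IndexError("block number outside the file")
    return b + i

free_map = [False, True, True, False, True, True, True, True]
entry = allocate(free_map, 3)          # fits in blocks 4, 5 and 6
second_try = allocate(free_map, 3)     # three blocks are free, but not contiguous
```

The second request fails even though three blocks remain free in total, which is the external fragmentation problem in miniature.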

Advantages of Contiguous File Allocation Method:


1. Supports both sequential and direct access methods.
2. Contiguous allocation is the best form of allocation for sequential files. Multiple blocks
can be brought in at a time to improve I/O performance for sequential processing.
3. It is also easy to retrieve a single block from a file. For example, if a file starts at block ‘n’
and the ith block of the file is wanted, its location on secondary storage is simply n + i.

4. Reading all blocks belonging to each file is very fast.
5. Provides good performance.
Disadvantages of Contiguous File Allocation Method:
1. Suffers from external fragmentation.
2. Very difficult to find contiguous blocks of space for new files.
3. With pre-allocation, the size of the file must be declared at the time of creation, which
is often difficult to estimate.
4. Compaction may be required and it can be very expensive.

b) Linked File Allocation (Chained File Allocation)


With linked allocation, each file is a linked list of disk blocks; the disk blocks may be
scattered anywhere on the disk. The directory contains a pointer to the first (and last)
blocks of the file. For example, a file of 5 blocks, which starts at block 9 might continue at
block 16, then block 1, block 10, and finally block 25. Each block contains a pointer to the
next block. These pointers are not made available to the user. Thus, if each sector is 512 words
and a disk address (the pointer) requires two words, then the user sees blocks of 510
words.
Creating a file is easy. We simply create a new entry in the device directory. A write
to a file removes the first free block from the free space list and writes to it. This new block is
then linked to the end of the file. To read a file, we simply read blocks by following the pointers
from block to block. There is no external fragmentation with linked allocation.
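
The example chain 9, 16, 1, 10, 25 can be modelled with a dictionary that maps each block to its successor. This is only a sketch: in a real system the pointer is stored inside each disk block, and the file name "example" and the function names are hypothetical.

```python
# next_block[b] is the pointer stored in block b; None marks the end of the file.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}

# Directory entry with pointers to the first and last blocks of the file.
directory = {"example": {"first": 9, "last": 25}}

def file_blocks(name):
    """Read the whole file by following the pointers from block to block."""
    chain = []
    block = directory[name]["first"]
    while block is not None:
        chain.append(block)
        block = next_block[block]
    return chain

def nth_block(name, i):
    """Reaching logical block i means following i pointers from the first block."""
    block = directory[name]["first"]
    for _ in range(i):
        block = next_block[block]
        if block is None:
            raise IndexError("block number outside the file")
    return block

def append_block(name, free_list):
    """A write takes the first free block and links it to the end of the file."""
    new = free_list.pop(0)
    entry = directory[name]
    next_block[entry["last"]] = new
    next_block[new] = None
    entry["last"] = new
    return new
```

The loop in nth_block shows why direct access is slow with this method: block i cannot be located without visiting the i blocks before it.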

Advantages of Linked File Allocation Method:


1. Any free blocks can be added to a chain.
2. There is no external fragmentation.
3. Best suited for sequential files that are to be processed sequentially.
4. No need to know the size of the file in advance.
5. The disk address of first block can be used to locate the rest of the blocks.

6. The disk never needs to be defragmented, because any free block can be used and no
space is lost to external fragmentation.
7. No need to compact or relocate files.
Disadvantages of Linked File Allocation Method:
1. There is no accommodation of the principle of locality; that is, reading a file may
require a series of accesses to different parts of the disk.
2. Space is required for the pointers: about 1.5% of the disk is used for pointers rather than
for information. If a pointer is lost or damaged because of an operating system bug or a
disk hardware failure, the system may pick up the wrong pointer.
3. This method cannot support efficient direct access, because reaching a given block
means following the pointers from the first block.

c) Indexed File Allocation


• Linked allocation solves the external fragmentation and size declaration problems of
contiguous allocation. However, linked allocation cannot support efficient direct access,
since the pointers to the blocks are scattered, together with the blocks themselves, all over
the disk.
• Indexed allocation solves this problem by bringing all of the pointers together into one
location the Index Block.
• Each file has its own index block, which is an array of disk block addresses. The ith entry in
the index block points to the ith block of the file. The directory contains the address of the
index block.
• To read the ith block, we use the pointer in the ith index block entry to find and read the
desired block. When the file is created, all pointers in the index block are set to nil. When
the ith block is first written, a block is removed from the free space list and its address is
put in the ith index block entry.
• Indexed allocation supports direct access without suffering from external fragmentation.
Indexed allocation does suffer from wasted space. The pointer overhead of the index block is
generally greater than the pointer overhead of linked allocation. Assume we have a file of only
one or two blocks. With linked allocation we lose only the space of one pointer per block. With
indexed allocation an entire index block must be allocated even if only one or two pointers will
be non-nil.
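
A minimal sketch of an index block, assuming a fixed number of entries and a dictionary standing in for the disk. The class name IndexedFile, the free block numbers and the data values are hypothetical.

```python
class IndexedFile:
    """Toy file whose index block is an array of disk block addresses."""

    def __init__(self, index_size, free_list):
        self.index_block = [None] * index_size   # all pointers start as nil
        self.free_list = free_list

    def write_block(self, i, disk, data):
        # On the first write to block i, take a block from the free space list
        # and record its address in the ith index block entry.
        if self.index_block[i] is None:
            self.index_block[i] = self.free_list.pop(0)
        disk[self.index_block[i]] = data

    def read_block(self, i, disk):
        # Direct access: one lookup in the index block, then one block read.
        address = self.index_block[i]
        if address is None:
            return None                           # block was never written
        return disk[address]


disk = {}
f = IndexedFile(index_size=4, free_list=[19, 3, 27])
f.write_block(2, disk, "C")     # logical block 2 gets disk block 19
f.write_block(0, disk, "A")     # logical block 0 gets disk block 3
```

Two of the four index entries stay nil here, which illustrates the wasted space for small files mentioned above.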

Advantages of Indexed File Allocation Method:
1. Does not suffer from external fragmentation.
2. Support both sequential and direct access to the file.
3. No need for user to know size of the file in advance.
4. Indexing of free space can be done by means of a bit map.
5. Each data block is entirely available for data, since the pointers are kept in the index block.
Disadvantages of Indexed File Allocation Method:
1. It requires a lot of space for keeping pointers, so storage space is wasted.
2. Indexed allocation is more complex and time consuming.
3. The overhead of an index block is not justified for a very small file.
4. The overhead of index blocks is also a problem for a very big file, because it is difficult to
manage multiple levels of indices.
5. Keeping the index in memory requires space.

6.3 Directory Structure


• A large number of files are stored on the disk. To keep track of files, file systems normally have
directories or folders, which in many systems are themselves files. To manage all these
files, we organize them into a structure called the directory structure.
• Directories are basically symbol tables of files. A single flat directory can contain a list of
all files in a system.
• The directory structure is the organization of files into a hierarchy of folders.
• A directory can be defined as a way of grouping files together.

a) Single Level Directory Structure
• The simplest form of directory system has one directory containing all the files.
Sometimes, it is called the root directory.
• In a single level directory structure, all files are contained in the same directory, so
a unique name must be assigned to each file in the directory.
• Single level directory structure was implemented in the older versions of single user
systems.
• The world’s first supercomputer, the CDC 6600, had only a single directory for all files, even
though it was used by many users at once.

Advantages of Single Level Directory Structure:
1. Single level directory structure is easy to implement and maintain.
2. It is simple directory structure.
3. In a single level directory structure, operations like creation, searching, deletion and
updating are very easy and fast.
Disadvantages of Single Level Directory Structure:
1. There is only one directory in the system, so name collisions are likely because
two files cannot have the same name.
2. In a single level directory structure, it is difficult to keep track of the files as the number
of files increases.
3. This directory structure is not used on multi-user systems but could be used on a small
embedded system.
4. Files of different kinds, such as graphics and text, cannot be grouped separately.
5. The MS-DOS operating system allows only 11-character file names; UNIX, in contrast,
allows 255 characters.

b) Two Level Directory Structure


• The structure of two-level directory structure is divided into two levels of directories
namely, a master directory and user directories. The user directories are the sub-directories
of the master directory.
• In two level directory structure, a separate directory is provided to each user and all these
directories are contained and indexed in the master directory.
• The user directory represents a list of files of a specific user. In this directory structure, each
user has its private directory known as User File Directory (UFD).
• The user directories themselves must be created and deleted as necessary. A special
system program is run with the appropriate user name and account information. The
program creates a new UFD and adds an entry for it to the Master File Directory (MFD).
Thus, the two-level directories solve the name collision problem.

Advantages of Two-Level Directory Structure:

1. It solves the file name collision problem by giving each user their own directory.

2. This method isolates one user from another and protects user’s files.

3. Different users may have files with same name.

Disadvantages of Two-Level Directory Structure:

1. It is still not very scalable: a user cannot group related files, such as files of the same
type, into separate subdirectories within the user directory.

2. Sharing of files by different users is difficult.

c) Hierarchical Directory Structure (Tree Structure)
• A two-level hierarchy eliminates name conflicts among users but is not satisfactory
for users with a large number of files. What is needed is a general hierarchy, i.e., a tree of
directories.

• The tree structured directory allows users to create their own subdirectories and
to organize their files accordingly. A subdirectory contains a set of files or subdirectories.
A directory is simply another file, but it is treated in a special way.
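
A tree of directories can be sketched with nested dictionaries, where a dictionary is a directory and a string is a file's contents. The user names, file names and the resolve function below are hypothetical and serve only to show path resolution.

```python
# Directories are dicts, files are strings (all names are made up).
root = {
    "home": {
        "asha": {"notes.txt": "chapter 6", "src": {"main.c": "int main(void) {}"}},
        "ravi": {"notes.txt": "chapter 5"},
    }
}

def resolve(path, tree=root):
    """Walk the tree one path component at a time, starting at the root."""
    node = tree
    for part in (p for p in path.split("/") if p):
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node

print(resolve("/home/asha/notes.txt"))
print(resolve("/home/ravi/notes.txt"))
```

Both users own a file called notes.txt without any collision, because the full path, not the bare name, identifies the file.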

Advantages of Tree Structured Directory:

1. User can create directory as well as subdirectory.

2. Users can be provided access to a sub directory rather than the entire directory.

3. It provides a better structure to file system.

4. Managing millions of files is easy with tree structured directory.

Disadvantages of Tree Structured Directory:

1. The tree structure can create duplicate copies of the files.

2. Users cannot share files or directories.

3. It is inefficient, because accessing a file may require traversing multiple directories.

4. Search time may become unnecessarily long.

Disk management in Linux

Physical Disk Structure

Structure of Hard Disk

Measurement of speed of the disk


• Transfer Rate is the rate at which the data moves from disk to the computer.

• Random Access Time is the sum of the seek time and rotational latency.

• The seek time is the time for the disk arm to move the head to the required cylinder
containing the desired track.

• The rotational latency is the additional time for the disk to rotate the desired sector to
the disk head.

• The disk bandwidth is the total number of bytes transferred, divided by the total time
between the first request for service and the completion of the last transfer.
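
The definitions above translate directly into arithmetic. The sketch below assumes the common approximation that the average rotational latency is half a revolution; the function names and the sample figures (9 ms seek, 7200 RPM, 4,000,000 bytes in 2 seconds) are made-up inputs, not measurements.

```python
def average_rotational_latency_ms(rpm):
    """Assumed average: half a revolution, expressed in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution

def random_access_time_ms(seek_ms, rpm):
    """Random access time = seek time + rotational latency."""
    return seek_ms + average_rotational_latency_ms(rpm)

def disk_bandwidth(total_bytes, first_request_s, last_completion_s):
    """Bytes transferred divided by the time from first request to last completion."""
    return total_bytes / (last_completion_s - first_request_s)

access_ms = random_access_time_ms(seek_ms=9.0, rpm=7200)
bandwidth = disk_bandwidth(4_000_000, first_request_s=0.0, last_completion_s=2.0)
```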

Logical Structure of Hard Disk


We can divide the logical structure of the hard disk in the following five logical terms:

1. MBR (Master Boot Record).

2. DBR (DOS Boot Record).

3. FAT (File Allocation Tables).

4. Root Directory.

5. Data Area.

Master Boot Record:

• It contains a small program to load and start the active partition from the hard disk.

• The MBR is created on the hard disk drive by executing FDISK.EXE command of DOS.

• It is located at absolute sector 0, that is, at cylinder 0, head 0 and sector 1.

• If we have more than one partition, then there are Extended Master Boot Records, located
at the beginning of each extended partition volume.

DOS Boot Record (DBR):

• The DOS Boot Record (DBR), sometimes called the DOS Boot Sector, is the second most
important information on your hard disk.

• It contains some important information about disk geometry like: Bytes Per Sector, Sectors
per cluster, Reserved Sectors etc.

• The DBR is created by the FORMAT command of DOS.

• All DOS partitions contain the program code to boot the machine, but only that partition
is given control by the MBR which is specified as active partition.

FAT (File Allocation Table)

• It was developed to fulfil the requirements of a fast and flexible system for managing data
on both removable and fixed media.

• FAT keeps a map of the complete surface of the disk drive, recording which areas are free
and which areas are taken up by which file. When data stored on the disk is to be
accessed, DOS consults the FAT to find the areas of the hard disk that contain
the data.

• The FAT manages the disk area in a group of sectors called “CLUSTER”.
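
How a table of clusters can record both free space and the clusters of a file may be sketched as below. This is a simplified model, not the real on-disk FAT encoding: the FREE and EOC markers, the table size and the cluster numbers are all assumptions made for illustration.

```python
FREE = None     # assumed marker for an unused cluster
EOC = -1        # assumed end-of-chain marker

# Toy table with 8 clusters; one file occupies clusters 2 -> 5 -> 6.
fat = [FREE] * 8
fat[2], fat[5], fat[6] = 5, 6, EOC

def cluster_chain(fat, start_cluster):
    """Follow table entries from the file's starting cluster to the end marker."""
    chain = []
    cluster = start_cluster
    while cluster != EOC:
        if fat[cluster] is FREE:
            raise ValueError("chain runs into a free cluster")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

def free_clusters(fat):
    """Every entry still marked FREE is available for new data."""
    return [c for c, entry in enumerate(fat) if entry is FREE]

print(cluster_chain(fat, 2))
print(free_clusters(fat))
```

The starting cluster passed to cluster_chain corresponds to the starting cluster that a directory entry stores for each file.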

Root Directory:

• The Root Directory is like a table of contents for the information stored on the hard disk
drive. The directory area keeps the information about the file name, date and time of the
file creation, file attribute, file size and starting cluster of the particular file.

• The number of files that one can store on the root directory depends on the FAT type
being used.

Data Area OR Files Area:

• The remainder of the volume after Root Directory is the Data Area.

• The data area contains the actual data stored on the disk surfaces.

• When we format a hard disk the FORMAT command of DOS does not destroy or overwrite
the data on the data area. The FORMAT command only removes the directory entry and
FAT entries and it does not touch the actual data area. This makes the recovery of
accidentally formatted hard disk drive possible.

RAID (Redundant Array of Independent Disks)
• RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive
Disks) is a way of storing the same data in different places on multiple hard disks to protect
data in the case of a drive failure.

• RAID organizes multiple disks into a large, high-performance logical disk. In other words,
if you have three hard drives, you can configure them to look like one large drive.

• RAID is a set of physical disk drives, and the operating system views it as a single logical
drive. Data are distributed across physical drives in a way that enables simultaneous access
to data from multiple drives.

• RAID is a data storage virtualization technology that combines multiple physical disk drive
components into a single logical unit for the purposes of data redundancy, performance
improvement, or both.

RAID Levels:

RAID-0 (Striped Disk Array without fault Tolerance)

• Simple striping is used in this level to gain in performance.

• This level does not offer any redundancy.

• Data is broken into stripes of user-defined size and written to a different drive in the array.

• Minimum of two disks are required. It uses 100% of the storage capacity since no
redundant information is written.

• Web servers, graphics design, audio and video editing, and online gaming are some
example applications that might benefit from this level.
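
Striping can be sketched as a round-robin mapping from logical blocks to drives. The sketch assumes a stripe unit of one block; the function names are hypothetical.

```python
def stripe_location(logical_block, num_disks):
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

def stripe(blocks, num_disks):
    """Distribute blocks over the drives in round-robin order, with no redundancy."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disk, _ = stripe_location(i, num_disks)
        disks[disk].append(block)
    return disks

layout = stripe(list("ABCDEF"), num_disks=2)
```

Consecutive blocks land on different drives, so they can be transferred in parallel; nothing is stored twice, so the loss of one drive loses part of every large file.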

RAID-1 (Mirroring and Duplexing)

• This level mirrors the data on drive 1 to drive 2. It offers 100% redundancy, as the
array continues to work if either disk fails.

• This level uses mirroring and data is duplicated on two drives.

• If either fails, the other continues to function until the failed drive is replaced.

• A minimum of 2 drives is required.

RAID-2 (Hamming Code Error Correcting Code)

• This level uses bit-level data striping rather than block-level striping.

• To be able to use RAID 2, make sure the disks selected have no built-in error checking
mechanism, as this level uses an external Hamming code for error detection and correction.

• This is one of the reasons RAID 2 is not used in practice, as most of the disks
used these days come with built-in error detection. It uses extra disk space for storing the
Hamming code (parity) information.

RAID-3 (Parallel Transfer with parity)

• In RAID 3, the data block is striped and written on the data disks. This requires a minimum
of three drives to implement.

• This level uses byte-level striping along with parity. One dedicated drive is used to store
the parity information, and in case of a data drive failure the lost data is reconstructed using
this extra drive.

• If the parity drive itself fails, the array loses its redundancy until the drive is replaced, so
this level is not much used in organizations.

RAID-4 (Independent Data Disks with shared parity disk)

• This level is very similar to RAID 3, except that RAID 4 uses block-level
striping rather than byte-level striping.

• In a RAID-4 system, if any one of the disks fails, the data on the remaining disks can be
used to reconstruct the data that was on the failed disk. Even if the parity disk fails, the
other disks are still intact. Thus RAID-4 can survive the failure of any one of its disks.

RAID-5 (Independent Data Disks with Distributed parity blocks)

• Parity information is written to a different disk in the array for each stripe. In case of single
disk failure data can be recovered with the help of distributed parity without affecting the
operation and other read write operations.

• One of the most popular RAID techniques, it uses Block Striping of data along with parity
and writes them to all drives. RAID-5 systems require a minimum of 3 disks.

• If any one drive fails, the array is said to be degraded, and the data blocks residing on that
drive can be derived from the parity and data on the remaining drives.
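
Parity-based recovery can be sketched with XOR, which is the usual way single parity is computed. The parity rotation in parity_disk is one possible layout chosen for illustration, not necessarily the one a given controller uses, and the sample bytes are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_disk(stripe_no, num_disks):
    """Assumed rotation: the parity block moves to a different disk for each stripe."""
    return (num_disks - 1 - stripe_no) % num_disks

def reconstruct(surviving_blocks):
    """The lost block of a stripe is the XOR of all surviving blocks, parity included."""
    return xor_blocks(surviving_blocks)

# One stripe on a three-disk array: two data blocks plus their parity block.
data = [b"\x0f\xf0", b"\x33\x55"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails; rebuild it from the other two blocks.
recovered = reconstruct([data[0], parity])
```

The same function rebuilds a lost parity block from the data blocks, since XOR makes no distinction between the two roles.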

RAID-6 (Independent Data Disks with Two independent parity schemes)

• This level is an enhanced version of RAID 5 with the extra benefit of dual parity. It
uses block-level striping with dual distributed parity, so the array can tolerate the failure
of two drives.

• The advantages of RAID-6 become even more pronounced as the capacity of SATA drives
goes up and rebuilds take longer to finish.

• RAID 6 requires a minimum of four drives, and the usable capacity is
always that of two drives fewer than the number of drives in the RAID set. Applications suited
for this level are the same as those of level 5.
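
The capacity figures quoted for the levels above can be collected in one small function. This is a sketch under stated assumptions: all drives have equal size, RAID 1 is limited to the two-drive mirror described earlier, and only levels 0, 1, 5 and 6 are covered. The function name is hypothetical.

```python
# Minimum number of drives for each level covered by this sketch.
MINIMUM_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4}

def usable_drives(level, n):
    """Return how many drives' worth of capacity holds user data."""
    if level not in MINIMUM_DRIVES:
        raise ValueError("level not covered by this sketch")
    if n < MINIMUM_DRIVES[level]:
        raise ValueError("too few drives for this level")
    if level == 0:
        return n            # no redundant information is written
    if level == 1:
        if n != 2:
            raise ValueError("this sketch models only a two-drive mirror")
        return 1            # the second drive holds a copy of the first
    if level == 5:
        return n - 1        # one drive's worth of distributed parity
    return n - 2            # RAID 6: two drives' worth of parity

for level, n in [(0, 4), (1, 2), (5, 4), (6, 4)]:
    print(level, n, usable_drives(level, n))
```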
