OSY Chapter 6 SSP
File Management
• A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.
• Commonly, files represent programs (source and object forms) and data. Data files may be
numeric, alphabetic, alphanumeric or binary.
• In general, a file is a sequence of bits, bytes, lines or records whose meaning is defined by
the file’s creator and user.
• Many different types of information may be stored in a file: Source programs, Object
programs, Executable programs, Numeric Data, Text, Payroll records, Graphic Images,
Sound recording and so on.
File Attributes:
• Name: The symbolic file name is the only information kept in human readable form.
• Type: This information is needed for those systems that support different types.
• Location: This information is a pointer to a device and to the location of the file on that
device.
• Size: The current size of the file (in bytes, words or blocks) and possibly the maximum
allowed size are included in this attribute.
• Protection: Access control information determines who can read, write, execute
and so on.
• Time, Date and User Identification: This information may be kept for creation, last
modification and last use. These data can be useful for protection, security and usage
monitoring.
• Identifier: The file system assigns a unique tag or number that identifies the file within the
file system and is used to refer to the file internally.
• Creator or Owner: The creator is the user who created the file, and the owner is the user
who currently owns it.
File Operations:
• Creating a file: Two steps are necessary to create a file. First, space in the file system must
be found for the file. Second, an entry for the new file must be made in the directory. The
directory entry records the name of the file and its location in the file system.
• Writing a file: To write a file, we make a system call specifying both the name of the file
and the information to be written to the file. Given the name of the file, the system searches
the directory to find the location of the file. The system keeps a write pointer to the location
in the file where the next write is to take place, and this pointer must be updated whenever
a write occurs.
• Reading a file: To read from a file, we use a system call that specifies the name of the file
and where (in memory) the next block of the file should be put. The system needs to keep a
read pointer to the location in the file where the next read is to take place. Once the read has
taken place, the read pointer is updated.
• Repositioning within a file: The directory is searched for the appropriate entry, and the
current file position is set to a given value. Repositioning within a file does not need to
involve any actual I/O. This file operation is also known as a file seek.
• Deleting a file: To delete a file, we search the directory for the named file. Having found
the associated directory entry, we release all file space and erase the directory entry.
• Truncating a file: Instead of deleting a file and then recreating it, this function allows all
attributes to remain unchanged while the file is reset to length zero. It is used when the
user wants to erase the contents of the file.
• Other common operations include appending new information to the end of an existing
file, and renaming an existing file.
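The operations listed above can be tried out with a short Python sketch. It is only an illustration, not how any particular operating system implements these calls: the file name demo.bin and the function name demo_file_operations are made up for the example, and Python's open, seek, truncate and os.remove stand in for the underlying system calls.

```python
import os
import tempfile


def demo_file_operations():
    """Walk through create, write, read, seek, truncate and delete."""
    directory = tempfile.mkdtemp()              # scratch directory for the demo
    path = os.path.join(directory, "demo.bin")  # hypothetical file name

    # Create + write: opening with "wb" makes the directory entry,
    # and the write advances the write pointer.
    with open(path, "wb") as f:
        f.write(b"hello world")

    with open(path, "rb") as f:
        first = f.read(5)                # read advances the read pointer
        position_after_read = f.tell()   # pointer now sits at offset 5

        # Reposition (seek): only the pointer moves, no data is transferred.
        f.seek(6)
        rest = f.read()

    # Truncate: the file keeps its name and attributes, length becomes zero.
    with open(path, "r+b") as f:
        f.truncate(0)
    size_after_truncate = os.path.getsize(path)

    # Delete: release the file space and erase the directory entry.
    os.remove(path)
    exists_after_delete = os.path.exists(path)
    os.rmdir(directory)

    return (first, position_after_read, rest,
            size_after_truncate, exists_after_delete)


result = demo_file_operations()
```

The file is opened in binary mode so that the value reported by tell() is a plain byte offset.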
File Types:
The file name is split into two parts: a name and an extension.
File Structure:
A) Serial File:
• The least complicated form of file organization is the serial file or pile. Data are collected
in the order in which they arrive.
• The purpose of this file is simply to accumulate a mass of data and save it. Records may
have different fields, or similar fields in different orders. Thus, each field should be
self-describing, including a field name as well as a value.
Advantages of Serial File:
1. Simple organization.
• The simplest access method is sequential access. Information in the file is processed in
order, one record after the other.
• The bulk of the operations on a file are reads and writes. The read and write operations on
the sequential file are done in sequential order.
• A read operation reads the next portion of the file and automatically advances a file
pointer, which tracks the I/O location. Similarly a write appends to the end of the file and
advances to the end of the newly written material.
• Such a file can be reset to the beginning and on some systems a program may be able to
skip forward or backward ‘n’ records for some integer ‘n’.
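The behaviour of the file pointer described above can be modelled in a few lines of Python. This is a minimal in-memory sketch: the class name SequentialFile and its method names are invented for illustration, and a Python list stands in for the records on disk.

```python
class SequentialFile:
    """Toy model of sequential access: one pointer, moved by every operation."""

    def __init__(self, records):
        self.records = list(records)
        self.pointer = 0                  # index of the next record to read

    def read_next(self):
        """Return the next record and advance the pointer (None at end of file)."""
        if self.pointer >= len(self.records):
            return None
        record = self.records[self.pointer]
        self.pointer += 1
        return record

    def write_next(self, record):
        """Append at the end of the file and move the pointer past the new record."""
        self.records.append(record)
        self.pointer = len(self.records)

    def reset(self):
        """Rewind to the beginning of the file."""
        self.pointer = 0

    def skip(self, n):
        """Skip forward (n > 0) or backward (n < 0) n records, staying in bounds."""
        self.pointer = max(0, min(len(self.records), self.pointer + n))
```

For example, after two calls to read_next(), calling skip(-2) moves the pointer back to the first record.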
Advantages of Sequential File:
2. For interactive applications that involve queries and/or updates of individual records, the
3. It is more time consuming, since reading, writing and searching always start from the beginning
of the file.
These additional methods generally involve the construction of an index for the file. The index,
like an index in the back of a book, contains pointers to the various blocks. To find an entry in the
file we first search the index and then use the pointer to directly access the file and find the desired
entry. From this search we would know exactly which block contains the desired entry and access
that block. This structure allows us to search a large file with very little Input/Output.
With large files, the index file itself may become too large to be kept in memory. One
solution is then to create an index for the index file. The primary index file would contain pointers
to secondary index files, which then point to the actual data items.
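The index lookup described above can be sketched as follows. The sketch assumes, for illustration only, that records are sorted by key, that the index holds the first key of each block, and that a binary search (Python's bisect module) is used on the index; the names build_index and lookup are hypothetical.

```python
from bisect import bisect_right


def build_index(blocks):
    """Index entry i is the first key stored in block i (blocks sorted by key)."""
    return [block[0][0] for block in blocks]


def lookup(blocks, index, key):
    """Search the index first, then read only the one block it points to."""
    position = bisect_right(index, key) - 1   # last block whose first key <= key
    if position < 0:
        return None                           # key is smaller than every entry
    for record_key, value in blocks[position]:
        if record_key == key:
            return value
    return None


# Three blocks of (key, value) records standing in for disk blocks.
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]
index = build_index(blocks)
found = lookup(blocks, index, 40)
```

Only one block is scanned per lookup, which is the reason the text says a large file can be searched with very little Input/Output.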
Advantages:
Disadvantages:
• The major disadvantage of the index sequential file is that as the file grows, performance
deteriorates rapidly because of overflows, and consequently periodic reorganization becomes
necessary. Reorganization is an expensive process and the file becomes
unavailable during reorganization.
• When a new record is added to the main file, all of the index files must be updated.
D) Direct Access:
For direct access, the file is viewed as a numbered sequence of blocks or records. A block is generally
a fixed-length quantity, defined by the operating system. A block may be 512 bytes, 1024 bytes
or some other quantity, depending upon the system.
A direct access file allows arbitrary blocks to be read or written. Thus we may read block 14, then
read block 50 and then write block 7. There are no restrictions on the order of reading or writing
for a direct access file.
Direct access files are of great use for immediate access to large amounts of information. When a
query concerning a particular subject arrives, we compute which block contains the answer and
then read the block directly to provide the desired information. Not all operating systems
support both sequential and direct access to files. Some systems allow only sequential file access;
others allow only direct access.
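Reading and writing arbitrary blocks can be sketched as below. The block size of 512 bytes, the helper names read_block and write_block, and the use of an in-memory io.BytesIO object in place of a real disk file are all assumptions made for the example.

```python
import io

BLOCK_SIZE = 512  # assumed block size in bytes


def read_block(f, block_number):
    """Jump straight to the block: its byte offset is block_number * BLOCK_SIZE."""
    f.seek(block_number * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


def write_block(f, block_number, data):
    """Overwrite one whole block in place."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    f.seek(block_number * BLOCK_SIZE)
    f.write(data)


# A 64-block "file" filled with zero bytes, accessed in the order used in the text:
# read block 14, read block 50, then write block 7.
disk = io.BytesIO(bytes(BLOCK_SIZE * 64))
block_14 = read_block(disk, 14)
block_50 = read_block(disk, 50)
write_block(disk, 7, b"x" * BLOCK_SIZE)
```

No block before the requested one is touched, in contrast to sequential access.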
• The main problem is how to allocate space to these files so that disk space is effectively
utilized and files can be quickly accessed.
a) Contiguous
b) Linked
c) Indexed
a) Contiguous Allocation
• The contiguous allocation method requires each file to occupy a set of contiguous
addresses on the disk. Disk addresses define a linear ordering on the disk. Contiguous
allocation of a file is defined by the disk address of the first block and its length. If the file
is ‘n’ blocks long and starts at location ‘b’, then it occupies blocks b, b+1, b+2, ...,
b+n-1. The directory entry for each file indicates the address of the starting block and the
length of the area allocated for this file.
• Contiguous allocation supports both sequential and direct access.
• For direct access to block ‘i’ of a file, which starts at block ‘b’, we can immediately access
block b+i. The difficulty with contiguous allocation is finding space for a new file.
• If the file to be created is ‘n’ blocks long, we must search the free space list for ‘n’ free
contiguous blocks.
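Both ideas above, finding ‘n’ free contiguous blocks and computing block b+i directly, can be sketched in Python. The free space list is assumed here to be a list of booleans (True means free), and a first-fit search is used purely as an example strategy; the text does not say which strategy a given system uses. The function names are hypothetical.

```python
def find_contiguous(free_map, n):
    """First fit: return the start of the first run of n free blocks, or None."""
    run_start, run_length = None, 0
    for block, is_free in enumerate(free_map):
        if is_free:
            if run_length == 0:
                run_start = block        # a new run of free blocks begins here
            run_length += 1
            if run_length == n:
                return run_start
        else:
            run_length = 0               # run broken by an allocated block
    return None


def block_address(start, length, i):
    """Direct access: logical block i of a file starting at block `start`."""
    if not 0 <= i < length:
        raise IndexError("block number outside the file")
    return start + i


# Blocks 1-2 and 4-6 are free; everything else is allocated.
free_map = [False, True, True, False, True, True, True, False]
start = find_contiguous(free_map, 3)
```

A request for four blocks fails on this map even though five blocks are free in total, which is the external fragmentation problem.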
4. Reading all blocks belonging to each file is very fast.
5. Provides good performance.
Disadvantages of Contiguous File Allocation Method:
1. Suffers from external fragmentation.
2. Very difficult to find contiguous blocks of space for new files.
3. With pre-allocation, the size of the file must be declared at the time of
creation, which is often difficult to estimate.
4. Compaction may be required and it can be very expensive.
6. It is never necessary to defragment the disk. Blocks are completely utilized here, so there is
no external fragmentation.
7. No need to compact or relocate files.
Disadvantages of Linked File Allocation Method:
1. There is no accommodation of the principle of locality; that is, reading a file may require a
series of accesses to different parts of the disk.
2. Space is required for the pointers: 1.5% of the disk is used for the pointers and not for
information. If a pointer is lost or damaged, because of a bug in the operating system or a disk
hardware failure, the wrong pointer may be picked up.
3. This method cannot support direct access.
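Why linked allocation cannot support direct access can be seen in a small sketch. The dictionary next_pointer, the block numbers in it and the function name nth_block are invented for illustration; each entry plays the role of the pointer stored in a disk block.

```python
def nth_block(next_pointer, start, i):
    """Reach logical block i by following i pointers from the first block."""
    block = start
    for _ in range(i):
        block = next_pointer[block]      # one disk read per hop in a real system
        if block is None:
            raise IndexError("block number outside the file")
    return block


# A five-block file scattered over the disk: 9 -> 16 -> 1 -> 10 -> 25.
next_pointer = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
fourth = nth_block(next_pointer, 9, 3)
```

There is no way to compute the address of logical block 3 from the start address alone; the three intermediate pointers must be read first.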
Advantages of Indexed File Allocation Method:
1. Does not suffer from external fragmentation.
2. Support both sequential and direct access to the file.
3. No need for user to know size of the file in advance.
4. Indexing of free space can be done by means of a bit map.
5. Entire block is available for data as no space is occupied by pointers.
Disadvantages of Indexed File Allocation Method:
1. It requires a lot of space for keeping pointers, so storage space is wasted.
2. Indexed allocation is more complex and time consuming.
3. Overhead of index blocks is not feasible for very small file.
4. Overhead of index blocks is not feasible for very big file also, because it is difficult to
manage levels of indices.
5. Keeping index in memory requires space.
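The trade-offs listed above can be made concrete with a short sketch of indexed allocation. It assumes a simple free list and one index block per file; the function names allocate_indexed and logical_to_physical and the block numbers are hypothetical.

```python
def allocate_indexed(free_blocks, n):
    """Take one block for the index plus n data blocks from the free list."""
    if len(free_blocks) < n + 1:
        raise MemoryError("not enough free blocks")
    index_block_number = free_blocks.pop(0)              # overhead: holds pointers only
    index_block = [free_blocks.pop(0) for _ in range(n)] # pointers to the data blocks
    return index_block_number, index_block


def logical_to_physical(index_block, i):
    """Direct access: one lookup in the index block gives the disk address."""
    return index_block[i]


# Free blocks need not be contiguous, so there is no external fragmentation.
free_blocks = [19, 9, 16, 1, 10, 25]
index_number, index_block = allocate_indexed(free_blocks, 3)
third = logical_to_physical(index_block, 2)
```

A three-block file costs four blocks here, which illustrates why the index overhead is significant for very small files.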
a) Single Level Directory Structure
• The simplest form of directory system has one directory containing all the files.
Sometimes it is called the root directory.
• In a single level directory structure, all files are contained in the same directory, so a
unique name must be assigned to each file in the directory.
• Single level directory structure was implemented in the older versions of single user
systems.
• The world’s first supercomputer, the CDC 6600, had only a single directory for all files, even
though it was used by many users at once.
Advantages of Single Level Directory Structure:
1. Single level directory structure is easy to implement and maintain.
2. It is simple directory structure.
3. In a single level directory structure, operations such as creation, searching, deletion and
updating are very easy and fast.
Disadvantages of Single Level Directory Structure:
1. There is only one directory in the system, so name collisions are likely because
two files cannot have the same name.
2. In a single level directory structure, it is difficult to keep track of the files as the number
of files increases.
3. This directory structure is not used on multi-user systems but could be used on a small
embedded system.
4. Files of different kinds, such as graphics and text, cannot be grouped conveniently in this
structure.
5. The MS-DOS operating system allows only 11-character file names; UNIX, in contrast,
allows 255 characters.
Advantages of Two-Level Directory Structure:
1. It solves the file name collision problem by giving each user their own directory.
2. This method isolates one user from another and protects each user’s files.
1. It is still not very scalable: two files of the same type cannot be grouped together within the
same user directory.
c) Hierarchical Directory Structure (Tree Structure)
• A two-level hierarchy eliminates name conflicts among users but is not satisfactory
for users with a large number of files. What is needed is a general hierarchy, i.e., a tree of
directories.
• The tree structured directory allows users to create their own subdirectories and
to organize their files accordingly. A subdirectory contains a set of files or subdirectories.
A directory is simply another file, but it is treated in a special way.
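A tree of directories can be modelled with nested Python dictionaries, as in the sketch below. The directory and file names, the "/" separator and the function name resolve are all assumptions made for the example; a dictionary stands for a directory and a string stands for a file.

```python
def resolve(root, path):
    """Walk the tree one path component at a time; return None if not found."""
    node = root
    for part in path.strip("/").split("/"):
        if part == "":
            continue                      # the path "/" names the root itself
        if not isinstance(node, dict) or part not in node:
            return None                   # missing entry, or a file used as a directory
        node = node[part]
    return node


# Two users can both own a file called notes.txt without any name conflict.
root = {
    "home": {
        "alice": {"notes.txt": "alice's notes"},
        "bob": {"notes.txt": "bob's notes", "src": {"main.c": "source code"}},
    }
}
found = resolve(root, "/home/bob/src/main.c")
```

Each lookup searches only the directories along the path, so the same file name can be reused freely in different subdirectories.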
2. Users can be provided access to a sub directory rather than the entire directory.
Disadvantages of Tree Structured Directory:
Physical Disk Structure
• Random Access Time is the sum of the seek time and rotational latency.
• The seek time is the time for the disk arm to move the head to the required cylinder
containing the desired track.
• The rotational latency is the additional time for the disk to rotate the desired sector to
the disk head.
• The disk bandwidth is the total number of bytes transferred, divided by the total time
between the first request for service and the completion of the last transfer.
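The definitions above translate directly into arithmetic. The sketch below adds one assumption that is not stated in the text: the average rotational latency is taken as the time for half a revolution. The function names and the sample figures (8 ms seek, 6000 RPM) are made up for illustration.

```python
def average_rotational_latency_ms(rpm):
    """Assumed average latency: time for half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def random_access_time_ms(seek_ms, rpm):
    """Random access time = seek time + rotational latency."""
    return seek_ms + average_rotational_latency_ms(rpm)


def disk_bandwidth(total_bytes, total_seconds):
    """Bytes transferred divided by the time from first request to last transfer."""
    return total_bytes / total_seconds


latency = average_rotational_latency_ms(6000)    # 5.0 ms
access = random_access_time_ms(8, 6000)          # 13.0 ms
bandwidth = disk_bandwidth(1_000_000, 2)         # 500000.0 bytes per second
```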
4. Root Directory.
5. Data Area.
• It contains a small program to load and start the active partition from the hard disk.
• The MBR is created on the hard disk drive by executing FDISK.EXE command of DOS.
• The MBR is located at absolute sector 0, that is, at cylinder 0, head 0 and sector 1.
• If we have more than one partition, then there are Extended Master Boot Records, located
at the beginning of each extended partition volume.
DOS Boot Record (DBR):
• The DOS Boot Record (DBR), sometimes called the DOS Boot Sector, is the second most
important information on your hard disk.
• It contains some important information about disk geometry like: Bytes Per Sector, Sectors
per cluster, Reserved Sectors etc.
• All DOS partitions contain the program code to boot the machine, but the MBR gives
control only to the partition that is marked as the active partition.
• It was developed to fulfil the requirements of a fast and flexible system for managing data
on both removable and fixed media.
• FAT keeps a map of the complete surface of the disk drive, recording which areas are free,
which areas are taken up by which file, and so on. When data stored on the disk is to be
accessed, DOS consults the FAT to find the areas of the hard disk that contain
the data.
• The FAT manages the disk area in a group of sectors called “CLUSTER”.
Root Directory:
• The Root Directory is like a table of contents for the information stored on the hard disk
drive. The directory area keeps the information about the file name, date and time of the
file creation, file attribute, file size and starting cluster of the particular file.
• The number of files that one can store on the root directory depends on the FAT type
being used.
• The remainder of the volume after Root Directory is the Data Area.
• The data area contains the actual data stored on the disk surfaces.
• When we format a hard disk, the FORMAT command of DOS does not destroy or overwrite
the data in the data area. The FORMAT command only removes the directory entries and
FAT entries; it does not touch the actual data area. This makes the recovery of an
accidentally formatted hard disk drive possible.
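How the root directory and the FAT work together to locate a file can be sketched as follows. This is a simplified model, not the on-disk FAT format: the table is a Python dictionary, the end-of-chain marker -1 and the cluster numbers are hypothetical, and real FAT variants use reserved values of their own.

```python
END_OF_CHAIN = -1   # hypothetical marker for the last cluster of a file


def cluster_chain(fat, start_cluster):
    """Follow FAT entries from the starting cluster to the end of the file."""
    chain = []
    cluster = start_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]           # each entry names the next cluster
    return chain


# Directory entry: file name -> starting cluster (other attributes omitted).
root_directory = {"REPORT.TXT": 2, "NOTES.TXT": 3}

# FAT: cluster -> next cluster of the same file.
fat = {2: 5, 5: 6, 6: END_OF_CHAIN, 3: END_OF_CHAIN}

report_clusters = cluster_chain(fat, root_directory["REPORT.TXT"])
```

The data area itself is never consulted during this lookup, which is consistent with the remark that removing only the directory and FAT entries leaves the data intact.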
RAID (Redundant Array of Independent Disks)
• RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive
Disks) is a way of storing the same data in different places on multiple hard disks to protect
data in the case of a drive failure.
• RAID organizes multiple disks into a large, high-performance logical disk. In other words,
if you have three hard drives, you can configure them to look like one large drive.
• RAID is a set of physical disk drives, and the operating system views it as a single logical
drive. Data are distributed across physical drives in a way that enables simultaneous access
to data from multiple drives.
• RAID is a data storage virtualization technology that combines multiple physical disk drive
components into a single logical unit for the purposes of data redundancy, performance
improvement, or both.
RAID Levels:
• Data is broken into stripes of user-defined size, and successive stripes are written to
different drives in the array.
• A minimum of two disks is required. It uses 100% of the storage capacity since no
redundant information is written.
Web servers, graphics design, audio and video editing, and online gaming are some
example applications that might benefit from this level.
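The striping described above can be illustrated with a one-line mapping. The sketch assumes simple round-robin placement with a stripe of one block, which is only one possible layout; the function name locate_block is hypothetical.

```python
def locate_block(logical_block, num_disks):
    """Round-robin striping: return (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


# With three disks, consecutive logical blocks land on disks 0, 1, 2, 0, 1, 2, ...
placement = [locate_block(block, 3) for block in range(6)]
```

Because neighbouring blocks sit on different drives, they can be read or written at the same time.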
RAID-1 (Mirroring and Duplexing)
• This level mirrors the data on drive 1 to drive 2. It offers 100% redundancy, as the
array continues to work even if either disk fails.
• If either fails, the other continues to function until the failed drive is replaced.
RAID-2 (Hamming Code Error Correcting Code)
• This level uses bit-level data striping rather than block-level striping.
• To use RAID 2, make sure the disks selected have no built-in error checking
mechanism, as this level uses an external Hamming code for error detection.
• This is one of the reasons RAID 2 is not used in practice: most of the disks
used these days come with built-in error detection. It uses an extra disk for storing all the
parity information.
• In RAID 3, the data block is striped and written on the data disks. This requires a minimum
of three drives to implement.
• This level uses byte-level striping along with parity. One dedicated drive is used to store
the parity information, and in case of a drive failure the lost data is reconstructed using
this extra drive.
• If the parity drive itself crashes, redundancy is lost again, so this level is not widely
used in organizations.
RAID-4 (Independent Data Disks with shared parity disk)
• This level is very similar to RAID 3, except that RAID 4 uses block-level
striping rather than byte-level striping.
• In a RAID-4 system, if any one of the disks fails, the data on the remaining disks can be
used to reconstruct the data that was on the failed disk. Even if the parity disk fails, the
other disks are still intact. Thus RAID-4 can survive the failure of any of its disks.
RAID-5 (Independent Data Disks with Distributed parity blocks)
• Parity information is written to a different disk in the array for each stripe. In case of single
disk failure data can be recovered with the help of distributed parity without affecting the
operation and other read write operations.
• One of the most popular RAID techniques, it uses Block Striping of data along with parity
and writes them to all drives. RAID-5 systems require a minimum of 3 disks.
• If any one drive fails, the array is said to be degraded, and the data blocks residing on that
drive can be derived from the parity and data on the remaining drives.
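Recovering a lost block from parity can be shown with a short sketch. It assumes that the parity block is the bytewise XOR of the data blocks in the stripe, which is the usual scheme but is not spelled out in the text; the function name xor_blocks and the sample bytes are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (two bytes each) plus their parity block.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)

# The drive holding data[1] fails: rebuild its block from the survivors and parity.
recovered = xor_blocks([data[0], data[2], parity])
```

The same calculation works whichever single block of the stripe is missing, including the parity block itself.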
• This level is an enhanced version of RAID 5 that adds the benefit of dual parity. It
uses block-level striping with dual distributed parity, which provides extra
redundancy.
• The advantages of RAID-6 become even more pronounced as the capacity of SATA drives
goes up and rebuilds take longer to finish.
• RAID 6 requires a minimum of four drives, and the usable capacity is always that of two
drives fewer than the number of disk drives in the RAID set. Applications suited
for this level are the same as those of level 5.
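The capacity figures quoted for the different levels can be collected in one small function. The values for RAID 0, RAID 1 and RAID 6 follow the text; the RAID 5 figure (one drive's worth of capacity used for parity) and the two-disk minimum for RAID 1 are assumptions added here, and the function name usable_disks is hypothetical. Equal-sized drives are assumed throughout.

```python
def usable_disks(level, n):
    """Number of drives' worth of usable capacity in an n-drive array."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}   # minimum drives per level
    parity_drives = {0: 0, 5: 1, 6: 2}   # capacity given up to parity
    if level not in minimum:
        raise ValueError("level not covered by this sketch")
    if n < minimum[level]:
        raise ValueError("too few drives for this RAID level")
    if level == 1:
        return 1                          # every drive holds the same copy
    return n - parity_drives[level]


capacities = {level: usable_disks(level, 4) for level in (0, 1, 5, 6)}
```

With four drives, for example, RAID 6 leaves the capacity of two drives for data.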