File System (Internals) Mar. 31, 2004: "... Does This Look Familiar?... "
File System (Internals) Mar. 31, 2004: "... Does This Look Familiar?... "
File System (Internals) Mar. 31, 2004: "... Does This Look Familiar?... "
15-410
-1-
L26_Filesystem
15-410, S04
Synchronization
Project 3 status
Several groups skipped Checkpoint 3 (the easiest one!) Not everybody took advantage of opportunity to plan Several groups seem on track to finish early Several groups dangerously close to the 90% problem
First 90% of the work takes the first 90% of the time Last 10% of the work takes the second 90% of the time
-2-
15-410, S04
Synchronization
Project 3 / Project 4 hurdle test suite
Released this week Two sections
Basic tests, solidity tests
-3-
Synchronization
Today
Chapter 12 (not: Log-structured, NFS)
-4-
15-410, S04
Outline
File system code layers (abstract) Disk, memory structures Unix VFS layering indirection Directories Block allocation strategies, free space Cache tricks Recovery, backups
-5-
15-410, S04
-6-
15-410, S04
-7-
15-410, S04
Shredding Disks
Split disk into partitions/slices/minidisks/...
PC: 4 partitions Windows, FreeBSD, Plan 9 Mac: volumes OS 9, OS X, system vs. user data
Or: glue disks together into volumes/logical disks Partition may contain...
Paging area
Indexed by in-memory structures random garbage when OS shuts down Block allocation: file # block list Directory: name file #
15-410, S04
File system
-8-
Disk Structures
Boot area (first block/track/cylinder)
Interpreted by hardware bootstrap ( BIOS ) May include partition table Key parameters: #blocks, metadata layout Unix: superblock ownership/permissions data location
-9-
15-410, S04
Memory Structures
In-memory partition tables
Sanity check file system I/O in correct partition
- 10 -
15-410, S04
VFS layer
Goal
Allow one machine to use multiple file system types
Unix FFS MS-DOS FAT CD-ROM ISO9660 Remote/distributed: NFS/AFS
Solution
- 11 -
15-410, S04
IDEintr()
15-410, S04
VFS Virtualization
n = read(fd, buf, size) INT 54 vfs_read() ufs_read() namei() procfs_read() procfs_domem() iput()
15-410, S04
iget()
- 13 -
- 14 -
15-410, S04
- 15 -
Directories
External interface
vnode2 = lookup(vnode1, name) List of (name,inode #) - not sorted! Names are variable-length Lookup is linear
How long does it take to delete N files?
- 16 -
15-410, S04
Allocation / Mapping
Allocation problem
Where do I put the next block of this file? Near the previous block? Where is block 32 of this file? Similar to virtual memory
Multiple large address spaces specific to each file Only one underlying address space of blocks Source address space may be sparse!
Mapping problem
- 17 -
15-410, S04
Allocation Contiguous
Approach
File location defined as (start, length) Sequential disk accesses are cheap Bookkeeping is easy Dynamic storage allocation (fragmentation, compaction) Must pre-declare file size at creation
Motivation
Issues
- 18 -
15-410, S04
Allocation Linked
Approach
File location defined as (start) Each disk block contains pointer to next Avoid fragmentation problems Allow file growth
Motivation
Issues?
- 19 -
15-410, S04
Allocation Linked
Issues
508-byte blocks don't match memory pages In general, one seek per block read/written - slow! Very hard to access file blocks at random
lseek(fd, 37 * 1024, SEEK_SET);
Benefit
Can recover files even if directories destroyed Linked multi-block clusters, not blocks
Common modification
- 20 -
15-410, S04
Allocation FAT
Used by MS-DOS, OS/2, Windows
Digital cameras, GPS receivers, printers, PalmOS, ...
- 21 -
Allocation FAT
7 2 5 -1 3 -1 0 -1
- 22 -
0 1 4
15-410, S04
Allocation - FAT
7 2 5 -1 3 -1 0 -1
- 23 -
0 1 4
15-410, S04
Allocation - FAT
7 2 5 -1 3 -1 0 -1
- 24 -
0 1 4
15-410, S04
Allocation - FAT
7 2 5 -1 3 -1 0 -1
- 25 -
0 1 4
hello.jav: 0, 7
15-410, S04
Allocation FAT
Issues
Damage to FAT scrambles entire disk
Solution: backup FAT Seek to FAT, read, seek to actual block (repeat) Unless FAT can be cached Linear time to walk through FAT
- 26 -
15-410, S04
Allocation Indexed
Motivation
Avoid fragmentation problems Allow file growth Improve random access Per-file block array
Approach
3004 -1 -1 -1 6002 -1 -1 -1
15-410, S04
- 27 -
Allocation Indexed
Allows holes
foo.c is sequential foo.db, blocks 1..3 -1
logically blank
sparse allocation
a.k.a. holes read() returns nulls write() requires alloc file size
ls -l ls -s
file size
- 28 -
Allocation Indexed
How big should index block be?
Too small: limits file size Too big: lots of wasted pointers Linked Multi-level What Unix actually does
- 29 -
15-410, S04
- 30 -
9987 9988 -1 -1
- 31 -
- 32 -
15-410, S04
- 33 -
- 34 -
15-410, S04
- 35 -
15-410, S04
- 36 -
15-410, S04
- 37 -
- 38 -
15-410, S04
- 39 -
15-410, S04
Observation
How much RAM to devote to each one? Why not have just one cache?
Mix automatically varies according to load
15-410, S04
- 40 -
Cache tricks
Read-ahead for (i = 0; i < filesize; ++i) putc(getc(infile), outfile);
System observes sequential reads
can pipeline reads to overlap computation , read latency
Free-behind
Discard buffer from cache when next is requested Good for large files Anti-LRU
- 41 -
15-410, S04
Recovery
System crash...now what?
Some RAM contents were lost Free-space list on disk may be wrong Scan file system
Check invariants Unreferenced files Double-allocated blocks Unallocated blocks Fix problems Expert user???
- 42 -
15-410, S04
Backups
Incremental approach
Monthly: dump entire file system Weekly: dump changes since last monthly Daily: dump changes since last weekly Collect changes since yesterday
Scan file system by modification time
- 43 -
15-410, S04
Summary
Block-mapping problem
Similar to virtual-to-physical mapping for memory Large, often-sparse address spaces
Holes not the common case, but not impossible
Map any logical address to any physical address Key difference: file maps often don' fit in memory t Multiple file system types on one machine Grow your block-allocation map ...
15-410, S04
- 44 -