
A note on bit-mapped free sector management

1993, ACM SIGOPS Operating Systems Review

Darrell D. E. Long
Computer & Information Sciences, University of California, Santa Cruz

The most common methods for maintaining a list of free sectors on disk are to use either a linked list or a bit map [1]. Using a linked list has the advantage that it requires no extra storage, since the links are stored in the free sectors themselves. It also provides quick allocation and deallocation, requiring only that a free sector be removed from the head of the list, or a freed sector be added to the head of the list, respectively. The main disadvantage of a linked list is that over time the list tends towards random order. That is, unless the list is sorted, sectors are placed on the list in no particular order. The result is poor locality during file access, significantly impacting performance by increasing the average seek time.

By using a bit map, adjacent free sectors will always appear adjacent in the bit map. There is a small cost in terms of storage; that is, the bit map will contain one eighth as many bytes as there are sectors on the disk. There is a potentially more important concern: the average number of bits that must be scanned in order to find a free sector. Since this technique was first used, the size of disks has increased by approximately four orders of magnitude. If the number of bits to be scanned on average had increased by even a small fraction of this amount, the technique would need to be abandoned.

This question came up in our undergraduate operating systems course. Our initial speculation was that bit maps would be inappropriate for the large disks that are becoming available. As the following analysis will show, the bit map technique remains viable for large disks, and as we shall see it is, in a sense, independent of the size of the disk.

Assume for the moment that the free sectors are uniformly distributed across the disk, resulting from files being freed in no particular order. Further assume that the system is in steady state with $r$ sectors free on average out of a total of $n$. The problem then reduces to drawing one of $r$ cards from a deck of size $n$ [2]. Let $b_i$, $1 \le i \le n$, be the bit map, where there are $r$ zero bits indicating free sectors. Then the probability of the first bit being zero is
\[
p_1 = \Pr[b_1 = 0] = \frac{r}{n}.
\]
Similarly,
\[
p_2 = \Pr[b_2 = 0 \wedge b_1 = 1] = \left(1 - \frac{r}{n}\right)\frac{r}{n-1}
\]
and
\[
p_3 = \Pr[b_3 = 0 \wedge b_1 = 1 \wedge b_2 = 1] = \left(1 - \frac{r}{n}\right)\left(1 - \frac{r}{n-1}\right)\frac{r}{n-2}.
\]
In general,
\[
p_k = \Pr[b_k = 0 \wedge b_j = 1,\ 1 \le j < k]
    = \left[\prod_{i=0}^{k-2}\left(1 - \frac{r}{n-i}\right)\right]\frac{r}{n-k+1}
    = \left[\prod_{i=0}^{k-2}\frac{n-i-r}{n-i}\right]\frac{r}{n-k+1},
\]
which simply means that the probability of $b_k$ being the first zero bit is conditional on the probability of $b_j$, $1 \le j \le k-1$, being one. As $k$ increases, the likelihood of $b_k$ being zero increases, since the number of bits yet to be scanned decreases. We can reduce this unwieldy expression by noticing that
\[
\prod_{i=0}^{k-2}(n-i-r) = (n-r)(n-r-1)\cdots(n-r-k+2) = \frac{(n-r)!}{(n-r-k+1)!}.
\]
Similarly,
\[
\prod_{i=0}^{k-2}(n-i) = \frac{n!}{(n-k+1)!}.
\]
The result is that
\[
p_k = \frac{(n-r)!}{(n-r-k+1)!}\,\frac{(n-k+1)!}{n!}\,\frac{r}{n-k+1}
    = \frac{(n-r)!\,(n-k)!\,r}{(n-r-k+1)!\,n!}.
\]
This can be rewritten as
\[
p_k = \frac{\binom{n-k}{r-1}}{\binom{n}{r}},
\]
which can be viewed as the number of ways to obtain a run of $k-1$ one bits divided by the total number of ways to arrange $r$ zero bits out of $n$ total.

Let the random variable $x$ be the number of bits that must be scanned before finding a zero; then the expected value of $x$ is
\[
E[x] = \sum_{i=1}^{n-r+1} i\,p_i = \sum_{i=1}^{n-r+1} i\,\frac{\binom{n-i}{r-1}}{\binom{n}{r}}.
\]
Since only the first $n-r$ bits can be allocated, in the worst case the scan will have to continue to bit $n-r+1$ (which must certainly be zero). This sum can be reduced to
\[
E[x] = \frac{n+1}{r+1}.
\]
The variance of $x$ is given by
\[
V[x] = \frac{r\,(n-r)\,(1+n)}{(1+r)^2\,(2+r)}.
\]
This means that the expected number of bits that must be scanned to find a free sector depends only on the ratio of the total number of sectors to the number of free sectors. Said another way, given a disk that is 90% full, it does not matter whether the disk is 20 megabytes or 5 gigabytes: the average number of bits that must be scanned is approximately 10 (except for a vanishingly small error term).

We assumed that free sectors were uniformly distributed across the disk. If the algorithm always scans sequentially from the first bit, this will almost never be true. It has been observed empirically that when this is done, allocated sectors will cluster at the start of the disk, and free sectors will cluster towards the end. The result is that the expected number of bits to be scanned will tend towards the worst case of $n - r + 1$.

There are two simple heuristics that alleviate this problem. The simplest way to maintain a roughly uniform distribution of free sectors is to begin scanning at a random position each time. The method more frequently used in practice is to use a roving pointer, where the next search begins where the last one left off. It is also common to reset the roving pointer to the beginning of the deleted region when a file is freed. This increases the randomness of the free sectors, since the roving pointer is reset to a random position just before a sequence of zero bits. It can also significantly improve performance as the disk becomes increasingly full, since the roving pointer will often be at the beginning of a free region.
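To make the roving-pointer heuristic concrete, the following is a minimal sketch in C. The structure and names (sector_map, alloc_sector, free_sector) are illustrative assumptions rather than the interface of any particular system; locking and error handling are omitted.

#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch of bit-mapped free-sector management with a
 * roving pointer.  One bit per sector: 1 = allocated, 0 = free.
 */
struct sector_map {
    uint8_t *bits;   /* the bit map, n/8 bytes */
    size_t   n;      /* total number of sectors */
    size_t   rover;  /* roving pointer: where the next scan begins */
};

static int test_bit(const struct sector_map *m, size_t i)
{
    return (m->bits[i / 8] >> (i % 8)) & 1;
}

static void set_bit(struct sector_map *m, size_t i, int v)
{
    if (v)
        m->bits[i / 8] |= (uint8_t)(1u << (i % 8));
    else
        m->bits[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/*
 * Scan for a zero bit starting at the roving pointer, wrapping around
 * at most once.  Returns the sector number, or -1 if the disk is full.
 */
long alloc_sector(struct sector_map *m)
{
    for (size_t scanned = 0; scanned < m->n; scanned++) {
        size_t i = (m->rover + scanned) % m->n;
        if (!test_bit(m, i)) {
            set_bit(m, i, 1);
            m->rover = (i + 1) % m->n;  /* next search resumes here */
            return (long)i;
        }
    }
    return -1;
}

/*
 * Free a sector and reset the roving pointer to it, so the next scan
 * begins at the start of the freed region.
 */
void free_sector(struct sector_map *m, size_t i)
{
    set_bit(m, i, 0);
    m->rover = i;
}

Resetting the rover in free_sector realizes the second heuristic above: the next allocation begins its scan just before a run of zero bits, so even on a nearly full disk the scan is usually short.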
While the analysis presented here is simple, we have been unable to locate it in any of the standard texts. It has important implications for practitioners, who might otherwise follow their intuition and implement a more complex scheme than necessary.

References

[1] A. S. Tanenbaum, Modern Operating Systems. Englewood Cliffs: Prentice Hall, 1992.

[2] B. V. Gnedenko, The Theory of Probability and the Elements of Statistics. New York: Chelsea, 1989.