empirically that when this is done, allocated sectors will cluster at the start of the disk, and free
sectors will cluster towards the end. The result is that the expected number of bits to be scanned
will tend towards the worst case of $n - r + 1$.
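The clustering effect can be reproduced with a small simulation. This is a sketch under assumptions of our own, not the workload behind the observation above: each step frees one randomly chosen allocated sector and then allocates one sector by scanning the bit map from bit 0. The function name is hypothetical. Under this workload the free sectors migrate to the end of the map, and the mean scan length grows to the order of $n - r$ instead of staying near $(n+1)/(r+1)$.

```python
import random


def first_fit_scan_lengths(n: int, r: int, steps: int, seed: int = 0) -> list:
    """Hypothetical steady-state workload (an assumption, not from the note):
    each step frees one randomly chosen allocated sector, then allocates one
    sector by scanning the bit map from bit 0 (first fit).
    Returns the number of bits scanned at each step."""
    rng = random.Random(seed)
    allocated = [True] * n
    for s in rng.sample(range(n), r):  # start with r uniformly placed free sectors
        allocated[s] = False
    in_use = [s for s in range(n) if allocated[s]]
    lengths = []
    for _ in range(steps):
        # Free a random allocated sector (swap-remove keeps this O(1)).
        j = rng.randrange(len(in_use))
        victim = in_use[j]
        in_use[j] = in_use[-1]
        in_use.pop()
        allocated[victim] = False
        # First fit: always start scanning at bit 0.
        s = allocated.index(False)
        allocated[s] = True
        in_use.append(s)
        lengths.append(s + 1)  # bits scanned, counting the zero bit itself
    return lengths


n, r = 1000, 100
late = first_fit_scan_lengths(n, r, steps=20000)[-5000:]
print("mean bits scanned (first fit):", sum(late) / len(late))
print("uniform-case E[x] = (n+1)/(r+1):", (n + 1) / (r + 1))
```

Freshly freed sectors near the front are reused at once, so the scan rarely gets all the way to bit $n - r + 1$, but it stays far above the uniform-case value.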
There are two simple heuristics that alleviate this problem. The simplest way to maintain a
roughly uniform distribution of free sectors is to begin scanning at a random position each time.
The method more frequently used in practice is to use a roving pointer, where the next search begins
where the last left off. It is also common to reset the roving pointer to the beginning of the deleted
region when a file is freed. This increases the randomness of the free sectors, since the roving
pointer is reset to an effectively random position at the start of a run of zero bits. It can also
significantly improve performance as the disk becomes increasingly full, since the roving pointer
will often be at the beginning of a free region.
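Both heuristics can be sketched as one small allocator. This is a minimal illustration under stated assumptions: the scan wraps around at the end of the map (the note does not say), and a freed file occupies a contiguous region so that "the beginning of the deleted region" is its first sector. The class and method names are hypothetical.

```python
import random


class BitmapAllocator:
    """Hypothetical bit-map allocator illustrating the two heuristics.
    allocated[s] is True when sector s is in use; scans wrap around the map."""

    def __init__(self, n: int, policy: str = "roving", seed: int = 0):
        if policy not in ("random", "roving"):
            raise ValueError("policy must be 'random' or 'roving'")
        self.allocated = [False] * n
        self.policy = policy
        self.rover = 0                  # roving pointer: where the next scan begins
        self.rng = random.Random(seed)  # used only by the random-start policy

    def allocate(self):
        """Return (sector, bits_scanned), or (None, n) if no sector is free."""
        n = len(self.allocated)
        if self.policy == "random":
            start = self.rng.randrange(n)  # begin at a random position each time
        else:
            start = self.rover             # begin where the last search left off
        for scanned in range(1, n + 1):
            s = (start + scanned - 1) % n
            if not self.allocated[s]:
                self.allocated[s] = True
                if self.policy == "roving":
                    self.rover = (s + 1) % n
                return s, scanned
        return None, n

    def free_region(self, first: int, length: int) -> None:
        """Free a file stored in sectors first .. first+length-1 (assumed contiguous)."""
        for s in range(first, first + length):
            self.allocated[s] = False
        if self.policy == "roving":
            self.rover = first  # reset the pointer to the start of the freed region


a = BitmapAllocator(10, policy="roving")
print([a.allocate() for _ in range(4)])
a.free_region(1, 2)
print(a.allocate())
```

With the roving policy, the allocation after `free_region` is satisfied after scanning a single bit, because the pointer already sits on the first freed sector.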
While the analysis presented here is simple, we have been unable to locate it in any of the
standard texts. It has important implications for practitioners, who might otherwise follow their
intuition and implement a more complex scheme than necessary.
References
[1] A. S. Tanenbaum, Modern Operating Systems. Englewood Cliffs: Prentice Hall, 1992.
[2] B. V. Gnedenko, The Theory of Probability and the Elements of Statistics. New York: Chelsea, 1989.
In general,
$$p_k = \Pr[b_k = 0 \wedge b_j = 1,\ 1 \le j < k] = \left[\prod_{i=0}^{k-2}\left(1 - \frac{r}{n-i}\right)\right]\frac{r}{n-k+1} = \left[\prod_{i=0}^{k-2}\frac{n-i-r}{n-i}\right]\frac{r}{n-k+1},$$
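which simply says that $b_k$ is the first zero bit exactly when $b_j = 1$ for every $1 \le j \le k-1$ and $b_k = 0$; $p_k$ is the product of the probabilities of these successive draws, each conditioned on the draws before it. As $k$ increases, the conditional probability $r/(n-k+1)$ that $b_k$ is zero, given that all earlier bits were one, increases, since the number of bits yet to be scanned decreases.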
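The general product can be evaluated exactly with rational arithmetic. The sketch below (Python; `p_k` and `conditional_zero` are hypothetical names) computes $p_k$ term by term, checks that the probabilities for $k = 1, \ldots, n-r+1$ sum to one, and shows the conditional probability $r/(n-k+1)$ rising with $k$.

```python
from fractions import Fraction


def p_k(n: int, r: int, k: int) -> Fraction:
    """Probability that b_k is the first zero bit, from the product formula:
    prod_{i=0}^{k-2} (n - i - r)/(n - i), times r/(n - k + 1)."""
    prob = Fraction(1)
    for i in range(k - 1):  # i = 0 .. k-2: bits b_1 .. b_{k-1} are all one
        prob *= Fraction(n - i - r, n - i)
    return prob * Fraction(r, n - k + 1)  # then b_k is zero


def conditional_zero(n: int, r: int, k: int) -> Fraction:
    """Pr[b_k = 0 | b_1 = ... = b_{k-1} = 1] = r/(n - k + 1)."""
    return Fraction(r, n - k + 1)


n, r = 20, 4
dist = [p_k(n, r, k) for k in range(1, n - r + 2)]
print(sum(dist))  # the first zero must occur at some k in 1 .. n-r+1
print([conditional_zero(n, r, k) for k in (1, 5, 10, n - r + 1)])
```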
We can reduce this unwieldy expression by noticing that
$$\prod_{i=0}^{k-2}(n-i-r) = (n-r)(n-r-1)\cdots(n-r-k+2) = \frac{(n-r)!}{(n-r-k+1)!}.$$
Similarly,
$$\prod_{i=0}^{k-2}(n-i) = \frac{n!}{(n-k+1)!}.$$
The result is that
$$p_k = \frac{(n-r)!}{(n-r-k+1)!}\cdot\frac{(n-k+1)!}{n!}\cdot\frac{r}{n-k+1} = \frac{r\,(n-r)!\,(n-k)!}{(n-r-k+1)!\,n!}.$$
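This can be rewritten as
$$p_k = \frac{\binom{n-k}{r-1}}{\binom{n}{r}},$$
which can be viewed as the number of bit maps that begin with a run of $k-1$ one bits followed by a zero (the remaining $r-1$ zero bits may fall anywhere among the last $n-k$ bits), divided by the total number of ways to arrange $r$ zero bits out of $n$ total.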
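The factorial and binomial forms, and the counting interpretation, can be cross-checked on a small bit map. In this sketch (Python; all function names are hypothetical) the brute-force count enumerates every placement of $r$ zeros among $n$ bits and keeps those whose first zero is bit $k$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def p_factorial(n: int, r: int, k: int) -> Fraction:
    """p_k = r (n-r)! (n-k)! / ((n-r-k+1)! n!), for 1 <= k <= n-r+1."""
    return Fraction(r * factorial(n - r) * factorial(n - k),
                    factorial(n - r - k + 1) * factorial(n))


def p_binomial(n: int, r: int, k: int) -> Fraction:
    """p_k = C(n-k, r-1) / C(n, r)."""
    return Fraction(comb(n - k, r - 1), comb(n, r))


def count_maps_with_first_zero_at(n: int, r: int, k: int) -> int:
    """Brute force: bit maps with r zeros whose first zero is bit k (1-indexed).
    combinations() yields positions in increasing order, so zeros[0] is the first zero."""
    return sum(1 for zeros in combinations(range(1, n + 1), r) if zeros[0] == k)


n, r = 12, 4
for k in range(1, n - r + 2):
    print(k, p_factorial(n, r, k), p_binomial(n, r, k),
          count_maps_with_first_zero_at(n, r, k), comb(n - k, r - 1))
```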
Let the random variable $x$ be the number of bits that must be scanned to find a zero, counting the zero bit itself; then the expected value of $x$ is
$$E[x] = \sum_{i=1}^{n-r+1} i\,p_i = \sum_{i=1}^{n-r+1} i\,\frac{\binom{n-i}{r-1}}{\binom{n}{r}}.$$
Since at most $n - r$ bits can be one, in the worst case the scan will have to continue to bit $n - r + 1$ (which must certainly be zero). Using the identity $\sum_{i=1}^{n-r+1} i\binom{n-i}{r-1} = \binom{n+1}{r+1}$, this sum reduces to
$$E[x] = \frac{n+1}{r+1}.$$
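The reduction can be confirmed numerically. This sketch (Python; `expected_scan` is a hypothetical name) evaluates the sum exactly with `fractions.Fraction` and `math.comb` and compares it with $(n+1)/(r+1)$ for a few disk sizes, including the extremes $r = 1$ and $r = n$.

```python
from fractions import Fraction
from math import comb


def expected_scan(n: int, r: int) -> Fraction:
    """E[x] = sum_{i=1}^{n-r+1} i * C(n-i, r-1) / C(n, r), evaluated exactly."""
    total = sum(i * comb(n - i, r - 1) for i in range(1, n - r + 2))
    return Fraction(total, comb(n, r))


for n, r in [(10, 1), (10, 10), (100, 10), (1000, 37)]:
    print(n, r, expected_scan(n, r), Fraction(n + 1, r + 1))
```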
The variance of $x$ is given by
$$V[x] = \frac{r\,(n-r)\,(n+1)}{(r+1)^2\,(r+2)}.$$
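The closed form for the variance can be checked against $E[x^2] - E[x]^2$ computed directly from $p_i$. A minimal sketch in Python; `scan_variance` and `variance_formula` are hypothetical names.

```python
from fractions import Fraction
from math import comb


def scan_variance(n: int, r: int) -> Fraction:
    """Exact V[x] = E[x^2] - E[x]^2 from p_i = C(n-i, r-1)/C(n, r)."""
    total = comb(n, r)
    support = range(1, n - r + 2)
    mean = Fraction(sum(i * comb(n - i, r - 1) for i in support), total)
    second = Fraction(sum(i * i * comb(n - i, r - 1) for i in support), total)
    return second - mean * mean


def variance_formula(n: int, r: int) -> Fraction:
    """Closed form V[x] = r (n - r) (n + 1) / ((r + 1)^2 (r + 2))."""
    return Fraction(r * (n - r) * (n + 1), (r + 1) ** 2 * (r + 2))


for n, r in [(10, 1), (30, 7), (500, 50)]:
    print(n, r, scan_variance(n, r), variance_formula(n, r))
```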
This means that, since $(n+1)/(r+1) \approx n/r$, the expected number of bits that must be scanned to find a free sector depends essentially only on the ratio of the total number of sectors to the number of free sectors. Put another way, given a disk that is 90% full, it does not matter whether the disk is 20 megabytes or 5 gigabytes: the average number of bits that must be scanned is approximately 10 (up to a vanishingly small correction).
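The two disks in the example can be compared directly. The sketch below assumes 512-byte sectors (the note does not give a sector size) and a disk that is exactly 90% full; the names are hypothetical.

```python
from fractions import Fraction

SECTOR_BYTES = 512  # assumption: 512-byte sectors (not stated in the note)
MB = 2 ** 20
GB = 2 ** 30


def expected_bits_for_disk(size_bytes: int, percent_free: int) -> Fraction:
    """E[x] = (n + 1)/(r + 1) for a disk of the given size and free percentage."""
    n = size_bytes // SECTOR_BYTES   # total sectors
    r = n * percent_free // 100      # free sectors
    return Fraction(n + 1, r + 1)


for label, size in [("20 MB", 20 * MB), ("5 GB", 5 * GB)]:
    e = expected_bits_for_disk(size, percent_free=10)  # 90% full
    print(f"{label}: E[x] = {float(e):.6f}, shortfall from 10 = {float(10 - e):.2e}")
```

When $n = 10r$ exactly, the shortfall from $n/r = 10$ is $9/(r+1)$, so it shrinks as the disk grows while the expected scan length stays essentially fixed.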
We assumed that free sectors were uniformly distributed across the disk. If the algorithm
always scans sequentially from the first bit, this will almost never be true. It has been observed
A note on bit-mapped free sector management
Darrell D. E. Long
Computer & Information Sciences
University of California, Santa Cruz
The most common methods for maintaining a list of free sectors on disk are to use either a linked
list or a bit map [1].
Using a linked list has the advantage that it requires no extra storage, since the links are stored
in the free sectors themselves. It also provides quick allocation and deallocation, requiring only
that a free sector be removed from the head of the list, or a freed sector be added to the head of
the list, respectively.
The main disadvantage of a linked list is that over time the order of the list tends towards random.
That is, unless the list is kept sorted, sectors are placed on the list in no particular order. The result is poor
locality during file access, significantly impacting performance by increasing the average seek
time.
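The loss of locality is easy to see in a toy free list. In this sketch a Python dict stands in for the links that would really be stored inside the free sectors, and the workload (one 100-sector file freed in random order, then a new 5-sector file allocated) is our own assumption; `LinkedFreeList` is a hypothetical name.

```python
import random


class LinkedFreeList:
    """Free-sector list. The link to the next free sector is conceptually stored
    inside each free sector; a dict stands in for those on-disk links here."""

    def __init__(self, free_sectors):
        self.next = {}   # sector -> next free sector (None ends the list)
        self.head = None
        for s in reversed(list(free_sectors)):
            self.release(s)  # build the list so the first given sector is at the head

    def allocate(self):
        """O(1): remove and return the head sector, or None if no sector is free."""
        s = self.head
        if s is not None:
            self.head = self.next.pop(s)
        return s

    def release(self, s):
        """O(1): push a freed sector onto the head of the list."""
        self.next[s] = self.head
        self.head = s


rng = random.Random(1)
fl = LinkedFreeList(range(100))
file_a = [fl.allocate() for _ in range(100)]  # sectors 0..99, in order
freed = file_a[:]
rng.shuffle(freed)
for s in freed:          # the file's sectors are freed in no particular order
    fl.release(s)
file_b = [fl.allocate() for _ in range(5)]
print(file_b)            # most recently freed first, so generally not adjacent
```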
By using a bit map, adjacent free sectors will always appear adjacent in the bit map. There
is a small cost in terms of storage; that is, the bit map will contain one eighth as many bytes as
there are sectors on the disk. There is a potentially more important concern: the average number
of bits that must be scanned in order to find a free sector. Since this technique was first used, the
size of disks has increased by approximately four orders of magnitude. If the number of bits to be
scanned on average increased even a small fraction of this amount, the technique would need to
be abandoned.
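A bit map needs only a byte array and a sequential scan. This is a minimal sketch: packing eight sectors per byte, least significant bit first, is an arbitrary choice of ours, and `FreeBitmap` with its methods is hypothetical. As in the analysis, a zero bit marks a free sector, and `first_free` also reports how many bits it scanned.

```python
class FreeBitmap:
    """One bit per sector, packed eight to a byte (least significant bit first,
    an arbitrary choice). Bit value 1 = allocated, 0 = free, as in the analysis."""

    def __init__(self, n_sectors: int):
        self.n = n_sectors
        # One eighth as many bytes as sectors, rounded up.
        self.bits = bytearray((n_sectors + 7) // 8)

    def is_allocated(self, s: int) -> bool:
        return bool(self.bits[s >> 3] & (1 << (s & 7)))

    def set_allocated(self, s: int) -> None:
        self.bits[s >> 3] |= 1 << (s & 7)

    def set_free(self, s: int) -> None:
        self.bits[s >> 3] &= ~(1 << (s & 7)) & 0xFF

    def first_free(self, start: int = 0):
        """Scan bit by bit from `start`. Returns (sector, bits_scanned), with
        sector None if no free sector lies at or after `start`."""
        scanned = 0
        for s in range(start, self.n):
            scanned += 1
            if not self.is_allocated(s):
                return s, scanned
        return None, scanned


bm = FreeBitmap(1000)
for s in range(10):
    bm.set_allocated(s)
bm.set_free(3)
print(len(bm.bits), bm.first_free(), bm.first_free(start=4))
```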
This question came up in our undergraduate operating systems course. Our initial speculation
was that bit maps would be inappropriate for the large disks that are becoming available.
As the following analysis will show, the bit map technique remains viable for large disks; in a
sense, its cost is independent of the size of the disk.
Assume for the moment that the free sectors are uniformly distributed across the disk, resulting
from files being freed in no particular order. Further assume that the system is in steady state
with $r$ sectors free on average out of a total $n$. The problem then reduces to drawing one of $r$ cards
from a deck of size $n$ [2]. Let $b_i$, $1 \le i \le n$, be the bit map, where there are $r$ zero bits indicating
free sectors. Then the probability of the first bit being zero is
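$$p_1 = \Pr[b_1 = 0] = \frac{r}{n}.$$
Similarly,
$$p_2 = \Pr[b_2 = 0 \wedge b_1 = 1] = \left(1 - \frac{r}{n}\right)\frac{r}{n-1}$$
and
$$p_3 = \Pr[b_3 = 0 \wedge b_1 = 1 \wedge b_2 = 1] = \left(1 - \frac{r}{n}\right)\left(1 - \frac{r}{n-1}\right)\frac{r}{n-2}.$$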
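The card-drawing argument can be checked directly for a small disk. This sketch (Python; the function names are hypothetical) enumerates every placement of $r$ zero bits among $n$ positions, tabulates where the first zero falls, and compares the result with the sequential product used for $p_1$, $p_2$ and $p_3$.

```python
from fractions import Fraction
from itertools import combinations


def first_zero_distribution(n: int, r: int) -> dict:
    """Exact distribution of the index k of the first zero bit, found by
    enumerating every placement of r zero bits among n positions (1-indexed).
    Only practical for small n."""
    counts = {}
    total = 0
    for zeros in combinations(range(1, n + 1), r):
        k = zeros[0]  # combinations are generated in sorted order
        counts[k] = counts.get(k, 0) + 1
        total += 1
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


def p_sequential(n: int, r: int, k: int) -> Fraction:
    """p_k built the same way as p_1, p_2, p_3: the first k-1 draws hit
    allocated sectors, then the k-th draw hits a free one."""
    prob = Fraction(1)
    for i in range(k - 1):
        prob *= 1 - Fraction(r, n - i)
    return prob * Fraction(r, n - k + 1)


n, r = 8, 3
dist = first_zero_distribution(n, r)
for k in (1, 2, 3):
    print(k, dist[k], p_sequential(n, r, k))
```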