Virtual Memory

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 5

6 Virtual Memory, Protection, and Paging

In a modern operating system such as Linux or Windows, it is very common to have


several different programs running concurrently in memory. This presents several
problems. First, how do you keep the programs from interfering with one another?
Second, if one program expects to load into memory at address $1000 and a second
program also expects to load into memory at address $1000, how can you load and
execute both programs at the same time? One last question we might ask is what happens
if our computer has 64 megabytes of memory and we decide to load and execute three
different applications, two of which require 32 megabytes and one that requires 16
megabytes (not to mention the memory the operating system requires for its own
purposes)? The answer to all these questions lies in the virtual memory subsystem the
80x86 processors support1.
Virtual memory on the 80x86 gives each process its own 32-bit address space2. This
means that address $1000 in one program is physically different than address $1000 in a
separate program. The 80x86 achieves this sleight of hand by using paging to remap
virtual addresses within one program to different physical addresses in memory. A virtual
address in the memory address that the program uses. A physical address is the bit pattern
than actually appears on the CPU's address bus. The two don't have to be the same (and
usually, they aren't). For example, program #1's virtual address $1000 might actually
correspond to physical address $215000 while program #2's virtual address $1000 might
correspond to physical memory address $300000. How can the CPU do this? Easy, by
using paging.
The concept behind paging is quite simple. First, you break up memory into blocks of
bytes called pages. A page in main memory is comparable to a cache line in a cache
subsystem, although pages are usually much larger than cache lines. For example, the
80x86 CPUs use a page size of 4,096 bytes.
After breaking up memory into pages, you use a lookup table to translate the H.O. bits of
a virtual address to select a page; you use the L.O. bits of the virtual address as an index
into the page. For example, with a 4,096-byte page, you'd use the L.O. 12 bits of the
virtual address as the offset within the page in physical memory. The upper 20 bits of the
address you would use as an index into a lookup table that returns the actual upper 20 bits
of the physical address (see Figure 6.5).
Figure 6.5 Translating a Virtual Address to a Physical Address
Of course, a 20-bit index into the page table would require over one million entries in the
page table. If each entry is 32 bits (20 bits for the offset plus 12 bits for other purposes),
then the page table would be four megabytes long. This would be larger than most of the
programs that would run in memory! However, using what is known as a multi-level page
table, it is very easy to create a page table that is only 8 kilobytes long for most small
programs. The details are unimportant here, just rest assured that you don't need a four
megabyte page table unless your program consumes the entire four gigabyte address
space.
If you study Figure 6.5 for a few moments, you'll probably discover one problem with
using a page table - it requires two memory accesses in order to access an address in
memory: one access to fetch a value from the page table and one access to read or write
the desired memory location. To prevent cluttering the data (or instruction) cache with
page table entries (thus increasing the number of cache misses), the page table uses its
own cache known as the Translation Lookaside Buffer, or TLB. This cache typically has
32 entries on a Pentium family processor. This provides a sufficient lookup capability to
handle 128 kilobytes of memory (32 pages) without a miss. Since a program typically
works with less data than this at any given time, most page table accesses come from the
cache rather than main memory.
As noted, each entry in the page table is 32 bits even though the system really only needs
20 bits to remap the addresses. Intel uses some of the remaining 12 bits to provide some
memory protection information. For example, one bit marks whether a page is read/write
or read-only. Another bit determines if you can execute code on that page. Some bits
determine if the application can access that page or if only the operating system can do
so. Some bits determine if the page is "dirty" (that is, if the CPU has written to the page)
and whether the CPU has accessed the page recently (these bits have the same meaning
as for cache lines). Another bit determines whether the page is actually present in
physical memory or if it's stored on secondary storage somewhere. Note that your
applications do not have access to the page table, and therefore they cannot modify these
bits. However, Windows does provide some functions you can call if you want to change
certain bits in the page table (e.g., Windows will allow you to set a page to read-only if
you want to do so). Linux users also have some memory mapping functions they can call
to play around with the access bits.
Beyond remapping memory so multiple programs can coexist in memory even though
they access the same virtual addresses, paging also provides a mechanism whereby the
operating system can move infrequently used pages to secondary storage (i.e., a disk
drive). Just as locality of reference applies to cache lines, it applies to pages in memory as
well. At any one given time a program will only access a small percentage of the pages in
memory that contain data and code (this set of pages is known as the working set). While
this working set of pages varies (slowly) over time, for a reasonable time period the
working set remains constant. Therefore, there is little need to have the remainder of the
program in memory consuming valuable physical memory that some other process could
be using. If the operating system can save those (currently unused) pages to disk, the
physical memory they consume would be available for other programs that need it.
Of course, the problem with moving data out of physical memory is that sooner or later
the program might actually need it. If you attempt to access a page of memory and the
page table bit tells the MMU (memory management unit) that this page is not present in
physical memory, then the CPU interrupts the program and passes control to the
operating system. The operating system analyzes the memory access request and reads
the corresponding page of data from the disk drive to some available page in memory.
The process is nearly identical to that used by a fully associative cache subsystem except,
of course, accessing the disk is much slower than main memory. In fact, you can think of
main memory as a fully associative write-back cache with 4,096 byte cache lines that
caches the data on the disk drive. Placement and replacement policies and other issues are
very similar to those we've discussed for caches. Discussing how the virtual memory
subsystem works beyond equating it to a cache is will beyond the scope of this text. If
you're interested, any decent text on operating system design will explain how a virtual
memory subsystem swaps pages between main memory and the disk. Our main goal here
is to realize that this process takes place in operating systems like Linux or Windows and
that accessing the disk is very slow.
One important issue resulting from the fact that each program as a separate page table
and the programs themselves don't have access to the page table is that programs cannot
interfere with the operation of other programs by overwriting those other program's data
(assuming, of course, that the operating system is properly written). Further, if your
program crashes by overwriting itself, it cannot crash other programs at the same time.
This is a big benefit of a paging memory system.
Note that if two programs want to cooperate and share data, they can do so. All they've
got to do is to tell the operating system that they want to share some blocks of memory.
The operating system will map their corresponding virtual addresses (of the shared
memory area) to the same physical addresses in memory. Under Windows, you can
achieve this use memory mapped files; see the operating system documentation for more
details. Linux also supports memory mapped files as well as some special shared memory
operations; again, see the OS documentation for more details.

6.7 Thrashing
Thrashing is a degenerate case that occurs when there is insufficient memory at one level
in the memory hierarchy to properly contain the working set required by the upper levels
of the memory hierarchy. This can result in the overall performance of the system
dropping to the speed of a lower level in the memory hierarchy. Therefore, thrashing can
quickly reduce the performance of the system to the speed of main memory or, worse yet,
the speed of the disk drive.
There are two primary causes of thrashing: (1) insufficient memory at a given level in the
memory hierarchy, and (2) the program does not exhibit locality of reference. If there is
insufficient memory to hold a working set of pages or cache lines, then the memory
system is constantly replacing one block (cache line or page) with another. As a result,
the system winds up operating at the speed of the slower memory in the hierarchy. A
common example occurs with virtual memory. A user may have several applications
running at the same time and the sum total of these programs' working sets is greater than
all of physical memory available to the program. As a result, as the operating system
switches between the applications it has to copy each application's data to and from disk
and it may also have to copy the code from disk to memory. Since a context switch
between programs is often much faster than retrieving data from the disk, this slows the
programs down by a tremendous factor since thrashing slows the context switch down to
the speed of swapping the applications to and from disk.
If the program does not exhibit locality of reference and the lower memory subsystems
are not fully associative, then thrashing can occur even if there is free memory at the
current level in the memory hierarchy. For example, suppose an eight kilobyte L1
caching system uses a direct-mapped cache with 16-byte cache lines (i.e., 512 cache
lines). If a program references data objects 8K apart on each access then the system will
have to replace the same line in the cache over and over again with each access. This
occurs even though the other 511 cache lines are currently unused.
If insufficient memory is the cause of thrashing, an easy solution is to add more memory
(if possible, it is rather hard to add more L1 cache when the cache is on the same chip as
the processor). Another alternative is to run fewer processes concurrently or modify the
program so that it references less memory over a given time period. If lack of locality of
reference is causing the problem, then you should restructure your program and its data
structures to make references local to one another.
1
Actually, virtual memory is really only supported by the 80386 and later processors.
We'll ignore this issue here since most people have an 80386 or later processor.
2
Strictly speaking, you actually get a 36-bit address space on Pentium Pro and later
processors, but Windows and Linux limits you to 32-bits so we'll use that limitation here.

You might also like