Virtual Machine Monitors
B.1 Introduction
Years ago, IBM sold expensive mainframes to large organizations, and a problem arose: what if the organization wanted to run different operating systems on the machine at the same time? Some applications had been developed on one OS, and some on others, and thus the problem. As a solution, IBM introduced yet another level of indirection in the form of a virtual machine monitor (VMM) (also called a hypervisor) [G74].

Specifically, the monitor sits between one or more operating systems and the hardware and gives the illusion to each running OS that it controls the machine. Behind the scenes, however, the monitor actually is in control of the hardware, and must multiplex running OSes across the physical resources of the machine. Indeed, the VMM serves as an operating system for operating systems, but at a much lower level; the OS must still think it is interacting with the physical hardware. Thus, transparency is a major goal of VMMs.
Thus, we find ourselves in a funny position: the OS has thus far served as the master illusionist, tricking unsuspecting applications into thinking they have their own private CPU and a large virtual memory, while secretly switching between applications and sharing memory as well. Now, we have to do it again, but this time underneath the OS, who is used to being in charge. How can the VMM create this illusion for each OS running on top of it?
THE CRUX: HOW TO VIRTUALIZE THE MACHINE UNDERNEATH THE OS
The virtual machine monitor must transparently virtualize the machine underneath the OS; what are the techniques required to do so?
Footnote: Just to make things confusing, the Intel folks use the term "interrupt" for what almost any sane person would call a trap instruction. As Patterson said about the Intel instruction set: "It's an ISA only a mother could love." But actually, we kind of like it, and we're not its mother.
As you can see from the figures, a lot more has to take place when
virtualization is going on. Certainly, because of the extra jumping around,
virtualization might indeed slow down system calls and thus could hurt
performance.
You might also notice that we have one remaining question: what
mode should the OS run in? It can’t run in kernel mode, because then
it would have unrestricted access to the hardware. Thus, it must run in
some less privileged mode than before, be able to access its own data
structures, and simultaneously prevent access to its data structures from
user processes.
In the Disco work, Rosenblum and colleagues handled this problem quite neatly by taking advantage of a special mode provided by the MIPS hardware known as supervisor mode. When running in this mode, one still doesn't have access to privileged instructions, but one can access a little more memory than when in user mode; the OS can use this extra memory for its data structures and all is well. On hardware that doesn't have such a mode, one has to run the OS in user mode and use memory protection (page tables and TLBs) to protect OS data structures appropriately. In other words, when switching into the OS, the monitor would have to make the memory of the OS data structures available to the OS via page-table protections; when switching back to the running application, the ability to read and write the kernel would have to be removed.
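To make this concrete, here is a minimal sketch of the idea in C. It is not code from Disco or any real VMM; the flat page-table layout, the protection bits, and the routine names are all assumptions made purely for illustration.

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096
    #define NPAGES    1024                /* toy page-table size (assumption) */

    typedef uint32_t pte_t;
    #define PROT_MASK   0x3
    #define PTE_USER_RW 0x3               /* hypothetical protection bits */
    #define PTE_NONE    0x0

    static pte_t guest_pt[NPAGES];        /* toy per-guest page table */

    /* Set protection bits on all pages covering [va, va+len). */
    static void set_prot(uintptr_t va, size_t len, pte_t prot) {
        for (uintptr_t p = va / PAGE_SIZE; p <= (va + len - 1) / PAGE_SIZE; p++)
            guest_pt[p] = (guest_pt[p] & ~(pte_t)PROT_MASK) | prot;
        /* A real VMM would also flush the affected TLB entries here. */
    }

    /* Before the VMM returns control to the (deprivileged) OS:
       expose the OS's data structures. */
    void enter_guest_os(uintptr_t os_data_va, size_t os_data_len) {
        set_prot(os_data_va, os_data_len, PTE_USER_RW);
    }

    /* Before resuming one of that OS's user processes:
       revoke access so user code cannot touch kernel state. */
    void enter_guest_user(uintptr_t os_data_va, size_t os_data_len) {
        set_prot(os_data_va, os_data_len, PTE_NONE);
    }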
On top of the VMM, there would be some number V of operating systems running (V likely greater than one), and thus V VMM page tables; further, on top of each running operating system OSi, there would be a number of processes Pi running (Pi likely in the tens or hundreds), and hence Pi (per-process) page tables within OSi.
To understand how this works a little better, let's recall how address translation works in a modern paged system. Specifically, let's discuss what happens on a system with a software-managed TLB during address translation. Assume a user process generates an address (for an instruction fetch or an explicit load or store); by definition, the process generates a virtual address, as its address space has been virtualized by the OS. As you know by now, it is the role of the OS, with help from the hardware, to turn this into a physical address and thus be able to fetch the desired contents from physical memory.
Assume we have a 32-bit virtual address space and a 4-KB page size. Thus, our 32-bit address is chopped into two parts: a 20-bit virtual page number (VPN), and a 12-bit offset. The role of the OS, with help from the hardware TLB, is to translate the VPN into a valid physical page frame number (PFN) and thus produce a fully-formed physical address which can be sent to physical memory to fetch the proper data. In the common case, we expect the TLB to handle the translation in hardware, thus making the translation fast. When a TLB miss occurs (at least, on a system with a software-managed TLB), the OS must get involved to service the miss, as depicted here in Table B.4.
As you can see, a TLB miss causes a trap into the OS, which handles the fault by looking up the VPN in the page table and installing the translation in the TLB.
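As a rough illustration, a software-managed TLB miss handler might look something like the following C sketch; the linear page-table layout and the tlb_insert() routine are hypothetical simplifications, not a real OS or hardware interface.

    #include <stdint.h>

    #define PAGE_SHIFT 12                    /* 4-KB pages */
    #define VPN_BITS   20
    #define NPTE       (1 << VPN_BITS)

    typedef struct { uint32_t pfn; int valid; } pte_t;
    static pte_t page_table[NPTE];           /* toy linear page table */

    /* Hypothetical hardware interface: install a VPN-to-PFN translation. */
    void tlb_insert(uint32_t vpn, uint32_t pfn);

    /* Runs (via trap) on a TLB miss, given the faulting virtual address. */
    void tlb_miss_handler(uint32_t vaddr) {
        uint32_t vpn = vaddr >> PAGE_SHIFT;  /* top 20 bits: virtual page number */
        if (!page_table[vpn].valid) {
            /* Not mapped: page fault (allocate or swap in), or kill the process. */
            return;
        }
        tlb_insert(vpn, page_table[vpn].pfn);
        /* On return from the trap, the instruction retries and now hits. */
    }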
With a virtual machine monitor underneath the OS, however, things again get a little more interesting. Let's examine the flow of a TLB miss again (see Table B.5 for a summary). When a process makes a virtual memory reference and misses in the TLB, it is not the OS TLB miss handler that runs; rather, it is the VMM TLB miss handler, as the VMM is the true privileged owner of the machine. However, in the normal case, the VMM TLB handler doesn't know how to handle the TLB miss, so it immediately jumps into the OS TLB miss handler; the VMM knows the location of this handler because the OS, during "boot", tried to install its own trap handlers. The OS TLB miss handler then runs, does a page table lookup for the VPN in question, and tries to install the VPN-to-PFN mapping in the TLB. However, doing so is a privileged operation, and thus causes another trap into the VMM (the VMM gets notified when any non-privileged code tries to do something that is privileged, of course). At this point, the VMM plays its trick: instead of installing the OS's VPN-to-PFN mapping, the VMM installs its desired VPN-to-MFN mapping. After doing so, the system eventually gets back to the user-level code, which retries the instruction, and results in a TLB hit, fetching the data from the machine frame where the data resides.
This set of actions also hints at how a VMM must manage the virtualization of physical memory for each running OS; just like the OS has a page table for each process, the VMM must track the physical-to-machine mappings for each virtual machine it is running. These per-machine page tables need to be consulted in the VMM TLB miss handler in order to determine which machine page a particular "physical" page maps to, and even, for example, if it is present in machine memory at the current time (i.e., the VMM could have swapped it to disk).
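A sketch of the VMM side might look like the following; the pmap structure, its fields, and the tlb_insert()/vmm_swap_in() routines are hypothetical, meant only to show the extra physical-to-machine level of indirection.

    #include <stdint.h>

    #define NPFN (1 << 18)                   /* toy: 1 GB of guest "physical" memory */

    /* Per-VM map from guest physical frame numbers (PFNs) to machine
       frame numbers (MFNs); one of these exists for each running OS. */
    typedef struct {
        uint32_t mfn[NPFN];
        int      present[NPFN];              /* 0 if the VMM swapped the page to disk */
    } pmap_t;

    /* Hypothetical interfaces. */
    void tlb_insert(uint32_t vpn, uint32_t mfn);   /* the real, privileged TLB write */
    void vmm_swap_in(pmap_t *pmap, uint32_t pfn);  /* bring a machine page back */

    /* The deprivileged OS attempted a privileged TLB write of VPN->PFN;
       that attempt trapped into the VMM, which performs the real install. */
    void vmm_handle_guest_tlb_write(pmap_t *pmap, uint32_t vpn, uint32_t pfn) {
        if (!pmap->present[pfn])
            vmm_swap_in(pmap, pfn);
        tlb_insert(vpn, pmap->mfn[pfn]);     /* install VPN->MFN, not VPN->PFN */
    }

Note that the OS's own page tables still map VPNs to what it believes are physical frames; the pmap supplies the second level of mapping that the OS never sees.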
Consider, for example, the idle loop an OS runs when it has nothing else to do:

    while (1)
        ; // the idle loop

It makes sense to spin like this if the OS is in charge of the entire machine and thus knows there is nothing else that needs to run. However, when a
VMM is running underneath two different OSes, one in the idle loop and
one usefully running user processes, it would be useful for the VMM to
know that one OS is idle so it can give more CPU time to the OS doing
useful work.
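One way to close this particular information gap is for the OS to tell the VMM explicitly that it is idle, as in the sketch below; the hypercall_yield() call is a hypothetical interface (this is the para-virtualized approach discussed later in this section), not something an unmodified OS or hardware provides.

    /* A para-virtualized idle loop (sketch). Instead of spinning, the OS
       tells the VMM it has no work via a hypothetical hypercall_yield();
       the name and mechanism are assumptions, not a real interface. */
    void hypercall_yield(void);   /* trap into the VMM and give up the CPU */

    void idle_loop(void) {
        while (1)
            hypercall_yield();    /* the VMM can now run another OS's work */
    }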
Another example arises with demand zeroing of pages. Most operating systems zero a physical frame before mapping it into a process's address space. The reason for doing so is simple: security. If the OS gave one process a page that another had been using without zeroing it, an information leak across processes could occur, thus potentially leaking sensitive information. Unfortunately, the VMM must zero pages that it gives to each OS, for the same reason, and thus many times a page will be zeroed twice, once by the VMM when assigning it to an OS, and once by the OS when assigning it to a process. The authors of Disco had no great solution to this problem: they simply changed the OS (IRIX) to not zero pages that it knew had been zeroed by the underlying VMM [B+97].
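The sketch below shows the general shape of such a change; it is not the actual IRIX modification, and the per-frame vmm_prezeroed flag is a hypothetical channel through which the VMM could tell the OS that a frame is already zeroed.

    #include <string.h>

    #define PAGE_SIZE 4096
    #define NPFRAMES  1024

    /* Hypothetical per-frame flag, set by the VMM (e.g., via a shared page
       or a para-virtual interface) when it hands the OS a zeroed frame. */
    static int vmm_prezeroed[NPFRAMES];

    static char frames[NPFRAMES][PAGE_SIZE];    /* toy stand-in for physical memory */

    /* OS-side allocation path: zero the frame only if the VMM has not already. */
    void prepare_frame_for_process(int pfn) {
        if (!vmm_prezeroed[pfn])
            memset(frames[pfn], 0, PAGE_SIZE);
        vmm_prezeroed[pfn] = 0;                 /* the flag is consumed once used */
    }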
There are many other problems similar to the ones described here. One solution is for the VMM to use inference (a form of implicit information) to overcome the problem. For example, a VMM can detect the idle loop by noticing that the OS switched to low-power mode. A different approach, seen in para-virtualized systems, requires the OS to be changed. This more explicit approach, while harder to deploy, can be quite effective.
B.6 Summary
Virtualization is in a renaissance. For a multitude of reasons, users and administrators want to run multiple OSes on the same machine at the same time. The key is that VMMs generally provide this service transparently; the OS above has little clue that it is not actually controlling the hardware of the machine. The key method that VMMs use to do so is to extend the notion of limited direct execution; by setting up the hardware to enable the VMM to interpose on key events (such as traps), the VMM can completely control how machine resources are allocated while preserving the illusion that the OS requires.

TIP: USE IMPLICIT INFORMATION
Implicit information can be a powerful tool in layered systems where it is hard to change the interfaces between systems, but more information about a different layer of the system is needed. For example, a block-based disk device might like to know more about how a file system above it is using it; similarly, an application might want to know what pages are currently in the file-system page cache, but the OS provides no API to access this information. In both these cases, researchers have developed powerful inferencing techniques to gather the needed information implicitly, without requiring an explicit interface between layers [AD+01, S+03]. Such techniques are quite useful in a virtual machine monitor, which would like to learn more about the OSes running above it without requiring an explicit API between the two layers.
You might have noticed some similarities between what the OS does for processes and what the VMM does for OSes. They both virtualize the hardware after all, and hence do some of the same things. However, there is one key difference: with OS virtualization, a number of new abstractions and nice interfaces are provided; with VMM-level virtualization, the abstraction is identical to the hardware (and thus not very nice). While both the OS and VMM virtualize hardware, they do so by providing completely different interfaces; VMMs, unlike the OS, are not particularly meant to make the hardware easier to use.
There are many other topics to study if you wish to learn more about virtualization. For example, we didn't even discuss what happens with I/O, a topic that has its own new and interesting issues when it comes to virtualized platforms. We also didn't discuss how virtualization works when running "on the side" with your OS in what is sometimes called a "hosted" configuration. Read more about both of these topics if you're interested [SVL01]. We also didn't discuss what happens when a collection of operating systems running on a VMM uses too much memory.
Finally, hardware support has changed how platforms support virtualization. Companies like Intel and AMD now include direct support for an extra level of virtualization, thus obviating many of the software techniques in this chapter. Perhaps, in a chapter yet-to-be-written, we will discuss these mechanisms in more detail.