L3 Virtualization
Lecture 3
CS451 Cloud Computing
Contents
Virtualization.
Layering and virtualization.
Virtual machine monitor.
Virtual machine.
x86 support for virtualization.
Full and paravirtualization.
Xen.
Resources:
Book, and
VMware white paper: “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”
https://www.vmware.com/techpapers/2007/understanding-full-virtualization-paravirtualizat-1008.html
Three fundamental abstractions are necessary to describe the operation of a computing system:
(1) interpreters/processors, (2) memory, (3) communication links.
As the scale of a system and the number of its users grow, it becomes very challenging to manage its resources (see the three abstractions above).
Motivation (cont’d)
“Virtualization, in computing, refers to the act of creating a virtual (rather than actual) version of something, including but not limited to a virtual computer hardware platform, operating system (OS), storage device, or computer network resources.” (from Wikipedia)
Virtualization abstracts the underlying resources, simplifies their use, isolates users from one another, and supports replication, which increases the elasticity of a system.
Motivation
Cloud resource virtualization is important for:
Performance isolation, as we can dynamically assign and account for resources across different applications.
Virtualization
Virtualization simulates the interface to a physical object by:
Multiplexing
creates multiple virtual objects from one instance of a physical object (many virtual objects to one physical object). Example: a processor is multiplexed among a number of processes or threads.
Aggregation
creates one virtual object from multiple physical objects (one virtual object to many physical objects). Example: a number of physical disks are aggregated into a RAID disk.
Emulation
constructs a virtual object of a certain type from a different type of physical object. Example: a physical disk emulates a Random Access Memory (RAM).
Multiplexing and emulation
Examples: virtual memory with paging multiplexes real memory and disk; a virtual address emulates a real address.
Layering and Virtualization
Layering – a common approach to managing system complexity:
Simplifies the description of the subsystems; each subsystem is abstracted through its interfaces with the other subsystems
Minimises the interactions among the subsystems of a complex system
With layering we are able to design, implement, and modify the individual subsystems independently
Layering in a computer system:
Hardware
Software
Operating system
Libraries
Applications
Layering and Interfaces
[Figure: the Applications, Libraries, Operating System, and Hardware layers and the interfaces between them: API, ABI (system calls), and ISA (System ISA and User ISA). A1, A2, and A3 mark the three paths named in the caption.]
Application Programming Interface (API), Application Binary Interface (ABI), and Instruction Set Architecture (ISA). An application uses library functions (A1), makes system calls (A2), and executes machine instructions (A3) (from book).
Instruction Set Architecture (ISA) – at the boundary between
hardware and software.
Binaries created by a compiler for a specific ISA and a specific operating system are not portable.
[Figure: HLL code and its translation paths; surviving labels: Intermediate, Portable, Loader, and two VM compiler/interpreter boxes.]
Virtual Machine Monitor (VMM)
A virtual machine monitor (VMM, or hypervisor) partitions the resources of a computer system into one or more virtual machines (VMs). It allows several operating systems to run concurrently on a single hardware platform.
A VM is an execution environment that runs an OS.
VM – an isolated environment that appears to be a whole computer, but actually only has access to a portion of the computer resources.
VMM Virtualizes the CPU and the Memory
A VMM (also called a hypervisor) does the following:
Traps the privileged instructions executed by a guest OS and enforces the correctness and safety of the operation
Traps interrupts and dispatches them to the individual guest operating systems
Controls the virtual memory management
Maintains a shadow page table for each guest OS and replicates any modification made by the guest OS in its own shadow page table. This shadow page table points to the actual page frame and it is used by the Memory Management Unit (MMU) for dynamic address translation.
Monitors the system performance and takes corrective actions to avoid performance degradation. For example, the VMM may swap out a VM to avoid thrashing.
Type 1 and 2 Hypervisors
[Figure: Type 1 Hypervisor and Type 2 Hypervisor]
Taxonomy of VMMs:
1. Type 1 Hypervisor (bare metal, native): supports multiple virtual machines and runs directly on the hardware (e.g., VMware ESX, Xen, Denali)
2. Type 2 Hypervisor (hosted): runs under a host operating system (e.g., User-mode Linux)
The run-time behavior of an application is affected by other
applications running concurrently on the same platform and
competing for CPU cycles, cache, main memory, disk and network
access. Thus, it is difficult to predict the completion time!
Conditions for Efficient Virtualization (from Popek and Goldberg):
A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running on an equivalent machine directly.
The VMM should be in complete control of the virtualized resources.
A statistically significant fraction of machine instructions must be executed without the intervention of the VMM. (Why?)
Dual-mode Operation (recap)
Dual-mode operation allows the OS to protect itself and other system components:
User mode and kernel mode
Mode bit provided by hardware
Ability to distinguish when the system is running user or kernel code
Some instructions are privileged, only executable in kernel mode
User-mode vs Kernel-mode (recap)
Kernel code (in particular, interrupt handlers) runs in kernel mode
the hardware allows all machine instructions to be executed and allows unrestricted access to memory and I/O ports
Everything else runs in user mode
The OS relies very heavily on this hardware-enforced protection mechanism
Challenges of x86 CPU Virtualization
Four layers of privilege: execution rings 0 to 3
User applications run in ring 3
OS runs in ring 0
In which ring should the VMM run?
In ring 0: then it has the same privileges as the OS – wrong
In rings 1, 2, or 3: then the OS has higher privileges – wrong
Move the OS to ring 1 and the VMM to ring 0 – OK
Techniques for Virtualizing CPU on x86
Full virtualization with binary translation
OS-assisted virtualization or paravirtualization
Hardware-assisted virtualization
Techniques for Virtualizing CPU on x86 – Full Virtualization
Full virtualization
a guest OS can run unchanged under the VMM as if it were running directly on the hardware platform. Each VM runs on an exact copy of the actual hardware.
Binary translation rewrites parts of the code on the fly to replace sensitive but not privileged instructions with safe code to emulate the original instruction
“The hypervisor translates all operating system instructions on the fly and caches the results for future use, while user level instructions run unmodified at native speed.” (from VMware paper)
Examples:
VMware, Microsoft Virtual Server
Advantages:
No hardware assistance
No modifications of the guest OS
Isolation, security
Disadvantages:
Speed of execution
Techniques for Virtualizing CPU on x86 – Para-virtualization
Para-virtualization
“involves modifying the OS kernel to replace non-virtualizable instructions with hypercalls that communicate directly with the virtualization layer hypervisor. The hypervisor also provides hypercall interfaces for other critical kernel operations such as memory management, interrupt handling and time keeping.” (from VMware paper)
Advantage:
faster execution, lower virtualization overhead
Disadvantage:
poor portability
Examples:
Xen, Denali
Full Virtualization and Para-virtualization
[Figure: two side-by-side stacks, each showing a guest OS with its hardware abstraction layer above a hypervisor, above the hardware.]
Techniques for Virtualizing CPU on x86 – Hardware Assisted Virtualization
Hardware assisted virtualization
“a new CPU execution mode feature that allows the VMM to run in a new root mode below ring 0. As depicted in Figure 7, privileged and sensitive calls are set to automatically trap to the hypervisor, removing the need for either binary translation or para-virtualization” (from VMware paper)
Advantage:
even faster execution
Examples:
Intel VT-x, Xen 3.x
VT-x, a Major Architectural Enhancement
In 2005 Intel released two Pentium 4 models supporting VT-x.
VT-x supports two modes of operation (Figure (a)):
VMX root - for VMM operations.
VMX non-root - supports a VM.
It also adds a new data structure called the Virtual Machine Control Structure (VMCS), which includes host-state and guest-state areas (Figure (b)).
VM entry
the processor state is loaded from the guest-state area of the VM scheduled to run; then control is transferred from the VMM to the VM.
VM exit
saves the processor state in the guest-state area of the running VM; then it loads the processor state from the host-state area, and finally transfers control to the VMM.
Xen - a VMM based on Paravirtualization
The goal was to design a VMM capable of scaling to about 100 VMs running standard applications and services without any modifications to the Application Binary Interface (ABI).
Linux, Minix, NetBSD, FreeBSD and others can operate as paravirtualized Xen guest OSes running on x86, x86-64, Itanium, and ARM architectures.
Xen domain
ensemble of address spaces hosting a guest OS and applications running under the guest OS. Runs on a virtual CPU.
Dom0 - dedicated to execution of Xen control functions and privileged instructions.
DomU - a user domain.
[Figure: Xen architecture: a management OS and three applications run above Xen, which provides the Domain0 control interface, a virtual x86 CPU, virtual physical memory, a virtual network, and virtual block devices on top of the x86 hardware.]
Dom0 Components
XenStore - a Dom0 process.
Supports a system-wide registry and naming service.
Implemented as a hierarchical key-value storage.
A watch function informs listeners of changes to the keys in storage they have subscribed to.
Communicates with guest VMs via shared memory using Dom0 privileges.
Toolstack - responsible for creating, destroying, and managing the resources and privileges of VMs.
To create a new VM, a user provides a configuration file describing memory and CPU allocations and device configurations.
Toolstack parses this file and writes this information in XenStore.
Takes advantage of Dom0 privileges to map guest memory, to load a kernel and virtual BIOS, and to set up initial communication channels with XenStore and with the virtual console when a new VM is created.
Strategies for virtual memory management, CPU multiplexing, and I/O devices
Xen Abstractions for Networking and I/O
Each domain has one or more Virtual Network Interfaces (VIFs) which support the functionality of a network interface card. A VIF is attached to a Virtual Firewall-Router (VFR).
Split drivers have a front-end in the DomU and the back-end in Dom0; the two communicate via a ring in shared memory.
Ring - a circular queue of descriptors allocated by a domain and accessible within Xen. Descriptors do not contain data; the data buffers are allocated out-of-band by the guest OS.
Two rings of buffer descriptors, one for packet sending and one for packet receiving, are supported.
To transmit a packet:
a guest OS enqueues a buffer descriptor to the send ring,
then Xen copies the descriptor and checks safety,
copies only the packet header, not the payload, and
executes the matching rules.
[Figure: (a) a driver domain (bridge, backend, network interface) and a guest domain (frontend) connected by an I/O channel and an event channel; (b) the response queue of a ring, with the response producer (shared pointer updated by Xen) and the response consumer (private pointer maintained by the guest OS).]
[Figure: a driver domain with a bridge and a guest domain: (a) the original architecture; (b) the optimized architecture.]
The Darker Side of Virtualization
In a layered structure, a defense mechanism at some layer can be disabled by malware running at a layer below it.
It is feasible to insert a rogue VMM, a Virtual-Machine Based Rootkit (VMBR), between the physical hardware and an operating system.
Rootkit - malware with privileged access to a system.
The Darker Side of Virtualization (cont’d)
The insertion of a Virtual-Machine Based Rootkit (VMBR) as the lowest layer of the software stack running on the physical hardware: (a) below an operating system; (b) below a legitimate virtual machine monitor. The VMBR enables a malicious OS to run surreptitiously and makes it invisible to the genuine or the guest OS and to the application.
[Figure: (a) application, operating system (OS), malicious OS, hardware; (b) application, guest OS, malicious OS, virtual machine monitor, hardware.]
Linux Containers
A Linux container is a Linux process (or processes) that is a virtual environment with its own process and network space (lightweight process virtualization).
Containers share portions of the host kernel
Containers use:
Namespaces
per-process isolation of OS resources (filesystem, network, and user IDs)
Cgroups
resource management and accounting per process
Source: opensource.com
Comparison of traditional virtualization and containers
Why do we want to run our application inside containers?
Container Advantages
● Lightweight footprint and minimal overhead,
● Portability across machines,
● Simplifies DevOps practices,
● Speeds up Continuous Integration,
● Empowers microservices architectures,
● Isolation.
Summary
Virtualization.
Layering and virtualization.
Virtual machine monitor.
Virtual machine.
x86 support for virtualization.
Xen.