Study of Firecracker MicroVM

Madhur Jain
Khoury College of Computer Sciences
Northeastern University
Boston, MA
[email protected]

Abstract—Firecracker is a virtualization technology that makes use of the Kernel-based Virtual Machine (KVM). Firecracker belongs to a new virtualization class named micro-virtual machines (MicroVMs). Using Firecracker, we can launch lightweight MicroVMs in non-virtualized environments in a fraction of a second, at the same time offering the security and workload isolation provided by traditional VMs and the resource efficiency that comes along with containers [1]. Firecracker aims to provide a slimmed-down MicroVM, comprising approximately 50K lines of code in Rust and a reduced attack surface for guest VMs. This report examines the internals of Firecracker and explains why Firecracker is the next big thing going forward in virtualization and cloud computing.

Index Terms—firecracker, microvm, Rust, VMM, QEMU, KVM

I. INTRODUCTION

Firecracker is a new open-source Virtual Machine Monitor (VMM) developed by AWS for serverless workloads. It is a virtualization technology that makes use of KVM, meaning it can only run on KVM-supported and enabled hosts with a corresponding Linux kernel v4.14+. With the recommended Linux kernel guest configuration, Firecracker claims to offer a memory overhead of less than 5MB per MicroVM, boot to application code within 125ms, and allow the creation of up to 150 MicroVMs per second. The number of Firecracker MicroVMs running simultaneously on a single host is limited only by the availability of hardware resources [1].
Firecracker provides security and resource isolation by running the Firecracker userspace process inside a jailer process. The jailer sets up system resources that require elevated permissions (e.g., cgroup, chroot), drops privileges, and then exec()s into the Firecracker binary, which then runs as an unprivileged process. Past this point, Firecracker can only access resources to which a privileged third party grants access (e.g., by copying a file into the chroot, or passing a file descriptor) [13].
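This sequence can be sketched in Rust with the standard library and the libc crate. The following is a minimal illustration, not the real jailer [13]: the chroot path, UID, GID, and binary path are hypothetical placeholders, cgroup, namespace, and mount setup are omitted, and the program must start as root for chroot to succeed.

use std::os::unix::process::CommandExt; // for exec()
use std::process::Command;

fn main() {
    // Confine the process to a prepared jail directory (hypothetical
    // path; it must already contain the firecracker binary and any
    // files the process may legitimately access).
    std::os::unix::fs::chroot("/srv/jail/fc-1").expect("chroot failed");
    std::env::set_current_dir("/").expect("chdir failed");

    // Drop elevated privileges before any untrusted code runs
    // (group first, then user; hypothetical IDs).
    unsafe {
        assert_eq!(libc::setgid(1000), 0, "setgid failed");
        assert_eq!(libc::setuid(1000), 0, "setuid failed");
    }

    // exec() into the Firecracker binary; on success this call never
    // returns, and Firecracker runs as an unprivileged process.
    let err = Command::new("/firecracker").exec();
    panic!("exec failed: {err}");
}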
Seccomp filters limit the system calls that the Firecracker process can use. There are three possible levels of seccomp filtering, configurable by passing a command-line argument to the jailer: 0 (disabled), 1 (whitelists a set of trusted system calls by their identifiers), and 2 (whitelists a set of trusted system calls with trusted parameter values), the latter being the most restrictive and the recommended one. The filters are loaded into the Firecracker process immediately before the execution of the untrusted guest code starts [13].
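Firecracker's actual filters are custom BPF programs, but the kernel mechanism underneath can be illustrated with seccomp's built-in strict mode, which whitelists only read, write, _exit, and sigreturn. A minimal sketch using the libc crate (levels 1 and 2 would instead install a BPF program, the latter also validating argument values):

fn main() {
    // Install the kernel's built-in strict seccomp policy: from here
    // on, any syscall other than read/write/_exit/sigreturn kills the
    // process with SIGKILL.
    let ret = unsafe {
        libc::prctl(libc::PR_SET_SECCOMP, libc::SECCOMP_MODE_STRICT as libc::c_ulong)
    };
    assert_eq!(ret, 0, "PR_SET_SECCOMP failed");

    // write(2) is still on the whitelist.
    let msg = b"sandboxed\n";
    unsafe { libc::write(1, msg.as_ptr().cast(), msg.len()) };

    // Returning from main would call exit_group(2), which strict mode
    // forbids, so leave via _exit(2) instead.
    unsafe { libc::_exit(0) }
}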
Firecracker was developed to handle serverless workloads and has been running in production for AWS Lambda and Fargate since 2018. Serverless workloads require isolation and security while, at the same time, retaining the container benefit of fast boot times. A lot of research has been done with respect to the cold start and warm start of VM instances for serverless workloads [17]. The idea behind developing Firecracker was to make use of the KVM module loaded in the Linux kernel and get rid of the legacy devices that other virtualization technologies, such as Xen and VMware, offer. This way, Firecracker can create a VMM with a smaller memory footprint and also provide improved performance.

Section II explains the high-level architecture of the Firecracker MicroVM, Section III dives deep into the boot sequence of Firecracker, Section IV describes the device model emulation provided, Section V discusses the scope for improvements, and Section VI presents the conclusions garnered through this study.

II. ARCHITECTURE OF FIRECRACKER

KVM is an enabler of the hardware extensions provided by vendors such as Intel and AMD, with their virtualization extensions VMX and SVM, respectively. These extensions allow KVM to directly execute guest code on the host CPU. Three sets of ioctls make up the KVM API and are issued to control the various aspects of a virtual machine. The three classes that the ioctls belong to are [7]:
• System ioctls: these query and set global attributes, which affect the whole KVM subsystem. In addition, a system ioctl is used to create virtual machines.
• VM ioctls: these query and set attributes that affect an entire virtual machine, for example, the memory layout. In addition, a VM ioctl is used to create virtual CPUs (vCPUs). VM ioctls are run from the same userspace process (address space) that was used to create the VM.
• vCPU ioctls: these query and set attributes that control the operation of a single virtual CPU. vCPU ioctls are run from the same thread that was used to create the vCPU.
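These three classes map directly onto the rust-vmm kvm-ioctls crate [12] that grew out of Firecracker. A minimal sketch, one ioctl from each class, with guest memory setup omitted:

use kvm_ioctls::Kvm;

fn main() {
    // System ioctl class: open /dev/kvm, query a global attribute,
    // and create a VM file descriptor.
    let kvm = Kvm::new().expect("failed to open /dev/kvm");
    println!("KVM API version: {}", kvm.get_api_version());
    let vm = kvm.create_vm().expect("KVM_CREATE_VM failed");

    // VM ioctl class: create one vCPU. Registering guest memory via
    // KVM_SET_USER_MEMORY_REGION would also happen at this level.
    let vcpu = vm.create_vcpu(0).expect("KVM_CREATE_VCPU failed");

    // vCPU ioctl class: KVM_RUN enters the guest. It is expected to
    // fail here, since no guest memory or code has been loaded.
    let _ = vcpu.run();
}

Everything the VMM does with KVM is layered over these three handles: the Kvm, VmFd, and VcpuFd types wrap the system, VM, and vCPU ioctl classes, respectively.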
QEMU is an open-source machine emulator and a virtualization technology for running KVM-enabled virtual machines. Similar to QEMU, Firecracker uses a multithreaded, event-driven architecture. Each Firecracker process runs one and only one MicroVM. The process consists of three main threads: API, VMM, and vCPU. The API server thread is an HTTP server that is created to accept requests and performs actions on those requests.

Fig. 1. Firecracker Design - Threads

The API server thread structure comprises a socket connection to listen for requests on the port, an epoll fd to listen for events on the socket port, and a hashmap of connections. The hashmap consists of token and connection pairs corresponding to the file descriptor of the stream, in this case, the socket.

Traditionally, QEMU has made use of the select or poll system calls to maintain an event loop of file descriptors on which to listen for new events [16]. The select and poll system calls require a list of all open file descriptors to be maintained in the VMM structure, which is then traversed to determine which fds have new events. This takes O(N) time, where N is the number of file descriptors being listened to for new events [14].
Firecracker takes the epoll approach, where the host kernel maintains the list of file descriptors for the VM process and notifies the VMM whenever a new event occurs on any of those file descriptors. This is an edge-triggered mechanism ("push"), whereas select/poll is a level-triggered mechanism ("pull"). The epoll fd structure created by Firecracker has the close-on-exec flag set, which means that if a process forks the Firecracker process, the file descriptors would not be shared.
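The pattern can be sketched with raw epoll calls from the libc crate. A TCP listener stands in for Firecracker's API socket (the real one is a Unix domain socket), and everything else is as the text describes: a close-on-exec epoll fd, an edge-triggered registration, and a wait whose cost scales with ready events rather than registered fds.

use std::net::TcpListener;
use std::os::unix::io::AsRawFd;

use libc::{epoll_create1, epoll_ctl, epoll_event, epoll_wait,
           EPOLLET, EPOLLIN, EPOLL_CLOEXEC, EPOLL_CTL_ADD};

fn main() {
    // Stand-in for the API socket.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let sock_fd = listener.as_raw_fd();

    unsafe {
        // Like Firecracker's epoll fd: close-on-exec, so it is not
        // inherited across an exec().
        let epfd = epoll_create1(EPOLL_CLOEXEC);
        assert!(epfd >= 0);

        // Register the socket for edge-triggered readability events;
        // the u64 field is the token used to look the connection up
        // in the hashmap described above.
        let mut ev = epoll_event {
            events: (EPOLLIN | EPOLLET) as u32,
            u64: sock_fd as u64,
        };
        assert_eq!(epoll_ctl(epfd, EPOLL_CTL_ADD, sock_fd, &mut ev), 0);

        // Blocks until someone connects to the listener; the kernel
        // returns only the ready fds.
        let mut ready = [epoll_event { events: 0, u64: 0 }; 16];
        let n = epoll_wait(epfd, ready.as_mut_ptr(), 16, -1);
        for e in &ready[..n.max(0) as usize] {
            let (events, token) = (e.events, e.u64);
            println!("events {events:#x} on token {token}");
        }
    }
}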
The API server exposes a number of requests as REST APIs, which can be used to configure the MicroVM. Once the MicroVM has been started, i.e., once it receives the "InstanceStart" command, the API server thread simply blocks on the epoll file descriptor until new events come in. Firecracker creates a channel (see Rust channels) to enable communication between the API server thread and the VMM thread; Rust channels are comparable to Unix pipes.
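The thread hand-off can be sketched with a standard Rust mpsc channel. VmmAction and its variants here are made-up stand-ins for Firecracker's real parsed-request types:

use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for Firecracker's parsed API requests.
#[derive(Debug)]
enum VmmAction {
    ConfigureBootSource(String),
    InstanceStart,
}

fn main() {
    let (tx, rx) = mpsc::channel::<VmmAction>();

    // API server thread: parses HTTP requests and forwards them.
    let api = thread::spawn(move || {
        tx.send(VmmAction::ConfigureBootSource("vmlinux".into())).unwrap();
        tx.send(VmmAction::InstanceStart).unwrap();
        // Dropping tx closes the channel, ending the loop below.
    });

    // VMM thread event loop: dispatches each request to a handler.
    for action in rx {
        match action {
            VmmAction::ConfigureBootSource(k) => println!("boot source: {k}"),
            VmmAction::InstanceStart => println!("starting MicroVM"),
        }
    }
    api.join().unwrap();
}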
The VMM server thread manages the entire MicroVM. Once the VMM server thread is created, it runs an event loop that takes the parsed requests one by one from the API server thread and dispatches them to the appropriate handlers. The handlers are defined according to the dispatch table set by the event loop. For now, Firecracker supports the following handlers: Exit, Stdin, DeviceHandler, VMMActionRequest, and WriteMetrics. The dispatch table is managed by the epoll fd; it maintains a map of the file descriptors to be monitored and the kinds of events to monitor them for. When the vCPU thread creation request is received by the VMM thread, the VMM spawns the required number of vCPUs by using the KVM vCPU ioctls. A vCPU thread is created for each virtual CPU. These vCPU threads are nothing but POSIX threads created by KVM. To run guest code, the vCPUs execute an ioctl with KVM_RUN as its argument.

Software in root mode is the hypervisor. The hypervisor, or the VMM, forms a new plane that runs in root mode, while the VMs run in non-root mode. KVM uses the virtualization extensions to provide these different modes on the host CPUs. In the case of Intel CPUs, VT-x provides the CPU virtualization and VT-d the IO virtualization. For vCPUs, VT-x provides two modes of guest code execution: root and non-root. Whenever a VM attempts to execute an instruction that is prohibited in non-root mode, the vCPU immediately switches to root mode in a trap-like way. This is called a VMEXIT. The hypervisor deals with the reason for the VMEXIT and then executes VMRESUME to re-enter non-root mode for that VM instance. This interaction between root and non-root mode is the essence of hardware virtualization.
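The vCPU thread's loop around KVM_RUN follows directly from this: every return from run() is a VMEXIT that the VMM must service before re-entering the guest. A sketch using the kvm-ioctls crate (the vcpu would be created as in the earlier sketch; the exit variants shown are a subset):

use kvm_ioctls::{VcpuExit, VcpuFd};

// Minimal vCPU thread body: loop on KVM_RUN and service VMEXITs.
fn run_vcpu(mut vcpu: VcpuFd) {
    loop {
        // Enters guest (non-root) mode; returns on the next VMEXIT.
        match vcpu.run().expect("KVM_RUN failed") {
            // Guest wrote to a port-IO address, e.g. a serial console.
            VcpuExit::IoOut(port, data) => {
                println!("port {port:#x}: {} bytes", data.len());
            }
            // Guest touched device memory; a virtio device would
            // decode the access here.
            VcpuExit::MmioRead(addr, _) | VcpuExit::MmioWrite(addr, _) => {
                println!("MMIO access at {addr:#x}");
            }
            // Guest executed HLT: treat it as an orderly shutdown.
            VcpuExit::Hlt => break,
            other => println!("unhandled exit: {other:?}"),
        }
        // Falling through re-enters the guest (VMRESUME) on the
        // next run() call.
    }
}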
III. FIRECRACKER BOOT SEQUENCE

Traditional PCs boot Linux with a BIOS and a bootloader. The primary responsibilities of the BIOS include booting the CPU in real mode and performing a Power-On Self Test (POST) before loading the bootloader. The BIOS determines the candidates for boot devices, and once a bootable device is found, the bootloader is loaded into RAM and executed. Different systems have different stages of the bootloader to be executed: LILO, for example, has a two-stage bootloader, while GRUB contains a three-stage bootloader. Multiple bootloader stages are used because of the system limitations of some of the older devices that were used to boot Linux.

Linux kernels actually do not require a BIOS and a bootloader. Instead, Firecracker uses what is known as the Linux Boot Protocol [15]. Multiple versions of the Linux Boot Protocol standard exist; Firecracker follows the 64-bit Linux Boot Protocol. Thus, Firecracker can directly load the Linux kernel and mount the corresponding root file system.

The Linux kernel usually ships as a bzImage (big compressed image, usually larger than 512KB). The bzImage format consists of real-mode kernel code and protected-mode kernel code, laid out in memory as shown in Figure 2. Instead of booting into the entry point defined by the real-mode code, Firecracker directly boots into the 64-bit entry point located at offset 0x200 in the protected-mode kernel code. Firecracker loads an uncompressed Linux kernel as well as the init process, thereby saving approximately 20 to 30ms of the time otherwise taken to decompress the kernel.

        ~                        ~
        |  Protected-mode kernel |
100000  +------------------------+
        |  I/O memory hole       |
0A0000  +------------------------+
        |  Reserved for BIOS     |  Leave as much as possible unused
        ~                        ~
        |  Command line          |  (Can also be below the X+10000 mark)
X+10000 +------------------------+
        |  Stack/heap            |  For use by the kernel real-mode code.
X+08000 +------------------------+
        |  Kernel setup          |  The kernel real-mode code.
        |  Kernel boot sector    |  The kernel legacy boot sector.
X       +------------------------+
        |  Boot loader           |  <- Boot sector entry point 0000:7C00
001000  +------------------------+
        |  Reserved for MBR/BIOS |
000800  +------------------------+
        |  Typically used by MBR |
000600  +------------------------+
        |  BIOS use only         |
000000  +------------------------+

Fig. 2. Linux BzImage Memory Layout
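Following the boot protocol document [15] and the layout in Figure 2, the boundary between the real-mode and protected-mode halves of a bzImage can be located by parsing the setup header. A sketch in Rust; the image path is a hypothetical placeholder, and the offsets come from the protocol specification:

use std::fs;

fn main() {
    // Read a bzImage (hypothetical path).
    let image = fs::read("./vmlinuz").expect("cannot read kernel image");

    // Offset 0x1fe: legacy boot sector magic 0xAA55.
    assert_eq!(u16::from_le_bytes([image[0x1fe], image[0x1ff]]), 0xAA55);
    // Offset 0x202: "HdrS" signature of the modern setup header.
    assert_eq!(image[0x202..0x206], b"HdrS"[..]);

    // Offset 0x1f1: size of the real-mode setup code in 512-byte
    // sectors (a value of 0 means 4). The protected-mode kernel
    // begins right after the real-mode part.
    let mut setup_sects = image[0x1f1] as usize;
    if setup_sects == 0 {
        setup_sects = 4;
    }
    let pm_offset = (setup_sects + 1) * 512;
    println!("protected-mode kernel starts at byte {pm_offset:#x}");
}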

The Linux kernel also contains another component, namely the initramfs [5]. There are four primary reasons to have an initramfs in the LFS environment: loading the rootfs from a network, loading it from an LVM logical volume, having an encrypted rootfs where a password is required, or the convenience of specifying the rootfs as a LABEL or UUID. Anything else usually means that the kernel was not configured properly.

Since Firecracker needs none of the above-stated reasons for loading the initramfs before mounting the root file system, it is recommended to avoid loading the initramfs at boot time, thereby further reducing the overall boot time and the memory footprint of the kernel. So, if no initramfs is configured externally, then at boot time Firecracker replaces the initramfs with a default empty, 134-byte initramfs.
IV. FIRECRACKER DEVICE MODEL

Until this section, we have talked about the similar architectures and execution flow of Firecracker and QEMU. So, what is different between QEMU and Firecracker? One of the main differences is the device emulation. There are only five device emulations available in Firecracker: network, block devices, sockets, serial console, and a minimal keyboard controller, as shown in Figure 3. Firecracker does not provide support for device emulations like USB, GPU, and the 9P filesystem in order to provide increased security compared to other virtualization technologies like QEMU. On the other hand, QEMU has most device model emulations available in the VMM. The careful reader will notice that Firecracker does not use the vhost implementation in the host kernel, which provides more efficient IO performance without doing VMEXITs.

An open specification for emulating device models in virtualization has been developed, named Virtio. Virtio is defined as a straightforward, efficient, standard mechanism that allows a guest OS to talk to the virtual device driver in a similar way to how the host OS would call the actual hardware device driver. It takes advantage of the fact that the guest can share memory with the host for IO.

The general flow of the virtio specification [8] includes a front-end driver representing the virtual device in the guest and the corresponding device being exposed by the hypervisor or the VMM. A transport layer enables communication between the host and the guest. For the transport layer, Virtio employs a ring-buffer virtqueue structure. A virtqueue is a queue of guest-allocated buffers that the host interacts with, either by reading from or writing to them. Each device can have zero or more virtqueues. A back-end driver present in the host kernel completes the communication flow, to which the virtqueue is connected.
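The virtqueue's guest-allocated buffers are described by a descriptor table whose layout is fixed by the virtio specification [8]. A sketch of one split-virtqueue descriptor as a Rust struct; the example address is arbitrary:

// One entry in a split-virtqueue descriptor table, as laid out in
// the virtio spec [8]. The guest fills these in; the host back-end
// reads them to locate the shared buffers.
#[repr(C)]
struct VirtqDesc {
    addr: u64,  // guest-physical address of the buffer
    len: u32,   // buffer length in bytes
    flags: u16, // e.g. NEXT (chained) or WRITE (host-writable)
    next: u16,  // index of the next descriptor when chained
}

// Flag values from the specification.
const VIRTQ_DESC_F_NEXT: u16 = 1;  // buffer continues in `next`
const VIRTQ_DESC_F_WRITE: u16 = 2; // host write-only buffer

fn main() {
    // A single 4KiB host-writable buffer at a hypothetical address.
    let desc = VirtqDesc {
        addr: 0x8000_0000,
        len: 4096,
        flags: VIRTQ_DESC_F_WRITE,
        next: 0,
    };
    println!("desc -> {:#x}, {} bytes", desc.addr, desc.len);
    let _ = VIRTQ_DESC_F_NEXT;
}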
The Firecracker device model architecture using Virtio is shown in Figure 3. The following list provides a description of the devices available within Firecracker:
• virtio-net: implementation of the network driver (tun/tap devices)
• virtio-blk: implementation of the block devices
• virtio-vsock: implementation of VM sockets providing N:1 serial communications
• serial console: implementation of the legacy console devices for serial communication (terminal)
• keyboard controller: implementation of the keyboard device, though only one function is implemented: Ctrl+Alt+Del to reboot/shutdown the system

Fig. 3. Firecracker Device Model

V. SCOPE FOR IMPROVEMENTS

Even with all the excellent features, near-native performance of the guest code using KVM, as well as faster boot times and a lower memory footprint of the VMs due to the small set of supported device emulations, there still exist some areas for improvement that can make Firecracker more suitable for general use cases and not just for serverless workloads.
• Support for virtio-fs: virtio-fs is an interface to provide efficient sharing between the host and the guest filesystem, avoiding context switches (VMEXITs) and thereby providing more performance. virtio-fs is an upgrade over the existing virtio-9p interface for the same purpose. However, more research on its security is required before it can be included in Firecracker.
• Increased IO performance: The results of the tests performed between Firecracker, QEMU, and Cloud Hypervisor show limitations in Firecracker's virtio implementation and serial execution. [1] [2]
• Larger number of device emulations: Currently, Firecracker can emulate only 10 devices, since each device gets its own IRQ. [10]
• Support for attaching devices at runtime: Firecracker only allows specifying the devices at boot time. Devices can only be attached when the MicroVM is shut down.
• Hotplugging support: For any workload, it is beneficial to allow guest memory/CPU hotplugging within a VM at runtime in order to avoid interference with the workload. Firecracker oversubscribes the memory allocated for the guest, but there is no way to expand the allocated memory for the guest.
• Memory ballooning support: At present, Firecracker does not have any support for reclaiming unused memory from the guest, since no communication channel is present between the host and the guest. This, along with the hotplugging feature, would make it very easy to dynamically add/remove memory/CPU at runtime, thereby providing elasticity to the MicroVM. [9]
VI. CONCLUSION/THOUGHTS

This paper reviews the implementation of a minimalist and modular VMM in the form of the Firecracker MicroVM. It also describes how Firecracker provides resource isolation and security through the use of seccomp filters and the jailer process, and how it achieves faster boot times and a lower memory footprint through KVM and a minimal device model emulation. One other thing to note is that Firecracker embodies modular design in the development of the hypervisor. This modular design approach has also led to the development of community-driven, high-quality rust-vmm crates, which provide the core modules required for the implementation of a hypervisor [11]. rust-vmm [12] is a community effort initiated by AWS. Amazon, along with Intel, Red Hat, and Google, is trying to provide a platform to build a hypervisor from scratch by consuming only the required modules from the rust-vmm crates. This approach also enables the development of a plug-and-play architecture in hypervisors, which we haven't seen so far.

REFERENCES

[1] https://aws.amazon.com/blogs/aws/firecracker-lightweight-virtualization-for-serverless-computing/
[2] Alexandru Agache, Marc Brooker, Andreea Florescu, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. Firecracker: Lightweight Virtualization for Serverless Applications. https://www.usenix.org/system/files/nsdi20-paper-agache.pdf
[3] https://unixism.net/2019/10/how-aws-firecracker-works-a-deep-dive/
[4] https://developer.ibm.com/technologies/linux/articles/l-linuxboot/
[5] http://www.linuxfromscratch.org/blfs/view/cvs/postlfs/initramfs.html
[6] Swift Birth and Quick Death: Enabling Fast Parallel Guest Boot and Destruction in the Xen Hypervisor. https://ssrg.ece.vt.edu/papers/vee2017.pdf
[7] Prasad Mukhedkar, Humble Devassy Chirammal, and Anil Vettathu. Mastering KVM Virtualization. https://learning.oreilly.com/library/view/mastering-kvm-virtualization/9781784399054/ch02s04.html
[8] Deep Dive into virtio and vhost-net. https://www.redhat.com/en/blog/deep-dive-virtio-networking-and-vhost-net
[9] Memory Ballooning Support. https://github.com/firecracker-microvm/firecracker/issues/1571
[10] Enable IRQ Sharing to Remove the Current Limit of 10 Devices. https://github.com/firecracker-microvm/firecracker/issues/1268
[11] Building a Virtualization Stack for the Future. https://opensource.com/article/19/3/rust-virtual-machine
[12] rust-vmm. https://github.com/rust-vmm
[13] Firecracker Design Doc. https://github.com/firecracker-microvm/firecracker/blob/master/docs/design.md
[14] Understand epoll and Its Madness. https://medium.com/@copyconstruct/the-method-to-epolls-madness-d9d2d6378642
[15] Linux 64-bit Boot Protocol. https://www.kernel.org/doc/html/latest/x86/boot.html#id1
[16] QEMU Internals. http://blog.vmsplice.net/2011/03/qemu-internals-overall-architecture-and.html
[17] Cold Start/Warm Start with AWS Lambda. https://blog.octo.com/en/cold-start-warm-start-with-aws-lambda/
