Virtualization Theory

Alexey Eremenko "Technologov"

$Revision: 248 $

$Date: 2007-05-06 15:11:51 +0100 (Sun, 06 May 2007) $

Revision History
Revision 113 May 2007 A.E.
First public release

Basic Terminology

Host

Your real computer, on which the emulator/virtualizer software runs.

Guest (also known as VM=Virtual Machine)

Your emulated computer, or virtual machine (VM for short). This is the target you are trying to emulate. It can be the same as your real system, or very different from it.

For example, your host can be a Pentium III PC, while your guest can be a Sony PlayStation. VirtualBox cannot emulate a PlayStation, so you would need different software for that; the point is only that you understand these two basic concepts.

Some extra virtualization theory

Virtualization of operating systems can be divided into 4 levels: Emulation, Full Virtualization, Para-virtualization and Operating-System-level virtualization, as shown in the scale below. Other types of virtualization exist that are not shown in the scale, such as software-only Virtual Machines (such as Java and .NET) and API emulation (such as Cygwin and Wine). Those two types are left out because their purpose is not the virtualization of operating systems.

Virtualization-Scale Diagram:

Each of those 4 levels, in the order listed, increases complexity, operating-system integration and performance, at the cost of abstraction and portability to new OSes.

Emulation

This technique converts commands (instructions) one by one in software. Some complex commands take a lot of effort to emulate correctly.
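As a minimal sketch of what "one by one, in software" means, the loop below interprets a made-up guest instruction set. The registers, the three instructions and the function name are all assumptions invented for this illustration; no real emulator is this simple.

```python
# Toy guest CPU with two registers and a made-up three-instruction set.
# Nothing here corresponds to a real architecture or a real emulator.

def emulate(program):
    """Interpret guest instructions one by one, entirely in software."""
    regs = {"a": 0, "b": 0}
    steps = 0
    for op, dst, src in program:
        if op == "load":      # put a constant into a register
            regs[dst] = src
        elif op == "add":     # dst = dst + value of src register
            regs[dst] += regs[src]
        elif op == "sub":     # dst = dst - value of src register
            regs[dst] -= regs[src]
        else:
            raise ValueError(f"unknown guest instruction: {op}")
        steps += 1
    return regs, steps


guest_program = [
    ("load", "a", 2),
    ("load", "b", 3),
    ("add", "a", "b"),
]
final_regs, executed = emulate(guest_program)
print(final_regs, executed)  # prints {'a': 5, 'b': 3} 3
```

Every guest instruction costs several host operations (a tuple unpack, a comparison chain, a dictionary update), which hints at where the performance price of emulation comes from.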

This method does not use kernel drivers, neither in the host nor in the guest. This is important because it makes the software more abstract, more secure and more portable, and in theory allows running an emulator without installing it first.

This technique has a very important feature: you can emulate things that are completely different from your Host architecture. On a 32-bit PC you can emulate whatever you want: a 64-bit PC, a SPARC or PowerPC workstation, or even a PlayStation or a Cisco router.

Emulation is a very abstract method and needs the least intervention in the Host and Guest systems, but the price is heavy: performance. Emulation usually reaches 0.1%-5% of normal speed, depending on the program.
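To put the 0.1%-5% figure in perspective, here is a small worked calculation. The function name and the one-second task are assumptions chosen for illustration; only the percentage range comes from the text above.

```python
def emulated_seconds(native_seconds, efficiency_percent):
    """Time a task needs when it runs at the given percentage of native speed."""
    return native_seconds * 100 / efficiency_percent


# A task that takes 1 second on the real hardware:
best_case = emulated_seconds(1, 5)     # emulator at 5% of normal speed
worst_case = emulated_seconds(1, 0.1)  # emulator at 0.1% of normal speed
print(f"{best_case:.0f} s to {worst_case:.0f} s")  # prints 20 s to 1000 s
```

In other words, under these figures a one-second job on the host becomes something between twenty seconds and roughly a quarter of an hour in the guest.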

Programs that belong to this category: DOSBox, Bochs, Qemu, Dynamips/Dynagen, ePSXe.

Full Virtualization

This technique executes normal (safe) commands on the real CPU as-is, without any kind of translation. The risky (or unsafe) commands, which could take your Host system down, are not allowed to run as-is; instead they are usually emulated or translated ahead of time. With Full Virtualization the CPU is virtualized (used as-is for most commands), but other hardware is emulated, just as in the previous method. This includes the video card, sound card, motherboard and the rest.
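The split between safe and unsafe commands can be sketched roughly as follows. The instruction names are borrowed loosely from x86, and the set of "unsafe" opcodes as well as the function name are assumptions made for this illustration; a real virtualizer decides this in a far more involved way.

```python
# Opcodes treated as unsafe in this sketch. The choice is illustrative only.
UNSAFE = {"cli", "out"}


def run_guest(program):
    """Split a guest instruction stream into direct and emulated commands."""
    direct = []    # safe commands, handed to the real CPU unchanged
    emulated = []  # unsafe commands, handled in software by the virtualizer
    for instruction in program:
        opcode = instruction.split()[0]
        if opcode in UNSAFE:
            emulated.append(instruction)
        else:
            direct.append(instruction)
    return direct, emulated


direct, emulated = run_guest(["mov a, 1", "add a, 2", "cli", "out 0x80, a"])
print(len(direct), "direct,", len(emulated), "emulated")  # prints 2 direct, 2 emulated
```

The larger the share of commands that land in the direct list, the closer the guest gets to normal speed.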

This method uses host kernel drivers. This potentially lowers stability and security, but in practice the method works well enough. Using drivers also means that you cannot just run the program; an installation step is required.

This technique requires the same architecture on both Host and Guest. In the PC case, if your host is 32-bit, your guest can be up to 32-bit (16/32-bit), and if your host is 64-bit, your guest can be up to 64-bit (16/32/64-bit).

Full Virtualization must have some integration with the host OS. This method usually reaches 50%-90% of normal speed, depending on the program.

CPU hardware virtualization, such as Intel's Vanderpool (VMX) and AMD's SVM technology, uses this approach. Don't let hardware vendors fool you: their so-called hardware virtualization technology is much slower than the more advanced methods, such as OS-level virtualization. Only software that uses the Full Virtualization technique can take advantage of those CPU features; software from the other categories cannot use them.

Programs that belong to this category: VMware, Qemu with accelerators (KVM, QVM, KQemu), VirtualPC, VirtualBox, Xen. The Linux kernel has KVM in this area.

Para-Virtualization

This technique executes normal (safe) commands on the real CPU as-is, without any kind of translation. In addition, this method requires changes to the Guest OS kernel so that it does not use any risky/unsafe commands. In other words, the Guest OS is modified in a way that makes it easy to virtualize.
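One common way to picture the modified guest kernel is that every place where it would have issued an unsafe command is replaced by an explicit request to the host side. The sketch below assumes exactly that; the class, the method names and the "disable interrupts" operation are hypothetical and stand for no real hypervisor interface.

```python
class ToyHypervisor:
    """Stand-in for the host side. The method name is hypothetical."""

    def __init__(self):
        self.guest_interrupts_enabled = True

    def hypercall_set_interrupts(self, enabled):
        # The host performs the sensitive operation on the guest's behalf.
        self.guest_interrupts_enabled = enabled


def unmodified_kernel_disable_interrupts():
    # An unmodified kernel would issue an unsafe command here, which a
    # full virtualizer would have to catch and emulate.
    raise PermissionError("unsafe command issued by guest")


def modified_kernel_disable_interrupts(hypervisor):
    # The modified kernel never issues the unsafe command; it asks the host.
    hypervisor.hypercall_set_interrupts(False)


hv = ToyHypervisor()
modified_kernel_disable_interrupts(hv)
print(hv.guest_interrupts_enabled)  # prints False
```

Because the guest cooperates, nothing has to be caught and emulated at run time, which is one plausible reason this level is faster than Full Virtualization.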

This method modifies both the host and the guest operating-system kernels. Because Windows is a closed-source operating system, this method currently works mostly for Linux-on-Linux scenarios.

This technique requires the same architecture on both Host and Guest. In the PC case, if your host is 32-bit, your guest can be up to 32-bit (32-bit), and if your host is 64-bit, your guest can be up to 64-bit (32/64-bit).
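The word-size rules stated for Full Virtualization and Para-Virtualization on a PC can be written down as a small lookup table. The dictionary layout and the function name are assumptions made for this sketch; the values are the ones given in the text.

```python
# Guest word sizes each technique supports on a PC host, as listed in the text.
SUPPORTED_GUEST_BITS = {
    "full": {32: (16, 32), 64: (16, 32, 64)},
    "para": {32: (32,), 64: (32, 64)},
}


def guest_allowed(technique, host_bits, guest_bits):
    """Return True if the guest word size is supported on the given host."""
    return guest_bits in SUPPORTED_GUEST_BITS[technique][host_bits]


print(guest_allowed("full", 32, 64))  # prints False
print(guest_allowed("para", 64, 32))  # prints True
```

Emulation is deliberately absent from the table, since it places no such restriction on the guest.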

Para-Virtualization must have good integration with both the host and the guest OSes. This method usually reaches 90%-98% of normal speed, depending on the program, but since resource allocation is static, each guest is still limited to a part of your Host's power.

Programs that belong to this category: Xen, User-Mode Linux (UML), VMware6. The Linux kernel has VMI in this area.

Operating-System Level Virtualization (also known as: Containerization)

This technique offers the tightest integration between host and guest systems by providing a single, unified, heavily modified kernel that works as both guest and host kernel at the same time, and provides some level of isolation and very lightweight virtualization between guests. As for terminology, guests made with this method are known as Virtual Private Servers (VPS) rather than Virtual Machines (VM).

More precisely, guest VPSes have no kernel at all; they share a part of the host kernel. Having no guest kernel means that there is no need for a hypervisor, which reduces overhead.

Because there is a single kernel for both Host and Guests, the only thing that differs between guests is the userland applications. The userland may be 32-bit on a 32-bit Host, or 32/64-bit on a 64-bit Host. This method currently works mostly for Linux-on-Linux scenarios. Also, because of the single kernel, there is a higher risk that an attack could take all the guests and the host down.

This method allows installing several Linux distributions on one Host, but they must all use the single shared kernel, without distro-specific kernel extensions, such as AppArmor, in each guest VPS.

This method usually reaches 98%-99.9% of normal speed, depending on the program. Unlike the previous scenarios, this technology offers dynamic resource allocation, which means that one guest can use up to 100% of all CPU, memory, hard disk and network resources available on the Host. This drives up the total utilization of the Host and allows more guests to share the same host. Dynamic resource allocation is a very good thing indeed.
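The difference between static and dynamic resource allocation can be shown with a small calculation. Both functions below are simplifications assumed for this sketch (equal fixed slices for the static case, a first-come shared pool for the dynamic case); real schedulers are more careful about fairness.

```python
def static_usage(host_capacity, demands):
    """Each guest gets a fixed equal slice and cannot exceed it."""
    slice_size = host_capacity / len(demands)
    return [min(demand, slice_size) for demand in demands]


def dynamic_usage(host_capacity, demands):
    """Guests draw from one shared pool, in order, until it is exhausted."""
    remaining = host_capacity
    used = []
    for demand in demands:
        grant = min(demand, remaining)
        used.append(grant)
        remaining -= grant
    return used


# Four guests on a host with 100 units of CPU; one guest is busy, the rest idle.
demands = [90, 5, 5, 0]
print(sum(static_usage(100, demands)))   # prints 35.0
print(sum(dynamic_usage(100, demands)))  # prints 100
```

With static slices the busy guest is held to a quarter of the host while most of the machine sits idle; with a shared pool the same host is fully used.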

While traditional Full Virtualization allows just several VMs on a single Pentium 4 class PC with 1 GB of RAM, this technique allows over a dozen VPSes.

Programs that belong to this category: Linux-VServer, OpenVZ, Virtuozzo, FreeVPS.

The ideal, as I see it: I believe that a single program, or GUI, should be created to manage all four cases by switching drivers on both host and guest. Qemu currently occupies two lands (Emulator + Full virtualizer) thanks to its accelerators, and VMware plans to occupy both Full and Para virtual environments.

That is, a program could start as a userland emulator; the user could then install host drivers to make it a Full Virtualizer, and then install guest additions to make para-virtualization or even OS-level virtualization possible.

For More Information

For a more technical explanation, see: http://www.virtualbox.org/wiki/Virtualization