Building a Windows Hypervisor for Systems Research
Motivation
I started this project because I wanted to understand the Windows kernel from the bottom up: how the OS manages memory, handles system calls, and enforces privilege boundaries. The more I read, the more I realized that a hypervisor was a good way to study those boundaries because it sits below the operating system instead of inside it.
That was the part that pulled me in. A hypervisor gives you a different view of the machine than a kernel driver does. The kernel is still important, but it is no longer the deepest layer in the stack. For defensive research, that is useful because it lets you ask questions about memory, CPU state, and execution from outside the guest's normal accounting model.
That became the purpose of Hv2: not to build a general-purpose VM platform, but to learn how live virtualization works on real hardware and to build a lab tool for memory introspection and kernel-behavior research on machines I control.
What It Is
Hv2 is an AMD-V hypervisor that loads from a Windows kernel driver. Unlike a traditional VM where you boot a separate guest OS on top of a host, Hv2 virtualizes the already-running Windows system. The machine transitions from normal execution into a virtualized guest context without rebooting. When paired with an EFI mapper, Hv2 can instead follow a Type-1-style launch that virtualizes the machine before the OS begins to boot.
It is written in C++ and x86-64 MASM, built against the Windows Driver Kit.
Technical Milestones
The project ended up being much more than "turn on virtualization." The parts that taught me the most were:
- Launching SVM across all logical processors without tripping Windows' watchdog.
- Capturing enough live CPU state to let an already-running Windows kernel continue as a guest.
- Building and managing VMCBs for each virtual processor.
- Handling VMEXITs through an assembly trampoline and C++ dispatcher.
- Implementing nested page tables with large-page identity mappings.
- Debugging nested page faults caused by invalid permissions, stale translations, and bad physical address calculations.
- Building read and write memory primitives backed by a guest page-table walker.
- Resolving process address spaces by walking from a target process CR3 instead of relying on Windows memory-copy helpers.
- Building a small hypercall path for controlled lab experiments.
- Learning how unforgiving kernel and hypervisor debugging becomes when the system fails below the OS.
Those are the pieces that made the project valuable to me. The hypervisor is not polished production software, but it forced me to work across CPU architecture, Windows internals, assembly, paging, synchronization, and debugging all at once.
Getting the CPU into SVM Mode
Before the CPU can enter SVM mode, the driver has to check the CPUID feature bit, enable EFER.SVME, and write the host save area physical address into MSR_VM_HSAVE_PA. That is straightforward on one core, but Hv2 has to do it across every logical processor.
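As a rough illustration of those per-core checks (a sketch, not the code from Hv2), the logic can be written as pure functions over values the driver would read with CPUID and RDMSR. The MSR numbers and bit positions come from the AMD64 manual. The CoreProbe struct, the function names, and the extra VM_CR.SVMDIS check are assumptions added for the example.

```cpp
#include <cstdint>

// Architectural constants from the AMD64 manual.
constexpr uint32_t kCpuidExtFeatures = 0x80000001;  // CPUID leaf; SVM is ECX bit 2
constexpr uint32_t kCpuidSvmBit      = 1u << 2;
constexpr uint32_t kMsrEfer          = 0xC0000080;
constexpr uint64_t kEferSvme         = 1ull << 12;
constexpr uint32_t kMsrVmCr          = 0xC0010114;
constexpr uint64_t kVmCrSvmDis       = 1ull << 4;   // set when firmware disabled SVM
constexpr uint32_t kMsrVmHsavePa     = 0xC0010117;

// Hypothetical snapshot of what the driver would read on one core.
struct CoreProbe {
    uint32_t cpuid_8000_0001_ecx;  // ECX from CPUID leaf kCpuidExtFeatures
    uint64_t vm_cr;                // RDMSR(kMsrVmCr)
    uint64_t efer;                 // RDMSR(kMsrEfer)
    uint64_t hsave_pa;             // physical address chosen for the host save area
};

enum class SvmCheck { Ok, NotSupported, DisabledByFirmware, BadHostSaveArea };

// Decide whether this core can be switched into SVM mode.
constexpr SvmCheck check_svm(const CoreProbe& p) {
    if ((p.cpuid_8000_0001_ecx & kCpuidSvmBit) == 0) return SvmCheck::NotSupported;
    if (p.vm_cr & kVmCrSvmDis) return SvmCheck::DisabledByFirmware;
    // The host save area must be a non-null, 4 KiB aligned physical address.
    if (p.hsave_pa == 0 || (p.hsave_pa & 0xFFF) != 0) return SvmCheck::BadHostSaveArea;
    return SvmCheck::Ok;
}

// Value to write back to EFER; every other bit is preserved.
constexpr uint64_t efer_with_svme(uint64_t efer) { return efer | kEferSvme; }
```

In a real driver, the result of efer_with_svme() would be written to kMsrEfer and hsave_pa to kMsrVmHsavePa on each logical processor once check_svm() returns Ok.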
The challenge with multicore launch is timing. If you launch cores one at a time, the already-virtualized cores are running guest code while the others are still being prepared. Windows' watchdog can also fire if a core goes unresponsive for too long. Hv2 uses a two-phase approach: first, a per-core setup pass pins a thread to each processor and initializes its VMCB. Then a DPC queued to every core performs the actual VMRUN, so the window between the first and last core entering guest mode stays small.
Capturing Live State
The hardest part of virtualizing a live system is not enabling the hardware feature. It is reconstructing the CPU state accurately enough that Windows keeps running as if nothing happened.
init_vmcb() reads directly from the live CPU. It rebuilds segment state from the GDT, captures GDTR and IDTR, saves syscall MSRs such as LSTAR, STAR, and SFMASK, and copies control registers like CR0, CR3, and CR4.
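To make "rebuilds segment state from the GDT" concrete, here is a minimal sketch of decoding one 8-byte GDT descriptor into the selector, attribute, limit, and base fields a VMCB segment register uses. The VmcbSegment struct and decode_descriptor() are hypothetical names, not taken from Hv2. The sketch only covers 8-byte code and data descriptors; 16-byte system descriptors such as the 64-bit TSS carry an extra base dword that is not handled here.

```cpp
#include <cstdint>

// Hypothetical mirror of one VMCB segment register.
struct VmcbSegment {
    uint16_t selector;
    uint16_t attrib;  // 12-bit packed form: access byte plus the AVL/L/D/G nibble
    uint32_t limit;
    uint64_t base;
};

// Decode an 8-byte code/data descriptor taken from the GDT.
constexpr VmcbSegment decode_descriptor(uint16_t selector, uint64_t raw) {
    // Base is split across bits 16..39 and 56..63 of the descriptor.
    uint64_t base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24);
    // Limit is split across bits 0..15 and 48..51.
    uint32_t limit = static_cast<uint32_t>((raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16));
    uint16_t access = static_cast<uint16_t>((raw >> 40) & 0xFF);  // type, S, DPL, P
    uint16_t flags  = static_cast<uint16_t>((raw >> 52) & 0xF);   // AVL, L, D/B, G
    // With the granularity bit set, the limit counts 4 KiB units.
    if (flags & 0x8) limit = (limit << 12) | 0xFFF;
    return VmcbSegment{selector, static_cast<uint16_t>(access | (flags << 8)), limit, base};
}
```

For example, the 64-bit code descriptor 0x00209B0000000000 decodes to attrib 0x29B with a zero base and limit.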
One subtle issue is EFER.SVME. It has to be set in the host, but the guest should keep seeing the architectural state it expects. Hv2 maintains a shadow_efer: guest WRMSR operations update the virtualized value rather than directly mutating host state. This keeps guest-visible MSR behavior consistent and avoids surprising Windows with state changes it did not make.
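A minimal sketch of the shadowing idea, assuming a small helper struct (EferShadow and its member names are illustrative, not Hv2's actual layout): the guest-visible value and the value placed in the VMCB are tracked separately, and SVME is forced on only in the latter, since VMRUN rejects a guest state with EFER.SVME clear.

```cpp
#include <cstdint>

constexpr uint64_t kEferSvme = 1ull << 12;

// Hypothetical per-vCPU bookkeeping for EFER.
struct EferShadow {
    uint64_t shadow_efer;  // what the guest believes EFER contains
    uint64_t vmcb_efer;    // what the hardware uses while the guest runs

    // Intercepted WRMSR to EFER: record the guest's value, keep SVME set for hardware.
    void on_guest_write(uint64_t value) {
        shadow_efer = value;
        vmcb_efer = value | kEferSvme;
    }

    // Intercepted RDMSR of EFER: return the value the guest last wrote.
    uint64_t on_guest_read() const { return shadow_efer; }
};
```

A real handler would also validate reserved bits before accepting the write; that is left out to keep the sketch short.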
The VMEXIT Loop
When the guest executes an instruction or hits a condition that the hypervisor intercepts, execution exits to the VMM. The assembly trampoline in Entrypoint.asm handles the transition:
- VMSAVE flushes guest FS/GS/syscall MSRs from the CPU back into the VMCB.
- General-purpose registers are saved onto the VMM stack.
- The C++ vmexit_handler() runs with a pointer to the saved guest register state.
- The trampoline restores guest state, re-enables global interrupts with STGI, and resumes with VMRUN.
The GIF, or Global Interrupt Flag, is cleared while the VMM is handling an exit. That means the handler has to be quick and predictable. Blocking or doing anything too clever there is a good way to turn a research project into a blue screen.
The C++ dispatcher routes on the exit code. A few examples:
- RDMSR/WRMSR: return or update virtualized MSR state where needed.
- INVD: convert to WBINVD to avoid cache coherency problems.
- Nested virtualization instructions: reject unsupported nested virtualization attempts.
- VMMCALL: dispatch a small research hypercall interface used by my lab tooling.
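The routing in the list above can be sketched as a pure function from exit code to action. The exit code values are the ones defined in the AMD64 manual; the Action enum and route_exit() are hypothetical names for illustration, and how a nested virtualization attempt is rejected (for example, which exception is injected) is deliberately left open.

```cpp
#include <cstdint>

// SVM exit codes from the AMD64 manual (subset).
enum ExitCode : uint64_t {
    kExitInvd    = 0x76,
    kExitMsr     = 0x7C,
    kExitVmrun   = 0x80,
    kExitVmmcall = 0x81,
    kExitVmload  = 0x82,
    kExitVmsave  = 0x83,
    kExitNpf     = 0x400,
};

// Hypothetical actions the C++ dispatcher could take.
enum class Action {
    ReadMsr, WriteMsr, RunWbinvd, RejectNested, Hypercall, NestedPageFault, Unhandled
};

// Route one exit. For MSR exits, EXITINFO1 is 0 for RDMSR and 1 for WRMSR.
constexpr Action route_exit(uint64_t exit_code, uint64_t exit_info1) {
    switch (exit_code) {
    case kExitMsr:     return (exit_info1 & 1) ? Action::WriteMsr : Action::ReadMsr;
    case kExitInvd:    return Action::RunWbinvd;
    case kExitVmrun:
    case kExitVmload:
    case kExitVmsave:  return Action::RejectNested;
    case kExitVmmcall: return Action::Hypercall;
    case kExitNpf:     return Action::NestedPageFault;
    default:           return Action::Unhandled;
    }
}
```

Keeping the routing free of side effects makes it easy to test in user mode, which matters when the real handler runs with GIF cleared.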
Nested Page Tables
NPT, AMD's second-level address translation, adds another layer of page tables between guest physical addresses and real RAM. The hypervisor owns these tables and can control how the guest accesses memory.
Hv2 starts with an identity map, where guest physical addresses map to the same host physical addresses. It uses large pages where possible and maps the physical memory ranges reported by Windows. Unmapped regions, such as MMIO or device memory, can be handled lazily when a nested page fault occurs.
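A minimal sketch of the large-page identity mapping, assuming 2 MiB pages and one page directory per 1 GiB of guest physical space (the function names are hypothetical). The user bit is set because the hardware treats nested page table accesses as user-mode accesses, so a supervisor-only nested entry would fault.

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kPresent   = 1ull << 0;
constexpr uint64_t kWrite     = 1ull << 1;
constexpr uint64_t kUser      = 1ull << 2;  // required for nested page table entries
constexpr uint64_t kLargePage = 1ull << 7;  // PS bit: this PDE maps 2 MiB directly
constexpr uint64_t kTwoMiB    = 2ull << 20;

// Build one page-directory entry that maps a 2 MiB region.
constexpr uint64_t make_2mib_entry(uint64_t host_pa) {
    return (host_pa & ~(kTwoMiB - 1)) | kPresent | kWrite | kUser | kLargePage;
}

// Build a 512-entry page directory that identity-maps 1 GiB starting at first_gpa.
std::vector<uint64_t> build_identity_pd(uint64_t first_gpa) {
    std::vector<uint64_t> pd(512);
    for (uint64_t i = 0; i < pd.size(); ++i) {
        // Identity map: guest physical address equals host physical address.
        pd[i] = make_2mib_entry(first_gpa + i * kTwoMiB);
    }
    return pd;
}
```

A full implementation would also build the PML4 and PDPT levels above this and limit the mapping to the physical ranges reported by Windows.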
The part I spent the most time on was page-view instrumentation. In simplified form, a page can have different permissions depending on whether the guest is executing from it or accessing it as data:
- Execute view: mapped with execute permissions.
- RW view: mapped for ordinary data access.
When the guest accesses a page in a way that does not match the current permissions, a nested page fault gives the hypervisor a chance to inspect the event and update the mapping. A single-step trap can then restore the previous state after the instruction completes.
This is similar in spirit to EPT/NPT instrumentation used by debuggers, sandboxes, and research monitors. It is also very easy to get wrong. One stale TLB entry or one incorrect physical address calculation can crash the whole system.
I used this mainly to understand how execute/data visibility interacts with nested paging. The important research lesson was that changing memory visibility below the guest has tradeoffs. It can be useful for instrumentation, but it also creates timing and consistency problems that need to be treated carefully.
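The first step in handling the nested page fault described above is classifying the access. A sketch, assuming the handler is given EXITINFO1 (the fault error code) and EXITINFO2 (the faulting guest physical address) from the VMCB; the struct, enum, and function names are hypothetical, and the sketch only classifies the fault without changing any mappings.

```cpp
#include <cstdint>

// Low bits of EXITINFO1 for a nested page fault (same layout as a #PF error code).
constexpr uint64_t kNpfPresent = 1ull << 0;  // fault on a present nested entry
constexpr uint64_t kNpfWrite   = 1ull << 1;  // access was a write
constexpr uint64_t kNpfFetch   = 1ull << 4;  // access was an instruction fetch

enum class Access { Read, Write, Execute };
enum class View { Execute, ReadWrite };

struct NestedFault {
    bool     present;   // true: permission violation; false: nothing mapped
    Access   access;
    uint64_t gpa_page;  // faulting guest physical address, rounded down to 4 KiB
};

constexpr NestedFault decode_npf(uint64_t exit_info1, uint64_t exit_info2) {
    Access access = Access::Read;
    if (exit_info1 & kNpfFetch) {
        access = Access::Execute;
    } else if (exit_info1 & kNpfWrite) {
        access = Access::Write;
    }
    return NestedFault{(exit_info1 & kNpfPresent) != 0, access, exit_info2 & ~0xFFFull};
}

// Pick which of the two views matches the faulting access.
constexpr View view_for(Access access) {
    return access == Access::Execute ? View::Execute : View::ReadWrite;
}
```

A not-present fault (present == false) corresponds to the lazily mapped regions mentioned earlier, while a present fault is a permission mismatch between the access and the current view.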
Process Memory Introspection
The most useful capability I added was process memory introspection from the hypervisor layer. Hv2 can read and write guest memory by resolving addresses through a target process CR3 and walking that process's page tables directly.
That was an important milestone because it meant the hypervisor did not need to ask the Windows kernel to translate or copy memory for it. Instead of depending on APIs such as kernel memory-copy helpers or virtual-to-physical mapping helpers, Hv2 treats the guest page tables as data structures and resolves the translation itself. NPT gives the hypervisor a way to inspect the physical pages involved while remaining outside the guest's normal execution path.
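To show what "treats the guest page tables as data structures" means, here is a simplified 4-level walk. It is a sketch under stated assumptions, not Hv2's walker: physical memory is reached through a caller-supplied read_qword callback (a hypothetical stand-in for however the hypervisor maps guest physical pages), 5-level paging is ignored, and Windows-specific software PTE states such as transition or paged-out entries are simply reported as "not mapped".

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Reads one 8-byte entry at a physical address; returns nullopt if it cannot be read.
using ReadPhys = std::function<std::optional<uint64_t>(uint64_t)>;

constexpr uint64_t kPresent  = 1ull << 0;
constexpr uint64_t kLarge    = 1ull << 7;               // PS bit in a PDPTE or PDE
constexpr uint64_t kAddrMask = 0x000FFFFFFFFFF000ull;   // physical address bits 12..51

// Translate a virtual address using the page tables rooted at cr3.
std::optional<uint64_t> translate(uint64_t cr3, uint64_t va, const ReadPhys& read_qword) {
    uint64_t table = cr3 & kAddrMask;
    // shift = 39 (PML4), 30 (PDPT), 21 (PD), 12 (PT)
    for (int shift = 39; shift >= 12; shift -= 9) {
        uint64_t index = (va >> shift) & 0x1FF;
        std::optional<uint64_t> entry = read_qword(table + index * 8);
        if (!entry || (*entry & kPresent) == 0) return std::nullopt;

        bool large = (*entry & kLarge) != 0 && (shift == 30 || shift == 21);
        if (shift == 12 || large) {
            // Final level: combine the page frame with the offset inside the page.
            uint64_t page_mask = (1ull << shift) - 1;
            return ((*entry & kAddrMask) & ~page_mask) | (va & page_mask);
        }
        table = *entry & kAddrMask;
    }
    return std::nullopt;
}
```

Passing the reader as a callback keeps the walk itself testable in user mode against a fake physical memory map.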
The result is a controlled lab primitive for:
- Translating guest virtual addresses in a specific process address space.
- Reading guest memory after resolving the backing physical page.
- Writing guest memory through the same translation path.
- Testing how memory visibility changes depending on whether the observer is inside or below the guest.
This was also one of the more failure-prone parts of the project. A page-table walker has to handle large pages, invalid entries, page boundaries, and process context correctly. When it is wrong, the bug does not look like a normal application failure. It looks like corrupted memory, a hung core, or a machine that needs a hard reboot.
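Page boundaries are a good example of how this goes wrong: two virtually adjacent pages are usually not physically adjacent, so a read or write has to be split and each piece translated separately. A minimal sketch of that splitting step, assuming 4 KiB granularity (Chunk and split_by_page() are hypothetical names):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One piece of a larger access that lies entirely inside a single 4 KiB page.
struct Chunk {
    uint64_t va;
    size_t   length;
};

// Split [va, va + length) so that no chunk crosses a 4 KiB page boundary.
std::vector<Chunk> split_by_page(uint64_t va, size_t length) {
    constexpr uint64_t kPageSize = 0x1000;
    std::vector<Chunk> chunks;
    while (length > 0) {
        // Bytes left before the end of the current page.
        uint64_t room = kPageSize - (va & (kPageSize - 1));
        size_t take = static_cast<size_t>(std::min<uint64_t>(room, length));
        chunks.push_back({va, take});
        va += take;
        length -= take;
    }
    return chunks;
}
```

Each chunk would then be translated and copied on its own, and the whole operation fails if any single translation fails. The sketch assumes the caller has already checked that va + length does not wrap.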
Hypercall Interface
Hv2 also has a small hypercall interface so my test application can ask the hypervisor for lab data. This is not meant to be a production API. It was a convenient way to verify that the hypervisor was alive, inspect selected guest state, and exercise the read/write memory path without building a full driver stack around every experiment.
At a high level, the interface supports:
- A basic liveness check.
- Reading selected guest CPU context for debugging.
- Exercising the page-table walker and read/write primitives in controlled tests.
The main lesson here was less about the interface itself and more about validation. Anything crossing the guest/hypervisor boundary needs strict input checking, clear ownership rules, and a very small surface area. A bug at this layer is much less forgiving than a normal user-mode bug.
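As a sketch of what strict input checking can look like at this boundary, the validator below rejects a request before any handler touches it. Everything here is hypothetical: the Request layout, the command numbers, the key value, and the transfer limit are invented for illustration and do not describe Hv2's actual interface.

```cpp
#include <cstdint>

// Hypothetical command set matching the three capabilities listed above.
enum class Command : uint64_t { Ping = 0, ReadContext = 1, ReadMemory = 2, WriteMemory = 3 };

// Hypothetical request as it might arrive in guest registers on VMMCALL.
struct Request {
    uint64_t key;      // shared secret so stray VMMCALLs are ignored
    uint64_t command;
    uint64_t address;  // guest virtual address for memory commands
    uint64_t length;   // byte count for memory commands
};

enum class Status { Ok, BadKey, BadCommand, BadLength, BadRange };

constexpr uint64_t kLabKey      = 0x4876324C4142ull;  // made-up value for the example
constexpr uint64_t kMaxTransfer = 0x1000;             // cap each request at one page

constexpr Status validate(const Request& r) {
    if (r.key != kLabKey) return Status::BadKey;
    if (r.command > static_cast<uint64_t>(Command::WriteMemory)) return Status::BadCommand;

    bool is_memory = r.command == static_cast<uint64_t>(Command::ReadMemory) ||
                     r.command == static_cast<uint64_t>(Command::WriteMemory);
    if (is_memory) {
        if (r.length == 0 || r.length > kMaxTransfer) return Status::BadLength;
        // Reject ranges whose end address would wrap around.
        if (r.address > UINT64_MAX - r.length) return Status::BadRange;
    }
    return Status::Ok;
}
```

Validating in one place, before dispatch, keeps the surface area small: handlers only ever see requests that have already passed every check.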
What I Learned
The watchdog timeout was the first wall I hit. Before the two-phase DPC launch, cores would go unresponsive mid-setup and Windows would panic. It took a while to understand why. Once a core is virtualized, it is running guest code, and if the others are not ready yet, the system can fall apart. The fix was straightforward once I understood the problem, but getting there meant a lot of BSODs and WinDbg sessions staring at a hung scheduler.
NPT was the hardest part overall. The page-view instrumentation idea is conceptually clean: two views, switch on fault, restore after the instruction. The implementation was not clean at all. I hit bugs where I corrupted page tables mid-execution, miscalculated physical addresses, or resumed the guest with the wrong permissions. Reading other hypervisor implementations helped me understand where mine diverged and why. Credit to Samuel Tulach's Memhv and Satoshi Tanda's SimpleSvm.
The larger lesson was about humility. Hypervisors are unforgiving because they sit underneath the thing that would normally help you debug. When the VMM is wrong, the whole machine is wrong.
What's Next
The memory introspection path now works for both reads and writes, but I still want to improve the ergonomics around it. The next step is making the testing harness better: clearer error reporting, more careful validation around page boundaries, and better diagnostics when a translation fails.
Beyond that, I am interested in studying measured boot and earlier launch paths in a lab setting. If I take this into UEFI, I want the focus to be on boot-time integrity, TPM measurements, and how the OS establishes trust in the platform, not on building something persistent or covert.
Source: github.com/KQAsk/Hv2