QEMU (Quick Emulator) is an open-source machine emulator and virtualizer. It enables users to run operating systems and programs designed for one architecture on a different architecture.
模拟器
as a translator
QEMU, a Fast and Portable Dynamic Translator - bellard
管理器
as a hypervisor qemu+kvm
编译x86_64 + kvm
../configure --target-list=x86_64-softmmu --enable-debug
qemu monitor
qemu-system-x86_64 -m 4096m -kernel /home/ubuntu/kernel-utils/src/linux-5.15/arch/x86_64/boot/bzImage -hda /home/ubuntu/kernel-utils/rootfs.img -nographic -append root=/dev/sda init=/lib/systemd/systemd rw console=ttyS0 earlyprintk=vga nokaslr selinux=0 audit=0 debug -enable-kvm -s -S -smp 4 -monitor unix:/tmp/monitor.sock,server,nowait
kvm
ioctl
创建VM
kvm_dev_ioctl
kvm_dev_ioctl_create_vm
kvm_create_vm
strace:
openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9
ioctl(9, KVM_GET_API_VERSION, 0) = 12
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_IMMEDIATE_EXIT) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS) = 32764
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_MULTI_ADDRESS_SPACE) = 2
ioctl(9, KVM_CREATE_VM, 0) = 10
ioctl(10, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS) = 710
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS) = 1024
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_DESTROY_MEMORY_REGION_WORKS) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_SET_TSS_ADDR) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_EXT_CPUID) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_MP_STATE) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO) = 2
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_PIO) = 1
ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2) = 3
ioctl(10, KVM_ENABLE_CAP, 0x7ffe9ae54ff0) = 0
kvm_vm_ioctl
kvm_vm_ioctl_create_vcpu
vcpu
qemu使用一个thread模拟一个vcpu,guest的一个cpu从host视角就是一个线程。
超长运行的guest code如何返回?
signal:
- VM exit/VM entry
- guest kernel 调度和host调度
- kvm_steal_time
memory
guest 的物理内存,就是host的虚拟内存,是和其他用户进程一样用malloc分配的连续虚拟地址。
guest 虚拟地址翻译到物理地址需要经过,guest虚拟地址(gva) -> guest物理地址(gpa) -> host物理地址(hpa).
``` pfn host page frame number hpa host physical address hva host virtual address gfn guest frame number gpa guest physical address gva guest virtual address ngpa nested guest physical address ngva nested guest virtual address pte page table entry (used also to refer generically to paging structure entries) gpte guest pte (referring to gfns) spte shadow pte (referring to pfns) tdp two dimensional paging (vendor neutral term for NPT and EPT)
invlpg page invalidations
#### shadow page
The way the shadow page works is that there are two sets of page-tables.
#### paging
- tdp - two dimensional paging (2D)
- softmmu
void kvm_init_mmu(struct kvm_vcpu *vcpu) { if (mmu_is_nested(vcpu)) init_kvm_nested_mmu(vcpu); else if (tdp_enabled) init_kvm_tdp_mmu(vcpu); else init_kvm_softmmu(vcpu); }
#### kvm tracing
- tracing
echo 1 > /sys/kernel/debug/tracing/events/kvm/enable cat /sys/kernel/debug/tracing/trace_pipe
- perf kvm
perf kvm –host record -a perf report ```
参考资料
虚拟化
qemu-kvm docs
qemu-internals
qemu-internals-overview
kvm-x86-mmu-setup
文档信息
- 本文作者:seamaner
- 本文链接:https://seamaner.github.io/2024/07/31/qemu%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0/
- 版权声明:自由转载-非商用-非衍生-保持署名(创意共享3.0许可证)