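Intel VMX defines a set of mechanisms for hardware-assisted virtualization. These include a hardware data structure, the VMCS (virtual-machine control structure), and a family of new VMX instructions. Several of those instructions operate on the VMCS: for example, vmread reads a VMCS field and vmwrite writes one.
The VMCS occupies one 4 KB page of physical memory (4 KB aligned) and contains six areas: guest state, host state, VM-execution controls, VM-entry controls, VM-exit controls, and VM-exit information.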
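The only part of the region with a documented in-memory format is its small header: a 32-bit word holding the revision identifier (bits 30:0) and the shadow-VMCS indicator (bit 31), followed by the 32-bit VMX-abort indicator. Everything after those 8 bytes is implementation-specific. The following is a minimal user-space sketch of reading that header from a raw copy of the region; `struct vmcs_header` and `vmcs_revision_id` are illustrative names, not taken from any real kernel code, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VMCS_REGION_SIZE 4096u  /* upper bound on the size of a VMCS region */

/* Documented header of a VMCS region; the bytes that follow are opaque. */
struct vmcs_header {
    uint32_t revision;  /* bits 30:0 revision identifier, bit 31 shadow-VMCS indicator */
    uint32_t abort;     /* VMX-abort indicator */
};

/* Hypothetical helper: pull the revision identifier out of a raw region copy. */
static uint32_t vmcs_revision_id(const uint8_t *region)
{
    struct vmcs_header hdr;

    assert(region != NULL);
    memcpy(&hdr, region, sizeof hdr);     /* copy instead of casting to avoid aliasing issues */
    return hdr.revision & 0x7fffffffu;    /* mask off the shadow-VMCS indicator */
}
```

Applied to a region whose first word is 0x00000010, this helper would return 0x10, the same revision identifier that appears in the log further down.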
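In principle you could manipulate the VMCS areas with ordinary memory accesses. Intel, however, explicitly requires software to use vmread/vmwrite to read and write the areas and fields, because the VMCS layout changes with the CPU architecture. Different CPUs may use different layouts, so accessing a fixed memory offset directly is neither reliable nor portable.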
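What vmread and vmwrite take instead of a memory offset is a field encoding, a small integer whose bits describe the field: bit 0 is the access type (set for the high 32 bits of a 64-bit field), bits 9:1 are an index, bits 11:10 are the field type, and bits 14:13 are the width. The sketch below decodes such an encoding. The struct and function names are made up for illustration; the bit positions and the sample encodings in the comments follow the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Values of the width field (bits 14:13). */
enum vmcs_width { VMCS_W16 = 0, VMCS_W64 = 1, VMCS_W32 = 2, VMCS_WNATURAL = 3 };

/* Values of the type field (bits 11:10). */
enum vmcs_type { VMCS_CONTROL = 0, VMCS_EXIT_INFO = 1, VMCS_GUEST = 2, VMCS_HOST = 3 };

struct vmcs_field {
    unsigned high;   /* bit 0: 1 selects the high 32 bits of a 64-bit field */
    unsigned index;  /* bits 9:1 */
    unsigned type;   /* bits 11:10 */
    unsigned width;  /* bits 14:13 */
};

/* Hypothetical helper: split a VMCS field encoding into its components. */
static struct vmcs_field vmcs_decode(uint32_t enc)
{
    struct vmcs_field f;

    assert((enc & 0xffff9000u) == 0);  /* bit 12 and bits 31:15 are reserved */
    f.high  = enc & 1u;
    f.index = (enc >> 1) & 0x1ffu;
    f.type  = (enc >> 10) & 0x3u;
    f.width = (enc >> 13) & 0x3u;
    return f;
}

/* Sample encodings:
 *   0x0800  guest ES selector  (16-bit, guest state, index 0)
 *   0x0c00  host ES selector   (16-bit, host state, index 0)
 *   0x2000  I/O bitmap A       (64-bit, control, index 0)
 *   0x2004  MSR bitmap         (64-bit, control, index 2)
 */
```

The encoding says nothing about where the field sits in memory, which is why the same vmread works across CPUs with different layouts. The `_hi` names in the log below presumably correspond to encodings with bit 0 set, such as 0x2001 for the high half of I/O bitmap A.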
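On top of that, in typical Intel fashion, the VMCS layout has never been published. It seems to be a secret.
But curiosity got the better of me. What does the VMCS layout actually look like? Where in that 4 KB of memory, for example, is the guest state stored?
I wrote a kernel module to find out. Shown below is part of the VMCS layout on Intel Sandy Bridge, for reference only.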
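The post does not show how the module works. One plausible approach, offered here only as a guess and not as the author's actual method, is to vmwrite a distinctive marker value into a field, flush the VMCS to memory with vmclear, and then scan the raw 4 KB region for the marker. The sketch below simulates just the scanning step in user space: `fake_vmwrite16` stands in for the hardware and `find_marker16` is a hypothetical helper. It assumes fields are 2-byte aligned and stored unmodified, which real hardware does not guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VMCS_SIZE 4096u

/* Stand-in for hardware: pretend a 16-bit field lives at a hidden offset.
 * On a real CPU this would be vmwrite followed by vmclear. */
static void fake_vmwrite16(uint8_t *region, size_t hidden_off, uint16_t value)
{
    assert(hidden_off + sizeof value <= VMCS_SIZE);
    memcpy(region + hidden_off, &value, sizeof value);
}

/* Scan the region in 2-byte steps (assumed alignment) for a 16-bit marker.
 * Returns the byte offset of the first match, or -1 if the marker is absent. */
static long find_marker16(const uint8_t *region, size_t len, uint16_t marker)
{
    assert(region != NULL);
    for (size_t off = 0; off + sizeof marker <= len; off += 2) {
        uint16_t v;
        memcpy(&v, region + off, sizeof v);
        if (v == marker)
            return (long)off;
    }
    return -1;
}
```

With a zeroed buffer, writing a marker through `fake_vmwrite16(region, 0x218, 0xbeef)` and then calling `find_marker16(region, VMCS_SIZE, 0xbeef)` recovers 0x218. On real hardware the marker should be chosen so it cannot collide with other field values, and a miss may simply mean the CPU stores that field in a transformed form.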
On a Sandy Bridge physical host, the module prints:
[ 4840.029135] [i] vmx supported cpu.
[ 4841.029145] [i] msr 0x3a: lock bit is on. vmxon bit is on. ok
[ 4840.029147] [i] revision id: 0x00000010
[ 4840.029163] # vpid = 0x2f0
[ 4840.029167] # pi_notification_vector = 0x44
[ 4840.029171] # eptp_index na
[ 4840.029172] # g_es_select = 0x200
[ 4840.029173] # g_cs_select = 0x218
[ 4840.029175] # g_ss_select = 0x230
[ 4840.029176] # g_ds_select = 0x248
[ 4840.029177] # g_fs_select = 0x260
[ 4840.029178] # g_gs_select = 0x278
[ 4840.029179] # g_ldtr_select = 0x290
[ 4840.029180] # g_tr_select = 0x2a8
[ 4840.029181] # g_interrupt_status = 0x2a8
[ 4840.029183] # h_es_select = 0x300
[ 4840.029184] # h_cs_select = 0x304
[ 4840.029185] # h_ss_select = 0x308
[ 4840.029186] # h_ds_select = 0x30c
[ 4840.029187] # h_fs_select = 0x310
[ 4840.029188] # h_gs_select = 0x314
[ 4840.029189] # h_tr_select = 0x318
[ 4840.029190] # io_bmap_a = 0xa0
[ 4840.029192] # io_bmap_a_hi = 0xa4
[ 4840.029193] # io_bmap_b = 0xa8
[ 4840.029194] # io_bmap_b_hi = 0xac
[ 4840.029195] # msr_bmap = 0xb0
[ 4840.029196] # msr_bmap_hi = 0xb4
[ 4840.029198] # exit_msr_store_addr = 0xb8
[ 4840.029199] # exit_msr_store_addr_hi = 0xbc
[ 4840.029200] # exit_msr_load_addr = 0xc0
[ 4840.029201] # exit_msr_load_addr_hi = 0xc4
[ 4840.029203] # entry_msr_load_addr = 0xc8
[ 4840.029204] # entry_msr_load_addr_hi = 0xcc
[ 4840.029205] # exec_vmcs_ptr = 0xd0
[ 4840.029206] # exec_vmcs_ptr_hi = 0xd4
[ 4840.029208] # tsc_offset = 0xd8
[ 4840.029209] # tsc_offset_hi = 0xdc
[ 4840.029210] # virt_apic_page_addr = 0xe0
[ 4840.029211] # virt_apic_page_addr_hi = 0xe4
[ 4840.029213] # apic_access_addr = 0x78
[ 4840.029214] # apic_access_addr_hi = 0x7c
[ 4840.029215] # pi_desc_addr = 0x50
[ 4840.029217] # pi_desc_addr_hi = 0x54
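In other words, on Intel Sandy Bridge the MSR bitmap sits at offset 0xb0 in the VMCS, and I/O bitmaps A and B sit at 0xa0 and 0xa8.
On the more recent Skylake, the I/O bitmaps are at 0x2c8 and 0x2d0 instead.
So the VMCS layout really does differ between CPUs, and you cannot read a field by accessing the VMCS at a fixed offset.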