Kernel memory management: a virtual-to-physical address translation experiment

2025/01/21 kernel

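How is a virtual address turned into a physical address, step by step? When the CPU executes a user-space instruction that accesses memory, the MMU starts from the value in the CR3 register (on x64) and walks the page tables until it reaches the final physical address. The whole process is done automatically in hardware. With QEMU we can run an experiment and observe each step of the translation.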

Address space

The memory regions of a user-space process can be read from /proc/self/maps:

561b0feb4000-561b0feb5000 r--p 00000000 fd:00 917628                     /home/ubuntu/kernel/hack/mm
561b0feb5000-561b0feb6000 r-xp 00001000 fd:00 917628                     /home/ubuntu/kernel/hack/mm
561b0feb6000-561b0feb7000 r--p 00002000 fd:00 917628                     /home/ubuntu/kernel/hack/mm
561b0feb7000-561b0feb8000 r--p 00002000 fd:00 917628                     /home/ubuntu/kernel/hack/mm
561b0feb8000-561b0feb9000 rw-p 00003000 fd:00 917628                     /home/ubuntu/kernel/hack/mm
561b2be8b000-561b2beac000 rw-p 00000000 00:00 0                          [heap]
7f1b6d3d4000-7f1b6d3d7000 rw-p 00000000 00:00 0
7f1b6d3d7000-7f1b6d3ff000 r--p 00000000 fd:00 137558                     /usr/lib/x86_64-linux-gnu/libc.so.6
7f1b6d3ff000-7f1b6d594000 r-xp 00028000 fd:00 137558                     /usr/lib/x86_64-linux-gnu/libc.so.6
7f1b6d594000-7f1b6d5ec000 r--p 001bd000 fd:00 137558                     /usr/lib/x86_64-linux-gnu/libc.so.6
7f1b6d5ec000-7f1b6d5ed000 ---p 00215000 fd:00 137558                     /usr/lib/x86_64-linux-gnu/libc.so.6
7f1b6d5ed000-7f1b6d5f1000 r--p 00215000 fd:00 137558                     /usr/lib/x86_64-linux-gnu/libc.so.6
7f1b6d5f1000-7f1b6d5f3000 rw-p 00219000 fd:00 137558                     /usr/lib/x86_64-linux-gnu/libc.so.6
7f1b6d5f3000-7f1b6d600000 rw-p 00000000 00:00 0
7f1b6d60a000-7f1b6d60c000 rw-p 00000000 00:00 0
7f1b6d60c000-7f1b6d60e000 r--p 00000000 fd:00 132206                     /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f1b6d60e000-7f1b6d638000 r-xp 00002000 fd:00 132206                     /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f1b6d638000-7f1b6d643000 r--p 0002c000 fd:00 132206                     /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f1b6d644000-7f1b6d646000 r--p 00037000 fd:00 132206                     /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f1b6d646000-7f1b6d648000 rw-p 00039000 fd:00 132206                     /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffcae07f000-7ffcae0a0000 rw-p 00000000 00:00 0                          [stack]
7ffcae11c000-7ffcae120000 r--p 00000000 00:00 0                          [vvar]
7ffcae120000-7ffcae122000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

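All of these regions, except the legacy [vsyscall] page that the kernel maps at a fixed high address, lie below 0x7ffffffff000. On x64 with the default 4-level page tables, the address space is laid out as follows: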

========================================================================================================================
    Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
                  |            |                  |         |
 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -128 TB
                  |            |                  |         |     starting offset of kernel mappings.
__________________|____________|__________________|_________|___________________________________________________________
                                                            |
                                                            | Kernel-space virtual memory, shared between all processes:
____________________________________________________________|___________________________________________________________
                  |            |                  |         |
 ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
 ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
 ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
 ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
 ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
 ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
 ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
 ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
 ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
__________________|____________|__________________|_________|____________________________________________________________
                                                            |
                                                            | Identical layout to the 56-bit one from here on:
____________________________________________________________|____________________________________________________________
                  |            |                  |         |
 fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                  |            |                  |         | vaddr_end for KASLR
 fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
 fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
 ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
 ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
 ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
 ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
 ffffffff80000000 |-2048    MB |                  |         |
 ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
 ffffffffff000000 |  -16    MB |                  |         |
    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
 ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
 ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
__________________|____________|__________________|_________|___________________________________________________________

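  • Below 128 TB (0x0 to 0x7fffffffffff): user space. Only the low 48 bits of a 64-bit address take part in translation: 48 = 4 * 9 + 12.
  • Kernel half (0xffff800000000000 and above), linearly mapped addresses: kmalloc memory, object pools, the direct mapping of physical memory, kernel text, and so on.
  • Kernel half, vmalloc area: managed much like user-space addresses, since both are virtual addresses backed by page tables. The difference is the page table used: vmalloc uses the kernel PGD, i.e. init_mm.pgd.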
(gdb) p init_mm.pgd
$3 = (pgd_t *) 0xffffffff84c26000 <init_top_pgt>
(gdb)

Virtual address

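A virtual address is less an address than an encoding.

Counting from the least significant bit, bits 1 to 12 are the offset within the current page; with 4 KB pages these 12 bits range over [0, 0xfff]. The next 36 bits, bits 13 to 48, are split into four page-table indices of 9 bits each, one per level, each in the range [0, 2^9 = 512). An index selects an entry within a page-table page: the page is 4 KB and each entry is 8 bytes, so one page holds 512 entries.

Translation starts at the page the PGD points to. The highest-level index selects a slot in that page, and the 8-byte entry read from that slot holds the physical address of the page containing the next-level table. The same step is repeated for the remaining levels until the last entry yields the address of the data page. Adding the offset from bits 1 to 12 gives the final physical address.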

A small program, the mm-paging example, prints the address of a malloc'ed buffer, which is a virtual address:

 cat hello-mm.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
  char *p = malloc(128);
  if (p) {
      strcpy(p, "Hello mm paging!\n");
      printf("p(%p) is: %s\n", (void *)p, p);
      printf("sleep here!!!!\n");
      sleep(1000000);
  } else {
      printf("Failed to alloc 128 chars\n");
  }
  return 0;
}

First start a virtual machine with QEMU, run the example program inside it, and note the virtual address it prints:

./mm-paging
p(0x5614df8812a0) is: Hello mm paging!

sleep here!!!!

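The virtual address 0x5614df8812a0 yields the index for each page-table level.
The 36 bits from bit 13 to bit 48 are the four page-table "addresses": each group of 9 bits is a number from 0 to 511, the index of the entry within the page table at that level. For 0x5614df8812a0 the four indices, from the top level down, are: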

  • (0x5614df8812a0 >> (12+27)) & (2**9 - 1) -> 0xac
  • (0x5614df8812a0 >> (12+18)) & (2**9 - 1) -> 0x53
  • (0x5614df8812a0 >> (12+9)) & (2**9 - 1) -> 0xfc
  • (0x5614df8812a0 >> 12) & (2**9 - 1) -> 0x81

The offset within the page is:
0x5614df8812a0 & 0xfff -> 0x2a0

Now attach gdb to the QEMU virtual machine and inspect the page tables of the example program (mm-paging):

gdb ./vmlinux
target remote:1234

Mapping process

Get the PGD address of the mm-paging process from task->mm->pgd:


>lx-ps
      TASK          PID    COMM
0xffff888006804ec0  214  mm-paging

p ((struct task_struct*)0xffff888006804ec0)->mm->pgd
$1 = (pgd_t *) 0xffff8880068c8000

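Kernel addresses all lie at 0xffff800000000000 and above. Inside the kernel, physical memory is accessed through a linear mapping: virtual address = physical address + page_offset_base.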

x/a &page_offset_base
0xffffffff8323f1f8 <page_offset_base>:  0xffff888000000000

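We now have the PGD, page_offset_base (which lets us read any physical address), and the index for each page-table level (0xac, 0x53, 0xfc, 0x81). With these we can walk the tables to the physical address behind the virtual address. At each level, the physical address of the next table is the entry just read with its flag bits masked off: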


- L4:
 x/gx (0xffff8880068c8000 + 0xac * 8)
0xffff8880068c8560:     0x8000000005847067

- L3:
x/gx (0x5847000 + 0x53 * 8 + 0xffff888000000000)
0xffff888005847298:     0x00000000078c4067

- L2:
x/gx (0x78c4000 + 0xfc *8 + 0xffff888000000000)
0xffff8880078c47e0:     0x0000000005bb3067

- L1:
x/gx(0x5bb3000 + 0x81 * 8 + 0xffff888000000000)
0xffff888005bb3408:     0x8000000006565067

Check the memory at page + offset:

x/s (0x5614df8812a0 & 0xfff) + (0x6565000 + 0xffff888000000000)
0xffff8880065652a0:     "Hello mm paging!\n"

The string read directly from the physical address is exactly the one we stored, which confirms that this translation is correct.

Notes

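  • The experiment uses the x64 default 4 KB page size and 4-level page tables.
  • The low and high bits of each page-table entry hold flags: in 0x8000000005847067, the high 0x8 (bit 63, NX) and the low 0x067 (Present, Read/Write, User, Accessed, Dirty) are flags, not part of the address.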

References

x64 mm
