4

I'm a bit confused about what happens when we're in user-mode and a page fault occurs.

IIRC, a page fault will be generated when the TLB is attempting to map my (user-space) virtual address into a physical address and it fails.

It then generates an exception that will be synchronously handled by the OS. But the question now is: most likely, the addresses of this exception handler code plus its associated data are also not going to be in the TLB!

Does this get recursive or is this kernel range of memory addresses subject to different rules (for instance, an automatic mapping between virtual/physical memory as to avoid needing to use the TLB?)

Thanks!

1
  • 4
    I think you are misunderstanding the relationship between the TLB and page faults. Everywhere you say "TLB" you should say "page tables [for the process]" instead. The TLB is just a cache for the page tables - missing in the TLB doesn't cause a page fault (it causes a "page walk"), but missing in the page tables does.
    – BeeOnRope
    Commented May 4, 2019 at 4:57

2 Answers

7

No, Linux doesn't swap out kernel memory, for this and similar reasons (such as making sure the kernel can't take a page fault on any arbitrary instruction that accesses memory).

OSes that do page some of kernel memory would definitely need to keep the page-fault handler, page-tables, and disk I/O code in memory...


this exception handler code plus its associated data are also not going to be in the TLB!

You're conflating page-walks (done on a TLB miss) with page faults (raised when the entry for the virtual page is invalid or has insufficient permissions, after a page walk if one was necessary).

On x86 and most(?) other ISAs, page-walks are done by hardware. See What happens after a L2 TLB miss?.

The OS gives the CPU the physical address of the top-level page table (with mov cr3, rax for example on x86), and the CPU handles everything else transparently. (The only software TLB management is invalidation of a possibly-cached entry after modifying the page table entry in memory. e.g. x86 invlpg)

Hardware page-table management allows the CPU to speculatively do a page walk when a loop over an array is getting close to a page boundary, instead of waiting until an actual load touches the next page. It also lets page-walk latency be partially hidden by out-of-order execution, among other benefits. Skylake even has 2 page-walk units, so it can be working on 2 TLB misses in parallel (either or both could be speculative or demand).
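To make the TLB-miss vs. page-fault distinction concrete, here is a minimal toy model in Python. It is only a sketch: the `ToyMMU` class, its single-level dictionary "page table", and the `PageFault` exception are made up for illustration and do not correspond to any real hardware or kernel interface.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when the page-table entry is missing or lacks permission."""


class ToyMMU:
    """Hypothetical model: the TLB is just a cache of page-table entries."""

    def __init__(self, page_table):
        # page_table maps virtual page number -> (physical frame number, writable)
        self.page_table = page_table
        self.tlb = {}
        self.page_walks = 0  # number of TLB misses that needed a page walk

    def translate(self, vaddr, write=False):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.tlb.get(vpn)
        if entry is None:
            # TLB miss: walk the page table. This alone is NOT a page fault.
            self.page_walks += 1
            entry = self.page_table.get(vpn)
            if entry is None:
                # Only a missing/invalid page-table entry causes a fault.
                raise PageFault(f"page {vpn:#x} not present")
            self.tlb[vpn] = entry  # cache the translation for next time
        frame, writable = entry
        if write and not writable:
            raise PageFault(f"page {vpn:#x} is read-only")
        return frame * PAGE_SIZE + offset
```

The first access to a mapped page misses in the TLB and costs a page walk, but succeeds without any exception. Only an access to a page with no valid entry (or a write to a read-only page) raises `PageFault`, which is the point where a real OS handler would run.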


On an ISA with software page walks, the TLB-miss handler is separate from the page-fault handler.

On MIPS for example, there is a special range of addresses which are mapped differently from normal kernel virtual addresses:

If address starts with 0b100 [top 3 bits], translates to bottom 512 Mbytes of physical memory and does not go through TLB. (cached and unmapped). Called kseg0. Used for kernel instructions and data.

MIPS TLB handling - https://people.csail.mit.edu/rinard/teaching/osnotes/h11.html

(MIPS addresses with the high bit set can only be used by kernel code, user-space access faults. i.e. a high-half kernel is baked-in for MIPS.)

This is kind of like having a 512MiB hugepage mapping of the low physical memory baked into the hardware. Obviously the kernel would want to keep its page-lookup data structure in that range, but it could use any data structure it wanted, e.g. based on start/length.
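The kseg0 rule quoted above amounts to a fixed bit-mask translation with no TLB lookup. A small Python sketch, assuming 32-bit MIPS virtual addresses (the function and constant names are invented for this example):

```python
KSEG0_SIZE = 512 * 1024 * 1024  # 512 MiB window into low physical memory


def is_kseg0(vaddr):
    """True if the top 3 bits of a 32-bit virtual address are 0b100."""
    return (vaddr >> 29) == 0b100


def kseg0_to_phys(vaddr):
    """Translate a kseg0 address by clearing the top 3 bits (no TLB involved)."""
    if not is_kseg0(vaddr):
        raise ValueError(f"{vaddr:#010x} is not in kseg0")
    return vaddr & (KSEG0_SIZE - 1)
```

For example, `kseg0_to_phys(0x80001234)` gives `0x1234`. Because the translation never consults the TLB, a software TLB-miss handler placed in this range cannot itself take a TLB miss.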

5
  • Hi. Thanks for the thorough answer! I took all this time to get back to you as I was reading the TLB section of "What every programmer should know about memory". Commented May 3, 2019 at 6:34
  • I have a couple of questions: at what point does the page fault happen in this cycle? I imagine that a hard page fault happens when tree-walking and coming to the conclusion the page has been swapped. What about a soft page fault? Commented May 3, 2019 at 6:36
  • @devouredelysium: a soft page fault is when the page is present in memory but not "wired" into the hardware page table (the entry is invalid). It's up to the page-fault handler to sort this out using the kernel's data structures (typically an efficient start/length extent-based list of mappings, instead of the HW page table's radix-tree format with always 1 entry per page). Another case is a copy-on-write mapping: logically the page is writeable, but the OS marks it as read-only in the page table, so the page needs to be copied to a newly allocated physical page before re-running the write instruction. Commented May 3, 2019 at 6:45
  • @devouredelysium: either way, the HW just knows the PTE is invalid/not present or doesn't have permissions and e.g. on x86 raises a #PF exception (once the instruction is known to be non-speculative, i.e. reaches retirement in the out-of-order back-end, not earlier. This is part of why Meltdown is a thing :P). The hardware doesn't know anything about swap space, that's just one of the fun things that software can do with virtual memory. Lazy allocation (not creating HW page-table entries at all until memory is touched), and copy-on-write, are two other things. Commented May 3, 2019 at 6:48
  • 1
    However, in Linux kernel, a secondary #PF exception could occur for synchronizing vmalloc page tables.
    – firo
    Commented Dec 16, 2019 at 13:32
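The comments above describe the page-fault handler consulting the kernel's own start/length list of mappings to decide what a fault means. A rough Python sketch of that decision follows. Everything here (`Extent`, `classify_fault`, the returned strings) is hypothetical and much simpler than what a real kernel such as Linux does.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int      # first virtual address of the mapping
    length: int     # size of the mapping in bytes
    writable: bool  # what the process is logically allowed to do


def classify_fault(extents, addr, is_write, pte_present):
    """Decide what a fault at addr means, given the software mapping list."""
    for ext in extents:
        if ext.start <= addr < ext.start + ext.length:
            if is_write and not ext.writable:
                return "invalid access"  # write to a truly read-only mapping
            if not pte_present:
                # Lazy allocation, or the page is elsewhere (e.g. swapped out):
                # the handler must create a valid page-table entry.
                return "fill in page-table entry"
            if is_write:
                # Logically writable but marked read-only in the page table.
                return "copy-on-write"
            return "spurious"
    return "invalid access"  # no mapping covers this address
```

The hardware only reports that the entry was invalid or lacked permission. Whether that means "allocate a page", "copy the page", or "kill the process" is decided entirely by software, using data the hardware never sees.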
1

First, you should get TLB out of your mind when thinking software. TLB is a hardware component. The fact that there is not a mapping in the TLB does not automatically cause a page fault.

Second, on some hardware multiple page faults are common. This occurs when the processor permits the page tables themselves to be paged. Thus you can get a page fault on the page and one or more page faults reading the page tables. Processors that support this use various mechanisms to get around the chicken-and-egg problem that paged page tables can create.

It then generates an exception that will be synchronously handled by the OS. But the question now is: most likely, the addresses of this exception handler code plus its associated data are also not going to be in the TLB!

Every operating system has to ensure that its page fault handler remains in physical memory.

Does this get recursive or is this kernel range of memory addresses subject to different rules (for instance, an automatic mapping between virtual/physical memory as to avoid needing to use the TLB?)

On some processors, the system address space range is mapped differently than the user space. That is one of the ways to avoid the chicken and egg problem of paged page tables.

6
  • This isn't an x86-specific question. There are architectures like MIPS where a TLB miss causes a software exception. (Separate from a page-fault, though.) Commented May 3, 2019 at 20:43
  • I wasn't thinking of Intel as the text should show. Note the last paragraph. Commented May 3, 2019 at 23:26
  • Right, the last part makes sense across architectures, but you start with "you should get TLB out of your mind when thinking software." Yes it's logically separate from page-fault handling, but with software TLB handling it is the TLB-miss handler that detects if an access is a page fault, right? Or maybe hardware can also page fault directly for write-access to a read-only TLB entry? Commented May 3, 2019 at 23:32
  • But yes, fair point about hardware with paged page tables instead of multi-level nested page tables, 2 different ways of solving the same problem of covering a 4GB address space with 4 or 8 byte entries per 4k page, leading to huge page tables if fully populated and not paged. Commented May 3, 2019 at 23:34
  • Unless you are dealing with MIPS (which is the only architecture I can think of that has a software loaded TLB) you can put the TLB out of your mind to understand how things work. Commented May 3, 2019 at 23:34
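The size concern raised in the comments (a 4GB address space, 4k pages, 4 or 8 byte entries) is easy to check with a little arithmetic. A minimal sketch, with a helper name invented for this example:

```python
def flat_page_table_bytes(address_space_bytes, page_bytes, entry_bytes):
    """Size of a fully populated single-level page table."""
    return (address_space_bytes // page_bytes) * entry_bytes


GIB = 2**30
MIB = 2**20

# 4 GiB / 4 KiB = 1Mi pages, so 4-byte entries need 4 MiB per address space.
small_entries = flat_page_table_bytes(4 * GIB, 4096, 4) // MIB
# With 8-byte entries the same table needs 8 MiB.
large_entries = flat_page_table_bytes(4 * GIB, 4096, 8) // MIB
```

Several MiB of table per process is why designs either make the page tables pageable or use multi-level tables that leave unused regions unpopulated.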
