Page Fault in Linux Kernel

Question

I have a few questions after reading Mel Gorman's book Understanding the Linux Virtual Memory Manager. Section 4.3, "Process Address Space Descriptor", says that kernel threads never page fault or access the user space portion; the only exception is page faulting within the vmalloc space. My questions follow.

Kernel threads never page fault: does this mean only user space code triggers page faults? If kmalloc() or vmalloc() is called, will it not page fault? I believe the kernel has to map these to anonymous pages, and when a write to these pages is performed, a page fault occurs. Is my understanding correct?
Why can't kernel threads access user space? Don't copy_to_user() and copy_from_user() do exactly that?
Exception is page faulting within vmalloc space: does that mean vmalloc() triggers a page fault and kmalloc() doesn't? Why doesn't kmalloc() page fault? Doesn't the mapping from the kernel's virtual address to the physical frame need to be kept as a page table entry?

Kernel code is locked in memory, so it won't fault. Faults happen when the page is used, not when it is allocated. — stark, Commented Sep 11, 2020 at 13:35

Technologeeks · Accepted Answer · 2020-09-23 01:32:55Z

Kernel threads never page fault: the page fault talked about here is the one that makes a virtual page resident, or brings it back from swap. Kernel pages are not only paged in on kmalloc(), but also remain resident for their lifetime. The same does not hold for user space pages, which A) may be lazily allocated (i.e. only reserved as a virtual address range on malloc(), and not actually faulted in until a memset() or other dereference) and B) may be swapped out under low memory conditions.
Why can't kernel threads access user space? Don't copy_to_user() and copy_from_user() do exactly that?
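
The lazy-allocation point (A) can be observed from user space. The sketch below is a minimal Python illustration, assuming a Linux or other Unix system where `resource.getrusage()` reports minor page faults in `ru_minflt`. It maps an anonymous private region, then writes one byte per page, counting the faults taken during each step; the helper names `minor_faults` and `count_faults` are hypothetical.

```python
import mmap
import resource

def minor_faults() -> int:
    """Minor page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

def count_faults(npages: int = 256) -> tuple[int, int]:
    """Return (faults while mapping, faults while first touching) for npages pages."""
    # 1 MiB with 4 KiB pages: too small to be backed by a 2 MiB huge page.
    size = npages * mmap.PAGESIZE

    before = minor_faults()
    # Reserve address space only; no physical frames are allocated yet.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = minor_faults()

    # The first write to each page faults, and the kernel allocates a zeroed frame.
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1
    after_touch = minor_faults()

    region.close()
    return after_map - before, after_touch - after_map

if __name__ == "__main__":
    mapped, touched = count_faults()
    print(f"faults during mmap: {mapped}, faults during first touch: {touched}")
```

On a typical Linux system the first count should be near zero and the second roughly one per page; kernels configured to back anonymous memory with larger folios may report fewer faults. There is no equivalent experiment for kmalloc() memory, which is already mapped when it is handed out.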

That's a great question, with a hardware-specific answer. It used to be the case that kernel threads were discouraged from accessing user space directly, precisely because of the page fault that might occur when touching user memory that is not yet paged in or has been paged out (recall that, as described above, this can't happen in kernel space). So copy_to_user()/copy_from_user() were essentially a normal memcpy wrapped in page fault handling. This way, any potential page fault would be handled transparently (i.e. the memory would be paged in, or the copy would fail cleanly for an invalid address) and all would be well. But there were certainly cases where the bad approach of a plain memcpy to/from user memory would just work; worse, it would work more often than not, since page faults vary with RAM residency and availability, and thus the unhandled faults would cause random panics. Hence the decree to always use copy_from_user()/copy_to_user().
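
The "memcpy wrapped in page fault handling" idea can be sketched as a toy model. This is not the kernel's implementation (which uses per-architecture assembly plus an exception table); `ToyUserSpace` and `toy_copy_from_user` are hypothetical names. The model pages in swapped-out pages transparently and, for an unmapped address, stops and reports how many bytes were not copied, mirroring the real convention that copy_from_user() returns the number of bytes it could not copy.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096

@dataclass
class ToyUserSpace:
    """Hypothetical model of one process's user address space."""
    resident: dict[int, bytearray] = field(default_factory=dict)  # page number -> contents in RAM
    swapped: dict[int, bytes] = field(default_factory=dict)       # page number -> contents on "swap"
    faults_handled: int = 0

    def fault_in(self, page: int) -> bool:
        """Toy page-fault handler: bring a swapped page back; fail for unmapped pages."""
        if page in self.swapped:
            self.resident[page] = bytearray(self.swapped.pop(page))
            self.faults_handled += 1
            return True
        return False

def toy_copy_from_user(user: ToyUserSpace, addr: int, n: int) -> tuple[bytes, int]:
    """Copy n bytes starting at user address addr.

    Returns (data copied, bytes NOT copied); 0 means complete success.
    """
    out = bytearray()
    for a in range(addr, addr + n):
        page, off = divmod(a, PAGE_SIZE)
        if page not in user.resident and not user.fault_in(page):
            # Fixup path: instead of panicking, report what was left uncopied.
            return bytes(out), addr + n - a
        out.append(user.resident[page][off])
    return bytes(out), 0
```

In real kernel code the caller typically treats any nonzero return as failure, e.g. `if (copy_from_user(dst, src, len)) return -EFAULT;`.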

More recently, however, kernel/user memory isolation has become important from a security standpoint. This is due to many exploitation techniques (NULL pointer dereferencing being a very common and powerful one) in which fake kernel objects (or code) are constructed in user-space memory, which an attacker easily controls, and can lead to code execution in the kernel.

Most modern architectures therefore provide a hardware mechanism that prevents the kernel from accessing (or executing) pages that belong to user mode. Taking ARM64 as an example, this feature is called PAN/PXN (Privileged Access Never / Privileged Execute Never); the x86 equivalents are SMAP and SMEP.

Thus, copy_from_user()/copy_to_user() now not only handle page faults, but also disable PAN/PXN before the operation and restore it afterwards.
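
The "disable before, restore after" pattern can be illustrated with another toy model. `ToyCPU`, `AccessFault`, `user_access` and `toy_copy_with_pan` are hypothetical names, and the `pan` attribute only stands in for the real hardware state (real kernels use architecture-specific instructions, for example STAC/CLAC on x86 with SMAP). The model covers only the data-access restriction, not execute-never.

```python
from contextlib import contextmanager

class AccessFault(Exception):
    """Raised when the toy CPU blocks a privileged access to a user page."""

class ToyCPU:
    """Hypothetical CPU model with a PAN-like flag (not real hardware state)."""

    def __init__(self) -> None:
        self.pan = True  # the kernel normally runs with user access blocked

    def kernel_read_user(self, user_mem: bytes, i: int) -> int:
        if self.pan:
            raise AccessFault(f"privileged access to user address {i:#x}")
        return user_mem[i]

@contextmanager
def user_access(cpu: ToyCPU):
    """Lift the restriction only for the duration of the copy, then restore it."""
    saved = cpu.pan
    cpu.pan = False
    try:
        yield
    finally:
        cpu.pan = saved  # restored even if the copy raises

def toy_copy_with_pan(cpu: ToyCPU, user_mem: bytes, n: int) -> bytes:
    with user_access(cpu):
        return bytes(cpu.kernel_read_user(user_mem, i) for i in range(n))
```

The design point is that the window in which user memory is reachable stays as narrow as the copy itself, so a stray kernel dereference of a user pointer elsewhere still traps.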

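Exception is page faulting within vmalloc space: kmalloc() returns memory from the kernel's direct mapping of physical RAM, whose page table entries are set up at boot and shared by every process, so touching it does not fault. vmalloc() instead builds a new, virtually contiguous mapping out of scattered physical pages, and on architectures such as 32-bit x86 it installs that mapping only in the kernel's reference page table; a process whose own page tables lack the entry faults on first access, and the fault handler copies the entry over. Neither kind of memory is swapped out. kmalloc() is more likely to fail for large requests (it needs physically contiguous RAM), but it will not page fault: it returns NULL instead, which is itself a problem to handle.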
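
One concrete source of faults in the vmalloc area, in the 2.4-era 32-bit x86 kernels that Gorman's book covers, is lazy page-table synchronization: new vmalloc mappings go into the kernel's reference page table (init_mm), and each process's page tables pick them up on first access. The toy model below sketches that idea with each page table reduced to a dict; `ToyKernelPageTables`, `toy_vmalloc` and `KernelOops` are hypothetical names, not kernel APIs.

```python
class KernelOops(Exception):
    """Access to a kernel address that has no mapping anywhere."""

class ToyKernelPageTables:
    """Hypothetical model: one reference table plus lazily synced per-process copies."""

    def __init__(self, direct_map: dict[int, int]) -> None:
        # Reference table (init_mm in the real kernel): virtual page -> physical frame.
        self.reference = dict(direct_map)
        self.processes: dict[str, dict[int, int]] = {}
        self.sync_faults = 0

    def new_process(self, name: str) -> None:
        # A new process copies the kernel mappings that exist right now.
        self.processes[name] = dict(self.reference)

    def toy_vmalloc(self, vpage: int, frame: int) -> None:
        # Only the reference table is updated; existing processes are not touched.
        self.reference[vpage] = frame

    def translate(self, name: str, vpage: int) -> int:
        table = self.processes[name]
        if vpage not in table:
            # The "vmalloc fault": copy the missing entry from the reference table.
            if vpage not in self.reference:
                raise KernelOops(f"no kernel mapping for page {vpage:#x}")
            table[vpage] = self.reference[vpage]
            self.sync_faults += 1
        return table[vpage]
```

Direct-mapped pages (the kind kmalloc() hands out) are present in every copy from the start, so in this model only the vmalloc path ever takes the sync fault, and only once per process.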

Thanks. If I understand correctly, there will be page faults (the exception handling) while kmalloc is called, and a PTE will be created. However, the kmalloc'ed memory will not be moved to swap, so there is no page-out. Can I conclude that? — Franc, Commented Sep 23, 2020 at 5:17
Not exactly. It depends on flags. When you kmalloc with GFP_ATOMIC you are guaranteed that it won't sleep, i.e. there won't be a page fault which needs to be serviced. It attempts to use pages which are already pre-allocated. If it can't find any, it returns NULL. When you use GFP_KERNEL it may sleep, free up some RAM and/or reclaim pages. But you did understand correctly in that GFP_KERNEL memory will not be swapped out. A rather old but good article on this is linuxjournal.com/article/6930 - by Robert Love, who also wrote the definitive "Linux Kernel Development" book — Technologeeks, Commented Sep 24, 2020 at 2:22
@Technologeeks The kernel disables PAN (or SMAP on x86) when accessing user pages, but does it really disable PXN? At least on x86, it doesn't disable SMEP ever. — forest, Commented Jun 19, 2022 at 20:58

tyChen · Accepted Answer · 2020-09-12 10:15:10Z

I think you are confused because you haven't clearly understood how the kernel starts, how processes are created, and how virtual memory works.

Kernel threads never page fault: this is because pages in kernel space and user space are allocated in different ways. For kernel space, pages are allocated during initialization, but for user space, they are allocated while the process runs and calls functions like malloc(); after the mapping is created, the first real use of that virtual memory triggers a page fault.
Why can't kernel threads access user space? When the kernel starts, process 0 creates process 1 and process 2. Process 1 forms the root of the user space process tree, while process 2 manages the kernel threads. The functions you mentioned are used on behalf of user processes to move data into and out of the kernel, to implement operations such as opening a file or a socket.
Exception is page faulting within vmalloc space: the vmalloc space here does not mean the function vmalloc(); it is an area of the kernel's address space reserved for dynamic memory allocation, and it is treated as the exception.

Suppose I have a kernel module/kthread which, at some point after the kernel has initialized, allocates memory using kmalloc. When that memory is first written, won't there be a page fault, since this is the first write to it? Doesn't the page table need a PTE created for that frame? — Franc, Commented Sep 17, 2020 at 9:33
This initialization happens when the kernel starts: start_kernel() -> setup_arch() -> init_mem_mapping() -> init_memory_mapping() -> kernel_physical_mapping_init(). It's best to read the source code to understand it. — tyChen, Commented Sep 17, 2020 at 12:38
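
On x86-64, kernel_physical_mapping_init() builds the kernel's direct map: a linear mapping of physical memory at a fixed virtual offset. Because the page table entries for that whole range exist from boot, writing to memory that kmalloc() later returns from it needs no new PTE and does not fault. The translation is a single addition, sketched below. The base value assumes 4-level paging with KASLR disabled (with KASLR the kernel randomizes it at boot), and `phys_to_virt`/`virt_to_phys` here are hypothetical stand-ins for the kernel's `__va()`/`__pa()` on direct-mapped addresses.

```python
# Assumed x86-64 direct-map base (4-level paging, KASLR disabled).
PAGE_OFFSET = 0xFFFF_8880_0000_0000

def phys_to_virt(phys: int) -> int:
    """Direct-map translation in the spirit of __va(): just add the offset."""
    return phys + PAGE_OFFSET

def virt_to_phys(virt: int) -> int:
    """Inverse, in the spirit of __pa(), valid only for direct-mapped addresses."""
    if virt < PAGE_OFFSET:
        raise ValueError("not a direct-mapped address")
    return virt - PAGE_OFFSET
```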

Tags: memory-management, linux-kernel, kernel, page-fault