I have multiple machines sharing a home directory via an NFS share used by 6-10 users. All of the machines, including the one running the NFS server, are used to run computational experiments. Although it is rare, it is possible that some experiment may cause an out-of-memory (OOM) condition. Even though the offending user process may get killed at some point, I would like to know how this can affect the NFS server and thus, in turn, the other machines too. I tried searching for this but could not find a specific answer. Also, are there any measures I can take to keep an OOM event from affecting the NFS share?

NFS server configuration: Intel Core i7-9700, 32 GB RAM, 32 GB swap, and a TITAN RTX GPU. The other machines have similar configurations.

  • If there is a risk, split the service onto another host! In cluster environments, there is often a master node dedicated to NFS shares, the SSH access gateway, the scheduler, host provisioning...
    – Dom
    Commented Jul 12, 2020 at 7:00
  • @Dom Thanks for the advice. However, all of the machines (5 in total) have good specs and can be used to run computational experiments, so dedicating a machine just to host the NFS server would waste hardware. I am wondering if anything can be done to make sure the NFS service always has some resources left to keep serving the home directories.
    – rmah
    Commented Jul 12, 2020 at 7:12

2 Answers


I would limit the process memory of the experiments with ulimit or with cgroups. You need to limit both RSS and shared memory. Another approach would be to run the experiments in a container or a VM.
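
For example, a per-experiment cap could be set before launching a job. This is a minimal sketch; the 8 GiB limit and the run_experiment.sh script are illustrative placeholders:

    # ulimit can only cap the virtual address space of this shell and its
    # children (on modern Linux, ulimit -m for RSS is not enforced):
    ulimit -v $((8 * 1024 * 1024))    # value in KiB, i.e. 8 GiB
    ./run_experiment.sh

    # cgroups (v2) enforce a real memory+swap limit; systemd-run wraps this:
    systemd-run --scope -p MemoryMax=8G -p MemorySwapMax=0 ./run_experiment.sh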

Probably the easiest approach is to use a container: Docker, Podman, LXC...
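
With Docker, for instance, the memory ceiling can be set when the container starts (the image name is a placeholder):

    # --memory caps RAM; setting --memory-swap to the same value disallows swap.
    docker run --memory=8g --memory-swap=8g my-experiment-image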


By default, when Linux runs out of memory, it uses a heuristic to decide which processes to kill in order to recover enough memory to continue. This is often not what you want, though: in many cases (probably including this one) it would be better to kill the process that caused the out-of-memory condition.

You can set the vm.oom_kill_allocating_task sysctl to make the OOM killer kill the process whose allocation ran the system out of memory.
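
If you go this route, it would look something like the following (the sysctl.d file name is just an illustrative convention):

    # Apply immediately (as root):
    sysctl -w vm.oom_kill_allocating_task=1

    # Persist across reboots:
    echo 'vm.oom_kill_allocating_task = 1' > /etc/sysctl.d/90-oom.conf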

  • "Causing the condition" is more random than "biggest process", i.e. that is exactly the opposite of what the OP needs, because if the experiment got the last free page, any allocation done by the NFS server will then kill the NFS server.
    – Simon Richter
    Commented Jul 13, 2020 at 9:18
  • @SimonRichter The NFS server, though, isn't really going to be asking for much memory during its operation. And since it is run as kernel threads, it gets priority over user processes anyway.
    Commented Jul 13, 2020 at 14:01

