2

I'm trying to understand containers and stumbled upon a trick apparently found by the LXC developers, see this runc PR: You can call pivot_root(".", ".") which avoids the need for a directory to put the old root into. However, this makes the mount namespace behave strangely:

unshare --user --map-root-user --mount bash -c "
mount --bind containerfs bindmountpoint
cd bindmountpoint
pivot_root . .
# this is fine:
ls -l /
# this is not fine:
ls -l /..
"

The parent of ., accessed via ./.. or /.. or /proc/<any>/cwd/.. points to the root of the root mount namespace (I haven't tried nesting yet)! It does not point to the parent of containerfs nor bindmountpoint, but really the root of the root/outer mount namespace.

Similarly, when I try nsenter --user --preserve-credentials --mount --target=<pid>, then this new process has its CWD placed at the root of the root mount namespace.

None of this happens when I pivot_root(".", "oldroot"). The behaviour also disappears when unmounting the old root, either via a file descriptor, umount -l / or umount -l /proc/1/cwd.

I have also tried this sequence of syscalls from a custom C program, since the documentation of pivot_root only gives guarantees about the calling process (so I do everything in the same process). The behaviour is the same as with the multi-process steps using CLI tools shown above.

Tested on a 5.3 kernel.

What is going on when I run pivot_root(".", ".")?

2 Answers

3

pivot_root(new_root, put_old) moves the old root directory of the calling process (which must be the root of a mount) onto put_old, and puts new_root in its place. It then changes the current directory and root directory of every process in the same mount namespace that had them set to the old root directory, so that they point to new_root.

So after pivot_root(".", "."), the new root directory has the old root directory mounted on top of it.

Whenever .. would otherwise resolve to a directory that has another directory mounted on it, it actually resolves to the directory that is mounted on top. This matches historic Unix and Linux behaviour, where .., except at the root directory of a filesystem, had no special handling in path traversal and was implemented by a directory entry stored on disk.

This is not a mount namespace escape.
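
Given the stacking described above, the usual way to use this trick is to detach the old root immediately after the pivot. What follows is only a minimal sketch of that sequence, not the code from the runc PR or from LXC. The function name pivot_root_in_place is hypothetical, and the sketch assumes the caller is already in its own mount namespace with sufficient privileges and that new_root is a mount point (for example, a bind mount as in the question). pivot_root is invoked through syscall() because glibc does not provide a wrapper for it.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: pivot into new_root without a put_old directory.
 * Assumes the caller already unshared its mount namespace and that
 * new_root is a mount point. Returns 0 on success, -1 on failure.
 */
int pivot_root_in_place(const char *new_root)
{
    /* Make new_root the current directory so "." refers to it. */
    if (chdir(new_root) != 0) {
        perror("chdir");
        return -1;
    }

    /* Stack the old root on top of the new root at "/". */
    if (syscall(SYS_pivot_root, ".", ".") != 0) {
        perror("pivot_root");
        return -1;
    }

    /*
     * "." now resolves to the mount stacked on top, i.e. the old root.
     * Lazily detach it so that ".." no longer leads into it.
     */
    if (umount2(".", MNT_DETACH) != 0) {
        perror("umount2");
        return -1;
    }

    return 0;
}
```

Until the umount2 step has run, the old root remains mounted over the new one, which is the state the question observes through /.. and nsenter.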

1
  • Thanks, that also explains why I can unmount / to restore sanity. Why though does nsenter put the process into the old root mounted on top of /?
    – dyp
    Commented Jun 17, 2020 at 12:12
-3

Looks like a bona-fide bug...

Check with the latest kernel from your distribution, and report it with full details (attach your test program!).
