Linux Namespaces#

This section covers how containers are isolated from the host, and from each other, using kernel namespaces. Namespaces are arguably the most significant kernel feature behind containers: they virtualize system resources and isolate processes from each other, and namespaces alone are enough to create a container of sorts, see nsexec.

Namespaces#

The definition below is taken from the manual page namespaces(7), as there probably isn't a better one.

A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.

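There are eight namespace types at the moment (the time namespace is the most recent addition), and a process is always a member of exactly one namespace of each type. An initial, global namespace exists for each type, so a process that was never moved anywhere simply lives in the initial ones.

Linux currently has the following namespaces. The number in brackets is the kernel version in which the namespace was introduced.

Mount (2.4.19)
UTS (2.6.19)
IPC (2.6.19)
PID (2.6.24)
Network (2.6.24)
User (3.8)
Cgroup (4.6)
Time (5.6)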
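One way to see which namespaces a process belongs to is through /proc: each process has a directory /proc/<pid>/ns with one symbolic link per namespace type, and two processes share a namespace when the corresponding links read the same. The following is a minimal sketch, not taken from the original text; the function name list_namespaces is made up for illustration.

```shell
#!/bin/sh
# Print the namespace identifiers of a process (default: this shell).
# Each entry under /proc/<pid>/ns is a symlink whose target looks like
# "pid:[<inode number>]"; equal targets mean the same namespace.
list_namespaces() {
    pid="${1:-$$}"
    for link in /proc/"$pid"/ns/*; do
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

list_namespaces
```

Calling list_namespaces with another PID and comparing the output line by line shows which namespaces the two processes have in common. Reading the links of processes owned by another user may require root.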

Mount namespace#

A mount namespace isolates the set of mount points, so different namespaces can have different filesystem trees. Whether a change to a mount point is propagated to other namespaces depends on the propagation type of the mount (shared, private, slave or unbindable), see mount(8) and mount_namespaces(7). In the container context the mounts are typically made private, which means that anything happening to mount points inside the container is not propagated elsewhere, so the container's mount tree is isolated from the host.
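The isolation can be tried out with unshare(1). The sketch below mounts a tmpfs inside a new mount namespace and then checks from the original namespace that nothing is visible there. It assumes that unprivileged user namespaces are enabled, so that --user --map-root-user can stand in for root; where they are not, it only reports "unavailable". The function name mount_demo is made up for illustration.

```shell
#!/bin/sh
# Mount a tmpfs in a new mount namespace and check that the original
# namespace does not see it. Prints "isolated", "leaked" or "unavailable".
mount_demo() {
    dir="$(mktemp -d)" || return 1
    # --user --map-root-user: become root inside a new user namespace,
    # which is what permits the mount without real root privileges.
    if unshare --user --map-root-user --mount \
            sh -c 'mount -t tmpfs tmpfs "$1" && touch "$1/inside"' sh "$dir" \
            2>/dev/null; then
        # Back in the original namespace: the file lived on the tmpfs,
        # so it must not exist in the directory as seen from here.
        if [ -e "$dir/inside" ]; then
            echo "leaked"
        else
            echo "isolated"
        fi
    else
        echo "unavailable"
    fi
    rmdir "$dir"
}

mount_demo
```

The tmpfs disappears together with the namespace as soon as the inner shell exits, which is why no unmount is needed.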

Figure: Mount namespace (image courtesy of Wonchang Song)

PID namespace#

A PID namespace isolates the process ID number space. PID namespaces form a hierarchy: a parent namespace can see all the processes in its child namespaces, but not the other way around. When a new namespace is created, the first process in it gets PID 1 and acts as the init process of that namespace; orphaned processes in the namespace are reparented to it. It should therefore reap its terminated children, because every process in a child namespace also occupies a PID in each ancestor namespace, so zombies that are never reaped keep consuming PIDs all the way up to the root namespace.
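The hierarchy is visible in /proc/<pid>/status, whose NSpid field (available since Linux 4.1) lists the PID a process has in each nested PID namespace. A small sketch for reading it follows; the function name ns_pids is made up for illustration.

```shell
#!/bin/sh
# Print the PIDs of a process (default: this shell) in every PID namespace
# it is a member of, outermost namespace first, separated by spaces.
ns_pids() {
    awk '/^NSpid:/ { $1 = ""; sub(/^ +/, ""); print }' "/proc/${1:-$$}/status"
}

ns_pids
```

For a process in the initial PID namespace this prints a single number. For a process in a nested namespace, read through a procfs that belongs to an ancestor namespace, it prints one PID per level, ending with the PID the process has in its own namespace.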

Figure: PID namespace

Network namespace#

A network namespace provides a completely new network stack, including its own routing tables. A new network namespace contains just the loopback device lo (initially down) and nothing else, so you cannot connect to the network from it (see nsexec). Physical network interfaces can reside in only one namespace at a time, so a namespace is very often connected to the outside with a virtual Ethernet device pair (veth pair), usually together with a Linux bridge. In either case setns(2) comes in handy for entering the namespace to configure a device that has been moved into it.
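The veth setup can be sketched with iproute2. The script below is an illustration under stated assumptions, not the setup used by nsexec: the namespace name demo, the interface names veth-host and veth-demo and the 10.0.0.0/24 addresses are all made up, and the Linux bridge is left out to keep it short. Running the commands for real needs root, so by default the script only prints them; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Connect a new network namespace to the host with a veth pair.
# DRY_RUN defaults to 1, in which case commands are printed, not executed.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

setup_veth() {
    ns="$1"
    run ip netns add "$ns"                                   # new named namespace
    run ip link add veth-host type veth peer name veth-demo  # create the pair
    run ip link set veth-demo netns "$ns"                    # move one end inside
    run ip addr add 10.0.0.1/24 dev veth-host                # host side
    run ip link set veth-host up
    run ip netns exec "$ns" ip addr add 10.0.0.2/24 dev veth-demo
    run ip netns exec "$ns" ip link set veth-demo up
    run ip netns exec "$ns" ip link set lo up                # loopback starts down
}

setup_veth demo
```

With the commands executed, ip netns exec demo ping -c 1 10.0.0.1 should reach the host end of the pair. ip netns exec enters the namespace on your behalf before running the given command.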

Creating new namespaces#

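Two syscalls can create a new namespace: clone(2), which starts a child process in the new namespaces selected by its CLONE_NEW* flags, and unshare(2), which creates new namespaces for the calling process itself (for a PID namespace, only its subsequently created children are placed in it).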

There is also setns(2) which allows you to enter an existing namespace.

unshare and nsenter in the shell#

You can play with namespaces in the shell too: nsenter(1) is the command-line equivalent of setns(2), and unshare(1) is the equivalent of the unshare(2) syscall.

$ unshare --fork --pid --mount-proc 

This runs a new shell in its own PID namespace. The procfs has to be remounted (--mount-proc), as otherwise tools like ps would still show the processes of the parent namespace.
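The same command can be tried without root by adding a user namespace. This is a sketch that assumes unprivileged user namespaces are enabled on the system; where they are not, it prints "unavailable" instead of failing. The function name first_pid is made up for illustration.

```shell
#!/bin/sh
# Run a shell as the first process of a fresh PID namespace and let it
# print its own PID as seen from inside that namespace.
first_pid() {
    unshare --user --map-root-user --fork --pid --mount-proc \
        sh -c 'echo "$$"' 2>/dev/null || echo "unavailable"
}

first_pid
```

The --fork option matters here: unshare(1) itself stays in the old PID namespace, and only the child it forks becomes PID 1 of the new one.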

nsexec#

nsexec is a minimal example of how to use namespaces to isolate processes; one could argue that it creates a container that uses the host filesystem and programs.

$ ./nsexec --help
Create a child process that executes a shell command in new namespace(s),
Usage: ./nsexec [OPTIONS] <CMD>

    -h, --help           print this help
    -n, --net            new network namespace
    -p, --pid            new PID namespace
    -u, --uts HOSTNAME   new UTS namespace
    -v, --verbose        more verbose output

    <CMD>                command to be executed

See the Code

nsexec.c

Example#

$ sudo ./nsexec -npu myhost bash
myhost> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:45 pts/3    00:00:00 bash
root         6     1  0 10:45 pts/3    00:00:00 ps -ef
myhost> ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
myhost> exit
exit

More to read#