Linux Namespaces#

This section covers how the containers are isolated from the host as well as each other using the kernel namespaces. This is actually the most significant kernel feature which virtualizes the resources and isolates the processes from each other and using just namespaces creates a containers of sorts, see nsexec.

Namespaces#

Pasting here the definition from the manual page namespaces(7) as there probably isn't a better one.

A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.

There are 7 namespaces at the moment and a process can be in one or more of them. There are always global namespaces for each of the types so that any process is always in some namespace of each type.

Linux has so far following namespaces.
Number in the brackets is the kernel version when the namespace was introduced

Mount (2.4.19)
Isolates the mount points. A process has it's own view of the mount points and changes are not propagated to other namespaces.
mount_namespaces(7)
UTS (2.6.19)
Isolates the hostname and the NIS domain name. Calling sethostname(2) or setdomainname(2) is affecting only the namespace.
IPC (2.6.19)
Isolates IPC resources. System V IPC objects and POSIX message queues.
PID (2.6.24)
Isolates the process ID number space. Processes in different PID namespaces can have the same PID or can't see PIDs of different namespace.
pid_namespaces(7)
Network (2.6.29)
Isolates the network resources like network devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewalls etc.
User (3.8)
Isolates the user and group resources, unprivileged user in the "root" namespace can be a user ID 0 in the new namespace. When new user namespace is created the user gets full capabilities(7) inside the namespace.
user_namespaces(7)
Cgroups (4.6)
Isolates the view of the /proc/[pid]/cgroup and /proc/[pid]/mountinfo.
cgroup_namespaces(7)

Mount namespace#

Mount namespace isolates the mount points and effectively different namespaces can have different filesystem trees as well as any changes in the mount points may or may not be propagated in the other namespaces depending on the mount types (private, bind, slave etc), see mount(8). In the container context it means that anything happening to mount points inside the container is not propagated elsewhere so they are completely isolated.

Mount Namespace

Image courtesy of Wonchang Song

PID namespace#

PID namespace isolated the PID numbers, they are a hierarchical structure where the parent namespace can view all the PIDs in the child namespaces. When a new namespace is created the first process gets the PID 1 and is a sort of init process of that namespace. It should in the ideal world be able to reap any child processes as otherwise it can actually exhaust the root PID space because of the hierarchical nature.

PID Namespace

Network namespace#

Network namespace creates a completely new network stack including routing tables, in a new network namespace you get just the loopback device lo and nothing else so you are actually unable to connect to the network (see nsexec). Physical network interfaces can reside in only one namespace at a time so very often to connect the namespace somewhere the virtual Ethernet device pair (veth pair) is used with together with Linux bridge. In any case the setns(2) comes handy for adding a device to the namespace.

Creating new namespaces#

There are two syscalls how to create a new namespace.

clone(2)
is like fork(2) but allows you to pick what context you share with the parent process.
unshare(2)
is to disassociate from the parent process context and thus create a new one.

There is also setns(2) which allows you to enter an existing namespace.

unshare and nsenter in the shell#

You can play with the namespaces in the shell too, nsenter(1) is the command line equivalent of setns(2) and unshare(1) is the equivalent of unshare(2) syscall.

$ unshare --fork --pid --mount-proc

Runs a new shell in own PID namespace, it needs to remount the procfs as otherwise tools like ps would still show the parent namespace.

nsexec#

nsexec is a minimal example on how to use namespaces to isolate processes and one could argue that it creates a container using the host filesystem and programs.

./nsexec --help
Create a child process that executes a shell command in new namespace(s),
Usage: ./nsexec [OPTIONS] <CMD>

    -h, --help           print this help
    -n, --net            new network namespace
    -p, --pid            new PID namespace
    -u, --uts HOSTNAME   new UTS namespace
    -v, --verbose        more verbose output

    <CMD>                command to be executed

See the Code

nsexec.c

Example#

$ sudo ./nsexec -npu myhost bash
myhost> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:45 pts/3    00:00:00 bash
root         6     1  0 10:45 pts/3    00:00:00 ps -ef
myhost> ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
myhost> exit
exit

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search

Linux Namespaces#

Namespaces#

Mount namespace#

PID namespace#

Network namespace#

Creating new namespaces#

unshare and nsenter in the shell#

nsexec#

Example#

More to read#