Docker Namespace in kernel level

How can I differentiate PIDs 1, 17, etc. of Docker containers from the host's PIDs 1, 17, etc., and what kernel changes happen when we create a new process inside a Docker container?
How can a process inside a Docker container be seen from the host?

How to differentiate pid 1,17 etc of docker containers with host's 1,17
By default, those PIDs are in different namespaces.
Since issue 10080 and the --pid host option, container PIDs can stay in the host's PID namespace.
There is also issue 10163, "Allow shared PID namespaces", requesting a --pid=container:id option.
Note and update May 2016: issue 10163 and --pid=container:id are now resolved by PR 22481 for docker 1.12, which allows joining another container's PID namespace.
what all the kernel changes are happening when we create a new process inside the docker container
No changes are made at the kernel level; Docker only uses existing kernel features:
cgroups or control groups. A key to running applications in isolation is to have them only use the resources you want.
union file systems to provide the building blocks for containers

Related

Is docker container completely isolated with outside of the docker container?

I wonder whether things like shell script execution can affect anything outside the container. For example, let's say I want to save a file on the host machine from inside the container, without using docker volumes or mounts. Can that be done? Or let's say I want to kill a process running on the host machine with shell commands from inside the container. Can that be done?
No, that is not possible by default. A Docker container's environment is isolated from the host: the only way to change files on the host is to mount a volume from the host into the container. Killing a process outside the container is possible if the container is given access to the host's PID namespace, but that is not common practice.
Docker takes advantage of Linux namespaces to provide the isolated workspace we call a container. When a container is deployed, Docker creates a set of namespaces for that specific container, isolating it from all the other running containers. The various namespaces created for a container include:
PID Namespace: Every process started in a container is assigned a PID within the container's own PID namespace, numbered independently of the host system. Each container has its own PID namespace for its processes.
MNT Namespace: Each container is provided its own namespace for mount directory paths.
NET Namespace: Each container is provided its own view of the network stack avoiding privileged access to the sockets or interfaces of another container.
UTS Namespace: This provides isolation between the system identifiers; the hostname and the NIS domain name.
IPC Namespace: The inter-process communication (IPC) namespace creates a grouping where containers can only see and communicate with other processes in the same IPC namespace.
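On a Linux system, the namespaces a process belongs to are exposed as symbolic links under /proc/<pid>/ns. The following is a minimal Python sketch (not part of the original answer) that reads those links for the five namespaces listed above. The helper name namespace_ids is hypothetical. Two processes share a namespace when the corresponding link targets are identical.

```python
import os

# The five namespace types listed above, named as they appear in /proc/<pid>/ns.
NAMESPACES = ("pid", "mnt", "net", "uts", "ipc")


def namespace_ids(pid="self"):
    """Return {namespace: link target} for a process; None where unreadable."""
    ids = {}
    for name in NAMESPACES:
        try:
            # The link target has the form "<name>:[<inode number>]".
            ids[name] = os.readlink(f"/proc/{pid}/ns/{name}")
        except OSError:
            # Not on Linux, the process is gone, or permissions are insufficient.
            ids[name] = None
    return ids


if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(name, ident)
```

Running this once on the host and once inside a container started with default settings should, under those assumptions, print different identifiers for each namespace.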
Containers allow developers to package large or small amounts of code and their dependencies together into an isolated package. This model then allows multiple isolated containers to run on the same host, resulting in better usage of hardware resources, and decreasing the impact of misbehaving applications on each other and their host system.
I hope it may help you.
You cannot modify host files without mounting them inside the container, though you can mount the host's entire root filesystem inside it (e.g. -v /:/host). As for killing host processes, it is possible if you run the container in host PID mode: docker run --pid=host ....

PID mapping between docker and host

How is a Docker namespace different from the host namespace, and how can PIDs be mapped between the two? Can anyone give me an idea for an easy way of mapping PIDs between the host and Docker using source code?
You can find the mapping in /proc/PID/status file. It contains a line like:
NSpid: 16950 24
Which means that 16950 on the host is 24 inside the container.
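As an illustration, the NSpid line can be parsed with a few lines of Python. This is a sketch under the assumption that the kernel exposes the NSpid field (available since Linux 4.1). The helper names parse_nspid and read_nspid are hypothetical. The leftmost value is the PID in the namespace of the process reading /proc, and each following value is the PID in the next nested namespace.

```python
def parse_nspid(status_text):
    """Return the PIDs from the NSpid line of a /proc/<pid>/status dump.

    The list is ordered from the outermost namespace to the innermost one.
    An empty list means the field is missing (e.g. an older kernel).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def read_nspid(pid):
    """Read and parse the NSpid line for a PID as seen from the current namespace."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_nspid(status_file.read())


if __name__ == "__main__":
    sample = "Name:\tbash\nNSpid:\t16950\t24\n"
    host_pid, container_pid = parse_nspid(sample)
    print(host_pid, container_pid)
```

With the sample line from above, the first value is the host PID and the last is the PID inside the container.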
As I mentioned in "Running docker securely":
Currently, Docker uses five namespaces to alter processes view of the system: Process, Network, Mount, Hostname, Shared Memory.
As I mentioned in your previous question "Docker Namespace in kernel level", container PIDs are isolated from the host by default (unless you run the container with --pid host); that is by design.
If you are using --pid=host, then those container pids are visible from the host, but not easily matched to a particular container, not until issue 10163 and --pid=container:id is resolved.
Update May 2016: issue 10163 and --pid=container:id is actually resolved by PR 22481 for docker 1.12, allowing to join another container's PID namespace.
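One heuristic for matching a host PID to a container, not taken from the answer above, is to look at /proc/<pid>/cgroup from the host. This Python sketch assumes that the cgroup path of a containerized process contains the 64-character hexadecimal container ID, which depends on the Docker version and cgroup driver in use. The helper names are hypothetical.

```python
import re

# Assumption: the cgroup path embeds the full 64-hex-character container ID,
# e.g. ".../docker/<id>" or ".../docker-<id>.scope".
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text):
    """Return the first 64-hex-character ID found in a cgroup dump, or None."""
    match = _CONTAINER_ID.search(cgroup_text)
    return match.group(0) if match else None


def container_id_of(pid):
    """Best-effort lookup of the container ID for a PID seen from the host."""
    try:
        with open(f"/proc/{pid}/cgroup") as cgroup_file:
            return container_id_from_cgroup(cgroup_file.read())
    except OSError:
        # Not on Linux, the process is gone, or permissions are insufficient.
        return None
```

A result of None only means that no such ID was found in the path, not that the process is guaranteed to run outside a container.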

Docker pid namespace and Host

When we run the same process in Docker and on the host system, how can one be differentiated from the other from the perspective of audit logs?
Can I view a process running in Docker from the host system?
You would not run the same process (same pid) in docker and on the host, since the purpose of a container is to provide isolation (of both processes and filesystem).
I mentioned in your previous question "Docker Namespace in kernel level" that the pid of a process run in a container could be made visible from the host.
But in terms of audit logs, you can configure logging drivers in order to follow only containers, and ignore processes running directly on the host.
For instance, in this article, Mark configures rsyslog to isolate the Docker logs into their own file.
To do this create /etc/rsyslog.d/10-docker.conf and copy the following content into the file using your favorite text editor.
# Docker logging
daemon.* {
/var/log/docker.log
stop
}
In summary, this will write all logs for the daemon category to /var/log/docker.log, then stop processing that log entry so it isn't written to the system's default syslog file.
That should be enough to clearly differentiate the logs of host processes (in the regular syslog) from the logs of processes running in containers (in /var/log/docker.log).
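To make the daemon.* selector concrete: every syslog message carries a priority value that combines a facility and a severity, and daemon.* matches any severity within the daemon facility. The Python sketch below decodes that value with the standard syslog module. The helper names are hypothetical, and the sketch only illustrates the selector logic, not how rsyslog is implemented.

```python
import syslog


def facility_of(pri):
    """Strip the severity (the lowest three bits) from a syslog priority value."""
    return pri & ~0x07


def matches_daemon_selector(pri):
    """Return True if a message with this priority would match "daemon.*"."""
    return facility_of(pri) == syslog.LOG_DAEMON


if __name__ == "__main__":
    print(matches_daemon_selector(syslog.LOG_DAEMON | syslog.LOG_INFO))
    print(matches_daemon_selector(syslog.LOG_USER | syslog.LOG_ERR))
```

Messages sent with the daemon facility would be routed to /var/log/docker.log by the rule above, whatever their severity; messages from other facilities would continue to the default rules.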
Update May 2016: issue 10163 and --pid=container:id is closed by PR 22481 for docker 1.12, allowing to join another container's PID namespace.

Docker container with HBA card

How can I attach an HBA card (which is in my physical server running CentOS 7) to a Docker container? I'm doing a POC for migrating our existing environment to Docker, so this is much needed. It's similar to Direct I/O in VMware ESXi (attaching a physical HBA to a VM can be done via Direct I/O).
Docker isn't a hypervisor, containers aren't VMs, and "attaching devices" to a container doesn't necessarily make sense -- a container is just a process running on your host.
You can expose a device node in /dev to a container using the --device flag to docker run, although exposing a block device inside a container usually leads to other complications (e.g., a normal container can't mount filesystems, so you would need to run it with --privileged, which may or may not be acceptable from a security perspective depending on your environment).
For storage, it is more common to mount devices on the host, and then expose those filesystems to the container as Docker volumes (-v /host/path:/container/path).
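To check from inside a container whether a device node passed with --device is actually visible, and whether it is a block or character device, a small Python sketch like the following could be used. The helper name device_kind is hypothetical, as is the /dev/sdb path in the example.

```python
import os
import stat


def device_kind(path):
    """Classify a path as "block", "char", or "other"; None if it is not accessible."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # The node was not exposed to this container, or the path does not exist.
        return None
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "char"
    return "other"


if __name__ == "__main__":
    # /dev/sdb is a hypothetical device node used only for illustration.
    for path in ("/dev/null", "/dev/sdb"):
        print(path, device_kind(path))
```

Seeing a block device node this way says nothing about whether the container is allowed to mount it; that still depends on the privileges discussed above.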

changing transparent_hugepage in docker

I have a container which requires /sys/kernel/mm/transparent_hugepage/enabled to be set to "never". The host has this set to a different value, which I cannot change due to other applications running on the host. Is it impossible to run a container with a different transparent_hugepage value from the host? Both the host and the container are using CentOS 6.6.
I imagine you're referring to Redis, but unfortunately it is impossible. Even if you give the container access to change kernel parameters (via --privileged or --cap-add), it would change for that container, the host, and all other containers.
The kernel is shared between the host and all containers, so they all need to agree on the same kernel parameters. The only exceptions to this rule are the parameters scoped by a kernel namespace:
PID: Process IDs
UTS: Hostnames
Network: Networking parameters such as the TCP backlog
Mount: Mounted filesystems
User: UIDs/GIDs
IPC: Inter-process communication
(cgroups, by contrast, control resource usage; more on cgroups: http://en.wikipedia.org/wiki/Cgroups)
Your specific request is related to a kernel memory-management parameter that applies globally.
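Because the setting is global, reading it from inside a container shows the host's value. The Python sketch below parses the file, in which the active mode is the token in square brackets (for example "always madvise [never]"). The helper names are hypothetical, and the path is the one given in the question.

```python
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text):
    """Return the bracketed (active) mode from the file contents, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_mode(path=THP_PATH):
    """Read the active transparent hugepage mode; None if the file is unreadable."""
    try:
        with open(path) as thp_file:
            return active_thp_mode(thp_file.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_thp_mode())
```

Run on the host and inside any container on that host, this should report the same mode, which is the point made above.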

Resources