Restrict system calls inside docker container - docker

How can I restrict system calls made inside a docker container? If a process makes a blocked system call, it should be denied. In other words, how can I use seccomp with Docker?

You can see more at "Seccomp security profiles for Docker" (the feature is available only if the kernel is configured with CONFIG_SECCOMP enabled).
Support for Docker containers arrived in Docker 1.10: see issue 17142,
allowing the Engine to accept a seccomp profile at container run time.
In the future, we might want to ship builtin profiles, or bake profiles in the images.
PR 17989 has been merged.
It allows for passing a seccomp profile in the form of:
{
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {
            "name": "getcwd",
            "action": "SCMP_ACT_ERRNO"
        }
    ]
}
Example (based on Linux-specific Runtime Configuration - seccomp):
$ docker run --rm -it --security-opt seccomp:/path/to/container-profile.json jess/i-am-malicious
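As a quick sanity check of the profile format, the sketch below writes a profile that allows everything except chmod (the file name no-chmod.json is arbitrary):

```shell
# Allow all syscalls by default, but make chmod fail with an errno.
cat > no-chmod.json <<'EOF'
{
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        { "name": "chmod", "action": "SCMP_ACT_ERRNO" }
    ]
}
EOF
```

Running docker run --rm --security-opt seccomp:no-chmod.json debian chmod +x /bin/true should then fail with "Operation not permitted" (note that later Docker versions switched the syntax to seccomp=profile.json).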

Related

Automatically restart process on crash in an Ubuntu docker container

I have a process in an Ubuntu docker container. If it crashes, I want to restart it automatically.
What is the best way to go about it?
I checked systemd (which is the normal Linux method) but docker doesn't support it. inittab is also deprecated.
Docker offers this functionality out of the box: all you have to do is define a restart policy for the container.
Choose one of the available policies (no, always, on-failure, unless-stopped) and adjust your docker run command accordingly.
From docs:
To configure the restart policy for a container, use the --restart
flag when using the docker run command
For your case, choose one of always or on-failure.
Note: The above is valid only if the process you have mentioned is the container's entrypoint.
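For example (image and container names below are placeholders), restarting the entrypoint process up to five times on a non-zero exit:

```shell
# Restart the container's entrypoint up to 5 times if it exits non-zero.
docker run -d --restart on-failure:5 --name myapp my-image
# The policy of an existing container can be changed later:
docker update --restart unless-stopped myapp
```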

Integrate seccomp profile into Docker image

I have a seccomp profile for my Golang app (generated with go2seccomp) to use with Docker but would like not to have to use it on the command line with --security-opt. Is there a way to "integrate" the profile while building the image? One reason is to avoid having an extra file and be able to docker run it directly.
Maybe I'm not understanding how Docker and seccomp work together, or maybe this is just not an available feature.

Install Docker binary on a server without root access

I have a server from a provider without any root access. It is not possible to write to /etc/ or /var/lib/docker, and Docker is not installed. My idea is to install and run the Docker binary in a local directory, using a shell script that can be started from any directory without root access.
When the script starts ./docker/dockerd --data-root=docker/var/lib/docker I get this error message.
WARN[2018-11-17T18:26:19.492488618+01:00] Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior dir=docker/var/lib/docker error="error getting daemon root's parent mount: open /proc/self/mountinfo: permission denied"
Error starting daemon: open /var/run/docker.pid: permission denied
dockerd has many parameters. Here is the one for the pidfile: -p, --pidfile[=/var/run/docker.pid]
http://manpages.ubuntu.com/manpages/cosmic/man8/dockerd.8.html
Thank you for the help.
#!/bin/bash
DOCKER_RELEASE='docker-18.06.1-ce.tgz'
wget https://download.docker.com/linux/static/stable/x86_64/$DOCKER_RELEASE
tar xzvf $DOCKER_RELEASE
rm $DOCKER_RELEASE
./docker/dockerd --data-root=docker/var/lib/docker
As announced today (Feb. 4th, 2019) by Akihiro Suda:
Finally, it is now possible to run upstream dockerd as an unprivileged user!
See moby/moby PR 38050:
Allow running dockerd in an unprivileged user namespace (rootless mode).
Close #37375 "Proposal: allow running dockerd as an unprivileged user (aka rootless mode)", opened in June 2018
No SETUID/SETCAP binary is required, except newuidmap and newgidmap.
How I did it:
By using user_namespaces(7), mount_namespaces(7), network_namespaces(7), and slirp4netns.
Warning, there are restrictions:
Restrictions:
Only vfs graphdriver is supported.
However, on Ubuntu and a few distros, overlay2 and overlay are also supported.
Starting with Linux 4.18, we will be also able to implement FUSE snapshotters.
(See Graphdriver plugins, where Docker graph driver plugins enable admins to use an external/out-of-process graph driver for use with Docker engine.
This is an alternative to using the built-in storage drivers, such as aufs/overlay/devicemapper/btrfs.)
Cgroups (including docker top) and AppArmor are disabled at the moment.
In future, Cgroups will be optionally available when delegation permission is configured on the host.
Checkpoint is not supported at the moment.
Running rootless dockerd in rootless/rootful dockerd is also possible, but not fully tested.
The documentation is now in docs/rootless.md:
Note the following requirements:
newuidmap and newgidmap need to be installed on the host.
These commands are provided by the uidmap package on most distros.
/etc/subuid and /etc/subgid should contain >= 65536 sub-IDs.
e.g. penguin:231072:65536.
That is:
$ id -u
1001
$ whoami
penguin
$ grep ^$(whoami): /etc/subuid
penguin:231072:65536
$ grep ^$(whoami): /etc/subgid
penguin:231072:65536
Either slirp4netns (v0.3+) or VPNKit needs to be installed.
slirp4netns is preferred for the best performance.
You will have to modify your script:
You need to run dockerd-rootless.sh instead of dockerd.
$ dockerd-rootless.sh --experimental
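For the OP's no-root scenario, the original script could be adapted along these lines (a sketch, assuming Docker 19.03+ and that the get.docker.com/rootless install script is available; paths may differ per distro):

```shell
#!/bin/bash
# Install the rootless bundle into ~/bin (no root needed),
# then start the daemon as the current user.
curl -fsSL https://get.docker.com/rootless | sh
export PATH=$HOME/bin:$PATH
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
dockerd-rootless.sh --experimental &
```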
Update May 2019: Tõnis Tiigi explores this rootless option in "Experimenting with Rootless Docker":
User namespaces map a range of user IDs so that the root user in the inner namespace maps to an unprivileged range in the parent namespace.
A fresh process in user namespace also picks up a full set of process capabilities.
The rootless mode works in a similar way, except we create a user namespace first and start the daemon already in the remapped namespace. The daemon and the containers will both use the same user namespace that is different from the host one.
Although Linux allows creating user namespaces without extended privileges these namespaces only map a single user and therefore do not work with many current existing containers.
To overcome that, rootless mode has a dependency on the uidmap package that can do the remapping of users for us. The binaries in uidmap package use setuid bit (or file capabilities) and therefore always run as root internally.
To make the launching of different namespaces and integration with uidmap simpler Akihiro created a project called rootlesskit.
Rootlesskit also takes care of setting up networking for rootless containers. By default rootless docker uses networking based on moby/vpnkit project that is also used for networking in the Docker Desktop products.
Alternatively, users can install slirp4netns and use that instead.
Again:
Caveats:
Some examples of things that do not work on rootless mode are cgroups resource controls, apparmor security profiles, checkpoint/restore, overlay networks etc.
Exposing ports from containers currently requires a manual socat helper process.
Only Ubuntu based distros support overlay filesystems in rootless mode.
For other systems, rootless mode uses vfs storage driver that is suboptimal in many filesystems and not recommended for production workloads.
I appreciate the OP has moved on, but here's a short answer for others. If the files /etc/subuid and /etc/subgid do not fulfill the prerequisite settings (see code below), then you will be forced to involve someone with root access.
# The rightmost values returned by these commands should be >= 65536.
# If not, you're out of luck unless the admin is willing to help.
grep "^$(whoami):" /etc/subuid
grep "^$(whoami):" /etc/subgid
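If the ranges are missing, whoever has root can add them. A sketch using usermod from shadow-utils (the user name "penguin" and the range are examples matching the docs quoted above; 231072 + 65536 - 1 = 296607):

```shell
# Grant user "penguin" 65536 sub-UIDs and sub-GIDs starting at 231072.
sudo usermod --add-subuids 231072-296607 penguin
sudo usermod --add-subgids 231072-296607 penguin
# Verify:
grep '^penguin:' /etc/subuid /etc/subgid
```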

How to add a device to all docker containers

To make use of SGX enclaves, applications have to talk to the SGX driver, which is exposed via /dev/isgx on the host. We run such applications inside Docker containers, mapping /dev/isgx in with the --device command line option.
Is there an option to add a device (/dev/isgx in this case) to any container ever started by a docker engine?
Edit:
Progress on my side so far:
Docker uses containerd & runc to create a container's configuration before it is started. Docker's configuration file /etc/docker/daemon.json has a field runtimes where one can provide arbitrary arguments to runc:
[...]
"runtimes": {
"runc": {
"path": "runc"
},
"custom": {
"path": "/usr/local/bin/my-runc-replacement",
"runtimeArgs": [
"--debug"
]
}
},
[...]
Sadly, it seems runc does not accept many arguments that are useful for my purposes (runc --help and runc spec --help <-- creates the configuration).
I found interesting source code regarding DefaultSimpleDevices and DefaultAllowedDevices in runc's codebase. The last commit to this file says 'Do not create /dev/fuse by default' which is promising, but would involve building my own runc. I was hoping for a generic solution via a configuration option.
UPDATE
This is not the correct answer. It turns out that Docker's default parent cgroup already has open device permissions:
/# cat /sys/fs/cgroup/devices/docker/devices.list
a *:* rwm
Upon container creation a new cgroup for that container is created with more restricted devices rules.
ORIGINAL ANSWER
I think you could use cgroups to achieve what you want.
You could create a new cgroup on your host machine which allows access to /dev/isgx and start your docker daemon with --cgroup-parent=<my-cgroup-name>.
You could also set the cgroup-parent option in your /etc/docker/daemon.json.
If you have never worked with cgroups before, setting this up might not be trivial, though.
How to create a new cgroup depends on your host system, but you must use the devices controller to whitelist specific devices for a cgroup.
E.g., one way is to use libcgroup's /etc/cgconfig.conf and give read/write access to a block device for cgroup dockerdaemon in the following way:
group dockerdaemon {
devices {
devices.allow = b <device-major>:<device-minor> rw
}
}
Here is one example on how to find out the major/minor of your block device:
sudo udevadm info -n /dev/isgx
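If udevadm is unavailable, stat can print the same numbers (hex major:minor via the %t:%T format specifiers); /dev/null stands in here, since /dev/isgx only exists with the SGX driver loaded:

```shell
# Print the major:minor device numbers (in hex) of a device node.
stat -c '%t:%T' /dev/null   # 1:3 on a typical Linux host
```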
Here are some further links that might give you more insights into the whole cgroup topic:
cgroups in CentOS6
cgroups in redhat
cgroups in Ubuntu
You need something like this in your docker-compose.yaml file (or similar for other Docker-based technologies):
devices:
- "/dev/isgx:/dev/isgx"
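A minimal complete compose file around that fragment might look like this (service and image names are placeholders):

```yaml
services:
  myapp:
    image: my-sgx-app:latest
    devices:
      - "/dev/isgx:/dev/isgx"
```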

Docker root access to host system

When I run a container as a normal user I can map and modify directories owned by root on my host filesystem. This seems to be a big security hole. For example I can do the following:
$ docker run -it --rm -v /bin:/tmp/a debian
root@14da9657acc7:/# cd /tmp/a
root@f2547c755c14:/tmp/a# mv df df.orig
root@f2547c755c14:/tmp/a# cp ls df
root@f2547c755c14:/tmp/a# exit
Now my host filesystem will execute the ls command when df is typed (mostly harmless example). I cannot believe that this is the desired behavior, but it is happening in my system (debian stretch). The docker command has normal permissions (755, not setuid).
What am I missing?
Maybe it is good to clarify a bit more. I am not at the moment interested in what the container itself does or can do, nor am I concerned with the root access inside the container.
Rather I notice that anyone on my system that can run a docker container can use it to gain root access to my host system and read/write as root whatever they want: effectively giving all users root access. That is obviously not what I want. How to prevent this?
There are many Docker security features available to help with Docker security issues. The specific one that will help you is User Namespaces.
Basically you need to enable User Namespaces on the host machine with the Docker daemon stopped beforehand:
dockerd --userns-remap=default &
Note this will forbid containers from running in privileged mode (a good thing from a security standpoint) and start the Docker daemon (which is why it should be stopped before performing this command). When you run a Docker container, you can additionally restrict it to a non-privileged user:
docker run -it --rm -v /bin:/tmp/a --user UID:GID debian
Regardless, try to enter the Docker container afterwards with your default command of
docker run -it --rm -v /bin:/tmp/a debian
If you attempt to manipulate the host filesystem that was mapped into a Docker volume (in this case /bin) where files and directories are owned by root, then you will receive a Permission denied error. This proves that User Namespaces provide the security functionality you are looking for.
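The same remapping can be made persistent by putting it in /etc/docker/daemon.json instead of passing the flag on every daemon start ("default" tells the daemon to create and use the dockremap user):

```json
{
    "userns-remap": "default"
}
```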
I recommend going through the Docker lab on this security feature at https://github.com/docker/labs/tree/master/security/userns. I have done all of the labs and opened Issues and PRs there to ensure the integrity of the labs there and can vouch for them.
Access to run docker commands on a host is effectively root access on that host. This is by design, since the functionality to mount filesystems and isolate an application requires root capabilities on Linux. The security vulnerability here is a sysadmin granting docker access to users they wouldn't otherwise trust with root access on that host. Adding users to the docker group should therefore be done with care.
I still see Docker as a security improvement when used correctly, since applications run inside a container are restricted from what they can do to the host. The ability to cause damage is given with explicit options to running the container, like mounting the root filesystem as a rw volume, direct access to devices, or adding capabilities to root that permit escaping the namespace. Barring the explicit creation of those security holes, an application run inside a container has much less access than it would if it was run outside of the container.
If you still want to try locking down users with access to docker, there are some additional security features. User namespacing is one of those which prevents root inside of the container from having root access on the host. There's also interlock which allows you to limit the commands available per user.
You're missing that containers run as uid 0 internally by default. So this is expected. If you want to restrict the permission more inside the container, build it with a USER statement in Dockerfile. This will setuid to the named user at runtime, instead of running as root.
Note that the uid of this user is not necessarily predictable, as it is assigned inside the image you build, and it won't necessarily map to anything on the outside system. However, the point is, it won't be root.
Refer to Dockerfile reference for more information.
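A minimal sketch of such a Dockerfile (the base image and user name are arbitrary):

```dockerfile
FROM debian:stable-slim
# Create an unprivileged user; everything after the USER line,
# including the container's main process, runs as that user.
RUN useradd --create-home appuser
USER appuser
CMD ["whoami"]
```

Building and running this image with docker build -t demo . && docker run --rm demo should print appuser rather than root.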
