How to add a device to all docker containers - docker

To make use of SGX enclaves, applications have to talk to the SGX driver, which is exposed via /dev/isgx on the host. We run such applications inside Docker containers, mapping /dev/isgx in with the --device command-line option.
Is there an option to add a device (/dev/isgx in this case) to every container ever started by a Docker engine?
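For reference, that per-container invocation looks something like this (the image name is a placeholder):
docker run --device /dev/isgx my-sgx-image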
Edit:
Progress on my side so far:
Docker uses containerd & runc to create a container's configuration before it is started. Docker's configuration file /etc/docker/daemon.json has a runtimes field where one can provide arbitrary arguments to runc:
[...]
"runtimes": {
    "runc": {
        "path": "runc"
    },
    "custom": {
        "path": "/usr/local/bin/my-runc-replacement",
        "runtimeArgs": [
            "--debug"
        ]
    }
},
[...]
Sadly, it seems runc does not accept many arguments useful for my purposes (see runc --help and runc spec --help; the latter creates the configuration).
I found interesting source code regarding DefaultSimpleDevices and DefaultAllowedDevices in runc's codebase. The last commit to that file says 'Do not create /dev/fuse by default', which is promising, but acting on it would mean building my own runc. I was hoping for a generic solution via a configuration option.

UPDATE
This is not the correct answer. It turns out that Docker's default parent cgroup already has open devices permissions:
/# cat /sys/fs/cgroup/devices/docker/devices.list
a *:* rwm
Upon container creation, a new cgroup with more restrictive device rules is created for that container.
ORIGINAL ANSWER
I think you could use cgroups to achieve what you want.
You could create a new cgroup on your host machine which allows access to /dev/isgx and start your docker daemon with --cgroup-parent=<my-cgroup-name>.
You could also set the cgroup-parent option in your /etc/docker/daemon.json.
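For example, a minimal daemon.json sketch (the cgroup name is a placeholder):
{
    "cgroup-parent": "<my-cgroup-name>"
}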
If you have never worked with cgroups before, it might not be trivial to set up, though.
How to create a new cgroup depends on your host system, but you must use the devices controller to whitelist specific devices for a cgroup.
E.g., one way is to use libcgroup's /etc/cgconfig.conf to give read/write access to a device for the cgroup dockerdaemon in the following way (b denotes a block device; a character device such as /dev/isgx would use c instead):
group dockerdaemon {
    devices {
        devices.allow = b <device-major>:<device-minor> rw
    }
}
Here is one way to find out the major/minor numbers of your device:
sudo udevadm info -n /dev/isgx
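The MAJOR and MINOR fields in its output are the values to plug into the devices.allow rule above; an illustrative run (the numbers will differ on your system):
sudo udevadm info -n /dev/isgx | grep -E 'MAJOR|MINOR'
E: MAJOR=10
E: MINOR=58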
Here are some further links that might give you more insights into the whole cgroup topic:
cgroups in CentOS6
cgroups in redhat
cgroups in Ubuntu

You need something like this in your docker-compose.yaml file (or similar for other Docker-based technologies):
devices:
  - "/dev/isgx:/dev/isgx"

Related

Limit docker (or docker-compose) resources GLOBALLY

I'm kinda new to Docker and Docker Compose, plus I recently switched back to Ubuntu after a year or so of using macOS.
I am working with some docker-compose projects that are quite resource-consuming, and when configuring the environment on Ubuntu I stumbled across a problem: with Docker for Mac (https://docs.docker.com/docker-for-mac/) you can specify a maximum resource allocation - like disk space, memory, CPU - for the entire system (so to speak) in the Docker app; on Ubuntu I didn't find such a thing anywhere.
I saw that there is a way to do this for a specific container, but what if I want to - say - allow a max of 6GB of RAM for ALL containers? Is there a way to do this that I'm not seeing?
Thanks a lot!
You need to set up a cgroup with limited CPU and memory and point the Docker engine at it.
An example cgroup config in /etc/systemd/system/my_docker_slice.slice:
[Unit]
Description=my cgroup for Docker
Before=slices.target
[Slice]
MemoryAccounting=true
MemoryHigh=2G
MemoryMax=2.5G
CPUAccounting=true
CPUQuota=50%
and then update your docker daemon.json in /etc/docker/
{
    "cgroup-parent": "/my_docker_slice.slice"
}
Note:
If the cgroup has a leading forward slash (/), the cgroup is created
under the root cgroup, otherwise the cgroup is created under the
daemon cgroup.
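To apply the changes, reload systemd and restart the daemon (assuming a systemd host):
sudo systemctl daemon-reload
sudo systemctl restart docker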
You can read more by searching for "Default cgroup parent" in the dockerd reference documentation.

Apply control group rule to specific (or all) Docker containers in Kubernetes cluster

By default Docker containers are unprivileged. Of course devices can be added individually with docker run --device /dev/abc0 but this cannot yet be done in Kubernetes.
In any event I have an arbitrary number of devices per node, which makes it easier to map /dev and to enable a cgroup rule: docker run -v /dev:/dev --device-cgroup-rule='c 123:* rmw'. How can I pass this --device-cgroup-rule to specific or all Docker containers controlled by Kubernetes? Can a RuntimeClass help? A system-level cgroup config?
If I understand you correctly, you should focus on the kubelet, its support for several container runtimes, and its integration with Docker.
According to this documentation, there are plenty of options to choose from, like:
--cgroup-driver string
Driver that the kubelet uses to manipulate cgroups on the host.
--cgroup-root string
Optional root cgroup to use for pods. This is handled by the container
runtime on a best effort basis. Default: '', which means use the
container runtime default.
--enforce-node-allocatable stringSlice
A comma separated list of levels of node allocatable enforcement to be
enforced by kubelet. Acceptable options are 'pods', 'system-reserved'
& 'kube-reserved'. If the latter two options are specified,
'--system-reserved-cgroup' & '--kube-reserved-cgroup' must also be set
respectively. See
/docs/tasks/administer-cluster/reserve-compute-resources/ for more
details. (default [pods])
--runtime-cgroups string
Optional absolute name of cgroups to create and run the runtime in.
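For illustration, a sketch of how two of these flags might appear on a kubelet command line (the slice name is a placeholder):
kubelet --cgroup-driver=systemd --cgroup-root=/my-pods.slice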
Please look into them and verify if they satisfy your needs.
Please let me know if that helped.

Install Docker binary on a server without root access

I have a server from a provider without any root access. It is not possible to write to /etc/ or /var/lib/docker. Docker is not installed. My idea is to install and run the Docker binary in a directory. I will install Docker with a shell script, which should be able to be started from any directory without root access.
When the script starts ./docker/dockerd --data-root=docker/var/lib/docker, I get this error message:
WARN[2018-11-17T18:26:19.492488618+01:00] Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior dir=docker/var/lib/docker error="error getting daemon root's parent mount: open /proc/self/mountinfo: permission denied"
Error starting daemon: open /var/run/docker.pid: permission denied
dockerd has so many parameters. Here is the one for the pidfile: -p | --pidfile[=/var/run/docker.pid]
http://manpages.ubuntu.com/manpages/cosmic/man8/dockerd.8.html
Thank you for the help
#!/bin/bash
DOCKER_RELEASE='docker-18.06.1-ce.tgz'
wget https://download.docker.com/linux/static/stable/x86_64/$DOCKER_RELEASE
tar xzvf $DOCKER_RELEASE
rm $DOCKER_RELEASE
./docker/dockerd --data-root=docker/var/lib/docker
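One hedged workaround for the pidfile error specifically is to point the PID file and exec root at user-writable paths via standard dockerd flags (the paths below are placeholders; this alone does not grant dockerd the other privileges it needs):
./docker/dockerd \
  --data-root="$PWD/docker/var/lib/docker" \
  --pidfile="$PWD/docker/docker.pid" \
  --exec-root="$PWD/docker/run"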
As announced today (Feb. 4th, 2019) by Akihiro Suda:
Finally, it is now possible to run upstream dockerd as an unprivileged user!
See moby/moby PR 38050:
Allow running dockerd in an unprivileged user namespace (rootless mode).
Close #37375 "Proposal: allow running dockerd as an unprivileged user (aka rootless mode)", opened in June 2018
No SETUID/SETCAP binary is required, except newuidmap and newgidmap.
How I did it:
By using user_namespaces(7), mount_namespaces(7), network_namespaces(7), and slirp4netns.
Warning, there are restrictions:
Restrictions:
Only vfs graphdriver is supported.
However, on Ubuntu and a few distros, overlay2 and overlay are also supported.
Starting with Linux 4.18, we will also be able to implement FUSE snapshotters.
(See Graphdriver plugins, where Docker graph driver plugins enable admins to use an external/out-of-process graph driver for use with Docker engine.
This is an alternative to using the built-in storage drivers, such as aufs/overlay/devicemapper/btrfs.)
Cgroups (including docker top) and AppArmor are disabled at the moment.
In future, Cgroups will be optionally available when delegation permission is configured on the host.
Checkpoint is not supported at the moment.
Running rootless dockerd in rootless/rootful dockerd is also possible, but not fully tested.
The documentation is now in docs/rootless.md:
Note the following requirements:
newuidmap and newgidmap need to be installed on the host.
These commands are provided by the uidmap package on most distros.
/etc/subuid and /etc/subgid should contain >= 65536 sub-IDs.
e.g. penguin:231072:65536.
That is:
$ id -u
1001
$ whoami
penguin
$ grep ^$(whoami): /etc/subuid
penguin:231072:65536
$ grep ^$(whoami): /etc/subgid
penguin:231072:65536
Either slirp4netns (v0.3+) or VPNKit needs to be installed.
slirp4netns is preferred for the best performance.
You will have to modify your script:
You need to run dockerd-rootless.sh instead of dockerd.
$ dockerd-rootless.sh --experimental
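Once the rootless daemon is running, the client has to be pointed at its socket; per the rootless docs, something along these lines (the socket path may vary by version):
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
$ docker run --rm hello-world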
Update May 2019: Tõnis Tiigi does explore this rootless option with "Experimenting with Rootless Docker":
User namespaces map a range of user ID-s so that the root user in the inner namespace maps to an unprivileged range in the parent namespace.
A fresh process in user namespace also picks up a full set of process capabilities.
The rootless mode works in a similar way, except we create a user namespace first and start the daemon already in the remapped namespace. The daemon and the containers will both use the same user namespace that is different from the host one.
Although Linux allows creating user namespaces without extended privileges, these namespaces only map a single user and therefore do not work with many existing containers.
To overcome that, rootless mode has a dependency on the uidmap package that can do the remapping of users for us. The binaries in uidmap package use setuid bit (or file capabilities) and therefore always run as root internally.
To make the launching of different namespaces and integration with uidmap simpler Akihiro created a project called rootlesskit.
Rootlesskit also takes care of setting up networking for rootless containers. By default rootless docker uses networking based on moby/vpnkit project that is also used for networking in the Docker Desktop products.
Alternatively, users can install slirp4netns and use that instead.
Again:
Caveats:
Some examples of things that do not work in rootless mode are cgroups resource controls, apparmor security profiles, checkpoint/restore, overlay networks, etc.
Exposing ports from containers currently requires a manual socat helper process.
Only Ubuntu-based distros support overlay filesystems in rootless mode.
For other systems, rootless mode uses the vfs storage driver, which is suboptimal on many filesystems and not recommended for production workloads.
I appreciate that the OP has moved on, but here's a short answer for others. If the files /etc/subuid and /etc/subgid do not fulfill the prerequisite settings (see the commands below), then you will be forced to involve someone with root access.
# the rightmost value on each matching line should be >= 65536;
# if not, you're out of luck unless an admin raises it for you
grep "^$(whoami):" /etc/subgid
grep "^$(whoami):" /etc/subuid

Restrict system calls inside docker container

How can I restrict the system calls made inside a docker container, so that a given system call is blocked if the process makes it? In other words, how can I use seccomp with Docker?
You can see more at "Seccomp security profiles for Docker" (the feature is available only if the kernel is configured with CONFIG_SECCOMP enabled).
Support for Docker containers will be in Docker 1.10: see issue 17142,
allowing the Engine to accept a seccomp profile at container run time.
In the future, we might want to ship builtin profiles, or bake profiles in the images.
PR 17989 has been merged.
It allows for passing a seccomp profile in the form of:
{
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {
            "name": "getcwd",
            "action": "SCMP_ACT_ERRNO"
        }
    ]
}
Example (based on Linux-specific Runtime Configuration - seccomp):
$ docker run --rm -it --security-opt seccomp:/path/to/container-profile.json jess/i-am-malicious
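With the profile above, which returns an error for getcwd, a quick sanity check might look like this (the image choice is arbitrary):
$ docker run --rm --security-opt seccomp:/path/to/container-profile.json busybox pwd
If the profile is applied, pwd should fail with an "Operation not permitted" style error instead of printing /.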

Usage of loopback devices is strongly discouraged for production use

I want to test docker in my CentOS 7.1 box, I got this warning:
[root@docker1 ~]# docker run busybox /bin/echo Hello Docker
Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
Hello Docker
I want to know the reason and how to suppress this warning.
The CentOS instance is running in virtualbox created by vagrant.
The warning message occurs because your Docker storage configuration is using a "loopback device" -- a virtual block device such as /dev/loop0 that is actually backed by a file on your filesystem. This was never meant as anything more than a quick hack to get Docker up and running quickly as a proof of concept.
You don't want to suppress the warning; you want to fix your storage configuration such that the warning is no longer issued. The easiest way to do this is to assign some local disk space for use by Docker's devicemapper storage driver and use that.
If you're using LVM and have some free space available on your volume group, this is relatively easy. For example, to give docker 100G of space, first create a data and metadata volume:
# lvcreate -n docker-data -L 100G /dev/my-vg
# lvcreate -n docker-metadata -L 1G /dev/my-vg
And then configure Docker to use this space by editing /etc/sysconfig/docker-storage to look like:
DOCKER_STORAGE_OPTIONS=-s devicemapper --storage-opt dm.datadev=/dev/my-vg/docker-data --storage-opt dm.metadatadev=/dev/my-vg/docker-metadata
If you're not using LVM or don't have free space available on your VG, you could expose some other block device (e.g., a spare disk or partition) to Docker in a similar fashion.
There are some interesting notes on this topic here.
Thanks. This was driving me crazy. I thought bash was outputting this message, and I was about to submit a bug against bash. Unfortunately, none of the options presented are viable on a laptop or similar machine where the disk is fully utilized. Here is my answer for that scenario.
Here is what I used in the /etc/sysconfig/docker-storage on my laptop:
DOCKER_STORAGE_OPTIONS="--storage-opt dm.no_warn_on_loop_devices=true"
Note: I had to restart the docker service for this to have an effect. On Fedora the command for that is:
systemctl stop docker
systemctl start docker
There is also just a restart command (systemctl restart docker), but it is a good idea to check to make sure stop really worked before starting again.
If you don't mind disabling SELinux in your containers, another option is to use overlay. Here is a link that describes that fully:
http://www.projectatomic.io/blog/2015/06/notes-on-fedora-centos-and-docker-storage-drivers/
In summary for /etc/sysconfig/docker:
OPTIONS='--selinux-enabled=false --log-driver=journald'
and for /etc/sysconfig/docker-storage:
DOCKER_STORAGE_OPTIONS=-s overlay
When you change the storage type, restarting docker will destroy your complete image and container store. You may as well clean everything up in the /var/lib/docker folder when doing this:
systemctl stop docker
rm -rf /var/lib/docker
dnf reinstall docker
systemctl start docker
In RHEL 6.6 any user with docker access can access my private keys, and run applications as root with the most trivial of hacks via volumes. SELinux is the one thing that prevents that in Fedora and RHEL 7. That said, it is not clear how much of the additional RHEL 7 security comes from SELinux outside the container and how much inside the container...
Generally, loopback devices are fine for instances where the 100GB maximum and slightly reduced performance are not a problem. The only issue I can find is that the docker store can become corrupted if you hit a disk-full error while running... That can probably be avoided with quotas, or other simple solutions.
However, for a production instance it is definitely worth the time and effort to set this up correctly.
100G may be excessive for your production instance. Containers and images are fairly small. Many organizations are running docker containers within VMs as an additional measure of security and isolation. If so, you might have a fairly small number of containers running per VM, in which case even 10G might be sufficient.
One final note. Even if you are using direct lvm, you probably want an additional filesystem for /var/lib/docker. The reason is that the command "docker load" will create an uncompressed version of the images being loaded in this folder before adding them to the data store. So if you are trying to keep things small and light, explore options other than direct lvm.
@Igor Ganapolsky and @Mincă Daniel Andrei
Check this:
systemctl edit docker --full
If the directive EnvironmentFile is not listed in the [Service] block, then no luck (I also have this problem on CentOS 7), but you can extend the standard systemd unit like this:
systemctl edit docker
and add:
[Service]
EnvironmentFile=-/etc/sysconfig/docker
ExecStart=
ExecStart=/usr/bin/dockerd $OPTIONS
And create a file /etc/sysconfig/docker with content:
OPTIONS="-s overlay --storage-opt dm.no_warn_on_loop_devices=true"
