Cannot open vfio device in Docker container as non-root user

I have enabled virtualization in the BIOS and enabled the IOMMU on the kernel command line (intel_iommu=on).
I bound a Solarflare NIC to the vfio-pci driver and added a udev rule to ensure the vfio device is accessible by my non-root user (e.g., /etc/udev/rules.d/10-vfio-docker-users.rules):
SUBSYSTEM=="vfio", OWNER="myuser", GROUP="myuser"
I've launched my container with -u 1000 and mapped /dev (-v /dev:/dev). Running in an interactive shell in the container, I am able to verify that the device is there with the permissions set by my udev rule:
bash-4.2$ whoami
whoami: unknown uid 1000
bash-4.2$ ls -al /dev/vfio/35
crw-rw---- 1 1000 1000 236, 0 Jan 25 00:23 /dev/vfio/35
However, if I try to open it (e.g., python -c "open('/dev/vfio/35', 'rb')"), I get IOError: [Errno 1] Operation not permitted: '/dev/vfio/35'. Yet the same command works outside the container as the normal non-root user with UID 1000!
It seems that there are additional security measures that are not allowing me to access the vfio device within the container. What am I missing?

Docker drops a number of privileges by default, including the ability to access most devices. You can explicitly grant access to a device using the --device flag, which would look something like:
docker run --device /dev/vfio/35 ...
Alternately, you can ask Docker not to drop any privileges:
docker run --privileged ...
You'll note that in both of the above examples it was not necessary to explicitly bind-mount /dev; in the first case, the device(s) you have exposed with --device will show up, and in the second case you see the host's /dev by default.
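For example, combining the flags above with the non-root user from the question, a run command might look roughly like this (a sketch; the image name is a placeholder, and /dev/vfio/vfio is the vfio container device that userspace drivers typically also need to open):
docker run -u 1000 --device /dev/vfio/vfio --device /dev/vfio/35 -it my-vfio-image bash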

Related

Logspout container in Docker

I am trying to deploy the logspout container in Docker, but I keep running into an issue. I have searched this website and GitHub to no avail, so I'm hoping someone knows the answer.
I followed the following commands as per the Readme here: https://github.com/gliderlabs/logspout
(1) docker pull gliderlabs/logspout:latest (also tried with logspout:master, same results)
(2) docker run -d --name="logspout" --volume=/var/run/docker.sock:/var/run/docker.sock --publish=127.0.0.1:8000:80 gliderlabs/logspout (also tried with -v /var/run/docker.sock:/var/run/docker.sock, same results)
The container gets created but stops immediately. When I check the container logs (docker container logs logspout), I only see the following entries:
2021/12/19 06:37:12 # logspout v3.2.14 by gliderlabs
2021/12/19 06:37:12 # adapters: raw syslog tcp tls udp multiline
2021/12/19 06:37:12 # options :
2021/12/19 06:37:12 persist:/mnt/routes
2021/12/19 06:37:12 # jobs : pump routes http[health,logs,routes]:80
2021/12/19 06:37:12 # routes : none
2021/12/19 06:37:12 pump ended: Get http://unix.sock/containers/json?: dial unix /var/run/docker.sock: connect: no such file or directory
I checked docker.sock: ls -la /var/run/docker.sock shows srw-rw---- 1 root docker 0 Dec 12 09:49 /var/run/docker.sock. So docker.sock does exist, which adds to the confusion as to why the container can't find it.
I am new to Linux/Docker, but my understanding is that using -v or --volume would automatically mount the location into the container, which does not seem to be happening here. So I am wondering if anyone has a suggestion on what needs to be done so that the logspout container can find docker.sock.
System Info: Docker version 20.10.11, build dea9396; Raspberry Pi 4 ARM 64, OS: Debian GNU/Linux 11 (bullseye)
EDIT: added a note about the -v flag in step (2) above
The container must be able to access the Docker Unix socket to mount it. This is typically a problem when user namespace remapping is enabled. To disable remapping for the logspout container, pass the --userns=host flag to docker run, docker create, etc.
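With that flag added, the run command from the question becomes roughly:
docker run -d --name="logspout" --userns=host --volume=/var/run/docker.sock:/var/run/docker.sock --publish=127.0.0.1:8000:80 gliderlabs/logspout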

Docker device-cgroups-rule, mknod and mount

I'm attempting to implement what is described here:
https://docs.docker.com/engine/reference/commandline/create/#dealing-with-dynamically-created-devices---device-cgroup-rule
Similar to that page, I am creating (and then starting) a container as follows:
docker create --device-cgroup-rule='b 8:* rmw' --name my-container my-image
Quoting from the above page
Then, a user could ask udev to execute a script that would docker exec
my-container mknod newDevX c 42 the required device when it is
added.
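For context, the kind of udev hook those docs describe might look roughly like the sketch below; the rule file name, match keys, and helper script are hypothetical, not something the docs spell out:
# /etc/udev/rules.d/99-docker-mknod.rules (%k = kernel name, %M = major, %m = minor)
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?*", RUN+="/usr/local/bin/docker-mknod.sh %k %M %m"
#!/bin/sh
# /usr/local/bin/docker-mknod.sh: recreate the hot-plugged block device node inside the container
docker exec my-container mknod "/dev/$1" b "$2" "$3"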
Within the container (docker exec -it my-container sh) I then mknod a device:
mknod /dev/sdc1 b 8 33
The device is reported by lsblk as follows:
sdc 8:32 1 500M 0 disk
└─sdc1 8:33 1 500M 0 part
mknod succeeds but mounting /dev/sdc1 gives an error:
$ mount /dev/sdc1 /mnt
mount: /mnt: permission denied.
I also tried various other things, like:
mknod with -m
docker start with --cap-add=CAP_MKNOD
EDIT:
I also tried starting with --privileged but without /dev/sdc1 pre-created, and it worked. It must have something to do with capabilities or other differences between privileged and non-privileged mode. I tried with --cap-add=CAP_MKNOD and CAP_SYS_ADMIN, but it now reports a different message:
$ mount /dev/sdc1 /mnt
mount: /mnt: cannot mount /dev/sdc1 read-only.

Why does Loki's Docker Driver Client stop logging after some time?

I want to send logs of my Docker containers to Grafana Loki. Therefore, I installed Loki's Docker Driver Client and started my containers with it. First I can see logs, but after some time I see no more logs.
Installation
I installed Loki's Docker Driver Client as a Docker plugin on my Docker Engine (version 20.10.2):
$ docker plugin install grafana/loki-docker-driver:master-54d1d3b --alias loki --grant-all-permissions
I didn't use the tag latest because of the bug Unable to connect to logging plugin in Swarm.
Configuration
I started my Docker containers with Loki's Docker Driver Client as log driver:
$ docker container run \
    --log-driver=loki \
    --log-opt loki-url="$LOKI_URL" \
    --log-opt loki-retries=5 \
    --log-opt loki-batch-size=400 \
    --log-opt max-size="10m" \
    --log-opt max-file=5 \
    --detach \
    --name $CONTAINER_NAME \
    --restart unless-stopped \
    $IMAGE:$TAG
I also added the json-file driver's max-size and max-file options to limit disk space, see Configuring the Docker Driver.
Problem
First I could see logs in Grafana and on the command line with docker container logs, but after some time no more logs were shown. When I tried to look at the logs on the Docker host, I saw an error:
$ docker container logs 75d4b13eb3e8
error from daemon in stream: Error grabbing logs: error getting log reader: LogDriver.ReadLogs: logger does not exist for 75d4b13eb3e8203b9247ecdeb41fdf495cc8fea7dcfc4775fd8261263b1dcd32
Research
I looked into the directories of the containers (see Where is a log file with logs from a container?), but I couldn't see any log files:
$ sudo ls /var/lib/docker/containers/75d4b13eb3e8203b9247ecdeb41fdf495cc8fea7dcfc4775fd8261263b1dcd32
checkpoints config.v2.json hostconfig.json hostname hosts mounts resolv.conf resolv.conf.hash
I also checked the log path (see Get an instance’s log path), but it was empty:
$ docker inspect --format='{{.LogPath}}' 75d4b13eb3e8
I found the container's logs in the plugin's directory (see Loki log driver not storing logs as files on disk, even with keep-file: true), but the log files are no longer updated:
$ sudo ls -la /var/lib/docker/plugins/eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288/rootfs/var/log/docker/75d4b13eb3e8203b9247ecdeb41fdf495cc8fea7dcfc4775fd8261263b1dcd32
total 912
drwxr-xr-x 2 root root 4096 Jan 22 12:59 .
drwxr-xr-x 17 root root 4096 Jan 22 15:46 ..
-rw-r----- 1 root root 923177 Jan 22 13:34 json.log
I looked into Docker daemon's logs (see Read the logs) and found errors and a warning (at the same time logging stopped):
$ sudo journalctl -u docker.service | grep eac33cc9913c
[...]
[...]level=error msg="panic: send on closed channel" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="goroutine 153 [running]:" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="main.(*loki).Log(0xc0000c5e00, 0xc0001d81c0, 0xc0000c5e80, 0x0)" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="\t/src/loki/cmd/docker-driver/loki.go:69 +0x2fb" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="main.consumeLog(0xc0002c0480)" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="\t/src/loki/cmd/docker-driver/driver.go:165 +0x4c2" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="created by main.(*driver).StartLogging" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=error msg="\t/src/loki/cmd/docker-driver/driver.go:116 +0xa75" plugin=eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288
[...]level=warning msg="Unable to connect to plugin: /run/docker/plugins/eac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288/loki.sock/LogDriver.StopLogging: Post http://%2Frun%2Fdocker%2Fplugins%2Feac33cc9913ca962a189904392e516dd495d6fd52391fb5af4a34af46b281288%2Floki.sock/LogDriver.StopLogging: EOF, retrying in 1s"
[...]
What did I do wrong?
I was experiencing the same issue.
The only differences in my configuration are that I'm trialing the latest Enterprise Edition (19.03), as it brings dual-logging capability (though this is also supported in the latest CE versions), and that I'm using the latest Loki Docker driver client, now that the GitHub issue mentioned above has been resolved.
I ended up setting the log-opts properties no-file and keep-file in docker-compose.yml:
logging:
  driver: "loki"
  options:
    loki-url: "http://${LOKI_URL}:3100/loki/api/v1/push"
    loki-batch-size: "400"
    no-file: "false"
    keep-file: "true"
    max-size: "5m"
    max-file: "3"
Since making this change I am receiving logs in Loki and can still use docker container logs and docker service logs on my Docker hosts.
no-file: "false" tells the driver to continue creating logs on disk and keep-file: "true" tells the driver to keep json logs if the container is stopped (by default files are removed).
Note: Originally I was adding these settings to /etc/docker/daemon.json on the host, but I would still see the error getting log reader issue; I had to switch to specifying the log driver per container/Swarm service.
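For containers started directly with docker run instead of Compose (as in the question), the same options can be passed via --log-opt; a rough equivalent of the logging block above would be:
docker container run \
    --log-driver=loki \
    --log-opt loki-url="http://${LOKI_URL}:3100/loki/api/v1/push" \
    --log-opt loki-batch-size=400 \
    --log-opt no-file=false \
    --log-opt keep-file=true \
    --log-opt max-size=5m \
    --log-opt max-file=3 \
    ...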
Regarding this issue
First I could see logs in Grafana and in command line with docker container logs, but after some time no more logs were shown.
In Grafana, select Query type: Range rather than Instant, and you will see all the logs for the selected period of time, if they exist in Loki.

who and w commands in CentOS 8 Docker container

While playing with CentOS 8 in a Docker container I found that the output of the who and w commands is always empty.
[root@9e24376316f1 ~]# who
[root@9e24376316f1 ~]# w
01:01:50 up 7:38, 0 users, load average: 0.00, 0.04, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
Even when I'm logged in as a different user in a second terminal.
When I try to write to this user, it shows:
[root@9e24376316f1 ~]# write test
write: test is not logged in
Is this because of Docker? Maybe it works in some way that prevents sessions from seeing each other?
Or maybe that's some other issue. I would really appreciate some explanation.
These utilities obtain the information about current logins from the utmp file (/var/run/utmp). You can easily check that in ordinary circumstances (e.g. on the desktop system) this file contains something like the following string (here qazer is my login and tty7 is a TTY where my desktop environment runs):
$ cat /var/run/utmp
tty7:0qazer:0�o^�
while in the container this file is (usually) empty:
$ docker run -it centos
[root@5e91e9e1a28e /]# cat /var/run/utmp
[root@5e91e9e1a28e /]#
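If utmpdump from util-linux happens to be available (an assumption about your images), it prints the same records in a readable form, which makes the host/container comparison clearer:
$ utmpdump /var/run/utmp                               # on the host: one record per login
$ docker run --rm -it centos utmpdump /var/run/utmp    # in the container: nothing to print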
Why?
The utmp file is usually modified by programs which authenticate the user and start the session: login(1), sshd(8), lightdm(1). However, the container engine cannot rely on them, as they may be absent from the container file system, so "logging in" and "executing on behalf of" are implemented in the most primitive and straightforward manner, avoiding any reliance on anything inside the container.
When any container is started or any command is exec'd inside it, the container engine just spawns a new process, arranges some security settings, calls setgid(2)/setuid(2) to forcibly (without any authentication) alter the process' UID/GID, and then executes the required binary (the entry point, the command, and so on) within this process.
Say, I start the CentOS container running its main process on behalf of UID 42:
docker run -it --user 42 centos
and then try to execute sleep 1000 inside it:
docker exec -it $CONTAINER_ID sleep 1000
The container engine will perform something like this:
[pid 10170] setgid(0) = 0
[pid 10170] setuid(42) = 0
...
[pid 10170] execve("/usr/bin/sleep", ["sleep", "1000"], 0xc000159740 /* 4 vars */) = 0
There will be no writes to /var/run/utmp, thus it will remain empty, and who(1)/w(1) will not find any logins inside the container.

How to get host's udev events from a Docker container?

In a Docker container, I am looking for a way to get the udev events on the host.
Running udevadm monitor inside a container reports only the host's kernel events, not the udev events.
The question is whether there is a way to detect the host's udev events, or to forward the host's events to containers.
This is how I made my container receive the host's udev events:
docker run --net=host -v /run/udev/control:/run/udev/control
--net=host allows the container and the host to communicate through PF_NETLINK sockets, which udev monitor uses to receive kernel events (found here).
/run/udev/control is a file which udev monitor uses to check whether udevd is already running. If it doesn't exist, monitoring is disabled.
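Putting the two together, a monitoring session might look roughly like this (a sketch; the image name is a placeholder and udevadm must be present in it):
docker run --rm -it --net=host -v /run/udev/control:/run/udev/control my-udev-image udevadm monitor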
Just like the above answer pointed out, we could enable --net=host, but the host network is not recommended, for multiple well-known reasons.
In fact, this issue happens because NETLINK is needed to communicate between kernel and user space; without the host network, the host and the container are in different network namespaces. Enabling udev inside the container puts udevd and udevadm monitor in the same namespace, so there is no need to use the host network.
When we ran into this issue, we did the following:
# apt-get install udev
# vim /etc/init.d/udev to comment out some special settings:
1) Comment out the following:
#if [ ! -e "/run/udev/" ]; then
# warn_if_interactive
#fi
2) Comment out the following:
#if ! ps --no-headers --format args ax | egrep -q '^\['; then
# log_warning_msg "udev does not support containers, not started"
# exit 0
#fi
# root@e751e437a8ba:~# service udev start
[ ok ] Starting hotplug events dispatcher: systemd-udevd.
[ ok ] Synthesizing the initial hotplug events (subsystems)...done.
[ ok ] Synthesizing the initial hotplug events (devices)...done.
[ ok ] Waiting for /dev to be fully populated...done.
