I have a server by a provider without any root access. It is not possible to write scripts in /etc/ or /var/lib/docker. Docker is not installed. My idea is to install and run docker binary in directory. I will install docker with a shell script. The script should be able to be started from any directory without root access.
When the script starts ./docker/dockerd --data-root=docker/var/lib/docker I get this error message.
WARN[2018-11-17T18:26:19.492488618+01:00] Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior dir=docker/var/lib/docker error="error getting daemon root's parent mount: open /proc/self/mountinfo: permission denied"
Error starting daemon: open /var/run/docker.pid: permission denied
dockerd has so many parameter. Here for the pidfile: -p | **--pidfile*[=/var/run/docker.pid]
http://manpages.ubuntu.com/manpages/cosmic/man8/dockerd.8.html
Thank you for the help
#!/bin/bash
DOCKER_RELEASE='docker-18.06.1-ce.tgz'
wget https://download.docker.com/linux/static/stable/x86_64/$DOCKER_RELEASE
tar xzvf $DOCKER_RELEASE
rm $DOCKER_RELEASE
./docker/dockerd --data-root=docker/var/lib/docker
As announced today (Feb. 4th, 2019) by Akihiro Suda:
Finally, it is now possible to run upstream dockerd as an unprivileged user!
See moby/moby PR 38050:
Allow running dockerd in an unprivileged user namespace (rootless mode).
Close #37375 "Proposal: allow running dockerd as an unprivileged user (aka rootless mode)", opened in June 2018
No SETUID/SETCAP binary is required, except newuidmap and newgidmap.
How I did it:
By using user_namespaces(7), mount_namespaces(7), network_namespaces(7), and slirp4netns.
Warning, there are restrictions:
Restrictions:
Only vfs graphdriver is supported.
However, on Ubuntu and a few distros, overlay2 and overlay are also supported.
Starting with Linux 4.18, we will be also able to implement FUSE snapshotters.
(See Graphdriver plugins, where Docker graph driver plugins enable admins to use an external/out-of-process graph driver for use with Docker engine.
This is an alternative to using the built-in storage drivers, such as aufs/overlay/devicemapper/btrfs.)
Cgroups (including docker top) and AppArmor are disabled at the moment.
In future, Cgroups will be optionally available when delegation permission is configured on the host.
Checkpoint is not supported at the moment.
Running rootless dockerd in rootless/rootful dockerd is also possible, but not fully tested.
The documentation is now in docs/rootless.md:
Note the following requirements:
newuidmap and newgidmap need to be installed on the host.
These commands are provided by the uidmap package on most distros.
/etc/subuid and /etc/subgid should contain >= 65536 sub-IDs.
e.g. penguin:231072:65536.
That is:
$ id -u
1001
$ whoami
penguin
$ grep ^$(whoami): /etc/subuid
penguin:231072:65536
$ grep ^$(whoami): /etc/subgid
penguin:231072:65536
Either slirp4netns (v0.3+) or VPNKit needs to be installed.
slirp4netns is preferred for the best performance.
You will have to modify your script:
You need to run dockerd-rootless.sh instead of dockerd.
$ dockerd-rootless.sh --experimental"
Update May 2019: Tõnis Tiigi does explore this rootless option with "Experimenting with Rootless Docker":
User namespaces map a range of user ID-s so that the root user in the inner namespace maps to an unprivileged range in the parent namespace.
A fresh process in user namespace also picks up a full set of process capabilities.
The rootless mode works in a similar way, except we create a user namespace first and start the daemon already in the remapped namespace. The daemon and the containers will both use the same user namespace that is different from the host one.
Although Linux allows creating user namespaces without extended privileges these namespaces only map a single user and therefore do not work with many current existing containers.
To overcome that, rootless mode has a dependency on the uidmap package that can do the remapping of users for us. The binaries in uidmap package use setuid bit (or file capabilities) and therefore always run as root internally.
To make the launching of different namespaces and integration with uidmap simpler Akihiro created a project called rootlesskit.
Rootlesskit also takes care of setting up networking for rootless containers. By default rootless docker uses networking based on moby/vpnkit project that is also used for networking in the Docker Desktop products.
Alternatively, users can install slirp4netns and use that instead.
Again:
Caveats:
Some examples of things that do not work on rootless mode are cgroups resource controls, apparmor security profiles, checkpoint/restore, overlay networks etc.
Exposing ports from containers currently requires manual socat helper process.
Only Ubuntu based distros support overlay filesystems in rootless mode.
For other systems, rootless mode uses vfs storage driver that is suboptimal in many filesystems and not recommended for production workloads.
I appreciate the OP has moved on, but here's a short answer for others. If the files /etc/subuid and /etc/subgid do not fulfill the prerequisite settings (see code below), then you will be forced to involve someone with root access.
# rightmost values returned from these commands should be >= 65536
# if not, you're out of luck if admin doesn't like you.
cat /etc/subgid | grep `whoami`
cat /etc/subuid | grep `whoami`
Related
When trying to remove a docker container (for example when trying to docker-compose down) I always get these errors:
ERROR: for <my_container> container d8424f80ef124c2f3dd8f22a8fe8273f294e8e63954e7f318db93993458bac27: driver "overlay2" failed to remove root filesystem: unlinkat /var/lib/docker/overlay2/64311f2ee553a5d42291afa316db7aa392a29687ffa61971b4454a9be026b3c4/merged: device or resource busy
Common advice like restarting the docker service, pruning or force removing the containers doesn't work. The only thing that I found that works is manually unmounting with sudo umount /home/virtfs/xxxxx/var/lib/docker/overlay2/<container_id>/merged, then I'm able to remove the container.
My OS is CentOS Linux release 7.9.2009 (Core) with kernel version 3.10.0-1127.19.1.el7.x86_64. I thought this was maybe due to overlay2 clashing with CentOS, but according to this page my CentOS/kernel version should work. It would be great to find a solution to this, because I ideally want to docker-compose down without having to use elevated privileges to umount beforehand.
From the error log, it is observed that the file mounting is involved
Execute the following command to view the related processes
grep 64311f2ee553a5d42291afa316db7aa392a29687ffa61971b4454a9be026b3c4 /proc/*/mountinfo
ps -ef | grep "The process ID obtained by the grep command above"
Stop the occupied process
Then delete the container
Often this happens when there are no processes listed as blocking, then you know it's a kernel module blocking it.
The most likely culprits are nfs (not sure why you'd run that in docker), or files inside docker that are bind-mounted, sometimes the automatic ones, such as perhaps ones created by systemd-networkd.
Overlay2 was phased out by Ubuntu for a reason. CentOS is at its end of life, so this problem might be resolved already in your most likely upgrade path, to Rocky Linux. Alternately you can enter the jungle of migrating your docker storage engine.
Or you can just get rid of the package or software hogging it in the first place, if you can. But you'll have to share more info on what it is for help on that.
I am trying to run DPDK in a non-privileged docker container. While I can limit the container's privileges and specify the container as non-privileged, I still need to run a dpdk application (say testpmd) as root. I can also run the container as non-root and use sudo to start testpmd.
I was wondering if anyone is able to run dpdk (without the --no-huge option) as non-root user, inside a docker container. If so, are there certain privileges or permissions that need to be granted?
UPDATED:
I'm using DPDK 20.02. I think I've narrowed down the problem to a ulimit -l setting.
From testpmd:
EAL: cannot set up DMA remapping, error 12 (Cannot allocate memory)
From Dmesg
dmesg: [ 5697911.199003] vfio_pin_pages_remote: RLIMIT_MEMLOCK (65536) exceeded.
In response to Vipin:
Did you need to adjust the limits for the container? if so how?
I am using helm to deploy the pods so I'm not sure if I can modify the docker run command, it looks like I would need to edit /etc/security/limits.conf on the host and redeploy.
Also, what did you use to give ownership of the fs? Doesn't having a non-privilieged container prevent you? For testing, I just sudo it, but ultimately I want to be able to drop SETUID/SETGID.
We can run DPDK on Host or inside docker with non root user.
To run DPDK as non-root user
Create or choose the user without root privelleges
set access to RUNTIME directory value as export XDG_RUNTIME_DIR=/tmp (since all users has access to tmp folder and on certain distros /var/run might not be accessible
Mount the huge pages to similar folder mkdir -p /tmp/mnt/huge; mount -t hugetlbfs nodev /tmp/mnt/huge
assign ownership to user to access the huge page as chown -R [non-root user]:[non-root user] /tmp/mnt/huge
If access to devices are required check the same with either iommu or no-iommu drivers using lsmod | grep vfio
change the ownership of the device chown -R [non-root user]:[non-root user] /dev/vfio/[device id]
user DPDK rte_eal_init option --huge-dir o point to /tmp/mnt/huge
Certian PMD might fail even after step 7, for those use option --legacy-mem this resolves the issue.
In order to run the DPDK application inside a docker, couple more things need to addressed
Use DPDK either 19.11 LTS or greater (there are patches related to docker, namespace, memory limit)
certain SE policy (Linux) does not allow sharing of huge Pages, so use option --in-memory to disable sharing of MMAP to huge pages (this should avoid most of the issues).
Note: assumption made
there is only one single application to run on docker
as mentioned in the comments --privileged -v /sys/bus/pci/drivers:/sys/bus/pci/drivers -v /sys/kernel/mm/hugepages:/sys/kernel/mm/hugepages -v /sys/devices/system/node:/sys/devices/system/node -v /dev:/dev are used to run DPDk in docker with sudo privelleges.
I assume based on the question ulimit -c unlimited cannot be executed also.
If there are multiple dockers running dpdk application always use --file-prefix to distinguish.
I have not tried this with DPDK 21.02, 21.05, 21.08
[EDIT-1] the earlier question that got removed is running DPDK as non root
During my studies, I came across the fact that Docker containers don't support neither sudo nor systemd services. Not that I need these tools but I'm just curious about the topic and couldn't find an adequate reasoning.
Docker is aimed at being minimal, since there can be many, many containers running at the same time. The idea is to reduce memory and disk usage. Since containers already run as root to begin with unless otherwise specified, there's no need to have sudo. Also, since most containers only ever run one process, there's no need for a service manager like systemd. Even if they did need to run more than one process, there are smaller programs like supervisord.
sudo is unnecessary in Docker. A container generally runs a single process, and if you intend it to run as not-root, you don't generally want it to be able to become root arbitrarily. In a Dockerfile, you can use USER to switch users as many times as you'd like; outside of Docker, you can use docker run -u root or docker exec -u root to get a root shell no matter how the container is configured.
Mechanically, sudo is bad for non-interactive environments (especially, it's very prone to asking for a user password) and users in Docker aren't usually configured with passwords at all. The most common recipe I see involves echo plain-text-password | passwd user, in a file committed to source control, and also easily retrieved via docker history; this is not good security practice.
systemd is unnecessary in Docker. A container generally runs a single process, so you don't need a process manager. Running systemd instead of the process you're trying to run also means you don't get anything useful from docker logs, can't use Docker restart policies effectively, and generally miss out on the core Docker ecosystem.
systemd also runs against the Unix philosophy of "make each program do one thing well". If you look at the set of things listed out on the systemd home page it sets up a ton of stuff; much of that is system-level things that belong to the host (swap, filesystem mounts, kernel parameters) and other things that you can't run in Docker (console getty processes). This also means you usually can't run systemd in a container without it being --privileged, which in turn means it can interfere with this system-level configuration.
There are some good technical reasons to run a dedicated init process in Docker, but a lightweight single-process init like tini is a better choice.
Beside what #Aplet123 mentioned,consider that since the containers themselves don't have root access and even cannot see other processes in the system(unless created by the --ipc option), they cannot cause any harm to your system by any means even if all the processes within the container have root access.So there's no need to limit that already-limited environment with non-root users.And when there is only one user,there's no need to have sudo.
Also starting and stopping the containers as services can be done by docker itself,so the docker daemon(which itself has been started via systemd) is in fact the Master SystemD for all containers.So there's no need to have systemd too for example when you want to start your apache HTTP server.
When I run a container as a normal user I can map and modify directories owned by root on my host filesystem. This seems to be a big security hole. For example I can do the following:
$ docker run -it --rm -v /bin:/tmp/a debian
root#14da9657acc7:/# cd /tmp/a
root#f2547c755c14:/tmp/a# mv df df.orig
root#f2547c755c14:/tmp/a# cp ls df
root#f2547c755c14:/tmp/a# exit
Now my host filesystem will execute the ls command when df is typed (mostly harmless example). I cannot believe that this is the desired behavior, but it is happening in my system (debian stretch). The docker command has normal permissions (755, not setuid).
What am I missing?
Maybe it is good to clarify a bit more. I am not at the moment interested in what the container itself does or can do, nor am I concerned with the root access inside the container.
Rather I notice that anyone on my system that can run a docker container can use it to gain root access to my host system and read/write as root whatever they want: effectively giving all users root access. That is obviously not what I want. How to prevent this?
There are many Docker security features available to help with Docker security issues. The specific one that will help you is User Namespaces.
Basically you need to enable User Namespaces on the host machine with the Docker daemon stopped beforehand:
dockerd --userns-remap=default &
Note this will forbid the container from running in privileged mode (a good thing from a security standpoint) and restart the Docker daemon (it should be stopped before performing this command). When you enter the Docker container, you can restrict it to the current non-privileged user:
docker run -it --rm -v /bin:/tmp/a --user UID:GID debian
Regardless, try to enter the Docker container afterwards with your default command of
docker run -it --rm -v /bin:/tmp/a debian
If you attempt to manipulate the host filesystem that was mapped into a Docker volume (in this case /bin) where files and directories are owned by root, then you will receive a Permission denied error. This proves that User Namespaces provide the security functionality you are looking for.
I recommend going through the Docker lab on this security feature at https://github.com/docker/labs/tree/master/security/userns. I have done all of the labs and opened Issues and PRs there to ensure the integrity of the labs there and can vouch for them.
Access to run docker commands on a host is access to root on that host. This is the design of the tool since the functionality to mount filesystems and isolate an application requires root capabilities on linux. The security vulnerability here is any sysadmin that grants access to users to run docker commands that they wouldn't otherwise trust with root access on that host. Adding users to the docker group should therefore be done with care.
I still see Docker as a security improvement when used correctly, since applications run inside a container are restricted from what they can do to the host. The ability to cause damage is given with explicit options to running the container, like mounting the root filesystem as a rw volume, direct access to devices, or adding capabilities to root that permit escaping the namespace. Barring the explicit creation of those security holes, an application run inside a container has much less access than it would if it was run outside of the container.
If you still want to try locking down users with access to docker, there are some additional security features. User namespacing is one of those which prevents root inside of the container from having root access on the host. There's also interlock which allows you to limit the commands available per user.
You're missing that containers run as uid 0 internally by default. So this is expected. If you want to restrict the permission more inside the container, build it with a USER statement in Dockerfile. This will setuid to the named user at runtime, instead of running as root.
Note that the uid of this user it not necessarily predictable, as it is assigned inside the image you build, and it won't necessarily map to anything on the outside system. However, the point is, it won't be root.
Refer to Dockerfile reference for more information.
I want to test docker in my CentOS 7.1 box, I got this warning:
[root#docker1 ~]# docker run busybox /bin/echo Hello Docker
Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
Hello Docker
I want to know the reason and how to suppress this warning.
The CentOS instance is running in virtualbox created by vagrant.
The warning message occurs because your Docker storage configuration is using a "loopback device" -- a virtual block device such as /dev/loop0 that is actually backed by a file on your filesystem. This was never meant as anything more than a quick hack to get Docker up and running quickly as a proof of concept.
You don't want to suppress the warning; you want to fix your storage configuration such that the warning is no longer issued. The easiest way to do this is to assign some local disk space for use by Docker's devicemapper storage driver and use that.
If you're using LVM and have some free space available on your volume group, this is relatively easy. For example, to give docker 100G of space, first create a data and metadata volume:
# lvcreate -n docker-data -L 100G /dev/my-vg
# lvcreate -n docker-metadata -L1G /dev/my-vg
And then configure Docker to use this space by editing /etc/sysconfig/docker-storage to look like:
DOCKER_STORAGE_OPTIONS=-s devicemapper --storage-opt dm.datadev=/dev/my-vg/docker-data --storage-opt dm.metadatadev=/dev/my-vg/docker-metadata
If you're not using LVM or don't have free space available on your VG, you could expose some other block device (e.g., a spare disk or partition) to Docker in a similar fashion.
There are some interesting notes on this topic here.
Thanks. This was driving me crazy. I thought bash was outputting this message. I was about to submit a bug against bash. Unfortunately, none of the options presented are viable on a laptop or such where disk is fully utilized. Here is my answer for that scenario.
Here is what I used in the /etc/sysconfig/docker-storage on my laptop:
DOCKER_STORAGE_OPTIONS="--storage-opt dm.no_warn_on_loop_devices=true"
Note: I had to restart the docker service for this to have an effect. On Fedora the command for that is:
systemctl stop docker
systemctl start docker
There is also just a restart command (systemctl restart docker), but it is a good idea to check to make sure stop really worked before starting again.
If you don't mind disabling SELinux in your containers, another option is to use overlay. Here is a link that describes that fully:
http://www.projectatomic.io/blog/2015/06/notes-on-fedora-centos-and-docker-storage-drivers/
In summary for /etc/sysconfig/docker:
OPTIONS='--selinux-enabled=false --log-driver=journald'
and for /etc/sysconfig/docker-storage:
DOCKER_STORAGE_OPTIONS=-s overlay
When you change a storage type, restarting docker will destroy your complete image and container store. You may as well everything up in the /var/lib/docker folder when doing this:
systemctl stop docker
rm -rf /var/lib/docker
dnf reinstall docker
systemctl start docker
In RHEL 6.6 any user with docker access can access my private keys, and run applications as root with the most trivial of hacks via volumes. SELinux is the one thing that prevents that in Fedora and RHEL 7. That said, it is not clear how much of the additional RHEL 7 security comes from SELinux outside the container and how much inside the container...
Generally, loopback devices are fine for instances where the limit of 100GB maximum and a slightly reduced performance are not a problem. The only issue I can find is the docker store can be corrupt if you have a disk full error while running... That can probably be avoided with quotas, or other simple solutions.
However, for a production instance it is definitely worth the time and effort to set this up correctly.
100G may excessive for your production instance. Containers and images are fairly small. Many organizations are running docker containers within VM's as an additional measure of security and isolation. If so, you might have a fairly small number of containers running per VM. In which case even 10G might be sufficient.
One final note. Even if you are using direct lvm, you probable want a additional filesystem for /var/lib/docker. The reason is the command "docker load" will create an uncompressed version of the images being loaded in this folder before adding it to the data store. So if you are trying to keep it small and light then explore options other than direct lvm.
#Igor Ganapolsky Feb and #Mincă Daniel Andrei
Check this:
systemctl edit docker --full
If directive EnvironmentFile is not listed in [Service] block, then no luck (I also have this problem on Centos7), but you can extend standard systemd unit like this:
systemctl edit docker
EnvironmentFile=-/etc/sysconfig/docker
ExecStart=
ExecStart=/usr/bin/dockerd $OPTIONS
And create a file /etc/sysconfig/docker with content:
OPTIONS="-s overlay --storage-opt dm.no_warn_on_loop_devices=true"