When I run a container as a normal user, I can mount and modify directories owned by root on my host filesystem. This seems to be a big security hole. For example, I can do the following:
$ docker run -it --rm -v /bin:/tmp/a debian
root@14da9657acc7:/# cd /tmp/a
root@f2547c755c14:/tmp/a# mv df df.orig
root@f2547c755c14:/tmp/a# cp ls df
root@f2547c755c14:/tmp/a# exit
Now my host will run ls whenever df is typed (a mostly harmless example). I cannot believe that this is the intended behavior, but it is happening on my system (Debian stretch). The docker binary has normal permissions (755, not setuid).
What am I missing?
Let me clarify a bit more. I am not currently interested in what the container itself does or can do, nor am I concerned about root access inside the container.
Rather, I notice that anyone on my system who can run a Docker container can use it to gain root access to my host and read/write whatever they want as root, effectively giving all users root access. That is obviously not what I want. How can I prevent this?
Docker has several security features; the one that will help you here is User Namespaces.
You need to enable User Namespaces on the host machine, with the Docker daemon stopped beforehand:
dockerd --userns-remap=default &
Note that this prevents containers from running in privileged mode (a good thing from a security standpoint), and that the command starts the Docker daemon itself, which is why the daemon must be stopped first. When you run a container, you can additionally restrict it to the current non-privileged user:
docker run -it --rm -v /bin:/tmp/a --user UID:GID debian
Either way, try running the container again afterwards with your original command:
docker run -it --rm -v /bin:/tmp/a debian
If you then try to modify the host directory mounted as a volume (in this case /bin), whose files and directories are owned by root, you will get a Permission denied error. This shows that User Namespaces provide the protection you are looking for.
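Why the write fails: with `--userns-remap=default`, Docker creates a `dockremap` user and maps container IDs onto that user's subordinate range from `/etc/subuid` (and `/etc/subgid` for groups), so container root becomes an ordinary unprivileged uid on the host. The sketch below only does that arithmetic for a single range; the `dockremap:100000:65536` entry used in the example is an assumption (the real start value varies per host), and Docker can combine several ranges, which this ignores.

```shell
#!/bin/sh
# Sketch: which host uid does a container uid become under --userns-remap?
# Handles a single /etc/subuid-style range "name:start:count" only.

# Print the host id for container id $1, given subuid line $2.
# Returns 1 if the entry is malformed or the id is outside the range.
container_to_host_id() {
    cid=$1
    entry=$2
    start=$(printf '%s\n' "$entry" | cut -s -d: -f2)
    count=$(printf '%s\n' "$entry" | cut -s -d: -f3)
    if [ -z "$start" ] || [ -z "$count" ]; then
        return 1
    fi
    if [ "$cid" -lt 0 ] || [ "$cid" -ge "$count" ]; then
        return 1
    fi
    echo $((start + cid))
}

# Print the first subuid line for user $1 (default: dockremap)
# from file $2 (default: /etc/subuid).
subuid_entry_for() {
    name=${1:-dockremap}
    file=${2:-/etc/subuid}
    grep "^${name}:" "$file" | head -n 1
}
```

On a host whose `/etc/subuid` contains `dockremap:100000:65536`, `container_to_host_id 0 "$(subuid_entry_for)"` prints `100000`: root in the container is just uid 100000 on the host, which has no write access to the root-owned `/bin`.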
I recommend going through the Docker lab on this security feature at https://github.com/docker/labs/tree/master/security/userns. I have worked through all of the labs, opened issues and PRs there to keep them accurate, and can vouch for them.
Access to run docker commands on a host is access to root on that host. This is by design, since mounting filesystems and isolating an application require root capabilities on Linux. The real vulnerability here is a sysadmin who grants docker access to users they wouldn't otherwise trust with root access on that host. Adding users to the docker group should therefore be done with care.
I still see Docker as a security improvement when used correctly, since applications running inside a container are restricted in what they can do to the host. The ability to cause damage has to be granted with explicit options when running the container, like mounting the root filesystem as a rw volume, giving direct access to devices, or adding capabilities to root that permit escaping the namespace. Barring the explicit creation of those security holes, an application running inside a container has much less access than it would have outside of one.
If you still want to try locking down users with access to docker, there are some additional security features. User namespaces are one of them: they prevent root inside the container from having root access on the host. There's also interlock, which allows you to limit the commands available per user.
You're missing that containers run as uid 0 internally by default, so this is expected. If you want to restrict permissions inside the container, build the image with a USER statement in the Dockerfile. The container process then runs as the named user at runtime instead of as root.
Note that the uid of this user is not necessarily predictable, as it is assigned inside the image you build, and it won't necessarily map to anything on the host system. However, the point is that it won't be root.
Refer to the Dockerfile reference for more information.
Related
I understand that it's considered a bad security practice to run Docker images as root, but I have a specific situation that I wanted to pass by the community to see if anyone can help.
We are currently using a pipeline on an Amazon Linux 2 instance with a single user called ec2-user. Unfortunately, a lot of the scripts we're using for our pipeline have hard-coded paths baked in (notably /home/ec2-user/) ... which may or may not reference the $HOME variable.
I've been talking to one of the engineers who is building a Docker image for our pipeline and suggested that he create a new user entirely so the root user isn't running our pipeline.
For example:
# add clip user
RUN groupadd -r clip && useradd -r -g clip clip
# disable root
RUN chsh -s /usr/sbin/nologin root
# set environment variables
ENV HOME /home/clip
ENV DEBIAN_FRONTEND=noninteractive
However, the engineer mentioned that the clip user inside the container will have some uid that may or may not exist in the host machine. For example, if the clip user had uid 1001 in the container, but 1001 was john in the host, all the files created as the clip user inside the container would be owned by john on the outside.
Further, he is more concerned about the situation where the clip user has a uid in the container that doesn’t exist in the host’s passwd. In that case files created by the clip user in the container would be owned by a bare unassociated uid on the host.
If we instead pass in IDs from the host as the user/group to run the image, the kernel will be fine with it (it is the same kernel as the host's), and when all is said and done, files created inside the container will be owned by the user/group we pass in. However, the container won't know who that user/group is, so it will just use the raw IDs, and things like $HOME or whoami won't work.
With that said, we're curious if anyone else has experienced these problems and if anyone has found solutions?
Everything you say is totally normal. The container has its own /etc/passwd file, so a given numeric user ID might map to different user names (or to none at all) in the host and in the container. Beyond some cosmetic issues around debug shells, it shouldn't usually matter whether the current numeric uid is actually present in the container's /etc/passwd, and there's no reason a container uid would need to be mapped in the host's /etc/passwd.
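To make the "same number, different name" point concrete, here is a small sketch that resolves a uid against a given passwd file, falling back to the bare number the way `ls -l` does when there is no entry. The function name is hypothetical; it reads local files only and ignores NSS sources such as LDAP.

```shell
#!/bin/sh
# Sketch: resolve a numeric uid against one specific passwd file.
# Host and container each have their own /etc/passwd, so the same uid
# can resolve to different names, or to nothing at all.
uid_to_name() {
    lookup_uid=$1
    passwd_file=${2:-/etc/passwd}
    found=$(awk -F: -v u="$lookup_uid" '$3 == u { print $1; exit }' "$passwd_file")
    if [ -n "$found" ]; then
        echo "$found"
    else
        # No entry: show the raw number, as ls -l would.
        echo "$lookup_uid"
    fi
}
```

On the host, `uid_to_name 1001` might print `john`; against a copy of the container's file (for example one fetched with `docker cp <container>:/etc/passwd ./container-passwd`), `uid_to_name 1001 ./container-passwd` might print `clip`, or just `1001` if the image has no such user. The file ownership itself is the number in both cases.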
Note that there are a couple of ways to directly assume another user ID in Docker, either using the docker run -u option or the Dockerfile USER directive. The RUN chsh command you propose doesn't really do anything and doesn't prevent becoming root inside a container.
clip user inside the container will have some uid that may or may not exist in the host machine.
True, totally normal.
For example, if the clip user had uid 1001 in the container, but 1001 was john in the host, all the files created as the clip user inside the container would be owned by john on the outside.
This is partially true, but only in the case where you've explicitly mapped a host directory into the container with a docker run -v option. Otherwise, the host user with uid 1001 won't be able to navigate to the /var/lib/docker/... directory that actually contains the container files, so it doesn't matter that they could hypothetically write them.
The more usual case around this is to explicitly supply a host uid so that the container process can save its state in a mapped host directory. Pass a numeric uid to the docker run -u option; there's no particular need for that uid to exist in the container's /etc/passwd.
docker run \
-u $(id -u) \
-v "$PWD/data:/data" \
...
the container wouldn’t know who that user/group are, so it’ll just use the raw ids, and stuff like $HOME or whoami won’t work.
Unless your application explicitly calls these things, they won't usually matter. "Home directory" is a pretty poorly defined concept in a Docker container since it's usually a wrapper around a single process.
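If some tool in the pipeline does insist on a usable `$HOME`, an entrypoint can cover the case where the container is started with a bare numeric uid (for example `docker run -u "$(id -u):$(id -g)" ...`) that has no passwd entry in the image. This is a minimal sketch; the function name and the `mktemp -d` fallback are illustrative choices, not something Docker does for you.

```shell
#!/bin/sh
# Sketch: entrypoint helper for containers started with a numeric uid.
# If HOME is unset or not writable by the current user, point it at a
# fresh temporary directory so tools that write to $HOME keep working.
ensure_writable_home() {
    if [ -z "${HOME:-}" ] || [ ! -w "$HOME" ]; then
        HOME=$(mktemp -d)
        export HOME
    fi
}

# Hypothetical entrypoint usage:
#   ensure_writable_home
#   exec "$@"
```

Pipeline scripts with a hard-coded `/home/ec2-user/` path are not helped by this; those still need the path made configurable or the directory mounted at that location.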
I am trying to run DPDK in a non-privileged docker container. While I can limit the container's privileges and specify the container as non-privileged, I still need to run a dpdk application (say testpmd) as root. I can also run the container as non-root and use sudo to start testpmd.
I was wondering whether anyone has been able to run DPDK (without the --no-huge option) as a non-root user inside a Docker container. If so, are there certain privileges or permissions that need to be granted?
UPDATED:
I'm using DPDK 20.02. I think I've narrowed down the problem to a ulimit -l setting.
From testpmd:
EAL: cannot set up DMA remapping, error 12 (Cannot allocate memory)
From dmesg:
dmesg: [ 5697911.199003] vfio_pin_pages_remote: RLIMIT_MEMLOCK (65536) exceeded.
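A quick way to confirm this inside the container is to compare `ulimit -l` with what the application needs to pin. `ulimit -l` reports KiB (or `unlimited`); the dmesg value appears to be in bytes, so 65536 there corresponds to 64 KiB, a common default. The sketch below assumes you know roughly how much memory the app pins (the required size is yours to supply). With plain `docker run`, the limit can be raised with `--ulimit memlock=-1:-1`; how to set it through Helm is not covered here.

```shell
#!/bin/sh
# Sketch: check the locked-memory limit before starting a DPDK app that
# pins hugepages through VFIO.

# $1: value as printed by `ulimit -l` (KiB or "unlimited")
# $2: how much the application needs to lock, in KiB
memlock_sufficient() {
    current=$1
    required_kib=$2
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$required_kib" ]
}

# Check the limit of the current shell; prints a message either way.
check_memlock() {
    if memlock_sufficient "$(ulimit -l)" "$1"; then
        echo "memlock ok ($(ulimit -l))"
    else
        echo "memlock too low: $(ulimit -l) KiB < $1 KiB" >&2
        return 1
    fi
}
```

For example, `check_memlock 1048576 || exit 1` at the top of an entrypoint fails fast with a readable message instead of the EAL's "Cannot allocate memory".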
In response to Vipin:
Did you need to adjust the limits for the container? if so how?
I am using helm to deploy the pods so I'm not sure if I can modify the docker run command, it looks like I would need to edit /etc/security/limits.conf on the host and redeploy.
Also, what did you use to give ownership of the fs? Doesn't having a non-privileged container prevent you? For testing, I just sudo it, but ultimately I want to be able to drop SETUID/SETGID.
DPDK can run as a non-root user both on the host and inside Docker.
To run DPDK as a non-root user:
1. Create or choose a user without root privileges.
2. Set the runtime directory with export XDG_RUNTIME_DIR=/tmp (all users have access to /tmp, while on certain distros /var/run might not be accessible).
3. Mount the hugepages in a similar folder: mkdir -p /tmp/mnt/huge; mount -t hugetlbfs nodev /tmp/mnt/huge
4. Give the user ownership of the hugepage mount: chown -R [non-root user]:[non-root user] /tmp/mnt/huge
5. If access to devices is required, check that the vfio driver (with either IOMMU or no-IOMMU) is loaded using lsmod | grep vfio.
6. Change the ownership of the device: chown -R [non-root user]:[non-root user] /dev/vfio/[device id]
7. Use the DPDK rte_eal_init (EAL) option --huge-dir to point to /tmp/mnt/huge.
8. Certain PMDs might fail even after step 7; for those, use the option --legacy-mem, which resolves the issue.
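The root-side preparation from the steps above (the hugepage mount and the ownership changes) can be collected into one small script, leaving the non-root user to set XDG_RUNTIME_DIR and pass --huge-dir. This is a sketch: the user name and the /dev/vfio entry (an IOMMU group number) are placeholders, and `DRY_RUN=1` is an illustrative switch, not a DPDK option, that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: one-time root preparation so a non-root user can run DPDK.
HUGE_DIR=${HUGE_DIR:-/tmp/mnt/huge}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: non-root account (placeholder)
# $2: entry under /dev/vfio to hand over (placeholder)
prepare_dpdk_nonroot() {
    dpdk_user=$1
    vfio_group=$2
    run mkdir -p "$HUGE_DIR"
    run mount -t hugetlbfs nodev "$HUGE_DIR"
    run chown -R "$dpdk_user:$dpdk_user" "$HUGE_DIR"
    run chown "$dpdk_user:$dpdk_user" "/dev/vfio/$vfio_group"
}

# Afterwards, as the non-root user (testpmd as in the question):
#   export XDG_RUNTIME_DIR=/tmp
#   testpmd --huge-dir /tmp/mnt/huge -- -i
#   (add --legacy-mem before "--" if the PMD still fails)
```

Running it with `DRY_RUN=1 prepare_dpdk_nonroot dpdk 42` first is a cheap way to review exactly what will be mounted and chowned.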
To run the DPDK application inside a Docker container, a couple more things need to be addressed:
Use DPDK 19.11 LTS or later (there are patches related to Docker, namespaces, and memory limits).
Certain SELinux policies do not allow sharing of hugepages, so use the option --in-memory to disable sharing of the hugepage mmap (this should avoid most of the issues).
Notes and assumptions:
There is only a single application running in the container.
As mentioned in the comments, --privileged -v /sys/bus/pci/drivers:/sys/bus/pci/drivers -v /sys/kernel/mm/hugepages:/sys/kernel/mm/hugepages -v /sys/devices/system/node:/sys/devices/system/node -v /dev:/dev are used to run DPDK in Docker with sudo privileges.
I assume, based on the question, that ulimit -c unlimited cannot be executed either.
If multiple containers run DPDK applications, always use --file-prefix to distinguish them.
I have not tried this with DPDK 21.02, 21.05, or 21.08.
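The container-specific options above can be assembled in one place. A sketch under assumptions: the environment variable names (`HUGE_DIR`, `IN_MEMORY`, `LEGACY_MEM`, `FILE_PREFIX`) are illustrative, not something DPDK reads; only the emitted flags are real EAL options.

```shell
#!/bin/sh
# Sketch: build EAL options for DPDK running inside a container.
dpdk_eal_args() {
    huge_dir=${HUGE_DIR:-/tmp/mnt/huge}
    args="--huge-dir $huge_dir"
    # Do not share hugepage mappings (some security policies block it).
    if [ "${IN_MEMORY:-0}" = 1 ]; then
        args="$args --in-memory"
    fi
    # Fallback for PMDs that still fail with the default memory model.
    if [ "${LEGACY_MEM:-0}" = 1 ]; then
        args="$args --legacy-mem"
    fi
    # Unique per container when several DPDK apps share one host.
    if [ -n "${FILE_PREFIX:-}" ]; then
        args="$args --file-prefix $FILE_PREFIX"
    fi
    echo "$args"
}
```

A possible invocation is `testpmd $(IN_MEMORY=1 FILE_PREFIX="$(hostname)" dpdk_eal_args) -- -i`; the unquoted expansion is deliberate word splitting and assumes the paths contain no spaces. Using the hostname works as a prefix because Docker gives each container its own hostname by default.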
[EDIT-1] The earlier question that got removed was about running DPDK as non-root.
I currently run the official Tensorflow Docker Container (GPU) with Nvidia-Docker:
https://hub.docker.com/r/tensorflow/tensorflow/
https://gcr.io/tensorflow/tensorflow/
However, I can't find a way to set a default user for the container. The default user for this container is "root", which is dangerous in terms of security and problematic because it gives root access to the shared folders.
Let's say my host machine runs as the user "CNNareCute". Is there any way to launch my containers as the same user?
Docker containers by default run as root. You can override the user by passing --user <user> to docker run command. Note however this might be problematic in case the container process needs root access inside the container.
The security concern you mention is handled in Docker using User Namespaces. User namespaces basically map users in the container to a different pool of users on the host. Thus you can map the root user inside the container to a normal user on the host, and the security concern should be mitigated.
AFAIK, Docker images run as root by default. This means that any Dockerfile using the image as a base doesn't have to jump through hoops to modify it. You could carry out user modifications in a Dockerfile, the same way you would on any other Linux box, which would give you the configuration you need.
You won't be able to use users (dynamically) from your host in the containers without creating them in the container first - and they will be in effect separate users of the same name.
You can run commands and ssh into containers as a specific user provided it exists on the container. For example, a PHP application needing commands run that retain www-data privileges, would be run as follows:
docker exec --user www-data application_container_1 sh -c "php something"
So in short, you can set up whatever users you like and use them to run scripts but the default will be root and it will exist unless you remove it which may also have repercussions...
On my CentOS system, I added a user to the docker group, and I found that this user can access any folder by attaching it to a container via docker run -it -v path-to-directory:directory-in-container. For example, I have a folder with mode 700 that only root can access, but someone who doesn't have permission to access this folder can run a container, mount the folder into it, and access it inside the container. How can I prevent such a user from attaching unauthorized directories to a Docker container? My Docker version is 17.03.0-ce, on CentOS 7.0. Thanks!
For real use, you should follow container security principles when setting up permissions for external volumes. But if you just want to test container features simply:
You can set the access mode of path-to-directory to 777, i.e. world-readable and world-writable. Then no additional owner/group or access-mode setup is needed for any container volume mapping.
chmod 777 /path/to/directory
Docker daemon runs as root and normally starts the containers as root, with users inside mapping one-to-one to the host users, so anybody in the docker group has effective root permissions.
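Since membership in the docker group is effectively root, it is worth auditing who has it. A rough sketch that reads the local group and passwd files (paths default to `/etc/group` and `/etc/passwd`, and the group name `docker` is assumed); accounts that come from LDAP/SSSD or other NSS sources are not seen by it.

```shell
#!/bin/sh
# Sketch: list accounts that can reach the Docker daemon through the
# docker group, i.e. accounts that are effectively root on this host.
docker_group_members() {
    group_file=${1:-/etc/group}
    passwd_file=${2:-/etc/passwd}
    gid=$(awk -F: '$1 == "docker" { print $3; exit }' "$group_file")
    # No docker group on this host: nothing to report.
    [ -n "$gid" ] || return 0
    {
        # Supplementary members listed in the group entry itself.
        awk -F: '$1 == "docker" {
            n = split($4, u, ",")
            for (i = 1; i <= n; i++) if (u[i] != "") print u[i]
        }' "$group_file"
        # Accounts whose primary gid is the docker group.
        awk -F: -v g="$gid" '$4 == g { print $1 }' "$passwd_file"
    } | sort -u
}
```

The second `awk` matters because users whose primary group is docker do not appear in the member list of `/etc/group`.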
There is an option to tell dockerd to run containers under the subordinate user IDs of a specific user; see [https://docs.docker.com/engine/security/userns-remap/]. That prevents full root access, but everybody accessing the Docker daemon will be running containers under that user, and if that user is not them, they won't be able to usefully mount things in their home directory.
Also I believe it is incompatible with --privileged containers, but of course those give you full root access via other means as well anyway.
Docker kind of always had a USER command to run a process as a specific user, but in general a lot of things had to run as ROOT.
I have seen a lot of images that use an ENTRYPOINT with gosu to drop privileges before running the process.
I'm still a bit confused about the need for gosu. Shouldn't USER be enough?
I know quite a bit has changed in terms of security with Docker 1.10, but I'm still not clear about the recommended way to run a process in a docker container.
Can someone explain when I would use gosu vs. USER?
Thanks
EDIT:
The Docker best practices guide is not very clear: it says that if the process can run without privileges, use USER; if you need sudo, you might want to use gosu.
That is confusing because one can install all sorts of things as ROOT in the Dockerfile, then create a user and give it proper privileges, then finally switch to that user and run the CMD as that user.
So why would we need sudo or gosu then?
Dockerfiles are for creating images. I see gosu as more useful as part of a container initialization when you can no longer change users between run commands in your Dockerfile.
After the image is created, something like gosu allows you to drop root permissions at the end of your entrypoint inside a container. You may initially need root access for some initialization steps (fixing uids, host-mounted volume permissions, etc.). Then, once initialized, you run the final service without root privileges and as pid 1 to handle signals cleanly.
Edit:
Here's a simple example of using gosu in an image for docker and jenkins: https://github.com/bmitch3020/jenkins-docker
The entrypoint.sh looks up the gid of the /var/run/docker.sock file and updates the gid of the docker group inside the container to match. This allows the image to be ported to other Docker hosts where the gid on the host may differ. Changing the group requires root access inside the container. Had I used USER jenkins in the Dockerfile, I would be stuck with the gid of the docker group as defined in the image, which wouldn't work if it doesn't match that of the Docker host it's running on. But root access can be dropped when running the app, which is where gosu comes in.
At the end of the script, the exec call prevents the shell from forking gosu, and instead it replaces pid 1 with that process. Gosu in turn does the same, switching the uid and then exec'ing the jenkins process so that it takes over as pid 1. This allows signals to be handled correctly which would otherwise be ignored by a shell as pid 1.
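The pattern just described can be sketched as follows. This is not the linked repository's actual script; it assumes an image with a `docker` group, a `jenkins` user in that group, gosu installed, and the host socket mounted at `/var/run/docker.sock` (overridable through a hypothetical `DOCKER_SOCK` variable). `groupmod -o` is used so the new gid may collide with an existing group in the image.

```shell
#!/bin/sh
# Sketch of a root-then-drop entrypoint (assumptions listed above).
set -e

SOCK=${DOCKER_SOCK:-/var/run/docker.sock}

# Print the numeric group owner of a file.
file_gid() {
    stat -c '%g' "$1"
}

main() {
    # Already non-root: nothing to fix, just run the command.
    if [ "$(id -u)" != 0 ]; then
        exec "$@"
    fi
    if [ -S "$SOCK" ]; then
        sock_gid=$(file_gid "$SOCK")
        current_gid=$(awk -F: '$1 == "docker" { print $3 }' /etc/group)
        # Re-number the in-container docker group to match the host socket.
        if [ "$current_gid" != "$sock_gid" ]; then
            groupmod -o -g "$sock_gid" docker
        fi
    fi
    # exec replaces this shell with gosu; gosu switches to jenkins and
    # exec's the command, which therefore ends up as PID 1.
    exec gosu jenkins "$@"
}

# Run main only when executed as entrypoint.sh, so the helpers can be
# sourced and tested without side effects.
case "$0" in
    *entrypoint.sh) main "$@" ;;
esac
```

The root phase is kept as short as possible: only the group renumbering needs root, and everything after `exec gosu` runs as jenkins.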
I am using gosu and entrypoint.sh because I want the user in the container to have the same UID as the user that created the container.
Docker Volumes and Permissions.
The purpose of the container I am creating is development. I need to build for Linux, but I still want all the convenience of local (OS X) editing, tools, etc. Keeping the UIDs the same inside and outside the container keeps file ownership a lot more sane and prevents some errors (the container user cannot edit files in a mounted volume, etc.).
Another advantage of using gosu is signal handling. For instance, you may trap SIGHUP to reload the process, as you would normally do via systemctl reload <process> or similar.
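A minimal sketch of such a process: a service script that treats SIGHUP as a reload. When the entrypoint ends in `exec gosu ...` and gosu exec's this script, it is PID 1, so `docker kill --signal=HUP <container>` is delivered to it directly. The script name `service.sh` and the reload body are placeholders.

```shell
#!/bin/sh
# Sketch: a service that reloads its configuration on SIGHUP.
reload_count=0

reload_config() {
    reload_count=$((reload_count + 1))
    # Placeholder: re-read configuration files here.
    echo "reloading configuration (#$reload_count)"
}

trap reload_config HUP
trap 'exit 0' TERM INT

# Placeholder main loop. Sleeping in the background and using `wait`
# lets a trapped signal interrupt the wait, so handlers run promptly.
serve() {
    while :; do
        sleep 1 &
        wait $!
    done
}

# Only loop when run as service.sh (a hypothetical name).
case "$0" in
    *service.sh) serve ;;
esac
```

Installing explicit traps matters here: a plain shell running as PID 1 without handlers would not act on these signals the way it does when it is an ordinary process.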