Docker (containers) cgroup/namespace setup vs. running Dockerfile commands as root?

From my understanding, Docker sets up the required cgroups and namespaces so that containers (i.e. container processes) run in an isolated environment on the host system, with limited permissions and limited access to the host. So even if a process runs as root in the container, it should not have root access on the host system.
But from this article, processes-in-containers-should-not-run-as-root, I see that a container process running as root can still access host files that are only accessible to root on the host system.
On host system:
root@srv:/root# ls -l
total 4
-rw------- 1 root root 17 Sep 26 20:29 secrets.txt
Dockerfile -
FROM debian:stretch
CMD ["cat", "/tmp/secrets.txt"]
Running the image built from the above Dockerfile:
marc@srv:~$ docker run -v /root/secrets.txt:/tmp/secrets.txt <img>
top secret stuff
If top secret stuff is readable, how is that possible? Then what is the point of container isolation? It seems there is something I am missing.
(Does it have to do with how I use docker run? By default, are all permissions/capabilities given to the container based on the user running the docker run command?)

A container can only access the host filesystem if the operator explicitly gives it access. For example, try without any docker run -v options:
# --rm cleans up the container when done; -u root explicitly requests the
# root user; busybox is the image to run. The cat command dumps the
# _container's_ password file, not the host's.
docker run \
  --rm \
  -u root \
  busybox \
  cat /etc/shadow
More generally, the rule (on native Linux without user namespace remapping) is that, if files are bind-mounted from the host into a container, they are accessible if the container's numeric user or group IDs match the file's ownership and permissions. If a file is owned by uid 1000 on the host with mode 0600, it can be read by uids 0 or 1000 in the container, regardless of the corresponding container and host users' names.
The corollary to this is that anyone who can run any docker run command at all can pretty trivially root the entire host.
# -v /:/host bind-mounts the host filesystem into the container, so the
# cat command dumps the host's encrypted password file.
docker run \
  --rm \
  -u root \
  -v /:/host \
  busybox \
  cat /host/etc/shadow
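The numeric-ID rule stated earlier can be modeled with a small shell function. This is only a sketch of the classic owner/group/other check, assuming no user namespace remapping, no ACLs or SELinux, no supplementary groups, and a root process that keeps Docker's default capabilities. The function name would_be_readable is made up for illustration.

```shell
#!/bin/sh
# Simplified model of the read-permission check for a bind-mounted file.
# Hypothetical helper; ignores ACLs, SELinux, supplementary groups and
# user namespace remapping.
# Usage: would_be_readable PROC_UID PROC_GID FILE_UID FILE_GID OCTAL_MODE
would_be_readable() {
  proc_uid=$1; proc_gid=$2; file_uid=$3; file_gid=$4
  bits=$(( 0$5 ))                 # leading 0 makes the shell parse the mode as octal
  if [ "$proc_uid" -eq 0 ]; then
    return 0                      # root (with CAP_DAC_OVERRIDE) bypasses the mode bits
  elif [ "$proc_uid" -eq "$file_uid" ]; then
    [ $(( bits & 0400 )) -ne 0 ]  # owner read bit
  elif [ "$proc_gid" -eq "$file_gid" ]; then
    [ $(( bits & 0040 )) -ne 0 ]  # group read bit
  else
    [ $(( bits & 0004 )) -ne 0 ]  # other read bit
  fi
}

# The example from the text: a file owned by uid 1000 with mode 0600.
for uid in 0 1000 1001; do
  if would_be_readable "$uid" "$uid" 1000 1000 0600; then
    echo "uid $uid: readable"
  else
    echo "uid $uid: permission denied"
  fi
done
```

The loop reports uids 0 and 1000 as readable and uid 1001 as denied, regardless of what those uids are called in the container's or the host's /etc/passwd.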
The root user in a container is further limited by Linux capabilities: without giving special additional Docker options, even running as root, a container can't change filesystem mounts, modify the network configuration, load kernel modules, reboot the host, or do several other extra-privileged things. (And it's usually better to do these things outside a container than to give extra permission to Docker; don't casually run containers --privileged.)
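One way to see this limit from inside a container is to decode the effective capability mask that the kernel reports on the CapEff line of /proc/self/status. The sketch below assumes the bit numbers from linux/capability.h (CAP_DAC_OVERRIDE is bit 1, CAP_NET_ADMIN bit 12, CAP_SYS_MODULE bit 16, CAP_SYS_ADMIN bit 21, CAP_SYS_BOOT bit 22) and uses a80425fb as a sample mask, a value commonly reported for a root process in a default Docker container; the actual mask depends on the Docker version and run options. has_cap is a hypothetical helper name.

```shell
#!/bin/sh
# has_cap MASK BIT: succeed if capability number BIT is set in the
# hexadecimal MASK (the format of the CapEff line in /proc/self/status).
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

sample_mask=a80425fb   # assumed sample value, see the note above

for entry in 1:CAP_DAC_OVERRIDE 12:CAP_NET_ADMIN 16:CAP_SYS_MODULE \
             21:CAP_SYS_ADMIN 22:CAP_SYS_BOOT; do
  bit=${entry%%:*}     # text before the colon: the bit number
  name=${entry#*:}     # text after the colon: the capability name
  if has_cap "$sample_mask" "$bit"; then
    echo "$name: present"
  else
    echo "$name: absent"
  fi
done
```

Inside a running container, the real mask could be read with something like awk '/^CapEff/ {print $2}' /proc/self/status and passed to has_cap in place of the sample value.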
It's still generally better practice to run containers as non-root users. The user ID doesn't need to match any user ID in particular, it just needs to not be 0 (matching a specific host uid isn't portable across hosts and isn't recommended). The files in the container generally should be owned by root, so they can't be accidentally overwritten.
FROM debian
# Create the non-root user
RUN adduser --system --no-create-home nonroot
# Do the normal installation, as root
# COPY the application in without a --chown option
COPY ...
# RUN the build steps without a chown either
RUN ...
# Specify the non-root user only for the final container
EXPOSE 12345
USER nonroot
CMD the main container command
If the container does need to read or (especially) write host files, bind-mount the host directory into some data-specific directory in the container (do not overwrite the application code with this mount) and use the docker run -u option to specify the host uid that the container needs to run as. The user does not specifically need to exist in the container's /etc/passwd file.
# -v bind-mounts the current directory as data;
# -u specifies the (host) user ID to run as.
docker run \
  -v "$PWD:/app/data" \
  -u "$(id -u)" \
  ...

Related

Disable root login into the docker container [duplicate]

I am working on hardening our docker images, which I already have a bit of a weak understanding of. With that being said, the current step I am on is preventing the user from running the container as root. To me, that says "when a user runs 'docker exec -it my-container bash', he shall be an unprivileged user" (correct me if I'm wrong).
When I start up my container via docker-compose, the start script that is run needs to be as root since it deals with importing certs and mounted files (created externally and seen through a volume mount). After that is done, I would like the user to be 'appuser' for any future access. This question seems to match pretty well what I'm looking for, but I am using docker-compose, not docker run: How to disable the root access of a docker container?
This seems to be relevant, as the startup command differs from let's say tomcat. We are running a Spring Boot application that we start up with a simple 'java -jar jarFile', and the image is built using maven's dockerfile-maven-plugin. With that being said, should I be changing the user to an unprivileged user before running that, or still after?
I believe changing the user inside of the Dockerfile instead of the start script will do this... but then it will not run the start script as root, thus blowing up on calls that require root. I had messed with using ENTRYPOINT as well, but could have been doing it wrong there. Similarly, using "user:" in the yml file seemed to make the start.sh script run as that user instead of root, so that wasn't working.
Dockerfile:
FROM parent/image:latest
ENV APP_HOME /apphome
ENV APP_USER appuser
ENV APP_GROUP appgroup
# Folder containing our application, i.e. jar file, resources, and scripts.
# This comes from unpacking our maven dependency
ADD target/classes/app ${APP_HOME}/
# Primarily just our start script, but some others
ADD target/classes/scripts /scripts/
# Need to create a folder that will be used at runtime
RUN mkdir -p ${APP_HOME}/data && \
chmod +x /scripts/*.sh && \
chmod +x ${APP_HOME}/*.*
# Create unprivileged user
RUN groupadd -r ${APP_GROUP} && \
useradd -g ${APP_GROUP} -d ${APP_HOME} -s /sbin/nologin -c "Unprivileged User" ${APP_USER} && \
chown -R ${APP_USER}:${APP_GROUP} ${APP_HOME}
WORKDIR $APP_HOME
EXPOSE 8443
CMD /opt/scripts/start.sh
start.sh script:
#!/bin/bash
# setup SSL, modify java command, etc
# run our java application
java -jar "boot.jar"
# Switch users to always be unprivileged from here on out?
# Whatever "hardening" wants... Should this be before starting our application?
exec su -s "/bin/bash" $APP_USER
app.yml file:
version: '3.3'
services:
app:
image: app_image:latest
labels:
c2core.docker.compose.display-name: My Application
c2core.docker.compose.profiles: a_profile
volumes:
- "data_mount:/apphome/data"
- "cert_mount:/certs"
hostname: some-hostname
domainname: some-domain
ports:
- "8243:8443"
environment:
- some_env_vars
depends_on:
- another-app
networks:
a_network:
aliases:
- some-network
networks:
a_network:
driver: bridge
volumes:
data_mount:
cert_mount:
docker-compose shell script:
docker-compose -f app.yml -f another-app.yml "$@"
What I would expect is that anyone trying to access the container internally will be doing so as appuser and not root. The goal is to prevent someone from messing with things they shouldn't (i.e. docker itself).
What is happening is that the script will change users after the app has started (proven via an echo command), but it doesn't seem to be maintained. If I exec into it, I'm still root.
As David mentions, once someone has access to the docker socket (either via API or with the docker CLI), that typically means they have root access to your host. It's trivial to use that access to run a privileged container with host namespaces and volume mounts that let the attacker do just about anything.
When you need to initialize a container with steps that run as root, I do recommend gosu over something like su since su was not designed for containers and will leave a process running as the root pid. Make sure that you exec the call to gosu and that will eliminate anything running as root. However, the user you start the container as is the same as the user used for docker exec, and since you need to start as root, your exec will run as root unless you override it with a -u flag.
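A minimal entrypoint sketch of that pattern, assuming gosu is installed in the image and that the unprivileged user is named by an APP_USER variable (as in the question's Dockerfile). The root-only setup step is left as a placeholder comment, and the function name main is arbitrary.

```shell
#!/bin/sh
# Entrypoint sketch: do root-only setup, then replace this shell with the
# application running as an unprivileged user.
APP_USER=${APP_USER:-appuser}

main() {
  if [ "$(id -u)" = "0" ]; then
    # Root-only initialization (importing certs, fixing ownership of
    # mounted files, ...) would go here.
    # exec replaces the shell, so no root-owned process is left behind.
    exec gosu "$APP_USER" "$@"
  fi
  # Already unprivileged (for example started with -u): run the command as-is.
  exec "$@"
}

# Only hand over when a command was passed (from CMD or the docker run arguments).
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

With a script like this as the ENTRYPOINT and the java command as CMD, the application itself runs as APP_USER, while docker exec still defaults to root unless -u is given, as described above.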
There are additional steps you can take to lock down docker in general:
Use user namespaces. These are defined on the entire daemon, require that you destroy all containers, and pull images again, since the uid mapping affects the storage of image layers. The user namespace offsets the uid's used by docker so that root inside the container is not root on the host, while inside the container you can still bind to low numbered ports and run administrative activities.
Consider authz plugins. Open policy agent and Twistlock are two that I know of, though I don't know if either would allow you to restrict the user of a docker exec command. They likely require that you give users a certificate to connect to docker rather than giving them direct access to the docker socket since the socket doesn't have any user details included in API requests it receives.
Consider rootless docker. This was experimental when this answer was written (rootless mode has been generally available since Docker Engine 20.10). Since docker is not running as root, it has no access back to the host to perform root activities, mitigating many of the issues seen when containers are run as root.
You intrinsically can't prevent root-level access to your container.
Anyone who can run any Docker command at all can always run any of these three commands:
# Get a shell, as root, in a running container
docker exec -it -u 0 container_name /bin/sh
# Launch a new container, running a root shell, on some image
docker run --rm -it -u 0 --entrypoint /bin/sh image_name
# Get an interactive shell with unrestricted root access to the host
# filesystem (cd /host/var/lib/docker)
docker run --rm -it -v /:/host busybox /bin/sh
It is generally considered best practice to run your container as a non-root user, either with a USER directive in the Dockerfile or running something like gosu in an entrypoint script, like what you show. You can't prevent root access, though, in the face of a privileged user who's sufficiently interested in getting it.
When Docker is normally run from a single host, there are some steps you can take.
Make sure the container is not run from another host by looking for a secret in a directory mounted from the accepted host.
Change the .bashrc of the users on the host so that the container starts as soon as they log in. When your users need to do other things on the host, give them an account without Docker access and let them sudo to a special user with Docker access (or use a startdocker script with a setuid flag).
Start the container with a script that you wrote and hardened, something like startserver.
#!/bin/bash
settings() {
# Add mount dirs. The homedir in the docker will be different from the one on the host.
mountdirs="-v /mirrored_home:/home -v /etc/dockercheck:/etc/dockercheck:ro"
usroptions="--user $(id -u):$(id -g) -v /etc/passwd:/etc/passwd:ro"
usroptions="${usroptions} -v /etc/shadow:/etc/shadow:ro -v /etc/group:/etc/group:ro"
}
# call function that fills special variables
settings
image="my_image:latest"
docker run -ti --rm ${usroptions} ${mountdirs} -w "$HOME" --entrypoint=/bin/bash "${image}"
Adding a variable --env HOSTSERVER=${host} won't help with hardening: on another server one can simply add --env HOSTSERVER=servername_that_will_be_checked.
When the user logs in to the host, startserver is called and the container is started. After the call to startserver, add exit to the .bashrc.
Not sure if this works, but you can try it. Allow sudo access for the user/group with a limited set of commands: configure sudo to allow executing only docker-cli. Create a shell script named docker-cli that runs the docker command, e.g. docker "$@". In this script, check the arguments and require the user to provide the --user or -u switch when running docker's exec or attach command. Also make sure to validate that the user doesn't pass -u root. E.g.
sudo docker-cli exec -it containerid sh (failed)
sudo docker-cli exec -u root ... (failed)
sudo docker-cli exec -u mysql ... (Passed)
You can even limit which docker commands a user can run inside this shell script.
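The argument check described above could look roughly like this. It is a sketch under assumptions, not a hardened wrapper: validate_docker_args is a made-up name, only exec is inspected, and anything the parser does not recognize before the container name causes a refusal rather than a guess. As the other answers point out, it does not stop a user who can reach the Docker daemon some other way.

```shell
#!/bin/sh
# validate_docker_args ARGS...: succeed if the docker arguments are allowed.
# For "exec", require -u/--user with a value other than root (name or uid 0).
validate_docker_args() {
  [ "${1:-}" = "exec" ] || return 0   # other subcommands are not checked here
  shift
  user=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -u|--user) user=${2:-}; [ "$#" -gt 1 ] && shift ;;
      --user=*)  user=${1#--user=} ;;
      -*)        ;;                   # any other flag, e.g. -it
      *)         break ;;             # first non-option: container name, stop
    esac
    shift
  done
  u=${user%%:*}                       # strip an optional ":group" suffix
  [ -n "$u" ] || return 1             # no user given: refuse
  [ "$u" != "root" ] || return 1      # root by name: refuse
  if [ "$u" -eq 0 ] 2>/dev/null; then # root by number (0, 00, ...): refuse
    return 1
  fi
  return 0
}

# In the docker-cli wrapper the check would guard the real call:
#   validate_docker_args "$@" || { echo "refused" >&2; exit 1; }
#   exec docker "$@"
for args in "exec -it containerid sh" "exec -u root containerid sh" \
            "exec -u mysql containerid sh"; do
  # $args is left unquoted on purpose so it splits into separate words
  if validate_docker_args $args; then
    echo "$args (Passed)"
  else
    echo "$args (failed)"
  fi
done
```

Because parsing stops at the first non-option word, a -u that appears after a value-taking flag such as -e NAME=value is not seen and the call is refused; a real wrapper would need to know which exec flags take values.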

Bind mounts created using rootless docker have a weird uid on the host machine. How can I delete these folders?

I have the following docker-compose.yml file which creates a bind mount located in $HOME/test on the host system:
version: '3.8'
services:
pg:
image: postgres:13
volumes:
- $HOME/test:/var/lib/postgresql/data
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=pass
- PGUSER=postgres
I bring up the container and inspect the permissions of the bind mount directory:
$ docker-compose up -d
$ ls -l ~
drwx------ 19 4688518 usertest 4096 Mar 11 17:06 test
The folder ~/test is created with a different uid in order to prevent accidental manipulation of this folder outside of the container. But what if I really do want to manipulate it? For example, if I try to delete the folder, I get a permission denied error as expected:
$ rm ~/test -rf
rm: cannot remove '/home/usertest/test': Permission denied
I suspect that I need to change uids using the newuidmap command somehow, but I'm not sure how to go about that.
How can I delete these folders?
But what if I really do want to manipulate it?
Using Docker, you can:
Run a command in the container as a specific user using the same UID (such as rm or sh), for example:
# Run shell session using your user with docker-compose
# You can then easily manipulate data
docker-compose exec -u 4688518 pg sh
# Run command directly with docker
# Docker container name may vary depending on your situation
# Use docker ps to see real container name
docker exec -it -u 4688518 stack_pg_1 rm -rf /var/lib/postgresql/data
Similar to previous one, you can run a new container with:
# Will run sh by default
docker run -it -u 4688518 -v $HOME/test:/tmp/test busybox
# You can directly delete data with
docker run -it -u 4688518 -v $HOME/test:/tmp/test busybox rm -rf /tmp/test/*
This may be suitable if your pg container is stopped or deleted. The Docker image does not need to be the same as the one run by Docker Compose; you only need to specify the proper UID.
Note: you may not be able to delete the folder using rm -rf /tmp/test, as user 4688518 may not have write permission on the /tmp folder, hence the use of /tmp/test/*
Use any of the above, but as the root user, with -u 0 or -u root.
Without using Docker, you can run the command with sudo as suggested by the other answer, or even temporarily change the permissions of the folder and then change them back. However, from experience, when manipulating Docker-related data it's easier and less error-prone to use Docker itself.
Dealing with user ids in docker is tricky business because docker containers share the same kernel with the host operating system (at least on linux). Consequently, any files that the container creates in the bind mount with a given uid will have the same uid on the host system.
Whenever the uid used by the container (let's say it's 2222) is different from your own uid (or you don't have write access to files owned by 2222), you won't be able to delete the folder. The easy workaround is to run sudo rm -rf ~/test.
Edit: If the user does not have admin rights, you can still give them rights to modify the generated files like so.
# Create a directory that the users can write in.
mkdir workspace
# Change the owner to the group of users that should have access (3333).
sudo chown -R 2222:3333 workspace
# Give group write access.
sudo chmod -R g+w workspace
# Make sure that all users that should have write access are in group 3333.
Then you can run the container using
docker run --rm -u `id -u`:3333 -v `pwd`/workspace:/workspace \
-w /workspace alpine:latest touch myfile
which creates myfile in the workspace folder with the right permissions so your users can delete the file again.

How to launch container with user namespace configuration?

In the Dockerfile below, the base image (jenkins/jenkins) provides a user jenkins with UID 1000 and GID 1000 inside the container.
FROM jenkins/jenkins
# Install some base packages
# Use non-privileged user provided by base image (UID 1000 and GID 1000)
USER jenkins
# Copy plugins and other stuff
On the Docker host (an EC2 instance), we have also created a matching UID and GID,
$ groupadd -g 1000 jenkins
$ useradd -u 1000 -g jenkins jenkins
$ mkdir -p /abc/home_folder_for_jenkins
$ chown -R jenkins:jenkins /abc/home_folder_for_jenkins
to make sure the container can write files to /abc/home_folder_for_jenkins on the EC2 instance.
Another requirement on the same EC2 instance is that the other containers (other than the one above) run in non-privileged mode.
So the configuration below was applied on the Docker host (EC2):
$ echo dockremap:165536:65536 > /etc/subuid
$ echo dockremap:165536:65536 > /etc/subgid
$ echo '{"debug":true, "userns-remap":"default"}' > /etc/docker/daemon.json
With this dockremap configuration, jenkins does not start and the container goes into the Exited state:
$ ls -l /abc/home_folder_for_jenkins
total 0
After removing the dockremap configuration, everything works fine.
Why does the dockremap configuration not allow the jenkins container to run as the jenkins user?
I'm actually fighting with this because it seems not very portable but this is the best I found. As said above on your docker host the UID/GID are the ones from the container + the value in /etc/subuid & /etc/subgid.
So your "container root" is 165536 on your host and your user jenkins is 166536 (165536 + 1000).
To come back to your example what you need to do is
$ mkdir -p /abc/home_folder_for_jenkins
$ chown -R 166536:166536 /abc/home_folder_for_jenkins
User namespaces offset the UID/GID of the user inside the container, and of any files inside the container. There is no mapping from the UID/GID inside the container to the external host UID/GID (that would defeat the purpose). Therefore, you would need to offset the UID/GID of the directory being created, or just use a named volume and let docker handle this for you. I believe that the UID/GID on the host would be 166536 (165536 + 1000) (I may have an off-by-one in there, so try opening the directory permissions if this still fails and see what gets created).
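The offset arithmetic can be written down as a small helper. This sketch assumes the daemon-wide userns-remap setup from the question, where container id N maps to host id start + N for an /etc/subuid or /etc/subgid entry of the form name:start:count; remapped_id is a hypothetical name.

```shell
#!/bin/sh
# remapped_id ENTRY CONTAINER_ID
# ENTRY is an /etc/subuid-style line "name:start:count". Prints the host id
# that CONTAINER_ID maps to, or fails if the id is outside the mapped range.
remapped_id() {
  rest=${1#*:}          # drop the name, leaving "start:count"
  start=${rest%%:*}
  count=${rest#*:}
  if [ "$2" -ge "$count" ]; then
    echo "id $2 is outside the mapped range" >&2
    return 1
  fi
  echo $(( start + $2 ))
}

entry=dockremap:165536:65536
echo "container root -> host $(remapped_id "$entry" 0)"
echo "container jenkins (1000) -> host $(remapped_id "$entry" 1000)"
```

For the entry from the question this prints 165536 for container root and 166536 for the jenkins user, which is the owner the host directory needs.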

File permission in docker container with volume mount

I'm trying to let a docker container access a letsencrypt certificate from the host file system.
I do not want to run the docker container as root, but rather as a user with very specific access rights.
Neither do I want to change the permissions of the certificate.
All I want, is for the given user, to have access to read the certificate inside the docker container.
The certificate has the following setup:
-rw-r----- 1 root cert-group
The user who's going to run the docker container, is in the cert-group:
uid=113(myuser) gid=117(myuser) groups=117(myuser),999(cert-group),998(docker)
This works as long as we're on the host - I am able to read the file as expected with the user "myuser".
Now I want to do this within a docker container with the certificate mounted as a volume.
I have done multiple test cases, but none with any luck.
A simple docker-compose file for testing:
version: '3.7'
services:
test:
image: alpine:latest
volumes:
- /etc/ssl/letsencrypt/cert.pem:/cert.pem:ro
command: >
sh -c 'ls -l / && cat /etc/passwd && cat /etc/group && cat /cert.pem'
user: "113:117"
restart: "no"
This outputs a lot, but the most important part is:
test_1 | -rw-r----- 1 root ping 3998 Jul 15 09:51 cert.pem
test_1 | cat: can't open '/cert.pem': Permission denied
test_1 | ping:x:999:
Here I assume that "ping" is an internal group in the Alpine image; however, I'm getting mixed information about how this interacts with the host.
From this article https://medium.com/@mccode/understanding-how-uid-and-gid-work-in-docker-containers-c37a01d01cf my takeaway is that there's a single kernel handling all permissions (the host's), and therefore if the same uid and gid are used, the permissions carry over from the host. However, even though the running user is 113:117, which on the host is part of group 999, it still doesn't give me access to read the file.
Next I found this article https://medium.com/@nielssj/docker-volumes-and-file-system-permissions-772c1aee23ca where this bullet point especially caught my attention:
The container OS enforces file permissions on all operations made in
the container runtime according to its own configuration. For example,
if a user A exists in both host and container, adding user A to group
B on the host will not allow user A to write to a directory owned by
group B inside the container unless group B is created inside the
container as well and user A is added to it.
This made me think, that maybe a custom Dockerfile was needed, to add the user inside docker, and make the user part of 999 (which is known as ping as earlier stated):
FROM alpine:latest
RUN adduser -S --uid 113 -G ping myuser
USER myuser
Running this gives me the exact same result, now with myuser appended to passwd though:
test_1 | myuser:x:113:999:Linux User,,,:/home/myuser:/sbin/nologin
This is just a couple of things that I've tried.
Another is syncing /etc/passwd and /etc/group with volumes found in some other blog
volumes:
- /etc/passwd:/etc/passwd
- /etc/group:/etc/group
This makes it look correct inside the container, but it doesn't change the end result: still permission denied.
Any help or pointers in the right direction would be really appreciated since I'm running out of ideas.
Docker containers do not know the uid/gid of the user running the container on the host. All requests to run containers go through the docker socket, and then to the docker engine that is often running as root, and no uid/gid's are passed in those API calls. The docker engine is just running the container as the user specified in the Dockerfile or as part of the container create command (in this case, from the docker-compose.yml).
Once inside the container, the mapping from uid/gid to names is done with the /etc/passwd and /etc/group file that is inside the container. Importantly, at the filesystem level, uid/gid values are not being mapped between the container and the host (with the exception of user namespaces, but if implemented properly, that would only make this problem worse). And all filesystem operations happen at the uid/gid level, not based on names. So when you do a host volume mount, the uid/gid's are passed directly through.
The issue you are encountering here is how you are telling the container to pick the uid/gid to run the container processes. By specifying user: "113:117" you have told the container to set not only the uid (113) but also the gid (117) of the process. When that's done, none of the secondary groups from /etc/group are assigned to the user. To get those secondary groups assigned, specify only the uid, user: "113", which will then look up the group assignments from the /etc/passwd and /etc/group files inside the container. E.g.:
user: "113"
Unfortunately, the lookup for group membership is done by docker before any volumes are mounted, so you have the following scenario.
First, create an image with an example user assigned to a few groups:
$ cat df.users
FROM alpine:latest
RUN addgroup -g 4242 group1 \
&& addgroup -g 8888 group2 \
&& adduser -u 1000 -D -H test \
&& addgroup test group1 \
&& addgroup test group2
$ docker build -t test-users -f df.users .
...
Next, run that image, comparing the id on the host to the id inside the container:
$ id
uid=1000(bmitch) gid=1000(bmitch) groups=1000(bmitch),24(cdrom),25(floppy),...
$ docker run -it --rm -u bmitch -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro test-users:latest id
docker: Error response from daemon: unable to find user bmitch: no matching entries in passwd file.
Whoops, docker doesn't see the entry from /etc/passwd; let's try with the test user we created in the image:
$ docker run -it --rm -u test -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro test-users:latest id
uid=1000(bmitch) gid=1000(bmitch) groups=4242,8888
That works, and assigns the groups from the /etc/group file in the image, not the one we mounted. We can also see that uid works too:
$ docker run -it --rm -u 1000 -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro test-users:latest id
uid=1000(bmitch) gid=1000(bmitch) groups=4242,8888
As soon as we specify the gid, the secondary groups are gone:
$ docker run -it --rm -u 1000:1000 -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro test-users:latest id
uid=1000(bmitch) gid=1000(bmitch)
And if we run without overriding the /etc/passwd and /etc/group file, we can see the correct permissions:
$ docker run -it --rm -u test test-users:latest id
uid=1000(test) gid=1000(test) groups=4242(group1),8888(group2)
Likely the best option is to add a container user with the group membership matching the uid/gid values from the host. For host volumes, I've also solved this problem with a base image that dynamically adjusts the user or group inside the container to match the uid/gid of the file mounted in a volume. This is done as root, and then gosu is used to drop permissions back to the user. You can see that at sudo-bmitch/docker-base on github, specifically the fix-perms script that I would run as part of an entrypoint.
Also, be aware that mounting the /etc/passwd and /etc/group can break file permissions of other files within the container filesystem, and this user may have access inside that container that is not appropriate (e.g. you may have special access to the ping command that gives the ability to modify files or run ping commands that a normal user wouldn't have access to). This is why I tend to adjust the container user/group rather than completely replace these files.
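One more option that may be worth trying, not taken from the answers here, is docker run's --group-add flag (group_add in a Compose file), which adds supplementary groups to the container process by name or numeric gid even when a primary gid is given with -u. The sketch below only prints the command so it can be reviewed before running; the function name is hypothetical, and the uid, gids and certificate path are the ones from the question.

```shell
#!/bin/sh
# print_cert_run_cmd UID GID FILE_GID HOST_FILE
# Prints (does not run) a docker command that keeps UID:GID as the primary
# identity and adds the gid owning the certificate as a supplementary group.
print_cert_run_cmd() {
  echo "docker run --rm -u $1:$2 --group-add $3 -v $4:/cert.pem:ro alpine:latest cat /cert.pem"
}

# Values from the question: myuser is 113:117, cert-group is gid 999.
print_cert_run_cmd 113 117 999 /etc/ssl/letsencrypt/cert.pem
```

Because the group is given as a number, this does not depend on the image's /etc/group containing a matching entry, which sidesteps the lookup-order problem shown above.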
Actually, your solution is not wrong. I did the same with a few differences.
This is my Dockerfile:
FROM alpine:latest
RUN addgroup -S cert-group -g 117 \
&& adduser -S --uid 113 -G cert-group myuser
USER myuser
And my docker-compose.yml:
version: '3.7'
services:
test:
build:
dockerfile: ./Dockerfile
context: .
command: >
sh -c 'ls -l / && cat /etc/passwd && cat /etc/group && cat /cert.pem'
volumes:
- "/tmp/test.txt:/cert.pem:ro"
restart: "no"
My '/tmp/test.txt' is assigned to 113:117.
IMHO, the problem is that your docker-compose.yml doesn't use your image. You should remove image: and add build:.
I ran into the same issue today and, luckily, the solution below helped me.
"Add :Z to your volume mounts"
Reference: https://github.com/moby/moby/issues/41202
Note: this appears to be an issue only on CentOS; I didn't face any problem on Ubuntu. (The :Z suffix asks Docker to relabel the mounted content for SELinux, which CentOS enforces by default.)

Docker volume and host permissions

When I run a docker image for example like
docker run -v /home/n1/workspace:/root/workspace -it rust:latest bash
and I create a directory in the container like
mkdir /root/workspace/test
It's owned by root on my host machine, which means I have to change the permissions every time I shut down the container in order to work with that directory.
Is there a way to tell Docker to create directories and files on my (host) machine as a certain user?
You need to run your application as the same uid inside the container as you do on the host to get file ownership to match. My own solution for this is to start the container as root, adjust the uid of the user inside the container to match the volume mount, and then su to the user to run the app. Scripts for this can be found in this repo: https://github.com/sudo-bmitch/docker-base
In that repo, the fix-perms script handles the change in uid/gid inside the container, and the entrypoint script has an exec gosu $username "$@" that runs the app as the selected user.
Sure, because Docker uses root as the default user. You should create a user in your Docker image, switch to that user, and then create the folder; it will then be owned by that user's uid on your host machine instead of root.
Dockerfile
FROM rust:latest
...
RUN useradd -ms /bin/bash myuser
USER myuser
