Deriving FROM existing Dockerfile + setting USER to non-root - docker

I'm trying to find a generic best practice for how to:
Take an arbitrary (parent) Dockerfile, e.g. one of the official Docker images that run their containerized service as root,
Derive a custom (child) Dockerfile from it (via FROM ...),
Adjust the child in the way that it runs the same service as the parent, but as non-root user.
I've been searching and trying for days now but haven't been able to come up with a satisfying solution.
I'd like to come up with an approach e.g. similar to the following, simply for adjusting the user the original service runs as:
FROM mariadb:10.3
RUN chgrp -R 0 /var/lib/mysql && \
chmod g=u /var/lib/mysql
USER 1234
However, the issue I keep running into is that whenever the parent Dockerfile declares some path as a VOLUME (in the example above, VOLUME /var/lib/mysql), the child Dockerfile effectively cannot adjust file permissions for that path. The chgrp and chmod have no effect in that case, so the resulting container won't start successfully due to file-permission issues.
I understand that the VOLUME directive works that way by design and also why it's like that, but to me it seems that it completely prevents a simple solution for the given problem: Taking a Dockerfile and adjusting it in a simple, clean and minimalistic way to run as non-root instead of root.
The background is: I'm trying to run arbitrary Docker images on an OpenShift cluster. OpenShift by default prevents containers from running as root, and I'd like to keep it that way, as it seems quite sane and a step in the right direction, security-wise.
This implies that a solution like gosu, which expects the container to be started as root and then drops privileges at runtime, isn't good enough here. I'd like an approach that doesn't require the container to be started as root at all, but only as the specified USER, or even with a random UID.
The unsatisfying approaches that I've found until now are:
Copy the parent Dockerfile and adjust it in the way necessary (effectively duplicating code)
sed/awk through all the service's config files during build time to replace the original VOLUME path with an alternate path, so the chgrp and chmod can work (leaving the original VOLUME path orphaned).
I really don't like these approaches, as they require really digging into the logic and infrastructure of the parent Dockerfile and into how the service itself operates.
So there must be better ways to do this, right? What is it that I'm missing? Help is greatly appreciated.

Permissions on volume mount points don't matter at all; the mount covers up whatever underlying permissions were there to start with. Additionally, you can set this kind of thing at the Kubernetes level rather than worrying about the Dockerfile at all. This is usually done through a PodSecurityPolicy, but you can also set it in the securityContext on the pod itself.
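For illustration, a pod-level securityContext along these lines covers both the non-root UID and the volume group ownership; the UID/GID values are arbitrary examples, not anything OpenShift requires:
apiVersion: v1
kind: Pod
metadata:
  name: mariadb
spec:
  securityContext:
    runAsUser: 1234   # run the container process as this non-root UID
    fsGroup: 1234     # volumes get this GID and are made group-writable
  containers:
    - name: mariadb
      image: mariadb:10.3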

Switch USER for single RUN command in Dockerfile

Currently I am facing the following challenge:
I am extending a base image which sets USER safeuser at the end. In my dependent image I try to make some changes to the filesystem of the base image, but since safeuser can't modify files owned by root, I would need to switch via USER root, make my changes, and then go back to USER safeuser.
This approach seems quite ugly: what if, for example, the base image changes the username from safeuser to othername? Is there any way to change the USER only during the build, or to RUN single commands as a different user without having to explicitly switch back to the original user? Or can I at least store some reference to the original USER during the build somehow?
This can be done using sudo --user:
root@machine:/home/user# whoami
root
root@machine:/home/user# sudo --user=user whoami
user
root@machine:/home/user# whoami
root
When used in a Dockerfile, it will look like the following:
# ...
RUN sudo --user=user mkdir /tmp/dir
RUN touch /tmp/dir/root_file
Edit: as @CharlesDuffy wrote below, doing this requires sudo to be installed and safeuser to be added to the sudoers configuration of the image being built, which isn't ideal. I'd consider other (less elegant but more secure) options before choosing this one.
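For completeness, a rough sketch of what that setup would involve; it assumes a Debian-based image, and do-root-level-thing is just a placeholder command:
# run while the build is still root (e.g. in the parent image, before it switches to safeuser)
RUN apt-get update && apt-get install -y sudo && \
    echo "safeuser ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/safeuser && \
    chmod 0440 /etc/sudoers.d/safeuser
# later build steps running as safeuser can then escalate explicitly
RUN sudo do-root-level-thing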

How to avoid sprinkling chowns all over

I'm trying to figure out how it's supposed to work based on these two articles:
https://medium.com/@mccode/processes-in-containers-should-not-run-as-root-2feae3f0df3b
https://vsupalov.com/docker-shared-permissions/
I am setting up a Dockerfile based on an image that already defines an ubuntu user (the subject of the first linked article above), so that the container runs as ubuntu by default. This adheres to best practice.
The problem I'm having is that the code directories COPYed in the Dockerfile are all owned by root, and the cmake .. call required during the build fails because of this. I understand that COPY runs as root by default, and that even with the --chown flag, any parent directories implicitly created by the COPY would still be owned by root regardless of --chown.
Doesn't the fact that the image already has an ubuntu user mean that calling RUN adduser --uid 1000 ubuntu in the Dockerfile (the suggestion from the second linked article above) would be problematic (at best redundant)?
That would mean we don't want to RUN adduser, so is the only remaining option to sprinkle tons of chowns all over the Dockerfile? I refuse to do this.
Specifying USER ubuntu near the top of the Dockerfile, before any of the COPYs, appears to eliminate most of the required chown calls. A few will still be needed if some part of the copied directory tree started out owned by root and needs to be written to later.
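A minimal sketch of that approach, combined with COPY --chown; the base image name and paths are made up, and cmake and a compiler are assumed to be present in the base image:
FROM some-image-with-ubuntu-user
USER ubuntu
# create the source directory as ubuntu so later steps can write into it
RUN mkdir -p /home/ubuntu/src
WORKDIR /home/ubuntu/src
# COPY runs as root by default; --chown hands the copied files to ubuntu
COPY --chown=ubuntu:ubuntu . .
RUN mkdir build && cd build && cmake .. && make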

With the Docker USER directive, multiple shell commands run into permission problems

I've browsed a couple of articles about Docker best practices, and recognize that running a container as a non-privileged user has some obvious security bonuses. So my first question is: why use the USER directive at all to build your image? That is, why not simply build the image as root, but run the container as a restricted user? In that case, it seems USER is superfluous.
The story does not end there. I tried various use cases with USER. As I am building an image based off a Debian snapshot, I placed USER after all relevant apt-get installations. So far so good. But as soon as I tried creating content within the USER's home directory, permission issues surfaced, no matter whether I explicitly assigned ownership of the enclosing parent directory to that user and group.
Whenever I run into a feature that does not work in the obvious way, I ask myself whether it is a feature worth keeping. So, is there any practical reason to retain USER, given that you probably could do everything in a user-restricted way -- at least from a permissions perspective -- from outside the container?
One of the main reasons to run things as a non-root user (Docker or otherwise) is as an additional layer of security: even if your application is somehow compromised, it can't overwrite its own source code or static content that it's serving to end users. I would generally include a USER directive, but only at the very end so that it only affects the (default) user for the docker run command.
FROM some-base-image
...
# Do all installation as root
...
# And pick an alternate user for runtime only
USER something
CMD ["the_app"]
"Home directory" isn't really a Docker concept. It's very typical to store your application in, say, /app (mode 0755 owned by root).

Modifying volume data inherited from parent image

Say there is an image A described by the following Dockerfile:
FROM bash
RUN mkdir "/data" && echo "FOO" > "/data/test"
VOLUME "/data"
I want to specify an image B that inherits from A and modifies /data/test. I don't want to mount the volume; I want it to have some default data that I specify in B:
FROM A
RUN echo "BAR" > "/data/test"
The problem is that the test file keeps the content it had at the moment of the VOLUME instruction in A's Dockerfile: in image B the file contains FOO instead of the BAR I would expect.
The following Dockerfile demonstrates the behavior:
FROM bash
# overwriting volume file
RUN mkdir "/volume-data" && echo "FOO" > "/volume-data/test"
VOLUME "/volume-data"
RUN echo "BAR" > "/volume-data/test"
RUN cat "/volume-data/test" # prints "FOO"
# overwriting non-volume file
RUN mkdir "/regular-data" && echo "FOO" > "/regular-data/test"
RUN echo "BAR" > "/regular-data/test"
RUN cat "/regular-data/test" # prints "BAR"
Building the Dockerfile will print FOO and BAR.
Is it possible to modify file /data/test in B Dockerfile?
It seems that this is intended behavior; the Dockerfile reference says so under VOLUME: "Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded."
VOLUMEs are not part of your image, so what is the use case for seeding data into them? When you run the image somewhere else, it starts with an empty volume; the Dockerfile behaviour reminds you of that.
So basically, if you want to keep the data along with the app code, you should not use VOLUMEs. If the volume declaration exists in the parent image, you need to remove it before starting your own image build, for example with docker-copyedit.
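As an illustration, removing the inherited volume with docker-copyedit looks roughly like this; the exact syntax is from memory, so check the tool's README, and the image tags are made up:
./docker-copyedit.py FROM a:latest INTO a-novolumes:latest REMOVE ALL VOLUMES
The child Dockerfile would then build FROM a-novolumes:latest, and its RUN echo "BAR" > "/data/test" would persist in the resulting image.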
There are a few non-obvious ways to do this, and all of them have their obvious flaws.
Hijack the parent Dockerfile
Perhaps the simplest, but least reusable, way is to simply take the parent Dockerfile and modify that. A quick Google of docker <image-name:version> source should find the GitHub repository hosting the parent image's Dockerfile. This is good for optimizing the final image, but defeats the point of using layers.
Use an on-start script
While a Dockerfile can't make further modifications to a volume, a running container can. Add a script to the image and change the entrypoint to call it (and have that script call the original entrypoint). This is what you will HAVE to do if you are using a singleton-type container and need to partially 'reset' a volume on start-up. Of course, since volumes are persisted outside the container, just remember that 1) your changes may already have been made, and 2) another container started at the same time may already be making those changes.
Since volumes are (virtually) forever, I just use one-time setup scripts after starting the containers for the first time. That way I easily control when the default data is set up or reset. (You can use docker inspect <volume-name> to get the host location of the volume if you need to.)
The common middle ground on this one seems to be to have a one-off image whose only purpose is to run once to do the volume configurations, and then clean it up.
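A rough sketch of the entrypoint-wrapper idea described above; the marker file, the volume path, and the original entrypoint location are all assumptions:
#!/bin/sh
# seed-entrypoint.sh (hypothetical): seed the volume once, then hand off
if [ ! -e /data/.seeded ]; then
    echo "BAR" > /data/test
    touch /data/.seeded
fi
exec /usr/local/bin/docker-entrypoint.sh "$@"
In the image it would be wired up with something like COPY seed-entrypoint.sh /usr/local/bin/ followed by ENTRYPOINT ["seed-entrypoint.sh"].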
Bind to a new volume
Copy the contents of the old volume to a new one, and configure everything to use the new volume instead.
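For instance, the copy can be done with a throwaway container; the volume names here are made up:
docker volume create new-data
docker run --rm -v old-data:/from -v new-data:/to alpine cp -a /from/. /to/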
And finally... reconsider if Docker is right for you
You probably already wasted more time on this than it was worth. (In my experience, the maintenance pain of Docker has always far outweighed the benefits. However, Docker is a tool, and with any tool, you need to sometimes take a moment to reflect if you are using it right, and if there are better tools for the job.)

Move a file downloaded in a Dockerfile to the hard drive

First off, I really lack a lot of knowledge regarding Docker itself and its structure. I know that it'd be way more beneficial to learn the basics first, but I do require this to work in order to move on to other things for now.
So within a Dockerfile I installed wget and used it to download a file from a website; authentication and download are successful. However, when I later try to move said file, it can't be found, and it doesn't show up in, e.g., Explorer either (the path was specified).
I thought it might have something to do with RUN and how it executes the wget command; I read that the container ID can be used to copy files to the hard drive, but how would I do that from within a Dockerfile?
RUN wget -P ./path/to/somewhere http://something.com/file.txt --username xyz --password bluh
ADD ./path/to/somewhere/file.txt /mainDirectory
Download and log-in are shown as successful, but as I mentioned, I am having trouble using that file later on, as it is nowhere to be found on the hard drive. Probably a basic error, but I'd really appreciate some input that might lead to a solution.
Obviously the error is produced when trying to execute ADD, as there is no file to move. I am trying to find a way to mount a volume in order to store it, but so far in vain.
Edit:
Though the question is similar to the "move to hard drive" one, I am searching for ways to get the ID of the container created within the Dockerfile in order to move the file; while that thread provides such answers, I haven't had any luck using them from within the Dockerfile itself.
Short answer is that it's not possible.
The Dockerfile builds an image, which you can run as a short-lived container. During the build, you don't have (write) access to the host and its file system. Which kinda makes sense, since you want to build an immutable image from which to run ephemeral containers.
What you can do is run a container, and mount a path from your host as a volume into the container. This is the only way how you can share files between the host and a container.
Here is an example how you could do this with the sherylynn/wget image:
docker run -v /path/on/host:/path/in/container sherylynn/wget wget -O /path/in/container/file http://my.url.com
The -v HOST:CONTAINER parameter allows you to specify a path on the host that is mounted inside the container at a specified location.
For wget, I would prefer -O over -P when downloading a single file, since it makes it really explicit where your download ends up. When you point -O to the location of the volume, the downloaded file ends up on the host system (in the folder you mounted).
Since I have no idea what your image or your environment looks like, you might need to tweak one or two things to work well with your own image. As a general recommendation: For basic commands like wget or curl, you can find pre-made images on Docker Hub. This can be quite useful when you need to set up a Continuous Integration pipeline or so, where you want to use wget or curl but can't execute it directly.
Use wget -O instead of -P to download to a specific file, for example:
RUN wget -O /tmp/new_file.txt --user xyz --password bluh http://something.com/new_file.txt
