How to idiomatically access sensitive data when building a Docker image? - docker

Sometimes there is a need to use sensitive data when building a Docker image. For example, an API token or SSH key to download a remote file or to install dependencies from a private repository. It may be desirable to distribute the resulting image and leave out the sensitive credentials that were used to build it. How can this be done?
I have seen docker-squash, which can squash multiple layers into one, removing any deleted files from the final image. But is there a more idiomatic approach?

Regarding an idiomatic approach, I'm not sure; docker is still quite young to have many established idioms.
We have had this same issue at our company, however. We have come to the following conclusions, although these are our best efforts rather than established docker best practices.
1) If you need the values at build time: Supply a properties file in the build context with the values that can be read during the build; the properties file can then be deleted afterwards. This isn't as portable, but it will do the job.
2) If you need the values at run time: Pass the values as environment variables. They will be visible to anyone who has access to ps on the box, but this can be restricted via SELinux or other methods (honestly, I don't know that process; I'm a developer and the operations team deals with that part).
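As a rough illustration of option 2 (the variable and image names here are made up):
# Read the token from the host environment at run time; it never becomes part of the image
$ docker run -e API_TOKEN="$API_TOKEN" myimage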

Sadly, there is still no proper solution for handling sensitive data while building a docker image.
This bug has a good summary of what is wrong with every hack that people suggest:
https://github.com/moby/moby/issues/13490
And most advice seems to confuse secrets that need to go INTO the container with secrets that are used to build the container, like several of the answers here.
The current solutions that seem to actually be secure all center around writing the secret out to disk or memory, starting a silly little HTTP server, and then having the build process pull the secret from that server, use it, and never store it in the image.
The best I've found without going to that level of complexity is to (mis)use the built-in predefined-args feature of docker compose files, as described in this comment:
https://github.com/moby/moby/issues/13490#issuecomment-403612834
That does seem to keep the secrets out of the image build history.
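For reference, the trick from that comment boils down to smuggling the secret through one of the predefined proxy build args, which docker documents as being excluded from the build history. Roughly (the token variable is made up):
# docker-compose.yml fragment: the value reaches RUN steps as $FTP_PROXY during the build,
# but is not recorded by docker history and is not persisted in the final image
services:
  app:
    build:
      context: .
      args:
        FTP_PROXY: "${MY_SECRET_TOKEN}"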

Matthew Close talks about this in this blog article.
Summarized: You should use docker-compose to mount sensitive information into the container.
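I have not reproduced the article's exact setup, but the general shape of that approach is a read-only volume in the compose file, something like this (paths are made up):
services:
  app:
    image: myimage
    volumes:
      - ./secrets/api_token:/run/secrets/api_token:ro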

2019, and I'm not sure there is an idiomatic approach or best practice regarding secrets when using docker: https://github.com/moby/moby/issues/13490 remains open so far.
Secrets at runtime:
So far, the best approach I could find was using environment variables in a container:
with the docker run -e option... but then your secrets end up in your command-line history
with the docker run --env-file option or the docker-compose env_file option; at least the secrets are not passed on the command line (a short example follows below)
Problem: in either case, the secrets are now available to anyone able to run docker commands on your docker host (via docker inspect)
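A sketch of the env_file variant mentioned above (file name and contents are made up):
# secrets.env contains lines such as API_TOKEN=abc123; keep it out of version control
$ docker run --env-file ./secrets.env myimage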
Secrets at build time (your question):
I can see 2 additional (partial?) solutions to this problem:
Multistage build:
use a multi-stage docker build: basically, your dockerfile will define 2 images:
A first, intermediate image (the "build image") in which:
you add your secrets to this image: either use build args or copy secret files (be careful with build args: they have to be passed on the docker build command line)
you build your artefact (you now have access to your private repository)
A second image (the "distribution image") in which:
you copy the built artefact from the "build image"
distribute your image on a docker registry
This approach is explained by several comments in the quoted github thread:
https://github.com/moby/moby/issues/13490#issuecomment-408316448
https://github.com/moby/moby/issues/13490#issuecomment-437676553
Caution
This multistage build approach is far from ideal: the "build image" is still lying around on your host after the build command (and it still contains your sensitive information). There are precautions to take.
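To make the idea concrete, a minimal sketch of such a Dockerfile (the image names, repository URL and build arg are all hypothetical):
# Stage 1: the "build image" receives the secret and fetches the private code
FROM alpine/git AS build
ARG GIT_TOKEN
RUN git clone "https://oauth2:${GIT_TOKEN}@gitlab.example.com/org/private-repo.git" /src

# Stage 2: the "distribution image" only copies the artefact and never sees GIT_TOKEN
FROM alpine:3.19
COPY --from=build /src /app
It would be built with something like docker build --build-arg GIT_TOKEN="$GIT_TOKEN" -t myimage . and only the final stage would be pushed to the registry.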
A new --secret build option:
I discovered this option today, and therefore have not experimented with it yet... What I know so far:
it was announced in a comment from the same thread on github
this comment leads to a detailed article about this new option
the docker documentation (docker v19.03 at the time of writing) is not verbose about this option: it is listed with the description below, but there is no detailed section about it:
--secret
API 1.39+
Secret file to expose to the build (only if BuildKit enabled): id=mysecret,src=/local/secret
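I have not tried it either, but based on the description above the basic shape appears to be (the secret id and path are just the ones from the docs snippet):
# BuildKit mounts the file at /run/secrets/mysecret only for the RUN steps that request it
$ DOCKER_BUILDKIT=1 docker build --secret id=mysecret,src=/local/secret -t myimage .
with a matching RUN --mount=type=secret,id=mysecret ... instruction in the Dockerfile.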

The way we solve this issue is that we have a tool written on top of docker build. Once you initiate a build using the tool, it downloads a Dockerfile and alters it, changing all instructions which require "the secret" to something like:
RUN printf "secret: asd123poi54mnb" > /somewhere && tool-which-uses-the-secret run && rm /somewhere
However, this leaves the secret data available to anyone with access to the image unless the layer itself is removed with a tool like docker-squash, since the command used to generate each intermediate layer can be found using the history command.

Related

How to extract the docker-compose file from a running docker stack

I have a docker stack started with docker stack deploy --compose-file ...
and later manually edited via Docker Portainer UI.
I'd like to write a script that updates the docker image tag of one of the services.
To do that I need to "download" the latest "docker-compose" stack definition however I cannot find the appropriate docker command.
I do know that the best thing would be to stop changing the stack manually and rely on its definition stored in git, but unfortunately that is not up to me.
Please point me to the appropriate docker command or confirm that it is not available.
As far as I know there is no command that gets you the compose file from a running container directly; at least nothing implemented out of the box in docker. You could try to parse all the relevant information from docker inspect and a few other commands that list/inspect the relevant objects.
I once came across a similar situation where we had a running container but no run/compose command, and we needed to update it. At the time (roughly a year ago) I found and used docker-autocompose, which did a very good job. We only had to manually verify and adjust a few things, but it got all the difficult parts with the run parameters done for us.
It could help in your case to automate this, if your compose configs are simple enough.
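If it helps, the way we invoked it back then was roughly the following (the container name is a placeholder; check the project's README for the current image name):
# docker-autocompose reads the container's configuration via the Docker socket
# and prints a docker-compose style definition for it
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock red5d/docker-autocompose <container-name>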
But if you wanted to fully automate it to mimic CD, then I would not recommend the approach above. In that case I would check whether you could use the Portainer API as #LinFelix recommended. Or store compose files somewhere, prepared with parameters ($IMAGE_TAG) (in git / on the server), so you can generate a temporary compose file with the full configuration and then remove the current one.

docker - how to hide build arguments

I've recently started to use docker far and wide for my professional projects. I'm still getting to grips with many of the details.
So far, when trying to acquire a software package from a repository on gitlab or github, I have gone the route of acquiring a token, putting the token in some environment variable, and passing that to docker build via the --build-arg argument and then to the git clone command.
However, as I started pushing my images to dockerhub, I was a bit shocked to find that in the "Image Layer Details" section it also displays the values of the environment variables passed to docker build, that is, the contents of my security tokens. Now, this is not so problematic because I can just revoke them and create new ones every time I push, but that seems quite cumbersome.
Is there a good way to pass security tokens to docker build such that they don't show up anywhere publicly?
First I want to mention that COPYing the secret (if it's a file) or using ARG (with docker build --build-arg) will always be visible (either by inspecting the layers or by checking the image with docker history <image-id>), so those options are out of the question.
Docker now supports BuildKit which enables you to mount secrets during build time.
One way to do this is by adding the following statement in your Dockerfile:
RUN --mount=type=secret,id=mysecret <some_command>
and during build use:
export MYSECRET=bigsecret
DOCKER_BUILDKIT=1 docker build --secret id=mysecret,env=MYSECRET -t myimage:latest .
The secrets should be available at /run/secrets/<secret_name> by default, but you can also specify the destination yourself (check the link).
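For the git-clone use case from the question, the consuming RUN step might look roughly like this (repository URL and secret id are made up; on older Docker versions you may also need the syntax directive shown in the first line):
# syntax=docker/dockerfile:1
FROM alpine/git AS src
# The token is only mounted for this single RUN step and never ends up in a layer
RUN --mount=type=secret,id=mysecret \
    git clone "https://oauth2:$(cat /run/secrets/mysecret)@gitlab.example.com/org/private-repo.git" /src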

How to instruct docker or docker-compose to automatically build image specified in FROM

When processing a Dockerfile, how do I instruct docker build to build the image specified in FROM locally using another Dockerfile if it is not already available?
Here's the context. I have a large Dockerfile that starts from the base Ubuntu image, installs Apache, then PHP, then some custom configuration on top of that. Whether this is a good idea is another matter; let's assume the build steps cannot be changed. The problem is, every time I change anything in the config, everything has to be rebuilt from scratch, and this takes a while.
I would like to have a hierarchy of Dockerfiles instead:
my-apache : based on stock Ubuntu
my-apache-php: based on my-apache
final: based on my-apache-php
The first two images would be relatively static and can be uploaded to dockerhub, but I would like to retain an option to build them locally as part of the same build process. Only one container will exist, based on the final image. Thus, putting all three as "services" in docker-compose.yml is not a good idea.
The only solution I can think of is to have a manual build script that for each image checks whether it is available on Dockerhub or locally, and if not, invokes docker build.
Are there better solutions?
I have found this article on automatically detecting dependencies between Dockerfiles and building them in the proper order:
https://philpep.org/blog/a-makefile-for-your-dockerfiles
The actual Makefile from Philippe's git repo provides even more functionality:
https://github.com/philpep/dockerfiles/blob/master/Makefile
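The core idea can be sketched in a few lines (this is not Philippe's actual Makefile; the image names are the ones from the question, each image's Dockerfile is assumed to live in a directory of the same name, and recipe lines must be indented with tabs):
# Each image target depends on the target for its base image,
# so running make final rebuilds the chain in the right order.
my-apache:
	docker build -t my-apache ./my-apache

my-apache-php: my-apache
	docker build -t my-apache-php ./my-apache-php

final: my-apache-php
	docker build -t final ./final

.PHONY: my-apache my-apache-php final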

Why doesn't Docker Hub cache Automated Build Repositories as the images are being built?

Note: It appears the premise of my question is no longer valid since the new Docker Hub appears to support caching. I haven't personally tested this. See the new answer below.
Docker Hub's Automated Build Repositories don't seem to cache images. As it is building, it removes all intermediate containers. Is this the way it was intended to work or am I doing something wrong? It would be really nice to not have to rebuild everything for every small change. I thought that was supposed to be one of the best advantages of docker and it seems weird that their builder doesn't use it. So why doesn't it cache images?
UPDATE:
I've started using Codeship to build my app and then run remote commands on my DigitalOcean server to copy the built files and run the docker build command. I'm still not sure why Docker Hub doesn't cache.
Disclaimer: I am a lead software engineer at Quay.io, a private Docker container registry, so this is an educated guess based on the same problem we faced in our own build system implementation.
Given my experience with Dockerfile build systems, I would suspect that the Docker Hub does not support caching because of the way caching is implemented in the Docker Engine. Caching for Docker builds operates by comparing the commands to be run against the existing layers already present on the build machine.
For example, if the Dockerfile has the form:
FROM somebaseimage
RUN somecommand
ADD somefile somefile
Then the Docker build code will:
Check to see if an image matching somebaseimage exists
Check if there is a local image with the command RUN somecommand whose parent is the previous image
Check if there is a local image with the command ADD somefile somefile + a hashing of the contents of somefile (to make sure it is invalidated when somefile changes), whose parent is the previous image
If any of the above steps match, then that command will be skipped in the Dockerfile build process, with the cached image itself being used instead. However, the one key issue with this process is that it requires the cached images to be present on the build machine, in order to find and verify the matches. Having all of everyone's images on build nodes would be highly inefficient, making this a harder problem to solve.
At Quay.io, we solved the caching problem by creating a variation of the Docker caching code that could precompute these commands/hashes and then ask our registry for the cached layers, downloading them to the machine only after we had found the most efficient caching set. This required significant data model changes in our registry code.
If you'd like more information, we gave a technical overview of how we do this in this talk: https://youtu.be/anfmeB_JzB0?list=PLlh6TqkU8kg8Ld0Zu1aRWATiqBkxseZ9g
The new Docker Hub came out with a new Automated Build system that supports Build Caching.
https://blog.docker.com/2018/12/the-new-docker-hub/

Dockerfile or Registry? Which is the preferred strategy for distribution?

If you are making a service with a Dockerfile, is it preferred to build an image with the Dockerfile and push it to the registry -- rather than distribute the Dockerfile (and repo) for people to build their images?
What use cases favour Dockerfile+repo distribution, and which favour Registry distribution?
I'd imagine the same question could be applied to source code versus binary package installs.
Pushing to a central shared registry allows you to freeze and certify a particular configuration and then make it available to others in your organisation.
At DevTable we were initially using a Dockerfile that was run when we deployed our servers in order to generate our Docker images. As our docker image became more complex and acquired more dependencies, it was taking longer and longer to generate the image from the Dockerfile. What we really needed was a way to generate the image once and then pull the finished product to our servers.
Normally, one would accomplish this by pushing their image to index.docker.io, however we have proprietary code that we couldn't publish to the world. You may also end up in such a situation if you're planning to build a hosted product around Docker.
To address this need in the community, we built Quay, which aims to be the GitHub of Docker images. Check it out and let us know if it solves a need for you.
Private repositories on your own server are also an option.
To run the server, clone the https://github.com/dotcloud/docker-registry to your own server.
To use your own server, prefix the tag with the address of the registry's host. For example:
# Tag to create a repository with the full registry location.
# The location (e.g. localhost.localdomain:5000) becomes
# a permanent part of the repository name
$ sudo docker tag 0u812deadbeef your_server.example.com:5000/repo_name
# Push the new repository to its home location on your server
$ sudo docker push your_server.example.com:5000/repo_name
(see http://docs.docker.io.s3-website-us-west-2.amazonaws.com/use/workingwithrepository/#private-registry)
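For what it's worth, nowadays the registry server itself is usually started from the official registry image rather than by cloning that repository, along these lines:
# Run a private registry locally, listening on port 5000
$ docker run -d -p 5000:5000 --name registry registry:2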
I think it depends a little bit on your application, but I would prefer the Dockerfile:
A Dockerfile...
... in the root of a project makes it super easy to build and run it; it is just one command.
... can be changed by a developer if needed.
... is documentation about how to build your project
... is very small compared with an image, which could be useful for people with a slow internet connection
... is in the same location as the code, so when people checkout the code, they will find it.
An Image in a registry...
... is already built and ready!
... must be maintained. If you commit new code or update your application you must also update the image.
... must be crafted carefully: Can the configuration be changed? How do you handle the logs? How big is it? Do you package an NGINX within the image or is that part of the outer world? As #Mark O'Connor said, you will freeze a certain configuration, but that may not be the configuration someone else wants to use.
This is why I would prefer the Dockerfile. It is the same with a Vagrantfile: I would prefer the Vagrantfile over the VM image. And it is the same with an Ant or Maven script: I would prefer the build script over the packaged artifact (at least if I want to contribute code to the project).
