How to hide build arguments in docker

I've recently started to use docker far and wide for my professional projects. I'm still getting to grips with many of the details.
So far, when trying to acquire a software package from a repository on gitlab or github, I have gone the route of acquiring a token, putting the token in some environment variable, and passing that to docker build via the --build-arg argument and then to the git clone command.
However, as I started pushing my images to Docker Hub, I was a bit shocked to find that the "Image Layer Details" section also displays the values of the environment variables passed to docker build, that is, the contents of my security tokens. Now, this is not so problematic because I can just revoke them and create new ones every time I push, but that seems quite cumbersome.
Is there a good way to pass security tokens to docker build such that they don't show up anywhere publicly?

First I want to mention that COPYing the secret (if it's a file) or using ARG (with docker build --build-arg) will always be visible, either by inspecting the layers or by checking the image with docker history <image-id>, so those options are out of the question.
Docker now supports BuildKit which enables you to mount secrets during build time.
One way to do this is by adding the following statement in your Dockerfile:
RUN --mount=type=secret,id=mysecret <some_command>
and during build use:
export MYSECRET=bigsecret
DOCKER_BUILDKIT=1 docker build --secret id=mysecret,env=MYSECRET -t myimage:latest .
The secrets should be available at /run/secrets/<secret_name> by default, but you can also specify the destination yourself (check the link).
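For the original use case (cloning from a private GitLab/GitHub repository during the build), a minimal sketch of the Dockerfile side could look like this; the base image, repository URL, and secret id are assumptions, not part of the answer above:

# syntax=docker/dockerfile:1
FROM alpine:3.19
RUN apk add --no-cache git
# The token is only mounted for this single RUN step; it is never written to a layer.
RUN --mount=type=secret,id=mysecret \
    git clone "https://oauth2:$(cat /run/secrets/mysecret)@gitlab.com/group/project.git" /src

Built with the command above, docker history on the resulting image shows the RUN line but not the token value.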

Related

How to instruct docker or docker-compose to automatically build image specified in FROM

When processing a Dockerfile, how do I instruct docker build to build the image specified in FROM locally using another Dockerfile if it is not already available?
Here's the context. I have a large Dockerfile that starts from the base Ubuntu image, installs Apache, then PHP, then some custom configuration on top of that. Whether this is a good idea is another point; let's assume the build steps cannot be changed. The problem is, every time I change anything in the config, everything has to be rebuilt from scratch, and this takes a while.
I would like to have a hierarchy of Dockerfiles instead:
my-apache : based on stock Ubuntu
my-apache-php: based on my-apache
final: based on my-apache-php
The first two images would be relatively static and can be uploaded to dockerhub, but I would like to retain an option to build them locally as part of the same build process. Only one container will exist, based on the final image. Thus, putting all three as "services" in docker-compose.yml is not a good idea.
The only solution I can think of is to have a manual build script that checks, for each image, whether it is available on Docker Hub or locally and, if not, invokes docker build.
Are there better solutions?
I have found this article on automatically detecting dependencies between Dockerfiles and building them in the proper order:
https://philpep.org/blog/a-makefile-for-your-dockerfiles
The actual Makefile from Philippe's git repo provides even more functionality:
https://github.com/philpep/dockerfiles/blob/master/Makefile
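If you don't want to adopt the full Makefile, a minimal shell sketch of the "build only if missing" idea from the question could look like this; the directory layout and the unqualified image names are assumptions (in practice the pushed images would carry a Docker Hub namespace):

#!/bin/sh
set -e

# Build an image from the given directory only if it is neither present locally
# nor pullable from the registry.
build_if_missing() {
    image="$1"
    dir="$2"
    if ! docker image inspect "$image" >/dev/null 2>&1 \
        && ! docker pull "$image" >/dev/null 2>&1; then
        docker build -t "$image" "$dir"
    fi
}

# Build in dependency order so each FROM line finds its parent image.
build_if_missing my-apache     ./my-apache
build_if_missing my-apache-php ./my-apache-php
build_if_missing final         ./final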

What are the best practices for getting code into a Docker container?

What are the best practices for getting code into a Docker container?
Here are some possible approaches:
an ADD call in the Dockerfile
git clone or wget from the repo inside the container
Mount an external device
Usually, we use a mount for the dev/local environment with Docker so that code changes are applied instantly.
You can use RUN git clone but you will need to install git and have access to the repository.
The easiest and most used/recommended way is to put the Dockerfile inside your repo and use the ADD call. But use the COPY directive instead, because it's more explicit.
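In practice that just means a Dockerfile at the repository root that copies the source in at build time; a minimal sketch, where the base image, file names, and start command are assumptions:

FROM python:3.12-slim
WORKDIR /app
# Install dependencies first so this layer stays cached when only the code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code into the image.
COPY . .
CMD ["python", "app.py"]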
You should always COPY your application code in the Dockerfile. You generally shouldn't mount something over that, and you almost definitely shouldn't run git in your Dockerfile.
Running Git has a number of obvious issues. You can only build whatever specific commit the Dockerfile mentions; in real development it's not unusual to want to build a branch or a past commit. You need to install both git itself and relevant credentials in the image, with corresponding space and security issues. Docker layer caching means that docker build by default won't repeat a git clone step if it thinks it's already done that, which means you need to go out of your way to get a newer build.
It's very common in SO questions to mount code into a container. This very much defeats the point of Docker, IMHO. If you go to run your application in production, you want to just distribute the image and not also separately distribute its code; but if you've been developing in an environment where a mount always hides the code built into the image, you've never actually run the image itself. There are also persistent performance and consistency problems with mounts in various environments. I don't really recommend docker run -v or Docker Compose volumes: as a way of getting code into a container.
I've seen enough SO questions that have a Dockerfile that's essentially just FROM python, for example, and then use Docker Compose volumes: and command: options to get a live-development environment on top of that. It's much easier to just install Python, and have the tools you need right in front of you without needing to go through a complex indirection sequence to run them in Docker.

Accessing host's api from inside a container

I'm trying to make a build env with Docker and I want to make this automatic. I've written a custom Go binary to handle build stuff and I've built an image which has the Go binary, Maven, and the Java 8 SDK installed.
The steps the binary performs are:
Clone a git repo
Run the build command
Extract build artifacts to the host (which hasn't been done yet).
I'm passing the repo URL as a parameter to the binary when running the container, and it does build.
But the problem is I need those artifacts in order to run the built app.
I know I can use volumes, but I don't want to use them because when the build is done the volumes become dangling, and it takes a job to delete those dangling volumes.
I thought I could create an API for saving files onto the host (which means I'd have to run that API on the host machine), and my custom Go binary could send files to the API, which would do the saving.
But when it comes to calling the host from inside a container I've got a problem: I'm getting a connection refused to port xx error.
Is there a better way to do it, or should I change my approach?
Found an answer on accessing-host-machine-as-localhost-from-a-docker-container-thats-also-inside.
Running the container with the --add-host option is the answer.
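For reference, on recent Docker Engine versions (20.10+) there is a special host-gateway value for that option, so the host can be reached under a stable name from inside the container; the image name and port are assumptions:

docker run --add-host=host.docker.internal:host-gateway myimage
# inside the container, the host API is then reachable at http://host.docker.internal:<port>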
While you could use
docker cp CONTAINER:SRC_PATH DEST_PATH
to get the files out of your container, I still believe using a volume is the better idea. Instead of using an anonymous volume, bind-mount a host directory:
docker run -v /local/host/dir:/build/output YOURIMAGE
This allows you to pick up the artefacts on your host from /local/host/dir.
https://docs.docker.com/engine/tutorials/dockervolumes/#locate-a-volume

How to idiomatically access sensitive data when building a Docker image?

Sometimes there is a need to use sensitive data when building a Docker image. For example, an API token or SSH key to download a remote file or to install dependencies from a private repository. It may be desirable to distribute the resulting image and leave out the sensitive credentials that were used to build it. How can this be done?
I have seen docker-squash which can squash multiple layers in to one, removing any deleted files from the final image. But is there a more idiomatic approach?
Regarding idiomatic approach, I'm not sure, although docker is still quite young to have too many idioms about.
We have had this same issue at our company, however. We have come to the following conclusions, although these are our best efforts rather than established docker best practices.
1) If you need the values at build time: Supply a properties file in the build context with the values that can be read at build, then the properties file can be deleted after build. This isn't as portable but will do the job.
2) If you need the values at run time: Pass values as environment variables. They will be visible to someone who has access to ps on the box, but this can be restricted via SELinux or other methods (honestly, I don't know this process, I'm a developer and the operations teams will deal with that part).
Sadly, there is still no proper solution for handling sensitive data while building a docker image.
This bug has a good summary of what is wrong with every hack that people suggest:
https://github.com/moby/moby/issues/13490
And most advice seems to confuse secrets that need to go INTO the container with secrets that are used to build the container, like several of the answers here.
The current solutions that seem to actually be secure all center around writing the secret out to disk or memory, starting a silly little HTTP server, and having the build process pull the secret in from that server, use it, and never store it in the image.
The best I've found without going to that level of complexity is to (mis)use the built-in predefined-args feature of docker compose files, as specified in this comment:
https://github.com/moby/moby/issues/13490#issuecomment-403612834
That does seem to keep the secrets out of the image build history.
Matthew Close talks about this in this blog article.
Summarized: You should use docker-compose to mount sensitive information into the container.
2019, and I'm not sure there is an idiomatic approach or best practice regarding secrets when using docker: https://github.com/moby/moby/issues/13490 remains open so far.
Secrets at runtime:
So far, the best approach I could find was using environment variables in a container:
with the docker run -e option... but then your secrets are available in the command-line history
with the docker run --env-file option or the docker-compose env_file option. At least secrets are not passed on the command line
Problem: in any case, secrets are now available to anyone able to run docker commands on your Docker host (using the docker inspect command)
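A short sketch of the env-file variant (the file name, variable, and image are assumptions):

printf 'API_TOKEN=bigsecret\n' > secrets.env
docker run --env-file secrets.env myimage
# the token stays out of your shell history, but docker inspect on the container still shows it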
Secrets at build time (your question):
I can see 2 additional (partial?) solutions to this problem:
Multistage build:
use a multi-stage Docker build: basically, your Dockerfile will define 2 images:
A first, intermediate image (the "build image") in which:
you add your secrets to this image: either use build args or copy secret files (be careful with build args: they have to be passed on the docker build command line)
you build your artefact (you now have access to your private repository)
A second image (the "distribution image") in which:
you copy the built artefact from the "build image"
you then distribute this image on a Docker registry
This approach is explained by several comments in the quoted github thread:
https://github.com/moby/moby/issues/13490#issuecomment-408316448
https://github.com/moby/moby/issues/13490#issuecomment-437676553
Caution
This multi-stage build approach is far from ideal: the "build image" is still lying around on your host after the build command (and still contains your sensitive information). There are precautions to take.
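A minimal sketch of the multi-stage pattern described above; the base images, repository URL, paths, and the SECRET_TOKEN build arg are assumptions:

# syntax=docker/dockerfile:1
FROM alpine:3.19 AS build
ARG SECRET_TOKEN
RUN apk add --no-cache git \
 && git clone "https://oauth2:${SECRET_TOKEN}@gitlab.com/group/project.git" /src
# ...build the artefact from /src here...

FROM alpine:3.19
# Only the built artefact is copied; the build arg never appears in this stage's history.
COPY --from=build /src /app

Built with:
docker build --build-arg SECRET_TOKEN="$MY_TOKEN" -t myimage .
The final image's history no longer contains the token, but the intermediate "build image" cached on the host still does, which is exactly the caution above.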
A new --secret build option:
I discovered this option today, and therefore have not experimented with it yet... What I know so far:
it was announced in a comment from the same thread on github
this comment leads to a detailed article about this new option
the Docker documentation (Docker v19.03 at the time of writing) is not verbose about this option: it is listed with the description below, but there is no detailed section about it:
--secret
API 1.39+
Secret file to expose to the build (only if BuildKit enabled): id=mysecret,src=/local/secret
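For completeness, a short sketch of the file-based form listed above (the file name, secret id, and image tag are assumptions):

printf 'bigsecret' > ./mysecret.txt
DOCKER_BUILDKIT=1 docker build --secret id=mysecret,src=./mysecret.txt -t myimage .
# during a RUN --mount=type=secret,id=mysecret step the value is readable at
# /run/secrets/mysecret and is not stored in any layer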
The way we solve this issue is that we have a tool written on top of docker build. Once you initiate a build using the tool, it downloads a Dockerfile and alters it. It changes all instructions which require "the secret" to something like:
RUN printf "secret: asd123poi54mnb" > /somewhere && tool-which-uses-the-secret run && rm /somewhere
However, this leaves the secret data available to anyone with access to the image unless the layer itself is removed with a tool like docker-squash. The command used to generate each intermediate layer can be found using the docker history command.

Why doesn't Docker Hub cache Automated Build Repositories as the images are being built?

Note: It appears the premise of my question is no longer valid since the new Docker Hub appears to support caching. I haven't personally tested this. See the new answer below.
Docker Hub's Automated Build Repositories don't seem to cache images. As it is building, it removes all intermediate containers. Is this the way it was intended to work or am I doing something wrong? It would be really nice to not have to rebuild everything for every small change. I thought that was supposed to be one of the best advantages of docker and it seems weird that their builder doesn't use it. So why doesn't it cache images?
UPDATE:
I've started using Codeship to build my app and then run remote commands on my DigitalOcean server to copy the built files and run the docker build command. I'm still not sure why Docker Hub doesn't cache.
Disclaimer: I am a lead software engineer at Quay.io, a private Docker container registry, so this is an educated guess based on the same problem we faced in our own build system implementation.
Given my experience with Dockerfile build systems, I would suspect that Docker Hub does not support caching because of the way caching is implemented in the Docker Engine. Caching for Docker builds operates by comparing the commands to be run against the existing layers found locally on the build machine.
For example, if the Dockerfile has the form:
FROM somebaseimage
RUN somecommand
ADD somefile somefile
Then the Docker build code will:
Check to see if an image matching somebaseimage exists
Check if there is a local image with the command RUN somecommand whose parent is the previous image
Check if there is a local image with the command ADD somefile somefile + a hashing of the contents of somefile (to make sure it is invalidated when somefile changes), whose parent is the previous image
If any of the above steps match, then that command will be skipped in the Dockerfile build process, with the cached image itself being used instead. However, the one key issue with this process is that it requires the cached images to be present on the build machine, in order to find and verify the matches. Having all of everyone's images on build nodes would be highly inefficient, making this a harder problem to solve.
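The effect is easy to observe locally (the image tag is an assumption): rebuilding without changes reuses every layer, while changing somefile invalidates the ADD step and everything after it.

docker build -t cache-demo .                              # first build runs every step
docker build -t cache-demo .                              # second build reuses cached layers (the classic builder prints "Using cache")
echo change >> somefile && docker build -t cache-demo .   # the ADD step and later steps rerun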
At Quay.io, we solved the caching problem by creating a variation of the Docker caching code that could precompute these commands/hashes and then ask our registry for the cached layers, downloading them to the machine only after we had found the most efficient caching set. This required significant data model changes in our registry code.
If you'd like more information, we gave a technical overview into how we do so in this talk: https://youtu.be/anfmeB_JzB0?list=PLlh6TqkU8kg8Ld0Zu1aRWATiqBkxseZ9g
The new Docker Hub came out with a new Automated Build system that supports Build Caching.
https://blog.docker.com/2018/12/the-new-docker-hub/
