The title of this question might suggest that it has already been answered, but trust me, I searched intensively here on SO :-)
As I understand it, when building a Docker image the current folder is packaged up and sent to the Docker daemon as the build context. From this build context the Docker image is built by "ADD"ing or "COPY"ing files and "RUN"ning the commands in the Dockerfile.
And furthermore: in case I have sensitive configuration files in the folder of the Dockerfile, these files will be sent to the Docker daemon as part of the build context.
Now my question:
Let's say I did not use any COPY or ADD in my Dockerfile... will these configuration files be included somewhere in the Docker image? I ran a bash inside the image and could not find the configuration files, but maybe they are still somewhere in the deeper layers of the image?
Basically my question is: Will the context of the build be stored in the image?
Only things you explicitly COPY or ADD to the image will be present there. It's common to have lines like COPY . . which will copy the entire context into the image, so it's up to you to check that you're not copying in things you don't want to have persisted and published.
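For example, a minimal Dockerfile sketch (the file name app.sh is hypothetical); only what you explicitly copy ends up in the image:

FROM alpine
WORKDIR /app
# Only this one file is persisted into the image
COPY app.sh /app/app.sh
# A line like the following would instead pull in the whole build context:
# COPY . .
CMD ["/app/app.sh"]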
It still is probably a good idea to keep these files from being sent to the Docker daemon at all. If you know which files have this information, you can add them to a .dockerignore file (syntax similar to .gitignore and similar files). There are other ways to more tightly control what's in the build context (by making a shadow install tree that has only the context content) but that's a relatively unusual setup.
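As a sketch, a .dockerignore that keeps sensitive and bulky paths out of the build context entirely might look like this (the secret file names are hypothetical examples):

# Lines starting with # are comments
secrets.env
config/*.key
.git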
As you said only COPY, ADD and RUN operations create layers, and therefore, only those operations add something to the image.
The build context is only the directory with the resources those operations (specifically COPY and ADD) will have access to while building the image. But it's not anything like a "base layer".
In fact, you said you ran bash and double-checked that nothing sensitive was there. Another way to make sure is to check the layers of the image. To do so, run docker history --no-trunc <image>
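For example, with a hypothetical image name:

# Show every layer and the instruction that created it
docker history --no-trunc myapp:latest
# The image configuration and layer digests can be inspected as well
docker inspect myapp:latest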
Related
Let's say I need A.exe and B.exe installed in my final node image by runtime. Both A.exe and B.exe happen to be available on Docker Hub, but they're from separate images. Does Docker have a way to somehow make both executables from different images available in my final image?
I don't think Docker's multi-stage build is relevant as it only simplifies passing artefacts that we want available on the next image. Whereas in my case, I need the whole runtime environment from previous images to be available. I have the option to RUN shell commands to manually install these dependencies but is this really the only way?
You could use a multi-stage build, since you can declare a.exe/b.exe together with the required runtime as the artefacts that get copied into the final stage.
But I agree it could be easier if you install the runtime from packages and just copy the application.
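A minimal sketch of the multi-stage approach, assuming hypothetical image names image-a and image-b, and assuming you know the paths where each image keeps its binary:

FROM image-a AS source-a
FROM image-b AS source-b

FROM node:20
# The source paths below are assumptions; adjust them to where each image actually installs its executable
COPY --from=source-a /usr/local/bin/a.exe /usr/local/bin/a.exe
COPY --from=source-b /usr/local/bin/b.exe /usr/local/bin/b.exe

Note that this only copies the files themselves; any shared libraries or other runtime files the executables depend on would have to be copied or installed as well, which is exactly the limitation discussed above.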
If you feel sure enough about what you need, you could export image A and image B as tarballs.
Now comes the tricky part: merging the two filesystem structures. Extract both archives so that you end up with one target filesystem structure, then wrap that up into a tarball again.
Finally, import that tarball into a Docker image.
So it is not impossible to get the files - but you need to know exactly what you are doing.
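A rough sketch of that workflow with standard Docker commands (the image and container names are hypothetical, conflicting paths have to be resolved by hand, and docker export/import discard image metadata such as CMD and ENTRYPOINT, which you would have to set again):

# Export the filesystems of containers created from each image
docker create --name src-a image-a
docker create --name src-b image-b
docker export src-a > a.tar
docker export src-b > b.tar
# Merge: extract both into one tree
mkdir merged
tar -xf a.tar -C merged
tar -xf b.tar -C merged
# Re-pack and import the merged tree as a new image
tar -cf merged.tar -C merged .
docker import merged.tar combined-image:latest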
One option to get a combined image is not to merge the two images but to merge the two Dockerfiles instead. This may need a review once in a while, but it might even change less often than the images themselves.
I have a question about the .dockerignore workflow which I wasn't really able to understand while browsing through the documentation and different internet topics.
I have the following folder structure:
home
|
|- folder_1
|- folder_2
Inside my dockerfile I want to copy the contents of home directory, so I use
COPY ./ /home
Inside .dockerignore I have:
*
!folder_1
!folder_3
I am referring to a non-existent folder - folder_3, which is supposed to be copied, right?
I ran it and it looks like there's no problem with that, thus .dockerignore somehow manages this situation.
If I tried to do the same thing without using .dockerignore, targeting a non-existent directory, I would get an error.
If anybody can please clear this workflow, or if a duplicate, please attach some information so I can educate myself.
Thanks in advance!
First of all, .dockerignore works like .gitignore. In these files you set the rules that determine which files should be included in the build context and which should not.
In your scenario you COPY the whole home directory which consists of folder_1 and folder_2. Your .dockerignore file sets the following rules:
* # ignore all files/directories
!folder_1 # do not ignore folder_1
!folder_3 # do not ignore folder_3
Regardless of whether there is a folder_1 or folder_3 in your local home directory, it won't show you any errors, because Docker simply matches whatever files/directories it finds against the patterns in .dockerignore. If a file/directory matches a rule, the rule is applied; if a pattern matches nothing, it just has no effect.
Hope that's a little bit more clear now.
You'll occasionally see reference to a Docker build context. The build has two steps:
1. The docker build client application creates a tar file of its directory parameter, and sends it in an HTTP request to the Docker daemon.
2. The Docker daemon unpacks the tar file, finds the Dockerfile in it, and runs it using the file content it was given.
.dockerignore only affects the first step: it keeps docker build from sending the Docker daemon particular files. The .dockerignore file doesn't require there to be a folder_3 directory, it just says that if there is one it shouldn't be excluded. The second step on the Docker daemon side doesn't use .dockerignore at all, and when you COPY . /somewhere it copies the entire build context; that is, whatever was sent in the API request.
There are a couple of practical consequences of this workflow. If you have a very large local directory it can take time to send it to the Docker daemon, and the Docker daemon keeps a duplicate copy of it during the build, so it's often worthwhile to .dockerignore your .git directory and a build tree. This setup is also how docker build works with a Docker daemon on a different system or in a VM, and it's why if you try to COPY a file by name that doesn't exist (COPY folder_3 somewhere) you get an error message referencing a Docker-internal path.
docker build . will rebuild the docker image given the Dockerfile in the current directory, and ignore any paths matched from the .dockerignore file.
Any COPY statements in that Dockerfile will cause the build cache to be invalidated if the files on-disk are different from last time it built.
I've noticed that if you don't ignore the .git dir, simple things like git fetch which have no side effect will cause the build cache to become invalidated (presumably because some tracking information within the .git dir has changed).
It would be very helpful if I knew how to see precisely which files caused the cache to become invalidated... But I've been unable to find a way.
I don't think there is a way to see which file invalidated the cache with the current Docker image design.
Since v1.10, layers and images are 'content-addressable': their IDs are based on a SHA256 checksum which reflects their content.
The caching code just looks up the ID of the image/layer which will only exist in Docker Engine if the contents of the entire layer match (or possibly a collision).
So when you run docker build, the builder works through the commands in the Dockerfile one by one. A checksum is calculated for the layer each command would produce. Then Docker checks to see if an existing layer is available with that checksum and run config.
The only way I can see to get individual file detail back would be to recompute the destination file checksums, which would probably negate most of the caching speed up. If you did want to do this anyway, the other problem is deciding which layer to check that against. You would have to lookup a previous image build tree (maybe by tag?) to find what the contents of the previous comparable layer were.
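If you wanted to narrow it down yourself, one crude workaround is to checksum the files in your build context before each build and diff the two snapshots; this is a sketch using standard shell tools, not a Docker feature:

# Snapshot checksums of every file in the build context (filter out .dockerignore'd paths if needed)
find . -type f -exec sha256sum {} + | sort -k 2 > /tmp/context-before.sha256
# After the next change, take another snapshot and compare
find . -type f -exec sha256sum {} + | sort -k 2 > /tmp/context-after.sha256
diff /tmp/context-before.sha256 /tmp/context-after.sha256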
I'm building a Yocto image for a project but it's a long process. On my powerful dev machine it takes around 3 hours and can consume up to 100 GB of space.
The thing is that the final image is not "necessarily" the end goal; it's my application that runs on top of it that is important. As such, the Yocto recipes don't change much, but my application does.
I would like to run continuous integration (CI) for my app and even continuous delivery (CD). But both are quite hard for now because of the size of the Yocto build.
Since the build does not change much, I thought of "caching" it in some way and using it for my application's CI/CD, and I thought of Docker. That would be quite interesting as I could maintain that image and share it with colleagues who need to work on the project and use it in CI/CD.
Could a custom Docker image be built for that kind of use?
Would it be possible to build such an image completely offline? I don't want to have to upload the 100GB and have to re-download it on build machines...
Thanks!
1. Yes.
I've used docker to build Yocto images for many different reasons, always with positive results.
2. Yes, with some work.
You want to take advantage of the fact that Yocto caches all the stuff you need to do your build in what it calls "Shared State Cache". This is normally located in your build directory under ${BUILDDIR}/sstate-cache, and it contains exactly what you are looking for in this case. There are a couple of options for how to get these files to your build machines.
Option 1 is using sstate mirrors:
This isn't completely offline, but lets you download a much smaller cache and build from that cache, rather than from source.
Here's what's in my local.conf file:
SSTATE_MIRRORS ?= "\
file://.* http://my.shared-computer.com/some-folder/PATH"
Don't forget the PATH at the end. That is required. The build system substitutes the correct path within the directory structure.
Option 2 lets you keep a local copy of your sstate-cache and build from that locally.
In your dockerfile, create the sstate-cache directory (location isn't important here, I like /opt for my purposes):
RUN mkdir -p /opt/yocto/sstate-cache
Then be sure to bind-mount these directories when you run your build in order to preserve the contents, like this:
docker run ... -v /place/to/save/cache:/opt/yocto/sstate-cache
Edit the local.conf in your build directory so that it points at these folders:
SSTATE_DIR ?= "/opt/yocto/sstate-cache"
In this way, you can get your cache onto your build machines in whatever way is best for you (scp, nfs, sneakernet).
Hope this helps!
I'm learning to use Docker and I've come across a minor annoyance. Whenever I make a change to the Dockerfile,I run docker build -t tag . which goes through the entire Dockerfile as it should. This takes a good 5-6 minutes due to the dependencies in my project. Sometimes a command that I run will cause an error, or there will be a mistake in the Dockerfile. While the fix may take a couple seconds, I have to rebuild the entire thing which decreases my productivity. Is there a way to "continue from where the build last failed" after editing the Dockerfile? Thanks.
This is called the "build cache" and it is already a feature of Docker. Docker's builder will only use the cache up until the point where your Dockerfile has changed. There are some edge cases when using COPY or ADD directives that will cause the build cache to be invalidated (since it hashes files to determine if any have changed, and invalidates the cache if so). This means that if you are using COPY foo /foo and you have changed that file, the build cache will be invalidated. Also, if you do COPY . /opt/bar/ (meaning, you copy the entire directory to somewhere), even some small change like a Vim swap file or Dockerfile change will invalidate the cache!
You can disable the build cache entirely by passing --no-cache to your docker build command.
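For example, reusing the command from the question:

docker build --no-cache -t tag .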
So basically, it's there, and you're using it, just that you're probably changing the Dockerfile at a very early point or hitting that lesser known edge case with a COPY/ADD directive, and the builder is invalidating everything after that point. And just to answer the question before you ask it, it would be very hard or impossible to continue using the cache after a change has invalidated the cache. Meaning, if you change your first Dockerfile line and invalidate the build cache, it is basically impossible to use the build cache past that point.
Is there a way to "continue from where the build last failed" after editing the Dockerfile?
No (as L0j1k's answer explains well)
That is why the best practice is to organize your Dockerfile from the stablest commands (the ones which will never have to be changed/modified) to the most specific commands (the ones you might have to change quite a bit).
That way, your modifications will trigger only a build on the last few lines of your Dockerfile, instead of going through everything again, because you changed one of the first lines.
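As a sketch of that ordering for a hypothetical Node.js project, the rarely changing dependency installation comes first and the frequently changing source copy comes last, so a source edit only invalidates the final layers:

FROM node:20
WORKDIR /app
# Stable part: dependency manifests change rarely, so these layers stay cached
COPY package.json package-lock.json ./
RUN npm ci
# Volatile part: application source changes often, so only the layers from here on get rebuilt
COPY . .
CMD ["node", "index.js"]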