How can you copy the entire file system of a prior build phase in a Dockerfile without using FROM to do it? - docker

We use JFrog Artifactory for managing Docker images created from Dockerfiles. It has a nice feature where you can see all the "layers" that were involved in creating any given final Docker image.
We have to be careful though, so that credentials do not wind up showing in the layers where they were used. The way we currently do this is by using multistage builds with "COPY --from".
However recently we needed to use credentials for a particular yum repository, which is supplying many dependencies that we need (thousands of files spread throughout the file system). I used yum-config-manager to set the password and username from ENV variables. However even if I use FROM depbuilder, the commands from all the prior stages (including depbuilder) now become visible in Artifactory.
I need to avoid that from happening, and a colleage suggested that we could simply do this:
COPY --from=depbuilder / /
And that way it wouldn't show the other stage's steps as part of the history of the build in Artifactory. However I'm afraid that this command might not set all the ownership and permissions correctly, or it might miss certain files, since the documentation on how it works seems spotty at best.
So what's the best way to copy everything from a prior build stage in a way that would be invisible to someone looking at the build layers in Artifactory?
Here is a screenshot showing what the layers look like in Artifactory: (if we expand the RUN steps, currently we could see the credentials passed into docker via ENV since they become part of the URL for Artifactory)
Thanks for any help!

Related

Docker dealing with images instead of Dockerfiles

Can someone explain to me why the normal Docker process is to build an image from a Dockerfile and then upload it to a repository, instead of just moving the Dockerfile to and from the repository?
Let's say we have a development laptop and a test server with Docker.
If we build the image, that means uploading and downloading all of the packages inside the Dockerfile. Sometimes this can be very large (e.g. PyTorch > 500MB).
Instead of transporting the large imagefile to and from the server, doesn't it make sense to, perhaps compile the image locally to verify it works, but mostly transport the small Dockerfile and build the image on the server?
This started out as a comment, but it got too long. It is likely to not be a comprehensive answer, but may contain useful information regardless.
Often the Dockerfile will form part of a larger build process, with output files from previous stages being copied into the final image. If you want to host the Dockerfile instead of the final image, you’d also have to host either the (usually temporary) processed files or the entire source repo & build script.
The latter is often done for open source projects, but for convenience pre-built Docker images are also frequently available.
One tidy solution to this problem is to write the entire build process in the Dockerfile using multi-stage builds (introduced in Docker CE 17.05 & EE 17.06). But even with the complete build process described in a platform-independent manner in a single Dockerfile, the complete source repository must still be provided.
TL,DR: Think of a Docker image as a regular binary. It’s convenient to download and install without messing around with source files. You could download the source for a C application and build it using the provided Makefile, but why would you if a binary was made available for your system?
Instead of transporting the large imagefile to and from the server,
doesn't it make sense to, perhaps compile the image locally to verify
it works, but mostly transport the small Dockerfile and build the
image on the server?
Absolutely! You can, for example, set up an automated build on Docker Hub which will do just that every time you check in an updated version of your Dockerfile to your GitHub repo.
Or you can set up your own build server / CI pipeline accordingly.
IMHO, one of the reason for building the images concept and putting into repository is sharing with people too. For example we call Python's out of the box image for performing all python related stuff for a python program to run in Dockerfile. Similarly we could create a custom code(let's take example I did for apache installation with some custom steps(like ports changes and additionally doing some steps) I created its image and then finally put it to my company's repository.
I came to know after few days that may other teams are using it too and now when they are sharing it they need NOT to make any changes simply use my image and they should be done.

Efficient svn checkout in docker container

I want to checkout some files (specifically, test suite http://llvm.org/svn/llvm-project/test-suite/trunk) in my docker container.
Now I just use RUN svn co http://llvm.org/svn/llvm-project/test-suite/trunk train.out/llvm-test-suite inside Dockerfile.
It works, but doesn't look efficient: on each docker-compose I need to wait for ~5 minutes while the tests are loading.
Is there a better way to prevent Docker from checking out this file each time? The only alternative I see for now is including the file into container.
You generally don’t run source-control tools from inside a Dockerfile. Check them out in a host directory (better still, if you can manage it, add the Dockerfile directly to the repository you’re trying to build) and run docker build with all of its inputs directly on disk.
There are a couple of good reasons for this:
Docker image caching can often mean that Docker won’t repeat a “clone”, “checkout”, or “pull” type operation: it knows it’s done it once and already knows the output of it and skips the step, even if there are new commits you don’t have.
Adding tools like svn or git to the image that you only need to build it makes it unnecessarily larger. (Multi-stage builds can avoid this, but they’re relatively new.)
The more common use case for this is to clone a private repository that needs credentials, and it’s hard to avoid leaking those credentials into the final image. (Again multi-stage builds can avoid this, with some care, but it’s better to not have the security exposure at all.)

How to stop TeamCity from rebuilding docker dependencies every time?

I have a TeamCity build project that parameterizes a docker-compose.yml template with the build versions of a dozen Docker containers, so in order to get the build_counter from each container, I have them set as snapshot dependencies in the docker-compose build job. Each container's Dockerfile and other files are in their own BitBucket repo, and they have triggers for the appropriate files. In the snapshot dependencies in the docker-compose build I have them set to "Do not run new build if there is a suitable one" but it still tries to run all of the dependent builds even though there aren't any changes in their respective repos.
This makes what should be a very simple and quick build in to a very long build. And often times, one of the dependent builds will fail with "could not collect changes: connection refused" and I suspect it has to do with TC trying to hit all of these different repos all at once.
Is there something I can do to not trigger a build of every dependency every time the docker-compose build is run?
Edit:
Here's an example of what our docker-compose.yml.j2 looks like: http://termbin.com/b2xy
Obviously, I've sanitized it for sharing, and our real docker-compose template has about a dozen services listed.
Here is an example Dockerfile for one of the services: http://termbin.com/upins
Rather than changing the source code of your build (parameterized docker-compose.yml) and brute-force your build every time, you could consider building the containers independently while tagging them with a version increment, and labels. After the build store the images in a local registry. Use docker-compose to suit your runtime needs. docker-compose can use multiple yaml files, so if you need other images for a particular build, just pull the other images you need. For production use another yaml file that composes the system to run. Add LABEL to your Dockerfile. See http://label-schema.org//rc1/ for a set of labels that suit your needs.
I know this is old question but I have come across this issue and you can't do what sounds reasonable i.e. get recent green builds without rebuilding. This is partly because of what the snapshot dependencies are designed to do by Jetbrains.
The basic idea is that dependencies are for synchronized revisions of code: that is if you build Compose at a certain time, it will need to use not just its own versions of source code at that point in time but also the code for all the dependencies that also comes from that point of time, regardless of whether anything significant has changed.
In my example, there were always changes because the same repo was used for lots of projects and had unrelated changes that would not trigger a build but would make the project appear behind and cause a build.
If your dependencies have NOT changed and show no changes pending, then they shouldn't build. In this case, you need to tick "Do not run new build if there is a suitable one". "Enforce Revisions Synchronization" is slightly confusing. If ticked, it will find older builds that match the first build after your build was triggered. If unticked, it can use newer builds.

Build chain in the cloud?

(I understand this question is somewhat out of scope for stack overflow, because contains more problems and somewhat vague. Suggestions to ask it in the proper ways are welcome.)
I have some open source projects depending in each other.
The code resides in github, the builds happen in shippable, using docker images which in turn are built on docker hub.
I have set up an artifact repo and a debian repository where shippable builds put the packages, and docker builds use them.
The build chain looks like this in terms of deliverables:
pre-zenta docker image
zenta docker image (two steps of docker build because it would time out otherwise)
zenta debian package
zenta-tools docker image
zenta-tools debian package
xslt docker image
adadocs artifacts
Currently I am triggering the builds by pushing to github and sometimes rerunning failed builds on shippable after the docker build ran.
I am looking for solutions for the following problems:
Where to put Dockerfiles? Now they are in the repo of the package needing the resulting docker image for build. This way all information to build the package are in one place, but sometimes I have to trigger an extra build to have the package actually built.
How to trigger build automatically?
..., in a way supporting git-flow? For example if I change the code in zenta develop branch, I want to make sure that zenta-tools will build and test with the development version of it, before merging with master.
Are there a tool with which I can overview the health of the whole build chain?
Since your question is related to Shippable, I've created a support issue for you here - https://github.com/Shippable/support/issues/2662. If you are interested in discussing the best way to handle your scenario, you can also send me an email at support#shippable.com You can set up your entire flow, including building the docker images, using Shippable.

Where to keep Dockerfile's in a project?

I am gaining knowledge about Docker and I have the following questions
Where are Dockerfile's kept in a project?
Are they kept together with the source?
Are they kept outside of the source? Do you have an own Git repository just for the Dockerfile?
If the CI server should create a new image for each build and run that on the test server, do you keep the previous image? I mean, do you tag the previous image or do you remove the previous image before creating the new one?
I am a Java EE developer so I use Maven, Jenkins etc if that matter.
The only restriction on where a Dockerfile is kept is that any files you ADD to your image must be beneath the Dockerfile in the file system. I normally see them at the top level of projects, though I have a repo that combines a bunch of small images where I have something like
top/
project1/
Dockerfile
project1_files
project2/
Dockerfile
project2_files
The Jenkins docker plugin can point to an arbitrary directory with a Dockerfile, so that's easy. As for CI, the most common strategy I've seen is to tag each image built with CI as 'latest'. This is the default if you don't add a tag to a build. Then releases get their own tags. Thus, if you just run an image with no arguments you get the last image built by CI, but if you want a particular release it's easy to say so.
I'd recommend keeping the Dockerfile with the source as you would a makefile.
The build context issue means most Dockerfiles are kept at or near the top-level of the project. You can get around this by using scripts or build tooling to copy Dockerfiles or source folders about, but it gets a bit painful.
I'm unaware of best practice with regard to tags and CI. Tagging with the git hash or similar might be a good solution. You will want to keep at least one generation of old images in case you need to rollback.

Resources