DockerFile one-line vs multi-line instruction [duplicate] - docker

This question already has answers here:
Multiple RUN vs. single chained RUN in Dockerfile, which is better?
(4 answers)
Closed 2 years ago.
To my knowledge of the way docker build works is that for each line of instruction, it creates a separate image/layer. However, it is very efficient in managing to reuse the layers or avoid rebuilding those layers if nothing has changed.
So does it matter if I put below instruction either on same line or multi-line? For convenience, I would prefer the single line option unless it is not an efficient option.
Multi-Line Instruction
RUN apt-get -y update
RUN apt-get -y install ...
Single-Line Instruction
RUN apt-get -y update && apt-get -y install

In this specific case it is important to put apt-get update and apt-get install together. More broadly, fewer layers is considered "better" but it almost never has a perceptible difference.
In practice I tend to group together "related" commands into the same RUN command. If I need to configure and install a package from source, that can get grouped together, and even if I change make arguments I don't mind re-running configure. If I need to configure and install three packages, they'd go into separate RUN lines.
The important difference in this specific apt-get example is around layer caching. Let's say your Dockerfile has
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install package-a
If you run docker build a second time, it will decide it's already run all three of these commands and the input hasn't changed, so it will run very quickly and you'll get an identical image out.
Now you come back a day or two later and realize you were missing something, so you change
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install package-a package-b
When you run docker build again, Docker decides it's already run apt-get update and can jump straight to the apt-get install line. In this specific case you'll have trouble: Debian and Ubuntu update their repositories fairly frequently, and when they do the old versions of packages get deleted. So your apt-get update from two days ago points at a package that no longer exists, and your build will fail.
You'll avoid this specific problem by always putting the two apt-get commands together in the same docker run line
FROM ubuntu:18.04
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --assume-yes --no-install-recommends \
package-a \
package-b

I would use single-line instruction. It's considered to be a best practice for docker. So that you minimize number of layers(RUN is one of instructions that create layers).
As of multiple instructions for collecting dependencies, sometimes it's useful during development, if you're frequently changing list of packages(or their versions). But for production image, i would avoid that.

Related

docker build cache busting with apt-get

My understanding is that if the RUN command "string" itself just does not change (i.e., the list of packages to be installed does not change), docker engine uses the image in the cache for the same operation. This is also my experience:
...
Step 2/6 : RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y curl git-all locales locales-all python3 python3-pip python3-venv libusb-1.0-0 gosu && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 518e8ff74d4c
...
However, the official Dockerfile best practices document says this about apt-get:
Using RUN apt-get update && apt-get install -y ensures your Dockerfile installs the latest package versions with no further coding or manual intervention. This technique is known as “cache busting”.
This is true if I add a new package to the list but it is not if I do not modify the list.
Is my understanding correct, or I am missing something here?
If yes, can I assume that I will only get newer packages in apt-get install if also the Ubuntu base image has been updated (which invalidates the whole cache)?
You cut off the quote in the middle. The rest of the quote included a very important condition:
You can also achieve cache-busting by specifying a package version. This is known as version pinning, for example:
RUN apt-get update && apt-get install -y \
package-bar \
package-baz \
package-foo=1.3.*
Therefore the command you run in there example would change each time by changing the pinned version of the package in the list. Note that in addition to changing the command run, you can change the environment, which has the same effect, using a build arg as described in this answer.
You are right. The documentation is very poorly written. If you read further you can see what the author is trying to say:
The s3cmd argument specifies a version 1.1.*. If the image previously used an older version, specifying the new one causes a cache bust of apt-get update and ensures the installation of the new version.
It seems author thinks 'cache busting' is when you change the Dockerfile in a way that invalidates the cache. But the usual definition of cache busting is a mechanism by which we can invalidate cache even if the file is the same.

How does docker cache the layer?

I have a docker image which run the following command
RUN apt-get update --fix-missing && apt-get install -y --no-install-recommends build-essential debhelper rpm ruby ruby-dev sudo cmake make gcc g++ flex bison git libpcap-dev libssl-dev ninja-build openssh-client python-dev python3-pip swig zlib1g-dev python3-setuptools python3-requests wget curl unzip zip default-jdk && apt-get clean && rm -rf /var/lib/apt/lists/*
If I run it couple time in the same day, the layer seems cached. However, docker will think the layer changed if I run it for the first time daily.
Just wonder what's special in the above command that makes docker thinks the layer changed?
This is not caused by docker. When docker sees a RUN command, all it does is simple string comparison to determine whether the layer is in the cache or not. If it sees it in cache, it will reuse it and if not, it will run it.
Since you have mentioned that it builds whole day using cache and then it doesn't the next day, the only possible explanation is that the cache has been invalidated/deleted during that time by someone/something.
I don't know how/where you are running the docker daemon but it may be the case that it is running in VM that is being recreated each day from a base image which would then destroy all the cache and force docker to rebuild the image.
Another explanation is that you have some cleanup process running once a day, maybe some cron that deletes the cache.
Bottom line is that docker will happily reuse that cache for unlimited period of time, as long as the cache actually exists.
I am assuming that previous layers has been built from cache (if there are any), otherwise you should look for COPY/ADD commands if they are not causing the cache busting due to file changes in your build context.
It's not the command, it's the steps that occur before it. Specifically, if the files being copied to previous layers were modified. I can be more specific if you'll edit the post to show all the steps in the Dockerfile before this one.
According to the docker doc:
Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match
For a RUN command, it just command string itself is used to find a match. So, maybe any processes delete the cache layer, or maybe you changed your Dockerfile?

Update of root certificates on docker

If I understand correctly, on standard Ubuntu systems for example, root certificates are provided by ca-certificates package and get updated when the package itself is updated.
But how can the root certificates be updated when using docker containers ? Is there a common preferred way of doing this, or must the containers be redeployed with an up-to-date docker image ?
The containers must be redeployed with an up-to-date image.
The Docker Hub base images like ubuntu actually get updated fairly regularly, and if you look at the tag list you can see that there are several date-stamped variants of the images. So one approach that will get you pretty close to current is to always (have your CI system) pull the base image before you build.
docker pull ubuntu:18.04
docker build .
If you can't do that, or if you're working from some sort of derived image that updates less frequently, you can just manually run apt-get upgrade in your Dockerfile. Doing this in the same place you're otherwise installing packages makes sense. It needs to be in the same RUN line as a matching apt-get update, and you might need some way to force Docker to not cache that update line to get current updates.
FROM python:3.8-slim
# Have an option to force rebuilds; the RUN line won't be
# cacheable if the dependency_stamp option changes
ARG dependency_stamp
ENV dependency_stamp=${dependency_stamp:-unknown}
RUN touch /dependencies.${dependency_stamp}
# Update base OS packages and install other things we need
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get upgrade \
&& DEBIAN_FRONTEND=noninteractive apt-get install \
--no-install-recommends --assume-yes \
...
If you find yourself doing this routinely, maintaining your own base images that are upgraded to current packages but don't have anything else installed can be helpful; if you find yourself doing that, you might have more control over the process and get smaller images if you build an image FROM ubuntu and install e.g. Python, rather than building an image FROM python and then installing updates over it.

Docker build with latest apt package is general?

In my understanding, docker build usually use cache if Dockerfile seems not to be changed and not include COPY command, so if I do it with no option, Dockerfile which includes apt-get or apt-get update(or something similler command, you know) will be cached and never update package actually.
I want to use latest package for several library(for security purpose) so I always use docker build with no cache option.
On the other hand, there is --mount=type=cache option. It's not docker build option but RUN command option. I read document. this RUN option makes package managers possible to be cached.
So, maybe my approach is wrong? With docker, does it generally use cache and never (or slight few) update packages?
when you not change the Dockerfile the cashe will always be used sure if the image is already downloaded locally.
your approch to use --no-cache is right.
on the other hand if you need to update the packages during the run time you may add apt-get -y update && apt-get -y upgrade to your ENTRYPOINT in this case you update the packages every time the container starts.

docker build - Avoid ADDing files only needed at build time

I'm trying to build a docker image avoiding unnecessary bulk, and I've run into a problem that I think should be common, but so far I haven't found a straightforward solution. (I'm building the docker on an ubuntu 18.04 system, and starting with a FROM ubuntu layer.)
In particular, I have a very large .deb file (over 3G) that I need to install in the image. It's easy enough to COPY or ADD it and then RUN dpkg -i, but that results in duplication of several GB of space that I don't need. Of course, just removing the file doesn't reduce the image size.
I'd like to be able to mount a volume to access the .deb file, rather than COPY it, which is easy to do when running a container, but apparently not possible to do when building one?
What I've come up with so far is to build the docker up to the point where I would ADD the file, then run it with a volume mounted so I can access it from the container without COPYing it, then I dpkg -i it, then I do a docker commit to create an image from that container. Sure enough, I end up with an image that's over 3GB smaller than my first try, but that seems like a hack, and makes scripting the build more complicated.
I'm thinking there must be a more appropriate way to achieve this, but so far my searching has not revealed an obvious answer. Am I missing something?
Relying on docker commit indeed amounts to a hack :) and its use is thus mentioned as inadvisable by some references such as this blog article.
I only see one possible approach for the kind of use case you mention (copy a one-time .deb package, install it and remove the binary immediately from the image layer):
You could make remotely available to the docker engine that builds your image, the .deb you'd want to install, and replace the COPY + RUN directives with a single one, e.g., relying on curl:
RUN curl -OL https://example.com/foo.deb && dpkg -i foo.deb && rm -f foo.deb
If curl is not yet installed, you could run beforehand the usual APT commands:
RUN apt-get update -y -q \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y -q --no-install-recommends \
ca-certificates \
curl \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
Maybe there is another possible solution (but I don't think the multi-staged builds Docker feature would be of some help here, as all perms would be lost by doing e.g. COPY --from=build / /).

Resources