Check for updated package via yum in Dockerfile

In my Dockerfile I may have a step that looks like this in order to install some packages.
RUN yum install pkg1 pkg2 -y && \
    yum -y clean all
The problem is that when I build the container more than once, Docker sees this command as unchanged and never re-runs it, choosing instead to use a previously cached layer.
However, pkg1 or pkg2 may have been updated in the yum repository, and because a cached Docker layer was used, the container never receives the updated packages.
I could build with the --no-cache option, but that would invalidate all cached layers, which substantially slows down the build, since my yum install commands are usually near the end of my Dockerfiles.
What is the best strategy to deal with this? Is there any way to invalidate the Docker cache only when the package version in the repository differs from the one in the cached layer?

From "Build cache", you could insert an ADD or COPY directive (of a dummy file) just before those RUN commands.
Whenever you want to invalidate the cache for the next RUN, modify the content of the dummy file, and the ADD/COPY (along with all the Dockerfile commands after it) won't rely on the cache.
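A minimal sketch of that dummy-file trick (the file name cachebust is hypothetical; any file in the build context works):
# "cachebust" is a placeholder file in the build context; edit its
# contents whenever you want the steps below to re-run.
COPY cachebust /tmp/cachebust
RUN yum install pkg1 pkg2 -y && \
    yum -y clean all
Changing the contents of cachebust invalidates the cache from the COPY line onward, so the yum install re-runs while every earlier layer stays cached.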

Related

How does docker cache the layer?

I have a Docker image which runs the following command:
RUN apt-get update --fix-missing && apt-get install -y --no-install-recommends build-essential debhelper rpm ruby ruby-dev sudo cmake make gcc g++ flex bison git libpcap-dev libssl-dev ninja-build openssh-client python-dev python3-pip swig zlib1g-dev python3-setuptools python3-requests wget curl unzip zip default-jdk && apt-get clean && rm -rf /var/lib/apt/lists/*
If I run it a couple of times in the same day, the layer appears to be cached. However, Docker decides the layer changed whenever I run it for the first time on a given day.
I just wonder what's special about the above command that makes Docker think the layer changed?
This is not caused by Docker itself. When Docker sees a RUN command, all it does is a simple string comparison to determine whether the layer is in the cache. If it finds a cached layer, it reuses it; if not, it runs the command.
Since you mention that it builds from cache all day and then doesn't the next day, the only plausible explanation is that the cache has been invalidated or deleted in the meantime by someone or something.
I don't know how or where you are running the Docker daemon, but it may be running in a VM that is recreated each day from a base image, which would destroy all the cache and force Docker to rebuild the image.
Another explanation is a cleanup process that runs once a day, such as a cron job that deletes the cache.
The bottom line is that Docker will happily reuse that cache indefinitely, as long as the cache actually exists.
I am assuming that the previous layers (if there are any) were built from cache; otherwise, check whether a COPY/ADD command is busting the cache due to file changes in your build context.
It's not the command; it's the steps that occur before it, specifically whether the files copied in earlier layers were modified. I can be more specific if you edit the post to show all the steps in the Dockerfile before this one.
According to the Docker docs:
Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match
For a RUN command, just the command string itself is used to find a match. So maybe some process deleted the cache layer, or maybe you changed your Dockerfile?
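You can observe this behaviour directly (a sketch; myimage is a placeholder tag):
docker build -t myimage .     # first build executes every RUN step
docker build -t myimage .     # identical rebuild prints "Using cache" for each step
docker builder prune -f       # explicitly deletes the build cache
docker build -t myimage .     # now every RUN step executes again
If something on the host runs the equivalent of that prune once a day, you would see exactly the daily pattern described in the question.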

Update of root certificates on docker

If I understand correctly, on standard Ubuntu systems for example, root certificates are provided by the ca-certificates package and get updated when the package itself is updated.
But how can the root certificates be updated when using Docker containers? Is there a common preferred way of doing this, or must the containers be redeployed with an up-to-date Docker image?
The containers must be redeployed with an up-to-date image.
The Docker Hub base images like ubuntu actually get updated fairly regularly, and if you look at the tag list you can see that there are several date-stamped variants of the images. So one approach that will get you pretty close to current is to always (have your CI system) pull the base image before you build.
docker pull ubuntu:18.04
docker build .
If you can't do that, or if you're working from some sort of derived image that updates less frequently, you can just manually run apt-get upgrade in your Dockerfile. Doing this in the same place you're otherwise installing packages makes sense. It needs to be in the same RUN line as a matching apt-get update, and you might need some way to force Docker to not cache that update line to get current updates.
FROM python:3.8-slim
# Have an option to force rebuilds; the RUN line won't be
# cacheable if the dependency_stamp option changes
ARG dependency_stamp
ENV dependency_stamp=${dependency_stamp:-unknown}
RUN touch /dependencies.${dependency_stamp}
# Update base OS packages and install other things we need
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get upgrade --assume-yes \
 && DEBIAN_FRONTEND=noninteractive apt-get install \
      --no-install-recommends --assume-yes \
      ...
If you find yourself doing this routinely, it can help to maintain your own base images that are upgraded to current packages but don't have anything else installed. If you go that route, you may get more control over the process and smaller images by building an image FROM ubuntu and installing e.g. Python yourself, rather than building an image FROM python and then installing updates over it.
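A minimal sketch of that approach, assuming a hypothetical image name my-base:
# base image: rebuilt and pushed on a schedule so OS packages stay current
FROM ubuntu:22.04
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get upgrade --assume-yes \
 && rm -rf /var/lib/apt/lists/*
Application images then start FROM my-base and inherit current OS packages without running their own upgrade step.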

Is it typical for docker build to use the latest apt packages?

As I understand it, docker build usually uses the cache if the Dockerfile appears unchanged and contains no COPY command, so if I build with no options, a Dockerfile that includes apt-get install or apt-get update (or similar commands) will be satisfied from cache and never actually update any packages.
I want the latest versions of several libraries (for security purposes), so I always run docker build with the --no-cache option.
On the other hand, there is the --mount=type=cache option. It's not a docker build option but a RUN instruction option. I read the documentation; this RUN option makes it possible to cache package manager state.
So, is my approach wrong? Does Docker generally use the cache and never (or only rarely) update packages?
When you don't change the Dockerfile, the cache will always be used, provided the layers still exist locally.
Your approach of using --no-cache is right.
On the other hand, if you need to update the packages at run time, you can add apt-get -y update && apt-get -y upgrade to your ENTRYPOINT; that way the packages are updated every time the container starts.
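As for the --mount=type=cache option from the question, here is a sketch of how it is commonly used with apt under BuildKit (the package curl is just an example). Note that it caches downloaded .deb files between builds; it does not force an already cached layer to re-run, so it speeds up rebuilds rather than guaranteeing fresh packages.
# syntax=docker/dockerfile:1
FROM debian:12
# Debian's default docker-clean config deletes downloaded packages;
# removing it lets the cache mount keep them between builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    rm -f /etc/apt/apt.conf.d/docker-clean \
 && apt-get update \
 && apt-get install -y --no-install-recommends curl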

What is the difference between multiple RUN entries in a Dockerfile and just one RUN entry?

What is the difference between multiple RUN entries in a Dockerfile, like:
FROM php:5.6-apache
RUN docker-php-ext-install mysqli
RUN apt update
RUN apt install git -y -q
and just one RUN entry?
FROM php:5.6-apache
RUN docker-php-ext-install mysqli && apt update && apt install git -y -q
Note: I'm not asking which one is better. I want to know all the differences between the two approaches.
Each RUN command creates a layer of the filesystem changes generated by a temporary container started to run that command. (It's effectively running a docker run and then packaging the result of docker diff into a filesystem layer.)
These layers have a few key details to note:
They are immutable. Once you create them, you don't change them; you would have to generate a new layer to update your image.
They are reusable between multiple images and running containers. You can do this because of the immutability.
You do not delete files from a parent layer, but you can register that a file is deleted in a later layer. This is a metadata change in that later layer, not a modification to the parent layer.
Layers are reused in docker's build cache. If two different images, or even the same image being rebuilt, perform the same command on top of the same parent layer, docker will reuse the already created layer.
These layers are merged together into the final filesystem you see inside your container.
The main difference between the two approaches is the build cache and deleting files. If you split the download of a source code tgz, the extraction of that tgz, the compilation of a binary, and the deletion of the tgz and source folders across multiple RUN lines, then when you ship the image over the network and store it on disk, you will have all of the source in the layers even though you don't see it in the final container. Your image will be significantly larger.
Caching can also be a bad thing when you cache too much. If you split the apt update and apt install, and then months later add a new package to the second RUN line, Docker will reuse the months-old cache of apt update and try to install packages that are months old and possibly no longer available, so your image may fail to build. Many people also run rm -rf /var/lib/apt/lists/* after installing Debian packages; if you do this in a separate step, you will not actually delete the files from the previous layers, so your image will not shrink.
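A sketch of the file-deletion difference (the URL and source layout are placeholders):
# Split version: src.tgz and the sources are baked into their own
# layers; the later rm only masks them, so the shipped image stays large.
RUN wget https://example.com/src.tgz
RUN tar xzf src.tgz && make -C src && make -C src install
RUN rm -rf src.tgz src

# Combined version: the sources never land in any shipped layer.
RUN wget https://example.com/src.tgz \
 && tar xzf src.tgz && make -C src && make -C src install \
 && rm -rf src.tgz src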

Operating system update takes forever while building a Docker image

I'm using debian:latest as the base image for my Docker containers.
Problem is that on every build I have to update the OS packages, which takes forever. Here is what I do:
FROM debian:latest
RUN apt-get update && apt-get install -y --force-yes --no-install-recommends nginx
...
apt-get update && apt-get install takes forever. What can I do about this?
Docker images are minimal, only including the absolute necessities to run that base image. For the Debian base images, that means there is no package repo cache. So when you run apt-get update, it downloads the package repo cache from all the repos for the first time. If the base image included the package repo cache, it would add many megabytes of package state that would quickly go out of date, resulting in larger base images with little time saved on a later update.
The actual debian:latest image is relatively well maintained with commits from last month. You can view the various tags for it here: https://hub.docker.com/_/debian/
To reduce your image build time, I'd recommend not deleting your image every time. Instead, do your new build and tag, and once the new image is built, you can run a docker image prune --force to remove the untagged images from prior builds. This allows docker to reuse the cache from prior image builds.
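A sketch of that build-then-prune loop (the tag myapp is a placeholder):
docker build -t myapp .        # reuses every still-valid cached layer
docker image prune --force     # then removes untagged images from prior builds
Since the old image only becomes untagged after the new build succeeds, its layers remain available as cache during the build.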
Alternatively, you can create your own base image that you update less frequently and that has all of your application prerequisites. Build it like any other image, and then change the FROM debian:latest to FROM your_base_image.
One last tip: avoid using latest in your image builds; use something like FROM debian:9 instead, so that a major version update in Debian doesn't break your build.
Don't delete the image on each build. Just modify your Dockerfile and build again. Docker is "smart" and will keep unmodified layers, rebuilding only from the lines you changed. The intermediate images used for this purpose (Docker creates them automatically) can easily be removed with this command:
docker rmi $(docker images -q -f dangling=true)
You'll save a lot of time this way. Remember: don't delete the image, just modify the Dockerfile and build again. Once everything is working, launch the command above and that's all you need.
Another useful cleanup command is:
docker volume prune -f
But this last command is for a different kind of cleanup, not related to images; it removes unused volumes.
