As I understand it, docker build reuses cached layers when the Dockerfile has not changed and contains no COPY instruction. So if I build with no options, a Dockerfile that runs apt-get install or apt-get update (or a similar command) will be served from the cache and the packages will never actually be updated.
I want the latest versions of several libraries (for security reasons), so I always run docker build with the --no-cache option.
On the other hand, there is the --mount=type=cache option. It is not a docker build option but an option of the RUN instruction. According to the documentation, this RUN option makes it possible to cache package manager data.
So is my approach wrong? Does Docker generally use the cache and never (or only rarely) update packages?
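For reference, here is a minimal sketch of what a RUN cache mount for apt can look like, assuming BuildKit is enabled. The ubuntu:22.04 base and the curl package are placeholders chosen for illustration. As far as I know, the cache mount only keeps downloaded package files between builds; it does not decide whether the RUN step is re-executed, which is still governed by the normal layer cache.

```dockerfile
# syntax=docker/dockerfile:1
# Assumption: ubuntu:22.04 and the curl package are illustrative only.
FROM ubuntu:22.04

# Debian/Ubuntu images remove downloaded .deb files by default;
# keep them so that the cache mount has something to reuse.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

# The two cache mounts persist apt's download and state directories
# across builds; their contents are not stored in the image layer.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends curl
```

With this in place, a build that re-runs the step can reuse already downloaded packages instead of fetching everything again.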
When you do not change the Dockerfile, the cache will always be used, provided the image is already available locally.
Your approach of using --no-cache is right.
On the other hand, if you need to update the packages at run time, you can add apt-get -y update && apt-get -y upgrade to your ENTRYPOINT; in that case the packages are updated every time the container starts.
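A rough sketch of the ENTRYPOINT idea, assuming an Ubuntu base image and a container that runs as root with network access at startup. The base tag and the bash default command are placeholders. Note that this makes every container start slower.

```dockerfile
# Assumption: ubuntu:22.04 and the default command are illustrative only.
FROM ubuntu:22.04

# Upgrade packages on every container start, then hand control to the
# command given by CMD (or on the docker run command line).
# With `sh -c '<script>' -- args...`, "--" becomes $0 and "$@" holds the args.
ENTRYPOINT ["/bin/sh", "-c", "apt-get -y update && apt-get -y upgrade && exec \"$@\"", "--"]

# Default command; it is appended to the ENTRYPOINT as arguments.
CMD ["bash"]
```

The exec at the end replaces the shell with the real command, so signals sent to the container reach that process directly.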
At my company, we have hardened containers created by the security team, and I would like to extend the hardened container with another docker image. For example, if we have a hardened Debian container, and I want to add Apache, how do I do this?
I understand I can use FROM to pick a base, but the examples I've seen don't layer another published image on top of an existing base; they only add specific commands. Do I just go to the official Docker Hub Apache (HTTP) image and copy and paste the commands from the GitHub repo? I'm assuming there's a cleaner way (but I'm not sure there is).
For example, do I
FROM mycompanyprivaterepo/Debian:latest
//some command?
FROM httpd
docker build -t mynewimagewithapache
UPDATE:
After trying to install apache2 with apt-get as suggested in some comments, the build kept hanging on interactive questions. I solved it, with the help of the comments, using:
My Dockerfile:
FROM myprivaterepo/hardened-ubuntu
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get -qq install apache2
and building via:
$ docker build -t hardened-ubuntu-apache .
As far as I understand, you cannot use a multi-stage build here and just run
COPY --from=base-image /path/to/file/you-are-interested-in /path/inside/new-stage-image
to copy the required data into your preferred image. If that is the case, you have to write your own Dockerfile that uses your company's mycompanyprivaterepo/Debian:latest as the base image, and then add layers on top of it that install the required software using RUN.
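A minimal sketch of such a Dockerfile, assuming the hardened base is Debian-based and ships apt-get. The image name is written in lowercase because Docker repository names must be lowercase; the exposed port and the foreground command are assumptions about how you want to run Apache, not something the hardened image prescribes.

```dockerfile
# Assumption: the hardened base image is Debian-based and provides apt-get.
FROM mycompanyprivaterepo/debian:latest

# Install Apache without interactive prompts and drop the package
# lists afterwards to keep the layer small.
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apache2 && \
    rm -rf /var/lib/apt/lists/*

# Assumption: Apache listens on its default port.
EXPOSE 80

# Run Apache in the foreground so the container keeps running.
CMD ["apache2ctl", "-D", "FOREGROUND"]
```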
I have a Docker image whose Dockerfile runs the following command:
RUN apt-get update --fix-missing && apt-get install -y --no-install-recommends build-essential debhelper rpm ruby ruby-dev sudo cmake make gcc g++ flex bison git libpcap-dev libssl-dev ninja-build openssh-client python-dev python3-pip swig zlib1g-dev python3-setuptools python3-requests wget curl unzip zip default-jdk && apt-get clean && rm -rf /var/lib/apt/lists/*
If I build it a couple of times on the same day, the layer seems to be cached. However, Docker thinks the layer has changed when I build it for the first time each day.
I just wonder what is special about the above command that makes Docker think the layer has changed.
This is not caused by Docker. When Docker sees a RUN command, all it does is a simple string comparison to determine whether the layer is in the cache. If it finds the layer in the cache, it reuses it; if not, it runs the command.
Since you mention that builds use the cache all day and then stop doing so the next day, the only possible explanation is that someone or something invalidated or deleted the cache in the meantime.
I don't know how or where you are running the Docker daemon, but it may be running in a VM that is recreated each day from a base image, which would destroy the whole cache and force Docker to rebuild the image.
Another explanation is that some cleanup process runs once a day, perhaps a cron job that deletes the cache.
The bottom line is that Docker will happily reuse that cache indefinitely, as long as the cache actually exists.
I am assuming that the previous layers (if there are any) were built from cache; otherwise, check whether COPY/ADD commands are busting the cache because files in your build context changed.
It's not the command, it's the steps that occur before it. Specifically, if the files being copied to previous layers were modified. I can be more specific if you'll edit the post to show all the steps in the Dockerfile before this one.
According to the docker doc:
Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match
For a RUN command, only the command string itself is used to find a match. So maybe some process deletes the cached layer, or maybe you changed your Dockerfile?
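To illustrate the point about earlier steps, here is a hedged sketch of an instruction order that keeps a package layer cacheable. The base image, the shortened package list, and the /src path are assumptions for illustration.

```dockerfile
# Assumption: ubuntu:22.04, the package list and /src are illustrative only.
FROM ubuntu:22.04

# Package installation comes first: whether this layer is reused depends
# only on the instruction text and on the layers above it.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential cmake git && \
    rm -rf /var/lib/apt/lists/*

# Files from the build context come last: when they change, only this
# layer and the ones after it are rebuilt.
COPY . /src
```

If the COPY came before the RUN, any change to a copied file would also force the package installation to run again.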
If I understand correctly, on standard Ubuntu systems for example, root certificates are provided by ca-certificates package and get updated when the package itself is updated.
But how can the root certificates be updated when using Docker containers? Is there a common preferred way of doing this, or must the containers be redeployed with an up-to-date Docker image?
The containers must be redeployed with an up-to-date image.
The Docker Hub base images like ubuntu actually get updated fairly regularly, and if you look at the tag list you can see that there are several date-stamped variants of the images. So one approach that will get you pretty close to current is to always (have your CI system) pull the base image before you build.
docker pull ubuntu:18.04
docker build .
If you can't do that, or if you're working from some sort of derived image that updates less frequently, you can just manually run apt-get upgrade in your Dockerfile. Doing this in the same place you're otherwise installing packages makes sense. It needs to be in the same RUN line as a matching apt-get update, and you might need some way to force Docker to not cache that update line to get current updates.
FROM python:3.8-slim
# Have an option to force rebuilds; the RUN line won't be
# cacheable if the dependency_stamp option changes
ARG dependency_stamp
ENV dependency_stamp=${dependency_stamp:-unknown}
RUN touch /dependencies.${dependency_stamp}
# Update base OS packages and install other things we need
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get upgrade --assume-yes \
 && DEBIAN_FRONTEND=noninteractive apt-get install \
      --no-install-recommends --assume-yes \
...
If you do this routinely, it can help to maintain your own base images that are upgraded to current packages but have nothing else installed. In that case you might get more control over the process, and smaller images, by building an image FROM ubuntu and installing e.g. Python, rather than building an image FROM python and then installing updates over it.
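A sketch of what such a self-maintained base image might look like, assuming Ubuntu 18.04 and Python 3 from the distribution packages. The my-python-base tag in the comment is hypothetical, and the date-based stamp is just one way to force a fresh upgrade once per day.

```dockerfile
# Assumption: build with something like
#   docker build --pull --build-arg dependency_stamp="$(date +%Y%m%d)" -t my-python-base .
# where my-python-base is a hypothetical tag.
FROM ubuntu:18.04

# Changing this value invalidates the cache for the RUN step below.
ARG dependency_stamp=unknown

# Upgrade the base OS packages, then install Python from the distribution.
RUN echo "dependency stamp: ${dependency_stamp}" \
 && apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get upgrade --assume-yes \
 && DEBIAN_FRONTEND=noninteractive apt-get install \
      --no-install-recommends --assume-yes \
      python3 python3-pip \
 && rm -rf /var/lib/apt/lists/*
```

The --pull flag in the comment makes docker build try to fetch a newer copy of the base image before building.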
To install my microservice binaries I need CentOS. Since I have 20 microservices, I'm trying to find a way to optimize the image sizes. Is there a way to create a Docker image without an OS, so that at deployment time Docker takes the OS layer from cache and puts it in all the images? I'm a beginner, so I'm not sure my question is clear.
Yes, look at the scratch keyword (docs):
You can use Docker’s reserved, minimal image, scratch, as a starting
point for building containers.
You may also find multi-stage builds useful.
An example:
FROM scratch
ADD hello /
FROM fedora
RUN yum -y update && yum clean all
RUN yum -y install nginx
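One way to read the question is that all 20 images should share a single OS layer. As far as I know, Docker stores a layer only once per host when several images are built on exactly the same base image, so a sketch like the following may already give most of that saving. The centos:7 tag, the my-service binary, and its path are assumptions for illustration.

```dockerfile
# Assumption: every microservice Dockerfile starts from the same base tag,
# so the CentOS layers are downloaded and stored once per host.
FROM centos:7

# Only this small layer differs between the microservice images.
# my-service is a hypothetical prebuilt binary in the build context.
COPY my-service /usr/local/bin/my-service

CMD ["/usr/local/bin/my-service"]
```

Starting FROM scratch instead removes the OS layer entirely, but that only works if the binary does not need anything from the OS, for example a statically linked executable.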
In my Dockerfile I may have a step that looks like this in order to install some packages.
RUN yum install -y pkg1 pkg2 && \
    yum -y clean all
The problem is that when I build the container more than once, Docker sees this command as unchanged and never runs it. It chooses to use a previously cached layer instead.
However, pkg1 or pkg2 may have been updated in the yum repository and need to be updated, and since it instead used a cached docker layer, the container does not receive the updated packages.
I could build with the --no-cache option, but that would invalidate all cache layers, which substantially slows down the container build as usually my yum install commands are near the end of my Dockerfiles.
What is the best strategy to deal with this? Is there any solution to only invalidate the docker cache if there is a different version of the package in the cache vs repo?
From "Build cache", you could insert an ADD or COPY directive (of a dummy file) just before those RUN commands.
Whenever you want to invalidate the cache for the next RUN, modify the content of the dummy file; the ADD/COPY, along with every Dockerfile instruction after it, will then no longer be served from the cache.
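A small sketch of the dummy-file approach, assuming a CentOS base and a file named cache-bust.txt in the build context. Both names are hypothetical; pkg1 and pkg2 are the placeholders from the question.

```dockerfile
# Assumption: centos:7 and cache-bust.txt are illustrative only.
FROM centos:7

# Instructions above this point keep using the cache.

# Change the file's content before a build (for example: date > cache-bust.txt)
# to invalidate this layer and every layer after it.
COPY cache-bust.txt /tmp/cache-bust.txt

RUN yum install -y pkg1 pkg2 && \
    yum -y clean all
```

This does not detect whether the repository actually has newer package versions; it only gives you a manual switch for re-running the install step without rebuilding the earlier layers.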