Update of root certificates on Docker

If I understand correctly, on standard Ubuntu systems, for example, root certificates are provided by the ca-certificates package and get updated when the package itself is updated.
But how can the root certificates be updated when using Docker containers? Is there a common preferred way of doing this, or must the containers be redeployed with an up-to-date Docker image?

The containers must be redeployed with an up-to-date image.
The Docker Hub base images like ubuntu actually get updated fairly regularly, and if you look at the tag list you can see that there are several date-stamped variants of the images. So one approach that will get you pretty close to current is to always (have your CI system) pull the base image before you build.
docker pull ubuntu:18.04
docker build .
If you can't do that, or if you're working from some sort of derived image that updates less frequently, you can just manually run apt-get upgrade in your Dockerfile. Doing this in the same place you're otherwise installing packages makes sense. It needs to be in the same RUN line as a matching apt-get update, and you might need some way to force Docker to not cache that update line to get current updates.
FROM python:3.8-slim
# Have an option to force rebuilds; the RUN line won't be
# cacheable if the dependency_stamp option changes
ARG dependency_stamp
ENV dependency_stamp=${dependency_stamp:-unknown}
RUN touch /dependencies.${dependency_stamp}
# Update base OS packages and install other things we need
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get upgrade --assume-yes \
&& DEBIAN_FRONTEND=noninteractive apt-get install \
--no-install-recommends --assume-yes \
...
If you find yourself doing this routinely, it can help to maintain your own base images that are upgraded to current packages but have nothing else installed. In that case you might get more control over the process, and smaller images, if you build an image FROM ubuntu and install e.g. Python yourself, rather than building an image FROM python and then installing updates over it.
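A minimal sketch of such a self-maintained base image, assuming you publish it under a hypothetical name like my-registry/ubuntu-updated:18.04 and rebuild it on a schedule (for example from CI). As far as I know the ubuntu base image does not ship ca-certificates, so this sketch installs it explicitly:

```dockerfile
# Hypothetical in-house base image: upstream Ubuntu plus current updates, nothing else.
FROM ubuntu:18.04
# Upgrade everything already installed, add the CA bundle, and drop the apt
# lists in the same RUN so they never end up in a committed layer.
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get upgrade --assume-yes \
 && DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*
```

You could rebuild it with docker build --pull --no-cache -t my-registry/ubuntu-updated:18.04 . where --pull fetches the newest ubuntu:18.04 and --no-cache forces the upgrade step to actually re-run; application images would then start FROM my-registry/ubuntu-updated:18.04.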

Related

How does docker cache the layer?

I have a docker image which run the following command
RUN apt-get update --fix-missing && apt-get install -y --no-install-recommends \
    build-essential debhelper rpm ruby ruby-dev sudo cmake make gcc g++ flex bison git \
    libpcap-dev libssl-dev ninja-build openssh-client python-dev python3-pip swig \
    zlib1g-dev python3-setuptools python3-requests wget curl unzip zip default-jdk \
 && apt-get clean && rm -rf /var/lib/apt/lists/*
If I run it a couple of times in the same day, the layer seems to be cached. However, Docker treats the layer as changed the first time I run it each day.
I just wonder: what's special about the above command that makes Docker think the layer changed?
This is not caused by Docker itself. When Docker sees a RUN instruction, all it does is a simple string comparison to determine whether the layer is in the cache. If it finds it in the cache, it reuses it; if not, it runs the command.
Since you mentioned that it builds from cache all day and then doesn't the next day, the only plausible explanation is that the cache has been invalidated or deleted in the meantime by someone or something.
I don't know how or where you are running the Docker daemon, but it may be running in a VM that is recreated each day from a base image, which would destroy the whole cache and force Docker to rebuild the image.
Another explanation is a cleanup process running once a day, for example a cron job that deletes the cache.
The bottom line is that Docker will happily reuse that cache for an unlimited period of time, as long as the cache actually exists.
I am assuming that the previous layers (if there are any) were built from cache; otherwise, check whether COPY/ADD instructions are busting the cache because files in your build context changed.
It's not the command, it's the steps that occur before it: specifically, whether the files copied into previous layers were modified. I can be more specific if you edit the post to show all the steps in the Dockerfile before this one.
According to the docker doc:
Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match
For a RUN instruction, only the command string itself is used to find a match. So maybe some process deleted the cached layer, or maybe you changed your Dockerfile?
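To illustrate the ordering point from the answers above, here is a sketch (the package and paths are placeholders): the expensive RUN step comes before the COPY, so edits to your source only invalidate the layers from the COPY onward.

```dockerfile
FROM ubuntu:18.04
# Cache key: this exact command string, plus a cache hit on every earlier layer.
# It is reused until the string changes or the cache is removed; apt-get is NOT
# re-run just because newer packages exist in the repository.
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
# Cache key: the files being copied. Any change to them in the build context
# invalidates this layer and every layer after it.
COPY . /src
```

If the COPY came before the RUN instead, every source edit would also force apt-get to run again, which would look like the layer "changing" even though the command did not.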

Docker build with the latest apt packages: is this the general practice?

As I understand it, docker build uses the cache if the Dockerfile hasn't changed and no COPY'd files have changed, so if I build with no options, a Dockerfile step running apt-get or apt-get update (or a similar command) will be cached and packages will never actually be updated.
I want to use the latest packages for several libraries (for security purposes), so I always run docker build with the --no-cache option.
On the other hand, there is the --mount=type=cache option. It's not a docker build option but an option of the RUN instruction. According to the documentation, this RUN option lets package managers keep a cache between builds.
So is my approach wrong? With Docker, do people generally use the cache and never (or only rarely) update packages?
When you don't change the Dockerfile, the cache will always be used, as long as the cached layers are still available locally.
Your approach of using --no-cache is right.
On the other hand, if you need to update the packages at run time, you could add apt-get -y update && apt-get -y upgrade to your ENTRYPOINT; in that case the packages are updated every time the container starts.
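For reference, a sketch of the --mount=type=cache option mentioned in the question, adapted from the apt pattern in Docker's documentation and assuming BuildKit (the default builder in current Docker releases). A cache mount does not decide whether a step re-runs: the RUN line is still cached by its command string, so you still need --no-cache (or another cache-busting trick) to pick up new packages. What it saves is re-downloading unchanged packages when the step does run.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04
# The Debian/Ubuntu images delete downloaded .debs after each install
# (docker-clean); disable that so the cache mount below actually gets filled.
RUN rm -f /etc/apt/apt.conf.d/docker-clean \
 && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
# These mounts persist between builds but are not part of the image or of the
# layer cache key; they only make a re-run of this step cheaper.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends curl
```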

docker build - Avoid ADDing files only needed at build time

I'm trying to build a docker image avoiding unnecessary bulk, and I've run into a problem that I think should be common, but so far I haven't found a straightforward solution. (I'm building the docker on an ubuntu 18.04 system, and starting with a FROM ubuntu layer.)
In particular, I have a very large .deb file (over 3G) that I need to install in the image. It's easy enough to COPY or ADD it and then RUN dpkg -i, but that results in duplication of several GB of space that I don't need. Of course, just removing the file doesn't reduce the image size.
I'd like to be able to mount a volume to access the .deb file, rather than COPY it, which is easy to do when running a container, but apparently not possible to do when building one?
What I've come up with so far is to build the docker up to the point where I would ADD the file, then run it with a volume mounted so I can access it from the container without COPYing it, then I dpkg -i it, then I do a docker commit to create an image from that container. Sure enough, I end up with an image that's over 3GB smaller than my first try, but that seems like a hack, and makes scripting the build more complicated.
I'm thinking there must be a more appropriate way to achieve this, but so far my searching has not revealed an obvious answer. Am I missing something?
Relying on docker commit indeed amounts to a hack :) and its use is thus mentioned as inadvisable by some references such as this blog article.
I only see one possible approach for the kind of use case you mention (copy a one-time .deb package, install it and remove the binary immediately from the image layer):
You could make the .deb you want to install remotely available to the Docker engine that builds your image, and replace the COPY + RUN directives with a single one, e.g., relying on curl:
RUN curl -OL https://example.com/foo.deb && dpkg -i foo.deb && rm -f foo.deb
If curl is not yet installed, you could first run the usual APT commands:
RUN apt-get update -y -q \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y -q --no-install-recommends \
ca-certificates \
curl \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
Maybe there is another possible solution (but I don't think Docker's multi-stage build feature would help here, as all permissions would be lost by doing e.g. COPY --from=build / /).
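One more option worth mentioning, assuming a Docker version with BuildKit: the same RUN --mount syntax referred to elsewhere also supports type=bind, which mounts a file from the build context for a single RUN step without storing it in any layer. A sketch, reusing the placeholder name foo.deb from the answer above:

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:18.04
# foo.deb must be in the build context; it is mounted read-only for this step
# only, so it is never written to an image layer.
RUN --mount=type=bind,source=foo.deb,target=/tmp/foo.deb \
    dpkg -i /tmp/foo.deb
```

The file is still sent to the builder as part of the build context, so the 3 GB transfer remains; only the image size is saved. Note that dpkg -i does not resolve dependencies; apt-get install -y /tmp/foo.deb (after an apt-get update) would.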

Check for updated package via yum in Dockerfile

In my Dockerfile I may have a step that looks like this in order to install some packages.
RUN yum install pkg1 pkg2 -y && \
yum -y clean all
The problem is that when I build the image more than once, Docker sees this command as unchanged and never runs it. Instead, it uses a previously cached layer.
However, pkg1 or pkg2 may have been updated in the yum repository and need to be updated, and since it instead used a cached docker layer, the container does not receive the updated packages.
I could build with the --no-cache option, but that would invalidate all cache layers, which substantially slows down the container build as usually my yum install commands are near the end of my Dockerfiles.
What is the best strategy to deal with this? Is there any solution to only invalidate the docker cache if there is a different version of the package in the cache vs repo?
From "Build cache", you could insert an ADD or COPY directive (of a dummy file) just before those RUN commands.
Whenever you want to invalidate the cache for the next RUN, modify the content of the dummy file, and the ADD/COPY (with the rest of the Dockerfile commands) won't rely on the cache.
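A sketch of that dummy-file trick, where cache-bust.txt is a hypothetical file name and pkg1/pkg2 are the placeholders from the question:

```dockerfile
# (FROM and the earlier, slower steps of your Dockerfile go here; they stay cached.)
# cache-bust.txt is a hypothetical file in the build context. Changing its content
# invalidates the cache for this COPY and for every instruction after it.
COPY cache-bust.txt /tmp/cache-bust.txt
RUN yum install -y pkg1 pkg2 \
 && yum -y clean all
```

For example, running date > cache-bust.txt before docker build forces a fresh yum install on every build while earlier layers stay cached; writing only the day (date +%F) would refresh it at most once a day. As far as I know, docker build cannot compare package versions against the repository by itself, so some external trigger like this is needed.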

Why are Docker container images so large? [closed]

I made a simple image through Dockerfile from Fedora (initially 320 MB).
Added Nano (this tiny editor of 1MB size), and the size of the image has risen to 530 MB. I've added Git on top of that (30-ish MB), and then my image size sky-rockets to 830 MB.
Isn't that insane?
I've tried exporting and importing the container to remove the history/intermediate images. This saved up to 25 MB; my image is now 804 MB. I've also tried running many commands in one RUN, but I still get the same initial 830 MB.
I'm having doubts about whether it is worth using Docker at all. I've barely installed anything and I'm already close to 1 GB. If I have to add some serious stuff like a database, I might run out of disk space.
Does anyone else suffer from ridiculous image sizes? How do you deal with it?
Unless my Dockerfile is horribly incorrect?
FROM fedora:latest
MAINTAINER Me NotYou <email#dot.com>
RUN yum -y install nano
RUN yum -y install git
but it's hard to imagine what could go wrong in here.
As @rexposadas said, images include all the layers, and each layer includes all the dependencies of what you installed. It is also important to note that the base images (like fedora:latest) tend to be very bare-bones, so you may be surprised by the number of dependencies your installed software has.
I was able to make your installation significantly smaller by adding yum -y clean all to each line:
FROM fedora:latest
RUN yum -y install nano && yum -y clean all
RUN yum -y install git && yum -y clean all
It is important to do that for each RUN, before the layer gets committed, or else deletes don't actually remove data. That is, in a union/copy-on-write file system, cleaning at the end doesn't really reduce file system usage because the real data is already committed to lower layers. To get around this you must clean at each layer.
$ docker history bf5260c6651d
IMAGE          CREATED         CREATED BY                                      SIZE
bf5260c6651d   4 days ago      /bin/sh -c yum -y install git; yum -y clean a   260.7 MB
172743bd5d60   4 days ago      /bin/sh -c yum -y install nano; yum -y clean    12.39 MB
3f2fed40e4b0   2 weeks ago     /bin/sh -c #(nop) ADD file:cee1a4fcfcd00d18da   372.7 MB
fd241224e9cf   2 weeks ago     /bin/sh -c #(nop) MAINTAINER Lokesh Mandvekar   0 B
511136ea3c5a   12 months ago                                                   0 B
Docker images are not large, you are just building large images.
The scratch image is 0B and you can use that to package up your code if you can compile your code into a static binary. For example, you can compile your Go program and package it on top of scratch to make a fully usable image that is less than 5MB.
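A minimal sketch of the scratch approach using a multi-stage build (Docker 17.05 or later), assuming a Go module with a main package at the root of the build context; the golang:1.21 tag is only an example:

```dockerfile
# Build stage: compile the program.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 avoids linking against libc, so the binary can run on an empty filesystem.
RUN CGO_ENABLED=0 go build -o /app .
# Final stage: nothing but the binary.
FROM scratch
COPY --from=build /app /app
# scratch contains no CA certificates; copy the bundle in if the program makes TLS connections.
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
ENTRYPOINT ["/app"]
```

The CA bundle line ties back to the first question on this page: in a scratch image, root certificates only change when you rebuild and redeploy.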
The key is to not use the official Docker images; they are too big. Scratch isn't all that practical either, so I'd recommend using Alpine Linux as your base image. It is ~5 MB; then add only what your app requires. This post about Microcontainers shows you how to build very small images based on Alpine.
UPDATE: many official Docker images now offer Alpine-based variants (tags ending in -alpine), so they are good to use now.
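For comparison with the Fedora Dockerfile in the question, a sketch of the same two tools on Alpine (the 3.18 tag is just an example; I have not measured the resulting size):

```dockerfile
FROM alpine:3.18
# --no-cache makes apk fetch the package index on the fly instead of storing it
# under /var/cache/apk, so no separate cleanup step is needed.
RUN apk add --no-cache nano git
```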
Here are some more things you can do:
Avoid multiple RUN commands where you can. Put as much as possible into one RUN command (using &&)
clean up unnecessary tools like wget or git (which you only need for downloading or building stuff, but not to run your process)
With both of these AND the recommendations from @Andy and @michau, I was able to shrink my nodejs image from 1.062 GB to 542 MB.
Edit:
One more important thing:
"It took me a while to really understand that each Dockerfile command creates a new container with the deltas. [...] It doesn't matter if you rm -rf the files in a later command; they continue exist in some intermediate layer container."
So now I managed to put apt-get install, wget, npm install (with git dependencies) and apt-get remove into a single RUN command, so now my image has only 438 MB.
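A sketch of that single-RUN pattern, assuming a Debian-based node:18-slim image and a project whose npm dependencies include git URLs (both assumptions, not from the answer):

```dockerfile
FROM node:18-slim
WORKDIR /app
COPY package*.json ./
# Install the build-time tool, use it, and remove it in the SAME RUN,
# so git never ends up in a committed layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends git ca-certificates \
 && npm install \
 && apt-get purge -y --auto-remove git \
 && rm -rf /var/lib/apt/lists/*
```

Splitting the purge into a later RUN would not shrink the image, because git would already be stored in the earlier layer.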
Edit 29/06/17
Docker v17.06 brings a new feature for Dockerfiles:
You can have multiple FROM statements inside one Dockerfile and only the stuff from last FROM will be in your final Docker image. This is useful to reduce image size, for example:
FROM node AS builder
WORKDIR /var/my-project
RUN apt-get update && apt-get install -y ruby python3 git openssh-client gcc && \
git clone my-project . && \
npm install
FROM node
COPY --from=builder /var/my-project /var/my-project
Will result in an image having only the nodejs base image plus the content from /var/my-project from the first steps - but without the ruby, python, git, openssh and gcc!
Yes, those sizes are ridiculous, and I really have no idea why so few people notice that.
I made an Ubuntu image that is actually minimal (unlike other so-called "minimal" images). It's called textlab/ubuntu-essential and is 60 MB.
FROM textlab/ubuntu-essential
RUN apt-get update && apt-get -y install nano
The above image is 82 MB after installing nano.
FROM textlab/ubuntu-essential
RUN apt-get update && apt-get -y install nano git
Git has many more prerequisites, so the image gets larger, about 192 MB. That's still less than the initial size of most images.
You can also take a look at the script I wrote to make the minimal Ubuntu image for Docker. You can perhaps adapt it to Fedora, but I'm not sure how much you will be able to uninstall.
The following helped me a lot:
After removing unused packages (e.g. redis 1200 mb freed) inside my container, I have done the following:
docker export [containerID] -o containername.tar
docker import -m "commit message here" containername.tar imagename:tag
The layers get flattened. The new image is smaller because I had removed packages from the container as described above.
It took me a long time to understand this, and that's why I've added my comment.
As a best practice, you should execute a single RUN command, because
every RUN instruction in the Dockerfile writes a new layer in the image and every layer requires extra space on disk. In order to keep the number of layers to a minimum, any file manipulation like installing, moving, extracting, removing, etc. should ideally be made under a single RUN instruction
FROM fedora:latest
RUN yum -y install nano git && yum -y clean all
Docker Squash is a really nice solution to this. You can run $packagemanager clean in the last step instead of in every line, and then just run docker-squash to get rid of all of the intermediate layers.
https://github.com/jwilder/docker-squash
Yes the layer system is quite surprising.
If you have a base image and you increment it by doing the following:
# Test
#
# VERSION 1
# use the centos base image provided by dotCloud
FROM centos7/wildfly
MAINTAINER JohnDo
# Build it with: docker build -t "centos7/test" test/
# Change user into root
USER root
# Extract weblogic
RUN rm -rf /tmp/* \
&& rm -rf /wildfly/*
The image has exactly the same size as before, because the deletions only hide files that still exist in the lower layers. That essentially means you have to put a lot of extract, install and cleanup magic into each RUN step to make images as small as the software installed.
This makes life much harder...
What docker build is missing is a way to run steps without committing them as a layer.
We had a similar issue in our docker build process: each image built was significantly larger than the previous one. As it turns out, tar.gz files were being included in the image, among them the compressed images we upload to a server, so each image accidentally contained the prior images. Image sizes were soon in the 8 GB range.
.dockerignore is your friend. Make sure anything in your project not necessary to build the image is in the ignore file.
