Why don't we use CMD apt update instead of RUN apt update in a Dockerfile?

We use RUN apt update to update an image once, at build time. Why don't we use CMD apt update instead, so that every container we create gets updated?

As it sounds like you already know, RUN is intended to "execute any commands in a new layer on top of the current image and commit the results", while the main purpose of CMD is to "provide defaults for an executing container". So RUN is a build-time instruction, while CMD is a run-time instruction.
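To make the difference concrete, here is a minimal sketch. The base image tag and the curl package are only illustrative choices, not something from the question.

```dockerfile
FROM ubuntu:22.04

# RUN executes while the image is being built; the result is committed
# as a new layer and shipped with every container started from the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# CMD runs nothing at build time. It only records the default command for
# containers started from this image, and "docker run <image> <command>"
# replaces it entirely.
CMD ["curl", "--version"]
```

Building this image hits the package mirrors once. Starting a container from it only runs curl --version, with no network access needed.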
There are a few reasons this won't be a good idea:
Containers should be fast
Containers are usually expected to consume as few resources as possible, and to start up and shut down quickly and easily. If we update a container's packages EVERY time we run it, the container might take many minutes, or even hours on a bad network, before it can start running whatever process it is intended for.
Unexpected behavior
Part of developing a new container image is ensuring that the packages the container needs play well together. But if we upgrade all the packages each time the container is run, on whatever system it is run on, it is possible (if not inevitable) that a package will eventually be published that introduces a breaking change to the container, and this is obviously not ideal.
Now this could be avoided by removing the default repositories and replacing them with your own where you can vet each package upgrade, test them together, and publish them, but this is probably a much greater effort than what would make sense unless the repos would be serving multiple container images.
Image Versioning
Many container images (e.g. Golang) version their images based on the version of Golang they support; however, when the underlying packages in the container keep changing, how would you even start to version the image?
Now this isn't necessarily a deal breaker, but it could cause confusion among the container's user base and ultimately undercut their trust in the container.
Unexpected network traffic
Even if it is well documented, most developers would not expect this kind of behavior, and it would lead to development issues because your container requires access to the internet. For example, in a K8s environment networking can be extremely strict, and the developer would need to manually open up a route to the internet (or to a set of custom repos).
Additionally, even if networking is not an issue, if you expect a lot of these containers to be started, you might clog the network with package upgrades and cause network performance issues.
Wouldn't work for base images
It sounds like you are probably not developing an image intended to serve as a base image for anything else, but if you were, the CMD of the base image would most likely be overridden by the images built on top of it, so the update would never run.
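A small sketch of that last point, written as a multi-stage file so it fits in one Dockerfile. The stage name "base" and the sleep command are made up for illustration; a separately published base image behaves the same way.

```dockerfile
# Stands in for a base image whose CMD tries to update packages on start.
FROM ubuntu:22.04 AS base
CMD ["apt-get", "update"]

# Any image built on top of it normally sets its own CMD. Only the last
# CMD takes effect, so the "apt-get update" above never runs.
FROM base
CMD ["sleep", "infinity"]
```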

Related

How can squid be used in a Dockerfile to cache downloads in a host directory

I am running a docker build command with a Dockerfile, but this is being held up by a slow, and sometimes aborted, download of a certain package (google's boringssl as it happens).
I would like to install squid near the start of the Dockerfile, so that subsequent git clones, apt gets, etc, i.e. every kind of download is cached to a directory outside the docker image, by defining a volume in the docker build command.
I'm fairly familiar with Docker, and understand the concept of layers. So a reply dealing solely with those will not be useful to me (although possibly to others). But sometimes one has to make a Dockerfile change that disrupts subsequent layers, and also a build error of a subsystem within a layer will mean all the web fetches for that layer will need repeating on the next build. So layers are not the answer to everything cache-related.
Thanks in anticipation!

What's the best way to cache downloads done by docker while building containers?

While testing new Docker builds (modifying Dockerfile) it can take quite some time for the image to rebuild due to huge download size (either direct by wget, or indirect using apt, pip, etc)
One way around this that I personally use often is to split the commands I plan to modify into their own RUN instructions. This avoids re-downloading some parts because previous layers are cached. This, however, doesn't cut it if the command that requires "tuning" is early on in the Dockerfile.
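For readers less familiar with this approach, a rough sketch of what such a split can look like. The Python base image, requirements.txt and main.py are assumptions made for the example only.

```dockerfile
FROM python:3.12-slim

# Slow, rarely changing download: kept in its own layer so it stays cached.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# The dependency list changes less often than the code, so copy it first.
# This layer is only rebuilt when requirements.txt changes.
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

# Frequently edited content goes last; editing it leaves the layers
# above untouched.
COPY . /app
CMD ["python", "/app/main.py"]
```

Changing an instruction invalidates that layer and every layer after it, which is exactly why this stops helping when the step being tuned sits near the top.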
Another solution is to use an image that already contains most of the required packages so that it would just be pulled once and cached, but this can come with unnecessary "baggage".
So is there a straightforward way to cache all downloads done by Docker while building/running? I'm thinking of having Memcached on the host machine, but it seems like overkill. Any suggestions?
I'm also aware that I can test in an interactive shell first, but sometimes you need to test the Dockerfile and make sure it's production-ready (including arguments and defaults), especially if the only way you will ever see what's going on after that point is ELK or cluster crash logs.
This question:
https://superuser.com/questions/303621/cache-for-apt-packages-in-local-network
is the same question, but for a local network instead of a single machine. The answer still applies here; a single machine is actually a simpler scenario than a network with multiple machines.
If you install Squid locally you can use it to cache all your downloads including your host-side downloads.
But more specifically, there's also a Squid Docker image!
Heads-up: if you run Squid as a service in a docker-compose file, use the Squid service name instead of Docker's subnet gateway as the proxy address, so 172.17.0.1:3128 becomes squid:3128.
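As a rough sketch of pointing a build at such a proxy: Docker treats http_proxy as a predefined build argument, so it can be passed with --build-arg without an ARG line in the Dockerfile, and apt picks it up from the environment. The address is the gateway mentioned above; the base image and the git package are only examples.

```dockerfile
# Build with the proxy address passed in from outside, for example:
#   docker build --build-arg http_proxy=http://172.17.0.1:3128 .
FROM ubuntu:22.04

# http_proxy is set in the environment of this RUN step during the build,
# so the package downloads go through the proxy.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
```

Keep in mind that a proxy like this will generally only cache plain HTTP downloads unless it is also set up to intercept TLS.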
The way I did this was:
Used the new --mount=type=cache,target=/home_folder/.cache/curl option on RUN
Wrote a script that looks in the cache before calling curl (a wrapper around curl with a cache)
Called the script in the Dockerfile during the build
It is more of a hack, but it works.
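The answer does not show the script itself, so the following is only a guess at the general shape, with the wrapper inlined into the RUN step. It needs BuildKit; the cache path, the URL and the file names are placeholders.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# The cache mount survives between builds but is not part of the image.
# Download to a temporary name first so an aborted transfer is never
# mistaken for a complete cached file.
RUN --mount=type=cache,target=/root/.cache/curl \
    if [ ! -f /root/.cache/curl/tool.tar.gz ]; then \
        curl -fsSL -o /root/.cache/curl/tool.tar.gz.part https://example.com/tool.tar.gz \
        && mv /root/.cache/curl/tool.tar.gz.part /root/.cache/curl/tool.tar.gz; \
    fi \
    && mkdir -p /opt/tool \
    && tar -xzf /root/.cache/curl/tool.tar.gz -C /opt/tool
```

Even when an earlier layer changes and this step has to run again, the download itself is skipped as long as the cache mount still holds the file.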

Why do we build "inside" docker?

When I first learned Docker I expected a config file, image producer, CLI, and options for mounting and networks. That's all there.
I did not expect to put build commands inside a Dockerfile. I thought docker would wrap/tar/include a prebuilt task I made. Why give build commands in Docker?
Surely it can import a task thus keeping Jenkins/Bazel etc. distinct and apart for making an image/container?
I guess we are dealing with a misconception here. Docker is NOT a lightweight version of VMware/Xen/KVM/Parallels/FancyVirtualization.
Disclaimer: The following is heavily simplified for the sake of comprehensibility.
So what is Docker?
In one sentence: Docker is a system to isolate processes from the other processes within an operating system as much as possible while still providing all means to run them. Put differently:
Docker is a package manager for isolated processes.
Its closest ancestors are chroot and BSD jails. What those basically do is isolate (more in the case of BSD jails, less in the case of chroot) a part of your OS resources and run a complete environment independently from the rest of the OS, except for the kernel.
In order to be able to do that, a Docker image obviously needs to contain everything except for a kernel. So you need to provide a shell (if you choose to do so), standard libraries like glibc and even resources like CA certificates. For reference: In order to set up chroot jails, you did all this by hand once upon a time, preinstalling your chroot environment with each and every piece of software required. Docker is basically taking the heavy lifting from you here.
The mentioned isolation, even down to the installed (and usable) software, sounds cumbersome, but it gives you several advantages as a developer. Since you provide basically everything except for a (compatible) kernel, you can develop and test your code in the same environment it will run in later down the road. Not a close approximation, but literally the same environment, bit for bit. A rather famous proverb in relation to Docker is:
"Runs on my machine" is no excuse any more.
Another advantage is that you can add static resources to your Docker image and access them via quite ordinary file system semantics. While it is true that you can do that with virtualisation images as well, they usually do not come with a language for provisioning. Docker does - the Dockerfile:
FROM alpine
LABEL maintainer="you@example.com"
COPY file/in/host destination/on/image
Ok, got it, now why the build commands?
As described above, you need to provide all dependencies (and transitive dependencies) your application has. The easiest way to ensure that is to build your application inside your Docker image:
FROM somebase
RUN yourpackagemanager install long list of dependencies && \
make yourapplication && \
make install
If the build fails, you know you have missing dependencies. Now you can tweak and tune your Dockerfile until it compiles and is tested. So now your Docker image is finished, you can confidently distribute it, since you know that as long as the docker daemon runs on the machine somebody tries to run your image on, your image will run.
In the Go ecosystem, you basically ensure that your go.mod and go.sum are up to date and working, and your work stays reproducible.
Again, this works with virtualisation as well, so where is the deal?
A (good) docker image only runs what it needs to run. In the vast majority of docker images, this means exactly one process, for example your Go program.
Side note: It is very bad practise to run multiple processes in one Docker image, say your application and a database server and a cache and whatnot. That is what docker-compose is there for, or more generally container orchestration. But this is far too big of a topic to explain here.
A virtualised OS, however, needs to run a kernel, a shell, drivers, log systems and whatnot.
So the deal basically is that you get all the good stuff (isolation, reproducibility, ease of distribution) with less waste of resources (running 5 versions of the same OS with all its shenanigans).
Because we want an environment for reproducible builds. We don't want to depend on the version of the language, the existence of a compiler, the versions of libraries and so on.
Building inside a Dockerfile gives you all the tools and the environment you need, independent of your platform and ready to use. From a development perspective, it is easier to have everything you need inside the container.
But you have to think about the objective of building inside a Dockerfile. If you have a very complex build process with a lot of dependencies, all of those tools end up inside the image and inflate its final size, because building to generate an artifact is not the same as building to produce the final container.
With these two aspects in mind, you should learn to use Docker's multi-stage build process. The idea is close to your question: you can have as many stages as your build process needs and use a different FROM image in each stage to make sure it has the right requirements and dependencies, and then generate the final image with the minimum dependencies and the smallest size.
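A minimal multi-stage sketch for a Go program, to show the shape of it. The image tags, paths and the binary name "app" are illustrative assumptions, and it assumes the project has both a go.mod and a go.sum.

```dockerfile
# Stage 1: full toolchain, only used to compile.
FROM golang:1.22 AS build
WORKDIR /src
# Copy the module files first so the dependency download is cached
# until go.mod or go.sum changes.
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a static binary that does not need glibc.
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: small runtime image that receives only the compiled binary.
FROM alpine:3.20
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

The compiler and sources stay in the first stage; only what is copied with COPY --from ends up in the final image.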
I'll add to the answers above:
Doing builds in or out of docker is a choice that depends on your goal. In my case I am more interested in docker containers for kubernetes, and in addition we have mature builds already.
This link shows how to take prebuilt artifacts and add them to an image. This strategy, together with adding libs, env etc., leverages Docker well and shows that Docker is indeed flexible. https://medium.com/@chemidy/create-the-smallest-and-secured-golang-docker-image-based-on-scratch-4752223b7324
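In its simplest form, and not taken from the linked article, adding a prebuilt artifact to an image can look like the sketch below. It assumes a file named app that was built outside Docker (by Jenkins, Bazel or anything else) as a statically linked Linux binary.

```dockerfile
# Assumes ./app already exists in the build context, for example from:
#   CGO_ENABLED=0 GOOS=linux go build -o app .
# "scratch" is the empty base image, so the binary must not depend on
# any shared libraries.
FROM scratch
COPY app /app
ENTRYPOINT ["/app"]
```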

Create new docker image vs run shell commands

We are working with the fabric-ca Docker image. It does not come with scp installed, so we have two options:
Option 1: create a new image as described here
Option 2: install scp from the shell when container is started
We'd like to understand the pros and cons of each.
Option 1: allows you to build on it further, creates a stable state, you can verify / test an image before releasing
Option 2: takes longer to startup, requires being online during container start, it is harder to trace / understand and manage software stack locked in e.g. bash scripts that start dockers vs. Dockerfile and whatever technology you will end up using for container orchestration.
Ultimately, I use option 2 only for discovery, proof of concept or trying something out. Once I know I need a certain container on an ongoing basis, I build a proper image via a Dockerfile.
You should consider your option 2 a non-starter. Either build a custom image or use a host directory bind-mount (docker run -v /host/path:/container/path option) to inject the data you need; I would probably prefer the bind-mount option.
It’s extremely routine to docker rm a container, and when you do, any changes you’ve made locally in a container are lost. For example, if there is a new software release or a critical security update, you have to recreate the container with a new image. You should pretty much never install software in an interactive shell in a container, especially if you’re going to use it to copy in data your application needs: you’ll have to repeat this step every single time you delete and recreate the container.
Option 1:
The BUILD of the image is longer, but you execute it only the first time
The RUN is faster
You don't need an internet connection at RUN
Includes verification of the different steps
Allows traceability
Option 2:
The RUN is longer
You need an internet connection at RUN
Harder to trace
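For reference, Option 1 can be as small as the sketch below. The image name is an assumption (pin whatever name and tag you already run), and it assumes a Debian/Ubuntu-based image where scp comes with the openssh-client package; an Alpine-based image would need apk instead of apt-get.

```dockerfile
# Assumed image name; replace with the exact name and tag you deploy.
FROM hyperledger/fabric-ca

# Installed once at build time, so containers start offline and fast.
RUN apt-get update \
    && apt-get install -y --no-install-recommends openssh-client \
    && rm -rf /var/lib/apt/lists/*
```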

Handling software updates in Docker images

Let's say I create a docker image called foo that contains the apt package foo. foo is a long running service inside the image, so the image isn't restarted very often. What's the best way to go about updating the package inside the container?
I could tag my images with the version of foo that they're running and install a specific version of the package inside the container (i.e. apt-get install foo=0.1.0 and tag my container foo:0.1.0) but this means keeping track of the version of the package and creating a new image/tag every time the package updates. I would be perfectly happy with this if there was some way to automate it but I haven't seen anything like this yet.
The alternative is to install (and update) the package on container startup, however that means a varying delay on container startup depending on whether it's a new container from the image or we're starting up an existing container. I'm currently using this method but the delay can be rather annoying for bigger packages.
What's the (objectively) best way to go about handling this? Having to wait for a container to start up and update itself is not really ideal.
If you need to update something in your container, you need to build a new container. Think of the container as a statically compiled binary, just like you would with C or Java. Everything inside your container is a dependency. If you have to update a dependency, you recompile and release a new version.
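As a sketch of what "release a new version" can look like for the foo example from the question: the package version is a build argument, and the image tag is set to match it. The base image is an assumption, and foo stands for the question's placeholder package.

```dockerfile
# Build and tag with the same version, for example:
#   docker build --build-arg FOO_VERSION=0.1.0 -t foo:0.1.0 .
FROM ubuntu:22.04

# Default used when no --build-arg is given.
ARG FOO_VERSION=0.1.0

# Pinning the version makes the build fail loudly if that exact
# version is no longer available, instead of silently drifting.
RUN apt-get update \
    && apt-get install -y --no-install-recommends "foo=${FOO_VERSION}" \
    && rm -rf /var/lib/apt/lists/*

CMD ["foo"]
```

A CI job that runs this build whenever a new version of the package appears would cover the automation the question asks about.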
If you tamper with the contents of the container at startup time you lose all the benefits of Docker: That you have a traceable build process and each container is verifiably bit-for-bit identical everywhere and every time you copy it.
Now let's address why you need to update foo. The only reason you should have to update a dependency outside of the normal application delivery cycle is to patch a security vulnerability. If you have a CVE notice that ubuntu just released a security patch then, yep, you have to rebuild every container based on ubuntu.
There are several services that scan and tell you when your containers are vulnerable to published CVEs. For example, Quay.io and Docker Hub scan containers in your registry. You can also do this yourself using Clair, which Quay uses under the hood.
For any other type of update, just don't do it. Docker is a 100% fossilization strategy for your application and the OS it runs on.
Because of this, your Docker container will work even if you copy it to 1000 hosts with slightly different library versions installed, or run it alongside other containers with different library versions installed. Your container will continue to work 2 years from now, even if the dependencies can no longer be downloaded from the internet.
If for some reason you can't rebuild the container from scratch (e.g. it's already 2 years old and all the dependencies went missing) then yes, you can download the container, run it interactively, and update dependencies. Do this in a shell and then publish a new version of your container back into your registry and redeploy. Don't do this at startup time.
