Do Docker images change? How to ensure they do not? - docker

One of the main benefits of Docker is reproducibility. One can specify exactly which programs and libraries get installed how and where, for example.
However, I'm trying to think this through and can't wrap my head around it. How I understand reproducibility is that if you request a certain tag, you will receive the same image with the same contents every time. However there are two issues with that:
Even if I try to specify a version as thoroughly as possible, for example python:3.8.3, I seem to have no guarantee that it points to a static non-changing image? A new version could be pushed to it at any time.
python:3.8.3 is a synonym for python:3.8.3-buster which refers to the Debian Buster OS image this is based on. So even if Python doesn't change, the underlying OS might have changes in some packages, is that correct? I looked at the official Dockerfile and it does not specify a specific version or build of Debian Buster.

If you depend on external Docker images, your own image indeed has no guarantee of reproducibility. The solution is to import the python:3.8.3 image into your own Docker registry, ideally a registry that can prevent tags from being overwritten (immutability), e.g. Harbor.
However, reproducibility of your Docker image depends on more than just the base image you import. E.g. if you install some pip packages, and one of those packages does not pin the versions of its own dependencies, you still have no guarantee that rebuilding your Docker image leads to the same image. Hosting those Python packages in your own artifact repository (e.g. Artifactory) is again the solution here.
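As a sketch of both kinds of pinning (the sha256 digest below is a placeholder, not a real one, and requirements.txt is assumed to list exact versions, e.g. produced with pip freeze or pip-compile):

```dockerfile
# Pin the base image by content digest instead of by tag; a digest is
# immutable even if the python:3.8.3 tag is later re-pushed.
# (The sha256 value here is a placeholder -- substitute the real digest.)
FROM python@sha256:0000000000000000000000000000000000000000000000000000000000000000

# requirements.txt is assumed to pin every package, including transitive
# dependencies, so the pip layer is reproducible too.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```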

Addressing your individual concerns.
Even if I try to specify a version as thoroughly as possible, for example python:3.8.3, I seem to have no guarantee that it points to a static non-changing image? A new version could be pushed to it at any time.
I posted this in my comment on your question, but addressing it here as well. Large packages use semantic versioning. In order for trust to work, it has to be established. This method of versioning introduces trust and consistency to an otherwise (sometimes arbitrary) system.
The trust is that when they uploaded 3.8.3, it will remain as constant as possible for the future. If they add another patch, they will upload 3.8.4; if they add a feature, they will upload 3.9.0; and if they break a feature, they will create 4.0.0. This assures you, the user, that 3.8.3 will be the same, every time, everywhere.
Frameworks and operating systems often backport patches. PHP is known for this. If they find a security hole in v7 that was in v5, they will update all versions of v5 that had it. While all the v5 versions were updated from their original published versions, functionality remained constant. This is important, this is the trust.
So, unless you were "utilizing" that security hole to do what you needed to do, or relying on a bug, you should feel confident that 3.8.3 from DockerHub should always be used.
NodeJS is a great example. They keep all their old deprecated versions available on Docker Hub for archival purposes.
I have been using named tags (NOT latest) from Docker Hub in all my projects for work and home, and I've never run into an issue after deployment where a project crashed because something changed "under my feet". In fact, just last week, I rebuilt and updated some code on an older version of NodeJS (from 4 years ago) which required a re-pull, and because it was a named version (not latest), it worked exactly as expected.
python:3.8.3 is a synonym for python:3.8.3-buster which refers to the Debian Buster OS image this is based on. So even if Python doesn't change, the underlying OS might have changes in some packages, is that correct? I looked at the official Dockerfile and it does not specify a specific version or build of Debian Buster.
Once a child image (python) is built off a parent image (buster), it is immutable. The exception is if the child image (python) is rebuilt at a later date and CHOOSES to use a different version of the parent image (buster). But this is considered bad form, sneaky, and it undermines the PURPOSE of containers. I don't know of any major package that does this.
This is like doing a git push --force on your repository after you changed around some commits. It's seriously bad practice.
The system is designed and built on trust, and in order for it to be used, adopted and grow, the trust must remain. Always check the older tags of any container you want to use, and be sure they allow their old deprecated tags to live on.
Thus, when you download python:3.8.3 today, or 2 years from now, it should function exactly the same.
For example, if you docker pull python:2.7.8, and then docker inspect python:2.7.8, you'll find that it is the same image that was created 5 years ago.
"Created": "2014-11-26T22:30:48.061850283Z",
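If you want a hard guarantee rather than trust alone, you can resolve a tag to its content digest and pull by digest; a digest always refers to the exact same bytes, regardless of what later gets pushed to the tag. A sketch (the digest shown is a placeholder):

```console
$ docker pull python:3.8.3
$ docker inspect --format '{{index .RepoDigests 0}}' python:3.8.3
python@sha256:<placeholder-digest>
$ docker pull python@sha256:<placeholder-digest>   # always resolves to the same image
```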


Is it bad practice to use mysql:latest as docker image?

Docker was built to make sure that code can run on any device without any issues of missing libraries, wrong versions, etc.
However, if you use the "latest" tag instead of a fixed version number, then you don't actually know beforehand which version of the image you'll end up with. Perhaps your code is compatible with the "latest" version at the time of writing, but not in the future. Is it then a bad practice to use this tag? Is it a better practice to use a fixed version number?
Usually, yes: for the reasons you state, you generally want to avoid ...:latest tags here.
For databases in particular, there can be differences in the binary interfaces, and there definitely are differences in the on-disk storage. If you were expecting MySQL 5.7 but actually got MySQL 8.0 you might get some surprising behavior. If a MySQL 9.0 comes out and you get an automatic upgrade then your local installation might break.
A generic reason to avoid ...:latest is that Docker will use a local copy of an image with the right tag, without checking Docker Hub or another repository. So again in the hypothetical case where MySQL 9.0 comes out, but system A has an old mysql:latest that's actually 8.0, docker run mysql on an otherwise clean system B will get a different image, and that inconsistency can be problematic.
You will probably find Docker image tags for every specific patch release of the database, but in general only the most recent build gets security updates. I'd suggest that pinning to a minor version mysql:8.0 is a good generic approach; you may want the more specific version mysql:8.0.29 in your production environment if you need to be extremely conscious of the upgrades.
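In Dockerfile terms, the difference is just how specific the tag is (8.0.29 is the example version from above, not a recommendation):

```dockerfile
# Tracks the latest 8.0.x patch release, picking up security fixes on rebuild:
FROM mysql:8.0

# Pins one exact patch release, for tightly controlled production upgrades:
# FROM mysql:8.0.29
```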
Some other software is less picky this way. I'd point at the Redis in-memory data store and the Nginx HTTP server as two things where their network interfaces and configuration have remained very stable. Probably nothing will go wrong if you use nginx:latest or redis:latest, even if different systems do wind up with different versions of the image.

Docker best practice: use OS or application as base image?

I would like to build a docker image which contains Apache and Python3. What is the suggested base image to use in this case?
There is an official apache:2.4.43-alpine image I can use as the base image, or I can install Apache on top of an alpine base image.
What would be the best approach in this case?
Option1:
FROM apache:2.4.43-alpine
<Install python3>
Option2:
FROM alpine:3.9.6
<Install Apache>
<Install Python3>
Here are my rules.
rule 1: If the image is an official image (such as node, python, apache, etc.), it is fine to use it directly as your application's base image, rather than building your own.
rule 2: If the image is built by the owner of the software, such as hashicorp/terraform (HashiCorp is the owner of Terraform), then it is better to use it than to build your own.
rule 3: If you only want to save time, choose the most-downloaded image that has similar applications installed as your base image. Make sure you can view its Dockerfile; otherwise, don't use it at all, no matter how many downloads it has.
rule 4: Never pull images from public registry servers if your company has security compliance concerns; build your own instead. Another reason to build your own is that the existing image is not built on the operating system you prefer. For example, some images provided by AWS are built on Amazon Linux 2; in most cases I will rebuild those myself.
rule 5: When building your own, it doesn't matter much which base image you start from, and there is no need to reinvent the wheel: reuse an existing image's Dockerfile from github.com if you can.
Avoid Alpine; it will often make Python library installs much slower (https://pythonspeed.com/articles/alpine-docker-python/).
In general, Python version is more important than Apache version. Latest Apache from stable Linux distro is fine even if not latest version, but latest Python might be annoyingly too old. Like, when 3.9 comes out, do you want to be on 3.7?
As such, I would recommend python:3.8-slim-buster (or whatever Python version you want), and install Apache with apt-get.
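A minimal sketch of that recommendation (the Debian package name apache2 is assumed; adapt to your needs):

```dockerfile
FROM python:3.8-slim-buster

# Apache comes from Debian Buster's package repository, so its version
# tracks the distro; the Python version tracks the image tag.
RUN apt-get update \
 && apt-get install -y --no-install-recommends apache2 \
 && rm -rf /var/lib/apt/lists/*
```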

Docker, update image or just use bind-mounts for website code?

I'm using Django but I guess the question is applicable to any web project.
In our case, there are two types of code: the first is Python code (run by Django), and the rest are static files (html/js/css).
I could publish new image when there is a change in any of the code.
Or I could use bind mounts for the code. (For django, we could bind-mount the project root and static directory)
If I use bind mounts for code, I could just update the production machine (probably with git pull) when there's code change.
Then, the Docker image would handle updates that are not strictly our own code changes (such as a library update, or new setup like adding Elasticsearch).
Does this approach imply any obvious drawback?
For security reasons it is advised to keep an operating system up to date with the latest security patches, but Docker images are meant to be released in an immutable fashion so that we can always reproduce production issues outside production. This means the OS inside the image will not update itself with newly released security patches, so we need to rebuild and deploy our Docker image frequently in order to stay on the safe side.
So I would prefer to release a new Docker image with my code and static files, because they are bound to change more often, thus requiring frequent releases. This means you keep the OS more up to date in terms of security patches, without needing to rebuild Docker images in production just for that purpose.
Note I assume here that you release new code or static files at least on a weekly basis; otherwise I would still recommend rebuilding the Docker images at least once a week, in order to get the latest security patches for all the software being used.
Generally, the more Docker-oriented solutions I've seen to this problem lean towards packaging the entire application in the Docker image. That especially includes application code.
I'd suggest three good reasons to do it this way:
If you have a reproducible path to docker build a self-contained image, anyone can build and reproduce it. That includes your developers, who can test a near-exact copy of the production system before it actually goes to production. If it's a Docker image, plus this code from this place, plus these static files from this other place, it's harder to be sure you've got a perfect setup matching what goes to production.
Some of the more advanced Docker-oriented tools (Kubernetes, Amazon ECS, Docker Swarm, Hashicorp Nomad, ...) make it fairly straightforward to deal with containers and images as first-class objects, but trickier to say "this image plus this glop of additional files".
If you're using a server automation tool (Ansible, Salt Stack, Chef, ...) to push your code out, then it's straightforward to also use those to push out the correct runtime environment. Using Docker to just package the runtime environment doesn't really give you much beyond a layer of complexity and some security risks. (You could use Packer or Vagrant with this tool set to simulate the deploy sequence in a VM for pre-production testing.)
You'll also see a sequence in many SO questions where a Dockerfile COPYs application code to some directory, and then a docker-compose.yml bind-mounts the current host directory over that same directory. In this setup the container environment reflects the developer's desktop environment and doesn't really test what's getting built into the Docker image.
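That setup typically looks like the following docker-compose.yml fragment (service name and paths are illustrative):

```yaml
# The bind mount hides whatever the Dockerfile COPY'd into /app,
# so the code baked into the image is never actually exercised.
services:
  web:
    build: .
    volumes:
      - .:/app   # overrides the application code inside the image
```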
("Static files" wind up in a gray zone between "is it the application or is it data?" Within the context of this question I'd lean towards packaging them into the image, especially if they come out of your normal build process. That especially includes the primary UI to the application you're running. If it's things like large image or video assets that you could reasonably host on a totally separate server, it may make more sense to serve those separately.)

Which docker base image to use in the Dockerfile?

I have a web application, which consists of two projects:
using VueJS as a front-end part;
using ExpressJS as a back-end part;
I now need to dockerize my application, but I'm not sure about the very first line in my Dockerfiles (which refers to the environment used, I guess; source).
What I will need to do now is separate docker images for both projects, but since I'm very new to this, I can't figure out what should be the very first lines for both of the Dockerfiles (in both of the projects).
I was developing the project in Windows 10 OS, where I'm having node version v8.11.1 and expressjs version 4.16.3.
I tried some of the versions I found (such as node:8.11.1-alpine), but I got a warning:
SECURITY WARNING: You are building a Docker image from Windows against
a non-Windows Docker host.
This made me think that I should not only care about Node versions, but about the OS as well. So I'm not sure which base images to use now.
node:8.11.1-alpine is a perfectly correct tag for a Node image. This particular one is based on Alpine Linux - a lightweight Linux distro, which is often used when building Docker images because of its small footprint.
If you are not sure about which base image you should choose, just read the documentation at DockerHub. It lists all currently supported tags and describes different flavours of the Node image ('Image Variants' section).
Quote:
Image Variants
The node images come in many flavors, each designed for a specific use case.
node:<version>
This is the defacto image. If you are unsure about what your needs are, you probably want to use this one. It is designed to be used both as a throw away container (mount your source code and start the container to start your app), as well as the base to build other images off of. This tag is based off of buildpack-deps. buildpack-deps is designed for the average user of docker who has many images on their system. It, by design, has a large number of extremely common Debian packages. This reduces the number of packages that images that derive from it need to install, thus reducing the overall size of all images on your system.
node:<version>-alpine
This image is based on the popular Alpine Linux project, available in the alpine official image. Alpine Linux is much smaller than most distribution base images (~5MB), and thus leads to much slimmer images in general.
This variant is highly recommended when final image size being as small as possible is desired. The main caveat to note is that it does use musl libc instead of glibc and friends, so certain software might run into issues depending on the depth of their libc requirements. However, most software doesn't have an issue with this, so this variant is usually a very safe choice. See this Hacker News comment thread for more discussion of the issues that might arise and some pro/con comparisons of using Alpine-based images.
To minimize image size, it's uncommon for additional related tools (such as git or bash) to be included in Alpine-based images. Using this image as a base, add the things you need in your own Dockerfile (see the alpine image description for examples of how to install packages if you are unfamiliar).
node:<version>-onbuild
The ONBUILD image variants are deprecated, and their usage is discouraged. For more details, see docker-library/official-images#2076.
While the onbuild variant is really useful for "getting off the ground running" (zero to Dockerized in a short period of time), it's not recommended for long-term usage within a project due to the lack of control over when the ONBUILD triggers fire (see also docker/docker#5714, docker/docker#8240, docker/docker#11917).
Once you've got a handle on how your project functions within Docker, you'll probably want to adjust your Dockerfile to inherit from a non-onbuild variant and copy the commands from the onbuild variant Dockerfile (moving the ONBUILD lines to the end and removing the ONBUILD keywords) into your own file so that you have tighter control over them and more transparency for yourself and others looking at your Dockerfile as to what it does. This also makes it easier to add additional requirements as time goes on (such as installing more packages before performing the previously-ONBUILD steps).
node:<version>-slim
This image does not contain the common packages contained in the default tag and only contains the minimal packages needed to run node. Unless you are working in an environment where only the node image will be deployed and you have space constraints, we highly recommend using the default image of this repository.
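Putting this together, a minimal Dockerfile sketch for the Express back end (the entry point server.js and port 3000 are assumptions about your project layout; adjust to match yours):

```dockerfile
FROM node:8.11.1-alpine

WORKDIR /app

# Copy the dependency manifests first so the npm install layer is
# cached between source-code changes.
COPY package.json package-lock.json ./
RUN npm install --production

COPY . .

EXPOSE 3000
CMD ["node", "server.js"]
```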

How to use Liberty 8.5.5.9 Docker

We believe the new WebSphere Liberty 16.0.0.2 has an important bug related to the JAX-RS 2.0 client, which prevents standard REST calls from deployed apps from working. The last version we know to be free of this bug is 8.5.5.9, but the Dockerfile of the official Docker image by IBM has already been updated to 16.0.0.2.
Even though we use Docker, I am no Docker geek. Is it possible to specify in the first line of my Dockerfile:
FROM websphere-liberty:webProfile7
that I want the version of the image that includes 8.5.5.9 and not the latest one? Which one would it be? (Other images, like Solr, explain the different versions in their docs.)
If you look at the 'tags' tab on Docker Hub you will see that there are other historical tags still available including websphere-liberty:8.5.5.9-webProfile7. Note that these images represent a snapshot in time e.g. they are not rebuilt when new versions of the base Ubuntu image are created. The intention is that Liberty provides zero migration and therefore you should always be able to use the latest. You have obviously found the counter-example...
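So in this case, the first line of your Dockerfile would simply be:

```dockerfile
FROM websphere-liberty:8.5.5.9-webProfile7
```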
