Docker multi stage builds, Kubernetes, and Distroless compatibility - docker

I am facing "theoretical" compatibility issues when using distroless-based containers with Kubernetes 1.10.
Distroless requires Docker 17.05 (https://github.com/GoogleContainerTools/distroless), whereas Kubernetes 1.10 supports only Docker 17.03 (https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#external-dependencies).
Is it possible to run distroless containers within Kubernetes 1.10 clusters without any issues?
Is it possible to build distroless-based images on a build server running Docker 17.05 and then deploy them on a Kubernetes 1.10 cluster (Docker 17.03)?

Docker 17.05 is required only to build a "distroless" image with docker build from a multi-stage Dockerfile. Once the image is built, nothing stops it from running on older Docker / containerd versions.
Docker has long supported images with no distribution through FROM scratch, leaving it to the image author to add whatever the software needs. For a fully static binary, that might be just the binary itself and nothing more :)

It seems that you need Docker 17.05+ only for building images from multi-stage Dockerfiles.
After you build an image from a multi-stage Dockerfile, the image in the registry is the same as if you had built it the old-fashioned way.
Taken from Use multi-stage builds:
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
The end result is the same tiny production image as before, with a significant reduction in complexity.
Kubernetes does not use Dockerfiles to create pods. It uses ready-to-run images pulled from a Docker registry instead.
That's why I believe you can use such images in Kubernetes pods without any issues.
To create and push your images, however, you need a build machine with Docker 17.05+ that understands the multi-stage syntax in the Dockerfile.
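The build-versus-run distinction described above could be enforced as a preflight check in a build pipeline. The following is only a minimal sketch: the function names are hypothetical, and it assumes version strings shaped like `17.03.2-ce`. Only the builder is checked, because the nodes that run the finished image do not need multi-stage support.

```python
import re
from typing import Tuple

# First release whose `docker build` understands multi-stage Dockerfiles,
# as stated in the answers above.
MULTISTAGE_MIN = (17, 5)


def parse_docker_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '17.03.2-ce' or '1.13.1'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Docker version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def can_build_multistage(version: str) -> bool:
    """True if a builder with this Docker version accepts multi-stage syntax."""
    return parse_docker_version(version) >= MULTISTAGE_MIN


# The build server needs 17.05+; cluster nodes only run the finished image,
# so their Docker version does not matter for this check.
build_server_ok = can_build_multistage("17.05.0-ce")
old_node_could_build = can_build_multistage("17.03.2-ce")
print(build_server_ok, old_node_could_build)
```

Here the 17.03 node fails the build check, which is fine as long as it is only asked to pull and run images built elsewhere.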

Related

What is the purpose of pushing an image in a CI/CD pipeline?

Context: Reading through this blog post.
Pushing images to a registry seems to be the "right thing to do" ... but I don't understand why.
What purpose does this serve? Is it because the server I ssh into needs to have a local copy of the image? And to do that, one approach is to pull an image from a registry?
From the CI/CD perspective, a docker registry is the equivalent of an artifact repository for images. You want a central source of these images to download from as you go from one docker host to another since your build server is most likely different than your dev and prod servers.
Couldn't I just upload an image from one machine (say a CI/CD server) via ssh? using dockerhub seems needlessly ceremonious to me. Like in this example (I know this api is deprecated but it illustrates my point).
It is possible to save/load images directly to a docker host, but there are a few major downsides. First, you lose any benefit from docker's layered filesystem. When building an app in CI/CD, most of the time only the last few layers need to be rebuilt with your application changes. The base image and the various common layers used to build your app remain identical. With a registry, these common layers are recognized and only the difference is pushed and pulled, making your deploys faster and saving you disk space. With a save/load command, all layers are sent every time, since you do not know the state of the remote server when you run the save.
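To make the layer argument concrete, here is a toy comparison of the two transfer strategies. It is a simplified model, not how the registry protocol is implemented: the layer digests and sizes (in MB) are made-up placeholders, and the function names are hypothetical.

```python
from typing import Dict, Set


def mb_pushed_to_registry(image: Dict[str, int], registry_has: Set[str]) -> int:
    """Only layers the registry does not already hold need to be uploaded."""
    return sum(size for digest, size in image.items() if digest not in registry_has)


def mb_sent_by_save_load(image: Dict[str, int]) -> int:
    """A saved tarball carries every layer, whatever the target already has."""
    return sum(image.values())


# Hypothetical layers: two builds that differ only in the final app layer.
image_v1 = {"sha256:base": 120, "sha256:deps": 80, "sha256:app-v1": 5}
image_v2 = {"sha256:base": 120, "sha256:deps": 80, "sha256:app-v2": 5}

registry_has = set(image_v1)  # v1 was pushed on an earlier deploy
registry_cost = mb_pushed_to_registry(image_v2, registry_has)
tarball_cost = mb_sent_by_save_load(image_v2)
print(registry_cost, tarball_cost)
```

With these placeholder numbers the registry push moves only the changed application layer, while the save/load route ships the whole image again.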
Second, this doesn't scale as you add hosts to run images. Every host would need the image copied on the chance you want to run it on that host, e.g. to handle failover or load balancing. It also won't work if you move to swarm mode or kubernetes since you could easily add new nodes to the cluster that won't have your image. Swarm mode defaults to looking up the sha256 of the image on the registry to guarantee the same image is always used even if the tag is modified on the registry after the initial deploy.
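The point about looking up the sha256 can be illustrated with a toy in-memory registry. This is a sketch of the idea only: `ToyRegistry` is a hypothetical class and does not mirror the real registry API. A tag is a mutable pointer, while a digest is derived from the content and so always names the same bytes.

```python
import hashlib
from typing import Dict


class ToyRegistry:
    """Hypothetical stand-in for a registry: tags point at content digests."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.tags: Dict[str, str] = {}

    def push(self, tag: str, manifest: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(manifest).hexdigest()
        self.blobs[digest] = manifest
        self.tags[tag] = digest  # re-pushing a tag moves the pointer
        return digest

    def resolve(self, tag: str) -> str:
        return self.tags[tag]

    def pull(self, digest: str) -> bytes:
        return self.blobs[digest]


registry = ToyRegistry()
registry.push("app:latest", b"manifest of build 1")
pinned = registry.resolve("app:latest")  # resolved once, at deploy time

registry.push("app:latest", b"manifest of build 2")  # tag is modified later

# A node added after the retag still gets build 1 when it pulls by digest.
print(registry.pull(pinned), registry.resolve("app:latest") == pinned)
```

Pulling by the pinned digest keeps every node on the same image even though `app:latest` now points somewhere else.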
Keep in mind you can run your own registry server (there's a docker image and the api is open). Many artifact repositories (e.g. artifactory and nexus) include support for a docker registry. And many cloud providers include a registry with their container offerings. So you do not need to push to a remote docker hub to deploy locally.
One last point is that a registry server is useful to developers who can now pull the same image used in dev and prod to test against other microservices they are writing locally without the need to build everything locally or ssh to a CI/CD server or even prod to save and scp images back to their laptops.
Usually, you use a CI/CD pipeline when you want to streamline your build / test / deploy process, and usually this happens when you have production infrastructure to maintain that is critical to your business.
There is no need for a CI/CD pipeline if you're just playing around or prototyping, IMO. In that case you can build your Docker images directly on the machine, or ssh an image over. That's perfectly reasonable.
Look at the 'registry' as a repository of your binary image (i.e. a fixed version of your code that ideally is versioned and you know works)
Then deploying is as simple as telling your servers to pull the image and run it, from anywhere.
On a flexible architecture, you might have nodes coming up or going down at any time, and they need to be able to pull the latest code from somewhere to get back up and running automatically, at any time, without intervention.
The registry is the single source of truth in this case. You can have multiple nodes (servers) or clusters and a single place from which they all get images. If one of your nodes goes down, you can quickly start your image on a new one. You can also automate image updates using the registry's webhooks: for example, when you push a new version of an image, the registry sends a webhook to a service that upgrades your containers to the newest version.
Think of a Docker image as a new way of distributing your software to your servers, and of a Docker registry as centralized storage for shared images (like npm for JavaScript or Maven Central for Java).
For example, if you develop a Java application, before Docker you might have distributed it as .jar files. A Docker image is better in that it also includes all OS-level dependencies, such as the JDK/JRE, and system configuration. This helps you avoid the "it works on my machine" effect.
You could also distribute just the Dockerfile and build the image on every machine, every time. A Docker registry instead gives you centralized storage of pre-built images.
Pushing to a registry in your CI/CD pipeline lets you build your artifact once and then work with that same artifact in both integration and production environments.
A Dockerfile alone does not guarantee the same result on every build, because it may install external dependencies that are updated or even removed between two consecutive builds.

Shared build logic with docker-compose and multi-stage Dockerfiles

I am using docker-compose with multi-stage Dockerfiles to build and run multiple services. This works, but the "build" portion of each multi-stage build is largely copy-and-pasted between each service's Dockerfile. I want to reduce the copy-and-paste / centralize the common build logic in one spot.
Reading https://engineering.busbud.com/2017/05/21/going-further-docker-multi-stage-builds/ I could create a local image with the shared build steps and have the service Docker files depend on it, but I want the development experience to be a simple docker-compose up. Creating a local build image means a developer would have to know to run docker build [common_build_image] first so that the build image exists locally and THEN run docker compose up to build and run all the services that depend on it.
There doesn't appear to be a way to include a Dockerfile into another Dockerfile. FROM does not appear to support local paths.
Is there a way to accomplish what I want? Of course I can use a shell script to tie everything together, but that is basically what multi-stage builds were trying to solve in the first place.
It turns out you can "compose" docker-compose: https://docs.docker.com/compose/extends/#adding-and-overriding-configuration which is what I was looking for.

How to automate Multi-Arch-Docker Image builds

I have dockerized a Node.js app on GitHub. My Dockerfile is based on the official Node.js images. The official node repo supports multiple architectures (x86, amd64, arm) seamlessly. This means I can build the exact same Dockerfile on different machines, resulting in different images for the respective architectures.
So I am trying to offer the same architectures seamlessly for my app, too. But how?
My goal is to automate it as much as possible.
I know that in theory I need to create a Docker manifest, which acts as a Docker repo and redirects end users' Docker clients to their suitable images.
Docker Hub itself can monitor a GitHub repo and kick off an automated build. That would take care of the amd64 image. But what about the remaining architectures?
There is also the service called 'TravisCI' which I guess could take care of the arm-build with the help of qemu.
Then I think both repos could then be referenced statically by the manifest-repo. But this still leaves a couple architectures unfulfilled.
But using multiple services/ways of building the same app feels wrong. Does anyone know a better and more complete solution to this problem?
It's basically running the same dockerfile through a couple machines and recording them in a manifest.
Starting with Docker 18.02 CLI you can create multi-arch manifests and push them to the docker registries if you enabled client-side experimental features. I was able to use VSTS and create a custom build task for multi-arch tags after the build. I followed this pattern.
docker manifest create --amend {multi-arch-tag} {os-specific-tag-1} {os-specific-tag-2}
docker manifest annotate {multi-arch-tag} {os-specific-tag-1} --os {os-1} --arch {arch-1}
docker manifest annotate {multi-arch-tag} {os-specific-tag-2} --os {os-2} --arch {arch-2}
docker manifest push --purge {multi-arch-tag}
On a side note, I packaged the 18.02 docker CLI for Windows and Linux in my custom VSTS task so no install of docker was required. The manifest command does not appear to need the docker daemon to function correctly.
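For readers wondering what the manifest commands above end up producing, the sketch below builds a document shaped like a Docker manifest list (image manifest v2, schema 2) and picks the entry for a platform. The digests and sizes are placeholders, the helper names are hypothetical, and the selection logic is a simplification of what a real client does (it ignores fields such as variant).

```python
import json
from typing import Any, Dict

MANIFEST_LIST_TYPE = "application/vnd.docker.distribution.manifest.list.v2+json"
MANIFEST_TYPE = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_entry(digest: str, size: int, os_name: str, arch: str) -> Dict[str, Any]:
    """One per-platform entry, roughly what `manifest annotate` describes."""
    return {
        "mediaType": MANIFEST_TYPE,
        "size": size,
        "digest": digest,
        "platform": {"architecture": arch, "os": os_name},
    }


def select_digest(manifest_list: Dict[str, Any], os_name: str, arch: str) -> str:
    """Return the digest of the image that matches the requested platform."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == arch:
            return entry["digest"]
    raise LookupError(f"no image for {os_name}/{arch}")


# Placeholder digests and sizes; real values come from the pushed images.
manifest_list = {
    "schemaVersion": 2,
    "mediaType": MANIFEST_LIST_TYPE,
    "manifests": [
        manifest_entry("sha256:placeholder-amd64", 1234, "linux", "amd64"),
        manifest_entry("sha256:placeholder-arm", 1234, "linux", "arm"),
    ],
}

arm_digest = select_digest(manifest_list, "linux", "arm")
print(json.dumps(manifest_list, indent=2))
```

The multi-arch tag is therefore just an index: each architecture-specific image is still built and pushed separately, and the list only records which digest serves which platform.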

Is it possible to run Docker multi-stage build images on older versions of Docker?

Creating multi-stage builds requires Docker 17.05. Once created, is it possible to use these images on older versions of the Docker daemon, or are the images themselves in a slightly different format?
Of course it's possible. The image format has not changed. It's just the build process that changed.

What are the pros and cons of docker pull and docker build from Dockerfile?

I have been playing around with docker for about a month and now I have a few images.
Recently I wanted to share one of them with someone else, so I pushed image X to my Docker Hub account so that he could pull it from my repository.
However, this seems like a waste of time. The total cost is the time I spend on docker push plus the time he spends on docker pull.
If I just sent him the Dockerfile needed to build image X, the cost would be the time I spend writing the Dockerfile, the time to send a text file, and the time he spends on docker build, which is less than the previous approach since I maintain my Dockerfiles well.
So my question is: what are the pros and cons of these two approaches?
Why did Docker Inc. choose to launch a Docker Hub service rather than a "DockerfileHub" service?
Any suggestions or answers would be appreciated.
Thanks a lot!
Let's assume you build an image from a Dockerfile and push that image to Docker Hub. During the build you download some sources and build a program. But when the build is done the sources become unavailable. Now the Dockerfile can't be used anymore but the image on Docker Hub is still working. That's a pro for Docker Hub.
But it can be a con too. For example, suppose the source code contains a terrible bug like Heartbleed or Shellshock. The sources get patched, but the image on Docker Hub does not get updated.
In fact, how long it takes to push an image versus build one depends on your environment.
For example, you may want to prebuild an image for an embedded system, because you won't want to build it on the embedded system itself.
Docker Hub provides an Automated Builds feature that fetches a Dockerfile from GitHub and builds the image. So you can get an image's Dockerfile from GitHub; a separate service for sharing Dockerfiles is not necessary.

Resources