Docker & Kubernetes & architecture: understanding platform differences

Intro
There is a --platform option for running a Docker image, and a platform key in docker-compose configuration.
Also, almost all official Docker images on hub.docker.com list several supported architectures under a single tag.
For example, the official Ubuntu image lists several architectures for each tag.
Most servers (including Kubernetes nodes) are linux/amd64.
I upgraded my MacBook to a new one with an Apple Silicon chip (M1/M2...), and now Docker Desktop shows me a warning on images whose platform does not match the host.
For official images (the ones shown without the yellow note) it apparently downloads the needed platform automatically.
But for custom-built images (in a private repository like Nexus or an artifact store) I have no influence. Yes, I can build appropriate images for different platforms (e.g. with buildx) and push them to the private repository, but in companies where the repos are managed by DevOps this is tricky to do. They say that the server architecture is linux/amd64, and that if I develop web-oriented software (PHP etc.) on a different platform, then even if the version (tag) is the same, the environment is different and there is no guarantee that it will work on the server.
I assumed the only difference is in how instructions are translated between the software and the hardware.
I would like to understand the subject better. There is a lot of superficial information on the web, but few details.
Questions
what "platform/architecture" for Docker image does it really means? Like core basics.
Will you really get different code for interpreted programming languages?
It seems to me that if the wrong platform is specified, the containers run very slowly. But how can this be measured (script performance, interaction with the host file system, etc.)?

TLDR
Build multi-arch images that support multiple architectures
Always ensure that the image you're trying to run has a compatible architecture
what "platform/architecture" for docker image does it really means? Like core basics. Links would be appreciated.
It means that some of the compiled binary code within the image contains CPU instructions exclusive to that specific architecture.
If you run that image on the wrong architecture, it'll either be slower because the incompatible code has to run through an emulator, or it might not work at all.
Some images are "multi-arch", where your Docker installation selects the most suitable architecture of the image to run.
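As a quick way to see this for yourself (using the official Ubuntu image purely as an example; docker manifest may need a reasonably recent Docker CLI), you can list the architectures published under a single tag:

# List the platforms bundled under one tag of a multi-arch image
docker manifest inspect ubuntu:22.04 | grep -A3 '"platform"'

# Or, with buildx installed, a more readable summary
docker buildx imagetools inspect ubuntu:22.04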
Will you really get different code for interpreted programming languages?
Different machine code, yes. But it will be functionally equivalent.
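The script itself stays the same; what differs is the compiled interpreter shipped inside the image. A small demonstration, assuming a host with QEMU/binfmt emulation set up (Docker Desktop includes it) and using the official python image as an example:

# Same tag, two platforms - the interpreter binary inside differs
docker run --rm --platform linux/amd64 python:3.12-slim \
  python -c "import platform; print(platform.machine())"   # prints x86_64
docker run --rm --platform linux/arm64 python:3.12-slim \
  python -c "import platform; print(platform.machine())"   # prints aarch64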
It seems to me that if the wrong platform is specified, the containers run very slowly. But how can this be measured (script performance, interaction with the host file system, etc.)?
I recommend always ensuring that you're running images built for your machine's architecture.
For the sake of science, you could do an experiment.
You can build an image that runs a simple batch job for two different architectures, then run both on your machine and compare how long the containers take to finish.
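A minimal sketch of that experiment, assuming an Apple Silicon (arm64) host with emulation enabled and using the official python image as the stand-in batch job:

# Native architecture (arm64 on an M1/M2 Mac)
time docker run --rm --platform linux/arm64 python:3.12-slim \
  python -c "print(sum(i*i for i in range(10_000_000)))"

# Same job forced onto the foreign architecture, executed through QEMU emulation
time docker run --rm --platform linux/amd64 python:3.12-slim \
  python -c "print(sum(i*i for i in range(10_000_000)))"

The wall-clock difference between the two runs gives a rough feel for the emulation overhead; run each twice so the image pull does not skew the first measurement.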
Sources:
https://serverfault.com/questions/1066298/why-are-docker-images-architecture-specific#:~:text=Docker%20images%20obviously%20contain%20processor,which%20makes%20them%20architecture%20dependent.
https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/
https://www.reddit.com/r/docker/comments/o7u8uy/run_linuxamd64_images_on_m1_mac/

Related

When using docker multi-architecture support how are the be/le endianess variants and soft/hard float aspect handled?

I am learning about Docker BuildKit/buildx support for multiple architectures. I see that Docker identifies those using three variables (OS, ARCH, VARIANT), which are combined to form a PLATFORM value.
We are looking at building and running Docker images within embedded environments. However, those environments have slightly altered kernel headers (public headers, UAPI) and various flavours of ARM CPU: some use soft-float, others hard-float; some are little-endian, others big-endian.
As such, I am unclear how the existing Docker platforms handle those settings. Ideally I would also like to build/add my own platforms to the system, to ensure proper compatibility of my Docker images with my embedded platforms.
Can someone point me towards proper resources on where additional details like this may be found?
So far I have mostly read the documentation about Docker BuildKit/buildx, and I have looked at the binfmt support for using QEMU and the cross-compilation aspect. In my case, most of the files I will be adding to the Docker image will be pre-built.
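For what it's worth, the VARIANT part is how Docker distinguishes the common ARM flavours today (linux/arm/v6, linux/arm/v7, linux/arm64), big-endian needs are mostly covered by separate architectures such as linux/s390x, and soft- vs hard-float is, as far as I can tell, implied by the variant and the base image's ABI rather than being a separate field. A hedged sketch of targeting several variants with buildx (registry and image name are placeholders):

# Show which platforms the current builder (via QEMU/binfmt) can target
docker buildx ls

# Build one Dockerfile for several ARM variants in a single multi-arch image
docker buildx build \
  --platform linux/arm/v6,linux/arm/v7,linux/arm64 \
  -t registry.example.com/embedded/myimage:latest \
  --push .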

Why do we build "inside" docker?

When I first learned Docker I expected a config file, image producer, CLI, and options for mounting and networks. That's all there.
I did not expect to put build commands inside a Dockerfile. I thought docker would wrap/tar/include a prebuilt task I made. Why give build commands in Docker?
Surely it can import a task, thus keeping Jenkins/Bazel etc. distinct and apart from making an image/container?
I guess we are dealing with a misconception here. Docker is NOT a lightweight version of VMware/Xen/KVM/Parallels/FancyVirtualization.
Disclaimer: The following is heavily simplified for the sake of comprehensibility.
So what is Docker?
In one sentence: Docker is a system for isolating processes from other processes within an operating system as much as possible, while still providing all the means to run them. Put differently:
Docker is a package manager for isolated processes.
Among its closest ancestors are chroot and BSD jails. What those basically do is isolate (more in the case of BSD, less in the case of chroot) a part of your OS resources and run a complete environment independently from the rest of the OS - except for the kernel.
In order to be able to do that, a Docker image obviously needs to contain everything except for a kernel. So you need to provide a shell (if you choose to do so), standard libraries like glibc and even resources like CA certificates. For reference: In order to set up chroot jails, you did all this by hand once upon a time, preinstalling your chroot environment with each and every piece of software required. Docker is basically taking the heavy lifting from you here.
The mentioned isolation, even down to the installed (and usable) software, sounds cumbersome, but it gives you several advantages as a developer. Since you provide basically everything except for a (compatible) kernel, you can develop and test your code in the same environment it will run in later down the road. Not a close approximation, but literally the same environment, bit for bit. A rather famous proverb in relation to Docker is:
"Runs on my machine" is no excuse any more.
Another advantage is that you can add static resources to your Docker image and access them via quite ordinary file system semantics. While it is true that you can do that with virtualisation images as well, they usually do not come with a language for provisioning. Docker does - the Dockerfile:
FROM alpine
LABEL maintainer="you@example.com"
COPY file/in/host destination/on/image
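Building and running the resulting image is then just (the tag name here is arbitrary):

docker build -t my-provisioned-image .
docker run --rm -it my-provisioned-image sh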
Ok, got it, now why the build commands?
As described above, you need to provide all dependencies (and transitive dependencies) your application has. The easiest way to ensure that is to build your application inside your Docker image:
FROM somebase
RUN yourpackagemanager install long list of dependencies && \
    make yourapplication && \
    make install
If the build fails, you know you have missing dependencies. Now you can tweak and tune your Dockerfile until it compiles and is tested. Once your Docker image is finished, you can confidently distribute it: as long as the Docker daemon runs on the machine somebody tries to run your image on, your image will run.
In the Go ecosystem, you basically ensure that your go.mod and go.sum are up to date and working, and your work stays reproducible.
Again, this works with virtualisation as well, so where is the deal?
A (good) docker image only runs what it needs to run. In the vast majority of docker images, this means exactly one process, for example your Go program.
Side note: It is very bad practice to run multiple processes in one Docker image, say your application plus a database server plus a cache and whatnot. That is what docker-compose is there for, or more generally container orchestration. But this is far too big a topic to explain here.
A virtualised OS, however, needs to run a kernel, a shell, drivers, log systems and whatnot.
So the deal basically is that you get all the good stuff (isolation, reproducibility, ease of distribution) with less waste of resources (running 5 versions of the same OS with all its shenanigans).
Because we want an environment for reproducible builds. We don't want to depend on the language version, the existence of a compiler, library versions, and so on.
Building inside a Dockerfile allows you to have all the tools and environment you need inside the container, independently of your platform and ready to use. From a development perspective it is easier to have everything you need inside the container.
But you have to think about the objective of building inside a Dockerfile: if you have a very complex build process with a lot of dependencies, you have to worry about having all the tools inside, and that is reflected in the size of your resulting image. Building to generate an artifact is not the same as building to produce the final container.
With these two aspects in mind, you should learn to use Docker's multi-stage build process. The main idea is close to your question, because you can have as many stages as you need depending on your build process and use a different FROM image in each stage, to ensure you have the correct requirements and dependencies at each step and finally generate an image with the minimum dependencies and the smallest size.
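A minimal sketch of that multi-stage idea, using Go purely as an example (module layout and names are placeholders):

# Stage 1: full build environment with compiler and build dependencies
FROM golang:1.22 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: minimal runtime image containing only the compiled artifact
FROM alpine:3.19
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]

Only the second stage ends up in the final image, so the compiler and build dependencies never reach production.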
I'll add to the answers above:
Doing builds in or out of Docker is a choice that depends on your goal. In my case I am more interested in Docker containers for Kubernetes, and in addition we already have mature builds.
This link shows how you can take prebuilt tasks and add them to an image. This strategy, together with adding libs, env, etc., leverages Docker well and shows that Docker is indeed flexible. https://medium.com/@chemidy/create-the-smallest-and-secured-golang-docker-image-based-on-scratch-4752223b7324

Docker, update image or just use bind-mounts for website code?

I'm using Django but I guess the question is applicable to any web project.
In our case, there are two types of code: Python code (run by Django) and static files (HTML/JS/CSS).
I could publish a new image whenever there is a change in either kind of code.
Or I could use bind mounts for the code. (For Django, we could bind-mount the project root and the static directory.)
If I use bind mounts for the code, I could just update the production machine (probably with git pull) when the code changes.
The Docker image would then only handle updates that are not strictly our own code changes (such as a library update or new setup like adding Elasticsearch).
Does this approach imply any obvious drawback?
For security reasons it is advised to keep an operating system up to date with the latest security patches, but Docker images are meant to be released in an immutable fashion so that we can always reproduce production issues outside production; thus the OS inside the image will not update itself as security patches are released. This means we need to rebuild and deploy our Docker image frequently in order to stay on the safe side.
So I would prefer to release a new Docker image with my code and static files, because they are bound to change more often and thus require frequent releases, which means the OS stays more up to date in terms of security patches without needing to rebuild Docker images in production just for that purpose.
Note that I assume here that you release new code or static files at least on a weekly basis; otherwise I still recommend rebuilding the Docker images at least once a week in order to get the latest security patches for all the software being used.
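A hedged sketch of such a periodic rebuild (registry, image name and tag scheme are placeholders); --pull re-fetches the base image and --no-cache ensures patched layers are actually rebuilt:

# Weekly rebuild to pick up base-image security patches, even with no code changes
docker build --pull --no-cache -t registry.example.com/mysite:$(date +%Y%m%d) .
docker push registry.example.com/mysite:$(date +%Y%m%d)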
Generally, the more Docker-oriented solutions I've seen to this problem lean towards packaging the entire application in the Docker image. That especially includes application code.
I'd suggest three good reasons to do it this way:
If you have a reproducible path to docker build a self-contained image, anyone can build and reproduce it. That includes your developers, who can test a near-exact copy of the production system before it actually goes to production. If it's a Docker image, plus this code from this place, plus these static files from this other place, it's harder to be sure you've got a perfect setup matching what goes to production.
Some of the more advanced Docker-oriented tools (Kubernetes, Amazon ECS, Docker Swarm, Hashicorp Nomad, ...) make it fairly straightforward to deal with containers and images as first-class objects, but trickier to say "this image plus this glop of additional files".
If you're using a server automation tool (Ansible, Salt Stack, Chef, ...) to push your code out, then it's straightforward to also use those to push out the correct runtime environment. Using Docker to just package the runtime environment doesn't really give you much beyond a layer of complexity and some security risks. (You could use Packer or Vagrant with this tool set to simulate the deploy sequence in a VM for pre-production testing.)
You'll also see a sequence in many SO questions where a Dockerfile COPYs application code to some directory, and then a docker-compose.yml bind-mounts the current host directory over that same directory. In this setup the container environment reflects the developer's desktop environment and doesn't really test what's getting built into the Docker image.
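That setup typically looks something like the sketch below (paths are illustrative): a Dockerfile that does COPY . /app, combined with a docker-compose.yml whose bind mount hides exactly that directory:

services:
  web:
    build: .        # the Dockerfile COPYs the application code into /app at build time
    volumes:
      - .:/app      # ...but this bind mount hides it behind the host directory at run time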
("Static files" wind up in a gray zone between "is it the application or is it data?" Within the context of this question I'd lean towards packaging them into the image, especially if they come out of your normal build process. That especially includes the primary UI to the application you're running. If it's things like large image or video assets that you could reasonably host on a totally separate server, it may make more sense to serve those separately.)

Can I run a docker container doing an x86 build on an IBM Power system?

Our build setup is baked into a large Docker container (basically a 2 GB image containing a complete x86 Linux in itself).
We have two ways to actually build: the official approach is a Jenkins environment (running on x86 hardware). But we also have a little "side x86 server" running RHEL 7. Developers can log into that RHEL server and kick off specific builds (using said Docker images) themselves.
Those RHEL servers will be shut down at some point, to be replaced with IBM POWER8 machines (running RHEL 7 little-endian for Power).
I am simply wondering: is there a chance that our existing build setup and Docker images will simply work on POWER8? Or are there fundamental technical issues that make it unlikely and not even worth trying?
You can probably use your existing build methodology and scripts close to unchanged, but you'll need to rebuild the actual images.
You can't directly run x86 binaries on Power (at a very low level, the bytes of machine code are just different). Docker doesn't contain any sort of virtualization layer; it does a bunch of setup to isolate the container from the host, but then runs the binaries in an image directly.
If your Jenkins setup has enough parameters for image names and version tags, then you should be able to run the x86 and Power setups side-by-side; you need to encode the architecture somewhere in the built image name or tag; for instance, repo.example.com/app/build:20180904-power. (I don't know that one or the other is considered better if you control all of the machinery.) If you have a private repo, you could encode it earlier in the path, winding up with image names like repo.example.com/power/build:20180904.
You'd need to double-check that everywhere that has a Docker image reference has it correctly parameterized (which is a good practice anyways). That would include any direct docker run commands; any Docker Compose or Kubernetes YAML files or similar artifacts; and the FROM line of any Dockerfiles.
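One way to do that parameterization, sketched with made-up names, is to keep the architecture in a variable and substitute it everywhere the image is referenced:

# In the Jenkins job (or any shell step); ARCH would be set per build agent
ARCH=power                                   # or: x86
TAG="repo.example.com/app/build:20180904-${ARCH}"
docker build -t "${TAG}" .
docker push "${TAG}"

# Downstream Dockerfiles can take the base image as a build argument:
#   ARG BASE_IMAGE
#   FROM ${BASE_IMAGE}
# and be built with: docker build --build-arg BASE_IMAGE="${TAG}" .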
Existing build setup? Not sure!
Docker images? NO, don’t even try.
Docker images are actually multiple layers, which are stored on the filesystem through the corresponding storage driver and backing filesystem (shown in the output of docker info).
If the storage driver/backing filesystem changes, which is likely when the OS changes, older Docker images may no longer be valid, meaning they must be rebuilt.

Difference between hashicorp packer vs linuxkit

I have used HashiCorp Packer for building baked VM images.
But I was wondering whether LinuxKit does the same thing, i.e. building baked VM images, with the only difference being that it is more container- and kernel-centric.
I want to know the exact difference between how these two work and their use cases.
Also, is there any use case that combines both Packer and LinuxKit?
I have used both fairly extensively (disclosure: I am a volunteer maintainer for LinuxKit). I used packer for quite some time, and switched almost all of the work I did in packer over to LinuxKit (lkt).
In principle, both are open-source tools that serve the same purpose: generating an OS image that can be run. Practically, most use them for VM images to run on vbox, AWS, Azure, GCR, etc., but you can generate an image that will run on bare metal, which I have done as well.
Packer, being older, has a more extensive array of provisioners, builders, plugins, etc. It tries to be fairly broad-based and non-opinionated. Build for everywhere, run any install you want.
LinuxKit runs almost everything - onboot processes and continuous services - in a container. Even the init phase - where the OS image will be booted - is configured by copying files from OCI images.
LinuxKit's strong opinions about how to run and build things can in some ways be restrictive, but also liberating.
The most important differences, in my opinion, are the following:
1. lkt builds up from scratch to the bare minimum you need; Packer builds from an existing OS base.
2. lkt's attack surface will be smaller, because it starts not with an existing OS, but with, well, nothing.
3. lkt images can be significantly smaller, because you add in only precisely what you need.
4. lkt builds run locally. Packer essentially spins up a VM (vbox, EC2, whatever), runs some base image, modifies it per your instructions, and then saves it as a new image. lkt just manipulates OCI images by downloading and copying files to create a new image.
I can get to the same net result for differences 1-3 with Packer and LinuxKit, although lkt is much less work. E.g. I contributed the getty package to LinuxKit to separate and control when/how getty is launched, and in which namespace. Separating and controlling that in a Packer image built on a full OS would have been much harder. Same for the tpm package. Etc.
The biggest difference IMO, though, is point 4. Because Packer launches a VM and runs commands in it, it is much slower and much harder to debug. The same image that takes me 10+ minutes to build with Packer can take 30 seconds with lkt. Your mileage may vary, depending on whether the OCI images are already downloaded and how complex what you are doing is, but it really has been an order of magnitude faster for me.
Similarly, debugging step by step, or finding an error, running, debugging, and rebuilding, is far harder in a process that runs in a remote VM than it is in a local command: lkt build.
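For a sense of what that local workflow looks like, here is a heavily simplified LinuxKit configuration (the image tags are placeholders; real files pin exact versions), which the lkt build step assembles locally from OCI images without booting any VM:

# minimal.yml - everything, including init, comes from OCI images
kernel:
  image: linuxkit/kernel:5.10.104
  cmdline: "console=tty0"
init:
  - linuxkit/init:latest
  - linuxkit/runc:latest
onboot:
  - name: dhcpcd
    image: linuxkit/dhcpcd:latest
services:
  - name: getty
    image: linuxkit/getty:latest
    env:
      - INSECURE=true

# Built locally, without booting any VM:
#   linuxkit build minimal.yml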
As I said, opinions are my own, but those are the reasons that I moved almost all of my build work to lkt, contributed, and agreed to join the excellent group of maintainers when asked by the team.
At the same time, I am deeply appreciative to HashiCorp for their fantastic toolset. Packer served me well; nowadays, LinuxKit serves me better.

Resources