When building Docker images, I find myself in a strange place -- I feel like I'm doing something that somebody has already done many times before, and done a vastly better job of it. In most cases, this gut feeling is absolutely right: I'm taking a piece of software and re-describing in a Dockerfile everything that's already described in the OS's packaging system.
More often than not, I even find myself installing software into the image using a package manager and then looking inside that package to get clues about writable paths, configuration files, open ports, etc. for my Dockerfile. The duplication of effort between OS packager and Docker packager is most evident in such a case, which I assume is one of the more common ones.
So basically, every Docker user building an image on top of pre-packaged software is re-packaging almost from scratch, but without the time and often the domain knowledge the OS packagers had for trial, error and polish. If we consider the low reusability of community-maintained images (re-basing from Debian to RHEL hurts), we're stuck with copying or re-implementing functionality that already exists and works on OS level, wasting a lot of time and putting a maintenance burden on the poor souls who'd inherit whatever we might leave behind.
Is there any way to resolve this duplication of effort and re-use whatever package maintainers have already learned about a piece of software in Docker?
The main source for Docker image reuse is hub.docker.com
Search there first to see if your system is already described in one of those images.
You can see their Dockerfiles, and start your own image from one of them instead of starting from a basic ubuntu or wheezy one.
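For example, instead of re-describing a whole installation, a Dockerfile can build on an official image and only add the project-specific bits. A rough sketch (my-httpd.conf is a placeholder; the config path is the one documented for the official httpd image):

# Build on the official httpd image instead of a bare ubuntu/wheezy base
FROM httpd:2.4
# Only the project-specific configuration remains to be described
COPY ./my-httpd.conf /usr/local/apache2/conf/httpd.conf
EXPOSE 80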
Related
Short version
I would like to know the technical reasons why Docker images need to be created for multiple architectures. Also, it is not clear whether the point here is creating an image for each CPU architecture or for an OS. Shouldn't the OS abstract the architecture?
Long version
I can understand why the Docker Engine must be ported to multiple architectures. It is a piece of software that will interact with the OS, make system calls, and ultimately it is just code that is represented as a sequence of instructions within a particular instruction set, for a particular architecture. So the Docker Engine must be ported to multiple OS/architectures much like, let's say, Microsoft Word would have to be ported.
The same thing would apply to, let's say, the JVM, or to VirtualBox.
But, unlike with Docker, software written for the JVM on Windows will run on Linux. The JVM abstracts the differences of the underlying OS/architectures, and runs the same code on both platforms.
Why isn't that the case with Docker images? Why can't the Docker Engine just abstract the differences, and provide a common interface, so the image itself wouldn't need to be compatible with a specific OS/architecture?
Is this a decision (like "let's make different images per architecture because it is better for reason X"), or a consequence of how Docker works (like "we need to do it this way because Docker requires Y")?
Note
I'm not crying "omg, why??". This is not a rant or criticism, I'm just looking for a technical explanation for the need of different images for different architectures.
I'm not asking how to create a multi-architecture image.
I'm not looking for an answer like "multi-architecture images are needed so you can run your images on various platforms", which answers "what for?", but not "why is that needed?" (which is my question).
Besides that, when you see an image, it usually has an os/arch in the digest, like this:
What exactly the image is targeting? The OS, the architecture, or both? Shouldn't the OS abstract the underlying architecture?
edit: I'm starting to assume that the need for different images per architecture is along the lines of: the image will contain applications inside it. Let's say it will contain the Go compiler. The Go compiler itself is a binary that must have been compiled for different architectures. The image for x86-64 will contain the Go compiler compiled for x86-64, and so on. Is this correct? If so, is this the only reason?
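For reference, you can list the platforms an image is published for by inspecting its manifest list; each entry pairs an "os" with an "architecture" (I believe the exact output layout varies a bit between registries):

# Lists one entry per os/architecture pair the image is published for,
# e.g. linux/amd64, linux/arm64, windows/amd64
docker manifest inspect golang:latest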
Why can't the Docker Engine just abstract the differences, and provide a common interface
Performance would be a major factor. Consider how slow Cygwin is for some things when providing a POSIX API on top of Windows by emulating some POSIX things that don't map directly to the Windows API. (e.g. fork() / exec separately, instead of CreateProcess).
And that's just source compatibility; the resulting binaries are specific to Cygwin on Windows. It's even worse if you want to do that at runtime (binary compat instead of source compat).
There's also the amount of complexity Docker would need to provide an efficient portable JIT-compiling VM on top of various OSes, especially across various CPU ISAs like x86-64 vs. AArch64 that don't even share common machine code.
If Docker had gone this route, it would really just be re-inventing a JVM or .NET CLR bytecode-based VM.
Or more likely, instead of reinventing that wheel, it would just use an existing VM and add image management on top of that. But then it couldn't work with native programs written in C, unless it transpiled them to Java or CLR bytecode.
Although the promise of Docker is the elimination of differences when moving software between machines, you'll still face the problem that Docker runs with the host machine's CPU architecture, and that boundary can't be crossed from inside Docker.
Neither Docker, nor a virtual machine, abstract a CPU to enable full cross compatibility.
Emulators do. If Docker and VMs ran on emulators, they would be less performant than they are today.
The docker buildx command, typically used with the --platform flag, leverages the QEMU emulator, emulating the full target architecture during a build. The downside of emulation is that it runs much slower than a native build.
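As a rough sketch (assuming a recent Docker with BuildKit, and with the QEMU binfmt handlers registered on the build host; the image name is a placeholder):

# One-time: register QEMU handlers so foreign architectures can be emulated
docker run --privileged --rm tonistiigi/binfmt --install all

# Build and push the same Dockerfile for two architectures at once
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t myrepo/myimage:latest --push .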
I am a bit confused about Docker and how I can use it. My situation is the following:
I have a project that requires a prerequisite, in my case installing ROS 2. I have installed it on my system and developed a program. No problem there.
I wish to upload it to GitLab and use CI/CD there. So I am guessing I will push it to my repository and then build a pipeline where I can use the Docker image for ROS 2 as the image. I haven't tried it yet (will do it tomorrow), but I guess that is how I should do it.
My question is: can I do something similar (and if so, how) on my local machine? In other words, just use the Docker image, develop and build in there, and not install the prerequisite in the first place?
I heartily agree that using Docker to develop locally improves the development experience, primarily by obviating system-specific dependency management, just as you say.
Exactly how this is done depends on how many components you need to develop simultaneously, and how you want the development environment to behave.
An obvious place to start might be docker compose, a framework for starting multiple docker containers. https://docs.docker.com/compose/gettingstarted/ looks like quite a nice tutorial on the subject, and straight from the horse's mouth too.
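As a starting point, a minimal docker-compose.yml that mounts your source tree into a ROS 2 image might look roughly like this (the image tag, paths and command are assumptions about your setup, not something from your question):

# docker-compose.yml -- minimal sketch for local development
services:
  dev:
    image: ros:humble          # assumed ROS 2 distribution; pick the one you installed
    volumes:
      - ./:/workspace          # your checkout, edited on the host, visible in the container
    working_dir: /workspace
    command: bash              # drop into a shell; run builds and tests from here
    tty: true
    stdin_open: true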
However, your robotics project (?) may not be a very good fit for the server/client model behind the write - restart python - execute client - debug - repeat cycle in the document. To provide a better answer, we'd need a lot more understanding of how exactly your local development works - what exactly you want your development process to look like in this project might require a different solution. So add some workflow details to your question!
I have been working with Docker for Windows for about a year now, and I still do not have a good grasp of when I should use the different images, how they are related, and what components of Windows are in them.
On this link:
https://hub.docker.com/_/microsoft-windows-base-os-images
there are four "Featured repos":
windows/servercore
windows/nanoserver
windows/iotcore
windows
I understand that windows/servercore should contain more things than nanoserver, but what are those things exactly? Why do some programs work in servercore and not in nanoserver, and is there some way of finding out what is missing in nanoserver for a particular program?
In addition to this, they list three related repos:
microsoft/dotnet-framework
microsoft/dotnet
microsoft/iis
Both of the dotnet repos contain five sub repos, and the difference is that dotnet-framework is based on server core, while dotnet is based on nanoserver.
Is there some comprehensible documentation of all these repos/images, maybe with a graph for a simple overview? Do some of them have a public Dockerfile that explains how they were created, like, for example, this one:
https://github.com/docker-library/python/blob/master/3.6/windows/windowsservercore-ltsc2016/Dockerfile
The differences you are mentioning are less linked to Docker than you think.
Every image is a succession of operations which results in a functioning environment. See it as an automated installation, just like the one you would do by hand on a physical machine.
Having different images in a repo means that the installation is different, with different settings. I'm not a .NET expert nor a Windows Server enthusiast, but from what I found, Nano Server is another way to install a Windows Server, with less functionality, so it's lightweight (https://learn.microsoft.com/en-us/windows-server/get-started/getting-started-with-nano-server).
Those kinds of technical differences are technology-specific, and you'll find all the information you need in Microsoft's official documentation.
Remember that Docker is a way to do something, not the designer of the OS you are using, so most of the time you'll have to search the actual documentation of your system (in this case, Windows Server and the .NET Framework).
I hope this helps you understand a little better -- have fun with Docker!
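From what I found, a small illustration of that difference (the ltsc2019 tag is an assumption; use the tag matching your host build):

# Windows PowerShell ships in servercore, so this works:
FROM mcr.microsoft.com/windows/servercore:ltsc2019
RUN powershell -Command Write-Host 'hello from servercore'
# The same RUN line in an image based on
# mcr.microsoft.com/windows/nanoserver:ltsc2019 would fail, since nanoserver
# ships neither Windows PowerShell nor 32-bit (WOW64) support.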
I started looking into Docker lately, and I think I understand a lot of the benefits it offers: you can quickly create a Docker container and run it on different machines. Building (compiling) is also relatively easy -- you can download the Maven image, for example, and just build your code. That works fine. So building is easy, testing is easy, and deploying (and running) in production is easy.
What I don't understand is how Docker can make the development phase easier. By the development phase I mean: starting up your IDE, reading code, quickly navigating through method definitions using the means the IDE provides, using IntelliSense, etc. Then change something, run a unit test, try a different third-party library, and so on -- all things you can do with your IDE. But I don't understand how to do this with a Docker image. I've read a few posts about starting the IDE from within your Docker container, but that requires setting things up with a window manager, and I am not sure that's the way to go.
Of course I can set up my laptop in such a way that I can do all of this with my IDE, but that way I bypass all of the benefits Docker should offer. I still have to download dependencies, set up environment variables, do a lot of manual configuration, etc. And not just me, but everyone on the team.
So, not a very concrete question, and possibly a duplicate, but I just can't wrap my head around it: how do you use an IDE together with Docker?
Yeah, it's hard. It also depends on what language/framework you're using, but the things you mention are all easy to accomplish. For example, we use Ruby a lot, and someone on my team uses RubyMine to work with his code. The source code is mapped into the container, so changes are reflected immediately. If you want to run a test, I'm sure you can override the command your IDE runs by default with something custom like docker run --rm myapp ./run_tests.sh or similar. At least that's what I do with Vim.
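For instance, mapping the source code into the container usually comes down to a bind mount along these lines (the myapp image and paths are just placeholders matching the example above):

# Source edited on the host (current directory) is immediately visible
# inside the container at /app, so tests run against your latest changes
docker run --rm -it -v "$(pwd)":/app -w /app myapp ./run_tests.sh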
Probably the most important missing part when doing development with Docker is debugging. I think JetBrains is starting to add features for this to their IDEs, but I'm not sure about the status of that.
Also, almost every IDE or good editor has an integrated console. You could keep a docker exec session open there and run all your app commands, like tests, generators or anything else, and even do some basic debugging.
Hope it helps.
I am trying to learn Docker and am referring to online materials for that. I came to know that there is an official hub of images which we can pull and run as containers.
The repos are available at https://hub.docker.com/ . There we can see the official images of ubuntu, httpd, mysql, and so on.
My question is:
Do all these images have a "minimal OS" on which they run? For example, if we consider the httpd image, does it contain the OS it needs to run?
From my understanding, images are built in a layered architecture from a parent image. So we have a parent image, and then the changes for this image form one more layer on top of the parent image. If you look at the Dockerfile for an image, you will see something like this:
FROM node:6.11.5
Here node:6.11.5 is the parent image for our current image.
If you check the Dockerfiles of parent images, you will find that somewhere up the hierarchy they start from a base image.
This base image is basically an OS without a kernel: it contains only the userland software of a particular Linux distribution (e.g., CentOS, Debian). So all images use the host OS kernel. Hence, you cannot run a Windows container on a Linux host or vice versa.
So basically all images are layered changes on top of the base image, which is an OS without a kernel.
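You can see those layers for yourself with docker history, which lists the instruction that produced each layer of an image:

# Shows each layer of the image and the Dockerfile instruction that created it
docker history node:6.11.5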
Please find below links for further information:
https://serverfault.com/questions/755607/why-do-we-use-a-os-base-image-with-docker-if-containers-have-no-guest-os
https://blog.risingstack.com/operating-system-containers-vs-application-containers/
If you need to create a base image you can see the steps here.
https://docs.docker.com/develop/develop-images/baseimages/
Please correct me if I am wrong.
Here's the answer: "Containers," in all their many forms, are an illusion!
Every process that is "running in a container" is, in fact, running directly on the host operating system. But it is "wearing rose-colored glasses." It thinks that it knows what its user-id is ... but it doesn't. 🤷 It thinks it knows what the network looks like. What the filesystem looks like. ... ... ...
And, it can afford to think that way, because nothing that it is able to see would tell it otherwise.
... But, nothing that it sees is actually the physical truth.
"Containerization" allows us to achieve the essential technical requirement – total isolation – at very-considerably less cost than "virtualization." The running processes see what they need to see, and so "they need be none the wiser." Meanwhile, the host operating system, which is aware of the actual truth, can very-efficiently support them: it just has to maintain the illusion. The very-expensive "virtual machine" software layer is completely gone. The only "OS" that actually exists is the host.
Most images are based on a distribution, as you can see in their Dockerfiles -- except for the distribution images themselves, which have a different base image, called scratch.
You can review the images they are based on when you visit the project's page on DockerHub, for example https://hub.docker.com/_/httpd/
Their Dockerfiles are referenced and you can review them by clicking on them, e.g. the first tag "2.2" refers to this file. The first line in the Dockerfile is FROM debian:jessie and shows that it is based on a Debian image.
It is common to have a separate tag with the suffix -alpine to indicate that Alpine Linux is used, which is a much smaller base image than the Debian one. This leads to a smaller httpd image, because the base image is much smaller.
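You can see the size difference directly by pulling both variants and comparing them (the exact tags may change over time):

docker pull httpd:2.4
docker pull httpd:2.4-alpine
# The SIZE column shows the Alpine-based variant is considerably smaller
docker images httpd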
The whole idea is that the image is completely stand-alone, running on top of the hardware/virtualization layer, and thus (the pro) cannot be influenced by anything that is not part of the image.
Every image contains a complete OS. Special Docker-oriented OSes come in at a few megabytes: for example Alpine Linux, an OS of about 8 megabytes!
But a bigger OS like Ubuntu/Windows can be a few gigabytes. Both have their advantages, since Docker cuts an image up into layers, so when you use a base image twice (FROM command, see the other answers) you will only download that layer once.
A smaller OS has the pro of only needing to download a few megabytes, but every (Linux) library you want to use, you will have to download and include yourself. This custom-made layer is then only used in your own image, is not re-used by other images, and thus creates an extra custom layer and megabytes people will have to download to run your image.
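A rough sketch of what that looks like on Alpine (the tag and package names are purely illustrative):

FROM alpine:3.18
# Every library or tool has to be added explicitly; this layer is specific
# to your image, and everyone pulling your image downloads it
RUN apk add --no-cache curl libstdc++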
If you want to make an image from nothing you can start your dockerfile with:
FROM scratch
But this is not advised, unless you really know what you are doing and/or you are just hobbying around.
I think a lot of these answers miss the point. Explaining what you can or may do does not answer the question: do all docker images need an OS?
After a bit of digging, the answer is no.
https://docs.docker.com/develop/develop-images/baseimages/
FROM scratch
ADD hello /
CMD ["/hello"]
There's no OS defined in this Dockerfile, only a precompiled hello-world binary.
Also here
https://hub.docker.com/_/scratch
Also in this question:
https://serverfault.com/questions/755607/why-do-we-use-a-os-base-image-with-docker-if-containers-have-no-guest-os
An answerer makes this statement:
why do we base the container on an OS image? Because you'd want to use some commands like apt, ls, cd, and pwd.
So often the OS is just included because you might want to use some bundled low-level tools or shell into the container to do some things.
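For completeness, here is a rough sketch of how such an OS-less image can be produced with a multi-stage build (the hello.go file and the golang tag are assumptions for illustration):

# Build stage: compile a fully static binary
FROM golang:1.21 AS build
WORKDIR /src
COPY hello.go .
RUN CGO_ENABLED=0 go build -o /hello hello.go

# Final stage: no OS at all, just the binary
FROM scratch
COPY --from=build /hello /hello
CMD ["/hello"]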