Do all Docker images have a minimal OS?

I am trying to learn Docker and am referring to online materials for that. I came to know that there is an official hub of images which we can pull and use to run containers.
The repos are available at https://hub.docker.com/, where we can see the official images for ubuntu, httpd, mysql (and so on).
My question is:
Do all these images have a "minimal OS" on which they run? For example, if we consider the httpd image, does it include the OS it needs to run on?

From my understanding, images are built in a layered architecture from a parent image. So we have a parent image, and the changes for the current image form one more layer on top of it. If you look at the Dockerfile of an image you will see something like this:
FROM node:6.11.5
This node:6.11.5 is the parent image of our current image.
If you check the Dockerfiles of parent images, you will find that somewhere up the hierarchy they derive from a base image.
This base image is basically an OS without a kernel: it contains only the userland software of a particular Linux distribution (e.g. CentOS, Debian). So all images use the host OS kernel. Hence, you cannot run a Windows container on a Linux host or vice versa.
So basically all images are layered changes on top of a base image, which is an OS userland without a kernel.
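For illustration, following the FROM chain downwards might look roughly like this (the intermediate parent names are examples; the last three commented lines are roughly the contents of the official debian Dockerfile, which really is built from scratch):
# our image
FROM node:6.11.5
# node:6.11.5 itself starts from a Debian-based parent, e.g.
#   FROM buildpack-deps:jessie
# which in turn starts from the distribution base image:
#   FROM debian:jessie
# and debian:jessie is built directly on the empty "scratch" image:
#   FROM scratch
#   ADD rootfs.tar.xz /
#   CMD ["bash"]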
Please find below links for further information:
https://serverfault.com/questions/755607/why-do-we-use-a-os-base-image-with-docker-if-containers-have-no-guest-os
https://blog.risingstack.com/operating-system-containers-vs-application-containers/
If you need to create a base image, you can see the steps here:
https://docs.docker.com/develop/develop-images/baseimages/
Please correct me if I am wrong.

Here's the answer: "Containers," in all their many forms, are an illusion!
Every process that is "running in a container" is, in fact, running directly on the host operating system. But it is "wearing rose-colored glasses." It thinks that it knows what its user-id is ... but it doesn't. 🤷‍♂️ It thinks it knows what the network looks like, what the filesystem looks like, and so on.
And it can afford to think that way, because nothing that it is able to see would tell it otherwise.
But nothing that it sees is actually the physical truth.
"Containerization" allows us to achieve the essential technical requirement – total isolation – at considerably less cost than "virtualization." The running processes see what they need to see, and so "they need be none the wiser." Meanwhile, the host operating system, which is aware of the actual truth, can very efficiently support them: it just has to maintain the illusion. The expensive "virtual machine" software layer is completely gone. The only "OS" that actually exists is the host.

Most images are based on a distribution, as you can see in their Dockerfiles. The exceptions are the distribution images themselves: they have a different base image, called scratch.
You can see which images they are based on by visiting the project's page on Docker Hub, for example https://hub.docker.com/_/httpd/.
The Dockerfiles are referenced there and you can review them by clicking on them; e.g. the tag "2.2" refers to a Dockerfile whose first line is FROM debian:jessie, which shows that it is based on a Debian image.
It is also common to have a separate tag with the suffix -alpine to indicate that Alpine Linux is used, which is a much smaller base image than the Debian one. This leads to a smaller httpd image, because the base image is much smaller.
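To use the smaller variant in your own image, you only change the tag in the FROM line (the tags below are examples; check the httpd page on Docker Hub for the tags that are currently available):
# Debian-based variant
FROM httpd:2.4
# Alpine-based variant, much smaller
# FROM httpd:2.4-alpine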

The whole idea is that an image is completely stand-alone, running on the hardware/virtualization layer. As a consequence (the pro), it also cannot be influenced by anything that is not part of the image.
Every image contains a complete OS. Special slimmed-down OSes come in at a few megabytes: for example Alpine Linux, an OS of about 8 megabytes!
But a bigger OS like Ubuntu/Windows can be a few gigabytes. Both have their advantages, since Docker cuts an image up into layers: when you use a base image twice (the FROM command; see the other answers), you only download that layer once.
A smaller OS has the advantage of only needing a few megabytes of download, but every (Linux) library you want to use, you will have to download and include yourself. That custom-made layer is then only used in your own image, is not re-used by other images, and thus adds an extra layer and extra megabytes that people will have to download to run your image.
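As a small sketch of that layer re-use (the package names are just examples): two images built from the same base image only pull and store the base layer once.
# image-a/Dockerfile
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y curl

# image-b/Dockerfile
# FROM ubuntu:18.04
# RUN apt-get update && apt-get install -y wget
# The ubuntu:18.04 layer is shared by image-a and image-b and is downloaded only once.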
If you want to make an image from nothing, you can start your Dockerfile with:
FROM scratch
But this is not advised unless you really know what you are doing and/or are just experimenting.

I think a lot of these answers miss the point. Explaining what you can or may do does not answer the question: do all docker images need an OS?
After a bit of digging, the answer is no.
https://docs.docker.com/develop/develop-images/baseimages/
FROM scratch
ADD hello /
CMD ["/hello"]
There's no OS defined in this Dockerfile, only a precompiled hello-world binary.
Also here:
https://hub.docker.com/_/scratch
Also in this question:
https://serverfault.com/questions/755607/why-do-we-use-a-os-base-image-with-docker-if-containers-have-no-guest-os
An answerer makes this statement:
Why do we base the container on an OS image? Because you'd want to use some commands like apt, ls, cd, pwd.
So often the OS is just included because you might want to use some bundled low-level tools or SSH into it to do some things.
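As a sketch, this is how such a from-scratch image is typically produced nowadays with a multi-stage build (the Go program name and paths are illustrative):
# build stage: compile a statically linked binary
FROM golang:1.13 AS build
WORKDIR /src
COPY hello.go .
RUN CGO_ENABLED=0 go build -o /hello hello.go

# final stage: no OS at all, just the single binary
FROM scratch
COPY --from=build /hello /hello
CMD ["/hello"]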

Related

How to choose the base Windows image when creating a Windows Docker image?

For some reason, we have to build a Windows-based Docker image. From here, we know there are 4 types of base image we could build from:
windows/nanoserver
windows/servercore
windows
windows/iotcore
I am sure IoT is not relevant for me, so windows/iotcore is excluded, but I am not sure about the remaining three. From a size perspective it seems that nanoserver < servercore < windows, so I should try them in that order. So far, my service will not start in 1 or 2, so I have to try 3.
What are the criteria to choose between them?
Clearly, I am missing some DLL needed to start the service, and Dependency Walker also does not seem to work in base images 1 and 2. Does someone have experience with how to identify this missing DLL? That way, it would still be possible to use a minimal base image plus the missing DLL.
Progress update:
My service runs successfully with #3 (the windows base image), but the Docker image size is very, very large; see below. This makes the choice important.
mcr.microsoft.com/windows/nanoserver 10.0.14393.2430 9fd35fc2a361 15 months ago 1.14GB
mcr.microsoft.com/windows/servercore 1809-amd64 733821d00bd5 5 days ago 4.81GB
mcr.microsoft.com/windows 1809-amd64 57e56a07cc8a 6 days ago 12GB
Many Thanks.
You've probably moved on by now, but essentially:
IoT - tiny, for builders and maker boards.
Nano Server = smallest, for running .NET Core apps. You have to build it using multi-stage builds; from what I see it is quite advanced to get working.
Server Core = middle: GUI-less Windows Server. It is the most common default base image. You've not said which service is not running, but it's possible that including the App Compatibility FOD might solve the problem without increasing the size as much. Use the newest container; 1903 I think it is.
https://learn.microsoft.com/en-us/windows-server/get-started-19/install-fod-19
Windows = fattest, the whole shebang. (Example FROM lines for each are sketched below.)
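For reference, the FROM line for each option might look like this (the 1809 tags match the listing above; pick the tag that matches your host build):
# smallest; mainly for .NET Core / self-contained apps
FROM mcr.microsoft.com/windows/nanoserver:1809
# GUI-less Windows Server; the usual default for services
# FROM mcr.microsoft.com/windows/servercore:1809
# the full Windows API surface; by far the largest
# FROM mcr.microsoft.com/windows:1809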

How are all official Windows Docker images related?

I have been working with Docker for Windows for about a year now, and I still do not have a good grasp of when I should use the different images, how they are related, and what components of Windows are in them.
On this link:
https://hub.docker.com/_/microsoft-windows-base-os-images
there are four "Featured repos":
windows/servercore
windows/nanoserver
windows/iotcore
windows
I understand that windows/servercore should contain more things than nanoserver, but what are those things exactly? Why do some programs work in servercore and not in nanoserver, and is there some way of finding out what is missing in nanoserver for a particular program?
In addition to this, they list three related repos:
microsoft/dotnet-framework
microsoft/dotnet
microsoft/iis
Both of the dotnet repos contain five sub repos, and the difference is that dotnet-framework is based on server core, while dotnet is based on nanoserver.
Is there some comprehensible documentation of all these repos/images, maybe with a graph for a simple overview? Do some of them have a public Dockerfile that explains how they were created, like, for example, this one?
https://github.com/docker-library/python/blob/master/3.6/windows/windowsservercore-ltsc2016/Dockerfile
The differences you are mentioning are less linked to Docker than you think.
All images are a succession of operations that result in a functioning environment. See it as an automated installation, just like you would do by hand on a physical machine.
Having different images in a repo means that the installation is different, with different settings. I'm not a .NET expert nor a Windows Server enthusiast, but from what I found, Nano Server is another way to install Windows Server, with less functionality, so it's lightweight. (https://learn.microsoft.com/en-us/windows-server/get-started/getting-started-with-nano-server)
Those kinds of technical differences are technology-specific, and you'll find all the information you need in Microsoft's official documentation.
Remember that Docker is a way to do something, not the designer of the OS you are using, so most of the time you'll have to search the actual documentation of your system (in this case, Windows Server and the .NET Framework).
I hope this helped you understand a little better. Have fun with Docker!

Secure Docker Image from Being Copied or Encrypt Docker Image Contents

We have developed a tool in Python which uses many libraries and other algorithms. We want to deliver it to customers on premise as a Docker image, and it works pretty well. However, if someone copies the image and exports/extracts it (the export or save command), everything becomes visible, including our Python files and library files.
Is there a way we can protect our code so that customers can't export it or see anything inside the image? Is there a way the whole image can be encrypted or locked? I believe obfuscation can help to an extent; is there an obfuscation tool that obfuscates a whole project (all files and folders) without breaking references?
The root user on the host machine (where the Docker daemon runs) has full access to all the processes running on the host. That means the person who controls the host machine can always get access to the RAM of the application as well as the file system. That makes it impossible to hide a key for decrypting the file system or to protect the RAM from debugging.
Since you are sharing the image, there is no way to protect it from being copied.
However, using obfuscation on a standard Linux box you can make it harder to read the file system and RAM, but you can't make it impossible, or the container could not run.

Good way to pick a TAG for Dockerfile

I was wondering what are some good ways to decide which tag to use in a Dockerfile.
For instance, I'm clear on my requirements:
I need Python 3.6
I need the Alpine distribution due to its small size.
Should I pick
FROM python:3.6.6-alpine3.8
or
FROM python:3.6-alpine
Even if python:3.6-alpine is an alias for python:3.6.6-alpine3.8, generally (though it's a personal preference) I try to be as precise as possible when choosing a base image, so I would rather use python:3.6.6-alpine3.8.
It avoids misunderstandings for people reading the Dockerfile, and for people using the image, like:
Oh! Why is that library not available in the image? Maybe because of the Alpine version? By the way, what's the Alpine version? I'm going to check /etc/alpine-release to see...
For the Python version, it's more complicated: using the 3.6.6 tag, your build may eventually fail in the future if a 3.6.7 version is released; indeed, looking at Python image versions on the Docker store, only images with the latest patch version seem to be kept. But it all depends on whether your image needs to be rebuilt regularly, or whether it's just pushed once to your own registry and then used as a base image. If regular rebuilds are expected, in that particular case I would maybe use 3.6-alpine3.8, because patch versions normally do not break compatibility and just add small improvements/bug fixes.
In short, it costs nothing to be precise when choosing a tag, and it saves a lot of explanation when the image is used: you know exactly what's in the base image you extend, just by reading the Dockerfile. But be careful with tags that have a short life expectancy.
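As a sketch of the two options discussed (the tags shown were current at the time of writing and may since have been removed from the registry):
# fully pinned: exact Python and Alpine versions, most explicit,
# but the tag may disappear once 3.6.7 is released
FROM python:3.6.6-alpine3.8
# pinned to the minor version only: picks up patch releases on rebuild
# FROM python:3.6-alpine3.8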

How do I leverage package maintainers' experience with Docker?

When building Docker images, I find myself in a strange place -- I feel like I'm doing something that somebody has already done many times before, and did a vastly better job at it. In most cases, this gut feeling is absolutely right -- I'm taking a piece of software and re-describing, in a Dockerfile, everything that's already described in the OS's packaging system.
More often than not, I even find myself installing software into the image using a package manager and then looking inside that package to get clues about writable paths, configuration files, open ports, etc. for my Dockerfile. The duplication of effort between the OS packager and the Docker packager is most evident in such a case, which I assume is one of the more common ones.
So basically, every Docker user building an image on top of pre-packaged software is re-packaging almost from scratch, but without the time and often the domain knowledge the OS packagers had for trial, error and polish. If we consider the low reusability of community-maintained images (re-basing from Debian to RHEL hurts), we're stuck with copying or re-implementing functionality that already exists and works at the OS level, wasting a lot of time and putting a maintenance burden on the poor souls who'd inherit whatever we might leave behind.
Is there any way to resolve this duplication of effort and re-use whatever package maintainers have already learned about a piece of software in Docker?
The main source for Docker image reuse is hub.docker.com.
Search there first to see whether your software is already described in one of those images.
You can see their Dockerfiles and start your own from one of those images instead of starting from a basic ubuntu or wheezy one.
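For example, instead of re-describing how to install and configure Apache yourself, you can extend the official httpd image and add only what is specific to your project (the paths follow the layout documented on the httpd Docker Hub page; adjust them to your setup):
# reuse the maintainers' packaging work instead of redoing it
FROM httpd:2.4
# add only your own content and configuration
COPY ./public-html/ /usr/local/apache2/htdocs/
# COPY ./my-httpd.conf /usr/local/apache2/conf/httpd.conf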
