Docker base images, what do they compose of? - docker

I am trying to wrap my head around the Docker architecture, in particular figuring out what exactly a base image consists of, and in doing I have been exploring some of the images found on the docker hub. Specifically when looking at the following repo it references the centos-7.2.1511-docker.tar.xz file.
I've downloaded and examined the contents of the tar and it has your typical Linux filesystem.
As I understand it, this is not a complete Linux OS and is just a replica of a linux filesystem with all the non essentials removed? Where all other requirements are drawn from the Host OS when a container is run(?)
My question essentially boils down to how one would go about creating that tar file? What exactly do you need. My intention is not to create one but rather understand what portion of files/data/dependencies come from a target OS to create an image and what gets used on the Host OS

A Docker container is a set of processes, running a sandbox enabled by Linux namespaces, on top of the host kernel.
A Docker image is a set of layers, which are often simply tarballs, of files that are unpacked, and made to look as if they are the root of the filesystem when used to start a container.
A Docker image could be just a single statically-linked executable! You can create your own Docker image from scratch by simply creating a tarball of a single executable, and giving it to docker load which wI'll store it as the appropriate internal format and register it as an image.
As you can see then, a Docker image need not be much. It certainly doesn't need a kernel, or any of the components normally used for configuring the system, networking daemons, or even things like cron. Those are all left to the host.
Things that are usually available in an image are a dynamic library runtime, and files like /etc/hosts, /etc/resolv.conf, and other files which are referenced directly by libc. This allows you to add typical dynamically-linked executables which interact with the system as if they're running on a traditionalal OS.
I have successfully "Dockerized" a legacy CentOS 6-based VM by uninstalling as many packages as possible, then tar-ing up the filesystem (excluding directories like /proc, /sys, /dev, etc.) and loading this via docker load. Afterwards, I started a container and (sometimes forcefully) removed additional "system" packages that serve no purpose in a Docker image, like kernel, udev, etc.
This blog post goes into some of the specifics of docker load:
http://tuhrig.de/difference-between-save-and-export-in-docker/

Related

Intro to Docker for FreeBSD Jail User - How and should I start the container with systemd?

We're currently migrating room server to the cloud for reliability, but our provider doesn't have the FreeBSD option. Although I'm prepared to pay and upload a custom system image for deployment, I nontheless want to learn how to start a application system instance using Docker.
in FreeBSD Jail, what I did was to extract an entire base.txz directory hierarchy as system content into /usr/jail/app, and pkg -r /usr/jail/app install apache24 php perl; then I configured /etc/jail.conf to start the /etc/rc script in the jail.
I followed the official FreeBSD Handbook, and this is generally what I've worked out so far.
But Docker is another world entirely.
To build a Docker image, there are two options: a) import from a tarball, b) use a Dockerfile. The latter of which lets you specify a "CMD", which is the default command to run, but
Q1. why isn't it available from a)?
Q2. where are information like "CMD ENV" stored? in the image? in the container?
Q3. How to start a GNU/Linux system in a container? Do I just run systemd and let it figure out the rest from configuration? Do I need to pass to it some special arguments or envvars?
You should think of a Docker container as a packaging around a single running daemon. The ideal Docker container runs one process and one process only. Systemd in particular is so heavyweight and invasive that it's actively difficult to run inside a Docker container; if you need multiple processes in a container then a lighter-weight init system like supervisord can work for you, but that's usually an exception more than a standard packaging.
Docker has an official tutorial on building and running custom images which is worth a read through; this is a pretty typical use case for Docker. In particular, best practice is to write a Dockerfile that describes how to build an image and check it into source control. Containers should avoid having persistent data if they can (storing everything in an external database is ideal); if you change an image, you need to delete and recreate any containers based on it. If local data is unavoidable then either Docker volumes or bind mounts will let you keep data "outside" the container.
While Docker has several other ways to create containers and images, none of them are as reproducible. You should avoid the import, export, and commit commands; and you should only use save and load if you can't use or set up a Docker registry and are forced to move images between systems via a tar file.
On your specific questions:
Q1. I suspect the best reason the non-docker build paths to create images don't easily let you specify things like CMD is just an implementation detail: if you look at the docker history of an image you'll see the CMD winds up being its own layer. Don't worry about it and use a Dockerfile.
Q2. The default CMD, any set ENV variables, and other related metadata are stored in the image alongside the filesystem tree. (Once you launch a container, it has a normal Unix process tree, with the initial process being pid 1.)
Q3. You don't "start a system in a container". Generally run one process or service in a container, and manage their lifecycles independently.

Why provide a Linux distro as a Dockerfile base when my host has all the software I need installed?

I want to start writing a Docker image. I have a .net Core 2.0 Web Api service that I have deployed to an Amazon Linux machine. It runs fine, but I would like to automate the build and deployment process a bit.
As far as I am concerned, there is no need for a Parent image for the image I need to build. I might grab some files from a location, run some dotnet CLI commands, and run the service using Apache as a reverse proxy. I dont really see the need for a parent image in any of that.
I am asking this question because most of the examples I have seen include a base image. Most of the time its something very generic, like "From Ubuntu". I have read that most images will include a parent image. According to Docker's documentation:
A parent image is the image that your image is based on. It refers to the contents of the FROM directive in the Dockerfile. Each subsequent declaration in the Dockerfile modifies this parent image. Most Dockerfiles start from a parent image, rather than a base image. However, the terms are sometimes used interchangeably.
What exactly is the point of inheriting from Ubuntu? Even the Docker docs suggest using Debian "since it’s very tightly controlled and kept minimal". Does that just ensure that your Linux machine has an Ubuntu distribution? Does it even matter if I am using Amazon Linux but use the Debian image as my base?
A Docker image runs in a set of filesystem namespaces which are unconnected from the host's except where you've chosen to bind-mount a volume. This means that tools installed on the host are unavailable to the container: Just because the host runs Amazon Linux doesn't mean that the userspace commands Amazon Linux provides (and the libraries those commands use to run) are available to the guests.
Without a Linux distro available inside the container, you wouldn't have a package management tool (yum, apt-get, etc) with which to install the tools you need to download a file, run software (that presumably needs to be linked to a libc, a copy of OpenSSL, or other shared components). There are also runtime parts of a working Linux system such as the resolver that are provided in userland by your distro and not shared from the host in a Docker install.
Using a base image ensures that you have tools available inside your container -- and it ensures that that container will work consistently on any Linux system with a compatible kernel and hardware architecture.
It's possible in theory to bind-mount many of the tools from the host (as by exposing all of /usr as a volume), but doing so would defeat many of the advantages Docker offers in portability.

Docker - only image with operating system?

So far I have only seen images with some operating system as a base layer. Is this necessary ? Is it possible to run some container without an operating system ?
An operating system consist of the kernel and userland utilities. So, there are no Docker images with "operating systems" as base layer, but a lot of images named after operating system distributions with their particular userland utilities.
You can create a Docker image from a tarball with anything you like. But it wouldn't be useful if it lacks /bin/sh and you want to include a RUN in the Dockerfile.
There is FROM scratch which you'll find if you follow how the images on docker hub have been built (you may have to follow various FROM x lines to different images until you reach scratch). Scratch is empty, completely empty, no libraries, no shells, no files at all. It's designed as a starting point that you can copy a bunch of files into.
You can also use scratch if you have a statically linked binary and don't want anything else inside the container. However note that debugging the container becomes more difficult since you can't launch a shell inside the container, but that also means attackers can't do that either.

Docker Base Images and Scaling Architecture

I have an app running on MongoDB, Node JS Api, React front end, Nginx proxy, etc. I have all of these setup as individual images and running locally (OSX) in separate linked containers, which I run with Docker Compose. In production, I have setup a (one) Ubuntu server on Digital Ocean at the moment, and expect to quickly scale as needed to multiple servers.
My question is what is the best way to handle the underlying Linux base image for each of these containers?
1) Should all of the linux setup (apt-gets, node / mongo installs, etc) exist on the Linux machine and outside of Docker and one could simply create a snapshot of this image, spin up a new server instance, and run the desired Docker container if you needed to quickly scale, or
2) Should all of the linux setup exist within a 'base' Ubuntu image, which the mongo, node, and nginx images build on top of. This results in each image's size growing significantly since they each have a separate instance of Ubuntu, plus all of the package dependencies to run mongo, node, and nginx, or
3) Should each process (mongo, node, nginx) have a separate linux base Docker image since they each have separate dependencies? Again, each image would be grow because they each would run an instance of Ubuntu.
What is the proper way to handle this with Docker?
The answer is #2, but I suspect you may not fully understand the relationship between container and image.
How Docker uses images
First of all an image from the the Docker docs:
Containers are created from images. An image is only downloaded and cached locally. Images are distributed via Registries.
Image layers
What makes Docker images different from virtual machine images is how they're built and stored. Again from the docs:
Each image consists of a series of layers.
Docker makes use of union file systems to combine these
layers into a single image. Union file systems allow files and
directories of separate file systems, known as branches, to be
transparently overlaid, forming a single coherent file system.
One of the reasons Docker is so lightweight is because of these
layers. When you change a Docker image—for example, update an
application to a new version— a new layer gets built. Thus, rather
than replacing the whole image or entirely rebuilding, as you may do
with a virtual machine, only that layer is added or updated. Now you
don’t need to distribute a whole new image, just the update, making
distributing Docker images faster and simpler.
So, your mongo, node, and nginx images will be thin layers on top of a base image containing your basic Linux setup. That base image will only be downloaded once and will be re-used as a component layer by the other images.

docker can I mount os from another container

I have been setting up a runtime with several images. I have been keeping them lean with one process and minimal os, based on debian (because I'm used to that).
However, I wonder why I need all these copies of the OS? Could I build one image with OS (to separate from host os) and then have other images mount relevant parts (read-only or copy where necessary -- /etc/ ...)?
I tried googling for this pattern but didn't find it. Are there any pitfalls? Does docker need "something" present to be able to boot an image even before mounting?
As long as you're using FROM debian as the base of each of your images, you only have one copy of debian. That's the beauty of using a copy-on-write filesystem like AUFS or btrfs.

Resources