Different images in containers - docker

I want to create separate containers with a single service in each (more or less). I am using the php7-apache image, which seems to be based on debian:jessie with PHP 7 and Apache. Since Apache and PHP are pretty intertwined in this case, I don't mind using this container as-is.
I want to start adding other services in their own containers (git, for example) and was considering using a tiny base image like busybox or alpine for these containers to keep image size down.
That said, I have read that using the same base image as other containers only costs you the one-time download of the base OS image (debian:jessie), which is then cached, while using a different tiny OS for other containers means downloading that OS image on top of the base OS.
What is the best practice in this case? Should I use the same base image (debian jessie) for all the containers in this case?

You may want to create a base image from scratch.
From the Docker documentation on creating a base image:
You can use Docker’s reserved, minimal image, scratch, as a starting point for building containers. Using the scratch “image” signals to the build process that you want the next command in the Dockerfile to be the first filesystem layer in your image.
While scratch appears in Docker’s repository on the hub, you can’t pull it, run it, or tag any image with the name scratch. Instead, you can refer to it in your Dockerfile. For example, to create a minimal container using scratch:
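The quoted example is not reproduced above; roughly, it is the hello-world Dockerfile from the Docker docs, which looks like this (hello is a statically compiled binary):
FROM scratch
COPY hello /
CMD ["/hello"]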
This example creates the hello-world image used in the tutorials. If you want to test it out, you can clone the image repo.

Related

What is docker's scratch image?

I'm new to docker and I was trying out the first hello world example in the docs. As I understand the hello-world image is based on top of the scratch image. Could someone please explain how the scratch image works? As I understand it is essentially blank. How is the binary executed in the hello-world image then?
The scratch image is the most minimal image in Docker. It is the base ancestor from which other images are built. The scratch image is actually empty: it doesn't contain any folders or files.
The scratch image is mostly used for building other base images. For instance, the debian image is built from scratch like this:
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
The rootfs.tar.xz archive contains all the filesystem files. The Debian image simply adds that filesystem on top of the empty scratch image.
As I understand it is essentially blank. How is the binary executed in the hello-world image then?
The scratch image is indeed blank. The hello-world executable added to the scratch image is statically compiled, meaning that it is self-contained and doesn't need any additional libraries to execute.
As stated in the official Docker docs:
Assuming you built the "hello" executable example from the Docker GitHub example C-source code, and you compiled it with the -static flag, you can then build this Docker image using: docker build --tag hello
This confirms that the hello-world executable is statically compiled. For more info about static compiling, read here.
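For a concrete sense of what statically compiled means here, a rough sketch (the file names are illustrative): the binary is linked with no dynamic library dependencies, so it can run inside an image that contains nothing else.
# link all libraries into the binary itself; without -static it would need libc inside the image
gcc -static -o hello hello.c
# verify there are no dynamic dependencies
ldd hello    # prints "not a dynamic executable"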
A bit late to the party, but adding to @yamenk's answer:
scratch isn't technically an image; it is merely a reference to an empty starting point. Container images rely on the kernel of the host for system calls, and because in Linux nearly everything is a file, you can add a single self-contained binary, or an entire operating system's files, on top of that empty starting point.
This means that an image built from scratch contains only the files you add to it: the FROM scratch step itself is essentially a no-op, so if you add just a single binary, the size of the image is the size of that binary plus a small amount of metadata overhead.
The resources you can assign to a running container are enforced through the cgroups functionality, and networking makes use of Linux network namespaces.
In short, the official scratch image contains nothing at all: zero bytes.
But a running container is not the same as the image it was created from. Even though the scratch image is empty, when a runtime like runC starts an instance from an image built from scratch, it needs more (a root filesystem, namespaces, and so on) than what you can see in the Dockerfile.

Is it necessary to have a base OS for a Docker image?

I am using Windows, but to use Docker I run a CentOS VM in Oracle VM VirtualBox. I have seen a Dockerfile where centos is used as the base image. The first line of my Dockerfile is
FROM centos
If I check the Dockerfile of the centos image on Docker Hub, its first line is
FROM scratch
scratch is used to build an explicitly empty image, especially for building base images. I understand that if I keep traversing upward through the FROM lines, I will eventually end up at the scratch image. I can see that scratch can be used to create a minimal container.
Question: If I want to create some bigger applications using web server, database etc, then is it necessary to add a base OS image?
I have looked at the mysql and tomcat images and noticed that they ultimately use an OS image.
My understanding of Container was that I can "just bundle the required software and my service" in the container. Please clarify.
Your understanding is correct; however, "just bundle the required software and my service" may be cumbersome, especially if you also have shell scripts that rely on other supporting programs.
Using a base image that already contains all the necessary tooling is more convenient. You can share the same base image across several services, and thanks to Docker's layered images there is no extra disk-space overhead.
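For example (a sketch with illustrative package choices), two services can share the same base image, so the base layers are downloaded and stored only once:
# Dockerfile for service A
FROM debian:jessie
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*

# Dockerfile for service B
FROM debian:jessie
RUN apt-get update && apt-get install -y nginx && rm -rf /var/lib/apt/lists/*
Both images reuse the debian:jessie layers; only the small layers created by the RUN steps differ.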

What is the overhead of creating docker images?

I'm exploring using docker so that we deploy new docker images instead of specific file changes, so all of the needs of the application come with each deployment etc.
Question 1:
If I add a new application file, say 10 MB, to a docker image, will deploying the new image (using the tools in Docker Toolbox) require pushing an entirely new image to my containers, or do docker deployments just transfer the difference between the two, similar to git version control?
Another way to put it, I looked on a list of docker base images and saw a version of ubuntu that is 188 MB. If I commit a new application to a docker image, using this base image, will my docker containers need to pull the full 188 MB, which they are already running, plus the application or is there a differential way of just getting what has changed?
Supplementary Question
Am I correct in assuming that, when using docker, deploying images is the intended approach? Meaning any new change should require a new image deployment, so that images are treated as immutable? When I was using AWS we followed this approach with AMIs (Amazon Machine Images), but storing AMIs had low overhead; for docker I don't know yet.
Or is it a better practice to deploy dockerfiles and have the new image be built on the container itself?
Docker uses a layered union filesystem, only one copy of a layer will be pulled by a docker engine and stored on its filesystem. When you build an image, docker will check its layer cache to see if the same parent layer and same command have been used to build an existing layer, and if so, the cache is reused instead of building a new layer. Once any step in the build creates a new layer, all following steps will create new layers, so the order of your Dockerfile matters. You should add frequently changing steps to the end of the Dockerfile so the earlier steps can be cached.
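A sketch of that ordering (package names and paths are illustrative): keep the stable steps first so their layers stay cached, and copy the frequently changing application code last.
FROM debian:jessie
# rarely changes: this layer is cached after the first build
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# changes on every release: only this layer and anything after it is rebuilt
COPY ./app /app
CMD ["/app/start.sh"]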
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
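In practice that workflow looks something like this (the registry host and image name are placeholders):
docker build -t registry.example.com/myapp:1.0 .   # build once on your build machine
docker push registry.example.com/myapp:1.0         # push the layers to the registry
docker pull registry.example.com/myapp:1.0         # each engine pulls only the layers it is missing
docker run -d registry.example.com/myapp:1.0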
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
You can also put in a little extra work to create smaller images.
You can use Alpine or Busybox instead of the bigger Ubuntu, Debian or Bitnami (Debian light) images.
A smaller image is also more secure, since fewer tools are available inside it.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
There are two great tools for making smaller docker images:
https://github.com/docker-slim/docker-slim
https://github.com/mvanholsteijn/strip-docker-image
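As a rough usage sketch (the image name is a placeholder; check the docker-slim documentation for current flags), docker-slim builds a minified copy of an existing image:
docker-slim build my-image:latest   # produces a smaller my-image.slim image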
Some examples with docker-slim:
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/ shows a size of 357.3 MB before and 18.73 MB after running docker-slim.
For simh, https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/ is 5.388 MB, while the original k3ck3c/simh_bitnami is 88.86 MB.
A popular netcat image, chilcano/netcat, is 135.2 MB, while a netcat image based on Alpine is 7.812 MB and one based on busybox needs only 2 or 3 MB.
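As a sketch of how small that can be, a netcat image can be built directly on busybox, which already ships an nc applet (the image tag below is arbitrary):
FROM busybox
ENTRYPOINT ["nc"]
Build and use it, for example, to connect to a host:
docker build -t tiny-nc .
docker run --rm tiny-nc example.com 80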

Docker Base Images and Scaling Architecture

I have an app running on MongoDB, Node JS Api, React front end, Nginx proxy, etc. I have all of these setup as individual images and running locally (OSX) in separate linked containers, which I run with Docker Compose. In production, I have setup a (one) Ubuntu server on Digital Ocean at the moment, and expect to quickly scale as needed to multiple servers.
My question is what is the best way to handle the underlying Linux base image for each of these containers?
1) Should all of the Linux setup (apt-gets, node/mongo installs, etc.) live on the Linux machine itself, outside of Docker, so that you could simply create a snapshot of the server, spin up a new instance from it, and run the desired Docker container when you need to scale quickly, or
2) Should all of the Linux setup exist within a 'base' Ubuntu image, which the mongo, node, and nginx images build on top of? This results in each image's size growing significantly, since each has a separate instance of Ubuntu plus all of the package dependencies to run mongo, node, and nginx, or
3) Should each process (mongo, node, nginx) have a separate Linux base Docker image, since they each have separate dependencies? Again, each image would grow because each would carry an instance of Ubuntu.
What is the proper way to handle this with Docker?
The answer is #2, but I suspect you may not fully understand the relationship between container and image.
How Docker uses images
First of all, here is how an image is described in the Docker docs:
Containers are created from images. An image is only downloaded and cached locally. Images are distributed via Registries.
Image layers
What makes Docker images different from virtual machine images is how they're built and stored. Again from the docs:
Each image consists of a series of layers. Docker makes use of union file systems to combine these layers into a single image. Union file systems allow files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system.
One of the reasons Docker is so lightweight is because of these layers. When you change a Docker image (for example, update an application to a new version), a new layer gets built. Thus, rather than replacing the whole image or entirely rebuilding, as you may do with a virtual machine, only that layer is added or updated. Now you don't need to distribute a whole new image, just the update, making distributing Docker images faster and simpler.
So, your mongo, node, and nginx images will be thin layers on top of a base image containing your basic Linux setup. That base image will only be downloaded once and will be re-used as a component layer by the other images.
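Concretely (a sketch; the mybase tag, Ubuntu version, and package names are all illustrative), you build the common setup once and have each service image start from it:
# Dockerfile.base: the shared Linux setup
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y ca-certificates curl && rm -rf /var/lib/apt/lists/*

# Dockerfile.node (the mongo and nginx Dockerfiles would look similar)
FROM mybase
RUN apt-get update && apt-get install -y nodejs && rm -rf /var/lib/apt/lists/*
Build the base first, then the service images on top of it:
docker build -t mybase -f Dockerfile.base .
docker build -t mynode -f Dockerfile.node .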

Can you share Docker containers?

I have been trying to figure out why one might choose to add every "step" of their setup to a Dockerfile, which will create your container in a certain state.
The alternative in my mind is to just create a container from a simple base image like ubuntu and then (via shell input) configure your container the way you'd like.
But can you share containers? If you can only share images with Docker then I'd understand why one would want every step of their container setup listed in a Dockerfile.
The reason I ask is because I imagine there is some amount of headache involved with porting shell commands, file changes for configs, etc. to correct Dockerfile syntax and have them work correctly? But as a novice with Docker I could be overestimating the difficulty of that task.
EDIT: I suppose another valid reason for having the Dockerfile with each setup step is for documentation as to the initial state of the container. As opposed to being given a container in a certain state, but not necessarily having a way to know what all was done from the container's image base state.
But can you share containers? If you can only share images with Docker then I'd understand why one would want every step of their container setup listed in a Dockerfile.
Strictly speaking, no. However, you can create a new image from an existing container using the docker commit command:
$ docker commit <container-name> <image-name>
This command will create a new image from the existing container, which you can then push to and pull from registries, export and import, and create new containers from.
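A short sketch of that flow (the container and image names are placeholders):
docker run -it --name setup-box ubuntu bash        # configure things interactively inside the container, then exit
docker commit setup-box myrepo/my-configured-image # snapshot the container's filesystem as a new image
docker push myrepo/my-configured-image             # share it via a registry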
The reason I ask is because I imagine there is some amount of headache involved with porting shell commands, file changes for configs, etc. to correct Dockerfile syntax and have them work correctly? But as a novice with Docker I could be overestimating the difficulty of that task.
If you're already using some other mechanism for automated configuration, you can simply integrate your existing automation into the Docker build. For instance, if you already configure your machines with shell scripts, add a step in your Dockerfile that copies your install script into the image and executes it. In theory, this can also work with configuration management utilities like Puppet, Salt, and others.
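For example (the script name and base image are hypothetical), an existing shell-based setup can be reused almost verbatim:
FROM ubuntu:16.04
# reuse the install script you already maintain
COPY install.sh /tmp/install.sh
RUN chmod +x /tmp/install.sh && /tmp/install.sh && rm /tmp/install.sh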
EDIT: I suppose another valid reason for having the Dockerfile with each setup step is for documentation as to the initial state of the container. As opposed to being given a container in a certain state, but not necessarily having a way to know what all was done from the container's image base state.
True. As mentioned in the comments, there are clear advantages to having an automated and reproducible build of your image. If you build your containers manually and then create an image with docker commit, you don't necessarily know how to re-build this image at a later point in time (which may become necessary when you want to release a new version of your application or re-build the image on top of an updated base image).
