I'm trying to reduce the size of my docker image which is using Centos 7.2
The issue is that it's 257MB which is too high...
I have followed the best practices to write Dockerfile in order to reduce the size...
Is there a way to modify the image after the build and rebuild that image to see the size reduced ?
First of all if you want to reduce an OS size, don't start with big one like CentOS, you can start with alpine which is small
Now if you are still keen on using CentOS, do the following:
docker run -d --name centos_minimal centos:7.2.1511 tail -f /dev/null
This will start a command in the background. You can then get into the container using
docker exec -it centos_minimal bash
Now start removing packages that you don't need using yum remove or yum purge. Once you are done you can commit the image
docker commit centos_minimal centos_minimal:7.2.1511_trial1
Experimental Squash Image
Another option is to use an experimental feature of the build command. In this you can have a dockerfile like below
FROM centos:7
RUN yum -y purge package1 package2 package2
Then build this file using
docker build --squash -t centos_minimal:squash .
For this you need to add "experimental": true to your /etc/docker/daemon.json and then restart the docker server
It is possible, but not at all elegant. Just like you can add software to the base image, you could also remove:
FROM centos:7
RUN yum -y update && yum clean all
RUN yum -y install new_software
RUN yum -y remove obsolete_software
Ask yourself: does your OS have to be CentOS? Then I would recommend you use the default installation and make sure your have enough disk space and memory.
If it does not need to be CentOS, you should rather start with a more minimalistic image. See the discussion here:
Which Docker base image should be used to install Apps in a container without any additional OS?
Related
At my company, we have hardened containers created by the security team, and I would like to extend the hardened container with another docker image. For example, if we have a hardened Debian container, and I want to add Apache, how do I do this?
I understand I can use FROM to use a base, but the examples I've seen, don't add another level of published images to an existing base, but specific commands. Do I just go to the official Dockerhub Apache (HTTP) image and just copy and paste the commands from the github repo? I'm assuming there's a cleaner way (but not sure if there is).
For example, do I
FROM mycompanyprivaterepo/Debian:latest
//some command?
FROM httpd
docker build -t mynewimagewithapache
UPDATE:
After attempting via apt-get apache2 per some comments, it kept hanging on interactive questions, Solved with the help of comments using:
My Dockerfile:
FROM myprivaterepo/hardened-ubuntu
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get -qq install apache2
and building via:
$ docker build -t hardened-ubuntu-apache
Well, as far as I understood, you cannot use multi-stage builds and just
COPY --from=base-image /path/to/file/you-are-interested-in /path/inside/new-stage-image
in order to copy the required data to your preferred image. If this is the case, then you have to create your own Dockerfile with base image as your company mycompanyprivaterepo/Debian:latest, and then just create some layers on top of it in order to install required software, using RUN.
I have noticed that many Dockerfiles try to minimize the number of instructions by several UNIX commands in a single RUN instruction. So is there any reason?
Also is there any difference in the outcomes between the two Dockerfiles below?
Dockerfile1
FROM ubuntu
MAINTAINER demousr#example.com
RUN apt-get update
RUN apt-get install –y nginx
CMD ["echo", "Image created"]
Dockerfile2
FROM ubuntu
MAINTAINER demousr#example.com
RUN apt-get update && apt-get install –y nginx
CMD ["echo", "Image created"]
Roughly speaking, a Docker image contains some metadata & an array of layers, and a running container is built upon these layers by adding a container layer (read-and-write), the layers from the underlying image being read-only at that point.
These layers can be stored in the disk in different ways depending on the configured driver. For example, the following image taken from the official Docker documentation illustrates the way the files changed in these different layers are taken into account with the OverlayFS storage driver:
Next, the Dockerfile instructions RUN, COPY, and ADD create layers, and the best practices mentioned on the Docker website specifically recommend to merge consecutive RUN commands in a single RUN command, to reduce the number of layers, and thereby reduce the size of the final image:
https://docs.docker.com/develop/dev-best-practices/
[…] try to reduce the number of layers in your image by minimizing the number of separate RUN commands in your Dockerfile. You can do this by consolidating multiple commands into a single RUN line and using your shell’s mechanisms to combine them together. […]
See also: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
Moreover, in your example:
RUN apt-get update -y -q
RUN apt-get install -y nginx
if you do docker build -t your-image-name . on this Dockerfile, then edit the Dockerfile after a while, add another package beyond nginx, then do again docker build -t your-image-name ., due to the Docker cache mechanism, the apt-get update -y -q won't be executed again, so the APT cache will be obsolete. So this is another upside for merging the two RUN commands.
In addition to the space savings, it's also about correctness
Consider your first dockerfile (a common mistake when working with debian-like systems which utilize apt):
FROM ubuntu
MAINTAINER demousr#example.com
RUN apt-get update
RUN apt-get install –y nginx
CMD ["echo", "Image created"]
If two or more images follow this pattern, a cache hit could cause the image to be unbuildable due to cached metadata
let's say I built an image which looks similar to that ~a few weeks ago
now I'm building this image today. there's a cache present up until the RUN apt-get update line
the docker build will reuse that cached layer (since the dockerfile and base image are identical) up to the RUN apt-get update
when the RUN apt-get install line runs, it will use the cached apt metadata (which is now weeks out of date and likely will error)
I want to install net-tools on one of my running containers, which is running busybox:uclibc image. But this image doesn't have any package manager like apt-get or apk. Is there a way to do it or should I just make changes to my image?
Anything based on Busybox doesn't have a package manager. It's a single binary with a bunch of symlinks into it, and the way to add software to it is to write C code and recompile. That is, /bin/busybox literally is ls and sed and sh and cp and ...
To install my microservice binaries I need a centos. And since I have 20 microservice I'm trying to find a way to optimize the images size so I'm wondering if there's a way to create a docker image without os and at the moment of deployment Docker takes the OS Layer from cache to put it in all the images.. I'm a beginner so I don't know if I'm clear in my statements ?
Yes, look at the scratch keyword (docs):
You can use Docker’s reserved, minimal image, scratch, as a starting
point for building containers.
Also you may find useful using multi-stage builds.
An example:
FROM scratch
ADD hello /
FROM fedora
RUN yum -y update && yum clean all
RUN yum -y install nginx
A beginner's question; how does Docker handle underlying operating system variations when using the RUN command?
Let's take, for example, a very simple Official Docker Hub Dockerfile, for JRE 1.8. When it comes to installing the packages for java, the Dockerfile uses apt-get:
RUN apt-get update && apt-get install -y --no-install-recommends ...
To the untrained eye, this appears to be a platform-specific instruction that will only work on Debian-based operating systems (or at least ones with APT installed).
How exactly would this work on a CentOS installation, for example, where the package manager would be yum? Or god forbid, something like Solaris.
If this pattern of using RUN to fork arbitrary shell commands is prevalent in docker, how does one avoid inter-platform, or even inter-version, dependencies?
i.e. what if the Dockerfile writer has a newer version of (say) grep than I do, and they've used some new CLI flag that isn't available on earlier versions?
The only two outcomes from this can be: (1) RUN command exits with non-zero exit code (2) the Dockerfile changes the installed version of grep before running the command.
The common point shared by all Dockerfiles is the FROM statement. It is the first line in the file and indicates the parent Docker image you're building on. A typical base image could be one with Ubuntu (i.e.: https://hub.docker.com/_/ubuntu/). The snippet you share in your question would fit well in an Ubuntu image (with apt-get) but not in a CentOS image.
In summary, you're installing docker in your CentOS system, but you're building a Docker image with Ubuntu in it.
As I commented in your question, you can add FROM statement to specify which relaying OS you want. for example:
FROM docker.io/centos:latest
RUN yum update -y
RUN yum install -y java
...
now you have to build/create the image with:
docker build -t <image-name> .
The idea is that you'll use the OS you are familiar with (for example, CentOS) and build an image of it. Now, you can take this image and run it above Ubuntu/CentOS/RHEL/whatever... with
docker run -it <image-name> bash
(You just need to install docker in the desired OS.