Docker container export and import into an image - docker

I have started using Docker recently. Initially the image was 2-3 GB in size. I am saving the work done in the container back into image(s), so the image size has grown significantly (~6 GB). I want to delete the images while preserving the work done. When I export the container to a gzipped file, the size of that file is ~1 GB. Will it work fine if I delete the current image I have now (~6 GB) and create a new one from the gzipped file with docker import? The description of the import command says it will create a filesystem image; is that a Docker image or something else, i.e. will I be able to create containers from that image?

You can save the image (see more details here), for example:
docker save busybox > busybox.tar
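You can later restore the saved image, for example after deleting the local copy or on another machine, by loading the tarball back (the file name matches the save above):
docker load < busybox.tar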
Another alternative is to write a Dockerfile that contains all the instructions necessary to build your image. The main advantage is that it is a text file that can be version controlled, so you can keep track of all the changes you make to your image. Another advantage is that you can deploy the image elsewhere without having to copy images across systems: instead of copying a 1 GB or 6 GB image, you just copy the Dockerfile and build the image on the new host. More details about Dockerfiles can be found here

Related

Will Docker assist as a high level compressor?

I have a package of around 2+ GB on an Ubuntu machine. I created a Docker image from a Dockerfile and its size is around 86 MB. Now I have created another image by updating the Dockerfile with a COPY instruction that copies the package from the Ubuntu machine into the image while building it.
The new image is around 2 GB, since it now contains the package as well. I want to know whether Docker helps in any way to reduce the size of the package once it is put into an image.
I want the image to contain the package, which is around 2 GB, but I don't want the image size to exceed 200 MB.
Docker will not compress an image's data for you. Hoping for 10x compression on an arbitrary installed package is rather optimistic, Docker or otherwise.
While Docker uses some clever and complicated kernel mechanisms, in most cases a file in a container maps to an ordinary file on disk in a strange filesystem layout. If you're trying to install a 2 GB package into an image, it will require 2 GB of local disk space. There are some mechanisms to share data that can help reduce overall disk usage (if you run 10 containers based on that image, they will share the 2 GB base image and not use extra disk space) but no built-in compression.
One further note: once you've taken action in a Dockerfile to use space in some form, it's permanently used. Each line in the Dockerfile creates a new "layer" or separate image which records the changes from the previous layer. If you're trying to install a package that's not in a repository of some sort, you're stuck with its space utilization in the image, unless you can use multi-stage builds for it:
FROM ubuntu:18.04
COPY package.deb .
# At this point there is a layer containing the .deb file and
# it uses space in the final image.
RUN dpkg --install package.deb && rm package.deb
# The file isn't "in the image", but the image remembers adding it
# and then removing it, so it still uses space.
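By contrast, here is a minimal multi-stage sketch (the installed path /opt/mypackage is an assumption; adjust it to whatever the package actually installs). The first stage installs the .deb, and only the installed files are copied into the final image, so the .deb itself never occupies a layer there:
FROM ubuntu:18.04 AS build
COPY package.deb .
RUN dpkg --install package.deb && rm package.deb

FROM ubuntu:18.04
# Only the installed files are copied across; the .deb never existed in this stage.
COPY --from=build /opt/mypackage /opt/mypackage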
As mentioned by David, this is not possible with Docker itself, which uses an overlay (or similar) layered filesystem representation.
There is a workaround to keep the image size low (although the size of the container will still be big!), but it will increase the startup time of the app in the container and may add some complications to the arguments you pass to the app. A possible way to reduce the size (see the sketch below):
compress the package before adding it
copy the compressed file while building the image
use a script as the ENTRYPOINT/CMD (whatever you use to start the container), which should perform
decompression [you'll need the decompression tool installed in the container] and
start the application (you can forward the arguments received by docker run to the application as "$@", or even validate them beforehand)
Check by compressing your package file with gzip / 7zip / bzip2 or any compressor of your choice whether the output file really is significantly smaller. If it can't be compressed enough, this workaround won't help.
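A rough sketch of that workaround, assuming a hypothetical package.tar.gz and an application at /opt/package/bin/app (all names and paths are made up for illustration):
FROM ubuntu:18.04
# Ship only the compressed package; it is unpacked at container start.
COPY package.tar.gz /opt/package.tar.gz
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
with entrypoint.sh along these lines:
#!/bin/sh
# Decompress at startup, then hand all docker run arguments to the app.
tar -xzf /opt/package.tar.gz -C /opt
exec /opt/package/bin/app "$@"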

The best way to create a docker image "offline installer"

I use a docker-compose file to get an Elasticsearch / Logstash / Kibana stack. Everything works fine; the
docker-compose build
command creates three images, about 600 MB each, downloading the needed layers from the Docker registry.
Now I need to do the same on a machine with no Internet access, so downloading from registries is impossible there. I need to create an "offline installer". The best way I found is
docker save image1 image2 image3 -o archivebackup.tar
but the created file is almost 2 GB. During the
docker-compose build
command some data is downloaded from the Internet, but it is definitely less than 2 GB.
What is a better way to create my "offline installer", to avoid making it so big?
The save command is the way to go for running docker images offline.
The size difference you are noticing is because when you pull images from a registry, some layers may already exist locally and are therefore not pulled; you only download the layers that you don't have yet.
On the other hand, when you save the image to a tar, all the layers need to be stored.
The best way to create the Docker offline installer is to:
Get the CI/CD pipeline to generate the TAR files as part of the build process.
Create a local folder with the required TAR files.
Write a script to load these TAR files on the target machine.
The same script can then fire the docker-compose up -d command to bring up the whole service ecosystem (a sketch follows below).
Note: it is important to load the images before bringing up the services.
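A rough sketch of such a load script (the folder and file names are assumptions):
#!/bin/sh
# Load every saved image archive from the local folder,
# then bring up the whole stack.
for f in ./images/*.tar; do
    docker load -i "$f"
done
docker-compose up -d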
Regarding the size issue, the answer by Yamenk points to the reason why the saved file is bigger: when pulling, Docker skips the shared layers that are already present locally, but docker save has to store every layer.

How can I see Dockerfile for each docker image?

I have the following docker images.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest 48b5124b2768 2 months ago 1.84 kB
docker/whalesay latest 6b362a9f73eb 22 months ago 247 MB
Is there a way I can see the Dockerfile of each docker image on my local system?
The answer at Where to see the Dockerfile for a docker image? does not help me because it does not exactly show the Dockerfile but the commands run to create the image. I want the Dockerfile itself.
As far as I know, no, you can't. Because a Dockerfile is only used for building the image, it is not packed with the image itself; that means you would have to reverse engineer it. You can use docker inspect on an image or container to get some insight into how it is configured. The layers of an image are also visible, since you pull them when you pull a specific image, so that is no secret either.
However, you can usually find the Dockerfile in the repository of the image on Docker Hub. I can't say most repositories have Dockerfiles attached, but most of the ones I have seen do.
Different repository maintainers may opt for different ways to document the Dockerfiles. You can see a Dockerfile tab on the repository page if automated builds are set up. But when multiple parallel versions are available (like for Ubuntu), maintainers usually put links to the Dockerfiles for the different versions in the description. If you take a look here: https://hub.docker.com/_/ubuntu/, under "Supported tags" you can see there are links to multiple Dockerfiles, one for each Ubuntu version.
As images are downloaded directly from Docker Hub, only the image itself is pulled to your machine. If you want to see the Dockerfile, you can go to Docker Hub and search for the image name and version in the tag format (e.g. ubuntu:14.04); this opens the image page along with the Dockerfile details. Also keep in mind that you can only see the Dockerfile if the owner of the image shared it; otherwise not. Most official images will not provide you with a Dockerfile.
Hope it helps!
You can also regenerate the dockerfile from an image or use the docker history <image name> command to see what is inside.
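For example, to see the full commands recorded in each layer (the output format varies by Docker version):
docker history --no-trunc docker/whalesay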
check this: Link to answer
TL;DR
So if you have a docker image that was built from a Dockerfile, you can recover most of this information (all except the original FROM command, which is important, I'll grant that; but you can often guess it, especially by entering the container and asking "what OS are you?"). However, the maker of the image could have taken manual steps that you'd never know about anyway, plus they COULD just export an image and re-import it, and there would be no intermediate layers at that point.
One approach could be to save the image to an image.tar file, then extract the archive and explore whether you can find a Dockerfile in any of the layer directories.
docker image save -o hello.tar hello-world
This will output a hello.tar file; hello.tar is the output archive and hello-world is the name of the image you are saving.
After that, extract the archive and explore the image layer directories. You may find a Dockerfile in one of them.
However, note one thing: if the Dockerfile was excluded via .dockerignore, or was simply never copied into the image during the build, you will not find it with this approach.
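A sketch of the extraction step (the directory name is arbitrary, and the exact layout of the archive varies by Docker version):
mkdir hello-extracted
tar -xf hello.tar -C hello-extracted
# each layer appears as its own directory or tarball that you can inspect further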

docker - how can we export/import (or save/load) only the new changes?

I'm new to Docker; could anyone help with the query below?
The server has a Docker image of about 1 GB, Image:ver1 [this image is stored as a .tar file on the server].
On an Ubuntu PC I downloaded the tar file from the server and loaded/imported the image [Image:ver1] with Docker.
A new Image:ver2 is available on the server; its size is still 1 GB, but the difference from ver1 is only 10 MB.
Q1: If it's possible to import/load the new image [Image:ver2] from the server, how can we export/import (or save/load) only the new changes [i.e. 10 MB]?
Q2: If we are able to apply those changes on top of the existing image [i.e. Image:ver1], what are the steps to do so?
Docker is a layer-based system, and for each pull it only downloads the layers that have changed. For example, suppose an image contains a 1 GB layer and you add 500 MB of data on top of it; a docker pull will then only fetch the new layers, i.e. the delta between the two images. So you are safe, it won't pull everything again.
However, you should be careful while writing a Dockerfile, because every line in a Dockerfile is stored as a layer. If you have 10 layers in your Dockerfile and you change the 5th one, then all the layers after the 5th will be rebuilt and pulled again. That is the only catch when using Docker.
Apart from that, it will always pull only the delta of changes for each pull.
If you want to save/load tar files of docker images, there's no option to export a partial image. You can send over the full image, move your data into an external volume that isn't transferred this way, or use a docker registry.
The latter is relatively easy to set up; Docker provides an image with which you can run your own private registry. Pushes and pulls to a registry only send the changed layers, so you can make use of layer caching and structure your Dockerfiles to minimize the number of changed layers.
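A minimal sketch of the registry approach (the host name, port, and image names are assumptions; a plain-HTTP registry may also need to be listed under insecure-registries in the daemon configuration):
# on a machine both sides can reach, run the registry image
docker run -d -p 5000:5000 --name registry registry:2
# tag and push the new version; only the changed layers are uploaded
docker tag image:ver2 myhost:5000/image:ver2
docker push myhost:5000/image:ver2
# on the target machine, only the changed layers are downloaded
docker pull myhost:5000/image:ver2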
OK, I've built a tool that creates a diff (of the top layers) between two versions of a docker image, layer by layer, as a tarball, and later inflates the original image from it.
Note: it works only for changes in the top layers.
It is a 4-step process:
docker inspect the old image and record its layers' hashes in a JSON file
Prepare the diff based on the new image and the hashes of the old (existing) layers
Transfer the diff to the target machine
Inflate the target image's tar based on the diff and the old image

What is the overhead of creating docker images?

I'm exploring using docker so that we deploy new docker images instead of specific file changes, so all of the needs of the application come with each deployment etc.
Question 1:
If I add a new application file, say 10 MB, to a docker image, then when I deploy the new image using the tools in Docker Toolbox, will this require the deployment of an entirely new image to my containers, or do Docker deployments just take the difference between the two, similar to git version control?
Another way to put it: I looked at a list of Docker base images and saw a version of Ubuntu that is 188 MB. If I commit a new application to a Docker image using this base image, will my Docker containers need to pull the full 188 MB, which they are already running, plus the application, or is there a differential way of just getting what has changed?
Supplementary Question
Am I correct in assuming that when using Docker, deploying images is the intended approach? Meaning any new change should require a new image deployment so that images are treated as immutable? When I was using AWS we followed this approach with AMIs (Amazon Machine Images), but storing AMIs had low overhead; for Docker I don't know yet.
Or is it better practice to deploy Dockerfiles and have the new image be built on the target host itself?
Docker uses a layered union filesystem, only one copy of a layer will be pulled by a docker engine and stored on its filesystem. When you build an image, docker will check its layer cache to see if the same parent layer and same command have been used to build an existing layer, and if so, the cache is reused instead of building a new layer. Once any step in the build creates a new layer, all following steps will create new layers, so the order of your Dockerfile matters. You should add frequently changing steps to the end of the Dockerfile so the earlier steps can be cached.
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
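A small illustrative Dockerfile sketch of that ordering (the package and paths are placeholders): the rarely-changing steps come first so their layers stay cached, and only the final layer is rebuilt and pushed when the application changes.
FROM ubuntu:16.04
# Rarely-changing dependencies first; these layers are cached and shared.
RUN apt-get update && apt-get install -y --no-install-recommends python3 && rm -rf /var/lib/apt/lists/*
# Frequently-changing application code last; only this layer changes between builds.
COPY ./app /app
CMD ["python3", "/app/main.py"]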
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
You can also put in a little work to create smaller images.
You can use Alpine or Busybox instead of the bigger Ubuntu, Debian or Bitnami (Debian light) images.
A smaller image is also more secure, as fewer tools are available in it.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
There are two great tools for making smaller docker images:
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
Some examples with docker-slim:
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/ shows a size of 357.3 MB before, and 18.73 MB after using docker-slim.
For simh, https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/ is 5.388 MB, while the original k3ck3c/simh_bitnami is 88.86 MB.
A popular netcat image, chilcano/netcat, is 135.2 MB, whereas a netcat image based on Alpine is 7.812 MB and one based on busybox needs only 2 or 3 MB.
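As a sketch of how small such an image can be, a minimal Alpine-based netcat Dockerfile could look like this (the exact Alpine tag and package name are assumptions; Alpine also ships a nc applet inside busybox):
FROM alpine:3.18
RUN apk add --no-cache netcat-openbsd
ENTRYPOINT ["nc"]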

Resources