How to push only what's changed with docker push?

A. Here is how I created the image:
Got latest Ubuntu image
Ran as container and attached to it
Cloned source code from git inside docker container
Tagged and pushed docker image to my registry
B. And from a different machine I pulled, changed and pushed it by doing:
Docker pull from the registry
Start container with the pulled image and attach to it
Change something in the cloned git directory
Stop container, tag and push it to registry
The issue I'm seeing is that every time B is repeated, Docker tries to upload ~600MB (the public image layer) to the registry, which takes a long time in my case.
Is there any way to avoid uploading the whole 600MB and instead push only the directory that has changed?
What am I doing wrong? How do you guys use docker for frequent pushes?

Docker will only push changed layers, so it looks as though something in your workflow is not quite right. It will be much clearer if you use a Dockerfile, as each instruction explicitly creates a layer, but even with docker commit the results should be the same.
Example - run a container from the ubuntu image, run apt-get update, and then commit the container to a new image. Now run docker history and you'll see that the new image adds a layer on top of the base image, which holds the additional state from running the APT update:
> docker history sixeyed/temp1
IMAGE          CREATED              CREATED BY                                      SIZE       COMMENT
2d98a4114b7c   About a minute ago   /bin/bash                                       22.2 MB
14b59d36bae0   7 months ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      7 months ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
<missing>      7 months ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
<missing>      7 months ago         /bin/sh -c #(nop) ADD file:620b1d9842ebe18eaa   187.8 MB
In this case, the diff between ubuntu and my temp1 image is the 22MB layer 2d98.
Now if I run a new container from temp1, create an empty file and run docker commit to create a new image, the new layer only has the changed file:
> docker history sixeyed/temp2
IMAGE          CREATED              CREATED BY                                      SIZE       COMMENT
e9ea4b4963e4   45 seconds ago       /bin/bash                                       0 B
2d98a4114b7c   About a minute ago   /bin/bash                                       22.2 MB
14b59d36bae0   7 months ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      7 months ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
<missing>      7 months ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
<missing>      7 months ago         /bin/sh -c #(nop) ADD file:620b1d9842ebe18eaa   187.8 MB
When I push the first image, only the 22MB layer will get uploaded - the others are mounted from ubuntu, which is already in the Hub. If I push the second image, only the changed layer gets pushed - the temp1 layer is mounted from the first push:
> docker push sixeyed/temp2
The push refers to a repository [docker.io/sixeyed/temp2]
f741d3d3ee9e: Pushed
64f89772a568: Mounted from sixeyed/temp1
5f70bf18a086: Mounted from library/ubuntu
6f32b23ac95d: Mounted from library/ubuntu
14d918629d81: Mounted from library/ubuntu
fd0e26195ab2: Mounted from library/ubuntu
So if your pushes are uploading 600MB, you're either making 600MB of changes to the image, or your workflow is preventing Docker from using layers correctly.
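To make the layer reuse concrete, here is a minimal Python sketch of the decision a push has to make: compare the image's layers with what the registry already holds and upload only the difference. This is a simplified model, not Docker's actual client code. The function name layers_to_upload and the registry_blobs set are hypothetical; the layer IDs are the ones from the push output above.

```python
def layers_to_upload(image_layers, registry_blobs):
    """Return the layers of an image that the registry does not yet hold.

    Simplified model: a layer is identified by its ID and is uploaded
    only if the registry has no blob with that ID.
    """
    return [layer for layer in image_layers if layer not in registry_blobs]


# Layer IDs from the `docker push sixeyed/temp2` output (top layer first).
temp2_layers = [
    "f741d3d3ee9e",  # new layer committed for temp2
    "64f89772a568",  # layer added by temp1
    "5f70bf18a086",
    "6f32b23ac95d",
    "14d918629d81",
    "fd0e26195ab2",
]

# Assumption: the registry already holds everything pushed for temp1 and ubuntu.
registry_blobs = set(temp2_layers[1:])

to_upload = layers_to_upload(temp2_layers, registry_blobs)
print(to_upload)
```

If a push uploads far more than the one new layer, the equivalent of registry_blobs on your side is missing the base layers, which usually points at the workflow rather than at Docker.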

Docker already uploads only the changed layers.
This is similar to how docker build only rebuilds the layers whose cache has been invalidated. Of course, it has to check with the registry which layers are already available (these are reported as "Already pushed"). And if you have changed the order of the instructions in your Dockerfile, the resulting layers are entirely new, and all of them will be re-uploaded. For example:
FROM ubuntu
RUN echo "hello"
EXPOSE 80
and
FROM ubuntu
EXPOSE 80
RUN echo "hello"
These two images are miles apart even though the end result behaves the same, so take care with the order of your instructions.
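One way to picture why the order matters is a toy model in which each step gets a key derived from its parent's key plus its own instruction text. The Python sketch below is only that, a simplified model of a build cache (the function layer_keys and the hashing scheme are assumptions for illustration, not Docker's real algorithm), applied to the two Dockerfiles above.

```python
import hashlib


def layer_keys(instructions):
    """Derive one key per instruction; each key depends on the previous key.

    Simplified model of a build cache: the key for a step is a hash of
    the parent key plus the instruction text.
    """
    keys = []
    parent = ""
    for instruction in instructions:
        data = (parent + "\n" + instruction).encode()
        parent = hashlib.sha256(data).hexdigest()[:12]
        keys.append(parent)
    return keys


first = layer_keys(["FROM ubuntu", 'RUN echo "hello"', "EXPOSE 80"])
second = layer_keys(["FROM ubuntu", "EXPOSE 80", 'RUN echo "hello"'])

# Only the shared FROM step keeps the same key; every later step differs.
shared = [a == b for a, b in zip(first, second)]
print(shared)
```

In this model, once one step differs, every step after it gets a new key as well, which is why a reordered Dockerfile ends up with nothing to reuse past the base image.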

Related

How to determine the specific version of a Docker image built from a latest/stable tag?

I have a Docker image that was created roughly a year ago. The Dockerfile contains:
FROM docker:stable
How can I determine the actual version of the docker image that stable was referring to back when the image was built?
Edit: What I want to do, in a nutshell, is replace FROM docker:stable with FROM docker:X.Y.Z where X.Y.Z is the version tag that "stable" was pointing to a year ago when the image was originally built.

As suggested by this answer
docker inspect --format='{{index .RepoDigests 0}}' $IMAGE
This will give you the sha256 hash of the image.
Then you can use a service like MicroBadger to get more info about that specific build.
If you want to recreate the Dockerfile you can use docker history to examine the layer history:
$ docker history docker
IMAGE          CREATED              CREATED BY                                      SIZE       COMMENT
3e23a5875458   8 days ago           /bin/sh -c #(nop) ENV LC_ALL=C.UTF-8            0 B
8578938dd170   8 days ago           /bin/sh -c dpkg-reconfigure locales && loc      1.245 MB
be51b77efb42   8 days ago           /bin/sh -c apt-get update && apt-get install    338.3 MB
4b137612be55   6 weeks ago          /bin/sh -c #(nop) ADD jessie.tar.xz in /        121 MB
750d58736b4b   6 weeks ago          /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
511136ea3c5a   9 months ago                                                         0 B
Keep in mind that if the image has been manually tampered with, I don't know how reliable this output would be.
Finally if you want to go full hacker mode, this old thread on the Docker community forums has some info.
I'm not sure how you can get the tag, because I don't believe this is stored in the image itself, but in the repository. So you'd have to query the repository itself, or get a full list of image history and go detective on it.

Is there a way to tag a previous layer in a docker image or revert a commit?

Let's say there is a docker image, someone makes changes to it and then pushes it up to a docker repository. I then pull down the image. Is there a way to then take that image and run a container from a previous layer? Run the version before the changes were made.
If I run docker history it will look something like this:
docker history imagename:tag
IMAGE          CREATED              CREATED BY                                      SIZE       COMMENT
3e23a5875458   8 days ago           /bin/sh -c #(nop) ENV LC_ALL=C.UTF-8            0 B
<missing>      8 days ago           /bin/sh -c dpkg-reconfigure locales && loc      1.245 MB
<missing>      8 days ago           /bin/sh -c apt-get update && apt-get install    338.3 MB
<missing>      6 weeks ago          /bin/sh -c #(nop) ADD jessie.tar.xz in /        121 MB
<missing>      6 weeks ago          /bin/sh -c #(nop) MAINTAINER ssss <ad           0 B
<missing>      9 months ago                                                         0 B
It seems as if I could run an earlier version if I figure out a way to somehow tag or identify previous layers of the image.

You can, by tagging the build layers of the image, if you have access to them. As described here.
In your case, what could be happening is that from v1.10.0 onward the Docker engine changed the way it handles content addressability. This is being heavily discussed here.
It means that you won't have access to the build layers unless you built the image on the current machine, or exported and loaded them by combining:
docker save imagename build-layer1 build-layer2 build-layer3 > image-caching.tar
docker load -i image-caching.tar
A user has posted a handy way to save that cache in the discussion I've mentioned previously:
docker save imagename $(sudo docker history -q imagename | tail -n +2 | grep -v \<missing\> | tr '\n' ' ') > image-caching.tar
This should collect all the build layers of the given image and save them to a cache tar file.
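If the shell pipeline is hard to read, the filtering it does can be sketched in Python. The function below mirrors tail -n +2 followed by grep -v on the output of docker history -q: it drops the first line (the image's own ID) and any <missing> entries. The name build_layer_ids and the sample input are made up for illustration; the sketch does not call Docker itself.

```python
def build_layer_ids(history_output):
    """Mimic `tail -n +2 | grep -v "<missing>"` on `docker history -q` output.

    Drops the first line (the image's own ID), blank lines, and every
    line that contains <missing>; returns the remaining IDs in order.
    """
    lines = history_output.splitlines()
    return [
        line.strip()
        for line in lines[1:]
        if line.strip() and "<missing>" not in line
    ]


# Hypothetical `docker history -q imagename` output, newest layer first.
sample = "3e23a5875458\n8578938dd170\n<missing>\n4b137612be55\n"

ids = build_layer_ids(sample)
print(" ".join(ids))
```

The space-joined result is what the original one-liner passes to docker save after the image name.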

Is there a way to add only changed files to a docker image as a new layer - without resorting to docker commit?

TL;DR
Running COPY . /app on top of an image with only slightly outdated source code creates a new layer as large as the whole source code, even when only a few bytes have changed.
Is there a way to add only changed files to this docker image as a new layer - without resorting to docker commit?
Long version:
When deploying our application to production, we need to add the source code to the image. A very simple Dockerfile is used for this:
FROM neam/dna-project-base-debian-php:0.6.0
COPY . /app
Since the source code is huge (1.2 GB), this makes for quite a hefty push upon each deploy:
$ docker build -f .stack.php.Dockerfile -t project/project-web-src-php:git-commit-17c279b .
Sending build context to Docker daemon 1.254 GB
Step 0 : FROM neam/dna-project-base-debian-php:0.6.0
---> 299c10c416fc
Step 1 : COPY . /app
---> 78a30802804a
Removing intermediate container 13b49c323bb6
Successfully built 78a30802804a
$ docker tag -f project/project-web-src-php:git-commit-17c279b tutum.co/project/project-web-src-php:git-commit-17c279b
$ docker login --email=tutum-project@project.com --username=project --password=******** https://tutum.co/v1
WARNING: login credentials saved in /home/dokku/.docker/config.json
Login Succeeded
$ docker push tutum.co/project/project-web-src-php:git-commit-17c279b
The push refers to a repository [tutum.co/project/project-web-src-php] (len: 1)
Sending image list
Pushing repository tutum.co/project/project-web-src-php (1 tags)
Image a604b236bcde already pushed, skipping
Image 1565e86129b8 already pushed, skipping
...
Image 71156b357f2f already pushed, skipping
Image 299c10c416fc already pushed, skipping
78a30802804a: Pushing [=========> ] 234.2 MB/1.254 GB
Upon the next deploy, we only want to add the changed files to the image, but lo and behold, running COPY . /app on top of the previously built image actually requires us to push 1.2 GB worth of source code AGAIN, even when we only change a few bytes worth of source code:
New Dockerfile (.stack.php.git-commit-17c279b.Dockerfile):
FROM project/project-web-src-php:git-commit-17c279b
COPY . /app
After changing a few files, adding some text and code, then building and pushing:
$ docker build -f .stack.php.git-commit-17c279b.Dockerfile -t project/project-web-src-php:git-commit-17c279b-with-a-few-changes .
Sending build context to Docker daemon 1.225 GB
Step 0 : FROM project/project-web-src-php:git-commit-17c279b
---> 4dc643a45de3
Step 1 : COPY . /app
---> ecc7adc194c4
Removing intermediate container cb3e87c6cb7a
Successfully built ecc7adc194c4
$ docker tag -f project/project-web-src-php:git-commit-17c279b-with-a-few-changes tutum.co/project/project-web-src-php:git-commit-17c279b-with-a-few-changes
$ docker push tutum.co/project/project-web-src-php:git-commit-17c279b-with-a-few-changes
The push refers to a repository [tutum.co/project/project-web-src-php] (len: 1)
Sending image list
Pushing repository tutum.co/project/project-web-src-php (1 tags)
Image 1565e86129b8 already pushed, skipping
Image a604b236bcde already pushed, skipping
...
Image fe64bff23cf8 already pushed, skipping
Image 71156b357f2f already pushed, skipping
ecc7adc194c4: Pushing [==> ] 68.21 MB/1.225 GB
There is a workaround to achieve small layers, described in Updating docker images with small changes using commits, which involves launching an rsync process within the image and then using docker commit to save the new contents as a new layer. However (as mentioned in that thread), this is unorthodox since the image is not built from a Dockerfile, and we prefer an orthodox solution that does not rely on docker commit.
Is there a way to add only changed files to this docker image as a new layer - without resorting to docker commit?
Docker version 1.8.3

Actually, the solution IS to use COPY . /app as the OP is doing. There is, however, an open bug causing this not to work as expected on most systems.
The only currently feasible workaround seems to be to use rsync to analyze the differences between the old and new images prior to pushing the new one, then use the changelog output to generate a tar file containing the relevant changes, which is subsequently copied into a new image layer.
This way, the layer size becomes a few bytes or kilobytes for smaller changes instead of 1.2 GB every time.
I put together documentation and scripts to help out with this over at https://github.com/neam/docker-diff-based-layers.
The end results are shown below:
Verify that basing the project images on the revision 1 image tag contents does not lead to the desired outcome
Verify that subsequent COPY . /app commands re-add all files in every layer instead of only the files that have changed:
docker history sample-project:revision-2
Output:
IMAGE          CREATED              CREATED BY                                      SIZE       COMMENT
4a3115eaf267   3 seconds ago        /bin/sh -c #(nop) COPY dir:61d102421e6692b677   16.78 MB
d4b30af167f4   25 seconds ago       /bin/sh -c #(nop) COPY dir:68b8f374d8731b8ad8   16.78 MB
c898fe1daa44   2 minutes ago        /bin/sh -c apt-get update && apt-get install    10.77 MB
39a8a358844a   4 months ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
b1dacad9c5c9   4 months ago         /bin/sh -c #(nop) ADD file:5afd8eec1dc1e7666d   125.1 MB
Even though we added/changed only a few bytes, all files are re-added and 16.78 MB is added to the total image size.
Also, the file(s) that we removed did not get removed.
Create an image with an optimized layer
export RESTRICT_DIFF_TO_PATH=/app
export OLD_IMAGE=sample-project:revision-1
export NEW_IMAGE=sample-project:revision-2
docker-compose -f rsync-image-diff.docker-compose.yml up
docker-compose -f shell.docker-compose.yml -f process-image-diff.docker-compose.yml up
cd output; docker build -t sample-project:revision-2-processed .; cd ..
Verify that the processed new image has smaller sized layers with the changes:
docker history sample-project:revision-2-processed
Output:
IMAGE          CREATED              CREATED BY                                      SIZE       COMMENT
1920e750d362   24 seconds ago       /bin/sh -c if [ -s /.files-to-remove.list ];    0 B
1267bf926729   2 minutes ago        /bin/sh -c #(nop) ADD file:5021c627243e841a45   19 B
d04a2181b62a   2 minutes ago        /bin/sh -c #(nop) ADD file:14780990c926e673f2   264 B
d4b30af167f4   7 minutes ago        /bin/sh -c #(nop) COPY dir:68b8f374d8731b8ad8   16.78 MB
c898fe1daa44   9 minutes ago        /bin/sh -c apt-get update && apt-get install    10.77 MB
39a8a358844a   4 months ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
b1dacad9c5c9   4 months ago         /bin/sh -c #(nop) ADD file:5afd8eec1dc1e7666d   125.1 MB
Verify that the processed new image contains the same contents as the original:
export RESTRICT_DIFF_TO_PATH=/app
export OLD_IMAGE=sample-project:revision-2
export NEW_IMAGE=sample-project:revision-2-processed
docker-compose -f rsync-image-diff.docker-compose.yml up
The output should indicate that there are no differences between the images/tags. Thus, the sample-project:revision-2-processed tag can now be pushed and deployed, leading to the same end result but without having to push an unnecessary 16.78 MB over the wire, which leads to faster deploy cycles.
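For readers who want to see the core idea without the compose files, here is a rough Python sketch of the comparison step: walk the old and new source trees, hash every file, and report which files need to be added or overwritten and which need to be removed. This is not the code from the linked repository (which uses rsync); the function names are made up, and the sketch ignores permissions, symlinks and empty directories.

```python
import hashlib
import os
import tempfile


def file_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def diff_trees(old_root, new_root):
    """Return (files to add or overwrite, files to remove), both sorted."""
    old, new = file_digests(old_root), file_digests(new_root)
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


# Tiny demo with two throwaway trees standing in for revision 1 and 2.
with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
    trees = (
        (old_dir, {"a.txt": "one", "b.txt": "two"}),
        (new_dir, {"a.txt": "one", "c.txt": "three"}),
    )
    for root, files in trees:
        for name, text in files.items():
            with open(os.path.join(root, name), "w") as handle:
                handle.write(text)
    changed, removed = diff_trees(old_dir, new_dir)

print(changed, removed)
```

Only the changed list would go into the small new layer, while the removed list plays the role of the /.files-to-remove.list step visible in the processed image's history.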

Docker caching works per layer / instruction in the Dockerfile. In this case the files used in that layer (everything in the build context (.)) have been modified, so the layer needs to be rebuilt.
If there are specific parts of the code that don't change often, you could consider adding those in a separate layer, or even moving them to a "base image":
FROM mybaseimage
COPY ./directories-that-dont-change-often /somewhere
COPY ./directories-that-change-often /somewhere
It may take some planning or restructuring for this to work, depending on your project, but may be worth doing.
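As a rough illustration of why the split helps, the Python sketch below models each build step with a key made from the parent key, the instruction text and a stand-in for the copied content. It is a toy model, not Docker's real cache logic; step_key, build_keys and the content strings are invented for the example.

```python
import hashlib


def step_key(parent_key, instruction, content):
    """Key for one build step: parent key + instruction text + copied content."""
    digest = hashlib.sha256()
    for part in (parent_key, instruction, content):
        digest.update(part.encode())
        digest.update(b"\0")  # separator so adjacent parts cannot run together
    return digest.hexdigest()[:12]


def build_keys(stable_content, volatile_content):
    """Keys for the three steps of the Dockerfile above, in order."""
    base = step_key("", "FROM mybaseimage", "")
    stable = step_key(
        base, "COPY ./directories-that-dont-change-often /somewhere", stable_content
    )
    volatile = step_key(
        stable, "COPY ./directories-that-change-often /somewhere", volatile_content
    )
    return [base, stable, volatile]


# Only the frequently changing directory differs between the two builds.
before = build_keys("vendor v1", "app code v1")
after = build_keys("vendor v1", "app code v2")

reused = [a == b for a, b in zip(before, after)]
print(reused)
```

In this model the first two steps are reused and only the last one changes, so only the small, frequently changing layer would need to be rebuilt and pushed.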

My solution: (idea from https://github.com/neam/docker-diff-based-layers !)
docker rm -f uniquename 2> /dev/null
docker run --name uniquename -v ~/repo/mycode:/src ${REPO}/${IMAGE}:${BASE} rsync -ar --exclude-from '/src/.dockerignore' --delete /src/ /app/
docker commit uniquename ${REPO}/${IMAGE}:${NEW_TAG}

How to see the source of a DockerHub container so I can adapt it?

I want to work on a GeoDjango application that will be easy for others to work on too. I would like to use Docker to package the application.
So, a very simple question.
I can see that there is an existing GeoDjango container on DockerHub: https://registry.hub.docker.com/u/jhonatasmartins/geodjango/
How can I view the Dockerfile for this container, so that I can use it as a basis for my own container?

There is (at the moment) no way to see the Dockerfile used to generate an image, unless the author has published the Dockerfile somewhere. Images built as automated builds will have links to a source repository with a Dockerfile, but for images that were manually built and pushed you're limited to whatever folks publish in the description.
You could try contacting the image maintainer.

There are also times when running docker history <IMAGE_ID> can give you some hints about how the image was built. In fact, if the image was built using a Dockerfile, it will give you a clear picture of the Dockerfile's steps. For images committed from a container you may not see some steps, but sometimes you can still get some idea. For example, for the image you mentioned:
$ docker history jhonatasmartins/geodjango:latest
IMAGE          CREATED              CREATED BY                                      SIZE
0b7e890a4644   3 months ago         /bin/bash                                       112.2 MB
35174145916a   3 months ago         /bin/bash                                       449.9 MB
5506de2b643b   4 months ago         /bin/sh -c #(nop) CMD [/bin/bash]               0 B
22093c35d77b   4 months ago         /bin/sh -c apt-get update && apt-get dist-upg   6.558 MB
3680052c0f5c   4 months ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
e791be0477f2   4 months ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
ccb62158e970   4 months ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.8 kB
d497ad3926c8   4 months ago         /bin/sh -c #(nop) ADD file:3996e886f2aa934dda   192.5 MB
511136ea3c5a   20 months ago                                                        0 B
Edit1: As jwodder recommended below, add the --no-trunc option to docker history to see the complete command of each layer. I won't include the output for this example because it's too verbose, but you will definitely get more info using it.
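To turn that history into something closer to a Dockerfile, a small script can rewrite the CREATED BY entries. The Python sketch below is one possible approach under stated assumptions: it expects the untruncated CREATED BY strings as a list, newest first, and relies on the /bin/sh -c #(nop) prefix convention visible in the output above. The function name is hypothetical, and layers committed from a container (plain /bin/bash) cannot be recovered this way.

```python
NOP_PREFIX = "/bin/sh -c #(nop) "
RUN_PREFIX = "/bin/sh -c "


def history_to_instructions(created_by_lines):
    """Turn CREATED BY entries (newest first) into Dockerfile-like lines (oldest first)."""
    instructions = []
    for entry in reversed(created_by_lines):
        entry = entry.strip()
        if not entry:
            continue  # layers with no recorded command
        if entry.startswith(NOP_PREFIX):
            # Metadata instructions such as CMD, ENV or ADD.
            instructions.append(entry[len(NOP_PREFIX):].strip())
        elif entry.startswith(RUN_PREFIX):
            instructions.append("RUN " + entry[len(RUN_PREFIX):].strip())
        else:
            instructions.append("# committed from a container running: " + entry)
    return instructions


# A few entries from the geodjango history above, newest first.
sample = [
    "/bin/bash",
    "/bin/sh -c #(nop) CMD [/bin/bash]",
    "/bin/sh -c rm -rf /var/lib/apt/lists/*",
]

for line in history_to_instructions(sample):
    print(line)
```

The result is only an approximation of the original Dockerfile, since the history does not record the build context or the base image tag.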

Will Docker share binary files or not?

Suppose I host two projects on one machine and both of them use Apache. I then need to create two containers to separate them, and I need to run apt-get install apache2 in both containers. Will it take twice as much space as apache2's size?

No, Apache will be shared between both containers. When you create an image, every step is saved in its own layer. So for example you start with Ubuntu (layer 1) and install Apache (layer 2). Then you add project A to the image (layer 3). In another Docker image you also start with Ubuntu (layer 1) and install Apache (layer 2), but then you add project B to the image (layer 3). Both images are identical up to layer 3, which means the installation of Apache is shared.
Here is an example:
Dockerfile for project A:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y apache2
RUN touch /opt/a.txt
Dockerfile for project B:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y apache2
RUN touch /opt/b.txt
Both files will create very similar images, differing only in the last command. If you look at the history of both images you will see the following:
vagrant@ubuntu-13:/vagrant/Apache/b$ docker history test/a
IMAGE          CREATED              CREATED BY                                      SIZE
4dc359259700   About a minute ago   /bin/sh -c touch /opt/a.txt                     8 B
9977b78fbad7   About a minute ago   /bin/sh -c apt-get install -y apache2           54.17 MB
e83b3bf07b42   2 minutes ago        /bin/sh -c apt-get update                       20.67 MB
9cd978db300e   3 months ago         /bin/sh -c #(nop) ADD precise.tar.xz in /       204.4 MB
6170bb7b0ad1   3 months ago         /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
511136ea3c5a   10 months ago                                                        0 B
vagrant@ubuntu-13:/vagrant/Apache/b$ docker history test/b
IMAGE          CREATED              CREATED BY                                      SIZE
c0daf4be2ed4   42 seconds ago       /bin/sh -c touch /opt/b.txt                     8 B
9977b78fbad7   About a minute ago   /bin/sh -c apt-get install -y apache2           54.17 MB
e83b3bf07b42   3 minutes ago        /bin/sh -c apt-get update                       20.67 MB
9cd978db300e   3 months ago         /bin/sh -c #(nop) ADD precise.tar.xz in /       204.4 MB
6170bb7b0ad1   3 months ago         /bin/sh -c #(nop) MAINTAINER Tianon Gravi <ad   0 B
511136ea3c5a   10 months ago                                                        0 B
You can see the history of the images built from Dockerfile A and Dockerfile B, and the different layers (each line is a layer in the image). The bottom five layers are exactly the same in both images; only the top layer is different (and has a different ID). You will see this effect very clearly when you build the images. Building image A first takes a few minutes, because Apache must be downloaded and so on. Building image B afterwards takes only a few seconds, because Apache is not downloaded again; instead the existing layer (here with ID 9977b78fbad7) is reused!
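To put a number on the sharing, here is a small Python sketch that adds up the layer sizes from the two histories, counting each layer ID only once. It is a back-of-the-envelope model: the sizes are the rounded values shown by docker history, MB is treated as 10^6 bytes, and the function name disk_usage is made up for the example.

```python
# Layers common to both images: ID -> size in bytes (from the histories above).
shared_layers = {
    "9977b78fbad7": 54_170_000,   # apt-get install -y apache2
    "e83b3bf07b42": 20_670_000,   # apt-get update
    "9cd978db300e": 204_400_000,  # ADD precise.tar.xz
    "6170bb7b0ad1": 0,
    "511136ea3c5a": 0,
}
image_a = {"4dc359259700": 8, **shared_layers}  # touch /opt/a.txt
image_b = {"c0daf4be2ed4": 8, **shared_layers}  # touch /opt/b.txt


def disk_usage(*images):
    """Total bytes stored when layers with the same ID are stored once."""
    unique = {}
    for image in images:
        unique.update(image)
    return sum(unique.values())


naive_total = sum(image_a.values()) + sum(image_b.values())
actual_total = disk_usage(image_a, image_b)
extra_for_b = actual_total - disk_usage(image_a)
print(naive_total, actual_total, extra_for_b)
```

Under these assumptions, image B costs only the 8 bytes of its own top layer on top of image A, rather than another full copy of Ubuntu and Apache.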
