Docker cache and permuted RUN instructions

If one builds an image from a Dockerfile, what happens when two RUN instructions are swapped:
Is a completely new image (new hash) created, with the same cached layers in a different order?
Or is no new image created, since permuting does not affect the build for the same set of RUN instructions?
That is:
RUN instruction1 replaced by RUN instruction2
RUN instruction2 replaced by RUN instruction1

If you permute the RUN instructions, a new image will be created. Here is an example:
FROM alpine
RUN echo abc
RUN echo cdf
Run docker image build -t image1 ., then permute the RUN commands
and run docker image build -t image2 .. You will find that image1 and image2 have different IDs.
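This can be verified with a small script (a sketch; it assumes docker is installed, and the file names Dockerfile.a and Dockerfile.b are illustrative):

```shell
#!/bin/sh
# Sketch: build the same two RUN instructions in both orders and compare
# the resulting image IDs. File names are illustrative.
printf 'FROM alpine\nRUN echo abc\nRUN echo cdf\n' > Dockerfile.a
printf 'FROM alpine\nRUN echo cdf\nRUN echo abc\n' > Dockerfile.b

docker build -q -f Dockerfile.a -t image1 .
docker build -q -f Dockerfile.b -t image2 .

id1=$(docker inspect --format '{{.Id}}' image1)
id2=$(docker inspect --format '{{.Id}}' image2)
if [ "$id1" != "$id2" ]; then
  echo "permuting RUN instructions produced a different image"
fi
```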

Given this minimal Dockerfile:
FROM busybox
RUN echo text1 > file1
RUN echo text2 > file2
When you run:
docker build . -t my-image
docker inspect my-image
Then you get:
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:08c2295a7fa5c220b0f60c994362d290429ad92f6e0235509db91582809442f3",
"sha256:2ce4cb064fd2dc11c0b6fe08ffed6364478f6de0a1ac115d8aa01005b4c2921a",
"sha256:b4f880ce3a2172db2a614faf516c172d1e205bbf293daaee0174c4a5bd93d5f3"
]
}
Now try again with permuted commands, build and inspect the image you get:
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:08c2295a7fa5c220b0f60c994362d290429ad92f6e0235509db91582809442f3",
"sha256:812b39039b60290f4aa193d8f8bf03fbd13020dd5cfa6e6638feb68dac72cf9c",
"sha256:451c384fb837aa70e446a36d3571123144cb497a42819b7a30348e7d49b24a0b"
]
}
Note:
If your commands do not modify the file system (e.g. RUN echo text), your image will have only one layer, sha256:08c2295a7fa5c220b0f60c994362d290429ad92f6e0235509db91582809442f3, which represents the empty FS.
Conclusion:
Not only is a new image created, but also new layers (i.e. the new image is not just a reordered list of existing layers). This is probably because a layer's ID includes not only its contents but also its parent's hash.
See http://windsock.io/explaining-docker-image-ids/ for more details.

Related

Building devcontainer with --ssh key for GitHub repository in build process fails on VS Code for ARM Mac

We are trying to run a python application using a devcontainer.json with VS Code.
The Dockerfile includes the installation of GitHub repositories with pip that require an ssh key. To build the images, we usually use the --ssh flag to pass the required key. We then use this key to run pip inside the Dockerfile as follows:
RUN --mount=type=ssh,id=ssh_key python3.9 -m pip install --no-cache-dir -r pip-requirements.txt
We now want to run a devcontainer.json inside VS Code. We have been trying many different ways.
1. Passing the --ssh key using the build arg variable:
Since you cannot directly pass the --ssh key, we tried a workaround:
"args": {"kek":"kek --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa"}
This produces an OK looking build command that works in a normal terminal, but inside VS Code the key is not being passed and the build fails. (Both on Windows & Mac)
2. Putting an initial build command into the initializeCommand parameter and then a simple build command that should use the cached results:
We run a first build inside the initializeCommand parameter:
"initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa ."
and then we have a second build in the regular parameter:
"build": {
"dockerfile": "../Dockerfile",
"context": "..",
"args": {"kek":"kek --platform=linux/amd64"}
}
This solution is a nice workaround and works flawlessly on Windows. On the ARM Mac, however, only the initializeCommand build stage runs well; the actual build fails, as it does not use the cached version of the images. The crucial step, where the --ssh key is used, fails just as described before.
We have no idea why VS Code on the Mac ignores the already created images. In a regular terminal, again, the second build command generated by VS Code works flawlessly.
The problem is reproducible on different ARM Macs, and on different repositories.
Here is the entire devcontainer:
{
"name": "Dockername",
"build": {
"dockerfile": "../Dockerfile",
"context": "..",
"args": {"kek":"kek --platform=linux/amd64"}
},
"initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa .",
"runArgs": ["--env-file", "configuration.env", "-t"],
"customizations": {
"vscode": {
"extensions": [
"ms-python.python"
]
}
}
}
So, we finally found a workaround:
We add a target to the initializeCommand:
"initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa -t dev-image ."
We create a new Dockerfile, Dockerfile-devcontainer, that contains only one line:
FROM --platform=linux/amd64 docker.io/library/dev-image:latest
In the build section of the devcontainer, use that Dockerfile:
{
"name": "Docker",
"initializeCommand": "docker buildx build --platform=linux/amd64 --ssh ssh_key=/Users/user/.ssh/id_rsa -t dev-image:latest .",
"build": {
"dockerfile": "Dockerfile-devcontainer",
"context": "..",
"args": {"kek":"kek --platform=linux/amd64"}
},
"runArgs": ["--env-file", "configuration.env"],
"customizations": {
"vscode": {
"extensions": [
"ms-python.python"
]
}
}
}
In this way we can use the SSH key and the Docker image created in the initializeCommand (tested on macOS and Windows).

How to remove caches on Travis CI?

I cached a docker image on travis-ci. The docker image is created from a dockerfile. Now my dockerfile changed, and I need to remove caches and rebuild the docker image. How can I remove the caches on travis-ci?
My current .travis.yml looks like this:
language: C
services:
- docker
cache:
directories:
- docker_cache
before_script:
- |
echo Now loading...
filename=docker_cache/saved_images.tar
if [[ -f "$filename" ]]; then
echo "Got one from cache."
docker load < "$filename"
else
echo "Got one from scratch";
docker build -t $IMAGE .
docker save -o "$filename" $IMAGE
fi
script:
- docker run -it ${IMAGE} /bin/bash -c "pwd"
env:
- IMAGE=test04
Per the docs there are three methods:
Using the UI: "More options" -> "Caches" on the repo's page
Using the CLI: travis cache --delete
Using the API: DELETE /repos/{repository.id}/caches
That said, Docker images are one of the examples explicitly called out as a thing not to cache:
Large files that are quick to install but slow to download do not
benefit from caching, as they take as long to download from the cache
as from the original source
In your example it's not clear what's involved in the pipeline beyond that Dockerfile - even if the file itself hasn't changed, any of the things that go into it (base image, source code, etc.) might have. Caching the image means you may get false positives, builds that pass even though docker build would have failed.
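If you do decide to keep the cache, one common pattern is to key the cache file on a checksum of the Dockerfile, so that editing the Dockerfile automatically misses the cache. A sketch (the hash length and file names are my own choices):

```shell
#!/bin/sh
# Sketch: name the Travis cache file after a hash of the Dockerfile, so a
# changed Dockerfile automatically invalidates the cached image.
IMAGE=test04
hash=$(sha256sum Dockerfile | cut -c1-12)   # short content hash
filename="docker_cache/saved_image_${hash}.tar"

if [ -f "$filename" ]; then
  echo "Cache hit for Dockerfile hash $hash"
  docker load < "$filename"
else
  echo "Cache miss; rebuilding"
  docker build -t "$IMAGE" .
  docker save -o "$filename" "$IMAGE"
fi
```

Note this only detects changes to the Dockerfile itself, not to the base image or other build inputs, which is exactly the false-positive risk described above.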

Do I need separate Dockerfiles for py2 and py3?

Currently I have 2 Dockerfiles, Dockerfile-py2:
FROM python:2.7
# stuff
and Dockerfile-py3:
FROM python:3.4
# stuff
where both instances of # stuff are identical.
I build two docker images using an invoke task:
@task
def docker(ctx):
"""Build docker images.
"""
tag = ctx.run('git log -1 --pretty=%h').stdout.strip()
for pyversion in '23':
name = 'myrepo/myimage{pyversion}'.format(pyversion=pyversion)
image = '{name}:{tag}'.format(name=name, tag=tag)
latest = '{name}:latest'.format(name=name)
ctx.run('docker build -t {image} -f Dockerfile-py{pyversion} .'.format(image=image, pyversion=pyversion))
ctx.run('docker tag {image} {latest}'.format(image=image, latest=latest))
ctx.run('docker push {name}'.format(name=name))
Is there any way to prevent the duplication of # stuff, so I don't end up in a situation where someone edits one file but not the other?
Here is one way, using the Dockerfile ARG instruction along with docker build --build-arg:
ARG version
FROM python:${version}
RUN echo "$(python --version)"
# stuff
Now you build for python2.7 like so:
docker build -t myimg/tmp --build-arg version=2.7 .
In the output you will see:
Step 3/3 : RUN echo "$(python --version)"
---> Running in 06e28a29a3d2
Python 2.7.16
And in the same way, for python3.4:
docker build -t myimg/tmp --build-arg version=3.4 .
In the output you will see:
Step 3/3 : RUN echo "$(python --version)"
---> Running in 2283edc1b65d
Python 3.4.10
As you can imagine, you can also set a default value for ${version} in your Dockerfile:
ARG version=3.4
FROM python:${version}
RUN echo "$(python --version)"
# stuff
Now if you just run docker build -t myimg/tmp . you will build for python3.4, but you can still override it with the previous two commands.
So to answer your question: no, you don't need two different Dockerfiles.
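The invoke task from the question could then drive the single ARG-based Dockerfile. Here is a plain shell sketch of the same loop (repo and image names mirror the question and are illustrative):

```shell
#!/bin/sh
# Sketch: build, tag, and push both Python variants from one Dockerfile.
tag=$(git log -1 --pretty=%h)
for version in 2.7 3.4; do
  name="myrepo/myimage${version%%.*}"   # -> myimage2 / myimage3
  docker build -t "${name}:${tag}" --build-arg version="$version" .
  docker tag "${name}:${tag}" "${name}:latest"
  docker push "$name"
done
```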

Delete environmental variable from docker image

I have looked around online and tried the obvious route (explained below) to remove an environmental variable from a docker image.
1 - I create a container from a modified ubuntu image using:
docker run -it --name my_container my_image
2 - I inspect the image and see the two environmental variables that I want to remove using:
docker inspect my_container
which yields:
...
"Env": [
"env_variable_1=abcdef",
"env_variable_2=ghijkl",
"env_variable_3=mnopqr",
...
3 - I exec into the container and remove the environmental variables via:
docker exec -it my_container bash
unset env_variable_1
unset env_variable_2
4 - I check to make sure the specified variables are gone:
docker inspect my_container
which yields:
...
"Env": [
"env_variable_3=mnopqr",
...
5 - I then commit this modified container as an image via:
docker commit my_container my_new_image
6 - And check for the presence of the deleted environmental variables via:
docker run -it --name my_new_container my_new_image
docker inspect my_new_container
which yields (drumroll please):
...
"Env": [
"env_variable_1=abcdef",
"env_variable_2=ghijkl",
"env_variable_3=mnopqr",
...
AKA the deleted variables are not carried through from the modified container to the new image in the docker commit
What am I missing out on here? Is unset really deleting the variables? Should I use another method to remove these environmental variables or another/modified method to commit the container as an image?
PS: I've confirmed the variables first exist when inside the container via env. I then confirmed they were not active using the same method after using unset my_variable
Thanks for your help!
You need to edit the Dockerfile that built the original image. The Dockerfile ENV directive has a couple of different syntaxes to set variables but none to unset them. docker run -e and the Docker Compose environment: setting can't do this either. This is not an especially common use case.
Depending on what you need, it may be enough to set the variables to an empty value, though this is technically different.
FROM my_image
ENV env_variable_1=""
RUN test -z "$env_variable_1" && echo variable 1 is empty
RUN echo variable 1 is ${env_variable_1:-empty}
RUN echo variable 1 is ${env_variable_1-unset}
# on first build will print out "empty", "empty", and nothing
The big hammer is to use an entrypoint script to unset the variable. The script would look like:
#!/bin/sh
unset env_variable_1 env_variable_2
exec "$@"
It would be paired with a Dockerfile like:
FROM my_image
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["same", "as", "before"]
docker inspect would still show the variable as set (because it is in the container metadata) but something like ps e that shows the container process's actual environment will show it unset.
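To see the difference between the metadata and the runtime environment, you can compare the two views (a sketch; my_new_image is the image name from the question):

```shell
# The image metadata still lists the variable:
docker inspect --format '{{.Config.Env}}' my_new_image

# But a process started through the entrypoint no longer sees it:
docker run --rm my_new_image env | grep env_variable_1 || echo "unset at runtime"
```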
As a general rule you should always use the docker build system to create an image, and never use docker commit. ("A modified Ubuntu image" isn't actually a reproducible recipe for debugging things or asking for help, or for rebuilding it when a critical security patch appears in six months.) docker inspect isn't intrinsically harmful but has an awful lot of useless information; I rarely have reason to use it.
Maybe you can try it this way, as in this answer:
docker exec -it -e env_variable_1 my_container bash
And then commit the container as usual.
I personally was looking to remove all environment variables to get a fresh image, but without losing the contents inside the image.
The problem was that when I reused this image and reset those environment variables with new values, they were not changed; the old values were still present.
My solution was to reinitialize the image with docker export and then docker import.
Export
First, spin up a container with the image, then export the container to a tarball
docker export {container_name} > my_image.tar
Import
Import the tarball to a new image
docker import my_image.tar my_image_tag:latest
Doing this will reset the image, meaning only the contents of the container will remain.
All layers, environment variables, entrypoint, and command data will be gone.

Detect if Docker image would change on running build

I am building a Docker image using a command line like the following:
docker build -t myimage .
Once this command has succeeded, then rerunning it is a no-op as the image specified by the Dockerfile has not changed. Is there a way to detect if the Dockerfile (or one of the build context files) subsequently changes without rerunning this command?
Looking at docker inspect $image_name from one build to another, several pieces of information do not change if the image hasn't changed. One of them is the image ID. So, I used the ID to check whether the image has changed, as follows:
First, one can get the image Id as follows:
docker inspect --format {{.Id}} $docker_image_name
Then, to check if there is a change after a build, you can follow these steps:
Get the image id before the build
Build the image
Get the image id after the build
Compare the two ids, if they match there is no change, if they don't match, there was a change.
Code: Here is a working bash script implementing the above idea:
# Save the image ID from before the build
docker inspect --format '{{.Id}}' "$docker_image_name" > deploy/last_image_build_id.log
last_docker_id=$(cat deploy/last_image_build_id.log)
# Rebuild and read the new ID
docker build -t "$docker_image_name" .
docker_id_after_build=$(docker inspect --format '{{.Id}}' "$docker_image_name")
# Compare the two IDs
if [ "$docker_id_after_build" != "$last_docker_id" ]; then
echo "image changed"
else
echo "image didn't change"
fi
There isn't a dry-run option if that's what you are looking for. You can use a different tag to avoid affecting existing images and look for ---> Using cache in the output (then delete the tag if you don't want it).
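A related trick (a sketch, assuming a reasonably recent docker; the .last_iid file name is my own) is docker build's --iidfile flag, which writes the built image ID to a file, so you can compare runs without parsing docker inspect:

```shell
#!/bin/sh
# Sketch: detect whether a rebuild produced a new image via --iidfile.
old=$(cat .last_iid 2>/dev/null || echo none)
docker build -t myimage --iidfile .last_iid .
new=$(cat .last_iid)
if [ "$old" = "$new" ]; then
  echo "image unchanged"
else
  echo "image changed"
fi
```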
