Make: dependency on a docker image - docker

I'm using a docker workflow to generate some files, based on a given spec file, with the Makefile being (it's generating a client according to an OpenAPI spec):
SWAGGER ?= ${PWD}/swagger.yaml
GENERATOR ?= openapitools/openapi-generator-cli\:latest
generated: Makefile ${SWAGGER}
	docker run --rm --user $$(id -u):$$(id -g) \
		-v ${PWD}:/output -v ${SWAGGER}:/input/swagger.yaml \
		${GENERATOR} \
		generate -g python -i /input/swagger.yaml -o /output/generated
This works fine, and will rebuild if I modify the input spec file.
But it doesn't rebuild when the docker image is changed.
Let's say I docker build the image with the same name:tag again, but with different code inside, or I use a different tagged version of the upstream image, whatever. This is kind of expected because the Makefile has no knowledge of the docker image's content or modification date. How can I make the Makefile understand the dependency on the docker image?
I've tried to docker inspect the image to fetch the creation date, but I don't know how to make make understand that as a dependency (if the creation date is newer than the output dir, then rebuild)
I can't just add a dependency on the code inside the docker image, because the docker image might not even have been built from locally available files.
make might not be the tool for that kind of thing, maybe there is something else that I could use that understands the docker image dependency.

Whether the artifact you depend on is a Docker image or some service/data somewhere else, you need to represent what you know about it, in the form of a regular file. Because make only deals with files. :-)
I'd recommend you define a macro for "invoke docker build and save evidence of its success to a marker file with a predictable name", and use it in place of every direct call to docker build so the marker files are handled consistently. Full-blown example, assuming a/Dockerfile and b/Dockerfile exist:
# For consistency, the src dirs are named like the images they produce
IMAGES = a b
# Keep "stamps" around, recording that images were built.
# You could keep them in e.g. a `.docker-buildstamps/*` dir,
# but this example uses `*/docker-buildstamp`.
BUILDSTAMP_FILE = docker-buildstamp
BUILDSTAMPS = $(addsuffix /$(BUILDSTAMP_FILE),$(IMAGES))
.PHONY: all
all: $(BUILDSTAMPS)
# Pattern rule: let e.g. `a/docker-buildstamp` depend on changes to `a/*` (-but avoid circular dep)
%/$(BUILDSTAMP_FILE): % %/[!$(BUILDSTAMP_FILE)]*
	$(docker_build)
clean:
	docker image rm -f $(IMAGES)
	rm -f $(BUILDSTAMPS)
# Turn `a/docker-buildstamp` back into `a`
define from_buildstamp
$(@:%/$(BUILDSTAMP_FILE)=%)
endef
# Self-explanatory
define docker_build
docker build -t $(from_buildstamp) $(from_buildstamp)
touch $@
endef
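The example above covers images built from local Dockerfiles. For an image that is pulled or built elsewhere, as in the question, one option is to record the image ID in a stamp file and rewrite that file only when the ID changes. A minimal sketch, assuming a POSIX shell; the function name update_image_stamp and the stamp path are made up for illustration:

```shell
#!/bin/sh
# update_image_stamp IMAGE STAMP   (hypothetical helper, not part of docker or make)
# Rewrite STAMP only when the image ID differs from what it records,
# so the file's modification time changes only when the image does.
update_image_stamp() {
    image=$1
    stamp=$2
    # Ask docker for the ID of the local image; fail if it does not exist.
    new_id=$(docker image inspect --format '{{.Id}}' "$image") || return 1
    # A missing stamp file simply reads as an empty old ID.
    old_id=$(cat "$stamp" 2>/dev/null || true)
    if [ "$new_id" != "$old_id" ]; then
        printf '%s\n' "$new_id" > "$stamp"
    fi
}
```

If this is called from a stamp rule that always runs (for example one that depends on a phony FORCE target), the stamp keeps its old timestamp while the image is unchanged, so a target such as generated that lists the stamp as a prerequisite should be rebuilt only after the image ID changes.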

Related

Makefile run same target multiple time concurrently with different option

I wish to build multiple docker images through my makefile. I have a make target looking like this:
docker:
	docker build -t service1:latest -f ./service1/Dockerfile .
	docker build -t service2:latest -f ./service2/Dockerfile .
...
To gain time, I want to run them in parallel, so I wanted to update my makefile like this:
docker:
	docker build -t $(SERVICE):latest -f ./$(SERVICE)/Dockerfile .
And calling it with something which would look like this:
make -j=2 SERVICE=service1 docker SERVICE=service2 docker
But obviously it does not work, since there are multiple issues with this.
I was thinking to use the pattern %, but I am not quite sure how to achieve this cleanly.
What would be the right way to achieve this?
You could write something like this:
IMAGES = \
service1 \
service2
all: $(IMAGES)
.PHONY: $(IMAGES)
$(IMAGES):
	docker build -t $@:latest -f $@/Dockerfile .
The .PHONY directive is necessary because otherwise it will find the directory named service1 or service2 and decide that the target does not need updating. .PHONY tells it to ignore this and build the target in any case.
Using this Makefile, if I run make -j it spawns two build processes in parallel.
While this works, I'm not sure that make is really the right tool for the job. The idea behind make is that it will only rebuild those things that need to be rebuilt, saving time if only a few things have been modified.
In this situation, make doesn't really have any way to make that sort of decisions.
Since you want to rebuild everything every time, you might be better off with a simple shell script and xargs:
#!/bin/bash
seq 2 |
xargs -iSERVICENUM -P0 docker build -t serviceSERVICENUM -f serviceSERVICENUM/Dockerfile .
Or if your services aren't actually named in a numeric sequence:
#!/bin/bash
SERVICES='
foo
bar
'
xargs -iSERVICE -P0 docker build -t SERVICE -f SERVICE/Dockerfile . <<<"$SERVICES"
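If xargs -P is not available, or the script should fail when any single build fails, plain background jobs and wait are an alternative. A sketch, assuming the same layout as above (one SERVICE/Dockerfile per image); build_all is an illustrative name:

```shell
#!/bin/sh
# build_all SERVICE...   (hypothetical helper)
# Start one `docker build` per service in the background, then wait for
# each one and return non-zero if any of the builds failed.
build_all() {
    pids=""
    for service in "$@"; do
        docker build -t "${service}:latest" -f "${service}/Dockerfile" . &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        # `wait PID` returns the exit status of that particular build.
        wait "$pid" || status=1
    done
    return "$status"
}
```

It would be called as build_all service1 service2 from the directory that holds the service folders.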

How do I add an additional command line tool to an already existing Docker/Singularity image?

I work in neuroscience, and I use a cloud platform called Brainlife to upload and download data (linked here, but I don't think knowledge of Brainlife is relevant to this question). I use Brainlife's command line interface to upload and download data on my university's server. In order to use their CLI, I run Singularity with a Docker image created by Brainlife (found here). I run this using the following code:
singularity shell docker://brainlife/cli -B
I also have the file saved on my server account, and can run it like this:
singularity shell brainlifeimage.sif -B
After running one of those commands, I am able to download and upload data, usually successfully. Currently I'm following Brainlife's tutorial to bulk download data. The tutorial uses the command line tool "jq" (link), which isn't on their docker image. I tried installing it within the Singularity shell like this:
apt-get install jq
And it returned:
Reading package lists... Done
Building dependency tree
Reading state information... Done
W: Not using locking for read only lock file /var/lib/dpkg/lock
E: Unable to locate package jq
Is there an easy way to add this one tool to the image? I've been reading over the Singularity and Docker documentations, but Docker is all new to me and I'm really lost.
If relevant, my university server runs on Ubuntu 16.04.7 LTS, and I am using terminal on a Mac laptop running MacOS 11.3. This is my first Stack Overflow question - please let me know if I can provide any additional info! Thanks so much.
The short, specific answer: jq is portable, so you can just mount it into the image and use it normally. e.g.,
singularity shell -B /path/to/jq:/usr/bin/jq brainlifeimage.sif
The short, general answer: you can't modify the read only image and need to build a new one.
Long answer with several options and specific examples:
Since singularity images are read only, they cannot have persistent changes made to them. This is great for reproducibility, a bit inconvenient if your tools are likely to change often. You can rebuild the image in several ways, though all will require sudo permissions.
Write a new Singularity definition based on the docker image
Create a new definition file (generally called Singularity or something.def), use the current container as a base and add the desired software in the %post section. Then build the new image with: sudo singularity build brainy_jq.sif Singularity
The definition file docs are quite good and highly recommended.
Bootstrap: docker
From: brainlife/cli:latest
%post
apt-get update && apt-get install -y jq
Create a sandbox of the current singularity image, make your changes, and convert back to a read-only image. See the singularity docs on writable sandbox directories and converting images between formats.
# use --sandbox to create a writable singularity image
sudo singularity build --sandbox writable_brain/ brainlifeimage.sif
# --writable must still be used to make changes, and sudo for correct permissions
sudo singularity exec --writable writable_brain/ bash -c 'apt-get update && apt-get install -y jq'
# convert back to read-only image for normal usage
sudo singularity build brainlifeimage_jq.sif writable_brain/
Modify the source docker image locally and build from that. One of the more... creative options. Almost sudo-free, except singularity pull doesn't accept docker-daemon so a sudo singularity build is necessary.
# add jq to a new docker container. the value for --name doesn't matter, but we use it
# in later steps. The entrypoint needs to be overridden in this case as well.
docker run -it --name brainlife-jq --entrypoint=/bin/bash \
brainlife/cli:1.5.25 -c 'apt-get update && apt-get install -y jq'
# use docker commit to create an image from the container so it can be reused
# note that we're using the name of the container set in the previous step
# the output of docker commit is the hash for the newly created image, so we grab that
IMAGE_ID=$(docker commit brainlife-jq)
# tag the newly created image with a more useful name
docker tag $IMAGE_ID brainlife/cli:1.5.25-jq
# here we use docker-daemon instead of docker to build from a locally cached docker image
# instead of looking at docker hub
sudo singularity build brainlife_jq.sif docker-daemon://brainlife/cli:1.5.25-jq
# now check that it all worked as planned
singularity exec brainlife_jq.sif which jq
# /usr/bin/jq
ref: docker commit, using locally cached docker images

How to serve a tensorflow model using docker image tensorflow/serving when there are custom ops?

I'm trying to use the tf-sentencepiece operation in my model found here https://github.com/google/sentencepiece/tree/master/tensorflow
There is no issue building the model and getting a saved_model.pb file with variables and assets. However, if I try to use the docker image for tensorflow/serving, it says
Loading servable: {name: model version: 1} failed:
Not found: Op type not registered 'SentencepieceEncodeSparse' in binary running on 0ccbcd3998d1.
Make sure the Op and Kernel are registered in the binary running in this process.
Note that if you are loading a saved graph which used ops from tf.contrib, accessing
(e.g.) `tf.contrib.resampler` should be done before importing the graph,
as contrib ops are lazily registered when the module is first accessed.
I am unfamiliar with how to build anything manually, and was hoping that I could do this without many changes.
One approach would be to:
Pull a docker development image
$ docker pull tensorflow/serving:latest-devel
In the container, make your code changes
$ docker run -it tensorflow/serving:latest-devel
Modify the code to add the op dependency here.
In the container, build TensorFlow Serving
container:$ bazel build tensorflow_serving/model_servers:tensorflow_model_server && cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/
Use the exit command to exit the container
Look up the container ID (the container has exited, so list stopped containers too):
$ docker ps -a
Use that container ID to commit the development image:
$ docker commit <container ID> $USER/tf-serving-devel-custom-op
Now build a serving container using the development container as the source
$ mkdir /tmp/tfserving
$ cd /tmp/tfserving
$ git clone https://github.com/tensorflow/serving .
$ docker build -t $USER/tensorflow-serving --build-arg TF_SERVING_BUILD_IMAGE=$USER/tf-serving-devel-custom-op -f tensorflow_serving/tools/docker/Dockerfile .
You can now use $USER/tensorflow-serving to serve your model following the Docker instructions

How to verify if the content of two Docker images is exactly the same?

How can we determine that two Docker images have exactly the same file system structure, and that the content of corresponding files is the same, irrespective of file timestamps?
I tried the image IDs but they differ when building from the same Dockerfile and a clean local repository. I did this test by building one image, cleaning the local repository, then touching one of the files to change its modification date, then building the second image, and their image IDs do not match. I used Docker 17.06 (the latest version I believe).
If you want to compare content of images you can use docker inspect <imageName> command and you can look at section RootFS
docker inspect redis
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:eda7136a91b7b4ba57aee64509b42bda59e630afcb2b63482d1b3341bf6e2bbb",
"sha256:c4c228cb4e20c84a0e268dda4ba36eea3c3b1e34c239126b6ee63de430720635",
"sha256:e7ec07c2297f9507eeaccc02b0148dae0a3a473adec4ab8ec1cbaacde62928d9",
"sha256:38e87cc81b6bed0c57f650d88ed8939aa71140b289a183ae158f1fa8e0de3ca8",
"sha256:d0f537e75fa6bdad0df5f844c7854dc8f6631ff292eb53dc41e897bc453c3f11",
"sha256:28caa9731d5da4265bad76fc67e6be12dfb2f5598c95a0c0d284a9a2443932bc"
]
}
If all layers are identical, then the images contain identical content.
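The layer comparison can be scripted rather than read by eye. A small sketch, assuming both images are present locally; layers and same_layers are made-up helper names, while the --format template uses docker's built-in json function:

```shell
#!/bin/sh
# layers IMAGE: print the image's layer digests as one JSON array.
layers() {
    docker image inspect --format '{{json .RootFS.Layers}}' "$1"
}

# same_layers IMAGE1 IMAGE2   (hypothetical helper)
# Exit 0 if both images list exactly the same layers, 1 if they differ,
# 2 if either image could not be inspected.
same_layers() {
    a=$(layers "$1") || return 2
    b=$(layers "$2") || return 2
    [ "$a" = "$b" ]
}
```

Note that the check is one-directional: identical layer lists mean identical content, but differing digests do not prove the files differ, since a layer digest also covers metadata such as timestamps.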
After some research I came up with a solution which is fast and clean per my tests.
The overall solution is this:
Create a container for your image via docker create ...
Export its entire file system to a tar archive via docker export ...
Pipe the archive's directory names, symlink names, symlink contents, file names, and file contents to a hash function (e.g., MD5)
Compare the hashes of different images to verify if their contents are equal or not
And that's it.
Technically, this can be done as follows:
1) Create file md5docker, and give it execution rights, e.g., chmod +x md5docker:
#!/bin/sh
dir=$(dirname "$0")
docker create $1 | { read cid; docker export $cid | $dir/tarcat | md5; docker rm $cid > /dev/null; }
2) Create file tarcat, and give it execution rights, e.g., chmod +x tarcat:
#!/usr/bin/env python3
# coding=utf-8
if __name__ == '__main__':
    import sys
    import tarfile
    with tarfile.open(fileobj=sys.stdin.buffer, mode="r|*") as tar:
        for tarinfo in tar:
            if tarinfo.isfile():
                print(tarinfo.name, flush=True)
                with tar.extractfile(tarinfo) as file:
                    sys.stdout.buffer.write(file.read())
            elif tarinfo.isdir():
                print(tarinfo.name, flush=True)
            elif tarinfo.issym() or tarinfo.islnk():
                print(tarinfo.name, flush=True)
                print(tarinfo.linkname, flush=True)
            else:
                print("\33[0;31mIGNORING:\33[0m ", tarinfo.name, file=sys.stderr)
3) Now invoke ./md5docker <image>, where <image> is your image name or id, to compute an MD5 hash of the entire file system of your image.
To verify if two images have the same contents just check that their hashes are equal as computed in step 3).
Note that this solution only considers as content directory structure, regular file contents, and symlinks (soft and hard). If you need more just change the tarcat script by adding more elif clauses testing for the content you wish to include (see Python's tarfile, and look for methods TarInfo.isXXX() corresponding to the needed content).
The only limitation I see in this solution is its dependency on Python (I am using Python3, but it should be very easy to adapt to Python2). A better solution without any dependency, and probably faster (hey, this is already very fast), is to write the tarcat script in a language supporting static linking so that a standalone executable file was enough (i.e., one not requiring any external dependencies, but the sole OS). I leave this as a future exercise in C, Rust, OCaml, Haskell, you choose.
Note, if MD5 does not suit your needs, just replace md5 inside the first script with your hash utility.
Hope this helps anyone reading.
Amazes me that docker doesn't do this sort of thing out of the box. Here's a variant on @mljrg's technique:
#!/bin/sh
docker create $1 | {
read cid
docker export $cid | tar Oxv 2>&1 | shasum -a 256
docker rm $cid > /dev/null
}
It's shorter, doesn't need a python dependency or a second script at all, I'm sure there are downsides but it seems to work for me with the few tests I've done.
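Both scripts rest on the same idea: hash the names and contents found in the exported tar stream while ignoring metadata such as timestamps. The sketch below isolates that step so it can be tried without Docker. It assumes a tar that supports -O (GNU tar and bsdtar do) and uses cksum only because it is available almost everywhere; tar_content_hash is a made-up name, and a stronger hash such as shasum -a 256 can be substituted for cksum.

```shell
#!/bin/sh
# tar_content_hash   (hypothetical helper)
# Read a tar archive on stdin and print a checksum over its member names
# followed by its file contents. Timestamps and ownership are not hashed.
tar_content_hash() {
    archive=$(mktemp) || return 1
    # The stream is read twice (listing, then contents), so spool it first.
    cat > "$archive"
    { tar -tf "$archive"; tar -xOf "$archive"; } | cksum
    rm -f "$archive"
}
```

With a container ID in cid, it would be used as docker export "$cid" | tar_content_hash, and two images compare equal when the printed values match.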
There doesn't seem to be a standard way for doing this. The best way that I can think of is using the Docker multistage build feature.
For example, here I am comparing the alpine and debian images. In your case, set the image names to the ones you want to compare.
I basically copy all the files from each image into a git repository and commit after each copy.
FROM alpine as image1
FROM debian as image2
FROM ubuntu
RUN apt-get update && apt-get install -y git
RUN git config --global user.email "you@example.com" &&\
    git config --global user.name "Your Name"
RUN mkdir images
WORKDIR images
RUN git init
COPY --from=image1 / .
RUN git add . && git commit -m "image1"
COPY --from=image2 / .
RUN git add . && git commit -m "image2"
CMD tail > /dev/null
This will give you an image with a git repository that records the differences between the two images.
docker build -t compare .
docker run -it compare bash
Now if you do a git log you can see the logs and you can compare the two commits using git diff <commit1> <commit2>
Note: If the image building fails at the second commit, this means that the images are identical, since a git commit will fail if there are no changes to commit.
If we rebuild the Dockerfile it is almost certainly going to produce a new hash.
The only way to create an image with the same hash is to use docker save and docker load. See https://docs.docker.com/engine/reference/commandline/save/
We could then use Bukharov Sergey's answer (i.e. docker inspect) to inspect the layers, looking at the section with key 'RootFS'.

Getting Docker Container Id in Makefile to use in another command

I have a rule in my Makefile. Within this rule I need to manipulate some docker specific things so I need to get the id of the container in a portable way. In addition, I am using Docker Compose. Here is what I have that doesn't work.
a-rule: some deps
$(shell uuid="$(docker-compose ps -q myService)" docker cp "$$uuid":/a/b/c .)
I receive no errors or output, but I do not get a successful execution.
My goal is to get the uuid of the container that myService is running in and then use that uuid to copy a file from the container to my docker host.
edit:
the following works, but I'm still wondering if its possible to do inline variable settings
uuid=$(shell docker-compose ps -q myService)
a-rule: some deps
	docker cp "$(uuid)":/a/b/c .
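I ran into the same problem and realised that inside a recipe make turns $$ into a single $ before handing the line to the shell, so shell variables and command substitutions have to be written with $$. The two commands are joined with ;\ so that they run in the same shell and the variable is still set for docker cp. This should work for you:
a-rule: some deps
	uuid=$$(docker-compose ps -q myService);\
	docker cp "$$uuid":/a/b/c .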
Bit late to the party but I hope this helps someone.
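If the recipe grows beyond two lines, the lookup and copy can live in a small script that the rule calls, which avoids the $$ escaping altogether. A sketch under stated assumptions: copy_from_service and the COMPOSE variable are illustrative names, not part of Docker Compose.

```shell
#!/bin/sh
# COMPOSE may be overridden, e.g. COMPOSE="docker compose" for the plugin form.
COMPOSE=${COMPOSE:-docker-compose}

# copy_from_service SERVICE SRC DEST   (hypothetical helper)
# Look up the container ID for SERVICE and copy SRC out of it to DEST.
copy_from_service() {
    # $COMPOSE is left unquoted on purpose so "docker compose" splits in two words.
    cid=$($COMPOSE ps -q "$1") || return 1
    if [ -z "$cid" ]; then
        echo "no container found for service '$1'" >&2
        return 1
    fi
    docker cp "${cid}:$2" "$3"
}
```

For the rule above the call would be copy_from_service myService /a/b/c . and an empty lookup now produces an error message instead of a silent failure.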
