How to find which step in Dockerfile added some path?

I have a Docker image which contains a file, say /usr/bin/foo. What's the easiest way to find out which step of the Dockerfile added that path? (Which I assumed is equivalent to asking which layer of the Docker image the path comes from.)
I wrote a script which prints out all the paths in the image, prefixed by layer ID. It appears to work, but is quite slow:
#!/bin/bash
die() { echo 1>&2 "ERROR: $*"; exit 1; }

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

img="$1"
[[ -n "$img" ]] || die "wrong arguments"

# Extract the saved image into a temporary directory.
docker image save "$img" | (cd "$dir" && tar xf -) ||
die "failed extracting docker image $img"

# For each layer tarball, list its paths prefixed with the layer ID.
(cd "$dir" && find . -name '*.tar' | while read -r f; do
  layer=$(echo "$f" | cut -d/ -f2)
  tar tf "$f" | sed -e "s/^/$layer:/"
done) ||
die "failed listing layers"
(It could be made faster if it didn't write anything to disk. The problem is that while tar tf - prints the paths in the outer TAR as it streams, it doesn't recurse into the nested layer.tar files. I am thinking I could use the Python tarfile module - but surely somebody else out there has done this already?)
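(For what it's worth, GNU tar can do the nested listing in a stream: its --to-command option pipes each extracted member to a command and exposes the member name in the TAR_FILENAME environment variable. A minimal sketch, assuming GNU tar and the classic docker-save layout where each layer is a member named <layer-id>/layer.tar:)
# List every layer's contents straight from the stream, prefixed with
# the layer member name; nothing is written to disk.
docker image save "$img" |
  tar -xf - --wildcards '*/layer.tar' \
    --to-command='tar tf - | sed "s|^|$TAR_FILENAME: |"'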
However, I don't know how to translate the layer ID it gives me to a step in the Docker image. I thought I'd correlate it with the layer IDs reported by docker inspect:
docker image inspect $IMAGE | jq -r '.[].RootFS.Layers[]' | nl
But the layer ID which my script reports as containing the path, I can't find in the output of the above command. (Is that a consequence of BuildKit???)
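(For reference, the mismatch is probably not BuildKit: the strings in .RootFS.Layers are DiffIDs, i.e. sha256 digests of the uncompressed layer tarballs, while the directory names inside a docker save archive are something else. You can recompute the DiffIDs from the extracted files and match them up; a sketch, reusing $dir from the script above:)
# Print "sha256:<digest> <path>" for each layer.tar, matching the
# format of .RootFS.Layers in docker image inspect.
(cd "$dir" && find . -name '*.tar' -exec sh -c \
  'echo "sha256:$(sha256sum "$1" | cut -d" " -f1) $1"' sh {} \;)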
In the end, I gave up on this whole approach. Instead I just made some educated guesses as to which Dockerfile line was probably creating that path, tested each guess by commenting it out (and all the lines after it), and soon I found the answer. Still, there must be a better way, surely? Ideally, what I'd like is something like a --contains-path= option to docker image history – which doesn't exist, but maybe there is something else which does the equivalent?

While dlayer does not have a search function built in, it is straightforward to implement one by combining it with a Perl one-liner:
docker image save $IMAGE |
dlayer -n 999999 |
perl -ne 'chomp;$query=quotemeta("usr/bin/foo");$cmd=$_ if $_ =~ m/ [\$] /;print "$cmd\n\t$_\n" if m/ $query/;'
This will print something like:
13 MB $ /opt/bar/install.sh # buildkit
637 B usr/bin/foo
-n 999999 raises the limit on the number of file names printed per layer from the default of 100; otherwise the path is only found if it is among the first 100 entries of its layer.
(I submitted a PR to add a built-in search function to dlayer, which removes the need for this one-line Perl script.)

Related

How can I load multiple tar images using nerdctl? (containerd)

There are around 10 container image files in the current directory, and I want to load them into my Kubernetes cluster that is using containerd as the CRI.
[root@test tmp]# ls -1
test1.tar
test2.tar
test3.tar
...
I tried to load them at once using xargs but got the following result:
[root@test tmp]# ls -1 | xargs nerdctl load -i
unpacking image1:1.0 (sha256:...)...done
[root@test tmp]#
The first tar file was successfully loaded, but the command exited and the remaining tar files were not processed.
I have confirmed that a single nerdctl load -i invocation succeeds with exit code 0:
[root@test tmp]# nerdctl load -i test1.tar
unpacking image1:1.0 (sha256:...)...done
[root@test tmp]# echo $?
0
Does anyone know the cause?
Here, xargs gathers all the file names from ls and passes them to a single invocation, i.e. it runs nerdctl load -i test1.tar test2.tar test3.tar ... The -i flag only consumes the first file name, and the remaining arguments are silently ignored, which is why only the first archive is loaded while the exit code is still 0. If you want to keep xargs, tell it to run one command per file name:
ls -1 | xargs -n 1 nerdctl load -i
Meanwhile, parsing the output of ls is not really safe, and you should see why it's not a good idea to loop over ls output in your shell
I would rather transform the above to the following command:
for f in *.tar; do
nerdctl load -i "$f"
done
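If you prefer a one-liner to the loop, a NUL-delimited pipeline also forces one invocation per archive and survives file names with whitespace (this assumes an xargs that supports -0, such as GNU findutils):
# run nerdctl once per archive; printf '%s\0' emits NUL-separated names
printf '%s\0' *.tar | xargs -0 -n 1 nerdctl load -i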

Is there any way to use hadolint for multiple Dockerfiles?

Hadolint is an awesome tool for linting Dockerfiles. I am trying to integrate it into my CI, but I am struggling to run it over multiple Dockerfiles. Does someone know what the syntax should look like? Here is how my dirs are laid out:
dir1/Dockerfile
dir2/Dockerfile
dir3/foo/Dockerfile
In .gitlab-ci.yml:
  stage: hadolint
  image: hadolint/hadolint:latest-debian
  script:
    - mkdir -p reports
    - |
      hadolint dir1/Dockerfile > reports/dir1.json \
      hadolint dir2/Dockerfile > reports/dir2.json \
      hadolint dir3/foo/Dockerfile > reports/dir3.json
But the sample above is not working.
So as far as I found, hadolint accepts several Dockerfiles at once, so in my case:
- hadolint */Dockerfile > reports/all_reports.json
But the problem with this approach is that all reports end up in one file, which hampers maintenance and clarity.
If you want to keep all reports separated (one per top-level directory), you may want to rely on a small shell snippet.
I mean something like:
- |
find . -name Dockerfile -exec \
sh -c 'src=${1#./} && { set -x && hadolint "$1"; } | tee -a "reports/${src%%/*}.txt"' sh "{}" \;
Explanation:
find . -name Dockerfile loops over all Dockerfiles in the current directory;
-exec sh -c '…' runs a subshell for each Dockerfile, setting:
$0 = "sh" (dummy value)
$1 = "{}" (the full, relative path of the Dockerfile), "{}" and \; being directly related to the find … -exec pattern;
src=${1#./} trims the path, replacing ./dir1/Dockerfile with dir1/Dockerfile
${src%%/*} extracts the top-level directory name (dir1/Dockerfile → dir1)
and | tee -a … appends hadolint's output to the report file of the matching top-level directory, for each Dockerfile processed (plain > … should be avoided here: it would overwrite earlier output when several Dockerfiles share a top-level directory).
I have replaced the .json extension with .txt as hadolint does not seem to output JSON data.
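(That said, recent hadolint releases do accept a -f/--format json flag; assuming such a version, a close variant of the same snippet can emit one JSON report per top-level directory:)
# one JSON report per top-level directory; assumes a single Dockerfile
# per directory, since concatenating JSON documents would not be valid
find . -name Dockerfile -exec \
  sh -c 'src=${1#./} && hadolint -f json "$1" > "reports/${src%%/*}.json"' sh "{}" \;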

how to load all saved docker images in parallel

I have 20 images TARed; now I want to load them on another system. However, the loading itself takes 30 to 40 minutes. All the images are independent of each other, so I believe they should load in parallel.
I tried running the load commands in the background (&) and waiting until loading finished, but observed that it took even more time. Any help here is highly appreciated.
Note: I am not sure about the -i option to the docker load command.
Try
find /path/to/image/archives/ \( -iname "*.tar" -o -iname "*.tar.xz" \) -print0 | xargs -0 -r -P4 -n1 docker load -i
This will load Docker image archives in parallel (adjust -P4 to the desired number of parallel loads or set to -P0 for unlimited concurrency).
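A variation that sizes the worker pool to the CPU count instead of a fixed number; this assumes GNU coreutils for nproc:
# one docker load per archive, at most $(nproc) concurrent jobs
find /path/to/image/archives/ \( -iname "*.tar" -o -iname "*.tar.xz" \) -print0 |
  xargs -0 -r -P"$(nproc)" -n1 docker load -i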
For speeding up the pulling/saving processes, you can use ideas from the snippet below:
#!/usr/bin/env bash

TEMP_FILE="docker-compose.image.pull.yaml"

# First component of an image reference (before any ':' or '/'),
# used as the service name and the archive name.
image_name()
{
    local name="$1"
    echo "$name" | awk -F '[:/]' '{ print $1 }'
}

# Generate a docker-compose file with one service per image,
# so that a single docker-compose pull fetches them in parallel.
pull_images_file_gen()
{
    local from_file="$1"

    cat <<EOF >"$TEMP_FILE"
version: '3.4'
services:
EOF

    while read -r line; do
        cat <<EOF >>"$TEMP_FILE"
  $(image_name "$line"):
    image: $line
EOF
    done < "$from_file"
}

# Save each image to /tmp as a detached background job.
save_images()
{
    local from_file="$1"
    while read -r line; do
        docker save -o /tmp/"$(image_name "$line")".tar "$line" &>/dev/null & disown
    done < "$from_file"
}

pull_images_file_gen "images"
docker-compose -f "$TEMP_FILE" pull
save_images "images"
rm -f "$TEMP_FILE"
images is a file containing the needed Docker image names, one per line.
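For example, a hypothetical images file could look like:
nginx:1.25
redis:7
postgres:16
(Plain tag-only references work best here, since image_name takes everything before the first ':' or '/' as the service and archive name.)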
Good luck!

Why is xargs' exit code different based on the presence of "-I" option?

After reading the xargs man page, I am unable to understand the difference in exit codes from the following xargs invocations.
(The original purpose was to combine find and grep to check whether an expression exists in ALL the given files; that's when I came across this behaviour.)
To reproduce:
(use >>! if using zsh to force creation of file)
# Create the input files.
echo "a" >> 1.txt
echo "ab" >> 2.txt
# The end goal is to check for a pattern (in this case simply 'b') inside
# ALL the files returned by a find search.
find . -name "1.txt" -o -name "2.txt" | xargs -I {} grep -q "b" {}
echo $?
123 # Works as expected since 'b' is not present in 1.txt
find . -name "1.txt" -o -name "2.txt" | xargs grep -q "b"
echo $?
0 # I am more puzzled by why the behaviour is inconsistent
The EXIT_STATUS section on the man page says:
xargs exits with the following status:
0 if it succeeds
123 if any invocation of the command exited with status 1-125
124 if the command exited with status 255
125 if the command is killed by a signal
126 if the command cannot be run
127 if the command is not found
1 if some other error occurred.
I would have thought that "123 if any invocation of the command exited with status 1-125" should apply irrespective of whether or not -I is used?
Could you share any insights to explain this conundrum please?
Here is evidence of the effect of -I option with xargs with the help of a wrapper script which shows the number of invocations:
cat ./grep.sh
#!/bin/bash
# Log each invocation, then forward all arguments to grep.
echo "I am being invoked at $(date +%Y%m%d_%H-%M-%S)"
grep "$@"
(the actual command being invoked, in this case grep, doesn't really matter)
Now execute the same commands as in the question using the wrapper script instead:
❯ find . -name "1.txt" -o -name "2.txt" | xargs -I {} ./grep.sh -q "b" {}
I am being invoked at 20190410_09-46-29
I am being invoked at 20190410_09-46-30
❯ find . -name "1.txt" -o -name "2.txt" | xargs ./grep.sh -q "b"
I am being invoked at 20190410_09-46-53
I have just discovered a comment on the answer of a similar question that answers this question (complete credit to https://superuser.com/users/49184/daniel-andersson for his wisdom):
https://superuser.com/questions/557203/xargs-i-behaviour#comment678705_557230
Also, unquoted blanks do not terminate input items; instead the separator is the newline character. — this is central to understanding the behavior. Without -I, xargs only sees the input as a single field, since newline is not a field separator. With -I, suddenly newline is a field separator, and thus xargs sees three fields (that it iterates over). That is a real subtle point, but is explained in the man page quoted.
-I replace-str
Replace occurrences of replace-str in the initial-arguments
with names read from standard input. Also, unquoted blanks do
not terminate input items; instead the separator is the
newline character. Implies -x and -L 1.
Based on that,
find . -name "1.txt" -o -name "2.txt"
#returns
# ./1.txt
# ./2.txt
xargs -I {} grep -q "b" {}
# -I implies -L 1, so xargs treats each input line as a separate item
# and runs one grep per file: TWO invocations of grep. The invocation
# on 1.txt exits with 1, so the overall status is 123 as documented
# in the EXIT_STATUS section.
xargs grep -q "b"
# without -I, xargs packs both file names into a SINGLE invocation
# (grep -q "b" ./1.txt ./2.txt); grep -q exits 0 because the pattern
# was found in one of the files, so xargs reports 0.
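Incidentally, this points to a direct answer for the original goal (does the pattern occur in EVERY file?): force one grep per file and let xargs aggregate the exit statuses (assumes an xargs with -0 support):
# exit status 0 only if every file contains the pattern;
# 123 if any per-file grep invocation fails
find . -name "*.txt" -print0 | xargs -0 -n 1 grep -q "b"
echo $?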

How to use Fred's ImageMagick textcleaner script?

I want to do OCR on some of my images, but the images are not of great quality. So, to clean them up, I wanted to use Fred's ImageMagick textcleaner script. The command I ran:
sh textcleaner.sh input_file output_file -g -e stretch -f 25 -o 20 -t 30 -u -s 1 -T -p 20
These are the arguments which Fred gives on the website itself, and I am using the same sample image. But I don't think any of my options are taking effect; everything behaves as the defaults. And I keep getting this error:
textcleaner.sh: line 177: type: textcleaner.sh: not found
usage: dirname path
usage: basename string [suffix]
basename [-a] [-s suffix] string [...]
Also, so far I have had to keep my image files in the same folder as the textcleaner script. How can I make it globally available and give absolute paths to the files, rather than putting them wherever textcleaner is?
It's a bash script - it says so in the first line - yet you are trying to run it in sh - which is not bash. You need to make the script executable, by running
chmod +x textcleaner
then you can run it properly using:
./textcleaner ... arguments ...
That should make the error message go away. Then try showing us a sample image so we can try and see what the problem is.
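As for making the script globally available: a common approach (assuming /usr/local/bin is on your PATH, as on most Linux/macOS systems) is to copy it there, after which you can pass absolute paths to the input and output files from any directory:
# install once, then run from anywhere
sudo cp textcleaner /usr/local/bin/
textcleaner -g -e stretch /path/to/input.png /path/to/output.png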
In my ImageMagick scripts, the syntax is script name ...arguments... input output. So your command should be
bash textcleaner.sh -g -e stretch -f 25 -o 20 -t 30 -u -s 1 -T -p 20 input_file output_file
See my Pointers For Use (for further configuration) at my home page: http://www.fmwconcepts.com/imagemagick/index.php
