Why is docker not completely deleting my file? - docker

I am trying to build using:
FROM mcr.microsoft.com/dotnet/core/sdk:2.1 AS builder
COPY pythonnet/src/ pythonnet/src
WORKDIR /pythonnet/src/runtime
RUN dotnet build -f netstandard2.0 -p:DefineConstants=\"MONO_LINUX\;XPLAT\;PYTHON3\;PYTHON37\;UCS4\;NETSTANDARD\" Python.Runtime.15.csproj
# copy myApp csproj and restore
COPY src/myApp/*.csproj /src/myApp/
WORKDIR /src/myApp
RUN dotnet restore
# now copy everything else as separate docker step
# (copy to staging folder, remove csproj, and copy down - so we don't overwrite project above)
WORKDIR /
COPY src/myApp/ ./staging/src/myApp
RUN rm ./staging/src/myApp/*.csproj \
&& cp -r ./staging/* ./ \
&& rm -rf ./staging
This was working fine, and in Windows 10 still does, but in CentOS 7 I get:
Step 10/40 : RUN rm ./staging/src/myApp/*.csproj && cp -r ./staging/* ./ && rm -rf ./staging
---> Running in 6b17ae0fae89
cp: cannot stat './staging/src/myApp/myApp.csproj': No such file or directory
Using ls instead of cp throws a similar file not found error, so it looks like Docker still knows about myApp.csproj but cannot see it since it has been removed.
Is there a way around this? I have tried using rsync but similar problems.

I simply ignored the issue by tacking on ;exit 0 on the offending lines. Not great, but does the job.
EDIT: This worked for me as I cannot upgrade the version of CemtOS. If you can, check out Alexander Block's answer.

I don't know specifically how to solve this problem as there's a lot of context in the filesystem that you haven't (and probably can't) share with us.
My suggestion on a strategy is that you:
comment out all lines from the failing one 'til the end of the Dockerfile
build the partial image
docker exec -it [image] bash to jump into the image
poke around and figure out what's going wrong
repeat 1-4 until things work as expected
It's not as fun as a perfectly insightful answer of course but this is a relentlessly effective algorithm even if it's tedious and annoying.
EDIT
My wild guess is that somehow, someway the linux machine doesn't have the file where it's expected for some reason and so it doesn't get copied into the image at all and that's why the docker build process can't find it. But there's no way to know without debugging the build process.

cp -r will stop and fail with that cannot stat <file> message whenever the source is a symbolic link and the target of the link does not exist. It will not copy links to non-existent files.
So my guess is that after you run COPY src/myApp/ ./staging/src/myApp your file ./staging/src/myApp/myApp.csproj is a symbolic link to a non-existent file. Why the following RUN rm ./staging/src/*.csproj doesn't remove it and stays silent about that, I don't know the answer to that.
To help demonstrate my theory, see below showing cp failing on a symlink on Centos 7.
[547] $ docker run --rm -it centos:7
Unable to find image 'centos:7' locally
7: Pulling from library/centos
524b0c1e57f8: Pull complete
Digest: sha256:e9ce0b76f29f942502facd849f3e468232492b259b9d9f076f71b392293f1582
Status: Downloaded newer image for centos:7
[root#a47b77cf2800 /]# ln -s /tmp/foo /tmp/bar
[root#a47b77cf2800 /]# ls -l /tmp/foo
ls: cannot access /tmp/foo: No such file or directory
[root#a47b77cf2800 /]# ls -l /tmp/bar
lrwxrwxrwx 1 root root 8 Jul 6 05:44 /tmp/bar -> /tmp/foo
[root#a47b77cf2800 /]# cp /tmp/foo /tmp/1
cp: cannot stat '/tmp/foo': No such file or directory
[root#a47b77cf2800 /]# cp /tmp/bar /tmp/2
cp: cannot stat '/tmp/bar': No such file or directory
Notice how you copy reports that it cannot stat either the source or destination of the symbolic link. It's the exact symptom you are seeing.
If you just want to get past this, you can try tar instead of cp or rsync.
Instead of
cp -r ./staging/* ./
use this instead:
tar -C ./staging -cf - . | tar -xf -
tar will happily copy symlinks that don't exist.

You've very likely encountered a kernel bug that has been fixed a long time ago in more recent kernels. As of https://de.wikipedia.org/wiki/CentOS, CentOS 7 is based on the Linux Kernel 3.10, which is pretty old already and does not have good Docker support in regard to the storage backend (overlay filesystem).
CentOS tried to backport needed fixes and features into 3.10, but seems to not have succeeded fully when it comes to overlay support. There are multiple (slightly different) issues regarding this which you can find when searching for "CentOS 7 overlay driver" on the internet. All of them have in common that removing of files from parent overlays does not work as expected.
For me it looks like rm calls on files return success, even though the files are not fully removed. Directory listings (e.g. by ls or shell expansion as in your case) then still list the file, while accessing the file then fails (no matter if read, write or deletion of the file).
I assume that what you've seen is just another incarnation of these issues. You should either switch to CentOS 8 or upgrade your Kernel (which is not officially supported by CentOS as far as I understand). Or even more radical, switch to a distribution which is used more often in combination with Docker and generally offers more recent Kernels, e.g. Debian or Ubuntu.

Related

How can I use a several line command in a Dockerfile in order to create a file within the resulting Image

I'm following installation instructions for RedhawkSDR, which rely on having a Centos7 OS. Since my machine uses Ubuntu 22.04, I'm creating a Docker container to run Centos7 then installing RedhawkSDR in that.
One of the RedhawkSDR installation instructions is to create a file with the following command:
cat<<EOF|sed 's#LDIR#'`pwd`'#g'|sudo tee /etc/yum.repos.d/redhawk.repo
[redhawk]
name=REDHAWK Repository
baseurl=file://LDIR/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk
EOF
How do I get a Dockerfile to execute this command when creating an image?
(Also, although I can see that this command creates the file /etc/yum.repos.d/redhawk.repo, which consists of the lines from [redhawk] to gpgkey=...., I have no idea how to parse this command and understand exactly why it does that...)
Using the text editor of your choice, create the file on your local system. Remove the word sudo from it; give it an additional first line #!/bin/sh. Make it executable using chmod +x create-redhawk-repo.
Now it is an ordinary shell script, and in your Dockerfile you can just RUN it.
COPY create-redhawk-repo ./
RUN ./create-redhawk-repo
But! If you look at what the script actually does, it just writes a file into /etc/yum.repos.d with a LDIR placeholder replaced with some other directory. The filesystem layout inside a Docker image is fixed, and there's no particular reason to use environment variables or build arguments to hold filesystem paths most of the time. You could use a fixed path in the file
[redhawk]
name=REDHAWK Repository
baseurl=file:///redhawk-yum/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk
and in your Dockerfile, just COPY that file in as-is, and make sure the downloaded package archive is in that directory. Adapting the installation instructions:
ARG redhawk_version=3.0.1
RUN wget https://github.com/RedhawkSDR/redhawk/releases/download/$redhawk_version/\
redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& tar xzf redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& rm redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& mv redhawk-yum-$redhawk_version-el7-x86_64 redhawk-yum \
&& rpm -i redhawk-yum/redhawk-release*.rpm
COPY redhawk.repo /etc/yum.repos.d/
Remember that, in a Dockerfile, you are root unless you've switched to another USER (and in that case you can use USER root to switch back); you do not need generally sudo in Docker at all, and can just delete sudo where it appears in these instructions.
How do I get a Dockerfile to execute this command when creating an image?
Just use printf and run this command as single line:
FROM image_name:image_tag
ARG LDIR="/default/folder/if/argument/not/set"
# if container has sudo command and default user is not root
# you should choose this variant
RUN printf '[redhawk]\nname=REDHAWK Repository\nbaseurl=file://%s/\nenabled=1\ngpgcheck=1\ngpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk\n' "$LDIR" | sudo tee /etc/yum.repos.d/redhawk.repo
# if default container user is root this command without piping may be used
RUN printf '[redhawk]\nname=REDHAWK Repository\nbaseurl=file://%s/\nenabled=1\ngpgcheck=1\ngpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk\n' "$LDIR" > /etc/yum.repos.d/redhawk.repo
Where LDIR is an argument and docker build process should be run like:
docker build ./ --build-arg LDIR=`pwd`

Forked docker image not building

I am trying to fork this docker image so that if anything changes on the original it won't affect me.
I have forked the repo corresponding to that image to my own repo.
I have cloned the repo and am trying to build it:
docker build . -t davcal/gcc-cross-x86_64-elf
I am getting this error:
+ cd /usr/local/src
+ ./build-binutils.sh 2.31.1
/bin/sh: 1: ./build-binutils.sh: not found
The command '/bin/sh -c set -x && cd /usr/local/src && ./build-binutils.sh ${BINUTILS_VERSION} && ./build-gcc.sh ${GCC_VERSION}' returned a non-zero code: 127
What makes no sense to me is that if I use the original image, it builds successfully:
FROM randomdude/gcc-cross-x86_64-elf
...
Maybe Docker Hub stores a pre-built image?
How do I fix this?
Note: I am using Windows. This shouldn't make a difference since the error originates within the container.
Edit
I tried patching the Dockerfile to chmod executable permissions to the sh files in case that was causing problems on Windows. Unfortunately, the exact same error occurs.
RUN set -x \
&& chmod +x /usr/local/src/build-binutils.sh \
&& chmod +x /usr/local/src/build-gcc.sh \
&& cd /usr/local/src \
&& ./build-binutils.sh ${BINUTILS_VERSION} \
&& ./build-gcc.sh ${GCC_VERSION}
Edit 2
Following this method, I inspected the container to see if the sh files actually exist. Here is the output.
I ran docker run --rm -it c53693f11514 bash, including the hash of the intermediate container of the previous successful step of the Dockerfile.
This is the output showing that the files do exist:
root#9b8a64ac2090:/# cd usr/local/src
root#9b8a64ac2090:/usr/local/src# ls
binutils-2.31.1 build-binutils.sh build-gcc.sh gcc-8.2.0
From the described symptoms, file exists, is a shell script, and works on other machines, the "file not found" error is most likely from Winidows linefeeds being added to the file. When the Linux kernel processes a shell script, it looks at the first line, the #!/bin/sh or similar, and then finds that interpreter to run the shell script. If that interpreter isn't found, you'll get a "file not found" error.
In this case, the file it's looking for won't be /bin/sh, but instead /bin/sh\r or /bin/sh^M depending on how you want to represent the carriage return character. You can fix that for single files with a tool like dos2unix but in general, you'll want to fix git itself since there are likely other files that have had their linefeeds corrupted. For details on adjusting the behavior of git, see this post.

Docker not running certain RUN directives?

I am trying to run a docker container to automatically set up a sphinx documentation site, but for some reason I get the following error when I try to build
Step 9/11 : RUN make html
---> Running in abd76075d0a0
make: *** No rule to make target 'html'. Stop.
When I run the container and console in, I see that sphinx-quickstart does not seem to have been run since there are no files present at all in /sphinx. Not sure what I have done wrong. Dockerfile is below.
1 # Run this with
2 # docker build .
3 # docker run -dit -p 8000:8000 <image_id>
4 FROM ubuntu:latest
5
6 WORKDIR /sphinx
7 VOLUME /sphinx
8
9 RUN apt-get update -y
10 RUN apt-get install python3 python3-pip vim git -y
11
12 RUN pip3 install -U pip
13 RUN pip3 install sphinx
14
15 RUN sphinx-quickstart . --quiet --project devops --author 'Timothy Pulliam' -v '0.1' --language 'en' --makefile
16 RUN make html
17
18 EXPOSE 8000/tcp
19
20
21 CMD ["python3", "-m", "http.server"]
EDIT:
Using LinPy's suggestion I was able to get it to work. It is still strange that it would not work the other way.
The Dockerfile VOLUME directive mostly only has confusing side effects. Unless you’re 100% clear on what it does and why you want it, you should just delete it.
In particular, one of those confusing side effects is that RUN commands that write into the volume directory just get lost. So when on line 7 you say VOLUME /sphinx, the RUN sphinx-quickstart on line 15 tries to write its output into the current directory, which is a declared volume directory, so the output content isn’t persisted into the image.
(Storing your code in a volume isn’t generally appropriate; build it into the image so it’s reusable later. You can use docker run -v to bind-mount content over any container-side directory regardless of whether or not it’s declared as a VOLUME.)
so you need to set those in one line:
RUN sphinx-quickstart . --quiet --project devops --author 'Timothy Pulliam' -v '0.1' --language 'en' --makefile && make html
I think you can see in the logs , remove intermediate container there for the rule html is not there anymore
You've already resolved the issue with LinPy's helpful comment, but just to add more, doing a quick google search with your error message comes up with this StackOverflow post...
gcc makefile error: "No rule to make target ..."
Perhaps you were accidentally invoking a different command (in this case a GCC command) rather than the .bat file provided by Sphinx.
Hopefully this might shed a bit more light on WHY it was happening. I assume the Ubuntu parent image you're using has GCC pre-installed.

Can't load docker from saved tar image

I committed my container as an image, then used "docker save" to save the image as a tar. Now I'm trying to load the tar on a GCC Centos 7 instance. I packaged it locally on my Ubuntu machine.
I've tried: docker load < image.tar and sudo docker load < image.tar
I also tried chmod 777 image.tar to see if the issue was permissions related.
Each time I try to load the image I get a variation of this error (the xxxx bit is a different number every time):
open /var/lib/docker/tmp/docker-import-xxxxxxxxx/repositories: no such file or directory
I think it might have something to do with permissions, because when I try to cd into /var/lib/docker/ I run into permissions issues.
Are there any other things I can try? Or is it likely to be a corrupted image?
There was a simple answer to this problem
I ran md5 checksums on the images before and after I moved them across systems and they were different. I re-transferred and all is working now.
for me the problem was that I added .tgz as an input. Once I extracted the tarball - there were the .tar file. Using that file as input it was successful.
The sequence of the BUILD(!) command is important, try this sequence:
### 1/3: Build it:
# docker build -f MYDOCKERFILE -t MYCNTNR .
### 2/3: Save it:
# docker save -o ./mycontainer.tar MYCNTNR
### 3/3: Copy it to the target machine:
# rsync/scp/... mycontainer.tar someone#target:.
### 4/4: On the target, load it :
# docker load -i MYCNTNR.tar
<snip>
Loaded image: MYCNTNR
I had the same issue and the following command fixed it for me:
cat <file_name>.tar | docker import - <image_name>:<image_version/tag>
Ref: https://webkul.com/blog/error-open-var-lib-docker-tmp-docker-import/

Dockerfile's RUN command doesn't find script

Using Docker Toolbox on Windows 10, Docker cannot build an image from my Dockerfile because it doesn't find a script (install-composer) that was copied to the image.
FROM php:7.2.5-apache
COPY scripts/install-composer /usr/bin
RUN chmod +x /usr/bin/install-composer
RUN /usr/bin/install-composer
The error I get, when reating the last RUN command, is:
/bin/sh: 1: /usr/bin/install-composer: not found
The chmod command does work however, indicating the file does actually exist in the image.
A very simple problem but a very misleading error.
The problem was caused by wrong file endings. Git was set up to convert the project files into Windows (CRLF) file endings. I reinstalled Git with the setting "Checkout as-is, commit Unix-style", deleted and recloned the repository, and it fixed the problem.
When it comes to explaining the misleading and confusing error message, my guess is that the file install-composer was actually found and executed. What it is actually saying was that was not found. This empty name was simply the CR caught between two LF (in other words, an empty line) and sh interpreted it as a call to a script file.
Try and group those RUN commands:
RUN chmod +x /usr/bin/install-composer && \
ls -alrth /usr/bin/install* && \
/usr/bin/install-composer
That way, you will see if the file is indeed copied and present.
You can also try, for your second RUN:
RUN /bin/bash -c "/usr/bin/install"
(assuming your script uses bash, and you have a bash installed in your image)

Resources