Copy a directory from container to host - docker

I have saved a folder called "ec2-user" in an image. And I can easily extract one of the files...
docker run -it shantanuo/bkup sh
docker exec -it e7611fc860a6 sh -c " cat /tmp/ec2-user/t1.txt" > t1.txt
This works as expected. But I do not know how to copy the entire directory "ec2-user" that is around 8 GB
In other words I am using docker as a backup device. This is different from application deployment, but I would like to know if this is OK.

Looking at the name of your file and its size (8 GB), I guess you're doing a database backup?
Since you want to copy the entire directory and it is relatively big, why not compress the backup directory into a single file and then just do
docker cp my_container:/tmp/bkup/abc.xz .
To compress your backup you can use the xz utility.
Example (run inside the container, so that the resulting archive can be copied out in one go):
tar -cf - /tmp/ec2-user | xz -8 > /tmp/bkup.xz
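The pack-then-compress step can be demonstrated locally without Docker. A minimal sketch, assuming the xz utility is installed; all paths below are made up:

```shell
# A throwaway directory standing in for /tmp/ec2-user
mkdir -p /tmp/bkup_demo/ec2-user
echo "hello" > /tmp/bkup_demo/ec2-user/t1.txt

# Pack the directory and compress it into a single file
tar -C /tmp/bkup_demo -cf - ec2-user | xz -8 > /tmp/bkup_demo/abc.xz

# Verify the archive round-trips
mkdir -p /tmp/bkup_restore
xz -dc /tmp/bkup_demo/abc.xz | tar -C /tmp/bkup_restore -xf -
cat /tmp/bkup_restore/ec2-user/t1.txt
```

Once the single .xz file exists inside the container, a single docker cp of that file replaces copying the whole 8 GB tree file by file.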

Related

Why is docker not completely deleting my file?

I am trying to build using:
FROM mcr.microsoft.com/dotnet/core/sdk:2.1 AS builder
COPY pythonnet/src/ pythonnet/src
WORKDIR /pythonnet/src/runtime
RUN dotnet build -f netstandard2.0 -p:DefineConstants=\"MONO_LINUX\;XPLAT\;PYTHON3\;PYTHON37\;UCS4\;NETSTANDARD\" Python.Runtime.15.csproj
# copy myApp csproj and restore
COPY src/myApp/*.csproj /src/myApp/
WORKDIR /src/myApp
RUN dotnet restore
# now copy everything else as separate docker step
# (copy to staging folder, remove csproj, and copy down - so we don't overwrite project above)
WORKDIR /
COPY src/myApp/ ./staging/src/myApp
RUN rm ./staging/src/myApp/*.csproj \
&& cp -r ./staging/* ./ \
&& rm -rf ./staging
This was working fine, and in Windows 10 still does, but in CentOS 7 I get:
Step 10/40 : RUN rm ./staging/src/myApp/*.csproj && cp -r ./staging/* ./ && rm -rf ./staging
---> Running in 6b17ae0fae89
cp: cannot stat './staging/src/myApp/myApp.csproj': No such file or directory
Using ls instead of cp throws a similar file not found error, so it looks like Docker still knows about myApp.csproj but cannot see it since it has been removed.
Is there a way around this? I have tried using rsync but similar problems.
I simply ignored the issue by tacking on ;exit 0 on the offending lines. Not great, but does the job.
EDIT: This worked for me as I cannot upgrade the version of CentOS. If you can, check out Alexander Block's answer.
I don't know specifically how to solve this problem, as there's a lot of context in the filesystem that you haven't shared with us (and probably can't).
My suggestion on a strategy is that you:
comment out all lines from the failing one 'til the end of the Dockerfile
build the partial image
docker run -it [image] bash to jump into the image (docker exec only works against an already-running container)
poke around and figure out what's going wrong
repeat 1-4 until things work as expected
It's not as fun as a perfectly insightful answer of course but this is a relentlessly effective algorithm even if it's tedious and annoying.
EDIT
My wild guess is that somehow the Linux machine doesn't have the file where it's expected, so it never gets copied into the image at all, and that's why the docker build process can't find it. But there's no way to know without debugging the build process.
cp -r will stop and fail with that cannot stat <file> message whenever the source is a symbolic link and the target of the link does not exist. It will not copy links to non-existent files.
So my guess is that after you run COPY src/myApp/ ./staging/src/myApp, your file ./staging/src/myApp/myApp.csproj is a symbolic link to a non-existent file. Why the following RUN rm ./staging/src/myApp/*.csproj doesn't remove it and stays silent about that, I don't know.
To help demonstrate my theory, see below showing cp failing on a symlink on Centos 7.
[547] $ docker run --rm -it centos:7
Unable to find image 'centos:7' locally
7: Pulling from library/centos
524b0c1e57f8: Pull complete
Digest: sha256:e9ce0b76f29f942502facd849f3e468232492b259b9d9f076f71b392293f1582
Status: Downloaded newer image for centos:7
[root@a47b77cf2800 /]# ln -s /tmp/foo /tmp/bar
[root@a47b77cf2800 /]# ls -l /tmp/foo
ls: cannot access /tmp/foo: No such file or directory
[root@a47b77cf2800 /]# ls -l /tmp/bar
lrwxrwxrwx 1 root root 8 Jul 6 05:44 /tmp/bar -> /tmp/foo
[root@a47b77cf2800 /]# cp /tmp/foo /tmp/1
cp: cannot stat '/tmp/foo': No such file or directory
[root@a47b77cf2800 /]# cp /tmp/bar /tmp/2
cp: cannot stat '/tmp/bar': No such file or directory
Notice how cp reports that it cannot stat either the link's target or the link itself. It's the exact symptom you are seeing.
If you just want to get past this, you can try tar instead of cp or rsync.
Instead of
cp -r ./staging/* ./
use this instead:
tar -C ./staging -cf - . | tar -xf -
tar will happily copy symlinks that don't exist.
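The difference is easy to reproduce locally. A sketch with made-up paths, using plain cp as in the CentOS demo above:

```shell
# A source tree containing a symlink whose target does not exist
mkdir -p /tmp/dangling_src /tmp/dangling_dst
ln -s /tmp/no-such-target /tmp/dangling_src/broken.csproj

# cp dereferences the link and fails with "cannot stat"
cp /tmp/dangling_src/broken.csproj /tmp/dangling_copy 2>/dev/null \
  || echo "cp failed on the dangling link"

# The tar pipe copies the link itself, target or no target
tar -C /tmp/dangling_src -cf - . | tar -C /tmp/dangling_dst -xf -
[ -L /tmp/dangling_dst/broken.csproj ] && echo "tar copied the symlink"
```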
You've very likely encountered a kernel bug that was fixed long ago in more recent kernels. According to https://de.wikipedia.org/wiki/CentOS, CentOS 7 is based on Linux kernel 3.10, which is already quite old and does not have good Docker support with regard to the storage backend (overlay filesystem).
CentOS tried to backport the needed fixes and features into 3.10, but seems not to have fully succeeded when it comes to overlay support. There are multiple (slightly different) issues regarding this which you can find when searching for "CentOS 7 overlay driver" on the internet. All of them have in common that removal of files from parent overlays does not work as expected.
For me it looks like rm calls on files return success even though the files are not fully removed. Directory listings (e.g. by ls, or shell expansion as in your case) then still list the file, while accessing it fails (no matter whether for reading, writing, or deletion).
I assume that what you've seen is just another incarnation of these issues. You should either switch to CentOS 8 or upgrade your kernel (which, as far as I understand, is not officially supported by CentOS). Or, even more radically, switch to a distribution that is used more often in combination with Docker and generally offers more recent kernels, e.g. Debian or Ubuntu.

failing to `tar -zxvf` on a tarball with a symlink, inside Docker on Mac (tar: Cannot utime: No such file or directory)

I am doing a build inside an Alpine Linux container on Docker-on-Mac. Inside the tarball there is a symlink to a local file (README -> README.md) that is failing the untarring:
tar: tarname.tar.gz/README: Cannot utime: No such file or directory
two interesting facts:
When running the same on Linux (Docker-on-Ubuntu running Alpine) it works flawlessly.
When running it twice it succeeds, because README.md is already present in the partially-created directory.
Unfortunately, regarding (2) above, the untarring happens as part of a build program (Alpine's abuild), so I cannot just run the tar command twice.
any thoughts?
tar tries to get the 'file modified time', which fails for some files. As I don't know the contents of the archive, I cannot say why this happens.
But you can circumvent this problem by using the -m a.k.a. --touch flag, which tells tar not to restore file modification times on extraction, so it never calls utime:
tar -m -xzvf tarname.tar.gz
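The flag can be tried out on any tarball. This sketch (made-up paths) just shows -m in use; it does not reproduce the Docker-on-Mac failure itself:

```shell
# Build a small gzipped tarball
mkdir -p /tmp/utime_src
echo "content" > /tmp/utime_src/README.md
tar -C /tmp/utime_src -czf /tmp/utime_demo.tar.gz README.md

# Extract without restoring modification times (-m / --touch),
# so tar never has to set a timestamp on the extracted entries
mkdir -p /tmp/utime_out
tar -m -xzf /tmp/utime_demo.tar.gz -C /tmp/utime_out
cat /tmp/utime_out/README.md
```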

Can't load docker from saved tar image

I committed my container as an image, then used docker save to save the image as a tar. Now I'm trying to load the tar on a GCC CentOS 7 instance. I packaged it locally on my Ubuntu machine.
I've tried: docker load < image.tar and sudo docker load < image.tar
I also tried chmod 777 image.tar to see if the issue was permissions related.
Each time I try to load the image I get a variation of this error (the xxxx bit is a different number every time):
open /var/lib/docker/tmp/docker-import-xxxxxxxxx/repositories: no such file or directory
I think it might have something to do with permissions, because when I try to cd into /var/lib/docker/ I run into permissions issues.
Are there any other things I can try? Or is it likely to be a corrupted image?
There was a simple answer to this problem
I ran md5 checksums on the images before and after I moved them across systems and they were different. I re-transferred and all is working now.
For me the problem was that I passed the .tgz as input. Once I extracted the tarball, there was a .tar file inside; using that file as input succeeded.
The sequence of the BUILD(!) command is important; try this sequence:
### 1/4: Build it:
# docker build -f MYDOCKERFILE -t MYCNTNR .
### 2/4: Save it:
# docker save -o ./mycontainer.tar MYCNTNR
### 3/4: Copy it to the target machine:
# rsync/scp/... mycontainer.tar someone@target:.
### 4/4: On the target, load it:
# docker load -i MYCNTNR.tar
<snip>
Loaded image: MYCNTNR
I had the same issue and the following command fixed it for me:
cat <file_name>.tar | docker import - <image_name>:<image_version/tag>
Ref: https://webkul.com/blog/error-open-var-lib-docker-tmp-docker-import/

How to get files generated by docker run to host

I have run docker run to generate a file
sudo docker run -i --mount type=bind,src=/home/mathed/Simulation/custom_desman/1/Strains/Simulation2/Assembly,target=/home/mathed/Simulation/custom_desman/1/Strains/Simulation2/Assembly 990210oliver/mycc.docker:v1 MyCC.py /home/mathed/Simulation/custom_desman/1/Strains/Simulation2/Assembly/final_contigs_c10K.fa
This is the message I've got after executing.
20181029_0753
4mer
1_rename.py /home/mathed/Simulation/custom_desman/1/Strains/Simulation2/Assembly/final_contigs_c10K.fa 1000
Seqs >= 1000 : 32551
Minimum contig lengh for first stage clustering: 1236
run Prodigal.
/opt/prodigal.linux -i My.fa -a gene.aa -d gene.nuc -f gbk -o output -s potential_genes.txt
run fetchMG.
run UCLUST.
Get Feature.
2_GetFeatures_4mer.py for fisrt stage clustering
2_GetFeatures_4mer.py for second stage clustering
3_GetMatrix.py 1236 for fisrt stage clustering
22896 contigs entering first stage clustering
Clustering...
1_bhsne.py 20
2_ap.py /opt/ap 500 0
Cluster Correction.
to Split and Merge.
1_ClusterCorrection_Split.py 40 2
2_ClusterCorrection_Merge.py 40
Get contig by cluster.
20181029_0811
I now want to get the files generated by MyCC.py to host.
After reading Copying files from Docker container to host, I tried,
sudo docker cp 642ef90103be:/opt /home/mathed/data
But I got an error message
mkdir /home/mathed/data/opt: permission denied
Is there a way to get the files generated to a directory /home/mathed/data?
Thank you.
I assume your destination path does not exist.
The docker cp documentation states that in that case:
SRC_PATH specifies a directory
DEST_PATH does not exist
DEST_PATH is created as a directory and the contents of the source directory are copied into this directory
Thus it is trying to create a directory for DEST_PATH, and docker must have the rights to do so.
Depending on who owns the closest existing parent of DEST_PATH, you may have to either:
create the directory first yourself, so that it is not created by docker, and give it the correct rights (sudo chown {user} {folder} plus chmod),
change the rights on the existing parent directory (chown + chmod again), or
switch to a path where docker is allowed to write.
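A sketch of the first option (the directory name is made up; the privileged lines are left commented since they need root and a running container):

```shell
# Pre-create the destination so docker cp only writes into it,
# instead of having to create it
mkdir -p /tmp/mathed_data
# sudo chown "$USER" /tmp/mathed_data        # give yourself the rights
# docker cp 642ef90103be:/opt /tmp/mathed_data
[ -w /tmp/mathed_data ] && echo "destination exists and is writable"
```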

How to verify if the content of two Docker images is exactly the same?

How can we determine that two Docker images have exactly the same file system structure, and that the content of corresponding files is the same, irrespective of file timestamps?
I tried the image IDs but they differ when building from the same Dockerfile and a clean local repository. I did this test by building one image, cleaning the local repository, then touching one of the files to change its modification date, then building the second image, and their image IDs do not match. I used Docker 17.06 (the latest version I believe).
If you want to compare the content of images you can use the docker inspect <imageName> command and look at the RootFS section:
docker inspect redis
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:eda7136a91b7b4ba57aee64509b42bda59e630afcb2b63482d1b3341bf6e2bbb",
"sha256:c4c228cb4e20c84a0e268dda4ba36eea3c3b1e34c239126b6ee63de430720635",
"sha256:e7ec07c2297f9507eeaccc02b0148dae0a3a473adec4ab8ec1cbaacde62928d9",
"sha256:38e87cc81b6bed0c57f650d88ed8939aa71140b289a183ae158f1fa8e0de3ca8",
"sha256:d0f537e75fa6bdad0df5f844c7854dc8f6631ff292eb53dc41e897bc453c3f11",
"sha256:28caa9731d5da4265bad76fc67e6be12dfb2f5598c95a0c0d284a9a2443932bc"
]
}
If all layers are identical then the images contain identical content.
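A quick way to act on that, sketched here with invented, shortened digests; in real use each file would hold the output of docker inspect --format '{{json .RootFS.Layers}}' IMAGE:

```shell
# Layer digest lists captured from three hypothetical images
printf 'sha256:aaa\nsha256:bbb\n' > /tmp/layers_img1
printf 'sha256:aaa\nsha256:bbb\n' > /tmp/layers_img2
printf 'sha256:aaa\nsha256:ccc\n' > /tmp/layers_img3

# Identical layer lists imply identical image content
diff -q /tmp/layers_img1 /tmp/layers_img2 >/dev/null && echo "img1 and img2 match"
diff -q /tmp/layers_img1 /tmp/layers_img3 >/dev/null || echo "img1 and img3 differ"
```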
After some research I came up with a solution which is fast and clean per my tests.
The overall solution is this:
Create a container for your image via docker create ...
Export its entire file system to a tar archive via docker export ...
Pipe the archive's directory names, symlink names, symlink targets, file names, and file contents into a hash function (e.g., MD5)
Compare the hashes of different images to verify if their contents are equal or not
And that's it.
Technically, this can be done as follows:
1) Create file md5docker, and give it execution rights, e.g., chmod +x md5docker:
#!/bin/sh
dir=$(dirname "$0")
docker create $1 | { read cid; docker export $cid | $dir/tarcat | md5; docker rm $cid > /dev/null; }
2) Create file tarcat, and give it execution rights, e.g., chmod +x tarcat:
#!/usr/bin/env python3
# coding=utf-8
if __name__ == '__main__':
    import sys
    import tarfile
    with tarfile.open(fileobj=sys.stdin.buffer, mode="r|*") as tar:
        for tarinfo in tar:
            if tarinfo.isfile():
                print(tarinfo.name, flush=True)
                with tar.extractfile(tarinfo) as file:
                    sys.stdout.buffer.write(file.read())
            elif tarinfo.isdir():
                print(tarinfo.name, flush=True)
            elif tarinfo.issym() or tarinfo.islnk():
                print(tarinfo.name, flush=True)
                print(tarinfo.linkname, flush=True)
            else:
                print("\33[0;31mIGNORING:\33[0m ", tarinfo.name, file=sys.stderr)
3) Now invoke ./md5docker <image>, where <image> is your image name or id, to compute an MD5 hash of the entire file system of your image.
To verify if two images have the same contents just check that their hashes are equal as computed in step 3).
Note that this solution only considers directory structure, regular file contents, and symlinks (soft and hard) as content. If you need more, just change the tarcat script by adding more elif clauses testing for the content you wish to include (see Python's tarfile documentation, and look for the TarInfo.isXXX() methods corresponding to the content you need).
The only limitation I see in this solution is its dependency on Python (I am using Python 3, but it should be very easy to adapt to Python 2). A better solution without any dependency, and probably faster (hey, this is already very fast), would be to write the tarcat script in a language supporting static linking, so that a standalone executable (one requiring nothing but the OS) is enough. I leave this as a future exercise in C, Rust, OCaml, Haskell, you choose.
Note, if MD5 does not suit your needs, just replace md5 inside the first script with your hash utility.
Hope this helps anyone reading.
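The idea behind md5docker/tarcat can be exercised without Docker by hashing a local tar stream instead of a docker export. A sketch with made-up paths, using sha256sum in place of md5; unlike tarcat, this quick version hashes file contents only, not names:

```shell
# A tiny tree standing in for an exported container filesystem
mkdir -p /tmp/hash_demo
echo "alpha" > /tmp/hash_demo/file.txt

# Hash the tar stream's file contents (tar -O extracts to stdout)
h1=$(tar -C /tmp/hash_demo -cf - . | tar -xOf - | sha256sum)
h2=$(tar -C /tmp/hash_demo -cf - . | tar -xOf - | sha256sum)

# Change the content and hash again
echo "beta" > /tmp/hash_demo/file.txt
h3=$(tar -C /tmp/hash_demo -cf - . | tar -xOf - | sha256sum)

[ "$h1" = "$h2" ] && echo "same content, same hash"
[ "$h1" != "$h3" ] && echo "changed content, changed hash"
```

Because -O discards metadata such as timestamps, repeated runs over identical content produce identical hashes, which is exactly the property the image comparison relies on.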
Amazes me that docker doesn't do this sort of thing out of the box. Here's a variant on @mljrg's technique:
#!/bin/sh
docker create $1 | {
    read cid
    docker export $cid | tar Oxv 2>&1 | shasum -a 256
    docker rm $cid > /dev/null
}
It's shorter, doesn't need a python dependency or a second script at all, I'm sure there are downsides but it seems to work for me with the few tests I've done.
There doesn't seem to be a standard way of doing this. The best way I can think of is using the Docker multi-stage build feature.
For example, here I am comparing the alpine and debian images. In your case, set the image names to the ones you want to compare.
I basically copy all the files from each image into a git repository and commit after each copy.
FROM alpine as image1
FROM debian as image2
FROM ubuntu
RUN apt-get update && apt-get install -y git
RUN git config --global user.email "you@example.com" &&\
    git config --global user.name "Your Name"
RUN mkdir images
WORKDIR images
RUN git init
COPY --from=image1 / .
RUN git add . && git commit -m "image1"
COPY --from=image2 / .
RUN git add . && git commit -m "image2"
CMD tail > /dev/null
This will give you an image with a git repository that records the differences between the two images.
docker build -t compare .
docker run -it compare bash
Now if you do a git log you can see the logs and you can compare the two commits using git diff <commit1> <commit2>
Note: If the image building fails at the second commit, this means that the images are identical, since a git commit will fail if there are no changes to commit.
If you rebuild from the Dockerfile, it is almost certainly going to produce a new hash.
The only way to create an image with the same hash is to use docker save and docker load. See https://docs.docker.com/engine/reference/commandline/save/
You can then use Bukharov Sergey's answer (i.e. docker inspect) to inspect the layers, looking at the section with key RootFS.
