what does 'tar --overwrite' actually do (or not do)?

I see that Linux tar has an option --overwrite. But overwriting seems to be the default. Moreover, specifying tar --no-overwrite does not change this behavior, although the info file seems to suggest it should.
So what does that option actually do?
I test it with
ls -l >junk
ls -l junk
tar -cf junk.tar junk
>junk
ls -l junk
tar <option?> -xf junk.tar # option varies, results do not
ls -l junk

There are a few subtleties, but in general, here's the difference:
By default, "tar" tries to open output files with the flags O_CREAT | O_EXCL. If the file exists, this will fail, after which "tar" will retry by first trying to delete the existing file and then re-opening with the same flags (i.e., creating a new file).
In contrast, with the --overwrite option, "tar" tries to open output files with the flags O_CREAT | O_TRUNC. If the file exists, it will be truncated to zero size and overwritten.
The main implication is that "tar" by default will delete and re-create existing files, so they'll get new inode numbers. With --overwrite, the inode numbers won't change:
$ ls -li foo
total 0
5360222 -rw-rw-r-- 1 buhr buhr 0 Jun 26 15:16 bar
$ tar -cf foo.tar foo
$ tar -xf foo.tar # inode will change
$ ls -li foo
total 0
5360224 -rw-rw-r-- 1 buhr buhr 0 Jun 26 15:16 bar
$ tar --overwrite -xf foo.tar # inode won't change
$ ls -li foo
total 0
5360224 -rw-rw-r-- 1 buhr buhr 0 Jun 26 15:16 bar
$
This also means that, for each file overwritten, "tar" by default will need three syscalls (open, unlink, open) while --overwrite will need only one (open with truncation).
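The two open strategies described above can be imitated outside of tar. The following is a minimal Python sketch, not tar's actual code: the function names and the extra hard link bar.link are assumptions made for illustration. The hard link keeps the original inode alive, so the difference stays visible even on filesystems that reuse inode numbers.

```python
import os
import tempfile


def write_default(path, data):
    """Imitate the default: exclusive create; on failure, unlink and retry."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        os.unlink(path)                    # drop the existing directory entry
        fd = os.open(path, flags, 0o644)   # creates a brand-new file
    with os.fdopen(fd, "wb") as f:
        f.write(data)


def write_overwrite(path, data):
    """Imitate --overwrite: truncate the existing file and keep its inode."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bar")
        link = os.path.join(d, "bar.link")  # hypothetical second name for the inode
        with open(path, "wb") as f:
            f.write(b"old")
        os.link(path, link)

        # Default strategy: "bar" becomes a new file, the link keeps the old one.
        write_default(path, b"new")
        default_same_inode = os.stat(path).st_ino == os.stat(link).st_ino

        # Point the link at the current file, then overwrite in place.
        os.unlink(link)
        os.link(path, link)
        write_overwrite(path, b"newer")
        overwrite_same_inode = os.stat(path).st_ino == os.stat(link).st_ino

        return default_same_inode, overwrite_same_inode


if __name__ == "__main__":
    print(demo())
```

On a POSIX filesystem that supports hard links, demo() should return (False, True): the default strategy leaves the path on a different inode than the surviving link, while the truncating open keeps the same one.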

Related

Bind-Mount a single File with docker-compose

In my docker-compose (3.7) file I have something like
- ./a/src/:/var/www/html/
- ./a/config/local.php.ini:/usr/local/etc/php/conf.d/local.ini
as can be found for example in this example.
Whenever I change something on host in the ./a/src directory or in the container in /var/www/html/ it gets changed on the other side as expected. They are the same as they should be.
Not so with the file. It gets copied (I guess) to the container. But then, if I change local.php.ini on the host or /usr/local/etc/php/conf.d/local.ini the other one remains the same.
Is that the expected behavior? If yes, why, and is it possible to change it so that both files stay the same, as with the directory?
Note: This is not a duplicate of How to mount a single file in a volume. I get my file as file not as directory or such. I nevertheless tried it with absolute directories with ${PWD} as suggested there but that changed nothing.
Docker version 19.03.1, build 74b1e89
docker-compose version 1.24.1, build 4667896b
Host and container systems are Debian.

Please go through this.
I guess it might have been caused by the following:
If you edit the file using a text editor like vim, when you save the
file it does not save the file directly; rather, it creates a new file
and copies it into place. This breaks the bind-mount, which is based
on inode. Since saving the file effectively changes the inode, changes
will not propagate into the container. Restarting the container will
pick up the new inode and changes will be reflected.
Here is an example, explaining what I mean:
# Create a file on host and list its contents and its inode number
-------------------
$ echo 'abc' > /root/file.txt
$ cat /root/file.txt
abc
$ ls -ltrhi /root/
total 4K
1623230 -rw-r--r-- 1 root root 4 Aug 23 17:44 file.txt
$
# Run an alpine container by mounting this file.txt
---------------------
$ docker run -itd -v /root/file.txt:/var/tmp/file.txt alpine sh
d59a2ad308d2de7dfbcf042439b295b27370e4014be94bc339f1c5c880bf205f
$
# Check file contents of file.txt and its inode number inside alpine container
$ docker exec -it d59a2ad308d2 sh
/ # cat /var/tmp/file.txt
abc
/ # ls -ltrhi /var/tmp/
total 4K
1623230 -rw-r--r-- 1 root root 4 Aug 23 17:44 file.txt
/ #
## NOTE: The inode number of file.txt is same here 1623230 on host and inside the container.
# Edit the file.txt inside alpine container using some text editor like vi
--------------------------
/ # vi /var/tmp/file.txt
/ # ls -ltrhi /var/tmp/
total 4K
1623230 -rw-r--r-- 1 root root 5 Aug 23 17:46 file.txt
/ # cat /var/tmp/file.txt
abcd
/ #
# Check content of file.txt on host, it will be the same as the one inside container since the inode number of file.txt inside container and on host is still same 1623230
--------------------------
$ cat /root/file.txt   # run on the host
abcd
# Now edit content of file.txt on host and check its inode number.
$ vi file.txt
$ ls -ltrhi /root/
total 4K
862510 -rw-r--r-- 1 root root 6 Aug 23 17:47 file.txt
$ cat file.txt
abcde
$
## NOTE: the inode number of file.txt on host is changed to 862510 after editing the file using vi editor.
# Check content of file.txt inside alpine container and list its inode number
----------------------------
$ docker exec -it d59a2ad308d2 sh
/ # ls -ltrhi /var/tmp/
total 4K
1623230 -rw-r--r-- 0 root root 5 Aug 23 17:46 file.txt
/ # cat /var/tmp/file.txt
abcd
/ #
## NOTE: inode number here is the old one and doesn't match with the one on the host and hence the content of file.txt also doesn't match.
# Restart alpine container
---------------------------
$ docker restart d59a2ad308d2
d59a2ad308d2
$ docker exec -it d59a2ad308d2 sh
/ # cat /var/tmp/file.txt
abcde
/ # ls -ltrhi /var/tmp/
total 4K
862510 -rw-r--r-- 1 root root 6 Aug 23 17:47 file.txt
/ # [node1] (local) root@192.168.0.38 ~
$
## NOTE: After restarting container, the inode of file.txt is matching with the one on host and so the file contents also match.
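The transcript above can be reproduced without Docker. This is a rough Python sketch under the assumption that a single-file bind mount behaves like a handle pinned to the original inode. An open file descriptor plays that role here, and the helper names are made up for illustration.

```python
import os
import tempfile


def save_in_place(path, text):
    """Rewrite the existing file; the inode stays the same."""
    with open(path, "w") as f:
        f.write(text)


def save_by_rename(path, text):
    """Write a new file and rename it over the old one, as many editors do."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # the path now refers to a different inode


def read_pinned(fd):
    """Read through a descriptor that still refers to the original inode."""
    os.lseek(fd, 0, os.SEEK_SET)
    return os.read(fd, 4096).decode()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.txt")
        save_in_place(path, "abc\n")
        fd = os.open(path, os.O_RDONLY)  # stand-in for the bind mount
        try:
            save_in_place(path, "abcd\n")
            after_in_place = read_pinned(fd)   # sees the edit
            save_by_rename(path, "abcde\n")
            after_rename = read_pinned(fd)     # still the old inode
            with open(path) as f:
                current = f.read()             # what the host path shows now
        finally:
            os.close(fd)
        return after_in_place, after_rename, current


if __name__ == "__main__":
    print(demo())
```

The pinned descriptor follows the in-place edit but keeps showing the old content after the rename-style save, which mirrors what the container sees until it is restarted.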
I also highly recommend going through this link; it has more info.
Hope this helps.

Make grep search multiple files in time order

I have a directory with a lot of files like this.
If I use grep to search for a string in those files, it searches them in name order: file log.0, then log.1, and so on.
But I want grep to search in time order, so I did this:
grep -i 'stg_data.li51_cicmpdtap0521' $(ls -ltr sequencer_cmbcs_seq_debug.log*) | less
but I get this error:
grep: invalid option -- -
After I changed it to this, it worked:
grep -i 'stg_data.li51_cicmpdtap0521' $(ls -tr sequencer_cmbcs_seq_debug.log*) | less
Why does ls -ltr not work when ls -tr does? What difference does -l make here?

The reason ls -ltr does not work is that the shell passes the entire "long" directory listing to grep as arguments. Each line of that listing looks something like this:
-rw-rw-rw- 1 user staff 473 May 24 18:14 file
which gives you a grep command like this:
grep -i 'string' -rw-rw-rw- 1 user staff 473 May 24 18:14 file | less
Notice the dashes in the first column. grep parses any word that begins with a dash as a group of options, so -rw-rw-rw- is read as the options -r and -w followed by a bare "-", which is not a valid option, hence "invalid option". When you changed your ls command to drop the -l long output, only file names were left and grep was able to proceed.
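If parsing ls output feels fragile, the same time-ordered search can be done without ls. The following is a small Python sketch under stated assumptions: grep_in_time_order and the debug.log* names in the demo are hypothetical, and the pattern is treated as a Python regular expression rather than a grep one.

```python
import glob
import os
import re
import tempfile


def grep_in_time_order(pattern, file_glob, ignore_case=True):
    """Search matching files oldest-first (like `ls -tr`) and collect hits."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    paths = sorted(glob.glob(file_glob), key=os.path.getmtime)
    hits = []
    for path in paths:
        with open(path, errors="replace") as f:
            for line in f:
                if regex.search(line):
                    hits.append((path, line.rstrip("\n")))
    return hits


def demo():
    with tempfile.TemporaryDirectory() as d:
        newer = os.path.join(d, "debug.log.0")
        older = os.path.join(d, "debug.log.1")
        with open(newer, "w") as f:
            f.write("second STG_DATA hit\n")
        with open(older, "w") as f:
            f.write("first stg_data hit\n")
        # Force the modification times so the order is deterministic.
        os.utime(older, (1000, 1000))
        os.utime(newer, (2000, 2000))
        pattern = os.path.join(glob.escape(d), "debug.log*")
        return [line for _, line in grep_in_time_order("stg_data", pattern)]


if __name__ == "__main__":
    print(demo())
```

Because the file list never passes through shell word splitting, nothing that looks like an option can reach the search.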

Docker is deleting downloaded files during build when I use a VOLUME, why?

I have this simple Dockerfile:
FROM fabric8/java-centos-openjdk8-jdk
VOLUME /tmp
RUN curl -k -Lo /tmp/oc.tar.gz "https://mirror.openshift.com/pub/openshift-v3/clients/3.6.173.0.21/linux/oc.tar.gz" && ls -l /tmp
RUN ls -l /tmp && tar zxf /tmp/oc.tar.gz -C /usr/local/bin
It downloads a file and prints the /tmp folder contents, then runs ls again and extracts the downloaded file's content.
The problem is after downloading the file it is there (&& ls -l /tmp), but in the next RUN ls -l /tmp the file isn't there anymore.
Step 6/17 : RUN curl -k -Lo /tmp/oc.tar.gz "https://mirror.openshift.com/pub/openshift-v3/clients/3.6.173.0.21/linux/oc.tar.gz" && ls -l /tmp
---> Running in 5ad24909ed82
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 34.4M 100 34.4M 0 0 2489k 0 0:00:14 0:00:14 --:--:-- 5660k
total 35308
drwxr-xr-x 2 root root 4096 Mar 17 11:10 hsperfdata_root
-rwx------ 1 root root 836 Mar 2 01:07 ks-script-IAlIsB
-rw-r--r-- 1 root root 36145614 May 24 08:07 oc.tar.gz
-rw------- 1 root root 0 Mar 2 01:06 yum.log
Removing intermediate container 5ad24909ed82
---> 09e50e6d4d84
Step 7/17 : RUN ls -l /tmp && tar zxf /tmp/oc.tar.gz -C /usr/local/bin
---> Running in 49c305788ac9
total 8
drwxr-xr-x 2 root root 4096 Mar 17 11:10 hsperfdata_root
-rwx------ 1 root root 836 Mar 2 01:07 ks-script-IAlIsB
-rw------- 1 root root 0 Mar 2 01:06 yum.log
tar (child): /tmp/oc.tar.gz: Cannot open: No such file or directory
It has something to do with the VOLUME /tmp; without it, it works fine. What's the explanation for this?

Once the volume is defined, you won't be able to modify it from later build steps. My best guess of what is happening internally during the build is that a temporary volume is set up with the temporary container used to perform the RUN step, and when the RUN step completes, the changes to the image are captured, which will not include any changes to the temporary volume files. This behavior is documented by docker:
Changing the volume from within the Dockerfile: If any build steps
change the data within the volume after it has been declared, those
changes will be discarded.
I've also blogged on the topic here.

File ownership after docker cp

How can I control which user owns the files I copy in and out of a container?
The docker cp command says this about file ownership:
The cp command behaves like the Unix cp -a command in that directories are copied recursively with permissions preserved if possible. Ownership is set to the user and primary group at the destination. For example, files copied to a container are created with UID:GID of the root user. Files copied to the local machine are created with the UID:GID of the user which invoked the docker cp command. However, if you specify the -a option, docker cp sets the ownership to the user and primary group at the source.
It says that files copied to a container are created as the root user, but that's not what I see. I create two files owned by user id 1005 and 1006. Those owners are translated into the container's user namespace. The -a option seems to make no difference when I copy the file into a container.
$ sudo chown 1005:1005 test.txt
$ ls -l test.txt
-rw-r--r-- 1 1005 1005 29 Oct 6 12:43 test.txt
$ docker volume create sandbox1
sandbox1
$ docker run --name run1 -vsandbox1:/data alpine echo OK
OK
$ docker cp test.txt run1:/data/test1005.txt
$ docker cp -a test.txt run1:/data/test1005a.txt
$ sudo chown 1006:1006 test.txt
$ docker cp test.txt run1:/data/test1006.txt
$ docker cp -a test.txt run1:/data/test1006a.txt
$ docker run --rm -vsandbox1:/data alpine ls -l /data
total 16
-rw-r--r-- 1 1005 1005 29 Oct 6 19:43 test1005.txt
-rw-r--r-- 1 1005 1005 29 Oct 6 19:43 test1005a.txt
-rw-r--r-- 1 1006 1006 29 Oct 6 19:43 test1006.txt
-rw-r--r-- 1 1006 1006 29 Oct 6 19:43 test1006a.txt
When I copy files out of the container, they are always owned by me. Again, the -a option seems to do nothing.
$ docker run --rm -vsandbox1:/data alpine cp /data/test1006.txt /data/test1007.txt
$ docker run --rm -vsandbox1:/data alpine chown 1007:1007 /data/test1007.txt
$ docker cp run1:/data/test1006.txt .
$ docker cp run1:/data/test1007.txt .
$ docker cp -a run1:/data/test1006.txt test1006a.txt
$ docker cp -a run1:/data/test1007.txt test1007a.txt
$ ls -l test*.txt
-rw-r--r-- 1 don don 29 Oct 6 12:43 test1006a.txt
-rw-r--r-- 1 don don 29 Oct 6 12:43 test1006.txt
-rw-r--r-- 1 don don 29 Oct 6 12:47 test1007a.txt
-rw-r--r-- 1 don don 29 Oct 6 12:47 test1007.txt
-rw-r--r-- 1 1006 1006 29 Oct 6 12:43 test.txt
$

You can also change the ownership by logging into the container as the root user:
docker exec -it --user root <container-id> /bin/bash
chown -R <username>:<groupname> <folder/file>

In addition to @Don Kirkby's answer, let me provide a similar example in bash/shell script for the case that you want to copy something into a container while applying different ownership and permissions than those of the original file.
Let's create a new container from a small image that will keep running by itself:
docker run -d --name nginx nginx:alpine
Now we'll create a new file which is owned by the current user and has default permissions:
touch foo.bar
ls -ahl foo.bar
>> -rw-rw-r-- 1 my-user my-group 0 Sep 21 16:45 foo.bar
Copying this file into the container will set owner and group to the UID and GID of my user and preserve the permissions:
docker cp foo.bar nginx:/foo.bar
docker exec nginx sh -c 'ls -ahl /foo.bar'
>> -rw-rw-r-- 1 4098 4098 0 Sep 21 14:45 /foo.bar
Using a little tar work-around, however, I can change the ownership and permissions that are applied inside of the container.
tar -cf - foo.bar --mode u=+r,g=-rwx,o=-rwx --owner root --group root | docker cp - nginx:/
docker exec nginx sh -c 'ls -ahl /foo.bar'
>> -r-------- 1 root root 0 Sep 21 14:45 /foo.bar
tar options explained:
c creates a new archive instead of unpacking one.
f - will write to stdout instead of a file.
foo.bar is the input file to be packed.
--mode specifies the permissions for the target. Similar to chmod, they can be given in symbolic notation or as an octal number.
--owner sets the new owner of the file.
--group sets the new group of the file.
docker cp - reads the file that is to be copied into the container from stdin.
This approach is useful when a file needs to be copied into a created container before it starts, such that docker exec is not an option (which can only operate on running containers).
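The same idea can be sketched in Python with the standard tarfile module: build the archive in memory and set owner, group, and mode on the entry before it is ever handed to docker cp. This is only an illustration under stated assumptions. build_tar and describe are hypothetical helper names, and the final docker cp call is not part of the block.

```python
import io
import tarfile


def build_tar(name, data, mode=0o400, uid=0, gid=0, uname="root", gname="root"):
    """Return tar archive bytes holding one file with the given ownership and mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.mode = mode                     # like tar --mode
        info.uid, info.gid = uid, gid        # numeric owner and group
        info.uname, info.gname = uname, gname  # like tar --owner / --group
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def describe(archive):
    """List (name, mode, uid, gid, uname) for each entry, to check the result."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [(m.name, oct(m.mode), m.uid, m.gid, m.uname) for m in tar]


if __name__ == "__main__":
    print(describe(build_tar("foo.bar", b"")))
```

The resulting bytes could then be fed to the standard input of docker cp - nginx:/, for example through subprocess.run with its input argument, in the same way the shell pipeline above does.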

Just a one-liner (similar to @ramu's answer), using root to make the call:
docker exec -u 0 -it <container-id> chown node:node /home/node/myfile

In order to get complete control of file ownership, I used the tar stream feature of docker cp:
If - is specified for either the SRC_PATH or DEST_PATH, you can also stream a tar archive from STDIN or to STDOUT.
I launch the docker cp process, then stream a tar file to or from the process. As the tar entries go past, I can adjust the ownership and permissions however I like.
Here's a simple example in Python that copies all the files from /outputs in the sandbox1 container to the current directory, excludes the current directory so its permissions don't get changed, and forces all the files to have read/write permissions for the user.
from subprocess import Popen, PIPE, CalledProcessError
import tarfile
def main():
    # Stream the container's /outputs directory to stdout as a tar archive.
    export_args = ['sudo', 'docker', 'cp', 'sandbox1:/outputs/.', '-']
    exporter = Popen(export_args, stdout=PIPE)
    tar_file = tarfile.open(fileobj=exporter.stdout, mode='r|')
    tar_file.extractall('.', members=exclude_root(tar_file))
    exporter.wait()
    if exporter.returncode:
        raise CalledProcessError(exporter.returncode, export_args)
def exclude_root(tarinfos):
    print('\nOutputs:')
    for tarinfo in tarinfos:
        # Skip the '.' entry so the current directory's permissions stay untouched.
        if tarinfo.name != '.':
            assert tarinfo.name.startswith('./'), tarinfo.name
            print(tarinfo.name[2:])
            # Force read/write permission for the owning user.
            tarinfo.mode |= 0o600
            yield tarinfo
main()

docker: how to show the diffs between 2 images

I have a Dockerfile with a sequence of RUN instructions that execute "apt-get install"s; for example, a couple of lines:
RUN apt-get install -y tree
RUN apt-get install -y git
After having executed "docker build", if I then execute "docker images -a", I see the listing of all the base-child-child-.... images that were created during the build.
I'd like to see a list of all of the packages that were installed when the "apt-get install -y git" line was executed (including the dependent packages that may have also been installed, besides the git packages).
Note: I believe that the "docker diff" command shows the diffs between a container and the image from which it was started. Instead I'd like the diffs between 2 images (of the same lineage): the "tree" and "git" image IDs. Is this possible?
Thanks.

Have a look at :
https://github.com/GoogleCloudPlatform/container-diff
This tool can diff local or remote docker images and can do so without requiring docker to be installed. It has file as well as package level "differs" (for example: apt, npm, and pip) so that you can more easily see the differences in packages that have changed between two docker images.
Disclaimer: I am a contributor to this project

This one worked for me:
docker run -it e5cba87ecd29 bash -c 'find /path/to/files -type f | sort | xargs -I{} sha512sum {}' > /tmp/dockerfiles.e5cba87ecd29.txt
docker run -it b1d19fe1a941 bash -c 'find /path/to/files -type f | sort | xargs -I{} sha512sum {}' > /tmp/dockerfiles.b1d19fe1a941.txt
meld /tmp/dockerfiles*
Where e5cba87ecd29 and b1d19fe1a941 are images I am interested in and /path/to/files is a directory which could be "/".
It lists all files, sorts them, and adds a hash to each one just in case. Meld then highlights all the differences.
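When a graphical tool like meld is not available, the two hash listings can be compared with a few lines of Python. This is a minimal sketch that assumes each line has the usual sha512sum layout of a digest followed by a path. parse_manifest and diff_manifests are hypothetical names, and the sample digests in the demo are placeholders rather than real hashes.

```python
def parse_manifest(text):
    """Map path -> digest from lines of the form '<digest>  <path>'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()  # also drops a trailing carriage return from a tty
        if not line:
            continue
        digest, path = line.split(None, 1)
        entries[path] = digest
    return entries


def diff_manifests(old_text, new_text):
    """Return sorted lists of added, removed, and changed paths."""
    old, new = parse_manifest(old_text), parse_manifest(new_text)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed


if __name__ == "__main__":
    old = "aaa  /bin/bash\nbbb  /etc/motd\n"
    new = "aaa  /bin/bash\nccc  /etc/motd\nddd  /usr/bin/git\n"
    print(diff_manifests(old, new))
```

In practice the two text arguments would be the contents of the /tmp/dockerfiles.*.txt files produced by the commands above.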

I suppose you could send both images' file systems to tarballs via docker export CONTAINER_ID or docker save IMAGE_ID (updated based on comments)
Then use whatever tool you like to diff the file systems - Git, Rdiff, etc.

It is now 2019 and I just found a useful tool which was released in late 2017.
https://opensource.googleblog.com/2017/11/container-diff-for-comparing-container-images.html
The following content is from container-diff github page:
container-diff diff <img1> <img2> --type=history [History]
container-diff diff <img1> <img2> --type=file [File System]
container-diff diff <img1> <img2> --type=size [Size]
container-diff diff <img1> <img2> --type=rpm [RPM]
container-diff diff <img1> <img2> --type=pip [Pip]
container-diff diff <img1> <img2> --type=apt [Apt]
container-diff diff <img1> <img2> --type=node [Node]
You can similarly run many analyzers at once:
container-diff diff <img1> <img2> --type=history --type=apt --type=node

Each RUN instruction creates a new container and you can inspect what a container changed by using docker diff <container>.
So after building your dockerfile, run docker ps -a to get a list of the containers the buildfile created. It should look something like:
CONTAINER ID IMAGE COMMAND CREATED STATUS ...
53d7dadafee7 f71e394eb0fc /bin/sh -c apt-get i 7 minutes ago Exit 0 ...
...
Now you can run docker diff 53d7dadafee7 to see what was changed.

If you know the container ID or name (even for a stopped container), you can quickly dump its file list on the fly.
$ docker export CONTAIN_ID_OR_NAME | tar tv
-rwxr-xr-x 0 0 0 0 2 6 21:22 .dockerenv
-rwxr-xr-x 0 0 0 0 2 6 21:22 .dockerinit
drwxr-xr-x 0 0 0 0 10 21 13:46 bin/
-rwxr-xr-x 0 0 0 1021112 10 8 2014 bin/bash
-rwxr-xr-x 0 0 0 31152 10 21 2013 bin/bunzip2
-rwxr-xr-x 0 0 0 0 10 21 2013 bin/bzcat link to bin/bunzip2
lrwxrwxrwx 0 0 0 0 10 21 2013 bin/bzcmp -> bzdiff
-rwxr-xr-x 0 0 0 2140 10 21 2013 bin/bzdiff
lrwxrwxrwx 0 0 0 0 10 21 2013 bin/bzegrep -> bzgrep
-rwxr-xr-x 0 0 0 4877 10 21 2013 bin/bzexe
......
Then you can save the list to a file and compare two list files.
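The listing step can also be done programmatically on the tar stream itself. The following Python sketch reads a stream such as the output of docker export and compares two listings by name and size. list_members, compare_listings, and the in-memory archives in demo() are assumptions for illustration; no Docker command is run here.

```python
import io
import tarfile


def list_members(stream):
    """Read a tar stream sequentially and map member name -> (size, mode)."""
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        return {m.name: (m.size, m.mode) for m in tar}


def compare_listings(old, new):
    """Report names that were added, removed, or whose size or mode changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


def _make_tar(files):
    """Build a small in-memory archive standing in for `docker export` output."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def demo():
    old = list_members(_make_tar({"bin/bash": b"abc", "etc/motd": b"hi"}))
    new = list_members(_make_tar({"bin/bash": b"abcd", "usr/bin/git": b"x"}))
    return compare_listings(old, new)


if __name__ == "__main__":
    print(demo())
```

With a real container, the stream passed to list_members would be the standard output of a docker export process instead of the in-memory archive.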
If you insist on using an image ID or name, you can dump the first layer's file list on the fly:
$ docker save alpine |tar xO '*/layer.tar' | tar tv
drwxr-xr-x 0 0 0 0 12 27 06:32 bin/
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/ash -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/base64 -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/bbconfig -> /bin/busybox
-rwxr-xr-x 0 0 0 821408 10 27 01:15 bin/busybox
After all, I suggest you start the container and then stop it; then you can get a merged file list as described in the first method.
2017/02/01: The fastest way to show a container's file list is to enter its root dir, where you are free to read files:
# PID=$(docker inspect -f '{{.State.Pid}}' CONTAIN_ID_OR_NAME)
# cd /proc/$PID/root && ls -lF
drwxr-xr-x 0 0 0 0 12 27 06:32 bin/
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/ash -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/base64 -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/bbconfig -> /bin/busybox
-rwxr-xr-x 0 0 0 821408 10 27 01:15 bin/busybox
Note: if you are using docker-machine, you first need to enter it with docker-machine ssh and then run sudo sh.
Now that you have the root dirs of the two containers, you can use diff to compare them directly.

Have a look at:
https://github.com/moul/docker-diff
They list Brew install instructions for Mac. I'm assuming it's a Bash script, so it could probably be made to work in other *nix environments.

It sounds like in this case maybe you only needed to see the diff between two layers. If so, dive is awesome for this; it lets you inspect the filesystem at each layer and you can filter files by change type (unchanged, added, removed, modified).
And if you want to inspect the differences between two unrelated images, having two dive processes running side by side works okay too.
