Why is copy slower than move?

I have a big file that I'm moving around. The normal protocol in the lab is to copy it somewhere and then delete the original.
I decided to change it to mv.
My question is, why is mv so much faster than cp?
To test it out I generated a file 2.7 GB in size.
time cp test.txt copy.txt
Took real 0m20.113s
time mv test.txt copy.txt
Took real 0m12.403s.
TL;DR mv was almost twice as fast as copy. Any explanations? Is this an expected result?
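One caveat with timings like these: on a repeated run the file may already be sitting in the page cache, which can skew the cp numbers quite a bit. To repeat the cp measurement from a cold cache (a quick sketch; dropping caches needs root and affects the whole system):
sync                                          # flush pending writes
echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop the page cache so cp really reads from disk
time cp test.txt copy.txt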
EDIT:
I decided to move/copy the file to a destination other than the current directory.
time cp test.txt ../copy.txt
and
time mv test.txt ../copy.txt
This time cp took 9.238s and mv took only 0.297s. So not what some of the answers were suggesting.
UPDATE
The answers are right. When I tried to mv the file to a different disk on the same system, mv and cp took almost the same time.

When you mv a file on the same filesystem, the system just has to change directory entries to reflect your renaming. Data in the file is not even read.
(Same filesystem here means the same directory, or the same directory tree on the same mounted drive, provided of course that the source and destination paths don't traverse symlinks leading onto another filesystem!)
When you mv a file across file systems, it has the same effect as cp + rm: no speed gain (apart from the fact that you only run one command, and consistency is guaranteed: you don't have to check if cp succeeded to perform the rm)
(older versions of mv refused to move directories across filesystems, because they only did the renaming)
Be careful, though: the two are not fully equivalent when the destination already exists. Copying into an existing directory merges the contents, while a plain rename() (which is what mv does on the same filesystem) refuses to replace a non-empty destination directory.
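A quick way to see the rename-versus-copy behaviour for yourself, assuming strace is available: within one filesystem, mv boils down to a single rename()-family syscall, while cp has to open both files and push every byte through:
strace -e trace=rename,renameat,renameat2 mv test.txt renamed.txt   # one rename call, no data I/O
strace -c cp renamed.txt copy.txt   # the summary is dominated by the data-copying calls (read/write or copy_file_range, depending on the cp version)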

Related

Exclude a directory from `podman/docker export` stream and save to a file

I have a container that I want to export as a .tar file. So far I have used podman run with a tar --exclude=/dir1 --exclude=/dir2 … command that writes to a file on a bind-mounted host directory. But recently this has been giving me some tar: .: file changed as we read it errors, which podman/docker export would avoid. Besides, I suppose the export is more efficient. So I'm trying to migrate to using the export, but the major obstacle is that I can't seem to find a way to exclude paths from the tar stream.
If possible, I'd like to avoid modifying a tar archive already saved on disk, and instead modify the stream before it gets saved to a file.
I've been banging my head for hours, trying useless advice from ChatGPT, looking at cpio, and attempting to pipe podman export into a tar --exclude … command. With the last one I did have some small success at one point, but I couldn't make tar save the result to a file with a particular name.
Any suggestions?
(note: I don't make a distinction between docker and podman here, as their export commands behave the same, and it's useful for searchability)
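One sketch that might fit the "modify the stream" requirement, assuming GNU tar (its --delete mode is reported to work when tar acts as a filter from stdin to stdout) and assuming the members to drop are stored as dir1 and dir2 without a leading slash; my-container is a placeholder name:
podman export my-container | tar -tf - | head                          # check how the member names actually look first
podman export my-container | tar -f - --delete dir1 dir2 > trimmed.tar
Note that --delete takes exact member names rather than --exclude patterns, which is why checking the listing first matters.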

Is there any way in a dockerfile to make previous files part of the current layer (so `rename()` isn't crossing filesystem boundaries)?

I'm trying to build chromium v8 inside a container as part of another program's build chain, and one of the binary build tools crashes because it's trying to rename/move files across the filesystem boundary that overlayfs creates for container layers.
I've opened a bug with the build tool hoping there's a way it can gracefully handle overlayfs without potentially introducing a lot of burden to the chromium team, but in the best case scenario it's going to be quite a while before that's helpful.
I have some workarounds in the meantime:
- running everything that touches those files in the same layer, which means they get re-downloaded every time (many GBs); that's a non-starter for some maintainers on the project I'm contributing to who are on metered connections
- copying to temp, deleting the original, then moving it back so it's owned by the layer, which doubles the size on disk and makes the build much slower
Both of the options are kind of awful.
So that's why I'm curious if there's a way to just tell Docker or overlayfs to remove or ignore the boundaries between a few layers, or to tell them to move the files into the current layer.
For example:
Moving (even after a copy) crosses a layer boundary and can fail with EXDEV (cross-device link) errors:
COPY ./some-file ./some-file
RUN mv ./some-file ./some-renamed-file
I can get around this by copying it within the layer, so the copy is owned by that layer:
COPY ./some-file ./some-temporary-file-name
RUN cp ./some-temporary-file-name ./the-real-file-name \
&& rm ./some-temporary-file-name
This is significantly more problematic and resource-intensive when you need to do it with huge directories of source files, and when the failure comes from an external program where the specific files that need to be moved are non-deterministic (files named with random hashes):
RUN git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
WORKDIR /v8-temp
RUN fetch v8 \
&& mkdir /v8
# I have to duplicate the entire working directory
# since I can't know what the tool will need to move
WORKDIR /v8
RUN cp -r /v8-temp/. /v8 \
&& rm -rf /v8-temp \
&& gclient sync
# and if I don't clean up the image through other means
# I also have to wait for the old source files to be deleted,
# increasing the build time or image size of the result.
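For what it's worth, the underlying failure is easy to reproduce in isolation with a direct rename(2) call. A minimal sketch (the python:3.12-slim tag is just a placeholder image; the EXDEV typically only shows up for directories, and only when overlayfs runs without redirect_dir, which as far as I know is the normal Docker setup):
FROM python:3.12-slim
# this layer creates a directory that ends up in a lower overlayfs layer
RUN mkdir -p /lower/dir && touch /lower/dir/file
# a later layer renaming that directory via rename(2) directly gets EXDEV;
# mv would silently fall back to copy+delete, which is why only tools that
# call rename() themselves hit the error
RUN python3 -c "import os; os.rename('/lower/dir', '/renamed-dir')"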

docker cp not working

I'm following this tutorial and when I get to the part where I call:
cp /tf_files/stripped_retrained_graph.pb bazel-bin/tensorflow/examples/android/assets/stripped_output_graph.pb
and
cp /tf_files/retrained_labels.txt bin/tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt
They both say "No such file or directory".
I can cd to the tf_files folder and see that the files are there.
I can also cd to /tensorflow/tensorflow/examples/android/assets and call ls which shows there's just a BUILD file there.
In the cp command is there supposed to already be a stripped_output_graph.pb file in the destination which gets replaced? Or is it meant to just be creating a new file there?
Is there some way of doing cp [source] [current directory] rather than specifying the destination as a path?
I've tried removing the file path part in hope that it just uses the source filename but that doesn't work.
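For the record, cp does accept a directory as the destination and keeps the source filename in that case, so something like this works if you cd into the target directory first:
cd /tensorflow/tensorflow/examples/android/assets
cp /tf_files/stripped_retrained_graph.pb .
It doesn't help much here, though, because the tutorial also wants the file renamed (to stripped_output_graph.pb) on the way in, so the destination still has to be spelled out.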
Calling
cp /tf_files/stripped_retrained_graph.pb /tensorflow/tensorflow/examples/android/assets/stripped_output_graph.pb
and
cp /tf_files/retrained_labels.txt /tensorflow/tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt
finally worked; it wasn't at all obvious that I'd have to change the destination path, or what it should be, though.
Also I accidentally saved a file as .p rather than .pb but managed to remove it using $ docker exec <container> rm -rf /tensorflow/tensorflow/examples/android/assets/stripped_output_graph.p
Now I managed to copy the files in correctly, but then when I installed the app it was still just running the regular demo app.
Not sure why it didn’t work, so frustrating.
When I rebuilt it after copying the files in I got these conflict messages
Are these normal to have?
It looks like maybe a different labels file is taking priority over mine, how can I reach the external/inception5h/imagenet_comp_graph_label_strings.txt file to delete it so my file is used instead?
Does the “external” part mean that I can’t actually access it?

tarring and untarring between two remote hosts

I have two systems that I'm splitting processing between, and I'm trying to find the most efficient way to move the data between the two. I've figured out how to tar and gzip to an archive on the first server ("serverA") and then use rsync to copy to the remote host ("serverB"). However, when I untar/unzip the data there, the extracted files keep the full path name from the original server. So if on server A my data is in:
/serverA/directory/a/lot/of/subdirs/myData/*
and, using this command:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz /serverA/directory/a/lot/of/subdirs/myData/
Everything in .../myData is successfully tarred and zipped in myData-archive.tar.gz
However, after copying the archive, when I try to untar/unzip on the second host (I manually log in here to finish the processing, the first step of which is to untar/unzip) using this command:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz
It untars everything into my current directory (/serverB/current/directory/), but it ends up looking like this:
/serverB/current/directory/serverA/directory/a/lot/of/subdirs/myData/Data*ext
How should I formulate both the tar commands so that my data ends up in a directory called /serverB/current/directory/dataHERE/ ?
I know I'll need the -C flag to untar into a different directory (in my case, /serverB/current/directory/dataHERE), but I still can't figure out how to keep the entire original path from being included when the archive gets untarred. I've seen similar posts, but none that I saw discussed how to do this when moving between two different hosts.
UPDATE: per one of the answers in this question, I changed my commands to:
tar/zip on serverA:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz serverA/directory/a/lot/of/subdirs/myData/ -C /serverA/directory/a/lot/of/subdirs/ myData
and, untar/unzip:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz -C /serverB/current/directory/dataHERE
And now, not only does it untar/unzip the data to:
/serverB/current/directory/dataHERE/
like I wanted, but it also puts another copy of the data here:
/serverB/current/directory/serverA/directory/a/lot/of/subdirs/myData/
which I don't want. How do I need to fix my commands so that it only puts data in the first place?
On serverA do
( cd /serverA/directory/a/lot/of/subdirs; tar -zcvf myData-archive.tar.gz myData; )
After some more messing around, I figured out how to achieve what I wanted (the extra copy above presumably came from listing the tree twice in the create command, once via the full path and once via -C plus myData):
To tar on serverA:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz -C /serverA/directory/a/lot/of/subdirs/ myData
Then to untar on serverB:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz -C /serverB/current/directory/dataHERE
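If the intermediate archive file isn't actually needed, the same -C trick works as a single pipeline over ssh, so no tarball has to be written on either side. A sketch, assuming serverA can ssh to serverB and that the destination directory already exists:
tar -C /serverA/directory/a/lot/of/subdirs -czf - myData | ssh serverB 'tar -xzf - -C /serverB/current/directory/dataHERE'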

Linux tar help to extract folders

I kind of found the answer on Stack Overflow but still have some confusion. I need some help.
I have a tar file which contains files and folders like this: usr/CCS/HMS*
I would like to extract all the usr/CCS/HMS* files and folders, but into a different filesystem; the new filesystem is /usr/TRAINP.
HMS* should replace TRAINP*. TRAINP has folders like TRAINP/TRAINP.GL, TRAINP.AR, etc.
The backup contains folders like usr/CCS/HMS/HMS.GL, usr/CCS/HMS.AR.
When I do the restore as-is, it ends up under /usr/TRAINP. I want usr/CCS/HMS* to replace /usr/TRAINP. This is kind of a database restore with a different name.
Thanks a lot in advance.
Tar itself does not rename the contents when extracting. The best bet is to extract to some place in the target filesystem and move the results where you want.
For example:
cd /usr/CCS/TRAINP1
tar xf archive.tar usr/CCS/HMS1
mv usr/CCS/HMS1/* .
Or, if the TRAINP directories do not exist:
cd /
tar xf archive.tar usr/CCS
cd usr/CCS
for file in HMS*; do mv "$file" "TRAINP${file#HMS}"; done
Of course there are many variations and alternatives that will yield the same result. Note my example assumes usr/CCS belongs in /usr/CCS.
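As a side note, GNU tar specifically can rewrite member names while extracting, via --transform, so the rename can happen in one pass. A sketch of the mechanism only, assuming GNU tar; the exact sed expression and member pattern depend on how the paths really look inside the archive (check with tar -tf archive.tar first):
cd /usr
tar -xf archive.tar --wildcards --transform='s,^usr/CCS/HMS,TRAINP,' 'usr/CCS/HMS*'
For instance, that would turn usr/CCS/HMS.AR into TRAINP.AR and usr/CCS/HMS/HMS.GL into TRAINP/HMS.GL under /usr as it extracts.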

Resources