File removed before we read it - tar

I'm making a tarball of a directory with tar -C "$DIR" -chf "$TARBALL" and a few files keep giving me the warning, "File removed before we read it". The files are there, and I'm not running any other processes at the same time that could be writing over the files.
What does this error mean?

Because of the -h flag, tar follows every symbolic link it finds and archives the file the link points to. A broken symbolic link, i.e. one whose target no longer exists, produces the warning "File removed before we read it".
Either:
make sure all the links point to valid files
remove all the invalid links
remove the -h flag, to shallow copy the links (rather than the files they point to)
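To see which links are affected before running tar, the broken links can be listed with find. This is a minimal sketch, assuming a throwaway demo directory created with mktemp. The names real.txt, good_link and broken_link are made up for illustration; in practice DIR would be the directory being archived.

```shell
#!/bin/sh
# Hypothetical demo directory; replace with the directory you pass to tar -C.
DIR=$(mktemp -d)
echo data > "$DIR/real.txt"
ln -s real.txt "$DIR/good_link"        # valid link
ln -s missing.txt "$DIR/broken_link"   # target does not exist

# With -L, find follows symbolic links, so anything it still reports
# as type "l" is a link whose target could not be resolved.
broken=$(find -L "$DIR" -type l)
printf '%s\n' "$broken"
```

Each path printed is a candidate for fixing or removing. GNU find also offers -xtype l for the same purpose.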

Related

Exclude a directory from `podman/docker export` stream and save to a file

I have a container that I want to export as a .tar file. Until now I have used podman run with tar --exclude=/dir1 --exclude=/dir2 … writing to a file on a bind-mounted host directory. Recently this has been giving me tar: .: file changed as we read it errors, which podman/docker export would avoid. Besides, I suppose export is more efficient. So I'm trying to migrate to export, but the major obstacle is that I can't find a way to exclude paths from the tar stream.
If possible, I'd like to avoid modifying a tar archive already saved on disk, and instead modify the stream before it gets saved to a file.
I've been banging my head for multiple hours, trying useless advice from ChatGPT, looking at cpio, and attempting to pipe the podman export output to a tar --exclude … command. With the last I did have some small success at one point, but couldn't make tar save the result to a file with a specific name.
Any suggestions?
(note: I make no distinction between docker and podman here, as their export commands are identical, and mentioning both is useful for searchability)

wget files with extension from S3 bucket_contents.html

Problem outline
I'm trying to get all the files from a URL: https://archive-gw-1.kat.ac.za/public/repository/10.48479/7epd-w356/data/basic_products/bucket_contents.html
which appears to be a list of the contents of an S3 bucket with associated download links.
When I attempt to download all the files with the extension *.jpeg, all I get back is the directory structure leading up to a subdirectory, with no downloaded files.
Things I've tried
To do this I've tried all the variations of leading parameters for:
$ wget -r -np -A '*.jpeg' https://archive-gw-1.kat.ac.za/public/repository/10.48479/7epd-w356/data/basic_products/
...that I can think of, but none have actually downloaded the jpeg files.
If you provide the path to a specific file e.g.
$ wget https://archive-gw-1.kat.ac.za/public/repository/10.48479/7epd-w356/data/basic_products/Abell_133_hi.jpeg
...the files can be downloaded, which would suggest that I must be mishandling the wildcard aspect of the download surely?
Thoughts which could be wrong owing to limited knowledge of wget and website protocols
Unless the fact that the contents are held in a bucket_contents.html rather than an index.html is causing problems?

What does "dump" mean in the context of the GNU tar program?

The man page for tar uses the word "dump" and its forms several times. What does it mean? For example (manual page for tar 1.26):
"-h, --dereference    follow symlinks; archive and dump the files they point to"
Many popular systems have a "trash can" or "recycle bin." I don't want the files dumped there, but it kind of sounds that way.
At present, I don't want tar to write or delete any file, except that I want tar to create or update a single tarball.
FYI, the man page for the tar installed on the system I am using at the moment is a lot shorter than what appears to be the current version. And the description of -h, --dereference there seems very different to me:
"When reading or writing a file to be archived, tar accesses the file that a symbolic link points to, rather than the symlink itself. See section Symbolic Links."
P.S. I could not get "block quote" to work properly in this post.
File system backups are also called dumps.
- @raymond-chen, quoting the GNU tar manual
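In other words, "dump" here just means "write into the archive"; nothing is moved or deleted. The effect of -h can be checked on a scratch directory. This is a minimal sketch, assuming hypothetical file names and a temporary directory from mktemp. The first character of each tar -tv listing line is the entry type: l for a symbolic link, - for a regular file.

```shell
#!/bin/sh
# Hypothetical scratch layout: a link inside src/ pointing at a file outside it.
WORK=$(mktemp -d)
mkdir "$WORK/src" "$WORK/outside"
echo hello > "$WORK/outside/target.txt"
ln -s ../outside/target.txt "$WORK/src/link.txt"

# Without -h the symlink itself is stored; with -h the file it points to is stored.
tar -C "$WORK/src" -cf "$WORK/plain.tar" .
tar -C "$WORK/src" -chf "$WORK/deref.tar" .

# First character of the verbose listing line gives the entry type.
plain_type=$(tar -tvf "$WORK/plain.tar" | grep 'link.txt' | cut -c1)
deref_type=$(tar -tvf "$WORK/deref.tar" | grep 'link.txt' | cut -c1)
echo "without -h: $plain_type   with -h: $deref_type"
```

In both cases the original files on disk are left untouched; only the tarball is written.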

ab: Could not read POST data file: End of file found

I am getting the error in the title when trying to run Apache Bench to test an HTTP endpoint I wrote, but only when specifying a POST file with contents. If I specify an empty file to -p, the error does not occur.
I have been trying various solutions found online regarding the encoding and format of the contents, but it seems like just about any content will get this error.
The problem was that when installing Apache Bench from source, I had copied over the ab executable file from httpd/support/bin/.lib/ab to ~/.local/bin. When I did that, it used the system-wide libapr instead of the one I had downloaded to httpd/srclib/apr. This caused some sort of version mismatch, I assume.
The solution was to remove my copy of ab from ~/.local/bin and instead create a script ~/.local/bin/ab with contents
#!/bin/sh
$SRC/httpd/support/ab "$@"
and make this executable with chmod a+x ~/.local/bin/ab.
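A wrapper like this only works if it passes its arguments through unchanged. As a minimal sketch (the function names show_args, forward_all and forward_count are hypothetical stand-ins, not part of ab), the following shows that "$@" expands to every argument as a separate word, while "$#" expands only to the number of arguments:

```shell
#!/bin/sh
# Hypothetical stand-in for the real binary: prints each argument on its own line.
show_args() { printf '%s\n' "$@"; }

forward_all() { show_args "$@"; }     # forwards every argument, preserving spaces
forward_count() { show_args "$#"; }   # forwards only the argument count

all=$(forward_all -p "post data.json" -n 10)
count=$(forward_count -p "post data.json" -n 10)
lines=$(printf '%s\n' "$all" | wc -l | tr -d ' ')

printf 'forwarded with "$@":\n%s\n' "$all"
printf 'forwarded with "$#": %s\n' "$count"
```

Quoting "$@" matters: it keeps an argument such as a file name with spaces as a single word.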

docker cp not working

I'm following this tutorial and when I get to the part where I call:
cp /tf_files/stripped_retrained_graph.pb bazel-bin/tensorflow/examples/android/assets/stripped_output_graph.pb
and
cp /tf_files/retrained_labels.txt bin/tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt
They both say "No such file or directory".
As you can see in this image I can cd to the tf_files folder and see that the files are there.
I can also cd to /tensorflow/tensorflow/examples/android/assets and call ls which shows there's just a BUILD file there.
In the cp command is there supposed to already be a stripped_output_graph.pb file in the destination which gets replaced? Or is it meant to just be creating a new file there?
Is there some way of doing cp [source] [current directory] rather than specifying the destination as a path?
I've tried removing the file path part in hope that it just uses the source filename but that doesn't work.
Calling
cp /tf_files/stripped_retrained_graph.pb /tensorflow/tensorflow/examples/android/assets/stripped_output_graph.pb
and
cp /tf_files/retrained_labels.txt /tensorflow/tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt
finally worked. It wasn't at all obvious that I'd have to change the destination path, or what it should be.
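One plausible explanation, assuming the shell's working directory was not the one the tutorial expected, is that a relative destination is resolved against the current directory, so cp fails when that relative path does not exist there. The sketch below uses a made-up layout under a temporary directory (graph.pb, project/assets) to show the difference, and also shows that a destination of . copies into the current directory while keeping the source file name:

```shell
#!/bin/sh
# Hypothetical layout standing in for /tf_files and the assets directory.
WORK=$(mktemp -d)
mkdir -p "$WORK/tf_files" "$WORK/project/assets"
echo graph > "$WORK/tf_files/graph.pb"

# Relative destination: resolved against the current directory ($WORK),
# where no assets/ directory exists, so cp fails.
cd "$WORK" || exit 1
if cp tf_files/graph.pb assets/out.pb 2>/dev/null; then relative=ok; else relative=failed; fi

# Absolute destination: independent of the current directory.
if cp "$WORK/tf_files/graph.pb" "$WORK/project/assets/out.pb"; then absolute=ok; else absolute=failed; fi

# Destination ".": copy into the current directory, keeping the name graph.pb.
cd "$WORK/project/assets" || exit 1
cp "$WORK/tf_files/graph.pb" .

echo "relative: $relative, absolute: $absolute"
ls "$WORK/project/assets"
```

The destination file does not need to exist beforehand; cp creates it, but the destination directory must already exist.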
Also I accidentally saved a file as .p rather than .pb but managed to remove it using $ docker exec <container> rm -rf /tensorflow/tensorflow/examples/android/assets/stripped_output_graph.p
Now I managed to copy the files in correctly, but then when I installed the app it was still just running the regular demo app.
Not sure why it didn’t work, so frustrating.
When I rebuilt it after copying the files in I got these conflict messages
Are these normal to have?
It looks like maybe a different labels file is taking priority over mine, how can I reach the external/inception5h/imagenet_comp_graph_label_strings.txt file to delete it so my file is used instead?
Does the “external” part mean that I can’t actually access it?
