Strangely, I had assumed the -f option was for "force", not for "file".
I ran tar -xz because I wanted to see if any files would be overwritten. Now it has extracted all the files but has not returned control back to me. Should I just kill the process? Is it waiting for input?
-f commands tar to read the archive from a file. Without it, it tries to read it from stdin.
You can input Ctrl-C to kill it or Ctrl-D (Ctrl-Z in Windows) to send it EOF (at which point, it'll probably complain about incorrect archive format).
Without an -f option, tar will attempt to read from the TAPE device specified by the TAPE environment variable, or a file built into tar (usually something like /dev/st0 or stdin) if TAPE isn't set to anything.
Related
I have a container that I want to export as a .tar file. I have used a podman run with a tar --exclude=/dir1 --exclude=/dir2 … that outputs to a file located on a bind-mounted host dir. But recently this has been giving me some tar: .: file changed as we read it errors, which podman/docker export would avoid. Besides the export I suppose is more efficient. So I'm trying to migrate to using the export, but the major obstacle is I can't seem to find a way to exclude paths from the tar stream.
If possible, I'd like to avoid modifying a tar archive already saved on disk, and instead modify the stream before it gets saved to a file.
I've been banging my head for multiple hours, trying useless advices from ChatGPT, looking at cpio, and attempting to pipe the podman export to tar --exclude … command. With the last I did have small success at some point, but couldn't make tar save the result to a particularly named file.
Any suggestions?
(note: I do not make distinction between docker and podman here as their export command is completely the same, and it's useful for searchability)
I'm trying to use GNU Parallel to help me process some remote files that I don't want to save locally.
My command looks somewhat like that:
python list_files.py | \
parallel -j5 'aws s3 cp s3://s3-bucket/{} -' | \
parallel -j5 --round --pipe -l 5000 "python process_and_print.py"
process_and_print.py prints output for some input lines, but that output doesn't get to stdout immediately like I expected, instead I only see the output after the process is finished. If I remove the --round parameter is all works as expected.
Where does all that data get saved? Do I have a way to print it to stdout, line by line, without buffering?
Where does all that data get saved?
All buffered output from GNU Parallel is buffered in temporary files in $TMPDIR / --tmpdir which defaults to /tmp. You cannot see the files, as they are immediately removed (but kept open) to avoid you having to clean up, if GNU Parallel is killed.
Do I have a way to print it to stdout, line by line,
--line-buffer
without buffering?
-u disables buffering all together, but then you cannot guarantee line-by-line.
--line-buffer buffers a full line in memory from version 20170822 and thus does not buffer in /tmp.
I have two systems that I'm splitting processing between, and I'm trying to find the most efficient way to move the data between the two. I've figured out how to tar and gzip to an archive on the first server ("serverA") and then use rsync to copy to the remote host ("serverB"). However, when I untar/unzip the data there, it saves the archive including the full path name from the original server. So if on server A my data is in:
/serverA/directory/a/lot/of/subdirs/myData/*
and, using this command:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz /serverA/directory/a/lot/of/subdirs/myData/
Everything in .../myData is successfully tarred and zipped in myData-archive.tar.gz
However, after copying the archive, when I try to untar/unzip on the second host (I manually log in here to finish the processing, the first step of which is to untar/unzip) using this command:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz
It untars everything in my current directory (serverB/current/directory/), however it looks like this:
/serverB/current/directory/serverA/directory/a/lot/of/subdirs/myData/Data*ext
How should I formulate both the tar commands so that my data ends up in a directory called
/serverB/current/directory/dataHERE/
?
I know I'll need the -C flag to untar into a different directory (in my case, /serverB/current/directory/dataHERE ), but I still can't figure out how to make it so that the entire path is not included when the archive gets untarred. I've seen similar posts but none that I saw discussed how to do this when moving between to different hosts.
UPDATE: per one of the answers in this question, I changed my commands to:
tar/zip on serverA:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz serverA/directory/a/lot/of/subdirs/myData/ -C /serverA/directory/a/lot/of/subdirs/ myData
and, untar/unzip:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz -C /serverB/current/directory/dataHERE
And now, not only does it untar/unzip the data to:
/serverB/current/directory/dataHERE/
like I wanted, but it also puts another copy of the data here:
/serverB/current/directory/serverA/directory/a/lot/of/subdirs/myData/
which I don't want. How do I need to fix my commands so that it only puts data in the first place?
On serverA do
( cd /serverA/directory/a/lot/of/subdirs; tar -zcvf myData-archive.tar.gz myData; )
After some more messing around, I figured out how to achieve what I wanted:
To tar on serverA:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz -C /serverA/directory/a/lot/of/subdirs/ myData
Then to untar on serverB:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz -C /serverB/current/directory/dataHERE
If I send a ps- or pdf-file to CUPS is it then stored in the queue as is or is it only a pointer to the file that is in the queue?
In other words: Could the file be removed from disk after it has been sent (and placed in the printer queue) but before it is actually printed?
If there is some load on the printer would this test produce any paper:
#!/bin/bash
uptime | mpage > uptime.ps
lpr uptime.ps
rm uptime.ps
I had the same question for myself and just tested it.
lpr -P myprinter myfile.pdf && rm myfile.pdf
did print the 700K PDF file even though it was removed immediately after sending to print.
So I'd say the answer is "yes".
Is there a way to run shell commands without output buffering?
For example, hexdump file | ./my_script will only pass input from hexdump to my_script in buffered chunks, not line by line.
Actually I want to know a general solution how to make any command unbuffered?
Try stdbuf, included in GNU coreutils and thus virtually any Linux distro. This sets the buffer length for input, output and error to zero:
stdbuf -i0 -o0 -e0 command
The command unbuffer from the expect package disables the output buffering:
Ubuntu Manpage: unbuffer - unbuffer output
Example usage:
unbuffer hexdump file | ./my_script
AFAIK, you can't do it without ugly hacks. Writing to a pipe (or reading from it) automatically turns on full buffering and there is nothing you can do about it :-(. "Line buffering" (which is what you want) is only used when reading/writing a terminal. The ugly hacks exactly do this: They connect a program to a pseudo-terminal, so that the other tools in the pipe read/write from that terminal in line buffering mode. The whole problem is described here:
http://www.pixelbeat.org/programming/stdio_buffering/
The page has also some suggestions (the aforementioned "ugly hacks") what to do, i.e. using unbuffer or pulling some tricks with LD_PRELOAD.
You could also use the script command to make the output of hexdump line-buffered (hexdump will be run in a pseudo terminal which tricks hexdump into thinking its writing its stdout to a terminal, and not to a pipe).
# cf. http://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe/
stty -echo -onlcr
script -q /dev/null hexdump file | ./my_script # FreeBSD, Mac OS X
script -q -c "hexdump file" /dev/null | ./my_script # Linux
stty echo onlcr
One should use grep or egrep "--line-buffered" options to solve this. no other tools needed.