GNU parallel: deleting line from joblog breaks parallel updating it - gnu-parallel

If you run GNU parallel with --joblog path/to/logfile and then delete a line from said logfile while parallel is running, GNU parallel is no longer able to append future completed jobs to it.
Execute this MWE:
#!/usr/bin/bash
parallel -j1 -n0 --joblog log sleep 1 ::: $(seq 10) &
sleep 5 && sed -i '$ d' log
If you tail -f log prior to execution, you can see that parallel keeps writing to this file. However, if you cat log after 10 seconds, you will see that nothing was written to the actual file now on disk after the third entry or so.
What's the reason behind this? Is there a way to delete something from the file and have GNU parallel be able to still write to it?
Some background as to why this happened:
Using GNU parallel, I started a few jobs on remote machines with --sshloginfile. I then needed to pkill a few jobs on one of the machines because a colleague needed to use it (and I subsequently removed the machine from the sshloginfile so that parallel wouldn't reuse it for new runs). If you pkill those processes started on the remote machine, they get an Exitval of 0 (it looks like they finished without issues; you can't tell that they were killed). I wanted to remove them immediately from the joblog so that when I restart parallel --resume later, parallel can have a look at the joblog and determine what's missing.
Turns out, this was a bad idea, as now my joblog is useless.

While #MarkSetchell is absolutely right in his comment, root problem here is due to man sed lying:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
sed -i does not edit files in place.
What it does is to make a temporary file in the same dir, copy the input file to the temporary file while doing the editing, and finally renaming the temporary file to the input file's name. Similar to this:
sed '$ d' log > sedXxO11P
mv sedXxO11P log
It is clear that the original log and sedXxO11P have different inodes - let us call them ino1 and ino2. GNU Parallel has ino1 open and really does not know about the existence of ino2. GNU Parallel will happily append to ino1 completely unaware that when it closes the file, the file will vanish because it has already been unlinked.
So you need to change the content of the file without changing the inode:
#!/usr/bin/bash
seq 10 | parallel -j1 -n0 --joblog log sleep 1 &
sleep 5
# Obvious race condition here:
# Anything appended to log before sed is done is lost.
# This can be avoided by suspending parallel while running this
tmp=$RANDOM$$
cp log $tmp
(rm $tmp; sed '$ d' >log) < $tmp
wait
cat log
This works right now. But do not expect this to be a supported feature - ever.

Related

What's the difference between RUN and bash script in a dockerfile?

I've seen many dockerfiles include all build steps in a RUN statement, like:
RUN echo "Hello" &&
cd /tmp &&
mv a.txt b.txt &&
...
and so on...
My question is: what's the benefits/drawbacks on replace these instructions by a single bash script that gives me highlight syntax, loop capabilities, etc?
Something like:
COPY ./script.sh /tmp
RUN bash /tmp/script.sh
and then
#!/bin/bash
echo "hello" ;
cd /tmp ;
mv a.txt b.txt ;
...
Thanks!
The primary difference is that when you COPY the bash script into the image it will be available for inspection in the running container, whereas the RUN command is a little more opaque. Putting your commands in a file like that is arguably more manageable for other reasons: changes in your VCS history will be a little more clear, and for longer or more complex scripts you will probably find it easier to format things cleanly with the script in a separate file rather than embedded in your Dockerfile in a RUN command.
Otherwise the result is the same (in both cases, you are executing the same set of commands), although the COPY and RUN will result in an extra image layer (vs. just the RUN by itself).
I guess running it off as a shell script gives you more control.
For instance, you can do if-else statements to check whether a command has failed or not and provide a code path to handle it. Whereas RUN is more straight forward and when the return code is not 0 it fails the build immediately.
Obviously the case you have there is a relatively simple one and it would not have had a huge difference. The only impact I can see here is the code readability aspect. Someone would have to read the shell script to know what is happening, comparing to having everything on a single file.
I guess it all comes down to using the right tool for the right job. If it is a simple command and you don't need complex logic handling then do RUN.

GNU Parallel output to stdout using --round-robin

I'm trying to use GNU Parallel to help me process some remote files that I don't want to save locally.
My command looks somewhat like that:
python list_files.py | \
parallel -j5 'aws s3 cp s3://s3-bucket/{} -' | \
parallel -j5 --round --pipe -l 5000 "python process_and_print.py"
process_and_print.py prints output for some input lines, but that output doesn't get to stdout immediately like I expected, instead I only see the output after the process is finished. If I remove the --round parameter is all works as expected.
Where does all that data get saved? Do I have a way to print it to stdout, line by line, without buffering?
Where does all that data get saved?
All buffered output from GNU Parallel is buffered in temporary files in $TMPDIR / --tmpdir which defaults to /tmp. You cannot see the files, as they are immediately removed (but kept open) to avoid you having to clean up, if GNU Parallel is killed.
Do I have a way to print it to stdout, line by line,
--line-buffer
without buffering?
-u disables buffering all together, but then you cannot guarantee line-by-line.
--line-buffer buffers a full line in memory from version 20170822 and thus does not buffer in /tmp.

Remove a certain amount of files with a sequence of characters in a directory

I stopped a process to troubleshoot something. Now, I would like to start the process where it left off in CentOS 6.4.
This script I stopped runs a perl script in a loop to process all of the files in /dev/shm/split/. These files were split into many parts from a larger file. An example of how they are named are as follows:
file.txt.aa
file.txt.ab
file.txt.ac
...and so on.
I have identified that the script left off at file.txt.fy. So, I would like to remove all of the files in /dev/shm/split/ that are from file.txt.aa through file.txt.fy.
I tried to create a whitelist for the rm command by doing:
ls /dev/shm/split/ > whitelist
cat whitelist | egrep -v 'file.txt.[aa-fz]' | tee whitelist.tmp
This did not do what I had intended.
Please help me! Thank you!
The problem with your command is that you cannot match two characters with the square bracket pattern in bash. You should use something like that instead:
ls file.txt.[a-e]? file.txt.f[a-y]
Basically decompose your range into two ranges, the first will match .aa to .ez, and the second .fa to .fy (included).
Note that I have used the ls command here. I always find it a good idea to first echo or ls the commands/files you're going to run when the operations you do are potentially destructive. When you're sure it produces the right output, go on and use rm instead of ls.

watching memory in PBS

I'm running a job on a cluster (using PBS) that runs out of memory. I'm trying to print the memory status for each node separately while my other job is running. I created a shell script and included a call to that script from inside my job submission script. But when I'm submitting my job it gives me permission denied error on the line that calls the script. I don't understand why do I get that error.
Secondly, I was thinking that I can have a 'watch free' or 'watch ps aux' in my script file but now I'm thinking if that will cause my submitted job to get stuck in memory-watching script and never continue to get to the main line that calls my parallel program.
After all, how can I achieve logging my memory in PBS for the jobs I'm submitting. My code is a C++ program using MRMPI (MPI MapReduce) library.
To see how much memory is being used throughout the job, run qstat -f:
$ qstat -f | grep used
resources_used.cput = 00:02:51
resources_used.energy_used = 0
resources_used.mem = 6960kb
resources_used.vmem = 56428kb
resources_used.walltime = 00:01:26
To examine past jobs you can look in the accounting file. This is located in the server_priv/accounting directory, the default is /var/spool/torque/server_priv/accounting/.
The entries look like this:
09/14/2015 10:52:11;E;202.napali;user=dbeer group=company jobname=intense.sh queue=batch ctime=1442248534 qtime=1442248534 etime=1442248534 start=1442248536 owner=dbeer#napali exec_host=napali/0-2 Resource_List.neednodes=1:ppn=3 Resource_List.nodect=1 Resource_List.nodes=1:ppn=3 session=20415 total_execution_slots=3 unique_node_count=1 end=0 Exit_status=0 resources_used.cput=1989 resources_used.energy_used=0 resources_used.mem=9660kb resources_used.vmem=58500kb resources_used.walltime=995
NOTE: if your ssh access to computing nodes of the cluster is closed, this method won't work!
This is how I ended up doing this. It might not be the best way but it works:
In summary, I added some short sleep periods in between my map and reduce steps by calling c++ sleep() function. And also wrote a script that ssh's to the nodes my job is running on and then gets the memory status on those nodes writing them in a file (using 'free' or 'top' commands).
More detailed: in my PBS job script, somewhere before the call to my binary, I added this line:
#this goes in job script, before the call to the job binary:
cat $PBS_NODEFILE > /some/path/nodelist.log
This writes a list of the nodes that my job runs on, into a file.
I have a second script "watchmem.sh":
#!/bin/bash
for i in $(seq 60)
do
while read line;
do
ssh $line 'bash -s' < /some/path/remote.sh "$line"
done < /some/path/nodelist.log
sleep 10
done
This script reads the file nodelist.log that we generated before, performs an ssh into each node and calls a third (and last script), remote.sh, on each of those nodes.
remote.sh contains the commands that we run on every node of our job. In this case it prints the current time and the result of 'free' into separate files for each node:
#remote.sh
echo "Current time : $(date)" >> $1
free >> $1 #this can be replaced by top by specifying a -n for it
Comparing the times from these files and the times I'm printing from my binary let's me find out the memory consumption (alloc/dealloc) in each step.
The sleep periods in my job is to make sure my scripts capture the memory status in between steps. 'sleep 10' in my script is to avoid unnecessary writes to the file; this period should be comparable to the sleep duration in the main job.

To use lsf bsub command without all the verbosity output

My problem is that: I have a bash script that do something and then call 800 bsub jobs like this:
pids=
rm -f ~/.count-*
for i in `ls _some_files_`; do
of=~/.count-${i}
bsub -I "grep _something_ $i > $of" &
pids="${!} ${pids}"
done
wait ${pids}
Then the scripts process the output files $of and echo the results.
The trouble is that I got a lot of lines like:
Job <7536> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
It's actually 800 times the 3 lines above. Is there a way of suppressing this LSF lines?
I've tried in the loop above:
bsub -I "grep _something_ $i > $of" &> /dev/null
I does remove the LSF verbosity but instead of submitting almost all 800 jobs at once and then take less than 4 min to run, it submits just few jobs at a time and I have to wait more than an hour for the script to finish.
AFAIK lsf bsub doesn't seem to have a option to surpress all this verbosity. What can I do here?
You can suppress this output by setting the environment variable BSUB_QUIET to any value (including empty) before your bsub. So, before your loop say you could add:
export BSUB_QUIET=
Then if you want to return it back to normal you can clear the variable with:
unset BSUB_QUIET
Hope that helps you out.
Have you considered using job dependencies and post-process the logfiles?
1) Run each "child" job (removing the "-Is") and output the IO to separate output file. Each job should be submitted with a jobname (see -J). the jobname could form an array.
2) Your final job would be dependent on the children finishing (see -w).
Besides running concurrent across the cluster, another advantage of this approach is that your overall process is not susceptible to IO issues.

Resources