How to quote each argument from GNU parallel?

Given some pipe-delimited content (data.tsv):
Test|One|Two|Three
Again|||Another
And a bash function:
function print_last() {
echo "$4"
}
export -f print_last
And the parallel command:
parallel -C "\|" print_last :::: data.tsv
My expected output is:
Three
Another
However, Another never prints because the function receives only two arguments for that row of data. This is caused by the empty cells in the tabular data. My data will have blank cells and a varying number of columns.
So, without changing my command to include numbered arguments (print_last "{1}" "{2}" "{3}" "{4}"), how can I ensure that blank values are passed to the function?

Since your function is called print_last, maybe it is enough to simply get the last element:
parallel -C "\|" echo {-1} :::: data.tsv
Otherwise, abuse the fact that -X repeats the context around each argument:
parallel -C "\|" -X print_last \"\"{} :::: data.tsv
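The second form relies on ordinary shell quoting: as far as I understand it, -X repeats the "" in front of every argument, and an explicitly quoted empty string survives as an argument where a bare empty value would vanish. A minimal sketch of that shell behaviour without parallel (the variables a to d are made up for the demo):

```bash
print_last() {
    echo "$4"
}

# Hypothetical values standing in for the second row of the data.
a=Again b= c= d=Another

# Unquoted empty values disappear during word splitting,
# so the function sees only two arguments and $4 is empty.
print_last $a $b $c $d

# Prefixing each value with "" keeps an empty argument in place,
# so the function sees four arguments and $4 is Another.
print_last ""$a ""$b ""$c ""$d
```

The first call prints an empty line and the second prints Another, which is the difference between the failing and the working parallel command.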

Related

Error in GNU parallel dynamic string replacement

I have more than 50 file pairs with names in the following format: AA-7R-76L1.clean.R1.fastq.gz, AA-7R-76L1.clean.R2.fastq.gz
I tried to use parallel in the following way:
parallel --plus echo {%R..fastq.gz} ::: *.fastq.gz |parallel 'repair.sh in1={}.R1.fastq.gz in2={}.R2.fastq.gz out1={}.repd.R1.fastq.gz out2={}.repd.R2.fastq.gz outs={}.singletons.fastq.gz repair'
--plus echo should dynamically strip R1.fastq.gz or R2.fastq.gz to capture the sample name, e.g. HB-7R-25L0.clean, and then feed it to repair.sh.
The problem is that the first command emits the entire filename instead of the sample name, so in1 and in2 become AA-7R-76L1.clean.R1.fastq.gz.R1.fastq.gz and AA-7R-76L1.clean.R2.fastq.gz.R2.fastq.gz.
What is the error here?
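As far as I can tell, {%string} removes a literal string from the end of the argument, so the dots in R..fastq.gz are not treated as wildcards and nothing is removed. Try something like: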
$ parallel --plus --dry-run 'repair.sh in1={} in2={/R1/R2} out1={/R1/fixed.R1} out2={/R1/fixed.R2} outs={%.R1.fastq.gz}_singletons.fastq repair' ::: *R1.fastq.gz
(Assuming R1 and R2 are not part of the *-part of the name.)
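To check which names the command should end up with, the same derivation can be sketched with plain shell parameter expansion, no parallel involved. The file name is the example from the question; the variable names are only illustrative, and the comments show which replacement string each line is meant to mirror:

```bash
# Example file name taken from the question.
r1=AA-7R-76L1.clean.R1.fastq.gz

sample=${r1%.R1.fastq.gz}          # mirrors {%.R1.fastq.gz}
in2=${sample}.R2.fastq.gz          # mirrors {/R1/R2}
out1=${sample}.fixed.R1.fastq.gz   # mirrors {/R1/fixed.R1}
out2=${sample}.fixed.R2.fastq.gz   # mirrors {/R1/fixed.R2}
outs=${sample}_singletons.fastq

echo "in1=$r1 in2=$in2 out1=$out1 out2=$out2 outs=$outs"
```

Comparing this output with what --dry-run prints is a quick way to confirm the replacement strings behave as intended before running repair.sh for real.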

How to fix 'Unable to open [{2}]' error in Gnu Parallel

I want to parallelize an image processing step which uses two programs at the same time. My code works fine for a single image but when I try to parallelize it, it fails.
The two programs I am using are fx and getkey from the USGS Integrated Software for Imagers and Spectrometers. fx performs an arithmetic operation on my input image ('f1' in the code below) and writes the result to a new file (the 'to' parameter). getkey outputs the value of a requested keyword, which is a number in this case.
In the following code, I am subtracting the output of getkey from my input image, f1, and writing the result to a new file, which is defined by the 'to' parameter. This code works as I expect it to:
fx f1=W1660432760_1_overclocks_average_lwps5.cub to=testing_fx2.cub equation=f1-$(getkey from=W1660432760_1_overclocks_average_lwps5_stats.txt grpname=results keyword=average)
The problem comes when I try to parallelize it. The following code gives an error, saying 'Unable to open [{2}].'
parallel fx f1={1} to={1.}_minus_avg.cub equation=f1-$(getkey from={2} grpname=results keyword=average) ::: $(find *lwps5.cub) ::: $(find *stats.txt)
The result I am expecting is an output image with pixel values that are smaller by the getkey value compared to the input image.
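The unquoted $(getkey ...) is expanded by your shell before parallel even starts, while {2} is still a literal string, hence 'Unable to open [{2}]'. Put it in single quotes so it runs once per job, after {2} has been replaced.
If the two inputs should be combined in all ways: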
parallel fx f1={1} to={1.}_minus_avg.cub 'equation=f1-$(getkey from={2} grpname=results keyword=average)' ::: *lwps5.cub ::: *stats.txt
If the two inputs should be linked:
parallel fx f1={1} to={1.}_minus_avg.cub 'equation=f1-$(getkey from={2} grpname=results keyword=average)' ::: *lwps5.cub :::+ *stats.txt
If neither of these solves your issue, then make a shell function that takes 2 arguments:
doit() {
    arg1="$1"
    arg2="$2"
    # Do all your stuff with getkey and fx
}
export -f doit
# all combinations
parallel doit ::: *lwps5.cub ::: *stats.txt
# or linked
parallel doit ::: *lwps5.cub :::+ *stats.txt
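For what it's worth, a minimal sketch of what the body of such a function might look like, assuming fx and getkey accept exactly the parameters shown in the question. The variable names cub, stats and avg are made up here, and ${cub%.*} is meant to play the role of {1.}:

```bash
doit() {
    cub="$1"      # input image, e.g. one of the *lwps5.cub files
    stats="$2"    # matching stats file, e.g. one of the *stats.txt files
    # This runs inside the job, so getkey sees the real file name.
    avg=$(getkey from="$stats" grpname=results keyword=average) || return 1
    # Strip the extension from the image name to build the output name.
    fx f1="$cub" to="${cub%.*}_minus_avg.cub" equation="f1-$avg"
}
```

Because the command substitution lives inside the function, there is nothing left for the calling shell to expand too early, which sidesteps the quoting problem entirely.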

duplicate grep output when comparing two files

I have been at this for 5 hours. My device runs BusyBox, and unfortunately its grep does not have -X to make my life easier.
I have two lists, both containing MAC addresses. Essentially I want an offline MAC address lookup so I don't have to keep looking vendors up online.
list.txt holds vendor MAC prefixes. This isn't the complete list, just an example:
00:13:46
00:15:E9
00:17:9A
00:19:5B
00:1B:11
00:1C:F0
scan holds full-length MAC addresses whose vendors are unknown. Whenever there is a match, I want the line from scan to be output.
It mostly does that, but it first outputs everything from the scan file and then outputs the matching line at the end, which causes duplicates. I tried sort -u, but it has no effect. It is as if two different outputs come from two different methods: the whole scan file is printed instantly, and a couple of seconds later the matching line is printed.
From searching I came across this:
#!/bin/bash
while read line; do
grep -F 'list' 'scan'
done < list.txt
This displays duplicate results when a match is found: the output is essentially my scan file echoed back, followed by the matched lines, which creates the duplicates.
It is frustrating that I have not found a solution after clicking on all the links in Google up to page 9.
Please, someone help me.
I don't know if the BusyBox sed supports this out of the box; if it doesn't, it should be easy to do in Awk or Perl instead.
Transform each line in file1 into a sed command that prints lines starting with that prefix, then run the generated script against file2:
sed 's%.*%/^&/p%' file1 | sed -n -f - file2
The same in Awk:
awk 'NR==FNR { a[++i]="^" $0; next }
{ for (j=1; j<=i; ++j) if ($0 ~ a[j]) print }' file1 file2
OK, I used a nested for loop (probably very inefficient), but I got it working. It prints the matching MAC addresses:
#!/usr/bin/bash
for scanlist in $(cut -d: -f1,2,3 scan)
do
    for listt in $(cat list)
    do
        if [[ $scanlist == "$listt" ]]; then
            grep "$scanlist" scan
        fi
    done
done
If anyone can make this more elegant, please do, but it works for me for now. I think the problem I had was that one list contained just prefixes such as 00:11:22 while the other contained full addresses such as 00:11:22:33:44:55, which is why I cut the scan entries down to the same length as the prefixes. This outputs only the matches, without the duplicate output.
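One possibly more elegant variant is a single awk pass that loads the prefixes into an array and looks up the first 8 characters of each scanned address. This is only a sketch: the function name match_vendors is made up, and it assumes both files use the same letter case, the XX:XX:XX prefix format shown above, and a non-empty prefix list:

```bash
# Print every line of the scan file whose first 8 characters
# (the XX:XX:XX vendor prefix) appear in the prefix list.
# $1 = prefix list, $2 = scan file
match_vendors() {
    awk 'NR == FNR { prefix[$0]; next }
         substr($0, 1, 8) in prefix' "$1" "$2"
}
```

It would be called as match_vendors list.txt scan. Each file is read only once, so nothing is printed twice and the cost no longer grows with the product of the two list sizes.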

extract a line from a file using csh

I am writing a csh script that extracts a line from a file xyz.
The xyz file contains a number of lines of code, and the line I am interested in appears 2-3 lines into the file.
I tried the following code
set product1 = `grep -e '<product_version_info.*/>' xyz`
As soon as the script finds that line, I want it to save the line in a variable as a string and stop reading the file immediately, i.e. it should not read any further after extracting the line.
Please help!
grep has an -m or --max-count flag that tells it to stop after a specified number of matches. Hopefully your version of grep supports it.
set product1 = `grep -m 1 -e '<product_version_info.*/>' xyz`
From the grep man page:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
As an alternative, you can always use the command below to check just the first few lines (since it always occurs in the first 2-3 lines):
set product1 = `head -3 xyz | grep -e '<product_version_info.*/>'`
I think you're asking to return the first matching line in the file. If so, one solution is to pipe the grep result to head:
set product1 = `grep -e '<product_version_info.*/>' xyz | head -1`
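If your grep lacks -m, another option that stops reading at the first match is awk with an explicit exit. A small sketch in sh syntax; the wrapper name first_match is made up, and the pattern is the one from the question:

```bash
first_match() {
    # Print the first line matching the pattern, then stop reading the file.
    awk '/<product_version_info.*\/>/ { print; exit }' "$1"
}
```

Called as first_match xyz, it prints at most one line. The awk command on its own should also work inside the backticks of the csh set statement.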

Unable to manipulate a byte array

I'm trying to pass a byte array from inside my rails app into another ruby script (still inside my rails app), for example:
`./app/animations/fade.sh "\x01\x01\x04\x00" &`
Yields ArgumentError (string contains null byte)
I'm stumped as to how I can form this string and then pass it to my script, which will use it like this:
#sp.write ["#{ARGV[0]}", "f", "\x12"]
I'd like to form the string (on my rails app) like this if possible:
led = "\x01#{led.id}\x04\x00"
But I keep getting ArgumentError (string contains null byte) error. Is there a way I can form this string from elements in my rails app, then pass it to my external script?
You should just pass the data in through standard input, not the command line. You can use IO.popen for this purpose:
IO.popen("./app/animations/fade.sh", "w+") do |f|
f.write "\x01\x01\x04\x00"
end
And on the reading side:
input = $stdin.read
#sp.write [input, "f", "\x12"]
(By the way, it's more common to name Ruby scripts .rb instead of .sh; if fade.sh is meant to be a Ruby script, as I assume from the syntax you used in its example contents, you might want to name it fade.rb)
You could use Base64 to pass the byte string around:
$ cat > test.sh
echo $1 | base64 -d
$ chmod a+x test.sh
and then from ruby:
irb
>> require 'base64'
=> true
>> `./test.sh "#{Base64.encode64 "\x01\x01\x04\x00"}"`
=> "\x01\x01\x04\x00"
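The reason this works is that the encoded text contains no NUL byte, so it is safe as a command-line argument. A quick shell-only check of the round trip, assuming a base64 tool that accepts -d for decoding (the variable names are just for the demo):

```bash
# Encode four raw bytes, including a trailing NUL, as printable text.
encoded=$(printf '\001\001\004\000' | base64)
echo "$encoded"

# Decode again and count the bytes to confirm the NUL survived.
count=$(printf '%s' "$encoded" | base64 -d | wc -c)
echo "decoded bytes: $((count))"
```

All four bytes come back after decoding, whereas passing the raw string as an argument fails on the NUL.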
Can your script accept input from STDIN instead? Perhaps using read.
If you can't do this, you could encode your null byte and escape the byte used for the encoding.
E.g. 48656c6c6f0020576f726c64 could be encoded as 48656c6c6f20012020576f726c64,
which in turn would be decoded again if both sides agree that 2020=20 and 2001=00.
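A rough sketch of that escaping scheme, operating on lowercase hex strings with an even number of digits. The function names encode_hex and decode_hex are made up for illustration, and 0x20 acts as the escape byte as described above:

```bash
# Escape scheme: 0x20 is the escape byte, so 20 -> 2020 and 00 -> 2001.
encode_hex() {
    printf '%s\n' "$1" | fold -w 2 |
        sed -e 's/^20$/2020/' -e 's/^00$/2001/' | tr -d '\n'
}

# Reverse the scheme: after an escape byte, 01 means 00 and 20 means 20.
decode_hex() {
    printf '%s\n' "$1" | fold -w 2 | awk '
        esc        { printf "%s", ($0 == "01" ? "00" : "20"); esc = 0; next }
        $0 == "20" { esc = 1; next }
                   { printf "%s", $0 }'
}
```

For example, encode_hex 48656c6c6f0020576f726c64 yields a string with no 00 byte left in it, and decode_hex on that result returns the original hex.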
Update: I think encoding is what you'll have to do, because I tried using read and it turned out to be a little too difficult. There's probably another option, but I don't see it yet.
Here's my script and two test runs:
dlamblin$ cat test.sh
echo "reading two lines of input, first line is length of second."
read len
read ans
echo "C string length of second line is:" ${#ans}
for ((c=0; c<$len; c++))
do
/bin/echo -n "${ans:$c:1},"
done
echo ' '
exit
dlamblin$ echo -e '12\0012Hello \0040World' | sh test.sh
reading two lines of input, first line is length of second.
C string length of second line is: 12
H,e,l,l,o, , ,W,o,r,l,d,
dlamblin$ echo -e '12\0012Hello \0000World' | sh test.sh
reading two lines of input, first line is length of second.
C string length of second line is: 5
H,e,l,l,o,,,,,,,,
#Octals \0000 \0012 \0040 are NUL NL and SP respectively
