According to the GNU parallel manual, the difference between --max-args / -n and --max-replace-args / -N is that the latter is like -n but also creates the
replacement strings {1} .. {max-args}, which represent arguments 1 .. max-args:
--max-replace-args=max-args
-N max-args
Use at most max-args arguments per command line. Like -n but also makes
replacement strings {1} .. {max-args} that represents argument 1 .. max-args.
What does that actually mean? Does it mean that --max-args / -n would NOT interpret replacement strings {1} .. {max-args}? But the following test shows that the replacement strings {1} {2} {3} could be interpreted correctly:
$ parallel -n3 echo {3} {2} {1} ::: {A..F}
C B A
F E D
$ parallel -N3 echo {3} {2} {1} ::: {A..F}
C B A
F E D
So what's really the difference between the two?
I have a sample script called sample.sh which takes three inputs X,Y and Z
>> cat sample.sh
#! /bin/bash
X=$1
Y=$2
Z=$3
file=X$1_Y$2_Z$3
echo `hostname` `date` >> ./$file
Now I can give parameters in the following way:
parallel ./sample.sh {1} {2} {3} ::: 1.0000 1.1000 ::: 2.0000 2.1000 ::: 3.0000 3.1000
Or I could do:
parallel ./sample.sh {1} {2} {3} :::: xlist ylist zlist
where xlist, ylist and zlist are files which contain the parameter list.
But what if I want to have one file called parameter.dat?
>>> cat parameter.dat
#xlist
1.0000 1.1000
#ylist
2.0000 2.1000
#zlist
3.0000 3.1000
I can use awk to read parameter.dat and produce temporary files called xlist, ylist and so on...
But is there a better way using gnu-parallel itself?
Ultimately what I am looking for is to simply add more lines of xlist,ylist and zlist to parameter.dat and use the last instance of xlist, ylist or zlist to run sample.sh with, so that I keep a record of the parameter runs I have already done in parameter.dat itself.
I am looking for an elegant way to do this.
Edit: My current solution is:
#! /bin/bash
tail -1 < parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > zlist
tail -3 < parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > ylist
tail -5 < parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > xlist
parallel ./sample.sh {1} {2} {3} :::: xlist ylist zlist
rm xlist ylist zlist
There is no built-in way of doing what you want, and your solution is not too bad.
If you control the format of parameter.dat and it is not too big (at most about 128 KB), I would probably do:
$ cat parameter.dat
::: x valueX1 ValueX2
::: y valueY1 ValueY2
::: z ValueZ1 ValueZ2 ValueZ3
# The $() is deliberately left unquoted, and the ::: separators come from parameter.dat
$ parallel --header : ./sample.sh $(cat parameter.dat)
--header : is used to ignore the first value of each line. It also means you can use {x} {y} and {z} in the command template.
This makes it easy to add another parameter, and you do not need to clean up temporary files.
You are, however, restricted: your values cannot contain spaces or characters that the shell treats as glob patterns (e.g. ? *). Other characters (e.g. $ ' " `) are fine.
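To see why the values must not contain spaces, here is a minimal sketch of what the unquoted $(cat parameter.dat) turns into. It does not call parallel at all; it only shows the word splitting the shell performs. The temporary file and its values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical parameter file in the ":::"-prefixed layout described above.
params=$(mktemp)
cat > "$params" <<'EOF'
::: x 1.0000 1.1000
::: y 2.0000 2.1000
::: z 3.0000 3.1000
EOF

# Left unquoted on purpose: the shell splits the file content on whitespace,
# so every ":::", header name and value becomes a separate argument.
set -- $(cat "$params")
arg_count=$#
first_arg=$1
second_arg=$2

# Print one argument per line, as a command would receive them.
printf '%s\n' "$@"

rm -f "$params"
```

Each of the twelve words arrives as its own argument, which is exactly what parallel needs, and also why a value containing a space would be split in two.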
This is the yaml file:
tasks:
test: {include: [bash_exec], args:['-c', 'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'], answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]}
When parsed, it yields the following error:
Unexpected characters ($F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;'']
This command
state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;'
produces the right output on the Linux command line, but the YAML parser throws an exception when it is run from the YAML file.
First, let's untangle the YAML file into a more readable format:
tasks:
  test: {
    include: [bash_exec],
    args:['-c', 'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'],
    answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]
  }
The first problem is args:[; YAML requires whitespace after the : that separates a mapping key from its value (unless the key is a quoted scalar). Let's add it:
tasks:
  test: {
    include: [bash_exec],
    args: [
      '-c',
      'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '
      $F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'
    ],
    answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]
  }
This makes it obvious what happens: You end the single-quoted scalar started with 'state right before the $ symbol. As we are in a YAML flow sequence (started by [), the parser expects a comma or the end of the sequence after that value. However, it finds a $ which is what it complains about.
Now obviously, you don't want to stop the scalar before the $; the ' is supposed to be part of the content. There are multiple ways to achieve this, but the most readable way is probably to define the value as a block scalar:
tasks:
  test:
    include: [bash_exec]
    args:
      - '-c'
      - >-
        state --m=4 in=in4.db | cppextract -f ,
        -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y |
        perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' |
        state2 --id=Id.Date wq.db -
    answer:
      - '{{out}}/utestt.csv'
      - n: 5
      - cols: [f, k]
>- starts a folded block scalar, which can span multiple lines; each line break is folded into a space character, and the - strips the trailing newline. Note that I removed the surrounding flow mapping ({…}) and replaced it with a block mapping to be able to use a block scalar in it.
I also changed answer to be a sequence which it is not currently, but it looks like it should be (it is also erroneous in the YAML you show).
I have two text files containing one column each, for example -
File_A File_B
1 1
2 2
3 8
If I do grep -f File_A File_B > File_C, I get File_C containing 1 and 2. I want to know how to use grep -v on two files so that I can get the non-matching values, 3 and 8 in the above example.
Thanks.
You can also use comm, provided your version accepts an empty output delimiter:
$ # -3 means suppress lines common to both input files
$ # by default, tab character appears before lines from second file
$ comm -3 f1 f2
3
        8
$ # change it to empty string
$ comm -3 --output-delimiter='' f1 f2
3
8
Note: comm requires sorted input, so use comm -3 --output-delimiter='' <(sort f1) <(sort f2) if the files are not already sorted.
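If your comm does not accept an empty --output-delimiter, one possible workaround is to delete the tab afterwards with tr. This is a sketch rather than part of the answer above; it assumes the values themselves never contain tab characters, and it uses scratch copies of the question's sample data.

```shell
#!/bin/sh
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '1\n2\n3\n' > "$dir/f1"
printf '1\n2\n8\n' > "$dir/f2"

# comm expects sorted input, so sort into separate files first
# (this avoids relying on bash-only process substitution).
sort "$dir/f1" > "$dir/f1.sorted"
sort "$dir/f2" > "$dir/f2.sorted"

# comm -3 drops the common lines; tr then deletes the tab that marks column 2.
result=$(comm -3 "$dir/f1.sorted" "$dir/f2.sorted" | tr -d '\t')
printf '%s\n' "$result"

rm -rf "$dir"
```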
You can also pass the common lines found by grep as input to grep -v. This was tested with GNU grep; some versions might not support all of these options.
$ grep -Fxf f1 f2 | grep -hxvFf- f1 f2
3
8
-F option to match strings literally, not as regex
-x option to match whole lines only
-h to suppress file name prefix
-f- to read the patterns from stdin instead of from a file
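The two-step grep pipeline can be tried end to end with the sample data from the question. The scratch directory and the variable name result are only for illustration.

```shell
#!/bin/sh
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '1\n2\n3\n' > "$dir/f1"
printf '1\n2\n8\n' > "$dir/f2"

# Step 1: lines of f2 that also occur in f1 (fixed strings, whole-line match).
# Step 2: use those lines as patterns from stdin (-f-) and invert the match
#         over both files, so only the lines unique to either file remain.
result=$(grep -Fxf "$dir/f1" "$dir/f2" | grep -hxvFf- "$dir/f1" "$dir/f2")
printf '%s\n' "$result"

rm -rf "$dir"
```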
awk 'NR==FNR{a[$0]=$0;next} !($0 in a) {print a[(FNR)], $0}' f1 f2
3 8
To understand what NR and FNR mean, look at what they print for two files:
awk '{print NR,FNR}' f1 f2
1 1
2 2
3 3
4 4
5 1
6 2
7 3
8 4
The condition NR==FNR is used to collect the data from the first file: NR counts records across all files while FNR restarts at 1 for each file, so the two are equal only while the first file is being read.
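Note that the one-liner above looks up a[(FNR)] by line number although the array is keyed by line content, so it prints 3 only because the values in f1 happen to equal their line numbers. A sketch of a variant that does not rely on this is shown below; it prints one value per line instead of pairing them, and the order within each group is whatever awk's for-in loop yields.

```shell
#!/bin/sh
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '1\n2\n3\n' > "$dir/f1"
printf '1\n2\n8\n' > "$dir/f2"

result=$(awk '
    NR == FNR { a[$0]; next }   # first file: remember every line as a key
    { b[$0] }                   # second file: remember every line as a key
    END {
        for (k in a) if (!(k in b)) print k   # lines found only in f1
        for (k in b) if (!(k in a)) print k   # lines found only in f2
    }' "$dir/f1" "$dir/f2")
printf '%s\n' "$result"

rm -rf "$dir"
```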
With GNU diff command (to compare files line by line):
diff --suppress-common-lines -y f1 f2 | column -t
The output (the left column contains lines from f1, the right column lines from f2):
3 | 8
-y, --side-by-side - output in two columns
If column Y contains only positive values, the following awk command works fine:
$ echo -e "g1 2\ng1 3\ng2 4\ng2 1\ng3 1" > input_pos.txt
$ cat input_pos.txt
g1 2
g1 3
g2 4
g2 1
g3 1
$ awk '{if(! $1 in a)a[$1]=$2; else if($2 > a[$1])a[$1]=$2} END{for(i in a) print i,a[i]}' input_pos.txt
g1 3
g2 4
g3 1
It also works as long as each group contains at least one positive number:
$ echo -e "g1 2\ng1 -3\ng2 4\ng2 1\ng3 1" > input_pos-neg.txt
$ cat input_pos-neg.txt
g1 2
g1 -3
g2 4
g2 1
g3 1
$ awk '{if(! $1 in a)a[$1]=$2; else if($2 > a[$1])a[$1]=$2} END{for(i in a) print i,a[i]}' input_pos-neg.txt
g1 2
g2 4
g3 1
However, it doesn't work when there are only negative numbers:
$ echo -e "g1 -2\ng1 -3\ng2 -4\ng2 -1\ng3 -1" > input_neg.txt
$ cat input_neg.txt
g1 -2
g1 -3
g2 -4
g2 -1
g3 -1
$ awk '{if(! $1 in a)a[$1]=$2; else if($2 > a[$1])a[$1]=$2} END{for(i in a) print i,a[i]}' input_neg.txt
g1
g2
g3
The same happens for g1 in this example:
$ echo -e "g1 -2\ng1 -3\ng2 4\ng2 1\ng3 1" > input_neg2.txt
$ cat input_neg2.txt
g1 -2
g1 -3
g2 4
g2 1
g3 1
$ awk '{if(! $1 in a)a[$1]=$2; else if($2 > a[$1])a[$1]=$2} END{for(i in a) print i,a[i]}' input_neg2.txt
g1
g2 4
g3 1
I looked at the gawk manual (Conversions of strings and numbers), and I tried adding +0 to $2 to force the > comparison to be numeric, but I still can't find a solution to my problem. Any ideas are welcome!
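Your problem is that the ! operator binds more tightly than in, so ! $1 in a is parsed as (! $1) in a. That test is false on every line, so the first value is never stored, and the a[$1] reference in the else branch creates an empty entry. If you parenthesize the membership test, i.e. (! ($1 in a)), it works.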
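As a quick check, here is the corrected command run on the all-negative input from the question. The +0 is an extra precaution to force a numeric comparison and is not required by the fix itself; sort is added only to make the output order predictable.

```shell
#!/bin/sh
# All-negative sample input from the question, fed through the fixed awk program.
result=$(printf 'g1 -2\ng1 -3\ng2 -4\ng2 -1\ng3 -1\n' |
    awk '{
        if (!($1 in a))                # first time this group is seen
            a[$1] = $2 + 0
        else if ($2 + 0 > a[$1])       # numeric comparison with the stored maximum
            a[$1] = $2 + 0
    }
    END { for (i in a) print i, a[i] }' |
    sort)
printf '%s\n' "$result"
```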