Passing multiple parameters to GNU parallel

I have a sample script called sample.sh which takes three inputs X, Y and Z:
$ cat sample.sh
#!/bin/bash
X=$1
Y=$2
Z=$3
file=X${X}_Y${Y}_Z${Z}
echo "$(hostname) $(date)" >> ./$file
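For example, a hypothetical run (the output file name is built from the three arguments):
$ ./sample.sh 1.0000 2.0000 3.0000   # appends "hostname date" to ./X1.0000_Y2.0000_Z3.0000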
Now I can give parameters in the following way:
parallel ./sample.sh {1} {2} {3} ::: 1.0000 1.1000 ::: 2.0000 2.1000 ::: 3.0000 3.1000
Or I could do:
parallel ./sample.sh {1} {2} {3} :::: xlist ylist zlist
where xlist, ylist and zlist are files which contain the parameter list.
But what if I want to have one file called parameter.dat?
$ cat parameter.dat
#xlist
1.0000 1.1000
#ylist
2.0000 2.1000
#zlist
3.0000 3.1000
I can use awk to read parameter.dat and produce temporary files called xlist, ylist and so on...
But is there a better way using gnu-parallel itself?
Ultimately, I simply want to append new xlist, ylist and zlist lines to parameter.dat and run sample.sh with the last instance of each, so that parameter.dat itself keeps a record of the parameter runs I have already done.
I am looking for an elegant way to do this.
Edit: My current solution is:
#!/bin/bash
# The last line of parameter.dat holds the current zlist, the third-from-last
# line the current ylist, and the fifth-from-last line the current xlist.
# awk '{$1=$1};1' squeezes whitespace; tr puts one value per line.
tail -1 parameter.dat | awk '{$1=$1};1' | tr ' ' '\n' > zlist
tail -3 parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > ylist
tail -5 parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > xlist
parallel ./sample.sh {1} {2} {3} :::: xlist ylist zlist
rm xlist ylist zlist

There is no built-in way of doing what you want, and your solution is not too bad.
If you control parameter.dat and it is not too big (under 128 KB), I would probably do:
$ cat parameter.dat
::: x valueX1 ValueX2
::: y valueY1 ValueY2
::: z ValueZ1 ValueZ2 ValueZ3
# Note: the missing quotes around $() are deliberate, and the ::: separators live in parameter.dat itself
$ parallel --header : ./sample.sh $(cat parameter.dat)
--header : tells GNU parallel to treat the first value of each input source as its name rather than as an argument. It also means you can use {x}, {y} and {z} in the command template.
This makes it easy to add another parameter, and there are no temporary files to clean up.
You are, however, restricted: your values cannot contain spaces or certain characters that have special meaning in the shell (e.g. ? and *). Other special characters (e.g. $ ' " `) are fine.
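For example, using the named replacement strings, the command with the parameter.dat above effectively expands to this (a sketch; valueX1, ValueY2 etc. are the placeholders from the sample file):
$ parallel --header : ./sample.sh {x} {y} {z} ::: x valueX1 ValueX2 ::: y valueY1 ValueY2 ::: z ValueZ1 ValueZ2 ValueZ3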

Related

The difference between '--max-args' and '--max-replace-args' in GNU Parallel?

According to the GNU parallel manual, the difference between --max-args / -n and --max-replace-args / -N is that the latter is:
--max-replace-args=max-args
-N max-args
Use at most max-args arguments per command line. Like -n but also makes
replacement strings {1} .. {max-args} that represents argument 1 .. max-args.
What does that actually mean? Does it mean that --max-args / -n would NOT interpret the replacement strings {1} .. {max-args}? But the following test shows that {1} {2} {3} are interpreted correctly either way:
$ parallel -n3 echo {3} {2} {1} ::: {A..F}
C B A
F E D
$ parallel -N3 echo {3} {2} {1} ::: {A..F}
C B A
F E D
So what's really the difference between the two?

grep the file if it matches delete it and save it in same name [duplicate]

I have a file f1:
line1
line2
line3
line4
..
..
I want to delete all the lines which are in another file f2:
line2
line8
..
..
I tried something with cat and sed, which wasn't even close to what I intended. How can I do this?
grep -v -x -f f2 f1 should do the trick.
Explanation:
-v to select non-matching lines
-x to match whole lines only
-f f2 to get patterns from f2
One can instead use grep -F or fgrep to match fixed strings from f2 rather than patterns (in case you want to remove the lines in a "what you see is what you get" manner rather than treating the lines of f2 as regex patterns).
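With the sample f1 and f2 above (ignoring the elided .. lines), a fixed-string run might look like:
$ grep -vxFf f2 f1
line1
line3
line4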
Try comm instead (assuming f1 and f2 are already sorted):
comm -2 -3 f1 f2
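If they are not sorted, process substitution sorts them on the fly (a sketch; the benchmarks further down use the same trick):
comm -23 <(sort f1) <(sort f2)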
If the exclude file isn't too huge, you can use AWK's associative arrays.
awk 'NR == FNR { list[tolower($0)]=1; next } { if (! list[tolower($0)]) print }' exclude-these.txt from-this.txt
The output will be in the same order as the "from-this.txt" file. The tolower() function makes it case-insensitive, if you need that.
The algorithmic complexity will probably be O(n + m), where n and m are the sizes of exclude-these.txt and from-this.txt.
Similar to Dennis Williamson's answer (mostly syntactic changes, e.g. setting the file number explicitly instead of the NR == FNR trick):
awk '{if (f==1) { r[$0] } else if (! ($0 in r)) { print $0 } } ' f=1 exclude-these.txt f=2 from-this.txt
Accessing r[$0] creates the entry for that line; there is no need to set a value.
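A minimal sketch demonstrating this behaviour (the entry exists after a bare reference, without any assignment):
$ awk 'BEGIN { r["x"]; print ("x" in r), ("y" in r) }'
1 0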
Assuming awk uses a hash table with constant lookup and (on average) constant update time, the time complexity of this will be O(n + m), where n and m are the lengths of the files. In my case, n was ~25 million and m ~14000. The awk solution was much faster than sort, and I also preferred keeping the original order.
If you have Ruby (1.9+):
#!/usr/bin/env ruby
b=File.read("file2").split
open("file1").each do |x|
x.chomp!
puts x if !b.include?(x)
end
This has O(n·m) complexity (a linear scan of b for every line of a). If you care about performance, here's another version:
b=File.read("file2").split
a=File.read("file1").split
(a-b).each {|x| puts x}
which uses a hash to effect the subtraction, so the complexity is O(n + m), where n and m are the sizes of a and b.
Here's a little benchmark of the above, courtesy of user576875, but with 100K lines:
$ for i in $(seq 1 100000); do echo "$i"; done|sort --random-sort > file1
$ for i in $(seq 1 2 100000); do echo "$i"; done|sort --random-sort > file2
$ time ruby test.rb > ruby.test
real 0m0.639s
user 0m0.554s
sys 0m0.021s
$ time sort file1 file2 | uniq -u > sort.test
real 0m2.311s
user 0m1.959s
sys 0m0.040s
$ diff <(sort -n ruby.test) <(sort -n sort.test)
$
diff was used to show there are no differences between the 2 files generated.
Some timing comparisons between various other answers:
$ for n in {1..10000}; do echo $RANDOM; done > f1
$ for n in {1..10000}; do echo $RANDOM; done > f2
$ time comm -23 <(sort f1) <(sort f2) > /dev/null
real 0m0.019s
user 0m0.023s
sys 0m0.012s
$ time ruby -e 'puts File.readlines("f1") - File.readlines("f2")' > /dev/null
real 0m0.026s
user 0m0.018s
sys 0m0.007s
$ time grep -xvf f2 f1 > /dev/null
real 0m43.197s
user 0m43.155s
sys 0m0.040s
sort f1 f2 | uniq -u isn't even a symmetrical difference, because it removes lines that appear multiple times in either file.
comm can also be used with stdin and here strings:
echo $'a\nb' | comm -23 <(sort) <(sort <<< $'c\nb') # a
Seems to be a job suitable for the SQLite shell:
create table file1(line text);
create index if1 on file1(line ASC);
create table file2(line text);
create index if2 on file2(line ASC);
-- comment: if you have | in your files then specify ".separator ××any_improbable_string××"
.import 'file1.txt' file1
.import 'file2.txt' file2
.output result.txt
select * from file2 where line not in (select line from file1);
.q
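To run it, one might save the statements above as, say, diff.sql (a hypothetical file name) and feed it to the sqlite3 shell:
$ sqlite3 :memory: < diff.sql
$ cat result.txt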
Did you try this with sed?
sed 's#^#sed -i '"'"'s%#g' f2 > f2.sh
sed -i 's#$#%%g'"'"' f1#g' f2.sh
sed -i '1i#!/bin/bash' f2.sh
sh f2.sh
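With the sample f2 above, the generated f2.sh would look like this; note that it blanks out matching text everywhere rather than deleting whole lines:
#!/bin/bash
sed -i 's%line2%%g' f1
sed -i 's%line8%%g' f1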
Not a 'programming' answer but here's a quick and dirty solution: just go to http://www.listdiff.com/compare-2-lists-difference-tool.
Obviously won't work for huge files but it did the trick for me. A few notes:
I'm not affiliated with the website in any way (if you still don't believe me, then you can just search for a different tool online; I used the search term "set difference list online")
The linked website seems to make network calls on every list comparison, so don't feed it any sensitive data
A Python way of filtering one list using another list.
Load files:
>>> f1 = open('f1').readlines()
>>> f2 = open('f2.txt').readlines()
Remove '\n' string at the end of each line:
>>> f1 = [i.replace('\n', '') for i in f1]
>>> f2 = [i.replace('\n', '') for i in f2]
Print only the f1 lines that do not contain any f2 line:
>>> [a for a in f1 if all(b not in a for b in f2)]
$ cat values.txt
apple
banana
car
taxi
$ cat source.txt
fruits
mango
king
queen
number
23
43
sentence is long
so what
...
...
I made a small shell script to "weed" out of the source file the values that are present in the values.txt file.
$ cat weed_out.sh
from=$1
cp -p "$from" "$from.final"
for x in $(cat values.txt)
do
grep -v "$x" "$from.final" > "$from.final.tmp"
mv "$from.final.tmp" "$from.final"
done
Executing:
$ ./weed_out source.txt
and you get a nicely cleaned-up file.

numbers from egrep result in one line

I use egrep to output some lines with platform names:
XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$"
[30] i686-nptl-linux-gnu
[34] i686-w64-mingw32
[75] x86_64-unknown-linux-gnu
[77] x86_64-w64-mingw32
what I need is:
export PLATNUMS=30,34,75,77
How can I pipe the egrep command to sed / awk / bash script?
Try:
$ command | awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":"export PLATNUMS="),$2; f=1} END{print""}'
export PLATNUMS=30,34,75,77
How it works
-F'[][ \t]+'
Use any number of spaces, tabs, or [ or ] as field separators.
/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{...}
For the lines of interest, perform the commands in curly braces.
printf "%s%s",(f?",":"export PLATNUMS="),$2; f=1
For the lines of interest, print what we want.
The variable f tracks whether a line of interest has been seen yet, so the comma separator is only printed from the second one onward.
END{print""}
After reading all lines, print a newline.
Creating a shell variable
export PLATNUMS=$(command | awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":""),$2; f=1} END{print""}')
For example, if the file input contains your data:
$ export PLATNUMS=$(awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":""),$2; f=1} END{print""}' input)
$ declare -p PLATNUMS
declare -x PLATNUMS="30,34,75,77"
For those who prefer their commands spread out over multiple lines:
export PLATNUMS=$(command | awk -F'[][ \t]+' '
/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{
printf "%s%s",(f?",":""),$2
f=1
}
END{
print""
}
')
Perhaps this way (I can't test it with your egrep):
export PLATNUMS=$(XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed ':A;s/\[\([0-9]*\)\].*/\1/;$bB;N;bA;:B;s/\n/,/g')
echo $PLATNUMS
How does this work?
Your egrep command returns multiline text, so sed reads that text line by line, like this:
sed '
:A # label A
# here, with your example,
# on the first pass the pattern space looks like this:
# [30] i686-nptl-linux-gnu
# on the second pass the pattern space looks like this:
# 30
# [34] i686-w64-mingw32
s/\[\([0-9]*\)\].*/\1/ # replace the digits enclosed in [] and everything after them with the digits alone
# on the first pass the pattern space becomes
# 30
# on the second pass it becomes
# 30
# 34
# and so on for each line
$bB # on the last line, jump to B
N # append the next input line to the pattern space
bA # it is not the last line, so jump back to A
:B # label B
# here we have read all the lines
# the pattern space looks like this (without the #):
# 30
# 34
# 75
# 77
s/\n/,/g' # substitute every \n with a comma
# the pattern space becomes
# 30,34,75,77
# $(XXX | egrep ... | sed ...) puts 30,34,75,77 in the variable PLATNUMS
# (It is better not to use all-capital letters in your variable names.)
With GNU sed and tr:
$ XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed -E 's,]\s+.+$,,g' | sed 's,^\[,,g' | tr '\n' ',' | sed -E 's,(^.+$),export PLATNUMS=\1,' | sed 's/,$//' && echo
I'm not sure what you want to achieve but you might want to automatically eval the output export:
$ eval $(XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed -E 's,]\s+.+$,,g' | sed 's,^\[,,g' | tr '\n' ',' | sed -E 's,(^.+$),export PLATNUMS=\1,' | sed 's/,$//' && echo)
$ echo $PLATNUMS
30,34,75,77
If you ever think you need grep+sed or 2 greps or 2 seds or any other combination then you should use 1 call to awk instead, and you never need grep or sed when you're using awk:
export PLATNUMS=$(XXX | awk -F'[][]' '/(i686-nptl-linux-gnu|i686-w64-mingw32|x86_64-unknown-linux-gnu|x86_64-w64-mingw32)$/{p=(p ? p "," : "") $2} END{print p}')
Btw, in case it's useful, here are a couple of briefer regexps:
(i686-(nptl-linux-gnu|w64-mingw32)|x86_64-(unknown-linux-gnu|w64-mingw32))$
((i686-nptl|x86_64-unknown)-linux-gnu|(i686|x86_64)-w64-mingw32)$
and depending on your input data (since this will include combinations not provided by the above) you MIGHT only need:
(i686|x86_64)-(nptl|unknown|w64)-(linux-gnu|mingw32)$
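A quick sanity check that the condensed regexp still matches all four platform strings (a sketch using grep -c to count matching lines):
$ printf '%s\n' i686-nptl-linux-gnu i686-w64-mingw32 x86_64-unknown-linux-gnu x86_64-w64-mingw32 |
    grep -cE '(i686|x86_64)-(nptl|unknown|w64)-(linux-gnu|mingw32)$'
4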

joining 2 files on matching column values using awk

I know there have been similar questions posted but I'm still having a bit of trouble getting the output I want using awk FNR==NR...
I have 2 files as such
File 1:
123|this|is|good
456|this|is|better
...
File 2:
aaa|123
bbb|456
...
So I want to join the values in file 2, column 2, to file 1, column 1, and output columns 2, 3 and 4 of file 1 followed by column 1 of file 2.
Thanks in advance.
With awk you could do something like
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1
NF is the number of fields in a record (line by default), so $NF is the last field, and $(NF + 1) is the field after that. By assigning the saved value from the pass over file2 to it, a new field is appended to the record before it is printed.
One thing to note: this behaves like an inner join, i.e., only records whose key appears in both files are printed. To make this a right join, you can use
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } { $(NF + 1) = val[$1]; print }' file2 file1
That is, you can drop the $1 in val condition on the append-and-print action. If $1 is not in val, val[$1] is empty, and an empty field will be appended to the record before printing.
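With the sample files above, either version produces the following (they agree here because every file1 key appears in file2):
$ awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1
123|this|is|good|aaa
456|this|is|better|bbb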
But it's probably better to use join:
join -1 1 -2 2 -t \| file1 file2
If you don't want the key field to be part of the output, pipe the output of either of those commands through cut -d \| -f 2- to get rid of it, i.e.
join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
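With the sample files (both already sorted on their join fields), this gives:
$ join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
this|is|good|aaa
this|is|better|bbb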
If the files have the same number of lines in the same order, then
paste -d '|' file1 file2 | cut -d '|' -f 2-5
this|is|good|aaa
this|is|better|bbb
I see in a comment to Wintermute's answer that the files aren't sorted. With bash, process substitutions are handy to sort on the fly:
paste -d '|' <(sort -t '|' -k 1,1 file1) <(sort -t '|' -k 2,2 file2) |
cut -d '|' -f 2-5
To reiterate: this solution requires a one-to-one correspondence between the lines of the two files.

grep that match around the first match

I would like to grep for a specific word 'foo' inside specific files, then get the N lines around each match and show only the blocks that also contain a second pattern.
I found this, but it doesn't really work:
find . | grep -E '.*?\.(c|asm|mac|inc)$' | \
xargs grep --color -C3 -rie 'foo' | \
xargs -n1 --delimiter='--' | grep --color -l 'bar'
For instance I have the file 'a':
a
b
c
d
bar
f
foo
g
h
i
j
bar
l
The file b:
a
bar
c
d
e
foo
g
h
i
j
k
Running grep with -C2 on both files, I expect the output below: in file a, bar falls within two lines of foo, but file b produces no match because its bar is not within the -C2 range of foo:
--
./a- bar
./a- f
./a- **foo**
./a- g
./a- h
--
Any ideas?
You could do this pretty simply with a "while read line" loop:
find -regextype posix-extended -regex "./file[a-z]" | while read -r line; do grep -nHC2 "foo" "$line" | grep --color bar; done
Output:
./filea-5-bar
./filec-46-... host pwns.me [94.23.120.252]: 451 4.7.1 Local bar configuration error ...
In this example, I created the following files:
filea - your example a
fileb - your example b
filec - some random exim log output with foo and bar tossed in 2 lines apart
filed - the same exim log output, but with foo and bar tossed in 3 lines apart
You could also pipe the output after done to alter the format:
; done | sed -E 's/-([0-9]{1,6})-/: line: \1 ::: /'
Formatted output
./filea: line: 5 ::: bar
./filec: line: 46 ::: ... host pwns.me [94.23.120.252]: 451 4.7.1 Local bar configuration error ...
I think I only understand the first line of your question, and this does what I think you mean!
#!/bin/bash
N=2
pattern1=a
pattern2=z
matchinglines=( $(awk -v p="$pattern1" '$0~p{print NR}' file) ) # Generate array of matching line numbers
for x in "${matchinglines[@]}"
do
((start=x-N))
[[ $start -lt 1 ]] && start=1 # Avoid passing negative line numbers to sed
((end=x+N))
echo "DEBUG: Checking block between lines $start and $end"
sed -ne "${start},${end}p" file | grep -q "$pattern2"
[[ $? -eq 0 ]] && sed -ne "${start},${end}p" file
done
You need to set pattern1 and pattern2 at the start of the script. It uses awk to build an array of the line numbers that match your first pattern, then loops through the array, setting start and end to ±N either side of each matching line number. It then uses sed to extract that block and passes it through grep to see whether it contains pattern2, printing the block if it does. It may not be the most efficient, but it is easy to understand and maintain.
It assumes your file is called file
Pipe it twice: first extract the context blocks around foo, then keep the lines in those blocks that mention bar (grep matches line by line, so a pattern cannot contain \n):
grep -C2 'foo' file | grep 'bar'
