This is the YAML file:
tasks:
  test: {include: [bash_exec], args:['-c', 'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'], answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]}
When parsed, it yields the following error:
Unexpected characters ($F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;'']
This command
state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;'
produces the right output on the Linux command line but throws a parser exception when run through YAML.
First, let's untangle the YAML file in a more readable format:
tasks:
  test: {
    include: [bash_exec],
    args:['-c', 'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'],
    answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]
  }
The first problem is args:[; YAML requires whitespace between a key's colon and its value (unless the key is a quoted scalar). Let's add that:
tasks:
  test: {
    include: [bash_exec],
    args: [
      '-c',
      'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '
      $F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'
    ],
    answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]
  }
This makes it obvious what happens: You end the single-quoted scalar started with 'state right before the $ symbol. As we are in a YAML flow sequence (started by [), the parser expects a comma or the end of the sequence after that value. However, it finds a $ which is what it complains about.
Now obviously, you don't want to stop the scalar before the $; the ' is supposed to be part of the content. There are multiple ways to achieve this, but the most readable way is probably to define the value as a block scalar:
tasks:
  test:
    include: [bash_exec]
    args:
      - '-c'
      - >-
        state --m=4 in=in4.db | cppextract -f ,
        -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y |
        perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' |
        state2 --id=Id.Date wq.db -
    answer:
      - '{{out}}/utestt.csv'
      - n: 5
      - cols: [f, k]
>- starts a folded block scalar, which can span multiple lines; the line breaks are folded into spaces (and the - strips the trailing newline). Note that I removed the surrounding flow mapping ({…}) and replaced it with a block mapping so that a block scalar can be used inside it.
I also changed answer to a sequence, which it is not currently, but it looks like it should be (that part is also erroneous in the YAML you show).
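For completeness, another way (if you would rather keep the original flow style) is to stay with a single-quoted scalar and double every ' that belongs to the content, since '' is the escape for a single quote inside YAML single quotes. A minimal sketch with a shortened, made-up command in place of the real pipeline:
args: ['-c', 'echo hi | perl -lane ''print "x.$F[0]";'' | cat -']
After parsing, the doubled '' come out as single quote characters, so bash -c receives the same command line you would type interactively. For a pipeline as long as yours, though, the folded block scalar above stays far more readable.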
Related
Question: I'd like to print a single line directly following a line that contains a matching pattern.
My version of sed will not take the following syntax (it bombs out on +1p) which would seem like a simple solution:
sed -n '/ABC/,+1p' infile
I assume awk would be better to do multiline processing, but I am not sure how to do it.
Never use the word "pattern" in this context as it is ambiguous. Always use "string" or "regexp" (or in shell "globbing pattern"), whichever it is you really mean. See How do I find the text that matches a pattern? for more about that.
The specific answer you want is:
awk 'f{print;f=0} /regexp/{f=1}' file
or specializing the more general solution of the Nth record after a regexp (idiom "c" below):
awk 'c&&!--c; /regexp/{c=1}' file
The following idioms describe how to select a range of records given a specific regexp to match:
a) Print all records from some regexp:
awk '/regexp/{f=1}f' file
b) Print all records after some regexp:
awk 'f;/regexp/{f=1}' file
c) Print the Nth record after some regexp:
awk 'c&&!--c;/regexp/{c=N}' file
d) Print every record except the Nth record after some regexp:
awk 'c&&!--c{next}/regexp/{c=N}1' file
e) Print the N records after some regexp:
awk 'c&&c--;/regexp/{c=N}' file
f) Print every record except the N records after some regexp:
awk 'c&&c--{next}/regexp/{c=N}1' file
g) Print the N records from some regexp:
awk '/regexp/{c=N}c&&c--' file
I changed the variable name from "f" for "found" to "c" for "count" where appropriate as that's more expressive of what the variable actually IS.
f is short for found. It's a boolean flag that I'm setting to 1 (true) when I find a string matching the regular expression regexp in the input (/regexp/{f=1}). In the other place where f appears on its own in each script, it's being tested as a condition, and when true it causes awk to execute its default action of printing the current record. So input records only get output after we see regexp and set f to 1/true.
c && c-- { foo } means "if c is non-zero, execute foo and decrement c" (the post-decrement c-- yields the old, non-zero value, so the combined condition is true exactly when c is non-zero). So if c is set to 3 on a matching line, foo runs on each of the next 3 input lines while c counts down 3, 2, 1; on the line after that c is 0, the condition is false, and foo is not executed. We write c && c-- instead of just testing c-- so that c is never decremented once it reaches zero; otherwise, in a huge input file, c would keep getting decremented on every line and could eventually wrap around and become true again.
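To see these in action, seq makes a convenient test input; the first command below is the one-liner from the top (print the line after each match), the second is idiom e with N=2:
$ seq 10 | awk 'f{print;f=0} /7/{f=1}'
8
$ seq 10 | awk 'c&&c--;/3/{c=2}'
4
5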
It's the line after that match that you're interested in, right? In sed, that could be accomplished like so:
sed -n '/ABC/{n;p}' infile
Alternatively, grep's -A option might be what you're looking for.
-A NUM, Print NUM lines of trailing context after matching lines.
For example, given the following input file:
foo
bar
baz
bash
bongo
You could use the following:
$ grep -A 1 "bar" file
bar
baz
$ sed -n '/bar/{n;p}' file
baz
I needed to print ALL lines after the pattern ( ok Ed, REGEX ), so I settled on this one:
sed -n '/pattern/,$p' # prints all lines after ( and including ) the pattern
But since I wanted to print all the lines AFTER ( and exclude the pattern )
sed -n '/pattern/,$p' | tail -n+2 # all lines after first occurrence of pattern
I suppose in your case you can add a head -1 at the end
sed -n '/pattern/,$p' | tail -n+2 | head -1 # prints line after pattern
And I really should include tlwhitec's comment in this answer (since their sed-strict approach is more elegant than my suggestions):
sed '0,/pattern/d'
The above script deletes every line starting with the first and stopping with (and including) the line that matches the pattern. All lines after that are printed.
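For example (note that the 0,/regexp/ address form is a GNU sed extension):
$ seq 10 | sed '0,/5/d'
6
7
8
9
10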
awk Version:
awk '/regexp/ { getline; print $0; }' filetosearch
If the pattern matches, append the next line to the pattern space (N), delete everything up to and including the newline, then quit -- the side effect of quitting is that what's left gets printed.
sed '/pattern/ { N; s/.*\n//; q }; d'
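For example, with GNU sed this should print just the line after the first match:
$ seq 10 | sed '/5/ { N; s/.*\n//; q }; d'
6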
Actually sed -n '/pattern/{n;p}' filename will fail if the pattern matches consecutive lines:
$ seq 15 |sed -n '/1/{n;p}'
2
11
13
15
The expected answers should be:
2
11
12
13
14
15
My solution is:
$ sed -n -r 'x;/_/{x;p;x};x;/pattern/!s/.*//;/pattern/s/.*/_/;h' filename
For example:
$ seq 15 |sed -n -r 'x;/_/{x;p;x};x;/1/!s/.*//;/1/s/.*/_/;h'
2
11
12
13
14
15
Explanation:
x;: at the beginning of each cycle, exchange the contents of the pattern space and the hold space.
/_/{x;p;x};: if the pattern space (which at this point actually holds the previous hold space) contains _ (just an indicator telling us whether the previous line matched the pattern), use x to bring the current line back into the pattern space, p to print it, and x to undo the swap again.
x: restore the pattern space and hold space to their original contents.
/pattern/!s/.*//: if the current line does NOT match pattern, which means the NEXT line should NOT be printed, use s/.*// to empty the pattern space.
/pattern/s/.*/_/: if the current line DOES match pattern, which means the NEXT line should be printed, set the indicator by replacing the whole pattern space with _ (the second command above uses it to decide whether the previous line matched).
h: overwrite the hold space with the pattern space; the hold space now contains ^_$ if the current line matched the pattern, or ^$ if it did not.
The /pattern/!s/.*// and /pattern/s/.*/_/ steps cannot be swapped: after s/.*/_/, the pattern space would no longer match /pattern/, so the s/.*// would then (wrongly) be executed.
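For what it's worth, the short awk one-liner quoted earlier on this page copes with the consecutive-match case as well:
$ seq 15 | awk 'f{print;f=0} /1/{f=1}'
2
11
12
13
14
15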
This might work for you (GNU sed):
sed -n ':a;/regexp/{n;h;p;x;ba}' file
Use sed's grep-like option -n, and if the current line contains the required regexp, replace the current line with the next one (n), copy that line to the hold space (h), print it (p), swap the pattern space with the hold space (x) and repeat (ba).
Piping some greps can do it (it runs in POSIX shell and under BusyBox):
cat my-file | grep -A1 my-regexp | grep -v -- '--' | grep -v my-regexp
-v will show non-matching lines
a line containing just -- is printed by grep to separate groups of context, so we skip that too
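A quick sanity check with a throwaway input:
$ printf 'a\nfoo\nb\nc\nfoo\nd\n' | grep -A1 foo | grep -v -- '--' | grep -v foo
b
d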
If you just want the next line after a pattern, this sed command will work
sed -n -e '/pattern/{n;p;}'
-n suppresses output (quiet mode);
-e denotes a sed command (not required in this case);
/pattern/ is a regex search for lines containing the literal combination of the characters pattern (use /^pattern$/ for a line consisting only of "pattern");
n replaces the pattern space with the next line;
p prints;
For example:
seq 10 | sed -n -e '/5/{n;p;}'
Note that the above command will print a single line after every line containing pattern. If you just want the first one use sed -n -e '/pattern/{n;p;q;}'. This is also more efficient as the whole file is not read.
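On a longer input the difference shows; the q variant stops after the first match:
$ seq 30 | sed -n -e '/5/{n;p;}'
6
16
26
$ seq 30 | sed -n -e '/5/{n;p;q;}'
6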
This strictly sed command will print all lines after your pattern.
sed -n '/pattern/,${/pattern/!p;}'
Formatted as a sed script this would be:
/pattern/,${
/pattern/!p
}
Here’s a short example:
seq 10 | sed -n '/5/,${/5/!p;}'
/pattern/,$ will select all the lines from pattern to the end of the file.
{} groups the next set of commands (c-like block command)
/pattern/!p; prints lines that don't match pattern. Note that the ; is required in early versions of sed, and in some non-GNU seds. The ! negation effectively makes the range exclusive of its starting line - sed ranges are normally inclusive of both the start and the end.
To exclude the end of range you could do something like this:
sed -n '/pattern/,/endpattern/{/pattern/!{/endpattern/d;p;}}'
/pattern/,/endpattern/{
/pattern/!{
/endpattern/d
p
}
}
/endpattern/d deletes the pattern space and restarts the script from the top with the next line, so the p command is skipped for the end-of-range line.
Another pithy example:
seq 10 | sed -n '/5/,/8/{/5/!{/8/d;p}}'
If you have GNU sed you can add the debug switch:
seq 5 | sed -n --debug '/2/,/4/{/2/!{/4/d;p}}'
Output:
SED PROGRAM:
/2/,/4/ {
/2/! {
/4/ d
p
}
}
INPUT: 'STDIN' line 1
PATTERN: 1
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 2
PATTERN: 2
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 3
PATTERN: 3
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
COMMAND: p
3
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 4
PATTERN: 4
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
END-OF-CYCLE:
INPUT: 'STDIN' line 5
PATTERN: 5
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:
I have a sample script called sample.sh which takes three inputs X,Y and Z
>> cat sample.sh
#! /bin/bash
X=$1
Y=$2
Z=$3
file=X$1_Y$2_Z$3
echo `hostname` `date` >> ./$file
Now I can give parameters in the following way:
parallel ./sample.sh {1} {2} {3} ::: 1.0000 1.1000 ::: 2.0000 2.1000 ::: 3.0000 3.1000
Or I could do:
parallel ./sample.sh {1} {2} {3} :::: xlist ylist zlist
where xlist, ylist and zlist are files which contain the parameter list.
But what if I want to have one file called parameter.dat?
>>> cat parameter.dat
#xlist
1.0000 1.1000
#ylist
2.0000 2.1000
#zlist
3.0000 3.1000
I can use awk to read parameter.dat and produce temporary files called xlist, ylist and so on...
But is there a better way using gnu-parallel itself?
Ultimately what I am looking for is to simply add more lines of xlist,ylist and zlist to parameter.dat and use the last instance of xlist, ylist or zlist to run sample.sh with, so that I keep a record of the parameter runs I have already done in parameter.dat itself.
I am looking for an elegant way to do this.
Edit: My current solution is:
#! /bin/bash
tail -1 < parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > zlist
tail -3 < parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > ylist
tail -5 < parameter.dat | head -1 | awk '{$1=$1};1' | tr ' ' '\n' > xlist
parallel ./sample.sh {1} {2} {3} :::: xlist ylist zlist
rm xlist ylist zlist
There is no built-in way of doing what you want, and your solution is not too bad.
If you control parameter.dat and it is not too big (under ~128 KB, so it still fits on a command line), I would probably do:
$ cat parameter.dat
::: x valueX1 ValueX2
::: y valueY1 ValueY2
::: z ValueZ1 ValueZ2 ValueZ3
# On purpose there are no " quotes around $(), and the ::: markers are in parameter.dat
$ parallel --header : ./sample.sh $(cat parameter.dat)
--header : is used to ignore the first value of each line. It also means you can use {x} {y} and {z} in the command template.
This makes it easy to add another parameter, and you do not need to clean up temp files.
You are, however, restricted: Your values cannot contain space and some of the characters that have special meaning in shell (e.g. ? *). Other characters (e.g. $ ' " `) are fine.
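Because the $() is unquoted, the shell word-splits parameter.dat (newlines included), so what parallel actually sees is:
parallel --header : ./sample.sh ::: x valueX1 ValueX2 ::: y valueY1 ValueY2 ::: z ValueZ1 ValueZ2 ValueZ3
which runs ./sample.sh once for every combination of the x, y and z values (2 x 2 x 3 = 12 jobs here).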
I use egrep to output some lines with platform names:
XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$"
[30] i686-nptl-linux-gnu
[34] i686-w64-mingw32
[75] x86_64-unknown-linux-gnu
[77] x86_64-w64-mingw32
what I need is:
export PLATNUMS=30,34,75,77
How can I pipe the egrep command to sed / awk / bash script?
Try:
$ command | awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":"export PLATNUMS="),$2; f=1} END{print""}'
export PLATNUMS=30,34,75,77
How it works
-F'[][ \t]+'
Use any number of spaces, tabs, or [ or ] as field separators.
/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{...}
For the lines of interest, perform the commands in curly braces.
printf "%s%s",(f?",":"export PLATNUMS="),$2; f=1
For the lines of interest, print what we want.
The variable f marks whether this is the first line of interest.
END{print""}
After reading all lines, print a newline.
Creating a shell variable
export PLATNUMS=$(command | awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":""),$2; f=1} END{print""}')
For example, if the file input contains your data:
$ export PLATNUMS=$(awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":""),$2; f=1} END{print""}' input)
$ declare -p PLATNUMS
declare -x PLATNUMS="30,34,75,77"
For those who prefer their commands spread out over multiple lines:
export PLATNUMS=$(command | awk -F'[][ \t]+' '
/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{
printf "%s%s",(f?",":""),$2
f=1
}
END{
print""
}
')
Perhaps this way (I can't test it with your egrep):
export PLATNUMS=$(XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed ':A;s/\[\([[0-9]*\)].*/\1/;$bB;N;bA;:B;s/\n/,/g')
echo $PLATNUMS
How does this work?
Your egrep command returns multiline text, and sed reads that text line by line, like this:
sed '
:A # label A
   # with your example, on the first pass the pattern space looks like this:
   # [30] i686-nptl-linux-gnu
   # on the second pass it looks like this:
   # 30
   # [34] i686-w64-mingw32
s/\[\([[0-9]*\)].*/\1/ # keep only the digits that are enclosed in []
   # on the first pass the pattern space becomes
   # 30
   # on the second pass it becomes
   # 30
   # 34
   # and so on for each line
$bB # on the last line, jump to B
N # append the next input line to the pattern space
bA # it was not the last line, so jump back to A
:B # label B
   # at this point every line has been read
   # and the pattern space looks like this (without the #):
   # 30
   # 34
   # 75
   # 77
s/\n/,/g' # substitute every \n with a comma
   # the pattern space becomes
   # 30,34,75,77
   # so $(XXX | egrep .... | sed ...) puts 30,34,75,77 into the variable PLATNUMS
   # (as an aside, it is better not to use all-capital letters in your variable names)
With GNU sed and tr:
$ XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed -E 's,]\s+.+$,,g' | sed 's,^\[,,g' | tr '\n' ',' | sed -E 's,(^.+$),export PLATNUMS=\1,' | sed 's/,$//' && echo
I'm not sure what you want to achieve but you might want to automatically eval the output export:
$ eval $(XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed -E 's,]\s+.+$,,g' | sed 's,^\[,,g' | tr '\n' ',' | sed -E 's,(^.+$),export PLATNUMS=\1,' | sed 's/,$//' && echo)
$ echo $PLATNUMS
30,34,75,77
If you ever think you need grep+sed or 2 greps or 2 seds or any other combination then you should use 1 call to awk instead, and you never need grep or sed when you're using awk:
export PLATNUMS=$(XXX | awk -F'[][]' '/(i686-nptl-linux-gnu|i686-w64-mingw32|x86_64-unknown-linux-gnu|x86_64-w64-mingw32)$/{p=(p ? p "," : "") $2} END{print p}')
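For example, with a made-up listing file standing in for the XXX output (the [31] line is only there to show that non-matching platforms are skipped), the awk part alone should give:
$ cat listing
[30] i686-nptl-linux-gnu
[31] i686-pc-linux-gnu
[34] i686-w64-mingw32
[75] x86_64-unknown-linux-gnu
[77] x86_64-w64-mingw32
$ awk -F'[][]' '/(i686-nptl-linux-gnu|i686-w64-mingw32|x86_64-unknown-linux-gnu|x86_64-w64-mingw32)$/{p=(p ? p "," : "") $2} END{print p}' listing
30,34,75,77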
Btw in case it's useful, here's a couple of briefer regexps:
(i686-(nptl-linux-gnu|w64-mingw32)|x86_64-(unknown-linux-gnu|w64-mingw32))$
((i686-nptl|x86_64-unknown)-linux-gnu|(i686|x86_64)-w64-mingw32)$
and depending on your input data (since this will include combinations not provided by the above) you MIGHT only need:
(i686|x86_64)-(nptl|unknown|w64)-(linux-gnu|mingw32)$
I know there have been similar questions posted but I'm still having a bit of trouble getting the output I want using awk FNR==NR...
I have 2 files as such
File 1:
123|this|is|good
456|this|is|better
...
File 2:
aaa|123
bbb|456
...
So I want to join on values from file 2/column2 to file 1/column1 and output file 1 (col 2,3,4) and file 2 (col 1).
Thanks in advance.
With awk you could do something like
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1
NF is the number of fields in a record (line by default), so $NF is the last field, and $(NF + 1) is the field after that. By assigning the saved value from the pass over file2 to it, a new field is appended to the record before it is printed.
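With the sample files from the question this prints:
$ awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1
123|this|is|good|aaa
456|this|is|better|bbb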
One thing to note: This behaves like an inner join, i.e., only records are printed whose key appears in both files. To make this a right join, you can use
awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } { $(NF + 1) = val[$1]; print }' file2 file1
That is, you can drop the $1 in val condition on the append-and-print action. If $1 is not in val, val[$1] is empty, and an empty field will be appended to the record before printing.
But it's probably better to use join:
join -1 1 -2 2 -t \| file1 file2
If you don't want the key field to be part of the output, pipe the output of either of those commands through cut -d \| -f 2- to get rid of it, i.e.
join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
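With the sample data (both files happen to already be sorted on their join fields, which join requires) that gives:
$ join -1 1 -2 2 -t \| file1 file2
123|this|is|good|aaa
456|this|is|better|bbb
$ join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
this|is|good|aaa
this|is|better|bbb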
If the files have the same number of lines in the same order, then
paste -d '|' file1 file2 | cut -d '|' -f 2-5
this|is|good|aaa
this|is|better|bbb
I see in a comment to Wintermute's answer that the files aren't sorted. With bash, process substitutions are handy to sort on the fly:
paste -d '|' <(sort -t '|' -k 1,1 file1) <(sort -t '|' -k 2,2 file2) |
cut -d '|' -f 2-5
To reiterate: this solution requires a one-to-one correspondence between the files
I would like to grep a specific word 'foo' inside specific files, then get the N lines around my match and show only the blocks that contain a second grep.
I found this but it doesn't really work...
find . | grep -E '.*?\.(c|asm|mac|inc)$' | \
xargs grep --color -C3 -rie 'foo' | \
xargs -n1 --delimiter='--' | grep --color -l 'bar'
For instance I have the file 'a':
a
b
c
d
bar
f
foo
g
h
i
j
bar
l
The file b:
a
bar
c
d
e
foo
g
h
i
j
k
I expect this for grep -C2 on both files, because bar is contained in the -C2 range of foo in the first file. I do not get any match for ./bar because there bar is not in the -C2 range of foo...
--
./foo- bar
./foo- f
./foo- **foo**
./foo- g
./foo- h
--
Any ideas?
You could do this pretty simply with a "while read line" loop:
find -regextype posix-extended -regex "./file[a-z]" | while read line; do grep -nHC2 "foo" $line | grep --color bar; done
Output:
./filea-5-bar
./filec-46-... host pwns.me [94.23.120.252]: 451 4.7.1 Local bar
configuration error ...
In this example, I created the following files:
filea - your example a
fileb - your example b
filec - some random exim log output with foo and bar tossed in 2 lines apart
filed - the same exim log output, but with foo and bar tossed in 3 lines apart
You could also pipe the output after done, to alter the format:
; done | sed -E 's/-([0-9]{1,6})-/: line: \1 ::: /'
Formatted output
./filea: line: 5 ::: bar
./filec: line: 46 ::: ... host pwns.me [94.23.120.252]: 451 4.7.1 Local bar configuration error ...
I think I only understand the first line of your question and this does what I think you mean!
#!/bin/bash
N=2
pattern1=a
pattern2=z
matchinglines=$(awk -v p="$pattern1" '$0~p{print NR}' file) # Generate array of matching line numbers
for x in ${matchinglines[@]}
do
((start=x-N))
[[ $start -lt 1 ]] && start=1 # Avoid passing negative line numbers to sed
((end=x+N))
echo DEBUG: Checking block between lines $start and $end
sed -ne "${start},${end}p" file | grep -q "$pattern2"
[[ $? -eq 0 ]] && sed -ne "${start},${end}p" file
done
You need to set pattern1 and pattern2 at the start of the script. It basically does some awk to build an array of the line numbers that match your first pattern. Then it loops through the array and sets the start and end range to +/-N either side of each matching line number. It then uses sed to extract that block and passes it through grep to see if it contains pattern2, printing it if it does. It may not be the most efficient, but it is easy enough to understand and maintain.
It assumes your file is called file
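For instance, with the question's first example saved as file and the variables at the top changed to pattern1=foo and pattern2=bar, a run (the script name here is arbitrary) should look roughly like this:
$ ./blockgrep.sh
DEBUG: Checking block between lines 5 and 9
bar
f
foo
g
h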
pipe it twice
grep "[^foo\n]" | grep "\n{ntimes}foo\n{ntimes}"