I have a text file that contains text blocks roughly formatted like this:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
Beginning of block
...
... etc.
The blocks can have any number of lines but always start and end with the two delimiters. What I'd like to do is match "some_pattern" and print the whole block to stdout. With the example above, I would get only this:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
I've tried with something like this but without success:
grep "Beginning of block\n.*some_pattern.*\n.*End of block"
Any idea how to do this with grep? (or maybe with some other tool)
I guess awk is better for this:
awk '/Beginning of block/ {p=1; n=0; f=0};   # new block: start with an empty buffer
p {a[++n]=$0};                               # store lines in input order
/some_pattern/ {f=1};
/End of block/ {if (p && f) for (i=1; i<=n; i++) print a[i]; p=0; f=0; delete a}' file
Explanation
It buffers each block while the p flag is "active" and prints it only if some_pattern was matched inside it:
When it finds Beginning of block, it sets p=1 and starts storing the lines in the array a[].
If it finds some_pattern, it sets the flag f to 1, so that we know the pattern has been found.
When it finds End of block it resets p=0. If some_pattern had been found since the last Beginning of block, all the lines that had been stored are printed. Finally a[] is cleared and f is reset; we will have a fresh start when we again encounter Beginning of block.
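If the pattern has to come from a shell variable, the same buffering idea can be wrapped in a function. This is only a sketch: the function name print_matching_blocks and the awk variables inblock, buf, found and pat are made up for illustration, and the two delimiter strings are simply the ones from the question.

```shell
# Hypothetical helper: print every block containing a match for regexp $1.
# Usage: print_matching_blocks REGEXP FILE
print_matching_blocks() {
    awk -v pat="$1" '
        /Beginning of block/ { inblock = 1; n = 0; found = 0 }  # fresh buffer
        inblock              { buf[++n] = $0 }                   # keep input order
        inblock && $0 ~ pat  { found = 1 }                       # remember the match
        /End of block/ {
            if (inblock && found)
                for (i = 1; i <= n; i++) print buf[i]
            inblock = 0
        }
    ' "$2"
}
```

Called as print_matching_blocks 'some_pattern' file. Note that awk processes backslash escapes in values passed with -v, so a regexp containing backslashes may need them doubled.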
Other test
$ cat a
Beginning of block
blabla
.........some_pattern.......
and here i am
hello
End of block
Beginning of block
...
... etc.
End of block
$ awk '/Beginning of block/ {p=1; n=0; f=0}; p {a[++n]=$0}; /some_pattern/ {f=1}; /End of block/ {if (p && f) for (i=1; i<=n; i++) print a[i]; p=0; f=0; delete a}' a
Beginning of block
blabla
.........some_pattern.......
and here i am
hello
End of block
The following might work for you:
sed -n '/Beginning of block/!b;:a;/End of block/!{$!{N;ba}};{/some_pattern/p}' filename
Not sure if I missed something but here is a simpler variation of one of the answers above:
awk '/Beginning of block/ {p=1};
/End of block/ {p=0; print $0};
{if (p==1) print $0}'
You need to print the input line in the End of Block case to get both delimiters.
I wanted a slight variation that doesn't print the delimiters. In the OP's question the delimiter pattern is simple and unique. Then the simplest is to pipe into | grep -v block. My case was more irregular, so I used the variation below. Notice the next statement so the opening block isn't printed by the third statement:
awk '/Beginning of block/ {p=1; next};
/End of block/ {p=0};
{if (p==1) print $0}'
Here's one way using awk:
awk '/Beginning of block/ { r=""; f=1 } f { r = (r ? r ORS : "") $0 } /End of block/ { if (f && r ~ /some_pattern/) print r; f=0 }' file
Results:
Beginning of block
...
...
...
.........some_pattern.......
...
...
End of block
sed -n "
/Beginning of block/,/End of block/ {
H
/Beginning of block/ h
/End of block/ {
g
/some_pattern/ p
}
}" file
sed is efficient for such a treatment
With grep alone, you would have to go through an intermediate file or array.
I have this text file:
# cat letter.txt
this
is
just
a
test
to
check
if
grep
works
The letter "e" appears in 3 words.
# grep e letter.txt
test
check
grep
Is there any way to return the letter immediately to the left of the matched character?
expected.txt
t
h
r
With shown samples in awk, could you please try following.
awk '/e/{print substr($0,index($0,"e")-1,1)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/e/{ ##Looking if current line has e in it then do following.
print substr($0,index($0,"e")-1,1)
##Printing sub string from starting value of index e-1 and print 1 character from there.
}
' Input_file ##Mentioning Input_file name here.
You can use a positive lookahead (this needs a grep with -P support, such as GNU grep) to match a character that is followed by an e, without making the e part of the match.
grep -oP '.(?=e)' letter.txt
With sed:
sed -nE 's/.*(.)e.*/\1/p' letter.txt
Assuming you have this input file:
cat file
this
is
just
a
test
to
check
if
grep
works
egg
element
You may use this grep + sed solution to find letter or empty string before e:
grep -oE '(^|.)e' file | sed 's/.$//'
t
h
r
l
m
Or alternatively this single awk command should also work:
awk -F 'e' 'NF > 1 {
for (i=1; i<NF; i++) print substr($i, length($i), 1)
}' file
This might work for you (GNU sed):
sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' file
Turn off implicit printing and enable extended regexp -nE.
Focus only on lines that meet the requirements i.e. contain a character before e.
Surround the required character by newlines.
Remove any characters before and including the first newline.
Print the first line (up to the second newline).
Delete the first line (including the newline).
Repeat.
N.B. The solution will print each such character on a separate line.
To print all such characters from one input line together on a single line, use:
sed -nE '/(.e)/{s//\n\1/g;s/^/e/;s/e[^\n]*\n?//g;s/\B/ /g;p}' file
N.B. Remove the s/\B/ /g if space separation is not needed.
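For comparison, the step-by-step loop of the first sed command (find a character followed by e, print it, continue with the rest of the line) can be sketched in awk with match(). The function name chars_before_e is made up for illustration. Like the sed version, it resumes after the matched e, so an e that directly follows another matched e is not reported.

```shell
# Hypothetical helper: print each character that is directly followed by "e",
# one per output line. Reads the named files, or stdin if none are given.
chars_before_e() {
    awk '{
        line = $0
        # match() sets RSTART to the start of the first "<char>e" pair
        while (match(line, /.e/)) {
            print substr(line, RSTART, 1)     # the character before the e
            line = substr(line, RSTART + 2)   # continue after the matched e
        }
    }' "$@"
}
```

For example, printf '%s\n' this test element | chars_before_e prints t, l and m on separate lines.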
With GNU awk you can use empty string as FS to split the input as individual characters:
awk -v FS= '/[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file
t
h
r
Excluding "e" at the beginning in the for loop.
Edit: to print an empty string when e is the first character of the word:
For example, this input:
cat file2
grep
erroneously
egg
Wednesday
effectively
awk -v FS= '/^[e]/ {print ""} /[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file2
r
n
W
n
f
v
I have some files, and I want grep to return the lines that contain at least one Position:"Engineer" AND at least one Position whose value is not "Engineer".
So in the below file should return only first line:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only first line), but the thing is I don't know what are all of the possible values in Position, so the grep needs to be generic something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this doesn't return anything (as both grep contradict each other)
Do you have any idea how this can be done?
This line works :
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expression, with "^", matches Position:"Engineer" only at the beginning of the line; the second expression, with a leading space, removes the lines whose second Position is also "Engineer".
You can avoid the pipe and additional subshell by using awk if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Above just checks if the first field contains Engineer and if so checks if field 3 also contains Engineer, and if so skips the record, if not prints it. The second rule, just swaps the order of the tests. The result of the tests is that Engineer can only appear in one of the fields (either first or third, but not both)
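If the Position fields are not always in columns 1 and 3, the same test can be written as a loop over all fields. This is a sketch only: the function name mixed_positions is made up, and it assumes fields are separated by whitespace and that the quoted values contain no spaces, as in the sample.

```shell
# Hypothetical helper: print lines that have at least one Position:"Engineer"
# and at least one Position field with any other value.
mixed_positions() {
    awk '{
        eng = other = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "Position:\"Engineer\"") eng = 1    # exact Engineer field
            else if ($i ~ /^Position:"/)       other = 1  # any other Position
        }
        if (eng && other) print
    }' "$@"
}
```

Run as mixed_positions file; on the two sample lines it prints only the first.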
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Use a negative lookahead (this needs a grep with -P support) to require a Position that is not "Engineer":
grep 'Position:"Engineer"' filename | grep -P 'Position:"(?!Engineer")'
With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In the general case, if you want to print the lines in which no Position value is repeated,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This will work also when there are more than two Position fields in the line.
Single entry has multiple lines. Each entry is separated by two blank lines.
Each entry has to be made into a single line followed by a delimiter(;).
Sample Input:
Name:Sid
ID:123
Name:Jai
ID:234
Name:Arun
ID:12
Tried replacing the blank lines with cat test.cap | tr -s [:space:] ';'
Output:
Name:Sid;ID:123;Name:Jai;ID:234;Name:Arun;ID:12;
Expected Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
Same is the case with Xargs.
I also tried sed, but it only joined two lines into one, whereas I have 132 lines per entry and 1000 such entries in one file.
You may use
cat file | awk 'BEGIN { FS = "\n"; RS = "\n\n"; ORS=";" } { gsub(/\n/, "", $0); print }' | sed 's/;;*$//' > output.file
Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12
Notes:
FS = "\n" will set the field separator to a newline
RS = "\n\n" will set your record separators to double newline
gsub(/\n/, "", $0) will remove all newlines from a found record
sed 's/;;*$//' will remove the trailing ; added by awk
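A possible variation, sketched under the assumption that the separating lines are completely empty (no spaces or tabs): setting RS to the empty string puts awk in paragraph mode, where any run of blank lines separates records, so it does not matter whether entries are one or two blank lines apart. The function name join_entries is made up for illustration.

```shell
# Hypothetical helper: join the lines of each blank-line-separated entry and
# terminate every entry with ";". Reads the named files, or stdin if none.
join_entries() {
    awk 'BEGIN { RS = ""; FS = "\n"; ORS = ";" }   # RS="" is paragraph mode
         { gsub(/\n/, ""); print }' "$@"           # drop newlines inside a record
}
```

This keeps the trailing ; after the last entry and writes no final newline, so pipe it through the sed command above if the trailing ; is unwanted.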
Could you please try following.
awk 'NF{val=(val?$0~/^ID/?val $0";":val $0:$0)} END{print val}' Input_file
Output will be as follows.
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
Explanation: Adding explanation of above code too now.
awk ' ##Starting awk program here.
NF{ ##Checking condition if a LINE is NOT NULL and having some value in it.
val=(val?$0~/^ID/?val $0";":val $0:$0) ##Creating a variable val here whose value is concatenating its own value along with check if a line starts with string ID then add a semi colon at last else no need to add it then.
}
END{ ##Starting END section of awk here.
print val ##Printing value of variable val here.
}
' Input_file ##Mentioning Input_file name here.
This might work for you (GNU sed):
sed -r '/./{N;s/\n//;H};$!d;x;s/.//;s/\n|$/;/g' file
If it is not a blank line, append the following line and remove the newline between them. Append the result to the hold space and if it is not the end of the file, delete the current line. At the end of the file, swap to the hold space, remove the first character (which will be a newline) and then replace all newlines (append an extra semi-colon for the last line only) with semi-colons.
I have this file.txt:
a b c
a f g
e h j
I wrote an awk script that does
BEGIN {...}
{...}
END {
a = "a"
b = "b"
system("grep " a " file.txt | grep " b " > t")
}
I expect it to print a b c in file t. Running the same script from ConEmu on Windows 7 will produce an empty t file. On the other hand, executing grep a file.txt | grep b > t will produce the expected result.
Why I am doing this: I'm parsing obscure data from a complicated file with multiple field separators (or nested fields, if you prefer). Once I have the structure of the input file (which is not file.txt), and since each line is a command that will be executed and their order matters to me, I would like to know whether something is set to something else and whether that is a condition for introducing a new command into the input file using awk. The condition file database is file.txt.
Why is that? Am I doing something wrong? Am I blind somehow? On Windows 7, gawk 4.2.0, (GNU) grep 2.4.2
I'm also unable to locate similar questions that might help: if you know any, do flag this as duplicate.
Since my goal was to achieve redirection of grep output to file from an awk script, and since the system() way did not work, I found this morning that:
BEGIN {...}
{...}
END {
a = "a"
b = "b"
ask = "grep " a " file.txt | grep " b
while ((ask | getline _foo) > 0) {
print _foo > "dump.txt"
}
close(ask)
}
works just fine. As a bonus, I can do stuff with each line returned by grep. Again, this is working in Windows 7 using gawk 4.2.0 and ConEmu.
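If the only job of the two greps is to keep lines matching both a and b, a sketch that avoids spawning grep at all is to read the file with getline inside awk. The function name filter_both and the variables in_file, out_file and line are made up for illustration; a and b are treated as regexps, as grep would treat them.

```shell
# Hypothetical helper: write the lines of INFILE that match both regexps A and B
# to OUTFILE, using only awk. Usage: filter_both A B INFILE OUTFILE
filter_both() {
    awk -v a="$1" -v b="$2" -v in_file="$3" -v out_file="$4" 'BEGIN {
        # getline returns 1 per line read, 0 at end of file, -1 on error
        while ((getline line < in_file) > 0)
            if (line ~ a && line ~ b)
                print line > out_file
        close(in_file)
        close(out_file)
    }'
}
```

With the file.txt from the question, filter_both a b file.txt t leaves the single line a b c in t. If nothing matches, t is not created.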
I know that if I have a file of patterns I can use
grep -f pat_file search_file
to search the file normally. How would you approach performing this task so that the command looks for each pattern only once?
I'm looking for efficiency, so it might be that simply writing a python program is the most efficient way to do it, but I bet there's something out there.
I would do this in awk:
FNR == NR { pattern[NR] = $0; next }
{
for (i in pattern) {
if ($0 ~ pattern[i]) {
print
delete pattern[i]
break
}
}
}
To be called as follows:
awk -f script.awk patterns infile
where patterns contains your patterns and infile is the file you want to search.
The first command reads the patterns into an array; the second command (only executed for files after the first file) loops over the patterns, prints matching lines, deletes the pattern from the array and skips the rest of the patterns.
For an example input of
line with pattern1
another line with pattern1
line with pattern2
pattern1 again
pattern3 now
and pattern2
and a pattern file
pattern1
pattern2
pattern3
the output is
$ awk -f script.awk patterns infile
line with pattern1
line with pattern2
pattern3 now
To optimize, you could add a check after the delete statement to see if there are any patterns left and exit if not.
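One way that early exit might look, as a sketch: count the patterns while reading them and stop as soon as the counter reaches zero. The function name first_match_per_pattern and the counter remaining are made up for illustration, and the pattern file is assumed to be non-empty.

```shell
# Hypothetical helper: print the first line matching each regexp in PATTERN_FILE,
# and stop reading once every pattern has been used.
# Usage: first_match_per_pattern PATTERN_FILE SEARCH_FILE
first_match_per_pattern() {
    awk '
        FNR == NR { pattern[NR] = $0; remaining++; next }  # first file: patterns
        {
            for (i in pattern) {
                if ($0 ~ pattern[i]) {
                    print
                    delete pattern[i]              # use each pattern only once
                    if (--remaining == 0) exit     # nothing left to look for
                    break                          # skip the other patterns
                }
            }
        }
    ' "$1" "$2"
}
```

On the example above it prints the same three lines and never reads the last line of infile.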
This MAY be what you're looking for:
awk '
NR==FNR { regexps[$0]; next }
{
found = 0
for (regexp in regexps) {
if ($0 ~ regexp) {
found = 1
delete regexps[regexp]
}
}
}
found
' pat_file search_file
but since you haven't provided any testable sample input and expected output it's just an untested guess.
By the way - never use the word "pattern" to describe what type of matching you want as it's ambiguous, use "string" or "regexp", whichever you really mean.