Display multiple lines using multiple patterns - grep

Hope you can shed some light on one of my requirements. Let's say I have a file with the following entries:
ABC 123
XYZ 789
XYZ 456
ABC 234
XYZ 789
ABC 567
XYZ 789
XYZ 678
XYZ 123
Basically, I have ABC rows, each followed by some number of XYZ rows. The number of XYZ records per ABC varies from 1 to many.
I need a shell script that will output the ABC and the corresponding XYZ based on the patterns in the 2nd column.
For example, display the ABC record with pattern 567 and the corresponding XYZ record with pattern 678.
The output should only be:
ABC 567
XYZ 678

To solve this, I use awk to massage the data into a single line, then grep on that output, then sed to revert matching entries to the original format.
awk '{ printf ($1 == "ABC" ? "\n" : " #¶# ") $0 }' file | grep 567 | sed 's/ #¶# /\n/g'
Code walk:
I used #¶# as a delimiter. Use something that won't have conflicts in your data (otherwise you'll have to deal with escaping it). Also note that your UTF8 support mileage may vary.
awk prints, without trailing line break, two things concatenated:
If we're on an ABC line, a line break (\n). Otherwise, the delimiter (#¶#).
Then the existing line ($0)
grep then runs for your query. This lets you use -f FILE_OF_PATTERNS or a collection of -e PATTERNs
sed then reverts the delimiters back to the original format
This has the advantage of going line by line. If you have tens of thousands of XYZs in a single ABC, it'll be a bit slower, but this doesn't keep anything in memory, so this should be pretty scalable.
Here is the output of the above awk command (yes, there is a leading blank line, which doesn't matter):
$ awk '{ printf ($1 == "ABC" ? "\n" : " #¶# ") $0 }' file
ABC 123 #¶# XYZ 789 #¶# XYZ 456
ABC 234 #¶# XYZ 789
ABC 567 #¶# XYZ 789 #¶# XYZ 678 #¶# XYZ 123
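Putting the three stages together on the sample input, here is a minimal runnable sketch (the file name `file` and the query `567` come from the example above; the `\n` in the sed replacement assumes GNU sed):

```shell
# Build the sample input from the question.
cat > file <<'EOF'
ABC 123
XYZ 789
XYZ 456
ABC 234
XYZ 789
ABC 567
XYZ 789
XYZ 678
XYZ 123
EOF

# awk joins each ABC group onto one line, grep picks the group
# containing 567, and sed restores the original line breaks.
joined=$(awk '{ printf ($1 == "ABC" ? "\n" : " #¶# ") $0 }' file \
  | grep 567 | sed 's/ #¶# /\n/g')
printf '%s\n' "$joined"
```

Note that this prints the whole ABC 567 group (all four of its lines); pipe through a second grep if you only want specific XYZ rows from the group.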

Try this; I hope I understood your requirement right:
awk -v p1='ABC 567' -v p2='XYZ 678' \
    '$0~p1{t=1;print;next}/^ABC/{t=0}$0~p2&&t' file
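Run against the same sample file, this one-liner prints exactly the two requested lines. A runnable sketch:

```shell
# Sample input from the question.
cat > file <<'EOF'
ABC 123
XYZ 789
XYZ 456
ABC 234
XYZ 789
ABC 567
XYZ 789
XYZ 678
XYZ 123
EOF

# t is a flag: set when a line matches p1, cleared at the next ABC line,
# so only XYZ lines belonging to the matched ABC group can match p2.
matched=$(awk -v p1='ABC 567' -v p2='XYZ 678' \
  '$0~p1{t=1; print; next} /^ABC/{t=0} $0~p2 && t' file)
printf '%s\n' "$matched"
```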

Related

How to match the beginning of a line with patterns from id file?

I want to use my id_file to search my big_file, extracting the lines of big_file that start with one of the IDs.
I'm a beginner and I'm struggling with grep (BSD grep 2.5.1-FreeBSD) and with understanding the solutions cited below.
My id_file contains the IDs:
67b
84D
118
136
166
My big_file looks something like this:
118 ABL1_BCR
118 AC005258
166 HSP90AB1
166 IKZF2_SP
166 IL1RAP_D
136 ABL1_BCR
136 ABL1_BCR
555 BCR_136
555 BCR_136
555 BCR_136
59 UNC45B_M 166
59 WASF2_GN 166
59 YPEL5_CX 166
As suggested by Chris Seymour here
Try 1: I used
grep -wFf id_file big_file
That didn't work, obviously, as the numbers also occur elsewhere in the lines of big_file.
Try 2: I modified the id_file;
^67b
^84D
^118
^136
^166
And ran grep -wFf id_file big_file again.
Of course, that didn't work either (with -F, grep treats each pattern as a fixed string, so the ^ is matched literally rather than as an anchor).
I looked at batimar's take here but I'm failing to implement the suggestion.
A better approach is to take only some of the patterns from a pattern file, and use those patterns to search your file:
grep '^PAT' patterns.txt | grep -f - myfile
This takes all patterns from patterns.txt that start with PAT, and feeds them to the second grep to search in myfile.
I tried to reproduce the code above with my example in several ways but apparently I just don't get what they mean there as none of it worked.
My tinkering had two outcomes: either No such file or directory, or no output at all.
Is there even a way to do this with grep only?
I'd greatly appreciate if anyone was able to break it down for me.
This seems to be an issue with BSD grep. See
https://unix.stackexchange.com/questions/352977/why-does-this-bsd-grep-result-differ-from-gnu-grep for similar issues.
You can use awk as an alternative (there's probably a duplicate somewhere with this exact solution):
awk 'NR==FNR{a[$1]; next} $1 in a' id_file large_file
NR==FNR{a[$1]; next} builds an associative array with the first field of id_file as keys
$1 in a is true if the first field of a line from large_file matches any key in array a. If so, the entire line is printed (awk's default action).
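A minimal runnable sketch of that awk solution, using the files from the question (the original un-anchored id_file, without the ^ prefixes):

```shell
cat > id_file <<'EOF'
67b
84D
118
136
166
EOF
cat > big_file <<'EOF'
118 ABL1_BCR
118 AC005258
166 HSP90AB1
166 IKZF2_SP
166 IL1RAP_D
136 ABL1_BCR
136 ABL1_BCR
555 BCR_136
555 BCR_136
555 BCR_136
59 UNC45B_M 166
59 WASF2_GN 166
59 YPEL5_CX 166
EOF

# Only lines whose *first* field is one of the ids are printed; the
# trailing 166s on the "59 ..." lines no longer cause false matches.
filtered=$(awk 'NR==FNR{a[$1]; next} $1 in a' id_file big_file)
printf '%s\n' "$filtered"
```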
Using the id_file as described in the OP "Try 2"
^67b
^84D
^118
^136
^166
Then try this:
fname="id_file"; lines=$(cat "$fname"); for line in $lines; do grep "$line" big_file >> filtered_output; done

Prepend match number to output match lines of grep?

Say I have this file, test.log:
blabla test test
20 30 40
hello world
100 100
34 506 795
blabla test2
50 60 70
hello
10 10
200 200
blabla test BB
30 40 50
100 100
20 20 20 20
I would like to print all lines containing blabla, plus the line after each one, with the match number prepended.
Without match number, it is easy:
$ grep -A1 "blabla" test.log
blabla test test
20 30 40
--
blabla test2
50 60 70
--
blabla test BB
30 40 50
With a prepended match number, it would look like this:
1: blabla test test
1: 20 30 40
--
2: blabla test2
2: 50 60 70
--
3: blabla test BB
3: 30 40 50
The tricky part is, I want to preserve the match number, regardless if I just grep for a single line match, or with context (X lines after or before the match).
Is there an easy way to do this? If I could do a format specifier for the number, as in %03d, even better - but just a usual number would be fine too...
Something like:
grep -A1 blabla test.log | awk -v n=1 '$0 == "--" { n += 1; print; next }
{ printf("%03d: %s\n", n, $0) }'
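As a quick check, here is that pipeline run against the test.log from the question (the %03d gives the zero-padded numbers the OP asked about):

```shell
cat > test.log <<'EOF'
blabla test test
20 30 40
hello world
100 100
34 506 795
blabla test2
50 60 70
hello
10 10
200 200
blabla test BB
30 40 50
100 100
20 20 20 20
EOF

# grep -A1 emits "--" between non-adjacent match groups; awk bumps the
# counter on each separator and prefixes every other line with it.
numbered=$(grep -A1 blabla test.log | awk -v n=1 '$0 == "--" { n += 1; print; next }
  { printf("%03d: %s\n", n, $0) }')
printf '%s\n' "$numbered"
```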
Perl to the rescue!
perl -ne '/blabla/ and print ++$i, ":$_" and print "$i:", scalar <>' -- file
-n reads the input line by line
each line is read into the special variable $_
the diamond operator <> reads the next line from the input file
scalar makes it read just one line, not all the remaining ones
the variable $i is incremented each time blabla is encountered and is prepended to each output line.
Your specification doesn't handle the case when two blablas are present on adjacent lines.
To format the numbers, use sprintf:
perl -ne 'if (/blabla/) { $f = sprintf "%03d", ++$i; print $f, ":$_"; print "$f:", scalar <>}'

How to remove a word that matches at the beginning of each line

I would like to ask how I could remove lines containing the pattern AAA at their beginning.
example:
contents of file.txt:
AAA/bb/cc/d/d/d/d/e
AAA/dd/r/t/e/q/e/tg
AAA/uu/y/t/r/e/w/q
123 234 456 AAA/f/f/f/f/g/g
555 999 000 AAA/y/g/h/u/j/k
I would like to remove the first three lines with this type of pattern but would like to keep the last two lines.
The output of the command should be:
123 234 456 AAA/f/f/f/f/g/g
555 999 000 AAA/y/g/h/u/j/k
How could I do it with a unix command?
Thank you.
sed '/^AAA/d' file.txt
The /^AAA/ is a regular expression which matches AAA at the beginning of a line (^). d deletes the selected lines.
man sed for more information on the sed stream editor.
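For completeness, a runnable sketch on the example data (note the single quotes around the sed expression, which matter in most shells):

```shell
cat > file.txt <<'EOF'
AAA/bb/cc/d/d/d/d/e
AAA/dd/r/t/e/q/e/tg
AAA/uu/y/t/r/e/w/q
123 234 456 AAA/f/f/f/f/g/g
555 999 000 AAA/y/g/h/u/j/k
EOF

# /^AAA/ selects only lines that *start* with AAA; d deletes them.
kept=$(sed '/^AAA/d' file.txt)
printf '%s\n' "$kept"
```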

Why cut cannot work?

So basically I want to print out certain columns of the .data, .rodata and .bss sections of an ELF binary, and I use this command:
readelf -S hello | grep "data\|bss" | cut -f1,2,5,6
but to my surprise, the results are:
[15] .rodata PROGBITS 080484d8 0004d8 000020 00 A 0 0 8
[24] .data PROGBITS 0804a00c 00100c 000008 00 WA 0 0 4
[25] .bss NOBITS 0804a014 001014 000008 00 WA 0 0 4
which means the cut didn't work...
I don't know why and after some search online, I still don't know how to make it right, could anyone give me some help?
I would have used awk here since you can do all with one command.
readelf -S hello | awk '/data|bss/ {print $1,$2,$5,$6}'
awk treats any run of blanks as a separator: one space, multiple spaces, tabs, etc.
Your input is actually delimited by spaces, not tabs. By default, cut expects tabs. This should work if the fields are separated by single spaces:
cut -d ' ' -f1,2,5,6
It specifies the delimiter as ' ' (space). Beware that cut treats every single space as a field boundary, so runs of consecutive spaces produce empty fields; for whitespace-aligned output like readelf's, the awk approach above is more robust.
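To see the difference, here is a sketch on a simulated slice of readelf -S output (the section values are taken from the question; real readelf output is aligned with runs of spaces, which is exactly what trips up cut's single-character delimiter):

```shell
cat > sections.txt <<'EOF'
  [15] .rodata           PROGBITS        080484d8 0004d8 000020 00   A  0   0  8
  [24] .data             PROGBITS        0804a00c 00100c 000008 00  WA  0   0  4
  [25] .bss              NOBITS          0804a014 001014 000008 00  WA  0   0  4
EOF

# awk splits on any run of blanks, so $1,$2,$5,$6 are the intended columns.
cols=$(awk '/data|bss/ {print $1,$2,$5,$6}' sections.txt)
printf '%s\n' "$cols"
```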

Need to grep for first occurrences of multiple strings

I am attempting to return the first occurrence of multiple strings; i.e., I want to select the lines from the following text where the first occurrences of 1259, 3009, and 1589 happen.
ADWN 1259 11:00 B23
ADWN 3009 12:00 B19
DDWN 723 11:30 B04
ADWN 1589 14:20 B12
ADWN 1259 11:10 B23
DDWN 2534 13:00 B16
ADWN 3009 11:50 B14
This gives me all matches:
grep '1259\|3009\|1589' somelog.log
And this gives me only the first match
grep -m 1 '1259\|3009\|1589' somelog.log
I want to return the following:
ADWN 1259 11:00 B23
ADWN 3009 12:00 B19
ADWN 1589 14:20 B12
I think that creating a file with the required values, and then looping through the file, passing each number individually into the grep command will give me what I am looking for, but I haven't found an example of this. Is there a simple solution for this, is a loop the best way to handle this, or has this example already been answered elsewhere?
Thanks in advance for your ideas and suggestions--
Clyde
One way using awk:
awk '!array[$2]++ && $2 ~ /^1259$|^3009$|^1589$/' file.txt
Results:
ADWN 1259 11:00 B23
ADWN 3009 12:00 B19
ADWN 1589 14:20 B12
edit:
I should really get into the habit of reading the whole question first. I see that you're thinking of creating a file with the values you'd like to find the first occurrence of. Put these in a file called values.txt with one value per line. For example; here's the contents of values.txt:
1259
3009
1589
Then run this:
awk 'FNR==NR { array[$0]++; next } $2 in array { print; delete array[$2] }' values.txt file.txt
Results:
ADWN 1259 11:00 B23
ADWN 3009 12:00 B19
ADWN 1589 14:20 B12
1st command explanation:
If the second column ($2) matches one of the three listed values, the line is printed, but only on the first occurrence of that value: !array[$2]++ is true only the first time a given $2 is seen. awk prints the whole line by default.
2nd command explanation:
FNR is number of records relative to the current input file.
NR is the total number of records.
The FNR==NR { ... } construct is only true for the first input file. So for each of the lines in values.txt, we add the whole line ($0) to an array (I've called it array, but you could give it another name). next forces awk to read the next line of values.txt and skip the rest of the program. Once FNR==NR is no longer true, the second file in the argument list is read. We then check whether the second column ($2) is in the array; if it is, we print the line and remove that value from the array. By using delete we essentially set a maximum count of one.
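A runnable sketch of the two-file version, with the data from the question:

```shell
cat > values.txt <<'EOF'
1259
3009
1589
EOF
cat > file.txt <<'EOF'
ADWN 1259 11:00 B23
ADWN 3009 12:00 B19
DDWN 723 11:30 B04
ADWN 1589 14:20 B12
ADWN 1259 11:10 B23
DDWN 2534 13:00 B16
ADWN 3009 11:50 B14
EOF

# Each value is deleted from the array after its first hit,
# so repeats (like the second 1259) are skipped.
firsts=$(awk 'FNR==NR { array[$0]++; next } $2 in array { print; delete array[$2] }' values.txt file.txt)
printf '%s\n' "$firsts"
```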
Try this. It might not work depending on your grep version:
grep -m 1 -e pattern1 -e pattern2
You can use a loop (see Linux Shell Script For Each File in a Directory Grab the filename and execute a program): for each pattern you want to match, run a separate grep, concatenating to the output file.
This one works too, although it prints the first occurrence of every value in the second column, not just the three listed:
for i in $(cut -d " " -f2 somelog.log | sort -u); do LC_ALL=C fgrep -m1 "$i" somelog.log; done
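If you want to stay with grep alone, one grep -m1 per value also works, since -m 1 stops each invocation at its own first match. A sketch with the values from the question hard-coded in the loop:

```shell
cat > somelog.log <<'EOF'
ADWN 1259 11:00 B23
ADWN 3009 12:00 B19
DDWN 723 11:30 B04
ADWN 1589 14:20 B12
ADWN 1259 11:10 B23
DDWN 2534 13:00 B16
ADWN 3009 11:50 B14
EOF

# One pass per value; each grep exits after the first matching line.
hits=$(for i in 1259 3009 1589; do grep -m1 "$i" somelog.log; done)
printf '%s\n' "$hits"
```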
