Grep Entire File For Strings, Not Line by Line - grep

I want to search for files that contain both 'even:suspendcount>0' AND 'even:holdcount>0'. These two strings must appear somewhere in the file, not necessarily on the same line. The problem I am running into is that my search is not returning files that contain one string on, say, line 5 and the other on line 10; it only returns files where both strings are on the same line. How would I search for files that contain multiple strings of text anywhere in the file, not necessarily on the same line?

Using grep
To use grep to get files that have both strings in either order:
grep -lZ 'even:suspendcount>0' * | xargs --null grep -l 'even:holdcount>0'
How it works:
grep -lZ 'even:suspendcount>0' *
This returns a nul-separated list of the names of files which contain the string even:suspendcount>0.
xargs --null grep -l 'even:holdcount>0'
Of the files selected by the first step, this returns the names of those which also contain even:holdcount>0.
Because we are using nul-separation when passing the file names from one process to the next, this approach is safe even for difficult file names.
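The same idea chains for any number of required strings: keep -Z on every grep except the last so the file names stay NUL-separated between stages. A sketch, with a hypothetical third string as a placeholder:
grep -lZ 'even:suspendcount>0' * \
  | xargs --null grep -lZ 'even:holdcount>0' \
  | xargs --null grep -l 'third string here'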
Using awk
This prints the file name of any file that contains both strings:
awk 'BEGINFILE{f=0;g=0} /even:suspendcount>0/{f=1} /even:holdcount>0/{g=1} f && g{print FILENAME; nextfile}' *
How it works:
BEGINFILE{f=0;g=0}
As we start reading a new file, variables f and g are set to zero (false).
/even:suspendcount>0/{f=1}
If we encounter a line containing even:suspendcount>0, then set variable f to 1.
/even:holdcount>0/{g=1}
Similarly, if we encounter a line containing even:holdcount>0, then set variable g to 1.
f && g{print FILENAME; nextfile}
If both f and g are true (nonzero), then print the filename and skip to the next file.
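Note that BEGINFILE is a GNU awk extension. With other awks, the same reset can be done when the first line of each file is read; a sketch of that variant (it still relies on nextfile):
awk 'FNR==1{f=0;g=0} /even:suspendcount>0/{f=1} /even:holdcount>0/{g=1} f && g{print FILENAME; nextfile}' *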

grep matches patterns line by line, so within a single line the most your pattern can express is 'even:suspendcount>0' OR 'even:holdcount>0' (namely grep -E 'even:(suspend|hold)count>0'); a single grep invocation cannot require both strings on different lines.

Related

Grepping twice using result of first Grep in Large file

I am given a list of IDs which I need to trace back to names in a file.
The ID file contains:
1
2
3
4
5
6
The IDs are contained in a large 2 GB file called result.txt:
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
So I cat the ID file into a variable.
I then use this variable in a loop to grep out the values that link back to the name, using grep and cut -d on result.txt, and output to a variable,
so the variable contains ABC CDE FG1.
In the same loop I pass the output of that grep to a second grep on result.txt, to get the name,
i.e. re-grep the file for ABC CDE FG1.
I do get the answer, but it takes a long time. Is there a more efficient way?
Thanks
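The loop being described is presumably something like this (a reconstruction from the description, not the actual script; the field handling is a guess):
ids=`cat ID`
for id in $ids
do
    key=`grep "id=$id" result.txt | awk -F, '{print $NF}'`   # e.g. ABC
    grep "^$key=" result.txt | cut -d= -f2 | cut -d, -f1     # e.g. John
done
Each pass through the loop greps the 2 GB file twice, which is why it takes so long.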
Making some assumptions about your requirement: IDs that are not found in the big file will not be shown in the output, and the desired output is in the format shown below.
Here are mock input files - f1 for the id's and f2 for the large file:
[mathguy@localhost test]$ cat f1
1
2
3
4
5
6
[mathguy@localhost test]$ cat f2
ABC=John,dhds,72828,73737,3939,92929
CDE=John,uubad,32424,ajdaio,343533
FG1=Peter,iasisaio,097282,iosoido
WER=Ann,97391279,89719379,7391739
result,**id=1**,iuhdihdio,ihwoihdoih,iuqhwiuh,ABC
result2,**id=2**,9729179,hdqihi,hidqi,82828,CDE
result3,**id=3**,biasi,8u9829,90u209w,jswjso,FG1
Proposed solution and output:
[mathguy@localhost test]$ sed 's/.*/\*\*id=&\*\*/' f1 | grep -Ff - f2 | \
> sed -E 's/^.*\*\*id=([[:digit:]]*)\*\*.*,([^,]*)$/\1 \2/'
1 ABC
2 CDE
3 FG1
The hard work here is done by grep -F which might be just fast enough for your needs. There is some prep work and some clean-up work done by sed, but those are both on small datasets.
First we take the IDs from the input file and output strings in the format **id=<number>**. These are presented as fixed-string patterns to grep -F via the option -f (take the patterns from a file, in this case from stdin, given as -; that is, from the output of sed).
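For reference, this is what that prep step produces (the fixed strings handed to grep -F):
[mathguy@localhost test]$ sed 's/.*/\*\*id=&\*\*/' f1
**id=1**
**id=2**
**id=3**
**id=4**
**id=5**
**id=6**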
After we find the needed lines from the big file, the final sed just extracts the id and the name from each line.
Note: this assumes that each ID is found only once in the big file. (Actually the command will work regardless, but if there are duplicate lines for an ID, your business users will have to tell you how to handle them. What if you get contradictory names for the same ID? Etc.)
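If duplicate IDs do turn out to exist, a quick check (just a sketch to surface them, not part of the solution) is to reduce the matched lines to their IDs and print any that repeat:
sed 's/.*/\*\*id=&\*\*/' f1 | grep -Ff - f2 | \
  sed -E 's/^.*\*\*id=([[:digit:]]*)\*\*.*$/\1/' | sort | uniq -d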

What is the best way to use tr and grep on a folder?

I'm trying to search through all files in a folder for the following string
<cert>
</cert>
However, I have to remove line returns.
The following code works on one file, but how can I pipe an entire folder through the tr and grep? The -l option is to print only the filename and not the whole file.
tr -d '\n' < test | grep -l '<cert></cert>'
The tr/grep approach requires grep to process the whole file as one line. While GNU grep can handle long lines, many others cannot. Also, if the file is large, memory may be taxed.
The following avoids those issues. It searches through all files in the current directory and reports the names of any that contain <cert> on one line and </cert> on the next:
awk 'last ~ "<cert>" && $0 ~ "</cert>" {print FILENAME; nextfile} {last=$0}' *
How it works
awk implicitly loops over all lines in a file.
This script uses one variable, last, which contains the text of the previous line.
last ~ "<cert>" && $0 ~ "</cert>"
This tests if (a) the last line contains the characters <cert> and (b) the current line contains the characters </cert>.
If you actually wanted lines that contain <cert> and no other characters, then replace ~ with ==.
{print FILENAME; nextfile}
If the preceding condition returns true, then this prints the file's name and starts on the next file.
(nextfile was a common extension to awk that became POSIX 2012.)
{last=$0}
This updates the variable last to have the current line.
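If you still want the tr/grep approach from the question applied to a whole folder, you can loop over the files yourself; a sketch, assuming plain files in the current directory (note that grep -l reading from a pipe would only report "(standard input)", so the loop prints the name instead):
for f in ./*; do
    if tr -d '\n' < "$f" | grep -q '<cert></cert>'; then
        printf '%s\n' "$f"
    fi
done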

How to clean a CSV file using the 'grep' command

Assume that we have the following record: {(XXX1),(XXX2)},whatever. What I want is to extract the information based on the following rule, preferably with 'grep': if the {} contains two or fewer UNIQUE elements (the ones inside the ()), then keep (both of) them; otherwise delete the whole row. As a further step, I want to extract the values within the (), and finally write the remaining lines in the following form: XXX1,XXX2,whatever
UPDATE:
For the following input:
{(XXX1),(XXX2)},whatever,unique=2
{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
{(XXX1)},whatever,unique=1
{},whatever,unique=0
{(XXX1),(XXX2),(XXX3),(XXX4)},whatever
I should get the following output:
XXX1,XXX2,whatever,unique=2
XXX1,whatever,unique=1
awk could do it, check this one-liner:
awk -F'[}{]' '{split($2,a,",");delete(b);for(x in a)b[a[x]]}length(b)<=2' file
Let's do a small test:
kent$ cat file
ok,{(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1)},whatever,unique=1
ok,{},whatever,unique=0
nok,{(XXX1),(XXX2),(XXX3),(XXX4)},whatever
kent$ awk -F'[}{]' '{split($2,a,",");delete(b);for(x in a)b[a[x]]}length(b)<=2' file
ok,{(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
ok,{(XXX1)},whatever,unique=1
ok,{},whatever,unique=0
You can see the nok line was removed.
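The same one-liner spread out with comments (this relies on length() of an array, a GNU awk feature, so it assumes gawk):
awk -F'[}{]' '
{
    split($2, a, ",")   # a holds the (...) elements found between the braces
    delete b            # clear the set of unique elements for this line
    for (x in a)
        b[a[x]]         # using the elements as array keys collapses duplicates
}
length(b) <= 2          # print the line only if at most two unique elements remain
' file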
EDIT
awk -F'[}{]' '{gsub(/[()]/,"");split($2,a,",");delete(b);for(x in a)b[a[x]];l=length(b)}l<=2&&l>0{s="";for(x in b)s=s""x",";sub(/,$/,"",s);y[s]=s $3}END{for(x in y)print y[x]}' file
test
kent$ cat file
{(XXX1),(XXX2)},whatever,unique=2
{(XXX1),(XXX1),(XXX1),(XXX2)},whatever,unique=2
{(XXX1)},whatever,unique=1
{},whatever,unique=0
{(XXX1),(XXX2),(XXX3),(XXX4)},whatever
kent$ awk -F'[}{]' '{gsub(/[()]/,"");split($2,a,",");delete(b);for(x in a)b[a[x]];l=length(b)}l<=2&&l>0{s="";for(x in b)s=s""x",";sub(/,$/,"",s);y[s]=s $3}END{for(x in y)print y[x]}' file
XXX1,XXX2,whatever,unique=2
XXX1,whatever,unique=1
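For readability, the EDIT one-liner spread out with comments (same gawk assumptions; note that the element order inside each rebuilt line depends on awk's array traversal order):
awk -F'[}{]' '
{
    gsub(/[()]/, "")    # strip all parentheses; modifying $0 also re-splits the fields
    split($2, a, ",")   # a holds the elements between the braces
    delete b
    for (x in a)
        b[a[x]]         # collect unique elements as array keys
    l = length(b)
}
l <= 2 && l > 0 {
    s = ""
    for (x in b)
        s = s x ","     # rebuild the element list, e.g. "XXX1,XXX2,"
    sub(/,$/, "", s)    # drop the trailing comma
    y[s] = s $3         # append the rest of the line; keying by s drops duplicate rows
}
END {
    for (x in y)
        print y[x]
}
' file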

extract a line from a file using csh

I am writing a csh script that will extract a line from a file xyz.
The xyz file contains a number of lines of code, and the line in which I am interested appears within the first 2-3 lines of the file.
I tried the following code
set product1 = `grep -e '<product_version_info.*/>' xyz`
I want it so that as soon as the script finds that line, it saves the line in a variable as a string and stops reading the file immediately, i.e. it should not read any further after extracting the line.
Please help !!
grep has an -m or --max-count flag that tells it to stop after a specified number of matches. Hopefully your version of grep supports it.
set product1 = `grep -m 1 -e '<product_version_info.*/>' xyz`
From the grep man page:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
As an alternative, you can always use the command below to just check the first few lines (since the line always occurs in the first 2-3 lines):
set product1 = `head -3 xyz | grep -e '<product_version_info.*/>'`
I think you're asking to return the first matching line in the file. If so, one solution is to pipe the grep result to head:
set product1 = `grep -e '<product_version_info.*/>' xyz | head -1`
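If your grep has no -m, sed can also stop at the first match; a sketch of the same idea (print the first matching line, then quit):
set product1 = `sed -n '/<product_version_info.*\/>/{p;q;}' xyz`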

can grep identify only one matching word in a file?

I have a file with a list of words and I want to identify only the word in the file which exactly matches another word.
So, for example, if I have in the file, the words "BEBE, BEBÉ, BEBÉS", and I look for "BEBE", I want it to return just the first one, which is the exact match.
I tried using grep -w "BEBE" filename.txt, but it doesn't work. It still gives me back all three of them.
Use -o together with -w to display only the part that matches; also use -F for a fixed string if you're not doing regex matching:
$ cat file
BEBE, BEBÉ, BEBÉS
$ grep -woF 'BEBÉ' file
BEBÉ
$ grep -woF 'BEBÉS' file
BEBÉS
