Limitations of Grep

When I grep a log file, the matched lines are truncated to 2048 characters.
For example, grep 'txn-id-111111' transactions.log gives
2015-12-18 (txn-id-111111) EmployeeInformation[AssociateDetails[empId=161223,empname=JohnSmith],AssociateAddress[street1 =1074 NY boulevard]..........................empBranch=N
The object is printed only up to 2048 characters.
Is there a way to retrieve the complete line without it being truncated?
FYI, I'm using SuperPuTTY.
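As far as I know, GNU grep has no 2048-character line limit (it is bounded only by available memory), so the terminal or its scrollback is a likelier suspect than grep. A minimal way to check, assuming a POSIX shell with awk and mktemp, is to measure the matched line instead of displaying it. The log line below is made up for illustration:

```shell
# Sketch: check whether grep itself shortens a long line, without printing
# that line to the terminal. File name and contents are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a 5000-character payload and one log line that contains the ID.
payload=$(awk 'BEGIN { for (i = 0; i < 5000; i++) printf "x" }')
printf '2015-12-18 (txn-id-111111) %s\n' "$payload" > "$workdir/transactions.log"

# Count the characters of the matched line instead of displaying it.
matched_len=$(grep 'txn-id-111111' "$workdir/transactions.log" | awk '{ print length($0) }')
echo "grep returned a line of $matched_len characters"
```

On the real file the same check would be grep 'txn-id-111111' transactions.log | awk '{ print length($0) }'. If that reports more than 2048, redirecting the match to a file (grep 'txn-id-111111' transactions.log > match.txt) and opening it in an editor should show the complete line.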

Related

fgrep not counting line with accented letter in it

The chances that I found a bug in fgrep are rather small, so my bet is that I misunderstand something. I was counting the number of addresses in a VCF file with
fgrep FN: Contacts.vcf | wc -l
to quickly find the number of FN (full name) fields.
I noticed that I was one short compared to the count in my Nextcloud address book.
I tracked it down to the line of a friend called Jurriën.
If I keep his name as is, fgrep doesn't count the line:
FN:Jurriën Somelastname
If I remove the ë, fgrep counts the line:
FN:Jurrin Somelastname
This is a simple DOS-style encoded text file, straight out of the Nextcloud server.
However, fgrep sees it as binary, so fgrep -a works. Is this the expected behaviour?
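One plausible explanation, and it is an assumption since the question doesn't show the file's bytes: if the ë is stored in a single-byte encoding such as Latin-1 and grep runs in a UTF-8 locale, GNU grep treats the invalid byte as binary data and suppresses output lines that contain improperly encoded data. -a tells grep to process the file as text anyway. The sketch below reproduces the count on a hypothetical two-card file, using grep -F (the modern spelling of fgrep) and -c instead of wc -l:

```shell
# Sketch: a hypothetical Contacts.vcf with CRLF line endings, where "ë" is
# stored as the single Latin-1 byte 0xEB (octal 353), which is invalid UTF-8.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'BEGIN:VCARD\r\nFN:Jurri\353n Somelastname\r\nEND:VCARD\r\n' > "$workdir/Contacts.vcf"
printf 'BEGIN:VCARD\r\nFN:Jane Doe\r\nEND:VCARD\r\n' >> "$workdir/Contacts.vcf"

# -F: fixed-string search (what fgrep does)
# -a: process the file as text even if grep would classify it as binary
# -c: print the number of matching lines (replaces "| wc -l")
fn_count=$(grep -a -F -c 'FN:' "$workdir/Contacts.vcf")
echo "FN lines: $fn_count"
```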

grep file with a large array

Hi, I have a few archives of firewall (FW) logs, and occasionally I'm required to compare them against a list of IP addresses (thousands of them) to get the date and time when an IP address matches. My current script is as follows:
# input the list of IPs into an array, starting at index 1
mapfile -t -O 1 var < ip.txt
i=1
while true
do
  # stop at the first empty (unset) array entry
  if [[ -n "${var[i]}" ]]; then
    zcat /.../abc.log.gz | grep "${var[i]}"
    ((i++))
  else
    break
  fi
done
It works, but it's way too slow, and I would think that grepping each line for multiple strings at once would be faster than running zcat once per IP. So my question is: is there a way to generate a 'long grep search string' from ip.txt? Or is there a better way to do this?

Sure. One thing is that piping zcat into grep is usually slightly inefficient; I'd recommend using zgrep here instead. You could generate a regex as follows:
IP=$(paste -s -d ' ' ip.txt)
zgrep -E "(${IP// /|})" /.../abc.log.gz
The first line loads the IP addresses into IP as a single space-separated line. The second line builds a regex that looks something like (127.0.0.1|8.8.8.8) by replacing the spaces with |s, and then uses zgrep to search through abc.log.gz once with that -Extended regex.
However, I recommend that you do not do this. Firstly, you should escape strings put into a regex. Even if you know that ip.txt really contains IP addresses (e.g. it is not controlled by a malicious user), you should still escape the periods, since an unescaped . matches any character. But rather than building up a search string and then escaping it, just use the -Fixed strings and -file features of grep. Then you get a simple and fast one-liner:
zgrep -F -f ip.txt /.../abc.log.gz
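To make the -F -f approach concrete, here is a sketch on made-up data (the log format and addresses are hypothetical). It pipes gzip -dc into grep, which is essentially what zgrep does, so it runs even where zgrep isn't installed. It also shows one caveat: a fixed string like 10.0.0.1 also matches inside 10.0.0.12, and adding -w restricts matches to whole words:

```shell
# Sketch with made-up data: one decompression pass, all IPs as fixed strings.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 10.0.0.1 8.8.8.8 > ip.txt
printf '%s\n' \
  '2015-12-18 10:01:02 ALLOW src=10.0.0.1 dst=1.2.3.4' \
  '2015-12-18 10:01:05 DENY src=10.0.0.12 dst=1.2.3.4' \
  '2015-12-18 10:02:00 ALLOW src=8.8.8.8 dst=1.2.3.4' \
  '2015-12-18 10:03:00 ALLOW src=192.168.1.5 dst=1.2.3.4' > abc.log
gzip abc.log  # replaces abc.log with abc.log.gz

# Decompress once and match every pattern from ip.txt; -c counts matching lines.
fixed_hits=$(gzip -dc abc.log.gz | grep -F -c -f ip.txt)

# -w keeps 10.0.0.1 from also matching inside 10.0.0.12.
word_hits=$(gzip -dc abc.log.gz | grep -F -w -c -f ip.txt)
echo "fixed strings: $fixed_hits lines, whole words: $word_hits lines"
```

On the real archive this should correspond to zgrep -F -w -f ip.txt /.../abc.log.gz, with -c dropped to print the matching lines (and their timestamps) instead of a count.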

Grep's word boundaries include spaces?

I tried to use grep to search for lines containing the word "bead" using "\b", but it doesn't find lines where "bead" is separated by a space. I tried this command:
cat in.txt | grep -i "\bbead\b" > out.txt
I get results like
BEAD-air.JPG
Bead, 3 sided MET DP110317.jpg
Bead. -2819 (FindID 10143).jpg
Bead(Gem), Artefacts of Phu Hoa site(Dong Nai province).jpg
Romano-British pendant amulet (bead) (FindID 241983).jpg
But I don't get results like
Bead fun.jpg
Instead of getting some 2,000 lines, I'm only getting 92.
My OS is Windows 10 64-bit, but I'm using grep 2.5.4 from the GnuWin32 package.
I've also tried MSYS2, which includes grep 3.0, but it does the same thing.
So how can I search for words separated by spaces?
LATER EDIT:
It looks like grep has problems with big files. My input file is 2.4 GB in size; with smaller files, it works. I reported the bug here: https://sourceforge.net/p/getgnuwin32/discussion/554300/thread/03a84e6b/

Try this:
grep -wi "bead" in.txt
-w restricts matching to whole words.
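A small sketch of -w on a few made-up lines. It uses a tiny file rather than the 2.4 GB one, so it says nothing about the large-file problem from the edit; "Beadwork.jpg" is a hypothetical line included to show what -w should reject:

```shell
# Sketch with a few made-up file names, including the "Bead fun.jpg" case.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 'BEAD-air.JPG' 'Bead fun.jpg' 'Beadwork.jpg' \
  'Romano-British pendant amulet (bead) (FindID 241983).jpg' > "$workdir/in.txt"

# -w: whole-word match, -i: ignore case; grep reads the file directly, no cat needed.
grep -wi 'bead' "$workdir/in.txt" > "$workdir/out.txt"
match_count=$(grep -c -wi 'bead' "$workdir/in.txt")
cat "$workdir/out.txt"
```

"Bead fun.jpg" is matched because the space after "Bead" is not a word character, while "Beadwork.jpg" is skipped because "w" is.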

What you are doing should normally work, but there are ways of changing what is and is not considered a word boundary. Rather than worry about it, try this instead:
grep -iP "\bbead(\b|\s)" in.txt > out.txt
The -P option enables Perl-compatible regular expressions, and \s matches any whitespace character. The bar | separates the alternatives inside the parentheses ( ).
While you are waiting for grep to be fixed, you could use another tool if it is available to you, e.g. Perl:
perl -ne 'print if /\bbead\b/i' in.txt > out.txt

Grep looking for something that fits one parameter but NOT the other

In my data set, we have a multitude of emails that must be parsed (alongside a myriad of other unrelated information, like phone numbers and addresses).
I am attempting to find strings that meet the criteria of an email but don't have the proper format of one. So I tried to AND two grep conditions together: the line should match the second pattern but not the first.
grep -E -c -v "^[a-mA-M][a-zA-Z]*\.#[A-Za-z]+\.[A-Za-z]{2,6}" Data.bash | grep '#' Data.bash
How should I implement this? This just finds anything with a # in it (the first command returns 0, and the second just finds every line with a # in it).
In short, how do I AND two conditions together in grep?
EDIT: Sample Data
An email address has a user ID, and the domain name can consist of letters, numbers, periods, and dashes.
Matches:
saltypickle#gmail.com
saltypickle#g-mail.com
No Match:
saltypickle#g^mail.com
saltypickle#.
#saltyPickle#
saltyPickle#

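grep -P '^\w+#[[:alnum:].-]+\.com' inputfile
saltypickle#gmail.com
saltypickle#g-mail.com
This allows any letter, digit, - or . in the domain name. The - goes last in the bracket expression so it is taken literally, and the dot before com is escaped so it only matches a literal period.
The following prints the invalid addresses:
grep -vP '^\w+#[[:alnum:].-]+\.com' inputfile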
saltypickle#g^mail.com
saltypickle#.
#saltyPickle#
saltyPickle#
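The "AND" from the question can also be written as a pipeline: the first grep selects lines that look like addresses (they contain #), and the second, given no file name, reads that output and uses -v to drop the properly formatted ones. In the question's attempt the second grep was also given Data.bash, so it ignored the first grep's output. A sketch on the sample data plus one made-up unrelated line, swapping the answer's -P for -E with [[:alnum:]_] in place of \w so it doesn't depend on PCRE support:

```shell
# Sketch: the sample data from the question plus one hypothetical line with no '#'.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 'saltypickle#gmail.com' 'saltypickle#g-mail.com' \
  'saltypickle#g^mail.com' 'saltypickle#.' '#saltyPickle#' 'saltyPickle#' \
  '555-0123' > "$workdir/Data.bash"

# The answer's pattern as a POSIX ERE: [[:alnum:]_] stands in for \w.
valid_re='^[[:alnum:]_]+#[[:alnum:].-]+\.com'

# Condition A AND NOT condition B: keep lines containing '#', then drop valid addresses.
# The second grep has no file argument, so it filters the first grep's output from stdin.
invalid=$(grep '#' "$workdir/Data.bash" | grep -v -E "$valid_re")
printf '%s\n' "$invalid"
```

Unlike grep -v alone, the pipeline leaves out lines such as the phone number that were never address candidates.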

Only output values within a certain range

I run a command that produces lots of lines in my terminal; the lines are floats.
I only want certain numbers to be output as a line in my terminal.
I know that I can pipe the results to egrep:
| egrep "(369|433|375|368)"
if I want only certain values to appear. But is it possible to show only lines with a value within ±50 of 350 (for example)?

grep matches strings, not numeric values, so you have to either:
figure out the right string match for the number range you want (e.g., for 300-499, you might do something like grep -E '[34]..', with appropriate additional context added to the expression and a number of additional .s equal to your floating-point precision), or
convert the number strings to actual numbers in whatever programming language you prefer and filter them that way.
I'd strongly encourage you to take the second option.

I would go with awk here:
./yourProgram | awk '$1>=300 && $1<=400'
e.g.
printf '12.3\n342.678\n287.99999\n' | awk '$1>=300 && $1<=400'
342.678
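To express "within ±50 of 350" without hard-coding both bounds, the center and tolerance can be passed in with awk -v (the values here are just the question's example, and the extra input numbers are made up). Like the awk answer, this assumes the number is the first field on each line:

```shell
# Hypothetical center and tolerance: keep values within plus or minus 50 of 350.
center=350
tol=50

# Fields that look like numbers are compared numerically by awk, not as strings.
result=$(printf '12.3\n342.678\n287.99999\n400\n401.5\n' |
  awk -v c="$center" -v d="$tol" '$1 >= c - d && $1 <= c + d')
printf '%s\n' "$result"
```

Both bounds are inclusive here, so 400 is kept while 401.5 is dropped; switch to > and < for an exclusive range.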
