Problems finding human readable ASCII text in log files - grep

I understand that assisting with college assignments is frowned upon here, but I am seeking help with grep. I have been searching for answers online all day and am running out of time.
I have a zipped logparse file that I have unzipped; it contains over 1000 logs. I need to examine the contents of this file to obtain a 72-character human readable text from one of the log files, but any grep command I run freezes my VM, and I keep losing time rebooting etc. I have attempted
grep --text <file>
grep -a <file>
grep -w <file>
grep [char] <file> returns output stating that text has been detected, but when I go to view the file via vi or vim it again crashes.
Any help greatly appreciated,
Sarah
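One possible approach, as a sketch (assuming the extracted logs live in a directory such as ./logparse, and that GNU grep is available), is to have grep print only the matching runs of 72 printable characters instead of whole lines, so no huge binary line ever needs to be opened in an editor:
grep -raoE '[[:print:]]{72}' ./logparse/
# -r recurses into the directory, -a treats binary files as text,
# -o prints only the matched 72-character runs, and -E enables the
# extended regex [[:print:]]{72} (72 printable characters in a row)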

Related

Grep returns no such file or directory when using multiple flags

I am a beginner at bash scripting. I just started to write a script that checks whether the contents of b.txt can all be found in a.txt (line by line, preferably). My code is as follows:
grep -Ffw b.txt a.txt
As you can see, I want to use fixed strings instead of regex, and I want to check everything from the b.txt file, because there are some strings inside b.txt and I want to check whether all of them exist in a.txt. I also want to match whole words only, of course. Those are the requirements; however, when I run this command it returns an error saying: grep: w: No such file or directory
I am thinking that maybe there are some limitations on the flags in bash? Sorry, I am not really familiar with the language and didn't read much of the man page, etc. If anyone could help me solve the puzzle it would be appreciated :) In addition, if possible I would also like to add -q to suppress the output when there is a match; right now I didn't add it in the example since it couldn't even make it through with 3 flags. So can anyone give me some hints here? Thanks in advance!
Here is some explanation from the manpage:
OPTIONS
Generic Program Information
...
-F, --fixed-strings
Interpret PATTERNS as fixed strings, ...
-f FILE, --file=FILE
Obtain patterns from FILE, ...
-w, --word-regexp
Select only those lines ...
As you can see, the options -F and -w are indicated as ending immediately (hence the comma in -F, and -w,), but the -f switch is followed by FILE, which means they belong together.
If you want to preserve the order Ffw, that's possible, but then you need to do something like:
grep -Ff b.txt -w a.txt
As mentioned by @kvantour, the solution is simply placing the -f directly before the b.txt file:
grep -Fwf b.txt a.txt
I should have thought of it when it said 'No such file or directory', as that was a clear indication that the flags after -f were already being treated as the path.
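As for the -q mentioned in the question: it combines the same way, as long as -f stays immediately before its file argument. A small sketch (note that -q makes grep exit with status 0 as soon as any one pattern matches, not only when all of them do):
grep -qFwf b.txt a.txt && echo "at least one string from b.txt was found in a.txt"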

grep multiple patterns using pattern file

I downloaded a very huge list of hosts to block ads.
The problem is that it breaks the functionality of some sites, like forums/discussions and/or pics. So I want to remove some sites from the hosts file.
Let's say I want to remove a.com and b.com from hosts.
These methods work.
grep -ve a.com -e b.com hosts > new_hosts
or
egrep -v 'a.com|b.com' hosts > new_hosts
Both are working fine. But if the patterns increase, I want to write the patterns in a file.
If I use this
grep -vf pattern.txt hosts > new_hosts
Only the last pattern will be removed.
If pattern.txt contains
a.com
b.com
Only b.com is omitted from new_hosts; a.com is still written to new_hosts.
So what grep command should I use with a pattern file?
If you have a hosts file that you want to compare with another file containing entries you want to eliminate, this will be easier with uniq than with grep.
Just combine the files and run something like this:
cat hosts badfile badfile | sort | uniq -u > new_hosts
badfile is added twice because an entry that is not already present in hosts would otherwise appear only once and survive uniq -u; duplicating it guarantees that every badfile entry is eliminated.
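A minimal sketch with hypothetical contents, to show why the duplication works:
printf 'a.com\nb.com\nc.com\n' > hosts     # hypothetical hosts file
printf 'b.com\nd.com\n' > badfile          # hypothetical entries to eliminate
cat hosts badfile badfile | sort | uniq -u > new_hosts
# b.com appears three times and d.com twice, so uniq -u drops both;
# new_hosts keeps only a.com and c.com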
Thanks for the feedback, guys. Since most of you suspected the error was in pattern.txt, I suspect it could be Windows Notepad that caused it.
A new line from Windows Notepad is terminated by 0D 0A (hex).
I read somewhere that the newline for grep should be 0A (hex).
After editing pattern.txt using Notepad++, this command finally works :-)
grep -vf pattern.txt hosts > new_hosts
Or maybe this is better
fgrep -vf pattern.txt hosts > new_hosts
Both are working perfectly :-)
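If re-saving the file in Notepad++ is not an option, the 0D (carriage return) bytes can also be stripped on the command line first, for example (a sketch assuming a POSIX tr):
tr -d '\r' < pattern.txt > pattern_unix.txt
grep -vFf pattern_unix.txt hosts > new_hosts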

"grep: line too long" error message

I used the following syntax in order to find IP addresses under /etc
(answered by Dennis Williamson on the Superuser site),
but I get the message "grep: line too long".
Does anyone have an idea how to get rid of this message, and why I get it?
grep -Er '\<([0-9]{1,3}\.){3}[0-9]{1,3}\>' /etc/
grep: line too long
The find/xargs solution didn't work for me, but resulted in the same error.
I solved this problem by using the -I grep option (ignore binary files). In my case there must have been a binary file in the list of files to search that had no linebreaks, so grep tries to read in a gigantic line that is too big. That's my guess at what this error means.
I got the idea from: http://web.archiveorange.com/archive/v/am8x7wI0r0243prrmYd4
This might not work for you of course if there's a text file with a line that is too long.
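Applied to the command from the question, that would look something like this (a sketch; -I tells grep to treat binary files as if they don't match):
grep -IEr '\<([0-9]{1,3}\.){3}[0-9]{1,3}\>' /etc/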
Use find to build a list of files to grep:
find /etc -type f -print0 | xargs -r0 grep -E '\<([0-9]{1,3}\.){3}[0-9]{1,3}\>'
In general find is a more flexible way of traversing the filesystem and building lists of files for other programs.
Perhaps your grep has a bug and accidentally scans a binary file with overly long lines (i.e. too many characters for grep to handle between two newlines). See this Red Hat page for more details (bug page).

How to make grep stop at first match on a line?

Well, I have a file test.txt
#test.txt
odsdsdoddf112 test1_for_grep
dad23392eeedJ test2 for grep
Hello World test
garbage
I want to extract strings which have a space after them. I used the following expression and it worked
grep -o [[:alnum:]]*.[[:blank:]] test.txt
Its output is
odsdsdoddf112
dad23392eeedJ
test2
for
Hello
World
But the problem is that grep prints all the strings that have a space after them, whereas I want it to stop after the first match on a line and then proceed to the second line.
Which expression should I use here in order to make it stop after the first match and move to the next line?
This problem may be solved with gawk or some other tool, but I would appreciate a solution which uses grep only.
Edit
I am using GNU grep 2.5.1 on a Linux system, if that is relevant.
Edit
With the help of the answers given below, I tried my luck with
grep -o ^[[:alnum:]]* test.txt
grep -Eo ^[[:alnum:]]+ test.txt
and both gave me correct answers.
Now what surprises me is that I tried using
grep -Eo "^[[:alnum:]]+[[:blank:]]" test.txt
as suggested here but didn't get the correct answer.
Here is the output on my terminal
odsdsdoddf112
dad23392eeedJ
test2
for
Hello
World
But comments from RichieHindle and Adrian Pronk show that they got the correct output on their systems. Does anyone have an idea why I am not getting the same result on my system? Any help will be appreciated.
Edit
Well, it seems that grep 2.5.1 has a bug because of which my output wasn't correct. I installed grep 2.5.4, and now it is working correctly. Please see this link for details.
If you're sure you have no leading whitespace, add a ^ to match only at the start of a line, and change the * to a + to match only when you have one or more alphanumeric characters. (That means adding -E to use extended regular expressions).
grep -Eo "^[[:alnum:]]+[[:blank:]]" test.txt
(I also removed the . from the middle; I'm not sure what that was doing there?)
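On a grep without the bug described below, that command should print exactly one match per line (each including its trailing blank), something like:
odsdsdoddf112 
dad23392eeedJ 
Hello 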
As the questioner discovered, this is a bug in versions of GNU grep prior to 2.5.3. The bug allows a caret to match after the end of a previous match, not just at beginning of line.
This bug is still present in other versions of grep, for instance in Mac OS X 10.9.4.
There isn't a universal workaround, but in some cases, like non-spaces followed by a space, you can often get the desired behavior by leaving off the delimiter. That is, search for '[^ ]*' rather than '[^ ]* '.
grep -oe "^[^ ]*" test.txt
If we want to extract all meaningful input before the garbage and actually stop at the first match, then the -B NUM, --before-context=NUM option may be useful to "print NUM lines of leading context before matching lines".
Example:
grep --before-context=999999 "Hello World test" test.txt

To understand the practical use of Grep's option -H in different situations

This question is based on this answer.
Why do you get the same output from both commands?
Command A
$ sudo grep muel * /tmp
masi:muel
Command B
$ sudo grep -H muel * /tmp
masi:muel
Rob's comment suggests that Command A should not give me masi:, but only muel.
In short, what is the practical purpose of -H?
Grep will list the filenames by default if more than one filename is given. The -H option makes it do that even if only one filename is given. In both your examples, more than one filename is given.
Here's a better example:
$ grep Richie notes.txt
Richie wears glasses.
$ grep -H Richie notes.txt
notes.txt:Richie wears glasses.
It's more useful when you're giving it a wildcard for an unknown number of files, and you always want the filenames printed even if the wildcard only matches one file.
If you grep a single file, -H makes a difference:
$ grep muel masi
muel
$ grep -H muel masi
masi:muel
This could be significant in various scripting contexts. For example, a script (or a non-trivial piped series of commands) might not be aware of how many files it's actually dealing with: one, or many.
When you grep from multiple files, by default it shows the name of the file where the match was found. If you specify -H, the file name will always be shown, even if you grep from a single file. You can specify -h to never show the file name.
Emacs has a grep interface (M-x grep, M-x lgrep, M-x rgrep). If you ask Emacs to search for foo in the current directory, Emacs calls grep, processes the grep output, and then presents you with the results as clickable links. Clickable links, just like Google.
What Emacs does is pass two options to grep: -n (show line numbers) and -H (show filenames even if there is only one file; the point is consistency) and then turn the output into clickable links.
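So under the hood the call Emacs makes looks roughly like this (a sketch; the exact default command varies between Emacs versions):
grep -nH -e foo *
# -n prefixes each match with its line number and -H with its filename,
# giving Emacs a predictable file:line:text format to turn into links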
In general, consistency makes for a good API, but consistency conflicts with DWIM (do what I mean).
When you directly use grep, you want DWIM, so you don't pass -H.

Resources