Strange behavior grep -rnw - grep

I am using grep (BSD grep) 2.5.1-FreeBSD in MacOS and I have found the following behavior.
I have two *.tex files. Each one of these contains the following lines
$k$-th bit of
$(i-m)$-th bit of
respectively. When I ran
grep --color -rnw . -e '\$-th bit of' --include="*.tex"
I got only the second file, i.e., $(i-m)$-th bit of, while I expect the two lines. Could you help me please to understand this behavior?

Never use -r or --include or any other grep option to find files. The GNU guys really screwed up by adding those options to grep when there's a perfectly good tool named find for finding files and now they've turned grep into a convoluted mush of finding files and Globally matching a Regular Expression within a file and Printing the result (G/RE/P).
Keep it simple - find the files with find then g/re/p within then using grep:
find . -name '*.tex' -exec grep --color -n '\$-th bit of' {} +
As others pointed out your g/re/p problem was the -w arg so I've removed that above.

I have the same version of grep.
It is caused by your use of the -w option:
-w, --word-regexp
The expression is searched for as a word (as if surrounded by `[[:<:]]' and `[[:>:]]'; see re_format(7)).
The matched part of the string $k$-th bit of is bounded on the left-hand side by a word character (i.e. k) so the match is treated as being inside a "word" and it can't therefore satisfy the "searched for as a whole word" requirement.
Try without -w and it will work fine.

Related

Grep returns no such file or directory when using multiple flags

I am a beginner of bash script. I just started to write a script where it checks the contents of b.txt can all be found in a.txt. (line by line preferably). My code is as following:
grep -Ffw b.txt a.txt
As you can see, I want to do fixed string instead of REGEX, I want to check everything from the b.txt file, because there are some strings inside the b.txt and I want to check if all of them exist in a.txt. And I also want to match the whole word only of course. So these are the requirements, however when I run this command it returns me an error says: grep: w: No such file or directory
I am thinking that maybe there are some limitations of the flags in bash? Sorry I am not really familiar with the language, didn't read much about the MAN page etc. If anyone could help me to solve the puzzle it would be appreciated :) In addition, i think if possible I would like to add a -q to surpress the output when there is a match also, right now I didn't add it in the example since it couldn't make it through with 3 flags even. So can anyone give me some hints here? Thanks in advance!
Hereby some explanation from the manpage:
OPTIONS
Generic Program Information
...
-F, --fixed-strings
Interpret PATTERNS as fixed strings, ...
-f FILE, --file=FILE
Obtain patterns from FILE, ...
-w, --word-regexp
Select only those lines ...
As you can see, the options -F and -w are indicated ending immediately (hence the comma in -F, and -w,), but the -f switch is followed by FILE, with means they belong together.
I you want to preserve the order Ffw, that's possible, but then you need to do something like:
grep -Ff b.txt -w a.txt
As mentioned by #kvantour, the solution is simply placing the -f before the b.txt file. grep -Fwf b.txt a.txt
Should have thought it when it says 'no such file or directory' as it was a clear indication that flags after the -f were treated as the path already.

grep match with string1 OR string2

I want to grep 2 patterns in a file on Solaris UNIX.
That is grep 'pattern1 OR pattern2' filename.
The following command does NOT work:
grep 'pattern1\|pattern2' filename
What is wrong with this command?
NOTE: I am on Solaris
What operating system are you on?
It will work with on systems with GNU grep, but on BSD, Solaris, etc., \| is not supported.
Try egrep or grep -E, e.g.
egrep 'pattern1|pattern2'
If you want POSIX functionality (i.e. Linux-like behavior) you can put the POSIX 2-compatible binaries at the beginning of your path:
$ echo $PATH
/usr/xpg4/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:[...]
There is also /usr/xpg6 which is POSIX 1 compatible.
/usr/bin: SVID/XPG3
/usr/xpg4/bin: POSIX.2/POSIX.2a/SUS/SUSv2/XPG4
/usr/xpg6/bin: POSIX.1-2001/SUSv3
That command works fine for me. Please add additional information such as your platform and the exact regular expression and file contents you're using (minimized to the smallest example that still reproduces the issue). (I would add a comment to your post but don't have enough reputation.)
That should be correct. Make sure that you do or don't add the appropriate spaces i.e. "pattern1\|pattern2" vs "pattern1\| pattern2".
Are you sure you aren't just having problems with cases or something? try the -i switch.
That depends entirely on what pattern1 and pattern2 are. If they're just words, it should work, otherwise you'll need:
grep '\(pattern1\)\|\(pattern2\)'
An arcane method using fgrep (ie: fixed strings) that works on Solaris 10...
Provide a pattern-list, with each pattern separated by a NEWLINE, yet quoted so as to be interpreted by the shell as one word:-
fgrep 'pattern1
pattern2' filename
This method also works for grep, fgrep and egrep in /usr/xpg4/bin, although the pipe-delimited ERE in any egrep is sometimes the least fussy.
You can insert arbitrary newlines in a string if your shell allows history-editing, eg: in bash issue C-v C-j in either emacs mode or vi-command mode.
egrep -e "string1|string2" works for me in SunOS 5.9 (Solaris)

"grep: line too long" error message

I used the following syntax in order to find IP address under /etc
(answered by Dennis Williamson in superuser site)
but I get the message "grep: line too long".
Someone have idea how to ignore this message and why I get this?
grep -Er '\<([0-9]{1,3}\.){3}[0-9]{1,3}\>' /etc/
grep: line too long
The find/xargs solution didn't work for me, but resulted in the same error.
I solved this problem by using the -I grep option (ignore binary files). In my case there must have been a binary file in the list of files to search that had no linebreaks, so grep tries to read in a gigantic line that is too big. That's my guess at what this error means.
I got the idea from: http://web.archiveorange.com/archive/v/am8x7wI0r0243prrmYd4
This might not work for you of course if there's a text file with a line that is too long.
Use find to build a list of files to grep,
find /etc -type f -print0 | xargs -r0 grep -E '\<([0-9]{1,3}\.){3}[0-9]{1,3}\>'
In general find is a more flexible way of traversing the filesystem and building lists of files for other programs.
Perhaps your grep has a bug and scans by accident a binary file with too long lines (i.e. too much characters for grep to handle between two newlines). See this red hat page for more details (bug page).

How to make grep stop at first match on a line?

Well, I have a file test.txt
#test.txt
odsdsdoddf112 test1_for_grep
dad23392eeedJ test2 for grep
Hello World test
garbage
I want to extract strings which have got a space after them. I used following expression and it worked
grep -o [[:alnum:]]*.[[:blank:]] test.txt
Its output is
odsdsdoddf112
dad23392eeedJ
test2
for
Hello
World
But problem is grep prints all the strings that have got space after them, where as I want it to stop after first match on a line and then proceed to second line.
Which expression should I use here, in order to make it stop after first match and move to next line?
This problem may be solved with gawk or some other tool, but I will appreciate a solution which uses grep only.
Edit
I using GNU grep 2.5.1 on a Linux system, if that is relevant.
Edit
With the help of the answers given below, I tried my luck with
grep -o ^[[:alnum:]]* test.txt
grep -Eo ^[[:alnum:]]+ test.txt
and both gave me correct answers.
Now what surprises me is that I tried using
grep -Eo "^[[:alnum:]]+[[:blank:]]" test.txt
as suggested here but didn't get the correct answer.
Here is the output on my terminal
odsdsdoddf112
dad23392eeedJ
test2
for
Hello
World
But comments from RichieHindle and Adrian Pronk, shows that they got correct output on their systems. Anyone with some idea that why I too am not getting the same result on my system. Any idea? Any help will be appreciated.
Edit
Well, it seems that grep 2.5.1 has some bug because of which my output wasn't correct. I installed grep 2.5.4, now it is working correctly. Please see this link for details.
If you're sure you have no leading whitespace, add a ^ to match only at the start of a line, and change the * to a + to match only when you have one or more alphanumeric characters. (That means adding -E to use extended regular expressions).
grep -Eo "^[[:alnum:]]+[[:blank:]]" test.txt
(I also removed the . from the middle; I'm not sure what that was doing there?)
As the questioner discovered, this is a bug in versions of GNU grep prior to 2.5.3. The bug allows a caret to match after the end of a previous match, not just at beginning of line.
This bug is still present in other versions of grep, for instance in Mac OS X 10.9.4.
There isn't a universal workaround, but in the some examples, like non-spaces followed by a space, you can often get the desired behavior by leaving off the delimiter. That is, search for '[^ ]*' rather than '[^ ]* '.
grep -oe "^[^ ]* " test.txt
If we want to extract all meaningful input before garbage and actually stop on first match then -B NUM, --before-context=NUM option may be useful to "print NUM lines of leading context before matching lines".
Example:
grep --before-context=999999 "Hello World test"

To understand the practical use of Grep's option -H in different situations

This question is based on this answer.
Why do you get the same output from the both commands?
Command A
$sudo grep muel * /tmp
masi:muel
Command B
$sudo grep -H muel * /tmp
masi:muel
Rob's comment suggests me that Command A should not give me masi:, but only muel.
In short, what is the practical purpose of -H?
Grep will list the filenames by default if more than one filename is given. The -H option makes it do that even if only one filename is given. In both your examples, more than one filename is given.
Here's a better example:
$ grep Richie notes.txt
Richie wears glasses.
$ grep -H Richie notes.txt
notes.txt:Richie wears glasses.
It's more useful when you're giving it a wildcard for an unknown number of files, and you always want the filenames printed even if the wildcard only matches one file.
If you grep a single file, -H makes a difference:
$ grep muel mesi
muel
$ grep -H muel mesi
masi:muel
This could be significant in various scripting contexts. For example, a script (or a non-trivial piped series of commands) might not be aware of how many files it's actually dealing with: one, or many.
When you grep from multiple files, by default it shows the name of the file where the match was found. If you specify -H, the file name will always be shown, even if you grep from a single file. You can specify -h to never show the file name.
Emacs has grep interface (M-x grep, M-x lgrep, M-x rgrep). If you ask Emacs to search for foo in the current directory, then Emacs calls grep and process the grep output and then present you with results with clickable links. Clickable links, just like Google.
What Emacs does is that it passes two options to grep: -n (show line number) and -H (show filenames even if only one file. the point is consistency) and then turn the output into clickable links.
In general, consistency is good for being a good API, but consistency conflicts with DWIM.
When you directly use grep, you want DWIM, so you don't pass -H.

Resources