grep not finding blank lines on a particular file - grep

I preform the command grep '^$' myfile and receive no results.
In vi you can clearly see it list the following on line 2 of the file. In vi I set number and set list and this is what that line looks like.
2 $
The previous line terminates with a $ too. If I run it without the the ^ it returns every single line like you would expect.
I run the command on other files and it works, but not from files from a particular source. The file is ASCII text, with CRLF line terminators, so are the others.
Not sure what else I can look at on this type of file that would affect these results.
*Looking in notepad++ looking at the hidden characters the problematic has CRLF at the end of the lines blank or otherwise.
The non-problematic one is just LF.
Somewhere in there is the problem just finding it difficult to craft a grep statement that figures this out.
*Took the problematic file and used dos2unix and grep -En '^$' myfile works now. Too bad I can't be editing this file for my ultimate fix.
*In the end this is what worked for this file type.
grep --color=never -n '^[^[:print:]]' myfile

Related

Why my rsync scripts do no work on the new mac? [duplicate]

I am making an NW.js app on macOS, and want to run the app in dev mode
by double-clicking on an icon.
In the first step, I'm trying to make my shell script work.
Using VS Code on Windows (I wanted to gain time), I have created a run-nw file at the root of my project, containing this:
#!/bin/bash
cd "src"
npm install
cd ..
./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &
but I get this output:
$ sh ./run-nw
: command not found
: No such file or directory
: command not found
: No such file or directory
Usage: npm <command>
where <command> is one of: (snip commands list)
(snip npm help)
npm#3.10.3 /usr/local/lib/node_modules/npm
: command not found
: No such file or directory
: command not found
Some things I don't understand.
It seems that it takes empty lines as commands.
In my editor (VS Code) I have tried to replace \r\n with \n
(in case the \r creates problems) but it changes nothing.
It seems that it doesn't find the folders
(with or without the dirname instruction),
or maybe it doesn't know about the cd command ?
It seems that it doesn't understand the install argument to npm.
The part that really weirds me out, is that it still runs the app
(if I did an npm install manually)...
Not able to make it work properly, and suspecting something weird with
the file itself, I created a new one directly on the Mac, using vim this time.
I entered the exact same instructions, and... now it works without any
issues.
A diff on the two files reveals exactly zero difference.
What can be the difference? What can make the first script not work? How can I find out?
Update
Following the accepted answer's recommendations, after the wrong line
endings came back, I checked multiple things.
It turns out that since I copied my ~/.gitconfig from my Windows
machine, I had autocrlf=true, so every time I modified the bash
file under Windows, it re-set the line endings to \r\n.
So, in addition to running dos2unix (which you will have to
install using Homebrew on a Mac), if you're using Git, check your
.gitconfig file.
Yes. Bash scripts are sensitive to line-endings, both in the script itself and in data it processes. They should have Unix-style line-endings, i.e., each line is terminated with a Line Feed character (decimal 10, hex 0A in ASCII).
DOS/Windows line endings in the script
With Windows or DOS-style line endings , each line is terminated with a Carriage Return followed by a Line Feed character. You can see this otherwise invisible character in the output of cat -v yourfile:
$ cat -v yourfile
#!/bin/bash^M
^M
cd "src"^M
npm install^M
^M
cd ..^M
./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &^M
In this case, the carriage return (^M in caret notation or \r in C escape notation) is not treated as whitespace. Bash interprets the first line after the shebang (consisting of a single carriage return character) as the name of a command/program to run.
Since there is no command named ^M, it prints : command not found
Since there is no directory named "src"^M (or src^M), it prints : No such file or directory
It passes install^M instead of install as an argument to npm which causes npm to complain.
DOS/Windows line endings in input data
Like above, if you have an input file with carriage returns:
hello^M
world^M
then it will look completely normal in editors and when writing it to screen, but tools may produce strange results. For example, grep will fail to find lines that are obviously there:
$ grep 'hello$' file.txt || grep -x "hello" file.txt
(no match because the line actually ends in ^M)
Appended text will instead overwrite the line because the carriage returns moves the cursor to the start of the line:
$ sed -e 's/$/!/' file.txt
!ello
!orld
String comparison will seem to fail, even though strings appear to be the same when writing to screen:
$ a="hello"; read b < file.txt
$ if [[ "$a" = "$b" ]]
then echo "Variables are equal."
else echo "Sorry, $a is not equal to $b"
fi
Sorry, hello is not equal to hello
Solutions
The solution is to convert the file to use Unix-style line endings. There are a number of ways this can be accomplished:
This can be done using the dos2unix program:
dos2unix filename
Open the file in a capable text editor (Sublime, Notepad++, not Notepad) and configure it to save files with Unix line endings, e.g., with Vim, run the following command before (re)saving:
:set fileformat=unix
If you have a version of the sed utility that supports the -i or --in-place option, e.g., GNU sed, you could run the following command to strip trailing carriage returns:
sed -i 's/\r$//' filename
With other versions of sed, you could use output redirection to write to a new file. Be sure to use a different filename for the redirection target (it can be renamed later).
sed 's/\r$//' filename > filename.unix
Similarly, the tr translation filter can be used to delete unwanted characters from its input:
tr -d '\r' <filename >filename.unix
Cygwin Bash
With the Bash port for Cygwin, there’s a custom igncr option that can be set to ignore the Carriage Return in line endings (presumably because many of its users use native Windows programs to edit their text files).
This can be enabled for the current shell by running set -o igncr.
Setting this option applies only to the current shell process so it can be useful when sourcing files with extraneous carriage returns. If you regularly encounter shell scripts with DOS line endings and want this option to be set permanently, you could set an environment variable called SHELLOPTS (all capital letters) to include igncr. This environment variable is used by Bash to set shell options when it starts (before reading any startup files).
Useful utilities
The file utility is useful for quickly seeing which line endings are used in a text file. Here’s what it prints for for each file type:
Unix line endings: Bourne-Again shell script, ASCII text executable
Mac line endings: Bourne-Again shell script, ASCII text executable, with CR line terminators
DOS line endings: Bourne-Again shell script, ASCII text executable, with CRLF line terminators
The GNU version of the cat utility has a -v, --show-nonprinting option that displays non-printing characters.
The dos2unix utility is specifically written for converting text files between Unix, Mac and DOS line endings.
Useful links
Wikipedia has an excellent article covering the many different ways of marking the end of a line of text, the history of such encodings and how newlines are treated in different operating systems, programming languages and Internet protocols (e.g., FTP).
Files with classic Mac OS line endings
With Classic Mac OS (pre-OS X), each line was terminated with a Carriage Return (decimal 13, hex 0D in ASCII). If a script file was saved with such line endings, Bash would only see one long line like so:
#!/bin/bash^M^Mcd "src"^Mnpm install^M^Mcd ..^M./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &^M
Since this single long line begins with an octothorpe (#), Bash treats the line (and the whole file) as a single comment.
Note: In 2001, Apple launched Mac OS X which was based on the BSD-derived NeXTSTEP operating system. As a result, OS X also uses Unix-style LF-only line endings and since then, text files terminated with a CR have become extremely rare. Nevertheless, I think it’s worthwhile to show how Bash would attempt to interpret such files.
On JetBrains products (PyCharm, PHPStorm, IDEA, etc.), you'll need to click on CRLF/LF to toggle between the two types of line separators (\r\n and \n).
I was trying to startup my docker container from Windows and got this:
Bash script and /bin/bash^M: bad interpreter: No such file or directory
I was using git bash and the problem was about the git config, then I just did the steps below and it worked. It will configure Git to not convert line endings on checkout:
git config --global core.autocrlf input
delete your local repository
clone it again.
Many thanks to Jason Harmon in this link:
https://forums.docker.com/t/error-while-running-docker-code-in-powershell/34059/6
Before that, I tried this, that didn't works:
dos2unix scriptname.sh
sed -i -e 's/\r$//' scriptname.sh
sed -i -e 's/^M$//' scriptname.sh
If you're using the read command to read from a file (or pipe) that is (or might be) in DOS/Windows format, you can take advantage of the fact that read will trim whitespace from the beginning and ends of lines. If you tell it that carriage returns are whitespace (by adding them to the IFS variable), it'll trim them from the ends of lines.
In bash (or zsh or ksh), that means you'd replace this standard idiom:
IFS= read -r somevar # This will not trim CR
with this:
IFS=$'\r' read -r somevar # This *will* trim CR
(Note: the -r option isn't related to this, it's just usually a good idea to avoid mangling backslashes.)
If you're not using the IFS= prefix (e.g. because you want to split the data into fields), then you'd replace this:
read -r field1 field2 ... # This will not trim CR
with this:
IFS=$' \t\n\r' read -r field1 field2 ... # This *will* trim CR
If you're using a shell that doesn't support the $'...' quoting mode (e.g. dash, the default /bin/sh on some Linux distros), or your script even might be run with such a shell, then you need to get a little more complex:
cr="$(printf '\r')"
IFS="$cr" read -r somevar # Read trimming *only* CR
IFS="$IFS$cr" read -r field1 field2 ... # Read trimming CR and whitespace, and splitting fields
Note that normally, when you change IFS, you should put it back to normal as soon as possible to avoid weird side effects; but in all these cases, it's a prefix to the read command, so it only affects that one command and doesn't have to be reset afterward.
Coming from a duplicate, if the problem is that you have files whose names contain ^M at the end, you can rename them with
for f in *$'\r'; do
mv "$f" "${f%$'\r'}"
done
You properly want to fix whatever caused these files to have broken names in the first place (probably a script which created them should be dos2unixed and then rerun?) but sometimes this is not feasible.
The $'\r' syntax is Bash-specific; if you have a different shell, maybe you need to use some other notation. Perhaps see also Difference between sh and bash
Since VS Code is being used, we can see CRLF or LF in the bottom right depending on what's being used and if we click on it we can change between them (LF is being used in below example):
We can also use the "Change End of Line Sequence" command from the command pallet. Whatever's easier to remember since they're functionally the same.
One more way to get rid of the unwanted CR ('\r') character is to run the tr command, for example:
$ tr -d '\r' < dosScript.py > nixScript.py
I ran into this issue when I use git with WSL.
git has a feature where it changes the line-ending of files according to the OS you are using, on Windows it make sure the line endings are \r\n which is not compatible with Linux which uses only \n.
You can resolve this problem by adding a file name .gitattributes to your git root directory and add lines as following:
config/* text eol=lf
run.sh text eol=lf
In this example all files inside config directory will have only line-feed line ending and run.sh file as well.
For Notepad++ users, this can be solved by:
The simplest way on MAC / Linux - create a file using 'touch' command, open this file with VI or VIM editor, paste your code and save. This would automatically remove the windows characters.
If you are using a text editor like BBEdit you can do it at the status bar. There is a selection where you can switch.
For IntelliJ users, here is the solution for writing Linux script.
Use LF - Unix and masOS (\n)
Scripts may call each other.
An even better magic solution is to convert all scripts in the folder/subfolders:
find . -name "*.sh" -exec sed -i -e 's/\r$//' {} +
You can use dos2unix too but many servers do not have it installed by default.
For the sake of completeness, I'll point out another solution which can solve this problem permanently without the need to run dos2unix all the time:
sudo ln -s /bin/bash `printf 'bash\r'`

GNU join overwriting output columns [duplicate]

I am making an NW.js app on macOS, and want to run the app in dev mode
by double-clicking on an icon.
In the first step, I'm trying to make my shell script work.
Using VS Code on Windows (I wanted to gain time), I have created a run-nw file at the root of my project, containing this:
#!/bin/bash
cd "src"
npm install
cd ..
./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &
but I get this output:
$ sh ./run-nw
: command not found
: No such file or directory
: command not found
: No such file or directory
Usage: npm <command>
where <command> is one of: (snip commands list)
(snip npm help)
npm#3.10.3 /usr/local/lib/node_modules/npm
: command not found
: No such file or directory
: command not found
Some things I don't understand.
It seems that it takes empty lines as commands.
In my editor (VS Code) I have tried to replace \r\n with \n
(in case the \r creates problems) but it changes nothing.
It seems that it doesn't find the folders
(with or without the dirname instruction),
or maybe it doesn't know about the cd command ?
It seems that it doesn't understand the install argument to npm.
The part that really weirds me out, is that it still runs the app
(if I did an npm install manually)...
Not able to make it work properly, and suspecting something weird with
the file itself, I created a new one directly on the Mac, using vim this time.
I entered the exact same instructions, and... now it works without any
issues.
A diff on the two files reveals exactly zero difference.
What can be the difference? What can make the first script not work? How can I find out?
Update
Following the accepted answer's recommendations, after the wrong line
endings came back, I checked multiple things.
It turns out that since I copied my ~/.gitconfig from my Windows
machine, I had autocrlf=true, so every time I modified the bash
file under Windows, it re-set the line endings to \r\n.
So, in addition to running dos2unix (which you will have to
install using Homebrew on a Mac), if you're using Git, check your
.gitconfig file.
Yes. Bash scripts are sensitive to line-endings, both in the script itself and in data it processes. They should have Unix-style line-endings, i.e., each line is terminated with a Line Feed character (decimal 10, hex 0A in ASCII).
DOS/Windows line endings in the script
With Windows or DOS-style line endings , each line is terminated with a Carriage Return followed by a Line Feed character. You can see this otherwise invisible character in the output of cat -v yourfile:
$ cat -v yourfile
#!/bin/bash^M
^M
cd "src"^M
npm install^M
^M
cd ..^M
./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &^M
In this case, the carriage return (^M in caret notation or \r in C escape notation) is not treated as whitespace. Bash interprets the first line after the shebang (consisting of a single carriage return character) as the name of a command/program to run.
Since there is no command named ^M, it prints : command not found
Since there is no directory named "src"^M (or src^M), it prints : No such file or directory
It passes install^M instead of install as an argument to npm which causes npm to complain.
DOS/Windows line endings in input data
Like above, if you have an input file with carriage returns:
hello^M
world^M
then it will look completely normal in editors and when writing it to screen, but tools may produce strange results. For example, grep will fail to find lines that are obviously there:
$ grep 'hello$' file.txt || grep -x "hello" file.txt
(no match because the line actually ends in ^M)
Appended text will instead overwrite the line because the carriage returns moves the cursor to the start of the line:
$ sed -e 's/$/!/' file.txt
!ello
!orld
String comparison will seem to fail, even though strings appear to be the same when writing to screen:
$ a="hello"; read b < file.txt
$ if [[ "$a" = "$b" ]]
then echo "Variables are equal."
else echo "Sorry, $a is not equal to $b"
fi
Sorry, hello is not equal to hello
Solutions
The solution is to convert the file to use Unix-style line endings. There are a number of ways this can be accomplished:
This can be done using the dos2unix program:
dos2unix filename
Open the file in a capable text editor (Sublime, Notepad++, not Notepad) and configure it to save files with Unix line endings, e.g., with Vim, run the following command before (re)saving:
:set fileformat=unix
If you have a version of the sed utility that supports the -i or --in-place option, e.g., GNU sed, you could run the following command to strip trailing carriage returns:
sed -i 's/\r$//' filename
With other versions of sed, you could use output redirection to write to a new file. Be sure to use a different filename for the redirection target (it can be renamed later).
sed 's/\r$//' filename > filename.unix
Similarly, the tr translation filter can be used to delete unwanted characters from its input:
tr -d '\r' <filename >filename.unix
Cygwin Bash
With the Bash port for Cygwin, there’s a custom igncr option that can be set to ignore the Carriage Return in line endings (presumably because many of its users use native Windows programs to edit their text files).
This can be enabled for the current shell by running set -o igncr.
Setting this option applies only to the current shell process so it can be useful when sourcing files with extraneous carriage returns. If you regularly encounter shell scripts with DOS line endings and want this option to be set permanently, you could set an environment variable called SHELLOPTS (all capital letters) to include igncr. This environment variable is used by Bash to set shell options when it starts (before reading any startup files).
Useful utilities
The file utility is useful for quickly seeing which line endings are used in a text file. Here’s what it prints for for each file type:
Unix line endings: Bourne-Again shell script, ASCII text executable
Mac line endings: Bourne-Again shell script, ASCII text executable, with CR line terminators
DOS line endings: Bourne-Again shell script, ASCII text executable, with CRLF line terminators
The GNU version of the cat utility has a -v, --show-nonprinting option that displays non-printing characters.
The dos2unix utility is specifically written for converting text files between Unix, Mac and DOS line endings.
Useful links
Wikipedia has an excellent article covering the many different ways of marking the end of a line of text, the history of such encodings and how newlines are treated in different operating systems, programming languages and Internet protocols (e.g., FTP).
Files with classic Mac OS line endings
With Classic Mac OS (pre-OS X), each line was terminated with a Carriage Return (decimal 13, hex 0D in ASCII). If a script file was saved with such line endings, Bash would only see one long line like so:
#!/bin/bash^M^Mcd "src"^Mnpm install^M^Mcd ..^M./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &^M
Since this single long line begins with an octothorpe (#), Bash treats the line (and the whole file) as a single comment.
Note: In 2001, Apple launched Mac OS X which was based on the BSD-derived NeXTSTEP operating system. As a result, OS X also uses Unix-style LF-only line endings and since then, text files terminated with a CR have become extremely rare. Nevertheless, I think it’s worthwhile to show how Bash would attempt to interpret such files.
On JetBrains products (PyCharm, PHPStorm, IDEA, etc.), you'll need to click on CRLF/LF to toggle between the two types of line separators (\r\n and \n).
I was trying to startup my docker container from Windows and got this:
Bash script and /bin/bash^M: bad interpreter: No such file or directory
I was using git bash and the problem was about the git config, then I just did the steps below and it worked. It will configure Git to not convert line endings on checkout:
git config --global core.autocrlf input
delete your local repository
clone it again.
Many thanks to Jason Harmon in this link:
https://forums.docker.com/t/error-while-running-docker-code-in-powershell/34059/6
Before that, I tried this, that didn't works:
dos2unix scriptname.sh
sed -i -e 's/\r$//' scriptname.sh
sed -i -e 's/^M$//' scriptname.sh
If you're using the read command to read from a file (or pipe) that is (or might be) in DOS/Windows format, you can take advantage of the fact that read will trim whitespace from the beginning and ends of lines. If you tell it that carriage returns are whitespace (by adding them to the IFS variable), it'll trim them from the ends of lines.
In bash (or zsh or ksh), that means you'd replace this standard idiom:
IFS= read -r somevar # This will not trim CR
with this:
IFS=$'\r' read -r somevar # This *will* trim CR
(Note: the -r option isn't related to this, it's just usually a good idea to avoid mangling backslashes.)
If you're not using the IFS= prefix (e.g. because you want to split the data into fields), then you'd replace this:
read -r field1 field2 ... # This will not trim CR
with this:
IFS=$' \t\n\r' read -r field1 field2 ... # This *will* trim CR
If you're using a shell that doesn't support the $'...' quoting mode (e.g. dash, the default /bin/sh on some Linux distros), or your script even might be run with such a shell, then you need to get a little more complex:
cr="$(printf '\r')"
IFS="$cr" read -r somevar # Read trimming *only* CR
IFS="$IFS$cr" read -r field1 field2 ... # Read trimming CR and whitespace, and splitting fields
Note that normally, when you change IFS, you should put it back to normal as soon as possible to avoid weird side effects; but in all these cases, it's a prefix to the read command, so it only affects that one command and doesn't have to be reset afterward.
Coming from a duplicate, if the problem is that you have files whose names contain ^M at the end, you can rename them with
for f in *$'\r'; do
mv "$f" "${f%$'\r'}"
done
You properly want to fix whatever caused these files to have broken names in the first place (probably a script which created them should be dos2unixed and then rerun?) but sometimes this is not feasible.
The $'\r' syntax is Bash-specific; if you have a different shell, maybe you need to use some other notation. Perhaps see also Difference between sh and bash
Since VS Code is being used, we can see CRLF or LF in the bottom right depending on what's being used and if we click on it we can change between them (LF is being used in below example):
We can also use the "Change End of Line Sequence" command from the command pallet. Whatever's easier to remember since they're functionally the same.
One more way to get rid of the unwanted CR ('\r') character is to run the tr command, for example:
$ tr -d '\r' < dosScript.py > nixScript.py
I ran into this issue when I use git with WSL.
git has a feature where it changes the line-ending of files according to the OS you are using, on Windows it make sure the line endings are \r\n which is not compatible with Linux which uses only \n.
You can resolve this problem by adding a file name .gitattributes to your git root directory and add lines as following:
config/* text eol=lf
run.sh text eol=lf
In this example all files inside config directory will have only line-feed line ending and run.sh file as well.
For Notepad++ users, this can be solved by:
The simplest way on MAC / Linux - create a file using 'touch' command, open this file with VI or VIM editor, paste your code and save. This would automatically remove the windows characters.
If you are using a text editor like BBEdit you can do it at the status bar. There is a selection where you can switch.
For IntelliJ users, here is the solution for writing Linux script.
Use LF - Unix and masOS (\n)
Scripts may call each other.
An even better magic solution is to convert all scripts in the folder/subfolders:
find . -name "*.sh" -exec sed -i -e 's/\r$//' {} +
You can use dos2unix too but many servers do not have it installed by default.
For the sake of completeness, I'll point out another solution which can solve this problem permanently without the need to run dos2unix all the time:
sudo ln -s /bin/bash `printf 'bash\r'`

Grep returns no such file or directory when using multiple flags

I am a beginner of bash script. I just started to write a script where it checks the contents of b.txt can all be found in a.txt. (line by line preferably). My code is as following:
grep -Ffw b.txt a.txt
As you can see, I want to do fixed string instead of REGEX, I want to check everything from the b.txt file, because there are some strings inside the b.txt and I want to check if all of them exist in a.txt. And I also want to match the whole word only of course. So these are the requirements, however when I run this command it returns me an error says: grep: w: No such file or directory
I am thinking that maybe there are some limitations of the flags in bash? Sorry I am not really familiar with the language, didn't read much about the MAN page etc. If anyone could help me to solve the puzzle it would be appreciated :) In addition, i think if possible I would like to add a -q to surpress the output when there is a match also, right now I didn't add it in the example since it couldn't make it through with 3 flags even. So can anyone give me some hints here? Thanks in advance!
Hereby some explanation from the manpage:
OPTIONS
Generic Program Information
...
-F, --fixed-strings
Interpret PATTERNS as fixed strings, ...
-f FILE, --file=FILE
Obtain patterns from FILE, ...
-w, --word-regexp
Select only those lines ...
As you can see, the options -F and -w are indicated ending immediately (hence the comma in -F, and -w,), but the -f switch is followed by FILE, with means they belong together.
I you want to preserve the order Ffw, that's possible, but then you need to do something like:
grep -Ff b.txt -w a.txt
As mentioned by #kvantour, the solution is simply placing the -f before the b.txt file. grep -Fwf b.txt a.txt
Should have thought it when it says 'no such file or directory' as it was a clear indication that flags after the -f were treated as the path already.

How to grep for two words existing on the same line? [duplicate]

This question already has answers here:
Match two strings in one line with grep
(23 answers)
Closed 3 years ago.
How do I grep for lines that contain two input words on the line? I'm looking for lines that contain both words, how do I do that? I tried pipe like this:
grep -c "word1" | grep -r "word2" logs
It just stucks after the first pipe command.
Why?
Why do you pass -c? That will just show the number of matches. Similarly, there is no reason to use -r. I suggest you read man grep.
To grep for 2 words existing on the same line, simply do:
grep "word1" FILE | grep "word2"
grep "word1" FILE will print all lines that have word1 in them from FILE, and then grep "word2" will print the lines that have word2 in them. Hence, if you combine these using a pipe, it will show lines containing both word1 and word2.
If you just want a count of how many lines had the 2 words on the same line, do:
grep "word1" FILE | grep -c "word2"
Also, to address your question why does it get stuck : in grep -c "word1", you did not specify a file. Therefore, grep expects input from stdin, which is why it seems to hang. You can press Ctrl+D to send an EOF (end-of-file) so that it quits.
Prescription
One simple rewrite of the command in the question is:
grep "word1" logs | grep "word2"
The first grep finds lines with 'word1' from the file 'logs' and then feeds those into the second grep which looks for lines containing 'word2'.
However, it isn't necessary to use two commands like that. You could use extended grep (grep -E or egrep):
grep -E 'word1.*word2|word2.*word1' logs
If you know that 'word1' will precede 'word2' on the line, you don't even need the alternatives and regular grep would do:
grep 'word1.*word2' logs
The 'one command' variants have the advantage that there is only one process running, and so the lines containing 'word1' do not have to be passed via a pipe to the second process. How much this matters depends on how big the data file is and how many lines match 'word1'. If the file is small, performance isn't likely to be an issue and running two commands is fine. If the file is big but only a few lines contain 'word1', there isn't going to be much data passed on the pipe and using two command is fine. However, if the file is huge and 'word1' occurs frequently, then you may be passing significant data down the pipe where a single command avoids that overhead. Against that, the regex is more complex; you might need to benchmark it to find out what's best — but only if performance really matters. If you run two commands, you should aim to select the less frequently occurring word in the first grep to minimize the amount of data processed by the second.
Diagnosis
The initial script is:
grep -c "word1" | grep -r "word2" logs
This is an odd command sequence. The first grep is going to count the number of occurrences of 'word1' on its standard input, and print that number on its standard output. Until you indicate EOF (e.g. by typing Control-D), it will sit there, waiting for you to type something. The second grep does a recursive search for 'word2' in the files underneath directory logs (or, if it is a file, in the file logs). Or, in my case, it will fail since there's neither a file nor a directory called logs where I'm running the pipeline. Note that the second grep doesn't read its standard input at all, so the pipe is superfluous.
With Bash, the parent shell waits until all the processes in the pipeline have exited, so it sits around waiting for the grep -c to finish, which it won't do until you indicate EOF. Hence, your code seems to get stuck. With Heirloom Shell, the second grep completes and exits, and the shell prompts again. Now you have two processes running, the first grep and the shell, and they are both trying to read from the keyboard, and it is not determinate which one gets any given line of input (or any given EOF indication).
Note that even if you typed data as input to the first grep, you would only get any lines that contain 'word2' shown on the output.
Footnote:
At one time, the answer used:
grep -E 'word1.*word2|word2.*word1' "$#"
grep 'word1.*word2' "$#"
This triggered the comments below.
you could use awk. like this...
cat <yourFile> | awk '/word1/ && /word2/'
Order is not important. So if you have a file and...
a file named , file1 contains:
word1 is in this file as well as word2
word2 is in this file as well as word1
word4 is in this file as well as word1
word5 is in this file as well as word2
then,
/tmp$ cat file1| awk '/word1/ && /word2/'
will result in,
word1 is in this file as well as word2
word2 is in this file as well as word1
yes, awk is slower.
The main issue is that you haven't supplied the first grep with any input. You will need to reorder your command something like
grep "word1" logs | grep "word2"
If you want to count the occurences, then put a '-c' on the second grep.
git grep
Here is the syntax using git grep combining multiple patterns using Boolean expressions:
git grep -e pattern1 --and -e pattern2 --and -e pattern3
The above command will print lines matching all the patterns at once.
If the files aren't under version control, add --no-index param.
Search files in the current directory that is not managed by Git.
Check man git-grep for help.
See also:
How to use grep to match string1 AND string2?
Check if all of multiple strings or regexes exist in a file.
How to run grep with multiple AND patterns?
For multiple patterns stored in the file, see: Match all patterns from file at once.
You cat try with below command
cat log|grep -e word1 -e word2
Use grep:
grep -wE "string1|String2|...." file_name
Or you can use:
echo string | grep -wE "string1|String2|...."

How to make grep stop at first match on a line?

Well, I have a file test.txt
#test.txt
odsdsdoddf112 test1_for_grep
dad23392eeedJ test2 for grep
Hello World test
garbage
I want to extract strings which have got a space after them. I used following expression and it worked
grep -o [[:alnum:]]*.[[:blank:]] test.txt
Its output is
odsdsdoddf112
dad23392eeedJ
test2
for
Hello
World
But problem is grep prints all the strings that have got space after them, where as I want it to stop after first match on a line and then proceed to second line.
Which expression should I use here, in order to make it stop after first match and move to next line?
This problem may be solved with gawk or some other tool, but I will appreciate a solution which uses grep only.
Edit
I using GNU grep 2.5.1 on a Linux system, if that is relevant.
Edit
With the help of the answers given below, I tried my luck with
grep -o ^[[:alnum:]]* test.txt
grep -Eo ^[[:alnum:]]+ test.txt
and both gave me correct answers.
Now what surprises me is that I tried using
grep -Eo "^[[:alnum:]]+[[:blank:]]" test.txt
as suggested here but didn't get the correct answer.
Here is the output on my terminal
odsdsdoddf112
dad23392eeedJ
test2
for
Hello
World
But comments from RichieHindle and Adrian Pronk, shows that they got correct output on their systems. Anyone with some idea that why I too am not getting the same result on my system. Any idea? Any help will be appreciated.
Edit
Well, it seems that grep 2.5.1 has some bug because of which my output wasn't correct. I installed grep 2.5.4, now it is working correctly. Please see this link for details.
If you're sure you have no leading whitespace, add a ^ to match only at the start of a line, and change the * to a + to match only when you have one or more alphanumeric characters. (That means adding -E to use extended regular expressions).
grep -Eo "^[[:alnum:]]+[[:blank:]]" test.txt
(I also removed the . from the middle; I'm not sure what that was doing there?)
As the questioner discovered, this is a bug in versions of GNU grep prior to 2.5.3. The bug allows a caret to match after the end of a previous match, not just at beginning of line.
This bug is still present in other versions of grep, for instance in Mac OS X 10.9.4.
There isn't a universal workaround, but in the some examples, like non-spaces followed by a space, you can often get the desired behavior by leaving off the delimiter. That is, search for '[^ ]*' rather than '[^ ]* '.
grep -oe "^[^ ]* " test.txt
If we want to extract all meaningful input before garbage and actually stop on first match then -B NUM, --before-context=NUM option may be useful to "print NUM lines of leading context before matching lines".
Example:
grep --before-context=999999 "Hello World test"

Resources