Search through *.gz files while keeping the file name - grep

Say I have multiple .gz files and want to search them for a keyword. I can do this by piping the output of zcat to grep like this:
zcat some.file.* | grep "keyword_1" | ... | grep "keyword_n"
The output of this command, though, is just the matching lines, without the file name. Is there any way to attach the file name to the zcat output?

Try zgrep instead of zcat:
zgrep -H keyword some.file.*
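If you need extended regular expressions such as alternation, make zgrep use egrep through the GREP environment variable (on many systems, passing -E to zgrep has the same effect):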
export GREP=egrep
zgrep -H -e "(keyword1|keyword2)" some.file.*
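The question chains several greps, which zgrep with a single pattern does not cover directly. A minimal sketch of both cases follows, assuming a POSIX shell with gzip and zgrep installed; the sample files and their contents are made up for illustration, and gzip -dc is used in place of zcat because some systems' zcat only handles .Z files.

```shell
# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
printf 'alpha keyword_1 keyword_n\nbeta keyword_1\n' | gzip > "$tmp/some.file.1.gz"
printf 'gamma keyword_n\n' | gzip > "$tmp/some.file.2.gz"

# One pattern: zgrep -H prefixes each matching line with its file name.
single=$(cd "$tmp" && zgrep -H keyword_n some.file.*)

# Several chained patterns: decompress file by file, filter,
# then prepend the file name to whatever survives the filters.
chained=$(cd "$tmp" && for f in some.file.*; do
    gzip -dc "$f" | grep 'keyword_1' | grep 'keyword_n' | sed "s|^|$f:|"
done)

printf '%s\n' "$single" "$chained"
rm -rf "$tmp"
```

The sed prefix assumes file names without | or & characters; for arbitrary names, an awk -v name="$f" prefix would be a safer choice.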

Related

show filename with matching word from grep only

I am trying to find which words occur in log files and show the log file name for anything that matches the following pattern:
'BA10\|BA20\|BA21\|BA30\|BA31\|BA00'
so if file dummylogfile.log contains BA10002 I would like to get a result such as:
dummylogfile.log:BA10002
it is totally fine if the logfile shows up twice for duplicate matches.
the closest I got is:
for f in $(find . -name '*.err' -exec grep -l 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' {} \+);do printf $f;printf ':';grep -o 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' $f;done
but this gives things like:
./register-05-14-11-53-59_24154.err:BA10
BA10
./register_mdw_files_2020-05-14-11-54-32_24429.err:BA10
BA10
./process_tables.2020-05-18-11-18-09_11428.err:BA30
./status_load_2020-05-18-11-35-31_9185.err:BA30
so,
1) there are empty lines with only the second match and
2) the full match (e.g., BA10004) is not shown.
thanks for the help
There are a couple of options you can pass to grep:
-H: print the file name in front of each match
-o: show only the match, not the full line
-w: the match must form a whole word (a string built from [A-Za-z0-9_])
Your regex uses short fragments such as BA01. A fragment like that matches only those four characters, and it can match anywhere in the text, including mid-word. If you want the regex to match the full word, it should read BA01[[:alnum:]_]*, which appends any sequence of word-constituent characters (equivalent to [A-Za-z0-9_]). You can test this with
$ echo "foo BA01234 barBA012" | grep -Ho "BA01"
(standard input):BA01
(standard input):BA01
$ echo "foo BA01234 barBA012" | grep -How "BA01"
$ echo "foo BA01234 barBA012" | grep -How "BA01[[:alnum:]_]*"
(standard input):BA01234
So your grep should look like
grep -How '\(BA10\|BA20\|BA21\|BA30\|BA31\|BA00\)[[:alnum:]_]*' *.err
From your example it seems that all files are in one directory. In that case the following lists the names of the matching files right away:
grep -l 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' *.err
If the files are in different directories:
find . -name '*.err' -print | xargs -I {} grep 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' {} /dev/null
Explanation: adding /dev/null as a second file name after {} forces grep to report the matching file name, because grep prefixes file names whenever it is given more than one file.
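To get the file name together with the full matching word across several directories, the options discussed above can be combined with find. This is only a sketch: the directory layout and log contents are invented, and it uses -E (extended regex) so the alternation needs no backslashes.

```shell
# Hypothetical log files, one of them in a subdirectory.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf 'error BA10002 happened\nnothing here\n' > "$tmp/dummylogfile.err"
printf 'midwordBA20001\nwarn BA30007 and BA31001\n' > "$tmp/sub/other.err"

# -H: always print the file name, -o: print only the match,
# -w: the match must be a whole word, -E: extended regex syntax.
matches=$(cd "$tmp" && find . -name '*.err' \
    -exec grep -HowE '(BA10|BA20|BA21|BA30|BA31|BA00)[[:alnum:]_]*' {} + | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

The line midwordBA20001 produces no output because -w rejects a match that starts in the middle of a word.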

grep mistaking pattern for file?

cat file.txt | grep -x "\d*"
grep: \Documents and Settings: Is a directory
I want to search file.txt for any lines that are numbers only but grep seems to be viewing \d* as a wildcard for files and not the pattern. How can I specify that it's the pattern and it should use stdin for what to grep over?
The file is full of lines of datetime stamps, some end with a letter, some don't.
20140110122200
20131208041510M
...
I'm trying to only get the lines that don't end in a letter.
EDIT: I've also tried setting the filename instead of piping it with cat. Not much different.
C:\long\path>grep -ex "\d*" -f file.txt
grep: \Dell: Is a directory
grep: \Documents and Settings: Is a directory
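Why are you using cat to pass the file to grep? Why not just give grep the file name directly?
grep -x '[0-9]*' file.txt
I think the actual problem you're seeing is that the * wildcard is being expanded into file names before grep treats the argument as a pattern. That's why grep is giving you errors that mention actual directories (beginning with 'd') on your system. Note also that \d is not a digit class in grep's default regex syntax, so the pattern above uses [0-9] instead.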
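For reference, here is a small sketch of the digits-only filter on the two sample lines from the question. It assumes a POSIX shell, where single quotes stop wildcard expansion; quoting rules in the Windows command prompt differ, so this may not carry over unchanged.

```shell
# Sample lines taken from the question, written to a throwaway file.
tmp=$(mktemp -d)
printf '20140110122200\n20131208041510M\n' > "$tmp/file.txt"

# -E: extended regex so + means "one or more";
# -x: the whole line must match;
# -e: marks the next argument explicitly as the pattern.
digits_only=$(grep -Ex -e '[0-9]+' "$tmp/file.txt")

printf '%s\n' "$digits_only"
rm -rf "$tmp"
```

Using + rather than * keeps empty lines out of the result, since -x '[0-9]*' would also accept a line with no characters at all.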

Using grep to find a string that starts with a character with numbers after

Okay I have a file that contains numbers like this:
L21479
What I am trying to do is use grep (or a similar tool) to find all the strings in a file that have the format:
L#####
Each # is a digit, so an L followed by 5 digits.
Is this even possible in grep? Should I load the file and perform regex?
You can do this with grep, for example with the following command:
grep -E -o 'L[0-9]{5}' name_of_file
For example, given a file with the text:
kasdhflkashl143112343214L232134614
3L1431413543454L2342L3523269ufoidu
gl9983ugsdu8768IUHI/(JHKJASHD/(888
The command above will output:
L23213
L14314
L35232
If it is just in a single file, you can do something along the lines of:
grep -E 'L[0-9]{5}' filename
If you need to search all files in a directory for these strings:
find . -type f | xargs grep -E 'L[0-9]{5}'
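The choice between -e and -E matters for this pattern: in basic regex syntax the braces are literal characters, while in extended syntax they form a repetition count. The sketch below shows the difference on the sample text from the answer above; the temporary file name is made up, and the -print0 / -0 pair is an addition to cope with spaces in file names.

```shell
# Sample text from the answer above, in a throwaway file.
tmp=$(mktemp -d)
cat > "$tmp/sample.txt" <<'EOF'
kasdhflkashl143112343214L232134614
3L1431413543454L2342L3523269ufoidu
gl9983ugsdu8768IUHI/(JHKJASHD/(888
EOF

# Basic regex (the default, also with -e): {5} is literal, so no line matches.
# grep exits non-zero when nothing matches, hence the "|| true".
bre_lines=$(grep -c -e 'L[0-9]{5}' "$tmp/sample.txt" || true)

# Extended regex: {5} means "exactly five of the preceding item".
ere_matches=$(grep -E -o 'L[0-9]{5}' "$tmp/sample.txt")

# Directory-wide search that tolerates spaces in file names.
found=$(cd "$tmp" && find . -type f -print0 | xargs -0 grep -E -o 'L[0-9]{5}' | wc -l | tr -d ' ')

printf '%s\n' "$bre_lines" "$ere_matches" "$found"
rm -rf "$tmp"
```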

grep with --include and --exclude

I want to search for a string foo within the app directory, but excluding any file which contains migrations in the file name. I expected this grep command to work
grep -Ir --include "*.py" --exclude "*migrations*" foo app/
The above command seems to ignore the --exclude filter. As an alternative, I can do
grep -Ir --include "*.py" foo app/ | grep -v migrations
This works, but this loses highlighting of foo in the results. I can also bring find into the mix and keep my highlighting.
find app/ -name "*.py" -print0 | xargs -0 grep --exclude "*migrations*" foo
I'm just wondering if I'm missing something about the combination of command line parameters to grep or if they simply don't work together.
I was looking for a term in .py files but didn't want migration files to be scanned. What I found to work (for grep 2.10) was the following (I hope this helps):
grep -nR --include="*.py" --exclude-dir=migrations whatever_you_are_looking_for .
man grep says:
--include=GLOB
Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).
Because it says "only" there, I'm guessing that your --include statement is overriding your --exclude statement.
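A quick way to check how --include and --exclude-dir interact is to try them on a tiny tree. The layout below is an assumption (the question does not say whether "migrations" is a directory name or part of each file name); here it is a directory, which is the case --exclude-dir handles.

```shell
# Hypothetical project layout with a migrations directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/app/migrations"
printf 'foo = 1\n' > "$tmp/app/models.py"
printf 'foo = 2\n' > "$tmp/app/migrations/0001_initial.py"
printf 'foo\n' > "$tmp/app/notes.txt"

# --include limits the search to *.py base names,
# --exclude-dir skips any directory named migrations.
hits=$(cd "$tmp" && grep -Irl --include='*.py' --exclude-dir=migrations foo app)

printf '%s\n' "$hits"
rm -rf "$tmp"
```

Since --exclude is matched against the base name of each file, a glob like "*migrations*" would not skip 0001_initial.py here; only the directory-level option does.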

use grep to return a list of files, given multiple keywords (like google returns a list of webpages)

I need to find ALL files that have multiple keywords anywhere in the file (not necessarily on the same line), given a starting directory like ~/. Does "grep -ro" do this?
(I'm using Unix, Mac OSX 10.4)
You can use the -l option to get a list of filenames with matches, so it's just a matter of finding all of the files that have the first keyword and then filtering that list down to the files that also have the second keyword:
grep -rl first_keyword basedir | xargs grep -l second_keyword
To search just *.txt files:
find ~/. -name "*.txt" | xargs grep -l first_keyword | xargs grep -l second_keyword
Thanks Adam!
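The same two-stage filter can be made safe for file names containing spaces by passing NUL-separated names between the stages. This is a sketch on invented sample files; --null on grep and -0 on xargs are assumed to be available, which holds for the GNU and BSD tools.

```shell
# Hypothetical files: two contain both keywords, one contains only the first.
tmp=$(mktemp -d)
printf 'first_keyword\nsecond_keyword\n' > "$tmp/a.txt"
printf 'first_keyword only\n' > "$tmp/b.txt"
printf 'second_keyword\nthen first_keyword\n' > "$tmp/c d.txt"

# Stage 1: list files containing the first keyword, NUL-terminated.
# Stage 2: keep only those that also contain the second keyword.
both=$(cd "$tmp" && grep -rl --null first_keyword . \
    | xargs -0 grep -l second_keyword | LC_ALL=C sort)

printf '%s\n' "$both"
rm -rf "$tmp"
```

Further keywords can be added by repeating the pattern: give every stage except the last the --null option and follow it with another xargs -0 grep -l stage.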
