How to grep out a batch of consecutive lines from a file - grep

I want to grep out a batch of consecutive lines, that starts at a specific pattern and ends at specific pattern.
E.g. the content of file looks like this :
line 1
line 2
.
.
.
my_start_pattern
.
.
.
my_end_pattern
.
.
.
line n
The output of grep should look like following :
my_start_pattern
.
.
.
my_end_pattern
Thanks.

Don't know if grep can do this, but awk can.
awk '/start pattern/,/end pattern/' data_file_name
(leave off the file name if you want to filter from stdin)

Related

show filename with matching word from grep only

I am trying to find which words happened in logfiles plus show the logfilename for anything that matches following pattern:
'BA10\|BA20\|BA21\|BA30\|BA31\|BA00'
so if file dummylogfile.log contains BA10002 I would like to get a result such as:
dummylogfile.log:BA10002
it is totally fine if the logfile shows up twice for duplicate matches.
the closest I got is:
for f in $(find . -name '*.err' -exec grep -l 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' {} \+);do printf $f;printf ':';grep -o 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' $f;done
but this gives things like:
./register-05-14-11-53-59_24154.err:BA10
BA10
./register_mdw_files_2020-05-14-11-54-32_24429.err:BA10
BA10
./process_tables.2020-05-18-11-18-09_11428.err:BA30
./status_load_2020-05-18-11-35-31_9185.err:BA30
so,
1) there are empty lines with only the second match and
2) the full match (e.g., BA10004) is not shown.
thanks for the help
There are a couple of options you can pass to grep:
-H: This will report the filename and the match
-o: only show the match, not the full line
-w: The match must represent a full word (string build from [A-Za-z0-9_])
If we look at your regex, you use BA01, this will match only BA01 which can appear anywhere in the text, also mid word. If you want the regex to match a full word, it should read BA01[[:alnum:]_]* which adds any sequence of word-constituent characters (equivalent to [A-Za-z0-9_]). You can test this with
$ echo "foo BA01234 barBA012" | grep -Ho "BA01"
(standard input):BA01
(standard input):BA01
$ echo "foo BA01234 barBA012" | grep -How "BA01"
$ echo "foo BA01234 barBA012" | grep -How "BA01[[:alnum:]_]*"
(standard input):BA01234
So your grep should look like
grep -How "\('BA10\|BA20\|BA21\|BA30\|BA31\|BA00'\)[[:alnum:]_]*" *.err
From your example it seems that all files are in one directory. So the following works right away:
grep -l 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' *.err
If the files are in different directories:
find . -name '*.err' -print | xargs -I {} grep 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' {} /dev/null
Explanation: the addition of /dev/null to the filename {} forces grep to report the matching filename

Unable to match patterns from one file line by line with contents of other file | bash shell

I have a file1.txt containing text like:
123 456 789
I need to search these strings line by line in another file2.txt like this:
"123" This is line 1
"456" This is line 2
"789" This is line 3
Matching lines need to be echoed or redirected to file3.txt
I tried couple of ways:
while read -r line; do
grep "$line" -c file2.txt
done < file1.txt
This doesn't give me any matches, although there are some.
I also tried grep like this:
grep -f file1.txt -c file2.txt
which unfortunately doesn't work either.
For all three matches, output should have been:
1
1
1
I am new to shell scripting. Could someone please suggest what is wrong here?
Thanks in advance.
In case you are ok with awk could you please try following then.
awk 'FNR==NR{a[$0];next} ($2 in a)' file1.txt FS="\"" file2.txt
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file1.txt is being read.
a[$0] ##Creating array a with index current line here.
next ##next will skip further statements from here.
}
($2 in a) ##Checking condition if 2nd field is present in array a then print the current line from file2.txt
' file1.txt FS="\"" file2.txt ##Mentioning Input_file names here, where setting FS as " for file2.txt
2nd solution: With changing Input_file(s) sequence of reading.
awk 'FNR==NR{a[$2]=$0;next} ($0 in a){print a[$0]}' FS="\"" file2.txt file1.txt

grep exact match of string with alphabets and numbers

I am using grep to extract lines from file 1 that matches with string in file2. The string in file 2 has both alphabets and numbers. eg;
MSTRG.18691.1
MSTRG.18801.1
I used sed to write word boundaries for all the strings in the file 2.
file 2
\<MSTRG.18691.1\>
\<MSTRG.18801.1\>
and used grep -f file2 file1
but output has
MSTRG.18691.1.2
MSTRG.18801.1.3 also..
I want lines that matches exactly,
MSTRG.18691.1
MSTRG.18801.1
and not,
MSTRG.18691.1.2
MSTRG.18801.1.3
Few lines from my file1
t_name gene_name FPKM TPM
MSTRG.25.1 . 0 0
rna71519 . 93.398872 194.727926057583
gene34024 ND1 2971.72876 6195.77694943117
MSTRG.28.1 . 0 0
MSTRG.28.2 . 0 0
rna71520 . 33.235409 69.2927240732149
Updating the answer
You can use start with ^ and end with $ operator to match start with and begin with. To match exactly MSTRG.18691.1 you can add ^ & $ at both ends and remove the word boundaries, additionally . has special meaning in regex to match exactly . we need to escape that with a backslash \
Example pattern:
^MSTRG\.18691\.1$
^MSTRG\.18801\.1$
file1
MSTRG.18691.1
MSTRG.1311.1
MSTRG.18801.2
MSTRG.18801.3
MSTRG.18801.1.2
MSTRG.18801.1.1
MSTRG.18801.1
PrefixMSTRG.18801.1
Just create a normal file named file1 and paste the above content into it.
file2 (pattern file)
^MSTRG\.18801\.1$
Just create a normal file named file2 and paste the above content into it.
Run the below command from commandline
grep -i --color -f file2 file1
Result:
MSTRG.18801.1
Sed to add changes to the pattern file
Here is the sed command to escape . and add ^ and $ at the beginning and end of the pattern file you already have.
sed -Ee 's/\./\\./g' -e 's/^/\^/g' -e 's/$/\$/g' file2 > file2_updated
-E to support extended regex on BSD sed, you may need to replace -E with -r based on your system's sed
Updated patterns will be saved to file2_updated. Need to use the new pattern file in grep like this
grep -i -f file2_updated file1
The flag you're looking for is -F. From man grep:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
You can use this quite comfortably in conjunction with -f:
grep -Ff file2 file1
To be clear, this will treat every line of file2 as an exact match against file1.

Retrieve only the matching pattern of a list in another list with grep

I have this kind of files:
file1 :
TMCS09g1008676.1
MAMBO.3.3.2.1
TMCS09g1008678.1
TMCS09g1008678.2
CSH.1.2
TMCS09g1008681.3
TMCS09g1008682.1
TMCS09g1008683
file2:
TMCS09g1008676
MAMBO.3.3.2
TMCS09g1008678
TMCS09g1008679
CSH.1
TMCS09g1008681
TMCS09g1008682
TMCS09g1008683
What I want to do is to retrieve only the matching part of file 2 in file 1. Basically only the "red" part of the terminal showing the matching with the command grep -w -F -f file2 file1
Is there a way to achieve this?
thanks in advance

How can I grep hidden files?

I am searching through a Git repository and would like to include the .git folder.
grep does not include this folder if I run
grep -r search *
What would be a grep command to include this folder?
Please refer to the solution at the end of this post as a better alternative to what you're doing.
You can explicitly include hidden files (a directory is also a file).
grep -r search * .[^.]*
The * will match all files except hidden ones and .[^.]* will match only hidden files without ... However this will fail if there are either no non-hidden files or no hidden files in a given directory. You could of course explicitly add .git instead of .*.
However, if you simply want to search in a given directory, do it like this:
grep -r search .
The . will match the current path, which will include both non-hidden and hidden files.
I just ran into this problem, and based on #bitmask's answer, here is my simple modification to avoid the problem pointed out by #sehe:
grep -r search_string * .[^.]*
Perhaps you will prefer to combine "grep" with the "find" command for a complete solution like:
find . -exec grep -Hn search {} \;
This command will search inside hidden files or directories for string "search" and list any files with a coincidence with this output format:
File path:Line number:line with coincidence
./foo/bar:42:search line
./foo/.bar:42:search line
./.foo/bar:42:search line
./.foo/.bar:42:search line
To prevent matching . and .. which are not hidden files, you can use grep with ls -A like in this example:
ls -A | grep "^\."
^\. states that the first character must be .
The -A or --almost-all option excludes the results . and .. so that only hidden files and directories are matched.
You may want to use this approach, assuming you're searching the current directory (otherwise replace . with the desired directory):
find . -type f | xargs grep search
or if you just want to search at the top level (which is quicker to test if you're trying these out):
find . -type f -maxdepth 1 | xargs grep search
UPDATE: I modified the examples in response to Scott's comments. I also added "-type f".
To search within ONLY all hidden files and directories from your current location:
find . -name ".*" -exec grep -rs search {} \;
ONLY all hidden files:
find . -name ".*" -type f -exec grep -s search {} \;
ONLY all hidden directories:
find . -name ".*" -type d -exec grep -rs search {} \;
All the other answers are better. This one might be easy to remember:
find . -type f | xargs grep search
It finds only files (including hidden) and greps each file.
To find only within a certain folder you can use:
ls -al | grep " \."
It is a very simple command to list and pipe to grep.
In addition to Tyler's suggestion, Here is the command to grep all files and folders recursively including hidden files
find . -name "*.*" -exec grep -li 'search' {} \;
You can also search for specific types of hidden files like so for hidden directory files:
grep -r --include=*.directory "search-string"
This may work better than some of the other options. The other options that worked can be too slow.

Resources