How to grep a matching filename AND extension from pattern file to a text file?

Content of testfile.txt
/path1/abc.txt
/path2/abc.txt.1
/path3/abc.txt123
Content of pattern.txt
abc.txt$
Bash Command
grep -i -f pattern.txt testfile.txt
Output:
/path1/abc.txt
This works, but the $ in each pattern currently has to be added by hand, and the edited pattern file is then uploaded to users. I am trying to avoid that manual amendment.
An alternative is to loop over the pattern file and read it line by line, but that requires scripting skills or uploading scripts to the user environment.
I want to keep the original pattern files in an audited environment, so that users just log in and run simple cut-and-paste commands.
Is there a one-liner solution?

You can use sed to add $ to pattern.txt and then use grep, but you might run into issues due to regexp metacharacters like the . character. For example, abc.txt$ will also match abc1txt. And unless you take care of matching only the basename from the file path, abc.txt$ will also match /some/path/foobazabc.txt.
I'd suggest to use awk instead:
$ awk '!f{a[$0]; next} $NF in a' pattern.txt f=1 FS='/' testfile.txt
/path1/abc.txt
pattern.txt f=1 FS='/' testfile.txt here a flag f is set between the two files and field separator is also changed to / for the second file
!f{a[$0]; next} if flag f is not set (i.e. for the first file), build an array a with line contents as the key
$NF in a for the second file, if the last field matches a key in array a, print the line
Just noticed that you are also using -i option, so use this for case insensitive matching:
awk '!f{a[tolower($0)]; next} tolower($NF) in a'
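To see that comparing the basename sidesteps both pitfalls mentioned above, here is a small self-contained sketch. The throwaway directory and the extra test paths (abc1txt, foobazabc.txt, ABC.TXT) are made up for illustration; the awk program is the case-insensitive variant from this answer.

```bash
# Hypothetical demo data in a temporary directory (not from the original question).
demo=$(mktemp -d)
printf '%s\n' 'abc.txt' > "$demo/pattern.txt"
printf '%s\n' '/path1/abc.txt' '/path2/abc.txt.1' '/path3/abc.txt123' \
    '/path4/abc1txt' '/path5/foobazabc.txt' '/path6/ABC.TXT' > "$demo/testfile.txt"

# First file (f unset): store each lower-cased pattern line as an array key.
# Second file (f=1, FS='/'): print lines whose lower-cased last field is a key.
awk_matches=$(awk '!f{a[tolower($0)]; next} tolower($NF) in a' \
    "$demo/pattern.txt" f=1 FS='/' "$demo/testfile.txt")

printf '%s\n' "$awk_matches"
rm -rf "$demo"
```

Only /path1/abc.txt and /path6/ABC.TXT should be printed, because the comparison is an exact string lookup on the last path component and not a regular expression match.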

Since pattern.txt contains only a single pattern, and you don't want to change it, since it is an audited file, you could do
grep -i -e "$(<pattern.txt)"'$' testfile.txt
instead. Note that this would break, if the maintainer of the file one day decided to actually write there a terminating $.
IMO, it would make more sense to explain to the maintainer of pattern.txt that he is supposed to place there a simple regular expression, which is going to match your testfile. In this case s/he can decide whether the pattern really should match only the right edge or some inner part of the lines.
If pattern.txt contains more than one line, and you want to add the $ to each line, you can likewise do a
grep -i -f <(sed 's/$/$/' <pattern.txt) testfile.txt
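If you stay with grep, the metacharacter and basename issues raised in the awk answer can be handled in the same sed step. This is only a sketch: it assumes the pattern file holds literal file names (not regular expressions), the demo paths are invented, and the patterns are fed to grep through -f - (read patterns from standard input, as GNU grep documents) so that it also runs in a plain sh.

```bash
# Hypothetical demo data in a temporary directory.
demo=$(mktemp -d)
printf '%s\n' 'abc.txt' > "$demo/pattern.txt"
printf '%s\n' '/path1/abc.txt' '/path2/abc.txt.1' '/path4/abc1txt' \
    '/path5/foobazabc.txt' > "$demo/testfile.txt"

# 1st sed command: backslash-escape the basic-regex metacharacters ] [ \ . * ^ $
# 2nd sed command: wrap each line as /name$ so only a whole basename at end of line matches
grep_matches=$(sed -e 's/[][\.*^$]/\\&/g' -e 's|.*|/&$|' "$demo/pattern.txt" |
    grep -i -f - "$demo/testfile.txt")

printf '%s\n' "$grep_matches"
rm -rf "$demo"
```

With this data only /path1/abc.txt should be printed; the audited pattern.txt itself is never modified.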

As the '$' symbol indicates pattern end. The following script should work.
#!/bin/bash
file_pattern='pattern.txt' # path to pattern file
file_test='testfile.txt' # path to test file
while IFS=$ read -r line
do
echo "$line"
grep -wn "$line" "$file_test"
done < "$file_pattern"
You can remove the IFS descriptor if the pattern file comes with leading/trailing spaces.
Also the grep option -w matches only whole word and -n provides with line number.

Related

How can i make grep show a line ignoring the words i want?

I am trying to use grep with the pwd command.
So, if i enter pwd, it shows me something like:
/home/hrq/my-project/
But, for purposes of a script i am making, i need to use it with grep, so it only prints what is after hrq/, so i need to hide my home folder always (the /home/hrq/) excerpt, and show only what is onwards (like, in this case, only my-project).
Is it possible?
I tried something like
pwd | grep -ov 'home', since i saw that the "-v" flag would be equivalent to the NOT operator, and combine it with the "-o" only matching flag. But it didn't work.
Given:
$ pwd
/home/foo/tmp
$ echo "$PWD"
/home/foo/tmp
Depending on what it is you really want to do, either of these is probably what you really should be using rather than trying to use grep:
$ basename "$PWD"
tmp
$ echo "${PWD#/home/foo/}"
tmp
Use grep -Po 'hrq/\K.*', for example:
grep -Po 'hrq/\K.*' <<< '/home/hrq/my-project/'
my-project/
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
\K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. Specifically, ignore the preceding part of the regex when printing the match.
SEE ALSO:
grep manual
perlre - Perl regular expressions
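For a script, the parameter-expansion approach can be made independent of the user name by stripping "$HOME" instead of a hard-coded path. A minimal sketch, using stand-in values (dir and home are illustrative variable names) so that the result is reproducible:

```bash
# Stand-in values; in a real script you would use "$PWD" and "$HOME" instead.
dir='/home/hrq/my-project/'
home='/home/hrq'

rel=${dir#"$home"/}    # strip the leading "/home/hrq/"
rel=${rel%/}           # strip one trailing slash, if present
printf '%s\n' "$rel"   # my-project
```

No external command is started at all, which is usually preferable to a grep pipeline inside a script.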

(bash) grep -i not making search case insensitive for input files

I am trying to search inside a folder containing several files. The name of the files is written in upper case with a .sub extension in lower case:
AAA.sub
BBB.sub
CCC.sub
DDD.sub
I am searching for a pattern through those files using grep; however, I would like to use only lower-case letters for the input file names.
In the man page for grep it is written:
-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files. (-i is specified by POSIX.)
So, if i understood properly:
grep -i subckt /schematics/aaa
and
grep -i subckt /schematics/AAA
are both supposed to search for the pattern "subckt" in the file "aaa" regardless of its case (AAA or aaa), and if two files named aaa and AAA are present at the same time in the folder, I expect grep to search through both of them.
However when i try my search with the 1st instruction (lower case) it does not work, giving me "no such file or directory" message.
When i try to search with the 2nd instruction (upper case) it works properly.
I obviously understood something wrong about how the -i option with grep, can anyone give me an answer regarding this matter?
Is it possible to be case insensitive with the input files when using grep?
EDIT:
My question was lacking details. Even though I have found the answer to my problem, I will add the details in case someone else stumbles upon this:
I have one file that contains a list of each file name i want to grep. My list looks like this:
aaa capacitor C_0
bbb capacitor C_0
ccc resistor R_in
...
The grep is done inside a perl script, the perl script parses the list file and gets the name of each individual file name (aaa bbb ccc) inside a while loop.
However the name inside the list file is written in lower case whereas the name of the files i want to grep is written in upper case.
This is why I wanted the input file lookup to be case insensitive, so that I could directly do a grep -i subckt aaa and it would search inside the file 'AAA'.
However, since the grep is launched from a perl script, and since it is apparently not possible to have grep behave like that, i used the uc() function of perl to convert aaa to AAA and do my grep with it. (see my answer below)
-i affects how the contents are searched, not the name of the files.
When the man page says "Ignore case distinctions in both the PATTERN and the input files." that really means that case is ignored in the pattern ( searching for AAA and aaa are equivalent) and the contents of the input files (a line would match if it includes "AAA" or "aaa" or even "AaA")
I think you want to either list all the filenames on the command line, or find a glob (i.e. wildcard) that matches all the filenames:
grep -i subckt *.sub
In Unix/Linux shells (bash, zsh, and so on) "*" is processed by the shell (bash) not the command (grep). The command receives the list of files and actually can't tell the difference between whether a user typed "grep foo *" and "grep foo file1 file2 file3" (if the directory includes those 3 files)
Please try the following command
find . -iname aaa.sub -exec grep -n subckt {} +
find with the -iname option lists files ignoring the case of their names. In the above case find . -iname aaa.sub will find both aaa.sub and AAA.sub. The names found are passed to grep as arguments by -exec; simply piping them into grep would not search the contents of those files.
I have found a way to circumvent my problem by using the uc (upper case) function of perl to convert the input files for the grep function into upper case.
The grep command was launched from a perl script in the first place:
grep -i subckt /schematics/aaa
So, i just did that in my perl script:
my $tmp = "aaa";
$tmp = uc($tmp);    # now "AAA"
system("grep", "-i", "subckt", "/schematics/$tmp");
Now, the "aaa" name is just an example. In the perl script it is recovered from another parsed file that is written in lower case.
Thanks for the answers though.
grep uses the filenames as they are listed on the command line. The -i option affects the contents of the files, not the names of the files.
You can use find to select filenames to be searched. The -iname option lets you match files ignoring case.
grep subckt $(find /schematics -iname aaa.sub -print)
If you have many filenames, or those filenames include spaces or other characters that would confuse the shell, the safe and secure way to do this is using the -print0 and -0 options:
find /schematics -iname aaa.sub -print0 | xargs -r -0 grep -i subckt
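A self-contained way to try this out is sketched below. The temporary directory and file contents are invented for the demo, and -exec ... {} + is used in place of xargs; it likewise hands the file names to grep without the shell re-splitting them.

```bash
# Hypothetical demo files: upper-case names, as in the question.
demo=$(mktemp -d)
printf '%s\n' '.SUBCKT amp in out' > "$demo/AAA.sub"
printf '%s\n' 'no match here' > "$demo/BBB.sub"

# -iname matches the file NAME case-insensitively;
# grep -i matches the CONTENTS case-insensitively; -l prints only the file name.
found=$(find "$demo" -type f -iname 'aaa.sub' -exec grep -il 'subckt' {} +)

printf '%s\n' "$found"
rm -rf "$demo"
```

The lower-case name aaa.sub given to find should locate AAA.sub, and grep then reports that file because it contains .SUBCKT.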

Get specific value of preprocessor macro

In my build settings I have defined some preprocessor macros
i.e. SANDBOX_ENV=1
I want to use the value of SANDBOX_ENV in my shell script.
I have tried echo "SANDBOX value is = ${GCC_PREPROCESSOR_DEFINITIONS}"
but it's giving me all the macro values, like DEBUG=1 SANDBOX_ENV=1 COCOAPODS=1
I want to use value that is assigned to SANDBOX_ENV
Try this:
#!/bin/bash
GCC_PREPROCESSOR_DEFINITIONS="DEBUG=1 SANDBOX_ENV=1 COCOAPODS=1"
# delete everything before our value and stuff the rest into TMPVAL
TMPVAL="${GCC_PREPROCESSOR_DEFINITIONS//*SANDBOX_ENV=/}"
# remove everything after our value from TMPVAL and return it
TMPVAL="${TMPVAL// */}"
echo $TMPVAL; #outputs 1
HTH,
bovako
You should be able to parse it easily with awk or something, but here's how I'd do it:
echo $GCC_PREPROCESSOR_DEFINITIONS | grep -Po 'SANDBOX_ENV=\d+' | sed 's/SANDBOX_ENV=//'
In your echo context:
echo "SANDBOX value is $(echo $GCC_PREPROCESSOR_DEFINITIONS | grep -Po 'SANDBOX_ENV=\d+' | sed 's/SANDBOX_ENV=//')"
Basically I piped the contents of GCC_PREPROCESSOR_DEFINITIONS and grepped out the SANDBOX_ENV portion.
grep -P
is to use the Perl regex \d+, because I don't like POSIX. Just a preference. Essentially what
grep -P 'SANDBOX_ENV=\d+'
does is to find the line in the content piped to it that contains the string "SANDBOX_ENV=" and any number of digits succeeding it. If the value might contain alphanumerics you can change the \d for digits to \w for word which encompasses a-zA-Z0-9 and you get:
grep -Po 'SANDBOX_ENV=\w+'
The + just means there must be at least one character of the type specified by the character before it, including all succeeding characters that matches.
the -o (only-matching) in grep -Po is used to isolate the match so that instead of the entire line you just get "SANDBOX_ENV=1".
This output is then piped to the sed command where I do a simple find and replace where I replaced "SANDBOX_ENV=" with "", leaving only the value behind it. There are probably easier ways to do it like with awk, but you'll have to learn that yourself.
If you want to have something self contained within the Build Settings and you don't mind slight indirection, then:
Create User-Defined settings SANDBOX_ENV=1 (or whatever value you want)
In Preprocessor Macros, add SANDBOX_ENV=${SANDBOX_ENV}
In your shell, to test, do
echo ${SANDBOX_ENV}
With the User-Defined Settings, you'll still be able to modify the value for Build Configuration and Architecture. So, for example, you could make the Debug config be SANDBOX_ENV=0 and Release be SANDBOX_ENV=1.
Might be the obvious answer, but have you simply tried:
echo ${SANDBOX_ENV}
If that doesn't work, try using eval:
eval "${GCC_PREPROCESSOR_DEFINITIONS}"
echo ${SANDBOX_ENV}
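If you would rather not eval the whole list or depend on grep -P, the lookup can also be done word by word in the shell. This is a sketch only: get_define is a hypothetical helper name (not something Xcode provides), and it assumes the definitions are plain NAME=value words separated by spaces, with no quoting or escaping inside the values.

```bash
# Print the value of NAME ($1) from a space-separated list of NAME=value words ($2).
get_define() {
    name=$1
    set -f                      # no glob expansion while the list is split into words
    for word in $2; do
        case $word in
            "$name"=*)          # the word must START with NAME=
                set +f
                printf '%s\n' "${word#*=}"
                return 0 ;;
        esac
    done
    set +f
    return 1                    # NAME is not in the list
}

GCC_PREPROCESSOR_DEFINITIONS='DEBUG=1 SANDBOX_ENV=1 COCOAPODS=1'
sandbox=$(get_define SANDBOX_ENV "$GCC_PREPROCESSOR_DEFINITIONS")
echo "SANDBOX value is = $sandbox"
```

Because each word has to begin with the name, a macro such as MY_SANDBOX_ENV=2 would not be picked up by mistake, and the non-zero return status lets the script detect a missing macro.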

How to truncate long matching lines returned by grep or ack

I want to run ack or grep on HTML files that often have very long lines. I don't want to see very long lines that wrap repeatedly. But I do want to see just that portion of a long line that surrounds a string that matches the regular expression. How can I get this using any combination of Unix tools?
You could use the grep options -oE, possibly in combination with changing your pattern to ".{0,10}<original pattern>.{0,10}" in order to see some context around it:
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
-E, --extended-regexp
Interpret pattern as an extended regular expression (i.e., force grep to behave as egrep).
For example (from @Renaud's comment):
grep -oE ".{0,10}mysearchstring.{0,10}" myfile.txt
Alternatively, you could try -c:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines.
Pipe your results thru cut. I'm also considering adding a --cut switch so you could say --cut=80 and only get 80 columns.
You could use less as a pager for ack and chop long lines: ack --pager="less -S" This retains the long line but leaves it on one line instead of wrapping. To see more of the line, scroll left/right in less with the arrow keys.
I have the following alias setup for ack to do this:
alias ick='ack -i --pager="less -R -S"'
grep -oE ".{0,10}error.{0,10}" mylogfile.txt
In the unusual situation where you cannot use -E, use lowercase -e instead and escape the braces: grep -o -e ".\{0,10\}error.\{0,10\}" mylogfile.txt
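The effect of the context trick is easy to check on generated data. In this sketch the long line, the search word NEEDLE and the context width of 5 are all made up for the demonstration:

```bash
# Build one 106-character line: 50 x's, the word NEEDLE, then 50 y's.
long_line=$(printf '%050d' 0 | tr 0 x)NEEDLE$(printf '%050d' 0 | tr 0 y)

# Keep at most 5 characters of context on each side of the match.
snippet=$(printf '%s\n' "$long_line" | grep -oE '.{0,5}NEEDLE.{0,5}')
printf '%s\n' "$snippet"    # xxxxxNEEDLEyyyyy
```

The {0,5} bounds (rather than {5}) matter: they let a match at the very start or end of a line still be reported, just with less context.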
Explanation:
cut -c 1-100
gets characters from 1 to 100.
The Silver Searcher (ag) supports this natively via the --width NUM option. It will replace the rest of longer lines by [...].
Example (truncate after 120 characters):
$ ag --width 120 '@patternfly'
...
1:{"version":3,"file":"react-icons.js","sources":["../../node_modules/@patternfly/ [...]
In ack3, a similar feature is planned but currently not implemented.
Taken from: http://www.topbug.net/blog/2016/08/18/truncate-long-matching-lines-of-grep-a-solution-that-preserves-color/
The suggested approach ".{0,10}<original pattern>.{0,10}" is perfectly good except for that the highlighting color is often messed up. I've created a script with a similar output but the color is also preserved:
#!/bin/bash
# Usage:
# grepl PATTERN [FILE]
# how many characters around the searching keyword should be shown?
context_length=10
# What is the length of the control character for the color before and after the
# matching string?
# This is mostly determined by the environmental variable GREP_COLORS.
control_length_before=$(($(echo a | grep --color=always a | cut -d a -f '1' | wc -c)-1))
control_length_after=$(($(echo a | grep --color=always a | cut -d a -f '2' | wc -c)-1))
grep -E --color=always "$1" $2 |
grep --color=none -oE \
".{0,$(($control_length_before + $context_length))}$1.{0,$(($control_length_after + $context_length))}"
Assuming the script is saved as grepl, then grepl pattern file_with_long_lines should display the matching lines but with only 10 characters around the matching string.
I put the following into my .bashrc:
grepl() {
$(which grep) --color=always "$@" | less -RS
}
You can then use grepl on the command line with any arguments that are available for grep. Use the arrow keys to see the tail of longer lines. Use q to quit.
Explanation:
grepl() {: Define a new function that will be available in every (new) bash console.
$(which grep): Get the full path of grep. (Ubuntu defines an alias for grep that is equivalent to grep --color=auto. We don't want that alias but the original grep.)
--color=always: Colorize the output. (--color=auto from the alias won't work since grep detects that the output is put into a pipe and won't color it then.)
"$@": Put all arguments given to the grepl function here.
less: Display the lines using less
-R: Show colors
-S: Don't break long lines
Here's what I do:
function grep () {
tput rmam;
command grep "$@";
tput smam;
}
In my .bash_profile, I override grep so that it automatically runs tput rmam before and tput smam after, which disables line wrapping and then re-enables it.
ag can also take the regex trick, if you prefer it:
ag --column -o ".{0,20}error.{0,20}"

Can grep show only words that match search pattern?

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like (bold is by me);
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
Try grep -o:
grep -oh "\w*th\w*" *
Edit: matching from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
Cross distribution safe answer (including windows minGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using older versions of grep (like 2.4.2) which do not include the -o option, then use the above. Else use the simpler to maintain version below.
Linux cross distribution safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs the regular expression matches to the file content (and not its filename), just like how you would expect a regular expression to work in vim/etc... What word or regular expression you would be searching for then, is up to you! As long as you remain with POSIX and not perl syntax (refer below)
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]';
The reason why the original answer does not work for everyone
The usage of \w varies from platform to platform, as it's an extended "perl" syntax. As such, those grep installations that are limited to work with POSIX character classes use [[:alpha:]] and not its perl equivalent of \w. See the Wikipedia page on regular expression for more
Ultimately, the POSIX answer above will be a lot more reliable regardless of platform (being the original) for grep
As for support of grep without -o option, the first grep outputs the relevant lines, the tr splits the spaces to new lines, the final grep filters only for the respective lines.
(PS: I know most platforms by now would have been patched for \w.... but there are always those that lag behind)
Credit for the "-o" workaround goes to @AdamRosenfield's answer
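To check the POSIX character-class version against the sample from the question, the three files can be recreated in a temporary directory. The directory is only a demo scaffold; the file names and contents are the ones shown in the question.

```bash
# Recreate the question's three sample files in a throwaway directory.
demo=$(mktemp -d)
printf '%s\n' 'the cat sat on the mat' > "$demo/some-text-file"
printf '%s\n' 'the quick brown fox' > "$demo/some-other-text-file"
printf '%s\n' 'i hope this explains it thoroughly' > "$demo/yet-another-text-file"

# -o prints each match on its own line; -h drops the file-name prefix.
words=$(grep -oh '[[:alpha:]]*th[[:alpha:]]*' \
    "$demo/some-text-file" "$demo/some-other-text-file" "$demo/yet-another-text-file")

printf '%s\n' "$words"
rm -rf "$demo"
```

This should print the, the, the, this and thoroughly, one per line, which is the output the question asks for.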
It's more simple than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where,
egrep: Grep will work with extended regular expression.
w : Matches only word/words instead of substring.
o : Display only matched pattern instead of whole line.
i : Ignore case when matching.
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
Just awk, no need combination of tools.
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
grep command for only matching and perl
grep -o -P 'th.*? ' filename
I was unsatisfied with awk's hard to remember syntax but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010
cat *-text-file | grep -Eio "th[a-z]+"
You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
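The hyphenated entries match because - is not a word-constituent character, so apple still counts as a whole word next to it. If the file holds one item per line, as here, and only the bare entry should match, one option is -x, which requires the pattern to match the entire line. A small sketch that recreates the list in a temporary directory:

```bash
# Recreate the fruit list from the example above.
demo=$(mktemp -d)
printf '%s\n' apple apples pineapple apple- apple-fruit fruit-apple > "$demo/fruitlist.txt"

# -x: the pattern must match the whole line, so "apple-" and friends are excluded.
exact=$(grep -x 'apple' "$demo/fruitlist.txt")
printf '%s\n' "$exact"    # apple
rm -rf "$demo"
```

For words embedded in longer lines -x is not applicable; there you would have to spell out the delimiters you accept in the pattern itself.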
I had a similar problem, looking for grep/pattern regex and the "matched pattern found" as output.
At the end I used egrep (same regex on grep -e or -G didn't give me the same result of egrep) with the option -o
so, I think that could be something similar to (I'm NOT a regex Master) :
egrep -o "the*|this{1}|thoroughly{1}" filename
To search all the words with start with "icon-" the following command works perfect. I am using Ack here which is similar to grep but with better options and nice formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq
You could pipe your grep output into Perl like this:
grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'
grep --color -o -E "Begin.{0,}?End" file.txt
? - Match as few as possible until the End
Tested on macos terminal
$ grep -w
Excerpt from grep man page:
-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.
ripgrep
Here are the example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It'll match all words containing th.
