grep -w with modified word-constituent characters - grep

I am using grep -w for exact word match. My problem is grep -w considers only letters, digits and underscore as word-constituent characters. But I want all other characters except space to be word-constituent characters. Here is my text file, output and desired output.
File : file1.txt
clk_file help
co <clk_file> help2
code_error help3
Command : grep -w clk_file file1.txt
Output :
clk_file help
co <clk_file> help2
Desired Output :
clk_file help
Can anyone tell me what I should use to get the desired output?
Thanks in advance

grep is not enough for this, I believe.
This awk is a workaround:
$ awk '{f=0; for (i=1; i<=NF; i++) { if ($i == "clk_file") f=1} if (f) print}' file
clk_file help
It loops through all the words of the line and sets a flag f=1 in case the exact word clk_file matches one of them. Then, it prints the line if the flag was set.
In case you want the pattern to be a variable, use -v as this:
$ myvar="clk_file"
$ awk -v patt="$myvar" '{f=0; for (i=1; i<=NF; i++) { if ($i == patt) f=1} if (f) print}' file
clk_file help
This is needed because awk variables and bash variables are separate: awk does not see shell variables, so you pass the value in with a -v assignment.
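As a variation on the loop above, here is a minimal sketch that hands the word to awk through the environment and stops scanning a line at the first hit. The variable name word is an assumption chosen for illustration. Passing the value via ENVIRON sidesteps the backslash-escape processing that awk applies to -v assignments.

```shell
# Sample lines taken from the question.
out=$(printf '%s\n' 'clk_file help' 'co <clk_file> help2' 'code_error help3' |
    word='clk_file' awk '{
        # Compare every whitespace-separated field with the target word.
        for (i = 1; i <= NF; i++)
            if ($i == ENVIRON["word"]) { print; next }   # first hit is enough
    }')
printf '%s\n' "$out"
```

Only the line in which clk_file stands as a whole field is kept; the <clk_file> line is skipped because the field there includes the angle brackets.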

With sed:
sed -n '/\(^clk_file \| clk_file$\| clk_file \)/p' yourfile
With grep:
grep '\(^clk_file \| clk_file$\| clk_file \)' yourfile
As @AvinashRaj commented, if your grep supports the -P option, you do not need to escape |, ( and ). Simply write:
grep -P '(^clk_file | clk_file$| clk_file )' yourfile
Test:
$ cat file
clk_file help
help clk_file
co <clk_file> help2
code_error help3
help1 clk_file help2
$ sed -n '/\(^clk_file \| clk_file$\| clk_file \)/p' file
clk_file help
help clk_file
help1 clk_file help2
$ grep '\(^clk_file \| clk_file$\| clk_file \)' file
clk_file help
help clk_file
help1 clk_file help2
If your search value is in a variable:
$ myval="clk_file"
$ grep "\(^${myval} \| ${myval}$\| ${myval} \)" file
clk_file help
help clk_file
help1 clk_file help2
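Note that the patterns above require a space on at least one side of the word, so a line consisting of nothing but clk_file is not matched. A possible sketch that covers that case with extended regular expressions, assuming the word contains no regex metacharacters (the temporary file and the variable names are made up for the demonstration):

```shell
# Sample data from the question, plus a line holding only the word.
tmp=$(mktemp)
printf '%s\n' 'clk_file help' 'co <clk_file> help2' 'code_error help3' 'clk_file' > "$tmp"

word='clk_file'
# (^|[[:space:]]) : start of line or whitespace before the word
# ([[:space:]]|$) : whitespace or end of line after the word
out=$(grep -E "(^|[[:space:]])${word}([[:space:]]|\$)" "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Using [[:space:]] instead of a literal space also treats tabs as word separators.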


How to grep repeated strings on a single line?

I have this a file.txt with one line, whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reading the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by anything repeated 0 to n times". That returns the rest of the row from Xms onward.
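What you actually want is something like "find Xms followed by any non-space character repeated 0 to n times".
grep -io "Xms[^ ]*" file.txt | cut -d "s" -f2
Inside a bracket expression a leading ^ means "not", so [^ ] matches any single character except a space. The awk step from your pipeline is dropped because each match no longer contains a space.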
I'm not really sure what you are trying to achieve here but if you want the endings of all space-separated strings starting with -Xms, using bare awk is:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (word repeated in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed "how do I print only 8192m in this case" (maybe that is the "repeated" part). Let's add a counter c and skip the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" so that each match is a sequence of digits followed by a single letter, rather than the whole rest of the line:
grep -io "Xms[0-9]*[a-zA-Z]" file.txt | cut -d "s" -f2
Hope this helps!
The .* in your regexp is matching the rest of the line, you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or more concisely, since the greedy leading .* makes sed keep only the last Xms on the line:
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches each run of non-whitespace characters that immediately follows the string Xms.
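If your grep has neither -o nor -P, the same values can be pulled out with tr, sed and tail only. This is a rough sketch on a shortened, made-up command line (the real one from the question behaves the same way); the variable names are illustrative.

```shell
# Shortened stand-in for the java command line in the question.
line='java -server -Xms3g -Xmx3g -Dname=x -Xms8192m -Xmx8192m'

# Put every space-separated argument on its own line, then keep only
# the arguments that start with -Xms and strip that prefix.
all=$(printf '%s\n' "$line" | tr ' ' '\n' | sed -n 's/^-Xms//p')

# The last occurrence is the final line of that list.
last=$(printf '%s\n' "$all" | tail -n 1)

printf 'all:\n%s\nlast: %s\n' "$all" "$last"
```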

How to search for 2 key words from files in a directory and print their filename if it occurs more than once

I am trying to grep or find 2 specific words in each file in a directory. If more than one file contains that combination, only then should I print those file names to a CSV file.
Here is what I tried so far:
find /dir/test -type f -printf "%f\n" | xargs grep -r -l -e 'ABCD1' -e 'ABCD2' > log1.csv
But this provides all file names that have "ABCD1" and "ABCD2". In other words, this command prints the file name even if only one file has this combo.
I need to grep the entire directory for those 2 words, and both words MUST be in more than one file before the file names are written to the CSV. I should also be able to include subdirectories.
Any help would be great!
Thanks
find + GNU grep solution:
find . -type f -exec grep -qPz 'ABCD1[\s\S]*ABCD2|ABCD2[\s\S]*ABCD1' {} \; -printf "%f\n" \
| tee /tmp/flist | [[ $(wc -l) -gt 1 ]] && cat /tmp/flist > log1.csv
Alternative way:
grep -lr 'ABCD2' /dir/test/* | xargs grep -l 'ABCD1' | tee /tmp/flist \
| [[ $(wc -l) -gt 1 ]] && sed 's/.*\/\([^\/]*\)$/\1/' /tmp/flist > log1.csv
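The same idea can be spelled out step by step, which may be easier to adapt. The following is only a sketch: it builds a throwaway directory with made-up files, assumes file names contain no newlines, and relies on grep -r, which GNU and BSD grep provide but POSIX does not require.

```shell
# Throwaway test tree: two files hold both words, one holds only ABCD1.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'ABCD1\nABCD2\n' > "$dir/a.txt"
printf 'ABCD2 then ABCD1\n' > "$dir/sub/b.txt"
printf 'ABCD1 only\n' > "$dir/c.txt"

# List files containing ABCD1, keep those that also contain ABCD2,
# and reduce each path to its base name.
matches=$(grep -rl 'ABCD1' "$dir" | while IFS= read -r f; do
    grep -q 'ABCD2' "$f" && basename "$f"
done | sort)

# Count the non-empty lines, i.e. the number of matching files.
count=$(printf '%s\n' "$matches" | grep -c .)

# Write the CSV only when more than one file qualifies.
saved=''
if [ "$count" -gt 1 ]; then
    printf '%s\n' "$matches" > "$dir/log1.csv"
    saved=$(cat "$dir/log1.csv")
fi
printf '%s\n' "$saved"
rm -rf "$dir"
```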

trying to grep '--string' fails

I'm trying to grep for a string that starts with "--".
For some reason it is treated as a special character, and even with -F grep gives me a syntax error:
[root@pc-01 /]# grep -F --restore .
-bash: --restore: command not found
any tips?
Thanks.
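Try the following. The -- marks the end of options, so grep treats --restore as the pattern rather than as an option: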
grep -F -- --restore filename
You can escape the first - :
Without escaping:
[root@TIAGO-TEST2 tmp]# echo '--aa --bb --cc' | grep -o '--b'
grep: option '--b' is ambiguous; possibilities: '--basic-regexp' '--binary' '--byte-offset' '--binary-files' '--before-context'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Escaping:
[root@TIAGO-TEST2 tmp]# echo '--aa --bb --cc' | grep -o '\--b'
--b
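Another way to keep grep from parsing the pattern as an option is to name it explicitly with -e. A small sketch comparing -e with the -- separator; the temporary file and its contents are made up for the demonstration.

```shell
tmp=$(mktemp)
printf '%s\n' 'run --restore now' 'run --reset now' > "$tmp"

# "--" ends option parsing, so the next argument is the pattern.
a=$(grep -F -- '--restore' "$tmp")

# "-e" takes the next argument as the pattern, whatever it looks like.
b=$(grep -F -e '--restore' "$tmp")

# The same works with -o on the example from the answer above.
c=$(printf '%s\n' '--aa --bb --cc' | grep -o -e '--b')

printf '%s\n%s\n%s\n' "$a" "$b" "$c"
rm -f "$tmp"
```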

grep is not working inside while loop

I have two files
File1
area a
area b
areaf
File2
area a :aaaa
area b:bbbb
area3:abc
areaf:hsg
area4:uhg
area5:yutr
while read -r line
do
grep -w ^$line File2 | cut -d ":" -f2
done < File1
Desired output
aaaa
bbbb
hsg
actual output
grep: can't open a
area a
grep: can't open b
area3:abc
areaf:hsg
area4:uhg
area5:yutr
But when I run grep -w ^"area a" File2 | cut -d ":" -f2 it gives the correct output:
aaaa
Please assist me with this. I tried a for loop as well, without success. grep does not work inside the loop.
Your variable line might contain special characters: for example, a space, which the shell interprets as an argument separator, or characters that grep interprets as pattern metacharacters.
You need both to use fgrep (that is, grep -F) and to quote your variable (I'm not sure -w adds anything to that command; why do you feel the need for it?):
fgrep -w "$line" File2
But doing so, you lose the ability to anchor the match at the start of the line.
Another option, if the start-of-line match is required, is to escape the search string:
while read -r line
do
line=$(echo "$line" | sed -e 's/[]\/$*.^|[]/\\&/g')
grep -w "^$line" File2 | cut -d ":" -f2
done < File1
You can achieve the same result without a loop, since grep can read patterns from a file via the -f option. This will be more robust:
grep -f File1 File2 | cut -d: -f2
Gives:
aaaa
bbbb
hsg
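For comparison, here is the loop from the question with the variable expanded and quoted. It is a sketch that assumes the lines of File1 contain no regex metacharacters (otherwise escape them as shown in the sed answer); the files are recreated in a temporary directory for the demonstration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'area a' 'area b' 'areaf' > "$dir/File1"
printf '%s\n' 'area a :aaaa' 'area b:bbbb' 'area3:abc' \
    'areaf:hsg' 'area4:uhg' 'area5:yutr' > "$dir/File2"

# "$line" is quoted, so "area a" reaches grep as ONE pattern argument
# instead of being split into a pattern "area" and a file name "a".
out=$(while IFS= read -r line; do
    grep "^$line" "$dir/File2" | cut -d ':' -f2
done < "$dir/File1")

printf '%s\n' "$out"
rm -rf "$dir"
```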

How do you exclude symlinks in a grep?

I want to grep -R a directory but exclude symlinks. How do I do it?
Maybe something like grep -R --no-symlinks or something?
Thank you.
GNU grep v2.11-8 and later, when invoked with -r, skips symlinks that are not specified on the command line; when invoked with -R, it follows them.
If you already know the name(s) of the symlinks you want to exclude:
grep -r --exclude-dir=LINK1 --exclude-dir=LINK2 PATTERN .
If the name(s) of the symlinks vary, maybe exclude symlinks with a find command first, and then grep the files that this outputs:
find . -type f -a -exec grep -H PATTERN '{}' \;
The '-H' to grep adds the filename to the output (which is the default if grep is searching recursively, but is not here, where grep is being handed individual file names.)
I commonly want to modify grep to exclude source control directories. That is most efficiently done by the initial find command:
find . -name .git -prune -o -type f -a -exec grep -H PATTERN '{}' \;
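To see the effect of -type f on symlinks, here is a small self-contained sketch. The directory, the file names and the search string needle are invented for the test. It also uses the {} + form of -exec, which passes many file names to one grep call instead of starting grep once per file.

```shell
dir=$(mktemp -d)
printf 'needle\n' > "$dir/real.txt"
ln -s real.txt "$dir/link.txt"      # symlink pointing at the same content

# -type f matches regular files only, so link.txt is never handed to grep.
out=$(cd "$dir" && find . -type f -exec grep -l 'needle' {} +)

printf '%s\n' "$out"
rm -rf "$dir"
```

With several files per invocation grep prints file names on matching lines by itself, but adding -H keeps the output consistent when only one file happens to be passed.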
For now.. here is how I would exclude symbolic links when using grep
If you want just file names matching your search:
for f in $(grep -Rl 'search' *); do if [ ! -h "$f" ]; then echo "$f"; fi; done;
Explanation:
grep -R # recursive
grep -l # file names only
if [ ! -h "file" ] # true if file is not a symbolic link
If you want the matched content output, how about a double grep:
srch="whatever"; for f in $(grep -Rl "$srch" *); do if [ ! -h "$f" ]; then
echo -e "\n## $f";
grep -n "$srch" "$f";
fi; done;
Explanation:
echo -e # enable interpretation of backslash escapes
grep -n # adds line numbers to output
.. It's not perfect of course. But it could get the job done!
If you're using an older grep that does not have the -r behavior described in Aryeh Leib Taurog's answer, you can use a combination of find, xargs and grep (the -print0 and -0 options, available in GNU and BSD find and xargs, keep file names containing spaces intact):
find . -type f -print0 | xargs -0 grep "text-to-search-for"
If you are using BSD grep (Mac), the following works similarly to the -r option of GNU grep.
grep -OR <PATTERN> <PATH> 2> /dev/null
From man page
-O If -R is specified, follow symbolic links only if they were explicitly listed on the command line.
