I want to find lines containing Cyrillic characters, but print the corresponding ID line to a file. For example:
Author: Doe, John
Title: Оптимизация ресурсного потенциала промышленности города с учетом его конкурентных преимуществ
ID: 1234567
My current approach is to grep for Cyrillic characters:
grep -i -r --include=*{rdf,redif,rdf~} --color="auto" -P -n '[\x{0400}-\x{04FF}]' > cyrillic.txt
How can I print only the ID line to the file instead of the matching line?
Use the -A1 option if the ID: line comes right after the matching line, then pipe the output to a second grep that keeps only the line with ID:.
grep -A1 -i -r --include=*{rdf,redif,rdf~} --color="auto" -P -n '[\x{0400}-\x{04FF}]' \
| grep 'ID: ' > cyrillic.txt
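For illustration, here is a self-contained run of the -A1 pipeline on the sample record from the question. It assumes GNU grep with -P support and an available C.UTF-8 locale; the directory and the file name record.rdf are hypothetical. With -r, grep prefixes context lines with "file-" instead of "file:", which is why the second grep still finds ID: in them.

```shell
#!/bin/sh
# Assumption: GNU grep with PCRE (-P) support and a C.UTF-8 locale.
export LC_ALL=C.UTF-8

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical sample record, laid out as in the question.
cat > "$tmpdir/record.rdf" <<'EOF'
Author: Doe, John
Title: Оптимизация ресурсного потенциала промышленности города с учетом его конкурентных преимуществ
ID: 1234567
EOF

# -A1 prints each line containing a Cyrillic character plus the line after it;
# the second grep keeps only the ID: line.
find_ids() {
    grep -A1 -r --include='*rdf' -P '[\x{0400}-\x{04FF}]' "$1" | grep 'ID: '
}

find_ids "$tmpdir"
```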
Use grep's -h flag to suppress the file names in the output; together with -n you'll get output like:
4:string with matching pattern
5:string with matching pattern
7:string with matching pattern
Now you can pipe this output into awk and print only the first column, which is the line number of the matching string:
{your_grep} | awk -F ':' '{print $1}' > cyrillic.txt
Related
How can we find two substrings within a line in particular order using grep?
For example:
grep -c "word1" | grep -r "word2" logs
tells me whether a string has both word1 and word2, but I am looking for strings of the form "... word1.... word2..."
Try a regex such as grep -E "word1.*word2":
$ echo -e 'both word1 and word2. \nI hich\n has "... word1.... word2..."' | grep -E "word1.*word2"
both word1 and word2.
has "... word1.... word2..."
You may need a better regex to match exactly the words, but that is not your question.
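A small sketch of the difference between the plain pattern and one that matches exactly the words. The sample lines are made up, and \b (word boundary) is a GNU grep extension, so this assumes GNU grep.

```shell
#!/bin/sh
# Hypothetical sample lines.
sample() {
    printf '%s\n' \
        'word1 then word2' \
        'word2 then word1' \
        'myword1 then word2x'
}

# Order matters: word1 must appear somewhere before word2 on the line.
in_order() { sample | grep -E 'word1.*word2'; }

# \b (GNU extension) also rejects word1/word2 embedded in longer words.
whole_words() { sample | grep -E '\bword1\b.*\bword2\b'; }

in_order
whole_words
```

The second line never matches because word2 comes first; the third line matches only the plain pattern.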
Team,
I want to grep for a substring (the container name) and then output only that string, not the whole line. How can I do that? I know I can split on spaces with awk and pick a field with $N, but I want to know how to do it with grep.
echo $test_pods_info | grep -F 'test-'
output
test-78ac951e-89a6-4199-87a4-db8a1b8b054f export-9b55f0d5-071d-431-1d2ux0-avexport-xavierisp-sjc4--a4dd85-102 1/1 Running 0 19h
expected output
test-78ac951e-89a6-4199-87a4-db8a1b8b054f
awk is more suitable here, as you want the first field of a matching line:
awk '/test-/{print $1}' <<< "$test_pods_info"
test-78ac951e-89a6-4199-87a4-db8a1b8b054f
If you really want to use grep then this might be what you're looking for:
grep -o 'test-\S*' <<< "$test_pods_info"
or:
grep -o 'test-[^[:space:]]*' <<< "$test_pods_info"
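A quick way to try both answers on the sample line from the question. printf '%s\n' "$var" is used instead of an unquoted echo $var, because an unquoted expansion is word-split and would join a multi-line pod listing into a single line.

```shell
#!/bin/sh
# Sample pod line copied from the question.
test_pods_info='test-78ac951e-89a6-4199-87a4-db8a1b8b054f export-9b55f0d5-071d-431-1d2ux0-avexport-xavierisp-sjc4--a4dd85-102 1/1 Running 0 19h'

# awk: print the first field of every line containing "test-".
first_field() { printf '%s\n' "$test_pods_info" | awk '/test-/{print $1}'; }

# grep -o: print only "test-" plus the run of non-space characters after it.
# [^[:space:]] is POSIX; \S is a GNU extension.
only_match() { printf '%s\n' "$test_pods_info" | grep -o 'test-[^[:space:]]*'; }

first_field
only_match
```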
Try
echo $test_pods_info | grep -o 'test-'
the -o option is:
show[ing] only the part of a line matching PATTERN
according to grep --help. Of course, this will only print test-, so you'll need to rework your regex:
grep -oE 'test-[^[:space:]]+'
Figured it out:
echo $test_pods_info | grep -o "test-\w*-\w*-\w*-\w*-\w*"
output
test-78ac951e-89a6-4199-87a4-db8a1b8b054f
but I wish there were a simpler way, like test-*.
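If the part after test- is always a lowercase UUID (an assumption based on the single sample line), ERE interval expressions give a shorter pattern than repeating \w* five times:

```shell
#!/bin/sh
# Sample pod line copied from the question.
line='test-78ac951e-89a6-4199-87a4-db8a1b8b054f export-9b55f0d5-071d-431-1d2ux0-avexport-xavierisp-sjc4--a4dd85-102 1/1 Running 0 19h'

# "test-" followed by 8-4-4-4-12 lowercase hex digits (assumes a lowercase UUID).
pod_id() {
    printf '%s\n' "$line" | grep -oE 'test-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'
}

pod_id
```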
I have a file, file.txt, with a single line whose content is:
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why does grep not find the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by anything repeated 0 to n times". That returns the rest of the row from Xms onward.
What you actually want is something like "find Xms followed by any character except whitespace, repeated 0 to n times".
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
In [^ ], the leading ^ negates the bracket expression: it matches any character except a space.
I'm not really sure what you are trying to achieve here, but if you want the endings of all space-separated strings starting with -Xms, using bare awk is:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (word repeated in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed "how do I print only 8192m in this case" (maybe that's the "repeated" part). Let's add a counter c and skip the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
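If what you want is specifically the last -Xms value on the line (the counter above yields 8192m only because there are exactly two), a variant of the same awk approach can remember each value and print the final one at END. This is a sketch; the shortened sample line is hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Hypothetical shortened version of file.txt.
printf '%s\n' '/app/jdk/java/bin/java -server -Xms3g -Xmx3g -Dweblogic.Name=x -Xms8192m -Xmx8192m' > "$tmp"

# Space-separated records; each -Xms value overwrites v,
# so the value left at END is the last occurrence.
last_xms() {
    awk -v RS=' ' '/^-Xms/{v = substr($0, 5)} END{print v}' "$1"
}

last_xms "$tmp"
```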
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" to match a run of digits followed by a single letter, instead of the entire rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!
The .* in your regexp is matching the rest of the line, you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or more concisely:
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
PCRE's positive lookbehind (of the form (?<=RE1)RE2) solves the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches a run of non-whitespace characters that immediately follows the string Xms (Xms itself is not part of the match).
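GNU grep's PCRE mode also supports \K, which drops everything matched so far from the reported match, so it gives the same result as the lookbehind. This assumes grep was built with PCRE support; the option string is a shortened sample.

```shell
#!/bin/sh
# Assumption: GNU grep built with PCRE (-P) support.
opts='-server -Xms3g -Xmx3g -Xms8192m -Xmx8192m'

# \K discards "Xms" from the match, so -o prints only the value.
xms_values() { printf '%s\n' "$opts" | grep -oP 'Xms\K\S+'; }

xms_values
```

Piping the result through tail -n 1 keeps only 8192m.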
I want to print a file's name only if ALL three patterns are present in it, possibly on different lines:
grep -l -w '10B\|01A\|gencode' */$a*filename.vcf
This prints the file name, but also when only some of the three patterns are present.
Would you consider trying awk? It could solve this as follows, although note that this version only matches when all three patterns appear on the same line:
awk '/10B/&&/01A/&&/gencode/{print FILENAME}' */$a*filename.vcf
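If the three patterns may sit on different lines, a per-file sketch in POSIX awk keeps one flag per pattern, resets the flags when a new file starts (FNR == 1), and prints each file name at most once. The sample files are hypothetical, and unlike grep -w these are plain substring matches.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
# Hypothetical sample files: only all.vcf contains all three patterns.
printf '%s\n' '10B x' 'y 01A' 'gencode z' > "$tmpdir/all.vcf"
printf '%s\n' '10B x' 'gencode z' > "$tmpdir/some.vcf"

# One flag per pattern, reset at the first line of each file;
# print the file name once, as soon as all three have been seen.
files_with_all() {
    awk 'FNR == 1   { a = b = c = done = 0 }
         /10B/      { a = 1 }
         /01A/      { b = 1 }
         /gencode/  { c = 1 }
         a && b && c && !done { print FILENAME; done = 1 }' "$@"
}

files_with_all "$tmpdir"/*.vcf
```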
Try the following, a slight edit of your command (it requires all three patterns on the same line, in this order):
grep -l '10B.*01A.*gencode' Input_file
With grep's -P (Perl-compatible regular expressions) option and positive lookaheads (?=(regex)), you can match the patterns in any order within a single line:
grep -lwP '(?=.*?10B)(?=.*?01A)(?=.*?gencode)' /path/to/infile
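The lookahead version above tests each line separately. A sketch of a whole-file variant, assuming GNU grep: -z makes NUL rather than newline the line terminator, so a text file with no NUL bytes is read as a single record, and (?s) lets . match newlines. The sample files are hypothetical.

```shell
#!/bin/sh
# Assumption: GNU grep with -P and -z support.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' 'gencode z' 'y 01A' '10B x' > "$tmpdir/all.vcf"
printf '%s\n' '10B x' 'gencode z' > "$tmpdir/some.vcf"

# The whole file is one record, so the three lookaheads can succeed
# on different lines and in any order.
all_three() {
    grep -lzP '(?s)(?=.*?10B)(?=.*?01A)(?=.*?gencode)' "$@"
}

all_three "$tmpdir"/*.vcf
```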
grep -l 'pattern1' files ... | xargs grep -l 'pattern2' | xargs grep -l 'pattern3'
From the grep manual:
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match. (-l is specified by POSIX.)
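The xargs chain splits on whitespace, so it breaks on file names containing spaces. A sketch of a NUL-separated version, assuming GNU grep (for -Z) and an xargs that supports -0 and -r (GNU and busybox do); the sample files are hypothetical:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
# Hypothetical sample files; one name contains a space.
printf '%s\n' 'pattern1' 'pattern2' 'pattern3' > "$tmpdir/has all.txt"
printf '%s\n' 'pattern1' 'pattern3' > "$tmpdir/partial.txt"

# -Z ends each file name with a NUL byte and xargs -0 splits on NUL,
# so names with spaces survive; -r skips grep when the list is empty.
files_with_all() {
    grep -lZ 'pattern1' "$@" |
        xargs -0 -r grep -lZ 'pattern2' |
        xargs -0 -r grep -l 'pattern3'
}

files_with_all "$tmpdir"/*.txt
```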
I have two files
File1
area a
area b
areaf
File2
area a :aaaa
area b:bbbb
area3:abc
areaf:hsg
area4:uhg
area5:yutr
while read -r line
do
grep -w ^$line File2 | cut -d ":" -f2
done < File1
Desired output
aaaa
bbbb
hsg
actual output
grep: can't open a
area a
grep: cant open b
area3:abc
areaf:hsg
area4:uhg
area5:yutr
But when I run grep -w ^"area a" File2 | cut -d ":" -f2, it gives the correct output:
aaaa
Please help me with this. I also tried a for loop, with no success; grep does not seem to work inside the loop.
Your variable line might contain "special characters": for example, a space that the shell interprets as a word separator, or characters that grep interprets as pattern metacharacters.
You need both to use fixed-string matching (grep -F; fgrep is an obsolescent alias for it) and to quote your variable (I'm not sure -w adds anything to that command; why do you feel the need for it?):
grep -Fw "$line" File2
But doing so, you lose the ability to anchor the match at "the first character" of the line (^).
Another option, if the start-of-line match is required, is to escape the search string:
while read -r line
do
line=$(echo "$line" | sed -e 's/[$*.^[]/\\&/g')
grep -w "^$line" File2 | cut -d ":" -f2
done < File1
You can achieve the same result without a loop, since grep can read patterns from a file via the -f option. This will be more robust:
grep -f File1 File2 | cut -d: -f2
Gives:
aaaa
bbbb
hsg
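A slightly stricter variant of the -f approach, assuming GNU grep: -F treats each line of File1 as a literal string (so characters like . or [ are not regex metacharacters) and -w requires whole-word matches. Unlike the escaped loop, it does not anchor the key to the start of the line. The sample data is the one from the question.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1
printf '%s\n' 'area a' 'area b' 'areaf' > File1
printf '%s\n' 'area a :aaaa' 'area b:bbbb' 'area3:abc' 'areaf:hsg' 'area4:uhg' 'area5:yutr' > File2

# -F: literal strings; -w: whole words only; -f File1: all keys in one pass.
lookup() { grep -Fwf File1 File2 | cut -d: -f2; }

lookup
```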