grep football player names with special characters - grep

Sorry for asking so many questions recently.
I have another issue with grep in bash (I am using Git Bash).
When I try to use grep for names with special characters e.g.:
Nikola Boranijaševi
Niclas Füllkrug
Christian Groß
Anderson Gonçalves
Oliver Kahn
Manual Neuer
etc....
code I tested:
egrep -i "Nikola Boranijaševi|Niclas Füllkrug|Christian Groß|Anderson Gonçalves" playerlist.csv >>matches.csv
This returns nothing.
The same code works for players without any special characters.
I even did a small test like:
egrep -aoi "Anderson Gonçalves" playerlist.csv >> matches.csv
but again this did not output anything.
Does anyone know how to use grep with special characters?
I read here on the forum that I should use a double \ before special characters, so I also tried the following:
egrep -aoi "Anderson Gon\\çalves" playerlist.csv >> matches.csv
Again no output. Any ideas would be appreciated, thanks!
Even a grep with a wildcard character in place of the ç in Anderson Gonçalves would help. Thank you!

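The answer is to use wildcards; it's not perfect, but it does work. Note that in a regex * repeats the preceding character rather than standing for any text, so each pattern below effectively matches only the part of the name before the special character, and with -o only that part is written to matches.csv: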
egrep -aoi "Nikola Boranija*|Niclas F*|Christian Gro*|Anderson Gon*" playerlist.csv >>matches.csv
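If you want the whole name rather than just the prefix, one option is to let . stand in for the special character. This is only a sketch: the three sample rows are made up, and it assumes the special character is a single character in a UTF-8 locale or at most two bytes in a byte-oriented locale, which is why the pattern uses .{1,2}. The thread never confirms why the original search failed; an encoding mismatch between the CSV and the terminal is one possible cause.

```shell
# Hypothetical sample data standing in for playerlist.csv
tmp=$(mktemp)
printf '%s\n' 'Anderson Gonçalves,BRA' 'Niclas Füllkrug,GER' 'Oliver Kahn,GER' > "$tmp"

# .{1,2} replaces the special character: it covers one character in a
# UTF-8 locale, or the character's one or two raw bytes otherwise.
# -a treats the file as text, -i ignores case, -E enables | and {1,2}.
matches=$(grep -aiE 'Anderson Gon.{1,2}alves|Niclas F.{1,2}llkrug' "$tmp")
printf '%s\n' "$matches"

rm -f "$tmp"
```

Dropping -o here keeps the full CSV row, which is usually what you want when the result goes into matches.csv.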

Related

Trying to figure out why my regex command won't work [duplicate]

This question already has answers here:
How do you use a plus symbol with a character class as part of a regular expression?
(3 answers)
Closed 2 years ago.
I have a problem to work on and was wondering why my regex won't work. It's a simple exercise: match the words in a text dictionary that contain only letters from the top row of the keyboard. I believe I have a solution, but grep comes up blank every time:
grep ^[qwertyuiop]+$ /opt/~~~~~~/data/web2
This is my command, which does nothing, but if I just put:
grep [qwertyuiop] /opt/~~~~~~/data/web2
it matches words with letters from the top row. Can anybody tell me why it isn't working? Thank you all for your time.
You're super close.
With grep you want to use the -x flag to match the whole line:
grep -x '[qwertyuiop]\+' /usr/share/dict/american-english
Then a simple escaped + matches one or more characters.
If you want to avoid the -x, you can take your original approach like so:
grep '^[qwertyuiop]\+$' /usr/share/dict/american-english
With an escape and some quotes it works marvelously, although I think -x is more idiomatic. As some other people have commented, you can also get away with using -E, although that can have some unintended consequences. I recommend man grep, which gives a nice overview.
I don't think grep recognizes an unescaped + on its own (^ and $ are fine). You have to escape it as \+ or use grep -E or egrep to use special characters like that.
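To see both variants from the answers side by side, here is a minimal sketch. The five-word list is made up and stands in for the dictionary file, and \+ in a basic regex is a GNU extension, so that line assumes GNU grep. Quoting the pattern also stops the shell from treating [qwertyuiop] as a filename glob.

```shell
# Hypothetical word list standing in for the dictionary file
words=$(printf '%s\n' typewriter pepper query apple row)

# Basic regex: + must be escaped; ^ and $ anchor the whole line
bre=$(printf '%s\n' "$words" | grep '^[qwertyuiop]\+$')

# Extended regex: + works unescaped; -x replaces the explicit anchors
ere=$(printf '%s\n' "$words" | grep -xE '[qwertyuiop]+')

printf '%s\n' "$ere"
```

Both commands should keep every word except apple, since a is not on the top row.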

Grep Individual Commands not working when combined in Multi Pattern grep command

I have a need to perform multiple grep matches as part of the same grep command. When I run them individually, they work fine. But not when together. I hope someone could either show me a solution or perhaps can help me find a work-around. Here is sample stream:
(string start..) RollUp:"V" Enzyme:"ENZA ENZB ENZD ENZE" (..string end)
In the first command I need to isolate all RollUp substrings. The value is always A or V:
grep -o "RollUp:\"[AV]\""
In the second command I need to isolate all combinations of Enzyme values (1-20 total, separated by spaces, names not known in advance). This command works:
grep -oE 'Enzyme:[[:space:]]*"[^"]+"'
However, I need to match both patterns as part of same stream. When I try:
grep -oE "RollUp:\"[AV]\""\|Enzyme:[[:space:]]*"[^"]+""
nothing is returned. I would be grateful for any ideas for getting this double grep pattern match to work. Thank you!
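The regex something[^"]+ means the string something followed by everything up to the next ". Here the + sign means one or more matches. Your combined command fails because the double quotes inside the pattern end the shell string early; wrapping the whole pattern in single quotes avoids that:
grep -oE 'RollUp:"[^"]+"|Enzyme:[[:space:]]*"[^"]+"' file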
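As a quick check, here is a small sketch that runs a single-quoted alternation against the sample stream from the question. It uses the question's stricter [AV] class for RollUp; the variable names are just for illustration.

```shell
# Sample stream copied from the question
line='(string start..) RollUp:"V" Enzyme:"ENZA ENZB ENZD ENZE" (..string end)'

# Single quotes keep the inner double quotes and the | away from the shell.
# -o prints each match on its own line, -E enables | and + unescaped.
out=$(printf '%s\n' "$line" | grep -oE 'RollUp:"[AV]"|Enzyme:[[:space:]]*"[^"]+"')

printf '%s\n' "$out"
```

With -o, each alternative that matches is printed on its own line, in the order it appears in the input.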

Find a string between two characters with grep

I found in this answer a regex to find a string between two characters. In my case I want to find every pattern between ‘ and ’. Here's the regex:
(?<=‘)(.*?)(?=’)
Indeed, it works when I try it on https://regex101.com/.
The thing is, I want to use it with grep, but it doesn't work:
grep -E '(?<=‘)(.*?)(?=’)' file
Is there anything missing?
Those are positive lookahead and lookbehind assertions, which grep -E does not support. You need to enable PCRE (Perl Compatible Regular Expressions) with -P, and it's probably better to print only the matching part using the -o option in GNU grep:
grep -oP '(?<=‘)(.*?)(?=’)' file
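Here is a minimal sketch of that command on a made-up sentence. It assumes GNU grep built with PCRE support, since -P is not available in every grep.

```shell
# Hypothetical input containing two quoted fragments
text='He said ‘hello’ and then ‘goodbye’.'

# (?<=‘) and (?=’) require the quotes around the match without including
# them; .*? is lazy, so each match stops at the nearest closing quote.
quoted=$(printf '%s\n' "$text" | grep -oP '(?<=‘).*?(?=’)')

printf '%s\n' "$quoted"
```

Without the lazy .*?, a greedy .* would run from the first opening quote to the last closing quote on the line.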

ksh - search for multiple strings and write lines to file

Any help would be greatly appreciated. I can read code and figure it out, but I have trouble writing from scratch.
I need help starting a ksh script that would search a file for multiple strings and write each line containing one of those strings to an output file.
If I use the following command:
$ grep "search pattern" file >> output_file
...that does what I want it to. But I need to search multiple strings, and write the output in the order listed in the file.
Again... any help would be great! Thank you in advance!
Have a look at the regular expression manuals. You can specify multiple strings in the search expression, such as grep -E "John|Bill".
man grep will teach you a lot about regular expressions, and there are several online sites where you can try them out, such as regex101 and (more colorful) regexr.
Sometimes you need egrep (the same as grep -E), because plain grep treats | as a literal character:
egrep "first substring|second substring" file
When you have a lot of substrings, you can put them in a variable first:
findalot="first substring|second substring"
findalot="${findalot}|third substring"
findalot="${findalot}|find me too"
skipsome="notme"
skipsome="${skipsome}|dirty words"
egrep "${findalot}" file | egrep -v "${skipsome}"
Use -f in grep.
Write all the strings you want to match in a file (let's say pattern_file, one string per line)
and use grep like below:
grep -f pattern_file file > output_file
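A small end-to-end sketch of the -f approach follows. The file contents are made up, and the -F flag is my addition: it makes grep treat each line of pattern_file as a fixed string rather than a regex, which is usually what you want for plain search strings.

```shell
# Work in a scratch directory so nothing real is overwritten
dir=$(mktemp -d)

# One search string per line (hypothetical contents)
printf '%s\n' 'first substring' 'find me too' > "$dir/pattern_file"
printf '%s\n' 'line with first substring' 'unrelated line' 'please find me too' > "$dir/file"

# -f reads the patterns from a file; -F treats them as fixed strings
grep -F -f "$dir/pattern_file" "$dir/file" > "$dir/output_file"

result=$(cat "$dir/output_file")
printf '%s\n' "$result"
rm -rf "$dir"
```

Matching lines come out in the order they appear in the searched file, not in the order of the patterns.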

Opposite of "only-matching" in grep?

Is there any way to do the opposite of showing only the matching part of strings in grep (the -o flag), that is, show everything except the part that matches the regex?
That is, the -v flag is not the answer, since that would not show files containing the match at all, but I want to show these lines, but not the part of the line that matches.
EDIT: I wanted to use grep over sed, since it can do "only-matching" matches on multi-line, with:
cat file.xml|grep -Pzo "<starttag>.*?(\n.*?)+.*?</starttag>"
This is a rather unusual requirement; I don't think grep can alter the strings like that. You can achieve this with sed, though:
sed -n "s/$PATTERN//gp" file
EDIT in response to OP's edit:
You can do multiline matching with sed, too, if the file is small enough to load it all into memory (note that sed has no non-greedy quantifiers, so .*? behaves like a greedy .* here):
sed -rn ':r;$!{N;br};s/<starttag>.*?(\n.*?)+.*?<\/starttag>//gp' file.xml
You can do that with a little help from sed:
grep "pattern" input_file | sed 's/pattern//g'
I don't think there is a way in grep.
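For a single-line pattern, the sed route can be sketched like this. The input lines and the digit pattern are hypothetical; the point is that -n together with the p flag prints only the lines where something was removed, which mirrors grep's line selection.

```shell
# Hypothetical input: two lines contain digits, one does not
input=$(printf '%s\n' 'foo=123;bar' 'no digits here' 'x9y88z')

# s/.../​/g deletes every match; -n plus the p flag prints a line only
# if at least one substitution happened on it
rest=$(printf '%s\n' "$input" | sed -n 's/[0-9][0-9]*//gp')

printf '%s\n' "$rest"
```

This does the selection and the removal in one sed call, so the extra grep in front is optional.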
If you use ack, you can output Perl's special variables $` and $' to show everything before and after the match, respectively:
ack string --output="\$\`\$'"
Similarly, if you want to output what did match along with other text, you can use $&, which contains the matched string:
ack string --output='Matched: $&'

Resources