Is there a specific regex class that includes the euro sign?
According to the grep manual, the [:print:] class (AFAIK € is printable) and the [:punct:] class don't contain the euro sign, as they contain only the locale's (en_US.UTF-8) and ASCII punctuation characters (including $).
$ echo "I can has 5€ ?" | grep -o "[[:print:]]*"
I can has 5
Is there another solution (I guess this problem will occur with every currency character other than the dollar) that will allow me to catch every printable character?
EDIT
After playing with the PuTTY settings, I managed to display the € when I print the file, but grepping it still acts weird. Initially I couldn't even print the €, but after changing the PuTTY encoding to cp1252 (rather than Unicode) I can see the sign. Grepping still doesn't work, though:
$ cat test.bah
I can has 5€ ?
$ cat test.bah | grep -o '[[:print:]]*'
I can has 5
?
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
...
Apparently, on my machine:
Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-32-virtual i686)
grep --version grep (GNU grep) 2.10
bash --version GNU bash, version 4.2.24(1)-release (i686-pc-linux-gnu)
The solution was to use the -P switch and match printable [[:print:]] or non-printable [^[:print:]] characters:
$ cat test.bah
I can has 5€ or 5£?
$ cat test.bah | grep -P -o '[[:print:]]*'
I can has 5
or 5
?
(By the way, the line breaks are the result of multiple matches, not of misprinted currency signs.)
$ cat test.bah | grep -P -o '[^[:print:]]*'
€
£
$ cat test.bah | grep -P -o '([[:print:]]|[^[:print:]])*'
I can has 5€ or 5£?
A few notes:
@melpomene showed in his answer that his console or version of grep handles the currency signs better. I am using the stock AWS version of Ubuntu 12.04.
In my question I mentioned that I needed to change the PuTTY settings. In case somebody needs them: under Window -> Translation, set the received data character set to Win1252 (Western). Counterintuitive as it may seem, setting the encoding to UTF-8 made the € and £ appear as # or ▒ (depending on the selected drawing-character option).
$ echo "I can has 5€ ?" | grep -o '[[:print:]]*'
I can has 5€ ?
$ echo $LANG
en_US.utf8
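The difference between the two machines may come down to the locale grep actually runs under, since [:print:] is defined per locale. Here is a minimal sketch, assuming GNU grep; the helper names sample and printable_runs are made up for illustration. Forcing the C locale should reproduce the split around the €, because the three bytes of its UTF-8 encoding are not printable characters there.

```shell
# Hypothetical helper: emit the test line. \342\202\254 is the UTF-8
# byte sequence of the euro sign.
sample() {
    printf 'I can has 5\342\202\254 ?\n'
}

# Hypothetical helper: list the runs of printable characters on stdin,
# with the locale to use given as the first argument.
printable_runs() {
    LC_ALL="$1" grep -o '[[:print:]]*'
}

# C locale: bytes above 127 are not in [:print:], so the line is split.
sample | printable_runs C

# A UTF-8 locale (assumed to be installed; check `locale -a` first):
# sample | printable_runs en_US.UTF-8
```

If the second call keeps the line in one piece while the first does not, the locale seen by grep, rather than the pattern, is the likely culprit.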
Related
I saw this question: count (non-blank) lines-of-code in bash
I understand this pattern is correct.
grep -vc ^$ filename
Why does this pattern return the same result?
grep -c '[^ ]' filename
What is the trick in '[^ ]'?
$ printf 'foo 123\n \nxyz\n\t\n' > ip.txt
$ cat -T ip.txt
foo 123
xyz
^I
$ grep -vc '^$' ip.txt
4
$ grep -c '[^ ]' ip.txt
3
$ grep -c '[^[:blank:]]' ip.txt
2
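grep -c '[^ ]' counts every line that contains at least one non-space character. For example, foo 123 is counted because letters and digits are not space characters, and so is the line holding only a tab. grep -vc '^$' counts every non-empty line, including lines made up solely of spaces. So which one to use depends on whether a line containing only whitespace should be counted.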
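If lines holding only spaces, tabs or a stray carriage return should all be treated as blank, the [:space:] class is one option. A small sketch; the function name count_nonblank is an assumption, not something from the question:

```shell
# Hypothetical helper: count lines that contain at least one character
# that is not whitespace. [:space:] covers spaces, tabs and carriage returns.
count_nonblank() {
    grep -c '[^[:space:]]' "$1"
}

tmp=$(mktemp)
printf 'foo 123\n \nxyz\n\t\n' > "$tmp"
count_nonblank "$tmp"    # prints 2: only "foo 123" and "xyz" qualify
rm -f "$tmp"
```

Note that grep -c exits with status 1 when the count is 0, which matters if the script runs under set -e.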
I am trying to learn regex, but it is hard.
For example, I have 3 files:
$ ls
thisisnothing12.txt Thisisnothing12.txt thisisnothing.txt
I want to use ls and grep to list only the 2 files with digits in their names.
This is what I have tried, but neither command shows a single file. Why? What's wrong with them?
$ ls | grep "^[\w]+[\d]+\.[\w]{3}$"
$ ls | grep "^[a-zA-Z]+[0-9]+\.[a-zA-Z]{3}$"
Thx.
There are different regex flavors, see https://stackoverflow.com/a/66256100/7475450
You need to use PCRE if you want to use \d:
$ touch thisisnothing12.txt Thisisnothing12.txt thisisnothing.txt
$ ls
Thisisnothing12.txt thisisnothing.txt thisisnothing12.txt
$ ls | grep '\d' # '\d' does not work in POSIX Basic regex
$ ls | grep -P '\d' # use PCRE regex
Thisisnothing12.txt
thisisnothing12.txt
$
As you can see you can search for just the characters you are interested in.
You can narrow down, such as finding files that start with a number:
$ touch 2feet.txt
$ ls | grep -P '\d'
2feet.txt
Thisisnothing12.txt
thisisnothing12.txt
$ ls | grep -P '^\d'
2feet.txt
$
Learn more with this tutorial: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
^[\w]+[\d]+\.[\w]{3}$
^[a-zA-Z]+[0-9]+\.[a-zA-Z]{3}$
Let's simplify a bit. In PCRE, [\w] is the same as \w, which is [A-Za-z0-9_], and [\d] is the same as \d, which is [0-9]. The two patterns are therefore close but not identical: \w also matches digits and the underscore.
So the first one simplifies to
^\w+\d+\.\w{3}$
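The anchors ^ and $ are fine here: grep applies them to each line, and ls writes one name per line whenever its output goes to a pipe (ls -1 forces that layout explicitly, which does no harm). The actual problem is the regex flavor. grep defaults to basic regular expressions, where + and {3} are literal characters and \d is not supported at all, so you need the -P flag for \w and \d to work as intended.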
$ ls -1 | grep -P "^\w+\d+\.\w{3}$"
You can try different regexes here: https://regexr.com/5mujo
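For comparison, the second pattern from the question needs no Perl syntax at all. A sketch, with function names that are illustrative only: -E switches grep to extended regular expressions, where + and {3} are quantifiers, while the default basic flavor spells the same thing with \{1,\} and \{3\}.

```shell
# Stand-in for `ls`, so the example does not depend on the current directory.
list_names() {
    printf '%s\n' thisisnothing12.txt Thisisnothing12.txt thisisnothing.txt
}

# ERE: + and {3} work unescaped.
with_digits_ere() {
    grep -E '^[a-zA-Z]+[0-9]+\.[a-zA-Z]{3}$'
}

# BRE (grep's default): interval braces must be escaped.
with_digits_bre() {
    grep '^[a-zA-Z]\{1,\}[0-9]\{1,\}\.[a-zA-Z]\{3\}$'
}

list_names | with_digits_ere
list_names | with_digits_bre
```

Both calls should list only the two names that contain digits.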
I have a file, file.txt, with one line whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reading the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by any character, repeated 0 to n times". That returns the rest of the line from the first Xms onward.
What you actually want is something like "find Xms followed by any run of characters that are not a space".
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
In [^ ] the leading ^ means "not", so the bracket expression matches any character that is not a space.
I'm not really sure what you are trying to achieve here, but if you want the endings of all space-separated strings starting with -Xms, plain awk will do:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (the word "repeated" in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed "how do I print only 8192m in this case" (maybe that's the "repeated"). Let's add a counter c and skip the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
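If the number of occurrences is not known in advance, remembering the most recent match and printing it at the end may be more robust than a counter. A sketch under the assumption that options are separated by spaces; the function name last_xms is made up here:

```shell
# Hypothetical helper: print the value of the last -Xms option on stdin.
last_xms() {
    # Put each option on its own line, then keep the most recent -Xms value.
    tr -s ' ' '\n' |
        awk '/^-Xms/ { value = substr($0, 5) } END { if (value != "") print value }'
}

printf '%s\n' 'java -server -Xms3g -Xmx3g -Xms8192m -Xmx8192m' | last_xms
```

The shortened command line in the example stands in for the full one from the question; it should print 8192m.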
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" to match a sequence of digits followed by a single letter, rather than the entire rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!
The .* in your regexp matches the rest of the line; you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or, if you only want the last occurrence, more concisely (the greedy leading .* makes sed capture the final Xms):
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches every run of non-whitespace characters that immediately follows the string Xms.
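Where grep -P is not available, a similar result can be had by matching Xms together with its value and stripping the prefix afterwards. A sketch assuming a grep that supports -o; the function name xms_values is hypothetical:

```shell
# Hypothetical helper: print every value that follows "Xms" on stdin.
xms_values() {
    # grep isolates each Xms... token; sed removes the leading "Xms".
    grep -o 'Xms[^ ]*' | sed 's/^Xms//'
}

printf '%s\n' 'java -server -Xms3g -Xmx3g -Xms8192m -Xmx8192m' | xms_values
```

Unlike cut -d's' -f2, removing the prefix with sed does not truncate a value that happens to contain the letter s.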
grep -w "ing_[0-9][0-9][0-9][0-9]"
The command above works. But is there a shorter way to write the four digits?
This does not work:
grep -w "ing_[0-9]\+ {4}"
By default, grep uses basic regular expressions (BRE). In BRE, you need to escape the curly braces for them to be treated as a repetition quantifier.
grep -w "ing_[0-9]\{4\}" file
Example:
$ echo 'ing_6786 says' | grep -w "ing_[0-9]{4}"
$ echo 'ing_6786 says' | grep -w "ing_[0-9]\{4\}"
ing_6786 says
If your grep supports Perl-compatible regular expressions, try the -P option:
grep -wP "ing_[0-9]{4}"
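Another option that avoids both the backslashes and the dependency on -P is extended regular expressions via -E. A short sketch; the sample lines and function names are invented for the demonstration:

```shell
# Hypothetical sample input: only the first line has exactly four digits.
sample_lines() {
    printf '%s\n' 'ing_6786 says' 'ing_12345 says' 'ing_123 says'
}

# -E enables extended regular expressions, so {4} needs no backslashes.
# -w keeps ing_12345 from matching on its first four digits.
four_digits() {
    grep -wE 'ing_[0-9]{4}'
}

sample_lines | four_digits
```

Only the line ing_6786 says should be printed.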
I would like to grep the digits inside a set of quotes after a match.
Given foo.txt below,
foo: "32.1" bar: "42.0" misc: "52.3"
I want to extract the number after bar, 42.0.
The following line will match, but I'd like to extract only the number. I guess I could pipe the output back into grep looking for \d+.\d+, but is there a better way?
grep -o -P 'bar: "\d+.\d+"' foo.txt
One way is to use lookahead and lookbehind assertions (note the escaped dot, so that it matches only a literal period):
grep -o -P '(?<=bar: ")\d+\.\d+(?=")' foo.txt
Another is to use sed:
sed -e 's/.*bar: "\([[:digit:]]\+\.[[:digit:]]\+\)".*/\1/' foo.txt
You could also use the grep below, where \K drops everything matched before it from the output:
$ echo 'foo: "32.1" bar: "42.0" misc: "52.3"' | grep -oP 'bar:\s+"\K[^"]*(?=")'
42.0
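For systems without grep -P, the sed approach can be written in portable basic regular expressions. A sketch; the function name bar_value is an assumption made for this example:

```shell
# Hypothetical helper: print the quoted number that follows "bar: ".
bar_value() {
    # -n suppresses the default output; the p flag prints only lines
    # where the substitution succeeded.
    sed -n 's/.*bar: "\([0-9][0-9]*\.[0-9][0-9]*\)".*/\1/p'
}

printf '%s\n' 'foo: "32.1" bar: "42.0" misc: "52.3"' | bar_value
```

Thanks to -n and p, lines without a bar entry produce no output at all instead of being echoed unchanged.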