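I am a little confused by egrep "^[0-9]+ < [0-9]{1,3}$". I know that:
^[0-9] matches a line that starts with a digit. But what do +, <, and {1,3} mean here?
+ means one or more matches of the preceding item, so ^[0-9]+ is one or more digits at the start of the line.
The " < " in the middle is literal: a space, a less-than sign, and another space.
{1,3} means the preceding item is matched 1 to 3 times, so [0-9]{1,3}$ is one to three digits at the end of the line.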
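To make the pieces concrete, here is a small sketch that runs the same pattern through Python's re module, which treats ^, +, {1,3} and $ the same way egrep does for this expression. The sample lines and the helper name matches are made up for illustration.

```python
import re

# Same pattern as in the egrep command above.
PATTERN = re.compile(r"^[0-9]+ < [0-9]{1,3}$")

def matches(line: str) -> bool:
    """Return True if the whole line has the form '<digits> < <1-3 digits>'."""
    return PATTERN.search(line) is not None

# Hypothetical sample lines, not taken from the question.
samples = ["7 < 42", "12345 < 999", "7 < 1000", "7<42", "x7 < 42"]

for line in samples:
    print(f"{line!r}: {matches(line)}")
```

The third sample fails because of {1,3} (four digits on the right), the fourth because the literal spaces around < are missing, and the fifth because of the ^ anchor.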
I'm trying to do a word search with regex and wonder how to type AND for multiple criteria.
For example, how to type the following:
(Start with a) AND (Contains p) AND (Ends with e), such as the word apple?
Input
apple
pineapple
avocado
Code
grep -E "regex expression here" input.txt
Desired output
apple
What should the regex expression be?
In general you can't implement "and" in a single regexp (though you can implement "then" with .*), but you can in a multi-regexp condition using a tool that supports it.
To really exercise "and", your example should have been "starts with a and includes p and includes l and ends with e", with input that includes alpine. That isn't trivial to express in a single regexp by just putting .*s between characters, but it is trivial in a multi-regexp condition:
$ cat file
apple
pineapple
avocado
alpine
Using &&s will find both words regardless of the order of p and l as desired:
$ awk '/^a/ && /p/ && /l/ && /e$/' file
apple
alpine
but, as you can see, you can't just use .*s to implement and:
$ grep '^a.*p.*l.*e$' file
apple
If you had to use a single regexp then you'd have to do something like:
$ grep -E '^a.*(p.*l|l.*p).*e$' file
apple
alpine
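For comparison, the same two approaches can be sketched in Python: one small regex per condition combined with all(), and the single regex with alternation. This is only an illustration of the idea, not a replacement for the awk and grep commands; the names conditions, matches_all and single are made up here.

```python
import re

words = ["apple", "pineapple", "avocado", "alpine"]

# One small regex per condition; a word must satisfy every one of them.
conditions = [re.compile(p) for p in (r"^a", r"p", r"l", r"e$")]

def matches_all(word: str) -> bool:
    """AND of several regexes, like awk '/^a/ && /p/ && /l/ && /e$/'."""
    return all(c.search(word) for c in conditions)

# The single-regex version has to spell out both orders of p and l.
single = re.compile(r"^a.*(p.*l|l.*p).*e$")

multi_result = [w for w in words if matches_all(w)]
single_result = [w for w in words if single.search(w)]

print(multi_result)
print(single_result)
```

With more than two order-independent conditions, the alternation in the single regex grows quickly, while the list of conditions just gets one more entry.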
There are two ways you can do it.
A chain of "&&" is the same as negating an "||" of the negated conditions, so you can write the reverse of what you want and negate it.
At the single-bit level, AND is the same as multiplying the bits, so if you think all the && are overly verbose, you can "multiply" the patterns together directly:
awk '/^a/ * /p/ * /e$/'
Multiplying them does the same as performing multiple logical ANDs all at once
(but only use the shorthand if inputs aren't too gigantic, or when the savings from early exit are known to be negligible, since multiplication evaluates every pattern and never short-circuits).
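A rough Python sketch of the same equivalences, in case it helps to see them side by side. The three helper functions are hypothetical names, and the three conditions are the ones from the awk one-liner above. Python booleans multiply like 0 and 1, which is what makes the product form work.

```python
import re

def flags(word: str):
    """Evaluate each condition to a plain True/False."""
    return [bool(re.search(p, word)) for p in (r"^a", r"p", r"e$")]

def and_form(word: str) -> bool:
    a, p, e = flags(word)
    return a and p and e

def negated_or_form(word: str) -> bool:
    # De Morgan: (a and p and e) == not (not a or not p or not e)
    a, p, e = flags(word)
    return not (not a or not p or not e)

def product_form(word: str) -> bool:
    # True counts as 1 and False as 0, so the product is 1 only if all hold.
    a, p, e = flags(word)
    return a * p * e != 0

for w in ["apple", "pineapple", "avocado"]:
    print(w, and_form(w), negated_or_form(w), product_form(w))
```

Note that flags() always evaluates all three conditions, so none of these forms short-circuits in this sketch; that mirrors the cost of the multiplication shorthand.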
Don't think of them as merely regex patterns. It's easier to think of anything outside an action block, what's typically referred to as the pattern, as any combination of items that can be evaluated to a boolean outcome of TRUE or FALSE in the end.
POSIX-compliant expressions that work in that position include:
sprintf()
field assignments, etc.
(even decrementing NR, if there's such a need)
but not:
statements like next, print, printf(),
delete array, etc., or any of the loop structures.
Surprisingly though, getline can be used directly in the pattern position (with some wrapper workaround).
For 1, I can get 101 to 191 to print. How do I include 203 and up as well so that it includes everything from 10 up? For 2, I can get the first set of names starting with an L to print but not the ones in the and 230. Please don't suggest I use something else like awk or sed, I want to know how to do it the way I am currently trying to do it. How can I expand the ranges I am searching in order to include more. Thanks.
For 1) since it has to be 10 or more, it needs 2 or more digits, so just use this:
grep 'per[0-9]\{2,\}'
For 2), just do
grep 'per[0-9]*:L'
and, of course, you can combine them with
grep 'per[0-9]\{2,\}:L'
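As a quick check of what each of the three patterns selects, here is a sketch in Python. The input lines are invented (the original idfile.txt is not shown), and the grep BRE \{2,\} is written as {2,} because Python's re uses the unescaped form.

```python
import re

# Hypothetical records; the real file layout is an assumption.
lines = ["per5:Lopez", "per101:Lee", "per203:Smith", "per230:Long"]

def grep(pattern: str, rows):
    """Return the rows that contain a match, like grep without options."""
    rx = re.compile(pattern)
    return [row for row in rows if rx.search(row)]

two_or_more_digits = grep(r"per[0-9]{2,}", lines)    # case 1: 10 and up
starts_with_l = grep(r"per[0-9]*:L", lines)          # case 2: names in L
combined = grep(r"per[0-9]{2,}:L", lines)            # both at once

print(two_or_more_digits)
print(starts_with_l)
print(combined)
```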
Try using * to match repeated digits, like: grep "per[0-9]*:L" idfile.txt
This is a more detailed answer :)
Regex - Matching arbitrary amount of numbers
I was looking for a solution to a regex problem I had in Rails, and an answer to a separate question led me 90% of the way there. Basically, I would like a Ruby/Rails script that tidies up messy text by capitalizing the letter after a "./,/!/?". This is the code by "Mark S":
ng = Nokogiri::HTML.fragment("<p>hello, how are you? oh, that's nice! i am glad you are fine. i am too.<br />i am glad to have met you.</p>")
ng.traverse{|n| (n.content = n.content.gsub(/(.*?)([\.|\!|\?])/) { " #{$1.strip.capitalize}#{$2}" }.strip) if n.text?}
ng.to_s
The only issue I have with this code, and it is a big issue, is that the code adds a space in between float numbers like "2.0", making a text like:
there is a cat in the hat.it has a 2.0 inch tail!
isn't that awesome?!I think so.
Become
There is a cat in the hat. It has a 2. 0 inch tail!
Isn't that awesome?! I think so.
where I obviously want it to be:
There is a cat in the hat. It has a 2.0 inch tail!
Isn't that awesome?! I think so.
Any suggestions on how to alter this text, for example so that any "." will be ignored by this code?
It seems you want to capitalize any lowercase letter at the beginning of the string or after ., !, or ?.
Use
s.gsub(/(\A|[.?!])(\p{Ll})/) { Regexp.last_match(1).length > 0 ? "#{$1} #{$2.capitalize}" : "#{$2.capitalize}" }
See the Ruby demo
Pattern details:
(\A|[.?!]) - Group 1 capturing the start of string location (empty string) or a ., ?, or !
(\p{Ll}) - Group 2 capturing any Unicode lowercase letter
Inside the replacement block, we check whether Group 1 is empty. If it is (the match is at the start of the string), we return just the capitalized letter. Otherwise, we return the punctuation, a space, and the capitalized letter.
NOTE: However, there is a problem with abbreviations (as usual in these cases), like i.e., e.g., etc. Then there are words like iPhone, iCloud, eSklep, and so on.
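The same replacement logic can be sketched in Python for readers who want to try it outside Ruby. One assumption to flag loudly: Python's built-in re module has no \p{Ll}, so this sketch uses [a-z] and therefore only handles ASCII lowercase letters. The function names are made up for illustration.

```python
import re

# Group 1: start of string (empty) or one of . ? !
# Group 2: an ASCII lowercase letter (stand-in for \p{Ll}, see note above).
PATTERN = re.compile(r"(\A|[.?!])([a-z])")

def _fix(match: re.Match) -> str:
    punct, letter = match.group(1), match.group(2)
    if punct:
        # After punctuation: keep it, add a space, capitalize the letter.
        return f"{punct} {letter.upper()}"
    # At the start of the string there is nothing to keep.
    return letter.upper()

def capitalize_sentences(text: str) -> str:
    return PATTERN.sub(_fix, text)

print(capitalize_sentences("there is a cat in the hat.it has a 2.0 inch tail!"))
```

The "2.0" is left alone because the character after the dot is a digit, not a lowercase letter. As with the Ruby pattern, a letter separated from the punctuation by whitespace is not touched, since the pattern requires the letter to follow the punctuation directly.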
I can't figure out what these regexes match:
A: "\\/\\/c\\/(\\d*)"
B: "\\/\\/(\\d*)"
I suppose they are matching some kind of number sequence since \d matches any digit but I'd like to know an example of a string that would be a match for this regex.
The pattern syntax is that specified by ICU. Expressions are created with NSRegularExpression in an iOS app and are correct.
The first matches //c/ + 0 or more digits. The second matches // + 0 or more digits. In both the digits are captured.
An example of a match for A) is //c/123
An example of a match for B) is //12345
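A quick way to try this reading is to write the patterns as they look after the string literal has been unescaped (each \\ collapses to a single \) and run them in any engine that accepts an escaped slash. The sketch below uses Python's re rather than ICU, which is an assumption of equivalence for these particular patterns only; the helper name captured_digits is made up.

```python
import re

# The patterns as the regex engine sees them once "\\/" has become "\/".
PATTERN_A = re.compile(r"\/\/c\/(\d*)")
PATTERN_B = re.compile(r"\/\/(\d*)")

def captured_digits(pattern: re.Pattern, text: str):
    """Return the digits captured by group 1, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(captured_digits(PATTERN_A, "//c/123"))
print(captured_digits(PATTERN_B, "//12345"))
print(captured_digits(PATTERN_A, "//12345"))
print(repr(captured_digits(PATTERN_B, "//c/123")))
```

The last call still matches, because \d* allows zero digits; the capture group is simply empty.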
When I use Cygwin which emulates Bash on Windows, I sometimes run into situations where I have to escape my escape characters which is what I think is making this expression look so weird. For instance, when I use sed to look for a single '\' I sometimes have to write it as '\\\\'. (Funny, StackOverflow proved my point. If you write 4 backslashes in the comment, it only shows two. So if you process it again, they might all disappear depending on your situation).
Considering this, it might be helpful to think of pairs of backslashes as representing only one if you're coming from a similar situation. My guess would be you are. Because of this I would say Erik Duymelinck is probably spot on. This will capture a sequence of digits that may or may not follow a couple slashes and a c:
//c/000
//00000
This regex matches an odd sequence of characters, which, at first glance, almost seem like a regex, since \d is a digit, and followed by an asterisk (\d*) would mean zero-or-more digits. But it's not a digit, because the escape-slash is escaped.
\\/\\/c\\/(\\d*)
So, for instance, this one matches the following text:
\/\/c\/\
\/\/c\/\d
\/\/c\/\dd
\/\/c\/\ddd
\/\/c\/\dddd
\/\/c\/\ddddd
\/\/c\/\dddddd
...
This one is almost the same
\\/\\/(\\d*)
except you just delete the c\/ from the above results:
\/\/\
\/\/\d
\/\/\dd
\/\/\ddd
\/\/\dddd
\/\/\ddddd
\/\/\dddddd
...
In both cases, the final \ and any d characters after it form capture group one.
My first impression was that these regexes were intended for escaping in Java strings, meaning they would be completely invalid. If they were escaped for Java strings, such as
Pattern p = Pattern.compile("\\/\\/c\\/(\\d*)");
It would be invalid, because after un-escaping, it would result in this invalid regex:
\/\/c\/(\d*)
The single escape-slashes (\) are invalid. But the \d is valid, as it would mean any digit.
But again, I don't think they're invalid, and they're not escaped for a Java string. They're just odd.
What does the regular expression "\d{1,6}" (used in an ASP.NET MVC route as parameter constraint) check for/allow?
That will match 1-6 consecutive occurrences of any of the digits 0-9 (not necessarily the same digit).
a number with 1-6 digits
\d is the class for digits
the {1,6} means one to six element(s) of that class
If you want a reference, you can consult this website; it's probably not the best, but it's a nice summary.
\d means a single decimal digit, 0-9.
{minimum,maximum} means the preceding expression (\d in this case) is repeated at least minimum and at most maximum times.
As a result, your expression \d{1,6} would match any of the following:
0
12
874
4757
48727
557473
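To check values against the constraint, here is a minimal Python sketch. It assumes the whole parameter value has to match (hence fullmatch), and it uses re.ASCII so that \d means only 0-9; the function name is_valid_id is hypothetical.

```python
import re

# \d{1,6}: between one and six decimal digits.
PATTERN = re.compile(r"\d{1,6}", re.ASCII)

def is_valid_id(value: str) -> bool:
    """True if the entire value consists of 1 to 6 digits."""
    return PATTERN.fullmatch(value) is not None

accepted = ["0", "12", "874", "4757", "48727", "557473"]
rejected = ["", "1234567", "12a", "-5"]

for value in accepted + rejected:
    print(f"{value!r}: {is_valid_id(value)}")
```

Without anchoring (for example with re.search instead of fullmatch), a value such as "1234567" would still contain a match, so whether the pattern is applied to the whole value matters.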