How can I create a regular expression to match distinct words?
I tried the following regex, but it also matches words embedded in other words:
#"(abs|acos|acosh|asin|asinh|atan|atanh)"
For example, with
#"xxxabs abs"
abs by itself should match, but not inside xxxabs.
Although the solution (word boundaries) is an old classic, yours is an interesting question because the words in the alternation are so similar.
You can start with this:
\b(?:abs|acos|acosh|asin|asinh|atan|atanh)\b
And compress it to this:
\b(?:a(?:cosh?|sinh?|tanh?|bs))\b
How does it work?
The key idea is to use the word boundaries \b to ensure that the match is not embedded in a larger word.
The idea of the compression is to make the engine match faster. It's hard to read, though, so unless you need every last drop of performance, that's purely for entertainment purposes.
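As a quick check of the word-boundary idea, here is a minimal Ruby sketch. The pattern is the uncompressed one from above; the sample text is made up for illustration.

```ruby
# Pattern from the answer above; the sample text is a hypothetical input.
WORDS = /\b(?:abs|acos|acosh|asin|asinh|atan|atanh)\b/

text = "xxxabs abs acosh(x) atanhx"

# scan returns whole matches here because the group is non-capturing.
matches = text.scan(WORDS)
p matches
```

Only the standalone abs and acosh should be reported; the abs inside xxxabs and the atanh inside atanhx have a word character on one side, so \b cannot match there.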
Token-By-Token
\b # the boundary between a word char (\w) and
# something that is not a word char
(?: # group, but do not capture:
a # 'a'
(?: # group, but do not capture:
cos # 'cos'
h? # 'h' (optional (matching the most
# amount possible))
| # OR
sin # 'sin'
h? # 'h' (optional (matching the most
# amount possible))
| # OR
tan # 'tan'
h? # 'h' (optional (matching the most
# amount possible))
| # OR
bs # 'bs'
) # end of grouping
) # end of grouping
\b # the boundary between a word char (\w) and
# something that is not a word char
Bonus Regex
In case you're feeling depressed today, this alternate compression (is it longer than the original?) should cheer you up.
\b(?:a(?:(?:co|b)s|(?:cos|(?:si|ta)n)h|(?:si|ta)n))\b
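If you want to convince yourself that the three spellings accept the same words, a small sketch like the following can help. The list of negative samples is an assumption chosen for illustration; it is a spot check, not a proof of equivalence.

```ruby
plain   = /\b(?:abs|acos|acosh|asin|asinh|atan|atanh)\b/
compact = /\b(?:a(?:cosh?|sinh?|tanh?|bs))\b/
bonus   = /\b(?:a(?:(?:co|b)s|(?:cos|(?:si|ta)n)h|(?:si|ta)n))\b/

words     = %w[abs acos acosh asin asinh atan atanh]
non_words = %w[xxxabs absx acoshh asi a tanh] # hypothetical negatives

# For every sample, all three patterns should give the same verdict.
agree = (words + non_words).all? do |w|
  [plain, compact, bonus].map { |rx| rx.match?(w) }.uniq.size == 1
end

puts "patterns agree: #{agree}"
```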
I have a bunch of sentences that I want to break into an array. Right now, I'm splitting every time \n appears in the string.
#chapters = #script.split('\n')
What I'd like to do is .split every OTHER "." in the string. Is that possible in Ruby?
You could do it with a regex, but I'd start with a simple approach: just split on periods, then join pairs of substrings:
s = "foo. bar foo. foo bar. boo far baz. bizzle"
s.split(".").each_slice(2).map {|p| p.join "." }
# => ["foo. bar foo", " foo bar. boo far baz", " bizzle"]
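The same split-then-join idea can be wrapped in a small helper if you need a group size other than two. The method name split_every_nth_period and the strip call are my own additions, not part of the answer above.

```ruby
# Hypothetical helper: split on periods, then re-join the pieces n at a time.
# strip is an addition that drops the leading space left over from the split.
def split_every_nth_period(text, n = 2)
  text.split(".").each_slice(n).map { |pieces| pieces.join(".").strip }
end

s = "foo. bar foo. foo bar. boo far baz. bizzle"
pairs   = split_every_nth_period(s)
triples = split_every_nth_period(s, 3)
p pairs
p triples
```

Note that, as in the one-liner above, the period that ends each chunk is consumed by the split and does not appear in the result.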
This is a case where it's easier to use String#scan than String#split.
We can use the following regular expression:
r = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/
str=<<~_
Now is the time. This is it. It is now. The time to have fun.
The time to make new friends. The time to party.
_
str.scan(r)
#=> [
# "Now is the time. This is it",
# " It is now. The time to have fun",
# "\nThe time to make new friends. The time to party"
#   ]
We can write the regular expression in free-spacing mode to make it self-documenting.
r = /
(?<= # begin a positive lookbehind
\A # match the beginning of the string
| # or
\. # match a period
) # end positive lookbehind
[^.]* # match zero or more characters other than periods
\. # match a period
[^.]* # match zero or more characters other than periods
(?= # begin a positive lookahead
\. # match a period
| # or
\z # match the end of the string
) # end positive lookahead
/x # invoke free-spacing regex definition mode
Note that (?<=\.|\A) can be replaced with (?<![^\.]). (?<![^\.]) is a negative lookbehind that asserts the match is not preceded by a character other than a period.
Similarly, (?=\.|\z) can be replaced with (?![^.]). (?![^.]) is a negative lookahead that asserts the match is not followed by a character other than a period.
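A short sketch to compare the two spellings on the sample text. It assumes the same string as the heredoc above, written here as an ordinary literal.

```ruby
original = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/
negated  = /(?<![^.])[^.]*\.[^.]*(?![^.])/

# Same text as the heredoc above, written as a plain string literal.
str = "Now is the time. This is it. It is now. The time to have fun.\n" \
      "The time to make new friends. The time to party.\n"

with_original = str.scan(original)
with_negated  = str.scan(negated)

puts with_original == with_negated
```

Both scans should produce the same three substrings, since "preceded by a period or the start of the string" and "not preceded by a non-period" describe the same positions.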
I am trying to write a regex that separates the parts of an IIF written in VB, so that I can convert it to a RoR if. The string that I am trying to convert is this one:
Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''
And the regex that I am developing is this one:
(.{1,}),(?![^\(]*\))(.{1,}),(?![^\(]*\))(.{1,})
I want to get this:
Var007>2
IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0))
Right now I am getting this instead, because I can't match a whole group between parentheses:
Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3
IIF(Var143=3,Var142,0),0))
You can see it on Rubular.
These are a few examples of the possible strings:
Var007>0,IIF(Var002=0,0, ((Var111*Var112)*CaracteristicaArticulo('Var002','Kilos M2')*(1+(CaracteristicaArticulo('Var002','Porcentaje Rozamiento')/100)))+IIF(Var022=1,Var112*0.800,0)+(Var112*0.339)),''
Var007>1,IIF(Var110=0,IIF(Var025=0 OR Var025=1 OR Var025=39 OR Var025=2,20,IIF(Var025=3 OR Var025=4,21,IIF(Var025=5 OR Var025=6 OR Var025=28 OR Var025=29,22,IIF(Var025=7 OR Var025=8 OR Var025=9 OR Var025=10,24,IIF(Var025=12,26,IIF(Var025=11 OR Var025=14 OR Var025=16 OR Var025=17,27,' ')))))),''),''
There won't be single quotes inside string literals.
Please I need your help ;)
It is generally not a good idea to parse strings like this with regex, but your requirements are not that complex in this case.
Here is a solution that matches "tokens" built from one or more of the following pieces: a run of word chars followed by a balanced (...) group (which may contain '...' substrings, themselves possibly containing ( or )), or any single char other than a comma:
s = "Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''"
rx = /
( # Group 1, what we need to extract
(?: # A non-capturing group acting as a container
\w+ # 1 or more word chars
( # Group 2 (technical one)
\( # opening parenthesis
(?:
'[^']*' # a single quoted substring with no single quotes inside
| # or
[^()']+ # 1 or more chars other than quote and parentheses
| # or
\g<2> # recurse Group 2 pattern
)* # end of the inner non-capturing group, repeated 0 or more times
\) # closing parenthesis
) # End of Group 2
|
[^,] # Any char other than a comma
)+ # One or more occurrences of the alternatives in the container group
) # End of Group 1
/x # extended mode with all in-pattern whitespace ignored
res = []
s.scan(rx) { |m|
res << m[0] # Only collect Group 1 values dropping all others
}
puts res
See the Ruby demo online
Output:
Var007>2
IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0))
''
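Since regex is not always the best tool for nested input, here is a non-regex alternative for comparison: a minimal sketch that walks the string once and splits on commas found outside parentheses and outside single-quoted literals. The helper name split_top_level is hypothetical, and the sketch relies on the stated guarantee that literals never contain single quotes.

```ruby
# Hypothetical helper: split on commas that are at parenthesis depth 0
# and not inside a '...' literal. Assumes quotes never nest or escape.
def split_top_level(expr)
  parts    = [String.new]
  depth    = 0
  in_quote = false

  expr.each_char do |ch|
    if ch == "'"
      in_quote = !in_quote
    elsif !in_quote
      depth += 1 if ch == "("
      depth -= 1 if ch == ")"
    end

    if ch == "," && depth.zero? && !in_quote
      parts << String.new # start a new top-level argument
    else
      parts.last << ch
    end
  end
  parts
end

s = "Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''"
parts = split_top_level(s)
puts parts
```

On the sample string this should give the same three pieces as the recursive regex.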
I'm currently using this regex for my names: \A^[a-zA-Z'.,\s-]*\z. However, I don't want to allow two consecutive characters from the set of apostrophe, period, comma, whitespace, or hyphen. How can I do this?
The significant part would be (?:[a-zA-Z]|['.,\s-](?!['.,\s-])).
Meaning:
(?:
[a-zA-Z] # letters
| # or
['.,\s-] # any of these
(?!['.,\s-]) # but not followed by another of these
)
But, in this case:
Guedes, Washington
------^^----------
That would invalidate the name, so maybe you want to remove \s from the negative look-ahead.
Hope it helps.
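Putting the pieces together, a full pattern could look like the sketch below. It assumes \s is dropped from the look-ahead as suggested above, and reuses the \A and \z anchors from the question. The sample names are made up.

```ruby
# Assumption: \s is removed from the look-ahead so that ", " remains valid.
NAME = /\A(?:[a-zA-Z]|['.,\s-](?!['.,-]))*\z/

samples = ["Guedes, Washington", "O'Brien", "Smith--Jones", "Anne.,Marie", "Mary  Ann"]

samples.each do |name|
  puts "#{name.inspect}: #{NAME.match?(name) ? 'valid' : 'invalid'}"
end
```

One side effect of dropping \s from the look-ahead is that two consecutive spaces (as in "Mary  Ann") are accepted; if that matters, whitespace needs its own rule.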
How about this (a string of letters, potentially ending with one of those terminator chars):
\A^[a-zA-Z]*['.,\s-]?\z
I want to write a program which takes build number in the format of 23.0.23.345 (first two-digits then dot, then zero, then dot, then two-digits, dot, three-digits):
number=23.0.23.345
pattern = /(^[0-9]+\.{0}\.[0-9]+\.[0-9]$)/
numbers.each do |number|
if number.match pattern
puts "#{number} matches"
else
puts "#{number} does not match"
end
end
Output:
I am getting error:
floating literal anymore put zero before dot
I'd use something like this to find patterns that match:
number = 'foo 1.2.3.4 23.0.23.345 bar'
build_number = number[/
\d{2} # two digits
\.
0
\.
\d{2} # two more digits
\.
\d{3}
/x]
build_number # => "23.0.23.345"
This example is using String's [/regex/] method, which is a nice shorthand way to apply and return the result of a regex. It returns the first match only in the form I'm using. Read the documentation for more information and examples.
Your pattern won't work because it doesn't do what you think it does. Here's how I'd read it:
/( # group
^ # start of line
[0-9]+ # one or more digits
\.{0} # *NO* dots
\. # one dot
[0-9]+ # one or more digits
\. # one dot
[0-9] # one digit
$ # end of line
)/x
The problem is \.{0} which means you don't want any dots.
The x flag tells Ruby to use extended (free-spacing) mode, which ignores blanks/whitespace and comments inside the pattern, making it easy to build a pattern that is documented. (Multiline mode is a different flag, m.)
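To see what the original pattern really accepts, here is a small sketch. The three-part string is a hypothetical input chosen to fit the reading above (digits, dot, digits, dot, one digit).

```ruby
# The pattern from the question, unchanged.
asker_pattern = /(^[0-9]+\.{0}\.[0-9]+\.[0-9]$)/

three_part = "23.0.2".match?(asker_pattern)      # hypothetical three-part input
four_part  = "23.0.23.345".match?(asker_pattern) # the build number from the question

puts "three-part: #{three_part}, four-part: #{four_part}"
```

Because \.{0} matches the empty string, the pattern describes only three dot-separated parts, so the four-part build number cannot match.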
Why reinvent the wheel? Use a gem like versionomy. You can parse the versions, compare them, check for equality, increment a particular part, etc. It even handles alpha, beta, patchlevels, etc.
require 'versionomy'
number='23.0.23.345'
v = Versionomy.parse number
v.major #=> 23
v.minor #=> 0
v.tiny #=> 23
v.tiny2 #=> 345
numbers = "23.0.23.345", "23.0.33.173", "0.0.0.0"
pattern = /\d{2}\.0\.\d{2}\.\d{3}/x
numbers.each do |number|
if number.match pattern
puts "#{number} matches"
else
puts "#{number} does not match"
end
end
The values in the array on line one need to be strings, not numeric literals. I also renamed the variable from "number" to "numbers" so that it matches the name used in the loop, and gave it several items so that ".each" has something to iterate over.
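One thing to be aware of: the pattern in the loop above is not anchored, so it also accepts longer strings that merely contain a build number. If whole-string validation is what you want, a sketch with \A and \z could look like this. The third candidate is a made-up input for illustration.

```ruby
unanchored = /\d{2}\.0\.\d{2}\.\d{3}/
anchored   = /\A\d{2}\.0\.\d{2}\.\d{3}\z/

# The last entry is a hypothetical string that only contains a build number.
candidates = ["23.0.23.345", "0.0.0.0", "123.0.23.3456"]

loose  = candidates.select { |c| c.match?(unanchored) }
strict = candidates.select { |c| c.match?(anchored) }

p loose
p strict
```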
There seems to be agreement on what regular expression you should use. If your ultimate goal is to extract the elements of the strings as integers, you could do this:
str = "I'm looking for 23.0.345.26, or was that 23.0.26.345?"
str.scan(/(\d{2})\.(0)\.(\d{2})\.(\d{3})/).flatten.map(&:to_i)
#=> [23, 0, 26, 345]
Given a string like:
"#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
I want to find a way to dynamically return the tagged user and the content that follows each tag. Results should be:
user_id: 19
copy: what's the latest with the TPS report?
user_id: 30
copy: can you help out here?
Any ideas on how this can be done with ruby/rails? Thanks
How is this regex for finding matches?
#\[\d+:\w+\s\w+\]
Split the string, then handle the content iteratively. I don't think it'd take more than:
tmp = string.split('#').reject(&:empty?).map {|str| [str[/\[(\d+)/,1], str[/\]\s*(.*)/,1].strip] }
tmp.first #=> ["19", "what's the latest with the TPS report?"]
Does that help?
result = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m)
This grabs the id and the content in backreferences 1 and 2 respectively. The capture stops when either a # or the end of the string is reached.
"
\\[ # Match the character “[” literally
( # Match the regular expression below and capture its match into backreference number 1
\\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\\] # Match the character “]” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
\# # Match the character “#” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\\Z # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"
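To get from the raw scan result to the user_id/copy pairs asked for, a minimal sketch could map each pair of captures into a hash. The hash keys follow the question; the strip and to_i calls are my additions.

```ruby
subject = "#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"

# Each scan result is [id, copy]; strip removes the surrounding spaces.
mentions = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m).map do |id, copy|
  { user_id: id.to_i, copy: copy.strip }
end

mentions.each do |m|
  puts "user_id: #{m[:user_id]}"
  puts "copy: #{m[:copy]}"
end
```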
This will match everything from a # up to the next punctuation mark (., ? or !). Sorry if I didn't understand the question correctly.
result = subject.scan(/#.*?[.?!]/)