Regex to separate IIF clauses in groups


I am trying to write a regex that separates the parts of an IIF written in VB so that I can convert it to a Ruby on Rails if. The string that I am trying to convert is this one:
Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''
And the regex that I am developing is this one:
(.{1,}),(?![^\(]*\))(.{1,}),(?![^\(]*\))(.{1,})
I want to get this:
Var007>2
IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0))
Right now I am getting the following instead, because I can't treat the parenthesized part as a single group:
Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3
IIF(Var143=3,Var142,0),0))
You can see it on Rubular.
These are a few examples of possible strings:
Var007>0,IIF(Var002=0,0, ((Var111*Var112)*CaracteristicaArticulo('Var002','Kilos M2')*(1+(CaracteristicaArticulo('Var002','Porcentaje Rozamiento')/100)))+IIF(Var022=1,Var112*0.800,0)+(Var112*0.339)),''
Var007>1,IIF(Var110=0,IIF(Var025=0 OR Var025=1 OR Var025=39 OR Var025=2,20,IIF(Var025=3 OR Var025=4,21,IIF(Var025=5 OR Var025=6 OR Var025=28 OR Var025=29,22,IIF(Var025=7 OR Var025=8 OR Var025=9 OR Var025=10,24,IIF(Var025=12,26,IIF(Var025=11 OR Var025=14 OR Var025=16 OR Var025=17,27,' ')))))),''),''
There won't be single quotes inside string literals.
Please I need your help ;)

It is generally not a good idea to parse strings like this with regex, but your requirements are not that complex in this case.
Here is a solution that matches "tokens" made of one or more repetitions of either (a) one or more word chars followed by a balanced (...) group, which may contain '...' substrings (themselves possibly containing ( or )), or (b) any char other than a comma:
s = "Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''"
rx = /
  (                 # Group 1, what we need to extract
    (?:             # A non-capturing group acting as a container
      \w+           # 1 or more word chars
      (             # Group 2 (technical one, used for recursion)
        \(          # opening parenthesis
        (?:
          '[^']*'   # a single-quoted substring with no single quotes inside
          |         # or
          [^()']+   # 1 or more chars other than quotes and parentheses
          |         # or
          \g<2>     # recurse into the Group 2 pattern
        )*          # end of the inner alternation, repeated 0 or more times
        \)          # closing parenthesis
      )             # End of Group 2
      |
      [^,]          # Any char other than a comma
    )+              # One or more occurrences of the alternatives in the container group
  )                 # End of Group 1
/x                  # extended mode: in-pattern whitespace and comments are ignored
res = []
s.scan(rx) { |m|
res << m[0] # Only collect Group 1 values dropping all others
}
puts res
See the Ruby demo online
Output:
Var007>2
IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0))
''
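
Since the answer notes that regex is not always the best tool for this kind of input, here is a minimal non-regex sketch of the same split. It is not part of the original answer: split_top_level is a hypothetical helper name, and the approach assumes (as the question states) that string literals never contain single quotes. It walks the string once, tracks parenthesis depth, and splits only on commas at depth zero outside quotes.

```ruby
# Hypothetical helper (not from the answer above): split on commas that are
# outside parentheses and outside single-quoted literals.
def split_top_level(str)
  parts = [String.new]
  depth = 0          # current parenthesis nesting level
  in_quote = false   # true while inside a '...' literal

  str.each_char do |ch|
    if ch == "'"
      in_quote = !in_quote
    elsif !in_quote
      depth += 1 if ch == '('
      depth -= 1 if ch == ')'
    end

    if ch == ',' && depth.zero? && !in_quote
      parts << String.new   # top-level comma: start the next part
    else
      parts.last << ch
    end
  end
  parts
end

s = "Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''"
condition, if_true, if_false = split_top_level(s)
puts condition
puts if_true
puts if_false
```

Calling the helper again on the inside of a nested IIF(...) would give its three parts in turn, which may be easier to extend than a recursive pattern.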

Related

How do I replace all the apostrophes that come right before or right after a comma?

I have a string aString = "old_tag1,old_tag2,'new_tag1','new_tag2'"
I want to remove the apostrophes that come right before or right after a comma. For example, in my case the apostrophes enclosing new_tag1 and new_tag2 should be removed.
This is what I have right now
aString = aString.gsub("'", "")
This is problematic, however, because it also removes apostrophes inside a tag, for example if I had 'my_tag's' instead of 'new_tag1'. How do I get rid of only the apostrophes that come before or after the commas?
My desired output is
aString = "old_tag1,old_tag2,new_tag1,new_tag2"

My guess is to use a regex as well, but in a slightly different way:
aString = "old_tag1,old_tag2,'new_tag1','new_tag2','new_tag3','new_tag4's'"
aString.gsub(/(?<=^|,)'(.*?)'(?=,|$)/, '\1')
#=> "old_tag1,old_tag2,new_tag1,new_tag2,new_tag3,new_tag4's"
The idea is to find a substring with bounding apostrophes and paste it back without them.
regex = /
(?<=^|,) # look behind for the start of the line or a comma
' # find an apostrophe
(.*?) # get everything between apostrophes in a non-greedy way
' # find a closing apostrophe
(?=,|$) # look ahead for a comma or the end of the line
/x
The replacement simply pastes back the content of the first (and only) capture group, i.e. everything between the apostrophes.
Thanks to @Cary for the /x modifier for regexes, I didn't know about it! Extremely useful for explanations.

This answers the question, "I want to replace the apostrophes that come right before or right after a comma".
r = /
(?<=,) # match a comma in a positive lookbehind
\' # match an apostrophe
| # or
\' # match an apostrophe
(?=,) # match a comma in a positive lookahead
/x # free-spacing regex definition mode
aString = "old_tag1,x'old_tag2'x,x'old_tag3','new_tag1','new_tag2'"
aString.gsub(r, '')
#=> "old_tag1,x'old_tag2'x,x'old_tag3,new_tag1,new_tag2'"
If the objective is instead to remove single quotes enclosing a substring when the left quote is at the beginning of the string or is immediately preceded by a comma, and the right quote is at the end of the string or is immediately followed by a comma, several approaches are possible. One is to use a single, modified regex, as @Dimitry has done. Another is to split the string on commas, process each string in the resulting array and then join the modified substrings, separated by commas.
r = /
\A # match beginning of string
\' # match single quote
.* # match zero or more characters
\' # match single quote
\z # match end of string
/x # free-spacing regex definition mode
aString.split(',').map { |s| (s =~ r) ? s[1..-2] : s }.join(',')
#=> "old_tag1,x'old_tag2'x,x'old_tag3',new_tag1,new_tag2"
Note:
arr = aString.split(',')
#=> ["old_tag1", "x'old_tag2'x", "x'old_tag3'", "'new_tag1'", "'new_tag2'"]
"old_tag1" =~ r #=> nil
"x'old_tag2'x" =~ r #=> nil
"x'old_tag3'" =~ r #=> nil
"'new_tag1'" =~ r #=> 0
"'new_tag2'" =~ r #=> 0

Non regex replacement
Regular expressions can get really ugly. There is a simple way to do it with just string replacement: search for the two patterns ,' and ', and replace each with a plain comma.
aString.gsub(",'", ",").gsub("',", ",")
=> "old_tag1,old_tag2,new_tag1,new_tag2'"
This leaves the trailing ', but that is easy to remove with .chomp("'"). A leading ' can be removed with a simple regex .gsub(/^'/, "")
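
Putting those steps together, a minimal sketch might look like the following. It swaps the chomp and regex suggestions for String#delete_prefix and String#delete_suffix, which assumes Ruby 2.5 or later; the variable names are illustrative only.

```ruby
a_string = "old_tag1,old_tag2,'new_tag1','new_tag2'"

cleaned = a_string
  .gsub(",'", ",")      # apostrophe right after a comma
  .gsub("',", ",")      # apostrophe right before a comma
  .delete_prefix("'")   # leading apostrophe, if any
  .delete_suffix("'")   # trailing apostrophe, if any

puts cleaned
```

Apostrophes that are not next to a comma or at either end of the string, such as the one inside my_tag's, are left alone.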

Regex - Grab characters between two characters but not including them

I'm trying to grab the rsession variable in the string below between '=' and '&' in a string.
I'm able to do it with /rsession(\=.+?\&)/.
But how do I do it so the output doesn't include the '=' and '&'?
The string is:
"app=1334300&rsession=0806343413_1:5bc6a3c80271826a1c0016c1520d3&token=a6caacf7edfbd9383429e30a1adfadf385208985ad&redirectReq=true"

You can use this regex:
/(?<=rsession=)[^&]*/
Explanation:
(?<= # asserts the match at position after
rsession= # the literal string `rsession=`
) # it is called: positive lookbehind
[^&] # not `&` character
* # as many as possible
Hope it helps.

You can capture it in group like this.
Regex: /rsession=([^&]+)/
Explanation:
([^&]+) will capture in group all characters until a & is met. Use \1 or $1 to use captured group.
Regex101 Demo
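
As a quick check of both answers in Ruby, the sketch below applies each pattern to the string from the question using String#[]. The variable names are illustrative.

```ruby
query = "app=1334300&rsession=0806343413_1:5bc6a3c80271826a1c0016c1520d3&token=a6caacf7edfbd9383429e30a1adfadf385208985ad&redirectReq=true"

# Lookbehind: the whole match is already the value.
via_lookbehind = query[/(?<=rsession=)[^&]*/]

# Capture group: ask String#[] for group 1 only.
via_capture = query[/rsession=([^&]+)/, 1]

puts via_lookbehind
puts via_capture
```

Both calls should print the value without the surrounding = and &.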

what does _? mean in the following regex? [duplicate]

What does _? mean in the following rails regex?
/\A_?[a-z]_?(?:[a-z0-9.-]_?)*\z/i
I have attempted to decipher the regex as follows
# regex explained
# \A : matches the beginning of the string
# _? :
# [ : beginning of character group
# a-z : any lowercase letter
# ] : end of character group
# _? :
# ( : is a capture group, anything matched within the parens is saved for later use
# ?: : non-capturing group: matches below, but doesn't store a back-ref
# [ : beginning of character group
# a-z : any lowercase letter
# A-Z : any uppercase letter
# 0-9 : any digit
# . : a fullstop or "any character" ??????
# _ : an underscore
# ] : end of character group
# _? :
# ) : See above
# * : zero or more times of the given characters
# \z : is the end of the string

_ matches an underscore.
? matches zero or one of the preceding character, basically making the preceding character optional.
So _? will match one underscore if it is present, and will match without it.

? means that the previous expression should appear 0 or 1 times, similarly to how * means it should match 0 or more times, or + means it should match 1 or more times.
So, for example, with the RE /\A_?[A-Z]?\z/, the following strings will match:
_O
_
P
but these will not:
____
A_
PP
The RE you posted originally states:
The string may begin with an underscore
Then there must be a letter (the /i flag makes the match case-insensitive)
Then there may be another underscore
The rest of the string consists of zero or more letters, digits, periods, or hyphens, each of which may be followed by a single underscore
Example strings that match this RE:
_a_abcdefg
b_abc_def_
_qasdf_poiu_
a12345_
z._.._...._......_
u
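
To try these claims out, here is a small Ruby sketch that runs the posted regex against the example strings above. The non-matching strings are additional examples chosen for illustration and are not from the original answer.

```ruby
re = /\A_?[a-z]_?(?:[a-z0-9.-]_?)*\z/i

matching = %w[_a_abcdefg b_abc_def_ _qasdf_poiu_ a12345_ z._.._...._......_ u]

# Illustrative counter-examples: two leading underscores, a doubled
# underscore, a leading digit, and a lone underscore.
non_matching = %w[__a a__b 1abc _]

matching.each     { |s| puts "#{s.inspect} matches: #{re.match?(s)}" }
non_matching.each { |s| puts "#{s.inspect} matches: #{re.match?(s)}" }
```

Each _? consumes at most one underscore, which is why doubled underscores are rejected.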

_? means _ is optional.
It can accept _sadasd_sadsadsa_asdasdasd_ or asdasdsadasdasd, i.e. strings whose parts may or may not be separated by _.
See demo.
http://regex101.com/r/hQ1rP0/89

Regex for distinct words (not embedded in other words)

How can I create a regular expression to match distinct words?
I tried the following regex, but it also matches words embedded in other words:
#"(abs|acos|acosh|asin|asinh|atan|atanh)"
For example, with
#"xxxabs abs"
abs by itself should match, but not inside xxxabs.

Although the solution (word boundaries) is an old classic, yours is an interesting question because the words in the alternation are so similar.
You can start with this:
\b(?:abs|acos|acosh|asin|asinh|atan|atanh)\b
And compress to that:
\b(?:a(?:cosh?|sinh?|tanh?|bs))\b
How does it work?
The key idea is to use the word boundaries \b to ensure that the match is not embedded in a larger word.
The idea of the compression is to make the engine match faster. It's hard to read, though, so unless you need every last drop of performance, that's purely for entertainment purposes.
Token-By-Token
\b # the boundary between a word char (\w) and
# something that is not a word char
(?: # group, but do not capture:
a # 'a'
(?: # group, but do not capture:
cos # 'cos'
h? # 'h' (optional (matching the most
# amount possible))
| # OR
sin # 'sin'
h? # 'h' (optional (matching the most
# amount possible))
| # OR
tan # 'tan'
h? # 'h' (optional (matching the most
# amount possible))
| # OR
bs # 'bs'
) # end of grouping
) # end of grouping
\b # the boundary between a word char (\w) and
# something that is not a word char
Bonus Regex
In case you're feeling depressed today, this alternate compression (is it longer than the original?) should cheer you up.
\b(?:a(?:(?:co|b)s|(?:cos|(?:si|ta)n)h|(?:si|ta)n))\b
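
As a sanity check, this Ruby sketch runs the plain pattern and both compressed variants over the same sample text. The sample text is made up for illustration; the same idea should carry over to other regex engines that support \b.

```ruby
plain      = /\b(?:abs|acos|acosh|asin|asinh|atan|atanh)\b/
compressed = /\b(?:a(?:cosh?|sinh?|tanh?|bs))\b/
bonus      = /\b(?:a(?:(?:co|b)s|(?:cos|(?:si|ta)n)h|(?:si|ta)n))\b/

# "xxxabs" and "atanhx" embed a keyword inside a longer word.
text = "xxxabs abs acosh atanhx asin"

results = [plain, compressed, bonus].map { |re| text.scan(re) }
results.each { |found| p found }
```

All three patterns should report the same standalone words and skip the embedded ones.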

How to find all instances of #[XX:XXXX] in a string and then find the surrounding text?

Given a string like:
"#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
I want to find a way to dynamically return, the user tagged and the content surrounding. Results should be:
user_id: 19
copy: what's the latest with the TPS report?
user_id: 30
copy: can you help out here?
Any ideas on how this can be done with ruby/rails? Thanks
How is this regex for finding matches?
#\[\d+:\w+\s\w+\]

Split the string, then handle the content iteratively. I don't think it'd take more than:
tmp = string.split('#').reject(&:empty?).map { |str| [str[/\[(\d+)/, 1], str[/\]\s*(.*)/, 1].to_s.strip] }
tmp.first #=> ["19", "what's the latest with the TPS report?"]
Does that help?

result = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m)
This grabs the id and the content in backreferences 1 and 2 respectively. The capture stops when either a # or the end of the string is reached.
\[ # Match the character “[” literally
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
. # Match any single character (including line breaks, because of the m flag)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\] # Match the character “]” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character (including line breaks, because of the m flag)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
# # Match the character “#” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\Z # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
This will match something starting with # and ending at a punctuation mark. Sorry if I didn't understand correctly.
result = subject.scan(/#.*?[.?!]/)
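
Building on the scan-based answer, the sketch below shapes the result into the user_id / copy pairs the question asks for. The pattern is a variation written for this example, not the one from the answers: it anchors on #[ and trims surrounding whitespace, and it assumes the tag format #[id:name] shown in the question.

```ruby
subject = "#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"

# Group 1: digits after "#[". Group 2: text up to the next "#[" or the end.
mention_re = /#\[(\d+):[^\]]*\]\s*(.*?)\s*(?=#\[|\z)/m

mentions = subject.scan(mention_re).map do |id, copy|
  { user_id: id.to_i, copy: copy }
end

mentions.each { |m| puts "user_id: #{m[:user_id]}", "copy: #{m[:copy]}" }
```

Looking ahead for #[ rather than a bare # means a stray # inside the copy does not end the capture early.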
