what does _? mean in the following regex? [duplicate] - ruby-on-rails

What does _? mean in the following rails regex?
/\A_?[a-z]_?(?:[a-z0-9.-]_?)*\z/i
I have attempted to decipher the regex as follows
# regex explained
# \A : matches the beginning of the string
# _? :
# [ : beginning of character group
# a-z : any lowercase letter
# ] : end of character group
# _? :
# ( : is a capture group, anything matched within the parens is saved for later use
# ?: : non-capturing group: matches below, but doesn't store a back-ref
# [ : beginning of character group
# a-z : any lowercase letter
# A-Z : any uppercase letter
# 0-9 : any digit
# . : a fullstop or "any character" ??????
# _ : an underscore
# ] : end of character group
# _? :
# ) : See above
# * : zero or more times of the given characters
# \z : is the end of the string

_ matches an underscore.
? matches zero or one of the preceding character, which makes that character optional.
So _? will match one underscore if it is present, and will match without it.

? means that the previous expression should appear 0 or 1 times, similarly to how * means it should match 0 or more times, or + means it should match 1 or more times.
So, for example, with the RE /\A_?[A-Z]?\z/, the following strings will match:
_O
_
P
but these will not:
____
A_
PP
The RE you posted originally states:
The string may begin with an underscore
Then there must be a letter (the /i flag makes [a-z] match uppercase letters too)
Then there may be another underscore
For the rest of the string, each character must be a letter, digit, period, or -, and each may be followed by a single underscore
Example strings that match this RE:
_a_abcdefg
b_abc_def_
_qasdf_poiu_
a12345_
z._.._...._......_
u
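The examples above can be checked by running them through the pattern in Ruby. This is only a sketch: the constant names and the rejected samples are my own, chosen for illustration, and are not part of the original question.

```ruby
# The pattern from the question; the /i flag makes [a-z] match uppercase too.
NAME_RE = /\A_?[a-z]_?(?:[a-z0-9.-]_?)*\z/i

# Strings listed above as matching.
ACCEPTED = %w[_a_abcdefg b_abc_def_ _qasdf_poiu_ a12345_ z._.._...._......_ u]

# Hypothetical counter-examples: a doubled underscore, a leading digit,
# and an empty string (at least one letter is required).
REJECTED = ["a__b", "1abc", ""]

(ACCEPTED + REJECTED).each do |s|
  puts "#{s.inspect} matches: #{s.match?(NAME_RE)}"
end
```

Every string in ACCEPTED should report true and every string in REJECTED false, since each _? can consume at most one underscore.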

_? means _ is optional.
It can accept _sadasd_sadsadsa_asdasdasd_ or asdasdsadasdasd, i.e. _-separated strings where each _ is optional.
See demo.
http://regex101.com/r/hQ1rP0/89

Related

Regex does not check the first character after first check

I am trying to write a regex that:
Allows only numbers, lowercase letters and also "-" and "_".
String can only start with: letter number or "uuid:"
String must have at least one letter in it.
It must consist of at least 2 characters.
I managed to create such a regex: \A(?:uuid:|[a-z0-9])(?=(.*[a-z])){1,}(?:\w|-)+\z
I just don't understand why if the first character is a letter, it is not taken into account, so it doesn't pass for example: a1.
And also why it allows uppercase letters AA.
Tests: https://rubular.com/r/Q5gEP15iaYkHYQ
Thank you in advance for your help
You could also get the matches without lookarounds using an alternation matching at least 2 characters from the start.
If you don't want to match uppercase chars A-Z, then you can omit /i for case insensitive matching.
\A(?:uuid:|[a-z][a-z0-9_-]|[0-9][0-9_-]*[a-z])[a-z0-9_-]*\z
Explanation
\A Start of string
(?: Non capture group
uuid: match literally
| Or
[a-z][a-z0-9_-] match a char a-z and one of a-z 0-9 _ -
| Or
[0-9][0-9_-]*[a-z] Match a digit, optional chars 0-9 _ - and then a-z
) Close non capture group
[a-z0-9_-]* Match optional chars a-z 0-9 _ -
\z End of string
Regex rubular demo
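As a rough check of the alternation-based pattern above, the following sketch runs it against a handful of inputs. The constant names and the sample strings are assumptions made for illustration; the pattern itself is copied from the answer, without the /i flag.

```ruby
# The alternation-based pattern from the answer above (case-sensitive).
ID_RE = /\A(?:uuid:|[a-z][a-z0-9_-]|[0-9][0-9_-]*[a-z])[a-z0-9_-]*\z/

# Illustrative inputs mapped to the result each one should give.
SAMPLES = {
  "a1"     => true,   # letter followed by a digit
  "0a"     => true,   # digit, then a letter
  "uuid:x" => true,   # uuid: prefix
  "AA"     => false,  # uppercase is rejected without /i
  "a"      => false,  # only one character
  "01"     => false,  # no letter
  "_a"     => false   # must not start with an underscore
}

SAMPLES.each do |input, expected|
  puts "#{input.ljust(7)} expected=#{expected} actual=#{input.match?(ID_RE)}"
end
```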
It looks like AA meets all your requirements: it contains a letter, at least two chars, and starts with a letter, and contains "only numbers, lowercase letters and also - and _". NOTE you have an i flag that makes pattern matching case insensitive, and if you do not want to allow any uppercase letters, just remove it from the end of the regex literal.
To fix the other real issues, you can use
/\A(?=[^a-z]*[a-z])(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/
See this Rubular demo.
Note that in the demo, (?=[^a-z\n]*[a-z]) is used rather than (?=[^a-z]*[a-z]) because the test is performed on a single multiline string, not an array of separate strings.
Details:
\A - start of string
(?=[^a-z]*[a-z]) - minimum one letter
(?=.{2}) - minimum two chars
(?:uuid:|[a-z0-9]) - uuid:, or one letter or digit
[a-z0-9_-]* - zero or more letters, digits, _ or -
\z - end of string
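The lookahead version can be tried out in the same way. This is a minimal sketch: valid_id? is a hypothetical helper name, and the inputs are picked only to exercise each part of the pattern.

```ruby
# The lookahead-based pattern from the answer above.
LOOKAHEAD_RE = /\A(?=[^a-z]*[a-z])(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/

# Hypothetical helper wrapping the match.
def valid_id?(str)
  str.match?(LOOKAHEAD_RE)
end

# a1, 1a and uuid:abc satisfy every rule; the rest each break one:
# "a" is too short, "12" has no letter, "AA" is uppercase, "-a" starts with "-".
%w[a1 1a uuid:abc a 12 AA -a].each do |s|
  puts "#{s.inspect} -> #{valid_id?(s)}"
end
```

Both lookaheads are anchored at \A, so they inspect the whole string before the main part of the pattern consumes any characters.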
You can use the following regular expression as the argument of String#match?. I've written the expression in free-spacing mode to make it self-documenting. Free-spacing mode is invoked with the option x (i.e. /x). It causes whitespace and comments to be ignored.
Note that I've defined named capture group common_code so that I can use its instructions as a subexpression (a.k.a. subroutine) in subsequent instructions. The invocation \g<common_code> tells Ruby to repeat the instructions in that capture group. See Subexpression Calls. You will see that numbered capture groups can be used as well.
The use of subexpressions has three benefits:
less code is required;
there is less chance of making errors, both when initially forming the expression and later when modifying it, by confining all instructions that are to repeat through the expression in one place (in the capture group); and
it makes the expression easier to read and understand.
re = /
\A # match the beginning of the string
(?: # begin a non-capture group
[a-z] # match a lc letter
(?<common_code> # begin a capture group named 'common_code'
[\da-z_-] # match a digit, lc letter, underscore or hyphen
) # end capture group
+ # execute the preceding capture group one or more times
| # or
uuid: # match a string literal
\g<common_code>* # match code in capture group common_code >= 0 times
| # or
\d # match a digit
\g<common_code>* # match code in capture group common_code >= 0 times
[a-z] # match a lc letter
\g<common_code>* # match code in capture group common_code >= 0 times
) # end the non-capture group
\z # match the end of the string
/x # invoke free-spacing regex definition mode
test = %w| ab a1 a_ a- 0a 01a 0_a 0-a uuid: uuid:a uuid:0 uuid:_ uuid:- | +
%w| a _a -a a$ auuid: 01 0_ 0- 01_- |
test.each do |s|
puts "#{s.ljust(7)} -> #{s.match?(re)}"
end
ab -> true
a1 -> true
a_ -> true
a- -> true
0a -> true
01a -> true
0_a -> true
0-a -> true
uuid: -> true
uuid:a -> true
uuid:0 -> true
uuid:_ -> true
uuid:- -> true
a -> false
_a -> false
-a -> false
a$ -> false
auuid: -> false
01 -> false
0_ -> false
0- -> false
01_- -> false

What does the instruction "name =~ /[A-Z].*/" do?

I'm studying Ruby on Rails and came across some code, but I could not understand how it actually works.
validate :first_letter_must_be_uppercase
private
def first_letter_must_be_uppercase
  errors.add("name", "first letter must be uppercase") unless name =~ /[A-Z].*/
end
The code uses a regular expression to check that name contains an uppercase letter.
Explanation:
/[A-Z].*/
[A-Z] - matches any capital letter from A to Z
. - matches any single character except a newline
* - repeats the preceding element zero or more times
To sum up
The input string must contain a capital letter from A to Z, followed by zero or more other characters. Because the pattern is not anchored, that capital letter can appear anywhere in the string, not only at the start.
You can check it on Rubular
EDIT
As pointed out by @vasfed, if you want to match the first character, the regex needs to be changed to
/\A[A-Z].*/
\A - Ensure start of the string
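The difference the \A anchor makes can be seen with a short comparison. This is a sketch outside of any Rails model; the constant names and the sample names are made up for illustration.

```ruby
UNANCHORED = /[A-Z].*/    # an uppercase letter anywhere in the string
ANCHORED   = /\A[A-Z].*/  # an uppercase letter as the first character

%w[Alice alice aLICE].each do |name|
  puts "#{name}: unanchored=#{name.match?(UNANCHORED)} anchored=#{name.match?(ANCHORED)}"
end
```

"aLICE" is the interesting case: the unanchored pattern accepts it because it finds the L at index 1, while the anchored pattern rejects it.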

Regex to separate IIF clauses in groups

I am trying to write a regex that separates the parts of an IIF written in VB so that I can convert it to a RoR if. The string that I am trying to convert is this one:
Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''
And the regex that I am developing is this one:
(.{1,}),(?![^\(]*\))(.{1,}),(?![^\(]*\))(.{1,})
I want to get this:
Var007>2
IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0))
Right now I am getting this instead, because I can't select a group between brackets.
Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3
IIF(Var143=3,Var142,0),0))
You can see it on Rubular.
These are a few examples of the possible strings:
Var007>0,IIF(Var002=0,0, ((Var111*Var112)*CaracteristicaArticulo('Var002','Kilos M2')*(1+(CaracteristicaArticulo('Var002','Porcentaje Rozamiento')/100)))+IIF(Var022=1,Var112*0.800,0)+(Var112*0.339)),''
Var007>1,IIF(Var110=0,IIF(Var025=0 OR Var025=1 OR Var025=39 OR Var025=2,20,IIF(Var025=3 OR Var025=4,21,IIF(Var025=5 OR Var025=6 OR Var025=28 OR Var025=29,22,IIF(Var025=7 OR Var025=8 OR Var025=9 OR Var025=10,24,IIF(Var025=12,26,IIF(Var025=11 OR Var025=14 OR Var025=16 OR Var025=17,27,' ')))))),''),''
There won't be single quotes inside string literals.
Please I need your help ;)
It is generally not a good idea to parse strings like this with regex, but your requirements are not that complex in this case.
Here is a solution that matches "tokens": one or more occurrences of either 1+ word chars followed by a balanced (...) group (which may contain '...' substrings that themselves contain ( or )), or a single char other than ,:
s = "Var007>2,IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0)),''"
rx = /
( # Group 1, what we need to extract
(?: # A non-capturing group acting as a container
\w+ # 1 or more word chars
( # Group 2 (technical one)
\( # opening parenthesis
(?:
'[^']*' # a single quoted substring with no single quotes inside
| # or
[^()']+ # 1 or more chars other than quote and parentheses
| # or
\g<2> # recurse Group 2 pattern
)* # end of the inner non-capturing group, repeated 0 or more times
\) # closing parenthesis
) # End of Group 2
|
[^,] # Any char other than a comma
)+ # One or more occurrences of the alternatives in the container group
) # End of Group 1
/x # extended mode with all in-pattern whitespace ignored
res = []
s.scan(rx) { |m|
res << m[0] # Only collect Group 1 values dropping all others
}
puts res
See the Ruby demo online
Output:
Var007>2
IIF(Var133=2 OR Var133=3,'',Var132+IIF(Var123=2,Var122+IIF(Var113=2,Var112,0),0)+IIF(Var007>3,IIF(Var143=3,Var142,0),0))
''

How do I replace all the apostrophes that come right before or right after a comma?

I have a string aString = "old_tag1,old_tag2,'new_tag1','new_tag2'"
I want to replace the apostrophes that come right before or right after a comma. For example, in my case the apostrophes enclosing new_tag1 and new_tag2 should be removed.
This is what I have right now
aString = aString.gsub("'", "")
This is however problematic as it removes any apostrophe inside for example if I had 'my_tag's' instead of 'new_tag1'. How do I get rid of only the apostrophes that come before or after the commas ?
My desired output is
aString = "old_tag1,old_tag2,new_tag1,new_tag2"
My guess is to use regex as well, but in a slightly other way:
aString = "old_tag1,old_tag2,'new_tag1','new_tag2','new_tag3','new_tag4's'"
aString.gsub(/(?<=^|,)'(.*?)'(?=,|$)/, '\1')
#=> "old_tag1,old_tag2,new_tag1,new_tag2,new_tag3,new_tag4's"
The idea is to find a substring bounded by apostrophes and paste it back without them.
regex = /
(?<=^|,) # look behind for the start of a line or a comma
' # find an apostrophe
(.*?) # get everything between apostrophes in a non-greedy way
' # find a closing apostrophe
(?=,|$) # look ahead for a comma or the end of a line
/x
The replacement part just pastes back the content of the first (and only) capture group, which is everything between the parentheses.
Thanks to @Cary for the /x modifier for regexes, I didn't know about it! Extremely useful for explanation.
This answers the question, "I want to replace the apostrophes that come right before or right after a comma".
r = /
(?<=,) # match a comma in a positive lookbehind
\' # match an apostrophe
| # or
\' # match an apostrophe
(?=,) # match a comma in a positive lookahead
/x # free-spacing regex definition mode
aString = "old_tag1,x'old_tag2'x,x'old_tag3','new_tag1','new_tag2'"
aString.gsub(r, '')
#=> "old_tag1,x'old_tag2'x,x'old_tag3,new_tag1,new_tag2'"
If the objective is instead to remove single quotes enclosing a substring when the left quote is at the beginning of the string or is immediately preceded by a comma and the right quote is at the end of the string or is immediately followed by a comma, several approaches are possible. One is to use a single, modified regex, as @Dimitry has done. Another is to split the string on commas, process each string in the resulting array and then join the modified substrings, separated by commas.
r = /
\A # match beginning of string
\' # match single quote
.* # match zero or more characters
\' # match single quote
\z # match end of string
/x # free-spacing regex definition mode
aString.split(',').map { |s| (s =~ r) ? s[1..-2] : s }.join(',')
#=> "old_tag1,x'old_tag2'x,x'old_tag3',new_tag1,new_tag2"
Note:
arr = aString.split(',')
#=> ["old_tag1", "x'old_tag2'x", "x'old_tag3'", "'new_tag1'", "'new_tag2'"]
"old_tag1" =~ r #=> nil
"x'old_tag2'x" =~ r #=> nil
"x'old_tag3'" =~ r #=> nil
"'new_tag1'" =~ r #=> 0
"'new_tag2'" =~ r #=> 0
Non regex replacement
Regular expressions can get really ugly. There is a simple way to do it with just string replacement: search for the patterns ,' and ', and replace each with ,
aString.gsub(",'", ",").gsub("',", ",")
=> "old_tag1,old_tag2,new_tag1,new_tag2'"
This leaves the trailing ', but that is easy to remove with .chomp("'"). A leading ' can be removed with a simple regex .gsub(/^'/, "")
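Putting those steps together, a minimal sketch could look like the following. strip_tag_quotes is a hypothetical helper name, and \A is used in place of ^ so that only the very start of the string is affected; it assumes no tag contains the sequences ,' or ', internally.

```ruby
# Hypothetical helper that chains the string replacements described above.
def strip_tag_quotes(tags)
  tags.gsub(",'", ",")   # drop an apostrophe right after a comma
      .gsub("',", ",")   # drop an apostrophe right before a comma
      .chomp("'")        # drop a trailing apostrophe
      .sub(/\A'/, "")    # drop a leading apostrophe
end

puts strip_tag_quotes("old_tag1,old_tag2,'new_tag1','new_tag2'")
puts strip_tag_quotes("'my_tag's','new_tag2'")
```

The apostrophe inside my_tag's survives because it is not adjacent to a comma and is neither the first nor the last character of the string.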

Regex - Grab characters between two characters but not including them

I'm trying to grab the rsession variable in the string below between '=' and '&' in a string.
I'm able to do it with /rsession(\=.+?\&)/.
But how do I do it so the output doesn't include the '=' and '&'?
The string is:
"app=1334300&rsession=0806343413_1:5bc6a3c80271826a1c0016c1520d3&token=a6caacf7edfbd9383429e30a1adfadf385208985ad&redirectReq=true"
You can use this regex:
/(?<=rsession=)[^&]*/
Explanation:
(?<= # asserts the match at position after
rsession= # the literal string `rsession=`
) # it is called: positive lookbehind
[^&] # not `&` character
* # as many as possible
Hope it helps.
You can capture it in group like this.
Regex: /rsession=([^&]+)/
Explanation:
([^&]+) will capture in group all characters until a & is met. Use \1 or $1 to use captured group.
Regex101 Demo
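Both approaches can be applied in Ruby with String#[]. This is a small sketch using the string from the question; the constant names are my own.

```ruby
QUERY = "app=1334300&rsession=0806343413_1:5bc6a3c80271826a1c0016c1520d3" \
        "&token=a6caacf7edfbd9383429e30a1adfadf385208985ad&redirectReq=true"

# Lookbehind: the whole match is the value itself.
VIA_LOOKBEHIND = QUERY[/(?<=rsession=)[^&]*/]

# Capture group: passing 1 as the second argument returns only group 1.
VIA_CAPTURE = QUERY[/rsession=([^&]+)/, 1]

puts VIA_LOOKBEHIND
puts VIA_CAPTURE
```

Both lines print 0806343413_1:5bc6a3c80271826a1c0016c1520d3, without the surrounding = and &.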
