Regex does not check the first character after first check - ruby-on-rails

I am trying to write a regex that:
Allows only numbers, lowercase letters and also "-" and "_".
String can only start with a letter, a number, or "uuid:".
String must have at least one letter in it.
It must consist of at least 2 characters.
I managed to create such a regex: \A(?:uuid:|[a-z0-9])(?=(.*[a-z])){1,}(?:\w|-)+\z
I just don't understand why if the first character is a letter, it is not taken into account, so it doesn't pass for example: a1.
And also why it allows uppercase letters AA.
Tests: https://rubular.com/r/Q5gEP15iaYkHYQ
Thank you in advance for your help

You could also get the matches without lookarounds using an alternation matching at least 2 characters from the start.
If you don't want to match uppercase chars A-Z, then you can omit /i for case insensitive matching.
\A(?:uuid:|[a-z][a-z0-9_-]|[0-9][0-9_-]*[a-z])[a-z0-9_-]*\z
Explanation
\A Start of string
(?: Non capture group
uuid: match literally
| Or
[a-z][a-z0-9_-] match a char a-z and one of a-z 0-9 _ -
| Or
[0-9][0-9_-]*[a-z] Match a digit, optional chars 0-9 _ - and then a-z
) Close non capture group
[a-z0-9_-]* Match optional chars a-z 0-9 _ -
\z End of string
Regex rubular demo
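A quick way to sanity-check the alternation pattern above is to run it against a handful of inputs. This is only a sketch: the constant name ALTERNATION_PATTERN and the sample strings are made up for illustration.

```ruby
# Alternation-based pattern from the answer above (no lookarounds, no /i flag).
ALTERNATION_PATTERN = /\A(?:uuid:|[a-z][a-z0-9_-]|[0-9][0-9_-]*[a-z])[a-z0-9_-]*\z/

# Hypothetical sample inputs, chosen to exercise each alternative.
samples = %w[a1 ab 0a 0_a uuid:abc a AA 01 _a a$]

samples.each do |s|
  puts "#{s.ljust(8)} -> #{s.match?(ALTERNATION_PATTERN)}"
end
```

The first five samples should be accepted and the last five rejected, including AA, because the pattern is used without the i flag.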

It looks like AA meets all your requirements: it contains a letter, at least two chars, and starts with a letter, and contains "only numbers, lowercase letters and also - and _". NOTE you have an i flag that makes pattern matching case insensitive, and if you do not want to allow any uppercase letters, just remove it from the end of the regex literal.
To fix the other real issues, you can use
/\A(?=[^a-z]*[a-z])(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/
See this Rubular demo.
Note that in the demo, (?=[^a-z\n]*[a-z]) is used rather than (?=[^a-z]*[a-z]) because the test is performed on a single multiline string, not an array of separate strings.
Details:
\A - start of string
(?=[^a-z]*[a-z]) - minimum one letter
(?=.{2}) - minimum two chars
(?:uuid:|[a-z0-9]) - uuid:, or one letter or digit
[a-z0-9_-]* - zero or more letters, digits, _ or -
\z - end of string
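The effect of the i flag mentioned above can be seen by comparing the same pattern with and without it. This is a minimal sketch; the constant names STRICT and CASE_INSENSITIVE and the sample strings are assumptions for illustration.

```ruby
# Lookahead-based pattern from the answer above, without and with the i flag.
STRICT           = /\A(?=[^a-z]*[a-z])(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/
CASE_INSENSITIVE = /\A(?=[^a-z]*[a-z])(?=.{2})(?:uuid:|[a-z0-9])[a-z0-9_-]*\z/i

# Hypothetical sample inputs.
%w[a1 AA uuid:x 01 a].each do |s|
  puts "#{s.ljust(7)} strict=#{s.match?(STRICT)} case_insensitive=#{s.match?(CASE_INSENSITIVE)}"
end
```

Only AA should differ between the two columns: it is rejected by the strict pattern and accepted once the i flag is added.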

You can use the following regular expression as the argument of String#match?. I've written the expression in free-spacing mode to make it self-documenting. Free-spacing mode is invoked with the option x (i.e. /x). It causes whitespace and comments to be ignored.
Note that I've defined named capture group common_code so that I can use its instructions as a subexpression (a.k.a. subroutine) in subsequent instructions. The invocation \g<common_code> tells Ruby to repeat the instructions in that capture group. See Subexpression Calls. You will see that numbered capture groups can be used as well.
The use of subexpressions has three benefits:
less code is required;
there is less chance of making errors, both when initially forming the expression and later when modifying it, by confining all instructions that are to repeat through the expression in one place (in the capture group); and
it makes the expression easier to read and understand.
re = /
\A # match the beginning of the string
(?: # begin a non-capture group
[a-z] # match a lc letter
(?<common_code> # begin a capture group named 'common_code'
[\da-z_-] # match a digit, lc letter, underscore or hyphen
) # end capture group
+ # execute the preceding capture group one or more times
| # or
uuid: # match a string literal
\g<common_code>* # match code in capture group common_code >= 0 times
| # or
\d # match a digit
\g<common_code>* # match code in capture group common_code >= 0 times
[a-z] # match a lc letter
\g<common_code>* # match code in capture group common_code >= 0 times
) # end the non-capture group
\z # match the end of the string
/x # invoke free-spacing regex definition mode
test = %w| ab a1 a_ a- 0a 01a 0_a 0-a uuid: uuid:a uuid:0 uuid:_ uuid:- | +
%w| a _a -a a$ auuid: 01 0_ 0- 01_- |
test.each do |s|
puts "#{s.ljust(7)} -> #{s.match?(re)}"
end
ab -> true
a1 -> true
a_ -> true
a- -> true
0a -> true
01a -> true
0_a -> true
0-a -> true
uuid: -> true
uuid:a -> true
uuid:0 -> true
uuid:_ -> true
uuid:- -> true
a -> false
_a -> false
-a -> false
a$ -> false
auuid: -> false
01 -> false
0_ -> false
0- -> false
01_- -> false

Related

Rails string split every other "."

I have a bunch of sentences that I want to break into an array. Right now, I'm splitting every time \n appears in the string.
@chapters = @script.split('\n')
What I'd like to do is .split every OTHER "." in the string. Is that possible in Ruby?
You could do it with a regex, but I'd start with a simple approach: just split on periods, then join pairs of substrings:
s = "foo. bar foo. foo bar. boo far baz. bizzle"
s.split(".").each_slice(2).map {|p| p.join "." }
# => ["foo. bar foo", " foo bar. boo far baz", " bizzle"]
This is a case where it's easier to use String#scan than String#split.
We can use the following regular expression:
r = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/
str=<<~_
Now is the time. This is it. It is now. The time to have fun.
The time to make new friends. The time to party.
_
str.scan(r)
#=> [
# "Now is the time. This is it",
# " It is now. The time to have fun",
# "\nThe time to make new friends. The time to party"
#   ]
We can write the regular expression in free-spacing mode to make it self-documenting.
r = /
(?<= # begin a positive lookbehind
\A # match the beginning of the string
| # or
\. # match a period
) # end positive lookbehind
[^.]* # match zero or more characters other than periods
\. # match a period
[^.]* # match zero or more characters other than periods
(?= # begin a positive lookahead
\. # match a period
| # or
\z # match the end of the string
) # end positive lookahead
/x # invoke free-spacing regex definition mode
Note that (?<=\.|\A) can be replaced with (?<![^\.]). (?<![^\.]) is a negative lookbehind that asserts the match is not preceded by a character other than a period.
Similarly, (?=\.|\z) can be replaced with (?![^.]). (?![^.]) is a negative lookahead that asserts the match is not followed by a character other than a period.
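The claimed equivalence of the two lookaround spellings can be checked directly. The sketch below assumes a made-up sample sentence and hypothetical constant names; it only compares the two scan results.

```ruby
# Original form: alternation inside the lookbehind and lookahead.
WITH_ALTERNATION = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/
# Alternative form: negative lookarounds around a negated character class.
WITH_NEGATION    = /(?<![^.])[^.]*\.[^.]*(?![^.])/

text = "One. Two. Three. Four. Five." # hypothetical sample input

p text.scan(WITH_ALTERNATION)
p text.scan(WITH_NEGATION)
puts text.scan(WITH_ALTERNATION) == text.scan(WITH_NEGATION)
```

Both calls should return the same array for this input, which is what the final comparison prints.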

Regular wrong regular expression, not validating

Please, I want to validate input from a user. The format is: 3 uppercase characters, 3 digits, an optional space, a -, an optional space, then either 'LAB' or (EN or ENLH followed by one digit in the range 1-9).
The regex i wrote is
/\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?\d{1}))/
I am finding it difficult to reject anything after LAB, so that an input such as EEE333 - LAB1 is invalid.
If you are asking how to prevent LAB1 at the end, use an end of line anchor $ in your regex test:
/\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?\d{1}))$/
If you are trying to require exactly one digit at the end of the acceptable strings, move the single digit match outside of the optional groups:
/\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?))\d{1}$/
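The difference the end anchor makes can be seen by running the asker's pattern with and without it. This is a sketch in Ruby, which is an assumption since the question does not name a language; the constant names and sample inputs are hypothetical.

```ruby
# The asker's pattern, without and with an end-of-line anchor.
UNANCHORED = /\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?\d{1}))/
ANCHORED   = /\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?\d{1}))$/

# Hypothetical sample inputs.
['EEE333 - LAB', 'EEE333 - LAB1', 'EEE333-ENLH4'].each do |input|
  puts "#{input.ljust(14)} unanchored=#{input.match?(UNANCHORED)} anchored=#{input.match?(ANCHORED)}"
end
```

Without the anchor, EEE333 - LAB1 still matches because the pattern is satisfied by the prefix EEE333 - LAB. Note that in Ruby $ anchors at the end of a line; \z is the anchor for the end of the whole string.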
I have written the following regular expression for you:
[A-Z]{3}[0-9]{3}\s?-\s?(?:LAB|(?:EN|LH))[1-9]{1}
The regex works as follows:
[A-Z]{3}
MATCH EXACTLY THREE UPPERCASE CHARACTERS RANGING FROM A TO Z
[0-9]{3}
MATCH EXACTLY THREE DIGITS RANGING FROM 0 TO 9
\s?-\s?
MATCH AN OPTIONAL WHITESPACE CHARACTER, THEN A REQUIRED '-', THEN ANOTHER OPTIONAL WHITESPACE CHARACTER
(?:LAB|(?:EN|LH))
MATCH 'LAB' OR ('EN' OR 'LH'); THE ?: MAKES EACH GROUP NON-CAPTURING
[1-9]{1}
MATCH EXACTLY ONE DIGIT RANGING FROM 1 TO 9
You could place your regex between word boundaries \b.
You start your regex with \D which is any character that is not a digit. That would for example also match $%^. You could use [A-Z].
You use \d{1}, where \d is shorthand for [0-9], but you want to match a digit between 1 and 9: [1-9]. You could also omit the {1}.
Maybe this updated pattern will work for you?
\b[A-Z]{3}\d{3} ?- ?(?:LAB|(?:EN(?:LH)?[1-9]))\b
Explanation
A word boundary \b
Match 3 uppercase characters [A-Z]{3}
Match 3 digits \d{3}
Match an optional space, a hyphen and another optional space ?- ? (each ? follows a literal space)
A non-capturing group that matches LAB, or EN or ENLH followed by a digit from 1 to 9 (for example EN1 or ENLH9): (?:LAB|(?:EN(?:LH)?[1-9]))
A word boundary \b
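If the whole input has to be valid, rather than a substring of it, the word boundaries can be swapped for string anchors. The following is a sketch under that assumption; the constant COURSE_CODE and the method valid_course_code? are hypothetical names, and the use of \A and \z in place of \b is a substitution, not part of the answer above.

```ruby
# Pattern from the answer above, anchored with \A and \z (an assumption)
# so that the entire input must match.
COURSE_CODE = /\A[A-Z]{3}\d{3} ?- ?(?:LAB|EN(?:LH)?[1-9])\z/

# Hypothetical helper that returns true only for well-formed inputs.
def valid_course_code?(input)
  input.match?(COURSE_CODE)
end

['EEE333 - LAB', 'EEE333-EN1', 'EEE333 - ENLH9', 'EEE333 - LAB1', 'EEE333 - EN0'].each do |input|
  puts "#{input.ljust(15)} -> #{valid_course_code?(input)}"
end
```

The first three inputs should be accepted; LAB1 is rejected because nothing may follow LAB, and EN0 is rejected because the digit must be in the range 1-9.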

How to create a regular expression to match specific substrings inside brackets?

In my Ruby on Rails app I need a regex that accepts the following values:
{DD}
{MM}
{YY}
{NN}
{NNN}
{NNNN}
{NNNNN}
{NNNNNN}
upper and lowercase letters
the special characters -, _, . and #
I am still new to regular expressions and I came up with this:
/\A[a-zA-Z._}{#-]*\z/
This works pretty well already, however it also matches strings that should not be allowed such as:
}FOO or {YYY}
Can anybody help?
You may use
/\A(?:\{(?:DD|MM|YY|N{2,6})\}|[A-Za-z_.#-])*\z/
See Rubular demo
\A - start of string anchor
(?:\{(?:DD|MM|YY|N{2,6})\}|[A-Za-z_.#-])* - a non-capturing group ((?:...) that only groups sequences of atoms and does not create submatches/subcaptures) zero or more occurrences of:
\{(?:DD|MM|YY|N{2,6})\} - a {, then either DD, MM, YY, or 2 to 6 N characters, followed by }
| - or
[A-Za-z_.#-] - 1 char from the set (ASCII letter, _, ., # or -)
\z - end of string.
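The pattern can be tried against a few templates to confirm that the problem cases from the question are rejected. This is only a sketch; the constant name TEMPLATE and the sample strings are made up for illustration.

```ruby
# Pattern from the answer above: placeholders in braces, or single allowed characters.
TEMPLATE = /\A(?:\{(?:DD|MM|YY|N{2,6})\}|[A-Za-z_.#-])*\z/

# Hypothetical sample templates.
['INV-{YY}{MM}-{NNNN}', '{DD}.{MM}.{YY}', '}FOO', '{YYY}', '{N}'].each do |s|
  puts "#{s.ljust(20)} -> #{s.match?(TEMPLATE)}"
end
```

The first two templates should be accepted and the last three rejected. Because the outer group is quantified with *, the empty string also matches; replacing * with + would require at least one character or placeholder.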

what does _? mean in the following regex? [duplicate]

What does _? mean in the following rails regex?
/\A_?[a-z]_?(?:[a-z0-9.-]_?)*\z/i
I have attempted to decipher the regex as follows
# regex explained
# \A : matches the beginning of the string
# _? :
# [ : beginning of character group
# a-z : any lowercase letter
# ] : end of character group
# _? :
# ( : is a capture group, anything matched within the parens is saved for later use
# ?: : non-capturing group: matches below, but doesn't store a back-ref
# [ : beginning of character group
# a-z : any lowercase letter
# A-Z : any uppercase letter
# 0-9 : any digit
# . : a fullstop or "any character" ??????
# _ : an underscore
# ] : end of character group
# _? :
# ) : See above
# * : zero or more times of the given characters
# \z : is the end of the string
_ matches an underscore.
? matches zero or one of the preceding character, basically making the preceding character optional.
So _? will match one underscore if it is present, and will match without it.
? means that the previous expression should appear 0 or 1 times, similarly to how * means it should match 0 or more times, or + means it should match 1 or more times.
So, for example, with the RE /\A_?[A-Z]?\z/, the following strings will match:
_O
_
P
but these will not:
____
A_
PP
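The difference between ?, * and + can also be seen side by side. The sketch below uses three small hypothetical patterns that differ only in the quantifier applied to the underscore.

```ruby
# Three hypothetical patterns that differ only in the quantifier on "_".
OPTIONAL     = /\A_?x\z/ # zero or one underscore
ANY_NUMBER   = /\A_*x\z/ # zero or more underscores
AT_LEAST_ONE = /\A_+x\z/ # one or more underscores

%w[x _x __x].each do |s|
  puts "#{s.ljust(4)} ?=#{s.match?(OPTIONAL)} *=#{s.match?(ANY_NUMBER)} +=#{s.match?(AT_LEAST_ONE)}"
end
```

Only `_?` rejects two underscores, and only `_+` rejects the string with none.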
The RE you posted originally states:
The string may begin with an underscore
Then there must be a letter (the i flag makes the match case-insensitive, so uppercase is accepted too)
Then there may be another underscore
For the rest of the string, there must be a letter, number, period, or -, which may be followed by an underscore
Example strings that match this RE:
_a_abcdefg
b_abc_def_
_qasdf_poiu_
a12345_
z._.._...._......_
u
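These examples can be checked in Ruby along with a few strings that should fail. This is a sketch; the constant name NAME_PATTERN and the rejected samples are assumptions added for illustration.

```ruby
# The pattern from the question, including its i flag.
NAME_PATTERN = /\A_?[a-z]_?(?:[a-z0-9.-]_?)*\z/i

matching = %w[_a_abcdefg b_abc_def_ _qasdf_poiu_ a12345_ u Abc]
rejected = %w[_ __a 1abc a__b] # hypothetical counterexamples

(matching + rejected).each do |s|
  puts "#{s.ljust(14)} -> #{s.match?(NAME_PATTERN)}"
end
```

The rejected samples fail because each `_?` allows at most one underscore at its position, and the first non-underscore character must be a letter.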
_? means _ is optional.
It accepts _sadasd_sadsadsa_asdasdasd_ or asdasdsadasdasd, i.e. strings separated by _, where each _ is optional.
See demo.
http://regex101.com/r/hQ1rP0/89

How to find all instances of #[XX:XXXX] in a string and then find the surrounding text?

Given a string like:
"#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
I want to find a way to dynamically return the tagged user and the surrounding content. Results should be:
user_id: 19
copy: what's the latest with the TPS report?
user_id: 30
copy: can you help out here?
Any ideas on how this can be done with ruby/rails? Thanks
How is this regex for finding matches?
#\[\d+:\w+\s\w+\]
Split the string, then handle the content iteratively. I don't think it'd take more than:
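# Drop the empty leading piece, then pull out the id and the trailing text of each piece.
tmp = string.split('#').reject(&:empty?).map { |str| [str[/\[(\d+)/, 1], str[/\](.*)/m, 1].to_s.strip] }
tmp.first #=> ["19", "what's the latest with the TPS report?"]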
Does that help?
result = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m)
This grabs the id and the content in backreferences 1 and 2 respectively. The capture stops when either # or the end of the string is reached.
"
\\[ # Match the character “[” literally
( # Match the regular expression below and capture its match into backreference number 1
\\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\\] # Match the character “]” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
\# # Match the character “\#” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\\Z # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"
This will match something starting at # and ending at a punctuation mark. Sorry if I didn't understand correctly.
result = subject.scan(/#.*?[.?!]/)
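The scan call with two capture groups shown earlier can be wrapped so that it returns the user_id and copy pairs the question asks for. This is a minimal sketch: the method name extract_mentions and the hash keys are assumptions, and the pattern is the one from that answer.

```ruby
# Hypothetical helper built on the two-group scan pattern from the answer above.
def extract_mentions(subject)
  # scan yields one [id, text] pair per tag; strip removes the surrounding spaces.
  subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m).map do |id, text|
    { user_id: id.to_i, copy: text.strip }
  end
end

subject = "#[19:Sara Mas] what's the latest with the TPS report? " \
          "#[30:Larry Peters] can you help out here?"

extract_mentions(subject).each do |mention|
  puts "user_id: #{mention[:user_id]}"
  puts "copy: #{mention[:copy]}"
end
```

Returning an array of hashes keeps the id and its text together, which makes it easy to iterate over the mentions or look up users afterwards.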
