Rails regex syntax error - ruby-on-rails

I am trying to add a regex validation to a form field with the code below. I want to allow any alphabetic character (including accented ones), digits, hyphen, apostrophe, comma and space. The expression should match a value such as "Tir à l'arc, 3d, danse".
validates :interest_list, tags: true, if: lambda { interest_list.any? }
validates :interest_list, format: { with: /\A[[:alpha:]\d-'’, ]\z/, message: "only allows letters, space, hyphen and apostrophe" }
But I get this error: empty range in char class: /\A[[:alpha:]\d-'’,]\z/
Can anyone tell me what I'm doing wrong?

Any - that appears inside a character class in any position other than the first or last is treated as a range operator, e.g. [0-9] is shorthand for [0123456789]. The range is calculated from the character codes of its two endpoints.
You have \d-' in your regex, and \d isn't valid as the start or end of a range. What you probably want is to move the - to the start or end of your []:
/\A[[:alpha:]\d'’, -]\z/
...and to solve your next problem: as written, your regex will only match a single character, so you probably also want a quantifier on that character class, such as +:
/\A[[:alpha:]\d'’, -]+\z/
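A quick check in irb or rails c can confirm both points above. This is only a sketch: the constant names are made up for illustration, and it assumes Ruby 2.4 or later for match?.

```ruby
# Hyphen placed last in the class, so it is a literal rather than a range operator.
SINGLE_CHAR  = /\A[[:alpha:]\d'’, -]\z/
WHOLE_STRING = /\A[[:alpha:]\d'’, -]+\z/

SAMPLE = "Tir à l'arc, 3d, danse"

puts SINGLE_CHAR.match?(SAMPLE)          # false: without +, only a one-character string can match
puts WHOLE_STRING.match?(SAMPLE)         # true: + repeats the class across the whole string
puts WHOLE_STRING.match?("rock & roll")  # false: & is not in the class
```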

Error: Regex Construction ..
Invalid range end in character class
\A[[:alpha:]\d->>>HERE>>>'’, ]\z
\d followed by - and anything else is an invalid range, because the range operator - cannot specify a range between a class and anything else.
You'd need to escape the - to make it a literal: \A[[:alpha:]\d\-'’, ]\z
or move it to the end or beginning: \A[[:alpha:]\d'’, -]\z

Related

What does the instruction "name =~ /[A-Z].*/" do?

I'm studying Ruby on Rails and I came across some code, but I can't work out how it actually works.
validate :first_letter_must_be_uppercase
private
def first_letter_must_be_uppercase
  errors.add("name", "first letter must be uppercase") unless name =~ /[A-Z].*/
end
The code is meant to check, using a regular expression, that name starts with an uppercase letter.
Explanation:
/[A-Z].*/
[A-Z] - matches any capital letter from A to Z
. - matches any single character (except a newline)
* - repeats the preceding token 0 or more times
To sum up:
The input string should match the following format: a capital letter from A-Z followed by 0 or more arbitrary characters.
You can check it on Rubular
EDIT
As pointed out by @vasfed, without an anchor the pattern matches an uppercase letter anywhere in the string. If you want to match the first character, the regex needs to be changed to
/\A[A-Z].*/
\A - Ensure start of the string
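The difference the anchor makes can be seen with =~, which returns the index of the match or nil. A small sketch, with constant names chosen only for illustration:

```ruby
UNANCHORED = /[A-Z].*/
ANCHORED   = /\A[A-Z].*/

p("john Smith" =~ UNANCHORED)  # 5: matches from the capital S, so the validation would pass
p("john Smith" =~ ANCHORED)    # nil: the first character is not uppercase
p("John smith" =~ ANCHORED)    # 0: the match starts at the beginning of the string
```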

Brakeman insufficient validation warning of regex anchors

I'm trying to implement a validation in a model like this.
validates_format_of :field, with: /[0-9]/, message: 'must have at least one number (0-9)'
Brakeman flags this as a Format Validation security issue and recommends adding anchors around the regular expression.
Insufficient validation for 'field' using /[0-9]/. Use \A and \z as anchors near line 54
If I add those anchors, the regular expression stops working, so I don't know what to do in this case. Here are the tests I ran in rails c.
"asdf1234".match(/\A[0-9]\z/) # => nil
"foobar1".match(/\A[0-9]\z/) # => nil
I need the method to return #<MatchData "1"> in both cases.
Any ideas?
Thanks.
If you need to match a string that has at least 1 digit inside, and any other chars before and after, you may use
/\A[^0-9]*[0-9].*\z/m
or just
/\A.*[0-9].*\z/m
Details
\A - start of string
[^0-9]* - zero or more chars other than an ASCII digit
[0-9] - an ASCII digit
.* - any 0+ chars, as many as possible, up to the
\z - end of string.
The m modifier makes . match any char, including a line break char.
Actually, /\A.*[0-9].*\z/m will be a bit slower, as the first .* will grab all the string at once and then will backtrack to find the last digit. The first one is more optimized.
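A short sketch of the first pattern run against the strings from the question (the constant name is illustrative). Note that the match now covers the whole string rather than just "1"; for a format validation that should not matter, since only the presence of a match is checked.

```ruby
HAS_DIGIT = /\A[^0-9]*[0-9].*\z/m

p HAS_DIGIT.match?("asdf1234")          # true
p HAS_DIGIT.match?("foobar1")           # true
p HAS_DIGIT.match?("nodigits")          # false
p HAS_DIGIT.match?("line one\nline 2")  # true: with m, . also matches the line break

p "asdf1234".match(HAS_DIGIT)[0]        # "asdf1234", the entire string
```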

Rails validate format with regex

In my Rails app, I want to validate a string field containing any number of keywords, each of which may consist of more than one natural-language word (e.g. "document number"). To tell the individual keywords apart, I enter them separated by ", "; the last keyword ends at the end of the string.
For this I use
validates :keywords, presence: true, format: { with: /((\w+\s?-?\w+)(,|\z))/i, message: "please enter keywords in correct format"}
It should allow the attribute keywords (string) to contain: "word1, word2, word3 word4, word5-word6"
It should not allow the use of any other pattern. e.g. not "word1; word2;"
However, it incorrectly allows "word1; word2".
On Rubular this regex works, yet in my Rails app it allows, for example, "word1; word2" or "word3; word-".
Where is my error? (I should say I'm a beginner in Ruby and regex.)
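Without anchors, the validation passes as soon as any substring matches; in "word1; word2" the trailing "word2" alone satisfies your pattern. You need to use the anchors \A and \z and modify the pattern to fit that logic as follows: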
/\A(\w+(?:[\s-]*\w+)?)(?:,\s*\g<1>)*\z/
See the Rubular demo
Details:
\A - start of string
(\w+(?:[\s-]*\w+)?) - Group 1 capturing:
  \w+ - 1 or more word chars
  (?:[\s-]*\w+)? - 1 or 0 sequences of:
    [\s-]* - 0+ whitespaces or -
    \w+ - 1 or more word chars
(?:,\s*\g<1>)* - 0 or more sequences of:
  ,\s* - comma and 0+ whitespaces
  \g<1> - the same pattern as in Group 1
\z - end of string.
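To see the anchored pattern accept and reject the inputs from the question, something like the following can be run in irb. The constant name is an assumption made for this sketch; in the model it would be passed as the with: option of the format validation.

```ruby
# \g<1> re-runs the Group 1 subpattern for every keyword after a comma.
KEYWORDS = /\A(\w+(?:[\s-]*\w+)?)(?:,\s*\g<1>)*\z/

p KEYWORDS.match?("word1, word2, word3 word4, word5-word6")  # true
p KEYWORDS.match?("document number")                         # true
p KEYWORDS.match?("word1; word2")                            # false
p KEYWORDS.match?("word3; word-")                            # false
p KEYWORDS.match?("word1, ")                                 # false: a keyword must follow the comma
```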

rails 4 model text regex allow 0-9, a-z, links and email address

I am working with the following model validation. My tests pass, except that since I started adding the ability to include links, bad characters are making it through :(
validates :application_process,
presence: true,
format: { with: %r{\A[\w\d .,:/-#&?]+\z}, message: :bad_format }
I want to allow the following:
A-Z
a-z
0-9
?
:
/
#
.
,
The regex you have contains a -. A hyphen inside a character class creates a range unless it is escaped, comes right after a shorthand character class or a range, or sits at the start or end of the character class.
So, if you need to match a literal hyphen, escape it or place it at the end of the character class (just before ]).
To only match the characters and ranges you specify in the question, use
%r{\A[A-Za-z0-9?:/#.,]+\z}
To add a hyphen:
%r{\A[A-Za-z0-9?:/#.,-]+\z}
                     ^
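A minimal check of the hyphen-enabled pattern, using made-up sample strings and an illustrative constant name. Note that this class does not include a space, @ or &, so such input is rejected.

```ruby
# The trailing hyphen is a literal because it sits right before the closing ].
ALLOWED = %r{\A[A-Za-z0-9?:/#.,-]+\z}

p ALLOWED.match?("http://example.com/docs#apply?step-2")  # true
p ALLOWED.match?("<script>")                              # false: < and > are not in the class
p ALLOWED.match?("jobs@example.com")                      # false: @ is not in the class
```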

ActiveSupport::Inflector::camelize - help in understanding regex

Short version:
I am having a rather hard time understanding two rather complex regular expressions in the ActiveSupport::Inflector::camelize method.
This is the definition of the camelize method:
def camelize(term, uppercase_first_letter = true)
  string = term.to_s
  if uppercase_first_letter
    string = string.sub(/^[a-z\d]*/) { inflections.acronyms[$&] || $&.capitalize }
  else
    string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
  end
  string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
end
I have some difficulty understanding:
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
and:
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
Please explain to me what they mean. Thank you.
Long version
This shows my attempt to understand the regexes and how I interpret them. It would be very helpful if you could go through this and correct my mistakes.
For the first regex
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
Based on what I am seeing, inflections.acronym_regex is from the Inflections class in the ActiveSupport::Inflector module, and in the initialize method of the Inflections class,
def initialize
  @plurals, @singulars, @uncountables, @humans, @acronyms, @acronym_regex = [], [], [], [], {}, /(?=a)b/
end
acronym_regex is assigned /(?=a)b/. From what I understand from http://www.ruby-doc.org/core-2.0.0/Regexp.html#class-Regexp-label-Anchors ,
(?=pat) - Positive lookahead assertion: ensures that the following characters match pat, but doesn't include those characters in the matched text
So /(?=a)b/ ensures that character a is inside the text, but we don't include character a inside the matched text, and what immediately follows character a must be character b. In other words, "abc" would match this regex, but "bbc" would not match this regex, and the matched text for "abc" would be "b" (instead of "ab").
So combining the value of inflections.acronym_regex into this regex /^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/, I do not know which of the following two regex results:
A. /^(?:/(?=a)b/(?=\b|[A-Z_])|\w)/
B. /^(?:(?=a)b(?=\b|[A-Z_])|\w)/
although I think it is B. From what I understand, (?: provides grouping without capturing, (?= is a positive lookahead assertion, and \b matches a word boundary outside brackets and a backspace inside brackets. So in English terms, regex B, when matched against a text, will find a string that begins with an a character, followed by a b character, and one of (1. backspace [whatever that may mean] 2. any uppercase character or underscore 3. any English alphabetic character, digit, or underscore).
However, I find it strange that passing uppercase_first_letter = false to the camelize function should cause it to match a string starting with the characters ab, given that that does not seem to be how the camelize function behaves.
For the second regex
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
The regex is:
/(?:_|(\/))([a-z\d]*)/i
I am guessing that this regex will match a substring that starts with either an _ or a /, followed by 0 or more upper- or lowercase English alphabetic characters or digits. Furthermore, for the first group (?:_|(\/)), whether we match the _ or the /, the ([a-z\d]*) capturing group will always be regarded as the second group. I do understand the part where the block tries to look up inflections.acronyms[$2] and, on failure, calls $2.capitalize.
Since (?: means grouping without capturing, what is the value of $1 when we match _ ? Is it still _ ? And for the .gsub('/', '::') portion, I am guessing that it gets applied for each match in the initial gsub, instead of being applied to the overall string after the outer gsub call is done?
Apologies for the really long post. Please point out my errors in understanding the 2 regular expressions, or explain them in a better way if you can do it.
Thank you.
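"However, I find it strange that passing uppercase_first_letter = false to the camelize function should cause it to match a string starting with the characters ab, given that that does not seem to be how the camelize function behaves."
It doesn't. The default acronym_regex, /(?=a)b/, can never match anything: the lookahead requires the next character to be a, while the pattern then requires that same character to be b. So when no acronyms are defined, the alternation falls through to \w, which matches a single word character at the start of the string. The (?: ... ) group is non-capturing, so the match is read from $&, which the block downcases.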
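"Since (?: means grouping without capturing, what is the value of $1 when we match _ ? Is it still _ ?"
$1 is nil in that case: the only capturing group inside the alternation is (\/), and it does not take part in the match when _ is matched. The text that follows the separator is in $2.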
"And for the .gsub('/', '::') portion, I am guessing that it gets applied for each match in the initial gsub, instead of being applied to the overall string after the outer gsub call is done?"
It's applied to the overall result: gsub with a block returns a new string, and the .gsub('/', '::') call is chained onto that return value, outside the block.
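These points can be checked outside Rails with plain Ruby. The sketch below is a simplification, not the ActiveSupport implementation: it leaves out the acronyms lookup entirely, and NEVER, FIRST, SECOND, captures_for and camelize_sketch are hypothetical names introduced here.

```ruby
# The default acronym_regex can never match: the lookahead demands "a"
# at the same position where the pattern then demands "b".
NEVER = /(?=a)b/
p NEVER.match?("abc")  # false
p NEVER.match?("b")    # false

# Interpolating a Regexp embeds its to_s form, (?-mix:(?=a)b), so the
# combined pattern is close to option B, not option A.
FIRST = /^(?:#{NEVER}(?=\b|[A-Z_])|\w)/
puts FIRST.source

# Since NEVER cannot match, \w matches the first character and $& holds it.
p "Active_record".sub(FIRST) { $&.downcase }  # "active_record"

# Record what $1 and $2 hold for each match of the second regex.
SECOND = /(?:_|(\/))([a-z\d]*)/i
def captures_for(string)
  seen = []
  string.gsub(SECOND) { seen << [$1, $2]; "" }
  seen
end
p captures_for("a_b/c")  # [[nil, "b"], ["/", "c"]]

# Simplified camelize without the acronyms lookup.
def camelize_sketch(term)
  string = term.to_s.sub(/^[a-z\d]*/) { $&.capitalize }
  # The chained gsub('/', '::') runs once, on the string returned by the block form.
  string.gsub(SECOND) { "#{$1}#{$2.capitalize}" }.gsub('/', '::')
end
p camelize_sketch("active_record/errors")  # "ActiveRecord::Errors"
```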
