Regex problem with match method - ruby-on-rails

I want validate many mails ( one or more) with regex expression but this attribute don´t belongs to any model. So I wrote a method:
def emails_are_valid?(emails)
#regex with validation
regex = Regexp.new("^(\s*[a-zA-Z0-9\._%-]+#[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}\s*([,]{1}[\s]*[a-zA-Z0-9\._%-]+#[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}\s*)*)$")
#if the quantity of emails is zero o its validations is bad return false.
if emails.blank? || emails.match(regex).nil?
return false
else
return true
end
end
I evaluate this string mm#somedomain.com aa, mma#somedomain.com when
I tested this regex in http://www.rubular.com/ is bad (no matches).
So According to this page my regex is ok.
But when I evaluate emails.match(regex).nil? that returns me false (So the string is valid, but this string is bad)
Please I need help. my regex is bad or my emails_are_valid? method is bad or match method is bad.
Thanks in advance.

You should have used single quotes instead of double quotes when declaring your regex, otherwise \s gets parsed as an escape sequence.
Change that line to
regex = Regexp.new('^(\s*[a-zA-Z0-9\._%-]+#[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}\s*([,]{1}[\s]*[a-zA-Z0-9\._%-]+#[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}\s*)*)$')
and the method will work.
On a side note, this would be a more concise way of doing the same thing - http://codepad.org/YbsqIkcP

Related

Ruby on Rails: Checking for valid regex does not work properly, high false rate

In my application I've got a procedure which should check if an input is valid or not. You can set up a regex for this input.
But in my case it returns false instead of true. And I can't find the problem.
My code looks like this:
gaps.each_index do | i |
if gaps[i].first.starts_with? "~"
# regular expression
begin
regex = gaps[i].first[1..-1]
# a pipe is used to seperate the regex from the solution-string
if regex.include? "|"
puts "REGEX FOUND ------------------------------------------"
regex = regex.split("|")[0..-2].join("|")
end
reg = Regexp.new(regex, true)
unless reg.match(data[i])
puts "REGEX WRONGGGG -------------------"
#wrong_indexes << i
end
rescue
end
else
# normal string
if data[i].nil? || data[i].strip != gaps[i].first.strip
#wrong_indexes << i
end
end
An example would be:
[[~berlin|berlin]]
The left one before the pipe is the regex and the right one next to the pipe is the correct solution.
This easy input should return true, but it doesn't.
Does anyone see the problem?
Thank you all
EDIT
Somewhere in this lines must be the problem:
if regex.include? "|"
puts "REGEX FOUND ------------------------------------------"
regex = regex.split("|")[0..-2].join("|")
end
reg = Regexp.new(regex, true)
unless reg.match(data[i])
Update: Result without ~
The whole point is that you are initializing regex using the Regexp constructor
Constructs a new regular expression from pattern, which can be either a String or a Regexp (in which case that regexp’s options are propagated, and new options may not be specified (a change as of Ruby 1.8).
However, when you pass the regex (obtained with regex.split("|")[0..-2].join("|")) to the constructor, it is a string, and reg = Regexp.new(regex, true) is getting ~berlin (or /berlin/i) as a literal string pattern. Thus, it actually is searching for something you do not expect.
See, regex= "[[/berlin/i|berlin]]" only finds a *literal /berlin/i text (see demo).
Also, you need to get the pattern from the [[...]], so strip these brackets with regex = regex.gsub(/\A\[+|\]+\z/, '').split("|")[0..-2].join("|").
Note you do not need to specify the case insensitive options, since you already pass true as the second parameter to Regexp.new, it is already case-insensitive.
If you are performing whole word lookup, add word boundaries: regex= "[[\\bberlin\\b|berlin]]" (see demo).

string regex on ruby on rails

how to make sure my string format must be like this :
locker_number=3,email=ucup#gmail.com,mobile_phone=091332771331,firstname=ucup
i want my string format `"key=value,"
how to make regex for check my string on ruby?
This regex will find what you're after.
\w+=.*?(,|$)
If you want to capture each pairing use
(\w+)=(.*?)(?:,|$)
http://rubular.com/r/A2ernIzQkq
The \w+ is one or more occurrences of a character a-z, 1-9, or an underscore. The .*? is everything until the first , or the end of the string ($). The pipe is or and the ?: tells the regex no to capture that part of the expression.
Per your comment it would be used in Ruby as such,
(/\w+=.*?(,|$)/ =~ my_string) == 0
You can use a regex like this:
\w+=.*?(,|$)
Working demo
You can use this code:
"<your string>" =~ /\w+=.*?(,|$)/
What about something like this? It's picky about the last element not ending with ,. But it doesn't enforce the need for no commas in the key or no equals in the value.
'locker_number=3,email=ucup#gmail.com,mobile_phone=091332771331,firstname=ucup' =~ /^([^=]+=[^,]+,)*([^=]+=[^,]+)$/

ActiveSupport::Inflector::camelize - help in understanding regex

Short version:
I am having a rather hard time understanding two rather complex regular expressions in the ActiveSupport::Inflector::camelize method.
This is the definition of the camelize method:
def camelize(term, uppercase_first_letter = true)
string = term.to_s
if uppercase_first_letter
string = string.sub(/^[a-z\d]*/) { inflections.acronyms[$&] || $&.capitalize }
else
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
end
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
end
I have some difficulty understanding:
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
and:
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
Please explain to me what they mean. Thank you.
Long version
This shows me trying to understand the regex and how I interpret them to mean. It would be very helpful if you could go through this and correct my mistakes.
For the first regex
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
Based on what I am seeing, inflections.acronym_regex is from the Inflections class in the ActiveSupport::Inflector module, and in the initialize method of the Inflections class,
def initialize
#plurals, #singulars, #uncountables, #humans, #acronyms, #acronym_regex = [], [], [], [], {}, /(?=a)b/
end
acronym_regex is assigned /(?=a)b/. From what I understand from http://www.ruby-doc.org/core-2.0.0/Regexp.html#class-Regexp-label-Anchors ,
(?=pat) - Positive lookahead assertion: ensures that the following characters match pat, but doesn't include those characters in the matched text
So /(?=a)b/ ensures that character a is inside the text, but we dont include character a inside the matched text, and what immediately follows character a must be character b. In other words, "abc" would match this regex, but "bbc" would not match this regex, and the matched text for "abc" would be "b" (instead of "ab").
So combining the value of inflections.acronym_regex into this regex /^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/, I do not know which of the following two regex results:
A. /^(?:/(?=a)b/(?=\b|[A-Z_])|\w)/
B. /^(?:(?=a)b(?=\b|[A-Z_])|\w)/
although I am thinking it is B. From what I understand, (?: provides grouping without capturing, (?= means positive lookahead assertion, \b matches word boundaries when outside brackets and matches backspace when inside brackets. So in english terms, regex B, when matching against a text, will find a string that begins with an a character, followed by a b character, and one of (1. backspace [whatever that may mean] 2. any uppercase character or underscore 3. any english alphabetic character, digit, or underscore).
However, I find it strange that passing upper_case_first_letter = false to the camelize function should cause it to match a string starting with the characters ab, given that that does not seem to be how the camelize function behaves.
For the second regex
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
The regex is:
/(?:_|(\/))([a-z\d]*)/i
I am guessing that this regex will match a substring that starts with either an _ or /, followed by 0 or more (upper or lowercase english alpabetic characters or digit). Furthermore, for the first group (?:_|(\/)), whether we match the _ or /, the ([a-z\d]*) capturing group will always be regarded as the second group. I do understand the part where the block tries to look up inflections.acronyms[$2] and on failure, does $2.captitalize.
Since (?: means grouping without capturing, what is the value of $1 when we match _ ? Is it still _ ? And for the .gsub('/', '::') portion, I am guessing that it gets applied for each match in the initial gsub, instead of being applied to the overall string after the outer gsub call is done?
Apologies for the really long post. Please point out my errors in understanding the 2 regular expressions, or explain them in a better way if you can do it.
Thank you.
However, I find it strange that passing upper_case_first_letter =
false to the camelize function should cause it to match a string
starting with the characters ab, given that that does not seem to be
how the camelize function behaves.
?: acts like a . here and does match the string (ie. single character) but there is no grouping, therefore the match is in $&.
Since (?: means grouping without capturing, what is the value of $1
when we match _ ? Is it still _ ?
It's nil since there is no capturing. The value is in $2
And for the .gsub('/', '::') portion, I am guessing that it gets
applied for each match in the initial gsub, instead of being applied
to the overall string after the outer gsub call is done?
It's applied to the overall result as gsub with block returns a string and the gsub('/', '::') is outside of a block.

regex basic url expression

Hi I'm creating a regular expression (ruby) to test the beginning and end of string. I have both parts but can't join them.
Beginning of string
\A(http:\/\/+)
End of string
(.pdf)\z
How to join?
Bonus if it could validate in-between and accept anything (to avoid http://.pdf)
By the way, rubular http://rubular.com is a neat place to validate expressions
Use .+ to match any character except \n one or more times.
\A(http:\/\/+).+(\.pdf)\z
Should match http://www.stackoverflow.com/bestbook.pdf but not http://.pdf

string format check

Suppose I have string variables like following:
s1="10$"
s2="10$ I am a student"
s3="10$Good"
s4="10$ Nice weekend!"
As you see above, s2 and s4 have white space(s) after 10$ .
Generally, I would like to have a way to check if a string start with 10$ and have white-space(s) after 10$ . For example, The rule should find s2 and s4 in my above case. how to define such rule to check if a string start with '10$' and have white space(s) after?
What I mean is something like s2.RULE? should return true or false to tell if it is the matched string.
---------- update -------------------
please also tell the solution if 10# is used instead of 10$
You can do this using Regular Expressions (Ruby has Perl-style regular expressions, to be exact).
# For ease of demonstration, I've moved your strings into an array
strings = [
"10$",
"10$ I am a student",
"10$Good",
"10$ Nice weekend!"
]
p strings.find_all { |s| s =~ /\A10\$[ \t]+/ }
The regular expression breaks down like this:
The / at the beginning and the end tell Ruby that everything in between is part of the regular expression
\A matches the beginning of a string
The 10 is matched verbatim
\$ means to match a $ verbatim. We need to escape it since $ has a special meaning in regular expressions.
[ \t]+ means "match at least one blank and/or tab"
So this regular expressions says "Match every string that starts with 10$ followed by at least one blank or tab character". Using the =~ you can test strings in Ruby against this expression. =~ will return a non-nil value, which evaluates to true if used in a conditional like if.
Edit: Updated white space matching as per Asmageddon's suggestion.
this works:
"10$ " =~ /^10\$ +/
and returns either nil when false or 0 when true. Thanks to Ruby's rule, you can use it directly.
Use a regular expression like this one:
/10\$\s+/
EDIT
If you use =~ for matching, note that
The =~ operator returns the character position in the string of the
start of the match
So it might return 0 to denote a match. Only a return of nil means no match.
See for example http://www.regular-expressions.info/ruby.html on a regular expression tutorial for ruby.
If you want to proceed to cases with $ and # then try this regular expression:
/^10[\$#] +/

Resources