Two questions that I believe are connected and I think are regex related but have me stumped after some fruitless googling.
validates :image_url, format: { with: %r{\.(gif|jpg)\Z}i }
My guesses: similar to ruby/regex i = ignore case, single pipe means 'or'. Guessing \Z means end of string. The brackets are just containers unlike ruby/regex where they signify something wildly different.
But what does the %r do? I haven't run across that in ruby/regex.
ok_urls = %w{ fred.gif fred.jpg FRED.Jpg}
%r and %w seem to be doing the same thing so I'm confused why there are two separate commands to do the same thing. Sorry if this isn't very clear.
A Regexp holds a regular expression, used to match a pattern against strings. Regexps are created using the /.../ and %r{...} literals, and by the Regexp::new constructor.
%r and %w seem to be doing the same thing so I'm confused..
%w{ fred.gif fred.jpg FRED.Jpg}
# => ["fred.gif", "fred.jpg", "FRED.Jpg"]
%r{ a b }
# => / a b /
No, they are not the same, as you can see above: %w builds an array of strings, while %r builds a Regexp.
One thing I noticed with %r{} is that you don't need to escape slashes.
# /../ literals:
url.match /http:\/\/example\.com\//
# => #<MatchData "http://example.com/">
# %r{} literals:
url.match %r{http://example\.com/}
# => #<MatchData "http://example.com/">
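To tie this back to the validation in the question, here is a minimal sketch (not from the original posts) showing that the three ways of building a Regexp give the same pattern, and how that pattern treats the sample URLs. The constant names and the image_url? helper are made up for illustration.

```ruby
# Three spellings of the same pattern; Regexp#== compares source and options.
SLASH_LITERAL   = /\.(gif|jpg)\Z/i
PERCENT_LITERAL = %r{\.(gif|jpg)\Z}i
CONSTRUCTED     = Regexp.new('\.(gif|jpg)\Z', Regexp::IGNORECASE)

# %w builds an array of strings; it is unrelated to regexes.
OK_URLS  = %w{ fred.gif fred.jpg FRED.Jpg }
BAD_URLS = %w{ fred.doc fred.gif/more fred.gif.more }

# Hypothetical helper: true when the URL ends in .gif or .jpg (any case).
def image_url?(url)
  PERCENT_LITERAL.match?(url)
end

puts SLASH_LITERAL == PERCENT_LITERAL       # true
puts CONSTRUCTED == PERCENT_LITERAL         # true
puts OK_URLS.all?   { |u| image_url?(u) }   # true
puts BAD_URLS.none? { |u| image_url?(u) }   # true

# \Z also matches just before a trailing newline; \z does not.
puts "fred.gif\n".match?(/\.(gif|jpg)\Z/i)  # true
puts "fred.gif\n".match?(/\.(gif|jpg)\z/i)  # false
```

So the guess in the question is close: \Z anchors at the end of the string but tolerates one trailing newline, which is why \z is usually the stricter choice for validation.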
Use %r only for regular expressions matching more than one '/' character.
# bad
%r(\s+)
# still bad
%r(^/(.*)$)
# should be /^\/(.*)$/
# good
%r(^/blog/2011/(.*)$)
What I am doing is iterating over various versions of a snippet of code (e.g. Associations.rb in Rails).
What I want to do is just extract one snippet of the code, for example the has_many method:
def has_many(name, scope = nil, options = {}, &extension)
reflection = Builder::HasMany.build(self, name, scope, options, &extension)
Reflection.add_reflection self, name, reflection
end
At first I was thinking of just searching the entire file for the string def has_many and then saving everything between that string and end. The obvious issue is that different versions of this file can have multiple end strings within the method.
For instance, whatever I come up with for the above snippet, should also work for this one too:
def has_many(association_id, options = {})
validate_options([ :foreign_key, :class_name, :exclusively_dependent, :dependent, :conditions, :order, :finder_sql ], options.keys)
association_name, association_class_name, association_class_primary_key_name =
associate_identification(association_id, options[:class_name], options[:foreign_key])
require_association_class(association_class_name)
if options[:dependent] and options[:exclusively_dependent]
raise ArgumentError, ':dependent and :exclusively_dependent are mutually exclusive options. You may specify one or the other.' # ' ruby-mode
elsif options[:dependent]
module_eval "before_destroy '#{association_name}.each { |o| o.destroy }'"
elsif options[:exclusively_dependent]
module_eval "before_destroy { |record| #{association_class_name}.delete_all(%(#{association_class_primary_key_name} = '\#{record.id}')) }"
end
define_method(association_name) do |*params|
force_reload = params.first unless params.empty?
association = instance_variable_get("@#{association_name}")
if association.nil?
association = HasManyAssociation.new(self,
association_name, association_class_name,
association_class_primary_key_name, options)
instance_variable_set("@#{association_name}", association)
end
association.reload if force_reload
association
end
# deprecated api
deprecated_collection_count_method(association_name)
deprecated_add_association_relation(association_name)
deprecated_remove_association_relation(association_name)
deprecated_has_collection_method(association_name)
deprecated_find_in_collection_method(association_name)
deprecated_find_all_in_collection_method(association_name)
deprecated_create_method(association_name)
deprecated_build_method(association_name)
end
Assuming that each value is stored as text in some column in my db.
How do I approach this, using Ruby's string methods or should I be approaching this another way?
Edit 1
Please note that this question relates specifically to string manipulation via using a Regex, without a parser.
As discussed, this should be done with a parser like Ripper.
However, to answer if it can be done with string methods, I will match the syntax with a regex, provided:
You can rely on indentation i.e. the string has the exact same characters before "def" and before "end".
There are no multiline strings in between that could simulate an "end" with the same indentation. That includes multiline strings, HEREDOC, %{ }, etc.
Code
regex = /^
(\s*) # matches the indentation (we'll backreference later)
def\ +has_many\b # literal "def has_many" with a word boundary
(?:.*+\n)*? # match whole lines - as few as possible
\1 # matches the same indentation as the def line
end\b # literal "end"
/x
subject = %q|
    def has_many(name, scope = nil, options = {}, &extension)
      if association.nil?
        instance_variable_set("@#{association_name}", association)
      end
    end|
#Print matched text
puts subject.to_enum(:scan,regex).map {$&}
ideone demo
The regex relies on:
Capturing the whitespace (indentation) with the group (\s*),
followed by the literal def has_many.
It then consumes as few lines as it can with (?:.*+\n)*?.
Notice that .*+\n matches a whole line
and (?:..)*? repeats it 0 or more times. Also, the last ? makes the repetition lazy (as few as possible).
It will consume lines until it matches the following condition...
\1 is a backreference to the text matched by group 1, i.e. the exact same indentation as the first line.
Followed by end obviously.
Test in Rubular
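For reuse across several method names, the same idea can be wrapped in a small helper. This is only a sketch under the two assumptions stated above (consistent indentation, no multiline strings that fake an "end"). The helper name extract_method and the sample source are hypothetical, and the indentation group uses [^\S\n]* instead of \s* so that the capture cannot swallow a preceding blank line.

```ruby
# Hypothetical helper, not from the original answer: returns the source of the
# first `def <method_name>` block, or nil when nothing matches.
def extract_method(source, method_name)
  pattern = /
    ^([^\S\n]*)                            # indentation of the def line (no newlines)
    def\ +#{Regexp.escape(method_name)}\b  # "def", spaces, then the method name
    (?:.*+\n)*?                            # whole lines, as few as possible
    \1end\b                                # "end" at the same indentation
  /x
  source[pattern]
end

# Made-up sample with a nested "end" inside the method body.
SAMPLE_SOURCE = <<~'RUBY'
  module Demo
    def has_many(name)
      if name.nil?
        raise ArgumentError
      end
      name
    end

    def other
    end
  end
RUBY

puts extract_method(SAMPLE_SOURCE, "has_many")
```

The inner "end" that closes the if is skipped because it is indented more deeply than the def line. Method names ending in ? or ! would need the trailing \b adjusted, so treat this as a starting point only.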
I'm trying to display an array of words from a user's post. However the method I'm using treats an apostrophe like whitespace.
<%= var = Post.pluck(:body) %>
<%= var.join.downcase.split(/\W+/) %>
So if the input text was: The baby's foot
it would output the baby s foot,
but it should be the baby's foot.
How do I accomplish that?
Accepted answer is too naïve:
▶ "It’s naïve approach".split(/[^'\w]+/)
#⇒ [
# [0] "It",
# [1] "s",
# [2] "nai",
# [3] "ve",
# [4] "approach"
# ]
This is because it is almost 2016, and many users will want to use their real names, such as José Østergaard. The apostrophe is not the only punctuation mark to worry about either, as you might notice.
▶ "It’s naïve approach".split(/[^'’\p{L}\p{M}]+/)
#⇒ [
# [0] "It’s",
# [1] "naïve",
# [2] "approach"
# ]
Further reading: Character Properties.
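Instead of splitting on everything that is not a word character, the same character set can be used with scan to collect the words directly. The sketch below is one way to do that. The constant and helper names are made up, and the set of allowed characters (letters, combining marks, straight and curly apostrophes) is an assumption you may want to extend.

```ruby
# Letters, combining marks, and both apostrophe styles count as word characters.
# \u2019 is the curly apostrophe.
WORD_PATTERN = /[\p{L}\p{M}'\u2019]+/

# Hypothetical helper: returns the words of a text, keeping apostrophes.
def extract_words(text)
  text.scan(WORD_PATTERN)
end

p extract_words("The baby's foot")
# => ["The", "baby's", "foot"]

p extract_words("It\u2019s na\u00EFve, Jos\u00E9!")
# => ["It’s", "naïve", "José"]
```

Using scan avoids the empty leading element that split can produce when the text starts with punctuation.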
Along the lines of mudasobwa's answer, here's what \w and \W bring to the party:
chars = [*' ' .. "\x7e"].join
# => " !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
That's the usual visible lower-ASCII characters we'd see in code. See the Regexp documentation for more information.
Grabbing the characters that match \w returns:
chars.scan(/\w+/)
# => ["0123456789",
# "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
# "_",
# "abcdefghijklmnopqrstuvwxyz"]
Conversely, grabbing the characters that don't match \w, or that match \W:
chars.scan(/\W+/)
# => [" !\"\#$%&'()*+,-./", ":;<=>?@", "[\\]^", "`", "{|}~"]
\w is defined as [a-zA-Z0-9_], which is not what we would normally call "word" characters. Instead, they're the characters we typically use in variable names.
If you're dealing with only lower-ASCII characters, use the character-class
[a-zA-Z]
For instance:
chars = [*' ' .. "\x7e"].join
lower_ascii_chars = '[a-zA-Z]'
not_lower_ascii_chars = '[^a-zA-Z]'
chars.scan(/#{lower_ascii_chars}+/)
# => ["ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"]
chars.scan(/#{not_lower_ascii_chars}+/)
# => [" !\"\#$%&'()*+,-./0123456789:;<=>?@", "[\\]^_`", "{|}~"]
Instead of defining your own, you can take advantage of the POSIX definitions and character properties:
chars.scan(/[[:alpha:]]+/)
# => ["ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"]
chars.scan(/\p{Alpha}+/)
# => ["ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"]
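The difference between \w and the POSIX/property classes only shows up once the input leaves lower ASCII. A small sketch, with made-up helper names and the name from the other answer as sample input:

```ruby
# \w only covers [a-zA-Z0-9_], even on a UTF-8 string.
def ascii_word_runs(text)
  text.scan(/\w+/)
end

# [[:alpha:]] is Unicode-aware in Ruby, so accented letters are included.
def letter_runs(text)
  text.scan(/[[:alpha:]]+/)
end

name = "Jos\u00E9 \u00D8stergaard" # "José Østergaard"

p ascii_word_runs(name)
# => ["Jos", "stergaard"]

p letter_runs(name)
# => ["José", "Østergaard"]
```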
Regular expressions always seem like a wonderful new wand to wave when extracting information from a string, but, like the Sorcerer's Apprentice found out, they can create havoc when misused or not understood.
Knowing this should help you write a bit more intelligent patterns. Apply that to what the documentation shows and you should be able to easily figure out a pattern that does what you want.
You can use the regex below instead of /\W+/:
var.join.downcase.split(/[^'\w]+/)
/\W/ matches any non-word character, and the apostrophe is one such character.
To keep the code as close as possible to the original intent, we can use /[^'\w]/, which matches any character that is neither an apostrophe nor a word character.
Running that string through irb with the same split call that you wrote in your comment gets this:
irb(main):008:0> "The baby's foot".split(/\W+/)
=> ["The", "baby", "s", "foot"]
However, if you use split without an explicit delimiter, you get the split you're looking for:
irb(main):009:0> "The baby's foot".split
=> ["The", "baby's", "foot"]
Does that get you what you're looking for?
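One caveat worth checking before relying on a bare split: it only splits on whitespace, so other punctuation stays attached to the words. The sketch below contrasts it with a scan that keeps apostrophes; both helper names are hypothetical.

```ruby
# Splits on runs of whitespace only.
def whitespace_words(text)
  text.split
end

# Collects runs of ASCII word characters and apostrophes.
def apostrophe_words(text)
  text.scan(/[\w']+/)
end

p whitespace_words("The baby's foot, really.")
# => ["The", "baby's", "foot,", "really."]

p apostrophe_words("The baby's foot, really.")
# => ["The", "baby's", "foot", "really"]
```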
Rails' titleize method removes hyphens and underscores, and the capitalize method does not capitalize a word that comes after a hyphen or underscore. I want to do something like the following:
sam-joe denis-moore → Sam-Joe Denis-Moore
sam-louise o'donnell → Sam-Louise O'Donnell
arthur_campbell john-foo → Arthur_Campbell John-Foo
What pattern do I need to pass to gsub below to get this?
"sam-joe denis-moore".humanize.gsub(??) { $1.capitalize }
# => "Sam-Joe Denis-Moore"
Any help is really appreciated
While lurker's answer works, it's far more complicated than it needs to be. As you surmised, you can do this with gsub alone:
INITIAL_LETTER_EXPR = /(?:\b|_)[a-z]/
arr = [ "sam-joe denis-moore",
"sam-louise o'donnell",
"arthur_campbell john-foo" ]
arr.each do |str|
puts str.gsub(INITIAL_LETTER_EXPR) { $&.upcase }
end
# => Sam-Joe Denis-Moore
# Sam-Louise O'Donnell
# Arthur_Campbell John-Foo
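Wrapped in a method, the same pattern is easy to try against edge cases. This is a sketch with made-up names (NAME_PART_START, capitalize_name_parts). Note the last call: \b also sits right after an apostrophe, so a possessive "s" gets capitalized too.

```ruby
# Same pattern as above: a lowercase letter at a word boundary or after "_".
NAME_PART_START = /(?:\b|_)[a-z]/

# Hypothetical helper: upcases the first letter of every name part.
def capitalize_name_parts(name)
  name.gsub(NAME_PART_START) { |match| match.upcase }
end

p capitalize_name_parts("sam-louise o'donnell")
# => "Sam-Louise O'Donnell"

p capitalize_name_parts("arthur_campbell john-foo")
# => "Arthur_Campbell John-Foo"

p capitalize_name_parts("the baby's foot")
# => "The Baby'S Foot"
```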
Try this:
my_string.split(/([ _-])/).map(&:capitalize).join
You can put whatever delimiters you like in the regex. I used a space, _, and -. So, for example:
'sam-joe denis-moore'.split(/([ _-])/).map(&:capitalize).join
Results in:
'Sam-Joe Denis-Moore'
What it does is:
.split(/([ _-])/) splits the string into an array of substrings at the given delimiters, and keeps the delimiters as substrings
.map(&:capitalize) maps the resulting array of strings to a new array of strings, capitalizing each string in the array (delimiters, when capitalized, are unaffected)
.join joins the resulting array of substrings back together for the final result
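The intermediate array makes the three steps easier to see. The sketch below (helper name made up) also shows two side effects to be aware of: the apostrophe is not in the delimiter list, and String#capitalize lowercases the rest of each piece.

```ruby
# Hypothetical wrapper around the split/map/join chain described above.
def capitalize_around_delimiters(text)
  text.split(/([ _-])/).map(&:capitalize).join
end

# The capture group keeps the delimiters in the result.
p 'sam-joe denis-moore'.split(/([ _-])/)
# => ["sam", "-", "joe", " ", "denis", "-", "moore"]

p capitalize_around_delimiters("sam-louise o'donnell")
# => "Sam-Louise O'donnell"

p capitalize_around_delimiters("ronald McDonald")
# => "Ronald Mcdonald"
```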
You could, if you want, monkey patch the String class with your own titleize:
class String
def my_titleize
self.split(/([ _-])/).map(&:capitalize).join
end
end
Then you can do:
'sam-joe denis-moore'.my_titleize
=> 'Sam-Joe Denis-Moore'
I am trying to write and test a regex validation which allows only a sequence of paired integers, in the format
n,n n,n
where n is any integer not beginning with zero and pairs are space separated. There may be a single pair or the field may also be empty.
So with this data, it should give 2 errors
12,2 11,2 aa 111,11,11
error 1: the 'aa'
error 2: the triplet (111,11,11)
In my Rails model I have this
validates_format_of :sequence_excluded_region, :sequence_included_region,
with: /[0-9]*,[0-9] /, allow_blank: true
In my Rspec model test I have this
it 'is invalid with alphanumeric SEQUENCE_INCLUDED_REGION' do
expect(DesignSetting.create!(sequence_included_region: '12,2 11,2 aa 111,11,11')).to have(2).errors_on(:sequence_included_region)
end
The test fails, as the regex does not find the errors, or perhaps I am calling the test incorrectly.
Failures:
1) DesignSetting is invalid with alphanumeric SEQUENCE_INCLUDED_REGION
Failure/Error: expect(DesignSetting.create!(sequence_included_region: '12,2 11,2 aa 111,11,11')).to have(2).errors_on(:sequence_included_region)
expected 2 errors on :sequence_included_region, got 0
# ./spec/models/design_setting_spec.rb:5:in `block (2 levels) in <top (required)>'
Regex
Your regex matches a single pair followed by a space anywhere in the string.
'12,2 11,2 aa 111,11,11 13,3'.scan /[0-9]*,[0-9] /
=> ["12,2 ", "11,2 "]
So any string containing one valid pair followed by a space will be accepted. Also, a single pair such as 3,4 would be rejected, as there is no trailing space.
A regex that would validate the entire string:
positive_int = /[1-9][0-9]*/
pair = /#{positive_int},#{positive_int}/
re_validate = /
\A # Start of string
#{pair} # Must have one number pair.
(?:\s#{pair})* # Can be followed by any number of pairs with a space delimiter
\z # End of string (no newline)
/x
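A quick way to check an anchored pattern like this outside Rails is to run it against a few strings. The sketch below rebuilds the same pattern on one line; the constant names and the valid_pair_list? helper are made up for illustration.

```ruby
POSITIVE_INT = /[1-9][0-9]*/
PAIR         = /#{POSITIVE_INT},#{POSITIVE_INT}/
PAIR_LIST    = /\A#{PAIR}(?:\s#{PAIR})*\z/

# Hypothetical helper: true when the whole string is a list of "n,n" pairs.
def valid_pair_list?(text)
  PAIR_LIST.match?(text)
end

p valid_pair_list?("12,2 11,2")               # => true
p valid_pair_list?("3,4")                     # => true
p valid_pair_list?("12,2 11,2 aa 111,11,11")  # => false
p valid_pair_list?("012,2")                   # => false
p valid_pair_list?("12,2\n")                  # => false
```

The empty string is rejected by the pattern itself, so the allow_blank: true option is still needed for the "field may be empty" case.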
Validators
I don't use rails much but it seems like you are expecting too much from a simple regex validator for it to parse out the individual error components from a string for you.
If you split the variable up by space and then validated each element of the array you could get that detail for each field.
'12,2 11,2 aa 111,11,11 13,3'.split(' ').reject{|f| f =~ /^[1-9][0-9]*,[1-9][0-9]*$/ }
You can put something like that into a custom validator class using validates_with, which gives you direct control over the error messages:
class RegionValidator < ActiveModel::Validator
  def validate(record)
    # Collect every space-separated field that is not a valid "n,n" pair.
    bad_fields = record.sequence_included_region.to_s.split(' ').reject { |f| f =~ /^[1-9][0-9]*,[1-9][0-9]*$/ }
    bad_fields.each do |err|
      # errors.add takes the attribute name as a symbol.
      record.errors.add(:sequence_included_region, "bad region field [#{err}]")
    end
  end
end
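The field-by-field check can be tried in plain Ruby before wiring it into a validator. A minimal sketch, assuming space-separated fields; REGION_PAIR and invalid_region_fields are hypothetical names.

```ruby
REGION_PAIR = /\A[1-9][0-9]*,[1-9][0-9]*\z/

# Hypothetical helper: returns the fields that are not valid "n,n" pairs.
def invalid_region_fields(text)
  text.to_s.split(' ').reject { |field| field.match?(REGION_PAIR) }
end

p invalid_region_fields('12,2 11,2 aa 111,11,11 13,3')
# => ["aa", "111,11,11"]

p invalid_region_fields('12,2 11,2')
# => []

p invalid_region_fields(nil)
# => []
```

Each returned field maps to one error message, which gives the two errors the question expects for 'aa' and the triplet.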
(?<=\s|^)\d+,\d+(?=\s|$)
Try this. Replace every match with an empty string; whatever is left, split on whitespace, is your list of errors.
See demo.
http://regex101.com/r/rQ6mK9/22
Hey... how would you validate a full_name field (name surname).
Consider names like:
Ms. Jan Levinson-Gould
Dr. Martin Luther King, Jr.
Brett d'Arras-d'Haudracey
Brüno
Instead of validating the characters that are there, you might just want to ensure some set of characters is not present.
For example:
class User < ActiveRecord::Base
validates_format_of :full_name, :with => /\A[^0-9`!@#\$%\^&*+_=]+\z/
# add any other characters you'd like to disallow inside the [ brackets ]
# metacharacters [, \, ^, $, ., |, ?, *, +, (, and ) need to be escaped with a \
end
Tests
Ms. Jan Levinson-Gould # pass
Dr. Martin Luther King, Jr. # pass
Brett d'Arras-d'Haudracey # pass
Brüno # pass
John Doe # pass
Mary-Jo Jane Sally Smith # pass
Fatty Mc.Error$ # fail
FA!L # fail
@arold Newm@n # fail
N4m3 w1th Numb3r5 # fail
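The pass/fail table above can be reproduced without a model by matching the same pattern directly. In this sketch NAME_FORMAT and plausible_name? are made-up names, and the character list is the one from the validation above.

```ruby
# Rejects any string containing a digit or one of the listed symbols.
NAME_FORMAT = /\A[^0-9`!@#\$%\^&*+_=]+\z/

# Hypothetical helper mirroring the validates_format_of check.
def plausible_name?(name)
  NAME_FORMAT.match?(name)
end

p plausible_name?("Dr. Martin Luther King, Jr.")  # => true
p plausible_name?("Brett d'Arras-d'Haudracey")    # => true
p plausible_name?("Br\u00FCno")                   # => true
p plausible_name?("Fatty Mc.Error$")              # => false
p plausible_name?("N4m3 w1th Numb3r5")            # => false
```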
Regular expression explanation
NODE EXPLANATION
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
[^`!@#\$%\^&*+_=\d]+ any character except: '`', '!', '@', '#',
'\$', '%', '\^', '&', '*', '+', '_', '=',
digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\z the end of the string
Any validation you perform here is likely to break down unless it is extremely general. For instance, enforcing a minimum length of 3 is probably about as reasonable as you can get without getting into the specifics of what is entered.
When you have names like "O'Malley" with an apostrophe, "Smith-Johnson" with a dash, "Andrés" with accented characters or extremely short names such as "Vo Ly" with virtually no characters at all, how do you validate without excluding legitimate cases? It's not easy.
At least one space and at least 4 characters (including the space):
\A(?=.* )[^0-9`!@#\\\$%\^&*\;+_=]{4,}\z
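Trying this stricter pattern on a few of the names from this thread shows the trade-off. A sketch, with FULL_NAME_FORMAT and full_name? as hypothetical names:

```ruby
# Requires a space somewhere (lookahead) and at least 4 allowed characters.
FULL_NAME_FORMAT = /\A(?=.* )[^0-9`!@#\\\$%\^&*\;+_=]{4,}\z/

# Hypothetical helper wrapping the pattern.
def full_name?(name)
  FULL_NAME_FORMAT.match?(name)
end

p full_name?("Vo Ly")                   # => true
p full_name?("Ms. Jan Levinson-Gould")  # => true
p full_name?("Br\u00FCno")              # => false (no space)
p full_name?("A B")                     # => false (only 3 characters)
```

Single-word names such as "Brüno" are rejected, so this only fits if a name plus surname is truly mandatory.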