I have a search string that a user inputs text into.
If it contains any part of a postal code like: 1N1 or 1N11N1 or 1N1 1N1 then I want to pull that out of the text.
example:
John Doe 1n11n1
or
1n1 John Doe
or
John 1n11n1 Doe
I want to capture this:
postal_code: 1n11n1
other: John Doe
Can this be done using regex?
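Try matching the regular expression /((?:\d[A-Za-z]\d)+)/ and returning $1. Note that this pattern handles 1N1 and 1N11N1, but for the spaced form 1N1 1N1 it captures only the first half:
def get_postal_code(s)
  r = /((?:\d[A-Za-z]\d)+)/
  return (s =~ r) ? [$1, s.sub(r,'')] : nil
end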
# Example usage...
get_postal_code('John Doe 1n11n1') # => ['1n11n1', 'John Doe ']
get_postal_code('1n1 John Doe') # => ['1n1', ' John Doe']
get_postal_code('John Doe 1n1') # => ['1n1', 'John Doe ']
You could also clean up the "other" string as follows.
...
return (s =~ r) ? [$1, s.sub(r,'').gsub(/\s+/,' ').strip] : nil
end
get_postal_code('John Doe 1n11n1') # => ['1n11n1', 'John Doe']
get_postal_code('1n1 John Doe') # => ['1n1', 'John Doe']
get_postal_code('John Doe 1n1') # => ['1n1', 'John Doe']
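If the spaced form (1N1 1N1) also needs to be captured as one code, one option is to allow an optional whitespace character between the two halves. This is only a sketch: the names POSTAL_CODE and split_postal_code are made up for illustration, and it assumes the code is delimited by word boundaries and that the inner space should be dropped from the captured code.

```ruby
# Hypothetical pattern: one 1N1 group, optionally followed by a second group,
# with at most one whitespace character between them.
POSTAL_CODE = /\b\d[A-Za-z]\d(?:\s?\d[A-Za-z]\d)?\b/

# Hypothetical helper: returns the postal code and the remaining text,
# or nil when no postal code is present.
def split_postal_code(text)
  match = POSTAL_CODE.match(text)
  return nil unless match

  postal_code = match[0].delete(" ")
  # Join what came before and after the code, then collapse extra spaces.
  other = (match.pre_match + " " + match.post_match).squeeze(" ").strip
  { postal_code: postal_code, other: other }
end

p split_postal_code("John Doe 1n11n1")
p split_postal_code("1n1 John Doe")
p split_postal_code("John 1n1 1n1 Doe")
```

Returning a hash keeps the two parts labelled, which is close to the postal_code / other layout asked for in the question.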
I'm not sure what the format of postal codes is where you are, but I'd definitely start with regexlib:
http://regexlib.com/Search.aspx?k=postal%20code
You'll find many regular expressions that you can use to match the postal code in your string.
To get the rest of the string, you can simply do a regex remove on the postal code and get the resulting string. There is probably a more efficient way to do this, but I'm going for simplicity :)
Hope this helps!
Yes, this can be done using a regex. Depending on the type of data in the rows you may be at risk for false positives, because anything that matches the pattern will be seen as a postal code (in your example though that does not seem likely).
Assuming that in your patterns N is an alpha character and 1 a numeric character you'd do something like the below:
strings = ["John Doe 1n11n1", "1n1 John Doe", "John 1n1 1n1 Doe"]
regex = /([0-9]{1}[A-Za-z]{1}[0-9]{2}[A-Za-z]{1}[0-9]{1}|[0-9]{1}[A-Za-z]{1}[0-9]{1}\s[0-9]{1}[A-Za-z]{1}[0-9]{1}|[0-9]{1}[A-Za-z]{1}[0-9]{1})/
strings.each do |s|
if regex.match(s)
puts "postal_code: #{regex.match(s)[1]}"
puts "rest: #{s.gsub(regex, "")}"
puts
end
end
This outputs:
postal_code: 1n11n1
rest: John Doe
postal_code: 1n1
rest: John Doe
postal_code: 1n1 1n1
rest: John Doe
If you want to get rid of excess spaces you can use String#squeeze(" ") to make it so :)
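A quick sketch of that clean-up step, for what it's worth. POSTAL is a shortened stand-in for the regex above, rest_of is a made-up name, and the strip call is an extra assumption to drop leading and trailing spaces, which squeeze alone leaves in place.

```ruby
# Shortened stand-in for the postal code pattern (assumption, not the regex above).
POSTAL = /\d[A-Za-z]\d(?:\s?\d[A-Za-z]\d)?/

# Remove the postal code, collapse runs of spaces, trim both ends.
def rest_of(s)
  s.gsub(POSTAL, "").squeeze(" ").strip
end

p rest_of("John 1n1 1n1 Doe")
p rest_of("1n1 John Doe")
```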
I got an object of CSV, the content is like:
full_name | phone_number |
Mike Smith | (123)-456-7890|
Tony Davis | (213)-564-7890|
And I would like to split the column of 'full_name' into two columns, which represents 'first_name' and 'last_name':
first_name | last_name | phone_number |
Mike | Smith | (123)-456-7890|
Tony | Davis | (213)-564-7890|
I didn't find a command to do it, how can I split it? Thank you very much!
The code for this part is:
def to_csv(options = {})
  CSV.generate(options) do |csv|
    columns = %w[ full_name phone_number ]
    csv << columns
    each do |books|
      csv << columns.map { |column| books.public_send(column) }
                    .map { |column| ActionController::Base.helpers.strip_tags(column.to_s) }
    end
  end
end
And this method is called by:
send_data(books.to_csv(col_sep: ','),filename: 'booksAuthorsInfo.csv',type: 'application/csv')
The class type of 'books' is ActiveRecord_AssociationRelation, and I don't know how to split that full_name column into two columns, it looks like the String.split is not helpful.
So far the question doesn't make enough sense so I expect it'll be closed eventually. Here's some information about how to split the fields so that, maybe, we can get to the real root of the problem.
Meditate on this:
ary = <<EOT
full_name | phone_number |
Mike Smith | (123)-456-7890|
Tony Davis | (213)-564-7890|
EOT
PIPE_DELIMITER = /\s*\|\s*/
munged_data = ary.lines[1..-1].map { |l|
full_name, phone_number = l.split(PIPE_DELIMITER) # => ["Mike Smith", "(123)-456-7890"], ["Tony Davis", "(213)-564-7890"]
first_name, last_name = full_name.split # => ["Mike", "Smith"], ["Tony", "Davis"]
[first_name, last_name, phone_number]
}
# => [["Mike", "Smith", "(123)-456-7890"], ["Tony", "Davis", "(213)-564-7890"]]
At this point it's possible to generate some usable CSV data:
require 'csv'
CSV.open("/dev/stdout", "wb") do |csv|
csv << %w[first_name last_name phone_number]
munged_data.each do |row|
csv << row
end
end
Which results in:
# >> first_name,last_name,phone_number
# >> Mike,Smith,(123)-456-7890
# >> Tony,Davis,(213)-564-7890
Note the use of a regular expression as the argument to split. It tells split to treat each | together with any whitespace around it as the delimiter, so the resulting fields come back already trimmed:
PIPE_DELIMITER = /\s*\|\s*/
full_name, phone_number = l.split('|') # => ["Mike Smith ", " (123)-456-7890", "\n"], ["Tony Davis ", " (213)-564-7890", "\n"]
full_name, phone_number = l.split(PIPE_DELIMITER) # => ["Mike Smith", "(123)-456-7890"], ["Tony Davis", "(213)-564-7890"]
At that point it is easy to split the names:
first_name, last_name = full_name.split # => ["Mike", "Smith"], ["Tony", "Davis"]
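Applied to the original to_csv method, the same split could be done per record while the rows are written. This is a sketch under assumptions: Book is a Struct standing in for the ActiveRecord model, books_to_csv is a made-up name, and everything after the first space is treated as the last name.

```ruby
require "csv"

# Stand-in for the ActiveRecord model (assumption for the sake of a runnable example).
Book = Struct.new(:full_name, :phone_number)

# Hypothetical variant of to_csv that writes first_name and last_name columns.
def books_to_csv(books, **options)
  CSV.generate(**options) do |csv|
    csv << %w[first_name last_name phone_number]
    books.each do |book|
      # Split on the first run of whitespace only, so "Mary Ann Smith"
      # becomes ["Mary", "Ann Smith"].
      first_name, last_name = book.full_name.to_s.split(" ", 2)
      csv << [first_name, last_name, book.phone_number]
    end
  end
end

books = [
  Book.new("Mike Smith", "(123)-456-7890"),
  Book.new("Tony Davis", "(213)-564-7890")
]
puts books_to_csv(books, col_sep: ",")
```

With real records the loop body stays the same, since it only calls full_name and phone_number on each object.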
I have the following factory which I'd like to use in conjunction with a FactoryGirl.create_list to produce a small dataset with some specific values:
FactoryGirl.define do
  factory :name do
    forename "Ziggy"
    surname "Stardust"
    factory :sequence_of_names do
      sequence(:forename) do |n|
        forenames = %w(Robert Tommy Tomi Rob Mohammad Amélie Zoo John Robert Brown)
        "#{forenames[n-1]}"
      end
      sequence(:surname) do |n|
        surnames = %w(Thingy Robert smyth Brown Adbul Zoo Cafe Robert Thingy)
        "#{surnames[n-1]}"
      end
    end
  end
end
The forename 'Amélie' has caused an issue:
syntax error, unexpected $end, expecting keyword_end ...rt Tommy Tomi
Rob Mohammad Amélie Zoo John Robert Brown)
In an rspec file I can simply add the following for the 'é' character to be supported:
# encoding: UTF-8
But this doesn't seem to work for a FactoryGirl file; any ideas?
Trying to add a very rudimentary description template to one of my Rails models. What I want to do is take a template string like this:
template = "{{ name }} is the best {{ occupation }} in {{ city }}."
and a hash like this:
vals = {:name => "Joe Smith", :occupation => "birthday clown", :city => "Las Vegas"}
and get a description generated. I thought I could do this with a simple gsub but Ruby 1.8.7 doesn't accept hashes as the second argument. When I do a gsub as a block like this:
> template.gsub(/\{\{\s*(\w+)\s*\}\}/) {|m| vals[m]}
=> " is the best in ."
You can see that every placeholder is replaced with an empty string: the block argument m is the entire matched string (curly braces included), not the captured group, so vals[m] is nil.
How do I get it to replace "{{ something }}" with vals["something"] (or vals["something".to_sym])?
TIA
Using Ruby 1.9.2
The string formatting operator % accepts a hash as its argument and fills in the named references:
>> template = "%{name} is the best %{occupation} in %{city}."
>> vals = {:name => "Joe Smith", :occupation => "birthday clown", :city => "Las Vegas"}
>> template % vals
=> "Joe Smith is the best birthday clown in Las Vegas."
Using Ruby 1.8.7
The string formatting operator in Ruby 1.8.7 doesn't support hashes. Instead, you can use the same arguments as the Ruby 1.9.2 solution and patch the String object so when you upgrade Ruby you won't have to edit your strings.
if RUBY_VERSION < '1.9.2'
  class String
    old_format = instance_method(:%)
    define_method(:%) do |arg|
      if arg.is_a?(Hash)
        self.gsub(/%\{(.*?)\}/) { arg[$1.to_sym] }
      else
        old_format.bind(self).call(arg)
      end
    end
  end
end
>> "%05d" % 123
=> "00123"
>> "%-5s: %08x" % [ "ID", 123 ]
=> "ID : 0000007b"
>> template = "%{name} is the best %{occupation} in %{city}."
>> vals = {:name => "Joe Smith", :occupation => "birthday clown", :city => "Las Vegas"}
>> template % vals
=> "Joe Smith is the best birthday clown in Las Vegas."
codepad example showing the default and extended behavior
The easiest thing is probably to use $1.to_sym in your block:
>> template.gsub(/\{\{\s*(\w+)\s*\}\}/) { vals[$1.to_sym] }
=> "Joe Smith is the best birthday clown in Las Vegas."
From the fine manual:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
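Wrapped up as a small helper, the block form might look like the sketch below. render_template is a made-up name, and leaving unknown placeholders untouched (rather than replacing them with an empty string) is a design choice assumed here, not something the question asked for.

```ruby
# Hypothetical helper: fills "{{ key }}" placeholders from a symbol-keyed hash.
def render_template(template, vals)
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) do |placeholder|
    # Regexp.last_match(1) is the same capture that $1 refers to.
    key = Regexp.last_match(1).to_sym
    # Fall back to the original placeholder text when the key is missing.
    vals.fetch(key, placeholder).to_s
  end
end

template = "{{ name }} is the best {{ occupation }} in {{ city }}."
vals = { :name => "Joe Smith", :occupation => "birthday clown", :city => "Las Vegas" }
puts render_template(template, vals)
```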
Why does the titlecase mess up the name? I have:
John Mark McMillan
and it turns it into:
>> "john mark McMillan".titlecase
=> "John Mark Mc Millan"
Why is there a space added to the last name?
Basically I have this in my model:
before_save :capitalize_name
def capitalize_name
self.artist = self.artist.titlecase
end
I am trying to make sure that all the names are titlecase in the DB, but in situations with a camelcase name it fails. Any ideas how to fix this?
You can always do it yourself if Rails isn't good enough:
class String
def another_titlecase
self.split(" ").collect{|word| word[0] = word[0].upcase; word}.join(" ")
end
end
"john mark McMillan".another_titlecase
=> "John Mark McMillan"
This method is a small fraction of a second faster than the regex solution:
My solution:
ruby-1.9.2-p136 :034 > Benchmark.ms do
ruby-1.9.2-p136 :035 > "john mark McMillan".split(" ").collect{|word|word[0] = word[0].upcase; word}.join(" ")
ruby-1.9.2-p136 :036?> end
=> 0.019311904907226562
Regex solution:
ruby-1.9.2-p136 :042 > Benchmark.ms do
ruby-1.9.2-p136 :043 > "john mark McMillan".gsub(/\b\w/) { |w| w.upcase }
ruby-1.9.2-p136 :044?> end
=> 0.04482269287109375
Hmm, that's odd.. but you could write a quick custom regex to avoid using that method.
class String
def custom_titlecase
self.gsub(/\b\w/) { |w| w.upcase }
end
end
"John Mark McMillan".custom_titlecase # => "John Mark McMillan"
If all you want is to ensure that each word starts with a capital:
class String
def titlecase2
self.split(' ').map { |w| w[0] = w[0].upcase; w }.join(' ')
end
end
irb(main):016:0> "john mark McMillan".titlecase2
=> "John Mark McMillan"
Edited (inspired by The Tin Man's suggestion)
A hack will be:
class String
def titlecase
gsub(/(?:_|\b)(.)/){$1.upcase}
end
end
p "john mark McMillan".titlecase
# => "John Mark McMillan"
Note that the string 'john mark McMillan' is inconsistent in capitalization, and is somewhat unexpected as a human input, or if it is not from a human input, you probably should not have the strings stored in that way. A string like 'john mark mc_millan' is more consistent, and would more likely appear as a human input if you define such convention. My answer will handle these cases as well:
p "john mark mc_millan".titlecase
# => "John Mark McMillan"
If you want to handle the case where someone has entered JOHN CAPSLOCK JOE as well as the others, I combined this one:
class String
def proper_titlecase
if self.titleize.split.length == self.split.length
self.titleize
else
self.split(" ").collect{|word| word[0] = word[0].upcase; word}.join(" ")
end
end
end
Depends if you want that kinda logic on a String method ;)
The documentation for titlecase says ([emphasis added]):
Capitalizes all the words and replaces
some characters in the string to
create a nicer looking title. titleize
is meant for creating pretty output.
It is not used in the Rails internals.
I'm only guessing here, but perhaps it regards PascalCase as a problem - maybe it thinks it's the name of an ActiveRecordModelClass.
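For what it's worth, titleize is built on top of underscore and humanize, which is where the space comes from: underscore turns McMillan into mc_millan, and the underscore then becomes a space. A rough pure-Ruby approximation of that chain shows the effect. It is not the actual ActiveSupport source, and underscore_like and titleize_like are made-up names.

```ruby
# Simplified imitation of underscore: insert "_" at lower-to-upper transitions,
# then downcase everything.
def underscore_like(str)
  str.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Simplified imitation of titleize: underscores become spaces,
# then the first letter of every word is capitalized.
def titleize_like(str)
  underscore_like(str).tr("_", " ").gsub(/\b[a-z]/) { |c| c.upcase }
end

puts underscore_like("McMillan")
puts titleize_like("john mark McMillan")
```

The second line printed has the same unwanted space as the output in the question.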
We have just added this which supports a few different cases that we face.
class String
# Default titlecase converts McKay to Mc Kay, which is not great
# May even need to remove titlecase completely in the future to leave
# strings unchanged
def self.custom_title_case(string = "")
return "" if !string.is_a?(String) || string.empty?
split = string.split(" ").collect do |word|
word = word.titlecase
# If we titlecase and it turns in to 2 words, then we need to merge back
word = word.match?(/\w/) ? word.split(" ").join("") : word
word
end
return split.join(" ")
end
end
And the rspec test
# spec/lib/modules/string_spec.rb
require 'rails_helper'
require 'modules/string'
describe "String" do
describe "self.custom_title_case" do
it "returns empty string if incorrect params" do
result_one = String.custom_title_case({ test: 'object' })
result_two = String.custom_title_case([1, 2])
result_three = String.custom_title_case()
expect(result_one).to eq("")
expect(result_two).to eq("")
expect(result_three).to eq("")
end
it "returns string in title case" do
result = String.custom_title_case("smiths hill")
expect(result).to eq("Smiths Hill")
end
it "caters for 'Mc' i.e. 'john mark McMillan' edge cases" do
result_one = String.custom_title_case("burger king McDonalds")
result_two = String.custom_title_case("john mark McMillan")
result_three = String.custom_title_case("McKay bay")
expect(result_one).to eq("Burger King McDonalds")
expect(result_two).to eq("John Mark McMillan")
expect(result_three).to eq("McKay Bay")
end
it "correctly cases uppercase words" do
result = String.custom_title_case("NORTH NARRABEEN")
expect(result).to eq("North Narrabeen")
end
end
end
You're trying to use a generic method for converting Rails' internal strings into more human readable names. It's not designed to handle "Mc" and "Mac" and "Van Der" and any number of other compound spellings.
You can use it as a starting point, then special case the results looking for the places it breaks and do some fix-ups, or you can write your own method that includes special-casing those edge cases. I've had to do that several times in different apps over the years.
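As one possible shape for such a custom method, the sketch below only touches words that are entirely lower case or entirely upper case and leaves mixed-case words alone, on the assumption that the user typed those deliberately. The name titlecase_names is made up, and it does not try to handle hyphenated names or particles like "Van Der".

```ruby
# Hypothetical helper: capitalize all-lowercase or all-uppercase words,
# keep mixed-case words such as "McMillan" exactly as entered.
def titlecase_names(str)
  str.split(" ").map do |word|
    if word == word.downcase || word == word.upcase
      word.capitalize
    else
      word
    end
  end.join(" ")
end

puts titlecase_names("john mark McMillan")
puts titlecase_names("JOHN CAPSLOCK JOE")
```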
You may also encounter names with two capital letters, such as McLaren, McDonald etc.
Have not spent time trying to improve it, but you could always do
Code
# Rails.root/config/initializers/string.rb
class String
def titleize_name
self.split(" ")
.collect{|word| word[0] = word[0].upcase; word}
.join(" ").gsub(/\b('?[a-z])/) { $1.capitalize }
end
end
Examples
[2] pry(main)> "test name".titleize_name
=> "Test Name"
[3] pry(main)> "test name-name".titleize_name
=> "Test Name-Name"
[4] pry(main)> "test McName-name".titleize_name
=> "Test McName-Name"
The "Why" question has already been answered...but as evidenced by the selected answer and upvotes, I think what most of us are ACTUALLY wanting is a silver bullet to deal with the hell that is name-formatting...While multiple capitals trigger that behavior, I've found that hyphenated names do the same.
These cases and many more have already been handled in the gem, NameCase.
In version 2.0 it only converts a string if the string is all uppercase or all lowercase, based on a defined ruleset as a best guess. I like this, because I'm sure the ruleset can never be 100% correct. For example, Ian McDonald (from Scotland) has a different capitalization from Ian Mcdonald (from Ireland). Those names will be handled correctly at the time of input if the user is particular; if not, the name can be corrected later and will retain its formatting.
My Solution:
# If desired, add string method once NameCase gem is added
class String
def namecase
NameCase(self)
end
end
Tests: (name.namecase)
test_names = ["john mark McMillan", "JOHN CAPSLOCK JOE", "test name", "test name-name", "test McName-name", "John w McHENRY", "ian mcdonald", "Ian McDonald", "Ian Mcdonald"]
test_names.each { |name| puts '# "' + name + '" => "' + name.namecase + '"' }
# "john mark McMillan" => "John Mark McMillan"
# "JOHN CAPSLOCK JOE" => "John Capslock Joe"
# "test name" => "Test Name"
# "test name-name" => "Test Name-Name"
# "test McName-name" => "Test McName-Name"
# "John w McHENRY" => "John w McHENRY" -FAIL
# "ian mcdonald" => "Ian McDonald"
# "Ian McDonald" => "Ian McDonald"
# "Ian Mcdonald" => "Ian Mcdonald"
If you feel you need to handle all of the corner cases on this page and don't care about losing names that may have been formatted at the start, eg. Ian Mcdonald (from Ireland)...you could use upcase first:
Tests: (name.upcase.namecase)
test_names.each { |name| puts '# "' + name + '" => "' + name.upcase.namecase + '"' }
# "john mark McMillan" => "John Mark McMillan"
# "JOHN CAPSLOCK JOE" => "John Capslock Joe"
# "test name" => "Test Name"
# "test name-name" => "Test Name-Name"
# "test McName-name" => "Test McName-Name"
# "John w McHENRY" => "John W McHenry"
# "ian mcdonald" => "Ian McDonald"
# "Ian McDonald" => "Ian McDonald"
# "Ian Mcdonald" => "Ian McDonald"
The only silver bullet is to go old school...ALL CAPS. But who wants that eyesore in their modern web app?
I'm currently using the following to parse emails:
def parse_emails(emails)
valid_emails, invalid_emails = [], []
unless emails.nil?
emails.split(/, ?/).each do |full_email|
unless full_email.blank?
if full_email.index(/\<.+\>/)
email = full_email.match(/\<.*\>/)[0].gsub(/[\<\>]/, "").strip
else
email = full_email.strip
end
email = email.delete("<").delete(">")
email_address = EmailVeracity::Address.new(email)
if email_address.valid?
valid_emails << email
else
invalid_emails << email
end
end
end
end
return valid_emails, invalid_emails
end
The problem I'm having is given an email like:
Bob Smith <bob#smith.com>
The code above deletes Bob Smith and returns only bob#smith.com.
But what I want is a hash of FNAME, LNAME, EMAIL, where fname and lname are optional but email is not.
What type of ruby object would I use for that and how would I create such a record in the code above?
Thanks
I've coded it so that it will work even if you have an entry like: John Bob Smith Doe <bob#smith.com>
It would retrieve:
{:email => "bob#smith.com", :fname => "John", :lname => "Bob Smith Doe" }
def parse_emails(emails)
valid_emails, invalid_emails = [], []
unless emails.nil?
emails.split(/, ?/).each do |full_email|
unless full_email.blank?
if index = full_email.index(/\<.+\>/)
email = full_email.match(/\<.*\>/)[0].gsub(/[\<\>]/, "").strip
name = full_email[0..index-1].split(" ")
fname = name.first
lname = name[1..name.size] * " "
else
email = full_email.strip
#your choice, what the string could be... only mail, only name?
end
email = email.delete("<").delete(">")
email_address = EmailVeracity::Address.new(email)
if email_address.valid?
valid_emails << { :email => email, :lname => lname, :fname => fname}
else
invalid_emails << { :email => email, :lname => lname, :fname => fname}
end
end
end
end
return valid_emails, invalid_emails
end
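The name-splitting step on its own can be sketched as below, without the validation part. parse_entry is a made-up name, the address is written with # as elsewhere in this thread, and the first word is assumed to be the first name with everything else going into the last name.

```ruby
# Hypothetical helper: turns one "Name <address>" entry into a hash.
# fname and lname are nil when the entry carries no name.
def parse_entry(entry)
  if (m = entry.match(/\A(.*?)<([^<>]+)>\s*\z/))
    # Everything before the angle brackets is the name part.
    fname, lname = m[1].strip.split(" ", 2)
    { :fname => fname, :lname => lname, :email => m[2].strip }
  else
    # No angle brackets: treat the whole entry as the address.
    { :fname => nil, :lname => nil, :email => entry.strip }
  end
end

p parse_entry("Bob Smith <bob#smith.com>")
p parse_entry("John Bob Smith Doe <bob#smith.com>")
p parse_entry("bob#smith.com")
```

Each hash could then be pushed onto valid_emails or invalid_emails in place of the bare address.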
Here's a slightly different approach that works better for me. It grabs the name whether it is before or after the email address and whether or not the email address is in angle brackets.
I don't try to parse the first name out from the last name -- too problematic (e.g. "Mary Ann Smith" or "Dr. Mary Smith"), but I do eliminate duplicate email addresses.
def parse_list(list)
r = Regexp.new('[a-z0-9\.\_\%\+\-]+#[a-z0-9\.\-]+\.[a-z]{2,4}', true)
valid_items, invalid_items = {}, []
## split the list on commas and/or newlines
list_items = list.split(/[,\n]+/)
list_items.each do |item|
if m = r.match(item)
## get the email address
email = m[0]
## get everything before the email address
before_str = item[0, m.begin(0)]
## get everything after the email address
after_str = item[m.end(0), item.length]
## enter the email as a valid_items hash key (eliminating dups)
## make the value of that key anything before the email if it contains
## any alphanumerics, stripping out any angle brackets
## and leading/trailing space
if /\w/ =~ before_str
valid_items[email] = before_str.gsub(/[\<\>\"]+/, '').strip
## if nothing before the email, make the value of that key anything after
##the email, stripping out any angle brackets and leading/trailing space
elsif /\w/ =~ after_str
valid_items[email] = after_str.gsub(/[\<\>\"]+/, '').strip
## if nothing after the email either,
## make the value of that key an empty string
else
valid_items[email] = ''
end
else
invalid_items << item.strip if item.strip.length > 0
end
end
[valid_items, invalid_items]
end
It returns a hash with valid email addresses as keys and the associated names as values. Any invalid items are returned in the invalid_items array.
See http://www.regular-expressions.info/email.html for an interesting discussion of email regexes.
I made a little gem out of this in case it might be useful to someone at https://github.com/victorgrey/email_addresses_parser
You can use rfc822 gem. It contains regular expression for seeking for emails that conform with RFC. You can easily extend it with parts for finding first and last name.
Along the lines of mspanc's answer, you can use the mail gem to do the basic email address parsing work for you, as answered here: https://stackoverflow.com/a/12187502/1019504