Rails validates NOT in regex - ruby-on-rails

I'm trying to create a Rails 3 validation that will ensure that people are not using one of the common free email addresses.
My thought was something like this:
validates_format_of :email, :with => /^((?!gmail).*)$|^((?!yahoo).*)$|^((?!hotmail).*)$/
or
validates_exclusion_of :email, :in => %w( gmail. GMAIL. hotmail. HOTMAIL. live. LIVE. aol. AOL. ), :message => "You must use your corporate email address."
But neither works properly. Any ideas?

Basically you've written a regex that matches anything. Let's break it down.
/
^( # [ beginning of string
(?!gmail) # not followed by "gmail"
.* # followed by any or no characters
)$ # followed by the end of the string
| # ] OR [
^( # beginning of the string
(?!yahoo) # not followed by "yahoo"
.* # followed by any or no characters
)$ # followed by the end of the string
| # ] OR [
^( # beginning of the string
(?!hotmail) # not followed by "hotmail"
.* # followed by any or no characters
)$ # followed by the end of the string
/x # ]
When you think about it, you'll realize that the only strings that won't match are ones that start with "gmail," "yahoo," and "hotmail", all at the same time, which is impossible.
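To see this concretely, here is a small plain-Ruby check (outside Rails; Regexp#match? needs Ruby 2.4 or later) that runs the question's pattern against a few made-up free-mail addresses. None of them fails the format check:

```ruby
# The pattern from the question: three alternatives, each a negative
# lookahead at the start of the string followed by ".*".
question_regex = /^((?!gmail).*)$|^((?!yahoo).*)$|^((?!hotmail).*)$/

# None of these addresses starts with "gmail", "yahoo" or "hotmail",
# so the first alternative already matches each of them.
results = %w[foo@gmail.com bar@yahoo.com baz@hotmail.com].map { |addr| question_regex.match?(addr) }
p results # prints [true, true, true]

# Even a string that starts with "gmail" matches, via the "yahoo" alternative.
p question_regex.match?("gmailyahoohotmail") # prints true
```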
What you really want is something like this:
/
.+@ # one or more characters followed by @
(?! # not followed by...
(gmail|yahoo|hotmail) # [ one of these strings
\. # followed by a literal dot
) # ]
.+ # followed by one or more characters
$ # and the end of the line
/xi # extended mode (allows these comments), case insensitive
Put it together and you have:
expr = /.+@(?!(gmail|yahoo|hotmail)\.).+$/i
test_cases = %w[ foo@gmail.com
bar@yahoo.com
BAZ@HOTMAIL.COM
qux@example.com
quux
]
test_cases.map {|addr| expr =~ addr }
# => [nil, nil, nil, 0, nil]
# (nil means no match, 0 means there was a match starting at character 0)
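One refinement worth considering, sketched here in plain Ruby: in Ruby, ^ and $ match at line boundaries rather than at the ends of the whole string, so a $-anchored pattern can be satisfied by a single acceptable line inside a multi-line value. The variant below anchors with \A and \z instead. CORPORATE_EMAIL and corporate_email? are illustrative names, not part of the original answer:

```ruby
# Same idea as the answer's expr, but anchored to the whole string.
CORPORATE_EMAIL = /\A.+@(?!(?:gmail|yahoo|hotmail)\.).+\z/i

def corporate_email?(address)
  CORPORATE_EMAIL.match?(address)
end

expr = /.+@(?!(gmail|yahoo|hotmail)\.).+$/i # the answer's version, for comparison
sneaky = "foo@gmail.com\nqux@example.com"   # a free address followed by a second line

p %w[foo@gmail.com BAZ@HOTMAIL.COM qux@example.com quux].map { |a| corporate_email?(a) }
# prints [false, false, true, false]
p expr.match?(sneaky)      # prints true: the second line satisfies the pattern
p corporate_email?(sneaky) # prints false
```

In the model this would plug into the question's validates_format_of :email, :with => CORPORATE_EMAIL. Rails 4 and later also refuse format validators that use ^ or $ unless :multiline => true is passed, which is one more reason to prefer \A and \z.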


How to read a CSV that contains double quotes (")

I have a CSV file that looks like this:
"url","id","role","url","deadline","availability","location","my_type","keywords","source","external_id","area","area (1)"
"https://myurl.com","123456","This a string","https://myurl.com?source=5&param=1","31-01-2020","1","Location´s Place","another_string, my_string","key1, key2, key3","anotherString","145129","Place in Earth",""
It has 13 columns.
The issue is that every field I get back is wrapped in \" characters, and I don't want that. Also, I get 16 columns back instead of 13.
This is what I have done:
csv = CSV.new(File.open('myfile.csv'), quote_char:"\x00", force_quotes:false)
csv.read[1]
Output:
["\"https://myurl.com\"", "\"123456\"", "\"This a string\"", "\"https://myurl.com?source=5&param=1\"", "\"31-01-2020\"", "\"1\"", "\"Location´s Place\"", "\"another_string", " my_string\"", "\"key1", " key2", " key3\"", "\"anotherString\"", "\"145129\"", "\"Place in Earth\"", "\"\""]
The file you showed is a standard CSV file; nothing special is needed. Just delete all those unnecessary arguments:
csv = CSV.new(File.open('myfile.csv'))
csv.read[1]
#=> [
# "https://myurl.com",
# "123456",
# "This a string",
# "https://myurl.com?source=5&param=1",
# "31-01-2020",
# "1",
# "Location´s Place",
# "another_string, my_string",
# "key1, key2, key3",
# "anotherString",
# "145129",
# "Place in Earth",
# ""
# ]
force_quotes doesn't do anything in your code, because it controls whether or not the CSV library will quote all fields when writing CSV. You are reading, not writing, so this argument is useless.
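quote_char: "\x00" is wrong, since the quote character in the example you posted is clearly " not NUL. Because no field is recognized as quoted, the commas inside "another_string, my_string" and "key1, key2, key3" are treated as field separators, which turns 13 columns into 16.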
quote_char: '"' would be correct, but is not necessary, since it is the default.
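A minimal way to check this without the file is to parse the two lines from the question inline (the heredoc stands in for myfile.csv). Note that csv became a bundled gem in Ruby 3.4, so under Bundler it may need a Gemfile entry:

```ruby
require 'csv'

# The header and first data row from the question, inlined instead of read from disk.
data = <<~CSV_DATA
  "url","id","role","url","deadline","availability","location","my_type","keywords","source","external_id","area","area (1)"
  "https://myurl.com","123456","This a string","https://myurl.com?source=5&param=1","31-01-2020","1","Location´s Place","another_string, my_string","key1, key2, key3","anotherString","145129","Place in Earth",""
CSV_DATA

# No quote_char or force_quotes: the default quote character is already '"'.
header, first_row = CSV.parse(data)

p header.size    # prints 13
p first_row.size # prints 13
p first_row[7]   # prints "another_string, my_string"
```

For the real file, CSV.read('myfile.csv') does the same parsing and, unlike CSV.new(File.open(...)), closes the file for you.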

Find within the first 10?

I'm using Nokogiri to screen-scrape contents of a website.
I set fetch_number to specify the number of <div>s that I want to retrieve. For example, I may want the first 10 tweets (first(10)) from the target page.
The code looks like this:
doc.css(".tweet").first(fetch_number).each do |item|
title = item.css("a")[0]['title']
end
However, when fewer than 10 matching div tags are returned, it raises
NoMethodError: undefined method 'css' for nil:NilClass
This is because, for the missing matches, item is nil.
How can I make it return all the available data, up to 10? I don't need the nils.
UPDATE:
task :test_fetch => :environment do
require 'nokogiri'
require 'open-uri'
url = 'http://themagicway.taobao.com/search.htm?&search=y&orderType=newOn_desc'
doc = Nokogiri::HTML(open(url) )
puts doc.css(".main-wrap .item").count
doc.css(".main-wrap .item").first(30).each do |item_info|
if item_info
href = item_info.at(".detail a")['href']
puts href
else
puts 'this is empty'
end
end
end
Results (near the end):
24
http://item.taobao.com/item.htm?id=41249522884
http://item.taobao.com/item.htm?id=40369253621
http://item.taobao.com/item.htm?id=40384876796
http://item.taobao.com/item.htm?id=40352486259
http://item.taobao.com/item.htm?id=40384968205
.....
http://item.taobao.com/item.htm?id=38843789106
http://item.taobao.com/item.htm?id=38843517455
http://item.taobao.com/item.htm?id=38854788276
http://item.taobao.com/item.htm?id=38825442050
http://item.taobao.com/item.htm?id=38630599372
http://item.taobao.com/item.htm?id=38346270714
http://item.taobao.com/item.htm?id=38357729988
http://item.taobao.com/item.htm?id=38345374874
this is empty
this is empty
this is empty
this is empty
this is empty
this is empty
count reports only 24 elements, but first(30) returns a 30-element array.
And is it actually an array, or a Nokogiri::XML::NodeSet? I'm not sure.
title = item.css("a")[0]['title']
is bad practice.
Instead, consider using at or at_css, which return only the first matching node, rather than search or css:
title = item.at('a')['title']
Next, if the <a> tag that's returned doesn't have a title attribute, title will be nil, and if the item has no <a> at all, at returns nil and calling ['title'] on it raises a NoMethodError. Instead, improve your CSS selector to only allow matches like <a title="foo">:
require 'nokogiri'
doc = Nokogiri::HTML('<body><a>foo</a><a title="bar">bar</a></body>')
doc.at('a').to_html # => "<a>foo</a>"
doc.at('a[title]').to_html # => "<a title=\"bar\">bar</a>"
Notice how the first, which is not constrained to look for tags with a title attribute, returns the first <a> tag. Using a[title] will only return ones with a title attribute.
That means your loop over the values will never produce nil, and you won't need to compact nils out of the returned array.
As a general programming tip, if you're getting nils like that, look at the code generating the array, because odds are good it's not doing it right. You should ALWAYS know what sort of results your code will generate. Most of the time, using compact to clean up the array is a knee-jerk reaction to code that wasn't written correctly.
Here's the code from your update:
require 'nokogiri'
require 'open-uri'
url = 'http://themagicway.taobao.com/search.htm?&search=y&orderType=newOn_desc'
doc = Nokogiri::HTML(open(url) )
puts doc.css(".main-wrap .item").count
doc.css(".main-wrap .item").first(30).each do |item_info|
if item_info
href = item_info.at(".detail a")['href']
puts href
else
puts 'this is empty'
end
end
And here's what's wrong:
doc.css(".main-wrap .item").first(30)
Here's a simple example demonstrating why that doesn't work:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<p>foo</p>
</body>
</html>
EOT
In Nokogiri, search, css and xpath are equivalent, except that the first is generic and can take either CSS or XPath, while the last two are specific to that language.
doc.search('p') # => [#<Nokogiri::XML::Element:0x3fcf360ef750 name="p" children=[#<Nokogiri::XML::Text:0x3fcf360ef4f8 "foo">]>]
doc.search('p').size # => 1
doc.search('p').map(&:to_html) # => ["<p>foo</p>"]
That shows that the NodeSet returned by doing a simple search returns only one node, and what the node looks like.
doc.search('p').first(2) # => [#<Nokogiri::XML::Element:0x3fe3a28d2848 name="p" children=[#<Nokogiri::XML::Text:0x3fe3a28c7b50 "foo">]>, nil]
doc.search('p').first(2).size # => 2
Calling first(n) on that NodeSet returns exactly n elements. If that many aren't found, Nokogiri fills the rest in with nil values.
This is counter to what we'd assume first(n) does, since Enumerable#first returns up to n elements and never pads with nils. It isn't a bug as such, but it is unexpected: Enumerable#first sets the expected behavior for methods with that name, yet this is NodeSet#first, not Enumerable#first, so it does what it does until the Nokogiri authors change it. (You can see why it happens if you look at the source for that particular method.) Newer Nokogiri releases appear to cap the result at the size of the set, so check the behavior of the version you have installed.
Instead, slicing the NodeSet does show the expected behavior:
doc.search('p')[0..1] # => [#<Nokogiri::XML::Element:0x3fe3a28d2848 name="p" children=[#<Nokogiri::XML::Text:0x3fe3a28c7b50 "foo">]>]
doc.search('p')[0..1].size # => 1
doc.search('p')[0, 2] # => [#<Nokogiri::XML::Element:0x3fe3a28d2848 name="p" children=[#<Nokogiri::XML::Text:0x3fe3a28c7b50 "foo">]>]
doc.search('p')[0, 2].size # => 1
So, don't use NodeSet#first(n), use the slice form NodeSet#[].
Applying that, I'd write the code something like:
require 'nokogiri'
require 'open-uri'
URL = 'http://themagicway.taobao.com/search.htm?&search=y&orderType=newOn_desc'
doc = Nokogiri::HTML(URI.open(URL)) # URI.open: on Ruby 3.0+, open-uri no longer overrides Kernel#open
hrefs = doc.css(".main-wrap .item .detail a[href]")[0..29].map { |anchor|
anchor['href']
}
puts hrefs.size
puts hrefs
# >> 24
# >> http://item.taobao.com/item.htm?id=41249522884
# >> http://item.taobao.com/item.htm?id=40369253621
# >> http://item.taobao.com/item.htm?id=40384876796
# >> http://item.taobao.com/item.htm?id=40352486259
# >> http://item.taobao.com/item.htm?id=40384968205
# >> http://item.taobao.com/item.htm?id=40384816312
# >> http://item.taobao.com/item.htm?id=40384600507
# >> http://item.taobao.com/item.htm?id=39973451949
# >> http://item.taobao.com/item.htm?id=39861209551
# >> http://item.taobao.com/item.htm?id=39545678869
# >> http://item.taobao.com/item.htm?id=39535371171
# >> http://item.taobao.com/item.htm?id=39509186150
# >> http://item.taobao.com/item.htm?id=38973412667
# >> http://item.taobao.com/item.htm?id=38910499863
# >> http://item.taobao.com/item.htm?id=38942960787
# >> http://item.taobao.com/item.htm?id=38910403350
# >> http://item.taobao.com/item.htm?id=38843789106
# >> http://item.taobao.com/item.htm?id=38843517455
# >> http://item.taobao.com/item.htm?id=38854788276
# >> http://item.taobao.com/item.htm?id=38825442050
# >> http://item.taobao.com/item.htm?id=38630599372
# >> http://item.taobao.com/item.htm?id=38346270714
# >> http://item.taobao.com/item.htm?id=38357729988
# >> http://item.taobao.com/item.htm?id=38345374874
Try this
doc.css(".tweet").first(fetch_number).each do |item|
title = item.css("a")[0]['title'] rescue nil
end
Let me know whether it works. With rescue nil it will no longer raise the error.
Try compact.
[1, nil, 2, nil, 3].compact # => [1, 2, 3]
http://www.ruby-doc.org/core-2.1.3/Array.html#method-i-compact
(i.e. first(fetch_number).compact.each do |item|)

Regex for stripping non-alphabetic and non-numeric characters

Total programming newbie here. In Ruby, how would I go about stripping the following string of non-alphabetic and non-numeric characters, and then splitting it into an array on spaces?
Example
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
Into this
tokenized_string = ["Honey", "a", "sweet", "sticky", "yellow", "fluid", "made", "by", "bees", "and", "other", "insects", "from", "nectar", "collected", "from", "flowers"]
Any help would be much appreciated!
I'd use:
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.delete('^A-Za-z0-9 ').split
# => ["Honey",
# "a",
# "sweet",
# "sticky",
# "yellow",
# "fluid",
# "made",
# "by",
# "bees",
# "and",
# "other",
# "insects",
# "from",
# "nectar",
# "collected",
# "from",
# "flowers"]
If you're trying to remove everything but alphanumerics, then the \w character class can't be used because it is defined as [A-Za-z0-9_], which allows _ to leak in or squeeze through. Here's an example:
'foo_BAR12'[/\w+/] # => "foo_BAR12"
That matched the entire string, including _.
'foo_BAR12'[/[A-Za-z0-9]+/] # => "foo"
That stopped at _, because the class [A-Za-z0-9] doesn't include it.
\w should be considered a matching pattern for variable names, NOT for alphanumerics. If you want a character class for alphanumerics, look at the POSIX [[:alnum:]] class:
'foo_BAR12'[/[[:alnum:]]+/] # => "foo"
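To compare the approaches from this thread side by side, here is a small sketch. On the question's sentence they all agree; the second input ("snake_case sun-dried") is made up to show where they diverge. delete removes characters and so glues the pieces of a word together, while scan splits at them:

```ruby
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."

by_delete = string.delete('^A-Za-z0-9 ').split # keep only ASCII letters, digits and spaces
by_alnum  = string.scan(/[[:alnum:]]+/)        # collect runs of letters/digits
by_word   = string.scan(/\w+/)                 # \w also accepts "_"

p(by_delete == by_alnum && by_alnum == by_word) # prints true

# Hypothetical input with an underscore and a hyphen inside words.
tricky = "snake_case sun-dried"
p tricky.delete('^A-Za-z0-9 ').split # prints ["snakecase", "sundried"]
p tricky.scan(/[[:alnum:]]+/)        # prints ["snake", "case", "sun", "dried"]
p tricky.scan(/\w+/)                 # prints ["snake_case", "sun", "dried"]
```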
There are a lot of possibilities, e.g.:
string.gsub(/\W/) { |m| m if m == ' ' }.split
or, even clearer:
string.gsub(/\W/) { |m| m if m.strip.empty? }.split
Very simple. The following gives you the array you want without your having to use split:
string.scan(/\w+/)
Play around with it on Rubular.com.
Do as below, using String#scan:
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.scan(/[a-zA-Z0-9]+/)
# => ["Honey",
# "a",
# "sweet",
# "sticky",
# "yellow",
# "fluid",
# "made",
# "by",
# "bees",
# "and",
# "other",
# "insects",
# "from",
# "nectar",
# "collected",
# "from",
# "flowers"]

Map string to another string in Ruby Rails

Hey guys,
I have 5 model attributes, for example 'str' and 'dex': a user has strength and dexterity attributes.
When I call user.increase_attr('dex'), I want to do it through 'dex' and not have to pass the 'dexterity' string all the way.
Of course, I can just check if ability == 'dex' and convert it to 'dexterity' when I need to do user.dexterity += 1 and then save it.
But what is a good Ruby way to do that?
Look at Ruby's Abbrev module that's part of the standard library. This should give you some ideas.
require 'abbrev'
require 'pp'
class User
def increase_attr(s)
"increasing using '#{s}'"
end
end
abbreviations = Hash[*Abbrev::abbrev(%w[dexterity strength speed height weight]).flatten]
user = User.new
user.increase_attr(abbreviations['dex']) # => "increasing using 'dexterity'"
user.increase_attr(abbreviations['s']) # => "increasing using ''"
user.increase_attr(abbreviations['st']) # => "increasing using 'strength'"
user.increase_attr(abbreviations['sp']) # => "increasing using 'speed'"
If an ambiguous value is passed in (the "s"), nothing will match. If a unique value is found in the hash, the returned value is the full string, making it easy to map short strings to the full string.
Because trigger strings of varying lengths would be confusing to the user, you could strip all elements of the hash whose keys are shorter than the shortest unambiguous key. In other words, remove anything shorter than two characters, because "speed" ("sp") and "strength" ("st") collide on "s", meaning "h", "d" and "w" need to go. It's a "be kind to the poor human users" thing.
Here's what is created when Abbrev::abbrev does its magic and it's coerced into a Hash.
pp abbreviations
# >> {"dexterit"=>"dexterity",
# >> "dexteri"=>"dexterity",
# >> "dexter"=>"dexterity",
# >> "dexte"=>"dexterity",
# >> "dext"=>"dexterity",
# >> "dex"=>"dexterity",
# >> "de"=>"dexterity",
# >> "d"=>"dexterity",
# >> "strengt"=>"strength",
# >> "streng"=>"strength",
# >> "stren"=>"strength",
# >> "stre"=>"strength",
# >> "str"=>"strength",
# >> "st"=>"strength",
# >> "spee"=>"speed",
# >> "spe"=>"speed",
# >> "sp"=>"speed",
# >> "heigh"=>"height",
# >> "heig"=>"height",
# >> "hei"=>"height",
# >> "he"=>"height",
# >> "h"=>"height",
# >> "weigh"=>"weight",
# >> "weig"=>"weight",
# >> "wei"=>"weight",
# >> "we"=>"weight",
# >> "w"=>"weight",
# >> "dexterity"=>"dexterity",
# >> "strength"=>"strength",
# >> "speed"=>"speed",
# >> "height"=>"height",
# >> "weight"=>"weight"}
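The filtering suggested in this answer (dropping keys shorter than two characters) can be done with Hash#reject, sketched below. Abbrev.abbrev already returns a Hash, so the Hash[*...flatten] conversion isn't strictly needed. abbrev moved from a default gem to a bundled gem in Ruby 3.4, so under Bundler it may need a Gemfile entry:

```ruby
require 'abbrev'

words = %w[dexterity strength speed height weight]

# Unique prefix => full word; ambiguous prefixes such as "s" are left out.
abbreviations = Abbrev.abbrev(words)

# Keep only keys at least as long as the shortest unambiguous one ("sp"/"st"),
# so one-letter triggers like "d", "h" and "w" are dropped.
min_length = 2
usable = abbreviations.reject { |abbr, _full| abbr.length < min_length }

p usable['dex'] # prints "dexterity"
p usable['d']   # prints nil
```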
def increase_attr(attr)
attr_map = {'dex' => :dexterity, 'str' => :strength}
increment!(attr_map[attr]) if attr_map.include?(attr)
end
Basically, create a Hash whose keys are 'dex', 'str', etc., and whose values are the expanded attribute names (as symbols). ActiveRecord's increment! then adds one to that attribute and saves the record.
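To try the mapping idea outside Rails, here is a sketch with a plain Ruby class standing in for the model. The class, its accessors and the public_send call are assumptions for illustration: the answer itself relies on ActiveRecord's increment!, which also persists the change. As with the include? guard above, unknown short names are ignored rather than raising:

```ruby
# Hypothetical plain-Ruby stand-in for the ActiveRecord model; nothing is saved.
class User
  # Short names accepted by increase_attr, mapped to the real attribute names.
  ATTR_MAP = { 'dex' => :dexterity, 'str' => :strength }.freeze

  attr_accessor :dexterity, :strength

  def initialize
    @dexterity = 0
    @strength = 0
  end

  # Returns true if the short name was recognized and the attribute bumped.
  def increase_attr(short_name)
    attribute = ATTR_MAP[short_name]
    return false unless attribute

    public_send("#{attribute}=", public_send(attribute) + 1)
    true
  end
end

user = User.new
user.increase_attr('dex')
user.increase_attr('xyz') # unknown short name: ignored, returns false
p user.dexterity # prints 1
```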

Is it possible to simulate the behaviour of sprintf("%g") using the Rails NumberHelper methods?

sprintf("%g", [float]) allows me to format a floating point number without specifying precision, such that 10.00 is rendered as 10 and 10.01 is rendered as 10.01, and so on. This is neat.
In my application I'm rendering numbers using the Rails NumberHelper methods so that I can take advantage of the localization features, but I can't figure out how to achieve the above functionality through these helpers since they expect an explicit :precision option.
Is there a simple way around this?
Why not just combine Ruby's Kernel#sprintf with NumberHelper? The most convenient syntax is str % arg (String#%), where str is the format string (%g in your case):
>> "%g" % 10.01
=> "10.01"
>> "%g" % 10
=> "10"
Then you can use the NumberHelper to print just the currency symbol:
>> foo = ActionView::Base.new
>> foo.number_to_currency(0, :format => "%u") + "%g"%10.0
=> "$10"
and define your own convenience method:
def pretty_currency(val)
number_to_currency(0, :format => "%u") + "%g"%val
end
pretty_currency(10.0) # "$10"
pretty_currency(10.01) # "$10.01"
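One caveat before relying on "%g" for display: it trims trailing zeros, but it defaults to 6 significant digits and switches to exponent notation for large or very small magnitudes. The sample values below are illustrative:

```ruby
samples = [10.0, 10.01, 111.25, 1234567.0, 0.00001]
formatted = samples.map { |v| "%g" % v }
p formatted # prints ["10", "10.01", "111.25", "1.23457e+06", "1e-05"]

# An explicit precision (here 10 significant digits) keeps larger values in
# plain notation while still dropping trailing zeros.
p("%.10g" % 1234567.0) # prints "1234567"
p("%.10g" % 10.5)      # prints "10.5"
```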
I have solved this by adding another method to the NumberHelper module as follows:
module ActionView
module Helpers #:nodoc:
module NumberHelper
# Formats a +number+ such that the level of precision is determined using the logic of sprintf("%g"), that
# is: "Convert a floating point number using exponential form if the exponent is less than -4 or greater than or
# equal to the precision, or in d.dddd form otherwise." Note that %g uses a precision of 6 significant digits,
# so longer values are rounded (and large ones switch to exponential form).
# You can customize the format in the +options+ hash.
#
# ==== Options
# * <tt>:separator</tt> - Sets the separator between the units (defaults to the locale's number.format.separator, usually ".").
# * <tt>:delimiter</tt> - Sets the thousands delimiter (defaults to the locale's number.format.delimiter, usually ",").
#
# ==== Examples
# number_with_auto_precision(111.25) # => "111.25"
# number_with_auto_precision(111) # => "111"
# number_with_auto_precision(1111.25, :separator => ',', :delimiter => '.') # => "1.111,25"
# number_with_auto_precision(1111, :separator => ',', :delimiter => '.') # => "1.111"
def number_with_auto_precision(number, *args)
options = args.extract_options!
options.symbolize_keys!
defaults = I18n.translate(:'number.format', :locale => options[:locale], :raise => true) rescue {}
separator = options[:separator] || defaults[:separator]
delimiter = options[:delimiter] || defaults[:delimiter]
begin
number_with_delimiter("%g" % number,
:separator => separator,
:delimiter => delimiter)
rescue
number
end
end
end
end
end
It is the "%g" % number formatting, whose result is then passed to number_with_delimiter, that renders the number as described in the code comments above.
This works great for me, but I'd welcome thoughts on this solution.
