How can I remove http(s), www(n) and public suffixes in Ruby? - ruby-on-rails

Input => Expected Output
https://mail.google.com.au => mail.google
http://www.google.in => google
https://www9.calendar.google.co.uk => calendar.google
https://www12.stage.calendar.google.co.uk => stage.calendar.google
www.blog.botreetechnologies.com => blog.botreetechnologies
Update
t = URI.parse 'http://www.google.com'
t.host
#=> "www.google.com"
URI.split 'http://www.google.com'
#=> ["http", nil, "www.google.com", nil, nil, "", nil, nil, nil]
uri = URI.parse("http://www.google.co.uk")
#=> #<URI::HTTP http://www.google.co.uk>
domain = PublicSuffix.parse(uri.host)
#=> #<PublicSuffix::Domain:0x00000003c538e0 @sld="google", @tld="co.uk", @trd="www">
domain.sld
#=> "google"
uri = URI.parse("http://www.mail.google.co.uk")
#=> #<URI::HTTP http://www.mail.google.co.uk>
domain = PublicSuffix.parse(uri.host)
#=> #<PublicSuffix::Domain:0x00000002e97bc0 @sld="google", @tld="co.uk", @trd="www.mail">
domain.sld
#=> "google"
%w[http://www.example.com/page http://blog.example.com/page].each do |u|
  puts URI.parse(u).host.sub(/^www\./, '')
end
# example.com
# blog.example.com
uri = URI.parse("www.pinkpoodles.com.au")
#=> #<URI::Generic www.pinkpoodles.com.au>
uri.host
#=> nil

I can't think of a "one-liner", but something like this would work:
require 'uri'
require 'public_suffix'
def simple_host(uri)
  uri = URI(uri)
  # Bare hosts like "www.example.com" parse without a scheme (and without a host)
  uri = URI("http://#{uri}") unless uri.scheme
  domain = PublicSuffix.parse(uri.host)
  trd = domain.trd
  if trd
    trd = trd.split('.')
    trd.shift if trd.first.start_with?('www')
  end
  [*trd, domain.sld].join('.')
end
simple_host('https://mail.google.com.au') #=> "mail.google"
simple_host('http://www.google.in') #=> "google"
simple_host('https://www9.calendar.google.co.uk') #=> "calendar.google"
simple_host('https://www12.stage.calendar.google.co.uk') #=> "stage.calendar.google"
simple_host('www.blog.botreetechnologies.com') #=> "blog.botreetechnologies"
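If adding the public_suffix gem isn't an option, the same idea can be sketched with only the standard library. This rests on a big assumption: KNOWN_SUFFIXES below is a hypothetical, hand-picked list, not the real Public Suffix List, so it only covers the hosts in the question. simple_host_without_gem is likewise a made-up name.

```ruby
require 'uri'

# Hypothetical, hand-picked suffixes for illustration only; the real Public
# Suffix List (used by the public_suffix gem) has thousands of entries.
KNOWN_SUFFIXES = %w[com com.au co.uk in].freeze

def simple_host_without_gem(url)
  # Add a scheme when it is missing so URI can find the host
  url = "http://#{url}" unless url.match?(%r{\A[a-z][a-z0-9+.\-]*://}i)
  labels = URI(url).host.downcase.split('.')

  # Strip the longest matching suffix first, so "co.uk" wins over "uk"
  suffix_len = (labels.length - 1).downto(1).find do |n|
    KNOWN_SUFFIXES.include?(labels.last(n).join('.'))
  end
  labels = labels.first(labels.length - suffix_len) if suffix_len

  # Drop a leading www, www9, www12, ...
  labels.shift if labels.first&.match?(/\Awww\d*\z/)
  labels.join('.')
end

simple_host_without_gem('https://www9.calendar.google.co.uk') #=> "calendar.google"
simple_host_without_gem('www.blog.botreetechnologies.com')    #=> "blog.botreetechnologies"
```

Maintaining the suffix list by hand doesn't scale, which is why the gem-based version above is the better choice in practice.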

It's not a job for one line, but here's a function that does some string manipulation and at least satisfies the test cases you've provided.
def foo(url)
  # Drop the scheme, split on dots, and discard "www"-like labels and short
  # labels (this heuristic assumes those are TLD parts such as "co" or "uk").
  # reject avoids deleting from the array while iterating over it.
  parts = url.split("//").last.split(".").reject do |word|
    word.include?("http") || word.include?("www") || word.length < 3
  end
  if parts.length >= 3 && parts[2].length > 3
    parts.first(3).join(".")
  elsif parts.length > 1
    parts.first(2).join(".")
  else
    parts.first
  end
end
foo 'http://www.google.in'
# => 'google'
foo 'https://www9.calendar.google.co.uk'
# => 'calendar.google'
foo 'https://mail.google.com.au'
# => 'mail.google'
foo 'https://www12.stage.calendar.google.co.uk'
# => 'stage.calendar.google'
foo 'www.blog.botreetechnologies.com'
# => 'blog.botreetechnologies'

Here's how I fixed it in the end (pasted quickly):
require 'uri'
require 'public_suffix'
def filename(website_domain)
  domain = website_domain.start_with?('http') ? website_domain : "https://#{website_domain}"
  uri = URI.parse domain
  suffix = PublicSuffix.parse(uri.host)
  # Remove a leading www/wwwN, then the public suffix at the very end of the host
  name = uri.host.sub(/^www\d*\./i, '').sub(/\.#{Regexp.escape(suffix.tld)}\z/, '')
  "#{name}.filtered.xml"
end
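A caveat on removing the suffix with sub: an unanchored, unescaped pattern like /\.#{tld}/ can match in the middle of the host, and the dot in a suffix such as "co.uk" matches any character. Anchoring with \z and escaping with Regexp.escape avoids both. A small comparison, using made-up hostnames and hypothetical helper names:

```ruby
# Hypothetical helper: strip the suffix only where it ends the host
def strip_suffix(host, tld)
  host.sub(/\.#{Regexp.escape(tld)}\z/, '')
end

# Hypothetical helper: unanchored, unescaped version for comparison
def strip_suffix_unanchored(host, tld)
  host.sub(/\.#{tld}/, '')
end

strip_suffix_unanchored('mail.inbox.google.in', 'in') #=> "mailbox.google.in"
strip_suffix('mail.inbox.google.in', 'in')            #=> "mail.inbox.google"
strip_suffix_unanchored('blog.coxuk.co.uk', 'co.uk')  #=> "blog.co.uk"
strip_suffix('blog.coxuk.co.uk', 'co.uk')             #=> "blog.coxuk"
```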


Is there a method to set a value in Rails to nil and save?

What I'm thinking of is something where I can say:
e = Foo.new
e.bar = "hello"
e.save
e.reload
e.bar.nil!
e.reload
e.bar.nil? # => true
It's kind of like #touch, but it sets the attribute to nil and saves.
Maybe something like:
module ActiveRecord
  class Base
    # Sets the given columns to nil and writes them straight to the database
    # with update_all, following the same structure as ActiveRecord's #touch.
    def nil!(*names)
      unless persisted?
        raise ActiveRecordError, <<-MSG.squish
          cannot nil on a new or destroyed record object. Consider using
          persisted?, new_record?, or destroyed? before nilling
        MSG
      end
      unless names.empty?
        changes = {}
        names.each do |column|
          column = column.to_s
          changes[column] = write_attribute(column, nil)
        end
        primary_key = self.class.primary_key
        scope = self.class.unscoped.where(primary_key => _read_attribute(primary_key))
        if locking_enabled?
          locking_column = self.class.locking_column
          scope = scope.where(locking_column => _read_attribute(locking_column))
          changes[locking_column] = increment_lock
        end
        clear_attribute_changes(changes.keys)
        result = scope.update_all(changes) == 1
        if !result && locking_enabled?
          raise ActiveRecord::StaleObjectError.new(self, "nil")
        end
        @_trigger_update_callback = result
        result
      else
        true
      end
    end
  end
end
Put that in an initializer and it'll let you null out the title of a comment with Comment.last.nil!(:title).
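The reason nil! has to live on the record rather than on the attribute value: a method called on "hello" can't rebind e.bar. Here's a plain-Ruby sketch of that idea with a hypothetical Record class and no database (persisting the change is what the update_all in the initializer above adds). In stock Rails, e.update_column(:bar, nil) also writes NULL straight to the database, skipping validations and callbacks.

```ruby
# Hypothetical stand-in for a model; nothing here touches a database.
class Record
  attr_accessor :bar, :title

  # Reassigning through the owner works; a method called on the value
  # itself ("hello".nil!) could never change what record.bar points to.
  def nil!(*names)
    names.each { |name| public_send("#{name}=", nil) }
    self
  end
end

record = Record.new
record.bar = 'hello'
record.title = 'kept'
record.nil!(:bar)
record.bar.nil?  # => true
record.title     # => "kept"
```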
You can't turn a record into nil and save that, and furthermore, once an object has been created as a particular class you can never change that. It can only be converted by creating a new object, which an in-place modifier like this hypothetical nil! cannot do.
The closest thing you can get is:
e = Foo.new
e.bar = "hello"
e.save
e.reload
e.delete
e.destroyed? # => true
f = Foo.find_by(id: e.id)
f.nil? # => true

In a method, how do I use a default argument in the absence of a second parameter in Ruby?

def alphabetize(arr,rev=false)
  if rev
    arr.sort!{|a,b| b<=>a}
  else
    arr.sort!
  end
  puts arr
end
alphabetize([5,3,8,1],false)
This is code I'm supposed to submit for a Codecademy exercise, but on submission I get the following error:
It looks like your method doesn't default to alphabetizing an array when it doesn't receive a second parameter.
Remove the false argument so that your last line is:
alphabetize([5,3,8,1])
Or this worked for me:
def alphabetize(arr, rev=false)
  arr.sort!
  if rev
    arr.reverse!
  else
    arr
  end
end
numbers = [5,7,2,3]
alphabetize(numbers)
puts numbers
You should puts the method's return value outside the method:
def alphabetize(arr,rev=false)
  if rev
    arr.sort!{|a,b| b<=>a}
  else
    arr.sort!
  end
  arr
end
puts alphabetize(['d', 'c', 'a', 'b'])
If you puts arr as the last line of the method, the method returns nil (the return value of puts), not arr itself. For example:
irb(main):001:0> def test()
irb(main):002:1> puts "hello"
irb(main):003:1> end
=> :test
irb(main):004:0> a = test()
hello
=> nil
irb(main):005:0> a
=> nil
irb(main):006:0> def test()
irb(main):007:1> "hello"
irb(main):008:1> end
=> :test
irb(main):009:0> a = test()
=> "hello"
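Putting both answers together: rely on the default argument and return the array rather than printing it. This sketch swaps sort!/reverse! for the non-destructive sort/reverse so the caller's array is left alone; that's a design choice, not something the exercise requires.

```ruby
# Returns a sorted copy; pass true as the second argument for reverse order.
# Leaving the second argument out falls back to the default (rev = false).
def alphabetize(arr, rev = false)
  sorted = arr.sort
  rev ? sorted.reverse : sorted
end

numbers = [5, 3, 8, 1]
p alphabetize(numbers)        # => [1, 3, 5, 8]
p alphabetize(numbers, true)  # => [8, 5, 3, 1]
p numbers                     # => [5, 3, 8, 1] (unchanged)
```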

Using the BOSS API with Rails

I am trying to use Yahoo's BOSS API with Rails.
Controller:
class WelcomeController < ApplicationController
  def index
    require 'bossman'
    BOSSMan.application_id = "api key"
    boss = BOSSMan::Search.web("prospect park", :count => 5, :filter => "-hate")
    puts "Matches:"
    puts
    boss.results.each { |result| puts "#{result.title} [#{result.url}]" }
  end
end
In my Gemfile I have included:
gem 'gemcutter'
gem 'bossman','~> 0.4.1'
gem 'fakeweb'
gem 'spec'
gem 'activesupport'
When I run the application, I get the following error:
No such file or directory - getaddrinfo
Extracted source (around line #6):
BOSSMan.application_id = ""
boss = BOSSMan::Search.images("brooklyn dumbo", :dimensions => "large") #Line 6
boss.results.map { |result| result.url }
end
After long hours of debugging I finally managed to get the BOSS API to work from Rails. It's not really polished, but everybody should be able to work with it.
This is how:
require 'oauth_util.rb'
require 'net/http'
def get_response(buckets, query_params)
  parsed_url = URI.parse("https://yboss.yahooapis.com/ysearch/#{buckets}")
  o = OauthUtil.new
  o.consumer_key = YAHOO_KEY
  o.consumer_secret = YAHOO_SECRET
  o.sign(parsed_url, query_params)
  Net::HTTP.start(parsed_url.host, parsed_url.port, :use_ssl => true) do |http|
    # Encode q the same way it was encoded when signing
    req = Net::HTTP::Get.new("/ysearch/#{buckets}?format=json&q=#{o.percent_encode(query_params)}")
    req['Authorization'] = o.header
    response = http.request(req)
    response.read_body
  end
end
using my altered version of oauth_util.rb:
# A utility for signing an url using OAuth in a way that's convenient for debugging
# Note: the standard Ruby OAuth lib is here http://github.com/mojodna/oauth
# License: http://gist.github.com/375593
# Usage: see example.rb below
require 'uri'
require 'cgi'
require 'openssl'
require 'base64'
class OauthUtil
  attr_accessor :consumer_key, :consumer_secret, :token, :token_secret, :req_method,
                :sig_method, :oauth_version, :callback_url, :params, :req_url, :base_str
  def initialize
    @consumer_key = ''
    @consumer_secret = ''
    @token = ''
    @token_secret = ''
    @req_method = 'GET'
    @sig_method = 'HMAC-SHA1'
    @oauth_version = '1.0'
    @callback_url = ''
  end
  # openssl::random_bytes returns non-word chars, which need to be removed. using alt method to get length
  # ref http://snippets.dzone.com/posts/show/491
  def nonce
    Array.new( 5 ) { rand(256) }.pack('C*').unpack('H*').first
  end
  # RFC 3986 percent-encoding of everything except unreserved characters
  # (URI.escape, used originally, was removed in Ruby 3.0)
  def percent_encode( string )
    string.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |ch| format('%%%02X', ch.ord) }
  end
  # @ref http://oauth.net/core/1.0/#rfc.section.9.2
  def signature
    key = percent_encode( @consumer_secret ) + '&' + percent_encode( @token_secret )
    # ref: http://blog.nathanielbibler.com/post/63031273/openssl-hmac-vs-ruby-hmac-benchmarks
    digest = OpenSSL::Digest.new( 'sha1' )
    hmac = OpenSSL::HMAC.digest( digest, key, @base_str )
    # ref http://groups.google.com/group/oauth-ruby/browse_thread/thread/9110ed8c8f3cae81
    Base64.encode64( hmac ).chomp.gsub( /\n/, '' )
  end
  # sort (very important as it affects the signature), concat, and percent encode
  # @ref http://oauth.net/core/1.0/#rfc.section.9.1.1
  # @ref http://oauth.net/core/1.0/#9.2.1
  # @ref http://oauth.net/core/1.0/#rfc.section.A.5.1
  def query_string
    pairs = []
    @params.sort.each { | key, val |
      pairs.push( "#{ percent_encode( key ) }=#{ percent_encode( val.to_s ) }" )
    }
    pairs.join '&'
  end
  def header
    'OAuth oauth_version="1.0",oauth_nonce="'+@nonce_now+'",oauth_timestamp="'+@time+'",oauth_consumer_key="'+@consumer_key+'",oauth_signature_method="'+@sig_method+'",oauth_signature="'+percent_encode(signature)+'"'
  end
  # organize params & create signature
  def sign( parsed_url, query_param )
    @time = Time.now.to_i.to_s
    @nonce_now = nonce
    @params = {
      'format' => 'json',
      'oauth_consumer_key' => @consumer_key,
      'oauth_nonce' => @nonce_now,
      'oauth_signature_method' => @sig_method,
      'oauth_timestamp' => @time,
      'oauth_version' => @oauth_version,
      'q' => query_param
    }
    # if url has query, merge key/values into params obj overwriting defaults
    #if parsed_url.query
    #  @params.merge! CGI.parse( parsed_url.query )
    #end
    # @ref http://oauth.net/core/1.0/#rfc.section.9.1.2
    @req_url = parsed_url.scheme + '://' + parsed_url.host + parsed_url.path
    # create base str. make it an object attr for ez debugging
    # ref http://oauth.net/core/1.0/#anchor14
    @base_str = [
      @req_method,
      percent_encode(req_url),
      # normalization is just x-www-form-urlencoded
      percent_encode(query_string)
    ].join( '&' )
    # add signature
    return self
  end
end
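The heart of OauthUtil is the signature base string and the HMAC-SHA1 signature (RFC 5849, sections 3.4.1 and 3.4.2). Here's a stripped-down sketch of just those two steps, with hypothetical helper names, that you can try without hitting Yahoo; it doesn't build the Authorization header, and the token secret is only an optional argument.

```ruby
require 'openssl'

# RFC 3986 percent-encoding: everything except unreserved characters
def oauth_encode(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |ch| format('%%%02X', ch.ord) }
end

# Base string: METHOD & encoded(url) & encoded(sorted, encoded params)
def signature_base_string(method, url, params)
  normalized = params.map { |k, v| [oauth_encode(k), oauth_encode(v)] }
                     .sort
                     .map { |k, v| "#{k}=#{v}" }
                     .join('&')
  [method.upcase, oauth_encode(url), oauth_encode(normalized)].join('&')
end

# HMAC-SHA1 over the base string, keyed by "consumer_secret&token_secret"
def hmac_sha1_signature(base_string, consumer_secret, token_secret = '')
  key = "#{oauth_encode(consumer_secret)}&#{oauth_encode(token_secret)}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), key, base_string)
  [digest].pack('m0') # strict Base64, no newlines
end

base = signature_base_string('get', 'https://example.com/path', { 'q' => 'a b', 'format' => 'json' })
# => "GET&https%3A%2F%2Fexample.com%2Fpath&format%3Djson%26q%3Da%2520b"
```

Note the double encoding of the space (%2520): the parameters are encoded once when normalized and again when the whole normalized string is encoded.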
You could try requiring the socket library at the top of your controller:
require 'socket'
This looks like a socket issue with BOSS.

How can I change data collection from hash to array?

I'm now fetching data from another URL.
Here is my code:
require 'rubygems'
require 'nokogiri'
html = page.body
doc = Nokogiri::HTML(html)
doc.encoding = 'utf-8'
rows = doc.search('//table[@id = "MainContent_GridView1"]//tr')
@details = rows.collect do |row|
  detail = {}
  [
    [:car, 'td[1]/text()'],
    [:article, 'td[2]/text()'],
    [:group, 'td[3]/text()'],
    [:price, 'td[4]/text()'],
  ].each do |name, xpath|
    detail[name] = row.at_xpath(xpath).to_s.strip
  end
  detail
end
@details
I tried to do it with an array instead of a hash, but I get a lot of errors.
Any ideas?
I need it for another method.
I also put this data (the resulting hashes) into another variable here:
oem_art = []
@constr_num.each do |o|
  as_oem = get_from_as_oem(o.ARL_SEARCH_NUMBER)
  if as_oem.present?
    oem_art << as_oem
  end
end
@oem_art = oem_art.to_a.uniq
Do you just want to change a hash into an array? If so, just use the to_a method on your hash.
hash = {:a => "something", :b => "something else"}
array = hash.to_a
array.inspect #=> [[:a, "something"], [:b, "something else"]]
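If what you actually want is each row as a plain array of values (without the keys), Hash#values returns them in insertion order, and Hash#values_at lets you fix the column order explicitly. The row data below is made up for illustration:

```ruby
# Hypothetical row, shaped like the hashes the scraper builds
detail = { car: 'Audi', article: '123', group: 'Filters', price: '9.99' }

# Values only, in insertion order
row = detail.values                       # => ["Audi", "123", "Filters", "9.99"]
# Values in an explicit column order
ordered = detail.values_at(:price, :car)  # => ["9.99", "Audi"]

# For a list of such hashes: one array per row
details = [detail, { car: 'BMW', article: '456', group: 'Brakes', price: '19.50' }]
rows = details.map(&:values)
```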
It looks like you're looking for something like the question "hash['key'] to hash.key in Ruby".
The Hash class doesn't support .key notation by default; OpenStruct creates an object from the hash so you can use dot notation to access its properties. Overall it's basically just syntactic sugar, with some overhead.
Suggested code (from the linked answer):
>> require 'ostruct'
=> []
>> foo = {'bar'=>'baz'}
=> {"bar"=>"baz"}
>> foo_obj = OpenStruct.new foo
=> #<OpenStruct bar="baz">
>> foo_obj.bar
=> "baz"
So in your example, you could do:
# Initialised somewhere
require 'ostruct'
DETAIL_INDICES = {
  :car => 1,
  :article => 2,
  :group => 3,
  :price => 4,
}
# ** SNIP **
@details = rows.map do |row|
  DETAIL_INDICES.inject({}) do |h, (k, v)|
    h.merge(k => row.at_xpath("td[#{v}]/text()").to_s.strip)
  end
end.collect { |hash| OpenStruct.new hash }
@details.each do |item|
  puts item.car
end
Of course, if performance is a concern you can merge the map and the collect into a single pass (the two methods are aliases). I've kept them separate only to reflect the semantic difference, although I usually use map everywhere for consistency, so feel free to choose for yourself :)
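Since the set of keys is fixed, a Struct is a lighter-weight alternative to OpenStruct: it still gives you item.car, rejects unknown keys, and Struct#to_a hands you the values as an array. A sketch, assuming Ruby 2.5+ for keyword_init and reusing the key names from above (the Detail constant and the sample values are hypothetical):

```ruby
# Hypothetical fixed set of columns, matching the keys used above
Detail = Struct.new(:car, :article, :group, :price, keyword_init: true)

row = { car: 'Audi', article: '123', group: 'Filters', price: '9.99' }
item = Detail.new(**row)

item.car   # => "Audi"
item.to_a  # => ["Audi", "123", "Filters", "9.99"]
```

An unknown key (say, a typo in the column name) raises ArgumentError instead of silently creating a new attribute, which OpenStruct would do.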
EDIT: the additional code from your edit, simplified:
@oem_art = @constr_num.map do |item|
  get_from_as_oem(item.ARL_SEARCH_NUMBER)
end.select(&:present?).uniq
puts @oem_art

How to properly handle changed attributes in a Rails before_save hook?

I have a model that looks like this:
class StopWord < ActiveRecord::Base
  UPDATE_KEYWORDS_BATCH_SIZE = 1000
  before_save :update_keywords
  def update_keywords
    offset = 0
    max_id = ((max_kw = Keyword.first(:order => 'id DESC')) and max_kw.id) || 0
    while offset <= max_id
      begin
        conditions = ['id >= ? AND id < ? AND language = ? AND keyword RLIKE ?',
                      offset, offset + UPDATE_KEYWORDS_BATCH_SIZE, language]
        # Clear keywords that matched the old stop word
        if @changed_attributes and (old_stop_word = @changed_attributes['stop_word']) and not @new_record
          Keyword.update_all 'stopword = 0', conditions + [old_stop_word]
        end
        Keyword.update_all 'stopword = 1', conditions + [stop_word]
      rescue Exception => e
        logger.error "Skipping batch of #{UPDATE_KEYWORDS_BATCH_SIZE} keywords at offset #{offset}"
        logger.error "#{e.message}: #{e.backtrace.join "\n "}"
      ensure
        offset += UPDATE_KEYWORDS_BATCH_SIZE
      end
    end
  end
end
This works just fine, as the unit tests show:
class KeywordStopWordTest < ActiveSupport::TestCase
  def test_stop_word_applied_on_create
    kw = Factory.create :keyword, :keyword => 'foo bar baz', :language => 'en'
    assert !kw.stopword, 'keyword is not a stop word by default'
    sw = Factory.create :stop_word, :stop_word => kw.keyword.split(' ')[1], :language => kw.language
    kw.reload
    assert kw.stopword, 'keyword is a stop word'
  end
  def test_stop_word_applied_on_save
    kw = Factory.create :keyword, :keyword => 'foo bar baz', :language => 'en', :stopword => true
    sw = Factory.create :keyword_stop_word, :stop_word => kw.keyword.split(' ')[1], :language => kw.language
    sw.stop_word = 'blah'
    sw.save
    kw.reload
    assert !kw.stopword, 'keyword is not a stop word'
  end
end
But mucking with the @changed_attributes instance variable just feels wrong. Is there a standard Rails-y way to get the old value of an attribute that is being modified on a save?
Update: Thanks to Douglas F Shearer and Simone Carletti (who apparently prefers Murphy's to Guinness), I have a cleaner solution:
def update_keywords
  offset = 0
  max_id = ((max_kw = Keyword.first(:order => 'id DESC')) and max_kw.id) || 0
  while offset <= max_id
    begin
      conditions = ['id >= ? AND id < ? AND language = ? AND keyword RLIKE ?',
                    offset, offset + UPDATE_KEYWORDS_BATCH_SIZE, language]
      # Clear keywords that matched the old stop word
      if stop_word_changed? and not new_record?
        Keyword.update_all 'stopword = 0', conditions + [stop_word_was]
      end
      Keyword.update_all 'stopword = 1', conditions + [stop_word]
    rescue StandardError => e
      logger.error "Skipping batch of #{UPDATE_KEYWORDS_BATCH_SIZE} keywords at offset #{offset}"
      logger.error "#{e.message}: #{e.backtrace.join "\n "}"
    ensure
      offset += UPDATE_KEYWORDS_BATCH_SIZE
    end
  end
end
Thanks, guys!
You want ActiveModel::Dirty.
Examples:
person = Person.find_by_name('Uncle Bob')
person.changed? # => false
person.name = 'Bob'
person.changed? # => true
person.name_changed? # => true
person.name_was # => 'Uncle Bob'
person.name_change # => ['Uncle Bob', 'Bob']
Full documentation: http://api.rubyonrails.org/classes/ActiveModel/Dirty.html
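To see the idea behind the *_was / *_changed? methods, here is a toy plain-Ruby sketch. TinyDirty and tracked_attribute are hypothetical names; this is not how ActiveModel::Dirty is implemented, only an illustration of remembering the original value on the first write:

```ruby
# Hypothetical module, NOT ActiveModel::Dirty's implementation.
module TinyDirty
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Defines a reader, a tracking writer, name_was and name_changed?
    def tracked_attribute(name)
      define_method(name) { instance_variable_get("@#{name}") }

      define_method("#{name}=") do |value|
        originals = (@original_values ||= {})
        # Remember the value from before the first write only
        originals[name] = public_send(name) unless originals.key?(name)
        instance_variable_set("@#{name}", value)
      end

      define_method("#{name}_was") do
        (@original_values || {}).fetch(name) { public_send(name) }
      end

      define_method("#{name}_changed?") do
        public_send("#{name}_was") != public_send(name)
      end
    end
  end
end

class Person
  include TinyDirty
  tracked_attribute :name

  def initialize(name)
    @name = name # bypasses tracking, like loading a record from the DB
  end
end

person = Person.new('Uncle Bob')
person.name_changed?  # => false
person.name = 'Bob'
person.name_changed?  # => true
person.name_was       # => "Uncle Bob"
```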
You're using the right feature but the wrong API.
You should use #changes and #changed?.
See this article and the official API.
Two additional notes about your code:
Never rescue Exception when you actually want to rescue runtime errors; that's a Java-style habit. Rescue StandardError instead: the classes outside StandardError (such as ScriptError, NoMemoryError, SignalException and SystemExit) normally signal compilation or system-level problems.
If the rescue should cover the whole method body, you don't need an explicit begin block:
def update_keywords
...
rescue => e
...
ensure
...
end
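To see why a bare rescue is usually what you want: rescue => e is shorthand for rescue StandardError => e, so ordinary runtime errors are caught while exceptions outside StandardError keep propagating. A small sketch with a hypothetical attempt helper:

```ruby
# A bare `rescue => e` is shorthand for `rescue StandardError => e`.
def attempt
  yield
  :ok
rescue => e
  "rescued #{e.class}"
end

attempt { raise ArgumentError, 'bad input' }  # => "rescued ArgumentError"
attempt { Integer('not a number') }           # => "rescued ArgumentError"

# NotImplementedError inherits from ScriptError, not StandardError, so the
# bare rescue inside attempt does not catch it; it reaches this outer rescue.
begin
  attempt { raise NotImplementedError, 'todo' }
rescue NotImplementedError => e
  puts "escaped attempt: #{e.class}"
end
```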
