Given a URL, how can I get just the domain? - ruby-on-rails

Given URLs like:
http://online.wsj.com/
http://online.wsj.com/article/SB10001424052970204409004577158764211274708.html
http://www.techcrunch.com/2012/01/13/techcrunch-coo/
Using Ruby/Rails, how can I return just the domain?
online.wsj.com
online.wsj.com
techcrunch.com
No protocol, no slashes: just the subdomain (if it isn't www), the domain, and the extension.

Use Addressable::URI.parse (from the addressable gem) and the #host instance method:
require 'addressable/uri'
Addressable::URI.parse("http://techcrunch.com/foo/bar").host #=> "techcrunch.com"

Be aware that if you have a URL without http://, URI.parse returns nil for the host:
require 'uri'
url = "www.techcrunch.com/2012/01/13/techcrunch-coo/"
p URI.parse(url).host # nil
So something like this should be a safer solution:
require 'uri'
url = "www.techcrunch.com/2012/01/13/techcrunch-coo/"
url = 'http://' + url unless url =~ %r{\Ahttps?://} # also leave https:// URLs untouched
puts URI.parse(url).host
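The question also asks for the www prefix to be dropped, which #host alone does not do. A minimal sketch combining the scheme fix above with that step; the helper name domain_for is hypothetical, and it assumes that only a leading "www." should be removed:

```ruby
require 'uri'

# Hypothetical helper (not part of any library): returns the host of a URL
# without a leading "www.", or nil if no host can be parsed.
def domain_for(url)
  # Prepend a scheme when none is present so that URI.parse sees a host.
  url = "http://#{url}" unless url =~ %r{\A[a-z][a-z0-9+.-]*://}i
  host = URI.parse(url).host
  host && host.sub(/\Awww\./i, '')
end

puts domain_for("http://online.wsj.com/")
puts domain_for("http://www.techcrunch.com/2012/01/13/techcrunch-coo/")
puts domain_for("www.techcrunch.com/2012/01/13/techcrunch-coo/")
```

Other subdomains such as online. are kept as they are, which matches the outputs requested in the question.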

pry(main)> require 'uri'
pry(main)> url = "http://www.techcrunch.com/2012/01/13/techcrunch-coo?param1=foo&param2=bar"
pry(main)> URI.parse(url).host
=> "www.techcrunch.com"

>> require 'uri'
>> URI.parse("http://www.techcrunch.com/2012/01/13/techcrunch-coo/").host
=> "www.techcrunch.com"

Related

Use Hash value on url with Nokogiri or RestClient

I have a url like :
http://172.0.0.1:22230/test.action?data={"foo":"bar","joe":"doe"}&sign=x6das
In my browser I can get data from that URL, but if I use Nokogiri
Nokogiri::HTML(open('http://172.0.0.1:22230/test.action?data={"foo":"bar","joe":"doe"}&sign=x6das'))
I get
URI::InvalidURIError: bad URI(is not URI?): http://172.0.0.1:22230/test.action?data={"foo":"bar","joe":"doe"}&sign=x6das
from /home/worka/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/common.rb:176:in `split'
Also with RestClient
RestClient.get 'http://172.0.0.1:22230/test.action?data={"foo":"bar","joe":"doe"}&sign=x6das'
I get the same error.
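Encode the data value first, then build the URL with it. (Escaping the whole URL with CGI::escape would also escape the "http://" part.)
require 'cgi'
data = CGI.escape('{"foo":"bar","joe":"doe"}')
url = "http://172.0.0.1:22230/test.action?data=#{data}&sign=x6das"
Nokogiri::HTML(open(url))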
When dealing with URIs, it's a good idea to use the tools designed for them such as URI, which comes with Ruby.
The URI can't be
http://172.0.0.1:22230/test.action?data={"foo":"bar","joe":"doe"}&sign=x6das
because the data component is invalid. If you are adding data then I'd start with:
require 'uri'
uri = URI.parse('http://172.0.0.1:22230/test.action?sign=x6das')
query = URI.decode_www_form(uri.query).to_h # => {"sign"=>"x6das"}
data = {"foo" => "bar","joe" => "doe"}
uri.query = URI.encode_www_form(query.merge(data)) # => "sign=x6das&foo=bar&joe=doe"
uri.to_s # => "http://172.0.0.1:22230/test.action?sign=x6das&foo=bar&joe=doe"
Your initial example using {"foo":"bar","joe":"doe"} is JSON serialized data, which usually isn't passed in a URL like that. If you need to create JSON, start with the initial hash:
require 'json'
data = {"foo" => "bar","joe" => "doe"}
data.to_json # => "{\"foo\":\"bar\",\"joe\":\"doe\"}"
to_json serializes the hash into a string, which could then be encoded into the URI:
data = {"foo" => "bar","joe" => "doe"}
uri = URI.parse('http://172.0.0.1:22230/test.action?sign=x6das')
query = URI.decode_www_form(uri.query).to_h # => {"sign"=>"x6das"}
uri.query = URI.encode_www_form(query.merge('data' => data.to_json)) # => "sign=x6das&data=%7B%22foo%22%3A%22bar%22%2C%22joe%22%3A%22doe%22%7D"
But again, sending encoded JSON as a query parameter in the URI is not very common or standard, since the data payload is smaller without the JSON encoding.
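If the JSON does end up in a query parameter, the receiving side can undo the two steps in reverse order. A rough sketch of that round trip, reusing the URL and hash from the example above; the variable names params and decoded are only illustrative:

```ruby
require 'uri'
require 'json'

data = { "foo" => "bar", "joe" => "doe" }
uri = URI.parse('http://172.0.0.1:22230/test.action?sign=x6das')

# Sending side: serialize the hash to JSON, then form-encode it into the query.
query = URI.decode_www_form(uri.query).to_h
uri.query = URI.encode_www_form(query.merge('data' => data.to_json))

# Receiving side: form-decode the query, then parse the JSON value.
params = URI.decode_www_form(uri.query).to_h
decoded = JSON.parse(params['data'])

puts decoded.inspect
```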
OK, I solved my problem:
require 'uri'
url = 'http://172.0.0.1:22230/test.action?data={"foo":"bar","joe":"doe"}&sign=x6das'
RestClient.get(URI.encode(url.strip)) # note: URI.encode was removed in Ruby 3.0

Ruby hexdigest sha1 pack('H*') string encoding...

I've run into an encoding problem. There are no errors in the console, but the output is not encoded correctly.
I must use Digest::SHA1.hexdigest on a string and then pack the result.
The example below should output '{´p)ODýGΗ£Iô8ü:iÀ', but it outputs '{?p)OD?GΗ?I?8?:i?' in the console and '{�p)OD�G^BΗ�I�8^D�:i�' in the log file.
So my variable called pack equals '{?p)OD?GΗ?I?8?:i?' and not '{´p)ODýGΗ£Iô8ü:iÀ'. That's a big problem. I'm doing this in a Rails task.
Any ideas?
Thanks
# encoding: utf-8
require 'digest/sha1'
namespace :my_app do
  namespace :check do
    desc "Description"
    task :weather => :environment do
      hexdigest = Digest::SHA1.hexdigest('29d185d98c984a359e6e6f26a0474269partner=100043982026&code=34154&profile=large&filter=movie&striptags=synopsis%2Csynopsisshort&format=json&sed=20130527')
      pack = [hexdigest].pack("H*")
      puts pack # => {?p)OD?GΗ?I?8?:i?
      puts '{´p)ODýGΗ£Iô8ü:iÀ' # => {´p)ODýGΗ£Iô8ü:iÀ
    end
  end
end
This is what I did (my conversion from PHP to Ruby)
# encoding: utf-8
require 'open-uri'
require 'base64'
require 'digest/sha1'
class Allocine
  $_api_url = 'http://api.allocine.fr/rest/v3'
  $_partner_key
  $_secret_key
  $_user_agent = 'Dalvik/1.6.0 (Linux; U; Android 4.2.2; Nexus 4 Build/JDQ39E)'
  def initialize(partner_key, secret_key)
    $_partner_key = partner_key
    $_secret_key = secret_key
  end
  def get(id)
    # build the params
    params = { 'partner' => $_partner_key,
               'code' => id,
               'profile' => 'large',
               'filter' => 'movie',
               'striptags' => 'synopsis,synopsisshort',
               'format' => 'json' }
    # do the request
    response = _do_request('movie', params)
    return response
  end
  private
  def _do_request(method, params)
    # build the URL
    query_url = $_api_url + '/' + method
    # new algo to build the query
    http_build_query = Rack::Utils.build_query(params)
    sed = DateTime.now.strftime('%Y%m%d')
    sig = URI::encode(Base64.encode64(Digest::SHA1.digest($_secret_key + http_build_query + '&sed=' + sed)))
    return sig
  end
end
Then call
allocine = Allocine.new(ALLOCINE_PARTNER_KEY, ALLOCINE_SECRET_KEY)
puts allocine.get('any ID')
The get method returns 'e7RwKU9E%2FUcCzpejSfQ4BPw6acA%3D' in PHP and 'cPf6I4ZP0qHQTSVgdKTbSspivzg=%0A' in Ruby...
Thanks again
I think this "encoding" issue turned up while debugging other parts of a conversion from PHP to Ruby. The target API that consumes a digest of the params looks like it will accept a signature constructed in Ruby as follows (edit: this is a guess; there may also be relevant differences between Ruby and PHP in URI encoding and Base64 defaults):
require 'digest/sha1'
require 'base64'
require 'uri'
sig_data = 'edhefhekjfhejk8edfefefefwjw69partne...'
sig = URI.encode( Base64.encode64( Digest::SHA1.digest( sig_data ) ) )
=> "+ZabHg22Wyf7keVGNWTc4sK1ez4=%0A"
The exact construction of sig_data from the parameters being signed is also important. It is generated by the PHP function http_build_query, and I do not know what order or escaping that applies to the input params. If your Ruby version puts them in a different order, or escapes them differently from PHP, the signature will be wrong (edit: it is actually possible that we are looking for a signature on the exact query string sent to the API; I don't know). Is it possibly an issue of that sort that led you down the rabbit hole of how the signature is constructed?
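On the original symptom: [hexdigest].pack('H*') produces the same 20 raw bytes as Digest::SHA1.digest, and that string is binary (ASCII-8BIT) rather than UTF-8 text, so a terminal or log file will generally not render the non-ASCII bytes as readable characters. A small sketch of this, using a made-up input string instead of the real secret and query:

```ruby
require 'digest/sha1'

# Made-up input; the real one would be the secret key plus the query string.
input = 'hypothetical-secret' + 'partner=123&code=456'

packed = [Digest::SHA1.hexdigest(input)].pack('H*')
raw    = Digest::SHA1.digest(input)

puts packed == raw    # the two strings hold identical bytes
puts packed.encoding  # binary, not UTF-8
puts packed.bytesize  # a SHA-1 digest is 20 bytes long
```

Comparing bytes (or the hex or Base64 form) is therefore a more reliable check than comparing what gets printed.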
Thank you guys for your help.
Problem is solved. With the following code I obtain exactly the same string as with PHP:
http_build_query = Rack::Utils.build_query(params)
sed = DateTime.now.strftime('%Y%m%d')
sig = CGI::escape(Base64.strict_encode64(Digest::SHA1.digest($_secret_key + http_build_query + '&sed=' + sed)))
I now have another problem, for which I opened a new question.
Thank you very much.
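For readers wondering why the two Ruby versions differ: Base64.encode64 appends a newline to its output (which is where the trailing %0A in the earlier result came from), while Base64.strict_encode64 does not, and CGI.escape percent-encodes characters such as "+", "/" and "=". A short sketch of both effects, again with a made-up input string:

```ruby
require 'base64'
require 'cgi'
require 'digest/sha1'

# Made-up input; the real one would be the secret key plus the query string.
digest = Digest::SHA1.digest('hypothetical-secret' + 'partner=123&code=456')

loose  = Base64.encode64(digest)        # ends with "\n"
strict = Base64.strict_encode64(digest) # no trailing newline

puts loose == strict + "\n"
puts CGI.escape("a+b/c=") # => a%2Bb%2Fc%3D

# Signature in the same shape as the working code above.
sig = CGI.escape(strict)
puts sig
```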

creating a helper method to remove http or https from the beginning of a url in rails

I'm accepting user input in the form of web links, e.g.: http://google.com
In my database I don't want to store the http:// prefix, or https://. I was going to do a string search at the beginning of the URL for those two things.
I feel like this is something rails/ruby might do out of the box, does anyone know of such a function?
From the documentation:
require 'uri'
uri = URI("http://foo.com/posts?id=30&limit=5#time=1305298413")
uri.scheme #=> "http"
uri.host #=> "foo.com"
uri.path #=> "/posts"
uri.query #=> "id=30&limit=5"
uri.fragment #=> "time=1305298413"
Or,
require 'uri'
URI.split("http://www.ruby-lang.org/")
# => ["http", nil, "www.ruby-lang.org", nil, nil, "/", nil, nil, nil]
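As far as I know there is no built-in method that returns a URL minus its scheme, so a small helper is still needed. A minimal sketch, assuming only the http:// and https:// prefixes should be removed; the name strip_scheme is hypothetical:

```ruby
# Hypothetical helper: removes a leading "http://" or "https://"
# (case-insensitive) and leaves everything else untouched.
def strip_scheme(url)
  url.sub(%r{\Ahttps?://}i, '')
end

puts strip_scheme("http://google.com")
puts strip_scheme("https://google.com/search?q=ruby")
puts strip_scheme("google.com")
```

In a Rails app this could live in a helper module or in a before_validation callback on the model, depending on where the normalization should happen.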

Rails Facebook avatar to data-uri

I'm trying to pull a Facebook avatar via auth. Here's what I'm doing:
def image_uri
  require 'net/http'
  image = URI.parse(params[:image]) # https://graph.facebook.com/565515262/picture
  fetch = Net::HTTP.get_response(image)
  based = 'data:image/jpg;base64,' << Base64.encode64(fetch)
  render :text => based
end
I'm getting the following error (new error — edited):
Connection reset by peer
I've tried googling about, I can't seem to get a solution, any ideas?
I'm basically looking for the exact functioning of PHP's file_get_contents()
Try escaping the URI before parsing:
URI.parse URI.escape(params[:image])
Make sure that params[:image] does contain the uri you want to parse... I would instead pass the userid and interpolate it into the uri.
URI.parse(URI.escape("https://graph.facebook.com/#{params[:image]}/picture"))
Does it throw the same error when you use a static string "https://graph.facebook.com/565515262/picture"?
What does it say when you do
render :text => params[:image]
If neither of the above answers your question, then please try specifying the use of HTTPS:
require 'net/http'
uri = URI('https://secure.example.com/some_path?query=string')
Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
  request = Net::HTTP::Get.new uri.request_uri
  response = http.request request # Net::HTTPResponse object
end
Presuming you are on Ruby < 1.9.3, you will also have to
require 'net/https'
If you are on Ruby 1.9.3 you don't have to do anything.
Edit
If you are on the latest version, you can simply do:
open(params[:image]) # http://graph.facebook.com/#{@user.facebook_id}/picture
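Putting the pieces together, here is a rough sketch of the fetch-and-encode step outside of a controller. Both helper names are hypothetical. It assumes the response body (not the response object) is what should be Base64-encoded, uses strict_encode64 so the data URI contains no line breaks, and does not follow redirects, which the real endpoint may require:

```ruby
require 'base64'
require 'net/http'
require 'uri'

# Hypothetical helper: builds a data URI from raw bytes.
def to_data_uri(bytes, content_type = 'image/jpeg')
  "data:#{content_type};base64,#{Base64.strict_encode64(bytes)}"
end

# Hypothetical helper: fetches a URL over HTTP or HTTPS and returns the body.
# Redirects are not followed in this sketch.
def fetch_body(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri.request_uri)).body
  end
end

puts to_data_uri("hi", "text/plain")
```

In the controller action this would be used roughly as to_data_uri(fetch_body(params[:image])).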

URL substring matching regular expression?

Suppose I only want the user to type in a URL that starts with http://www.google.com
What is the regular expression for this?
Thanks!
Just take the substring from 0 to the length of http://www.google.com, compare it with that prefix, and you're done.
Rather than use a regex, you might want to consider using the URI library that comes with Ruby. It's made to take apart and build URLs, is well tested, and less error-prone than trying to reinvent the same functionality.
require 'uri'
url = URI.parse('http://www.google.com/path/to/page.html?a=1&b=2')
url.scheme # => "http"
url.host # => "www.google.com"
url.path # => "/path/to/page.html"
url.query # => "a=1&b=2"
If that's not good enough, the Addressable gem (Addressable::URI) is even more capable.
Try this:
/\Ahttp:\/\/www\.google\.com(.*)?\Z/
ruby-1.9.2-p0 > "http://www.google.com" =~ /\Ahttp:\/\/www\.google\.com(.*)?\Z/
=> 0
ruby-1.9.2-p0 > "http://www.google.com/foobar" =~ /\Ahttp:\/\/www\.google\.com(.*)?\Z/
=> 0
ruby-1.9.2-p0 > $1
=> "/foobar"
Ruby has a convenient String#start_with? method for this (Rails adds starts_with? as an alias). If it's just a static string, no regular expression is needed.
url.start_with?("http://www.google.com")
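One thing to keep in mind with a plain prefix check is that it also accepts hosts that merely begin with the same characters. If the intent is "this URL is on www.google.com", comparing the parsed scheme and host may be closer to it. A sketch under that assumption; the method name google_url? and the example.test host are made up:

```ruby
require 'uri'

# Hypothetical helper: true only when the URL parses and its scheme and
# host match exactly.
def google_url?(url)
  uri = URI.parse(url)
  uri.scheme == 'http' && uri.host == 'www.google.com'
rescue URI::InvalidURIError
  false
end

lookalike = "http://www.google.com.example.test/"

puts google_url?("http://www.google.com/foobar")
puts lookalike.start_with?("http://www.google.com")
puts google_url?(lookalike)
```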
