Get URL headers without the HTML - ruby-on-rails

A bit of a strange question: is there a way to ask a web server to return only the headers and not the HTML itself?
I want to request a URL, check that it's valid (not 404/500/etc.) and follow any redirections, but without downloading the actual HTML content.
Thanks
Preferably a way to do this in Ruby.

Use HEAD instead of GET or POST.
See section 9.4 of http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html

As suggested, check the Net::HTTP library:
require 'net/http'
Net::HTTP.new('www.twitter.com').request_head('/').class
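Net::HTTP does not follow redirects by itself, so the "follow the redirections" part of the question needs a small loop. Here is a minimal sketch of that idea. The helper name `head_following_redirects`, the `limit:` default and the `fetch:` injection point are my own assumptions (the latter only exists so the loop can be exercised without a network); they are not part of Net::HTTP.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: send HEAD requests and follow up to `limit` redirects.
# Returns the final response, or nil when the redirect limit is exhausted.
# `fetch` is an assumption of this sketch: any callable that takes a URI and
# returns a Net::HTTPResponse. By default it performs a real HEAD request.
def head_following_redirects(url, limit: 5, fetch: nil)
  fetch ||= lambda do |uri|
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.head(uri.request_uri) # headers only, no body is transferred
    end
  end

  uri = URI(url)
  limit.times do
    response = fetch.call(uri)
    location = response['location']
    return response unless response.is_a?(Net::HTTPRedirection) && location

    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, location)
  end
  nil
end
```

Something like `head_following_redirects('http://www.twitter.com/')&.code` would then give the status code of the final hop, or nil if the chain is too long.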

This is exactly what HEAD HTTP method does.
For Ruby, there is a beautiful gem, much simpler than the low-level net/http, that lets you perform HEAD requests.
gem install rest-open-uri
then
irb> require 'rubygems'
=> true
irb> require 'rest-open-uri'
=> true
irb> sio = open("http://stackoverflow.com", :method => :head)
=> #
irb> sio.meta
=> {"expires"=>"Tue, 30 Nov 2010 18:08:47 GMT", "last-modified"=>"Tue, 30 Nov 2010 18:07:47 GMT", "content-type"=>"text/html; charset=utf-8", "date"=>"Tue, 30 Nov 2010 18:08:27 GMT", "content-length"=>"193779", "cache-control"=>"public, max-age=18", "vary"=>"*"}
irb> sio.status
=> ["200", "OK"]
It follows redirections. You have to rescue SocketError when the host doesn't exist, or OpenURI::HTTPError when the file doesn't exist.
If you want something more powerful, have a look at Mechanize or HTTParty.
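The two rescues mentioned above can be wrapped once so that callers just get true or false. This is only a sketch: `url_ok?` is a hypothetical name, and it takes a block so it does not care which library actually performs the request.

```ruby
require 'open-uri' # defines OpenURI::HTTPError
require 'socket'   # defines SocketError

# Hypothetical wrapper: runs the block that performs the request and maps the
# two failure modes named above (unknown host, HTTP error status) to false.
def url_ok?
  yield
  true
rescue SocketError, OpenURI::HTTPError
  false
end
```

With the gem from this answer, the call would presumably look like `url_ok? { open("http://stackoverflow.com", :method => :head) }`.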

Use Ruby's net/http and the HEAD method that Mak mentioned. Check ri Net::HTTP#head from the command line for info.

Actually, I had to fold pantulis' answer into my own. It seems there are two kinds of URLs and neither function worked alone, so I did:
require 'net/http'
require 'uri'
module URI
  def self.online?(uri)
    URI.exists?(uri)
  end
  def self.exists?(uri)
    URI.exists_ver1?(uri)
  end
  # First attempt: treat the input as a bare host name and HEAD its root path.
  def self.exists_ver1?(url)
    @url = url
    ["http://", "https://"].each do |prefix|
      url = url.gsub(prefix, "")
    end
    begin
      code = Net::HTTP.new(url).request_head('/').code
      [2, 3].include?(code.to_i / 100)
    rescue
      URI.exists_ver2?(@url)
    end
  end
  # Fallback: parse the full URL and HEAD its path.
  def self.exists_ver2?(url)
    url = "http://#{url}" if URI.parse(url).scheme.nil?
    uri = URI(url)
    return false unless uri.is_a?(URI::HTTP)
    request = Net::HTTP.new uri.host
    # An empty path is not a valid request target, so fall back to '/'
    response = request.request_head(uri.path.empty? ? '/' : uri.path)
    # HTTP status codes in the 200s and 300s are ok, everything else is an error
    [2, 3].include? response.code.to_i / 100
  rescue
    false
  end
end
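As an alternative to the integer division on the status code, Net::HTTP already groups responses into classes, so the same "2xx or 3xx" check can be written against those. A small sketch (the method name `success_or_redirect?` is made up for illustration):

```ruby
require 'net/http'

# Hypothetical predicate: true for 2xx and 3xx responses, false otherwise.
# Net::HTTPSuccess and Net::HTTPRedirection are the stdlib parent classes
# of the individual 2xx and 3xx response classes.
def success_or_redirect?(response)
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
end
```

It takes whatever `request_head` returns, e.g. `success_or_redirect?(Net::HTTP.new(host).request_head('/'))`.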

Related

How to run cURL commands in Rails

I'm using Ruby on Rails 5 and I need to execute the following command in my application:
curl -F 'client_id=126581840734567' -F 'client_secret=678ebe1b3b8081231aab27dff738313' -F 'grant_type=authorization_code' -F 'redirect_uri=https://uri.com/' -F 'code=AQBi4L2Ohy3Q_N3V48OygFm0zb3gEsL985x5TIyDTNDJaLs93BwXiT1tyGYWoCg1HlBDU7ZRjUfLL5HVlzw4G-7YkVEjp6Id2WuqOz0Ylt-k2ADwDC5upH3CGVtHgf2udQhLlfDnQz5NPsnmxjg4bW3PJpW5FaQs8fn1ztgYp-ssfAf6IRt2-sI45ZC8cqqr5K_12y0Nq_Joh0H-tTfVyNLKatIxHPCqRDb3tfqgmxim1Q' https://api.instagram.com/oauth/access_token
so that it returns something like:
{"access_token": "IGQVJYS0k8V6ZACRC10WjYxQWtyMVRZAN8VXamh0RVBZAYi34RkFlOUxXZnTJsbjlEfnFJNmprQThmQ4hTckpFUmJEaXZAnQlNYa25aWURnX3hpO12NV1VMWDNMWmdIT3FicnJfZAVowM3VldlVWZAEViN1ZAidHlyU2VDMUNuMm2V", "user_id": 17231445640157812}
Is there a way to make Rails execute those types of commands? I was trying the following:
uri = URI.parse('https://api.instagram.com/oauth/access_token')
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new(uri.request_uri)
request.set_form_data({
"client_id" => "126581840734567",
"client_secret" => "678ebe1b3b8081231aab27dff738313",
"grant_type" => "authorization_code",
"redirect_uri" => "http://nace.network/",
"code" => params[:code]
})
res = Net::HTTP.start(uri.hostname, uri.port) do |http|
http.request(request)
end
but I get the following error:
end of file reached
in this line:
res = Net::HTTP.start(uri.hostname, uri.port) do |http|
http.request(request)
end
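You're using HTTPS, so you need to pass use_ssl: true (and assign the result of the block, since a variable first assigned inside the block is not visible outside it):
res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
http.request(request)
end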
But if you don't need persistent connections, you could also use this:
res = Net::HTTP.post_form(uri,
"client_id" => "126581840734567",
"client_secret" => "678ebe1b3b8081231aab27dff738313",
"grant_type" => "authorization_code",
"redirect_uri" => "http://nace.network/",
"code" => params[:code]
)
Also, you could consider using a library like Faraday, which is a lot easier to deal with.
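For completeness, the same SSL setting can also be applied to a Net::HTTP instance, which is closer to the `Net::HTTP.new(uri.host, uri.port)` line in the question. This is a sketch only: `build_token_request` is a hypothetical helper, the credential values are placeholders, and nothing is sent until `http.request(request)` is called.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) the HTTPS client and the
# form-encoded POST request.
def build_token_request(url, form)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  # Without this, a plain-text request is sent to the HTTPS port.
  http.use_ssl = (uri.scheme == 'https')
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(form) # encodes the hash as application/x-www-form-urlencoded
  [http, request]
end

http, request = build_token_request(
  'https://api.instagram.com/oauth/access_token',
  'client_id' => 'YOUR_CLIENT_ID',       # placeholder
  'grant_type' => 'authorization_code'
)
# res = http.request(request) would open the connection and send the request.
```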
Edit
This is from TinMan's comment below, which makes sound points:
Using cURL from inside Ruby or Rails is extremely valuable. There is an incredible amount of functionality inside cURL that isn't implemented in Rails or Ruby; even Ruby's HTTP clients have a hard time replicating it, so cURL is very acceptable depending on the needs of the application. And, depending on the application, because cURL is in compiled C, it could easily outrun pure Ruby clients.
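If you do go the curl route, one way to run the command from the question is through Open3 with the arguments passed as a list, so nothing goes through a shell. A rough sketch, assuming curl is on the PATH; `curl_form_args` and `curl_form` are hypothetical names.

```ruby
require 'open3'

# Hypothetical helper: turn a hash of form fields into curl -F arguments.
def curl_form_args(url, fields)
  fields.flat_map { |key, value| ['-F', "#{key}=#{value}"] } + [url]
end

# Hypothetical helper: run curl and return its standard output.
def curl_form(url, fields)
  # Passing arguments as separate strings avoids shell interpolation of
  # values such as params[:code]. -s hides progress, -S keeps error messages.
  stdout, stderr, status = Open3.capture3('curl', '-sS', *curl_form_args(url, fields))
  raise "curl failed: #{stderr}" unless status.success?
  stdout
end
```

The response body returned by `curl_form` could then be handed to `JSON.parse`.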
Curl is a means of issuing HTTP (or HTTPS) requests from the command line.
You don't really want to use curl in Rails; you want to issue HTTP requests from within Rails. Using curl is okay, but it's only one way to issue HTTP requests from within Rails.
We can refine that further: you want to issue HTTP requests from Ruby. Distilling a problem down to its most basic version is always good to do.
We probably knew all this already, but it's still worth writing down for everyone's benefit!
Use HTTP in Ruby
We want to use an HTTP client. There are many, but for this I'm going to use Faraday (a gem) because I like it.
You've made a good start with Ruby's built-in Net::HTTP, but I prefer Faraday's DSL. It results in more readable and extendable code.
So, here is a class! I barely tested this, so use it as a starting point. Make sure you write some unit tests for it.
# This is a Plain Old Ruby Object (PORO).
# It will work in Rails but isn't Rails-specific.
require 'faraday' # This require is needed as it's a PORO.
class InstagramOAuth
  attr_reader :code
  # The code parameter will likely change frequently, so we provide it
  # at run time.
  def initialize(code)
    @code = code
  end
  def get_token
    connection.get('/oauth/access_token') do |request|
      request.params[:code] = code
    end
  end
  private
  def connection
    @connection ||= Faraday.new(
      url: instagram_api_url,
      params: params,
      ssl: { :ca_path => https_certificate_location }
    )
  end
  def instagram_api_url
    @url ||= 'https://api.instagram.com'
  end
  # You need to find out where these are for yourself.
  def https_certificate_location
    '/usr/lib/ssl/certs'
  end
  def params
    # These params likely won't change too often, so we set them at write
    # time in the class like this.
    {
      client_id: '126581840734567',
      client_secret: '678ebe1b3b8081231aab27dff738313',
      grant_type: 'authorization_code',
      redirect_uri: 'https://uri.com/'
    }
  end
end
# How do we use it? Like so
# Your big old authorisation code from your question
code = 'AQBi4L2Ohy3Q_N3V48OygFm0zb3gEsL985x5TIyDTNDJaLs93BwXiT1tyGYWoCg1HlBDU'\
'7ZRjUfLL5HVlzw4G-7YkVEjp6Id2WuqOz0Ylt-k2ADwDC5upH3CGVtHgf2udQhLlfDnQz'\
'5NPsnmxjg4bW3PJpW5FaQs8fn1ztgYp-ssfAf6IRt2-sI45ZC8cqqr5K_12y0Nq_Joh0H'\
'-tTfVyNLKatIxHPCqRDb3tfqgmxim1Q'
# This will return a Faraday::Response object but, what is in it?
response = InstagramOAuth.new(code).get_token
# Now we've got a Hash
response_hash = response.to_hash
puts 'Request made'
puts "Request full URL: #{response_hash[:url]}"
puts "HTTP status code: #{response_hash[:status]}"
puts "HTTP response body: #{response_hash[:body]}"
When I ran the snippet above I got the following. The class works; you just need to tweak the request params until you get what you want. Hopefully the class demonstrates how to send HTTP requests in Ruby/Rails.
Request made
Request full URL: https://api.instagram.com/oauth/access_token?client_id=126581840734567&client_secret=678ebe1b3b8081231aab27dff738313&code=AQBi4L2Ohy3Q_N3V48OygFm0zb3gEsL985x5TIyDTNDJaLs93BwXiT1tyGYWoCg1HlBDU7ZRjUfLL5HVlzw4G-7YkVEjp6Id2WuqOz0Ylt-k2ADwDC5upH3CGVtHgf2udQhLlfDnQz5NPsnmxjg4bW3PJpW5FaQs8fn1ztgYp-ssfAf6IRt2-sI45ZC8cqqr5K_12y0Nq_Joh0H-tTfVyNLKatIxHPCqRDb3tfqgmxim1Q&grant_type=authorization_code&redirect_uri=https%3A%2F%2Furi.com%2F
HTTP status code: 405
HTTP response body:
Additional Reading
. https://lostisland.github.io/faraday/usage/
. https://github.com/lostisland/faraday/wiki/Setting-up-SSL-certificates

Rails Facebook avatar to data-uri

I'm trying to pull a Facebook avatar via auth. Here's what I'm doing:
def image_uri
require 'net/http'
image = URI.parse(params[:image]) # https://graph.facebook.com/565515262/picture
fetch = Net::HTTP.get_response(image)
based = 'data:image/jpg;base64,' << Base64.encode64(fetch)
render :text => based
end
I'm getting the following error (new error — edited):
Connection reset by peer
I've tried googling but can't seem to find a solution. Any ideas?
I'm basically looking for the exact functioning of PHP's file_get_contents()
Try escaping the URI before parsing:
URI.parse URI.escape(params[:image])
Make sure that params[:image] does contain the uri you want to parse... I would instead pass the userid and interpolate it into the uri.
URI.parse URI.escape("https://graph.facebook.com/#{params[:image]}/picture")
Does it throw the same error when you use a static string "https://graph.facebook.com/565515262/picture"?
What does it say when you do
render :text => params[:image]
If neither of the above answers your question, then please try specifying the use of HTTPS:
uri = URI('https://secure.example.com/some_path?query=string')
Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
request = Net::HTTP::Get.new uri.request_uri
response = http.request request # Net::HTTPResponse object
end
If you are on Ruby < 1.9.3, you will also have to
require 'net/https'
If you are on Ruby 1.9.3 you don't have to do anything.
Edit
If you are on the latest version, you can simply do:
open(params[:image]) # http://graph.facebook.com/#{@user.facebook_id}/picture
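Once the request itself succeeds, note that the snippet in the question passes the whole response object to Base64.encode64, whereas the data URI needs the response body. Base64.encode64 also inserts line breaks into its output, which is usually not wanted inside a URI. A minimal sketch of the encoding step follows; `data_uri` is a hypothetical helper, and taking the content type from the response header is my assumption.

```ruby
require 'base64'

# Hypothetical helper: build a data URI from raw bytes and a MIME type.
# strict_encode64 produces a single line with no newlines.
def data_uri(bytes, content_type)
  "data:#{content_type};base64,#{Base64.strict_encode64(bytes)}"
end
```

In the controller this would be used roughly as `render :text => data_uri(fetch.body, fetch['content-type'])`, with `fetch` being the response from the question's code.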

Submit post and get and receive the response page in ruby (external website)

I know to get a simple page I do:
require 'net/http'
source = Net::HTTP.get('example.com', '/index.html')
But how do I make a post from a form and get the page that returns the results of the data submitted? Is it possible?
According to the Net::HTTP docs, you can do the following (note that post_form expects a URI object, not a string):
res = Net::HTTP.post_form(URI('http://example.com/index.html'), 'q' => 'ruby', 'max' => '50')
puts res.body
See http://ruby-doc.org/stdlib-1.9.3/libdoc/net/http/rdoc/Net/HTTP.html#method-c-post_form
A really easy way is to use the rest-client gem:
require 'rest_client'
result = RestClient.post 'http://example.com/resource', :param1 => 'one'

invalid URI - How to prevent, URI::InvalidURIError errors?

I got the following back from delayed_job:
[Worker(XXXXXX pid:3720)] Class#XXXXXXX failed with URI::InvalidURIError: bad URI(is not URI?): https://s3.amazonaws.com/cline-local-dev/2/attachments/542/original/mac-os-x[1].jpeg?AWSAccessKeyId=xxxxxxxx&Expires=1295403309&Signature=xxxxxxx%3D - 3 failed attempts
This is where the URI comes from in my app.
In my user_mailer I do:
@comment.attachments.each do |a|
attachments[a.attachment_file_name] = open(a.authenticated_url()) {|f| f.read }
end
Then in my attachments model:
def authenticated_url(style = nil, expires_in = 90.minutes)
AWS::S3::S3Object.url_for(attachment.path(style || attachment.default_style), attachment.bucket_name, :expires_in => expires_in, :use_ssl => attachment.s3_protocol == 'https')
end
That being said, is there some kind of URI.encode or parsing I can do to prevent a valid URI (I checked that the URL works in my browser) from erroring and killing delayed_job in Rails 3?
Thank you!
Ruby has (at least) two modules for dealing with URIs.
URI is part of the standard library.
Addressable::URI is a separate gem; it is more comprehensive and claims to conform to the spec.
Parse a URL with either one, modify any parameters using the gem's methods, then convert it using to_s before passing it on, and you should be good to go.
I tried ' open( URI.parse(URI.encode( a.authenticated_url() )) ) ' but that errored with OpenURI::HTTPError: 403 Forbidden
If you navigated to that page via a browser and it succeeded, then later failed going to it directly via code, it's likely there is a cookie or session state that is missing. You might need to use something like Mechanize, which will maintain that state while allowing you to navigate through a site.
EDIT:
require 'addressable/uri'
url = 'http://www.example.com'
uri = Addressable::URI.parse(url)
uri.query_values = {
:foo => :bar,
:q => '"one two"'
}
uri.to_s # => "http://www.example.com?foo=bar&q=%22one%20two%22"
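In the URL from the question, the likely offenders are the square brackets in the file name (mac-os-x[1].jpeg). If pulling in a gem is not an option, a narrow workaround is to percent-encode just those characters and leave everything else alone, so escapes that are already present in the signed query string (such as %3D) are not encoded a second time. This is a sketch under that assumption; `escape_brackets` is a made-up helper, not a stdlib method.

```ruby
require 'uri'

# Hypothetical helper: percent-encode only '[' and ']'.
# Existing escapes such as %3D are left untouched.
# Caveat: do not use this on URLs with a bracketed IPv6 host.
def escape_brackets(url)
  url.gsub(/[\[\]]/) { |char| format('%%%02X', char.ord) }
end
```

Usage would be along the lines of `open(escape_brackets(a.authenticated_url)) { |f| f.read }`. Whether S3 accepts the re-encoded path for a signed URL is something to verify against your own bucket.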

Mechanize with FakeWeb

I'm using Mechanize to extract the links from the page.
To ease development, I'm using FakeWeb to serve superfast responses, so there is less annoying waiting on every code run.
tags_url = "http://website.com/tags/"
FakeWeb.register_uri(:get, tags_url, :body => "tags.txt")
agent = WWW::Mechanize.new
page = agent.get(tags_url)
page.links.each do |link|
puts link.text.strip
end
When I run the above code, it says:
nokogiri_test.rb:33: undefined method `links' for #<WWW::Mechanize::File:0x9a886e0> (NoMethodError)
Inspecting the class of the page object shows:
puts page.class # => File
If I don't fake out the tags_url, it works since the page class is now Page
puts page.class # => Page
So, how can I use FakeWeb with Mechanize to return a Page instead of a File object?
Use FakeWeb to replay a prefetched HTTP request:
tags_url = "http://website.com/tags/"
request = `curl -is #{tags_url}`
FakeWeb.register_uri(:get, tags_url, :response => request)
agent = WWW::Mechanize.new
page = agent.get(tags_url)
page.links.each do |link|
puts link.text.strip
end
Calling curl with the -i flag will include headers in the response.
You can easily fix that by adding the option :content_type => "text/html" to your FakeWeb.register_uri call.
