Optimizing HTTP Status Request on Heroku (Net::HTTP) - ruby-on-rails

I'm running an app on Heroku where users can get the HTTP status (200, 301, 404, etc.) of several URLs that they paste into a form.
Although it runs fine on my local Rails server, once deployed to Heroku I cannot check more than about 30 URLs (I want to check 200), because Heroku times the request out after 30 seconds and gives me an H12 error.
def gethttpresponse(url)
  # Issues a GET for the URL and returns a hash with the URL and its status code.
  httpstatusarray = Hash.new
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Get.new(uri.request_uri)
  response = http.request(request)
  httpstatusarray['url'] = url
  httpstatusarray['status'] = response.code
  return httpstatusarray
end
At the moment I'm using Net::HTTP, and it seems very heavy. Is there anything I can change in my code, or another gem I could use, to get the HTTP status/headers in a more efficient (faster) way?
I also noticed that response.body holds the entire HTML source of the page, which I do not need. Is the body loaded on the response object by default?
If this is already the most efficient way to check an HTTP status, would you agree that it needs to become a background job?
Any references to faster gems, reading material and thoughts are more than welcome!

If a request takes longer than 30 seconds, Heroku times it out, which is what you're seeing here. You're entirely dependent on how fast the server at the other end responds. For example, if one of the URLs were itself a Heroku application on a single dyno, it could take around 10 seconds just to unidle, leaving only 20 seconds for the remaining URLs to be tested.
I suggest you move each check into its own background job and then poll the jobs to know when they complete, updating the UI accordingly.
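As a rough illustration of that approach, here is a minimal sketch assuming a Rails version with ActiveJob available (on older Rails, Delayed Job or Resque would play the same role); the UrlStatusJob class and UrlCheck model are hypothetical names, not part of the original app. It also uses a HEAD request so the response body is never downloaded, which addresses the response.body concern above (you'd need to fall back to GET for servers that reject HEAD):
require "net/http"
require "uri"

# Hypothetical job: one URL check per job, enqueued from the controller.
# UrlCheck is an assumed model that stores the result so the UI can poll it.
class UrlStatusJob < ActiveJob::Base
  queue_as :default

  def perform(url_check_id)
    check = UrlCheck.find(url_check_id)
    uri   = URI.parse(check.url)

    status =
      begin
        Net::HTTP.start(uri.host, uri.port,
                        use_ssl: uri.scheme == "https",
                        open_timeout: 5, read_timeout: 5) do |http|
          # HEAD returns only the status line and headers, so no body is transferred.
          http.request(Net::HTTP::Head.new(uri.request_uri)).code
        end
      rescue StandardError => e
        "error: #{e.class}"
      end

    check.update(status: status)
  end
end

# Enqueue one job per URL; the form action can then poll UrlCheck for results:
#   urls.each { |u| UrlStatusJob.perform_later(UrlCheck.create!(url: u).id) }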

Related

In ruby/rails, can you differentiate between no network response vs long-running response?

We have a Rails app with an integration with Box.com. Fairly frequently, a request to our app for a Box action ties up a Passenger process for right around 15 minutes, and then we get the following exception:
Errno::ETIMEDOUT: Connection timed out - SSL_connect
Often it's on something that should be fairly quick, such as listing the contents of a small folder or deleting a single document.
My impression is that these requests never actually get an open channel: either at the TCP or the SSL level we get no initial response, or the full handshake/session setup never completes.
I'd like either of those conditions to time out quickly, say within 15 seconds, but allow a large file that is transferring successfully to continue.
Is there any way to get TCP or SSL to raise a timeout much sooner when the connection at either of those levels fails to complete setup, but not raise an exception if the session is established and the data transfer is just taking a long time?
Here is what our current code looks like - we are not tied to doing it this way (and I didn't write this code):
def box_delete(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  request = Net::HTTP::Delete.new(uri.request_uri)
  http.request(request)
end
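No answer is recorded here, but Net::HTTP does expose separate knobs for connection setup versus data transfer, which is roughly the split being asked for. A minimal sketch under those assumptions (the timeout values are illustrative; read timeouts raise Net::ReadTimeout on Ruby 2.0+ and Timeout::Error on older versions):
require "net/http"
require "openssl"

def box_delete(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE

  # open_timeout bounds connection setup: the TCP connect and, on current
  # Rubies, the SSL handshake as well. A dead endpoint fails after ~15 seconds
  # instead of waiting on the OS-level default.
  http.open_timeout = 15

  # read_timeout applies to each read from the socket, not to the whole
  # response, so a large file that keeps streaming is left alone; only a
  # stalled connection raises a timeout.
  http.read_timeout = 30

  request = Net::HTTP::Delete.new(uri.request_uri)
  http.request(request)
end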

Mechanize error "too many bad responses"

While scraping I've found that some URLs fail. After checking that the URL looked OK in the browser, and seeing in Wireshark that the remote server was answering with a 200, I finally narrowed it down to this URL:
http://www.segundamano.es/electronica-barcelona-particulares/galaxy-note-3-mas.htm
was failing with
Net::HTTP::Persistent::Error: too many bad responses after 0 requests on 42319240, last used 1414078471.6468294 seconds ago
Even weirder: if you remove a character from the last part of the URL, it works. If you add the character back in another place, it fails again.
Update 1
The "code"
agent = Mechanize.new
page = agent.get("http://www.segundamano.es/electronica-barcelona-particulares/galaxy-note-3.htm")
Net::HTTP::Persistent::Error: too many bad responses after 0 requests on 41150840, last used 1414079640.353221 seconds ago
This is a network error which normally occurs when you make too many requests to a given source from the same IP, so the page takes too long to load. You could try adding a custom timeout to your connection agent, keeping the connection alive and ignoring bad chunking (potentially risky):
agent = Mechanize.new
agent.keep_alive = true
agent.ignore_bad_chunking = true
agent.open_timeout = 25
agent.read_timeout = 25
page = agent.get("http://www.segundamano.es/electronica-barcelona-particulares/galaxy-note-3.htm")
But that doesn't guarantee that the connection will be successful; it just increases the chances.
It's hard to say why you get the error on one URL and not on another. When you remove the 3 you request a different page, one that might be easier for the server to process? My point being: there is nothing wrong with your Mechanize setup; the problem is with the response you are getting back.
I agree with Severin, the problem was on the other side. As I can't change anything on the server, I tried different libraries to fetch the data, and it was weird that some of them worked and others didn't. Trying different setups for Mechanize, in the end I found a good one:
agent = Mechanize.new { |agent|
  agent.gzip_enabled = false
}
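If you'd rather not turn gzip off for every request the agent makes, Mechanize#get also accepts a headers hash as its fourth argument, so something like the following sketch might work (this assumes the server honors a per-request Accept-Encoding header; it is not part of the original answer):
agent = Mechanize.new

# Assumption: asking for an uncompressed response only on the problematic URL,
# leaving gzip enabled for everything else the agent fetches.
page = agent.get("http://www.segundamano.es/electronica-barcelona-particulares/galaxy-note-3.htm",
                 [], nil, "Accept-Encoding" => "identity")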

How to get the status after sending data to external server rails

In my Rails (3.2.13) app I send data to an external server using a form. The external server processes the data I sent and shows whether the result is OK or not. I need to save that result or status to my Rails app's database, but I'm not sure how to redirect to another page once the external server has finished processing.
I have a function that asks the server whether processing went OK, using the reference or ID that I originally sent with the form, but as I said I don't know how to redirect after the processing is finished...
Please help me.
You can use some core Ruby libraries to make a subsequent request on the same endpoint to determine the status code of your request. Try the following, cited in whole from Ruby Inside:
# Basic REST.
# Most REST APIs will set semantic values in response.body and response.code.
require "net/http"
http = Net::HTTP.new("api.restsite.com")
request = Net::HTTP::Post.new("/users")
request.set_form_data({"users[login]" => "quentin"})
response = http.request(request)
# Use nokogiri, hpricot, etc to parse response.body.
request = Net::HTTP::Get.new("/users/1")
response = http.request(request)
# As with POST, the data is in response.body.
request = Net::HTTP::Put.new("/users/1")
request.set_form_data({"users[login]" => "changed"})
response = http.request(request)
request = Net::HTTP::Delete.new("/users/1")
response = http.request(request)
Once you've instantiated a response object, you can operate on it in the following manner:
response.code #=> returns HTTP response code
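Building on that, here is a small sketch of how you might persist the status in your own app after polling the external server. The RemoteSubmission model, the /users/:id path and the success/failure branching are illustrative assumptions, not part of the cited snippet:
require "net/http"

# Hypothetical helper: poll the external endpoint for the record we submitted
# earlier and store the outcome locally so the UI can redirect once it's saved.
def record_remote_status(reference_id)
  http     = Net::HTTP.new("api.restsite.com")   # host from the cited example
  response = http.request(Net::HTTP::Get.new("/users/#{reference_id}"))

  status = response.is_a?(Net::HTTPSuccess) ? "ok" : "failed"   # any 2xx counts as success
  RemoteSubmission.where(reference: reference_id)
                  .update_all(status: status, http_code: response.code)

  response.code
end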

405 Method not allowed on Net::HTTP request [ruby on rails]

I'm trying to verify whether a remote URL exists with the following code:
endpoint_uri = URI.parse(@endpoint.url)
endpoint_http = Net::HTTP.new(endpoint_uri.host, endpoint_uri.port)
endpoint_request = Net::HTTP::Head.new(endpoint_uri.request_uri)
endpoint_response = endpoint_http.request(endpoint_request)
I'm still getting 405 Method Not Allowed. When I use Get instead of Head in Net::HTTP::Head.new, I get 200 Success, but the whole remote document comes back in the response, which results in a bigger response time (0.3s => 0.9s).
Any ideas why this is happening? Thanks.
There's a chance that the @endpoint URL you're trying to interact with doesn't support HEAD requests (which would be really weird, but may still be the case). Your code works fine for me with a handful of URLs (google.com, stackoverflow.com, etc.).
Have you tried a curl request to see what it returns?
curl -I http://www.the_website_you_want_to_test.com
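If the endpoint really does reject HEAD, one possible workaround (assuming the server honors Range headers, which is not guaranteed) is to issue a GET for a single byte, so you still get a meaningful status code without pulling down the whole document:
endpoint_uri = URI.parse(@endpoint.url)
endpoint_http = Net::HTTP.new(endpoint_uri.host, endpoint_uri.port)

endpoint_request = Net::HTTP::Get.new(endpoint_uri.request_uri)
# Ask for only the first byte. A server that supports ranges answers
# 206 Partial Content; one that ignores the header falls back to a full 200.
endpoint_request["Range"] = "bytes=0-0"

endpoint_response = endpoint_http.request(endpoint_request)
endpoint_response.code  # => "206", "200", "404", ... depending on the server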

How do I limit the size of a Net::HTTP request?

I'm creating an API service that lets people provide the URL of an image in the API call, and the service then downloads the image to process it.
How do I ensure somebody does NOT give me the URL of, say, a 5 MB image? Is there a way to limit the request?
This is what I have so far, which basically grabs everything.
req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) { |http|
  http.request(req)
}
Thanks,
Conrad
cwninja unfortunately gave you an answer that will only work for accidental attacks. An intelligent attacker will have no trouble at all defeating that check. There are two main reasons his method should not be used. First, nothing guarantees that the information in a HEAD response will match the corresponding GET response. A properly behaving server certainly will do this, but a malicious actor does not have to follow the spec. The attacker could simply send a HEAD response that says it has a Content-Length that's less than your threshold, but then hand you a huge file in the GET response. But that doesn't even cover the potential for a server to send back a response with the Transfer-Encoding: chunked header set. A chunked response could quite possibly never end. A few people pointing your server at never-ending responses could carry out a trivial resource-exhaustion attack, even if your HTTP client enforces a timeout.
To do this correctly, you need to use an HTTP library that allows you to count the bytes as they're received, and abort if it crosses the threshold. I would probably recommend Curb for this rather than Net::HTTP. (Can you even do this at all with Net::HTTP?) If you use the on_body and/or on_progress callbacks, you can count the incoming bytes and abort mid-response if you receive a file that's too large. Obviously, as cwninja already pointed out, if you receive a Content-Length header larger than your threshold, you want to abort for that too. Curb is also notably faster than Net::HTTP.
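Roughly what that looks like with Curb is sketched below; it assumes that returning anything other than the chunk's size from on_body aborts the transfer with a write error (which is how Curb's write callback is meant to signal an abort), and MAX_BYTES is an illustrative constant:
require "curb"

MAX_BYTES = 5 * 1024 * 1024  # illustrative 5 MB cap

def fetch_limited(url)
  received = 0
  body = ""

  curl = Curl::Easy.new(url)
  curl.on_body do |chunk|
    received += chunk.bytesize
    if received > MAX_BYTES
      # Returning a value other than the chunk size signals a write error to
      # libcurl, which aborts the transfer mid-response.
      0
    else
      body << chunk
      chunk.bytesize
    end
  end

  begin
    curl.perform
  rescue Curl::Err::WriteError
    raise "File too big (aborted after #{received} bytes)"
  end

  body
end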
Try running this first:
Net::HTTP.start(url.host, url.port) { |http|
  response = http.request_head(url.path)
  raise "File too big." if response['content-length'].to_i > 5*1024*1024
}
You still have a race condition (someone could swap out the file after you do the HEAD request), but in the simple case this asks the server for the headers it would send back from a GET request.
Another way to limit the download size (full code should check the response status, handle exceptions, etc.; this is just an example):
Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri.request_uri
  http.request request do |response|
    # check response codes here
    body = ''
    response.read_body do |chunk|
      body += chunk
      break if body.size > MY_SAFE_SIZE_LIMIT
    end
    break
  end
end
Combining the other two answers, I'd like to 1) check the size header, 2) watch the size of chunks, while also 3) supporting https and 4) aggressively enforcing a timeout. Here's a helper I came up with:
require "net/http"
require 'uri'
module FetchUtil
# Fetch a URL, with a given max bytes, and a given timeout
def self.fetch_url url, timeout_sec=5, max_bytes=5*1024*1024
uri = URI.parse(url)
t0 = Time.now.to_f
body = ''
Net::HTTP.start(uri.host, uri.port,
:use_ssl => (uri.scheme == 'https'),
:open_timeout => timeout_sec,
:read_timeout => timeout_sec) { |http|
# First make a HEAD request and check the content-length
check_res = http.request_head(uri.path)
raise "File too big" if check_res['content-length'].to_i > max_bytes
# Then fetch in chunks and bail on either timeout or max_bytes
# (Note: timeout won't work unless bytes are streaming in...)
http.request_get(uri.path) do |res|
res.read_body do |chunk|
raise "Timeout error" if (Time.now().to_f-t0 > timeout_sec)
raise "Filesize exceeded" if (body.length+chunk.length > max_bytes)
body += chunk
end
end
}
return body
end
end
