Read only portion of a remote file with rails [duplicate] - ruby-on-rails

It seems like the methods of Ruby's Net::HTTP are all or nothing when it comes to reading the body of a web page. How can I read, say, the just the first 100 bytes of the body?
I am trying to read from a content server that returns a short error message in the body of the response if the file requested isn't available. I need to read enough of the body to determine whether the file is there. The files are huge, so I don't want to get the whole body just to check if the file is available.

This is an old thread, but the question of how to read only a portion of a file via HTTP in Ruby is still a mostly unanswered one according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:
require 'net/http'
# provide access to the actual socket
class Net::HTTPResponse
attr_reader :socket
end
uri = URI("http://www.example.com/path/to/file")
begin
Net::HTTP.start(uri.host, uri.port) do |http|
request = Net::HTTP::Get.new(uri.request_uri)
# calling request with a block prevents body from being read
http.request(request) do |response|
# do whatever limited reading you want to do with the socket
x = response.socket.read(100);
# be sure to call finish before exiting the block
http.finish
end
end
rescue IOError
# ignore
end
The rescue catches the IOError that's thrown when you call HTTP.finish prematurely.
FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:
class Net::BufferedIO
def readchar
read(1)[0].ord
end
end

Shouldn't you just use an HTTP HEAD request (Ruby Net::HTTP::Head method) to see if the resource is there, and only proceed if you get a 2xx or 3xx response? This presumes your server is configured to return a 4xx error code if the document is not available. I would argue this was the correct solution.
An alternative is to request the HTTP head and look at the content-length header value in the result: if your server is correctly configured, you should easily be able to tell the difference in length between a short message and a long document. Another alternative: set the content-range header field in the request (which again assumes that the server is behaving correctly WRT the HTTP spec).
I don't think that solving the problem in the client after you've sent the GET request is the way to go: by that time, the network has done the heavy lifting, and you won't really save any wasted resources.
Reference: http header definitions

I wanted to do this once, and the only thing that I could think of is monkey patching the Net::HTTP#read_body and Net::HTTP#read_body_0 methods to accept a length parameter, and then in the former just pass the length parameter to the read_body_0 method, where you can read only as much as length bytes.

To read the body of an HTTP request in chunks, you'll need to use Net::HTTPResponse#read_body like this:
http.request_get('/large_resource') do |response|
response.read_body do |segment|
print segment
end
end

Are you sure the content server only returns a short error page?
Doesn't it also set the HTTPResponse to something appropriate like 404. In which case you can trap the HTTPClientError derived exception (most likely HTTPNotFound) which is raised when accessing Net::HTTP.value().
If you get an error then your file wasn't there if you get 200 the file is starting to download and you can close the connection.

You can't. But why do you need to? Surely if the page just says that the file isn't available then it won't be a huge page (i.e. by definition, the file won't be there)?

Related

Verify Shopify webhook

I believe that to have a Shopify webhook integrate with a Rails app, the Rails app needs to disable the default verify_authenticity_token method, and implement its own authentication using the X_SHOPIFY_HMAC_SHA256 header. The Shopify docs say to just use request.body.read. So, I did that:
def create
verify_webhook(request)
# Send back a 200 OK response
head :ok
end
def verify_webhook(request)
header_hmac = request.headers["HTTP_X_SHOPIFY_HMAC_SHA256"]
digest = OpenSSL::Digest.new("sha256")
request.body.rewind
calculated_hmac = Base64.encode64(OpenSSL::HMAC.digest(digest, SHARED_SECRET, request.body.read)).strip
puts "header hmac: #{header_hmac}"
puts "calculated hmac: #{calculated_hmac}"
puts "Verified:#{ActiveSupport::SecurityUtils.secure_compare(calculated_hmac, header_hmac)}"
end
The Shopify webhook is directed to the correct URL and the route gives it to the controller method shown above. But when I send a test notification, the output is not right. The two HMACs are not equal, and so it is not verified. I am fairly sure that the problem is that Shopify is using the entire request as their seed for the authentication hash, not just the POST contents. So, I need the original, untouched HTTP request, unless I am mistaken.
This question seemed like the only promising thing on the Internet after at least an hour of searching. It was exactly what I was asking and it had an accepted answer with 30 upvotes. But his answer... is absurd. It spits out an unintelligible, garbled mess of all kinds of things. Am I missing something glaring?
Furthermore, this article seemed to suggest that what I am looking for is not possible. It seems that Rails is never given the unadulterated request, but it is split into disparate parts by Rack, before it ever gets to Rails. If so, I guess I could maybe attempt to reassemble it, but I would have to even get the order of the headers correct for a hash to work, so I can't imagine that would be possible.
I guess my main question is, am I totally screwed?
The problem was in my SHARED_SECRET. I assumed this was the API secret key, because a few days ago it was called the shared secret in the Shopify admin page. But now I see a tiny paragraph at the bottom of the notifications page that says,
All your webhooks will be signed with ---MY_REAL_SHARED_SECRET--- so
you can verify their integrity.
This is the secret I need to use to verify the webhooks. Why there are two of them, I have no idea.
Have you tried doing it in the order they show in their guides? They have a working sample for ruby.
def create
request.body.rewind
data = request.body.read
header = request.headers["HTTP_X_SHOPIFY_HMAC_SHA256"]
verified = verify_webhook(data, header)
head :ok
end
They say in their guides:
Each Webhook request includes a X-Shopify-Hmac-SHA256 header which is
generated using the app's shared secret, along with the data sent in
the request.
the keywors being "generated using shared secret AND DATA sent in the request" so all of this should be available on your end, both the DATA and the shared secret.

RSpec mocking response attribute

I'm stuck with a small issue which I can't seem to find on the RSpec Doc (or elsewhere). I suspect it's probably to do because it's trying an HTTP call but unsure.
I'm currently attempting to mock a response in my controller, the actual response is coming straight from aws-sdk (so it's an external service).
Controller:
response = {
snapshot_id: client.copy_snapshot({foo, bar}).snapshot_id
}
Is it possible to stub .snapshot_id?
_spec.rb
before do
allow(client).to receive(:copy_snapshot).and_return('Something')
put :update, format: :json
puts JSON.parse(response.body)
end
The tests passes successfully without adding .snapshot_id on the controller page and I can see
{"snapshot_id"=>"Something"}
Otherwise I get "this resource is currently unavailable." adding it in, I'm slightly lost if I'm doing it wrong or if this is intended behavior.
The reasoning is that when the server is on I will actually get a response back, however omitting snapshot_id causes the server to hang as amazon returns back a lot of data (headers, etc)
Any help is greatly appreciated.

Generate XML dynamically and post it to a web service in Rails

I am currently developing a Rails app in which I need to dynamically send XML request to an external web service. I've never done this before and I a bit lost.
More precisely I need to send requests to my logistic partner when the status of an order is updated. For instance when an order is confirmed I need to send data such as the customer's address, the pickup address, etc...
I intended to use the XML builder to dynamically generate the request and Net:HTTP or HTTParty to post the request, based on this example.
Is that the right way to do so? How can I generate the XML request outside the controller and then use it in HTTParty or Net:HTTP?
Thanks for your help,
Clem
That method will work just fine.
As for how to get the XML where you need it, just pass it around like any other data. You can use the Builder representation, which will automatically convert to a String as appropriate, or you can pass around a stringified (to_s) version of the Builder object.
If, for example, it makes sense for your model (which we'll call OrderStatus) to generate the XML, and for your controller to post the request:
# Model (order_status.rb)
def to_xml
xml = Builder::XmlMarkup.new
... # Your code here
xml
end
# Controller (order_statuses_controller.rb)
def some_method
#order_status = OrderStatus.find(:some_criteria)
... # Your code here
http = Net::HTTP.new("www.thewebservicedomain.com")
response = http.post("/some/path/here", #order_status.to_xml)
end
You may want to wrap the HTTP calls in a begin/rescue/end block and do something with the response, but otherwise it's all pretty straightforward and simple.
Make XML with Builder, then send it down the wire.
In your case it sounds like you may need to send several different requests as the order evolves; in that case:
Plan out what your possible order states are.
Determine what data needs to be sent for each state.
Decide how to represent that state within your models, so you can send the appropriate request when the state changes.
Where my example uses one method to generate XML, maybe you'll want 5 methods to handle 5 possible order states.

My web site need to read a slow web site, how to improve the performance

I'm writing a web site with rails, which can let visitors inputing some domains and check if they had been regiestered.
When user clicked "Submit" button, my web site will try to post some data to another web site, and read the result back. But that website is slow for me, each request need 2 or 3 seconds. So I'm worried about the performance.
For example, if my web server allows 100 processes at most, that there are only 30 or 40 users can visit my website at the same time. This is not acceptable, is there any way to improve the performance?
PS:
At first, I want to use ajax reading that web site, but because of the "cross-domain" problem, it doesn't work. So I have to use this "ajax proxy" solution.
It's a bit more work, but you can use something like DelayedJob to process the requests to the other site in the background.
DelayedJob creates separate worker processes that look at a jobs table for stuff to do. When the user clicks submit, such a job is created, and starts running in one of those workers. This off-loads your Rails workers, and keeps your website snappy.
However, you will have to create some sort of polling mechanism in the browser while the job is running. Perhaps using a refresh or some simple AJAX. That way, the visitor could see a message such as “One moment, please...”, and after a while, the actual results.
Rather than posting some data to the websites, you could use an HTTP HEAD request, which (I believe) should return only the header information for that URL.
I found this code by googling around a bit:
require "net/http"
req = Net::HTTP.new('google.com', 80)
p req.request_head('/')
This will probably be faster than a POST request, and you won't have to wait to receive the entire contents of that resource. You should be able to determine whether the site is in use based on the response code.
Try using typhoeus rather than AJAX to get the body. You can POST the domain names for that site to check using typhoeus and can parse the response fetched. Its extremely fast compared to other solutions. A snippet that i ripped from the wiki page from the github repo http://github.com/pauldix/typhoeus shows that you can run requests in parallel (Which is probably what you want considering that it takes 1 to 2 seconds for an ajax request!!) :
hydra = Typhoeus::Hydra.new
first_request = Typhoeus::Request.new("http://localhost:3000/posts/1.json")
first_request.on_complete do |response|
post = JSON.parse(response.body)
third_request = Typhoeus::Request.new(post.links.first) # get the first url in the post
third_request.on_complete do |response|
# do something with that
end
hydra.queue third_request
return post
end
second_request = Typhoeus::Request.new("http://localhost:3000/users/1.json")
second_request.on_complete do |response|
JSON.parse(response.body)
end
hydra.queue first_request
hydra.queue second_request
hydra.run # this is a blocking call that returns once all requests are complete
first_request.handled_response # the value returned from the on_complete block
second_request.handled_response # the value returned from the on_complete block (parsed JSON)
Also Typhoeus + delayed_job = AWESOME!

Is there anyway to make a Rails / Rack application tell the web server to drop the connection

There are many security reasons why one would want to drop an HTTP connection with no response (eg. OWASP's SSL best practices). When these can be detected at the server level then it's no big deal. However, what if you can only detect this condition at the application level?
Does Rails, or more generally Rack, have any standard way of telling the server to drop the connection without a response? If not, are there some standard headers to pass in that will accomplish that in common web servers (I'm thinking Nginx or Apache)? Even if there is not a standard header is there a reasonable way to configure that behavior? Is this a fool's errand?
Nginx has a mechanism for this. When you are returning a special status code 444 (it's non-standard), Nginx silently drops the connection. This happens only when you return this code from the Nginx config, i.e. like
location = /drop {
return 444;
}
and you cannot return this status code from your application. The workaround is to return X-Accel-Redirect: /drop header from the app to tell Nginx use /drop location for this request.
I could be wrong but I don't think Rack or Rails provide a way to drop a connection. I think the closest might be something like "render :nothing => true". But even that ironically sends something (A single space, apparently to avoid a Safari bug...) but at least its terminating the request rather than redirecting (having the client initiate a new request) as the OWASP warns against.
class TestController < ApplicationController
def nothing
render :nothing => true
end
end
>> app.get('test/nothing')
=> 200
>> app.response.body
=> " "
I hope that helps.
Would you please elaborate on what you mean by "dropping a connection"? If sending back headers with whatever response code you want (Moved, Unauthorized, Not Found) is okay - you already got the answer (render :noting, or :head). You can add :status => some_status.
If you mean dropping connection on TCP/IP level, as do firewalls, well that's another thing. I doubt this is possible. And I don't think advisable (if possible).
And on the page you posted link to "dropping connection" is used as a synonym to refuse HTTPS connection - means render some response with status Unauthorized or something like that.

Resources