Ruby on Rails 3: Streaming data through Rails to client

I am working on a Ruby on Rails app that communicates with RackSpace cloudfiles (similar to Amazon S3 but lacking some features).
Because per-object access permissions and query string authentication are not available, downloads to users have to be mediated through the application.
In Rails 2.3, it looks like you can dynamically build a response as follows:
# Streams about 180 MB of generated data to the browser.
render :text => proc { |response, output|
  10_000_000.times do |i|
    output.write("This is line #{i}\n")
  end
}
(from http://api.rubyonrails.org/classes/ActionController/Base.html#M000464)
Instead of 10_000_000.times... I could dump my cloudfiles stream generation code in there.
Trouble is, this is the output I get when I attempt to use this technique in Rails 3.
#<Proc:0x000000010989a6e8@/Users/jderiksen/lt/lt-uber/site/app/controllers/prospect_uploads_controller.rb:75>
Looks like maybe the proc object's call method is not being called? Any other ideas?

Assign to response_body an object that responds to #each:
class Streamer
  def each
    10_000_000.times do |i|
      yield "This is line #{i}\n"
    end
  end
end
self.response_body = Streamer.new
If you are using 1.9.x or the Backports gem, you can write this more compactly using Enumerator.new:
self.response_body = Enumerator.new do |y|
  10_000_000.times do |i|
    y << "This is line #{i}\n"
  end
end
Note that when and if the data is flushed depends on the Rack handler and underlying server being used. I have confirmed that Mongrel, for instance, will stream the data, but other users have reported that WEBrick, for instance, buffers it until the response is closed. There is no way to force the response to flush.
In Rails 3.0.x, there are several additional gotchas:
In development mode, doing things such as accessing model classes from within the enumeration can be problematic due to bad interactions with class reloading. This is an open bug in Rails 3.0.x.
A bug in the interaction between Rack and Rails causes #each to be called twice for each request. This is another open bug. You can work around it with the following monkey patch:
class Rack::Response
  def close
    @body.close if @body.respond_to?(:close)
  end
end
Both problems are fixed in Rails 3.1, where HTTP streaming is a marquee feature.
Note that the other common suggestion, self.response_body = proc {|response, output| ...}, does work in Rails 3.0.x, but has been deprecated (and will no longer actually stream the data) in 3.1. Assigning an object that responds to #each works in all Rails 3 versions.

Thanks to all the posts above, here is fully working code to stream large CSVs. This code:
Does not require any additional gems.
Uses Model.find_each() so as not to bloat memory with all matching objects.
Has been tested on Rails 3.2.5, Ruby 1.9.3 and Heroku using Unicorn with a single dyno.
Adds a GC.start every 500 rows, so as not to exceed the Heroku dyno's allowed memory.
You may need to adjust the GC.start interval depending on your model's memory footprint. I have successfully used this to stream 105K models into a CSV of 9.7 MB without any problems.
Controller Method:
def csv_export
  respond_to do |format|
    format.csv {
      @filename = "responses-#{Date.today.to_s(:db)}.csv"
      self.response.headers["Content-Type"] ||= 'text/csv'
      self.response.headers["Content-Disposition"] = "attachment; filename=#{@filename}"
      self.response.headers['Last-Modified'] = Time.now.ctime.to_s
      self.response_body = Enumerator.new do |y|
        i = 0
        Model.find_each do |m|
          y << Model.csv_header.to_csv if i == 0
          y << m.csv_array.to_csv
          i += 1
          GC.start if i % 500 == 0
        end
      end
    }
  end
end
config/unicorn.rb
# Set to 3 instead of 4 as per http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/
worker_processes 3
# Change timeout to 120s to allow downloading of large streamed CSVs on slow networks
timeout 120
# Enable streaming
port = ENV["PORT"].to_i
listen port, :tcp_nopush => false
Model.rb
def self.csv_header
  ["ID", "Route", "username"]
end

def csv_array
  [id, route, username]
end

It looks like this isn't available in Rails 3
https://rails.lighthouseapp.com/projects/8994/tickets/2546-render-text-proc
This appeared to work for me in my controller:
self.response_body = proc { |response, output|
  output.write "Hello world"
}

In case you are assigning to response_body an object that responds to #each and it's buffering until the response is closed, try this in the action controller:
self.response.headers['Last-Modified'] = Time.now.to_s

Just for the record, Rails >= 3.1 has an easy way to stream data: assign an object that responds to #each to the controller's response.
Everything is explained here: http://blog.sparqcode.com/2012/02/04/streaming-data-with-rails-3-1-or-3-2/

Yes, response_body is the Rails 3 way of doing this for the moment: https://rails.lighthouseapp.com/projects/8994/tickets/4554-render-text-proc-regression

This solved my problem as well - I have gzipped CSV files that I want to send to the user as unzipped CSV, so I read them a line at a time using a GzipReader.
These lines are also helpful if you're trying to deliver a big file as a download:
self.response.headers["Content-Type"] = "application/octet-stream"
self.response.headers["Content-Disposition"] = "attachment; filename=#{filename}"

In addition, you will have to set the 'Content-Length' header yourself.
If you don't, Rack will have to wait (buffering body data into memory) to determine the length, which would defeat the streaming methods described above.
In my case, I could determine the length.
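For example, when serving a file whose size is known up front, something along these lines works (the path here is a placeholder):

# Known size: let Rack send the body immediately instead of buffering to measure it.
path = "/path/to/export.csv"
self.response.headers['Content-Length'] = File.size(path).to_s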
In cases where you can't, you need to make Rack start sending the body without a 'Content-Length' header.
Try adding use Rack::Chunked to config.ru, after the requires and before the run line. (Thanks arkadiy)
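As a sketch, a Rails 3 config.ru with that middleware added might look like this (MyApp below stands in for your application's module name):

# config.ru
require ::File.expand_path('../config/environment', __FILE__)
use Rack::Chunked   # send a chunked response when no Content-Length is set
run MyApp::Application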

I commented in the lighthouse ticket, just wanted to say the self.response_body = proc approach worked for me though I needed to use Mongrel instead of WEBrick to succeed.
Martin

Applying John's solution along with Exequiel's suggestion worked for me.
The statement
self.response.headers['Last-Modified'] = Time.now.to_s
marks the response as non-cacheable in rack.
After investigating further, I figured one could also use this :
headers['Cache-Control'] = 'no-cache'
This, to me, is just slightly more intuitive. It conveys the message to anyone else who may be reading my code. Also, if a future version of Rack stops checking for Last-Modified, a lot of code may break and it may take a while for folks to figure out why.

Related

Rails: Convert REST API to websocket client

I have a typical Rails REST API written for HTTP consumers. However, it turns out they need a WebSocket API because of integration with POS machines.
The typical API looks like this:
class Api::Pos::V1::TransactionsController < ApplicationController
  before_action :authenticate

  def index
    @transactions = @current_business.business_account.business_deposits.last(5)
    render json: {
      status: 200,
      number: @transactions.count,
      transactions: @transactions.as_json(only: [:created_at, :amount, :status, :client_card_number, :client_phone_number])
    }
  end

  private

  def request_params
    params.permit(:account_number, :api_key)
  end

  def authenticate
    render status: 401, json: {
      status: 401,
      error: "Authentication Failed."
    } unless current_business
  end

  def current_business
    account_number = request_params[:account_number].to_s
    api_key = request_params[:api_key].to_s
    if account_number and api_key
      account = BusinessAccount.find_by(account_number: account_number)
      if account && Business.find(account.business_id).business_api_key.token =~ /^(#{api_key})/
        @current_business = account.business
      else
        false
      end
    end
  end
end
How can I serve the same responses using WebSockets?
P.S.: I've never worked with sockets before.
Thank you
ActionCable
I would second Dimitris's reference to ActionCable, as it's expected to become part of Rails 5 and should (hopefully) integrate with Rails quite well.
Since Dimitris suggested SSE, I would recommend against doing so.
SSE (Server-Sent Events) rely on long polling, and I would avoid this technology for many reasons, including SSE connection interruptions and extensibility (websockets allow you to add features that SSE won't support).
I am almost tempted to go into a rant about SSE implementation performance issues, but... even though websocket implementations should be more performant, many of them suffer from similar issues and the performance increase is often only in thanks to the websocket connection's longer lifetime...
Plezi
Plezi* is a real-time web application framework for Ruby. You can either use it on its own (which is not relevant for you) or together with Rails.
With only minimal changes to your code, you should be able to use websockets to return results from your RESTful API. Plezi's Getting Started Guide has a section about unifying the backend's RESTful and Websocket API's. Implementing it in Rails should be similar.
Here's a bit of demo code. You can put it in a file called plezi.rb and place it in your application's config/initializers folder...
Just make sure you're not using any specific servers (Thin, Puma, etc.), allowing Plezi to override the server and use the Iodine server, and remember to add Plezi to your Gemfile.
class WebsocketDemo
  # authenticate
  def on_open
    return close unless current_business
  end

  def on_message data
    data = JSON.parse(data) rescue nil
    return close unless data
    case data['msg']
    when /\Aget_transactions\z/i
      # call the RESTful API method here, if it's accessible. OR:
      transactions = @current_business.business_account.business_deposits.last(5)
      write({
        status: 200,
        number: transactions.count,
        # the next line has what I think is a design flaw, but I left it in
        transactions: transactions.as_json(only: [:created_at, :amount, :status, :client_card_number, :client_phone_number])
        # # Consider, instead, to avoid nesting JSON streams:
        # transactions: transactions.select(:created_at, :amount, :status, :client_card_number, :client_phone_number)
      }.to_json)
    end
  end

  # don't disclose inner methods to the router
  protected

  # better to make the original method a class method, letting you reuse it.
  def current_business
    account_number = params[:account_number].to_s
    api_key = params[:api_key].to_s
    if account_number && api_key
      account = BusinessAccount.find_by(account_number: account_number)
      if account && Business.find(account.business_id).business_api_key.token =~ /^(#{api_key})/
        return (@current_business = account.business)
      end
      false
    end
  end
end
Plezi.route '/(:api_key)/(:account_number)', WebsocketDemo
Now we have a route that looks something like: wss://my.server.com/app_key/account_number
This route can be used to send and receive data in JSON format.
To get the transaction list, the client side application can send:
JSON.stringify({msg: "get_transactions"})
This will result in data being sent to the client's websocket.onmessage callback with the last five transactions.
Of course, this is just a short demo, but I think it's a reasonable proof of concept.
* I should point out that I'm biased, as I'm Plezi's author.
P.S.
I would consider moving the authentication into a websocket "authenticate" message, allowing the application key to be sent in a less conspicuous manner.
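As a rough sketch only (the "authenticate" message name and the authenticate_from helper below are hypothetical, not part of Plezi or the demo above), the handler could look something like:

def on_message data
  data = JSON.parse(data) rescue nil
  return close unless data
  case data['msg']
  when /\Aauthenticate\z/i
    # credentials travel in the message body rather than in the URL
    @current_business = authenticate_from(data['account_number'], data['api_key'])
    close unless @current_business
  when /\Aget_transactions\z/i
    return write({status: 401, error: "Authentication Failed."}.to_json) unless @current_business
    # ... same as the demo above
  end
end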
EDIT
These are answers to the questions in the comments.
Capistrano
I don't use Capistrano, so I'm not sure... but, I think it would work if you add the following line to your Capistrano tasks:
Iodine.protocol = false
This will prevent the server from auto-starting, so your Capistrano tasks flow without interruption.
For example, at the beginning of the config/deploy.rb you can add the line:
Iodine.protocol = false
# then the rest of the file, i.e.:
set :deploy_to, '/var/www/my_app_name'
#...
You should also edit your Rakefile and add the same line at its beginning, so that it includes:
Iodine.protocol = false
Let me know how this works. Like I said, I don't use Capistrano and I haven't tested it out.
Keeping Passenger using a second app
The Plezi documentation states that:
If you really feel attached to your thin, unicorn, puma or passenger server, you can still integrate Plezi with your existing application, but they won't be able to share the same process and you will need to utilize the Placebo API (a guide is coming soon).
But the guide isn't written yet...
There's some information in the GitHub Readme, but it will be removed after the guide is written.
Basically, you include the Plezi application with the Redis URL inside your Rails application (remember to copy over all the gems used in its Gemfile). Then you add this line:
Plezi.start_placebo
That should be it.
Plezi will ignore the Plezi.start_placebo command if there is no other server defined, so you can put the command in a file shared with the Rails application as long as Plezi's Gemfile doesn't define a different server.
You can include some or all of the Rails application code inside the Plezi application. As long as Plezi (Iodine, actually) is the only server in the Plezi Gemfile, it should work.
The applications will synchronize using Redis and you can use your Plezi code to broadcast websocket events inside your Rails application.
You may want to have a look at https://github.com/rails/actioncable which is the Rails way to deal with WebSockets, but currently in Alpha.
Judging from your code snippet, the client seems to only consume data from your backend. I'm skeptical whether you really need WebSockets. If the client won't push data back to the server, Server-Sent Events seem more appropriate.
See relevant walk-through and documentation.

Is I18n.with_locale threadsafe?

I have created a feature that publishes a news item in the language of the page's creator.
Here is the code that creates the news:
def add_news
  locale = creator.language.blank? ? I18n.locale : creator.language
  I18n.with_locale(locale) do
    title = I18n.t('news.subject')
  end
  create_news({title: title})
end
It works well, and the news is created in the right language. But sometimes a wrong language is used. I have read the source code of i18n (https://github.com/svenfuchs/i18n/blob/master/lib/i18n.rb), and to me the with_locale function does not look threadsafe. I was very surprised because I have found no post about that problem.
So, what do you think? Threadsafe or not? Do you know another solution if so?
Thanks and br,
Eric
Looks like it is, going by the Ruby on Rails guides, as it is using Thread.current.
Also ran a small (conclusive) experiment:
n = I18n.available_locales.length
10.times do |i|
  loc = I18n.available_locales[i % n]
  Thread.new do
    I18n.with_locale(loc) do
      puts "#{loc} #{I18n.t 'one.of.your.keys'}"
    end
  end
end
Thread.current is not thread safe for threaded web servers like Puma or Thin. See github.com/steveklabnik/request_store for a more detailed explanation:
The problem
Everyone's worrying about concurrency these days. So people are using
those fancy threaded web servers, like Thin or Puma. But if you use
Thread.current, and you use one of those servers, watch out! Values
can stick around longer than you'd expect, and this can cause bugs.
For example, if we had this in our controller:
def index
  Thread.current[:counter] ||= 0
  Thread.current[:counter] += 1
  render :text => Thread.current[:counter]
end
If we ran this on MRI with Webrick, you'd get 1 as output, every time.
But if you run it with Thin, you get 1, then 2, then 3...
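The fix the gem suggests, as I understand its README, is to replace Thread.current with RequestStore.store, which is cleared between requests. A sketch:

# Gemfile: gem 'request_store'
def index
  RequestStore.store[:counter] ||= 0
  RequestStore.store[:counter] += 1
  render :text => RequestStore.store[:counter]
end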

Rails 3 application/octet-stream

I am working on an application where I use Paperclip for uploading images; the image is then manipulated in a Flash app and returned to my application as application/octet-stream. The problem is that the parameters from Flash are not available via params. I have seen examples that do something like
File.open(..,..) {|f| f.write(request.body) }
but when I do this, the file is damaged somehow.
How can I handle this in rails 3?
After you make sure that the request parameters have hit the Rails application, you may want to ensure that there were no parsing problems. Try adding these lines in your controller's action:
def update # (or whatever)
  logger.debug "params: #{params.inspect}"
  # I hope you do not test this using very large files ;)
  logger.debug "request.raw_post: #{request.raw_post.inspect}"
  # ...
end
Maybe the variable names got changed somehow? Maybe something escaped the parameter string one time too much?
Also, you have said that the file into which you want to save the request body is damaged. How exactly?
The request.body object does not need to be a String. It may be a StringIO, for example, so you may want to write this:
File.open(..,..) {|f| f.write(request.body.read) }

Rails - Paper_Clip - Support for Multi File Uploads

I have paper_clip installed on my Rails 3 app, and can upload a file - wow that was fun and easy!
Challenge now is, allowing a user to upload multiple objects.
Whether it be clicking "select files" and being able to select more than one, or clicking a "more" button and getting another file upload field.
I can't find any tutorials or gems to support this out of the box. Shocking I know...
Any suggestions or solutions? Seems like a common need.
Thanks
Okay, this is a complex one but it is doable. Here's how I got it to work.
On the client side I used http://github.com/valums/file-uploader, a javascript library which allows multiple file uploads with progress-bar and drag-and-drop support. It's well supported, highly configurable and the basic implementation is simple:
In the view:
<div id='file-uploader'><noscript><p>Please Enable JavaScript to use the file uploader</p></noscript></div>
In the js:
var uploader = new qq.FileUploader({
  element: $('#file-uploader')[0],
  action: 'files/upload',
  onComplete: function(id, fileName, responseJSON){
    // callback
  }
});
When handed files, FileUploader posts them to the server as an XHR request where the POST body is the raw file data, while the headers and filename are passed in the URL string (this is the only way to upload a file asynchronously via JavaScript).
This is where it gets complicated: since Paperclip has no idea what to do with these raw requests, you have to catch them and convert them back to standard files (preferably before they hit your Rails app) so that Paperclip can work its magic. This is done with some Rack middleware which creates a new Tempfile (remember: Heroku is read-only):
# Embarrassing note: This code was adapted from an example I found somewhere online
# if you recognize any of it please let me know so I can pass credit.
module Rack
  class RawFileStubber

    def initialize(app, path=/files\/upload/) # change for your route, careful.
      @app, @path = app, path
    end

    def call(env)
      if env["PATH_INFO"] =~ @path
        convert_and_pass_on(env)
      end
      @app.call(env)
    end

    def convert_and_pass_on(env)
      tempfile = env['rack.input'].to_tempfile
      fake_file = {
        :filename => env['HTTP_X_FILE_NAME'],
        :type => content_type(env['HTTP_X_FILE_NAME']),
        :tempfile => tempfile
      }
      env['rack.request.form_input'] = env['rack.input']
      env['rack.request.form_hash'] ||= {}
      env['rack.request.query_hash'] ||= {}
      env['rack.request.form_hash']['file'] = fake_file
      env['rack.request.query_hash']['file'] = fake_file
      if query_params = env['HTTP_X_QUERY_PARAMS']
        require 'json'
        params = JSON.parse(query_params)
        env['rack.request.form_hash'].merge!(params)
        env['rack.request.query_hash'].merge!(params)
      end
    end

    def content_type(filename)
      case type = (filename.to_s.match(/\.(\w+)$/)[1] rescue "octet-stream").downcase
      when %r"jp(e|g|eg)" then "image/jpeg"
      when %r"tiff?" then "image/tiff"
      when %r"png", "gif", "bmp" then "image/#{type}"
      when "txt" then "text/plain"
      when %r"html?" then "text/html"
      when "js" then "application/js"
      when "csv", "xml", "css" then "text/#{type}"
      else 'application/octet-stream'
      end
    end
  end
end
Later, in application.rb:
config.middleware.use 'Rack::RawFileStubber'
Then in the controller:
def upload
  @foo = modelWithPaperclip.create({ :img => params[:file] })
end
This works reliably, though it can be a slow process when uploading a lot of files simultaneously.
DISCLAIMER
This was implemented for a project with a single, known & trusted back-end user. It almost certainly has some serious performance implications for a high traffic Heroku app and I have not fire tested it for security. That said, it definitely works.
The method Ryan Bigg recommends is here:
https://github.com/rails3book/ticketee/commit/cd8b466e2ee86733e9b26c6c9015d4b811d88169
https://github.com/rails3book/ticketee/commit/982ddf6241a78a9e6547e16af29086627d9e72d2
The file-uploader recommendation by Daniel Mendel is really great. It's a seriously awesome user experience, like Gmail drag-and-drop uploads. Someone wrote a blog post about how to wire it up with a rails app using the rack-raw-upload middleware, if you're interested in an up-to-date middleware component.
http://pogodan.com/blog/2011/03/28/rails-html5-drag-drop-multi-file-upload
https://github.com/newbamboo/rack-raw-upload
http://marc-bowes.com/2011/08/17/drag-n-drop-upload.html
There's also another plugin that's been updated more recently which may be useful
jQuery-File-Upload
Rails setup instructions
Rails setup instructions for multiples
And another one (Included for completeness. I haven't investigated this one.)
PlUpload
plupload-rails3
These questions are highly related
Drag-and-drop file upload in Google Chrome/Chromium and Safari?
jQuery Upload Progress and AJAX file upload
I cover this in Rails 3 in Action's Chapter 8. I don't cover uploading to S3 or resizing images however.
Recommending you buy it based solely on it fixing this one problem may sound a little biased, but I can just about guarantee you that it'll answer other questions you have down the line. It has a Behaviour Driven Development approach as one of the main themes, introducing you to Rails features during the development of an application. This shows you not only how you can build an application, but also make it maintainable.
As for the resizing of images after they've been uploaded, Paperclip's got pretty good documentation on that. I'd recommend having a read and then asking another question on SO if you don't understand any of the options / methods.
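As a quick illustration (the geometry strings below are arbitrary placeholders), a resize configuration usually boils down to something like:

has_attached_file :photo,
  :styles => { :medium => "300x300>", :thumb => "100x100>" }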
And as for S3 uploading, you can do this:
has_attached_file :photo, :styles => { ... }, :storage => :s3
You'd need to configure Paperclip::Storage::S3 with your S3 details to set it up, and again Paperclip's got some pretty awesome documentation for this.
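A minimal sketch of that setup, assuming your credentials live in config/s3.yml (the bucket name and path pattern below are placeholders):

has_attached_file :photo,
  :storage => :s3,
  :s3_credentials => "#{Rails.root}/config/s3.yml",  # holds access_key_id / secret_access_key
  :bucket => "my-app-photos",
  :path => "photos/:id/:style/:filename"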
Good luck!

Secure paperclip urls only for secure pages

I'm trying to find the best way to make paperclip urls secure, but only for secure pages.
For instance, the homepage, which shows images stored in S3, is http://mydomain.com and the image url is http://s3.amazonaws.com/mydomainphotos/89/thisimage.JPG?1284314856.
I have secure pages like https://mydomain.com/users/my_stuff/49 that has images stored in S3, but the S3 protocol is http and not https, so the user gets a warning from the browser saying that some elements on the page are not secure, blah blah blah.
I know that I can specify :s3_protocol in the model, but this makes everything secure even when it isn't necessary. So, I'm looking for the best way to change the protocol to https on the fly, only for secure pages.
One (probably bad) way would be to create a new url method like:
def custom_url(style = default_style, ssl = false)
  ssl ? self.url(style).gsub('http', 'https') : self.url(style)
end
One thing to note is that I'm using the ssl_requirement plugin, so there might be a way to tie it in with that.
I'm sure there is some simple, standard way to do this that I'm overlooking, but I can't seem to find it.
If anyone stumbles upon this now: There is a solution in Paperclip since April 2012! Simply write:
Paperclip::Attachment.default_options[:s3_protocol] = ""
in an initializer or use the s3_protocol option inside your model.
Thanks to @Thomas Watson for initiating this.
If using Rails 2.3.x or newer, you can use Rails middleware to filter the response before sending it back to the user. This way you can detect if the current request is an HTTPS request and modify the calls to s3.amazonaws.com accordingly.
Create a new file called paperclip_s3_url_rewriter.rb and place it inside a directory that's loaded when the server starts. The lib directory will work, but many prefer to create an app/middleware directory and add it to the Rails application load path.
Add the following class to the new file:
class PaperclipS3UrlRewriter

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, response = @app.call(env)
    if response.is_a?(ActionController::Response) && response.request.protocol == 'https://' && headers["Content-Type"].include?("text/html")
      body = response.body.gsub('http://s3.amazonaws.com', 'https://s3.amazonaws.com')
      headers["Content-Length"] = body.length.to_s
      [status, headers, body]
    else
      [status, headers, response]
    end
  end
end
Then just register the new middleware:
Rails 2.3.x: Add the line below to environment.rb in the beginning of the Rails::Initializer.run block.
Rails 3.x: Add the line below to application.rb in the beginning of the Application class.
config.middleware.use "PaperclipS3UrlRewriter"
UPDATE:
I just edited my answer and added a check for response.is_a?(ActionController::Response) in the if statement. In some cases (maybe caching related) the response object is an empty array(?) and hence fails when request is called upon it.
UPDATE 2:
I edited the Rack/Middleware code example above to also update the Content-Length header. Otherwise the HTML body will be truncated by most browsers.
Use the following code in a controller class:
# locals/arguments/methods you must define or have available:
#   attachment - the paperclip attachment object, not the ActiveRecord object
#   request - the Rack/ActionController request
AWS::S3::S3Object.url_for \
  attachment.path,
  attachment.options[:bucket].to_s,
  :expires_in => 10.minutes, # only necessary for private buckets
  :use_ssl => request.ssl?
You can of course wrap this up nicely into a method.
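For instance, a small helper along these lines (the method name is just a suggestion):

def signed_s3_url(attachment, request)
  AWS::S3::S3Object.url_for \
    attachment.path,
    attachment.options[:bucket].to_s,
    :expires_in => 10.minutes,  # only necessary for private buckets
    :use_ssl => request.ssl?
end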
FYI - some of the answers above do not work with Rails 3+, because ActionController::Response has been deprecated. Use the following:
class PaperclipS3UrlRewriter

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, response = @app.call(env)
    if response.is_a?(ActionDispatch::BodyProxy) && headers && headers.has_key?("Content-Type") && headers["Content-Type"].include?("text/html")
      response.body[0] = response.body[0].gsub('http://s3.amazonaws.com', 'https://s3.amazonaws.com')
      headers["Content-Length"] = response.body[0].length.to_s
      [status, headers, response]
    else
      [status, headers, response]
    end
  end
end
And make sure that you add the middleware in a good place in the stack (I added it after Rack::Runtime)
config.middleware.insert_after Rack::Runtime, "PaperclipS3UrlRewriter"
