Batch downloads in Ruby on Rails

I've got a Ruby on Rails app and I was wondering what the best way to do batch downloads would be. At any given time I've got a set of URLs that point to files that I want my users to be able to download. Based on a search of those files, I want users to be able to download a subset, say the search results, in one go instead of downloading each file individually. This set may number in the thousands. My real question is: given an array of URLs, how do I enable my app to download that entire set at once? I did some Googling and came up with the solution below. It doesn't work for me, though it did seem to work for those who posted it as a solution to a similar problem. Any and all input would be appreciated.
# controller code
def download
  for n in 0..(@urls.length - 1)
    send_file(@urls[n], :type => "video/quicktime",
                        :filename => @urls[n].basename,
                        :disposition => "attachment")
  end
end
# view code
<%= link_to 'Download them all', :controller => 'my_controller',
                                 :action => 'download' %>

This approach looks like it will use a huge amount of memory, especially with thousands of files downloaded per user at a time. Perhaps you should instead ZIP the files in the background after the user clicks a link, and then either send the archive to the user or email them the location of the ZIP. Zipping that many files will still use a lot of memory, so offloading that task to another server might be a good idea.
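As a rough illustration of bundling many files into one download, here is a minimal sketch. It swaps in a gzip-compressed tar archive instead of a ZIP, purely because Ruby's standard library can write tar.gz without extra gems; a ZIP library would follow the same shape. The method name bundle_files and the 16 KB chunk size are assumptions for illustration, not part of the original answer.

```ruby
require "rubygems/package"
require "zlib"

# Bundle the given files into one gzip-compressed tar archive.
# Each file is copied through in 16 KB chunks, so only one chunk
# is held in memory at a time regardless of file size.
# Note: entries are stored by basename, so two inputs with the same
# basename would produce duplicate entry names.
def bundle_files(paths, archive_path)
  Zlib::GzipWriter.open(archive_path) do |gzip|
    Gem::Package::TarWriter.new(gzip) do |tar|
      paths.each do |path|
        size = File.size(path)
        tar.add_file_simple(File.basename(path), 0o644, size) do |entry|
          File.open(path, "rb") do |source|
            while (chunk = source.read(16 * 1024))
              entry.write(chunk)
            end
          end
        end
      end
    end
  end
  archive_path
end
```

A background job could call bundle_files with the paths from the search result and then hand the resulting archive to a single send_file call, or store it somewhere and email the link.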

Related

Prevent bots from accessing rails active_storage images

My site has a large number of graphs which are recalculated each day as new data is available. The graphs are stored on Amazon S3 using active_storage. A typical example would be
# app/models/graph.rb
class Graph < ApplicationRecord
has_one_attached :plot
end
and in the view
<%= image_tag graphs.latest.plot %>
where graphs.latest retrieves the latest graph. Each day, a new graph and attached plot is created and the old graph/plot is deleted.
A number of bots, including those from Google and Yandex, are indexing the graphs, but they then generate exceptions when they return and request the image again at URLs like
www.myapp.com/rails/active_storage/representations/somelonghash
Is there a way to produce a durable link for the plot that does not expire when the graph/plot is deleted and then recalculated? Failing this, is there a way to block bots from accessing these plots?
Note that I currently have a catchall at the end of the routes.rb file:
get '*all', to: 'application#route_not_found', constraints: lambda { |req|
req.path.exclude? 'rails/active_storage'
} if Rails.env.production?
The exclusion of active storage in the catchall is in response to this issue. It is tempting to remove the active_storage exemption, but this might then stop proper active_storage routes.
Maybe I can put something in rack_rewrite.rb to fix this?

Interesting question.
A simple solution would be to use send_data to send the image directly. However, that has its own issues, mostly that it will probably increase server bandwidth usage (and reduce server performance). Still, a solution like that is what you need if you don't want to go to the trouble of creating the redirect model and surrounding logic described below.
Original Answer
The redirect will require setting up some sort of Redirects::Graph model. It can verify that a graph was deleted and redirect to the new graph instead of the requested one. It would have two fields: an old_signed_id (biglonghash) and a new_signed_id.
Every time you delete
We'll need to populate the redirects model and also add a new entry every time a new graph is created (we should be able to generate the signed_id from the blob somehow).
For performance, and to avoid long runs of consecutive redirects (which may cause a different error), you'll have to keep the redirects flat. For example, say you have a redirect A => B and you delete B, replacing it with C. You now need A => C and B => C (to avoid an A => B => C redirect); otherwise the chain could get rather long. This can be handled efficiently by adding the new old_signed_id => new_signed_id entry and running Redirects::Graph.where(new_signed_id: old_signed_id).update_all(new_signed_id: new_signed_id) to update all the relevant old redirects whenever you regenerate the graph.
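To make the chain-flattening idea concrete, here is a minimal plain-Ruby sketch that uses a Hash in place of the redirects table. The class name RedirectTable and its methods are hypothetical stand-ins; in the real app the same two steps would be the update_all call plus a create on the model.

```ruby
# In-memory stand-in for the redirects table: old_signed_id => new_signed_id.
class RedirectTable
  def initialize
    @targets = {}
  end

  # Record that old_id has been replaced by new_id. Every existing
  # redirect that pointed at old_id is repointed first, so a lookup
  # never has to follow more than one hop.
  def replace(old_id, new_id)
    @targets.transform_values! { |target| target == old_id ? new_id : target }
    @targets[old_id] = new_id
  end

  # Return the current id for signed_id, or signed_id itself when
  # no redirect is recorded for it.
  def resolve(signed_id)
    @targets.fetch(signed_id, signed_id)
  end
end
```

After replace("A", "B") followed by replace("B", "C"), both "A" and "B" resolve directly to "C" without passing through an intermediate redirect.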
The controller itself is trickier. The cleanest method I can think of is to monkey patch ActiveStorage::RepresentationsController to add a before_action that does something like this (may not work as-is; the params[:signed_id] and the representations path may not be right):
before_action :redirect_if_needed

def redirect_if_needed
  redirect_model = Redirects::Graph.find_by(old_signed_id: params[:signed_id])
  redirect_to rails_activestorage_representations_path(
    signed_id: redirect_model.new_signed_id
  ) if redirect_model.present?
end
If you have version control set up for your database (e.g. the Papertrail gem or something similar), you may be able to work out the old_signed_id and new_signed_id with a bit of work and build the redirects for the URLs currently causing errors. Otherwise, sadly, this approach will only prevent future errors, and it may be impossible to get the currently broken URLs working.
Ideally, though, you would update the blob itself to use the new graph instead of deleting the old one, but I'm not sure that's possible or practical.

Have you tried robots.txt?
In the file /robots.txt put:
User-agent: *
Disallow: /rails/active_storage*

Streaming Download while File is Created

I was wondering if anyone knows how to stream a file download while it's being created.
I'm generating a huge CSV export, and right now it takes a couple of minutes for the file to be created. Once it's created, the browser then downloads the file.
I want to change this so that the browser starts downloading the file while it's being created. With a progress bar to look at, users will be more willing to wait. Even though it would show "Unknown time remaining", I'm less likely to get impatient when I know data is being steadily downloaded.
NOTE: I'm using Rails version 3.0.9
Here is my code:
def users_export
  File.new("users_export.csv", "w") # creates new file to write to
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"
  @users = User.select('id, login, email, last_login, created_at, updated_at')
  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]
    csv << [ "id","login","email","last_login", "created_at", "updated_at" ]
    @users.find_each(:batch_size => 100 ) do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end
  send_file "users_export.csv",
    :type => 'text/csv; charset=iso-8859-1; header=present',
    :disposition => "attachment; filename=#{@outfile}",
    :stream => true
end

I sought an answer to this question several weeks ago. I thought that if data was being streamed back to the client then maybe Heroku wouldn't time out one of my long running API calls after 30 seconds. I even found an answer that looked promising:
format.xml do
  self.response_body =
    lambda { |response, output|
      output.write("<?xml version='1.0' encoding='UTF-8' ?>")
      output.write("<results type='array' count='#{@report.count}'>")
      @report.each do |result|
        output.write("""
          <result>
            <element-1>Data-1</element-1>
            <element-2>Data-2</element-2>
            <element-n>Data-N</element-n>
          </result>
        """)
      end
      output.write("</results>")
    }
end
The idea is that the response_body lambda will have direct access to the output buffer going back to the client. However, in practice Rack has its own ideas about what data should be sent back and when. Furthermore, this response_body-as-lambda pattern is deprecated in newer versions of Rails, and I think support is dropped outright in 3.2. You could get your hands dirty in the middleware stack and write this output as a Rails Metal, but...
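For comparison, the shape that replaced the lambda pattern relies on the fact that a Rack response body only has to respond to each. Here is a minimal plain-Ruby sketch of a CSV body that is produced one line at a time. The helper names csv_field and csv_lines are made up for illustration, and whether the chunks actually reach the browser incrementally still depends on the server and middleware in front of it.

```ruby
# Quote a single CSV field only when it contains a comma, quote, or newline.
def csv_field(value)
  text = value.to_s
  return text unless text.match?(/[",\r\n]/)
  "\"#{text.gsub('"', '""')}\""
end

# Build an Enumerator that yields one CSV line per record.
# No record is read until the consumer asks for the next line.
def csv_lines(header, records)
  Enumerator.new do |lines|
    lines << header.map { |h| csv_field(h) }.join(",") + "\n"
    records.each do |record|
      lines << record.map { |v| csv_field(v) }.join(",") + "\n"
    end
  end
end
```

In a controller, an object like this could be assigned as the response body so that rows are pulled from the database (for example via find_each) only as the response is written, rather than after the whole file exists.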
If I may be so bold, I strongly suggest refactoring this work to a background job. The benefits are many:
Your users will not have to just sit and wait for the download. They can request a file and then browse away to other more exciting portions of your site.
The file generation and download will be more robust. For example, if a user loses internet connectivity, even briefly, on minute three of a download under the current setup, they will lose all that time and need to start over again. If the file is being generated in the background on your site, they only need internet for as long as it takes to get the job started.
It will decrease the load on your front-end processes and may decrease the load on your site in total if the background job generates the files and you provide links to the generated files on a page within your app. Chances are one file generation could serve several downloads.
Since practically all Rails web servers are single threaded and synchronous out of the box, you will have an entire app server process tied up on this one file download for each time a user requests it. This makes it easy for users to accidentally carry out a DoS attack on your site.
You can ship the background generated file to a CDN such as S3 and perhaps gain a performance boost on the download speed your users see.
When the background process is done you can notify the user via email so they don't even have to be at the computer where they initiated the file generation in order to know it's done.
Once you have a background job system in your application you will find many more uses for it, such as sending email or updating search indexing.
Sorry that this doesn't really answer your original question. But I strongly believe this is a better overall solution.

Email open notification - ruby on rails

If I send 100 emails to registered users, I want to know whether each user opened the email or not.
How can I do this using Ruby on Rails?

The only way to do this is to use HTML email with a tracker image. You need to include a user-specific image in the email's markup.
class TrackingController < ApplicationController
  def image
    # do something with params[:id]
    send_file "/path/to/an/image"
  end
end
Add the following route:
# Rails 2
map.tracking_image "tracking_image/:id.gif", :controller => 'tracking', :action => 'image'
# Rails 3
match 'tracking_image/:id', :to => 'tracking#image', :as => "tracking_image"
# Rails 4 (match without verb is deprecated)
get 'tracking_image/:id' => 'tracking#image', as: 'tracking_image'
# or
match 'tracking_image/:id' => 'tracking#image', as: 'tracking_image', via: :get
in your email template something like this:
<%= image_tag tracking_image_url(@user.id) %>
But be aware that it's not guaranteed that the user reads the email and loads the image. Some email clients don't load images until the user asks them to, and if the user doesn't, you can't do anything about it. Also, if the user reads mail as plain text only, this won't work either.

Short answer: you can't. Slightly longer answer: you can't reliably.
Using something like VERP you can automate the bounce processing to get a fairly good idea of whether the far-end mail server accepted the email. But after that all bets are off. You can't really tell what the email server did with it (routed it to the junk/spam folder, put it in the inbox, silently dropped it on the floor/bit bucket, etc.). You could enable read-receipt headers in your email, but that is client specific (and people like me eat/deny them). You can look into using a web bug, for example by customizing each email with HTML that pulls a remote image with a unique id associated with it, but again that is client specific, and most clients will not load remote images. So unless the email bounces, there is no 100% reliable way to tell what happens to the email after it leaves your server.

I am not very familiar with ruby but have written multiple mass mailer apps. You can use a webbug image to get an approximate open rate. Basically it is just a one pixel or transparent image with some tracking information:
<img src="http://mysite/trackingimage.gif?email=x&customer=y">
What I do is make a directory called trackingimage.gif with an index in it that reads and stores the URL params and then redirects to the real image.
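Here is a minimal plain-Ruby sketch of the bookkeeping behind such a web bug: each recipient gets an unguessable token, and a request for the image with that token counts as an open. The class name OpenTracker and its methods are hypothetical, the in-memory hashes stand in for a database table, and the base64 string is a commonly used 1x1 transparent GIF.

```ruby
require "securerandom"

# A commonly used 1x1 transparent GIF, decoded from base64.
PIXEL_GIF = "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7".unpack1("m").freeze

class OpenTracker
  def initialize
    @recipients = {}      # token => email
    @opens = Hash.new(0)  # email => number of image loads
  end

  # Issue a random token to embed in this recipient's image URL.
  def token_for(email)
    token = SecureRandom.urlsafe_base64(16)
    @recipients[token] = email
    token
  end

  # Called when the image is requested. Unknown tokens are ignored,
  # but the pixel is returned either way so nothing looks broken.
  def record_open(token)
    email = @recipients[token]
    @opens[email] += 1 if email
    PIXEL_GIF
  end

  def open_count(email)
    @opens[email]
  end

  def opened?(email)
    open_count(email) > 0
  end
end
```

A controller action could pass params[:id] to record_open and return the bytes with send_data and an image/gif content type. As noted above, a count of zero only means the image was never loaded, not that the email was never read.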

What is the best way to show my users a preview of email templates in Ruby on Rails?

My software sends emails for users. I want to show them what the emails will look like before they get sent. However, with ActionMailer conventions, the entire template is in one file. This means the html,head,body tags, etc. Can anyone think of a good way to give my users a preview of what the emails I send out will look like?
Thanks!

I had the same issue. I built the display around the associated model I was sending rather than in the mailer. That let me feed it sample data or live data to display it to the user.
When it came time to actually send it, I rendered the exact same thing within the mailer view.
EDIT:
I apologize for the crap variable names in advance. I am not sure I am allowed to explicitly talk about them :)
Let's say I have a BarMailer function called foo(status, bar),
where status indicates a test email or a live email and bar is my associated model.
I called deliver_foo("test", bar).
deliver_foo sends out a multipart message, so for each part I call render_message and pass along the variables I need. For example:
p.body = render_message('bar_html', :bar => bar, :other_data => bar.other_data)
So that render_message call says to specifically use the bar_html view (I also have a bar_text for plain text).
This is the contents of my bar_html view:
<%= render :inline => @bar.some_parent.some_other_model.html, :locals => {:other_data => @other_data, :time => Time.now, :bar => @bar } %>
It's a little complicated, but it is based on a template system. By rendering inline everywhere, I am able to use the same code for a number of different functions, including previewing and sending. I like this because it becomes WYSIWYG. There is no extra code or functionality that could be buggy and muck with the potential output in an email. If it works in one area, it will work in the other. Plus, keeping it DRY means I am not going to forget to modify a copy (which I would do frequently, hehe).
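The same idea can be sketched outside Rails with the standard library's ERB: one template, one render method, and two thin wrappers for the preview page and the outgoing email. The template text and the method names render_email_body, preview_page, and email_html are invented for illustration and are not the original author's code.

```ruby
require "erb"

# One shared template. Values are HTML-escaped on the way in.
EMAIL_TEMPLATE = ERB.new(<<~HTML)
  <h1>Hello <%= ERB::Util.html_escape(name) %></h1>
  <p>Your report for <%= ERB::Util.html_escape(date) %> is ready.</p>
HTML

# The single place where the body is rendered.
def render_email_body(name:, date:)
  EMAIL_TEMPLATE.result_with_hash(name: name, date: date)
end

# In-app preview: the body embedded in a page fragment, fed sample data.
def preview_page(sample)
  "<div class=\"preview\">#{render_email_body(**sample)}</div>"
end

# Outgoing email: the same body wrapped in a full HTML document.
def email_html(data)
  "<html><head></head><body>#{render_email_body(**data)}</body></html>"
end
```

Because both wrappers call the same render method, whatever the user sees in the preview is byte-for-byte the body that gets sent.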

Getting rendered images to the browsers in Rails

I am writing a Rails app that processes data into a graph (using Scruffy). How can I render the graph to a blob/string and then send the blob/string directly to the browser to be displayed (without saving it to a file)? Or do I need to render it, save it to a file, and then display the saved image file in the browser?

I think you will be able to use send_data for this purpose:
send_data data_string, :filename => 'icon.jpg', :type => 'image/jpeg', :disposition => 'inline'
If you put this in a controller action - say show on a picture controller, then all you need do is include the following in your view (assuming RESTful routes):
<%= image_tag picture_path(@picture) %>

I wonder if sending direct to the browser is the best way? If there is the possibility that users will reload the page would this short circuit any cache possibilities? I ask because I really don't know.

"If there is the possibility that users will reload the page would this short circuit any cache possibilities?"
No - whether you're serving from a file system or send_data doesn't matter. The browser is getting the data from your server anyway. Just make sure you've got your HTTP caching directives sorted out.
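As one example of what sorting out the caching directives can look like, here is a minimal plain-Ruby sketch of an ETag check that returns 304 Not Modified when the browser already holds the current bytes. The method names etag_for and serve_image and the one-hour max-age are assumptions for illustration; Rails also ships its own conditional GET helpers (fresh_when and stale?) that cover the same ground inside a controller.

```ruby
require "digest"

# Derive a quoted ETag from the image bytes themselves.
def etag_for(data)
  "\"#{Digest::SHA256.hexdigest(data)}\""
end

# Return [status, headers, body]. When the client's If-None-Match
# value equals the current ETag, the body is skipped entirely.
def serve_image(data, if_none_match: nil, max_age: 3600)
  etag = etag_for(data)
  headers = {
    "ETag" => etag,
    "Cache-Control" => "public, max-age=#{max_age}"
  }
  return [304, headers, ""] if if_none_match == etag
  [200, headers.merge("Content-Type" => "image/jpeg"), data]
end
```

The image still has to be rendered to compute the tag in this sketch, so it saves bandwidth rather than rendering time; deriving the tag from the underlying data's timestamp instead would avoid the render as well.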
