Rails send_data timeout issue - ruby-on-rails

To export to Excel, I use RubyXL to create a workbook based on the queried result, and use send_data to download it. The code looks like this:
workbook = RubyXL::Workbook.new
# Fill workbook here
send_data workbook.stream.string, filename: "myreport.xlsx", disposition: 'attachment'
It works well when there is not much data, but when the data size increases (for example, when the saved Excel file exceeds 3MB), the download fails in the browser with the following message:
Network Error (tcp_error)
A communication error occurred: ""
The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time.
It seems unrelated to the server timeout setting; I even changed the timeout in Unicorn to 6000 (100 minutes), and it still did not work...
Could you shed some light on how to solve this issue? Thanks in advance!
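One way to avoid buffering the whole file into a single send_data call is to read the workbook's StringIO in chunks and write each chunk to the response (for example with ActionController::Live). Below is a framework-free sketch of the chunked read only, with a plain StringIO standing in for workbook.stream; the chunk size and payload are illustrative:

```ruby
require 'stringio'

# Hypothetical stand-in for workbook.stream: a large in-memory StringIO.
io = StringIO.new("x" * (5 * 1024 * 1024)) # ~5 MB payload

# Read the payload in 64 KB chunks instead of one big string; in a Live
# controller, each chunk would be written to response.stream.
CHUNK = 64 * 1024
sent = 0
while (chunk = io.read(CHUNK))
  sent += chunk.bytesize # stand-in for response.stream.write(chunk)
end
```

Chunked writes keep bytes flowing to the client while the transfer is in progress, which is what proxies with idle-connection timeouts care about.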

Related

How can I get around Heroku's HTTP 30 second limit?

I inherited a rails app that is deployed using Heroku (I think). I edit it on AWS's Cloud9 IDE and, for now, just do everything in development mode. The app's purpose is to process large amounts of survey data and spit it out onto a PDF report. This works for small reports with like 10 rows of data, but when I load a report that is querying a data upload of 5000+ rows to create an HTML page which gets converted to a PDF, it takes around 105 seconds, much longer than Heroku's 30 seconds allotted for HTTP requests.
Heroku says this on their website, which gave me some hope:
"Heroku supports HTTP 1.1 features such as long-polling and streaming responses. An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated." (Source: https://devcenter.heroku.com/articles/request-timeout#long-polling-and-streaming-responses)
This sounds excellent to me - I can just send a request to the client every second or so in a loop until we're done creating the large PDF report. However, I don't know how to send or receive a byte or so to "reset the rolling 55 second window" they're talking about.
Here's the part of my controller that is sending the request.
return render pdf: pdf_name + " " + pdf_year.to_s,
              disposition: 'attachment',
              page_height: 1300,
              encoding: 'utf8',
              page_size: 'A4',
              footer: { html: { template: 'recent_grad/footer.html.erb' }, spacing: 0 },
              margin: { top: 10, # default 10 (mm)
                        bottom: 20,
                        left: 10,
                        right: 10 },
              template: "recent_grad/report.html.erb",
              locals: { start: @start, survey: @survey, years: @years, college: @college, department: @department, program: @program, emphasis: @emphasis, questions: @questions }
I'm making other requests to get to this point, but I believe the part that is causing the issue is here where the template is being rendered. My template queries the database in a finite loop that stops when it runs out of survey questions to query from.
My question is this: how can I "send or receive a byte to the client" to tell Heroku "I'm still trying to create this massive PDF so please reset the timer and give me my 55 seconds!" Is it in the form of a query? Because, if so, I am querying the MySql database over and over again in my report.html.erb file.
Also, it used to work without issues and does work on small reports, but now I get the error "504 Gateway Timeout" before the request is complete on the actual page, but my puma console continues to query the database like a mad man. I assume it's a Heroku problem because the 504 error happens exactly every 35 seconds (5 seconds to process the other parts and 30 seconds to try to finish the loop in the template so it can render correctly).
If you need more information or code, please ask! Thanks in advance
EDIT:
Both of the comments below suggest possible duplicates, but neither of them have a real answer with real code, they simply refer to the docs that I am quoting here. I'm looking for a code example (or at least a way to get my foot in the door), not just a link to the docs. Thanks!
EDIT 2:
I tried what @Sergio said and installed Sidekiq. I think I'm really close, but I'm still having some issues with the worker. The worker doesn't have access to ActionView::Base, which is required for the render method in Rails, so it's not working. I can reach the worker method, which means my Sidekiq and Redis servers are running correctly, but it gets caught on the ActionView line with this error:
WARN: NameError: uninitialized constant HardWorker::ActionView
Here's the worker code:
require 'sidekiq'

Sidekiq.configure_client do |config|
  # config.redis = { db: 1 }
  config.redis = { url: 'redis://172.31.6.51:6379/0' }
end

Sidekiq.configure_server do |config|
  # config.redis = { db: 1 }
  config.redis = { url: 'redis://172.31.6.51:6379/0' }
end
class HardWorker
  include Sidekiq::Worker

  def perform(pdf_name, pdf_year)
    av = ActionView::Base.new()
    av.view_paths = ActionController::Base.view_paths
    av.class_eval do
      include Rails.application.routes.url_helpers
      include ApplicationHelper
    end
    puts "inside hardworker"
    puts pdf_name, pdf_year
    av.render pdf: pdf_name + " " + pdf_year.to_s,
              disposition: 'attachment',
              page_height: 1300,
              encoding: 'utf8',
              page_size: 'A4',
              footer: { html: { template: 'recent_grad/footer.html.erb' }, spacing: 0 },
              margin: { top: 10, # default 10 (mm)
                        bottom: 20,
                        left: 10,
                        right: 10 },
              template: "recent_grad/report.html.erb",
              locals: { start: @start, survey: @survey, years: @years, college: @college, department: @department, program: @program, emphasis: @emphasis, questions: @questions }
  end
end
Any suggestions?
EDIT 3:
I did what @Sergio said and attempted to make a PDF from an html.erb file directly and save it to a file. Here's my code:
# /app/controllers/recentgrad_controller.rb
pdf = WickedPdf.new.pdf_from_html_file('home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb')
save_path = Rails.root.join('pdfs', pdf_name + pdf_year.to_s + '.pdf')
File.open(save_path, 'wb') do |file|
  file << pdf
end
And the error output:
RuntimeError (Failed to execute:
["/usr/local/rvm/gems/ruby-2.4.1#gradSurvey/bin/wkhtmltopdf", "file:///home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb", "/tmp/wicked_pdf_generated_file20190523-15416-hvb3zg.pdf"]
Error: PDF could not be generated!
Command Error: Loading pages (1/6)
Error: Failed loading page file:///home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb (sometimes it will work just to ignore this error with --load-error-handling ignore)
Exit with code 1 due to network error: ContentNotFoundError
):
I have no idea what it means when it says "sometimes it will work just to ignore this error with --load-error-handling ignore". The file definitely exists and I've tried maybe 5 variations of the file path.
I've had to do something like this several times. In all cases, I ended up writing a background job that does all the heavy lifting generation. And because it's not a web request, it's not affected by the 30 seconds timeout. It goes something like this:
client (your javascript code) requests a new report.
server generates job description and enqueues it for your worker to pick up.
worker picks the job from the queue and starts working (querying database, etc.)
in the meantime, the client periodically asks the server "is my report done yet?". The server responds with "not yet, try again later".
worker is finished generating the report. It uploads the file to some storage (S3, for example), sets job status to "completed" and job result to the download link for the uploaded report file.
server, seeing that job is completed, can now respond to client status update requests "yes, it's done now. Here's the url. Have a good day."
Everybody's happy. And nobody had to do any streaming or playing with heroku's rolling response timeouts.
The scenario above uses short-polling. I find it the easiest to implement. But it is, of course, a bit wasteful with regard to resources. You can use long-polling or websockets or other fancy things.
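The workflow above can be sketched without any framework. Everything below is an illustrative stand-in, not a real API: in a real app the queue would be Sidekiq or Delayed Job, the job store would be Redis or the database, and the S3 URL is made up:

```ruby
require 'securerandom'

# Minimal in-memory job store standing in for Redis/the database.
JOBS = {}

# "Enqueue" a report job; the worker is simulated with a thread.
def enqueue_report
  id = SecureRandom.hex(4)
  JOBS[id] = { status: 'pending', url: nil }
  Thread.new do
    sleep 0.1 # pretend to build the PDF and upload it
    JOBS[id] = { status: 'completed',
                 url: "https://example-bucket.s3.amazonaws.com/#{id}.pdf" }
  end
  id
end

# The status endpoint the client short-polls.
def poll(id)
  JOBS[id]
end

job_id = enqueue_report
first = poll(job_id)   # most likely still 'pending' right after enqueueing
sleep 0.3
result = poll(job_id)  # completed, with the download link
```

Each status request returns immediately, so no single HTTP request ever comes close to the 30-second limit.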
Check my response here in case it works for you. I didn't want to change the user workflow by adding a background job and then a place/notification to get the result.
I use Rails controller streaming support with the Live module and set the right response headers. I fetch the data from an Enumerable object.
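A minimal, framework-free sketch of that idea: an Enumerator that yields keep-alive padding while the slow work runs, then the finished payload. In a Live controller, each yielded chunk would be written to response.stream. Note that for a binary format like PDF, raw padding bytes would corrupt the body, so this suits text-ish responses or a separate progress channel; all names and timings below are illustrative:

```ruby
# Yields padding chunks while a background thread does the slow work,
# then yields the result once it's ready.
def keep_alive_stream(interval: 0.05)
  Enumerator.new do |yielder|
    result = nil
    worker = Thread.new do
      sleep 0.2 # stand-in for slow report generation
      result = "REPORT BYTES"
    end
    # Thread#join with a timeout returns nil while the thread is still running.
    until worker.join(interval)
      yielder << " " # padding chunk; each byte resets a rolling idle timeout
    end
    yielder << result
  end
end

chunks = keep_alive_stream.to_a
```

In a real controller the interval would be a few seconds (well under the 55-second window), and the enumerator would be consumed by writing each chunk to response.stream.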

Slow cache read on first cache fetch in Rails

I am seeing some very slow cache reads in my rails app. Both redis (redis-rails) and memcached (dalli) produced the same results.
It looks like it is only the first call to Rails.cache that causes the slowness (averaging 500ms).
I am using skylight to instrument my app and see a graph like:
I have a Rails.cache.fetch call in this code, but when I benchmark it I see it average around 8ms, which matches what memcache-top shows for my average call time.
I thought this might be Dalli connections opening slowly, but benchmarking that didn't show anything slow either. I'm at a loss for what else to look into.
Does anyone have any good techniques for tracking this sort of thing down in a rails app?
Edit #1
Memcache server is stored in ENV['MEMCACHE_SERVERS'], all the servers are in the us-east-1 datacenter.
Cache config looks like:
config.cache_store = :dalli_store, nil, { expires_in: 1.day, compress: true }
I ran something like:
100000.times { Rails.cache.fetch('something') }
and calculated the average timings and got something on the order of 8ms when running on one of my webservers.
Testing my theory of the first request is slow, I opened a console on my web server and ran the following as the first command.
irb(main):002:0> Benchmark.ms { Rails.cache.fetch('someth') { 1 } }
Dalli::Server#connect my-cache.begfpc.0001.use1.cache.amazonaws.com:11211
=> 12.043342
Edit #2
Ok, I split out the fetch into a read and write, and tracked them independently with statsd. It looks like the averages sit around what I would expect, but the max times on the read are very spiky and get up into the 500ms range.
http://s16.postimg.org/5xlmihs79/Screen_Shot_2014_12_19_at_6_51_16_PM.png
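Since an 8ms average can hide 500ms spikes, tracking the max (or high percentiles) alongside the mean is what exposes this kind of problem. A toy benchmark with a hypothetical stand-in cache whose first read pays a one-time connection cost, mimicking the behaviour described above:

```ruby
require 'benchmark'

# Stand-in cache whose first read is slow (e.g. lazy connection setup).
class SlowFirstCache
  def initialize
    @connected = false
  end

  def fetch(key)
    unless @connected
      sleep 0.05 # simulated one-time connection cost
      @connected = true
    end
    key
  end
end

cache = SlowFirstCache.new
timings = 10.times.map { Benchmark.realtime { cache.fetch('something') } }

avg = timings.sum / timings.size
max = timings.max
# The average stays small while the max exposes the first-call spike.
```

The same idea applies to the statsd numbers: graph max and p99 per interval, not just the mean, and the connection-setup spike becomes visible.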

How much size is too large for Session on Rails 3

I want to load a text file into the session.
The file size is about 50KB to 100KB.
When a user triggers the function on my page, it creates the session.
My server has about 8GB of RAM, and the maximum number of users is about 100.
A script runs in the background to collect IP and MAC addresses on the LAN, continuously writing data into a text file.
At the same time, the webpage uses Ajax to fetch fresh data from the text file and display it on the page.
Is it suitable to keep the result in the session, or is there a better way?
Thanks ~
The Python script collects the data on the LAN over 1 to 3 minutes (as a background job).
To avoid blocking for 1 to 3 minutes, I use Ajax to fetch the data from the text file (continuously appended to by the Python script) and show it on the page.
My users need to carry the information across pages, so I want to store the data in the session.
00:02:D1:19:AA:50: 172.19.13.39
00:02:D1:13:E8:10: 172.19.12.40
00:02:D1:13:EB:06: 172.19.1.83
C8:9C:DC:6F:41:CD: 172.19.12.73
C8:9C:DC:A4:FC:07: 172.19.12.21
00:02:D1:19:9B:72: 172.19.13.130
00:02:D1:13:EB:04: 172.19.13.40
00:02:D1:15:E1:58: 172.19.12.37
00:02:D1:22:7A:4D: 172.19.11.84
00:02:D1:24:E7:0F: 172.19.1.79
00:FD:83:71:00:10: 172.19.11.45
00:02:D1:24:E7:0D: 172.19.1.77
00:02:D1:81:00:02: 172.19.11.58
00:02:D1:24:36:35: 172.19.11.226
00:02:D1:1E:18:CA: 172.19.12.45
00:02:D1:0D:C5:A8: 172.19.1.45
74:27:EA:29:80:3E: 172.19.12.62
Why does this need to be stored in the browser? Couldn't you fire off what you're collecting to a data store somewhere?
Anyway, assuming you HAD to do this, and the example you gave is pretty close to the data you'll actually be seeing, you have a lot of redundant data there. You could save space on the IPs by creating a nested hash keyed on each successive octet, e.g.
{172 => {19 => {13 => [39], 12 => [40, 73, 21], 1 => [83]}}} and so on, and similarly for the MAC addresses. But again, you can probably simplify this problem a LOT by storing the info you need somewhere other than the session.
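For illustration, the nested-octet structure above can be built from the "MAC: IP" lines like this (the input format is assumed to match the sample data, with ": " separating MAC from IP):

```ruby
# A few lines from the sample data above.
lines = [
  "00:02:D1:19:AA:50: 172.19.13.39",
  "00:02:D1:13:E8:10: 172.19.12.40",
  "C8:9C:DC:6F:41:CD: 172.19.12.73",
]

# Nested hash with auto-vivified levels: octet1 -> octet2 -> octet3 -> [octet4, ...]
tree = Hash.new do |h, k|
  h[k] = Hash.new { |h2, k2| h2[k2] = Hash.new { |h3, k3| h3[k3] = [] } }
end

lines.each do |line|
  ip = line.split(': ').last        # "172.19.13.39" (MAC colons have no space)
  a, b, c, d = ip.split('.').map(&:to_i)
  tree[a][b][c] << d
end
```

This collapses the shared prefixes, which is where the redundancy in the sample data lives.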

cherrypy serve multiple requests / per connection

I have this code (on-the-fly compression and streaming):
@cherrypy.expose
def backup(self):
    path = '/var/www/httpdocs'
    zip_filename = "backup" + t.strftime("%d_%m_%Y_") + ".zip"
    cherrypy.response.headers['Content-Type'] = 'application/zip'
    cherrypy.response.headers['Content-Disposition'] = 'attachment; filename="%s"' % (zip_filename,)
    # https://github.com/gourneau/SpiderOak-zipstream/blob/3463c5ccb5d4a53fc5b2bdff849f25bae9ead761/zipstream.py
    return ZipStream(path)
backup._cp_config = {'response.stream': True}
The problem I face is that while the file is downloading, I can't browse any other page or send any other request until the download is done...
I think the problem is that CherryPy can't serve more than one request at a time per user.
Any suggestions?
When you say "per user", do you mean that another request could come in for a different "session" and it would be allowed to continue?
In that case, your issue is almost certainly due to session locking in CherryPy. You can read more about it in the session code. Since sessions are unlocked late by default, the session is not available for use by other threads (connections) while the backup is still being processed.
Try setting tools.sessions.locking = 'explicit' in the _cp_config for that handler. Since you’re not writing anything to the session, it’s probably safe not to lock at all.
Good luck. Hope that helps.
Also, from the FAQ:
"CherryPy certainly can handle multiple connections. It’s usually your browser that is the culprit. Firefox, for example, will only open two connections at a time to the same host (and if one of those is for the favicon.ico, then you’re down to one). Try increasing the number of concurrent connections your browser makes, or test your site with a tool that isn’t a browser, like siege, Apache’s ab, or even curl."

Saving an ActiveRecord non-transactionally

My application accepts file uploads, with some metadata being stored in the DB, and the file itself on the file system. I am trying to make the metadata visible in the application before the file upload and post-processing are finished, but because saves are transactional, I have had no success. I have tried the callbacks and calling create_or_update() instead of save(), all to no avail. Is there a way to do this without re-writing the guts of ActiveRecord::Base? I've even attempted naming the method make() instead of save(), but perplexingly that had no effect.
The code below "works" fine, but the database is not modified until everything else is finished.
def save(upload)
  uploadFile = upload['datafile']
  originalName = uploadFile.original_filename
  self.fileType = File.extname(originalName)
  create_or_update()
  # write the file
  File.open(self.filePath, "wb") { |f| f.write(uploadFile.read) }
  begin
    musicFile = TagLib::File.new(self.filePath())
    self.id3Title = musicFile.title
    self.id3Artist = musicFile.artist
    self.id3Length = musicFile.length
  rescue TagLib::BadFile => exc
    logger.error("Failed to id track: \n #{exc}")
  end
  if self.fileType == '.mp3'
    convertToOGG()
  end
  create_or_update()
end
Any ideas would be quite welcome, thanks.
Have you considered processing the file upload as a background task? Save the metadata as normal and then perform the upload and post-processing using Delayed Job or similar. This Railscast has the details.
You're getting the meta-data from the file, right? So is the problem that the conversion to OGG is taking too long, and you want the data to appear before the conversion?
If so, John above has the right idea -- you're going to need to accept the file upload, and schedule a conversion to occur sometime in the future.
The main reason why is that your rails thread will process the OGG conversion and can't respond to any other web-requests until it's complete. Blast!
Some servers compensate for this by having multiple rails threads, but I recommend a background queue (use BJ if you host yourself, or Heroku's background jobs if you host there).
