It's my understanding that when I upload a file to my Heroku instance, it's a synchronous request and I will get a 200 back when the request is done, which means my upload has been processed and stored by Paperclip.
I am using plupload, which does a serial upload (one file at a time). On Heroku I have 3 dynos, and my app becomes unresponsive and I get timeouts trying to use the app. My upload should really only tie up a single dyno at most while all the files are being uploaded, since it's done serially and file 2 doesn't start until a response is returned for file 1.
As a test I bumped my dynos to 15 and ran the upload. Again I see the POSTs come into the logs, then I start seeing output from Paperclip commands (can't remember if it was identify or convert), and then I start getting timeouts.
I'm really lost as to why this is happening. I do know I 'can' upload directly to S3, but my current approach should be just fine. It's an admin interface that is only used by a single person, and again it should tie up a single dyno at most, since all the uploaded files are sent serially.
Any ideas?
I've been working on the same problem for a couple of days. The problem, so far as I understand, is that when uploading files through Heroku, your requests are still governed by the 30 second timeout limit. On top of this, it seems that subsequent requests issued to the same dyno (application instance) can cause it to accrue the response times and terminate. For example, if you issue two subsequent requests to your web app that each take 15 seconds to upload, you could receive a timeout, which will force the dyno to terminate the request. This is most likely why you are receiving timeout errors. If this continues on multiple dynos, you could end up with an application crash, or just generally poor performance.
What I ended up doing was using jquery-file-upload. However, if you are uploading large files (multiple MBs), then you will still experience errors as long as Heroku is still processing the uploads. In particular I used this technique to bypass Heroku entirely and upload directly from the client's browser to S3. I use this to upload to a temp directory, and then use CarrierWave to 're-download' the file and process medium and thumbnail versions in the background by pushing the job to Qu. Now there are no timeouts, but the user has to wait for the jobs to get processed in the background.
Also important to note is that Heroku dynos operate independently of each other, so by increasing the number of web dynos, you are creating more instances of your application for other users, but each one is still subject to 30 second timeouts and 512 MB of memory. Regardless of how many dynos you have, you will still have the same issues. More dynos != better performance.
You can use something like Dropzonejs to queue your files and send them separately, one request per file. That way the request won't time out.
Firstly: I realise in an ideal world I could achieve this using SOA. Humour me :)
Background
Imagine I have a Rails app running on Heroku with very minimal traffic in terms of user requests; they can be happily served by 1 web dyno.
I also have a machine somewhere in the world which is regularly and repeatedly submitting large files to my application via http://example.com/api/bigupload as fast as it is able.
The large files eat up my web dynos and so the user experience is bad. I increase the web dynos, but the large file uploads continue to tie them all up in long requests.
Question
Is there some way I can keep one worker in 'reserve' which will not respond to the big upload requests and concentrate on serving user traffic for other URLs?
Note: I have a similar situation to this one where automated large image uploads are eating my requests and delaying users accessing the API, albeit on a larger scale.
I think you're effectively asking: "Is there a way to partition my web dynos so that only some respond to a certain subset of requests".
The answer (today) is no unfortunately. Heroku routes randomly across all your web dynos.
What web server are you running on your web dyno? Are you using a concurrent web server? If you're not, switching to one may have a large impact, in that a single long upload won't tie the dyno up nearly as much.
Have you explored a different architecture where, instead of your other app submitting big uploads, it submits pointers to the big payloads? That way your web dyno can simply dump them on a queue, and your workers can fetch the payloads and process them. You can then scale by increasing the number of workers.
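To make the "pointers on a queue" idea concrete, here is a minimal sketch in plain Ruby. The in-process `Queue` only stands in for a real job queue, and the names `enqueue_upload`, `work_one` and the example URL are made up for illustration; they are not part of Heroku or any library.

```ruby
require "json"

# In-process stand-in for a real job queue (hypothetical; a production app
# would use a queue backed by Redis or a database).
JOBS = Queue.new

# Web dyno side: accept a small pointer, enqueue it, and return right away.
def enqueue_upload(payload_url)
  JOBS << JSON.generate("payload_url" => payload_url)
  202 # HTTP 202 Accepted: queued, not yet processed
end

# Worker dyno side: take one pointer off the queue and fetch the payload.
# `fetcher` is any callable that downloads and processes the URL.
def work_one(fetcher)
  job = JSON.parse(JOBS.pop)
  fetcher.call(job["payload_url"])
end

enqueue_upload("https://example.com/payloads/big-file.bin")
work_one(->(url) { puts "would fetch #{url}" })
```

The web request only handles a few bytes of JSON, so it returns quickly no matter how large the payload is; the slow download happens in the worker.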
I have a simple setup going for an API I'm building in Rails.
A zip is uploaded via a POST, and I take the file, store it in Rails.root/tmp using CarrierWave and then background an S3 upload with Sidekiq.
The reason I store the file temporarily is that I can't send a complex object to Sidekiq, so I store it and send the id, let Sidekiq find it and do work with it, and then delete the file once it's done.
The problem is that once it's time for my Sidekiq worker to find the file by its path, it can't, because the file doesn't exist. I've read that Heroku's ephemeral file system deletes its files when things are reconfigured, servers are restarted, etc.
None of these things are happening, however, and the file still doesn't exist. So my theory is that the Sidekiq worker is actually trying to open the path that gets passed to it on its own filesystem, since it's a separate worker, and the file doesn't exist there. Can someone confirm this? If that's the case, are there any alternative ways to do this?
If your worker is executed on a different dyno from your web process, you are experiencing this issue because of dyno isolation. Read more about this here: https://devcenter.heroku.com/articles/dynos#isolation-and-security
Although it is possible to run Sidekiq workers and the web process on the same machine (maybe not on Heroku, I am not sure about that), it is not advisable to design your system architecture like that.
If your application grows or experiences temporarily high loads, you may want to spread the load across multiple servers, and usually you will also run your workers on separate servers from your web process so that busy workers do not block the web process.
In all those cases you can never share data on the local filesystem between the web process and the worker.
I would recommend uploading the file directly to S3 using https://github.com/waynehoover/s3_direct_upload
This also takes a lot of load off your web server.
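A small sketch of why the path lookup fails. The two temporary directories below only stand in for the separate filesystems of a web dyno and a worker dyno (an assumption made for illustration; real dynos are isolated containers), and `file_visible_to_worker?` is a made-up helper name.

```ruby
require "tmpdir"
require "fileutils"

# Each temp dir plays the role of one dyno's private filesystem.
def file_visible_to_worker?(relative_path)
  Dir.mktmpdir("web") do |web_root|
    Dir.mktmpdir("worker") do |worker_root|
      # The "web dyno" stores the upload under its own root...
      target = File.join(web_root, relative_path)
      FileUtils.mkdir_p(File.dirname(target))
      File.write(target, "zip bytes")
      # ...and the "worker dyno" looks up the same relative path locally.
      File.exist?(File.join(worker_root, relative_path))
    end
  end
end

puts file_visible_to_worker?("tmp/uploads/archive.zip") # prints false
```

The same relative path resolves against a different root on each side, which is why shared storage such as S3 has to sit between the two processes.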
I'm using Unicorn on Heroku. One of the issues I'm having is with file uploads. We use CarrierWave for uploads, and even for a file of about 2 MB, Unicorn times out by the time 50-60% of the upload is done.
We aren't using Unicorn when we test locally, and I don't have any issues with large files locally (though the files get uploaded to AWS using CarrierWave, just as with production and staging). However, on the staging and production servers, I see that we get a timeout.
Any strategies on fixing this issue? I'm not sure I can put this file upload on a delayed job (because I need to confirm to my users that the file has indeed been successfully uploaded).
Thanks!
Ringo
If you're uploading big files to S3 via Heroku, you can't reasonably avoid timeouts. If someone decides to upload a large file, it's going to time out. If it takes longer than 30s to upload to Heroku, transfer to S3, and process, the request will time out. For good reason too, a 30s request is just crappy performance.
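As a rough back-of-the-envelope check of that 30s budget, the sketch below adds up the three stages. All bandwidth and processing figures are invented for illustration, and the model simply takes the 30 second limit at face value and assumes the stages run one after another.

```ruby
HEROKU_TIMEOUT_S = 30.0

# Seconds to move size_mb megabytes over a link of mbps megabits per second.
def transfer_seconds(size_mb, mbps)
  size_mb * 8.0 / mbps
end

# Client -> Heroku, then Heroku -> S3, then processing, done sequentially.
def request_seconds(size_mb:, client_mbps:, s3_mbps:, processing_s:)
  transfer_seconds(size_mb, client_mbps) +
    transfer_seconds(size_mb, s3_mbps) +
    processing_s
end

def times_out?(**stages)
  request_seconds(**stages) > HEROKU_TIMEOUT_S
end

small = { size_mb: 2, client_mbps: 1, s3_mbps: 50, processing_s: 2 }
large = small.merge(size_mb: 20)

puts times_out?(**small) # 16 + 0.32 + 2 = 18.32 s, prints false
puts times_out?(**large) # 160 + 3.2 + 2 = 165.2 s, prints true
```

With a slow client link, the first stage dominates, and that is the stage a direct-to-S3 upload takes away from the dyno.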
This blog post (and github repo) is very helpful: http://pjambet.github.io/blog/direct-upload-to-s3/
With it, you should be able to get direct-to-s3 file uploads working. You completely avoid hitting Heroku for the bulk of the upload. Using jquery-fileupload's callbacks, you can post to your application after the file is successfully uploaded, and process it in the background using delayed_job. Confirming to your users that the upload is successful is an application problem you just need to take care of.
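One way to handle the confirmation part, sketched in plain Ruby: record the upload as soon as the browser reports that S3 accepted it, confirm to the user at that point, and let the background job flip the status later. The `Upload` struct, the in-memory `UPLOADS` hash and both method names are hypothetical; a real app would use a database model and its own job library.

```ruby
# Hypothetical in-memory record; a real app would use a database model.
Upload = Struct.new(:s3_key, :status)

UPLOADS = {}

# Called when the browser posts back after S3 has accepted the file.
# The user can be told "upload successful" at this point.
def register_upload(s3_key)
  UPLOADS[s3_key] = Upload.new(s3_key, :uploaded)
  "Upload received; processing in the background"
end

# Called by the background job once thumbnails etc. are done.
def mark_processed(s3_key)
  upload = UPLOADS.fetch(s3_key)
  upload.status = :processed
  upload
end

puts register_upload("uploads/tmp/photo.jpg")
puts mark_processed("uploads/tmp/photo.jpg").status # prints processed
```

Separating "uploaded" from "processed" means the confirmation no longer depends on the slow work finishing inside the request.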
Sounds like your timeout is set too low. What does your unicorn config look like?
See https://devcenter.heroku.com/articles/rails-unicorn for a good starting point.
My app is basic (1 dyno, 20 MB slug size) and some of the pages take too long to load at times. Using Firebug, I've observed that most of the time the pages load within 3-4 seconds, but sometimes a page takes more than a minute to load (both data points are with the browser cache cleared). The basic HTML response arrived within 500 ms, and the main component of the time was downloading a PNG image (17 KB), for which the wait time after sending the request was more than a minute. I cannot understand why this would be the case.
I am using YSlow to analyze the entire page (it gave a B grade), and I think this has something to do with Heroku taking a long time to send images at times.
I have referred to the question - Why are my basic Heroku apps taking two seconds to load?
As suggested in the answers, I have put a simple cron task in heroku that accesses the homepage every hour through a URI GET request.
What could I do to improve the speed?
I am considering the following things:
1. Move images to a CDN
2. Set an Expires header on GET responses, as described in http://upstre.am/blog/tag/heroku/
I have put a simple cron task in heroku that accesses the homepage every hour through a URI GET request.
From what you are describing, you are using a Heroku cron job to ping your app. Unfortunately this will not work; you have to use an external ping service such as Pingdom.
Update: it seems that external ping services like Pingdom no longer work either.
Heroku 'idles' dynos if they aren't used for more than 30min I believe. This means you'll need more than 1 web dyno if you want to keep the app active and ready to load at any time. Worker dynos are never idled.
Basically just set your app to 2 web dynos.
We expect to serve a few thousand uploads within 2 or 3 minutes. Most of the uploads will be about 20 to 200 MB.
Technically, I think the upload has not much to do with Rails, but rather with the web server (Apache/Nginx), so as long as the server can handle the concurrent requests, there is not much work for the Rails app to do (except to move the file to proper storage and create a database record to track the file).
Is my assumption right? Normally, how many concurrent uploads can a single Rails app process be expected to handle? (Assume the Rails app takes 20 ms for all the calculation, moving the file and creating the database record, but the connection must be kept alive for 1 minute so that the file can be transferred successfully.)
Not really, but close. A single Rails application instance can only handle a single request at a time, but it's easy to use a server that has a pool of these instances, using nginx and Passenger, or Mongrels and a load balancer.
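To get a feel for the numbers in the question, here is a rough estimate of average concurrency (arrivals multiplied by how long each one is held, divided by the time window). The figures of 3000 uploads and 180 seconds are assumptions filling in "a few thousand" and "2 or 3 minutes", and the model assumes uploads arrive evenly.

```ruby
# Average number of requests in flight at once.
def average_concurrency(total_uploads:, window_s:, hold_s:)
  total_uploads * hold_s.to_f / window_s
end

# If the Rails instance itself holds the connection for the full minute:
puts average_concurrency(total_uploads: 3000, window_s: 180, hold_s: 60) # prints 1000.0

# If a front-end server receives the upload and Rails only does its 20 ms:
rails_only = average_concurrency(total_uploads: 3000, window_s: 180, hold_s: 0.02)
puts rails_only < 1 # prints true
```

Since one instance serves one request at a time, the first case would need on the order of a thousand instances, while the second fits comfortably in one. That gap is why it matters which layer keeps the slow connection open.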
You should create a load test to confirm any of your assumptions.
I would use curl to simulate 10/100/1000 users uploading a few megabytes using multiple processes and tune the upload speed to simulate slow clients to see how that affects your performance. Measure the response times for 10 concurrent requests and record and observe the results.
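A small Ruby driver for that kind of curl-based test might look like the sketch below. The form field name `file`, the default rate and the method names are assumptions for illustration; the curl options used (`--limit-rate`, `--form`, `--write-out` with `%{time_total}`) are standard.

```ruby
# Build a curl command that uploads `file` as a multipart form field,
# throttled to `rate`, printing only the total request time in seconds.
def curl_upload_command(url:, file:, rate: "100k")
  [
    "curl", "--silent", "--output", "/dev/null",
    "--limit-rate", rate,          # simulate a slow client
    "--form", "file=@#{file}",     # field name "file" is an assumption
    "--write-out", "%{time_total}\n",
    url
  ]
end

# Run one upload and return the time curl reports, as a Float.
def timed_upload(**opts)
  IO.popen(curl_upload_command(**opts), &:read).to_f
end

# Start `users` uploads in parallel threads and collect their timings.
def load_test(users:, **opts)
  Array.new(users) { Thread.new { timed_upload(**opts) } }.map(&:value)
end
```

For example, `load_test(users: 10, url: "http://localhost:3000/upload", file: "sample.bin", rate: "50k")` would return ten timings (the URL and file name here are placeholders). Repeating the run with 100 and 1000 users and different rates shows how slow clients affect response times.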
You could use the nginx upload module to bypass Rails, if that is an option for you and it helps. Always test your assumptions.