I'm using Unicorn on Heroku. One of the issues I'm having is with file uploads. We use CarrierWave for uploads, and even for a file of about 2MB, Unicorn times out by the time the upload is 50-60% done.
We aren't using unicorn when we test locally, and I don't have any issues with large files locally (though the files get uploaded to AWS using carrierwave, just as with production + staging). However, on staging & production servers, I see that we get a timeout.
Any strategies on fixing this issue? I'm not sure I can put this file upload on a delayed job (because I need to confirm to my users that the file has indeed been successfully uploaded).
Thanks!
Ringo
If you're uploading big files to S3 via Heroku, you can't reasonably avoid timeouts: if uploading to Heroku, transferring to S3, and processing together take longer than 30s, the request will time out. For good reason, too: a 30s request is just crappy performance.
This blog post (and github repo) is very helpful: http://pjambet.github.io/blog/direct-upload-to-s3/
With it, you should be able to get direct-to-S3 file uploads working, so Heroku never sees the bulk of the upload. Using jquery-fileupload's callbacks, you can post to your application after the file has been uploaded successfully and process it in the background with delayed_job. Confirming to your users that the upload succeeded is then an ordinary application concern you just need to take care of.
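For the "post back to your application" step, here's a minimal sketch; the `UploadsController`, the `Upload` model with its `key` column, and the `process!` method are all hypothetical names, and only the S3 key (not the file itself) travels through Heroku:

```ruby
# app/controllers/uploads_controller.rb -- hypothetical controller/model names.
# jquery-fileupload's "done" callback POSTs the S3 key here once the browser
# has finished uploading the file straight to S3.
class UploadsController < ApplicationController
  def create
    upload = Upload.create!(key: params[:key], filename: params[:filename])

    # Hand the heavy lifting to delayed_job; the request itself returns immediately.
    upload.delay.process!

    # This response is the "your upload succeeded" confirmation for the user.
    render json: { id: upload.id, status: "processing" }, status: :created
  end
end
```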
Sounds like your timeout is set too low. What does your unicorn config look like?
See https://devcenter.heroku.com/articles/rails-unicorn for a good starting point.
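For reference, the article's suggested config/unicorn.rb looks roughly like this; the key point is keeping Unicorn's own timeout below Heroku's 30s router limit so a slow request dies cleanly instead of as an H12:

```ruby
# config/unicorn.rb -- roughly the Heroku-recommended setup
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 15          # kill workers after 15s; Heroku's router would cut them off at 30s anyway
preload_app true

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end

  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end

  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```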
Related
One of the features of the application I am currently working on is photo upload. Customers upload photos in the frontend, the photos are passed to the Rails backend, and they are then stored on Amazon S3.
I have noticed that a huge amount of request time is spent uploading photos to S3. The photos are uploaded one by one, so the latency is multiplied. It would be great if I could somehow store photos temporarily in RAM and speed up the request.
I have thought about running a Sidekiq job with the file as a param, but according to the Sidekiq documentation passing a huge object is not good practice. How can I solve this another way?
I solved this problem by using an API to generate a presigned URL, and using Cognito to upload the image to S3 and get the image link.
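A sketch of the presigned-URL half using the aws-sdk-s3 gem (bucket name, region, and key layout are placeholders; Cognito only comes in if you want the browser to use temporary AWS credentials instead of a presigned URL):

```ruby
# Gemfile: gem "aws-sdk-s3"
require "aws-sdk-s3"
require "securerandom"

# Hand the browser a short-lived URL it can PUT the photo to directly,
# so the file bytes never pass through the Rails process at all.
def presigned_photo_upload(filename)
  bucket = Aws::S3::Resource.new(region: "us-east-1").bucket("my-photos-bucket") # placeholders
  object = bucket.object("uploads/#{SecureRandom.uuid}/#{filename}")

  {
    upload_url: object.presigned_url(:put, expires_in: 15 * 60), # valid for 15 minutes
    photo_url:  object.public_url                                # where the photo will live afterwards
  }
end
```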
nginx/puma running on machine A should save the image as a local file. Run Sidekiq on the same machine A and pass the filename to a job in a host-specific queue for Sidekiq to process. That way you can pass a file reference without worrying about which machine will process it.
Make sure Sidekiq deletes the file so you don't fill up the disk!
https://www.mikeperham.com/2013/11/13/advanced-sidekiq-host-specific-queues/
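A minimal sketch of that host-specific-queue pattern; the `PhotoUploadJob` class and its `upload_to_s3` helper are made-up names, the queue-per-host idea is from the article above:

```ruby
require "socket"
require "sidekiq"

# Each machine runs its own Sidekiq process listening on a queue named after
# the host, e.g.:  bundle exec sidekiq -q `hostname` -q default
class PhotoUploadJob
  include Sidekiq::Worker

  def perform(local_path)
    upload_to_s3(local_path)   # upload_to_s3 is a stand-in for your real S3 upload code
  ensure
    # Delete the temp file so machine A's disk doesn't fill up.
    File.delete(local_path) if File.exist?(local_path)
  end
end

# In the controller, after the upload has been written to a local file on machine A:
# PhotoUploadJob.set(queue: Socket.gethostname).perform_async(tmp_path)
```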
I have a scheduler task which downloads a video from a URL. I want to store it on my Heroku server temporarily, just long enough to upload it to S3. I can't figure out a way to upload directly from an external URL to S3, so instead I'm using my server as the 'middle man'.
But I don't understand where I should be storing the file on my server, or if Heroku will even allow it.
If you're on the Cedar or Cedar-14 stack, you can write the file anywhere on the filesystem.
You're probably aware (if not, you should be) that Heroku Dynos have an ephemeral filesystem and that this filesystem is discarded the moment a dyno is stopped or restarted - which can happen for any number of reasons. With that in mind, you'll probably want to design your task scheduler in such a way that failed jobs are retried a couple of times.
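With those caveats in mind, a sketch of the middle-man step (URL, region, bucket, and key below are placeholders): download to a Tempfile on the ephemeral filesystem, push it to S3 with the aws-sdk-s3 gem, and rely on the scheduler's retries if the dyno disappears mid-run.

```ruby
require "open-uri"
require "tempfile"
require "aws-sdk-s3"

# Download the video onto the dyno's ephemeral disk just long enough to push
# it to S3. If the dyno restarts mid-run, the scheduler's retry picks it up.
def mirror_video_to_s3(source_url, bucket:, key:)
  Tempfile.create(["video", File.extname(key)]) do |tmp|
    tmp.binmode
    URI.open(source_url) { |remote| IO.copy_stream(remote, tmp) }
    tmp.flush

    Aws::S3::Resource.new(region: "us-east-1")  # region/bucket/key are placeholders
                     .bucket(bucket)
                     .object(key)
                     .upload_file(tmp.path)     # upload_file handles multipart for big files
  end
end

# mirror_video_to_s3("https://example.com/some.mp4", bucket: "my-bucket", key: "videos/some.mp4")
```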
I have a Rails 4 app in production running on a single VPS with nginx and passenger.
I use jquery-fileupload + carrierwave to store uploaded images in my own public folder.
I am trying to find a good solution to calculate and save upload and download times and durations, preferably using server-side code.
So far, I have tried this answer, which tries to calculate the duration with Ruby code.
It basically uses Rails' send_file method to send the file, and records the time before and after the send_file call to get the duration. This has two main problems:
The send_file call (along with its similar alternatives) blocks the Ruby code until the file transfer is complete (at least in development), which is not practical.
In production with passenger and nginx, this pause does not happen, and the requests are not blocked, probably because the send_file command delegates the actual file transfer to nginx and immediately continues the code. When I time uploads/downloads in this situation, I always get a few milliseconds!
So, I'm stuck.
I was thinking a viable solution would be to somehow extract this information from nginx or passenger logs once a day using a cron job. I'm not sure if it can be done however.
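One way to make the log idea work is to have nginx record `$request_time` (which covers the whole request, including the time spent streaming the file body) and parse the log once a day. A rough sketch, assuming you add a custom `log_format` like the one shown in the comment:

```ruby
# Assumes nginx is configured with something like:
#   log_format timing '$remote_addr [$time_local] "$request" $status '
#                     '$body_bytes_sent $request_time';
#   access_log /var/log/nginx/timing.log timing;
#
# $request_time includes the time nginx spends streaming the file body,
# which the send_file-based timing in the Ruby process never sees.

LINE = /"(?<verb>\S+) (?<path>\S+) [^"]*" (?<status>\d+) (?<bytes>\d+) (?<seconds>[\d.]+)\z/

File.foreach("/var/log/nginx/timing.log") do |raw|
  next unless (match = raw.chomp.match(LINE))
  next unless match[:path].start_with?("/uploads/")  # only the file transfers we care about

  puts format("%s: %s bytes in %ss", match[:path], match[:bytes], match[:seconds])
  # ...or insert a row into a transfers table here instead of printing.
end
```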
I have a simple setup going for an API I'm building in Rails.
A zip is uploaded via a POST, and I take the file, store it in Rails.root/tmp using CarrierWave, and then background an S3 upload with Sidekiq.
The reason I store the file temporarily is that I can't send a complex object to Sidekiq, so I store it, send the id, and let Sidekiq find the file, do the work with it, and then delete it once it's done.
The problem is that once it's time for my Sidekiq worker to find the file by its path, it can't, because the file doesn't exist. I've read that Heroku's ephemeral filesystem deletes its files when things are reconfigured, servers are restarted, etc.
None of these things is happening, however, and the file still doesn't exist. My theory is that the Sidekiq worker is actually trying to open the path it gets passed on its own filesystem, since it's a separate worker, and the file doesn't exist there. Can someone confirm this? If that's the case, are there any alternative ways to do this?
If your worker runs on a different dyno than your web process, you are experiencing this issue because of dyno isolation. Read more about it here: https://devcenter.heroku.com/articles/dynos#isolation-and-security
Although it is possible to run Sidekiq workers and the web process on the same machine (maybe not on Heroku, I am not sure about that), it is not advisable to design your system architecture that way.
If your application grows or experiences temporarily high load, you may want to spread the load across multiple servers, and usually also run your workers on separate servers from your web process so that busy workers don't block the web process.
In all those cases you can never share data on the local filesystem between the web process and the worker.
I would recommend uploading the file directly to S3 using https://github.com/waynehoover/s3_direct_upload
This also takes a lot of load off your web server.
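If you'd rather keep server-side uploads for now, another option (sketched below with made-up class and bucket names) is to push the zip to S3 inside the web request and hand Sidekiq the S3 key instead of a local path, so it no longer matters which dyno the worker runs on:

```ruby
require "aws-sdk-s3"
require "sidekiq"
require "tmpdir"

class ZipProcessingJob
  include Sidekiq::Worker

  BUCKET = "my-incoming-zips" # placeholder

  def perform(s3_key)
    object = Aws::S3::Resource.new(region: "us-east-1").bucket(BUCKET).object(s3_key)

    # Pull the zip onto *this* dyno's filesystem, do the work, then clean up.
    Dir.mktmpdir do |dir|
      local_path = File.join(dir, File.basename(s3_key))
      object.download_file(local_path)
      process_zip(local_path)   # process_zip is a stand-in for your real logic
    end

    object.delete               # remove the temporary copy from S3
  end
end

# In the controller: upload first, then enqueue only the key (a plain string):
#   object.upload_file(tmpfile.path)
#   ZipProcessingJob.perform_async(s3_key)
```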
It's my understanding that when I upload a file to my Heroku instance it's a synchronous request, and I will get a 200 back when the request is done, which means my upload has been processed and stored by Paperclip.
I am using plupload, which does a serial upload (one file at a time). On Heroku I have 3 dynos, and my app becomes unresponsive and I get timeouts trying to use it. My upload should really tie up at most a single dyno while all the files are being uploaded, since it's done serially and file 2 doesn't start until a response is returned for file 1.
As a test I bumped my dynos to 15 and ran the upload. Again I see the POSTs come into the logs, then I start seeing output from Paperclip's commands (can't remember if it was identify or convert), and I start getting timeouts.
I'm really lost as to why this is happening. I do know I 'can' upload directly to S3, but my current approach should be just fine. It's an admin interface that is only used by a single person, and again it should tie up at most a single dyno since the files are uploaded serially.
Any ideas?
I've been working on the same problem for a couple of days. The problem, as far as I understand it, is that when uploading files through Heroku, your requests are still governed by the 30-second timeout limit. On top of this, it seems that subsequent requests issued to the same dyno (application instance) can cause it to accrue the response times and terminate. For example, if you issue two subsequent requests to your web app that each take 15 seconds to upload, you could receive a timeout, which will force the dyno to terminate the request. This is most likely why you are getting timeout errors. If this continues across multiple dynos, you could end up with an application crash, or just generally poor performance.
What I ended up doing was using jquery-file-upload. However, if you upload large files (multiple MBs) through Heroku you will still see errors while Heroku processes the upload, so I used the direct-upload technique to bypass Heroku entirely and upload from the client's browser straight to S3. I upload to a temp directory, then use CarrierWave to 're-download' the file and process medium and thumbnail versions in the background by pushing the job to Qu. Now there are no timeouts, but the user has to wait for the jobs to be processed in the background.
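The 're-download' step is just CarrierWave's remote-URL feature; a sketch of the background job (model, uploader, and column names below are placeholders, and the same shape works with Qu or any other queue library):

```ruby
# Background job: the client already uploaded the raw file straight to S3 and
# we stored that temporary location on the record (temp_s3_url is a placeholder name).
class ProcessPhotoJob
  def self.perform(photo_id)
    photo = Photo.find(photo_id)                # Photo mounts a CarrierWave uploader on :image
    photo.remote_image_url = photo.temp_s3_url  # CarrierWave downloads the file...
    photo.save!                                 # ...and processes the medium/thumb versions
  end
end
```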
Also important to note: Heroku dynos operate independently of each other, so by increasing the number of web dynos you are creating more instances of your application for other users, but each one is still subject to the 30-second timeout and 512MB of memory. Regardless of how many dynos you have, you will still have the same issues. More dynos != better performance.
You can use something like Dropzone.js to queue your files and send them separately. That way the request won't time out.