Does anyone know if it's possible to integrate carrierwave_backgrounder (store_in_background) with Heroku?

https://github.com/lardawge/carrierwave_backgrounder
I would like to use the store_in_background method to delay storing files to S3, but I'm a little afraid since Heroku is a read-only system. Has anyone managed to get this working?

It would only work if you're using Heroku's newer stack, which offers an ephemeral (writable) file system. I'd recommend something like queue_classic instead of carrierwave_backgrounder.
Queue Classic uses Postgres-specific features to deliver great performance. It also has the advantage that its queue can be manipulated by Postgres triggers/procedures, which lets you queue an image delete in the same single query that deletes the image row.
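For illustration, here is a rough sketch of what trigger-based enqueueing could look like as a Rails migration. The job table and column names (queue_classic_jobs, q_name, method, args) reflect queue_classic's schema but can vary between versions, and ImageCleanup.remove_from_s3 is a hypothetical worker method, so treat this as a sketch rather than a drop-in:

    # Sketch: enqueue an S3 cleanup job in the same statement that deletes the
    # row, via a Postgres trigger that inserts directly into queue_classic's
    # job table. Column names / args encoding may differ across versions.
    class AddImageCleanupTrigger < ActiveRecord::Migration
      def up
        execute <<-SQL
          CREATE OR REPLACE FUNCTION enqueue_image_cleanup() RETURNS trigger AS $$
          BEGIN
            INSERT INTO queue_classic_jobs (q_name, method, args)
            VALUES ('default', 'ImageCleanup.remove_from_s3', '[' || OLD.id || ']');
            RETURN OLD;
          END;
          $$ LANGUAGE plpgsql;

          CREATE TRIGGER image_cleanup_after_delete
            AFTER DELETE ON images
            FOR EACH ROW EXECUTE PROCEDURE enqueue_image_cleanup();
        SQL
      end

      def down
        execute "DROP TRIGGER IF EXISTS image_cleanup_after_delete ON images"
        execute "DROP FUNCTION IF EXISTS enqueue_image_cleanup()"
      end
    end

From plain Ruby the equivalent is just QC.enqueue("ImageCleanup.remove_from_s3", image.id); the trigger simply moves that insert into the database so the delete and the enqueue happen in one round trip.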

Related

Which web service to use as a cache storage?

I am creating an application as practice that will scrape a website. I want to cache the scraped results using a web service, but I am not sure which one to use.
I have looked into Amazon's Elasticache and S3.
ElastiCache seems like overkill for this problem, but it uses Redis under the hood, which should reduce my workload (I guess?).
S3 is not in-memory, but a bigger issue for me is that I am not completely sure it is a good solution for this kind of problem.
I don't need anything super fancy. I would like something easy to set up, yet efficient if that is possible.
So which one should I choose? Are there any better alternatives?
Why do you think that Redis will reduce your workload? Redis is a really powerful cache and will store your data very reliably. AWS ElastiCache can run Redis, it should be really easy to set up, and you only need to control what you save there.
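For a concrete (if simplified) idea of the caching layer, here is a sketch using the plain redis gem; scrape(url) stands in for your scraping code, the one-hour TTL is arbitrary, and REDIS_URL would point at your ElastiCache endpoint:

    require "redis"
    require "json"

    # Minimal cache-aside sketch: check Redis first, scrape on a miss, then
    # store the result with a TTL so stale pages expire on their own.
    REDIS = Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379"))

    def cached_scrape(url)
      key = "scrape:#{url}"

      if (hit = REDIS.get(key))
        JSON.parse(hit)
      else
        result = scrape(url)                    # placeholder for your scraper
        REDIS.setex(key, 3600, result.to_json)  # keep it for an hour
        result
      end
    end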

What is the best approach to handle large file uploads in a rails app?

I am interested in understanding the different approaches to handling large file uploads (2-5 GB files) in a Rails application.
I understand that in order to transfer a file of this size it will need to be broken down into smaller parts, I have done some research and here is what I have so far.
Server-side config will be required to accept large POST requests, and probably a 64-bit machine to handle anything over 4 GB.
AWS supports multipart upload.
HTML5 FileSystemAPI has a persistent uploader that uploads the file in chunks.
A library for BitTorrent, although this requires a torrent client, which is not ideal.
Can all of these methods be resumed like FTP? The reason I don't want to use FTP is that I want to keep this within the web app, if possible. I have used CarrierWave and Paperclip, but I am looking for something that can be resumed, as uploading a 5 GB file could take some time!
Of the approaches I have listed, I would like to understand what has worked well and whether there are other approaches I may be missing. No plugins if possible; I would rather not use Java applets or Flash. Another concern is that these solutions hold the file in memory while uploading; that is also a constraint I would rather avoid if possible.
I've dealt with this issue on several sites, using a few of the techniques you've illustrated above and a few that you haven't. The good news is that it is actually pretty realistic to allow massive uploads.
A lot of this depends on what you actually plan to do with the file after you have uploaded it... The more work you have to do on the file, the closer you are going to want it to your server. If you need to do immediate processing on the upload, you probably want to do a pure rails solution. If you don't need to do any processing, or it is not time-critical, you can start to consider "hybrid" solutions...
Believe it or not, I've actually had pretty good luck just using mod_porter. Mod_porter makes apache do a bunch of the work that your app would normally do. It helps not tie up a thread and a bunch of memory during the upload. It results in a file local to your app, for easy processing. If you pay attention to the way you are processing the uploaded files (think streams), you can make the whole process use very little memory, even for what would traditionally be fairly expensive operations. This approach requires very little actual setup to your app to get working, and no real modification to your code, but it does require a particular environment (apache server), as well as the ability to configure it.
I've also had good luck using jQuery-File-Upload, which supports good stuff like chunked and resumable uploads. Without something like mod_porter, this can still tie up an entire thread of execution during upload, but it should be decent on memory, if done right. This also results in a file that is "close" and, as a result, easy to process. This approach will require adjustments to your view layer to implement, and will not work in all browsers.
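To make that concrete, here is a rough sketch of the Rails side for jQuery-File-Upload's chunked mode, where each chunk arrives as an ordinary multipart POST carrying a Content-Range header. The param names, temp path, and ProcessUploadJob are assumptions for illustration, not part of the library:

    require "fileutils"

    class UploadsController < ApplicationController
      def create
        chunk = params[:file]   # an ActionDispatch::Http::UploadedFile
        path  = Rails.root.join("tmp", "uploads", params[:upload_id].to_s)
        FileUtils.mkdir_p(File.dirname(path))

        # Append this chunk; jQuery-File-Upload sends chunks in order by default.
        File.open(path, "ab") { |f| f.write(chunk.read) }

        if last_chunk?(request.headers["Content-Range"])
          # Hand the completed file to a background job for S3 / processing.
          ProcessUploadJob.perform_later(path.to_s)  # hypothetical job class
        end

        head :ok
      end

      private

      # Content-Range looks like "bytes 0-9999999/5368709120"
      def last_chunk?(content_range)
        return true if content_range.nil?          # non-chunked upload
        range, total = content_range.sub("bytes ", "").split("/")
        range.split("-").last.to_i + 1 >= total.to_i
      end
    end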
You mentioned FTP and BitTorrent as possible options. These are not as bad options as you might think, as you can still get the files pretty close to the server. They are not even mutually exclusive, which is nice, because (as you pointed out) they do require an additional client that may or may not be present on the uploading machine. The way this works is, basically, you set up an area for them to dump to that is visible to your app. Then, if you need to do any processing, you run a cron job (or whatever) to monitor that location for uploads and trigger your server's processing method. This does not get you the immediate response the methods above can provide, but you can set the interval to be small enough to get pretty close. The only real advantage to this method is that the protocols used are better suited to transferring large files; the additional client requirement and fragmented process usually outweigh any benefits from that, in my experience.
If you don't need any processing at all, your best bet may be to simply go straight to S3 with them. This solution falls down the second you actually need to do anything with the files other than serve them as static assets....
I do not have any experience using the HTML5 FileSystemAPI in a rails app, so I can't speak to that point, although it seems that it would significantly limit the clients you are able to support.
Unfortunately, there is not one real silver bullet - all of these options need to be weighed against your environment in the context of what you are trying to accomplish. You may not be able to configure your web server or permanently write to your local file system, for example. For what it's worth, I think jQuery-File-Upload is probably your best bet in most environments, as it only really requires modification to your application, so you could move an implementation to another environment most easily.
This project is a new protocol over HTTP to support resumable uploads for large files. It bypasses Rails by providing its own server.
http://tus.io/
http://www.jedi.be/blog/2009/04/10/rails-and-large-large-file-uploads-looking-at-the-alternatives/ has some good comparisons of the options, including some outside of Rails.
Please go through it; it was helpful in my case.
Another site worth looking at is:
http://bclennox.com/extremely-large-file-uploads-with-nginx-passenger-rails-and-jquery
Please let me know if any of this does not work out
I would bypass the Rails server and post your large files (split into chunks) directly from the browser to Amazon Simple Storage Service (S3). Take a look at this post on splitting files with JavaScript. I'm a little curious how performant this setup would be, and I feel like tinkering with it this weekend.
I think that Brad Werth nailed the answer.
Just one approach could be to upload directly to S3 (and even if you do need some reprocessing afterwards, you could theoretically use AWS Lambda to notify your app ... but to be honest I'm just guessing here; I'm about to solve the same problem myself and will expand on this later).
http://aws.amazon.com/articles/1434
If you use CarrierWave:
https://github.com/dwilkie/carrierwave_direct_example
Uploading large files on Heroku with Carrierwave
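To make the direct-to-S3 route concrete, here is a sketch of generating a presigned POST with the aws-sdk-s3 gem - the same pattern carrierwave_direct automates. The region, bucket name, and key prefix are placeholders:

    require "aws-sdk-s3"

    # Rails only signs the upload; the browser POSTs the file straight to S3,
    # so the app server never touches the bytes. Credentials come from the
    # usual AWS environment variables.
    bucket = Aws::S3::Resource.new(region: "us-east-1").bucket("my-uploads-bucket")

    post = bucket.presigned_post(
      key:                   "uploads/${filename}",
      success_action_status: "201",
      content_length_range:  1..(5 * 1024**3)   # up to S3's 5 GB single-PUT limit
    )

    # post.url and post.fields are what the upload form / JS uploader needs:
    # render them as the form action and hidden inputs, then add the file field.
    puts post.url
    puts post.fields.inspect

Note that a single presigned POST tops out at S3's 5 GB object-upload limit; beyond that you would need the multipart upload API mentioned in the question.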
Let me also pin down a few options that might help others looking for a real-world solution.
I have a Rails 6 app with Ruby 2.7, and the main purpose of this app is to create a Google Drive-like environment where users can upload images and videos, which are then reprocessed at high quality.
We did try local processing using Sidekiq background jobs, but it was overwhelming for large uploads of 1 GB and more.
We also tried tus.io, but personally I don't think it is quite as easy to set up as jQuery File Upload.
So we experimented with AWS, moving in the steps listed below, and it worked like a charm: uploading directly to S3 from the browser.
Using a React dropzone uploader, we upload multiple files to S3.
We set up an AWS Lambda on the input bucket, triggered for all object-creation events on that bucket.
This Lambda converts the file, uploads the reprocessed version to a second (output) bucket, and notifies us via AWS SNS so we can keep track of what worked and what failed.
On the Rails side, we just dynamically use the new output bucket and serve it with an AWS CloudFront distribution.
You can check the AWS notes on MediaConvert for a step-by-step guide; they also have well-written GitHub repos for all sorts of experimentation.
So, from the user's point of view, they upload one large file with Transfer Acceleration enabled on the S3 bucket; the React library shows upload progress, and once the file is uploaded, a Rails callback API verifies its existence in the S3 bucket at a key like mybucket/user_id/file_uploaded_slug and then confirms it to the user through a simple flash message.
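For reference, the verification step can be as small as a head_object call. This is only a sketch, with the bucket name and key layout taken from the convention above:

    require "aws-sdk-s3"

    # Returns true if the uploaded object is really in the bucket.
    # (head_object raises Forbidden instead of NotFound if the caller
    # lacks ListBucket permission.)
    def upload_confirmed?(user_id, file_slug)
      s3 = Aws::S3::Client.new(region: ENV.fetch("AWS_REGION", "us-east-1"))
      s3.head_object(bucket: "mybucket", key: "#{user_id}/#{file_slug}")
      true
    rescue Aws::S3::Errors::NotFound
      false
    end

    # In the controller callback:
    # flash[:notice] = "Upload received!" if upload_confirmed?(current_user.id, params[:slug])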
You can also configure Lambda to notify the end user on successful upload/encoding, if needed.
Refer to this documentation: https://github.com/mike1011/aws-media-services-vod-automation/tree/master/MediaConvert-WorkflowWatchFolderAndNotification
Hope it helps someone here.

Carrierwave/Paperclip and Resque - Sharing across several computers

I am working on a Rails application that requires files to be uploaded to my server and then has Resque workers (running on several other computers) use those files to do some tasks. I have my workers all set to do the task, but I can't seem to find a nice way to get the files from my host computer to my worker computers. I've tried CarrierWave (and looked at the documentation for Paperclip), but all I see is using S3, which I cannot use. My only idea is to store a string which contains the URI where the file may be found, so that the workers can download it and start working. I'm not particularly fond of this idea. Does anyone have any suggestions on what might be the best way to do this? Thank you!
Update
I should also note the files that need to be shared are roughly 200MB each
Have you considered something like a Network File System instead of doing this inside your application?
Depending on what platform your workers and server are you should have numerous options to share a filesystem (I assume you have a LAN running between them).
And even without a real LAN, sshfs could work too.
The upsides are obvious: your Ruby application only has to deal with a regular filesystem using FileUtils, and the heavy lifting of pushing stuff around is handled by much more reliable infrastructure.
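For example, once the mount is in place (say at /mnt/shared), the Ruby side really is just FileUtils; the paths and helper below are only a sketch:

    require "fileutils"

    # Once /mnt/shared is an NFS or sshfs mount visible to both the web host
    # and the Resque workers, "sharing" a 200 MB upload is just a file move.
    SHARED_DIR = "/mnt/shared/uploads"   # placeholder mount point

    def publish_upload(tmp_path, original_filename)
      FileUtils.mkdir_p(SHARED_DIR)
      dest = File.join(SHARED_DIR, original_filename)
      FileUtils.mv(tmp_path, dest)       # move the uploaded temp file onto the share
      dest                               # store this path in the Resque job payload
    end

    # A worker on another machine then opens the same path:
    # File.open(payload["path"], "rb") { |f| process(f) }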

How to use Elasticsearch on Heroku

I've just finished watching both RailsCasts episodes on Elasticsearch. I've also gone ahead and implemented it in my Rails application (3.1), and everything is working great. Now I want to deploy my app to Heroku, but I'm unsure how to get Elasticsearch working on Heroku (specifically on the Cedar stack).
Any help would be greatly appreciated!
You can very easily [and freely ;-)] roll your own ElasticSearch server on Amazon EC2, and just connect to it with your app. This is what we're doing, and it's working nicely...
http://www.elasticsearch.org/tutorials/elasticsearch-on-ec2/
Heroku now supports ElasticSearch with the Bonsai add on. https://devcenter.heroku.com/articles/bonsai
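Getting connected is mostly a matter of pointing your client at the URL the add-on provides. Assuming you are using Tire (the gem from the RailsCasts episodes) and that Bonsai exposes the cluster URL in the BONSAI_URL config var, an initializer could look roughly like this:

    # config/initializers/elasticsearch.rb
    # Sketch: use the Bonsai cluster on Heroku, fall back to a local node in
    # development. BONSAI_URL is assumed to be the config var the add-on sets.
    Tire.configure do
      url ENV["BONSAI_URL"] || "http://localhost:9200"
    end

Provisioning the add-on itself is a one-liner from the Heroku CLI (heroku addons:add bonsai at the time of writing).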
I created a Play framework module that will run ElasticSearch on Heroku, using S3 to persist the state. No need for an EC2 instance - you only pay for the cost of the S3 data, which is much less (mainly IO transactions). It uses the ElasticSearch S3 gateway (persistence mechanism).
You can use it either by extending the Play application to create specific endpoints for your search functions, or, if you like, you can access the ElasticSearch REST API directly (by default it is exposed on the route http://yourapp.com/es). There is a very basic authentication system to secure it.
The only downside to this setup is that the dyno can take some time to spin up. So, it won't work well if you let the dyno spin down from inactivity - and you may get nailed for S3 data transfer charges if that happens a lot and your index is huge. The upside is you control your own data and it is cheap cheap cheap. Another word of warning - you will need to be careful to keep inside the memory limits of a Heroku dyno. That said, we had full text search autocomplete functions working on several indexes with no problems.
You might be able to build a similar module in Rails using JRuby to talk to the ElasticSearch Java API. My main contribution here was figuring out how to run it inside another web framework - since Play also uses Netty it was pretty easy to embed it. Performance tests compared to an EC2 cluster + Tire (Rails gem for ElasticSearch) showed that the Heroku/Play approach performed faster searches.
The project is here: https://github.com/carchrae/elastic-play - I'd be happy to help people set it up - it should be pretty painless.
That was exactly my first thought when I watched the RailsCast, but unfortunately it runs as a Java daemon, which isn't possible on Heroku.
In any case, you can't run it on a normal Heroku dyno, since it would have to save data to disk, which isn't persisted on Heroku. You need to wait for an add-on or host it somewhere else.

Multiple Uploads to Amazon S3 from Ruby on Rails - What Background Processing System to Use?

I'm developing a Ruby on Rails application that needs to allow the user to simultaneously upload 16 high-quality images at once.
This often means somewhere around 10-20 megabytes (sometimes more), but it's the number of connections that is becoming the most pertinent issue.
The images are being sent to Amazon S3 by Paperclip, which unfortunately opens and closes a new connection for each of the 16 files. Needless to say, I need to move the uploads into background processes to keep my web server from locking up, which it already does even with no traffic.
My question is: out of all the Rails-based systems for background jobs (Starling, BackgroundRb, Spawn, etc.), is there one that might fit the bill for this scenario better than the others? (I'm new to building background-processing systems anyway, so all of the available options are equally new to me.)
There's no shortage of Rails plugins for async processing, and basically all of them work fine. Personally, I like Delayed Job's API best.
I wouldn't use Starling or other actual queue daemons since for this task using the database to store any necessary state should be just fine.
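To illustrate, here is roughly what pushing the S3 transfer into Delayed Job looks like. Photo and transfer_to_s3! are hypothetical names; the point is just the handle_asynchronously / .delay API:

    # Sketch: take the S3 transfer out of the request cycle with Delayed Job.
    class Photo < ActiveRecord::Base
      has_attached_file :image   # Paperclip attachment, stored locally at first

      def transfer_to_s3!
        # ... copy the local file to S3 and update the record ...
      end
      handle_asynchronously :transfer_to_s3!   # every call now enqueues a job
    end

    # Or, without touching the model, enqueue ad hoc from the controller:
    # photo.delay.transfer_to_s3!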
This might help!
http://aaronvb.com/blog/2009/7/19/paperclip-amazon-s3-background-upload-using-starling-and-workling
EDIT:
It's not possible, through a normal HTML multipart form, to send files to the background; they have to be handled within that request. If you are looking for a way around that, you can try SWFUpload and then, once the upload is done, use a background process to handle the Amazon S3 transfer.
This is also a good survey blog post: http://4loc.wordpress.com/2010/03/10/background-jobs-in-ruby-on-rails/
I like SWFUpload; we use it on some S3 apps that we wrote. It's proven to be very fast and stable. You can have actions fired off via Ajax after the uploads, etc. We have had a ton of uploads go through it with zero failures.
