How to see the speedup when using Cloudinary "direct upload" method? - ruby-on-rails

I have a RoR web app that allows users to upload images and uses Cloudinary as cloud storage. I read their documentation and found a feature called "direct uploading" which reduces my server's load. As I understand it, the idea is to change the workflow from
image -> server -> Cloudinary
to
image -> Cloudinary
so that my server only stores a Cloudinary URL in the database, not the image file itself (tell me if I'm wrong, thanks).
So my question is: how can I check whether I have switched to the "direct uploading" method successfully? Should I open the browser's inspector and look at the time taken by each POST and GET request? Are there better options?
I expect a big improvement from this change, but how can I actually observe it?
Thanks from a rookie =)
# The app is deployed on Heroku.
# I haven't switched to the direct uploading method yet.
# This app is private and only serves around 10 people.

You can indeed (and it is highly recommended to) bypass your server and let Cloudinary handle the upload directly. This reduces your server's work to simply storing the uploaded image's details, while the image itself goes straight to your Cloudinary account, which speeds up the upload process. You can test out the sample project, which demonstrates both server-side and client-side uploads.
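For a rough idea of what the client-side (direct) setup looks like with the cloudinary gem's jQuery-based helpers -- treat the helper names as my recollection of the gem's documented API, and the model/attribute names as examples:

# application.js needs //= require cloudinary to load the gem's jQuery upload plugin.
# app/views/photos/_form.html.erb (assumed view)
<%= cloudinary_js_config %>
<%= form_for @photo do |f| %>
  <%= f.cl_image_upload(:image) %>  <%# the file goes from the browser straight to Cloudinary %>
  <%= f.submit %>
<% end %>
# The field posts back only a signed Cloudinary identifier string, which you save
# on the model; no image bytes ever pass through your Rails app.

To verify it is actually working, open the browser's Network tab while uploading: the file POST should go to api.cloudinary.com rather than to your own Heroku app, and your app should only receive a small form submission containing the identifier.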

Related

How to Upload Large Files on Heroku (Particularly Videos)

I'm using heroku to host a web application with the primary focus of hosting videos. The videos are hosted through vimeo pro, and I'm using the vimeo gem by matthooks to help handle the upload process. Upload works for small files, but not for larger ones (~50mb, for example).
A look at heroku logs shows that I am getting http error 413, which stands for "Request Entity Too Large." I believe this might have to do with a limit that heroku places on file uploads (greater than 30mb, according to this webpage). The problem though is that any information I can find on the subject seems to be outdated and conflicting (like this page that claims there is no size limit). I also couldn't find anything on heroku's site about this.
I've searched google and found a few somewhat relevant pages (one and two), but no solutions that worked for me. Most of the pages I found deal with uploading large files to amazon s3, which is different from what I'm trying to do.
Here's the relevant output of the logs:
2012-07-18T05:13:31+00:00 heroku[nginx]: 152.3.68.6 - - [18/Jul/2012:05:13:31 +0000]
"POST /videos HTTP/1.1" 413 192 "http://neoteach.com/components/19" "Mozilla/5.0
(Macintosh; Intel Mac OS X 10.7; rv:13.0) Gecko/20100101 Firefox/13.0.1" neoteach.com
There are no other errors in the logs. This is the only output that appears when I try to upload a video that is too large. Which means that this is not a timeout error or a problem with exceeding the allotted memory per dyno.
Does heroku really place a limit on upload sizes? If so, is there any way to change this limit? Note that the files themselves are not being stored on heroku's servers at all, they are merely being passed on to vimeo's servers.
If the problem is not limit on upload sizes, does anyone have an idea of what else might be going wrong?
Much thanks!
Update:
OP here. I'm still not exactly sure why I was getting this particular 413 error, but I was able to come up with a solution that works using the s3_swf_upload gem. The implementation involves flash, which is less than ideal, but it was the only solution (out of 3 or 4 that I tried) that I could get working.
As Neil pointed out (thanks Neil!), the error I should have been getting is "H12 - Request timeout". And I did end up running into this error after repeated trials. The problem occurs when you try to upload large files to the heroku server from your controller (using a web dyno), because it takes too long for the server to respond to the post request.
The proper approach is to send the file directly to s3 without passing through heroku.
Here's a high-level overview of my approach:
1. Use the s3_swf_upload gem to supply a direct upload form to S3.
2. Detect when the file is done uploading with the JavaScript callback function provided in the gem.
3. Using JavaScript, send Rails a POST message to let your server know the file is done uploading.
4. The controller that responds to the JavaScript POST does two things: (a) assigns an s3_key attribute to the video object (served up as a param in the form); (b) initiates a background task using the delayed_job gem.
5. The background task retrieves the file from S3. I used the aws-sdk gem to accomplish this, because it was already included in s3_swf_upload. Note that this is distinctly different from the aws-s3 gem (in fact they conflict with one another).
6. After the file has been retrieved from S3, I used the vimeo gem to upload it to Vimeo (still in the background).
The implementation above works, but it isn't perfect. For files that are close to 500MB in size, you'll still run into R14 errors in your worker dynos. This occurs because heroku only allots 512MB of memory per dyno, so you can't load the entire file into memory at once. The way around this problem is to implement some sort of chunking in the final step, where you retrieve the file from s3 and upload it to vimeo piece by piece. I'm still working on this part, and I'd love to hear any suggestions you might have.
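For the chunking idea above, here is a rough sketch of a delayed_job task that streams the S3 object to a local tempfile in chunks instead of loading the whole video into memory. It assumes the aws-sdk v1 API (AWS::S3) that s3_swf_upload pulls in; the bucket name and job class name are made up, and the Vimeo call is left as a placeholder because I haven't verified that gem's API:

require 'aws-sdk'
require 'tempfile'

class TransferToVimeoJob < Struct.new(:video_id)
  def perform
    video  = Video.find(video_id)
    s3     = AWS::S3.new                                            # credentials from AWS.config / ENV
    object = s3.buckets['my-upload-bucket'].objects[video.s3_key]   # assumed bucket name

    file = Tempfile.new(['video', '.mp4'])
    file.binmode
    begin
      # aws-sdk v1 yields the body in chunks when #read is given a block,
      # so memory use stays far below the 512MB dyno limit.
      object.read { |chunk| file.write(chunk) }
      file.flush

      # hand file.path off to the vimeo gem here (upload call omitted; see that gem's docs)
    ensure
      file.close
      file.unlink
    end
  end
end

# enqueued from the controller that receives the JavaScript POST:
# Delayed::Job.enqueue TransferToVimeoJob.new(@video.id)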
Hopefully this might help someone. Feel free to ask me any questions. Like I said, my solution isn't perfect so feel free to add your own answer if you think it could be better.
I think the best option here is indeed to upload directly to S3. It's much cheaper and much more secure than allowing users to upload files to your own server (or Heroku in this case). It's also a well-proven pattern used by lots of video hosting platforms (I know vzaar do this).
Check out the jQuery upload plugin, which allows direct uploads to S3: https://github.com/blueimp/jQuery-File-Upload
Also check out the Railscasts around this topic: #381 and #383.
Your biggest problem is not the size of the files here, but the fact that you are expecting the user to upload large files to Heroku, and then pass them on. The issue here is that all requests on the Heroku platform must return the first byte within 30 seconds - which in your case is very unlikely.
Therefore, you need to look at getting users to upload directly to S3/Vimeo/wherever and then connect your application data to these uploaded assets.
If you're using Ruby, then the carrierwave_direct gem might be worth a look for how it's done. Failing that, there are third-party services out there which let you do this via some code you drop into the page, but they come with an attached cost.
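As a rough illustration of the carrierwave_direct approach (the uploader module, success_action_redirect accessor and direct_upload_form_for helper are that gem's API as I recall it from its README and RailsCast #383; the model and field names are examples):

# app/uploaders/video_uploader.rb
class VideoUploader < CarrierWave::Uploader::Base
  include CarrierWaveDirect::Uploader   # turns the uploader into a direct-to-S3 form target
  # fog credentials / bucket are configured in a CarrierWave initializer as usual
end

# app/models/video.rb
class Video < ActiveRecord::Base
  mount_uploader :video_file, VideoUploader
end

# app/controllers/videos_controller.rb
def new
  @uploader = Video.new.video_file
  @uploader.success_action_redirect = new_video_url   # where S3 sends the browser after upload
end

# app/views/videos/new.html.erb
<%= direct_upload_form_for @uploader do |f| %>
  <%= f.file_field :video_file %>
  <%= f.submit "Upload" %>
<% end %>

The form posts straight to your S3 bucket; S3 then redirects back to your app with the key of the uploaded object, which you store on the record and process in the background.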

Carrier Wave gem and Heroku

I read here that Heroku doesn't allow you to store photos on their server and that people use the CarrierWave gem with Amazon to store photos. However, I just watched Ryan Bates's CarrierWave RailsCast and he also mentions how CarrierWave has a remote URL option whereby it will, in his words, "download" the photo from a URL and display it on your site. Does this mean that it stays on the remote server and just gets presented by CarrierWave on the Heroku site? I assume CarrierWave's not somehow attempting to transfer the image at the URL to the new server?
Might be a stupid question but I don't know a lot about servers (or anything :)))
The remote URL option for CarrierWave gives the user a different way of providing the picture to your server: instead of uploading the picture file directly, the user may give a URL where the picture is (say, on a Flickr account, or something). When this is provided to the application using CarrierWave, the picture is downloaded from the third-party location (given by the URL) to the application server -- just as if the user had uploaded it directly -- and then stored to Amazon's S3.
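In code, the remote URL path looks like this (the model, uploader and column names are examples; remote_<attribute>_url= is CarrierWave's standard accessor for mounted uploaders):

class User < ActiveRecord::Base
  mount_uploader :avatar, AvatarUploader   # AvatarUploader configured with fog/S3 storage
end

# Instead of attaching a file, the client supplies a URL. CarrierWave downloads
# the picture to the app server and then stores it to S3 like any other upload;
# nothing stays on Heroku's (ephemeral) filesystem.
user = User.find(1)
user.remote_avatar_url = "http://example.com/some/photo.jpg"
user.save!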

How to proxy files from S3 through rails application to avoid leeching?

In order to avoid hot-linking, S3 bandwidth leeching, etc I would like to make my bucket private and serve the files through a Rails app. Concept in general sounds very easy, but I am not entirely sure which approach would be the best for the situation.
I am using paperclip for general asset management. Is there any built-in way to achieve this type of proxy?
In general I can easily parse the URLs from paperclip and point them back to my own controller. What should happen from this point? Should I simply use Net::HTTP to download the image, and then serve it with send_data? In between I want to log the referer and set proper Cache-Control headers, since I have a reverse proxy in front of the app. Is Net::HTTP + send_data a reasonable way to do this?
Maybe the whole idea is really bad for some reason I am not aware of at the moment? In general I believe that revealing direct S3 links to a public bucket is dangerous and can lead to some serious problems with leeching / hot-linking...
Update:
If you have any other ideas which can reduce the S3 bill and prevent hot-linking / leeching in any way, please share, even if they are not directly related to Rails.
Use a private bucket (or private files) and use signed URLs for the files stored on S3.
The signature includes an expiration time (e.g. 10 minutes from now, whatever you would like to set), as well as a cryptographic hash. S3 will refuse to serve files if the signature is invalid, or if the expiration time has passed.
This is useful because only you can create valid URLs to your private files in S3, and you can control how long the URLs remain valid. This prevents leeching, because leechers can't sign their own URLs and, if they get a URL that you signed, that URL will expire very shortly and after that can not be used.
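Since the question mentions Paperclip: with S3 storage, Paperclip can generate these expiring signed URLs for you. A small sketch -- the model, attachment and bucket names are examples, and expiring_url is Paperclip's S3 storage API as I recall it:

class Document < ActiveRecord::Base
  has_attached_file :file,
                    storage: :s3,
                    s3_credentials: { access_key_id: ENV['AWS_ACCESS_KEY_ID'],
                                      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'] },
                    s3_permissions: :private,
                    bucket: 'my-private-bucket'   # example bucket name
end

class DocumentsController < ApplicationController
  def download
    document = Document.find(params[:id])
    Rails.logger.info "download referer: #{request.referer}"   # log the referer if you need it
    # redirect to a signed URL that is only valid for 10 minutes
    redirect_to document.file.expiring_url(600)
  end
end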
Since there wasn't a nuts and bolts answer above, here's a small code sample of how to stream a file that's stored on S3.
render :text => proc { |response, output|
  AWS::S3::S3Object.stream(path, bucket) do |segment|
    output.write segment
    output.flush # not sure if this is needed
  end
}
Depending on your webserver this may (mongrel) or may not (webrick) work, so don't get too frustrated if it doesn't stream in development.
Provide temporary pre-signed URLs:
def show
  redirect_to Aws::S3::Presigner.new.presigned_url(
    :get_object,
    bucket: 'mybucket',
    key: '/folder/file.pdf',
    expires_in: 60
  )
end
S3 still distributes the content, so you offload that work from Rails (which is very slow at serving files); S3 also handles HTTP caching and HEAD requests, and can be fronted by Amazon's CDN.
I'd probably avoid doing this -- at least until I had no other choice.
You need to take into account that you'll probably also add to the bandwidth bill if you download the image each time. Also, by passing each image through a script you'll need more CPU and RAM to do it. Not the greatest outlook -- IMHO.
I would probably enable access logging for Amazon S3 and write a small tool to analyze usage and change the permissions on the bucket/object in case usage goes through the roof. Run this as a cron job every 10 minutes or so and you should be safe.
You could also use s3stat. They also offer a free plan.
Edit: As per my recommendation for Varnish, I'm adding a link to a blog entry about preventing hotlinking using Varnish.

Why would you upload assets directly to S3?

I have seen quite a few code samples/plugins that promote uploading assets directly to S3. For example, if you have a user object with an avatar, the file upload field would load directly to S3.
The only way I see this being possible is if the user object is already created in the database and your S3 bucket + path is something like
user_avatars.domain.com/some/id/partition/medium.jpg
But then if you had an image tag that tried to access that URL when an avatar was not uploaded, it would yield a bad result. How would you handle checking for existence?
Also, it seems like this would not work well for most has many associations. For example, if a user had many songs/mp3s, where would you store those and how would you access them.
Also, your validations will be shot.
I am having trouble thinking of situations where direct upload to S3 (or any cloud) is a good idea and was hoping people could clarify either proper use cases, or tell me why my logic is incorrect.
Why pay for storage/bandwidth/backups/etc. when you can have somebody in the cloud handle it for you?
S3 (and other Cloud-based storage options) handle all the headaches for you. You get all the storage you need, a good distribution network (almost definitely better than you'd have on your own unless you're paying for a premium CDN), and backups.
Allowing users to upload directly to S3 takes even more of the bandwidth load off of you. I can see the tracking concerns, but S3 makes it pretty easy to handle that situation. If you look at the direct upload methods, you'll see that you can force a redirect on a successful upload.
Amazon will then pass the following to the redirect handler: bucket, key, etag
That should give you what you need to track the uploaded asset after success. Direct uploads give you the best of both worlds. You get your tracking information and it unloads your bandwidth.
Check this link for details: Amazon S3: Browser-Based Uploads using POST
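To make the browser-based POST flow more concrete, here is a hedged sketch of generating the (legacy, Signature Version 2) policy and signature server-side, along the lines of the linked documentation; the bucket name, redirect URL and 10-minute expiry are my own example values:

require 'base64'
require 'openssl'
require 'json'
require 'time'

# Builds the hidden fields for a form that posts the file directly to
# https://<bucket>.s3.amazonaws.com/ ; on success S3 redirects the browser to
# `redirect` with bucket, key and etag as query parameters (the tracking info
# mentioned above).
def s3_upload_fields(key, bucket: 'my-upload-bucket',
                          redirect: 'https://example.com/uploads/complete')
  policy_document = {
    'expiration' => (Time.now.utc + 10 * 60).iso8601,        # policy valid for 10 minutes
    'conditions' => [
      { 'bucket' => bucket },
      ['starts-with', '$key', key],
      { 'acl' => 'private' },
      { 'success_action_redirect' => redirect },
      ['content-length-range', 0, 500 * 1024 * 1024]          # allow up to ~500MB
    ]
  }

  policy    = Base64.strict_encode64(policy_document.to_json)
  signature = Base64.strict_encode64(
    OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), ENV['AWS_SECRET_ACCESS_KEY'], policy)
  )

  {
    'key'                     => key,
    'AWSAccessKeyId'          => ENV['AWS_ACCESS_KEY_ID'],
    'acl'                     => 'private',
    'policy'                  => policy,
    'signature'               => signature,
    'success_action_redirect' => redirect
  }
end

The controller sitting at the redirect URL then just reads params[:bucket], params[:key] and params[:etag] and saves them on the model -- that is all the tracking you need.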
If you are hosting your Rails application on Heroku, the reason could very well be that Heroku doesn't allow file-uploads larger than 4MB:
http://docs.heroku.com/s3#direct-upload
So if you would like your users to be able to upload large files, this is the only way forward.
Remember how web servers work.
Unless you're using a sort of async web setup like you could achieve with Node.JS or Erlang (just 2 examples), then every upload request your web application serves ties up an entire process or thread while the file is being uploaded.
Imagine that you're uploading a file that's several megabytes large. Most internet users don't have tremendously fast uplinks, so your web server spends a lot of time doing nothing. While it's doing all of that nothing, it can't service any other requests. Which means your users start to get long delays and/or error responses from the server. Which means they start using some other website to get the same thing done. You can always have more processes and threads running, but each of those costs additional memory which eventually means additional $.
By uploading straight to S3, in addition to the bandwidth savings that Justin Niessner mentioned and the Heroku workaround that Thomas Watson mentioned, you let Amazon worry about that problem. You can have a single-process webserver effectively handle very large uploads, since it punts that actual functionality over to Amazon.
So yeah, it's more complicated to set up, and you have to handle the callbacks to track things, but if you deal with anything other than really small files (and even in those cases), why cost yourself more money?
Edit: fixing typos

Using Google App Engine as a Content delivery network

I would like to know if Google App Engine can be used as a content delivery network like AWS S3. I'm running a RoR app on Heroku and I would like to store my uploaded files on GAE instead of S3.
If it's possible what would be the best way to do it?
http://24ways.org/2008/using-google-app-engine-as-your-own-cdn
It won't be able to host files over 1MB though.
Make sure to read through the comments on that blog post as well, some have concerns about the terms of service.
GAE in itself isn't meant to be a CDN... that doesn't, however, stop you from writing a CDN application on top of it. The only limit you'll need to worry about is the 50 MB limit on the size of the blobstore. Such an app would have to provide a URL that you can hit to get the upload URL, which could then be used to upload the file. The download URL can also be generated along with the upload URL, and used to access the content.
