Scalable way to share cached files across frontend servers - ruby-on-rails

I have multiple backend servers continuously building and refreshing the public parts of an API in order to cache it. Which backend server builds what depends on the jobs in the queue.
At any given moment,
backend server 1 will build:
/article/1.json
/article/5.json
backend server 2 will build:
/article/3.json
/article/9.json
/article/6.json
I need to serve these files from the front-end servers. The cache is stored as files so they can be served directly by nginx without going through the Rails stack.
The issue is keeping the cache up to date on the front-end servers in a scalable way (adding new servers should be seamless).
I've considered:
NFS / S3 (but too slow)
Memcached (but it can't be served directly from nginx - I might be wrong?)
CouchDB directly serving the JSON (I feel this is too big for the job)
Backends writing the JSON to Redis, with a job on the frontends rewriting the files in the right place (currently my favorite option)
Any experience or great ideas on a better way to achieve this?
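For reference, here's a rough sketch of how I imagine the Redis option working (key names, paths and hosts are just placeholders):

require "redis"
require "fileutils"

redis = Redis.new(host: "redis.internal", port: 6379)

# Backend side: after building an article, store the JSON under a known key
# and push that key onto a list the frontends consume.
def publish_article(redis, id, json)
  key = "cache:/article/#{id}.json"
  redis.set(key, json)
  redis.lpush("cache:pending", key)
end

# Frontend side: a small daemon that materialises the cached JSON as real
# files in the nginx docroot, so nginx can serve them without touching Rails.
DOCROOT = "/var/www/cache"

loop do
  _list, key = redis.brpop("cache:pending")      # blocks until a key arrives
  path = File.join(DOCROOT, key.sub("cache:", ""))
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, redis.get(key))
end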

You don't say how long it takes to build a single article, but assuming it's not horrifically slow, I think you'd be better off letting the app servers build the pages on the fly and having the front-end servers do the caching. In this scenario you could put some combination of haproxy/varnish/squid/nginx in front of your app servers and let them do the balancing/caching for you.
You could do the same thing I suppose if you continued to build them continuously on the backend.
Your end goal is to have this:
internet -> load balancer -> caching server 1 --> numerous app servers
                         \-> caching server 2 -/
Add more caching servers and app servers as needed. The internet will never know. Depending on what software you pick the load balancer/caching server might be the same, or might not. Really depends on your load and particular needs.
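Whichever cache you put in front, the app needs to emit cache headers the proxy can honour. A minimal sketch of the Rails side (the controller and TTL are just examples):

class ArticlesController < ApplicationController
  def show
    @article = Article.find(params[:id])
    # Cache-Control: public, max-age=300 lets varnish/squid/nginx keep the response
    # and serve it without hitting the app again until it expires.
    expires_in 5.minutes, public: true
    render json: @article
  end
end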

If you don't want to hit the Rails stack, you can catch the request with something like rack-cache before it ever reaches the full app:
http://rtomayko.github.io/rack-cache/
At least that way, you only have to bootstrap Rack.
It also supports memcached as a storage mechanism: http://rtomayko.github.io/rack-cache/storage
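A minimal sketch of wiring it up in a Rails app with memcached-backed storage (hosts and namespaces are placeholders; the memcached stores need a client gem such as dalli):

# config/application.rb (inside the Application class)
config.middleware.use Rack::Cache,
  :verbose     => true,
  :metastore   => "memcached://localhost:11211/meta",
  :entitystore => "memcached://localhost:11211/body"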

You are right, S3 is pretty slow by itself; in particular, HTTPS session setup can take up to 5-10 seconds. But S3 is the ideal storage for primary data. We use it a lot, but in combination with an S3 nginx proxy to speed up delivery and add caching.
The nginx S3 proxy solution is well tested in production and the caching mechanism works perfectly: every application server goes through the proxy, which fetches the original file from S3 and caches it.
To prevent the dog-pile effect you can use:
proxy_cache_lock for new files
proxy_cache_use_stale updating for files being refreshed
For an example S3 nginx proxy configuration, see https://gist.github.com/mikhailov/9639593

Related

How to manage a large number of images in a Rails app?

I have to manage around 200 high-quality images in my app. I am currently using Cloudinary to store these images.
But I've seen that many apps use a different domain name to store images and other assets (e.g. "assets.example.com"). If I understand correctly, this is called the asset_host in a Rails app. I've found documentation on what it is but not much on how to set it up or how the files are served.
How do they do something like that? Do they pay for another domain name/server and just use that server to store assets?
200 images is not that much. Even at 100 MB per image (original plus downsized variants) that is just 20 GB of storage, which under moderate load can easily be handled by one server without any clouds, additional domains, etc. And since you're already storing them in cloud storage, you don't have to worry.
asset_host is for the asset pipeline (your CSS/JS/images from app/assets that end up in public/assets), not for the app's managed data.
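Setting it is a one-liner, for example:

# config/environments/production.rb
config.action_controller.asset_host = "assets.example.com"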
In the old days assets were served from other hosts to get around connection-count limits in browsers (so that assets could be downloaded in parallel and the site loaded faster). This is not relevant for modern HTTP/2 (it's even the opposite - there's overhead in establishing an additional HTTP connection), unless you're under really high load or have a specific need for it (for example, when deploying in a container it may be useful to store assets separately).
A second benefit is that the browser will not send the app's cookies to the other host, which saves a little bandwidth.
Many sites set up that domain to be handled by the same physical web server.
As for paying for a domain: assets.example.com is a third-level domain of example.com, so if you already own the latter you own it too; you just need to set up A (and optionally AAAA) DNS records and the server.

Serving premium videos using Ruby on Rails

I am stuck with a performance-related issue. I have a Ruby on Rails application running on a VPS. I am posting my question here after having spent a lot of time searching the internet for this issue without finding a reliable solution.
My application has nearly 300 premium videos and 200 PDFs. After registration a user is allowed to watch the free videos; if the user upgrades the account by making a payment, he/she can watch the premium videos.
Currently I am serving video files using the send_file method; below is the code.
SEND_FILE_METHOD = :default

def download
  head(:not_found) and return if (track = Video.find_by_id(params[:id])).nil?
  # head(:forbidden) and return unless track.downloadable?(current_user)
  path = track.video.path(params[:style])
  head(:bad_request) and return unless File.exist?(path) && params[:format].to_s == File.extname(path).gsub(/^\.+/, '')

  content_type = MIME::Types.type_for(path).first.content_type # e.g. "video/mp4"
  send_file_options = { :type => content_type }

  case SEND_FILE_METHOD
  when :apache then send_file_options[:x_sendfile] = true
  when :nginx  then head(:x_accel_redirect => path.gsub(Rails.root.to_s, ''), :content_type => send_file_options[:type]) and return
  end

  send_file(path, send_file_options)
end
My question here is: what is the right way to serve premium videos via a Rails application? (My client's VPS has only 1 GB RAM and the videos are saved on the VPS filesystem.) My concern is that if my application gets more than 100 requests at a time, the Rails app may fail to serve them. I also thought of placing the videos in the public folder, but they are only for paid users, so I can't put them there.
Any help or link to a solution is highly appreciated.
Many thanks :)
If you really want to serve static content at high volume, don't put the load on your web server, especially if it's not that strong.
Keep your server's resources for your core business logic (like managing accounts and permissions) and delegate serving static content to dedicated services.
I'm only familiar with the AWS suite so I can recommend the following:
For the PDFs (and other files) - Store them on AWS S3 and have your app point to that location. You can create an address within your domain, and can generate separate links for different customers (with expiration) so you can control who accesses the files and for how long.
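For example, a rough sketch of generating such an expiring link with the aws-sdk-s3 gem (bucket, key and expiry are placeholders):

require "aws-sdk-s3"

s3  = Aws::S3::Resource.new(region: "us-east-1")
obj = s3.bucket("my-premium-content").object("pdfs/guide-1.pdf")

# A time-limited URL you can hand to a paying customer.
url = obj.presigned_url(:get, expires_in: 15 * 60)   # valid for 15 minutes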
For the videos - If you serve them as regular files (e.g. download an mp4) - I'd go with the same S3 solution. If you plan to stream them to clients - look at CloudFront Streaming
CloudFront (AWS implementation of a CDN) in general is a good idea if you have clients from different geographies and you want to serve their content from the closest location to them.
As for prices - you can see them on the product pages. It's pretty cheap in my opinion and definitely worth the money considering the scalability it gives you. You might have some learning curve at the beginning to start working with their APIs, but it's really not that difficult and plenty of tutorials are available.
Equivalent solutions (from other vendors) do exist. I'd recommend looking around to see what fits you best.
Good luck!
If possible, you should consider using something like S3 or a CDN. These are built to get files to customers quickly and will result in zero load on your server once you've handed the file off to them.
If you want to stick with serving files from your vps then you should look at X-Sendfile (apache) or X-Accel-Redirect (nginx). With these the file is still served by your VPS but it is handled by the web server rather than your rails code.
Rails will generate these headers for you when you use send_file, but for this to work you need to configure your web server appropriately.
On Apache this means installing the X-Sendfile module (mod_xsendfile); on nginx you have to configure which parts of the filesystem are accessible. The source of Rack::Sendfile (the mechanism Rails uses) explains how to do this.
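On the Rails side, the relevant setting is which header send_file should emit; a sketch (pick the line that matches your server):

# config/environments/production.rb
config.action_dispatch.x_sendfile_header = "X-Sendfile"          # Apache with mod_xsendfile
# config.action_dispatch.x_sendfile_header = "X-Accel-Redirect"  # nginx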

PDF caching on Heroku with CloudFlare

I'm having a problem getting the caching I need to work using CloudFlare.
We use CloudFlare to cache all our assets on S3, which works 100%, using a separate cdn subdomain.
We also use CloudFlare for our main site (hosted on Heroku), e.g. the www subdomain.
My problem is I can't get CloudFlare to cache PDFs that are generated by our Rails app. I'm using the WickedPDF gem to dynamically generate certain PDFs for invoices, etc. I don't want to upload these as files to, say, S3, but we would like CloudFlare to cache them so they don't get generated each and every time, as generating these PDFs is a little intensive.
CloudFlare is turned on and "accelerating" the subdomain in question, and we're using SSL, but the PDFs never seem to cache properly.
Is there something else we need to do to ensure these get cached? Or maybe there's another solution that would work on Heroku? (e.g. we can't use page caching since it relies on the filesystem.) I also checked the WickedPDF documentation to see if we could do anything else, but found nothing about expiry controls.
Thanks,
We should actually cache it as long as the resources are on-domain & not being delivered through a third-party resource in some way.
Keep in mind:
1. Our caching depends on the number of requests for the resources (at least three).
2. Caching is very much data center dependent (in other words, if your site receives a lot of traffic at one data center, it is going to be cached there; if it doesn't get a lot of traffic at another data center, it may not be cached there).
I would open a support ticket if you're still having issues.
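It's also worth making sure the PDF response itself sends cacheable headers; a rough Rails/WickedPDF sketch (the controller, model and TTL are hypothetical):

def show
  @invoice = Invoice.find(params[:id])
  respond_to do |format|
    format.pdf do
      # Cache-Control: public with a max-age gives the CDN something it is allowed to cache.
      expires_in 12.hours, public: true
      render :pdf => "invoice-#{@invoice.id}"
    end
  end
end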

Rails' page caching vs. HTTP reverse proxy caches

I've been catching up with the Scaling Rails screencasts. In episode 11 which covers advanced HTTP caching (using reverse proxy caches such as Varnish and Squid etc.), they recommend only considering using a reverse proxy cache once you've already exhausted the possibilities of page, action and fragment caching within your Rails application (as well as memcached etc. but that's not relevant to this question).
What I can't quite understand is how using an HTTP reverse proxy cache can provide a performance boost for an application that already uses page caching. To simplify matters, let's assume that I'm talking about a single host here.
This is my understanding of how both techniques work (maybe I'm wrong):
With page caching the Rails process is hit initially and then generates a static HTML file that is served directly by the Web server for subsequent requests, for as long as the cache for that request is valid. If the cache has expired then Rails is hit again and the static file is regenerated with the updated content ready for the next request
With an HTTP reverse proxy cache the Rails process is hit when the proxy needs to determine whether the content is stale or not. This is done using various HTTP headers such as ETag, Last-Modified etc. If the content is fresh then Rails responds to the proxy with an HTTP 304 Not Modified and the proxy serves its cached content to the browser, or even better, responds with its own HTTP 304. If the content is stale then Rails serves the updated content to the proxy which caches it and then serves it to the browser
If my understanding is correct, then doesn't page caching result in fewer hits to the Rails process? There isn't all that back and forth to determine whether the content is stale, which suggests better performance than reverse proxy caching. Why might you use both techniques in conjunction?
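For concreteness, the conditional-GET side I have in mind looks roughly like this (controller and model are hypothetical):

def show
  @article = Article.find(params[:id])
  # Responds 304 Not Modified when the proxy's copy is still fresh,
  # otherwise renders the full page for the proxy to re-cache.
  fresh_when :etag => @article, :last_modified => @article.updated_at, :public => true
end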
You are right.
The only reason to consider it is if your Apache sets Expires headers. In that configuration, the proxy can take some of the load off Apache.
Having said this, Apache serving static files vs. a proxy cache is pretty much an irrelevancy in the Rails world; they are both astronomically fast.
The benefits you would get would be for your non-page-cacheable stuff.
I prefer using proxy caching over page caching (a la Heroku), but that's just me, and a digression.
A good proxy cache implementation (e.g., Squid, Traffic Server) is massively more scalable than Apache when using the prefork MPM. If you're using the worker MPM, Apache is OK, but a proxy will still be much more scalable at high loads (tens of thousands of requests / second).
Varnish, for example, has a feature where simultaneous requests for the same URL (not yet in the cache) are queued and only the first request actually hits the back-end. That can prevent some nasty dog-pile cases which are nearly impossible to work around in a traditional page caching scenario.
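On the Rails side, a related (though not identical) mitigation is the race_condition_ttl option on the low-level cache, which lets one process refresh an expired entry while others briefly keep serving the stale copy; a sketch with a hypothetical cache key and partial:

fragment = Rails.cache.fetch("articles/#{params[:id]}/body",
                             :expires_in => 10.minutes,
                             :race_condition_ttl => 10.seconds) do
  render_to_string(:partial => "articles/body")
end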
Using a reverse proxy in a setup with only one app server seems a bit overkill IMO.
In a configuration with more than one app server, a reverse proxy (e.g. varnish, etc.) is the most effective way for page caching.
Think of a setup with 2 app servers:
User 'Bob'(redirected to node 'A') posts a new message, the page gets expired and recreated on node 'A'.
User 'Cindy' (redirected to node 'B') requests the page where the new message from 'Bob' should appear, but she can't see the new message, because the page on node 'B' wasn't expired and recreated.
This concurrency problem could be solved with a reverse proxy.

Using a CDN to store/serve user image uploads?

I'm still new to the whole CDN ideaology, so this might be a stupid question but I'm sure someone can shed some light on this. I've got a basic php script that takes user image uploads, resizes them, creates a directory ($user_id), and stores the finished product in the directory (like www.mysite.com/uploads/$user_id/image1.jpg). Works like a charm.
I just got all the hosting stuff squared away and I'm using the Rackspace (Slicehost?) "Cloud Server" architecture. I also signed up for the Rackspace (Mosso?) "Cloud Files". So far so good.
So my question is: Should I be storing the images that users upload locally (on my apache server) or as objects via Cloud Files? It seems like a great idea to separate the static content from my web server so I can just use it to generate the dynamic content. But would it be a lot of overhead to create a CDN-enabled Container each time a user uploads an image?
Hopefully I'm not missing the boat on this one totally. I can't seem to find a whole lot of info about this, but I'm sure there is a good reason why I should either pursue or avoid this idea. Any suggestions are greatly appreciated!
I am not familiar with Rackspace's offering, but the general logic behind using a CDN for static content is to achieve two goals:
offload the bandwidth and processing to other servers, freeing up yours by moving those requests off it
move the large static content closer to the client
When you send the generated HTML to the browser, it will "see" the images as www.yourdomain.com/my_image.jpg for example, and perform additional requests for each piece of static content, potentially starving your server of threads to service requests. If you move all this content onto a CDN, the browser would see something like cdn.yourdomain.com, and the browser will request the images from the CDN, thus allowing your server to service other requests instead. Additionally, most CDN's distribute your content to multiple locations and have geographic routing for requests to serve the content from the closest possible location, improving the perceived load time for clients.
