I've been catching up with the Scaling Rails screencasts. In episode 11, which covers advanced HTTP caching (using reverse proxy caches such as Varnish and Squid), they recommend only considering a reverse proxy cache once you've already exhausted the possibilities of page, action and fragment caching within your Rails application (as well as memcached etc., but that's not relevant to this question).
What I can't quite understand is how using an HTTP reverse proxy cache can provide a performance boost for an application that already uses page caching. To simplify matters, let's assume that I'm talking about a single host here.
This is my understanding of how both techniques work (maybe I'm wrong):
With page caching, the Rails process is hit initially and generates a static HTML file that is then served directly by the web server for subsequent requests, for as long as the cache for that request is valid. If the cache has expired, Rails is hit again and the static file is regenerated with the updated content, ready for the next request.
With an HTTP reverse proxy cache, the Rails process is hit when the proxy needs to determine whether the content is stale or not. This is done using various HTTP headers such as ETag, Last-Modified etc. If the content is fresh, Rails responds to the proxy with an HTTP 304 Not Modified and the proxy serves its cached content to the browser, or, even better, responds with its own HTTP 304. If the content is stale, Rails serves the updated content to the proxy, which caches it and then serves it to the browser.
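For concreteness, here is roughly what I mean by the two approaches in a Rails controller (just a sketch, assuming classic Rails page caching and the conditional GET helpers in a reasonably recent Rails):

class ArticlesController < ApplicationController
  # Page caching: after the first hit, public/articles/1.html is written to
  # disk and the web server serves it directly, bypassing Rails entirely.
  caches_page :show

  def show
    @article = Article.find(params[:id])
  end
end

class ProxiedArticlesController < ApplicationController
  # HTTP caching: Rails is hit on every request, but answers 304 Not Modified
  # (without rendering a body) while the proxy's/browser's copy is still fresh.
  def show
    @article = Article.find(params[:id])
    fresh_when etag: @article, last_modified: @article.updated_at, public: true
  end
end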
If my understanding is correct, then doesn't page caching result in fewer hits to the Rails process? There isn't all that back and forth to determine whether the content is stale, which means better performance than reverse proxy caching. So why might you use both techniques in conjunction?
You are right.
The only reason to consider it is if your Apache sets Expires headers. In this configuration, the proxy can take some of the load off Apache.
Having said this, Apache serving static files vs. a proxy cache is pretty much an irrelevance in the Rails world. They are both astronomically fast.
The benefits you would get would be for your non-page-cacheable stuff.
I prefer proxy caching over page caching (à la Heroku), but that's just me, and a digression.
A good proxy cache implementation (e.g., Squid, Traffic Server) is massively more scalable than Apache when using the prefork MPM. If you're using the worker MPM, Apache is OK, but a proxy will still be much more scalable at high loads (tens of thousands of requests / second).
Varnish, for example, has a feature where simultaneous requests for the same URL (one that is not yet in the cache) are queued and only the first request actually hits the back-end. That can prevent some nasty dog-pile cases which are nearly impossible to work around in a traditional page-caching scenario.
Using a reverse proxy in a setup with only one app server seems like overkill IMO.
In a configuration with more than one app server, a reverse proxy (e.g. Varnish) is the most effective way to do page caching.
Think of a setup with 2 app servers:
User 'Bob'(redirected to node 'A') posts a new message, the page gets expired and recreated on node 'A'.
User 'Cindy' (redirected to node 'B') requests the page where the new message from 'Bob' should appear, but she can't see the new message, because the page on node 'B' wasn't expired and recreated.
This concurrency problem could be solved with a reverse proxy.
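A quick sketch of why that happens with classic caches_page/expire_page (the cached page and its expiry are both local to whichever node handles the request):

class MessagesController < ApplicationController
  caches_page :index

  def index
    @messages = Message.all   # first hit writes public/messages.html on THIS node only
  end

  def create
    Message.create!(params[:message])
    # expire_page only deletes public/messages.html on the node that handled
    # Bob's POST; the other node keeps serving its stale file.
    expire_page action: :index
    redirect_to messages_path
  end
end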
I'm working on a project to require HTTPS everywhere among a suite of MVC and WebAPI applications. I'm trying to understand the trade-offs between clicking the "Require SSL" checkbox in IIS & using a URL Rewrite module vs. using a RequireHttpsAttribute in my global filters and modifying my web.config.
I've found the following guides detailing each approach:
https://webmasters.stackexchange.com/questions/28057/iis-7-require-ssl-automatically-redirect-to-https
http://tech.trailmax.info/2014/02/implemnting-https-everywhere-in-asp-net-mvc-application/
Explaining the mechanisms could get lengthy, so I will just list the most significant differences in behaviour:
do "Require SSL" in IIS:
The context basically expalin what it do, it's "Require" not "Enforce", which means, if people trying to access your website content through http, the server will just respond with a 403 error, which is usually not a desired behavior, but this may help some certain situation
Using the URL Rewrite module:
The module itself can do quite a few different things, but I assume you are just going to do the regular https redirect. That means that if a user tries to hit ANY content on the site over http, the server will do a 301 or 302 redirect to the https version of the same URL. This is usually a good option since it doesn't affect the usability of the website.
Global RequireHttpsAttribute action filter: This does something similar to option 2: it will do a 302 redirect for any http request that hits an ACTION. The main difference is that it only applies to the actions in your controllers, which means that if someone tries to fetch just an image or a CSS file over http, this option will let it through and not do any enforcement. That leaves you the ability to serve static content over http, which can be useful in some specific circumstances.
One extra thing worth mentioning: 301 and 302 redirects don't go well with http POST, so if your user tries to do a POST over http, the request body will get lost (thanks to @ChrisPratt for the info).
Typically the folks managing the infrastructure are responsible for making sure things are on https. Often they aren't very good at this, which is where RequireHttpsAttribute kicks in: it can enforce https requests at the code level.
In practice it isn't so great, as many production setups -- including stackoverflow.com's -- terminate https at an edge device and hand the requests to the back-end apps as plain http, and RequireHttpsAttribute isn't quite nuanced enough to understand this distinction.
The best bet in general is to configure the edge device providing the public http interface to take HTTPS and only HTTPS. Then set up secondary virtual sites [or whatever is vendor appropriate] to redirect all traffic to the canonical HTTPS URL. I'd be a bit nervous about relying upon RequireHttpsAttribute unless it's a small app handling its own requests. That still leaves open holes in terms of static artifacts and other things that might not be coming off of a controller.
I have multiple backend servers continuously building and refreshing the public parts of an API in order to cache it. What each backend server builds depends on what has to be done in the job queue.
At any given time:
backend server 1 will build:
/article/1.json
/article/5.json
backend server 2 will build:
/article/3.json
/article/9.json
/article/6.json
I need to serve these files from the front-end servers. The cache is stored as files so it can be served directly by nginx without going through the Rails stack.
The issue is to manage to have the cache up to date on the front-end servers in a scalable way (adding new servers should be seamless).
I've considered:
NFS / S3 (but too slow)
Memcached (but it can't be served directly from nginx - I might be wrong?)
CouchDB directly serving JSON (I feel this is too big for the job)
The backend writing JSON to Redis, with a job on the front end rewriting the files in the right place (currently my favourite option; sketched below)
Any experience or great ideas on a better way to achieve this?
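For what it's worth, here is a minimal sketch of that last option (the key names, channel name and docroot path are made up for illustration): the backend publishes each built document to Redis, and a small daemon on every front-end server writes it under nginx's docroot.

require "redis"
require "json"
require "fileutils"

# Backend side: after building an article, store the JSON in Redis and
# announce the path on a pub/sub channel.
def publish_article(redis, id, payload)
  path = "/article/#{id}.json"
  redis.set("cache:#{path}", JSON.generate(payload))
  redis.publish("cache:updates", path)
end

# Front-end side: one subscriber per front-end server writes each announced
# file into the directory nginx serves directly (no Rails involved).
def sync_cache(docroot = "/var/www/cache")
  reader = Redis.new  # a subscribed connection can't issue GETs, so use a second one
  Redis.new.subscribe("cache:updates") do |on|
    on.message do |_channel, path|
      body = reader.get("cache:#{path}") or next
      file = File.join(docroot, path)
      FileUtils.mkdir_p(File.dirname(file))
      File.write(file, body)
    end
  end
end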
You don't say how long it takes to build a single article, but assuming it's not horrifically slow, I think you'd be better off letting the app servers build the pages on the fly and having the front-end servers do the caching. In this scenario you could put some combination of haproxy/varnish/squid/nginx in front of your app servers and let them do the balancing/caching for you.
You could do the same thing I suppose if you continued to build them continuously on the backend.
Your end goal is to have this:
internet -> load balancer -> caching server 1 --> numerous app servers
                          \-> caching server 2 -/
Add more caching servers and app servers as needed. The internet will never know. Depending on what software you pick the load balancer/caching server might be the same, or might not. Really depends on your load and particular needs.
If you don't want to hit the Rails stack, you can catch the request with something like rack-cache before it ever reaches the full app:
http://rtomayko.github.io/rack-cache/
At least that way, you only have to bootstrap rack.
It also supports memcached as a storage mechanism: http://rtomayko.github.io/rack-cache/storage
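For example, a minimal config.ru along those lines might look like this (a sketch for a Rails 3+ app; adjust the memcached addresses and namespaces to your setup, and note the memcached stores need the dalli gem):

# config.ru
require ::File.expand_path("../config/environment", __FILE__)
require "rack/cache"

# Rack::Cache sits in front of the app and answers straight from memcached
# whenever the response's Cache-Control/Expires headers allow it, so only
# cache misses hit the full Rails stack.
use Rack::Cache,
  :verbose     => true,
  :metastore   => "memcached://localhost:11211/meta",
  :entitystore => "memcached://localhost:11211/body"

run Rails.application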
You are right, S3 is pretty slow by itself; in particular, the HTTPS session setup can take up to 5-10 seconds. But S3 is the ideal storage for primary data. We use it a lot, but in combination with an S3 Nginx proxy to speed up data delivery and add a caching layer.
The Nginx S3 proxy solution is well tested in production, and the caching mechanism works perfectly: every application server goes through the proxy, which fetches the original file from S3 and caches it.
To prevent the dog-pile effect you can use:
proxy_cache_lock for new files (see the nginx documentation)
proxy_cache_use_stale updating for files that are being updated (see the nginx documentation)
For an S3 Nginx proxy configuration, look at this: https://gist.github.com/mikhailov/9639593
We are currently updating our sites at work, and I am responsible for choosing/designing our caching strategy.
Our sites are all article-based magazine sites; however, some of them have a user system for restricted articles that require a subscription.
We have used page caching so far (and stored the pages in memcached) with a little bit of JavaScript. However, I'm now thinking that Rack::Cache or maybe Varnish is a better solution. As far as I can see, they work almost the same way performance-wise:
Page caching caches the full page in memcached, and this cache will be served directly from memcached by nginx on future requests.
Rack::Cache also caches the full page in memcached; however, the cached version is served by the web server instead of nginx. Rack::Cache uses HTTP caching headers, which means that visitors will also store a local copy in their browsers. Furthermore, it would be easy to replace with Varnish, which also uses HTTP caching headers.
Am I right so far, and does anyone have comments on the differences or the performance of the two strategies? It's also possible to use both, but I can't see any advantage in that approach, as they would cache the same types of pages.
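(For reference, the HTTP-header approach I mean is roughly this in a controller; just a sketch of my understanding, where restricted? is a stand-in for however you flag subscriber-only articles:)

class ArticlesController < ApplicationController
  def show
    @article = Article.find(params[:id])
    if @article.restricted?
      expires_now                            # Cache-Control: no-cache, keep it out of shared caches
    else
      expires_in 10.minutes, :public => true # Rack::Cache / Varnish / browsers may cache this
    end
  end
end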
Note: Please correct me if any of my assumptions are wrong. I'm not very sure of any of this...
I have been playing around with HTTP caching on Heroku and trying to work out a nice way to differentiate between mobile and desktop requests when caching using Varnish on Heroku.
My first idea was that I could set a Vary header so the cache is varied on If-None-Match. As Rails automatically sends back ETags generated from a hash of the content, the ETag would differ between desktop and mobile requests (different templates), so it would eventually cache two versions (not fact, just my original thinking). I have been playing around with this but I don't think it works.
Firstly, I can't wrap my head around when/if anything gets cached, as surely requests with If-None-Match will be conditional GETs anyway? Secondly, in practice, fresh requests (ones without If-None-Match) sometimes receive the mobile site. Is this because the cache doesn't know whether to serve the mobile or desktop cached version when the If-None-Match header isn't there?
As it probably sounds, I am rather confused. Will this approach work in any way, or am I being silly? Also, is there any way to achieve separate cached versions if I am unable to change the Varnish config at all (as I am on Heroku)?
The exact code I am using in Rails to set the cache headers is:
response.headers['Cache-Control'] = 'public, max-age=86400'
response.headers['Vary'] = 'If-None-Match'
Edit: I am aware I can use Vary: User-Agent but am trying to avoid it if possible due to its high miss rate (many, many user agents).
You could try Vary: User-Agent. However you'll have many cached versions of a single page (one for each user agent).
Another solution would be to detect mobile browsers directly in the reverse proxy, set an X-Is-Mobile-Browser client header before the reverse proxy attempts to find a cached page, set a Vary: X-Is-Mobile-Browser on the backend server (so that the reverse proxy only caches 2 versions of the same page), and replace that header with Vary: User-Agent before sending the response to the client.
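On the backend side that could look roughly like this (only a sketch; X-Is-Mobile-Browser is the made-up header name from above, and the proxy is assumed to set it during its own device detection):

class ApplicationController < ActionController::Base
  before_action :detect_mobile

  private

  # The reverse proxy does the User-Agent sniffing and injects
  # X-Is-Mobile-Browser; we only vary the cache on that header
  # (2 variants per page) instead of on the full User-Agent.
  def detect_mobile
    response.headers["Vary"] = "X-Is-Mobile-Browser"
    if request.headers["X-Is-Mobile-Browser"] == "true"
      prepend_view_path Rails.root.join("app", "views_mobile")
    end
  end
end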
If you cannot change your Varnish configuration, you have to use different URLs for the mobile and desktop pages. You can add a URL parameter (?mobile=true), add a segment to your path (yourdomain.com/mobile/news) or use a different host (like m.yourdomain.com).
This makes a lot of sense because (I've seen this many times, both in CMSs and applications) at some point in time you want to differentiate content and structure for mobile devices. People just do different things or are looking for different information on mobile devices...
I am trying to reduce the load on my webservers by adding an "Image server" (a dedicated server for handling image requests), and redirecting all requests for .gif,.jpg,.png etc., to it.
My question is, what is the best way to handle the redirection?
At the firewall level? (can I do this using iptables?)
At the load balancer level? (can ldirectord handle this?)
At the apache level - using rewrite rules?
Thanks for any suggestions on the best way to do this.
--Update--
One thing I would add is that these are domains hosted for 3rd parties, so I can't expect all the developers to modify their code to point their images at another server.
The further up the chain you can do it, the better.
Ideally, do it at the DNS level by using a different domain for your images (eg imgs.example.com)
If you can afford it, get someone else to do it by using a CDN (Content delivery network).
-Update-
There are also 2 features of Apache's mod_rewrite that you might want to look at. Both are described well at http://httpd.apache.org/docs/1.3/misc/rewriteguide.html.
The first is under the heading "Dynamic Mirror" in the above document; it uses the mod_rewrite proxy flag [P]. This lets your server silently fetch files from another domain and return them.
The second is to just redirect the request to the new domain. This second option puts less strain on your server, but requests still need to come in and it slows down the final rendering of the page, as each request needs to make an essentially redundant request to your server first.
I agree with rikh. If you want images to be served from a different web server, then serve them from a different web server. For example:
<IMG src="images/Brett.jpg">
becomes
<IMG src="http://brettnesbitt.akamia-technologies.com/images/Brett.jpg">
Any kind of load balancer will still feed the image from the web-server's pipe, which is what you're trying to avoid.
I, of course, know what you really want. What you really want is for any request like:
GET images/Brett.jpg HTTP/1.1
to automatically get converted into:
HTTP/1.1 307 Temporary Redirect
Location: http://brettnesbitt.akamia-technologies.com/images/Brett.jpg
This way you don't have to do any work, except copy the images to the other web server.
That, I really don't know how to do.
Your use of the phrase "NAT" implies that the firewall/router receives HTTP requests, and that you want to forward the request to a different internal server if it was for image files.
This then raises the question of what you're actually trying to save. No matter which internal web server services the HTTP request, the data is still going to have to flow through the firewall/router's pipe.
The reason I bring it up is that the common scenario when someone wants to serve images from a different server is that they want to split the high-bandwidth, mostly static, low-CPU-cost content off from their actual logic.
Using NAT only to rewrite the packet and send it to a different server does nothing to address that common issue.
The other reason might be because images are not static content on your system, and a request to
GET images/Brett.jpg HTTP/1.1
actually builds an image on the fly, at a high CPU cost, or using data (e.g. a SQL Server database) only available on ServerB.
If this is the case, then I would still use a different server name for the image requests:
GET http://www.brettsoft.com/default.aspx HTTP/1.1
GET http://imageserver.brettsoft.com/images/Brett.jpg HTTP/1.1
I understand what you're hoping for, with network packet inspection overriding the NAT rule and sending the request to another server - I've never seen anything that can do that.
It sounds more "proxy-ish", where the web proxy does this. (pfSense and m0n0wall, for example, can't do it.)
Which then leads to a kind of solution we used once: a custom web-server that analyzes the request, makes the appropriate request off some internal server, and binary writes the response to the client.
That pain in the ass solution was insisted upon by a "security consultant", who apparently believes in security through obscurity.
I know IIS cannot do such things for you itself - I don't know about other web-server products.
I just asked around, and apparently if you wanted to write a custom kernel module for your Linux-based router, you could have it inspect packets and take appropriate action. Such a module might already exist. There are, apparently, plenty of other open-source modules to use as a starting point.
But I'd rather shoot myself in the head.