Remove unnecessary HTTP headers in my Rails responses - ruby-on-rails

I am currently developing an API where size matters: I want the response to contain as few bytes as possible. I optimized my JSON payload, but Rails still responds with many unnecessary headers:
HTTP/1.1 200 OK
Server: nginx/0.7.67 # Not from Rails, so ok.
Date: Wed, 25 Apr 2012 20:17:21 GMT # Date does not matter; we use ETag. Can I remove this?
ETag: "678ff0c6074b9456832a710a3cab8e22" # Needed.
Content-Type: application/json; charset=utf-8 # Also needed.
Transfer-Encoding: chunked # The alternative would be Content-Length, so ok.
Connection: keep-alive # Good, less TCP overhead.
Status: 200 OK # Redundant! How can I remove this?
X-UA-Compatible: IE=Edge,chrome=1 # Completely unneeded.
Cache-Control: no-cache # Not needed.
X-Request-Id: c468ce87bb6969541c74f6ea761bce27 # Not a real header at all.
X-Runtime: 0.001376 # Same goes for this
X-Rack-Cache: invalidate, pass # And this.
So there are lots of unnecessary HTTP headers. I could filter them in my server (nginx), but is there a way to stop this directly in Rails?

You can do this with a piece of Rack middleware. See https://gist.github.com/02c1cc8ce504033d61bf for an example of how to do it.
When adding it to your app config, use something like config.middleware.insert_before(ActionDispatch::Static, ::HeaderDelete)
You want to insert it before the first item in the list displayed when you run rake middleware, which in my case is ActionDispatch::Static.
http://guides.rubyonrails.org/rails_on_rack.html may be somewhat helpful if you haven't been exposed to Rack in the Rails context before.
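In case the gist goes away, here's a minimal sketch of such a middleware, assuming the header names from the question (the class name matches the ::HeaderDelete used in the config line above):

# app/middleware/header_delete.rb
# Minimal sketch: strips the unwanted response headers listed in the question.
class HeaderDelete
  HEADERS_TO_DELETE = %w[
    Status X-UA-Compatible Cache-Control
    X-Request-Id X-Runtime X-Rack-Cache
  ].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    HEADERS_TO_DELETE.each { |name| headers.delete(name) }
    [status, headers, body]
  end
end

With that file loaded (e.g. from config/application.rb), the insert_before line above wires it into the stack.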

Another option, since you're using Nginx, is the HttpHeadersMoreModule. This allows fine-grained control of exactly which headers are sent down the wire.
In your case, you'd specifically want to use the more_clear_headers directive, like so:
more_clear_headers Server Date Status X-UA-Compatible Cache-Control X-Request-Id X-Runtime X-Rack-Cache;
This also clears the Server header, since it's not really necessary, and if you're trying to save bytes, every little bit helps.
This module does require you to compile Nginx yourself, but that really shouldn't scare you: Nginx is very easy to compile. Just follow the installation instructions.

I agree that both solutions presented by x1a4 and Stephen McCarth are good.
Ideally you should use the HttpHeadersMoreModule. However, if you're a fan of the native Ubuntu Nginx package with its security updates, as I am (or you don't have time for that, or are just lazy), you don't necessarily need to compile anything.
Another way is to use proxy_hide_header:
server {
  location @unicorn {
    # ...
    proxy_hide_header X-Powered-By;
    proxy_hide_header X-Runtime;
    # ...
  }
}
Note: @unicorn is just a named location pointing at the upstream server; the location can be whatever you use: /, /assets, etc.
One argument against this solution is that if you use several server blocks in your configuration, you need to specify proxy_hide_header in each of them. True, but you can just create a file and include it:
# /etc/nginx/sites-enabled/my_app
server {
  location @unicorn {
    # ...
    include /etc/nginx/shared/stealth_headers;
    # ...
  }
}

# /etc/nginx/shared/stealth_headers
proxy_hide_header X-Powered-By;
proxy_hide_header X-Runtime;
So why do I think this solution is better than the middleware solution presented by x1a4?
I had a similar middleware solution before, and it worked fine for a couple of months. Then one day we stopped receiving exception reports from our exception-monitoring tool (the party_foul gem). Long story short: middleware is tricky. We made some code changes and this middleware started throwing exceptions, but they were not caught by the middleware that was supposed to monitor exceptions. Yes, the whole thing was my fault; I should have kept a better eye on my code and not done something stupid. Still, it was an unpleasant experience that is hard to erase, so I simply recommend handling this at the Nginx level rather than the middleware level if you can.
Plus, it makes more sense if your Nginx handles several configurations (you don't have to update several applications when something changes).

Related

Does not setting Cache-Control automatically enable caching even without a conditional request?

For the following image: https://upload.wikimedia.org/wikipedia/commons/7/79/2010-brown-bear.jpg
There isn't any Cache-Control header. And based on here, even if you don't send anything, the default value, which is private, is used. That being the case, doesn't URLSession need to perform a conditional request to make sure it's still valid?
Is there anything in the headers that allows it to make such a conditional request? I don't see Cache-Control, max-age, or Expires. The only things I see are Last-Modified & Etag, but again, those need to be validated against the server. Or does not specifying anything make it cache indefinitely?! I've already read this answer, but it doesn't discuss this scenario.
Yet it's being cached by URLSession (because if I turn off the internet, it still gets loaded).
The only other thing I see is "Strict-Transport-Security": max-age=106384710.
Does that affect caching? I've already looked here and don't believe it should. From what I understand, the max-age on the HSTS header is only there to enforce HTTPS access for a certain period of time. Once max-age is reached, access through HTTP is also possible.
These are all the headers that I'm getting back:
Date: Wed, 31 Oct 2018 14:15:33 GMT
Content-Length: 215104
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Age: 18581
Etag: 00e21950bf432476c91b811bb685b6af
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Accept-Ranges: bytes
Content-Type: image/jpeg
Last-Modified: Fri, 04 Oct 2013 23:30:08 GMT
Access-Control-Allow-Origin: *
Timing-Allow-Origin: *
x-analytics: https=1;nocookies=1
x-object-meta-sha1base36: 42tq5grg9rq1ydmqd4z5hmmqj6h2309
x-varnish: 60926196 48388489, 342256851 317476424
x-cache-status: hit-front
x-trans-id: tx08ed43bbcc1946269a9a3-005bd97070
x-timestamp: 1380929407.39127
x-cache: cp1076 hit/7, cp1090 hit/7
x-client-ip: 2001:558:1400:4e:171:2a98:fad6:2579
This question was asked because of this comment:
doesn't the URLSession need to perform a conditional request to make sure it's still valid?
The user-agent should be performing a conditional request, because of the
Etag: 00e21950bf432476c91b811bb685b6af
present. My desktop Chrome certainly does perform the conditional request (and gets back 304 Not Modified).
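You can observe the same revalidation from code; here's a sketch using Ruby's net/http against the image URL from the question (header names as shown above):

require 'net/http'
require 'uri'

uri = URI('https://upload.wikimedia.org/wikipedia/commons/7/79/2010-brown-bear.jpg')

# First request: fetch the image and remember its validator.
first = Net::HTTP.get_response(uri)
etag  = first['ETag']

# Conditional request: if the ETag still matches, the server answers
# 304 Not Modified with an empty body, and the cached copy can be reused.
Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  request = Net::HTTP::Get.new(uri)
  request['If-None-Match'] = etag
  puts http.request(request).code # => "304" while the image is unchanged
end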
But it's free not to
But a user-agent is perfectly free to decide on its own. It's perfectly free to look at:
Last-Modified: Fri, 04 Oct 2013 23:30:08 GMT
and decide that the resource is probably good for the next five minutes[1]. And if the network connection is down, it's perfectly reasonable and correct to display the cached version instead. In fact, your browser would show you websites even while your 0.0336 Mbps dial-up modem was disconnected.
You wouldn't want your browser to show you nothing when it knows full well it can show you something. This becomes even more useful when we're talking about poor connectivity caused not by slow dial-up and servers that go down, but by mobile computing and metered data plans.
[1] I say 5 minutes because in the early web, servers did not give cache hints. So browsers cached things without even being asked, and 5 minutes was a good number. And you used Ctrl+F5 (or was it Shift+F5, or was it Shift+Click, or was it Alt+Click) to force the browser to bypass the cache.

Azure CDN not caching controller response

I put the code from the end of this article into my MVC controller method:
http://msdn.microsoft.com/en-us/library/windowsazure/gg680299.aspx
I configured a CNAME for the CDN and everything works fine, except I feel that the CDN is not caching :)
This is the CDN URL:
http://cdn.services.idemkvrachu.ru/services/BranchLogo/82f204fe-bb1d-4204-b817-d424e1284b17/E0F4F2AE-B6C2-4516-BE7C-59B649E2C5AC?lastUpdated=635169430040919922&width=499
And this is the original URL:
http://prm.idemkvrachu.ru/cdn/services/BranchLogo/82f204fe-bb1d-4204-b817-d424e1284b17/E0F4F2AE-B6C2-4516-BE7C-59B649E2C5AC?lastUpdated=635169430040919922&width=499
This is my code:
Response.Cache.SetExpires(DateTime.Now.AddDays(14));
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetLastModified(blob.ChangDateOfs.DateTime);
return File(bytes, format);
When I compared the timings for receiving the picture from the original link and from the CDN, I found that the timings were higher on the CDN.
I also tried changing blob.ChangDateOfs and comparing the Last-Modified header from the CDN response: it changes immediately.
What's wrong with my code? Maybe this header breaks the CDN cache: Cache-Control public, no-cache="Set-Cookie"?
To troubleshoot caching issues, the first thing you want to do is validate whether your content is actually getting cached or not.
To do this you can add the X-LDebug header with a value of 2. Here's an example of doing this against your endpoint, with the relevant portions of the output included:
C:\Azure\Tools\wget\bin>wget -S --header "X-LDebug:2" "http://cdn.services.idemkvrachu.ru/services/BranchLogo/82f204fe-bb1d-4204-b817-d424e1284b17/E0F4F2AE-B6C2-4516-BE7C-59B649E2C5AC?lastUpdated=635169430040919922&width=499"
Cache-Control: public, no-cache="Set-Cookie"
Set-Cookie: ASP.NET_SessionId=nnxb3xqdqetj0uhlffdmtf03; path=/; HttpOnly
Set-Cookie: idCity=31ed5892-d3cb-45eb-bd4f-526cd65f5302; domain=idemkvrachu.cloudapp.net;
X-Cache: MISS from cds173.sat9.msecn.net
As you can see, you are setting the Cache-Control header to no-cache="Set-Cookie" and then setting cookies. This tells the CDN not to cache the content. Since your code only sets the cacheability to Public, I assume you have a setting in your web.config or aspx page that modifies the Cache-Control header to add the no-cache="Set-Cookie".
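If you don't have wget handy, the same smoke test can be scripted; here's a rough sketch in Ruby (the URL is a placeholder; the X-Cache and Age header names match the output above):

require 'net/http'
require 'uri'

# Hypothetical endpoint; substitute your CDN URL.
uri = URI('http://cdn.example.com/services/BranchLogo/some-image')

# Request the object twice: a second response with an X-Cache HIT
# or a growing Age value means the CDN really cached it.
2.times do |i|
  response = Net::HTTP.get_response(uri)
  puts "request #{i + 1}: X-Cache=#{response['X-Cache'].inspect} Age=#{response['Age'].inspect}"
end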

Heroku & Rails - Varnish is only caching very occasionally

I have an issue similar to Heroku & Rails - Varnish HTTP Cache Not Working, but the solution (wait for a while, then everything works) doesn't seem to apply - I've had the setup below for several days.
This thread on the Heroku Google group has some users with the same problem. They mention that it takes a while for everything to be cached, but my understanding is that after a while everything should get cached, no? Or does that only apply if there is a lot of traffic?
I need some advice on where I should be looking/what I can try changing in order to get caching working properly.
My setup:
I have http://www.swingoutlondon.co.uk running on Heroku (Rails 3.0.3, Ruby 1.9.2, bamboo-mri-1.9.2) and the main index page performs a lot of database queries to return what is essentially a static page - usually taking about 2-3 seconds (yes, that's something I really do need to address, but I figure varnish caching is a quick win).
I've set the Cache-Control response header as described here, and indeed that does seem to have been set on the page:
>> curl -I http://swingoutlondon.co.uk
HTTP/1.1 200 OK
Server: nginx
Date: Sun, 13 May 2012 00:01:05 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Cache-Control: public, max-age=300
Etag: "2565201f3ae39c6a9a1f6b1fb8bbbe0a"
X-Ua-Compatible: IE=Edge,chrome=1
X-Runtime: 1.699667
Content-Length: 44224
Accept-Ranges: bytes
X-Varnish: 681634826
Age: 0
Via: 1.1 varnish
Note: Cache-Control: public, max-age=300
I assume that Age: 0 indicates that it hasn't retrieved a cached copy, and indeed the command returns in the usual slow 2-3 seconds.
If I keep repeatedly trying that curl, I occasionally get a cached copy (the page loads in under half a second and Age is greater than 0).
I must confess to not fully understanding HTTP headers, but one clue might be: when Age is greater than 0, I get two sets of digits in X-Varnish (in all other cases I only get one set):
X-Varnish: 848670407 848650521
Here's what I've checked:
the source of the page is identical each time.
I have one before_filter on that page, which sets the time the page was last updated as an instance variable.
there are a number of cookies - as far as I can see they are all set by either Google Analytics or the Twitter or Facebook buttons.
For good measure, here are my Request headers:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Cookie:__utma=264326157.189257391.1336869624.1336869624.1336869624.1; __utmb=264326157.2.10.1336869624; __utmc=264326157; __utmz=264326157.1336869624.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Host:www.swingoutlondon.co.uk
If-None-Match:"2565201f3ae39c6a9a1f6b1fb8bbbe0a"
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19
Ah well - it turns out that because Heroku uses multiple independent Varnish servers, and because traffic to Swing Out London is relatively low, I shouldn't expect many pages to be served from the caches if my max-age is only 5 minutes. Setting it to 20 or 30 minutes results in much more caching.
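In controller terms, that just means passing a longer TTL to expires_in; here's a minimal sketch (the controller and action names are made up):

class PagesController < ApplicationController
  def index
    # 5 minutes was too short for Heroku's many independent Varnish nodes;
    # a longer max-age gives each node a better chance of a warm copy.
    expires_in 30.minutes, public: true
    # ... render the page as usual ...
  end
end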
I've written a detailed blog post collecting my learnings. Thanks to Garry Shulter for helping me out with this one.

How to control Varnish and a browser using the Cache-Control: max-age header in a Rails environment?

Recently I added a Varnish instance to a Rails application stack. Varnish in its default configuration can be convinced to cache a certain resource using the Cache-Control header, like so:
Cache-Control: max-age=86400, public
I achieved that using the expires_in statement in my controllers:
def index
expires_in 24.hours, public: true
respond_with 'some content'
end
That worked well. What I did not expect is that the Cache-Control header ALSO affects the browser. That leads to the problem that both Varnish and my users' browsers cache a certain resource. The resource is purged from Varnish correctly, but the browser does not attempt to request it again until max-age is reached.
So I wonder whether I should use expires_in in combination with Varnish at all? I could filter the Cache-Control header in an Nginx or Apache instance in front of Varnish, but that seems odd.
Can anyone enlighten me?
Regards,
Felix
That is actually a very good and valid question, and a very common one with reverse proxies.
The problem is that there's only one Cache-Control property and it is intended for the client browser (private cache) and/or a proxy server (shared cache). If you don't want 3rd party proxies to cache your content at all, and want every request to be served by your Varnish (or by your Rails backend), you must send appropriate Cache-Control header from Varnish.
Modifying Cache-Control header sent by the backend is discussed in detail at https://www.varnish-cache.org/trac/wiki/VCLExampleLongerCaching
You can approach the solution from two different angles. If you wish to define max-age at your Rails backend, for instance to specify different TTLs for different objects, you can use the method described in the link above.
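A common variant of that first approach (not from the linked article, just a sketch of the idea) is to send Varnish its TTL in a separate custom header while giving browsers a conservative Cache-Control, and have your VCL read and strip the custom header:

class ProductsController < ApplicationController
  def index
    # Browsers revalidate on every request ...
    response.headers['Cache-Control'] = 'no-cache'
    # ... while Varnish takes its TTL from this hypothetical header,
    # which your vcl_fetch must read and then strip before delivery.
    response.headers['X-Varnish-TTL'] = '86400'
    # ... render the response as usual ...
  end
end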
Another solution is to not send Cache-Control headers at all from the backend, and instead define the desired TTLs for objects in Varnish's vcl_fetch(). This is the approach we have taken.
We have a default TTL of 600 seconds in Varnish, and define longer TTLs for pages that are explicitly purged when changes are made. Here's our current vcl_fetch() definition:
sub vcl_fetch {
if (req.http.Host ~ "(forum|discus)") {
# Forum pages are purged explicitly, so cache them for 48h
set beresp.ttl = 48h;
}
if (req.url ~ "^/software/") {
# Software pages are purged explicitly, so cache them for 48h
set beresp.ttl = 48h;
}
if (req.url ~ "^/search/forum_search_results" ) {
# We don't want forum search results to be cached for longer than 5 minutes
set beresp.ttl = 300s;
}
if(req.url == "/robots.txt") {
# Robots.txt is updated rarely and should be cached for 4 days
# Purge manually as required
set beresp.ttl = 96h;
}
if(beresp.status == 404) {
# Cache 404 responses for 15 seconds
set beresp.http.Cache-Control = "max-age=15";
set beresp.ttl = 15s;
set beresp.grace = 15s;
}
}
In our case we don't send Cache-Control headers at all from the web backend servers.

Why is Apache + Rails spitting out two status headers for code 500?

I have a rails app that is working fine except for one thing.
When I request something that doesn't exist (e.g. /not_a_controller_or_file.txt) and Rails throws a "No route matches..." exception, the response is this (blank line intentional):
HTTP/1.1 200 OK
Date: Thu, 02 Oct 2008 10:28:02 GMT
Content-Type: text/html
Content-Length: 122
Vary: Accept-Encoding
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive

Status: 500 Internal Server Error
Content-Type: text/html

<html><body><h1>500 Internal Server Error</h1></body></html>
I have the ExceptionLogger plugin in /vendor, though that doesn't seem to be the problem. I haven't added any error handling beyond the custom 500.html in public (though the response doesn't contain that HTML), and I have no idea where this bit of HTML is coming from.
So something, somewhere, is adding that HTTP/1.1 200 status line too early, or the Status: 500 header too late. I suspect it's Apache, because I get the appropriate HTTP/1.1 500 status line (at the top) when I use WEBrick.
My production stack is as follows:
Apache 2
Mongrel (5 instances)
RubyOnRails 2.1.1 (happens in both 1.2 and 2.1.1)
I forgot to mention: the error is caused by a "No route matches..." exception.
This is a fairly old thread, but for what it's worth I found a great resource that includes a detailed description of the problem and the solution. Apparently this bug affects Rails < 2.3 when used with Mongrel.
The article that helped me understand the problem & write my own patch.
An official Rails bug ticket that includes a patch for Rails 2.2.2.
This html file is coming from Rails. It is encountering some sort of error (probably an exception of some kind, or some other unrecoverable error).
If the extra blank line between the Status: header and the preceding headers is really there, and is not just a typo, then that goes a long way toward explaining why Apache is reporting a 200 OK message.
The Status header is how Rails, PHP, or whatever tells Apache "there was an error, please return this code instead of 200 OK." The blank line means something extra is going on: Ruby is outputting a blank line before the error output for whatever reason, perhaps previous output from your script. The long and short of it, though, is that the extra blank line makes Apache think "oh, blank line, no extra headers, this is all content now", which would be consistent with the Content-Length header you provided.
My guess for why there's a blank line would be previous script output, perhaps a stray trailing newline at the end of a rendered page. As to why the 500 error is happening, there isn't nearly enough info here to tell you that. Maybe a file I/O error.
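To make that concrete, here's a contrived CGI-style script showing the failure mode (not your actual code, just an illustration):

#!/usr/bin/env ruby
# A stray blank line printed first terminates the header block,
# so Apache never sees "Status: 500" as a header and defaults to 200 OK.
puts ''                                   # accidental output: headers end here
puts 'Status: 500 Internal Server Error'  # now treated as body text
puts 'Content-Type: text/html'
puts ''
puts '<html><body><h1>500 Internal Server Error</h1></body></html>'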
Edit: Given the extra information provided by Dave about the internals, I'd say this is actually an issue with the proxying that goes on behind the scenes... I couldn't tell you exactly what though, beyond what's already been said.
This is coming from Rails itself.
http://github.com/rails/rails/tree/master/actionpack/lib/action_controller/dispatcher.rb#L60
The dispatcher is returning an error page with a status code of 200 (Success).
