Heroku & Rails - Varnish is only caching very occasionally

I have an issue similar to Heroku & Rails - Varnish HTTP Cache Not Working, but the solution (wait for a while, then everything works) doesn't seem to apply - I've had the setup below for several days.
This thread on the Heroku Google group has some users with the same problem. They mention that it takes a while for everything to be cached, but my understanding is that after a while, everything should get cached, no? Or does that only apply if there is a lot of traffic?
I need some advice on where I should be looking/what I can try changing in order to get caching working properly.
My setup:
I have http://www.swingoutlondon.co.uk running on Heroku (Rails 3.0.3, Ruby 1.9.2, bamboo-mri-1.9.2), and the main index page performs a lot of database queries to return what is essentially a static page - usually taking about 2-3 seconds (yes, that's something I really do need to address, but I figure Varnish caching is a quick win).
I've set the Cache-Control response header as described here, and indeed that does seem to have been set on the page:
>> curl -I http://swingoutlondon.co.uk
HTTP/1.1 200 OK
Server: nginx
Date: Sun, 13 May 2012 00:01:05 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Cache-Control: public, max-age=300
Etag: "2565201f3ae39c6a9a1f6b1fb8bbbe0a"
X-Ua-Compatible: IE=Edge,chrome=1
X-Runtime: 1.699667
Content-Length: 44224
Accept-Ranges: bytes
X-Varnish: 681634826
Age: 0
Via: 1.1 varnish
Note: Cache-Control: public, max-age=300
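For reference, a header like that is typically produced in a Rails 3 controller with expires_in - a minimal sketch (the controller and action names here are illustrative, not my actual code):
class EventsController < ApplicationController
  def index
    # Emits "Cache-Control: public, max-age=300", which tells
    # Heroku's Varnish layer it may cache this response for 5 minutes.
    expires_in 5.minutes, :public => true
    # ... the expensive database queries happen here ...
  end
end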
I assume that Age: 0 indicates that it hasn't retrieved a cached copy, and indeed the command returns in the normal slow 2-3 seconds.
If I keep repeatedly trying that curl, I occasionally get a cached copy (the page loads in under half a second and Age is greater than 0).
I must confess to not fully understanding HTTP headers, but one clue might be: when Age is greater than 0, I get two lots of digits in X-Varnish (in all other cases I only get one set):
X-Varnish: 848670407 848650521
Here's what I've checked:
the source of the page is identical each time.
I have one before_filter on that page, which sets the time the page was last updated as an instance variable.
there are a number of cookies - as far as I can see they are all set by either Google Analytics or the Twitter or Facebook buttons.
For good measure, here are my Request headers:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Cookie:__utma=264326157.189257391.1336869624.1336869624.1336869624.1; __utmb=264326157.2.10.1336869624; __utmc=264326157; __utmz=264326157.1336869624.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Host:www.swingoutlondon.co.uk
If-None-Match:"2565201f3ae39c6a9a1f6b1fb8bbbe0a"
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19

Ah well - it turns out that because Heroku uses multiple independent Varnish servers, and because traffic to Swing Out London is relatively low, I shouldn't expect many pages to be served from the caches if my max-age is only 5 minutes: each Varnish instance populates its cache independently, and each copy expires again after 5 minutes. Setting it to 20 or 30 minutes results in much more caching. (The two sets of digits in X-Varnish confirm this, incidentally: a second XID appears only on a cache hit, and it identifies the request that originally stored the object.)
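Concretely, the fix amounts to raising that value in the controller call, along the lines of:
expires_in 30.minutes, :public => true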
I've written a detailed blog post collecting my learnings. Thanks to Garry Shulter for helping me out with this one.

Related

Does not setting Cache-Control automatically enable caching, even without a conditional request?

For the following image: https://upload.wikimedia.org/wikipedia/commons/7/79/2010-brown-bear.jpg
There isn't any Cache-Control header. And based on here, even if you don't send anything, it will use its default value, which is private. That being the case, doesn't the URLSession need to perform a conditional request to make sure it's still valid?
Is there anything in the headers that allows it to make such a conditional request? I don't see Cache-Control, max-age, or Expires. The only things I see are Last-Modified & Etag, but again, those need to be validated against the server - or does not specifying anything make it cache indefinitely?! I've already read this answer, but it doesn't discuss this scenario.
Yet it's being cached by the URLSession (because if I turn off the internet, it still gets loaded).
The only other thing I see is "Strict-Transport-Security": max-age=106384710.
Does that affect caching? I've already looked here and don't believe it should. From what I understand, the max-age for the HSTS header is there only to enforce access over HTTPS for a certain period of time; once the max-age is reached, access over plain HTTP is also possible.
These are all the headers that I'm getting back:
Date: Wed, 31 Oct 2018 14:15:33 GMT
Content-Length: 215104
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Via: 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Age: 18581
Etag: 00e21950bf432476c91b811bb685b6af
Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
Accept-Ranges: bytes
Content-Type: image/jpeg
Last-Modified: Fri, 04 Oct 2013 23:30:08 GMT
Access-Control-Allow-Origin: *
Timing-Allow-Origin: *
x-analytics: https=1;nocookies=1
x-object-meta-sha1base36: 42tq5grg9rq1ydmqd4z5hmmqj6h2309
x-varnish: 60926196 48388489, 342256851 317476424
x-cache-status: hit-front
x-trans-id: tx08ed43bbcc1946269a9a3-005bd97070
x-timestamp: 1380929407.39127
x-cache: cp1076 hit/7, cp1090 hit/7
x-client-ip: 2001:558:1400:4e:171:2a98:fad6:2579
This question was asked because of this comment
doesn't the URLSession need to perform a conditional request to make sure its still valid?
The user-agent should be performing a conditional request, because of the
Etag: 00e21950bf432476c91b811bb685b6af
present. My desktop Chrome certainly does performs the conditional request (and gets back 304 Not Modified).
But it's free not to
But a user-agent is perfectly free to decide on its own. For example, it can look at:
Last-Modified: Fri, 04 Oct 2013 23:30:08 GMT
and decide that the resource is probably good for the next five minutes¹. And if the network connection is down, it's perfectly reasonable and correct to display the cached version instead. In fact, your browser would show you web sites even while your 0.00336 Mbps dial-up modem was disconnected.
You wouldn't want your browser to show you nothing when it knows full well it can show you something. This becomes even more useful when poor connectivity comes not from slow dial-up and servers that go down, but from mobile computing and metered data plans.
¹ I say 5 minutes because in the early web, servers did not give cache hints, so browsers cached things without even being asked - and 5 minutes was a good number. And you used Ctrl+F5 (or was it Shift+F5, or Shift+Click, or Alt+Click) to force the browser to bypass the cache.
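(As an aside, the current caching RFC, 7234, still blesses this behaviour: when no explicit lifetime is given, a cache may assign a heuristic one, typically about 10% of the interval since Last-Modified. A quick sketch of that arithmetic for the response above:)
require 'time'

last_modified = Time.parse('Fri, 04 Oct 2013 23:30:08 GMT')
date          = Time.parse('Wed, 31 Oct 2018 14:15:33 GMT')

# RFC 7234 suggests a heuristic freshness lifetime of about 10% of the
# time elapsed since Last-Modified when no max-age/Expires is present.
heuristic_seconds = (date - last_modified) * 0.10
puts (heuristic_seconds / 86_400).round  # => roughly 185 days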

Valence 500 errors when retrieving module content

I'm consistently receiving HTTP 303 ("See Other") redirection to HTTP 500 error responses from our Valence API when trying to retrieve the structure for a handful of modules. (Handful = 20-30 modules out of about 19,000). The ones that fail consistently fail, but can be viewed through the web UI without issue.
Here's a failing request, captured with cURL:
GET /d2l/api/le/1.4/420523/content/modules/2872608/structure/?x_a=...&x_b=...&x_c=...&x_d=...&x_t=... HTTP/1.1
User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Host: d2l.deakin.edu.au
Accept: */*
And the response:
HTTP/1.1 303 See other
Cache-Control: private
Content-Length: 0
Location: /d2l/error/500
Server: Microsoft-IIS/7.5
X-XSS-Protection: 0
X-Powered-By: ASP.NET
Date: Mon, 16 May 2016 05:13:37 GMT
The modules that I've looked up in the web UI don't seem any different from any other module: for example, a module might contain 3 sub-modules, all with topic-only content, and one of them gives a 303 response while the other 2 do not.
There are a couple of org units with 2 or 3 "bad" modules, but mostly they're singular.
I've also tried this with API versions 1.1 through 1.4 (we are currently on v1.4).
I also raised a support call with the vendor for this issue.
It turns out to be a bug in Valence API versions <= 1.4, triggered by a module containing a topic file whose name ends in a dot/period (hence having the form some_file..ext once it's given an extension).
Solution: upgrade to Valence 1.6 where the bug is fixed, or rename affected files.

Firefox cached an incomplete response

I just found a partial response being cached as complete on one of our customers' machines, which rendered the whole website unusable. And I have absolutely no idea what could possibly have gone wrong there.
So what could have possibly gone wrong in the following setup?
On the server side, we have an ASP.NET application running. One IHttpHandler handles requests for JavaScript files: it minifies the files as they are requested and writes the result to the response stream. It also logs the length of the string being written to the response stream:
String javascript = /* Javascript is retrieved here */;
HttpResponse response = context.Response;
response.ContentEncoding = Encoding.UTF8;
response.ContentType = "application/javascript";
HttpCachePolicy cache = response.Cache;
// Cache publicly for ~300 days, carrying ETag/Last-Modified validators,
// and vary the cached entry by Accept-Encoding (gzip vs. identity).
cache.SetCacheability(HttpCacheability.Public);
cache.SetMaxAge(TimeSpan.FromDays(300));
cache.SetETag(ETag);
cache.SetExpires(DateTime.Now.AddDays(300));
cache.SetLastModified(LastModified);
cache.SetRevalidation(HttpCacheRevalidation.None);
response.Headers.Add("Vary", "Accept-Encoding");
Log.Info("{0} characters sent", javascript.Length);
response.Write(javascript);
response.Flush();
response.End();
The content is then normally sent using gzip-encoding with chunked transfer-encoding. Seems simple enough to me.
Unfortunately, I just had a remote session with a user where only about 1/3 of the file was in the cache, which of course broke the file (15k instead of 44k). In the cache, the content-encoding was also set to gzip, and all communication took place via https.
After having opened the source-file on the user's machine, I just hit Ctrl-F5 and the full content was displayed immediately.
What could have possibly gone wrong?
In case it matters, please find the cache-entry from Firefox below:
Cache entry information
key: <resource-url>
fetch count: 49
last fetched: 2015-04-28 15:31:35
last modified: 2015-04-27 15:29:13
expires: 2016-02-09 14:27:05
Data size: 15998 B
Security: This is a secure document.
security-info: (...)
request-method: GET
request-Accept-Encoding: gzip, deflate
response-head: HTTP/1.1 200 OK
Cache-Control: public, max-age=25920000
Content-Type: application/javascript; charset=utf-8
Content-Encoding: gzip
Expires: Tue, 09 Feb 2016 14:27:12 GMT
Last-Modified: Tue, 02 Jan 2001 11:00:00 GMT
Etag: W/"0"
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
Date: Wed, 15 Apr 2015 13:27:12 GMT
necko:classified: 1
Your client's browser is most likely caching the JavaScript files, which would mean the src of your scripts isn't changing.
For instance, if you were to request myScripts:
<script src="/myScripts.js">
then the first time, the client would request that file, and on any further visits the browser would read from its cache.
You need to append some sort of unique value, such as a timestamp, to the end of your script URLs (e.g. /myScripts.js?v=1430234567) so that even if the browser caches the file, the new timestamp acts like a new file name.
The client receives the new scripts after pressing Ctrl+F5 because this shortcut forces a reload that bypasses the browser's cache.
MVC has a really nice way of doing this which involves appending a unique code that changes every time the application or its app pool is restarted. Check out MVC Bundling and Minification.
Hope this helps!

Extract useful data from wireshark/tcpdump

I'm trying to extract data from captured packets (tcpdump/wireshark). If I go to a website, all I can capture are the headers, not the content of the webpage. Example:
Tcpdump:
17:34:51.861910 IP HackMachine-G51J.47928 > 50.6.246.185.http: Flags
[P.], seq 511:1032, ack 181, win 237, options [nop,nop,TS val 9134579
ecr 2921721692], length 521
E..=.8#.#.....V2....8.PiI................. ..a..%.\GET /default.css
HTTP/1.1 Host: www.rationallyparanoid.com User-Agent: Mozilla/5.0
(X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 Accept:
text/css,/;q=0.1 Accept-Language: en-US,en;q=0.5 Accept-Encoding:
gzip, deflate DNT: 1 Referer:
http://www.rationallyparanoid.com/articles/tcpdump.html x-pzi27:
kill+911+warfare x-khy3445: Dear%20NSA%2C%0Afuck%20you%21 Connection:
keep-alive If-Modified-Since: Sat, 20 Apr 2013 23:47:10 GMT
If-None-Match: "3660064-14dd-517328fe" Cache-Control: max-age=0
All I get are headers. Does someone have any idea how to extract the content?
You can't get it from the response to that packet, because the content was never delivered!
HTTP supports an "If-Modified-Since" header; as the RFC says:
The "If-Modified-Since" header field makes a GET or HEAD request
method conditional on the selected representation's modification date
being more recent than the date provided in the field-value.
Transfer of the selected representation's data is avoided if that
data has not changed.
As you can see, the next packet carries the reply "304 Not Modified", meaning that the page in question hasn't changed since the time specified in the If-Modified-Since header, so any copy the machine already fetched at that time is Good Enough.
If you want the content of the page to show up in a network trace, you have to convince your browser to discard any copy it's saved, so that it doesn't send If-Modified-Since. I don't know how that's done with Firefox (I assume, from the headers, that you're using Firefox), but repeatedly trying to fetch the page might be treated as a "discard the cached copy" indication, and there may also be ways in the Firefox UI to discard cached copies of pages.
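(If the goal is just to see the full body in a capture, another option is to fetch the page with a client that sends no validators at all - a minimal Ruby sketch:)
require 'net/http'

# A plain GET carries no If-Modified-Since / If-None-Match headers, so
# the server has to answer 200 with the full body, which then shows up
# in the tcpdump/wireshark capture (the site is plain HTTP, so the
# payload is visible on the wire).
puts Net::HTTP.get(URI('http://www.rationallyparanoid.com/articles/tcpdump.html'))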

Very Strange ASP.NET MVC and IE8 Link/URL Issue

I've got a really strange issue in MVC/IE8 and I was wondering if anyone had seen anything like it. I've got a URL that returns an Excel spreadsheet in the form http://application.mycompany/Controller.aspx/Action/ID1/ID2 (I am using the .aspx in the route due to the version of IIS). This opens fine from a page within the application. I also have e-mails with HTML bodies that contain links, and all the links open fine apart from this one - it gives an error saying "Cannot download ID2 from application.mycompany, IE was unable to open the site". I've checked the HTML in the e-mail and the link location on the page that works, and they are the same. If I type the exact link text (or copy and paste it from the e-mail) into the IE address bar it again fails, but if I then click on the same text in the address bar and hit enter a second time, it loads and spits out the file. It seems to work OK pasting it straight into Firefox, but I can't check clicking on the link and loading Firefox as I'm working remotely and don't have Firefox on my Citrix desktop, nor can I find anyone who has it as their default browser (but as it works first time, I would imagine it would be OK).
Has anyone seen this before or got any ideas what might be causing it, please? This issue occurs on more than one machine, so it isn't an odd add-in/setting as far as I can see.
Thanks
MH
----------------------------Update-----------------------
I've used Fiddler to see what's going on and the response to both requests is identical, apart from the timestamp.
Failed response:-
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/vnd.ms-excel; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
X-AspNetMvc-Version: 1.0
content-disposition: attachment;filename=Filename.xls
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Mon, 20 Dec 2010 10:31:52 GMT
Content-Length: 2354
<style type"text/css">.text { mso-number-format:\#; } .TableHead { background-color: #BDBDBD; } </style>
....confidential file content removed
Successful response:-
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/vnd.ms-excel; charset=utf-8
Expires: -1
Server: Microsoft-IIS/7.0
X-AspNetMvc-Version: 1.0
content-disposition: attachment;filename=Filename.xls
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Mon, 20 Dec 2010 10:32:18 GMT
Content-Length: 2354
<style type"text/css">.text { mso-number-format:\#; } .TableHead { background-color: #BDBDBD; } </style>
and just to reiterate, the only thing I do to generate the second request is to click on the URL text in the address bar that generated the first request and hit enter.
The error I get is "Internet Explorer was not able to open this Internet site. The requested site is either unavailable or cannot be found. Please try again later."
To quote the similar discussion on the Sitepoint forums:
It's the caching: if you select no-cache, IE can't find the file to save it, but if you allow caching it works. Annoying, as the same file can be downloaded at different times and may well be different if things have changed, so caching is a little dangerous (I'll have to suffix the filename with a timestamp to avoid this, I think).
http://support.microsoft.com/kb/316431
so this fixes it:
Response.Cache.SetCacheability(HttpCacheability.Private);
but it still doesn't explain why it sometimes works, esp. directly from the web page link.
"This behaviour is by design" - translation: "we can't get this to work and CBA to get it working" - amazing how Firefox, Opera and Chrome don't have this problem......
This kb article might help shed some light:
http://support.microsoft.com/kb/316431
"Web sites that want to allow this type of operation should remove the no-cache header or headers."
I've seen some weirdness around IE + Outlook + mailto links with large numbers of characters. Above 128 characters, mailto links do not work with IE + Outlook.
I would try to decrease the number of characters in your anchor to see if you can get it to be more reliable.
