For the following image: https://upload.wikimedia.org/wikipedia/commons/7/79/2010-brown-bear.jpg
There isn't any Cache-Control header. And based on here, even if you don't send anything, it will use its default value, which is private. That being said, doesn't the URLSession need to perform a conditional request to make sure it's still valid?
Is there anything in the headers that allows it to make such a conditional request? I don't see Cache-Control, max-age, or Expires. The only things I see are Last-Modified & Etag, but again, those need to be validated against the server. Or does not specifying anything make it cache indefinitely?! I've already read this answer, but it doesn't discuss this scenario.
Yet it's being cached by the URLSession. (Because if I turn off the internet, it still gets loaded.)
The only other thing I see is "Strict-Transport-Security": max-age=106384710.
Does that affect caching? I've already looked here and don't believe it should. From what I understand, the max-age for the HSTS header is there only to enforce access over HTTPS for a certain period of time. Once the max-age is reached, access through HTTP is also possible.
These are all the headers that I'm getting back:
Date : Wed, 31 Oct 2018 14:15:33 GMT
Content-Length : 215104
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Via : 1.1 varnish (Varnish/5.1), 1.1 varnish (Varnish/5.1)
Age : 18581
Etag : 00e21950bf432476c91b811bb685b6af
Strict-Transport-Security : max-age=106384710; includeSubDomains; preload
Accept-Ranges : bytes
Content-Type : image/jpeg
Last-Modified : Fri, 04 Oct 2013 23:30:08 GMT
Access-Control-Allow-Origin : *
Timing-Allow-Origin : *
x-analytics : https=1;nocookies=1
x-object-meta-sha1base36 : 42tq5grg9rq1ydmqd4z5hmmqj6h2309
x-varnish : 60926196 48388489, 342256851 317476424
x-cache-status : hit-front
x-trans-id : tx08ed43bbcc1946269a9a3-005bd97070
x-timestamp : 1380929407.39127
x-cache : cp1076 hit/7, cp1090 hit/7
x-client-ip : 2001:558:1400:4e:171:2a98:fad6:2579
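One way to confirm that the image really is coming from the cache is to ask URLCache directly. A minimal sketch, assuming the request was made with the default shared session and cache:

import Foundation

// Check whether the default URLCache already holds a response for the image URL.
let imageURL = URL(string: "https://upload.wikimedia.org/wikipedia/commons/7/79/2010-brown-bear.jpg")!
let request = URLRequest(url: imageURL)

if let cached = URLCache.shared.cachedResponse(for: request) {
    print("Cached: \(cached.data.count) bytes")   // prints even with the network turned off
} else {
    print("Not in the cache")
}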
This question was asked because of this comment
doesn't the URLSession need to perform a conditional request to make sure its still valid?
The user-agent should be performing a conditional request, because of the
Etag: 00e21950bf432476c91b811bb685b6af
present. My desktop Chrome certainly does perform the conditional request (and gets back 304 Not Modified).
But it's free not to
But a user-agent is perfectly free to decide on its own. It's perfectly free to look at:
Last-Modified: Fri, 04 Oct 2013 23:30:08 GMT
and decide that the resource is probably good for the next five minutes1. And if the network connection is down, it's perfectly reasonable and correct to display the cached version instead. In fact, your browser would show you web sites even while your 33.6 kbps dial-up modem was disconnected.
You wouldn't want your browser to show you nothing when it knows full well it can show you something. Caching becomes even more useful when poor connectivity comes not from slow dial-up and servers that go down, but from mobile computing and metered data plans.
1 I say 5 minutes because in the early web, servers did not give cache hints, so browsers cached things without even being asked, and 5 minutes was a good number. You used Ctrl+F5 (or was it Shift+F5, or Shift+Click, or Alt+Click) to force the browser to bypass the cache.
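If you want URLSession to behave like desktop Chrome here and revalidate against the Etag on every load, you can ask for that explicitly. A sketch using Foundation's built-in cache policy (the URL is just the image from the question):

import Foundation

// .reloadRevalidatingCacheData forces a conditional request (If-None-Match / If-Modified-Since).
// If the origin answers 304 Not Modified, URLSession transparently hands back the cached body.
let url = URL(string: "https://upload.wikimedia.org/wikipedia/commons/7/79/2010-brown-bear.jpg")!
let request = URLRequest(url: url, cachePolicy: .reloadRevalidatingCacheData, timeoutInterval: 30)

URLSession.shared.dataTask(with: request) { data, response, error in
    if let http = response as? HTTPURLResponse {
        // You still see 200 here even when the wire response was 304,
        // because URLSession substitutes the cached body for you.
        print("Status: \(http.statusCode), bytes: \(data?.count ?? 0)")
    }
}.resume()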
Related
I'm using Alamofire 5 and have the requirement that some GET requests should be cached. If the data is older than 20 minutes, the real API should be hit.
What I found is the ResponseCacher, but I do not see a way to configure it per individual request, and I need some advice.
import Alamofire

// All of this runs inside my session singleton's private init().

// Stamp every cached response with the date it was stored.
let responseCacher = ResponseCacher(behavior: .modify { _, response in
    let userInfo = ["date": Date()]
    return CachedURLResponse(
        response: response.response,
        data: response.data,
        userInfo: userInfo,
        storagePolicy: .allowed)
})

let configuration = URLSessionConfiguration.af.default
configuration.timeoutIntervalForRequest = 10
configuration.requestCachePolicy = .reloadRevalidatingCacheData

// `evaluators` is defined elsewhere in my code.
let session = Session(
    configuration: configuration,
    serverTrustManager: ServerTrustManager(evaluators: evaluators),
    cachedResponseHandler: responseCacher
)
If the backend is returning proper caching headers but you want to limit caching to a certain amount of time, adding a Cache-Control: max-age= header on the request may work.
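For example, a sketch assuming Alamofire 5 (the URL and the 20-minute value are placeholders):

import Alamofire

// Send a Cache-Control request header so caches treat anything older than
// 20 minutes as stale for this particular request.
let headers: HTTPHeaders = ["Cache-Control": "max-age=1200"]

AF.request("https://api.example.com/items", headers: headers)
    .responseData { response in
        debugPrint(response)
    }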
If the backend isn't returning proper caching headers, using ResponseCacher is the way to go. You would modify the CachedURLResponse's response to include the proper Cache-Control header.
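A sketch of that second approach (untested; it rewrites the stored response so it carries an explicit 20-minute max-age before URLCache saves it):

import Alamofire
import Foundation

let twentyMinuteCacher = ResponseCacher(behavior: .modify { _, cachedResponse in
    guard
        let httpResponse = cachedResponse.response as? HTTPURLResponse,
        let url = httpResponse.url,
        var headers = httpResponse.allHeaderFields as? [String: String]
    else { return cachedResponse }

    // Inject the Cache-Control header the backend didn't send.
    headers["Cache-Control"] = "max-age=1200"   // 20 minutes

    guard let rewritten = HTTPURLResponse(
        url: url,
        statusCode: httpResponse.statusCode,
        httpVersion: "HTTP/1.1",
        headerFields: headers
    ) else { return cachedResponse }

    return CachedURLResponse(
        response: rewritten,
        data: cachedResponse.data,
        userInfo: cachedResponse.userInfo,
        storagePolicy: .allowed)
})

// Pass it to the Session exactly like the responseCacher in the question:
// Session(configuration: configuration, cachedResponseHandler: twentyMinuteCacher)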
To elaborate on Jon's answer, the easiest way to achieve what you want is to let the backend declare the cache semantics of this endpoint, then ensure that on the client side URLSession uses a URLCache (which is probably the default anyway), and let URLSession and the backend do the rest. This requires that you have control over the backend, though!
The more elaborate answer:
Here is an example of how a server may return a response with declared cache semantics:
URL: https://www.example.com/ Status Code: 200
Age: 238645
Cache-Control: max-age=604800
Date: Tue, 12 Jan 2021 18:43:58 GMT
Etag: "3147526947"
Expires: Tue, 19 Jan 2021 18:43:58 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Vary: Accept-Encoding
x-cache: HIT
Accept-Ranges: bytes
Content-Encoding: gzip
Content-Length: 648
Content-Type: text/html; charset=UTF-8
Server: ECS (dcb/7EC7)
This server literally outputs the full range of what a server can declare regarding caching. The first eight headers (from Age to x-cache) declare the caching.
Cache-Control: max-age=604800, for example, declares that the data stays fresh for 604800 seconds. Given the Date header, which says when the server created the data, the client can now check whether the data is still "fresh".
Expires: Tue, 19 Jan 2021 18:43:58 GMT means much the same thing: it declares the wall-clock time at which the data becomes stale. This is redundant with the above declaration, but HTTP defines very clearly how clients should treat it.
An Age header is a hint that the response was actually delivered from a cache sitting between the client and the origin server. The age is an estimate of the data's age: the time between when it was created on the origin and when it was delivered.
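In other words, a small sketch of the freshness check, simplified from the HTTP caching rules and ignoring clock skew:

import Foundation

// A response is fresh while its estimated current age stays below max-age.
func isFresh(date: Date, age: TimeInterval, maxAge: TimeInterval, now: Date = Date()) -> Bool {
    let currentAge = age + now.timeIntervalSince(date)   // Age header + time since the Date header
    return currentAge < maxAge
}

// With the example headers (Age: 238645, Cache-Control: max-age=604800), the response
// stays fresh for roughly 604800 - 238645 = 366155 more seconds after the Date header.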
I don't want to go into detail about what every header means exactly and how a client and server should behave according to HTTP, since this is an extensive topic. Basically, when you define an endpoint, all you have to do is decide how long the returned data stays "fresh".
The full details: Hypertext Transfer Protocol (HTTP/1.1): Caching
Once you have come up with a good duration, web-application frameworks (like Rails, Spring Boot, etc.) give great help with declaring cache semantics out of the box. The framework will then output the corresponding headers in the response, more or less "automagically".
URLSession will automatically do the right thing according to the HTTP protocol (well, almost). That is, it will store the response in the cache, and when you perform a subsequent request it first looks for a suitable response in the cache and returns that response if the data is still "fresh".
If that cached data is too old (according to the given response headers and the current date and time), it will try to get fresh data by sending the request to the origin server. Any upstream cache, or eventually the origin server, may then return fresh data. Your URLSession data task does all this transparently, without giving you a clue whether the data came from the cache or from the origin server. And honestly, in most cases you don't need to know.
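(If you ever do need to know, the task metrics will tell you. A sketch, with a delegate class name of my own choosing:)

import Foundation

// URLSessionTaskMetrics reveal whether a transaction was served from the local cache.
final class CacheAwareDelegate: NSObject, URLSessionTaskDelegate {
    func urlSession(_ session: URLSession,
                    task: URLSessionTask,
                    didFinishCollecting metrics: URLSessionTaskMetrics) {
        for transaction in metrics.transactionMetrics {
            switch transaction.resourceFetchType {
            case .localCache:  print("Served from URLCache")
            case .networkLoad: print("Loaded from the network")
            default:           print("Other fetch type")
            }
        }
    }
}

let session = URLSession(configuration: .default, delegate: CacheAwareDelegate(), delegateQueue: nil)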
Declaring the cache semantics according to HTTP is very powerful and should usually suit your needs. The client can also tailor its behaviour by specifying certain request headers, for example allowing even outdated data to be returned, or ignoring any cached values, and much more.
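For example, a request can opt out of the cache entirely, or accept only cached data (a sketch using Foundation's built-in cache policies):

import Foundation

let url = URL(string: "https://www.example.com/")!

// Ignore any cached value and always go to the origin:
let alwaysFresh = URLRequest(url: url, cachePolicy: .reloadIgnoringLocalCacheData)

// Use only the cache and fail instead of touching the network (useful for an offline mode):
let offlineOnly = URLRequest(url: url, cachePolicy: .returnCacheDataDontLoad)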
Every detail may deserve a dedicated question and answer on SO.
I just found a partial response being cached as complete on one of our customers' machines, which rendered the whole website unusable. And I have absolutely no idea what could possibly have gone wrong there.
So what could have possibly gone wrong in the following setup?
On the server side, we have an ASP.NET application running. One IHttpHandler handles requests for JavaScript files. It basically minifies the files as they are requested and writes the result to the response stream. It also logs the length of the string being written to the response stream:
String javascript = /* Javascript is retrieved here */;
HttpResponse response = context.Response;
response.ContentEncoding = Encoding.UTF8;
response.ContentType = "application/javascript";
HttpCachePolicy cache = response.Cache;
cache.SetCacheability(HttpCacheability.Public);
cache.SetMaxAge(TimeSpan.FromDays(300));
cache.SetETag(ETag);
cache.SetExpires(DateTime.Now.AddDays(300));
cache.SetLastModified(LastModified);
cache.SetRevalidation(HttpCacheRevalidation.None);
response.Headers.Add("Vary", "Accept-Encoding");
Log.Info("{0} characters sent", javascript.Length);
response.Write(javascript);
response.Flush();
response.End();
The content is then normally sent using gzip-encoding with chunked transfer-encoding. Seems simple enough to me.
Unfortunately, I just had a remote session with a user where only about 1/3 of the file was in the cache, which of course broke the file (15k instead of 44k). In the cache, the content-encoding was also set to gzip, and all communication took place via HTTPS.
After having opened the source-file on the user's machine, I just hit Ctrl-F5 and the full content was displayed immediately.
What could have possibly gone wrong?
In case it matters, please find the cache-entry from Firefox below:
Cache entry information
key: <resource-url>
fetch count: 49
last fetched: 2015-04-28 15:31:35
last modified: 2015-04-27 15:29:13
expires: 2016-02-09 14:27:05
Data size: 15998 B
Security: This is a secure document.
security-info: (...)
request-method: GET
request-Accept-Encoding: gzip, deflate
response-head: HTTP/1.1 200 OK
Cache-Control: public, max-age=25920000
Content-Type: application/javascript; charset=utf-8
Content-Encoding: gzip
Expires: Tue, 09 Feb 2016 14:27:12 GMT
Last-Modified: Tue, 02 Jan 2001 11:00:00 GMT
Etag: W/"0"
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
X-AspNet-Version: 4.0.30319
Date: Wed, 15 Apr 2015 13:27:12 GMT
necko:classified: 1
Your client's browser is most likely caching the JavaScript files, which would mean the src of your scripts isn't changing.
For instance, if you were to request myScripts:
<script src="/myScripts.js">
Then the first time, the client would request that file, and on any further requests the browser would read it from its cache.
You need to append some sort of unique value such as a timestamp to the end of your scripts so even if the browser caches the file, the new timestamp will act like a new file name.
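If you ever need the same trick from a native client, here is a sketch in Swift (the base URL is hypothetical):

import Foundation

// Append a timestamp query item so every deploy (or every request, if you regenerate it)
// looks like a brand-new resource to the cache.
var components = URLComponents(string: "https://www.example.com/myScripts.js")!
components.queryItems = [URLQueryItem(name: "v", value: String(Int(Date().timeIntervalSince1970)))]
let cacheBustedURL = components.url!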
The client receives the new scripts after pressing Ctrl+F5 because this is a shortcut that forces the browser to bypass its cache.
MVC has a really nice way of doing this: it appends a unique code that changes every time the application or its app pool is restarted. Check out MVC Bundling and Minification.
Hope this helps!
I'm working on a small single-page application using HTML5. One feature is to show PDF documents embedded in the page, which can be selected from a list.
Now I'm trying to make Chrome (at first, and then all the other modern browsers) use the local client cache to fulfill simple GET requests for PDF documents without going through the server (other than the first time, of course). I cause the PDF file to be requested by setting the "data" property on an <object> element in HTML.
I have found a working example for XMLHttpRequest (not <object>). If you use Chrome's developer tools (Network tab) you can see that the first request goes to the server, and results in a response with these headers:
Cache-Control:public,Public
Content-Encoding:gzip
Content-Length:130
Content-Type:text/plain; charset=utf-8
Date:Tue, 03 Jul 2012 20:34:15 GMT
Expires:Tue, 03 Jul 2012 20:35:15 GMT
Last-Modified:Tue, 03 Jul 2012 20:34:15 GMT
Server:Microsoft-IIS/7.5
Vary:Accept-Encoding
The second request is served from the local cache without any server roundtrip, which is what I want.
Back in my own application, I then used ASP.NET MVC 4 and set
[OutputCache(Duration=60)]
on my controller. The first request to this controller - with URL http://localhost:63035/?doi=10.1155/2007/98732 results in the following headers:
Cache-Control:public, max-age=60, s-maxage=0
Content-Length:238727
Content-Type:application/pdf
Date:Tue, 03 Jul 2012 20:45:08 GMT
Expires:Tue, 03 Jul 2012 20:46:06 GMT
Last-Modified:Tue, 03 Jul 2012 20:45:06 GMT
Server:Microsoft-IIS/8.0
Vary:*
The second request results in another roundtrip to the server, with a much quicker response (suggesting server-side caching?) but returns 200 OK and these headers:
Cache-Control:public, max-age=53, s-maxage=0
Content-Length:238727
Content-Type:application/pdf
Date:Tue, 03 Jul 2012 20:45:13 GMT
Expires:Tue, 03 Jul 2012 20:46:06 GMT
Last-Modified:Tue, 03 Jul 2012 20:45:06 GMT
Server:Microsoft-IIS/8.0
Vary:*
The third request for the same URL results in yet another roundtrip and a 304 response with these headers:
Cache-Control:public, max-age=33, s-maxage=0
Date:Tue, 03 Jul 2012 20:45:33 GMT
Expires:Tue, 03 Jul 2012 20:46:06 GMT
Last-Modified:Tue, 03 Jul 2012 20:45:06 GMT
Server:Microsoft-IIS/8.0
Vary:*
My question is, how should I set the OutputCache attribute in order to get the desired behaviour (i.e. PDF requests fulfilled from the client cache, within X seconds of the initial request)?
Or, am I not doing things right when I cause the PDF to display by setting the "data" property on an <object> element?
Clients are never obligated to cache. Each browser is free to use its own heuristic to decide whether it is worth caching an object. After all, any use of the cache "competes" with other uses of the cache.
Caching is not designed to guarantee a quick response; it is designed to, on average, increase the likelihood that frequently used resources that are not changing will already be there. What you are trying to do is not what caches are designed to help with.
Based on the results you report, the version of Chrome you were using in 2012 decided it was pointless to cache an object that would expire in 60 seconds and had only been asked for once, so it threw away the first copy after using it. Then you asked a second time, and it started to give this URL a bit more priority: it must have remembered recent URLs and observed that this was a second request. It kept the copy in cache, but when the third request came, it asked the server to verify that the copy was still valid (presumably because the expiration time was only a few seconds away). The server said "304 -- not modified -- use the copy you cached". It did NOT send the PDF again.
IMHO, that was reasonable cache behavior, for an object that will expire soon.
If you want to increase the chance that the PDF will stick around longer, then give a later expiration time, but say that it must check with the server to see if it is still valid.
If using the HTTP Cache-Control header, this might be: private, max-age=3600, must-revalidate. With this, you should see a round-trip to the server, which will give a 304 response as long as the page is valid. This should be a quick response, since no data is sent back -- the browser's cached version is used.
private is optional -- not related to this caching behavior -- I'm assuming whatever this volatile PDF is, it only makes sense for the given user and/or shouldn't be hanging around for a long time, in some shared location.
If you really need the performance of not talking to the server at all, then consider writing javascript to hide/show the DOM element holding that PDF, rather than dropping it, and needing to ask for it again.
Your javascript code for the page is the only place that "understands" that you really want that PDF to stick around, even if you aren't currently showing it to the user.
Have you tried setting the Location property of the OutputCache attribute to "Client"?
[OutputCache(Duration=60, Location = OutputCacheLocation.Client)]
By default the location property is set to "Any" which could mean that the response is cached on the client, on a proxy, or at the server.
more at MSDN OutputCacheLocation
I have an issue similar to Heroku & Rails - Varnish HTTP Cache Not Working, but the solution (wait for a while, then everything works) doesn't seem to apply - I've had the setup below for several days.
This thread on the Heroku Google group has some users with the same problem. They mention that it takes a while for everything to be cached, but my understanding is that after a while, everything should get cached, no? Or does that only apply if there is a lot of traffic?
I need some advice on where I should be looking/what I can try changing in order to get caching working properly.
My setup:
I have http://www.swingoutlondon.co.uk running on Heroku (Rails 3.0.3, Ruby 1.9.2, bamboo-mri-1.9.2) and the main index page performs a lot of database queries to return what is essentially a static page - usually taking about 2-3 seconds (yes, that's something I really do need to address, but I figure varnish caching is a quick win).
I've set the Cache-Control response header as described here, and indeed that does seem to have been set on the page:
>> curl -I http://swingoutlondon.co.uk
HTTP/1.1 200 OK
Server: nginx
Date: Sun, 13 May 2012 00:01:05 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Cache-Control: public, max-age=300
Etag: "2565201f3ae39c6a9a1f6b1fb8bbbe0a"
X-Ua-Compatible: IE=Edge,chrome=1
X-Runtime: 1.699667
Content-Length: 44224
Accept-Ranges: bytes
X-Varnish: 681634826
Age: 0
Via: 1.1 varnish
Note: Cache-Control: public, max-age=300
I assume that Age: 0 indicates that it hasn't retrieved a cached copy, and indeed the command returns in the normal slow 2-3 seconds.
If I keep repeatedly trying that curl, I occasionally get a cached copy (the page loads in under half a second and Age is greater than 0).
I must confess to not fully understanding HTTP headers, but one clue might be: when Age is greater than 0, I get two lots of digits in X-Varnish (in all other cases I only get one set):
X-Varnish: 848670407 848650521
Here's what I've checked:
the source of the page is identical each time.
I have one before_filter on that page, which sets the time the page was last updated as an instance variable.
there are a number of cookies - as far as I can see they are all set by either Google Analytics or the Twitter or Facebook buttons.
For good measure, here are my Request headers:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Cookie:__utma=264326157.189257391.1336869624.1336869624.1336869624.1; __utmb=264326157.2.10.1336869624; __utmc=264326157; __utmz=264326157.1336869624.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Host:www.swingoutlondon.co.uk
If-None-Match:"2565201f3ae39c6a9a1f6b1fb8bbbe0a"
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19
Ah well - turns out that because Heroku uses multiple independent Varnish servers, and because traffic to Swing Out London is relatively low, I shouldn't expect to have many pages served by the caches if my max-age is only 5 minutes. Setting it to 20 or 30 minutes results in much more caching.
I've written a detailed blog post collecting my learnings. Thanks to Garry Shulter for helping me out with this one.
I am currently developing an API where size matters: I want the response to contain as few bytes as possible. I optimized my JSON response, but Rails still responds with many strange headers:
HTTP/1.1 200 OK
Server: nginx/0.7.67 # Not from Rails, so ok.
Date: Wed, 25 Apr 2012 20:17:21 GMT # Date does not matter. We use ETag. Can I remove this?
ETag: "678ff0c6074b9456832a710a3cab8e22" # Needed.
Content-Type: application/json; charset=utf-8 # Also needed.
Transfer-Encoding: chunked # The alternative would be Content-Length, so ok.
Connection: keep-alive # Good, less TCP overhead.
Status: 200 OK # Redundant! How can I remove this?
X-UA-Compatible: IE=Edge,chrome=1 # Completely unneeded.
Cache-Control: no-cache # Not needed.
X-Request-Id: c468ce87bb6969541c74f6ea761bce27 # Not a real header at all.
X-Runtime: 0.001376 # Same goes for this
X-Rack-Cache: invalidate, pass # And this.
So there are lots of unnecessary HTTP headers. I could filter them in my server (nginx), but is there a way to stop this directly in Rails?
You can do this with a piece of Rack middleware. See https://gist.github.com/02c1cc8ce504033d61bf for an example of how to do it in one.
When adding it to your app config, use something like config.middleware.insert_before(ActionDispatch::Static, ::HeaderDelete)
You want to insert it before whatever the first item in the list that displays when you run rake middleware, which in my case is ActionDispatch::Static.
http://guides.rubyonrails.org/rails_on_rack.html may be somewhat helpful if you haven't been exposed to Rack in the Rails context before.
Another option, since you're using Nginx, is the HttpHeadersMoreModule. This will allow you to have fine-grained control over exactly which headers are sent down the wire.
In your case, you'd specifically want to use the more_clear_headers directive, as such:
more_clear_headers Server Date Status X-UA-Compatible Cache-Control X-Request-Id X-Runtime X-Rack-Cache;
This also clears the Server header, since it's not really necessary, and if you're trying to save bytes, every little bit helps.
This module does require you to compile Nginx on your own, but that really shouldn't scare you. Nginx is very easy to compile, just follow the installation instructions.
I agree that both solutions presented by x1a4 and Stephen McCarth are good.
Ideally you should use the HttpHeadersMoreModule. However, if you are a fan of the native Ubuntu Nginx package with security updates, like I am (or you don't have time for that, or are just lazy), you don't necessarily need to do that.
Another way is to use proxy_hide_header
server {
    location @unicorn {
        # ...
        proxy_hide_header X-Powered-By;
        proxy_hide_header X-Runtime;
        # ...
    }
}
Note: @unicorn is just a named location that proxies to the upstream server; the location can be whatever you use (/, /assets, ...).
Now, one argument against this solution is that if you use several server blocks in your configuration, you need to specify proxy_hide_header in each one of them. Well, yes, but you can just create a file and include it:
# /etc/nginx/sites-enabled/my_app
server {
    location @unicorn {
        # ...
        include /etc/nginx/shared/stealth_headers;
        # ...
    }
}

# /etc/nginx/shared/stealth_headers
proxy_hide_header X-Powered-By;
proxy_hide_header X-Runtime;
So why do I think this solution is better than the middleware solution presented by x1a4?
I had a similar middleware solution before, and it was working fine for a couple of months. Then one day we stopped receiving exception reports from our exception-monitoring tool (the party_foul gem). Long story short: middlewares are tricky. We made some code changes and this middleware started throwing an exception, but it was an exception that was not caught by the middleware that was supposed to monitor exceptions. So yes, the whole thing is my bad, and I should keep a better eye on my code, but I had an unpleasant experience that is hard to erase, so I'm just recommending that, if you can, you handle this at the Nginx level rather than the middleware level.
Plus, it makes more sense if your Nginx is handling configuration for several applications (you don't have to update several applications if something changes).