URLSession caching not working as documented - what am I doing wrong?

According to the documentation, if you use the default useProtocolCachePolicy, the logic is as follows:
If a cached response does not exist for the request, the URL loading system fetches the data from the originating source.
Otherwise, if the cached response does not indicate that it must be revalidated every time, and if the cached response is not stale (past its expiration date), the URL loading system returns the cached response.
If the cached response is stale or requires revalidation, the URL loading system makes a HEAD request to the originating source to see if the resource has changed. If so, the URL loading system fetches the data from the originating source. Otherwise, it returns the cached response.
However, my experimentation (see below) has shown this to be completely false. Even if the response is cached it is never used, and no HEAD request is ever made.
Scenario
I am making a request to a URL that returns ETag and Last-Modified headers which never change. I have made the request at least once, so the response is already cached (which I can verify by looking at the cache DB for the app on the iOS simulator).
Using useProtocolCachePolicy (the default)
If I have a URLSession with a URLSessionConfiguration with requestCachePolicy set to useProtocolCachePolicy then the response is cached (I can see it in the cache DB), but the cached response is never used. Repeated requests to the same URL always make a new GET request without If-None-Match or If-Modified-Since headers, so the server always returns HTTP 200 with the full response. The cached response is ignored.
Using reloadRevalidatingCacheData on every URLRequest
If I set the cachePolicy on each URLRequest to reloadRevalidatingCacheData then I see caching in action. Each time I make the request, a GET request is made with the If-None-Match and If-Modified-Since headers set to the values of the ETag and Last-Modified headers, respectively, of the cached response. As nothing has changed, the server responds with a 304 Not Modified, and the locally cached response is returned to the caller.
Using reloadRevalidatingCacheData only on the URLSessionConfiguration
If I only set requestCachePolicy = .reloadRevalidatingCacheData on the URLSessionConfiguration (instead of on each URLRequest), then when the app starts only the first request uses cache headers and gets a 304 Not Modified response. Subsequent requests are normal GET requests without any cache headers.
Conclusion
All the other cache policy settings are basically variants of "only use cached data" or "never use the cache" so are not relevant here.
There is no scenario in which URLSession makes a HEAD request as the documentation claims, and no situation in which it just uses cached data without revalidation based on expiration date information in the original response.
The workaround I will use is to set cachePolicy = .reloadRevalidatingCacheData on every URLRequest to get some level of local caching. A 304 Not Modified response only returns headers and no data, so there is a saving of network traffic.
If anyone has any better solutions, or knows how to get URLSession working as documented, then I would love to know.
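The per-request workaround described above could be wrapped in a small helper so that call sites cannot forget it. This is only a sketch: revalidatingRequest(for:) is a hypothetical name, not part of Foundation, and it assumes the server keeps sending ETag or Last-Modified so that a 304 is possible.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Hypothetical helper: builds a request whose own cache policy asks for
// revalidation of any cached copy, instead of relying on the session default.
func revalidatingRequest(for url: URL) -> URLRequest {
    var request = URLRequest(url: url)
    request.cachePolicy = .reloadRevalidatingCacheData
    return request
}
```

The returned request would then be passed to something like URLSession.shared.dataTask(with:) in place of a bare URL.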

The server's response headers should include:
Cache-Control: must-revalidate
With this directive present, URLSession applies .useProtocolCachePolicy as described in the documentation.

Related

URLResponse cached although response's cache-control header is set to no-cache

In my iOS app I want to cache images that are requested from different destinations. For downloading images I use URLSessionDataTasks with the default caching mechanism provided by URLSession.shared, which makes use of the NSURLRequestUseProtocolCachePolicy.
The caching basically works fine. Responses are being cached, and cache headers like ETag and Cache-Control "max-age" are being handled correctly. But if the server responds with the Cache-Control header set to "no-cache", the URLCache of the URLSession still caches the image. I can access the cached response via URLCache.shared.cachedResponse(for: request), and a new data task with the same request will also return the image from the cache (which I validated using Charles proxy: I'm not seeing the request I am expecting).
Why isn't it correctly handling the response's cache header? Do I need to manually check the response's cache headers?

The no-cache directive doesn't mean "don't store it in the cache". Rather it instructs the cache not to serve a cached response without validating with the server first. The [RFC7234][1] specification says the following regarding the no-cache directive.
The "no-cache" response directive indicates that the response MUST NOT
be used to satisfy a subsequent request without successful validation
on the origin server. This allows an origin server to prevent a cache
from using it to satisfy a request without contacting it, even by
caches that have been configured to send stale responses.
If the no-cache response directive specifies one or more field-names,
then a cache MAY use the response to satisfy a subsequent request,
subject to any other restrictions on caching. However, any header
fields in the response that have the field-name(s) listed MUST NOT be
sent in the response to a subsequent request without successful
revalidation with the origin server. This allows an origin server to
prevent the re-use of certain header fields in a response, while still
allowing caching of the rest of the response.
So what will happen is that for "fresh" responses with the no-cache directive, a conditional request will be sent to verify whether the stored response can be used. If the response is still valid, the server will send a 304 - Not Modified response. Upon receiving the 304 response, the cache will serve the stored response with the no-cache directive. If the stored response is no longer valid, the server will send a new response.
[1]: https://www.rfc-editor.org/rfc/rfc7234#section-5.2.2
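To make the distinction between "store" and "serve without validation" concrete, here is a rough sketch of how a cache might classify a response from its Cache-Control value. The names parseCacheControl and action(forCacheControl:) are made up for illustration, the parser splits naively on commas (so a quoted field-name list such as no-cache="a, b" is not handled), and a qualified no-cache is treated the same as an unqualified one. It is not how URLCache is implemented.

```swift
import Foundation

// Naive parser: lowercased directive name -> argument ("" when there is none).
func parseCacheControl(_ headerValue: String) -> [String: String] {
    var directives: [String: String] = [:]
    for part in headerValue.split(separator: ",") {
        let pieces = part.split(separator: "=", maxSplits: 1).map {
            $0.trimmingCharacters(in: .whitespaces)
        }
        guard let name = pieces.first, !name.isEmpty else { continue }
        directives[name.lowercased()] = pieces.count > 1 ? pieces[1] : ""
    }
    return directives
}

enum StoredResponseAction {
    case doNotStore            // no-store: the response must not be kept at all
    case revalidateBeforeUse   // no-cache: may be stored, but must be validated first
    case mayServeWhileFresh    // otherwise: may be served until it goes stale
}

func action(forCacheControl headerValue: String) -> StoredResponseAction {
    let directives = parseCacheControl(headerValue)
    if directives["no-store"] != nil { return .doNotStore }
    if directives["no-cache"] != nil { return .revalidateBeforeUse }
    return .mayServeWhileFresh
}
```

Under this reading, a response with "no-cache" ends up in the cache but triggers a conditional request before each reuse, while only "no-store" keeps it out of the cache entirely.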

Detect cached response in Alamofire

The app I'm currently working on needs to present a notification to the user only when a networking request returns a non-cached response.
The app uses Alamofire for networking requests.
How can I determine if a response from Alamofire came from the cache or not?

The only way is to check the HTTP header fields in the HTTP response. The Cache-Control field tells all caching mechanisms between server and client whether they may cache this object. Its max-age value is measured in seconds: Cache-Control: max-age=3600 means that the content of the response can be up to one hour old.
If you need a non-cached response from the server, you have to specify the Cache-Control field in your request as Cache-Control: no-cache.
It makes no difference whether you are using Alamofire or not.
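As a sketch of the request-side suggestion above, the header can be set on a plain URLRequest, which is also what Alamofire ultimately sends. The function name uncachedRequest(for:) is hypothetical, and setting the cache policy in addition to the header is my own assumption about what is needed to skip the local URLCache as well.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Hypothetical helper: builds a request that asks for a non-cached response.
func uncachedRequest(for url: URL) -> URLRequest {
    var request = URLRequest(url: url)
    // Request directive from the answer: caches along the way should not
    // answer from a stored copy without checking with the origin server.
    request.setValue("no-cache", forHTTPHeaderField: "Cache-Control")
    // Assumption: also bypass the local URLCache on the device.
    request.cachePolicy = .reloadIgnoringLocalCacheData
    return request
}
```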

How to modify a URL request for NSURLCache

I'm using AFNetworking, making API calls that respond with proper cache-control headers.
Everything works fine with requests honoring the cache except for one glitch.
The API I'm using requires a signature to be created with a lifetime of 5 minutes. It's generated from the current time, API key, and API secret. So when I pass this in as a sig parameter, the cache is going to continually miss.
Example:
request1: http://foo.com?p=hello&apikey=12345&sig=ABCDEF
request2: http://foo.com?p=hello&apikey=12345&sig=ZYXWVU
So request 2 is a cache miss.
Question: How could I modify the request in such a way as to strip out the signature parameter for caching only?
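One possible direction, offered only as a sketch, is to derive a separate "cache key" URL with the signature removed and use that for cache lookups while still sending the signed URL over the network. The helper name cacheKeyURL(for:removing:) is invented for this example, and the parameter name "sig" is taken from the URLs in the question.

```swift
import Foundation

// Hypothetical helper: returns the URL without the named query parameter,
// so that two requests differing only in their signature map to the same key.
func cacheKeyURL(for url: URL, removing parameterName: String = "sig") -> URL? {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false) else {
        return nil
    }
    // Keep every query item except the one that changes on each request.
    components.queryItems = components.queryItems?.filter { $0.name != parameterName }
    return components.url
}
```

With such a key, responses would have to be stored and looked up by hand, for example with URLCache's storeCachedResponse(_:for:) and cachedResponse(for:) using a URLRequest built from the key URL. That means taking over some of the bookkeeping the URL loading system normally does, so it is a trade-off rather than a drop-in fix.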

NSURLCache on iOS 4.3.x not checking Last-Modified or Etag headers

If I download a document using NSURLConnection/NSURLCache that gets cached, edit that document on the server (so the Last-Modified and ETag headers change) and then download the document again, the previously cached version is returned. NSURLCache/NSURLConnection makes no attempt to check for a newer resource using If-Modified-Since/If-None-Match headers in the request (which would return a newer version of the resource).
Should NSURLCache, used in conjunction with NSURLConnection, check for an updated resource on the server using the Last-Modified/ETag headers that have previously been cached? I can't seem to find any documentation that says whether this should happen or whether checking for HTTP 304 is up to the developer.

I'll let other people comment on how to use NSURLCache. I found that the most reliable way to prevent caching with NSURLConnection, proxy servers, and misconfigured web servers, was to append an incrementing number to your URL.
So rather than using http://mycompany.com/path, use http://mycompany.com/path?c=1, http://mycompany.com/path?c=2, http://mycompany.com/path?c=3, etc, etc.
It's a hack, but a good one.
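A minimal sketch of that hack, assuming the counter is tracked by the caller. The function name cacheBustedURL(_:counter:) is made up, and the query parameter name "c" follows the example URLs above.

```swift
import Foundation

// Hypothetical helper: appends ?c=<counter> (or &c=<counter>) so that each
// request has a URL no cache has seen before.
func cacheBustedURL(_ url: URL, counter: Int) -> URL? {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false) else {
        return nil
    }
    var items = components.queryItems ?? []
    items.append(URLQueryItem(name: "c", value: String(counter)))
    components.queryItems = items
    return components.url
}
```

Note that this defeats caching entirely, so it only makes sense when a fresh copy is wanted on every request.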

Why/how do browsers know to cache content (html,css,js,etc) when not explicitly instructed to do so

I was looking at Chirpy for CSS/JS minifying, compression, etc.
I noticed it doesn't support caching. It doesn't have any logic for sending expires headers, etags, etc.
The absence of this feature made me question if caching content is not as much of a concern; YSlow! grades this so I'm a little confused. Now I'm researching caching and cannot explain why this css file, SuperFish.css, is being retrieved from cache.
Visit http://www.weirdlover.com (developer of Chirpy)
Look at the initial network trace. Notice there is no expiration header for SuperFish.css.
Revisit the page and inspect the network trace again. Now SuperFish.css is retrieved from cache.
Why is SuperFish.css retrieved from cache upon revisiting the page? This happens even when I close all instances of Chrome and then revisit the page.

This seems to fall within the HTTP specification.
13.4 Response Cacheability
Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh
13.2.2 Heuristic Expiration
Since origin servers do not always provide explicit expiration times, HTTP caches typically assign heuristic expiration times, employing algorithms that use other header values (such as the Last-Modified time) to estimate a plausible expiration time.
It would seem that when the server provides neither a Cache-Control header nor an Expires header, the client is free to use a heuristic to generate an expiry date and then cache the response based upon that.
The presence of an ETag has no effect on this, since the ETag is used to revalidate an expired cache entry, and in this case Chrome considers the cached entry to be fresh (the same applies to Last-Modified), so it hasn't yet expired.
The general principle being if the origin server is concerned with freshness it should explicitly state it.
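As a rough illustration of heuristic expiration, a cache can take a fraction of the time between Last-Modified and the response's Date as the freshness lifetime. The sketch below uses 10%, which is a commonly cited example value and not something Chrome is confirmed to use here. The function names are hypothetical.

```swift
import Foundation

// Heuristic freshness lifetime: a fraction of how long the resource had been
// unchanged when the response was generated. The default fraction is an assumption.
func heuristicFreshnessLifetime(responseDate: Date,
                                lastModified: Date,
                                fraction: Double = 0.1) -> TimeInterval {
    let unchangedFor = responseDate.timeIntervalSince(lastModified)
    return max(0, unchangedFor * fraction)
}

// A stored response counts as fresh while its age is below that lifetime.
func isHeuristicallyFresh(now: Date, responseDate: Date, lastModified: Date) -> Bool {
    let lifetime = heuristicFreshnessLifetime(responseDate: responseDate,
                                              lastModified: lastModified)
    return now.timeIntervalSince(responseDate) < lifetime
}
```

So a stylesheet that was last modified months ago can be treated as fresh for days without any Expires or Cache-Control header, which matches what the question observed.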

In this case (when the server doesn't return an Expires header), the browser should make an HTTP request with an If-Modified-Since header, and if the server returns HTTP 304 Not Modified, the browser gets the data from the cache.
However, from what I see, browsers nowadays don't make any request when the data is in the cache. I think they behave this way for better response time.
