Is there a way to query the HTML5 application cache? - ios

Is there a way to query the contents of the HTML5 application cache?
I'm writing an iOS application that uses a lot of cached web content. Before loading a given page when the app is offline, I'd like to check whether the page exists in the cache. If it doesn't, I'll notify the user that they have to be online to see that content; if it does, I'll go ahead and load it.
Now, iOS has its own URL caching system, and I initially just assumed that I could check the contents of the cache this way:
if ([[NSURLCache sharedURLCache] cachedResponseForRequest:myRequest] != nil) {
    // go ahead and load the page
} else {
    // notify the user that the content isn't available
}
Silly me. It seems that iOS's cache and HTML5's cache are unrelated: -cachedResponseForRequest: returns nil for any request, even when I can see that the URL is in the HTML5 application cache (using the Safari web debugger).
So, is there some way that I can query the contents of the HTML5 application cache? It doesn't matter if the answer uses Objective-C code or JavaScript, since I can always just execute the relevant JS from Objective-C.

There are two properties of HTML5 AppCache which mean that in normal operation there shouldn't be a need to do so:
AppCache update operations are atomic: either the entire cache is updated, or none of it is.
Once an AppCache is created, all files that are in the cache are served from the cache.
The end result is that for any given version of the manifest file, any file listed in it that gets loaded into the browser will be consistent with all the other files listed in the manifest. All you should need to do is check that window.applicationCache.status is not UNCACHED.
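A minimal sketch of that status check, assuming a browser that still exposes window.applicationCache. The helper name isServedFromAppCache is hypothetical; it takes the cache object as a parameter so it can be exercised without a browser.

```javascript
// Returns true when the object (expected to look like window.applicationCache)
// reports any state other than UNCACHED. The function name is illustrative only.
function isServedFromAppCache(appCache) {
  if (!appCache) {
    return false; // the API is not available in this environment
  }
  return appCache.status !== appCache.UNCACHED;
}

// In a page, or in JS executed from the native side, one might call:
//   isServedFromAppCache(window.applicationCache)
```

Passing the object in, rather than reading the global inside the function, is only a convenience for testing.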
There is another possibility. If you are 'lazily adding' files to the AppCache as described in Dive Into HTML5, then it could be that you're not sure which files are cached. In this case you could adapt one of the approaches for detecting online state. I'm not going to give you a fully tested solution, but here is the general idea:
Create a web page containing a unique identifier, something that's unlikely to ever appear normally in a page. The identifier can be in hidden content in an otherwise normal page.
Set this page as the generic FALLBACK in your manifest.
Request pages with AJAX.
Scan the response for the unique identifier. If you find it, then you know the requested page is not in the AppCache.
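The steps above could be sketched roughly as follows. This is untested against a real AppCache; the marker string, isFallbackPage and checkPageCached are all hypothetical names, and the marker must match whatever hidden identifier you put in your FALLBACK page.

```javascript
// Hypothetical identifier embedded as hidden content in the FALLBACK page.
const OFFLINE_MARKER = "appcache-fallback-7f3a9c";

// True when the response body is the generic fallback page,
// i.e. the requested page was not found in the AppCache.
function isFallbackPage(body, marker = OFFLINE_MARKER) {
  return body.includes(marker);
}

// Browser-only part: request the page with AJAX and report whether
// real content (not the fallback) came back.
function checkPageCached(url, onResult) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onload = () => onResult(!isFallbackPage(xhr.responseText));
  xhr.onerror = () => onResult(false);
  xhr.send();
}
```

The native side could run checkPageCached while offline and only navigate when the callback receives true.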

Yes, the cache is stored in the Application.db.

Related

How to prevent an app redirect on iOS for an URL with query?

We're troubleshooting a problematic scenario involving Universal Links on iOS.
In a nutshell, we need to make sure the app redirect happens only for the path without a query.
I.e. when the user goes to https://www.example.com/path, we want it to be redirected to an app. However, we have a bunch of complex scripts on the site, triggered by URLs like https://www.example.com/path?someparam=1&anotherparam=23, that should be handled via the browser (no support for those features in the app yet).
We have our Domain Association file uploaded, and it featured the following snippet in the components section:
{"/":"/path"},
This does force iOS to redirect to our app whenever the user taps a no-query link in any application, or scans a QR code.
However, it also causes a redirect for links with a query (https://www.example.com/path?someparam=1&anotherparam=23).
I've read through the https://developer.apple.com/documentation/bundleresources/applinks on this, and have tried a few tricks:
Insert an exclude block prior to the existing one:
{"/":"/path", "?":"?*", "exclude":true},
{"/":"/path"},
Update the existing block to enforce an empty query on it:
{"/":"/path", "?":""}, // supposed to override the implied "?":"*"
Prepend the existing block with a slightly different form of the query pattern (to intercept a specific param in the query):
{"/":"/path", "?":{"someparam":"?*"}, "exclude":true},
{"/":"/path"},
Nothing has helped so far: iOS still redirects the links https://www.example.com/path?someparam=1&anotherparam=23 to the application. I have checked that the file is not cached by our reverse proxy. I have checked that the old file is not being cached by our Cloudflare CDN. I have checked that the old file is not being cached by the Apple delivery service. I got a clean iOS device to ensure I don't have the redirect preference cached locally, and it still redirects.
Is this scenario supported at all? Maybe there is a bug in iOS/Safari implementation?

Storage of user data

When looking at how websites such as Facebook store profile images, the URLs seem to use randomly generated values. For example, the profile picture on Google's Facebook page has the following URL:
https://scontent-lhr3-1.xx.fbcdn.net/hprofile-xft1/v/t1.0-1/p160x160/11990418_442606765926870_215300303224956260_n.png?oh=28cb5dd4717b7174eed44ca5279a2e37&oe=579938A8
However, why not just organise it like so:
https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png
Clearly this would be much easier in terms of storage and simplicity. Am I missing something? Thanks.
Companies like Facebook have fairly intense CDNs. The URLs may look randomly generated, but they aren't: each individual route is deliberate and programmed to be handled in that manner.
They aren't after simplicity of storage like you would be if you were just using FTP to connect to a basic marketing website server. While you may put all your images in an /images folder, Facebook is much too complex for this: dozens of different types of applications access hundreds if not thousands of CDNs and servers worldwide.
If you ever build a web app, such as a Ruby on Rails app, and you work with a service such as AWS (Amazon Web Services), you'll also encounter what seem like nonsensical URLs. But it's all part of the fast delivery network provided within the architecture. Every time you "push" your app up to the server, new URLs are generated automatically for each unique resource: CSS files, JavaScript files, image files, etc. are all dynamically created. You don't have to type in each of these unique URLs individually each time you publish the app; the code simply knows where to look for them as part of the publishing process.
Example: you tell the web app to look for
//= require jquery
and it returns you http://example.com/assets/jquery-eb3e278249152b5b5d5170b73d9dbf52.js?body=1 in your header.
It doesn't matter that the URL is more complex than it seems it should be; the application recognizes it, and that's all that matters.
Simply put, I think it can boil down to two main reasons: Security and Cache:
Security - Adding these long unpredictable hashes prevents others from guessing photo URLs and makes it pretty hard to download photos you aren't supposed to.
Consider what would happen if I could easily guess your profile photo URL and download it, even when you explicitly chose to share it only with friends.
Cache - by adding "random" query params to each photo, you make sure each photo instance gets its own URL. Thus you can store the photo in browser's cache for a long time, knowing that whenever you replace it with a new one, the new photo will have a fresh URL and the browser won't keep showing you the old photo.
If you were to keep the same URL for each user's profile photo (e.g. https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png), and then upload a new photo, one of two things can happen:
If you stored the photo in the browser's cache for a long time, the browser will keep showing you the cached version (as long as the URL is the same and the cache hasn't expired, there's no need to re-download the image).
If, instead, you only keep the image in the cache for a short period of time, you end up hitting your server much more than actually needed, increasing the load and hurting performance.
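To make the caching argument concrete, here is a small sketch of content-fingerprinted URLs. It is not how Facebook builds its URLs; versionedPhotoUrl, fingerprint and the URL layout are assumptions for illustration, and the FNV-1a hash is a non-cryptographic stand-in (an unguessable URL would need a keyed or cryptographic digest instead).

```javascript
// FNV-1a (32-bit) over raw bytes: a tiny, non-cryptographic stand-in
// for whatever digest a real system would use.
function fingerprint(bytes) {
  let hash = 0x811c9dc5;
  for (const byte of bytes) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical URL layout: the fingerprint changes whenever the image bytes change,
// so each uploaded photo gets its own URL and can be cached for a long time.
function versionedPhotoUrl(baseUrl, profileId, imageBytes) {
  return `${baseUrl}/${profileId}/50x50-${fingerprint(imageBytes)}.png`;
}

const oldPhoto = new Uint8Array([1, 2, 3]);
const newPhoto = new Uint8Array([1, 2, 4]);
console.log(versionedPhotoUrl("https://cdn.example.com", "42", oldPhoto));
console.log(versionedPhotoUrl("https://cdn.example.com", "42", newPhoto));
```

Because the two photos differ, the two printed URLs differ, so a browser holding the old image in its cache would still fetch the new one.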
I hope this clarifies it.
With your route scheme, how would you prevent strangers from accessing the pictures of a private account? The hash also prevents bots from downloading all the pictures.
I feel your pain :-) Rather than dwell on how this problem arises, let me describe a solution. Code that deals with hashed or even base64-encoded values usually looks like a mess, but once each value has an identifier that explains it, not much of the mess remains.
I used to work at a company where we collated Facebook posts: we used the Graph API to get each post's Insights object and extracted information from it so that it was easy to pass around within the UI and send back to our Redis cache store. Once we had defined in TaffyDB how the objects would be organised, everything made sense, thanks to its ability to query the useful parts out of a long, junk-looking stream of minified JavaScript.
Refer: http://www.taffydb.com/
The extra values in the URL are useful to:
Track access. This is like when a newspaper appends "&homepage" vs. "&email" to an article URL, so their system knows how a reader found the page.
Avoid abuse and control access. Imagine that a user loaded a small, popular pornographic image into a profile image. They could then hijack the CDN to be a free web host for their porn site. But that code is used internally by the CDN to limit the number of views.

Sensitive Data stored in cache.db-wal file?

I am facing an issue in an iOS application that uses a UIWebView to render HTML5 code that is part of the application bundle.
This HTML5 code makes AJAX requests to our backend which may potentially have sensitive data in them. This is all done over HTTPS and our application never stores the sensitive data. However, when doing security testing for the application, we found that HTTP POST requests were being stored in a local SQLite database (cache.db) as of iOS 5.
It was easy to manage that, by setting the NSURLCache global object to have zero disk storage, and deleting the file when appropriate.
Now, however, it looks like in iOS 6.1 Apple has changed the implementation again, and the data is being stored in cache.db-wal. I have limited knowledge of SQLite, but I think this is a file created when SQLite is initialized with certain options.
Any suggestions as to a fix?
After further research, it seems that the suggestion by Hot Licks above was correct: after adding the "no-cache, no-store" value to the HTTP response, the HTTP request values were not logged in the SQLite database.
For example, in ASP.Net MVC:
public ActionResult PostSensitiveData(string data)
{
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    Response.Cache.SetNoStore();
    return Json(data);
}
The other files created by SQLite (-journal, -wal, -shm) are part of the database itself.
When you delete the cache.db file, also delete any cache.db-* files.
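As a rough illustration of which names that covers, the sketch below picks the database file and its companions out of a directory listing. It is plain JavaScript rather than iOS code, and cacheFilesToDelete is a hypothetical helper; the actual deletion would be done with whatever file API your platform provides.

```javascript
// Given the file names in the cache directory, return the database file
// plus its SQLite companions (cache.db-wal, cache.db-shm, cache.db-journal, ...).
function cacheFilesToDelete(fileNames, dbName = "cache.db") {
  return fileNames.filter(
    (name) => name === dbName || name.startsWith(dbName + "-")
  );
}

console.log(cacheFilesToDelete(["cache.db", "cache.db-wal", "cache.db-shm", "other.db"]));
```

Matching on the "-" suffix rather than a bare prefix avoids catching unrelated files that merely start with the same characters.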
To prevent data from being inserted in the first place, open the database and create a trigger like this on every table:
CREATE TRIGGER MyTable_evil_trigger
BEFORE INSERT ON MyTable
BEGIN
SELECT RAISE(IGNORE);
END;
(And then check whether the UIWebView blows up when the inserted records don't actually show up …)
You can call
[[NSURLCache sharedURLCache] removeAllCachedResponses]
This will clear all the cached url calls from the Cache.db file.
I struggled with the same issue. Since I was using react-native, I felt that the current answers were inconvenient.
So, I came up with two solutions:
Use this package
https://github.com/qq273335649/oa-react-native-clear-cache
Then use the function clearCache. The problem is that it erases the entire cache, which is not necessarily convenient.
After the queries that save the confidential data in the cache, I remove the db-wal file with the package expo-file-system (you need to be using Expo for this one).
I hope it helps.

How would HTML5 offline manifest/functionality work with ASP.NET MVC 4?

Ok, I'm building a PoC for a mobile application that needs to have offline capabilities, and I have several questions about whether I'm designing the application correctly and also what behavior I will get from the cache manifest.
This question is about including URLs of Controller actions in both the CACHE section of the manifest as well as in the NETWORK section.
I believe I've read some conflicting information online about this. On a few sites I read that including the wildcard in the NETWORK section would make the browser try to retrieve everything from the server when it's online, and just use whatever is cached if there is no internet connection.
However, this morning I read the following on Dive into HTML5 : Let's take this offline:
The line marked NETWORK: is the beginning of the “online whitelist” section.
Resources in this section are never cached and are
not available offline. (Attempting to load them while offline will
result in an error.)
So, which information is correct? How would the application behave if I added the URL for a controller action in both the CACHE and the NETWORK sections?
I have a very simple and small PoC working so far, and this is what I've observed regarding this question:
I have a controller action that just generates 4 random numbers and sets them on the ViewBag, and the View will display them on a UL.
I'm not using Output caching at all. The only caching comes from the manifest file.
Before adding the manifest attribute to my Layout.cshtml's html tag, each time I requested the View, I'd get different random numbers every time, and a breakpoint set on the controller action would be hit.
The first time I requested the URL/View after adding the manifest attribute, the breakpoint on the controller is hit 3 times (as opposed to just 1 before). This is already weird and I'll post a separate question about this, I'm just writing it here for reference.
After the manifest and the resources are cached (verified by looking at the Console window in Chrome Dev Tools), every time I request the View/URL I get the cached version and the breakpoint is never hit again.
This behavior makes me believe that whatever is in the CACHE section will override or ignore anything that is in the NETWORK section. But, as I said, I'm new to working with this, and I'm not sure whether this is how it's supposed to work or whether I'm missing something or not using it correctly.
Any help is greatly appreciated
Here's the relevant section of the cache.manifest:
CACHE MANIFEST
#V1.0
CACHE:
/
/Content/Site.css
/Content/themes/base/jquery-ui.css
NETWORK:
*
/
FALLBACK:
As it turns out, HTML5 AppCache, or manifest caching, does work differently than I expected it to.
Here's a quote from whatwg.org, which explains it nicely:
Offline Web Applications
The application cache feature works best if the application logic is
separate from the application and user data, with the logic (markup,
scripts, style sheets, images, etc) listed in the manifest and stored
in the application cache, with a finite number of static HTML pages
for the application, and with the application and user data stored in
Web Storage or a client-side Indexed Database, updated dynamically
using Web Sockets, XMLHttpRequest, server-sent events, or some other
similar mechanism.
Legacy applications, however, tend to be designed so that the user
data and the logic are mixed together in the HTML, with each operation
resulting in a new HTML page from the server.
The mixed-content model does not work well with the application cache
feature: since the content is cached, it would result in the user
always seeing the stale data from the previous time the cache was
updated.
While there is no way to make the legacy model work as fast as the
separated model, it can at least be retrofitted for offline use using
the prefer-online application cache mode. To do so, list all the
static resources used by the HTML page you want to have work offline
in an application cache manifest, use the manifest attribute to select
that manifest from the HTML file, and then add the following line at
the bottom of the manifest:
SETTINGS:
prefer-online
NETWORK:
*
So, as it turns out, application cache is not a good fit for pages with dynamic information that are rendered on the server. whatwg.org calls this type of app "legacy".
For a natural fit with application cache, you'd need to have only the display and generic logic in your HTML page and retrieve any dynamic information through AJAX requests.
Hope this helps.

Including dynamic images in web page

I have a web application in which the user can configure reports (ASP.NET MVC, no Reporting Services or anything). The configuration is then represented as a JavaScript object which is sent to the server as JSON to retrieve data and actually generate the report. The submission HTML looks similar to this:
<form method="post" action="/RenderReport" target="_blank">
<input type="hidden" name="reportJson"/>
</form>
This works very well for opening the report in a new browser window. However, in this report, I want to include images that are generated from the data. How can this be done in a good way? The obvious ways that come to mind are:
Embed the metadata necessary to generate the images in the URL, like <img src="/GenerateImage/?metadata1=2&metadata2=4"/>. This won't work, however, since the metadata is very likely to make the URL exceed the 2083 characters max in IE.
Use an ajax POST request, and then when the response comes back, create an image element like <img src="data:image/png;base64,{data_in_json_response}"/>. This is not possible, though, since my application has to work in IE6, which doesn't support data URIs.
Generate the images while generating the report, creating a unique key for each image, and then use URLs of the form <img src="/GetCachedImage?id=23497abc289"/>. This is my current best idea, but it does raise the issue of where to cache the images. The places I can think of are:
In the session. Advantage: The cached item is automatically deleted when the session is abandoned. Disadvantage: accessing the session will serialize accesses to the page within a session. This is bad in my case.
In the database: Advantage: Works well. Disadvantage: Unnecessary overhead, the cached items must be deleted some time.
In the Application / Cache object. I haven't really thought through all advantages and disadvantages of this one.
It also raises the question of when to delete the cached items. If I delete them right after the image is shown, it seems that the page can't be refreshed or printed without the images becoming red xes. Every other option means extra complexity.
How can this problem be solved in a good way, or at least one that isn't bad?
You can do a rotating disk cache of images rather easily... Google "ASP.NET image resizing module", the source code includes a disk caching module with configurable size.
However,
If the report is HTML, and contains image references, you have no way of knowing how long that report will be hanging around. Those images may be needed forever... Say someone copies and pastes into an e-mail... those links will stick around, and suddenly break when the cache is cleared.
If you only have a single server, you could use a hybrid approach. Create a Dictionary<string, object> of cached images, where the string is the ID value from your example and the object is the collection of parameters you need to create the image. Then you can just make a request for yourserver/generate/image/123456 and return the appropriate type.
This wouldn't work in a server farm unless you have some way to share the "object" that represents your parameters. You will still have to clean up this dictionary somehow or risk it growing without bound.
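One way to keep such a dictionary from growing without bound is to cap its size and evict the oldest entry. The sketch below uses JavaScript rather than ASP.NET purely for illustration; ImageParamRegistry, its methods and the sequential IDs are hypothetical, and a real application would probably want unguessable IDs.

```javascript
// In-memory registry mapping an image ID to the parameters needed to render it.
// When the cap is exceeded, the oldest registration is dropped.
class ImageParamRegistry {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map preserves insertion order
    this.nextId = 1;
  }

  // Store the parameters and return the ID to embed in the image URL.
  register(params) {
    const id = String(this.nextId++);
    this.entries.set(id, params);
    if (this.entries.size > this.maxEntries) {
      const oldestId = this.entries.keys().next().value;
      this.entries.delete(oldestId);
    }
    return id;
  }

  // Returns the stored parameters, or undefined if the entry was evicted or never existed.
  lookup(id) {
    return this.entries.get(id);
  }
}

const registry = new ImageParamRegistry(2);
const first = registry.register({ metadata1: 2, metadata2: 4 });
registry.register({ metadata1: 5, metadata2: 6 });
registry.register({ metadata1: 7, metadata2: 8 });
console.log(registry.lookup(first)); // undefined: evicted once the cap of 2 was exceeded
```

The trade-off mentioned above still applies: once an entry is evicted, any report that still references its ID will show a broken image.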
