Core Data on client (iOS) to cache data from a server Strategy - ios

I have written many iOS apps that communicated with a backend. Almost every time, I used the HTTP cache to cache queries and parsed the response data (JSON) into Objective-C objects. For this new project, I'm wondering if a Core Data approach would make sense.
Here's what I thought:
The iOS client makes a request to the server and parses the objects from JSON into Core Data models.
Every time I need a new object, instead of fetching it from the server directly, I query Core Data to see if I already made that request. If that object exists and hasn't expired, I use the stored object.
However, if the object doesn't exist or has expired (some caching logic would be applied here), I would fetch the object from the server and update Core Data accordingly.
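To make the lookup-then-fetch flow concrete, here is a minimal sketch in Python rather than Objective-C. A plain dictionary stands in for the Core Data store, and `ExpiringCache`, `fake_server` and the 60-second TTL are all made up for illustration; the real expiry policy is whatever your caching logic decides.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringCache:
    """Dictionary-backed stand-in for the local store (hypothetical, not Core Data)."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch            # assumed function that calls the server
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, object_id: str) -> dict:
        entry = self._store.get(object_id)
        if entry is not None:
            fetched_at, value = entry
            if self._clock() - fetched_at < self._ttl:
                return value           # cached and still fresh: no network call
        value = self._fetch(object_id)  # missing or expired: go to the server
        self._store[object_id] = (self._clock(), value)
        return value


calls = []

def fake_server(object_id: str) -> dict:
    calls.append(object_id)
    return {"id": object_id, "version": len(calls)}

now = [0.0]
cache = ExpiringCache(fake_server, ttl_seconds=60, clock=lambda: now[0])
first = cache.get("42")    # miss: hits the server
second = cache.get("42")   # hit: served locally
now[0] = 61.0              # pretend more than a minute has passed
third = cache.get("42")    # expired: hits the server again
```

Only two of the three lookups reach the server, which is the saving the approach is after.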
I think having such an architecture could help with the following:
1. Avoid unnecessary queries to the backend
2. Allow full support for offline browsing (you can still make relational queries against Core Data's local store)
Now here's my question to SO Gods:
I know this kind of requires coding the backend logic a second time (server + Core Data), but is this overkill?
Are there any limitations that I have underestimated?
Any other idea?

First of all, If you're a registered iOS Dev, you should have access to the WWDC 2010 Sessions. One of those sessions covered a bit of what you're talking about: "Session 117, Building a Server-driven User Experience". You should be able to find it on iTunes.
A smart combination of REST / JSON / Core Data works like a charm and is a huge time-saver if you plan to reuse your code, but it will require knowledge about HTTP (and knowledge about Core Data, if you want your apps to perform well and safely).
So the key is to understand REST and Core Data.
Understanding REST means understanding HTTP methods (GET, POST, PUT, DELETE, ... HEAD?), response codes (2xx, 3xx, 4xx, 5xx) and headers (Last-Modified, If-Modified-Since, ETag, ...).
Understanding Core Data means knowing how to design your model, set up relationships, handle time-consuming operations (deletes, inserts, updates), and make things happen in the background so your UI stays responsive. And of course how to query locally on SQLite (e.g. for prefetching IDs so you can update objects instead of creating new ones once you get their server-side equivalents).
If you plan to implement a reusable API for the tasks you mentioned, you should make sure you understand REST and Core Data, because that's where you will probably do the most coding. Existing APIs will do the rest of the job: ASIHttpRequest (or any other library) for the network layer and any good JSON library (e.g. SBJSON) for parsing.
The key to making such an API simple is to have your server provide a RESTful service, and to have your entities hold the required attributes (dateCreated, dateLastModified, etc.) so you can create requests (easily done with ASIHttpRequest, be they GET, PUT, POST, or DELETE) and add the appropriate HTTP headers, e.g. If-Modified-Since for a conditional GET.
If you already feel comfortable with Core Data and can handle JSON and can easily do HTTP Request and handle Responses (again, ASIHttpRequest helps a lot here, but there are others, or you can stick to the lower-level Apple NS-Classes and do it yourself), then all you need is to set the correct HTTP Headers for your Requests, and handle the Http-Response-Codes appropriately (assuming your Server is REST-ful).
If your primary goal is to avoid re-updating a Core Data entity from its server-side equivalent, just make sure you have a "last-modified" attribute in your entity, and do a conditional GET to the server (setting the "If-Modified-Since" HTTP header to your entity's "last-modified" date). The server will respond with status code 304 (Not Modified) if that resource didn't change (assuming the server is REST-ful). If it changed, the server will set the "Last-Modified" HTTP header to the date the last change was made, respond with status code 200 and deliver the resource in the body (e.g. in JSON format).
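The conditional GET round trip can be sketched roughly as follows. This is Python with a fake in-process server, not ASIHttpRequest code; `fake_server_get`, `refresh` and the entity dictionary are hypothetical names, and only the header names and status codes come from HTTP itself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical server-side resource with its modification date.
RESOURCE = {
    "body": {"id": 1, "title": "Bike"},
    "last_modified": datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc),
}

def fake_server_get(headers):
    """Answer like a REST-ful server would: 304 if unchanged, else 200 plus body."""
    since = headers.get("If-Modified-Since")
    if since is not None and RESOURCE["last_modified"] <= parsedate_to_datetime(since):
        return 304, {}, None
    stamp = format_datetime(RESOURCE["last_modified"], usegmt=True)
    return 200, {"Last-Modified": stamp}, RESOURCE["body"]

def refresh(entity):
    """Return (entity, changed); the entity keeps the server's Last-Modified value."""
    headers = {}
    if entity.get("last_modified"):
        headers["If-Modified-Since"] = entity["last_modified"]
    status, response_headers, body = fake_server_get(headers)
    if status == 304:
        return entity, False               # nothing to parse, nothing to save
    if status == 200:
        return {"data": body, "last_modified": response_headers["Last-Modified"]}, True
    raise RuntimeError(f"unexpected status {status}")

entity, changed_first = refresh({})        # first load: full body
entity, changed_second = refresh(entity)   # unchanged on the server: 304
RESOURCE["last_modified"] = datetime(2011, 5, 2, 9, 30, tzinfo=timezone.utc)
RESOURCE["body"] = {"id": 1, "title": "Road bike"}
entity, changed_third = refresh(entity)    # changed on the server: 200 again
```

Storing the header value exactly as the server sent it, and sending it back unchanged, avoids any date-format mismatch between client and server.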
So, as always, the answer to your question is probably 'it depends'.
It mostly depends what you'd like to put in your reusable do-it-all core-data/rest layer.
To give you numbers: it took me 6 months (in my spare time, at a pace of 3-10 hours per week) to get mine where I wanted it to be, and honestly I'm still refactoring and renaming to let it handle special use cases (cancellation of requests, rollbacks, etc.) and provide fine-grained callbacks (reachability, network layer, serialization, Core Data saving...). But it's pretty clean and elaborate and optimized and hopefully fits my employer's general needs (an online marketplace for classifieds with multiple iOS apps). That time included learning, testing, optimizing, debugging and constantly changing my API (first adding functionality, then improving it, then radically simplifying it, and debugging it again).
If time-to-market is your priority, you're better off with a simple and pragmatic approach: never mind reusability, just keep the learnings in mind, and refactor in the next project, reusing and fixing code here and there. In the end, the sum of all experiences might materialize in a clear vision of HOW your API works and WHAT it provides. If you're not there yet, keep your hands off trying to make it part of the project budget, and just try to reuse as many of the stable third-party APIs out there as you can.
Sorry for the lengthy response; I felt you were stepping into something like building a generic API or even a framework. Those things take time, knowledge, housekeeping and long-term commitment, and most of the time they are a waste of time, because you never finish them.
If you just want to handle specific caching scenarios to allow offline usage of your app and minimize network traffic, then you can of course just implement those features. Just set If-Modified-Since headers in your requests, inspect Last-Modified headers or ETags, and keep that info in your persisted entities so you can resubmit it in later requests. Of course I'd also recommend caching resources such as images locally (persistently), using the same HTTP headers.
If you have the luxury of modifying the server-side service (in a REST-ful manner), then you're fine, provided you implement it well. From experience, you can save as much as 3/4 of the network/parsing code on the iOS side if the service behaves well: it returns appropriate HTTP status codes, spares you checks for nil and conversions of numbers and dates from strings, provides lookup IDs instead of implicit strings, etc.
If you don't have that luxury, then either that service is at least REST-ful (which helps a lot), or you'll have to fix things client-side (which is a pain, often).

There is a solution out there that I couldn't try because I'm too far in my project to refactor the server caching aspect of my app but it should be useful for people out there that are still looking for an answer:
It does exactly what I did, but it's much more abstracted than what I did. Very insightful stuff there. I hope it helps somebody!

I think it's a valid approach. I've done this a number of times. The tricky part is when you need to deal with synchronizing: if client and server can both change things at the same time. You almost always need app-specific merging logic for this.


How to handle SAP Kapsel Offline app OData conflicts properly?

I built an app that is able to store OData offline by using SAP Kapsel plugins.
More or less it's the same as the one generated by Web IDE, or similar to the apps in this example:
Now I am at the point of checking the error resolution potential. I created a sync conflict (changing data on the server after the offline database was stored, then changing something in the app and starting a flush).
As mentioned in the documentation I can see the error in ErrorArchive and could also see some details. But what I am missing is the information of the "current" data on the database.
In the error details I can just see the data on the device but not the data changed on the server.
For example:
Device is loading some names into offline store
Device is offline
User A is changing some names
User B is changing one of these names directly online
User A is online again and starts a sync
User A is now informed about the entity that was changed BUT:
not the content user B entered
I just see the "offline" data.
Is there a solution to see the "current" and the "offline" one in a kind of compare view?
Please also note that the server communication is done by the Kapsel Plugin and not with normal AJAX calls. This could be an alternative but I am wondering if there is no smarter way supported by the API?
Meanwhile I figured out how to load the online data (manually).
This could be done by switching http handler back to normal one.
Anyhow this does not look like a proper solution and I also have the issue with the conflict log itself. It must be deleted before any refresh could be applied.
I could not find any proper documentation for that. Also ETag handling is hardly described in SAPUI5 and SAP Kapsel documentation.
This question is a really tricky one, due to its implications. I understand that you are simulating a synchronization error due to concurrent modification, and want to know if there is a way for the client to obtain the "current" server state in order to give the user a means to compare the local and server state.
First, let me give you the short answer: No, there is no way for the client to see the current server state "for reference" via the Offline APIs when there are synchronization errors. Doing an online query as outlined above might work, but it certainly is a bad idea.
Now for the longer answer, which explains why this is not necessarily a defect and why I said there are quite some implications to the answer.
Types of Synchronization Errors
We distinguish a number of synchronization errors, and in this context, we are clearly dealing with business-related issues. There are two subtypes here: Those that the user can correct, e.g. validation errors, and those that are issues in the business process itself.
If the user violates the input range, e.g. by putting a negative price for a product, the server would reply with the corresponding message: "-1 is not a valid input value for 'Price'". You, as a developer, can display such messages to the user from the error archive, and the ensuing fix is indeed a very easy one.
Now when we talk about concurrent modification, things get really, really nasty. In fact, I like to say that in this case there is an issue with the business process, because on one hand, we allow data to get out of sync. On the other hand, the process allows multiple users to manipulate the same piece of information. How all relevant users should now be notified and synchronized is no longer just a technical detail, but in fact a new business process. There just is no way to generically decide how to handle this case. In most cases, it would involve back-office experts who need to decide how the changes should be merged.
A Better Solution
Angstrom pointed out that there is no way to manipulate ETags on the client side, and you should in fact not even think about it. ETags work like version numbers in optimistic locking scenarios, and changing the ETag basically means "Just overwrite what's on the server". This is a no-go in serious scenarios.
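The "ETag as version number" idea can be illustrated with a generic HTTP-style sketch. This is not the Kapsel, SMP or HCPms API; the `Server` class, the hash-based tag and the `if_match` parameter are assumptions made up for the example. The only thing taken from HTTP is that a write carrying a stale tag is rejected with 412 (Precondition Failed).

```python
import hashlib
import json


class Server:
    """Toy backend holding one entity; the ETag is derived from its content."""

    def __init__(self, data):
        self.data = dict(data)

    def etag(self):
        raw = json.dumps(self.data, sort_keys=True).encode()
        return hashlib.sha256(raw).hexdigest()[:12]

    def get(self):
        return dict(self.data), self.etag()

    def put(self, new_data, if_match):
        if if_match != self.etag():
            return 412                 # caller edited an outdated version
        self.data = dict(new_data)
        return 200


server = Server({"name": "Smith"})
a_data, a_etag = server.get()          # user A loads the name, then goes offline
b_data, b_etag = server.get()          # user B loads the same name online
status_b = server.put({"name": "Smyth"}, if_match=b_etag)    # B saves first
status_a = server.put({"name": "Smithe"}, if_match=a_etag)   # A flushes later
```

User A's write is refused rather than silently overwriting B's change. Sending a fresh tag without looking at the new data would turn that refusal into a blind overwrite, which is why tampering with ETags is a bad idea.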
An acceptable workaround would be the following:
Make sure the server returns verbose error messages so that the user can see what happened and what caused the conflict.
If that does not help, refresh the data. This will get you an updated ETag, and merge the local changes into the "current" server state, but only locally. "Merging" really means that local changes always overwrite remote changes.
The user now has another opportunity to review the data and can submit it again.
A Good Solution
Better is not necessarily good, so here is what you should really do: never let concurrent modification happen, because it is really expensive to handle. This implies that it is not the developer who should address this issue; rather, the business needs to change the process.
The right question to ask is, "When you replicate data in a distributed system, why do you allow it to be modified concurrently at all?" Typically stakeholders will not like this kind of question, and the appropriate reaction is to work out a conflict resolution process together with them. Only then will they realize how expensive fixing that kind of desynchronization is, and more often than not they will see that adjusting the process is way cheaper than insisting on yet another back-office process to fix the issues it causes. Even if they insist that there is a need for this concurrent modification, they will now understand that it is not your task to sort this out and that they need to invest in a conflict resolution process.
There is no way to compare the client state to the server state on the client, but you can do a refresh to retain the local changes and get an updated ETag. The real solution, however, is to rework the business process, because this is no longer a purely technical issue.
By default, SMP or HCPms detects conflicts via ETags. On the client side there is no API to manipulate ETags in case of conflicts. A potential solution to implement a kind of diff view on the device would work like this:
Show errors
Cache errors (maybe only in memory?)
delete the errors
do a refresh of the database
build a diff view with current data and cached errors
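The last step could look roughly like this. It is a language-neutral sketch in Python, not Kapsel code: the shape of `cached_errors` (an entity key plus the local field values) and of `refreshed` (key to server entity) is assumed for illustration and is not the real ErrorArchive schema.

```python
def build_diff(cached_errors, refreshed):
    """Return (key, field, local_value, server_value) rows for every differing field."""
    rows = []
    for error in cached_errors:
        server_entity = refreshed.get(error["key"], {})
        fields = sorted(set(error["local"]) | set(server_entity))
        for field in fields:
            local_value = error["local"].get(field)
            server_value = server_entity.get(field)
            if local_value != server_value:
                rows.append((error["key"], field, local_value, server_value))
    return rows


# Errors cached in memory before they were deleted from the archive.
cached_errors = [{"key": "Names(1)", "local": {"first": "Ann", "last": "Smith"}}]
# Data read from the offline store after the refresh.
refreshed = {"Names(1)": {"first": "Ann", "last": "Smyth"}}

diff = build_diff(cached_errors, refreshed)
```

Each row carries both values, so a compare view only needs to render them side by side and let the user pick one.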
The idea with
could also work but could be very tricky and may introduce side effects.
Maybe some requests are triggered against the "wrong" backend.

Is using a Web API as dataprovider for a website efficient?

I was thinking about setting up a project with Web API. Basically build the API first and program the web site using this API.
Although it sounds promising, I was wondering:
If I separate the logic in a nice way, I might end up retrieving data for a web page through multiple API calls, which in turn means multiple connections to the server with all the overhead etc.
For example, if I use, let's say, 8 different API calls on one page, I can't imagine it won't have an impact on the web page's performance.
So, have I misunderstood something? Is this kind of overhead negligible, or does the need for multiple calls indicate that the design is wrong?
Thanks in advance.
Well, we did it. A Web API server provides REST access to all the data. Independent UI clients consume it as the only access point to the underlying persistence.
The first request takes some time. It is significantly longer. It must init all the UI Client stuff, and get the least needed data from a server. (Menu, user, access rights, metadata...list-view data)
The point, the real advantage, is hidden in the second, the third... request. A lot of stuff is already there on the UI client. And even if it is requested again, caching (server, client, or both) could be introduced.
So, this would mean more requests (at least during UI client start-up)... but it does not imply a slower application.
The maintenance benefit is hidden (maybe it is not hidden, it should be obvious) in the separation of concerns. On the server, we are no longer solving the issue of where to place the user data handling, in the base controller or a child controller... or whether there should be a master page, a layout controller...
Solved. We are taking care about single, specific stuff, published via REST. One method, one business operation. And that's the dream if we'd like to keep that application alive and be the repairman and extender.
One aspect is that you can display the page to the end user very, very fast. Once the page is loaded, use jQuery async calls and any JavaScript template tool (like AngularJS or mustache.js) to call the Web API simultaneously and build the client page views.
I have used this approach in multiple projects and the user experience is tremendous.
Most modern browsers support 6-8 parallel connections to the same site. So you do have to be careful about that. Unless you are connecting to that many separate systems, I would try to reduce the number of connections. Or ensure the calls are called asynchronously by different events to reduce the chance of parallel connections.
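Issuing the calls in parallel while staying under a connection cap can be sketched like this. It uses Python's asyncio instead of browser JavaScript; `call_api` only sleeps to simulate latency, and `MAX_PARALLEL = 3` is an arbitrary illustrative limit, not a recommendation.

```python
import asyncio

MAX_PARALLEL = 3      # illustrative cap, kept below a browser's per-host limit
in_flight = 0
peak_in_flight = 0


async def call_api(name, limiter):
    """Pretend to call one endpoint; the semaphore bounds concurrent calls."""
    global in_flight, peak_in_flight
    async with limiter:
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
        await asyncio.sleep(0.01)          # stands in for network latency
        in_flight -= 1
        return name, {"source": name}


async def load_page(names):
    limiter = asyncio.Semaphore(MAX_PARALLEL)
    pairs = await asyncio.gather(*(call_api(name, limiter) for name in names))
    return dict(pairs)


page = asyncio.run(load_page([f"endpoint{i}" for i in range(8)]))
```

All eight results arrive, but never more than three requests are open at once.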
Making a series of HTTP calls to obtain data for your page will have an overhead. Only testing will tell you how that might impact in your scenario.
There is little point using Web API just because you can. You should have a legitimate reason for building a RESTful API. Even then, if it is primarily for your own consumption, design it to deliver a ViewModel for each page in one call.
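A "ViewModel per page" endpoint might be shaped like the following. It is a minimal sketch in Python rather than ASP.NET Web API, and the three data functions and the page name are hypothetical placeholders for whatever your services return.

```python
def get_user(user_id):
    return {"id": user_id, "name": "Ann"}          # placeholder data

def get_menu(user_id):
    return ["Home", "Orders"]                      # placeholder data

def get_recent_orders(user_id):
    return [{"id": 1, "total": 9.5}]               # placeholder data


def orders_page_view_model(user_id):
    """One endpoint, one round trip: everything the orders page needs."""
    return {
        "user": get_user(user_id),
        "menu": get_menu(user_id),
        "orders": get_recent_orders(user_id),
    }


view_model = orders_page_view_model(7)
```

The fine-grained functions still exist on the server; the page simply does not have to make three separate HTTP calls to reach them.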

iOS App Offline and synchronization

I am trying to build an offline synchronization capability into my iOS App and would like to get some feedback/advice from the community on the strategy and best practice to be followed to do the same. The app details are as follows:
The app shows a digital catalog to users and allows them to perform actions like creating and placing orders, among others.
Currently the app only works when online, and we have APIs for all actions like viewing the catalog, creating/placing orders which return JSON data.
We would like to provide offline/synchronization capability to users, through which users can view the catalog and create/place orders while offline, and when they come online the order details will be synchronized and updated to our server.
We would also like to pull the latest data from the server, and have the app keep itself up to date in case of catalog changes or order changes that happened at the Server while the app was offline.
Can you guys help me to come with the best design and approach for handling this kind of functionality?
I have done something similar just in the beginning of this year. After I read about NSOperationQueue and NSOperation I did a straight forward approach:
Whenever an object is changed/added/... in my local database, I add a new "sync" operation to the queue, and I do not care whether the app is online or offline (I added a reachability observer which either suspends the queue or resumes it; of course, I re-queue the operation if an error occurs, e.g. lost network during sync). The operation itself reads/writes the database and does the networking. My view controllers use an NSFetchedResultsController (with delegate=self) to get callbacks on changes. In some cases I needed some extra local data (it is about counting objects), for which I used NSManagedObjectContextObjectsDidChangeNotification.
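The queue behaviour described here (enqueue regardless of connectivity, suspend when offline, retry on failure) can be sketched as below. This is plain Python, not NSOperationQueue; `SyncQueue`, `send` and `set_reachable` are invented names, with `set_reachable` playing the role of the reachability observer.

```python
from collections import deque


class SyncQueue:
    """Serial queue of sync operations that pauses while the network is down."""

    def __init__(self, send):
        self._send = send              # assumed callable; raises ConnectionError on failure
        self._pending = deque()
        self.suspended = False

    def enqueue(self, operation):
        self._pending.append(operation)
        self.drain()

    def set_reachable(self, reachable):
        self.suspended = not reachable
        self.drain()

    def drain(self):
        while self._pending and not self.suspended:
            operation = self._pending[0]
            try:
                self._send(operation)
            except ConnectionError:
                self.suspended = True  # leave it at the head so it is retried later
                return
            self._pending.popleft()

    def __len__(self):
        return len(self._pending)


sent = []
network_up = [False]

def send(operation):
    if not network_up[0]:
        raise ConnectionError("offline")
    sent.append(operation)

queue = SyncQueue(send)
queue.set_reachable(False)             # reachability observer reports "offline"
queue.enqueue("create order 1")
queue.enqueue("update order 1")
pending_while_offline = len(queue)
network_up[0] = True
queue.set_reachable(True)              # back online: the queue resumes in order
```

Keeping a failed operation at the head of the queue preserves ordering, which matters when a later operation depends on an earlier one having reached the server.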
Furthermore, I have used Multi-Context CoreData which sounded quite reasonable to use (I have only two contexts).
To get notified about changes from your server, I believe that iOS 7 has something new for you.
On the server side, you should read a little for the actual approach you want to go for: i.e. Data Synchronization by Dan Grover or Developing Android REST Client Applications (of course there are many more good articles out there).
Caution: you might be disappointed when you expect an easy solution. Your requirement is not unusual, but the solution might become more complex than you expect - depending on the "business rules" and other reasonable requirements. If you intelligently restrict your requirements you may find a solution which you can implement yourself, otherwise you may also consider to use a commercial product.
I could imagine that if you design the business logic such that it takes an offline state into account and exposes this explicitly, you may find a solution which you can implement yourself with moderate effort. What I mean by this is, for example, that when a user creates an order, it is initially in a "not committed" state. The order will only be committed when there is access to the server and the server gives the "OK" that this order can actually be placed by this user. The server may also deny the order, sending corresponding messages to the user.
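Making the offline state explicit in the model might look like this. It is a small Python sketch; the state names beyond "not committed", the `Order` class and `apply_server_reply` are assumptions for illustration only.

```python
from enum import Enum


class OrderState(Enum):
    NOT_COMMITTED = "not committed"    # created locally, server has not answered yet
    COMMITTED = "committed"            # server accepted the order
    DENIED = "denied"                  # server rejected the order


class Order:
    def __init__(self, items):
        self.items = list(items)
        self.state = OrderState.NOT_COMMITTED
        self.message = None

    def apply_server_reply(self, accepted, message=None):
        """Resolve a pending order once the server has been reached."""
        if self.state is not OrderState.NOT_COMMITTED:
            raise ValueError("order was already resolved")
        self.state = OrderState.COMMITTED if accepted else OrderState.DENIED
        self.message = message


offline_order = Order(["catalog item 17"])
state_while_offline = offline_order.state
offline_order.apply_server_reply(accepted=False, message="item no longer available")
```

Because the pending state is part of the model, the UI can show it honestly ("waiting for confirmation") instead of pretending the order has been placed.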
There are probably quite a few subtle issues that may arise due to the requirement of eventual consistency.
See also this question, which contains pointers to solutions from commercial products; their web sites give valuable information about the complexity of the problem and how it can be solved.

Recommendations to test API request layer in iOS apps using NSOperations and Coredata

I develop an iOS app that uses a REST API. The iOS app requests data in worker threads and stores the parsed results in core data. All views use core data to visualize the information. The REST API changes rapidly and I have no real control over the interface.
I am looking for advice on how to perform integration tests for the app as easily as possible. Should I test against the API or against mock data? But how do you mock GET requests properly if you can create resources with POST or modify them with PUT?
What frameworks do you use for these kinds of problems? I played with Frank, which looks nice but is complicated due to rapid UI changes in the iOS app. How would you test the "API request layer" in the app? Worker threads are NSOperations in a queue, and everything is built asynchronously. Any recommendations?
I would strongly advise you to mock the server. Servers go down, the behavior changes, and if a test failure implies "maybe my code still works", you have a problem on your hands, because your test doesn't tell you whether or not the code is broken, which is the whole point.
As for how to mock the server, for a unit test that does this:
first_results = list_things()
delete_thing()
results_after_delete = list_things()
I have a mock data structure that looks like this:
{ list_things_request : [first_results, results_after_delete],
delete_thing_request: [delete_thing_response] }
It's keyed on your request, and the value is an array of responses for that request in the order that they were seen. Thus you can support repeatedly running the same request (like listing the things) and getting a different result. I use this format because in my situation it is possible for my API calls to run in a slightly different order than it did last time. If your tests are simpler, you might be able to get away with a simple list of request/response pairs.
I have a flag in my unit tests that indicates whether I am in "record" mode (that is, talking to a real server and recording this data structure to disk) or in "playback" mode (talking to the data structure). When I need to work with a test, I "record" the interactions with the server and then play them back.
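Here is a rough Python sketch of that record/playback idea, with responses keyed by request and stored in the order they were seen. `ReplayServer`, `real_server` and the request names are made up for the example, and the recording is round-tripped through JSON in memory instead of being written to disk.

```python
import json


class ReplayServer:
    """Record mode when real_call is given, playback mode otherwise."""

    def __init__(self, real_call=None, recording=None):
        self._real_call = real_call
        self._recording = recording if recording is not None else {}
        self._cursor = {}              # next response index per request

    def request(self, name):
        if self._real_call is not None:            # record mode
            response = self._real_call(name)
            self._recording.setdefault(name, []).append(response)
            return response
        index = self._cursor.get(name, 0)          # playback mode
        self._cursor[name] = index + 1
        return self._recording[name][index]

    def dump(self):
        return json.dumps(self._recording)


things = ["a", "b"]

def real_server(name):
    """Stand-in for the live API, with state that changes between calls."""
    if name == "delete_thing_request":
        things.pop()
        return {"ok": True}
    return list(things)

recorder = ReplayServer(real_call=real_server)
recorder.request("list_things_request")
recorder.request("delete_thing_request")
recorder.request("list_things_request")

player = ReplayServer(recording=json.loads(recorder.dump()))
first_results = player.request("list_things_request")
player.request("delete_thing_request")
results_after_delete = player.request("list_things_request")
```

In playback the same request returns a different answer the second time, exactly as it did against the live server, without any network access.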
I use the little-known SenTestCaseDidStartNotification to track which unit test is running and isolate my data files appropriately.
The other thing to keep in mind is that instability is the root of all evil. If you have code that does things with sets, or gets the current date, and such, this tends to change the requests and responses, which do not work in an offline scenario. So be careful with those.
(Since nobody has stepped in yet and given you a complete walkthrough) My humble advice: step back a bit, take out the magic of async, regard everything as sync (API calls, parsing, persistence), and isolate each step as a consumer/producer. After all, you don't want to unit-test NSURLConnection, or JSONKit or whatever (they should have been tested if you use them); you want to test YOUR code. Your code takes some input and produces output, unaware of the fact that the input was in fact the output generated in a background thread somewhere. You can do the isolated tests all sync.
Can we agree on the fact that your Views don't care about how their model data was provided? If yes, well, test your View with mock objects.
Can we agree on the fact that your parser doesn't care about how the data was provided? If yes, well, test your parser with mock data.
Network layer: same applies as described above, in the end you'll get an NSDictionary of headers, and some NSData or NSString of content. I don't think you want to unit-test NSURLConnection or any 3'rd party networking api you trust (asihttp, afnetworking,...?), so in the end, what's to be tested?
You can mock up URLs, request headers and POST data for each use-case you have, and setup test cases for expected responses.
In the end, IMHO, it's all about "normalizing" out async.
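As an illustration of testing one step in isolation and synchronously, here is a parser fed with canned bytes. It is a Python sketch rather than Objective-C, and `parse_things` and the JSON shape are invented for the example.

```python
import json


def parse_things(payload: bytes):
    """Pure function: response bytes in, model dictionaries out. No threads, no network."""
    document = json.loads(payload)
    return [
        {"id": int(item["id"]), "name": item["name"].strip()}
        for item in document["things"]
    ]


# Canned response body, as it might have been captured from the API once.
canned = b'{"things": [{"id": "7", "name": " Bike "}]}'
parsed = parse_things(canned)
```

The parser neither knows nor cares that in production its input arrives from a background operation, so the test needs no queue and no waiting.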
Take a look at Nocilla
For more info, check this other answer to a similar question

Storing Data In Memory: Session vs Cache vs Static

A bit of backstory: I am working on a web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request takes ~15-20 secs to complete and a couple of secs to process. Once there, the user can manipulate values on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call 1 time (the 15 sec hit) and then wanting to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So, the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 secs or so (I hope).
In order to make this efficient, I am moving the initial data into memory and using Ajax calls back to the server so that I can reduce the processing time needed to handle the recalculation that occurs w/ this user's updates.
Here is my question: with performance in mind, what would be the best way to store this data, assuming that only 1 user will be working w/ this data at any given moment?
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event which will meet my failsafe requirements. Cons: Slowest perf of my current options. It is sometimes tricky to ensure the Session End event fires properly.
Caching - Pros: Good Perf. Has access to dependencies which could be a bonus later down the line but not really useful in current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best Perf. Easiest to maintain as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on which option I should choose?
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method for storing the data across any page loads. You can name the cache entry you store the data in to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save changes back to DB. If you name the files based on the user/account or some other session-unique indicator then there's no issue with conflict and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the writes more: save changes to Session, then write that to file at intervals, then sweep the file at larger intervals. You can tune it for performance and choose what level of user-change loss is acceptable.
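The append-then-sweep idea could be sketched like this. It is written in Python rather than VB.Net, and `append_change`, `sweep`, the one-JSON-object-per-line format and the `save_to_db` callback are all assumptions for illustration.

```python
import json
import os
import tempfile


def append_change(log_dir, session_id, change):
    """Append one user change to the session's own log file."""
    path = os.path.join(log_dir, f"{session_id}.log")
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(change) + "\n")


def sweep(log_dir, save_to_db):
    """Push every logged change to the database, then remove the swept file."""
    for filename in sorted(os.listdir(log_dir)):
        if not filename.endswith(".log"):
            continue
        path = os.path.join(log_dir, filename)
        with open(path, encoding="utf-8") as log_file:
            changes = [json.loads(line) for line in log_file if line.strip()]
        save_to_db(filename[: -len(".log")], changes)
        os.remove(path)


saved = {}

def save_to_db(session_id, changes):
    saved.setdefault(session_id, []).extend(changes)   # stand-in for the real DB write

with tempfile.TemporaryDirectory() as log_dir:
    append_change(log_dir, "user-1", {"cell": "A1", "value": 10})
    append_change(log_dir, "user-1", {"cell": "A1", "value": 12})
    sweep(log_dir, save_to_db)
    remaining = os.listdir(log_dir)
```

A real sweeper would also have to cope with a change being appended between reading and deleting a file, for example by renaming the file before reading it; the sketch leaves that out.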
Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You can not rely on the session simply because it is (typically) tied to the users browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), then they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much slower?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
Perhaps a bit off topic, but I'd recommend putting those long processing calls in asynchronous pages (not to be confused with AJAX's asynchronous calls).
Take a look at this article and ping me back if it doesn't make sense.
I suggest to create a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
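A working-copy table of this kind might be sketched as follows. The example uses Python's built-in sqlite3 with an in-memory database as a stand-in for SQL Server 2005, and the table layout (`session_id`, `row_id`, `value`) and function names are invented for illustration. Background threads are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edit ("
    " session_id TEXT, row_id INTEGER, value REAL,"
    " PRIMARY KEY (session_id, row_id))"
)


def start_editing(session_id, initial_rows):
    """Copy the freshly computed results into the working table."""
    conn.executemany(
        "INSERT INTO edit VALUES (?, ?, ?)",
        [(session_id, row_id, value) for row_id, value in initial_rows],
    )
    conn.commit()


def update_value(session_id, row_id, value):
    """Persist a single user edit as soon as it happens."""
    conn.execute(
        "UPDATE edit SET value = ? WHERE session_id = ? AND row_id = ?",
        (value, session_id, row_id),
    )
    conn.commit()


def load(session_id):
    """Reload the working copy, e.g. after the browser was restarted."""
    cursor = conn.execute(
        "SELECT row_id, value FROM edit WHERE session_id = ? ORDER BY row_id",
        (session_id,),
    )
    return cursor.fetchall()


start_editing("s1", [(1, 10.0), (2, 20.0)])
update_value("s1", 2, 25.0)
rows = load("s1")
```

Since every edit is already in the table, nothing depends on the Session End event firing.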
One possible alternative to what the others mentioned, is to store the data on the client.
Assuming the dataset is not too large, and the code that manipulates it can be handled client side. You could store the data as an XML data island or JSON object. This data could then be manipulated/processed and handled all client side with no round trips to the server. If you need to persist this data back to the server the end resulting data could be posted via an AJAX or standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.
