Hey, I'm working on the model layer for our app here.
Some of the requirements are like this:
It should work on iPhone OS 3.0+.
The source of our data is a RESTful Rails application.
We should cache the data locally using Core Data.
The client code (our UI controllers) should have as little knowledge about any network stuff as possible and should query/update the model with the Core Data API.
I've watched the WWDC10 Session 117 on Building a Server-driven User Experience and spent some time looking into the Objective Resource, Core Resource, and RestfulCoreData frameworks.
The Objective Resource framework doesn't talk to Core Data on its own and is merely a REST client implementation. Core Resource and RestfulCoreData both assume you talk to Core Data in your code, and they solve all the nuts and bolts in the background on the model layer.
All looks okay so far, and initially I thought either Core Resource or RestfulCoreData would cover all of the above requirements, but... there's a couple of things neither of them seems to solve correctly:
The main thread should not be blocked while saving local updates to the server.
If the saving operation fails the error should be propagated to the UI and no changes should be saved to the local Core Data storage.
Core Resource happens to issue all of its requests to the server when you call - (BOOL)save:(NSError **)error on your Managed Object Context, and is therefore able to provide a correct NSError instance if the underlying requests to the server fail somehow. But it blocks the calling thread until the save operation finishes. FAIL.
RestfulCoreData keeps your -save: calls intact and doesn't introduce any additional waiting time for the client thread. It merely watches for NSManagedObjectContextDidSaveNotification and then issues the corresponding requests to the server in the notification handler. But this way the -save: call always completes successfully (well, given Core Data is okay with the saved changes), and the client code that called it has no way to know the save might have failed to propagate to the server because of some 404 or 421 or whatever server-side error occurred. Even worse, the local storage ends up with the data updated, but the server never learns about the changes. FAIL.
So, I'm looking for a possible solution / common practices in dealing with all these problems:
I don't want the calling thread to block on each -save: call while the network requests happen.
I want to somehow get notifications in the UI that some sync operation went wrong.
I want the actual Core Data save to fail as well if the server requests fail.
Any ideas?
You should really take a look at RestKit (http://restkit.org) for this use case. It is designed to solve the problems of modeling and syncing remote JSON resources to a local Core Data backed cache. It supports an offline mode for working entirely from the cache when there is no network available. All syncing occurs on a background thread (network access, payload parsing, and managed object context merging) and there is a rich set of delegate methods so you can tell what is going on.
There are three basic components:
The UI Action and persisting the change to CoreData
Persisting that change up to the server
Refreshing the UI with the response of the server
An NSOperation + NSOperationQueue will help keep the network requests orderly. A delegate protocol will help your UI classes understand what state the network requests are in, something like:
@protocol NetworkOperationDelegate
- (void)operation:(NSOperation *)op willSendRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entity;
- (void)operation:(NSOperation *)op didSuccessfullySendRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entity;
- (void)operation:(NSOperation *)op encounteredAnError:(NSError *)error afterSendingRequest:(NSURLRequest *)request forChangedEntityWithId:(NSManagedObjectID *)entity;
@end
The protocol format will of course depend on your specific use case but essentially what you're creating is a mechanism by which changes can be "pushed" up to your server.
Next there's the UI loop to consider: to keep your code clean, it would be nice to call save: and have the changes automatically pushed up to the server. You can use the NSManagedObjectContextDidSaveNotification for this.
- (void)managedObjectContextDidSave:(NSNotification *)saveNotification {
    // NSInsertedObjectsKey is the userInfo key for newly inserted objects (an NSSet)
    NSSet *inserted = [[saveNotification userInfo] objectForKey:NSInsertedObjectsKey];
    for (NSManagedObject *obj in inserted) {
        // create a new NSOperation for this entity which will invoke the appropriate REST API
        // and add it to the operation queue
    }
    // do the same thing for NSUpdatedObjectsKey and NSDeletedObjectsKey
}
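To make that handler concrete, here's a minimal sketch of such an operation (manual retain/release, iPhone OS 3-era APIs; EntityPushOperation and its initializer are hypothetical names, and the delegate protocol is the one declared above). Note the delegate callbacks fire on the background thread here, so hop back to the main thread before touching UIKit:

#import <CoreData/CoreData.h>

@interface EntityPushOperation : NSOperation {
    NSManagedObjectID *objectID;
    NSURLRequest *request;
    id <NetworkOperationDelegate> delegate;
}
- (id)initWithObjectID:(NSManagedObjectID *)anID request:(NSURLRequest *)aRequest delegate:(id <NetworkOperationDelegate>)aDelegate;
@end

@implementation EntityPushOperation
- (id)initWithObjectID:(NSManagedObjectID *)anID request:(NSURLRequest *)aRequest delegate:(id <NetworkOperationDelegate>)aDelegate {
    if ((self = [super init])) {
        objectID = [anID retain];
        request = [aRequest retain];
        delegate = aDelegate; // not retained, to avoid a retain cycle
    }
    return self;
}

- (void)main {
    [delegate operation:self willSendRequest:request forChangedEntityWithId:objectID];
    NSURLResponse *response = nil;
    NSError *error = nil;
    // A synchronous request is fine here: we're already off the main thread.
    [NSURLConnection sendSynchronousRequest:request returningResponse:&response error:&error];
    if (error) {
        [delegate operation:self encounteredAnError:error afterSendingRequest:request forChangedEntityWithId:objectID];
    } else {
        [delegate operation:self didSuccessfullySendRequest:request forChangedEntityWithId:objectID];
    }
}

- (void)dealloc {
    [objectID release];
    [request release];
    [super dealloc];
}
@end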
The computational overhead for inserting the network operations should be rather low, however if it creates a noticeable lag on the UI you could simply grab the entity ids out of the save notification and create the operations on a background thread.
If your REST API supports batching, you could even send the entire array across at once and then notify your UI that multiple entities were synchronized.
The only issue I foresee, and one for which there is no "real" solution, is that the user will not want to wait for their changes to be pushed to the server before being allowed to make more changes. The only good paradigm I have come across is to let the user keep editing objects and batch their edits together when appropriate, i.e. don't push on every save notification.
This becomes a sync problem, and not an easy one to solve. Here's what I'd do: in your iPhone UI, use one context, and then use another context (on another thread) to download the data from your web service. Once it's all there, go through the sync/importing processes recommended below and then refresh your UI after everything has imported properly. If things go bad while accessing the network, just roll back the changes in the non-UI context. It's a bunch of work, but I think it's the best way to approach it.
Core Data: Efficiently Importing Data
Core Data: Change Management
Core Data: Multi-Threading with Core Data
You need a callback function that runs on the other thread (the one where the actual server interaction happens) and then puts the result code/error info into a semi-global structure which is periodically checked by the UI thread. Make sure that the write of the number that serves as the flag is atomic, or you are going to have a race condition: say your error response is 32 bytes; you need an int (which should have atomic access), you keep that int in the off/false/not-ready state until your larger data block has been written, and only then write "true" to flip the switch, so to speak.
For the correlated saving on the client side you have to either just keep the data and not save it until you get an OK from the server, or make sure that you have some kind of rollback option - say, a way to delete if the server save failed.
Beware that it's never going to be 100% safe unless you do a full 2-phase commit procedure (the client save or delete can fail after the signal from the server), but that's going to cost you 2 trips to the server at the very least (it might cost you 4 if your sole rollback option is delete).
Ideally, you'd do the whole blocking version of the operation on a separate thread, but you'd need 4.0 for that.
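A minimal sketch of that flag-plus-payload handoff, assuming OSAtomic barriers for the atomic flip (ReportSyncError and presentSyncError: are hypothetical names):

#import <libkern/OSAtomic.h>

static NSError *gSyncError = nil;             // payload, written by the network thread
static volatile int32_t gSyncErrorReady = 0;  // the int that serves as the flag

// On the network thread, after a request fails:
void ReportSyncError(NSError *error) {
    gSyncError = [error retain];                              // write the larger data block first...
    OSAtomicCompareAndSwap32Barrier(0, 1, &gSyncErrorReady);  // ...then atomically flip the switch
}

// On the UI thread, e.g. from a periodic NSTimer callback:
- (void)checkSyncStatus:(NSTimer *)timer {
    if (OSAtomicCompareAndSwap32Barrier(1, 0, &gSyncErrorReady)) {
        NSError *error = [gSyncError autorelease];
        gSyncError = nil;
        [self presentSyncError:error]; // hypothetical UI hook
    }
}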
Background story
I am developing a big iOS app. This app works under specific assumptions. The main one is that the app should work offline, with internal storage that is a snapshot of the last synchronized state of the data saved on the server. I decided to use CoreData to handle this storage. Every time the app launches, I check whether a WiFi connection is available and then try to synchronize the storage with the server. The synchronization can take about 3 minutes because of the size of the data.
The synchronization process consists of several stages and in each of them I:
fetch some data from the server (XML)
deserialize it
save it in Core Data
Problem
The synchronization process can be interrupted for several reasons (internet connection, server down, user leaving the application, etc.). This may cause the data to be out of sync.
Let's assume that the synchronization process has 5 stages and it breaks after the third. That results in 3/5 of the data being updated in the internal storage and the rest being out of sync. I can't allow this because the data are strongly interconnected (business logic).
Goal
I don't know if it is possible, but I'm thinking about implementing one solution. At the start of the synchronization process I would like to create a snapshot (some kind of copy) of the current state of Core Data and work on it during the synchronization process. When the synchronization process completes successfully, this snapshot could overwrite the current CoreData state. When the synchronization is interrupted, the snapshot can simply be discarded. My internal storage would stay intact.
Questions
How to create a CoreData snapshot?
How to work with a CoreData snapshot?
How to overwrite the CoreData state with a snapshot?
Thanks in advance for any help. Code examples, if possible, will be appreciated.
EDIT 1
The size of the data is too big to handle with multiple CoreData contexts. During synchronization I save the current context multiple times to clean up memory. If I don't, the application crashes with a memory error.
I think it should be resolved with multiple NSPersistentStoreCoordinators, using for example this method: link. Unfortunately, I don't know how to implement this.
You should do exactly what you said. Just create a class (let's call it SyncBuffer) with "load", "sync" and "save" methods.
The "load" method should read all entities from CoreData and store them in class variables.
The "sync" method should perform all the synchronisation using the class variables.
Finally, the "save" method should save all values from the class variables back to CoreData - here you can even remove all data from CoreData and save the brand-new values from the SyncBuffer.
A CoreData stack is composed, at its core, of three components: a context (NSManagedObjectContext), a model (NSManagedObjectModel), and the store coordinator (NSPersistentStoreCoordinator/NSPersistentStore).
What you want is two different contexts that share the same model but use two different stores. The store itself will be of the same type (i.e. an SQLite db) but use a different source file.
At this page you can see some documentation about the stack:
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CoreData/InitializingtheCoreDataStack.html#//apple_ref/doc/uid/TP40001075-CH4-SW1
The NSPersistentContainer is a convenience class to initialise the CoreData stack.
Take the example of the initialisation of an NSPersistentContainer from the link: you can have the exact same code to initialise it, twice, with the only difference that the two NSPersistentContainers use different .sqlite files. For instance, you could have two properties in your app delegate, called managedObjectContextForUI and managedObjectContextForSyncing, that load different .sqlite files. In your program you then use the context from one store to show current data to the user, and the context backed by the other store for sync operations. When the sync operations are finally done, you can swap the two files, and after clearing and reloading the NSPersistentContainer (this might be tricky, because you will want to invalidate and reload all managed objects: you are switching to an entirely new context) you can show the newly synced data to the user and start syncing again on a new .sqlite file.
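As a sketch, initialising the same model twice against different store files could look like this (the model name "Model" and the file names are assumptions):

- (NSPersistentContainer *)containerForStoreNamed:(NSString *)fileName {
    NSPersistentContainer *container = [NSPersistentContainer persistentContainerWithName:@"Model"];
    NSURL *docs = [[[NSFileManager defaultManager] URLsForDirectory:NSDocumentDirectory
                                                          inDomains:NSUserDomainMask] lastObject];
    NSURL *storeURL = [docs URLByAppendingPathComponent:fileName];
    container.persistentStoreDescriptions =
        @[[NSPersistentStoreDescription persistentStoreDescriptionWithURL:storeURL]];
    [container loadPersistentStoresWithCompletionHandler:^(NSPersistentStoreDescription *desc, NSError *error) {
        if (error) NSLog(@"Failed to load store %@: %@", desc.URL, error);
    }];
    return container;
}

// e.g. in the app delegate:
// self.managedObjectContextForUI = [self containerForStoreNamed:@"UI.sqlite"].viewContext;
// self.managedObjectContextForSyncing = [[self containerForStoreNamed:@"Sync.sqlite"] newBackgroundContext];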
The way I understand the problem is that you wish to be able to download a large "object graph". It is, however, so large that it cannot be loaded into memory at once, so you would have to break it into chunks and then merge it locally into Core Data.
If that is the case, I think it's not trivial. I am not sure I can think of a direct solution without understanding the object relations, and even then it may be really overwhelming.
An overly simplistic solution may be to generate the SQLite file on the backend and then download it in chunks; it seems ugly, but it serves to separate the business logic from the sync, i.e. the SQLite file becomes the transport layer. So I think the essence of a solution would be to find a way to physically represent the data you are syncing in a format that allows splitting it into chunks and that can afterwards be merged into an SQLite file (if you insist on using Core Data).
Please also note that, as far as I know, Amazon (https://aws.amazon.com/appsync/) and Realm (https://realm.io/blog/introducing-realm-mobile-platform/) provide background sync of your local database, but those are paid services and you would have to be careful not to get locked in (don't depend on their libs in your model layer; have a translation layer instead).
Sometimes operations depend on each other; the user cannot update a record until they have inserted it. How can client changes be synchronised/uploaded to the server without blocking the GUI, if user interactions are more frequent than the network communication allows?
In the current version of my app, I store changes in Core Data and at the same time send the change to the backend, and until a success message returns I block the GUI to keep client and server storage in a consistent state. But I know it is not a good approach, especially if there are a lot of controls in the same GUI and the user can manipulate them quickly, with short delays between actions. It is annoying to have to wait.
What general approach do you recommend for building a responsive app that does not depend on, and is able to hide, the relative slowness of network communication? Any good tutorial about it?
This is a theoretical question, and I'm expecting a theoretical answer.
A good way is to have this setup with parent and child managed object contexts:
SavingContext (background, saves to persistent store)
MainContext (main thread, child context of saving context)
WorkerContext (background, child context of main context)
Thanks Marcus Zarra.
UI-initiated changes in the model get saved right away in the main context. You then send them to the backend from a spawned worker context. You can have several of these without problems.
Once the response comes from the server, you save the changes in the worker context which "pushes" them up to the main context. Here you can define a merge policy that resolves any conflicts (for details please ask a new question).
The main context can now update the UI with the new information (if any).
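A minimal sketch of that stack and the save chain (coordinator stands in for your existing NSPersistentStoreCoordinator):

NSManagedObjectContext *savingContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
savingContext.persistentStoreCoordinator = coordinator;

NSManagedObjectContext *mainContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
mainContext.parentContext = savingContext;

// Spawn one of these per backend round-trip:
NSManagedObjectContext *workerContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
workerContext.parentContext = mainContext;

[workerContext performBlock:^{
    // apply the server's response to objects in workerContext, then push upward:
    NSError *error = nil;
    if ([workerContext save:&error]) {           // -> mainContext (UI can update now)
        [mainContext performBlock:^{
            NSError *mainError = nil;
            if ([mainContext save:&mainError]) { // -> savingContext
                [savingContext performBlock:^{
                    NSError *storeError = nil;
                    [savingContext save:&storeError]; // -> persistent store, off the main thread
                }];
            }
        }];
    }
}];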
My app connects to a Parse server asynchronously and downloads the necessary data into the app's Core Data store. I would then like to display this data in a table view. But in most cases - as the connection is asynchronous - the table view accesses the data store much faster than the download completes. In this situation, I get an empty table view cell, and just after that the data is ready in the data store.
What is the best way to deal with the delays caused by asynchronous downloads? Is there a concept that I'm missing? Is it NSFetchedResultsController?
What do you think is the best way to deal with the delays caused by asynchronous downloads?
It depends on the requirements you have. In particular, if the user can interact with the UI during async downloads, you can do nothing about it; otherwise you could just use a spinner to tell them something is downloading and block interaction until the sync has finished.
Anyway, in both cases, you should say something more about the download. In particular, are you saving the data on a different thread (different from the main one)? If so, you should merge the changes from the context you use in the background into the context associated with the NSFetchedResultsController (always the main one, since the NSFetchedResultsController manages UI elements).
Is there a "concept" that I miss and is it NSFetchedResultsController?
Did you correctly set up the NSFetchedResultsControllerDelegate? If so, the NSFetchedResultsController tracks changes to the entity you registered in your fetch request. No changes will be tracked for other entities.
Asynchronicity is a design problem that you will need to work with. Look at some of the other popular applications in your field and see how they solve it. Do they show a spinner (I personally hate that) or do they show some unobtrusive indicator that data is being downloaded (better)?
If you use an NSFetchedResultsController (which I am guessing from your question you currently are not), you will get the data displayed once it is saved in Core Data, with no additional effort on your part. So you can at least show the data as soon as possible.
In the meantime, I recommend letting the cells/table be empty and letting the user know that your app is working. Display the data as soon as possible. Perhaps consider downloading the data in pieces so that they can start to see it ASAP.
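For reference, a minimal NSFetchedResultsController setup might look like this (the entity name "Item", the sort key, and mainContext are assumptions); with the delegate wired up, rows appear as soon as the download saves into the context:

NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
// An NSFetchedResultsController requires at least one sort descriptor:
request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"name" ascending:YES]];

NSFetchedResultsController *frc =
    [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                        managedObjectContext:mainContext
                                          sectionNameKeyPath:nil
                                                   cacheName:nil];
frc.delegate = self; // implement NSFetchedResultsControllerDelegate to insert/update rows

NSError *error = nil;
if (![frc performFetch:&error]) {
    NSLog(@"Fetch failed: %@", error);
}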
I am searching for the best possible way to update a fairly large core-data based dataset in the background, with as little effect on the application UI (main thread) as possible.
There's some good material available on this topic including:
Session 211 from WWDC 2013 (Core Data Performance Optimization and Debugging, from around 25:30 onwards)
Importing Large Data Sets from objc.io
Common Background Practices from objc.io (Core Data in the Background)
Backstage with Nested Managed Object Contexts
Based on my research and personal experience, the best option available is to effectively use two separate Core Data stacks that only share data at the database (SQLite) level. This means we need two separate NSPersistentStoreCoordinators, each of them having its own NSManagedObjectContext. With write-ahead logging enabled on the database (the default from iOS 7 onwards), the need for locking can be avoided in almost all cases (except when there are two or more simultaneous writes, which is not likely in my scenario).
In order to do efficient background updates and conserve memory, one also needs to process data in batches and periodically save the background context, so the dirty objects get stored to the database and flushed from memory. One can use the NSManagedObjectContextDidSaveNotification generated at this point to merge the background changes into the main context, but in general you don't want to update your UI immediately after a batch has been saved. You want to wait until the background job is completely done and then refresh the UI (as recommended in both the WWDC session and the objc.io articles). This effectively means that the application's main context remains out of sync with the database for a certain period.
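A sketch of that batched import-and-save loop (the batch size, records, backgroundContext, and the parsing step are placeholders):

static const NSUInteger kBatchSize = 250;
NSUInteger count = 0;
for (NSDictionary *payload in records) {
    @autoreleasepool {
        // ... create or update a managed object in backgroundContext from payload ...
        if (++count % kBatchSize == 0) {
            NSError *error = nil;
            [backgroundContext save:&error]; // store the dirty objects to the database
            [backgroundContext reset];       // and flush them from memory
        }
    }
}
NSError *error = nil;
[backgroundContext save:&error]; // final partial batch
// Only now, with the import fully done, merge/refresh the main context and UI.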
All this leads me to my main question: what can go wrong if I change the database in this manner without immediately telling the main context to merge changes? I'm assuming it's not all sunshine and roses.
One specific scenario that I have in my head is: what happens if a fault needs to be fulfilled for an object loaded in the main context, and the background operation has meanwhile deleted that object from the database? Can this, for instance, happen with an NSFetchedResultsController-based table view that uses a batchSize to fetch objects incrementally into memory? I.e., an object that has not yet been fully fetched gets deleted, but then we scroll up to a point where the object needs to be loaded. Is this a potential problem? Can other things go wrong? I'd appreciate any input on this matter.
Great question!
I.e., an object that has not yet been fully fetched gets deleted but then we scroll up to a point where the object needs to get loaded. Is this a potential problem?
Unfortunately it'll cause problems. The following exception will be thrown:
Terminating app due to uncaught exception 'NSObjectInaccessibleException', reason: 'CoreData could not fulfill a fault for '0xc544570 <x-coredata://(...)>'
This blog post (section titled "How to do concurrency with Core Data?") might be somewhat helpful, but it doesn't exhaust this topic. I'm struggling with the same problems in an app I'm working on right now and would love to read a write-up about it.
Based on your question, comments, and my own experience, it seems the larger problem you are trying to solve is:
1. Using an NSFetchedResultsController on the main thread with thread confinement
2. Importing a large data set, which will insert, update, or delete managed objects in a context.
3. The import causes large merge notifications to be processed by the main thread to update the UI.
4. The large merge has several possible effects:
- The UI gets slow, or too busy to be usable. This may be because you are using beginUpdates/endUpdates to update a tableview in your NSFetchedResultsControllerDelegate, and you have a LOT of animations queuing up because of the large merge.
- Users can run into "Could not fulfill fault" as they try to access a faulted object which has been removed from the store. The managed object context thinks it exists, but when it goes to the store to fulfill the fault, it has already been deleted. If you are using reloadData to update a tableview in your NSFetchedResultsControllerDelegate, you are more likely to see this happen than when using beginUpdates/endUpdates.
The approach you are trying to use to solve the above issues is:
- Create two NSPersistentStoreCoordinators, each attached to the same NSPersistentStore or at least the same NSPersistentStore SQLite store file URL.
- Your import occurs on NSManagedObjectContext 1, attached to NSPersistentStoreCoordinator 1, and executing on some other thread(s). Your NSFetchedResultsController is using NSManagedObjectContext 2, attached to NSPersistentStoreCoordinator 2, running on the main thread.
- You are moving the changes from NSManagedObjectContext 1 to 2
You will run into a few problems with this approach.
- An NSPersistentStoreCoordinator's job is to mediate between its attached NSManagedObjectContexts and its attached stores. In the multiple-coordinator-context scenario you are describing, changes made to the underlying store by NSManagedObjectContext 1 that alter the SQLite file will not be seen by NSPersistentStoreCoordinator 2 and its context. 2 does not know that 1 changed the file, and you will get "Could not fulfill fault" and other exciting exceptions.
- You will still, at some point, have to put the changed NSManagedObjects from the import into NSManagedObjectContext 2. If these changes are large, you will still have UI problems and the UI will be out of sync with the store, potentially leading to "Could not fulfill fault".
- In general, because NSManagedObjectContext 2 is not using the same NSPersistentStoreCoordinator as NSManagedObjectContext 1, you are going to have problems with things being out of sync. This isn't how these things are intended to be used together. If you import and save in NSManagedObjectContext 1, NSManagedObjectContext 2 is immediately in a state not consistent with the store.
Those are SOME of the things that could go wrong with this approach. Most of these problems will become visible when firing a fault, because that accesses the store. You can read more about how this process works in the Core Data Programming Guide, while the Incremental Store Programming Guide describes the process in more detail. The SQLite store follows the same process that an incremental store implementation does.
Again, the use case you are describing - getting a ton of new data, executing find-or-create on the data to create or update managed objects, and deleting "stale" objects that may in fact be the majority of the store - is something I have dealt with every day for several years, seeing all of the same problems you are. There are solutions - even for imports that change 60,000 complex objects at a time, and even using thread confinement! - but that is outside the scope of your question.
(Hint: Parent-Child contexts don't need merge notifications).
Two Persistent Store Coordinators (PSCs) is certainly the way to go with large datasets. File locking is faster than the locking within Core Data.
There's no reason you couldn't use the background PSC to create thread-confined NSManagedObjectContexts, one for each operation you do in the background. However, instead of letting Core Data manage the queueing, you now need to create NSOperationQueues and/or threads to manage the operations based on what you're doing in the background. NSManagedObjectContexts are free and not expensive to create. Once you do this, you can hang onto an NSManagedObjectContext, use it only during that one operation's or thread's lifetime, build up as many changes as you want, and wait until the end to commit them and merge them to the main thread however you decide. Even if you have some main-thread writes, you can still refetch/merge back into your thread's context at crucial points in the operation's lifetime.
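A sketch of one such thread-confined context inside an NSOperation's main (backgroundCoordinator stands in for the second PSC):

- (void)main {
    // Created on, and used only from, this operation's thread:
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
    [context setPersistentStoreCoordinator:backgroundCoordinator];
    // ... build up as many changes as you want ...
    NSError *error = nil;
    if (![context save:&error]) {
        NSLog(@"Background save failed: %@", error);
    }
    // The context is cheap to create and can simply be thrown away afterwards.
}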
Also, it's important to know that if you're working on large sets of data, you don't need to worry about merging contexts as long as you aren't touching anything else. For example, if you have class A and class B with two separate operations/threads working on them and they have no direct relationship, you do not have to merge the contexts when one changes; you can keep rolling with the changes. The only real need for merging background contexts in this fashion is if there are direct relationships that may fault. It would be better to prevent this, though, through some sort of serialization, whether with NSOperationQueue or whatever else. So feel free to work away on different objects in the background; just be careful with their relationships.
I've worked on large-scale Core Data projects and had this pattern work very well for me.
Indeed, this is the best Core Data scenario you can work with: almost no staleness in the main UI, and easy background management of your data. When you want to tell the main context (and maybe a currently running NSFetchedResultsController) about changes, you listen for save notifications from the backgroundObjectContext like this:
[[NSNotificationCenter defaultCenter] addObserver:self
                                         selector:@selector(reloadFetchedResults:)
                                             name:NSManagedObjectContextDidSaveNotification
                                           object:backgroundObjectContext];
Then you can merge the changes, but make the main thread context catch them before continuing. When you receive the NSManagedObjectContextDidSaveNotification, the changes have not yet been merged into the main context. Hence the performBlockAndWait is mandatory, so the main context gets the changes and the NSFetchedResultsController then updates its values correctly.
- (void)reloadFetchedResults:(NSNotification *)notification
{
    NSManagedObjectContext *moc = [notification object];
    if ([moc isEqual:backgroundObjectContext])
    {
        // Delete the NSFetchedResultsController caches if the save contains deletions
        if ([[notification.userInfo objectForKey:NSDeletedObjectsKey] count]) {
            [NSFetchedResultsController deleteCacheWithName:nil];
        }
        // Block the background execution of the save, and merge the changes first
        [managedObjectContext performBlockAndWait:^{
            [managedObjectContext mergeChangesFromContextDidSaveNotification:notification];
        }];
    }
}
There is a pitfall no one has noticed: you can get the save notification before the background context has actually saved the object you want to merge. If you want to avoid problems caused by a faster main context asking for an object that has not yet been saved by the background context, you should (you really should) call obtainPermanentIDsForObjects:error: before any background save. Then you are safe to call mergeChangesFromContextDidSaveNotification:. This ensures that the merge receives a valid permanent ID.
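In code, that means something like this right before the background save (backgroundObjectContext as above):

NSError *error = nil;
if (![backgroundObjectContext obtainPermanentIDsForObjects:
        [[backgroundObjectContext insertedObjects] allObjects] error:&error]) {
    NSLog(@"Could not obtain permanent IDs: %@", error);
}
// Now it is safe to save and let the did-save notification trigger the merge.
[backgroundObjectContext save:&error];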
I'm designing an application that relies heavily on a database. I need the application to be resilient to short losses of connectivity to the database (the network going down for a few seconds, for example). What are the usual patterns people use for this kind of problem? Is there something I can do at the database access layer to gracefully handle a small glitch in the network connection to the DB? (I'm using Hibernate + Oracle JDBC + a DBCP pool.)
I'll assume you have hidden every database access behind a DAO or something similar.
Now create wrappers around these DAOs that try to call them and, in case of an exception, wait a second and retry. Of course this will cause 'hanging' of the application during a DB outage, but it will come back to life when the database becomes available.
If this is not acceptable, you'll have to move the cut closer to the UI layer. Consider the following approach:
The user triggers a request.
Wrap all the request information in a message and put it on a queue.
Return to the user, telling them that the request will be processed shortly.
A worker registered on the queue processes the request, retrying when database problems occur.
Note that you are now deep in concurrency land. So you must handle things like requests referencing an entity that has already been deleted.
Read up on 'eventual consistency'.
Since you are using Hibernate, you'll have to deal with lazy loading. An interruption in connectivity will kill your session, so it might be best for you not to use lazy loading at all, but to work with detached objects.