I'm working on a program which repeatedly needs to fetch new data, parse it and store it using Core Data. One of the problems is that the data is split up over multiple web service requests and so the parsing needs to be split up in various parts before the final object is assembled. All the parsing also needs to happen in the background.
I thought about creating a new NSManagedObjectContext per request, but then the problem is that I have to find a way to pass my objects from one context to the other and that seems quite tricky to me, considering it can easily take 10 parsing steps until the object is complete.
So now I thought about using a single NSManagedObjectContext initialized with NSPrivateQueueConcurrencyType. It seems to work fine, except that sometimes I receive an EXC_BAD_ACCESS in one step of the flow. So my question is: am I on the right path here? I know that I can nest multiple performBlock: calls and that Core Data will take care of the threading. But can I also use multiple non-nested performBlock: calls spread over time (which is what I'm doing), as long as they all run against the same NSManagedObjectContext?
I implemented it like this and it turns out it works fine.
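For reference, a minimal sketch of that approach (the parse helpers, the JSON payloads, and the coordinator are hypothetical names, not from the original code): every touch of the context goes through performBlock:, so non-nested calls spread over time are still serialized on the context's private queue.

NSManagedObjectContext *context = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
context.persistentStoreCoordinator = coordinator; // your existing coordinator

// Step 1, scheduled when the first response arrives:
[context performBlock:^{
    [self parseUserJSON:userJSON intoContext:context]; // hypothetical helper
}];

// Step 2, scheduled later from another completion handler; both blocks
// run one after the other on the context's private queue:
[context performBlock:^{
    [self parseAvatarJSON:avatarJSON intoContext:context]; // hypothetical helper
    NSError *error = nil;
    if (![context save:&error]) {
        NSLog(@"Save failed: %@", error);
    }
}];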
I have read several tutorials that recommend using two (or more) NSManagedObjectContexts when implementing Core Data, so as not to block the UI on the main queue. I am a little confused, however, because some recommend attaching a context of type NSMainQueueConcurrencyType to the persistent store coordinator and giving it a child context of type NSPrivateQueueConcurrencyType, while others suggest the opposite.
I would personally think the best setup for two contexts would be persistent store coordinator -> NSPrivateQueueConcurrencyType -> NSMainQueueConcurrencyType, saving only on the private context and reading only from the main context. My understanding of the benefits of this setup is that saves on the private context don't have to go through the main context, and reads on the main context always include the changes made on the private context.
I know that many apps require a unique solution that this setup might not work for, but as a general good practice, does this make sense?
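For concreteness, a minimal sketch of the stack described above (persistent store coordinator -> private writer -> main reader), assuming an existing coordinator:

NSManagedObjectContext *writerContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
writerContext.persistentStoreCoordinator = coordinator;

NSManagedObjectContext *mainContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
mainContext.parentContext = writerContext;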
Edit:
Some people have pointed out that this setup isn't necessary since the introduction of NSPersistentContainer. The reason I am asking about it is that I've inherited a huge project at work that uses a pre-iOS-10 setup, and it's experiencing issues.
I am open to rewriting our Core Data stack using NSPersistentContainer, but I wouldn't be comfortable spending the time on it unless I could find an example ahead of time of how it should be set up with respect to our use cases.
Here are the steps that most of our main use cases follow:
1) User edits an object (e.g. adds a photo or text to an abstract object).
2) An object (a sync task) is created to encapsulate an API call that updates the edited object on the server. Sync tasks are saved to Core Data in a queue so they fire one after the other, and only when internet is available (thus allowing offline editing).
3) The edited object is also immediately saved to Core Data and returned to the user so that the UI reflects the updates.
With NSPersistentContainer, would having all the writing done in performBackgroundTask, and all the viewing done on viewContext suffice for our needs for the above use cases?
Since iOS 10 you don't need to worry about any of this; just use the contexts NSPersistentContainer provides for you.
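A minimal sketch of that setup (assuming a model named "Model"): reads stay on viewContext, writes go through performBackgroundTask:.

NSPersistentContainer *container = [NSPersistentContainer persistentContainerWithName:@"Model"];
[container loadPersistentStoresWithCompletionHandler:^(NSPersistentStoreDescription *description, NSError *error) {
    if (error) {
        NSLog(@"Store load failed: %@", error);
    }
}];
// Let background saves flow into the UI context automatically.
container.viewContext.automaticallyMergesChangesFromParent = YES;

// Writes, e.g. saving an edited object or enqueuing a sync task:
[container performBackgroundTask:^(NSManagedObjectContext *context) {
    // ... create or update managed objects on `context` ...
    NSError *saveError = nil;
    if (![context save:&saveError]) {
        NSLog(@"Background save failed: %@", saveError);
    }
}];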
I'm very used to creating a Core Data stack synchronously. However, I just noticed that this Apple-provided example doesn't do that; instead, it adds the persistent store coordinator on a background thread.
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/InitializingtheCoreDataStack.html
Why?
What are the ramifications?
Will this approach 'just work' in place of a synchronous Core Data stack setup?
The call to addPersistentStoreWithType... can block if you are doing a migration or interacting with iCloud. Therefore it is safer to throw that onto a background queue so that there is no risk of blocking the UI thread.
In addition, since applicationDidFinishLaunching: runs on a ticking clock (the system watchdog can kill an app that blocks launch for too long), you do not want to risk blocking that method for any longer than necessary. Since the example Core Data stack is created through that method, it is best to make sure it returns as quickly as possible.
Update
Any chance of adding how to handle this to your answer - i.e. how should the asynchronous addition be handled, any best practices? I'm guessing that fetched results controllers would update once the store coordinator is added, so showing some sort of 'loading' status in the UI until then may be best. It seems like a little too much for smaller projects though... would everything 'just work' without any code to handle the background setup? e.g. do fetches block until the coordinator is added?
In general your addPersistentStore call is going to finish long before any NSFetchedResultsController hits it. If the addPersistentStore does somehow come in after the NSFetchedResultsController fires then you will need to execute a second performFetch: as Core Data does not fire any internal notifications when that action is complete.
Personally, in the background queue, I will either execute a completion block/closure when addPersistentStore is done or fire off an NSNotification. But I only add that after I have discovered an issue during testing; otherwise it is just premature optimization.
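For illustration, a sketch of that pattern; the queue choice and the notification name are hypothetical:

dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
    NSError *error = nil;
    NSPersistentStore *store = [coordinator addPersistentStoreWithType:NSSQLiteStoreType
                                                         configuration:nil
                                                                   URL:storeURL
                                                               options:nil
                                                                 error:&error];
    if (!store) {
        NSLog(@"Failed to add persistent store: %@", error);
        return;
    }
    dispatch_async(dispatch_get_main_queue(), ^{
        // Tell waiting view controllers they can now call performFetch:.
        [[NSNotificationCenter defaultCenter] postNotificationName:@"StoreDidLoadNotification"
                                                            object:nil];
    });
});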
We have been trying to debug a Core Data multiple-context/threading issue wherein merging a Core Data save notification into our main thread NSManagedObjectContext is sporadically crashing the app. This is crashing ~2% of our app sessions and we are at a loss as to how to solve this. We would really appreciate any guidance or general advice on what could possibly cause this crash.
We have a Core Data setup that looks like this:
N.B. This is the default Core Data stack in Magical Record v2.3 created from [MagicalRecord setupAutoMigratingCoreDataStack]
This is the scenario where our app is crashing:
1) An HTTP request returns JSON.
2) The JSON is parsed into NSManagedObjects (some new entities, some updated entities) on the Root Saving Context.
3) The Root Saving Context saves to the persistent store.
4) NSManagedObjectContextDidSaveNotification is broadcast by Core Data. The Default Context on the main queue observes this and calls mergeChangesFromContextDidSaveNotification: with the NSDictionary of changes on the main thread.
5) It crashes when objectID is sent to an invalid object (most likely an NSManagedObject that has been deallocated).
This occurs inside the private implementation of NSManagedObjectContext's mergeChangesFromContextDidSaveNotification:, so it is impossible for us to see what has actually gone wrong; all we can tell at this point is that an object which should exist does not.
This only happens on a small percentage of Core Data saves, indicating that it may not be a fundamental flaw in our Core Data → API stack. Moreover, there is no indication that the size or type of the changes (insertions/updates/deletions) in the save notification has any impact on the likelihood of the crash.
The documentation of NSManagedObjectContextDidSaveNotification says that:
"You can pass the notification object to mergeChangesFromContextDidSaveNotification: on another thread, however you must not use the managed object in the user info dictionary directly on another thread. For more details, see Concurrency with Core Data in Core Data Programming Guide."
Maybe this is the issue? I would make sure the object you get from the notification is merged into the Default Context on the same thread the notification was posted on by the Root Saving Context.
It's been some time now since this question was posted and after rediscovering it I'd like to answer my own question for the sake of others who find this thread.
In my circumstance I had migrated a large code base from sibling NSManagedObjectContexts updated via NSManagedObjectContextDidSaveNotifications. However, the problem did not really have anything to do with that, even though the migration did expose the issue.
The real cause was older parts of the code, set up by previous engineers, that had registered KVO observers on NSManagedObjects and their properties. It transpired that KVO on Core Data entities is in fact a very bad idea.
More accurately, it appeared that the crash happened when KVO was set up on an entity and either that object, or the target of one of its relationships, was deleted from the NSPersistentStore. This deletion scenario did not seem to be the only trigger, but it was definitely a very prominent one in my situation.
Lessons learnt:
Use a fetched results controller when you need to. KVO is not a convenient shortcut, and you shouldn't put off migrating dodgy Core Data KVO code to NSFetchedResultsController (see the sketch below) or another sensible alternative; the procrastination will just hurt you.
Multithreaded Core Data is a difficult but very worthwhile skill to become an expert in. Knowing your Core Data stack and the nuances and limitations of Core Data multithreading is absolutely worth all the mental anguish.
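For example, a minimal NSFetchedResultsController setup (the "Item" entity, the createdAt key, and mainContext are hypothetical placeholders) as the safe replacement for KVO on managed objects:

NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
request.sortDescriptors = @[ [NSSortDescriptor sortDescriptorWithKey:@"createdAt" ascending:NO] ];

NSFetchedResultsController *frc = [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                                                       managedObjectContext:mainContext
                                                                         sectionNameKeyPath:nil
                                                                                  cacheName:nil];
frc.delegate = self; // delivers insert/update/delete callbacks safely

NSError *error = nil;
if (![frc performFetch:&error]) {
    NSLog(@"Fetch failed: %@", error);
}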
One possibility is that your persistent store has become corrupted and is in an inconsistent state. If this happens an error code is generated which Magical Record does not necessarily deal with. This can be the source of a number of difficult-to-repeat apparently-random crashes related to Magical Record (and may or may not be considered a Magical Record bug).
It's worth reading the Magical Record issues threads here (same issue) and here (different issue, but could be similar cause). When I hit these problems I managed to make some temporary patch fixes following various hints in those threads, but ultimately I decided to remove my dependency on Magical Record, and I have had no problems since then.
I am developing an application that uses Core Data for internal storage. This application has the following functionalities :
Synchronize data with a server by downloading and parsing a large XML file, then save the entries with Core Data.
Allow user to make fetches (large data fetches) and CRUD operations.
I have read through a lot and a lot of documentation that there are several patterns to follow in order to apply multithreading with Core Data :
Nested contexts: this pattern seems to have many performance issues (child contexts block their ancestors when performing fetches).
Use one main thread context and background worker contexts.
Use a single context (main thread context) and apply multithreading with GCD.
I tried the three approaches mentioned above and realized that the last two work fine. However, I am not sure whether these approaches are correct in terms of performance.
Is there a well-known, performant pattern to apply in order to build a robust application that implements the functionalities mentioned above?
rokridi,
In my Twitter iOS apps, Retweever and #chat, I use a simple two-MOC model. All database insertions and deletions take place on a private concurrent insertionMOC. The main MOC merges -save: notifications from the insertionMOC, and during merge processing emits a custom UI update notification. This lets me work in a staged fashion: all tweets that come into the app are processed in the background and presented to the UI when everything is done.
If you download the apps, #chat's engine has been modernized and is more efficient and more isolated from the main thread than Retweever's engine.
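A rough sketch of that two-MOC arrangement (context names and the UI notification are illustrative, not the apps' actual code):

NSManagedObjectContext *insertionMOC = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
insertionMOC.persistentStoreCoordinator = coordinator;

NSManagedObjectContext *mainMOC = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
mainMOC.persistentStoreCoordinator = coordinator;

// Merge background saves into the main MOC, then emit a UI update notification.
[[NSNotificationCenter defaultCenter] addObserverForName:NSManagedObjectContextDidSaveNotification
                                                  object:insertionMOC
                                                   queue:[NSOperationQueue mainQueue]
                                              usingBlock:^(NSNotification *note) {
    [mainMOC mergeChangesFromContextDidSaveNotification:note];
    [[NSNotificationCenter defaultCenter] postNotificationName:@"UIUpdateNotification" object:nil];
}];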
Anon,
Andrew
Apple recommends using a separate context for each thread.
The pattern recommended for concurrent programming with Core Data is thread confinement: each thread must have its own entirely private managed object context. There are two possible ways to adopt the pattern:
Create a separate managed object context for each thread and share a single persistent store coordinator. This is the typically-recommended approach.
Create a separate managed object context and persistent store coordinator for each thread. This approach provides for greater concurrency at the expense of greater complexity (particularly if you need to communicate changes between different contexts) and increased memory usage.
See the Apple documentation.
As per the Apple documentation, use thread confinement to support concurrency.
Create one managed object context per thread; it will make your life easier. This applies when you are parsing large data in the background while also fetching data on the main thread to display in the UI.
As for the merging issue, there are a couple of good ways to handle it:
Never pass managed objects between threads; pass object IDs to the other thread if necessary and access the objects from there. For example, when you save data parsed from XML, save it on the current thread's MOC, collect the object IDs, pass them to the UI thread, and re-fetch the objects there (see the sketch after this list).
You can also register for the save notification; when one MOC changes, you will be notified with a userInfo dictionary containing the updated objects, which you can pass to the merge-changes method call.
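A sketch of the objectID hand-off from the first point (backgroundContext and mainContext are assumed to already exist):

[backgroundContext performBlock:^{
    // ... insert/update managed objects while parsing the XML ...
    NSArray *inserted = backgroundContext.insertedObjects.allObjects;
    NSError *error = nil;
    // Temporary IDs are useless in other contexts; make them permanent first.
    [backgroundContext obtainPermanentIDsForObjects:inserted error:&error];
    NSArray *objectIDs = [inserted valueForKey:@"objectID"];
    if (![backgroundContext save:&error]) {
        return;
    }
    [mainContext performBlock:^{
        for (NSManagedObjectID *objectID in objectIDs) {
            // Re-fetch each object in the UI thread's own context.
            NSManagedObject *object = [mainContext existingObjectWithID:objectID error:NULL];
            NSLog(@"Ready for the UI: %@", object);
        }
    }];
}];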
I am searching for the best possible way to update a fairly large core-data based dataset in the background, with as little effect on the application UI (main thread) as possible.
There's some good material available on this topic including:
Session 211 from WWDC 2013 (Core Data Performance Optimization and Debugging, from around 25:30 onwards)
Importing Large Data Sets from objc.io
Common Background Practices from objc.io (Core Data in the Background)
Backstage with Nested Managed Object Contexts
Based on my research and personal experience, the best option available is to effectively use two separate Core Data stacks that only share data at the database (SQLite) level. This means that we need two separate NSPersistentStoreCoordinators, each having its own NSManagedObjectContext. With write-ahead logging enabled on the database (the default from iOS 7 onwards), the need for locking can be avoided in almost all cases (except when we have two or more simultaneous writes, which is not likely in my scenario).
In order to do efficient background updates and conserve memory, one also needs to process data in batches and periodically save the background context, so the dirty objects get stored to the database and flushed from memory. One can use the NSManagedObjectContextDidSaveNotification generated at that point to merge the background changes into the main context, but in general you don't want to update your UI immediately after every batch has been saved. You want to wait until the background job is completely done and then refresh the UI (recommended in both the WWDC session and the objc.io articles). This effectively means that the application's main context remains out of sync with the database for a certain period of time.
All this leads me to my main question: what can go wrong if I change the database in this manner without immediately telling the main context to merge the changes? I'm assuming it's not all sunshine and roses.
One specific scenario that I have in mind is: what happens if a fault needs to be fulfilled for an object loaded in the main context, but the background operation has meanwhile deleted that object from the database? Can this happen, for instance, in an NSFetchedResultsController-based table view that uses a batchSize to fetch objects incrementally into memory? I.e., an object that has not yet been fully fetched gets deleted, but then we scroll up to a point where the object needs to get loaded. Is this a potential problem? Can other things go wrong? I'd appreciate any input on this matter.
Great question!
I.e., an object that has not yet been fully fetched gets deleted, but then we scroll up to a point where the object needs to get loaded. Is this a potential problem?
Unfortunately, it'll cause problems. The following exception will be thrown:
Terminating app due to uncaught exception 'NSObjectInaccessibleException', reason: 'CoreData could not fulfill a fault for '0xc544570 <x-coredata://(...)>'
This blog post (section titled "How to do concurrency with Core Data?") might be somewhat helpful, but it doesn't exhaust this topic. I'm struggling with the same problems in an app I'm working on right now and would love to read a write-up about it.
Based on your question, comments, and my own experience, it seems the larger problem you are trying to solve is:
1. Using an NSFetchedResultsController on the main thread with thread confinement
2. Importing a large data set, which will insert, update, or delete managed objects in a context.
3. The import causes large merge notifications to be processed by the main thread to update the UI.
4. The large merge has several possible effects:
- The UI gets slow, or too busy to be usable. This may be because you are using beginUpdates/endUpdates to update a table view in your NSFetchedResultsControllerDelegate, and you have a LOT of animations queuing up because of the large merge.
- Users can run into "Could not fulfill fault" as they try to access a faulted object which has been removed from the store. The managed object context thinks it exists, but when it goes to the store to fulfill the fault, it has already been deleted. If you are using reloadData to update a table view in your NSFetchedResultsControllerDelegate, you are more likely to see this happen than when using beginUpdates/endUpdates.
The approach you are trying to use to solve the above issues is:
- Create two NSPersistentStoreCoordinators, each attached to the same NSPersistentStore or at least the same NSPersistentStore SQLite store file URL.
- Your import occurs on NSManagedObjectContext 1, attached to NSPersistentStoreCoordinator 1, and executing on some other thread(s). Your NSFetchedResultsController is using NSManagedObjectContext 2, attached to NSPersistentStoreCoordinator 2, running on the main thread.
- You are moving the changes from NSManagedObjectContext 1 to 2
You will run into a few problems with this approach.
- An NSPersistentStoreCoordinator's job is to mediate between its attached NSManagedObjectContexts and its attached stores. In the multiple-coordinator-context scenario you are describing, changes to the underlying store made through NSManagedObjectContext 1 that alter the SQLite file will not be seen by NSPersistentStoreCoordinator 2 and its context. Coordinator 2 does not know that context 1 changed the file, and you will get "Could not fulfill fault" and other exciting exceptions.
- You will still, at some point, have to put the changed NSManagedObjects from the import into NSManagedObjectContext 2. If these changes are large, you will still have UI problems and the UI will be out of sync with the store, potentially leading to "Could not fulfill fault".
- In general, because NSManagedObjectContext 2 is not using the same NSPersistentStoreCoordinator as NSManagedObjectContext 1, you are going to have problems with things being out of sync. This isn't how these things are intended to be used together. If you import and save in NSManagedObjectContext 1, NSManagedObjectContext 2 is immediately in a state not consistent with the store.
Those are SOME of the things that could go wrong with this approach. Most of these problems will become visible when firing a fault, because that accesses the store. You can read more about how this process works in the Core Data Programming Guide, while the Incremental Store Programming Guide describes the process in more detail. The SQLite store follows the same process that an incremental store implementation does.
Again, the use case you are describing - getting a ton of new data, executing find-or-create on the data to create or update managed objects, and deleting "stale" objects that may in fact be the majority of the store - is something I have dealt with every day for several years, seeing all of the same problems you are. There are solutions - even for imports that change 60,000 complex objects at a time, and even using thread confinement! - but that is outside the scope of your question.
(Hint: Parent-Child contexts don't need merge notifications).
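As a rough illustration of that hint (context names are illustrative): the child's -save: pushes its changes straight into the parent in memory, so no merge notifications are involved.

// mainContext: your existing NSMainQueueConcurrencyType context
NSManagedObjectContext *importContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
importContext.parentContext = mainContext;

[importContext performBlock:^{
    // ... find-or-create and delete objects for one batch ...
    NSError *error = nil;
    if ([importContext save:&error]) {
        // The changes now live (unsaved) in mainContext; persist when ready.
        [mainContext performBlock:^{
            NSError *mainError = nil;
            [mainContext save:&mainError];
        }];
    }
}];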
Two persistent store coordinators (PSCs) is certainly the way to go with large datasets. File locking is faster than the locking within Core Data.
There's no reason you couldn't use the background PSC to create thread-confined NSManagedObjectContexts, one for each operation you perform in the background. However, instead of letting Core Data manage the queueing, you now need to create NSOperationQueues and/or threads to manage the operations yourself. NSManagedObjectContexts are cheap and quick to create, so you can hang onto a context for just one operation's or thread's lifetime, build up as many changes as you want, and wait until the end to commit them and merge them to the main thread however you decide (see the sketch below). Even if you have some main-thread writes, you can still refetch/merge back into your thread's context at crucial points in the operation's lifetime.
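As a sketch of that idea, one throwaway, thread-confined context per background operation (ImportOperation and backgroundPSC are hypothetical names; NSConfinementConcurrencyType matches the thread-confinement era this answer describes):

@interface ImportOperation : NSOperation
@property (nonatomic, strong) NSPersistentStoreCoordinator *backgroundPSC;
@end

@implementation ImportOperation
- (void)main {
    // A cheap context confined to whatever thread runs this operation.
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSConfinementConcurrencyType];
    context.persistentStoreCoordinator = self.backgroundPSC;

    // ... build up as many changes as needed ...

    NSError *error = nil;
    [context save:&error]; // commit once, at the end of the operation
}
@end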
It's also important to know that if you're working on large sets of data, you don't have to worry about merging contexts as long as you aren't touching something else. For example, if you have class A and class B, with two separate operations/threads working on them and no direct relationship between them, you do not have to merge the contexts when one changes; you can keep rolling with the changes. The only major need for merging background contexts in this fashion is when directly related objects are faulting. Even then, it is better to prevent the situation through some sort of serialization, whether with NSOperationQueue or anything else. So feel free to work away on different objects in the background; just be careful about their relationships.
I've worked on a large scale core data projects and had this pattern work very well for me.
Indeed, this is the best Core Data scenario you can work with: almost no main UI staleness, and easy background management of your data. When you want to notify the main context (and maybe a currently running NSFetchedResultsController), you listen for save notifications of the backgroundObjectContext like this:
[[NSNotificationCenter defaultCenter]
    addObserver:self
       selector:@selector(reloadFetchedResults:)
           name:NSManagedObjectContextDidSaveNotification
         object:backgroundObjectContext];
Then you can merge the changes, but make the main thread context catch them before continuing. When you receive the NSManagedObjectContextDidSaveNotification, the changes have not yet been merged into the main context. Hence the performBlockAndWait: is mandatory, so the main context gets the changes and the NSFetchedResultsController then updates its values correctly.
- (void)reloadFetchedResults:(NSNotification *)notification
{
    NSManagedObjectContext *moc = [notification object];
    if ([moc isEqual:backgroundObjectContext])
    {
        // Delete the fetched results cache if the save contains deletions.
        if ([[notification.userInfo objectForKey:NSDeletedObjectsKey] count]) {
            [NSFetchedResultsController deleteCacheWithName:nil];
        }
        // Block the background execution of the save and merge the changes
        // into the main context (managedObjectContext) first.
        [managedObjectContext performBlockAndWait:^{
            [managedObjectContext mergeChangesFromContextDidSaveNotification:notification];
        }];
    }
}
There is a pitfall no one has noticed: you can get the save notification before the background context has actually saved the object you want to merge. If you want to avoid problems caused by a faster main context asking for an object that has not yet been saved by the background context, you should (you really should) call obtainPermanentIDsForObjects:error: before any background save. Then you are safe to call mergeChangesFromContextDidSaveNotification:. This ensures that the merge receives valid permanent IDs.
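A sketch of that safeguard, reusing the backgroundObjectContext from the snippet above:

[backgroundObjectContext performBlock:^{
    NSArray *inserted = backgroundObjectContext.insertedObjects.allObjects;
    NSError *error = nil;
    // Promote temporary IDs to permanent ones before saving, so the merge
    // on the main context receives IDs it can actually resolve.
    if (![backgroundObjectContext obtainPermanentIDsForObjects:inserted error:&error]) {
        NSLog(@"Could not obtain permanent IDs: %@", error);
    }
    [backgroundObjectContext save:&error];
}];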