We have been trying to debug a Core Data multiple-context/threading issue wherein merging a Core Data save notification into our main thread NSManagedObjectContext is sporadically crashing the app. This is crashing ~2% of our app sessions and we are at a loss as to how to solve this. We would really appreciate any guidance or general advice on what could possibly cause this crash.
We have a Core Data setup that looks like this:
N.B. This is the default Core Data stack in Magical Record v2.3 created from [MagicalRecord setupAutoMigratingCoreDataStack]
This is the scenario where our app is crashing:
HTTP request returns JSON
JSON is parsed into NSManagedObjects (Some new entities, some updated entities) on Root Saving Context
Root Saving Context saves to persistent store
NSManagedObjectContextDidSaveNotification is broadcast by Core Data. Default context on main queue observes this and calls mergeChangesFromContextDidSaveNotification: with the NSDictionary of changes on the main thread.
It crashes when objectID is sent to an invalid object (most likely NSManagedObject has been deallocated).
This is occurring inside the private implementation of NSManagedObjectContext mergeChangesFromContextDidSaveNotification: so it is impossible for us to see what has actually gone wrong here; all we can tell at this point that an object which should exist, does not.
This only happens on a small percent of Core Data saves, indicating that may not be a fundamental flaw in our Core Data → API stack. Moreover, there is no indication that the size or type of the changes (insertions/updates/deletions) in the context changes have any impact on the likelihood of the crash.
The documentation of NSManagedObjectContextDidSaveNotification says that:
"You can pass the notification object to mergeChangesFromContextDidSaveNotification: on another thread, however you must not use the managed object in the user info dictionary directly on another thread. For more details, see Concurrency with Core Data in Core Data Programming Guide."
Maybe this is the issue? I would make sure the object u get from the notification is getting saved on Default Context on the same thread it was posted by Root.
It's been some time now since this question was posted and after rediscovering it I'd like to answer my own question for the sake of others who find this thread.
In my circumstance I had migrated a large code base from sibling NSManagedObjectContexts updated via NSManagedObjectContextDidSaveNotification's. However the problem was not really anything to do with this, even though this did expose the issue.
The real cause of this were places that there were older parts of the code, setup by previous engineers, that had setup KVO on NSManagedObjects and their properties. It transpired that KVO on Core Data entities is in fact a very very bad idea.
More accurately, it appeared that this happened when KVO was setup on entities and either the object, or the target of a relationship on this object was deleted from the NSPersistentStore. This second condition seemed to not be the only cause of the issue, but was definitely a very prominent cause in my situation.
Lesson's learnt:
Use a fetched results controller when you need to. KVO is not a convenient shortcut and you shouldn't avoid migrating dodgy Core Data KVO code to NSFetchedResultsControllers or another sensible alternative as the procrastination will just hurt you.
Multi threaded Core Data is a difficult but very worthwhile skill to become an expert in. Knowing your Core Data stack and the nuances and limitations of Core Data multithreading is absolutely worth all the mental anguish.
One possibility is that your persistent store has become corrupted and is in an inconsistent state. If this happens an error code is generated which Magical Record does not necessarily deal with. This can be the source of a number of difficult-to-repeat apparently-random crashes related to Magical Record (and may or may not be considered a Magical Record bug).
It's worth reading the Magical Record issues threads here (same issue) and here (different issue, but could be similar cause). When I hit these problems I managed to make some temporary patch fixes following various hints in those threads, but ultimately I decided to remove my dependency on Magical Record, and I have had no problems since then.
Related
I'm in the middle of development of an iOS application after working quite some time with Core Data and Magical Record and having an error:
error: NULL _cd_rawData but the object is not being turned into a fault
I didn't know Core Data before this project and as it turns out I was very naive to think I can just work with Magical Record without worrying about concurrency, as I haven't dedicated any thoughts/work about managed contexts for the main thread and background threads.
After A LOT of reading about Core Data Managed Object Contexts and Magical Record, I understand that:
NSManagedObjects are not thread safe.
NSManagedObjectId IS thread safe.
I can use: Entity *localEntity = [entity MR_inContext:localContext] of Magical Record to work with an entity in a background thread's context.
I should use Magical Record's saveWithBlock:completion: and saveWithBlockAndWait: methods to get a managed context to use for background threads.
A little information regarding my application:
I'm using the latest version of Magical Record which is 2.2.
I have a backend server which my application talks to a lot.
Their communication is similar to Whatsapp, as it uses background threads for communicating with the server and updating managed objects upon successful responses.
I'm wrapping the model with DataModel objects that hold the managed objects in arrays for quick referencing of UI/background use.
Now - my questions are:
Should I fetch only from the UI thread? Is it okay that I'm holding the managed objects in DataModel objects?
How can I create a new entity from a background thread and use the newly created entity in the DataModel objects?
Is there a best design scenario that I should use? specifically when sending a request to the server and getting a response, should I create a new managed context and use it throughout the thread's activity?
Let me know if everything is clear. If not, I'll try and add clarity.
Any help or guidelines would be appreciated.
Thanks!
I'm not working with MagicalRecord, but these questions are more related to CoreData than to MagicalRecord, so I'll try to answer them :).
1) Fetching from main(UI) thread
There are many ways how to design app model, so two important things I've learned using CoreData for few years:
when dealing with UI, always fetch objects on main thread. As You correctly stated NSManagedObjects are not thread safe so you can't (well, can, but shouldn't) access their data from different thread. NSFetchedResultsController is Your best friend when you need to display long lists (eg. for messages – but watch out for request's batchSize).
you should design your storage and fetches to be fast and responsive. Use indexes, fetch only needed properties, prefetch relationships etc.
on the other hand if you need to fetch from large amount of data, you can use context on different thread and transfer NSManagedObjectIDs only. Let's say your user has huge amount of messages and you want to show him latest 10 from specific contact. You can create background context (private concurrency), fetch these 10 message IDs (NSManagedObjectIDResultType), store them in array (or any other suitable format for you), return them to your main thread and fetch those IDs only. Note that this approach speed things up if fetch takes long because of predicate/sortDescriptor not if the "problem" is in the process of turning faults into objects (eg. large UIImage stored in transformable attribute :) )
2) Creating entity in background
You can create the object in background context, store it's NSManagedObjectID after you save context (object has only temporary ID before save) and send it back to your main thread, where you can perform fetch by ID and get the object in your main context.
3) Working with background context
I don't know if it's exactly the best, but I'm pretty much satisfied with NSManagedObjectContext observation and merging from notifications. Checkout:
mergeChangesFromContextDidSaveNotification:
So, you create background context, add main context as observer for changes (NSManagedObjectContextObjectsDidChangeNotification) and background context automatically sends you notifications (every time you perform save) about all of it's changes – inserted/updated/deleted objects (no worries, you can just merge it by calling mergeChangesFromContextDidSaveNotification:). This has many advantages, such as:
everything is updated automatically (every object you have fetched in "observing context" gets updated/deleted)
every merge runs in memory (no fetches, no persisting on main thread)
if you implement NSFetchedResultsController's delegate method, everything updates automatically (not exactly everything – see below)
On the other side:
take care about merging policy (NSMangedObjectContext mergePolicy)
watchout for referencing managed objects that got deleted from background (or just another context)
NSFetchedResultsController updates only on changes to "direct" attributes (checkout this SO question)
Well, I hope it answers your questions. If everything comes up, don't hesitate to ask :)
Side note about child contexts
Also take a peek to child contexts. They can be powerful too. Basically every child context sends it's changes to parent context on save (in case of "base" context (no parent context), it sends it's changes to persistent coordinator).
For example when you are creating edit/add controller you can create child context from your main context and perform all changes in it. When user decides to cancel the operation, you just destroy (remove reference) the child context and no changes will be stored. If user decides to accept changes he/she made, save the child context and destroy it. By saving child context all changes are propagated to it's parent store (in this example your main context). Just be sure to also save the parent context (at some point) to persist these changes (save: method doesn't do the bubbling). Checkout documentation of managing parent store.
Happy coding!
I am searching for the best possible way to update a fairly large core-data based dataset in the background, with as little effect on the application UI (main thread) as possible.
There's some good material available on this topic including:
Session 211 from WWDC 2013 (Core Data Performance Optimization and Debugging, from around 25:30 onwards)
Importing Large Data Sets from objc.io
Common Background Practices from objc.io (Core Data in the Background)
Backstage with Nested Managed Object Contexts
Based on my research and personal experience, the best option available is to effectively use two separate core-data stacks that only share data at the database (SQLite) level. This means that we need two separate NSPersistentStoreCoordinators, each of them having it's own NSManagedObjectContext. With write-ahead logging enabled on the database (default from iOS 7 onwards), the need for locking could be avoided in almost all cases (except when we have two or more simultaneous writes, which is not likely in my scenario).
In order to do efficient background updates and conserve memory, one also needs to process data in batches and periodically save the background context, so the dirty objects get stored to the database and flushed from memory. One can use the NSManagedObjectContextDidSaveNotification that gets generated at this point to merge the background changes into the main context, but in general you don't want to update your UI immediately after a batch has been saved. You want to wait until the background job is completely done and than refresh the UI (recommended in both the WWDC session and objc.io articles). This effectively means that the application main context remains out of sync with the database for a certain time period.
All this leads me to my main question, which is, what can go wrong, if I changed the database in this manner, without immediately telling the main context to merge changes? I'm assuming it's not all sunshine an roses.
One specific scenario that I have in my head is, what happens if a fault needs to be fulfilled for an object loaded in the main context, if the background operation has in between deleted that object from the database? Can this for instance happen on a NSFetchedResultsController based table view that uses a batchSize to fetch objects incrementally into memory? I.e., an object that has not yet been fully fetched gets deleted but than we scroll up to a point where the object needs to get loaded. Is this a potential problem? Can other things go wrong? I'd appreciate any input on this matter.
Great question!
I.e., an object that has not yet been fully fetched gets deleted but
than we scroll up to a point where the object needs to get loaded. Is
this a potential problem?
Unfortunately it'll cause problems. A following exception will be thrown:
Terminating app due to uncaught exception 'NSObjectInaccessibleException', reason: 'CoreData could not fulfill a fault for '0xc544570 <x-coredata://(...)>'
This blog post (section titled "How to do concurrency with Core Data?") might be somewhat helpful, but it doesn't exhaust this topic. I'm struggling with the same problems in an app I'm working on right now and would love to read a write-up about it.
Based on your question, comments, and my own experience, it seems the larger problem you are trying to solve is:
1. Using an NSFetchedResultsController on the main thread with thread confinement
2. Importing a large data set, which will insert, update, or delete managed objects in a context.
3. The import causes large merge notifications to be processed by the main thread to update the UI.
4. The large merge has several possible effects:
- The UI gets slow, or too busy to be usable. This may be because you are using beginUpdates/endUpdates to update a tableview in your NSFetchedResultsControllerDelegate, and you have a LOT of animations queing up because of the large merge.
- Users can run into "Could not fulfill fault" as they try to access a faulted object which has been removed from the store. The managed object context thinks it exists, but when it goes to the store to fulfill the fault the fault it's already been deleted. If you are using reloadData to update a tableview in your NSFetchedResultsControllerDelegate, you are more likely to see this happen than when using beginUpdates/endUpdates.
The approach you are trying to use to solve the above issues is:
- Create two NSPersistentStoreCoordinators, each attached to the same NSPersistentStore or at least the same NSPersistentStore SQLite store file URL.
- Your import occurs on NSManagedObjectContext 1, attached to NSPersistentStoreCoordinator 1, and executing on some other thread(s). Your NSFetchedResultsController is using NSManagedObjectContext 2, attached to NSPersistentStoreCoordinator 2, running on the main thread.
- You are moving the changes from NSManagedObjectContext 1 to 2
You will run into a few problems with this approach.
- An NSPersistentStoreCoordinator's job is to mediate between it's attached NSManagedObjectContexts and it's attached stores. In the multiple-coordinator-context scenario you are describing, changes to the underlying store by NSManagedObjectContext 1 which cause a change in the SQLite file will not be seen by NSPersistentStoreCoordinator 2 and it's context. 2 does not know that 1 changed the file, and you will have "Could not fulfill fault" and other exciting exceptions.
- You will still, at some point, have to put the changed NSManagedObjects from the import into NSManagedObjectContext 2. If these changes are large, you will still have UI problems and the UI will be out of sync with the store, potentially leading to "Could not fulfill fault".
- In general, because NSManagedObjectContext 2 is not using the same NSPersistentStoreCoordinator as NSManagedObjectContext 1, you are going to have problems with things being out of sync. This isn't how these things are intended to be used together. If you import and save in NSManagedObjectContext 1, NSManagedObjectContext 2 is immediately in a state not consistent with the store.
Those are SOME of the things that could go wrong with this approach. Most of these problems will become visible when firing a fault, because that accesses the store. You can read more about how this process works in the Core Data Programming Guide, while the Incremental Store Programming Guide describes the process in more detail. The SQLite store follows the same process that an incremental store implementation does.
Again, the use case you are describing - getting a ton of new data, executing find-Or-Create on the data to create or update managed objects, and deleting "stale" objects that may in fact be the majority of the store - is something I have dealt with every day for several years, seeing all of the same problems you are. There are solutions - even for imports that change 60,000 complex objects at a time, and even using thread confinement! - but that is outside the scope of your question.
(Hint: Parent-Child contexts don't need merge notifications).
Two Persistent Store Coordinators (pscs) is certainly the way to go with large datasets. File locking is faster than the locking within core data.
There's no reason you couldn't use the background psc to create thread confined NSManagedObjectContexts in which each is created for each operation you do in the background. However, instead of letting core data manage the queueing you now need to create NSOperationQueues and/or threads to manage the operations based on what you're doing in the background. NSManagedObjectContexts are free and not expensive. Once you do this you can hang onto your NSManagedObjectContext and only use it during that one operation and/or threads life time and build as many changes as you want and wait until the end to commit them and merge them to the main thread how ever you decide. Even if you have some main thread writes you can still at crucial points in your operations life time refetch/merge back into your threads context.
Also it's important to know that if you're working on large sets of data don't worry about merging contexts so as long as you aren't touching something else. For example if you have class A and class B and you have two seperate opertions/threads to work on them and they have no direct relationship you do not have to merge the contexts if one changes you can keep on rolling with the changes. The only major need for merging background contexts in this fashion is if there are direct relationships faulting. It would be better to prevent this though through some sort of serialization whether it be NSOperationQueue or what ever else. So feel free to work away on different objects in the background just be careful about their relationships.
I've worked on a large scale core data projects and had this pattern work very well for me.
Indeed, this is the best core data scenario you can work with. Almost no Main UI staleness, and easy background management of your data. When you want to tell the Main Context (and maybe a currently running NSFetchedResultsController) you listen for save notifications of the backgroundContext like this:
[[NSNotificationCenter defaultCenter]
addObserver:self selector:#selector(reloadFetchedResults:)
name:NSManagedObjectContextDidSaveNotification
object:backgroundObjectContext];
Then, you can merge changes, but waiting for the Main Thread context to catch them before saving. When you receive the mergeChangesFromContextDidSaveNotification notification the changes are not yet saved. Hence the performBlockAndWait is mandatory, so the Main context gets the changes and then the NSFetchedResultsController updates its values correctly.
-(void)reloadFetchedResults:(NSNotification*)notification
{
NSManagedObjectContext*moc=[notification object];
if ([moc isEqual:backgroundObjectContext])
{
// Delete caches of fethcedResults if you have a deletion
if ([[theNotification.userInfo objectForKey:NSDeletedObjectsKey] count]) {
[NSFetchedResultsController deleteCacheWithName:nil];
}
// Block the background execution of the save, and merge changes before
[managedObjectContext performBlockandWait:^{
[managedObjectContext
mergeChangesFromContextDidSaveNotification:notification];
}];
}
}
There is a pitfall no one has noticed. You can get the save notification before the background context has actually saved the object you want to merge. If you want to avoid problems by a faster Main Context asking for an object that has not been saved yet by the background context, you should (you really should) call obtainPermanentIDsForObjects before any background save. Then you are safe to call the mergeChangesFromContextDidSaveNotification. This will ensure that the merge receives a valid permanent Id for merging.
I understand the documented differences between these two calls. However does anyone know the reasons for the following observed behaviour I have noticed:
If I have a parentContext and a temporary childContext where I use the childContext to edit, insert and deleted objects, if use [childContext objectWithID:objectID]; to retrieve a known existing managed object, present in the parent context, it will sometimes give me an object with a fault that on being fired fails and generates an exception. I understand objectWithID: will, by design, always return an object in a faulted state regardless of if an actual managedObject exists for the given objectID. However if the object actually exists in the parent context, I would expect that when any of the properties are accessed, the object will always be successfully retrieved from the parent context (e.g. the fault will be fired) without any problem. If I use [childContext existingObjectWithID:objectID]; I find it does indeed always succeed.
For the record, I have turned off caching on the child context and this same behaviour occurs after [childContext resetContext] has been called - so it's not an artefact of old cached data hanging around that is inconsistent with the parent context.
The documentation alone seems to me to be insufficient to explain this behaviour. I can of course chalk it up to experience and just say "I now know to always use existingObjectWithID: when passing object IDs to my child edit context perform block" but I feel uneasy and would like to understand exactly what is going on here (not least so I can understand if there is any performance impact of using the one as over the other but also to understand what the constraint is so I can ensure there isn't some bad practice I'm unnecessarily implementing in my code and then using a wrong or inefficient call to fix it).
I've seen this behavior in my own code (the stack traces were from Mars, took forever to track down the cause of the issue), and my solution was the same (move from use of objectWithID: in the child context to use of existingObjectWithID:).
In my case, I'd create an object on the parent context, immediately obtain a permanent object ID for it, and ask for a save using UIManagedDocument's updateChangeCount:UIDocumentChangeDone, which will schedule a save at some point in the future. I think this point is key for the discussion.
I'd then perform a network call to obtain XML data pertaining to the newly-created object, and parse and import that data into Core Data. This occurred on a background thread, and I'd pass the object ID to the thread, which used a thread-confined child import context.
Under iOS5, I'd use objectWithID: in the worker thread to fault the new object into the import context. Worked fine, no issues.
Under iOS6, this began to fail, with bizarre stack traces, and the only solution was to move to existingObjectWithID:. Under testing, it appears to be a solid solution with this change in place.
Like you, I've tried to find some definitive statement as to why this works, but I've been unsuccessful in doing so. However, I think the pattern that my code exhibited and the stack traces I was seeing perhaps explains what's happening. I was always seeing a crash attempting to obtain data from the persistent store. I believe that under iOS5, the save I requested via UIManagedDocument was occurring before my child thread got a chance to run, and thus the new object was persisted before I called objectWithID:. It would appear from my crash logs that that is not the case under iOS6; the thread runs before the object has made it to the persistent store, and thus it can't be fetched into the child context yet. My assumption is that existingObjectWithID: is ensuring that any pending SQL I/O is performed before attempting a fetch, and thus the persistent store is consistent when the fetch occurs in my worker thread. I have been unable to find anything definitive to support this, but extensive testing seems to support that that's what's happening.
Having the same issue as the OP, and while I think the exception is by design as per
If the object is not registered in the context, it may be fetched or returned as a fault. This method always returns an object. The data in the persistent store represented by objectID is assumed to exist—if it does not, the returned object throws an exception when you access any property (that is, when the fault is fired). The benefit of this behavior is that it allows you to create and use faults, then create the underlying data later or in a separate context.
in the Apple docs -- that is, the data is NOT in the persistent store yet, it's sitting in the parent context -- it still seems odd that I simply cannot get my realized object in the parent context. Why must the persistent store have to know about it, for my object to be populated?
Getting the "existing object" using existingObjectWithID before getting the "crash object" using objectWithID will lead to both objects being equal.
The other way round the app will crash as soon as a property of the "crash object" is accessed. In this case the objects are not equal. The "crash object" has a temporary ObjectID.
I'm syncing with a MySQL database.
Initially, I was going to loop through all my new/modified objects and set all the foreign keys for that object and then do the next object, and so on... But that's a lot of fetch requests.
So instead I wanted to loop through all my new/modified objects and set the foreign keys one at a time. So the first pass over my objects sets fk1, my next sets fk2, so on...
Cool, fetch requests drastically reduced. Now I'm curious if I could thread these fk setters. They aren't dependent on each other, but they are modifying the same object, even though they're only setting one relationship, and it's a different relationship. Speaking in git terms, these changes could be 'merged' together without any conflict, but is it possible to push changes in one child managedObjectContext(childContext:save) up to the parentManagedObjectContext(parent:performBlock^{parent:save}) and pull it down in another, different child managedObjectContext(???)? Or will the merge policy only take one childContext's version of the object and leave the other fks effectively unchanged.
I know this exists: NSManagedObjectContext/refreshObject:mergeChanges:
But that's on an object by object level. Will that cause a bunch of fetches? Or will that update my whole context at once/in batches?
Following Apple's suggestion from here:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CoreData/Articles/cdImporting.html
I've created/updated my values before I start setting any relationships, so all entities already exist before I try to point any relationships at them.
Aside: We have a couple apps that could benefit from the concurrency, because they throw a considerable amount of data around, and with the quad core iPad apps, this would really help out with the time the initial sync takes.
I'n not sure what you are trying to do and why (you could write less lines in your question and be more clear), but here are some guidelines for working with Core Data:
-- NSManagedObjectContext is not thread safe. Therefore, you need to limit your access to this managed object context to happen inside 1 thread. Otherwise, you may end up having many errors you can't understand.
-- NSManagedObjectContexts apart from doing other things, serve like "snapshots" of your persistent store. That means that when you change an object, you save it to the persistent store, you post a NSManagedObjectContextDidSaveNotification and then call mergeChangesFromContextDidSaveNotification: in another place inside your program in order to load the latest data from the persistent store. Be careful of thread safety.
NSManagedObjectContext/refreshObject:mergeChanges: according to apple is not about just refreshing a managed object. If you pass YES as the second argument, it will write any pending changes from this managed object context to the persistent store, and will load any other changes to other properties of this object from the persistent store, thus "synchronizing" your object with the persistent store. If you pass NO as the second argument, the object loses any pending changes, and it is being turned to a fault. That means that when you attempt to access it, the Managed Object Context will reload the object as it was last saved to the database. It will NOT reload the entire managed object context. It will only operate on the object.
Aside: I have written a blog post that scratches the surface of asynchronous loading from a core data database. In my case, since I'm doing heavy lifting with the database, I ended up using an NSOperation that operates with its own NSManagedObjectContext, and using serial GCD queues to save large chunks of data, since it was faster than having multiple threads accessing the same persistent store, even if they operate on different managed object contexts.
I hope I helped.
My app needs to be able to disconnect from a server and connect to another one at whim, which necessitates dumping whatever persistent store we have. Issue here is that releasing the 'main' managed object context means whatever objects in it that I have laying around fault, which causes all sorts of unexpected little issues and crashes.
Is there a better way to 'reset' my stack/managed objects littered around the program than just calling release on everything in my Core Data stack?
You need to close down you Core Data stack from the top down.
Make sure that no managed objects are retained by any object other than the managed object context e.g. make sure the objects aren't held in an array owned by a UI controller.
Save the managed object context to clean up any loose ends.
Fully release the context and nil it. The context should never be retained by more than one object e.g the app delegate, anyway.
Send removePersistentStore:error: to the persistent store coordinator.
Use standard file operations to delete the actual store file.
Changing Core Data like this on the fly is difficult because Core Data isn't just a little database hung off the side of the app. It is intended to serve as the apps entire model layer. Since Apple is really into Model-View-Controller design, then the model is the actual core of the program (hence Core Data's name.) As such, you can't really turn it on and off the way you would a mere SQL database.
You might actually want to rethink your design so that you can change servers without having to shut down your entire data model. E.g. simply delete all managed objects associated with an unused server.
If you mean you want to fault all objects so they will be fetched again from your persistent store,
[managedObjectContext reset];