Core Data managed object context design recommendation - iOS

We are working on an Enterprise-level application, which will store tens of thousands of objects with Core Data, and we are having issues on several fronts.
Our application has several independent systems that operate on the data as needed. These systems include discovery of items, loading of items, synchronization, and UI display. If we design our software correctly, there should be few to no merge conflicts caused by different systems modifying the same objects. Each system has its own operation queues, all running in the background. We wish to keep all object creation and modification in the background to minimize UI performance issues, especially during the initial ramp-up, when thousands of objects might be created from data on the server. Here we have hit several problems with our various design attempts: huge memory consumption during these ramp-ups, and incorrect orchestration of all the contexts and child contexts, causing deadlocks and crashes.
We have attempted the following designs:
One root NSPrivateQueueConcurrencyType managed object context with one NSMainQueueConcurrencyType child context. The UI's fetched results controllers fetch from this child context. From the NSMainQueueConcurrencyType child context, we created one NSPrivateQueueConcurrencyType child context, which we called the "savingContext"; each background operation created a child context of that "savingContext", made its changes, and finally did what we called a "deep save", recursively saving up to the root. We initially chose this design so we would not have to deal with NSManagedObjectContextDidSaveNotification notifications from many different child contexts. We wrapped every call to the NSPrivateQueueConcurrencyType contexts, and every access to their objects, in performBlockAndWait:. Functionally, this design performed: all changes and inserts were saved to the persistent store, and the UI was updated with the changes. However, it introduced two issues. One was a laggy UI during ramp-up, because all merged changes pass through the NSMainQueueConcurrencyType child context; more importantly, memory usage during ramp-up was very high. We hit prohibitive RAM usage due to our inability to call reset recursively on the contexts (the main UI child context sits in the middle of the chain) and/or our lack of knowledge of when to call refreshObject:mergeChanges:. So we went a different road.
Have two top-level contexts attached to the persistent store coordinator: one NSPrivateQueueConcurrencyType context parenting the save child contexts, and one NSMainQueueConcurrencyType context for UI display. The NSMainQueueConcurrencyType context listens for NSManagedObjectContextDidSaveNotification notifications from the top NSPrivateQueueConcurrencyType context and merges them on the main thread. Each background operation creates a child context of the top NSPrivateQueueConcurrencyType context, also with private queue concurrency type, does its work, then performs a recursive "deep save": save the current context, recursively deep-save its parent, call reset on the current context, and save again. This way we avoid the memory issues, as created objects are released quickly after each save. However, with this design we have hit a plethora of issues such as deadlocks, NSInternalInconsistencyException exceptions, and fetched results controllers not updating the UI despite save notifications reaching the NSMainQueueConcurrencyType context. This design also slowed initial load times in the UI considerably. In the previous design the fetched results controller returned results very quickly, while this one blocks the UI for several seconds until the view loads (we initialize the fetched results controller in viewDidLoad).
We have tried many intermediate designs, but they all revolve around the same issues: very high memory usage, fetched results controllers not updating the UI, or deadlocks and NSInternalInconsistencyException exceptions.
I am really getting frustrated. I can't help but feel that our designs are overly complicated for something that should be rather simple, and that it is just our lack of understanding of some fundamental concept that is killing us.
So what would you guys suggest? What arrangement would you recommend for our contexts? How should we manage different contexts on different threads? What are best practices for freeing up inserted objects and resetting contexts? For avoiding deadlocks? All help would be appreciated at this point.
I have also seen recommendations for the MagicalRecord library. Is it recommended? We are already invested in using plain Core Data types; how difficult would it be to migrate to MagicalRecord?

First, to manage your memory, your second architecture gives you much more flexibility.
Second, there are two kinds of memory to manage: malloc-ed memory and resident VM memory. You can have a low malloc-ed memory footprint and still have a large VM resident region. This is due, in my experience, to Core Data aggressively holding on to newly inserted items. I solve this problem with a post-save trimming notification.
Third, MOCs are cheap. Use'em and throw'em away. In other words, release memory early and often.
Fourth, try to do almost nothing database-wise on the main MOC. Yes, this sounds counterintuitive. What I mean is that all of your complex queries really should be done on background threads, with the results passed to the main thread, or with the queries redone from the main thread so they exploit the now-populated row cache. By doing this, you keep the UI live.
Fifth, in my heavily multi-queued app, I try to have all of my saves really occur in the background. This keeps my main MOC fast and consistent with data coming in from the net.
Sixth, the NSFetchedResultsController is a quite useful but specialized controller. If your application pushes it outside of its area of competence, it starts locking up your interface. When that happens, I roll my own controller by listening for the -didSave notifications myself.
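A minimal sketch of the fourth and fifth points above, assuming a hypothetical `coordinator` (the NSPersistentStoreCoordinator) and `mainContext` (an NSMainQueueConcurrencyType context) already exist — this is one way to wire it up, not the only one:

```objc
// Sketch: run the expensive fetch on a background context, then hand only
// NSManagedObjectIDs to the main context. Names here are placeholders.
NSManagedObjectContext *backgroundContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
backgroundContext.persistentStoreCoordinator = coordinator;

[backgroundContext performBlock:^{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
    request.predicate = [NSPredicate predicateWithFormat:@"state == %d", 1];
    NSArray *results = [backgroundContext executeFetchRequest:request error:NULL];

    // Managed objects must never cross queues; their objectIDs may.
    NSArray *objectIDs = [results valueForKey:@"objectID"];

    [mainContext performBlock:^{
        for (NSManagedObjectID *objectID in objectIDs) {
            // The row cache is now warm, so faulting these in is cheap.
            NSManagedObject *item = [mainContext objectWithID:objectID];
            // ...hand `item` to the UI...
        }
    }];
}];
```

The entity name `Item` and the predicate are illustrative only; the point is the objectID hand-off.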

Related

With CoreData is it OK to have multiple contexts on the same thread?

With CoreData, is it OK to have multiple contexts on the same thread? At work we are debating whether having multiple contexts on the main thread can cause deadlock. I can't find any reason not to do it, but I am concerned that when one of the main thread contexts saves and merges into the other main thread context it may cause deadlock.
Note there appears to be a related ticket that is actually NOT related at all: Multiple contexts in the main thread: why and when use them? This ticket ONLY discusses using multiple contexts in general, and what a context is, and does not actually ever discuss any issues with using multiple contexts on the same thread.
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/Articles/cdConcurrency.html
and look here too
http://www.cocoanetics.com/2012/07/multi-context-coredata/
If you choose not to use the thread containment pattern—that is, if you try to pass managed objects or contexts between threads, and so on—you must be extremely careful about locking, and as a consequence you are likely to negate any benefit you may otherwise derive from multi-threading. You also need to consider that:
Any time you manipulate or access managed objects, you use the associated managed object context.
Core Data does not present a situation where reads are “safe” but changes are “dangerous”—every operation is “dangerous” because every operation has cache coherency effects and can trigger faulting.
Managed objects themselves are not thread safe.
If you want to work with a managed object across different threads, you must lock its context.
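The locking language in that quote predates queue confinement. Under the queue-based model, two contexts on the main thread are legal; a sketch of one merging the other's saves (assuming a hypothetical `coordinator`) might look like this:

```objc
// Sketch: two main-queue contexts on the same coordinator. Merging a didSave
// notification from one into the other runs synchronously on the main thread,
// so no cross-queue wait is involved.
NSManagedObjectContext *contextA =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
NSManagedObjectContext *contextB =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
contextA.persistentStoreCoordinator = coordinator;
contextB.persistentStoreCoordinator = coordinator;

[[NSNotificationCenter defaultCenter]
    addObserverForName:NSManagedObjectContextDidSaveNotification
                object:contextA
                 queue:[NSOperationQueue mainQueue]
            usingBlock:^(NSNotification *note) {
    // Pull contextA's committed changes into contextB.
    [contextB mergeChangesFromContextDidSaveNotification:note];
}];
```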

Core data, what concurrency model to use?

I am developing an iOS app which will gather large amounts of data from several sources (up to tens of thousands of objects, but simple objects, no images) and save it to my own database using Core Data. I then analyse this data and display the results to the user.
I want to know if there is any benefit to using a main-queue NSManagedObjectContext, or if it is enough that I use a private one.
I also want to know what the benefit is of having several NSManagedObjectContexts, or if one is enough.
The concurrency model I am currently using has only one private-queue NSManagedObjectContext connected to a persistent store coordinator. All the data analysis is performed on the private queue, and then I simply pass the analyzed data to the main queue to display it. On older devices (iPhone 4) my application can sometimes crash when too much data is being loaded (i.e. downloaded from the external databases) at the same time. Is this related to my choice of concurrency model?
Your current approach sounds fine. You only need a main thread context if you want the main thread to interact with the data, and in your case you don't so that's fine.
Your memory management is effectively unrelated and is more tied to how many things you have going on at once (it sounds like one) and how many objects you try to keep in main memory at any one time (it sounds like many) instead of faulting them out to the data store. This is what you need to look at / work on. Instruments can help you see how many objects you're keeping in memory.
At least call refreshObject:mergeChanges: with NO for mergeChanges, to fault out any objects that you aren't using.
Also, remember that you're working on a mobile device and that processing up to tens of thousands of objects is a job better handled by a server...
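A batching pattern along those lines could look like the sketch below. The `downloadedRecords` array, the `Item` entity, and the batch size are all hypothetical; refreshObject:mergeChanges: with NO is the finer-grained alternative to the blanket reset shown here:

```objc
// Sketch: keep memory flat in a single private-queue stack by importing in
// batches, saving periodically, and turning saved objects back into faults.
NSManagedObjectContext *context =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
context.persistentStoreCoordinator = coordinator;

[context performBlock:^{
    NSUInteger batchSize = 250; // tune with Instruments
    NSUInteger processed = 0;
    for (NSDictionary *record in downloadedRecords) {
        NSManagedObject *object =
            [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                          inManagedObjectContext:context];
        [object setValuesForKeysWithDictionary:record];

        if (++processed % batchSize == 0) {
            [context save:NULL];
            // Drop the now-persisted rows from memory.
            [context reset];
        }
    }
    [context save:NULL]; // final partial batch
}];
```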

Pitfalls of using two persistent store coordinators for efficient background updates

I am searching for the best possible way to update a fairly large core-data based dataset in the background, with as little effect on the application UI (main thread) as possible.
There's some good material available on this topic including:
Session 211 from WWDC 2013 (Core Data Performance Optimization and Debugging, from around 25:30 onwards)
Importing Large Data Sets from objc.io
Common Background Practices from objc.io (Core Data in the Background)
Backstage with Nested Managed Object Contexts
Based on my research and personal experience, the best option available is to effectively use two separate Core Data stacks that only share data at the database (SQLite) level. This means that we need two separate NSPersistentStoreCoordinators, each of them having its own NSManagedObjectContext. With write-ahead logging enabled on the database (the default from iOS 7 onwards), the need for locking could be avoided in almost all cases (except when we have two or more simultaneous writes, which is not likely in my scenario).
In order to do efficient background updates and conserve memory, one also needs to process data in batches and periodically save the background context, so the dirty objects get stored to the database and flushed from memory. One can use the NSManagedObjectContextDidSaveNotification that gets generated at this point to merge the background changes into the main context, but in general you don't want to update your UI immediately after a batch has been saved. You want to wait until the background job is completely done and then refresh the UI (recommended in both the WWDC session and the objc.io articles). This effectively means that the application's main context remains out of sync with the database for a certain period of time.
All this leads me to my main question, which is: what can go wrong if I change the database in this manner, without immediately telling the main context to merge changes? I'm assuming it's not all sunshine and roses.
One specific scenario that I have in my head is: what happens if a fault needs to be fulfilled for an object loaded in the main context, but the background operation has in the meantime deleted that object from the database? Can this, for instance, happen on an NSFetchedResultsController-based table view that uses a batchSize to fetch objects incrementally into memory? I.e., an object that has not yet been fully fetched gets deleted, but then we scroll up to a point where the object needs to get loaded. Is this a potential problem? Can other things go wrong? I'd appreciate any input on this matter.
Great question!
I.e., an object that has not yet been fully fetched gets deleted but then we scroll up to a point where the object needs to get loaded. Is this a potential problem?
Unfortunately it'll cause problems. The following exception will be thrown:
Terminating app due to uncaught exception 'NSObjectInaccessibleException', reason: 'CoreData could not fulfill a fault for '0xc544570 <x-coredata://(...)>'
This blog post (section titled "How to do concurrency with Core Data?") might be somewhat helpful, but it doesn't exhaust this topic. I'm struggling with the same problems in an app I'm working on right now and would love to read a write-up about it.
Based on your question, comments, and my own experience, it seems the larger problem you are trying to solve is:
1. Using an NSFetchedResultsController on the main thread with thread confinement
2. Importing a large data set, which will insert, update, or delete managed objects in a context.
3. The import causes large merge notifications to be processed by the main thread to update the UI.
4. The large merge has several possible effects:
- The UI gets slow, or too busy to be usable. This may be because you are using beginUpdates/endUpdates to update a table view in your NSFetchedResultsControllerDelegate, and you have a LOT of animations queuing up because of the large merge.
- Users can run into "Could not fulfill fault" as they try to access a faulted object which has been removed from the store. The managed object context thinks it exists, but when it goes to the store to fulfill the fault, it has already been deleted. If you are using reloadData to update a table view in your NSFetchedResultsControllerDelegate, you are more likely to see this happen than when using beginUpdates/endUpdates.
The approach you are trying to use to solve the above issues is:
- Create two NSPersistentStoreCoordinators, each attached to the same NSPersistentStore or at least the same NSPersistentStore SQLite store file URL.
- Your import occurs on NSManagedObjectContext 1, attached to NSPersistentStoreCoordinator 1, and executing on some other thread(s). Your NSFetchedResultsController is using NSManagedObjectContext 2, attached to NSPersistentStoreCoordinator 2, running on the main thread.
- You are moving the changes from NSManagedObjectContext 1 to 2
You will run into a few problems with this approach.
- An NSPersistentStoreCoordinator's job is to mediate between its attached NSManagedObjectContexts and its attached stores. In the multiple-coordinator-context scenario you are describing, changes to the underlying store made through NSManagedObjectContext 1 which cause a change in the SQLite file will not be seen by NSPersistentStoreCoordinator 2 and its context. Coordinator 2 does not know that 1 changed the file, and you will get "Could not fulfill fault" and other exciting exceptions.
- You will still, at some point, have to put the changed NSManagedObjects from the import into NSManagedObjectContext 2. If these changes are large, you will still have UI problems and the UI will be out of sync with the store, potentially leading to "Could not fulfill fault".
- In general, because NSManagedObjectContext 2 is not using the same NSPersistentStoreCoordinator as NSManagedObjectContext 1, you are going to have problems with things being out of sync. This isn't how these things are intended to be used together. If you import and save in NSManagedObjectContext 1, NSManagedObjectContext 2 is immediately in a state not consistent with the store.
Those are SOME of the things that could go wrong with this approach. Most of these problems will become visible when firing a fault, because that accesses the store. You can read more about how this process works in the Core Data Programming Guide, while the Incremental Store Programming Guide describes the process in more detail. The SQLite store follows the same process that an incremental store implementation does.
Again, the use case you are describing - getting a ton of new data, executing find-or-create on the data to create or update managed objects, and deleting "stale" objects that may in fact be the majority of the store - is something I have dealt with every day for several years, seeing all of the same problems you are. There are solutions - even for imports that change 60,000 complex objects at a time, and even using thread confinement! - but that is outside the scope of your question.
(Hint: Parent-Child contexts don't need merge notifications).
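To unpack that hint with a sketch (assuming a hypothetical `coordinator`): a child context's save pushes its changes directly into the parent in memory, so no NSManagedObjectContextDidSaveNotification merging is required between them.

```objc
// Sketch: parent-child contexts. The child's save propagates changes to the
// parent without any notification merging; only the parent's save hits the store.
NSManagedObjectContext *mainContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
mainContext.persistentStoreCoordinator = coordinator;

NSManagedObjectContext *importContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
importContext.parentContext = mainContext;

[importContext performBlock:^{
    // ...insert or update objects here...
    [importContext save:NULL];      // pushes changes up into mainContext
    [mainContext performBlock:^{
        [mainContext save:NULL];    // writes them through to the store
    }];
}];
```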
Two persistent store coordinators (PSCs) is certainly the way to go with large datasets. File locking is faster than the locking within Core Data.
There's no reason you couldn't use the background PSC to create thread-confined NSManagedObjectContexts, one per background operation. However, instead of letting Core Data manage the queueing, you now need to create NSOperationQueues and/or threads to manage the operations, depending on what you're doing in the background. NSManagedObjectContexts are cheap, not expensive. You can hang onto a context for the lifetime of a single operation or thread, build up as many changes as you want, and wait until the end to commit them and merge them to the main thread however you decide. Even if you have some main-thread writes, you can still refetch/merge back into your thread's context at crucial points in the operation's lifetime.
It's also important to know that if you're working on large sets of data, you don't have to worry about merging contexts as long as you aren't touching something else. For example, if you have class A and class B, and two separate operations/threads working on them with no direct relationship between the classes, you do not have to merge the contexts when one changes; you can keep on rolling with the changes. The only major need for merging background contexts in this fashion is when directly related objects are being faulted. Even then, it is better to prevent the situation through some form of serialization, whether NSOperationQueue or something else. So feel free to work away on different objects in the background; just be careful about their relationships.
I've worked on large-scale Core Data projects and had this pattern work very well for me.
Indeed, this is the best Core Data scenario you can work with: almost no main-UI staleness, and easy background management of your data. When you want to notify the main context (and perhaps a currently running NSFetchedResultsController), you listen for save notifications from the background context like this:
[[NSNotificationCenter defaultCenter]
    addObserver:self
       selector:@selector(reloadFetchedResults:)
           name:NSManagedObjectContextDidSaveNotification
         object:backgroundObjectContext];
Then you can merge the changes, but you must wait for the main-thread context to catch them before saving. When you receive the NSManagedObjectContextDidSaveNotification, the changes have not yet been merged into the main context. Hence performBlockAndWait: is mandatory, so that the main context gets the changes and the NSFetchedResultsController then updates its values correctly.
- (void)reloadFetchedResults:(NSNotification *)notification
{
    NSManagedObjectContext *moc = [notification object];
    if ([moc isEqual:backgroundObjectContext]) {
        // Delete the fetched results controller's caches if there was a deletion.
        if ([[notification.userInfo objectForKey:NSDeletedObjectsKey] count]) {
            [NSFetchedResultsController deleteCacheWithName:nil];
        }
        // Block the background execution of the save, and merge the changes first.
        [managedObjectContext performBlockAndWait:^{
            [managedObjectContext mergeChangesFromContextDidSaveNotification:notification];
        }];
    }
}
There is a pitfall no one has noticed: you can receive the save notification before the background context has actually saved the object you want to merge. If you want to avoid problems caused by a faster main context asking for an object that the background context has not saved yet, you should (you really should) call obtainPermanentIDsForObjects:error: before any background save. Then it is safe to call mergeChangesFromContextDidSaveNotification:. This ensures that the merge receives valid permanent IDs to merge against.
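A sketch of that pre-save step, reusing the `backgroundObjectContext` name from the snippet above:

```objc
// Sketch: obtain permanent IDs for newly inserted objects before saving the
// background context, so the didSave notification never carries temporary IDs
// into the main context's merge.
[backgroundObjectContext performBlock:^{
    NSError *error = nil;
    NSArray *inserted = [[backgroundObjectContext insertedObjects] allObjects];
    if (![backgroundObjectContext obtainPermanentIDsForObjects:inserted
                                                         error:&error]) {
        NSLog(@"Could not obtain permanent IDs: %@", error);
    }
    [backgroundObjectContext save:&error];
}];
```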

How to dynamically use MOC depending on thread to protect core data

I've read through the materials regarding core data and threading and understand the principles of a separate MOC for each thread. My question is, what's the best way to dynamically determine whether to use a different MOC or the main one. I have some methods that are sometimes called on the main thread, sometimes in background. Is dynamically detecting thread not recommended or is it okay? Any pitfalls? Or do people just write separate methods for the background processes?
Some additional detail... I have a refresh process that performs a bunch of updates off the main thread (so as not to lock the UI while the user is waiting) using a simple performSelectorInBackground. This process moves through its steps serially, so I don't have to worry about multiple things accessing the DB on THIS thread; obviously the trick is keeping the main and background threads safe. I have implemented a separate context with merging in other places, but I recently rearchitected and am now using methods in the background that I wasn't before. So I wanted to rewrite those to use the separate context, but sometimes I'll be hitting them on the main thread and can access the main MOC just fine.
You do not give much detail about how you are managing your background operation and what you are doing with it, so it is pretty difficult to suggest anything.
In general, since creating a MOC is a pretty fast operation, you could create a new temporary MOC each time you need one in read-only mode (e.g. for data lookup). If you also have updates (e.g. adding new objects or modifying existing ones), you should factor in the cost of merging, in which case creating temporary MOCs each time may not be a good approach.
Another good approach could be creating a child context in your background thread.
But, as I said, it all depends on what you are doing.
Have a look at this good post about multi-threaded Core Data usage: Multi-Context CoreData. It describes a couple of scenarios and the solutions for them.
EDIT:
You could certainly use isMainThread to discriminate between the two cases (where you can use the main MOC and when you need a new one). That is what that method is for (and it is surely not expensive).
On the other hand, if you want a cleaner implementation, the best approach IMO would be creating a child MOC (which simplifies a lot the merging process - it becomes almost automatic, since you just need to save the parent context after saving the temporary context).
You'll need a new NSManagedObjectContext for each thread, and you'll need to create new versions of your NSManagedObjects from that thread's new MOC. Read #sergio's answer regarding the pros/cons of that approach.
To check if you're on the main thread, you can use [NSThread isMainThread] and make determinations that way. Or, when you're spinning up a new thread to crunch on CoreData, also create a new MOC.
A common approach is to associate each managed object context with a particular serial dispatch queue. So there's one for the main queue, and you can dynamically create them otherwise.
Once you're tying these things to queues, you can use dispatch_queue_set_specific to attach a particular context to a particular queue and dispatch_get_specific to get the context for the current queue. They both turned up in iOS 5 so you'll see some iOS 4-compatible code that jumps through much more complicated hoops but you don't really need to worry about it any more.
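A sketch of that queue-to-context association (the queue label and key are placeholders; note the context is stored unretained, so its lifetime must be managed separately):

```objc
// Sketch: attach a thread-confined context to a serial queue with
// dispatch_queue_set_specific, and recover it later with dispatch_get_specific.
static const void *kContextKey = &kContextKey;

dispatch_queue_t queue =
    dispatch_queue_create("com.example.coredata", DISPATCH_QUEUE_SERIAL);
NSManagedObjectContext *context =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSConfinementConcurrencyType];
// Unretained reference; keep `context` alive elsewhere for the queue's lifetime.
dispatch_queue_set_specific(queue, kContextKey, (__bridge void *)context, NULL);

dispatch_async(queue, ^{
    // Anywhere on this queue, recover "the current queue's context":
    NSManagedObjectContext *current =
        (__bridge NSManagedObjectContext *)dispatch_get_specific(kContextKey);
    // ...use `current`...
});
```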
Alternatively, if your contexts are tied to particular NSRunLoops or NSThreads, store the context in [[NSThread currentThread] threadDictionary] — it's exactly what it's there for.

Core Data stack with only a single context initialized with NSPrivateQueueConcurrencyType

I'm working on an app that requires multiple asynchronous downloads and saving their contents to Core Data entities. One of the downloads is large, and I noticed the UI was being blocked while creating/writing to the managed object context. My research led me to read up on concurrent Core Data setups, and I started implementing one of these. But I'm running into issues and spending a lot of time correcting things.
Before I continue, I'm thinking about simply setting up a single MOC with NSPrivateQueueConcurrencyType. Nothing I read mentions doing this. This way I could optionally perform MOC operations in the background, or just use the main thread as usual while maintaining a single MOC.
Is this a good approach? If not, what is wrong with it? I doubt this is the right approach because if it is, NSPrivateQueueConcurrencyType dominates NSMainQueueConcurrencyType and there would be no reason to have the latter.
There is nothing wrong with using a NSPrivateQueueConcurrencyType MOC for background tasks.
But you will probably still need a NSMainQueueConcurrencyType MOC.
From the documentation:
The context is associated with the main queue, and as such is tied into the application’s event loop, but it is otherwise similar to a private queue-based context. You use this queue type for contexts linked to controllers and UI objects that are required to be used only on the main thread.
As an example, for a fetched results controller, you would use the
NSMainQueueConcurrencyType MOC.
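Putting the two together for the downloads use case — a hypothetical `coordinator` and `fetchRequest` are assumed — one possible shape of the stack is:

```objc
// Sketch: a private-queue MOC for the downloads and a main-queue MOC for the
// fetched results controller, merging background saves on the main queue.
NSManagedObjectContext *uiContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
uiContext.persistentStoreCoordinator = coordinator;

NSManagedObjectContext *downloadContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
downloadContext.persistentStoreCoordinator = coordinator;

[[NSNotificationCenter defaultCenter]
    addObserverForName:NSManagedObjectContextDidSaveNotification
                object:downloadContext
                 queue:[NSOperationQueue mainQueue]
            usingBlock:^(NSNotification *note) {
    [uiContext mergeChangesFromContextDidSaveNotification:note];
}];

NSFetchedResultsController *frc =
    [[NSFetchedResultsController alloc] initWithFetchRequest:fetchRequest
                                        managedObjectContext:uiContext
                                          sectionNameKeyPath:nil
                                                   cacheName:nil];
```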