Core Data, what concurrency model to use?

I am developing an iOS app which will gather large amounts of data from several sources (up to tens of thousands of objects, but simple objects, no images) and save it to my own database using Core Data. I then analyse this data and display the results to the user.
I want to know if there is any benefit to using a main queue NSManagedObjectContext, or if it is enough that I use a private one.
I also want to know what the benefit is of having several NSManagedObjectContexts, or if one is enough.
The concurrency model I am currently using has only one private queue NSManagedObjectContext connected to a persistent store coordinator. All the data analysis is performed on the private queue, and then I simply pass the analyzed data to the main queue to display it. On older devices (iPhone 4) my application can sometimes crash when too much data is being loaded (i.e. downloaded from the external databases) at the same time. Is this related to my choice of concurrency model?

Your current approach sounds fine. You only need a main thread context if you want the main thread to interact with the data, and in your case you don't, so that's fine.
Your memory problems are effectively unrelated to the concurrency model; they are tied to how many things you have going on at once (it sounds like one) and how many objects you try to keep in main memory at any one time (it sounds like many) instead of faulting them out to the data store. This is what you need to look at and work on. Instruments can help you see how many objects you're keeping in memory.
At least call refreshObject:mergeChanges: with NO for merge changes to fault out any objects that you aren't using.
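For what it's worth, a minimal sketch of that call, assuming a private queue context and an array of objects you have finished analysing (both names are illustrative):

    // Re-fault objects you no longer need so they stop holding their data in memory.
    // `context` and `processedObjects` are placeholder names.
    [context performBlock:^{
        for (NSManagedObject *object in processedObjects) {
            // NO: discard the in-memory values rather than merging them back in
            [context refreshObject:object mergeChanges:NO];
        }
    }];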
Also, remember that you're working on a mobile device and that processing up to tens of thousands of objects is a job better handled by a server...

How to create snapshot of CoreData state?

Background story
I am developing a big iOS app. This app works under specific assumptions, the main one being that the app should work offline with internal storage that is a snapshot of the last synchronized state of the data saved on the server. I decided to use Core Data to handle this storage. Every time the app launches I check whether a WiFi connection is available and then try to synchronize the storage with the server. The synchronization can take about 3 minutes because of the size of the data.
The synchronization process consists of several stages and in each of them I:
fetch some data from the server (XML)
deserialize it
save it in Core Data
Problem
The synchronization process can be interrupted for several reasons (internet connection, server down, the user leaving the application, etc.). This may leave the data out of sync.
Let's assume that the synchronization process has 5 stages and it breaks after the third. That results in 3/5 of the data being updated in internal storage and the rest being out of sync. I can't allow that because the data are strongly connected to each other (business logic).
Goal
I don't know if it is possible, but I'm thinking about one solution. At the start of the synchronization process I would like to create a snapshot (some kind of copy) of the current state of Core Data and work on it during the synchronization. When the synchronization process completes successfully, this snapshot would overwrite the current Core Data state. If the synchronization is interrupted, the snapshot can simply be discarded. My internal storage will be safe.
Questions
How to create a Core Data snapshot?
How to work with the Core Data snapshot?
How to overwrite the Core Data state with the snapshot?
Thanks in advance for any help. Code examples, if possible, would be appreciated.
EDIT 1
The size of the data is too big to handle with multiple Core Data contexts. During synchronization I save the current context multiple times to clean up memory. If I do not, the application crashes with a memory error.
I think it could be resolved with multiple NSPersistentStoreCoordinators, using for example this method: link. Unfortunately, I don't know how to implement this.
You should do exactly what you said. Just create a class (let's call it SyncBuffer) with methods "load", "sync" and "save".
The "load" method should read all entities from Core Data and store them in instance variables.
The "sync" method should perform the whole synchronisation using those instance variables.
Finally, the "save" method should write all the values from the instance variables back to Core Data; here you can even remove all data from Core Data and save brand new values from the SyncBuffer.
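A rough sketch of what that could look like, assuming a single entity named "Item" and plain dictionaries as the in-memory copies (all names are placeholders, not from the question):

    // Sketch of the SyncBuffer idea described above.
    @interface SyncBuffer : NSObject
    @property (nonatomic, strong) NSManagedObjectContext *context;
    @property (nonatomic, strong) NSMutableArray *items;   // plain in-memory copies

    - (void)load;   // read the current Core Data state into `items`
    - (void)sync;   // run the whole multi-stage synchronisation against `items`
    - (void)save;   // only on success: replace the Core Data contents with `items`
    @end

    @implementation SyncBuffer

    - (void)load {
        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
        self.items = [NSMutableArray array];
        for (NSManagedObject *object in [self.context executeFetchRequest:request error:NULL]) {
            // copy only the attribute values; the buffer never holds managed objects
            NSArray *keys = object.entity.attributesByName.allKeys;
            [self.items addObject:[object dictionaryWithValuesForKeys:keys]];
        }
    }

    - (void)sync {
        // download, deserialize and apply every stage to `self.items` here;
        // if anything fails, just throw the buffer away and Core Data stays untouched
    }

    - (void)save {
        // wipe the old data and write the buffer back in one go
        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
        for (NSManagedObject *object in [self.context executeFetchRequest:request error:NULL]) {
            [self.context deleteObject:object];
        }
        for (NSDictionary *values in self.items) {
            NSManagedObject *object =
                [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                              inManagedObjectContext:self.context];
            [object setValuesForKeysWithDictionary:values];
        }
        [self.context save:NULL];
    }
    @end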
A Core Data stack is composed, at its core, of three components: a context (NSManagedObjectContext), a model (NSManagedObjectModel) and the store coordinator (NSPersistentStoreCoordinator/NSPersistentStore).
What you want is two different contexts that share the same model but use two different stores. The stores will be of the same type (i.e. an SQLite db) but use different source files.
At this page you can see some documentation about the stack:
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CoreData/InitializingtheCoreDataStack.html#//apple_ref/doc/uid/TP40001075-CH4-SW1
The NSPersistentContainer is a convenience class to initialise the CoreData stack.
Take the example of initialising an NSPersistentContainer from that link: you can have the exact same initialisation code twice, with the only difference that the two NSPersistentContainers use different .sqlite files. For example, you can have two properties in your app delegate, managedObjectContextForUI and managedObjectContextForSyncing, that load different .sqlite files. Then in your program you use the context from one store to show the current data to the user, and the context backed by the other .sqlite file for the sync operations. When the sync operations are finally done you can swap the two files and, after clearing and reloading the NSPersistentContainer (this might be tricky, because you will want to invalidate and reload all managed objects: you are switching to an entirely new context), show the newly synced data to the user and start syncing again on a new .sqlite file.
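A sketch of that setup, assuming the model is named "Model" and two illustrative file names (only the two property names above come from the answer itself):

    // Two containers over the same model, backed by different .sqlite files.
    NSPersistentContainer *uiContainer =
        [[NSPersistentContainer alloc] initWithName:@"Model"];
    NSPersistentContainer *syncContainer =
        [[NSPersistentContainer alloc] initWithName:@"Model"];

    NSURL *directory = [NSPersistentContainer defaultDirectoryURL];
    uiContainer.persistentStoreDescriptions =
        @[[NSPersistentStoreDescription persistentStoreDescriptionWithURL:
            [directory URLByAppendingPathComponent:@"Current.sqlite"]]];
    syncContainer.persistentStoreDescriptions =
        @[[NSPersistentStoreDescription persistentStoreDescriptionWithURL:
            [directory URLByAppendingPathComponent:@"Syncing.sqlite"]]];

    [uiContainer loadPersistentStoresWithCompletionHandler:
        ^(NSPersistentStoreDescription *description, NSError *error) { /* handle error */ }];
    [syncContainer loadPersistentStoresWithCompletionHandler:
        ^(NSPersistentStoreDescription *description, NSError *error) { /* handle error */ }];

    // The UI reads from one context, the sync runs against the other; when the
    // sync finishes, swap the files and rebuild the containers.
    NSManagedObjectContext *managedObjectContextForUI      = uiContainer.viewContext;
    NSManagedObjectContext *managedObjectContextForSyncing = [syncContainer newBackgroundContext];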
The way I understand the problem is that you wish to download a large "object graph". It is, however, so large that it cannot be loaded into memory at once, so you would have to break it into chunks and then merge it locally into Core Data.
If that is the case, I think it's not trivial. I am not sure I can think of a direct solution without understanding the object relations, and even then it may be really overwhelming.
An overly simplistic solution may be to generate the sqlite file on the backend and then download it in chunks; it seems ugly, but it serves to separate the business logic from the sync, i.e. the sqlite file becomes the transport layer. So I think the essence of the solution would be to find a way to physically represent the data you are syncing in a format that allows splitting it into chunks and that can afterwards be merged into a sqlite file (if you insist on using Core Data).
Please also note that, as far as I know, Amazon (https://aws.amazon.com/appsync/) and Realm (https://realm.io/blog/introducing-realm-mobile-platform/) provide background sync of your local database, but those are paid services and you would have to be careful not to be locked in (don't depend on their libraries in your model layer; have a translation layer instead).

CoreData in-memory setup with MagicalRecord 3

Hello, I'm using Core Data + MagicalRecord 3 to manage the data in my app. Everything was working fine until I realized in production that my app was freezing badly!
So I started to investigate, knowing that to avoid blocking the UI it's better to have a main context and a background context and to save things in the background, etc.
Nevertheless I have questions due to my setup. I use Core Data's in-memory store type (for the best performance) and I don't care about storing the data on disk; I'm fine with a volatile model that is destroyed when the app is killed or stays in the background for too long. I just want to be able to reach my data from any view controller without coupling.
So I have a few questions:
1) If I use one single context, what would happen if I NEVER save it to the in-memory store? For instance, if I MR_createEntity, then retrieve this entity from the context and update it, is it updated everywhere, or do I have to save it for the update to be visible? In other words, what is the point of saving with an in-memory store where you don't want to persist the data forever?
2) If I use one single context that I declare as a background context, and I display my screen before my data has finished saving, the screen won't be able to find and display my data, right? Unless I use an NSFetchedResultsController, right?
1) you want to save your data even with an in-memory store, for a couple of reasons. First, so that you can use Core Data properly in case you change your mind and persist your data. Second, you'll likely want to access and process some data on different threads/queues. In that case, you'll have to use Core Data's data safety mechanisms for threads/queues. The store is the lowest level at which Core Data will sync data across threads (the old way). This may be less important if you use nested contexts to sync your data (the new way). But even with nested contexts, you'll need to call save in order for your changes to merge across contexts. Core Data doesn't really like it when you save to a nil store.
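As a sketch of that point in plain Core Data (without MagicalRecord, and assuming `model` is your NSManagedObjectModel): saving the background child is what actually pushes its changes into the main context, even though nothing is ever written to disk.

    // In-memory store with a main context and a background child context.
    NSPersistentStoreCoordinator *coordinator =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
    [coordinator addPersistentStoreWithType:NSInMemoryStoreType
                              configuration:nil URL:nil options:nil error:NULL];

    NSManagedObjectContext *mainContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    mainContext.persistentStoreCoordinator = coordinator;

    NSManagedObjectContext *backgroundContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    backgroundContext.parentContext = mainContext;

    [backgroundContext performBlock:^{
        // ... create or update entities here ...
        [backgroundContext save:NULL];   // merges the changes into mainContext, not onto disk
    }];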
2) You can make and use your own context for displaying data. NSFetchedResultsController does a lot of the leg work in listening for the correct notifications and making sure you're getting very specific updates for the data you asked for in the first place. NSFRC is not always necessary, but will certainly be the easiest way to start.
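For instance, a minimal fetched results controller on the main context (the entity name and sort key are placeholders, and the surrounding object is assumed to adopt NSFetchedResultsControllerDelegate):

    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
    request.sortDescriptors =
        @[[NSSortDescriptor sortDescriptorWithKey:@"createdAt" ascending:YES]];

    NSFetchedResultsController *frc =
        [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                            managedObjectContext:mainContext
                                              sectionNameKeyPath:nil
                                                       cacheName:nil];
    frc.delegate = self;   // table/collection view updates arrive via the delegate callbacks
    [frc performFetch:NULL];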

CoreData - (Performance) Considerations for frequent data

Background
We have an app that receives sensor data at 100 Hz. Each sensor reading contains three floats. Occasionally (at most once per second) some other metadata may be received that needs to be saved as well. The UI displays the latest 1000 sensor values in a graph. There are no undo requirements: all received data must be saved to file. Each session lasts for at least 10 min, but may (in rare circumstances, and mostly by mistake) be up to an hour.
Current approach
Model: SensorData has a many-to-one relationship with Session. MetaData has a many-to-one relationship with Session.
CoreData: A UIManagedDocument is set up to handle Core Data. One MOC is on the main thread with a child MOC on a private queue. The child MOC creates the objects and adds them to the object graph. Every 100th reading, the child MOC is saved. Once the session ends, the main MOC is saved to the PSC.
Edit: The problem I have with the current approach is that saving in the child MOC lags behind, which means not all data has been processed when the session ends, and processing time increases with run time.
Questions
Is it feasible to use CoreData as storage mechanism at ~100 Hz, or should I look at some alternative (like saving to a csv-file)?
What considerations must I take to ensure proper/optimal performance?
I have had performance issues with saves taking a long time and blocking UI. How can I avoid this? I.e. what saving policy should I use?
Drawbacks and advantages of current approach?
I think Core Data can do this.
You could use Marcus Zarra's approach of three contexts to make sure the actual save also happens in the background.
RootContext (background) saves to persistent store ---> is parent of
MainContext (main thread) to update the UI ---> is parent of one or more
WorkerContext (background) to create new data from sensor
You could then actually save more frequently in the background to the persistent store directly without impacting UI responsiveness. This should also improve memory usage. Saving the worker context will push the changes to the UI which can be updated accordingly.
For performance make sure you batch your saves; with three floats I would estimate every 1,000 to 5,000 records or so (you need to experiment to find the optimal value).
Turn off the undo manager. (context.undoManager = nil)
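A sketch of that stack and the batched save (the `coordinator` variable and the batch size are illustrative):

    // Root (background) -> Main (UI) -> Worker (background) contexts.
    NSManagedObjectContext *rootContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    rootContext.persistentStoreCoordinator = coordinator;   // only the root touches the store
    rootContext.undoManager = nil;

    NSManagedObjectContext *mainContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    mainContext.parentContext = rootContext;
    mainContext.undoManager = nil;

    NSManagedObjectContext *workerContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    workerContext.parentContext = mainContext;
    workerContext.undoManager = nil;

    // In the sensor callback: insert on the worker and save in batches.
    const NSUInteger kBatchSize = 1000;   // tune experimentally, as suggested above
    [workerContext performBlock:^{
        // ... insert a SensorData object here ...
        if (workerContext.insertedObjects.count >= kBatchSize) {
            [workerContext save:NULL];            // pushes the batch up to the main context
            [mainContext performBlock:^{
                [mainContext save:NULL];          // pushes it further up to the root
                [rootContext performBlock:^{
                    [rootContext save:NULL];      // writes to the store, off the main thread
                }];
            }];
        }
    }];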
Another consideration would be to maybe think hard about what you want to show in the UI and perhaps calculate values to display on the fly and send that to the UI, rather than have the UI rely on the entire session's data set to update itself.
I have come up against exactly this issue, in an elaboration of this project.
My task is to record live sensor data from (for example) Core Motion and Core Location at rates up to 100 Hz whilst simultaneously running a smoothly animating interface which can involve any of Core Graphics, Core Animation, OpenGL and live video. There are ~20-40 separate data items to track, mostly doubles but one or two strings, and they do not all arrive at the same sync rate.
Any hold-up during saves, however slight, will have an immediate hit on the interface.
I was interested to compare using Core Data against writing directly to a SQL database (using sqlite3). My personal experience so far (this is a work in progress) is that the SQL approach is much better suited to this type of problem than Core Data. In fact it's not really what Core Data was optimised for (which is rather to manage complex document object models with undo, persistence and efficient faulting). The Core Data model almost assumes that persistent saves will be prohibitively slow (for example, saving to iCloud), and much of its engineering is designed to offer solutions to that problem.
I have tried various Core Data patterns, backgrounding, parent/child contexts, sync, async, batching saves ... and invariably I find a noticeable stutter whenever a persistent save actually occurs.
The SQL approach, on the other hand, is simple to understand, efficient and completely free of noticeable glitches.
It may well be that I have not arrived at the optimal core data pattern for this problem (and I will be digging deeper into this, as it is an interesting edge case). However I would definitely suggest a look at the direct-to-SQL approach if that makes sense for you in your broader app context.
In slightly different data-streaming use-cases (for example, a 250-500Hz signal delivered over bluetooth) I have opted for the kind of signal-processing tricks used by audio interfaces - ring buffers, queues and callbacks can become very useful as your data rate goes up. At some point the data rate will get too high for a database-writing process to keep up: then - as you suggest - saving directly to file will be more efficient. You can always read the data back out of files at some later point and populate the database (or core data) when sampling is not taking place.
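As one possible shape of that file-writing path, here is a sketch that buffers samples behind a serial queue and flushes them to disk in batches; the class name, field layout and batch size are all made up for illustration:

    // Buffer samples in memory and flush them to a file in batches from a serial
    // queue, so the 100 Hz callback never waits on disk.
    typedef struct { double timestamp; float x, y, z; } Sample;

    @interface SampleWriter : NSObject
    - (instancetype)initWithURL:(NSURL *)url;
    - (void)appendSample:(Sample)sample;
    @end

    @implementation SampleWriter {
        dispatch_queue_t _queue;
        NSMutableData   *_buffer;
        NSFileHandle    *_file;
    }

    - (instancetype)initWithURL:(NSURL *)url {
        if ((self = [super init])) {
            _queue  = dispatch_queue_create("sample.writer", DISPATCH_QUEUE_SERIAL);
            _buffer = [NSMutableData data];
            [[NSFileManager defaultManager] createFileAtPath:url.path contents:nil attributes:nil];
            _file   = [NSFileHandle fileHandleForWritingToURL:url error:NULL];
        }
        return self;
    }

    // Called from the sensor callback; returns immediately.
    - (void)appendSample:(Sample)sample {
        dispatch_async(_queue, ^{
            [self->_buffer appendBytes:&sample length:sizeof(sample)];
            if (self->_buffer.length >= 1000 * sizeof(Sample)) {   // flush in batches
                [self->_file writeData:self->_buffer];
                self->_buffer.length = 0;
            }
        });
    }
    @end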
Matt Gallagher made a nice comparison of Core Data and databases.
It's a fairly old piece, but the patterns haven't changed, so it is still relevant. There's also a useful little (and similarly aged) discussion here on the benefits of flat files over database writing with high-frequency streams.

Fetch related Core Data objects in background? (to prevent UI freeze)

I am currently using a method where I run a fetch request in the background to obtain object IDs, and then instantiate them with -existingObjectWithID:error:.
The problem is that these objects have a to-many relationship to a large number of objects, and the UI freezes for a while when these objects are accessed. (They are accessed all at once.)
I am guessing that the related objects are faults. I am trying to figure out a way to preload them in the background. Is there a solution to this problem?
Do you know for sure that it is your main thread that is causing the slowdown? (It sure sounds like it.) I'd use Instruments and the Time Profiler to be sure, and there is also a way to turn on SQL debugging/timing.
If it is your main thread, there are fantastic WWDC videos (take a look at 2010 too, not just 2011) on how to optimize Core Data.
Try the setRelationshipKeyPathsForPrefetching: method on NSFetchRequest. Pass in an array of keys that represent relationships that should be fetched rather than faulted.
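A sketch of that, assuming a background context, a parent entity and a to-many relationship called "children" (all placeholder names):

    // Prefetch the to-many relationship so the related objects come back
    // already populated instead of as faults.
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Parent"];
    [request setRelationshipKeyPathsForPrefetching:@[@"children"]];
    request.returnsObjectsAsFaults = NO;   // keep the fetched objects themselves populated too

    NSError *error = nil;
    NSArray *results = [backgroundContext executeFetchRequest:request error:&error];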
Core Data is not thread safe, so for a background thread you should have a separate managed object context.
Typically Core Data doesn't take a lot of time to load. But if you are storing blobs (like image data) it can hurt performance. You should use an NSFetchedResultsController with the batch size you want to set; it is much faster, so you probably won't need to worry about background fetching.

Performance of NSManagedObjectContext save degrades dramatically

I am having issues with a CoreData-based iOS app when it tries to build the initial DB from data sent from the server. Basically, the server sends down 1MB chunks of objects (about 3,000 per chunk), and the iOS client deserializes them and writes them to disk.
What I'm seeing is that everything is going pretty well for about the first 8 chunks (out of 44), then performance drops off dramatically and each chunk starts taking longer and longer, as in the image below. Pretty much all the time is consumed in [NSManagedObjectContext save] as you can see in the Instruments profiling data, but also it appears that the app is no longer running at 100% of CPU for some reason, like it's waiting on disk I/O or something.
A few important facts about how I'm doing this:
Each chunk is processed in its own NSManagedObjectContext with its own NSAutoreleasePool, so there is no object build-up in a non-flushed context between processing of chunks.
There is no NSUndoManager set on any of the contexts.
There is no mergeChangesFromContextDidSaveNotification: going on (i.e. the chunk contexts aren't pushing their changes into a "master" context)
I'm using a SQLite-based datastore on iOS 4.3.
The records being written do have indexes on them.
The entire sync job is processed on a single GCD background thread (i.e. dispatch_queue_create() and dispatch_async()).
I have no idea why the performance suddenly drops off like that or what can be done to address it. I have poked around and read the following, but nothing has jumped out at me yet:
http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html
Does the performance of saving a ManagedObjectContext depend on the number of contained (unchanged) objects?
Any ideas or pointers for making this app scale up to 100,000 records in the database would be much appreciated.
Edit - extra stats
This Instruments graph shows the same simulation as above (on iPad2), but includes the disk activity stats and you can see pretty plainly that all of the "not running at 100% CPU" time seems to be taken up with writing to disk.
I also ran the same sync attempt on the iOS Simulator. Overall memory usage is more or less constant for each chunk except for a dictionary that contains object IDs that grows slightly over time (but these are not CoreData objects or anything that would affect saves, they are just NSNumbers). This dict is a small amount of memory compared to the total heap, so the problem is not running out of memory.
What is interesting about this test is that the CoreData Save instrument reports that the successive saves take roughly the same amount of time, which obviously conflicts with the CPU profiling information from the first set of results. It seems like CoreData thinks it is taking the same amount of time to push changes to the DB, but the DB itself (i.e. SQLite) suddenly takes a lot longer to actually stream those changes to disk.
I know this is an old issue, so this is probably no longer relevant for you, but it may be to someone else.
I've seen performance issues seeding a Core Data database over iCloud and discovered that if you have inverse relationships on the data model you can be hurt incredibly badly performance-wise. The way iCloud transaction logging has been implemented, it actually seems to be an inevitable problem. Each transaction sent to iCloud (have a look at them on developer.icloud.com - they're just zipped-up plists) records every relationship that is affected by a change. Unlike when you modify one end of a relationship in Core Data and it takes care of the inverse end, the Core Data transaction log ends up recording the changes at BOTH ends rather than working it out.
So if you have a 1 to many relationship, and you create another record which will end up hanging off the 'many' end - well the record at the '1' end will also be updated to reflect the fact a new additional record is now hanging off it. If you have an architecture that means you have a 'type' object that lots of 'data' objects hang off, then every time you add a new data object, the type one is going to have a transaction written for it as well - but here's the kicker, because the iCloud Core Data transactions record the ENTIRE state of edited entities, not just the changes, EVERY relationship already recorded against it is also added to the log, not just the one indicating the new subordinate record. This can quickly spiral out of control as the amount of data written grows as the number of relationships between entities grows - it ends up taking longer and longer to save batches.
I've answered a question a bit like this before here on the Apple dev forums which might be useful as I never seem to be able to describe this succinctly.
The easiest option to improve seeding performance if this scenario is what is impacting you is to switch inverse relationships off, but this isn't always an option.
More information about your implementation would help. For example, do you run this on the main thread or on background threads? However, I have seen this behavior before. When performing extensive batch operations using Core Data, it can slow down if memory is not managed properly. Have you checked memory usage? Have you checked for leaks? Another thing to try is to make sure you are using NSAutoreleasePool correctly if needed. Draining the pool periodically may help performance.
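As a sketch of that last point with the modern @autoreleasepool syntax (the `chunks` and `context` names are illustrative, matching the per-chunk processing described in the question):

    for (NSArray *chunk in chunks) {
        @autoreleasepool {
            for (NSDictionary *record in chunk) {
                // ... insert and populate a managed object from `record` ...
            }
            [context save:NULL];   // push this chunk to the store
            [context reset];       // drop the now-saved objects so memory stays flat
        }   // temporary objects created while deserializing are released here
    }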
