Background
We have an app that receives sensor data at 100 Hz. Each sensor data contains three floats. Occasionally (max 1/s) some other metadata may be received that needs to be saved as well. The UI displays the latest 1000 sensor values in a graph. There are no undo-requirements - all received data must be saved to file. Each session lasts for at least 10 min, but may (in rare circumstances and mostly due to mistake) be up to an hour.
Current approach
Model: SensorData has a many-to-one relationship with Session. MetaData has a many-to-one relationship with Session.
CoreData: Set up a UIManagedDocument to handle CoreData. One MOC on main thread with a child MOC on a private queue. The child MOC creates the objects and add them to the object graph. Every 100th data, save child MOC. Once session ends, save main MOC to PSC.
Edit: The problem I have with the current approach is that saving in the child MOC lags behind, which means not all data has been processed when session ends and processing time increases with run time.
Questions
Is it feasible to use CoreData as storage mechanism at ~100 Hz, or should I look at some alternative (like saving to a csv-file)?
What considerations must I take to ensure proper/optimal performance?
I have had performance issues with saves taking a long time and blocking UI. How can I avoid this? I.e. what saving policy should I use?
Drawbacks and advantages of current approach?
I think Core Data can do this.
You could use Marcus Zarra's approach of three contexts to make sure the actual save also happens in the background.
RootContext (background) saves to persistent store ---> is parent of
MainContext (main thread) to update the UI ---> is parent of one or more
WorkerContext (background) to create new data from sensor
You could then actually save more frequently in the background to the persistent store directly without impacting UI responsiveness. This should also improve memory usage. Saving the worker context will push the changes to the UI which can be updated accordingly.
For performance make sure you batch save - with three floats I would estimate every 1.000 to 5.000 records or so (you need to experiment to find the optimal value).
Turn off the undo manager. (context.undoManager = nil)
Another consideration would be to maybe think hard about what you want to show in the UI and perhaps calculate values to display on the fly and send that to the UI, rather than have the UI rely on the entire session's data set to update itself.
I have come up against exactly this issue, in an elaboration of this project.
My task is to record live sensor data from (for example) Core Motion and Core Location at rates up to 100Hz whilst simultaneously running a smoothly animating interface which can inolve any of Core Graphics, Core Animation, OpenGL and live video. There are ~20-40 separate data items to track, mostly doubles but one or two strings, and they do not all arrive at the same sync rate.
Any hold-up during saves, however slight, will have an immediate hit on the interface.
I was interested to compare using Core Data against writing directly to a SQL database (using sqlite3). My personal experience so far (this is a work in progress) is that the SQL approach is much better suited to this type of problem than Core data. In fact its not really what Core Data was optimised for (which is rather to manage complex document object models with undo, persistence and efficient faulting). The Core Data model almost assumes that persistent saves will be prohibitively slow (for example, saving to iCloud), and much of it's engineering is designed to offer solutions to that problem.
I have tried various core data patterns, backgrounding, parent/child contexts, sync, async, batching saves ... and invariably i find a noticeable stutter whenever a persistent save actually occurs.
The SQL approach, on the other hand, is simple to understand, efficent and completely free of noticable glitches.
It may well be that I have not arrived at the optimal core data pattern for this problem (and I will be digging deeper into this, as it is an interesting edge case). However I would definitely suggest a look at the direct-to-SQL approach if that makes sense for you in your broader app context.
In slightly different data-streaming use-cases (for example, a 250-500Hz signal delivered over bluetooth) I have opted for the kind of signal-processing tricks used by audio interfaces - ring buffers, queues and callbacks can become very useful as your data rate goes up. At some point the data rate will get too high for a database-writing process to keep up: then - as you suggest - saving directly to file will be more efficient. You can always read the data back out of files at some later point and populate the database (or core data) when sampling is not taking place.
Matt Gallagher made a nice comparision of Core Data and Databases.
It's a fairly old piece, but the patterns haven't changed so it is still relevant. There's also useful little (and similarly-aged) discussion here on the benefits of flat file over database writing with high-frequency streams.
Related
Hello I'm using CoreData + MagicalRecord 3 to manage the data in my app. Until then everything was working fine, but then I realize in production than my app is freezing like hell !
So I started to investigate knowing about the fact that to not stuck the UI, it's better to have a main context and a background context and save stuff in background etc...
Nevertheless I have to question due to my setup. I use CoreData in-memory store system (for the best performance) and I don't care about storing the data on disk of my app, I'm fine with a volatile model that will be destroyed when the app is killed or in background for too long. I just want to be able to find my data from any view controller and without coupling.
So I have few questions :
1) If I would use 1 unique context, what would happen if I NEVER save it to the memory store ? For instance if I MR_createEntity then I retrieve this entity from the context and update it, is it updated everywhere or do I have to save it so it can be updated ? In other term was is the interest of saving for in-memory where you don't want to persist the data forever ?
2) If I use 1 unique context that I declare being background, if I display my screen before my data is finished to saved, the screen won't be able to find and display my data right ? Unless I use NSFetchResultController right ?
1) you want to save your data even with an in memory store for a couple of reasons. First, so that you can use core data properly in the case where you might change your mind and persist your data. Second, you'll likely want to access and process some data on different threads/queues. In that case, you'll have to use Core Data's data safety mechanisms for threads/queues. The store is the lowest level at which Core Data will sync data across threads (the old way). This may be less important if you use nested contexts to sync your data (the new way). But even with nested contexts, you'll need call save in order for your changes to merge across contexts. Core Data doesn't really like it when you save to a nil store.
2) You can make and use your own context for displaying data. NSFetchedResultsController does a lot of the leg work in listening for the correct notifications and making sure you're getting very specific updates for the data you asked for in the first place. NSFRC is not always necessary, but will certainly be the easiest way to start.
I am developing iOS an app which will gather big amounts of data from several sources (up to tens of thousands of objects, but simple objects, no images) and save it to my own database using core data. I then analyse this data and display the results to the user.
I want to know if there is any benefit to using a Main Queue Nsmanagedobjectcontext or if it is enough that I use a private one.
I also want to know what the benefit is of having several NSManagedObjectContext or if one is enough?
The concurrency model i am using currently only has one private queue nsmanagedobjectcontext connected to a persistant store coordinator. All the data analysis is performed on the private queue and then I simply pass the analyzed data to the main queue to display it. On older devices (iPhone 4) my application can sometimes crash when too much data is being loaded (i.e. downloaded from the external databases) at the same time, is this related to my choice of concurrency model?
Your current approach sounds fine. You only need a main thread context if you want the main thread to interact with the data, and in your case you don't so that's fine.
Your memory management is effectively unrelated and is more tied to how many things you have going on at once (it sounds like one) and how many objects you try to keep in main memory at any one time (it sounds like many) instead of faulting them out to the data store. This is what you need to look at / work on. Instruments can help you see how many objects you're keeping in memory.
At least call refreshObject:mergeChanges: with NO for merge changes to fault out any objects that you aren't using.
Also, remember that you're working on a mobile device and that processing up to tens of thousands of objects is a job better handled by a server...
For example if we hit "Stop" in XCode, it will close the app, mimicking the crash behaviour.
But if my Core Data Context hasn't been saved, when I go back, the data won't be there.
Are there any workaround for this?
Should I save the context every time a big operation is finished?
Thanks.
Based on my experience you should decide the right granularity when you use Core Data save mechanism.
IMHO (maybe someone else could have different opinions) there is no standard to follow. My rule of thumb is taking into considerations two different aspects. The user and performances.
In the first case, you should save whenever the user performs critical operations. e.g. the user has inserted a lot of values in a form and hence he will expect to not insert them again. Regarding the second aspect, save operations could impact the performance of your app. If you frequently write changes to disk the app will be less responsive. On the contrary having so many objects in memory could led to memory warning (those will cause Core Data to take specific behaviors).
A tradeoff could be using background operations to save changes or take advantage of new Core Data API. Obviously, previous rules still remain valid.
I am currently using a method where I run a fetch request in the background to obtain object IDs, and then instantiating them with -existingObjectWithID:error:.
The problem is that these objects have to-many relation to a large number of objects. And the UI freezes for a while when these objects are accessed. (They are accessed all at once.)
I am guessing that the related objects are faults. I am trying to figure out a way to preload them in the background. Is there a solution to this problem?
Do you know for sure that it is your main thread that is causing the slowdown (sure sounds like it) - I'd use Instruments and "Time Profiler" to be sure, and there is also a way to turn on SQL debugging/timing too.
If it is your main thread, there are fantastic WWDC videos (take a look at 2010 too, not just 2011) on how to optimize Core Data.
Try the setRelationshipKeyPathsForPrefetching: method on NSFetchRequest. Pass in an array of keys that represent relationships that should be fetched rather than faulted.
Core data is not thread safe. So for background thread you should have separate managed context.
Typically core data don't take lots time to load. But if you are storing blobs (like image data) it can hit the performance. You should you NSFetchRequestController with page size you want to set. It much faster So you probably wont need to worry about about background fetching
I am having issues with a CoreData-based iOS app when it tries to build the initial DB from data sent from the server. Basically, the server sends down 1MB chunks of objects (about 3,000 per chunk), and the iOS client deserializes them and writes them into disk.
What I'm seeing is that everything is going pretty well for about the first 8 chunks (out of 44), then performance drops off dramatically and each chunk starts taking longer and longer, as in the image below. Pretty much all the time is consumed in [NSManagedObjectContext save] as you can see in the Instruments profiling data, but also it appears that the app is no longer running at 100% of CPU for some reason, like it's waiting on disk I/O or something.
A few important facts about how I'm doing this:
Each chunk is processed in its own NSManagedObjectContext with its own NSAutoreleasePool, so there is no object build-up in a non-flushed context between processing of chunks.
There is no NSUndoManager set on any of the contexts.
There is no mergeChangesFromContextDidSaveNotification: going on (i.e. the chunk contexts aren't pushing their changes into a "master" context)
I'm using a SQLite-based datastore on iOS 4.3.
The records being written do have indexes on them.
The entire sync job is processed on a single GCD background thread (i.e. dispatch_queue_create() and dispatch_async()).
I have no idea why the performance suddenly drops off like that or what can be done to address it. I have poked around and read the following, but nothing has jumped out at me yet:
http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html
Does the performance of saving a ManagedObjectContext depend on the number of contained (unchanged) objects?
Any ideas or pointers for making this app scale up to 100,000 records in the database would be much appreciated.
Edit - extra stats
This Instruments graph shows the same simulation as above (on iPad2), but includes the disk activity stats and you can see pretty plainly that all of the "not running at 100% CPU" time seems to be taken up with writing to disk.
I also ran same sync attempt running on the iOS simulator. Overall memory usage is more or less constant for each chunk except for a dictionary that contains object IDs that grows slightly over time (but these are not CoreData objects or anything that would affect saves, they are just NSNumbers). This dict is a small amount of memory compared to the total heap and so the problem is not running out of memory.
What is interesting about this test is that the CoreData Save instrument reports that the successive saves take roughly the same amount of time, which obviously conflicts with the CPU profiling information from the first set of results. It seems like CoreData thinks it is taking the same amount of time to push changes to the DB, but the DB itself (i.e. SQLite) suddenly takes a lot longer to actually stream those changes to disk.
I know this is an old issue, so this is probably no longer relevant for you, but it may be to someone else.
I've seen performance issues seeding a Core Data database over iCloud and discovered that if you have inverse relationships on the data model you can be hurt incredibly badly performance wise. The way iCloud transaction logging has been implemented, it actually seems to be an inevitable problem. Each transaction sent to iCloud (have a look at them on developer.icloud.com - they're just zipped up plists) records every relationship that is affected by a change. Unlike when you modify one end of an relationship in Core Data, and it takes care of the inverse end, the core data transaction log ends recording the changes at BOTH ends, rather than it working it out.
So if you have a 1 to many relationship, and you create another record which will end up hanging off the 'many' end - well the record at the '1' end will also be updated to reflect the fact a new additional record is now hanging off it. If you have an architecture that means you have a 'type' object that lots of 'data' objects hang off, then every time you add a new data object, the type one is going to have a transaction written for it as well - but here's the kicker, because the iCloud Core Data transactions record the ENTIRE state of edited entities, not just the changes, EVERY relationship already recorded against it is also added to the log, not just the one indicating the new subordinate record. This can quickly spiral out of control as the amount of data written grows as the number of relationships between entities grows - it ends up taking longer and longer to save batches.
I've answered a question a bit like this before here on the Apple dev forums which might be useful as I never seem to be able to describe this succinctly.
The easiest option to improve seeding performance if this scenario is what is impacting you is to switch inverse relationships off, but this isn't always an option.
More information about your implementation would help. For example, do you run this on the main thread or are you implementing background threads? However, I have seen this behavior before. When performing extensive batch operations using Core Data, it can slow down if not memory managed properly. Have you checked memory usage? Have you checked for leaks? Another thing to try is to make sure you are using NSAutoreleasePool correctly if needed. By draining the pool periodically, that may help performance.