I am having issues with a Core Data-based iOS app when it tries to build the initial DB from data sent from the server. Basically, the server sends down 1 MB chunks of objects (about 3,000 per chunk), and the iOS client deserializes them and writes them to disk.
What I'm seeing is that everything is going pretty well for about the first 8 chunks (out of 44), then performance drops off dramatically and each chunk starts taking longer and longer, as in the image below. Pretty much all the time is consumed in [NSManagedObjectContext save] as you can see in the Instruments profiling data, but also it appears that the app is no longer running at 100% of CPU for some reason, like it's waiting on disk I/O or something.
A few important facts about how I'm doing this:
Each chunk is processed in its own NSManagedObjectContext with its own NSAutoreleasePool, so there is no object build-up in a non-flushed context between processing of chunks.
There is no NSUndoManager set on any of the contexts.
There is no mergeChangesFromContextDidSaveNotification: going on (i.e. the chunk contexts aren't pushing their changes into a "master" context).
I'm using a SQLite-based datastore on iOS 4.3.
The records being written do have indexes on them.
The entire sync job is processed on a single GCD background thread (i.e. dispatch_queue_create() and dispatch_async()); a rough sketch of the per-chunk loop follows the list.
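For reference, a minimal sketch of that per-chunk pattern, expressed in modern Swift for brevity (the real code is Objective-C on iOS 4.3, and `deserializeChunk` and the "Record" entity name are placeholders, not my actual code):

```swift
import CoreData

// Placeholder standing in for the real chunk deserializer.
func deserializeChunk(_ chunk: Data) -> [[String: Any]] { return [] }

// One serial background queue for the whole sync job.
let syncQueue = DispatchQueue(label: "com.example.sync")

func importChunks(_ chunks: [Data], coordinator: NSPersistentStoreCoordinator) {
    syncQueue.async {
        for chunk in chunks {
            autoreleasepool {
                // Fresh context per chunk, no undo manager, no parent context.
                let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
                context.persistentStoreCoordinator = coordinator
                context.undoManager = nil
                context.performAndWait {
                    for payload in deserializeChunk(chunk) {
                        let object = NSEntityDescription.insertNewObject(forEntityName: "Record",
                                                                         into: context)
                        object.setValuesForKeys(payload)
                    }
                    try? context.save()   // ~3,000 objects per save
                }
            }
        }
    }
}
```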
I have no idea why the performance suddenly drops off like that or what can be done to address it. I have poked around and read the following, but nothing has jumped out at me yet:
http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html
Does the performance of saving a ManagedObjectContext depend on the number of contained (unchanged) objects?
Any ideas or pointers for making this app scale up to 100,000 records in the database would be much appreciated.
Edit - extra stats
This Instruments graph shows the same simulation as above (on iPad2), but includes the disk activity stats and you can see pretty plainly that all of the "not running at 100% CPU" time seems to be taken up with writing to disk.
I also ran the same sync attempt on the iOS simulator. Overall memory usage is more or less constant for each chunk, except for a dictionary containing object IDs that grows slightly over time (but these are not Core Data objects or anything that would affect saves; they are just NSNumbers). This dict is a small amount of memory compared to the total heap, so the problem is not running out of memory.
What is interesting about this test is that the CoreData Save instrument reports that the successive saves take roughly the same amount of time, which obviously conflicts with the CPU profiling information from the first set of results. It seems like CoreData thinks it is taking the same amount of time to push changes to the DB, but the DB itself (i.e. SQLite) suddenly takes a lot longer to actually stream those changes to disk.
I know this is an old issue, so this is probably no longer relevant for you, but it may be to someone else.
I've seen performance issues seeding a Core Data database over iCloud and discovered that if you have inverse relationships on the data model you can be hurt incredibly badly, performance-wise. The way iCloud transaction logging has been implemented, it actually seems to be an inevitable problem. Each transaction sent to iCloud (have a look at them on developer.icloud.com - they're just zipped-up plists) records every relationship that is affected by a change. Unlike Core Data itself, where you modify one end of a relationship and it takes care of the inverse end, the Core Data transaction log ends up recording the changes at BOTH ends rather than working one out from the other.
So if you have a one-to-many relationship and you create another record that will end up hanging off the 'many' end, the record at the 'one' end will also be updated to reflect the fact that a new record is now hanging off it. If you have an architecture where lots of 'data' objects hang off a 'type' object, then every time you add a new data object, the type object gets a transaction written for it as well. But here's the kicker: because the iCloud Core Data transactions record the ENTIRE state of edited entities, not just the changes, EVERY relationship already recorded against it is also added to the log, not just the one indicating the new subordinate record. This can quickly spiral out of control, as the amount of data written grows with the number of relationships between entities - it ends up taking longer and longer to save batches.
I've answered a question a bit like this before here on the Apple dev forums, which might be useful, as I never seem to be able to describe this succinctly.
The easiest option to improve seeding performance if this scenario is what is impacting you is to switch inverse relationships off, but this isn't always an option.
More information about your implementation would help. For example, do you run this on the main thread or on background threads? That said, I have seen this behavior before: when performing extensive batch operations with Core Data, things can slow down if memory is not managed properly. Have you checked memory usage? Have you checked for leaks? Another thing to try is to make sure you are using NSAutoreleasePool correctly if needed; draining the pool periodically may help performance.
Related
My application is written in Swift (latest version), and it has a fairly complex database structure.
I'm importing records on first launch because the app must support offline access, and the app can have millions of records.
I'm saving records into entities that have relationships with around 14-15 other entities (one-to-one and one-to-many).
My application throws a memory warning and gets terminated after around 1,000,000 records. I tried profiling for leaks, but the app runs fine then; however, it takes a long time.
I have tried creating a singleton context-manager class, and I have also tried creating a local context while inserting each chunk of records.
For now, I'm fetching 50 records at a time from the web API and saving my context after updating my entities.
I have tried autoreleasepool, but with no success.
Please suggest what I should do.
Thank you
Ashwin
I can advise you to watch this video. It is very inspiring and explains a lot of useful things about Core Data:
https://developer.apple.com/videos/play/wwdc2013/211/
Are you using the fetchBatchSize property?
https://developer.apple.com/reference/coredata/nsfetchrequest/1506558-fetchbatchsize
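If large fetches are part of your import, a batched fetch keeps only a window of rows materialized at once. A minimal sketch of what that looks like (the "Record" entity name is a placeholder, not from your model):

```swift
import CoreData

// Illustrative only: fetch rows in batches of 50 so Core Data keeps most of
// the result set as faults instead of fully realized objects.
func fetchInBatches(from context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Record")  // placeholder entity
    request.fetchBatchSize = 50              // rows are faulted in 50 at a time
    request.returnsObjectsAsFaults = true    // don't pre-populate attribute values
    return try context.fetch(request)
}
```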
If you are processing large amounts of Core Data objects in a loop, then you need to periodically save the context so that Core Data can turn modified objects back into faults instead of keeping them in memory. How often you need to save, and when, depends on your application and the code you are using to process the records, which it would be helpful to see in your question. You'll need to experiment yourself to find a balance between speed and memory use.
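As a rough illustration (not your code; the "Record" entity and the payload dictionaries are placeholders), the loop might look something like this, saving every few hundred inserts and then resetting the context so finished objects are released:

```swift
import CoreData

func importRecords(_ records: [[String: Any]], into context: NSManagedObjectContext) throws {
    let saveInterval = 500                    // tune experimentally: speed vs. memory
    for (index, payload) in records.enumerated() {
        autoreleasepool {
            let object = NSEntityDescription.insertNewObject(forEntityName: "Record",  // placeholder
                                                             into: context)
            object.setValuesForKeys(payload)
        }
        if (index + 1) % saveInterval == 0 {
            try context.save()
            context.reset()                   // drop registered objects so memory is released
        }
    }
    try context.save()                        // persist the final partial batch
}
```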
Use the allocations instrument and you will see where your memory is going. You're not leaking memory, you're just using too much of it.
Disable zombie objects for your project. With zombies enabled, deallocated objects are kept around for debugging, which inflates memory usage. In Xcode, edit the scheme, open the Diagnostics tab of the Run action, and uncheck Zombie Objects.
I have an app that uses Realm as a staging database. It receives information from a bluetooth device, processes it, and sends the processed result to a server.
The incoming data from bluetooth gets stored in a Realm table (table1). A separate thread reads data from the Realm database, processes it, and stores it into a second table (table2) for uploading to a server. When it has pulled this data and successfully processed it, it deletes it from table1.
The third thread pulls data from table2, and when it successfully sends, removes it from table2.
I'm using a database here so that if, for whatever reason, the app is killed, data won't be lost... it will just resume where it left off when the app is restarted. But as you can see, the database is not something that hangs around (it's not like an address book or something... it is just temporary staging).
What I notice is that no matter what the heck I do, the Realm database file just increases in size over time. I'll end up with a database that, if I open it, has one record in it, but the database file on disk could be tens of MB in size if the app has been running long enough.
Data is being processed on different background queues so as not to block the UI (one of the reasons I'm using Realm instead of Core Data). But I'm using things like autoreleasepools and invalidate() to avoid keeping the objects I read alive (as suggested by many Realm questions/answers).
What gives? I know I don't have a code sample here, but this just seems like a basic garbage-collection problem in Realm. I've seen other questions related to this where people ask "why is my database so huge?", and the answers suggest doing things like writeCopyToPath, but that feels like an incredible hack, and regardless, it would be very difficult: this app is meant to be constantly connected to and monitoring a bluetooth device, so compacting would mean stopping, making sure all threads that might alter the database are quiesced, copying the file to compact the DB, and then starting everything back up again. That just seems nonsensical to me; it might interrupt user operations, for example. I don't want a user to be unable to do something because I decided it was time to do database maintenance.
I feel like I'm either missing some incredibly fundamental point in how to make Realm not keep junk around, or Realm is just the completely wrong solution for my problem. I've never seen this problem with databases - adding and deleting lots of records... quickly... seems like something a database should just be able to do without exploding in size.
Are you making sure that the background thread is not holding on to old versions of the Realm, preventing the space from being reused?
Quote from the docs (https://realm.io/docs/swift/latest/#seeing-changes-from-other-threads):
If a thread has no runloop (which is generally the case in a background thread), then Realm.refresh() must be called manually in order to advance the transaction to the most recent state.
Failing to refresh Realms on a regular basis could lead to some transaction versions becoming “pinned”, preventing Realm from reusing the disk space used by that version, leading to larger file sizes.
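As a hedged sketch of what that can look like on a queue with no run loop (the `StagedReading` model and the processing step are placeholders, not taken from the question):

```swift
import RealmSwift

// Placeholder model standing in for "table1".
final class StagedReading: Object {
    @objc dynamic var payload = Data()
}

func drainStagingTable() {
    DispatchQueue.global(qos: .utility).async {
        autoreleasepool {                         // let accessor objects go promptly
            guard let realm = try? Realm() else { return }
            realm.refresh()                       // advance past any pinned version
            let pending = realm.objects(StagedReading.self)
            try? realm.write {
                // ... process each reading and write the result to "table2" ...
                realm.delete(pending)             // remove rows that have been handled
            }
        }
    }
}
```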
By using both the synchronous=OFF and journal_mode=MEMORY options, I am able to reduce the time for updates from 15 ms to around 2 ms, which is a major performance improvement. These updates happen one at a time, so many other optimizations (like wrapping a bunch of them in a single transaction) are not applicable.
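For context, these are just per-connection PRAGMAs; a minimal sketch of how they are issued from Swift (the file path is illustrative):

```swift
import SQLite3

// Open a connection and trade durability for speed: no fsync on commit,
// and the rollback journal held in memory instead of on disk.
var db: OpaquePointer?
if sqlite3_open("/path/to/store.sqlite", &db) == SQLITE_OK {
    sqlite3_exec(db, "PRAGMA synchronous = OFF;", nil, nil, nil)
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY;", nil, nil, nil)
}
```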
According to the SQLite documentation, the DB can go 'corrupt' in the worst case if there is a power outage of some type. However, isn't the worst that can happen simply that data is lost, or that part of a transaction is lost (which I guess is a form of corruption)? Is it really possible for arbitrary corruption to occur with either of these options? If so, why?
I am not using any transactions, so partially written data from transactions is not a concern, and I can handle losing data once in a blue moon. But if 'corruption' means that all the data in the DB can be randomly changed in an unpredictable way, that would be a strong reason not to use these options.
Does anyone know what the real worst-case behavior would be on iOS?
Tables are organized as B-trees with the rowid as the key.
If some writes get lost while SQLite is updating the tree structure, the entire table might become unreadable.
(The same can happen with indexes, but those could be simply dropped and recreated.)
Data is organized in pages (typically 1 KB or 4 KB). If some page update gets lost while a tree is being reorganized, all the data in those pages (i.e., some random rows from the table with nearby rowid values) might become corrupted.
If SQLite needs to allocate a new page, and that page contains plausible data (e.g., deleted data from the same table), and the writing of that page gets lost, then you have incorrect data in the table, without the ability to detect it.
Background
We have an app that receives sensor data at 100 Hz. Each sensor reading contains three floats. Occasionally (at most once per second) some other metadata may be received that needs to be saved as well. The UI displays the latest 1000 sensor values in a graph. There are no undo requirements - all received data must be saved to file. Each session lasts for at least 10 min, but may (in rare circumstances, and mostly by mistake) be up to an hour.
Current approach
Model: SensorData has a many-to-one relationship with Session. MetaData has a many-to-one relationship with Session.
CoreData: Set up a UIManagedDocument to handle Core Data. One MOC on the main thread with a child MOC on a private queue. The child MOC creates the objects and adds them to the object graph. Every 100th data point, save the child MOC. Once the session ends, save the main MOC to the PSC.
Edit: The problem I have with the current approach is that saving in the child MOC lags behind, which means not all data has been processed when the session ends, and processing time increases with run time.
Questions
Is it feasible to use Core Data as a storage mechanism at ~100 Hz, or should I look at some alternative (like saving to a CSV file)?
What considerations must I take to ensure proper/optimal performance?
I have had performance issues with saves taking a long time and blocking UI. How can I avoid this? I.e. what saving policy should I use?
Drawbacks and advantages of current approach?
I think Core Data can do this.
You could use Marcus Zarra's approach of three contexts to make sure the actual save also happens in the background.
RootContext (background) saves to persistent store ---> is parent of
MainContext (main thread) to update the UI ---> is parent of one or more
WorkerContext (background) to create new data from sensor
You could then actually save more frequently in the background to the persistent store directly without impacting UI responsiveness. This should also improve memory usage. Saving the worker context will push the changes to the UI which can be updated accordingly.
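A minimal sketch of that three-context stack (names are illustrative, and the model/store setup is omitted):

```swift
import CoreData

// Build the stack: root (background, talks to the store) <- main (UI) <- worker (background).
func makeContextStack(coordinator: NSPersistentStoreCoordinator)
    -> (root: NSManagedObjectContext, main: NSManagedObjectContext, worker: NSManagedObjectContext) {

    let root = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    root.persistentStoreCoordinator = coordinator     // only this context touches the store

    let main = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    main.parent = root                                // drives the UI

    let worker = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    worker.parent = main                              // receives incoming sensor data

    return (root, main, worker)
}

// Saving a batch: each save pushes changes one level up; only the root save hits disk.
func saveBatch(root: NSManagedObjectContext, main: NSManagedObjectContext, worker: NSManagedObjectContext) {
    worker.perform {
        try? worker.save()
        main.perform {
            try? main.save()
            root.perform {
                try? root.save()                      // disk write happens off the main thread
            }
        }
    }
}
```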
For performance, make sure you batch your saves - with three floats I would estimate every 1,000 to 5,000 records or so (you need to experiment to find the optimal value).
Turn off the undo manager. (context.undoManager = nil)
Another consideration would be to maybe think hard about what you want to show in the UI and perhaps calculate values to display on the fly and send that to the UI, rather than have the UI rely on the entire session's data set to update itself.
I have come up against exactly this issue, in an elaboration of this project.
My task is to record live sensor data from (for example) Core Motion and Core Location at rates up to 100 Hz whilst simultaneously running a smoothly animating interface which can involve any of Core Graphics, Core Animation, OpenGL and live video. There are ~20-40 separate data items to track, mostly doubles but one or two strings, and they do not all arrive at the same sync rate.
Any hold-up during saves, however slight, will have an immediate hit on the interface.
I was interested to compare using Core Data against writing directly to a SQL database (using sqlite3). My personal experience so far (this is a work in progress) is that the SQL approach is much better suited to this type of problem than Core Data. In fact, it's not really what Core Data was optimised for (which is rather to manage complex document object models with undo, persistence and efficient faulting). The Core Data model almost assumes that persistent saves will be prohibitively slow (for example, saving to iCloud), and much of its engineering is designed to offer solutions to that problem.
I have tried various Core Data patterns, backgrounding, parent/child contexts, sync, async, batching saves... and invariably I find a noticeable stutter whenever a persistent save actually occurs.
The SQL approach, on the other hand, is simple to understand, efficient and completely free of noticeable glitches.
It may well be that I have not arrived at the optimal core data pattern for this problem (and I will be digging deeper into this, as it is an interesting edge case). However I would definitely suggest a look at the direct-to-SQL approach if that makes sense for you in your broader app context.
In slightly different data-streaming use-cases (for example, a 250-500Hz signal delivered over bluetooth) I have opted for the kind of signal-processing tricks used by audio interfaces - ring buffers, queues and callbacks can become very useful as your data rate goes up. At some point the data rate will get too high for a database-writing process to keep up: then - as you suggest - saving directly to file will be more efficient. You can always read the data back out of files at some later point and populate the database (or core data) when sampling is not taking place.
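For what it's worth, here is a toy (deliberately non-thread-safe) ring buffer of the kind hinted at above; a real producer/consumer setup would add a lock or a lock-free queue around it:

```swift
// Fixed-capacity ring buffer: the sensor callback pushes samples, and a consumer
// drains them in batches and writes to file or database off the hot path.
struct RingBuffer {
    private var storage: [Float]
    private var head = 0        // next write position
    private var count = 0       // number of valid samples

    init(capacity: Int) {
        storage = Array(repeating: 0, count: capacity)
    }

    mutating func push(_ value: Float) {
        storage[head] = value
        head = (head + 1) % storage.count
        count = min(count + 1, storage.count)   // overwrite the oldest sample when full
    }

    mutating func drain() -> [Float] {
        let start = (head - count + storage.count) % storage.count
        let values = (0..<count).map { storage[(start + $0) % storage.count] }
        count = 0
        return values
    }
}
```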
Matt Gallagher made a nice comparison of Core Data and databases.
It's a fairly old piece, but the patterns haven't changed so it is still relevant. There's also useful little (and similarly-aged) discussion here on the benefits of flat file over database writing with high-frequency streams.
I'm getting a weird issue. I have the following set-up:
Model.xcdatamodeld
ModelBackup.xcdatamodeld
Each of these has its own NSManagedObjectContext, NSPersistentStoreCoordinator and NSManagedObjectModel.
I can read and write to each of these successfully separately. However, if I try to read/write to Model at the same time I'm read/writing to ModelBackup, I get infinite memory allocation. In the simulator, the CPU will spike to 200% and the memory will increase around 80-100 MB/second. This will eventually crash when memory hits 2.0+GB. This happens when I do a context save on both these NSManagedObjectContexts. I can read/access them fine.
Anyone know why I'm unable to write to both of these?
I have Model with ConcurrencyType:NSMainQueueConcurrencyType and ModelBackup with ConcurrencyType:NSPrivateQueueConcurrencyType.
The idea behind having two different xcdatamodelds is as a workaround/alternative to the parent-child Core Data pattern. Our app does massive updates to the data model, so I want a background data model to perform those updates and then, the next time the app launches, switch to the newly updated SQLite file.
First, you can do massive updates with parent/child as long as you pay attention to how much you are writing to disk at any one save.
Second, you can have two NSPersistentStoreCoordinator instances pointed at the same sqlite file and avoid using two files.
Third, what does your code look like for the creation of these contexts? Are you using the second context only within -performBlock: calls?
Fourth, what happens when you stop in the middle of that memory allocation? What does your stack look like?
For your immediate problem, you have an infinite loop. If I had to guess, I would guess that you are using NSManagedObjectContextDidSaveNotification, probably with a nil object, and each save is kicking off a save to the other context, causing a loop.
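To illustrate the suspected pitfall (names here are illustrative, not from your code): observing the did-save notification with object: nil receives saves from every context, including the merges you trigger yourself, which can ping-pong saves between the two stacks. Scoping the observer to a single context avoids that:

```swift
import CoreData

final class BackupMerger: NSObject {
    private let mainContext: NSManagedObjectContext

    init(mainContext: NSManagedObjectContext, backupContext: NSManagedObjectContext) {
        self.mainContext = mainContext
        super.init()
        // Scope the observer to the backup context only; passing object: nil here
        // would also fire for mainContext's own saves and can create a save loop.
        NotificationCenter.default.addObserver(self,
                                               selector: #selector(backupContextDidSave(_:)),
                                               name: .NSManagedObjectContextDidSave,
                                               object: backupContext)
    }

    @objc private func backupContextDidSave(_ note: Notification) {
        mainContext.perform {
            self.mainContext.mergeChanges(fromContextDidSave: note)
            // Don't unconditionally save mainContext here, or the two stacks
            // can keep triggering each other.
        }
    }
}
```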
Seeing the code would help solve your immediate issue. However I suspect there is an easier solution to your problem.