I'm performing a large initial data load into Core Data on iOS. My app ships with about 500 compressed JSON files (about 250 MB inflated). I sequentially decompress and load each file into Core Data, and I'm pretty sure I am closing all the streams. In the beginning I had a large connected graph using relationships; I have since replaced all relationships with URIs, so there is no explicit object graph. My store type is SQLite. I'm assuming an iPhone 6 on iOS 12+.
Data is loaded on a background queue and I can watch the progress on my iOS device. I do periodic context saves and resets at logical stopping points.
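For context, the import loop is roughly the sketch below; decompressJSON(at:) and insertRecords(_:into:) are placeholders for my own decompression and parsing code, not real APIs.

```swift
import CoreData
import Foundation

// Placeholder helpers standing in for the app's decompression/parsing code.
func decompressJSON(at url: URL) -> [[String: Any]] { /* inflate + parse one bundled file */ return [] }
func insertRecords(_ records: [[String: Any]], into context: NSManagedObjectContext) { /* create managed objects */ }

func runImport(files: [URL], container: NSPersistentContainer) {
    let context = container.newBackgroundContext()
    context.undoManager = nil   // no undo history during a bulk import

    context.perform {
        for file in files {
            autoreleasepool {
                let records = decompressJSON(at: file)
                insertRecords(records, into: context)

                // Logical stopping point: push pending inserts to SQLite,
                // then drop the in-memory object graph.
                try? context.save()
                context.reset()
            }
        }
    }
}
```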
My problem is that the initial data load consumes too much memory, about 600 MB, before the app is terminated for memory issues. If I stop the loading about halfway through, memory consumption peaks at 300 MB+ and then rapidly falls off to 13 MB. It doesn't look like a memory leak. Apart from the managed object context, there are no objects that span the breadth of the load.
What this looks like to me is that Core Data is not flushing anything to storage because flushing is lower priority than my inserts. In other words, the backlog grows too fast.
If I add a delay using asyncAfter to the last half of the data load, it will start at the low-water mark of about 13 MB.
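To be concrete, the asyncAfter throttling is nothing more elaborate than something like the sketch below (the batch closures and the spacing value are hypothetical):

```swift
import Dispatch

// Hypothetical scheduling: queue each remaining batch with a growing delay
// so the system gets idle time between bursts of inserts.
func scheduleBatches(_ batches: [() -> Void], on queue: DispatchQueue, spacing: Double) {
    for (index, batch) in batches.enumerated() {
        queue.asyncAfter(deadline: .now() + spacing * Double(index)) {
            batch()
        }
    }
}
```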
So here are the questions:
Does my diagnosis seem plausible?
Is there some magic method that will cause Core Data to flush its cache?
Can I create a queue that is lower priority than whatever queue Core Data uses to flush objects to storage?
Conversely, do I need to throttle my inserts?
Am I using Core Data in a manner it wasn't designed for (i.e., a ~250 MB database)?
Related
Turn off all the data channels (visualizations) that are not in use.
If high resolution is not needed, set FPS to low to reduce the amount of data loaded, then turn it back up and refresh only once the spot of interest is identified. Use the Copy link button to save the time-offset URL.
Turn off large data channels. Seek to the instance of interest and then turn ON the large data channel.
Using the left/right arrow keys to step through frames gives somewhat smoother performance than randomly seeking.
Concurrency is basically the number of workers fetching data. But if the user's machine is already at its data-transfer limit, more concurrency is just more overhead (timestamp 46:40).
(warning) Setting the Buffer value very high might lead to loading data that is not needed.
Sometimes a "performance glitch" may be caused by data unavailability or some other issue. DP is working on an overhaul of Pylon to show users the (un)availability of data in the UI.
I'm using Xodus for storing time-series data (100-500 million rows are inserted daily).
I have multiple stores per environment. A new store is created every day, and older stores (created more than 30 days ago) can be deleted. Recently my total environment size grew to 500 GB.
Read/write speed degraded dramatically. After an initial investigation it turned out that the Xodus background cleaner thread was consuming almost all I/O resources: iostat shows almost 90% utilization, with 20 MB/s reading and 0 MB/s writing.
I decided to give the background thread some time to clean up the environment, but it kept running for a few days, so eventually I had to delete the whole environment.
Xodus is a great tool, but it looks like I made the wrong choice: Xodus is not designed for inserting huge amounts of data, because of its append-only modification design. If you insert too much data, the background cleaner thread cannot compact it and ends up consuming all I/O.
Can you advise any tips and tricks for working with large data sizes in Xodus? I could create a new environment every day instead of creating a new store.
If you are okay with fetching data from different environments, then you will definitely benefit from creating an instance of Environment every day instead of an instance of Store. In that case, GC will only ever work on a single day's worth of data. The insertion rate will be more or less constant, whereas fetching will slowly degrade as the total amount of data grows.
If working with several environments within a single JVM, make sure the exodus.log.cache.shared setting of EnvironmentConfig is set to true.
I will keep this question general for now and avoid cluttering this with code.
I have an iOS application that uses Core Data (SQLite) for its data store. The model is fairly complex, with a large hierarchy of objects. When I fetch and import these large data sets, I notice that the application shuts down after a while due to a memory warning.
The Allocations profiler shows excessive "transient" VM: SQLite page cache objects. Their size keeps growing and growing but never goes down. I have tried to ensure that all of my NSManagedObjectContext saves occur inside performBlock calls.
It would seem to me as if there are object contexts that are not getting deallocated and/or reset.
I have tried disabling the undoManager on NSManagedObjectContext, setting the stalenessInterval to a very low value (1.0), and calling reset on my MOCs after they finish saving imported data.
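In other words, the import context is set up and used roughly like the simplified sketch below; importBatch(into:) stands in for our actual import code.

```swift
import CoreData

// Placeholder for the per-batch import work.
func importBatch(into context: NSManagedObjectContext) { /* insert managed objects */ }

func runImport(container: NSPersistentContainer) {
    let context = container.newBackgroundContext()
    context.undoManager = nil        // no undo history during import
    context.stalenessInterval = 1.0  // very short staleness window

    context.perform {
        importBatch(into: context)
        try? context.save()
        context.reset()              // drop the imported objects from memory
    }
}
```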
What does this mean when the transient VM SQLite page cache continues to go up so high?
What needs to be done in order to make the page cache go down?
What is an acceptable size for this cache to get to in a large Core Data application?
Thanks,
Well, it turns out the transient VM SQLite page cache column shown in Instruments is cumulative over the session, not a "current" value. Of course it never goes down, then!
Beyond that, some other optimizations around making sure managed object contexts get cleared out fixed our Core Data memory issue.
Great article here on the subject: Core Data issues with memory allocation
I am loading large amounts of data into Core Data on a background thread with a background NSManagedObjectContext. I frequently reset this background context after it's saved in order to clear the object graph from memory. The context is also disposed of once the operation is complete.
The problem is that no matter what I do, Core Data refuses to release the large chunks of data that are stored as external references. I've verified this in the Allocations instrument. Once the app restarts, the memory footprint stays extremely low because these external references are only unfaulted when accessed by the user. I need to be able to remove these BLOBs from memory after the initial download and import, since collectively they take up too much space. On average they are just HTML, so most are less than 1 MB.
I have tried refreshObject:mergeChanges: with the flag set to NO on pretty much everything. I've even tried resetting my main NSManagedObjectContext too. I have plenty of autorelease pools, there are no memory leaks, and zombies isn't enabled. How can I reduce my Core Data memory footprint when external references are initially created?
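Concretely, after each batch is saved I do roughly the following (a simplified sketch; it assumes the imported objects are still registered with the background context):

```swift
import CoreData

// After saving a batch, turn every registered object back into a fault
// so its data (including any externally stored BLOB) can be released.
func faultRegisteredObjects(in context: NSManagedObjectContext) {
    context.performAndWait {
        for object in context.registeredObjects {
            context.refresh(object, mergeChanges: false)
        }
        // Or, more aggressively, throw away the row cache entirely:
        // context.reset()
    }
}
```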
I've reviewed all of Apple's documentation and can't find anything about the life cycle of external BLOBS. I've also searched the many similar questions on this site with no solution: Core Data Import - Not releasing memory
Everything works fine after the app is relaunched, but I need this first run to be stable too. Has anyone else been able to successfully fault NSData BLOBs with Core Data?
I'm assuming the "clear from memory" means "cause the objects to be deallocated" and not "return the address space to the system". The former is under your control. The latter is not.
If you can see the allocations in the Allocations instrument, have you turned on tracking of reference count events and balanced the retains and releases? There should be an indicative extra retain (or more).
If you can provide a simple example project, it would be easier to figure out what is going on.
We are trying to integrate SQLite into our application and populate it as a cache. We are planning to use it as an in-memory database, and we are using it for the first time. Our application is C++ based.
Our application interacts with the master database to fetch data and performs numerous operations. These operations generally involve one table, which is quite large.
We replicated this table in SQLite, and the following are our observations:
Number of fields: 60
Number of records: 100,000
As data population starts, the application's memory shoots up drastically from 120 MB to ~1.4 GB, even though the application is idle at that point and not doing any major operations. Once operations start, memory utilization climbs further. With SQLite as an in-memory DB and this high memory usage, we don't think we will be able to support this many records.
When I create the DB on disk, its size comes to ~40 MB, yet the application's memory usage remains very high.
Q. Is there a reason for this high usage? All buffers have been cleared and, as said before, the DB is not in memory.
Any help would be deeply appreciated.
Thanks and Regards
Sachin
You can use the VACUUM command to reclaim space by reducing the size of the SQLite database.
If you are doing a lot of insert/update operations, the DB size may increase; you can use the VACUUM command to free up that space.
SQLite uses memory for things other than the data itself. It holds not only the data, but also the connections, prepared statements, query cache, query results, etc. You can read more on SQLite Memory Allocation and tweak it. Make sure you are properly destroying your objects too (sqlite3_finalize(), etc.).
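As a rough illustration of that kind of tweaking, here is a minimal sketch using SQLite's C API (shown from Swift here; the same functions are what you would call from C++, and the cache_size value is arbitrary):

```swift
import SQLite3

func tuneAndClean(db: OpaquePointer?) {
    // Cap this connection's page cache (in pages); pick a value that fits your memory budget.
    sqlite3_exec(db, "PRAGMA cache_size = 2000;", nil, nil, nil)

    // Always finalize prepared statements once you are done with them.
    var stmt: OpaquePointer?
    if sqlite3_prepare_v2(db, "SELECT 1;", -1, &stmt, nil) == SQLITE_OK {
        sqlite3_step(stmt)
    }
    sqlite3_finalize(stmt)

    // Ask SQLite to give back as much heap memory as it can for this connection.
    sqlite3_db_release_memory(db)
}
```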