Migrating large Core Data database crash

Migrating large Core Data database crash - ios

I've got an application that stores products in a Core Data file. These pruducts include images as "Transformable" data.
Now I tried adding some attributes using Lightweight migration. When I tested this with a small database it worked well but when I use a really large one with nearly 500 MB the application usually crashes because of low memory. Does anybody know how to solve this problem?
Thanks in advanced!

You'll have to use one of the other migration options. The automatic lightweight migration process is really convenient to use. But it has the drawback that it loads the entire data store into memory at once. Two copies, really, one for before migration and one for after.
First, can any of this data be re-created or re-downloaded? If so, you might be able to use a custom mapping model from the old version to the new one. With a custom mapping model you can indicate that some attributes don't get migrated, which reduces memory issues by throwing out that data. Then when migration is complete, recreate or re-download that data.
If that's not the case... Apple suggests a multiple pass technique using multiple mapping models. If you have multiple entity types that contribute to the large data store size, it might help. Basically you end up migrating different entity types in different passes, so you avoid the overhead of loading everything at once.
If that is not the case then (e.g. the bloat is all from instances of the same entity type), well, it's time to write your own custom migration code. This will involve setting up two Core Data stacks, one with the existing data and one with the new model. Run through the existing data store, creating new objects in the new store. If you do this in batches you'll be able to keep memory under control. The general approach would be:
Create new instances in the new model and copy attributes only. You can't set up relationships yet because related objects might not exist in the new data store. Keep a mutable dictionary mapping NSManagedObjectIDs from the old store to the new one, for use in the next step. To keep memory use low:
As soon as you have created a destination store object, free up the memory for the source object by using refreshObject:mergeChanges with NO for the second argument.
Every 10 instances (or 50, or whatever) save changes on the destination managed object context and then reset it. The interval is a balancing act-- do it too often and you'll slow down unnecessarily, do it too rarely and memory use rises.
Do a second pass where you set up relationships in the destination store. For each source object,
Find the corresponding destination object, using the object ID map you created
Run through the source object's relationships. For each one, find the corresponding destination object, also using the object ID map.
Set the destination object's relationship based on the result.
While you are at it consider why your data store is so big. Are you storing a bunch of binary data blobs in the data store? If so, make sure you're using the "Allows external storage" option in the new model.

Related

How to create snapshot of CoreData state?

Background story
I am developing a big iOS app. This app works under specific assumptions. The main of them is that app should work offline with internal storage which is a snapshot of last synchronized state of data saved on server. I decided to use CoreData to handle this storage. Every time app launches I check if WiFi connection is enabled and then try to synchronize storage with server. The synchronization can take about 3 minutes because of size of data.
The synchronization process consists of several stages and in each of them I:
fetch some data from the server (XML)
deserialize it
save it in Core Data
Problem
Synchronization process can be interrupted for several reasons (internet connection, server down, user leaving application, etc). This may cause data to be out-of-sync.
Let's assume that synchronization process has 5 stages and it breaks after third. It results in 3/5 of data being updated in internal storage and the rest being out of sync. I can't allow it because data are strongly connected to each other (business logic).
Goal
I don't know if it is possible but I'm thinking about implementing one solution. On start of synchronization process I would like to create snapshot (some kind of copy) of current state of Core Date and during synchronization process work on it. When synchronization process completes with success then this snapshot could overwrite current CoreData state. When synchronization interrupts then snapshot can be simply aborted. My internal storage will be secured.
Questions
How to create CoreData snapshot?
How to work with CoreData snapshot?
How to overwrite CoreDate state with snapshot?
Thanks in advice for any help. Code examples, if it is possible, will be appreciated.
EDIT 1
The size of data is too big to handle it with multiple CoreData's contexts. During synchronization I am saving current context multiple times to cleanup memory. If I do not do it, the application will crash with memory error.
I think it should be resolved with multiple NSPersistentStoreCoordinators using for example this method: link. Unfortunately, I don't know how to implement this.

You should do exactly what you said. Just create class (lets call it SyncBuffer) with methods "load", "sync" and "save".
The "load" method should read all entities from CoreData and store it in class variables.
The "sync" method should make all the synchronisation using class variables.
Finally the "save" method should save all values from class variables to CoreData - here you can even remove all data from CoreData and save brand new values from SyncBuffer.

A CoreData stack is composed at its core by three components: A context (NSManagedObjectContext) a model (NSManagedObjectModel) and the store coordinator (NSPersistentStoreCoordinator/NSPersistentStore).
What you want is to have two different contexts, that shares the same model but use two different stores. The store itself will be of the same type (i.e. an SQLite db) but use a different source file.
At this page you can see some documentation about the stack:
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CoreData/InitializingtheCoreDataStack.html#//apple_ref/doc/uid/TP40001075-CH4-SW1
The NSPersistentContainer is a convenience class to initialise the CoreData stack.
Take the example of the initialisation of a NSPersistentContainer from the link: you can have the exact same code to initialise it, twice, but with the only difference that the two NSPersistentContainer use a different .sqlite file: i.e. you can have two properties in your app delegate called managedObjectContextForUI and managedObjectContextForSyncing that loads different .sqlite files. Then in your program you can use the context from one store to get current data to show to the user and you can use the context that use the other store with a different .sqlite if you are doing sync operations. When the sync operations are finally done you can eventually swap the two files and after clearing and reloading the NSPersistentContainer (this might be tricky, because you will want to invalidate and reload all managed objects: you are switching to an entirely new context) you can then show the newly synced data to the user and start syncing again on a new .sqlite file.

The way I understand the problem is that you wish to be able download a large "object graph". It is however so large that it cannot be loaded at once in memory, so you would have to break it in chunks and then merge it locally into to Core data.
If that is the case, I think that's not trivial. I am not sure I can think of direct solution without understanding the object relations and even then it may be really overwhelming.
An overly simplistic solution may be to generate the sqlite file on the backend then download it in chunks; it seems ugly, but it serves to separate the business logic from the sync, i.e. the sqlite file becomes the transport layer. So, I think the essence to the solution would be to find a way to physically represent the data you are syncing in a format that allows for splitting it in chunks and that can afterwards be merged into a sqlite file (if you insist on using Core data).
Please also note that as far as I know Amazon (https://aws.amazon.com/appsync/) and Realm (https://realm.io/blog/introducing-realm-mobile-platform/) provide background sync of you local database, but those are paid services and you would have to be careful not be locked in (should not depend on their libs in your model layer, instead have a translation layer).

Singletons vs Core Data

This question is not about the technical problem, but rather the approach.
I know two more or less common approaches to store the data received from the server in your app:
1) Using managers, data holders etc to store the data. They are most often some kind of singleton and are used to store the models received from the server. (E.g. - the array of the posts/places/users) Singletons are needed to be able to access the data from any screen. I think the majority of apps uses this approach.
2) Using Core Data (or maybe Realm) as in-memory storage. This approach avoids having singletons, but, I guess, it is a bit more complex (and crash risky) to maintain and support.
How do you store data and why?
P.S. Any answers would help. But big "thank you" for detailed ones, with reasons.

The reason people opt to use Core Data/Relam/Shark or any other iOS ORM is mainly for the purpose of persisting data between runs of the app.
Currently there are two ways of doing this, for single values and very small (not that I encourage it) objects you can use the UserDefaults to persist between app launches. For a approach closer to a database, infact in the case of Core Data and SharkORM, they are built on top of SQLite, you need to use an ORM.
Using a manager to store an array of a data models will only persist said models for the lifetime of the app. For example when the user force quits the app, restarts their device or in some circumstances when iOS terminates your app, all that data will be lost permanently. This is because it is stored in RAM which is volatile memory, rather than in a database on the disk itself.
Using a database layer even if you don't specifically require persistence between launches can have its advantages though; for instance SharkORM allows you to execute raw SQL queries on your objects if you don't want to use the built in powerful query builder. This can be useful to quickly pull the model you are interested in rather than iterating through a local array.
In reply to your question, how do I store data?
Well, I use a combination of all three. Say for instance I called to an API for some data which I wanted to display there and then to the user, I would use a manager instance with an array to hold the data model.
But on the flipside if I wanted to store that data for later or if I needed to execute a complex query on it, I would store it on disk using Shark.
If however I just wanted to store whether or not the user had seen my on boarding flow I would just persist a boolean value into UserDefaults.
I hope this is detailed enough for you.

CoreData isn't strictly "in-memory". You can load objects into your data model and save them into their context, then they might actually be on disk and out of main memory, and they can easily be brought back via fetch requests.
Singletons, on the other hand, do typically stay in main memory all the time until the user terminates the app. If you have larger objects that you are storing in some data structure (e.g. full resolution images when all you really needed was a thumbnail), this can be quite a resource hog.

Core Data vs NSFileManager

The problem:
I have been using for some time now my own cache system by using NSFileManager. Normally the data I receive is JSON and I just save the dictionary directly into cache (in the Documents Folder). When I need it back I will just go get it. I also implement, sometimes when I feel it's better, a NSDictionary on the root Folder with keys/values for the path for a given resource. For example:
Resource about weather in Geneve 17/02/2013, so I would have a key called GE_17_02_2013 and the value the path to the NSDictionary with the information.
Normally I don't need to do any complex queries. But, somehow, and from what I have been reading, when you have a lot of data, you should stick with Core Data. In my case, I normally have a lot of data, but I never actually felt the application going down, or suffering in terms of performance. So my questions are:
In this case, where sometimes (the weather thing was just an
example) I need to just remove all the data (a Twitter feed, for
example) and replace it by a completely new stream of data, is Core
Data worth? I think removing all the data, and inserting (populating) it, is heavier than just store the NSDictionary and replacing the old one.
Sometimes it would envolve images, textFiles, etc and the
NSFileManager does it perfectly, so what advantages could Core
Data bring in this cases?
P.S: I just saw this post, where this kind of question is made and numbers prove which one is actually faster. Still, I would like as well an empiric answer.

Core Data is worth using in every scenario you described. In fact, if an app stores more than preferences, you should probably use Core Data. Here are some reasons, among which, you'll find answers to your own problems:
is definitely faster than the filesystem, even if you wipe out everything and write it again as you describe (so you don't benefit to much from caching). This is basically because you can coalesce your operations and only access the store when needed. So if you read, write and read, you can save only once, the rest is done in memory, which is, needless to say, very fast.
everything is versioned and you can migrate from one version to another easily (while keeping the content the user has on the device)
80% of your model operations come free. Like, when something changes, you can override the willSave managed object method and notify your controllers.
using cascade makes it trivial to delete even very complex object structures
while is a bad idea to keep images in the database, you can still keep them on the filesystem and have core data delete them automatically when the managed object that represents them is deleted
is flexible, in fact is so flexible that you could migrate your project from using the local filesystem to using a server with very little modifications by writing a custom data store.
the core data designer basically creates the model objects for you. You don't need to create your own model classes (which you would have to if using the filesystem)

In this case ... is Core Data worth it?
Yes, to the extent that you need something more centrally managed than trying to draw up your own file-system schema. Core Data, and its lower-level cousin SQL, are still the best choice for persistence that we have right now. Besides, the performance hit of using NSKeyed(Un)Archiver to keep serializing/deserializing a dictionary over and over again becomes increasingly noticeable with larger datasets.
I think removing all the data, and inserting (populating) it, is heavier
than just store the NSDictionary and replacing the old one.
Absolutely, yes. But, you don't have to think about cache turnover like that. If you imagine Core Data as a static model, you can use a cache layer to ferry data in and out of the store. Need that resource about the weather? Check the cache layer. If it's not in there, make the cache run a fetch request. Need to turn over the whole cache? Have the cache empty itself then run a request to mark every entity with some kind of flag to show they are invalid. The expensive deletion you're worrying about can be done by a background process when you see that all your new data has been safely interned in the cache.
Sometimes it would envolve images, textFiles, etc and the
NSFileManager does it perfectly, so what advantages could Core Data
bring in this cases?
Unfortunately, not many. For blobs of data (which is essentially what Core Data does in these situations), storage and fetches to and from Core Data can quickly get costly. They can also take up a noticeably larger space on disk if they aren't compressed (which further decreases performance). If you need a faster alternative, use a store more suited to the task like Tokyo Cabinet or LevelDB, and use the entities in the Core Data store as a kind of stand-in that would, say, contain the key to the resource in one of those relational databases.

Optimal way of syncing Core Data with server-side data?

I have what I would presume is a very common situation, but as I'm new to iOS programming, I'm not sure of the optimal way to code it.
Synopsis:
I have data on a server which can be retrieved by the iPhone app via a REST service. On the server side, the data is objects with a foreign key (an integer id number).
I'm storing the data retrieved via REST in Core Data. The managed objects have an "objId" attribute so that I can uniquely identify the managed objects in the rest of my code.
My app must always reflect the server data.
On subsequent requests made to the server:
some objects may not be returned, they have been deleted on the server - in which case I need to delete the corresponding objects from Core Data - so that I'm reflecting the state of the server correctly.
some objects have attributes which have changed, therefore the corresponding managed objects need updating with the new data.
my solution - and question to you
To get things going in my app, I made the easiest solution of deleting all objects in Core Data, then adding all new objects in, created with the latest server side data.
I don't think this is the best way to approach it :) As I progress on with my app, I now want to link up my tableview with NSFetchedResultsController, and have realised that my approach of deleting everything and re-adding is not going to work any more.
What is the tried and trusted way of syncing Core Data with server side data?
Do I need to make a fetch request for each object id I get back from the server, and then update the object with the new data?
And then go through all of the objects in core data and see which ones have not been updated, and delete those?
Is that the best way to do it? It just seems a little expensive to do a fetch for each object in Core Data, that's all.
Pseudo code is fine for any answers :)
thanks in advance!

Well, consider your download. First, you should be doing this in a background thread (if not, there are lots of SO posts that talk about how to do that).
I would suggest that you implement what makes sense first, and then, after you can get valid performance data from running Instruments, consider performance optimization. Of course, use some common sense on "easy" performance stuff (your design can take care of the big ones easily enough).
Anyway, get your data from the online resource, and then, for each object fetched, use the "unique object id" to fetch the object from core data. You know there is only one object with that ID, so you can set fetchLimit to 1 on your fetch request. You can also configure your "object id" attribute to be an INDEX in the database. This way, you get the fastest search from the underlying database, and it knows to stop looking once it finds your one object. This should be pretty snappy.
Now you have your object. Change any attributes necessary. Save, rinse, and repeat.
Furthermore, for several reasons, you may want to know when objects were last updated. I'd suggest adding a timestamp to each object that gets changed with the current time every time an object is changed. This will also help in deleting objects. Since your online database does not tell you which objects are deleted, you must have some way to know that an item is "old and no longer needed."
An easy way to do this is to remember the time you started your update. After processing all objects from the download, you now have a way to find all the objects that were deleted from the online database. Basically, any object with a "last update" timestamp before the time you began the update should be removed (since they were not added or modified in the last update). You can also index the database on this field, which will make finding those objects faster - unless your database is huge, I'd wait to see what Instruments has to say about this one though.

Can you edit the same NSManagedObject in 2 different ManagedObjectContexts and merge their changes?

I'm syncing with a MySQL database.
Initially, I was going to loop through all my new/modified objects and set all the foreign keys for that object and then do the next object, and so on... But that's a lot of fetch requests.
So instead I wanted to loop through all my new/modified objects and set the foreign keys one at a time. So the first pass over my objects sets fk1, my next sets fk2, so on...
Cool, fetch requests drastically reduced. Now I'm curious if I could thread these fk setters. They aren't dependent on each other, but they are modifying the same object, even though they're only setting one relationship, and it's a different relationship. Speaking in git terms, these changes could be 'merged' together without any conflict, but is it possible to push changes in one child managedObjectContext(childContext:save) up to the parentManagedObjectContext(parent:performBlock^{parent:save}) and pull it down in another, different child managedObjectContext(???)? Or will the merge policy only take one childContext's version of the object and leave the other fks effectively unchanged.
I know this exists: NSManagedObjectContext/refreshObject:mergeChanges:
But that's on an object by object level. Will that cause a bunch of fetches? Or will that update my whole context at once/in batches?
Following Apple's suggestion from here:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CoreData/Articles/cdImporting.html
I've created/updated my values before I start setting any relationships, so all entities already exist before I try to point any relationships at them.
Aside: We have a couple apps that could benefit from the concurrency, because they throw a considerable amount of data around, and with the quad core iPad apps, this would really help out with the time the initial sync takes.

I'n not sure what you are trying to do and why (you could write less lines in your question and be more clear), but here are some guidelines for working with Core Data:
-- NSManagedObjectContext is not thread safe. Therefore, you need to limit your access to this managed object context to happen inside 1 thread. Otherwise, you may end up having many errors you can't understand.
-- NSManagedObjectContexts apart from doing other things, serve like "snapshots" of your persistent store. That means that when you change an object, you save it to the persistent store, you post a NSManagedObjectContextDidSaveNotification and then call mergeChangesFromContextDidSaveNotification: in another place inside your program in order to load the latest data from the persistent store. Be careful of thread safety.
NSManagedObjectContext/refreshObject:mergeChanges: according to apple is not about just refreshing a managed object. If you pass YES as the second argument, it will write any pending changes from this managed object context to the persistent store, and will load any other changes to other properties of this object from the persistent store, thus "synchronizing" your object with the persistent store. If you pass NO as the second argument, the object loses any pending changes, and it is being turned to a fault. That means that when you attempt to access it, the Managed Object Context will reload the object as it was last saved to the database. It will NOT reload the entire managed object context. It will only operate on the object.
Aside: I have written a blog post that scratches the surface of asynchronous loading from a core data database. In my case, since I'm doing heavy lifting with the database, I ended up using an NSOperation that operates with its own NSManagedObjectContext, and using serial GCD queues to save large chunks of data, since it was faster than having multiple threads accessing the same persistent store, even if they operate on different managed object contexts.
I hope I helped.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart