Core Data sectionNameKeyPath with Relationship Attribute Performance Issue - ios

I have a Core Data Model with three entities:
Person, Group, Photo with relationships between them as follows:
Person <<-----------> Group (one to many relationship)
Person <-------------> Photo (one to one)
When I perform a fetch using the NSFetchedResultsController in a UITableView, I want to group in sections the Person objects using the Group's entity name attribute.
For that, I use sectionNameKeyPath:@"group.name".
The problem is that when I use the attribute from the Group relationship, the NSFetchedResultsController fetches everything upfront in small batches of 20 (I have setFetchBatchSize:20) instead of fetching batches while I'm scrolling the table view.
If I use an attribute from the Person entity (like sectionNameKeyPath:@"name") to create the sections, everything works OK: the NSFetchedResultsController loads small batches of 20 objects as I scroll.
The code I use to instantiate the NSFetchedResultsController:
- (NSFetchedResultsController *)fetchedResultsController {
    if (_fetchedResultsController) {
        return _fetchedResultsController;
    }

    NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
    NSEntityDescription *entity = [NSEntityDescription entityForName:[Person description]
                                               inManagedObjectContext:self.managedObjectContext];
    [fetchRequest setEntity:entity];

    // Specify how the fetched objects should be sorted
    NSSortDescriptor *groupSortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"group.name"
                                                                        ascending:YES];
    NSSortDescriptor *personSortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"birthName"
                                                                         ascending:YES
                                                                          selector:@selector(localizedStandardCompare:)];
    [fetchRequest setSortDescriptors:[NSArray arrayWithObjects:groupSortDescriptor, personSortDescriptor, nil]];
    [fetchRequest setRelationshipKeyPathsForPrefetching:@[@"group", @"photo"]];
    [fetchRequest setFetchBatchSize:20];

    NSError *error = nil;
    NSArray *fetchedObjects = [self.managedObjectContext executeFetchRequest:fetchRequest error:&error];
    if (fetchedObjects == nil) {
        NSLog(@"Error Fetching: %@", error);
    }

    _fetchedResultsController = [[NSFetchedResultsController alloc] initWithFetchRequest:fetchRequest
                                                                     managedObjectContext:self.managedObjectContext
                                                                       sectionNameKeyPath:@"group.name"
                                                                                cacheName:@"masterCache"];
    _fetchedResultsController.delegate = self;
    return _fetchedResultsController;
}
This is what I get in Instruments if I create sections based on "group.name" without any interaction with the App's UI:
And this is what I get (with a bit of scrolling on UITableView) if sectionNameKeyPath is nil:
Please, can anyone help me out on this issue?
EDIT 1:
It seems that I get inconsistent results from the simulator and Instruments: when I asked this question, the app was starting in the simulator in about 10 seconds (per Time Profiler) using the above code.
But today, using the same code as above, the app starts in the simulator in 900 ms, even though it performs a temporary upfront fetch for all the objects, and it's not blocking the UI.
I've attached some fresh screenshots:
EDIT 2:
I reset the simulator and the results are intriguing: after performing an import operation and quitting the app, the first run looked like this:
After a bit of scrolling:
Now this is what happens on a second run:
After the fifth run:
EDIT 3:
Running the app the seventh and eighth time, I get this:

This is your stated objective: "I need the Person objects to be grouped in sections by the Group relationship's name attribute, and the NSFetchedResultsController to perform fetches in small batches as I scroll, not upfront as it is doing now."
The answer is a little complicated, primarily because of how an NSFetchedResultsController builds sections, and how that affects the fetching behavior.
TL;DR: To change this behavior, you would need to change how NSFetchedResultsController builds sections.
What is happening?
When an NSFetchedResultsController is given a fetch request with pagination (fetchLimit and/or fetchBatchSize), several things happen.
If no sectionNameKeyPath is specified, it does exactly what you expect. The fetch returns a proxy array of results, with "real" objects for the first fetchBatchSize number of items. So for example, if you have set fetchBatchSize to 2 and your predicate matches 10 items in the store, the results contain the first two objects. The other objects will be fetched separately as they are accessed. This provides a smooth paginated response experience.
However, when a sectionNameKeyPath is specified, the fetched results controller has to do a bit more. To compute the sections it needs to access that key path on all the objects in the results. It enumerates the 10 items in the results in our example. The first two have already been fetched. The other 8 will be fetched during enumeration to get the key path value needed to build the section information. If you have a lot of results for your fetch request, this can be very inefficient. There are a number of public bugs concerning this functionality:
NSFetchedResultsController initially takes too long to set up sections
NSFetchedResultsController ignores fetchLimit property
NSFetchedResultsController, Table Index, and Batched Fetch Performance Issue
... And several others. When you think about it, this makes sense. To build the NSFetchedResultsSectionInfo objects requires the fetched results controller to see every value in the results for the sectionNameKeyPath, aggregate them to the unique union of values, and use that information to create the correct number of NSFetchedResultsSectionInfo objects, set the name and index title, know how many objects in the results a section contains, etc. To handle the general use case there is no way around this. With that in mind, your Instruments traces may make a lot more sense.
How can you change this?
You can attempt to build your own NSFetchedResultsController that provides an alternative strategy for building NSFetchedResultsSectionInfo objects, but you may run into some of the same problems. For example, if you are using the existing fetchedObjects functionality to access members of the fetch results, you will encounter the same behavior when accessing objects that are faults. Your implementation would need a strategy for dealing with this (it's doable, but very dependent on your needs and requirements).
Oh god no. What about some kind of temporary hack that just makes it perform a little better but doesn't fix the problem?
Altering your data model will not change the above behavior, but can change the performance impact slightly. Batch updates will not have any significant effect on this behavior, and in fact will not play nicely with a fetched results controller. It may be much more useful to you, however, to instead set the relationshipKeyPathsForPrefetching to include your "group" relationship, which may improve the fetching and faulting behavior significantly. Another strategy may be to perform another fetch to batch fault these objects before you attempt to use the fetched results controller, which will populate the various levels of Core Data in-memory caches in a more efficient manner.
The NSFetchedResultsController cache is primarily for section information. This prevents the sections from having to be completely recalculated on each change (in the best case), but can actually make the initial fetch to build the sections take much longer. You will have to experiment to see if the cache is worthwhile for your use case.
If your primary concern is that these Core Data operations are blocking user interaction, you can offload them from the main thread. NSFetchedResultsController can be used on a private queue (background) context, which will prevent Core Data operations from blocking the UI.
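For illustration, here is a minimal sketch of that setup; the coordinator variable and the hand-off back to the main queue are assumptions, not code from the question:
// Hedged sketch: FRC driven by a private-queue context so performFetch:
// (where the expensive section building happens) stays off the main thread.
NSManagedObjectContext *background =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
background.persistentStoreCoordinator = coordinator; // assumed to exist

[background performBlock:^{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Person"];
    request.sortDescriptors =
        @[[NSSortDescriptor sortDescriptorWithKey:@"group.name" ascending:YES]];
    request.fetchBatchSize = 20;

    NSFetchedResultsController *frc =
        [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                            managedObjectContext:background
                                              sectionNameKeyPath:@"group.name"
                                                       cacheName:nil];
    NSError *error = nil;
    [frc performFetch:&error]; // heavy section building happens here, off the main queue

    dispatch_async(dispatch_get_main_queue(), ^{
        // Read section/row counts from frc and reload the table view here.
    });
}];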

Based on my experience, a way to achieve your goal is to denormalize your model. In particular, you could add a group name attribute to your Person entity and use that attribute as the sectionNameKeyPath. So, when you create a Person you should also set the group it belongs to.
This denormalization is correct since it allows you to avoid fetching the related Group objects, which are not necessary for sectioning. A downside is that if you change the name of a group, all the persons associated with that name must be updated as well; otherwise you will have incorrect values.
The key aspect here is the following: you need to keep in mind that Core Data is not a relational database. The model should not be designed like a database schema, where normalization would take place; it should be designed from the perspective of how the data are presented and used in the user interface.
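A minimal sketch of this idea, assuming the new attribute on Person is a string named groupName (the attribute name, and the context, group, and fetchRequest variables, are assumptions):
// Hedged sketch: keep a denormalized copy of the group's name on Person.
Person *person = [NSEntityDescription insertNewObjectForEntityForName:@"Person"
                                               inManagedObjectContext:context];
person.group = group;          // the real relationship is still set
person.groupName = group.name; // denormalized copy used only for sectioning

// Sectioning now reads a plain Person attribute, so batching behaves as
// expected. The first sort descriptor must match the section key path:
fetchRequest.sortDescriptors =
    @[[NSSortDescriptor sortDescriptorWithKey:@"groupName" ascending:YES],
      [NSSortDescriptor sortDescriptorWithKey:@"birthName" ascending:YES]];
NSFetchedResultsController *frc =
    [[NSFetchedResultsController alloc] initWithFetchRequest:fetchRequest
                                        managedObjectContext:context
                                          sectionNameKeyPath:@"groupName"
                                                   cacheName:nil];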
Edit 1
I cannot understand your comment, could you explain better?
What I've found very intriguing though is that even if the app is
performing a full upfront fetch in the simulator, the app loads in
900ms (with 5000 objects) on the device despite the simulator where it
loads much slower.
Anyway, I would be interested in knowing details about your Photo entity. If you prefetch Photo objects, the overall execution time could be affected.
Do you need to prefetch a Photo within your table view? Are they thumbnails (small photos) or normal images? Do you take advantage of the External Storage flag?
Adding an additional attribute (say group) to the Person entity should not be a problem. Updating the value of that attribute when the name of a Group object changes is not a problem either, if you perform the update in the background. In addition, starting from iOS 8 you have batch updates available, as described in Core Data Batch Updates.
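For illustration, a hedged sketch of such a batch update (iOS 8+); the groupName attribute and the sample values are assumptions:
// Hedged sketch: rename a group for all matching Person rows directly in
// the store, bypassing the managed object context.
NSString *oldName = @"Old Group"; // hypothetical values
NSString *newName = @"New Group";

NSBatchUpdateRequest *request = [[NSBatchUpdateRequest alloc] initWithEntityName:@"Person"];
request.predicate = [NSPredicate predicateWithFormat:@"groupName == %@", oldName];
request.propertiesToUpdate = @{ @"groupName" : newName };
request.resultType = NSUpdatedObjectIDsResultType;

NSError *error = nil;
NSBatchUpdateResult *result =
    (NSBatchUpdateResult *)[managedObjectContext executeRequest:request error:&error];

// Batch updates do not touch objects already loaded in memory; refresh
// them, e.g. with refreshObject:mergeChanges: for each updated object ID.
NSLog(@"Updated %lu objects", (unsigned long)[(NSArray *)result.result count]);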

After almost a year since I've posted this question, I've finally found the culprits that enable this behaviour (which slightly changed in Xcode 6):
Regarding the inconsistent fetch times: I was using a cache and at the time I was back and forth with opening, closing and resetting the simulator.
Regarding the fact that everything was fetched upfront in small batches without scrolling (in Xcode 6's Core Data instruments that's not the case anymore - now it's one big fetch which takes whole seconds):
It seems that setFetchBatchSize does not work correctly with parent/child contexts. The issue was reported back in 2012 and it seems that it's still there: http://openradar.appspot.com/11235622.
To overcome this issue, I created another independent context with NSMainQueueConcurrencyType and set its persistent store coordinator to the same one my other contexts are using.
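For reference, a minimal sketch of that workaround (property names assumed):
// Hedged sketch: a sibling context attached straight to the coordinator
// (no parent/child chain), so setFetchBatchSize is honored for the FRC.
NSManagedObjectContext *fetchContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
fetchContext.persistentStoreCoordinator = self.persistentStoreCoordinator;

// Use fetchContext for the NSFetchedResultsController only; keep the
// existing parent/child stack for imports, and merge saves into
// fetchContext via NSManagedObjectContextDidSaveNotification.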
More about issue #2 here: https://stackoverflow.com/a/11470560/1641848

Related

Efficient way to get relationship count in database

I want to know what is the best way to get count of related entities in to-many relationship. Let's say I have a data model that looks like this (simplified), and I want to know the number of passengers for each bus:
Currently I can think of two options:
Add an extra attribute to bus entity called passengerCount which will be updated every time a passenger is added/removed.
Every time the count of passengers needs to be displayed, it's done by fetching the passengers and displaying their count.
Both of my options seem quite inefficient, even though I'm not aware of how heavy it is to update/fetch values with Core Data. For example, imagine doing number 2 for every table view cell.
My question is: What is the best way to do this? A method in NSManagedObject class perhaps (I couldn't find any) or some other way that is more efficient?
Three remarks at the very beginning:
A. You should care about efficiency when you have a runtime problem. "Premature optimization is the root of all evil." (Donald Knuth)
B. Who said that all passenger entities have to be fetched? You think of something like this …
[bus.passengers count]
… causing passengers to be fetched. But Core Data supports faulting, so the entities may only be fetched as faults (having only an ID, but not the full object).
C. You can see what Core Data does when you turn verbose mode on. To do so, pass the launch argument
-com.apple.CoreData.SQLDebug 1
To your question itself:
If you really have a problem, you can ask for a count explicitly with -countForFetchRequest:error:.
NSFetchRequest *fetch = [NSFetchRequest fetchRequestWithEntityName:@"Passenger"];
fetch.predicate = [NSPredicate predicateWithFormat:@"bus == %@", bus];
…
NSUInteger count = [context countForFetchRequest:fetch error:NULL]; // Please pass an NSError instance in real-world code
Typed in Safari.
The Xcode auto-generated NSManagedObject class for your Core Data entity Bus contains a property for its to-many relationship to Passenger objects.
You can think of this property like a "computed attribute" of your entity (meaning you will not set the attribute yourself; Core Data updates it automatically when you add or delete a relationship). This property is an NSSet (with references to the related Passenger objects), and NSSet supports the count method.
So you can use .count without a special fetch request.

Core Data working with subset of full data store...HOW?

I'm banging my head trying to figure out something in Core Data that I think should be simple to do, and I need some help.
I have a data store which contains data from the past two years, but in my app, I have certain criteria so that the user only works with a subset of that data (i.e. just the past month). I have created the predicates to generate the fetch request and all that works fine.
My issue is that I then want to run some additional predicates on this subset of data (i.e. I just want objects with name=Sally). I'd like to do so without having to re-run the original predicate with an additional predicate (in a NSCompoundPredicate); I'd rather just run it on the subset of data already created.
Can I just run a predicate on the fetch results?
Is the format of the predicate the same as for the initial calls into the core data store?
Thanks for any help.
You could filter the original array of results using a predicate. See the NSArray filteredArrayUsingPredicate method.
The only reason I would consider doing what you're talking about is if you find that modifying the predicate at the FetchedResultsController level caused significant performance degradation.
My recommendation is to get a reference to your fetchedResultsController in your view controller and update your predicate with a compound predicate matching your additional search parameters.
If you were to tie your view controller's data source to a predicate of a predicate, you wouldn't be able to properly utilize the NSFetchedResultsController's delegate methods that allow for easy, dynamic updating of views such as table views and collection views.
/* assuming you're abstracting your datastore with a singleton */
self.fetchedResultsController = [DataStore sharedStore].fetchedResultsController;
self.fetchedResultsController.delegate = self;
[self.fetchedResultsController.fetchRequest setPredicate:yourPredicate];
Make sure to configure your fetch requests' batch size to a reasonable value to improve performance.
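For illustration, a hedged sketch of updating the predicate this way; the attribute names, cache name, and date math are assumptions:
// Hedged sketch: narrow the FRC's predicate instead of filtering in memory.
NSDate *monthAgo = [NSDate dateWithTimeIntervalSinceNow:-30 * 24 * 60 * 60];
NSPredicate *recent = [NSPredicate predicateWithFormat:@"date >= %@", monthAgo];
NSPredicate *byName = [NSPredicate predicateWithFormat:@"name == %@", @"Sally"];
NSPredicate *combined =
    [NSCompoundPredicate andPredicateWithSubpredicates:@[recent, byName]];

// The cache (if any) must be deleted before mutating the fetch request.
[NSFetchedResultsController deleteCacheWithName:@"MyCache"];
self.fetchedResultsController.fetchRequest.predicate = combined;

NSError *error = nil;
[self.fetchedResultsController performFetch:&error];
[self.tableView reloadData];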

Core Data — Find-or-Create Efficiently

According to Apple's documentation(link)—
There are many situations where you may need to find existing objects
(objects already saved in a store) for a set of discrete input values.
A simple solution is to create a loop, then for each value in turn
execute a fetch to determine whether there is a matching persisted
object and so on. This pattern does not scale well. If you profile
your application with this pattern, you typically find the fetch to be
one of the more expensive operations in the loop (compared to just
iterating over a collection of items). Even worse, this pattern turns
an O(n) problem into an O(n^2) problem.
It is much more efficient—when possible—to create all the managed
objects in a single pass, and then fix up any relationships in a
second pass. For example, if you import data that you know does not
contain any duplicates (say because your initial data set is empty),
you can just create managed objects to represent your data and not do
any searches at all. Or if you import "flat" data with no
relationships, you can create managed objects for the entire set and
weed out (delete) any duplicates before save using a single large IN
predicate.
Question 1: Considering that the data I'm importing doesn't have any relationships, how do I implement what is described in the last line?
If you do need to follow a find-or-create pattern—say because you're
importing heterogeneous data where relationship information is mixed
in with attribute information—you can optimize how you find existing
objects by reducing to a minimum the number of fetches you execute.
How to accomplish this depends on the amount of reference data you
have to work with. If you are importing 100 potential new objects, and
only have 2000 in your database, fetching all of the existing and
caching them may not represent a significant penalty (especially if
you have to perform the operation more than once). However, if you
have 100,000 items in your database, the memory pressure of keeping
those cached may be prohibitive.
You can use a combination of an IN predicate and sorting to reduce
your use of Core Data to a single fetch request.
Example code:
// Get the names to parse in sorted order.
NSArray *employeeIDs = [[listOfIDsAsString componentsSeparatedByString:@"\n"]
    sortedArrayUsingSelector:@selector(compare:)];

// create the fetch request to get all Employees matching the IDs
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
[fetchRequest setEntity:
    [NSEntityDescription entityForName:@"Employee" inManagedObjectContext:aMOC]];
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:@"(employeeID IN %@)", employeeIDs]];

// Make sure the results are sorted as well.
[fetchRequest setSortDescriptors:
    @[ [[NSSortDescriptor alloc] initWithKey:@"employeeID" ascending:YES] ]];

// Execute the fetch.
NSError *error;
NSArray *employeesMatchingNames = [aMOC executeFetchRequest:fetchRequest error:&error];
You end up with two sorted arrays—one with the employee IDs passed
into the fetch request, and one with the managed objects that matched
them. To process them, you walk the sorted lists following these
steps:
Get the next ID and Employee. If the ID doesn't match the Employee ID,
create a new Employee for that ID. Get the next Employee: if the IDs
match, move to the next ID and Employee.
Question 2: In the above example, I get two sorted arrays as described above. Considering the worst-case scenario where all the objects to be inserted are already present in the store, I don't see any way to solve the problem in O(n) time. Apple describes the two steps above, but that is an O(n^2) job: for any kth element in the input array, there might or might not exist an element that matches it in the first k elements of the output array. So in the worst case, the complexity will be O(nC2) = O(n^2).
So, what I believe Apple is doing is making sure the fetch is only processed once, even though there are O(n^2) checks required. If so, then I'll go with this; but is there any other way of doing this efficiently?
Please understand that I don't want to fetch again and again - I want to fetch once for an input array of 100 identifiers.
Ad. 1 The fact of having relationships isn't important here. This explanation only says that if you download your data from e.g. a remote server and your items have some IDs, then you can fetch them all from the persistent store in one request, instead of fetching each object in a separate request.
Ad. 2
Apple describes the two steps as above but that is an O(n^2) job.
It's not. Please read these lines carefully:
To process them, you walk the sorted lists following these steps:
Get the next ID and Employee. If the ID doesn't match the Employee ID,
create a new Employee for that ID. Get the next Employee: if the IDs
match, move to the next ID and Employee.
You walk the arrays/lists simultaneously, so you never have to make this check: "there might or might not exist an element that matches it in the first k elements in the output array." You don't need to check previous elements as they're sorted and they certainly won't contain the object you're interested in.
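To make the simultaneous walk concrete, here is a minimal sketch, assuming a generated Employee subclass whose employeeID is a string; both arrays come from the fetch shown earlier and are sorted by employeeID:
// Hedged sketch: one O(n) pass over both sorted arrays.
NSUInteger e = 0; // index into employeesMatchingNames (sorted fetch result)
for (NSString *employeeID in employeeIDs) { // sorted input IDs
    Employee *candidate =
        (e < employeesMatchingNames.count) ? employeesMatchingNames[e] : nil;
    if (candidate != nil && [candidate.employeeID isEqualToString:employeeID]) {
        e++; // already in the store; advance past it
    } else {
        // Missing from the store: create a new Employee for this ID.
        Employee *employee =
            [NSEntityDescription insertNewObjectForEntityForName:@"Employee"
                                          inManagedObjectContext:aMOC];
        employee.employeeID = employeeID;
    }
}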
If anyone is looking for the original Apple documentation there is snapshot here:
http://web.archive.org/web/20150908024050/https://developer.apple.com/library/mac/documentation/cocoa/conceptual/coredata/articles/cdimporting.html

Improve process of mirroring server database to a client database via JSON?

I have a ready enterprise (non-AppStore) legacy iOS application for iPad that I need to refactor (it was written by another developer, my predecessor on my current job).
This application fetches its data via JSON from a server backed by an MSSQL database. The database schema has about 30 tables; the most capacious are Client, City, and Agency, each having about 10,000 records, and further growth is expected in the future. After the JSON is received (one JSON request-and-response pair for each table), it is mapped to Core Data - a process which also includes gluing together the corresponding Core Data entities (Client, City, Agency and others), i.e. setting the relations between these entities on the Core Data layer.
In itself the project's CoreData fetch-part (or read-part) is heavily optimized - it uses, I guess, almost all possible performance and memory tweaks CoreData has, that is why UI layer of application is very fast and responsive, so that I consider its work as completely satisfactory and adequate.
The problem is the process of preparing the Core Data layer, i.e. the server-to-client synchronization process: it takes too much time. Consider 30 network requests resulting in 30 JSON packs ("pack" meaning "one table - one JSON"), which are then mapped to 30 Core Data entities, which are then glued together (the appropriate Core Data relations are set between them). When I first saw how all this is done in this project (too slow), the first idea to come into my head was:
"For the first time a complete synchronization is performed (app's first launch time) - perform a fetch of the whole database data in, say, one archived file (something like database dump) and then somehow import it as a whole to a Core Data land".
But then I realized that, even if the transmission of such a one-file dump were possible, Core Data would still require me to glue the corresponding Core Data entities together, i.e. to set the appropriate relations between them, so it is hard to imagine how I could benefit in performance by relying on this scheme.
Also, my colleague suggested I consider SQLite as a complete alternative to Core Data, but unfortunately I don't have any experience with it, which is why I am completely unable to foresee all the consequences of such a serious design decision (even with the synchronization process being very slow, my app does work, and its UI performance is very good now). The only thing I can imagine about SQLite is that, in contrast to Core Data, it would not push me to glue additional relations on the client side, because SQLite has its good old foreign key system, doesn't it?
And so here are the questions (Respondents, please do not mix these points when you answer - there is too much confusion I have about all of them):
Does anybody have such experience of taking "first-time large import of the whole database" approach in a way I have described above? I would be very thankful to know about any solutions should they exploit JSON<->CoreData pair or not.
Does Core Data has some global import mechanism which can allow mass-creation of corresponding 30-tables-schema (possibly using some specific source other than "30 packs of JSON" described above) without a need of setting up corresponding relations for 30 entities?
Are there any possibilities to speed up the synchronization process if 2) is impossible? Here I mean the improvements of current JSON<->CoreData scheme my app uses.
Migration to SQLite: should I consider such migration? What I will benefit from it? How the whole process of replication->transmission->client preparations could look like then?
Other alternatives to CoreData and SQLite - what could they be or look like?
Any other thoughts or visions you may have about the situation I've described?
UPDATE 1
Though the answer written by Mundi is good (one large JSON, "No" to using SQLite), I am still interested in any other insights into the problem I've described.
UPDATE 2
I did try to use my Russian English the best way I could to describe my situation, in the hope that my question would become clear to everyone who reads it. By this second update I will try to provide a few more pointers to make my question even clearer.
Please, consider two dichotomies:
What can/should I use as a data layer on iOS client - CoreData vs SQLite?
What can/should I use as a transport layer - JSON (single-JSON-at-once as suggested in the answer, even zipped maybe) or some DB-itself-dumps (if it is even possible, of course - notice I am also asking this in my question).
I think it is pretty obvious that the "sector" formed by the intersection of these two dichotomies - choosing Core Data from the first and JSON from the second - is the most widespread default in the iOS development world, and it is also what the app from this question uses.
Having said that, I claim that I would be thankful to see any answers regarding the Core Data-JSON pair, as well as answers considering any other "sectors" (what about opting for SQLite and some kind of dump-based approach, why not?).
Also, it is important to note that I don't want to just drop the current option for some other alternative; I just want to get the solution working fast on both the synchronization and the UI phases of its usage. So answers about improving the current scheme, as well as answers suggesting other schemes, are welcome!
Now, please see the following update #3 which provides more details for my current CoreData-JSON situation:
UPDATE 3
As I have said, currently my app receives 30 packs of JSON - one pack for the whole table. Let's take capacious tables for example: Client, Agency, City.
It is Core Data, so if a client record has a non-empty agency_id field, I need to create a new Core Data entity of class Agency (an NSManagedObject subclass) and fill it with this record's JSON data; that's why I need to already have the corresponding Core Data entity for this agency of class Agency, and finally I need to do something like client.agency = agency; and then call [currentManagedObjectContext save:&error]. Having done it this way, I can later ask for this client to be fetched and ask its .agency property to find the corresponding entity. I hope I am completely sane when I do it this way.
Now imagine this pattern applied to the following situation:
I have just received the following 3 separate JSON packs: 10000 clients and 4000 cities and 6000 agencies (client has one city, city has many clients; client has agency, agency has many clients, agency has one city, city has many agencies).
Now I want to setup the following relations on Core Data level: I want my client entity client to be connected to a corresponding city and corresponding agency.
The current implementation of this in the project does a very ugly thing:
Since the dependency order is City -> Agency -> Client, i.e. the City needs to be baked first, the application begins creating entities for City and persists them to Core Data.
Then it deals with the JSON of agencies: it iterates through every JSON record - for every agency, it creates a new entity agency and, by its city_id, it fetches the corresponding entity city and connects it using agency.city = city. After the iteration through the whole agencies JSON array is done, the current managed object context is saved (actually -[managedObjectContext save:] is done several times, once after every 500 records processed). At this step it is obvious that fetching one of the 4000 cities for each of the 6000 agencies has a huge performance impact on the whole synchronization process.
Then, finally, it deals with the JSON of clients: as in the previous stage, it iterates through the whole 10000-element JSON array and one by one performs the fetch of the corresponding agencies and, ZOMG, cities, and this impacts the overall performance in the same manner as stage 2 does.
It is all very BAD.
The only performance optimization I can see here is that the first stage could leave behind a large dictionary with city ids as keys (I mean NSNumbers of real ids) and faulted City entities as values, which would make it possible to prevent the ugly find process in the following stage 2, and then to do the same in stage 3 using an analogous caching trick. But the problem is that there are many more relations between all the 30 tables than the just-described [Client-City, Client-Agency, Agency-City], so the final procedure involving caching of all the entities would most probably exhaust the resources the iPad device reserves for my app.
UPDATE 4
Message for future respondents: I've tried my best to make this question well-detailed and well-formed, and I really expect you to answer with verbose answers. It would be great if your answer would really address the complexity of the problem discussed here and complement the efforts I've made to make my question as clear and general as possible. Thanks.
UPDATE 5
Related topics: Core Data on client (iOS) to cache data from a server Strategy, Trying to make a POST request with RestKit and map the response to Core Data.
UPDATE 6
Even after it is no longer possible to open new bounties and there is an accepted answer, I would still be glad to see any other answers containing additional information about the problem this topic addresses. Thanks in advance.
I have experience in a very similar project. The Core Data insertions take some time, so we condition the user that this will take a while, but only the first time. The best performance tweak was of course to get the batch size right between saves, but I am sure you are aware of that.
One performance suggestion: I have tried a few things and found that creating many download threads can be a hit on performance, I suppose because for each request there is some latency from the server etc.
Instead, I discovered that downloading all the JSON in one go was much faster. I do not know how much data you have, but I tested with more than 100,000 records and a 40 MB+ JSON string, and this works really fast, so the bottleneck is just the Core Data insertions. With an @autoreleasepool block this even performed acceptably on a first-generation iPad.
Stay away from the SQLite API - it will take you more than a man-year (even assuming high productivity) to replicate the performance optimizations you get out of the box with Core Data.
First off, you're doing a lot of work, and it will take some time no matter how you slice it, but there are ways to improve things.
I'd recommend doing your fetches in batches, with a batch size matching your batch size for processing new objects. For example, when creating new Agency records, do something like:
Make sure the current Agency batch is sorted by city_id. (I'll explain why later).
Get the City ID for each Agency in the batch. Depending on how your JSON is structured, this is probably a one-liner like this (since valueForKey works on arrays):
NSArray *cityIDs = [myAgencyBatch valueForKey:@"city_id"];
Get all the City instances for the current pass in one fetch by using the IDs you found in the previous step. Sort the results by city_id. Something like:
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"City"];
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"city_id in %@", cityIDs];
[request setPredicate:predicate];
[request setSortDescriptors:@[ [NSSortDescriptor sortDescriptorWithKey:@"city_id" ascending:YES] ]];
NSArray *cities = [context executeFetchRequest:request error:nil];
Now, you have one array of Agency and another one of City, both sorted by city_id. Match them up to set up the relationships (check city_id in case things don't match). Save changes, and go on to the next batch.
This will dramatically reduce the number of fetches you need to do, which should speed things up. For more on this technique, see Implementing Find-or-Create Efficiently in Apple's docs.
Another thing that may help is to "warm up" Core Data's internal cache with the objects you need before you start fetching them. This will save time later on because getting property values won't require a trip to the data store. For this you'd do something like:
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"City"];
// no predicate, get everything
[request setResultType:NSManagedObjectIDResultType];
NSArray *notUsed = [context executeFetchRequest:request error:nil];
…and then just forget about the results. This is superficially useless, but it will alter the internal Core Data state for faster access to City instances later on.
Now as for your other questions,
Using SQLite directly instead of Core Data might not be a terrible choice for your situation. The benefit would be that you'd have no need to set up the relationships, since you could use fields like city_id as foreign keys. So: fast importing. The downside, of course, is that you'll have to do your own work converting your model objects to/from SQL records, and probably rewrite quite a lot of existing code that assumes Core Data (e.g. every time you follow a relationship, you now need to look up records by that foreign key). This change might fix your import performance issues, but the side effects could be significant.
JSON is generally a very good format if you're transmitting data as text. If you could prepare a Core Data store on the server, and if you would use that file as-is instead of trying to merge it into an existing data store, then that would almost certainly speed things up. Your import process would run once on the server and then never again. But those are big "if"s, especially the second one. If you get to where you need to merge a new server data store with existing data, you're right back to where you are now.
Do you have control of the server? I ask, because it sounds like you do from the following paragraph:
"For the first time a complete synchronization is performed (app's first launch time) - perform the fetch of the whole database data in, say, one archived file (something like database dump) and then somehow import it as a whole to the CoreData land".
If sending a dump is possible, why not send the Core Data file itself? Core Data (by default) is backed by a SQLite database -- why not generate that database on the server, zip it and send it across the wire?
This would mean you could eliminate all the JSON parsing, network requests etc and replace it with a simple file download and archive extraction. We did this on a project and it improved performance immeasurably.
For each row in your table there must be a timestamp column. If there isn't one, you should add it.
The first time, and each time you fetch a database dump, you store the last update date and time.
On every subsequent sync you instruct the database to return only those records that were changed or updated since the previous download operation. There should also be a "deleted" flag for you to remove vanished records.
Then you only need to update the matching records, saving time on all fronts.
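A hedged sketch of the client side of this scheme; the endpoint, parameter name, and storage key are pure assumptions:
// Hedged sketch: ask the server only for rows changed since the last sync.
NSDate *lastSync =
    [[NSUserDefaults standardUserDefaults] objectForKey:@"lastSyncDate"];
NSString *query = lastSync
    ? [NSString stringWithFormat:@"?updated_since=%.0f",
          [lastSync timeIntervalSince1970]]
    : @"";
NSURL *url = [NSURL URLWithString:
    [NSString stringWithFormat:@"https://example.com/api/clients%@", query]];

// ... download from url, upsert changed records, delete records flagged "deleted" ...

[[NSUserDefaults standardUserDefaults] setObject:[NSDate date]
                                          forKey:@"lastSyncDate"];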
To speed up the first time sync you can also ship a seed database with the app, so that it could be imported immediately without any network operations.
Download the JSON files by hand.
Put them into your project.
Somewhere in the project configuration or header files, make a note of the download date and time.
On the first run, locate and load said files, then proceed like you're updating them.
If in doubt, refer to the manual.
Example:
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"cities"
                                                     ofType:@"json"];
NSData *citiesData = [NSData dataWithContentsOfFile:filePath];
// I assume that you're loading an array
NSArray *citiesSeed = [NSJSONSerialization JSONObjectWithData:citiesData
                                                      options:NSJSONReadingMutableContainers
                                                        error:nil];
Here you have my recommendations:
Use MagicalRecord. It's a Core Data wrapper that saves you a lot of boilerplate code, and it comes with very interesting features.
Download all the JSON in one request, as others suggested. If you can embed the first JSON document into the app, you can save the download time and start populating the database as soon as you open the app for the first time. Also, with MagicalRecord it is quite easy to perform this save operation in a separate thread and then sync all contexts automatically. This can improve the responsiveness of your app.
It seems that you should refactor that ugly method once you have solved the first import issue. Again, I would suggest using MagicalRecord to easily create those entities.
We've recently moved a fairly large project from Core Data to SQLite, and one of the main reasons was bulk insert performance. There were quite a few features we lost in the transition, and I would not advise you to make the switch if you can avoid it. After the transition to SQLite, we actually had performance issues in areas other than bulk inserts which Core Data was transparently handling for us, and even though we fixed those new issues, it took some amount of time getting back up and running. Although we've spent some time and effort in transitioning from Core Data to SQLite, I can't say that there are any regrets.
With that cleared up, I'd suggest you get some baseline measurements before you go about fixing the bulk insert performance.
Measure how long it takes to insert those records in the current state.
Skip setting up the relationships between those objects altogether, and then measure the insert performance.
Create a simple SQLite database, and measure the insert performance with that. This should give you a very good baseline estimate of how long it takes to perform the actual SQL inserts and will also give you a good idea of the Core Data overhead.
A few things you can try off the bat to speed up inserts:
Ensure that there are no active fetched results controllers when you are performing the bulk inserts. By active, I mean fetched results controllers that have a non-nil delegate. In my experience, Core Data's change tracking was the single most expensive operation when trying to do bulk insertions.
Perform all changes in a single context, and stop merging changes from different contexts until the bulk inserts are done.
To get more insight into what's really going on under the hood, enable Core Data SQL debugging and see the SQL queries that are being executed. Ideally, you'd want to see a lot of INSERTs, and a few UPDATEs. But if you come across too many SELECTs, and/or UPDATEs, then that's a sign that you are doing too much reading, or updating of objects.
Use the Core Data instrument in Instruments to get a better high-level overview of what's going on with Core Data.
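Putting these tips together, a minimal sketch of an import setup; the coordinator and records variables are assumptions:
// Hedged sketch: a dedicated import context with no undo manager, no live
// FRC delegates observing it, and periodic saves.
NSManagedObjectContext *importContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
importContext.persistentStoreCoordinator = coordinator;
importContext.undoManager = nil; // avoid change-tracking overhead during the import

[importContext performBlock:^{
    NSUInteger i = 0;
    for (NSDictionary *json in records) {
        @autoreleasepool {
            // ... insert and populate one managed object from json ...
            if (++i % 500 == 0) {
                NSError *error = nil;
                [importContext save:&error];
                [importContext reset]; // drop accumulated objects from memory
            }
        }
    }
    NSError *error = nil;
    [importContext save:&error]; // save the final partial batch
}];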
I've decided to write my own answer summarizing the techniques and advice I found useful for my situation. Thanks to all the folks who posted their answers.
I. Transport
"One JSON". This is the idea that I want to give a try. Thanks #mundi.
The idea of archiving JSON before sending it to a client, be it a one JSON pack or a 30 separate 'one table - one pack'.
II. Setting up Core Data relations
I will describe the process of the JSON -> Core Data import using an imaginary large import operation, as if it were performed in one method (I am not sure whether it will actually be so - maybe I will split it into logical chunks).
Let's imagine that in my imaginary app there are 15 capacious tables, where "capacious" means "cannot be held in memory at once, should be imported using batches" and 15 non-capacious tables each having <500 records, for example:
Capacious:
cities (15k+)
clients (30k+)
users (15k+)
events (5k+)
actions (2k+)
...
Small:
client_types (20-)
visit_types (10-)
positions (10-)
...
Let's imagine, that I already have JSON packs downloaded and parsed into composite NSArray/NSDictionary variables: I have citiesJSON, clientsJSON, usersJSON, ...
1. Work with small tables first
My pseudo-method starts with the import of the tiny tables first. Let's take the client_types table: I iterate through clientTypesJSON and create ClientType objects (NSManagedObject subclasses). In addition, I collect the resulting objects in a dictionary, with these objects as its values and the "ids" (foreign keys) of these objects as keys.
Here is the pseudocode:
NSMutableDictionary *clientTypesIdsAndClientTypes = [NSMutableDictionary dictionary];
for (NSDictionary *clientTypeJSON in clientTypesJSON) {
    ClientType *clientType = [NSEntityDescription insertNewObjectForEntityForName:@"ClientType"
                                                           inManagedObjectContext:managedObjectContext];
    // fill the properties of clientType from clientTypeJSON

    // Write prepared clientType to a cache
    [clientTypesIdsAndClientTypes setObject:clientType forKey:clientType.id];
}

// Persist all clientTypes to a store.
NSArray *clientTypes = [clientTypesIdsAndClientTypes allValues];
[managedObjectContext obtainPermanentIDsForObjects:clientTypes error:...];

// Un-fault (unload from RAM) all the records in the cache - because we don't need them in memory anymore.
for (ClientType *clientType in clientTypes) {
    [managedObjectContext refreshObject:clientType mergeChanges:NO];
}
The result is that we have a bunch of dictionaries of small tables, each having the corresponding set of objects and their ids. We will use them later without refetching, because they are small and their values (the NSManagedObjects) are now faults.
2. Use the cache dictionary of objects from small tables obtained during step 1 to set up relationships with them
Let's consider the complex table clients: we have clientsJSON and we need to set up a clientType relation for each client record. This is easy, because we have a cache with the clientTypes and their ids:
for (NSDictionary *clientJSON in clientsJSON) {
    Client *client = [NSEntityDescription insertNewObjectForEntityForName:@"Client"
                                                   inManagedObjectContext:managedObjectContext];
    // Setting up SQLite field
    client.client_type_id = clientJSON[@"client_type_id"];

    // Setting up Core Data relationship between client and clientType
    client.clientType = clientTypesIdsAndClientTypes[client.client_type_id];
}
// Save and persist
// Save and persist
3. Dealing with large tables - batches
Let's consider a large clientsJSON having 30k+ clients in it. We do not iterate through the whole clientsJSON but split it into chunks of an appropriate size (500 records), so that [managedObjectContext save:...] is called every 500 records. It is also important to wrap the operation on each 500-record batch in an @autoreleasepool block - see Reducing memory overhead in the Core Data Performance guide.
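A minimal sketch of this chunking (the constant and variable names are assumptions):
// Hedged sketch: process clientsJSON in chunks of 500, saving per chunk.
static const NSUInteger kBatchSize = 500;
NSUInteger total = clientsJSON.count;

for (NSUInteger offset = 0; offset < total; offset += kBatchSize) {
    @autoreleasepool {
        NSRange range = NSMakeRange(offset, MIN(kBatchSize, total - offset));
        NSArray *batch = [clientsJSON subarrayWithRange:range];

        // ... create Client objects for batch and set up their
        //     relationships as described in step 4 below ...

        NSError *error = nil;
        [managedObjectContext save:&error];
    }
}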
Be careful: step 4 describes the operation applied to a batch of 500 records, not to the whole clientsJSON!
4. Dealing with large tables - setting up relationships with large tables
Consider the following method, which we will use in a moment:
@implementation NSManagedObject (Extensions)

+ (NSDictionary *)dictionaryOfExistingObjectsByIds:(NSArray *)objectIds
                            inManagedObjectContext:(NSManagedObjectContext *)managedObjectContext {
    NSDictionary *dictionaryOfObjects;

    NSArray *sortedObjectIds = [objectIds sortedArrayUsingSelector:@selector(compare:)];

    NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:NSStringFromClass(self)];
    fetchRequest.predicate = [NSPredicate predicateWithFormat:@"(id IN %@)", sortedObjectIds];
    fetchRequest.sortDescriptors = @[[[NSSortDescriptor alloc] initWithKey:@"id" ascending:YES]];
    fetchRequest.includesPropertyValues = NO;
    fetchRequest.returnsObjectsAsFaults = YES;

    NSError *error;
    NSArray *fetchResult = [managedObjectContext executeFetchRequest:fetchRequest error:&error];

    dictionaryOfObjects = [NSMutableDictionary dictionaryWithObjects:fetchResult forKeys:sortedObjectIds];

    return dictionaryOfObjects;
}

@end
Let's consider a clientsJSON pack containing a batch (500) of Client records we need to save. We also need to set up a relationship between these clients and their agencies (Agency; the foreign key is agency_id).
NSMutableArray *agenciesIds = [NSMutableArray array];
NSMutableArray *clients = [NSMutableArray array];

for (NSDictionary *clientJSON in clientsJSON) {
    Client *client = [NSEntityDescription insertNewObjectForEntityForName:@"Client"
                                                   inManagedObjectContext:managedObjectContext];
    // fill client fields...

    // Also collect agencies ids
    if ([agenciesIds containsObject:client.agency_id] == NO) {
        [agenciesIds addObject:client.agency_id];
    }

    [clients addObject:client];
}

NSDictionary *agenciesIdsAndAgenciesObjects =
    [Agency dictionaryOfExistingObjectsByIds:agenciesIds
                      inManagedObjectContext:managedObjectContext];

// Setting up Core Data relationship between Client and Agency
for (Client *client in clients) {
    client.agency = agenciesIdsAndAgenciesObjects[client.agency_id];
}

// Persist all Clients to a store.
[managedObjectContext obtainPermanentIDsForObjects:clients error:...];

// Un-fault all the records in the cache - because we don't need them in memory anymore.
for (Client *client in clients) {
    [managedObjectContext refreshObject:client mergeChanges:NO];
}
Most of what I use here is described in these Apple guides: Core Data performance, Efficiently importing data. So the summary for steps 1-4 is the following:
Turn objects into faults once they are persisted, so their property values become unnecessary as the import operation goes further.
Construct dictionaries with objects as values and their ids as keys, so these dictionaries can serve as lookup tables when constructing relationships between these objects and other objects.
Use @autoreleasepool when iterating through a large number of records.
Use a method similar to dictionaryOfExistingObjectsByIds, or the method that Tom references in his answer from Efficiently Importing Data - a method that has a SQL IN predicate behind it to significantly reduce the number of fetches. Read Tom's answer and the corresponding Apple guide to better understand this technique.
Good reading on this topic
objc.io issue #4: Importing Large Data Sets

Creating sections with NSFetchedResultsController, on the fly

I'm using NSFetchedResultsController (NSFRC) to display information in a UITableView. I'm trying to give the user the option to sort the cells into sections, as opposed to alphabetically. The problem is that the sections would then be determined using downloaded information. On top of this, the section for each item will change relatively often, so I don't want to save the section. I have noticed mentions of transient attributes while researching similar problems, but I've never used these before, and I'm not sure whether I can use them, bearing in mind that all the calculations are done once the data has already been loaded; I also want this solution to be compatible with my previous Core Data database. Also, I'm not particularly great at Core Data (nor at Objective-C!), so I'm not entirely sure how I'd go about doing this.
So here's what I'm going for if we're using transient attributes (this next bit is theoretical, as I don't know if transient attributes are the correct way forward). I would like 4 possible sections, 0-3 (I'll rename them using the table view delegate to get around sorting problems). When the calculations are done, each cell will be assigned the transient attribute (if needed; the default section would be 2). I hope this all makes sense.
Right, now for some theoretical code. First I create the transient property in the Data Model screen-thing, and make it transient by checking the transient check box... Sounds simple enough.
In the code for the calculations in willDisplayCell (it needs to be done in wDC for a couple of reasons), the entity could be saved like this:
MyEntity *myEntity = [self.fetchedResultsController objectAtIndexPath:indexPath];
myEntity.sectionTransientProperty = @2;

NSError *error;
if (![self.managedObjectContext save:&error]) {
    NSLog(@"Error: %@", error);
    FATAL_CORE_DATA_ERROR(error);
    return;
}
Done, right? Is that how we assign a value to a transient property?
Then I change the sorting option in NSFRC when I alloc it:
fetchedResultsController = [[NSFetchedResultsController alloc]
    initWithFetchRequest:fetchRequest
    managedObjectContext:self.managedObjectContext
      sectionNameKeyPath:@"sectionTransientProperty"
               cacheName:@"MyEntity"];
How are we doing, what else do I need to do? Or have I got this so horribly wrong I should just give up on Core Data and NSFRC? If you guys could help guide me through this I'd really appreciate it. If you need me to post any more code I would be happy to.
Regards,
Mike
If you want an FRC with sections, you have to add a sort descriptor to the fetch request, and that sort descriptor cannot be based on transient attributes.
See the documentation of initWithFetchRequest:managedObjectContext:sectionNameKeyPath:cacheName:
If the controller generates sections, the first sort descriptor in
the array is used to group the objects into sections; its key must
either be the same as sectionNameKeyPath or the relative ordering
using its key must match that using sectionNameKeyPath.
and Fetch Predicates and Sort Descriptors in the "Core Data Programming Guide":
The SQL store, on the other hand, compiles the predicate and sort
descriptors to SQL and evaluates the result in the database itself.
This is done primarily for performance, but it means that evaluation
happens in a non-Cocoa environment, and so sort descriptors (or
predicates) that rely on Cocoa cannot work. The supported sort
selectors are ...
In addition you cannot sort on transient properties using the SQLite store.
This means that you cannot create sections purely on transient attributes. You need a persistent attribute that creates the ordering for the sections.
UPDATE: A typical use of a transient attribute as sectionNameKeyPath is: Your objects have a "timeStamp" attribute, and you want to group the objects into sections with one section per month (see the DateSectionTitles sample code from the iOS Developer Library). In this case you have
a persistent attribute "timeStamp",
use "timeStamp" as first sort descriptor for the fetch request,
a transient attribute "sectionIdentifier" which is used as sectionNameKeyPath. "sectionIdentifier" is calculated from "timeStamp" and returns a string representing the year and the month of the timestamp, e.g. "2013-01".
The first thing the FRC does is to sort all fetched objects according to the "timeStamp" attribute. Then the objects are grouped into sections according to the "sectionIdentifier" attribute.
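For reference, a hedged sketch of such a transient getter, modeled on the DateSectionTitles approach; it assumes the entity has a persistent timeStamp date attribute and that this method lives in its NSManagedObject subclass:
// Hedged sketch: transient "sectionIdentifier" computed from the persistent
// "timeStamp" attribute, cached in the primitive value once computed.
- (NSString *)sectionIdentifier {
    [self willAccessValueForKey:@"sectionIdentifier"];
    NSString *identifier = [self primitiveValueForKey:@"sectionIdentifier"];
    [self didAccessValueForKey:@"sectionIdentifier"];

    if (identifier == nil) {
        NSDateComponents *components = [[NSCalendar currentCalendar]
            components:(NSCalendarUnitYear | NSCalendarUnitMonth)
              fromDate:self.timeStamp];
        identifier = [NSString stringWithFormat:@"%ld-%02ld",
                      (long)components.year, (long)components.month];
        [self setPrimitiveValue:identifier forKey:@"sectionIdentifier"];
    }
    return identifier;
}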
So for an FRC to group the objects into sections you really need a persistent attribute. The easiest solution would be to add a persistent attribute "sectionNumber" to your entity, and use that both as the sectionNameKeyPath and for the first sort descriptor.
