Efficient way to get relationship count in database - ios

I want to know what is the best way to get count of related entities in to-many relationship. Let's say I have a data model that looks like this (simplified), and I want to know the number of passengers for each bus:
Currently I can think of two options:
1. Add an extra attribute to the bus entity called passengerCount, which is updated every time a passenger is added or removed.
2. Every time the count of passengers needs to be displayed, fetch the passengers and display their count.
Both of my options seem quite inefficient, even though I'm not aware how heavy it is to update/fetch values with core data. For example, imagine doing number 2 for every table view cell.
My question is: What is the best way to do this? A method in NSManagedObject class perhaps (I couldn't find any) or some other way that is more efficient?

Three remarks at the very beginning:
A. You should care about efficiency when you have a runtime problem. "Premature optimization is the root of all evil." (Donald Knuth)
B. Who said that all passenger entities have to be fetched? You are probably thinking that something like this …
[bus.passengers count]
… causes all passengers to be fetched. But Core Data supports faulting, so the related entities may only be loaded as faults (having only an ID, but not the full object).
C. You can see what Core Data does, when you turn verbose mode on. To do so pass the launch argument
-com.apple.CoreData.SQLDebug 1
To your question itself:
If you really have a problem, you can ask for a count explicitly with -countForFetchRequest:error:.
NSFetchRequest *fetch = [NSFetchRequest fetchRequestWithEntityName:@"passenger"];
fetch.predicate = [NSPredicate predicateWithFormat:@"bus == %@", bus];
…
NSUInteger count = [context countForFetchRequest:fetch error:NULL]; // Please pass an NSError instance in real world
Typed in Safari.

The Xcode auto-generated NSManagedObject subclass for your Core Data entity Bus contains a property for its to-many relationship to Passenger objects.
You can think of this property like a "computed attribute" of your entity (meaning you will not set the attribute yourself but Core Data updates it automatically when you add or delete a relationship). This property is an NSSet? (with references to the related Passenger objects) and the NSSet supports the .count method.
So you can use .count without a special fetch request.

Related

Core data force to-many faults to all be fired?

I have a collection of CoreData objects that have a to-many relationship to another object type.
On some of these objects I need to search through the related objects to find a particular one. So I loop through them looking for a match, which works fine. But on closer inspection I can see Core Data firing each fault individually as the loop reaches each item. Obviously this is no good: hundreds of faults fired one at a time.
Can I trigger CoreData to fire all of the faults in that relationship as a group?
I don't want to just prefetch the relationship in the first place because I am dealing with a very large number of objects, and for almost all of them I won't ever need to drill down into the related objects.
You could use the inverse relationship to "manually" fetch the related objects, using a predicate to restrict the results. For example if Department has a to-many relationship to Employee and you want to fetch all the Employees for currentDepartment, the fetch might look like this:
NSFetchRequest *employeeFetch = [NSFetchRequest fetchRequestWithEntityName:@"Employee"];
employeeFetch.predicate = [NSPredicate predicateWithFormat:@"department == %@", currentDepartment];
This will fetch the required Employee objects in one go (*). You could then use either the array returned by the fetch, or the set given by the currentDepartment.employees relationship to search through. Depending on the complexity of the search you are performing, you might even be able to express it as another clause in the predicate, and avoid the need to loop at all.
(*) Technically, the objects returned by the fetch will still be faults (unless you set returnsObjectsAsFaults to false), but the data for these faults has been fetched from the store into the cache, so firing the fault will now have minimal overhead.

Core Data sectionNameKeyPath with Relationship Attribute Performance Issue

I have a Core Data Model with three entities:
Person, Group, Photo with relationships between them as follows:
Person <<-----------> Group (one to many relationship)
Person <-------------> Photo (one to one)
When I perform a fetch using the NSFetchedResultsController in a UITableView, I want to group in sections the Person objects using the Group's entity name attribute.
For that, I use sectionNameKeyPath:@"group.name".
The problem is that when I'm using the attribute from the Group relationship, the NSFetchedResultsController fetches everything upfront in small batches of 20 (I have setFetchBatchSize: 20) instead of fetching batches while I'm scrolling the tableView.
If I use an attribute from the Person entity (like sectionNameKeyPath:@"name") to create sections everything works OK: the NSFetchedResultsController loads small batches of 20 objects as I scroll.
The code I use to instantiate the NSFetchedResultsController:
- (NSFetchedResultsController *)fetchedResultsController {
    if (_fetchedResultsController) {
        return _fetchedResultsController;
    }
    NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
    NSEntityDescription *entity = [NSEntityDescription entityForName:[Person description]
                                              inManagedObjectContext:self.managedObjectContext];
    [fetchRequest setEntity:entity];
    // Specify how the fetched objects should be sorted
    NSSortDescriptor *groupSortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"group.name"
                                                                        ascending:YES];
    NSSortDescriptor *personSortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"birthName"
                                                                         ascending:YES
                                                                          selector:@selector(localizedStandardCompare:)];
    [fetchRequest setSortDescriptors:[NSArray arrayWithObjects:groupSortDescriptor, personSortDescriptor, nil]];
    [fetchRequest setRelationshipKeyPathsForPrefetching:@[@"group", @"photo"]];
    [fetchRequest setFetchBatchSize:20];
    NSError *error = nil;
    NSArray *fetchedObjects = [self.managedObjectContext executeFetchRequest:fetchRequest error:&error];
    if (fetchedObjects == nil) {
        NSLog(@"Error Fetching: %@", error);
    }
    _fetchedResultsController = [[NSFetchedResultsController alloc] initWithFetchRequest:fetchRequest
        managedObjectContext:self.managedObjectContext sectionNameKeyPath:@"group.name" cacheName:@"masterCache"];
    _fetchedResultsController.delegate = self;
    return _fetchedResultsController;
}
This is what I get in Instruments if I create sections based on "group.name" without any interaction with the App's UI:
And this is what I get (with a bit of scrolling on UITableView) if sectionNameKeyPath is nil:
Please, can anyone help me out on this issue?
EDIT 1:
It seems that I get inconsistent results from the simulator and Instruments: when I've asked this question, the app was starting in the simulator in about 10 seconds (by Time Profiler) using the above code.
But today, using the same code as above, the app starts in the simulator in 900ms even if it makes a temporary upfront fetch for all the objects and it's not blocking the UI.
I've attached some fresh screenshots:
EDIT 2:
I reset the simulator and the results are intriguing: after performing an import operation and quitting the app the first run looked like this:
After a bit of scrolling:
Now this is what happens on a second run:
After the fifth run:
EDIT 3:
Running the app the seventh and eighth time, I get this:
This is your stated objective: "I need the Person objects to be grouped in sections by the relationship entity Group, name attribute and the NSFetchResultsController to perform fetches in small batches as I scroll and not upfront as it is doing now."
The answer is a little complicated, primarily because of how an NSFetchedResultsController builds sections, and how that affects the fetching behavior.
TL;DR; To change this behavior, you would need to change how NSFetchedResultsController builds sections.
What is happening?
When an NSFetchedResultsController is given a fetch request with pagination (fetchLimit and/or fetchBatchSize), several things happen.
If no sectionNameKeyPath is specified, it does exactly what you expect. The fetch returns a proxy array of results, with "real" objects for the first fetchBatchSize number of items. So for example, if you have set fetchBatchSize to 2 and your predicate matches 10 items in the store, the results contain the first two objects. The other objects will be fetched separately as they are accessed. This provides a smooth paginated response experience.
However, when a sectionNameKeyPath is specified, the fetched results controller has to do a bit more. To compute the sections it needs to access that key path on all the objects in the results. It enumerates the 10 items in the results in our example. The first two have already been fetched. The other 8 will be fetched during enumeration to get the key path value needed to build the section information. If you have a lot of results for your fetch request, this can be very inefficient. There are a number of public bugs concerning this functionality:
NSFetchedResultsController initially takes too long to set up sections
NSFetchedResultsController ignores fetchLimit property
NSFetchedResultsController, Table Index, and Batched Fetch Performance Issue
... And several others. When you think about it, this makes sense. To build the NSFetchedResultsSectionInfo objects requires the fetched results controller to see every value in the results for the sectionNameKeyPath, aggregate them to the unique union of values, and use that information to create the correct number of NSFetchedResultsSectionInfo objects, set the name and index title, know how many objects in the results a section contains, etc. To handle the general use case there is no way around this. With that in mind, your Instruments traces may make a lot more sense.
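To make the cost concrete, here is a minimal sketch of section building over an already-sorted result array. It is not NSFetchedResultsController's actual implementation; BuildSections is a hypothetical helper, and plain key-value-coding-compliant objects (such as dictionaries) stand in for managed objects.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: groups an array that is already sorted by keyPath
// into sections, returning one @{ @"name": ..., @"count": ... } per section.
static NSArray *BuildSections(NSArray *sortedObjects, NSString *keyPath) {
    NSMutableArray *sections = [NSMutableArray array];
    NSString *currentName = nil;
    NSUInteger count = 0;
    for (id object in sortedObjects) {
        // Every object is touched here. For a managed object that is still
        // a fault, reading the key path would fire the fault.
        NSString *name = [[object valueForKeyPath:keyPath] description] ?: @"";
        if (currentName != nil && ![name isEqualToString:currentName]) {
            // The section name changed: close the previous section.
            [sections addObject:@{@"name": currentName, @"count": @(count)}];
            count = 0;
        }
        currentName = name;
        count++;
    }
    if (currentName != nil) {
        // Close the last open section.
        [sections addObject:@{@"name": currentName, @"count": @(count)}];
    }
    return sections;
}
```

Because the loop has to read the key path on every element to know where one section ends and the next begins, no batch size can keep the objects beyond the first batch from being visited.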
How can you change this?
You can attempt to build your own NSFetchedResultsController that provides an alternative strategy for building NSFetchedResultsSectionInfo objects, but you may run into some of the same problems. For example, if you are using the existing fetchedObjects functionality to access members of the fetch results, you will encounter the same behavior when accessing objects that are faults. Your implementation would need a strategy for dealing with this (it's doable, but very dependent on your needs and requirements).
Oh god no. What about some kind of temporary hack that just makes it perform a little better but doesn't fix the problem?
Altering your data model will not change the above behavior, but can change the performance impact slightly. Batch updates will not have any significant effect on this behavior, and in fact will not play nicely with a fetched results controller. It may be much more useful to you, however, to instead set the relationshipKeyPathsForPrefetching to include your "group" relationship, which may improve the fetching and faulting behavior significantly. Another strategy may be to perform another fetch to batch fault these objects before you attempt to use the fetched results controller, which will populate the various levels of Core Data in-memory caches in a more efficient manner.
The NSFetchedResultsController cache is primarily for section information. This prevents the sections from having to be completely recalculated on each change (in the best case), but can actually make the initial fetch to build the sections take much longer. You will have to experiment to see if the cache is worthwhile for your use case.
If your primary concern is that these Core Data operations are blocking user interaction, you can offload them from the main thread. NSFetchedResultsController can be used on a private queue (background) context, which will prevent Core Data operations from blocking the UI.
Based on my experience a way to achieve your goal is to denormalize your model. In particular, you could add a group attribute in your Person entity and use that attribute as sectionNameKeyPath. So, when you create a Person you should also pass the group it belongs to.
This denormalization is reasonable because it lets you avoid fetching the related Group objects when they are not needed. A drawback is that if you change the name of a group, all the persons associated with that group must be updated as well; otherwise you can end up with incorrect values.
The key aspect here is the following. You need to keep in mind that Core Data is not a relational database. The model should not be designed as a database schema, where normalization could take place, but from the perspective of how the data are presented and used in the user interface.
Edit 1
I cannot understand your comment, could you explain better?
What I've found very intriguing though is that even if the app is
performing a full upfront fetch in the simulator, the app loads in
900ms (with 5000 objects) on the device despite the simulator where it
loads much slower.
Anyway, I would be interested in knowing details about your Photo entity. If you pre-fetch photo the overall execution could be influenced.
Do you need to pre-fetch a Photo within your table view? Are they thumbs (small photos)? Or normal images? Do you take advantage of External Storage Flag?
Adding an additional attribute (say group) to the Person entity should not be a problem. Updating the value of that attribute when the name of a Group object changes is not a problem either if you perform it in the background. In addition, starting from iOS 8 you have batch updates available, as described in Core Data Batch Updates.
After almost a year since I've posted this question, I've finally found the culprits that enable this behaviour (which slightly changed in Xcode 6):
1. Regarding the inconsistent fetch times: I was using a cache and at the time I was back and forth with opening, closing and resetting the simulator.
2. Regarding the fact that everything was fetched upfront in small batches without scrolling (in Xcode 6's Core Data Instruments that's not the case anymore - now it's one, big fetch which takes entire seconds):
It seems that setFetchBatchSize does not work correctly with parent/child contexts. The issue was reported back in 2012 and it seems that it's still there http://openradar.appspot.com/11235622.
To overcome this issue, I created another independent context with NSMainQueueConcurrencyType and set its persistent store coordinator to be the same one that my other contexts are using.
More about issue #2 here: https://stackoverflow.com/a/11470560/1641848

Core Data — Find-or-Create Efficiently

According to Apple's documentation:
There are many situations where you may need to find existing objects
(objects already saved in a store) for a set of discrete input values.
A simple solution is to create a loop, then for each value in turn
execute a fetch to determine whether there is a matching persisted
object and so on. This pattern does not scale well. If you profile
your application with this pattern, you typically find the fetch to be
one of the more expensive operations in the loop (compared to just
iterating over a collection of items). Even worse, this pattern turns
an O(n) problem into an O(n^2) problem.
It is much more efficient—when possible—to create all the managed
objects in a single pass, and then fix up any relationships in a
second pass. For example, if you import data that you know does not
contain any duplicates (say because your initial data set is empty),
you can just create managed objects to represent your data and not do
any searches at all. Or if you import "flat" data with no
relationships, you can create managed objects for the entire set and
weed out (delete) any duplicates before save using a single large IN
predicate.
Question 1: Considering that the data I'm importing doesn't have any relationships, how do I implement what is described in the last sentence?
If you do need to follow a find-or-create pattern—say because you're
importing heterogeneous data where relationship information is mixed
in with attribute information—you can optimize how you find existing
objects by reducing to a minimum the number of fetches you execute.
How to accomplish this depends on the amount of reference data you
have to work with. If you are importing 100 potential new objects, and
only have 2000 in your database, fetching all of the existing and
caching them may not represent a significant penalty (especially if
you have to perform the operation more than once). However, if you
have 100,000 items in your database, the memory pressure of keeping
those cached may be prohibitive.
You can use a combination of an IN predicate and sorting to reduce
your use of Core Data to a single fetch request.
Example code:
// Get the names to parse in sorted order.
NSArray *employeeIDs = [[listOfIDsAsString componentsSeparatedByString:@"\n"]
        sortedArrayUsingSelector: @selector(compare:)];
// create the fetch request to get all Employees matching the IDs
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
[fetchRequest setEntity:
        [NSEntityDescription entityForName:@"Employee" inManagedObjectContext:aMOC]];
[fetchRequest setPredicate: [NSPredicate predicateWithFormat: @"(employeeID IN %@)", employeeIDs]];
// Make sure the results are sorted as well.
[fetchRequest setSortDescriptors:
        @[ [[NSSortDescriptor alloc] initWithKey: @"employeeID" ascending:YES] ]];
// Execute the fetch.
NSError *error;
NSArray *employeesMatchingNames = [aMOC executeFetchRequest:fetchRequest error:&error];
You end up with two sorted arrays—one with the employee IDs passed
into the fetch request, and one with the managed objects that matched
them. To process them, you walk the sorted lists following these
steps:
Get the next ID and Employee. If the ID doesn't match the Employee ID,
create a new Employee for that ID. Get the next Employee: if the IDs
match, move to the next ID and Employee.
Question 2: In the above example, I get two sorted arrays as described above. Considering the worst-case scenario where all the objects that are to be inserted are present in the store, I don't see any way that I can solve the problem in O(n) time. Apple describes the two steps as above but that is an O(n^2) job. For any kth element in the input array, there might or might not exist an element that matches it in the first k elements in the output array. So in the worst case, the complexity will be O(nC2) = O(n^2).
So, what I believe Apple is doing is making sure that the fetch is executed only once, even though O(n^2) checks are required. If so, then I'll go with this; but is there any other way of doing this efficiently?
Please understand, that I don't want to fetch again and again - fetch once for an input array of size 100 identifiers.
Ad. 1 The fact of having relationships isn't important here. This explanation only says that if you download your data from e.g. a remote server and your items have some IDs, then you can fetch them all from the persistent store in one request, instead of fetching each object in a separate request.
Ad. 2
Apple describes the two steps as above but that is an O(n^2) job.
It's not. Please read these lines carefully:
To process them, you walk the sorted lists following these steps:
Get the next ID and Employee. If the ID doesn't match the Employee ID,
create a new Employee for that ID. Get the next Employee: if the IDs
match, move to the next ID and Employee.
You walk the arrays/lists simultaneously, so you never have to make this check: "there might or might not exist an element that matches it in the first k elements in the output array." You don't need to check previous elements as they're sorted and they certainly won't contain the object you're interested in.
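The simultaneous walk can be sketched with plain strings standing in for the fetched Employee objects. This is only an illustration under stated assumptions: MissingIDs is a hypothetical helper, both arrays are sorted ascending with the same comparison and contain no duplicates, and every existing ID also appears in the input (which the IN predicate guarantees for fetched results).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the input IDs that have no counterpart in
// the sorted list of IDs already present in the store.
static NSArray *MissingIDs(NSArray *sortedIDs, NSArray *sortedExistingIDs) {
    NSMutableArray *missing = [NSMutableArray array];
    NSUInteger j = 0; // cursor into sortedExistingIDs, only ever moves forward
    for (NSString *identifier in sortedIDs) {
        if (j < sortedExistingIDs.count &&
            [sortedExistingIDs[j] isEqualToString:identifier]) {
            // IDs match: the object already exists, advance both cursors.
            j++;
        } else {
            // No match at the cursor: this is where a new object would be
            // created. The cursor into the existing list stays put.
            [missing addObject:identifier];
        }
    }
    return missing;
}
```

Each element of either array is looked at once and neither cursor ever moves backwards, so the walk is linear in the combined length of the two arrays. The sorting done beforehand is the only step that costs more than O(n).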
If anyone is looking for the original Apple documentation, there is a snapshot here:
http://web.archive.org/web/20150908024050/https://developer.apple.com/library/mac/documentation/cocoa/conceptual/coredata/articles/cdimporting.html

Correct way to access attributes of a class modeled in Core Data

Let's say a Recipe object has an NSSet of one or more Ingredients, and that the same relationship is modeled in core data.
Given a recipe, what is the correct way to access its ingredients?
In this example it seems natural to use recipe.ingredients, but I could equally use an NSFetchRequest for Ingredient entities with an NSPredicate to match by recipe.
Now let's say I want only the ingredients that are 'collected'. This is less clear cut to me - should I use a fetch request for ingredients with a predicate restricting by recipe and collected state? Or loop through recipe.ingredients?
At the other end of the scale, perhaps I need only ingredients from this recipe that also appear in other recipes. Now, the fetch request seems more appealing.
What is the correct general approach? Or is it a case by case scenario? I am interested in the impact on:
Consistency
Readability
Performance
Robustness (for example, it is easy to make an error in a fetch request that the compiler cannot catch).
Let's go through these in order.
Getting the ingredients for a specific Recipe, when you already have a reference: Use recipe.ingredients every time.
Getting the ingredients for a specific Recipe that have a specific value (e.g. a Boolean flag value): Easiest is probably to start with recipe.ingredients as above and then use something like NSSet's objectsPassingTest to filter them. Most elegant is to set up a fetched property on Recipe that just returns these ingredients with no extra code (the syntax may not be immediately obvious, see a previous answer I wrote for details). These two probably perform about equally. Least appealing is a fetch request.
Getting ingredients that appear in multiple recipe instances: Probably a fetch request for the Ingredient entity where the predicate is something like recipe in %@, and the %@ is replaced by a list of Recipe instances.
Some basic info:
* Memory operations are ~100-1000 times faster than disk operations.
* Executing a fetch request is always a trip to the store (disk), and so degrades performance.
In your case, you have a "small" set of objects that need to be queried for information.
Simply iterating over them using the recipe.ingredients set would fault them one by one; each access will be a trip to the store (fault resolution).
In this case, use prefetching: either set setRelationshipKeyPathsForPrefetching: on the request to prefetch the ingredients relationship, or execute a fetch request that fetches the set with the appropriate predicate.
If you need specific data only, then use the fetch request approach to retrieve only the data you need.
If you intend to repeatedly access the relationship for queries and info, just fetch the entire set by using prefetching, and query in memory.
My point is:
Think of the approach that minimizes your disk access (in any case you need at least 1 access).
If your data is too large to fit in memory, or to be queried in memory, perform a fetch to get only the data you need.
Now:
1. Consistency - Pick a method you find comfortable and stick with it (I use prefetching).
2. Readability - Using a property is much more readable than executing a query; however, it is less efficient if you are not using prefetching.
3. Performance - Disk access degrades performance, but is unavoidable in some situations.
4. Robustness - A fetch request shows that you know what is best for your data usage. Use it wisely.
To make sure you are minimising disk access, turn SQLite debug on
(-com.apple.CoreData.SQLDebug 1)
Edit:
Faulting behaviour
In this example it seems natural to use recipe.ingredients, but I
could equally use an NSFetchRequest for Ingredient entities with an
NSPredicate to match by recipe.
Why would you do the latter when you can do the former? You already have the recipe, and it already has a set of ingredients, so there's no need to look at all the ingredients and filter out just those that are related to the recipe that you already have.
Now let's say I want only the ingredients that are 'collected'. This
is less clear cut to me - should I use a fetch request for ingredients
with a predicate restricting by recipe and collected state? Or loop
through recipe.ingredients?
Apply the predicate to the recipe's ingredients:
NSPredicate *isCollected = [NSPredicate predicateWithFormat:@"collected == YES"];
NSSet *collectedIngredients = [recipe.ingredients filteredSetUsingPredicate:isCollected];
At the other end of the scale, perhaps I need only ingredients from
this recipe that also appear in other recipes. Now, the fetch request
seems more appealing.
Again, using a fetch request here seems wasteful because you already have easy access to the set of ingredients that could be in the final result, and that's potentially a much smaller set than the set of all ingredients. Use the same approach as above, but change the predicate to test the recipes associated with each ingredient. Something like:
NSPredicate *p = [NSPredicate predicateWithFormat:@"recipes.@count > 1"]; // assumes Ingredient has a to-many "recipes" relationship
NSSet *i = [recipe.ingredients filteredSetUsingPredicate:p];
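In-memory filtering of this kind can be tried without a Core Data stack, because NSPredicate evaluates key paths through key-value coding. The sketch below uses dictionaries as stand-ins for Ingredient objects; the keys name, collected and recipes, and the helper names, are assumptions made for the illustration only.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps only the ingredients whose "collected" flag is YES.
static NSSet *CollectedIngredients(NSSet *ingredients) {
    NSPredicate *isCollected = [NSPredicate predicateWithFormat:@"collected == YES"];
    return [ingredients filteredSetUsingPredicate:isCollected];
}

// Hypothetical helper: keeps only the ingredients used by more than one
// recipe, using the @count collection operator on the "recipes" key.
static NSSet *SharedIngredients(NSSet *ingredients) {
    NSPredicate *isShared = [NSPredicate predicateWithFormat:@"recipes.@count > 1"];
    return [ingredients filteredSetUsingPredicate:isShared];
}
```

With real managed objects the same predicates apply to recipe.ingredients directly. Evaluating them still fires faults for any objects that are not yet loaded, so the disk-access considerations discussed in the other answer remain relevant.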
What is the correct general approach?
Fetch requests are a good way to search through all instances of a given entity. You're always going to start with a fetch request to get some objects to work with. But when the objects you want are somehow related to an object that you already have you can (and should) use those relationships to get what you want.

Creating sections with NSFetchedResultsController, on the fly

I'm using NSFetchedResultsController (NSFRC) to display information in a UITableView. I'm trying to create the option for the user to sort the cells in sections as opposed to alphabetically. The problem is, the sections would then be determined using downloaded information. On top of this, the section for each item will be changing relatively often, so I don't want to save the section. In my research of similar problems I have noticed the mention of transient attributes, but I've never used these before, and I'm not sure if I can use them, bearing in mind that all the calculations are done once the data has already been loaded. I also want this solution to be compatible with my previous Core Data database. Also, I'm not particularly great at Core Data (nor Objective-C at that!), so I'm not entirely sure how I'd go about doing this.
So here's what I want to go for if we're using transient attributes (this next bit is theoretical as I don't know if transient attributes are the correct way forward). I would like 4 possible sections, 0-3 (I'll rename them using the TableView delegate to get around sorting problems). When the calculations are done, each cell will be assigned the transient attribute (if needed, the default section would be 2). I hope this all makes sense.
Right, now for some theoretical code. First I create the transient property in the Data Model screen-thing, and make it transient by checking the transient check box... Sounds simple enough.
In the code for the calculations in willDisplayCell (needs to be done in wDC for a couple of reasons), the entity could be saved like this:
MyEntity *myEntity = [self.fetchedResultsController objectAtIndexPath:indexPath];
myEntity.sectionTransientProperty = 2;
if (![self.managedObjectContext save:&error]) {
    NSLog(@"Error: %@", error);
    FATAL_CORE_DATA_ERROR(error);
    return;
}
Done, right? Is that how we assign a value to a transient property?
Then I change the sorting option in NSFRC when I alloc it:
fetchedResultsController = [[NSFetchedResultsController alloc]
    initWithFetchRequest:fetchRequest
    managedObjectContext:self.managedObjectContext
    sectionNameKeyPath:@"sectionTransientProperty"
    cacheName:@"MyEntity"];
How are we doing, what else do I need to do? Or have I got this so horribly wrong I should just give up on Core Data and NSFRC? If you guys could help guide me through this I'd really appreciate it. If you need me to post any more code I would be happy to.
Regards,
Mike
If you want an FRC with sections, you have to add a sort descriptor to the fetch request, and that sort descriptor cannot be based on transient attributes.
See the documentation of initWithFetchRequest:managedObjectContext:sectionNameKeyPath:cacheName::
If the controller generates sections, the first sort descriptor in
the array is used to group the objects into sections; its key must
either be the same as sectionNameKeyPath or the relative ordering
using its key must match that using sectionNameKeyPath.
and Fetch Predicates and Sort Descriptors in the "Core Data Programming Guide":
The SQL store, on the other hand, compiles the predicate and sort
descriptors to SQL and evaluates the result in the database itself.
This is done primarily for performance, but it means that evaluation
happens in a non-Cocoa environment, and so sort descriptors (or
predicates) that rely on Cocoa cannot work. The supported sort
selectors are ...
In addition you cannot sort on transient properties using the SQLite store.
This means that you cannot create sections purely on transient attributes. You need a persistent attribute that creates the ordering for the sections.
UPDATE: A typical use of a transient attribute as sectionNameKeyPath is: Your objects have a "timeStamp" attribute, and you want to group the objects into sections with one section per month (see the DateSectionTitles sample code from the iOS Developer Library). In this case you have
a persistent attribute "timeStamp",
use "timeStamp" as first sort descriptor for the fetch request,
a transient attribute "sectionIdentifier" which is used as sectionNameKeyPath. "sectionIdentifier" is calculated from "timeStamp" and returns a string representing the year and the month of the timestamp, e.g. "2013-01".
The first thing the FRC does is to sort all fetched objects according to the "timeStamp" attribute. Then the objects are grouped into sections according to the "sectionIdentifier" attribute.
So for an FRC to group the objects into sections you really need a persistent attribute. The easiest solution would be to add a persistent attribute "sectionNumber" to your entity, and use that for "sectionNameKeyPath" and for the first sort descriptor.
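For the "timeStamp" case described above, the derived section identifier could be computed along these lines. This is a sketch, not the DateSectionTitles sample's code: SectionIdentifierForDate is a hypothetical helper, and passing the calendar in explicitly is a choice made here so that the result does not depend on the device's settings.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: derives a "YYYY-MM" section identifier from a
// persistent timestamp, using the given calendar (and its time zone).
static NSString *SectionIdentifierForDate(NSDate *date, NSCalendar *calendar) {
    NSDateComponents *components =
        [calendar components:(NSCalendarUnitYear | NSCalendarUnitMonth)
                    fromDate:date];
    return [NSString stringWithFormat:@"%04ld-%02ld",
            (long)components.year, (long)components.month];
}
```

Because the identifier is a monotonic function of the timestamp, sorting by "timeStamp" and grouping by the identifier produce the same relative ordering, which is what the fetched results controller documentation requires.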

Resources