NSFetchRequest: FetchBatchSize and Faulting Behaviour - ios

I am new to Core Data, so sorry if this is a stupid question.
Is there a way to set the fetchBatchSize property on an automatic fetch request generated by firing a fault by accessing an NSManagedObject relationship?
For instance, let's say I have a "Companies" entity and an "Employees" entity with a one-to-many relationship from "Companies" to "Employees". I make a fetch request to retrieve all the companies, then for one company I would like to load its employees.
The obvious way would be to do something like this :
NSSet *employees = [anyCompany employees];
But then, how do I set the fetchBatchSize property to avoid loading too much data at once?
Thank you in advance.

The fetchBatchSize only defines how many records are retrieved in one round trip to the persistent store. For example, if you have 1000 entries for one entity and your batch size is 20, a fetch request whose results you traverse completely will execute about 50 SQL statements to load the row data, one per batch.
Depending on the context of your fetch, that can be quite inefficient. You can tune the batch size if memory becomes an issue, but in most cases you really do not have to care about it too much. Unnecessary round trips to the store, however, will most likely hurt performance.
So just use an expression like
aCompany.employees
liberally and let Core Data deal with the memory management. It will typically only retrieve the entities and attributes it actually needs for display or calculation.
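If memory does become a problem for one very large relationship, an option is to bypass the relationship fault and fetch the employees explicitly, because fetchBatchSize can only be set on a fetch request you create yourself. A minimal sketch, assuming the "Employees" entity has a to-one inverse relationship named company (that key and the helper function name are assumptions, not taken from the question):

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: fetches the employees of one company in batches.
// Assumes an "Employees" entity with a to-one inverse relationship "company".
static NSArray *FetchEmployeesInBatches(NSManagedObject *company,
                                        NSUInteger batchSize,
                                        NSError **error)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Employees"];
    // Restrict the fetch to the employees that belong to this company.
    request.predicate = [NSPredicate predicateWithFormat:@"company == %@", company];
    // Ask Core Data to load row data batchSize objects at a time
    // as the returned array is traversed.
    request.fetchBatchSize = batchSize;
    return [company.managedObjectContext executeFetchRequest:request error:error];
}
```

The returned array can be iterated like any other NSArray; whether the extra round trips are worth the memory saved depends on the size of the relationship.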

Related

Core data force to-many faults to all be fired?

I have a collection of CoreData objects that have a to-many relationship to another object type.
On some of these objects I need to search through the related objects to find a particular one. So I loop through them looking for a match, which works fine. But on closer inspection I can see Core Data firing each fault as it gets to each item in the loop. Obviously this is no good: hundreds of faults fired individually.
Can I trigger CoreData to fire all of the faults in that relationship as a group?
I don't want to just prefetch the relationship in the first place because I am dealing with a very large number of objects and for almost all of them I won't ever need to drill down into the related objects.
You could use the inverse relationship to "manually" fetch the related objects, using a predicate to restrict the results. For example if Department has a to-many relationship to Employee and you want to fetch all the Employees for currentDepartment, the fetch might look like this:
NSFetchRequest *employeeFetch = [NSFetchRequest fetchRequestWithEntityName:@"Employee"];
employeeFetch.predicate = [NSPredicate predicateWithFormat:@"department == %@", currentDepartment];
This will fetch the required Employee objects in one go (*). You could then use either the array returned by the fetch, or the set given by the currentDepartment.employees relationship to search through. Depending on the complexity of the search you are performing, you might even be able to express it as another clause in the predicate, and avoid the need to loop at all.
(*) Technically, the objects returned by the fetch will still be faults (unless you set returnsObjectsAsFaults to false), but the data for these faults has been fetched from the store into the cache, so firing the fault will now have minimal overhead.
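Put together, the fetch and the footnote about returnsObjectsAsFaults might be sketched as follows. The helper function name is hypothetical, and the sketch assumes the Employee entity has a to-one relationship named department, as in the predicate above:

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: loads every Employee of a department in a single fetch.
// Assumes an "Employee" entity with a to-one relationship "department".
static NSArray *FetchEmployeesOfDepartment(NSManagedObject *department,
                                           NSError **error)
{
    NSFetchRequest *employeeFetch = [NSFetchRequest fetchRequestWithEntityName:@"Employee"];
    employeeFetch.predicate = [NSPredicate predicateWithFormat:@"department == %@", department];
    // Ask for fully populated objects instead of faults, since the caller
    // intends to inspect every returned object straight away.
    employeeFetch.returnsObjectsAsFaults = NO;
    return [department.managedObjectContext executeFetchRequest:employeeFetch error:error];
}
```

A nil return value signals a failed fetch, so the error should be checked before looping over the result.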

Efficient way to get relationship count in database

I want to know what is the best way to get count of related entities in to-many relationship. Let's say I have a data model that looks like this (simplified), and I want to know the number of passengers for each bus:
Currently I can think of two options:
1. Add an extra attribute to the bus entity called passengerCount, which will be updated every time a passenger is added/removed.
2. Every time the count of passengers needs to be displayed, fetch the passengers and display their count.
Both of my options seem quite inefficient, even though I'm not aware how heavy it is to update/fetch values with core data. For example, imagine doing number 2 for every table view cell.
My question is: What is the best way to do this? A method in NSManagedObject class perhaps (I couldn't find any) or some other way that is more efficient?
Three remarks at the very beginning:
A. You should care about efficiency when you have a runtime problem. "Premature optimization is the root of all evil." (Donald Knuth)
B. Who said that all passenger entities have to be fetched? You think of something like this …
[bus.passengers count]
… causing the passengers to be fetched. But Core Data supports faulting, so the entities may only be loaded as faults (having only an ID, but not the full object).
C. You can see what Core Data does when you turn verbose mode on. To do so, pass the launch argument
-com.apple.CoreData.SQLDebug 1
To your question itself:
If you really have a problem, you can ask for a count explicitly with -countForFetchRequest:error:.
NSFetchRequest *fetch = [NSFetchRequest fetchRequestWithEntityName:@"passenger"];
fetch.predicate = [NSPredicate predicateWithFormat:@"bus == %@", bus];
…
NSUInteger count = [context countForFetchRequest:fetch error:NULL]; // Pass an NSError ** and check it in real-world code
Typed in Safari.
The Xcode auto-generated NSManagedObject class for your Core Data entity bus contains a property for its to-many relationship to Passenger objects.
You can think of this property as a "computed attribute" of your entity (meaning you do not set the attribute yourself; Core Data updates it automatically when you add or delete a related object). This property is an NSSet? (with references to the related Passenger objects), and NSSet supports .count.
So you can use .count without a special fetch request.

Core Data constraint with "includesPropertyValues"

I would like to make a more efficient fetch on Core Data entities and I have a query.
I want to delete a large amount of records (millions).
My logic is:
fetch all records for the entity
delete all fetched records.
To improve fetching,
I set the following constraint:
fetch.includesPropertyValues = NO;
My question is: will the relationships (which are kept as properties in the managed objects) also be deleted?
Yes, if you delete a managed object, the relationship delete rules apply regardless of this flag.
With so many records you might also want to process the instances in batches. Use setFetchLimit: to get a subset of the instances, delete those, save changes, and repeat until no more instances are found.
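The batch loop described above could be sketched roughly like this. The function name and its parameters are hypothetical; only includesPropertyValues, fetchLimit, deleteObject: and save: come from Core Data itself:

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: deletes every object of an entity, batchSize at a time.
// Returns YES on success; on failure returns NO and reports the error.
static BOOL DeleteAllObjects(NSManagedObjectContext *context,
                             NSString *entityName,
                             NSUInteger batchSize,
                             NSError **error)
{
    NSFetchRequest *fetch = [NSFetchRequest fetchRequestWithEntityName:entityName];
    fetch.includesPropertyValues = NO;  // attribute values are not needed to delete
    fetch.fetchLimit = batchSize;       // fetch at most batchSize objects per pass

    BOOL success = YES;
    BOOL done = NO;
    NSError *failure = nil;             // kept outside the pool so it survives draining

    while (!done) {
        @autoreleasepool {
            NSError *passError = nil;
            NSArray *batch = [context executeFetchRequest:fetch error:&passError];
            if (batch == nil) {                 // the fetch itself failed
                success = NO; failure = passError; done = YES;
            } else if (batch.count == 0) {      // nothing left to delete
                done = YES;
            } else {
                for (NSManagedObject *object in batch) {
                    [context deleteObject:object];
                }
                // Saving applies the delete rules and removes the rows,
                // so the next pass fetches the following batch.
                if (![context save:&passError]) {
                    success = NO; failure = passError; done = YES;
                }
            }
        }
    }
    if (!success && error != NULL) {
        *error = failure;
    }
    return success;
}
```

The autorelease pool around each pass is there to keep peak memory bounded; the batch size itself is something to tune by measurement.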

Correct way to accessing attributes of a class modeled in core data

Let's say a Recipe object has an NSSet of one or more Ingredients, and that the same relationship is modeled in core data.
Given a recipe, what is the correct way to access its ingredients?
In this example it seems natural to use recipe.ingredients, but I could equally use an NSFetchRequest for Ingredient entities with an NSPredicate to match by recipe.
Now let's say I want only the ingredients that are 'collected'. This is less clear cut to me - should I use a fetch request for ingredients with a predicate restricting by recipe and collected state? Or loop through recipe.ingredients?
At the other end of the scale, perhaps I need only ingredients from this recipe that also appear in other recipes. Now, the fetch request seems more appealing.
What is the correct general approach? Or is it a case by case scenario? I am interested in the impact on:
Consistency
Readability
Performance
Robustness (for example, it is easy to make an error in a fetch request that the compiler cannot catch).
Let's go through these in order.
Getting the ingredients for a specific Recipe, when you already have a reference: Use recipe.ingredients every time.
Getting the ingredients for a specific Recipe that have a specific value (e.g. a Boolean flag value): Easiest is probably to start with recipe.ingredients as above and then use something like NSSet's objectsPassingTest: to filter them. Most elegant is to set up a fetched property on Recipe that just returns these ingredients with no extra code (the syntax may not be immediately obvious, see a previous answer I wrote for details). These two probably perform about equally. Least appealing is a fetch request.
Getting ingredients that appear in multiple recipe instances: Probably a fetch request for the Ingredient entity where the predicate is something like recipe in %@, and the %@ is replaced by a list of Recipe instances.
Some basic info:
* Memory operations are ~100-1000 times faster than disk operations.
* Executing a fetch request is always a trip to the store (disk), and so degrades performance.
In your case, you have a "small" set of objects that need to be queried for information.
Simply iterating over them using the recipe.ingredients set would fault them in one by one, and each access would be a trip to the store (fault resolution).
In this case, use prefetching: either set relationshipKeyPathsForPrefetching on the request so that the ingredients relationship is prefetched, or execute a fetch request that fetches the set with the appropriate predicate.
If you need specific data only, then use the fetch request approach to retrieve only the data you need.
If you intend to repeatedly access the relationship for queries and info, just fetch the entire set by using prefetching, and query in memory.
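As an illustration of the prefetching option, here is a minimal sketch. It assumes entities named Recipe and Ingredient with a to-many relationship called ingredients, as in the question; the helper function name is made up for the example:

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: fetches all Recipe objects and asks Core Data to
// prefetch their "ingredients" relationship in the same operation.
static NSArray *FetchRecipesWithIngredients(NSManagedObjectContext *context,
                                            NSError **error)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Recipe"];
    // Key paths are relative to the fetched entity (Recipe here).
    request.relationshipKeyPathsForPrefetching = @[@"ingredients"];
    return [context executeFetchRequest:request error:error];
}
```

After such a fetch, walking recipe.ingredients for the returned recipes should not need a separate store trip per ingredient; the SQL debug output is the way to confirm that for a given model.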
My point is:
Think of the approach that minimizes your disk access (in any case you need at least one access).
If your data is too large to fit in memory, or to be queried in memory, perform a fetch to get only the data you need.
Now:
1. Consistency - Pick a method you find comfortable and stick with it (I use prefetching).
2. Readability - Using a property is much more readable than executing a query; however, it is less efficient if you are not using prefetching.
3. Performance - Disk access degrades performance, but is unavoidable in some situations.
4. Robustness - A fetch request shows that you know what is best for your data usage. Use it wisely.
To make sure you are minimising disk access, turn SQLite debug on
(-com.apple.CoreData.SQLDebug)
Edit:
Faulting behaviour
In this example it seems natural to use recipe.ingredients, but I
could equally use an NSFetchRequest for Ingredient entities with an
NSPredicate to match by recipe.
Why would you do the latter when you can do the former? You already have the recipe, and it already has a set of ingredients, so there's no need to look at all the ingredients and filter out just those that are related to the recipe that you already have.
Now let's say I want only the ingredients that are 'collected'. This
is less clear cut to me - should I use a fetch request for ingredients
with a predicate restricting by recipe and collected state? Or loop
through recipe.ingredients?
Apply the predicate to the recipe's ingredients:
NSPredicate *isCollected = [NSPredicate predicateWithFormat:@"collected == YES"];
NSSet *collectedIngredients = [recipe.ingredients filteredSetUsingPredicate:isCollected];
At the other end of the scale, perhaps I need only ingredients from
this recipe that also appear in other recipes. Now, the fetch request
seems more appealing.
Again, using a fetch request here seems wasteful because you already have easy access to the set of ingredients that could be in the final result, and that's potentially a much smaller set than the set of all ingredients. Use the same approach as above, but change the predicate to test the recipes associated with each ingredient. Something like:
NSPredicate *p = [NSPredicate predicateWithFormat:@"recipes.@count > 1"];
NSSet *i = [recipe.ingredients filteredSetUsingPredicate:p];
What is the correct general approach?
Fetch requests are a good way to search through all instances of a given entity. You're always going to start with a fetch request to get some objects to work with. But when the objects you want are somehow related to an object that you already have you can (and should) use those relationships to get what you want.

NSFetchRequest or simply filter the relation?

Assume an NSManagedObject A and another, B. A has a to-many relationship to B called bs.
Using [A.bs filter...] with some NSPredicate is very convenient, but likely slower than building an NSFetchRequest on B with the same predicate plus a condition to match the relation, or am I mistaken?
I guess this performance issue is even worse if you do something like lastObject on the result to obtain only a single result. (NSFetchRequest offers a fetchLimit property that can be exploited for this purpose.)
On a similar note, if you are just interested in one or two properties, NSFetchRequest provides a propertiesToFetch property as well.
My reasoning behind this is that using the relation directly requires Core Data to pull all the NSManagedObjects into the relevant NSManagedObjectContext, while the NSFetchRequest can perform optimizations at the store level.
Now:
1. Is my reasoning correct? Is the takeaway that if you are not interested in all the related objects, you should go with an NSFetchRequest?
2. Is there a way to get similar performance from the convenient (and obviously more readable) approach via the relations?
Yes, your assumption is correct: performance-wise, fetch requests can use optimizations, while the filtering approach is actually a two-stage approach where you first fetch all your objects into an NSSet and then send -filteredSetUsingPredicate: to the NSSet, which is already outside Core Data's scope.
One alternative would be to use fetch request templates in your model and instantiate them in your code with [managedObjectModel fetchRequestFromTemplateWithName:substitutionVariables:]
Your reasoning is correct.
I would add that NSFetchRequests offer much more flexibility and more opportunities to optimize the requests.
Fetch requests can use indexes in the underlying database.
Fetch requests can retrieve objects as faults instead of loading them in full.
Fetch requests can preload some chosen attributes.
They can even perform some operations for you (see NSExpressionDescription, NSExpression).
...
Even if you need some in-memory filtering (with objectsWithOptions:passingTest:, for example), you are better off using a fetch request to preload the objects and attributes you need. Resolving many faults one by one will be slower in any case.

Resources