I have two Core Data entities (with a relationship and its inverse defined in the model), each pre-populated with around 50k records, and I need to link them. It's almost a 1:1 relation. They share a common attribute, so two objects should be related whenever their values for that attribute are equal.
I'm doing it in a rough way and running into memory issues (it quickly escalates to memory warnings).
@autoreleasepool {
    NSFetchRequest *e2sRequest = [[NSFetchRequest alloc] initWithEntityName:@"Entity2"];
    e2sRequest.includesPropertyValues = NO;
    e2sRequest.includesSubentities = NO;
    NSArray *e2s = [self.fatherMOC executeFetchRequest:e2sRequest error:nil];
    if (e2s.count > 0) {
        NSFetchRequest *e1sRequest = [[NSFetchRequest alloc] initWithEntityName:@"Entity1"];
        e1sRequest.includesPropertyValues = NO;
        e1sRequest.includesSubentities = NO;
        NSArray *e1s = [self.fatherMOC executeFetchRequest:e1sRequest error:nil];
        for (Entity1 *e1 in e1s) {
            NSString *attributeInCommon = e1.attributeInCommon;
            NSPredicate *predicate = [NSPredicate predicateWithFormat:@"attributeInCommon = %@", attributeInCommon];
            Entity2 *e2matching = (Entity2 *)[e2s filteredArrayUsingPredicate:predicate].lastObject;
            if (e2matching) {
                e1.e2 = e2matching;
            }
        }
    }
}
I've tried keeping the common attribute and the objectID in memory in an NSDictionary, with no result. I've tried a couple more approaches, some terribly slow and others terrible memory-eaters.
I know I should check the errors, and I know I can do it in fewer lines of code, but think of it as rushed debug code; it'll be fixed later.
Thanks in advance
You're trying to load 100,000 items at the same time, so it's no wonder you have memory issues.
You need to batch, and if you create an autorelease pool you need to drain it periodically (so it should wrap each batch).
So, set a fetchBatchSize on the first fetch request. Then, iterate over the results it finds taking fetchBatchSize items at a time. This is where the pool should be so it's released after each batch. Start with a batch of 100 and see how it goes.
Each batch then makes the second query with a predicate to limit to the set of values that can actually match with the current batch.
Then run your matching logic.
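A rough sketch of that flow, reusing the names from the question's code (error handling is omitted, and the save/re-fault steps at the end are one possible way to keep memory down, not the only one):

NSFetchRequest *e1sRequest = [[NSFetchRequest alloc] initWithEntityName:@"Entity1"];
e1sRequest.fetchBatchSize = 100;
NSArray *e1s = [self.fatherMOC executeFetchRequest:e1sRequest error:nil];

NSUInteger batchSize = 100;
for (NSUInteger offset = 0; offset < e1s.count; offset += batchSize) {
    @autoreleasepool {
        NSArray *batch = [e1s subarrayWithRange:
            NSMakeRange(offset, MIN(batchSize, e1s.count - offset))];

        // Second fetch, limited to the values that can actually match this batch.
        NSArray *values = [batch valueForKey:@"attributeInCommon"];
        NSFetchRequest *e2sRequest = [[NSFetchRequest alloc] initWithEntityName:@"Entity2"];
        e2sRequest.predicate = [NSPredicate predicateWithFormat:@"attributeInCommon IN %@", values];
        NSArray *e2s = [self.fatherMOC executeFetchRequest:e2sRequest error:nil];

        // Index this batch's Entity2 objects by the common attribute for cheap lookups.
        NSMutableDictionary *e2ByValue = [NSMutableDictionary dictionaryWithCapacity:e2s.count];
        for (Entity2 *e2 in e2s) {
            e2ByValue[e2.attributeInCommon] = e2;
        }
        for (Entity1 *e1 in batch) {
            e1.e2 = e2ByValue[e1.attributeInCommon];
        }

        // Save, then turn the processed objects back into faults so the pool can release them.
        [self.fatherMOC save:nil];
        for (NSManagedObject *object in batch) {
            [self.fatherMOC refreshObject:object mergeChanges:NO];
        }
        for (NSManagedObject *object in e2s) {
            [self.fatherMOC refreshObject:object mergeChanges:NO];
        }
    }
}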
Consider also using the Core Data tool in Instruments to check what's happening, how many requests you make to the data store and how long it all takes.
I suppose that this operation (matching 50,000 entities with 50,000 other entities based on a common string attribute that acts as a unique key) is not something you want to repeat on users' devices. Rather, it seems you need to do it once in preparation of the seed data.
Therefore there is actually no need to optimize much, because time and (on the simulator) memory won't be the issue.
So just perform this in batches, e.g. as follows:
fetch 1000 e1,
fetch 1000 corresponding e2 with a predicate
link
save
drain memory
repeat
Some hints:
To get distinct chunks of 1,000 records, add a sort descriptor and use fetchOffset and fetchLimit.
The predicate for getting the corresponding records would be something like this:
NSArray *attributes = [e1Results valueForKeyPath:@"attributeInCommon"];
request.predicate =
    [NSPredicate predicateWithFormat:@"attributeInCommon IN %@", attributes];
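Putting the hints together, one chunk of the loop could look roughly like this (entity and attribute names follow the question; the save/reset strategy is an assumption and error handling is omitted):

NSUInteger chunkSize = 1000;
NSUInteger offset = 0;
while (YES) {
    @autoreleasepool {
        NSFetchRequest *e1Request = [NSFetchRequest fetchRequestWithEntityName:@"Entity1"];
        e1Request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"attributeInCommon" ascending:YES]];
        e1Request.fetchOffset = offset;
        e1Request.fetchLimit = chunkSize;
        NSArray *e1Results = [self.fatherMOC executeFetchRequest:e1Request error:nil];
        if (e1Results.count == 0) {
            break; // no more chunks
        }

        NSArray *attributes = [e1Results valueForKeyPath:@"attributeInCommon"];
        NSFetchRequest *e2Request = [NSFetchRequest fetchRequestWithEntityName:@"Entity2"];
        e2Request.predicate = [NSPredicate predicateWithFormat:@"attributeInCommon IN %@", attributes];
        NSArray *e2Results = [self.fatherMOC executeFetchRequest:e2Request error:nil];

        // Link by matching attribute values.
        NSMutableDictionary *e2ByValue = [NSMutableDictionary dictionaryWithCapacity:e2Results.count];
        for (Entity2 *e2 in e2Results) {
            e2ByValue[e2.attributeInCommon] = e2;
        }
        for (Entity1 *e1 in e1Results) {
            e1.e2 = e2ByValue[e1.attributeInCommon];
        }

        // Save and drain memory before the next chunk.
        [self.fatherMOC save:nil];
        [self.fatherMOC reset];
        offset += chunkSize;
    }
}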
Related
I have a Core Data database which houses around 500,000 stamps and 86,000 series. I have to download them from a web API, which uses JSON. Adding the stamps and series into Core Data goes without a problem, but I have trouble making the relationships between the two.
Each stamp has one series and each series can have multiple stamps, as seen in the picture of my data model above.
I need to make the relationship between the two, efficiently and fast. While doing some research I stumbled across this article: https://www.objc.io/issues/4-core-data/importing-large-data-sets-into-core-data/ The piece that I'm most interested in:
A similar problem often arises when establishing relationships between
the newly imported objects. Using a fetch request to get each related
object independently is vastly inefficient. There are two possible
ways out of this: either we resolve relationships in batches similar
to how we imported the objects in the first place, or we cache the
objectIDs of the already-imported objects. Resolving relationships in
batches allows us to greatly reduce the number of fetch requests
required by fetching many related objects at once. Don’t worry about
potentially long predicates like:
[NSPredicate predicateWithFormat:@"identifier IN %@", identifiersOfRelatedObjects];
Resolving a predicate with many identifiers in the IN (...) clause is
always way more efficient than going to disk for each object
independently. However, there is also a way to avoid fetch requests
altogether (at least if you only need to establish relationships
between newly imported objects). If you cache the objectIDs of all
imported objects (which is not a lot of data in most cases really),
you can use them later to retrieve faults for related objects using
objectWithID:.
// after a batch of objects has been imported and saved
for (MyManagedObject *object in importedObjects) {
    objectIDCache[object.identifier] = object.objectID;
}
// ... later during resolving relationships
NSManagedObjectID *objectID = objectIDCache[object.foreignKey];
MyManagedObject *relatedObject = [context objectWithID:objectID];
object.toOneRelation = relatedObject;
Note that this example assumes that the identifier property is unique
across all entity types, otherwise we would have to account for
duplicate identifiers for different types in the way we cache the
object IDs.
But I have no idea what they mean by that; can someone give some more explanation about this? Preferably in Swift, as that is the language I understand best and also the language in which I'm creating my app.
Of course, other suggestions are also fine. Note that moving away from Core Data is not an option anymore.
The task of making a relationship between two objects involves having both objects at hand. Considering that they have already been created in Core Data, you may execute a fetch request with a predicate like
@"countryID == %@", countryObjectData[@"id"]
and you'll get them. But if you need to establish five hundred thousand relationships, you'll have to execute one million fetch requests. It's slow.
Retrieving an NSManagedObject by its NSManagedObjectID is significantly faster than searching by property value. Before you start parsing, you can build a cache of all your Core Data objects by entity, in the form of server key -> objectID pairs.
self.cache = [NSMutableDictionary dictionaryWithCapacity:self.managedObjectModel.entities.count];

NSExpressionDescription *objectIdDescription = [[NSExpressionDescription alloc] init];
objectIdDescription.name = @"objectID";
objectIdDescription.expression = [NSExpression expressionForEvaluatedObject];
objectIdDescription.expressionResultType = NSObjectIDAttributeType;

NSString *key = @"serverID";
for (NSEntityDescription *entity in self.managedObjectModel.entities) {
    NSMutableDictionary *entityCache = [NSMutableDictionary dictionary];
    self.cache[entity.name] = entityCache;

    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:entity.name];
    request.resultType = NSDictionaryResultType;
    request.propertiesToFetch = @[key, objectIdDescription];

    NSArray *result = [self.context executeFetchRequest:request error:nil];
    for (NSDictionary *item in result) {
        id value = item[key];
        NSManagedObjectID *objectID = item[@"objectID"];
        entityCache[value] = objectID;
    }
}
Having that cache you can get your objects like this:
id serverKey = countryObjectData[@"id"];
NSManagedObjectID *objectID = self.cache[@"Country"][serverKey];
Country *country = [self.context objectWithID:objectID];
It's much faster.
When you create new objects while parsing JSON, you need to add their server key and objectID pair to the cache, after obtaining permanent IDs. Delete that pair from the cache when you delete an object.
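A small sketch of that bookkeeping, assuming the serverID key from the code above and an import loop that works on the context's insertedObjects (that structure is an assumption):

NSArray *inserted = self.context.insertedObjects.allObjects;
NSError *error = nil;
// Temporary objectIDs change when the context saves, so ask for permanent IDs first.
[self.context obtainPermanentIDsForObjects:inserted error:&error];
for (NSManagedObject *object in inserted) {
    NSMutableDictionary *entityCache = self.cache[object.entity.name];
    id serverKey = [object valueForKey:@"serverID"];
    if (entityCache != nil && serverKey != nil) {
        entityCache[serverKey] = object.objectID;
    }
}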
I'm implementing a search in an iOS app (targeting iOS 8 and up), and have a set of entities that store search terms for other objects. The search term entities have an abstract parent. Each concrete search term entity has a relationship to the object it relates to (an object can have many search terms). This way I can run a fetch request against the abstract parent search term entity, and through each concrete result get to the object that matches. As an example:
I'm then making a request like this to find matches for a given string:
NSString *text = @"query";
NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"AbstractSearchTerm"];
fetchRequest.predicate = [NSPredicate predicateWithFormat:@"term CONTAINS[cd] %@", text];
NSSortDescriptor *boostSortDescriptor = [NSSortDescriptor sortDescriptorWithKey:@"boost" ascending:NO];
fetchRequest.sortDescriptors = @[boostSortDescriptor];
NSError *error = nil;
[self.managedObjectContext executeFetchRequest:fetchRequest error:&error];
The fetch request is then fed to an NSFetchedResultsController which drives a table view of search results. This works well enough, except that I get multiple results for each Category, Tag, etc. because of the to-many relationship to the search term (I can't use a to-one relationship because some results need to be boosted over others). So I thought to somehow group the results by the actual entity they relate to.
Firstly, if there's a better way to achieve what I'm trying to do without having to get into grouping, then I'm all ears. Perhaps something using subqueries. Otherwise, here's where it gets messy!
In order to group results I obviously need something to group on; I have a unique identifier for each record (e.g. one from my remote API) which I can denormalise to a property on AbstractSearchTerm, so I can use that. I modified my fetch request like so:
NSEntityDescription *entityDescription = [NSEntityDescription entityForName:@"AbstractSearchTerm" inManagedObjectContext:self.managedObjectContext];
NSDictionary *entityProperties = [entityDescription propertiesByName];
NSString *text = @"query";
NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"AbstractSearchTerm"];
// ...
fetchRequest.propertiesToFetch = @[entityProperties[@"uniqueIdentifier"], entityProperties[@"term"], entityProperties[@"boost"]];
fetchRequest.propertiesToGroupBy = @[entityProperties[@"uniqueIdentifier"], entityProperties[@"term"], entityProperties[@"boost"]];
fetchRequest.resultType = NSDictionaryResultType;
NSError *error = nil;
[self.managedObjectContext executeFetchRequest:fetchRequest error:&error];
And I get results, but I don't have the objectID to work back to the actual search object, and thus to the matched relation. So I thought to also retrieve the objectID. There's a way to do that, according to this answer, but when I try it like so:
NSExpressionDescription *objectIdDesc = [[NSExpressionDescription alloc] init];
objectIdDesc.name = @"objectID";
objectIdDesc.expression = [NSExpression expressionForEvaluatedObject];
objectIdDesc.expressionResultType = NSObjectIDAttributeType;
// ...
fetchRequest.propertiesToFetch = @[objectIdDesc, entityProperties[@"uniqueIdentifier"], entityProperties[@"term"], entityProperties[@"boost"]];
fetchRequest.propertiesToGroupBy = @[objectIdDesc, entityProperties[@"uniqueIdentifier"], entityProperties[@"term"], entityProperties[@"boost"]];
I get this error:
Invalid keypath expression ((<NSExpressionDescription: 0x7f9e89454e50>), name objectID, isOptional 1, isTransient 0, entity (null), renamingIdentifier objectID, validation predicates (
), warnings (
), versionHashModifier (null)
userInfo {
}) passed to setPropertiesToFetch:
Which I can't get around. It may be a bug in Core Data, or perhaps that approach isn't valid when the fetch request has a group-by clause. Without a way to get back to a concrete entity instance I'm stuck. Since I'm going to be querying at least tens of thousands of records, I don't want to make too many queries or put a lot of data in memory, so until now I've ruled out storing results in an array and filtering them in code (I'd prefer to keep using NSFetchedResultsController and batched results). If anyone has any advice on where I could go from here I'd really appreciate it.
I assume you want to store searches the user can compose and persist them within your Core Data model. To create entities describing this search information is a valid approach.
However, I think that your idea to then link these search objects with actual data objects is not very good from a design perspective. The data objects should not be burdened with searching functionality (especially not at the data model level). Thus, your search modeling should be independent of your data, i.e. without direct relationships between those two groups of entities.
There is really no advantage having direct relationships. Both constructing and executing a fetch request, and pulling down entities in a relationship have the same persistent store overhead, so it just adds unnecessary complexity.
Instead, have the search-related entities describe the search. I think you can achieve that quite simply without parent and child entities. I would think that all you need is perhaps a to-many relationship for subpredicates, but maybe not even that. All search-relevant information can be encoded in string attributes. The entire search model could be as simple as:
Search <----->> Condition
I think you can achieve all you want (including grouping, counting etc.) with this much simplified setup. Also, you do not have to worry about result types any more; just fetch NSManagedObjects.
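As a rough illustration of that setup (the Condition entity, its keyPath/value string attributes, and the Category entity being searched are hypothetical placeholders, not names from the question):

// Build one compound predicate out of the persisted Condition objects.
NSMutableArray *subpredicates = [NSMutableArray array];
for (Condition *condition in search.conditions) {
    [subpredicates addObject:[NSPredicate predicateWithFormat:@"%K CONTAINS[cd] %@",
                              condition.keyPath, condition.value]];
}
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Category"];
request.predicate = [NSCompoundPredicate andPredicateWithSubpredicates:subpredicates];
NSArray *matches = [self.managedObjectContext executeFetchRequest:request error:NULL];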
You need to add the following to get the objectID:
fetchRequest.resultType = NSManagedObjectIDResultType
I have a Core Data Model with three entities:
Person, Group, Photo with relationships between them as follows:
Person <<-----------> Group (one to many relationship)
Person <-------------> Photo (one to one)
When I perform a fetch using the NSFetchedResultsController in a UITableView, I want to group in sections the Person objects using the Group's entity name attribute.
For that, I use sectionNameKeyPath:@"group.name".
The problem is that when I'm using the attribute from the Group relationship, the NSFetchedResultsController fetches everything upfront in small batches of 20 (I have setFetchBatchSize: 20) instead of fetching batches while I'm scrolling the tableView.
If I use an attribute from the Person entity (like sectionNameKeyPath:@"name") to create sections, everything works OK: the NSFetchedResultsController loads small batches of 20 objects as I scroll.
The code I use to instantiate the NSFetchedResultsController:
- (NSFetchedResultsController *)fetchedResultsController {
    if (_fetchedResultsController) {
        return _fetchedResultsController;
    }

    NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
    NSEntityDescription *entity = [NSEntityDescription entityForName:[Person description]
                                               inManagedObjectContext:self.managedObjectContext];
    [fetchRequest setEntity:entity];

    // Specify how the fetched objects should be sorted
    NSSortDescriptor *groupSortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"group.name"
                                                                        ascending:YES];
    NSSortDescriptor *personSortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"birthName"
                                                                         ascending:YES
                                                                          selector:@selector(localizedStandardCompare:)];
    [fetchRequest setSortDescriptors:[NSArray arrayWithObjects:groupSortDescriptor, personSortDescriptor, nil]];
    [fetchRequest setRelationshipKeyPathsForPrefetching:@[@"group", @"photo"]];
    [fetchRequest setFetchBatchSize:20];

    NSError *error = nil;
    NSArray *fetchedObjects = [self.managedObjectContext executeFetchRequest:fetchRequest error:&error];
    if (fetchedObjects == nil) {
        NSLog(@"Error Fetching: %@", error);
    }

    _fetchedResultsController = [[NSFetchedResultsController alloc] initWithFetchRequest:fetchRequest
                                                                    managedObjectContext:self.managedObjectContext
                                                                      sectionNameKeyPath:@"group.name"
                                                                               cacheName:@"masterCache"];
    _fetchedResultsController.delegate = self;
    return _fetchedResultsController;
}
This is what I get in Instruments if I create sections based on "group.name" without any interaction with the App's UI:
And this is what I get (with a bit of scrolling on UITableView) if sectionNameKeyPath is nil:
Please, can anyone help me out on this issue?
EDIT 1:
It seems that I get inconsistent results from the simulator and Instruments: when I asked this question, the app was starting in the simulator in about 10 seconds (per Time Profiler) using the above code.
But today, using the same code as above, the app starts in the simulator in 900ms, even though it still makes a temporary upfront fetch for all the objects, and it's not blocking the UI.
I've attached some fresh screenshots:
EDIT 2:
I reset the simulator and the results are intriguing: after performing an import operation and quitting the app the first run looked like this:
After a bit of scrolling:
Now this is what happens on a second run:
After the fifth run:
EDIT 3:
Running the app the seventh time and eight time, I get this:
This is your stated objective: "I need the Person objects to be grouped in sections by the relationship entity Group, name attribute and the NSFetchResultsController to perform fetches in small batches as I scroll and not upfront as it is doing now."
The answer is a little complicated, primarily because of how an NSFetchedResultsController builds sections, and how that affects the fetching behavior.
TL;DR; To change this behavior, you would need to change how NSFetchedResultsController builds sections.
What is happening?
When an NSFetchedResultsController is given a fetch request with pagination (fetchLimit and/or fetchBatchSize), several things happen.
If no sectionNameKeyPath is specified, it does exactly what you expect. The fetch returns a proxy array of results, with "real" objects for the first fetchBatchSize number of items. So for example, if you have set fetchBatchSize to 2, and your predicate matches 10 items in the store, the results contain the first two objects. The other objects will be fetched separately as they are accessed. This provides a smooth paginated response experience.
However, when a sectionNameKeyPath is specified, the fetched results controller has to do a bit more. To compute the sections it needs to access that key path on all the objects in the results. It enumerates the 10 items in the results in our example. The first two have already been fetched. The other 8 will be fetched during enumeration to get the key path value needed to build the section information. If you have a lot of results for your fetch request, this can be very inefficient. There are a number of public bugs concerning this functionality:
NSFetchedResultsController initially takes too long to set up sections
NSFetchedResultsController ignores fetchLimit property
NSFetchedResultsController, Table Index, and Batched Fetch Performance Issue
... And several others. When you think about it, this makes sense. To build the NSFetchedResultsSectionInfo objects requires the fetched results controller to see every value in the results for the sectionNameKeyPath, aggregate them to the unique union of values, and use that information to create the correct number of NSFetchedResultsSectionInfo objects, set the name and index title, know how many objects in the results a section contains, etc. To handle the general use case there is no way around this. With that in mind, your Instruments traces may make a lot more sense.
How can you change this?
You can attempt to build your own NSFetchedResultsController that provides an alternative strategy for building NSFetchedResultsSectionInfo objects, but you may run into some of the same problems. For example, if you are using the existing fetchedObjects functionality to access members of the fetch results, you will encounter the same behavior when accessing objects that are faults. Your implementation would need a strategy for dealing with this (it's doable, but very dependent on your needs and requirements).
Oh god no. What about some kind of temporary hack that just makes it perform a little better but doesn't fix the problem?
Altering your data model will not change the above behavior, but can change the performance impact slightly. Batch updates will not have any significant effect on this behavior, and in fact will not play nicely with a fetched results controller. It may be much more useful to you, however, to instead set the relationshipKeyPathsForPrefetching to include your "group" relationship, which may improve the fetching and faulting behavior significantly. Another strategy may be to perform another fetch to batch fault these objects before you attempt to use the fetched results controller, which will populate the various levels of Core Data in-memory caches in a more efficient manner.
The NSFetchedResultsController cache is primarily for section information. This prevents the sections from having to be completely recalculated on each change (in the best case), but can actually make the initial fetch to build the sections take much longer. You will have to experiment to see if the cache is worthwhile for your use case.
If your primary concern is that these Core Data operations are blocking user interaction, you can offload them from the main thread. NSFetchedResultsController can be used on a private queue (background) context, which will prevent Core Data operations from blocking the UI.
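As an illustration of the warm-up / batch-fault idea mentioned above, a small sketch assuming the question's Person/Group model (the exact requests are an assumption, not a recipe):

// Warm up Core Data's caches by batch-faulting the Group objects up front,
// so building the sections does not fault each group individually.
NSFetchRequest *groupWarmUp = [NSFetchRequest fetchRequestWithEntityName:@"Group"];
groupWarmUp.returnsObjectsAsFaults = NO;
[self.managedObjectContext executeFetchRequest:groupWarmUp error:NULL];

// The main request keeps prefetching the relationship, as in the question.
[fetchRequest setRelationshipKeyPathsForPrefetching:@[@"group"]];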
Based on my experience, a way to achieve your goal is to denormalize your model. In particular, you could add a group attribute to your Person entity and use that attribute as the sectionNameKeyPath. So, when you create a Person you should also pass along the group it belongs to.
This denormalization is valid since it allows you to avoid fetching the related Group objects, which are no longer necessary. A downside is that if you change the name of a group, all the persons associated with that name must be updated as well; otherwise you end up with incorrect values.
The key aspect here is the following: keep in mind that Core Data is not a relational database. The model should not be designed like a database schema, where normalization would take place, but from the perspective of how the data is presented and used in the user interface.
Edit 1
I cannot understand your comment, could you explain better?
What I've found very intriguing though is that even if the app is
performing a full upfront fetch in the simulator, the app loads in
900ms (with 5000 objects) on the device despite the simulator where it
loads much slower.
Anyway, I would be interested in knowing details about your Photo entity. If you pre-fetch Photo objects, the overall execution time could be affected.
Do you need to pre-fetch a Photo within your table view? Are they thumbnails (small photos) or normal images? Do you take advantage of the External Storage flag?
Adding an additional attribute (say group) to the Person entity should not be a problem. Updating the value of that attribute when the name of a Group object changes is not a problem either if you perform it in the background. In addition, starting from iOS 8 you have batch updates available, as described in Core Data Batch Updates.
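A sketch of such a batch update, assuming the Person entity gained a denormalized string attribute named groupName (that attribute name and the context/newGroupName variables are placeholders):

NSBatchUpdateRequest *update = [[NSBatchUpdateRequest alloc] initWithEntityName:@"Person"];
update.predicate = [NSPredicate predicateWithFormat:@"group == %@", group];
update.propertiesToUpdate = @{ @"groupName" : newGroupName };
update.resultType = NSUpdatedObjectIDsResultType;

NSError *error = nil;
NSBatchUpdateResult *result =
    (NSBatchUpdateResult *)[context executeRequest:update error:&error];

// Batch updates write straight to the store, so refresh the in-memory copies.
for (NSManagedObjectID *objectID in result.result) {
    NSManagedObject *person = [context objectWithID:objectID];
    [context refreshObject:person mergeChanges:NO];
}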
Almost a year after posting this question, I finally found the culprits behind this behaviour (which changed slightly in Xcode 6):
Regarding the inconsistent fetch times: I was using a cache, and at the time I was going back and forth opening, closing and resetting the simulator.
Regarding the fact that everything was fetched upfront in small batches without scrolling (in Xcode 6's Core Data Instruments that's not the case anymore; now it's one big fetch which takes entire seconds):
It seems that setFetchBatchSize does not work correctly with parent/child contexts. The issue was reported back in 2012 and it seems that it's still there: http://openradar.appspot.com/11235622.
To overcome this issue, I created another independent context with NSMainQueueConcurrencyType and set its persistent store coordinator to the same one that my other contexts are using.
More about issue #2 here: https://stackoverflow.com/a/11470560/1641848
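In code, the workaround looks roughly like this (the persistentStoreCoordinator property name is an assumption about how your stack exposes it):

// A main-queue context that talks directly to the persistent store coordinator
// instead of going through a parent context.
NSManagedObjectContext *uiContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
uiContext.persistentStoreCoordinator = self.persistentStoreCoordinator;
// Hand this context to the NSFetchedResultsController so setFetchBatchSize behaves as expected.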
I've asked this question before, but I'm opening a new one because I have some new insights now. First of all, this is what my Core Data model looks like.
When I fetch my first appointments into my model, everything works okay. But the problem comes when I load new appointments: the previous appointments' location relationship goes to NULL. The strange thing is that the location relationship only works for the appointments that were loaded in last.
I'm using RestKit for mapping my JSON into my Core Data model, and this is how I made the relationship:
[locationMapping addPropertyMapping:[RKRelationshipMapping relationshipMappingFromKeyPath:@"appointments" toKeyPath:@"appointments" withMapping:appointmentMapping]];
Can anybody help me with this problem?
First of all, your model is horrible (no offense). You should create LabelData, Data and VerplichtData entities. These should have to-one relationships to Location / Appointment. Location and Appointment should have to-many relationships to LabelData, Data and VerplichtData.
You should probably follow Mundi's advice and not use RestKit; it will probably make debugging a lot easier. Apple has a pretty decent strategy for importing data in a smart way (i.e. fast and without duplication). Here is a copy-paste from the docs in case the link dies:
Implementing Find-or-Create Efficiently
A common technique when importing data is to follow a "find-or-create" pattern, where you set up some data from which to create a managed object, determine whether the managed object already exists, and create it if it does not.
There are many situations where you may need to find existing objects (objects already saved in a store) for a set of discrete input values. A simple solution is to create a loop, then for each value in turn execute a fetch to determine whether there is a matching persisted object and so on. This pattern does not scale well. If you profile your application with this pattern, you typically find the fetch to be one of the more expensive operations in the loop (compared to just iterating over a collection of items). Even worse, this pattern turns an O(n) problem into an O(n^2) problem.
It is much more efficient—when possible—to create all the managed objects in a single pass, and then fix up any relationships in a second pass. For example, if you import data that you know does not contain any duplicates (say because your initial data set is empty), you can just create managed objects to represent your data and not do any searches at all. Or if you import "flat" data with no relationships, you can create managed objects for the entire set and weed out (delete) any duplicates before save using a single large IN predicate.
If you do need to follow a find-or-create pattern—say because you're importing heterogeneous data where relationship information is mixed in with attribute information—you can optimize how you find existing objects by reducing to a minimum the number of fetches you execute. How to accomplish this depends on the amount of reference data you have to work with. If you are importing 100 potential new objects, and only have 2000 in your database, fetching all of the existing and caching them may not represent a significant penalty (especially if you have to perform the operation more than once). However, if you have 100,000 items in your database, the memory pressure of keeping those cached may be prohibitive.
You can use a combination of an IN predicate and sorting to reduce your use of Core Data to a single fetch request. Suppose, for example, you want to take a list of employee IDs (as strings) and create Employee records for all those not already in the database. Consider this code, where Employee is an entity with a name attribute, and listOfIDsAsString is the list of IDs for which you want to add objects if they do not already exist in a store.
First, separate and sort the IDs (strings) of interest.
// get the names to parse in sorted order
NSArray *employeeIDs = [[listOfIDsAsString componentsSeparatedByString:@"\n"]
                        sortedArrayUsingSelector:@selector(compare:)];
Next, create a predicate using IN with the array of name strings, and a sort descriptor which ensures the results are returned with the same sorting as the array of name strings. (The IN is equivalent to an SQL IN operation, where the left-hand side must appear in the collection specified by the right-hand side.)
// Create the fetch request to get all Employees matching the IDs.
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
[fetchRequest setEntity:
    [NSEntityDescription entityForName:@"Employee" inManagedObjectContext:aMOC]];
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:@"(employeeID IN %@)", employeeIDs]];
// make sure the results are sorted as well
[fetchRequest setSortDescriptors:
    @[[[NSSortDescriptor alloc] initWithKey:@"employeeID" ascending:YES]]];
Finally, execute the fetch.
NSError *error;
NSArray *employeesMatchingNames = [aMOC executeFetchRequest:fetchRequest error:&error];
You end up with two sorted arrays—one with the employee IDs passed into the fetch request, and one with the managed objects that matched them. To process them, you walk the sorted lists following these steps:
Get the next ID and Employee. If the ID doesn't match the Employee ID, create a new Employee for that ID.
Get the next Employee: if the IDs match, move to the next ID and Employee.
Regardless of how many IDs you pass in, you only execute a single fetch, and the rest is just walking the result set.
The listing below shows the complete code for the example in the previous section.
// Get the names to parse in sorted order.
NSArray *employeeIDs = [[listOfIDsAsString componentsSeparatedByString:@"\n"]
                        sortedArrayUsingSelector:@selector(compare:)];

// create the fetch request to get all Employees matching the IDs
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
[fetchRequest setEntity:
    [NSEntityDescription entityForName:@"Employee" inManagedObjectContext:aMOC]];
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:@"(employeeID IN %@)", employeeIDs]];

// Make sure the results are sorted as well.
[fetchRequest setSortDescriptors:
    @[[[NSSortDescriptor alloc] initWithKey:@"employeeID" ascending:YES]]];

// Execute the fetch.
NSError *error;
NSArray *employeesMatchingNames = [aMOC executeFetchRequest:fetchRequest error:&error];
According to Apple's documentation (link):
There are many situations where you may need to find existing objects
(objects already saved in a store) for a set of discrete input values.
A simple solution is to create a loop, then for each value in turn
execute a fetch to determine whether there is a matching persisted
object and so on. This pattern does not scale well. If you profile
your application with this pattern, you typically find the fetch to be
one of the more expensive operations in the loop (compared to just
iterating over a collection of items). Even worse, this pattern turns
an O(n) problem into an O(n^2) problem.
It is much more efficient—when possible—to create all the managed
objects in a single pass, and then fix up any relationships in a
second pass. For example, if you import data that you know does not
contain any duplicates (say because your initial data set is empty),
you can just create managed objects to represent your data and not do
any searches at all. Or if you import "flat" data with no
relationships, you can create managed objects for the entire set and
weed out (delete) any duplicates before save using a single large IN
predicate.
Question 1: Considering that the data I'm importing doesn't have any relationships, how do I implement what is described in the last line?
If you do need to follow a find-or-create pattern—say because you're
importing heterogeneous data where relationship information is mixed
in with attribute information—you can optimize how you find existing
objects by reducing to a minimum the number of fetches you execute.
How to accomplish this depends on the amount of reference data you
have to work with. If you are importing 100 potential new objects, and
only have 2000 in your database, fetching all of the existing and
caching them may not represent a significant penalty (especially if
you have to perform the operation more than once). However, if you
have 100,000 items in your database, the memory pressure of keeping
those cached may be prohibitive.
You can use a combination of an IN predicate and sorting to reduce
your use of Core Data to a single fetch request.
Example code:
// Get the names to parse in sorted order.
NSArray *employeeIDs = [[listOfIDsAsString componentsSeparatedByString:@"\n"]
                        sortedArrayUsingSelector:@selector(compare:)];

// create the fetch request to get all Employees matching the IDs
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
[fetchRequest setEntity:
    [NSEntityDescription entityForName:@"Employee" inManagedObjectContext:aMOC]];
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:@"(employeeID IN %@)", employeeIDs]];

// Make sure the results are sorted as well.
[fetchRequest setSortDescriptors:
    @[[[NSSortDescriptor alloc] initWithKey:@"employeeID" ascending:YES]]];
// Execute the fetch.
NSError *error;
NSArray *employeesMatchingNames = [aMOC executeFetchRequest:fetchRequest error:&error];
You end up with two sorted arrays—one with the employee IDs passed
into the fetch request, and one with the managed objects that matched
them. To process them, you walk the sorted lists following these
steps:
Get the next ID and Employee. If the ID doesn't match the Employee ID,
create a new Employee for that ID. Get the next Employee: if the IDs
match, move to the next ID and Employee.
Question 2: In the above example, I get two sorted arrays as described above. Considering the worst-case scenario where all the objects to be inserted are already present in the store, I don't see any way to solve the problem in O(n) time. Apple describes the two steps above, but that looks like an O(n^2) job: for any kth element in the input array, there might or might not exist an element that matches it in the first k elements in the output array. So in the worst case, the complexity would be O(nC2) = O(n^2).
So, what I believe Apple is doing is making sure the fetch only executes once even though O(n^2) checks are required. If so, then I'll go with this; but is there any other way of doing this efficiently?
Please understand that I don't want to fetch again and again; I want to fetch once for an input array of 100 identifiers.
Ad. 1 Whether you have relationships isn't important here. This explanation only says that if you download your data from e.g. a remote server and your items have some IDs, then you can fetch them all from the persistent store in one request, instead of fetching each object in a separate request.
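A small sketch of that single-fetch idea (the Item entity, identifier attribute and incomingObjects array are placeholders, not names from the question):

NSArray *incomingIDs = [incomingObjects valueForKey:@"identifier"];

// One fetch for the whole input set instead of one fetch per object.
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
request.predicate = [NSPredicate predicateWithFormat:@"identifier IN %@", incomingIDs];
NSArray *existing = [context executeFetchRequest:request error:NULL];
NSSet *existingIDs = [NSSet setWithArray:[existing valueForKey:@"identifier"]];

for (NSDictionary *json in incomingObjects) {
    if ([existingIDs containsObject:json[@"identifier"]]) {
        continue; // already in the store; skip (or update) it
    }
    Item *item = [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                               inManagedObjectContext:context];
    item.identifier = json[@"identifier"];
}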
Ad. 2
Apple describes the two steps as above but that is an O(n^2) job.
It's not. Please read these lines carefully:
To process them, you walk the sorted lists following these steps:
Get the next ID and Employee. If the ID doesn't match the Employee ID,
create a new Employee for that ID. Get the next Employee: if the IDs
match, move to the next ID and Employee.
You walk the arrays/lists simultaneously, so you never have to make this check: "there might or might not exist an element that matches it in the first k elements in the output array." You don't need to check previous elements as they're sorted and they certainly won't contain the object you're interested in.
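A sketch of that simultaneous walk, using the names from Apple's listing above (the employeeID property on Employee is implied by that listing):

NSUInteger fetchedIndex = 0;
for (NSString *employeeID in employeeIDs) {
    Employee *candidate = fetchedIndex < employeesMatchingNames.count
        ? employeesMatchingNames[fetchedIndex]
        : nil;
    if (candidate != nil && [candidate.employeeID isEqualToString:employeeID]) {
        // Already in the store; advance in both lists.
        fetchedIndex++;
    } else {
        // Missing from the store; create it. Only the ID list advances.
        Employee *newEmployee =
            [NSEntityDescription insertNewObjectForEntityForName:@"Employee"
                                          inManagedObjectContext:aMOC];
        newEmployee.employeeID = employeeID;
    }
}
// Both lists are visited front to back exactly once, so the walk is O(n).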
If anyone is looking for the original Apple documentation, there is a snapshot here:
http://web.archive.org/web/20150908024050/https://developer.apple.com/library/mac/documentation/cocoa/conceptual/coredata/articles/cdimporting.html