Checking for duplicates when importing to CoreData

I'm importing data into a Core Data store using RestKit and need to check for duplicates. If the item is already in the store, I'd like to update it with the latest attributes. If it's a new item, I'd like to create it.
The import was slow, so I profiled it with Instruments and saw that the longest part of the import was checking whether the item already exists (with a fetch request).
So I'd like to know which is faster for checking whether the item is already in the store:
use countForFetchRequest to see if the item already exists, then executeFetchRequest to return the item to update, or
just executeFetchRequest to get the item to update?
Or is there a better way to do this?
I thought countForFetchRequest would be faster, since the entire NSManagedObject isn't returned, and I would only execute the fetch request when I know there is going to be an NSManagedObject.
Thanks
- (Product *)productWithId:(int)productID {
    NSManagedObjectContext *context = [Model sharedInstance].managedObjectContext;
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"product_id == %d", productID];
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    request.entity = [NSEntityDescription entityForName:@"Product" inManagedObjectContext:context];
    request.predicate = predicate;
    request.fetchLimit = 1;
    NSError *error = nil;
    NSUInteger count = [context countForFetchRequest:request error:&error];
    if (!error && count == 1) {
        NSArray *results = [context executeFetchRequest:request error:&error];
        if (!error && [results count]) {
            return [results objectAtIndex:0];
        }
        return nil;
    }
    return nil;
}

As far as I know, the best way to find and/or import objects within Core Data is described in Implementing Find-or-Create Efficiently.
The documentation describes a find-or-create pattern that is based on sorting two sets of data: the data you download from the service and the data you fetch from the store.
I strongly suggest you read the link I provided. You should see a speed-up in your import.
Obviously you should do the work in the background so the main thread does not freeze, using thread confinement or the newer iOS Core Data queue API.
Hope that helps.
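As a rough illustration of the sorted find-or-create idea, here is a minimal sketch. It is not the guide's own code: the helper name FindOrCreateProducts is hypothetical, and it assumes the Product entity with an integer product_id attribute from the question. It does one fetch for all incoming IDs and then walks the two sorted lists together, instead of fetching once per item.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper (not from the guide): returns one Product per distinct
// incoming ID, in ascending ID order, creating the ones that are missing.
// Assumes an entity "Product" with an integer attribute "product_id".
static NSArray<NSManagedObject *> *FindOrCreateProducts(NSArray<NSNumber *> *incomingIDs,
                                                        NSManagedObjectContext *context,
                                                        NSError **error) {
    // 1. De-duplicate and sort the incoming IDs.
    NSArray<NSNumber *> *sortedIDs =
        [[[NSSet setWithArray:incomingIDs] allObjects] sortedArrayUsingSelector:@selector(compare:)];

    // 2. One fetch for every stored match, sorted the same way.
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Product"];
    request.predicate = [NSPredicate predicateWithFormat:@"product_id IN %@", sortedIDs];
    request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"product_id" ascending:YES]];
    NSArray<NSManagedObject *> *existing = [context executeFetchRequest:request error:error];
    if (existing == nil) {
        return nil; // the fetch failed
    }

    // 3. Walk both sorted lists in step.
    NSMutableArray<NSManagedObject *> *products = [NSMutableArray arrayWithCapacity:sortedIDs.count];
    NSUInteger cursor = 0;
    for (NSNumber *productID in sortedIDs) {
        NSManagedObject *product = nil;
        if (cursor < existing.count &&
            [[existing[cursor] valueForKey:@"product_id"] isEqualToNumber:productID]) {
            product = existing[cursor]; // already stored: reuse it
            // Move past this match and any stored duplicates of the same ID.
            while (cursor < existing.count &&
                   [[existing[cursor] valueForKey:@"product_id"] isEqualToNumber:productID]) {
                cursor++;
            }
        } else {
            product = [NSEntityDescription insertNewObjectForEntityForName:@"Product"
                                                    inManagedObjectContext:context];
            [product setValue:productID forKey:@"product_id"];
        }
        [products addObject:product];
    }
    return products;
}
```

The caller would then copy the latest attributes onto each returned object and save the context once at the end of the import.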

Related

NSBatchDeleteRequest does not delete relationship

I have a problem with NSBatchDeleteRequest: it seems that it is not possible to delete relationship references.
I have two entities:
News
Categories
where a category can have multiple news.
When I try to delete all the objects in Core Data using NSBatchDeleteRequest with the following code and then look into the SQLite file, it seems that all categories and all news are deleted, but the relationship between categories and news persists, and this causes faults.
Here is the delete function:
NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:entityName];
NSBatchDeleteRequest *delete = [[NSBatchDeleteRequest alloc] initWithFetchRequest:fetchRequest];
[delete setResultType:NSBatchDeleteResultTypeCount];
NSError *error;
NSBatchDeleteResult *results = [deleteContext executeRequest:delete error:&error];
Any idea on how to fix this?

You can probably call [managedObjectContext reset];

Set shouldDeleteInaccessibleFaults: to YES and inaccessible/unfulfillable faults will be deleted. This solves the immediate problem.
The WWDC 2015 session What's New in Core Data talks about this a little bit. Both NSBatchDeleteRequest and NSBatchUpdateRequest modify the persistent store without the participation of the NSManagedObjectContext - which will result in the context's view of the data being inconsistent with the store.
The in-memory copy of the deleted object needs to be updated in the NSManagedObjectContext - have the batch delete request return the object IDs of the deleted objects and tell the NSManagedObjectContext to refresh those IDs.
This would look something like this:
[managedObjectContext performBlock:^{
    NSBatchDeleteRequest *batchDeleteRequest = [[NSBatchDeleteRequest alloc] initWithFetchRequest:fetchRequest];
    // Ask for the deleted object IDs; the default result type does not return them.
    batchDeleteRequest.resultType = NSBatchDeleteResultTypeObjectIDs;
    NSError *error = nil;
    NSBatchDeleteResult *result = [managedObjectContext executeRequest:batchDeleteRequest error:&error];
    if ([[result result] count] > 0) {
        [managedObjectContext performBlock:^{
            NSArray<NSManagedObjectID *> *objectIDs = (NSArray<NSManagedObjectID *> *)[result result];
            [objectIDs enumerateObjectsUsingBlock:^(NSManagedObjectID *objID, NSUInteger idx, BOOL *stop) {
                NSError *error = nil;
                NSManagedObject *obj = [managedObjectContext existingObjectWithID:objID error:&error];
                // existingObjectWithID:error: can return nil, so check before refreshing.
                if (obj && ![obj isFault]) {
                    [managedObjectContext refreshObject:obj mergeChanges:YES];
                }
            }];
        }];
    }
}];
When the batch delete runs, relationships will be deleted or nullified, but a cascading set of delete rules may not be executed, and validation rules will not be executed - it is up to your application to ensure data integrity when using any of the batch change requests.
Your data model may require you to issue multiple delete requests to prevent related objects from being orphaned but still findable. For example, you may need a second batch delete to locate previously related entities that now have empty relationships. The predicate for such a request may look like:
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"toMany.@count == 0"];
Or may use a subquery, etc.
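For comparison, here is a minimal sketch of another way to bring a context back in line with the store after a batch delete: request the deleted object IDs and hand them to mergeChangesFromRemoteContextSave:intoContexts: (available from iOS 9). The helper name BatchDeleteAllObjects is hypothetical, and the sketch assumes a SQLite-backed store and that it is called on the context's own queue.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: batch-deletes every object of `entityName`, then tells
// `context` which object IDs disappeared from the store.
// Call it on the context's queue, for example inside performBlockAndWait:.
static BOOL BatchDeleteAllObjects(NSString *entityName,
                                  NSManagedObjectContext *context,
                                  NSError **error) {
    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:entityName];
    NSBatchDeleteRequest *deleteRequest =
        [[NSBatchDeleteRequest alloc] initWithFetchRequest:fetchRequest];
    // Ask for object IDs rather than a count so the context can be updated.
    deleteRequest.resultType = NSBatchDeleteResultTypeObjectIDs;

    NSBatchDeleteResult *result =
        (NSBatchDeleteResult *)[context executeRequest:deleteRequest error:error];
    if (result == nil) {
        return NO; // the request failed
    }

    NSArray<NSManagedObjectID *> *deletedIDs = result.result;
    if (deletedIDs.count > 0) {
        // Merge the deletions into the in-memory context(s).
        [NSManagedObjectContext mergeChangesFromRemoteContextSave:@{NSDeletedObjectsKey : deletedIDs}
                                                     intoContexts:@[context]];
    }
    return YES;
}
```

Calling it once per entity (News, then Categories) deletes the rows; as noted above, delete rules and validation are still your responsibility.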

I think the best solution might be to first delete the categories in the object graph, thus nullifying all relationships.
After that you could proceed with the NSBatchDeleteRequest for the news items.

Finding duplicate values in core data

I'm inserting new objects into the database with Core Data. Is there any way to check whether a duplicate already exists in the database before I insert the values?
AccountDetails *newEntry = [NSEntityDescription insertNewObjectForEntityForName:@"AccountDetails" inManagedObjectContext:self.managedObjectContext];
newEntry.acc_date = date;
newEntry.bank_id = bank_id1;
NSError *error;
if (![self.managedObjectContext save:&error]) {
    NSLog(@"Whoops, couldn't save: %@", [error localizedDescription]);
}
[self.view endEditing:YES];
Every time I run the app, it inserts the values again. I want to check whether the category is already in the store and add it only if it isn't.
Thanks in advance.

You can fetch or you can count. Counting is much faster than fetching. Depends on what you are trying to do.
If you just want to insert new objects and skip duplicates, use -[NSManagedObjectContext countForFetchRequest:error:] to determine whether the object exists.
You can pre-build the predicate and just replace the unique value on each loop so that even the cost of the predicate is low. This is fairly performant but not the best solution because it hits the disk on each loop.
Another option would be to change the fetch to have:
Just the unique value
A NSDictionary result
Then collect all of the unique values from the objects you want to insert into an array (of strings, for example) and do a single fetch with:
[fetchRequest setPredicate:[NSPredicate predicateWithFormat:@"myUnique in %@", uniqueIDArray]];
Then you have an array of the unique values that ARE already in the store. From there, as you loop over your objects, you check against that array: if the unique value is in there you skip the object, otherwise you insert it. That will yield the best performance for a straight insert-or-skip requirement.
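The single-fetch variant could be sketched roughly as follows. The helper name ExistingUniqueValues and its parameters are hypothetical; the sketch only assumes that the entity has an attribute holding the unique value. Note that a dictionary-result fetch reflects what has been saved to the store, not unsaved changes in the context.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: returns the subset of `candidateValues` that is already
// stored in attribute `key` of `entityName`, using one fetch.
static NSSet *ExistingUniqueValues(NSArray *candidateValues,
                                   NSString *entityName,
                                   NSString *key,
                                   NSManagedObjectContext *context,
                                   NSError **error) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:entityName];
    request.predicate = [NSPredicate predicateWithFormat:@"%K IN %@", key, candidateValues];
    request.resultType = NSDictionaryResultType; // dictionaries instead of managed objects
    request.propertiesToFetch = @[key];          // only the unique value

    NSArray<NSDictionary *> *rows = [context executeFetchRequest:request error:error];
    if (rows == nil) {
        return nil; // the fetch failed
    }
    // Each row looks like @{key : value}; collect the values into a set
    // so that membership checks in the insert loop are cheap.
    return [NSSet setWithArray:[rows valueForKey:key]];
}
```

In the insert loop you would then skip any object whose unique value satisfies [existing containsObject:value] and insert the rest, saving once at the end.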

You need to fetch from the database and check. Your code will look something like this helper method, which I use frequently in my own code. If more than one object matches, a duplicate has been found and an error is returned:
- (NSManagedObject*) findOrCreateObjectByValue:(id)value
                                  propertyName:(NSString*)propertyName
                                    entityName:(NSString*)entityName
                                additionalInfo:(NSDictionary*)additionalInfo
                                       context:(NSManagedObjectContext*)context
                                         error:(NSError* __autoreleasing*)error
{
    NSManagedObject* res = nil;
    NSFetchRequest* r = [NSFetchRequest fetchRequestWithEntityName:entityName];
    [r setPredicate:[NSPredicate predicateWithFormat:@"%K == %@", propertyName, value]];
    NSArray* matched = [context executeFetchRequest:r
                                              error:error];
    if (matched) {
        if ([matched count] < 2) {
            res = [matched lastObject];
            if (!res) { // No existing objects found, create one
                res = [NSEntityDescription insertNewObjectForEntityForName:entityName
                                                    inManagedObjectContext:context];
                [res setValue:value
                       forKey:propertyName];
            }
        } else {
            if (error) {
                *error = [NSError errorWithDomain:@"some_domain"
                                             code:9999
                                         userInfo:@{@"description" : @"duplicates found"}];
            }
        }
    }
    return res;
}

Updating Core Data Properties WITHOUT using Table Views

I have a POS-type app that uses Core Data to store daily sales transactions using table views. I am attempting to retrieve and update certain Core Data properties, like daily sales counts, WITHOUT using table views. Table views use the row at an index path to point to the correct object (row). I am using a fetched results controller with a predicate to retrieve the fetched object (row). Question: How do I obtain the index of the fetched row so that I can retrieve and then update the correct property values? All books and examples use table views to change properties.
Entity Product
Product *product;
______________________________
[self setupFetchedResultsController]; (This returns one object)
product = [NSFetchedResultsController objectAtIndexPath:[NSIndexPath indexPathForRow:0 inSection:0]]; (objectAtIndexPath - Errors of course)

I don't think you should use NSFetchedResultsController in this case. If you don't want to use it with either a UITableView or a UICollectionView, you're probably better off with an NSFetchRequest, which is easy to set up:
NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"Entity"];
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"someValue=1"];
fetchRequest.predicate = predicate;
NSError *error = nil;
NSArray *array = [self.managedObjectContext executeFetchRequest:fetchRequest error:&error];
Now you have an NSArray with all the results, which you can use without having to deal with index paths.
If you're still using an NSFetchedResultsController for a table (I'm not sure if you are), those rows will still be updated whenever you make a change.
Update: Updating one of the objects returned by the fetch could be done like this:
Entity *entity = [array firstObject];
[entity setSomeProperty:@"CoreDataIsAwesome"];
NSError *error = nil;
if ([self.managedObjectContext save:&error]) {
    NSLog(@"Entity updated!");
} else {
    NSLog(@"Something went wrong: %@", error);
}

You can use the method indexPathForObject: on your fetched results controller to get the index path of a given object and then do your updates.

CoreData updating an NSManagedObject more than once saves several copies?

I have 90 CoreData entities called "ItemModel" with 2 attributes, 'uid' and 'description', where each item is inserted as an NSManagedObject:
NSManagedObject *object = [NSEntityDescription insertNewObjectForEntityForName:@"ItemModel" inManagedObjectContext:AFYDelegate.managedObjectContext];
The first server call assigns the 'uid' to each of the 90 items inserted above for key "uid". The context is not saved here.
On a later second server call, I'd like to update 'description' for each of the 90 NSManagedObjects using indexPath, by fetching each object, passing it to the following method, and saving the context:
[self updateItemToDataModel:object withData: description];
....
....
- (void)updateItemToDataModel:(NSManagedObject *)object withData:(NSString *)data
{
    [object setValue:data forKey:@"description"];
    NSError *error = nil;
    if (![self.managedObjectContext save:&error]) {
        // Handle any error with the saving of the context
        NSLog(@"%@", error.localizedDescription);
    }
}
The above works fine in updating CoreData BUT after closing the Simulator and running the code again, there will be two duplicates for each item with the same 'uid' and 'description'. This means I have 180 items now. Repeatedly closing and running the code creates more and more items.
I tried removing updateItemToDataModel method, resetting the Simulator and it works fine with 90 items.
I'm new to CoreData if someone can help. What's wrong with my code if I only wished to update existing items?

You are inserting a new object into the MOC (managed object context) each time--instead of doing a fetch and finding an existing instance of the object you wish to update.
To fetch the existing object you might execute a fetch request like so...
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"uid == %@", uidToMatch];
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
[fetchRequest setPredicate:predicate];
[fetchRequest setEntity:[NSEntityDescription entityForName:@"ItemModel" inManagedObjectContext:managedObjectContext]];
NSError *error = nil;
NSArray *results = [managedObjectContext executeFetchRequest:fetchRequest error:&error];
if ([results count]) {
    // you may need to handle more than one match in your code...
    // you could also set a fetch limit of 1 and guarantee you only get the first object, eg: [fetchRequest setFetchLimit:1];
}
else {
    // no results
}
You might want to wrap that in a helper function so you can re-use it. And read up on NSFetchRequest, NSPredicate and writing predicates in order to do fancier fetch requests.

Any way to optimize simple NSFetchRequest for single object?

I'm doing data processing in a child MOC on a background queue. I need to query the database by ID so that I can tell updating an existing object apart from creating a new one. I found that most of the time (the total processing time is about 2s for 50 items) is consumed by executeFetchRequest:error:. The NSPredicate is of the simplest form, matching only a single ID attribute (the ID attribute is already indexed), and the NSFetchRequest should return one object or none (ID is unique). Is there any way to optimize this kind of NSFetchRequest?
Here is my current code:
+ (User *)userWithID:(NSNumber *)ID inManagedObjectContext:(NSManagedObjectContext *)context {
    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"User"];
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"ID == %@", ID];
    [fetchRequest setPredicate:predicate];
    [fetchRequest setFetchBatchSize:1];
    NSError *error = nil;
    NSArray *users = [context executeFetchRequest:fetchRequest error:&error];
    if (error) {
        abort();
    }
    if ([users count] == 1) {
        return [users objectAtIndex:0];
    } else if ([users count] > 1) {
        // Sanity check.
        …
    } else {
        return nil;
    }
}

As @ChrisH pointed out in comments under the question, doing a fetch for every ID is no good. So I changed my processing flow to this:
Enumerate data the first time to extract IDs.
Do a single fetch to fetch all existing users matching IDs and put them in a dictionary keyed by ID(named as existingUsers).
Enumerate data the second time to do the real processing: in each iteration, either update one existing user found in existingUsers or create a new user, add it into existingUsers if it is new.
The code is almost doubled, but so is the performance. Really good tradeoff!
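The flow described above might look roughly like the sketch below. The helper name ImportUsers, the shape of the incoming records (dictionaries with the keys ID and name), and the name attribute are assumptions made for illustration; only the User entity and its unique ID attribute come from the question.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper. Assumes an entity "User" with a unique "ID" attribute
// (from the question) and a "name" attribute (invented for this sketch).
// `records` is an array of dictionaries with the keys @"ID" and @"name".
static BOOL ImportUsers(NSArray<NSDictionary *> *records,
                        NSManagedObjectContext *context,
                        NSError **error) {
    // Pass 1: extract the incoming IDs.
    NSArray *incomingIDs = [records valueForKey:@"ID"];

    // Single fetch: every stored user whose ID appears in the incoming data.
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"User"];
    request.predicate = [NSPredicate predicateWithFormat:@"%K IN %@", @"ID", incomingIDs];
    NSArray<NSManagedObject *> *matches = [context executeFetchRequest:request error:error];
    if (matches == nil) {
        return NO; // the fetch failed
    }

    // Index the matches by ID for constant-time lookups.
    NSMutableDictionary<NSNumber *, NSManagedObject *> *existingUsers =
        [NSMutableDictionary dictionaryWithCapacity:matches.count];
    for (NSManagedObject *user in matches) {
        existingUsers[[user valueForKey:@"ID"]] = user;
    }

    // Pass 2: update an existing user or create a new one.
    for (NSDictionary *record in records) {
        NSNumber *userID = record[@"ID"];
        NSManagedObject *user = existingUsers[userID];
        if (user == nil) {
            user = [NSEntityDescription insertNewObjectForEntityForName:@"User"
                                                 inManagedObjectContext:context];
            [user setValue:userID forKey:@"ID"];
            existingUsers[userID] = user; // later records with the same ID reuse it
        }
        [user setValue:record[@"name"] forKey:@"name"];
    }
    return YES;
}
```

Restricting the fetch with IN keeps the dictionary small when the store holds many more users than the incoming batch.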

To expand on my comment to the original question, it's not efficient to repeatedly perform fetch requests with Core Data when importing data.
The simplest approach, as @an0 indicated, is to perform one fetch of all the existing objects you will be checking against, and then construct an NSDictionary containing the objects, keyed by the attribute you will be checking. So sticking with the original User and userID example:
NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"User"];
NSError *error = nil;
NSArray *users = [context executeFetchRequest:fetchRequest error:&error];
if (error) {
//handle appropriately
}
NSMutableDictionary *userToIdMap = [NSMutableDictionary dictionary];
for (User *user in users){
[userToIdMap setObject:user forKey:user.ID];
}
Now in your method that processes new data you can check the userToIdMap dictionary instead of making fetch requests.
A more sophisticated approach, suited to larger data sets, is outlined in the Core Data Programming Guide's Efficiently Importing Data. Take a look at the section called 'Implementing Find-or-Create Efficiently'. The approach suggested by Apple there omits some code for walking the arrays you create, and my solution to that problem is in this SO question: Basic array comparison algorithm
