Core Data fetch predicate nil check failing/unexpected results? - ios

I have a Core Data layer with several thousand entities, constantly syncing to a server. The sync process uses fetch requests to check for deleted_at for the purposes of soft-deletion. There is a single context performing save operations in a performBlockAndWait call. The relationship mapping is handled by the RestKit library.
The CoreDataEntity class is a subclass of NSManagedObject, and it is also the superclass for all our different core data object classes. It has some attributes that are inherited by all our entities, such as deleted_at, entity_id, and all the boilerplate fetch and sync methods.
My issue is some fetch requests seem to return inconsistent results after modifications to the objects. For example after deleting an object (setting deleted_at to the current date):
[CoreDataEntity fetchEntitiesWithPredicate:[NSPredicate predicateWithFormat:#"deleted_at==nil"]];
Returns results with deleted_at == [NSDate today]
I have successfully worked around this behavior by additionally looping through the results and removing the entities with deleted_at set, however I cannot fix the converse issue:
[CoreDataEntity fetchEntitiesWithPredicate:[NSPredicate predicateWithFormat:#"deleted_at!=nil"]];
Is returning an empty array in the same conditions, preventing a server sync from succeeding.
I have confirmed deleted_at is set on the object, and the context save was successful. I just don't understand where to reset whatever cache is causing the outdated results?
Thanks for any help!
Edit: Adding a little more information, it appears that once one of these objects becomes corrupted, the only way get it to register is modifying the value again. Could this be some sort of Core Data index not updating when a value is modified?
Update: It appears to be a problem with RestKit https://github.com/RestKit/RestKit/issues/2218

You are apparently using some sintactic sugar extension to Core Data. I suppose that in your case it is a SheepData, right?
fetchEntitiesWithPredicate: there implemented as follows:
+ (NSArray*)fetchEntitiesWithPredicate:(NSPredicate*)aPredicate
{
return [self fetchEntitiesWithPredicate:aPredicate inContext:[SheepDataManager sharedInstance].managedObjectContext];
}
Are you sure that [SheepDataManager sharedInstance].managedObjectContext receives all the changes that you are making to your objects? Is it receives notifications of saves, or is it child context of your save context?
Try to replace your fetch one-liner with this:
[<your saving context> performBlockAndWait:^{
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:#"CoreDataEntity"];
request.predicate = [NSPredicate predicateWithFormat:#"deleted_at==nil"];
NSArray *results = [<your saving context> executeFetchRequest:request error:NULL];
}];

First, after a save have you looked in the store to make sure your changes are there? Without seeing your entire Core Data stack it is difficult to get a solid understanding what might be going wrong. If you are saving and you see the changes in the store then the question comes into your contexts. How are they built and when. If you are dealing with sibling contexts that could be causing your issue.
More detail is required as to how your core data stack looks.
Yes, the changes are there. As I mentioned in the question, I can loop through my results and remove all those with deleted_at set successfully
That wasn't my question. There is a difference between looking at objects in memory and looking at them in the SQLite file on disk. The questions I have about this behavior are:
Are the changes being persisted to disk before you query for them again
Are you working with multiple contexts and potentially trying to fetch from a stale sibling.
Thus my questions about on disk changes and what your core data stack looks like.
Threading
If you are using one context, are you using more than one thread in your app? If so, are you using that context on more than one thread?
I can see a situation where if you are violating the thread confinement rules you can be corrupting data like this.

Try adding an extra attribute deleted that is a bool with a default of false. Then the attribute is always set and you can look for entities that are either true or false depending on your needs at the moment. If the value is true then you can look at deleted_at to find out when.
Alternatively try setting the deleted_at attribute to some old date (like perhaps 1 Jan 1980), then anything that isn't deleted will have a fixed date that is too old to have been set by the user.
Edit: There is likely some issue with deleted_at having never been touched on some entities that is confusing the system. It is also possible that you have set the fetch request to return results in the dictionary style in which case recent changes will not be reflected in the fetch results.

Related

Why are Core Data NSManagedObject faults fired upon deletion?

I'm trying to efficiently batch delete a lot of NSManagedObjects (without using an NSBatchDeleteRequest). I have been following the general procedure in this answer (adapted to Swift), by batching an operation which requests objects, deletes, saves and then resets the context. My fetch request sets includesPropertyValues to false.
However, when this runs, at the point where each object is deleted from the context, the fault is fired. Adding logging as follows:
// Fetch one object without property values
let f = NSFetchRequest<NSManagedObject>(entityName: "Entity")
f.includesPropertyValues = false
f.fetchLimit = 1
// Get the result from the fetch. This will be a fault
let firstEntity = try! context.fetch(f).first!
// Delete the object, watch whether the object is a fault before and after
print("pre-delete object is fault: \(firstEntity.isFault)")
context.delete(firstEntity)
print("post-delete object is fault: \(firstEntity.isFault)")
yields the output:
pre-delete object is fault: true
post-delete object is fault: false
This occurs even when there are no overrides of any CoreData methods (willSave(), prepareForDeletion(), validateForUpdate(), etc). I can't figure out what else could be causing these faults to fire.
Update: I've created a simple example in a Swift playground. This has a single entity with a single attribute, and no relationships. The playground deletes the managed object on the main thread, from the viewContext of an NSPersistentContainer, a demonstrates that the object property isFault changes from true to false.
I think an authoritative answer would require a look at the Core Data source code. Since that's not likely to be forthcoming, here are some reasons I can think of that this might be necessary.
For entities that have relationships, it's probably necessary to examine the relationship to handle delete rules and maintain data integrity. For example if the delete rule is "cascade", it's necessary to fire the fault to figure out what related instances should be deleted. If it's "nullify", fire the fault to figure out which related instances need to have their relationship value set to nil.
In addition to the above, entities with relationships need to have validation checks performed on related instances. For example if you delete an object with a relationship that uses the "nullify" delete rule, and the inverse relationship is not optional, you would fail the validation check on the inverse relationship. Checking this likely triggers firing the fault.
Binary attributes can have data automatically stored in external files (the "allows external storage" option). In order to clean up the external file, it's probably necessary to fire the fault, in order to know which file to delete.
I think all of these could probably be optimized away. For example, don't fire faults if the entity has no relationships and has no attributes that use external storage. However, this is looking from the outside without access to source code. There might be other reasons that require firing the fault. That seems likely. Or it could be that nobody has attempted this optimization, for whatever reason. That seems less likely but is possible.
BTW I forked your playground code to get a version that doesn't rely on an external data model file, but instead builds the model in code.
Tom Harrington has explained it best. CoreData's internal implementation apparently requires to fire fault when marking an object to be removed from the persistent store, just like it would if you were accessing a property of the object. As explained in this answer, "An NSManagedObject is always dynamically rendered. Hence, if it is deleted, Core Data faults out the data".
This seems to be the normal behaviour at least for the moment being, not really an issue.

Why does storing a reference to an NSManagedObject prevent it from updating?

This question is poorly phased but this can be better explained in code.
We have a Core Data Stack with private and main contexts as defined by Marcus Zarra here: http://martiancraft.com/blog/2015/03/core-data-stack/
We call a separate class to do a fetch request (main context) and return an array of NSManagedObjects:
NSArray *ourManagedObjects = [[Client sharedClient].coreDataManager fetchArrayForClass:[OurObject class] sortKey:#"name" ascending:YES];
We then do some processing and store a reference:
self.ourObjects = processedManagedObjects
Our view contains a UITableView and this data is used to populate it and that works just fine.
We change the data on our CMS, pull to refresh on the UITableView to trigger a sync (private context) and then call this same function to retrieve the updated data. However, the fetch request returns the exact same data as before even though when I check the sqlite db directly it contains the new data. To get the new values to display I have to reload the app.
I have discovered that if I don't assign the processedManagedObjects to self, the fetch request does indeed return the correct data, so it looks like holding a reference to the NSManagedObject stops it from getting new data from the main context. However I have no idea why that would be.
To clarify, we're pretty sure there's nothing wrong with our Core Data Stack, even when these managed objects are not being updated, other are being updated just fine, it's only this one where we store a local reference.
It sounds like what's going on is:
Managed objects don't automatically update themselves to reflect the latest data in the persistent store when changes are made via a different managed object context.
As a result, if you keep a reference to the objects, they keep whatever data they already had.
On the other hand if you don't keep a reference but instead re-fetch them, you get the new data because there was no managed object hanging around with its old data.
You have a few options:
You could keep the reference and have your context refresh the managed objects, using either the refresh(_, mergeChanges:) method or refreshAllObjects().
If it makes sense for your app, use an NSFetchedResultsController and use its delegate methods to be notified of changes.
Don't keep the reference.
The first is probably best-- refreshAllObjects() is probably what you want. Other options might be better based on other details of your app.
Try setting the shouldRefreshRefetchedObjects property of the fetch request to true. According to the documentation:
By default when you fetch objects, they maintain their current property values, even if the values in the persistent store have changed. Invoking this method with the parameter true means that when the fetch is executed, the property values of fetched objects are updated with the current values in the persistent store.

NSFetchRequest with resultType set to NSDictionaryResultType and saving of changed objects

Based on some limited testing, I see that if I
Execute a Fetch request with result type = NSDictionaryResultType
Do some manipulations on the returned values
Store back the MOC on which Fetch request was executed
the changes in step 2 are not written back to the persistent store because I am changing a dictionary and not a "managed object". Is that a correct understanding?
Most likely you are abusing the dictionary result type. Unlike in conventional database programming, you are not wasting valuable memory resources when fetching the entire objects rather than just one selected attributes, due to an under-the-hood mechanism called "faulting".
Try fetching with managed object result type (default) and you can very easily manipulate your objects and save them back to Core Data. You would not need to do an additional fetch just to get the object you want to change.
Consider dictionaries only in special situations with huge data volumes, difficult relational grouping logic, etc., which make it absolutely necessary.
(That being said, it is unlikely that it is ever absolutely necessary. I have yet to encounter a case where the necessity of dictionaries for fetches was not an indirect result of flawed data model design.)
Yes, kind of, you can't store a dictionary back into the context directly so you can't save any updates that way.
If you get a dictionary object then you need to include in it the associated managed object id (if it isn't aggregated) or do another fetch to get the object(s) to update.

What's the point of self.managedObjectContext == nil in NSManagedObject prepareForDeletion?

I have a Reminder entity that needs to update its date property whenever a certain entity B is deleted. I've spent some days coding thinking I could do some useful things in my managed object subclass on deletion time. I tried
- (void)willSave
{
if (self.isDeleted)
// use self.managedObjectContext
}
The context was nil. Relationships were also torn down there. Fair enough.
So... I started writing cumbersome code for prepareForDeletion to circumvent the fact that the object hadn't been deleted yet, but then Core Data throws self.managedObjectContext == nil in my face. The documentation says that this is where I do stuff "before relationships are torn down". So what is the point in self.managedObjectContext == nil if self.relationshipA.managedObjectContext is accessible (as the docs suggest)? And more importantly, why does my not yet deleted object not have its context?
I read a comment here regarding that problem
its not 'fault' as much as it is a 'disown', the context has disowned your object (he was deleted and save was committed to the database) and so your object was disowned. don't save in methods that are changing and object as the save should probably be committed/saved after the operation anyway. – Dan Shelly May 21 at 19:05
My code was:
[moc deleteObject:obj]
[moc save:NULL]
When I removed the save operation my self.managedObjectContext existed in prepareForDeletion. That is, until auto-save, when it was nil again. Probably because the parent context also deleted it, followed by a save by the UIManagedDocument.
I'm starting to think that my only options are to make a custom delete method (that works until Core Data cascades a deletion, in which case it won't be called), or make a new class that listens to NSManagedObjectContextDidSaveNotification.
Update:
The user wants to keep in touch with a person, and wants to be reminded after a certain interval (stored in ContactWish) if no contact has been made. What I'm trying to accomplish is that when the latest ContactOccasion for a certain person is deleted, the corresponding occasion->person->wish->reminder gets updated (using the interval).
Since this is a learning experience for me I wanted to find out the right way (one that works with cascade deletion etc.) and not just call for an update manually from every place in my code where I do [MOContext deleteObject:occasion]. Suggestions are welcome.
(the reminder entity has also been prepared for more manual use)
Would it not be much more logical to have the Reminder entity manage its date property? It could "listen" (maybe via changedValues:) to its relationship entities being deleted and perform the update.
This seems more consistent, as the B entity should not really be concerned with the logic of the Reminder entity updates.
Edit
Pursuant to the discussion below and based on my opinion that you cannot load up the database cascade delete model too much with update logic:
Rather than react to a deletion you can introduce an attribute that you set and listen to in order to do the changes.
I really do not see how relying on core data delete mechanisms is easier or more elegant than just writing your own "deleteOccasion" method that handles this logic.

Core Data: delete all objects of an entity type, ie clear a table

This has been asked before, but no solution described that is fast enough for my app needs.
In the communications protocol we have set up, the server sends down a new set of all customers every time a sync is performed. Earlier, we had been storing as a plist. Now want to use Core Data.
There can be thousands of entries. Deleting each one individually takes a long time. Is there a way to delete all rows in a particular table in Core Data?
delete from customer
This call in sqlite happens instantly. Going through each one individually in Core Data can take 30 seconds on an iPad1.
Is it reasonable to shut down Core Data, i.e. drop the persistence store and all managed object contexts, then drop into sqlite and perform the delete command against the table? No other activity is going on during this process so I don't need access to other parts of the database.
Dave DeLong is an expert at, well, just about everything, and so I feel like I'm telling Jesus how to walk on water. Granted, his post is from 2009, which was a LONG time ago.
However, the approach in the link posted by Bot is not necessarily the best way to handle large deletes.
Basically, that post suggests to fetch the object IDs, and then iterate through them, calling delete on each object.
The problem is that when you delete a single object, it has to go handle all the associated relationships as well, which could cause further fetching.
So, if you must do large scale deletes like this, I suggest adjusting your overall database so that you can isolate tables in specific core data stores. That way you can just delete the entire store, and possibly reconstruct the small bits that you want to remain. That will probably be the fastest approach.
However, if you want to delete the objects themselves, you should follow this pattern...
Do your deletes in batches, inside an autorelease pool, and be sure to pre-fetch any cascaded relationships. All these, together, will minimize the number of times you have to actually go to the database, and will, thus, decrease the amount of time it takes to perform your delete.
In the suggested approach, which comes down to...
Fetch ObjectIds of all objects to be deleted
Iterate through the list, and delete each object
If you have cascade relationships, you you will encounter a lot of extra trips to the database, and IO is really slow. You want to minimize the number of times you have to visit the database.
While it may initially sound counterintuitive, you want to fetch more data than you think you want to delete. The reason is that all that data can be fetched from the database in a few IO operations.
So, on your fetch request, you want to set...
[fetchRequest setRelationshipKeyPathsForPrefetching:#[#"relationship1", #"relationship2", .... , #"relationship3"]];
where those relationships represent all the relationships that may have a cascade delete rule.
Now, when your fetch is complete, you have all the objects that are going to be deleted, plus the objects that will be deleted as a result of those objects being deleted.
If you have a complex hierarchy, you want to prefetch as much as possible ahead of time. Otherwise, when you delete an object, Core Data is going to have to go fetch each relationship individually for each object so that it can managed the cascade delete.
This will waste a TON of time, because you will do many more IO operations as a result.
Now, after your fetch has completed, then you loop through the objects, and delete them. For large deletes you can see an order of magnitude speed up.
In addition, if you have a lot of objects, break it up into multiple batches, and do it inside an auto release pool.
Finally, do this in a separate background thread, so your UI does not pend. You can use a separate MOC, connected to a persistent store coordinator, and have the main MOC handle DidSave notifications to remove the objects from its context.
WHile this looks like code, treat it as pseudo-code...
NSManagedObjectContext *deleteContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateConcurrencyType];
// Get a new PSC for the same store
deleteContext.persistentStoreCoordinator = getInstanceOfPersistentStoreCoordinator();
// Each call to performBlock executes in its own autoreleasepool, so we don't
// need to explicitly use one if each chunk is done in a separate performBlock
__block void (^block)(void) = ^{
NSFetchRequest *fetchRequest = //
// Only fetch the number of objects to delete this iteration
fetchRequest.fetchLimit = NUM_ENTITIES_TO_DELETE_AT_ONCE;
// Prefetch all the relationships
fetchRequest.relationshipKeyPathsForPrefetching = prefetchRelationships;
// Don't need all the properties
fetchRequest.includesPropertyValues = NO;
NSArray *results = [deleteContext executeFetchRequest:fetchRequest error:&error];
if (results.count == 0) {
// Didn't get any objects for this fetch
if (nil == results) {
// Handle error
}
return;
}
for (MyEntity *entity in results) {
[deleteContext deleteObject:entity];
}
[deleteContext save:&error];
[deleteContext reset];
// Keep deleting objects until they are all gone
[deleteContext performBlock:block];
};
[deleteContext preformBlock:block];
Of course, you need to do appropriate error handling, but that's the basic idea.
Fetch in batches if you have so much data to delete that it will cripple memory.
Don't fetch all the properties.
Prefetch relationships to minimize IO operations.
Use autoreleasepool to keep memory from growing.
Prune the context.
Perform the task on a background thread.
If you have a really complex graph, make sure you prefetch all the cascaded relationships for all entities in your entire object graph.
Note, your main context will have to handle DidSave notifications to keep its context in step with the deletions.
EDIT
Thanks. Lots of good points. All well explained except, why create the
separate MOC? Any thoughts on not deleting the entire database, but
using sqlite to delete all rows from a particular table? – David
You use a separate MOC so the UI is not blocked while the long delete operation is happening. Note, that when the actual commit to the database happens, only one thread can be accessing the database, so any other access (like fetching) will block behind any updates. This is another reason to break the large delete operation into chunks. Small pieces of work will provide some chance for other MOC(s) to access the store without having to wait for the whole operation to complete.
If this causes problems, you can also implement priority queues (via dispatch_set_target_queue), but that is beyond the scope of this question.
As for using sqlite commands on the Core Data database, Apple has repeatedly said this is a bad idea, and you should not run direct SQL commands on a Core Data database file.
Finally, let me note this. In my experience, I have found that when I have a serious performance problem, it is usually a result of either poor design or improper implementation. Revisit your problem, and see if you can redesign your system somewhat to better accommodate this use case.
If you must send down all the data, perhaps query the database in a background thread and filter the new data so you break your data into three sets: objects that need modification, objects that need deletion, and objects that need to be inserted.
This way, you are only changing the database where it needs to be changed.
If the data is almost brand new every time, consider restructuring your database where these entities have their own database (I assume your database already contains multiple entities). That way you can just delete the file, and start over with a fresh database. That's fast. Now, reinserting several thousand objects is not going to be fast.
You have to manage any relationships manually, across stores. It's not difficult, but it's not automatic like relationships within the same store.
If I did this, I would first create the new database, then tear down the existing one, replace it with the new one, and then delete the old one.
If you are only manipulating your database via this batch mechanism, and you do not need object graph management, then maybe you want to consider using sqlite instead of Core Data.
iOS 9 and later
Use NSBatchDeleteRequest. I tested this in the simulator on a Core Data entity with more than 400,000 instances and the delete was almost instantaneous.
// fetch all items in entity and request to delete them
let fetchRequest = NSFetchRequest(entityName: "MyEntity")
let deleteRequest = NSBatchDeleteRequest(fetchRequest: fetchRequest)
// delegate objects
let myManagedObjectContext = (UIApplication.sharedApplication().delegate as! AppDelegate).managedObjectContext
let myPersistentStoreCoordinator = (UIApplication.sharedApplication().delegate as! AppDelegate).persistentStoreCoordinator
// perform the delete
do {
try myPersistentStoreCoordinator.executeRequest(deleteRequest, withContext: myManagedObjectContext)
} catch let error as NSError {
print(error)
}
Note that the answer that #Bot linked to and that #JodyHagins mentioned has also been updated to this method.
Really your only option is to remove them individually. I do this method with a ton of objects and it is pretty fast. Here is a way someone does it by only loading the managed object ID so it prevents any unnecessary overhead and makes it faster.
Core Data: Quickest way to delete all instances of an entity
Yes, it's reasonable to delete the persistent store and start from scratch. This happen fairly quick. What you can do is remove the persistent store (with the persistent store URL) from the persistent store coordinator, and then use the url of the persistent store to delete the database file from your directory folder. I did it using NSFileManager's removeItemAtURL.
Edit: one thing to consider: Make sure to disable/release the current NSManagedObjectContext instance, and to stop any other thread which might be doing something with a NSManagedObjectContext which is using the same persistent store. Your application will crash if a context tries to access the persistent store.

Resources