Why are Core Data NSManagedObject faults fired upon deletion? - ios

I'm trying to efficiently batch delete a lot of NSManagedObjects (without using an NSBatchDeleteRequest). I have been following the general procedure in this answer (adapted to Swift), by batching an operation which requests objects, deletes, saves and then resets the context. My fetch request sets includesPropertyValues to false.
However, when this runs, at the point where each object is deleted from the context, the fault is fired. Adding logging as follows:
// Fetch one object without property values
let f = NSFetchRequest<NSManagedObject>(entityName: "Entity")
f.includesPropertyValues = false
f.fetchLimit = 1
// Get the result from the fetch. This will be a fault
let firstEntity = try! context.fetch(f).first!
// Delete the object, watch whether the object is a fault before and after
print("pre-delete object is fault: \(firstEntity.isFault)")
context.delete(firstEntity)
print("post-delete object is fault: \(firstEntity.isFault)")
yields the output:
pre-delete object is fault: true
post-delete object is fault: false
This occurs even when there are no overrides of any CoreData methods (willSave(), prepareForDeletion(), validateForUpdate(), etc). I can't figure out what else could be causing these faults to fire.
Update: I've created a simple example in a Swift playground. This has a single entity with a single attribute, and no relationships. The playground deletes the managed object on the main thread, from the viewContext of an NSPersistentContainer, and demonstrates that the object's isFault property changes from true to false.

I think an authoritative answer would require a look at the Core Data source code. Since that's not likely to be forthcoming, here are some reasons I can think of that this might be necessary.
For entities that have relationships, it's probably necessary to examine the relationship to handle delete rules and maintain data integrity. For example if the delete rule is "cascade", it's necessary to fire the fault to figure out what related instances should be deleted. If it's "nullify", fire the fault to figure out which related instances need to have their relationship value set to nil.
In addition to the above, entities with relationships need to have validation checks performed on related instances. For example if you delete an object with a relationship that uses the "nullify" delete rule, and the inverse relationship is not optional, you would fail the validation check on the inverse relationship. Checking this likely triggers firing the fault.
Binary attributes can have data automatically stored in external files (the "allows external storage" option). In order to clean up the external file, it's probably necessary to fire the fault, in order to know which file to delete.
I think all of these could probably be optimized away. For example, don't fire faults if the entity has no relationships and has no attributes that use external storage. However, this is looking from the outside without access to source code. There might be other reasons that require firing the fault. That seems likely. Or it could be that nobody has attempted this optimization, for whatever reason. That seems less likely but is possible.
BTW I forked your playground code to get a version that doesn't rely on an external data model file, but instead builds the model in code.
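For readers without access to the playground, the following is a minimal sketch of such an in-code model, not the forked playground itself. The names "Entity", "name" and "FaultDemo" are placeholders chosen for illustration, and the default store created by NSPersistentContainer is assumed to be acceptable.

```swift
import CoreData

// One entity with one optional string attribute, built in code.
// "Entity", "name" and "FaultDemo" are placeholder names.
func makeModel() -> NSManagedObjectModel {
    let attribute = NSAttributeDescription()
    attribute.name = "name"
    attribute.attributeType = .stringAttributeType
    attribute.isOptional = true

    let entity = NSEntityDescription()
    entity.name = "Entity"
    entity.managedObjectClassName = NSStringFromClass(NSManagedObject.self)
    entity.properties = [attribute]

    let model = NSManagedObjectModel()
    model.entities = [entity]
    return model
}

let container = NSPersistentContainer(name: "FaultDemo", managedObjectModel: makeModel())
container.loadPersistentStores { _, error in
    if let error = error { fatalError("Could not load store: \(error)") }
}
let context = container.viewContext

// Insert and save one object, then reset so the next fetch has to go to the store.
let inserted = NSEntityDescription.insertNewObject(forEntityName: "Entity", into: context)
inserted.setValue("example", forKey: "name")
try context.save()
context.reset()

// Same fetch as in the question: no property values, one result.
let request = NSFetchRequest<NSManagedObject>(entityName: "Entity")
request.includesPropertyValues = false
request.fetchLimit = 1

if let object = try context.fetch(request).first {
    print("pre-delete object is fault: \(object.isFault)")
    context.delete(object)
    print("post-delete object is fault: \(object.isFault)")
}
```

Building the model in code keeps the experiment in a single file, which makes it easy to add a relationship or an externally stored binary attribute and compare the results.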

Tom Harrington has explained it best. Core Data's internal implementation apparently needs to fire the fault when marking an object for removal from the persistent store, just as it would if you accessed a property of the object. As explained in this answer, "An NSManagedObject is always dynamically rendered. Hence, if it is deleted, Core Data faults out the data".
This seems to be normal behaviour, at least for the time being, and not really an issue.
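For reference, the batched procedure described in the question (fetch, delete, save, reset) can be sketched roughly as follows. The function name and the batchSize parameter are hypothetical, and the function is assumed to be called on the context's own queue.

```swift
import CoreData

/// Deletes every object of the given entity in batches.
/// `deleteAll` and `batchSize` are illustrative names, not part of Core Data.
func deleteAll(entityName: String,
               in context: NSManagedObjectContext,
               batchSize: Int = 500) throws {
    let request = NSFetchRequest<NSManagedObject>(entityName: entityName)
    request.includesPropertyValues = false   // fetch without property values
    request.fetchLimit = batchSize

    while true {
        let batch = try context.fetch(request)
        if batch.isEmpty { break }

        for object in batch {
            // As observed in the question, this is where the fault fires.
            context.delete(object)
        }

        try context.save()   // push the deletions to the store
        context.reset()      // drop the deleted objects from memory
    }
}
```

Because each pass saves before fetching again, the next fetch returns only objects that still exist, and the loop ends when the fetch comes back empty.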

Related

Core Data double-inserting child records in one-to-many association

We have an iOS application that uses Core Data to persist records fetched from a private web API. One of our API requests fetches a list of Project records, each of which has multiple associated Location records. ObjectMapper is used to deserialize the JSON response, and we have a custom transformer that assigns the nested Location attributes to a Core Data association on the Project entity.
The relevant part of the code looks like this. It's executed within a PromiseKit promise (hence the seal), and we save first to a background context and then propagate to the main context that gets used on the UI thread.
WNManagedObjectController.backgroundContext.perform {
    let project = Mapper<Project>().map(JSONObject: JSON(json).object)!
    try! WNManagedObjectController.backgroundContext.save()
    WNManagedObjectController.managedContext.performAndWait {
        do {
            try WNManagedObjectController.managedContext.save()
            seal.fulfill(project.objectID)
        } catch {
            seal.reject(error)
        }
    }
}
The problem we're having is that this insert process is saving each Location record to the database twice. Strangely, the duplicated Location records don't have any association with their parent Project record. That is to say, if Location records are looked up with an NSFetchRequest, or if I run a query on the underlying SQLite database, I can see that there are two entries for each Location, but project.locations only returns one copy of each Location. The same (or very similar) process applied to other record types with the same structure also results in duplicates.
I've tried several things so far to narrow down the problem:
Inspected the API JSON - no duplicates.
Inspected the state of the project.locations property immediately before the Core Data write. No duplicate records are present prior to the objects being persisted, indicating that the deserializer and custom nested attributes transformer are working correctly.
Removed the block that propagates the changes to the main thread managed object context, in case this was causing the insert to occur twice. Still get duplicates with solely the write to the background context.
Run the app with com.apple.CoreData.ConcurrencyDebug 1 set. No exception is thrown in this process, confirming that it's not a thread safety issue of some kind.
Run the app with com.apple.CoreData.SQLDebug 1 set. I can see in the logs that Core Data is inserting exactly twice as many Location rows as expected into the underlying SQLite database.
Implemented a uniqueness constraint on the entity. This fixes the problem in terms of what data gets persisted, but will still throw an error unless an NSMergePolicy is set.
The last item in that list effectively solves the problem, but it's treating the symptom, not the cause. Data integrity is important for our application, and I'm looking to understand what the underlying problem might be, or other options I might pursue for investigating it further.
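The uniqueness-constraint workaround from the last list item might be set up roughly like this. It is only a sketch: the attribute name "remoteID" and the function name are hypothetical, and the choice of merge policy is an assumption (any policy other than the default error policy would avoid the thrown error).

```swift
import CoreData

/// Adds a uniqueness constraint to an entity and picks a merge policy so that
/// saving a duplicate no longer fails with a constraint error.
/// "remoteID" is a hypothetical attribute holding the server-side identifier.
func enableUniqueRecords(on entity: NSEntityDescription,
                         context: NSManagedObjectContext) {
    // The model has to be modified before it is used by a persistent store coordinator.
    entity.uniquenessConstraints = [["remoteID"]]

    // On conflict, the in-memory object's property values win over the stored ones.
    context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
}
```

As the question notes, this treats the symptom: the duplicate inserts still happen and are merely collapsed at save time.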
Thanks!
A year and eight months later, I finally got to the bottom of this bug when a similar issue occurred with a different set of records. The problem was that I was calling ObjectMapper on each Location object twice. I was using ObjectMapper's mapArray method within a custom ObjectMapper TransformType to deserialize and persist the Location records associated with each Project, which worked as follows:
let locations = Mapper<Location>().mapArray(JSONObject: value as AnyObject)
However, what I had overlooked is that I was also overriding the constructor for Location and calling ObjectMapper again there:
required public init?(map: Map) {
    let entity = NSEntityDescription.entity(forEntityName: "Location", in: WNManagedObjectController.backgroundContext)
    super.init(entity: entity!, insertInto: WNManagedObjectController.backgroundContext)
    mapping(map: map)
}
The line mapping(map: map) was unnecessary, and proved to be the culprit. In a similar scenario with two levels of one-to-many associations, this had the somewhat amusing consequence of quadrupling (!) the records at the second level - their parents had been duplicated, each copy of which subsequently duplicated its children. This was what ultimately led me to the cause of the bug.

NSFetchRequest with resultType set to NSDictionaryResultType and saving of changed objects

Based on some limited testing, I see that if I
Execute a Fetch request with result type = NSDictionaryResultType
Do some manipulations on the returned values
Store back the MOC on which Fetch request was executed
the changes in step 2 are not written back to the persistent store because I am changing a dictionary and not a "managed object". Is that a correct understanding?
Most likely you are abusing the dictionary result type. Unlike in conventional database programming, you are not wasting valuable memory by fetching entire objects rather than just a few selected attributes, thanks to an under-the-hood mechanism called "faulting".
Try fetching with managed object result type (default) and you can very easily manipulate your objects and save them back to Core Data. You would not need to do an additional fetch just to get the object you want to change.
Consider dictionaries only in special situations with huge data volumes, difficult relational grouping logic, etc., which make it absolutely necessary.
(That being said, it is unlikely that it is ever absolutely necessary. I have yet to encounter a case where the necessity of dictionaries for fetches was not an indirect result of flawed data model design.)
Yes, kind of, you can't store a dictionary back into the context directly so you can't save any updates that way.
If you get a dictionary object then you need to include in it the associated managed object id (if it isn't aggregated) or do another fetch to get the object(s) to update.
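The approach of carrying the managed object ID inside each dictionary can be sketched as follows. The function name, the "objectID" dictionary key and the attribute parameter are illustrative assumptions; the sketch updates only the first matching row.

```swift
import CoreData

/// Fetches one row as a dictionary that also carries the object's ID,
/// then uses that ID to load the managed object and change it.
func updateFirst(entityName: String,
                 attribute: String,
                 to newValue: String,
                 in context: NSManagedObjectContext) throws {
    // Expression description that puts the NSManagedObjectID into each dictionary.
    let idDescription = NSExpressionDescription()
    idDescription.name = "objectID"   // hypothetical dictionary key
    idDescription.expression = NSExpression.expressionForEvaluatedObject()
    idDescription.expressionResultType = .objectIDAttributeType

    let request = NSFetchRequest<NSDictionary>(entityName: entityName)
    request.resultType = .dictionaryResultType
    request.propertiesToFetch = [attribute, idDescription]
    request.fetchLimit = 1

    guard let row = try context.fetch(request).first,
          let objectID = row["objectID"] as? NSManagedObjectID else { return }

    // Editing `row` would change nothing in the store; edit the managed object instead.
    let object = try context.existingObject(with: objectID)
    object.setValue(newValue, forKey: attribute)
    try context.save()
}
```

This avoids a second predicate-based fetch, but it is still more work than fetching managed objects in the first place, which is the point made in the first answer.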

Core Data fetch predicate nil check failing/unexpected results?

I have a Core Data layer with several thousand entities, constantly syncing to a server. The sync process uses fetch requests to check for deleted_at for the purposes of soft-deletion. There is a single context performing save operations in a performBlockAndWait call. The relationship mapping is handled by the RestKit library.
The CoreDataEntity class is a subclass of NSManagedObject, and it is also the superclass for all our different core data object classes. It has some attributes that are inherited by all our entities, such as deleted_at, entity_id, and all the boilerplate fetch and sync methods.
My issue is some fetch requests seem to return inconsistent results after modifications to the objects. For example after deleting an object (setting deleted_at to the current date):
[CoreDataEntity fetchEntitiesWithPredicate:[NSPredicate predicateWithFormat:@"deleted_at==nil"]];
Returns results with deleted_at == [NSDate today]
I have successfully worked around this behavior by additionally looping through the results and removing the entities with deleted_at set, however I cannot fix the converse issue:
[CoreDataEntity fetchEntitiesWithPredicate:[NSPredicate predicateWithFormat:@"deleted_at!=nil"]];
Is returning an empty array in the same conditions, preventing a server sync from succeeding.
I have confirmed deleted_at is set on the object, and the context save was successful. I just don't understand where to reset whatever cache is causing the outdated results?
Thanks for any help!
Edit: Adding a little more information, it appears that once one of these objects becomes corrupted, the only way to get it to register is to modify the value again. Could this be some sort of Core Data index not updating when a value is modified?
Update: It appears to be a problem with RestKit https://github.com/RestKit/RestKit/issues/2218
You are apparently using some syntactic sugar extension to Core Data. I suppose that in your case it is SheepData, right?
fetchEntitiesWithPredicate: is implemented there as follows:
+ (NSArray*)fetchEntitiesWithPredicate:(NSPredicate*)aPredicate
{
    return [self fetchEntitiesWithPredicate:aPredicate inContext:[SheepDataManager sharedInstance].managedObjectContext];
}
Are you sure that [SheepDataManager sharedInstance].managedObjectContext receives all the changes that you are making to your objects? Does it receive save notifications, or is it a child context of your saving context?
Try to replace your fetch one-liner with this:
[<your saving context> performBlockAndWait:^{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"CoreDataEntity"];
    request.predicate = [NSPredicate predicateWithFormat:@"deleted_at==nil"];
    NSArray *results = [<your saving context> executeFetchRequest:request error:NULL];
}];
First, after a save, have you looked in the store to make sure your changes are there? Without seeing your entire Core Data stack, it is difficult to get a solid understanding of what might be going wrong. If you are saving and you can see the changes in the store, then the question shifts to your contexts: how are they built, and when? If you are dealing with sibling contexts, that could be causing your issue.
More detail is required as to how your core data stack looks.
Yes, the changes are there. As I mentioned in the question, I can loop through my results and remove all those with deleted_at set successfully
That wasn't my question. There is a difference between looking at objects in memory and looking at them in the SQLite file on disk. The questions I have about this behavior are:
Are the changes being persisted to disk before you query for them again?
Are you working with multiple contexts and potentially trying to fetch from a stale sibling?
Thus my questions about on disk changes and what your core data stack looks like.
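If the stack does turn out to use sibling contexts, one common way to keep a reading context from going stale is to merge each save of the writing context into it. A minimal sketch, in which the function name and the reader/writer roles are assumptions about a stack that the question does not show:

```swift
import CoreData

/// Merges every save of `writer` into `reader`, so that fetches on `reader`
/// see the saved changes. Returns the observer token; keep it and pass it to
/// NotificationCenter.default.removeObserver(_:) when syncing should stop.
func keepInSync(reader: NSManagedObjectContext,
                with writer: NSManagedObjectContext) -> NSObjectProtocol {
    return NotificationCenter.default.addObserver(
        forName: .NSManagedObjectContextDidSave,
        object: writer,
        queue: nil
    ) { notification in
        // Hop onto the reader's own queue before touching it.
        reader.perform {
            reader.mergeChanges(fromContextDidSave: notification)
        }
    }
}
```

This only helps when the two contexts are siblings on the same coordinator. It does not address the RestKit issue linked in the question.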
Threading
If you are using one context, are you using more than one thread in your app? If so, are you using that context on more than one thread?
I can see a situation where if you are violating the thread confinement rules you can be corrupting data like this.
Try adding an extra attribute deleted that is a bool with a default of false. Then the attribute is always set and you can look for entities that are either true or false depending on your needs at the moment. If the value is true then you can look at deleted_at to find out when.
Alternatively try setting the deleted_at attribute to some old date (like perhaps 1 Jan 1980), then anything that isn't deleted will have a fixed date that is too old to have been set by the user.
Edit: There is likely some issue with deleted_at having never been touched on some entities that is confusing the system. It is also possible that you have set the fetch request to return results in the dictionary style in which case recent changes will not be reflected in the fetch results.

What's the point of self.managedObjectContext == nil in NSManagedObject prepareForDeletion?

I have a Reminder entity that needs to update its date property whenever a certain entity B is deleted. I've spent some days coding thinking I could do some useful things in my managed object subclass on deletion time. I tried
- (void)willSave
{
    if (self.isDeleted) {
        // use self.managedObjectContext
    }
}
The context was nil. Relationships were also torn down there. Fair enough.
So... I started writing cumbersome code for prepareForDeletion to circumvent the fact that the object hadn't been deleted yet, but then Core Data throws self.managedObjectContext == nil in my face. The documentation says that this is where I do stuff "before relationships are torn down". So what is the point in self.managedObjectContext == nil if self.relationshipA.managedObjectContext is accessible (as the docs suggest)? And more importantly, why does my not yet deleted object not have its context?
I read a comment here regarding that problem
its not 'fault' as much as it is a 'disown', the context has disowned your object (he was deleted and save was committed to the database) and so your object was disowned. don't save in methods that are changing and object as the save should probably be committed/saved after the operation anyway. – Dan Shelly May 21 at 19:05
My code was:
[moc deleteObject:obj];
[moc save:NULL];
When I removed the save operation my self.managedObjectContext existed in prepareForDeletion. That is, until auto-save, when it was nil again. Probably because the parent context also deleted it, followed by a save by the UIManagedDocument.
I'm starting to think that my only options are to make a custom delete method (that works until Core Data cascades a deletion, in which case it won't be called), or make a new class that listens to NSManagedObjectContextDidSaveNotification.
Update:
The user wants to keep in touch with a person, and wants to be reminded after a certain interval (stored in ContactWish) if no contact has been made. What I'm trying to accomplish is that when the latest ContactOccasion for a certain person is deleted, the corresponding occasion->person->wish->reminder gets updated (using the interval).
Since this is a learning experience for me I wanted to find out the right way (one that works with cascade deletion etc.) and not just call for an update manually from every place in my code where I do [MOContext deleteObject:occasion]. Suggestions are welcome.
(the reminder entity has also been prepared for more manual use)
Would it not be much more logical to have the Reminder entity manage its date property? It could "listen" (maybe via changedValues:) to its relationship entities being deleted and perform the update.
This seems more consistent, as the B entity should not really be concerned with the logic of the Reminder entity updates.
Edit
Pursuant to the discussion below and based on my opinion that you cannot load up the database cascade delete model too much with update logic:
Rather than react to a deletion you can introduce an attribute that you set and listen to in order to do the changes.
I really do not see how relying on core data delete mechanisms is easier or more elegant than just writing your own "deleteOccasion" method that handles this logic.
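For comparison, the other option raised in the question, a separate class that listens for the context's did-save notification, might look roughly like this. The class name and the onDelete callback are hypothetical, and as the question observes, by the time the save has completed the deleted object's relationships may no longer be usable.

```swift
import CoreData

/// Calls `onDelete` for each deleted object of the given entity after a save.
/// `DeletionObserver` and `onDelete` are illustrative names.
final class DeletionObserver {
    private var token: NSObjectProtocol?

    init(context: NSManagedObjectContext,
         entityName: String,
         onDelete: @escaping (NSManagedObject) -> Void) {
        token = NotificationCenter.default.addObserver(
            forName: .NSManagedObjectContextDidSave,
            object: context,
            queue: nil
        ) { notification in
            // The notification's userInfo lists the objects deleted by this save.
            let deleted = notification.userInfo?[NSDeletedObjectsKey] as? Set<NSManagedObject> ?? []
            for object in deleted where object.entity.name == entityName {
                onDelete(object)
            }
        }
    }

    deinit {
        if let token = token {
            NotificationCenter.default.removeObserver(token)
        }
    }
}
```

Unlike a custom delete method, this also sees objects removed by a cascade rule, but any data needed for the update has to be captured before the save.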

Optimistic locking support in NSIncrementalStore subclass

I am implementing a custom NSIncrementalStore subclass which uses a relational database for persistent storage. One of the things that I still struggle with is the support for optimistic locking.
(feel free to skip this lengthy description right to my question below)
I analyzed how Core Data's SQLite incremental store approaches this problem by examining SQL logs produced by it and came up with following conclusions:
Each entity table in the database has a Z_OPT column which indicates the number of times a particular instance of this entity (row) has been modified, starting from 1 (initial insertion).
Each time a managed object is modified, Z_OPT value in its corresponding database row is incremented.
The store maintains cache (referred to as row cache in Core Data docs) of NSIncrementalStoreNode instances, each having a version property equal to Z_OPT value returned by previous SELECT or UPDATE SQL query on managed object's row.
When a managed object is returned from NSManagedObjectContext (e.g. by executing NSFetchRequest on it), MOC creates snapshot of this object which contains this version number.
When the object is modified or deleted, Core Data makes sure that it has not been modified or deleted outside the context by comparing the versions of the cached row and the object snapshot. All of this happens when -save: is called on the context that the object belongs to. If the versions are different, a merge conflict is detected and handled according to the merge policy that has been set.
When MOC is being saved, the -newValuesForObjectWithID:withContext:error: method is called for each modified/deleted object which in turn returns NSIncrementalStoreNode with version number. This version is then compared to snapshot's version and if they are different, the save fails with appropriate merge conflicts (at least with default merge policy).
This simple use case works properly with my store since -newValuesForObjectWithID:withContext:error: checks the row cache first which is enough if the object was concurrently modified in other context using the same store instance. If this is the case, then the cache contains updated row with higher version number which is enough to detect a conflict.
But how can I detect that the underlying database has been modified outside my store, possibly by another application or another store instance using the same database file? I know this is an infrequent edge case, but Core Data handles it properly and I would prefer to do the same.
Core Data's store uses SQL queries like these to update/delete object's row:
UPDATE ZFOO SET Z_OPT=Y, (...) WHERE (...) AND Z_OPT=X
DELETE FROM ZFOO WHERE (...) AND Z_OPT=X
where:
X - version number last known to the store (from cache)
Y - new version number
If such a query fails (no rows affected) the row is updated in store's cache and its version compared against the one previously cached.
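The effect of the version-guarded queries above can be modelled without any database. The sketch below is not Core Data's implementation; it is a plain Swift stand-in in which a dictionary plays the role of the table, and Row, WriteResult and update are hypothetical names.

```swift
/// A stored row: `version` plays the role of Z_OPT.
struct Row {
    var version: Int
    var values: [String: String]
}

enum WriteResult: Equatable {
    case written(newVersion: Int)
    case conflict(storedVersion: Int?)   // nil: the row no longer exists
}

/// Mirrors "UPDATE ... SET Z_OPT=Y WHERE (...) AND Z_OPT=X":
/// the write only happens if the stored version still equals the expected one.
func update(_ table: inout [Int: Row],
            id: Int,
            expectedVersion: Int,
            values: [String: String]) -> WriteResult {
    guard let row = table[id], row.version == expectedVersion else {
        // "No rows affected": report what is actually stored so the cache can be refreshed.
        return .conflict(storedVersion: table[id]?.version)
    }
    let newVersion = expectedVersion + 1
    table[id] = Row(version: newVersion, values: values)
    return .written(newVersion: newVersion)
}
```

A second writer that still holds the old version number gets a conflict rather than silently overwriting the first writer's change, which is the case the store has to report back to Core Data.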
My question is: how can a custom NSIncrementalStore inform Core Data that an optimistic locking failure has occurred for some updated/deleted/locked objects? Only the store is able to tell that, when it handles the NSSaveChangesRequest passed to its -executeRequest:withContext:error: method.
If the underlying database does not change under the store, then conflicts are detected since Core Data calls -newValuesForObjectWithID:withContext:error: on each modified/deleted/locked object prior to executing save changes request on the store. I was not able to find any way for NSIncrementalStore to inform Core Data that an optimistic locking failure has occurred after it started to handle the save request. Is there some undocumented way to do that? Core Data seems to throw some exception in that case which is then magically translated into failed save request with NSError listing all the conflicts. I am only able to mimic that partly by returning nil from -executeRequest:withContext:error: and creating the error message by myself. I think there must be a way to use the standard Core Data conflict handling mechanism in this scenario as well.
I realize that this is not an answer to your question, but I will try to give you my point of view on Core Data and how it correlates to databases:
(1st level cache)
NSPersistentStoreCoordinator + NSPersistentStore == A single connection to the database
(2nd level cache)
NSManagedObjectContext == cache over the connection holding changes
So, to my understanding your issue is that you have multiple connections to your store, each making changes, but you have no central version control over your records.
Your store will receive a -executeRequest:withContext:error: with NSSaveRequestType
You will then be responsible for verifying that the record versions match. If you find a conflict at the connection level (level 1), you report a version mismatch between the context (level 2) and the coordinator.
You need to report a version mismatch between your connection (level 1) and your store.
To be able to do this your store must report changes on it across all connections to it (ConnectionManager), or it might offer hooks to changes performed on it.
I'm no SQLite expert, but the SQLite API does have something to offer in that area:
update hook
commit hook
changes
total changes
(I have no experience in setting these kinds of hooks, but if Core Data uses them it will not show in the debug logs)
You can report these errors by setting the error pointer (NSError**) and filling in its data to match what the Core Data coordinator sets (create merge conflicts and set the information in them as needed).
Note that an optimistic locking failure will only occur during -executeRequest:withContext:error: (unless you have a rogue connection to the store, one that is not tracked by the manager. To support this behaviour your manager might need to verify each record as it is committed for a save [huge performance cost], or use some hooks into the changes recently made to records).
To handle multiple connections to your store you might need to have a shared cache of NSIncrementalStoreNode, keyed by the store url:
static @{
    url1 : actualCacheMapping1,
    url2 : actualCacheMapping2,
    ...
}
Each connection's save to the store will be verified against the actual cache for that store URL.
Hope this makes some sense to you.
My question is: how can a custom NSIncrementalStore inform Core Data that an optimistic locking failure has occurred for some updated/deleted/locked objects? Only the store is able to tell that, when it handles the NSSaveChangesRequest passed to its -executeRequest:withContext:error: method.
In an NSIncrementalStore, NSIncrementalStoreNodes represent the store snapshots. The version property of the node is the optimistic locking primitive. The persistent store is responsible for detecting optimistic locking failures at the store level, while the managed object context can detect them higher up. An optimistic locking failure at the store level might happen if the system the store is talking to was changed by something else, and there is a conflict between that system's state and the representation of that state in the persistent store. For example, the store might be communicating with a web service whose data was changed by another user.
If an optimistic locking failure is detected in your store implementation during a save, your store is responsible for creating NSMergeConflict objects describing it. These will be propagated up by the NSPersistentStoreCoordinator.
[[NSMergeConflict alloc] initWithSource:managedObject newVersion:newVersion oldVersion:oldVersion cachedSnapshot:inMemorySnapshot persistedSnapshot:storedSnapshot];
Snapshot dictionaries should include all modelled attribute property names as keys along with their values. This does not include relationships. For some stores, using the values from the reference objects or NSIncrementalStoreNodes may suffice as long as they only include the modelled attribute property name as keys (and those are easy to get from the entity description).
Once these objects have been created, create an NSError in the NSCocoaErrorDomain with the code NSPersistentStoreSaveConflictsError. The userInfo dictionary should contain the key NSPersistentStoreSaveConflictsErrorKey, whose value is an array of the NSMergeConflict objects. Return that from the save request, and the NSPersistentStoreCoordinator will be responsible for finding a resolution. Remember, you should not generate merge conflicts for conflicts between the state of objects in the NSManagedObjectContext and your store, only for conflicts between whatever in-memory or cached state is in your store and wherever the data is kept or persisted (like a web service, or database, etc.)
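In Swift, the steps described above might be assembled along these lines. This is a sketch rather than a tested store implementation: the function name and parameter names are hypothetical, and which version counts as "new" and which as "old" follows the initializer call shown earlier in this answer.

```swift
import CoreData

/// Builds the error a store could report when the row it cached no longer
/// matches what is persisted externally. `makeSaveConflictError` is an
/// illustrative name; the snapshots map attribute names to values.
func makeSaveConflictError(for object: NSManagedObject,
                           newVersion: Int,
                           oldVersion: Int,
                           cachedSnapshot: [String: Any],
                           persistedSnapshot: [String: Any]) -> NSError {
    let conflict = NSMergeConflict(source: object,
                                   newVersion: newVersion,
                                   oldVersion: oldVersion,
                                   cachedSnapshot: cachedSnapshot,
                                   persistedSnapshot: persistedSnapshot)

    return NSError(domain: NSCocoaErrorDomain,
                   code: NSPersistentStoreSaveConflictsError,
                   userInfo: [NSPersistentStoreSaveConflictsErrorKey: [conflict]])
}
```

Inside an NSIncrementalStore subclass, the store's execute(_:with:) override would throw this error while handling the save request. A real store would typically collect one conflict per affected object into the array.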
