Multi-threading with core data and API requests


Intro
I've read a lot of tutorials and articles on Core Data concurrency, but I'm having an issue that is not often covered, or at least not covered in a real-world way, and I am hoping someone can help. I've checked the related questions on SO, and none that I can find answer this particular question.
Background
We have an existing application that fetches data from an API (on a background thread) and then saves the returned records into Core Data. We also need to display these records in the application at the same time.
So the process we currently go through is to:
Make a network request for data (background)
Parse the data, map the objects to NSManagedObjects, and save (background)
In the completion handler (main thread), fetch records from Core Data with the same order and limit that we requested from the API.
Most tutorials on Core Data concurrency follow this pattern of saving on one thread and then fetching on another, but most of them give examples like:
NSArray *listOfPeople = ...;
[NSManagedObjectHelper saveDataInBackgroundWithContext:^(NSManagedObjectContext *localContext){
    for (NSDictionary *personInfo in listOfPeople)
    {
        PersonEntity *person = [PersonEntity createInContext:localContext];
        [person setValuesForKeysWithDictionary:personInfo];
    }
} completion:^{
    self.people = [PersonEntity findAll];
}];
Source
So regardless of the number of records you get back, you just fetch everything. This works for small datasets, but I want to be more efficient. I've read many times not to read or write data across threads, and fetching afterwards gets around this issue, but I don't want to fetch everything; I just want the new records.
My Problem
So, for my real-world example: I want to make a request to my API for the latest information (maybe anything older than my oldest record in Core Data) and save it, then I need the exact data returned from the API on the main thread, ready for display.
So my question is: when I reach my completion handler, how do I know what to fetch, or what the API returned? A couple of methods I've considered so far:
After saving each record, store its ID in a temporary array and then perform a fetch where id IN array_of_ids.
If I am asking for the latest records, I could just use the count of records returned, then use an order by and a limit in my fetch to get the latest x records.
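For the first option, one detail is easy to miss: a fetch with a predicate such as `id IN %@` returns matches in an undefined order unless you add a sort descriptor, so if the display order must match the API response you have to restore it yourself. A minimal sketch in plain Swift, assuming a hypothetical `Person` struct that stands in for the fetched managed objects and a hypothetical `serverID` attribute holding the API's identifier:

```swift
// Hypothetical stand-in for a fetched managed object; only the server ID matters here.
struct Person {
    let serverID: Int
    let name: String
}

/// Reorders objects returned by an unordered `IN` fetch so they match the
/// order in which the API returned their IDs. IDs with no fetched object are skipped.
func restoreAPIOrder(_ fetched: [Person], apiOrder ids: [Int]) -> [Person] {
    // Map each server ID to its fetched object for constant-time lookup.
    var byID: [Int: Person] = [:]
    for person in fetched {
        byID[person.serverID] = person
    }
    // Walk the API's ID list and pick out the matching objects in that order.
    return ids.compactMap { byID[$0] }
}
```

Keeping the ID array from the background import and passing it to the main thread is safe because it contains plain values, not managed objects.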
My Question
I realize that the above might answer my own question, but I want to know if there is a better way, or whether one of those methods is much better than the other. I just have the feeling that I am missing something.
Thanks
EDIT:
Neither answer below actually addresses the question. This has to do with fetching and saving data in the background and then using the returned data on the main thread. I know it's not a good idea to pass data between threads, so the common workaround is to fetch from Core Data after inserting. I want to work out the more efficient way.

Have you checked NSFetchedResultsController? Instead of fetching the displayed objects into an array, you use a fetched results controller in a similar fashion. Through NSFetchedResultsControllerDelegate you are notified about all the changes performed in the background (rows added, removed, or changed) once they are merged into the controller's context, so no manual tracking is needed.

I feel you are missing the case of two simultaneous API calls. Neither storing IDs nor counting created entities will work in that case. Consider adding a timestamp property to each PersonEntity.
This assumes that your intention is to display recently updated persons.
The calculation of the oldest timestamp to display can look like this:
@property NSDate *lastViewRefreshTime;
@property NSDate *oldestEntityToDisplay;
(...)
if (self.lastViewRefreshTime.timeIntervalSinceNow < -3) {
    self.oldestEntityToDisplay = self.lastViewRefreshTime;
}
self.lastViewRefreshTime = [NSDate date];
[self displayPersonsAddedAfter: self.oldestEntityToDisplay];
Now, if two API responses arrive less than 3 seconds apart, their data will be displayed together.
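The same windowing rule can be written in Swift so it is easy to test. This is a sketch rather than a drop-in replacement: `RefreshWindow` and `cutoff(now:)` are hypothetical names, the current time is passed in instead of read from the clock, and the 3-second default mirrors the snippet above.

```swift
import Foundation

/// Hypothetical Swift version of the Objective-C logic above: responses that
/// arrive within `window` seconds of the previous refresh share one cutoff.
struct RefreshWindow {
    var lastViewRefreshTime: Date?
    var oldestEntityToDisplay: Date?
    let window: TimeInterval

    init(window: TimeInterval = 3) {
        self.window = window
    }

    /// Returns the cutoff timestamp; entities newer than it should be displayed.
    /// A nil result means no cutoff has been established yet.
    mutating func cutoff(now: Date) -> Date? {
        // Only advance the cutoff if the previous refresh is older than the window.
        if let last = lastViewRefreshTime, now.timeIntervalSince(last) > window {
            oldestEntityToDisplay = last
        }
        lastViewRefreshTime = now
        return oldestEntityToDisplay
    }
}
```

Injecting `now` is a design choice for testability; in the app you would pass `Date()`.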

Related

SwiftUI - How can I know when my FetchedResults changes?

I'd like to pass the newest fetched data of my Core Data entities to my model, in order to keep them in sync.
Is this possible?
The reason is that I have many variables that have to be calculated from the data saved in Core Data. These values are used in my views, so they should update at the same time.
(So far the only way I have found is to pass them around with functions every time, but I find this very chaotic...)
So far:
func doSomethingWithFetchedData(fetchedData: FetchedResults<Entity>) {
//return what I need
}
Thanks!

NSFetchedResultsController: Subscribing to updates for many objects matching a fetch request has been easier than subscribing to updates from a single managed object, thanks to NSFetchedResultsController. It comes with a delegate that informs us about changes to the underlying data in a structured way, because it was designed to integrate with tables and collection views.
Here is a good link to start with

Core data slow processing updates on a background thread

I am having a major problem with my application's speed when processing updates on a background thread. Instruments shows that almost all of this time is spent inside performBlockAndWait, where I fetch the objects that need updating.
My updates may come in by the hundreds depending on the amount of time spent offline, and my current approach is to process them individually, i.e. a fetch request to pull out the object, update it, then save.
It sounds slow, and it is. The problem is that I don't want to load everything into memory at once, so I need to fetch the objects individually as I go. I also save as I go to ensure that if there is an issue with a single update, it won't mess up the rest.
Is there a better approach?

I hit similar slow performance when upserting a large collection of objects. In my case I'm willing to keep the full change set in memory and perform a single save, so the large volume of fetch requests dominated my processing time.
I got a significant performance improvement by maintaining an in-memory cache mapping my resources' primary keys to NSManagedObjectIDs. That allowed me to use existingObjectWithID:error: rather than a fetch request for each individual object.
I suspect I might do even better by collecting the primary keys for all resources of a given entity description, issuing a single fetch request for all of them at once (batching those results as necessary), and then processing the changes to each resource.
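The batched lookup described in the last paragraph can be sketched independently of Core Data. In this hypothetical helper, `fetchBatch` stands in for one fetch request with a predicate like `primaryKey IN %@` (the attribute name is an assumption), and the returned dictionary plays the role of the in-memory cache from primary keys to objects or NSManagedObjectIDs.

```swift
/// Splits primary keys into fixed-size batches, performs one lookup per batch
/// (standing in for a single `IN` fetch request), and merges the results into
/// a key -> object map that later updates can consult without further fetching.
func buildLookupMap<Key: Hashable, Object>(
    keys: [Key],
    batchSize: Int,
    fetchBatch: ([Key]) -> [(Key, Object)]
) -> [Key: Object] {
    precondition(batchSize > 0, "batchSize must be positive")
    var map: [Key: Object] = [:]
    var start = 0
    while start < keys.count {
        let end = min(start + batchSize, keys.count)
        // One round trip per batch instead of one per key.
        for (key, object) in fetchBatch(Array(keys[start..<end])) {
            map[key] = object
        }
        start = end
    }
    return map
}
```

Keys missing from the map after the lookup correspond to records that need to be inserted rather than updated. The batch size bounds how many objects are faulted in at once, which addresses the memory concern in the question.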

You may benefit from using NSBatchUpdateRequest, assuming you're targeting iOS 8+ only.
These guys have a great example of it, but the TL;DR is basically:
Example: Say we want to update all unread instances of MyObject to be marked as read:
NSBatchUpdateRequest *req = [[NSBatchUpdateRequest alloc] initWithEntityName:@"MyObject"];
req.predicate = [NSPredicate predicateWithFormat:@"read == %@", @(NO)];
req.propertiesToUpdate = @{
    @"read" : @(YES)
};
req.resultType = NSUpdatedObjectsCountResultType;
NSBatchUpdateResult *res = (NSBatchUpdateResult *)[context executeRequest:req error:nil];
NSLog(@"%@ objects updated", res.result);
Note the above example is taken from the aforementioned blog, I didn't write the snippet.

Core Data fetch predicate nil check failing/unexpected results?

I have a Core Data layer with several thousand entities, constantly syncing to a server. The sync process uses fetch requests to check for deleted_at for the purposes of soft-deletion. There is a single context performing save operations in a performBlockAndWait call. The relationship mapping is handled by the RestKit library.
The CoreDataEntity class is a subclass of NSManagedObject, and it is also the superclass for all our different core data object classes. It has some attributes that are inherited by all our entities, such as deleted_at, entity_id, and all the boilerplate fetch and sync methods.
My issue is some fetch requests seem to return inconsistent results after modifications to the objects. For example after deleting an object (setting deleted_at to the current date):
[CoreDataEntity fetchEntitiesWithPredicate:[NSPredicate predicateWithFormat:@"deleted_at==nil"]];
returns results with deleted_at == [NSDate today].
I have successfully worked around this behavior by additionally looping through the results and removing the entities with deleted_at set, however I cannot fix the converse issue:
[CoreDataEntity fetchEntitiesWithPredicate:[NSPredicate predicateWithFormat:@"deleted_at!=nil"]];
returns an empty array under the same conditions, preventing a server sync from succeeding.
I have confirmed deleted_at is set on the object, and the context save was successful. I just don't understand where to reset whatever cache is causing the outdated results?
Thanks for any help!
Edit: To add a little more information, it appears that once one of these objects becomes corrupted, the only way to get it to register is to modify the value again. Could this be some sort of Core Data index not updating when a value is modified?
Update: It appears to be a problem with RestKit https://github.com/RestKit/RestKit/issues/2218

You are apparently using some syntactic-sugar extension to Core Data. I suppose that in your case it is SheepData, right?
There, fetchEntitiesWithPredicate: is implemented as follows:
+ (NSArray*)fetchEntitiesWithPredicate:(NSPredicate*)aPredicate
{
    return [self fetchEntitiesWithPredicate:aPredicate inContext:[SheepDataManager sharedInstance].managedObjectContext];
}
Are you sure that [SheepDataManager sharedInstance].managedObjectContext receives all the changes that you are making to your objects? Does it receive save notifications, or is it a child context of your saving context?
Try to replace your fetch one-liner with this:
[<your saving context> performBlockAndWait:^{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"CoreDataEntity"];
    request.predicate = [NSPredicate predicateWithFormat:@"deleted_at==nil"];
    NSArray *results = [<your saving context> executeFetchRequest:request error:NULL];
}];

First, after a save, have you looked in the store to make sure your changes are there? Without seeing your entire Core Data stack it is difficult to get a solid understanding of what might be going wrong. If you are saving and you see the changes in the store, then the question turns to your contexts: how and when are they built? If you are dealing with sibling contexts, that could be causing your issue.
More detail is required as to how your core data stack looks.
Yes, the changes are there. As I mentioned in the question, I can loop through my results and successfully remove all those with deleted_at set.
That wasn't my question. There is a difference between looking at objects in memory and looking at them in the SQLite file on disk. The questions I have about this behavior are:
Are the changes being persisted to disk before you query for them again?
Are you working with multiple contexts and potentially trying to fetch from a stale sibling?
Thus my questions about on disk changes and what your core data stack looks like.
Threading
If you are using one context, are you using more than one thread in your app? If so, are you using that context on more than one thread?
I can see a situation where, if you are violating the thread-confinement rules, you could be corrupting data like this.

Try adding an extra attribute deleted that is a bool with a default of false. Then the attribute is always set and you can look for entities that are either true or false depending on your needs at the moment. If the value is true then you can look at deleted_at to find out when.
Alternatively, try setting the deleted_at attribute to some old date (perhaps 1 Jan 1980); then anything that isn't deleted will have a fixed date that is too old to have been set by the user.
Edit: There is likely some issue with deleted_at never having been touched on some entities that is confusing the system. It is also possible that you have set the fetch request to return results as dictionaries, in which case recent changes will not be reflected in the fetch results.

How to sync data from web service with Core Data?

I'm trying to sync my data from a web service in a simple way. I download my data using AFNetworking, and using a unique identifier on each object, I want to either insert, delete or update that data.
The problem is that with Core Data you have to actually insert objects into an NSManagedObjectContext to instantiate NSManagedObjects, like this:
MyModel *model = (MyModel *)[NSEntityDescription insertNewObjectForEntityForName:@"MyModel" inManagedObjectContext:moc];
model.value = [jsonDict objectForKey:@"value"];
So when I get the data from the web service, I insert it right away into Core Data. There's no real syncing going on: I just delete everything beforehand and then insert what my web service returns.
I guess there's a better way of doing this, but I don't know how. Any help?

You are running into the classic insert/update/delete paradigm.
The answer is, it depends. If you get a chunk of JSON data, then you can use KVC to extract the unique IDs from that chunk and do a fetch against your context to find out what already exists. From there it is a simple loop over the chunk of data, inserting and updating as appropriate.
If you do not get the data in a nice chunk like that, then you will probably need to do a fetch for each record to determine whether it is an insert or an update. That is far more expensive and should be avoided. Batch fetching beforehand is recommended.
Deleting is just about as expensive as fetching/updating since you need to fetch the objects to delete them anyway so you might as well handle updating properly instead.
Update
Yes there is an efficient way of building the dictionary out of the Core Data objects. Once you get your array of existing objects back from Core Data, you can turn it into a dictionary with:
NSArray *array = ...; //Results from Core Data fetch
NSDictionary *objectMap = [NSDictionary dictionaryWithObjects:array forKeys:[array valueForKey:@"identifier"]];
This assumes that you have an attribute called identifier in your Core Data entity. Change the name as appropriate.
With that one line of code you now have all of your existing objects in a NSDictionary that you can then look up against as you walk the JSON.
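Walking the JSON against that dictionary might look like the following plain-Swift sketch. `MyModel` here is a hypothetical class standing in for the managed object, the JSON is simplified to `[[String: String]]`, and `existing` represents the results of the single batch fetch. In a real app the insert branch would create the object in the context instead. Note that `Dictionary(uniqueKeysWithValues:)` traps on duplicate keys, so this assumes identifiers are unique.

```swift
// Hypothetical plain-Swift class standing in for the MyModel managed object.
final class MyModel {
    let identifier: String
    var value: String
    init(identifier: String, value: String) {
        self.identifier = identifier
        self.value = value
    }
}

/// Walks the JSON records once, updating objects that already exist and
/// creating the rest. Records missing either key are skipped.
func upsert(json: [[String: String]], existing: [MyModel]) -> (inserted: [MyModel], updated: Int) {
    // Swift counterpart of dictionaryWithObjects:forKeys:, keyed by identifier.
    let objectMap = Dictionary(uniqueKeysWithValues: existing.map { ($0.identifier, $0) })
    var inserted: [MyModel] = []
    var updated = 0
    for record in json {
        guard let id = record["identifier"], let value = record["value"] else { continue }
        if let model = objectMap[id] {
            model.value = value          // existing object: update in place
            updated += 1
        } else {
            inserted.append(MyModel(identifier: id, value: value))  // new object
        }
    }
    return (inserted, updated)
}
```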

The easiest thing to do is to restore the JSON into an entity that maps properly to it. Once you've mapped it, determine whether an object matching the entity's ID already exists; if so, fetch that entity and merge the changes. If not, create a new entity in Core Data and restore the JSON to it.
I'm building an app where I do client-side syncing with Evernote. They keep a syncUpdate number on all of their objects and at the server level. So when I start my sync, I check whether my client's syncUpdate count is less than the server's. If so, I know I am out of sync. If my updateCount is at 400 and the server is at 410, I ask the server for all objects between updateCount 400 and 410. Then I check whether I already have each object and perform my update/create.
Every time an object is modified on the server, that object's updateCount is incremented, along with the server's.
The server also keeps a time stamp of the last update, which I can check against also.
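The update-count check can be sketched in Swift as a pure function. `ServerRecord` and `recordsToSync` are hypothetical names, and the sketch assumes that "between 400 and 410" means counts strictly greater than the client's last seen value, up to and including the server's current value.

```swift
/// Hypothetical server record carrying the update count assigned when it was last modified.
struct ServerRecord: Equatable {
    let id: String
    let updateCount: Int
    let payload: String
}

/// Returns the records the client must apply, oldest change first.
/// An empty result means the client is already in sync.
func recordsToSync(clientCount: Int, serverCount: Int, all: [ServerRecord]) -> [ServerRecord] {
    // Client is up to date: nothing to request.
    guard clientCount < serverCount else { return [] }
    return all
        .filter { $0.updateCount > clientCount && $0.updateCount <= serverCount }
        .sorted { $0.updateCount < $1.updateCount }
}
```

Each returned record then goes through the same "do I already have it?" check described above before an update or create.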

Core Data: delete all objects of an entity type, ie clear a table

This has been asked before, but no solution described that is fast enough for my app needs.
In the communications protocol we have set up, the server sends down a new set of all customers every time a sync is performed. Previously we stored them as a plist. Now we want to use Core Data.
There can be thousands of entries. Deleting each one individually takes a long time. Is there a way to delete all rows in a particular table in Core Data?
delete from customer
This call in SQLite happens instantly. Going through each one individually in Core Data can take 30 seconds on an iPad 1.
Is it reasonable to shut down Core Data, i.e. drop the persistence store and all managed object contexts, then drop into sqlite and perform the delete command against the table? No other activity is going on during this process so I don't need access to other parts of the database.

Dave DeLong is an expert at, well, just about everything, and so I feel like I'm telling Jesus how to walk on water. Granted, his post is from 2009, which was a LONG time ago.
However, the approach in the link posted by Bot is not necessarily the best way to handle large deletes.
Basically, that post suggests to fetch the object IDs, and then iterate through them, calling delete on each object.
The problem is that when you delete a single object, it has to go handle all the associated relationships as well, which could cause further fetching.
So, if you must do large scale deletes like this, I suggest adjusting your overall database so that you can isolate tables in specific core data stores. That way you can just delete the entire store, and possibly reconstruct the small bits that you want to remain. That will probably be the fastest approach.
However, if you want to delete the objects themselves, you should follow this pattern...
Do your deletes in batches, inside an autorelease pool, and be sure to pre-fetch any cascaded relationships. All these, together, will minimize the number of times you have to actually go to the database, and will, thus, decrease the amount of time it takes to perform your delete.
In the suggested approach, which comes down to...
Fetch ObjectIds of all objects to be deleted
Iterate through the list, and delete each object
If you have cascade relationships, you will encounter a lot of extra trips to the database, and IO is really slow. You want to minimize the number of times you have to visit the database.
While it may initially sound counterintuitive, you want to fetch more data than you think you want to delete. The reason is that all that data can be fetched from the database in a few IO operations.
So, on your fetch request, you want to set...
[fetchRequest setRelationshipKeyPathsForPrefetching:@[@"relationship1", @"relationship2", .... , @"relationship3"]];
where those relationships represent all the relationships that may have a cascade delete rule.
Now, when your fetch is complete, you have all the objects that are going to be deleted, plus the objects that will be deleted as a result of those objects being deleted.
If you have a complex hierarchy, you want to prefetch as much as possible ahead of time. Otherwise, when you delete an object, Core Data is going to have to fetch each relationship individually for each object so that it can manage the cascade delete.
This will waste a TON of time, because you will do many more IO operations as a result.
Now, after your fetch has completed, loop through the objects and delete them. For large deletes you can see an order-of-magnitude speedup.
In addition, if you have a lot of objects, break the work up into multiple batches and do it inside an autorelease pool.
Finally, do this on a separate background thread so your UI does not block. You can use a separate MOC connected to a persistent store coordinator, and have the main MOC handle DidSave notifications to remove the objects from its context.
While this looks like code, treat it as pseudo-code...
NSManagedObjectContext *deleteContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
// Get a new PSC for the same store
deleteContext.persistentStoreCoordinator = getInstanceOfPersistentStoreCoordinator();
// Each call to performBlock executes in its own autoreleasepool, so we don't
// need to explicitly use one if each chunk is done in a separate performBlock
__block void (^block)(void) = ^{
    NSError *error = nil;
    // MyEntity is the entity being cleared (same name as the loop below)
    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"MyEntity"];
    // Only fetch the number of objects to delete this iteration
    fetchRequest.fetchLimit = NUM_ENTITIES_TO_DELETE_AT_ONCE;
    // Prefetch all the relationships
    fetchRequest.relationshipKeyPathsForPrefetching = prefetchRelationships;
    // Don't need all the properties
    fetchRequest.includesPropertyValues = NO;
    NSArray *results = [deleteContext executeFetchRequest:fetchRequest error:&error];
    if (results.count == 0) {
        // Didn't get any objects for this fetch
        if (nil == results) {
            // Handle error
        }
        // Done: break the block's strong reference to itself so it can be released
        block = nil;
        return;
    }
    for (MyEntity *entity in results) {
        [deleteContext deleteObject:entity];
    }
    if (![deleteContext save:&error]) {
        // Handle error; stop here, otherwise the same objects would be fetched forever
        block = nil;
        return;
    }
    [deleteContext reset];
    // Keep deleting objects until they are all gone
    [deleteContext performBlock:block];
};
[deleteContext performBlock:block];
Of course, you need to do appropriate error handling, but that's the basic idea.
Fetch in batches if you have so much data to delete that it will cripple memory.
Don't fetch all the properties.
Prefetch relationships to minimize IO operations.
Use autoreleasepool to keep memory from growing.
Prune the context.
Perform the task on a background thread.
If you have a really complex graph, make sure you prefetch all the cascaded relationships for all entities in your entire object graph.
Note, your main context will have to handle DidSave notifications to keep its context in step with the deletions.
EDIT
Thanks. Lots of good points. All well explained except, why create the separate MOC? Any thoughts on not deleting the entire database, but using sqlite to delete all rows from a particular table? – David
You use a separate MOC so the UI is not blocked while the long delete operation is happening. Note that when the actual commit to the database happens, only one thread can be accessing the database, so any other access (like fetching) will block behind any updates. This is another reason to break the large delete operation into chunks. Small pieces of work will give other MOC(s) some chance to access the store without having to wait for the whole operation to complete.
If this causes problems, you can also implement priority queues (via dispatch_set_target_queue), but that is beyond the scope of this question.
As for using sqlite commands on the Core Data database, Apple has repeatedly said this is a bad idea, and you should not run direct SQL commands on a Core Data database file.
Finally, let me note this. In my experience, I have found that when I have a serious performance problem, it is usually a result of either poor design or improper implementation. Revisit your problem, and see if you can redesign your system somewhat to better accommodate this use case.
If you must send down all the data, perhaps query the database in a background thread and filter the new data so you break your data into three sets: objects that need modification, objects that need deletion, and objects that need to be inserted.
This way, you are only changing the database where it needs to be changed.
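The three-way split can be computed with set operations before any Core Data work happens. This sketch assumes each record is reduced to a unique identifier mapped to a comparable value (a hypothetical simplification; real records would compare several attributes), and it leaves unchanged records out of all three sets.

```swift
/// Partitions incoming server data against what is already stored, keyed by a
/// unique identifier. Records present on both sides with equal values land in
/// none of the sets, so only rows that actually differ touch the database.
func partition(stored: [String: String], incoming: [String: String])
    -> (toInsert: Set<String>, toModify: Set<String>, toDelete: Set<String>) {
    let storedKeys = Set(stored.keys)
    let incomingKeys = Set(incoming.keys)
    // New on the server, not yet stored locally.
    let toInsert = incomingKeys.subtracting(storedKeys)
    // Stored locally, no longer sent by the server.
    let toDelete = storedKeys.subtracting(incomingKeys)
    // Present on both sides, but with a different value.
    let toModify = storedKeys.intersection(incomingKeys).filter { stored[$0] != incoming[$0] }
    return (toInsert, toModify, toDelete)
}
```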
If the data is almost brand new every time, consider restructuring your database so that these entities have their own database (I assume your database already contains multiple entities). That way you can just delete the file and start over with a fresh database. That's fast. Reinserting several thousand objects, however, is not going to be fast.
You have to manage any relationships manually, across stores. It's not difficult, but it's not automatic like relationships within the same store.
If I did this, I would first create the new database, then tear down the existing one, replace it with the new one, and then delete the old one.
If you are only manipulating your database via this batch mechanism, and you do not need object graph management, then maybe you want to consider using sqlite instead of Core Data.

iOS 9 and later
Use NSBatchDeleteRequest. I tested this in the simulator on a Core Data entity with more than 400,000 instances and the delete was almost instantaneous.
// Fetch all items in the entity and request to delete them
let fetchRequest = NSFetchRequest<NSFetchRequestResult>(entityName: "MyEntity")
let deleteRequest = NSBatchDeleteRequest(fetchRequest: fetchRequest)
// Get the context and coordinator from the app delegate
let appDelegate = UIApplication.shared.delegate as! AppDelegate
let myManagedObjectContext = appDelegate.managedObjectContext
let myPersistentStoreCoordinator = appDelegate.persistentStoreCoordinator
// Perform the delete; this works directly on the store, so objects already
// loaded into a context are not updated automatically
do {
    _ = try myPersistentStoreCoordinator.execute(deleteRequest, with: myManagedObjectContext)
} catch let error as NSError {
    print(error)
}
Note that the answer that @Bot linked to and that @JodyHagins mentioned has also been updated to this method.

Really your only option is to remove them individually. I use this method with a ton of objects and it is pretty fast. Here is a way someone does it by loading only the managed object IDs, which avoids unnecessary overhead and makes it faster.
Core Data: Quickest way to delete all instances of an entity

Yes, it's reasonable to delete the persistent store and start from scratch. This happens fairly quickly. What you can do is remove the persistent store (using the persistent store URL) from the persistent store coordinator, and then use the URL of the persistent store to delete the database file from your directory. I did it using NSFileManager's removeItemAtURL.
Edit: One thing to consider: make sure to disable/release the current NSManagedObjectContext instance, and to stop any other thread that might be doing something with an NSManagedObjectContext that uses the same persistent store. Your application will crash if a context tries to access the removed persistent store.
