Core data slow processing updates on a background thread

Core data slow processing updates on a background thread - ios

I am having a major problem with my application speed in processing updates on a background thread. Instruments shows that almost all of this time is spend inside performBlockAndWait where I am fetching out the objects which need updating.
My updates may come in by the hundreds depending on the amount of time offline and the approach I am currently using is to process them individually; ie fetch request to pull out the object, update, then save.
It sounds slow and it is. The problem I have is that I don't want to load everything into memory at once, so need to fetch them individually as we go, also I save as I go to ensure that if there is an issue with a single update it won't mess up the rest.
Is there a better approach?

I hit similar slow performance when upserting a large collection of objects. In my case I'm willing to keep the full change set in memory and perform a single save so the large volume of fetch requests dominated my processing time.
I got a significant performance improvement from maintaining an in memory cache mapping my resources' primary keys to NSManagedObjectIDs. That allowed me to use existingObjectWithId:error: rather than a fetch request for an individual object.
I suspect I might do even better by collecting the primary keys for all resources of a given entity description, issuing a single fetch request for all of them at once (batching those results as necessary), and then processing the changes to each resource.

You may benefit from using NSBatchUpdateRequest assuming you're targeting iOS 8+ only.
These guys have a great example of it but the TLDR is basically:
Example: Say we want to update all unread instances of MyObject to be marked as read:
NSBatchUpdateRequest *req = [[NSBatchUpdateRequest alloc] initWithEntityName:#"MyObject"];
req.predicate = [NSPredicate predicateWithFormat:#"read == %#", #(NO)];
req.propertiesToUpdate = #{
#"read" : #(YES)
};
req.resultType = NSUpdatedObjectsCountResultType;
NSBatchUpdateResult *res = (NSBatchUpdateResult *)[context executeRequest:req error:nil];
NSLog(#"%# objects updated", res.result);
Note the above example is taken from the aforementioned blog, I didn't write the snippet.

Related

Multi-threading with core data and API requests

Intro
I've read alot of tutorials and articles on Core Data concurrency, but I'm having an issue that is not often covered, or covered in a real-world way that I am hoping someone can help with. I've checked the related questions in SO and none give an answer to this particular question that I can find.
Background
We have an existing application which fetches data from an API (in the background thread) and then saves the records returned into core data. We also need to display these records in the application at the time.
So the process we currently go through is to:
Make a network request for data (background)
Parse the data and map the objects to NSManagedObjects and save (background)
In the completion handler (main thread) we fetch records from core data with the same order and limit that we requested from the API.
Most tutorials on core data concurrency follow this pattern of saving in one thread and then fetching in another, but most of them give examples like:
NSArray *listOfPeople = ...;
[NSManagedObjectHelper saveDataInBackgroundWithContext:^(NSManagedObjectContext *localContext){
for (NSDictionary *personInfo in listOfPeople)
{
PersonEntity *person = [PersonEntity createInContext:localContext];
[person setValuesForKeysWithDictionary:personInfo];
}
} completion:^{
self.people = [PersonEntity findAll];
}];
Source
So regardless of the amount of records you get back, you just fetch all content. This works for small datasets, but I want to be more efficient. I've read many times not to read/write data across threads, so fetching afterwards gets around this issue, but I don't want to fetch all, I just want the new records.
My Problem
So, for my real world example. I want to make a request to my API for the latest information (maybe anything older than my oldest record in core data) and save it, them I need the exact data returned from the API in the main thread ready for display.
So my question is, When I reach my completion handler, how do I know what to fetch? or what did the API return?. A couple of methods I've considered so far:
after saving each record, store the ID in a temporary array and then perform some fetch where id IN array_of_ids.
If I am asking for the latest records, I could just use the count of records returned, then use an order by and limit in my request to the latest x records.
My Question
I realize that the above could be answering my own question but I want to know if there is a better way, or is one of those methods much better to use than the other? I just have this feeling that I am missing something
Thanks
EDIT:
Neither answer below actually addresses the question, This is to do with fetching and saving data in the background and then using the returned data in the main thread. I know it's not a good idea to pass data between threads, so the common way around this is to fetch from core data after inserting. I want to work out the more efficient way.

Have you checked NSFetchedResultsController? Instead of fetching presented objects into array, you will use fetched controller in similar fashion. Through NSFetchedResultsControllerDelegate you would be notified about all the changes performed in background (rows added, removed, changed) and no manual tracking would be needed.

I feel You missing case with two silmultaneous API calls. Both storring ids and counting created enities wont work for that case. Consider adding timestamp property for each PersonEntity.
Assuming that Your intention is to display recently updated persons.
The calcutation of the oldest timestamp to display can look like this:
#property NSDate *lastViewRefreshTime;
#property NSDate *oldestEntityToDisplay;
(...)
if (self.lastViewRefreshTime.timeIntervalSinceNow < -3) {
self.oldestEntityToDisplay = self.lastViewRefreshTime;
}
self.lastViewRefreshTime = [NSDate date];
[self displayPersonsAddedAfter: self.oldestEntityToDisplay];
Now, if two API responses returns in period shorter than 3s their data will be displayed together.

RestKit, Core Data, Magical Record, Lots of data and Lagging UI

I got an app which stores data about restaurants. Their menu, schedule and so on.
Theres ~200 restaurants in the DB by now. The app used to retrieve those places by a single shot, but it was taking too much time to load, so i decided to load the data one by one. On the start app asks the server for an array of place's ids, then get the data via API.
I set completion blocks for RK's operations to background queues, but it didn't help. Scrolls are running not smoothy enough and sometimes the app even deadlocks(sic!) or crashes with no error output in the console. I've tried using Instruments to locate what makes UI go jerky, but didn't succeed. I even turned off image loading and sorting functions. I used to have RK's requests calls wrapped in #synchronized, so i removed it, but no effect.
Never thought i'd be facing this kind of issue after few weeks of experience with objective-c. I've tried every possible way that came to my mind, so now im kinda giving up.
Please help me :)
The code below gets called 200 times right after the app start.
NSURLRequest *request = [[DEAPIService sharedInstance].manager requestWithObject:nil
method:RKRequestMethodGET
path:path
parameters:params];
//DLog(request.URL.absoluteString);
NSManagedObjectContext *context = [NSManagedObjectContext MR_contextWithParent:[NSManagedObjectContext MR_contextForCurrentThread]];
RKManagedObjectRequestOperation *operation = [[DEAPIService sharedInstance].manager managedObjectRequestOperationWithRequest:request
managedObjectContext:context
success:success
failure:failure];
// dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0);
dispatch_queue_t backgroundQueue = dispatch_queue_create("com.name.bgqueue", NULL);
operation.successCallbackQueue = backgroundQueue;
[[DEAPIService sharedInstance].manager enqueueObjectRequestOperation:operation];
Response descriptors are set before that. And also there's many relations in the DB. I've looked up the DB's file size - it's ~1.5Mb. I wonder what will happen if there would be over 1k restaurants. Is it a good way to load this kind of data? What are the best practises?

Ok, from the information provided you should be able to simplify a lot. At the moment your code is dropping down to a low level and getting involved in threading and context management that you want to leave for RestKit to sort out. You have a properly configured object manager and core data stack by the looks of things so you should let it do the work.
This means removing the request, context and request operation code and simply calling `getObjectsAtPath:parameters:success:failure:'. RestKit will then deal with all of the download and mapping in the background and save the context.
You should also really be using fetched results controllers throughout your app, and if you are they will automatically detect the changes that RestKit has saved and update your UI.
Once you have that, any blocking of the UI is unrelated to RestKit and your download requirement and should be related to subsequent image management / downloads.

Deleting Core Data after X amount of days

So I have a bunch of objects in Core Data and want them to auto delete after X amount of days (this would be based off of an NSDate). I did some searching and it seems that you can only delete one core data object at a time, not a group of them, let alone ones that are based off of a certain date. I'm thinking maybe to have a loop running going through each object - but that seems like it would be very processor heavy. Any ideas on where I should be looking to do this? Thanks.

A loop deleting objects one by one is the correct approach.
Deleting objects in Core Data is extremely processor heavy. If that's a problem, then Core Data is not suitable for your project, and you should use something else. I recommend FCModel, as a light weight alternative that is very efficient.
If you are going to stick with Core Data, it's a good idea to perform large operations on a background NSOperationQueue, so the main application is not locked up while deleting the objects. You need to be very careful with Core Data across multiple threads, the approach is to have a separate managed object context for each thread, both using the same persistent store coordinator. Do not ever share a managed object across threads, but you can share the objectID, to fetch a second copy of the same database record on the other managed object context.
Basically your background thread creates a new context, deletes all the objects in a loop, then (on the main thread preferably, see documentation) save the background thread context. This will merge your changes unless there is a conflict (both contexts modify the same object) — in that scenario you have a few options, I'd just abort the entire delete operation and start again.
Apple has good documentation available for all the issues and sample code available here: https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/Articles/cdConcurrency.html
It's a bit daunting, you need to do some serious homework, but the actual code is very simple once you've got your head around how everything works. Or just use FCModel, which is designed for fast batch operations.

It's not as processor heavy as you may think :) (of course it depends of data amount)
Feel free to use loop
- (void)deleteAllObjects
{
NSArray *allEntities = self.managedObjectModel.entities;
for (NSEntityDescription *entityDescription in allEntities)
{
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
[fetchRequest setEntity:entityDescription];
fetchRequest.includesPropertyValues = NO;
fetchRequest.includesSubentities = NO;
NSError *error;
NSArray *items = [self.managedObjectContext executeFetchRequest:fetchRequest error:&error];
for (NSManagedObject *managedObject in items) {
[self.managedObjectContext deleteObject:managedObject];
}
if (![self.managedObjectContext save:&error]) {
NSLog(#"Error occurred");
}
}
}

As others have noted, iterating over the objects is the only way to actually delete the objects in Core Data. This is one of those use cases where Core Data's approach kind of falls down, because it's just not optimized for that kind of use.
But there are ways to deal with it to avoid unwanted delays in your app, so that the user doesn't have to wait while your code chugs over a ton of delete requests.
If you have a lot of objects that need to be deleted and you don't want to have to wait until the process is complete, you can fake the initial delete at first and then later do the actual delete when it's convenient. Something like:
Add a custom boolean attribute to the entity type called something like toBeDeleted with a default value of NO.
When you have a bunch of objects to delete, set toBeDeleted to YES on all of them in a single step by using NSBatchUpdateRequest (new in iOS 8). This class is mostly undocumented, so look at the header file or at the BNR blog post about it. You'll specify the property name and the new attribute value, and Core Data will do a mass, quick update.
Make sure your fetch requests all check that toBeDeleted is NO. Now objects marked for deletion will be excluded when fetching even though they still exist.
At some point-- later on, but soon-- run some code in the background that fetches and deletes objects that have toBeDeleted set to YES.

Core Data: delete all objects of an entity type, ie clear a table

This has been asked before, but no solution described that is fast enough for my app needs.
In the communications protocol we have set up, the server sends down a new set of all customers every time a sync is performed. Earlier, we had been storing as a plist. Now want to use Core Data.
There can be thousands of entries. Deleting each one individually takes a long time. Is there a way to delete all rows in a particular table in Core Data?
delete from customer
This call in sqlite happens instantly. Going through each one individually in Core Data can take 30 seconds on an iPad1.
Is it reasonable to shut down Core Data, i.e. drop the persistence store and all managed object contexts, then drop into sqlite and perform the delete command against the table? No other activity is going on during this process so I don't need access to other parts of the database.

Dave DeLong is an expert at, well, just about everything, and so I feel like I'm telling Jesus how to walk on water. Granted, his post is from 2009, which was a LONG time ago.
However, the approach in the link posted by Bot is not necessarily the best way to handle large deletes.
Basically, that post suggests to fetch the object IDs, and then iterate through them, calling delete on each object.
The problem is that when you delete a single object, it has to go handle all the associated relationships as well, which could cause further fetching.
So, if you must do large scale deletes like this, I suggest adjusting your overall database so that you can isolate tables in specific core data stores. That way you can just delete the entire store, and possibly reconstruct the small bits that you want to remain. That will probably be the fastest approach.
However, if you want to delete the objects themselves, you should follow this pattern...
Do your deletes in batches, inside an autorelease pool, and be sure to pre-fetch any cascaded relationships. All these, together, will minimize the number of times you have to actually go to the database, and will, thus, decrease the amount of time it takes to perform your delete.
In the suggested approach, which comes down to...
Fetch ObjectIds of all objects to be deleted
Iterate through the list, and delete each object
If you have cascade relationships, you you will encounter a lot of extra trips to the database, and IO is really slow. You want to minimize the number of times you have to visit the database.
While it may initially sound counterintuitive, you want to fetch more data than you think you want to delete. The reason is that all that data can be fetched from the database in a few IO operations.
So, on your fetch request, you want to set...
[fetchRequest setRelationshipKeyPathsForPrefetching:#[#"relationship1", #"relationship2", .... , #"relationship3"]];
where those relationships represent all the relationships that may have a cascade delete rule.
Now, when your fetch is complete, you have all the objects that are going to be deleted, plus the objects that will be deleted as a result of those objects being deleted.
If you have a complex hierarchy, you want to prefetch as much as possible ahead of time. Otherwise, when you delete an object, Core Data is going to have to go fetch each relationship individually for each object so that it can managed the cascade delete.
This will waste a TON of time, because you will do many more IO operations as a result.
Now, after your fetch has completed, then you loop through the objects, and delete them. For large deletes you can see an order of magnitude speed up.
In addition, if you have a lot of objects, break it up into multiple batches, and do it inside an auto release pool.
Finally, do this in a separate background thread, so your UI does not pend. You can use a separate MOC, connected to a persistent store coordinator, and have the main MOC handle DidSave notifications to remove the objects from its context.
WHile this looks like code, treat it as pseudo-code...
NSManagedObjectContext *deleteContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateConcurrencyType];
// Get a new PSC for the same store
deleteContext.persistentStoreCoordinator = getInstanceOfPersistentStoreCoordinator();
// Each call to performBlock executes in its own autoreleasepool, so we don't
// need to explicitly use one if each chunk is done in a separate performBlock
__block void (^block)(void) = ^{
NSFetchRequest *fetchRequest = //
// Only fetch the number of objects to delete this iteration
fetchRequest.fetchLimit = NUM_ENTITIES_TO_DELETE_AT_ONCE;
// Prefetch all the relationships
fetchRequest.relationshipKeyPathsForPrefetching = prefetchRelationships;
// Don't need all the properties
fetchRequest.includesPropertyValues = NO;
NSArray *results = [deleteContext executeFetchRequest:fetchRequest error:&error];
if (results.count == 0) {
// Didn't get any objects for this fetch
if (nil == results) {
// Handle error
}
return;
}
for (MyEntity *entity in results) {
[deleteContext deleteObject:entity];
}
[deleteContext save:&error];
[deleteContext reset];
// Keep deleting objects until they are all gone
[deleteContext performBlock:block];
};
[deleteContext preformBlock:block];
Of course, you need to do appropriate error handling, but that's the basic idea.
Fetch in batches if you have so much data to delete that it will cripple memory.
Don't fetch all the properties.
Prefetch relationships to minimize IO operations.
Use autoreleasepool to keep memory from growing.
Prune the context.
Perform the task on a background thread.
If you have a really complex graph, make sure you prefetch all the cascaded relationships for all entities in your entire object graph.
Note, your main context will have to handle DidSave notifications to keep its context in step with the deletions.
EDIT
Thanks. Lots of good points. All well explained except, why create the
separate MOC? Any thoughts on not deleting the entire database, but
using sqlite to delete all rows from a particular table? – David
You use a separate MOC so the UI is not blocked while the long delete operation is happening. Note, that when the actual commit to the database happens, only one thread can be accessing the database, so any other access (like fetching) will block behind any updates. This is another reason to break the large delete operation into chunks. Small pieces of work will provide some chance for other MOC(s) to access the store without having to wait for the whole operation to complete.
If this causes problems, you can also implement priority queues (via dispatch_set_target_queue), but that is beyond the scope of this question.
As for using sqlite commands on the Core Data database, Apple has repeatedly said this is a bad idea, and you should not run direct SQL commands on a Core Data database file.
Finally, let me note this. In my experience, I have found that when I have a serious performance problem, it is usually a result of either poor design or improper implementation. Revisit your problem, and see if you can redesign your system somewhat to better accommodate this use case.
If you must send down all the data, perhaps query the database in a background thread and filter the new data so you break your data into three sets: objects that need modification, objects that need deletion, and objects that need to be inserted.
This way, you are only changing the database where it needs to be changed.
If the data is almost brand new every time, consider restructuring your database where these entities have their own database (I assume your database already contains multiple entities). That way you can just delete the file, and start over with a fresh database. That's fast. Now, reinserting several thousand objects is not going to be fast.
You have to manage any relationships manually, across stores. It's not difficult, but it's not automatic like relationships within the same store.
If I did this, I would first create the new database, then tear down the existing one, replace it with the new one, and then delete the old one.
If you are only manipulating your database via this batch mechanism, and you do not need object graph management, then maybe you want to consider using sqlite instead of Core Data.

iOS 9 and later
Use NSBatchDeleteRequest. I tested this in the simulator on a Core Data entity with more than 400,000 instances and the delete was almost instantaneous.
// fetch all items in entity and request to delete them
let fetchRequest = NSFetchRequest(entityName: "MyEntity")
let deleteRequest = NSBatchDeleteRequest(fetchRequest: fetchRequest)
// delegate objects
let myManagedObjectContext = (UIApplication.sharedApplication().delegate as! AppDelegate).managedObjectContext
let myPersistentStoreCoordinator = (UIApplication.sharedApplication().delegate as! AppDelegate).persistentStoreCoordinator
// perform the delete
do {
try myPersistentStoreCoordinator.executeRequest(deleteRequest, withContext: myManagedObjectContext)
} catch let error as NSError {
print(error)
}
Note that the answer that #Bot linked to and that #JodyHagins mentioned has also been updated to this method.

Really your only option is to remove them individually. I do this method with a ton of objects and it is pretty fast. Here is a way someone does it by only loading the managed object ID so it prevents any unnecessary overhead and makes it faster.
Core Data: Quickest way to delete all instances of an entity

Yes, it's reasonable to delete the persistent store and start from scratch. This happen fairly quick. What you can do is remove the persistent store (with the persistent store URL) from the persistent store coordinator, and then use the url of the persistent store to delete the database file from your directory folder. I did it using NSFileManager's removeItemAtURL.
Edit: one thing to consider: Make sure to disable/release the current NSManagedObjectContext instance, and to stop any other thread which might be doing something with a NSManagedObjectContext which is using the same persistent store. Your application will crash if a context tries to access the persistent store.

Fast Zero result filtering with Core Data

I'm building an interface to allow users to filter a set of Photos. The data model is above. I'd like the disable any controls that would result in 0 results.
To accomplish this, I'm running new fetch requests for every control that is off/not selected each time the user makes their own change. I add the data that the control represents to my NSCompoungPredicate, then remove it after I get the result. If the count of the result is 0, I disable that control.
I'm doing this all on the main thread so in some cases there is a bit of a lag in the app. Is there a better way to do this type of filtering with less overhead? Should I run these filter fetches on their own thread? I've never done that with CoreData and worm what I've read I need a separate context for that and I'm not sure how to go about setting up code for that.

A bit of code would help. Aside from that, here are a few suggestions.
First, on your fetch request, use countForFetchRequest:error: because it will just query the database, and return the count instead of object information.
Second, if you don't want to use threads, and the search is still too slow, you can do the initial query when the app starts. This will then enable/disable the various controls.
You can simple catch the context notifications that tell when data has changed, and update that information accordingly. Then, you don't have to do any queries at all. Just initialize, and update the status as objects are added/removed from the database.
If you want to use threads, then that's not really all that difficult.
It sounds like all you want is a thread that is just running queries. You setup a MOC...
NSManagedObjectContext *checkerMoc = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
checkerMoc.persistentStoreCoordinator = MyCurrentMoc.persistentStoreCoordinator;
Now, whenever you want to check the database...
[checkerMoc performBlock:^{
NSFetchRequest *fetchRequest = ...
// Do your fetch request... this block of code is running in the other thread
[checkerMoc fetch...];
// When the fetch request is done, do whatever you want in your UI...
dispatch_async(dispatch_get_main_queue(), ^{
// Now this code is running in the main thread... access your UI
self.myControl.enabled = fetchResultCount > 0;
});
}];
Note, you are using the same persistent store coordinator, so if the main thread tries to access the database, it will get stacked behind this request. You can also use a separate persistentStoreCoordinator for checkerMoc is this is an issue.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart