I'm parsing data from a JSON file that has approximately 20,000 objects. I've been running the time profiler to figure out where my bottlenecks are and speed up the parse, and I've managed to reduce the parse time by 45%. However, according to the time profiler, 78% of my time is being taken by context.save(), and much of the heavy work throughout the parse traces back to where I call NSEntityDescription.insertNewObjectForEntityForName.
Does anyone have any idea whether there's a way to speed this up? I'm currently batching my saves every 5,000 objects. I tried groupings of 100, 1,000, 2,000, 5,000, and 10,000, and found that 5,000 was optimal on the device I'm running. I've read through the Core Data Programming Guide, but most of the advice it gives is geared toward optimizing fetches over large data sets, not parsing or inserting.
The answer may well be that Core Data has its limitations, but I wanted to know whether anyone has found ways to further optimize inserting thousands of objects.
UPDATE
As requested, here is some sample code showing how I handle parsing:
class func parseCategories(data: NSDictionary, context: NSManagedObjectContext, completion: ((success: Bool) -> Void)) {
    let totalCategories = data.allValues.count
    var categoriesParsed = 0
    for category in data.allValues {
        let privateContext = NSManagedObjectContext(concurrencyType: NSManagedObjectContextConcurrencyType.PrivateQueueConcurrencyType)
        privateContext.persistentStoreCoordinator = (UIApplication.sharedApplication().delegate as! AppDelegate).persistentStoreCoordinator!
        privateContext.mergePolicy = NSMergeByPropertyStoreTrumpMergePolicy
        // Do the parsing for this iteration on a separate background queue
        privateContext.performBlock({ () -> Void in
            guard let categoryData = category.valueForKey("category") as? NSArray else {
                print("Fatal Error: could not parse the category data into an NSArray. This should never happen")
                completion(success: false)
                return
            }
            let newCategory = NSEntityDescription.insertNewObjectForEntityForName("Categories", inManagedObjectContext: privateContext) as! Categories
            newCategory.name = category.valueForKey("name") as? String ?? ""
            newCategory.sortOrder = category.valueForKey("sortOrder") as? NSNumber ?? -1
            SubCategory.parseSubcategories(category.valueForKey("subcategories") as! NSArray, parentCategory: newCategory, context: privateContext)
            do {
                print("Num Objects Inserted: \(privateContext.insertedObjects.count)") // Num is between 3-5k
                try privateContext.save()
            } catch {
                completion(success: false)
                return
            }
            categoriesParsed += 1
            if categoriesParsed == totalCategories {
                completion(success: true)
            }
        })
    }
}
In the above code, I loop through the top-level data objects, each of which I call a "Category", and spin off a background context for each so they parse concurrently. There are only three of these top-level objects, so it doesn't get too thread-heavy.
Each Category has SubCategories, and several further levels of child objects, which yields several thousand objects being inserted per Category.
My Core Data stack is configured with a single SQLite store, set up the standard way you get when you create a new app with Core Data.
One reason is that you're saving the managed object context on every iteration, which is expensive and not needed. Save it once, after the last item has been inserted.
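A minimal sketch of that suggestion, in the Swift 2 style of the question (the "Item" entity name, `items` array, and "name" key are illustrative, not from the question):

```swift
privateContext.performBlock {
    // Insert every object first, without saving...
    for itemDict in items {
        let item = NSEntityDescription.insertNewObjectForEntityForName(
            "Item", inManagedObjectContext: privateContext)
        item.setValue(itemDict["name"], forKey: "name")
    }
    // ...then pay the expensive save cost exactly once at the end.
    do {
        try privateContext.save()
    } catch {
        print("Save failed: \(error)")
    }
}
```

The trade-off is memory: unsaved objects accumulate in the context, so for very large imports a middle ground (save every N thousand objects and then reset the context) may work better, which is what the original batching experiment was measuring.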
Related
Today I was writing a simple test:
func testCountSales() {
    measureMetrics([XCTPerformanceMetric_WallClockTime], automaticallyStartMeasuring: false, for: {
        let employee = self.getEmployees()
        let employeeDetails = EmployeeDetailViewController()
        self.startMeasuring()
        _ = employeeDetails.salesCountForEmployees(employee)
        self.stopMeasuring()
    })
}

func getEmployees() -> Employee {
    let coreDataStack = CoreDataStack(modelName: "EmployeeDirectory")
    let request: NSFetchRequest<Employee> = NSFetchRequest(entityName: "Employee")
    request.sortDescriptors = [NSSortDescriptor(key: "guid", ascending: true)]
    request.fetchBatchSize = 1
    let results: [AnyObject]?
    do {
        results = try coreDataStack.mainContext.fetch(request)
    } catch _ {
        results = nil
    }
    return results![0] as! Employee
}
I wondered: does fetchBatchSize really work? When I inspected the result in the debugger, there was the full array (50 elements, as it was supposed to be), and all of them were faults. OK. Then I tried adding an observer for the FetchedResultsController's fetchedObjects count property:
var window: UIWindow?
lazy var coreDataStack = CoreDataStack(modelName: "EmployeeDirectory")
let amountToImport = 50
let addSalesRecords = true

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey : Any]?) -> Bool {
    importJSONSeedDataIfNeeded()
    guard let tabController = window?.rootViewController as? UITabBarController,
        let employeeListNavigationController = tabController.viewControllers?[0] as? UINavigationController,
        let employeeListViewController = employeeListNavigationController.topViewController as? EmployeeListViewController else {
            fatalError("Application storyboard mis-configuration. Application is mis-configured")
    }
    employeeListViewController.coreDataStack = coreDataStack
    employeeListViewController.addObserver(self, forKeyPath: "fetchedResultsController", options: [.new], context: nil)
    guard let departmentListNavigationController = tabController.viewControllers?[1] as? UINavigationController,
        let departmentListViewController = departmentListNavigationController.topViewController as? DepartmentListViewController else {
            fatalError("Application storyboard mis-configuration. Application is mis-configured")
    }
    departmentListViewController.coreDataStack = coreDataStack
    return true
}

override func observeValue(forKeyPath keyPath: String?, of object: Any?, change: [NSKeyValueChangeKey : Any]?, context: UnsafeMutableRawPointer?) {
    if keyPath == "fetchedResultsController" {
        print("Gold")
        let a = window?.rootViewController as? UITabBarController
        let employeeListNavigationController = a?.viewControllers?[0] as? UINavigationController
        let b = employeeListNavigationController?.topViewController as? EmployeeListViewController
        print(b?.fetchedResultsController.fetchedObjects?.count)
    }
}
It showed me that the count was nil, then right away 50, and apparently those objects are also faults. So why do we need fetchBatchSize, and when does it come into play? And how? If anyone has any idea, I would appreciate it very much.
When you specify a non-zero fetchBatchSize on a request, Core Data will query the underlying persistent store, and on receiving the result only the objects in the first fetchBatchSize range of the resulting array are fetched into memory; the other entries in the array are faults.
Quote from Apple:
If you set a nonzero batch size, the collection of objects returned
when an instance of NSFetchRequest is executed is broken into batches.
When the fetch is executed, the entire request is evaluated and the
identities of all matching objects recorded, but only data for objects
up to the batchSize will be fetched from the persistent store at a
time. The array returned from executing the request is a proxy object
that transparently faults batches on demand. (In database terms, this
is an in-memory cursor.)
For example, let's assume your query results in an array of 1,000 objects and you specified a fetchBatchSize of 50. Then out of the 1,000 objects, only the first 50 will be converted to actual objects/entities and loaded into memory, while the remaining 950 will simply be faults. When your code accesses the first 50 objects, they come directly from memory; when it accesses the 51st object, because that object isn't in memory (hence the fault), Core Data will gracefully construct the object by fetching the next batch of data from the persistent store and return it to your code.
Using fetchBatchSize you can control the number of objects that you deal with in memory while working with large data set.
It showed me that the count was nil, then right away 50, and apparently
those objects are also faults.
I believe you were expecting the FetchedResultsController to fetch 1 object because you specified a fetchBatchSize of 1, but it eventually reported 50, which is why you're confused.
fetchBatchSize does not alter the fetchedObjects count of a FetchedResultsController; it only affects the number of objects actually loaded into memory out of the result set. If you really want to control the number of objects fetched by the FetchedResultsController, you should consider using fetchLimit instead.
Read :
https://developer.apple.com/documentation/coredata/nsfetchrequest/1506622-fetchlimit
Quote from apple:
The fetch limit specifies the maximum number of objects that a request
should return when executed. If you set a fetch limit, the framework
makes a best effort to improve efficiency, but does not guarantee it.
For every object store except the SQL store, a fetch request executed
with a fetch limit in effect simply performs an unlimited fetch and
throws away the unasked for rows
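To make the contrast concrete, a minimal sketch using the question's Employee entity:

```swift
let request: NSFetchRequest<Employee> = NSFetchRequest(entityName: "Employee")

// fetchBatchSize: the result still contains ALL matching objects,
// but only 10 at a time are materialized in memory; the rest stay faults.
request.fetchBatchSize = 10

// fetchLimit: the result array itself is capped at 10 objects.
request.fetchLimit = 10
```

So fetchBatchSize tunes memory usage for a full result set, while fetchLimit actually shrinks the result set.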
Hope it helps.
I have a list of points of interest. These points were loaded from a Realm database. Each point should show its distance from the user's position.
Each time I get a new location, I recalculate the distance to all points. To avoid freezing the screen, I was doing the math on a background thread, then displaying the list in a table on the main thread.
func updatedLocation(currentLocation: CLLocation) {
    let qualityOfServiceClass = QOS_CLASS_BACKGROUND
    let backgroundQueue = dispatch_get_global_queue(qualityOfServiceClass, 0)
    dispatch_async(backgroundQueue, {
        for point in self.points {
            let stringDistance = self.distanceToPoint(currentLocation, destination: point.coordinate)
            point.stringDistance = stringDistance
        }
        dispatch_async(dispatch_get_main_queue(), { () -> Void in
            self.tableView?.reloadData()
        })
    })
}
However I get this error:
libc++abi.dylib: terminating with uncaught exception of type realm::IncorrectThreadException: Realm accessed from incorrect thread.
I know I am getting this error because I'm accessing the Realm objects on a background thread; however, they are already loaded into an array, and I never make a new query to the database.
In addition, the variable I'm updating is not saved to the database.
Any idea how to solve this? I wanted to avoid doing the math on the main thread.
Thanks in advance.
I assume you wrap the Realm Results object into an Array like the following:
let results = realm.objects(Point)
self.points = Array(results)
However, that is not enough, because each element in the array is still tied to Realm and cannot be accessed from another thread.
A recommended way is to re-create the Realm instance and re-fetch the Results on each thread:
dispatch_async(backgroundQueue, {
    let realm = try! Realm()
    let points = realm.objects(...)
    try! realm.write {
        for point in points {
            let stringDistance = self.distanceToPoint(currentLocation, destination: point.coordinate)
            point.stringDistance = stringDistance
        }
    }
    dispatch_async(dispatch_get_main_queue(), { () -> Void in
        ...
    })
})
Realm objects have a live-update feature: when changes are committed to Realm objects on a background thread, those changes are reflected in the corresponding objects on other threads immediately. So you do not need to re-run the query on the main thread; all you need to do is reload the table view.
If you'd like to wrap the array and pass it to another thread directly, you should copy all elements of the results as follows:
let results = realm.objects(Point)
self.points = results.map { (point) -> Point in
    return Point(value: point)
}
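With detached copies like these in self.points, the question's original background loop works unchanged, since the copies are plain objects with no thread confinement (a sketch; distanceToPoint and tableView are the question's own members, and it assumes the map-based copy above has already run):

```swift
dispatch_async(backgroundQueue, {
    // Each element is a standalone copy, so Realm's thread check never fires.
    for point in self.points {
        point.stringDistance = self.distanceToPoint(currentLocation, destination: point.coordinate)
    }
    dispatch_async(dispatch_get_main_queue(), { () -> Void in
        self.tableView?.reloadData()
    })
})
```

The cost of this approach is that the copies no longer live-update when the database changes, which is why re-fetching inside a write transaction is the recommended route.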
Basically I am trying to display 3 users. From the _User class I query the following: username, profilePicture, and name.
After that, I would like to query the last photo they posted from the Posts class.
Here is how I have set up my code:
let userQuery = PFQuery(className: "_User")
userQuery.limit = 3
userQuery.addDescendingOrder("createdAt")
userQuery.findObjectsInBackgroundWithBlock({ (objects: [PFObject]?, error: NSError?) -> Void in
    if error == nil {
        self.profilePicArray.removeAll(keepCapacity: false)
        self.usernameArray.removeAll(keepCapacity: false)
        self.fullnameArray.removeAll(keepCapacity: false)
        self.uuidArray.removeAll(keepCapacity: false)
        for object in objects! {
            self.profilePicArray.append(object.valueForKey("profilePicture") as! PFFile)
            self.usernameArray.append(object.valueForKey("username") as! String)
            self.fullnameArray.append(object.valueForKey("firstname") as! String)
            self.uuidArray.append(object.valueForKey("uuid") as! String)
        }
        let imageQuery = PFQuery(className: "Posts")
        imageQuery.whereKey("username", containedIn: self.usernameArray)
        imageQuery.findObjectsInBackgroundWithBlock({ (objects: [PFObject]?, error: NSError?) -> Void in
            if error == nil {
                self.lastPicArray.removeAll(keepCapacity: false)
                for object in objects! {
                    self.lastPicArray.append(object.valueForKey("image") as! PFFile)
                }
                self.collectionView.reloadData()
            } else {
                print(error!.localizedDescription)
            }
        })
    } else {
        print(error!.localizedDescription)
    }
})
}
But when I run it, it isn't showing the correct image for each user. I can't get my head around it; it's been driving me mad for several hours now!
The background thread isn't a problem for fetching your images, but your image query is not set to return images in any specific order, nor does it seem to limit how many photos will come back for each user (unless you have it set up so that the Post object can only hold one post per user). You could handle this in a few different ways, but the easiest way might be to sort your object array in some predictable order, using the username as the sort key, and then when you get your images back you can sort them in the same order based on their username property before putting them into the array that drives the collectionView.
One other note - you'll want to put your .reloadData() call back on the main thread or it'll happen at an unpredictable time.
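A hedged sketch of that sorting idea, in the question's Swift 2 style (it assumes the Posts objects also carry a "username" field, as the containedIn query implies, and that exactly one post comes back per user):

```swift
// Sort both result sets by the same key so that index i in each
// array refers to the same user.
let sortedUsers = userObjects.sort {
    ($0.valueForKey("username") as! String) < ($1.valueForKey("username") as! String)
}
let sortedPosts = postObjects.sort {
    ($0.valueForKey("username") as! String) < ($1.valueForKey("username") as! String)
}
// Now sortedUsers[i] and sortedPosts[i] belong to the same user, and the
// parallel arrays driving the collection view can be built from them.
```

Here `userObjects` and `postObjects` stand for the two `objects!` arrays from the nested queries; the names are illustrative.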
It might be caused by the two queries completing with different timing, since you use findObjectsInBackgroundWithBlock. Threads running in the background are hard to control, so the order of images in the imageQuery results differs from usernameArray. Or perhaps findObjectsInBackgroundWithBlock doesn't guarantee that data will be fetched in the same order as the keys you set; not 100% sure about this.
But try putting the image-fetching procedure inside the first for object in objects! loop. It might solve your problem.
As for the sorting answer from creeperspeak: you should be able to get the username back from the imageQuery results, just like object.valueForKey("username") in your first loop, and then match them against self.usernameArray.
I am using CloudKit as a server backend for my iOS application. I'm using it to house some relatively static data along with a handful of images (CKAsset). I ran into a problem when the time came to actually fetch those assets from the public database: they load at an excruciatingly slow speed.
My use case is to load an image into every cell inside of a collection view. The images are only 200kb in size, but the fetch process took an average of 2.2 seconds for the download to complete and set the image in a cell. For comparison, I took URLs of similar sized stock images and loaded them in using NSURLSession. It took a mere 0.18 - 0.25 seconds for each image to load.
I have tried multiple different ways of downloading the images from CK: direct fetch of the record, query, and operation query. All of them have similar results. I am also dispatching back to the main queue within the completion block prior to setting the image for the cell.
My database is setup to have a primary object with several fields of data. I then setup a backwards reference style system for the photos, where each photo just has a reference to a primary object. That way I can load the photos on demand without bogging down the main data.
It looks something like this:
Primary Object:
title: String, startDate: Date
Photo Object:
owner: String(reference to primary object), image: Asset
Here is an example request where I try to fetch one of the photos directly:
let publicDb = CKContainer.defaultContainer().publicCloudDatabase
let configRecordId = CKRecordID(recordName: "e783f542-ec0f-46j4-9e99-b3e3ez505adf")
publicDb.fetchRecordWithID(configRecordId) { (record, error) -> Void in
    dispatch_async(dispatch_get_main_queue()) {
        guard let photoRecord = record else { return }
        guard let asset = photoRecord["image"] as? CKAsset else { return }
        guard let photo = NSData(contentsOfURL: asset.fileURL) else { return }
        let image = UIImage(data: photo)!
        cell.cardImageView.image = image
    }
}
I can't seem to figure out why these image downloads are taking so long, but it's really quite the showstopper if I can't get them to load in a reasonable amount of time.
Update: I tried the fetch operation with a smaller image, 23kb. The fetch was faster, anywhere from 0.3 - 1.1 seconds. That's better, but still doesn't meet the expectation I had for what CloudKit should be able to provide.
I am using CKQueryOperation. I found that once I added the following line to my code, downloading CKAssets sped up by a factor of about 5-10x:
queryOperation.qualityOfService = .UserInteractive
Here is my full code:
func getReportPhotos(report: Report, completionHandler: (report: Report?, error: NSError?) -> ()) {
    let photo: Photo = report.photos![0] as! Photo
    let predicate: NSPredicate = NSPredicate(format: "recordID = %@", CKRecordID(recordName: photo.identifier!))
    let query: CKQuery = CKQuery(recordType: "Photo", predicate: predicate)
    let queryOperation: CKQueryOperation = CKQueryOperation()
    queryOperation.query = query
    queryOperation.resultsLimit = numberOfReportsPerQuery
    queryOperation.qualityOfService = .UserInteractive
    queryOperation.recordFetchedBlock = { record in
        photo.date = record.objectForKey("date") as? NSDate
        photo.fileType = record.objectForKey("fileType") as? String
        let asset: CKAsset? = record.objectForKey("image") as? CKAsset
        if asset != nil {
            let photoData: NSData? = NSData(contentsOfURL: asset!.fileURL)
            let photo: Photo = report.photos![0] as! Photo
            photo.image = UIImage(data: photoData!)
        }
    }
    queryOperation.queryCompletionBlock = { queryCursor, error in
        dispatch_async(dispatch_get_main_queue(), {
            completionHandler(report: report, error: error)
        })
    }
    publicDatabase?.addOperation(queryOperation)
}
There seems to be something slowing down your main thread, which introduces a delay in executing the block you dispatch to it. Is it possible that your code calls this record-fetching function multiple times in parallel? That would cause the NSData(contentsOfURL: asset.fileURL) processing to hog the main thread and introduce cumulative delays.
In any case, if only as good practice, loading the image with NSData should be performed in the background, not on the main thread.
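A sketch of that suggestion applied to the question's fetch: the same API calls, but with the disk I/O and decoding kept off the main queue, and only the UI assignment dispatched to it.

```swift
publicDb.fetchRecordWithID(configRecordId) { (record, error) -> Void in
    // This completion block already runs on a background queue,
    // so do the file read and image decoding here.
    guard let asset = record?["image"] as? CKAsset else { return }
    guard let data = NSData(contentsOfURL: asset.fileURL) else { return }
    let image = UIImage(data: data)
    dispatch_async(dispatch_get_main_queue()) {
        // Only the UI update touches the main thread.
        cell.cardImageView.image = image
    }
}
```

Compared with the original snippet, this moves the NSData(contentsOfURL:) call out of the dispatch_async block, so a slow disk read can no longer stall other work queued on the main thread.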
I have the following Core Data model:
And I'm trying to update the many-to-many relationship between Speaker and TalkSlot from a JSON I receive from a REST API call.
I have tried dozens of approaches, replacing my many-to-many with two one-to-manys, removing from one side or the other, but one way or another I keep getting EXC_BAD_ACCESS or SIGABRT, and I just don't understand the proper way to do it. Here is the last thing I tried:
for speaker in NSArray(array: slot!.speakers!.allObjects) {
    if let speaker = speaker as? Speaker {
        speaker.mutableSetValueForKey("talks").removeObject(slot!)
    }
}
slot!.mutableSetValueForKey("speakers").removeAllObjects()
if let speakersArray = talkSlotDict["speakers"] as? NSArray {
    for speakerDict in speakersArray {
        if let speakerDict = speakerDict as? NSDictionary {
            if let linkDict = speakerDict["link"] as? NSDictionary {
                if let href = linkDict["href"] as? String {
                    if let url = NSURL(string: href) {
                        if let uuid = url.lastPathComponent {
                            if let speaker = self.getSpeakerWithUuid(uuid) {
                                speaker.mutableSetValueForKey("talks").addObject(slot!)
                                slot!.mutableSetValueForKey("speakers").addObject(speaker)
                            }
                        }
                    }
                }
            }
        }
    }
}
If it helps, the API I'm using is documented here, as I'm trying to cache the schedule of a conference into Core Data in an Apple Watch extension. Note that I managed to store all the rest of the schedule without any issue, but for this relationship, each time I try to update it after storing it the first time, I get an EXC_BAD_ACCESS (or sometimes a SIGABRT), at a random place in my code of course. Any idea what I'm doing wrong?
OK, after reading a few other questions linking Core Data EXC_BAD_ACCESS errors to multi-threading, I noticed that I was doing my caching in an NSURLSession callback. Once I called my caching function on the main thread using the code below, the EXC_BAD_ACCESS errors completely disappeared, and now the data seems to be saved and updated properly:
dispatch_async(dispatch_get_main_queue(), { () -> Void in
    self.cacheSlotsForSchedule(schedule, data: data)
})
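This works because NSURLSession callbacks arrive on a background queue, while a context created on the main queue must only be touched from the main thread. The same thread confinement can also be expressed through Core Data's own API, sketched below (it assumes the caching function uses a context created with PrivateQueueConcurrencyType; cacheSlotsForSchedule is the question's own helper):

```swift
// performBlock runs the closure on the queue the context is confined to,
// so its managed objects are only ever touched from the correct thread,
// without forcing the caching work onto the main queue.
privateContext.performBlock {
    self.cacheSlotsForSchedule(schedule, data: data)
}
```

Dispatching to the main queue is the simpler fix when the work is light; performBlock keeps heavy caching off the main thread entirely.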