How to correctly add a large dataset in Core Data? - ios

I have a huge NSArray (4,000,000 objects) that I want to save into Core Data.
Because I use ARC and the autorelease pool may grow too large, I partitioned the process into multiple loops (so the autorelease pools have a chance to drain).
In the following code I use a manager (clMan) to add items from the dictionaries inside the array (regions). Each dictionary contains two string fields, which are parsed into scalar integers.
Code for partitioning the data into multiple loops:
int loopSize = 50000;
int loops = 0;
int totalRepetitions = regions.count;
loops = totalRepetitions / loopSize;
int remaining = totalRepetitions % loopSize;
loops += 1;

for (int i = 0; i < loops; i++) {
    int k = 0;
    if (i == 0) k = 1;

    if (i == (loops - 1)) {
        // Last loop
        for (long j = i * loopSize + k; j < i * loopSize + remaining; j++) {
            [clMan addItemWithData:[regions objectAtIndex:j]];
        }
        [clMan saveContext];
        break;
    }

    // Complete loops before the last one
    for (long j = i * loopSize + k; j < (i + 1) * loopSize; j++) {
        [clMan addItemWithData:[regions objectAtIndex:j]];
    }
    [clMan saveContext];
    NSLog(@"Records added : %d", i * loopSize);
}
NSLog(@"Finished adding into core data");
Code for adding the data into Core Data:
- (void)addItemWithData:(NSDictionary *)data
{
    MyRegion *region = [NSEntityDescription
        insertNewObjectForEntityForName:@"MyRegion"
                 inManagedObjectContext:self.context];
    region.index = [((NSString *)[data objectForKey:REGION_INDEX]) intValue];
    region.id = [((NSString *)[data objectForKey:REGION_ID]) intValue];
}
The program crashes when it reaches roughly index 1,500,000. The crash does not seem to be caused by parsing issues or the loop logic.
Can anyone tell me if my logic is bad, or what the correct way is to add this amount of data to Core Data?

After each loop, try calling -[NSManagedObjectContext reset] to "forget" the local copies in the MOC. Otherwise these objects might not be released and can cause memory problems.
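Applied to the loop in the question, a minimal sketch would look like this (assuming the manager exposes its NSManagedObjectContext as a `context` property, which is a guess):

```objc
// After each batch: persist the inserts, then drop them from memory.
[clMan saveContext];
[clMan.context reset]; // forget the inserted objects so they can be deallocated
```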
The WWDC 2012 code samples on iCloud have a method called seedStore where they migrate a local Core Data SQL database to the iCloud one, using a batch size of 5000 records, and the comments there explicitly state why:
if (0 == (i % batchSize)) {
    success = [moc save:&localError];
    if (success) {
        /*
         Reset the managed object context to free the memory for the inserted objects.
         The faulting array used for the fetch request will automatically free
         objects with each batch, but inserted objects remain in the managed
         object context for the lifecycle of the context.
         */
        [moc reset];
    } else {
        NSLog(@"Error saving during seed: %@", localError);
        break;
    }
}
(Here i is the current index in the batch, so i % batchSize == 0 whenever a new batch starts.)


NSData to NSArray of NSNumbers?

I'm coming from Swift to Objective-C, and a problem I have run into is that NSData doesn't seem to have methods to enumerate over UInt8 values the way Swift's Data does. Other answers with similar titles exist, but they all deal with property lists, whereas I just have a bucket of ASCII bytes. Specifically, the code I want to replicate in Objective-C is:
// Find the next newline in ASCII data given a byte offset
let subset = myData.advanced(by: offset)
var lineEnd: Data.Index = myData.endIndex
subset.enumerateBytes({ (memory, idx, stop) in
    let newline: UInt8 = 10
    for idx in memory.indices {
        let charByte = memory[idx]
        if charByte == newline {
            lineEnd = idx
            stop = true
            return
        }
    }
})
Ideally I want a way to convert the NSData to an NSArray of NSNumbers, from which I can extract the intValues. Of course, if there is a better method, let me know. A goal is to keep the Objective-C code as similar to the Swift as possible, as I will be maintaining the two codebases simultaneously.
The only good way to access an NSData's bytes is to call its -bytes method, which gets you a C pointer to its internal storage.
NSData *data = ...;
const uint8_t *bytes = data.bytes;
NSUInteger length = data.length;
for (NSUInteger i = 0; i < length; i++) {
    uint8_t byte = bytes[i];
    // do something with byte
}
The closest equivalent to advanced(by:) is subdataWithRange:. The equivalent to enumerateBytes is enumerateByteRangesUsingBlock:. That would yield something like:
NSData *subset = [data subdataWithRange:NSMakeRange(offset, data.length - offset)];
__block NSUInteger lineEnd = data.length;
Byte newline = 10;
[subset enumerateByteRangesUsingBlock:^(const void * _Nonnull bytes, NSRange byteRange, BOOL * _Nonnull stop) {
    for (NSUInteger index = 0; index < byteRange.length; index++) {
        Byte charByte = ((const Byte *)bytes)[index];
        if (charByte == newline) {
            lineEnd = index + byteRange.location;
            *stop = YES;
            return;
        }
    }
}];
Note, I made a few changes from your Swift example:
If the data is not contiguous, your Swift example returns the index within the current block, but I suspect you want the location within subset, not within the block. I'm wagering you've never noticed this because it's pretty rare for NSData blocks to be non-contiguous; still, the Swift code doesn't look correct to me. This Objective-C example reports the offset within subset, not within the current block.
There's no observable performance difference, but I pulled the definition of newline out of the enumeration block. Why repeatedly define it?
If you're really just searching for a byte sequence in the NSData, I'd suggest avoiding the subset copy altogether. Just use rangeOfData:options:range:, which will find whatever you're looking for within a given range.
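For example, here is a minimal sketch of that approach, reusing the data and offset variables from above:

```objc
// Search for the next newline without materializing a subdata copy.
NSData *newline = [NSData dataWithBytes:"\n" length:1];
NSRange searchRange = NSMakeRange(offset, data.length - offset);
NSRange found = [data rangeOfData:newline options:0 range:searchRange];
NSUInteger lineEnd = (found.location == NSNotFound) ? data.length : found.location;
```

Because rangeOfData:options:range: reports locations relative to the whole data, there is no index translation to get wrong.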

upload many files and save in CoreData

Using an API I get 500 pictures and upload them asynchronously. Then I want to save all these pictures in Core Data, but the application crashes due to insufficient memory.
When the upload finishes, I call the method createFromBlock:.
+ (id)createFromBlock:(MRBlock *)block {
    ManagedBlock *item = [ManagedBlock MR_createInContext:DefaultContext];
    item.id = @(block.id);
    item.name = block.name;
    item.slidesInBlock = @(block.slidesInBlock);
    item.sizeBlock = block.sizeBlock;
    item.desc = block.desc;
    item.imagePath = block.imagePath;
    item.image = [MRUtils transformedValue:block.image];
    item.price = block.price;

    int i = 0;
    ManagedItem *new = nil;
    for (MRItem *lol in block.items) {
        NSLog(@"%i", i);
        new = [ManagedItem createFromItem:lol];
        new.block = item;
        [item addItemsObject:new];
        new = nil;
        i++;
    }
    [DefaultContext MR_saveWithOptions:MRSaveSynchronously completion:nil];
    return item;
}
The app crashes in the loop over block.items, after roughly 150-160 items.
If I comment out new = [ManagedItem createFromItem:lol];, the app doesn't crash.
+ (id)createFromItem:(MRItem *)object {
    ManagedItem *item = [ManagedItem MR_createInContext:DefaultContext];
    item.id = @(object.id);
    item.title = object.title;
    item.detail = object.detail;
    item.imagePath = object.imagePath;
    item.image = [MRUtils transformedValue:object.image];
    return item;
}
First, you should not load all your data and then save it in one go. You should load in small batches, and save each batch.
However, for your specific example, I believe you can get away with turning each object into a fault after saving.
I have never used MagicalRecord, and have no idea if it will let you do context-level things without calling its own methods. I looked at it once, but it hides way too many details for me... Unless you are doing the most basic things, Core Data can be quite complex, and I want to know everything that is going on in my Core Data code.
+ (id)createFromBlock:(MRBlock *)block {
    @autoreleasepool {
        ManagedBlock *item = [ManagedBlock MR_createInContext:DefaultContext];
        item.id = @(block.id);
        item.name = block.name;
        item.slidesInBlock = @(block.slidesInBlock);
        item.sizeBlock = block.sizeBlock;
        item.desc = block.desc;
        item.imagePath = block.imagePath;
        item.image = [MRUtils transformedValue:block.image];
        item.price = block.price;

        NSUInteger const batchSize = 10;
        NSUInteger count = block.items.count;
        NSUInteger index = 0;
        while (index < count) {
            @autoreleasepool {
                NSMutableArray *newObjects = [NSMutableArray array];
                for (NSUInteger batchIndex = 0;
                     index < count && batchIndex < batchSize;
                     ++index, ++batchIndex) {
                    MRItem *lol = [block.items objectAtIndex:index];
                    ManagedItem *new = [ManagedItem createFromItem:lol];
                    new.block = item;
                    [item addItemsObject:new];
                    [newObjects addObject:new];
                }
                [DefaultContext MR_saveWithOptions:MRSaveSynchronously completion:nil];
                for (NSManagedObject *object in newObjects) {
                    // Don't know if MagicalRecord will like this or not...
                    [DefaultContext refreshObject:object mergeChanges:NO];
                }
            }
        }
        return item;
    }
}
Basically, that code processes 10 objects at a time. When a batch is done, the context is saved, and those 10 objects are then turned into faults, releasing the memory they were holding. The @autoreleasepool makes sure that any objects created in the contained scope are released when the scope exits. If no other object holds a retain on them, they are deallocated.
This is a key concept when dealing with large numbers of objects, or just large objects, in Core Data. Also, understanding @autoreleasepool is extremely important, though people who have never used MRR often don't appreciate its benefit.
BTW, I just typed that into the editor, so don't expect it to compile and run with a simple copy/paste.

Count of children in Core Data

I need to find a count of children based on the following logic.
I have an entity A with two to-many relationships, B and C. For each A I need the counts of B and C, where
Count = (number of B) * (number of C).
Data:
A1 {
    { B1a },
    { C1a, C1b }
},
A2 {
    { B2a, B2b },
    { C2a, C2b }
}
Total Count = 1 * 2 + 2 * 2 = 6
I have tried the following:
NSEntityDescription *entity = [NSEntityDescription entityForName:@"A" inManagedObjectContext:context];
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
fetchRequest.entity = entity;
NSError *fetchError = nil;
NSArray *allObjects = [context executeFetchRequest:fetchRequest error:&fetchError];
NSInteger totalCount = 0;
for (A *a in allObjects) {
    NSInteger countOfB = [a.B count];
    NSInteger countOfC = [a.C count];
    totalCount = totalCount + (countOfB * countOfC);
}
This works fine, but when I have 10,000 records it takes too long. Please suggest alternative approaches.
Don't do the multiplication on demand. Each time an A instance has its relationship to B or C changed, calculate the product and store it in a new attribute on A. Then, to fetch your total count, you can use a single fetch (returning the dictionary result type) and @sum (a collection operator), and none of the objects need to actually be loaded into memory.
Consider using KVO to monitor for relationship changes and trigger the update to your product.
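As a sketch, assuming the precomputed attribute on A is named product (a name chosen here for illustration), the whole sum can be done in one dictionary-type fetch:

```objc
// Sum the precomputed `product` attribute without faulting in any A objects.
NSExpression *keyPath = [NSExpression expressionForKeyPath:@"product"];
NSExpressionDescription *sumDesc = [[NSExpressionDescription alloc] init];
sumDesc.name = @"totalCount";
sumDesc.expression = [NSExpression expressionForFunction:@"sum:" arguments:@[keyPath]];
sumDesc.expressionResultType = NSInteger64AttributeType;

NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"A"];
request.resultType = NSDictionaryResultType;
request.propertiesToFetch = @[sumDesc];

NSError *error = nil;
NSArray *results = [context executeFetchRequest:request error:&error];
NSNumber *totalCount = results.firstObject[@"totalCount"];
```

With a SQLite store, the sum is computed by the database itself, which is why this scales so much better than iterating over the objects.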

Comparing objects of NSArray1 to NSArray2 [duplicate]

This question already has answers here:
Finding out NSArray/NSMutableArray changes' indices
(3 answers)
Closed 8 years ago.
Here is what I need it to do:
NSArray1 has 10 objects.
NSArray2 has the same 10 objects, but at different indexes.
I need to check whether NSArray1 index 5 matches NSArray2 index 5 and, if not, whether the object has moved up or down in NSArray2; the same for every other object in the array.
The objects have the following properties: id and name.
Any suggestion on how I can accomplish this?
If you have RAM to spare you could build a map from the object's id to its index in array 1, then scan array 2 and compare, like:
NSMutableDictionary *map = [[NSMutableDictionary alloc] init];
for (NSUInteger j = 0; j < array1.count; j++) {
    // Dot syntax doesn't work on a plain `id`, so use KVC for the id property.
    id identifier = [array1[j] valueForKey:@"id"];
    map[identifier] = @(j);
}
for (NSUInteger j = 0; j < array2.count; j++) {
    id identifier = [array2[j] valueForKey:@"id"];
    NSUInteger array1Index = [map[identifier] unsignedIntegerValue];
    // Compare array1Index to j here.
}
That'll let you compare with a running time that grows like the number of objects in the arrays, but note that you have to spend some extra RAM to make that map. You could compare with only constant RAM costs if you're willing to spend more time:
for (NSUInteger j = 0; j < array1.count; j++) {
    id object = array1[j];
    NSUInteger k = [array2 indexOfObject:object];
    // Compare j and k; note that k could be NSNotFound.
}
And that should have a running time that grows like the product of the array counts.

Preferred way to make a mutable copy of a non-mutable object?

There are two choices (possibly more). Using NSSet as an example:
NSMutableSet *mutableSet = [NSMutableSet setWithSet:nonMutableSet];
or
NSMutableSet *mutableSet = [[nonMutableSet mutableCopy] autorelease];
Is there any difference between these two implementations? Is one "more efficient"? (Other examples would be NSArray/NSMutableArray and NSDictionary/NSMutableDictionary.)
Benchmark fun! :)
#import <Foundation/Foundation.h>

int main(int argc, const char *argv[])
{ @autoreleasepool {
    NSMutableSet *masterSet = [NSMutableSet set];
    for (NSInteger i = 0; i < 100000; i++) {
        [masterSet addObject:[NSNumber numberWithInteger:i]];
    }

    clock_t start = clock();
    for (NSInteger i = 0; i < 100; i++) {
        @autoreleasepool {
            [NSMutableSet setWithSet:masterSet];
        }
    }
    NSLog(@"a: --- %lu", clock() - start);

    sleep(1);

    start = clock();
    for (NSInteger i = 0; i < 100; i++) {
        @autoreleasepool {
            [[masterSet mutableCopy] autorelease];
        }
    }
    NSLog(@"b: --- %lu", clock() - start);
    return 0;
} }
On my machine (10.7), setWithSet: is ~3x slower than -mutableCopy. (Does somebody want to try it on iOS 5?)
Now, the question is: why?
-mutableCopy is spending most of its time in CFBasicHashCreateCopy() (see CFBasicHash.m). This appears to be copying the hash buckets directly, with no rehashing.
Running Time Self Symbol Name
256.0ms 61.5% 0.0 -[NSObject mutableCopy]
256.0ms 61.5% 0.0 -[__NSCFSet mutableCopyWithZone:]
256.0ms 61.5% 0.0 CFSetCreateMutableCopy
255.0ms 61.2% 156.0 CFBasicHashCreateCopy
97.0ms 23.3% 44.0 __CFSetStandardRetainValue
+setWithSet: enumerates each value of the set and adds it to the new set. From the implementation of CFBasicHashAddValue (again in CFBasicHash.m), it looks like it rehashes each value in the set.
Running Time Self Symbol Name
1605.0ms 86.0% 0.0 +[NSSet setWithSet:]
1605.0ms 86.0% 2.0 -[NSSet initWithSet:copyItems:]
1232.0ms 66.0% 68.0 -[__NSPlaceholderSet initWithObjects:count:]
1080.0ms 57.8% 299.0 CFBasicHashAddValue
324.0ms 17.3% 28.0 -[NSSet getObjects:count:]
272.0ms 14.5% 75.0 __CFBasicHashFastEnumeration
This rehash makes sense at the CFSet level. CFSets take a CFSetHashCallBack in the callBacks parameter; thus, two CFSets of CFNumbers could have a different hashing routine specified. Foundation's NSSet uses CFSet under-the-hood, and has a CFSetHashCallBack function which invokes -[NSObject hash]. (Although I guess that Apple could optimize this case and avoid the rehash when two sets have the same hash callback).
Note that this benchmark is for NSSet (of NSNumbers) only, other collection classes may have different performance characteristics.