I noticed that after increasing the number of arrays instantiated in memory from 8 to 23, my app just stops executing
[NSMutableArray addObject:obj]
on the 13th array, but only on 32-bit devices. On an iPad Air 2 (device and sim) and an iPhone 6 (device and sim), all 23 arrays are populated and the app functions as expected.
I understand there's a point at which a device will run out of available memory. In Xcode, on a 32-bit device, the app's memory was hovering around 50-55 MB, but the app doesn't crash or print a memory warning in the console. On a 64-bit device or sim, at the same point of interest, the app's memory is around 90-95 MB.
1) How is memory on 32-bit devices different from 64-bit devices when it comes to the amount of data that can be instantiated?
2) Is there a certain number of arrays that can be initialized in memory, unrelated to their size? For instance, I could populate just 2 of the 23 arrays, each with a single small object, and the first would have the right count while the second (any array with an ID > 13) would have a count of 0, like this:
if (obj.eventTypeID == [NSNumber numberWithInt:1]) {
    obj.color = [UIColor whiteColor];
    [array1 addObject:obj];
}
// ARRAYS 1 THROUGH 13 ALWAYS POPULATE, NO MATTER THE COUNT OR SIZE
if (obj.eventTypeID == [NSNumber numberWithInt:13]) {
    obj.color = [UIColor greenColor];
    [array13 addObject:obj];
}
// ARRAYS 14 THROUGH 23 ARE ALWAYS EMPTY
if (obj.eventTypeID == [NSNumber numberWithInt:23]) {
    obj.color = [UIColor redColor];
    [array23 addObject:obj];
}
Hopefully that's enough to go on. Just remember, the app works as expected on 64-bit, but not on 32-bit.
For what it's worth, I ended up fixing the problem by comparing intValue in the conditional statements... genius, I know...
if ([obj.eventTypeID intValue] == 1)
Although this solved a huge bug, I still don't understand why doing it the other way behaves any differently.
obj.eventTypeID is an NSInteger
So why is this any different? More importantly, why is this difference powerful enough to stop the thread from processing these conditions inside a loop when intValue isn't used? Those are the unanswered parts of this question, though the bug itself has long been solved with the fix above.
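For anyone hitting the same thing, here is a minimal sketch of the difference (my illustration; given the comparison to [NSNumber numberWithInt:1] and the intValue call, eventTypeID appears to actually be an NSNumber rather than an NSInteger):

NSNumber *a = [NSNumber numberWithInt:13];
NSNumber *b = [NSNumber numberWithInt:13];

// Pointer comparison: only YES if both variables reference the very same object.
// Whether equal values share one object depends on caching and tagged pointers,
// which differ between the 32-bit and 64-bit runtimes.
BOOL samePointer = (a == b);

// Value comparison: always YES for equal wrapped values, on any architecture.
BOOL sameValue = ([a intValue] == [b intValue]);   // or [a isEqualToNumber:b]

NSLog(@"pointer: %d  value: %d", samePointer, sameValue);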
Related
I'm comparing two NSNumbers in my app and I've done it the wrong way:
if(max < selected)
And it should be:
if([max longValue] < [selected longValue])
So the first comparison is really comparing the two objects' memory addresses. The funny thing (at least for me) is that the values seem to be related to the addresses. For example, if I get the first number with value 5, its memory address is 0xb000000000000053, and if I get the second with value 10, it's 0xb0000000000000a3 ("a" being 10 in hexadecimal).
For that reason the first (wrong) comparison was actually working. Now a user has complained about an error here, obviously because of this, and it has led me to the following questions:
Does this only happen in simulators? That's where I'm testing, while the user is on a real device. Or does it normally happen but isn't a rule that always holds?
This is a "tagged pointer," not an address. The value 5 is packed inside the pointer as you've seen. You can identify tagged pointers because they're odd (the last bit is 1). It's not possible to fetch odd addresses on any of Apple's hardware (the word size is 4 or 8 bytes), so that bit is never set for a real address.
Tagged pointers are only available on 64-bit platforms. If you run on a 32-bit platform then the values will be real pointers, and they may not be in any particular order, which will then lead to the kinds of bugs you're encountering. Unfortunately I don't believe there is any way to get a compiler warning or even a static analysis warning for this kind of misuse on NSNumber.
Mike Ash provides an in-depth discussion of the subject.
On a slightly related note, on 32-bit platforms, certain NSNumbers are singletons, particularly small values since they're used a lot (-1 through 12 as I recall, but I believe it's different on different platforms). This means that == may happen to work for some numbers, but not for others. It also means that without ARC, it was possible to over-release a specific value (for example, 4) such that your program would crash the next time it happened to use that value. True story.... very hard to debug.
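To make the failure mode concrete, here is a compact sketch of the two comparisons from above (my illustration, using modern literal syntax; the ordering trick only appears to work when both values become tagged pointers on a 64-bit runtime):

NSNumber *max = @5;
NSNumber *selected = @10;

// Wrong: compares the raw pointer bit patterns. These only happen to be
// ordered when both numbers are tagged pointers, so it silently breaks on
// 32-bit devices or for values that don't get tagged.
BOOL broken = (max < selected);

// Right: compare the wrapped values explicitly.
BOOL correct = ([max longValue] < [selected longValue]);
// Equivalent: [max compare:selected] == NSOrderedAscending

NSLog(@"broken: %d  correct: %d", broken, correct);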
I implemented the data Memory chip in the nand2tetris course, but I really don't understand some parts of my own implementation:
CHIP Memory {
    IN in[16], load, address[15];
    OUT out[16];

    PARTS:
    DMux4Way(in=load, sel=address[13..14], a=RAM1, b=RAM2, c=scr, d=kbr);
    Or(a=RAM1, b=RAM2, out=RAM);
    RAM16K(in=in, load=RAM, address=address[0..13], out=RAMout);
    Screen(in=in, load=scr, address=address[0..12], out=ScreenOut);
    Keyboard(out=KeyboardOut);
    Mux4Way16(a=RAMout, b=RAMout, c=ScreenOut, d=KeyboardOut, sel=address[13..14], out=out);
}
1) What is load responsible for here? I understand that if load is 0, all four outputs of the DMux4Way will be 0 in any case, but I don't understand what happens after that. Namely, how does that prevent data from being loaded into Memory?
2) It's also unclear why we feed Screen address[0..12] instead of address[0..14], the full address. In my opinion we should use the full address, because the Screen memory map comes after the RAM memory map, so to address the Screen we should use the range 16384-24575 decimal (100000000000000-101111111111111 binary). But how can we represent that range with just a 13-bit bus (address[0..12])? It seems impossible.
If we want to represent the Screen memory map, we should use the range presented above, and that range is 15 bits wide (address[0..14]), not 13 (address[0..12]). So why does address[0..12] work while address[0..14] (the full address) doesn't?
DMux4Way(in=load, sel=address[13..14], a=RAM1, b=RAM2, c=scr, d=kbr);
I'm sorry to criticize you at the beginning, but the questions you ask suggest that you didn't do this exercise yourself or didn't start the whole course from the beginning.
To answer your questions:
Ad.1.
You demultiplex a single bit (the load bit) to the correct memory part, and feed the input data to all memory parts at the same time.
That's easier and neater than doing it the other way around, namely directing the 16-bit input to the correct part (RAM16K, Screen, or Keyboard) while keeping a load bit connected and active at every register in all the parts.
To clarify: you have 2 possible destinations when writing data, RAM and Screen. The smallest demultiplexer that covers this selection is the 4-way one, and that's what you're using. When you write into memory, you need to provide 2 pieces of information at the same time: the data and the destination.
You might demultiplex the input data with a DMux4Way16 and, separately, the single load bit with a DMux4Way, but that would take 2 demultiplexers, and we can do better than that. That's what's done here: you feed the data input to both RAM and Screen, and use a single DMux4Way to select one of the 2 possible destinations. Only the selected one will be loaded with new data; in the other, the data input will be ignored. Knowing that, you need to study the A-instruction format: when bits 14 and 13 of the A-instruction (or of the data residing in the A-register) have the binary value 00 or 01, the destination is RAM; when they have the value 10, the Screen is the destination.
Once you notice that, you choose those 2 bits as the sel of your demultiplexer. Selections 0 and 1 have the same meaning, so you can Or them and feed the result as load to RAM. Selection 2 means the Screen will be loaded with a new value, so the load bit goes there. Selection 3 is never used, so we don't care about it; output d of the demultiplexer is not connected anywhere. We make use of the demultiplexer's key feature: the selected output gets the input value and all other outputs yield 0. That means only 1 memory destination can ever be loaded.
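As a plain-C model of that behavior (my own sketch, not course code), this is all a single-bit DMux4Way does, which is also why load = 0 can never write anywhere:

// Route a single bit `in` to exactly one of four outputs, chosen by `sel`.
// If in == 0, every output is 0, so no memory part sees an active load bit.
void dmux4way(int in, int sel, int out[4]) {
    for (int k = 0; k < 4; k++)
        out[k] = (k == sel) ? in : 0;
}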
Ad.2.
The Screen is a separate device; it has nothing to do with the RAM, ROM, or Keyboard devices here. You, and only you, give meaning to which bits mean what to this specific device. To answer your question: when you address some register in the Screen, you address it in the Screen's own internal address space. In that internal address space the first address is 0, even though within the whole Memory it is 16384. It's your job to make this translation. Given the size of the Screen device, it is not necessary to use a 14-bit address bus; 13 bits is all you need. What would a 14th bit mean in this case? It wouldn't add any value. Also, you are the user, not the designer, of the Screen; you only look at, and follow, its interface description.
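To make the address translation concrete, here is a small C sketch (my illustration, not part of the course materials) of what the Memory chip's wiring computes:

#include <stdio.h>

// Decode a 15-bit Hack address: bits 14..13 select the device,
// the low bits form that device's internal address.
void decode(unsigned address) {
    unsigned sel = (address >> 13) & 0x3;
    if (sel == 0 || sel == 1)
        printf("RAM16K, internal address %u\n", address & 0x3FFF); // bits 13..0
    else if (sel == 2)
        printf("Screen, internal address %u\n", address & 0x1FFF); // bits 12..0
    else
        printf("Keyboard (needs no address bits)\n");
}

int main(void) {
    decode(0);      // RAM16K, internal address 0
    decode(16384);  // Screen, internal address 0
    decode(24575);  // Screen, internal address 8191
    decode(24576);  // Keyboard
    return 0;
}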
Hope this answers your questions; if not, I urge you to go back and study the previous hardware-related chapters of the course more carefully.
My app needs to load large files (3 to 7 MB of text) into arrays of String, i.e., one string per line.
The simple way to go is to use componentsSeparatedByString or componentsSeparatedByCharactersInSet (or their Swift counterparts) as follows:
array = [fileContents componentsSeparatedByCharactersInSet: [NSCharacterSet newlineCharacterSet]];
Everything works as it should, except that for large enough files and old enough devices, times are unacceptably long. Just to be clear, it takes 2+ seconds to create the array on an iPad 4th gen, and 8+ seconds on an iPhone 4.
Since I can't change the hardware of the customers and I can't change componentsSeparatedBy..., is there a way I could speed this up?
Perhaps I could keep on disk, instead of the text file, something closer to an array of String (which might load faster)? If so, how?
(Swift solutions are welcome, too)
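One direction worth trying (a sketch under my own assumptions, not a measured fix): read the file as raw bytes and split on '\n' in a single pass, avoiding the NSCharacterSet-based scanning entirely. This assumes UTF-8, newline-delimited text; path stands in for the real file path:

NSData *data = [NSData dataWithContentsOfFile:path
                                      options:NSDataReadingMappedIfSafe
                                        error:NULL];
const char *bytes = data.bytes;
NSUInteger length = data.length;
NSMutableArray *lines = [NSMutableArray array];
NSUInteger start = 0;
for (NSUInteger i = 0; i <= length; i++) {
    // The end of the buffer also terminates the final line.
    if (i == length || bytes[i] == '\n') {
        NSString *line = [[NSString alloc] initWithBytes:bytes + start
                                                  length:i - start
                                                encoding:NSUTF8StringEncoding];
        if (line) [lines addObject:line];
        start = i + 1;
    }
}

Whether this actually beats componentsSeparatedByCharactersInSet: on an iPhone 4 is something only profiling can confirm.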
I have some places in my code which look like this:
var i = 0
for c in vertexStates[0] {
    // this operation is costly (it encapsulates 4 linear interpolations)
    currentVertexes.append(vertexStates[1][i++].interpolateTo(c, alpha: factor))
}
And I know for sure that there are more than 1000 vertices in each vertexStates[index] array (maybe up to 3000). What are the best practices for optimizing (vectorizing) such operations? Should I figure out how to split the work across threads? Would the gains from multi-threading outweigh the overhead? Maybe there are other ways of doing such operations faster?
I need a general approach to optimizing such operations (in my case, one that produces an array from two other arrays where order matters), no matter whether 3000 counts as long or not. My iPhone 6 Plus CPU is loaded to 65% during this operation, so I can predict the 4s will show very poor results, even though I haven't tested it yet.
100 isn't very long. 300 isn't very long. 100,000 is where we can start arguing whether something is very long.
Did you measure how long things take? What is the slowest device where your code could run? If you run on iOS 7, how well does it run on an iPhone 4? If you run on iOS 8 or 9 only, how well does it run on 4s or iPad 2?
The first step is measuring. Post with results.
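As a minimal measurement pattern (shown in Objective-C, matching most of the code here; CFAbsoluteTimeGetCurrent also comes up in the next question, and the comment marks where the loop under test goes):

#import <CoreFoundation/CoreFoundation.h>

CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
// ... run the interpolation loop here ...
CFAbsoluteTime elapsed = CFAbsoluteTimeGetCurrent() - start;
printf("elapsed: %.3f ms\n", elapsed * 1000.0);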
I'm trying to optimize a function (an FFT) on iOS, and I've set up a test program to time its execution over several hundred calls. I'm using mach_absolute_time() before and after the function call to time it. I'm doing the tests on an iPod touch 4th generation running iOS 6.
Most of the timing results are roughly consistent with each other, but occasionally one run will take much longer than the others (as much as 100x longer).
I'm pretty certain this has nothing to do with my actual function. Each run has the same input data, and is a purely numerical calculation (i.e. there are no system calls or memory allocations). I can also reproduce this if I replace the FFT with an otherwise empty for loop.
Has anyone else noticed anything like this?
My current guess is that my app's thread is somehow being interrupted by the OS. If so, is there any way to prevent this from happening? (This is not an app that will be released on the App Store, so non-public APIs would be OK for this.)
I no longer have an iOS 5.x device, but I'm pretty sure this was not happening prior to the update to iOS 6.
EDIT:
Here's a simpler way to reproduce:
for (int i = 0; i < 1000; ++i)
{
    uint64_t start = mach_absolute_time();
    for (int j = 0; j < 1000000; ++j);
    uint64_t stop = mach_absolute_time();
    printf("%llu\n", stop - start);
}
Compile this in debug (so the for loop is not optimized away) and run; most of the values are around 220000, but occasionally a value is 10 times larger or more.
In my experience, mach_absolute_time is not reliable. Now I use CFAbsoluteTime instead. It returns the current time in seconds, with far better than one-second precision.
const CFAbsoluteTime newTime = CFAbsoluteTimeGetCurrent();
mach_absolute_time() is actually very low level and reliable. It runs at a steady 24MHz on all iOS devices, from the 3GS to the iPad 4th gen. It's also the fastest way to get timing information, taking between 0.5µs and 2µs depending on CPU. But if you get interrupted by another thread, of course you're going to get spurious results.
SCHED_FIFO with maximum priority will allow you to hog the CPU, but only for a few seconds at most, then the OS decides you're being too greedy. You might want to try sleep( 5 ) before running your timing test, as this will build up some "credit".
You don't actually need to start a new thread, you can temporarily change the priority of the current thread with this:
#include <pthread.h>

struct sched_param sched;
sched.sched_priority = 62;  // near the top of the usable 0-62 range
pthread_setschedparam( pthread_self(), SCHED_FIFO, &sched );
Note that sched_get_priority_min & max return a conservative 15 & 47, but this only corresponds to an absolute priority of about 0.25 to 0.75. The actual usable range is 0 to 62, which corresponds to 0.0 to 1.0.
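Putting that together, a small sketch (my own, using only pthread_getschedparam and pthread_setschedparam) that raises the priority around the timed section and then restores whatever scheduling was in effect:

#include <pthread.h>

int policy;
struct sched_param saved;
pthread_getschedparam(pthread_self(), &policy, &saved);   // remember current scheduling

struct sched_param fifo;
fifo.sched_priority = 62;                                 // top of the usable range
pthread_setschedparam(pthread_self(), SCHED_FIFO, &fifo);

// ... run the timing-sensitive code here ...

pthread_setschedparam(pthread_self(), policy, &saved);    // restore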
This happens when the app spends some time in other threads.