I'm in a situation where I need to know the number of items in an array, and time is critical. I have the option of using a separate variable to hold the current count or just using NSMutableArray's count method.
ex: if (myArray.count == ...) or if (myArrayCount == ...)
How expensive is it to get the number of items from an array's count method?
The correct answer is: there is no difference in speed, so access the count of the array however you wish, my child :)
Calling NSArray's count method is no more expensive than fetching a local variable in which you've stored this value. The count isn't calculated when the method is called; it's calculated when the array is created, and stored.
For NSMutableArray, the only difference is that the stored count is recalculated any time you modify the contents of the array. The end result is still the same: when you call count, the number returned was precalculated, and the method just hands back the precalculated number it already stores.
Storing the count in a separate variable, particularly for an NSMutableArray, is actually the worse option, because the size of the array could change, and accessing the count through that variable is not any faster. It only adds the risk of inaccuracy.
The best way to prove to yourself that this is a preset value, not something calculated each time the count method is called, is to create two arrays: one with only a few elements, and one with tens of thousands of elements. Now time how long count takes to return. You'll find the time is identical no matter the size of the array.
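Here is a minimal timing sketch of that experiment (assumes Foundation; the exact numbers will vary by machine, but the two timings should come out essentially the same):

#import <Foundation/Foundation.h>

// Time a burst of repeated count calls so the duration is measurable.
static void timeCount(NSArray *array) {
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    NSUInteger total = 0;
    for (int i = 0; i < 1000000; i++) {
        total += array.count;
    }
    CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
    NSLog(@"count x 1,000,000 on %lu elements: %.4f s (checksum %lu)",
          (unsigned long)array.count, end - start, (unsigned long)total);
}

int main(void) {
    @autoreleasepool {
        NSMutableArray *small = [NSMutableArray array];
        NSMutableArray *large = [NSMutableArray array];
        for (int i = 0; i < 5; i++)     [small addObject:@(i)];
        for (int i = 0; i < 50000; i++) [large addObject:@(i)];

        timeCount(small);
        timeCount(large);   // expect roughly the same time as for the small array
    }
    return 0;
}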
As a correction to everyone above, NSArray does not have a count property; it has a count method. The method either physically counts the elements in the array or is a getter for a private variable the array stores. Unless you plan on subclassing NSArray and building a more efficient counting system for dynamic and/or static arrays, you're not going to get better performance than the count method, and you can count on Apple having already optimized it as far as it will go.

My main question after this: if you are making an asynchronous call and your focus is on optimizing the count of an NSArray, how do you not suspect that you are seriously doing something wrong elsewhere? If you are running some performance-heavy method on the main thread or the like, you should consider optimizing that instead. The performance hit of iterating through the array and getting its count via NSArray's count method should in no way affect your performance at any noticeable rate.
You should read up more on performance for NSArrays and NSMutableArrays if this is truly a concern for you. You can start here: link
If you need to get the items themselves, then getting the count is not the time-critical part. You'd also want to look at fast enumeration, or at block-based enumeration with concurrent dispatch options, especially for parallel execution.
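A rough sketch of those two options (the array contents are just placeholders):

#import <Foundation/Foundation.h>

static void enumerationExamples(void) {
    NSArray<NSString *> *items = @[@"a", @"b", @"c"];

    // Fast enumeration: the compiler-supported for...in loop.
    for (NSString *item in items) {
        NSLog(@"%@", item);
    }

    // Block-based enumeration; NSEnumerationConcurrent lets Foundation process
    // elements in parallel when it decides that is worthwhile.
    [items enumerateObjectsWithOptions:NSEnumerationConcurrent
                            usingBlock:^(NSString *item, NSUInteger idx, BOOL *stop) {
        NSLog(@"%lu: %@", (unsigned long)idx, item);
    }];
}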
Edit:
Asa's answer is the most correct one. I misunderstood the question.
Asa is right because the compiler will automatically optimize this and use the fastest way on its own.
TheGamingArt is correct about NSArray being about as optimal as it could be. However, this only applies to Objective-C.
Don't forget you also have access to C and C++, which means you can use vectors; these should be only 'slightly' faster, since they don't go through Objective-C messaging, and it wouldn't surprise me if the difference isn't even noticeable. C++ vector benchmarks: http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html
This is a good example of Premature Optimization (http://c2.com/cgi/wiki?PrematureOptimization). I suggest you look into GCD or NSOperations (http://www.raywenderlich.com/19788/how-to-use-nsoperations-and-nsoperationqueues)
Related
I am trying to squeeze every bit of efficiency out of the application I am working on.
I have a couple of arrays that meet the following conditions:
They are NEVER appended to; I always calculate the index myself
They are allocated once and never change size
It would be nice if they were thread safe, as long as it doesn't cost performance
Some hold primitives like floats or unsigned ints; one of them holds instances of a class
Most of these arrays at some point are passed into a glBuffer
They are never cleared, just overwritten
Some of the arrays' individual elements are replaced entirely with =, others are changed with +=
I am currently using native Swift arrays and allocating them like var arr = [GLfloat](count: 999, repeatedValue: 0). However, I have been reading a lot of documentation, and it sounds like Swift arrays are much more abstract than a traditional C-style array. I am not even sure whether they are allocated in one block or more like a linked list with bits and pieces thrown all over the place. I believe the code above causes the array to be allocated in a contiguous block, but I'm not sure.
I worry that the abstract nature of Swift arrays is wasting a lot of precious processing time. As you can see from the conditions above, I don't need any of the fancy appending or safety features of Swift arrays. I just need them simple and fast.
My question is: in this scenario, should I be using some other form of array? NSArray, somehow getting a C-style array going, or creating my own data type?
I'm also looking into thread safety. Would a more thread-safe array type such as NSArray be any slower?
Note that your requirements are contradictory, particularly #2 and #7. You can't operate on them with += and also say they will never change size. "I always calculate the index myself" also doesn't make sense. What else would calculate it? The requirements for things you will hand to glBuffer are radically different than the requirements for things that will hold objects.
If you construct the Array the way you describe, you'll get contiguous memory. If you want to be absolutely certain that you have contiguous memory, use a ContiguousArray (but in the vast majority of cases this will give you little to no benefit while costing you complexity; there appear to be some corner cases in the current compiler that give a small advantage to ContiguousArray, but you must benchmark before assuming that's true). It's not clear what kind of "abstractness" you have in mind, but there are no secrets about how Array works. All of stdlib is open source. Go look and see whether it does things you want to avoid.
For certain kinds of operations, it is possible for other types of data structures to be faster. For instance, there are cases where a dispatch_data is better and cases where a regular Data would be better and cases where you should use a ManagedBuffer to gain more control. But in general, unless you deeply know what you're doing, you can easily make things dramatically worse. There is no "is always faster" data structure that works correctly for all the kinds of uses you describe. If there were, that would just be the implementation of Array.
None of this makes sense to pursue until you've built some code and started profiling it in optimized builds to understand what's going on. It is very likely that different uses would be optimized by different kinds of data structures.
It's very strange that you ask whether you should use NSArray, since that would be wildly (orders of magnitude) slower than Array for dealing with very large collections of numbers. You definitely need to experiment with these types a bit to get a sense of their characteristics. NSArray is brilliant and extremely fast for certain problems, but not for that one.
But again, write a little code. Profile it. Look at the generated assembler. See what's happening. Watch particularly for any undesired copying or retain counting. If you see that in a specific case, then you have something to think about changing data structures over. But there's no "use this to go fast." All the trade-offs to achieve that in the general case are already in Array.
Is there a difference in speed between checking whether an NSSet contains a certain object using -containsObject: versus using -objectsPassingTest: with the stop variable set to YES so that it stops after the first match?
Also, if the set contains objects of a custom class, my understanding is that containsObject: uses the isEqual: method to perform its check, and hence this has to be overridden in the custom class. Will this slow down the containsObject: check compared to the case where the NSSet contains objects of Apple classes like NSString, NSNumber, etc.?
I plan to run some benchmarks when I get some time, but have an interview tomorrow and would like to have the answer handy for that one.
Well you should run the benchmarks you plan to, but you can guesstimate an answer.
An implementation of containsObject: might iterate, calling isEqual: on each member, while an implementation of objectsPassingTest: might iterate, call the block on each member, and have the block call isEqual:...
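A rough sketch of the two code paths being described (this is not Apple's actual implementation, just an illustration of the extra block hop):

#import <Foundation/Foundation.h>

// Shape of a containsObject:-style check: one isEqual: message per member.
static BOOL containsLikeCheck(NSSet *set, id target) {
    for (id member in set) {
        if ([member isEqual:target]) return YES;
    }
    return NO;
}

// Shape of an objectsPassingTest:-style check: a block call per member,
// and the block itself calls isEqual:.
static BOOL passingTestLikeCheck(NSSet *set, id target) {
    BOOL (^test)(id, BOOL *) = ^BOOL(id member, BOOL *stop) {
        return [member isEqual:target];
    };
    BOOL stop = NO;
    for (id member in set) {
        if (test(member, &stop)) return YES;
        if (stop) break;
    }
    return NO;
}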
I think you can guesstimate based on that. Have a good interview, though if the interviewer reads SO...
Even though I have problems with this kind of question on SO, I will (partially) answer it. And I do not think the interviewer wants the final result so much as your thoughts on it.
Both do their check with -isEqual:. But -containsObject: can do it directly, while -objectsPassingTest: has to go through a block. Calling a block is not expensive, but since the code being executed is not expensive either, the extra call can still have a relative performance impact.
Besides this, -containsObject: can use hashing to find an object. -objectsPassingTest: cannot, since NSSet has no idea what the test is; the block cannot use hashing either, because it receives the objects one by one.
However, if you have mutable objects in the set, which objects of custom classes typically are, no hashing can be done, because it is impossible to implement useful hashing for mutable objects stored in a collection.
So my estimate: with immutable objects and a properly implemented -hash, -containsObject: will beat -objectsPassingTest: by far; otherwise, not by that much.
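A rough benchmark sketch along those lines (NSNumber members, so -hash is well defined; timings will depend heavily on set size and hardware):

#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSMutableSet<NSNumber *> *set = [NSMutableSet set];
        for (int i = 0; i < 100000; i++) [set addObject:@(i)];
        NSNumber *needle = @99999;

        CFAbsoluteTime t0 = CFAbsoluteTimeGetCurrent();
        BOOL found1 = [set containsObject:needle];            // hash lookup
        CFAbsoluteTime t1 = CFAbsoluteTimeGetCurrent();

        __block BOOL found2 = NO;
        NSSet *matches = [set objectsPassingTest:^BOOL(NSNumber *obj, BOOL *stop) {
            if ([obj isEqual:needle]) {                       // linear scan via the block
                found2 = YES;
                *stop = YES;
                return YES;
            }
            return NO;
        }];
        CFAbsoluteTime t2 = CFAbsoluteTimeGetCurrent();

        // A real benchmark would repeat each lookup many times and average.
        NSLog(@"containsObject: %d in %.6f s", found1, t1 - t0);
        NSLog(@"objectsPassingTest: %d (%lu match) in %.6f s",
              found2, (unsigned long)matches.count, t2 - t1);
    }
    return 0;
}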
I have a complex table view (Objective-C).
To populate a number of labels in each cell I had a multi-dimensional table composed of two NSMutableArrays (one embedded inside the other). The result was an array that had 3 columns per row.
To free up memory I used
[arrayname removeAllObjects];
Well, all I can say is that this did absolutely nothing.
This array (which per row was only holding about 130 characters of data, and in this sample I only had 30 rows) was like a virus whose favourite food was memory. It ate 50 MB chunks like there was no tomorrow.
The removeAllObjects did nothing to recover memory.
I have searched high and low and found no clear way to free up memory when you are working with NSMutableArrays, and it seems like multi-dimensional ones are cookie monsters.
In the end I removed the multi-dimensional array and just built a single NSMutableArray in which each row's data was concatenated into one string, from which I then substringed out the 3 pieces of data when I needed them.
Memory returned back to normal.
This may not have been the ideal solution. Has anyone found a clear way of releasing the memory used by NSMutableArrays?
It would help if we knew what language you were using. If you're writing in C, just call
free(<array pointer>);
In the end I removed the multi-dimensional array and just built a single NSMutableArray in which each row's data was concatenated into one string, from which I then substringed out the 3 pieces of data when I needed them.
Memory returned back to normal.
This may not have been the ideal solution but it worked
A multi-dimensional NSMutableArray is unusual, but supported. It may not work the way you expect it to; it really is an array of arrays. In many languages, a "2D" array is really just a wrapper around index arithmetic (row * width + column into a single block). This matters because with NSMutableArray, each subarray could be a different length, and you need to initialize and manage each of them.
But with ARC, the memory will be handled fine. You just have a mistake in your code. You're probably either retaining the subarray somewhere that you didn't mean to, or you aren't waiting for the autorelease pool to drain. But either way, arrays of arrays are fine in Cocoa.
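For reference, here is a minimal sketch (ARC assumed; the row contents are placeholders) of building an array of arrays like the one described and then emptying it:

#import <Foundation/Foundation.h>

static void buildAndReleaseRows(void) {
    // 30 rows of 3 "columns", as in the question.
    NSMutableArray *rows = [NSMutableArray arrayWithCapacity:30];
    for (int r = 0; r < 30; r++) {
        NSMutableArray *row = [NSMutableArray arrayWithCapacity:3];
        [row addObject:@"date changed"];
        [row addObject:@"label text"];
        [row addObject:@"more label text"];
        [rows addObject:row];
    }

    // As long as nothing else keeps a strong reference to `rows` or to the
    // individual row arrays, this releases every row and its contents.
    [rows removeAllObjects];

    // If many temporaries are created with autoreleased convenience methods,
    // an explicit pool returns that memory promptly instead of waiting for the
    // run loop to drain.
    @autoreleasepool {
        // ... build and discard temporary objects here ...
    }
}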
NSArray and NSMutableArray offer multiple ways to sort them using the sortedArrayUsing... and sortUsing... methods respectively, however none of those methods appear to offer a way to terminate a sort after it has been started.
For relatively small arrays, or when the comparison logic is trivial, this is probably not a big deal, but with larger arrays or when the comparison logic is not trivial, I would like to be able to cancel a sort already in process.
Trivial Use Case Example
Sorting a set of results that match based on a user's fuzzy search string. As the user types in the search field, results are fetched on a background thread and sorted before being presented to the user. If the fetch-and-sort operation is not completed before the user changes the search string, then it should be cancelled and a new fetch-and-sort operation started. The problem is that if the fetch-and-sort operation has already reached the sorting stage and called one of the NSArray sort methods above, then there's no way to cancel it. Instead, the next fetch-and-sort operation is left waiting for the now stale sort operation to complete.
So far, I've come up with two possible solutions but neither seems all that elegant.
Attempted Solution #1
Allow newer fetch-and-sort operations to start before any stale fetch-and-sort operations are finished. I just keep track of which is the latest operation using some internal state and as the operations complete, if they aren't the primary operation, then their results are discarded.
This works, but it can quickly result in multiple outstanding sorting operations all running concurrently, whether they need to be or not. This can be somewhat mitigated by throttling the maximum number of concurrent operations, but then I'm just adding an arbitrary limit. Pending, stale operations can be cancelled before they get executed, but I'm still left with situations where sorting work is being done when it doesn't need to be.
Attempted Solution #2
Roll my own quick sort or merge sort implementation and add an isCancelled flag to those routines so that they can quickly unwind and terminate. This is working, and working fairly well, but when the sorting operation doesn't need to be cancelled, the run time is about 15-20% slower than using one of the NSArray methods. Part of this, I imagine, is the overhead of calling methods like objectAtIndex: and exchangeObjectAtIndex:withObjectAtIndex:, which I assume the internal sorting routines can bypass depending on how the NSArray stores its objects internally.
It also feels wrong to be rolling my own sorting implementation in 2015 on top of frameworks like AppKit and NSArray.
Semi-Attempted Solutions
Keeping a previously sorted array around and re-using that for filtering: This doesn't really work for what I'm trying to do, so for the sake of discussion, assume that the array I have to sort is always unsorted and has no relationship to the previously sorted array.
Moving away from NSArray and back to C-style arrays: This works pretty well and the performance is quite good, but I'm left playing a bunch of games with ARC, and the complexity of the overall implementation is significantly higher because, at the end of the day, I'm always dealing with NSObjects. There's also a non-zero cost to going back and forth between NSArray and C-style arrays.
Summary
So, all of that to get back to the original question: "How do you cancel an in-progress NSArray sorting method?"
Tech Note
For those who are curious why this is a problem to begin with: I'm attempting to sort somewhere between 500,000 and 1,000,000 strings using compare methods like localizedStandardCompare:, which is dramatically slower than a straight NSString compare. The runtime difference between the various sortUsing... methods is relatively insignificant when compared to the total time to sort.
Starting where you end:
So, all of that to get back to the original question: "How do you cancel an in-progress NSArray sorting method?"
You don't. Cancellation isn't supported and anything you come up with is bound to be fragile.
So back to what you've done:
Roll my own quick sort or merge sort implementation and add an isCancelled flag to those routines so that they can quickly unwind and terminate. This is working, and working fairly well, but when the sorting operation doesn't need to be cancelled, the run time is about 15-20% slower than using one of the NSArray methods.
This is the way to go in this case, you just need to work on that slowdown...
You might be right that part of the slowdown is the need to call methods for indexing and exchanging elements. Have you tried caching C function pointers to the common methods you require? If at the start of a sort you obtain direct C function pointers to objectAtIndex: et al. using the Objective-C runtime function class_getMethodImplementation(), you can replace the per-call method lookup with a simple indirection.
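A sketch of what that caching might look like (the function and variable names are mine, not from any API; a real implementation would be the custom merge sort, this is just a single compare-and-swap pass to show the call pattern):

#import <Foundation/Foundation.h>
#import <objc/runtime.h>

typedef id   (*ObjectAtIndexIMP)(id, SEL, NSUInteger);
typedef void (*ExchangeIMP)(id, SEL, NSUInteger, NSUInteger);

static void sortPassWithCachedIMPs(NSMutableArray *strings) {
    SEL objAtSEL = @selector(objectAtIndex:);
    SEL exchSEL  = @selector(exchangeObjectAtIndex:withObjectAtIndex:);

    // Look the IMPs up once, before the hot loop.
    ObjectAtIndexIMP objAt =
        (ObjectAtIndexIMP)class_getMethodImplementation([strings class], objAtSEL);
    ExchangeIMP exch =
        (ExchangeIMP)class_getMethodImplementation([strings class], exchSEL);

    for (NSUInteger i = 0; i + 1 < strings.count; i++) {
        // Direct calls through the cached pointers skip objc_msgSend's lookup.
        NSString *left  = objAt(strings, objAtSEL, i);
        NSString *right = objAt(strings, objAtSEL, i + 1);
        if ([left localizedStandardCompare:right] == NSOrderedDescending) {
            exch(strings, exchSEL, i, i + 1);
        }
    }
}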
If such manipulations fail, then maybe look at C arrays again. As NSArray is toll-free bridged to CFArrayRef, you can use CFArrayGetValues to copy the elements out into a malloc'ed C array, sort that, and then use CFArrayCreate to get back to an NSArray. Provided you are careful and do not mutate the array you are sorting, you can probably handle memory management by doing nothing: the elements remain retained by the original array, and creating the new array retains them once more. Sorting the C array will be faster, but extraction and creation add O(N) operations on top of the sort.
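A sketch of that round trip, sorting a buffer of NSString pointers with qsort (the helper names are mine; under ARC the bridge casts below leave the retain counting to the two arrays, as described):

#import <Foundation/Foundation.h>
#include <stdlib.h>

// qsort comparator over NSString pointers; no retains or releases happen here.
static int compareStrings(const void *a, const void *b) {
    NSString *s1 = (__bridge NSString *)(*(void *const *)a);
    NSString *s2 = (__bridge NSString *)(*(void *const *)b);
    return (int)[s1 localizedStandardCompare:s2];
}

// Copy the elements out, sort the C buffer, wrap the result in a new array.
static NSArray *sortedCopyViaCArray(NSArray *input) {
    CFIndex count = (CFIndex)input.count;
    const void **values = malloc(sizeof(void *) * (size_t)count);
    CFArrayGetValues((__bridge CFArrayRef)input, CFRangeMake(0, count), values);

    qsort(values, (size_t)count, sizeof(void *), compareStrings);

    // CFArrayCreate retains each element; the originals stay retained by `input`.
    CFArrayRef sorted = CFArrayCreate(kCFAllocatorDefault, values, count,
                                      &kCFTypeArrayCallBacks);
    free(values);
    return CFBridgingRelease(sorted);
}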
HTH
After several days of testing, I've opted to go with a custom, in-place merge sort implementation that accepts a boolean flag to trigger a cancellation.
A few follow-up points for those interested:
The raw performance of my merge sort implementation still lags somewhat behind the raw performance of the NSArray sortUsingComparator method. Instruments indicates that NSArray is using a merge sort as well, so I suspect the performance difference can be attributed to a more tuned implementation by Apple than I came up with and the ability to directly access NSArray's internals. NSArray's implementation took about 28 seconds to sort 1,000,000 strings using localizedStandardCompare as compared to 31.5 seconds for mine. (MacBook Air 2013)
Converting an NSMutableArray to a C-array of objects did not yield enough of a performance improvement to warrant the added complexity. The sort time was only reduced by between 0.5 - 1.0 second. Nice to have, but still dwarfed by the time spent in localizedStandardCompare. For input arrays of much smaller sizes (100,000), the speed difference was almost negligible. This surprised me, but Instruments is showing that all of the "overhead" in using an NSMutableArray is mostly noise when compared to the sort operation itself.
Parallelizing the merge function and farming out the tasks via GCD yielded a noticeable improvement of between 6.0 - 7.0 seconds, reducing the total time to sort to less than what NSArray sortUsingComparator was taking. Tuning the job count and stride length based on input array size could offer even more improvements (albeit minor ones at this stage).
Ultimately, a parallelized and cancelable implementation is proving to offer the best user experience for what I have in mind.
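For anyone wanting a starting point, here is a rough sketch of what such a cancelable top-down merge sort might look like; it is not the poster's actual implementation, and the cancellation flag is a plain volatile BOOL that another thread (for example, the one starting a newer search) would set:

#import <Foundation/Foundation.h>

static void mergeSortRange(NSMutableArray *a, NSMutableArray *buffer,
                           NSUInteger lo, NSUInteger hi,
                           NSComparator cmp, volatile BOOL *cancelled) {
    if (hi - lo < 2 || *cancelled) return;

    NSUInteger mid = lo + (hi - lo) / 2;
    mergeSortRange(a, buffer, lo, mid, cmp, cancelled);
    mergeSortRange(a, buffer, mid, hi, cmp, cancelled);
    if (*cancelled) return;

    // Merge [lo, mid) and [mid, hi) into the buffer, then copy back.
    NSUInteger i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        buffer[k++] = (cmp(a[i], a[j]) != NSOrderedDescending) ? a[i++] : a[j++];
    }
    while (i < mid) buffer[k++] = a[i++];
    while (j < hi)  buffer[k++] = a[j++];
    for (NSUInteger x = lo; x < hi; x++) a[x] = buffer[x];
}

// Returns NO if the sort was cancelled part-way (the array is then only partially sorted).
static BOOL cancelableSort(NSMutableArray *array, NSComparator cmp, volatile BOOL *cancelled) {
    NSMutableArray *buffer = [array mutableCopy];   // scratch space for merging
    mergeSortRange(array, buffer, 0, array.count, cmp, cancelled);
    return !*cancelled;
}

You would call it with a comparator such as ^NSComparisonResult(id s1, id s2) { return [s1 localizedStandardCompare:s2]; }; checking the flag only at merge boundaries keeps the per-comparison overhead negligible.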
Firstly, I think the common way to handle the problem you are describing is not to cancel the sort, but to add a delay before the fetch/sort operation starts (a rough sketch follows the steps below).
Users usually type in short bursts, so add a delay of x seconds (e.g. 0.5 s) before the fetch and sort actually begin.
Example:
User types 'a'.
Start a x second timer.
Before timer expires user types 'b'
Invalidate old timer and start a new one with x seconds.
Timer expires, start fetch and sort operation.
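A minimal sketch of that debounce with NSTimer (the class, property, and method names here are illustrative, not from the question):

#import <Foundation/Foundation.h>

@interface SearchController : NSObject
@property (nonatomic, strong) NSTimer *searchTimer;
@end

@implementation SearchController

// Call this every time the search text changes.
- (void)searchTextDidChange:(NSString *)text {
    [self.searchTimer invalidate];                 // throw away any pending request
    self.searchTimer = [NSTimer scheduledTimerWithTimeInterval:0.5
                                                        target:self
                                                      selector:@selector(startFetchAndSort:)
                                                      userInfo:text
                                                       repeats:NO];
}

// Fires only if the user has paused typing for 0.5 s.
- (void)startFetchAndSort:(NSTimer *)timer {
    NSString *query = timer.userInfo;
    // ... kick off the background fetch-and-sort for `query` here ...
    NSLog(@"fetching and sorting for %@", query);
}

@end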
Hope this helps.
Instead of implementing your own sorting algorithm (which checks for cancellation), you can implement your own comparator, which checks the cancellation condition and throws an exception to interrupt NSArray's sortUsing... method.
The call to the NSArray sortUsing... method should be enclosed in a @try/@catch block.
- (void)testInterruptSort
{
    NSArray *a = @[@"beta", @"alpha", @"omega", @"foo", @"bar"];
    NSArray *sorted;
    BOOL interrupted = NO;
    NSString * const myException = @"User";

    @try {
        __block int n = 0;
        sorted = [a sortedArrayUsingComparator:^NSComparisonResult(NSString *s1, NSString *s2) {
            n++;
            if (/* cancel condition */ (1) && (n > 5)) {
                NSException *e = [NSException exceptionWithName:myException
                                                         reason:@"interrupted"
                                                       userInfo:nil];
                [e raise];
            }
            return [s1 localizedStandardCompare:s2];
        }];
    }
    @catch (NSException *exception) {
        // should check if this is the "User" exception
        // see https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/Exceptions/Tasks/HandlingExceptions.html
        interrupted = YES;
    }
    @finally {
    }

    NSLog(@"interrupted: %@, result = %@\n", interrupted ? @"YES" : @"NO", sorted);
}
I would like to know what isEqualToArray: actually does...
I have an array with 160 elements, each containing a dictionary with 11 entries, but I can do the comparison based simply on the first column (it contains the date when the row was changed).
Now I can do that with a simple for loop:
BOOL different = FALSE;
for (int index = 0; index < [newInfo count]; ++index)
    if (![[[oldInfo objectAtIndex:index] objectForKey:@"Update"] isEqual:[[newInfo objectAtIndex:index] objectForKey:@"Update"]]) {
        different = TRUE;
        break;
    }

if (different) {
    // Contact information has been updated; save the new data here.
}
else
    NSLog(@"Contact information hasn't been updated yet");
Or I can use the built-in isEqualToArray method:
if ([oldInfo isEqualToArray:newInfo])
    NSLog(@"Contact information hasn't been updated yet");
else {
    NSLog(@"Contact information has been updated, saving new contact information");
    [newInfo writeToFile:path atomically:YES];
}
Now, assuming isEqualToArray: just invokes isEqual: on each element, the for-loop method should run in about 1/11 of the time isEqualToArray: does (it only needs to compare one column instead of 11).
Maybe I'm just too much into optimizing... (I've been at many contests where runtime was limited and I'm feeling the after-effects).
The Documentation says:
Two arrays have equal contents if they each hold the same number of objects and objects at a given index in each array satisfy the isEqual: test.
So basically you are right.
From a design point of view I would either go for isEqualToArray:, since it makes the code easier to understand, or introduce a BOOL hasUpdates if you are concerned about performance; the latter has the additional advantage that you don't have to hold two copies of the data in memory.
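A small sketch of that hasUpdates approach (the class and method names are illustrative only):

#import <Foundation/Foundation.h>

@interface ContactStore : NSObject
@property (nonatomic) BOOL hasUpdates;                     // set wherever the data is mutated
@property (nonatomic, strong) NSMutableArray *info;
@end

@implementation ContactStore

- (void)updateRow:(NSUInteger)row withDictionary:(NSDictionary *)entry {
    self.info[row] = entry;
    self.hasUpdates = YES;                                 // no array comparison needed later
}

- (void)saveIfNeededToPath:(NSString *)path {
    if (!self.hasUpdates) {
        NSLog(@"Contact information hasn't been updated yet");
        return;
    }
    [self.info writeToFile:path atomically:YES];
    self.hasUpdates = NO;
}

@end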
I suspect that many people wrongly assume that performance is proportional to the number of source statements executed and that a function like isEqualToArray is blindingly fast compared to the equivalent directly-coded loop.
In fact, while sometimes the coders of these APIs do indeed know a few "tricks of the trade" that speed things up a bit (or have access to internal interfaces you can't use), just as often they must throw in additional logic to handle "oddball" cases that you don't care about, or simply to make the API "general".
So in most cases the choice should be based on which most reasonably fits the overall program and makes the logic clear. In some cases the explicit loop is better, especially if one can harness some of the logic (eg, to take a later-required "max" of the array values) to avoid duplication of effort.
Also, when there is a complex API function (more complex than isEqualToArray) you're not quite sure you understand, it's often better to code things in a straight-forward manner rather than deal with the complex function. Once you have the code working you can come back and "optimize" things to use the complex API.
When you know both objects are arrays, an isEqualTo<Class> method is a faster way to check equality than a for loop.
isEqualTo<Class> methods provide class-specific checks for equality, so isEqualToArray: first checks that the arrays contain an equal number of objects before comparing their elements.
So, as far as I know, isEqualToArray: is the better option when you know that both objects are arrays.