I would like to know what isEqualToArray actually does...
I have an array of 160 elements, each a dictionary with 11 entries, but I can do the comparison based on a single entry (the date when the row was last changed).
Now I can do that with a simple for loop:
BOOL different = FALSE;
for (int index = 0 ; index < [newInfo count] ; ++index)
    if (![[[oldInfo objectAtIndex:index] objectForKey:@"Update"] isEqual:[[newInfo objectAtIndex:index] objectForKey:@"Update"]]) {
        different = TRUE;
        break;
    }
if (different) {
}
else
    NSLog(@"Contact information hasn't been updated yet");
Or I can use the built-in isEqualToArray method:
if ([oldInfo isEqualToArray:newInfo])
    NSLog(@"Contact information hasn't been updated yet");
else {
    NSLog(@"Contact information has been updated, saving new contact information");
    [newInfo writeToFile:path atomically:YES];
}
Now, assuming isEqualToArray: just invokes isEqual: on each element, the for loop should run in roughly 1/11 of the time isEqualToArray: takes (it only needs to compare one entry per row instead of 11).
Maybe I'm just too much into optimizing... (I've been at many contests where runtime was limited and I'm feeling the after-effects).
The Documentation says:
Two arrays have equal contents if they each hold the same number of objects and objects at a given index in each array satisfy the isEqual: test.
So basically you are right.
From a design point of view I would either go for isEqualToArray:, since it makes the code easier to understand, or introduce a BOOL hasUpdates if you are concerned about performance, which has the additional advantage that you don't have to hold two copies in memory.
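To make the documented rule concrete (same count, and each pair at the same index passes an equality test), here is a minimal sketch in plain C, which also compiles as Objective-C. The Row type and all function names are hypothetical stand-ins for the dictionaries in the question; this is not how NSArray is implemented, only an illustration of why a one-field test does less work per element than a full one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row type standing in for one dictionary in the array. */
typedef struct {
    const char *update;  /* plays the role of the "Update" entry */
    const char *name;    /* stands in for the other entries */
} Row;

typedef bool (*RowEqual)(const Row *a, const Row *b);

/* Compares only the "Update" field, like the hand-written loop. */
bool row_equal_by_update(const Row *a, const Row *b) {
    return strcmp(a->update, b->update) == 0;
}

/* Compares every field, which is what a full equality test per element implies. */
bool row_equal_full(const Row *a, const Row *b) {
    return strcmp(a->update, b->update) == 0 && strcmp(a->name, b->name) == 0;
}

/* Equal contents: same count, and each pair at the same index passes `equal`. */
bool rows_equal(const Row *a, size_t na, const Row *b, size_t nb, RowEqual equal) {
    assert(equal != NULL);
    if (na != nb) return false;
    for (size_t i = 0; i < na; ++i) {
        if (!equal(&a[i], &b[i])) return false;  /* stop at the first mismatch */
    }
    return true;
}
```

Both variants stop at the first mismatch; they differ only in how much work each per-element test does.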
I suspect that many people wrongly assume that performance is proportional to the number of source statements executed and that a function like isEqualToArray is blindingly fast compared to the equivalent directly-coded loop.
In fact, while sometimes the coders of these APIs do indeed know a few "tricks of the trade" that speed things up a bit (or have access to internal interfaces you can't use), just as often they must throw in additional logic to handle "oddball" cases that you don't care about, or simply to make the API "general".
So in most cases the choice should be based on which most reasonably fits the overall program and makes the logic clear. In some cases the explicit loop is better, especially if one can harness some of the logic (eg, to take a later-required "max" of the array values) to avoid duplication of effort.
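As a rough illustration of harnessing the loop for a second purpose, the sketch below checks equality and collects a maximum in the same pass. It is plain C with hypothetical names (CompareResult, compare_and_max), using int values in place of objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool equal;    /* true if both arrays hold the same values in order */
    int  max_new;  /* largest value seen in new_values */
} CompareResult;

/* One pass over both arrays: checks equality and records the maximum.
   No early exit here, because the maximum needs every element anyway. */
CompareResult compare_and_max(const int *old_values, const int *new_values, size_t count) {
    assert(count > 0 && old_values != NULL && new_values != NULL);
    CompareResult result = { true, new_values[0] };
    for (size_t i = 0; i < count; ++i) {
        if (old_values[i] != new_values[i]) result.equal = false;
        if (new_values[i] > result.max_new) result.max_new = new_values[i];
    }
    return result;
}
```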
Also, when there is a complex API function (more complex than isEqualToArray) you're not quite sure you understand, it's often better to code things in a straight-forward manner rather than deal with the complex function. Once you have the code working you can come back and "optimize" things to use the complex API.
When you know both objects are arrays, the isEqualTo<Class> method is a faster way to check equality than a for loop.
isEqualTo<Class> is used to provide class-specific checks for equality, so isEqualToArray: first checks that the arrays contain an equal number of objects.
So as far as I know, isEqualToArray: is the better option when you know that both objects are arrays.
Related
I am trying to squeeze every bit of efficiency out of my application I am working on.
I have a couple arrays that follow the following conditions:
1. They are NEVER appended to; I always calculate the index myself
2. They are allocated once and never change size
3. It would be nice if they were thread safe, as long as it doesn't cost performance
4. Some hold primitives like floats or unsigned ints. One of them does hold a class.
5. Most of these arrays at some point are passed into a glBuffer
6. Never cleared, just overwritten
7. Some of the arrays' individual elements are changed entirely by =, others are changed by +=
I am currently using native Swift arrays and allocating them like var arr = [GLfloat](count: 999, repeatedValue: 0). However, I have been reading a lot of documentation and it sounds like Swift arrays are much more abstract than a traditional C-style array. I am not even sure if they are allocated in a block or more like a linked list with bits and pieces thrown all over the place. I believe that the code above causes it to allocate a contiguous block, but I'm not sure.
I worry that the abstract nature of Swift arrays is wasting a lot of precious processing time. As you can see from my conditions above, I don't need any of the fancy appending or safety features of Swift arrays. I just need it simple and fast.
My question is: In this scenario should I be using some other form of array? NSArray, somehow get a C-style array going, create my own data type?
I'm also looking into thread safety: would a different array type that is more thread safe, such as NSArray, be any slower?
Note that your requirements are contradictory, particularly #2 and #7. You can't operate on them with += and also say they will never change size. "I always calculate the index myself" also doesn't make sense. What else would calculate it? The requirements for things you will hand to glBuffer are radically different than the requirements for things that will hold objects.
If you construct the Array the way you say, you'll get contiguous memory. If you want to be absolutely certain that you have contiguous memory, use a ContiguousArray (but in the vast majority of cases this will give you little to no benefit while costing you complexity; there appear to be some corner cases in the current compiler that give a small advantage to ContiguousArray, but you must benchmark before assuming that's true). It's not clear what kind of "abstractness" you have in mind, but there are no secrets about how Array works. All of stdlib is open source. Go look and see if it does things you want to avoid.
For certain kinds of operations, it is possible for other types of data structures to be faster. For instance, there are cases where a dispatch_data is better and cases where a regular Data would be better and cases where you should use a ManagedBuffer to gain more control. But in general, unless you deeply know what you're doing, you can easily make things dramatically worse. There is no "is always faster" data structure that works correctly for all the kinds of uses you describe. If there were, that would just be the implementation of Array.
None of this makes sense to pursue until you've built some code and started profiling it in optimized builds to understand what's going on. It is very likely that different uses would be optimized by different kinds of data structures.
It's very strange that you ask whether you should use NSArray, since that would be wildly (orders of magnitude) slower than Array for dealing with very large collections of numbers. You definitely need to experiment with these types a bit to get a sense of their characteristics. NSArray is brilliant and extremely fast for certain problems, but not for that one.
But again, write a little code. Profile it. Look at the generated assembler. See what's happening. Watch particularly for any undesired copying or retain counting. If you see that in a specific case, then you have something to think about changing data structures over. But there's no "use this to go fast." All the trade-offs to achieve that in the general case are already in Array.
Is there a difference in speed between checking whether an NSSet contains a certain object using [ containsObject:] vs using [ objectsPassingTest:block] with the stop variable set to YES so that it stops after the first match?
Also, if the set contains objects of a custom class, my understanding is that containsObject: uses the isEqual: method to perform its check and hence this has to be overridden in the custom class. Will this slow down the containsObject: check compared to the case where the NSSet contains objects of Apple classes like NSString, NSNumber etc.?
I plan to run some benchmarks when I get some time, but have an interview tomorrow and would like to have the answer handy for that one.
Well you should run the benchmarks you plan to, but you can guesstimate an answer.
An implementation of containsObject: might iterate calling isEqual: on each member; while an implementation of objectsPassingTest: might iterate, call the block on each member, and the block calls isEqual:...
I think you can guesstimate based on that. Have a good interview, though if the interviewer reads SO...
Even though I have problems with this kind of question on SO, I will (partially) answer it. And I do not think the interviewer is after the final result, but rather your thoughts on it.
Both do a check with -isEqual:. But -containsObject: can do it directly, while -objectsPassingTest: has to call a block. This might not be expensive, but since the code being executed is not expensive either, it might still have a performance impact.
Besides this, -containsObject: can use hashing to find an object. -objectsPassingTest: in NSSet cannot, since it has no idea what the test is. The block cannot do this either, because it gets the objects one by one.
However, if you have mutable objects in the set, which objects of custom classes typically are, no hashing can be done, because it is impossible to implement useful hashing on mutable objects in a collection.
So my estimation: With immutable objects and a properly implemented -hash, -containsObject: will beat -objectsPassingTest: by far; otherwise not by that much.
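The difference between a hashed lookup and a test-driven scan can be sketched in plain C. Everything below is hypothetical (a tiny fixed-size open-addressing set of ints, invented function names) and says nothing about how NSSet is really built; it only shows that a lookup by key can jump to a slot, while a predicate has to be tried on members one by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_COUNT 16u  /* illustrative fixed capacity; must be a power of two */

typedef struct {
    int    keys[SLOT_COUNT];
    bool   used[SLOT_COUNT];
    size_t count;
} IntSet;

/* Hypothetical hash: multiply by an odd constant and mask to a slot index. */
size_t slot_for(int key) {
    return ((unsigned)key * 2654435761u) & (SLOT_COUNT - 1u);
}

void set_insert(IntSet *s, int key) {
    assert(s->count + 1 < SLOT_COUNT);  /* always keep at least one empty slot */
    size_t i = slot_for(key);
    while (s->used[i] && s->keys[i] != key) i = (i + 1) & (SLOT_COUNT - 1u);
    if (!s->used[i]) { s->keys[i] = key; s->used[i] = true; s->count++; }
}

/* Hashed lookup: starts at the key's slot and probes until an empty slot. */
bool set_contains(const IntSet *s, int key, size_t *probes) {
    size_t i = slot_for(key);
    *probes = 0;
    while (s->used[i]) {
        ++*probes;
        if (s->keys[i] == key) return true;
        i = (i + 1) & (SLOT_COUNT - 1u);
    }
    return false;
}

/* Predicate scan: the set knows nothing about `test`, so it tries members in turn. */
bool set_first_passing(const IntSet *s, bool (*test)(int), int *out, size_t *visited) {
    *visited = 0;
    for (size_t i = 0; i < SLOT_COUNT; ++i) {
        if (!s->used[i]) continue;
        ++*visited;
        if (test(s->keys[i])) { *out = s->keys[i]; return true; }
    }
    return false;
}

/* Example predicate for the scan. */
bool is_multiple_of_seven(int value) { return value % 7 == 0; }
```

The probes and visited counters are there only so the two strategies can be compared on real data.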
NSArray and NSMutableArray offer multiple ways to sort them using the sortedArrayUsing... and sortUsing... methods respectively, however none of those methods appear to offer a way to terminate a sort after it has been started.
For relatively small arrays, or when the comparison logic is trivial, this is probably not a big deal, but with larger arrays or when the comparison logic is not trivial, I would like to be able to cancel a sort already in process.
Trivial Use Case Example
Sorting a set of results that match based on a user's fuzzy search string. As the user types in the search field, results are fetched on a background thread and sorted before being presented to the user. If the fetch-and-sort operation is not completed before the user changes the search string, then it should be cancelled and a new fetch-and-sort operation started. The problem is that if the fetch-and-sort operation has already reached the sorting stage and called one of the NSArray sort methods above, then there's no way to cancel it. Instead, the next fetch-and-sort operation is left waiting for the now stale sort operation to complete.
So far, I've come up with two possible solutions but neither seems all that elegant.
Attempted Solution #1
Allow newer fetch-and-sort operations to start before any stale fetch-and-sort operations are finished. I just keep track of which is the latest operation using some internal state and as the operations complete, if they aren't the primary operation, then their results are discarded.
This works, but it can quickly result in multiple outstanding sorting operations all running concurrently, whether they need to be or not. This can be somewhat mitigated by throttling the maximum number of concurrent operations, but then I'm just adding an arbitrary limit. Pending, stale operations can be cancelled before they get executed, but I'm still left with situations where sorting work is being done when it doesn't need to be.
Attempted Solution #2
Roll my own quick sort or merge sort implementation and add an isCancelled flag to those routines so that they can quickly unwind and terminate. This is working, and working fairly well, but when the sorting operation doesn't need to be cancelled, the run time is about 15-20% slower than using one of the NSArray methods. Part of this, I imagine, is the overhead of calling methods like objectAtIndex and exchangeObjectAtIndex which I assume the internal sorting routines can bypass depending on how the NSArray is internally storing the objects in question.
It also feels wrong to be rolling my own sorting implementations in 2015 against something like AppKit and NSArray.
Semi-Attempted Solutions
Keeping a previously sorted array around and re-using that for filtering: This doesn't really work for what I'm trying to do so for sake of discussion, assume that the array I have to sort on is always unsorted and has no relationship to the previously sorted array.
Moving away from NSArray and back to C-style arrays: This works pretty well and the performance is quite good, but I'm left playing a bunch of games with ARC and the complexity of the overall implementation is significantly higher because at the end of the day, I'm always dealing with NSObjects. There's also a non-zero cost of going back and forth between NSArray and C-style arrays.
Summary
So, all of that to get back to the original question: "How do you cancel an in-progress NSArray sorting method?"
Tech Note
For those that are curious why this is a problem to begin with, I'm attempting to sort somewhere between 500,000 and 1,000,000 strings using compare methods like localizedStandardCompare, which is dramatically slower than just a straight NSString compare. The runtime difference between the various sortUsing... methods is relatively insignificant when compared to the total time to sort.
Starting where you end:
So, all of that to get back to the original question: "How do you cancel an in-progress NSArray sorting method?"
You don't. Cancellation isn't supported and anything you come up with is bound to be fragile.
So back to what you've done:
Roll my own quick sort or merge sort implementation and add an isCancelled flag to those routines so that they can quickly unwind and terminate. This is working, and working fairly well, but when the sorting operation doesn't need to be cancelled, the run time is about 15-20% slower than using one of the NSArray methods.
This is the way to go in this case, you just need to work on that slowdown...
You might be right, part of the slowdown might be the need to call methods for indexing and exchanging elements. Have you tried caching C function pointers to the common methods you require? If at the start of a sort you obtain direct C function pointers to objectAtIndex: et al. using the Objective-C runtime function class_getMethodImplementation() you can replace all the calls to method lookup with simple indirection.
If such manipulations fail then maybe look at C arrays again. As NSArray is toll-free bridged to CFArrayRef you can use CFArrayGetValues to copy out the elements into a malloc'ed C array, sort that, and then use CFArrayCreate to get back to a NSArray. Provided you are careful and not mutating the array you are sorting, as the elements will be in the original array they will already be retained and creating the new array will retain them once more, you can probably handle memory management by doing nothing. Sorting the C-array will be faster, but extraction and creation are going to be O(N) operations on top of the sort.
HTH
After several days of testing, I've opted to go with a custom, in-place merge sort implementation that accepts a boolean flag to trigger a cancellation.
A few follow-up points for those interested:
The raw performance of my merge sort implementation still lags somewhat behind the raw performance of the NSArray sortUsingComparator method. Instruments indicates that NSArray is using a merge sort as well, so I suspect the performance difference can be attributed to a more tuned implementation by Apple than I came up with and the ability to directly access NSArray's internals. NSArray's implementation took about 28 seconds to sort 1,000,000 strings using localizedStandardCompare as compared to 31.5 seconds for mine. (MacBook Air 2013)
Converting an NSMutableArray to a C-array of objects did not yield enough of a performance improvement to warrant the added complexity. The sort time was only reduced by between 0.5 - 1.0 second. Nice to have, but still dwarfed by the time spent in localizedStandardCompare. For input arrays of much smaller sizes (100,000), the speed difference was almost negligible. This surprised me, but Instruments is showing that all of the "overhead" in using an NSMutableArray is mostly noise when compared to the sort operation itself.
Parallelizing the merge function and farming out the tasks via GCD yielded a noticeable improvement of between 6.0 - 7.0 seconds, reducing the total time to sort to less than what NSArray sortUsingComparator was taking. Tuning the job count and stride length based on input array size could offer even more improvements (albeit minor ones at this stage).
Ultimately, a parallelized and cancelable implementation is proving to offer the best user experience for what I have in mind.
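For readers who want to see the shape of such a routine, here is a minimal sketch of a merge sort that checks a cancellation flag, written in plain C over an array of pointers (as you might get after copying the elements out of the NSArray). It is not the poster's implementation: it is neither in-place nor parallel, it uses a scratch buffer, and the names cancellable_sort, sort_range and compare_strings are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef int (*Compare)(const void *a, const void *b);

/* Sorts items[lo, hi) via scratch; returns false as soon as cancellation is seen. */
bool sort_range(const void **items, const void **scratch, size_t lo, size_t hi,
                Compare cmp, atomic_bool *cancelled) {
    if (atomic_load(cancelled)) return false;  /* checked once per sub-range */
    if (hi - lo < 2) return true;
    size_t mid = lo + (hi - lo) / 2;
    if (!sort_range(items, scratch, lo, mid, cmp, cancelled)) return false;
    if (!sort_range(items, scratch, mid, hi, cmp, cancelled)) return false;
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)  /* take from the left on ties, so the sort is stable */
        scratch[k++] = (cmp(items[j], items[i]) < 0) ? items[j++] : items[i++];
    while (i < mid) scratch[k++] = items[i++];
    while (j < hi)  scratch[k++] = items[j++];
    memcpy(items + lo, scratch + lo, (hi - lo) * sizeof *items);
    return true;
}

/* Returns true if the sort finished, false if it was cancelled or ran out of memory. */
bool cancellable_sort(const void **items, size_t count, Compare cmp, atomic_bool *cancelled) {
    assert(cmp != NULL && cancelled != NULL);
    if (count < 2) return !atomic_load(cancelled);
    const void **scratch = malloc(count * sizeof *scratch);
    if (scratch == NULL) return false;
    bool finished = sort_range(items, scratch, 0, count, cmp, cancelled);
    free(scratch);
    return finished;
}

/* Example comparator; a real one would wrap something like localizedStandardCompare. */
int compare_strings(const void *a, const void *b) { return strcmp(a, b); }
```

Another thread can set the flag with atomic_store; the sort then unwinds at the next sub-range boundary and leaves the array as a partially sorted permutation of the input.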
Firstly, I think the common way to handle the problem you are describing is not to cancel the sort, but to add a delay before the fetch/sort operation starts.
Users usually type in short bursts. So add a delay of x seconds (e.g. 0.5s) before the fetch and sort actually begin.
Example:
User types 'a'.
Start a x second timer.
Before timer expires user types 'b'
Invalidate old timer and start a new one with x seconds.
Timer expires, start fetch and sort operation.
Hope this helps.
Instead of implementing your own sorting algorithm (which checks for cancellation), you can implement your own comparator, which checks the cancellation condition and throws an exception to interrupt the NSArray sortUsing... call.
The call to NSArray sortUsing... should be enclosed in a @try/@catch block:
- (void) testInterruptSort
{
    NSArray *a = @[@"beta", @"alpha", @"omega", @"foo", @"bar"];
    NSArray *sorted;
    BOOL interrupted = NO;
    NSString * const myException = @"User";
    @try {
        int __block n = 0;
        sorted = [a sortedArrayUsingComparator:^NSComparisonResult(NSString *s1, NSString *s2) {
            n++; // count comparisons so the demo condition below can become true
            if (/* cancel condition*/ (1) && (n>5)) {
                NSException *e = [NSException exceptionWithName:myException reason:@"interrupted" userInfo:nil];
                [e raise];
            }
            return [s1 localizedStandardCompare:s2];
        }];
    }
    @catch (NSException *exception) {
        // should check if this is the "User" exception
        // see https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/Exceptions/Tasks/HandlingExceptions.html
        interrupted = YES;
    }
    @finally {
    }
    NSLog(@"interrupted: %@, result = %@\n", interrupted ? @"YES":@"NO", sorted);
}
I am in a situation that I need to get the items in an array and time is sensitive. I have the option of using a separate variable to hold the current count or just use NSMutableArray's count method.
ex: if (myArray.count == ... ) or if (myArrayCount == ...)
How expensive is it to get the number of items from the count method of an array?
The correct answer is, there is no difference in speed, so access the count of the array as you wish my child :)
Calling NSArray's count method is no more expensive than reading a local variable in which you've stored this value. It's not calculated when it's called. It's calculated when the array is created and stored.
For NSMutableArray, the only difference is that the value is recalculated any time you modify the contents of the array. The end result is still the same: when you call count, the number returned was precalculated. It's just returning the number it already stored.
Storing count in a variable, particularly for an NSMutableArray, is actually a worse option because the size of the array could change, and accessing the count through this variable is not faster whatsoever. It only adds the risk of potential inaccuracy.
The best way to prove to yourself that this is a preset value that is not calculated upon the count method being called is to create two arrays. One array has only a few elements. The other array has tens of thousands of elements. Now time how long it takes count to return. You'll find the time for count to return is identical no matter the size of the array.
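The idea of a stored count can be sketched with a small growable array in plain C. This is a hypothetical layout (IntArray and its functions are invented here), not NSMutableArray's actual implementation; it only shows how keeping the count up to date on every mutation makes the query a single field read regardless of size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int   *items;
    size_t count;     /* updated on every mutation, never recomputed */
    size_t capacity;
} IntArray;

/* Appends a value, growing the storage when full; returns false on allocation failure. */
bool int_array_append(IntArray *a, int value) {
    assert(a != NULL);
    if (a->count == a->capacity) {
        size_t new_capacity = a->capacity ? a->capacity * 2 : 8;
        int *grown = realloc(a->items, new_capacity * sizeof *grown);
        if (grown == NULL) return false;
        a->items = grown;
        a->capacity = new_capacity;
    }
    a->items[a->count++] = value;
    return true;
}

/* Constant time: just returns the stored field, whatever the array size. */
size_t int_array_count(const IntArray *a) {
    return a->count;
}

void int_array_free(IntArray *a) {
    free(a->items);
    a->items = NULL;
    a->count = a->capacity = 0;
}
```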
As a correction to everyone above, NSArray does not have a count property. It has a count method. The method itself either physically counts all of the elements within the array or is a getter for a private variable the array stores. Unless you plan on subclassing NSArray and creating a more efficient system for counting dynamic and/or static arrays, you're not going to get better performance than using the count method on an NSArray. As a matter of fact, you should count on the fact that Apple has already optimized this method to its max. My main thought after this: if you are doing an asynchronous call and your focus is on optimizing the count of an NSArray, you are probably doing something seriously wrong. If you are running some performance-heavy method on the main thread or similar, you should consider optimizing that instead. The performance hit of iterating and counting through the array using NSArray's count method should in no way affect your performance to any noticeable degree.
You should read up more on performance for NSArrays and NSMutableArrays if this is truly a concern for you. You can start here: link
If you need to get the item**s** then getting the count is not time critical. You'd also want to look at fast enumeration, or using enumeration with dispatch blocks, especially with parallel execution.
Edit:
Asa's is the most correct answer. I misunderstood the question.
Asa is right because the compiler will automatically optimize this and use the fastest way on its own.
TheGamingArt is correct about NSArray being as optimal as could be. However, this only applies to Objective-C.
Don't forget you have access to C and C++, which means you can use vectors, which should be only 'slightly' faster since they don't use Objective-C messaging. However, it wouldn't surprise me if the difference isn't noticeable. C++ vector benchmarks: http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html
This is a good example of Premature Optimization (http://c2.com/cgi/wiki?PrematureOptimization). I suggest you look into GCD or NSOperations (http://www.raywenderlich.com/19788/how-to-use-nsoperations-and-nsoperationqueues)
I'm pretty sure this is a silly newbie question but I didn't know it so I had to ask...
Why do we use data structures, like Linked List, Binary Search Tree, etc? (when no dynamic allocation is needed)
I mean: wouldn't it be faster if we kept a single variable for a single object? Wouldn't that speed up access time? Eg: BST possibly has to run through some pointers first before it gets to the actual data.
Except for when dynamic allocation is needed, is there a reason to use them?
Eg: using linked list/ BST / std::vector in a situation where a simple (non-dynamic) array could be used.
Each thing you are storing is being kept in its own variable (or storage location). Data structures apply organization to your data. Imagine if you had 10,000 things you were trying to track. You could store them in 10,000 separate variables. If you did that, then you'd always be limited to 10,000 different things. If you wanted more, you'd have to modify your program and recompile it each time you wanted to increase the number. You might also have to modify the code to change the way in which the calculations are done if the order of the items changes because a new one is introduced in the middle.
Using data structures, from simple arrays to more complex trees, hash tables, or custom data structures, allows your code to both be more organized and extensible. Using an array, which can either be created to hold the required number of elements or extended to hold more after it's first created keeps you from having to rewrite your code each time the number of data items changes. Using an appropriate data structure allows you to design algorithms based on the relationships between the data elements rather than some fixed ordering, giving you more flexibility.
A simple analogy might help. You could, for example, organize all of your important papers by putting each of them into a separate filing cabinet. If you did that you'd have to memorize (i.e., hard-code) the cabinet in which each item can be found in order to use them effectively. Alternatively, you could store them all in the same filing cabinet (like a generic array). This is better in that they're all in one place, but still not optimal, since you have to search through them all each time you want to find one. Better yet would be to organize them by subject, putting like subjects in the same file folder (separate arrays, different structures). That way you can look for the file folder for the correct subject, then find the item you're looking for in it. Depending on your needs you can use different filing methods (data structures/algorithms) to better organize your information for its intended use.
I'll also note that there are times when it does make sense to use individual variables for each data item you are using. Frequently there is a mixture of individual variables and more complex structures, using the appropriate method depending on the use of the particular item. For example, you might store the sum of a collection of integers in a variable while the integers themselves are stored in an array. A program would need to be pretty simple though before the introduction of data structures wouldn't be appropriate.
Sorry, but you didn't just find a great new way of doing things ;) There are several huge problems with this approach.
How could this be done without requiring programmers to massively (and nontrivially) rewrite tons of code as soon as the number of allowed items changes? Even when you have to fix your data structure sizes at compile time (e.g. arrays in C), you can use a constant. Then, changing a single constant and recompiling is sufficient for changes to that size (if the code was written with this in mind). With your approach, we'd have to type hundreds or even thousands of lines every time some size changes. Not to mention that all this code would be incredibly hard to read, write, maintain and verify. The old truism "more lines of code = more space for bugs" is taken up to eleven in such a setting.
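A minimal C sketch of the "single constant" point: the array size lives in one place, and the loop adapts when it changes. SAMPLE_COUNT and average are illustrative names, not from any particular codebase.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_COUNT 4  /* change this one line and recompile to resize */

/* Works for any SAMPLE_COUNT; with separate variables (s1, s2, s3, s4)
   the body would have to be rewritten whenever the number of items changes. */
double average(const double samples[SAMPLE_COUNT]) {
    assert(samples != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < SAMPLE_COUNT; ++i) {
        sum += samples[i];
    }
    return sum / SAMPLE_COUNT;
}
```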
Then there's the fact that the number is almost never set in stone. Even when it is a compile time constant, changes are still likely. Writing hundreds of lines of code for a minor (if it exists at all) performance gain is hardly ever worth it. This goes thrice if you'd have to do the same amount of work again every time you want to change something. Not to mention that it isn't possible at all once there is any remotely dynamic component in the size of the data structures. That is to say, it's very rarely possible.
Also consider the concept of implicit and succinct data structures. If you use a set of hard-coded variables instead of abstracting over the size, you still got a data structure. You merely made it implicit, unrolled the algorithms operating on it, and set its size in stone. Philosophically, you changed nothing.
But surely it has a performance benefit? Well, possibly, although it will be tiny. But it isn't guaranteed to be there. You'd save some space on data, but code size would explode. And as everyone informed about inlining should know, small code size matters for performance because it lets the code stay in the cache. Also, argument passing would result in excessive copying unless you figured out a trick to derive the location of most variables from a few pointers. Needless to say, this would be nonportable, very tricky to get right even on a single platform, and liable to be broken by any change to the code or the compiler invocation.
Finally, note that a weaker form is sometimes done. The Wikipedia page on implicit and succinct data structures has some examples. On a smaller scale, some data structures store much data in one place, such that it can be accessed with less pointer chasing and is more likely to be in the cache (e.g. cache-aware and cache-oblivious data structures). It's just not viable for 99% of all code and taking it to the extreme adds only a tiny, if any, benefit.
The main benefit of data structures, in my opinion, is that you are relationally grouping your data. For instance, instead of having 10 separate variables of class MyClass, you can have a data structure that groups them all. This grouping allows certain operations to be performed because the items are structured together.
Not to mention, data structures can potentially enforce type safety, which is powerful and necessary in many cases.
And last but not least, what would you rather do?
string string1 = "string1";
string string2 = "string2";
string string3 = "string3";
string string4 = "string4";
string string5 = "string5";
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.WriteLine(string4);
Console.WriteLine(string5);
Or...
List<string> myStringList = new List<string>() { "string1", "string2", "string3", "string4", "string5" };
foreach (string s in myStringList)
    Console.WriteLine(s);