Is there a difference in speed between checking whether an NSSet contains a certain object using containsObject: vs. using objectsPassingTest: with the stop variable set to YES so that it stops after the first match?
Also, if the set contains objects of a custom class, my understanding is that containsObject: uses the isEqual: method to perform its check, and hence this has to be overridden in the custom class. Will this slow down the containsObject: check compared to the case where the NSSet contains objects of Apple classes like NSString, NSNumber, etc.?
I plan to run some benchmarks when I get some time, but have an interview tomorrow and would like to have the answer handy for that one.
Well you should run the benchmarks you plan to, but you can guesstimate an answer.
An implementation of containsObject: might iterate calling isEqual: on each member; while an implementation of objectsPassingTest: might iterate, call the block on each member, and the block calls isEqual:...
I think you can guesstimate based on that. Have a good interview, though if the interviewer reads SO...
Even though I have problems with this kind of question on SO, I will (partially) answer it. And I do not think the interviewer is after the final result, but rather your thoughts on it.
Both do a check with -isEqual:. But -containsObject: can do it directly, while -objectsPassingTest: has to call a block. This might not be expensive, but since the code being executed is not expensive either, it might cause a noticeable performance impact.
Besides this, -containsObject: can use hashing to find an object. -objectsPassingTest: in NSSet cannot, since it has no idea what the test is. The block cannot do this either, because it gets the objects one by one.
However, if you have mutable objects in the set, which objects of custom classes typically are, hashing cannot help, because it is impossible to implement useful hashing on objects that are mutated while they are in a collection.
So my estimate: with immutable objects and a properly implemented -hash, -containsObject: will beat -objectsPassingTest: by far; otherwise not by that much.
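To make the comparison concrete, here is a minimal sketch of both lookups. Point2D is a hypothetical immutable class invented for this illustration, and ContainsByLookup and ContainsByBlock are made-up helper names; nothing here claims to show how NSSet is implemented internally.

```objc
#import <Foundation/Foundation.h>

// Hypothetical immutable value class, used only for illustration.
@interface Point2D : NSObject
@property (nonatomic, readonly) NSInteger x;
@property (nonatomic, readonly) NSInteger y;
- (instancetype)initWithX:(NSInteger)x y:(NSInteger)y;
@end

@implementation Point2D
- (instancetype)initWithX:(NSInteger)x y:(NSInteger)y {
    if ((self = [super init])) {
        _x = x;
        _y = y;
    }
    return self;
}

// Equal objects must return equal hashes, so both methods use the same fields.
- (BOOL)isEqual:(id)other {
    if (other == self) return YES;
    if (![other isKindOfClass:[Point2D class]]) return NO;
    Point2D *p = other;
    return p.x == self.x && p.y == self.y;
}

- (NSUInteger)hash {
    return (NSUInteger)self.x * 31u + (NSUInteger)self.y;
}
@end

// Direct lookup: the set is free to use -hash to narrow the search.
static BOOL ContainsByLookup(NSSet *set, id candidate) {
    return [set containsObject:candidate];
}

// Block-based lookup: members are handed to the block one by one.
static BOOL ContainsByBlock(NSSet *set, id candidate) {
    NSSet *matches = [set objectsPassingTest:^BOOL(id member, BOOL *stop) {
        if ([member isEqual:candidate]) {
            *stop = YES;   // stop enumerating after the first match
            return YES;
        }
        return NO;
    }];
    return matches.count > 0;
}
```

Both helpers give the same answer for the same set and candidate; the difference the answer describes is only in how much work is done to get there, which is worth measuring rather than assuming.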
I've already seen this question:
What's the difference between the atomic and nonatomic attributes?
I understand that atomic does not guarantee thread safety, and that I have to use other mechanisms (e.g. @synchronized) to achieve that. Given that, I still don't know EXACTLY when to use the atomic attribute. I'd like to know a USE CASE for using atomic alone.
The typical use-case for atomic properties is when dealing with a primitive data type across multiple threads. For example, let's say you have some background thread doing some processing and you have some BOOL state property, e.g. isProcessComplete and your main thread wants to check to see if the background process is complete:
if (self.isProcessComplete) {
// Do something
}
In this case, declaring this property as atomic allows us to use/update this property across multiple threads without any more complicated synchronization mechanism because:
we're dealing with a scalar, primitive data type, e.g. BOOL;
we declared it to be atomic; and
we're using the accessor method (e.g. self.) rather than accessing the ivar directly.
When dealing with objects or other more complicated situations, atomic generally is insufficient. As you point out, in practice, atomic, alone, is rarely sufficient to achieve thread safety, which is why we don't use it very often. But for simple, stand-alone, primitive data types, atomic can be an easy way to ensure safe access across multiple threads.
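A minimal sketch of the BOOL case described above. The Worker class and its startOnQueue: method are hypothetical names made up for this example; only the isProcessComplete accessor comes from the answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class for illustration.
@interface Worker : NSObject
// Properties are atomic by default; the attribute is spelled out for clarity.
@property (atomic, assign, getter=isProcessComplete) BOOL processComplete;
- (void)startOnQueue:(dispatch_queue_t)queue;
@end

@implementation Worker
- (void)startOnQueue:(dispatch_queue_t)queue {
    dispatch_async(queue, ^{
        // ... long-running work would go here ...
        self.processComplete = YES;   // written through the accessor, not the ivar
    });
}
@end
```

Another thread can then check worker.isProcessComplete through the accessor without additional locking. Note that this only covers the flag itself; any data the background work produces still needs its own synchronization.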
atomic guarantees that the value you receive will not be gibberish. A possible situation is reading a given value from one thread while setting it from another. The atomic keyword then ensures that you receive a whole value. The important thing to know is that the value you get is not guaranteed to be the one that was set most recently.
The second part of your question, about the use case, is purely circumstantial and depends on the implementation and the intentions. For example, if you have some kind of time-based update on a list every second or so, you could use atomic to ensure that you get whole values, and the timer that updates your list can ensure that you have the latest data on screen or under the hood in some implicit logic.
EDIT: After a remark from @Rob I saw the need to rephrase the last part of my answer. In most cases atomic is not enough to get the job done, and a better solution is needed, such as @synchronized.
atomic guarantees that competing threads will get/set whole values, whether you're setting a primitive type or a pointer to an object. With nonatomic access, it's possible to get a torn read or write, especially if you're dealing with 64-bit values.
atomic also guarantees that an object pointer won't be garbage when you retrieve it:
https://github.com/opensource-apple/objc4/blob/master/runtime/objc-accessors.mm#L58-L61.
That being said, if you're pointing to a mutable object, atomic won't offer you any guarantees about the object's state.
atomic is appropriate if you're dealing with:
Immutable objects
Primitives (like int, char, float)
atomic is inappropriate if:
The object or any of its descendant properties are mutable
If you need thread safety for a primitive value or immutable object, this is one of the fastest ways to guard against torn reads/writes that you can find.
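As a rough sketch of the immutable-object case: the Roster class below is a hypothetical example, not something from the answer. The property only ever holds an immutable NSArray, and a change is made by swapping in a new array rather than mutating the old one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class for illustration.
@interface Roster : NSObject
// Readers always get a whole, valid pointer to an immutable array.
@property (atomic, copy) NSArray<NSString *> *names;
- (void)addName:(NSString *)name;
@end

@implementation Roster
- (instancetype)init {
    if ((self = [super init])) {
        _names = @[];
    }
    return self;
}

- (void)addName:(NSString *)name {
    // Build a new immutable array and swap the pointer with one atomic set.
    // The read-then-write pair is NOT atomic as a whole: two threads calling
    // this at the same time can still lose an update, so writers need their
    // own coordination (for example a serial queue).
    self.names = [self.names arrayByAddingObject:name];
}
@end
```

Readers on any thread see either the old array or the new one, never a half-written pointer, which is the guarantee described above and nothing more.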
I am trying to squeeze every bit of efficiency out of my application I am working on.
I have a couple arrays that follow the following conditions:
1. They are NEVER appended to; I always calculate the index myself
2. They are allocated once and never change size
3. It would be nice if they were thread safe, as long as it doesn't cost performance
4. Some hold primitives like floats or unsigned ints. One of them does hold a class.
5. Most of these arrays at some point are passed into a glBuffer
6. Never cleared, just overwritten
7. Some of the arrays' individual elements are changed entirely by =, others are changed by +=
I am currently using Swift native arrays and allocating them like var arr = [GLfloat](count: 999, repeatedValue: 0). However, I have been reading a lot of documentation, and it sounds like Swift arrays are much more abstract than a traditional C-style array. I am not even sure if they are allocated in one block or more like a linked list with bits and pieces scattered all over the place. I believe the code above causes the array to be allocated in a contiguous block, but I'm not sure.
I worry that the abstract nature of Swift arrays is wasting a lot of precious processing time. As you can see from my conditions above, I don't need any of the fancy appending or safety features of Swift arrays. I just need it simple and fast.
My question is: In this scenario should I be using some other form of array? NSArray, somehow get a C-style array going, create my own data type?
I'm also looking into thread safety: would a different array type that is more thread safe, such as NSArray, be any slower?
Note that your requirements are contradictory, particularly #2 and #7. You can't operate on them with += and also say they will never change size. "I always calculate the index myself" also doesn't make sense. What else would calculate it? The requirements for things you will hand to glBuffer are radically different than the requirements for things that will hold objects.
If you construct the Array the way you say, you'll get contiguous memory. If you want to be absolutely certain that you have contiguous memory, use a ContiguousArray (but in the vast majority of cases this will give you little to no benefit while costing you complexity; there appear to be some corner cases in the current compiler that give a small advantage to ContiguousArray, but you must benchmark before assuming that's true). It's not clear what kind of "abstractness" you have in mind, but there are no secrets about how Array works. All of stdlib is open source. Go look and see if it does things you want to avoid.
For certain kinds of operations, it is possible for other types of data structures to be faster. For instance, there are cases where a dispatch_data is better and cases where a regular Data would be better and cases where you should use a ManagedBuffer to gain more control. But in general, unless you deeply know what you're doing, you can easily make things dramatically worse. There is no "is always faster" data structure that works correctly for all the kinds of uses you describe. If there were, that would just be the implementation of Array.
None of this makes sense to pursue until you've built some code and started profiling it in optimized builds to understand what's going on. It is very likely that different uses would be optimized by different kinds of data structures.
It's very strange that you ask whether you should use NSArray, since that would be wildly (orders of magnitude) slower than Array for dealing with very large collections of numbers. You definitely need to experiment with these types a bit to get a sense of their characteristics. NSArray is brilliant and extremely fast for certain problems, but not for that one.
But again, write a little code. Profile it. Look at the generated assembler. See what's happening. Watch particularly for any undesired copying or retain counting. If you see that in a specific case, then you have something to think about changing data structures over. But there's no "use this to go fast." All the trade-offs to achieve that in the general case are already in Array.
I am in a situation that I need to get the items in an array and time is sensitive. I have the option of using a separate variable to hold the current count or just use NSMutableArray's count method.
ex: if (myArray.count == ... ) or if (myArrayCount == ...)
How expensive is it to get the number of items from the array's count method?
The correct answer is, there is no difference in speed, so access the count of the array as you wish my child :)
Calling NSArray's count method is no more expensive than fetching a local variable in which you've stored this value. It's not calculated when it's called. It's calculated when the array is created and stored.
For NSMutableArray, the only difference is that the property is recalculated any time you modify the contents of the array. The end result is still the same--when you call count, the number returned was precalculated. It's just returning the precalculated number it already stored.
Storing count in a variable, particularly for an NSMutableArray, is actually a worse option, because the size of the array could change and accessing the count through this variable is not faster at all. It only adds the risk of inaccuracy.
The best way to prove to yourself that this is a preset value that is not calculated upon the count method being called is to create two arrays. One array has only a few elements. The other array has tens of thousands of elements. Now time how long it takes count to return. You'll find the time for count to return is identical no matter the size of the array.
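The experiment described above could be sketched roughly like this. SecondsForCountCalls and ArrayOfSize are hypothetical helper names, and the timing uses NSProcessInfo's systemUptime as a simple clock; treat it as a starting point for your own measurement rather than a rigorous benchmark.

```objc
#import <Foundation/Foundation.h>

// Builds an immutable array with n NSNumber elements.
static NSArray *ArrayOfSize(NSUInteger n) {
    NSMutableArray *a = [NSMutableArray arrayWithCapacity:n];
    for (NSUInteger i = 0; i < n; i++) {
        [a addObject:@(i)];
    }
    return [a copy];
}

// Returns the wall-clock seconds spent calling -count `iterations` times.
static NSTimeInterval SecondsForCountCalls(NSArray *array, NSUInteger iterations) {
    NSTimeInterval start = [NSProcessInfo processInfo].systemUptime;
    NSUInteger sink = 0;
    for (NSUInteger i = 0; i < iterations; i++) {
        sink += array.count;   // one message send per iteration
    }
    NSTimeInterval elapsed = [NSProcessInfo processInfo].systemUptime - start;
    (void)sink;                // the sum is not needed, only the calls
    return elapsed;
}
```

Comparing SecondsForCountCalls(ArrayOfSize(5), 1000000) with SecondsForCountCalls(ArrayOfSize(50000), 1000000) in an optimized build gives the two numbers the answer suggests looking at.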
As a correction to everyone above, NSArray does not have a count property. It has a count method. The method itself either physically counts all of the elements within the array or is a getter for a private variable the array stores. Unless you plan on subclassing NSArray and creating a more efficient system for counting dynamic and/or static arrays, you're not going to get better performance than using the count method on an NSArray. As a matter of fact, you should count on the fact that Apple has already optimized this method to its max. My main thought after this: if you are making an asynchronous call and your focus is on optimizing the count of an NSArray, you are probably doing something seriously wrong. If you are running some performance-heavy method on the main thread or the like, you should consider optimizing that instead. The performance hit of iterating and counting through the array using NSArray's count method should in no way affect your performance to any noticeable degree.
You should read up more on performance for NSArrays and NSMutableArrays if this is truly a concern for you. You can start here: link
If you need to get the items (plural), then getting the count is not time critical. You'd also want to look at fast enumeration, or at block-based enumeration, especially with concurrent execution.
Edit:
Asa's is the most correct answer. I misunderstood the question.
Asa is right because the compiler will automatically optimize this and use the fastest way on its own.
TheGamingArt is correct about NSArray being as optimal as could be. However, this is only for obj-c.
Don't forget you have access to C and C++, which means you can use vectors, which should be only 'slightly' faster since they don't use Objective-C messaging. However, it wouldn't surprise me if the difference isn't noticeable. C++ vector benchmarks: http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html
This is a good example of Premature Optimization (http://c2.com/cgi/wiki?PrematureOptimization). I suggest you look into GCD or NSOperations (http://www.raywenderlich.com/19788/how-to-use-nsoperations-and-nsoperationqueues)
I would like to know what isEqualToArray actually does...
I have an array with size 160, each containing a dictionary with 11 entries, but I can do the comparison simply based on the first column (contains the date when the row was changed).
Now I can do that with a simple for-cycle:
BOOL different = NO;
for (NSUInteger index = 0; index < [newInfo count]; ++index) {
    // Compare only the "Update" entry of each row.
    if (![[[oldInfo objectAtIndex:index] objectForKey:@"Update"] isEqual:[[newInfo objectAtIndex:index] objectForKey:@"Update"]]) {
        different = YES;
        break;
    }
}
if (different) {
}
else
    NSLog(@"Contact information hasn't been updated yet");
Or I can use the built-in isEqualToArray method:
if ([oldInfo isEqualToArray:newInfo])
    NSLog(@"Contact information hasn't been updated yet");
else {
    NSLog(@"Contact information has been updated, saving new contact information");
    [newInfo writeToFile:path atomically:YES];
}
Now, assuming isEqualToArray: just invokes isEqual: for each cell, the for-loop method runs in about 1/11 of the time isEqualToArray: does (it only needs to compare one column instead of 11).
Maybe I'm just too much into optimizing... (I've been at many contests where runtime was limited and I'm feeling the after-effects).
The Documentation says:
Two arrays have equal contents if they each hold the same number of objects and objects at a given index in each array satisfy the isEqual: test.
So basically you are right.
From a design point of view, I would either go for isEqualToArray:, since it makes the code easier to understand, or introduce a BOOL hasUpdates if you are concerned about performance, which has the additional advantage that you don't have to hold two copies in memory.
I suspect that many people wrongly assume that performance is proportional to the number of source statements executed and that a function like isEqualToArray is blindingly fast compared to the equivalent directly-coded loop.
In fact, while sometimes the coders of these APIs do indeed know a few "tricks of the trade" that speed things up a bit (or have access to internal interfaces you can't use), just as often they must throw in additional logic to handle "oddball" cases that you don't care about, or simply to make the API "general".
So in most cases the choice should be based on which most reasonably fits the overall program and makes the logic clear. In some cases the explicit loop is better, especially if one can harness some of the logic (eg, to take a later-required "max" of the array values) to avoid duplication of effort.
Also, when there is a complex API function (more complex than isEqualToArray) you're not quite sure you understand, it's often better to code things in a straight-forward manner rather than deal with the complex function. Once you have the code working you can come back and "optimize" things to use the complex API.
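If the explicit loop is the better fit, it can be wrapped in a small helper. The sketch below is one way to do that; UpdateStampsDiffer is a hypothetical name, and the count check and nil handling are additions that the loop in the question does not have.

```objc
#import <Foundation/Foundation.h>

// Returns YES if the two arrays differ in length or in any row's "Update" entry.
static BOOL UpdateStampsDiffer(NSArray<NSDictionary *> *oldInfo,
                               NSArray<NSDictionary *> *newInfo) {
    // Different lengths mean different contents, and this check also keeps
    // the indexing below within bounds for both arrays.
    if (oldInfo.count != newInfo.count) return YES;

    for (NSUInteger i = 0; i < newInfo.count; i++) {
        id oldStamp = oldInfo[i][@"Update"];
        id newStamp = newInfo[i][@"Update"];
        // Pointer equality covers the case where both entries are missing (nil),
        // because messaging nil with isEqual: returns NO.
        BOOL same = (oldStamp == newStamp) || [oldStamp isEqual:newStamp];
        if (!same) return YES;
    }
    return NO;
}
```

The call site then reads as if (UpdateStampsDiffer(oldInfo, newInfo)) { ... }, which keeps the one-column comparison while staying about as readable as isEqualToArray:.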
When you know both objects are arrays, the isEqualTo<Class> method is a faster way to check equality than a for loop.
isEqualTo<Class> methods provide class-specific checks for equality, so isEqualToArray: checks that the arrays contain an equal number of objects and that the objects at each index are equal.
So, to my knowledge, isEqualToArray: is the better option when you know that the two objects are arrays.
I came across this post while I was looking for things to improve performance. Currently, in my application we are returning IList<> all over the place. Is it a good idea to change all of these returns to AsQueryable() ?
Here is what I found -
AsQueryable() - Context needs to be open and you cannot control the lifetime of the database context; it needs to be disposed properly. Also it is deferred execution ('faster filtering' as compared to Lists)
IList<> - This should be preferred over List<> as it provides a barebone and lightweight implementation.
Also, when should one be preferred over the other? I know the basics, but I am sorry, I am still not clear on when and how we should use them correctly in an application. It would be great to know this so that next time I can keep it in mind before returning anything. Thanks a lot.
Basically, you should try to reference the widest type you need. For example, if some variable is declared as List<...>, you put a constraint for the type of the values that can be assigned to it. It may happen that you need only sequential access, so it would be enough to declare the variable as IEnumerable<...> instead. That will allow you to assign the values of other types to the variable, as well as the results of LINQ operations.
If you see that your variable needs access by index, you can again declare it as IList<...> and not just List<...>, allowing other types implementing IList<...> be assigned to it.
For the function return types, it depends upon you. If you think it's important that the function returns exactly List<...>, you declare it to return exactly List<...>. If the only important thing is access to the result by index, perhaps you don't need to constrain yourself to return exactly List<...>, you may declare return type as IList<...> (but return actually an instance of List<...> in this implementation, and possibly of some another type supporting IList<...> later). Again, if you see that the only important thing about the return value of your function is that it can be enumerated (and the access by index is not needed), you should change the function return type to IEnumerable<...>, giving yourself more freedom.
Now, about AsQueryable: again, it depends on your logic. If you think that possible delayed evaluation is a good thing in your case, because it may help avoid unneeded calculations, or you intend to use the result as part of another query, use it. If you think that the results have to be "materialized", i.e., calculated at this very moment, you had better return a List<...>. You especially need to materialize your result if evaluating it later may produce a different list!
With a database, a good rule of thumb is to use AsQueryable for short-term intermediate results, but List for the "final" results that will be used for some longer time. Of course, having a non-materialized query hanging around makes closing the database impossible (since the database must still be open at the moment of actual evaluation).
If you do not intend to run any further queries against SQL Server, then you should return IList, because it holds in-memory data.
If you are concerned about performance, you should also try to run your queries in as few DB requests as possible and cache the most-used queries. It is very common to reduce request processing time significantly by using batch approaches.
Which ORM do you use to retrieve data from DB? If you use NHibernate, see this post about how to use Future, Multi Criteria 1, Multi Criteria 2 and Multi Query.
Greetings.