Performance issue while finding min and max with functional approach - ios

I have an array of subviews and I want to find the lowest tag and the highest tag (~ min and max). I tried to play with the functional approach of Swift and optimized it as much as my knowledge allowed me, but when I do this:
let startVals = (min: Int.max, max: Int.min)
var minMax: (min: Int, max: Int) = subviews.filter({ $0 is T2GCell }).reduce(startVals) {
    (min($0.min, $1.tag), max($0.max, $1.tag))
}
I still get worse performance (approximately 10x slower) than with a good ol' for loop:
var lowest2 = Int.max
var highest2 = Int.min
for view in subviews {
    if let cell = view as? T2GCell {
        lowest2 = lowest2 > cell.tag ? cell.tag : lowest2
        highest2 = highest2 < cell.tag ? cell.tag : highest2
    }
}
To be precise, I am also including a snippet of the measuring code. Note that the conversion to human-readable times is done outside of the measured block:
let startDate: NSDate = NSDate()
// code
let endDate: NSDate = NSDate()
// outside of measuring block
let dateComponents: NSDateComponents = NSCalendar(calendarIdentifier: NSCalendarIdentifierGregorian)!.components(NSCalendarUnit.CalendarUnitNanosecond, fromDate: startDate, toDate: endDate, options: NSCalendarOptions(0))
let time = Double(Double(dateComponents.nanosecond) / 1000000.0)
My question is: am I doing it wrong, or is this use case simply not suitable for a functional approach?
EDIT
This is 2x slower:
var extremes = reduce(lazy(subviews).map({ $0.tag }), startValues) {
    (min($0.lowest, $1), max($0.highest, $1))
}
And this is only 20% slower:
var extremes2 = reduce(lazy(subviews), startValues) {
    (min($0.lowest, $1.tag), max($0.highest, $1.tag))
}
With these changes I got the times down considerably, but still not as fast as the for loop.
EDIT 2
I noticed I left out the filter in previous edits. When added:
var extremes3 = reduce(lazy(subviews).filter({ $0 is T2GCell }), startValues) {
    (min($0.lowest, $1.tag), max($0.highest, $1.tag))
}
I'm back to 2x slower performance.

In optimized builds, reduce and for should perform identically. In unoptimized debug builds, however, a for loop may beat the reduce version, because reduce will not be specialized and inlined. You can remove the filter to avoid creating an unnecessary intermediate array, but that array creation is going to be pretty fast (all it does is copy pointers into memory), so it is not really a big deal; removing it is more about clarity.
However, I believe part of the problem is that in your reduce, you are calling the .tag property on AnyObject, whereas in your for loop version, you are calling T2GCell.tag. This could make a big difference. You can see this if you break out the filter:
// filtered will be of type [AnyObject]
let filtered = subviews.filter({ $0 is T2GCell })
let minMax: (min: Int, max: Int) = filtered.reduce(startVals) {
    // so $1.tag is calling AnyObject.tag, not T2GCell.tag
    (min($0.min, $1.tag), max($0.max, $1.tag))
}
This means .tag is going to be dynamically bound at runtime, potentially a slower operation.
Here's some sample code that demonstrates the difference. If you compile it with swiftc -O, you'll see that the statically bound (or rather, not-quite-so-dynamically bound) reduce and the for loop perform pretty much the same:
import Foundation

@objc class MyClass: NSObject {
    var someProperty: Int
    init(_ x: Int) { someProperty = x }
}

let classes: [AnyObject] = (0..<10_000).map { _ in MyClass(Int(arc4random())) }

func timeRun<T>(name: String, f: () -> T) -> String {
    let start = CFAbsoluteTimeGetCurrent()
    let result = f()
    let end = CFAbsoluteTimeGetCurrent()
    let timeStr = toString(Int((end - start) * 1_000_000))
    return "\(name)\t\(timeStr)µs, produced \(result)"
}

let runs = [
    ("Using AnyObj.someProperty", {
        // someProperty is looked up dynamically on AnyObject
        reduce(classes, 0) { prev, next in max(prev, next.someProperty) }
    }),
    ("Using MyClass.someProperty", {
        // cast first, then read someProperty on the concrete class
        reduce(classes, 0) { prev, next in
            (next as? MyClass).map { max(prev, $0.someProperty) } ?? prev
        }
    }),
    ("Using plain ol' for loop", {
        var maxSoFar = 0
        for obj in classes {
            if let mc = obj as? MyClass {
                maxSoFar = max(maxSoFar, mc.someProperty)
            }
        }
        return maxSoFar
    }),
]
println("\n".join(map(runs, timeRun)))
Output from this on my machine:
Using AnyObj.someProperty 4115µs, produced 4294310151
Using MyClass.someProperty 1169µs, produced 4294310151
Using plain ol' for loop 1178µs, produced 4294310151
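For reference, here is a minimal sketch of the same idea in modern Swift 5 syntax (not the Swift 1 code above): cast each element once with compactMap so that tag is read through the concrete type, and use lazy so that no intermediate array is built. View and Cell are hypothetical stand-ins for UIView and T2GCell so the sketch runs without UIKit, and cellTagRange is an illustrative name; no timings are claimed, so measure an optimized build yourself.

```swift
// Hypothetical stand-ins for UIView and T2GCell so the sketch runs without UIKit.
class View {
    let tag: Int
    init(tag: Int) { self.tag = tag }
}

final class Cell: View {}

/// Returns the lowest and highest tag among the Cell instances in subviews.
/// With no cells, the (Int.max, Int.min) start value comes back unchanged,
/// just like the reduce in the question.
func cellTagRange(in subviews: [View]) -> (min: Int, max: Int) {
    let start = (min: Int.max, max: Int.min)
    return subviews.lazy
        .compactMap { $0 as? Cell }   // one cast per element, no intermediate array
        .reduce(start) { acc, cell in
            // cell.tag is read through the concrete type, not AnyObject
            (min: min(acc.min, cell.tag), max: max(acc.max, cell.tag))
        }
}
```

Because the chain is lazy, filtering, casting and reducing all happen in a single pass over subviews.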

I can't reproduce your exact example, but you can try removing the filter. The following code should be functionally equivalent to your last attempt:
var extremes4 = reduce(subviews, startValues) {
    $1 is T2GCell ? (min($0.lowest, $1.tag), max($0.highest, $1.tag)) : $0
}
This way you don't iterate over subviews twice. Note that I removed lazy, since it appears you always use the entire list.
By the way, IMHO functional programming can be a very useful approach, but I would think twice before sacrificing code clarity just for the sake of a fancy functional approach. If a for loop is clearer, and even faster... just use it ;-) That said, it is good for you to experiment with different ways to approach the same problem.

One issue I can think of is that the looping is done twice: first filter returns a filtered array, and then reduce loops over it.
filter(_:)
Returns an array containing the elements of the array for which a provided closure indicates a match.
Declaration
func filter(includeElement: (T) -> Bool) -> [T]
Discussion
Use this method to return a new array by filtering an existing array. The closure that you supply for includeElement: should return a Boolean value to indicate whether an element should be included (true) or excluded (false) from the final collection.
In the for-loop version, by contrast, there is only one loop.
I am not sure whether there is any difference in execution time between the 'is' and 'as?' operators.

Related

Corrupted memory with DispatchQueue.concurrentPerform

I see strange behavior with the following code (run in a Playground).
import Foundation

let count = 100
var array = [[Int]](repeating: [Int](), count: count)
DispatchQueue.concurrentPerform(iterations: count) { (i) in
    array[i] = Array(i..<i+count)
}
// Evaluation
for (i, value) in array.enumerated() {
    if (value.count != count) {
        print(i, value.count)
    }
}
The result is different each time, and it sometimes crashes with memory corruption. It looks like the memory of "array" is being reallocated while another thread is accessing it.
Is this a bug in iOS or expected behavior? Am I missing something?
This is expected behaviour. Swift arrays are not thread-safe; that is, modifying a Swift array from multiple threads concurrently will cause corruption.
I realise that you are just experimenting, but even if arrays were thread safe, this would not be a good use of concurrentPerform and would probably perform worse than a simple for loop given the threading overhead.
Once you introduce an appropriate synchronisation method to guard the array update, such as dispatching that update onto a serial dispatch queue, it will definitely perform more slowly than a simple for loop.
Here is the solution. Thank you for the quick responses.
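A minimal sketch of that serial-queue approach, assuming Swift 5 and the Dispatch module; buildRows is a hypothetical name. The rows are computed in parallel, but every write into the shared array goes through lockQueue.sync, so the array is never mutated by two threads at once. As noted above, for work this small it will still be slower than a plain for loop.

```swift
import Dispatch

/// Builds count rows concurrently; only the write into the shared array is serialized.
func buildRows(count: Int) -> [[Int]] {
    var rows = [[Int]](repeating: [], count: count)
    // A serial queue runs one submitted closure at a time, so it acts as a lock
    // around the mutation of rows.
    let lockQueue = DispatchQueue(label: "lockQueue")
    DispatchQueue.concurrentPerform(iterations: count) { i in
        let row = Array(i..<i + count)     // the (expensive) work runs in parallel
        lockQueue.sync { rows[i] = row }   // the mutation does not
    }
    // concurrentPerform returns only after every iteration (and its sync) finished.
    return rows
}
```

Using sync rather than async removes the need for a DispatchGroup, because each iteration waits for its own write before returning.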
import Foundation

let count = 1000
var arrays = [[Int]](repeating: [Int](), count: count)
let dispatchGroup = DispatchGroup()
let lockQueue = DispatchQueue(label: "lockQueue")
DispatchQueue.concurrentPerform(iterations: count) { (i) in
    dispatchGroup.enter()
    let array = Array(i..<i+count) // The actual code is very complex
    lockQueue.async {
        arrays[i] = array
        dispatchGroup.leave()
    }
}
dispatchGroup.wait()
// Evaluation
for (i, value) in arrays.enumerated() {
    if (value.count != count) {
        print(i, value.count)
    }
}
In my case I had to generate an array of 24k Float elements multiple times per second, and it took around 40 ms each time on an old iPhone 6.
Since each element of the array is only assigned once, I decided to try a raw array using pointers:
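class UnsafeArray<T> {
    let count: Int
    let cArray: UnsafeMutablePointer<T>

    init(_ size: Int) {
        count = size
        // Memory is allocated but not initialized; assigning through the
        // subscript is only safe for trivial types such as Float.
        cArray = UnsafeMutablePointer<T>.allocate(capacity: size)
    }

    subscript(index: Int) -> T {
        get { return cArray[index] }
        set { cArray[index] = newValue }
    }

    deinit {
        // Memory from allocate(capacity:) should be released with deallocate(), not free()
        cArray.deallocate()
    }
}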
Then I used it like this:
let result = UnsafeArray<Float>(24000)
DispatchQueue.concurrentPerform(iterations: result.count, execute: { i in
    result[i] = someCalculation()
})
And it worked! Now it takes 9-16ms. Also, I have no memory leaks using this code.
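On Swift 5.1 or later, a similar result can be had without a hand-rolled class: Array(unsafeUninitializedCapacity:initializingWith:) hands you the array's own uninitialized storage. This sketch assumes, like the original, that each index is written exactly once and that the calculation is safe to call from several threads; computeConcurrently and calculate are hypothetical names standing in for the loop and someCalculation().

```swift
import Dispatch

/// Fills a [Float] of length count in parallel, with calculate(i) as element i.
func computeConcurrently(count: Int, _ calculate: (Int) -> Float) -> [Float] {
    return [Float](unsafeUninitializedCapacity: count) { buffer, initializedCount in
        guard let base = buffer.baseAddress else {   // may be nil when count == 0
            initializedCount = 0
            return
        }
        // Each iteration initializes a different element, so no two threads
        // write to the same memory and the storage is never reallocated.
        DispatchQueue.concurrentPerform(iterations: count) { i in
            (base + i).initialize(to: calculate(i))
        }
        initializedCount = count
    }
}
```

The array owns its memory afterwards, so there is no deinit or deallocation to get right.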

Swift Performance with instance property arrays

I've come across an interesting Swift performance problem and was looking for suggestions and some analysis of why this is happening.
I have an algorithm that requires hundreds of thousands of array accesses in a loop. I find that if I reference the array as an instance property (from inside the same class instance), performance is very poor. It seems that the array is being dereferenced on every iteration. That seems strange, given that the arrays are members of the same class doing the work. Surely accessing self.x shouldn't require x to be dereferenced over and over again? The equivalent Java code doesn't have the same performance problem.
In the example below, test3 takes 0.5 seconds and test4 takes 0.15 seconds.
Do I really have to go through all my code and assign locally scoped arrays every single time I do something?
Any tips/ideas would be welcome. I have the compiler optimization set to Fast-O.
Simon
EDIT: The answer is spelled out in this article here:
https://developer.apple.com/swift/blog/?id=27
Hope it helps. In short, marking the class-scoped variables private or final removes the unwanted indirection (dynamic dispatch) when accessing the array.
import Foundation

class MyClass {
    var array_1 = [Int64](count: 16, repeatedValue: 0)
    var array_2 = [Int64](count: 16, repeatedValue: 0)

    func runTest3() {
        // test #3: read the instance properties inside the loop
        let start = NSDate().timeIntervalSince1970
        for i in 0 ... 10000000 {
            if (array_1[i % 16] & array_2[i % 16]) != 0 {
                // whatever
            }
        }
        let passed = NSDate().timeIntervalSince1970 - start
        print("3 time passed: \(passed)")
    }

    func runTest4() {
        // test #4: copy the properties into local constants first
        let start = NSDate().timeIntervalSince1970
        let localArray_1 = self.array_1
        let localArray_2 = self.array_2
        for i in 0 ... 10000000 {
            if (localArray_1[i % 16] & localArray_2[i % 16]) != 0 {
                // whatever
            }
        }
        let passed = NSDate().timeIntervalSince1970 - start
        print("4 time passed: \(passed)")
    }
}
https://developer.apple.com/swift/blog/?id=27
Marking the class-scoped variables private or final removes the performance problem; the reasons are explained in the article above. Thanks, everyone, for the help.
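A minimal Swift 5 sketch of that fix; BitMaskTester and countOverlaps are hypothetical names, and the loop mirrors test3 above. Declaring the class final (and the properties private) lets the compiler prove the properties are never overridden, so it may access their storage directly instead of through dynamically dispatched accessors. Whether this fully closes the gap with the local-copy version is something to confirm in an optimized build.

```swift
/// final: no subclass can override array1/array2, so accesses can be direct.
final class BitMaskTester {
    private var array1: [Int64]
    private var array2: [Int64]

    init(_ a: [Int64], _ b: [Int64]) {
        precondition(a.count == 16 && b.count == 16, "expects 16 elements each")
        array1 = a
        array2 = b
    }

    /// Counts iterations where the two masks at index i % 16 share a set bit.
    func countOverlaps(iterations: Int) -> Int {
        var hits = 0
        // & binds tighter than != in Swift, so this is (a & b) != 0
        for i in 0..<iterations where array1[i % 16] & array2[i % 16] != 0 {
            hits += 1
        }
        return hits
    }
}
```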

What is a fast way to convert a string of two characters to an array of booleans?

I have a long string (sometimes over 1000 characters) that I want to convert to an array of boolean values. And it needs to do this many times, very quickly.
let input: String = "001"
let output: [Bool] = [false, false, true]
My naive attempt was this:
input.characters.map { $0 == "1" }
But this is a lot slower than I'd like. My profiling has shown me that the map is where the slowdown is, but I'm not sure how much simpler I can make that.
I feel like this would be wicked fast without Swift's/ObjC's overhead. In C, I think this is a simple for loop where a byte of memory is compared to a constant, but I'm not sure which functions or syntax I should be looking at.
Is there a way to do this much faster?
UPDATE:
I also tried a for loop:
output = []
for char in input.characters {
    output.append(char == "1")
}
And it's about 15% faster. I'm hoping for a lot more than that.
This is faster:
// Algorithm 'A'
let input = "0101010110010101010"
var output = Array<Bool>(count: input.characters.count, repeatedValue: false)
for (index, char) in input.characters.enumerate() where char == "1" {
    output[index] = true
}
Update: with input = "010101011010101001000100000011010101010101010101", the timings are 0.0741 (author's approach) vs. 0.0087 (this approach), i.e. this approach is about 8.46 times faster. The larger the input, the bigger the advantage.
Using nulTerminatedUTF8 speeds things up slightly, although it is not always faster than algorithm A:
// Algorithm 'B'
let input = "10101010101011111110101000010100101001010101"
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
for (index, code) in input.nulTerminatedUTF8.enumerate() where code == 49 { // 49 is ASCII "1"
    output[index] = true
}
output.removeLast() // nulTerminatedUTF8 includes the trailing NUL byte; drop its entry
Benchmark with an input length of 2196 (graph not reproduced):
A: 0.311 s, B: 0.304 s
import Foundation

let input: String = "010101011001010101001010101100101010100101010110010101010101011001010101001010101100101010100101010101011001010101001010101100101010100101010"
var start = clock()
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
var index = 0
for val in input.nulTerminatedUTF8 {
    if val == 49 { // 49 is ASCII "1"
        output[index] = true
    }
    index += 1
}
output.removeLast() // drop the entry for the trailing NUL byte
var diff = clock() - start
var msec = diff * 1000 / UInt(CLOCKS_PER_SEC)
print("Time taken \(Double(msec)/1000.0) seconds \(msec%1000) milliseconds")
This should be really fast. Try it out. For 010101011010101001000100000011010101010101010101 it takes 0.039 secs.
I would guess that this is as fast as possible:
let targ = Character("1")
let input: String = "001" // your real string goes here
let inputchars = Array(input.characters)
var output: [Bool] = Array(count: inputchars.count, repeatedValue: false)
inputchars.withUnsafeBufferPointer { inputbuf in
    output.withUnsafeMutableBufferPointer { outputbuf in
        // walk both buffers in lockstep, like incrementing C pointers
        var ptr1 = inputbuf.baseAddress
        var ptr2 = outputbuf.baseAddress
        for _ in 0..<inputbuf.count {
            ptr2.memory = ptr1.memory == targ
            ptr1 = ptr1.successor()
            ptr2 = ptr2.successor()
        }
    }
}
// output now contains the result
The reason is that, thanks to the use of buffer pointers, we are simply cycling through contiguous memory, just like the way you cycle through a C array by incrementing its pointer. Thus, once we get past the initial setup, this should be as fast as it would be in C.
EDIT In an actual test, the time difference between the OP's original method and this one is the difference between
13.3660290241241
and
0.219357967376709
which is a pretty dramatic speed-up. I hasten to add, however, that I have excluded the initial set-up from the timing test. This line:
let inputchars = Array(input.characters)
...is particularly expensive.
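In current Swift (5.1 or later), the same contiguous-memory scan can be written without manual pointer stepping by using String.withUTF8, which passes the string's UTF-8 bytes as an UnsafeBufferPointer<UInt8>. A sketch, assuming the input contains only ASCII "0" and "1" (any other byte maps to false); bools(from:) is an illustrative name. Because it never builds Character values, it also avoids the expensive Array(input.characters) setup.

```swift
/// Converts a string of "0"/"1" characters to [Bool] by scanning its UTF-8 bytes
/// in contiguous memory.
func bools(from input: String) -> [Bool] {
    var copy = input                     // withUTF8 is a mutating method
    return copy.withUTF8 { bytes in      // bytes: UnsafeBufferPointer<UInt8>
        bytes.map { $0 == UInt8(ascii: "1") }
    }
}
```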
This should be a little faster than the enumerate() where char == "1" version (0.557s for 500_000 alternating ones and zeros vs. 1.159s algorithm 'A' from diampiax)
let input = inputStr.utf8
let n = input.count
var output = [Bool](count: n, repeatedValue: false)
let one = UInt8(49) // ASCII "1"
for (idx, char) in input.enumerate() {
    if char == one { output[idx] = true }
}
but it's also a lot less readable ;-p
Edit: both versions are slower than the map variant; maybe you forgot to compile with optimizations?
One more step should speed that up even more. Using reserveCapacity allocates the array's storage once before the loop starts, instead of growing it repeatedly as the loop runs.
var output = [Bool]()
output.reserveCapacity(input.characters.count)
for char in input.characters {
    output.append(char == "1")
}
Use withCString(_:) to get a raw UnsafePointer<Int8>. Iterate over it and compare each byte to 49 (the ASCII value of "1").
What about a more functional style? It's certainly not the fastest (47 ms), though...
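A sketch of that suggestion in Swift 5 syntax, where withCString passes an UnsafePointer<CChar>. It assumes the string holds only ASCII "0"/"1" characters, so that byte i corresponds to character i; boolsViaCString is a hypothetical name.

```swift
/// Compares each byte of the C string against 49 (ASCII "1").
func boolsViaCString(_ input: String) -> [Bool] {
    let n = input.utf8.count   // number of bytes before the NUL terminator
    return input.withCString { cString in
        var output = [Bool](repeating: false, count: n)
        for i in 0..<n where cString[i] == 49 {
            output[i] = true
        }
        return output
    }
}
```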
import Cocoa
let start = clock()
let bools = [Bool](([Character] ("010101011001010101001010101100101010100101010110010101010101011001010101001010101100101010100101010101011001010101001010101100101010100101010".characters)).map({$0 == "1"}))
let msec = (clock() - start) * 1000 / UInt(CLOCKS_PER_SEC)
print("Time taken \(Double(msec)/1000.0) seconds \(msec%1000) milliseconds")
I'd need to do some testing to be sure, but I think one issue with many of the approaches given, including the original map, is that they need to iterate over the string once to count the characters and then a second time to actually process them.
Have you tried:
let output = [Bool](input.characters.lazy.map { $0 == "1" })
This might only do a single iteration.
The other thing that could speed things up is avoiding strings and instead using arrays of characters in an appropriate encoding (particularly one with fixed-size units, e.g. UTF-16 or ASCII). Then the length lookup will be O(1) rather than O(n), and the iteration may be faster too.
BTW always test performance with the optimiser enabled and never in the Playground because the performance characteristics are completely different, sometimes by a factor of 100.

This algorithm takes more time than expected

I have this code, in which I call the checkingWords function. I am not using any threading in my app, but I would love to if it benefits the app's performance.
The checkingWords function takes more time than I expected: more than 30 seconds to complete. I can't wait that long in the middle of a game.
Could somebody help me rewrite the function so that it executes faster? A functional programming approach would be welcome, if possible.
func returnCharactersFromAFourLetterString(inputString : String) -> (First : Character, Second : Character, Third : Character, Fourth : Character)
{
    return (inputString[advance(inputString.startIndex, 0)], inputString[advance(inputString.startIndex, 1)], inputString[advance(inputString.startIndex, 2)], inputString[advance(inputString.startIndex, 3)])
}

func checkingWords(userEnteredWord : String)
{
    var tupleFourLetters = self.returnCharactersFromAFourLetterString(userEnteredWord)
    var firstLetter = String(tupleFourLetters.First)
    var secondLetter = String(tupleFourLetters.Second)
    var thirdLetter = String(tupleFourLetters.Third)
    var fourthLetter = String(tupleFourLetters.Fourth)
    var mainArrayOfWords : [String] = [] // This array contains around 0.2 million words
    var userEnteredTheseWords : [String] = [] // This array contains less than 10 elements
    // Check for FirstLetter
    for index in 0..<array.count // Array of letters as Strings, count = 200
    {
        var input = array[index]
        var firstWord = "\(input)\(secondLetter)\(thirdLetter)\(fourthLetter)"
        var secondWord = "\(firstLetter)\(input)\(thirdLetter)\(fourthLetter)"
        var thirdWord = "\(firstLetter)\(secondLetter)\(input)\(fourthLetter)"
        var fourthWord = "\(firstLetter)\(secondLetter)\(thirdLetter)\(input)"
        if !contains(userEnteredTheseWords, firstWord) && !contains(userEnteredTheseWords, secondWord) && !contains(userEnteredTheseWords, thirdWord) && !contains(userEnteredTheseWords, fourthWord)
        {
            if contains(mainArrayOfWords, firstWord)
            {
                self.delegate?.wordMatchedFromDictionary(firstWord)
                return
            }
            else if contains(mainArrayOfWords, secondWord)
            {
                self.delegate?.wordMatchedFromDictionary(secondWord)
                return
            }
            else if contains(mainArrayOfWords, thirdWord)
            {
                self.delegate?.wordMatchedFromDictionary(thirdWord)
                return
            }
            else if contains(mainArrayOfWords, fourthWord)
            {
                self.delegate?.wordMatchedFromDictionary(fourthWord)
                return
            }
        }
        if index == array.count - 1
        {
            self.delegate?.wordMatchedFromDictionary("NoWord")
        }
    }
}
The input to this function is a four-letter word. Inside the function I replace each letter in turn with each of the 200 letters and check whether the resulting word exists in mainArrayOfWords. If it does, the function reports that word; otherwise it reports "NoWord". In total, contains(mainArrayOfWords, word) is called around 800 times, and I think this is the line that takes the most time, because mainArrayOfWords contains about 0.2 million words.
Use dictionaries to look things up.
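A sketch of that idea, assuming Swift 5: build a Set from mainArrayOfWords once (a Set is Swift's hash-based collection of keys, looked up like dictionary keys) and reuse it, so each membership test is an average O(1) hash lookup instead of a linear scan of 200,000 words. firstDictionaryVariant and its parameters are hypothetical names. It keeps the original's rules (skip a replacement letter if any of its four variants was already entered, return the first match in position order) and returns nil where the original reports "NoWord".

```swift
/// Returns the first one-letter variant of word found in dictionary, or nil.
func firstDictionaryVariant(of word: String,
                            alphabet: [Character],
                            dictionary: Set<String>,
                            alreadyUsed: Set<String>) -> String? {
    let letters = Array(word)
    guard letters.count == 4 else { return nil }
    for replacement in alphabet {
        // The four words obtained by replacing position 0, 1, 2 or 3.
        let candidates = (0..<4).map { position -> String in
            var copy = letters
            copy[position] = replacement
            return String(copy)
        }
        // Same rule as the original: skip this letter if any variant was already entered.
        if candidates.contains(where: { alreadyUsed.contains($0) }) { continue }
        // Set.contains hashes the word instead of scanning every element.
        if let match = candidates.first(where: { dictionary.contains($0) }) {
            return match
        }
    }
    return nil
}
```

Building the Set (for example Set(mainArrayOfWords)) is itself O(n), so it should happen once when the word list is loaded, not on every call.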
When you measure times, especially with Swift code, measure a release build, not a debug build. Also, measure on the slowest device capable of running your code.
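A small timing helper along those lines, assuming Swift 5 and the Dispatch module; measure is a hypothetical name. DispatchTime.now() reads a monotonic clock, which is simpler than subtracting wall-clock dates. Run it in a release (-O) build for meaningful numbers.

```swift
import Dispatch

/// Runs body once and returns its result plus the elapsed time in milliseconds.
func measure<T>(_ body: () -> T) -> (result: T, milliseconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds   // monotonic, never goes backwards
    let result = body()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}
```

For example, measure { checkWord(...) }.milliseconds would give the cost of one call of a hypothetical checkWord function; returning the result also keeps the optimizer from discarding the measured work.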

Using Swift to sort array of NSString's results in low memory warning

I'm implementing a search function in which the end result is an Array of NSString, sorted by how closely they resemble the search string. The fuzzy match algorithm is custom, and typically doesn't have a problem.
It does, however, have a memory issue when the Array contains thousands of NSString that are very similar (i.e. Title, Copy of Title, Title 2). Instruments reports that the persistent memory at the time of crash is 98% from malloc of NSString with my fuzzy match algorithm being the responsible caller.
On smaller sets (2,000 random strings) that don't crash, the memory is released and everything appears to behave as expected. Any thoughts on how to decrease the large memory usage?
data = data.filter({ (item) -> Bool in
    var itemString = self.converter(item)
    return itemString.scoreAgainst(string) > 0
}).sorted({ (item1, item2) -> Bool in
    var string1 = self.converter(item1)
    var string2 = self.converter(item2)
    return string1.scoreAgainst(string) > string2.scoreAgainst(string)
})
The scoreAgainst method is simple and clean: it just performs a series of lowercasing, uppercasing, rangeOfString: and substringWithRange: calls to compute a score for the match.
If memory is released when it's done but the peak memory usage is too high, you can use autoreleasepool to minimize peak memory usage:
data = data.filter { (item) -> Bool in
    var isPositive: Bool!
    autoreleasepool {
        let itemString = self.converter(item)
        isPositive = itemString.scoreAgainst(string) > 0
    }
    return isPositive
}.sorted { (item1, item2) -> Bool in
    var isGreaterThan: Bool!
    autoreleasepool {
        let string1 = self.converter(item1)
        let string2 = self.converter(item2)
        isGreaterThan = string1.scoreAgainst(string) > string2.scoreAgainst(string)
    }
    return isGreaterThan
}
As Alex points out, if converter and scoreAgainst are expensive, you might want to reduce the number of calls by calling them only once per item (though I would suggest you still need autoreleasepool, because this simpler logic reduces the number of calls to your routine but doesn't eliminate them):
data = data.map { item -> (String, Double) in
    var score: Double!
    autoreleasepool {
        score = self.converter(item).scoreAgainst(string)
    }
    return (item, score)
}
.filter { $0.1 > 0 }
.sorted { $0.1 > $1.1 }
.map { $0.0 }
It wasn't clear whether item1 was a String or something else, so make sure the closure signature (before in) of the map call matches its type, but hopefully this illustrates the idea.
You are doing something very expensive and repetitive there: constructing two new objects in each call to the comparison closure.
In principle, the comparison might be called as many as N^2 times (with a very bad sort algorithm), and more likely about N log N times.
How about the following code:
data = Array(zip(data, data.map { self.converter($0).scoreAgainst(string) }))
    .filter({ (item, score) -> Bool in
        return score > 0
    }).sorted({ (pair1, pair2) -> Bool in
        // compare the precomputed scores (second tuple element)
        return pair1.1 > pair2.1
    }).map({ (item, score) in item })
Here we calculate the score N times, instead of N + N log N times, and avoid constructing the temporary objects.
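The same score-once idea (often called decorate-sort-undecorate) in Swift 5 syntax, as a sketch: rankByScore is a hypothetical name, and the score closure stands in for self.converter(item).scoreAgainst(string). The sort comparator only compares cached Double values, so no strings are converted or scored while sorting.

```swift
/// Keeps items with a positive score, ordered from highest to lowest score.
/// score is called exactly once per item.
func rankByScore(_ items: [String], score: (String) -> Double) -> [String] {
    return items
        .map { (item: $0, score: score($0)) }   // decorate: N calls to score
        .filter { $0.score > 0 }
        .sorted { $0.score > $1.score }         // compares cached numbers only
        .map { $0.item }                        // undecorate
}
```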
