Swift Collection underestimateCount usage - ios

I wonder, what is the use case for Collection underestimateCount? Documentation says that it has the same complexity as standard Collection count.
/// Returns a value less than or equal to the number of elements in
/// `self`, *nondestructively*.
///
/// - Complexity: O(N).
public func underestimateCount() -> Int
But it doesn't describe when it should be used and for what reason.

underestimatedCount is actually a requirement of the Sequence protocol, and has a default implementation that just returns 0:
public var underestimatedCount: Int {
    return 0
}
However, for sequences that provide their own implementation of underestimatedCount, this can be useful for logic that needs a lower bound of how long the sequence is, without having to iterate through it (remember that Sequence gives no guarantee of non-destructive iteration).
For example, the map(_:) method on Sequence (see its implementation here) uses underestimatedCount in order to reserve an initial capacity for the resultant array:
public func map<T>(
    _ transform: (Iterator.Element) throws -> T
) rethrows -> [T] {
    let initialCapacity = underestimatedCount
    var result = ContiguousArray<T>()
    result.reserveCapacity(initialCapacity)
    // ...
This allows map(_:) to minimise the cost of repeatedly appending to the result, as an initial block of memory has (possibly) already been allocated for it (although it's worth noting in any case that ContiguousArray has an exponential growth strategy that amortises the cost of appending).
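As a small illustration of the same idea outside of map(_:), reserving capacity up front replaces repeated grow-and-copy steps with a single allocation (a minimal sketch, not taken from the standard library):
var result = ContiguousArray<Int>()
result.reserveCapacity(100)   // one up-front allocation instead of repeated growth
for i in 0..<100 {
    result.append(i * i)      // these appends never need to reallocate
}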
However, in the case of a Collection, the default implementation of underestimatedCount actually just returns the collection's count:
public var underestimatedCount: Int {
    // TODO: swift-3-indexing-model - review the following
    return numericCast(count)
}
This will be an O(1) operation for collections that conform to RandomAccessCollection, and O(n) otherwise.
Therefore, because of this default implementation, using a Collection's underestimatedCount directly is definitely less common than using a Sequence's, as Collection guarantees non-destructive iteration, and in most cases underestimatedCount will just return the count.
Although, of course, custom collection types could provide their own implementation of underestimatedCount – giving a lower bound of how many elements they contain, in a possibly more efficient way than their count implementation, which could potentially be useful.
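For illustration, here is a minimal sketch of a custom Sequence that supplies its own lower bound (the type and property names are hypothetical, not from the standard library):
// A sequence that produces `element` exactly `minimumCount` times.
struct RepeatedElements<Element>: Sequence {
    let element: Element
    let minimumCount: Int

    // Promise a lower bound without iterating, so consumers such as
    // map(_:) can reserve capacity up front.
    var underestimatedCount: Int { return minimumCount }

    func makeIterator() -> AnyIterator<Element> {
        var produced = 0
        return AnyIterator {
            produced += 1
            return produced <= self.minimumCount ? self.element : nil
        }
    }
}
The value here happens to be exact, but the only guarantee consumers rely on is "at least this many elements", so a conservative estimate is equally valid.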

(Since the duplicate target I've suggested is somewhat outdated)
In Swift 3, the method underestimateCount() has been replaced by the computed property underestimatedCount. We can have a look at the source code for the implementation of the latter for Collection:
/// A value less than or equal to the number of elements in the collection.
///
/// - Complexity: O(1) if the collection conforms to
/// `RandomAccessCollection`; otherwise, O(*n*), where *n* is the length
/// of the collection.
public var underestimatedCount: Int {
    // TODO: swift-3-indexing-model - review the following
    return numericCast(count)
}
/// The number of elements in the collection.
///
/// - Complexity: O(1) if the collection conforms to
/// `RandomAccessCollection`; otherwise, O(*n*), where *n* is the length
/// of the collection.
public var count: IndexDistance {
    return distance(from: startIndex, to: endIndex)
}
It's apparent that underestimatedCount simply makes use of count for types conforming to Collection (unless those types implement their own version of underestimatedCount).
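A quick check of the two defaults (the second print assumes AnyIterator does not customize underestimatedCount, so the plain Sequence default of 0 applies):
let numbers = [1, 2, 3, 4]
print(numbers.underestimatedCount)   // 4 – the Collection default forwards to count

var next = 0
let stream = AnyIterator<Int> {
    next += 1
    return next <= 3 ? next : nil
}
print(stream.underestimatedCount)    // 0 – the plain Sequence default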

Related

Casting a multidimensional array to Data object for TF inference

I am currently using the Swift release of Tensorflow in my iOS app.
My model is working fine, but I am having trouble copying the data into the first Tensor so I can use the neural net to detect stuff.
I consulted the testsuite inside the repository, and their code is working as follows:
They are using some extensions:
extension Array {
    /// Creates a new array from the bytes of the given unsafe data.
    ///
    /// - Note: Returns `nil` if `unsafeData.count` is not a multiple of
    ///   `MemoryLayout<Element>.stride`.
    /// - Parameter unsafeData: The data containing the bytes to turn into an array.
    init?(unsafeData: Data) {
        guard unsafeData.count % MemoryLayout<Element>.stride == 0 else { return nil }
        let elements = unsafeData.withUnsafeBytes {
            UnsafeBufferPointer<Element>(
                start: $0,
                count: unsafeData.count / MemoryLayout<Element>.stride
            )
        }
        self.init(elements)
    }
}
extension Data {
    /// Creates a new buffer by copying the buffer pointer of the given array.
    ///
    /// - Warning: The given array's element type `T` must be trivial in that it can be copied bit
    ///   for bit with no indirection or reference-counting operations; otherwise, reinterpreting
    ///   data from the resulting buffer has undefined behavior.
    /// - Parameter array: An array with elements of type `T`.
    init<T>(copyingBufferOf array: [T]) {
        self = array.withUnsafeBufferPointer(Data.init)
    }
}
to create the array containing the data, and a Data object from that:
static let inputData = Data(copyingBufferOf: [Float32(1.0), Float32(3.0)])
Afterwards, they copy the inputData into the neural net.
I've tried to modify their code to load an image into a [1,28,28,1] Tensor.
The image is looking something like this:
[[[[Float32(254.0)],
[Float32(255.0)],
[Float32(254.0)],
[Float32(250.0)],
[Float32(252.0)],
[Float32(255.0)],
[Float32(255.0)],
[Float32(255.0)],
[Float32(255.0)],
[Float32(254.0)],
[Float32(214.0)],
[Float32(160.0)],
[Float32(130.0)],
[Float32(124.0)],
[Float32(129.0)],
...
you get the point.
But if I try to cast that to Data / init Data with the image data I somehow only get 8 bytes:
private func createTestData() -> Data {
    return Data(copyingBufferOf:
        [[[[Float32(254.0)],
           [Float32(255.0)],
           [Float32(254.0)],
           ...
The same thing happens with the code in the tests, but there it is correct (2 * Float32 = 8 bytes).
For me, that is far too small (it should be 28 * 28 * 4 = 3136 bytes)!
Is there something I am missing (have I overlooked something)?
What do I need to do to get my images into the correct arrays/data types?
A Swift Array is a fixed-sized structure with (opaque) pointers to the actual element storage. The withUnsafeBufferPointer() method calls the given closure with a buffer pointer to that element storage. In the case of a [Float] array, that is a pointer to the memory address of the floating point values. That's why
array.withUnsafeBufferPointer(Data.init)
works to get a Data value representing the floating point numbers.
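A quick sanity check of the flat case (a small sketch using the same call as above):
import Foundation

let flat: [Float32] = [1.0, 3.0]
let data = flat.withUnsafeBufferPointer(Data.init)
print(data.count)   // 8 – two Float32 values at 4 bytes each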
If you pass a nested array (e.g. of type [[Float]]) to the withUnsafeBufferPointer() method then the closure is called with a pointer to the Array structures of the inner arrays. So the element type now is not Float but [Float] – and not a “trivial type” in the sense of the warning
/// - Warning: The given array's element type `T` must be trivial in that it can be copied bit
/// for bit with no indirection or reference-counting operations; otherwise, reinterpreting
/// data from the resulting buffer has undefined behavior.
What you need to do is to flatten the nested array to a simple array, and then create a Data value from the simple array.
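A minimal sketch of that flattening step, using a tiny stand-in literal rather than the full 1×28×28×1 image, together with the copyingBufferOf initializer quoted above:
let nested: [[[[Float32]]]] = [[[[254.0], [255.0]],
                                [[130.0], [124.0]]]]   // tiny stand-in for the image
let flattened: [Float32] = nested
    .flatMap { $0 }   // -> [[[Float32]]]
    .flatMap { $0 }   // -> [[Float32]]
    .flatMap { $0 }   // -> [Float32]
let inputData = Data(copyingBufferOf: flattened)
print(inputData.count)   // 4 values × 4 bytes = 16; the full 28×28 image would give 3136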

"Preloading" A Dictionary With Keys in Swift

This is a fairly simple issue, but one I would like to solve, as it MAY help with performance.
I want to find out if Swift has a way to create a Dictionary, specifying ONLY keys, and maybe no values, or a single value that is set in each entry.
In other words, I want to create a Dictionary object, and "preload" its keys. Since this is Swift, the values could be 0 or nil (or whatever is a default empty).
The reason for this is so that I can avoid two loops: one where I go through filling a Dictionary with keys and empty values, and a second one where I then set those values (there's a practical reason for wanting this, which is a bit out of the scope of this question).
Here's sort of what I'm thinking:
func gimme_a_new_dictionary(_ inKeyArray: [Int]) -> [Int: Int] {
    var ret: [Int: Int] = [:]
    for key in inKeyArray {
        ret[key] = 0
    }
    return ret
}
let test1 = gimme_a_new_dictionary([4,6,1,3,0,1000])
But I'm wondering if there's a quicker way to do the same thing (as in a "language construct" way; I could probably figure out a faster way to do this in a function).
UPDATE: The first solution ALMOST works. It works fine in Mac/iOS. However, the Linux version of Swift 3 doesn't seem to have the uniqueKeysWithValues initializer, which is annoying.
func gimme_a_new_dictionary(_ inKeyArray: [Int]) -> [Int: Int] {
    return Dictionary<Int, Int>(uniqueKeysWithValues: inKeyArray.map { ($0, 0) })
}
let test1 = gimme_a_new_dictionary([4,6,1,3,0,1000])
For Swift 4, you can use the dictionary constructor that takes a sequence and use map to create the sequence of tuples from your array of keys:
let dict = Dictionary(uniqueKeysWithValues: [4,6,1,3,0,1000].map {($0, 0)})
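A variant of the same idea that avoids building an intermediate array of tuples is to zip the keys with repeatElement (both are standard-library functions); this is just a sketch of an alternative, not the only way:
let keys = [4, 6, 1, 3, 0, 1000]
let dict = Dictionary(uniqueKeysWithValues: zip(keys, repeatElement(0, count: keys.count)))
// [4: 0, 6: 0, 1: 0, 3: 0, 0: 0, 1000: 0]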
You could also optimize allocation by specifying a minimum capacity at initialization. The one-liner above is still essentially an allocation plus a loop that stores 0 for each key, so the main gain here is avoiding intermediate reallocations.
func gimme_a_new_dictionary(_ inKeyArray: [Int], minCapacity: Int) -> [Int: Int] {
    var ret = Dictionary<Int, Int>(minimumCapacity: minCapacity)
    for key in inKeyArray {
        ret[key] = 0
    }
    return ret
}
let test1 = gimme_a_new_dictionary([4,6,1,3,0,1000])
Take a look at this official documentation:
/// Use this initializer to avoid intermediate reallocations when you know
/// how many key-value pairs you are adding to a dictionary. The actual
/// capacity of the created dictionary is the smallest power of 2 that
/// is greater than or equal to `minimumCapacity`.
///
/// - Parameter minimumCapacity: The minimum number of key-value pairs to
/// allocate buffer for in the new dictionary.
public init(minimumCapacity: Int)
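A quick check of that behavior (capacity is a standard Dictionary property; the exact rounding is an implementation detail, so treat this as a sketch):
var preallocated = Dictionary<Int, Int>(minimumCapacity: 6)
print(preallocated.capacity >= 6)   // true – room for 6 pairs without reallocating
preallocated[42] = 0                // inserting within that capacity does not reallocate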

unable to find enumerate() func in Swift standard library reference

I'm new to Swift and am learning the concept of an Array. I saw the code below in "The Swift Programming Language" (Swift 2.1).
var array = [1,2,3,4,5]
for (index, value) in array.enumerate() {
print("\(value) at index \(index)")
}
I want to read a bit more about the enumerate() func, so I looked up the Apple developer's page on Array; however, I could not find a func named enumerate() on that page. Am I looking in the wrong place, or is there something I am missing? Could someone please give me a hand? Thanks in advance for any help!
When you encounter a Swift standard library function or method that you can't find documentation on, command-click on it in Xcode. That will take you to its definition, which in this case is
extension SequenceType {
    /// Return a lazy `SequenceType` containing pairs (*n*, *x*), where
    /// *n*s are consecutive `Int`s starting at zero, and *x*s are
    /// the elements of `base`:
    ///
    ///     for (n, c) in "Swift".characters.enumerate() {
    ///         print("\(n): '\(c)'")
    ///     }
    ///     0: 'S'
    ///     1: 'w'
    ///     2: 'i'
    ///     3: 'f'
    ///     4: 't'
    @warn_unused_result
    public func enumerate() -> EnumerateSequence<Self>
}
What the above states is that enumerate() gives you back a tuple for each value in your collection, with the first element in the tuple being the index of the current item and the second being the value of that item.
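A quick illustration of those tuples (Swift 2 syntax, matching the question):
let pairs = Array(["a", "b", "c"].enumerate())
// pairs is [(0, "a"), (1, "b"), (2, "c")] – (index, element) tuples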

Swift nil check performance

I have a computed property that does a nil check.
var _textSize: CGSize?
var textSize: CGSize {
    get {
        if _textSize == nil {
            // compute _textSize
        }
        return _textSize!
    }
}
When profiling in Instruments, I see a call to static == infix <A where ...> (A?, A?) -> Bool, which I believe is the nil check. Is this the case, and if so, are nil checks expensive?
The correct answer really depends on what you mean by "expensive".
My point of view
However, IMHO, checking whether a value is nil is not an expensive operation.
It executes in O(1), i.e. in constant time that does not depend on any other values.
Finally, it's a very cheap operation for the CPU.
Class vs Struct
I suppose there is a difference (in terms of required time) depending on whether the computed property belongs to a class or to a struct.
Class
In the first case (a class), the runtime needs to fetch the instance of the class from the heap and then check whether the property is nil. The heap lives in RAM, which is fast (but not the fastest memory on the device).
Struct
On the other hand, if we are using a struct, the data backing the computed property lives on the stack, which is faster than the heap.
Wrap up
So, in conclusion:
checking whether the computed property of a Class is nil is fast
checking whether the computed property of a Struct is nil is very fast

How to implement the Hashable Protocol in Swift for an Int array (a custom string struct)

I am making a structure that acts like a String, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32. (See this question for more background.)
What I want to do
I want to be able to use my custom ScalarString struct as a key in a dictionary. For example:
var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value
// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...
// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
    // do something with value
}
Problem
In order to do that, ScalarString needs to implement the Hashable Protocol. I thought I would be able to do something like this:
struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    var hashValue: Int {
        get {
            return self.scalarArray.hashValue // error
        }
    }
}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.hashValue == right.hashValue
}
but then I discovered that Swift arrays don't have a hashValue.
What I read
The article Strategies for Implementing the Hashable Protocol in Swift had a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Specifically,
Object property (an array does not have a hashValue)
ID property (not sure how this could be implemented well)
Formula (seems like any formula for a string of 32 bit integers would be processor heavy and have lots of integer overflow)
ObjectIdentifier (I'm using a struct, not a class)
Inheriting from NSObject (I'm using a struct, not a class)
Here are some other things I read:
Implementing Swift's Hashable Protocol
Swift Comparison Protocols
Perfect hash function
Membership of custom objects in Swift Arrays and Dictionaries
How to implement Hashable for your custom class
Writing a good Hashable implementation in Swift
Question
Swift Strings have a hashValue property, so I know it is possible to do.
How would I create a hashValue for my custom structure?
Updates
Update 1: I would like to do something that does not involve converting to String and then using String's hashValue. My whole point in making my own structure was to avoid doing lots of String conversions. String gets its hashValue from somewhere. It seems like I could get mine using the same method.
Update 2: I've been looking into the implementation of string hash codes algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.
Java hashCode algorithm
C algorithms
hash function for string (SO question and answers in C)
Hashing tutorial (Virginia Tech Algorithm Visualization Research Group)
General Purpose Hash Function Algorithms
Update 3
I would prefer not to import any external frameworks unless that is the recommended way to go for these things.
I submitted a possible solution using the DJB Hash Function.
Update
Martin R writes:
As of Swift 4.1, the compiler can synthesize Equatable and Hashable conformance for types automatically, if all members conform to Equatable/Hashable (SE-0185). And as of Swift 4.2, a high-quality hash combiner is built into the Swift standard library (SE-0206). Therefore there is no longer any need to define your own hashing function; it suffices to declare the conformance:
struct ScalarString: Hashable, ... {
    private var scalarArray: [UInt32] = []
    // ...
}
Thus, the answer below needs to be rewritten (yet again). Until that happens refer to Martin R's answer from the link above.
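For illustration, a minimal usage sketch of the modern approach (assuming Swift 4.2 or later, where the conformance and hash combiner are synthesized because [UInt32] is itself Hashable):
struct ScalarString: Hashable {
    var scalarArray: [UInt32] = []
}

var suffixDictionary = [ScalarString: ScalarString]()
let key = ScalarString(scalarArray: [72, 105])          // "Hi" as scalar values
suffixDictionary[key] = ScalarString(scalarArray: [33]) // "!"
print(suffixDictionary[key] != nil)                     // true – synthesized == and hash(into:) at work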
Old Answer:
This answer has been completely rewritten after submitting my original answer to code review.
How to implement the Hashable protocol
The Hashable protocol allows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to
Implement the Equatable protocol (Hashable inherits from Equatable)
Return a computed hashValue
These points follow from the axiom given in the documentation:
x == y implies x.hashValue == y.hashValue
where x and y are values of some Type.
Implement the Equatable protocol
In order to implement the Equatable protocol, you define how your type uses the == (equivalence) operator. In your example, equivalence can be determined like this:
func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}
The == function is global so it goes outside of your class or struct.
Return a computed hashValue
Your custom class or struct must also have a computed hashValue variable. A good hash algorithm will provide a wide range of hash values. However, it should be noted that you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. It requires some extra work when there is a collision (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the == function does that extra work. (Update: It looks like == may do all the work.)
There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.
var hashValue: Int {
    return self.scalarArray.count
}
This would give a hash collision every time two arrays had the same number of elements but different values. NSArray apparently uses this approach.
DJB Hash Function
A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.
A Swift implementation provided by @MartinR follows:
var hashValue: Int {
    return self.scalarArray.reduce(5381) {
        ($0 << 5) &+ $0 &+ Int($1)
    }
}
This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce. This is equivalent, I believe:
var hashValue: Int {
    // DJB hash function
    var hash = 5381
    for scalar in self.scalarArray {
        hash = ((hash << 5) &+ hash) &+ Int(scalar)
    }
    return hash
}
The &+ operator is overflow addition: it lets the Int wrap around instead of trapping when hashing long strings would otherwise overflow it.
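A tiny demonstration of the difference (&+ is the standard overflow-addition operator):
let big = Int.max
// let trapped = big + 1   // plain + would trap at runtime on overflow
let wrapped = big &+ 1     // &+ wraps around instead
print(wrapped == Int.min)  // true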
Big Picture
We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarString is the custom type from the question. This will be different for different people, of course.
// Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {

    private var scalarArray: [UInt32] = []

    // required var for the Hashable protocol
    var hashValue: Int {
        // DJB hash function
        return self.scalarArray.reduce(5381) {
            ($0 << 5) &+ $0 &+ Int($1)
        }
    }
}

// required function for the Equatable protocol, which Hashable inherits from
func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}
Other helpful reading
Which hashing algorithm is best for uniqueness and speed?
Overflow Operators
Why are 5381 and 33 so important in the djb2 algorithm?
How are hash collisions handled?
Credits
A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.
Update
Swift is open source now so it is possible to see how hashValue is implemented for String from the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.
Edit (31 May '17): Please refer to the accepted answer. This answer is pretty much just a demonstration on how to use the CommonCrypto Framework
Okay, I went ahead and extended all arrays with the Hashable protocol by using the SHA-256 hashing algorithm from the CommonCrypto framework. You have to put
#import <CommonCrypto/CommonDigest.h>
into your bridging header for this to work. It's a shame that pointers have to be used though:
extension Array : Hashable, Equatable {
    public var hashValue : Int {
        var hash = [Int](count: Int(CC_SHA256_DIGEST_LENGTH) / sizeof(Int), repeatedValue: 0)
        withUnsafeBufferPointer { ptr in
            hash.withUnsafeMutableBufferPointer { (inout hPtr: UnsafeMutableBufferPointer<Int>) -> Void in
                CC_SHA256(UnsafePointer<Void>(ptr.baseAddress), CC_LONG(count * sizeof(Element)), UnsafeMutablePointer<UInt8>(hPtr.baseAddress))
            }
        }
        return hash[0]
    }
}
Edit (31 May '17): Don't do this. Even though SHA-256 has practically no hash collisions, it's the wrong idea to define equality by hash equality:
public func ==<T>(lhs: [T], rhs: [T]) -> Bool {
    return lhs.hashValue == rhs.hashValue
}
This is as good as it gets with CommonCrypto. It's ugly, but fast, and has practically no hash collisions.
Edit (15 July '15): I just made some speed tests:
Hashing randomly filled Int arrays of size n took, on average over 1000 runs:
n -> time
1000 -> 0.000037 s
10000 -> 0.000379 s
100000 -> 0.003402 s
Whereas with the string hashing method:
n -> time
1000 -> 0.001359 s
10000 -> 0.011036 s
100000 -> 0.122177 s
So the SHA-256 way is about 33 times faster than the string way. I'm not saying that using a string is a very good solution, but it's the only one we can compare it to right now
It is not a very elegant solution but it works nicely:
"\(scalarArray)".hashValue
or
scalarArray.description.hashValue
This just uses the textual representation as the hash source.
One suggestion - since you are modeling a String, would it work to convert your [UInt32] array to a String and use the String's hashValue? Like this:
var hashValue : Int {
    get {
        return String(self.scalarArray.map { UnicodeScalar($0) }).hashValue
    }
}
That could conveniently allow you to compare your custom struct against Strings as well, though whether or not that is a good idea depends on what you are trying to do...
Note also that, using this approach, instances of ScalarString would have the same hashValue if their String representations were canonically equivalent, which may or may not be what you desire.
So I suppose that if you want the hashValue to represent a unique String, my approach would be good. If you want the hashValue to represent a unique sequence of UInt32 values, @Kametrixom's answer is the way to go...
