Filter huge array optimising performance - ios

I have an array with over 300k objects which I'm showing in a UITableView. When filtering by prefix match using the filter method, the first search takes a bit over 60s! After that, the search is way faster taking around 1s, which I'd still want to improve a bit more.
The object looks like this:
struct Book {
let id: Int
let title: String
let author: String
let summary: String
}
This is how I'm filtering at the moment:
filteredBooks = books.filter { $0.title.lowercased().hasPrefix(prefix.lowercased()) }
The data comes from a JSON file which I decode using Codable (which takes a bit longer than I would like as well). I'm trying to achieve this without a database or any kind of framework implementation nor lazy loading the elements. I'd like to be able to show the 300k objects in the UITableView and realtime filter with a decent performance.
I've Googled a bit and found the Binary Search and Trie search algorithms but didn't know how to implement them to be able to use them with Codable and my struct. Also, maybe replacing the struct with another data type would help but not sure which either.

Because I liked the challenge I put something together.
It is basically a tree with each layer of the tree containing a title prefix plus the elements with that exact match plus a list of lower trees with each having the same prefix plus one more letter of the alphabet:
extension String {
subscript (i: Int) -> String {
let start = index(startIndex, offsetBy: i)
return String(self[start...start])
}
}
struct Book {
let id: Int
let title: String
let author: String
let summary: String
}
class PrefixSearchable <Element> {
let prefix: String
var elements = [Element]()
var subNodes = [String:PrefixSearchable]()
let searchExtractor : (Element) -> String
private init(prefix: String, searchExtractor:#escaping(Element) -> String) {
self.prefix = prefix
self.searchExtractor = searchExtractor
}
convenience init(_ searchExtractor:#escaping(Element) -> String) {
self.init(prefix: "", searchExtractor: searchExtractor)
}
func add(_ element : Element) {
self.add(element, search: searchExtractor(element))
}
private func add(_ element : Element, search : String) {
if search == prefix {
elements.append(element)
} else {
let next = search[prefix.count]
if let sub = subNodes[next] {
sub.add(element, search: search)
} else {
subNodes[next] = PrefixSearchable(prefix: prefix + next, searchExtractor: searchExtractor)
subNodes[next]!.add(element, search: search)
}
}
}
func elementsWithChildren() -> [Element] {
var ele = [Element]()
for (_, sub) in subNodes {
ele.append(contentsOf: sub.elementsWithChildren())
}
return ele + elements
}
func search(search : String) -> [Element] {
print(prefix)
if search.count == prefix.count {
return elementsWithChildren()
} else {
let next = search[prefix.count]
if let sub = subNodes[next] {
return sub.search(search: search)
} else {
return []
}
}
}
}
let searchable : PrefixSearchable<Book> = PrefixSearchable({ $0.title.lowercased() })
searchable.add(Book(id: 1, title: "title", author: "", summary: ""))
searchable.add(Book(id: 2, title: "tille", author: "", summary: ""))
print(searchable.search(search: "ti")) // both books
print(searchable.search(search: "title")) // just one book
print(searchable.search(search: "xxx")) // no books
It can probably be improved in terms of readability (my swift is quite rusty right now). I would not guarantee that it works in all corner cases.
You would probably have to add a "search limit" which stops recursively returning all children if no exact match is found.

Before your start changing anything, run Instruments and determine where your bottlenecks are. It's very easy to chase the wrong things.
I'm very suspicious of that 60s number. That's a huge amount of time and suggests that you're actually doing this filtering repeatedly. I'm betting you do it once per visible row or something like that. That would explain why it's so much faster the second time. 300k is a lot, but it really isn't that much. Computers are very fast, and a minute is a very long time.
That said, there are some obvious problems with your existing filter. It recomputes prefix.lowercased() 300k times, which is unnecessary. you can pull that out:
let lowerPrefix = prefix.lowercased()
filteredBooks = books.filter { $0.title.lowercased().hasPrefix(lowerPrefix) }
Similarly, you're recomputing all of title.lowercased() for every search, and you almost never need all of it. You might cache the lowercased versions, but you also might just lowercase what you need:
let lowerPrefix = prefix.lowercased()
let prefixCount = prefix.count // This probably isn't actually worth caching
filteredBooks = books.filter { $0.title.prefix(prefixCount).lowercased() == lowerPrefix }
I doubt you'll get a lot of benefit this way, but it's the kind of thing to explore before exploring novel data structures.
That said, if the only kind of search you need is a prefix search, the Trie is definitely designed precisely for that problem. And yeah, binary searching is also worth considering if you can keep your list in title order, and prefix searching is the only thing you care about.
While it won't help your first search, keep in mind that your second search can often be much faster by caching recent searches. In particular, if you've searched "a" already, then you know that "ap" will be a subset of that, so you should use that fact. Similarly, it is very common for these kinds of searches to repeat themselves when users make typos and backspace. So saving some recent results can be a big win, at the cost of memory.
At these scales, memory allocations and copying can be a problem. Your Book type is on the order of 56 bytes:
MemoryLayout.stride(ofValue: Book()) // 56
(The size is the same, but stride is a bit more meaningful when you think about putting them in an array; it includes any padding between elements. In this case the padding is 0. But if you added a Bool property, you'd see the difference.)
The contents of strings don't have to be copied (if there's no mutation), so it doesn't really matter how long the strings are. But the metadata does have to be copied, and that adds up.
So a full copy of this array is on the order of 16MB of "must copy" data. The largest subset you would expect would be 10-15% (10% of words start with the most common letter, s, in English, but titles might skew this some). That's still on the order of a megabyte of copying per filter.
You can improve this by working exclusively in indices rather than full elements. There unfortunately aren't great tools for that in stdlib, but they're not that hard to write.
extension Collection {
func indices(where predicate: (Element) -> Bool) -> [Index] {
indices.filter { predicate(self[$0]) }
}
}
Instead of copying 56 bytes, this copies 8 bytes per result which could significantly reduce your memory churn.
You could also implement this as an IndexSet; I'm not certain which would be faster to work with.

Related

How to find value difference of two struct instances in Swift

I have two instances from the same struct in Swift. I need to find out key-values that have the same keys but different values.
For example:
struct StructDemo {
let shopId: Int
let template: String?
}
let a = StructDemo(shopId: 3, template: "a")
let a = StructDemo(shopId: 3, template: "different a")
// My expectation is to return the change pairs
let result = [template: "different a"]
My approach is as show below but comes errors.
static func difference(left: StructDemo, right: StructDemo) -> [String: Any]{
var result:[String: Any] = [:]
for leftItem in Mirror(reflecting: left).children {
guard let key = leftItem.label else { continue }
let value = leftItem.value
if value != right[key] { // This is the problem. Errror message: Protocol 'Any' as a type cannot conform to 'RawRepresentable'
result[key] = right[key]
}
}
}
Appreciate for any suggestion and solutions.
Thank you
The problem that you are seeing is that you referred to
right[key]
but right is a StructDemo and is not subscriptable. You can't look up fields given a runtime name. Well, you can with Mirror which you correctly used for left, but you did not mirror right.
Using Mirror will lead to other issues, as you will have expressions with static type Any where Equatable will be required in order to compare values.
IMHO, your best bet is to avoid a generic, reflective approach, and just embrace static typing and write a custom difference functions that iterates all the known fields of your type. Hard coding is not so bad here, if there is only one struct type that you are trying to diff.
If you have a handful of struct types each needing diffs then that might be a different story, but I don't know a good way to get around the need for Equatable. But if you have a ton of diffable types, maybe you want dictionaries to begin with?

"Preloading" A Dictionary With Keys in Swift

This is a fairly simple issue, but one I would like to solve, as it MAY help with performance.
I want to find out if Swift has a way to create a Dictionary, specifying ONLY keys, and maybe no values, or a single value that is set in each entry.
In other words, I want to create a Dictionary object, and "preload" its keys. Since this is Swift, the values could be 0 or nil (or whatever is a default empty).
The reason for this, is so that I can avoid two loops, where I go through once, filling a Dictionary with keys and empty values, and a second one, where I then set those values (There's a practical reason for wanting this, which is a bit out of the scope of this question).
Here's sort of what I'm thinking:
func gimme_a_new_dictionary(_ inKeyArray:[Int]) -> [Int:Int] {
var ret:[Int:Int] = [:]
for key in inKeyArray {
ret[key] = 0
}
return ret
}
let test1 = gimme_a_new_dictionary([4,6,1,3,0,1000])
But I'm wondering if there's a quicker way to do the same thing (as in "language construct" way -I could probably figure out a faster way to do this in a function).
UPDATE: The first solution ALMOST works. It works fine in Mac/iOS. However, the Linux version of Swift 3 doesn't seem to have the uniqueKeysWithValues initializer, which is annoying.
func gimme_a_new_dictionary(_ inKeyArray:[Int]) -> [Int:Int] {
return Dictionary<Int,Int>(uniqueKeysWithValues: inKeyArray.map {($0, 0)})
}
let test1 = gimme_a_new_dictionary([4,6,1,3,0,1000])
For Swift 4, you can use the dictionary constructor that takes a sequence and use map to create the sequence of tuples from your array of keys:
let dict = Dictionary(uniqueKeysWithValues: [4,6,1,3,0,1000].map {($0, 0)})
I presume you could optimize your code in terms of allocation by specifying the minimum capacity during the initialization. However, one liner may be the above answer, it's essentially allocation and looping to add 0 in each position.
func gimme_a_new_dictionary(_ inKeyArray:[Int], minCapacity: Int) -> [Int:Int] {
var ret = Dictionray<Int, Int>(minimumCapacity: minCapacity)
for key in inKeyArray {
ret[key] = 0
}
return ret
}
let test1 = gimme_a_new_dictionary([4,6,1,3,0,1000])
Take a look at this official documentation:
/// Use this initializer to avoid intermediate reallocations when you know
/// how many key-value pairs you are adding to a dictionary. The actual
/// capacity of the created dictionary is the smallest power of 2 that
/// is greater than or equal to `minimumCapacity`.
///
/// - Parameter minimumCapacity: The minimum number of key-value pairs to
/// allocate buffer for in the new dictionary.
public init(minimumCapacity: Int)

How do I append to a section of an array? [[]]

I have an array of arrays, theMealIngredients = [[]]
I'm creating a newMeal from a current meal and I'm basically copying all checkmarked ingredients into it from the right section and row from a tableview. However, when I use the append, it obviously doesn't know which section to go in as the array is multidimensional. It keeps telling me to cast it as an NSArray but that isn't what I want to do, I don't think.
The current line I'm using is:
newMeal.theMealIngredients.append((selectedMeal!.theMealIngredients[indexPath.section][indexPath.row])
You should re-model your data to match its meaning, then extract your tableview from that. That way you can work much more easily on the data without having to fuss with the special needs of displaying the data. From your description, you have a type, Meal that has [Ingredient]:
struct Meal {
let name: String
let ingredients: [Ingredient]
}
(None of this has been tested; but it should be pretty close to correct. This is all in Swift 3; Swift 2.2 is quite similar.)
Ingredient has a name and a kind (meat, carbs, etc):
struct Ingredient {
enum Kind: String {
case meat
case carbs
var name: String { return self.rawValue }
}
let kind: Kind
let name: String
}
Now we can think about things in terms of Meals and Ingredients rather than sections and rows. But of course we need sections and rows for table views. No problem. Add them.
extension Meal {
// For your headers (these are sorted by name so they have a consistent order)
var ingredientSectionKinds: [Ingredient.Kind] {
return ingredients.map { $0.kind }.sorted(by: {$0.name < $1.name})
}
// For your rows
var ingredientSections: [[Ingredient]] {
return ingredientSectionKinds.map { sectionKind in
ingredients.filter { $0.kind == sectionKind }
}
}
}
Now we can easily grab an ingredient for any given index path, and we can implement your copying requirement based on index paths:
extension Meal {
init(copyingIngredientsFrom meal: Meal, atIndexPaths indexPaths: [IndexPath]) {
let sections = meal.ingredientSections
self.init(name: meal.name, ingredients: indexPaths.map { sections[$0.section][$0.row] })
}
}
Now we can do everything in one line of calling code in the table view controller:
let newMeal = Meal(copyingIngredientsFrom: selectedMeal,
atIndexPaths: indexPathsForSelectedRows)
We don't have to worry about which section to put each ingredient into for the copy. We just throw all the ingredients into the Meal and let them be sorted out later.
Some of this is code is very inefficient (it recomputes some things many times). That would be a problem if ingredient lists could be long (but they probably aren't), and can be optimized if needed by caching the results or redesigning the internal implementation details of Meal. But starting with a clear data model keeps the code simple and straightforward rather than getting lost in nested arrays in the calling code.
Multi-dimentional arrays are very challenging to use well in Swift because they're not really multi-dimentional. They're just arrays of arrays. That means every row can have a different number of columns, which is a common source of crashing bugs when people run off the ends of a given row.

Swift: Mirror(reflecting: self) too slow?

I am trying to make a dictionary with the properties of a class of mine.
class SomeClass() {
var someString = "Hello, stackoverflow"
var someInt = 42 // The answer to life, the universe and everything
var someBool = true
func objectToDict() -> [String: String] {
var dict = [String: String]()
let reflection = Mirror(reflecting: self)
for child in reflection.children {
if let key = child.label {
dict[key] = child.value as? AnyObject
}
return dict
}
}
but objectToDict() is very slow. Is there a way to speed this up, or may be another approach to add the property values to a Dictionary?
I do not agree with most other users. Using reflection results less code, which means less time to develop, maintain, and test your product. With a well written library like EVReflection you don't need to worry about things behind the scene too much.
However, if performance is going to be a concern, do NOT use reflection based approaches at all. I'd say it's never really a problem in front-end development for me, especially in iOS, but it cannot be ignored in back-end development.
To see how slow it can be, I ran some test in Xcode. I'll write a blog about it, but generally speaking, getting Mirror is not the worst part (plus it may be possible to catch property list in memory), so replacing it with objc runtime wouldn't change the situation too much. In the other hand, setValue(_, forKey) is surprisingly slow. Considering that in real life you also need to perform tasks like checking dynamicType and so on, using the dynamic approach surely will make it 100+ times slower, which won't be acceptable for server development.
- Looping 1,000,000 times for a class with 1 `Int` property
- Getting mirror: 1.52
- Looping throw mirror and set `child.value`: 3.3
- Looping throw mirror and set `42`: 3.27
- Setting value `42`: 0.05
Again, for iOS I'll keep using it to save my time. Hopefully end customers won't care about whether it's 0.005 seconds or 0.0005 seconds.
Not only is that slow, it's also not a good idea: mirroring is for debug introspection only. You should instead construct the dictionary yourself. This ensures that you have the flexibility to store all the data in exactly the right way, and also decouples your Swift property names from the keys of the dictionary you're generating.
class SomeClass {
var someString = "Hello, stackoverflow"
var someInt = 42 // The answer to life, the universe and everything
var someBool = true
func objectToDict() -> [String: AnyObject] {
return ["someString": someString, "someInt": someInt, "someBool": someBool]
}
}

In Swift, how do I set every element inside an array to nil?

var roomsLiveStates = [Firebase?]()
for ref in roomsLiveStates {
if ref != nil {
ref = nil
}
}
}
This doesn't seem to work.
You can just set each to nil:
for index in 0 ..< roomsLiveStates.count {
roomsLiveStates[index] = nil
}
As The Swift Programming Language says in its Control Flow discussion of for syntax:
This example prints the first few entries in the five-times-table:
for index in 1...5 {
println("\(index) times 5 is \(index * 5)")
}
... In the example above, index is a constant whose value is automatically set at the start of each iteration of the loop. As such, it does not have to be declared before it is used. It is implicitly declared simply by its inclusion in the loop declaration, without the need for a let declaration keyword.
As this says, the index is a constant. As such, you can not change its value.
You can also use a map:
roomsLiveStates = roomsLiveStates.map { _ in nil }
This is less code and a good habit to get into for other cases where you may want to create a processed copy of an array without mutating the original.
if you want to set each element in array to a numberic value(int, float, double ...), you can try vDSP framework.
Please check this:
https://developer.apple.com/documentation/accelerate/vdsp/vector_clear_and_fill_functions
You can also just reassign the whole array to one that only contains nil like:
roomsLiveStates = [Firebase?](count: roomsLiveStates.count, repeatedValue: nil)
Although now that I think about it, this doesn't seem so good, because (probably?) new memory gets allocated which is not fast at all
EDIT: I just checked and found that using .map is a lot slower in Debug builds. However on Release builds, .map is about 20% faster. So I suggest using the .map version (which also is quiet a bit prettier ;)):
array = array.map { _ in nil }
Iterating the array to map it, will result in poor performance (iteration + new allocation).
Allocating the array all over (init(repeating...) is better, but still allocates a new array, very costly.
The best way would be to zero out the data without allocating it again, and as every C programmer knows, that's why we have bzero and memset for.
It won't matter much for small arrays in a non repeating action, and as long as you remember it - using the smallest code possible could make sense, so sometimes using map or init makes sense.
Final note on the test - map and init use multiple allocations of the same size in this test, which makes allocation allot faster than in real world usage.
In general, memory allocation is not your friend, it is a time consuming process that may result in having to defragment the heap, calling kernel code to allocate new virtual memory, and also must use locks and/or memory barriers.
import Foundation
guard CommandLine.argc == 2, let size = Int(CommandLine.arguments[1]),size > 0 else {
fatalError("Usage: \(CommandLine.arguments[0]) {size}\nWhere size > 0")
}
var vector = [Int].init(repeating: 2, count: size)
let ITERATIONS = 1000
var start:Date
var end:Date
start = Date()
for _ in 0..<ITERATIONS {
vector = vector.map({ _ in return 0 })
}
end = Date()
print("Map test: \(end.timeIntervalSince(start)) seconds")
start = Date()
for _ in 0..<ITERATIONS {
vector = [Int].init(repeating: 0, count: size)
}
end = Date()
print("init(repeating:,count:) test: \(end.timeIntervalSince(start)) seconds")
start = Date()
for _ in 0..<ITERATIONS {
let size = MemoryLayout.size(ofValue: vector[0]) * vector.count // could optimize by moving out the loop, but that would miss the purpose
vector.withUnsafeMutableBytes { ptr in
let ptr = ptr.baseAddress
bzero(ptr, size)
}
}
end = Date()
print("bzero test: \(end.timeIntervalSince(start)) seconds")
Results when running with an array of 5,000,000 items size (M1 Mac), compiled with optimizations:
Map test: 5.986680030822754 seconds
init(repeating:,count:) test: 2.291425108909607 seconds
bzero test: 0.6462910175323486 seconds
Edit:
Just realized it's also to initialize the memory using the initializeMemory(...) method.
Something like:
_ = vector.withUnsafeMutableBytes { ptr in
ptr.initializeMemory(as: Int.self, repeating: 0)
}
The performance is virtually the same as with bzero, and it is pure swift and shorter.

Resources