With reference to the answer posted by Asperi (https://stackoverflow.com/users/12299030/asperi) on Question: Highlight a specific part of the text in SwiftUI
I found his answer quite useful. However, when my String input exceeds 32k characters the app crashes, so I am assuming a String maxes out at 32k characters, and I am looking for a workaround.
In my app, if someone searches for the word "pancake", the search word is stored, and when the user opens the detail page (let's say of a recipe), the word "pancake" is highlighted. All works well with this answer, but when the recipe exceeds 32k characters, the app crashes with messages about exceeding the index range. (Specific error message: Thread 1: EXC_BAD_ACCESS (code=2, address=0x16d43ffb4))
Here is the modified code from the answer on that question:
This will print the data:
hilightedText(str: self.recipes.last!.recipeData!)
.multilineTextAlignment(.leading)
.font(.system(size: CGFloat( settings.fontSize )))
There is obviously more to this code above, but in essence, it iterates a database, and finds the last record containing 'search word' and displays the recipeData here, which is a large string contained in the database.
To implement the highlighting functionality:
func hilightedText(str: String) -> Text {
    let textToSearch = searched
    var result: Text!
    for word in str.split(separator: " ") {
        var text = Text(word)
        if word.uppercased().contains(textToSearch.uppercased()) {
            text = text.bold().foregroundColor(.yellow)
        }
        //THIS NEXT LINE has been identified as the problem:
        result = (result == nil ? text : result + Text(" ") + text)
    }
    return result
}
I've modified the answer from Asperi slightly to suit my needs and all works really well, unless I come across a recipeData entry that is larger than 32k in size, as stated before.
I have tried Replacing String with a few other data types and nothing works..
Any ideas?
Thank you!
UPDATE:
After lengthy discussion in the comments, it appears that the root cause of the issue is at some point, for some records, I am exceeding the maximum Text("") concatenations.
In the above code, each word is split out, evaluated and added to the long string "result" which winds up looking like this:
Text("word") + Text(" ") + Text("Word")
and so on.
This is done so I can easily apply color attributes per word, but it would seem that once I hit a certain number of words (which is less than 32k; one record was 22k and crashed), the app crashes.
Leo suggested this thread (https://stackoverflow.com/a/59531265/2303865) as an alternative, and I will have to attempt to implement that instead.
Thank you..
Hmm... unexpected limitation... anyway - learn something new.
OK, here is an improved algorithm, which should move that limitation far away.
Tested with Xcode 12 / iOS 14. (also updated code in referenced topic Highlight a specific part of the text in SwiftUI)
func hilightedText(str: String, searched: String) -> Text {
    guard !str.isEmpty && !searched.isEmpty else { return Text(str) }
    var result = Text("")
    var range = str.startIndex..<str.endIndex
    repeat {
        guard let found = str.range(of: searched, options: .caseInsensitive, range: range, locale: nil) else {
            result = result + Text(str[range])
            break
        }
        let prefix = str[range.lowerBound..<found.lowerBound]
        result = result + Text(prefix) + Text(str[found]).bold().foregroundColor(.yellow)
        range = found.upperBound..<str.endIndex
    } while (true)
    return result
}
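To see how few pieces the range-based approach needs, here is a minimal sketch of the same loop without SwiftUI, so it can be run in a playground or on the command line. `Segment` and `segments(of:matching:)` are names made up for this illustration; they are not part of the answer above.

```swift
import Foundation

/// A run of text plus whether it should be highlighted (hypothetical helper type).
struct Segment: Equatable {
    let text: String
    let isMatch: Bool
}

/// Splits `str` into alternating plain and matching runs, case-insensitively.
func segments(of str: String, matching searched: String) -> [Segment] {
    guard !str.isEmpty, !searched.isEmpty else {
        return [Segment(text: str, isMatch: false)]
    }
    var result: [Segment] = []
    var range = str.startIndex..<str.endIndex
    while let found = str.range(of: searched, options: .caseInsensitive, range: range) {
        // Plain text between the previous match and this one, if any.
        if range.lowerBound < found.lowerBound {
            result.append(Segment(text: String(str[range.lowerBound..<found.lowerBound]), isMatch: false))
        }
        result.append(Segment(text: String(str[found]), isMatch: true))
        range = found.upperBound..<str.endIndex
    }
    // Whatever is left after the last match.
    if !range.isEmpty {
        result.append(Segment(text: String(str[range]), isMatch: false))
    }
    return result
}

let parts = segments(of: "Pancake mix and pancakes", matching: "pancake")
print(parts.count) // number of Text values that would be concatenated
```

The number of pieces grows with the number of matches rather than the number of words, which is why this should stay well below whatever the concatenation limit turns out to be for typical recipes.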
After much discussion in the comments, it became clear that I was hitting a limit on the number of Text() concatenations, so beware: apparently there is one.
I realized, however, that I only needed a separate Text("Word") when that particular word required special formatting (i.e. highlighting, etc.); otherwise I could concatenate all of the raw strings together and send them as a single Text("String of words").
This approach avoids sending every single word as a Text("Word") by itself and greatly cuts down on the number of Text()s being returned.
See the code below, which solved the issue:
func hilightedText(str: String) -> Text {
    let textToSearch = searched
    var result = Text(" ")
    var words: String = " " // plain words accumulated since the last highlighted word
    for line in str.split(whereSeparator: \.isNewline) {
        for word in line.split(whereSeparator: \.isWhitespace) {
            if word.localizedStandardContains(textToSearch) {
                // Flush the pending plain text as ONE Text, then add the highlighted word.
                result += Text(words) + Text(" ") + Text(word).bold().foregroundColor(.yellow)
                // Reset right away so the same plain text is not emitted twice
                // when two matching words are adjacent.
                words = ""
            } else {
                words += " " + word
            }
        }
        words += "\n\n"
    }
    return result + Text(" ") + Text(words)
}
extension Text {
    static func += (lhs: inout Text, rhs: Text) {
        lhs = lhs + rhs
    }
}
It could use some cleanup, as also discussed in the comments, for splitting by whitespace, etc., but this was just to overcome the crashing. It needs some additional testing before I call it good, but no more crashes.
ADDED:
The suggestion to split with the separator \.isWhitespace worked, but when I put the text back together, every separator became a space and the line breaks were gone, so I added the extra split by line breaks to preserve them.
I have an array with over 300k objects which I'm showing in a UITableView. When filtering by prefix match using the filter method, the first search takes a bit over 60s! After that, the search is way faster taking around 1s, which I'd still want to improve a bit more.
The object looks like this:
struct Book {
let id: Int
let title: String
let author: String
let summary: String
}
This is how I'm filtering at the moment:
filteredBooks = books.filter { $0.title.lowercased().hasPrefix(prefix.lowercased()) }
The data comes from a JSON file which I decode using Codable (which takes a bit longer than I would like as well). I'm trying to achieve this without a database or any kind of framework implementation nor lazy loading the elements. I'd like to be able to show the 300k objects in the UITableView and realtime filter with a decent performance.
I've Googled a bit and found the Binary Search and Trie search algorithms but didn't know how to implement them to be able to use them with Codable and my struct. Also, maybe replacing the struct with another data type would help but not sure which either.
Because I liked the challenge I put something together.
It is basically a tree with each layer of the tree containing a title prefix plus the elements with that exact match plus a list of lower trees with each having the same prefix plus one more letter of the alphabet:
extension String {
subscript (i: Int) -> String {
let start = index(startIndex, offsetBy: i)
return String(self[start...start])
}
}
struct Book {
let id: Int
let title: String
let author: String
let summary: String
}
class PrefixSearchable <Element> {
let prefix: String
var elements = [Element]()
var subNodes = [String:PrefixSearchable]()
let searchExtractor : (Element) -> String
private init(prefix: String, searchExtractor: @escaping (Element) -> String) {
self.prefix = prefix
self.searchExtractor = searchExtractor
}
convenience init(_ searchExtractor: @escaping (Element) -> String) {
self.init(prefix: "", searchExtractor: searchExtractor)
}
func add(_ element : Element) {
self.add(element, search: searchExtractor(element))
}
private func add(_ element : Element, search : String) {
if search == prefix {
elements.append(element)
} else {
let next = search[prefix.count]
if let sub = subNodes[next] {
sub.add(element, search: search)
} else {
subNodes[next] = PrefixSearchable(prefix: prefix + next, searchExtractor: searchExtractor)
subNodes[next]!.add(element, search: search)
}
}
}
func elementsWithChildren() -> [Element] {
var ele = [Element]()
for (_, sub) in subNodes {
ele.append(contentsOf: sub.elementsWithChildren())
}
return ele + elements
}
func search(search : String) -> [Element] {
print(prefix)
if search.count == prefix.count {
return elementsWithChildren()
} else {
let next = search[prefix.count]
if let sub = subNodes[next] {
return sub.search(search: search)
} else {
return []
}
}
}
}
let searchable : PrefixSearchable<Book> = PrefixSearchable({ $0.title.lowercased() })
searchable.add(Book(id: 1, title: "title", author: "", summary: ""))
searchable.add(Book(id: 2, title: "tille", author: "", summary: ""))
print(searchable.search(search: "ti")) // both books
print(searchable.search(search: "title")) // just one book
print(searchable.search(search: "xxx")) // no books
It can probably be improved in terms of readability (my Swift is quite rusty right now). I would not guarantee that it works in all corner cases.
You would probably have to add a "search limit" which stops recursively returning all children if no exact match is found.
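One way such a "search limit" might look is sketched below. This is a separate, simplified trie keyed by Character, not a patch to the class above; `LimitedTrie` and the `limit` parameter are names invented for the illustration.

```swift
/// A minimal character-keyed trie (hypothetical; simplified from the idea above).
final class LimitedTrie<Element> {
    private var elements: [Element] = []
    private var children: [Character: LimitedTrie] = [:]

    /// Stores `element` under `key`, creating child nodes as needed.
    func insert(_ element: Element, key: String) {
        var node = self
        for character in key {
            if let child = node.children[character] {
                node = child
            } else {
                let child = LimitedTrie()
                node.children[character] = child
                node = child
            }
        }
        node.elements.append(element)
    }

    /// Returns at most `limit` elements whose key starts with `prefix`.
    func search(prefix: String, limit: Int) -> [Element] {
        var node = self
        for character in prefix {
            guard let child = node.children[character] else { return [] }
            node = child
        }
        var results: [Element] = []
        node.collect(into: &results, limit: limit)
        return results
    }

    /// Depth-first collection that stops descending once `limit` is reached.
    private func collect(into results: inout [Element], limit: Int) {
        for element in elements {
            if results.count >= limit { return }
            results.append(element)
        }
        for child in children.values {
            if results.count >= limit { return }
            child.collect(into: &results, limit: limit)
        }
    }
}

let trie = LimitedTrie<Int>()
trie.insert(1, key: "title")
trie.insert(2, key: "tille")
trie.insert(3, key: "time")
print(trie.search(prefix: "ti", limit: 2).count) // never more than the limit
```

Because the children live in a dictionary, which elements make it under the limit is not deterministic; if a stable order matters for the table view, the children would need to be visited in sorted order.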
Before you start changing anything, run Instruments and determine where your bottlenecks are. It's very easy to chase the wrong things.
I'm very suspicious of that 60s number. That's a huge amount of time and suggests that you're actually doing this filtering repeatedly. I'm betting you do it once per visible row or something like that. That would explain why it's so much faster the second time. 300k is a lot, but it really isn't that much. Computers are very fast, and a minute is a very long time.
That said, there are some obvious problems with your existing filter. It recomputes prefix.lowercased() 300k times, which is unnecessary. You can pull that out:
let lowerPrefix = prefix.lowercased()
filteredBooks = books.filter { $0.title.lowercased().hasPrefix(lowerPrefix) }
Similarly, you're recomputing all of title.lowercased() for every search, and you almost never need all of it. You might cache the lowercased versions, but you also might just lowercase what you need:
let lowerPrefix = prefix.lowercased()
let prefixCount = prefix.count // This probably isn't actually worth caching
filteredBooks = books.filter { $0.title.prefix(prefixCount).lowercased() == lowerPrefix }
I doubt you'll get a lot of benefit this way, but it's the kind of thing to explore before exploring novel data structures.
That said, if the only kind of search you need is a prefix search, the Trie is definitely designed precisely for that problem. And yeah, binary searching is also worth considering if you can keep your list in title order, and prefix searching is the only thing you care about.
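For the binary-search option, a rough sketch follows. It assumes you keep a separate array of lowercased titles sorted with `<` (and a parallel array of books in the same order); `prefixRange(of:in:)` is a made-up name, and the approach is only meant for plain keys where string ordering and hasPrefix agree.

```swift
/// Returns the range of indices in `sortedKeys` whose elements start with `prefix`.
/// Assumes `sortedKeys` is sorted ascending with the same `<` used here.
func prefixRange(of prefix: String, in sortedKeys: [String]) -> Range<Int> {
    // Lower bound: first index whose key is >= prefix.
    var low = 0
    var high = sortedKeys.count
    while low < high {
        let mid = low + (high - low) / 2
        if sortedKeys[mid] < prefix {
            low = mid + 1
        } else {
            high = mid
        }
    }
    // Keys sharing the prefix sit next to each other, starting at the lower bound.
    var end = low
    while end < sortedKeys.count && sortedKeys[end].hasPrefix(prefix) {
        end += 1
    }
    return low..<end
}

let keys = ["apple", "apricot", "banana", "band", "bandana", "cat"]
print(prefixRange(of: "ban", in: keys)) // indices of the "ban..." titles
```

The forward scan for the end of the range is linear in the number of matches; a second binary search for the upper bound would avoid that if the match counts get large.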
While it won't help your first search, keep in mind that your second search can often be much faster by caching recent searches. In particular, if you've searched "a" already, then you know that "ap" will be a subset of that, so you should use that fact. Similarly, it is very common for these kinds of searches to repeat themselves when users make typos and backspace. So saving some recent results can be a big win, at the cost of memory.
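As a sketch of the "ap is a subset of a" idea, the cache below remembers only the most recent search and re-filters its results when the new prefix extends the old one. `PrefixFilterCache` is a hypothetical type, and `Book` is trimmed to two fields to keep the example short.

```swift
struct Book {
    let id: Int
    let title: String
}

/// Remembers the last prefix search so a longer prefix only re-filters the previous hits.
struct PrefixFilterCache {
    let books: [Book]
    private var lastPrefix = ""
    private var lastResults: [Book]

    init(books: [Book]) {
        self.books = books
        self.lastResults = books // every title matches the empty prefix
    }

    mutating func search(_ prefix: String) -> [Book] {
        let lowerPrefix = prefix.lowercased()
        // If the new prefix extends the previous one, the answer is a subset of the old results.
        let candidates = lowerPrefix.hasPrefix(lastPrefix) ? lastResults : books
        let results = candidates.filter { $0.title.lowercased().hasPrefix(lowerPrefix) }
        lastPrefix = lowerPrefix
        lastResults = results
        return results
    }
}

var cache = PrefixFilterCache(books: [
    Book(id: 1, title: "Apple"),
    Book(id: 2, title: "Apricot"),
    Book(id: 3, title: "Banana"),
])
print(cache.search("a").map { $0.id })  // full scan
print(cache.search("ap").map { $0.id }) // only re-filters the "a" results
```

Keeping a small dictionary of several recent prefixes instead of just the last one would also cover the backspace case, at the cost of more memory.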
At these scales, memory allocations and copying can be a problem. Your Book type is on the order of 56 bytes:
MemoryLayout<Book>.stride // 56
(The size is the same, but stride is a bit more meaningful when you think about putting them in an array; it includes any padding between elements. In this case the padding is 0. But if you added a Bool property, you'd see the difference.)
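To see the size/stride difference for yourself, something like the following can be pasted into a playground. `FlaggedBook` is a hypothetical variant with one extra Bool; the exact numbers depend on the platform, and the ones mentioned here assume a 64-bit target.

```swift
struct Book {
    let id: Int
    let title: String
    let author: String
    let summary: String
}

// Hypothetical variant with one extra Bool, to make the padding visible.
struct FlaggedBook {
    let id: Int
    let title: String
    let author: String
    let summary: String
    let isFavorite: Bool
}

// size is the bytes the fields occupy; stride is the distance between array elements.
print(MemoryLayout<Book>.size, MemoryLayout<Book>.stride)
print(MemoryLayout<FlaggedBook>.size, MemoryLayout<FlaggedBook>.stride)
```

On a 64-bit platform I would expect the Bool to add one byte to the size while the stride rounds up to the next multiple of the 8-byte alignment.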
The contents of strings don't have to be copied (if there's no mutation), so it doesn't really matter how long the strings are. But the metadata does have to be copied, and that adds up.
So a full copy of this array is on the order of 16MB of "must copy" data. The largest subset you would expect would be 10-15% (10% of words start with the most common letter, s, in English, but titles might skew this some). That's still on the order of a megabyte of copying per filter.
You can improve this by working exclusively in indices rather than full elements. There unfortunately aren't great tools for that in stdlib, but they're not that hard to write.
extension Collection {
func indices(where predicate: (Element) -> Bool) -> [Index] {
indices.filter { predicate(self[$0]) }
}
}
Instead of copying 56 bytes, this copies 8 bytes per result which could significantly reduce your memory churn.
You could also implement this as an IndexSet; I'm not certain which would be faster to work with.
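A possible IndexSet variant is sketched below, restricted to collections indexed by Int (such as Array), since IndexSet stores integers. `indexSet(where:)` is a made-up name, and I have not measured it against the array-of-indices version.

```swift
import Foundation

extension RandomAccessCollection where Index == Int {
    /// Collects the positions of matching elements into an IndexSet.
    func indexSet(where predicate: (Element) -> Bool) -> IndexSet {
        var result = IndexSet()
        for index in indices where predicate(self[index]) {
            result.insert(index)
        }
        return result
    }
}

let titles = ["apple", "banana", "apricot"]
let hits = titles.indexSet { $0.hasPrefix("a") }
print(hits.map { titles[$0] }) // look the elements up only when they are needed
```

IndexSet stores runs of consecutive integers as ranges, so it may be especially compact when the data is sorted by title and the matches are contiguous.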
I'm currently developing a custom keyboard app and am having trouble parsing what the keyboard has outputted onto the text document proxy. How does one go about this? I feel like I'm losing my mind. Currently I'm looping:
for letter in (proxy.documentContextBeforeInput?.characters)!
However, this is only getting the text on the line the cursor is currently on, before the cursor, such that if my textDocumentProxy contains:
Some text above
Some text below (cursor position)
My loop only iterates through the "Some text below" portion.
Is there any way to loop through the entirety of a UITextDocumentProxy? Thank you.
documentContextBeforeInput, as its name mentions, only returns the text before the input. To get the full text string, you should do something like:
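let entireText = (proxy.documentContextBeforeInput ?? "") + (proxy.documentContextAfterInput ?? "")
// entireText is a non-optional String, so no `if let` is needed.
// (On Swift 3 and earlier, iterate over entireText.characters instead.)
for letter in entireText {
    //DO something useful
}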
I am developing an iOS custom keyboard. I was wondering if there is a way to fetch the current text inside the text field, and how it would work.
For example, we can use textDocumentProxy.hasText() to see if the text field has text inside, but I want to know the exact string that is inside the text field.
The closest things would be textDocumentProxy.documentContextBeforeInput and textDocumentProxy.documentContextAfterInput. These will respect sentences and such, which means if the value is a paragraph, you will only get the current sentence. Users have been known to retrieve the entire string by repositioning the cursor multiple times until everything is retrieved.
Of course, you generally do not have to worry about this if the field expects a single value like a username, email, id number, etc. Combining the values of both before and after input contexts should suffice.
Sample Code
For the single phrase value, you would do:
let value = (textDocumentProxy.documentContextBeforeInput ?? "") + (textDocumentProxy.documentContextAfterInput ?? "")
For values that might contain sentence ending punctuation, it will be a little more complicated as you need to run it on a separate thread. Because of this, and the fact that you have to move the input cursor to get the full text, the cursor will visibly move. It is also unknown whether this will be accepted into the AppStore (after all, Apple probably did not add an easy way to get the full text on purpose in order to prevent official custom keyboards from invading a user's privacy).
Note: the below code is based off of this Stack Overflow answer except modified for Swift, removed unnecessary sleeps, uses strings with no custom categories, and uses a more efficient movement process.
func foo() {
dispatch_async(dispatch_queue_create("com.example.test", DISPATCH_QUEUE_SERIAL)) { () -> Void in
let string = self.fullDocumentContext()
}
}
func fullDocumentContext() -> String {
let textDocumentProxy = self.textDocumentProxy
var before = textDocumentProxy.documentContextBeforeInput
var completePriorString = "";
// Grab everything before the cursor
while (before != nil && !before!.isEmpty) {
completePriorString = before! + completePriorString
let length = before!.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)
textDocumentProxy.adjustTextPositionByCharacterOffset(-length)
NSThread.sleepForTimeInterval(0.01)
before = textDocumentProxy.documentContextBeforeInput
}
// Move the cursor back to the original position
self.textDocumentProxy.adjustTextPositionByCharacterOffset(completePriorString.characters.count)
NSThread.sleepForTimeInterval(0.01)
var after = textDocumentProxy.documentContextAfterInput
var completeAfterString = "";
// Grab everything after the cursor
while (after != nil && !after!.isEmpty) {
completeAfterString += after!
let length = after!.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)
textDocumentProxy.adjustTextPositionByCharacterOffset(length)
NSThread.sleepForTimeInterval(0.01)
after = textDocumentProxy.documentContextAfterInput
}
// Go back to the original cursor position
self.textDocumentProxy.adjustTextPositionByCharacterOffset(-(completeAfterString.characters.count))
let completeString = completePriorString + completeAfterString
print(completeString)
return completeString
}
I'm replacing the selected text in a textView with new text. To accomplish this, I'm using this code, based on this answer by beyowulf. All works well and the replaced text becomes selected; the problem arises when the text contains one or more special characters (like emoji, etc.). In this case the selection misses one or more characters at the end.
mainTextField.replaceRange((theRange), withText: newStr) // replace old text with the new one
selectNewText(theRange, newStr: newStr) // select the new text
func selectNewText(theRange: UITextRange, newStr: String) {
let newStrLength = newStr.characters.count // let's see how long is the string
mainTextField.selectedTextRange = mainTextField.textRangeFromPosition(theRange.start, toPosition: mainTextField.positionFromPosition(theRange.start, offset: newStrLength)!)
mainTextField.becomeFirstResponder()
}
OK, after I read the answers and comments to this question, I fixed this problem by replacing this statement (which returns the "human-perceptible" number of characters):
let newStrLength = newStr.characters.count
With this one:
let newStrLength = newStr.utf16.count
PS
By the way, here are some tests I ran with the different views:
let str = "Abc😬"
let count = str.characters.count
print(count) // 4
let count2 = str.utf16.count
print(count2) // 5
let count3 = str.utf8.count
print(count3) // 7
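The fix above suggests that the text-position offsets are counted in UTF-16 code units, which is also how NSString measures length. A small check of that relationship, in current Swift syntax (where `str.count` replaces `str.characters.count`), could look like this; the sample strings are just examples:

```swift
import Foundation

let samples = ["Abc", "Abc😬", "e\u{301}"]
for sample in samples {
    // Character count, UTF-16 length, and NSString length side by side.
    print(sample.count, sample.utf16.count, NSString(string: sample).length)
}
```

The last two columns should always agree, while the first one drops below them as soon as a character needs more than one UTF-16 code unit (an emoji, or a letter followed by a combining accent).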
In a UITableView, I'm listing a bunch of languages to be selected. To put a section index view on the right, like in the Contacts app, I'm getting the first letters of all languages in the list and then using them to generate the section index view.
It works almost perfectly; I just ran into a problem getting the first letter of some strings in Hebrew. Here is a screenshot from a playground showing one of the language names whose first letter I couldn't get:
The problem is that the first letter of the name of the language with the "ina" language code isn't "א"; it's an empty character. It's not a space, just an empty character. As you can see, the name is actually 12 characters in total, but when I get its count it says 13, because there is a non-space empty character at index 0.
It works perfectly if I pass "eng" or "ara" in the value: parameter. So maybe the problem is that the system returns a language name with an empty character in some cases; I don't know.
I tried some different methods of getting the first letter, but none of them worked.
Here "א" isn't the first letter, it's the second letter. So I thought maybe I could find a simple hack around that, but I want to try solving it properly before resorting to workarounds.
Here is the code:
let locale = NSLocale(localeIdentifier: "he")
let languageName = locale.displayNameForKey(NSLocaleIdentifier, value: "ina")!
let firstLetter = first(languageName)!
println(countElements(languageName))
for character in languageName {
println(character)
}
You could use an NSCharacterSet.controlCharacterSet() to test each character. I can't figure out how to stay in Swift-native strings, but here's a function that uses NSString to return the first non-control character:
func firstNonControlCharacter(str: NSString) -> String? {
let controlChars = NSCharacterSet.controlCharacterSet()
for i in 0..<str.length {
if !controlChars.characterIsMember(str.characterAtIndex(i)) {
return str.substringWithRange(NSRange(location: i, length: 1))
}
}
return nil
}
let locale = NSLocale(localeIdentifier: "he")
let languageName = locale.displayNameForKey(NSLocaleIdentifier, value: "ina")!
let firstChar = firstNonControlCharacter(languageName) // Optional("א")
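In more recent Swift versions (5 and later) it should be possible to stay in native strings by looking at each scalar's Unicode general category. The sketch below is one way to do that; `firstVisibleCharacter(in:)` is a made-up name, and it assumes the invisible leading character is a control (Cc) or format (Cf) character such as the right-to-left mark U+200F, which are the same categories controlCharacterSet() covers.

```swift
/// Returns the first Character that is not made up solely of control/format scalars.
func firstVisibleCharacter(in text: String) -> Character? {
    text.first { character in
        !character.unicodeScalars.allSatisfy { scalar in
            let category = scalar.properties.generalCategory
            return category == .control || category == .format
        }
    }
}

// U+200F (right-to-left mark) followed by two Hebrew letters, written as escapes.
let name = "\u{200F}\u{05D0}\u{05D1}"
if let first = firstVisibleCharacter(in: name) {
    print(first) // the first Hebrew letter rather than the invisible mark
}
```

Since this returns a Character rather than a one-unit NSString substring, it also copes with first letters that need more than one UTF-16 code unit.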