fastest indexOf function for strings - ios

I am currently using the following String extension to get the index of a specific substring within a large string:
func indexOf(target: String, startIndex: Int) -> Int
{
var startRange = advance(self.startIndex, startIndex)
var range = self.rangeOfString(target, options: NSStringCompareOptions.LiteralSearch, range: Range<String.Index>(start: startRange, end: self.endIndex))
if let range = range {
return distance(self.startIndex, range.startIndex)
} else {
return -1
}
}
I am calling this many times and I have a performance issue.
Does anyone have an idea how to make indexOf() faster?
Currently I am doing this in Swift. Would doing this in Objective-C and bridging give better performance? Or could I perhaps include C code? Any ideas?
UPDATE: more background
I have a long text, say 5000 characters.
The text contains several metadata tags besides the normal text. These tags look like {{blabl{{ sdasdg }} abla}} ; [[bla bla|blabla]] ; {|bla|}.
I would like to remove them or format them in a specific way.
I can't use regular expressions for this, because regular expressions do not support nested expressions ({{ {{ {{ {{dsgasdg}} }}}} }} ).
So I wrote my own functions, which work but are very slow.
What I am actually doing is going through the text and simply searching for these tags. For this I need a base function like the following to determine which tag comes first and at which position. When I have found a tag, I go on to the next one, and so on. I found that this is the most time-consuming part of all. Of course, I am also calling it many times.
func getStart(sText:String, alSearchPatterns:[String], ifrom:Int) -> (Pattern:String, index:Int) {
var bweiter:Bool=true;
var actualcharacter:Character;
var returnPattern="";
var returnIndex = -1;
println("ifrom : " + String(ifrom));
var indexfound:Int = -1;
// find the first character of the patterns
var bsuchepattern=true;
for(var i=0;i<alSearchPatterns.count && bsuchepattern;i++){
let sPattern=alSearchPatterns[i];
let pattern_first_char=sPattern[0];
//let pattern_first_char=String(sPattern[0]);
let characterIndex = sText.indexOfCharacter(pattern_first_char, fromIndex: ifrom); // find(sText, pattern_first_char);
//let characterIndex = sText.indexOf(pattern_first_char, startIndex: ivon);
if(characterIndex != -1){
if((indexfound == -1) || characterIndex < indexfound){
// found something that is first of all actually.
let patternlength=sPattern.length;
let substring_in_text=sText.substring(characterIndex, endIndex: characterIndex + patternlength);
if(substring_in_text.equals(sPattern)){
returnPattern=sPattern;
returnIndex=characterIndex;
}
}
}
}
return (returnPattern,returnIndex);
}
Any hints on how to make this faster, or on how to do this better in general?
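One possible direction, sketched under assumptions: String.Index arithmetic from the start of the string is linear, so calling an Int-based indexOf repeatedly rescans the text each time. Converting the text to an array of characters once and walking it a single time while counting nesting depth avoids that. The function name strippingNestedTags and its parameters are hypothetical, and the sketch handles only one open/close pair.

```swift
// Hypothetical helper (not from the original post): removes nested
// open ... close blocks such as {{ ... {{ ... }} ... }} in a single pass.
func strippingNestedTags(_ text: String, open: String = "{{", close: String = "}}") -> String {
    let chars = Array(text)          // one O(n) conversion; Int subscripts are O(1) afterwards
    let openChars = Array(open)
    let closeChars = Array(close)
    var result = ""
    var depth = 0                    // how many tags are currently open
    var i = 0

    // Returns true when `pattern` occurs in `chars` starting at `position`.
    func matches(_ pattern: [Character], at position: Int) -> Bool {
        guard !pattern.isEmpty, position + pattern.count <= chars.count else { return false }
        for offset in 0..<pattern.count where chars[position + offset] != pattern[offset] {
            return false
        }
        return true
    }

    while i < chars.count {
        if matches(openChars, at: i) {
            depth += 1
            i += openChars.count
        } else if depth > 0 && matches(closeChars, at: i) {
            depth -= 1
            i += closeChars.count
        } else {
            // Keep the character only when we are outside every tag.
            if depth == 0 { result.append(chars[i]) }
            i += 1
        }
    }
    return result
}
```

The same loop could format instead of delete by collecting the characters seen while depth is greater than zero. Whether this is fast enough for the real data would have to be measured.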

Related

How Can I Construct an Efficient CoreData Search, Including Allowing For Preceding and Trailing Characters Here?

Based on straight SQL searches in a previous app, I am adding CoreData searching to a new app. These searches are in a custom dictionary db that the app contains; this function does the work:
public func wordMatcher (pad: Int, word: Array<String>, substitutes : Set<String> ) {
let context = CoreDataManager.shared.persistentContainer.viewContext
var query: Array<String>
var foundPositions : Set<Int> = []
var searchTerms : Array<String> = []
if word.count >= 4 {
for i in 0..<word.count {
for letter in substitutes {
query = word
query[i] = letter
searchTerms.append(query.joined())
let rq: NSFetchRequest<Word> = Word.fetchRequest()
rq.predicate = NSPredicate(format: "name LIKE %@", query.joined())
rq.fetchLimit = 1
do {
if try context.fetch(rq).count != 0 {
foundPositions.insert(i)
break
}
} catch {
}
}
// do aggregated searchTerms search here instead of individual searches?
}
}
}
The NSFetchRequest focuses on one permutation at a time. But I'm accumulating the search string fragments in the array searchTerms because I don't know if it would be more efficient to construct a single query connected with ORs, and I also don't know how to do that in CoreData.
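As a rough sketch of the "single query connected with ORs" idea: Foundation's NSCompoundPredicate can combine one LIKE predicate per search term. The helper name orPredicate is hypothetical, and the attribute name "name" is taken from the fetch request above. Whether one compound fetch is actually faster than several small ones would need measuring.

```swift
import Foundation

// Hypothetical helper: builds "name LIKE t1 OR name LIKE t2 OR ..."
// from the accumulated search terms.
func orPredicate(forTerms searchTerms: [String], key: String = "name") -> NSPredicate {
    // %K inserts a key path, %@ inserts the (quoted) value.
    let subpredicates = searchTerms.map { NSPredicate(format: "%K LIKE %@", key, $0) }
    return NSCompoundPredicate(orPredicateWithSubpredicates: subpredicates)
}
```

The result can be assigned to rq.predicate once per position, after the inner loop has filled searchTerms for that position.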
The focus is on the positions in the original term word: I need to indicate if any given location has at least one of the substitutes as a valid fit. So to implement the aggregate searchTerms approach, a FetchRequest would have to happen for each location in the base term.
A second complication is the one referred to in the title of the question. I am using LIKE because the search term in the FetchRequest could be a substring in a longer word. However, the maximum number of letters is 11, and pad is the starting point of the original term in that field of 11 spaces.
So if pad is 3, then I would need to allow for 0..<pad preceding characters. And because there may be trailing characters, I would also want results with 0..<(11 - (pad + word.count)) alphabetic characters after the last letter in the search term.
Regex seems like one way to do this, but I haven't found a clear example of how to do this in this case, and especially with the multiple search terms (if that's the way to go). The limits of SQLite in the previous version forced constructing multiple queries with increasing numbers of "_" underscores to indicate the padding characters; that tended to really explode the number of queries.
BTW, substitutes is limited to an absolute maximum of 9 values, and in practice is usually below 5, so things are a little more manageable.
I would like to get a grip on this, and so if anyone can provide direction or examples that can make this a reasonably efficient function, the help is appreciated greatly.
EDIT:
I've realized that I need a result for each position in the target string, with cases where the leading and trailing spaces also may need to contain a substitute as well.
So I'm moving to this:
public func wordMatcher (pad: Int, word: Array<String>, substitutes : Set<String> ) {
let context = CoreDataManager.shared.persistentContainer.viewContext
var pad_ = pad
var query: Array<String>
var foundPositions : Set<Int> = []
let rq: NSFetchRequest<Word> = Word.fetchRequest()
rq.fetchLimit = 1
let subs = "[\(substitutes.joined())]"
// if word.count >= 4 { // because those locations will be blocked off anyway otherwise
let start = pad > 0 ? -1 : 0
let finish = 11 - (pad + word.count) > 0 ? word.count + 1 : word.count
for i in start..<finish {
query = word
var _pad = 11 - (pad + word.count)
if i == -1 {
query = Array(arrayLiteral: subs) + query
pad_ -= 1
} else if i >= word.count {
query.append(subs)
_pad -= 1
} else {
pad_ = pad
query[i] = subs
}
let endPad = _pad > 0 ? "{0,\(_pad)}" : ""
let predMatch = ".\(query.joined())\(endPad)"
print(predMatch)
rq.predicate = NSPredicate(format: "position <= %d AND word MATCHES %@", pad_, predMatch)
do {
if try context.fetch(rq).count != 0 {
foundPositions.insert(i)
}
} catch {
}
// }
}
lFreq = foundPositions
}
This relies on a regex substitution, inserted into the original target string. What I'll have to find out is if this is fast enough at the edge cases, but it may not be critical even in the worst case.
predMatch will end up looking something like "ab[xyx]d{0,3}", and I think I can get rid of the position section by changing it to be "{0,2}ab[xyx]d{0,3}". But I guess I'm going to have to try to find out.
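A small sketch of how such a pattern could be assembled, assuming the padding may be any character. Note that a regex quantifier such as {0,3} applies to the element before it, so the padding needs a leading "." (".{0,3}"); otherwise it would quantify the last letter of the word. The helper name paddedPattern and its parameters are hypothetical, and substitutes is passed as an array here only to keep the character order stable.

```swift
// Hypothetical helper: builds a full-match regex for `word` in which the letter
// at `index` is replaced by a character class of substitutes, with up to
// `leading` and `trailing` arbitrary characters allowed around the word.
func paddedPattern(word: [String], index: Int, substitutes: [String],
                   leading: Int, trailing: Int) -> String {
    var query = word
    query[index] = "[\(substitutes.joined())]"          // e.g. "[xyz]"
    let prefix = leading > 0 ? ".{0,\(leading)}" : ""   // optional leading padding
    let suffix = trailing > 0 ? ".{0,\(trailing)}" : "" // optional trailing padding
    return prefix + query.joined() + suffix
}
```

For example, paddedPattern(word: ["a", "b", "c", "d"], index: 2, substitutes: ["x", "y", "z"], leading: 2, trailing: 3) gives ".{0,2}ab[xyz]d.{0,3}", which could be passed to the MATCHES clause.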

Backspace does not work on characters outside of the regex in Swift

I use this method to format the phone number in a UITextField on the .editingChanged event.
But the delete key only removes the digits.
extension String{
func applyPatternOnNumbers(pattern: String) -> String {
let replacmentCharacter: Character = "#"
let pureNumber = self.replacingOccurrences( of: "[^۰-۹0-9]", with: "", options: .regularExpression)
var result = ""
var pureNumberIndex = pureNumber.startIndex
for patternCharacter in pattern {
if patternCharacter == replacmentCharacter {
guard pureNumberIndex < pureNumber.endIndex else { return result }
result.append(pureNumber[pureNumberIndex])
pureNumber.formIndex(after: &pureNumberIndex)
} else {
result.append(patternCharacter)
}
}
return result
}
}
Used in the .editingChanged event handler:
let pattern = "+# (###) ###-####"
let mobile = String((textField.text ?? "").prefix(pattern.count))
textField.text = mobile.applyPatternOnNumbers(pattern: pattern)
// print(textField.text) +1 (800) 666-8888
The problem is that the space, "-", "(" and ")" characters cannot be deleted.
The regex you are using removes everything that is not a digit:
[^۰-۹0-9]
I'm not sure, but you could change it to:
[^۰-۹0-9\s\-\(\)]
and it may work. Put a \ before each special character inside the [] and add any other characters that you do not want to be replaced.
Or you can simplify it to
[^\d\s\-\(\)]
and it might work.
Method 2
You can use this regex, which matches the exact phone number format you have:
\+\d+\s\(\d{3}\)\s\d{3}-\d{4}
You can drop the leading \+ if the plus sign is not required:
\d+\s\(\d{3}\)\s\d{3}-\d{4}
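A likely reason the separators seem undeletable is that applyPatternOnNumbers rebuilds the text from the digits alone. Deleting a space or bracket leaves the digits unchanged, so the pattern puts the separator straight back. One way around this, sketched here with a hypothetical function name and parameters, is to hold back literal pattern characters until another digit actually follows them:

```swift
// Hypothetical variant of applyPatternOnNumbers: literal pattern characters
// are buffered and only written out once a digit is available to follow them,
// so the formatted text never ends in a separator.
func applyPatternLazily(digits: String, pattern: String, placeholder: Character = "#") -> String {
    var result = ""
    var pendingLiterals = ""            // separators seen since the last digit
    var remaining = Substring(digits)   // digits not yet placed

    for patternCharacter in pattern {
        if patternCharacter == placeholder {
            guard let digit = remaining.first else { break }
            result += pendingLiterals   // flush "(", " ", "-" before the digit
            pendingLiterals = ""
            result.append(digit)
            remaining = remaining.dropFirst()
        } else {
            pendingLiterals.append(patternCharacter)
        }
    }
    return result
}
```

With the digits "1800" this yields "+1 (800" rather than "+1 (800) ", so the next backspace removes a digit. The digits argument is assumed to be the already filtered string (pureNumber in the question's code).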

How to split string as English and non English using Swift 4?

I have a string which contains English and Arabic together. I am using an API, that is why I cannot set an indicator in it.
What I want to get is the Arabic and the English split into two parts. Here is a sample string:
"بِاسْمِكَ رَبِّي وَضَعْتُ جَنْبِي، وَبِكَ أَرْفَعُهُ، فَإِنْ أَمْسَكْتَ نَفْسِي فَارْحَمْهَا، وَإِنْ أَرْسَلْتَهَا فَاحْفَظْهَا، بِمَا تَحْفَظُ بِهِ عِبَادَكَ الصَّالِحِينَ.Bismika rabbee wadaAAtu janbee wabika arfaAAuh, fa-in amsakta nafsee farhamha, wa-in arsaltaha fahfathha bima tahfathu bihi AAibadakas-saliheen. In Your name my Lord, I lie down and in Your name I rise, so if You should take my soul then have mercy upon it, and if You should return my soul then protect it in the manner You do so with Your righteous servants.",
I cannot find how to split it so that the Arabic and the English end up in two separate parts.
What I want:
The text can be in any language; my problem is only to extract the English or Arabic part and show each in its respective field.
How can I achieve it?
You can use a Natural Language Tagger, which would work even if both scripts are intermingled:
import NaturalLanguage
let str = "¿como? بداية start وسط middle начать средний конец نهاية end. 從中間開始. "
let tagger = NLTagger(tagSchemes: [.script])
tagger.string = str
var index = str.startIndex
var dictionary = [String: String]()
var lastScript = "other"
while index < str.endIndex {
let res = tagger.tag(at: index, unit: .word, scheme: .script)
let range = res.1
let script = res.0?.rawValue
switch script {
case .some(let s):
lastScript = s
dictionary[s, default: ""] += dictionary["other", default: ""] + str[range]
dictionary.removeValue(forKey: "other")
default:
dictionary[lastScript, default: ""] += str[range]
}
index = range.upperBound
}
print(dictionary)
and print the result if you'd like:
for entry in dictionary {
print(entry.key, ":", entry.value)
}
yielding :
Hant : 從中間開始.
Cyrl : начать средний конец
Arab : بداية وسط نهاية
Latn : ¿como? start middle end.
This is still not perfect, since the language tagger only checks which script most of the letters in a word belong to. For example, in the string you're working with, the tagger would consider الصَّالِحِينَ.Bismika as one word. To overcome this, we can use two pointers to traverse the original string and check the script of each word individually. Words are defined as runs of contiguous letters:
let str = "بِاسْمِكَ رَبِّي وَضَعْتُ جَنْبِي، وَبِكَ أَرْفَعُهُ، فَإِنْ أَمْسَكْتَ نَفْسِي فَارْحَمْهَا، وَإِنْ أَرْسَلْتَهَا فَاحْفَظْهَا، بِمَا تَحْفَظُ بِهِ عِبَادَكَ الصَّالِحِينَ.Bismika rabbee wadaAAtu janbee wabika arfaAAuh, fa-in amsakta nafsee farhamha, wa-in arsaltaha fahfathha bima tahfathu bihi AAibadakas-saliheen. In Your name my Lord, I lie down and in Your name I rise, so if You should take my soul then have mercy upon it, and if You should return my soul then protect it in the manner You do so with Your righteous servants."
let tagger = NLTagger(tagSchemes: [.script])
var i = str.startIndex
var dictionary = [String: String]()
var lastScript = "glyphs"
while i < str.endIndex {
var j = i
while j < str.endIndex,
CharacterSet.letters.inverted.isSuperset(of: CharacterSet(charactersIn: String(str[j]))) {
j = str.index(after: j)
}
if i != j { dictionary[lastScript, default: ""] += str[i..<j] }
if j < str.endIndex { i = j } else { break }
while j < str.endIndex,
CharacterSet.letters.isSuperset(of: CharacterSet(charactersIn: String(str[j]))) {
j = str.index(after: j)
}
let tempo = String(str[i..<j])
tagger.string = tempo
let res = tagger.tag(at: tempo.startIndex, unit: .word, scheme: .script)
if let s = res.0?.rawValue {
lastScript = s
dictionary[s, default: ""] += dictionary["glyphs", default: ""] + tempo
dictionary.removeValue(forKey: "glyphs")
}
else { dictionary["other", default: ""] += tempo }
i = j
}
You can use NLTagger from the NaturalLanguage framework, as answered by @ielyamani, but the limitation is that it requires iOS 12+.
If you need to support earlier iOS versions, you can take a look at NSCharacterSet.
You can create your own character set to check whether a string consists of English letters and digits:
extension String {
    // Returns true only when every character is an ASCII letter or digit.
    func containsLatinCharacters() -> Bool {
        let allowed = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
        // Any character outside the allowed set means the string is not purely Latin.
        return rangeOfCharacter(from: allowed.inverted) == nil
    }
}
Another option is to use the character sets that are already available:
let nonLatinString = string.trimmingCharacters(in: .alphanumerics) // strips letters and digits from both ends; symbols stay
let latinString = string.trimmingCharacters(in: CharacterSet.alphanumerics.inverted) // strips symbols and whitespace from both ends
Keep in mind that trimmingCharacters(in:) only removes characters from the two ends of the string, and that .alphanumerics covers letters and digits of every script, Arabic included. If these are not good enough, you can create your own character set and use union, intersection, etc. to filter out the wanted and the unwanted characters.
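As a minimal sketch of the "create your own character set" idea without any iOS 12 requirement, the text can be split by Unicode block while walking its scalars. The function name splitArabicAndLatin is hypothetical, the listed ranges are the main Arabic blocks plus ASCII letters only, and sending neutral characters (spaces, punctuation, digits) to whichever script was seen last is an arbitrary choice.

```swift
// Hypothetical helper: splits mixed text into an Arabic part and a Latin part.
func splitArabicAndLatin(_ text: String) -> (arabic: String, latin: String) {
    var arabic = ""
    var latin = ""
    // Assumption: neutral characters before the first letter go to the Arabic part.
    var lastWasArabic = true

    for scalar in text.unicodeScalars {
        switch scalar.value {
        case 0x0600...0x06FF, 0x0750...0x077F, 0xFB50...0xFDFF, 0xFE70...0xFEFF:
            lastWasArabic = true        // Arabic blocks and presentation forms
        case 0x41...0x5A, 0x61...0x7A:
            lastWasArabic = false       // ASCII letters A-Z and a-z
        default:
            break                       // neutral: stays with the previous script
        }
        if lastWasArabic {
            arabic.unicodeScalars.append(scalar)
        } else {
            latin.unicodeScalars.append(scalar)
        }
    }
    return (arabic, latin)
}
```

Because the diacritics in the sample text fall inside the Arabic block, they stay attached to their base letters.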
Step 1:
Split the whole string into an array on ".", since there is a "." between the sentences.
Step 2:
Pass each sentence to the language detector and append it to the matching string.
Final Code
//add in your viewController
enum Language : String {
case arabic = "ar"
case english = "en"
}
override func viewDidLoad() {
super.viewDidLoad()
//make array of string
let kalmaArray = "بِاسْمِكَ رَبِّي وَضَعْتُ جَنْبِي، وَبِكَ أَرْفَعُهُ، فَإِنْ أَمْسَكْتَ نَفْسِي فَارْحَمْهَا، وَإِنْ أَرْسَلْتَهَا فَاحْفَظْهَا، بِمَا تَحْفَظُ بِهِ عِبَادَكَ الصَّالِحِينَ.Bismika rabbee wadaAAtu janbee wabika arfaAAuh, fa-in amsakta nafsee farhamha, wa-in arsaltaha fahfathha bima tahfathu bihi AAibadakas-saliheen. In Your name my Lord, I lie down and in Your name I rise, so if You should take my soul then have mercy upon it, and if You should return my soul then protect it in the manner You do so with Your righteous servants.".components(separatedBy: ".")
splitInLanguages(kalmaArray: kalmaArray)
}
private func splitInLanguages(kalmaArray: [String]){
var englishText = ""
var arabicText = ""
for kalma in kalmaArray {
if kalma.count > 0 {
if let language = NSLinguisticTagger.dominantLanguage(for: kalma) {
switch language {
case Language.arabic.rawValue:
arabicText.append(kalma)
arabicText.append(".")
break
default: // English
englishText.append(kalma)
englishText.append(".")
break
}
} else {
print("Unknown language")
}
}
}
debugPrint("Arabic: ", arabicText)
debugPrint("English: ", englishText)
}
I hope this helps you split the string into the two languages. Let me know if you still have any issues.

How to get all strings between particular delimiters?

I have a string called source. This string contains tags, marked with number signs (#) on left and right side.
What is the most efficient way to get the tag names from the source string?
Source string:
let source = "Here is tag 1: ##TAG_1##, tag 2: ##TAG_2##."
Expected result:
["TAG_1", "TAG_2"]
Not a very short solution, but here you go:
// Split on the separators around the tags, keep the ##...## tokens, then strip the markers.
let tags = source.components(separatedBy: CharacterSet(charactersIn: " ,."))
    .filter { $0.hasPrefix("##") && $0.hasSuffix("##") }
    .map { $0.replacingOccurrences(of: "##", with: "") }
Split the string at all occurrences of ##:
let components = source.components(separatedBy: "##")
// Result: ["Here is tag 1: ", "TAG_1", ", tag 2: ", "TAG_2", "."]
Check that there's an odd number of components, otherwise there's an odd amount of ##s:
guard components.count % 2 == 1 else { fatalError("Unbalanced delimiters") }
Get every second element:
components.enumerated().filter{ $0.offset % 2 == 1 }.map{ $0.element }
In a single function:
import Foundation
func getTags(source: String, delimiter: String = "##") -> [String] {
let components = source.components(separatedBy: delimiter)
guard components.count % 2 == 1 else { fatalError("Unbalanced delimiters") }
return components.enumerated().filter{ $0.offset % 2 == 1 }.map{ $0.element }
}
getTags(source: "Here is tag 1: ##TAG_1##, tag 2: ##TAG_2##.") // ["TAG_1", "TAG_2"]
You can read this post and adapt the answer for your needs: Swift: Split a String into an array
If not, you can also write your own method. Remember that a string is a collection of characters, so you can copy it into an array, loop through it and check for "##":
let chars = Array(source)            // index characters by integer position
var tags: [String] = []
var i = 0
while i + 1 < chars.count {
    if chars[i] == "#" && chars[i + 1] == "#" {
        var j = i + 2
        var tag = ""
        // concatenate the characters up to the closing "##"
        while j + 1 < chars.count && !(chars[j] == "#" && chars[j + 1] == "#") {
            tag.append(chars[j])
            j += 1
        }
        // only keep the tag when a closing "##" was actually found
        if j + 1 < chars.count { tags.append(tag) }
        i = j + 2
    } else {
        i += 1
    }
}
I am more of a C++ programmer than a Swift programmer, so this is how I would approach it if I didn't want to use standard methods. There may be a better way of doing it, but I don't have any Swift knowledge.
Keep in mind if this does not compile then you may have to adapt the code slightly as I do not have a development environment I can test this in before posting.

Replace part of string with lower case letters - Swift

I have a Swift based iOS app and one of the features allows you to comment on a post. Anyway, users can add "#mentions" in their posts to tag other people. However I want to stop the user from adding a username with a capital letter.
Is there any way I can convert a string so that the #usernames are all in lowercase?
For example:
I really enjoy sightseeing with #uSerABC (not allowed)
I really enjoy sightseeing with #userabc (allowed)
I know there is a String property in Swift called .lowercaseString, but the problem with it is that it makes the entire string lowercase, and that's not what I want. I only want the #username to be in lower case.
Is there any way around this while still using the .lowercaseString property?
Thanks for your time, Dan.
This comes from code I use to detect hashtags, which I've modified to detect mentions:
func detectMentionsInText(text: String) -> [NSRange]? {
let mentionsDetector = try? NSRegularExpression(pattern: "#(\\w+)", options: NSRegularExpressionOptions.CaseInsensitive)
let results = mentionsDetector?.matchesInString(text, options: NSMatchingOptions.WithoutAnchoringBounds, range: NSMakeRange(0, text.utf16.count)).map { $0 }
return results?.map{$0.rangeAtIndex(0)}
}
It detects all the mentions in a string using a regex and returns an array of NSRange values. Each range gives you the beginning and the end of a mention, so you can easily replace it with a lowercase version.
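The replacement step could look roughly like the following sketch, written in current Swift syntax rather than the Swift 2 syntax used above. The function name lowercasingMentions is hypothetical; the pattern is the same "#" followed by word characters.

```swift
import Foundation

// Hypothetical helper: lowercases every "#mention" and leaves the rest untouched.
func lowercasingMentions(in text: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: "#\\w+") else { return text }
    var result = text
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    // Walk the matches backwards so that a replacement never shifts
    // the ranges of matches that are still to be processed.
    for match in regex.matches(in: text, range: fullRange).reversed() {
        guard let range = Range(match.range, in: result) else { continue }
        result.replaceSubrange(range, with: result[range].lowercased())
    }
    return result
}
```

For the example in the question, lowercasingMentions(in: "I really enjoy sightseeing with #uSerABC") returns "I really enjoy sightseeing with #userabc".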
Split the string into two using the following command -
let arr = myString.componentsSeparatedByString("#")
//Convert arr[1] to lower case
//Append to arr[0]
//Enjoy
Thanks to everyone for their help. In the end I couldn't get any of the solutions to work and after a lot of testing, I came up with this solution:
func correctStringWithUsernames(inputString: String, completion: (correctString: String) -> Void) {
// Create the final string and get all
// the separate strings from the data.
var finalString: String!
var commentSegments: NSArray!
commentSegments = inputString.componentsSeparatedByString(" ")
if (commentSegments.count > 0) {
for (var loop = 0; loop < commentSegments.count; loop++) {
// Check the username to ensure that there
// are no capital letters in the string.
let currentString = commentSegments[loop] as! String
let capitalLetterRegEx = ".*[A-Z]+.*"
let textData = NSPredicate(format:"SELF MATCHES %@", capitalLetterRegEx)
let capitalResult = textData.evaluateWithObject(currentString)
// Check if the current loop string
// is a #user mention string or not.
if (currentString.containsString("#")) {
// If we are in the first loop then set the
// string otherwise concatenate the string.
if (loop == 0) {
if (capitalResult == true) {
// The username contains capital letters
// so change it to a lower case version.
finalString = currentString.lowercaseString
}
else {
// The username does not contain capital letters.
finalString = currentString
}
}
else {
if (capitalResult == true) {
// The username contains capital letters
// so change it to a lower case version.
finalString = "\(finalString) \(currentString.lowercaseString)"
}
else {
// The username does not contain capital letters.
finalString = "\(finalString) \(currentString)"
}
}
}
else {
// The current string is NOT a #user mention
// so simply set or concatenate the finalString.
if (loop == 0) {
finalString = currentString
}
else {
finalString = "\(finalString) \(currentString)"
}
}
}
}
else {
// No issues pass back the string.
finalString = inputString
}
// Pass back the correct username string.
completion(correctString: finalString)
}
It's certainly not the most elegant or efficient solution around, but it does work. If there are any ways of improving it, please leave a comment.
