Which NSRegularExpression was found using the | operator - ios

I'm currently implementing NSRegularExpressions to check for patterns inside a UITextView string in my project.
The pattern checks and operations work as expected. For example, I look for the regular **bold** markdown pattern and, if I find it, I apply some text attributes to the range.
I have, though, come across a problem: I don't know how to run multiple patterns at once and apply a different operation for each pattern found.
In my UITextView delegate (textViewDidChange or shouldChangeTextIn range: NSRange) I run the bold pattern check \\*{2}([\\w ]+)\\*{2}, and then I also run the italic pattern check \\_{1}([\\w ]+)\\_{1}, looping through the UITextView text a second time.
I have implemented the following custom function, which applies the passed-in regex to the string. I have to call it once per pattern, which is why I'd love to combine the pattern checks into a single pass and then "parse" each match.
fileprivate func regularExpression(regex: NSRegularExpression, type: TypeAttributes) {
let str = inputTextView.attributedText.string
let results = regex.matches(in: str, range: NSRange(str.startIndex..., in: str))
_ = results.map { self.applyAttributes(range: $0.range, type: type) }
}
Thanks.
EDIT
I can "merge" both patterns with the | operator like this:
private let combinedPattern = "\\*{2}([\\w ]+)\\*{2}|\\_{1}([\\w ]+)\\_{1}"
but my problem is knowing which pattern was found: the \\*{2}([\\w ]+)\\*{2} one or the \\_{1}([\\w ]+)\\_{1} one.

If you use the combined pattern, each alternative's capture group gets its own range in the match result.
To access the first capture group (the bold pattern), read the range at index 1. When the match comes from the second alternative, the first group did not participate and its range has the location NSNotFound, so you need to check whether each range is valid, like this:
results.forEach {
var range = $0.range(at: 1)
// A group that did not take part in the match reports NSNotFound
if range.location != NSNotFound {
self.applyAttributes(range: range, type: .bold)
}
range = $0.range(at: 2)
if range.location != NSNotFound {
self.applyAttributes(range: range, type: .italic)
}
}
After that, you can move the validity check into an NSRange extension and keep the attribute types in an array whose order matches the capture groups of your regular expression:
extension NSRange {
func isValid(for string: String) -> Bool {
// NSRange counts UTF-16 code units; a group that did not match has location NSNotFound
return location != NSNotFound && location + length <= string.utf16.count
}
}
let attributes: [TypeAttributes] = [.bold, .italic]
results.forEach { match in
attributes.enumerated().forEach { index, attribute in
// Capture groups are numbered from 1; index 0 is the whole match
let range = match.range(at: index + 1)
if range.isValid(for: str) {
self.applyAttributes(range: range, type: attribute)
}
}
}
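As a self-contained illustration of the idea above, here is a minimal sketch that runs the combined pattern once and reports which alternative produced each match. The MarkdownStyle enum and the styledRanges(in:) function are hypothetical names, not part of the original project; they only stand in for TypeAttributes and applyAttributes.

```swift
import Foundation

// Hypothetical stand-in for the TypeAttributes enum from the question.
enum MarkdownStyle: Equatable {
    case bold
    case italic
}

// Capture group 1 belongs to the bold alternative, group 2 to the italic one.
let combinedPattern = "\\*{2}([\\w ]+)\\*{2}|\\_{1}([\\w ]+)\\_{1}"
let stylesByGroup: [MarkdownStyle] = [.bold, .italic]

// Returns the captured text and the style of every match, found in a single pass.
func styledRanges(in text: String) -> [(text: String, style: MarkdownStyle)] {
    guard let regex = try? NSRegularExpression(pattern: combinedPattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    var found: [(text: String, style: MarkdownStyle)] = []
    for match in regex.matches(in: text, range: fullRange) {
        for (index, style) in stylesByGroup.enumerated() {
            let groupRange = match.range(at: index + 1)
            // Groups from the alternative that did not match report NSNotFound.
            guard groupRange.location != NSNotFound,
                  let swiftRange = Range(groupRange, in: text) else { continue }
            found.append((text: String(text[swiftRange]), style: style))
        }
    }
    return found
}

let found = styledRanges(in: "some **bold text** and _italic text_")
for item in found {
    print(item.style, item.text)
}
```

In a real text view you would call applyAttributes with the group's NSRange instead of collecting strings; the strings are used here only so the sketch runs without UIKit.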

Related

How to get range of specific substring even if a duplicate

I want to detect the words that begin with a #, and return their specific ranges. Initially I tried using the following code:
for word in words {
if word.hasPrefix("#") {
let matchRange = theSentence.range(of: word)
//Do stuff with this word
}
}
This works fine, except that if you have a duplicate hashtag it returns the range of the first occurrence for both. This is because range(of:) always returns the first match in the string.
Say I have the following string:
"The range of #hashtag should be different to this #hashtag"
This will return (13, 8) for both hashtags, when really it should return (13, 8) as well as (50, 8). How can this be fixed? Please note that emojis should be able to be detected in the hashtag too.
EDIT
If you want to know how to do this with emojis too, go here
Create a regex for that, use it with NSRegularExpression, and read the range of each match.
let str = "The range of #hashtag should be different to this #hashtag"
let regex = try NSRegularExpression(pattern: "(#[A-Za-z0-9]*)", options: [])
// NSRange counts UTF-16 code units, so build it from the string rather than from the character count
let matches = regex.matches(in: str, options: [], range: NSRange(str.startIndex..., in: str))
for match in matches {
print("match = \(match.range)")
}
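Because the question also asks about emojis, the following sketch shows one way they could be handled. It assumes a hashtag runs until the next whitespace or the next #, which is an illustrative choice and not a rule from the answer above; the hashtagRanges(in:) name is hypothetical as well.

```swift
import Foundation

// Hypothetical helper: returns the NSRange of every hashtag in the text.
func hashtagRanges(in text: String) -> [NSRange] {
    // Assumption: a hashtag is "#" followed by anything up to whitespace or another "#".
    guard let regex = try? NSRegularExpression(pattern: "#[^\\s#]+") else { return [] }
    // Build the search range from the string so it covers the full UTF-16 length.
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).map { $0.range }
}

let sentence = "The range of #hashtag should be different to this #hashtag"
let ranges = hashtagRanges(in: sentence)
for range in ranges {
    print("location: \(range.location), length: \(range.length)")
}

// This emoji occupies two UTF-16 code units, so the NSRange length (8)
// is larger than the number of characters in "#🍕pizza" (7).
let emojiRanges = hashtagRanges(in: "I like #🍕pizza")
```

For the sample sentence this yields the two distinct ranges the question asks for, (13, 8) and (50, 8).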
Why don't you split the sentence into chunks, where each chunk starts with #? Then you know how many times a word with # appears in the sentence.
Edit: I think the regex answer is the best way to do this, but here is another approach to the same problem.
var hashtagWords: [String] = []
for word in words {
if word.hasPrefix("#") {
// Collect all words which begin with # in an array
hashtagWords.append(word)
}
}
// Work on a copy of the original sentence, since we will change it
var mutatedSentence = theSentence
for hashtagWord in hashtagWords {
let range = mutatedSentence.range(of: hashtagWord)
if let aRange = range {
// If the range is valid, remove the word from the copy so the next search finds the next occurrence.
// Ranges found after a removal are relative to the shortened copy, not to the original sentence.
mutatedSentence = mutatedSentence.replacingCharacters(in: aRange, with: "")
}
}

Find index of Nth instance of substring in string in Swift

My Swift app involves searching through text in a UITextView. The user can search for a certain substring within that text view, then jump to any instance of that string in the text view (say, the third instance). I need to find out the integer value of which character they are on.
For example:
Example 1: The user searches for "hello" and the text view reads "hey hi hello, hey hi hello", then the user presses down arrow to view second instance. I need to know the integer value of the first h in the second hello (i.e. which # character that h in hello is within the text view). The integer value should be 22.
Example 2: The user searches for "abc" while the text view reads "abcd" and they are looking for the first instance of abc, so the integer value should be 1 (which is the integer value of that a since it's the first character of the instance they're searching for).
How can I get the index of the character the user is searching for?
Xcode 11 • Swift 5 or later
let sentence = "hey hi hello, hey hi hello"
let query = "hello"
var searchRange = sentence.startIndex..<sentence.endIndex
var indices: [String.Index] = []
while let range = sentence.range(of: query, options: .caseInsensitive, range: searchRange) {
searchRange = range.upperBound..<searchRange.upperBound
indices.append(range.lowerBound)
}
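// String.Index values do not print as integers, so convert them to character offsets first
let offsets = indices.map { sentence.distance(from: sentence.startIndex, to: $0) }
print(offsets) // "[7, 21]\n"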
Another approach is NSRegularExpression, which is designed to iterate easily through the matches in a string. If you use the .ignoreMetacharacters option, it will not apply any wildcard/regex logic but will just look for the literal string in question. So consider:
let string = "hey hi hello, hey hi hello" // string to search within
let searchString = "hello" // string to search for
let matchToFind = 2 // grab the second occurrence
let regex = try! NSRegularExpression(pattern: searchString, options: [.caseInsensitive, .ignoreMetacharacters])
You could use enumerateMatches:
var count = 0
let range = NSRange(string.startIndex ..< string.endIndex, in: string)
regex.enumerateMatches(in: string, range: range) { result, _, stop in
count += 1
if count == matchToFind {
print(result!.range.location)
stop.pointee = true
}
}
Or you can just find all of them with matches(in:range:) and then grab the n'th one:
let matches = regex.matches(in: string, range: range)
if matches.count >= matchToFind {
print(matches[matchToFind - 1].range.location)
}
Obviously, if you were so inclined, you could omit the .ignoreMetacharacters option and allow the user to perform regex searches, too (e.g. wildcards, whole word searches, start of word, etc.).
For Swift 2, see previous revision of this answer.
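The question counts characters from 1 (the expected answers are 22 and 1), while the snippets above produce zero-based offsets. A minimal sketch of a helper that returns the 1-based character position of the Nth occurrence could look like this; the function name and its parameter labels are hypothetical.

```swift
import Foundation

// Returns the 1-based character position of the nth occurrence of `query`,
// or nil if the text contains fewer than n occurrences.
func characterPosition(ofOccurrence n: Int, of query: String, in text: String) -> Int? {
    guard n > 0, !query.isEmpty else { return nil }
    var searchRange = text.startIndex..<text.endIndex
    var seen = 0
    while let range = text.range(of: query, options: .caseInsensitive, range: searchRange) {
        seen += 1
        if seen == n {
            // distance(from:to:) is zero-based, so add 1 to match the question's numbering.
            return text.distance(from: text.startIndex, to: range.lowerBound) + 1
        }
        // Continue searching after the end of the current match.
        searchRange = range.upperBound..<text.endIndex
    }
    return nil
}

let second = characterPosition(ofOccurrence: 2, of: "hello", in: "hey hi hello, hey hi hello")
let first = characterPosition(ofOccurrence: 1, of: "abc", in: "abcd")
let missing = characterPosition(ofOccurrence: 3, of: "hello", in: "hey hi hello, hey hi hello")
```

Returning an optional keeps the "no such occurrence" case explicit instead of relying on a sentinel value such as -1.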

Return range with first and last character in string

I have a string: "Hey #username that's funny". For a given string, how can I find all ranges that start with the # character and end at the following space, in order to get the username?
I suppose I can get all indexes of # and, for each, take substringToIndex of the next space character, but I'm wondering if there's an easier way.
If your username can contain only letters and numbers, you can use a regular expression for that:
let s = "Hey #username123 that's funny"
if let r = s.range(of: "#\\w+", options: .regularExpression) {
let name = String(s[r]) // "#username123"
}
@Vladimir's answer is correct, but if you're trying to find multiple occurrences of "username", this should also work:
let s = "Hey #username123 that's funny"
let ranges: [NSRange]
do {
// Create the regular expression.
let regex = try NSRegularExpression(pattern: "#\\w+", options: [])
// Use the regular expression to get an array of NSTextCheckingResult.
// Use map to extract the range from each result.
// The search range is built from the string so that it covers the full UTF-16 length.
ranges = regex.matches(in: s, options: [], range: NSRange(s.startIndex..., in: s)).map { $0.range }
}
catch {
// There was a problem creating the regular expression
ranges = []
}
for range in ranges {
print((s as NSString).substring(with: range))
}
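Both answers return the match including the leading #. If only the name is wanted, a capture group can isolate it. This is a minimal sketch under that assumption; usernames(in:) is a hypothetical helper name.

```swift
import Foundation

// Hypothetical helper: returns every username in the text without the leading "#".
func usernames(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: "#(\\w+)") else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Group 1 holds the name without the leading "#".
        Range(match.range(at: 1), in: text).map { String(text[$0]) }
    }
}

let names = usernames(in: "Hey #username123 that's funny, right #bob?")
print(names)
```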

iOS - regex to match word boundary, including underscore

I have a regex that I'm trying to run to match a variety of search terms. For example:
the search "old" should match:
-> age_old
-> old_age
but not
-> bold - as it's not at the start of the word
To do this, I was using a word boundary. However, a word boundary doesn't treat underscores as separators. As mentioned here, there are workarounds available in other languages. Unfortunately, with NSRegularExpression, this doesn't look possible. Is there any other way to get a word boundary to work? Or are there other options?
TLDR: Use one of the following:
let rx = "(?<=_|\\b)old(?=_|\\b)"
let rx = "(?<![^\\W_])old(?![^\\W_])"
let rx = "(?<![\\p{L}\\d])old(?![\\p{L}\\d])"
See a regex demo #1, regex demo #2 and regex demo #3.
Swift and Objective-C use the ICU regex flavor, which supports look-behinds of fixed and constrained width.
(?= ... )    Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?! ... )    Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.
(?<= ... )    Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
(?<! ... )    Negative Look-behind assertion.
So, you can use
let regex = "(?<![\\p{L}\\d])old(?![\\p{L}\\d])";
See regex demo
Here is a Swift code snippet extracting all "old"s:
func matchesForRegexInText(regex: String, text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex, options: [])
// NSString is used because NSRange offsets count UTF-16 code units
let nsString = text as NSString
let results = regex.matches(in: text,
options: [], range: NSRange(location: 0, length: nsString.length))
return results.map { nsString.substring(with: $0.range) }
} catch {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
let s = "age_old -> old_age but not -> bold"
let rx = "(?<![\\p{L}\\d])old(?![\\p{L}\\d])"
let matches = matchesForRegexInText(regex: rx, text: s)
print(matches) // => ["old", "old"]
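Since the question mentions a variety of search terms, the pattern has to be built per term. A minimal sketch, assuming the third pattern above is the one chosen; boundedPattern(for:) and countMatches(of:in:) are hypothetical helper names.

```swift
import Foundation

// Wraps an arbitrary search term in the letter/digit look-arounds from the answer.
func boundedPattern(for term: String) -> String {
    // escapedPattern(for:) makes characters such as "." or "+" in the term match literally.
    let escaped = NSRegularExpression.escapedPattern(for: term)
    return "(?<![\\p{L}\\d])\(escaped)(?![\\p{L}\\d])"
}

// Counts how often the term occurs as a whole word, treating "_" as a separator.
func countMatches(of term: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: boundedPattern(for: term)) else { return 0 }
    return regex.numberOfMatches(in: text, range: NSRange(text.startIndex..., in: text))
}

let hits = countMatches(of: "old", in: "age_old -> old_age but not -> bold")
let literalHits = countMatches(of: "a.b", in: "a.b axb")
```

Escaping the term matters as soon as users can type the search text themselves; without it, "a.b" would also match "axb".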

fastest indexOf function for strings

I am currently using following extension for a string to get the index for a specific string in a big string:
func indexOf(target: String, startIndex: Int) -> Int
{
var startRange = advance(self.startIndex, startIndex)
var range = self.rangeOfString(target, options: NSStringCompareOptions.LiteralSearch, range: Range<String.Index>(start: startRange, end: self.endIndex))
if let range = range {
return distance(self.startIndex, range.startIndex)
} else {
return -1
}
}
I am calling this many times and I have a performance issue.
Does anyone have an idea how to make indexOf() faster?
Currently I am doing this in Swift. Would doing this in Objective-C and bridging give better performance? Or could I possibly include some C code? Any ideas?
UPDATE: more about the background
I have a long text, say with 5000 characters.
The text contains several metadata tags besides the normal text. These tags look like {{blabl{{ sdasdg }} abla}} ; [[bla bla|blabla]] ; {|bla|}.
I would like to remove them or format them in a specific way.
I can't use a regular expression for this, because regular expressions do not support nested expressions ({{ {{ {{ {{dsgasdg}} }}}} }} ).
So I wrote my own functions, which work but are very slow.
What I am actually doing is going through the text and simply searching for these tags. For this I need a base function like the following, to determine which tag comes first and at which position. When I have found a tag I go on to the next, and so on. I noticed that this is the most time-consuming part of all. Of course, I am also calling it many times.
func getStart(sText:String, alSearchPatterns:[String], ifrom:Int) -> (Pattern:String, index:Int) {
var bweiter:Bool=true;
var actualcharacter:Character;
var returnPattern="";
var returnIndex = -1;
println("ifrom : " + String(ifrom));
var indexfound:Int = -1;
// find the first character of the patterns
var bsuchepattern=true;
for(var i=0;i<alSearchPatterns.count && bsuchepattern;i++){
let sPattern=alSearchPatterns[i];
let pattern_first_char=sPattern[0];
//let pattern_first_char=String(sPattern[0]);
let characterIndex = sText.indexOfCharacter(pattern_first_char, fromIndex: ifrom); // find(sText, pattern_first_char);
//let characterIndex = sText.indexOf(pattern_first_char, startIndex: ivon);
if(characterIndex != -1){
if((indexfound == -1) || characterIndex < indexfound){
// found something that is first of all actually.
let patternlength=sPattern.length;
let substring_in_text=sText.substring(characterIndex, endIndex: characterIndex + patternlength);
if(substring_in_text.equals(sPattern)){
returnPattern=sPattern;
returnIndex=characterIndex;
}
}
}
}
return (returnPattern,returnIndex);
}
Any hints on how to make this faster, or on how to do this better in general?
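One direction that might be worth trying, sketched here purely as an assumption and not as a measured fix: convert the text to an array of UTF-8 bytes once, then look for the earliest tag with plain integer indexing, so that no String.Index arithmetic or substring creation happens inside the loop. The firstTag(in:patterns:from:) function and its return shape are hypothetical, and the offsets it returns are byte positions rather than Character positions.

```swift
// Hypothetical helper: finds the earliest occurrence of any pattern at or after `from`.
// Returns the index of the pattern within `patterns` and the byte offset of the hit.
func firstTag(in bytes: [UInt8], patterns: [[UInt8]], from: Int) -> (pattern: Int, index: Int)? {
    var position = max(from, 0)
    while position < bytes.count {
        for (patternIndex, pattern) in patterns.enumerated() {
            let end = position + pattern.count
            // Compare the bytes at this position with the pattern, if it still fits.
            if !pattern.isEmpty, end <= bytes.count,
               bytes[position..<end].elementsEqual(pattern) {
                return (pattern: patternIndex, index: position)
            }
        }
        position += 1
    }
    return nil
}

let text = "abc {{x [[y]] }} {|z|}"
let bytes = Array(text.utf8)                           // convert once, reuse for every search
let tags = ["{{", "[[", "{|"].map { Array($0.utf8) }   // opening tags from the question
let firstHit = firstTag(in: bytes, patterns: tags, from: 0)
let nextHit = firstTag(in: bytes, patterns: tags, from: 6)
let noHit = firstTag(in: bytes, patterns: tags, from: 18)
```

Whether this actually helps for a 5000-character text would need to be measured with the real data. Handling the nested tags would still require a stack or a depth counter on top of this search.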

Resources