NSRegularExpression returns only a single match - ios

I wrote a regex to match spreadsheet-style cell addresses and tried to match input strings with NSRegularExpression's matches(in:options:range:) method.
let input = "A:IV"
let regexStr = "(\\$?(([a-h]?[a-z])|i[a-v])([ \\t]*)(:([ \\t]*)\\$?(([a-h]?[a-z])|i[a-v])([ \\t]*))*(:([ \\t]*)\\$?(([a-h]?[a-z])|i[a-v])))"
let colRegex = try! NSRegularExpression(pattern: regexStr, options: .caseInsensitive)
let matches = colRegex.matches(in: input, options: [], range: NSRange(location: 0, length: input.utf16.count))
for match in matches {
    print("Match >>>>", match)
}
This code prints only a single match A:I and does not match A:IV.
When I change the regex like this:
(\\$?(i[a-v]|([a-h]?[a-z]))([ \\t]*)(:([ \\t]*)\\$?(i[a-v]|([a-h]?[a-z]))([ \\t]*))*(:([ \\t]*)\\$?(i[a-v]|([a-h]?[a-z]))))
With this pattern the code prints a single match, A:IV, and not A:I. Matching the entire input string is my goal, but I do not understand why there is never more than one match for a given input string. There seems to be no option that changes this behaviour either.
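A minimal sketch of what is probably going on, offered as an illustration rather than a definitive answer: matches(in:options:range:) returns non-overlapping matches, and alternatives inside a group are tried from left to right, so the first pattern stops at A:I and the leftover V cannot start a new match. Wrapping the pattern in ^(?: ... )$ forces the engine to backtrack until the whole input is consumed. The helper name matchedStrings is hypothetical and not part of the question.

```swift
import Foundation

// The first pattern from the question, unchanged.
let regexStr = "(\\$?(([a-h]?[a-z])|i[a-v])([ \\t]*)(:([ \\t]*)\\$?(([a-h]?[a-z])|i[a-v])([ \\t]*))*(:([ \\t]*)\\$?(([a-h]?[a-z])|i[a-v])))"

// Hypothetical helper: returns every non-overlapping match as a String.
func matchedStrings(pattern: String, in input: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return []
    }
    let whole = NSRange(input.startIndex..., in: input)
    return regex.matches(in: input, options: [], range: whole).compactMap { match in
        Range(match.range, in: input).map { String(input[$0]) }
    }
}

// Unanchored: the first alternative that succeeds wins, so the match ends after "I".
let unanchored = matchedStrings(pattern: regexStr, in: "A:IV")
// Anchored: the match must span the whole input, so the engine backtracks to "IV".
let anchored = matchedStrings(pattern: "^(?:\(regexStr))$", in: "A:IV")

print(unanchored) // ["A:I"]
print(anchored)   // ["A:IV"]
```

If overlapping candidates such as both A:I and A:IV are really needed, they would have to be collected by running several searches, since a single call never reports overlapping results.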

Related

iOS Swift: looking for ranges of matching word in a string

I need to make a function that returns me ranges of matching words in a given string, for example, given the sentence below:
Hey, bro! Your brother is also her brother.
I want to find an array of Range in the sentence that matches the word "bro", it should match the exact word (case insensitive), so "bro" should only match "bro" but not "brother".
I thought about:
split the sentence, e.g. "hey", "bro", "your", "brother", "is", "also", "her", "brother"
map each word to a word with range, e.g. "hey" would become ["hey", 0...2]
filter and map the word and range array, matching "bro"
Step 2 needs some treatment to make sure the range for each word (in the sentence) can be mapped to the right word, e.g. the first "brother" and second "brother" should have different ranges depending on where they are located.
Is there any smarter way of doing this?
Edit:
Sorry, I forgot to mention, the reason for not using Regex was that sometimes the word has a dot in it, for example:
there is orange in the basket.
from the above sentence, finding the string "or.ge" using regex would match "orange" as well.
I tested this in a Playground. You can use this extension to get the ranges that match a regex:
extension String {
    func ranges(of substring: String, options: CompareOptions = [], locale: Locale? = nil) -> [Range<Index>] {
        var ranges: [Range<Index>] = []
        while ranges.last.map({ $0.upperBound < self.endIndex }) ?? true,
              let range = self.range(of: substring, options: options, range: (ranges.last?.upperBound ?? self.startIndex)..<self.endIndex, locale: locale)
        {
            ranges.append(range)
        }
        return ranges
    }
}
let searchString = "bro"
var str = "Hey, bro! Your brother is also her brother."
var reg = str.ranges(of: "(?<![\\p{L}\\d])\(searchString)(?![\\p{L}\\d])", options: [.regularExpression, .caseInsensitive])
str.removeSubrange(reg.first!)
print(str)
Credit to: iOS - regex to match word boundary, including underscore
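The question's edit worries that a search word containing a dot, such as "or.ge", would be treated as a regex and match "orange". One way around that, sketched here under the assumption that the search word should always be taken literally, is to escape it with NSRegularExpression.escapedPattern(for:) before building the pattern. The helper name wordRanges is hypothetical.

```swift
import Foundation

// Hypothetical helper: ranges of `word` as a whole word in `text`, ignoring case.
func wordRanges(of word: String, in text: String) -> [Range<String.Index>] {
    // escapedPattern(for:) backslash-escapes regex metacharacters such as ".".
    let escaped = NSRegularExpression.escapedPattern(for: word)
    // Same lookarounds as in the answer above: no letter or digit on either side.
    let pattern = "(?<![\\p{L}\\d])\(escaped)(?![\\p{L}\\d])"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return []
    }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: whole).compactMap { match in
        Range(match.range, in: text)
    }
}

let sentence = "Hey, bro! Your brother is also her brother."
let hits = wordRanges(of: "bro", in: sentence).map { String(sentence[$0]) }
print(hits) // ["bro"]

// The dot is now literal, so "orange" is not reported as a match for "or.ge".
let dotted = wordRanges(of: "or.ge", in: "there is orange in the basket.")
print(dotted.count) // 0
```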
One simple solution is to use regular expressions with \b to match “word boundaries”, e.g.
let searchString = "bro"
let sentence = "Hey, Bro! Your brother is also her brother."
let regex = try! NSRegularExpression(pattern: #"\b\#(searchString)\b"#, options: .caseInsensitive)
regex.enumerateMatches(in: sentence, range: NSRange(sentence.startIndex..., in: sentence)) { match, _, _ in
    guard let match = match else { return }
    print(match.range)
    // or, if you want a String.Range
    if let range = Range(match.range, in: sentence) {
        print(sentence[range])
    }
}
There are other, richer APIs (e.g. the Natural Language framework) which, while not perfect, parse natural language text in more depth. For example, the code below differentiates between the verb “saw” and the noun “saw”:
import NaturalLanguage
let text = "I saw the hammer. I did not see a saw."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitWhitespace, .joinContractions]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, range in
    guard let tag = tag else { return true }
    print(tag, String(text[range]))
    return true
}
Producing:
NLTag(_rawValue: Pronoun) I
NLTag(_rawValue: Verb) saw
NLTag(_rawValue: Determiner) the
NLTag(_rawValue: Noun) hammer
NLTag(_rawValue: SentenceTerminator) .
NLTag(_rawValue: Pronoun) I
NLTag(_rawValue: Verb) did
NLTag(_rawValue: Adverb) not
NLTag(_rawValue: Verb) see
NLTag(_rawValue: Determiner) a
NLTag(_rawValue: Noun) saw
NLTag(_rawValue: SentenceTerminator) .

Convert placeholders such as %1$s to {x} in Swift

I'm parsing an XML doc (using XMLParser) and some of the values have php-like placeholders, e.g. %1$s, and I would like to convert those to {x-1}.
Examples:
%1$s ---> {0}
%2$s ---> {1}
I'm doing this in a seemingly hacky way, using regex:
But there must be a better implementation of this regex.
Consider a string:
let str = "lala fawesfgeksgjesk 3rf3f %1$s rk32mrk3mfa %2$s fafafczcxz %3$s czcz $#$##%## %4$s qqq %5$s"
Now we're going to extract the integer strings between strings % and $s:
let regex = try! NSRegularExpression(pattern: "(?<=%)[^$s]+")
let range = NSRange(location: 0, length: str.utf16.count)
let matches = regex.matches(in: str, options: [], range: range)
matches.forEach {
    print(String(str[Range($0.range, in: str)!]))
}
This mostly works. The problem is that the "4" value is polluted by the random characters before %4$s: the match starts right after the earlier stray % and runs up to the next $.
Prints:
1
2
3
## %4
5
Is there any better way to do this?
This might not be a very efficient (or Swifty :)) way, but it gets the job done. It searches for the regex, extracts the numeric value from the matched substring, decrements it, and replaces the substring with a newly constructed placeholder. This runs in a loop until no more matches are found.
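// Note: str must be declared with var, because it is modified in place.
// \d+ (rather than \d*) requires at least one digit, so "%$s" cannot match and stall the loop.
let pattern = #"%(\d+)\$s"#
while let range = str.range(of: pattern, options: .regularExpression) {
    let placeholder = str[range]
    let number = placeholder.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789").inverted)
    if let value = Int(number) {
        str = str.replacingOccurrences(of: placeholder, with: "{\(value - 1)}")
    } else {
        break // the number does not fit in an Int; stop instead of looping forever
    }
}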
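As an alternative sketch, not the answerer's method, the same conversion can be done in a single pass: collect all matches once with NSRegularExpression, read the number from capture group 1, and replace from the last match backwards so that the earlier ranges stay valid. The function name convertPlaceholders is hypothetical, and the pattern assumes every placeholder has the form %N$s with at least one digit.

```swift
import Foundation

// Hypothetical helper: rewrites "%N$s" placeholders as "{N-1}".
func convertPlaceholders(in input: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: #"%(\d+)\$s"#) else {
        return input
    }
    var result = input
    let whole = NSRange(input.startIndex..., in: input)
    let matches = regex.matches(in: input, options: [], range: whole)
    // Walk backwards: text before the current match is still untouched,
    // so the ranges computed on `input` remain valid for `result`.
    for match in matches.reversed() {
        guard let placeholder = Range(match.range, in: result),
              let digits = Range(match.range(at: 1), in: result),
              let value = Int(result[digits]) else { continue }
        result.replaceSubrange(placeholder, with: "{\(value - 1)}")
    }
    return result
}

let converted = convertPlaceholders(in: "a %1$s b %2$s $#$##%## %4$s")
print(converted) // a {0} b {1} $#$##%## {3}
```

Because the capture group isolates the digits, the stray % and # characters around %4$s do not affect the result.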

Swift Regex to allow only uppercase letters and numbers mixed

I need to validate the input of a UITextField with a regex. The text field should accept only values that mix uppercase letters and digits.
For Example:
AI1234
ER3456
I used the one below, but it does not work:
^[A-Z0-9]{3}?$
This regex matches the pattern above: two uppercase letters followed by four digits. The trailing $ keeps longer inputs such as AI12345 from being accepted.
^[A-Z]{2}\\d{4}$
You can test it on https://regexr.com/
Edit:
let str = """
AI1234
ER3456
"""
let pattern = try? NSRegularExpression(pattern: "[A-Z]{2}\\d{4}", options: [])
let range = NSRange(location: 0, length: str.utf16.count)
let matches = pattern?.matches(in: str, options: [], range: range)
print(matches)
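For validating a single text field value, a short sketch is shown below. It assumes, as the answer does, that the required format is exactly two uppercase letters followed by four digits, which is only inferred from the two examples. The function name isValidCode is hypothetical; in an app it would be called with the text field's current text.

```swift
import Foundation

// Hypothetical helper: true when `text` is two uppercase letters followed by four digits.
func isValidCode(_ text: String) -> Bool {
    // ^ and $ anchor the pattern, so inputs with extra characters such as "AI12345" are rejected.
    return text.range(of: #"^[A-Z]{2}[0-9]{4}$"#, options: .regularExpression) != nil
}

print(isValidCode("AI1234"))  // true
print(isValidCode("ai1234"))  // false
print(isValidCode("AI12345")) // false
```

The character class [0-9] is used instead of \d here so that only ASCII digits are accepted.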

Use regex to match emojis as well as text in string

I am trying to find the range of specific substrings of a string. Each substring begins with a hashtag and can have any character it likes within it (including emojis). Duplicate hashtags should be detected at distinct ranges. A kind user from here suggested this code:
var str = "The range of #hashtag should be different to this #hashtag"
let regex = try NSRegularExpression(pattern: "(#[A-Za-z0-9]*)", options: [])
let matches = regex.matches(in: str, options: [], range: NSRange(str.startIndex..., in: str))
for match in matches {
    print("match = \(match.range)")
}
However, this code does not work for emojis. What would be the regex expression to include emojis? Is there a way to detect a #, followed by any character up until a space/line break?
Similarly as in Swift extract regex matches,
you have to pass an NSRange to the match functions, and the
returned ranges are NSRanges as well. This can be achieved
by converting the given text to an NSString.
The #\S+ pattern matches a # followed by one or more
non-whitespace characters.
let text = "The 😀range of #hashtag🐶 should 👺 be 🇩🇪 different to this #hashtag🐮"
let nsText = text as NSString
let regex = try NSRegularExpression(pattern: "#\\S+", options: [])
for match in regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsText.length)) {
    print(match.range)
    print(nsText.substring(with: match.range))
}
Output:
{15, 10}
#hashtag🐶
{62, 10}
#hashtag🐮
You can also convert between NSRange and Range<String.Index>
using the methods from NSRange to Range<String.Index>.
Remark: As @WiktorStribiżew correctly noticed, the above pattern
will include trailing punctuation (commas, periods, etc). If
that is not desired then
let regex = try NSRegularExpression(pattern: "#[^[:punct:][:space:]]+", options: [])
would be an alternative.
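The conversion between NSRange and Range<String.Index> mentioned above can be sketched with the initializers NSRange(_:in:) and Range(_:in:) available in current Swift, without going through NSString. This is a minimal illustration with a shortened sample sentence, not the answer's original code.

```swift
import Foundation

let sample = "The 😀range of #hashtag🐶 should be different to this #hashtag🐮"
let hashtagRegex = try! NSRegularExpression(pattern: "#\\S+")

// NSRange(_:in:) builds a UTF-16 based range that covers the whole string.
let fullRange = NSRange(sample.startIndex..., in: sample)

// Range(_:in:) converts each resulting NSRange back to a Range<String.Index>.
let hashtagRanges = hashtagRegex
    .matches(in: sample, options: [], range: fullRange)
    .compactMap { Range($0.range, in: sample) }

let hashtags = hashtagRanges.map { String(sample[$0]) }
print(hashtags) // ["#hashtag🐶", "#hashtag🐮"]
```

The two hashtags have identical text but distinct ranges, which is what the question asks for.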

Return range with first and last character in string

I have a string: "Hey #username that's funny". For a given string, how can I search it and return all ranges that start with the character # and end with a space, in order to get the username?
I suppose I can get all indexes of # and for each, get the substringToIndex of the next space character, but wondering if there's an easier way.
If your username can contain only letters and numbers, you can use a regular expression for that:
let s = "Hey #username123 that's funny"
if let r = s.range(of: "#\\w+", options: .regularExpression) {
    let name = String(s[r]) // "#username123"
}
@Vladimir's answer is correct, but if you're trying to find multiple occurrences of "username", this should also work:
let s = "Hey #username123 that's funny"
let ranges: [NSRange]
do {
    // Create the regular expression.
    let regex = try NSRegularExpression(pattern: "#\\w+", options: [])
    // Use the regular expression to get an array of NSTextCheckingResult.
    // Use map to extract the range from each result.
    ranges = regex.matches(in: s, options: [], range: NSRange(s.startIndex..., in: s)).map { $0.range }
} catch {
    // There was a problem creating the regular expression.
    ranges = []
}
for range in ranges {
    print((s as NSString).substring(with: range))
}
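If only the name itself is wanted, without the leading #, a capture group can isolate it. The sketch below assumes, like the answers above, that usernames consist of word characters only; the function name usernames(in:) is hypothetical.

```swift
import Foundation

// Hypothetical helper: names that follow a "#" prefix, without the prefix itself.
func usernames(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: "#(\\w+)") else {
        return []
    }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: whole).compactMap { match in
        // Group 1 holds only the characters after "#".
        Range(match.range(at: 1), in: text).map { String(text[$0]) }
    }
}

let names = usernames(in: "Hey #username123 and #bob, that's funny")
print(names) // ["username123", "bob"]
```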
