Swift regular expression for Arabic decimals in Arabic text - iOS

I have some Arabic text that also contains decimal digits, for example:
"بِسۡمِ اللّٰہِ الرَّحۡمٰنِ الرَّحِیۡمِ ﴿۱﴾"
"وَاِذَا قِیۡلَ لَہُمۡ اٰمِنُوۡا کَمَاۤ اٰمَنَ النَّاسُ قَالُوۡۤا اَنُؤۡمِنُ کَمَاۤ اٰمَنَ السُّفَہَآءُ ؕ اَلَاۤ اِنَّہُمۡ ہُمُ السُّفَہَآءُ وَلٰکِنۡ لَّا یَعۡلَمُوۡنَ ﴿۱۴﴾"
Each verse ends with its verse number written in Arabic digits.
I want to find all the matches for the verse numbers in these verses.
In Swift I am trying to use a regular expression, but I can't come up with the correct regex.
Here is my code:
func getRegex() {
    // unicode for the arabic digits
    let regexStr = "[\u{0660}-\u{0669}]+"
    //let regexStr = "[\\p{N}]+"
    //let regexStr = "[۹۸۷۶۵۴۳۲۱۰]+"
    do {
        let regex = try NSRegularExpression(pattern: regexStr, options: .caseInsensitive)
        let matches = regex.matches(in: self.arabicText, options: .anchored, range: NSRange(location: 0, length: self.arabicText.count))
        print("Matches count : \(matches.count)")
    } catch {
        print(error)
    }
}
Can somebody guide me on how I can get the matches for the Arabic digits in the example Arabic text?

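The .anchored option restricts matches to the start of the search range, so the pattern can never reach the verse numbers at the end of the text. You need to remove it.
Also, NSRange is measured in UTF-16 code units, while String.count counts characters (grapheme clusters). In text with combining marks the two differ, so pass self.arabicText.utf16.count as the range length rather than self.arabicText.count.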
So, you can use
let regexStr = "[۹۸۷۶۵۴۳۲۱۰]+"
and then
let matches = regex.matches(in: self.arabicText, options: [], range: NSRange(location: 0, length: self.arabicText.utf16.count))
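The digits listed in the pattern above appear to be Extended Arabic-Indic digits (U+06F0 to U+06F9), which the question's U+0660 to U+0669 range would not cover. A minimal sketch that accepts both blocks is shown here. The helper name arabicDigitRuns is hypothetical, and treating both ranges as valid verse-number digits is an assumption.

```swift
import Foundation

// Hypothetical helper (not from the question): returns every run of Arabic-script digits.
func arabicDigitRuns(in text: String) -> [String] {
    // Assumption: accept both Arabic-Indic (U+0660-U+0669) and
    // Extended Arabic-Indic (U+06F0-U+06F9) digits.
    let pattern = "[\u{0660}-\u{0669}\u{06F0}-\u{06F9}]+"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    // NSRange counts UTF-16 code units, so derive it from the String itself.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    // No .anchored option: matches may start anywhere in the range.
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}
```

Calling arabicDigitRuns(in: self.arabicText) would then return the verse numbers as strings, and the array's count gives the number of matches.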

Related

Swift: Getting range of text that includes emojis [duplicate]

I'm trying to parse out "#mentions" from a user provided string. The regular expression itself seems to find them, but the range it provides is incorrect when emoji are present.
let text = "😂😘🙂 #joe "
let tagExpr = try? NSRegularExpression(pattern: "#\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.characters.count)) { tag, flags, pointer in
    guard let tag = tag?.range else { return }
    if let newRange = Range(tag, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}
When running this
tag = (location: 7, length: 2)
And prints out
😂😘🙂 [email]oe
The expected result is
😂😘🙂 [email]
NSRegularExpression (and anything involving NSRange) operates on UTF16 counts / indexes. For that matter, NSString.length is the UTF16 count as well.
But in your code, you're telling NSRegularExpression to use a length of text.characters.count. This is the number of composed characters, not the UTF16 count. Your string "😂😘🙂 #joe " has 9 composed characters, but 12 UTF16 code units. So you're actually telling NSRegularExpression to only look at the first 9 UTF16 code units, which means it's ignoring the trailing "oe ".
The fix is to pass length: text.utf16.count.
let text = "😂😘🙂 #joe "
let tagExpr = try? NSRegularExpression(pattern: "#\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.utf16.count)) { tag, flags, pointer in
    guard let tag = tag?.range else { return }
    if let newRange = Range(tag, in: text) {
        let replaced = text.replacingCharacters(in: newRange, with: "[email]")
        print(replaced)
    }
}
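As a sketch of the same idea, the full-string NSRange can also be derived from the string's own indices with NSRange(_:in:), which avoids choosing a count by hand. The helper name replacingMentions is hypothetical. The placeholder is escaped because replacement templates treat $ and \ specially.

```swift
import Foundation

let text = "😂😘🙂 #joe "
print(text.count)        // 9 characters
print(text.utf16.count)  // 12 UTF-16 code units

// Hypothetical helper: replaces every "#mention" with a placeholder.
func replacingMentions(in text: String, with placeholder: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: "#\\S+") else { return text }
    // Covers the whole string, measured in UTF-16 code units.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    let template = NSRegularExpression.escapedTemplate(for: placeholder)
    return regex.stringByReplacingMatches(in: text, options: [], range: fullRange, withTemplate: template)
}

print(replacingMentions(in: text, with: "[email]"))  // 😂😘🙂 [email] 
```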

Swift Regex to allow only uppercase letters and numbers mixed

In my case, I need to implement a regex for my UITextField. The text field should allow only values that mix uppercase letters and numbers.
For Example:
AI1234
ER3456
I used the one below, but it is not working:
^[A-Z0-9]{3}?$
This regex matches the pattern above, 2 uppercase letters followed by 4 digits:
^[A-Z]{2}\\d{4}
You can test it on https://regexr.com/
Edit:
import Foundation

let str = """
AI1234
ER3456
"""
let pattern = try? NSRegularExpression(pattern: "[A-Z]{2}\\d{4}", options: [])
let range = NSRange(location: 0, length: str.utf16.count)
// Fall back to an empty array if the pattern failed to compile.
let matches = pattern?.matches(in: str, options: [], range: range) ?? []
print(matches.count)
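For validating a single text field value, the pattern usually needs to cover the whole input. Without a trailing $ a longer string such as AI12345 still contains a match. The sketch below assumes the intended format is exactly two uppercase ASCII letters followed by four ASCII digits, as the examples suggest. The function name isValidCode is hypothetical. [0-9] is used instead of \d because \d in NSRegularExpression matches any Unicode decimal digit.

```swift
import Foundation

// Hypothetical validator: two uppercase ASCII letters followed by four ASCII digits.
func isValidCode(_ input: String) -> Bool {
    // ^ and $ anchor the pattern so the whole input must fit the format.
    let pattern = "^[A-Z]{2}[0-9]{4}$"
    return input.range(of: pattern, options: .regularExpression) != nil
}

print(isValidCode("AI1234"))   // true
print(isValidCode("AI12345"))  // false
```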

Use regex to match emojis as well as text in string

I am trying to find the range of specific substrings of a string. Each substring begins with a hashtag and can have any character it likes within it (including emojis). Duplicate hashtags should be detected at distinct ranges. A kind user from here suggested this code:
var str = "The range of #hashtag should be different to this #hashtag"
let regex = try NSRegularExpression(pattern: "(#[A-Za-z0-9]*)", options: [])
let matches = regex.matchesInString(str, options:[], range:NSMakeRange(0, str.characters.count))
for match in matches {
    print("match = \(match.range)")
}
However, this code does not work for emojis. What would be the regex expression to include emojis? Is there a way to detect a #, followed by any character up until a space/line break?
As in "Swift extract regex matches", you have to pass an NSRange to the match functions, and the returned ranges are NSRanges as well. One way to do this is to convert the given text to an NSString.
The #\S+ pattern matches a # followed by one or more non-whitespace characters.
let text = "The 😀range of #hashtag🐶 should 👺 be 🇩🇪 different to this #hashtag🐮"
let nsText = text as NSString
let regex = try NSRegularExpression(pattern: "#\\S+", options: [])
for match in regex.matchesInString(text, options: [], range: NSRange(location: 0, length: nsText.length)) {
    print(match.range)
    print(nsText.substringWithRange(match.range))
}
Output:
(15,10)
#hashtag🐶
(62,10)
#hashtag🐮
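The code above uses the Swift 2 method names (matchesInString, substringWithRange). A sketch of the same approach with the current names follows. The helper name hashtagRanges is hypothetical, and the range length is taken from utf16.count instead of bridging to NSString.

```swift
import Foundation

// Hypothetical helper: UTF-16 ranges of every "#..." run in the text.
func hashtagRanges(in text: String) -> [NSRange] {
    guard let regex = try? NSRegularExpression(pattern: "#\\S+") else { return [] }
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    return regex.matches(in: text, options: [], range: fullRange).map { $0.range }
}

let text = "The 😀range of #hashtag🐶 should 👺 be 🇩🇪 different to this #hashtag🐮"
for range in hashtagRanges(in: text) {
    // Convert the NSRange back to a Range<String.Index> to slice the String.
    if let swiftRange = Range(range, in: text) {
        print(range.location, range.length, text[swiftRange])
    }
}
// prints:
// 15 10 #hashtag🐶
// 62 10 #hashtag🐮
```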
You can also convert between NSRange and Range<String.Index> using the methods from "NSRange to Range<String.Index>".
Remark: As @WiktorStribiżew correctly noticed, the above pattern will include trailing punctuation (commas, periods, etc). If that is not desired, then
let regex = try NSRegularExpression(pattern: "#[^[:punct:][:space:]]+", options: [])
would be an alternative.
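A short sketch of the punctuation-excluding pattern in current Swift, to show the difference on input with trailing commas and periods. The helper name hashtags is hypothetical.

```swift
import Foundation

// Hypothetical helper: hashtags without trailing punctuation.
func hashtags(in text: String) -> [String] {
    // After "#", take characters that are neither punctuation nor whitespace.
    let pattern = "#[^[:punct:][:space:]]+"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

print(hashtags(in: "See #swift, and #regex🐶."))
```

Emoji are symbols rather than punctuation, so they stay part of the match while the comma and the period are dropped.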

Replace regex match with attributed string and text

Our app's API returns a field with a custom format for user mentions, like:
"this is a text with mention for #(steve|user_id)".
So before displaying it in a UITextView, I need to process the text, find the pattern and replace it with something more user friendly.
The final result would be "this is a text with mention for #steve", where #steve should have a link attribute with user_id. Basically the same functionality as Facebook.
First I've created a UITextView extension, with a match function for the regex pattern.
extension UITextView {
    func processText(pattern: String) {
        let inString = self.text
        let regex = try? NSRegularExpression(pattern: pattern, options: [])
        let range = NSMakeRange(0, inString.characters.count)
        let matches = (regex?.matchesInString(inString, options: [], range: range))! as [NSTextCheckingResult]
        let attrString = NSMutableAttributedString(string: inString, attributes:attrs)
        //Iterate over regex matches
        for match in matches {
            //Properly print match range
            print(match.range)
            //A basic idea to add a link attribute on regex match range
            attrString.addAttribute(NSLinkAttributeName, value: "\(schemeMap["#"]):\(must_be_user_id)", range: match.range)
            //Still text it's in format #(steve|user_id) how could replace it by #steve keeping the link attribute ?
        }
    }
}
//To use it
let regex = "\\#\\(([\\w\\s?]*)\\|([a-zA-Z0-9]{24})\\)"
myTextView.processText(regex)
This is what I have right now, but I'm stuck trying to get the final result.
Thanks a lot!
I changed your regex a bit and got a pretty good result. I also modified the code a little, so you can test it directly in a playground.
func processText() -> NSAttributedString {
    // Group 1 is the whole token, group 2 the user name, group 3 the user id.
    // The "|" is matched literally so it does not end up inside the user id.
    let pattern = "(#\\(([^|]*)\\|([^)]*)\\))"
    let inString = "this is a text with mention for #(steve|user_id1) and #(alan|user_id2)."
    let regex = try? NSRegularExpression(pattern: pattern, options: [])
    // NSRange lengths are UTF-16 counts
    let range = NSMakeRange(0, inString.utf16.count)
    let matches = (regex?.matchesInString(inString, options: [], range: range))!
    let attrString = NSMutableAttributedString(string: inString, attributes:nil)
    print(matches.count)
    // Iterate in reverse so that replacing one match does not shift the ranges of the matches still to be processed
    for match in matches.reverse() {
        // Properly print match range
        print(match.range)
        // Get username and userid
        let userName = attrString.attributedSubstringFromRange(match.rangeAtIndex(2)).string
        let userId = attrString.attributedSubstringFromRange(match.rangeAtIndex(3)).string
        // Add the link attribute over the whole token
        attrString.addAttribute(NSLinkAttributeName, value: "\(userId)", range: match.rangeAtIndex(1))
        // Replace #(name|id) with #name; the new text inherits the attributes of the first replaced character, so the link is kept
        attrString.replaceCharactersInRange(match.rangeAtIndex(1), withString: "#\(userName)")
    }
    return attrString
}
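The parsing step can also be sketched in current Swift without UIKit. The version below is an illustration under assumptions, not the answer's method: the Mention type and parseMentions function are hypothetical, and it returns the display text together with the UTF-16 range and user id of each mention instead of building the attributed string directly.

```swift
import Foundation

// Hypothetical type: one parsed mention and where it sits in the display text.
struct Mention {
    let userId: String
    let range: NSRange  // UTF-16 range of "#name" in the display text
}

// Hypothetical helper: turns "#(name|id)" tokens into "#name" and records each id.
func parseMentions(in input: String) -> (text: String, mentions: [Mention]) {
    // Group 1 = name, group 2 = user id, separated by a literal "|".
    let pattern = "#\\(([^|)]*)\\|([^)]*)\\)"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return (input, []) }
    let source = NSString(string: input)
    var text = ""
    var mentions: [Mention] = []
    var cursor = 0  // UTF-16 offset in the source up to which text has been copied
    let fullRange = NSRange(location: 0, length: source.length)
    for match in regex.matches(in: input, options: [], range: fullRange) {
        // Copy the untouched text that precedes this token.
        text += source.substring(with: NSRange(location: cursor, length: match.range.location - cursor))
        let name = "#" + source.substring(with: match.range(at: 1))
        let userId = source.substring(with: match.range(at: 2))
        // The mention starts where the display text currently ends.
        mentions.append(Mention(userId: userId, range: NSRange(location: text.utf16.count, length: name.utf16.count)))
        text += name
        cursor = match.range.location + match.range.length
    }
    // Copy whatever follows the last token.
    text += source.substring(from: cursor)
    return (text, mentions)
}
```

In an app, each returned range could then be passed to NSMutableAttributedString's addAttribute(.link, value:range:) on an attributed string created from the returned text.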
