Swift: Getting range of text that includes emojis [duplicate]

This question already has an answer here: Swift Regex doesn't work (1 answer). Closed 5 years ago.
I'm trying to parse out "#mentions" from a user provided string. The regular expression itself seems to find them, but the range it provides is incorrect when emoji are present.
let text = "😂😘🙂 #joe "
let tagExpr = try? NSRegularExpression(pattern: "#\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.characters.count)) { tag, flags, pointer in
guard let tag = tag?.range else { return }
if let newRange = Range(tag, in: text) {
let replaced = text.replacingCharacters(in: newRange, with: "[email]")
print(replaced)
}
}
When running this
tag = (location: 7, length: 2)
And prints out
😂😘🙂 [email]oe
The expected result is
😂😘🙂 [email]

NSRegularExpression (and anything involving NSRange) operates on UTF-16 counts and indexes. For that matter, NSString.length is the UTF-16 count as well.
But in your code, you're telling NSRegularExpression to use a length of text.characters.count. This is the number of composed characters, not the UTF-16 count. Your string "😂😘🙂 #joe " has 9 composed characters, but 12 UTF-16 code units. So you're actually telling NSRegularExpression to only look at the first 9 UTF-16 code units, which means it's ignoring the trailing "oe ".
The fix is to pass length: text.utf16.count.
let text = "😂😘🙂 #joe "
let tagExpr = try? NSRegularExpression(pattern: "#\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.utf16.count)) { tag, flags, pointer in
guard let tag = tag?.range else { return }
if let newRange = Range(tag, in: text) {
let replaced = text.replacingCharacters(in: newRange, with: "[email]")
print(replaced)
}
}
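As a minimal sketch of the same idea (not part of the original answer), the search range can also be built from the string's own indices with NSRange(_:in:), so that no length has to be computed by hand. The names fullRange and replaced are only illustrative.

```swift
import Foundation

let text = "😂😘🙂 #joe "

// Composed characters (what `count` reports) versus UTF-16 code units
// (what NSRange and NSRegularExpression work with).
print(text.count)        // 9
print(text.utf16.count)  // 12

// Build the NSRange from String indices instead of computing a length manually.
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

let tagExpr = try! NSRegularExpression(pattern: "#\\S+")
var replaced = text
if let match = tagExpr.firstMatch(in: text, range: fullRange),
   let range = Range(match.range, in: text) {
    // The match now covers all of "#joe", not just "#j".
    replaced = text.replacingCharacters(in: range, with: "[email]")
}
print(replaced)
```

Because the range is derived from the string itself, it stays correct whatever mix of emoji and ASCII the input contains.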

Related

Swift regular expression for arabic decimals in arabic text

I have some Arabic text which has some decimals as well.
for example this text
"بِسۡمِ اللّٰہِ الرَّحۡمٰنِ الرَّحِیۡمِ ﴿۱﴾"
"وَاِذَا قِیۡلَ لَہُمۡ اٰمِنُوۡا کَمَاۤ اٰمَنَ النَّاسُ قَالُوۡۤا اَنُؤۡمِنُ کَمَاۤ اٰمَنَ السُّفَہَآءُ ؕ اَلَاۤ اِنَّہُمۡ ہُمُ السُّفَہَآءُ وَلٰکِنۡ لَّا یَعۡلَمُوۡنَ ﴿۱۴﴾"
This text has verse numbers as Arabic digits in the end.
I wanted to find out all the matches for the verse numbers in these verses.
In Swift I am trying to use a regular expression, but somehow I am not coming up with the correct regex.
Here is my code:
func getRegex() {
// unicode for the arabic digits
let regexStr = "[\u{0660}-\u{0669}]+"
//let regexStr = "[\\p{N}]+"
//let regexStr = "[۹۸۷۶۵۴۳۲۱۰]+"
do {
let regex = try NSRegularExpression(pattern: regexStr, options: .caseInsensitive)
let matches = regex.matches(in: self.arabicText, options: .anchored, range: NSRange(location: 0, length: self.arabicText.count))
print("Matches count : \(matches.count)")
} catch {
print(error)
}
}
Can somebody guide me on how I can get the matches for the Arabic digits in the example Arabic text?
The .anchored option makes the pattern match only at the start of the search range, so you need to remove it.
Also, NSRange lengths are measured in UTF-16 code units, so you need to pass self.arabicText.utf16.count rather than self.arabicText.count, which counts characters. The two can differ for non-ASCII text such as this string, which contains combining marks.
So, you can use
let regexStr = "[۹۸۷۶۵۴۳۲۱۰]+"
and then
let matches = regex.matches(in: self.arabicText, options: [], range: NSRange(location: 0, length: self.arabicText.utf16.count))
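The following sketch puts both fixes together. It rests on some assumptions: the sample string is built from explicit code points rather than the verses above, and the pattern uses \p{Nd} (any Unicode decimal digit) instead of a literal character class. That choice is deliberate, since the range U+0660 to U+0669 covers only the Arabic-Indic digits, while Persian and Urdu text often uses the Extended Arabic-Indic digits U+06F0 to U+06F9. The names sample and numbers are illustrative.

```swift
import Foundation

// Hypothetical sample: ornate parentheses around digits from two different blocks.
// U+06F1 and U+06F4 are Extended Arabic-Indic digits; U+0661 is an Arabic-Indic digit.
let sample = "\u{FD3F}\u{06F1}\u{06F4}\u{FD3E} \u{FD3F}\u{0661}\u{FD3E}"

// \p{Nd} matches any Unicode decimal digit, so both blocks are covered.
let regex = try! NSRegularExpression(pattern: "\\p{Nd}+")

// No .anchored option, and a range measured in UTF-16 code units.
let range = NSRange(sample.startIndex..<sample.endIndex, in: sample)
let numbers = regex.matches(in: sample, options: [], range: range).compactMap { match in
    Range(match.range, in: sample).map { String(sample[$0]) }
}
print("Matches count : \(numbers.count)")  // Matches count : 2
```

If only one digit block should be accepted, an explicit class such as "[\u{06F0}-\u{06F9}]+" would be the narrower alternative.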

Convert placeholders such as %1$s to {x} in Swift

I'm parsing an XML doc (using XMLParser) and some of the values have php-like placeholders, e.g. %1$s, and I would like to convert those to {x-1}.
Examples:
%1$s ---> {0}
%2$s ---> {1}
I'm doing this in a seemingly hacky way, using regex:
But there must be a better implementation of this regex.
Consider a string:
let str = "lala fawesfgeksgjesk 3rf3f %1$s rk32mrk3mfa %2$s fafafczcxz %3$s czcz $#$##%## %4$s qqq %5$s"
Now we're going to extract the integer strings between strings % and $s:
let regex = try! NSRegularExpression(pattern: "(?<=%)[^$s]+")
let range = NSRange(location: 0, length: str.utf16.count)
let matches = regex.matches(in: str, options: [], range: range)
matches.forEach {
    print(String(str[Range($0.range, in: str)!]))
}
Works quite fine. The issue is that the "4" value got mixed up because of the preceding random strings before the %4$s.
Prints:
1
2
3
## %4
5
Is there any better way to do this?
This might not be the most efficient (or Swifty) way, but it gets the job done. It searches for the regular expression, extracts the number from the matched substring, decrements it, and then replaces the matched substring with a newly constructed placeholder. This runs in a loop until no more matches are found.
import Foundation

// `str` must be declared with `var` instead of `let`, because it is modified in place.
// \d+ (not \d*) so that every match is guaranteed to contain a number.
let pattern = #"%(\d+)\$s"#
while let range = str.range(of: pattern, options: .regularExpression) {
    let placeholder = str[range]
    // Trim the leading "%" and the trailing "$s" so that only the digits remain.
    let number = placeholder.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789").inverted)
    // Stop instead of looping forever if the digits cannot be parsed as an Int.
    guard let value = Int(number) else { break }
    str = str.replacingOccurrences(of: placeholder, with: "{\(value - 1)}")
}
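A single-pass alternative is sketched below, under the assumption that every placeholder has the form %N$s with N an integer. It collects all matches with NSRegularExpression first and then rewrites them from back to front, so the UTF-16 offsets of earlier matches remain valid after each replacement. The function name convertPlaceholders is hypothetical.

```swift
import Foundation

// Hypothetical helper: rewrites %N$s placeholders as {N-1} in one pass.
func convertPlaceholders(in input: String) -> String {
    let regex = try! NSRegularExpression(pattern: #"%(\d+)\$s"#)
    let fullRange = NSRange(input.startIndex..<input.endIndex, in: input)
    var result = input
    // Walk the matches back to front so that earlier ranges are not shifted by edits.
    for match in regex.matches(in: input, options: [], range: fullRange).reversed() {
        guard let whole = Range(match.range, in: result),
              let digits = Range(match.range(at: 1), in: result),
              let value = Int(result[digits]) else { continue }
        result.replaceSubrange(whole, with: "{\(value - 1)}")
    }
    return result
}

print(convertPlaceholders(in: "a %1$s b %2$s"))  // a {0} b {1}
```

Capturing the digits in group 1 avoids the separate trimming step, and stray % or $ characters that do not form a full placeholder are left untouched.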

Swift - Whitespace count in a string [duplicate]

This question already has answers here: Find number of spaces in a string in Swift (3 answers). Closed 5 years ago.
How do you get the count of the empty space within text?
It would be more helpful to me if explained with an example.
You can use either components(separatedBy:) or filter, like
let array = string.components(separatedBy:" ")
let spaceCount = array.count - 1
or
let spaceCount = string.filter{$0 == " "}.count
If you want to consider other whitespace characters (not only space) use regular expression:
let string = "How to get count of the empty space in text,Like how we get character count like wise i need empty space count in a text, It would be more helpful if explained with an example."
let regex = try! NSRegularExpression(pattern: "\\s")
let numberOfWhitespaceCharacters = regex.numberOfMatches(in: string, range: NSRange(location: 0, length: string.utf16.count))
The regular expression \\s (that is, \s written inside a Swift string literal) matches tab, CR, LF and space, among other whitespace characters.
The easiest way is to do something like this:
let emptySpacesCount = yourString.filter { $0 == " " }.count
This takes the characters of your string, filters out everything that is not a space, and then counts the remaining elements. In Swift 3 and earlier, String is not a collection, so you have to go through yourString.characters instead.
You can try this example:
let string = "Whitespace count in a string swift"
let spaceCount = string.filter { $0 == " " }.count
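For comparison, here is a small sketch that counts plain spaces and all whitespace without a regular expression. It assumes Swift 5 or later, where Character has an isWhitespace property. The sample string is made up for illustration.

```swift
// Hypothetical sample containing one space, one tab and one line feed.
let sample = "one two\tthree\nfour"

// Counts only the space character U+0020.
let spaceCount = sample.filter { $0 == " " }.count

// Counts every character that Unicode classifies as whitespace.
let whitespaceCount = sample.filter { $0.isWhitespace }.count

print(spaceCount)       // 1
print(whitespaceCount)  // 3
```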

Swift advancedBy can't handle newline character "\r\n" [duplicate]

This question already has answers here: NSRange to Range<String.Index> (16 answers). Closed 7 years ago.
I ran into a very strange problem today with Swift 2.
I have this simple method to extract a substring based on NSRange:
func substringWithRange(string: String, range: NSRange) -> String {
let startIndex = string.startIndex.advancedBy(range.location)
let endIndex = startIndex.advancedBy(range.length)
let substringRange = Range<String.Index>(start: startIndex, end: endIndex)
return string.substringWithRange(substringRange)
}
With ordinary strings or strings containing unicode characters everything works fine. But one string contains the newline characters "\r\n" and suddenly
let startIndex = string.startIndex.advancedBy(range.location)
is always 1 greater than it should be.
let string = "<html>\r\n var info={};</html>"
let range = NSMakeRange(9, 12)
let substring = substringWithRange(string, range: range)
//Expected: var info={};
//Actual: ar info={};<
//string.startIndex = 0
//range.location = 9
//startIndex after advancedBy = 10
Does anyone know why advancedBy is acting that way and how I can solve this problem?
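The reason is that Swift treats \r\n as one character (a single grapheme cluster), whereas NSRange counts UTF-16 code units, of which \r\n has two. Advancing a String.Index by an NSRange location therefore lands one position too far for every \r\n that precedes it: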
let cr = "\r"
cr.characters.count // 1
let lf = "\n"
lf.characters.count // 1
let crlf = "\r\n"
crlf.characters.count // 1
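One way around the mismatch, sketched here for Swift 4 and later rather than the Swift 2 of the question, is to let Foundation convert the NSRange with Range(_:in:), which interprets the location and length as UTF-16 offsets. The function name substring(of:in:) is hypothetical.

```swift
import Foundation

// Hypothetical helper mirroring the question's function in current Swift.
// Returns nil if the NSRange does not correspond to valid positions in the string.
func substring(of string: String, in nsRange: NSRange) -> String? {
    guard let range = Range(nsRange, in: string) else { return nil }
    return String(string[range])
}

let html = "<html>\r\n var info={};</html>"

print("\r\n".count)        // 1 (one Character)
print("\r\n".utf16.count)  // 2 (two UTF-16 code units)

let result = substring(of: html, in: NSRange(location: 9, length: 12))
print(result ?? "nil")     // var info={};
```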
