I am trying to use regex in Swift to replace parts of an HTML string. Basically, whenever there is a set of numbers such as "1, 2 and 3" preceded by the word "Appendices", or a single number such as 1 preceded by the word "Appendix", I would like to create hyperlink tags for it.
For example I have a string:
See Appendices 1 , 9 and 27. You should also see the Appendices 28, 45 and 37. Also see Appendix 19. See also chapter 19 and Verses 38 and 45
And I would like to replace it with:
See Appendices <a href="Appendix://1">1</a> , <a href="Appendix://9">9</a> and <a href="Appendix://27">27</a>. You should also see the Appendices <a href="Appendix://28">28</a>, <a href="Appendix://45">45</a> and <a href="Appendix://37">37</a>. Also see <a href="Appendix://19">Appendix 19</a>. See also chapter 19 and Verses 38 and 45
I ended up writing a method that does this:
import Foundation

func findAndReplaceAppendixDeeplinks(theText: String) -> String {
    var text = theText
    // "App\\." escapes the dot so it only matches the abbreviation "App."
    guard let regex = try? NSRegularExpression(pattern: "(Appendix|Appendices|App\\.) (\\d+)((, |and|&)?( )?(\\d+)?)+", options: .caseInsensitive),
          let regex1 = try? NSRegularExpression(pattern: "(\\d+)") else {
        return theText
    }
    // NSRange works in UTF-16 code units, so use utf16.count rather than characters.count
    let range = NSRange(location: 0, length: theText.utf16.count)
    // How far earlier replacements have shifted positions in `text` relative to `theText`
    var innerRangeIncrement = 0
    for match in regex.matches(in: theText, range: range) {
        // Each number inside the "Appendix ..." match; these ranges refer to theText
        for innerMatch in regex1.matches(in: theText, range: match.range) {
            let shifted = NSRange(location: innerMatch.range.location + innerRangeIncrement, length: innerMatch.range.length)
            guard let innerRange = Range(shifted, in: text) else { continue }
            let innerString = String(text[innerRange])
            // Link format taken from the desired output above
            let replacementString = "<a href=\"Appendix://\(innerString)\">\(innerString)</a>"
            text.replaceSubrange(innerRange, with: replacementString)
            innerRangeIncrement += replacementString.utf16.count - innerMatch.range.length
        }
    }
    return text
}
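A sketch of an alternative that avoids the running offset (innerRangeIncrement): collect the ranges of all the numbers first, then replace them from the end of the string toward the start, so earlier ranges are never shifted. The function name appendixDeeplinks(in:) is hypothetical, the link format comes from the desired output above, and the pattern is a simplified assumption: it only recognizes "Appendix"/"Appendices" followed by numbers separated by ",", "and" or "&". It wraps only the number, so "Appendix 19" becomes Appendix <a href="Appendix://19">19</a>.

```swift
import Foundation

// Hypothetical helper: wraps each number after "Appendix"/"Appendices" in a link.
func appendixDeeplinks(in input: String) -> String {
    // Assumed pattern: the keyword, then numbers separated by ",", "and" or "&".
    let outer = try! NSRegularExpression(
        pattern: "\\b(?:Appendix|Appendices)\\s+\\d+(?:\\s*(?:,|and|&)\\s*\\d+)*",
        options: .caseInsensitive)
    let digits = try! NSRegularExpression(pattern: "\\d+")
    let fullRange = NSRange(input.startIndex..., in: input)

    // Collect the UTF-16 ranges of every number inside an "Appendix ..." match.
    var numberRanges: [NSRange] = []
    for match in outer.matches(in: input, range: fullRange) {
        numberRanges += digits.matches(in: input, range: match.range).map { $0.range }
    }

    // Replace back to front: the text before each range is still untouched,
    // so its NSRange remains valid in the partially edited string.
    var result = input
    for nsRange in numberRanges.reversed() {
        guard let range = Range(nsRange, in: result) else { continue }
        let number = String(result[range])
        result.replaceSubrange(range, with: "<a href=\"Appendix://\(number)\">\(number)</a>")
    }
    return result
}

print(appendixDeeplinks(in: "See Appendices 1 , 9 and 27. Also see Appendix 19. See also chapter 19"))
```

Replacing back to front trades the offset bookkeeping for one extra array of ranges, which is usually negligible for text of this size.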
I'm trying to parse out "#mentions" from a user provided string. The regular expression itself seems to find them, but the range it provides is incorrect when emoji are present.
let text = "đđđ #joe "
let tagExpr = try? NSRegularExpression(pattern: "#\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.characters.count)) { tag, flags, pointer in
guard let tag = tag?.range else { return }
if let newRange = Range(tag, in: text) {
let replaced = text.replacingCharacters(in: newRange, with: "[email]")
print(replaced)
}
}
When running this
tag = (location: 7, length: 2)
And prints out
đđđ [email]oe
The expected result is
đđđ [email]
NSRegularExpression (and anything involving NSRange) operates on UTF-16 counts and indexes. For that matter, NSString.length is the UTF-16 count as well.
But in your code, you're telling NSRegularExpression to use a length of text.characters.count. This is the number of composed characters, not the UTF16 count. Your string "đđđ #joe " has 9 composed characters, but 12 UTF16 code units. So you're actually telling NSRegularExpression to only look at the first 9 UTF16 code units, which means it's ignoring the trailing "oe ".
The fix is to pass length: text.utf16.count.
let text = "đđđ #joe "
let tagExpr = try? NSRegularExpression(pattern: "#\\S+")
tagExpr?.enumerateMatches(in: text, range: NSRange(location: 0, length: text.utf16.count)) { tag, flags, pointer in
guard let tag = tag?.range else { return }
if let newRange = Range(tag, in: text) {
let replaced = text.replacingCharacters(in: newRange, with: "[email]")
print(replaced)
}
}
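A small sketch that makes the mismatch visible. "\u{1F600}" is an assumed stand-in for the emoji in the question: it is one Unicode scalar outside the Basic Multilingual Plane, so it takes two UTF-16 code units. The sketch also uses NSRange(_:in:), which builds the range from the string's own indices, so there is no count to get wrong.

```swift
import Foundation

// Assumed stand-in for the question's emoji: one scalar, two UTF-16 code units.
let text = "\u{1F600}\u{1F600}\u{1F600} #joe "

let characterCount = text.count     // composed characters (what characters.count returned)
let utf16Count = text.utf16.count   // what NSRange expects

// Building the NSRange from the String's indices always covers the whole string.
let fullRange = NSRange(text.startIndex..., in: text)

let tagExpr = try! NSRegularExpression(pattern: "#\\S+")
var replaced = text
if let match = tagExpr.firstMatch(in: text, range: fullRange),
   let swiftRange = Range(match.range, in: text) {
    replaced = text.replacingCharacters(in: swiftRange, with: "[email]")
}
print(characterCount, utf16Count, replaced)
```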
Let's say I have a string:
var a = "#bb #cccc #ddddd\u{ef}"
and I am setting it on a text view like this:
let text = a.trimmingCharacters(in: .whitespacesAndNewlines)
let textRemoved = text.replacingOccurrences(of: "\u{ef}", with: "", options: NSString.CompareOptions.literal, range: nil)
textView.text = textRemoved
I am trying to remove the \u{ef} character here, but it is not removed in textRemoved. How can I do it?
I am using Xcode 10. It looks like Xcode versions below 10 work fine. Is this a bug in Xcode 10?
This is a late answer, but I also struggled to replace "\u{ef}" in a string. While debugging, hovering over the string showed \u{ef}, but printing its description showed only a space.
let str = "\u{fffc} Some Title" // the debugger displays this as "\u{ef} Some Title"
print(str) //" Some Title"
I tried replacingOccurrences(of: "\u{ef}", with: "", options: NSString.CompareOptions.literal, range: nil).trimmingCharacters(in: .whitespaces), but it failed as well.
So I used the snippet below, and it worked like a charm:
let modifiedStr = str.replacingOccurrences(of: "\u{fffc}", with: "", options: NSString.CompareOptions.literal, range: nil).trimmingCharacters(in: .whitespaces)
print(modifiedStr) //"Some Title"
Hope this helps someone!!
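A short sketch of why the debugger shows \u{ef}: U+FFFC OBJECT REPLACEMENT CHARACTER is encoded in UTF-8 as EF BF BC, and the debugger appears to show only the first byte. Writing "\u{ef}" in source code creates a different character (U+00EF, ï), which is why replacing it has no effect. Reading the debugger output as a leading byte is an assumption based on this answer and the \u{e2} answer below.

```swift
import Foundation

let str = "\u{FFFC} Some Title"

// UTF-8 bytes of U+FFFC: the first one is 0xEF, which is what the debugger shows.
let bytes = Array("\u{FFFC}".utf8)

// "\u{ef}" is U+00EF (ï), a different character, so this replacement is a no-op.
let unchanged = str.replacingOccurrences(of: "\u{ef}", with: "")

// Removing the real character works.
let cleaned = str.replacingOccurrences(of: "\u{FFFC}", with: "")
    .trimmingCharacters(in: .whitespaces)

print(bytes.map { String($0, radix: 16) }, cleaned)
```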
I also faced the same issue with "\u{e2}". I searched a lot but couldn't find any answer, so I tried the code below, which works for me:
var newString = ""
for char in strMainString.unicodeScalars{
if char.isASCII{
newString += String(char)
}
}
Hope that works for you too.
In an Xcode 10 playground, replacing \u{00EF} works:
var a = "#bb #cccc #ddddd\u{ef}"
a = a.replacingOccurrences(of: "\u{00EF}", with: "")
I hope that will work for you.
I tried the following and it worked like a charm:
replacingOccurrences(of: "ïżœ", with: " ", options: NSString.CompareOptions.literal, range: nil)
e.g. 1
let text = "\u{ef}\u{ef}\u{ef}\u{ef}đćŠćŠćŠ"
let text1 = text.replacingOccurrences(of: "\u{fffc}", with: "", options: String.CompareOptions.literal, range: nil)
let text2 = text.replacingOccurrences(of: "\u{ef}", with: "", options: String.CompareOptions.literal, range: nil).trimmingCharacters(in: .whitespaces)
e.g. 2
// documentContextBeforeInput/AfterInput are String?, so unwrap before concatenating
let strBefore = textDocumentProxy.documentContextBeforeInput ?? ""
let strAfter = textDocumentProxy.documentContextAfterInput ?? ""
let textInput = strBefore + strAfter
let textInput2 = textInput.replacingOccurrences(of: "\u{ef}", with: "", options: String.CompareOptions.literal, range: nil)
let textInput1 = textInput.replacingOccurrences(of: "\u{fffc}", with: "", options: String.CompareOptions.literal, range: nil).trimmingCharacters(in: .whitespaces)
This is similar to the question, but with the \u{e2} symbol (the fix is the same):
\u{e2} is not a character by itself; it is the first byte (0xE2) of a three-byte UTF-8 sequence.
So look here: sequences starting with E2 encode the code points U+2000 to U+2FFF, which include the General Punctuation block.
Many symbols start with the byte E2, and the full character is, for example, the bytes e2 80 a8 (U+2028 LINE SEPARATOR).
That explains why the \u{e2} shown in Xcode can't be removed with the replacingOccurrences... function. To filter out the correct symbol, you have to know exactly which symbol it is, for example by using the snippet below:
"\u{2028}&\u{1F632}".forEach { (char) in
    print(Data(char.utf8).map { String(format: "%02x", $0) }.joined(separator: " "))
}
it prints to console:
e2 80 a8
26
f0 9f 98 b2
which are byte representation for each symbol.
The next step is to filter your string: go here, search the third column for your bytes, take the Unicode code point value from the first column (that is what you need), and write it in Swift as "\u{2028}\u{206A}..." (depending on which symbols you need to remove).
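Instead of looking the bytes up in a table, you can also print the code point of each Unicode scalar directly. A minimal sketch; the helper name codePointLabels is hypothetical:

```swift
// Hypothetical helper: formats each Unicode scalar as "U+XXXX" (at least 4 hex digits).
func codePointLabels(_ string: String) -> [String] {
    return string.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        let padding = String(repeating: "0", count: max(0, 4 - hex.count))
        return "U+" + padding + hex
    }
}

print(codePointLabels("\u{2028}&\u{1F632}"))
```

Each label maps directly onto a Swift escape: U+2028 is written "\u{2028}".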
The final function may look like:
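extension String {
    func removingE2Symbols() -> String {
        // U+202A (e2 80 aa) and U+202C (e2 80 ac): invisible bidi formatting characters
        let specialChars = "\u{202A}\u{202C}"
        return filter { !specialChars.contains($0) }
    }
}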
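If the characters you need to remove are invisible formatting characters (such as U+202A and U+202C), a sketch of a more general variant filters by Unicode general category instead of listing code points. This swaps in a different technique than the explicit list; the method name is hypothetical, and it relies on Swift 5's Unicode.Scalar.Properties. Be aware that it also removes U+200D ZERO WIDTH JOINER, which breaks combined emoji such as family sequences, and that it keeps U+2028, which is a line separator rather than a format character.

```swift
extension String {
    // Hypothetical helper: drops every scalar in the "Cf" (format) general category,
    // e.g. U+202A LEFT-TO-RIGHT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING.
    func removingFormatCharacters() -> String {
        let kept: [Unicode.Scalar] = unicodeScalars.filter {
            $0.properties.generalCategory != .format
        }
        return String(String.UnicodeScalarView(kept))
    }
}

print("a\u{202A}b\u{202C}c".removingFormatCharacters())
```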
Try this
extension String {
var asciiString: String {
return String(self.unicodeScalars.filter{ $0.isASCII })
}
}
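Note that asciiString, like the isASCII loop in an earlier answer, keeps only ASCII scalars, so it also strips legitimate non-ASCII text such as accented letters, currency symbols, and emoji. A quick check, restating the extension so the snippet runs on its own:

```swift
extension String {
    var asciiString: String {
        return String(self.unicodeScalars.filter { $0.isASCII })
    }
}

// "café £5 😲 #tag" followed by U+FFFC, written with escapes
let sample = "caf\u{E9} \u{A3}5 \u{1F632} #tag\u{FFFC}"
print(sample.asciiString)
```

The unwanted U+FFFC is gone, but so are the é, the £ and the emoji; if those matter, removing the specific code point is the safer choice.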
It's working. Please check again:
let a = "#bb #cccc #ddddd\u{ef}"
let text = a.trimmingCharacters(in: .whitespacesAndNewlines)
let textRemoved = text.replacingOccurrences(of: "\u{ef}", with: "", options: NSString.CompareOptions.literal, range:nil)
print(textRemoved)
I am using TesseractOCR to read a receipt, and I have managed to extract the text from the receipt line by line, e.g.:
2 melon £3.00
1 lime £1.50
5 chicken wings £10.00
But now, for each line, I would like to extract the item name (melon, lime, chicken wings), the integer, and the float, all separately. I have googled a lot and have written this in Ruby using regex, but I can't figure out how to do it in Swift. I have figured out the float and integer parts, just not the words-only part.
A link to an existing answer, or an answer, would be great. Thanks in advance for any help.
If you have solved this using regex in Ruby, the solution in Swift is similar. First let's define some helper functions since NSRegularExpression still deals in NSRange units:
extension String {
    // NSRange counts UTF-16 code units, so use utf16.count, not characters.count
    var fullRange: NSRange {
        return NSRange(location: 0, length: self.utf16.count)
    }
    // Convert the NSRange back to a String range instead of offsetting by Characters
    subscript(range: NSRange) -> String {
        guard let r = Range(range, in: self) else { return "" }
        return String(self[r])
    }
}
And the code:
let text =
    "2 melon £3.00\n" +
    "1 lime £1.50\n" +
    "5 chicken wings £10.00"
let regex = try! NSRegularExpression(pattern: "(\\d+)\\s+(.+?)\\s+£([\\d\\.]+)$", options: [.anchorsMatchLines])
regex.enumerateMatches(in: text, options: [], range: text.fullRange) { result, flag, stop in
    if let result = result {
        let r1 = result.range(at: 1)
        let r2 = result.range(at: 2)
        let r3 = result.range(at: 3)
        print("quantity = \(text[r1]), item = \(text[r2]), price = \(text[r3])")
    }
}
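Since the goal is to get the quantity as an integer and the price as a number, here is a sketch that converts the captured groups with Int(_:) and Double(_:). ReceiptLine and parseReceipt are hypothetical names, and the pattern assumes each price appears as £ followed by digits at the end of its line. For money, Decimal is usually a better fit than Double; Double only keeps the sketch short.

```swift
import Foundation

// Hypothetical type holding one parsed receipt line.
struct ReceiptLine {
    let quantity: Int
    let item: String
    let price: Double
}

func parseReceipt(_ text: String) -> [ReceiptLine] {
    // Assumed line format: "<quantity> <item name> £<price>"
    let regex = try! NSRegularExpression(
        pattern: "^(\\d+)\\s+(.+?)\\s+£(\\d+(?:\\.\\d+)?)$",
        options: [.anchorsMatchLines])
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match -> ReceiptLine? in
        // Convert each capture group's NSRange back to a String range.
        guard let q = Range(match.range(at: 1), in: text),
              let n = Range(match.range(at: 2), in: text),
              let p = Range(match.range(at: 3), in: text),
              let quantity = Int(text[q]),
              let price = Double(text[p]) else { return nil }
        return ReceiptLine(quantity: quantity, item: String(text[n]), price: price)
    }
}

let lines = parseReceipt("2 melon £3.00\n1 lime £1.50\n5 chicken wings £10.00")
for line in lines {
    print(line.quantity, line.item, line.price)
}
```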
Use componentsSeparatedByString:
let a = "5 Chicken Wing"
let b = a.componentsSeparatedByString(" ") //meaning space
let b0 = b[0] //5
let b1 = b[1] //Chicken
let b2 = b[2] //Wing
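Splitting on spaces breaks multi-word items like "chicken wings" into separate pieces. A sketch that keeps them together, assuming the first token is always the quantity and the last is always the price; splitReceiptLine is a hypothetical name:

```swift
// Hypothetical helper: "5 chicken wings £10.00" -> (5, "chicken wings", 10.0)
func splitReceiptLine(_ line: String) -> (quantity: Int, item: String, price: Double)? {
    let parts = line.split(separator: " ")
    guard parts.count >= 3, let quantity = Int(parts[0]) else { return nil }
    // Everything between the first and last token is the item name.
    let item = parts[1..<(parts.count - 1)].joined(separator: " ")
    // Drop the currency symbol (any leading non-digit characters) from the price.
    let priceText = parts[parts.count - 1].drop(while: { !$0.isNumber })
    guard let price = Double(priceText) else { return nil }
    return (quantity, item, price)
}

if let parsed = splitReceiptLine("5 chicken wings £10.00") {
    print(parsed.quantity, parsed.item, parsed.price)
}
```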
I'm trying to format a phone number to the format (###) ###-####.
I'm using the following regex with replacement template
let regex = try NSRegularExpression(pattern: "([0-9]{3})([0-9]{1,3})([0-9]{0,4})")
regex.stringByReplacingMatches(
in: rawNumber,
options: .reportCompletion,
range: NSRange(location: 0, length: rawNumber.characters.count),
withTemplate: "($1) $2-$3"
)
The problem is that my template string includes the hardcoded - which should not appear if the third capture group $3 isn't found.
For example:
rawNumber = "5125"
would be replaced as (512) 5- when I actually want it in the format (512) 5, because I don't want the - to be shown unless the third capture group was found.
For example I was hoping there might be a way to make a template as something like:
"($1) $2if$3{-}$3"
Without getting fancy, I used the approach of alternate templates in a ternary conditional operator:
let rawNumber = "512512345"
let regex = try NSRegularExpression(pattern: "([0-9]{3})([0-9]{1,3})([0-9]{0,4})")
regex.stringByReplacingMatches(
in: rawNumber,
options: .reportCompletion,
range: NSRange(location: 0, length: rawNumber.characters.count),
withTemplate: "\(rawNumber.count > 6 ? "($1) $2-$3" : "($1) $2")"
)
Result for rawNumber = "5125" is "(512) 5" while rawNumber = "512512345" is "(512) 512-345".
EDIT: Swift 4.2 (and placing result into a constant)
let rawNumber = "512512345"
if let formattedString =
try? NSRegularExpression(pattern: "([0-9]{3})([0-9]{1,3})([0-9]{0,4})", options: []).stringByReplacingMatches(in: rawNumber, options: .reportCompletion, range: NSRange(location: 0, length: rawNumber.count), withTemplate: "\(rawNumber.count > 6 ? "($1) $2-$3" : "($1) $2")") {
print(formattedString)
}
You can write your own subclass of NSRegularExpression for conditional replacement:
class PhoneNumberConverter: NSRegularExpression {
override func replacementString(for result: NSTextCheckingResult, in string: String, offset: Int, template templ: String) -> String {
//Assuming `pattern` has always 3 captures
if result.rangeAt(3).length == 0 {
//$3 isn't found
return super.replacementString(for: result, in: string, offset: offset, template: "($1) $2")
} else {
return super.replacementString(for: result, in: string, offset: offset, template: "($1) $2-$3")
}
}
}
func convertPhoneNumber(rawNumber: String) -> String {
let regex = try! PhoneNumberConverter(pattern: "([0-9]{3})([0-9]{1,3})([0-9]{0,4})")
return regex.stringByReplacingMatches(
in: rawNumber,
options: .reportCompletion,
range: NSRange(location: 0, length: rawNumber.characters.count),
withTemplate: "($1) $2-$3"
)
}
print(convertPhoneNumber(rawNumber: "5125")) //->(512) 5
print(convertPhoneNumber(rawNumber: "512512345")) //->(512) 512-345
Instead of stringByReplacingMatches, use matchesInString. This will give you the list of matches (there should be only one), which itself contains the list of the ranges for each capturing group.
You can then check which capturing group did actually match, and from there, use one template or the other.
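A sketch of that approach, assuming a single match at the start of the string: it checks whether capture group 3 captured anything and passes the chosen template to NSRegularExpression's replacementString(for:in:offset:template:). The function name formatPhoneNumber is hypothetical.

```swift
import Foundation

// Hypothetical helper: picks the template based on which groups actually captured text.
func formatPhoneNumber(_ rawNumber: String) -> String {
    let regex = try! NSRegularExpression(pattern: "([0-9]{3})([0-9]{1,3})([0-9]{0,4})")
    let fullRange = NSRange(rawNumber.startIndex..., in: rawNumber)
    guard let match = regex.firstMatch(in: rawNumber, range: fullRange),
          let matchRange = Range(match.range, in: rawNumber) else {
        return rawNumber  // fewer than 4 digits: leave unchanged
    }
    // Group 3 is {0,4}, so it can match an empty string; check its length.
    let template = match.range(at: 3).length > 0 ? "($1) $2-$3" : "($1) $2"
    let replacement = regex.replacementString(for: match, in: rawNumber, offset: 0, template: template)
    return rawNumber.replacingCharacters(in: matchRange, with: replacement)
}

print(formatPhoneNumber("5125"))       // (512) 5
print(formatPhoneNumber("512512345"))  // (512) 512-345
```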
It looks like there is no way to do this in the template itself, because everything in the template string is interpreted as literal characters except for capture-group references such as $1 (and backslash escapes).
https://developer.apple.com/reference/foundation/nsregularexpression:
I'm trying to remove white spaces and some characters from a string. Please check my code below:
// giving phoneString = +39 333 3333333
var phoneString = ABMultiValueCopyValueAtIndex(phone, indexPhone).takeRetainedValue() as! String
// Remove spaces from string
phoneString = phoneString.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
// Remove +39 if exist
if phoneString.rangeOfString("+39") != nil{
phoneString = phoneString.stringByReplacingOccurrencesOfString("\0", withString: "+39", options: NSStringCompareOptions.LiteralSearch, range: nil)
}
print(phoneString) // output +39 333 3333333
It seems like none of these changes have any effect on my string. Why does this happen?
EDIT #V S
EDIT 2:
I tried converting my string to UTF-8; here is the result:
43 51 57 194 160 51 51 51 194 160 51 51 51 51 51 51 51
where:
43 = +
51 = 3
57 = 9
160 = space
194 = what is this?!
What you are trying to do is:
// your input string
let str = "+39 333 3333333"
let arr = str.characters.split(" ").map(String.init) // ["+39", "333", "3333333"]
// remove country code and reconstruct the rest as one string without whitespaces
let str2 = arr.dropFirst().joinWithSeparator("") // "3333333333"
To filter out the country code only if it exists (as Eendje asks):
let str = "+39 123 456789"
let arr = str.characters.split(" ").map(String.init)
let str3 = arr.filter { !$0.hasPrefix("+") }.joinWithSeparator("") // "123456789"
UPDATE, based on your update.
160 represents the no-break space: the bytes 194 160 (0xC2 0xA0) are the UTF-8 encoding of U+00A0, which is 160 in decimal. Just modify the next line in my code:
let arr = str.characters.split{" \u{00A0}".characters.contains($0)}.map(String.init)
In the expression " \u{00A0}".characters.contains($0), you can extend the string with as many whitespace characters as you need. 160 is \u{00A0}; see details here.
Update for Swift 4
String.characters is deprecated. So the correct answer would now be
// your input string
let str = "+39 333 3333333"
let arr = str.components(separatedBy: .whitespaces) // ["+39", "333", "3333333"]
// remove country code and reconstruct the rest as one string without whitespaces
let str2 = arr.dropFirst().joined() // "3333333333"
Firstly, stringByTrimmingCharactersInSet only trims the string - i.e. removes leading & trailing spaces - you need to use stringByReplacingOccurrencesOfString replacing " " with "".
Secondly, your parameters on stringByReplacingOccurrencesOfString for the country code are the wrong way round.
Thirdly, "\0" is not what you want; that's the ASCII NUL character, not the digit zero.
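Putting those three points together with the no-break space from EDIT 2, here is a minimal Swift 5 sketch (the function name normalizedPhoneNumber is hypothetical) that removes every whitespace character, including U+00A0, and then strips "+39" only when it is a prefix:

```swift
import Foundation

// Hypothetical helper for the "+39 333 3333333" case from the question.
func normalizedPhoneNumber(_ raw: String) -> String {
    // CharacterSet.whitespacesAndNewlines includes U+00A0 NO-BREAK SPACE (category Zs).
    let kept: [Unicode.Scalar] = raw.unicodeScalars.filter {
        !CharacterSet.whitespacesAndNewlines.contains($0)
    }
    var number = String(String.UnicodeScalarView(kept))
    // Remove the country code only at the start, not anywhere in the string.
    if number.hasPrefix("+39") {
        number.removeFirst(3)
    }
    return number
}

print(normalizedPhoneNumber("+39\u{00A0}333\u{00A0}3333333"))
```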
Swift 3 / Swift 4
let withoutSpaces = phoneNumber.replacingOccurrences(of: "\\s", with: "", options: .regularExpression)
Swift 5
//MARK:- 3 ways to resolve it
var tempphone = "0345 55500 93"
//MARK:- No 1
tempphone = tempphone.replacingOccurrences(of: " ", with: "")
//MARK:- No 2
tempphone = tempphone.replacingOccurrences(of: "\\s", with: "", options: .regularExpression)
//MARK:- No 3 (only trims leading/trailing whitespace; the inner spaces remain)
tempphone = tempphone.trimmingCharacters(in: .whitespaces)
phoneString = phoneString.stringByReplacingOccurrencesOfString("+39", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
// Trimming only removes leading/trailing spaces, so split on whitespace and rejoin instead
phoneString = phoneString.componentsSeparatedByCharactersInSet(NSCharacterSet.whitespaceCharacterSet()).joinWithSeparator("")
Try this. This has worked for me:
var freshString = phoneString
if phoneString.rangeOfString("+39") != nil {
    freshString = phoneString.stringByReplacingOccurrencesOfString("+39", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
}
let strings = freshString.componentsSeparatedByString(" ") as NSArray
let finalString = strings.componentsJoinedByString("")
//outputs 3333333333
You can use this to replace the no-break space (U+00A0):
phoneNumber.replacingOccurrences(of: "\u{00A0}", with: "")
let trimmedPhoneString = String(phoneString).stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
To remove +39 if it exists, you can use stringByReplacingOccurrencesOfString instead:
var phoneString = "+39 333 3333333"
phoneString = phoneString.stringByReplacingOccurrencesOfString(" ", withString:"")
if phoneString.rangeOfString("+39") != nil
{
phoneString = phoneString.stringByReplacingOccurrencesOfString("+39", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
}
print(phoneString) // output 3333333333