Swift regex to match Unicode characters - iOS

I am new to Swift and want to match all the Unicode characters in a string using a regex.
For example:
var s="😀 emoji 😀"
When I decoded the above string, the result was:
"\ud83d\ude00 emoji \ud83d\ude00"
I want to replace each emoji with, say, *.
In Java I used the regex
"[\uD800-\uDBFF\uDC00-\uDFFF]" and it worked.
In Swift I am using the same regex, but it replaces every character with *.
I want the result to be * emoji *
Any help is highly appreciated.

The Unicode code point of the emoji you have shown is U+1F600.
(Unicode 9.0 Character Code Charts - Emoticons)
Your regex pattern [\uD800-\uDBFF\uDC00-\uDFFF] (which may work on a UTF-16 representation) matches all non-BMP characters, U+10000...U+10FFFF. That range contains most emoji, but it also contains a huge number of non-emoji characters.
So, since you say "[\uD800-\uDBFF\uDC00-\uDFFF]" was working, the equivalent pattern for NSRegularExpression is "[\\U00010000-\\U0010FFFF]".
var s="😀 emoji 😀"
let regex = try! NSRegularExpression(pattern: "[\\U00010000-\\U0010FFFF]", options: [])
let replaced = regex.stringByReplacingMatchesInString(s, options: [], range: NSRange(0..<s.utf16.count), withTemplate: "*") //->"* emoji *"
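With the renamed Foundation APIs in Swift 3 and later, the same replacement might look like the sketch below. The helper name replaceNonBMP is made up for illustration. The pattern is the one given above, so it still matches every non-BMP character, not only emoji.

```swift
import Foundation

// Hypothetical helper: replaces every non-BMP character (U+10000...U+10FFFF)
// with the given template, using the pattern from the answer above.
func replaceNonBMP(in s: String, with template: String = "*") -> String {
    let regex = try! NSRegularExpression(pattern: "[\\U00010000-\\U0010FFFF]", options: [])
    // NSRange is measured in UTF-16 code units; this initializer covers the whole string.
    let range = NSRange(s.startIndex..., in: s)
    return regex.stringByReplacingMatches(in: s, options: [], range: range, withTemplate: template)
}

print(replaceNonBMP(in: "😀 emoji 😀")) // * emoji *
```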
(Addition)
To see Unicode code points in your string literal:
s.unicodeScalars.forEach {
print(String(format: "U+%04X ", Int($0.value)))
}
For your example string, I get:
U+1F600
U+0020
U+0065
U+006D
U+006F
U+006A
U+0069
U+0020
U+1F600
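Since the scalar values are directly available, a regex-free variant is also possible. The following is only a sketch under the same assumption as above (every scalar beyond U+FFFF is treated as an emoji), and starNonBMP is a hypothetical name.

```swift
// Hypothetical helper: rebuilds the string scalar by scalar,
// substituting "*" for every scalar outside the BMP.
func starNonBMP(_ s: String) -> String {
    var out = String.UnicodeScalarView()
    for scalar in s.unicodeScalars {
        out.append(scalar.value > 0xFFFF ? "*" : scalar)
    }
    return String(out)
}

print(starNonBMP("😀 emoji 😀")) // * emoji *
```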

Related

Combine multiple replacingOccurrences() with Swift

I have a String and would like to add a backslash before specific characters, because I use Markdown and don't want unintended styling to be applied.
I wrote a function and it works, but I guess it's not efficient:
func escapeMarkdownCharacters(){
let myString = "This is #an exemple #of _my_ * function"
var modString = myString.replacingOccurrences(of: "#", with: "\\#")
modString = modString.replacingOccurrences(of: "*", with: "\\*")
modString = modString.replacingOccurrences(of: "_", with: "\\_")
print(modString) // Displayed: This is \#an exemple \#of \_my\_ \* function
}
I would like to have only one call to "replacingOccurrences" that works for multiple characters. I think I could do that with a regex, but I haven't figured out how. If you have an idea, please share it with me.
You may use
var modString = myString.replacingOccurrences(of: "[#*_]", with: "\\\\$0", options: [.regularExpression])
With a raw string literal:
var modString = myString.replacingOccurrences(of: "[#*_]", with: #"\\$0"#, options: [.regularExpression])
Result: This is \#an exemple \#of \_my\_ \* function
The options: [.regularExpression] argument enables the regex search mode.
The [#*_] pattern matches #, * or _ and then each match is replaced with a backslash (\\\\) and the match value ($0). Note that the backslash must be doubled in the replacement string because a backslash has a special meaning inside a replacement pattern (it may be used to make $0 a literal string when $ is preceded with a backslash).
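Wrapped into a function, the single-call version could look like this sketch. The name escapeMarkdown is hypothetical, and the character class covers only the three characters from the question.

```swift
import Foundation

// Hypothetical helper: prefixes #, * and _ with a backslash in one regex pass.
func escapeMarkdown(_ text: String) -> String {
    // #"\\$0"# is a raw string holding \\$0: an escaped backslash followed by the match.
    return text.replacingOccurrences(of: "[#*_]", with: #"\\$0"#, options: [.regularExpression])
}

print(escapeMarkdown("This is #an exemple #of _my_ * function"))
// This is \#an exemple \#of \_my\_ \* function
```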

How to add Unicode escape sequence in Localizable.strings?

How can a Unicode escape sequence be added to a string in a Localizable.strings file if the string is cast to NSString?
Here is one (ugly) example:
// Localized string: "\u{200F}Number %@" = "\u{200E}Number %@";
let string = NSMutableAttributedString(string: NSString(format: NSLocalizedString("Number %@", comment: "") as NSString, aNumber as NSNumber)) as String
From this question I understand that the problem is the incompatible escape sequences of Localizable.strings and NSString.
Adding the Unicode characters directly in the Localizable.strings file is not an option, because I need to insert bidirectional semantics markers that are not printable characters. They would also be lost in most translation programs.
How can I work around that?
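One possible workaround, offered only as a sketch: keep the .strings entries free of escape sequences and add the directional mark in code after the lookup. The helper name withLeftToRightMark is hypothetical, and whether this fits depends on where the marks are needed. It does not cover marks inside the lookup key.

```swift
import Foundation

// Hypothetical helper: prepends U+200E (left-to-right mark) to an
// already localized string, so the .strings file needs no escapes.
func withLeftToRightMark(_ localized: String) -> String {
    return "\u{200E}" + localized
}

// Assumes the .strings entry is plain: "Number %@" = "Number %@";
let format = withLeftToRightMark(NSLocalizedString("Number %@", comment: ""))
print(format.unicodeScalars.first!.value == 0x200E) // true
```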

I receive an improperly formatted unicode in a String

I am working with a web API that gives me strings like the following:
"Eat pok\u00e9."
Xcode complains that
Expected Hexadecimal code in braces after unicode escape
My understanding is that it should be converted to pok\u{00e9}, but I do not know how to achieve this.
Can anybody point me in the right direction so I can develop a way of converting these, as there are many in this API?
Bonus:
I also need to remove \n from the strings.
You may want to give us more context regarding what the raw server payload looked like, and show us how you're displaying the string. Some ways of examining strings in the debugger (or if you're looking at raw JSON) will show you escape strings, but if you use the string in the app, you'll see the actual Unicode character.
I wonder if you're just looking at raw JSON.
For example, I passed the JSON, {"foo": "Eat pok\u00e9."} to the following code:
let jsonString = String(data: data, encoding: NSUTF8StringEncoding)!
print(jsonString)
let dictionary = try! NSJSONSerialization.JSONObjectWithData(data, options: []) as! [String: String]
print(dictionary["foo"]!)
And it output:
{"foo": "Eat pok\u00e9."}
Eat poké.
By the way, this standard JSON escape syntax should not be confused with Swift's string literal escape syntax, in which the hex sequence must be wrapped in braces:
print("Eat pok\u{00e9}.")
Swift uses a different escape syntax in its string literals, and it should not be confused with that employed by formats like JSON.
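In Swift 3 and later the Foundation names changed (JSONSerialization.jsonObject(with:options:) instead of NSJSONSerialization.JSONObjectWithData). A sketch of the same experiment, with the payload written as a raw string literal so that \u00e9 reaches the parser untouched:

```swift
import Foundation

// The raw string literal keeps the JSON escape \u00e9 exactly as a server would send it.
let data = #"{"foo": "Eat pok\u00e9."}"#.data(using: .utf8)!
let dictionary = try! JSONSerialization.jsonObject(with: data, options: []) as! [String: String]
print(dictionary["foo"]!) // Eat poké.
```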
@Rob has an excellent solution for the server passing invalid Swift String literals.
If you need to convert "Eat pok\u00e9.\n" to Eat poké, it can be done as follows with Swift 3 and a regex.
import Foundation
var input = "Eat pok\\u00e9.\n"
// replaces the newline with a space
input = input.replacingOccurrences(of: "\n", with: " ")
// regex helper function for sanity's sake:
// returns the first match of `pattern` in `text`, or "" if there is none
func regexGroup(for pattern: String, in text: String) -> String {
do {
let regex = try NSRegularExpression(pattern: pattern, options: [])
let nsString = NSString(string: text)
let results = regex.matches(in: text, options: [], range: NSMakeRange(0, nsString.length))
guard let first = results.first else { return "" }
return nsString.substring(with: first.range)
} catch {
print("invalid regex: \(error.localizedDescription)")
return ""
}
}
// finds the hex digits of the first escape, here "00e9"
let unicodeHexStr = regexGroup(for: "0\\w*", in: input)
let unicodeHex = Int(unicodeHexStr, radix: 16)!
let char = Character(UnicodeScalar(unicodeHex)!)
let replaced = input.replacingOccurrences(of: "\\u" + unicodeHexStr, with: String(char))
// prints "Eat poké. " (the trailing space is the replaced newline)
print(replaced)
\u{00e9} is a notation that's specific to Swift String literals. When the code is compiled, this notation is parsed and converted into the actual Unicode scalar it represents.
What you've received is a String that escapes Unicode scalars in a particular way. To transform those escaped Unicode scalars into the scalars they represent, see this answer.
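A minimal sketch of such a transformation, assuming the escapes always have the form \u followed by exactly four hex digits. The helper name unescapeUnicode is hypothetical, and surrogate pairs (two consecutive escapes forming one character) are deliberately left untouched.

```swift
import Foundation

// Hypothetical helper: replaces every \uXXXX escape with the scalar it names.
func unescapeUnicode(_ text: String) -> String {
    // Regex \\u([0-9a-fA-F]{4}): a literal backslash, "u", then four hex digits.
    let regex = try! NSRegularExpression(pattern: "\\\\u([0-9a-fA-F]{4})", options: [])
    let matches = regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
    var result = text
    // Walk backwards so that earlier match ranges stay valid while replacing.
    for match in matches.reversed() {
        guard let whole = Range(match.range, in: result),
              let hex = Range(match.range(at: 1), in: result),
              let value = UInt32(result[hex], radix: 16),
              let scalar = Unicode.Scalar(value) else { continue } // skips surrogates
        result.replaceSubrange(whole, with: String(Character(scalar)))
    }
    return result
}

print(unescapeUnicode("Eat pok\\u00e9.")) // Eat poké.
```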

swift ios alpha numeric regex that allows underscores and dashes

I am using this lib for validation and am trying to add my own regex.
What I want is a regex that allows alphanumeric characters (A-Z, 0-9) together with dashes and underscores (-_).
I have tried let regex = "[a-zA-Z0-9_-]" but I can't get it to work.
I also want the regex to allow letters from all languages, not only English ones.
The lib works, because I have made another regex that only allows the digits 0-9, and that one works:
let intRegex = "^[0-9]*$"
Your regex looks good, but it will only match a single character. Use "^[a-zA-Z0-9_-]*$" instead to match more than one character; to allow letters from all languages, replace a-zA-Z with \pL, as in the breakdown below.
Breakdown:
^ -- start of string
[\pL0-9_-] -- characters you want to allow (\pL matches a letter in any language)
* -- any number of characters (the crucial bit you were missing)
$ -- end of string
Building on @charsi's answer:
extension String {
var isAlphanumericDashUnderscore: Bool {
let regex = try! NSRegularExpression(pattern: "^[a-zA-Z0-9_-]*$", options: .caseInsensitive)
// NSRange counts UTF-16 code units, so use utf16.count rather than count
return regex.firstMatch(in: self, options: [], range: NSRange(location: 0, length: utf16.count)) != nil
}
}
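To accept letters from any language, as the question asks, the character class can use a Unicode property instead of a-zA-Z. The following is a sketch, not the validation library's API: the property name isLetterDigitDashUnderscore is made up, and \p{L} is the Unicode "letter" category.

```swift
import Foundation

extension String {
    // Hypothetical property: true when the string consists only of Unicode
    // letters (\p{L}), the digits 0-9, underscores and dashes.
    var isLetterDigitDashUnderscore: Bool {
        return range(of: "^[\\p{L}0-9_-]*$", options: .regularExpression) != nil
    }
}

print("héllo_wörld-1".isLetterDigitDashUnderscore) // true
print("hello world".isLetterDigitDashUnderscore)   // false (contains a space)
```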

Invalid escape sequence in literal: "\b"

I need to be able to create a String that is "\b". But when I try to, Xcode throws a compile-time error: Invalid escape sequence in literal. I don't understand why though, "\r" works just fine. If I put "\\b" then that's what is actually stored in the String, which is not what I need - I only need one backslash. To me, this seems to be a Swift oddity because it works just fine in Objective-C.
let str = "\b" //Invalid escape sequence in literal
NSString *str = @"\b"; //works great
I need to generate this string because "\b" is the only way to detect when the user pressed 'delete' when using UIKeyCommand:
let command = UIKeyCommand(input: "\b", modifierFlags: nil, action: "didHitDelete:")
How can I work around this issue?
EDIT: It really doesn't want to generate a String that is only "\b", this does not work - it stays the original value:
var delKey = "\rb"
delKey = delKey.stringByReplacingOccurrencesOfString("r", withString: "", options: .LiteralSearch, range: nil)
The Swift equivalent of \b is \u{8}. It maps to ASCII control code 8, just like \b in Objective-C. I've tested this and found it to work fine with UIKeyCommand, in this earlier answer of mine.
Example snippet:
func keyCommands() -> NSArray {
return [
UIKeyCommand(input: "\u{8}", modifierFlags: .allZeros, action: "backspacePressed")
]
}
I don't believe it is supported.
Based on the Swift documentation https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html:
String literals can include the following special Unicode characters:
The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quote) and \' (single quote)
An arbitrary Unicode scalar, written as \u{n}, where n is between one and eight hexadecimal digits
The ASCII code for \b is 8. If you do the following, you'll see these results:
let bs = "\u{8}"
var str = "Simple\u{8}string"
println(bs) // Prints ""
println("bs length is \(bs.lengthOfBytesUsingEncoding(NSUTF8StringEncoding))") // Prints 1
println(str) // Prints Simplestring
let space = "\u{20}"
println(space) // Prints " "
println("space length is \(space.lengthOfBytesUsingEncoding(NSUTF8StringEncoding))") // Prints 1
str = "Simple\u{20}string"
println(str) // Prints Simple string
It looks like while ASCII 8 "exists", it is "ignored".
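A quick check suggests that the character is actually stored in the string and is simply not rendered when printed. This is a small sketch that only inspects the string's Unicode scalars.

```swift
let str = "Simple\u{8}string"
// Look for a scalar with value 8 (backspace) among the string's scalars.
let containsBackspace = str.unicodeScalars.contains { $0.value == 8 }
print(str.unicodeScalars.count) // 13: six letters, the control character, six letters
print(containsBackspace)        // true
```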
