I receive an improperly formatted unicode in a String - ios

I am working with a web API that gives me strings like the following:
"Eat pok\u00e9."
Xcode complains that
Expected Hexadecimal code in braces after unicode escape
My understanding is that it should be converted to pok\u{00e9}, but I do not know how to achieve this.
Can anybody point me in the right direction to develop a way of converting these, as there are many in this API?
Bonus:
I also need to remove \n from the strings.

You may want to give us more context regarding what the raw server payload looked like, and show us how you're displaying the string. Some ways of examining strings in the debugger (or if you're looking at raw JSON) will show you escape strings, but if you use the string in the app, you'll see the actual Unicode character.
I wonder if you're just looking at raw JSON.
For example, I passed the JSON, {"foo": "Eat pok\u00e9."} to the following code:
let jsonString = String(data: data, encoding: .utf8)!
print(jsonString)
let dictionary = try! JSONSerialization.jsonObject(with: data, options: []) as! [String: String]
print(dictionary["foo"]!)
And it output:
{"foo": "Eat pok\u00e9."}
Eat poké.
By the way, this standard JSON escape syntax should not be confused with Swift's string literal escape syntax, in which the hex sequence must be wrapped in braces:
print("Eat pok\u{00e9}.")
The two are unrelated: the Swift compiler interprets \u{...} in source code, whereas a JSON parser interprets \uXXXX in the payload at run time.

@Rob has an excellent solution for the server passing invalid Swift String literals.
If you need to convert "Eat pok\u00e9.\n" to "Eat poké. ", it can be done as follows with a regex (Swift 3 or later). Note that this only handles the first escape in the string.
import Foundation

var input = "Eat pok\\u00e9.\n"
// replaces each newline with a space
input = input.replacingOccurrences(of: "\n", with: " ")
// regex helper function for sanity's sake: returns the first match, or "" if there is none
func regexGroup(for pattern: String, in text: String) -> String {
    do {
        let regex = try NSRegularExpression(pattern: pattern, options: [])
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSMakeRange(0, nsString.length))
        guard let first = results.first else { return "" }
        return nsString.substring(with: first.range)
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return ""
    }
}
// grabs the four hex digits that follow a literal \u (here "00e9")
let unicodeHexStr = regexGroup(for: "(?<=\\\\u)[0-9a-fA-F]{4}", in: input)
let unicodeHex = Int(unicodeHexStr, radix: 16)!
let char = Character(UnicodeScalar(unicodeHex)!)
let replaced = input.replacingOccurrences(of: "\\u" + unicodeHexStr, with: String(char))
// prints "Eat poké. " (the newline became a trailing space)
print(replaced)

\u{00e9} is a formatting that's specific to Swift String literals. When the code is compiled, this notation is parsed and converted into the actual Unicode Scalar it represents.
What you've received is a String that escapes Unicode scalars in a particular way. To transform those escaped Unicode scalars into the scalars they represent, see this answer.
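As a rough sketch of that transformation, assuming the escapes always have the form \uXXXX with exactly four hex digits: the helper name decodingUnicodeEscapes is made up for illustration and is not taken from the linked answer. It collects UTF-16 code units so that surrogate pairs combine into a single character.

```swift
import Foundation

// Hypothetical helper (not from the thread): replaces every \uXXXX escape
// (backslash, "u", four hex digits) with the UTF-16 code unit it names.
func decodingUnicodeEscapes(_ text: String) -> String {
    let regex = try! NSRegularExpression(pattern: "\\\\u([0-9a-fA-F]{4})", options: [])
    let source = NSString(string: text)
    var units: [UInt16] = []
    var cursor = 0
    let fullRange = NSRange(location: 0, length: source.length)
    for match in regex.matches(in: text, options: [], range: fullRange) {
        // Copy the untouched text that precedes this escape.
        let before = NSRange(location: cursor, length: match.range.location - cursor)
        units.append(contentsOf: Array(source.substring(with: before).utf16))
        // The pattern guarantees four hex digits, so the conversion cannot fail.
        units.append(UInt16(source.substring(with: match.range(at: 1)), radix: 16)!)
        cursor = match.range.location + match.range.length
    }
    // Copy whatever follows the last escape.
    units.append(contentsOf: Array(source.substring(from: cursor).utf16))
    // Decoding as UTF-16 lets pairs such as \ud83d\ude00 form one scalar.
    return String(decoding: units, as: UTF16.self)
}

print(decodingUnicodeEscapes("Eat pok\\u00e9."))
```

If the string really comes from a JSON payload, running the payload through a JSON parser is simpler and safer than a hand-written pass like this one.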

Related

Find a word after and before a string

I have a string like so...
ab-0-myCoolApp.theAppAB.in
How can I get the word myCoolApp from this string...? Also there are many strings in the same format i.e myCoolApp can be myCoolAppABX or myCoolAppABCD etc.
This is one brief solution among many, but the core concept would be much the same in every case.
Given some sample inputs:
let inputs = ["ab-0-myCoolApp.theAppAB.in", "ab-0-myCoolAppABX.theAppAB.in", "ab-0-myCoolAppABXC.theAppAB.in"]
and having a regular expression to find matches:
let regExp = try? NSRegularExpression(pattern: "-([^-]*?)\\.", options: [])
then Release the Kraken:
inputs.forEach { string in
    // NSRange counts UTF-16 code units, so build it from the string itself, not from its UTF-8 byte length
    regExp?.matches(in: string, options: [], range: NSRange(string.startIndex..., in: string)).forEach({
        // capture group 1 is the text between the "-" and the "."
        let match = (string as NSString).substring(with: $0.range(at: 1))
        debugPrint(match)
    })
}
finally it prints out the following list:
"myCoolApp"
"myCoolAppABX"
"myCoolAppABXC"
NOTE: you may need to implement further failsafes during getting the matches or you can refactor the entire idea at your convenience.
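One possible shape for such a failsafe, as a sketch only: the helper name appName(in:) is hypothetical, and it assumes the wanted word always sits between a "-" and the first "." that follows it. It returns nil instead of crashing when the input does not have that shape.

```swift
import Foundation

// Hypothetical helper: returns the text between a "-" and the next ".",
// or nil when the input does not contain that shape.
func appName(in string: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: "-([^-.]*)\\.", options: []) else {
        return nil
    }
    let fullRange = NSRange(string.startIndex..<string.endIndex, in: string)
    guard let match = regex.firstMatch(in: string, options: [], range: fullRange),
          let captured = Range(match.range(at: 1), in: string) else {
        return nil
    }
    return String(string[captured])
}

print(appName(in: "ab-0-myCoolApp.theAppAB.in") ?? "no match")
```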

Parse doubles such as 1.0 from JSON in Swift 4 without losing the decimal?

Or can I check later whether a number was decoded as a decimal number and not an integer?
if let int = any as? Int {
    print("Object is an integer")
} else if let num = any as? Double {
    print("Object is a double")
}
Here "any" is an Any value that is 1.0 (not a string) in the JSON file. "any" can be cast to both an integer and a double (so the order in which I check determines the outcome), but I would like to keep the original format from the JSON file.
Decoding is done using the following line:
let json = try JSONSerialization.jsonObject(with: data, options: [])
Edit: I've tried checking CFType, but get the same for both 1 and 1.0 (inspired by http://stackoverflow.com/a/30223989/1694526)
Any ideas?
As already mentioned by @Sulthan, this is not possible at the level you are working on: JSONSerialization will, and should, use a single class to represent a numeric value, so it cannot tell you the original type.
You could try finding some other tool to check the values, but does it really make sense?
You are trying to distinguish between Int and Double, but what about 64-bit or 32-bit? Or signed and unsigned? We usually don't write those into strings, so there is no way to distinguish between them, and no general logic for doing so.
Are you sure the returned JSON will always have ".0" appended for these values? This depends on the system, and even the smallest optimization would trim it, because the JSON standard does not specify precision for numbers. For instance, if I use JSONSerialization and print out String(data: (try! JSONSerialization.data(withJSONObject: [ "value": 1.0 ], options: .prettyPrinted)), encoding: .utf8) I receive {\n \"value\" : 1\n}, which means it trimmed ".0" anyway.
I find it hard to see how this would be structurally sound. If you need to save this data, for instance into a database, you will need to define the size and type of the primitive that holds it. If you need to do arithmetic, you again need to specify the type...
The only way would be to use it as a display string. But in that case your value should be returned as a string and not as a number.
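A minimal sketch of that idea, assuming the server can be changed to send the number as a JSON string: the Payload type and its value field are invented for illustration. The original text is kept for display and converted to a Decimal only when arithmetic is needed.

```swift
import Foundation

// Hypothetical payload type; the field name `value` is an assumption.
struct Payload: Decodable {
    let value: String
}

let payloadData = "{\"value\": \"1.0\"}".data(using: .utf8)!
let payload = try! JSONDecoder().decode(Payload.self, from: payloadData)

// The text survives exactly as the server wrote it, including the ".0".
print(payload.value)

// Convert only at the point where a numeric value is required.
let number = Decimal(string: payload.value)
```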
The solution is to parse to an NSNumber and then to a Decimal (or NSDecimalNumber). DO NOT parse via a Double.
let jsonString = "[ 4.01 ]"
let jsonData = jsonString.data(using: .utf8)!
let jsonArray = try! JSONSerialization.jsonObject(with: jsonData, options: []) as! [Any]
// This is the WRONG way to parse decimals (via a Double)
// parseAttemptA = 4.009999999999998976
let parseAttemptA: Decimal = Decimal(jsonArray[0] as! Double)
// This is the CORRECT way to parse decimals (via an NSNumber)
// parseAttemptB = 4.01
let parseAttemptB: Decimal = (jsonArray[0] as! NSNumber).decimalValue

Swift: how to suppress interpretation of special characters and provide string literal

The goal is to serialize a Swift object by converting it to a JSON object then converting the JSON object into a JSON string that can be passed over the wire and decoded on the other side.
The problem is producing a valid JSON string.
Newlines must be escaped in a JSON string, but Swift interprets special characters in the escaped string instead of treating the string as a literal.
For example:
let a = "foobar\nhello\nworld"
let escapedString = a.replacingOccurrences(of: "\n", with: "\\n")
print(escapedString)
What gets printed is foobar\nhello\nworld instead of the desired foobar\\nhello\\nworld.
How do you tell Swift to treat a string as a literal and not to interpret special characters within?
UPDATE
As OOPer points out, using debugPrint shows the \\n characters remaining intact.
However, when paired with evaluateJavaScript in WKWebView, the \\n characters are turned into \n, which is the root issue. For example:
let script = "\(callback)(\'\(escapedString)\')"
webView!.evaluateJavaScript(script) { (object: Any?, error: Error?) -> Void in
print("Done invoking \(callback)")
}
Before Swift 5 there was no unescaped string syntax like in JavaScript template literals, which is probably what you are looking for; Swift 5 later added raw string literals of the form #"..."#. Without them you have to escape each backslash, which sometimes looks very scary, as in your example.
//This is the same as `foobar\nhello\nworld` where each char is a literal
let a = "foobar\\nhello\\nworld"
let escapedString = a.replacingOccurrences(of: "\\n", with: "\\\\n")
//This outputs `foobar\\nhello\\nworld`
print(escapedString)
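For comparison, here is the same replacement written with raw string literals, assuming a Swift 5 or later toolchain. Inside #"..."# a backslash is an ordinary character, so nothing needs doubling just to satisfy the compiler. The variable names are illustrative.

```swift
import Foundation

// In a raw string literal, \n is two characters: a backslash and an "n".
let raw = #"foobar\nhello\nworld"#

// Replace each backslash-n with backslash-backslash-n.
let doubled = raw.replacingOccurrences(of: #"\n"#, with: #"\\n"#)

print(raw)      // foobar\nhello\nworld
print(doubled)  // foobar\\nhello\\nworld
```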
Maybe you are just misinterpreting the output from print.
When you get foobar\nhello\nworld from print(escapedString), escapedString contains 20 characters -- f o o b a r \ n h e l l o \ n w o r l d.
This is a valid JSON string when enclosed between "s.
If you want to check the escaped result in String-literal-like notation, you can use debugPrint:
let a = "foobar\nhello\nworld"
let escapedString = a.replacingOccurrences(of: "\n", with: "\\n")
print(escapedString) //->foobar\nhello\nworld
debugPrint(escapedString) //->"foobar\\nhello\\nworld"
For UPDATE
When using evaluateJavaScript, think about what the correct JavaScript code would be. If you want to represent a JSON-escaped string in JavaScript, you would write this in a .js file (or in <script>...</script>):
someFunc('foobar\\nhello\\nworld');
So, you may need to write something like this:
let a = "foobar\nhello\nworld"
let escapedForJSON = a.replacingOccurrences(of: "\n", with: "\\n")
//In actual code, you may need a little more...
let escapedForJavaScriptString = escapedForJSON.replacingOccurrences(of: "\\", with: "\\\\")
let script = "\(callback)(\'\(escapedForJavaScriptString)\')"
webView!.evaluateJavaScript(script) { (object: Any?, error: Error?) -> Void in
print("Done invoking \(callback)")
}
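The "little more" could be covered by letting a JSON encoder do the escaping instead of chaining replacements by hand. The sketch below is an assumption-laden illustration, not part of the answer above: javaScriptLiteral(for:) is a made-up helper that JSON-encodes a one-element array and strips the brackets, leaving a quoted string that is also a valid JavaScript string literal in the common case.

```swift
import Foundation

// Hypothetical helper: renders a Swift string as a quoted, escaped
// string literal by JSON-encoding it.
func javaScriptLiteral(for string: String) -> String? {
    // Encode a one-element array, since some Foundation versions
    // reject a bare string as a top-level JSON value.
    guard let data = try? JSONSerialization.data(withJSONObject: [string], options: []),
          let json = String(data: data, encoding: .utf8) else {
        return nil
    }
    // Drop the surrounding "[" and "]" to keep only the quoted string.
    return String(json.dropFirst().dropLast())
}

print(javaScriptLiteral(for: "foobar\nhello\nworld") ?? "encoding failed")
```

The result already includes its own double quotes, so the script would be built as "\(callback)(\(literal))" rather than wrapping the value in single quotes.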

Swift regex to match unicodes

I am new to swift and want to match all the unicode strings using regex
For example:
var s="😀 emoji 😀"
When I decoded the above string the result is:
"\ud83d\ude00 emoji \ud83d\ude00"
I want to replace each emoji with say *
In Java I used the regex:
"[\uD800-\uDBFF\uDC00-\uDFFF]" and it was working
In Swift I am using the same regex, but it's replacing every character with *
I want the result as * emoji *
Help is highly appreciated
The Unicode code point of the emoji you have shown is U+1F600.
(Unicode 9.0 Character Code Charts - Emoticons)
And your regex pattern [\uD800-\uDBFF\uDC00-\uDFFF] (which may work for a UTF-16 representation) matches all non-BMP characters, U+10000...U+10FFFF. That range contains most emoji, but also a huge number of non-emoji characters.
So, since you say "[\uD800-\uDBFF\uDC00-\uDFFF]" was working, the equivalent pattern in NSRegularExpression is "[\\U00010000-\\U0010FFFF]".
var s="😀 emoji 😀"
let regex = try! NSRegularExpression(pattern: "[\\U00010000-\\U0010FFFF]", options: [])
let replaced = regex.stringByReplacingMatches(in: s, options: [], range: NSRange(0..<s.utf16.count), withTemplate: "*") //->"* emoji *"
(Addition)
To see Unicode code points in your string literal:
s.unicodeScalars.forEach {
print(String(format: "U+%04X ", Int($0.value)))
}
For your example string, I get:
U+1F600
U+0020
U+0065
U+006D
U+006F
U+006A
U+0069
U+0020
U+1F600
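The same non-BMP range can also be replaced without a regex by walking the Unicode scalars directly. This is only a sketch of an alternative, with the same caveat as the pattern above: it stars every scalar at or above U+10000, not just emoji.

```swift
let s = "😀 emoji 😀"

// Replace each scalar outside the Basic Multilingual Plane with "*".
let starred = String(s.unicodeScalars.map { scalar -> Character in
    scalar.value >= 0x10000 ? "*" : Character(scalar)
})

print(starred) // * emoji *
```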

Unicode to UTF8 in Swift

I am using the Maps API and when searching for some addresses in foreign countries, the address comes back with Unicode characters embedded like this:
"Place du Panth\U00e9on",
"75005 Paris"
The unicode character in this instance is \U00e9 which is é
The trouble I am having is that SwiftyJSON pukes if I have saved this data in a JSON file and try to read it back. SwiftyJSON does not like the backslash character '\'. The JSON is valid, and even if I could read it, it would still not be good, as I would rather have é displayed properly, along with all other Unicode characters.
Does anyone have any ideas on how to convert all unicode characters to UTF8 encoding of that character in Swift?
Should I just write a function that searches for all of the Unicode characters and then convert them?
Unless someone has a better idea, I just wrote this function that is doing the trick for me now.
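func convertFromUnicode(_ input: String) -> String {
// `var` parameters were removed in Swift 3, so copy the argument into a local variable
var myString = input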
let convertDict:[String:String] = ["\\U00c0":"À", "\\U00c1" :"Á","\\U00c2":"Â","\\U00c3":"Ã","\\U00c4":"Ä","\\U00c5":"Å","\\U00c6":"Æ","\\U00c7":"Ç","\\U00c8":"È","\\U00c9":"É","\\U00ca":"Ê","\\U00cb":"Ë","\\U00cc":"Ì","\\U00cd":"Í","\\U00ce":"Î","\\U00cf":"Ï","\\U00d1":"Ñ","\\U00d2":"Ò","\\U00d3":"Ó","\\U00d4":"Ô","\\U00d5":"Õ","\\U00d6":"Ö","\\U00d8":"Ø","\\U00d9":"Ù","\\U00da":"Ú","\\U00db":"Û","\\U00dc":"Ü","\\U00dd":"Ý","\\U00df":"ß","\\U00e0":"à","\\U00e1":"á","\\U00e2":"â","\\U00e3":"ã","\\U00e4":"ä","\\U00e5":"å","\\U00e6":"æ","\\U00e7":"ç","\\U00e8":"è","\\U00e9":"é","\\U00ea":"ê","\\U00eb":"ë","\\U00ec":"ì","\\U00ed":"í","\\U00ee":"î","\\U00ef":"ï","\\U00f0":"ð","\\U00f1":"ñ","\\U00f2":"ò","\\U00f3":"ó","\\U00f4":"ô","\\U00f5":"õ","\\U00f6":"ö","\\U00f8":"ø","\\U00f9":"ù","\\U00fa":"ú","\\U00fb":"û","\\U00fc":"ü","\\U00fd":"ý","\\U00ff":"ÿ"]
for (key,value) in convertDict {
myString = myString.replacingOccurrences(of: key, with: value)
}
return myString
}
Instead of hardcoding all the characters I would decode it with an extension like:
extension String {
var decoded : String {
let data = self.data(using: .utf8)
let message = String(data: data!, encoding: .nonLossyASCII) ?? ""
return message
}
}
and then you could use it like this:
let myString = "Place du Panth\\U00e9on"
print(myString.decoded)
Which would print Place du Panthéon
