Swift URL.path changes encoding of utf-8 characters - ios
Why does converting a String to a URL in Swift 4.2 and then converting the URL back to a String using url.path change the encoding of special characters like German umlauts (ä, ö, ü), even if I use UTF-8 encoding?
I wrote some sample code to show my problem. I encoded the strings to base64 in order to show that there is a difference.
I also have a similar unsolved problem with special characters and swift here.
Sample Code
let string = "/path/to/file"
let stringUmlauts = "/path/to/file/with/umlauts/testäöü"
let base64 = Data(string.utf8).base64EncodedString()
let base64Umlauts = Data(stringUmlauts.utf8).base64EncodedString()
print(base64, base64Umlauts)
let url = URL(fileURLWithPath: string)
let urlUmlauts = URL(fileURLWithPath: stringUmlauts)
let base64Url = Data(url.path.utf8).base64EncodedString()
let base64UrlUmlauts = Data(urlUmlauts.path.utf8).base64EncodedString()
print(base64Url, base64UrlUmlauts)
Output
The base64 and base64Url strings are identical, but base64Umlauts and base64UrlUmlauts differ.
"L3BhdGgvdG8vZmlsZQ==" for base64
"L3BhdGgvdG8vZmlsZQ==" for base64Url
"L3BhdGgvdG8vZmlsZS93aXRoL3VtbGF1dHMvdGVzdMOkw7bDvA==" for base64Umlauts
"L3BhdGgvdG8vZmlsZS93aXRoL3VtbGF1dHMvdGVzdGHMiG/MiHXMiA==" for base64UrlUmlauts
When I put the base64Umlauts and base64UrlUmlauts strings into an online Base64 decoder, they both show /path/to/file/with/umlauts/testäöü, but the ä, ö, ü are different (not visually).
stringUmlauts.utf8 uses the precomposed Unicode characters ä, ö, ü (U+00E4, U+00F6, U+00FC).
But urlUmlauts.path.utf8 uses the plain letters a, o, u, each followed by the combining diaeresis ¨ (U+0308).
This is why you get different base64 encoding - the characters look the same but are actually encoded differently.
What's really interesting is that Array(stringUmlauts) and Array(urlUmlauts.path) are the same. The difference doesn't appear until you perform the UTF-8 encoding of the otherwise exact same String values.
Since the base64 encoding is irrelevant, here's a more concise test:
let stringUmlauts = "/path/to/file/with/umlauts/testäöü"
let urlUmlauts = URL(fileURLWithPath: stringUmlauts)
print(stringUmlauts, urlUmlauts.path) // Show the same
let rawStr = stringUmlauts
let urlStr = urlUmlauts.path
print(rawStr == urlStr) // true
print(Array(rawStr) == Array(urlStr)) // true
print(Array(rawStr.utf8) == Array(urlStr.utf8)) // false!!!
So how is the UTF-8 encoding of two equal strings different?
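The likely explanation is that Swift compares String and Character values by Unicode canonical equivalence, not by their stored scalars, so a precomposed ä and a decomposed a + U+0308 compare equal while their UTF-8 bytes differ. The following is a minimal sketch that reproduces this without any URL involved; the constant names are made up for illustration.

```swift
// "ä" written two ways: one precomposed scalar vs. base letter + combining mark.
let precomposed = "\u{E4}"     // LATIN SMALL LETTER A WITH DIAERESIS
let decomposed = "a\u{308}"    // "a" followed by COMBINING DIAERESIS

// String and Character equality use canonical equivalence, so both are true.
print(precomposed == decomposed)                    // true
print(Array(precomposed) == Array(decomposed))      // true

// The underlying encodings are different.
print(Array(precomposed.utf8))                      // [195, 164]
print(Array(decomposed.utf8))                       // [97, 204, 136]
print(precomposed.unicodeScalars.count,
      decomposed.unicodeScalars.count)              // 1 2
```

The bytes [97, 204, 136] are the same "a" plus 0xCC 0x88 sequence visible in the decoded base64UrlUmlauts output above, which suggests that the file URL path comes back in decomposed form.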
One solution to this is to use precomposedStringWithCanonicalMapping on the result of path.
let urlStr = urlUmlauts.path.precomposedStringWithCanonicalMapping
Now you get true from:
print(Array(rawStr.utf8) == Array(urlStr.utf8)) // now true
Related
encode url swift withAllowedCharacters not working encoding %20 as %2520 (means encoding % as %25)
The following is my code for URL encoding:
extension String {
    var encoded: String {
        return self.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) ?? ""
    }
}
But I am facing an issue if the URL already contains %20: it is encoded as %2520, although I have specified urlQueryAllowed.
Original url: https://mydomain.in/retailers_data_v2/retailer/320/17372-Tea%20Coffee%20Vending%20Machine.JPG
Encoded url: https://mydomain.in/retailers_data_v2/retailer/320/17372-Tea%2520Coffee%2520Vending%2520Machine.JPG
If you have an already encoded URL String, you first need to remove the percent encoding before applying it again. If you aren't sure whether the URL is already encoded, use an if let on removingPercentEncoding and, depending on its result, call addingPercentEncoding either on the decoded string or on the original one:
let alreadyEncodedURLString = "https://mydomain.in/retailers_data_v2/retailer/320/17372-Tea%20Coffee%20Vending%20Machine.JPG"
let encodedURLString: String?
if let unencodedURLString = alreadyEncodedURLString.removingPercentEncoding {
    // Decoding succeeded: encode the decoded form so "%" is not escaped twice.
    encodedURLString = unencodedURLString.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
} else {
    // Decoding failed: fall back to encoding the original string.
    encodedURLString = alreadyEncodedURLString.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
}
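To see both the double-encoding problem and the decode-then-encode fix side by side, here is a small sketch. The helper name percentEncodedOnce is made up for illustration; it assumes that any "%XX" sequence in the input is an existing escape, so an input that contains a literal percent sign would be misread.

```swift
import Foundation

/// Hypothetical helper: strips any existing percent encoding, then encodes once.
func percentEncodedOnce(_ urlString: String) -> String? {
    // removingPercentEncoding returns nil for invalid escapes; fall back to the input.
    let decoded = urlString.removingPercentEncoding ?? urlString
    return decoded.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
}

let encodedInput = "https://mydomain.in/retailers_data_v2/retailer/320/17372-Tea%20Coffee%20Vending%20Machine.JPG"
let plainInput = "https://mydomain.in/retailers_data_v2/retailer/320/17372-Tea Coffee Vending Machine.JPG"

// Encoding an already encoded string directly escapes the "%" itself.
let doubleEncoded = encodedInput.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
print(doubleEncoded ?? "nil")   // ...Tea%2520Coffee%2520Vending%2520Machine.JPG

// The helper gives the same result for both inputs.
print(percentEncodedOnce(encodedInput) ?? "nil")
print(percentEncodedOnce(plainInput) ?? "nil")
```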
How to identify UTF-8 encoded text from a string and convert it to smiley\emoticon in Swift
I am building an app that supports smileys/emoticons. From the backend I am getting a response like this: str = "Hferuhggeðððððfjjnjrnjgnejfnsgjen". This response contains UTF-8 encoded text; for the above str, the UTF-8 encoded text is "ððððð". Now I need to identify the location of the UTF-8 encoded text in the response and convert it to an emoticon/smiley.
I finally found a solution: if you decode the string, you get the smiley. Here is the code:
let che = descriptionText.cString(using: .isoLatin1)
let decode_string = String(cString: che!, encoding: .utf8)
This worked for me.
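The same round trip can be written with Data instead of a C string, which avoids the force unwrap. This is only a sketch: the function name repairLatin1Mojibake is made up, and it assumes the server's UTF-8 bytes were misread as ISO Latin-1, one character per byte. ASCII characters pass through unchanged, so the garbled run does not have to be located first.

```swift
import Foundation

/// Hypothetical helper: reinterprets Latin-1 characters as the UTF-8 bytes they came from.
/// Returns nil if the text has characters outside Latin-1 or the bytes are not valid UTF-8.
func repairLatin1Mojibake(_ text: String) -> String? {
    guard let bytes = text.data(using: .isoLatin1) else { return nil }
    return String(data: bytes, encoding: .utf8)
}

// Simulate the damage: take UTF-8 bytes and decode them as Latin-1.
let intended = "Hi 😀 there"
let garbled = String(data: Data(intended.utf8), encoding: .isoLatin1) ?? ""

print(garbled == intended)                      // false
print(repairLatin1Mojibake(garbled) ?? "nil")   // Hi 😀 there
```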
RFC2045-MIME variant of Base 64 in swift
I need to encode a string with the RFC 2045 (MIME) variant of base64. However, I can't find any option to do this in Swift. At the moment I use this method:
var str = "\(test1):\(test2)"
str = str.data(using: .utf8)!.base64EncodedString(options: Data.Base64EncodingOptions(rawValue: UInt(0)))
but this is just the standard base64 encoding, not the RFC 2045 MIME variant. How can I use the RFC 2045 MIME variant?
The main difference between the base64 encodings specified in RFC 2045 and RFC 4648 is that RFC 2045 specifies a maximum line length of 76 characters, with lines separated by CRLF. The documentation of base64EncodedString(options:) says the default line ending is CRLF, so:
let data = str.data(using: .utf8)!
let b64 = data.base64EncodedString(options: .lineLength76Characters)
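As a quick check of the line wrapping, the sketch below encodes 100 bytes and inspects the resulting lines. It sets both line-ending options explicitly instead of relying on the default; the payload and the variable names are made up for illustration.

```swift
import Foundation

// 100 arbitrary bytes encode to 136 base64 characters, enough to force a line break.
let payload = Data(repeating: 0x41, count: 100)

// 76-character lines, explicitly terminated with CR followed by LF.
let mimeOptions: Data.Base64EncodingOptions = [
    .lineLength76Characters,
    .endLineWithCarriageReturn,
    .endLineWithLineFeed,
]
let mimeEncoded = payload.base64EncodedString(options: mimeOptions)

let mimeLines = mimeEncoded.components(separatedBy: "\r\n")
print(mimeLines.map { $0.count })

// Decoding has to skip the inserted line breaks.
let roundTrip = Data(base64Encoded: mimeEncoded, options: .ignoreUnknownCharacters)
print(roundTrip == payload)
```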
Can iOS URL support unicode characters within top level domain?
I'm building an iOS app that takes urls as input. Unicode characters are valid for a tld but when I instantiate a valid URL that contains unicode characters NSURL returns nil. Is this even possible? swift eg. URL(string: "http://➡.ws/䨹")
How to use special characters in a URL (Swift 3):
let myUrl = "http://➡.ws/䨹"
let url = URL(string: myUrl) // nil here: this is the problem
if let encoded = myUrl.addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed) {
    let urlencoded = URL(string: encoded) // "http://%E2%9E%A1.ws/%E4%A8%B9": no longer nil
}
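Note that the snippet above percent-encodes the host as well as the path. Internationalized host names are normally converted with IDNA (Punycode), not percent escapes, so the resulting URL may be non-nil yet still fail to resolve. For the path part alone, URLComponents does the escaping for you. A minimal sketch, using the made-up ASCII host example.ws in place of the Unicode one:

```swift
import Foundation

var components = URLComponents()
components.scheme = "http"
components.host = "example.ws"   // hypothetical ASCII host, not the one from the question
components.path = "/䨹"          // assigned unencoded; URLComponents escapes it

print(components.percentEncodedPath)            // /%E4%A8%B9
print(components.url?.absoluteString ?? "nil")  // http://example.ws/%E4%A8%B9
```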
Unicode to UTF8 in Swift
I am using the Maps API, and when searching for some addresses in foreign countries, the address comes back with Unicode escapes embedded like this: "Place du Panth\U00e9on", "75005 Paris". The escape in this instance is \U00e9, which is é. The trouble I am having is that SwiftyJSON fails if I have saved this data in a JSON file and try to read it back; it does not like the backslash character '\'. The JSON is valid, and even if I could read it, the result would still not be good, as I would rather have é and all other Unicode characters displayed properly. Does anyone have any ideas on how to convert all of these escapes to the UTF-8 encoding of the corresponding character in Swift? Should I just write a function that searches for all of the Unicode escapes and then converts them?
Unless someone has a better idea, I just wrote this function, which is doing the trick for me for now.
import Foundation
func convertFromUnicode(_ input: String) -> String {
    // Maps each escaped form to its character.
    let convertDict: [String: String] = ["\\U00c0":"À", "\\U00c1" :"Á","\\U00c2":"Â","\\U00c3":"Ã","\\U00c4":"Ä","\\U00c5":"Å","\\U00c6":"Æ","\\U00c7":"Ç","\\U00c8":"È","\\U00c9":"É","\\U00ca":"Ê","\\U00cb":"Ë","\\U00cc":"Ì","\\U00cd":"Í","\\U00ce":"Î","\\U00cf":"Ï","\\U00d1":"Ñ","\\U00d2":"Ò","\\U00d3":"Ó","\\U00d4":"Ô","\\U00d5":"Õ","\\U00d6":"Ö","\\U00d8":"Ø","\\U00d9":"Ù","\\U00da":"Ú","\\U00db":"Û","\\U00dc":"Ü","\\U00dd":"Ý","\\U00df":"ß","\\U00e0":"à","\\U00e1":"á","\\U00e2":"â","\\U00e3":"ã","\\U00e4":"ä","\\U00e5":"å","\\U00e6":"æ","\\U00e7":"ç","\\U00e8":"è","\\U00e9":"é","\\U00ea":"ê","\\U00eb":"ë","\\U00ec":"ì","\\U00ed":"í","\\U00ee":"î","\\U00ef":"ï","\\U00f0":"ð","\\U00f1":"ñ","\\U00f2":"ò","\\U00f3":"ó","\\U00f4":"ô","\\U00f5":"õ","\\U00f6":"ö","\\U00f8":"ø","\\U00f9":"ù","\\U00fa":"ú","\\U00fb":"û","\\U00fc":"ü","\\U00fd":"ý","\\U00ff":"ÿ"]
    // Parameters are constants in current Swift, so work on a mutable copy.
    var myString = input
    for (key, value) in convertDict {
        myString = myString.replacingOccurrences(of: key, with: value)
    }
    return myString
}
Instead of hardcoding all the characters, I would decode it with an extension like:
extension String {
    var decoded: String {
        // Re-read the string's UTF-8 bytes using the non-lossy ASCII encoding.
        let data = Data(self.utf8)
        return String(data: data, encoding: .nonLossyASCII) ?? ""
    }
}
and then you could use it like this:
let myString = "Place du Panth\\U00e9on"
print(myString.decoded)
which would print Place du Panthéon
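If you would rather control exactly which escapes are accepted, the "search for all of the Unicode escapes and convert them" idea from the question can also be done by hand, without a lookup table. The following is a sketch under stated assumptions: the function name decodeUnicodeEscapes is made up, and it only handles a backslash, then U or u, then exactly four hex digits. Anything else is copied through unchanged.

```swift
/// Hypothetical helper: replaces each \Uxxxx or \uxxxx escape with the scalar it names.
func decodeUnicodeEscapes(_ input: String) -> String {
    var output = ""
    var rest = Substring(input)

    while let backslash = rest.firstIndex(of: "\\") {
        // Copy everything before the backslash unchanged.
        output += rest[..<backslash]
        let tail = rest[rest.index(after: backslash)...]

        if let marker = tail.first, marker == "U" || marker == "u" {
            let hex = tail.dropFirst().prefix(4)
            if hex.count == 4,
               hex.allSatisfy({ $0.isHexDigit }),
               let value = UInt32(hex, radix: 16),
               let scalar = Unicode.Scalar(value) {
                output.unicodeScalars.append(scalar)
                rest = tail.dropFirst(5)   // skip the marker and four digits
                continue
            }
        }

        // Not a recognised escape: keep the backslash and move on.
        output.append("\\")
        rest = tail
    }

    output += rest
    return output
}

print(decodeUnicodeEscapes("Place du Panth\\U00e9on"))   // Place du Panthéon
```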