Cleaning malformed UTF-8 data - ios

Background
With Swift, I'm trying to fetch HTML via URLSession rather than by loading it into a WKWebView first, as I only need the HTML and none of the subresources. I'm running into a problem with certain pages that work when loaded into WKWebView, but when loaded via URLSession (or even a simple NSString(contentsOf: url, encoding: String.Encoding.utf8.rawValue)) the UTF-8 conversion fails.
How to reproduce
This fails (prints "nil"):
print(try? NSString(contentsOf: URL(string: "http://www.huffingtonpost.jp/techcrunch-japan/amazon-is-gobbling-whole-foods-for-a-reported-13-7-billion_b_17171132.html?utm_hp_ref=japan&ir=Japan")!, encoding: String.Encoding.utf8.rawValue))
But changing the URL to the site's homepage, it succeeds:
print(try? NSString(contentsOf: URL(string: "http://www.huffingtonpost.jp")!, encoding: String.Encoding.utf8.rawValue))
Question
How can I "clean" the data returned by a URL that contains malformed UTF-8? I'd like to either remove or replace any invalid sequences in the malformed UTF-8 so that the rest of it can be viewed. WKWebView is able to render the page just fine (and claims it's UTF-8 content as well), as you can see by visiting the URL: http://www.huffingtonpost.jp/techcrunch-japan/amazon-is-gobbling-whole-foods-for-a-reported-13-7-billion_b_17171132.html?utm_hp_ref=japan&ir=Japan

Here is an approach to create a String from (possibly) malformed UTF-8 data:
1. Read the website contents into a Data object.
2. Append a 0 byte to make it a "C string".
3. Use String(cString:) for the conversion. This initializer replaces ill-formed UTF-8 code unit sequences with the Unicode replacement character ("\u{FFFD}").
4. Optionally: remove all occurrences of the replacement character.
Example for the "cleaning" process:
var data = Data(bytes: [65, 66, 200, 67]) // malformed UTF-8
data.append(0)
let s = data.withUnsafeBytes { (p: UnsafePointer<CChar>) in String(cString: p) }
let clean = s.replacingOccurrences(of: "\u{FFFD}", with: "")
print(clean) // ABC
Swift 5:
var data = Data([65, 66, 200, 67]) // malformed UTF-8
data.append(0)
let s = data.withUnsafeBytes { p in
    String(cString: p.bindMemory(to: CChar.self).baseAddress!)
}
let clean = s.replacingOccurrences(of: "\u{FFFD}", with: "")
print(clean) // ABC
Of course this can be defined as a custom init method:
extension String {
    init(malformedUTF8 data: Data) {
        var data = data
        data.append(0)
        self = data.withUnsafeBytes { (p: UnsafePointer<CChar>) in
            String(cString: p).replacingOccurrences(of: "\u{FFFD}", with: "")
        }
    }
}
Swift 5:
extension String {
    init(malformedUTF8 data: Data) {
        var data = data
        data.append(0)
        self = data.withUnsafeBytes { p in
            String(cString: p.bindMemory(to: CChar.self).baseAddress!)
        }.replacingOccurrences(of: "\u{FFFD}", with: "")
    }
}
Usage:
let data = Data(bytes: [65, 66, 200, 67])
let s = String(malformedUTF8: data)
print(s) // ABC
The cleaning can be done more "directly" using transcode() with stoppingOnError: false:
extension String {
    init(malformedUTF8 data: Data) {
        var utf16units = [UInt16]()
        utf16units.reserveCapacity(data.count) // A rough estimate
        _ = transcode(data.makeIterator(), from: UTF8.self, to: UTF16.self,
                      stoppingOnError: false) { code in
            if code != 0xFFFD {
                utf16units.append(code)
            }
        }
        self = String(utf16CodeUnits: utf16units, count: utf16units.count)
    }
}
This is essentially what String(cString:) does; compare CString.swift and StringCreate.swift in the Swift standard library source.
Yet another option is to use the UTF8 codec's decode() method and ignore errors:
extension String {
    init(malformedUTF8 data: Data) {
        var str = ""
        var iterator = data.makeIterator()
        var utf8codec = UTF8()
        var done = false
        while !done {
            switch utf8codec.decode(&iterator) {
            case .emptyInput:
                done = true
            case let .scalarValue(val):
                str.unicodeScalars.append(val)
            case .error:
                break // ignore errors
            }
        }
        self = str
    }
}
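On Swift 4 and later, the standard library's String(decoding:as:) initializer performs the same repair (invalid sequences become "\u{FFFD}") without the manual 0 byte. The following is only a sketch: lossyUTF8String(from:removingReplacements:) and fetchHTML(from:completion:) are hypothetical names that do not come from the answer above, and the URLSession part is an untested illustration of where the conversion would go.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLSession lives here on non-Apple platforms
#endif

// Hypothetical helper: decode bytes as UTF-8, repairing invalid sequences
// with U+FFFD, then optionally dropping the replacement characters.
func lossyUTF8String(from data: Data, removingReplacements: Bool = true) -> String {
    let repaired = String(decoding: data, as: UTF8.self)
    return removingReplacements
        ? repaired.replacingOccurrences(of: "\u{FFFD}", with: "")
        : repaired
}

// Hypothetical usage with URLSession: convert whatever bytes arrive,
// instead of failing on the first ill-formed sequence.
func fetchHTML(from url: URL, completion: @escaping (String?) -> Void) {
    URLSession.shared.dataTask(with: url) { data, _, _ in
        completion(data.map { lossyUTF8String(from: $0) })
    }.resume()
}

let sample = Data([65, 66, 200, 67]) // the malformed bytes used in the examples above
let kept = lossyUTF8String(from: sample, removingReplacements: false)
let cleaned = lossyUTF8String(from: sample)
```

Keeping the replacement characters (removingReplacements: false) may be preferable when the damaged positions should remain visible to the reader.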

Related

Swift Unzipping zip file and find xml file from base64 data from Gmail API

This question concerns a DMARC report viewer app for iOS 13 built with SwiftUI and the Gmail API. Google mails the reports to our admin email address as XML files inside zip attachments. I use the Gmail API with a filter to fetch those specific mails, get the base64-encoded attachment data from the API, and decode it to Data. That far is OK. The next step is to decompress the zip file's bytes and extract the XML file inside as a String. Then I need to parse the XML; I think I can figure that out with XMLParser.
Question: how do I decompress a zip file held as Data and get the XML file inside it as a String?
INPUT: String in Base64 format from GMail API fetch (A zip file attachment with only 1 xml file inside)
OUTPUT: String in XML format
PLATFORM: iOS 13/Swift 5.2/SwiftUI/Xcode 11.4
ACTION:
(INPUT)
base64: String | Decode -> Data
attachment.zip: Data | Decompress -> [Data]
ListOfFiles: [Data] | FirstIndex -> Data
dmarc.xml: Data | ContentOfXML -> String
(OUTPUT)
Update: I have tried an external package called Zip and it also failed.
let path = try! FileManager.default.url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: true)
let url = path.appendingPathComponent(messageId+".zip")
do {
    try data.write(to: url)
} catch {
    print("Error while writing: "+error.localizedDescription)
}
do {
    let unzipDirectory = try Zip.quickUnzipFile(url)
    print(unzipDirectory)
} catch let error as NSError {
    print("Error while unzipping: "+error.localizedDescription)
}
This code resulted in the following error:
Error while unzipping: The operation couldn’t be completed. (Zip.ZipError error 1.)
Finally I found it. As mentioned in Ref 1, the email bodies are encoded as 7-bit US-ASCII data, which is why the base64 decoding did not work.
As defined in the rfc1341:
An encoding type of 7BIT requires that the body is already in a
seven-bit mail-ready representation. This is the default value --
that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.
The whole code worked after adding the following.
let edata: String = result.data.replacingOccurrences(of: "-", with: "+").replacingOccurrences(of: "_", with: "/")
As mentioned in Ref 2, the base64 data received from the Gmail API uses the URL-safe alphabet, so it just needs '-' replaced with '+' and '_' replaced with '/' before decoding.
func getAttachedData(messageId: String, attachmentId: String) {
    decode(self.urlBase+messageId+"/attachments/"+attachmentId+"?"+self.urlKey) { (result: Attachment) in
        let edata: String = result.data.replacingOccurrences(of: "-", with: "+").replacingOccurrences(of: "_", with: "/")
        if let data = Data(base64Encoded: edata, options: .ignoreUnknownCharacters) {
            let filemanager = FileManager.default
            let path = try! filemanager.url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: true)
            let url = path.appendingPathComponent(messageId+".zip")
            do {
                try data.write(to: url)
            } catch {
                print("Error while writing: "+error.localizedDescription)
            }
            do {
                let unzipDirectory = try Zip.quickUnzipFile(url)
                print("Unzipped")
                do {
                    let filelist = try filemanager.contentsOfDirectory(at: unzipDirectory, includingPropertiesForKeys: [], options: [])
                    for filename in filelist {
                        print(filename.lastPathComponent)
                        print(filename.relativeString)
                        do {
                            let text = try String(contentsOf: filename, encoding: .utf8)
                            print(text)
                            DispatchQueue.main.async {
                                self.attachments.append(text)
                            }
                        } catch let error as NSError {
                            print("Error: \(error.localizedDescription)")
                        }
                    }
                } catch let error {
                    print("Error: \(error.localizedDescription)")
                }
            } catch let error as NSError {
                print("Error while unzipping: "+error.localizedDescription)
            }
        }
    }
}
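The character replacement can be wrapped in a small helper. This is a minimal sketch: decodeBase64URL(_:) is a hypothetical name, and the padding step is an added assumption (URL-safe encoders often omit the trailing "=", which Data(base64Encoded:) expects), not something the code above does.

```swift
import Foundation

// Hypothetical helper: decode URL-safe base64 (RFC 4648, section 5).
func decodeBase64URL(_ input: String) -> Data? {
    // Map the URL-safe alphabet back to the standard one.
    var base64 = input
        .replacingOccurrences(of: "-", with: "+")
        .replacingOccurrences(of: "_", with: "/")
    // Restore the "=" padding if the encoder left it out (assumption).
    let remainder = base64.count % 4
    if remainder > 0 {
        base64 += String(repeating: "=", count: 4 - remainder)
    }
    return Data(base64Encoded: base64)
}

// "SGVsbG8_" is the URL-safe form of "SGVsbG8/"; "SGk" lacks its "=" padding.
let question = decodeBase64URL("SGVsbG8_").flatMap { String(data: $0, encoding: .utf8) }
let unpadded = decodeBase64URL("SGk").flatMap { String(data: $0, encoding: .utf8) }
```

The resulting Data can then be written to disk and unzipped as in getAttachedData above.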
Ref 1: https://stackoverflow.com/a/58590759/2382813
Ref 2: https://stackoverflow.com/a/24986452/2382813

JOSESwift jwe encryption failed to decode in the nimbus server

Has anybody used JOSESwift successfully? In my case, decryption on the server fails with error 500, probably because it cannot find the matching private key or because something is wrong with the encryption.
My code gets the public keys from a server:
keys?.keys?.forEach({ (key) in
    BPLogger.debug("\(key)")
    do {
        let jwkData = key.toJSONString()?.data(using: .utf8)
        let rsaKey = try RSAPublicKey(data: jwkData!)
        BPLogger.log("key components: \(rsaKey.parameters)")
        BpidCache.shared.joseRsaKey = rsaKey
        self?.generateParametersJose()
        completion()
        return
    } catch {
        BPLogger.debug("Error: \(error)")
    }
})
The server expects a 'kid' field in the JOSE header, which the framework did not provide, so I added it myself. The backend Java server uses the Nimbus library.
func generateParametersJose() {
    let rsa = BpidCache.shared.joseRsaKey
    var publicKey: SecKey? = nil
    do {
        publicKey = try rsa?.converted(to: SecKey.self)
    } catch {
        BPLogger.log("\(error)")
    }
    var header = JWEHeader(algorithm: .RSA1_5, encryptionAlgorithm: .A256CBCHS512)
    // header.parameters["kid"] = "1"
    let jwk = MidApi.Model.JWTKey(key: cek)
    let jwkData = try! JSONEncoder().encode(jwk)
    BPLogger.debug("jwkData = \(String(data: jwkData, encoding: .utf8)!)")
    let payload = Payload(jwkData)
    // Encrypter algorithms must match header algorithms.
    guard let encrypter = Encrypter<SecKey>(keyEncryptionAlgorithm: .RSA1_5, encryptionKey: publicKey!, contentEncyptionAlgorithm: .A256CBCHS512) else {
        return
    }
    guard let jwe = try? JWE(header: header, payload: payload, encrypter: encrypter) else {
        BPLogger.error("Failed jwe creation.")
        return
    }
    var comps = jwe.compactSerializedString.components(separatedBy: ".")
    var jweHeader = comps.first
    let data = jweHeader?.base64URLDecode()
    var orgH = try! JSONDecoder().decode(BPJweHeader.self, from: data!)
    orgH.kid = "1"
    let newJson = try! JSONEncoder().encode(orgH).base64URLEncodedString()
    comps[0] = newJson
    let newHeader = comps.joined(separator: ".")
    BPLogger.log("jwe.compactSerializedString = \(newHeader)")
    headers = ["X-Encrypted-Key": newHeader]
    // headers = ["X-Encrypted-Key": jwe.compactSerializedString] // this also fails
}
What am I doing wrong?
The latest version of JOSESwift (1.3.0) contains a fix for the problem that prevented setting additional header parameters.
You can now set the additional header parameters listed in RFC-7516. Setting the "kid" parameter like you tried to do in your question works like this:
var header = JWEHeader(algorithm: .RSA1_5, encryptionAlgorithm: .A256CBCHS512)
header.kid = "1"
If you use the framework via CocoaPods, run pod repo update first so that you install the latest version, which contains the fix.

Import Swift class created dynamically

I am experimenting with a piece of code and I need some assistance. I am creating a Swift file that contains a class and a variable. I am able to successfully create and read the file. Now, is it possible for me to use this Swift file and access its variable (v, in this case)?
func writeF() {
    let file = "Sample.swift"
    let text = "import Foundation \n" +
        "public class Sample { \n" +
        " let v: Int = 0 \n" +
        "}"
    if let dir = NSSearchPathForDirectoriesInDomains(NSSearchPathDirectory.DocumentDirectory, NSSearchPathDomainMask.AllDomainsMask, true).first {
        let path = NSURL(fileURLWithPath: dir).URLByAppendingPathComponent(file)
        //writing
        do {
            try text.writeToURL(path, atomically: false, encoding: NSUTF8StringEncoding)
        }
        catch {print("error writing file")}
        //reading
        do {
            let text2 = try NSString(contentsOfURL: path, encoding: NSUTF8StringEncoding)
            print(text2)
        }
        catch {
            print("error reading file")
        }
    }
}
You can't add code at runtime. Once your code is compiled, no human-readable *.swift files are left; the app contains only compiled machine code.
As FelixSFD said in their answer, you cannot dynamically build a Swift file and compile it at runtime on the device, at least not in the normal Sandboxed environment. If you have a Jailbroken device, you can build and install the Swift open source runtime and compile programs on-the-fly that way.
As an alternative, you could look into the JavaScriptCore framework to build and run dynamic JavaScript code, and bridge it into your app.
Here's a quick example of passing an object to JavaScript and returning the same object with a class mapping:
import JavaScriptCore
let js = "function test(input) { return input }"
class TestClass: NSObject {
    var name: String
    init(name: String) {
        self.name = name
    }
}
let context = JSContext()
context.evaluateScript(js)
let testFunc = context.objectForKeyedSubscript("test")
let result = testFunc.callWithArguments([TestClass(name: "test")])
result.toDictionary()
let testObj = result.toObjectOfClass(TestClass.self) as? TestClass
testObj?.name // "test"
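The example above uses Swift 2 method names. In current Swift the same JavaScriptCore calls are spelled call(withArguments:) and toObjectOf(_:). The following is a smaller sketch of running a script that only exists at runtime; the add function and the variable names are made up for illustration, and it assumes an Apple platform where JavaScriptCore is available.

```swift
import JavaScriptCore

// JavaScript source assembled at runtime; it could equally be read from a file.
let source = "function add(a, b) { return a + b }"

var sum: Int32? = nil
if let context = JSContext() {
    // Define the function inside the JavaScript context.
    context.evaluateScript(source)
    // Look the function up by name and call it with Swift values.
    if let add = context.objectForKeyedSubscript("add"),
       let result = add.call(withArguments: [2, 3]) {
        sum = result.toInt32()
    }
}
```

Only the JavaScript is dynamic here; the Swift side still has to be compiled ahead of time.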

Issue with converting data string to UTF8

I am getting some data as JSON, but when I receive it the name string looks like this:
{
code = 200;
found = 1;
name = "\\u0635\\u0641\\u062d\\u0647 \\u0631\\u0633\\u0645\\u06cc \\u0633\\u0627\\u06cc\\u062a \\u0648\\u0631\\u0632\\u0634 \\u06f3";
}
How can I fix it? I tried converting it to UTF-8 and still nothing! I should add that the URL works fine in Safari with UTF-8 encoding.
Here is my code:
let userURL: String = "https://myurl.com/xxx"
let ut8 = userURL.stringByAddingPercentEncodingWithAllowedCharacters(NSCharacterSet.URLQueryAllowedCharacterSet())
NSURLSession.sharedSession().dataTaskWithURL(NSURL(string:ut8!)!, completionHandler: { (data, response, error) -> Void in
// Check if data was received successfully
if error == nil && data != nil {
do {
if let jsonResult = try NSJSONSerialization.JSONObjectWithData(data!, options: []) as? NSDictionary {
print(jsonResult)
}
.....
}
Problem solved: there was a problem on the server, which is now fixed.
It's just what the print method outputs, and it sometimes has problems with encoding. You should be fine when you render the text in your UI.
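To check which case applies, it can help to look at the decoded string value itself rather than at the printed dictionary. The sketch below is an illustration with made-up JSON, and nameField(inJSON:) is a hypothetical helper: a correctly escaped \uXXXX sequence is decoded by the JSON parser into the real character, while a doubly escaped one (a likely server-side cause, stated here as an assumption) survives as literal text.

```swift
import Foundation

// Hypothetical helper: parse a JSON object and return its "name" string, if any.
func nameField(inJSON json: String) -> String? {
    guard let object = try? JSONSerialization.jsonObject(with: Data(json.utf8)),
          let dictionary = object as? [String: Any] else { return nil }
    return dictionary["name"] as? String
}

// Correctly escaped: the parser turns \u0635\u0641 into two Arabic-script letters.
let decoded = nameField(inJSON: #"{"name":"\u0635\u0641"}"#)

// Doubly escaped: the parser sees an escaped backslash followed by plain "u0635",
// so the value is the six literal characters \u0635.
let literal = nameField(inJSON: #"{"name":"\\u0635"}"#)
```

If the value looks like the second case, the data was escaped twice before it reached the client.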
To get the UTF-8 representation of a string in Swift, use
yourString.utf8
(for an NSString, use yourNSStringObject.utf8String instead).
To get the UTF-8 representation of a string in Objective-C, use
[yourNSStringObject UTF8String];

How can I get the Swift/Xcode console to show Chinese characters instead of unicode?

I am using this code:
import SwiftHTTP
var request = HTTPTask()
var params = ["text": "这是中文测试"]
request.requestSerializer.headers["X-Mashape-Key"] = "jhzbBPIPLImsh26lfMU4Inpx7kUPp1lzNbijsncZYowlZdAfAD"
request.responseSerializer = JSONResponseSerializer()
request.POST("https://textanalysis.p.mashape.com/segmenter", parameters: params, success: {(response: HTTPResponse) in if let json: AnyObject = response.responseObject { println("\(json)") } },failure: {(error: NSError, response: HTTPResponse?) in println("\(error)") })
The Xcode console shows this as a response:
{
result = "\U8fd9 \U662f \U4e2d\U6587 \U6d4b\U8bd5";
}
Is it possible to get the console to show the following?:
{
result = "这 是 中文 分词 测试"
}
If so, what do I need to do to make it happen?
Thanks.
Instead of
println(json)
use
println((json as NSDictionary)["result"]!)
This will print the correct Chinese result.
Reason: the first print calls the debug description of NSDictionary, which escapes all non-ASCII characters, not only Chinese ones.
Your function actually prints the response object, or more accurately a description thereof. The response object is a dictionary and the unicode characters are encoded with \Uxxxx in their descriptions.
See question: NSJSONSerialization and Unicode, won't play nicely together
To access the result string, you could do the following:
if let json: AnyObject = response.responseObject {
    println("\(json)") // your initial println statement
    if let jsond = json as? Dictionary<String,AnyObject> {
        if let result = jsond["result"] as? String {
            println("result = \(result)")
        }
    }
}
The second println statement will print the actual string and this code actually yields:
{
result = "\U8fd9 \U662f \U4e2d\U6587 \U6d4b\U8bd5";
}
result = 这 是 中文 测试
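The snippets in this thread use println, which was removed after Swift 1. A rough modern equivalent of the unwrapping might look like the sketch below; the json constant is a stand-in for response.responseObject, and its contents are copied from the output above rather than fetched from the service.

```swift
import Foundation

// Stand-in for response.responseObject (assumption: the parsed JSON is a dictionary).
let json: Any = ["result": "这 是 中文 测试"]

var extracted: String? = nil
// Cast to a Swift dictionary, then pull the string out before printing it.
if let dictionary = json as? [String: Any],
   let result = dictionary["result"] as? String {
    extracted = result
    print("result = \(result)")
}
```

Printing the extracted String avoids the dictionary's description, which is where the \Uxxxx escapes come from.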
