I want to convert the following base64-encoded string in Swift 3:
dfYcSGpvBqyzvkAXkdbHDA==
to its equivalent string:
uöHjo¬³¾@‘ÖÇ
The following websites do the job fine:
http://www.motobit.com/util/base64-decoder-encoder.asp
http://www.utilities-online.info/base64/#.WG-FwrFh2Rs
So does PHP's base64_decode function. Its documentation says:
Returns FALSE if input contains character from outside the base64
alphabet.
But I am unable to do the same in Swift 3. The following code doesn't do the job either:
func convertBase64ToNormalString(base64String:String)->String!{
let decodedData = Data(base64Encoded: base64String, options: Data.Base64DecodingOptions())
let bytes = decodedData?.bytes
return String(bytes: bytes, encoding: NSUTF8StringEncoding)
}
Here is the context for why I need to convert the base64 string into a string:
My PHP developer wants me to send all API parameter values encrypted with the AES algorithm. For that, I am using this lib.
He gave me the AES key in hex format (as mentioned in my last question) and the iv in base64 (given above), and he instructed me to decode this base64 value before using it, because he does the same in his PHP code. Here is his PHP code for encryption and decryption:
function encryptParamAES($plaintext, $encryptionEnabled = true) {
if (! $encryptionEnabled) {
return $plaintext;
}
// --- ENCRYPTION ---
// Constants =========================================================
// the key should be random binary, use scrypt, bcrypt or PBKDF2 to
// convert a string into a key
// key is specified using hexadecimal
$key = pack ( 'H*', "dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4" );
// create a random IV to use with CBC encoding
$iv_size = 16;
$iv = base64_decode ( "dfYcSGpvBqyzvkAXkdbHDA==" );
// End Of constants ===================================================
// echo "IV: " . base64_encode ( $iv ) . "\n<br>";
// echo "IV Size: " . $iv_size . "\n<br>";
// show key size use either 16, 24 or 32 byte keys for AES-128, 192
// and 256 respectively
// $key_size = strlen ( $key );
// echo "Key size: " . $key_size . "\n<br><br>";
// creates a cipher text compatible with AES (Rijndael block size = 128)
// to keep the text confidential
// only suitable for encoded input that never ends with value 00h
// (because of default zero padding)
$ciphertext = mcrypt_encrypt ( MCRYPT_RIJNDAEL_128, $key, $plaintext, MCRYPT_MODE_CBC, $iv );
// prepend the IV for it to be available for decryption
$ciphertext = $iv . $ciphertext;
// encode the resulting cipher text so it can be represented by a string
$ciphertext_base64 = base64_encode ( $ciphertext );
return $ciphertext_base64;
}
function decryptParamAES($ciphertext_base64, $encryptionEnabled = true) {
if (! $encryptionEnabled) {
return $ciphertext_base64;
}
// --- DECRYPTION ---
// Constants =========================================================
// the key should be random binary, use scrypt, bcrypt or PBKDF2 to
// convert a string into a key
// key is specified using hexadecimal
$key = pack ( 'H*', "dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4" );
// create a random IV to use with CBC encoding
$iv_size = 16;
$iv = base64_decode ( "dfYcSGpvBqyzvkAXkdbHDA==" );
// End Of constants ===================================================
$ciphertext_dec = base64_decode ( $ciphertext_base64 );
// retrieves the IV, iv_size should be created using mcrypt_get_iv_size()
$iv_dec = substr ( $ciphertext_dec, 0, $iv_size );
// retrieves the cipher text (everything except the $iv_size in the front)
$ciphertext_dec = substr ( $ciphertext_dec, $iv_size );
// may remove 00h valued characters from end of plain text
$plaintext_dec = mcrypt_decrypt ( MCRYPT_RIJNDAEL_128, $key, $ciphertext_dec, MCRYPT_MODE_CBC, $iv_dec );
return rtrim ( $plaintext_dec );
}
I just looked at this PHP code and wondered why he is not passing $iv as the last parameter of mcrypt_decrypt. I will update you on that. But the question remains the same: PHP's base64_decode does not return FALSE for the above base64 string.
I tested this function myself with the terminal command php test.php, where test.php contains the following code:
<?php
$iv = base64_decode ( "dfYcSGpvBqyzvkAXkdbHDA==" );
echo $iv;
?>
And the output was: u?Hjo???@???
Looking at your revised question, you're trying to take this base-64 string and use it as the iv in your AES algorithm. I can understand why you are wondering how to convert the resulting Data into a String, but you should not do that. Yes, there's a rendition of AES that expects the iv as a string, but there's another rendition that expects an Array<UInt8>. So, just like MartinR said in his answer to your other question, build an array of UInt8 instead, like so:
let iv = Array(Data(base64Encoded: "dfYcSGpvBqyzvkAXkdbHDA==")!)
That resulting iv is an Array<UInt8> (also known as [UInt8]). You can use that with your AES function.
My original discussion about converting Data objects to UTF8 strings is below. But the key message is that you shouldn't try to do so. Just build your array of UInt8 and use that with your library's AES function.
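To make the "keep it as bytes" advice concrete, here is a minimal sketch in Python (used purely for illustration; the helper names decode_iv and decode_key are made up for this example). It decodes the iv and the hex key from the question straight into raw bytes and never builds a string from them:

```python
import base64

# Values copied from the question.
IV_BASE64 = "dfYcSGpvBqyzvkAXkdbHDA=="
KEY_HEX = "dcb04a9e103a5cd8b53763051cef09bc66abe029fdebae5e1d417e2ffc2a07a4"


def decode_iv(iv_base64: str) -> bytes:
    """Decode the base64 iv into raw bytes (hypothetical helper)."""
    return base64.b64decode(iv_base64, validate=True)


def decode_key(key_hex: str) -> bytes:
    """Decode the hex key into raw bytes (hypothetical helper)."""
    return bytes.fromhex(key_hex)


iv = decode_iv(IV_BASE64)
key = decode_key(KEY_HEX)

# Both values stay binary; they are handed to the cipher as byte arrays.
print(len(iv), list(iv))
print(len(key))
```

The iv comes out as 16 bytes and the key as 32 bytes, which is what AES-256 in CBC mode expects.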
Looking at your other question (Convert hex-encoded String to String in Swift 3), you revealed in comments that you were dealing with an AES key. I'm suspicious that we're dealing with a similar issue here (though that was 32 bytes of data, and here we have 16 bytes).
Bottom line, I'd suggest you completely drop this "how do I get a string representation of the data captured in this base-64 string" line of inquiry. If it's an encryption key (or some token or whatever), don't bother trying to represent it as a string. (This is the raison d'être of base-64: to create transmittable string representations of data that isn't a string.)
I'd suggest you step back and describe the broader problem that you are trying to solve. Stop trying to create strings out of these binary payloads. In your code snippet, you successfully create a Data from the base-64 string. The real question, I think, is not "how do I now get a string from that?", but rather "what do I do with this Data?"
We can't answer that question without more context about where you got this data and what it is for.
By the way, my original answer to your question is below.
The problem is that base-64 string translates to 16 bytes of data whose hexadecimal representation is
75f61c48 6a6f06ac b3be4017 91d6c70c
But that payload is not a valid UTF-8 string. The second byte, f6, can never appear in well-formed UTF-8: bytes in the range f5–ff are not allowed at all. Even if f6 were accepted as the lead byte of a multi-byte sequence, every byte after a lead byte must be a continuation byte in the range 80–bf, which the third byte, 1c, is not.
So this simply is not a valid UTF-8 string. Those sites are evidently not detecting invalid UTF-8; most likely they just map each byte to a character in some single-byte encoding.
Perhaps the data in this base-64 string was converted from a string using a different encoding. Or, perhaps it was not originally a string at all. Not all binary payloads have clean string representations. Frankly, this is why we use base-64 representations in the first place, to come up with a text representation of a blob of data that is not a string.
If you provide more information about the source of the data contained in this base-64 string, we might be able to advise you further.
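The claim that these 16 bytes are not valid UTF-8 can be checked with a short sketch, written here in Python for illustration. The is_valid_utf8 helper is made up for this example, and the choice of Windows-1252 is only an assumption about what the decoder websites do, based on the ‘ character in their output:

```python
import base64

raw = base64.b64decode("dfYcSGpvBqyzvkAXkdbHDA==")


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if the bytes form well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(raw.hex())           # the 16 bytes as hex
print(is_valid_utf8(raw))  # strict UTF-8 decoding rejects the payload

# Assumption: the websites map each byte to a character using a
# single-byte encoding such as Windows-1252, which always "succeeds"
# for these bytes, whether or not the result means anything.
as_cp1252 = raw.decode("cp1252")
print(repr(as_cp1252))
```

A single-byte decoding produces some string for almost any input, which is why it says nothing about whether the data was ever text.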
On Android everything is working perfectly. I want to implement the same feature on iOS too, but I am getting different values. Please check the description with images below.
In Java/Android Case:
I tried to decode the base64 string into a byte array in Java like
byte[] data1 = Base64.decode(balance, Base64.DEFAULT);
Output:
In Swift3/iOS Case:
I tried to decode the base64 string into a byte array in Swift like
let data:Data = Data(base64Encoded: balance, options: NSData.Base64DecodingOptions(rawValue: 0))!
let data1:Array = (data.bytes)
Output:
Finally solved:
This is due to signed versus unsigned integers: one side shows each byte as an unsigned value (0 to 255) and the other as a signed value (-128 to 127). Converting the UInt8 array to an Int8 array solves the problem.
let intArray = data1.map { Int8(bitPattern: $0) }
In no case should you try to compare data on two systems the way you just did. That goes for all types, but especially for raw data.
Raw data are NOT presentable without additional context which means any system that does present them may choose how to present them (raw data may represent some text in UTF8 or some ASCII, maybe jpeg image or png or raw RGB pixel data, it might be an audio sample or whatever). In your case one system is showing them as a list of signed 8bit integers while the other uses 8bit unsigned integers for the same thing. Another system might for instance show you a hex string which would look completely different.
As @Larme already mentioned, these look the same, as it is safe to assume that one system uses signed and the other unsigned values. So to convert from signed (Android) to unsigned (iOS) you need to convert negative values as unsigned = 256 + signed, so for instance -55 => 256 + (-55) = 201.
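The signed/unsigned relationship can be sketched in a few lines of Python (illustration only; to_unsigned and to_signed are names invented for this example):

```python
def to_unsigned(signed_bytes):
    """Map signed 8-bit values (-128..127) to unsigned ones (0..255)."""
    # Masking with 0xFF is the same as adding 256 to negative values.
    return [b & 0xFF for b in signed_bytes]


def to_signed(unsigned_bytes):
    """Map unsigned 8-bit values (0..255) to signed ones (-128..127)."""
    return [b - 256 if b > 127 else b for b in unsigned_bytes]


print(to_unsigned([-55, 0, 127, -128]))  # [201, 0, 127, 128]
print(to_signed([201, 0, 127, 128]))     # [-55, 0, 127, -128]
```

Both lists describe the same four bytes; only the way they are displayed differs.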
If you really need to compare the data, it is best to save it to a file as raw data, transfer that file to the other system, and compare the native raw data there with the contents of the file to check whether there really is a difference.
EDIT (from comment):
Printing raw data as a string is a problem, but there are a few ways to do it. Many bytes are not printable as characters: they may be whitespace or reserved control codes, and above all a value of 0, which may occur in the middle of your byte sequence, marks the end of a string in most cases.
So you already have two ways of printing byte by byte, showing the corresponding Int8 or UInt8 values. As described in the comment, converting directly to a string may not be as easy as
let string = String(data: data, encoding: .utf8) // Returns nil if the bytes are not valid UTF-8
One way of converting data to string may be to convert each byte into a corresponding character. Check this code:
let characterSequence = data.map { UnicodeScalar($0) } // Create an array of characters from bytes
let stringArray = characterSequence.map { String($0) } // Create an array of strings from array of characters
let myString = stringArray.reduce("", { $0 + $1 }) // Convert an array of strings to a single string
let myString2 = data.reduce("", { $0 + String(UnicodeScalar($1)) }) // Same thing in a single line
Then to test it I used:
let data = Data(bytes: Array(0...255)) // Generates with byte values of 0, 1, 2... up to 255
let myString2 = data.reduce("", { $0 + String(UnicodeScalar($1)) })
print(myString2)
The printing result is:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþ
Then another popular way is using a hex string. It can be displayed as:
let hexString = data.reduce("", { $0 + String(format: "%02hhx",$1) })
print(hexString)
And with the same data as before the result is:
000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff
I hope this is enough but in general you could do pretty much anything with array of bytes and show them. For instance you could create an image treating bytes as RGB 8-bit per component if it would make sense. It might sound silly but if you are looking for some patterns it might be quite a witty solution.
How can I get the length (not number of bytes) of a string in its UTF-8 encoded form (PHP's mb_strlen(.., 'UTF-8') equivalent)?
I tried string.characters.count but it does not return the correct length for certain characters like an emoji.
Example:
let s = "✌🏿️"
print(s.characters.count) // prints 2, but should print 3.
You can access the UTF-8 encoding of a string with the .utf8 property. Use count on that to get the number of UTF-8 code units in the string:
let string = "\u{1f603}" // One of the smiley face emojis...
print(string.utf8.count) // prints "4"
Based on your edited question, what you are probably looking for is the number of UnicodeScalars used to encode the string. You access that with the unicodeScalars property:
let s = "✌🏿️"
print(s.unicodeScalars.count) // prints 3
The reason everyone is confused is because your original question asks for the length of the string in its UTF-8 encoded form. The answer that you actually wanted had nothing to do with the length of the string in its UTF-8 encoded form.
I think you are confused about the difference between Unicode "extended grapheme clusters", Unicode code points, and the various encodings (like UTF-8) that can be used to encode a Unicode code point.
A Character in Swift represents what Unicode calls an "extended grapheme cluster". That is to say, it is a single visual character, even if it is made up of multiple Unicode code points.
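A Unicode code point is a number assigned to a single symbol or modifier; code points range from U+0000 to U+10FFFF, so they fit in 21 bits. Two or more Unicode code points can combine to create a single Character. In Swift, a code point (strictly, a Unicode scalar value, which excludes the surrogate range) is represented by the UnicodeScalar type, which stores it as a 32-bit value.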
When it comes time to store a string, or send it over the internet, or otherwise turn it into data that is represented by bytes, you have to decide how to encode it. There are all kinds of encodings, the most common is probably UTF-8, which encodes the string as a series of UInt8 values.
That's just a brief snippet of the difference between the three concepts. It is actually a really interesting subject and if you Google some of those terms, you will find a lot more good information.
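The different "lengths" of the example string can be compared with a small sketch, given here in Python for illustration (the three function names are invented for this example). Python's len counts code points, which corresponds to unicodeScalars.count in Swift; the standard library has no grapheme-cluster count, so that one is left out:

```python
# The string from the question, written with escapes:
# VICTORY HAND, a skin tone modifier, and VARIATION SELECTOR-16.
s = "\u270c\U0001f3ff\ufe0f"


def code_point_count(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)


def utf8_byte_count(text: str) -> int:
    """Number of UTF-8 code units (bytes) needed to encode the string."""
    return len(text.encode("utf-8"))


def utf16_unit_count(text: str) -> int:
    """Number of UTF-16 code units needed to encode the string."""
    return len(text.encode("utf-16-le")) // 2


print(code_point_count(s))  # 3
print(utf8_byte_count(s))   # 10
print(utf16_unit_count(s))  # 4
```

The same text therefore has length 3, 10 or 4 depending on which unit is being counted, which is why the question needs to say which one it means.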
let str = "ačŘ"
print("str has \(str.characters.count) characters") // 3
print("and \(str.utf8.count) bytes as encoded in UTF-8") // 5
update (based on your notes)
let s = "✌🏿️"
let arr:[UInt8] = [226, 156, 140, 240, 159, 143, 191, 239, 184, 143]
var arrCchar = arr.map { (uint8) -> Int8 in
Int8(bitPattern: uint8)
}
arrCchar += [0] // to be null terminated
let str = String.fromCString(&arrCchar)
print(str) // Optional("✌🏿️")
s == str // TRUE !!!!
by characters
s.characters.forEach { (c) -> () in
let str = String(c)
print(str.utf8.map{$0}, "which represents character: ", c)
str.unicodeScalars.forEach({ (u) -> () in
print("composed from unicode scalar(s): ", u.debugDescription)
})
}
/*
[226, 156, 140] which represents character: ✌
composed from unicode scalar(s): "\u{270C}"
[240, 159, 143, 191, 239, 184, 143] which represents character: 🏿️
composed from unicode scalar(s): "\u{0001F3FF}"
composed from unicode scalar(s): "\u{FE0F}"
*/
Every character in Unicode can be represented by one or more Unicode scalars. A Unicode scalar is a unique 21-bit number (and name) for a character or modifier, such as U+0061 for LATIN SMALL LETTER A ("a"), or U+1F425 for FRONT-FACING BABY CHICK ("🐥").
When a Unicode string is written to a text file or some other storage, these unicode scalars are encoded in one of several Unicode-defined formats. Each format encodes the string in small chunks known as code units. These include the UTF-8 format (which encodes a string as 8-bit code units) and the UTF-16 format (which encodes a string as 16-bit code units).
(Copied from Apple's Swift Programming Language guide.)
I am trying to convert an ISO 8859-1 encoded string to UTF-8.
The following function works with my test data, which contains German umlauts, but I'm not quite sure what source encoding the rune(b) cast assumes. Is it assuming some kind of default encoding, e.g. ISO 8859-1, or is there any way to tell it what encoding to use?
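func toUtf8(iso8859_1_buf []byte) string {
    // Start with an empty buffer: make([]byte, n) would leave n zero bytes
    // in front of the result. A Latin-1 byte needs at most 2 bytes in UTF-8.
    buf := bytes.NewBuffer(make([]byte, 0, len(iso8859_1_buf)*2))
    for _, b := range iso8859_1_buf {
        r := rune(b)
        buf.WriteRune(r)
    }
    return buf.String()
}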
rune is an alias for int32, and when it comes to encoding, a rune is assumed to have a Unicode character value (code point). So the value b in rune(b) should be a unicode value. For 0x00 - 0xFF this value is identical to Latin-1, so you don't have to worry about it.
Then you need to encode the runes into UTF-8. This encoding is done simply by converting a []rune to a string.
This is an example of your function without using the bytes package:
func toUtf8(iso8859_1_buf []byte) string {
buf := make([]rune, len(iso8859_1_buf))
for i, b := range iso8859_1_buf {
buf[i] = rune(b)
}
return string(buf)
}
The effect of
r := rune(expression)
is:
Declare variable r with type rune (alias for int32).
Initialize variable r with the value of expression.
No (re)encoding is involved; the conversion only changes the type. If a different source encoding had to be handled, that re-encoding would have to be written explicitly in code. Luckily, in this case no re-encoding is necessary: the first 256 Unicode code points coincide with ISO 8859-1, in the same way that the first 128 coincide with ASCII.
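That ISO 8859-1 byte values and Unicode code points coincide for 0x00 to 0xFF can be checked with a short sketch, written in Python for illustration (latin1_to_utf8 is a name invented for this example). It mirrors the loop above: each byte is taken as a code point, and the result is then encoded as UTF-8:

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Treat each byte as the code point of the same value, then encode as UTF-8."""
    text = "".join(chr(b) for b in raw)
    return text.encode("utf-8")


umlauts = b"\xe4\xf6\xfc"  # the ISO 8859-1 bytes for the three lowercase German umlauts
print(latin1_to_utf8(umlauts))

# The per-byte mapping agrees with the built-in ISO 8859-1 codec
# for every possible byte value.
all_bytes = bytes(range(256))
print(latin1_to_utf8(all_bytes) == all_bytes.decode("latin-1").encode("utf-8"))
```

For any other single-byte source encoding the per-byte shortcut would not hold, and a real conversion table would be needed.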
I wanted to know how writeInt treats a 32-bit unsigned or signed integer passed to it.
It is easy to understand how it works with a hexadecimal number: Util.Print prints the corresponding ASCII characters.
0x41424344 is broken down into four 1-byte characters: A, B, C and D.
It seems to be different when a decimal integer is passed to writeInt.
for instance,
var test: ByteArray = new ByteArray();
test.writeInt(0x41424344); // prints ABCD
test.writeInt(2590463591); // prints gVg
test.writeInt(1119885898); // prints BÀJ
I am unclear how the Util.Print function treats the integers written into the ByteArray by writeInt.
The characters, gVg do not correspond to the integer number, 2590463591
According to the definition of writeInt here:
http://livedocs.adobe.com/livecycle/es/sdkHelp/common/langref/flash/utils/ByteArray.html#writeInt%28%29
It states that it works with a 32 Bit Signed Integer.
If someone can elaborate over how it translates the integers to characters, it would be helpful.
EDIT: And how does it handle negative integers?
For instance,
test.writeInt(-11338743); // prints ÿRü
So,
-11338743 = 0xFF52FC09
is that correct?
Thanks.
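writeInt stores the four bytes of the 32-bit value, most significant byte first (big-endian is the ByteArray default), and what you see printed is each of those bytes interpreted as a character. Bytes that have no printable character, such as 0x9A or 0x1A, simply do not show up:
dec           hex          chars
1094861636  = 0x41424344 = ABCD
2590463591  = 0x9A675667 = gVg
1119885898  = 0x42C01A4A = BÀJ
-11338743   = 0xFF52FC09 = ÿRü
Also, note that int and unsigned int are written with different functions: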
var test:ByteArray = new ByteArray();
test.writeInt(0x41424344);
test.writeUnsignedInt(0x41424344);
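The byte breakdown can be reproduced with a small sketch, given here in Python for illustration. The helper names are invented, big-endian order is the ByteArray default, and decoding with ISO 8859-1 is only an assumption about how Util.Print turns bytes into characters:

```python
import struct


def write_int_bytes(value: int) -> bytes:
    """Return the 4 bytes a 32-bit big-endian integer write would store."""
    # Keep only the low 32 bits, so negative values use two's complement.
    return struct.pack(">I", value & 0xFFFFFFFF)


def as_signed_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


for n in (0x41424344, 2590463591, 1119885898, -11338743):
    stored = write_int_bytes(n)
    # Assumption: each byte is shown as the ISO 8859-1 character of that value.
    print(n, stored.hex(), repr(stored.decode("latin-1")))

# 2590463591 does not fit in a signed 32-bit integer; the same four bytes
# read back as a signed value give a negative number.
print(as_signed_32(2590463591))
```

So -11338743 is indeed stored as FF 52 FC 09, and the final 0x09 is a tab character, which is why only ÿRü is visible.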
I am extracting the metadata of a song using the following code. How can I convert the byte array (buf) to a string? Please help. Thanks in advance.
String mint = httpConnection.getHeaderField("icy-metaint");
int b = 0;
int count =0;
while(count++ < length){
b = inputStream.read();
}
int metalength = ((int)b)*16;
if(metalength <= 0)
return;
byte buf[] = new byte[metalength];
inputStream.read(buf,0,buf.length);
1). Read bytes from the stream:
// use net.rim.device.api.io.IOUtilities
byte[] data = IOUtilities.streamToBytes(inputStream);
2). Create a String from the bytes:
String s = new String(data, "UTF-8");
This assumes you know which encoding the text was encoded with before the server sent it. In the example above the encoding is UTF-8. BlackBerry supports the following character encodings:
* "ISO-8859-1"
* "UTF-8"
* "UTF-16BE"
* "US-ASCII"
The default encoding is "ISO-8859-1". So when you use String(byte[] data) constructor it is the same as String(byte[] data, "ISO-8859-1").
If you don't know which encoding the server uses, I'd recommend trying UTF-8 first, because it has become almost the default for servers. Also note that the server may send the encoding in an HTTP header, so you can extract it from the response. However, I have seen a lot of servers that put "UTF-8" in the header while actually using ISO-8859-1 or even ASCII for the data.
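The "try the declared encoding first, then fall back" idea can be sketched as follows, in Python for illustration. The decode_text helper and its declared parameter are made up for this example; falling back to ISO-8859-1 is a design choice, picked because it accepts every byte value and therefore never fails:

```python
def decode_text(data: bytes, declared: str = "utf-8") -> str:
    """Decode with the declared encoding, falling back to ISO-8859-1."""
    try:
        return data.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers an unknown encoding name in the header.
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return data.decode("iso-8859-1")


print(decode_text("caf\u00e9".encode("utf-8")))   # really UTF-8
print(decode_text(b"caf\xe9"))                    # mislabelled ISO-8859-1 data
print(decode_text(b"caf\xe9", "no-such-charset")) # unknown charset name
```

The fallback always yields a string, but it may be the wrong one if the data was in yet another encoding, so the result is only as good as the guess.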
String has a constructor that accepts a byte array that you can use for this.
See e.g. http://java.sun.com/javame/reference/apis/jsr139/java/lang/String.html
As @Heiko mentioned, you can create the string directly using the constructor. This applies to BlackBerry Java too:
byte[] array = {1,2,3,4,5};
String str = new String(array);