Some UTF-8 strings base64-encoded by PHP cannot be decoded using the iOS base64 library?

Here is one piece of Chinese utf-8 text which is encoded by PHP on the server-side, but when I decode it with iOS, it returns null.
I also tried this online tool where text can be decoded well.
NSData *decodedData = [[NSData alloc] initWithBase64EncodedString:content options:0];
content = [[NSString alloc] initWithData:decodedData encoding:NSUTF8StringEncoding];
5aW96ZuF77yM5ZKx5p2l5LiA5L+X55qE77yM5pS56Ieq5Lic5Y2X6KW/5YyX6aOO44CCCuS4juS9oOebuOmAou+8jOWFqOaYr+acuue8mOW3p+WQiOOAguWPr+Wtpui1t+adpeWNtOW/g+aGlOaCtOOAggrmgLvmmK/ovpPkuoborqnlho3ljrvlrabvvIzlrabkuobo
Here is the test code for debugging this issue in Xcode:
NSString *base64String = @"5aW96ZuF77yM5ZKx5p2l5LiA5L+X55qE77yM5pS56Ieq5Lic5Y2X6KW/5YyX6aOO44CCCuS4juS9oOebuOmAou+8jOWFqOaYr+acuue8mOW3p+WQiOOAguWPr+Wtpui1t+adpeWNtOW/g+aGlOaCtOOAggrmgLvmmK/ovpPkuoborqnlho3ljrvlrabvvIzlrabkuobo";
// Decode the base64 string, then interpret the resulting bytes as UTF-8.
NSData *decodedData = [[NSData alloc] initWithBase64EncodedString:base64String options:0];
NSString *content = [[NSString alloc] initWithData:decodedData encoding:NSUTF8StringEncoding];
NSLog(@"%@", content);

Your revised question features a base64 string of:
5aW96ZuF77yM5ZKx5p2l5LiA5L+X55qE77yM5pS56Ieq5Lic5Y2X6KW/5YyX6aOO44CCCuS4juS9oOebuOmAou+8jOWFqOaYr+acuue8mOW3p+WQiOOAguWPr+Wtpui1t+adpeWNtOW/g+aGlOaCtOOAggrmgLvmmK/ovpPkuoborqnlho3ljrvlrabvvIzlrabkuobo
This string has a length that is a multiple of four bytes, so the lack of the =/== terminator at the end is not the problem. And, in fact, initWithBase64EncodedString decodes it successfully:
e5a5bde9 9b85efbc 8ce592b1 e69da5e4 b880e4bf 97e79a84 efbc8ce6 94b9e887
aae4b89c e58d97e8 a5bfe58c 97e9a38e e380820a e4b88ee4 bda0e79b b8e980a2
efbc8ce5 85a8e698 afe69cba e7bc98e5 b7a7e590 88e38082 e58fafe5 ada6e8b5
b7e69da5 e58db4e5 bf83e686 94e682b4 e380820a e680bbe6 98afe8be 93e4ba86
e8aea9e5 868de58e bbe5ada6 efbc8ce5 ada6e4ba 86e8
The issue here is that this appears to not be a valid UTF8 string. In fact, when I run it through the http://base64decode.net site you referenced in your original question, it is also unable to convert it to a UTF8 string (I notice that your screen snapshots are using a different converter web site). When I ran it through another converter, it converted what it could, but then complained about the character following 学了 (which is, coincidentally, the character at which your base64 converter web site stopped, too).
By the way, the UTF8 representation of 了 is e4 ba 86. And you'll see that near the end of the hex representation of your base64 string, followed by one more byte, e8. The thing is, e8, by itself, is not a valid UTF8 character. It almost looks like you took a base64 encoded string and just grabbed the first 200 characters, disregarding whether that cut a UTF8 character off in the middle or not.
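The effect described above can be reproduced in a few lines of Python. This is only a sketch with a made-up sample string (not the string from the question): it shows that a truncated base64 string whose length is still a multiple of four decodes to bytes without complaint, while the strict UTF-8 step fails because the last character was cut in half.

```python
import base64

# Hypothetical sample: one ASCII byte followed by 3-byte Chinese characters,
# so character boundaries do not line up with base64's 3-byte blocks.
text = "a学了学了"
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")

# Keep only the first 8 base64 characters (still a multiple of four).
truncated = encoded[:8]
raw = base64.b64decode(truncated)  # base64 decoding itself succeeds

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False  # the trailing character is incomplete

# Lenient decoding drops the incomplete trailing bytes instead of failing.
lenient = raw.decode("utf-8", errors="ignore")
print(strict_ok, lenient)
```

Dropping the incomplete tail only hides the symptom; the better fix is to stop truncating the encoded data in the first place.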
The original question featured a base64 string of:
5aW96ZuF77yM5ZKx5p2l5LiA5L+X55qE77yM5pS56Ieq5Lic5Y2X6KW/5YyX6aOO44CCCuS4juS9oOebuOmAou+8jOWFqOaYr+acuue8mOW3p+WQiOOAguWPr+Wtpui1t+adpeWNtOW/g+aGlOaCtOOAggrmgLvmmK/
That is not valid base64. It should be a multiple of four bytes in length, but that is only 163 characters, which is missing a character. Either your server isn't properly terminating the base64 string, or it got cut off for some reason.
For example, if I add a = to get it up to 164 characters, I get a valid base64 string:
5aW96ZuF77yM5ZKx5p2l5LiA5L+X55qE77yM5pS56Ieq5Lic5Y2X6KW/5YyX6aOO44CCCuS4juS9oOebuOmAou+8jOWFqOaYr+acuue8mOW3p+WQiOOAguWPr+Wtpui1t+adpeWNtOW/g+aGlOaCtOOAggrmgLvmmK/=
Adding the = would be the right solution if the server simply neglected to terminate the base64 string properly. Anyway, that can be base64-decoded to:
好雅,咱来一俗的,改自东南西北风。
与你相逢,全是机缘巧合。可学起来却心憔悴。
总是
Is that what you were expecting?
Perhaps you should take a look at your base64 routine on your server? Or if it's getting truncated, look at how you are receiving it and compare the server's original base64 string length to what you have here.
For information about adding = or == to the end of a base 64 encoded string, see the base64 wikipedia page.
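As a rough illustration of the padding rule, here is a small Python sketch. The helper name pad_base64 and the sample value are made up for this example; the idea is just to append "=" until the length is a multiple of four before decoding.

```python
import base64
import binascii


def pad_base64(s: str) -> str:
    """Append the '=' characters needed to reach a multiple of four."""
    return s + "=" * (-len(s) % 4)


unpadded = "YWI"  # base64 for b"ab" with its trailing '=' stripped

try:
    base64.b64decode(unpadded)
    accepted = True
except binascii.Error:
    accepted = False  # Python rejects the unpadded form

decoded = base64.b64decode(pad_base64(unpadded))
print(accepted, decoded)
```

Padding only helps when the terminator alone is missing. If the string was cut off mid-stream, padding makes it decodable but the tail of the data is still lost.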

@Rob is right.
See also the question "NSData won't accept valid base64 encoded string".
But if your server does not return valid base64 terminated with "=" or "==", then you need to use external methods to perform the base64 decoding. Those methods can decode even if the base64 string does not have the "=" symbol at the end.

Related

How to encode a STRING variable into a given code page

I've got a string variable containing a text that I need to encode and write to a file, in UTF-16LE code page.
Currently the following code generates a UTF-8 file and I don't see any option in the statement OPEN DATASET to generate the file in UTF-16LE.
REPORT zmyprogram.
DATA(filename) = `/tmp/myfile`.
OPEN DATASET filename IN TEXT MODE ENCODING DEFAULT FOR OUTPUT.
TRANSFER 'HELLO WORLD' TO filename.
CLOSE DATASET filename.
I guess one solution is to first encode the string in memory, then write the encoded bytes to the file.
Generally speaking, how can I encode a string of characters into a given code page, in memory?
In the first part, I explain how to encode a string of characters into a given code page (all is done in memory), and in the second part, I explain specifically how to write files to the application server in a given code page.
General way (all in memory)
If a string of characters (type STRING) has to be encoded, the result has to be stored in a string of bytes, which corresponds to the built-in data type XSTRING.
There are several possibilities which depend on the ABAP version:
Since 7.53, use the class CL_ABAP_CONV_CODEPAGE:
DATA(xstring) = cl_abap_conv_codepage=>create_out( codepage = `UTF-16LE` )->convert( source = `ABCDE` ).
Since 7.02, use the class CL_ABAP_CODEPAGE:
DATA xstring TYPE xstring.
xstring = cl_abap_codepage=>convert_to( source = `ABCDE` codepage = `UTF-16LE` ).
Before 7.02, use the class CL_ABAP_CONV_OUT_CE (documentation provided with the class):
First, instantiate the conversion object, using an SAP code page number instead of the ISO name (the list of values is shown below):
DATA: conv TYPE REF TO CL_ABAP_CONV_OUT_CE, xstring TYPE xstring.
conv = CL_ABAP_CONV_OUT_CE=>CREATE( encoding = '4103' ). "4103 = utf-16le
Then encode the string and retrieve the bytes encoded:
conv->RESET( ).
conv->WRITE( data = `ABCDE` ).
xstring = conv->GET_BUFFER( ).
Alternatively, instead of using RESET, WRITE and GET_BUFFER, you can use the method CONVERT, which was added in 6.40 and backported:
conv->CONVERT( EXPORTING data = `ABCDE` IMPORTING buffer = xstring ).
With the class CL_ABAP_CONV_OUT_CE, you need to use the number of the SAP Code Page, not the ISO name. Here are the most common SAP code pages and their equivalent ISO names:
1100: ISO-8859-1
1101: US-ASCII
1160: Windows-1252 ("ANSI")
1401: ISO-8859-2
4102: UTF-16BE
4103: UTF-16LE
4104: UTF-32BE
4105: UTF-32LE
4110: UTF-8
Etc. (the possible values are defined in the table TCP00A, in lines with column CPATTRKIND = 'H').
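To make the effect of these code pages concrete outside ABAP, here is a small Python sketch. The mapping from SAP code page numbers to Python codec names and the helper encode_for_sap_codepage are assumptions made for this illustration, based only on the ISO names listed above; they are not part of any SAP API.

```python
# Assumed mapping: SAP code page number -> Python codec for the listed ISO name.
SAP_TO_PYTHON_CODEC = {
    "1100": "iso-8859-1",
    "1101": "ascii",
    "1160": "cp1252",
    "1401": "iso-8859-2",
    "4102": "utf-16-be",
    "4103": "utf-16-le",
    "4104": "utf-32-be",
    "4105": "utf-32-le",
    "4110": "utf-8",
}


def encode_for_sap_codepage(text: str, sap_codepage: str) -> bytes:
    """Encode text into the bytes of the given code page (hypothetical helper)."""
    return text.encode(SAP_TO_PYTHON_CODEC[sap_codepage])


# UTF-16LE stores each of these characters as two bytes, low byte first.
print(encode_for_sap_codepage("ABCDE", "4103").hex())

# A character missing from the target code page raises an error, much like
# cx_sy_conversion_codepage does on the ABAP side.
try:
    encode_for_sap_codepage("ś", "1100")
except UnicodeEncodeError as exc:
    print("not representable:", exc.reason)
```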
 
Writing a file on the application server in a given code page
In ABAP, OPEN DATASET can directly specify the target code page. Most code pages are supported, including UTF-8, but not the other UTF encodings (code pages 41xx); those can be written only with the solution explained in 2.3 below (by first encoding in memory).
2.1) IN TEXT MODE ENCODING ...
Possible ENCODING values:
UTF-8: in this mode, it's possible to add the Byte Order Mark if needed, via the option WITH BYTE-ORDER MARK.
DEFAULT: will be UTF-8 in a SAP "Unicode" system (that you can check via the menu System > Status > Unicode System Yes/No), NON-UNICODE otherwise.
NON-UNICODE: will depend on the current ABAP linguistic environment; for language English, it's the character encoding iso-8859-1, for language Polish, it's the character encoding iso-8859-2, etc. (the equivalences are shown in table TCP0C.)
Example in ABAP version 7.52 to write to UTF-8 with the byte order mark:
REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_utf_8`.
OPEN DATASET filename IN TEXT MODE ENCODING UTF-8 WITH BYTE-ORDER MARK FOR OUTPUT.
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
Example in ABAP version 7.52 to write to iso-8859-2 (Polish language here):
REPORT zmyprogram.
SET LOCALE LANGUAGE 'L'. " Polish
DATA(filename) = `/tmp/dataset_nonunicode_pl`.
OPEN DATASET filename IN TEXT MODE ENCODING NON-UNICODE FOR OUTPUT.
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
2.2) IN LEGACY TEXT MODE CODE PAGE ...
Use any code page number except code pages 41xx (i.e. UTF-8 and other UTF; see workaround in 2.3 below).
Example in ABAP version 7.52 to write to iso-8859-2 (code page 1401) :
REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_iso_8859_2`.
OPEN DATASET filename IN LEGACY TEXT MODE CODE PAGE '1401' FOR OUTPUT. " iso-8859-2
TRY.
TRANSFER `Witaj świecie` TO filename.
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.
2.3) UTF = general way + IN BINARY MODE
Example in ABAP version 7.52:
REPORT zmyprogram.
TRY.
DATA(xstring) = cl_abap_codepage=>convert_to( source = `Witaj świecie` codepage = `UTF-16LE` ).
CATCH cx_sy_conversion_codepage INTO DATA(lx).
" Character not supported in language code page
BREAK-POINT.
ENDTRY.
DATA(filename) = `/tmp/dataset_utf_16le`.
OPEN DATASET filename IN BINARY MODE FOR OUTPUT.
TRANSFER xstring TO filename.
CLOSE DATASET filename.
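The same two-step pattern (encode in memory, then write the bytes in binary mode) looks like this in Python. This is only a sketch for comparison: the temporary directory and file name are placeholders, and no byte order mark is written.

```python
import os
import tempfile

# Step 1: encode the text in memory into UTF-16LE bytes.
payload = "Witaj świecie".encode("utf-16-le")

# Step 2: write the bytes unchanged, in binary mode (placeholder path).
path = os.path.join(tempfile.mkdtemp(), "dataset_utf_16le")
with open(path, "wb") as f:
    f.write(payload)

# Read the file back to confirm the bytes survive the round trip.
with open(path, "rb") as f:
    round_trip = f.read().decode("utf-16-le")
print(len(payload), round_trip)
```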

Ruby How to convert back binary string from smsc

My app works with an SMSC, and I need to process each SMS before it is sent.
I sent this string from a mobile phone:
"hello this is test"
When I check the SMSC, I get this as the binary string of my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
The encoding of this string is:
<Encoding:ASCII-8BIT>
I know that this userData is probably a binary string in GSM encoding.
So how can I get the clear-text string back from userData?
This question is about English text, because for Hebrew I can get the
string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
but for English I get this error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
I tried to detect the encoding of the binary string with ICU, and I got
"ISO-8859-1" with detected language 'PT', which is very strange because my languages are English and Hebrew.
Anyway, I got lost in the encoding details, so I tried encoding to every entry in Encoding.list,
but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone who has the same issue: I found the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
For ASCII characters packed into the binary string, you need this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
Remember, I sent these words in the SMS:
"hello this is test"
and in binary they became:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage with every encoding is that the ASCII characters are sent in the GSM 7-bit encoding: each character takes only 7 bits and the characters are packed back to back, whereas every other encoding I tried uses at least 8 bits per character. Unpacking those 7-bit groups is what the code above actually does.
But this applies only to the ASCII character set.
For other languages, such as Hebrew in my case, the SMS is sent as UCS-2,
so this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the binary string in an array.
That's all for now.
If anybody wants to explain exactly what happens in the code for the ASCII character set, be my guest.
Shmulik
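For readers who want the 7-bit unpacking spelled out, here is a sketch of the same idea in Python. The function names are made up for this example. It treats the bytes as one little-endian number and reads it 7 bits at a time, which is what the reverse / unpack / scan chain in the Ruby one-liner amounts to. It also assumes each septet maps straight to ASCII, which holds for basic Latin letters, digits and spaces but not for every code point of the GSM default alphabet.

```python
from typing import Optional


def unpack_gsm7(hex_string: str, septet_count: Optional[int] = None) -> str:
    """Unpack GSM 7-bit packed user data into text (ASCII-compatible subset)."""
    data = bytes.fromhex(hex_string)
    # The first character sits in the lowest 7 bits of the first byte,
    # so read the whole buffer as a little-endian integer.
    bits = int.from_bytes(data, "little")
    if septet_count is None:
        # Assumption: no separate length field, so use every full septet.
        septet_count = len(data) * 8 // 7
    septets = [(bits >> (7 * i)) & 0x7F for i in range(septet_count)]
    return "".join(chr(s) for s in septets)


def decode_ucs2(hex_string: str) -> str:
    """Decode UCS-2 user data, stored big-endian, into text."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


print(unpack_gsm7("c8329bfd06d1d1e939283d07d1cb733a"))
print(decode_ucs2("05e905dc05d505dd"))
```

For the sample above this yields "Hello this is test", with a capital H because the first septet is 0x48. When the byte length is a multiple of seven, the last septet may be padding, which is why the sketch accepts an explicit septet_count.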

How do I encode a NSString to send it to a PHP iOS

I need to send a NSString to a PHP file that goes into a MySQL database.
The problem that I have is with special characters like "é". When I get the string (from the Facebook SDK for iOS) it comes like this: "Thenáme Thesurnamé", for example.
I send it to a PHP on a server using unicode as charset (I also tried with utf8), but in the database it appears with "é" instead of "é".
The encoding of the database is utf8_unicode_ci
You can always use HTML ampersand entities (for example &eacute; for é) to encode such characters.
Try this code:
NSString *resStr = [[NSString alloc] initWithCString:[srcStr cStringUsingEncoding:[NSString defaultCStringEncoding]] encoding:NSMacOSRomanStringEncoding];
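A symptom like this is often what you see when UTF-8 bytes are read back in a single-byte charset somewhere between the app, PHP and MySQL. Whether that is the cause here is an assumption, and so is the choice of Latin-1 as the wrong charset. This Python sketch only shows what such a mix-up looks like and that it is reversible as long as no bytes were lost.

```python
original = "Thenáme Thesurnamé"

# UTF-8 bytes misread as Latin-1: each accented letter becomes two characters.
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the mistake restores the text, provided the bytes are intact.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired)
```

If the stored value looks like the garbled line, check the charset of the database connection and of the request body rather than converting the string on the device.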

Read ZIP with umlaut in filename

A proper ZIP is encoded with code page 437. However this code page is not supported by iOS. Thus I can't extract ZIP files that contain files and folders with special characters like ä, ö or ü.
Objective-Zip and zipzap convert the filename to nil, which makes the file unreadable. ZipKit at least converts the umlauts to a question mark. The file can be accessed, but it still looks weird. Is there a way to access the original, CP437 encoded filenames in iOS?
With zipzap you can specify a non-UTF8 encoding for filename interpretation.
In the 8.0 API:
ZZArchive* archive = [[ZZArchive alloc]
    initWithURL:URL
    options:@{ ZZOpenOptionsEncodingKey:
        @(CFStringConvertEncodingToNSStringEncoding(
            kCFStringEncodingDOSLatinUS)) }
    error:nil];
In the older API:
ZZArchive* archive = [[ZZArchive alloc]
initWithContentsOfURL:URL
encoding:CFStringConvertEncodingToNSStringEncoding(
kCFStringEncodingDOSLatinUS)];
Well, it's just a code page. Each byte is strictly defined as 1 character, so it shouldn't be hard to write up a simple function to convert it to Unicode byte by byte. All of the code points are listed on the wikipedia page you linked (e.g. 0x81 == \u00FC).
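The byte-by-byte mapping is easy to check in Python, which ships a cp437 codec. This is only an illustration of the mapping itself, not iOS code, and the sample file name is made up.

```python
# Raw file name bytes as they might appear in a CP437-encoded ZIP entry.
raw_name = b"\x84\x94\x81.txt"

# Each byte maps to exactly one character in code page 437.
name = raw_name.decode("cp437")
print(name)

# The single code point mentioned above: 0x81 is U+00FC.
print(hex(ord(bytes([0x81]).decode("cp437"))))
```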

Strange character before pound symbol in Titanium Studio

In Titanium Studio, I am storing a one character value in an SQLite database (which uses UTF-8 encoding). When I store a pound symbol (£), it stores fine, but when I read it back, I get ¬£ instead. Strangely enough, the string length still reports to be 1, in spite of two characters being visible. The main problem is that this character forms part of a filename that gets sent to a Windows Server. So, while in Titanium, despite the extra character, everything works, when the filename gets sent to Windows, we get another strange character. I tried converting the character using Ti.Buffer, but when I decode, I still get the same characters back.
var tipo_v='';
var buf = Ti.createBuffer({length:1024});
var l = Ti.Codec.encodeString({
source: Vtipo_visita,
dest: buf,
});
buf.length= l;
tipo_v = Ti.Codec.decodeString({
source: buf,
charset: Ti.Codec.CHARSET_ASCII
});
The variable Vtipo_visita has the ¬£ value. After the call to decodeString(), tipo_v has the value √Ǭ.
I also tried using CHARSET_ISO_LATIN_1, but it didn't make any difference. How can I get this character to display correctly without the extra character in front?
As a final note, I found that simply doing
String.fromCharCode(163)
outputs the two characters in the Debugger, instead of just one. Thanks for any suggestions.
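One possible reading of the symptom, offered as an assumption rather than a diagnosis: the pair ¬£ is exactly what the two UTF-8 bytes of £ look like when they are displayed as Mac OS Roman. That would fit a string that really has length 1 but is rendered by a viewer using the wrong charset. The Python sketch below only demonstrates that byte-level coincidence.

```python
pound = chr(163)  # same code point as String.fromCharCode(163)

utf8_bytes = pound.encode("utf-8")         # two bytes: 0xC2 0xA3
misread = utf8_bytes.decode("mac_roman")   # each byte shown as its own character

print(len(pound), utf8_bytes.hex(), misread)
```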
