Xcode - UTF-8 String Encoding - ios

I have a strange problem encoding my String
For example:
NSString *str = @"\u0e09\u0e31\u0e19\u0e23\u0e31\u0e01\u0e04\u0e38\u0e13";
NSString *utf = [str stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"utf: %@", utf);
This worked perfectly in log
utf: ฉันรักคุณ
But, when I try using my string that I parsed from JSON with the same string:
//str is string parse from JSON
NSString *str = [spaces stringByReplacingOccurrencesOfString:@"U" withString:@"u"];
NSLog(@"str: %@", str);
NSString *utf = [str stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"utf: %@", utf);
This didn't work in log
str: \u0e09\u0e31\u0e19\u0e23\u0e31\u0e01\u0e04\u0e38\u0e13
utf: \u0e09\u0e31\u0e19\u0e23\u0e31\u0e01\u0e04\u0e38\u0e13
I have been searching for an answer for hours but still have no clue.
Any help would be very much appreciated! Thanks!

The string returned by JSON is actually different: it contains literal backslash characters (for each "\" you see when printing out the JSON string, the equivalent string literal would be @"\\").
In contrast, your manually created string already consists of "ฉันรักคุณ" from the beginning. You do not insert backslash characters; instead, @"\u0e09" (et al.) is compiled into a single code point.
You could replace this line
NSString *utf = [str stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
with this line
NSString *utf = str;
and your example output would not change. The stringByReplacingPercentEscapesUsingEncoding: refers to a different kind of escaping. See here about percent encoding.
What you actually need to do is parse the string for string representations of Unicode code points. Here is a link to one potential solution: Using Objective C/Cocoa to unescape unicode characters. However, I would advise you to check the JSON library you are using (if you are using one); it likely provides some way to handle this for you transparently. JSONKit, for example, does.
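As a rough illustration of what "parsing the string for code points" means, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The names hex_value and unescape_u are hypothetical, not part of any library. It assumes the input is a C string containing literal backslash-u sequences, handles only code points up to U+FFFF, and does not combine surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: replaces each literal \uXXXX (or \UXXXX) in `in`
 * with the UTF-8 bytes of that code point and copies everything else
 * unchanged. `out` needs room for strlen(in) + 1 bytes, because a
 * six-character escape never produces more than three bytes. */
static void unescape_u(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    while (*in) {
        int h0, h1, h2, h3;
        /* Short-circuit evaluation stops at the terminator, so no byte
         * past the end of the string is read. */
        if (in[0] == '\\' && (in[1] == 'u' || in[1] == 'U') &&
            (h0 = hex_value(in[2])) >= 0 && (h1 = hex_value(in[3])) >= 0 &&
            (h2 = hex_value(in[4])) >= 0 && (h3 = hex_value(in[5])) >= 0) {
            unsigned cp = (unsigned)((h0 << 12) | (h1 << 8) | (h2 << 4) | h3);
            if (cp < 0x80) {                 /* one-byte UTF-8 */
                *out++ = (char)cp;
            } else if (cp < 0x800) {         /* two-byte UTF-8 */
                *out++ = (char)(0xC0 | (cp >> 6));
                *out++ = (char)(0x80 | (cp & 0x3F));
            } else {                         /* three-byte UTF-8 */
                *out++ = (char)(0xE0 | (cp >> 12));
                *out++ = (char)(0x80 | ((cp >> 6) & 0x3F));
                *out++ = (char)(0x80 | (cp & 0x3F));
            }
            in += 6;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

The resulting buffer could then be handed to [NSString stringWithUTF8String:]. A JSON parser that decodes escapes for you remains the simpler route.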

Related

Copyright/Registered symbol encoding not working

I’ve developed an iOS app in which we can send emojis from iOS to a web portal and vice versa. All emojis sent from iOS to the web portal display perfectly except “©” and “®”.
Here is the emoji encoding piece of code.
NSData *data = [messageBody dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodedString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
// This code returns \251 and \256 as the escapes for the copyright and registered symbols. These are not standard \uXXXX escapes, so they do not display on the web portal.
So what should I do to convert them to standard Unicode escapes?
Test run:
messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
// The encoded string I get from the above encoding is
Copy right symbol : \\251 AND Registered Mark symbol : \\256
whereas it should look like this (with standard Unicode escapes):
Copy right symbol : \\u00A9 AND Registered Mark symbol : \\u00AE
First, I will try to provide the solution. Then I will try to explain why.
Escaping non-ASCII chars
To escape Unicode characters in a string, you shouldn't rely on NSNonLossyASCIIStringEncoding. Below is the code that I use to escape non-ASCII characters in a string:
// NSMutableString category
- (void)appendChar:(unichar)charToAppend {
    [self appendFormat:@"%C", charToAppend];
}
// NSString category
- (NSString *)UEscapedString {
    char const hexChar[] = "0123456789ABCDEF";
    NSMutableString *outputString = [NSMutableString string];
    for (NSUInteger i = 0; i < self.length; i++) {
        unichar character = [self characterAtIndex:i];
        if ((character >> 7) > 0) { // anything outside 7-bit ASCII gets escaped
            [outputString appendString:@"\\u"];
            [outputString appendChar:(hexChar[(character >> 12) & 0xF])]; // append the hex character for the left-most 4 bits
            [outputString appendChar:(hexChar[(character >> 8) & 0xF])]; // hex for the second group of 4 bits from the left
            [outputString appendChar:(hexChar[(character >> 4) & 0xF])]; // hex for the third group
            [outputString appendChar:(hexChar[character & 0xF])]; // hex for the last group, i.e., the right-most 4 bits
        } else {
            [outputString appendChar:character];
        }
    }
    return [outputString copy];
}
(NOTE: I guess Jon Rose's method does the same but I didn't wanna share a method that I didn't test)
Now you have the following string: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE
Escaping unicode is done. Now let's convert it back to display the emojis.
Converting back
This is gonna be confusing at first but this is what it is:
NSData *data = [escapedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *converted = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
Now you have your emojis (and other non-ASCIIs) back.
What is happening?
The problem
In your case, you are trying to create a common language between your server side and your app. However, NSNonLossyASCIIStringEncoding is a pretty bad choice for that purpose. It is a black box created by Apple, and we don't really know what exactly it does inside. As we can see, it converts some characters into \uXXXX while converting others (such as © and ®) into \XXX. That is why you shouldn't rely on it to build a multi-platform system: there is no equivalent of it on backend platforms or on Android.
Mysteriously, NSNonLossyASCIIStringEncoding can still convert \u00AE back to ®, even though it encodes ® as \256 in the first place. I'm sure there are tools on other platforms to convert \uXXXX into Unicode characters, so that shouldn't be a problem for you.
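One observation that may take some of the mystery out of \251 and \256: read as three-digit octal numbers, they are exactly 0xA9 (©) and 0xAE (®). That suggests, though it is only an inference from the output shown in the question, that the encoding writes code points below 0x100 as octal escapes. The sketch below is plain C (valid in an Objective-C file), and the name octal_escape_value is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads a backslash followed by exactly three octal
 * digits (for example the four characters \251) and returns the numeric
 * value, or -1 if `s` does not start with such an escape. */
static int octal_escape_value(const char *s) {
    assert(s != NULL);
    if (s[0] != '\\') return -1;
    int value = 0;
    for (int i = 1; i <= 3; i++) {
        /* The range check also rejects the terminator, so the loop never
         * reads past the end of a short string. */
        if (s[i] < '0' || s[i] > '7') return -1;
        value = value * 8 + (s[i] - '0');
    }
    return value;
}
```

With this, octal_escape_value("\\251") gives 0xA9 and octal_escape_value("\\256") gives 0xAE, the same code points that \u00A9 and \u00AE name.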
messageBody is a string; there is no reason to convert it to data only to convert it back to a string. Replace your code with
NSString *encodedString = messageBody;
If the messageBody object is incorrect, then the way to fix it is to change the way it was created. The server sends data, not strings. The data that the server sends is encoded in some agreed-upon way, generally UTF-8. If you know the encoding, you can convert the data to a string; if you don't, the data is gibberish that cannot be read. If messageBody is incorrect, the problem occurred when it was converted from the data that the server sent. It seems likely that you are parsing it with the wrong encoding.
The code you posted is just plain wrong. It converts a string to data using one encoding (ASCII) and then reads that data with a different encoding (UTF-8). That is like translating a book to Spanish and then having a Portuguese speaker translate it back: it might work for some words, but it is still wrong.
If you are still having trouble then you should share the code of where messageBody is created.
If your server expects an ASCII string with all Unicode characters changed to \u00xx, then you should first yell at your server guy because he is an idiot. But if that doesn't work, you can use the following code:
NSString* messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
NSData* utf32Data = [messageBody dataUsingEncoding:NSUTF32StringEncoding];
uint32_t *bytes = (uint32_t *) [utf32Data bytes];
NSMutableString* escapedString = [[NSMutableString alloc] init];
// Start at 1 because the first 4 bytes are the byte-order mark.
// The loop bound is the number of 32-bit units in the data, not the length of the (still empty) output string.
for (NSUInteger index = 1; index < utf32Data.length / 4; index++) {
    uint32_t charValue = bytes[index];
    if (charValue <= 127) {
        [escapedString appendFormat:@"%C", (unichar)charValue];
    } else {
        [escapedString appendFormat:@"\\\\u%04X", charValue];
    }
}
I really do not understand your problem.
You can simply convert ANY character into NSData and convert it back into a string.
You can simply pass a UTF-8 string including both emoji and other symbols using a POST request.
NSString* newStr = [[NSString alloc] initWithData:theData encoding:NSUTF8StringEncoding];
NSData* data = [newStr dataUsingEncoding:NSUTF8StringEncoding];
It has to work on both the server and the client side.
But, of course, you have the other problem that some fonts do not support all Unicode characters. That's why, e.g., in a terminal you might not see some of them. But this is beyond the scope of this question.
NSNonLossyASCIIStringEncoding is only useful when you really want to convert a symbol into a chain of ASCII symbols. But it is not needed here.

3rd Party Language support (Xcode + iOS) [duplicate]

I've got a problem with the following code:
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(@"%s", temp);
In the first line of the code, the string literal contains two Chinese characters. The problem is that printf displays the Chinese characters properly, but NSLog doesn't.
Thanks to all. I figured out a solution for this problem. Foundation uses UTF-16 by default, so in order to use NSLog to output the c string in the example, I have to use cStringUsingEncoding to get UTF-16 c string and use %S to replace %s.
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
strcpy(temp, [strValue cStringUsingEncoding:NSUTF16LittleEndianStringEncoding]);
NSLog(@"%S", (const unichar *)temp);
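A caveat about the strcpy call on a UTF-16 buffer above: strcpy stops at the first zero byte, and UTF-16 text contains zero bytes whenever a character is below U+0100. The two Chinese characters happen to have no zero bytes, so the copy succeeds here, but a string with ASCII in it would be cut short. The plain C sketch below (valid in an Objective-C file; the helper name is hypothetical, and the byte arrays are written out by hand) illustrates the difference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-16LE bytes of "你好" (U+4F60, U+597D) plus a 16-bit terminator. */
static const char nihao_utf16le[] = {0x60, 0x4F, 0x7D, 0x59, 0x00, 0x00};

/* UTF-16LE bytes of "Hi" (U+0048, U+0069) plus a 16-bit terminator. */
static const char hi_utf16le[] = {0x48, 0x00, 0x69, 0x00, 0x00, 0x00};

/* Hypothetical helper: how many bytes strcpy() would copy from `bytes`
 * before it stops at the first zero byte (the terminator it appends is
 * not counted). */
static size_t bytes_copied_by_strcpy(const char *bytes) {
    assert(bytes != NULL);
    return strlen(bytes);
}
```

For "你好" all four payload bytes are copied, while for "Hi" only the first byte survives. Note also that strcpy writes a single zero byte, not the two-byte terminator that %S expects, so copying with memcpy and an explicit length, or passing the NSString with %@, is the safer route.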
NSLog's %s format specifier is in the system encoding, which seems to always be MacRoman and not Unicode, so it can only display characters in MacRoman encoding. Your best option with NSLog is just to use the native object format specifier %@ and pass the NSString directly instead of converting it to a C string. If you only have a C string and you want to use NSLog to display a message instead of printf or asl, you will have to do something like Don suggests in order to convert the string to an NSString object first.
So, all of these should display the expected string:
NSString *str = @"你好";
const char *cstr = [str UTF8String];
NSLog(@"%@", str);
printf("%s\n", cstr);
NSLog(@"%@", [NSString stringWithUTF8String:cstr]);
If you do decide to use asl, note that while it accepts strings in UTF8 format and passes the correct encoding to the syslog daemon (so it will show up properly in the console), it encodes the string for visual encoding when displaying to the terminal or logging to a file handle, so non-ASCII values will be displayed as escaped character sequences.
My guess is that NSLog assumes a different encoding for 8-bit C-strings than UTF-8, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:
NSLog(@"%@", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);
I know you are probably looking for an answer that will help you understand what's going on.
But this is what you could do to solve your problem right now:
NSLog(@"%@", strValue);
#define NSLogUTF8(a,b) NSLog(a,[NSString stringWithCString:[[NSString stringWithFormat:@"%@",b] cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSNonLossyASCIIStringEncoding])
#define NSLogUTF8Ex(a,b) NSLog(a,[MLTool utf8toNString:[NSString stringWithFormat:@"%@",b]])
+(NSString*)utf8toNString:(NSString*)str{
    // CFStringTransform modifies the string in place, so it needs a mutable copy
    NSMutableString* strT = [[str stringByReplacingOccurrencesOfString:@"\\U" withString:@"\\u"] mutableCopy];
    CFStringRef transform = CFSTR("Any-Hex/Java");
    // YES = reverse transform, i.e. turn \uXXXX escapes back into characters
    CFStringTransform((__bridge CFMutableStringRef)strT, NULL, transform, YES);
    return strT;
}

NSString separation-iOS

I have the following strings. I need to split them at "jsonp1343930692" and assign the pieces to NSString objects again. How can I do that? I was able to split them into an NSArray, but I don't know how to get separate NSStrings.
jsonp1343930692("snapshot":[{"timestamp":1349143800,"data":[{"label_id":10,"lat":29.7161,"lng":-95.3906,"attr":{"ozone_level":37,"exp":"IN","gridpoint":"29.72:-95.39"}},{"label_id":10,"lat":30.168456,"lng":-95.50448}]}]})
jsonp1343930692("snapshot":[{"timestamp":1349144700,"data":[{"label_id":10,"lat":29.7161,"lng":-95.3906,"attr":{"ozone_level":37,"exp":"IN","gridpoint":"29.72:-95.39"}},{"label_id":10,"lat":30.168456,"lng":-95.50448,"attr":{"ozone_level":57,"exp":"IN","gridpoint":"30.17:-95.5"}},{"label_id":10,"lat":29.036944,"lng":-95.438333}]}]})
The jsonp1343930692 prefix in your string is odd: I don't know where your string comes from, but it really seems to be a JSON string with a strange prefix that has no reason to be there. The best approach is probably to check whether this prefix is expected; for example, if you get this string from a web service, it is probably the web service's fault for returning this odd prefix.
Anyway, if you want to remove the jsonp1343930692 prefix from your string, you have multiple options:
Check that the prefix exists, and if so, remove the right number of characters from the original string:
NSString* str = ... // your string with the "jsonp1343930692" prefix
static NSString* kStringToRemove = @"jsonp1343930692";
if ([str hasPrefix:kStringToRemove])
{
    // rebuild the string using only the substring after the prefix
    str = [str substringFromIndex:kStringToRemove.length];
}
Split your string into multiple parts, using the jsonp1343930692 string as a separator:
NSString* str = ... // your string with the "jsonp1343930692" prefix
static NSString* kStringToRemove = @"jsonp1343930692";
NSArray* parts = [str componentsSeparatedByString:kStringToRemove];
str = [parts componentsJoinedByString:@""];
Replace every occurrence of jsonp1343930692 with the empty string:
NSString* str = ... // your string with the "jsonp1343930692" prefix
static NSString* kStringToRemove = @"jsonp1343930692";
str = [str stringByReplacingOccurrencesOfString:kStringToRemove withString:@""];
So in short you have many possibilities depending on what exactly you want to do :)
Of course, once you have removed the strange jsonp1343930692 prefix, you can deserialize your JSON string to obtain a JSON object (either using a third-party library like SBJSON or using NSJSONSerialization on iOS 5 and later).
Have a look at the NSJSONSerialization class to turn this into a Cocoa collection that you can deal with.
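A name followed by a parenthesized JSON document is the usual shape of a JSONP response (a callback name wrapping the payload), so removing the prefix alone would still leave the surrounding parentheses in place. A minimal sketch in plain C (valid in an Objective-C file) that extracts what lies between the first "(" and the last ")" could look like this; jsonp_payload is a hypothetical name, and the sketch assumes the whole response is a single wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first '(' and the last
 * ')' of `jsonp` into `out`, whose capacity `out_size` includes the
 * terminator. Returns 0 on success, or -1 if no wrapper is found or the
 * buffer is too small. */
static int jsonp_payload(const char *jsonp, char *out, size_t out_size) {
    assert(jsonp != NULL && out != NULL);
    const char *open = strchr(jsonp, '(');
    const char *close = strrchr(jsonp, ')');
    if (open == NULL || close == NULL || close < open) return -1;
    size_t len = (size_t)(close - open - 1);   /* bytes strictly between the parentheses */
    if (len + 1 > out_size) return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

The extracted payload is what should be handed to the JSON parser. In Objective-C the same idea can be expressed with rangeOfString: and substringWithRange:.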

Why is it direct commented Encoded string not converting to Arabic?

NSString * string = @"االْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ";
const char *c = [string cStringUsingEncoding:NSUTF8StringEncoding];
NSString *newString = [[NSString alloc]initWithCString:c encoding:NSISOLatin1StringEncoding];
NSLog(@"%@",newString);
// NSString * staticEncodedString = @"اÙÙØ­ÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
const char *cvvv = [newString cStringUsingEncoding:NSISOLatin1StringEncoding];
NSString *newStringV = [[NSString alloc]initWithCString:cvvv encoding:NSUTF8StringEncoding];
NSLog(@"%@",newStringV);
Why is the commented-out encoded string not converted back to Arabic?
When I hardcode the Arabic, it encodes and then decodes correctly, but why can't the static encoded string be decoded into readable Arabic?
Thanks for your reply, Jake. Yes, I lose data while decoding the "staticEncodedString". But all I want is to decode the following string back to Arabic.
NSString * staticEncodedString = @"اÙÙØ­ÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
I think the text is encoded in ANSI; change it to UTF-8 with any tool.
For example, use Notepad++ to convert it, and then you can use it within SQLite or iOS.
Latin1 cannot represent the Arabic characters, so you cannot encode that string to Latin1. Arabic is covered by ISO 8859-6 (Latin/Arabic), not by Latin1. The method cStringUsingEncoding: will return NULL if the string cannot be losslessly encoded in the specified encoding.
Why would you want to encode an Arabic string to LatinX? UTF-8 will most likely be the best representation, since it is a standard encoding that covers every character and takes a straightforward approach with no headaches. It may take a few more bytes than ISO 8859-6, but in most cases it will be worth it.
Converting to Latin1 will make you lose your text.
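To make the round trip from the question concrete: reading UTF-8 bytes as Latin1 turns each byte into one character, and reversing that only works if every one of those characters survives. Bytes 0x80 to 0x9F map to invisible control characters in Latin1, which are easily lost when the garbled text is printed, copied, and pasted into source code; that is a likely (but unverified) reason why the hardcoded string cannot be decoded while the in-memory round trip can. The plain C sketch below (valid in an Objective-C file; both function names are hypothetical) shows the two directions at the byte level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Treats every byte of `in` as one ISO 8859-1 character and writes that
 * character's UTF-8 encoding to `out`, which needs up to
 * 2 * strlen(in) + 1 bytes. This is what misreading UTF-8 as Latin1 does. */
static void latin1_to_utf8(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            *out++ = (char)*p;
        } else {
            *out++ = (char)(0xC0 | (*p >> 6));
            *out++ = (char)(0x80 | (*p & 0x3F));
        }
    }
    *out = '\0';
}

/* Reverse direction: turns UTF-8 text whose code points are all at most
 * U+00FF back into single bytes. Returns -1 if any other character is
 * found, because then the original bytes cannot be recovered. */
static int utf8_to_latin1(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)in;
    while (*p) {
        if (*p < 0x80) {
            *out++ = (char)*p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            *out++ = (char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;
        }
    }
    *out = '\0';
    return 0;
}
```

For example, the Arabic letter alef (U+0627) is the UTF-8 bytes D8 A7; misread as Latin1 it becomes the two characters Ø and §, and utf8_to_latin1 recovers D8 A7 from them. A string that already contains a real Arabic letter, as the pasted one does at its start, makes utf8_to_latin1 fail.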

NSDictionary description not returning utf8 characters?

I have an NSDictionary with utf8 strings as objects. Printing the objects prints the special characters as they should.
But utf8 characters do not get correctly printed out when I convert the dictionary to a string with the description method.
NSDictionary *test = [NSDictionary dictionaryWithObject:@"Céline Dion" forKey:@"bla"];
NSLog(@"%@",[test objectForKey:@"bla"]); // prints fine
NSLog(@"%@",test); // does not print fine, é is replaced by \U00e9
NSLog(@"%@",[test description]); // also does not print fine
How can I print the NSDictionary while preserving utf8 characters?
I wouldn't worry about what -description does, it's just for debugging.
Technically, you don't have UTF-8 strings. You have strings (which are Unicode). You don't know what NSString uses internally, and you shouldn't care. If you want a UTF-8 string (like when you're passing to a C API), use -UTF8String.
There is a way, but I can't check it at the moment:
NSString *decodedString = [NSString stringWithUTF8String:[[test description] cStringUsingEncoding:[NSString defaultCStringEncoding]]];
NSLog(@"%@",decodedString);
