I have a UTF-8 encoded string that I want to display in a label.
When I set a breakpoint and examine the variable holding the string, all looks good. However, when I output it to the log or to the label, I get Latin-looking garbage instead of the original characters.
I have tried almost every suggestion on SO and beyond, but I just cannot get the string to display properly.
Here is my code:
NSString *rawString = [NSString stringWithFormat:@"%@", m_value];
const char *utf8String = [rawString UTF8String];
NSLog(@"%@", [NSString stringWithUTF8String:utf8String]);
NSLog(@"%s", utf8String);
NSLog(@"%@", rawString);
self.resultText.text = [NSString stringWithUTF8String:utf8String];
m_value is an NSString, and in the debug window, it also displays the correct encoding.
m_value NSString * 0x006797b0 @"鄧樂愚..."
NSObject NSObject
isa Class 0x3bddd8f4
[0] Class
I am using the iOS 6.1 SDK.
OK, if m_value is a const char * holding a UTF-8 string, you have to use this method:
- (id)initWithUTF8String:(const char *)bytes
NSString *correctString = [[NSString alloc] initWithUTF8String: m_value];
It's incorrect to pass a const char * to the %@ format specifier. %@ expects an Objective-C object, so this is always wrong and can crash the app.
When I want to show Khmer text in a label, I use the font 'Hanuman.ttf'. This is the code I use:
UIFont *font = [UIFont fontWithName:@"Hanuman" size:20.0f];
self.nameLabel.text = [NSString stringWithFormat:@"%@", itemName];
self.nameLabel.font = font;
I don't know whether this helps, but this is what I did before!
So I finally managed to get to the bottom of this.
The m_value NSString was being set by a third-party library whose source I had no access to. The debug panel displayed the value correctly (i.e. showing the Chinese characters). Even so, the string had actually been decoded with NSMacOSRomanStringEncoding instead of UTF-8.
I was able to determine this by copying the output into TextWrangler and switching encodings until I found the one that translated correctly into UTF-8.
Then, to fix it in Objective-C, I first converted the NSString to a const char *:
const char *macString = [bxr.m_value cStringUsingEncoding:NSMacOSRomanStringEncoding];
Then converted back to an NSString:
NSString *utf8String = [[NSString alloc] initWithCString:macString encoding:NSUTF8StringEncoding];
+1 to @Vitaly_S and @iphonic, whose answers eventually led me to this solution. For anyone else who stumbles across this: it seems that, as of Xcode 4.6.1, the debug window cannot be trusted to render strings correctly, but you can rely on the NSLog output.
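This fix only works if the bytes returned by cStringUsingEncoding:NSMacOSRomanStringEncoding really are UTF-8. It can be worth checking that such a buffer is well-formed before passing it to initWithCString:encoding:. Below is a minimal sketch in plain C, which compiles unchanged in an Objective-C file. `is_valid_utf8` is a hypothetical helper, not a Foundation API, and it assumes the byte length is known (e.g. from strlen):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the len bytes at s are well-formed
   UTF-8 (no stray continuation bytes, no truncated or overlong sequences,
   no UTF-16 surrogates, nothing above U+10FFFF). */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    /* smallest code point allowed for a sequence with n continuation bytes */
    static const unsigned long min_cp[4] = {0, 0x80, 0x800, 0x10000};
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;          /* continuation bytes that must follow b */
        unsigned long cp;  /* code point being decoded */

        if (b < 0x80) { i++; continue; }            /* ASCII */
        if ((b & 0xE0) == 0xC0)      { n = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; }
        else return false;          /* continuation byte or 0xF8..0xFF as lead */

        if (len - i <= n) return false;             /* sequence runs past the end */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_cp[n] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += n + 1;
    }
    return true;
}
```

For example, the three bytes E4 BD A0 (你) pass. A lone Latin-1 byte such as 0xE9 fails, because the continuation bytes it announces never arrive.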
Assuming your variable m_value is an NSData, you can try the following:
self.resultText.text = [[NSString alloc] initWithData:m_value encoding:NSISOLatin1StringEncoding];
There are many encodings available; you can try them too:
NSASCIIStringEncoding /* 0..127 only */
NSNEXTSTEPStringEncoding
NSJapaneseEUCStringEncoding
NSUTF8StringEncoding
NSISOLatin1StringEncoding
NSSymbolStringEncoding
NSNonLossyASCIIStringEncoding
NSShiftJISStringEncoding /* kCFStringEncodingDOSJapanese */
NSISOLatin2StringEncoding
NSUnicodeStringEncoding
NSWindowsCP1251StringEncoding /* Cyrillic; same as AdobeStandardCyrillic */
NSWindowsCP1252StringEncoding /* WinLatin1 */
NSWindowsCP1253StringEncoding /* Greek */
NSWindowsCP1254StringEncoding /* Turkish */
NSWindowsCP1250StringEncoding /* WinLatin2 */
NSISO2022JPStringEncoding /* ISO 2022 Japanese encoding for e-mail */
NSMacOSRomanStringEncoding
NSUTF16StringEncoding /* An alias for NSUnicodeStringEncoding */
NSUTF16BigEndianStringEncoding /* NSUTF16StringEncoding encoding with explicit endianness specified */
NSUTF16LittleEndianStringEncoding /* NSUTF16StringEncoding encoding with explicit endianness specified */
NSUTF32StringEncoding
NSUTF32BigEndianStringEncoding /* NSUTF32StringEncoding encoding with explicit endianness specified */
NSUTF32LittleEndianStringEncoding /* NSUTF32StringEncoding encoding with explicit endianness specified */
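Trying encodings one by one can mislead with NSISOLatin1StringEncoding. In ISO Latin-1 every byte value is also its Unicode code point, so decoding never fails; it just produces the wrong characters when the data was really UTF-8. Here is a small C sketch of what Latin-1 decoding amounts to, re-encoding each byte as UTF-8 (`latin1_to_utf8` is a hypothetical name):

```c
#include <stddef.h>

/* Hypothetical helper: decodes len ISO Latin-1 bytes (byte value == code point)
   and writes them as UTF-8 into out, which needs room for 2 * len + 1 bytes.
   Returns the UTF-8 length, not counting the NUL terminator. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                  /* ASCII is unchanged */
        } else {
            /* U+0080..U+00FF always needs two UTF-8 bytes */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

Given the UTF-8 bytes of "¥" (C2 A5), it yields "Â¥". That is the kind of Latin-looking output described in the first question.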
Related
For example, I can type an emoji character directly into a string literal:
NSString *str = @"😊";
NSLog(@"%@", str);
The smiley emoji is shown in the console.
Presumably the code editor and the compiler treat the literal as UTF-8.
Now I'm working in a fully Unicode environment (32 bits per character) and I have the emoji's code point. I want to convert that 32-bit code point into an NSString, for example:
int charcode = 0x0001F60A;
NSLog(@"%??", charcode);
The question is: what should I put at the "??" position so that charcode is formatted as the emoji?
BTW, charcode is a variable whose value cannot be determined at compile time.
I don't want to pack the 32-bit int into UTF-8 bytes unless that is the only way.
If 0x0001F60A is a dynamic value determined at runtime, then you can use the NSString method
- (instancetype)initWithBytes:(const void *)bytes length:(NSUInteger)len encoding:(NSStringEncoding)encoding;
to create a string containing the character with the given Unicode value:
#import <libkern/OSByteOrder.h> // for OSSwapHostToLittleInt32
uint32_t charcode = 0x0001F60A;
uint32_t data = OSSwapHostToLittleInt32(charcode); // Convert to little-endian
NSString *str = [[NSString alloc] initWithBytes:&data length:sizeof data encoding:NSUTF32LittleEndianStringEncoding];
NSLog(@"%@", str); // 😊
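If packing the code point into UTF-8 turns out to be acceptable after all, it takes only a few lines. This sketch is plain C (valid inside Objective-C). `utf8_encode` is a hypothetical helper; it rejects surrogates and values above U+10FFFF, since those are not characters:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-8 encoding of code point cp into out
   (room for 5 bytes), NUL-terminated, and returns its length in bytes.
   Returns 0 for UTF-16 surrogates and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[5])
{
    size_t n;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;
    } else if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;
    }
    out[n] = '\0';
    return n;
}
```

With the question's charcode of 0x0001F60A, the buffer holds F0 9F 98 8A. `[NSString stringWithUTF8String:(const char *)buf]` turns that into 😊, and printf("%s") prints it directly.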
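Use the NSString initialization method:
uint32_t charcode = 0x0001F60A;
// Passing &charcode directly gives little-endian bytes only on a little-endian host (all current iOS devices and Macs)
NSLog(@"%@", [[NSString alloc] initWithBytes:&charcode length:4 encoding:NSUTF32LittleEndianStringEncoding]);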
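The previous line relies on the host being little-endian. OSSwapHostToLittleInt32 in the earlier answer makes that step explicit. A host-independent way to lay out the four bytes is shown below as a hypothetical C helper, `store_le32`, so the result never depends on the CPU's byte order:

```c
#include <stdint.h>

/* Hypothetical helper: stores v into out[0..3] in little-endian order,
   least significant byte first, regardless of host endianness. */
void store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}
```

The resulting four bytes (0A F6 01 00 for 0x0001F60A) can be passed as `bytes` with `length:4` and NSUTF32LittleEndianStringEncoding.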
I have noticed that if I try to print the byte array containing the representation of a string in UTF-8, using the format specifier "%s", printf() gets it right but NSLog() gets it garbled (i.e., each byte printed as-is, so for example "¥" gets printed as the 2 characters: "¬•").
This is curious, because I always thought that NSLog() is just printf(), plus:
The first parameter (the 'format') is an Objective-C string, not a C string (hence the "@").
The timestamp and app name prepended.
The newline automatically added at the end.
The ability to print Objective-C objects (using the format "%@").
My code:
NSString* string;
// (...fill string with unicode string...)
const char* stringBytes = [string cStringUsingEncoding:NSUTF8StringEncoding];
NSUInteger stringByteLength = [string lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
stringByteLength += 1; // add room for '\0' terminator
char* buffer = calloc(stringByteLength, sizeof(char));
memcpy(buffer, stringBytes, stringByteLength);
NSLog(@"Buffer after copy: %s", buffer);
// (renders ascii, no matter what)
printf("Buffer after copy: %s\n", buffer);
// (renders correctly, e.g. japanese text)
free(buffer);
Somehow, it looks as if printf() is "smarter" than NSLog(). Does anyone know the underlying cause, and whether this behavior is documented anywhere? (I couldn't find it.)
NSLog() and stringWithFormat: seem to expect the string for %s in the "system encoding" (for example "Mac Roman" on my computer):
NSString *string = @"¥";
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(CFStringGetSystemEncoding());
const char* stringBytes = [string cStringUsingEncoding:enc];
NSString *log = [NSString stringWithFormat:@"%s", stringBytes];
NSLog(@"%@", log);
// Output: ¥
Of course this will fail if some characters are not representable in the system encoding. I could not find official documentation for this behavior, but this shows that using %s in stringWithFormat: or NSLog() does not reliably work with arbitrary UTF-8 strings.
If you want to check the contents of a char buffer containing a UTF-8 string, this works with arbitrary characters (using the boxed expression syntax to create an NSString from a UTF-8 string):
NSLog(@"%@", @(utf8Buffer));
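The "¬•" in the question is what you get when the two UTF-8 bytes of "¥" (0xC2 0xA5) are decoded one byte per character. In Mac Roman, 0xC2 is "¬" and 0xA5 is "•". printf, by contrast, never decodes: it copies the bytes, and the terminal renders them as UTF-8. A small C sketch of the difference counts characters under each interpretation. Both function names are hypothetical, and the UTF-8 count assumes well-formed input:

```c
#include <stddef.h>
#include <string.h>

/* Characters produced when the bytes are decoded with a single-byte
   encoding such as Mac Roman or Latin-1: exactly one per byte. */
size_t char_count_single_byte(const char *s)
{
    return strlen(s);
}

/* Characters (code points) produced when well-formed UTF-8 is decoded:
   every byte except a continuation byte (10xxxxxx) starts a new one. */
size_t char_count_utf8(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

For "¥", the single-byte reading gives 2 characters and the UTF-8 reading gives 1. This matches the two garbled characters NSLog printed.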
I've got a problem with the following code:
NSString *strValue = @"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(@"%s", temp);
In the first line of the code, the string literal contains two Chinese characters. The problem is that printf displays the Chinese characters properly, but NSLog doesn't.
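Thanks to all. I figured out a solution for this problem. NSString works with UTF-16 internally, and NSLog's %S specifier expects a NUL-terminated array of 16-bit unichar values. So, to print the string with NSLog, I get its UTF-16 characters and use %S instead of %s.
NSString *strValue = @"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s\n", temp);
// UTF-16 data can contain zero bytes, so copy it with getCharacters:range: instead of strcpy
unichar utf16[200] = {0}; // zero-filled, so the result stays NUL-terminated (assumes fewer than 200 UTF-16 units)
[strValue getCharacters:utf16 range:NSMakeRange(0, strValue.length)];
NSLog(@"%S", utf16);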
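Copying UTF-16 data with strcpy is fragile. strcpy stops at the first zero byte, and every ASCII character in UTF-16 has a zero high byte. The code units of 你好 (0x4F60 and 0x597D) happen to contain no zero bytes. Even then, strcpy copies only one byte of the two-byte terminator, leaving the second one unset. Here is a small C illustration; `ascii_to_utf16le` and `bytes_strcpy_would_copy` are hypothetical helpers, and the first handles ASCII input only:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes ASCII text as UTF-16LE code units into out,
   plus a 16-bit terminator, and returns the number of bytes written. */
size_t ascii_to_utf16le(const char *s, unsigned char *out)
{
    size_t o = 0;
    for (; *s != '\0'; s++) {
        out[o++] = (unsigned char)*s; /* low byte: the ASCII value */
        out[o++] = 0;                 /* high byte: always zero for ASCII */
    }
    out[o++] = 0;                     /* two-byte NUL terminator */
    out[o++] = 0;
    return o;
}

/* How many bytes strcpy would copy from buf: everything up to and
   including the first zero byte, whatever the data really is. */
size_t bytes_strcpy_would_copy(const unsigned char *buf)
{
    return strlen((const char *)buf) + 1;
}
```

For "Hi", the UTF-16 buffer is 6 bytes long, yet strcpy would stop after 2.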
NSLog's %s format specifier interprets the bytes in the system encoding, which seems to always be MacRoman rather than Unicode, so it can only display characters that exist in MacRoman. Your best option with NSLog is to use the native object format specifier %@ and pass the NSString directly instead of converting it to a C string. If you only have a C string and you want to use NSLog rather than printf or asl, you will have to do something like Don suggests and convert the string to an NSString object first.
So, all of these should display the expected string:
NSString *str = @"你好";
const char *cstr = [str UTF8String];
NSLog(@"%@", str);
printf("%s\n", cstr);
NSLog(@"%@", [NSString stringWithUTF8String:cstr]);
If you do decide to use asl, note that it accepts strings in UTF-8 format and passes the correct encoding to the syslog daemon, so they show up properly in the console. However, when displaying to the terminal or logging to a file handle, it applies visual encoding to the string, so non-ASCII values are displayed as escaped character sequences.
My guess is that NSLog assumes a different encoding for 8-bit C-strings than UTF-8, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:
NSLog(@"%@", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);
I know you are probably looking for an answer that will help you understand what's going on.
But this is what you could do to solve your problem right now:
NSLog(@"%@", strValue);
#define NSLogUTF8(a,b) NSLog(a,[NSString stringWithCString:[[NSString stringWithFormat:@"%@",b] cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSNonLossyASCIIStringEncoding])
#define NSLogUTF8Ex(a,b) NSLog(a,[MLTool utf8toNString:[NSString stringWithFormat:@"%@",b]])
+(NSString*)utf8toNString:(NSString*)str{
    // CFStringTransform edits its argument in place, so it needs a genuinely mutable copy
    NSMutableString *strT = [[str stringByReplacingOccurrencesOfString:@"\\U" withString:@"\\u"] mutableCopy];
    CFStringRef transform = CFSTR("Any-Hex/Java");
    // reverse = YES turns \uXXXX escapes back into characters
    CFStringTransform((__bridge CFMutableStringRef)strT, NULL, transform, YES);
    return strT;
}
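These macros target output where non-ASCII characters typically appear as \Uxxxx escapes, such as the description of an NSArray or NSDictionary. The unescaping step boils down to the C sketch below. `unescape_u` is a hypothetical helper with three limits: it handles only \uXXXX or \UXXXX escapes in the Basic Multilingual Plane, it leaves surrogate escapes and \u0000 untouched rather than combining pairs, and it assumes `out` has room for strlen(in) + 1 bytes:

```c
#include <stddef.h>

/* Parses exactly four hex digits at p; returns 1 and sets *out on success.
   Stops at the first non-hex character, so it never reads past a NUL. */
static int hex4(const char *p, unsigned *out)
{
    unsigned v = 0;
    for (int i = 0; i < 4; i++) {
        char c = p[i];
        unsigned d;
        if (c >= '0' && c <= '9') d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = (unsigned)(c - 'A' + 10);
        else return 0;
        v = v * 16 + d;
    }
    *out = v;
    return 1;
}

/* Hypothetical helper: copies in to out, replacing each \uXXXX / \UXXXX
   escape with its UTF-8 bytes. An escape is 6 bytes and its UTF-8 form at
   most 3, so out never needs more than strlen(in) + 1 bytes. */
void unescape_u(const char *in, char *out)
{
    unsigned char *o = (unsigned char *)out;
    while (*in != '\0') {
        unsigned cp;
        if (in[0] == '\\' && (in[1] == 'u' || in[1] == 'U') &&
            hex4(in + 2, &cp) && cp != 0 && (cp < 0xD800 || cp > 0xDFFF)) {
            if (cp < 0x80) {
                *o++ = (unsigned char)cp;
            } else if (cp < 0x800) {
                *o++ = (unsigned char)(0xC0 | (cp >> 6));
                *o++ = (unsigned char)(0x80 | (cp & 0x3F));
            } else {
                *o++ = (unsigned char)(0xE0 | (cp >> 12));
                *o++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                *o++ = (unsigned char)(0x80 | (cp & 0x3F));
            }
            in += 6;                           /* skip backslash, u/U and 4 digits */
        } else {
            *o++ = (unsigned char)*in++;       /* copy everything else verbatim */
        }
    }
    *o = '\0';
}
```

For example, "\U4f60\U597d" becomes the UTF-8 bytes of 你好.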
I have a C string:
unsigned char* contents = readInFile(path); // string bytes in some unknown encoding
I want to know the current NSStringEncoding value of contents. I want to do this without using the NSString usedEncoding methods.
I need a CFStringRef from contents without introducing a new string encoding.
NSString* contentsString = [NSString stringWithCString:(char *)contents];
introduces the default encoding and so screws things up. How can I create a CFStringRef directly from contents?
Once I have this I can:
CFStringEncoding cfStringEncoding = CFStringGetFastestEncoding((CFStringRef)contentsString);
NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(cfStringEncoding);
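Two cautions, as far as the CFString documentation goes. First, CFStringGetFastestEncoding reports the encoding the string can be converted to with the least work, not the encoding the original bytes were in. Second, raw bytes generally do not record their own encoding at all. One of the few in-band hints is a byte-order mark, and a cheap first check might look like the C sketch below. `sniff_bom` and `bom_kind` are hypothetical names, and the absence of a BOM proves nothing:

```c
#include <stddef.h>

/* Hypothetical result type for sniff_bom. */
typedef enum {
    BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE
} bom_kind;

/* Inspects only a leading byte-order mark. BOM_NONE means "inconclusive",
   not "no Unicode here": UTF-8 data in particular usually has no BOM. */
bom_kind sniff_bom(const unsigned char *b, size_t len)
{
    /* UTF-32LE must be tested before UTF-16LE: both start with FF FE */
    if (len >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return BOM_UTF32LE;
    if (len >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return BOM_UTF32BE;
    if (len >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return BOM_UTF16LE;
    if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;
}
```

When no BOM is found, a common next step is to test whether the bytes are well-formed UTF-8 before falling back to a single-byte guess.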
NSString * string = @"االْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ";
const char *c = [string cStringUsingEncoding:NSUTF8StringEncoding];
NSString *newString = [[NSString alloc] initWithCString:c encoding:NSISOLatin1StringEncoding];
NSLog(@"%@", newString);
// NSString * staticEncodedString = @"اÙÙØÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
const char *cvvv = [newString cStringUsingEncoding:NSISOLatin1StringEncoding];
NSString *newStringV = [[NSString alloc] initWithCString:cvvv encoding:NSUTF8StringEncoding];
NSLog(@"%@", newStringV);
Why doesn't the commented-out, already-encoded string convert back to Arabic?
When I hard-code the Arabic, it encodes and then decodes correctly, but why can't the static encoded string be decoded back into readable Arabic?
Thanks for your reply, Jake. Yes, I lose data while decoding "staticEncodedString". But all I want is to decode the following string back to Arabic.
NSString * staticEncodedString = @"اÙÙØÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
I think the text is in ANSI encoding; convert it to UTF-8 with any tool.
For example, use Notepad++ to do the conversion, and then you can use the text in SQLite or iOS.
Latin-1 cannot represent Arabic characters, so you cannot encode that string as Latin-1. Arabic is covered by ISO 8859-6 (Latin/Arabic), not by any of the Latin-X sets. The method cStringUsingEncoding: returns NULL if the string cannot be losslessly encoded in the specified encoding.
Why would you want to encode an Arabic string in a single-byte encoding at all? UTF-8 will most likely be the best representation: it can represent every Unicode character and is the simplest approach with no headaches. It may take a few more bytes than ISO 8859-6, but in most cases it will be worth it.
Converting to Latin1 will make you lose your text.
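The in-memory round trip in the question works because NSISOLatin1StringEncoding maps every byte to exactly one character and back again, so the UTF-8 bytes survive. The pasted staticEncodedString most likely fails for a different reason. In Latin-1, bytes 0x80 to 0x9F decode to invisible C1 control characters. The second UTF-8 byte of many Arabic letters lands in that range (ل, U+0644, is D9 84), and such characters are easily dropped when the text is copied into source code. The reversal itself is just the one-byte-per-character mapping. It is sketched here in C with a hypothetical helper that refuses anything outside U+0000 to U+00FF:

```c
#include <stddef.h>

/* Hypothetical helper: undoes one round of "UTF-8 bytes mis-decoded as
   Latin-1". in is the UTF-8 form of the garbled text; every character in it
   must be U+0000..U+00FF and is turned back into the single byte it came from.
   Returns the number of bytes written to out (NUL-terminated), or (size_t)-1
   if a character lies outside Latin-1, meaning the text cannot be reversed. */
size_t undo_latin1_mojibake(const unsigned char *in, unsigned char *out)
{
    size_t o = 0;
    while (*in != '\0') {
        if (in[0] < 0x80) {
            out[o++] = *in++;                      /* ASCII maps to itself */
        } else if ((in[0] == 0xC2 || in[0] == 0xC3) && (in[1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF is encoded as C2/C3 plus one continuation byte */
            out[o++] = (unsigned char)(((in[0] & 0x03) << 6) | (in[1] & 0x3F));
            in += 2;
        } else {
            return (size_t)-1;                     /* not a Latin-1 character */
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, "cafÃ©" (the UTF-8 of "café" read as Latin-1) comes back as the bytes of "café". Text containing a real Arabic letter is rejected, because that letter was never a Latin-1 character.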