Emoji cStringUsingEncoding with NSASCIIStringEncoding not working - ios

I have an array with data:
MY_ARRAY: (
"email="My_Email_ID"",
"message=\Ud83d\Ude0a",
"key="MY_KEY"",
"id="MY_ID""
)
In the message field I have added an emoji, and the array dump shows its hex value.
But when I try converting it to a string:
string = [MY_ARRAY componentsJoinedByString:@"&"];
the output in the terminal shows:
email="My_Email_ID"&message=😊&key="MY_KEY"&id="MY_ID"
Why is it converting back to an emoji?
The problem I am facing is at this line:
const char *charData = [string cStringUsingEncoding:NSASCIIStringEncoding];
as I am getting NULL here.

Changing below line:
const char *charData = [string cStringUsingEncoding:NSASCIIStringEncoding];
to
const char *charData = [string cStringUsingEncoding:NSUTF8StringEncoding];
solved my issue.
Thanks to this forum.
If at all possible, stay away from anything other than UTF-8.
With NSASCIIStringEncoding, cStringUsingEncoding: will break (return NULL) whenever there is a non-ASCII character in the string. There is no reason to take the risk. Only use cStringUsingEncoding: with another encoding if you genuinely need the string in that particular encoding. Since NSLog expects UTF-8 strings, any encoding that isn't a subset of UTF-8 will produce strange results.
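As a rough illustration of why the ASCII conversion fails, the sketch below (plain C, which also compiles as Objective-C) decodes the `\Ud83d\Ude0a` pair from the array dump and encodes the result as UTF-8. The pair is a UTF-16 surrogate pair for a single code point, which is presumably why the joined string prints as 😊. The function names are made up for this example and are not part of Foundation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into one Unicode code point. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo) {
    assert(hi >= 0xD800 && hi <= 0xDBFF);  /* high (lead) surrogate */
    assert(lo >= 0xDC00 && lo <= 0xDFFF);  /* low (trail) surrogate */
    return 0x10000u + ((uint32_t)(hi - 0xD800u) << 10) + (uint32_t)(lo - 0xDC00u);
}

/* Encode a code point in U+10000..U+10FFFF as four UTF-8 bytes. */
void encode_utf8_4(uint32_t cp, unsigned char out[4]) {
    assert(cp >= 0x10000u && cp <= 0x10FFFFu);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
}

/* ASCII only covers byte values 0x00..0x7F. */
int is_ascii(const unsigned char *bytes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] > 0x7F) {
            return 0;
        }
    }
    return 1;
}
```

For the pair 0xD83D, 0xDE0A this yields U+1F60A and the UTF-8 bytes F0 9F 98 8A. Every one of those bytes is above 0x7F, so the string has no ASCII representation, while UTF-8 can carry it.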

Related

How do I convert NSString to an encoding other than UTF-8?

I'm working with C in an iOS project and I'm trying to convert my string to the corresponding C type. The code below shows what is supposed to be sent to the core library:
typedef uint16_t UniCharT;
static const UniCharT s_learnWord[] = {'H', 'e','l','\0'};
What I have done so far is this, where string is the value I'm passing:
NSString * string = @"Hel";
static const UniCharT *a = (UniCharT *)[string UTF8String];
But the conversion fails when there is more than one character. If I pass a single character it works fine. Please let me know what I am missing. How can I pass something like s_learnWord?
I searched Google and Stack Overflow, but none of the duplicates or answers worked for me, for example
"Convert NSString into char array". I'm already doing it the same way.
Your question is a little ambiguous as the title says "c type char[]" but your code uses typedef uint16_t UniCharT; which is contradictory.
For any string conversions other than UTF-8, you normally want to use the method getCString:maxLength:encoding:.
As you are using uint16_t, you probably are trying to use UTF-16? You'll want to pass NSUTF16StringEncoding as the encoding constant in that case. (Or possibly NSUTF16BigEndianStringEncoding/NSUTF16LittleEndianStringEncoding)
Something like this should work:
#include <stdlib.h>
// ...
NSString *string = @"part";
// upper bound on the bytes needed for the UTF-16 representation
NSUInteger stringBytes = [string maximumLengthOfBytesUsingEncoding:NSUTF16StringEncoding];
stringBytes += sizeof(UniCharT); // make space for \0 termination
UniCharT *convertedString = calloc(1, stringBytes); // zero-filled buffer
[string getCString:(char *)convertedString
         maxLength:stringBytes
          encoding:NSUTF16StringEncoding];
// now use convertedString, pass it to library etc.
free(convertedString);
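To make the failure in the question concrete, here is a small plain-C sketch (it also compiles as Objective-C). Casting the UTF8String pointer to UniCharT * does not convert anything: the library reads two adjacent bytes as one 16-bit unit. The helper names are hypothetical, and the widening helper only handles ASCII input; it is not a general UTF-8 to UTF-16 converter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t UniCharT;

/* What a reinterpreting cast sees on a little-endian CPU:
   two adjacent bytes fused into one 16-bit unit. */
UniCharT fuse_little_endian(unsigned char first, unsigned char second) {
    return (UniCharT)(first | (second << 8));
}

/* Widen an ASCII-only C string into a newly allocated, zero-terminated
   array of 16-bit units. Returns NULL on allocation failure or if a
   non-ASCII byte is found. The caller frees the result. */
UniCharT *widen_ascii(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s);
    UniCharT *out = calloc(n + 1, sizeof(UniCharT));
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)s[i];
        if (b > 0x7F) {   /* multi-byte UTF-8 needs a real conversion */
            free(out);
            return NULL;
        }
        out[i] = b;
    }
    return out;           /* out[n] is already 0 thanks to calloc */
}
```

With "Hel", the first fused unit is 0x6548 instead of 'H' (0x0048), so the library sees garbage. A single character such as "H" fuses with its terminator into 0x0048, which happens to look right, although nothing guarantees a 16-bit terminator after it.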

NSLog() vs printf() when printing C string (UTF-8)

I have noticed that if I try to print the byte array containing the representation of a string in UTF-8, using the format specifier "%s", printf() gets it right but NSLog() gets it garbled (i.e., each byte printed as-is, so for example "¥" gets printed as the 2 characters: "¬•").
This is curious, because I always thought that NSLog() is just printf(), plus:
The first parameter (the 'format') is an Objective-C string, not a C string (hence the "@").
The timestamp and app name prepended.
The newline automatically added at the end.
The ability to print Objective-C objects (using the format "%@").
My code:
NSString* string;
// (...fill string with unicode string...)
const char* stringBytes = [string cStringUsingEncoding:NSUTF8StringEncoding];
NSUInteger stringByteLength = [string lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
stringByteLength += 1; // add room for '\0' terminator
char* buffer = calloc(stringByteLength, sizeof(char));
memcpy(buffer, stringBytes, stringByteLength);
NSLog(@"Buffer after copy: %s", buffer);
// (renders ascii, no matter what)
printf("Buffer after copy: %s\n", buffer);
// (renders correctly, e.g. japanese text)
Somehow, it looks as if printf() is "smarter" than NSLog(). Does anyone know the underlying cause, and if this feature is documented anywhere? (Couldn't find)
NSLog() and stringWithFormat: seem to expect the string for %s in the "system encoding" (for example "Mac Roman" on my computer):
NSString *string = @"¥";
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(CFStringGetSystemEncoding());
const char* stringBytes = [string cStringUsingEncoding:enc];
NSString *log = [NSString stringWithFormat:@"%s", stringBytes];
NSLog(@"%@", log);
// Output: ¥
Of course this will fail if some characters are not representable in the system encoding. I could not find any official documentation for this behavior, but one can see that using %s in stringWithFormat: or NSLog() does not reliably work with arbitrary UTF-8 strings.
If you want to check the contents of a char buffer containing a UTF-8 string, then this would work with arbitrary characters (using the boxed expression syntax to create an NSString from a UTF-8 string):
NSLog(@"%@", @(utf8Buffer));

Trouble with string encoding and emoji

I'm having trouble retrieving text messages from my server, specifically with the encoding. Messages can be in many languages (so they can contain accents, Japanese text, ...) and can include emoji.
I'm retrieving my messages as JSON along with some other info. Here is an example from the logs:
(lldb) po dataMessages
<__NSCFArray 0x14ecc7f0>(
{
author = "User 1";
text = "Hier, c'\U00c3\U00a9tait incroyable";
},
{
...
}
)
(lldb) po [[dataMessages objectAtIndex:0] objectForKey:@"text"]
Hier, c'était incroyable
I'm able to get the correct text with :
const char *c = [[[dataMessages objectAtIndex:indexPath.row] objectForKey:#"text"] cStringUsingEncoding:NSWindowsCP1252StringEncoding];
NSString *myMessage = [NSString stringWithCString:c encoding:NSUTF8StringEncoding];
However, if the message contains emoji, cStringUsingEncoding: returns NULL.
I don't have control over the server, so I can't change the encoding before messages are sent to me.
The problem is determining the encoding correctly. Emoji are not part of NSWindowsCP1252StringEncoding, so the conversion just fails.
Moreover, you are passing through an unnecessary stage. Do not make an intermediate C string! Just call NSString's initWithData:encoding:.
In your case, using NSWindowsCP1252StringEncoding was always a mistake; I'm surprised that this worked for any string. C3 A9 is the UTF-8 encoding of é. So just call initWithData:encoding: with the UTF-8 encoding (NSUTF8StringEncoding) from the get-go and all will be well.
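The following plain-C sketch (also valid Objective-C) shows why the two characters `\U00c3\U00a9` in the log are really one UTF-8 encoded é. It assumes the mis-decoding mapped each byte to the code point of the same value, which holds for ISO Latin 1 and, for these two particular bytes, for Windows-1252 as well. The function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack code points below U+0100 into one byte each, which is what
   encoding as ISO Latin 1 does. Returns 0 on success, or -1 if some
   code point does not fit in a byte (an emoji, for example). */
int pack_latin1(const uint32_t *cps, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFF) {
            return -1;
        }
        out[i] = (unsigned char)cps[i];
    }
    return 0;
}

/* Decode one two-byte UTF-8 sequence (covers U+0080..U+07FF). */
uint32_t decode_utf8_2(const unsigned char b[2]) {
    assert((b[0] & 0xE0) == 0xC0);  /* lead byte:         110xxxxx */
    assert((b[1] & 0xC0) == 0x80);  /* continuation byte: 10xxxxxx */
    return ((uint32_t)(b[0] & 0x1F) << 6) | (uint32_t)(b[1] & 0x3F);
}
```

Packing U+00C3 and U+00A9 gives the bytes C3 A9, and decoding those as UTF-8 gives U+00E9 (é). A code point such as U+1F60A cannot be packed into a single byte at all, which mirrors the NULL result described in the question.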

3rd Party Language support (Xcode + iOS) [duplicate]

I've got a problem with the following code:
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(@"%s", temp);
The first line of the code assigns two Chinese characters in double quotes. The problem is that printf displays the Chinese characters properly, but NSLog doesn't.
Thanks to all. I figured out a solution for this problem. Foundation uses UTF-16 by default, so in order to use NSLog to output the C string in the example, I have to use cStringUsingEncoding: to get a UTF-16 C string and use %S instead of %s.
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
// Caution: strcpy stops at the first zero byte, so this copy only works
// for text whose UTF-16 code units contain no zero bytes (ASCII would break).
strcpy(temp, [strValue cStringUsingEncoding:NSUTF16LittleEndianStringEncoding]);
NSLog(@"%S", temp);
NSLog's %s format specifier is in the system encoding, which seems to always be MacRoman and not Unicode, so it can only display characters in MacRoman encoding. Your best option with NSLog is just to use the native object format specifier %@ and pass the NSString directly instead of converting it to a C string. If you only have a C string and you want to use NSLog to display a message instead of printf or asl, you will have to do something like Don suggests in order to convert the string to an NSString object first.
So, all of these should display the expected string:
NSString *str = @"你好";
const char *cstr = [str UTF8String];
NSLog(@"%@", str);
printf("%s\n", cstr);
NSLog(@"%@", [NSString stringWithUTF8String:cstr]);
If you do decide to use asl, note that while it accepts strings in UTF8 format and passes the correct encoding to the syslog daemon (so it will show up properly in the console), it encodes the string for visual encoding when displaying to the terminal or logging to a file handle, so non-ASCII values will be displayed as escaped character sequences.
My guess is that NSLog assumes a different encoding for 8-bit C-strings than UTF-8, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:
NSLog(@"%@", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);
I know you are probably looking for an answer that will help you understand what's going on.
But this is what you could do to solve your problem right now:
NSLog(@"%@", strValue);
#define NSLogUTF8(a,b) NSLog(a,[NSString stringWithCString:[[NSString stringWithFormat:@"%@",b] cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSNonLossyASCIIStringEncoding])
#define NSLogUTF8Ex(a,b) NSLog(a,[MLTool utf8toNString:[NSString stringWithFormat:@"%@",b]])
+ (NSString *)utf8toNString:(NSString *)str {
    // CFStringTransform mutates its argument, so work on a mutable copy
    NSMutableString *strT = [[str stringByReplacingOccurrencesOfString:@"\\U" withString:@"\\u"] mutableCopy];
    CFStringRef transform = CFSTR("Any-Hex/Java");
    // reverse:YES turns the \uXXXX escapes back into characters
    CFStringTransform((__bridge CFMutableStringRef)strT, NULL, transform, YES);
    return strT;
}

Why is it direct commented Encoded string not converting to Arabic?

NSString * string = @"االْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ";
const char *c = [string cStringUsingEncoding:NSUTF8StringEncoding];
NSString *newString = [[NSString alloc] initWithCString:c encoding:NSISOLatin1StringEncoding];
NSLog(@"%@",newString);
// NSString * staticEncodedString = @"اÙÙØ­ÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
const char *cvvv = [newString cStringUsingEncoding:NSISOLatin1StringEncoding];
NSString *newStringV = [[NSString alloc] initWithCString:cvvv encoding:NSUTF8StringEncoding];
NSLog(@"%@",newStringV);
Why does the hard-coded (commented-out) encoded string not convert back to Arabic?
When I hard-code the Arabic text, it encodes and then decodes correctly, so why is the static encoded string not readable as Arabic?
Thanks for your reply, Jake. Yes, I lose data while decoding the "staticEncodedString". But all I want is to decode the following string back to Arabic.
NSString * staticEncodedString = @"اÙÙØ­ÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
I think the text is encoded in ANSI; change it to UTF-8 with any tool.
For example, use Notepad++ to convert it, and then you can use it within SQLite or iOS.
Latin1 cannot represent Arabic characters, so you cannot encode that string to Latin1. Among the ISO 8859 character sets, Arabic is covered by ISO 8859-6 (Latin/Arabic), not by Latin1 or Latin4. The method cStringUsingEncoding: will return NULL if the string cannot be losslessly encoded in the specified encoding.
Why would you want to encode an Arabic string to LatinX? UTF-8 will most likely be the best representation, since it can represent every Unicode character and is supported everywhere, with no headaches. It may take a few more bytes than a single-byte Arabic encoding, but in most cases it will be worth it.
Converting to Latin1 will make you lose your text.

Resources