how to save bytes to an NSString with UTF8 encoding - ios

I have some NSData that I am passing in as bytes
const void *bytes = [responseData bytes];
Those bytes were originally UTF-8 encoded, and I am now trying to get them into an NSString without altering the encoding at all.
I previously wrote code that copies the bytes into a C string, which would normally be fine unless the bytes contain non-English characters, which take two bytes instead of one. This means any international characters in my string get corrupted when I copy them into a C string.
Hence the need to copy the bytes directly into a UTF-8 aware object, preferably an NSString, if possible.
This is how I was handling the conversion, which I later found out is wrong, but it will hopefully give you a good idea of what I am trying to achieve.
else if (typeWithLocalOrdering == METHOD_RESPONSE)
{
cstring = (char *) malloc(sizeWithLocalOrdering + 1);
strncpy(cstring, bytes, sizeWithLocalOrdering);
cstring[sizeWithLocalOrdering] = '\0';
NSString *resultString = [NSString stringWithCString:cstring encoding:NSUTF16StringEncoding];
methodResponseData =[resultString dataUsingEncoding:NSUTF16StringEncoding]; // methodResponseData is used later on in my parsing method
// Take care of the memory allocation, so that you can find the end-of-file notification
free(cstring);
bytes += sizeWithLocalOrdering;
length -= sizeWithLocalOrdering;
}
Any help would be greatly appreciated.

I don't understand this: "This means any international characters in my string get messed up when I copy them into a cstring." If "sizeWithLocalOrdering" is correct for the actual length of the byte string, it seems like your original code should work (though I would have used memcpy rather than strncpy). If not, nothing's going to work.
Update: OK, I see it. Your original code was wrong here:
[NSString stringWithCString:cstring encoding:NSUTF16StringEncoding];
That should have been NSUTF8StringEncoding.
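To make the point about the copy concrete, here is a minimal C sketch. The helper name `copy_utf8_bytes` is hypothetical and not from the question. It copies exactly `len` bytes with memcpy and appends the terminator. Multi-byte UTF-8 sequences survive because the bytes are never reinterpreted, so the damage in the original code would come from decoding with the wrong encoding afterwards, not from the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` raw UTF-8 bytes into a freshly
   allocated, NUL-terminated buffer. Returns NULL if malloc fails.
   The caller owns the result and must free() it. */
char *copy_utf8_bytes(const void *bytes, size_t len) {
    char *cstring = malloc(len + 1);
    if (cstring == NULL) {
        return NULL;
    }
    /* memcpy copies every byte, including the lead and continuation
       bytes of multi-byte characters, without looking at their values. */
    memcpy(cstring, bytes, len);
    cstring[len] = '\0';
    return cstring;
}
```

On the Objective-C side the buffer would then be decoded with NSUTF8StringEncoding, as the update above says.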

So it turns out I had a few interesting things happening that I was not expecting.
This is the code I used to avoid the C string entirely and take the bytes straight into an NSString using their original encoding:
NSString *tempstring = [[NSString alloc] initWithBytes:bytes length:sizeWithLocalOrdering encoding:NSUTF8StringEncoding];
methodResponseData =[tempstring dataUsingEncoding:NSUTF16StringEncoding]; // methodResponseData is used later on in my parsing method

Related

Copyright/Registered symbol encoding not working

I’ve developed an iOS app in which we can send emojis from iOS to a web portal and vice versa. All emojis sent from iOS to the web portal display perfectly except “©” and “®”.
Here is the emoji encoding piece of code.
NSData *data = [messageBody dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodedString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
// This code returns \251 and \256 as the escapes for the copyright and registered symbols. These are not standard Unicode escapes, so they don't display on the web portal.
So what should I do to convert them to standard Unicode escapes?
Test run:
messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
// The encoded string I get from the above encoding is
Copy right symbol : \\251 AND Registered Mark symbol : \\256
whereas it should look like this (with standard Unicode escapes):
Copy right symbol : \\u00A9 AND Registered Mark symbol : \\u00AE
First, I will try to provide the solution. Then I will try to explain why.
Escaping non-ASCII chars
To escape Unicode characters in a string, you shouldn't rely on NSNonLossyASCIIStringEncoding. Below is the code that I use to escape non-ASCII characters in a string:
// NSMutableString category
- (void)appendChar:(unichar)charToAppend {
    [self appendFormat:@"%C", charToAppend];
}
// NSString category
- (NSString *)UEscapedString {
    char const hexChar[] = "0123456789ABCDEF";
    NSMutableString *outputString = [NSMutableString string];
    for (NSInteger i = 0; i < self.length; i++) {
        unichar character = [self characterAtIndex:i];
        if ((character >> 7) > 0) {
            // Anything above 0x7F is written as \uXXXX
            [outputString appendString:@"\\u"];
            [outputString appendChar:(hexChar[(character >> 12) & 0xF])]; // append the hex character for the left-most 4 bits
            [outputString appendChar:(hexChar[(character >> 8) & 0xF])]; // hex for the second group of 4 bits from the left
            [outputString appendChar:(hexChar[(character >> 4) & 0xF])]; // hex for the third group
            [outputString appendChar:(hexChar[character & 0xF])]; // hex for the last group, i.e., the right-most 4 bits
        } else {
            [outputString appendChar:character];
        }
    }
    return [outputString copy];
}
(NOTE: I guess Jon Rose's method does the same but I didn't wanna share a method that I didn't test)
Now you have the following string: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE
Escaping unicode is done. Now let's convert it back to display the emojis.
Converting back
This is gonna be confusing at first but this is what it is:
NSData *data = [escapedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *converted = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
Now you have your emojis (and other non-ASCIIs) back.
What is happening?
The problem
In your case, you are trying to create a common language between your server side and your app. However, NSNonLossyASCIIStringEncoding is a pretty bad choice for that purpose, because it is a black box created by Apple and we don't really know what exactly it does inside. As we can see, it converts some Unicode characters into \uXXXX while converting other non-ASCII characters into \XXX. That is why you shouldn't rely on it to build a multi-platform system: there is no equivalent of it on backend platforms or Android.
What is pretty mysterious is that NSNonLossyASCIIStringEncoding can still convert \u00AE back to ®, even though it converts ® into \256 in the first place. I'm sure there are tools on other platforms to convert \uXXXX into Unicode characters, so that shouldn't be a problem for you.
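The \251 and \256 from the question do not look arbitrary: read as octal, they are exactly 0xA9 and 0xAE, the code points of © and ®. The C sketch below formats the same value both ways. The helper names are hypothetical, and the idea that NSNonLossyASCIIStringEncoding writes values 0x80 to 0xFF as octal escapes is an assumption drawn only from the output shown in the question.

```c
#include <stdio.h>

/* Hypothetical helper: write one UTF-16 code unit as a \uXXXX escape,
   the form the web portal in the question expects. */
void format_u_escape(unsigned int unit, char out[7]) {
    snprintf(out, 7, "\\u%04X", unit & 0xFFFFu);
}

/* Hypothetical helper: write a value in 0x80..0xFF as a three-digit
   octal escape, the form seen in the question's output. */
void format_octal_escape(unsigned int value, char out[5]) {
    snprintf(out, 5, "\\%03o", value & 0xFFu);
}
```

For 0xA9 the first helper produces \u00A9 and the second produces \251, so both spellings name the same character. Only the escape syntax differs, which is what trips up the receiving side.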
messageBody is a string, so there is no reason to convert it to data only to convert it back to a string. Replace your code with
NSString *encodedString = messageBody;
If the messageBody object is incorrect then the way to fix it is to change the way it was created. The server sends data, not strings. The data that the server sends is encoded in some agreed upon way. Generally this encoding is UTF-8. If you know the encoding you can convert the data to a string; if you don't, then the data is gibberish that cannot be read. If the messageBody is incorrect, the problem occurred when it was converted from the data that the server sent. It seems likely that you are parsing it with the incorrect encoding.
The code you posted is just plain wrong. It converts a string to data using one encoding (ASCII) and then reads that data with a different encoding (UTF8). That is like translating a book to Spanish and then having a Portuguese speaker translate it back - it might work for some words, but it is still wrong.
If you are still having trouble then you should share the code of where messageBody is created.
If your server expects an ASCII string with all Unicode characters changed to \u00xx, then you should first yell at your server guy because he is an idiot. But if that doesn't work, you can use the following code:
NSString *messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
NSData *utf32Data = [messageBody dataUsingEncoding:NSUTF32StringEncoding];
const uint32_t *bytes = (const uint32_t *)[utf32Data bytes];
NSMutableString *escapedString = [[NSMutableString alloc] init];
// Start at 1 because the first 4 bytes are the byte-order mark (endianness)
// Loop over the data length, not escapedString.length, which starts at 0
for (NSUInteger index = 1; index < utf32Data.length / 4; index++) {
    uint32_t charValue = bytes[index];
    if (charValue <= 127) {
        [escapedString appendFormat:@"%C", (unichar)charValue];
    } else {
        [escapedString appendFormat:@"\\\\u%04X", charValue];
    }
}
I really do not understand your problem.
You can simply convert ANY character into NSData and convert it back into a string.
You can simply pass a UTF-8 string including both emoji and other symbols using a POST request.
NSString* newStr = [[NSString alloc] initWithData:theData encoding:NSUTF8StringEncoding];
NSData* data = [newStr dataUsingEncoding:NSUTF8StringEncoding];
It has to work for both the server and the client side.
But, of course, you have the other problem that some fonts do not support all Unicode characters. That's why, e.g., in a terminal you might not see some of them. But this is beyond the scope of this question.
NSNonLossyASCIIStringEncoding is used only when you really want to convert a symbol into a chain of ASCII symbols. But it is not needed here.

Bytes are changed after encoding NSString into NSInputStream via NSData

I ran into the following problem when encoding an NSString as NSString -> NSData -> NSInputStream and then decoding from the NSInputStream with the read method:
NSString *inputString = [NSString stringWithFormat:@"%c", 255];
NSData *data = [inputString dataUsingEncoding:NSUTF8StringEncoding];
NSInputStream *stream = [NSInputStream inputStreamWithData:data];
[stream open];
uint8_t bytes;
[stream read:&bytes maxLength:1];
NSLog(@"%i", bytes);
The output is 195 instead of 255. Why?
Because of the kind of encoding you used for the string. UTF-8 is a form of string encoding which will end up converting characters with values above 127 into multi-byte sequences. So although inputString contained a single character, your data object didn't actually contain a single byte as you may have assumed, but multiple (two, in this case) bytes. And when you read from the stream, you only read the first byte of the encoded data, but there was more there.
You didn't need to run the data through the input stream to see this result. Accessing the first byte of the NSData instance would have shown the same thing.
You say that this is a "problem" but you don't suggest what you're trying to actually accomplish. 255 isn't a printable/meaningful text character. If you want to transmit raw data bytes, you can do that directly, rather than using an NSString and string encodings. If you are transmitting strings, then it's already doing the right thing. You just need to be prepared that your data size can exceed your string "length".
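The 195 can be reproduced by hand. Below is a minimal C sketch of the standard UTF-8 encoding rules; the function name `utf8_encode` is hypothetical. Encoding code point 255 (U+00FF) with it gives two bytes, 0xC3 and 0xBF, which are 195 and 191 in decimal, so reading a single byte from the stream returns 195.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   `out` must have room for 4 bytes. Returns the number of bytes
   written, or 0 if `cp` is a surrogate or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp < 0x80) {                       /* ASCII: one byte, unchanged */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                    /* three bytes */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The same rules explain why the byte count of the data can exceed the string's length: every character above U+007F needs at least two bytes.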

convert unicode string to nsstring

I have a unicode string as
{\rtf1\ansi\ansicpg1252\cocoartf1265
{\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\fnil\fcharset0 LucidaGrande;}
{\colortbl;\red255\green255\blue255;}
{\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{check\}}{\leveltext\leveltemplateid1\'01\uc0\u10003 ;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
{\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
\paperw11900\paperh16840\margl1440\margr1440\vieww22880\viewh16200\viewkind0
\pard\li720\fi-720\pardirnatural
\ls1\ilvl0
\f0\fs24 \cf0 {\listtext
\f1 \uc0\u10003
\f0 }One\
{\listtext
\f1 \uc0\u10003
\f0 }Two\
}
Here I have the Unicode data \u10003, which is equivalent to the "✓" character. I have used
[NSString stringWithCharacters:"\u10003" length:NSUTF16StringEncoding] which throws a compilation error. Please let me know how to convert these Unicode characters to "✓".
Regards,
Boom
I had the same problem, and the following code solved my issue.
To encode:
NSData *dataenc = [yourtext dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodevalue = [[NSString alloc] initWithData:dataenc encoding:NSUTF8StringEncoding];
To decode:
NSData *data = [yourtext dataUsingEncoding:NSUTF8StringEncoding];
NSString *decodevalue = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
Thanks
I have used the code below to convert a Unicode-escaped string to an NSString. This should work fine.
NSData *unicodedStringData =
[unicodedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue =
[[NSString alloc] initWithData:unicodedStringData encoding:NSNonLossyASCIIStringEncoding];
In Swift 4
let emoji = "😃"
let unicodedData = emoji.data(using: String.Encoding.utf8, allowLossyConversion: true)
let emojiString = String(data: unicodedData!, encoding: String.Encoding.utf8)
I assume that:
You are reading this RTF data from a file or other external source.
You are parsing it yourself (not using, say, AppKit's built-in RTF parser).
You have a reason why you're parsing it yourself, and that reason isn't “wait, AppKit has this built in?”.
You have come upon \u… in the input you're parsing and need to convert that to a character for further handling and/or inclusion in the output text.
You have ruled out \uc, which is a different thing (it specifies the number of non-Unicode bytes that follow the \u… sequence, if I understood the RTF spec correctly).
\u is followed by decimal digits, not hexadecimal ones: RTF writes the value as a signed 16-bit decimal number. You need to parse those digits to a number; that number is the UTF-16 code unit of the character the sequence represents. You then need to create an NSString containing that character.
If you're using NSScanner to parse the input, then (assuming you have already scanned past the \u itself) you can simply ask the scanner to scanInt:. Pass a pointer to an int variable.
If you're not using NSScanner, do whatever makes sense for however you're parsing it. For example, if you've converted the RTF data to a C string and are reading through it yourself, you'll want to use strtol to parse the number. It'll interpret the number in whatever base you specify (in this case, 10) and then put the pointer to the next character wherever you want it.
Your int or long variable will then contain the value for the specified character (if it is negative, add 65536). In the example from your question, that will be 10003, which is 0x2713, or U+2713 (✓).
For a character like this one, you can simply assign that value to a unichar variable and create an NSString from that. That won't work for code points above 0xFFFF (in technical terms, outside the Basic Multilingual Plane), because unichars only go up to 0xFFFF.
If you ever end up with such a code point, CFString has a function to help you:
unsigned int codePoint = /*…*/;
unichar characters[2];
NSUInteger numCharacters = 0;
if (CFStringGetSurrogatePairForLongCharacter(codePoint, characters)) {
numCharacters = 2;
} else {
characters[0] = codePoint;
numCharacters = 1;
}
You can then use stringWithCharacters:length: to create an NSString from this array of 16-bit characters.
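For readers parsing the RTF in plain C, the two steps can be sketched as follows. Both function names are hypothetical. The parsing half assumes the RTF convention that the digits after \u are a signed 16-bit decimal number, which is worth checking against the RTF specification. Under that assumption \u10003 is 10003 decimal, that is 0x2713, the code point of ✓. The second function does by hand what CFStringGetSurrogatePairForLongCharacter does for code points above 0xFFFF.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse the digits that follow "\u" in RTF.
   Assumption: the value is a signed 16-bit decimal number, so
   negative values are mapped back into 0..65535. */
uint16_t rtf_u_value(const char *digits) {
    long n = strtol(digits, NULL, 10);
    if (n < 0) {
        n += 65536;
    }
    return (uint16_t)n;
}

/* Hypothetical helper: split a code point into UTF-16 code units.
   Returns the number of units written (1 or 2), or 0 if `cp` is a
   lone surrogate or above U+10FFFF. */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;              /* fits in one unichar */
        return 1;
    }
    if (cp > 0x10FFFF) {
        return 0;
    }
    cp -= 0x10000;                          /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

The resulting array of 16-bit units is what stringWithCharacters:length: expects, with the returned count as the length.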
Use this:
NSString *myUnicodeString = @"\u2713"; // ✓ is U+2713; @"\u10003" would be read as U+1000 followed by the digit 3
Thanks to modern Objective-C.
Let me know if it's not what you want.
NSString *strUnicodeString = @"\u2714";
NSData *unicodedStringData = [strUnicodeString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue = [[NSString alloc] initWithData:unicodedStringData encoding:NSUTF8StringEncoding];

3rd Party Language support (Xcode + iOS) [duplicate]

I've got a problem with the following code:
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(@"%s", temp);
In the first line of the code, the string literal contains two Chinese characters. The problem is that the printf function displays the Chinese characters properly, but NSLog doesn't.
Thanks to all. I figured out a solution for this problem. Foundation uses UTF-16 by default, so in order to use NSLog to output the c string in the example, I have to use cStringUsingEncoding to get UTF-16 c string and use %S to replace %s.
NSString *strValue=@"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
strcpy(temp, [strValue cStringUsingEncoding:NSUTF16LittleEndianStringEncoding]);
NSLog(@"%S", temp);
NSLog's %s format specifier is in the system encoding, which seems to always be MacRoman and not Unicode, so it can only display characters in MacRoman encoding. Your best option with NSLog is just to use the native object format specifier %@ and pass the NSString directly instead of converting it to a C string. If you only have a C string and you want to use NSLog to display a message instead of printf or asl, you will have to do something like Don suggests in order to convert the string to an NSString object first.
So, all of these should display the expected string:
NSString *str = @"你好";
const char *cstr = [str UTF8String];
NSLog(@"%@", str);
printf("%s\n", cstr);
NSLog(@"%@", [NSString stringWithUTF8String:cstr]);
If you do decide to use asl, note that while it accepts strings in UTF8 format and passes the correct encoding to the syslog daemon (so it will show up properly in the console), it encodes the string for visual encoding when displaying to the terminal or logging to a file handle, so non-ASCII values will be displayed as escaped character sequences.
My guess is that NSLog assumes a different encoding for 8-bit C-strings than UTF-8, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:
NSLog(@"%@", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);
I know you are probably looking for an answer that will help you understand what's going on.
But this is what you could do to solve your problem right now:
NSLog(@"%@", strValue);
#define NSLogUTF8(a,b) NSLog(a,[NSString stringWithCString:[[NSString stringWithFormat:@"%@",b] cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSNonLossyASCIIStringEncoding])
#define NSLogUTF8Ex(a,b) NSLog(a,[MLTool utf8toNString:[NSString stringWithFormat:@"%@",b]])
+ (NSString *)utf8toNString:(NSString *)str {
    // CFStringTransform mutates its argument, so work on a mutable copy
    NSMutableString *strT = [[str stringByReplacingOccurrencesOfString:@"\\U" withString:@"\\u"] mutableCopy];
    // Reverse "Any-Hex/Java" turns \uXXXX escapes back into characters
    CFStringTransform((__bridge CFMutableStringRef)strT, NULL, CFSTR("Any-Hex/Java"), YES);
    return strT;
}

iOS NSString in UTF16

I have a string that I fetched from an Apache server over HTTP:
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
responseString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
...
I need to make that string a UTF16 string. I don't want to turn it into NSData. I need to keep it NSString and I need it to be in UTF16.
I would be happy to put it in an NSData object even, if I could do it as UTF16. I'm doing something similar now:
[self.returnedData appendData:data];
But that still transfers it as UTF8.
It's probably simple and I'm missing it. But I don't find it in the Apple docs or this site, and my Google-Fu has failed me.
What am I missing? How do I do that?
Thanks for your time and help.
EDIT:
Ok. All of what you and Justin have said makes sense and makes things make more sense.
So this is what I am doing. It seems to be correct from this line but I wanted to make sure I am understanding you correctly.
NSData *resultData = [self.result dataUsingEncoding:NSUTF16LittleEndianStringEncoding];
NSString *resultStr = [[NSString alloc] initWithData:resultData encoding:NSUTF16LittleEndianStringEncoding];
NSString *md5Result = [[NSString stringWithFormat:@"%@",[resultStr MD5]] uppercaseString];
NSLog(@"md5Result = %@",md5Result);
That last part is what I am doing with the string after it's UTF-16. I have a category that makes it an MD5 hex string similar to http://blog.blackwhale.at/?tag=hmac
Thanks again. I'll bump you guys both and say this is the right answer.
A string is a string is a string. The encoding refers to how it's encoded and decoded to and from NSData. @"blah" is the same as @"blah". There is no UTF-8 or UTF-16 for either of those.
Added
So you can do [@"myString" dataUsingEncoding:NSUTF16StringEncoding];
If you convert that back to a string, you'll still have @"myString"
Answer to the last question in the comment below.
So when you POST to a server, the request body is encoded data. So what you want to do is do whatever you want to the string, THEN convert the string to data using a particular encoding, in your case NSUTF16StringEncoding or NSUTF16LittleEndianStringEncoding. You are NOT creating a UTF-16 string. You are converting a Unicode string to UTF-16 encoded data. This is what you need to do then.
NSData *postBody = [[[self.result MD5] uppercaseString] dataUsingEncoding:NSUTF16LittleEndianStringEncoding];
If you need to add more data to the postBody create NSMutableData instead and append the new data as needed.
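To illustrate why the encoding only matters at the byte level, here is a small C sketch; the function name `utf16_to_bytes` is hypothetical. It serializes the same UTF-16 code units in little-endian or big-endian order. The text is identical in both cases but the bytes differ, and anything computed over the bytes, such as a hash, differs with them. Whether the MD5 category from the question hashes UTF-8 or UTF-16 bytes is not shown in the thread, so that part remains an open assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `count` UTF-16 code units into `out` as
   bytes, low byte first if `little_endian` is nonzero, high byte first
   otherwise. No byte-order mark is written. `out` must have room for
   2 * count bytes. Returns the number of bytes written. */
size_t utf16_to_bytes(const uint16_t *units, size_t count,
                      int little_endian, uint8_t *out) {
    for (size_t i = 0; i < count; i++) {
        uint8_t hi = (uint8_t)(units[i] >> 8);
        uint8_t lo = (uint8_t)(units[i] & 0xFF);
        out[2 * i]     = little_endian ? lo : hi;
        out[2 * i + 1] = little_endian ? hi : lo;
    }
    return 2 * count;
}
```

For the two-character text "hi", this yields 68 00 69 00 in little-endian order and 00 68 00 69 in big-endian order, while UTF-8 would give just 68 69.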
NSString holds a buffer of whatever encoding it chooses - that may be UTF-8, UTF-16, or something else.
If you just want to create an NSString from a UTF-16 sequence, try NSUTF16BigEndianStringEncoding or one of its relatives.
