How do I convert NSString to an encoding other than UTF-8? - ios

I'm working with C in an iOS project. I'm trying to convert my string to the expected type in C; the code below is what is supposed to be sent to the core library:
typedef uint16_t UniCharT;
static const UniCharT s_learnWord[] = {'H', 'e','l','\0'};
What I have done till now is below; string is the one I'm passing:
NSString *string = @"Hel";
static const UniCharT *a = (UniCharT *)[string UTF8String];
But it fails to convert when there is more than one character; if I pass one character it works fine. Please let me know where I went wrong and how I can pass it like s_learnWord.
I also searched Google and Stack Overflow, and none of the duplicates or answers worked for me, e.g. Convert NSString into char array; I'm already doing it the same way.

Your question is a little ambiguous: the title says "c type char[]", but your code uses typedef uint16_t UniCharT, which is contradictory.
For any string conversions other than UTF-8, you normally want to use the method getCString:maxLength:encoding:.
As you are using uint16_t, you probably are trying to use UTF-16? You'll want to pass NSUTF16StringEncoding as the encoding constant in that case. (Or possibly NSUTF16BigEndianStringEncoding/NSUTF16LittleEndianStringEncoding)
Something like this should work:
#include <stdlib.h>
// ...
NSString *string = @"part";
NSUInteger stringBytes = [string maximumLengthOfBytesUsingEncoding:NSUTF16StringEncoding];
stringBytes += sizeof(UniCharT); // make space for \0 termination
UniCharT *convertedString = calloc(1, stringBytes);
[string getCString:(char *)convertedString
         maxLength:stringBytes
          encoding:NSUTF16StringEncoding];
// now use convertedString, pass it to library etc.
free(convertedString);
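If you want to sanity-check the buffer, a minimal sketch like this (run before the free call above; the loop variable p is mine) dumps each UTF-16 code unit up to the NUL terminator:
// Walk the converted UTF-16 code units and print each one in hex.
for (const UniCharT *p = convertedString; *p != 0; p++) {
    NSLog(@"code unit: 0x%04X", (unsigned)*p);
}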


Copyright/Registered symbol encoding not working

I've developed an iOS app in which we can send emojis from iOS to a web portal and vice versa. All emojis sent from iOS to the web portal display perfectly except “©” and “®”.
Here is the emoji encoding piece of code.
NSData *data = [messageBody dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodedString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
// This piece of code returns \251\256 as the codes for the copyright and registered symbols; since these two escapes are not standard Unicode escapes, they don't display on the web portal.
So what should I do to convert them to standard Unicode escapes?
Test run:
messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
// The encoded string I get from the above encoding is
Copy right symbol : \\251 AND Registered Mark symbol : \\256
Whereas it should look like this (in standard Unicode escapes):
Copy right symbol : \\u00A9 AND Registered Mark symbol : \\u00AE
First, I will try to provide the solution. Then I will try to explain why.
Escaping non-ASCII chars
To escape unicode chars in a string, you shouldn't rely on NSNonLossyASCIIStringEncoding. Below is the code that I use to escape unicode and non-ASCII chars in a string:
// NSMutableString category
- (void)appendChar:(unichar)charToAppend {
    [self appendFormat:@"%C", charToAppend];
}

// NSString category
- (NSString *)UEscapedString {
    char const hexChar[] = "0123456789ABCDEF";
    NSMutableString *outputString = [NSMutableString string];
    for (NSInteger i = 0; i < self.length; i++) {
        unichar character = [self characterAtIndex:i];
        if ((character >> 7) > 0) {
            [outputString appendString:@"\\u"];
            [outputString appendChar:(hexChar[(character >> 12) & 0xF])]; // hex character for the left-most 4 bits
            [outputString appendChar:(hexChar[(character >> 8) & 0xF])];  // hex for the second group of 4 bits from the left
            [outputString appendChar:(hexChar[(character >> 4) & 0xF])];  // hex for the third group
            [outputString appendChar:(hexChar[character & 0xF])];         // hex for the last group, i.e. the right-most 4 bits
        } else {
            [outputString appendChar:character];
        }
    }
    return [outputString copy];
}
(NOTE: I guess Jon Rose's method does the same but I didn't wanna share a method that I didn't test)
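For reference, a quick usage sketch (messageBody being the string from the question):
NSString *escapedString = [messageBody UEscapedString];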
Now you have the following string: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE
Escaping unicode is done. Now let's convert it back to display the emojis.
Converting back
This is gonna be confusing at first but this is what it is:
NSData *data = [escapedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *converted = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
Now you have your emojis (and other non-ASCIIs) back.
What is happening?
The problem
In your case, you are trying to create a common language between your server side and your app. However, NSNonLossyASCIIStringEncoding is a pretty bad choice for the purpose, because it is a black box created by Apple and we don't really know what exactly it is doing inside. As we can see, it converts unicode into \uXXXX while converting non-ASCII chars into \XXX. That is why you shouldn't rely on it to build a multi-platform system. There is no equivalent of it on backend platforms or Android.
Mysteriously enough, NSNonLossyASCIIStringEncoding can still convert ® back from \u00AE even though it converted it into \256 in the first place. I'm sure there are tools on other platforms to convert \uXXXX into unicode chars, so that shouldn't be a problem for you.
messageBody is a string; there is no reason to convert it to data only to convert it back to a string again. Replace your code with:
NSString *encodedString = messageBody;
If the messageBody object is incorrect, then the way to fix it is to change the way it was created. The server sends data, not strings. The data that the server sends is encoded in some agreed-upon way. Generally this encoding is UTF-8. If you know the encoding you can convert the data to a string; if you don't, then the data is gibberish that cannot be read. If messageBody is incorrect, the problem occurred when it was converted from the data that the server sent. It seems likely that you are parsing it with the incorrect encoding.
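For instance (a sketch; receivedData stands in for whatever NSData your networking layer actually hands you):
// Assuming the server sent UTF-8 encoded bytes:
NSString *messageBody = [[NSString alloc] initWithData:receivedData encoding:NSUTF8StringEncoding];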
The code you posted is just plain wrong. It converts a string to data using one encoding (ASCII) and then reads that data with a different encoding (UTF-8). That is like translating a book to Spanish and then having a Portuguese speaker translate it back - it might work for some words, but it is still wrong.
If you are still having trouble then you should share the code of where messageBody is created.
If your server expects an ASCII string with all unicode characters changed to \u00xx, then you should first yell at your server guy, because he is an idiot. But if that doesn't work, you can use the following code:
NSString *messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
NSData *utf32Data = [messageBody dataUsingEncoding:NSUTF32StringEncoding];
uint32_t *bytes = (uint32_t *)[utf32Data bytes];
NSMutableString *escapedString = [[NSMutableString alloc] init];
// Start at 1 because the first 4 bytes are the byte-order mark
for (NSUInteger index = 1; index < utf32Data.length / 4; index++) {
    uint32_t charValue = bytes[index];
    if (charValue <= 127) {
        [escapedString appendFormat:@"%C", (unichar)charValue];
    } else {
        [escapedString appendFormat:@"\\u%04X", charValue];
    }
}
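If that works as intended (untested sketch), logging the result for the sample messageBody should print the escaped form:
NSLog(@"%@", escapedString);
// Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE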
I really do not understand your problem.
You can simply convert ANY character into NSData and turn it back into a string.
You can simply pass a UTF-8 string, including both emoji and other symbols, using a POST request.
NSString* newStr = [[NSString alloc] initWithData:theData encoding:NSUTF8StringEncoding];
NSData* data = [newStr dataUsingEncoding:NSUTF8StringEncoding];
It has to work for both the server and the client side.
But, of course, there is the other problem that some fonts do not support all UTF-8 chars. That's why, e.g., in a terminal you might not see some of them. But this is beyond the scope of this question.
NSNonLossyASCIIStringEncoding is needed only when you really want to convert a symbol into a chain of symbols. But that is not needed here.

iOS: what's the best type for handling INDIVIDUAL unicode chars? wchar_t? UTF32Char?

I have a set of legacy data that includes individual Unicode chars formed based on this struct:
struct LocalGrRec {
    wchar_t cBegin;
    int x1;
    wchar_t cEnd;
    int x2;
};
and a typical record looks like this, i.e., includes both long and short Unicode characters
{L'a', 0, L'¥', 3}
I can change the struct to make it easier to handle reading these characters into character variables:
wchar_t c = rec.cBegin;
// or
UTF32Char c = rec.cBegin;
Which one (or perhaps another choice that I don't know of) would make it easier to handle it. Please note that I need to process them as individual chars, but eventually I'll need to include them in an NSString.
What solution gives me the maximum flexibility and minimum pain?
And how would I read that character into an NSString?
Thanks
edit:
I need to compose NSString with it, not the other way around.
With unichar, here's the problem:
unichar c = L'•';
NSLog(@"%c", c); // produces: (") wrong character, presumably the first half of '•'
NSLog(@"%C", c); // produces: (\342\200)
I think you are looking for this method:
[NSString stringWithCharacters:(const unichar*) length:(NSUInteger)];
Just pass it an array of unichars and a length, and it will give you an NSString back:
unichar list[3] = {'A', 'B', 'C'};
NSString *listString = [NSString stringWithCharacters:list length:3];
NSLog(@"listString: %@", listString);
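If you also need to turn a single UTF32Char into an NSString (covering code points outside the Basic Multilingual Plane, which is where a lone unichar fails), a small helper along these lines should work; the function name is mine, not a system API:
static NSString *StringFromUTF32Char(UTF32Char c) {
    unichar units[2];
    // Code points above 0xFFFF must be encoded as a UTF-16 surrogate pair.
    if (CFStringGetSurrogatePairForLongCharacter(c, units)) {
        return [NSString stringWithCharacters:units length:2];
    }
    units[0] = (unichar)c;
    return [NSString stringWithCharacters:units length:1];
}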

Display 5-digit base unicode character from the Entypo font

I'm using the Entypo font in my iPhone app but it's working fine only for some characters. I'm not able to display icons using five-digit unicode values.
I found some information on the Web saying this is due to the UTF encoding supported on iOS (and in other languages too), and that 5-digit unicode values should be split into two values.
But I haven't found a clear how-to description or a code sample.
My code to display an Entypo symbol is something like this:
myLabel.text = [NSString stringWithUTF8String:"\u25B6"];
myLabel.font = [UIFont fontWithName:@"Entypo" size:200];
If I replace the unicode value with "\u1F342", which is the leaf icon in the Entypo font, then an invalid character is displayed.
If you have already encountered this issue, perhaps you could help me save some time.
Thanks very much
If you check out the unicode page for that character, you'll see that its UTF-8 encoding is 0xF0 0x9F 0x8D 0x82 - that's what you should be using:
myLabel.text = [NSString stringWithUTF8String:"\xF0\x9F\x8D\x82"];
Note: totally untested.
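Alternatively (also untested, but standard C), you can let the compiler produce those UTF-8 bytes from the code point itself by using the 8-digit \U escape in a plain C string literal:
myLabel.text = [NSString stringWithUTF8String:"\U0001F342"];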
After several searches I finally found a solution that is easy to use in both cases: a symbol encoded with up to 4 digits, and one with more than 4 digits.
I defined a NSString category as follows:
#import "NSString+Extension.h"

@implementation NSString (Extension)

/**
 * Convert a UTF8 symbol to a string which can directly be used as text in a label view, for instance, for which the right font has been specified.
 *
 * The method can be used in both cases:
 * . the symbol is defined as a const char with a maximum of 4 digits. In this case the first parameter must be 0 and the second is used. Example: NSString *symbolString = [NSString symbolStringfromUnicode:0 orChar:"\uE766"]
 * . the symbol is defined as an integer with hexadecimal notation. It can have either fewer or more than 4 digits. In this case, only the first parameter is used. Example: NSString *prefixSymbol = [NSString symbolStringfromUnicode:0x1F464 orChar:nil];
 *
 * @param symbolUnicode symbol to convert, defined as int
 * @param symbolChar symbol to convert, defined as const char *
 */
+ (NSString *)symbolStringfromUnicode:(int)symbolUnicode orChar:(const char *)symbolChar
{
    NSString *symbolString;
    if (symbolUnicode == 0) {
        symbolString = [NSString stringWithUTF8String:symbolChar];
    } else {
        int unicode = symbolUnicode;
        symbolString = [[NSString alloc] initWithBytes:&unicode
                                                length:sizeof(unicode)
                                              encoding:NSUTF32LittleEndianStringEncoding];
    }
    return symbolString;
}

@end
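Hypothetical usage, mirroring the examples in the comment above (myLabel is a stand-in for your label):
myLabel.font = [UIFont fontWithName:@"Entypo" size:200];
myLabel.text = [NSString symbolStringfromUnicode:0x1F342 orChar:nil];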

Converting int to NSString

I thought I had nailed converting an int to an NSString a while back, but each time I run my code, the program gets to the following lines and crashes. Can anyone see what I'm doing wrong?
NSString *rssiString = (int)self.selectedBeacon.rssi;
UnitySendMessage("Foo", "RSSIValue", [rssiString UTF8String] );
These lines should take the rssi value (Which is an NSInt) convert it to a string, then pass it to my unity object in a format it can read.
What am I doing wrong?
NSString *rssiString = [NSString stringWithFormat:@"%d", self.selectedBeacon.rssi];
UPDATE: it is important to remember there is no such thing as NSInt. In my snippet I assumed that you meant NSInteger.
If you use a 32-bit environment, use this:
NSString *rssiString = [NSString stringWithFormat:@"%d", self.selectedBeacon.rssi];
But you can't use this in a 64-bit environment, because it will give the warning below:
Values of type 'NSInteger' should not be used as format arguments; add an explicit cast to 'long'
So use the code below instead, though it will give a warning in a 32-bit environment:
NSString *rssiString = [NSString stringWithFormat:@"%ld", self.selectedBeacon.rssi];
If you want one line of code for both (32-bit & 64-bit), just cast:
NSString *rssiString = [NSString stringWithFormat:@"%ld", (long)self.selectedBeacon.rssi];
I'd like to provide a sweet way to do this job:
// For any numbers.
int iValue;
NSString *sValue = [@(iValue) stringValue];
// Even more concise!
NSString *sValue = @(iValue).stringValue;
NSString *rssiString = [self.selectedBeacon.rssi stringValue];
(Note: this only works if rssi is an NSNumber; a plain NSInteger has no stringValue method.)
For simple conversions of basic number values, you can use a technique called casting. A cast forces a value to perform a conversion based on strict rules established for the C language. Most of the rules dictate how conversions between numeric types (e.g., long and short versions of int and float types) are to behave during such conversions.
Specify a cast by placing the desired output data type in parentheses before the original value. For example, the following changes an int to a float:
float myValueAsFloat = (float)myValueAsInt;
One of the rules that could impact you is that when a float or double is cast to an int, the numbers to the right of the decimal (and the decimal) are stripped off. No rounding occurs. You can see how casting works for yourself in Workbench by modifying the runMyCode: method as follows:
- (IBAction)runMyCode:(id)sender {
    double a = 12345.6789;
    int b = (int)a;
    float c = (float)b;
    NSLog(@"\ndouble = %f\nint of double = %d\nfloat of int = %f", a, b, c);
}
the console reveals the following log result:
double = 12345.678900
int of double = 12345
float of int = 12345.000000
original link is http://answers.oreilly.com/topic/2508-how-to-convert-objective-c-data-types-within-ios-4-sdk/
If self.selectedBeacon.rssi is an int, and it appears you're interested in providing a char * string to the UnitySendMessage API, you could skip the trip through NSString:
char rssiString[19];
sprintf(rssiString, "%d", self.selectedBeacon.rssi);
UnitySendMessage("Foo", "RSSIValue", rssiString );
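A slightly defensive variant of the same idea (my tweak, not part of the original answer) bounds the write with snprintf so the buffer size is enforced:
char rssiString[19];
snprintf(rssiString, sizeof(rssiString), "%d", self.selectedBeacon.rssi);
UnitySendMessage("Foo", "RSSIValue", rssiString);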

convert unicode string to nsstring

I have a unicode string as follows:
{\rtf1\ansi\ansicpg1252\cocoartf1265
{\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\fnil\fcharset0 LucidaGrande;}
{\colortbl;\red255\green255\blue255;}
{\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{check\}}{\leveltext\leveltemplateid1\'01\uc0\u10003 ;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
{\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
\paperw11900\paperh16840\margl1440\margr1440\vieww22880\viewh16200\viewkind0
\pard\li720\fi-720\pardirnatural
\ls1\ilvl0
\f0\fs24 \cf0 {\listtext
\f1 \uc0\u10003
\f0 }One\
{\listtext
\f1 \uc0\u10003
\f0 }Two\
}
Here I have the unicode data \u10003, which is equivalent to the "✓" character. I have used
[NSString stringWithCharacters:"\u10003" length:NSUTF16StringEncoding], which throws a compilation error. Please let me know how to convert these unicode characters to "✓".
Regards,
Boom
I had the same problem and the following code solved my issue.
For encoding:
NSData *dataenc = [yourtext dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodevalue = [[NSString alloc] initWithData:dataenc encoding:NSUTF8StringEncoding];
For decoding:
NSData *data = [yourtext dataUsingEncoding:NSUTF8StringEncoding];
NSString *decodevalue = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
Thanks
I have used the code below to convert a Unicode string to an NSString. This should work fine.
NSData *unicodedStringData = [unicodedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue = [[NSString alloc] initWithData:unicodedStringData encoding:NSNonLossyASCIIStringEncoding];
In Swift 4
let emoji = "😃"
let unicodedData = emoji.data(using: String.Encoding.utf8, allowLossyConversion: true)
let emojiString = String(data: unicodedData!, encoding: String.Encoding.utf8)
I assume that:
You are reading this RTF data from a file or other external source.
You are parsing it yourself (not using, say, AppKit's built-in RTF parser).
You have a reason why you're parsing it yourself, and that reason isn't “wait, AppKit has this built in?”.
You have come upon \u… in the input you're parsing and need to convert that to a character for further handling and/or inclusion in the output text.
You have ruled out \uc, which is a different thing (it specifies the number of non-Unicode bytes that follow the \u… sequence, if I understood the RTF spec correctly).
\u is followed by hexadecimal digits. You need to parse those to a number; that number is the Unicode code point number for the character the sequence represents. You then need to create an NSString containing that character.
If you're using NSScanner to parse the input, then (assuming you have already scanned past the \u itself) you can simply ask the scanner to scanHexInt:. Pass a pointer to an unsigned int variable.
If you're not using NSScanner, do whatever makes sense for however you're parsing it. For example, if you've converted the RTF data to a C string and are reading through it yourself, you'll want to use strtoul to parse the hex number. It'll interpret the number in whatever base you specify (in this case, 16) and then put the pointer to the next character wherever you want it.
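A rough sketch of that strtoul route (the pointer p is hypothetical, assumed to point just past the \u in your C-string buffer):
char *end = NULL;
unsigned long codePoint = strtoul(p, &end, 16);
// end now points at the first character after the digits, where parsing should resume.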
Your unsigned int or unsigned long variable will then contain the Unicode code point value for the specified character. In the example from your question, that will be 0x10003, or U+10003.
Now, for most characters, you could simply assign that over to a unichar variable and create an NSString from that. That won't work here: unichars only go up to 0xFFFF, and this code point is higher than that (in technical terms, it's outside the Basic Multilingual Plane).
Fortunately, CFString has a function to help you:
unsigned int codePoint = /*…*/;
unichar characters[2];
NSUInteger numCharacters = 0;
if (CFStringGetSurrogatePairForLongCharacter(codePoint, characters)) {
    numCharacters = 2;
} else {
    characters[0] = codePoint;
    numCharacters = 1;
}
You can then use stringWithCharacters:length: to create an NSString from this array of 16-bit characters.
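So the last step might look like this (a one-line completion of the sketch above):
NSString *result = [NSString stringWithCharacters:characters length:numCharacters];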
Use this:
NSString *myUnicodeString = @"\u10003";
Thanks to modern Objective-C.
Let me know if it's not what you want.
NSString *strUnicodeString = @"\u2714";
NSData *unicodedStringData = [strUnicodeString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue = [[NSString alloc] initWithData:unicodedStringData encoding:NSUTF8StringEncoding];
