Turning escaped unicode char[] into NSString - ios

I have the following char[] str = "\xe7a";
This is the result of converting "ça" to an escaped form with Python's .encode('unicode-escape').
When it reaches iOS I'm trying to convert it back to "ça", but I can't find the right method to do it.
How can I convert \x escaped characters into their proper characters using iOS functions?
str = [[[NSString alloc] initWithBytes:m.param5 length:STRING_PARAM_LENGTH encoding:NSASCIIStringEncoding] UTF8String];
doesn't work
str = [[NSString alloc] initWithBytes:m.param5 length:STRING_PARAM_LENGTH encoding:NSUTF8StringEncoding];
doesn't work
str = [NSString stringWithUTF8String:m.param5];
doesn't work either
Any ideas?

Assuming \xe7 means the byte 0xe7, the char array is encoded as Windows-1252/ISO-8859-1... so:
NSString *string = [NSString stringWithCString:str encoding:NSISOLatin1StringEncoding];
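Why Latin-1 is the right choice here: in ISO-8859-1 every byte value is also the Unicode code point of the character it represents, so the byte 0xE7 is U+00E7 (ç). The plain-C sketch below (valid Objective-C too; latin1_to_utf8 is a hypothetical helper, not an Apple API) makes that mapping explicit by re-encoding Latin-1 bytes as UTF-8, which is roughly the work NSISOLatin1StringEncoding does for you. It assumes true ISO-8859-1; Windows-1252 differs in the 0x80 to 0x9F range, which this sketch maps to C1 control characters.

```c
#include <stddef.h>

/* Re-encode ISO-8859-1 (Latin-1) bytes as UTF-8.
   Each Latin-1 byte value equals its Unicode code point (U+0000..U+00FF),
   so bytes below 0x80 are copied through and the rest become two UTF-8 bytes.
   `out` must have room for 2 * len + 1 bytes; the result is NUL-terminated.
   Returns the number of bytes written, excluding the terminator. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = (char)b;                   /* ASCII: unchanged */
        } else {
            out[o++] = (char)(0xC0 | (b >> 6));   /* 110xxxxx */
            out[o++] = (char)(0x80 | (b & 0x3F)); /* 10xxxxxx */
        }
    }
    out[o] = '\0';
    return o;
}
```

For the question's input, the bytes 0xE7 'a' become C3 A7 61, the UTF-8 spelling of "ça".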
If the contents are literally a backslash, x, e, and 7, you need to turn that into the real byte value it implies before running the above code.
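For that literal case (Python's unicode-escape writes characters U+0080 to U+00FF as the four characters \xHH, and a backslash as \\), a minimal plain-C sketch of the conversion step might look like this. The name unescape_hex is hypothetical; it only decodes \xHH with exactly two hex digits and \\, and copies everything else unchanged, so the \uXXXX escapes Python uses for characters above U+00FF are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Replace each "\xHH" escape in `in` with the single byte 0xHH and each "\\"
   with one backslash; other characters are copied unchanged.
   `out` needs at most strlen(in) + 1 bytes. Returns the decoded length.
   The output is NUL-terminated, but may also contain embedded 0x00 bytes
   if the input contained "\x00". */
size_t unescape_hex(const char *in, char *out)
{
    size_t o = 0;
    while (*in) {
        if (in[0] == '\\' && in[1] == '\\') {
            out[o++] = '\\';
            in += 2;
        } else if (in[0] == '\\' && in[1] == 'x' &&
                   isxdigit((unsigned char)in[2]) &&
                   isxdigit((unsigned char)in[3])) {
            char hex[3] = { in[2], in[3], '\0' };
            out[o++] = (char)strtoul(hex, NULL, 16);
            in += 4;
        } else {
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return o;
}
```

The decoded bytes are Latin-1, so they can then go through the NSISOLatin1StringEncoding call above; using initWithBytes:length:encoding: with the returned length avoids stopping early if a \x00 was decoded.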

Related

Encoding NSString cut short using NSASCIIStringEncoding

I've got this:
<53657269 616c3a20 39303030 30303138 3b4d6f64 656c3a20 32323031 3b466972 6d776172 653a2030 3431353b 4c696272 6172793a 20535444 30363132 3b566f69 63653a20 4d31303b 546f7765 723a2059 65733b52 65636f72 643a2059 65733b44 69616c3a 204e6f3b 554f7074 733a2031 39383b46 756e6374 696f6e73 3a205245 44411034 424c5545 1011546f 6c6c0118 48796d6e a003466e 63351066 666f6f64 10556261 636f0000 746f6173 10253b4c 6162656c 733a2042 4c554542 4c554523 466e6338 544f4153 54455223 466e6337 4241434f 4e23466e 63354c41 50544f50 53235245 44415445 58415323 466e6336 42524943 4b532374 6f617354 4f415354 45522362 61636f42 41434f4e 23666f6f 64424143 4f4e3b4d 696c5665 723a2035 2e302e36 2e313b4c 6f67696e 3a205965 73>
I'm using this to convert to a NSString:
NSString *info = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
My output is:
Serial: 90000018;Model: 2201;Firmware: 0415;Library: STD0612;Voice: M10;Tower: Yes;Record: Yes;Dial: No;UOpts: 198;Functions: REDA4BLUETollHymn Fnc5ffoodUbaco
If you convert the entire set you get a lot more than what is showing up in the string. Why is it cutting it short in the encoding?
Quickly converting with readily available encoders online you get the full conversion:
Serial: 90000018;Model: 2201;Firmware: 0415;Library: STD0612;Voice: M10;Tower: Yes;Record: Yes;Dial: No;UOpts: 198;Functions: REDA4BLUETollHymn Fnc5ffoodUbacotoas%;Labels: BLUEBLUE#Fnc8TOASTER#Fnc7BACON#Fnc5LAPTOPS#REDATEXAS#Fnc6BRICKS#toasTOASTER#bacoBACON#foodBACON;MilVer: 5.0.6.1;Login: Yes
Why is NSString *info = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding]; only writing about half to the string?
I suspect that it's converting the whole string, but the mechanism you're using to examine it is truncating it. It looks like your ASCII-encoded data has embedded null characters in it. NSString is perfectly capable of holding embedded null characters, but anything which converts to a C-style string will stop processing after it hits one of those.
What do you get if you post-process your string using the following?
unichar nullUnichar = 0;
NSString* nullCharString = [[NSString alloc] initWithCharacters:&nullUnichar length:1];
info = [info stringByReplacingOccurrencesOfString:nullCharString withString:@"\\x00"];
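To see why anything that treats the data as a C string stops early, here is a small plain-C illustration using the "baco\0\0toas" fragment from the dump (bytes 62 61 63 6f 00 00 74 6f 61 73): strlen stops at the first NUL even though the buffer holds ten bytes. The helper replace_nuls is hypothetical; it mirrors the stringByReplacingOccurrencesOfString: idea from the answer at the byte level.

```c
#include <stddef.h>
#include <string.h>

/* Copy `len` bytes from `in` to `out`, writing the visible four-character
   marker \x00 in place of every NUL byte so nothing downstream mistakes it
   for the end of the string.
   `out` needs room for 4 * len + 1 bytes. Returns the output length. */
size_t replace_nuls(const char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\0') {
            memcpy(out + o, "\\x00", 4);
            o += 4;
        } else {
            out[o++] = in[i];
        }
    }
    out[o] = '\0';
    return o;
}
```

The key point is that the length has to travel separately from the bytes (as NSData and NSString do); once only a char pointer is passed around, the first 0x00 ends the text.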

Converting Unicode to nsstring

I am writing code in Objective-C that converts Unicode escape sequences to a Tibetan string on iOS. Strangely, the code gives the right conversion for some input but returns nil for other input.
For this input
cellName = "\U0f56\U0f7c\U0f51\U0f0d";
cellSubtitle = "\U0f56\U0f7c\U0f51\U0f0b\U0f62\U0f72\U0f42\U0f66\U0f0b\U0f60\U0f51\U0f74\U0f66\U0f0b\U0f66\U0fa1\U0f7c\U0f51\U0f0b\U0f40\U0fb1\U0f72\U0f0b\U0f61\U0f74\U0f63\U0f0b\U0f42\U0fb2\U0f74\U0f0d \U0f66\U0f9f\U0f7c\U0f51\U0f0b\U0f58\U0f44\U0f60\U0f0b\U0f62\U0f72\U0f66\U0f0b\U0f66\U0f90\U0f7c\U0f62\U0f0b\U0f42\U0f66\U0f74\U0f58\U0f0b\U0f51\U0f44\U0f0c\U0f0d \U0f56\U0f62\U0f0b\U0f51\U0f56\U0f74\U0f66\U0f0b\U0f42\U0f59\U0f44\U0f0b\U0f62\U0f74\U0f0b\U0f56\U0f5e\U0f72\U0f0d \U0f66\U0fa8\U0f51\U0f0b\U0f58\U0f51\U0f7c\U0f0b\U0f41\U0f58\U0f66\U0f0b\U0f66\U0f92\U0f44\U0f0b\U0f51\U0fb2\U0f74\U0f42\U0f0b\U0f56\U0f45\U0f66\U0f0b\U0f66\U0f7c\U0f0d \U0f0d";
the converted text is
cellName = "བོད།";
cellSubtitle = "བོད་རིགས་འདུས་སྡོད་ཀྱི་ཡུལ་གྲུ། སྟོད་མངའ་རིས་སྐོར་གསུམ་དང༌། བར་དབུས་གཙང་རུ་བཞི། སྨད་མདོ་ཁམས་སྒང་དྲུག་བཅས་སོ། །";
But for this input
cellName = "\U0f46\U0f74\U0f44\U0f0b\U0f0d";
cellSubtitle = "(\U0f62\U0f92\U0fb1\U0f53\U0f0b\U0f5a\U0f72\U0f42) \U0f21 \U0f46\U0f7a\U0f0b\U0f56\U0f60\U0f72\U0f0b\U0f63\U0fa1\U0f7c\U0f42\U0f0b\U0f5f\U0fb3\U0f0d \U0f56\U0f7c\U0f44\U0f66\U0f0b\U0f5a\U0f7c\U0f51\U0f0b\U0f51\U0f44\U0f0b\U0f62\U0f92\U0fb1\U0f0b\U0f41\U0fb1\U0f7c\U0f53\U0f0b\U0f66\U0f7c\U0f42\U0f66\U0f0b\U0f63\U0f9f\U0f7c\U0f66\U0f0b\U0f66\U0f0b\U0f63\U0f9f\U0f7c\U0f66\U0f0b\U0f60\U0f47\U0f7c\U0f42\U0f0b\U0f42\U0f72\U0f0b\U0f66\U0f92\U0f7c\U0f0b\U0f53\U0f66\U0f0b\U0f49\U0f74\U0f44\U0f0b\U0f44\U0f74\U0f0b\U0f59\U0f58\U0f0b\U0f63\U0f66\U0f0b\U0f5f\U0f72\U0f53\U0f0b\U0f58\U0f7a\U0f51\U0f0b\U0f54\U0f60\U0f72\U0f0b\U0f51\U0f7c\U0f53\U0f0b\U0f4f\U0f7a\U0f0d \n \U0f51\U0f54\U0f7a\U0f62\U0f0b\U0f53\U0f0d \U0f41\U0f44\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f5e\U0f72\U0f44\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f46\U0f74\U0f0b\U0f56\U0f7c\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f51\U0f40\U0f62\U0f0b\U0f61\U0f7c\U0f63\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f56\U0f0b\U0f5e\U0f7a\U0f66\U0f0b\U0f54\U0f0b\U0f63\U0f9f\U0f0b\U0f56\U0f74\U0f0d \n2. \U0f53\U0f0b\U0f5a\U0f7c\U0f51\U0f0b\U0f42\U0f5e\U0f7c\U0f53\U0f0b\U0f54\U0f60\U0f72\U0f0b\U0f51\U0f7c\U0f53\U0f0b\U0f4f\U0f7a\U0f0d \n \U0f51\U0f54\U0f7a\U0f62\U0f0b\U0f53\U0f0d \U0f53\U0f74\U0f0b\U0f56\U0f7c\U0f0b\U0f53\U0f72\U0f0b\U0f55\U0f74\U0f0b\U0f56\U0f7c\U0f0b\U0f63\U0f66\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f46\U0f74\U0f44\U0f0b\U0f44\U0f74\U0f60\U0f72\U0f0b\U0f51\U0f74\U0f66\U0f0d \U0f55\U0f74\U0f0b\U0f56\U0f7c\U0f0b\U0f63\U0f66\U0f0b\U0f53\U0f74\U0f0b\U0f56\U0f7c\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f56\U0fb1\U0f72\U0f66\U0f0b\U0f54\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f56\U0f0d \U0f56\U0f74\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f56\U0f0b\U0f5e\U0f7a\U0f66\U0f0b\U0f54\U0f0b\U0f63\U0f9f\U0f0b\U0f56\U0f74\U0f0d3. 
\U0f51\U0f58\U0f53\U0f0b\U0f54\U0f60\U0f58\U0f0b\U0f5e\U0f53\U0f0b\U0f54\U0f60\U0f72\U0f0b\U0f51\U0f7c\U0f53\U0f0b\U0f4f\U0f7a\U0f0d \U0f66\U0f7a\U0f58\U0f66\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f49\U0f58\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f56\U0fb3\U0f7c\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f0d \U0f40\U0fb3\U0f51\U0f0b\U0f46\U0f74\U0f44\U0f0b\U0f5e\U0f7a\U0f66\U0f0b\U0f54\U0f0b\U0f63\U0f9f\U0f0b\U0f56\U0f74\U0f0d";
it gives me a nil value.
This is the code I am using for the conversion:
NSData *data2 = [stringJoinedByNewLines2 dataUsingEncoding:NSUTF8StringEncoding];
NSString *decodevalue2 = [[NSString alloc] initWithData:data2 encoding:NSNonLossyASCIIStringEncoding];

How to put unicode char into NSString

For example I could type an emoji character code such as:
NSString* str = @"😊";
NSLog(@"%@", str);
The smiley emoji appears in the console.
Presumably the code editor and the compiler treat the literal as UTF-8.
Now I'm working in a full-Unicode environment (32 bits per character). I have the code point of the emoji, and I want to convert that 32-bit value into an NSString, for example:
int charcode = 0x0001F60A;
NSLog(@"%??", charcode);
What should I put in place of "??" so that charcode is formatted as an emoji string?
Note that charcode is a variable whose value cannot be determined at compile time.
I'd rather not convert the 32-bit int into UTF-8 bytes unless that is the only way.
If 0x0001F60A is a dynamic value determined at runtime then
you can use the NSString method
- (instancetype)initWithBytes:(const void *)bytes length:(NSUInteger)len encoding:(NSStringEncoding)encoding;
to create a string containing a character with the given Unicode value:
int charcode = 0x0001F60A;
uint32_t data = OSSwapHostToLittleInt32(charcode); // Convert to little-endian
NSString *str = [[NSString alloc] initWithBytes:&data length:4 encoding:NSUTF32LittleEndianStringEncoding];
NSLog(@"%@", str); // 😊
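Use the NSString initialization method directly. This passes the int's in-memory bytes, so it relies on the host being little-endian, as current iOS devices are:
int charcode = 0x0001F60A;
NSLog(@"%@", [[NSString alloc] initWithBytes:&charcode length:4 encoding:NSUTF32LittleEndianStringEncoding]);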
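Both answers hand initWithBytes:length:encoding: four bytes of UTF-32 in little-endian order: the first gets there with OSSwapHostToLittleInt32, the second by relying on the host's byte order. As a sketch of what those four bytes are, here is a plain-C helper (utf32le_bytes is a hypothetical name) that builds them with shifts, which gives the same result on any host.

```c
#include <stdint.h>

/* Write code point `cp` as UTF-32 little-endian: least significant byte first.
   Shifts operate on the value, not on its memory layout, so the output is the
   same on big- and little-endian hosts. */
void utf32le_bytes(uint32_t cp, unsigned char out[4])
{
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)((cp >> 24) & 0xFF);
}
```

For 0x1F60A the bytes are 0A F6 01 00; that buffer, with length 4 and NSUTF32LittleEndianStringEncoding, is what the answers pass to NSString.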

Objective c unicode char* to NSString

I'm getting a char* containing Latin letters.
Printed, it looks like: M\xe4da Primavesi
I'm trying to convert it to an NSString; the final result should be Mäda Primavesi.
Does anyone know how the conversion can be done?
The encoding you want is NSISOLatin1StringEncoding:
NSString *latin = [NSString stringWithCString:"M\xe4da Primavesi" encoding:NSISOLatin1StringEncoding];
BUT you will notice that this prints MÚ Primavesi. That is because \x is greedy and interprets the "da" as part of the hex escape \xe4da. You have to find a way to separate the "\xe4" part from the "da" part.
This works:
NSString *latin = [NSString stringWithCString:"M\xe4""da Primavesi" encoding:NSISOLatin1StringEncoding]; // prints Mäda Primavesi
Alternatively, I suggest writing the C string as UTF-8, "M\u00e4da Primavesi", and decoding it with NSUTF8StringEncoding.
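Another way to stop the greedy \x escape, sketched in plain C: an octal escape never takes more than three digits, so "\344" (0xE4 written in octal) cannot swallow the following "da". The small check below (the array and function names are illustrative only) confirms that the concatenation trick from the answer and the octal spelling produce identical bytes.

```c
#include <string.h>

/* Two ways to write "M" + byte 0xE4 + "da Primavesi" without the greedy
   \x escape eating "da":
   1. end the literal after \xe4 and let adjacent literals concatenate;
   2. use an octal escape, which never takes more than three digits. */
static const char concatenated[] = "M\xe4" "da Primavesi";
static const char octal[]        = "M\344da Primavesi";

/* Returns 1 if both spellings produce the same bytes. */
int spellings_match(void)
{
    return sizeof concatenated == sizeof octal &&
           memcmp(concatenated, octal, sizeof octal) == 0;
}
```

Either array can be passed to stringWithCString:encoding: with NSISOLatin1StringEncoding.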
Looks like Latin-1.
[NSString stringWithCString:cString encoding: NSISOLatin1StringEncoding]
Try the NSString API stringWithCString:encoding: as below:
[NSString stringWithCString:cString encoding:NSUTF8StringEncoding];
char *latinChars = "M\xe4da Primavesi";
NSString *chatStr = [NSString stringWithCString:latinChars encoding:NSASCIIStringEncoding];
NSLog(@"chatStr:%@", chatStr);
The result is: MÚ Primavesi
And I tried this:
char *latinChars = "M\xe4 da Primavesi"; // add a blank before 'da'
NSString *chatStr = [NSString stringWithCString:latinChars encoding:NSASCIIStringEncoding];
NSLog(@"chatStr:%@", chatStr);
The result is: Mä da Primavesi

convert unicode string to nsstring

I have the following Unicode string:
{\rtf1\ansi\ansicpg1252\cocoartf1265
{\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\fnil\fcharset0 LucidaGrande;}
{\colortbl;\red255\green255\blue255;}
{\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{check\}}{\leveltext\leveltemplateid1\'01\uc0\u10003 ;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
{\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
\paperw11900\paperh16840\margl1440\margr1440\vieww22880\viewh16200\viewkind0
\pard\li720\fi-720\pardirnatural
\ls1\ilvl0
\f0\fs24 \cf0 {\listtext
\f1 \uc0\u10003
\f0 }One\
{\listtext
\f1 \uc0\u10003
\f0 }Two\
}
Here I have the Unicode data \u10003, which is equivalent to the "✓" character. I have used
[NSString stringWithCharacters:"\u10003" length:NSUTF16StringEncoding], which throws a compilation error. Please let me know how to convert these Unicode sequences to "✓".
I had the same problem, and the following code solved it.
To encode:
NSData *dataenc = [yourtext dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodevalue = [[NSString alloc] initWithData:dataenc encoding:NSUTF8StringEncoding];
To decode:
NSData *data = [yourtext dataUsingEncoding:NSUTF8StringEncoding];
NSString *decodevalue = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
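As a rough plain-C analogue of the decode step, limited to the \uXXXX form (this is not Apple's implementation of NSNonLossyASCIIStringEncoding, and it ignores any other escape forms that encoding may use), the hypothetical decode_u_escapes below turns each \u followed by exactly four hex digits into UTF-8, combining a high/low surrogate pair into one character.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Write code point `cp` to `out` as UTF-8; returns the bytes written. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* If `s` starts with \u and four hex digits, store the value and return 1. */
static int read_u_escape(const char *s, uint32_t *value)
{
    if (s[0] != '\\' || s[1] != 'u')
        return 0;
    for (int i = 2; i < 6; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    char hex[5] = { s[2], s[3], s[4], s[5], '\0' };
    *value = (uint32_t)strtoul(hex, NULL, 16);
    return 1;
}

/* Decode \uXXXX escapes in `in` to UTF-8; other bytes are copied unchanged.
   A high surrogate followed by a low surrogate becomes one character; an
   unpaired surrogate is written as-is (which is not valid UTF-8).
   `out` needs strlen(in) + 1 bytes, because every escape is at least as long
   as its UTF-8 encoding. Returns the output length. */
size_t decode_u_escapes(const char *in, char *out)
{
    size_t o = 0;
    uint32_t hi, lo;
    while (*in) {
        if (read_u_escape(in, &hi)) {
            in += 6;
            if (hi >= 0xD800 && hi <= 0xDBFF && read_u_escape(in, &lo) &&
                lo >= 0xDC00 && lo <= 0xDFFF) {
                hi = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
                in += 6;
            }
            o += put_utf8(hi, out + o);
        } else {
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the ASCII text \u2713 decodes to the three UTF-8 bytes E2 9C 93 (✓), and \ud83d\ude0a decodes to the single emoji U+1F60A.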
I have used the code below to convert a Unicode string to an NSString. This should work fine.
NSData *unicodedStringData =
[unicodedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue =
[[NSString alloc] initWithData:unicodedStringData encoding:NSNonLossyASCIIStringEncoding];
In Swift 4
let emoji = "😃"
let unicodedData = emoji.data(using: String.Encoding.utf8, allowLossyConversion: true)
let emojiString = String(data: unicodedData!, encoding: String.Encoding.utf8)
I assume that:
You are reading this RTF data from a file or other external source.
You are parsing it yourself (not using, say, AppKit's built-in RTF parser).
You have a reason why you're parsing it yourself, and that reason isn't “wait, AppKit has this built in?”.
You have come upon \u… in the input you're parsing and need to convert that to a character for further handling and/or inclusion in the output text.
You have ruled out \uc, which is a different thing (it specifies the number of non-Unicode bytes that follow the \u… sequence, if I understood the RTF spec correctly).
In RTF, \u is followed by a signed decimal number, not hexadecimal digits. It is a 16-bit value, so characters above U+7FFF are written as negative numbers; add 65536 to get them back. You need to parse that number; it is the Unicode code point for the character the sequence represents. You then need to create an NSString containing that character.
If you're using NSScanner to parse the input, then (assuming you have already scanned past the \u itself) you can simply ask the scanner to scanInt:. Pass a pointer to an int variable, and add 65536 if the result is negative.
If you're not using NSScanner, do whatever makes sense for however you're parsing it. For example, if you've converted the RTF data to a C string and are reading through it yourself, you'll want to use strtol to parse the number. It'll interpret the number in whatever base you specify (in this case, 10), accept a leading minus sign, and put the pointer to the next character wherever you want it.
Your variable will then contain the Unicode code point value for the specified character. In the example from your question, that will be 10003, which is 0x2713, or U+2713 CHECK MARK (✓).
That value fits in 16 bits, so you can simply assign it to a unichar variable and create an NSString from that. Code points above 0xFFFF (outside the Basic Multilingual Plane) don't fit: unichars only go up to 0xFFFF, so such characters need a pair of UTF-16 surrogates. RTF writers typically emit those as two consecutive \u keywords, one per surrogate, but if your code points can come from elsewhere you need to handle the general case.
Fortunately, CFString has a function to help you:
unsigned int codePoint = /*…*/;
unichar characters[2];
NSUInteger numCharacters = 0;
// Returns true and fills both slots only for code points U+10000..U+10FFFF.
if (CFStringGetSurrogatePairForLongCharacter(codePoint, characters)) {
    numCharacters = 2;
} else {
    characters[0] = codePoint; // fits in a single unichar
    numCharacters = 1;
}
You can then use stringWithCharacters:length: to create an NSString from this array of 16-bit characters.
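The same steps in plain C, as a sketch under the assumptions above (the names rtf_u_value and utf16_units are hypothetical): strtol reads the signed decimal after \u, negative values are mapped back by adding 65536, and utf16_units performs the surrogate split that CFStringGetSurrogatePairForLongCharacter does for code points above 0xFFFF. The resulting units are what you would pass to stringWithCharacters:length:.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse the number after an RTF "\u" control word (p points just past the
   "u"). RTF writes N as a signed 16-bit decimal, so values above 32767
   appear negative and are mapped back by adding 65536.
   Stores the position after the digits in *end. */
uint32_t rtf_u_value(const char *p, char **end)
{
    long n = strtol(p, end, 10);
    if (n < 0)
        n += 65536;
    return (uint32_t)n;
}

/* Split a code point into UTF-16 code units (what a unichar array holds).
   Returns 1 for BMP characters, 2 for a surrogate pair. */
int utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

With the question's input, rtf_u_value("10003 ;", &end) yields 0x2713 and leaves end at the space delimiter; utf16_units(0x1F60A, u) yields the pair D83D DE0A.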
Use this:
NSString *myUnicodeString = @"\u2713"; // ✓ (a \u escape takes exactly four hex digits; RTF's decimal 10003 is 0x2713)
Thanks to modern Objective C.
Let me know if it's not what you want.
NSString *strUnicodeString = @"\u2714"; // ✔ HEAVY CHECK MARK (✓ is \u2713)
NSData *unicodedStringData = [strUnicodeString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue = [[NSString alloc] initWithData:unicodedStringData encoding:NSUTF8StringEncoding];
