How to convert unicode hex number variable to character in NSString? - ios

Now I have a range of unicode numbers, I want to show them in UILabel, I can show them if i hardcode them, but that's too slow, so I want to substitute them with a variable, and then change the variable and get the relevant character.
For example, now I know the unicode is U+095F, I want to show the range of U+095F to U+096f in UILabel, I can do that with hardcode like
NSString *str = [NSString stringWithFormat:#"\u095f"];
but I want to do that like
NSInteger hex = 0x095f;
[NSString stringWithFormat:#"\u%ld", (long)hex];
I can change the hex automatically,just like using #"%ld", (long)hex, so anybody know how to implement that?

You can initialize the string with the a buffer of bytes of the hex (you simply provide its pointer). The point is, and the important thing to notice is that you provide the character encoding to be applied. Specifically you should notice the byte order.
Here's an example:
UInt32 hex = 0x095f;
NSString *unicodeString = [[NSString alloc] initWithBytes:&hex length:sizeof(hex) encoding:NSUTF32LittleEndianStringEncoding];
Note that solutions like using the %C format are fine as long as you use them for 16-bit unicode characters; 32-bit unicode characters like emojis (for example: 0x1f601, 0x1f41a) will not work using simple formatting.

You would have to use
[NSString stringWithFormat:#"%C", (unichar)hex];
or directly declare the unichar (unsigned short) as
unichar uni = 0x095f;
[NSString stringWithFormat:#"%C", uni];
A useful resource might be the String Format Specifiers, which lists %C as
16-bit Unicode character (unichar), printed by NSLog() as an ASCII character, or, if not an ASCII character, in the octal format \ddd or the Unicode hexadecimal format \udddd, where d is a digit.

Like this:
unichar charCode = 0x095f;
NSString *s = [NSString stringWithFormat:#"%C",charCode];
NSLog(#"String = %#",s); //Output:String = य़

Related

Substring char * in Objective C

I need to substring char* to some length and need to convert to NSString.
char *val substring Length
I tried
NSString *tempString = [NSString stringWithCString:val encoding:NSAsciiStringEncoding];
NSRange range = NSMakeRange (0, length);
NSString *finalValue = [tempString substringWithRange: range];
This works but not for other special character languages like chinese.
If i convert To UTF8Encoding then substring length will mismatch.
Is there any other way to substring the char* then convert to UTF8 encoding?
You have to use the encoding, the string is encoded in.
In your case, you say to interpret the string as ASCII string. ASCII does not have chinese characters. Therefore this cannot work with chinese characters: They are not there.
Likely you have an UTF8 encoded string. But simply switching to UTF8 does not help. Since NSString and OS X/iOS at all encodes 16-Bit Unicode, but extended Unicode has 20 bits, chinese characters needs multiple codes. This has some effects, for example -length returns the number of codes, not the number of chinese characters. However, with -rangeOfComposedCharacterSequencesForRange: you can adjust the range.
For example 𠀖 (CJK unified ideograph-0x20016):
NSString *str = #"𠀖"; // One chinese whatever
NSLog(#"%ld", [str length]); // This are "2" characters
NSRange range = {0, 1}; // Range for the "first" character
NSLog(#"%ld %ld", range.location, range.length); // 0 1
range = [str rangeOfComposedCharacterSequencesForRange:range];
NSLog(#"%ld %ld", range.location, range.length); // 0 2
You can get a better answer, if you add information about the encoding of the string coming in and the required encoding for putting out.
Strings are not UTF8 or whatever strings. Strings are strings. Their storage, their representation in computer memory has an encoding, but they don't have an encoding themselves.
I found the solution for my question
char subString[length+1];
strncpy(subString, val, length);
subString[length] = '\0'; // place the null terminator
NSString *finalString = [NSString stringWithCString: subString encoding:NSUTF8StringEncoding];
I did the char* sub string and UTF8 encoding both.

Way to detect character that takes up more than one index spot in an NSString?

I'm wondering, is there a way to detect a character that takes up more than 1 index spot in an NSString? (like an emoji). I'm trying to implement a custom text view and when the user pushes delete, I need to know if I should delete only the previous one index spot or more.
Actually NSString use UTF-16.So it is quite difficult to work with characters which takes two UTF-16 charater(unichar) or more.But you can do with rangeOfComposedCharacterSequenceAtIndexto get range and than delete.
First find the last character index from string
NSUInteger lastCharIndex = [str length] - 1;
Than get the range of last character
NSRange lastCharRange = [str rangeOfComposedCharacterSequenceAtIndex: lastCharIndex];
Than delete with range from character (If it is of two UTF-16 than it deletes UTF-16)
deletedLastCharString = [str substringToIndex: lastCharRange.location];
You can use this method with any type of characters which takes any number of unichar
For one you could transform the string to a sequence of characters using [myString UTF8String] and you can then check if the character has its first bit set to one or zero. If its one then this is a UTF8 character and you can then check how many bytes are there to this character. Details about UTF8 can be found on Wikipedia - UTF8. Here is a simple example:
NSString *string = #"ČTest";
const char *str = [string UTF8String];
NSMutableString *ASCIIStr = [NSMutableString string];
for (int i = 0; i < strlen(str); ++i)
if (!(str[i] & 128))
[ASCIIStr appendFormat:#"%c", str[i]];
NSLog(#"%#", ASCIIStr); //Should contain only ASCII characters

convert unicode string to nsstring

I have a unicode string as
{\rtf1\ansi\ansicpg1252\cocoartf1265
{\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\fnil\fcharset0 LucidaGrande;}
{\colortbl;\red255\green255\blue255;}
{\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{check\}}{\leveltext\leveltemplateid1\'01\uc0\u10003 ;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
{\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
\paperw11900\paperh16840\margl1440\margr1440\vieww22880\viewh16200\viewkind0
\pard\li720\fi-720\pardirnatural
\ls1\ilvl0
\f0\fs24 \cf0 {\listtext
\f1 \uc0\u10003
\f0 }One\
{\listtext
\f1 \uc0\u10003
\f0 }Two\
}
Here i have unicode data \u10003 which is equivalent to "✓" characters. I have used
[NSString stringWithCharacters:"\u10003" length:NSUTF16StringEncoding] which is throwing compilation error. Please let me know how to convert these unicode characters to "✓".
Regards,
Boom
I have same for problem and the following code solve my issue
For Encode
NSData *dataenc = [yourtext dataUsingEncoding:NSNonLossyASCIIStringEncoding];
NSString *encodevalue = [[NSString alloc]initWithData:dataenc encoding:NSUTF8StringEncoding];
For decode
NSData *data = [yourtext dataUsingEncoding:NSUTF8StringEncoding];
NSString *decodevalue = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
Thanks
I have used below code to convert a Uniode string to NSString. This should work fine.
NSData *unicodedStringData =
[unicodedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue =
[[NSString alloc] initWithData:unicodedStringData encoding:NSNonLossyASCIIStringEncoding];
In Swift 4
let emoji = "😃"
let unicodedData = emoji.data(using: String.Encoding.utf8, allowLossyConversion: true)
let emojiString = String(data: unicodedData!, encoding: String.Encoding.utf8)
I assume that:
You are reading this RTF data from a file or other external source.
You are parsing it yourself (not using, say, AppKit's built-in RTF parser).
You have a reason why you're parsing it yourself, and that reason isn't “wait, AppKit has this built in?”.
You have come upon \u… in the input you're parsing and need to convert that to a character for further handling and/or inclusion in the output text.
You have ruled out \uc, which is a different thing (it specifies the number of non-Unicode bytes that follow the \u… sequence, if I understood the RTF spec correctly).
\u is followed by hexadecimal digits. You need to parse those to a number; that number is the Unicode code point number for the character the sequence represents. You then need to create an NSString containing that character.
If you're using NSScanner to parse the input, then (assuming you have already scanned past the \u itself) you can simply ask the scanner to scanHexInt:. Pass a pointer to an unsigned int variable.
If you're not using NSScanner, do whatever makes sense for however you're parsing it. For example, if you've converted the RTF data to a C string and are reading through it yourself, you'll want to use strtoul to parse the hex number. It'll interpret the number in whatever base you specify (in this case, 16) and then put the pointer to the next character wherever you want it.
Your unsigned int or unsigned long variable will then contain the Unicode code point value for the specified character. In the example from your question, that will be 0x10003, or U+10003.
Now, for most characters, you could simply assign that over to a unichar variable and create an NSString from that. That won't work here: unichars only go up to 0xFFFF, and this code point is higher than that (in technical terms, it's outside the Basic Multilingual Plane).
Fortunately, *CF*String has a function to help you:
unsigned int codePoint = /*…*/;
unichar characters[2];
NSUInteger numCharacters = 0;
if (CFStringGetSurrogatePairForLongCharacter(codePoint, characters)) {
numCharacters = 2;
} else {
characters[0] = codePoint;
numCharacters = 1;
}
You can then use stringWithCharacters:length: to create an NSString from this array of 16-bit characters.
Use this:
NSString *myUnicodeString = #"\u10003";
Thanks to modern Objective C.
Let me know if its not what you want.
NSString *strUnicodeString = "\u2714";
NSData *unicodedStringData = [strUnicodeString dataUsingEncoding:NSUTF8StringEncoding];
NSString *emojiStringValue = [[NSString alloc] initWithData:unicodedStringData encoding:NSUTF8StringEncoding];

3rd Party Language support (Xcode + iOS) [duplicate]

I've got a problem with the following code:
NSString *strValue=#"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
NSLog(#"%s", temp);
in the first line of the codes, two Chinese characters are double quoted. The problem is printf function can display the Chinese characters properly, but NSLog can't.
Thanks to all. I figured out a solution for this problem. Foundation uses UTF-16 by default, so in order to use NSLog to output the c string in the example, I have to use cStringUsingEncoding to get UTF-16 c string and use %S to replace %s.
NSString *strValue=#"你好";
char temp[200];
strcpy(temp, [strValue UTF8String]);
printf("%s", temp);
strcpy(temp, [strValue cStringUsingEncoding:NSUTF16LittleEndianStringEncoding]);
NSLog(#"%S", temp);
NSLog's %s format specifier is in the system encoding, which seems to always be MacRoman and not unicode, so it can only display characters in MacRoman encoding. Your best option with NSLog is just to use the native object format specifier %# and pass the NSString directly instead of converting it to a C String. If you only have a C string and you want to use NSLog to display a message instead of printf or asl, you will have to do something like Don suggests in order to convert the string to an NSString object first.
So, all of these should display the expected string:
NSString *str = #"你好";
const char *cstr = [str UTF8String];
NSLog(#"%#", str);
printf("%s\n", cstr);
NSLog(#"%#", [NSString stringWithUTF8String:cstr]);
If you do decide to use asl, note that while it accepts strings in UTF8 format and passes the correct encoding to the syslog daemon (so it will show up properly in the console), it encodes the string for visual encoding when displaying to the terminal or logging to a file handle, so non-ASCII values will be displayed as escaped character sequences.
My guess is that NSLog assumes a different encoding for 8-bit C-strings than UTF-8, and it may be one that doesn't support Chinese characters. Awkward as it is, you might try this:
NSLog(#"%#", [NSString stringWithCString: temp encoding: NSUTF8StringEncoding]);
I know you are probably looking for an answer that will help you understand what's going on.
But this is what you could do to solve your problem right now:
NSLog(#"%#", strValue);
# define NSLogUTF8(a,b) NSLog(a,[NSString stringWithCString:[[NSString stringWithFormat:#"%#",b] cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSNonLossyASCIIStringEncoding])
#define NSLogUTF8Ex(a,b) NSLog(a,[MLTool utf8toNString:[NSString stringWithFormat:#"%#",b]])
+(NSString*)utf8toNString:(NSString*)str{
NSString* strT= [str stringByReplacingOccurrencesOfString:#"\\U" withString:#"\\u"];
//NSString *strT = [strTemp mutableCopy];
CFStringRef transform = CFSTR("Any-Hex/Java");
CFStringTransform((__bridge CFMutableStringRef)strT, NULL, transform, YES);
return strT;
}

Why is it direct commented Encoded string not converting to Arabic?

NSString * string = #"االْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ";
const char *c = [string cStringUsingEncoding:NSUTF8StringEncoding];
NSString *newString = [[NSString alloc]initWithCString:c encoding:NSISOLatin1StringEncoding];
NSLog(#"%#",newString);
// NSString * staticEncodedString = #"اÙÙØ­ÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
const char *cvvv = [newString cStringUsingEncoding:NSISOLatin1StringEncoding];
NSString *newStringV = [[NSString alloc]initWithCString:cvvv encoding:NSUTF8StringEncoding];
NSLog(#"%#",newStringV);
Why is it direct commented Encoded string not converting to Arabic?
When i hardcode the Arabic it encodes and then decodes correctly, but why can't static encoded string not readable in arabic?
Thanks for your reply Jake. Yes I loose data while decoding the "staticEncodedString".But All I want is to decode the following string back to Arabic.
NSString * staticEncodedString = #"اÙÙØ­ÙÙ Ùد٠ÙÙÙÙÙÙ٠رÙبÙ٠اÙÙعÙاÙÙÙ ÙÙÙÙ";
The encode is in ANSI i think change it to UTF-8 from any tool.
Use Notepad++ to apply for example and then you can use encode it within sqlite or ios.
Latin1 can not represent the Arabic characters, so you can not encode that string to Latin1. Arabic belongs to the Latin4 character set. The method cStringUsingEncoding will return null if the string cannot losslessly be encoded to the specified encoding.
Why would you want to encode an arabic string to LatinX? UTF-8 will most likely be the best representation since it uses only standard characters and a straightforward approach with no headaches. It may take a bit more bytes than Latin4, but in most cases it will be worth it.
Converting to Latin1 will make you lose your text.

Resources