Problem parsing Strings with Russian chars - iOS

I'm using an old Objective-C routine (let's call it oldObjectiveCFunction), which parses a String by analyzing each char. After analyzing the chars, it divides that String into Strings and returns them in an array called *functions. This is a super reduced sample of how that old function does the String parsing:
NSMutableArray *functions = [NSMutableArray new];
NSMutableArray *components = [NSMutableArray new];
NSMutableString *sb = [NSMutableString new];
char c;
int sourceLen = source.length;
int index = 0;
while (index < sourceLen) {
    c = [source characterAtIndex:index];
    //here do some random work analyzing the char
    [sb appendString:[NSString stringWithFormat:@"%c",c]];
    if (some condition){
        [components addObject:(NSString *)sb];
        sb = [NSMutableString new];
        [functions addObject:[components copy]];
    }
}
Later, I'm getting each String of *functions with this Swift code:
let functions = oldObjectiveCFunction(string) as? [[String]]
functions?.forEach({ (function) in
    var functionCopy = function.map { $0 }
    for index in 0..<functionCopy.count {
        let string = functionCopy[index]
    }
})
The problem is that it works perfectly with normal strings, but if the String contains Russian names, like this:
РАЦИОН
the output, the content of my let string variable, is this:
\u{10}&\u{18}\u{1e}\u{1d}
How can I get the same Russian string instead of that?
I tried doing this:
let string2 = String(describing: string?.cString(using: String.Encoding.utf8))
but it returns an even stranger result:
"Optional([32, 16, 38, 24, 30, 29, 0])"

Analysis. Sorry, I don't speak Swift or Objective-C, so the following example is given in Python; however, the 4th and 5th columns (unicode reduced to 8-bit) recall the weird numbers in your question.
for ch in 'РАЦИОН':
    print(ch,                              # character itself
          ord(ch),                         # character unicode in decimal
          '{:04x}'.format(ord(ch)),        # character unicode in hexadecimal
          (ord(ch) & 0xFF),                # unicode reduced to 8-bit decimal
          '{:02x}'.format(ord(ch) & 0xFF)) # unicode reduced to 8-bit hexadecimal
Р 1056 0420 32 20
А 1040 0410 16 10
Ц 1062 0426 38 26
И 1048 0418 24 18
О 1054 041e 30 1e
Н 1053 041d 29 1d
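For reference, a minimal Objective-C sketch of the same analysis, using the 16-bit UTF-16 code units that characterAtIndex: returns:
NSString *word = @"РАЦИОН";
for (NSUInteger i = 0; i < word.length; i++) {
    unichar u = [word characterAtIndex:i];
    // character, code unit in decimal and hex, then the value truncated to 8 bits
    NSLog(@"%C %u %04x %u %02x", u, (unsigned)u, (unsigned)u, (unsigned)(u & 0xFF), (unsigned)(u & 0xFF));
}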
Solution. Hence, you need to fix everything in your code that reduces 16-bit values to 8-bit:
first, declare unichar c; instead of char c;, and use [sb appendString:[NSString stringWithFormat:@"%C",c]]; instead of the %c version; note that
the Latin Capital Letter C in the %C specifier stands for a 16-bit UTF-16 code unit (unichar), whereas
the Latin Small Letter C in the %c specifier stands for an 8-bit unsigned character (unsigned char).
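Applied to the reduced sample from the question, the fixed loop might look like this (a sketch; someCondition and the index increment stand in for the parts the question elides):
NSMutableArray *functions = [NSMutableArray new];
NSMutableArray *components = [NSMutableArray new];
NSMutableString *sb = [NSMutableString new];
unichar c;                                  // 16-bit UTF-16 code unit instead of 8-bit char
NSUInteger sourceLen = source.length;
NSUInteger index = 0;
while (index < sourceLen) {
    c = [source characterAtIndex:index];    // characterAtIndex: already returns unichar
    [sb appendString:[NSString stringWithFormat:@"%C", c]]; // capital C keeps all 16 bits
    if (someCondition) {                    // placeholder for the question's condition
        [components addObject:[sb copy]];
        sb = [NSMutableString new];
        [functions addObject:[components copy]];
    }
    index++;
}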
Resources. My answer is based on answers to the following questions on SO:
What are the supported Swift String format specifiers?
objective-c - difference between char and unichar?

Your last result is not strange. The optional comes from string?, and the cString() function returns an array of CChar (Int8).
I think the problem comes from here - but I'm not sure because the whole thing looks confusing:
[sb appendString:[NSString stringWithFormat:@"%c",c]];
Have you tried:
[sb appendString: [NSString stringWithCString:c encoding:NSUTF8StringEncoding]];
Instead of stringWithFormat?
(The solution of %C instead of %c proposed by your commenters looks like a good idea too.) Oops, I just saw that you already tried that without success.

Related

Convert Hex String to ASCII Format [duplicate]

I have a hex string like "000000000100" and I am using the following logic to do the ASCII conversion. The output I am receiving is only 1 byte (\x01), but I want the output in 6-byte format, as \x00\x00\x00\x00\x01\x00:
-(NSString*) decode
{
    string = @"000000000100";
    NSMutableString *newString = [[NSMutableString alloc] init];
    int i = 0;
    while (i < [string length])
    {
        NSString *hexChar = [string substringWithRange:NSMakeRange(i, 2)];
        int value = 0;
        sscanf([hexChar cStringUsingEncoding:NSASCIIStringEncoding], "%x", &value);
        [newString appendFormat:@"%c", (char)value];
        i += 2;
    }
    return newString;
}
How can I do that?
Let's first directly address your bug: In your code you attempt to add the next byte to your string with:
[newString appendFormat:@"%c", (char)value];
Your problem is that %c produces nothing if the character is a null, so you are appending an empty string and, as you found, you end up with a string with a single byte in it.
You can fix your code by testing for the null and appending a string containing a single null:
if (value == 0)
    [newString appendString:@"\0"]; // append a single null
else
    [newString appendFormat:@"%c", (char)value];
Second, is this the way to do this?
Other answers have shown you other algorithms; they might be more efficient than yours, as they only convert to a C string once rather than repeatedly extracting substrings and converting each one individually.
If and only if performance is a real issue for you, you might wish to consider such C-based solutions. You clearly know how to use scanf, but in a case as simple as this you might want to look at digittoint and do the conversion of two hex digits to an integer yourself (value of first * 16 + value of second).
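For illustration, a digittoint-based sketch (assuming the input holds only valid hex digits and has even length; the null handling follows the fix above):
#include <ctype.h>

NSString *hexString = @"000000000100";
const char *cs = [hexString cStringUsingEncoding:NSASCIIStringEncoding];
NSMutableString *decoded = [NSMutableString string];
for (NSUInteger i = 0; i + 1 < [hexString length]; i += 2) {
    int value = digittoint(cs[i]) * 16 + digittoint(cs[i + 1]); // two hex digits -> one byte
    if (value == 0)
        [decoded appendString:@"\0"];        // %c would drop the null, as noted above
    else
        [decoded appendFormat:@"%c", (char)value];
}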
Conversely, if you'd like to avoid C and scanf, look at NSScanner and scanHexInt:/scanHexLongLong:. If your strings are never longer than 16 hex digits, you can convert the whole string in one go and then produce an NSString from the bytes of the resulting unsigned 64-bit integer.
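A sketch of that NSScanner route, again assuming at most 16 hex digits and reusing the null handling from above:
NSString *hexString = @"000000000100";
unsigned long long value = 0;
NSMutableString *decoded = [NSMutableString string];
if ([[NSScanner scannerWithString:hexString] scanHexLongLong:&value]) {
    NSUInteger byteCount = [hexString length] / 2;       // 12 hex digits -> 6 bytes
    for (NSUInteger i = 0; i < byteCount; i++) {
        // take the bytes from most significant to least significant
        unsigned char byte = (value >> (8 * (byteCount - 1 - i))) & 0xFF;
        if (byte == 0)
            [decoded appendString:@"\0"];
        else
            [decoded appendFormat:@"%c", byte];
    }
}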
HTH

How to convert unicode hex number variable to character in NSString?

I have a range of unicode numbers and I want to show them in a UILabel. I can show them if I hardcode them, but that's too slow, so I want to substitute them with a variable, then change the variable and get the relevant character.
For example, I know the unicode is U+095F and I want to show the range U+095F to U+096F in the UILabel. I can do that with a hardcoded value like
NSString *str = [NSString stringWithFormat:@"\u095f"];
but I want to do that like
NSInteger hex = 0x095f;
[NSString stringWithFormat:@"\u%ld", (long)hex];
I can change the hex automatically, just like using @"%ld", (long)hex, so does anybody know how to implement that?
You can initialize the string with a buffer containing the bytes of the hex value (you simply provide a pointer to it). The important thing to notice is that you provide the character encoding to be applied; specifically, you should pay attention to the byte order.
Here's an example:
UInt32 hex = 0x095f;
NSString *unicodeString = [[NSString alloc] initWithBytes:&hex length:sizeof(hex) encoding:NSUTF32LittleEndianStringEncoding];
Note that solutions like using the %C format are fine as long as you use them for 16-bit unicode characters; 32-bit unicode characters like emojis (for example: 0x1f601, 0x1f41a) will not work using simple formatting.
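For example, a sketch using one of the code points mentioned above (like the snippet before it, this assumes a little-endian host):
UInt32 emoji = 0x1f601;   // outside the 16-bit range a single unichar can hold
NSString *emojiString = [[NSString alloc] initWithBytes:&emoji
                                                 length:sizeof(emoji)
                                               encoding:NSUTF32LittleEndianStringEncoding];
NSLog(@"%@", emojiString); // 😁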
You would have to use
[NSString stringWithFormat:@"%C", (unichar)hex];
or directly declare the unichar (unsigned short) as
unichar uni = 0x095f;
[NSString stringWithFormat:@"%C", uni];
A useful resource might be the String Format Specifiers, which lists %C as
16-bit Unicode character (unichar), printed by NSLog() as an ASCII character, or, if not an ASCII character, in the octal format \ddd or the Unicode hexadecimal format \udddd, where d is a digit.
Like this:
unichar charCode = 0x095f;
NSString *s = [NSString stringWithFormat:@"%C", charCode];
NSLog(@"String = %@", s); //Output: String = य़

Very big ID in JSON, how to obtain it without losing precision

I have IDs in a JSON file, and some of them are really big, but they fit inside the bounds of unsigned long long int.
"id":9223372036854775807,
How can I get this large number from the JSON using objectForKey:idKey of NSDictionary?
Can I use NSDecimalNumber? Some of these IDs fit into a regular integer.
Tricky. Apple's JSON code converts integers above 10^18 to NSDecimalNumber, and smaller integers to plain NSNumber containing a 64 bit integer value. Now you might have hoped that unsignedLongLongValue would give you a 64 bit value, but it doesn't for NSDecimalNumber: The NSDecimalNumber first gets converted to double, and the result to unsigned long long, so you lose precision.
Here's something that you can add as an extension to NSNumber. It's a bit tricky, because if you get a value very close to 2^64, converting it to double might get rounded to 2^64, which cannot be converted to 64 bit. So we need to divide by 10 first to make sure the result isn't too big.
- (uint64_t)unsigned64bitValue
{
    if ([self isKindOfClass:[NSDecimalNumber class]])
    {
        NSDecimalNumber *asDecimal = (NSDecimalNumber *)self;
        uint64_t tmp = (uint64_t)(asDecimal.doubleValue / 10.0);
        NSDecimalNumber *tmp1 = [[NSDecimalNumber alloc] initWithUnsignedLongLong:tmp];
        NSDecimalNumber *tmp2 = [tmp1 decimalNumberByMultiplyingByPowerOf10:1];
        NSDecimalNumber *remainder = [asDecimal decimalNumberBySubtracting:tmp2];
        return (tmp * 10) + remainder.unsignedLongLongValue;
    }
    else
    {
        return self.unsignedLongLongValue;
    }
}
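A usage sketch, assuming the method above lives in an NSNumber category (the category name and the inline JSON literal here are illustrative, not from the question):
@interface NSNumber (Unsigned64)          // hypothetical category holding the method above
- (uint64_t)unsigned64bitValue;
@end

NSData *data = [@"{\"id\":9223372036854775807}" dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *json = [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];
uint64_t bigId = [json[@"id"] unsigned64bitValue];
NSLog(@"%llu", bigId);                    // expected: 9223372036854775807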
Or process the raw JSON string and look for '"id" = number; '. Allowing for the white space that is often included, you can find the number and then overwrite it with the number quoted. You can put the data into a mutable data object and get a char pointer to it in order to overwrite.
[entered using iPhone so a bit terse]
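One way to sketch that idea is to quote the number before parsing rather than overwriting bytes in place (the regex and the "id" key are assumptions based on the question):
#include <stdlib.h>

NSString *raw = @"{\"id\":9223372036854775807,\"name\":\"x\"}";
// Quote the numeric id so NSJSONSerialization hands it back as a string
NSRegularExpression *re =
    [NSRegularExpression regularExpressionWithPattern:@"\"id\"\\s*:\\s*(\\d+)"
                                              options:0
                                                error:NULL];
NSString *quoted = [re stringByReplacingMatchesInString:raw
                                                options:0
                                                  range:NSMakeRange(0, raw.length)
                                           withTemplate:@"\"id\":\"$1\""];
NSDictionary *json =
    [NSJSONSerialization JSONObjectWithData:[quoted dataUsingEncoding:NSUTF8StringEncoding]
                                    options:0
                                      error:NULL];
unsigned long long bigId = strtoull([json[@"id"] UTF8String], NULL, 10); // keeps full precision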

ASCII character to decimal code value

I'm trying to grab a character from a UITextField and find the ASCII decimal code value for it. I can save the field value into a char variable, but I'm having trouble obtaining the decimal code value. See the code snippet below for the problem.
// grabs letter from text field
char dechar = [self.decInput.text characterAtIndex:0]; //trying to input a capital A (for example)
NSLog(@"dechar: %c",dechar); // verifies that dechar holds my intended letter
// below line is where i need help
int dec = sizeof(dechar);
NSLog(@"dec: %d",dec); //this returns a value of 1
// want to pass my char dechar into below 'A' since this returns the proper ASCII decimal code of 65
int decimalCode = 'A';
NSLog(@"value: %d",decimalCode); // this shows 65 as it should
I know going the other way I can just use...
int dec = [self.decInput.text floatValue];
char letter = dec;
NSLog(@"ch = %c",letter); //this returns the correct ASCII letter
any ideas?
Why are you using the sizeof operator?
Simply do:
int dec = dechar;
This will give you 65 for dec assuming that dechar is A.
BTW - you really should change dechar to unichar, not char.
iOS uses unicode, not ASCII. Unicode characters are usually 16 bits, not 8 bits.
Look at using the NSString method characterAtIndex, which returns a unichar. A unichar is a 16 bit integer rather than an 8 bit value, so it can represent a lot more characters.
If you want to get ASCII values from an NSString, you should first convert it to ASCII using the NSString method dataUsingEncoding: with NSASCIIStringEncoding, then iterate through the bytes in the data you get back.
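A short sketch of that approach; note that dataUsingEncoding: returns nil here if the text contains characters outside ASCII, so the check matters:
NSString *text = self.decInput.text;                     // e.g. @"A"
NSData *asciiData = [text dataUsingEncoding:NSASCIIStringEncoding];
if (asciiData != nil) {
    const unsigned char *bytes = asciiData.bytes;
    for (NSUInteger i = 0; i < asciiData.length; i++) {
        NSLog(@"character %lu has ASCII code %d", (unsigned long)i, bytes[i]); // 'A' -> 65
    }
}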
Note that ASCII can only represent a tiny fraction of unicode characters though.

NSString's method 'getCString' returns false for Arabic text

I am generating a QR code and everything works fine if the text is only in English. When I want to generate a QR code with some Arabic text, it fails at NSString's method "getCString:maxLength:encoding:".
Suppose I have two strings:
NSString *englishText = @"Some text English";
NSString *englishArabicMixText = @"Some text بالعربي";
char strEng [[englishText length] + 1];
char strArb [[englishArabicMixText length] + 1];
1- [englishText getCString:strEng maxLength:[englishText length] + 1 encoding:NSUTF8StringEncoding];
2- [englishArabicMixText getCString:strArb maxLength:[englishArabicMixText length] + 1 encoding:NSUTF8StringEncoding];
In case #1, getCString returns true and the QR code is generated; in case #2, it returns false and fails to generate the code.
What should I do so that it also returns true in case #2? Thank you.
length returns the number of Unicode characters. You have to use lengthOfBytesUsingEncoding:, which returns the number of bytes required to store the receiver in a given encoding.
NSUInteger arbLength = [englishArabicMixText lengthOfBytesUsingEncoding:NSUTF8StringEncoding] + 1;
char strArb [arbLength];
[englishArabicMixText getCString:strArb maxLength:arbLength encoding:NSUTF8StringEncoding];
Case #2 is returning false for one of two possible reasons:
1) the string cannot be converted with the specified encoding.
2) the buffer to hold the encoded string is too small.
I'd guess (or at least I suggest you start investigating there) that the problem is nr. 2.
As you're converting to UTF-8, a single un-encoded character may result in more than one encoded byte. An 'A' is a single byte with value 65, but an Arabic character or some kind of symbol may require more bytes.
You are assuming your destination buffer requires the same number of bytes as the number of characters in your NSString.
So you should do something like this:
NSUInteger size = [englishArabicMixText lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
if (size > 0)
{
    size++;
    strArb = malloc(size); // NOTE: you should allocate space for your string at runtime!!
    [englishArabicMixText getCString:strArb maxLength:size encoding:NSUTF8StringEncoding];
}
You should do the same for the plain english string too.
And I'd recommend allocating the space for the C string dynamically at runtime with malloc, then freeing it when you don't need it anymore.
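Putting that together, a sketch of the dynamically allocated version (the same pattern applies to the plain English string):
#include <stdlib.h>

NSString *englishArabicMixText = @"Some text بالعربي";
NSUInteger size = [englishArabicMixText lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
if (size > 0) {
    char *strArb = malloc(size + 1);                     // +1 for the terminating NUL
    if ([englishArabicMixText getCString:strArb
                               maxLength:size + 1
                                encoding:NSUTF8StringEncoding]) {
        // strArb now holds the UTF-8 bytes; generate the QR code from it here
    }
    free(strArb);                                        // give the memory back when done
}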
