NSAttributedString & decomposedStringWithCanonicalMapping ranges - ios

I'm running into problems with international (in this case Korean) NSString values.
The same input string is used in two different parts of the program. The first part finds a substring that needs highlighting, stores the NSString and the range for the highlighting into a database.
The second part of the program retrieves the string and displays the highlighting.
The marking part is done using an NSString that has been normalized in Unicode Normalization Form C using the precomposedStringWithCanonicalMapping method on NSString. An NSRange and an NSString are then stored into the Core Data database.
The graphical highlighting is performed by retrieving the NSRange and NSString from the database, putting the NSString into the same Form C using the same method, using this to initialize an NSMutableAttributedString and using the NSRange to set its text attributes.
At this stage, the program crashes because the NSMutableAttributedString is 80 characters long, whereas the NSString was 81 characters long.
NSAttributedString does not have a precomposedStringWithCanonicalMapping method and I assume it changes the representation internally resulting in a different encoding and thus length.
What can I do?
Is there a way of forcing NSAttributedString to keep an underlying encoding?
Is there a way of converting an NSRange from one encoding to another?
Or is there anything else I can do?

Ok,
I did eventually find out what had happened. In one particular place in the program I mistakenly used decomposedStringWithCanonicalMapping rather than precomposedStringWithCanonicalMapping, and that's where the "wrong" mapping came from.
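The crash comes down to an NSRange computed against one normalization form being applied to a string in another form. A minimal sketch of a more defensive highlighting step follows; the helper name HighlightedString and the caller-supplied attributes dictionary are assumptions for illustration, not part of the original program.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not from the original post.
// Normalizes `stored` to Form C, then applies `attrs` over `range` only if
// the range fits. Returns nil instead of raising NSRangeException otherwise.
static NSAttributedString *HighlightedString(NSString *stored,
                                             NSRange range,
                                             NSDictionary<NSString *, id> *attrs) {
    NSString *nfc = stored.precomposedStringWithCanonicalMapping;
    if (range.location == NSNotFound || NSMaxRange(range) > nfc.length) {
        return nil; // the range was computed against a different string
    }
    NSMutableAttributedString *result =
        [[NSMutableAttributedString alloc] initWithString:nfc];
    [result addAttributes:attrs range:range];
    return result;
}
```

Korean makes the mismatch easy to see: the syllable 한 (U+D55C) has length 1 in Form C but length 3 after decomposedStringWithCanonicalMapping, because NSString lengths count UTF-16 code units and the decomposed form spells the syllable as three jamo.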

Related

How does UITextField handle french characters?

I have a UITextField that takes input in both English and French.
After I type French accented characters, for example àÂèÎôÛç, and log the string, it looks like this:
\M-C\240\M-C\M^B\M-C\M-(\M-C\M^N\M-C\M-4\M-C\M^[\M-C\M-'
I've done a little research, but couldn't find what kind of encoding this is.
What kind of encoding is this?
How do people normally handle the encoding of special characters?
The return type of the text method of UITextField is NSString which according to the documentation is:
An NSString object encodes a Unicode-compliant text string, represented as a sequence of UTF–16 code units. All lengths, character indexes, and ranges are expressed in terms of 16-bit platform-endian values, with index values starting at 0.
So it is just Unicode and the Xcode console just isn't printing it correctly.
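One reading of the logged text, offered as an interpretation rather than something the answer states: it looks like the UTF-8 bytes of the string printed in meta/octal escape notation, where \M-C stands for byte 0xC3 and \240 is octal for 0xA0. A short sketch for comparing the two views of a string, using hypothetical helper names:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: number of bytes `text` occupies when encoded as UTF-8.
static NSUInteger UTF8ByteCount(NSString *text) {
    return [text lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
}

// Hypothetical helper: space-separated hex dump of the UTF-8 bytes,
// e.g. "C3 A0" for U+00E0 (à).
static NSString *UTF8HexDump(NSString *text) {
    NSData *data = [text dataUsingEncoding:NSUTF8StringEncoding];
    const unsigned char *bytes = data.bytes;
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSUInteger i = 0; i < data.length; i++) {
        [parts addObject:[NSString stringWithFormat:@"%02X", bytes[i]]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

For the seven precomposed characters àÂèÎôÛç, length is 7 (UTF-16 code units) while UTF8ByteCount is 14, which matches the seven two-byte groups in the log, each beginning with \M-C. Logging with NSLog(@"%@", textField.text) prints the characters themselves.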

How to decode \U201a\U00c4\U00f2\U201a\U00c4\U00f4 this? [duplicate]

I am getting a long text from the server, and that text contains characters like \U201a\U00c4\U00f2He-Must-Not-Be-Named\U201a\U00c4\U00f4.
When I display the text in a text view, I get different characters.
How do I get the normal text in Objective-C?
Please help me out with this.
When I receive data from the server I use
infoDictionary = [NSJSONSerialization JSONObjectWithData:data options:0 error:nil];
and from that infoDictionary I get text like
locks his cousin Dudley in the snake\U201a\U00c4\U00f4s captivity just in the blink of an eye. Each wand has a magical core such as phoenix\U201a\U00c4\U00f4s hair or dragon heartstring, that performs all the magic.
\n
And I assign this value to the text view like
detailsTextView.text = [infoDictionary objectForKey:@"DESCRIPTION"];
But in the text view I get different characters.
There are two possibilities, one more likely, one less likely.
The less likely one is that your server sends rubbish when it tries to translate its data into JSON.
The more likely one is that you are just frightening yourself, and there is nothing wrong. Something like \U201a\U00c4\U00f2He-Must-Not-Be-Named\U201a\U00c4\U00f4 is exactly how non-ASCII characters are escaped when a dictionary is printed with NSLog; the escapes are not stored in the string itself. For example, \U201a is the Unicode character "Single Low-9 Quotation Mark". Use the character viewer in Mac OS X to find out what the characters are if you are curious. Logging the dictionary with NSLog will always show these escapes, but the characters should be displayed in your text view perfectly fine.
However, in your case, the sequence \U00c4\U00f2 or \U00c4\U00f4 seems to be highly unusual. This would seem to be a problem with the server code, or with the actual data that is stored. If you are given rubbish data, there's nothing you can do about it. It also doesn't look like the result of one of the typical mistakes on the server (storing MacRoman characters, or taking UTF-8 and assuming the bytes are code points). The only thing you can do is to contact whoever is supplying this data.
Now there is something you can do. You can use the method stringByReplacingOccurrencesOfString:withString: to replace nonsense data with something sensible. I wouldn't expect the sequence \U201a\U00c4\U00f4 (‚Äô) to ever appear in a string that I display. So figure out what string belongs there (say a quotation mark) and replace it: get the description into an NSString, call stringByReplacingOccurrencesOfString:withString: and use the result. There may be more strange combinations than just this one.
stringWithUTF8String: takes const char* as an argument, so no "@" symbol in front.
NSString *description = [infoDictionary objectForKey:@"DESCRIPTION"];
NSString *str = [NSString stringWithUTF8String:description.UTF8String];
detailsTextView.text = str;
Show this str in your text view.
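Although the first answer rules it out, the three code points U+201A U+00C4 U+00F4 are exactly what the UTF-8 bytes E2 80 99 (the right single quotation mark ’) become when each byte is read as Mac OS Roman, and U+201A U+00C4 U+00F2 likewise correspond to E2 80 98 (‘). Assuming that is what happened upstream, which the thread does not confirm, a minimal sketch of reversing it follows. The helper name is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not from the thread.
// Assumes `garbled` is UTF-8 text whose bytes were mis-read as Mac OS Roman.
// Re-encodes it as Mac OS Roman to recover the bytes, then decodes them as
// UTF-8. Falls back to the input when either step is not possible.
static NSString *RepairMacRomanMojibake(NSString *garbled) {
    NSData *bytes = [garbled dataUsingEncoding:NSMacOSRomanStringEncoding];
    if (bytes == nil) {
        return garbled; // contains characters outside Mac OS Roman
    }
    NSString *repaired = [[NSString alloc] initWithData:bytes
                                               encoding:NSUTF8StringEncoding];
    return repaired ?: garbled; // recovered bytes were not valid UTF-8
}
```

Under that assumption, detailsTextView.text = RepairMacRomanMojibake(description); would show snake’s instead of snake‚Äôs. Pure ASCII text passes through unchanged. By contrast, round-tripping a string through UTF8String and stringWithUTF8String: produces an equal string, so it does not alter which characters are displayed. Fixing the data at the server remains the cleaner solution.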

Why use NSRange on strings when there appears to be a perfectly good substring method?

I'm learning native iOS development for the first time, and I came across the struct NSRange. I come from a Java background, so I don't really see the reasoning for using a range struct when you can just use substring methods that are part of the NSString class. What is the advantage of using range structs over using the non-range NSString substring methods?
Thanks!
edit:
Looks like I was considering the substring methods: substringFromIndex: and substringToIndex:.
The inflexibility of these methods (i.e. not being able to choose both a start AND an end point) makes the range struct more necessary, though I guess you could also nest those two methods to achieve the same result.
edit 2: Examples.
Non-range substring method examples:
NSString *str = @"This is a string.";
NSString *substr = [str substringToIndex:7];
NSString *substr2 = [str substringFromIndex:7];
Ranges substring method example:
NSString *substr3 = [str substringWithRange:NSMakeRange(5, 5)];
Because the range based methods offer a lot more flexibility, and they are also easily usable with all of the NSString search methods (which use ranges heavily). In general, if you're going to create a substring you need to know where to start or end and that information is likely to have come from a search, thus you have a range.
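A small sketch of that search-then-extract pattern, where one NSRange comes out of a search and another goes back in to limit the next search. The helper name SubstringBetween is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the first `open` marker and
// the next `close` marker after it, or nil if either marker is missing.
static NSString *SubstringBetween(NSString *text, NSString *open, NSString *close) {
    NSRange start = [text rangeOfString:open];
    if (start.location == NSNotFound) {
        return nil;
    }
    // Search for `close` only in the part of the string after `open`.
    NSUInteger from = NSMaxRange(start);
    NSRange rest = NSMakeRange(from, text.length - from);
    NSRange end = [text rangeOfString:close options:0 range:rest];
    if (end.location == NSNotFound) {
        return nil;
    }
    return [text substringWithRange:NSMakeRange(from, end.location - from)];
}
```

For example, SubstringBetween(@"This is a [string].", @"[", @"]") returns @"string". Doing the same with only substringFromIndex: and substringToIndex: would require creating an intermediate string and re-computing offsets relative to it.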

How to Validate a Character Compatible with MacRoman

Is there a simple way (a function, a method...) of validating a character that a user types to see if it's compatible with Mac OS Roman? I've read a few dozen topics to find out why an iOS application crashes in reference to CGContextShowTextAtPoint. I guess an application can crash if it tries to draw on an image a string (i.e. ©) containing a character that is not included in the Mac OS Roman set. There are 256 characters in this set. I wonder if there's a better way other than matching the selected character one by one with those 256 characters?
Thank you
You might give https://developer.apple.com/library/mac/#documentation/graphicsimaging/conceptual/drawingwithquartz2d/dq_text/dq_text.html a closer read.
You can draw any encoding using CGContextShowGlyphsAtPoint instead of CGContextShowTextAtPoint, so you can tell it what the encoding is. If the user types it, then you'll be getting the string as an NSString, which is a Unicode string underneath. Probably the easiest approach is to get the UTF-8 encoding of that user-entered string via NSString's UTF8String method.
If you really want to stick with the very limited MacRoman for some reason, then use NSString's cStringUsingEncoding:, passing in NSMacOSRomanStringEncoding, to get a MacRoman string. Read the documentation on this in NSString though. It will return NULL if the user string can't be encoded in MacRoman losslessly. As the documentation discusses, you can use dataUsingEncoding:allowLossyConversion: and canBeConvertedToEncoding: to check. Read the cautions in the Discussion for cStringUsingEncoding: about the lifecycle of the returned strings though. getCString:maxLength:encoding: might end up being a better choice for you. All of this is discussed in the class documentation for NSString.
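The check mentioned above can be sketched in a couple of lines. The function name is hypothetical; canBeConvertedToEncoding: and NSMacOSRomanStringEncoding are the Foundation names the answer refers to.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when every character in `text` can be encoded
// in Mac OS Roman without loss.
static BOOL IsMacRomanCompatible(NSString *text) {
    return [text canBeConvertedToEncoding:NSMacOSRomanStringEncoding];
}
```

This avoids comparing against the 256 characters one by one. Note that © is part of Mac OS Roman, so the check returns YES for it, while a Korean syllable such as 한 returns NO.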
This doesn't directly answer the question but this answer may be a solution to your problem.
If you have an NSString, instead of using CGContextShowTextAtPoint, you can do:
[someStr drawAtPoint:somePoint withFont:someFont];
where someStr is an NSString containing any Unicode characters a user can type, somePoint is a CGPoint, and someFont is the UIFont to use to render the text.

Using NSAttributedString and NSLocalizedString

My old code uses NSLocalizedString to display the following where outputText was an NSMutableString that contained many such lines in a single output session:
[outputText appendFormat: NSLocalizedString(@"\n\n%@ and %@ are identical. No comparison required.", @"\n\n%@ and %@ are identical. No comparison required."), self.ipAddress, secAddress.ipAddress];
I'm trying to change the color of the various ipAddress strings, but can't find a similar method when using NSMutableAttributedString.
The biggest problem I'm facing is that since this entire string will be localized, I can't reliably set the NSRange without breaking up each part of the formatted output.
Do I need to dissect each part of this string, convert it to NSAttributedString and append each piece to the outputText?
The answer is: yes.
Yes, you need to localize sections with different attributes separately.
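A minimal sketch of that piecewise approach, assuming hypothetical helper names (AppendPiece, IdenticalMessage) and a caller-supplied attributes dictionary. The way the sentence is split into fragments is an illustration only, not the asker's actual strings file.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: appends `text` to `out` with optional attributes.
static void AppendPiece(NSMutableAttributedString *out,
                        NSString *text,
                        NSDictionary<NSString *, id> *attrs) {
    [out appendAttributedString:
        [[NSAttributedString alloc] initWithString:text attributes:attrs]];
}

// Hypothetical helper: builds the "identical" message from separately
// localized fragments, styling only the two address strings.
static NSAttributedString *IdenticalMessage(NSString *first,
                                            NSString *second,
                                            NSDictionary<NSString *, id> *addressAttrs) {
    NSMutableAttributedString *out = [[NSMutableAttributedString alloc] init];
    AppendPiece(out, @"\n\n", nil);
    AppendPiece(out, first, addressAttrs);
    AppendPiece(out, NSLocalizedString(@" and ", @"Joins the two addresses"), nil);
    AppendPiece(out, second, addressAttrs);
    AppendPiece(out, NSLocalizedString(@" are identical. No comparison required.",
                                       @"Follows the two addresses"), nil);
    return out;
}
```

On iOS the attributes dictionary would typically be something like @{NSForegroundColorAttributeName: [UIColor redColor]}; that requires UIKit, so the sketch leaves it to the caller. One trade-off to keep in mind: splitting the sentence fixes the order of the fragments, which may not suit every target language.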
