I am getting a long text from the server, and that text contains characters such as \U201a\U00c4\U00f2He-Must-Not-Be-Named\U201a\U00c4\U00f4.
When I display the text in a text view, I get some different characters.
How do I get normal text in Objective-C?
Please help me out with this.
When I receive data from the server, I use
infoDictionary = [NSJSONSerialization JSONObjectWithData:data options:0 error:nil];
and from that infoDictionary I get text like
locks his cousin Dudley in the snake\U201a\U00c4\U00f4s captivity just in the blink of an eye. Each wand has a magical core such as phoenix\U201a\U00c4\U00f4s hair or dragon heartstring, that performs all the magic.
\n
And I assign this value to the text view like
detailsTextView.text = [infoDictionary objectForKey:@"DESCRIPTION"];
But in the text view I get some different characters.
There are two possibilities, one more likely, one less likely.
The less likely one is that your server sends rubbish when it tries to translate its data into JSON.
The more likely one is that you are just frightening yourself, and there is nothing wrong. Something like \U201a\U00c4\U00f2He-Must-Not-Be-Named\U201a\U00c4\U00f4 is how non-ASCII characters are printed when a dictionary or array is logged: each one appears as a \Uxxxx escape rather than as the character itself. For example, U201A is the Unicode character "Single Low-9 Quotation Mark". Use the character viewer in Mac OS X to find out what the characters are if you are curious. If you use NSLog on the dictionary, you will get the same escapes. The characters they stand for should be displayed in your text view perfectly fine.
However, in your case, the sequences \U201a\U00c4\U00f2 and \U201a\U00c4\U00f4 are highly unusual in real text. This points to a problem with the server code or with the data that is actually stored. The pattern is consistent with UTF-8 bytes having been decoded as Mac OS Roman somewhere upstream: the UTF-8 bytes E2 80 99 for ’ (U+2019) read as ‚ Ä ô in Mac OS Roman, and E2 80 98 for ‘ (U+2018) read as ‚ Ä ò. If you are given rubbish data, the proper fix is to contact whoever is supplying it.
In the meantime, there is something you can do on the client. You can use the method stringByReplacingOccurrencesOfString:withString: to replace nonsense data with something sensible. I wouldn't expect the sequence \U201a\U00c4\U00f4 (which displays as ‚Äô) to ever appear in a string that I display. So figure out what string belongs there (say, a quotation mark) and replace it: get the description into an NSString, call stringByReplacingOccurrencesOfString:withString: and use the result. There may be more strange combinations than just this one.
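The replacement approach described above could be sketched roughly as follows. CleanDescription is a hypothetical helper name, and the mapping of the two garbled sequences to right and left single quotation marks is an assumption based on the sample text, not something the server confirms.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces the garbled sequences seen in the question
// with the quotation marks they appear to stand for (an assumption).
static NSString *CleanDescription(NSString *raw) {
    NSDictionary<NSString *, NSString *> *fixes = @{
        @"\u201a\u00c4\u00f4" : @"\u2019", // assumed: right single quotation mark
        @"\u201a\u00c4\u00f2" : @"\u2018", // assumed: left single quotation mark
    };
    NSString *result = raw;
    for (NSString *bad in fixes) {
        result = [result stringByReplacingOccurrencesOfString:bad
                                                   withString:fixes[bad]];
    }
    return result;
}
```

It would be used as detailsTextView.text = CleanDescription(infoDictionary[@"DESCRIPTION"]); and more entries can be added to the table if other strange combinations turn up.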
stringWithUTF8String: takes a const char * as an argument, so there is no "@" symbol in front.
NSString *description = [infoDictionary objectForKey:@"DESCRIPTION"];
NSString *str = [NSString stringWithUTF8String:description.UTF8String];
detailsTextView.text = str;
Show this str in your textview.
Related
I'm running into problems with international (in this case Korean) NSString values.
The same input string is used in two different parts of the program. The first part finds a substring that needs highlighting, stores the NSString and the range for the highlighting into a database.
The second part of the program retrieves the string and displays the highlighting.
The marking part is done using an NSString that has been normalized in Unicode Normalization Form C using the precomposedStringWithCanonicalMapping method on NSString. An NSRange and an NSString are then stored into the Core Data database.
The graphical highlighting is performed by retrieving the NSRange and NSString from the database, putting the NSString into the same Form C using the same method, using this to initialize an NSMutableAttributedString and using the NSRange to set its text attributes.
At this stage, the program crashes because the NSMutableAttributedString is 80 characters long, whereas the NSString was 81 characters long.
NSAttributedString does not have a precomposedStringWithCanonicalMapping method and I assume it changes the representation internally resulting in a different encoding and thus length.
What can I do?
Is there a way of forcing NSAttributedString to keep an underlying encoding?
Is there a way of converting an NSRange from one encoding to another?
Or is there anything else I can do?
Ok,
I did eventually find out what had happened. In one particular place in the program I mistakenly used decomposedStringWithCanonicalMapping rather than precomposedStringWithCanonicalMapping and that's where the "wrong" mapping came from.
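A small sketch of why mixing the two mappings shifts lengths and ranges. The helper names are hypothetical; the sample character U+D55C is a Hangul syllable that Unicode decomposes into three conjoining jamo, so the same text has different NSString lengths in the two forms.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helpers: length in UTF-16 units after each normalization.
static NSUInteger PrecomposedLength(NSString *s) {
    return s.precomposedStringWithCanonicalMapping.length;   // Form C
}

static NSUInteger DecomposedLength(NSString *s) {
    return s.decomposedStringWithCanonicalMapping.length;    // Form D
}

// One Hangul syllable, built from its code unit to avoid source-encoding issues.
static NSString *SampleSyllable(void) {
    unichar han = 0xD55C;
    return [NSString stringWithCharacters:&han length:1];
}
```

Comparing PrecomposedLength(SampleSyllable()) with DecomposedLength(SampleSyllable()) shows the mismatch: an NSRange computed against one form is not valid for the other, so the range and the string have to be normalized with the same method everywhere.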
I have two strings.
One is a response from a TCP server, read using NSStream events:
NSString *output = [[NSString alloc] initWithBytes:buffer length:len encoding:NSASCIIStringEncoding];
And one is a string produced on the fly that should match the string returned from the NSStream.
I have logged both of these with NSLog, and they look identical.
I have also logged the lengths of the strings, and one is two characters longer, even though both are identical in 'text' form.
Any suggestions to point me in the right direction?
I need to know if they match, as if they do, another event will be triggered to enhance and add additional functionality to my app.
Never use == to compare strings. If their contents are character-by-character identical, you can use isEqualToString to compare 2 strings. If your strings have different lengths, though, then they are not character-by-character identical.
Write a for loop that uses the method characterAtIndex: to log the characters from each string one at a time and compare them. You might need to log the characters' integer values so you can see the non-printable ones.
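The loop suggested above might look something like this. CodeUnitDump is a hypothetical helper name; it prints each UTF-16 code unit as a hex value so that invisible characters such as a carriage return or line feed show up.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the UTF-16 code units of a string as
// space-separated hex values, e.g. for logging two strings side by side.
static NSString *CodeUnitDump(NSString *s) {
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSUInteger i = 0; i < s.length; i++) {
        unichar c = [s characterAtIndex:i];
        [parts addObject:[NSString stringWithFormat:@"%04X", (unsigned int)c]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

Logging CodeUnitDump(output) next to CodeUnitDump(expected) makes a trailing 000D 000A pair, or any other extra characters, easy to spot.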
Thanks Guys.
Fixed with @rdelmar's suggestions. I didn't know this was possible in Objective-C:
NSString *trimOutput = [output stringByTrimmingCharactersInSet:[NSCharacterSet newlineCharacterSet]];
I'm retrieving a string from an NSData object dataOut (coming from a CBCharacteristic), and I also define a testString with the same value, as shown below:
The problem is when I try to compare the two, I get that the two strings are not equal, even though the debugger shows otherwise:
Here is the comparison:
The log keeps logging "Strings are not equal!"
What am I doing wrong? Is the encoding incorrect, even though the strings are the same?
You will see that they are different when you convert them to NSData and print out the data.
What you see is not always what you get, especially with Unicode characters. There may be invisible characters, or some characters that only look similar.
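A minimal sketch of that check. The trailing NUL byte in the sample is an assumption about what a peripheral might send, not something taken from the question, and the helper names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the UTF-8 bytes of a string, for printing or comparing.
static NSData *UTF8Bytes(NSString *s) {
    return [s dataUsingEncoding:NSUTF8StringEncoding];
}

// Builds a string the way it might arrive from a device: "abc" plus a
// trailing NUL byte (assumed for illustration).
static NSString *StringFromDevice(void) {
    const char raw[] = { 'a', 'b', 'c', 0 };
    return [[NSString alloc] initWithBytes:raw
                                    length:sizeof(raw)
                                  encoding:NSUTF8StringEncoding];
}
```

NSLog(@"%@ vs %@", UTF8Bytes(StringFromDevice()), UTF8Bytes(@"abc")); prints the two byte sequences, so an extra byte that the debugger's string view hides becomes visible.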
Is there a simple way (a function, a method...) of validating a character that a user types to see if it's compatible with Mac OS Roman? I've read a few dozen topics to find out why an iOS application crashes in CGContextShowTextAtPoint. I guess an application can crash if it tries to draw a string (e.g. ©) onto an image when the string contains a character that is not included in the Mac OS Roman set. There are 256 characters in this set. I wonder if there's a better way than matching the selected character one by one against those 256 characters?
Thank you
You might give https://developer.apple.com/library/mac/#documentation/graphicsimaging/conceptual/drawingwithquartz2d/dq_text/dq_text.html a closer read.
You can draw any encoding using CGContextShowGlyphsAtPoint instead of CGContextShowTextAtPoint, so you can tell it what the encoding is. If the user types it, then you'll be getting the string as an NSString, which is a Unicode string underneath. Probably the easiest approach is to get the UTF-8 encoding of that user-entered string via NSString's UTF8String method.
If you really want to stick with the very limited MacRoman for some reason, then use NSString's cStringUsingEncoding:, passing in NSMacOSRomanStringEncoding, to get a MacRoman string. Read the documentation on this in NSString though. It returns NULL if the user string can't be encoded in MacRoman losslessly. As the documentation discusses, you can use dataUsingEncoding:allowLossyConversion: and canBeConvertedToEncoding: to check. Read the cautions in the Discussion for cStringUsingEncoding: about the lifecycle of the returned strings though. getCString:maxLength:encoding: might end up being a better choice for you. All of this is discussed in the class documentation for NSString.
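The check mentioned above can be sketched in a few lines. FitsMacRoman is a hypothetical helper name; canBeConvertedToEncoding: and NSMacOSRomanStringEncoding are the NSString API the answer refers to.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if every character of the string can be
// represented in Mac OS Roman without loss.
static BOOL FitsMacRoman(NSString *s) {
    return [s canBeConvertedToEncoding:NSMacOSRomanStringEncoding];
}

// A Hangul syllable, used here as an example of text outside Mac OS Roman.
static NSString *HangulSample(void) {
    unichar han = 0xD55C;
    return [NSString stringWithCharacters:&han length:1];
}
```

Calling FitsMacRoman on the user's input before drawing avoids comparing against the 256 characters by hand; for instance it accepts @"caf\u00e9" and rejects HangulSample().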
This doesn't directly answer the question but this answer may be a solution to your problem.
If you have an NSString, instead of using CGContextShowTextAtPoint, you can do:
[someStr drawAtPoint:somePoint withFont:someFont];
where someStr is an NSString containing any Unicode characters a user can type, somePoint is a CGPoint, and someFont is the UIFont to use to render the text.
I am using PDFKitten to search for strings within PDF documents and highlight the results. FastPDFKit or any other commercial library is not an option, so I stuck with the one closest to my requirements.
As you can see in the screenshot, I searched for the string "in", which is always highlighted correctly except for the last occurrence. I have a more complex PDF document where the highlight box for "in" is wrong in nearly 40% of cases.
I read the whole source and checked the issue tracker, but apart from line height problems I found nothing regarding the width calculation. For the moment I don't see any pattern in where the calculation goes wrong, and I hope that maybe someone else has had a similar problem.
My current guess is that the coordinates and character widths are calculated incorrectly somewhere in the font classes or in RenderingState.m. The project is very complex, and maybe one of you has had a similar problem with PDFKitten in the past.
I used the original sample PDF document from PDFKitten for my screenshot.
This might be a bug in PDFKitten when calculating the width of characters whose character identifier does not coincide with its unicode character code.
appendPDFString in StringDetector works with two strings when processing some string data:
// Use CID string for font-related computations.
NSString *cidString = [font stringWithPDFString:string];
// Use Unicode string to compare with user input.
NSString *unicodeString = [[font stringWithPDFString:string] lowercaseString];
stringWithPDFString in Font transforms the sequence of character identifiers of its argument into a unicode string.
Thus, in spite of the name of the variable, cidString is not a sequence of character identifiers but of Unicode characters. Nonetheless, its entries are used as the argument of didScanCharacter, which in Scanner is implemented to advance the position by the character width: it passes the value to widthOfCharacter in Font to determine the character width, and that method (according to the comment "Width of the given character (CID) scaled to fontsize") expects its argument to be a character identifier.
So, if the CID and the Unicode character code don't coincide, the wrong character width is determined and the position of any following character cannot be trusted. In the case at hand, the /fi ligature has a CID of 12, which is very different from its Unicode code 0xfb01.
I would propose enhancing PDFKitten to also define a didScanCID method in StringDetector, which appendPDFString should call next to didScanCharacter for each processed character, forwarding its CID. Scanner should then use this new method instead to calculate the width by which it advances its cursor.
This should be triple-checked first, though. Maybe some widthOfCharacter implementations (there are different ones for different font types) expect the argument to be a Unicode code after all, in spite of the comment...
(Sorry if I used the wrong vocabulary here or there, I'm a Java guy... :))