Encoding problems while extracting NSDictionary - ios

I have a JSON file with texts and titles. The titles are in English, but the texts sometimes include Cyrillic characters. When I parse the JSON into an NSDictionary everything seems fine, because the log shows English text and Unicode escapes:
\U00c8\U00f1\U00f2\U00ee\U00f0\U00e8\U00df about...
But when I try to get the string for this value from the NSDictionary, it gives me strange symbols:
Èñòîðèß about...
Everything seems to be all right until I try to extract a specific value from the dictionary. So I need help understanding how to get exactly the same value (English and Unicode symbols) that I see when I NSLog the whole dictionary.
This is how I get the value now: NSLog(@"text: %@", dict[@"results"][0][@"text"]);

It seems you haven't encoded your data properly; look at your JSON file when it is opened as UTF-8.
That's why you cannot decode it into a normal NSString.
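One way to check this diagnosis outside the app is sketched below in Python. It rests on an assumption that is not stated in the question: that the text was originally Windows-1251 (a common Cyrillic code page) and was mis-read byte for byte as Latin-1 before it was stored in the JSON. The variable names are illustrative only.

```python
# Assumption (not confirmed by the question): the source text was Windows-1251.
logged = "\u00c8\u00f1\u00f2\u00ee\u00f0\u00e8\u00df"  # escapes shown when logging the dictionary
extracted = "Èñòîðèß"                                   # characters shown for the extracted string

# The escapes and the "strange symbols" are the same seven characters,
# so the dictionary log and the extracted string do not actually differ.
same = (logged == extracted)

# Every one of these characters fits in a single byte. Turn them back into
# bytes and reinterpret those bytes as Windows-1251.
raw = logged.encode("latin-1")
recovered = raw.decode("cp1251")

print(same, recovered)
```

If the recovered text is readable Cyrillic, the durable fix is to re-save the JSON file as real UTF-8 rather than to patch strings in the app.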

Related

\u0092 is not printed in UILabel

I have a local JSON file with some descriptions of an app, and I have found weird behaviour when parsing the \u0092 and \u0091 characters.
When the JSON file contains these characters, the corresponding parsed NSString is printed as "?", and in a UILabel the character disappears completely.
Example: "L\u0092H\u00e9r." is shown as "LHér." instead of "L'Hér."
If I replace these characters with \u2019, then I can see the character ' in the UILabel.
Does anybody have any clue about this?
EDIT: For the moment I will substitute both of them with the character \u2019; it is also a ' and there is no risk of confusing it with a control character. Thank you all!
This answer is a little speculative, but I hope it gets you on the right track.
Your best bet may be to give up and substitute something else for \u0091 and \u0092 as a preprocessing step before displaying the string. These are control characters and are unprintable in most encodings. But:
If the rest of the file is proper UTF-8, your JSON file probably has problems: the encoding is wrong (CP-1250?) while you read the file as UTF-8, some error was made when converting the file, or something similar. So another solution is of course fixing your file.
If you're not sure how your file is encoded, it may simply be encoded in CP-1250, so reading the file using NSWindowsCP1250StringEncoding might fix your problem.
BTW, if you hardcode the string @"\u0091", you'll get the compile-time error "Universal character name refers to a control character". Yes, not even a warning, it's that unprintable in Unicode ;)
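The preprocessing substitution can be sketched in Python as follows. The mapping assumes that U+0091 and U+0092 stand for the left and right single quotation marks, which is what bytes 0x91 and 0x92 mean in the Windows code pages; the function name is made up for illustration.

```python
# Assumed mapping: the C1 control characters U+0091/U+0092 stand in for the
# curly single quotes that bytes 0x91/0x92 represent in Windows code pages.
C1_TO_QUOTES = {0x91: "\u2018", 0x92: "\u2019"}

def fix_c1_quotes(text: str) -> str:
    """Replace the two unprintable control characters with printable quotes."""
    return text.translate(C1_TO_QUOTES)

print(fix_c1_quotes("L\u0092H\u00e9r."))
```

The same idea carries over to an NSString by replacing occurrences of the two characters before assigning the text to the label.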

lua reading chinese character

I have the following xml that I would like to read:
chinese xml - https://news.google.com/news/popular?ned=cn&topic=po&output=rss
korean xml - http://www.voanews.com/templates/Articles.rss?sectionPath=/korean/news
Currently I use LuaXML to parse the XML, which contains Chinese characters. However, when I print the result to the console, the Chinese characters are not printed correctly and show up as garbage.
Is there any way to parse Chinese or Korean characters into a Lua table?
I don't think Lua is the issue here. The raw data the remote site sends is encoded using UTF-8, and Lua does no special interpretation of that—which means it should be preserved perfectly if you just (1) read from the remote site, and (2) save the read data to a file. The data in the file will contain CJK characters encoded in UTF-8, just like the remote site sent back.
If you're getting funny results like you mention, the fault probably lies either with the library you're using to read from the remote site, or perhaps simply with the way your console displays the results when you output to it.
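The point that the bytes survive untouched, and that only their interpretation can go wrong, can be illustrated with a small Python sketch. The feed content here is a made-up stand-in, not the real RSS data, and Latin-1 is only one example of a wrong interpretation a console might apply.

```python
import os
import tempfile

def save_and_reload(raw: bytes) -> bytes:
    """Write bytes to a temporary file and read them back unchanged."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# Hypothetical stand-in for the UTF-8 bytes sent by the remote site.
feed_bytes = "<title>中美</title>".encode("utf-8")

stored = save_and_reload(feed_bytes)
as_utf8 = stored.decode("utf-8")      # correct interpretation of the bytes
as_latin1 = stored.decode("latin-1")  # one wrong interpretation, producing garbage

print(stored == feed_bytes, as_utf8, as_latin1)
```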
I managed to convert "&#20013;&#32654;" into the Chinese characters "中美".
It takes one additional step: before saving in XML format, I convert every such sequence in the string using the method from this link, http://forum.luahub.com/index.php?topic=3617.msg8595#msg8595
string.gsub(l,"&#([0-9]+);", function(c) return string.char(tonumber(c)) end)
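For comparison, here is the same substitution sketched in Python, where chr accepts any Unicode code point. Lua's string.char only accepts values from 0 to 255, so in Lua a code point such as 20013 has to be turned into a UTF-8 byte sequence by a helper, which is presumably what the linked method does. The function name below is illustrative.

```python
import re

def decode_numeric_entities(text: str) -> str:
    """Replace decimal references such as &#20013; with the character itself."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)

decoded = decode_numeric_entities("&#20013;&#32654;")
utf8_bytes = decoded.encode("utf-8")  # the bytes that would be written to the XML file

print(decoded, len(utf8_bytes))
```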
I would also like to ask about LuaXML: I have come across the method xml.registerCode(decoded,encoded).
Its documentation says that it
registers a custom code for the conversion between non-standard characters and XML character entities
What do they mean by non-standard characters, and how do I use it?

NSDictionary writeToFile encoding only in UTF-8?

By writing an NSDictionary to a file I get a UTF-8 encoded XML file. I need to write the data to a file in NSISOLatin1StringEncoding. Is NSDictionary UTF-8 only? How can I achieve my goal?
Are you sure you need a file encoded as ISO Latin-1? The problem with all encodings other than some form of Unicode is that they can't represent all possible characters.
The encoding is surely the least of your problems. A dictionary's file representation is a property list file. It's unlikely that any code which requires Latin-1 encoding would understand that format. Indeed, the format is not guaranteed. It's not even guaranteed to be XML or textual. Property lists may be binary.
If you want to exchange data with a program that's going to use anything other than Cocoa's property list implementation, you should manually write the contents of the dictionary out in a format that's defined independently of Apple's property list format.
And, yes, if Cocoa does write the property list as XML, it's going to be UTF-8-encoded.
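Writing the contents out manually in an independently defined format could look like the Python sketch below. The key=value line format is an arbitrary choice made for illustration, not something the receiving program is known to expect. The sketch also shows the limitation mentioned above: Latin-1 cannot represent every character.

```python
def to_latin1_lines(data: dict) -> bytes:
    """Serialize a flat dict as 'key=value' lines encoded in ISO Latin-1."""
    text = "".join(f"{key}={value}\n" for key, value in data.items())
    # Raises UnicodeEncodeError if any character has no Latin-1 representation.
    return text.encode("latin-1")

ok = to_latin1_lines({"name": "Hér"})

try:
    to_latin1_lines({"name": "История"})
    fits = True
except UnicodeEncodeError:
    fits = False

print(ok, fits)
```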

ios: reading Chinese characters of a html source code

I'm trying to read and save Chinese characters from websites.
For example:
the HTML source code has this line:
title="网络歌手"
When I read this as an NSString, the value returned is in a format like:
\UT0212\UT0999
something like that.
I have tried converting with gb2312, utf-8, and other encodings, but I don't get the exact Chinese text. Sometimes I get close to Chinese, but not the exact words.
Any help is appreciated!
Regards,
Suraj
http://www.pinyin.info/tools/converter/chars2uninumbers.html
I believe you would have to convert the characters to Unicode, similar to what they did in the article above.
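Getting "close to Chinese, but not the exact words" is typical of decoding the page bytes with the wrong charset. The Python sketch below assumes, purely for illustration, that the page is served as GB2312. It shows that only the matching charset gives back the original text, and it lists the Unicode code points of the correctly decoded characters.

```python
title = "网络歌手"

# Assumption: the page is served as GB2312; these bytes stand in for the download.
page_bytes = title.encode("gb2312")

right = page_bytes.decode("gb2312")                   # matching charset
wrong = page_bytes.decode("utf-8", errors="replace")  # mismatched charset

# Unicode code points of the correctly decoded characters.
code_points = [f"U+{ord(ch):04X}" for ch in right]

print(right, wrong, code_points)
```

In practice the charset should be taken from the HTTP Content-Type header or the page's meta tag rather than guessed.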

Decode unicode string in iOS

I have an app where I download data through JSON. But when I try to show the NSStrings I see something like this:
\u041e\u041d\u0410!!! etc.
How can I decode it into normal symbols?
In our team we created our own decoder for this problem.
You'll see \u041e\u041d\u0410!!! only in NSLog's console view. Just assign the NSString's value to a UILabel or UITextField and it will show properly.
If you log, for example, an NSDictionary, you will see the Unicode escapes \u041e\u041d\u0410, but if you log the NSString itself (objectForKey:@"key") it will be fine.
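The difference between the escaped view and the actual string can be reproduced in Python as an analogy. Here json.loads plays the role of the JSON parser and ascii() plays the role of the escaped container log; the key name is made up.

```python
import json

# JSON text as it arrives over the wire, with \u escapes in the string value.
raw_json = '{"key": "\\u041e\\u041d\\u0410!!!"}'

parsed = json.loads(raw_json)
value = parsed["key"]          # the parser has already decoded the escapes
escaped_view = ascii(value)    # an escaped debug view of the same string

print(value)
print(escaped_view)
```

The string holds real Cyrillic characters after parsing; only the debug representation shows escapes, so no extra decoding step is needed before display.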
