Special characters conversion from service in iOS - ios

I am getting some data from response which includes some special characters. The text is in Chinese format. Here I have shown the special characters.
Please have a look and advise me how to convert this to Chinese format. It is possible in android using this method
String s = "hhh";
s.getBytes("Windows-1252");
Special Characters string is
"P.R. China 100020 中国北京市æœé˜³åŒºæœå¤–大街22å·æ³›åˆ©å
¤§åŽ¦408室"

Related

Why can't Dart decode base64 from Crystal?

Why does Dart produce the error "invalid character at position 61" with base64 from Crystal Lang?
The default Crystal lang Base64 encoding won’t work in Dart or Flutter.
This is because it doesn’t use strict encoding by default, inserting newlines every 60 characters.
To Dart, these newlines are unknown characters.
So, in short, you have to use Crystal’s Base64.strict_encode method. This will encode without special characters.
Dart has no method to ignore the special characters so this is 100% necessary to make it work.
https://crystal-lang.org/api/0.35.1/Base64.html#strict_encode(data,io:IO)-instance-method

If it is valid that Wikipedia uses Chinese characters (and other unicode characters) in URL

On Wikipedia you see URLs like these:
https://zh.wiktionary.org/wiki/附录:字母索引 (but copy-pasting the URL results in the equivalent https://zh.wiktionary.org/wiki/%E9%99%84%E5%BD%95:%E5%AD%97%E6%AF%8D%E7%B4%A2%E5%BC%95).
https://th.wiktionary.org/wiki/หน้าหลัก (which when copy-pasted becomes
https://th.wiktionary.org/wiki/%E0%B8%AB%E0%B8%99%E0%B9%89%E0%B8%B2%E0%B8%AB%E0%B8%A5%E0%B8%B1%E0%B8%81)
First, I'm wondering what is happening here, what the encoding transformation is called and what it's doing and why it's doing that. I don't see why you can't just have the original native characters in the URL.
Second, I'm wondering if what Wikipedia is doing is considered valid. If it is okay to include these non-ASCII glyphs in the URL, and if not, why not (other than perhaps because the standard says so). Also would be interested to know how many browsers support showing the link in the URL bar using the native glyphs vs. this encoded thing, and even would be interesting to know how native Chinese/Thai/etc. people enter in the URL in their language, if they use the encoding or what (but that probably makes this question too complicated; still would be an interesting bonus).
The reason I ask is because I would like to put let's say words/definitions of a few different languages onto a webpage, and I would like to make the url show the actual word used in the language. So in english it might be /hello, but the equivalent word/definition in Thai would be /สวัสดี. That makes way more sense to me than having to make it into the encoding thing.
From https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
Strings of data octets within a URI are represented as characters. *Permitted characters within a URI are the ASCII characters for the lowercase and uppercase letters of the modern English alphabet, the Arabic numerals, hyphen, period, underscore, and tilde.[14] Octets represented by any other character must be percent-encoded.
Not all Unicode characters can be used in URIs. Characters that aren't supported can still be encoded using Percent Encoding. You can see the non-ascii characters in the URL field because your browser chooses to display them that way, the actual HTTP requests are done using the encoded strings.

How can I convert Chinese characters into another set?

I'm trying to reproduce a character conversion...
Essentially out of the Chinese word for Login. In this example the Chinese word for Login, "登录"should be converted into this text instead "µÇ¼".
It would be nice if there was a piece of software that did this for me already...

Google's URL encoding?

I have noticed that Google does not encode all special characters in the query part of the URL . For example:
Placing this string in Google's search: !##$%^&*()
Yields this URL: https://www.google.com/#q=!%40%23%24%25^%26*()
Notice that the !, ^, *, ( , and ) are not encoded.
Some of the characters such as : or < are considered unsafe or reserved, yet Google doesn't encode them.
Can someone explain why Google does this, and if they have a reference document as to exactly what characters get encoded and which don't?
Thanks for any help!
As documented here:
Some characters are not safe to use in a URL without first being
encoded. Because a Google search request is made by using an HTTP URL,
the search request must follow URL conventions, including character
encoding, where necessary.
The HTTP URL syntax defines that only alphanumeric characters, the
special characters $-_.+!*'(), and the reserved characters ;/?:#=& can
be used as values within an HTTP URL request. Since reserved
characters are used by the search engine to decode the URL, and some
special characters are used to request search features, then all
non-alphanumeric characters used as a value to an input parameter must
be URL-encoded.
To URL-encode a string:
Replace space characters with a "+" character Replace each
non-alphanumeric character by its hexadecimal ASCII value, in the
format of a "%" character followed by two hexadecimal digits. (Such an
ASCII value may be referred to as an escape code.)
Some input parameters require that the values passed to Google search are double-URL-encoded. This requirement means that you must apply the URL encoding to the string twice in succession to generate the final value.

KRL RSS parser: Handle encoding issues?

I'm importing an RSS feed from Tumblr into a Kynetx app. It appears that the RSS feed has some encoding issues, as apostrophes appear like this:
The feed (which you can find here) claims to be encoded in UTF-8.
Is there a way to specify the encoding or else replace those characters with regular apostrophes?
While not optimal, you could try to catch these encodings and replace them with the UTF-8 standard:
newstring = oldstring.replace(re/’/\'/);
This appears to be a case of a service that specifies UTF-8, but does't explicitly enforce it. I uploaded an image of the RSS feed that you provided. For comparison, I cut and pasted the text into a notepad document and then typed in the same text from my keyboard.
I don't know if you can tell from the image, but the apostrophe that is mangled is different from the apostrophe that is generated by my UTF-8 browser.
I suspect that this post was submitted via a Windows client. If you look at your encoding options, you will see an option for Western (Windows-1252).
Windows-1252 is a legacy encoding from windows that resembles ISO 8859-1, but substitutes some of their own characters for control characters in the ANSI standard and changes the location in the codepage of others.
A couple of quotes from the wikipedia page that I cite above:
It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling
Many Microsoft programs, such as Word will automatically substitute Windows-1252 characters when standard ASCII characters are entered, such as for "smart quotes" (e.g. substituting ’ for the apostrophe in a contraction) or substituting © for the three characters '(c)'.
KRL supports all of the language charsets supported by UTF-8, so it supports multi-byte international characters natively; however, that comes at the expense of being able to fudge encodings that is possible when you only have ISO-8859-1 or Windows-1252 to choose from.

Resources