How can I "puny" encode emoji? - ios

Emojis have percent encoded representations, but they also have "puny" representations. How does this work? Is there an easy way to convert emojis to punty codes?
For example, the key emoji (🔑) can be represented by the percent encoding %f0%9f%94%91 and the puny code xn--kv8h.

There are actually many built solutions for iOS including this pod: Puntycode-Cocoa.
The general method for encoding Unicode as ASCII using punycode is explained in detail on Wikipedia.

Related

Are code pages and code charts the same thing?

Based on what I have gathered so far from reading information available online:
character set is a bunch of characters that we want to use (like an interface)
character encoding is a method of encoding some character set (like an implementation)
What is the relationship between code charts and code pages and how do they fit into the overall context? I am not sure if these two terms are synonyms or if they are referring to distinct concepts.
Do code charts/code pages define character sets through large tables and also provide a method of encoding, making them a part of character encoding? Or, do they only define character sets and leave encoding implementation to another aspect? Additionally, is a locale simply a type of code chart/code page or is it a separate concept altogether?
In the majority of cases, character sets and character encodings are one and the same. For example, ISO-8859-1 defines the character set for Western Europe AND the encoding using an 8bit scheme.
See the specification for ISO-8859-1: ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf, which includes the encoding implementation.
Unicode on the other hand separates encoding from the character definition, albeit within a bunch of related documents. In Unicode, just about all current and a good deal of historic characters, symbols and modifiers are mapped to a 32 bit "code point". Encodings of UTF-32, UTF-16 and UTF-8 are then documented separately, to define how the Unicode Code Point is encoded.

iOS write to CSV file: which encoding to use

In my iOS app, I have a feature that writes data to a CSV file. This works fine in most cases with the following:
[csvString writeToFile: filePath atomically:YES encoding: NSUTF8StringEncoding error:&error];
I recently got an email from a Japanese user that the CSV file exported has weird symbols instead of Japanese characters. So I switched to using NSUTF16StringEncoding and it seems to work fine for Japanese characters as well.
So the question is: is it better to use NSUTF16StringEncoding, or are there any drawbacks to doing this? It seems that other examples I've seen for writing to CSV files (including CHCSVParser) use
NSUTF8StringEncoding, so I'm not sure which one to prefer.
Thanks.
There's no a "better" encoding.
UTF-8 uses a variable number of bytes per each character, from 1 to 4. UTF-16 uses always 2 bytes for every character. What is best, is really up to you and your business. In theory, if your users are mostly based in Asia and use primarily non-ASCII character, files encoded in UTF-16 are smaller. If your users are primarily living in the Western world and use Latin-based alphabets, using UTF-8 makes every file 50% smaller.
I believe your problem is not with the choice of the encoding, but rather with the presentation. Text editors cannot guess the encoding of a file, so it's possible that your Japanese user was using a text editor that defaults to UTF-16, and thus was unable to represent UTF-8 character sequences correctly.
The solution to this problem is to using the BOM sequence, as per this SO answer: https://stackoverflow.com/a/2585194/192024 (in short: just add those 3 bytes at the beginning of the file to tell editors what encoding to use)

How to create UTF-16 animation in Twitter?

I use a UTF-16 character picker to create ASCII art in Texbox in HTML, and UTF-16 characters are supported and visible "as is". Now I need to process such ASCII art and save into an Array as UTF-16 characters, process with Javascript as Strings to build ASCII art animations for Twitter like this:
You don't have to be sorry.
Twitter accepts UTF-16 as ASCIIart
For UTF-16 definition go to Wikipedia
http://en.wikipedia.org/wiki/UTF-16
UTF-16 (16-bit Unicode Transformation Format) is a character encoding for Unicode capable of encoding 1,112,064[1] numbers (called code points) in the Unicode code space from 0 to 0x10FFFF. It produces a variable-length result of either one or two 16-bit code units per code point.
I already did 2-bytes Unicode picker (UTF-16) and can generate UTF-16 input into Twitter.
==
re:
Removed the link as it's pointing to a Twitter account which doesn't show the mentioned content anymore (w/o scrolling). May appear like spam then. – david Nov 20 at 4:09
That way it may take much longet time to get right answer.
UTF-16 is a character encoding. Twitter only accepts UTF-8 as input. You can convert UTF-16 to UTF-8 without any data loss, so just do that and then send it to Twitter.

Can anyone tell me how to convert UTF-8 value to UCS-2 value in Objective-c?

I am trying to convert UTF-8 string into UCS-2 string.
I need to get string like "\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875".
I have googled for about a month by now, but still there is no reference about converting UTF-8 to UCS-2.
Please someone help me.
Thx in advance.
EDIT: okay, maybe my explanation was not good enough. Here is what I am trying to do.
I live in Korea, and I am trying to send a sms message using CTMessageCenter. I tried to send chinese simplified character through my app. And I get ???? Instead of proper characters. So I tried UTF-8, UTF-16, BE and LE as well. But they all return ??. Finally I found out that SMS uses UCS-2 and EUC-KR encoding in Korea. Weird, isn't it?
Anyway I tried to send string like \u4E3B\u9875 and it worked.
So I need to convert string into UCS-2 encoding first and get the string literal from those strings.
Wikipedia:
The older UCS-2 (2-byte Universal Character Set) is a similar
character encoding that was superseded by UTF-16 in version 2.0 of the
Unicode standard in July 1996.2 It produces a fixed-length format
by simply using the code point as the 16-bit code unit and produces
exactly the same result as UTF-16 for 96.9% of all the code points in
the range 0-0xFFFF, including all characters that had been assigned a
value at that time.
IBM:
Since the UCS-2 standard is limited to 65,535 characters, and the data
processing industry needs over 94,000 characters, the UCS-2 standard
is in the process of being superseded by the Unicode UTF-16 standard.
However, because UTF-16 is a superset of the existing UCS-2 standard,
you can develop your applications using the systems existing UCS-2
support as long as your applications treat the UCS-2 as if it were
UTF-16.
uincode.org:
UCS-2 is obsolete terminology which refers to a Unicode
implementation up to Unicode 1.1, before surrogate code points and
UTF-16 were added to Version 2.0 of the standard. This term should now
be avoided.
UCS-2 does not define a distinct data format, because UTF-16 and UCS-2
are identical for purposes of data exchange. Both are 16-bit, and have
exactly the same code unit representation.
So, using the "UTF8toUnicode" transformation in most language libraries will produce UTF-16, which is essentially UCS-2. And simply extracting the 16-bit characters from an Objective-C string will accomplish the same thing.
In other words, the solution has been staring you in the face all along.
UCS-2 is not a valid Unicode encoding. UTF-8 is.
It is therefore impossible to convert UTF-8 into UCS-2 — and indeed, also the reverse.
UCS-2 is dead, ancient history. Let it rot in peace.

How to convert ASCII or Unicode to cyrillic?

I am trying to convert ascii code or unicode of the key press to cyrillic. I have used keyboard layout manager for this but it is not converting all the characters to cyrillic.
Is there any other way to convert the character into cyrillic? Or do we need to do anything else in keyboard layout manager?
vishal N
It depends which cyrillic codepage do you use. cp-1251, cp-866 or koi8-r.
https://en.wikipedia.org/wiki/Windows-1251
https://en.wikipedia.org/wiki/Code_page_866
https://en.wikipedia.org/wiki/KOI_character_encodings
All three codepages have one byte per symbol and it is very simple to convert.
It is simple table translation.

Resources