There is a category of Unicode characters that I want to use on iOS (it must be Unicode). Unfortunately, none of the built-in iOS fonts support these characters (tested and proven; other Unicode characters work). How can I get this group of Unicode characters to work on iOS? These characters must be supported in the native text messaging app (iOS 8).
I've posted an image of the category as shown in the Character Viewer on Mac OS X.
Here is an example of one such character:
𝓓
MATHEMATICAL BOLD SCRIPT CAPITAL D
Unicode: U+1D4D3 (U+D835 U+DCD3), UTF-8: F0 9D 93 93, GB: 94339F33
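The figures on that line can be reproduced with Python's standard codecs, which is a quick way to confirm exactly which character you are dealing with before chasing font support. A minimal sketch; `describe` is a hypothetical helper name, and the GB column assumes the GB18030 encoding:

```python
def describe(ch: str) -> str:
    """Summarise one character the way Character Viewer's copied info does."""
    cp = ord(ch)
    # UTF-16 code units: split the big-endian bytes into 2-byte chunks.
    # For characters above U+FFFF these are the surrogate pair.
    utf16 = ch.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    units_str = " ".join(f"U+{u:04X}" for u in units)
    utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    gb = ch.encode("gb18030").hex().upper()
    return f"Unicode: U+{cp:04X} ({units_str}), UTF-8: {utf8}, GB: {gb}"

print(describe("\U0001D4D3"))
# Unicode: U+1D4D3 (U+D835 U+DCD3), UTF-8: F0 9D 93 93, GB: 94339F33
```

Because U+1D4D3 lies outside the Basic Multilingual Plane, it needs a surrogate pair in UTF-16 and four bytes in UTF-8.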
I have a thermal printer, the "MPT-II" from an unknown Chinese brand, which has both USB and Bluetooth. I can successfully print text using:
Loyverse app on Android
JavaScript
Raw HEX or decimal
However, only with the Loyverse app am I able to print special characters, and by special characters I mean the Danish characters æøå/ÆØÅ.
If I open any BLE tool on Windows (Bluetooth LE Lab, for example), I can select the correct characteristic and send something like 104 101 108 108 111 13 10, which prints "hello" on the printer. I've read a bit about the ESC R and ESC t commands, but how exactly do I set those modes? I've tried prepending them to each command, such as 27 82 4 104 101 108 108 111 13 10, where 27 82 4 corresponds to ESC R 4 and the 4 corresponds to Denmark I.
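The byte sequences in the question follow a simple pattern: an optional ESC command, the text bytes, then CR LF. A minimal sketch that builds them, assuming the printer follows the ESC/POS convention where ESC R n selects an international character set and n = 4 is Denmark I (as stated above); `with_charset` is a hypothetical helper, and sending the bytes over BLE is left to whatever tool you already use:

```python
ESC = 27          # 0x1B
CR, LF = 13, 10   # carriage return, line feed

def with_charset(text: str, n: int) -> list[int]:
    """Prefix ESC R n (select character set n) and terminate with CR LF."""
    return [ESC, ord("R"), n] + list(text.encode("ascii")) + [CR, LF]

print(" ".join(map(str, with_charset("hello", 4))))
# 27 82 4 104 101 108 108 111 13 10
```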
The printer's manual states the following:
GB18030 character set, ASCII characters, user defined characters, bar codes CODE39, EAN13, EAN8, CODABAR, CODE93, ITF, bitmaps.
According to that list, the Danish character set is not supported. I'm not sure how the Loyverse app gets it right, but the printed text looks the same with raw commands and with Loyverse, so I don't think Loyverse is converting the text to a bitmap and sending that.
So my real question is: how do I select the correct character set for my printer? Or maybe the character set is already correct, and the byte values I'm sending for æøå/ÆØÅ (which are not ASCII characters) are wrong?
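Which byte values are "correct" depends on the encoding the printer expects, since none of these letters exist in ASCII. A sketch that lists the candidates in three encodings: UTF-8, CP865 (the DOS Nordic code page, which is an assumption here, not something the manual confirms for this device), and GB18030 (the encoding the manual does list). `candidate_bytes` is a hypothetical helper:

```python
# Danish letters; none of them exist in 7-bit ASCII.
DANISH = "æøåÆØÅ"

def candidate_bytes(text: str, encoding: str) -> list[int]:
    """Encode text with the given codec and return the raw byte values."""
    return list(text.encode(encoding))

for enc in ("utf-8", "cp865", "gb18030"):
    print(f"{enc:8}", " ".join(map(str, candidate_bytes(DANISH, enc))))
```

Sending each row in turn, as decimals like the "hello" example, is one way to see which interpretation the printer actually uses.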
EDIT: I have confirmed that the printer does respond to ESC commands. If I send 27 97 2 (ESC a 2) followed by my "hello" sequence, the text is printed right-aligned, so that definitely works. I have tried probably every character set so far using ESC R and ESC t, but none of them work :(
EDIT 2: I have now tested every single combination of ESC R and ESC t. I went through the entire list printing some Chinese characters, and every one of the 150+ lines I tried printed the same Chinese character. So ESC R or ESC t is definitely not the command I should be using to change the charset.
Our application automatically modifies the layout of Arabic text when it is followed by a bracket, and I was wondering whether this is the correct behaviour or not.
The application shows items in the following format:
[ID of structure](version)
So version 1.5 of the English structure "stackoverflow" would be displayed as:
stackoverflow(1.5)
Note: the brackets need to be displayed. There is no space between the ID and the first bracket. The brackets simply encompass the version. The brackets could have been any character but it's far too late to switch to a different character now!
This works fine for left-to-right languages, but for Arabic the structures appear in the form:
ุณุชุงููููุฑูููู(1.0)
I am not an Arabic speaker and I need to know if this is actually correct. Is the Arabic format the equivalent of the English format or has something gone horribly wrong?
The Arabic text should be shown like this:
ุณุชุงููููุฑูููู(1.0)&rlm;
I added the HTML entity for the RLM (Right-to-Left Mark, U+200F) after the closing bracket in order to fix the text. You should do the same if your application doesn't support bidi natively. You can add the RLM in any of these ways:
HTML Entity (decimal) &#8207;
HTML Entity (hex) &#x200f;
HTML Entity (named) &rlm;
How to type in Microsoft Windows Alt +200F
UTF-8 (hex) 0xE2 0x80 0x8F (e2808f)
UTF-8 (binary) 11100010:10000000:10001111
UTF-16 (hex) 0x200F (200f)
UTF-16 (decimal) 8,207
UTF-32 (hex) 0x0000200F (200f)
UTF-32 (decimal) 8,207
C/C++/Java source code "\u200F"
Python source code u"\u200F"
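The fix amounts to appending U+200F after the closing bracket. A sketch in Python that also cross-checks a few of the encodings listed above; `format_item` is a hypothetical stand-in for however the application builds its labels, and the caller is assumed to know whether an ID is right-to-left:

```python
import html

RLM = "\u200F"  # RIGHT-TO-LEFT MARK

def format_item(structure_id: str, version: str, rtl: bool = False) -> str:
    """Render 'ID(version)'; for right-to-left IDs, append an RLM after ')'."""
    label = f"{structure_id}({version})"
    return label + RLM if rtl else label

# Cross-check a few of the encodings listed above.
print(RLM.encode("utf-8").hex())             # e2808f
print(ord(RLM), f"{ord(RLM):04x}")           # 8207 200f
print(html.unescape("&rlm;") == RLM)         # True
print(format_item("stackoverflow", "1.5"))   # stackoverflow(1.5)
```

The RLM is invisible, so left-to-right labels are unaffected if it is only added to right-to-left ones.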
(Note: the correct Arabic transliteration of StackOverflow is ستاك-أوفرفلو.)
I am facing an issue when displaying the C cedilla character (U+00E7 ç), used in French, on a handset.
When it is sent via USSGW/SS7 as a small c cedilla, it is displayed on the handset as a capital C cedilla (U+00C7 Ç).
For info, the character is encoded with gsm7bit.
Do you have any solution or idea for this situation?
The original ETSI TS 100 900 V7.2.0 (1999-07), "Digital cellular telecommunications system (Phase 2+); Alphabets and language-specific information (GSM 03.38 version 7.2.0 Release 1998)", defined byte 0x09 as Ç (capital C with cedilla).
Subsequently, a correction was made in the Unicode Consortium's GSM 03.38 to Unicode mapping table:
General notes:
This table contains the data the Unicode Consortium has on how ETSI GSM 03.38 7-bit default alphabet characters map into Unicode. This mapping is based on ETSI TS 100 900 V7.2.0 (1999-07), with a correction of 0x09 to small c-cedilla, instead of capital C-cedilla.
and in the table:
0x08 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
0x09 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
#0x09 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA (see note above)
0x0A 0x000A # LINE FEED
So there you have it: this character was remapped at some point. It is likely that you are encoding the character correctly, but an older device, or software using a library based on the old standard, is interpreting it according to the original mapping, resulting in the capital letter.
I'm not seeing a mapping for Ç in the corrected table, so it shouldn't appear anymore.
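To make the effect concrete, here is a deliberately tiny excerpt of the GSM 03.38 default alphabet, holding only the rows quoted above, with the original 1999 mapping of 0x09 alongside the corrected one. It is an illustration only, not a full GSM decoder (no septet unpacking, no escape table); the table and function names are hypothetical:

```python
# Excerpt of the GSM 03.38 default alphabet (only the rows quoted above).
CORRECTED = {0x08: "\u00F2", 0x09: "\u00E7", 0x0A: "\n"}  # 0x09 -> ç
ORIGINAL = {**CORRECTED, 0x09: "\u00C7"}                  # 0x09 -> Ç (1999 text)

def decode(septets: list[int], table: dict[int, str]) -> str:
    """Map already-unpacked 7-bit values to characters using the given table."""
    return "".join(table[s] for s in septets)

msg = [0x09]  # the sender encodes a small c cedilla as 0x09
print(decode(msg, CORRECTED), decode(msg, ORIGINAL))  # ç Ç
```

The same byte on the wire becomes ç or Ç depending only on which version of the table the receiving side was built with.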
There are some Unicode characters that I want to use in my app, and I am having trouble escaping them properly.
For instance, this Unicode character: 🅰
If I escape it using an online tool, I get: \ud83c\udd70
But of course this is an invalid sequence per the compiler:
var str = NSString.stringWithUTF8String("\ud83c\udd70")
Also if I do this:
var str = NSString.stringWithUTF8String("\ud83c")
I get an error "Invalid Unicode Scalar"
I'm trying to use these Unicode "fonts":
http://www.panix.com/~eli/unicode/convert.cgi?text=abcdefghijklmnopqrstuvwxyz
If I view the source of this website I see sequences like this:
𝕒
I'm struggling to wrap my head around the "proper" way to work with and escape Unicode.
I simply need to figure out a way to get these characters working on iOS.
Any thoughts?
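\ud83c\udd70 is a UTF-16 surrogate pair that encodes the Unicode character 🅰 (U+1F170). Escapes in Swift string literals denote Unicode scalars, not UTF-16 code units, so a surrogate escape doesn't make sense (which is why "\ud83c" alone is rejected as an invalid Unicode scalar). Also, since 1F170 has five hex digits, you can't use a \uXXXX escape sequence (which only accepts four hexadecimal digits). Instead, use a \UXXXXXXXX sequence (note the capital U), which accepts eight:
var str = "\U0001F170" // returns "🅰"
(Later Swift releases replaced this syntax with \u{...}, so today the same literal is written "\u{1F170}".)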
You can also just paste the character itself into your string:
var str = "🅰" // returns "🅰"
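The link between the surrogate pair \ud83c\udd70 and the code point U+1F170 is plain arithmetic defined by UTF-16. A minimal sketch in Python that converts both ways (the helper names are hypothetical):

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    v = cp - 0x10000                                  # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)   # top 10 bits, low 10 bits

def from_surrogates(hi: int, lo: int) -> int:
    """Recombine a high and a low surrogate into one code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

hi, lo = to_surrogates(0x1F170)
print(f"\\u{hi:04x}\\u{lo:04x}")          # \ud83c\udd70
print(f"U+{from_surrogates(hi, lo):X}")   # U+1F170
```

Python string literals, like the early Swift betas, accept the eight-digit "\U0001F170" form for the whole code point, so the surrogates never need to be written by hand.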
Swift is in early beta and is broken in many ways. This issue is a Swift bug.
let ringAboveA: String = "\u0041\u030A" is Å and is accepted
let negativeSquaredA: String = "\uD83C\uDD70" is 🅰 and produces an error
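Both are sequences of two UTF-16 code units, and Objective-C accepts both. The difference is that "\u0041\u030A" is two separate code points (A followed by a combining ring above), whereas "\uD83C\uDD70" is a surrogate pair encoding the single character 🅰, which lies in plane 1.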
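Python's standard `unicodedata` module makes the difference between the two literals visible: one is two code points that canonically compose to Å (U+00C5), the other is a single code point that only needs two UTF-16 units because it lies above U+FFFF. A short sketch:

```python
import unicodedata

ring_a = "\u0041\u030A"     # A + COMBINING RING ABOVE: two code points
squared_a = "\U0001F170"    # NEGATIVE SQUARED LATIN CAPITAL LETTER A: one code point

print(len(ring_a), len(squared_a))                       # 2 1
print(unicodedata.normalize("NFC", ring_a) == "\u00C5")  # True
print(len(squared_a.encode("utf-16-le")) // 2)           # 2 UTF-16 code units
print(unicodedata.name(squared_a))  # NEGATIVE SQUARED LATIN CAPITAL LETTER A
```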
Note: to get the UTF-32 code point, use either the OS X Character Viewer or a code snippet:
NSLog(@"utf32: %@", [@"🅰" dataUsingEncoding:NSUTF32BigEndianStringEncoding]);
utf32: <0001f170>
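For comparison, the same big-endian UTF-32 dump can be produced in Python with the built-in codec. This is an equivalent of the Objective-C snippet above, not part of the original answer:

```python
ch = "\U0001F170"                 # NEGATIVE SQUARED LATIN CAPITAL LETTER A
data = ch.encode("utf-32-be")     # big-endian, no byte-order mark
print(f"utf32: <{data.hex()}>")   # utf32: <0001f170>
```

In UTF-32 every code point occupies exactly four bytes, so the hex dump is the code point itself, zero-padded.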
To get the Character Viewer, open the Apple menu, go to "System Preferences" > "Keyboard", select the "Keyboard" tab and check "Show Keyboard & Character Viewers in menu bar". The Character Viewer item will then appear in the menu bar, just to the left of the date.
After entering the character, right-click (Control-click) on it in Favorites to copy its information.
Copied information:
🅰
NEGATIVE SQUARED LATIN CAPITAL LETTER A
Unicode: U+1F170 (U+D83C U+DD70), UTF-8: F0 9F 85 B0
Better yet: add "Unicode" to the list on the left and select it.
I've parsed an HTML page with mochiweb_html and want to parse the following text fragment:
0 – 1
Basically, I want to split the string on the spaces and the dash character and extract the numbers.
The string above is represented as the following Erlang list:
[48,32,226,128,147,32,49]
I'm trying to split it using the following regex:
{ok, P}=re:compile("\\xD2\\x80\\x93"), %% characters 226, 128, 147
re:split([48,32,226,128,147,32,49], P, [{return, list}])
But this doesn't work; it seems the \xD2 character is the problem (if I remove it from the regex, the split occurs).
Could someone explain:
what I'm doing wrong here?
why the '–' character seemingly requires three integers for its representation [226, 128, 147]?
Thanks.
226, 128, 147 is E2, 80, 93 in hex, not D2, 80, 93, so the first escape in your regex is wrong. With \xE2 it works:
> {ok, P} = re:compile("\xE2\x80\x93").
...
> re:split([48,32,226,128,147,32,49], P, [{return, list}]).
["0 "," 1"]
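The same fix can be checked outside Erlang. A sketch in Python, operating on the raw bytes as mochiweb_html delivered them (a bytes pattern, so \xE2 means the single byte 226), plus an alternative that decodes first:

```python
import re

raw = bytes([48, 32, 226, 128, 147, 32, 49])    # "0 – 1" as UTF-8 bytes
print(re.split(rb"\xE2\x80\x93", raw))          # [b'0 ', b' 1']

# Decoding first lets the pattern name the en dash (U+2013) directly and
# absorb the surrounding spaces, leaving just the numbers.
text = raw.decode("utf-8")
numbers = [int(p) for p in re.split(r"\s*\u2013\s*", text)]
print(numbers)                                  # [0, 1]
```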
As to your second question, about why the dash takes 3 bytes to encode: the dash in your input isn't an ASCII hyphen (hex 2D) but a Unicode en dash (U+2013). Your code is receiving it in UTF-8 encoding rather than the more obvious UCS-2 encoding, and U+2013 comes out as the bytes E2 80 93 in UTF-8.
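The step from U+2013 to E2 80 93 is mechanical: a code point between U+0800 and U+FFFF gets a 1110xxxx lead byte followed by two 10xxxxxx continuation bytes. A hand-rolled sketch covering only that three-byte range (the function name is hypothetical; real code should simply call the codec):

```python
def utf8_3byte(cp: int) -> list[int]:
    """Encode a code point in U+0800..U+FFFF as three UTF-8 bytes by hand."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a three-byte UTF-8 scalar value")
    return [
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ]

print(utf8_3byte(0x2013))                                     # [226, 128, 147]
print(bytes(utf8_3byte(0x2013)) == "\u2013".encode("utf-8"))  # True
```

The result is exactly the Erlang list from the question, [226, 128, 147].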
If your next question is "why UTF-8", it's because it's far easier to retrofit an old system using 8-bit characters and null-terminated C-style strings to use Unicode via UTF-8 than to widen everything to UCS-2 or UCS-4. UTF-8 remains compatible with ASCII and C strings, so the conversion can be done piecemeal over the course of years, or decades if need be. Wide characters require a "Big Bang" one-time conversion effort, where everything has to move to the new system at once. UTF-8 is therefore far more popular on systems with legacies dating back to before the early '90s, when Unicode was created.