iOS Font Not Properly Displaying All Unicode Characters

Two examples: U+22FF and U+23BA, among many others.
Is there an encoding layer on iOS that I'm misunderstanding? That is, does it stop being able to display code points beyond 22B...?
I'm capturing this in an NSString and trying to simply update a text field. Something like:
NSString *test = @"\u23ba";
[displayText setText:test];
This displays the standard missing-glyph indicator: a box with a question mark in it, or just an empty box (depending on the font).
Is there a way to expand Unicode support on iOS? These characters display fine on my Mac. Or is my only option some variant of the NSAttributedString route?

U+22FF and U+23BA are valid code points (both are assigned to characters), but only a few fonts support them. So you should first check which fonts are being used, or are available.
For example, U+22FF is included in Asana-Math, Cambria, Cambria Math, Code2000, DejaVu Sans (oddly, only Bold Oblique typefaces), FreeSerif, GNU Unifont, Quivira, Segoe UI Symbol, STIXMath, STX, Sun-ExtA, Symbola, XITS, XITSMath. U+23BA is included in Cambria, Cambria Math, Code2000, FreeMono, FreeSerif, GNU Unifont, Quivira, Segoe UI Symbol, Sun-ExtA, Symbola. Many of these are free fonts. Typographic quality varies a lot. The Cambria fonts and Segoe UI Symbol are commercial and ship with some Microsoft products. There are probably some other fonts that cover those characters, but not many (Everson Mono, I suppose, though I don't currently have it).
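To see which characters a particular font lacks, compare the string's code points against the font's character map. The sketch below, in Python for brevity, assumes you have already extracted the cmap as a set of integers (on iOS, Core Text's CTFontCopyCharacterSet returns the font's character set; elsewhere a library such as fontTools can read the cmap). The helper name and the fake cmap are hypothetical.

```python
import unicodedata

def missing_characters(text, font_codepoints):
    """Return (code point label, Unicode name) for each distinct character
    in text that the font's character map does not cover."""
    missing = []
    for ch in dict.fromkeys(text):  # de-duplicate while keeping order
        if ord(ch) not in font_codepoints:
            name = unicodedata.name(ch, "<unnamed>")
            missing.append((f"U+{ord(ch):04X}", name))
    return missing

# Hypothetical cmap: pretend the font covers printable ASCII plus U+22FF only.
fake_cmap = set(range(0x20, 0x7F)) | {0x22FF}
print(missing_characters("A\u22ff\u23ba", fake_cmap))
# → [('U+23BA', 'HORIZONTAL SCAN LINE-1')]
```

Running the same check against each candidate font tells you which one, if any, can display the whole string.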

Related

Why does HarfBuzz shape two single characters into one glyph?

I'm new to both Skia and HarfBuzz. My project relies on Skia to render text (Skia relies on HarfBuzz to shape text).
If I try to render the text "ff", "fl" or "fi" (or maybe some other combinations, I don't know), Skia renders one glyph composed of both characters instead of two separate glyphs. This becomes much more obvious if I set the text's letter-spacing property.
By following breakpoints, I tracked this down to HarfBuzz's shaping result: HarfBuzz outputs one glyph if the text is "ff", "fl" or "fi".
It seems that some HarfBuzz configuration could avoid this, but I don't know how. Please give me some hints.
PS: The shaping result differs depending on the font file, so this is also related to the font file I use for shaping.
What you are observing is the result of ligature glyph substitutions that occur during text layout.
Harfbuzz is performing advanced text layout using OpenType Layout features in a font. OpenType features are a mechanism for enabling different typographic capabilities that are supported in the font.
Some features are required for correct display of a script. For example, certain features are used to trigger word-contextual forms in Arabic script, or to trigger positioning of vowel marks in Bangla script or diacritic marks in Latin script. These are mandatory for correct display of these scripts.
Other features trigger optional typographic capabilities of a font: they're not essential for correct display of the script, but may be desired for high-quality typography. Small caps and superscript forms are two examples of optional features. Many optional features should not be used in applications by default. For instance, small caps should only be used when the content author explicitly wants them.
But in OpenType some optional features are recommended for use by default since they are intended to provide good quality typography for any body text. One such feature is "Standard Ligatures".
Your specific cases, "ff", "fi", etc., are considered standard ligatures. In any publication that has high quality typography, these will always be used in body text. Because the OpenType spec recommends that Standard Ligatures be applied by default, that's exactly what Harfbuzz is doing.
You can read the Harfbuzz documentation to find out more about how to enable or disable OpenType features. And you can find descriptions of all OpenType features in the OpenType Layout Tag Registry (part of the OpenType spec).
OpenType features use data contained directly in the fonts. Harfbuzz will enable the Standard Ligatures feature by default, but not all fonts necessarily have data that pertains to that feature. That's why you see the behaviour with some fonts but not others.
When a font does support features, the font data describe glyph-level actions to be performed. Harfbuzz (or any OpenType layout engine) reads the data and then performs the described actions. There are several types of actions. One is a ligature substitution: replacing a sequence of glyphs with a single glyph, the ligature glyph. Fonts can use ligature substitutions for a variety of purposes. Forming an "ff" ligature is one example. But a font might also replace the default glyphs for a base letter and a following combining mark with a single glyph that incorporates both, with the mark precisely positioned for that combination. That is essential for correct display of the script, not something that should be optional.
Thus, it would be a bad idea to disable all ligature substitutions. That's why OpenType has features as a trigger/control mechanism: features are organized around distinct typographic results, not the specific glyph-level actions used to achieve those results. So, you could disable a feature like Standard Ligatures without blocking ligature substitution actions that get used by the font for other typographic purposes.
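To make the feature/action distinction concrete, here is a toy model in Python (not HarfBuzz code; the glyph names and rules are made up): each feature owns some ligature rules, and shaping applies only the rules of enabled features. Turning off "liga" leaves "ff" as two glyphs while the "ccmp" composition still happens. In HarfBuzz's own C API, the corresponding switch is a feature string such as "-liga", parsed with hb_feature_from_string and passed in the features array of hb_shape; how your Skia version lets you pass features through is something to check in its shaper API.

```python
# Toy model of OpenType ligature substitution gated by features.
# Real fonts store such rules as GSUB lookups; names here are hypothetical.
FONT_FEATURES = {
    # Composition: base letter + combining acute -> one glyph (must stay on).
    "ccmp": [(("e", "acutecomb"), "eacute")],
    # Standard Ligatures: optional, but on by default in HarfBuzz.
    "liga": [(("f", "f"), "f_f"), (("f", "i"), "f_i"), (("f", "l"), "f_l")],
}

def shape(glyphs, enabled):
    """Apply the ligature rules of each enabled feature, in a fixed order.

    Within a feature, scan left to right; the first rule matching at the
    current position wins, otherwise the glyph is copied unchanged."""
    for feature in ("ccmp", "liga"):
        if feature not in enabled:
            continue
        rules = FONT_FEATURES.get(feature, [])
        out, i = [], 0
        while i < len(glyphs):
            for seq, lig in rules:
                if tuple(glyphs[i:i + len(seq)]) == seq:
                    out.append(lig)
                    i += len(seq)
                    break
            else:  # no rule matched at position i
                out.append(glyphs[i])
                i += 1
        glyphs = out
    return glyphs

text = ["o", "f", "f", "e", "acutecomb"]
print(shape(text, {"ccmp", "liga"}))  # ['o', 'f_f', 'eacute']
print(shape(text, {"ccmp"}))          # ['o', 'f', 'f', 'eacute']
```

A font with no "liga" rules behaves like the second call even with the feature enabled, which matches the font-dependence noted in the question.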

Seemingly unsupported unicode character in UILabel

I have a UILabel that displays a string coming in from a web service. It seems to be properly displaying some unicode characters, but not all. The string comes from the web service in a JSON object as follows:
"\u2b51 \u2605 Special Chars"
This is displayed in the UILabel like so:
Clearly, it's displaying the \u2605 character just fine but not the \u2b51 character. The font is Helvetica Neue, the system font.
Am I doing something wrong or is this a bug in iOS and/or the font?
This seems to be purely a font issue. The character U+2605 BLACK STAR “★” is relatively common in fonts, so it is probably taken from a system font or a fallback font. The character U+2B51 BLACK SMALL STAR “⭑” is relatively rare; it was added in Unicode 5.1, i.e. rather recently (in the character code world, that is). According to Fileformat.info data, it appears in Code2000, FreeSerif, GNU Unifont, Quivira, STIX, STIXMath, and Symbola. Not many; most computers have none of them (though many Linux systems probably have FreeSerif). You can also add Asana Math and Universalia to the list, but coverage is still rather limited.
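A quick way to confirm that the data survives decoding, and that only the font is at fault, is to decode the JSON yourself and list the non-ASCII code points. A minimal Python sketch (the `describe` helper is hypothetical; on iOS, NSJSONSerialization performs the equivalent decoding):

```python
import json
import unicodedata

def describe(text):
    """List 'U+XXXX NAME' for every non-ASCII character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
            for ch in text if ord(ch) > 0x7F]

payload = '"\\u2b51 \\u2605 Special Chars"'  # raw JSON text from the web service
decoded = json.loads(payload)
print(describe(decoded))  # ['U+2B51 BLACK SMALL STAR', 'U+2605 BLACK STAR']
```

Both code points come through intact, so the missing star is a glyph-coverage problem rather than an encoding one.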

List of iOS fonts grouped by supported script (ISO 15924)

So, obviously there's iosfonts.com, which has been incredibly helpful, but how can I determine that, for example, HiraKakuProN-W3 contains the code points for Japanese (Jpan, 413 in ISO 15924)?
Furthermore, I'd like to know more specific information. I imagine that, continuing the example, HiraKakuProN contains the characters for Hiragana and Katakana, but does it also contain all the CJK unified ideographs, just the ones needed for Japanese, or none of them?
Where can I find exhaustive tables of unicode characters per language (IETF language tag)? It's easy to find a listing of all Hani characters, but Unicode (and the Hani code point table) doesn't make a distinction between Hans, Hant, Jpan, etc. I ask this because, if there is no readily available info on which iOS font is for which language, I will programmatically determine this myself, but will need to know what characters to look for.
Thanks for any leads.
The list of script codes supported by Arial Unicode MS (the most versatile font, as far as I know) is here:
http://en.wikipedia.org/wiki/Arial_Unicode_MS
From this site, you can find a link to fonts supporting a given ScriptCode.
Some of those fonts may need to be installed, though.
I hope it helps… This is a complex domain ;)
http://scriptsource.org/cms/scripts/page.php
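If you do end up determining script support yourself, one approach is to measure what fraction of each relevant Unicode block a font's cmap covers. The Python sketch below assumes you already have the cmap as a set of integers (for example from CTFontCopyCharacterSet on iOS, or fontTools elsewhere) and uses Python's own Unicode database to skip unassigned code points; the block list and helper names are hypothetical choices.

```python
import unicodedata

# Inclusive code point ranges of some Unicode blocks used for Japanese.
BLOCKS = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

def assigned(lo, hi):
    """Code points in [lo, hi] that Python's Unicode database names."""
    return [cp for cp in range(lo, hi + 1) if unicodedata.name(chr(cp), None)]

def block_coverage(font_codepoints, blocks=BLOCKS):
    """Fraction of each block's assigned code points present in the cmap."""
    result = {}
    for name, (lo, hi) in blocks.items():
        cps = assigned(lo, hi)
        covered = sum(1 for cp in cps if cp in font_codepoints)
        result[name] = covered / len(cps)
    return result
```

As the question points out, block coverage alone cannot tell a Japanese font from a Chinese one, because unified ideographs share code points; you would need a character list specific to each standard (for example, the Jōyō kanji) to test against.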

Delphi routine to display arbitrary bytes in arbitrary encoding in arbitrary language

I have some byte streams that may or may not be encoded as 1) extended ASCII, 2) UTF-8, or 3) UTF-16. And they may be in English, French, or Chinese. I would like to write a simple program that allows the user to enter a byte stream and then pick one of the encodings and one of the languages and see what the string would look like when interpreted in that manner. Or simply interpret each string in each of the 9 possible ways and display them all. I would like to avoid having to switch regionalizations repeatedly. I'm using Delphi 2007. Is what I am trying to do even possible?
In Delphi 2009 or later, this would be easier, since it supports Unicode and can do most of this transparently. For older versions, you have to do a bit more manual work.
The first thing you want to do is convert the text to a common codepage; preferably UTF-16, since that's the native codepage on Windows. For that, you use the MultiByteToWideChar function. For UTF-8 to UTF-16, the language doesn't matter; for "extended ASCII", you will need to choose an appropriate source code page (e.g. Windows-1252 for English and French, and GB2312 or Big5 or some other Chinese code page - that depends on what you expect to receive). To store these, you can use a WideString, which stores UTF-16 directly.
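The "interpret each string in every possible way" idea can be sketched in Python, where bytes.decode plays the role of MultiByteToWideChar. The codec choices are assumptions: in Delphi 2007 you would pass the matching Windows code page (1252, 936 for GBK, which is a superset of GB2312, 950 for Big5, or CP_UTF8), while UTF-16 data needs no conversion and can be copied straight into a WideString. The helper name is hypothetical.

```python
# Each candidate interpretation: label -> Python codec name.
CANDIDATES = {
    "Windows-1252": "cp1252",
    "GB2312": "gb2312",
    "Big5": "big5",
    "UTF-8": "utf-8",
    "UTF-16LE": "utf-16-le",
}

def interpretations(data):
    """Decode the same bytes every candidate way, reporting failures."""
    results = {}
    for label, codec in CANDIDATES.items():
        try:
            results[label] = data.decode(codec)
        except UnicodeDecodeError as exc:
            results[label] = f"<invalid: {exc.reason}>"
    return results

results = interpretations("café".encode("utf-8"))
# results["UTF-8"] == "café"; results["Windows-1252"] == "cafÃ©"
```

Showing all interpretations side by side lets the user spot the readable one; invalid decodings are reported instead of raising.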
Once you have that, you have to draw the text somehow, and that requires you to either get a Unicode-capable control (a label is likely sufficient), write one, or call the appropriate Windows API function directly to draw. That's where it can get a bit messy, because there are several functions for doing that. TextOutW is probably the simplest choice here, but another option would be DrawText. Make sure you explicitly call the W version of these functions in order to work with Unicode. (See also the related question How do I draw Unicode text?).
Take note: Due to CJK unification - the encoding of equivalent Chinese Hanzi, Japanese Kanji, and Korean Hanja characters at the same code points in Unicode - you need to pick a font that matches the expected kind of Chinese, traditional or simplified, in order to get expected rendering. To quote a somewhat related post by Michael Kaplan:
What it comes down to is that there are many characters which can have
four different possible looks:
Japanese will default to using MS UI Gothic (fallback to PMingLIU, then SimSun, then Gulim)
Korean will default to using Gulim (fallback to PMingLiu, then MS UI Gothic, then SimSun)
Simplified Chinese will default to using SimSun (fallback to PMingLiu, then MS UI Gothic, then Batang)
Traditional Chinese will default to using PMingLiu (fallback to SimSun, then MS Mincho, then Batang)
Unless you have a specific font you want/need to use, pick the first font in the list for the language variant you want to use, since these are standard fonts (on XP, you will need to enable East Asian language support before they are available; on Vista and above, they are always included). If you do not do this, then Windows may either not render the characters at all (showing the missing character glyph instead), or it may use an inappropriate fallback (e.g. PMingLiu for Simplified Chinese). The exact behavior depends on the API function you use to render the text.
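The advice to pick the first font in the list can be expressed as a small lookup. This hypothetical Python helper encodes the chains from the quote above and returns the first installed font, flagging when it had to fall back; the variant labels are arbitrary, and in Delphi you would check availability against Screen.Fonts.

```python
# Default font followed by fallbacks, per language variant (from the quote).
FONT_CHAINS = {
    "Japanese": ["MS UI Gothic", "PMingLiU", "SimSun", "Gulim"],
    "Korean": ["Gulim", "PMingLiU", "MS UI Gothic", "SimSun"],
    "Simplified Chinese": ["SimSun", "PMingLiU", "MS UI Gothic", "Batang"],
    "Traditional Chinese": ["PMingLiU", "SimSun", "MS Mincho", "Batang"],
}

def pick_font(variant, installed):
    """Return (font, is_default) for the first installed font in the chain,
    or (None, False) if none of them is installed."""
    chain = FONT_CHAINS.get(variant, [])
    for i, font in enumerate(chain):
        if font in installed:
            return font, i == 0
    return None, False
```

An `is_default` of False marks the inappropriate-fallback case described above (such as PMingLiU for Simplified Chinese), which you may prefer to report rather than accept silently.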

Finding System Fonts with Delphi

What is the best way to find all the system fonts a user has available so they can be displayed in a dropdown selection box?
I would also like to distinguish between Unicode and non-Unicode fonts.
I am using Delphi 2009 which is fully Unicode enabled, and would like a Delphi solution.
The Screen.Fonts property is populated via the EnumFontFamiliesEx API function. Look in Forms.pas for an example of calling that function.
The callback function that it calls will receive a TNewTextMetricEx record, and one of the members of that record is a TFontSignature. The fsUsb field indicates which Unicode subranges the font claims to support.
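The fsUsb member (reached through the ntmFontSig field of the record) is four 32-bit DWORDs forming one 128-bit bitfield. Testing a bit is a shift and mask, sketched here in Python. The bit assignments listed mirror the OpenType OS/2 ulUnicodeRange table as I understand it; treat them as an assumption and check Microsoft's Unicode subset bitfield table before relying on them. The helper name and sample signature are hypothetical.

```python
# A few bit positions of the 128-bit Unicode subset bitfield (fsUsb).
USB_BITS = {
    0: "Basic Latin",
    1: "Latin-1 Supplement",
    2: "Latin Extended-A",
    49: "Hiragana",
    50: "Katakana",
    59: "CJK Unified Ideographs",
}

def claimed_ranges(fs_usb):
    """fs_usb: the four 32-bit DWORDs of FONTSIGNATURE.fsUsb.
    Bit n is stored in fs_usb[n // 32] at position n % 32."""
    return [name for bit, name in sorted(USB_BITS.items())
            if fs_usb[bit // 32] & (1 << (bit % 32))]

# Hypothetical signature claiming Basic Latin, Latin-1, Hiragana and Katakana.
sig = [0b11, (1 << 17) | (1 << 18), 0, 0]
print(claimed_ranges(sig))
# → ['Basic Latin', 'Latin-1 Supplement', 'Hiragana', 'Katakana']
```

These bits are only the font's claims; a set bit does not guarantee that every character in the range has a glyph.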
The system doesn't actually have "Unicode fonts." Even the fonts that have the word Unicode in their names don't have glyphs for all Unicode characters. You can distinguish between bitmap, printer, and TrueType fonts, but beyond that, the best you can do is to figure out whether the font you're considering supports the characters you want. And if the font isn't what you'd consider a "Unicode font," but it supports all the characters you need, then what difference does it make? To get this information, you may be interested in GetFontUnicodeRanges.
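GetFontUnicodeRanges fills a GLYPHSET whose ranges array holds WCRANGE entries (a starting character wcLow plus a count cGlyphs). Once those pairs are copied out, checking a string is a range lookup. A Python sketch with a hypothetical helper: because WCRANGE uses 16-bit WCHARs, it can only describe the Basic Multilingual Plane, so the sketch conservatively treats anything above U+FFFF as unsupported.

```python
from bisect import bisect_right

def covers(ranges, text):
    """ranges: (wcLow, cGlyphs) pairs copied from GLYPHSET.ranges.
    Each pair covers wcLow .. wcLow + cGlyphs - 1."""
    ranges = sorted(ranges)
    lows = [low for low, _ in ranges]
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:                  # not describable by WCRANGE
            return False
        k = bisect_right(lows, cp) - 1   # last range starting at or below cp
        if k < 0 or cp >= ranges[k][0] + ranges[k][1]:
            return False
    return True

# Hypothetical font: printable ASCII plus Hiragana U+3041..U+3096.
ranges = [(0x20, 0x5F), (0x3041, 0x56)]
```

This answers the practical question ("does this font have every character I need?") directly, regardless of whether the font counts as a Unicode font.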
The Microsoft technology for displaying text with different fonts based on which fonts contain which characters is Uniscribe, particularly font fallback. I'm not aware of any Delphi support for Uniscribe; I started writing a set of import units for it once, but my interests are fickle, and I moved on to something else before I completed it. Michael Kaplan's blog talks about Uniscribe sometimes, so that's another place to look.
I can answer half your question: you can get a list of the fonts that your current environment has access to as a string list from the global Screen object,
i.e.
Listbox1.Items.AddStrings(Screen.Fonts);
You can look in the Forms.pas source to see how CodeGear fills Screen.Fonts by enumerating the Windows fonts. The returned LOGFONT structure has a charset member, but this does not provide a simple 'Unicode' determination.
As far as I know Windows cannot tell you explicitly if a font is 'Unicode'. Moreover if you try to display Unicode text in a 'non-Unicode' font Windows may substitute a different font, so it is difficult to say whether a font will or will not display Unicode. For example I have an ancient Arial Black font file which contains no Unicode glyphs, but if I use this to display Japanese text in a D2009 memo, the Japanese shows up correctly in Arial and the rest in Arial Black. In other examples, the usual empty squares may show up.
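The Arial Black example, where each character ends up drawn with a font that actually has it, can be modelled as splitting the text into per-font runs. This Python sketch only illustrates the idea; it is not how Windows implements font linking or Uniscribe fallback, and the font names and coverage sets are hypothetical.

```python
def assign_fonts(text, preferred, fallbacks, coverage):
    """Split text into (font, run) pairs. Each character uses the preferred
    font if its coverage set contains it, else the first fallback that does;
    None marks characters no font covers (drawn as empty squares)."""
    chain = [preferred, *fallbacks]
    runs = []
    for ch in text:
        font = next((f for f in chain if ord(ch) in coverage.get(f, ())), None)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)   # extend the current run
        else:
            runs.append((font, ch))               # start a new run
    return runs

# Hypothetical coverage: a Latin-only display font and a Japanese fallback.
coverage = {
    "LatinDisplay": set(range(0x20, 0x7F)),
    "JapaneseFallback": {ord("日"), ord("本")},
}
runs = assign_fonts("Hi 日本", "LatinDisplay", ["JapaneseFallback"], coverage)
# runs == [("LatinDisplay", "Hi "), ("JapaneseFallback", "日本")]
```

This is also why "is this font Unicode?" is hard to answer from the outside: the visible result depends on the fallback chain as much as on the selected font.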

Resources