Seemingly unsupported unicode character in UILabel - ios

I have a UILabel that displays a string coming in from a web service. It seems to be properly displaying some unicode characters, but not all. The string comes from the web service in a JSON object as follows:
"\u2b51 \u2605 Special Chars"
In the UILabel, only one of the two stars renders as expected (the original screenshot is not reproduced here). Clearly, it's displaying the \u2605 character just fine but not the \u2b51 character. The font is Helvetica Neue, the system font.
Am I doing something wrong or is this a bug in iOS and/or the font?

This seems to be purely a font issue. The character U+2605 BLACK STAR “★” is relatively common in fonts, so it is probably taken from a system font or a fallback font. The character U+2B51 BLACK SMALL STAR “⭑” is relatively rare; it was added in Unicode 5.1, i.e. rather recently (in the character code world, that is). According to Fileformat.info data, it appears in Code2000, FreeSerif, GNU Unifont, Quivira, STIX, STIXMath, and Symbola. Not much; most computers have none of them (though many Linux systems probably have FreeSerif). Well, it seems that you can add Asana Math and Universalia to the list; still rather limited.
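If you want to verify coverage yourself rather than rely on the Fileformat.info listings, checking a font file's character map is enough. The sketch below is my own illustration, not part of this answer; it uses the third-party Python fontTools library, and "SomeFont.ttf" is a placeholder path.

from fontTools.ttLib import TTFont

def font_covers(path, codepoint):
    """Return True if the font file maps the given Unicode codepoint."""
    font = TTFont(path)
    # getBestCmap() returns the best available Unicode cmap subtable as a
    # dict keyed by codepoint (or None if the font has no Unicode cmap).
    cmap = font["cmap"].getBestCmap() or {}
    return codepoint in cmap

print(font_covers("SomeFont.ttf", 0x2605))  # BLACK STAR (widely covered)
print(font_covers("SomeFont.ttf", 0x2B51))  # BLACK SMALL STAR (rarely covered)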

Related

TCPDF wrong Arabic Character Display with Sakkal Majalla Font

I have been using TCPDF for many years. Recently I had to work on Arabic-language display. The client wanted the Sakkal Majalla font (available in the Windows fonts folder) and I converted it using the TCPDF font conversion tool. The conversion process completed without error.
Now I am facing a small issue that I have not been able to solve for the last two months. One of the special characters (called tanween) is placed at the bottom of the preceding character whereas it should be on top. Everything else is working fine, but this little mark ( ٍ ) displayed in the wrong place changes the meaning of the word.
يمنع استخدام الهاتف الجوال داخل صالة الاختبار
منعاً باتاً
(I cannot upload an image as I need 10 reputation points for that, but please notice the little mark on top of the letter in تاً. Here it displays properly, but in the PDF it appears at the bottom of the letter.)
Is there any way to manually adjust the positioning of this character?
I have been searching for a solution for the last two months. I even wrote two emails to the author of TCPDF, Nicola, but received no response.
Please help.
Even though the font conversion process appeared to complete successfully, you should double-check with a font editor (such as FontForge) that the character is actually encoded correctly in the converted font file.
I have found, after many years of trying to convert all sorts of non-Latin fonts from one format to another, that the most reliable solution for font conversion is this site:
http://www.xml-convert.com/en/convert-tff-font-to-afm-pfa-fpdf-tcpdf
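As a concrete version of the FontForge-style check suggested above, the sketch below (my own illustration, not part of this answer) uses the third-party Python fontTools library to inspect the original TTF before conversion; "SakkalMajalla.ttf" is a placeholder path. It confirms that the tanween marks are mapped at all and that the font carries GPOS mark-positioning data, which is what normally determines where such marks are drawn.

from fontTools.ttLib import TTFont

font = TTFont("SakkalMajalla.ttf")
cmap = font["cmap"].getBestCmap() or {}

# U+064B FATHATAN and U+064D KASRATAN are the tanween marks most likely involved.
for cp in (0x064B, 0x064D):
    print(f"U+{cp:04X} mapped: {cp in cmap}")

# Mark placement normally comes from the GPOS 'mark'/'mkmk' features; if the
# conversion dropped this data, a renderer may fall back to default positions.
gpos = font["GPOS"].table if "GPOS" in font else None
if gpos is not None and gpos.FeatureList is not None:
    tags = sorted({rec.FeatureTag for rec in gpos.FeatureList.FeatureRecord})
    print("GPOS features:", tags)
else:
    print("No GPOS mark-positioning data found")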

iOS Font Not Properly Displaying All Unicode Characters

Two example characters: U+22FF, U+23BA... and many others.
Is this an encoding layer that I'm misunderstanding on iOS? Like, at some point it can no longer properly display codes beyond 22B...?
I'm capturing this in an NSString, and trying to simply update a text field. Something like
NSString *test = @"\u23ba";
[displayText setText:test];
This displays the standard missing-glyph indicator: a box with a question mark in it, or just an empty box (depending on the font).
Is there a way to expand the Unicode options for iOS? These characters can be displayed on my Mac. Or is my only option some variant of the NSAttributedString route?
U+22FF and U+23BA are valid codepoints (assigned to characters). But they are supported by a few fonts only. So you should first check which font(s) are being used, or available.
For example, U+22FF is included in Asana-Math, Cambria, Cambria Math, Code2000, DejaVu Sans (oddly, only Bold Oblique typefaces), FreeSerif, GNU Unifont, Quivira, Segoe UI Symbol, STIXMath, STIX, Sun-ExtA, Symbola, XITS, XITSMath. U+23BA is included in Cambria, Cambria Math, Code2000, FreeMono, FreeSerif, GNU Unifont, Quivira, Segoe UI Symbol, Sun-ExtA, Symbola. Many of these are free fonts. Typographic quality varies a lot. The Cambria fonts and Segoe UI Symbol are commercial, shipped with some Microsoft products. There are probably some other fonts that cover those characters, but not many (Everson Mono, I suppose, though I don't currently have it to check).
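To find out which fonts you actually have that cover these code points, a short scan with the third-party Python fontTools library will do. This is a hedged sketch of mine, not part of the answer; "fonts/" is a placeholder directory (on a Mac you might point it at a copy of the system fonts, skipping .ttc collections for simplicity).

import glob
from fontTools.ttLib import TTFont, TTLibError

WANTED = (0x22FF, 0x23BA)

for path in glob.glob("fonts/*.[ot]tf"):
    try:
        font = TTFont(path, lazy=True)
        cmap = font["cmap"].getBestCmap() or {}
    except (TTLibError, KeyError):
        continue  # unreadable file, or no usable Unicode cmap
    covered = [f"U+{cp:04X}" for cp in WANTED if cp in cmap]
    if covered:
        print(path, "->", ", ".join(covered))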

Win 7 and Win 2003 Server display ASCII value differently

In our Visual Basic 6.0 program we use the function Chr(11), append it to a string, and display the result in a text box.
In Windows 2003 Server, the value in the text box is displayed as a box (for Chr(11)) followed by the string.
In Windows 7, the value in the text box is displayed as ♂ (for Chr(11)) followed by the string.
Can anyone advise why it is behaving like this?
Thanks in advance.
It is probably a difference in fonts.
Even when the same "face name" is used, the actual installed font can differ in terms of things like which glyphs are supported.
Note that your program isn't using ASCII in any sense of the word, but ANSI. The mapping from Unicode in your program to ANSI for display varies with Locale and Charset settings as well. Charset might also be a factor here.
Chr(11) says "take 11 and treat it as an ANSI character in the current codepage, convert that to Unicode, then return it as a Variant String."
Chr$(11) removes some of that overhead by returning a String, and ChrW$(11) is even cleaner, skipping the laundering through ANSI-to-Unicode conversion as well.
Faster yet is to just use the named constant for this character, vbVerticalTab, instead.
But none of that impacts display. It's more a question of avoiding unnecessary overhead.
You're relying on something that isn't reliable, i.e. that non-printable characters will always have a glyph. That "box" symbol you see means there is no glyph available for the character.
Even the Character Map applet doesn't display the glyph mapping for values below 33 (&H21).
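As a small illustration of why no glyph is guaranteed (my addition, not the answerer's), the Python snippet below shows that code 11 maps straight to U+000B, a C0 control character; any visible symbol, such as the ♂ carried over from the old CP437 habit of assigning glyphs to control codes, is purely up to the font and renderer.

import unicodedata

# ANSI -> Unicode, roughly what Chr(11) does (assuming a Western codepage like cp1252)
ch = bytes([11]).decode("cp1252")
print(hex(ord(ch)))              # 0xb
print(unicodedata.category(ch))  # 'Cc': a control character with no standard glyph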

How to correctly display Japanese RTF Fonts

I am working on an application in Delphi 2009 which makes heavy use of RTF, edited using TRichEdit and TLMDRichEdit. Users who entered Japanese text in these RTF controls have been submitting intermittent reports about the Japanese text being displayed as gibberish when reloading the content, both on Win XP and Vista, with Eastern Language Support installed.
Typically, English and Japanese are mixed and are mostly displayed without a problem, for example:
Inventory turns partnerships. 在庫回転率の
(my apologies if the Japanese text is broken incorrectly - I do not speak or read the language).
Quite frequently however, only the Japanese portion of the text will be gibberish, for example:
ŒÉñ?“]-¦Œüã‚Ì·•Ê‰?-vˆö‚ðŽû‰v‚ÉŒø‰?“I‚ÉŒ‹‚т‚¯‚é’mŽ¯‚ª‘÷Ý‚·‚é?(マーケットセクター、
見込み客の優 先順位と彼らに販売する知識)
From extensive online searching, it appears that the problem is a result of the fonts saved as part of the RTF. The fonts present on a Japanese-language version of Windows are not necessarily the same as those on a US English version. It is possible to programmatically replace the fonts in the RTF file, which yields an almost acceptable result, i.e.
-D‚‚スƒIƒyƒŒ[ƒVƒ・“‚ニƒƒWƒXƒeƒBƒbƒN‚フƒpƒtƒH[ƒ}ƒ“ƒX‚-˜‰v‚ノŒ‹‚ム‚ツ‚ッ‚ネ‚「‚±ニ‚ヘ?A‘‚「‚ノ-ウ‘ハ‚ナ‚ ‚驕B‚サ‚‚ヘAl“セ‚オ‚ス・‘P‚フˆロ‚ƒƒXƒN‚ノ‚ウ‚‚キB
However, there are still quite a few "junk" characters in there which are not correctly recognized as Japanese characters. Looking at the raw RTF you'll see the following:
-D\'82\'82\u65405?\'83I\'83y\'83\'8c[\'83V\'83\u12539?\ldblquote\'82\u65414?
Clearly, the Unicode characters are rendered correctly, but, for example, the \'82\'82 pair of characters should be something else? My guess is that it actually represents a double-byte character of some sort, which was for some mysterious reason encoded as two separate characters rather than a single Unicode character.
Is there a generic, (relatively) foolproof way to take RTF containing Eastern languages and reliably display it again?
For completeness' sake, I updated the RTF font table in the following way:
Replaced the font name "?l?r ?o?S?V?b?N;" with "\'82\'6c\'82\'72 \'82\'6f\'83\'53\'83\'56\'83\'62\'83\'4e;"
Updated font names by replacing "\froman\fprq1\fcharset0 " with "\fnil\fprq1\fcharset128 "
Updated font names by replacing "\froman\fprq1\fcharset238 " with "\fnil\fprq1\fcharset128 "
Updated font names by replacing "\froman\fprq1 " with "\fnil\fprq1\fcharset128 "
Replaced the font name "?? ?????;" with "\'82\'6c\'82\'72 \'82\'6f\'83\'53\'83\'56\'83\'62\'83\'4e;"
Update: Updating font names alone won't make a difference. The locale seems to be the big problem. I have seen a few sites discussing ways of converting the display of Japanese RTF to something most readers would handle, but I haven't found a solution yet; see for example:
here and here.
My guess is that changing font names in the RTF has probably made things worse. If a font specified in the RTF is not a Unicode font, then surely the characters due to be rendered in that font will be encoded as Shift-JIS, not as Unicode. And then so will the other characters in the text. So treating the whole thing as Unicode, or appending Unicode text, will cause the corruption you see. You need to establish whether RTF you import is encoded Shift-JIS or Unicode, and also whether the machine you are running on (and therefore D2009 default input format) is Japanese or not. In Japan, if a text file has no Unicode BOM it would usually be Shift-JIS (but not always).
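As a rough illustration of that "establish the encoding first" step (my own sketch, not the answerer's code; "input.txt" is a placeholder name), a BOM check followed by a strict cp932 (Windows Shift-JIS) decode is a common heuristic:

def guess_encoding(raw: bytes) -> str:
    # Check for a Unicode byte-order mark first.
    for bom, name in ((b"\xef\xbb\xbf", "utf-8-sig"),
                      (b"\xff\xfe", "utf-16-le"),
                      (b"\xfe\xff", "utf-16-be")):
        if raw.startswith(bom):
            return name
    # No BOM: on a Japanese system, try Shift-JIS before assuming UTF-8.
    try:
        raw.decode("cp932")
        return "cp932"
    except UnicodeDecodeError:
        return "utf-8"

with open("input.txt", "rb") as f:  # placeholder filename
    print(guess_encoding(f.read()))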
I was seeing something similar, but not with Japanese fonts, just special characters like micro (as in microliters) and superscripts. The problem was that even though the RTF string I was sending to the user from an ASP.NET web page was correct (I could see the encoded RTF stream using Fiddler2), when MS Word actually opened the RTF it added a bunch of garbage escape codes like what I see in your sample.
What I did was to run the entire RTF text through a conversion routine that swapped all characters above ASCII 127 for their Unicode code point equivalents. So I would get something like \uc1\u181? (micro) for the special characters. When I did that, Word was able to open the file with no problem. Ironically, it re-encoded the \uc1\uxxx? sequences back to their RTF-escaped equivalents.
Private Function ConvertRtfToUnicode(ByVal value As String) As String
    Dim ch As Char() = value.ToCharArray()
    Dim c As Char
    Dim sb As New System.Text.StringBuilder()
    Dim code As Integer
    For i As Integer = 0 To ch.Length - 1
        c = ch(i)
        code = Microsoft.VisualBasic.AscW(c)
        If code <= 127 Then
            'No need to replace a typical ASCII code
            sb.Append(c)
        Else
            'MR: Basic idea came from here http://www.eggheadcafe.com/conversation.aspx?messageid=33935981&threadid=33935972
            'Swaps the character for its Unicode decimal code point equivalent
            sb.Append(String.Format("\uc1\u{0:d}?", code))
        End If
    Next
    Return sb.ToString()
End Function
Not sure if that will help your problem, but it's working for me.

Finding System Fonts with Delphi

What is the best way to find all the system fonts a user has available so they can be displayed in a dropdown selection box?
I would also like to distinguish between Unicode and non-Unicode fonts.
I am using Delphi 2009 which is fully Unicode enabled, and would like a Delphi solution.
The Screen.Fonts property is populated via the EnumFontFamiliesEx API function. Look in Forms.pas for an example of calling that function.
The callback function that it calls will receive a TNewTextMetricEx record, and one of the members of that record is a TFontSignature. The fsUsb field indicates which Unicode subranges the font claims to support.
The system doesn't actually have "Unicode fonts." Even the fonts that have the word Unicode in their names don't have glyphs for all Unicode characters. You can distinguish between bitmap, printer, and TrueType fonts, but beyond that, the best you can do is to figure out whether the font you're considering supports the characters you want. And if the font isn't what you'd consider a "Unicode font," but it supports all the characters you need, then what difference does it make? To get this information, you may be interested in GetFontUnicodeRanges.
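As an aside (my addition, not part of the answer): the fsUsb bits mentioned above correspond to the Unicode subrange bits a font declares in its OS/2 table (ulUnicodeRange1..4). If you want to peek at them outside Delphi, the third-party Python fontTools library can read them; "SomeFont.ttf" is a placeholder path.

from fontTools.ttLib import TTFont

font = TTFont("SomeFont.ttf")
os2 = font["OS/2"]
for i, field in enumerate(("ulUnicodeRange1", "ulUnicodeRange2",
                           "ulUnicodeRange3", "ulUnicodeRange4")):
    bits = getattr(os2, field)
    # Each set bit flags one Unicode subrange the font claims to support.
    print(field, "->", [32 * i + b for b in range(32) if bits & (1 << b)])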
The Microsoft technology for displaying text with different fonts based on which fonts contain which characters is Uniscribe, particularly font fallback. I'm not aware of any Delphi support for Uniscribe; I started writing a set of import units for it once, but my interests are fickle, and I moved on to something else before I completed it. Michael Kaplan's blog talks about Uniscribe sometimes, so that's another place to look.
I can answer half your question: you can get a list of the fonts that your current environment has access to, as a string list, from the global Screen object,
i.e.
Listbox1.Items.AddStrings(Screen.Fonts);
You can look in the forms.pas source to see how CodeGear fills Screen.Fonts by enumerating the Windows fonts. The returned LOGFONT structure has a charset member, but this does not provide a simple 'Unicode' determination.
As far as I know Windows cannot tell you explicitly if a font is 'Unicode'. Moreover if you try to display Unicode text in a 'non-Unicode' font Windows may substitute a different font, so it is difficult to say whether a font will or will not display Unicode. For example I have an ancient Arial Black font file which contains no Unicode glyphs, but if I use this to display Japanese text in a D2009 memo, the Japanese shows up correctly in Arial and the rest in Arial Black. In other examples, the usual empty squares may show up.
