I have a Unicode string that could contain characters from a right-to-left language such as Arabic or Hebrew, but could also contain text from left-to-right languages. I need to know at which end the string starts and in which direction to step when walking through it from beginning to end, depending on which language it contains. Is there a standard way of dealing with this?
TMemo appears to handle this in the way I want. When I paste some Hebrew text into a TMemo, the caret moves in the opposite direction to the arrow key I press. I can even have a mixture of English and Hebrew text in the same memo, and the direction the caret moves depends on whether it is within an English or a Hebrew section of text. I'd like to replicate this behaviour. I looked into the Delphi source, including FMX.Memo and FMX.Text, but couldn't find the code responsible. I have a feeling that the code for handling this may be hidden in a DLL. I could write code myself that contains a list of all possible right-to-left Unicode characters to test whether a character is RTL or LTR, but I'd like to make use of code that already exists if possible. Can anyone point me in the right direction?
I do know about the Unicode right-to-left mark (RLM), an invisible character used to mark a section of RTL text, but I don't think TMemo relies on it. The Hebrew text I'm pasting into the TMemo doesn't contain this or any other invisible character.
Is it possible to enter special Unicode characters, like the ones below, from the keyboard?
U+2603 ☃ SNOWMAN
U+2604 ☄ COMET
U+2605 ★ BLACK STAR
U+2606 ☆ WHITE STAR
U+2607 ☇ LIGHTNING
U+2608 ☈ THUNDERSTORM
U+2609 ☉ SUN
U+260A ☊ ASCENDING NODE
U+260B ☋ DESCENDING NODE
I would like for example to have buttons with up/down arrows in them, without loading images.
I have tried entering Alt+08593 on the keyboard, but a different character (not the expected arrow) is inserted.
Update:
The reason for this is LAZINESS. I am too lazy to search for icons or create my own icons. For example you can simply replace the notorious 'save' floppy disk icon. Just take a look at: 💾. BAM! Nice. Right?
Update:
It seems some characters, such as 📗 (green book, code point 128215), are not accepted by Delphi when copied and pasted.
Update:
There is a nice component that allows you to put Unicode characters in an image list:
https://github.com/EtheaDev/IconFontsImageList
The Delphi IDE won't accept ALT key codes that high. A couple of alternatives:
Paste the text from somewhere else.
Enter the numeric code directly in the .dfm file.
As an example of the second approach, try this in your .dfm file for the button caption property:
Caption = #8592#8593#8594#8595
You also mention Green Book U+1F4D7. That is from outside the BMP, and hence encoded with a surrogate pair:
Caption = #55357#56535
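The two numbers above follow from the standard UTF-16 surrogate arithmetic. As a minimal sketch (in Python rather than Delphi, and with a helper name, to_surrogate_pair, that is purely illustrative), the split can be checked like this:

```python
def to_surrogate_pair(code_point):
    """Split a code point outside the BMP into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = code_point - 0x10000          # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)         # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits go into the low surrogate
    return high, low


# Green Book, U+1F4D7 (decimal 128215)
high, low = to_surrogate_pair(0x1F4D7)
print(f"#{high}#{low}")
```

For U+1F4D7 this yields the same pair as the caption line above.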
My guess is that as soon as you want your glyphs to be shown in colour, or at a different size, you will find that using text makes this impossible. You are also likely to encounter fonts that don't contain glyphs for the characters you select. So you will find that using images is the most robust approach.
Or, alternatively, if you had a table of the decimal values:
9731 ☃ SNOWMAN
9732 ☄ COMET
9733 ★ BLACK STAR
9734 ☆ WHITE STAR
9735 ☇ LIGHTNING
9736 ☈ THUNDERSTORM
9737 ☉ SUN
9738 ☊ ASCENDING NODE
9739 ☋ DESCENDING NODE
then you can use the keyboard as follows in Delphi.
To change the caption of Button1 to be the snowman:
Press Alt+F12 to edit the form as text
Press Ctrl+E to enter incremental search mode
Type Button1, or as much of it as is required to locate the definition of Button1
To the right of the Caption = property definition (I'm assuming VCL here) enter # followed by the relevant Unicode value, e.g. #9731
Caption = #9731
If you want text as well as the snowman, the character code goes outside quotes, so e.g.
Caption = 'Snowman = '#9731
More info on the # syntax (which is more commonly entered in Delphi source, rather than in the text view of form files) can be found by reading about control strings, as they are actually called, in the online documentation.
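If you would rather not look up the decimal values by hand, a small script can build the literal for you. This is only a sketch in Python, not part of Delphi: to_control_string is a hypothetical helper, and it assumes that printable ASCII stays inside quotes while everything else is written as # followed by the decimal UTF-16 code unit, as in the examples above.

```python
def to_control_string(text):
    """Build a 'quoted'#nnnn style literal from a Python string (hypothetical helper)."""
    parts = []    # finished pieces of the literal
    quoted = ""   # run of printable ASCII currently being collected
    for ch in text:
        if 32 <= ord(ch) < 127:
            # a single quote inside a quoted run is written as two quotes
            quoted += "''" if ch == "'" else ch
            continue
        if quoted:
            parts.append("'" + quoted + "'")
            quoted = ""
        # encode as UTF-16 so characters outside the BMP become two code units
        data = ch.encode("utf-16-be")
        for i in range(0, len(data), 2):
            parts.append("#%d" % int.from_bytes(data[i:i + 2], "big"))
    if quoted:
        parts.append("'" + quoted + "'")
    return "".join(parts) or "''"


print("Caption = " + to_control_string("Snowman = \u2603"))
```

The printed line can then be pasted into the text view of the form.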
I have a UILabel that is supposed to be two lines long. The text is localized into French and English.
I'm setting the attributedText property of the label. The text is constructed from three appended strings, say textFragment1, textFragment2 and textFragment3. The middle string is actually an image created with an NSTextAttachment.
I want to make sure that textFragment2 and textFragment3 are on the same line. The first string may or may not wrap depending on how long it is. The problem is that my French text is currently fairly short, so the line is wrapping after textFragment2.
I have fixed this temporarily by adding a line break symbol in the localized French text for textFragment1. I don't really love this solution though. I was wondering if there is a way to treat textFragment2 and textFragment3 so that they will always be together on the same line.
You could use a non-breaking space (\u00a0) to join textFragment2 and textFragment3. This character looks just like a normal space—i.e. it results in the same amount of whitespace—but line breaking will not take place on either side of it.
You could also use a word joiner (\u2060). This character does not produce any whitespace, but it still prevents line breaking on either side. This is what you want if you don’t want any space between textFragment2 and textFragment3 but you still want to prevent line breaking there. (It’s also useful if you have a word with a hyphen in the middle of it but you want to prevent the line from being broken after the hyphen.) Note that the similarly named zero-width space (\u200b) does the opposite: it marks a place where a line break is allowed.
You can read more about these kinds of characters on Wikipedia.
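To double-check which character is which, the Unicode names can be looked up programmatically. The sketch below uses Python's standard unicodedata module; glue is a hypothetical helper that joins plain strings, whereas in the UILabel case the fragments would be attributed strings.

```python
import unicodedata

NO_BREAK_SPACE = "\u00a0"   # visible space, no line break allowed around it
WORD_JOINER = "\u2060"      # zero width, no line break allowed around it


def glue(left, right, visible_space=True):
    """Join two fragments with a character that forbids a line break between them."""
    joiner = NO_BREAK_SPACE if visible_space else WORD_JOINER
    return left + joiner + right


for ch in (NO_BREAK_SPACE, WORD_JOINER, "\u200b"):
    # prints the code point next to its official Unicode name
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```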
Question
Is there a definitive way to detect if the text direction in NSString or NSAttributedString is right-to-left? Text is received from external resource, not entered in the app (can't use UITextInput protocol methods).
Bad solution
I have been detecting it using CFStringTokenizerCopyBestStringLanguage() and then +[NSLocale characterDirectionForLanguage:], but this is pretty unreliable for strings that contain only a few Arabic characters and many Latin ones. They are processed as RTL when displayed in the label, but the incorrect direction detection leads to inconsistent behaviour.
Investigation
During development, I have found out that when creating NSAttributedString with no attributes except for the font and displaying it using TTTAttributedLabel, RTL text aligns to the right edge. When using simple NSString, or when using standard UILabel with attributed string, alignment stays left.
So, there has to be something in the text that says "it is RTL" and there is some method to detect it.
I analysed what I am receiving from the server, and there were no standard Unicode bidi control characters. This is how it looks. The JSON is sent as ASCII with all Unicode characters escaped:
"\u0643\u062a\u0628\u062a \u0648\u0648\u062c\u0650 .."
When unescaped, it looks like this, with \u0643 on the far right before the dots:
"كتبت ووجِ .."
As far as I can tell, every code point represents a single Arabic character, and there is no U+202B or similar control character that could be used to easily detect the direction.
Yet, when rendering this text through TTTAttributedLabel, it aligns right. I started looking into the source code, and it doesn't do anything to detect the direction or set the alignment. It only creates the framesetter using CTFramesetterCreateWithAttributedString(), and by the time it gets the lines from it using CTFrameGetLines() and the line positions with CTFrameGetLineOrigins(), they are already aligned to the right.
So, does anyone know if direction can be detected solely from the text with publicly available (and preferably fast) API methods?
Check this link, which should help: "Automatically align text in UILabel according to text language".
Change these lines to choose between right and left alignment:
if (isCodePointStrongRTL(utf32chars[i]))
return 2;
if (isCodePointStrongLTR(utf32chars[i]))
return 0;
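The helpers isCodePointStrongRTL and isCodePointStrongLTR are not shown here, but the idea of returning on the first strongly directional character can be sketched with Python's standard unicodedata module. This is an illustration only, not the code from the linked answer: base_direction and its "rtl"/"ltr" return values are assumptions, and the sketch ignores explicit embedding and isolate controls.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}   # strong right-to-left: Hebrew (R), Arabic letters (AL)
LTR_CLASSES = {"L"}         # strong left-to-right


def base_direction(text, default="ltr"):
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            return "rtl"
        if bidi_class in LTR_CLASSES:
            return "ltr"
    # only digits, spaces, punctuation and other weak or neutral characters
    return default


sample = "\u0643\u062a\u0628\u062a \u0648\u0648\u062c\u0650 .."
print(base_direction(sample))
```

Spaces, digits and punctuation are skipped because they carry no strong direction of their own, which is why a string needs no control characters to be recognised as RTL.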
I saw this comment on another question and thought it was important for others to check out. It was exactly what I was looking for regarding RTL/LTR detection:
"
but I've been using a solution to explicitly set the direction based on known RTL languages, which used this as a starting point:
https://stackoverflow.com/a/16309559
"
I have an application that reads some strings and displays them in some UILabels, but these strings are not from one language so I need to be able to set the alignment of the label to right if the string is from a RTL language, and to left otherwise.
So is there a direct way to accomplish that, and if not what's the best way to get the language's direction from a string.
I'm thinking of checking the first X characters of the string and setting the label's alignment according to whether LTR or RTL characters make up the larger share. Is this a good solution, and if so, where can I find a table showing which Unicode code points are LTR characters and which are RTL?
In programs with appropriate RTL support, you do not need to set the direction for any simple text like just words and spaces. Programs are expected to apply the properties of the characters themselves, as defined in character code standards.
If a piece of text to be rendered as a unit contains punctuation characters and/or a mix of LTR and RTL letters, the direction should be set either with control characters or at a higher protocol level. I don’t think you should do heuristics on this; rather, the overall direction should be a property of the text item, decided by the information provider (e.g., author, translator) and stored along with the text. In software localization, a set of localized values for texts normally has one language and the overall direction should be stored as part of the data set or inferred from the language.
Note that writing direction should also affect the overall layout direction (e.g., table columns should run from right to left for RTL languages) and the text alignment.
What was the original historical use of the vertical tab character (\v in the C language, ASCII 11)?
Did it ever have a key on a keyboard? How did someone generate it?
Is there any language or system still in use today where the vertical tab character does something interesting and useful?
Vertical tab was used to speed up printer vertical movement. Some printers used special tab belts with various tab spots. This helped align content on forms. VT to header space, fill in header, VT to body area, fill in lines, VT to form footer. Generally it was coded in the program as a character constant. From the keyboard, it would be CTRL-K.
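The CTRL-K mapping follows the usual ASCII rule that a control key combination keeps only the low five bits of the letter's code. A minimal sketch in Python (control_code is just an illustrative name):

```python
def control_code(letter):
    """ASCII code produced by CTRL plus the given letter (low five bits of the letter)."""
    return ord(letter.upper()) & 0x1F


# K is 0x4B, so CTRL-K is 0x0B, the vertical tab
print(control_code("K"), control_code("K") == ord("\v"))
```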
I don't believe anyone would have a reason to use it any more. Most forms are generated in a printer control language like PostScript.
@Talvi Wilson noted that it can be used in Python as '\v'.
print("hello\vworld")
Output:
hello
     world
The output above suggests that the default vertical tab size is one line. I tested Perl's "\013" and got the same output. This could be used to do a line feed without a carriage return on devices that convert linefeed to carriage return + linefeed.
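What actually appears on screen depends on the terminal, so the sketch below does not rely on one. It simulates the "line feed without carriage return" behaviour described above with a hypothetical helper, render_vertical_tabs, and also shows that Python's own str.splitlines() treats VT as a line boundary.

```python
def render_vertical_tabs(text):
    """Simulate a device that moves down one line on VT but keeps the column."""
    out = []
    column = 0
    for ch in text:
        if ch == "\v":
            out.append("\n" + " " * column)   # new line, same column
        elif ch == "\n":
            out.append(ch)
            column = 0                        # a real newline resets the column
        else:
            out.append(ch)
            column += 1
    return "".join(out)


print(render_vertical_tabs("hello\vworld"))
print("hello\vworld".splitlines())   # VT counts as a line boundary here
print("\v" == "\x0b" == "\013")      # three spellings of the same character
```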
Microsoft Word uses VT as a line separator in order to distinguish it from the normal new line function, which is used as a paragraph separator.
In the medical industry, VT is used as the start of frame character in the MLLP/LLP/HLLP protocols that are used to frame HL-7 data, which has been a standard for medical exchange since the late 80s and is still in wide use.
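As a rough illustration of that framing, MLLP wraps each message between a VT start byte and an FS (0x1C) plus CR (0x0D) trailer. The Python helpers below, mllp_wrap and mllp_unwrap, are hypothetical names for a sketch; it leaves out acknowledgements, encodings and socket handling, and the sample payload is made up.

```python
START_BLOCK = b"\x0b"      # VT marks the start of a frame
END_BLOCK = b"\x1c\x0d"    # FS followed by CR marks the end


def mllp_wrap(payload):
    """Put an HL7 message (bytes) inside an MLLP frame."""
    return START_BLOCK + payload + END_BLOCK


def mllp_unwrap(frame):
    """Strip the MLLP framing bytes and return the payload."""
    if not (frame.startswith(START_BLOCK) and frame.endswith(END_BLOCK)):
        raise ValueError("not a complete MLLP frame")
    return frame[len(START_BLOCK):-len(END_BLOCK)]


message = b"MSH|^~\\&|SENDER|RECEIVER\r"   # made-up message fragment
frame = mllp_wrap(message)
print(frame[:1] == b"\v", mllp_unwrap(frame) == message)
```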
It was used during the typewriter era to move down a page to the next vertical stop, typically spaced 6 lines apart (much the same way horizontal tabs move along a line by 8 characters).
In modern day settings, the vt is of very little, if any, significance.
The ASCII vertical tab (\x0B) is still used in some databases and file formats as a new line within a field. For example:
In the .mer file format to allow new lines within a data field,
FileMaker databases can use vertical tabs as a linefeed (see https://support.microsoft.com/en-gb/kb/59096).
I have found that the VT character is used in pptx text boxes at the end of each line shown in the box, in order to fit the text to the size of the box.
It seems to be automatically generated by powerpoint (not introduced by the user) in order to move the text to the next line and fix the complete text block to the text box. In the example below, in the position of §:
"This is a text §
inside a text box"
A vertical tab was the opposite of a line feed i.e. it went upwards by one line. It had nothing to do with tab positions. If you want to prove this, try it on an RS232 terminal.
Similar to R0byn's experience, I was experimenting with a PowerPoint slide presentation and dumped out the main body of text on the slide, finding that in all the places where one would typically find carriage return (ASCII 13/0x0d/^M) or line feed/new line (ASCII 10/0x0a/^J) characters, it uses vertical tab (ASCII 11/0x0b/^K) instead, presumably for the exact reason that dan04 described above for Word: to serve as a "newline" while staying within the same paragraph. Good question, though, as I totally thought this character would be as useless as a teletype terminal today.
I believe it's still being used, though I'm not sure exactly where. There might even be a key combination for it.
As English is written Left to Right, Arabic Right to Left, there are languages in world that are also written top to bottom. In that case a vertical tab might be useful same as the horizontal tab is used for English text.
I tried searching, but couldn't find anything useful yet.