How to fully justify texts programmatically (Delphi)? - delphi

How can I fully justify a block of text (like MS Word does, not only on the right and not only on the left but on both sides)?
I want to justify some text (mainly Arabic) to fit a certain screen size (a handheld device's screen, actually, whose text viewer doesn't have this function) and save the text as justified, so I can reload and reuse it elsewhere.
(The problem with MS Word is that if you copy justified text from it and paste it into another editor, the text arrives unjustified.)
Update : for now I'm thinking of doing it like this:
get-a-word
get-word-width
add-word-to-total-Word and add-Word-width-to-total-word-width
check if total-Word-width = myscreen-width then continue
else if total-Word-width is between myscreen-width and (myscreen-width - 3) then
add-spaces-To-total-word until it = myscreen-width
This is what I'm thinking now, but I put this question up and hope to see if there is a better solution, or somebody else already implemented it.
PS: I hope I have made my question clear, and I'm sorry if I expressed anything badly.
edit1 : changed the title to make it more clear.

If you want to justify plain text, you can only add extra spaces to the lines to get them to align on the left and right. Unfortunately, character widths differ between fonts, so doing it this way will only work for one particular font, unless you limit yourself to monospaced fonts, where all characters have the same width.
If you want a result like in Word, adding spaces won't cut it. Word will not add spaces, but stretch and shrink the existing spaces. This information is lost when you copy and paste it into another app.
Either way, justifying is an optimization problem. If you are interested in a good solution and its implementation, have a look at TeX. For an implementation that works on plain text with monospaced fonts, have a look at par.
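For the monospaced plain-text case, the space-padding idea could look roughly like the Python sketch below. It is a greedy, line-by-line approach (not the optimizing approach TeX or par take), and it assumes every character occupies exactly one column, which will not hold for Arabic text that carries combining diacritics. The names justify and justify_line are made up for this illustration.

```python
def justify_line(words, width):
    """Pad the gaps between words so the line is exactly `width` columns wide."""
    if len(words) == 1:
        return words[0]  # a single word has no gap to stretch
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # the first `extra` gaps receive one additional space
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


def justify(text, width):
    """Greedily fill lines with words, then justify every line but the last."""
    lines, current, length = [], [], 0
    for word in text.split():
        # `length` counts letters only; len(current) is the number of
        # single-space gaps needed if `word` were added to the line
        if current and length + len(word) + len(current) > width:
            lines.append(justify_line(current, width))
            current, length = [], 0
        current.append(word)
        length += len(word)
    if current:
        lines.append(" ".join(current))  # the last line stays ragged
    return lines


for line in justify("the quick brown fox jumps over the lazy dog", 16):
    print(line)
```

Because the padding is baked into the text as ordinary spaces, the result survives copy and paste, but only lines up when shown in a monospaced font.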

There are some API calls that may help:
ExtTextOut and GetCharacterPlacement
Look at the GCP_JUSTIFY flag for GetCharacterPlacement
ExtTextOut is used by Canvas.TextRect

The problem you are going to face is always going to be differences in the rendering of the font. Word handles full justification by adjusting kerning as well as adjusting the number of pixels between words by a few (either way). The end result is text that lines up at both margins. This pixel adjustment is done BOTH ways, and as evenly as possible.
To properly handle this in your portable device you will have to also perform the same algorithm for the display of the text there.
If this is not possible, then the ONLY way you can even get somewhat close would be to add whitespace between words.

As has been pointed out in other answers, Word does full justification by stretching the existing spaces, often by very small amounts. This is only possible if you have full control over how your text is drawn on the screen (which Word, or any other Windows program, has).
Your only real option in this regard would be to implement your own text viewer on the platform you are targeting. That is, you would need to draw the text on the screen yourself (any platform that allows games should allow you to draw on the screen). However, this seems like an awful lot of trouble to get justified text.
Sorry couldn't be of more help.

Related

How can I standardize the varying truncating dot characters of UILabel?

I have a plist file which I decode to load data onto my application.
This plist file contains String type values that get mapped to UILabel's text property.
I noticed that the truncating behavior of the text in the label is not always the same.
To be more specific, the three dots that are added when the text is truncated come, contrary to my expectation, in two kinds: one being ... and the other being ⋯, which appears to be a distinct Unicode character.
I checked UILabel's attribute settings but I was unable to find any settings related to this behavior.
Has anyone else experienced this problem and standardized the truncating character to be ...?
Here is the image describing the problem mentioned above. Both labels have 2 lines and have new line escape character inserted between the first line and the second line of text. I am posting a link to this image because apparently I don't have enough reputation to post an image.
varying truncating characters of UILabel
IMO this is a bug in UILabel, and it may be worth opening a Feedback about it.
TL;DR: I recommend using TTTAttributedLabel.
Long-winded answer, because this was such an interesting question:
UILabel uses a different ellipsis based on the language script being truncated. As you've noticed, for most scripts, they use HORIZONTAL ELLIPSIS (…), or something very similar. But for Chinese, Japanese, and Korean (CJK), they use MIDLINE HORIZONTAL ELLIPSIS (⋯), or again, something very similar. The only other exception I've found is Burmese, which uses three circles that I don't recognize.
In my tests, all the following used …: Latin, Cyrillic, Bengali, Arabic, Hebrew, Hindi, Thai, Kannada, Nepali, and Mongolian (I kid. iOS can't lay out Mongolian. Nobody can lay out Mongolian, but it still uses …). UILabel even uses … for Lao, even though I thought ຯ was specifically for that, but I guess eventually everything becomes Latin.
The problem with UILabel being so clever for CJK and Burmese is that it decides what character to use exclusively by looking at the first character being removed. And it thinks SPACE is Latin (or at least not "special").
So what to do? My recommendation is probably to use TTTAttributedLabel, since it lets you configure the truncation character, and more importantly, is open source so you can fix it if it's not working the way you want.
The second option would be to truncate the text by hand using techniques like the one described in How to change truncate characters in UILabel?. There are probably better ways to do it using CTFrameGetVisibleStringRange instead of constantly shrinking the string until it fits, but I don't know if it's worth the effort. (If that path sounds useful, I could probably write up something that does it. It's just probably not worth the trouble.)
And the final option I know is to replace the SPACE character with an "equivalent" CJK character. The closest I've found that works is HANGUL FILLER (U+3164), but I don't like it. It's too wide, and I expect that it will make Korean uncomfortable to read (but I rarely try to read Korean, so I may be wrong here):
With SPACE: 안녕 하세요
With FILLER: 안녕ㅤ하세요
There's also HALFWIDTH HANGUL FILLER (U+FFA0), which is better, but UILabel seems to make it zero width (this may be a font issue, so maybe worth trying):
With SPACE: 안녕 하세요
With HALF: 안녕ᅠ하세요
let string = "안녕 하세요"
let filler = "\u{3164}"
label.text = string.replacingOccurrences(of: " ", with: filler)
OTOH, you may run into the same problem if you use any other non-CJK characters, like Latin punctuation or Arabic numerals. So this solution may not scale. And you should make sure that VoiceOver properly ignores it.

Checking location of word range relative to the page

I am writing a VBA macro that checks that Word documents are formatted correctly to meet certain specifications. One of the things I have to check is the left margin of each line: different paragraphs are supposed to have different first-line indents and hanging indents depending on the context. This should be as simple as checking the style, but unfortunately it is not. Some of the documents use styles to change the indents, but others use manual spaces and tabs to position the text correctly. So I need some way to check the actual physical position of the first character of each paragraph in Document.Paragraphs. I have no problem getting a range with the first visible character in the paragraph, but I'm not sure how to get its distance from the margin (or from the left side of the page; it makes no difference because the margins are consistent).
I found the Window.GetPoint method, but I'm nervous about using it, because it is based on the actual physical location on the screen. This macro is going to be used on different computers, with different versions of Word, and I'm not sure how it is affected by other view settings (like print layout, zoom, etc.). Is there a consistent way to use this method to determine the distance from the margin?
The other method would be (because all of the documents are in Courier New 12) to look at the firstindent property of the style, and then manually count all of the spaces and tabs (but that would need to take tab stops into account). This I'm also not sure how to do.
I would think that there should be a much simpler way of doing this, but I can't find it, so if anyone has any suggestions I would really appreciate any help.
It was there after all! Range.Information(wdHorizontalPositionRelativeToPage)

Character spacing in LaTeX with lstlisting package

I'm trying to get my code snippets to look as good as possible and so far I'm having troubles with the character spacing. Here is an example of the output:
alt text http://grab.by/grabs/2bb230de7c088d007733f52b95a40363.png
While the text in small is perfect, all the keywords that are in capital letters look terrible. Here are the settings I use
\lstset{basicstyle=\footnotesize, basewidth=0.5em}
If I increase the basewidth, the capital letters look good, but I can't get any decent sized line of code in one line. The following example does not fit in a page and I already put two line breaks in:
alt text http://grab.by/grabs/97ec29aa5a6811ce28bcd30bd389b52f.png
Does anyone have a clue how I can get this to work? Using \ttfamily does the trick, however, I'd prefer keeping the font.
Thanks.
If you prioritize looking nice, then using flexible columns is preferable:
\lstset{basicstyle=\footnotesize, columns=fullflexible}
You "obviously" need to scale the capital letters down horizontally. I do not know of a way to do this without actually editing the font itself.
However, you could put the entire listing into a \scalebox resp. \resizebox (from the graphicx package).
On a side note, the font you are using seems a bit strange, though, since the distance between small letters is significantly bigger than that between capital letters.

What is a vertical tab?

What was the original historical use of the vertical tab character (\v in the C language, ASCII 11)?
Did it ever have a key on a keyboard? How did someone generate it?
Is there any language or system still in use today where the vertical tab character does something interesting and useful?
Vertical tab was used to speed up printer vertical movement. Some printers used special tab belts with various tab spots. This helped align content on forms. VT to header space, fill in header, VT to body area, fill in lines, VT to form footer. Generally it was coded in the program as a character constant. From the keyboard, it would be CTRL-K.
I don't believe anyone would have a reason to use it any more. Most forms are generated in a printer control language like PostScript.
@Talvi Wilson noted that it can be used in Python as '\v'.
print("hello\vworld")
Output:
hello
     world
The above output appears to result in the default vertical size being one line. I have tested with Perl "\013" and the same output occurs. This could be used to do a line feed without a carriage return on devices that convert linefeed to carriage return + linefeed.
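The CTRL-K key combination mentioned above follows from how ASCII control codes relate to letters: holding Ctrl keeps only the low five bits of the letter's code. A small Python sketch of that arithmetic (the helper name ctrl is made up for this illustration):

```python
def ctrl(key: str) -> str:
    """Return the ASCII control character produced by Ctrl+<key>."""
    # 'K' is 0x4B; masking with 0x1F leaves 0x0B, the vertical tab
    return chr(ord(key.upper()) & 0x1F)


print(ord(ctrl("K")))  # code of the vertical tab
print(ord(ctrl("J")))  # code of the line feed
print(ord(ctrl("M")))  # code of the carriage return
```

The same rule gives CTRL-I for the horizontal tab (ASCII 9).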
Microsoft Word uses VT as a line separator in order to distinguish it from the normal new line function, which is used as a paragraph separator.
In the medical industry, VT is used as the start of frame character in the MLLP/LLP/HLLP protocols that are used to frame HL-7 data, which has been a standard for medical exchange since the late 80s and is still in wide use.
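To make the framing concrete: in MLLP a message starts with VT (0x0B) and ends with FS (0x1C) followed by CR (0x0D). A minimal Python sketch, with made-up helper names and a made-up placeholder payload rather than a real HL7 message:

```python
VT = b"\x0b"  # start-of-block character
FS = b"\x1c"  # end-of-block character
CR = b"\x0d"  # carriage return that closes the frame


def mllp_frame(message: bytes) -> bytes:
    """Wrap a raw message in MLLP start and end markers."""
    return VT + message + FS + CR


def mllp_unframe(frame: bytes) -> bytes:
    """Strip the MLLP markers, rejecting anything that is not a whole frame."""
    if not (frame.startswith(VT) and frame.endswith(FS + CR)):
        raise ValueError("not a complete MLLP frame")
    return frame[len(VT):-len(FS + CR)]


payload = b"MSH|placeholder"  # hypothetical payload, for illustration only
frame = mllp_frame(payload)
print(frame)
print(mllp_unframe(frame) == payload)
```

A real receiver would also have to buffer partial reads from the socket until the end marker arrives, which this sketch leaves out.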
It was used during the typewriter era to move down a page to the next vertical stop, typically spaced 6 lines apart (much the same way horizontal tabs move along a line by 8 characters).
In modern-day settings, VT is of very little, if any, significance.
The ASCII vertical tab (\x0B) is still used in some databases and file formats as a new line WITHIN a field. For example:
In the .mer file format to allow new lines within a data field,
FileMaker databases can use vertical tabs as a linefeed (see https://support.microsoft.com/en-gb/kb/59096).
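If you have to read such data, converting between in-field vertical tabs and ordinary newlines is a one-liner either way. A rough Python sketch, where the function names and the sample field value are made up for illustration:

```python
VT = "\x0b"  # ASCII 11, the same character as "\v"


def vt_to_newline(field: str) -> str:
    """Turn in-field vertical tabs into ordinary newlines for display."""
    return field.replace(VT, "\n")


def newline_to_vt(field: str) -> str:
    """Turn newlines back into vertical tabs before storing the field."""
    return field.replace("\r\n", "\n").replace("\n", VT)


stored = "Line one\x0bLine two"  # made-up field value
print(vt_to_newline(stored))
# str.splitlines() already treats the vertical tab as a line boundary
print(stored.splitlines())
```

Keeping the record separator (newline) distinct from the in-field separator (VT) is what lets a line-oriented export hold multi-line values.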
I have found that the VT char is used in pptx text boxes at the end of each line shown in the box in order to adjust the text to the size of the box.
It seems to be automatically generated by powerpoint (not introduced by the user) in order to move the text to the next line and fix the complete text block to the text box. In the example below, in the position of §:
"This is a text §
inside a text box"
A vertical tab was the opposite of a line feed i.e. it went upwards by one line. It had nothing to do with tab positions. If you want to prove this, try it on an RS232 terminal.
Similar to R0byn's experience, I was experimenting with a PowerPoint slide presentation and dumped out the main body of text on the slide, finding that in all the places where one would typically find carriage return (ASCII 13/0x0d/^M) or line feed/new line (ASCII 10/0x0a/^J) characters, it uses vertical tab (ASCII 11/0x0b/^K) instead, presumably for the exact reason that dan04 described above for Word: to serve as a "newline" while staying within the same paragraph. Good question though, as I totally thought this character would be as useless as a teletype terminal today.
I believe it's still being used, though I'm not sure exactly where. There might even be a key combination for it.
English is written left to right and Arabic right to left, but there are also languages in the world that are written top to bottom. In that case a vertical tab might be useful in the same way the horizontal tab is used for English text.
I tried searching, but couldn't find anything useful yet.

vertical edge setting

What column setting do you use in the IDE for the vertical edge? I use 80 columns in line mode, but I wanted to know if this is common or if there is a more common standard. I have seen other options like background mode, but found them too distracting.
Vertical Edge, for those who are unfamiliar, is a line or an area which marks off the section where the code can be written. Anything beyond may not format the best way in other code readers or makes code readability tougher. Please correct if my understanding is inaccurate.
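Outside the IDE, the same limit can be checked with a few lines of script. A minimal Python sketch, assuming an edge at column 80 and tabs expanded to 4 columns (both are adjustable assumptions, and the function name is made up):

```python
def lines_past_edge(source: str, edge: int = 80, tab_size: int = 4):
    """Return (line number, width) for every line wider than `edge` columns."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        width = len(line.expandtabs(tab_size))  # count tabs as columns, not as 1 char
        if width > edge:
            hits.append((number, width))
    return hits


sample = "short line\n" + "x" * 81 + "\n" + "y" * 80
print(lines_past_edge(sample))
```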
Widescreen monitors and a preference for a smaller font, so that I get more vertical lines, make 80 a little lacking on the wide side.
I don't have a vertical column setting. Any code lines (usually ifs) that may be too wide, I split at logical operators. For text lines, it's a bit more nebulous where I actually split them, but I split them conservatively.
Note: Your question appears to be the same as: https://stackoverflow.com/questions/903754/do-you-still-limit-line-length-in-code and question 746853 (which I can't hyperlink to as I am a "new user")

Resources