I have a RichEdit control containing lines that use different fonts, styles, languages, etc.
I am drawing in a gutter. I would like to start my drawing at the same y pixel position as the corresponding line.
Send the control an EM_POSFROMCHAR message. It returns the client coordinates of the character at the given index, although the documentation doesn't say what the coordinates represent (upper-left corner, baseline center, or something else). You're looking for the character's baseline.
Use EM_LINEINDEX to get the character index of the first character of a given line, if you don't already know the index of a character you're interested in.
Context:
I'm working on a device which, inserted between an electronic typewriter's controller and its keyboard, turns the typewriter into a serial printer/terminal.
I want it to support some of the control sequences from ECMA-48 / ISO 6429 / ANSI X3.64 (also known as ANSI escape codes).
I'm not sure I understand the standard correctly, so I would like to ask how it is supposed to work.
It's related to the commands SLH - SET LINE HOME and SLL - SET LINE LIMIT.
For example, suppose I have characters 1/12 inch wide, and I want a left margin of 1 inch and 80 columns of text.
Then I would set the line home to 13 and the line limit to 92.
(Since character positions are counted from 1, home is the first text position and limit is the last: 12 margin positions + 1 = 13, and 13 + 80 - 1 = 92.)
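The arithmetic above can be written out as a small helper. This is a sketch under assumptions: the class and method names are hypothetical, and the SLH/SLL encodings (CSI Pn SP U and CSI Pn SP V, with the 7-bit CSI ESC [) are my reading of ECMA-48 and should be checked against your copy of the standard.

```java
// Hypothetical helper: computes SLH/SLL parameters for a left margin and text width.
// Assumes positions count from 1 and SLH = CSI Pn SP U, SLL = CSI Pn SP V (verify in ECMA-48).
final class LineMargins {
    static final String CSI = "\u001B[";  // 7-bit CONTROL SEQUENCE INTRODUCER (ESC [)

    /** Returns {home, limit} in character positions. */
    static int[] homeAndLimit(int charsPerInch, int leftMarginInches, int columns) {
        int marginPositions = charsPerInch * leftMarginInches; // positions taken by the margin
        int home = marginPositions + 1;                         // first text position
        int limit = home + columns - 1;                         // last text position
        return new int[] {home, limit};
    }

    static String slh(int home)  { return CSI + home + " U"; }  // SET LINE HOME
    static String sll(int limit) { return CSI + limit + " V"; } // SET LINE LIMIT
}
```

For the example in the question, `homeAndLimit(12, 1, 80)` gives home 13 and limit 92, and the device would receive `slh(13) + sll(92)`.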
So far, so good.
But once I have set the home, how should the functions:
CHA - CURSOR CHARACTER ABSOLUTE
CUP - CURSOR POSITION
HPA - CHARACTER POSITION ABSOLUTE
CPR - ACTIVE POSITION REPORT
and others related to the cursor position work?
Should they use coordinates relative to the actual edge, or to the home position?
So in my example above, if I wanted to move to column 2 of the text area (home being 13), should I use coordinate 2 or 14?
(similarly for vertical position and page home & limit)
My understanding is that these control sequences still use absolute coordinates,
so in my example I would have to use coordinate 14.
Is this correct?
And if it is correct, this raises some additional problems:
I would have to know where the margins are to know which horizontal and vertical offset to use when moving the cursor to absolute positions.
If a program sets the margins first, there is no problem, but if a program connects to the device and does not change the margins, it does not know the offset.
(There is a way: it could send a carriage return to move to the home position and then request an ACTIVE POSITION REPORT to discover the left margin, but that does not look like a nice solution.)
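The workaround in the last paragraph can be sketched from the host side. This assumes the device implements DEVICE STATUS REPORT with parameter 6 (CSI 6 n), which in ECMA-48 requests a CPR, and that the reply has the form CSI Pn1 ; Pn2 R (line; character). The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the host side of the workaround: CR, then DSR 6 to request a CPR.
// Assumes 7-bit CSI (ESC [) and a device that actually implements CPR.
final class PositionQuery {
    static final String CSI = "\u001B[";

    // CPR reply: CSI Pn1 ; Pn2 R  (Pn1 = line, Pn2 = character position)
    private static final Pattern CPR = Pattern.compile("\u001B\\[(\\d*)(?:;(\\d*))?R");

    /** CR moves to the line home; DSR with parameter 6 then requests a CPR. */
    static String homeQuery() {
        return "\r" + CSI + "6n";
    }

    /** Returns {line, character} from a CPR reply, or null if none is found. */
    static int[] parseCpr(String reply) {
        Matcher m = CPR.matcher(reply);
        if (!m.find()) {
            return null;
        }
        return new int[] {param(m.group(1)), param(m.group(2))};
    }

    // Both CPR parameters default to 1 when omitted.
    private static int param(String digits) {
        return (digits == null || digits.isEmpty()) ? 1 : Integer.parseInt(digits);
    }
}
```

In the example setup, a reply of `ESC [ 1 ; 13 R` after the CR would tell the program that the line home is at position 13.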
What should my device do if it is requested to move to a position outside the defined home and limit?
The standard says that beyond these limits no implicit movement should happen, but this is explicit movement.
If it receives a command to move to position 1 when the home is 13, what should it do? Move to 1? Move to 13? Ignore the command?
If it is at position 15 with the home at 13 and receives a command to move the cursor left by 4 positions, should it move by 4 to 11, move by 2 to 13, or ignore the command?
Another problem I see is that there are commands to set the page home and page limit, but not the total page height.
It is only possible to select predefined formats by PFS - PAGE FORMAT SELECTION.
But I don't see a way to select any other height.
If I want to use continuous paper with 12 inch long pages (72 lines of text at 1/6 inch line height) connected together into a long tape then I see no way to define that height so that my device can correctly keep track of its positions on the following pages. Is there a way to do it?
It looks like I had to find the answers myself.
Question 0:
Yes, it appears that the coordinates should still be absolute.
The standard talks about character positions in a line and line positions in a page; these are defined at the beginning of the document, and nowhere does it say they are relative. It looks like the only roles of the line home and line limit are to mark where CR (and some other controls) returns to, and how far implicit movement (such as advancing after printing a character) can go. The page home and page limit work similarly.
Question 1:
There is no easy way for a program to discover where the home and limit positions are. As I mentioned, requesting an ACTIVE POSITION REPORT can help if it is implemented (my device does not support it yet).
Anyway, many programs don't recognise the concept of line home, and assume that normal character positions start from 1.
My solution to this is that after power on, the line home IS exactly at position 1, and if you want something else, you have to specify it.
This way a program can safely make this assumption.
(However after the PFS - PAGE FORMAT SELECTION command I do set the line home to 1 inch as this is what the standard proposes)
Question 2:
As above, the home and limit are only a margin for implicit movements. So the cursor movement commands will move outside these limits with no problem. Only the actual page size will limit them.
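The behaviour described in questions 0 to 2 can be summarised as a one-line cursor model. This is a sketch, not taken from the standard: the class is hypothetical, explicit moves are clamped to the physical line as an assumption (a device could also ignore out-of-range requests), and what happens when implicit advance reaches the limit is simplified to "stay there".

```java
// Sketch of the semantics described above, for one line only.
// Assumptions (not from the standard): explicit moves are clamped to the
// physical line, and implicit advance simply stops at the line limit.
final class LineCursor {
    private final int lineLength; // physical character positions on the line
    private int home = 1;         // power-on default: home at position 1
    private int limit;
    private int position = 1;

    LineCursor(int lineLength) {
        this.lineLength = lineLength;
        this.limit = lineLength;
    }

    void setHome(int home)   { this.home = clamp(home); }
    void setLimit(int limit) { this.limit = clamp(limit); }

    /** Explicit absolute move (e.g. CHA, HPA): absolute coordinates, home/limit ignored. */
    void moveTo(int column) { position = clamp(column); }

    /** Explicit relative move left (e.g. CUB): may cross the line home. */
    void moveLeft(int n) { position = clamp(position - n); }

    /** CR: return to the line home. */
    void carriageReturn() { position = home; }

    /** Implicit move after imaging a character: never passes the line limit. */
    void advanceAfterCharacter() {
        if (position < limit) {
            position++;
        }
    }

    int position() { return position; }

    private int clamp(int p) { return Math.max(1, Math.min(lineLength, p)); }
}
```

With home 13, a move to position 1 goes to 1, and moving left by 4 from 15 lands on 11, matching the answer above.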
Question 3:
(I didn't give it a number in the question.)
DTA - DIMENSION TEXT AREA is the command for this purpose. It specifies the size of the text area limited by the actual page size, not by the home and limit positions.
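For the 12-inch continuous-paper case, the unit would first be chosen with SSU (SELECT SIZE UNIT) and then the size set with DTA. A sketch under stated assumptions: the encodings (SSU = CSI Ps SP I, DTA = CSI Pn1 ; Pn2 SP T), SSU parameter 4 meaning MIL (1/1000 inch), and the parameter order (Pn1 perpendicular to the line orientation, Pn2 parallel) are my reading of ECMA-48 (5th edition); the 8.5-inch paper width is assumed, and the class name is hypothetical.

```java
// Sketch: select the size unit with SSU, then the text-area size with DTA.
// Encodings and the SSU parameter value are my reading of ECMA-48 (5th ed.);
// verify them against your copy before relying on them.
final class TextAreaSetup {
    static final String CSI = "\u001B[";  // ESC [
    static final int SSU_MIL = 4;         // assumed: SSU parameter 4 = MIL (1/1000 inch)

    /** SSU - SELECT SIZE UNIT: CSI Ps SP I. */
    static String ssu(int unit) {
        return CSI + unit + " I";
    }

    /**
     * DTA - DIMENSION TEXT AREA: CSI Pn1 ; Pn2 SP T.
     * Pn1: extent perpendicular to the line orientation (page length for horizontal lines);
     * Pn2: extent parallel to it (line length). Both are in the unit selected by SSU.
     */
    static String dta(int perpendicular, int parallel) {
        return CSI + perpendicular + ";" + parallel + " T";
    }

    static int inchesToMils(double inches) {
        return (int) Math.round(inches * 1000);
    }
}
```

A 12-inch page on assumed 8.5-inch wide paper would then be `ssu(SSU_MIL) + dta(inchesToMils(12), inchesToMils(8.5))`, i.e. a DTA with parameters 12000;8500.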
I'm using the iOS Vision framework to perform OCR via a VNRecognizeTextRequest call, and I'm trying to locate each individual character in the resulting VNRecognizedText observations. However, when I call the boundingBox(for range: Range<String.Index>) method on any VNRecognizedText object and for any valid range within the recognized text, I get the same bounding box back. This bounding box corresponds to the bounding box of the entire string.
Am I misunderstanding the boundingBox(for:) method, or is there some other way to get discrete location info for single characters within a recognized text observation?
Thanks in advance!
Edit:
After looking into this more, I've realized that there's some sort of link with word groups and whitespace.
Consider a recognized text observation with a string value of "Foo bar". Calling boundingBox(for:) for each character in "Foo" returns the exact same bounding box which, based on the dimensions, seems to correspond to the entire substring "Foo" instead of the single character whose range we pass into the boundingBox method. Then, in another bit of strange behavior, the boundingBox for the whitespace character is simply an empty region at the origin whose edges don't correspond with the substrings on either side of it. Finally, the behavior for the second substring is the same as the first: each character in "bar" has the same bounding box.
After hours of further investigation, I decided to get in touch with Apple Developer Tech Support. Sure enough, this is a bug! When VNRecognizeTextRequest.recognitionLevel is set to .accurate, as I had, the bug manifests. When recognitionLevel is set to .fast, the results behave as expected, with discrete bounding boxes per character.
I am using a parser to retrieve the AST of some code typed in an editor. What I would like to achieve now is this: given the cursor position in the editor (row, column), find the currently selected AST element. However, I have no idea how this can be done. Are there any standard ways to solve this?
First you stamp each AST node with source file position (line and column number).
Second, you build a map in the editor: abstractly, for each pixel, the line and column number of the source file being displayed. (In practice, if your displayed lines are fixed height and your displayed characters are fixed width, you can get by with a map from the displayed line number to the source line number.)
Now mapping back and forth from screen position (e.g., cursor location) to AST node is easy, even if you edit the tree and/or change what part is displayed.
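Once each node carries its source span, finding the node under the cursor is a descent to the innermost node whose span contains (line, column). A minimal sketch: the `Node` class is hypothetical (your parser's node type will differ), and spans are assumed to be 1-based with an exclusive end.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: innermost-node lookup over an AST stamped with source spans.
final class AstLocator {
    /** Hypothetical AST node with a 1-based, end-exclusive source span. */
    static final class Node {
        final String kind;
        final int startLine, startCol, endLine, endCol;
        final List<Node> children = new ArrayList<>();

        Node(String kind, int startLine, int startCol, int endLine, int endCol) {
            this.kind = kind;
            this.startLine = startLine;
            this.startCol = startCol;
            this.endLine = endLine;
            this.endCol = endCol;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        boolean contains(int line, int col) {
            boolean afterStart = line > startLine || (line == startLine && col >= startCol);
            boolean beforeEnd = line < endLine || (line == endLine && col < endCol);
            return afterStart && beforeEnd;
        }
    }

    /** Returns the innermost node whose span contains (line, col), or null. */
    static Node find(Node root, int line, int col) {
        if (!root.contains(line, col)) {
            return null;
        }
        for (Node child : root.children) {
            Node hit = find(child, line, col);
            if (hit != null) {
                return hit;
            }
        }
        return root;
    }
}
```

Because child spans nest inside their parent's span, the search only follows the one child that contains the cursor, so each lookup costs roughly the depth of the tree times the number of siblings checked.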
Some complications occur when you insert new tree nodes in the tree because they don't have a "file position". That's OK, you can assign them arbitrary line/column numbers that don't overlap with any existing line numbers. When you write out the modified tree to file, you don't really need the line/column numbers anymore, so you can ignore them all.
I've run into an issue generating floating-point coordinates from an image.
The original problem is as follows:
The input image is handwritten text. From this I want to generate a set of points (just x, y coordinates) that make up the individual characters.
At first I used findContours to generate the points. Since it finds the edges of the characters, the image first needs to be run through a thinning algorithm: I'm not interested in the shape of the characters, only the lines or, in this case, points.
So I run my input through the thinning algorithm and all is fine; the output looks good. Running findContours on it, however, does not work well: it skips a lot of pixels and I end up with something unusable.
The second idea was to generate bounding boxes (with findContours), use them to cut the characters out of the thinned image, take the indices of all non-white pixels as "points", and offset them by the bounding box position. This produces even worse output and seems like a bad method.
Horrible code for this:
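// edges: single-channel 8-bit thinned image; bb: bounding Rect of one character
// (needs org.opencv.core.Mat/Point, java.util.List/ArrayList)
Mat temp = new Mat(edges, bb);                        // ROI view of the character
byte[] roiBuff = new byte[(int) (temp.total() * temp.channels())];
temp.get(0, 0, roiBuff);                              // copy the ROI pixels, row-major
int cols = temp.cols();                               // assumes channels() == 1
List<Point> preArrayList = new ArrayList<Point>();
for (int i = 0; i < roiBuff.length; i++) {
    if (roiBuff[i] != 0) {                            // foreground pixel
        // flat index -> ROI (x, y), then offset by the box's top-left corner
        preArrayList.add(new Point(bb.x + i % cols, bb.y + i / cols));
    }
}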
Are there any alternatives, or am I overlooking something?
UPDATE:
I overlooked the fact that I need the points (pixels) to be ordered. In the method above I simply use a scanline approach to grab all the pixels. For the 'o', for example, it would first grab the point on the left-hand side, then the one on the right-hand side. I need them ordered by neighbouring pixels, since I want to draw paths through the points later on (outside of OpenCV).
Is this possible?
You should look into implementing your own connected components labelling. The concept is very simple: you scan the first line and assign unique labels to each horizontally connected strip of pixels. You basically check for every pixel if it is connected to its left neighbour and assign it either that neighbour's label or a new label. In the second row you do the same, but you also check against the pixels above it. Sometimes you need a label merge: two strips that were not connected in the previous row are joined in the current row. The way to deal with this is either to keep a list of label equivalences or use pointers to labels (so you can easily do a complete label change for an object).
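The labelling described above can be sketched as the classic two-pass algorithm. This version keeps the label equivalences in a union-find array (one of the two options mentioned) and uses 8-connectivity; the gap bridging discussed next is not included. It works on a plain `int[][]` image rather than an OpenCV `Mat`, and the class name is hypothetical.

```java
// Sketch: classic two-pass connected-component labelling with 8-connectivity.
// Label equivalences are kept in a union-find array; gap bridging is not included.
final class Labeling {
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    private static void union(int[] parent, int a, int b) {
        int ra = find(parent, a);
        int rb = find(parent, b);
        if (ra != rb) {
            parent[Math.max(ra, rb)] = Math.min(ra, rb); // keep the smaller label as root
        }
    }

    /** Labels non-zero pixels of img; 0 stays background, components get 1..n. */
    static int[][] label(int[][] img) {
        int h = img.length;
        int w = h == 0 ? 0 : img[0].length;
        int[][] labels = new int[h][w];
        int[] parent = new int[h * w + 1]; // at most one provisional label per pixel
        int next = 1;
        // Neighbours already visited in raster order: left, upper-left, up, upper-right.
        int[][] offsets = {{0, -1}, {-1, -1}, {-1, 0}, {-1, 1}};

        // First pass: provisional labels plus equivalences where two strips merge.
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (img[y][x] == 0) continue;
                int current = 0;
                for (int[] o : offsets) {
                    int ny = y + o[0];
                    int nx = x + o[1];
                    if (ny < 0 || nx < 0 || nx >= w || labels[ny][nx] == 0) continue;
                    if (current == 0) {
                        current = labels[ny][nx];
                    } else {
                        union(parent, current, labels[ny][nx]);
                    }
                }
                if (current == 0) {
                    current = next;
                    parent[next] = next;
                    next++;
                }
                labels[y][x] = current;
            }
        }

        // Second pass: replace each label by its class root, renumbered 1..n.
        int[] compact = new int[next];
        int count = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (labels[y][x] == 0) continue;
                int root = find(parent, labels[y][x]);
                if (compact[root] == 0) compact[root] = ++count;
                labels[y][x] = compact[root];
            }
        }
        return labels;
    }
}
```

In a "U" shape the two arms first get different labels and are merged when the bottom row joins them, which is exactly the label-merge case described above.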
This is basically what findContours does, but if you implement it yourself you have the freedom to go for 8-connectedness and even bridge a single-pixel or two-pixel gap. That way you get "almost-connected components labelling". It looks like you need this for the "w" in your example picture.
Once you have the image labelled this way, you can push all the pixels of a single label to a vector and order them roughly like this: find the top-left pixel, push it to a new vector and erase it from the original vector. Now find the pixel in the original vector closest to it, push it to the new vector and erase it from the original. Continue until all pixels have been transferred.
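The greedy ordering can be sketched as follows. Points are represented as hypothetical `int[]{x, y}` pairs, "top left" is taken to mean smallest y and then smallest x, and distances are compared as squared Euclidean distances. It is O(n²), in line with the warning that it will not be fast.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: greedy nearest-neighbour ordering of one component's pixels.
final class PixelOrdering {
    /** Starts at the top-left point, then repeatedly takes the nearest remaining one. */
    static List<int[]> order(List<int[]> points) {
        List<int[]> remaining = new ArrayList<>(points);
        List<int[]> ordered = new ArrayList<>();
        if (remaining.isEmpty()) {
            return ordered;
        }
        // Top left: smallest y, ties broken by smallest x.
        int start = 0;
        for (int i = 1; i < remaining.size(); i++) {
            int[] p = remaining.get(i);
            int[] s = remaining.get(start);
            if (p[1] < s[1] || (p[1] == s[1] && p[0] < s[0])) {
                start = i;
            }
        }
        int[] current = remaining.remove(start);
        ordered.add(current);
        while (!remaining.isEmpty()) {
            int best = 0;
            long bestDist = Long.MAX_VALUE;
            for (int i = 0; i < remaining.size(); i++) {
                int[] p = remaining.get(i);
                long dx = p[0] - current[0];
                long dy = p[1] - current[1];
                long d = dx * dx + dy * dy; // squared distance is enough for comparison
                if (d < bestDist) {
                    bestDist = d;
                    best = i;
                }
            }
            current = remaining.remove(best);
            ordered.add(current);
        }
        return ordered;
    }
}
```

Note that a closed shape such as the 'o' will be walked around once, but at a junction the greedy step may jump across to a nearer branch; that is a limitation of this simple ordering.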
It will not be very fast this way, but it should be a start.
What was the original historical use of the vertical tab character (\v in the C language, ASCII 11)?
Did it ever have a key on a keyboard? How did someone generate it?
Is there any language or system still in use today where the vertical tab character does something interesting and useful?
Vertical tab was used to speed up printer vertical movement. Some printers used special tab belts with various tab spots. This helped align content on forms. VT to header space, fill in header, VT to body area, fill in lines, VT to form footer. Generally it was coded in the program as a character constant. From the keyboard, it would be CTRL-K.
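Why CTRL-K: on ASCII terminals, holding Ctrl with a letter conventionally sends the letter's code with its upper bits cleared (code & 0x1F), and 'K' is 0x4B, so the result is 0x0B, the vertical tab. A small sketch (the class name is hypothetical); note that Java, unlike C and Python, has no `\v` escape, so the character is written numerically.

```java
// Sketch: the Ctrl+letter convention on ASCII keyboards (code & 0x1F).
// Java has no '\v' escape, so VT is written as (char) 0x0B.
final class ControlKeys {
    static final char VT = (char) 0x0B;

    static char ctrl(char letter) {
        return (char) (Character.toUpperCase(letter) & 0x1F);
    }
}
```

The same rule gives Ctrl-I = 9 (horizontal tab) and Ctrl-J = 10 (line feed).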
I don't believe anyone would have a reason to use it any more. Most forms are generated in a printer control language like PostScript.
Talvi Wilson noted that it can be used in Python as '\v'.
print("hello\vworld")
Output:
hello
world
The output above suggests that the default vertical tab spacing is one line. I have tested with Perl "\013" and the same output occurs. This could be used to do a line feed without a carriage return on devices that convert a line feed to carriage return + line feed.
Microsoft Word uses VT as a line separator in order to distinguish it from the normal new line function, which is used as a paragraph separator.
In the medical industry, VT is used as the start-of-block character in the MLLP/LLP/HLLP protocols that are used to frame HL7 data, which has been a standard for medical data exchange since the late 80s and is still in wide use.
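MLLP framing is simple enough to sketch: each message is sent as VT (0x0B), the message bytes, then FS (0x1C) followed by CR (0x0D). The class name is hypothetical, and UTF-8 is an assumption; real HL7 feeds may use a different character encoding.

```java
import java.nio.charset.StandardCharsets;

// Sketch of MLLP framing: <VT> message <FS><CR>.
// The UTF-8 charset is an assumption; real HL7 feeds may use another encoding.
final class Mllp {
    static final byte START_BLOCK = 0x0B;     // VT
    static final byte END_BLOCK = 0x1C;       // FS
    static final byte CARRIAGE_RETURN = 0x0D; // CR

    static byte[] frame(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = START_BLOCK;
        System.arraycopy(body, 0, out, 1, body.length);
        out[out.length - 2] = END_BLOCK;
        out[out.length - 1] = CARRIAGE_RETURN;
        return out;
    }

    static String unframe(byte[] frame) {
        if (frame.length < 3
                || frame[0] != START_BLOCK
                || frame[frame.length - 2] != END_BLOCK
                || frame[frame.length - 1] != CARRIAGE_RETURN) {
            throw new IllegalArgumentException("not an MLLP frame");
        }
        return new String(frame, 1, frame.length - 3, StandardCharsets.UTF_8);
    }
}
```

A receiver reading from a socket would scan for the VT byte, then collect bytes until the FS CR pair; the sketch assumes the whole frame is already in one array.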
It was used during the typewriter era to move down a page to the next vertical stop, typically spaced 6 lines apart (much the same way horizontal tabs move along a line by 8 characters).
In modern-day settings, VT is of very little, if any, significance.
The ASCII vertical tab (\x0B) is still used in some databases and file formats as a newline WITHIN a field. For example:
The .mer file format uses it to allow new lines within a data field.
FileMaker databases can use vertical tabs as a linefeed (see https://support.microsoft.com/en-gb/kb/59096).
I have found that the VT character is used in PPTX text boxes at the end of each line shown in the box, in order to fit the text to the size of the box.
It seems to be generated automatically by PowerPoint (not typed by the user) to move the text to the next line and fit the whole text block to the text box. In the example below, it sits at the position of §:
"This is a text §
inside a text box"
A vertical tab was the opposite of a line feed, i.e. it moved up by one line. It had nothing to do with tab positions. If you want to prove this, try it on an RS-232 terminal.
Similar to R0byn's experience, I was experimenting with a PowerPoint presentation and dumped out the main body text of a slide. Everywhere one would typically find a carriage return (ASCII 13/0x0d/^M) or a line feed/newline (ASCII 10/0x0a/^J), it uses a vertical tab (ASCII 11/0x0b/^K) instead, presumably for the reason dan04 described above for Word: to serve as a "newline" while staying within the same paragraph. Good question, though; I had assumed this character was as useless today as a teletype terminal.
I believe it's still being used, but I'm not sure exactly where. There might even be a key combination for it.
English is written left to right and Arabic right to left, but some languages are written top to bottom. For those, a vertical tab might be useful in the same way the horizontal tab is for English text.
I tried searching, but couldn't find anything useful yet.