What are the ways to draw data structures in LaTeX?

I tried TikZ/PGF a bit but have not had much luck creating a nice diagram to visualize bitfields or byte fields of packed data structures (i.e. in memory). Essentially I want a set of rectangles representing ranges of bits, with labels inside and bit offsets along the top, and multiple rows, one for each word of the data structure. This is similar to the diagrams in most processor manuals that label opcode encodings and the like.
Has anyone else tried to do this using LaTeX, or is there a package for it?

I have successfully used the bytefield package for something like this. If it doesn't do exactly what you want, please extend your question with an example...
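For reference, a minimal sketch of that kind of opcode diagram done with bytefield (the field widths and labels are invented):

\documentclass{article}
\usepackage{bytefield}
\begin{document}
% One 16-bit word: bit offsets along the top, labeled bit ranges below,
% plus a full-width second row for the immediate word.
\begin{bytefield}[bitwidth=1.2em]{16}
  \bitheader{0-15} \\
  \bitbox{4}{opcode} & \bitbox{4}{rd} & \bitbox{4}{rs} & \bitbox{4}{imm} \\
  \wordbox{1}{immediate word}
\end{bytefield}
\end{document}

Each \bitbox{n}{label} draws one rectangle spanning n bits, and rows are separated with \\, so multi-word layouts are just more rows.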

You will find several examples, with both the TikZ source code and a visual rendering of it, at http://www.texample.net/tikz/examples/

Related

How to properly do custom markdown markup

I currently work on a personal writing project which has ended up with me maintaining a few different versions, owing to differences between the platforms and output formats I want to support that are not trivially solved. After several instances of me glancing at pandoc and the sheer forest that it represents, I have concluded that mere templates don't do what I need and, worse, that I seem to need a combination of a custom filter and writer. Suffice to say: messing with the AST is where I feel way out of my depth. Enough so that, rather than asking specific 'how do I do X' questions here, this is a question of 'is X the right way to go about it, or what is the proper way to do it, and can you give an example of how it ties together?'... so if this question is rather lengthy: my apologies.
My current goal is to have custom markup like the following which is supposed to 'track' which character says something:
<paul|"Hi there">
If I convert to HTML, I'd want something similar to:
<span class="speech paul">"Hi there"</span>
to pop out (and perhaps the <p> tags), whereas if it is just pure markdown / plain text, I'd want it to silently disappear:
"Hi there"
Looking at the JSON AST structures I've studied, it would make sense that I'd want a new structure type similar to the 'Emph' tag called 'Speech' which allows whole blobs of text to be put inside of it with a bit of extra information attached (the person speaking). So something like this:
{"t":"Speech","speaker":"paul","c":[ ... ] }
Problem #1: by the time a lua-filter sees the document, it has obviously already been distilled to an AST. This means replacing the items in a manner similar to what most macro-expander samples do cannot really work, since it would require reading forward. With that method I would just replace bits and pieces in place (<NAME| becomes a StartSpeech and the first solitary > that follows becomes an EndSpeech), but that would make malformed input a bigger potential problem because of silent-ish failures. Additionally, such tags would be completely out of step with how an AST is supposed to look.
To complicate matters even further, some of my characters end up learning a secondary language throughout the story, for which I apply a different format that contains a simplified rendering of the spoken text with the perspective character's understanding of what was said. Example:
<paul|"Heb je goed geslapen?"|"Did you ?????">
I could probably add a third 'UnderstoodSpeech' group to my filter, but (problem #2) at this point, the relationship between the speaker, the original speech, and the understood translation is completely gone. As long as the final documents need these values in these respective orders and only in these orders, it is fine... but what if I want my HTML version to look like
"Did you?????"
with a tool-tip / hover-over effect containing the original speech? That would be near impossible to achieve because the AST does not contain that kind of relational detail.
Whatever kind of AST I create in the filter is what I need to understand in my custom writer. Ideally, I want to re-use as much stock functionality of pandoc as possible for the writer, but I don't even know if that is feasible at this point.
So now my question: could someone with a great understanding of pandoc please give me an example of how to keep the relevant bits of data together and apply them in the correct manner? By this I mean: show a basic example of what needs to go in the lua-filter and lua-writer scripts in the following toolchain
[CUSTOMIZED MARKDOWN INPUT] -> lua-filter -> lua-writer -> [CUSTOMIZED HTML5 OUTPUT]
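Not a full answer, but a minimal sketch of the filter half, assuming pandoc >= 2.17 (for the Inlines walker) and input read with -f markdown-smart so the quotation marks stay inside plain Str elements. It encodes speech as a stock Span carrying the speaker as a class rather than as a new node type:

-- speech.lua (sketch): turn <name|...> into a Span with classes "speech", name
function Inlines(inls)
  local out = pandoc.Inlines{}
  local i = 1
  while i <= #inls do
    local el = inls[i]
    local speaker = el.t == "Str" and el.text:match("^<(%w+)|")
    if speaker then
      local body = pandoc.Inlines{}
      local first = el.text:gsub("^<%w+|", "")
      if first ~= "" then body:insert(pandoc.Str(first)) end
      -- Collect the following inlines until a Str ends with '>'.
      -- (Sketch limitations: assumes the closing '>' arrives in a later
      -- Str than the opener, and silently swallows unterminated speech.)
      local j = i + 1
      while j <= #inls do
        local cur = inls[j]
        if cur.t == "Str" and cur.text:sub(-1) == ">" then
          local last = cur.text:sub(1, -2)
          if last ~= "" then body:insert(pandoc.Str(last)) end
          break
        end
        body:insert(cur)
        j = j + 1
      end
      out:insert(pandoc.Span(body, pandoc.Attr("", {"speech", speaker})))
      i = j + 1
    else
      out:insert(el)
      i = i + 1
    end
  end
  return out
end

Because a classed Span is a stock pandoc type, pandoc -t html already renders it as <span class="speech paul">...</span> and pandoc -t plain drops the markup, so a custom writer may only be needed for the two-language case; there, keeping the original speech in a key-value attribute of the same Span would preserve the speaker/original/translation relationship that problem #2 worries about.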

Detect table with OpenCV

I often work with scanned papers. The papers contain tables (similar to Excel tables) which I need to type into the computer manually. To make the task worse, the tables can have different numbers of columns. Manually entering them into Excel is mundane, to say the least.
I thought I could save myself a week of work if I could put together a program to OCR it. Would it be possible to detect header text areas with OpenCV and then OCR the text behind the detected image coordinates?
Can I achieve this with the help of OpenCV, or do I need an entirely different approach?
Edit: The example table is really just a standard table similar to what you can see in Excel and other spreadsheet applications.
This question seems a little old, but I was also working on a similar problem and arrived at my own solution, which I am explaining here.
For reading text using any OCR engine there are many challenges in getting good accuracy, including the following main cases:
Presence of noise due to poor image quality or unwanted elements/blobs in the background region. This requires some pre-processing, like noise removal, which can easily be done using a Gaussian filter or a plain median filter. Both are available in OpenCV.
Wrong orientation of the image: because of wrong orientation, the OCR engine fails to segment the lines and words in the image correctly, which ruins accuracy.
Presence of lines: while doing word or line segmentation, the OCR engine sometimes tries to merge words and lines together, thus processing the wrong content and giving wrong results.
There are other issues as well, but these are the basic ones.
In this case I think the scan quality is quite good, and the following simple steps can be used to solve the problem.
Simple image binarization will remove the background content, leaving only the necessary content.
Now we have to remove the lines, which in this case form the tabular grid. They can be identified using connected components, removing the large connected components. The result is the image that gets fed to the OCR engine.
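A minimal sketch of those steps in Python with OpenCV, plus pytesseract for the OCR stage (the file name and the size threshold for "large" components are assumptions):

import cv2
import pytesseract

# Load the scan as grayscale (hypothetical file name).
img = cv2.imread("table_scan.png", cv2.IMREAD_GRAYSCALE)

# Step 1: binarize with Otsu; invert so ink becomes white (non-zero).
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Step 2: remove the tabular grid. Grid lines form by far the largest
# connected components, so drop anything spanning most of the page.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
cleaned = binary.copy()
for i in range(1, n):  # label 0 is the background
    w = stats[i, cv2.CC_STAT_WIDTH]
    h = stats[i, cv2.CC_STAT_HEIGHT]
    if w > img.shape[1] // 2 or h > img.shape[0] // 2:  # crude size heuristic
        cleaned[labels == i] = 0

# Step 3: OCR the cleaned image (re-invert so the text is black on white).
text = pytesseract.image_to_string(cv2.bitwise_not(cleaned))
print(text)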
For OCR we can use the Tesseract open-source OCR engine. I got the following results from OCR:
Caption title
header! header2 header3
row1cell1 row1cell2 row1cell3
row2cell1 row2cell2 row2cell3
As we can see, the result is quite accurate, but there are some issues, like
header!, which should be header1; this is because the OCR engine misread the 1 as !. This problem can be solved by post-processing the result with regex-based operations.
After post-processing, the OCR result can be parsed to read the row and column values.
Also, in this case, font information can be used to classify the sheet title, headings, and normal cell values.
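A toy illustration of that post-processing and parsing in Python, using the OCR output above (the substitution list is an assumption; real confusions depend on the font and the engine):

import re

raw = """Caption title
header! header2 header3
row1cell1 row1cell2 row1cell3
row2cell1 row2cell2 row2cell3"""

# Fix a known confusion: '!' misread where a '1' follows a word.
fixed = re.sub(r"(?<=\w)!", "1", raw)

# First line is the sheet title, next is the header row, the rest are data.
lines = fixed.splitlines()
title, header = lines[0], lines[1].split()
rows = [line.split() for line in lines[2:]]
print(title, header, rows)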

Delphi component or library to display mathematical expressions

I'm looking for a simple component that displays mathematical expressions in Delphi. When I started out I thought it would be easy to find something on the net, but it turned out to be harder than anticipated. There are lots and lots of components that will parse mathematical expressions, but few (none?) that will display them.
Ideally I would like a component as simple as a TLabel, where I could set the caption to some expression and have it displayed correctly, but some sort of library that lets me draw expressions to a canvas would also be sufficient for my needs.
Update:
I'm not talking about plotting graphs of functions or anything like that. I want to display (for instance)
(X^2+3)/X
as a properly built-up fraction, with X^2+3 above the bar and X below.
Solution:
MBo's answer was just what I was looking for. Some people may be put off by the fact that all comments and documentation are in Russian, but don't let that scare you. It was really easy to use.
Installation: Unzip the files (at least "ExprMake.pas" and "ExprDraw.pas") to a directory in your library path. That's it.
Use: I haven't experimented extensively with it, but these few lines demonstrate how easy it is.
procedure TForm1.Button1Click(Sender: TObject);
var
  vExprC: TExprClass;
  vExprB: TExprBuilder;
begin
  vExprB := TExprBuilder.Create;
  vExprC := nil;  // so the finally block is safe if BuildExpr raises
  try
    // Parse the expression string into a drawable expression tree.
    vExprC := vExprB.BuildExpr('(X^2+3)/X');
    vExprC.Canvas := Canvas;  // draw directly onto the form's canvas
    vExprC.Font.Size := 50;
    vExprC.Draw(10, 10, ehLeft, evTop);  // left/top-aligned at (10,10)
  finally
    vExprC.Free;
    vExprB.Free;
  end;
end;
A native Delphi module by Anton Grigoriev to draw mathematical expressions. The accompanying assistant program is in Russian.
Addition about credits:
The modules are free. The author asks only that you mention (in an AboutBox, etc.) that the mathematical expressions were drawn by means of the ExprDraw and ExprMake modules written by Anton Grigoriev
(rough translation from readme.txt)
I don't know of a native Delphi implementation, but maybe this question is helpful to you: How to render a formula in WPF or WinForms. It mentions some C/C# solutions which could possibly be translated or used as a DLL (see the OP's solution).
Another alternative could be this Formulator ActiveX Control.
Furthermore it may broaden your search results if you use some other search criteria, especially without the "Delphi" keyword. ;-)
renderer, formula, math, MathML, expression, engine, tex, ...
And as we can learn from MBo's answer, it could also be a good idea to search in other languages :-)
delphi математических формул рисования (Russian for "delphi mathematical formula drawing")
I'm sure you searched for something like that, but possibly there is one keyword that you have forgotten.
I had been looking for a similar component for some time, and MBo's solution would be acceptable.
I was convinced it could also be done another way: embedding a TWebBrowser and using an existing JavaScript renderer for LaTeX and MathML formulas, but...
I just tried QDSEquations and I think it's even a better solution!
"Delphi component equation editor that allows you to enter and display math formulas of any complexity, from simple Greek symbols to matrices and complex integral expressions. You can use the equation editor in your projects written in the Delphi environment, for example, in programs testing knowledge of different mathematics fields (mathematical analysis, discrete mathematics, probability theory and so on), physics and others. It's quite easy to enter formulas in it: simple symbols are entered similarly to entering data in a text field; special symbols and formula elements are entered with the help of an additional menu."
It's better because you can edit the formula directly in a "textfield" component with the help of an additional button-menu component, and/or using a math expression string, and/or using predefined methods.
Hope it helped!
I had the same problem several months ago. I solved it by getting a LaTeX renderer DLL which could be called from Delphi. You just called it, giving it the expression as a string, and it returned a bitmap with the rendered expression in it.
I forgot the name unfortunately :( but you should be able to find it again by looking for "latex dll delphi"?

Searching Math Symbols as Text: Mission Impossible?

How could math types be represented in a format that is searchable like text?
I mean a toolbar where you can input math symbols and search for them as text, so the format would represent math symbols as text.
Is such a task impossible to implement because math types can be represented only as icons?
What do you think would be the proper implementation of a new format that loads symbols into memory like a text format?
Are there any existing solutions for searchable math symbols, in PDF or in any other format?
(I do not consider LaTeX, since there you search with words rather than with the math symbols themselves, and writing down the whole LaTeX form of a long math formula would be so complex that the user might prefer to scroll the document instead.)
Could designing new fonts that represent math symbols help solve the problem, or not at all?
Thanks in advance!
We had the same problem for musical notation. It was almost impossible to search for the more obscure markings found in baroque music.
Our solution was to create a mapping table using SQL (SQL Server 2012) and then create xref tables as needed for the implementing products. This became necessary for some of the tablets used by music schools (mainly in the Northwest, oddly) that had significantly different requirements.
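The same mapping-table idea carries over directly to math symbols; a toy sketch in Python (the symbols and their aliases here are illustrative, not a real table):

# Map each symbol to searchable text aliases; search then normalizes both
# the query and the indexed text through this table.
SYMBOL_ALIASES = {
    "∫": ["integral"],
    "∑": ["sum", "sigma"],
    "√": ["sqrt", "square root"],
    "≤": ["<=", "less than or equal"],
}

def normalize(text: str) -> str:
    """Replace known symbols with their first alias so plain-text search works."""
    for sym, aliases in SYMBOL_ALIASES.items():
        text = text.replace(sym, " " + aliases[0] + " ")
    return " ".join(text.split())

print(normalize("compute ∑ x_i ≤ √2"))  # -> "compute sum x_i <= sqrt 2"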
Good luck

How to detect tabular data from a variety of sources

In an experimental project I am playing with I want to be able to look at textual data and detect whether it contains data in a tabular format. Of course there are a lot of cases that could look like tabular data, so I was wondering what sort of algorithm I'd need to research to look for common features.
My first thought was to write a long switch/case statement that checked for data separated by tabs, then another case for data separated by pipe symbols, then yet another case for data separated in some other way, etc. Now of course I realize that I would have to come up with a list of different things to detect, but I wondered if there was a more intelligent way of detecting these features than doing a relatively slow search for each type.
I realize this question isn't especially eloquently put so I hope it makes some sense!
Any ideas?
(no idea how to tag this either - so help there is welcomed!)
The only reliable scheme would be to use machine learning. You could, for example, train a perceptron classifier on a stack of examples of tabular and non-tabular materials.
A mixed solution might be appropriate, i.e. one whereby you handle the most common/obvious cases with simple heuristics (in "switch-like" manner), as you suggested, and leave the harder cases to automated learning and other types of classifier logic.
This assumes that you do not already have defined types stored in the TSV.
A TSV file is typically
[Value1]\t[Value..N]\n
My suggestion would be to:
Count up all the tabs
Count up all of the new lines
Count the total tabs in the first row
Divide the total number of tabs by the tabs in the first row
If the result of the previous step has a remainder of 0, then you have a TSV candidate. From there you may want to do one of the following things:
You can continue reading the data, ignoring lines with fewer or more than the predicted tabs per line
You can scan each line before reading to make sure all are consistent
You can read up to the line that does not fit the format and then throw an error
Once you have a good prediction of the number of tab-separated values, you can use a regular expression to parse out the values [as a group].
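A minimal sketch of the whole heuristic in Python (the delimiter is hard-coded to tabs here, but the same logic works for pipes or commas):

import re

def looks_like_tsv(text: str) -> bool:
    """Tab-counting heuristic: every row should carry as many tabs as the first."""
    lines = [ln for ln in text.splitlines() if ln]
    if not lines:
        return False
    total_tabs = text.count("\t")          # count up all the tabs
    first_row_tabs = lines[0].count("\t")  # tabs in the first row
    return (first_row_tabs > 0
            and total_tabs % first_row_tabs == 0             # remainder of 0
            and total_tabs // first_row_tabs == len(lines))  # consistent rows

sample = "header1\theader2\theader3\nr1c1\tr1c2\tr1c3\nr2c1\tr2c2\tr2c3\n"
if looks_like_tsv(sample):
    # Parse out each row's values as regex groups.
    row_re = re.compile(r"([^\t\n]+)")
    rows = [row_re.findall(line) for line in sample.splitlines() if line]
    print(rows)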
