Converting MathType equations from Word docx to Word equations using Ruby - xml-parsing

I am trying to convert docx to HTML, but the docx files may contain images, MathType equations in WMF format, and Word equations written in TeX with $ delimiters.
I have tried converting docx to HTML with pandoc and with LibreOffice.
Problems using pandoc:
Pandoc skips MathType equations, so I had to read document.xml myself and convert the WMF images to PNG with ImageMagick's convert command-line tool.
This leaves some of the equations almost unreadable.
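The first step of that workaround, getting the WMF images out of the .docx, can be sketched with Python's standard zipfile module, because a .docx file is a zip package. This is a minimal sketch under assumptions: it assumes the MathType preview images are stored under word/media/ with a .wmf extension (typical, but not guaranteed), it does not map the images back to their positions in document.xml, and the function and script names are hypothetical.

```python
import sys
import zipfile
from pathlib import Path


def wmf_images(docx):
    """Return {member name: bytes} for every WMF image stored in a .docx (a zip package)."""
    with zipfile.ZipFile(docx) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            # Assumption: embedded pictures, including MathType previews, live in word/media/.
            if name.startswith("word/media/") and name.lower().endswith(".wmf")
        }


def save_images(images, out_dir):
    """Write each image into out_dir under its base name; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, data in images.items():
        target = out / Path(name).name
        target.write_bytes(data)
        paths.append(target)
    return paths


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python3 extract_wmf.py input.docx out_dir
    for path in save_images(wmf_images(sys.argv[1]), sys.argv[2]):
        print(path)
```

Converting the extracted files to PNG is then left to a separate tool, for example ImageMagick, provided your build can read WMF.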
Problems using LibreOffice:
It converts whole documents to HTML very nicely, but Word equations get cut off at the sides during conversion.
What I want is a tool that converts docx to HTML and converts both the MathType equations and the Word equations to TeX.
I would prefer to do this in Ruby, but any workaround or idea is welcome.

The pandoc docx reader supports only OMML math, not the old MathType.
You could write a pandoc filter that finds the math in the pandoc AST (it will be there as plain text) and converts it to pandoc Math elements, which the pandoc LaTeX writer will then automatically turn into TeX math.
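Such a filter can be sketched in plain Python working directly on pandoc's JSON AST. It is a minimal sketch under assumptions: it only handles TeX typed into Word as ordinary text between single dollar signs (which the docx reader splits into Str and Space inlines), it does not touch the MathType OLE objects themselves, and the names (including the script name dollar_math.py) are hypothetical.

```python
import json
import sys

# Inline types that may sit between the opening and closing dollar signs.
TEXTLIKE = {"Str", "Space", "SoftBreak"}


def is_type(el, t):
    return isinstance(el, dict) and el.get("t") == t


def as_text(el):
    # Str carries its text in "c"; Space and SoftBreak become a single space.
    return el["c"] if el["t"] == "Str" else " "


def convert_inlines(inlines):
    """Merge runs like Str "$x", Space, Str "+", Space, Str "y$" into one Math inline."""
    out, i = [], 0
    while i < len(inlines):
        el = inlines[i]
        if is_type(el, "Str") and el["c"].startswith("$"):
            buf, j = el["c"], i
            close = buf.find("$", 1)
            # Extend across neighbouring text-like inlines until a closing "$" shows up.
            while (close == -1 and j + 1 < len(inlines)
                   and isinstance(inlines[j + 1], dict)
                   and inlines[j + 1].get("t") in TEXTLIKE):
                j += 1
                buf += as_text(inlines[j])
                close = buf.find("$", 1)
            tex = buf[1:close].strip() if close != -1 else ""
            if tex:
                out.append({"t": "Math", "c": [{"t": "InlineMath"}, tex]})
                if buf[close + 1:]:  # keep trailing text such as the comma in "y$,"
                    out.append({"t": "Str", "c": buf[close + 1:]})
                i = j + 1
                continue
        out.append(el)  # no closing "$" (e.g. "$5"): leave the text unchanged
        i += 1
    return out


def walk(node):
    """Rewrite every list bottom-up; only lists holding "$..." Str inlines change."""
    if isinstance(node, list):
        return convert_inlines([walk(x) for x in node])
    if isinstance(node, dict):
        return {k: walk(v) for k, v in node.items()}
    return node


def main():
    raw = sys.stdin.read()
    if raw.strip():
        doc = json.loads(raw)
        doc["blocks"] = walk(doc["blocks"])
        json.dump(doc, sys.stdout)


if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

It could be used as a pipe, for example pandoc input.docx -t json | python3 dollar_math.py | pandoc -f json --mathjax -o output.html. Display math ($$...$$) is deliberately not handled here.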

Related

How do I convert gnuplot PostScript output to PDF while keeping accented letters, for use in a LaTeX document?

So I have to write my lab report in Italian for my lab class. In class we were taught to use gnuplot to create graphs, so I'm using it to produce ours, and I then need to put them in my LaTeX document. The problem is that the y-axis label has to be "velocità", and when I save the plot as PostScript and convert it to PDF, the 'à' disappears or is replaced by something else. I have tried variations of these commands:
set encoding iso_8859_1
set ylabel "velocit\340"
Then I saved the plot with set term postscript color, set output "graf.ps" and replot, and converted it to PDF with ps2pdf from the WSL terminal. But when I open the PDF, the 'à' no longer appears, even though it did show in the graph gnuplot displayed earlier. What should I do? Alternatively, is there another way to include the original graph in my LaTeX document?
Gnuplot provides several LaTeX-friendly terminal types. Postscript is not one of them. Postscript's character encodings are idiosyncratic at best. If your goal is to include gnuplot output in latex, then choose a terminal type that is designed for it. Some terminal types (e.g. cairolatex) work only with latex because they depend on latex to do all the text processing. Others (e.g. pdf, png, tikz) produce output that is fully compatible with latex but already has the text embedded in it. It is best to use UTF-8 encoding for everything, including your accented characters. For example:
set term pdf size 7cm,5cm
set output 'myfigure.pdf'
set encoding utf8
set ylabel "velocità"
set xlabel "tempo"
plot [0:10] x**2 title "velocità"
unset output   # close the file so the PDF is complete
Then in your latex document, something like:
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
...
My TeX document.
\begin{figure}[h]
\includegraphics{myfigure}
\end{figure}
...

pdflatex and pstricks

I normally use pdflatex to compile papers written as .tex files into PDF for publishing. With pdflatex, figures in the .tex file can be PNG, PDF, etc., but not PostScript. Now I am writing a paper where some figures are created with pstricks, which means I have to use latex to generate a DVI file and then dvips and ps2pdf to obtain the final PDF.
I cannot generate the pstricks figures separately because they contain citations and references to sections of the paper. But if I use the latex->dvips->ps2pdf sequence for the whole .tex file, I cannot keep the other figures as PNG or PDF; they have to be converted to PostScript.
Is there an elegant solution to this?

Pandoc not generating new lines in markdown with latex

I am working on a .md file which includes latex. The file looks like this:
$$
1+1 = 2
\\
2+2 = 4
$$
When I view it as Markdown, the file looks fine, with the line break where I expect it.
But when I use pandoc to convert the file to PDF, the line break is removed entirely, which makes the math hard to read.
I am using the following pandoc command:
pandoc --wrap=preserve in.md -o out.pdf
The --wrap=preserve option does not seem to help; the line break is still ignored. I have also tried \newline and \linebreak instead of \\, and neither works.
How can I specify a line break so pandoc will make sure to keep the breaks rather than keeping everything inline?
Whatever file preview tool you are using is lying to you: a double backslash is not the correct way to insert line breaks into plain display math.
Pandoc either parses the math and converts it into the target format, or just passes the code through, depending on the output format. For PDF output via LaTeX, the equation is just passed through. (You can check by running pandoc with --verbose, which, among other things, will print the generated raw LaTeX code.) So it is clear that the problem lies with the input.
There are multiple ways to add linebreaks into math in LaTeX. One of them is the align* environment:
\begin{align*}
1+1 = 2
\\
2+2 = 4
\end{align*}
This will give you the expected PDF output, but has the downside of not showing up in other output formats such as HTML. I'm not aware of any method that produces line breaks in math equations across all possible pandoc output formats; if you need that, you'll have to use multiple single-line equations.
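Splitting one multi-row equation into several single-line equations can be automated with a pandoc JSON filter. This is a sketch under assumptions, not a pandoc feature: it assumes rows inside $$...$$ (which pandoc reads as DisplayMath) are separated by a bare \\, it skips any math containing \begin because a \\ inside an environment such as pmatrix is not a line break, and the names, including split_rows.py, are hypothetical.

```python
import json
import re
import sys

# A LaTeX row separator: two backslashes plus any surrounding whitespace or newlines.
ROW_SEP = re.compile(r"\s*\\\\\s*")


def split_display_math(inlines):
    # Replace each multi-row DisplayMath inline with one DisplayMath per row.
    out = []
    for el in inlines:
        is_display = (isinstance(el, dict) and el.get("t") == "Math"
                      and el["c"][0].get("t") == "DisplayMath")
        # Leave environments (pmatrix, cases, ...) alone: their \\ is not a line break.
        if is_display and "\\begin" not in el["c"][1]:
            rows = [r for r in ROW_SEP.split(el["c"][1].strip()) if r]
            if len(rows) > 1:
                out.extend({"t": "Math", "c": [{"t": "DisplayMath"}, r]} for r in rows)
                continue
        out.append(el)
    return out


def walk(node):
    """Apply split_display_math to every list in the AST, bottom-up."""
    if isinstance(node, list):
        return split_display_math([walk(x) for x in node])
    if isinstance(node, dict):
        return {k: walk(v) for k, v in node.items()}
    return node


def main():
    raw = sys.stdin.read()
    if raw.strip():
        doc = json.loads(raw)
        doc["blocks"] = walk(doc["blocks"])
        json.dump(doc, sys.stdout)


if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

A possible pipeline is pandoc in.md -t json | python3 split_rows.py | pandoc -f json -o out.pdf. Because each row becomes its own display equation, the same AST should render as separate lines in both the LaTeX and the HTML writers, at the cost of losing any alignment between rows.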

Algorithm for conversion of MathML to AsciiMath and Latex to MathML

I need to convert LaTeX code to MathML, and MathML to AsciiMath. Is there an algorithm or JavaScript code that does this? I don't want to use Node packages.
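For the MathML-to-AsciiMath direction, the usual approach is a recursive walk over the MathML tree that emits AsciiMath for each element. The sketch below illustrates that idea only: it is written in Python rather than JavaScript (in a browser the same recursion could run over a parsed DOM), it covers just a handful of presentation-MathML elements (mi, mn, mo, mtext, mrow, mfrac, msup, msub, msqrt), the function name is hypothetical, and the LaTeX-to-MathML direction is not covered.

```python
import xml.etree.ElementTree as ET


def local(tag):
    # Drop the "{namespace}" prefix ElementTree adds to namespaced tags.
    return tag.split("}")[-1]


def mathml_to_asciimath(el):
    """Recursively translate a small subset of presentation MathML to AsciiMath."""
    tag = local(el.tag)
    kids = list(el)
    if tag in ("mi", "mn", "mo"):
        return (el.text or "").strip()
    if tag == "mtext":
        return f"text({(el.text or '').strip()})"
    if tag == "mfrac":
        # AsciiMath drops the outer brackets of fraction arguments when rendering.
        return f"({mathml_to_asciimath(kids[0])})/({mathml_to_asciimath(kids[1])})"
    if tag == "msup":
        return f"{mathml_to_asciimath(kids[0])}^({mathml_to_asciimath(kids[1])})"
    if tag == "msub":
        return f"{mathml_to_asciimath(kids[0])}_({mathml_to_asciimath(kids[1])})"
    if tag == "msqrt":
        return "sqrt(" + "".join(mathml_to_asciimath(k) for k in kids) + ")"
    # math, mrow and any unsupported element: concatenate the children.
    return "".join(mathml_to_asciimath(k) for k in kids)


if __name__ == "__main__":
    src = ('<math xmlns="http://www.w3.org/1998/Math/MathML"><mfrac>'
           '<mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>'
           '<msup><mi>y</mi><mn>2</mn></msup></mfrac></math>')
    print(mathml_to_asciimath(ET.fromstring(src)))  # (x+1)/(y^(2))
```

Anything beyond this subset (mtable, msubsup, mover, stretchy fences and so on) would need its own branch.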

Converting Asciidoc to LaTeX

I want to convert AsciiDoc to LaTeX and then use an existing toolchain, which includes LaTeX modules, to convert the resulting document further to the final format. AsciiDoc's native LaTeX conversion is "experimental" according to its documentation, and it doesn't work for me either. AsciiDoc also supports another toolchain: converting to DocBook first, then using dblatex to convert it further. However, dblatex puts a lot of formatting into its LaTeX output, which clashes with the formatting of my toolchain.
Is there any way to convert AsciiDoc to LaTeX so that the content is included in the resulting document, but without any specific formatting rules (except those explicitly specified in the document)? I don't want the LaTeX output to contain any information about fonts, page layout and so on, because my toolchain already handles those.
I get acceptable, almost good results with this toolchain using the pandoc converter:
edit your document in AsciiDoc or Asciidoctor
convert your document to DocBook: asciidoctor -b docbook5 your-asciidoc-document
convert your DocBook document to (Xe)LaTeX using pandoc: pandoc -f docbook -t latex your-docbook-document -o your-document.tex, or go straight to PDF with pandoc -f docbook your-docbook-document -o your-document.pdf --pdf-engine=xelatex
You can customize your latex layout and modules in a pandoc configuration file or convert your docbook file into a latex file with pandoc. The converted latex file is quite clean (because its source is docbook).
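The toolchain steps above can be chained in a small script. This is a minimal sketch, assuming asciidoctor and pandoc are on the PATH and that the intermediate and output files are simply named after the input with .xml and .tex suffixes (a hypothetical convention). Leaving out pandoc's --standalone option means pandoc writes only the document body, without a preamble, which fits the wish to keep fonts and page layout out of the generated LaTeX.

```python
import subprocess
import sys
from pathlib import Path


def pipeline(adoc):
    """Build the commands for AsciiDoc -> DocBook 5 -> LaTeX body (no preamble)."""
    adoc = Path(adoc)
    xml = adoc.with_suffix(".xml")
    tex = adoc.with_suffix(".tex")
    return [
        ["asciidoctor", "-b", "docbook5", "-o", str(xml), str(adoc)],
        # No --standalone: pandoc emits a LaTeX fragment, so your own
        # toolchain keeps control of fonts, page layout and packages.
        ["pandoc", "-f", "docbook", "-t", "latex", "-o", str(tex), str(xml)],
    ]


def run(adoc):
    for cmd in pipeline(adoc):
        subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

The resulting .tex file can then be pulled into a master document of your own toolchain with \input.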
