Pandoc, markdown, powerpoint: support for equations?

Pandoc can now generate powerpoint presentations from markdown. This seems to work quite well.
However, when I try to include an equation, even something as simple as $a=2$, the whole contents of the slide disappear. Is this a bug or a feature? Can you include equations in PowerPoint presentations? I was hoping that I would finally be able to write my lectures (which need to be in PowerPoint because of reasons) using TeX math syntax in a plain text file.
EDIT:
The command to convert the markdown document saved in the file test.rmd was
render("test.rmd")
Minimal example #1:
---
output: powerpoint_presentation
---
## Math test
This is a test (no maths)
Output:
Test example #2:
---
output: powerpoint_presentation
---
## Math test
This is a test $a=2$
Output:
Versions:
> rmarkdown::pandoc_version()
[1] ‘2.7.1’
> packageVersion("rmarkdown")
[1] ‘1.10’
MS PowerPoint 2007. Note that if Word output is chosen, the formula appears as expected.

The problem seems to be with PowerPoint. From what you found, and from what I can tell from searching the web, it seems safe to say that PowerPoint 2007 does not have full support for Microsoft's OMML math format (although Word 2007 does).
Unfortunately, inserting inline PNGs through pandoc is not possible with PowerPoint, so inserting formulas rendered as PNG won't work either. The only option would be to insert equation images as figures, but that would limit you to one equation per slide (or two when used with columns).


Gibberish table output in tabula-java for Japanese PDF but works in standalone Tabula

I am trying to extract data from this Japanese PDF using tabula-py (and tabula-java), but the output is gibberish. In both tabula-py and tabula-java, the output isn't human readable (definitely not Japanese characters), and there are no error or warning messages. It does seem that the content of the PDF is processed, though.
When using the standalone Tabula tool, the characters are encoded properly:
I searched online and in the tabula-py and tabula-java documentation. Below are the suggestions I could find, but they don't change the output.
Setting the -Dfile.encoding=utf8 (in java call to tabula-py or tabula-java)
Setting chcp 65001 (in Windows command prompt)
I understand Tabula and tabula-java (and tabula-py) use the same library, but is there something different between the two that would explain the difference in encoding output?
Background info
There is nothing unusual in this PDF compared to any other.
As in any PDF, the text is written in whatever order the authoring software emitted it, so for example the first body line of the PDF (港区内認可保育園等一覧) is the 1262nd block of text, added long after the table was started. To hear the written order we can use Read Aloud, which also verifies character and language recognition, but unless the PDF was correctly tagged it will jump from text block to text block.
So internally the text is rarely tabular. The first 8 lines are:
1 認可保育園
0歳 1歳 2歳3歳4歳5歳 計
短時間 標準時間
001010 区立
3か月
3455-
4669
芝5-18-1-101
Thus you need text extractors that work in a grid-like manner or convert the text layout into row-by-row output.
This is where all extractors will be confounded as to how to output such a jumbled, dense layout, and generally ALL will struggle with this page.
Hence it's best to use a good generic solution. It will still need data cleaning, but at least you will have something to work on.
If you only need a zone from the page, it is best to set the boundary of interest to avoid extraneous parsing.
Your "standalone Tabula tool" output is very good, but it could possibly be improved by using pdftotext -layout and adjusting some options to produce a more regular order.
Your Question
the difference in encoding output?
The Answer
The output from a PDF is not its internal coding. The desired text output is UTF-8, but a PDF does not store text as UTF-8 or Unicode; it simply uses numbers from a font character map. If the map is poor, everything would be gibberish. In this case the map is good, so where does the gibberish arise? It arises because the output stage is not using UTF-8, and console output is rarely Unicode.
You correctly show that the console needs to be set to Unicode mode; then the output should match (except for the density problem).
The density issue would be easier to handle if preprocessed in a flowing format such as HTML
or using a different language
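The encoding mismatch described above can be reproduced without any PDF tooling. The following is only a minimal sketch using the Python standard library; the choice of cp932 as the "wrong" code page is an assumption for illustration, and the actual code page on a given Windows console may differ.

```python
# Japanese heading taken from the PDF discussed above.
text = "港区内認可保育園等一覧"

# Bytes an extractor would write if its output encoding is UTF-8.
raw = text.encode("utf-8")

# A consumer that assumes a legacy code page (cp932 is an assumed example)
# misreads those bytes and produces mojibake instead of the original text.
misread = raw.decode("cp932", errors="replace")

# A consumer that assumes UTF-8 recovers the text exactly.
recovered = raw.decode("utf-8")

# Only booleans are printed, so the script also runs on a non-Unicode console.
print(recovered == text)  # True
print(misread == text)    # False
```

The same reasoning suggests writing extractor output to a file opened with an explicit UTF-8 encoding, rather than relying on what the console displays.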

Let knitr/kable display latex code for further editing

I have to put all my code from a Rmarkdown document into pure LaTeX format in Overleaf. For this reason, I can't use my knitr code anymore to produce latex tables right away, but I need to write the LaTeX code by myself.
I thought that, although kable/kableExtra do not show the LaTeX code in RStudio, the code must be produced in the background so that LaTeX can read it.
Now my question: Is there any option to let me see the full LaTeX code produced by kable/kableExtra in order to take it and copy it into my pure LaTeX document?
I would be happy to hear your suggestions.
Best,
Moritz
There are different ways to get to the .tex file from an .Rmd file.
Convert only to LaTeX by using in your YAML header
---
[...]
output: rmarkdown::latex_document
---
Convert to PDF but keep the LaTeX file by using in your YAML header
---
[...]
output:
  rmarkdown::pdf_document:
    keep_tex: yes
---
I prefer the second approach since it allows for previewing the document easily while editing the .Rmd file. Note that there are other output functions besides rmarkdown::pdf_document that support the keep_tex argument.

Generating LaTeX Server Side

I'm trying to build a service that accepts some string with LaTeX formatting and then returns a string with the LaTeX bits as pngs, or whatever else.
So, the idea is:
client sends a request containing: the point is that $\sum_{n=1}^5 f(x)$ is a good estimate
server sends back the string: the point is that FORMULAS_HERE is a good estimate
I really have no idea where to begin getting the LaTeX converted. Naively, I assume I would just parse out the LaTeX bits and then do something to get a png/jpeg/etc... and then insert that into the response.
Googling around really reveals minimal information.
Currently, my simple server is built on node, but that's not really important. I can change languages if there's some magic solution out there. I honestly wish I could magically transform LaTeX into unicode and have it be perfectly seamless.
Question: How do I handle LaTeX on the server side?
- The goal is to then spit it back to the client so the text can be inlined relatively naturally (i.e. I could text my buddy Hey, what if $\chi(n)$ was considered independently? and it would be received formatted on the other end without being a weird big picture blob).
Any advice on just a direction or set of packages/technologies/etc.. would be useful here.
Prepare your LaTeX document with the math and convert it using the excellent open-source ImageMagick:
pdflatex formula.tex
convert -density 300 formula.pdf -quality 90 formula.png
The convert command used above is one of the ImageMagick tools. See documentation and numerous online resources for many options. The software has versions for all major platforms.
The input latex file should be prepared so that there is no background, margins, etc. For discussion of how to do that, see this post, and the source for it. The example above ultimately comes from there.
This is one way to write the formula.tex file used above, from the linked source.
\ifdefined\formula
\else
\def\formula{E = m c^2}
\fi
\documentclass[border=2pt]{standalone}
\usepackage{amsmath}
\usepackage{varwidth}
\begin{document}
\begin{varwidth}{\linewidth}
\[ \formula \]
\end{varwidth}
\end{document}
There are other converters out there but you need not bother if you can use this.
I have to mention MathJax. It runs in a browser, via a one-line JavaScript snippet. Should you ever migrate to a browser or mobile service, this would be a perfect solution. Here is their one-page tutorial.
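The "parse out the LaTeX bits" step from the question, combined with the standalone formula.tex wrapper shown above, could be sketched roughly as follows. This is an illustration under stated assumptions, not a complete service: the names split_formulas and tex_source and the {{FORMULA_n}} placeholder format are hypothetical, formulas are assumed to be delimited by single dollar signs with no literal "$" inside, and the actual pdflatex and convert calls are left out.

```python
import re

# Matches inline math delimited by single dollar signs, e.g. $\chi(n)$.
# Assumption: a formula never contains a literal "$" and is never nested.
INLINE_MATH = re.compile(r"\$([^$]+)\$")

# Standalone wrapper modelled on the formula.tex file shown above.
# "@FORMULA@" is a hypothetical marker replaced by the formula source.
TEX_TEMPLATE = r"""\documentclass[border=2pt]{standalone}
\usepackage{amsmath}
\usepackage{varwidth}
\begin{document}
\begin{varwidth}{\linewidth}
\[ @FORMULA@ \]
\end{varwidth}
\end{document}
"""


def split_formulas(message):
    """Return the message with placeholders and the list of formula sources."""
    formulas = []

    def stash(match):
        # Remember the formula and leave a numbered placeholder in the text.
        formulas.append(match.group(1))
        return "{{FORMULA_%d}}" % (len(formulas) - 1)

    return INLINE_MATH.sub(stash, message), formulas


def tex_source(formula):
    """Build the LaTeX document that pdflatex would compile for one formula."""
    return TEX_TEMPLATE.replace("@FORMULA@", formula)


text, formulas = split_formulas(
    r"Hey, what if $\chi(n)$ was considered independently?"
)
print(text)      # Hey, what if {{FORMULA_0}} was considered independently?
print(formulas)  # ['\\chi(n)']
```

Each string returned by tex_source would then be written to a file and passed through the pdflatex and convert commands above, and the placeholder swapped for the resulting image reference before the response is sent.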

Octave: saving figure with greek letters and subscripts

I'm currently trying to save a stress vs. strain curve using Octave. On this plot, I want to include text showing the equation for calculating engineering stress and engineering strain. Both of these require greek letters (\sigma and \epsilon respectively) as well as subscripts for the formulae.
Currently, print with -deps, -dpng, or any other device creates a file, but the Greek letters appear as the words "sigma" and "epsilon", and wherever I have a subscript, such as 0, it just appears as "_0". This looks very unprofessional.
Since I'm generating some 25 graphs, I don't want to have to go through and do a screenshot for each one. Does octave support saving the generated figure as displayed? I intend to use the generated files in a LaTeX document later (preferably as png so I can email them separately too).
I've also tried changing the "graphics_toolkit" option between fltk and gnuplot however it doesn't seem to help.
Attached to this post is a screenshot of the desired results and the actual results.
I am currently "not allowed" to post images, so I'll link them:
http://i.imgur.com/Tjt5Ecn.png (screenshot, desired result) and http://i.imgur.com/SP3hekd.png (directly saved, actual result)
Does anyone know a good way to print a figure from Octave which includes greek characters and subscripts in the titles?
Since you plan to use your graph in a LaTeX document, generating the graphs with -depslatex and converting them to PDF is a good idea. (The results look slightly better than with direct -dpdflatex.)
With -depslatex, you can include Latex code in your figures that will be written to a separate tex file.
Note that you need to use double backslashes \\ to export a single backslash.
graphics_toolkit("gnuplot");
...
legend("$\\varepsilon$");
print(sprintf("graph%s_%d.eps", name, type), '-depslatex', '-S200,270', '-F:9');
system(sprintf("epstopdf graph%s_%d.eps", name, type));
On the LaTeX side, you then \input the tex file generated by Octave. On the plus side, since you need 25 graphs, you can automate this process on both sides, Octave and LaTeX.
\newcommand{\mygraph}[1]{%
\graphicspath{{./figures/}}
\resizebox{0.495\linewidth}{!}{\relscale{1.0}\small%
\input{./figures/#1.tex}
}%
}
\mygraph{graph1_1}
Here, a Latex command \mygraph is defined to scale and include a figure located in a subfolder.
(I am using Octave 4.0.0 with gnuplot 4.4 on Ubuntu 12)
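For the LaTeX half of that automation, the 25 \mygraph calls do not have to be typed by hand. The following is a small sketch that assumes the files follow the graph<name>_<type> pattern used in the print call above; the function name mygraph_lines and the sample names and types are hypothetical.

```python
def mygraph_lines(names, types):
    """Return one mygraph call per (name, type) pair, in nested order."""
    return [
        r"\mygraph{graph%s_%d}" % (name, graph_type)
        for name in names
        for graph_type in types
    ]


# Hypothetical names and types; substitute the ones used in the Octave script.
lines = mygraph_lines(["1", "2"], [1, 2])
print("\n".join(lines))
```

The printed lines can be pasted into the document or written to a file that the main document pulls in with \input.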

Including full LaTeX documents within others

I'm currently finishing off my dissertation, and would like to be able to include some documents within my LaTeX document.
The files I'd like to include are weekly reports done in LaTeX to my supervisor. Obviously all documents are page numbered separately.
I would like them to be included in the final document.
I could concatenate all the final PDFs using GhostScript or some other tool, but I would like to have consistent numbering throughout the document.
I have tried including the LaTeX from each document in the main document, but the preamble etc causes problems and the small title I have in each report takes a whole page...
In summary, I'm looking for a way of including a number of 1 or 2 page self-complete LaTeX files in a large report, keeping their original layouts, but changing the page numbering.
For a possible solution of \input-ing the original LaTeX files while skipping their preamble, the newclude package might help.
Otherwise, you can use pdfpages for inserting pre-existing PDFs into your dissertation. I seem to recall that it has a feature of "suppressing" the original page numbers by covering them up with white boxes.
The suggestion from @Will Robertson works great. I'd just like to add an example for all lazy people:
\usepackage{pdfpages}
...
% Insert _all_ pages from some_pdf.pdf:
\includepdf[pages=-]{some_pdf} % the .pdf extension may be omitted
From the documentation of the package:
To include a specific range of pages, you could do pages={4-9}. If start is omitted, it defaults to the first page, if end is omitted, it defaults to the last page.
To include it in landscape mode, do landscape=true
Maintaining the original formatting per document will be difficult if they're using different formats. For example, concatenating different document classes will be near impossible.
I would suggest you go with the GhostScript solution with a slight twist. LaTeX allows you to set the starting page number using \setcounter{page}{13}, for example. If you can find an application that can count the pages of a PDF document (pdfinfo, from the poppler-utils package on Ubuntu, is one example), then you can do the following:
Compile the next document to PDF
Concatenate the latest PDF with the current full PDF
Find the page count of the full PDF
Use sed to pluck in a \setcounter{page}{N} command into the next latex file
Go back to the beginning
If you need to do any other processing, again use sed. You should (assuming you fix the infinite loop in the above algorithm ;-) ) end up with a final PDF document with all original PDFs concatenated and continuous page numbers.
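The bookkeeping in that loop comes down to a running sum of page counts. Here is a minimal sketch of that part only, with Python string replacement standing in for sed. It assumes the page counts have already been obtained (for instance from pdfinfo), that each report contains exactly one \begin{document}, and the function names are hypothetical.

```python
def starting_pages(page_counts, first_page=1):
    """Return the page number each document should start on."""
    starts = []
    next_page = first_page
    for count in page_counts:
        starts.append(next_page)
        next_page += count  # the following document continues after this one
    return starts


def with_page_counter(tex_source, start_page):
    """Insert a page counter command right after the document start."""
    marker = r"\begin{document}"
    command = "\n" + r"\setcounter{page}{%d}" % start_page
    # Replace only the first occurrence, in case the marker is quoted later.
    return tex_source.replace(marker, marker + command, 1)


# Three reports of 2, 1 and 2 pages, placed after 12 pages of dissertation.
print(starting_pages([2, 1, 2], first_page=13))  # [13, 15, 16]
```

Because the counts are known up front in this variant, the loop terminates once every report has been processed.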
Have a look at the combine package, which seems to be exactly what you're searching for.
Since it merges documents at the source level, I guess the page numbers will be correct.
