Tex to Word Pipeline With Reference File

I would like to convert my Overleaf template to a Word document so that my collaborators can edit it directly outside of Overleaf. I am aware that Pandoc can convert the .tex file to Word:
pandoc -o Test.docx Test.tex
However, my tex document takes its references from a separate .bib file, and those are lost in the conversion.
My File Structure:
- Project
|
|-- Test.tex
|-- References.bib
|-- Test.pdf
Test.pdf does include the references. Is there an option to include references in pandoc, or does anyone have an open-source PDF-to-Word converter they recommend?

I did some research concerning this topic (LaTeX to Word conversion) recently. It seems you can generate the pdf file and then directly open it in Word. The only problem is that hyperlinks and cross-references will be lost. See here.
Although you asked for an open source option, I suppose you (or your collaborators) have access to Word, since you want to convert to it.
It is also not clear if you need the hyperlinks to be preserved.
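On the pandoc side of the question, a minimal sketch, assuming pandoc 2.11 or later (where the citation processor is built in as --citeproc) and the file names from the question. The helper names pandoc_docx_command and convert are made up for illustration. Pandoc's own default citation style is used here; a --csl option could be added to change it.

```python
import subprocess


def pandoc_docx_command(tex_file, bib_file, docx_file):
    """Build a pandoc call that resolves citations against a .bib file."""
    return [
        "pandoc", tex_file,
        "--citeproc",                     # built-in citation processor (pandoc 2.11+)
        f"--bibliography={bib_file}",     # the separate BibTeX file
        "-o", docx_file,                  # output format is inferred from .docx
    ]


def convert(tex_file, bib_file, docx_file):
    """Run the conversion; assumes pandoc is installed and on the PATH."""
    # check=True raises CalledProcessError if pandoc exits with an error
    subprocess.run(pandoc_docx_command(tex_file, bib_file, docx_file), check=True)
```

Calling convert("Test.tex", "References.bib", "Test.docx") from the Project folder would then run the equivalent of pandoc Test.tex --citeproc --bibliography=References.bib -o Test.docx.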

Related

Rmarkdown with pandoc templates, apply lua filter on intermediate .tex

I'm trying to use lua filters to capture images in my manuscript and list their caption in a special \section at the end of it.
I am working on a rmarkdown document that itself uses a .tex template.
I wasn't able to get anywhere, so I ran a very simple filter:
function Header (head) print(pandoc.utils.stringify(head)) end
and noticed that only the headers in the markdown were recognized, not the ones in the template.
The only way I found to have lua filters recognize the elements in the template was to rerun the produced .tex file with pandoc:
pandoc -f latex -t latex -o test2.tex --lua-filter=my_filters.lua test.tex
but that removed all latex formatting and structure content outside the body, e.g., \documentclass, \usepackage and other custom commands. So it's a no go.
So the question is, is there a way to force lua filter to be applied after the integration of a latex template when knitting a rmarkdown document?
There might be a way, but it most likely won't do what you need.
When pandoc reads a document, it parses it and converts it into its internal data structure. That internal structure can then be modified with a filter. LaTeX is a very expressive and complex document format, and any conversion from LaTeX into pandoc's internal format will result in a loss of (layout) information. That's good enough in most cases, but would be a problem in your case.
There are two possible ways to do this: one is to post-process the output, which is probably tedious and error-prone. The other is to find a way to generate the desired output, e.g. via a pandoc filter, without adding it to the template first.
I believe your other question is the right way to go.
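To make the "internal data structure" point concrete, here is a rough sketch of the same header-printing filter written as a JSON filter in Python, using only the standard library. It assumes pandoc's JSON AST layout, where a Header block carries [level, attributes, inlines]; the function names are illustrative. A filter like this only ever sees the blocks parsed from the input document, because pandoc applies the template when it writes the output, after filters have run.

```python
import json
import sys


def inline_text(inlines):
    """Flatten pandoc inline elements to plain text (handles Str and spaces only)."""
    parts = []
    for el in inlines:
        if el["t"] == "Str":
            parts.append(el["c"])
        elif el["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def header_texts(doc):
    """Return the text of every top-level Header block in a pandoc JSON AST."""
    return [
        inline_text(block["c"][2])  # Header content is [level, attr, inlines]
        for block in doc["blocks"]
        if block["t"] == "Header"
    ]


def run_filter(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Read the AST from stdin, report headers on stderr, pass the AST through."""
    doc = json.load(stdin)
    for text in header_texts(doc):
        print(text, file=stderr)
    json.dump(doc, stdout)
```

To use it as a filter, the script would call run_filter() at the end and be passed to pandoc with --filter. Headers that exist only in the .tex template never show up in doc["blocks"], which matches what the question observed.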

Converting Asciidoc to LaTeX

I want to convert Asciidoc to LaTeX, then use an existing toolchain that includes LaTeX modules to convert the resulting document further to the final format. Asciidoc's native LaTeX conversion is "experimental" according to their documentation, and it also doesn't work for me. There is another toolchain supported by Asciidoc, which is converting to Docbook first, then use dblatex to convert it further. However, it includes a lot of formatting in its LaTeX output, which clashes with the formatting of my toolchain.
Is there any way to convert Asciidoc to LaTeX so that the content is included in the resulting document, but without any exact formatting rules (except those explicitly specified in the document)? I don't want the LaTeX result to contain any information about fonts, page layout and so on, because for those I already have a toolchain.
I get acceptable, almost good results with this toolchain, which uses pandoc as the converter:
Edit your document in asciidoc or asciidoctor.
Convert your document to DocBook: asciidoctor -b docbook5 <your asciidoc document>
Convert your DocBook document to PDF via (xe)latex using pandoc: pandoc -f docbook <your docbook document> -o <your output>.pdf --pdf-engine=xelatex
You can customize your LaTeX layout and modules in a pandoc configuration file, or convert your DocBook file into a LaTeX file with pandoc instead. The converted LaTeX file is quite clean (because its source is DocBook).
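The last option (DocBook to a LaTeX file rather than a PDF) is the one closest to the question, so here is a small sketch of the two calls. It assumes asciidoctor and pandoc are on the PATH; the function names and the .xml/.tex naming scheme are my own choices. Leaving out pandoc's -s flag gives a LaTeX fragment with no preamble, so fonts and page layout stay with the existing toolchain.

```python
import subprocess
from pathlib import Path


def conversion_commands(adoc_file):
    """Return the two commands: AsciiDoc -> DocBook 5, then DocBook -> LaTeX."""
    adoc = Path(adoc_file)
    docbook = adoc.with_suffix(".xml")   # intermediate DocBook file
    tex = adoc.with_suffix(".tex")       # final LaTeX fragment
    return [
        ["asciidoctor", "-b", "docbook5", "-o", str(docbook), str(adoc)],
        # no -s/--standalone: pandoc emits the body only, without a preamble
        ["pandoc", "-f", "docbook", "-t", "latex", "-o", str(tex), str(docbook)],
    ]


def convert(adoc_file):
    """Run both steps in order, stopping at the first failure."""
    for cmd in conversion_commands(adoc_file):
        subprocess.run(cmd, check=True)
```

The resulting .tex file can then be pulled into a document that already defines the layout, for example with \input.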

Combine rmarkdown generated latex documents in Rstudio without having to manually delete preambles

The problem:
I have several Rmarkdown documents representing different sections of my thesis (e.g. chapter 1 results, chapter 2 results), and I would like to combine the knitted latex versions of these documents with other TeX files into a master TeX 'thesis' document. The problem is, that when knitting to PDF and keeping the TeX file to include in the master TeX document, Rstudio auto-generates a bunch of preamble that ends up clashing with my preamble in the master.TeX document.
A less-than-ideal workaround:
I have been working around this issue by deleting this preamble by hand before including the knitted output into the master.tex file.
My current workflow is as follows:
Set the output to pdf_document in the YAML header, with the keep_tex option set to true.
Knit to PDF using the RStudio button.
Delete the preamble at the beginning of the knitted TeX files by hand.
Include the files into my master.tex file using \include{}.
The question:
I would like to convert Rmd to LaTeX, without having to delete the preamble manually before including the knitted TeX files into a master TeX file. How do I do this?
Sweave is not an option:
I'm sure I could achieve this by working with Sweave, however I like the flexibility of output formats that Rmarkdown offers: I can knit to html and view in my browser as I'm writing, to quickly view the progress of my work and ensure that there are no bugs, and I can also choose to knit to word to share with my supervisor who works only in word. Then, when I'm ready to compile everything into my master LaTeX document for printing, I can select knit to PDF in Rstudio to obtain the latex output.
I have since come up with a workaround:
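create_latex <- function(f) {
  # Knit the Rmd file to an intermediate markdown file
  knitr::knit(f, "tmp-outputfile.md")
  newname <- paste0(tools::file_path_sans_ext(f), ".tex")
  # No -s flag, so pandoc writes a LaTeX fragment without a preamble
  mess <- paste("pandoc -f markdown -t latex -p -o", shQuote(newname), "tmp-outputfile.md")
  system(mess)
}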
The function above takes an Rmd file as its input, knits the file to md, and then converts it to TeX by calling pandoc from the command-line.
The secret ingredient lies in the pandoc call... When knitting using Rstudio, Rstudio must be calling the pandoc standalone -s flag when it compiles the pdf. This generates a 'standalone' document, i.e. one that contains latex preamble. This is obviously necessary when you want to view the PDF, but conflicts with my needs.
I am instead seeking to generate a latex 'fragment' from Rmd with knitr, that can be later incorporated into my master Latex file. So the solution was simply to create a pandoc command line call that omits the -s standalone flag. I achieved this by calling pandoc from R with system() inside the above function.
Hope this helps anyone out there having this problem, but would be great if I was able to change Rstudio's settings to avoid bothering with this hack. Suggestions and feedback welcome.
Here is a simpler solution using the LaTeX package docmute.
Your main Rmarkdown document (e.g., main.Rmd) should load the docmute package and include your other files (e.g., part1.Rmd and part2.Rmd), once knitted and located in the same directory, with \input or \include, like this:
---
title: "Main"
output: pdf_document
header-includes:
- \usepackage{docmute}
---
# Part 1
\input{part1.tex}
# Part 2
\input{part2.tex}
Then your other files just need to have a keep_tex argument in the YAML front matter. Like this:
---
title: "Part 1"
output:
  pdf_document:
    keep_tex: true
---
Something
and this:
---
title: "Part 2"
output:
  pdf_document:
    keep_tex: true
---
Something else
Then all you need to do is knit the part*.Rmd files before main.Rmd, and all the content will appear in main.tex and main.pdf. The docmute package ignores the preamble of the input .tex files, so you will need to make sure anything you need before \begin{document} in the input files is also in main.Rmd.
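To see what that means for a knitted file, the following sketch mimics the effect outside LaTeX: it keeps only the material between \begin{document} and \end{document}. This is an illustration of the idea, not how docmute is implemented, and extract_body is a made-up helper name.

```python
def extract_body(tex_source):
    """Return the text between \\begin{document} and \\end{document}."""
    begin = r"\begin{document}"
    end = r"\end{document}"
    start = tex_source.find(begin)
    stop = tex_source.rfind(end)
    if start == -1 or stop == -1 or stop < start:
        raise ValueError("no complete document environment found")
    # Everything before \begin{document} (the preamble) is dropped,
    # which is why packages the body needs must be loaded by the main file.
    return tex_source[start + len(begin):stop].strip()
```

Anything a part file loads in its own preamble (packages, macros) is gone after this step, which is the reason for repeating it in main.Rmd.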

How to handle citations in Ipython Notebook?

What is the best way to take care of citations in Ipython Notebook? Ideally, I would like to have a bibtex file, and then, as in latex, have a list of shorthands in Ipython markdown cells, with the full references at the end of the notebook.
The relevant material I found is this: http://nbviewer.ipython.org/github/ipython/nbconvert-examples/blob/master/citations/Tutorial.ipynb
But I couldn't follow the documentation very well. Can anyone explain it? Thanks so much!!
Summary
This solution is largely based on Sylvain Deville's excellent blog post. It allows you to simply write [@citation_key] in markdown cells. The references will be formatted after document conversion. The only requirements are LaTeX and pandoc, which are both widely supported. While there is never a guarantee, this approach should therefore still work in many years' time.
Step-by-Step Guide
In addition to a working installation of jupyter you need:
LaTeX (installation guide).
Pandoc (installation guide).
A citation style language. Download a citation style, e.g., APA. Save the .csl file (e.g., apa.csl) into the same folder as your jupyter notebook (or specify the path to the .csl file later).
A .bib file with your references. I am using a sample bib file list.bib. Save to the same folder as your jupyter notebook (or specify the path to the .bib file later).
Once you completed these steps, the rest is easy:
Use markdown syntax for references in markdown cells in your jupyter notebook. E.g., [@Sh:1], where the syntax works like this: [@citationkey_in_bib_file]. I much prefer this syntax over other solutions because it is so fast to type [@something].
At the end of your ipython notebook, create a code cell with the following syntax to automatically convert your document (note that this is R code, use an equivalent command to system() for python):
#automatic document conversion to markdown and then to word
#first convert the ipython notebook paper.ipynb to markdown
system("jupyter nbconvert --to markdown paper.ipynb")
#next convert markdown to ms word
conversion <- paste0("pandoc -s paper.md -t docx -o paper.docx",
                     " --filter pandoc-citeproc",
                     " --bibliography=listb.bib",
                     " --csl=apa.csl")
system(conversion)
Run this cell (or simply run all cells). Note that the 2nd system call is simply pandoc -s paper.md -t docx -o paper.docx --filter pandoc-citeproc --bibliography=listb.bib --csl=apa.csl. I merely used paste0() to be able to spread this over multiple lines and make it nicer to read.
The output is a Word document. If you prefer another output format, check out this guide for alternative syntax.
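Since the cell above is R, here is a rough Python equivalent using subprocess instead of system(). File names are the ones from the R code; the function names and the builtin_citeproc switch are my own additions. The switch exists because pandoc 2.11 and later ship the citation processor as --citeproc, while older versions use the external pandoc-citeproc filter shown above.

```python
import subprocess


def nbconvert_command(notebook):
    """Step 1: convert the notebook to markdown."""
    return ["jupyter", "nbconvert", "--to", "markdown", notebook]


def pandoc_command(markdown, docx, bibliography, csl, builtin_citeproc=False):
    """Step 2: convert markdown to Word, resolving the citations."""
    # pandoc >= 2.11 has --citeproc built in; older versions need the filter
    citeproc = ["--citeproc"] if builtin_citeproc else ["--filter", "pandoc-citeproc"]
    return [
        "pandoc", "-s", markdown, "-t", "docx", "-o", docx,
        *citeproc,
        f"--bibliography={bibliography}",
        f"--csl={csl}",
    ]


def convert_notebook(notebook="paper.ipynb", markdown="paper.md", docx="paper.docx",
                     bibliography="listb.bib", csl="apa.csl", builtin_citeproc=False):
    """Run both steps; check=True stops on the first failing command."""
    subprocess.run(nbconvert_command(notebook), check=True)
    subprocess.run(pandoc_command(markdown, docx, bibliography, csl, builtin_citeproc),
                   check=True)
```

Putting convert_notebook() in the last code cell of a Python notebook should behave like the R cell, assuming jupyter and pandoc are both on the PATH.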
Extras
If you do not like that your converted document includes the code for the document conversion, insert a markdown cell above and below the code cell that does the conversion. In the cell above, enter <!-- and in the cell below enter -->. This is the regular HTML syntax for a comment, so the code between these two cells will be evaluated but not printed.
You can also include a yaml header in your first cell. E.g.,
---
title: This is a great title.
author: Author Name
abstract: This is a great abstract
---
You can use the Document Tools of the Calico suite, which can be installed separately with:
sudo ipython install-nbextension https://bitbucket.org/ipre/calico/downloads/calico-document-tools-1.0.zip
Read the tutorial and watch the YouTube video for more details.
Warning: only the cited references are processed. Therefore, if you fail to cite an article, it won't appear in the References section. As a little working example, copy the following in a Markdown cell and press the "book" icon.
<!--bibtex
@Article{PER-GRA:2007,
Author = {P\'erez, Fernando and Granger, Brian E.},
Title = {{IP}ython: a System for Interactive Scientific Computing},
Journal = {Computing in Science and Engineering},
Volume = {9},
Number = {3},
Pages = {21--29},
month = may,
year = 2007,
url = "http://ipython.org",
ISSN = "1521-9615",
doi = {10.1109/MCSE.2007.53},
publisher = {IEEE Computer Society},
}
@article{Papa2007,
author = {Papa, David A. and Markov, Igor L.},
journal = {Approximation algorithms and metaheuristics},
pages = {1--38},
title = {{Hypergraph partitioning and clustering}},
url = {http://www.podload.org/pubs/book/part\_survey.pdf},
year = {2007}
}
-->
Examples of citations: [CITE](#cite-PER-GRA:2007) or [CITE](#cite-Papa2007).
This should result in the following added Markdown cell:
References
^ Pérez, Fernando and Granger, Brian E.. 2007. IPython: a System for Interactive Scientific Computing. URL
^ Papa, David A. and Markov, Igor L.. 2007. Hypergraph partitioning and clustering. URL
I was able to run it with the following approach:
Insert the html citation as in the tutorial you mentioned.
Create ipython.bib in the "standard" bibtex format. It goes into the same directory as your *.ipynb notebook file.
Create the template file as in the tutorial, also in the same directory or else in the (distribution dependent) directory with the other templates. On my system, that's /usr/local/lib/python2.7/dist-packages/IPython/nbconvert/templates/latex.
The tutorial has the template extend latex_article.tplx. On my distribution, it's article.tplx (without latex_).
Run nbconvert with --to latex; that generates an .aux file among other things. Latex will complain about missing references.
Run bibtex yournotebook.aux; this generates yournotebook.bbl. You only need to re-run this if you change references.
Re-run nbconvert either with --to latex or with --to pdf. This generates a .tex file, or else runs all the way to a .pdf.
If you want html output, you can use pandoc to assemble the references into a tidy citation page. This may require some hand-editing to make an html page you can reference from your main document.
If you know that you will be converting your notebook to latex anyway, consider simply adding a "Raw" cell (Ctrl+M R) to the end of the document, containing the bibliography just as you would put it in pure LaTeX.
For example, when I need to reference a couple of external links, I would not even care to do a proper BibTeX thing and simply have a "Raw" cell at the end of the notebook like that:
\begin{thebibliography}{1}
\bibitem{post1}
Holography in Simple Terms. K.Tretyakov (blog post), 2015.\\
\url{http://fouryears.eu/2015/07/24/holography-in-simple-terms/}
\bibitem{book1}
The Importance of Citations. J. Smith. 2010.
\end{thebibliography}
The items can be cited in other Markdown cells using the usual <cite data-cite="post1">(KT, 2015)</cite>
Of course, you can also use proper BibTeX as well. Just add the corresponding Raw cell, e.g:
\bibliographystyle{unsrt}
\bibliography{papers}
This way you do not have to bother editing a separate template file (at the price of cluttering the notebook's HTML export with raw Latex, though).
You should have a look at the latex_envs extension in https://github.com/ipython-contrib/IPython-notebook-extensions (install from this repo, it is the most recent version). This extension contains a way to integrate bibliography using bibtex files and standard latex notation, and generates a bibliography section at the end of the notebook. Style of citations can be (to some extent) customized. Some documentation here https://rawgit.com/jfbercher/latex_envs/master/doc/latex_env_doc.html

Convert EPS to PDF on the fly with pdflatex

I'm trying to include an EPS figure in a document that will be compiled using pdflatex. Of course, the picture can be converted to pdf using epstopdf (which comes with the MikTeX distribution). Is there any way to do this on the fly, that is, make pdflatex do the conversion?
I'm looking for such a solution because I want to set up an easy-to-learn environment for students. Ideally, the converted picture is placed in the directory that also contains the original .eps, and the .pdf is used if available.
The relevant answer in the TeX FAQ points to epstopdf.sty, included with Heiko Oberdiek's packages.
I would recommend using latex-mk which is a nice way to have a very simple Makefile for latex construction. Of course you can have eps file converted to pdf, or fig to eps, etc, during the build process.
Currently my Makefile looks like this:
NAME=report
TEXSRCS=report.tex
BIBTEXSRCS=biblio.bib
USE_PDFLATEX=true
VIEWPDF=open # cause i'm on osx, gv for most unix
XFIGDIRS=img
## For osx users :
include /opt/local/share/latex-mk/latex.gmk
## For unix users :
#include /usr/share/latex-mk/latex.gmk
When I invoke make, the first thing it does is convert some .fig files into .pdf files. I'm pretty sure it would do the same with .eps files.
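The same build-step idea can be sketched without latex-mk. The script below looks for .eps files whose .pdf sibling is missing or older and converts only those, so the .pdf ends up next to the original .eps and is reused when it is up to date. It assumes the epstopdf command-line tool is on the PATH; the function names are made up for this example.

```python
import subprocess
from pathlib import Path


def stale_eps_files(directory):
    """Return .eps files whose .pdf sibling is missing or older than the .eps."""
    stale = []
    for eps in sorted(Path(directory).glob("*.eps")):
        pdf = eps.with_suffix(".pdf")
        if not pdf.exists() or pdf.stat().st_mtime < eps.stat().st_mtime:
            stale.append(eps)
    return stale


def convert_stale(directory):
    """Convert only the out-of-date figures, then report what was done."""
    converted = stale_eps_files(directory)
    for eps in converted:
        # By default epstopdf writes the output next to the input,
        # replacing the .eps extension with .pdf.
        subprocess.run(["epstopdf", str(eps)], check=True)
    return converted
```

Running convert_stale("img") before pdflatex would play the role of the conversion step that make performs in the Makefile above.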
If you want to include an EPS figure in LaTeX, first make sure the figure is actually in EPS format. For example, if your figure is a .jpeg, you need to convert it to .eps first.
Then include it in the document with the usual figure code, and add \usepackage{epstopdf} to the preamble so that the figure is converted to PDF during compilation.
I was also facing this problem and found the post "How to Convert .eps to PDF in Latex ?" very helpful.
Now I am able to include EPS figures in LaTeX and convert them to PDF. The post above has all the details. Let me know if you run into any further problems.
