Get pandoc to skip certain sections of document - latex

OK, so I'm converting Word documents to LaTeX using pandoc.
Embedded in my Word documents are some LaTeX commands that I've constructed using my reference manager to generate BibTeX citations automatically (e.g. \citep{Author1_Author2_Year}).
This would work perfectly, but I need pandoc to skip over these parts of the document, because it currently escapes the characters for LaTeX (e.g. the command is converted to \textbackslash{}citep\{Author1\_Author2\_Year\}).
So, what I want to do is tell pandoc to omit conversion each time it reaches a \citep{*} command. I've read the manual, but there doesn't seem to be a straightforward way of doing this.
Am I missing something obvious here? I'm trying to avoid writing another script to go over the tex document and amend the citations.
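For comparison, the post-processing script the question hopes to avoid can be quite short. This is a hypothetical Python sketch, not something from the thread. It assumes pandoc wrote each command in the escaped form \textbackslash{}citep\{...\} with underscores written as \_, so check your own .tex output before relying on it.

```python
import re
import sys

# Assumed escaped form of \citep{Author1_Author2_Year} in pandoc's LaTeX output:
#   \textbackslash{}citep\{Author1\_Author2\_Year\}
ESCAPED_CITEP = re.compile(r"\\textbackslash\{\}citep\\\{(.+?)\\\}")


def unescape_citations(tex: str) -> str:
    """Turn escaped \\citep commands back into real LaTeX \\citep commands."""
    def repl(match):
        key = match.group(1).replace(r"\_", "_")  # undo the underscore escaping
        return r"\citep{" + key + "}"
    return ESCAPED_CITEP.sub(repl, tex)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python fix_citep.py three.tex > three-fixed.tex
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(unescape_citations(f.read()))
```

A function is passed to re.sub so that the backslash in the replacement is inserted literally rather than being interpreted as an escape.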

If you use the pandoc syntax for citations and run pandoc as follows, it will work:
In your one.docx you write:
I want to cite @knuth1984texbook.
With
pandoc -o two.md one.docx
you prepare your Markdown document. Then you create your LaTeX document:
pandoc --biblatex -o three.tex two.md
and you get three.tex:
I want to cite \textcite{knuth1984texbook}.
The preamble and postamble are missing. So with
pandoc --standalone --biblatex -o three.tex two.md
you get a complete LaTeX file that you can compile with latex.
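For Word documents like the ones in the question, which already contain literal \citep{key} text, one hypothetical bridge is to rewrite those commands into pandoc's [@key] syntax in two.md before the second step. With --natbib instead of --biblatex, pandoc writes a bracketed citation such as [@key] as \citep{key}. This Python sketch assumes the commands reach two.md as literal text, possibly with an extra backslash or escaped underscores added by pandoc's Markdown writer; inspect the file first, because other escaping (for example of the braces) is not handled.

```python
import re

# \citep{key1,key2} as it might appear in two.md. The optional second
# backslash and the \_ handling allow for escaping by pandoc's Markdown writer.
CITEP = re.compile(r"\\{1,2}citep\{([^}]+)\}")


def citep_to_pandoc(markdown: str) -> str:
    """Rewrite \\citep{a,b} as the pandoc citation [@a; @b]."""
    def repl(match):
        keys = [k.strip().replace("\\_", "_") for k in match.group(1).split(",")]
        return "[" + "; ".join("@" + k for k in keys if k) + "]"
    return CITEP.sub(repl, markdown)
```

After writing the rewritten text back to two.md, the conversion would then be, for example, pandoc --natbib -o three.tex two.md.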

Related

Rmarkdown with pandoc templates, apply lua filter on intermediate .tex

I'm trying to use Lua filters to capture images in my manuscript and list their captions in a special \section at the end of it.
I am working on an R Markdown document that itself uses a .tex template.
I wasn't able to get anywhere, so I ran a very simple filter:
function Header (head) print(pandoc.utils.stringify(head)) end
and noticed that only the headers in the Markdown were recognized, not the ones in the template.
The only way I found to have Lua filters recognize the elements in the template was to run the produced .tex file through pandoc again:
pandoc -f latex -t latex -o test2.tex --lua-filter=my_filters.lua test.tex
but that stripped all LaTeX formatting and all the structure outside the document body, e.g. \documentclass, \usepackage and other custom commands. So it's a no-go.
So the question is: is there a way to force a Lua filter to be applied after the LaTeX template has been integrated when knitting an R Markdown document?
There might be a way, but it most likely won't do what you need.
When pandoc reads a document, it parses it and converts it into its internal data structure. That internal structure can then be modified with a filter. LaTeX is a very expressive and complex document format, and any conversion from LaTeX into pandoc's internal format will result in a loss of (layout) information. That's good enough in most cases, but would be a problem in yours.
There are two possible ways to do this: one is to post-process the output, which is probably tedious and error-prone. The other is to find a way to generate the desired output, e.g. via a pandoc filter, without adding it to the template first.
I believe your other question is the right way to go.
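The post-processing route mentioned in the answer could look roughly like this hypothetical Python sketch. It scans the finished .tex for \caption{...} arguments (tracking nested braces) and appends a \section listing them just before \end{document}. The section title is an assumption, and the sketch skips captions written with an optional short form (\caption[short]{long}) and does not understand escaped braces, which illustrates why the answer calls this approach error-prone.

```python
from __future__ import annotations

import re


def braced_group(text: str, start: int) -> tuple[str, int]:
    """Return the contents of the {...} group opening at text[start] and the
    index just past its closing brace. Nested braces are tracked; escaped
    braces such as \\{ are not handled."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced braces after index %d" % start)


def append_caption_section(tex: str, title: str = "Figure captions") -> str:
    """Collect every \\caption{...} argument and list the captions in a new
    \\section placed just before \\end{document}."""
    captions = [braced_group(tex, m.end() - 1)[0]
                for m in re.finditer(r"\\caption\s*\{", tex)]
    if not captions:
        return tex
    section = ("\\section{" + title + "}\n\\begin{enumerate}\n"
               + "".join("\\item " + c + "\n" for c in captions)
               + "\\end{enumerate}\n")
    end = tex.rfind("\\end{document}")
    if end == -1:  # no \end{document}: just append the section
        return tex + section
    return tex[:end] + section + tex[end:]
```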

Pandoc - Support for a custom input latex environment

I am using a single unified LaTeX doc to create problem sets and solutions:
\item What is one plus one?
\begin{soln}
The answer is "two".
\end{soln}
In LaTeX, I define this environment with (simplified):
\NewEnviron{soln}
{
\ifsolutions\expandafter
\BODY
\fi
}
That is, if \solutionsfalse has been set in LaTeX, it prints:
1. What is one plus one?
and if \solutionstrue has been defined, it prints:
1. What is one plus one?
** The answer is two **
I'm trying to replicate this in pandoc to generate HTML or MD files from the LaTeX input, but I've run into a wall. Pandoc doesn't honor any kind of \if ... \else ... \fi statement in LaTeX, I think. Pandoc also doesn't honor the comment environment, which would otherwise work with \excludecomment{soln}. So I can't come up with a shim.tex file that would replicate "ignore everything in the soln environment".
The next way to go would, I guess, be to do something in LuaTeX that pandoc can talk to, or to define the custom environment for pandoc with a filter? But the documentation for those systems is extremely heavyweight: there's no easy way in.
Can anyone suggest a solution to this?
Ideally, I want to run two different shell commands. Command A should omit all content in the soln environment. Command B, ideally, should turn all regular text blue, and show all content in the soln environment in black color.
(P.S. The xcolor package also seems unsupported in native pandoc, although there is a filter that doesn't work for me.)
Edit
Following comments by @tarleb and @mb21, I guess I have to try to work out how filters work. Again, the documentation here is terrible: it wants you to know everything before you can do anything.
I tried this:
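-- helper: true if str begins with the string start
local function starts_with(start, str)
  return str:sub(1, #start) == start
end
return {
  {
    RawBlock = function(elem)
      print(elem.text)  -- debug output for every raw block
      if starts_with('\\begin{soln}', elem.text) then
        return pandoc.RawBlock(elem.format, "SOLN")
      end
      return elem
    end,
  }
}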
and ran it with
pandoc --lua-filter ifdef.lua --mathjax -s hw01.tex -o hw01.html
But there is nothing on stdout from the print statement, and my document is unchanged, so the RawBlocks are apparently not passed to the Lua filter unless the -f latex+raw_tex flag is given. But passing that flag means that pandoc doesn't actually process the \include commands in the LaTeX, so my filter won't see the subdocuments.
Apparently, the answer is "No, pandoc cannot support a new LaTeX environment", because that would require modifying the parser. Although -f latex+raw_tex can disable big parts of the parser, that just means the document is largely unparsed, which isn't what I want.
Please tell me if I'm wrong.
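One workaround that sidesteps pandoc's parser entirely (not suggested in the thread) is to preprocess the source before pandoc reads it. This hypothetical Python sketch either deletes every soln environment (Command A) or unwraps it so the solution text stays as ordinary text; the blue/black colouring for Command B would still need a separate step, such as CSS or a filter, which is not shown. It assumes soln environments are never nested, and it only sees the file it is given, so it would have to be run over each \include'd file as well.

```python
import re
import sys

# Non-greedy match of one soln environment; assumes they are never nested.
SOLN = re.compile(r"\\begin\{soln\}(.*?)\\end\{soln\}", re.DOTALL)


def preprocess(tex: str, keep_solutions: bool) -> str:
    """Drop (keep_solutions=False) or unwrap (keep_solutions=True) every soln
    environment so that pandoc never sees the custom environment."""
    if keep_solutions:
        return SOLN.sub(lambda m: m.group(1), tex)  # keep only the body
    return SOLN.sub("", tex)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python soln.py {questions|solutions} hw01.tex > hw01-pre.tex
    mode, path = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8") as f:
        sys.stdout.write(preprocess(f.read(), keep_solutions=(mode == "solutions")))
```

With the hypothetical file name soln.py, Command A could then be python soln.py questions hw01.tex | pandoc -f latex --mathjax -s -o hw01.html; the -f latex is needed because pandoc cannot infer the input format from standard input.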

HTML homework created with pandoc from LaTeX includes the answers. Oops

I have a fairly large number (dozens) of homework assignments written in LaTeX that I compile to PDF. We have recently adopted an LMS (Canvas) that needs HTML files. Conversion to HTML is easy peasy with pandoc, using the following command.
pandoc -s core3-hw08.tex --mathjax -c test.css -o core3hw08.html
Unfortunately, I included the homework answers following \end{document} in the tex files. This works great with PDF because my command (pdflatex) ignores everything after the end of the document. Pandoc doesn't, and the result is that the homework answers are converted to HTML along with the homework questions. Oops.
I could create an entire additional set of homework tex files without the answers, but that seems a bad solution: two otherwise identical sets of files whose only difference is that one set has the answers and the other doesn't. I also want to keep the answers and questions in the same source document.
Is there any way to tell pandoc to ignore everything after \end{document}? I don't mind revising the source documents, which can be done with a script.
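Since the question allows a script, one hypothetical approach is to trim each file before pandoc reads it, mimicking pdflatex by keeping everything up to and including the first \end{document}. This Python sketch skips markers that appear inside % comments, but it does so naively (an escaped \% on the same line is not special-cased), and it would also stop at an \end{document} inside a verbatim block.

```python
import sys


def strip_after_end_document(tex: str) -> str:
    """Keep everything up to and including the first line containing
    \\end{document}, dropping the answers that follow it."""
    kept = []
    for line in tex.splitlines(keepends=True):
        kept.append(line)
        code = line.split("%", 1)[0]  # ignore the comment part of the line
        if "\\end{document}" in code:
            break
    return "".join(kept)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python strip_answers.py core3-hw08.tex > core3-hw08-questions.tex
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(strip_after_end_document(f.read()))
```

The trimmed text can also be piped straight into pandoc, e.g. python strip_answers.py core3-hw08.tex | pandoc -f latex -s --mathjax -c test.css -o core3hw08.html, where -f latex is needed because pandoc defaults to Markdown on standard input. The source files themselves stay unchanged.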

Remove citation from Latex

I am trying to edit a paper in LaTeX, but I am having problems with the References section. I deleted three citations from the reference file and removed the corresponding citation commands, like \cite{X...}, from the paper's content. But the citations still show up in the PDF. What would be a solution, or a better way to do this?
Check the *.log files: they should help you understand what's going on. Citation handling involves several steps (simplified version):
latex file.tex outputs a file.aux with the citations
bibtex file reads file.aux and outputs a file.bbl with the references
one or more further runs of latex file.tex are needed to get the article with both references and citations
The complete flow is shown in Goossens, Mittelbach, and Samarin (1994), The LaTeX Companion, Figure 12.1, p. 375.
I encountered the same issue and I resolved it by deleting all the build generated files, i.e. *.aux, *.log etc, and rebuilding the .tex file.
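The latex, bibtex, latex, latex cycle listed above, together with the clean-up described in the last answer, can be scripted. This is a hypothetical Python sketch: the set of generated extensions is an assumption (adjust it to your build), and it calls latex as in the steps above, so substitute pdflatex if that is what you use. bibtex is given the base name because it reads file.aux.

```python
import subprocess
from pathlib import Path

# Files produced by latex/bibtex; deleting them forces a clean rebuild.
GENERATED = (".aux", ".bbl", ".blg", ".log", ".out", ".toc")


def clean(stem: str, directory: str = ".") -> list:
    """Delete the generated files for <stem>.tex and return the names removed."""
    removed = []
    for suffix in GENERATED:
        path = Path(directory) / (stem + suffix)
        if path.exists():
            path.unlink()
            removed.append(path.name)
    return removed


def build_commands(stem: str) -> list:
    """The latex -> bibtex -> latex -> latex cycle as argument lists."""
    return [
        ["latex", stem + ".tex"],
        ["bibtex", stem],  # bibtex reads <stem>.aux
        ["latex", stem + ".tex"],
        ["latex", stem + ".tex"],
    ]


def rebuild(stem: str) -> None:
    """Clean stale files, then run every step, stopping at the first failure."""
    clean(stem)
    for cmd in build_commands(stem):
        subprocess.run(cmd, check=True)
```

Calling rebuild("paper") in the directory containing paper.tex would run the whole cycle from scratch, so a stale .bbl can no longer bring back deleted references.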

How to handle citations in Ipython Notebook?

What is the best way to take care of citations in IPython Notebook? Ideally, I would like to have a BibTeX file and then, as in LaTeX, use a list of shorthands in IPython Markdown cells, with the full references at the end of the notebook.
The relevant material I found is this: http://nbviewer.ipython.org/github/ipython/nbconvert-examples/blob/master/citations/Tutorial.ipynb
But I couldn't follow the documentation very well. Can anyone explain it? Thanks so much!!
Summary
This solution is largely based on Sylvain Deville's excellent blog post. It allows you to simply write [@citation_key] in Markdown cells. The references will be formatted after document conversion. The only requirements are LaTeX and pandoc, which are both widely supported. While there is never a guarantee, this approach should therefore still work in many years' time.
Step-by-Step Guide
In addition to a working installation of jupyter you need:
LaTeX (installation guide).
Pandoc (installation guide).
A Citation Style Language (CSL) style. Download a citation style, e.g., APA, and save the .csl file (e.g., apa.csl) into the same folder as your Jupyter notebook (or specify the path to the .csl file later).
A .bib file with your references. I am using a sample bib file, listb.bib. Save it to the same folder as your Jupyter notebook (or specify the path to the .bib file later).
Once you completed these steps, the rest is easy:
Use Markdown syntax for references in the Markdown cells of your Jupyter notebook, e.g. [@Sh:1], where the syntax is [@citationkey_in_bib_file]. I much prefer this syntax over other solutions because [@something] is so fast to type.
At the end of your IPython notebook, create a code cell with the following syntax to convert your document automatically (note that this is R code; in Python, use an equivalent of system() such as subprocess.run):
# automatic document conversion to Markdown and then to Word
# first convert the IPython notebook paper.ipynb to Markdown
system("jupyter nbconvert --to markdown paper.ipynb")
# next convert Markdown to MS Word
conversion <- paste0("pandoc -s paper.md -t docx -o paper.docx",
  " --filter pandoc-citeproc",
  " --bibliography=listb.bib",
  " --csl=apa.csl")
system(conversion)
Run this cell (or simply run all cells). Note that the 2nd system call is simply pandoc -s paper.md -t docx -o paper.docx --filter pandoc-citeproc --bibliography=listb.bib --csl=apa.csl. I merely used paste0() to be able to spread this over multiple lines and make it nicer to read.
The output is a Word document. If you prefer another output format, check out this guide for alternative syntax.
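In a Python kernel, the closest equivalent of R's system() is subprocess.run. This hedged sketch mirrors the two calls from the R cell with the same file names (paper.ipynb, listb.bib, apa.csl). The builtin_citeproc switch is an addition not in the answer: pandoc 2.11 and later have a built-in --citeproc option that replaces the separate pandoc-citeproc filter.

```python
import subprocess


def conversion_commands(notebook: str = "paper.ipynb",
                        bibliography: str = "listb.bib",
                        csl: str = "apa.csl",
                        builtin_citeproc: bool = False) -> list:
    """Mirror the two system() calls from the R cell as argument lists."""
    stem = notebook.rsplit(".", 1)[0]
    # Newer pandoc (2.11+) uses --citeproc instead of the pandoc-citeproc filter.
    citeproc = ["--citeproc"] if builtin_citeproc else ["--filter", "pandoc-citeproc"]
    return [
        ["jupyter", "nbconvert", "--to", "markdown", notebook],
        ["pandoc", "-s", stem + ".md", "-t", "docx", "-o", stem + ".docx",
         *citeproc,
         "--bibliography=" + bibliography,
         "--csl=" + csl],
    ]


def convert(**kwargs) -> None:
    """Run both conversion steps, stopping at the first failing command."""
    for cmd in conversion_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Passing argument lists instead of one shell string avoids the quoting problems that a single concatenated command line can run into.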
Extras
If you do not want your converted document to include the code for the document conversion, insert a Markdown cell above and below the code cell containing the conversion. In the cell above, enter <!-- and in the cell below enter -->. This is a regular HTML comment, so the code between these two cells will be evaluated but not printed.
You can also include a yaml header in your first cell. E.g.,
---
title: This is a great title.
author: Author Name
abstract: This is a great abstract
---
You can use the Document Tools of the Calico suite, which can be installed separately with:
sudo ipython install-nbextension https://bitbucket.org/ipre/calico/downloads/calico-document-tools-1.0.zip
Read the tutorial and watch the YouTube video for more details.
Warning: only the cited references are processed. Therefore, if you fail to cite an article, it won't appear in the References section. As a small working example, copy the following into a Markdown cell and press the "book" icon.
<!--bibtex
@Article{PER-GRA:2007,
Author = {P\'erez, Fernando and Granger, Brian E.},
Title = {{IP}ython: a System for Interactive Scientific Computing},
Journal = {Computing in Science and Engineering},
Volume = {9},
Number = {3},
Pages = {21--29},
month = may,
year = 2007,
url = "http://ipython.org",
ISSN = "1521-9615",
doi = {10.1109/MCSE.2007.53},
publisher = {IEEE Computer Society},
}
@article{Papa2007,
author = {Papa, David A. and Markov, Igor L.},
journal = {Approximation algorithms and metaheuristics},
pages = {1--38},
title = {{Hypergraph partitioning and clustering}},
url = {http://www.podload.org/pubs/book/part\_survey.pdf},
year = {2007}
}
-->
Examples of citations: [CITE](#cite-PER-GRA:2007) or [CITE](#cite-Papa2007).
This should result in the following added Markdown cell:
References
^ Pérez, Fernando and Granger, Brian E.. 2007. IPython: a System for Interactive Scientific Computing. URL
^ Papa, David A. and Markov, Igor L.. 2007. Hypergraph partitioning and clustering. URL
I was able to run it with the following approach:
Insert the html citation as in the tutorial you mentioned.
Create ipython.bib in the "standard" BibTeX format. It goes into the same directory as your *.ipynb notebook file.
Create the template file as in the tutorial, also in the same directory or else in the (distribution dependent) directory with the other templates. On my system, that's /usr/local/lib/python2.7/dist-packages/IPython/nbconvert/templates/latex.
The tutorial has the template extend latex_article.tplx. On my distribution, it's article.tplx (without latex_).
Run nbconvert with --to latex; that generates an .aux file among other things. Latex will complain about missing references.
Run bibtex yournotebook.aux; this generates yournotebook.bbl. You only need to re-run this if you change references.
Re-run nbconvert either with --to latex or with --to pdf. This generates a .tex file, or else runs all the way to a .pdf.
If you want html output, you can use pandoc to assemble the references into a tidy citation page. This may require some hand-editing to make an html page you can reference from your main document.
If you know that you will be converting your notebook to latex anyway, consider simply adding a "Raw" cell (Ctrl+M R) to the end of the document, containing the bibliography just as you would put it in pure LaTeX.
For example, when I need to reference a couple of external links, I don't even bother with a proper BibTeX setup and simply add a "Raw" cell like this at the end of the notebook:
\begin{thebibliography}{1}
\bibitem{post1}
Holography in Simple Terms. K.Tretyakov (blog post), 2015.\\
\url{http://fouryears.eu/2015/07/24/holography-in-simple-terms/}
\bibitem{book1}
The Importance of Citations. J. Smith. 2010.
\end{thebibliography}
The items can be cited in other Markdown cells using the usual <cite data-cite="post1">(KT, 2015)</cite> syntax.
Of course, you can also use proper BibTeX. Just add the corresponding Raw cell, e.g.:
\bibliographystyle{unsrt}
\bibliography{papers}
This way you do not have to bother editing a separate template file (at the price of cluttering the notebook's HTML export with raw Latex, though).
You should have a look at the latex_envs extension in https://github.com/ipython-contrib/IPython-notebook-extensions (install it from this repo; it is the most recent version). This extension lets you integrate a bibliography using BibTeX files and standard LaTeX notation, and generates a bibliography section at the end of the notebook. The citation style can be customized to some extent. Some documentation is available here: https://rawgit.com/jfbercher/latex_envs/master/doc/latex_env_doc.html
