Pandoc generation of PDF from Markdown: 4th-level header is rendered differently (LaTeX)

I am using pandoc to generate a pdf from some markdown. I am using h1 through h4 via hash symbols. Example h1=#, h4=####. When I generate my document like this:
pandoc input.md -o output.pdf
I get a document where h1, h2 and h3 have a newline after them but h4 does not have a new line. The text starts on the same line as the header (it isn't formatted the same way, but there isn't a newline character between).
I've tried adding spaces after the #### and adding manual line returns using my editor but nothing seems to work.
Any ideas?

While the solutions mentioned work fine, pandoc offers a built-in variable, block-headings, which makes \paragraph (and \subparagraph) headings free-standing instead of run-in:
pandoc -s -o out.pdf some.md -V block-headings
(See the block-headings variable in the Pandoc Manual.)

pandoc generates PDFs via LaTeX. In LaTeX, "headers" are generated using the following commands:
\section
\subsection
\subsubsection
\paragraph
\subparagraph
As you can see, a "level four heading" corresponds to the \paragraph command, which is rendered as you describe. There simply isn't a \subsubsubsection command to use.
The only way to get what you want is to redefine the \paragraph command, which is quite tricky. I haven't been able to make it work with Pandoc.

While @tarleb’s answer is surely the best (except that it specifies a
wrong amount of vertical space), here is a “simpler” (by some measure)
but more hacky (in LaTeX terms at least) solution which optionally uses a Pandoc Lua filter or a LaTeX hack but avoids loading another LaTeX package.
We want the LaTeX source to look something like this:
\hypertarget{level-4-heading}{%
\paragraph{Level 4 heading}\label{level-4-heading}}
\hfill
Lorem ipsum dolor sit amet.
This LaTeX looks awful, but if you don’t need to keep or share the LaTeX
source it does what you probably want: a space between the level 4
heading and the paragraph after it equal to the space between a level 3
heading and the paragraph after it.
Here is how it works: since a \hfill on a line on its own is about as
close as you can get to an empty paragraph in LaTeX you get a first
paragraph — the one running in with the heading — containing only
horizontal white space until the end of the line, and then immediately
after a new paragraph — the actual first paragraph after the heading —
with just a normal paragraph space between it and the heading. This
probably also upsets LaTeX’s idea about what a \paragraph should be
like as little as possible.
The “manual” way to do this is as follows:
#### Level 4 heading
````{=latex}
\hfill
````
Lorem ipsum dolor sit amet.
This uses Pandoc’s relatively new raw markup syntax — the “code block”
is actually a raw LaTeX block — but it looks even more awful than the
resulting LaTeX source! It is also a tedious chore to have to insert
this after every level 4 heading. In other words you want to insert that
raw LaTeX automatically, and that can be done with a Lua filter:
--[======================================================================[
latex-h4-break.lua - Pandoc filter to get break after a level 4 heading.
Usage:
$ pandoc --lua-filter latex-h4-break.lua input.md -o output.pdf
--]======================================================================]
-- create it once, use it many times!
local hfill_block = pandoc.RawBlock('latex', '\\hfill')
function Header (elem)
  if 4 == elem.level then
    return { elem, hfill_block }
  else -- ignore headings at other levels!
    return nil
  end
end
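For readers who prefer Python to Lua, the same idea can be sketched as a plain JSON filter with no third-party package. This is a minimal sketch, assuming pandoc's JSON AST encoding (a Header as [level, attr, inlines], a RawBlock as [format, text]); it only looks at top-level blocks, so headings nested inside divs are not handled, and the name add_hfill_after_h4 is hypothetical.

```python
import json
import sys


def hfill_block():
    # A raw LaTeX block in pandoc's JSON AST: {"t": "RawBlock", "c": [format, text]}
    return {"t": "RawBlock", "c": ["latex", "\\hfill"]}


def add_hfill_after_h4(doc):
    """Insert a raw \\hfill block after every top-level level-4 Header."""
    new_blocks = []
    for block in doc["blocks"]:
        new_blocks.append(block)
        # A Header is encoded as {"t": "Header", "c": [level, attr, inlines]}
        if block.get("t") == "Header" and block["c"][0] == 4:
            new_blocks.append(hfill_block())
    doc["blocks"] = new_blocks
    return doc


# pandoc calls JSON filters with the target format as the first argument,
# sends the document as JSON on stdin and reads the result from stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(add_hfill_after_h4(json.load(sys.stdin)), sys.stdout)
```

Saved under a name of your choice (say latex_h4_break.py), it would be used as pandoc --filter latex_h4_break.py input.md -o output.pdf.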
However you can also do a simple LaTeX hack in a header-includes
metadata block to get the same effect:
---
header-includes:
- |
  ``` {=latex}
  \let\originAlParaGraph\paragraph
  \renewcommand{\paragraph}[1]{\originAlParaGraph{#1} \hfill}
  ```
---
#### Level 4 heading
Lorem ipsum dolor sit amet.
This works by first creating an “alias” of the \paragraph command and
then redefining the \paragraph command itself, using the alias in the
new definition, so that wherever the LaTeX source created by Pandoc
contains \paragraph{Foo} it is as if it had instead contained
\paragraph{Foo} \hfill, which does what we want with zero extra
dependencies! (In case you wonder, the wacky spelling of the “aliased”
command is there to minimize the risk that it collides with anything which
already exists, since the TeX \let command doesn’t check for that. We
certainly don’t want to overwrite any existing command!)
NOTE: If you want more or less space than a normal
paragraph break after the heading, just add an appropriate \vspace
command after the \hfill, e.g. \hfill \vspace{-0.5\parskip}.

Shifting Headers
Arguably the best way to solve this would be to avoid the problem altogether by shifting what a level 4 header corresponds to. The default for pandoc is to use \section commands for 1st level and \paragraph for 4th level headers. This can be altered via the --top-level-division parameter:
--top-level-division=[default|section|chapter|part]
Treat top-level headers as the given division type in LaTeX [...] output. The hierarchy order is part, chapter, then section; all headers are shifted such that the top-level header becomes the specified type. The default behavior is to determine the best division type via heuristics [...]
So with --top-level-division=chapter, a 4th-level header would be generated via the \subsubsection command.
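To make the shifting concrete, here is a small sketch of the mapping described above. It assumes the hierarchy part, chapter, section, subsection, subsubsection, paragraph, subparagraph, and it simply returns None for levels that fall below \subparagraph, which is a simplification of this sketch rather than a description of pandoc's output. The function name is hypothetical.

```python
# LaTeX sectioning commands, from the largest division to the smallest.
LATEX_DIVISIONS = ("part", "chapter", "section", "subsection",
                   "subsubsection", "paragraph", "subparagraph")


def latex_command_for(level, top_level_division="section"):
    """Return the sectioning command for a Markdown heading level (1 = #).

    top_level_division is one of "part", "chapter" or "section"; all levels
    are shifted so that level 1 becomes that division. Levels that fall
    below \\subparagraph return None (a simplification of this sketch).
    """
    if level < 1:
        raise ValueError("heading levels start at 1")
    index = LATEX_DIVISIONS.index(top_level_division) + level - 1
    if index >= len(LATEX_DIVISIONS):
        return None
    return "\\" + LATEX_DIVISIONS[index]
```

With the default, latex_command_for(4) gives \paragraph, while latex_command_for(4, "chapter") gives \subsubsection, matching the manual excerpt above.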
Styling via LaTeX
If this is not an option, the next best way is to configure the layout of the corresponding LaTeX command: for level-four headers, this is \paragraph by default. The following methods are taken from TeX StackExchange answers.
Please also check the answer by bpj, which is much simpler than what's proposed below.
Default document-classes
The default way would be to configure \paragraph via the titlesec package. We can use the header-includes metadata field for this, which pandoc will include in the intermediate LaTeX document.
---
header-includes: |
  ``` {=latex}
  \usepackage{titlesec}
  \titlespacing*{\paragraph}{0pt}{1ex}{-\parskip}
  \titleformat{\paragraph}[hang]
    {\normalfont\bfseries}
    {}
    {0pt}
    {}
  ```
---
KOMA document-classes
Using titlesec won't work properly for documents using KOMA classes (like scrartcl), as KOMA has its own ways of doing things. For these, use this alternative snippet:
---
documentclass: scrartcl
header-includes: |
  ``` {=latex}
  \makeatletter
  \renewcommand\paragraph{\@startsection{paragraph}{4}{\z@}%
    {-3.25ex \@plus -1ex \@minus -0.2ex}%
    {0.01pt}%
    {\raggedsection\normalfont\sectfont\nobreak\size@paragraph}%
  }
  \makeatother
  ```
---

I am not sure why, but this works for me:
put $\ \\ $ on the first line after your #### heading.

Related

Latex: \newcommand interpret # symbol problem

I defined a new command
\newcommand{\test}[1]{\href{https://github.com/microsoft/vscode/blob/main/package.json#1}{#1}}
When I use it as follows:
\test{#L4}
the url will be interpreted as:
https://github.com/microsoft/vscode/blob/main/package.json##L4
There is an extra # in the url, which is unexpected. What I really want is:
https://github.com/microsoft/vscode/blob/main/package.json#L4
which means line 4 of the package.json code
The easiest but not so elegant way to solve the problem is as follows:
\test{\#L4}
But what if other special characters like _ appear in the part of the URL I copy? It is tedious to escape every one of them by hand.
Is there any more elegant way to solve the problem? What I want is to paste the plain-text part of a URL into the LaTeX code without extra effort such as adding a \ escape character before # and other special characters.
I am afraid it will require extra effort, because # is the macro-parameter character in LaTeX. You can play with redefining category codes and restore the original meaning of # once it has been consumed.
I have another solution based on expl3 where I save the main address and create links with additional parts. See a small demo below
\documentclass{article}
\usepackage[colorlinks]{hyperref}
\usepackage[margin=1in]{geometry} % To fit long links without breaks
\ExplSyntaxOn
\str_new:N \l__xaddr_main_str
\str_new:N \l__xaddr_show_str
\cs_new:Npn \combine_addr:n #1 {
\str_set:Nn \l_tmpa_str {#1}
\str_concat:NNN \l__xaddr_show_str \l__xaddr_main_str \l_tmpa_str
}
\NewDocumentCommand\xsetmainaddr{v}{
\str_set:Nn \l__xaddr_main_str {#1}}
\NewDocumentCommand\xdisplay{v}{%
\str_if_empty:NF \l__xaddr_main_str {
\combine_addr:n {#1}
\str_use:N \l__xaddr_show_str
}}
\NewDocumentCommand\xhref{vv}{%
\str_if_empty:NF \l__xaddr_main_str {
\combine_addr:n {#1}
\href{\l__xaddr_show_str}{#2}
}}
\ExplSyntaxOff
\setlength\parindent{0pt} % To fit long links without breaks
\begin{document}
\xhref{#L4}{GitHub} % Nothing to display. The main address is not defined.
\xsetmainaddr{https://github.com/microsoft/vscode/blob/main/package.json}
\par\xdisplay{#L4} % Display the full address including #L4
\par\xdisplay{#something_else%20here} % all characters accepted
\par\xhref{#L4}{GitHub} % Generates link: <address>#L4
\bigskip
\xsetmainaddr{http://www.google.com} % New address
\par\xhref{}{Google} % Generates link
\end{document}

How to remove parentheses around pandoc citations?

Here's how I do my citations in latex
\documentclass[11pt]{article}
\usepackage[
backend=biber, style=apa,
giveninits=true, uniquename=init,
citestyle=authoryear]{biblatex}
\bibliography{references.bib}
\begin{document}
... catastrophic population declines (\cite{McIntyre__2015}).
\end{document}
I'm using pandoc to convert this to docx or odt so I can get track changes from colleagues.
pandoc ./main.tex -f latex -t odt --bibliography=./references.bib --csl ../apa.csl -o output.odt
However... in the resulting document, pandoc automatically surrounds every \cite call with an extra set of parentheses.
...catastrophic population declines ((McIntyre et al. 2015)).
I really like doing parentheses manually... is there a way for me to get pandoc to stop adding these extra citation parentheses?
I have the impression that this can be done with lua filters in pandoc... I was hoping someone could give me some pointers in the right direction on how to address this.
A Lua filter could be used to change the citation mode such that the parens are omitted:
function Cite(cite)
  for _, c in ipairs(cite.citations) do
    c.mode = pandoc.AuthorInText
  end
  return cite
end
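The same change can be sketched as a Python JSON filter without extra packages. This assumes pandoc's JSON encoding of a Cite as [citations, inlines] and of the citation mode as {"t": "AuthorInText"}; the function name is hypothetical. As with the Lua version, the filter has to appear before --citeproc on the command line.

```python
import json
import sys


def set_author_in_text(node):
    """Recursively switch every citation inside a Cite element to AuthorInText."""
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            # A Cite is encoded as {"t": "Cite", "c": [citations, inlines]}
            citations = node["c"][0]
            for citation in citations:
                citation["citationMode"] = {"t": "AuthorInText"}
        for value in node.values():
            set_author_in_text(value)
    elif isinstance(node, list):
        for item in node:
            set_author_in_text(item)
    return node


# pandoc passes the target format as argv[1] and the document on stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(set_author_in_text(json.load(sys.stdin)), sys.stdout)
```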
Make sure that the filter runs before citeproc, i.e., it must occur first in the call to pandoc:
pandoc --lua-filter=modify-citations.lua --citeproc ...
The alternative would be to change \cite into \citet.
The answer from tarleb didn't solve the question, but it did lead me to the right documentation.
I now understand that pandoc relies on CSL for the actual formatting of citations, while Lua filters can modify what kind of citation is used (author in text vs. author in parentheses).
<citation et-al-min="3" et-al-use-first="1" disambiguate-add-year-suffix="true" disambiguate-add-names="true" disambiguate-add-givenname="true" collapse="year" givenname-disambiguation-rule="primary-name-with-initials">
<layout prefix="" suffix="" delimiter="; ">
</layout>
</citation>
In my CSL doc, I just removed the parentheses from the prefix and suffix attributes of the <citation> <layout> node.
Now only my manually placed parentheses appear in the compiled doc.
...catastrophic population declines (McIntyre et al., 2015).
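If you would rather apply that CSL change with a script than by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the style uses the standard CSL namespace and has a single citation layout; the function name strip_citation_parens is hypothetical. Note that ElementTree drops XML comments and the XML declaration on output, so work on a copy of the style file.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"


def strip_citation_parens(csl_text):
    """Clear the prefix/suffix attributes of <citation><layout> in a CSL style."""
    ET.register_namespace("", CSL_NS)  # keep the default namespace on output
    root = ET.fromstring(csl_text)
    layout = root.find(f"{{{CSL_NS}}}citation/{{{CSL_NS}}}layout")
    if layout is None:
        raise ValueError("no <citation><layout> element found")
    layout.set("prefix", "")
    layout.set("suffix", "")
    return ET.tostring(root, encoding="unicode")
```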

Latex ClassicThesis - The numbering of the paragraphs does not appear

I need one more numbered level for my report. I used \paragraph{title} but it only appears without numbering.
In the config file, the paragraph is described as:
\titleformat{\paragraph}[runin]
{\normalfont\normalsize}{\theparagraph}{0pt}{\spacedlowsmallcaps}
I believe that the command \theparagraph is responsible for the numbering. Why doesn't it appear?
I searched the net for answers and tried the following commands (at once) before the beginning of the document:
\setcounter{secnumdepth}{\paragraphnumdepth}
Replace the previous command to get rid of run-in with:
\titleformat{\paragraph}
{\relax}{\textsc{\MakeTextLowercase{\theparagraph}}}{1em}{\normalsize\itshape}
\renewcommand{\theparagraph}{\thesubsection.\arabic{paragraph}}
I put the last two commands in \makeatletter and \makeatother.
The paragraph name appears like the section names now, but still no numbers. Any ideas? Here is a small example that works because I didn't include the classicthesis config files.
\documentclass{scrreprt}
\usepackage[utf8]{inputenc}
\setcounter{secnumdepth}{\paragraphnumdepth}
\begin{document}
\chapter{Chapitre}
\section{Introduction}
\subsection{Première sous-partie}
\subsubsection{Un cran en dessous}
\paragraph{Paragraphe with number: what I would like in my report}
Functional here because the problem clearly comes from the two classicthesis config files...
\end{document}
Thank you

Any way to keep R Markdown from putting a LaTeX environment in a new paragraph?

Environments like align and gather are pretty clearly designed for use within a paragraph of LaTeX text, as two line breaks between the document text and the start of the math environment insert an egregious two paragraphs' worth of vertical white space. Markdown, though, always starts any LaTeX environment two lines below the text that's above it, even if you begin the environment on the very same line of the markdown code/text, and even if you put 2 spaces before it in order to add a single line break. Since there's no multiline math display native to markdown, this poses a dilemma.
Running \vspace{-\baselineskip} before the environment compensates well enough, but of course it would be better to just tell markdown not to insert the line breaks in the first place. Is that possible? And if not, then what would be the easiest way to automatically run \vspace{-\baselineskip} before the beginning of each align (and/or align*, gather, gather*, etc.) environment?
MWE:
---
output:
  pdf_document:
    keep_tex: 1
---
The following environment will get marked up with an extra two lines between it and
this text, putting it on a new paragraph and creating a lot of whitespace above it,
whether or not there's any line breaks in the markdown code:
\begin{gather*}
A \\ B
\end{gather*}
This can of course be hackily corrected by subtracting vertical space:
\vspace{-\baselineskip} \begin{gather*}
A \\ B
\end{gather*}
The best you can do in this situation is to automatically insert \vspace{-\baselineskip} at the start of every specific environment using the etoolbox package:
---
output:
  pdf_document:
    keep_tex: 1
header-includes:
- \usepackage{etoolbox}
- \AtBeginEnvironment{gather}{\vspace{-\baselineskip}}
- \AtBeginEnvironment{gather*}{\vspace{-\baselineskip}}
---
The following environment will get marked up with an extra two lines between it and
this text, putting it on a new paragraph and creating a lot of whitespace above it,
whether or not there's any line breaks in the markdown code:
\begin{gather*}
A \\ B
\end{gather*}
This can of course be hackily corrected by subtracting vertical space:
\begin{gather*}
A \\ B
\end{gather*}
This, however, is not optimal, as the gap inserted by the environment depends on the amount of text ending the preceding paragraph. As a result of Pandoc's processing, the amount is always the same (\abovedisplayskip), so it may be "better" to use
header-includes:
- \usepackage{etoolbox}
- \AtBeginEnvironment{gather}{\vspace{\dimexpr-\baselineskip-\abovedisplayskip}}
- \AtBeginEnvironment{gather*}{\vspace{\dimexpr-\baselineskip-\abovedisplayskip}}
You'll have to do this for all amsmath-related display alignments.
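Since the same line has to be repeated for every environment and its starred form, a small generator can write the header-includes list for you. This is a sketch: the selection of amsmath environments below is my assumption (extend it as needed), and the correction is the \dimexpr variant from the answer above.

```python
# Display environments to patch; this selection is an assumption, extend as needed.
AMSMATH_DISPLAY_ENVS = ("align", "alignat", "flalign", "gather", "multline")

# The vertical correction from the answer above.
FIX = "{\\vspace{\\dimexpr-\\baselineskip-\\abovedisplayskip}}"


def header_includes(envs=AMSMATH_DISPLAY_ENVS):
    """Return a YAML header-includes list patching each env and its starred form."""
    lines = ["header-includes:", "  - \\usepackage{etoolbox}"]
    for env in envs:
        for name in (env, env + "*"):
            lines.append("  - \\AtBeginEnvironment{" + name + "}" + FIX)
    return "\n".join(lines)


print(header_includes())
```

Paste the printed lines into the YAML metadata block of your R Markdown file.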

Slides with Columns in Pandoc

I would like to have code and an image side-by-side in a Beamer slide.
In LaTeX I would do this with columns. I would like to use markdown within the column structure.
\begin{columns}
\column{.5\textwidth}
~~~~~~~~Python
>>> some python code
~~~~~~~~
\column{.5\textwidth}
![](A_generated_image.pdf)
\end{columns}
Unfortunately Pandoc doesn't process the markdown within the \begin{columns} and \end{columns} statements. Is there a way around this?
Is there a way to use markdown within inlined LaTeX?
Is there a pure markdown solution?
Current versions of pandoc (i.e., pandoc 2.0 and later) support fenced divs. Specially named divs are transformed into columns when targeting a slide format:
# This slide has columns
::: columns
:::: column
left
::::
:::: column
right
::::
:::
Pandoc translates this into the following LaTeX beamer code:
\begin{frame}{This slide has columns}
\protect\hypertarget{this-slide-has-columns}{}
\begin{columns}[T]
\begin{column}{0.48\textwidth}
left
\end{column}
\begin{column}{0.48\textwidth}
right
\end{column}
\end{columns}
\end{frame}
This is simple and has the additional advantage of giving similar results when targeting other presentational formats like reveal.js.
More than two columns work out of the box for Beamer output. Powerpoint, however, only supports two columns. For reveal.js, the widths of three or more columns must be given explicitly:
::: columns
:::: {.column width=30%}
left
::::
:::: {.column width=30%}
middle
::::
:::: {.column width=30%}
right
::::
:::
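Writing the nested fences by hand gets error-prone as the number of columns grows, so here is a small generator sketch. The function name and the equal-width default are assumptions of this sketch, not pandoc features; blank lines are placed between fences to keep each one separate from the surrounding blocks.

```python
def columns_markdown(contents, width=None):
    """Build pandoc fenced-div markup that lays out `contents` side by side.

    width is passed through as the column width attribute (e.g. "30%").
    If omitted, this sketch splits 100% evenly, rounding down.
    """
    if not contents:
        raise ValueError("need at least one column")
    if width is None:
        width = f"{100 // len(contents)}%"
    blocks = ["::: columns"]
    for text in contents:
        blocks += [f":::: {{.column width={width}}}", text, "::::"]
    blocks.append(":::")
    # Blank lines keep each fence separate from the surrounding blocks.
    return "\n\n".join(blocks)


print(columns_markdown(["left", "middle", "right"], width="30%"))
```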
The problem is that pandoc ignores markdown if it finds a \begin{}. An alternative is to edit the beamer template and add the following:
\newcommand{\columnsbegin}{\begin{columns}}
\newcommand{\columnsend}{\end{columns}}
And write it like this:
\columnsbegin
\column{.5\textwidth}
~~~~~~~~Python
>>> some python code
~~~~~~~~
\column{.5\textwidth}
![](A_generated_image.pdf)
\columnsend
I hope this is still valuable. I made a Pandoc filter in Python to add columns easily, so you can write your presentations this way:
# Hello World
[columns]
[column=0.5]
~~~python
if __name__ == "__main__":
    print("Hello World")
~~~
[column=0.5]
This is how a "Hello World" looks like in Python
[/columns]
The filter converts each marker to \begin{columns}, \column{.5\textwidth} or \end{columns}, so the document above turns into something like:
\begin{frame}[fragile]{Hello}
\begin{columns}
\column{0.5\textwidth}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{some python code}
\end{Highlighting}
\end{Shaded}
\column{0.5\textwidth}
Hello World
\end{columns}
\end{frame}
Here is the filter code:
import pandocfilters as pf
def latex(s):
    return pf.RawBlock('latex', s)
def mk_columns(k, v, f, m):
    if k == "Para":
        value = pf.stringify(v)
        if value.startswith('[') and value.endswith(']'):
            content = value[1:-1]
            if content == "columns":
                return latex(r'\begin{columns}')
            elif content == "/columns":
                return latex(r'\end{columns}')
            elif content.startswith("column="):
                return latex(r'\column{%s\textwidth}' % content[7:])
if __name__ == "__main__":
    pf.toJSONFilter(mk_columns)
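If installing pandocfilters is not an option, the same marker translation can be sketched directly on pandoc's JSON AST. This is a minimal sketch, assuming the markers are top-level paragraphs made of Str/Space inlines (as in the example above); stringify, marker_to_latex and columns_filter are hypothetical helpers, not part of pandocfilters.

```python
import json
import sys


def stringify(inlines):
    """Concatenate the text of Str/Space inlines (enough for the markers)."""
    parts = []
    for inline in inlines:
        if inline["t"] == "Str":
            parts.append(inline["c"])
        elif inline["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def marker_to_latex(text):
    """Translate one [columns]/[column=W]/[/columns] marker, else None."""
    if not (text.startswith("[") and text.endswith("]")):
        return None
    content = text[1:-1]
    if content == "columns":
        return r"\begin{columns}"
    if content == "/columns":
        return r"\end{columns}"
    if content.startswith("column="):
        return r"\column{%s\textwidth}" % content[len("column="):]
    return None


def columns_filter(doc):
    """Replace top-level marker paragraphs with raw LaTeX blocks."""
    blocks = []
    for block in doc["blocks"]:
        latex = None
        if block["t"] == "Para":
            latex = marker_to_latex(stringify(block["c"]))
        blocks.append({"t": "RawBlock", "c": ["latex", latex]} if latex else block)
    doc["blocks"] = blocks
    return doc


# pandoc passes the target format as argv[1] and the document on stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(columns_filter(json.load(sys.stdin)), sys.stdout)
```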
If you have never used a pandoc filter: save the filter as columnfilter.py (or any other name you want) in the same location as your document, make sure the pandocfilters package is installed (pip install pandocfilters), and run
pandoc -t beamer --filter columnfilter.py yourDocument.mkd
And enjoy!
Beamer-specific answer. I ran across this solution when trying to add multiple columns in a regular Pandoc document. It works here as well, although it constrains you to Beamer; that is your use case anyway.
In the slide deck, insert once:
---
header-includes:
- \newcommand{\hideFromPandoc}[1]{#1}
- \hideFromPandoc{
  \let\Begin\begin
  \let\End\end
  }
---
Then add content thus:
\Begin{columns}
\Begin{column}{0.3\textwidth}
Res ipsum loquiter, sed in inferno decit?
\End{column}
\Begin{column}{0.3\textwidth}
Res ipsum loquiter, sed in inferno decit?
\End{column}
\Begin{column}{0.3\textwidth}
Res ipsum loquiter, sed in inferno decit?
\End{column}
\End{columns}
Creating the "hideFromPandoc" command lets you insert begin/end statements throughout without depriving you of markdown in the block.
Fenced Div Answer. There's an answer above that refers to fenced divs. I commented that the answer only works with two columns. It breaks down with more. Here is how that answer works with multiple divs:
::: {.columns}
:::: {.column width=0.3}
Test
::::
:::: {.column width=0.3}
Test
::::
:::: {.column width=0.3}
Test
::::
:::
To get this answer, I had to look at the commit that added the column feature specifically.
You could use Fletcher Penney's MultiMarkdown, which can process Markdown to LaTeX/Beamer. Compared to Pandoc, MultiMarkdown has fewer features. However, especially when working with LaTeX, it has the advantage that you can embed LaTeX code directly into the Markdown in HTML comments.
Your code would look like this:
<!-- \begin{columns} -->
<!-- \column{.5\textwidth} -->
>>> some python code
<!-- \column{.5\textwidth} -->
![](A_generated_image.pdf)
<!-- \end{columns} -->
For me this solution works fine. With a good editor (e.g. Scrivener, Sublime Text) you can write the latex code without all the comments and find/replace them after editing. In addition, the Metadata support in Multimarkdown is much more flexible, so that it is easier to customize presentations.
In the meantime, I hope that the Pandoc team provides a solution to this problem. I think there are some users who would like to embed small LaTeX code snippets throughout their Markdown documents without having them converted or escaped.
You can use MultiMarkdown comments ("<!-- Your LaTeX Code inside -->") with Pandoc if you wrap the Pandoc command that transforms your Markdown to LaTeX between two sed commands.
In the first sed run, you change the MultiMarkdown comments to "\verb+AAAAAAALaTeX-StuffZZZZZZZ+". Then you transform to LaTeX with Pandoc as usual; everything inside "\verb+AAAAAAALaTeX-StuffZZZZZZZ+" is left alone. Then you run sed on the TeX file and delete the "\verb+AAAAAAA" and "ZZZZZZZ+", unfolding your LaTeX code.
The first sed command line before the Pandoc transformation could look like this:
sed -E -e "s/<\\!--(.+)--\\>/\\\\verb\+AAAAAAA\1ZZZZZZZ\+/g " \
source.md > source.i.md
Then use Pandoc on source.i.md as usual to create source.tex. The second sed run looks like this (-i "" is the BSD/macOS sed syntax; with GNU sed, use -i without the empty argument):
sed -E -e "s/\\\\verb\+AAAAAAA(.+)ZZZZZZZ\+/\1/g" -i "" source.tex
I automated everything in a Makefile so that I can make more changes e.g. to table definitions in one step. On first glance this approach works fine (tested it on column definitions with the beamer class).
With these little sed scripts, you can use all the nice things from Pandoc. You only need to MMD-comment those TeX and LaTeX commands which would otherwise be escaped or which enclose larger parts of your Markdown.
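The two sed passes can also be sketched in Python if you drive the build from a script instead of a Makefile. This mirrors the sed commands above using the same AAAAAAA/ZZZZZZZ sentinels; the function names are hypothetical. One deliberate difference: the regexes are non-greedy, so two comments on the same line stay separate. Like the sed version, it assumes each comment fits on one line and that the LaTeX inside contains no "+", which would end \verb early.

```python
import re

# Sentinels taken from the sed commands above.
START, END = "AAAAAAA", "ZZZZZZZ"

COMMENT_RE = re.compile(r"<!--(.+?)-->")
VERB_RE = re.compile(r"\\verb\+" + START + r"(.+?)" + END + r"\+")


def protect_latex_comments(markdown):
    """First pass: turn <!-- LaTeX --> comments into \\verb+AAAAAAA...ZZZZZZZ+."""
    # A function replacement avoids backslash escapes in the substitution.
    return COMMENT_RE.sub(lambda m: "\\verb+" + START + m.group(1) + END + "+", markdown)


def unwrap_latex(tex):
    """Second pass: strip the \\verb+AAAAAAA and ZZZZZZZ+ wrapper from the .tex file."""
    return VERB_RE.sub(lambda m: m.group(1), tex)
```

A driver would apply protect_latex_comments to source.md, run pandoc on the result, and then apply unwrap_latex to source.tex.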
