I'm trying to develop a pandoc (v2.18) Lua custom writer for kramdown. Kramdown uses $$ as the delimiter for both display and inline math, so my writer looks like this:
function Writer (doc, opts)
local filter = {
Math = function(elem)
local math = elem
if elem.mathtype == 'DisplayMath' then
local delimited = '\n$$' .. elem.text ..'$$\n'
math = pandoc.RawBlock('markdown', delimited)
end
if elem.mathtype == 'InlineMath' then
local delimited = '$$' .. elem.text ..'$$'
math = pandoc.RawInline('markdown', delimited)
end
return math
end
}
return pandoc.write(doc:walk(filter), 'markdown', opts)
end
When I try to convert a LaTeX test file called vector.tex, this fails with the error message:
$ pandoc -t kramdown.lua vector.tex -o vector.md --wrap=preserve
Error running Lua:
PandocLuaError "all choices failed"
stack traceback:
kramdown.lua:21: in function 'Writer'
I realized that it works and I get the output I want by replacing RawBlock with RawInline like
math = pandoc.RawInline('markdown', delimited .. '\n')
So there seems to be a problem with my usage of RawBlock. I am new to pandoc and Lua, so maybe I'm missing something basic here. Can someone give me a hint as to what the issue might be?
Using RawInline works, as Math elements are inline elements. Display math may look like a block, but internally it's still an inline. Filters must replace inline elements with other inlines, and blocks with blocks.
A "Block" is something like a paragraph, list, or block quote, while an "Inline" is text, emphasis, an image, or a link.
Sorry for the abysmal error message, I'll try to improve that.
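For reference, the writer from the question could be restructured so that both math types are replaced by inlines. This is a minimal sketch, assuming pandoc 2.17 or later (for doc:walk and pandoc.write); the helper name kramdown_math is hypothetical and not part of pandoc.

```lua
-- Sketch of the writer with both math types replaced by inlines.
-- `kramdown_math` is a hypothetical helper name, not a pandoc function.
local function kramdown_math(elem)
  local delimited = '$$' .. elem.text .. '$$'
  if elem.mathtype == 'DisplayMath' then
    -- Surround display math with newlines, mirroring the '\n'
    -- padding used in the question.
    delimited = '\n' .. delimited .. '\n'
  end
  -- Math is an Inline, so its replacement must be an Inline as well.
  return pandoc.RawInline('markdown', delimited)
end

function Writer(doc, opts)
  return pandoc.write(doc:walk({ Math = kramdown_math }), 'markdown', opts)
end
```

The only structural change from the question is that RawBlock no longer appears anywhere: display math differs from inline math only in the surrounding newlines.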
I'm trying to customize the default HTML output of footnotes from an .odt file.
For example a file with a footnote like this:
Some text with a footnote1
Will render the HTML output below:
<ol class="footnotes">
<li id="fn1" role="doc-endnote">
<p>Content of footnote number 1. ↩︎</p>
</li>
</ol>
Instead, I want a flat paragraph to be output, with a hardcoded number, like the following:
<p>1. Content of footnote number 1. ↩︎</p>
I've used parts of sample.lua from the Pandoc repo, but it is not working; the process is stopped by this error:
$ pandoc --lua-filter=my-filter.lua file.odt -o file.html
Error running filter my-filter.lua:
my-filter.lua:7: bad argument #1 to 'gsub' (string expected, got table)
stack traceback:
[C]: in function 'string.gsub'
my-filter.lua:7: in function 'Note'
Below is my attempted script. I guess I'm naively overlooking something obvious, or I've misunderstood how filters work.
-- Table to store footnotes, so they can be included at the end.
local notes = {}
function Note(s)
local num = #notes + 1
-- insert the back reference right before the final closing tag.
s = string.gsub(s,
'(.*)</', '%1 ↩</')
-- add a list item with the note to the note table.
table.insert(notes, '<p id="fn' .. num .. '">' .. num .. '. ' .. s .. '</p>')
-- return the footnote reference, linked to the note.
return '<a id="fnref' .. num .. '" href="#fn' .. num ..
'"><sup>' .. num .. '</sup></a>'
end
function Pandoc (doc)
local buffer = {}
local function add(s)
table.insert(buffer, s)
end
add(doc)
if #notes > 0 then
for _,note in pairs(notes) do
add(note)
end
end
return table.concat(buffer,'\n') .. '\n'
end
Update
Tweaking part of what @tarleb answered, I've now managed to modify the inline note reference link, but apparently the second function is not rendering the list of footnotes at the end of the document. What's missing?
local notes = pandoc.List{}
function Note(note)
local num = #notes + 1
-- add a list item with the note to the note table.
notes:insert(pandoc.utils.blocks_to_inlines(note.content))
-- return the footnote reference, linked to the note.
return pandoc.RawInline('html', '<a id="fnref' .. num .. '" href="#fn' .. num ..
'"><sup>' .. num .. '</sup></a>')
end
function Pandoc (doc)
doc.meta['include-after'] = notes:map(
function (content, i)
-- return a paragraph for each note.
return pandoc.Para({tostring(i) .. '. '} .. content)
end
)
return doc
end
The sample.lua is an example of a custom Lua writer, not a Lua filter. They can look similar, but are quite different. E.g., filter functions modify abstract document elements, while functions in custom writers generally expect strings, at least in the first argument.
A good way to go about this in a filter could be to place the custom rendering in the include-after metadata:
local notes = pandoc.List{}
function Pandoc (doc)
doc.blocks = doc.blocks:walk {
Note = function (note)
notes:insert(pandoc.utils.blocks_to_inlines(note.content))
-- Raw HTML goes into a RawInline element
return pandoc.RawInline('html', 'footnote link HTML goes here')
end
}
doc.meta['include-after'] = notes:map(
function (content, i)
-- return a paragraph for each note.
return pandoc.Para({tostring(i) .. ' '} .. content)
end
)
return doc
end
I've managed after some trial and error to get a result that is working as intended, but "stylistically" not absolutely perfect.
Please read my commentary below mostly as an exercise: I'm trying to better understand how to use this great tool the way I want, not the way any reasonable person would in a productive setting (or at all). ;)
What I'd like to improve:
I have to wrap the p elements in a div because, as of Pandoc 2.18, it is not possible to set attributes directly on a Paragraph. This is minor code bloat, but acceptable.
I'd like to use a section element instead of a div to hold all the notes at the end of the document (used in the Pandoc function), but I haven't found a way to create a RawBlock element and then add the note blocks to it.
I'm not at all proficient in Lua and have barely grasped a few concepts of how Pandoc works, so I'm pretty confident that what I've done below is not optimal. Suggestions are welcome!
-- working as of Pandoc 2.18
local notes = pandoc.List{}
function Note(note)
local num = #notes + 1
-- create a paragraph for the note content
local footNote = pandoc.Para(
-- Prefix content with number, ex. '1. '
{tostring(num) .. '. '} ..
-- a paragraph accepts Inline objects as content; Note content consists of
-- Block objects, which must be converted to inlines
pandoc.utils.blocks_to_inlines(note.content) ..
-- append backlink
{ pandoc.RawInline('html', '<a class="footnote-back" href="#fnref' .. num .. '" role="doc-backlink"> ↩︎</a>')}
)
-- it's not possible to render paragraphs with attribute elements as of Pandoc 2.18
-- so wrap the footnote in a <div> with attributes and append the element to the list
notes:insert(pandoc.Div(footNote, {id = 'fn' .. num, role = 'doc-endnote'}))
-- return the inline body footnote reference, linked to the note.
return pandoc.RawInline('html', '<a id="fnref' .. num .. '" href="#fn' .. num ..
'"><sup>' .. num .. '</sup></a>')
end
function Pandoc (doc)
if #notes > 0 then
-- append collected notes to block list, the end of the document
doc.blocks:insert(
pandoc.Div(
notes,
-- attributes
{class = 'footnotes', role = 'doc-endnotes'}
)
)
end
return doc
end
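On the wish for a section element: one possible approach, sketched here under assumptions rather than confirmed anywhere in this thread, is to skip the wrapping Div and emit the opening and closing tags as two separate RawBlock elements around the collected notes. The helper name append_notes_section is hypothetical, and the raw HTML only takes effect for HTML output.

```lua
-- Hypothetical helper: appends the collected notes to `blocks`, wrapped
-- in a raw <section> element instead of a Div. `blocks` and `notes` are
-- assumed to be pandoc lists (they need `insert` and `extend` methods).
local function append_notes_section(blocks, notes)
  if #notes == 0 then
    return blocks
  end
  -- A RawBlock cannot contain other blocks, so the opening and closing
  -- tags are emitted as two separate raw blocks around the notes.
  blocks:insert(pandoc.RawBlock('html',
    '<section class="footnotes" role="doc-endnotes">'))
  blocks:extend(notes)
  blocks:insert(pandoc.RawBlock('html', '</section>'))
  return blocks
end

-- Possible usage inside the filter, with `notes` filled by the Note function:
-- function Pandoc (doc)
--   doc.blocks = append_notes_section(doc.blocks, notes)
--   return doc
-- end
```

Assigning the result back to doc.blocks avoids relying on in-place mutation of the list returned by the property access.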
I am a beginner in Pandoc and Lua who is experimenting with converting Word documents to Markdown. I want to convert chapter headings in Word to paragraphs of text in Markdown. Furthermore, I want to insert some text before and after the chapter headings.
To achieve that, I used the following lua filter (sample.lua)
function Header(el)
if el.level == 1 then
return {"something before (",el.content,") something after"}
end
end
after which I performed the conversion using
pandoc --lua-filter=sample.lua -s file.docx -t markdown -o file.txt
where file.docx is just a minimal example document containing one chapter heading. However, using this filter, I obtained
something before (
Chapter title
) something after
What I want to get, however, is
something before (Chapter title) something after
but since (if I am not mistaken) el.content is an inline element, there are line breaks around it. I tried to solve this problem by consulting the pandoc documentation and trying various Lua functions, but to no avail, which is why I would kindly ask for help.
Try this instead:
function Header(el)
if el.level == 1 then
return {{"something before ("} .. el.content .. {") something after"}}
end
end
The reason is that el.content is a list of inline elements, and it can be extended by concatenating lists with additional content. The operator .. is the concatenation operator; it works with strings and pandoc lists.
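The same concatenation can also be written with explicit element constructors, which makes the resulting types visible. This is a sketch of one possible variant, assuming a paragraph (pandoc.Para) is the desired block type; it is not the only way to express the fix above.

```lua
-- Variant with explicit constructors: build one list of inlines and
-- wrap it in a single paragraph.
function Header(el)
  if el.level == 1 then
    -- pandoc.List gives the left operand the list metatable, so `..`
    -- concatenates the three lists into one list of inlines.
    local inlines = pandoc.List{ pandoc.Str('something before (') }
      .. el.content
      .. { pandoc.Str(') something after') }
    return pandoc.Para(inlines)
  end
  -- Returning nothing leaves headers of other levels unchanged.
end
```

Because the function returns a single block holding all the inlines, the writer has no reason to put the heading text on a line of its own.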
I'd like to convert a CodeBlock into a LineBlock in a Pandoc Lua filter. The problem with that conversion is that the CodeBlock element has a text property (a string), but the LineBlock expects inline content elements (each word, space, newline etc. its own element). How can I convert the text property into content suitable for LineBlock?
This is how my code looks ATM:
function CodeBlock(el)
-- test for manually generating content
-- return pandoc.LineBlock {{pandoc.Str("Some")}, {pandoc.Space()}, {pandoc.Str("content")}}
-- using read does not work, how can I convert the string el.text?
local contentElements = pandoc.read(el.text)
return pandoc.LineBlock(contentElements)
end
I'm assuming the text in the code block is formatted in Markdown, as that's the most frequently used input format for pandoc.
Your approach is good, there just seems to be some lack of clarity about the different types: pandoc.read takes a string, as in el.text, and returns a Pandoc object, which has a list of Block values in its blocks field.
This list of blocks is an acceptable return value of the CodeBlock function.
To convert the text into a LineBlock, we could modify it such that it becomes a line block in Markdown syntax. Then we can read the resulting text as Markdown using pandoc.read.
Line blocks in pandoc Markdown (and reStructuredText) have a pipe character at the start of each line. So we must add | after each newline character and also prepend it to the first line.
We can pass the result into pandoc.read, then return the resulting blocks, which should really be just a single LineBlock in our case.
This is the full filter:
function CodeBlock (el)
return pandoc.read('| ' .. el.text:gsub('\n', '\n| '), 'markdown').blocks
end
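The string transformation can be tried on its own in a plain Lua interpreter, without pandoc. A minimal sketch; the function name to_line_block_markdown is made up for illustration.

```lua
-- Hypothetical helper: turn plain text into pandoc line-block syntax by
-- putting '| ' in front of every line.
local function to_line_block_markdown(text)
  -- gsub returns (string, count); the parentheses keep only the string.
  return '| ' .. (text:gsub('\n', '\n| '))
end

print(to_line_block_markdown('first line\nsecond line'))
-- | first line
-- | second line
```

The printed text is what the filter hands to pandoc.read in place of the original code block content.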
I have been trying to convert a string into a table, for example:
local stringtable = "{{"user123","Banned for cheating"},{"user124","Banned for making alt accounts"}}"
Code:
local table = "{{"user123","Banned for cheating"},{"user124","Banned for making alt accounts"}}"
print(table[1])
Output result:
Line 3: nil
Is there any sort of method of converting a string into a table? If so, let me know.
First of all, your Lua code will not work. You cannot have unescaped double quotes in a string delimited by double quotes. Use single quotes (') within a "..." string, double quotes within a '...' string, or use long-bracket syntax ([[...]] or [===[...]===]) to be able to use both types of quotes, as I do in the example below.
Secondly, your task cannot be solved with a regular expression, unless your table structure is very rigid; and even then Lua patterns will not be enough: you will need to use Perl-compatible regular expressions from Lua lrexlib library.
Thirdly, fortunately, Lua has a Lua interpreter available at runtime: the function loadstring (in Lua 5.2 and later, load takes over this role). It returns a function that executes the Lua code in its argument string. You just need to prepend return to your table code and call the returned function.
The code:
local stringtable = [===[
{{"user123","Banned for cheating"},{"user124","Banned for making alt accounts"}}
]===]
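-- loadstring exists in Lua 5.1 and LuaJIT; later versions use load instead.
local loadstring = loadstring or load
local tbl_func = loadstring ('return ' .. stringtable)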
-- If stringtable is not valid Lua code, tbl_func will be nil:
local tbl = tbl_func and tbl_func() or nil
-- Test:
if tbl then
for _, user in ipairs (tbl) do
print (user[1] .. ': ' .. user[2])
end
else
print 'Could not compile stringtable'
end
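Since this technique executes whatever code the string contains, it may be worth running the chunk in an empty environment when the string comes from an untrusted source. The following is a sketch under that assumption; parse_table is a hypothetical name, and an empty environment does not protect against things like infinite loops.

```lua
-- Hypothetical helper: evaluate a table literal with no access to globals.
local function parse_table(str)
  local chunk
  if setfenv then
    -- Lua 5.1 / LuaJIT: compile first, then swap in an empty environment.
    chunk = loadstring('return ' .. str)
    if chunk then setfenv(chunk, {}) end
  else
    -- Lua 5.2+: load takes the environment as its fourth argument;
    -- mode 't' rejects precompiled (binary) chunks.
    chunk = load('return ' .. str, 'stringtable', 't', {})
  end
  if not chunk then
    return nil -- the string is not valid Lua
  end
  local ok, result = pcall(chunk)
  if ok and type(result) == 'table' then
    return result
  end
  return nil -- runtime error, or the result is not a table
end

local tbl = parse_table('{{"user123","Banned for cheating"}}')
print(tbl[1][1] .. ': ' .. tbl[1][2])
```

With the empty environment, a string such as os.exit() fails inside pcall because os is not visible to the chunk, and the function returns nil.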
I'm trying to create a pandoc filter that will help me summarize data. I've seen some filters that create table of contents, but I'd like to organize the index based on content found within headers.
For instance, below I'd like to provide a summary of content based on tagged dates in headers (some headers will not contain dates...)
[nwatkins@sapporo foo]$ cat test.md
# 1 May 2018
some info
# not a date
some data
# 2 May 2018
some more info
I started off by trying to look at the content of the headers. The intention was to just apply a simple regex for different date/time patterns.
[nwatkins@sapporo foo]$ cat test.lua
function Header(el)
return pandoc.walk_block(el, {
Str = function(el)
print(el.text)
end })
end
Unfortunately, this seems to apply the print statement to each space-separated string, rather than to a concatenation that would allow me to analyze the entire header content:
[nwatkins@sapporo foo]$ pandoc --lua-filter test.lua test.md
1
May
2018
not
...
Is there a canonical way to do this in filters? I have yet to see any helper function in the Lua filters documentation.
Update: the dev version now provides the new functions pandoc.utils.stringify and pandoc.utils.normalize_date. They will become part of the next pandoc release (probably 2.0.6). With these, you can test whether a header contains a date with the following code:
function Header (el)
local content_str = pandoc.utils.stringify(el.content)
if pandoc.utils.normalize_date(content_str) ~= nil then
print 'header contains a date'
else
print 'not a date'
end
end
There is no helper function yet, but we have plans to provide a pandoc.utils.tostring function in the very near future.
In the meantime, the following snippet (taken from this discussion) should help you to get what you need:
--- convert a list of Inline elements to a string.
function inlines_tostring (inlines)
local strs = {}
for i = 1, #inlines do
strs[i] = tostring(inlines[i])
end
return table.concat(strs)
end
-- Add a `__tostring` method to all Inline elements. Linebreaks
-- are converted to spaces.
for k, v in pairs(pandoc.Inline.constructor) do
v.__tostring = function (inln)
return ((inln.content and inlines_tostring(inln.content))
or (inln.caption and inlines_tostring(inln.caption))
or (inln.text and inln.text)
or " ")
end
end
function Header (el)
header_text = inlines_tostring(el.content)
end
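Once the header text is available as a single string, the simple pattern matching mentioned in the question becomes straightforward. The sketch below handles only the "D Month YYYY" form used in test.md; the names MONTHS and parse_header_date are hypothetical, and the day is not range-checked.

```lua
-- Lookup table for English month names (illustrative; extend as needed).
local MONTHS = {
  January = 1, February = 2, March = 3, April = 4,
  May = 5, June = 6, July = 7, August = 8,
  September = 9, October = 10, November = 11, December = 12,
}

-- Hypothetical helper: return an ISO-style date string for header text
-- of the form "1 May 2018", or nil if the text does not match.
local function parse_header_date(text)
  local day, month, year = text:match('^(%d%d?) (%a+) (%d%d%d%d)$')
  if not day or not MONTHS[month] then
    return nil
  end
  return string.format('%04d-%02d-%02d',
    tonumber(year), MONTHS[month], tonumber(day))
end

print(parse_header_date('1 May 2018'))  -- 2018-05-01
print(parse_header_date('not a date'))  -- nil
```

In a filter, the argument would be the string produced from el.content, for example by inlines_tostring above or by pandoc.utils.stringify in newer pandoc versions.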