How to change the HTML rendering of a Pandoc element? - lua

I'm trying to customize the default HTML output of footnotes from an .odt file.
For example, a file with a footnote like this:
Some text with a footnote¹
renders as the HTML below:
<ol class="footnotes">
<li id="fn1" role="doc-endnote">
<p>Content of footnote number 1. ↩︎</p>
</li>
</ol>
Instead, I want the output to be a flat paragraph with a hard-coded number, like the following:
<p>1. Content of footnote number 1. ↩︎</p>
I've used parts of sample.lua from the Pandoc repo, but it isn't working. The process is blocked by this error:
$ pandoc --lua-filter=my-filter.lua file.odt -o file.html
Error running filter my-filter.lua:
my-filter.lua:7: bad argument #1 to 'gsub' (string expected, got table)
stack traceback:
[C]: in function 'string.gsub'
my-filter.lua:7: in function 'Note'
Below is my attempted script. I guess I'm naively overlooking something obvious, or I've misunderstood how filters work.
-- Table to store footnotes, so they can be included at the end.
local notes = {}
function Note(s)
  local num = #notes + 1
  -- insert the back reference right before the final closing tag.
  s = string.gsub(s,
    '(.*)</', '%1 ↩</')
  -- add a list item with the note to the note table.
  table.insert(notes, '<p id="fn' .. num .. '">' .. num .. '. ' .. s .. '</p>')
  -- return the footnote reference, linked to the note.
  return '<a id="fnref' .. num .. '" href="#fn' .. num ..
    '"><sup>' .. num .. '</sup></a>'
end
function Pandoc (doc)
  local buffer = {}
  local function add(s)
    table.insert(buffer, s)
  end
  add(doc)
  if #notes > 0 then
    for _,note in pairs(notes) do
      add(note)
    end
  end
  return table.concat(buffer,'\n') .. '\n'
end
Update
By tweaking part of what @tarleb answered, I've now managed to modify the inline note reference link, but the second function apparently doesn't render the list of footnotes at the end of the document. What's missing?
local notes = pandoc.List{}
function Note(note)
  local num = #notes + 1
  -- add a list item with the note to the note table.
  notes:insert(pandoc.utils.blocks_to_inlines(note.content))
  -- return the footnote reference, linked to the note.
  return pandoc.RawInline('html', '<a id="fnref' .. num .. '" href="#fn' .. num ..
    '"><sup>' .. num .. '</sup></a>')
end
function Pandoc (doc)
  doc.meta['include-after'] = notes:map(
    function (content, i)
      -- return a paragraph for each note.
      return pandoc.Para({tostring(i) .. '. '} .. content)
    end
  )
  return doc
end

The sample.lua is an example of a custom Lua writer, not a Lua filter. They can look similar, but are quite different. E.g., filter functions modify abstract document elements, while functions in custom writers generally expect strings, at least in the first argument.
A good way to go about this in a filter could be to place the custom rendering in the include-after metadata:
local notes = pandoc.List{}
function Pandoc (doc)
  -- walk returns a filtered copy, so assign the result back to doc.blocks
  doc.blocks = doc.blocks:walk {
    Note = function (note)
      notes:insert(pandoc.utils.blocks_to_inlines(note.content))
      -- Raw HTML goes into a RawInline element
      return pandoc.RawInline('html', 'footnote link HTML goes here')
    end
  }
  doc.meta['include-after'] = notes:map(
    function (content, i)
      -- return a paragraph for each note.
      return pandoc.Para({tostring(i) .. ' '} .. content)
    end
  )
  return doc
end
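As a sketch (assuming pandoc 2.17 or later, where Blocks:walk exists), here is the same filter with the placeholder filled in by the anchor markup from the question's update; fnref_html is a hypothetical helper name, not part of pandoc. One caveat that likely explains the update's missing list: include-after is a template variable, so its content only appears when pandoc renders a full document through a template, i.e. with -s/--standalone.

```lua
-- Sketch, assuming pandoc >= 2.17 (Blocks:walk); fnref_html is a
-- hypothetical helper, not part of pandoc's API.

-- HTML for the in-text reference to note number `num`.
local function fnref_html(num)
  return '<a id="fnref' .. num .. '" href="#fn' .. num ..
         '"><sup>' .. num .. '</sup></a>'
end

function Pandoc(doc)
  local notes = {}  -- inline content of each note, in document order
  -- walk returns a new list, so assign the result back to doc.blocks
  doc.blocks = doc.blocks:walk {
    Note = function(note)
      table.insert(notes, pandoc.utils.blocks_to_inlines(note.content))
      return pandoc.RawInline('html', fnref_html(#notes))
    end
  }
  local paras = {}
  for i, content in ipairs(notes) do
    -- "1." and a space, followed by the note's inlines
    local inlines = {pandoc.Str(i .. '.'), pandoc.Space()}
    for _, inline in ipairs(content) do
      table.insert(inlines, inline)
    end
    paras[i] = pandoc.Para(inlines)
  end
  -- only rendered when a template is used, i.e. with -s/--standalone
  doc.meta['include-after'] = pandoc.MetaBlocks(paras)
  return doc
end
```

It would be run as `pandoc -s --lua-filter=my-filter.lua file.odt -o file.html`.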

After some trial and error, I've managed to get a result that works as intended, but is not "stylistically" perfect.
Please read my commentary below mostly as an exercise: I'm trying to better understand how to use this great tool the way I want, not necessarily the way any reasonable person would use it productively (or at all). ;)
What I'd like to improve:
I have to wrap the p elements in a div because, as of Pandoc 2.18, it is not possible to give a Para element attributes directly. This is a minor bit of code bloat, but acceptable.
I'd like to use a section element instead of a div to hold all the notes at the end of the document (built in the Pandoc function), but I haven't found a way to create a RawBlock element and then add the note blocks to it.
I'm totally not proficient in Lua and have barely grasped a few concepts of how Pandoc works, so I'm pretty confident that what I've done below is suboptimal. Suggestions are welcome!
-- working as of Pandoc 2.18
local notes = pandoc.List{}
function Note(note)
  local num = #notes + 1
  -- create a paragraph for the note content
  local footNote = pandoc.Para(
    -- prefix content with the number, e.g. '1. '
    {tostring(num) .. '. '} ..
    -- a paragraph accepts Inline objects as content, but Note content is a
    -- list of Block objects, so it must be converted to inlines
    pandoc.utils.blocks_to_inlines(note.content) ..
    -- append backlink
    { pandoc.RawInline('html', '<a class="footnote-back" href="#fnref' .. num .. '" role="doc-backlink"> ↩︎</a>')}
  )
  -- paragraphs cannot carry attributes as of Pandoc 2.18,
  -- so wrap the footnote in a <div> with attributes and append it to the list
  notes:insert(pandoc.Div(footNote, {id = 'fn' .. num, role = 'doc-endnote'}))
  -- return the inline footnote reference in the body, linked to the note
  return pandoc.RawInline('html', '<a id="fnref' .. num .. '" href="#fn' .. num ..
    '"><sup>' .. num .. '</sup></a>')
end
function Pandoc (doc)
  if #notes > 0 then
    -- append the collected notes to the block list, i.e. at the end of the document
    doc.blocks:insert(
      pandoc.Div(
        -- the collected note Divs can be passed directly; no need to map them
        notes,
        -- attributes
        {class = 'footnotes', role = 'doc-endnotes'}
      )
    )
  end
  return doc
end
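For the section element, one option is not to nest anything inside a RawBlock at all, but to emit the opening and closing tags as two separate RawBlock elements with the note Divs between them. A sketch under those assumptions (pandoc 2.17 or later for Blocks:insert/extend, HTML output only); section_tags is a hypothetical helper name:

```lua
-- Sketch, assuming pandoc >= 2.17 and HTML output; section_tags is a
-- hypothetical helper, not part of pandoc's API.

local notes = {}  -- one Div block per footnote, in document order

-- Opening and closing markup for the notes container.
local function section_tags(class, role)
  return string.format('<section class="%s" role="%s">', class, role),
         '</section>'
end

function Note(note)
  local num = #notes + 1
  -- "1." and a space, then the note content flattened to inlines
  local inlines = {pandoc.Str(num .. '.'), pandoc.Space()}
  for _, inline in ipairs(pandoc.utils.blocks_to_inlines(note.content)) do
    table.insert(inlines, inline)
  end
  -- backlink, same markup as in the question
  table.insert(inlines, pandoc.RawInline('html',
    '<a class="footnote-back" href="#fnref' .. num .. '" role="doc-backlink"> ↩︎</a>'))
  table.insert(notes, pandoc.Div(pandoc.Para(inlines),
    {id = 'fn' .. num, role = 'doc-endnote'}))
  -- in-text reference, linked to the note
  return pandoc.RawInline('html', '<a id="fnref' .. num .. '" href="#fn' .. num ..
    '"><sup>' .. num .. '</sup></a>')
end

function Pandoc(doc)
  if #notes > 0 then
    local open, close = section_tags('footnotes', 'doc-endnotes')
    -- raw opening tag, the note Divs, then the raw closing tag
    doc.blocks:insert(pandoc.RawBlock('html', open))
    doc.blocks:extend(notes)
    doc.blocks:insert(pandoc.RawBlock('html', close))
  end
  return doc
end
```

The trade-off is that the section is only "structural" in the HTML output; pandoc itself just sees two raw blocks and some Divs, so other output formats would drop the wrapper.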


PandocLuaError "all choices failed" in custom pandoc writer

I'm trying to develop a pandoc (v2.18) Lua custom writer for kramdown. Kramdown uses $$ as the delimiter for both display and inline math, so my writer looks like this:
function Writer (doc, opts)
  local filter = {
    Math = function(elem)
      local math = elem
      if elem.mathtype == 'DisplayMath' then
        local delimited = '\n$$' .. elem.text ..'$$\n'
        math = pandoc.RawBlock('markdown', delimited)
      end
      if elem.mathtype == 'InlineMath' then
        local delimited = '$$' .. elem.text ..'$$'
        math = pandoc.RawInline('markdown', delimited)
      end
      return math
    end
  }
  return pandoc.write(doc:walk(filter), 'markdown', opts)
end
Now, when I try to convert a LaTeX test file called vector.tex, it fails with this error message:
$ pandoc -t kramdown.lua vector.tex -o vector.md --wrap=preserve
Error running Lua:
PandocLuaError "all choices failed"
stack traceback:
kramdown.lua:21: in function 'Writer'
I realized that it works, and I get the output I want, if I replace RawBlock with RawInline, like this:
math = pandoc.RawInline('markdown', delimited .. '\n')
So there seems to be a problem with my usage of RawBlock. I'm new to pandoc and Lua, so maybe I'm missing something basic here. Can someone give me a hint as to what the issue might be?
Using RawInline works, as Math elements are inline elements. Display math may look like a block, but internally it's still an inline. Filters must replace inline elements with other inlines, and blocks with blocks.
A "Block" is something like a paragraph, list, or block quote, while an "Inline" is text, emphasis, an image, or a link.
Sorry for the abysmal error message, I'll try to improve that.
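Following that explanation, a sketch of the writer where both math types are replaced by a RawInline (assuming pandoc 2.17 or later for Pandoc:walk, as in the question); kramdown_math is a hypothetical helper name:

```lua
-- Sketch, assuming pandoc >= 2.17; kramdown_math is a hypothetical helper.

-- kramdown source for a math element; display math gets its own lines.
local function kramdown_math(mathtype, text)
  if mathtype == 'DisplayMath' then
    return '\n$$' .. text .. '$$\n'
  end
  return '$$' .. text .. '$$'
end

function Writer(doc, opts)
  local filter = {
    Math = function(elem)
      -- Math is always an Inline, so the replacement must be an Inline too,
      -- even for display math.
      return pandoc.RawInline('markdown', kramdown_math(elem.mathtype, elem.text))
    end
  }
  return pandoc.write(doc:walk(filter), 'markdown', opts)
end
```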

Splitting strings in Lua

I'm very new to Lua, so sorry if I sound really stupid.
I'm trying to make a program that does something a bit like this:
User input: "Hello world"
Var1: Hello
Var2: world
Because I have no idea what I'm doing, all I have is test = io.read(), and I have no idea what to do next.
I appreciate any help!
Thanks, Morgan.
If you want to split words, you can do it like this:
local input = "Hello world"
-- declare a table to store the results
-- use a table instead of single variables if you don't know how many results you'll have
local t_result = {}
-- scan the input
for k in input:gmatch('(%w+)') do table.insert(t_result, k) end
-- input:gmatch('(%w+)')
-- the generic match function scans the input for matches of the given pattern
-- it's the same as: string.gmatch(input, '(%w+)')
-- meaning of the search pattern:
---- "%w" = alphanumeric character (letter or digit)
---- "+" = one or more times
---- "()" = capture the match and return it to the loop variable "k"
-- table.insert(t_result, k)
-- each captured occurrence is stored in the result table
-- output
for i = 1, #t_result do print(t_result[i]) end
-- #t_result: "#" gives the length of a sequence (it isn't reliable for every kind of table)
-- another way:
-- for _, word in ipairs(t_result) do print(word) end
Output:
Hello
world
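Since the question asks for exactly two variables rather than a table, a minimal sketch using string.match with two captures; split_two is a hypothetical name, and it returns nil when the line holds fewer than two words.

```lua
-- Sketch: split a line into its first two whitespace-separated words.
-- split_two is a hypothetical helper name.
local function split_two(line)
  -- skip leading spaces, capture a non-space run, whitespace, another run
  return line:match('^%s*(%S+)%s+(%S+)')
end

-- With real input: local var1, var2 = split_two(io.read())
local var1, var2 = split_two('Hello world')
print(var1)
print(var2)
```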

Concatenating string fragments in Pandoc lua filters

I'm trying to create a pandoc filter that will help me summarize data. I've seen some filters that create table of contents, but I'd like to organize the index based on content found within headers.
For instance, below I'd like to provide a summary of content based on tagged dates in headers (some headers will not contain dates...)
[nwatkins@sapporo foo]$ cat test.md
# 1 May 2018
some info
# not a date
some data
# 2 May 2018
some more info
I started off by trying to look at the content of the headers. The intention was to just apply a simple regex for different date/time patterns.
[nwatkins@sapporo foo]$ cat test.lua
function Header(el)
  return pandoc.walk_block(el, {
    Str = function(el)
      print(el.text)
    end })
end
Unfortunately, this runs the print statement for each space-separated string, rather than on a concatenation that would let me analyze the entire header content:
[nwatkins@sapporo foo]$ pandoc --lua-filter test.lua test.md
1
May
2018
not
...
Is there a canonical way to do this in filters? I have yet to see any helper function in the Lua filters documentation.
Update: the dev version now provides the new functions pandoc.utils.stringify and pandoc.utils.normalize_date. They will become part of the next pandoc release (probably 2.0.6). With these, you can test whether a header contains a date with the following code:
function Header (el)
  local content_str = pandoc.utils.stringify(el.content)
  if pandoc.utils.normalize_date(content_str) ~= nil then
    print 'header contains a date'
  else
    print 'not a date'
  end
end
There is no helper function yet, but we have plans to provide a pandoc.utils.tostring function in the very near future.
In the meantime, the following snippet (taken from this discussion) should help you to get what you need:
--- convert a list of Inline elements to a string.
function inlines_tostring (inlines)
  local strs = {}
  for i = 1, #inlines do
    strs[i] = tostring(inlines[i])
  end
  return table.concat(strs)
end
-- Add a `__tostring` method to all Inline elements. Linebreaks
-- are converted to spaces.
for k, v in pairs(pandoc.Inline.constructor) do
  v.__tostring = function (inln)
    return ((inln.content and inlines_tostring(inln.content))
      or (inln.caption and inlines_tostring(inln.caption))
      or (inln.text and inln.text)
      or " ")
  end
end
function Header (el)
  local header_text = inlines_tostring(el.content)
end
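Building on pandoc.utils.stringify and pandoc.utils.normalize_date from the update, a sketch (assuming pandoc 2.0.6 or later) that collects dated headers and prepends a bullet-list summary. summary_entries is a hypothetical helper; sorting relies on normalize_date returning YYYY-MM-DD strings, which order correctly as plain strings.

```lua
-- Sketch, assuming pandoc >= 2.0.6; summary_entries is a hypothetical helper.

local dated = {}  -- entries {date = 'YYYY-MM-DD', text = 'header text'}

-- Sort entries by date and format each as "date: header text".
local function summary_entries(entries)
  local sorted = {}
  for i, e in ipairs(entries) do sorted[i] = e end
  table.sort(sorted, function(a, b) return a.date < b.date end)
  local lines = {}
  for i, e in ipairs(sorted) do
    lines[i] = e.date .. ': ' .. e.text
  end
  return lines
end

function Header(el)
  local text = pandoc.utils.stringify(el.content)
  local date = pandoc.utils.normalize_date(text)  -- nil if not a date
  if date then
    table.insert(dated, {date = date, text = text})
  end
  -- returning nothing leaves the header unchanged
end

function Pandoc(doc)
  local items = {}
  for i, line in ipairs(summary_entries(dated)) do
    items[i] = {pandoc.Plain({pandoc.Str(line)})}
  end
  if #items > 0 then
    -- put the summary at the top of the document
    table.insert(doc.blocks, 1, pandoc.BulletList(items))
  end
  return doc
end
```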

Pattern not matching *(%(*.%))

I'm trying to learn how patterns (used by string.gmatch, etc.) work in Lua 5.3, from the reference manual.
(Thanks @greatwolf for correcting my interpretation of the pattern item using *.)
What I'm trying to do is match substrings enclosed by ( and ) with '(%(.*%))*' (for example, in '(grouped (etc))'), so that it logs
(grouped (etc))
(etc)
or
grouped (etc)
etc
But it does nothing 😐 (online compiler).
local test = '(grouped (etc))'
for sub in test:gmatch '(%(.*%))*' do
  print(sub)
end
Another possibility -- using recursion:
local function show(s)
  for group in s:gmatch '%b()' do
    print(group)
    show(group:sub(2, -2))
  end
end
show '(grouped (etc))'
I don't think you can do this with gmatch alone, but using %b() in a while loop may work:
local pos, _, sub = 0
while true do
  pos, _, sub = ('(grouped (etc))'):find('(%b())', pos+1)
  if not sub then break end
  print(sub)
end
This prints your expected results for me.
local test = '(grouped (etc))'
print( test:match '.+%((.-)%)' )
Here:
.+%( matches as many characters as possible up to and including the last opening parenthesis; %( just escapes the parenthesis.
(.-)%) then captures the shortest substring up to the next closing parenthesis %), so this prints etc.
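Why the question's pattern never matches: in Lua patterns a quantifier such as * applies only to a single character class, so after the closing ) of a capture it is just a literal asterisk, and '(%(.*%))*' requires a '*' right after the closing parenthesis. A sketch that keeps the recursive %b() idea but returns the groups in a table instead of printing them; collect_groups is a hypothetical name.

```lua
-- The trailing * is a literal character here, not a repetition:
print(('(x)*'):match('(%(.*%))*'))  -- matches because the string has a '*'

-- Collect every balanced "(...)" group, outermost first.
-- collect_groups is a hypothetical helper name.
local function collect_groups(s, out)
  out = out or {}
  for group in s:gmatch('%b()') do
    table.insert(out, group)
    -- search the inside of this group for nested groups
    collect_groups(group:sub(2, -2), out)
  end
  return out
end

local groups = collect_groups('(grouped (etc))')
for _, g in ipairs(groups) do print(g) end
```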

Lua: Quoted arguments passed as one in function

I'm trying to simplify a script, and my attempts are failing. I'm writing a function that turns the given arguments into an indexed table. I want to be able to pass both quoted and non-quoted arguments, and have the function treat each quoted argument as a single value while still splitting the non-quoted ones.
For example:
makelist dog "brown mouse" cat tiger "colorful parrot"
should return an indexed table like the following:
list_table = {"dog", "brown mouse", "cat", "tiger", "colorful parrot"}
The code I have works for quoted arguments, but it mangles the non-quoted ones and, on top of that, adds the words of the quoted arguments a second time. Here's what I have:
function makelist(str)
  require 'tprint'
  local list_table = {}
  for word in string.gmatch(str, '%b""') do
    table.insert(list_table, word)
  end
  for word in string.gmatch(str, '[^%p](%a+)[^%p]') do
    table.insert(list_table, word)
  end
  tprint(list_table)
end
I don't understand why the quotes aren't being stripped, or why the first letter is being chopped off. This is the output I receive from tprint (a function that prints a table; it isn't relevant to the code):
makelist('dog "brown mouse" cat tiger "colorful parrot"')
1=""brown mouse""
2=""colorful parrot""
3="og"
4="rown"
5="mouse"
6="cat"
7="tiger"
8="olorful"
9="parrot"
As you can see, 'd', 'b', and 'c' are missing. What fixes do I need to make so that I can get the following output instead?
1="brown mouse"
2="colorful parrot"
3="dog"
4="cat"
5="tiger"
Or better yet, have them retain the same order they were dictated as arguments, if that's possible at all.
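local function makelist(str)
  local t = {}
  -- prepend '""' so the string always starts with a quoted chunk; each match is
  -- then one quoted part followed by the unquoted text up to the next quote
  for quoted, non_quoted in ('""'..str):gmatch'(%b"")([^"]*)' do
    -- strip the quotes; for the empty '""' this inserts nil, i.e. nothing
    table.insert(t, quoted ~= '""' and quoted:sub(2,-2) or nil)
    for word in non_quoted:gmatch'%S+' do
      table.insert(t, word)
    end
  end
  return t
end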
It may be easier to simply split on whitespace and concatenate the elements that are inside quotes. Something like this may work (I added a few more test cases):
function makelist(str)
  local params, quoted = {}, false
  for sep, word in str:gmatch("(%s*)(%S+)") do
    local word, oquote = word:gsub('^"', "") -- check opening quote
    local word, cquote = word:gsub('"$', "") -- check closing quote
    -- flip open/close quotes when inside quoted string
    if quoted then -- if already quoted, then concatenate
      params[#params] = params[#params]..sep..word
    else -- otherwise, add a new element to the list
      params[#params+1] = word
    end
    if quoted and word == "" then oquote, cquote = 0, oquote end
    quoted = (quoted or (oquote > 0)) and not (cquote > 0)
  end
  return params
end
local list = makelist([[
dog "brown mouse" cat tiger " colorful parrot " "quoted"
in"quoted "terminated by space " " space started" next "unbalanced
]])
for k, v in ipairs(list) do print(k, v) end
This prints the following list for me:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot
6 quoted
7 in"quoted
8 terminated by space
9 space started
10 next
11 unbalanced
First, thanks for your question; it got me to learn the basics of Lua!
Second, I think your approach went in a slightly wrong direction. Looking at the question, why not split once on the quotes (") and then decide which sections to split further on spaces?
This is what I came up with:
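function makelist(str)
  local list_table = {}
  local i = 0
  -- sections alternate between unquoted and quoted text; if the string
  -- starts with a quote, the first section (i = 1) is a quoted one
  local in_quotes = 1
  if str:sub(1,1) == '"' then
    in_quotes = 0
  end
  for section in string.gmatch(str, '[^"]+') do
    i = i + 1
    if (i % 2) == in_quotes then
      -- unquoted section: split on spaces
      for word in string.gmatch(section, '[^ ]+') do
        table.insert(list_table, word)
      end
    else
      -- quoted section: keep it as a single item
      table.insert(list_table, section)
    end
  end
  for key,value in pairs(list_table) do print(key,value) end
end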
The result:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot
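As for why the original second loop chops off the first letter: only (%a+) is captured, but the [^%p] in front of it still has to consume one character, and a letter counts as "not punctuation". That loop also scans the quoted parts of the string, which is why their words show up again after the quoted strings. A minimal demonstration:

```lua
-- The leading [^%p] eats the 'd'; only the rest of the word is captured.
local chopped = ('dog cat'):match('[^%p](%a+)[^%p]')
print(chopped)

-- Capturing the whole run of letters (or %S+ for any non-space run)
-- keeps each word intact.
local words = {}
for w in ('dog cat tiger'):gmatch('%a+') do
  table.insert(words, w)
end
print(table.concat(words, ','))
```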
