I am using Pandoc to generate a list of publications for my website. I'm using it solely to generate the HTML with the publications so that I can then paste the raw HTML into Jekyll. This part works fine.
The complications arise when I try to generate the HTML so that my name appears in boldface in all entries. I'm trying to use this solution for that, which works when I apply it to a pure LaTeX document. However, when I try to apply the same approach with Pandoc, the HTML is generated without any boldface.
Here's my Pandoc file:
---
bibliography: /home/tomas/Dropbox/cv/makerefen4/selectedpubs.bib
nocite: '@*'
linestretch: 1.5
fontsize: 12pt
header-includes: |
\usepackage[
backend=biber,
dashed=false,
style=authoryear-icomp,
natbib=true,
url=false,
doi=true,
eprint=false,
sorting=ydnt, %Year (Descending) Name Title
maxbibnames=99
]{biblatex}
\renewcommand{\mkbibnamegiven}[1]{%
\ifitemannotation{highlight}
{\textbf{#1}}
{#1}}
\renewcommand*{\mkbibnamefamily}[1]{%
\ifitemannotation{highlight}
{\textbf{#1}}
{#1}}
...
And here's the relevant part of my Makefile:
PANDOC_OPTIONS=--columns=80
PANDOC_HTML_OPTIONS=--filter=pandoc-citeproc --csl=els-modified.csl --biblatex
Again: this code generates the references fine. It just doesn't boldface anything as it is supposed to.
Any ideas?
EDIT
Bib entries look like this
@MISC{test,
AUTHOR = {Last1, First1 and Last2, First2 and Last3, First3},
AUTHOR+an = {2=highlight},
}
And versions are
- Biblatex 3.12
- Biber 2.12
You can use a Lua filter to modify the AST. The following works for me to get the surname and initials (Smith, J.) highlighted in the references (see here). You can replace pandoc.Strong with pandoc.Underline or pandoc.Emph. Replace Smith and J. with your name and initials, save it as myname.lua and use something like:
pandoc --citeproc --bibliography=mybib.bib --csl=mycsl.csl --lua-filter=myname.lua -o refs.html refs.md
i.e. put the filter last.
local highlight_author_filter = {
  Para = function(el)
    if el.t == "Para" then
      for k,_ in ipairs(el.content) do
        local first, space, second = el.content[k], el.content[k+1], el.content[k+2]
        -- check that two more inlines exist before inspecting them;
        -- "%." matches a literal dot (a bare "." matches any character)
        if space and second
        and first.t == "Str" and first.text == "Smith,"
        and space.t == "Space"
        and second.t == "Str" and second.text:find("^J%.") then
          local _,e = second.text:find("^J%.")
          local rest = second.text:sub(e+1)
          el.content[k] = pandoc.Strong { pandoc.Str("Smith, J.") }
          el.content[k+1] = pandoc.Str(rest)
          table.remove(el.content, k+2)
        end
      end
    end
    return el
  end
}
-- only touch the bibliography div generated by citeproc
function Div (div)
  if 'refs' == div.identifier then
    return pandoc.walk_block(div, highlight_author_filter)
  end
  return nil
end
Notes:
The above works if you use an author-date format csl. If you want to use a numeric format csl (e.g. ieee.csl or nature.csl) you will need to substitute Span for Para in the filter, i.e.:
Span = function(el)
if el.t == "Span" then
If you also want to use the multiple-bibliographies lua filter, it should go before the author highlight filter. And 'refs' should be 'refs_biblio1' or 'refs_biblio2' etc, depending on how you have defined them:
function Div (div)
if div.identifier == 'refs' or div.identifier == 'refs_biblio1' or div.identifier == 'refs_biblio2' then
For pdf output, you will also need to add -V csl-refs in the pandoc command if you use a numeric format csl.
The filter highlights Smith, J. if it is formatted in this order by the csl. Some csl files use this format for the first author and then switch to J. Smith for the rest, so you will have to adjust the filter accordingly by adding an extra if el.content[k].t == "Str" … etc. Converting to .json first will help you check the exact formatting in the AST.
I am using Quarto to create a lesson with lots of mixed-in TeX. I want two versions of this lesson, one with answers and one without, generated from the same document so that I don't have to keep two files consistent. What is the easiest way to do this?
I tried
::: {.content-visible XXX}
Will only appear in HTML.
:::
but that only seems to be working if I want to change the document output format. I want it to toggle with just a TRUE or FALSE value. I also tried
```{r, eval = showText, echo = TRUE, output = "asis"}
The probability that $n^2$ items happy is `dpois(n, 1)`.
```
However, this gives me the error
Error: unexpected symbol in "The probability"
Furthermore, it doesn't render the latex $n^2$
UPDATE: I tried the Lua filter approach without success.
---
title: "Conditional Content"
format: pdf
editor: visual
hide-answer: false
filters:
- hide-answer.lua
---
## Answer Test
- This is a list
- This $n^2$ works
- This is another element
- **Question:** What is my name?
::: answer
- Why is this $n^2$ failing?
:::
- Continuation
You can use a Lua filter to create an option that, if true, removes all the contents within answer divs.
---
title: Conditional content
format: pdf
hide-answer: true
filters:
- hide-answer.lua
---
## Part 01
**Question 01: What is the probability that .... ?**
::: answer
The probability that $n^2$ items happy is `dpois(n, 1)`.
:::
hide-answer.lua
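function answer()
  return {
    Div = function(el)
      if el.classes:includes('answer') then
        -- returning an empty list deletes the div
        -- (pandoc.Null is not available in pandoc 3 and later)
        return {}
      else
        return el
      end
    end
  }
end
function Pandoc(doc)
  local meta = doc.meta
  local hide = meta['hide-answer']
  -- when hide-answer is false or unset, the document is left unchanged
  if hide then
    return doc:walk(answer())
  end
end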
With hide-answer: true, the answer div is dropped from the output; with hide-answer: false, it is rendered as usual.
I’m trying to do the same at the moment and am exploring the use of profiles. https://quarto.org/docs/projects/profiles.html
I’d love to give a more complete answer, but just wanted to share that. I’m creating a profile for student (without answers) and TA (with answers).
If any experts have further advice, please feel free!
Best wishes!
I am trying to make a WYSIWYG internal tool. And we decided to implement this feature with contentEditable. However, we save data to our databases in markdown. So I have to be able to parse from html to md and back. For html to md I use package html2md and for the other way around I use Markdown package.
The issue I've been having is that when you write text like this in my editor
HEY
After many lines some text
It produces this in md
HEY
After many lines some text
Notably, it uses 2 whitespace characters and 2 LF characters (or at least I think so, but I might be slightly wrong). I solved this issue by parsing it like this
markdownToHtml(data.replaceAll('&amp;', '&').replaceAll('&lt;', '<').replaceAll('&gt;', '>'), inlineSyntaxes: [TextSyntax(String.fromCharCodes([32,32,10,10]),sub: "<div><br></div>")],inlineOnly: true );
The inlineOnly parameter was necessary because without it the text syntax wasn't applied for some reason. However, inlineOnly then bit me in the arse when I tried to implement parsing of unordered lists, which are parsed as blocks. So I need a way to correctly parse these empty lines without using inlineOnly.
class EmptyLineBlockSyntax extends BlockSyntax {
  // matches a line consisting only of two or more spaces or tabs
  @override
  RegExp get pattern => RegExp(r'^(?:[ \t][ \t]+)$');
  const EmptyLineBlockSyntax();
  @override
  Node parse(BlockParser parser) {
    parser.encounteredBlankLine = true;
    parser.advance();
    return Element('p', [Element.empty('br')]);
  }
}
return markdownToHtml(data.replaceAll('&amp;', '&').replaceAll('&lt;', '<').replaceAll('&gt;', '>'), blockSyntaxes: [EmptyLineBlockSyntax()]);
The Pandoc documentation says that cross references can be made to section headers in a number of ways. For example, you can create your own ID and reference that ID. For example:
# This is my header {#header}
This will create an ID with value '#header' that can be referenced in the text, as such:
[Link to header](#header)
Which will display the text 'Link to header' with a link to the header.
I couldn't find anywhere how to make the text of the link be the section number when compiled as a LaTeX document.
For example, if my header is compiled to '1.2.3 Section Header', I want my cross-reference to text to display as '1.2.3'.
This can be achieved by defining the ID as done previously. eg:
# This is my header {#header}
Then in the text, the cross reference can be written as:
\ref{header}
When this compiles to LaTeX, the cross-reference text will be the section number of the referenced heading.
You can use the pandoc-secnos filter, which is part of the pandoc-xnos filter suite.
The header
# This is my header {#sec:header}
is referenced using @sec:header. Alternatively, you can reference
# This is my header
using @sec:this-is-my-header.
Markdown documents coded in this way can be processed by adding --filter pandoc-secnos to the pandoc call. The --number-sections option should be used as well. The output uses LaTeX's native commands (i.e., \label and \ref or \cref).
The benefit to this approach is that output in other formats (html, epub, docx, ...) is also possible.
A general solution that works with all supported output formats can be built by leveraging pandoc Lua filters: the function pandoc.utils.hierarchicalize can be used to get the document hierarchy. We can use this to associate section IDs with section numbers, which can later be used to add these numbers to links with no link description (e.g., [](#myheader)).
local hierarchicalize = (require 'pandoc.utils').hierarchicalize
local section_numbers = {}
-- first pass: map '#id' to the section number, e.g. '1.2.3'
function populate_section_numbers (doc)
  function populate (elements)
    for _, el in pairs(elements) do
      if el.t == 'Sec' then
        section_numbers['#' .. el.attr.identifier] = table.concat(el.numbering, '.')
        populate(el.contents)
      end
    end
  end
  populate(hierarchicalize(doc.blocks))
end
-- second pass: fill in empty links that point at a known section
function resolve_section_ref (link)
  if #link.content > 0 or link.target:sub(1, 1) ~= '#' then
    return nil
  end
  local number = section_numbers[link.target]
  if not number then
    return nil -- unknown target: leave the link unchanged
  end
  local section_number = pandoc.Str(number)
  return pandoc.Link({section_number}, link.target, link.title, link.attr)
end
return {
  {Pandoc = populate_section_numbers},
  {Link = resolve_section_ref}
}
The above should be saved to a file and then passed to pandoc via the --lua-filter option.
Example
Using the example from the question
# This is my header {#header}
## Some subsection
See section [](#header), especially [](#some-subsection)
Using the above filter, the last line will render as "See section 1, especially 1.1".
Don't forget to call pandoc with option --number-sections, or headers will not be numbered.
Since pandoc version 2.8 the function pandoc.utils.hierarchicalize has been replaced with make_sections. Here is an updated version of @tarleb's answer which works with newer pandoc versions.
local make_sections = (require 'pandoc.utils').make_sections
local section_numbers = {}
-- first pass: map '#id' to the number attribute set by make_sections
function populate_section_numbers (doc)
  function populate (elements)
    for _, el in pairs(elements) do
      if el.t == 'Div' and el.attributes.number then
        section_numbers['#' .. el.attr.identifier] = el.attributes.number
        populate(el.content)
      end
    end
  end
  populate(make_sections(true, nil, doc.blocks))
end
-- second pass: fill in empty links that point at a known section
function resolve_section_ref (link)
  if #link.content > 0 or link.target:sub(1, 1) ~= '#' then
    return nil
  end
  local number = section_numbers[link.target]
  if not number then
    return nil -- unknown target: leave the link unchanged
  end
  local section_number = pandoc.Str(number)
  return pandoc.Link({section_number}, link.target, link.title, link.attr)
end
return {
  {Pandoc = populate_section_numbers},
  {Link = resolve_section_ref}
}
I am using Sphinx to write a document with lots of references:
.. _human-factor:
The Human Factor
================
...
(see :ref:`human-factor` for details)
The compiled document contains something like this:
(see The Human Factor for details)
Instead I would like to have it formatted like this:
(see 5.1 The Human Factor for details)
I tried to google the solution and I found out that the latex hyperref package can do this but I have no idea how to add this to the Sphinx build.
I resolved it by basically using numsec.py from here: https://github.com/jterrace/sphinxtr
I had to replace the doctree_resolved function with this one to get section number + title (e.g. "5.1 The Human Factor").
def doctree_resolved(app, doctree, docname):
    secnums = app.builder.env.toc_secnumbers
    for node in doctree.traverse(nodes.reference):
        if 'refdocname' in node:
            refdocname = node['refdocname']
            if refdocname in secnums:
                secnum = secnums[refdocname]
                emphnode = node.children[0]
                textnode = emphnode.children[0]
                toclist = app.builder.env.tocs[refdocname]
                anchorname = None
                # find the TOC entry whose title matches the link text
                for refnode in toclist.traverse(nodes.reference):
                    if refnode.astext() == textnode.astext():
                        anchorname = refnode['anchorname']
                if anchorname is None:
                    continue
                linktext = '.'.join(map(str, secnum[anchorname]))
                node.replace(emphnode, nodes.Text(linktext
                                                  + ' ' + textnode))
To make it work one needs to include the numsec extension in conf.py and also to add :numbered: in the toctree like so:
.. toctree::
   :maxdepth: 1
   :numbered:
I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.
How do I go about this? Simply truncating the raw text will not work, for example..
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet", if not it would take 250 words, and append "..." - for example..
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference)..
Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
Write/find a parser-agnostic Markdown truncate'r
Write/find an intelligent HTML truncating function
Write/find an intelligent HTML truncating function
The following from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/, with some modifications will correctly truncate HTML, and easily allow appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, at_end = "...")
=> <p><b>Someth...</b></p>
The modified code:
require 'rexml/parsers/pullparser'
class String
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # append the caller's suffix (e.g. "...") before closing the open tags
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end
  private
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end
Here's a solution that works for me with Textile.
Convert it to HTML
Truncate it.
Remove any HTML tags that got cut in half with
html_string.gsub(/<[^>]*$/, "")
Then use Hpricot to clean it up and close unclosed tags
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.
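A rough sketch of the truncate-then-repair steps above using only the Ruby standard library. Note the swap: unclosed tags are closed here with a simple stack scan rather than with Hpricot, so this only handles reasonably well-formed input. The method name truncate_and_close and the VOID_TAGS list are illustrative choices, not part of any library.

```ruby
# Tags that never take a closing tag (illustrative, not exhaustive).
VOID_TAGS = %w[br hr img input meta link].freeze

# Hypothetical helper: cut the HTML string to `length` characters,
# drop a tag that was cut in half, then close any tags left open.
def truncate_and_close(html, length)
  cut = html[0, length].gsub(/<[^>]*\z/, '')
  open_tags = []
  cut.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    next if VOID_TAGS.include?(name) || self_closing == '/'
    if closing == '/'
      # close the most recent matching tag and anything opened after it
      idx = open_tags.rindex(name)
      open_tags.slice!(idx..-1) if idx
    else
      open_tags.push(name)
    end
  end
  cut + open_tags.reverse.map { |tag| "</#{tag}>" }.join
end

puts truncate_and_close('<p><b>Something</b> else</p>', 12)
```

As in the steps above, the length counts markup characters as well as text, so the visible text will usually be shorter than the requested length.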
You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
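Building on the regular expression above, here is a minimal sketch that also covers the case where the author left out the marker, falling back to the first 250 words plus "..." as described in the question. The method names excerpt and without_marker are hypothetical, and the word-based fallback is an assumption about what "250 words" should mean.

```ruby
# Hypothetical helper: the text before a line of "^" characters, or the
# first `max_words` words followed by "..." when no marker is present.
def excerpt(markdown, max_words = 250)
  marker_at = markdown =~ /^\^+[ \t]*$/
  return markdown[0...marker_at].rstrip if marker_at

  words = markdown.split
  return markdown if words.length <= max_words

  words.first(max_words).join(' ') + '...'
end

# Hypothetical helper: the full document with the marker line removed,
# for use on the "Read more.." page.
def without_marker(markdown)
  markdown.sub(/^\^+[ \t]*$\n?/, '')
end

text = "Intro line.\n^^^^\nRest of article."
puts excerpt(text)
puts without_marker(text)
```

The word fallback collapses whitespace and can still cut through Markdown syntax that contains spaces (for example a link whose text is several words), so the marker remains the safer route.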
Rather than trying to truncate the text, why not have 2 input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown, and when, without having to rely on some sort of funky EOF marker.
I will have to agree with the "two inputs" approach, and the content writer would not need to worry, since you can modify the background logic to combine the two inputs into one when showing the full content.
full_content = input1 + input2 # perhaps with some complementary html, for better formatting
Not sure if it applies to this case, but adding the solution below for the sake of completeness. You can use strip_tags method if you are truncating Markdown-rendered contents:
truncate(strip_tags(markdown(article.contents)), length: 50)
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/
A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)