How can I parse JSON-LD to Markdown?

Is there an existing parser that converts JSON-LD to Markdown? I want to generate documentation from my JSON-LD file. If such a thing doesn't exist, how should I go about writing one? Or could I perhaps use a JSON-to-Markdown converter? Any suggestions on how I could do this?

I was just googling for such a program, and found your question.
The closest things I could find are: ocxmd, which is an extension to Markdown; and md-ld, which does not even use proper Markdown - instead, it apparently creates an incompatible version of the format which can be parsed to JSON-LD.
If I were writing such a converter in Python, I would use:
pyld to parse JSON-LD files and expand them using the @context;
and a template engine, likely Jinja2, to generate a Markdown representation of every node of the JSON-LD document.
The program would be based on recursion. You might have separate functions to display:
URIs,
Numbers,
Images,
...
The program would recurse over the JSON-LD document and convert each of its sections into Markdown.
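For example, here is a minimal sketch of that approach, assuming a hypothetical input file doc.jsonld and a deliberately simple node template; only pyld's jsonld.expand() and a Jinja2 Template are used, and render_node is an illustrative helper rather than a finished converter:

from pyld import jsonld
from jinja2 import Template
import json

# Template for a single node: a heading from its @id, then one bullet per property.
NODE_TEMPLATE = Template(
    "## {{ node_id }}\n"
    "{% for prop, values in props %}- **{{ prop }}**: {{ values }}\n{% endfor %}"
)

def render_node(node):
    """Render one expanded JSON-LD node as a Markdown section."""
    node_id = node.get("@id", "(blank node)")
    props = []
    for prop, values in node.items():
        if prop.startswith("@"):
            continue  # skip JSON-LD keywords such as @id and @type
        # After expansion, property values are lists of {"@value": ...} or {"@id": ...} objects.
        rendered = ", ".join(
            str(v.get("@value", v.get("@id", v))) if isinstance(v, dict) else str(v)
            for v in values
        )
        props.append((prop, rendered))
    return NODE_TEMPLATE.render(node_id=node_id, props=props)

with open("doc.jsonld") as f:   # hypothetical input file
    doc = json.load(f)

expanded = jsonld.expand(doc)   # resolves the @context into full IRIs
print("\n\n".join(render_node(node) for node in expanded))

A real converter would then recurse: when a value is itself a nested node object rather than a literal, render_node would call itself (or a type-specific function for URIs, numbers, images, and so on) instead of flattening it to a string.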

Related

Rmarkdown with pandoc templates, apply lua filter on intermediate .tex

I'm trying to use lua filters to capture images in my manuscript and list their caption in a special \section at the end of it.
I am working on an R Markdown document that itself uses a .tex template.
I wasn't able to get anywhere, so I ran a very simple filter:
function Header (head) print(pandoc.utils.stringify(head)) end
and noticed that only the headers in the markdown were recognized, not the ones in the template.
The only way I found to have lua filters recognize the elements in the template was to rerun the produced .tex file with pandoc:
pandoc -f latex -t latex -o test2.tex --lua-filter=my_filters.lua test.tex
but that removed all LaTeX formatting and structural content outside the body, e.g., \documentclass, \usepackage and other custom commands. So it's a no-go.
So the question is: is there a way to force the Lua filter to be applied after the LaTeX template has been integrated when knitting an R Markdown document?
There might be a way, but it most likely won't do what you need.
When pandoc reads a document, it parses it and converts it into its internal data structure. That internal structure can then be modified with a filter. LaTeX is a very expressive and complex document format, and any conversion from LaTeX into pandoc's internal format will result in a loss of (layout) information. That's good enough in most cases, but would be a problem in your case.
There are two possible ways to do this: one is to post-process the output, which is probably tedious and error-prone. The other is to find a way to generate the desired output, e.g. via a pandoc filter, without adding it to the template first.
I believe your other question is the right way to go.

Parsing and pretty printing the same file format in Haskell

I was wondering if there is a standard, canonical way in Haskell to write not only a parser for a specific file format but also a writer.
In my case, I need to parse a data file for analysis. However, I also simulate data to be analyzed and save it in the same file format. I could now write a parser using Parsec or something equivalent and also write functions that produce the text output in the required format, but whenever I change my file format, I would have to change two functions in my code. Is there a better way to achieve this goal?
Thank you,
Dominik
The BNFC-meta package https://hackage.haskell.org/package/BNFC-meta-0.4.0.3
might be what you are looking for:
"Specifically, given a quasi-quoted LBNF grammar (as used by the BNF Converter) it generates (using Template Haskell) a LALR parser and pretty pretty printer for the language."
Update: I found this package, which also seems to fulfill the objective (not tested yet): http://hackage.haskell.org/package/syntax

Convert MathJax output into a readable format for jqmath

On a project, I use CKEditor with the MathJax plugin in order to insert some formulas.
In another part of this project, I would like to use jqmath, because it's faster and better integrated with wkhtmltopdf (I use those formulas in some docs produced by wkhtmltopdf, and some issues exist with MathJax, especially with the overbar).
My problem: the syntax is different between MathJax and jqmath. Of course, jqmath doesn't understand my formulas written in MathJax syntax...
So my question is: is there a way to convert math strings from MathJax syntax to jqmath syntax?
Cheers
Both MathJax and jqmath use MathML internally and both understand it as an input format (jqmath added MathML input support a while back, see the copy-me.html in the distribution). So you can generate MathML from MathJax and feed that into jqmath.
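As a rough illustration of that shared format, the fragment below is hand-written MathML for x squared (a made-up example, not actual MathJax output); MathJax can emit markup of this kind, and, as noted above, jqmath accepts it as input:
<math><msup><mi>x</mi><mn>2</mn></msup></math>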

Stanford NLP - Using Parsed or Tagged text to generate Full XML

I'm trying to extract data from the PennTreeBank, Wall Street Journal corpus. Most of it already has the parse trees, but some of the data is only tagged.
i.e. wsj_DDXX.mrg and wsj_DDXX.pos files.
I would like to use the already parsed trees and tagged data in these files so as not to use the parser and taggers within CoreNLP, but I still want the output file format that CoreNLP gives; namely, the XML file that contains the dependencies, entity coreference, and the parse tree and tagged data.
I've read many of the Javadocs but I cannot figure out how to get the output the way I described.
For POS, I tried using the LexicalizedParser and it allows me to use the tags, but I can only generate an XML file with some of the information I want; there is no option for coreference or for generating the parse trees. To get it to correctly generate even these sub-optimal XML files, I had to write a script to get rid of all of the brackets within the files. This is the command I use:
java -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat typedDependenciesCollapsed,wordsAndTags -outputFilesExtension xml -outputFormatOptions xml -writeOutputFiles -outputFilesDirectory my\dir -tokenized -tagSeparator / -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerMethod newCoreLabelTokenizerFactory edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz my\wsj\files\dir
I also can't generate the data I would like to have for the WSJ data that already has the trees. I tried using what is said here and I looked at the corresponding Javadocs. I used a command similar to what is described, but I had to write a Python program to retrieve the stdout data from analyzing each file and write it into a new file. The resulting data is only a text file with the dependencies and is not in the desired XML notation.
To summarize, I would like to use the POS and tree data from these PTB files in order to generate a CoreNLP parse corresponding to what would occur if I used CoreNLP on a regular text file. The pseudo command would be like this:
java -cp "*" edu.stanford.nlp.pipeline.CoreNLP -useTreeFile wsj_DDXX.mrg
and
java -cp "*" edu.stanford.nlp.pipeline.CoreNLP -usePOSFile wsj_DDXX.pos
Edit: fixed a link.
Yes, this is possible, but it is a bit tricky and there is no out-of-the-box feature that can do this, so you will have to write some code. The basic idea is to replace the tokenize, ssplit and pos annotators (and, in case you also have trees, the parse annotator) with your own code that loads these annotations from your annotated files.
On a very high level you have to do the following:
Load your trees with MemoryTreebank
Loop through all the trees and for each tree create a sentence CoreMap to which you add
a TokensAnnotation
a TreeAnnotation and the SemanticGraphCoreAnnotations
Create an Annotation object with a list containing the CoreMap objects for all sentences
Run the StanfordCoreNLP pipeline with the annotators option set to lemma,ner,dcoref and the option enforceRequirements set to false.
Take a look at the individual annotators to see how to add the required annotations. E.g. there is a method in ParserAnnotatorUtils that adds the SemanticGraphCoreAnnotations.

Lightweight markup (wiki) language for documenting

When I write papers or documentation, using LaTeX or OpenOffice feels like overkill, as I usually only need a few markup elements (bold, headlines, lists, ...). I'd like to write my documents using a wiki-style markup as this is very efficient.
For example:
= Introduction =
'''HTML''' is a markup language...
In the end I'd like to simply convert it to PDF. (Cross-platform would be nice too.)
compiler.exe -pdf input.wiki output.pdf
Is there a tool (or simple tool chain) to do this job?
I'd personally prefer not to use LaTeX as a transformation step, although there are tools that do this job by transforming lightweight syntax to TeX and then to PDF/PS.
You might find that Markdown gets pretty close to what you want.
Markdown is a simple technique for marking up text files so that they can be post-processed into other forms. One of the nice things about Markdown is the goal that a marked-up document should remain readable as a straight text file:
"The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions."
Pandoc looks like it might be a good companion tool to convert the Markdown straight into PDF files. There may well be other choices - Pandoc is just the best tool I found with a quick Google search.
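For instance, a single command along these lines converts a Markdown file to PDF (input.md and output.pdf are placeholder names; note that Pandoc produces PDF via a LaTeX engine by default, so one needs to be installed):
pandoc input.md -o output.pdf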
reStructuredText.
You can use Sphinx to generate HTML and LaTeX (and later PDF with pdflatex).
There is also rst2pdf, though I don't know if it's mature.
You could use Markdown (example) and then use Pandoc (which also works with reStructuredText and several other wiki-like syntaxes) to convert to PDF.
