I am building an internal WYSIWYG tool, and we decided to implement the editor with contentEditable. However, we save data to our database in Markdown, so I have to be able to convert from HTML to MD and back. For HTML to MD I use the html2md package, and for the other direction I use the markdown package.
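For reference, the round trip currently looks something like this (a minimal sketch; editorHtml is a stand-in for whatever the contentEditable element contains):

import 'package:html2md/html2md.dart' as html2md;
import 'package:markdown/markdown.dart' show markdownToHtml;

// Editor HTML -> Markdown for storage in the database.
final md = html2md.convert(editorHtml);

// Stored Markdown -> HTML to load back into the editor.
final html = markdownToHtml(md);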
The issue I've been having is that when you write text like this in my editor:
HEY
After many lines some text
It produces this in Markdown:
HEY
After many lines some text
Notably, it uses two spaces and two LF characters (or at least I think so, but I might be slightly wrong). I solved this issue by parsing it like this:
markdownToHtml(
  data.replaceAll('&', '&amp;').replaceAll('<', '&lt;').replaceAll('>', '&gt;'),
  inlineSyntaxes: [
    TextSyntax(String.fromCharCodes([32, 32, 10, 10]), sub: '<div><br></div>')
  ],
  inlineOnly: true,
);
The inlineOnly parameter was necessary because without it the TextSyntax wasn't applied for some reason. However, inlineOnly then bit me in the arse when I tried to implement parsing of unordered lists, which are parsed as blocks. So I need a way to correctly parse these empty lines without using inlineOnly.
class EmptyLineBlockSyntax extends BlockSyntax {
  const EmptyLineBlockSyntax();

  // Matches a line that consists only of two or more spaces/tabs.
  @override
  RegExp get pattern => RegExp(r'^(?:[ \t][ \t]+)$');

  @override
  Node parse(BlockParser parser) {
    parser.encounteredBlankLine = true;
    parser.advance();
    // Emit an empty paragraph so the blank line survives into the HTML.
    return Element('p', [Element.empty('br')]);
  }
}
return markdownToHtml(
  data.replaceAll('&', '&amp;').replaceAll('<', '&lt;').replaceAll('>', '&gt;'),
  blockSyntaxes: [EmptyLineBlockSyntax()]);
When using MathQuill one needs to type \nsub to get the ⊈ symbol. The LaTeX you get back is \not\subset, which is fine.
But when you try to set the same \not\subset expression back into the MathQuill field, you get a different result: \neg\subset, which translates to a different rendering.
The problem can be reproduced directly on the MathQuill page (http://mathquill.com/) using the browser console:
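Something along these lines (a sketch, not the exact console session; the interface version and the selector for one of the demo fields are assumptions):

var MQ = MathQuill.getInterface(2);
var field = MQ.MathField(document.querySelector('.mathquill-editable'));
field.latex('\\not\\subset'); // set the expression
field.latex();                // comes back as '\\neg\\subset'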
Any ideas on how to handle or work around this?
MathQuill accepts \nsub while typing, but also \notsubset, which led me to a simple find-and-replace solution that does not require any other code change.
Not the best, but it works. The only downside is that the list of text replacements is fixed; the set operators are the ones I found, and there could be more.
// special math pre-processing
var mathFix = {
  "\\not\\subset": "\\notsubset",
  "\\not\\supset": "\\notsupset"
};

var mathTextFixed = '<your latex here>';
Object.keys(mathFix).forEach(function (t1) {
  // split/join replaces every occurrence; String.replace with a string
  // pattern would only replace the first match.
  mathTextFixed = mathTextFixed.split(t1).join(mathFix[t1]);
});
mathField.latex(mathTextFixed);
I'm trying to convert GitHub-flavored Markdown (GFM) text to a Markdown AST node. Because GFM allows HTML tags like <img>, I want those parsed as Markdown elements as well.
Right now, I'm using a combination of Unified + Turndown to do so. Essentially, I parse the Markdown and convert all of it to raw HTML using Unified and several plugins, then convert that back to raw Markdown using Turndown, then use Unified again to parse the new raw Markdown into an AST, and finally process that tree as I want.
This is what I have:
import unified from 'unified';
import markdown from 'remark-parse';
import gfm from 'remark-gfm';
import TurndownService from 'turndown';
import remark2rehype from 'remark-rehype';
import html from 'rehype-stringify';
import raw from 'rehype-raw';
import {parseBlocks} from './parser/internal';
import type {Root} from './markdown';
import type {ParsingOptions} from './types';
const {gfm: turndownGfm} = require('turndown-plugin-gfm');
export async function markdownToBlocks(body: string) {
  const turndownService = new TurndownService().use(turndownGfm);

  // Pass 1: markdown -> HTML, letting embedded HTML through
  // (allowDangerousHtml) and re-parsing it into real nodes (rehype-raw).
  const rawHtml = await unified()
    .use(markdown)
    .use(gfm)
    .use(remark2rehype, {allowDangerousHtml: true})
    .use(raw)
    .use(html)
    .process(body);

  // Pass 2: HTML -> markdown, so the embedded HTML becomes markdown syntax.
  const rawMarkdown = turndownService.turndown(String(rawHtml));

  // Pass 3: parse the normalized markdown into an mdast tree.
  const root = unified().use(markdown).use(gfm).parse(rawMarkdown);
  return parseBlocks(root as unknown as Root);
}
It does exactly what I want and works, but I can't help thinking there's a better, more efficient solution, since I'm parsing the content several times over.
Any advice for how I can improve this with fewer parsing steps would be appreciated, thanks!
"I also want those parsed as Markdown elements as well."
HTML inside Markdown is allowed; there is no need to parse it as Markdown. Just pass it along as-is.
Otherwise, don't reinvent the wheel: turn the Markdown into HTML and load it into a DOM document if you need an object model you can traverse.
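If all you need is an object model, you can also stay inside unified and skip the Turndown round trip entirely: one parse plus one transform yields a hast tree in which embedded HTML has become real element nodes. A sketch, assuming the same plugin versions as the question plus unist-util-visit (markdownToTree and the img traversal are illustrative, not the question's code):

import unified from 'unified';
import markdown from 'remark-parse';
import gfm from 'remark-gfm';
import remark2rehype from 'remark-rehype';
import raw from 'rehype-raw';
import visit from 'unist-util-visit';

export async function markdownToTree(body: string) {
  const processor = unified()
    .use(markdown)
    .use(gfm)
    .use(remark2rehype, {allowDangerousHtml: true})
    .use(raw); // re-parses raw HTML strings like <img> into element nodes

  // One parse plus one transform run; no Turndown round trip.
  const tree = await processor.run(processor.parse(body));

  // The resulting hast tree is already a traversable object model.
  visit(tree, 'element', (node: any) => {
    if (node.tagName === 'img') {
      console.log(node.properties.src);
    }
  });

  return tree;
}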
The Wiktionary entry for faint is at https://en.wiktionary.org/wiki/faint
The wikitext for the etymology section is:
From {{inh|en|enm|faynt}}, {{m|enm|feynt||weak; feeble}}, from
{{etyl|fro|en}} {{m|fro|faint}}, {{m|fro|feint||feigned; negligent;
sluggish}}, past participle of {{m|fro|feindre}}, {{m|fro|faindre||to
feign; sham; work negligently}}, from {{etyl|la|en}}
{{m|la|fingere||to touch, handle, usually form, shape, frame, form in
thought, imagine, conceive, contrive, devise, feign}}.
It contains various templates of the form {{xyz|...}}. I would like to parse them and get the text output as it appears on the page:
From Middle English faynt, feynt (“weak; feeble”), from Old French
faint, feint (“feigned; negligent; sluggish”), past participle of
feindre, faindre (“to feign; sham; work negligently”), from Latin
fingere (“to touch, handle, usually form, shape, frame, form in
thought, imagine, conceive, contrive, devise, feign”).
I have about 10,000 entries extracted from the freely available dumps of Wiktionary.
To do this, my plan is to extract templates and their expansions (in some form). To explore the possibilities, I've been fiddling with the Lua scripting facility on MediaWiki, trying various queries inside the debug console on module edit pages, like this one:
https://en.wiktionary.org/w/index.php?title=Module:languages/print&action=edit
mw.log(p)
>> table
mw.logObject(p)
>> table#1 {
["code_to_name"] = function#1,
["name_to_code"] = function#2,
}
p.code_to_name("aaa")
>>
p.code_to_name("ab")
>>
But I can't even get the function calls right: p.code_to_name("aaa") doesn't return anything.
The code that presumably expands the templates for the etymology section is here:
https://en.wiktionary.org/w/index.php?title=Module:etymology/templates
How do I call this code correctly?
Is there a simpler way to achieve my goal of parsing wikitext templates?
Is there some function available in MediaWiki that I can call, like parse-wikitext("text")? If so, how do I invoke it?
To expand templates (and other things) in wikitext, use frame:preprocess, which is called as a method on a frame object. To get a frame object, use mw.getCurrentFrame. For instance, typing the following in the debug console gives you the wikitext resulting from {{l|en|word}}:
mw.getCurrentFrame():preprocess('{{l|en|word}}')
>> <span class="Latn" lang="en">[[word#English|word]]</span>
You can also use the expandtemplates action in the MediaWiki API (https://en.wiktionary.org/w/api.php?action=expandtemplates&text={{l|en|word}}), or the Special:ExpandTemplates page, or JavaScript (if you open the browser console while browsing a Wiktionary page):
new mw.Api().get({
  action: 'parse',
  text: '{{l|en|word}}',
  title: mw.config.get('wgPageName'),
}).done(function (data) {
  // Note: action=parse returns rendered HTML, not wikitext.
  const html = data.parse.text['*'];
  if (html)
    console.log(html);
});
If the mediawiki.api module hasn't already been loaded and you get a TypeError ("mw.Api is not a constructor"), load it first:
mw.loader.using("mediawiki.api", function() {
// Use mw.Api here.
});
So these are some of the ways to expand templates.
Hello,
I'm making a Delphi application that I use to create Word documents. I'm done with the basic Word operations (create/save, text, tables, etc.).
What I need to do is insert the page numbers of headings as cross-references in the text, something like:
"... and the process works as explained on page 23..."
where the page number is a hyperlink to a heading. When I recorded a macro in Word, it looks like this:
Selection.InsertCrossReference ReferenceType:="Heading", _
ReferenceKind:=wdPageNumber, ReferenceItem:="49", InsertAsHyperlink:=True _
, IncludePosition:=False, SeparateNumbers:=False, SeparatorString:=" "
What would be the equivalent in Delphi please?
Thanks in advance!
Arjun.
Even when you're using late binding, you still need to provide all of the parameters in the same order as in the original declaration.
expression.InsertCrossReference(ReferenceType, ReferenceKind, ReferenceItem,
InsertAsHyperlink, IncludePosition, SeparateNumbers, SeparatorString)
If you aren't using a parameter then you can replace it with EmptyParam.
So I think your code will be:
Selection.InsertCrossReference('Heading', 7, '49', True, False, False, ' ');
(I think that wdPageNumber's value is 7).
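For example, if you didn't need the separator string, you could pass EmptyParam in its place (WordApp here is an assumed late-bound Word.Application variant):

// SeparatorString isn't needed here, so EmptyParam fills its slot.
WordApp.Selection.InsertCrossReference('Heading', 7, '49',
  True, False, False, EmptyParam);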
I have a file containing a text representation of an object. I have written a combinator parser grammar that parses the text and returns the object. In the text, "#" is a comment delimiter: everything from that character to the end of the line is ignored. Blank lines are also ignored. I want to process text one line at a time, so that I can handle very large files.
I don't want to clutter up my parser grammar with generic comment and blank-line logic. I'd like to remove these in a preprocessing step. Converting the file to an iterator over lines, I can do something like this:
Source.fromFile("file.txt").getLines.map(_.replaceAll("#.*", "").trim).filter(!_.isEmpty)
How can I pass the output of an expression like that into a combinator parser? I can't figure out how to create a Reader object out of a filtered expression like this. The Java FileReader interface doesn't work that way.
Is there a way to do this, or should I put my comment and blank line logic in the parser grammar? If the latter, is there some util.parsing package that already does this for me?
The simplest way to do this is to use the fromLines method on PagedSeq:
import scala.collection.immutable.PagedSeq
import scala.io.Source
import scala.util.parsing.input.PagedSeqReader
val lines = Source.fromFile("file.txt").getLines.map(
_.replaceAll("#.*", "").trim
).filterNot(_.isEmpty)
val reader = new PagedSeqReader(PagedSeq.fromLines(lines))
And now you've got a scala.util.parsing.input.Reader that you can plug into your parser. This is essentially what happens when you parse a java.io.Reader, anyway—it immediately gets wrapped in a PagedSeqReader.
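For instance, you can feed the reader straight into a RegexParsers-based grammar (the grammar below is a toy stand-in for your own):

import scala.util.parsing.combinator.RegexParsers

// Toy grammar: the input is just a sequence of whitespace-separated tokens.
object MyParser extends RegexParsers {
  def tokens: Parser[List[String]] = rep("""\S+""".r)
}

MyParser.parseAll(MyParser.tokens, reader) match {
  case MyParser.Success(result, _) => println(result)
  case failure                     => println(failure)
}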
Not the prettiest code you'll ever write, but you could go through a new Source as follows:
val SEP = System.getProperty("line.separator")

def lineMap(fileName: String, trans: String => String): Source =
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line) + SEP
    ).toIterable
  )
Explanation: flatMap produces an iterator over characters, which you can turn into an Iterable, which you can use to build a new Source. You need the extra SEP because getLines strips the separators (hard-coding \n may not work, since Source would then fail to separate the lines properly).
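For example, to strip comments from every line of a hypothetical in.txt:

val stripped: Source = lineMap("in.txt", _.replaceAll("#.*", "").trim)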
If you want to apply filtering too, i.e. remove some of the lines, you could for instance try:
// Whenever `trans` returns `None`, the line is dropped.
def lineMapFilter(fileName: String, trans: String => Option[String]): Source =
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line).map(_ + SEP).getOrElse("")
    ).toIterable
  )
As an example:
lineMapFilter("in.txt", line => if(line.isEmpty) None else Some(line.reverse))
...will remove empty lines and reverse non-empty ones.