(Nearley.js) How can I import files whose content I want to be the input of my grammars, as in LaTeX?

What I want:
First, assume I am able to successfully parse the following with my grammar.ne:
\begin{chapter}
...content
\end{chapter}
The desired behavior is for my grammar.ne to parse a source file that contains the above text. For example, in LaTeX one can write something like this:
\chapter{filename}
The problem is simply that I don't really know how to do it. But I'll explain what I am trying.
What I am trying
# The chapter rule works well.
chapter -> "\\begin{junior}" _ chapterContent _ "\\end{junior}" {% (data) => data[2] %}
# chapterTag below is able to read the file and return a string.
# What I don't know is how to make the grammar parse that string as a chapter.
chapterTag -> "\\chapter{" anyCharacters "}"
{% (data) => {
// Assume function readfile exists.
const juniorText = readfile(data[1]);
// How do I tell Nearley to parse this string as a chapter?
return juniorText;
}
%}
If you've used LaTeX before, you know what the file named in the chapter tag would contain (\begin{chapter} ... \end{chapter}).
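One possible approach, offered as a sketch and not as a built-in Nearley feature, is to expand every \chapter{filename} tag in a preprocessing pass, so the grammar only ever sees the \begin{chapter} ... \end{chapter} form it already parses. The example below is written in Python for brevity, and the same idea ports to JavaScript. The names expand_chapters and read_file are hypothetical, and the tag syntax is assumed to be exactly \chapter{...} with no nested braces.

```python
import re
from typing import Callable

# Matches \chapter{filename}; assumes the filename contains no closing brace.
CHAPTER_TAG = re.compile(r"\\chapter\{([^}]*)\}")


def expand_chapters(
    text: str,
    read_file: Callable[[str], str],
    depth: int = 0,
    max_depth: int = 10,
) -> str:
    """Replace each \\chapter{name} tag with the content of the named file.

    Included files are expanded recursively, so a chapter may itself
    include other chapters. max_depth guards against circular includes.
    """
    if depth > max_depth:
        raise RecursionError("chapter includes are nested too deeply")

    def replace(match: re.Match) -> str:
        included = read_file(match.group(1))
        return expand_chapters(included, read_file, depth + 1, max_depth)

    # A function replacement is inserted literally, so backslashes in the
    # included text (such as \begin) are not treated as escapes.
    return CHAPTER_TAG.sub(replace, text)
```

The expanded string can then be fed to the parser as a single input, which keeps file access out of the grammar's postprocessors.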

Related

How does expect toMatch work?

I'm doing a POC with Playwright, using {expect} from '@playwright/test'.
I'm a bit confused about how to write the regexp. I expect this string to validate, and a regex validator confirms that the pattern is correct.
expect('abc123').toMatch('abc(\d+)')
First, the '\' is flagged as "Unnecessary escape character".
I tried '\\', which reports an error.
I removed the '\', which also reports an error.
How about toMatchText:
import { expect } from '@playwright/test';
await expect('abc123').toMatchText(/abc\d+/);
I don't see toMatch in the docs, so I used toMatchText(). I also think that capturing group in the regex () is not necessary in this example, so \d+ should be enough.
import { expect } from '@playwright/test';
// Backslashes must be doubled inside a string literal; '\d' would collapse to 'd'.
let pattern = new RegExp('abc(\\d+)');
const text = await page.locator('.label').textContent();
expect(text).toEqual(expect.stringMatching(pattern));
In this example the regular expression arrives as a string and is converted to a RegExp object. Because the pattern is a string literal, each backslash has to be doubled ('\\d'); a single '\d' is read as a plain 'd'.
The text is then compared against the pattern with expect.stringMatching, as shown in the last line.
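The escaping issue is not specific to Playwright. As a rough illustration in Python (an assumption of this note, not part of the original answers), the snippet below compares a pattern with a doubled backslash, the same pattern as a raw string, and the pattern that results when the backslash is lost, which is what a JavaScript string literal 'abc(\d+)' evaluates to.

```python
import re

# Doubled backslash in an ordinary string literal.
escaped = "abc(\\d+)"
# Raw string: the backslash is kept as written.
raw = r"abc(\d+)"
# What remains when the backslash is dropped from the string literal.
collapsed = "abc(d+)"


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


same_pattern = escaped == raw
digits_match = matches(escaped, "abc123")
collapsed_match = matches(collapsed, "abc123")
```

Here the collapsed pattern looks for the letter d after abc, so it no longer matches abc123.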

Dart Markdown package, how to handle new lines

I am trying to make an internal WYSIWYG tool, and we decided to implement it with contentEditable. However, we save data to our databases as Markdown, so I have to be able to convert from HTML to Markdown and back. For HTML to Markdown I use the html2md package, and for the other direction I use the markdown package.
The issue I've been having is that when you type text like this into my editor:
HEY
After many lines some text
It produces this in md
HEY
After many lines some text
Notably, it uses 2 spaces and 2 LF characters (or at least I think so, though I might be slightly wrong). I solved this issue by parsing it like this:
markdownToHtml(data.replaceAll('&', '&amp;').replaceAll('<', '&lt;').replaceAll('>', '&gt;'), inlineSyntaxes: [TextSyntax(String.fromCharCodes([32, 32, 10, 10]), sub: "<div><br></div>")], inlineOnly: true);
The inlineOnly parameter was necessary because without it the TextSyntax wasn't applied for some reason. However, inlineOnly then came back to bite me when I tried to implement parsing of unordered lists, which are parsed as blocks. So I need a way to correctly parse these empty lines without using inlineOnly.
class EmptyLineBlockSyntax extends BlockSyntax{
RegExp get pattern => RegExp(r'^(?:[ \t][ \t]+)$');
const EmptyLineBlockSyntax();
Node parse(BlockParser parser) {
parser.encounteredBlankLine = true;
parser.advance();
return Element('p',[Element.empty('br')]);
}
}
return markdownToHtml(data.replaceAll('&', '&amp;').replaceAll('<', '&lt;').replaceAll('>', '&gt;'), blockSyntaxes: [EmptyLineBlockSyntax()]);

Groovy: searching for and extracting XML code from a log file

My log file contains a lot of text, but some lines contain a response as XML. I have to cut that XML out and move it to another file.
For example:
sThread1....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
important xml code to cut and move to other file: <response><important> 1 </import...></response>
important xml code to other file: <response><important> 2 </important...></response>
sThread2....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
Complication: the XML starts at a different character position on each line (it does not always begin at the same offset).
Please help me find a method for locating the XML in the text.
So far I have tested the substring() method, but the XML does not always start at the same position :(
EDIT:
I found what I wanted: the function I was looking for is indexOf().
I needed the position in the line where the prefix (such as "Response is : ") ends and the XML begins, so I used:
int positionOfXmlInLine = lineTxt.indexOf("<response")
After that I can cut the string from that position to the end of the line:
def cuttedText = lineTxt.substring(positionOfXmlInLine);
Now I have only the XML from the log file.
The next step is parsing the XML values, as BDKosher describes in the answer below.
Hopefully this helps someone.
You might be able to leverage XmlSlurper for this, assuming your XML is valid enough. The code below will take each line of the log, wrap it in a root element, and parse it. Once parsed, it extracts and prints out the value of the <important> element's value attribute, but instead you could do whatever you need to do with the data:
def input = '''
sThread1..sdadassda..sdadasdsada....sdadasdas...
important code to cut and move to other file: **<response><important value="1"></important></response>**
important code to other file: ****<response><important value="3"></important></response>****
sThread2..dsadasd.s.da.das.d.as.das.d.as.da.sd.a.
'''
def parser = new XmlSlurper()
input.eachLine { line, lineNo ->
def output = parser.parseText("<wrapper>$line</wrapper>")
if (!output.response.isEmpty()) {
println "Line $lineNo is of importance ${output.response.important.@value.text()}"
}
}
This prints out:
Line 2 is of importance 1
Line 3 is of importance 3

Preprocessing Scala parser Reader input

I have a file containing a text representation of an object. I have written a combinator parser grammar that parses the text and returns the object. In the text, "#" is a comment delimiter: everything from that character to the end of the line is ignored. Blank lines are also ignored. I want to process text one line at a time, so that I can handle very large files.
I don't want to clutter up my parser grammar with generic comment and blank line logic. I'd like to remove these as a preprocessing step. After converting the file to an iterator over lines, I can do something like this:
Source.fromFile("file.txt").getLines.map(_.replaceAll("#.*", "").trim).filter(!_.isEmpty)
How can I pass the output of an expression like that into a combinator parser? I can't figure out how to create a Reader object out of a filtered expression like this. The Java FileReader interface doesn't work that way.
Is there a way to do this, or should I put my comment and blank line logic in the parser grammar? If the latter, is there some util.parsing package that already does this for me?
The simplest way to do this is to use the fromLines method on PagedSeq:
import scala.collection.immutable.PagedSeq
import scala.io.Source
import scala.util.parsing.input.PagedSeqReader
val lines = Source.fromFile("file.txt").getLines.map(
_.replaceAll("#.*", "").trim
).filterNot(_.isEmpty)
val reader = new PagedSeqReader(PagedSeq.fromLines(lines))
And now you've got a scala.util.parsing.input.Reader that you can plug into your parser. This is essentially what happens when you parse a java.io.Reader, anyway—it immediately gets wrapped in a PagedSeqReader.
Not the prettiest code you'll ever write, but you could go through a new Source as follows:
val SEP = System.getProperty("line.separator")
def lineMap(fileName : String, trans : String=>String) : Source = {
Source.fromIterable(
Source.fromFile(fileName).getLines.flatMap(
line => trans(line) + SEP
).toIterable
)
}
Explanation: flatMap will produce an iterator on characters, which you can turn into an Iterable, which you can use to build a new Source. You need the extra SEP because getLines removes it by default (using \n may not work as Source will not properly separate the lines).
If you want to apply filtering too, i.e. remove some of the lines, you could for instance try:
// whenever `trans` returns `None`, the line is dropped.
def lineMapFilter(fileName : String, trans : String=>Option[String]) : Source = {
Source.fromIterable(
Source.fromFile(fileName).getLines.flatMap(
line => trans(line).map(_ + SEP).getOrElse("")
).toIterable
)
}
As an example:
lineMapFilter("in.txt", line => if(line.isEmpty) None else Some(line.reverse))
...will remove empty lines and reverse non-empty ones.
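For comparison, the preprocessing step on its own (strip "#" comments, trim, drop blank lines, one line at a time) can be sketched as a lazy generator. This is a Python illustration of the filtering logic only, not of the Scala Reader plumbing, and the name clean_lines is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Everything from "#" to the end of the line is a comment.
COMMENT = re.compile(r"#.*")


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines with comments removed, skipping lines that end up empty.

    Lines are processed lazily, so a very large file is never held
    in memory all at once.
    """
    for line in lines:
        stripped = COMMENT.sub("", line).strip()
        if stripped:
            yield stripped
```

Because an open file object is itself an iterable of lines, clean_lines(open("file.txt")) streams the cleaned lines without reading the whole file first.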

How to create a parser which tokenizes a list of words taken from a file?

I am trying to write a syntax checker for text for my compilers class. The idea is: I have some rules, which are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".
So first I have to tokenize the input "Ruby is great". I have a text file "verbs" with a lot of verbs, one per line. Then I have one text file "adjectives", one "pronouns", etc.
I am trying to use Ragel to create a parser, but I don't know how I could do something like:
%%{
machine test;
subject = <open-the-subjects-file-and-accept-each-one-of-them>;
verb = <open-the-verbs-file-and-accept-each-one-of-them>;
adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
main = subject verb adjective @ { print "Valid phrase!" } ;
}%%
I looked at ANTLR, Lex/Yacc, Ragel, etc., but couldn't find one that seemed to solve this problem. The only way I could think of was to preprocess Ragel's input file, so that my program reads each word file and writes its contents at the right place. But I don't like this solution either.
Does anyone know how I could do this? It doesn't have to be Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not really necessary either.
Thanks.
If you want to read the files at compile time, put them in this format:
subject = \
ruby|\
python|\
c++
Then use Ragel's 'include' or 'import' statement (I forget which; check the manual) to import it.
If you want to check the list of subjects at run time, you could make Ragel read 3 words and associate an action with each word. The action can read the file and look up whether the word is valid at run time.
In the example below, the action reads the text file and compares each entry with the word.
%%{
machine test;
action startWord {
    lastWordStart = p
}
action checkSubject {
    # In a leaving action, p already points one past the end of the word.
    word = input[lastWordStart:p]
    # Lines read from the file keep their trailing newline, so strip before comparing.
    for possible in open('subjects.txt'):
        if possible.strip() == word:
            fgoto verb;
    # If we get here do whatever ragel does to go to an error or just raise a python exception
    raise Exception("Invalid subject '%s'" % word)
}
action checkVerb { .. exercise for reader .. ;) }
action checkAdjective { .. put adjective checking code here .. }
subject = ws*.(alnum*)>startWord%checkSubject;
verb := ws*.(alnum*)>startWord%checkVerb;
adjective := ws*.(alnum*)>startWord%checkAdjective;
main := subject;
}%%
With bison I would write the lexer by hand and have it look up each word in the predefined dictionaries.
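A hand-written lexer of that kind needs no parser generator at all. The sketch below, in Python, classifies each word by looking it up in per-category word sets and then checks the SUBJECT VERB ADJECTIVE rule from the question. The names load_words, tokenize, and is_valid_phrase are hypothetical, and the sketch assumes one word per line in each file and whitespace-separated input.

```python
def load_words(path: str) -> set[str]:
    """Read one word per line from a file into a lowercase set."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def tokenize(phrase: str, lexicon: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Turn a phrase into (category, word) pairs using dictionary lookup.

    The first category whose word set contains the word wins.
    """
    tokens = []
    for word in phrase.split():
        key = word.lower()
        for category, words in lexicon.items():
            if key in words:
                tokens.append((category, word))
                break
        else:
            raise ValueError(f"Unknown word: {word!r}")
    return tokens


def is_valid_phrase(phrase: str, lexicon: dict[str, set[str]]) -> bool:
    """Check the rule 'a valid phrase is SUBJECT VERB ADJECTIVE'."""
    categories = [category for category, _ in tokenize(phrase, lexicon)]
    return categories == ["SUBJECT", "VERB", "ADJECTIVE"]
```

In use, the lexicon would be built once at startup, for example {"SUBJECT": load_words("subjects"), "VERB": load_words("verbs"), "ADJECTIVE": load_words("adjectives")}, so the files are read only once instead of on every lookup.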
