Strange errors when using ParseTree - Rascal

I am playing around with parsing code using the java15 syntax.
I noticed that when parsing an entire class, it gives me an error if the class file ends with an empty line. I wrote some code to remove these empty lines before parsing, but is there a more structural solution? Or am I missing something?
Related: when I try to parse a single method, as soon as I change the placement of the braces { } (on a separate line or not, for example) I receive an error.
|std:///ParseTree.rsc|(14967,5,<455,80>,<455,85>): ParseError(|java+class:///smallsql/database/language/Language_it|(10537,0,<152,0>,<152,0>))
at parse(|std:///ParseTree.rsc|(14967,5,<455,80>,<455,85>))
at $root$(|prompt:///|(0,7,<1,0>,<1,7>))
at *** somewhere ***(|std:///ParseTree.rsc|(14967,5,<455,80>,<455,85>))
at parse(|std:///ParseTree.rsc|(14967,5,<455,80>,<455,85>))
at $root$(|prompt:///|(0,7,<1,0>,<1,7>))

I'm guessing you are parsing using code like one of these:
parse(#CompilationUnit, file)
[CompilationUnit] file
But a plain CompilationUnit does not accept leading or trailing layout (such as that final empty line).
To solve this, you should declare the start non-terminal of the grammar like so:
start syntax CompilationUnit = ... ;
layout L = [\t\n\r\ ]*; // it's necessary to have L be nullable
and this will (internally and hidden) generate for you a new production, like so:
syntax start[CompilationUnit] = L CompilationUnit top L;
With this, you can then parse a whole file which might end with layout, again using either form:
parse(#start[CompilationUnit], file)
[start[CompilationUnit]] file
To extract the CompilationUnit, the top field comes in handy:
CompilationUnit u = parse(#start[CompilationUnit], file).top;

Related

Dart Markdown package, how to handle new lines

I am trying to make a WYSIWYG internal tool, and we decided to implement this feature with contentEditable. However, we save data to our database in Markdown, so I have to be able to convert from HTML to MD and back. For HTML to MD I use the html2md package, and for the other way around I use the Markdown package.
The issue I've been having is that when you write text like this in my editor:
HEY
After many lines some text
it produces this in MD:
HEY
After many lines some text
Notably, it uses 2 space and 2 LF characters (or at least I think so, but I might be slightly wrong). I solved this issue by parsing it like this:
markdownToHtml(data.replaceAll('&', '&amp;').replaceAll('<', '&lt;').replaceAll('>', '&gt;'),
    inlineSyntaxes: [TextSyntax(String.fromCharCodes([32, 32, 10, 10]), sub: "<div><br></div>")],
    inlineOnly: true);
The inlineOnly parameter was necessary because without it the text syntax wasn't applied for some reason. However, inlineOnly then bit me in the arse when I tried to implement parsing of unordered lists, which are parsed as blocks. So I need a way to correctly parse these empty lines without using inlineOnly.
class EmptyLineBlockSyntax extends BlockSyntax {
  // Matches a line consisting solely of two or more spaces/tabs.
  @override
  RegExp get pattern => RegExp(r'^(?:[ \t][ \t]+)$');

  const EmptyLineBlockSyntax();

  @override
  Node parse(BlockParser parser) {
    parser.encounteredBlankLine = true;
    parser.advance();
    // Emit an empty paragraph containing a <br> for the blank line.
    return Element('p', [Element.empty('br')]);
  }
}
return markdownToHtml(
    data.replaceAll('&', '&amp;').replaceAll('<', '&lt;').replaceAll('>', '&gt;'),
    blockSyntaxes: [EmptyLineBlockSyntax()]);

Embedding mRuby: retrieving mrb_parser_message after parse error

I'm trying to embed mRuby in a Max MSP object. One of the first things I want to set up is error logging in the Max IDE console window. To that effect, after I parse the code (stored in a C string) with mrb_parse_string, I expect errors to be available in the parser's error_buffer array, but the structures in this array are always empty (lineno and column set to 0 and message set to NULL) even when there is an error.
Is there a special way to set up the parser before parsing the code so that it fills its error_buffer array properly when an error occurs? I've looked into the mirb source, but it doesn't look like it. I'm lost. Here is the code I'm using, taken from a small C program I'm using as a test:
mrb_state *mrb;                   /* initialized elsewhere in the test program */
char *code;                       /* the Ruby source to parse */
struct mrb_parser_state *parser;
int i;

parser = mrb_parse_string(mrb, code, mrbc_context_new(mrb));
if (parser->nerr > 0) {
    for (i = 0; i < parser->nerr; i++) {
        printf("line %d:%d: %s\n", parser->error_buffer[i].lineno,
               parser->error_buffer[i].column,
               parser->error_buffer[i].message);
    }
    return -1;
}
When passed the following faulty Ruby code:
[1,1,1]]
the previous code outputs:
line 1:8: syntax error, unexpected ']', expecting $end
line 0:0: (null)
I don't know where the first line comes from, since I compiled mRuby with MRB_DISABLE_STDIO defined, as line 14 and following of mrbconf.md suggest, but it is accurate.
The second line is the actual output from my code and shows that the returned mrb_parser_state structure's error_buffer is empty, which is surprising since the parser did see an error.
Sorry, I totally misunderstood your question.
So you want to:
1. capture the script's syntax errors instead of printing them;
2. make MRB_DISABLE_STDIO work.
For the 1st issue,
struct mrb_parser_state *parser;
parser = mrb_parse_string(mrb, code, mrbc_context_new(mrb));
should be replaced with:
struct mrbc_context *cxt;
struct mrb_parser_state *parser;
cxt = mrbc_context_new(mrb);
cxt->capture_errors = TRUE;
parser = mrb_parse_string(mrb, code, cxt);
like what mirb does.
For the 2nd issue, I don't know your build_config.rb, so I can't say much about it.
Some notes to make things accurate:
MRB_DISABLE_STDIO is a compile-time flag for building mruby, so you need to pass it in build_config.rb like:
cc.defines << %w(MRB_DISABLE_STDIO)
(see build_config_ArduinoDue.rb)
line 1:8: syntax error, unexpected ']', expecting $end
is the parse error from the mruby parser ([1,1,1]] must be [1,1,1]).
And 1:8 means the 8th column of the 1st line (which points to the unnecessary ]), so it seems to me like your C code is working correctly.
(For reference, your code's compilation error in CRuby:
https://wandbox.org/permlink/KRIlW2956TnS6puD )
prog.rb:1: syntax error, unexpected ']', expecting end-of-input
[1,1,1]]
^

Groovy: searching and extracting XML code from a log file

I have a lot of text in my log file, but sometimes I get responses as XML code, and I have to cut this XML code out and move it to other files.
For example:
sThread1....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
important xml code to cut and move to other file: <response><important> 1 </import...></response>
important xml code to other file: <response><important> 2 </important...></response>
sThread2....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
Hindrance: the XML code starts at a different character position on each line (it does not always start at the same column).
Please help me find a method to locate the XML code in the text.
Right now I have tested the substring() method, but the XML code does not always start at the same position :(
EDIT:
I found what I wanted; the function I was searching for was indexOf().
I needed the index of the character where the String "Response is : " ends, so I used:
int positionOfXmlInLine = lineTxt.indexOf("<response")
And after this I can cut the string from there to the end of the line:
def cuttedText = lineTxt.substring(positionOfXmlInLine);
So right now I have only the XML text/code from the log file.
Next is parsing the XML value, like BDKosher wrote below.
Hopefully that will help someone.
You might be able to leverage XmlSlurper for this, assuming your XML is valid enough. The code below will take each line of the log, wrap it in a root element, and parse it. Once parsed, it extracts and prints out the value of the <important> element's value attribute, but instead you could do whatever you need to do with the data:
def input = '''
sThread1..sdadassda..sdadasdsada....sdadasdas...
important code to cut and move to other file: <response><important value="1"></important></response>
important code to other file: <response><important value="3"></important></response>
sThread2..dsadasd.s.da.das.d.as.das.d.as.da.sd.a.
'''

def parser = new XmlSlurper()
input.eachLine { line, lineNo ->
    def output = parser.parseText("<wrapper>$line</wrapper>")
    if (!output.response.isEmpty()) {
        println "Line $lineNo is of importance ${output.response.important.@value.text()}"
    }
}
This prints out:
Line 2 is of importance 1
Line 3 is of importance 3

Preprocessing Scala parser Reader input

I have a file containing a text representation of an object. I have written a combinator parser grammar that parses the text and returns the object. In the text, "#" is a comment delimiter: everything from that character to the end of the line is ignored. Blank lines are also ignored. I want to process text one line at a time, so that I can handle very large files.
I don't want to clutter up my parser grammar with generic comment and blank-line logic. I'd like to remove these as a preprocessing step. Converting the file to an iterator over lines, I can do something like this:
Source.fromFile("file.txt").getLines.map(_.replaceAll("#.*", "").trim).filter(!_.isEmpty)
How can I pass the output of an expression like that into a combinator parser? I can't figure out how to create a Reader object out of a filtered expression like this. The Java FileReader interface doesn't work that way.
Is there a way to do this, or should I put my comment and blank line logic in the parser grammar? If the latter, is there some util.parsing package that already does this for me?
The simplest way to do this is to use the fromLines method on PagedSeq:
import scala.collection.immutable.PagedSeq
import scala.io.Source
import scala.util.parsing.input.PagedSeqReader
val lines = Source.fromFile("file.txt").getLines.map(
  _.replaceAll("#.*", "").trim
).filterNot(_.isEmpty)
val reader = new PagedSeqReader(PagedSeq.fromLines(lines))
And now you've got a scala.util.parsing.input.Reader that you can plug into your parser. This is essentially what happens when you parse a java.io.Reader, anyway—it immediately gets wrapped in a PagedSeqReader.
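For completeness, here is a minimal sketch of that plugging-in step, reusing the reader value built above. The MyParsers object and its obj rule are hypothetical stand-ins for your real grammar; any Parsers subclass works, since parseAll accepts a Reader[Char]:
import scala.util.parsing.combinator.RegexParsers

// Stand-in grammar; obj is just a placeholder rule that collects whitespace-separated tokens.
object MyParsers extends RegexParsers {
  def obj: Parser[List[String]] = rep("""\S+""".r)
}

MyParsers.parseAll(MyParsers.obj, reader) match {
  case MyParsers.Success(value, _)    => println(s"parsed: $value")
  case MyParsers.NoSuccess(msg, rest) => println(s"error at ${rest.pos}: $msg")
}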
Not the prettiest code you'll ever write, but you could go through a new Source as follows:
val SEP = System.getProperty("line.separator")

def lineMap(fileName: String, trans: String => String): Source = {
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line) + SEP
    ).toIterable
  )
}
Explanation: flatMap will produce an iterator on characters, which you can turn into an Iterable, which you can use to build a new Source. You need the extra SEP because getLines removes it by default (using \n may not work as Source will not properly separate the lines).
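For example, a hypothetical call that strips comments as in the question would look like this (empty lines are kept here; filtering is handled by the variant below):
val cleaned: Source = lineMap("file.txt", _.replaceAll("#.*", "").trim)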
If you want to apply filtering too, i.e. remove some of the lines, you could for instance try:
// Whenever `trans` returns `None`, the line is dropped.
def lineMapFilter(fileName: String, trans: String => Option[String]): Source = {
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line).map(_ + SEP).getOrElse("")
    ).toIterable
  )
}
As an example:
lineMapFilter("in.txt", line => if(line.isEmpty) None else Some(line.reverse))
...will remove empty lines and reverse non-empty ones.
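To actually feed the result into a combinator parser you still need a Reader; one way, sketched here by reusing lineMapFilter and the PagedSeq approach from the first answer (and assuming PagedSeq.fromSource, since a Source is an Iterator[Char]), is:
import scala.collection.immutable.PagedSeq
import scala.util.parsing.input.PagedSeqReader

// Drop comments and blank lines, then wrap the resulting Source in a parser Reader.
val src = lineMapFilter("file.txt", { line =>
  val stripped = line.replaceAll("#.*", "").trim
  if (stripped.isEmpty) None else Some(stripped)
})
val reader = new PagedSeqReader(PagedSeq.fromSource(src))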

How to create a parser which tokenizes a list of words taken from a file?

I am trying to build a text syntax corrector for my compilers class. The idea is: I have some rules, which are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".
OK, so first I have to tokenize the input "Ruby is great". So I have a text file "verbs", with a lot of verbs, one per line. Then I have one text file "adjectives", one "pronouns", etc.
I am trying to use Ragel to create a parser, but I don't know how I could do something like:
%%{
  machine test;

  subject   = <open-the-subjects-file-and-accept-each-one-of-them>;
  verb      = <open-the-verbs-file-and-accept-each-one-of-them>;
  adjective = <open-the-adjective-file-and-accept-each-one-of-them>;

  main = subject verb adjective # { print "Valid phrase!" } ;
}%%
I looked at ANTLR, Lex/Yacc, Ragel, etc., but couldn't find one that seemed to solve this problem. The only way to do this that I could think of was to preprocess Ragel's input file, so that my program reads the word files and writes their contents at the right place. But I don't like this solution either.
Does anyone know how I could do this? It doesn't have to be with Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not strictly necessary either.
Thanks.
If you want to read the files at compile time, make them be of the format:
subject = \
ruby|\
python|\
c++
then use Ragel's 'include' or 'import' statement (I forget which; check the manual) to import it.
If you want to check the list of subjects at run time, maybe just make Ragel read 3 words, then have an action associated with each word. The action can read the corresponding text file at runtime and look up whether the word is valid or not:
%%{
  machine test;

  action startWord {
    lastWordStart = p;
  }

  action checkSubject {
    word = input[lastWordStart:p+1]
    for possible in open('subjects.txt'):
      if possible.strip() == word:   # strip the trailing newline before comparing
        fgoto verb
    # If we get here, do whatever Ragel does to go to an error state, or just raise a Python exception
    raise Exception("Invalid subject '%s'" % word)
  }

  action checkVerb { .. exercise for reader .. ;) }
  action checkAdjective { .. put adjective checking code here .. }

  subject   = ws* . (alnum*) >startWord %checkSubject;
  verb     := ws* . (alnum*) >startWord %checkVerb;
  adjective := ws* . (alnum*) >startWord %checkAdjective;

  main := subject;
}%%
With Bison I would write the lexer by hand and have it look up the words in the predefined dictionaries.
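The hand-written-lexer idea is language-agnostic. Below is a minimal sketch in Scala (the file names subjects.txt, verbs.txt and adjectives.txt are hypothetical word lists, one word per line) that tags each word of a phrase by dictionary lookup and checks the SUBJECT VERB ADJECTIVE shape:
import scala.io.Source

object DictLexer {
  // Load a word list, one word per line.
  def load(path: String): Set[String] =
    Source.fromFile(path).getLines.map(_.trim.toLowerCase).filter(_.nonEmpty).toSet

  val subjects   = load("subjects.txt")
  val verbs      = load("verbs.txt")
  val adjectives = load("adjectives.txt")

  // Tag each word with its category, or UNKNOWN if it appears in no dictionary.
  def tokenize(phrase: String): List[(String, String)] =
    phrase.toLowerCase.split("\\s+").toList.filter(_.nonEmpty).map { w =>
      val tag =
        if (subjects(w)) "SUBJECT"
        else if (verbs(w)) "VERB"
        else if (adjectives(w)) "ADJECTIVE"
        else "UNKNOWN"
      (w, tag)
    }

  // A phrase is valid if its tags are exactly SUBJECT VERB ADJECTIVE.
  def isValid(phrase: String): Boolean =
    tokenize(phrase).map(_._2) == List("SUBJECT", "VERB", "ADJECTIVE")
}

// e.g. DictLexer.isValid("Ruby is great") would be true given suitable word lists.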
