Accessing parsed data from Parser in ANTLR4

I am processing my input file through my custom grammar, extracting tokens, and returning them in a HashMap. In ANTLR 3 I was able to parse the file by invoking the rule() method on the parser and then reading the HashMap through something like parser.record.
This doesn't seem to work in ANTLR 4. I referred to the book, and it appears that I have to call parser.init() to start parsing, but I don't see any such method in my parser.
I used ANTLRWorks 2 to generate my lexer and parser files. I didn't generate the listener classes.

In ANTLR 4, you start parsing by calling the method matching the name of the entry rule. If the rule in your grammar is called rule, you would start parsing by calling rule(). If the rule in your grammar is called init, then you would start parsing by calling init().
Note that the Java target in ANTLR 4 does not allow you to have a rule called rule, so if you have such a rule in your grammar you'll need to rename it before the grammar will compile.

Related

Implementing "includes" when parsing in Attoparsec

I am writing a DSL for fun. I decided to use attoparsec because I was familiar with it.
I want to implement parsing of includes with relative filenames like this:
include /some/dir/file.ext
or URLs:
include http://blah.com/my/file.ext
So when I'm parsing I expect to read the referenced resource and parse the entire thing, appending its contents to the "outer" parsing state.
The problem is that although the parsing of these statements is easy, I can't run IO (as I understand it) within my Attoparsec parsers.
How do I use Attoparsec to achieve this? Do I chop up the initial input using some string filtering and then pass each "block" to parse and feed accordingly? Essentially a two-pass parse approach?

Attoparsec is pure (Data.Attoparsec.Internal.Types.Parser is not a transformer and doesn’t include IO) so you’re right that you can’t expand includes from within a parser directly.
Splitting the work into two passes seems like the right approach. The first pass acts like the C preprocessor, accepting a file with include statements interleaved with other stuff. The “other stuff” only needs to be roughly lexically valid; it does not have to satisfy your full parser, just as the C preprocessor only cares about tokens and matching parentheses, not about matching other brackets or anything semantic. You then replace the includes, producing a fully expanded file that you can give to your existing parser.
If an included file must be syntactically “standalone” in some sense†, then you can parse a whole file first, interleaved with includes, then replace them. For instance:
-- Whatever items you’re parsing.
data Item
-- A reference to an included path.
data Include = Include FilePath
parse :: Parser [Either Include Item]
-- Substitute includes; also calls ‘parse’
-- recursively until no includes remain.
substituteIncludes :: [Either Include Item] -> IO [Item]
† Say, if you’re just using attoparsec for lexing tokens that can’t cross file boundaries anyway, or you’re doing full parsing but want to disallow an include file that contains e.g. unmatched brackets.
The other option is to embed IO in your parser directly by using a different parsing library such as megaparsec, which provides a ParsecT transformer that you can wrap around IO to do IO directly in your parser. I would probably do this for a prototype, but it seems tidier to separate the concerns of parsing and expansion as much as possible.
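The preprocessor-style first pass described above can be sketched independently of the parsing library. The following is a minimal illustration in Python rather than Haskell, under stated assumptions: the include syntax is one "include <target>" statement per line, and expand_includes and the load callback are hypothetical names, not part of attoparsec. All IO lives in load, so the parser that later consumes the expanded text can stay pure.

```python
import re
from typing import Callable, List, Optional

# Assumed include syntax: a whole line of the form "include <path-or-url>".
INCLUDE_RE = re.compile(r"^\s*include\s+(\S+)\s*$")


def expand_includes(
    text: str,
    load: Callable[[str], str],
    _stack: Optional[List[str]] = None,
) -> str:
    """Return text with every include line replaced by the expanded
    contents of its target. load(target) performs the actual IO."""
    stack = _stack or []
    out = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            out.append(line)  # ordinary line: pass through unchanged
            continue
        target = match.group(1)
        if target in stack:
            # Refuse to recurse forever on a file that includes itself.
            raise ValueError("include cycle: " + " -> ".join(stack + [target]))
        # Expand the included text recursively before splicing it in.
        out.append(expand_includes(load(target), load, stack + [target]))
    return "\n".join(out)
```

Passing a dictionary lookup as load makes the expansion easy to test without touching the file system; a real loader would read a file or fetch a URL. Note that this sketch drops a trailing newline and matches include targets textually, so the same file reached under two different names is not detected as a cycle.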

How to generate a parser generator using Xtext?

I am planning to implement a meta language on top of Xtext. In other words, I am using the Xtext grammar to define my own meta language. This meta language can then be used to define a language (using the syntax that I defined). Using the defined language, a model can be created by the user.
Hence, I would like to use Xtext/Xtend as a generator for parser generators. This would enable me to add as many meta levels as I like. My understanding is that Xtext itself is defined using Xtext, so this should be possible?
The problem is that I don't know how to approach this, as I am not an expert in Xtext or in parser generator frameworks in general. Any solutions, approaches, or hints are welcome.
Update (more details and motivation)
Xtext can be used to generate anything, so I could write a generator based on Xtext that generates a parser. This could be done by specifying my meta language's grammar, using Xtext to generate a parser for that grammar, so I would have access to an AST that represents a model written in my meta language. However, from here on, I would be left alone to do whatever I want with the AST, e.g. generate a parser (because the AST represents the grammar of a user-defined language). But as Xtext has the specific ability to generate parsers, I was thinking of reusing this feature instead of implementing my own parser generator based on the AST of a grammar.
My motivation is the wish to define my own DSL grammar language (as a replacement for Xtext), while still being able to use the infrastructure provided by the Xtext project.

I came to the following solution:
A grammar that was written using my grammar language will be parsed by Xtext. Next, the resulting AST is transformed to the Xtext grammar language AST, which can be used as input for the existing parser generator.
In general, given some grammar language l1, a model written in this language will be parsed and the resulting AST will be transformed to the AST of the grammar language l2 that was used to specify l1. This step is repeated until we have an AST representing a model of the Xtext grammar language, which will be used to generate the new parser.
Naturally, any information added with the definition of a new grammar language will be lost in each transformation step. Therefore, the infrastructure that is developed around a grammar language has the responsibility to create some kind of functionality that makes this information available to a higher language developed using the grammar language.

For a different approach, see:
WWW.XTRAN-LLC.com/xtran.html#parse-gen
In a nutshell, I got tired of creating parsers for XTRAN, our Expert System whose rules language manipulates computer languages, data, and text, so I created a parsing engine that directly executes EBNF at parse time (as opposed to generating parsing code, as Lex/YACC and ANTLR do). Since XTRAN must also render code content represented in its Internal Representation / AST (after it's manipulated) as source code text, I created a corresponding rendering engine that executes (a much simpler form of) EBNF at render time.

reference to where the parser is at when calling a rule listener in ANTLR4

I'm generating listeners in Python, but any language is ok for answers or comments.
I need to know if there's some reference to where in the parsing tree, or even better, in the token stream or in the source file the parser is at when calling a specific listener method.
I get a context object, which has a reference to the parser itself. I looked through it but can't seem to find such a reference.
This is for debugging only.
def enterData_stmt(self, ctx:fassParser.Data_stmtContext):
I know the parser doesn't traverse the source file but rather the abstract syntax tree, and I could look at it and work out where the parser is, but I'm wondering if I can get a little context for quick debugging without having to do a tree traversal.

Every ParserRuleContext object has the fields start and stop, which contain the first and last token matched by the rule, respectively. The token objects have the methods getLine and getCharPositionInLine, which return the line number and column number where each token starts. There are no methods telling you where a token ends, except as an absolute index rather than a line and column number, so if you need that, you'll need to calculate it yourself from the start position and the length.
I know the parser doesn't traverse the source file but rather the abstract syntax tree
Of course the parser traverses the source file - how else could it parse it? The parser goes through the source file to generate the (not very abstract) parse tree. If you're using a visitor or ParseTreeWalker with a listener, the visitor/listener will then walk the generated parse tree. If you're using addParseListener, the listener will be invoked with the partially-constructed tree while the parser is still parsing the file.
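The end-position calculation mentioned above can be sketched as a small helper. This is an illustration only: token_end is a hypothetical name, and it assumes ANTLR's usual convention of 1-based line numbers and 0-based columns. In the Python runtime the start token's position is, to my knowledge, exposed as the attributes ctx.start.line and ctx.start.column, with the matched text in ctx.start.text; treat those names as an assumption and check them against your runtime version.

```python
from typing import Tuple


def token_end(line: int, column: int, text: str) -> Tuple[int, int]:
    """Return the (line, column) just past the last character of a token
    that starts at (line, column) and has the given text."""
    newlines = text.count("\n")
    if newlines == 0:
        # Single-line token: same line, column advanced by the length.
        return line, column + len(text)
    # Multi-line token (e.g. a block comment): the end column is the
    # number of characters after the last newline.
    return line + newlines, len(text) - text.rfind("\n") - 1
```

Inside a listener method this could be called as token_end(ctx.start.line, ctx.start.column, ctx.start.text) and printed alongside the rule name for quick debugging.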

I have written a grammar in Antlr and I want to check if some expressions do or do not parse according to the Grammar

I have written a BNF Grammar in Antlr4. Using Antlr commands I managed to run it and compile it. The outputs are all the necessary files that Antlr generates (Lexers, Parsers, Listeners). I am not sure if the BNF grammar I created is semantically correct, but at least it is syntactically correct, since no errors appear.
At this point, I have to check if some existing expressions parse according to that grammar, but I have no idea how to do that.

I'm making the following assumptions:
antlr-4.1-complete.jar is in your CLASSPATH
Your grammar is called 'Test'
Your starting rule is called 'parse'
Then do the following:
$ java org.antlr.v4.runtime.misc.TestRig Test parse -tree
Type expressions here
Ctrl+D
If you have an example expression in a file, you can pipe the contents through the parser:
$ cat fileName | java org.antlr.v4.runtime.misc.TestRig Test parse -tree

Which parser generator would be useful for manipulating the productions themselves?

Similar to Generating n statements from context-free grammars, I want to randomly generate sentences from a grammar.
What is a good parser generator for manipulating the actual grammar productions themselves? I want the parser generator to actually give me access to the productions (production objects?).
If I had a grammar akin to:
start_symbol ::= foo
foo ::= bar | baz
What is a good parser generator for:
giving me the starting production symbol
letting me choose one production from the RHS of the start symbol (foo in this case)
giving me the production options for foo
Clearly every parser has internal representations for productions and ways of associating a production with its RHS, but which parser generator makes these internals easy to manipulate?
Note: the blog entry linked to from the other SO question I mentioned has some sort of custom CFG parser. I want to use an actual grammar for a real parser, not generate my own grammar parser.

It should be pretty easy to write a grammar that matches the grammar notation a parser generator accepts. (With an open-source parser generator, you ought to be able to fetch such a grammar from the parser generator's source code; they all tend to have self-grammars.) With that, you can then parse any grammar the parser generator accepts.
If you want to manipulate the parsed grammar, you'll need an abstract syntax tree of it. You can make most parser generators build a tree, either by using built-in mechanisms or with ad hoc code that you add.
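Once the grammar has been parsed into a tree, the operations asked for in the question become simple lookups. The sketch below assumes the productions have already been extracted into a plain dictionary (the GRAMMAR table, alternatives, and generate are hypothetical names, and the parsing step itself is not shown); it uses the two-rule grammar from the question.

```python
import random
from typing import Dict, List

# Each nonterminal maps to its alternatives; each alternative is a
# sequence of symbols. Symbols without an entry are terminals.
Grammar = Dict[str, List[List[str]]]

GRAMMAR: Grammar = {
    "start_symbol": [["foo"]],
    "foo": [["bar"], ["baz"]],
}


def alternatives(grammar: Grammar, symbol: str) -> List[List[str]]:
    """Return the production options (right-hand sides) for a symbol."""
    return grammar.get(symbol, [])


def generate(grammar: Grammar, symbol: str, rng: random.Random) -> List[str]:
    """Expand symbol into a random sentence of terminal symbols."""
    options = alternatives(grammar, symbol)
    if not options:
        return [symbol]  # terminal: emit as-is
    sentence: List[str] = []
    for part in rng.choice(options):  # pick one alternative at random
        sentence.extend(generate(grammar, part, rng))
    return sentence
```

Calling generate(GRAMMAR, "start_symbol", random.Random()) yields either ["bar"] or ["baz"]. A recursive grammar would need a depth limit, which this sketch omits.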
