I receive a task to parse a text which conforms to EBNF syntax. Is there any tool/library I can use?
ANTLR is the standard tool for this kind of job: it generates a parser from a grammar written in an EBNF-like notation.
See "Good parser generator (think lex/yacc or antlr) for .NET? Build time only?" here on SO.
I am trying to write my own programming language, and I use Lua source as a reference.
I have several questions about it:
What kind of parser does Lua use? Is it a Pratt parser?
... and why doesn't it produce an AST? Producing an AST is the parser's job, isn't it?
I wish to use FParsec for a python-like language, indentation-based.
I understand that this must be done in the lexing phase, but FParsec doesn't have a lexing phase. Is it possible to use FParsec for this, or how can I feed it after lexing?
P.S.: I'm new to F#, but experienced in other languages.
Yes, it's possible.
Here is a relevant article by the FParsec author. If you want to go deeper into the subject, this paper might be worth a read. The paper points out that there are multiple packages for indentation-aware parsing based on Parsec, the parser combinator library that inspired FParsec.
FParsec doesn't have a separate lexing phase; instead it fuses lexing and parsing into a single phase. IMO indentation-aware parsing is better done with parser combinators (FParsec) than with parser generators (fslex/fsyacc), because you need to manually track the current indentation and report good error messages based on context.
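Whichever route you take, the heart of indentation-aware lexing is a stack of indentation widths that emits synthetic INDENT/DEDENT tokens. A rough sketch of that idea (written in Scala purely for illustration; the token names and API are made up):

```scala
// Illustrative only: turn leading spaces into INDENT/DEDENT tokens.
// A real lexer would also handle tabs and reject inconsistent dedents.
object IndentLexer {
  sealed trait Tok
  case object Indent extends Tok
  case object Dedent extends Tok
  final case class Line(text: String) extends Tok

  def tokenize(src: String): List[Tok] = {
    val out = scala.collection.mutable.ListBuffer.empty[Tok]
    var stack = List(0) // widths of currently open indentation levels
    for (raw <- src.split("\n") if raw.trim.nonEmpty) {
      val width = raw.takeWhile(_ == ' ').length
      if (width > stack.head) { stack = width :: stack; out += Indent }
      else while (width < stack.head) { stack = stack.tail; out += Dedent }
      out += Line(raw.trim)
    }
    // Close any blocks still open at end of input.
    while (stack.head > 0) { stack = stack.tail; out += Dedent }
    out.toList
  }
}
```

An FParsec solution does the same stack discipline inside the parser itself by tracking the current indentation in the parser state, which is the approach the linked article shows.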
I know that it's possible to use, for example, bison-generated Java files in a Scala project, but is there any native "grammar to Scala" LALR(1) generator?
Another plug here: ScalaBison is close to LALR(1) and lets you use Scala in the actions.
I'm not really answering the original question, and please excuse the plug, but you may be interested in our sbt-rats plugin for the sbt tool. It uses the Rats! parser generator for Java, but makes it easier to use from Scala.
Rats! uses parsing expression grammars as its syntax description formalism, not context-free grammars and definitely not LALR(1) grammars. sbt-rats also has a high-level syntax definition language that in most cases means you do not need to write semantic actions to get a syntax tree that represents your input. The plugin will optionally generate case classes for the tree representation and a pretty-printer for the tree structure.
I have a requirement but I don't know much about implementation detail.
I have a query string like this:
(title:java or author:john) and date:[20110303 TO 20110308]
Basically, the query string is written in Lucene syntax.
What I really need to do is parse the query string into an AST and then convert the AST into a Lucene query.
I'm not familiar with compiler or parser technology, and I ran into the Irony project.
Can someone point me to how and where to start? Either Irony or a hand-written parser would be okay.
Thanks a lot.
Sorry for the late response:
Generally speaking, to create a parser, it's best to describe the grammar in the abstract, then generate the parser using a parser generator.
I created the lucene-query-parser.js library using a PEG grammar, which is in the GitHub repo here. That grammar is specific to PEG.js and uses JavaScript to build an AST-style result for the parsed query.
It's not necessary to return an AST-style structure, but I found that to be most useful for the project I wrote the syntax for. You could re-implement the grammar to return any sort of parse result you want.
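To make the "grammar to AST" idea concrete without any tooling, here is a hand-rolled recursive-descent sketch for a tiny, hypothetical subset of the syntax (field:term, and/or, parentheses; no ranges, phrases, or boosts). It is written in Scala rather than as a PEG.js grammar, purely for illustration:

```scala
// Illustrative subset only: expr := atom (("and" | "or") atom)*
//                           atom := "(" expr ")" | field ":" value
object MiniLucene {
  sealed trait Ast
  final case class Term(field: String, value: String) extends Ast
  final case class And(l: Ast, r: Ast) extends Ast
  final case class Or(l: Ast, r: Ast) extends Ast

  def parse(s: String): Ast = {
    // Crude tokenizer: space out parentheses, then split on whitespace.
    val toks = s.replace("(", " ( ").replace(")", " ) ")
      .split("\\s+").filter(_.nonEmpty).toList
    val (ast, rest) = expr(toks)
    require(rest.isEmpty, s"unparsed input: $rest")
    ast
  }

  // Left-associative chain of "and"/"or".
  private def expr(toks: List[String]): (Ast, List[String]) = {
    val first = atom(toks)
    var left = first._1
    var rest = first._2
    while (rest.headOption.exists(t => t == "and" || t == "or")) {
      val op = rest.head
      val (right, rest2) = atom(rest.tail)
      left = if (op == "and") And(left, right) else Or(left, right)
      rest = rest2
    }
    (left, rest)
  }

  private def atom(toks: List[String]): (Ast, List[String]) = toks match {
    case "(" :: tail =>
      val (inner, rest) = expr(tail)
      require(rest.headOption.contains(")"), "expected ')'")
      (inner, rest.tail)
    case tok :: tail if tok.contains(":") =>
      val i = tok.indexOf(':')
      (Term(tok.substring(0, i), tok.substring(i + 1)), tail)
    case other =>
      sys.error(s"unexpected input: $other")
  }
}
```

Walking such a tree node by node to build the corresponding Lucene Query object is then a straightforward recursion over the case classes.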
If your query String is in Lucene syntax, then simply pass it to the parse(String) method of Lucene's QueryParser.
That will return a Query object representing the query String.
If you need to extend or modify the standard lucene syntax, then you could start by looking at the JavaCC Grammar for QueryParser.
Others have modified it in the past to add support for RegExps.
You could also look at the Myna parser which is a JavaScript parsing library that has a sample Lucene grammar. The Myna parser automatically generates an AST that you can easily transform into whatever form you want.
I'm making an application that will parse commands in Scala. An example of a command would be:
todo get milk for friday
So the plan is to have a pretty smart parser break the line apart and recognize the command part and the fact that there is a reference to time in the string.
In general I need to make a tokenizer in Scala, so I'm wondering what my options are. I'm familiar with regular expressions, but I plan on making an SQL-like search feature as well:
search todo for today with tags shopping
And I feel that regular expressions will be too inflexible for implementing commands with a lot of variation. This leads me to think of implementing some sort of grammar.
What are my options in this regard in Scala?
You want to search for "parser combinators". I have a blog post using this approach (http://cleverlytitled.blogspot.com/2009/04/shunting-yard-algorithm.html), but I think the best reference is this series of posts by Stefan Zeiger (http://szeiger.de/blog/2008/07/27/formal-language-processing-in-scala-part-1/).
Here are slides from a presentation I did in Sept. 2009 on Scala parser combinators. (http://sites.google.com/site/compulsiontocode/files/lambdalounge/ImplementingExternalDSLsUsingScalaParserCombinators.ppt) An implementation of a simple Logo-like language is demonstrated. It might provide some insights.
Scala has a parser library (scala.util.parsing.combinator) which lets you write a parser directly from its EBNF specification. If you have an EBNF grammar for your language, it should be easy to write the Scala parser. If not, you'd better first define your language formally.
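To show what the combinator style looks like without pulling in any library, here is a minimal hand-rolled sketch in plain Scala. The grammar and all names are hypothetical: command = "todo" { word } [ "for" word ].

```scala
// A toy combinator set, for illustration only.
object MiniCombinators {
  // A parser consumes a token list; on success it returns a result
  // plus the remaining tokens, otherwise None.
  type P[A] = List[String] => Option[(A, List[String])]

  // EBNF terminal: match one literal token.
  def lit(w: String): P[String] = {
    case `w` :: rest => Some((w, rest))
    case _           => None
  }

  // Match any single token.
  def anyWord: P[String] = {
    case w :: rest => Some((w, rest))
    case Nil       => None
  }

  // EBNF sequence "a b".
  def seq[A, B](pa: P[A], pb: P[B]): P[(A, B)] = in =>
    pa(in).flatMap { case (a, r1) => pb(r1).map { case (b, r2) => ((a, b), r2) } }

  // EBNF repetition "{ a }" (greedy, zero or more).
  def rep[A](p: P[A]): P[List[A]] = in => p(in) match {
    case Some((a, r)) => rep(p)(r).map { case (as, r2) => (a :: as, r2) }
    case None         => Some((Nil, in))
  }

  // EBNF option "[ a ]".
  def opt[A](p: P[A]): P[Option[A]] = in => p(in) match {
    case Some((a, r)) => Some((Some(a), r))
    case None         => Some((None, in))
  }

  // rep is greedy, so task words must exclude the "for" keyword,
  // or it would be swallowed into the task text.
  def taskWord: P[String] = {
    case w :: rest if w != "for" => Some((w, rest))
    case _                       => None
  }

  final case class Todo(task: List[String], when: Option[String])

  // day := "for" word
  val day: P[String] = in =>
    seq(lit("for"), anyWord)(in).map { case ((_, d), r) => (d, r) }

  // command := "todo" { taskWord } [ day ]
  val command: P[Todo] = in =>
    seq(lit("todo"), seq(rep(taskWord), opt(day)))(in).map {
      case ((_, (task, when)), r) => (Todo(task, when), r)
    }
}
```

For example, command(List("todo", "get", "milk", "for", "friday")) returns Some((Todo(List("get", "milk"), Some("friday")), Nil)). Note the PEG-style greediness of rep; scala.util.parsing.combinator's rep behaves similarly unless you use its backtracking alternatives.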