LL(1) parser generator in OCaml

I'm looking for an LL(1) parser generator in OCaml. Can anybody help me with this?

Well, LALR parsers can parse a strict superset of the languages which can be parsed by LL parsers. So I would advise simply using ocamlyacc, which ships with OCaml and is an LALR(1) parser generator. This may require some minor rewriting of the grammar, but it shouldn't be too hard.

Planck LL(n) parser combinator library: https://bitbucket.org/camlspotter/planck/overview
It started as a toy project of mine and has no actual users, but with Planck I was able to implement a lexer and parser for OCaml syntax that are 100% compatible with the originals.
I do not recommend using it, but if you are interested, try it.

Stream parsers, as included in camlp4, are (to the best of my knowledge) LL(1) parsers. See:
http://caml.inria.fr/pub/docs/manual-camlp4/manual003.html
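To make the "LL(1)" part concrete, here is a minimal sketch in Python (not camlp4 stream-parser syntax) of a predictive parser that chooses every branch by peeking at exactly one token. The toy grammar for nested integer lists and all names (`tokenize`, `Parser`, `parse`) are assumptions made for this illustration only.

```python
import re

def tokenize(text):
    # Keep integers and the punctuation '[', ']', ','; anything else
    # (whitespace included) is skipped.
    return re.findall(r"\d+|[\[\],]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # The single token of lookahead; None marks end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def value(self):
        # value ::= INT | '[' elems ']'
        tok = self.peek()
        if tok == "[":
            self.expect("[")
            items = self.elems()
            self.expect("]")
            return items
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def elems(self):
        # elems ::= <empty> | value (',' value)*
        # Seeing ']' as the next token selects the empty alternative.
        if self.peek() == "]":
            return []
        items = [self.value()]
        while self.peek() == ",":
            self.expect(",")
            items.append(self.value())
        return items

def parse(text):
    p = Parser(tokenize(text))
    result = p.value()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at token {p.peek()!r}")
    return result
```

With this sketch, `parse("[1,[2,3],[]]")` returns `[1, [2, 3], []]`. The parser never backtracks: each decision depends only on `peek()`.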

I have heard good things about Menhir.
The home page says at the top:
Menhir is a LR(1) parser generator for the OCaml programming language. That is, Menhir compiles LR(1) grammar specifications down to OCaml code. Menhir was designed and implemented by François Pottier and Yann Régis-Gianas.
Menhir is 90% compatible with ocamlyacc. Legacy ocamlyacc grammar specifications are accepted and compiled by Menhir. The resulting parsers run and produce correct parse trees.

Related

What happens if you directly use LL grammar for an LR parser, after making basic syntactical changes?

Sorry for the amateurish question. I have a grammar that's LL and I want to write an LR grammar. Can I use the LL grammar, make minimal syntactical changes for it to fit with an LR parser, and use it? Is that a bad idea? Are there structural differences between them that don't translate?
All LL(1) grammars are LR(1), so if you had an LR(1) parser generator, you could definitely use your LL(1) grammar, assuming the BNF syntax is that used by the parser generator.
But you probably don't have an LR(1) parser generator, but rather a parser generator which can only handle the LALR(1) subset of LR(1) grammars. All the same, you're probably fine. "Most" LL(1) grammars are in LALR(1), and it's pretty rare to find a useful LL(1) which is not. (This pattern is unlikely to arise in a practical grammar, for example.)
So it's probably possible. But it might not be a good idea.
Top-down parsers can't handle left-recursion, and without left-recursion you can't write a grammar which represents left-associative operators, which is to say most arithmetic operators. This problem is usually solved in practice by using a right-associative grammar along with an idiosyncratic evaluation function which corrects the associativity. That's less than ideal. Also, LL grammars created by mechanically removing left-recursion tend to be very hard to read.
So you are probably best off using a grammar designed for LR parsing. But you probably don't have to.
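The associativity problem described above can be sketched in a few lines of Python. This is only an illustration under assumed names (`expr_naive`, `expr_fixed`, `rest`) and an assumed toy grammar with a single `-` operator over whitespace-separated tokens. The first function evaluates a right-recursive grammar as written. The second threads an accumulator through the recursion, which is one common form of the "correcting" evaluation function.

```python
def expr_naive(tokens, i=0):
    # expr ::= NUM '-' expr | NUM
    # (left-factored here by peeking at the next token)
    # Evaluating the tree as written groups to the right.
    left = int(tokens[i])
    if i + 1 < len(tokens) and tokens[i + 1] == "-":
        right, j = expr_naive(tokens, i + 2)
        return left - right, j
    return left, i + 1

def expr_fixed(tokens, i=0):
    # expr ::= NUM rest
    acc = int(tokens[i])
    return rest(tokens, i + 1, acc)

def rest(tokens, i, acc):
    # rest ::= '-' NUM rest | <empty>
    # acc carries the value of everything already seen on the left,
    # which restores left-associativity without left recursion.
    if i < len(tokens) and tokens[i] == "-":
        return rest(tokens, i + 2, acc - int(tokens[i + 1]))
    return acc, i

tokens = "10 - 3 - 2".split()
naive_value, _ = expr_naive(tokens)   # 10 - (3 - 2) = 9
fixed_value, _ = expr_fixed(tokens)   # (10 - 3) - 2 = 5
```

The grammar behind `expr_fixed` is still right-recursive. Only the evaluation order changes, which is why the result is correct but the grammar no longer mirrors the intended parse tree.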

How to make LALR(1) parser directly?

I studied LR(1) parsers and then LALR(1), and noticed that to construct an LALR(1) parser we should FIRST construct the LR(1) parser and then obtain the LALR(1) parser by merging the states that share the same core. (For complex grammars, it's not easy to construct LR parsers.)
Now a question comes to mind: can we build an LALR(1) parser DIRECTLY, without constructing the LR(1) parser first? If yes, how?
Thanks in advance!
"Parsing Techniques: A Practical Guide" by Dick Grune and Ceriel J.H. Jacobs is worth getting. The Lemon parser generator (http://www.hwaci.com/sw/lemon/) has readable code too.
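For reference, the merge step the question describes (combining LR(1) states that share a core) can be sketched as follows. This is a hedged illustration, not a full generator: it assumes the LR(1) item sets are already given, represents each item as a hypothetical tuple `(lhs, rhs, dot, lookahead)`, and does not remap the transitions between states.

```python
from collections import defaultdict

def core(state):
    # The core of a state is its set of items with lookaheads dropped.
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _la) in state)

def merge_same_core(lr1_states):
    # Group LR(1) states by core and union their items, so that
    # lookaheads of same-core states end up in one LALR(1) state.
    groups = defaultdict(set)
    for state in lr1_states:
        groups[core(state)] |= state
    return [frozenset(items) for items in groups.values()]

# Hand-written example states for the grammar
#   S -> C C      C -> c C | d
# (only three of its LR(1) states are listed here).
lr1_states = [
    frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")}),  # C -> d . , c/d
    frozenset({("C", ("d",), 1, "$")}),                         # C -> d . , $
    frozenset({("S", ("C", "C"), 2, "$")}),                     # S -> C C . , $
]

lalr_states = merge_same_core(lr1_states)
```

Here the two `C -> d .` states collapse into one state with lookaheads `c`, `d` and `$`, so three LR(1) states become two LALR(1) states.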

LR(k) or LALR(k) parser generator with features similar to ANTLR

I'm currently in the process of writing a parser for some language. I've been given a grammar for this language, but this grammar has some left recursions and non-LL(*) constructs, so ANTLR doesn't do very well, even with backtracking.
Because removing these left recursions and non-LL(*) constructs is harder than it looked at first glance, I now want to try an LR(k) or LALR(k) parser generator. The higher k the better.
Can anyone recommend me a parser generator fulfilling these requirements?
The generated parser is preferably an LR(k) parser with some high (or even arbitrary) k, or at least an LALR(k) parser with some high k.
The generated parser is written in C or C++, and if it is written in C, it is linkable to C++ code.
A feature set similar to ANTLR (especially the AST rewriting) would be nice.
Performance is not the most pressing issue; the generated parser is intended to be used on desktop machines with plenty of memory and CPU power.
Thanks and greetings,
Jost
PS: I'm not asking because I can't google myself, but because there is no time left to test some generators myself. So please only answer if you have experience with the recommended parser generators.
You might consider LRSTAR.
I have no experience with the tool itself, but I've met the author and he seems like a pretty competent guy. (I do build parsing engines and related technology for a living).
LRSTAR 10.0 is available now. On the comparison page, there is a comparison of LRSTAR, ANTLR and Bison. LRSTAR now reads ANTLR's style notation using the same EBNF operators (:, |, *, +, ?). It's a C++ based system generating LR(k) parsers in C++. The parsers do automatic AST construction and traversal. The new version 10.0 reads Yacc/Bison grammars if there is no action code in the grammar.
I have now decided to use DParser, which is a GLR parser generator capable of recognizing any context-free language. It seems to be well programmed (look at the tests in the source distribution), but lacks a lot of the features ANTLR provides, most notably the AST construction tools.
As a plus, it mostly reuses ANTLR's grammar file format, which is the format my grammar is in.

Translate Haskell Parsec grammar to Scala?

I'm trying to translate a grammar written in Haskell using Parsec into Scala's parser combinators.
The translation of the actual matching expressions is pretty straightforward and, at least in my opinion, even a little easier in Scala. What's not at all clear to me is how to handle the statefulness that Parsec passes around using monads.
A Scala parser reads in Input and produces a ParseResult[T].
In contrast, a GenParser in Haskell reads in input and a state and produces another parser. Passing that state around in Scala has me confused.
Does anyone have an example of stateful parsing in Scala that they'd be willing to share?
The only way I know of to handle statefulness in Scala's parser combinators is through the into method, also known as >> and flatMap (and, yes, you can use it in for-comprehensions). However, it passes state (or, more precisely, the parse result) into a parser, and not along to the next parsers, which seems to be what you are describing.
Not knowing Haskell's Parsec, it is difficult for me to guess at how that can be used to translate your grammar.
See also this question. There was a very interesting paper about Scala parser combinators, but I was not able to find it. Some spelunking on Scala Lang might turn it up.
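The behaviour of into (>>, flatMap) mentioned above can be illustrated with a small language-neutral sketch in Python rather than Scala. The names `number`, `take` and `into` below are hypothetical helpers written for this example, not the Scala library's API. A parser here is a function from `(text, pos)` to either `(value, new_pos)` or `None`.

```python
def number(text, pos):
    # Parse one or more digits; return (int value, new position) or None.
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end

def take(n):
    # Build a parser that consumes exactly n characters.
    def parser(text, pos):
        if pos + n > len(text):
            return None
        return text[pos:pos + n], pos + n
    return parser

def into(p, f):
    # Run p, then hand its result to f, which chooses the next parser.
    def parser(text, pos):
        first = p(text, pos)
        if first is None:
            return None
        value, next_pos = first
        return f(value)(text, next_pos)
    return parser

# A length-prefixed string: the number that was parsed decides how
# many characters the following parser consumes.
length_prefixed = into(number, take)
```

For example, `length_prefixed("3abcdef", 0)` returns `("abc", 4)`. The parsed value flows into the construction of the next parser only. Nothing is threaded past that parser automatically, which is the limitation the answer points out.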

Scala Parsers: Availability, Differences and Combining?

My question is about the Scala Parsers:
Which ones are available (in the Standard library and outside),
what's the difference between them,
do they share a common API and
can different Parsers be combined to parse one input string?
I found at least these:
Scala's "standard" parser (seems to be an LL parser)
Scala's Packrat parser (since 2.8, is a LALR parser)
The Parboiled parser (PEG parser?)
Spiewak's GLL parser combinator
There's also Dan Spiewak's implementation of GLL parser combinators.
It's worth noting that Scala's standard parser combinators are not LL, nor are Packrat combinators LALR. Parser combinators are a form of recursive descent with infinite backtracking. You can think of them a bit like "LL(*)". The class of languages supported by this technique is precisely the class of unambiguous context-free languages, or the same class as LALR(1) and Packrat. However, the class of grammar is quite a bit different, with the most notable weakness being non-support for left-recursion.
Packrat combinators do support left-recursion, but they still fail to support many other, more subtle features of LALR. This weakness generally stems from the ordered choice operator, which can lead to some devilishly tricky grammar bugs and also rules out certain nice grammatical formulations. The most often-seen example of these bugs happens when you accidentally order ambiguous choices shortest first, resulting in a greedy match that prevents the correct branch from ever being tried. LALR doesn't have this problem, since it simply tries all possible branches at once, deferring the decision point until the end of the production.
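The shortest-first pitfall can be reproduced with a tiny sketch in Python. It is not the Scala Packrat API: `literal`, `ordered_choice` and `matches_all` are hypothetical helpers, and the keywords `if` and `ifdef` are an assumed example. A parser returns the position after the match, or `None` on failure.

```python
def literal(s):
    # Match the exact string s at the current position.
    def parser(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parser

def ordered_choice(*alternatives):
    # PEG-style choice: commit to the first alternative that succeeds
    # and never revisit the later ones.
    def parser(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parser

def matches_all(parser, text):
    # True only if the parser consumes the whole input.
    return parser(text, 0) == len(text)

shortest_first = ordered_choice(literal("if"), literal("ifdef"))
longest_first = ordered_choice(literal("ifdef"), literal("if"))

bad = matches_all(shortest_first, "ifdef")   # False: "if" wins, "def" is left over
good = matches_all(longest_first, "ifdef")   # True
```

In `shortest_first` the `ifdef` branch is never tried, because `if` already succeeded on the same prefix. Reordering the alternatives is the usual fix.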
There is also a new approach known as "parsing with derivatives". The approach is described here. There is an implementation in Scala by Daniel Spiewak.
Just wanted to update this answer with a pointer to the latest iteration of the parboiled project, called parboiled2:
https://github.com/sirthias/parboiled2
parboiled2 targets only Scala (as opposed to Scala + Java), makes use of Scala macros, and is very actively maintained.

Resources