I studied LR(1) parsers and then LALR(1), and noticed that to construct an LALR(1) parser we apparently have to FIRST construct the LR(1) parser and then merge the states that share the same core. (For complex grammars, constructing the full LR(1) parser is not easy.)
Now a question comes to mind: can we build the LALR(1) parser DIRECTLY, without using (or even constructing) the LR(1) parser? If yes, how?
Thanks in advance!
Parsing Techniques: A Practical Guide by Dick Grune and Ceriel J.H. Jacobs is worth getting. The Lemon parser generator (http://www.hwaci.com/sw/lemon/) has readable code too. And yes, it can be done directly: generators in the yacc family build the LR(0) automaton and then compute the LALR(1) lookaheads on top of it, rather than building the full LR(1) machine and merging states.
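To make the "merge states with the same core" step from the question concrete, here is a minimal sketch in Python. It assumes a made-up representation in which each LR(1) state is a frozenset of (production, dot position, lookahead) items; the core is the same set with the lookaheads dropped.

    from collections import defaultdict

    # Sketch: collapsing LR(1) states that share a core into LALR(1) states.
    # An item is a (production, dot_position, lookahead) tuple; a state is a
    # frozenset of items. The core of a state ignores the lookaheads.

    def core(state):
        return frozenset((prod, dot) for prod, dot, _ in state)

    def merge_by_core(lr1_states):
        """Group LR(1) states by core and take the union of their items."""
        groups = defaultdict(set)
        for state in lr1_states:
            groups[core(state)] |= set(state)
        return [frozenset(items) for items in groups.values()]

    # Two LR(1) states that differ only in lookaheads ...
    s1 = frozenset({("S -> C C", 1, "$"), ("C -> c C", 0, "c")})
    s2 = frozenset({("S -> C C", 1, "$"), ("C -> c C", 0, "d")})
    # ... collapse into a single LALR(1) state.
    print(merge_by_core([s1, s2]))

(Redirecting the transitions of the merged states is the other half of the job, which this sketch leaves out.)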
Related
Sorry for the amateurish question. I have a grammar that's LL and I want to write an LR grammar. Can I use the LL grammar, make minimal syntactic changes so that it fits an LR parser, and use it? Is that a bad idea? Are there structural differences between them that don't translate?
All LL(1) grammars are LR(1), so if you had an LR(1) parser generator, you could definitely use your LL(1) grammar, assuming the BNF syntax is that used by the parser generator.
But you probably don't have an LR(1) parser generator; more likely you have one which handles only the LALR(1) subset of LR(1) grammars. All the same, you're probably fine: most LL(1) grammars are also LALR(1), and it's pretty rare to find a useful LL(1) grammar which is not (the known counterexamples are contrived constructions, unlikely to arise in a practical grammar).
So it's probably possible. But it might not be a good idea.
Top-down parsers can't handle left recursion, and without left recursion you can't write a grammar which represents left-associative operators, which is to say most arithmetic operators. This problem is usually solved in practice by using a right-recursive grammar along with an ad hoc evaluation function which corrects the associativity. That's less than ideal. Also, LL grammars created by mechanically removing left recursion tend to be very hard to read.
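As a small illustration (a sketch with a made-up subtraction-only grammar): the natural left-recursive rule expr -> expr '-' NUM is rewritten for LL as expr -> NUM ('-' NUM)*, and the hand-written parser restores left associativity by folding inside the loop.

    # Left-recursive (LR-friendly) rule:   expr -> expr '-' NUM | NUM
    # LL-friendly rewrite:                 expr -> NUM ('-' NUM)*
    # The loop folds to the left, so "7 - 3 - 1" evaluates as (7 - 3) - 1 = 3,
    # even though the rewritten grammar no longer expresses left associativity.

    def parse_expr(tokens, pos=0):
        value = int(tokens[pos]); pos += 1           # first NUM
        while pos < len(tokens) and tokens[pos] == '-':
            rhs = int(tokens[pos + 1]); pos += 2     # '-' NUM
            value = value - rhs                      # associativity fixed here
        return value

    print(parse_expr(['7', '-', '3', '-', '1']))     # prints 3, not 5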
So you are probably best off using a grammar designed for LR parsing. But you probably don't have to.
I need a way of generating parsers for all deterministic context-free grammars.
I know that every deterministic context-free grammar can be parsed by some LR(k) parser. The problem is that I need to generate parsers for grammars of unknown k. So, to handle every deterministic context-free grammar, k would need to be infinite.
I also know that GLR parsers can parse all context-free grammars, deterministic or not. But I need to reject non-deterministic grammars. I'm not sure if GLR can detect that property from an input grammar.
Is there a type of parser generator that can handle all deterministic context-free grammars, while rejecting non-deterministic grammars, without needing a k input? (The only input is the grammar itself)
The problem of “given a CFG, decide whether it’s LR(k) for any k” is, surprisingly, undecidable! This means that it’s not possible for any parser generator to always be able to take an arbitrary grammar and determine which choice of k to use, or even if such a choice of k exists.
In practice, most grammars that we care about are fairly close to LR(1), for some definition of “fairly close,” which is why most parser generators focus on that simpler case.
I've been using lex/yacc and now I'm trying to switch to ANTLR. The major concern is that ANTLR is an LL(*) parser generator, unlike yacc, which is LALR. I'm used to thinking bottom-up, and I don't exactly know what the advantage of LL grammars is. People say that LL grammars are easier to understand and more popular these days. But it seems that LR parsers are more powerful, e.g. LL parsers are incapable of dealing with left recursion, although there seem to be some workarounds.
So the question is what is the advantage of LL grammars over LALR? I'd appreciate it if somebody could give me some examples. Links to useful articles would be great, too.
Thanks for your help in advance!
(I see this is a great resource: What advantages do LL parsers have over LR parsers?, but it would've been better with some examples.)
LR parsers are strictly more powerful than LL parsers, and in addition, LALR parsers can run in O(n) like LL parsers. So you won't find any functional advantages of LL over LR.
Thus, the only advantage of LL is readability: LR state machines are quite a bit more complex and difficult to understand, and LR parsers themselves are not especially intuitive, whereas automatically generated LL parser code can be very easy to understand and debug.
The greatest advantage I see to LL parsers is that they are so easy to understand and implement! You can hand write recursive descent parsers with code that closely matches the grammar.
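For instance, here is a minimal hand-written recursive-descent parser for a toy grammar (grammar and names invented for illustration); each nonterminal becomes one function whose body mirrors its rule.

    # Toy grammar:
    #   expr -> term ('+' term)*
    #   term -> NUM | '(' expr ')'

    class Parser:
        def __init__(self, tokens):
            self.tokens, self.pos = tokens, 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def expect(self, tok):
            if self.peek() != tok:
                raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
            self.pos += 1

        def expr(self):                      # expr -> term ('+' term)*
            value = self.term()
            while self.peek() == '+':
                self.expect('+')
                value += self.term()
            return value

        def term(self):                      # term -> NUM | '(' expr ')'
            if self.peek() == '(':
                self.expect('(')
                value = self.expr()
                self.expect(')')
                return value
            value = int(self.peek())
            self.pos += 1
            return value

    print(Parser(['1', '+', '(', '2', '+', '3', ')']).expr())   # 6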
LR parsers are generally considered more powerful and also much faster, BUT there are a few trade-offs that I know of:
LR parsers can only use synthesized attributes; they cannot pass inherited attributes down the parse (see the sketch at the end of this answer).
Actions embedded in an LR grammar can introduce nondeterminism (conflicts), but not in LL.
However, you will find that LL(*) parsers are also very powerful.
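To illustrate the inherited-attribute point from the list above (a sketch with invented names): in a hand-written LL-style parser, context is simply passed down as a parameter, whereas a table-driven LALR parser only synthesizes values upward at reduction time.

    # Hypothetical declaration syntax:  decl -> TYPE NAME '=' LITERAL
    # The declared type is parsed first and passed DOWN into the routine
    # that parses the initializer, which uses it to interpret the literal.

    def parse_decl(tokens):
        declared_type = tokens[0]                 # 'int' or 'str'
        name = tokens[1]
        assert tokens[2] == '='
        value = parse_literal(tokens[3], declared_type)   # inherited attribute
        return name, value

    def parse_literal(tok, declared_type):
        return int(tok) if declared_type == 'int' else str(tok)

    print(parse_decl(['int', 'x', '=', '42']))    # ('x', 42)
    print(parse_decl(['str', 's', '=', '42']))    # ('s', '42')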
I'm looking for an LL(1) parser generator in OCaml... Can anybody help me with this?
Well, LALR parsers can parse a strict superset of the languages that can be parsed by LL parsers. So I would advise simply using ocamlyacc, which ships with OCaml and is an LALR(1) parser generator. This may require some minor rewriting of the grammar, but it shouldn't be too hard.
Planck LL(n) parser combinator library: https://bitbucket.org/camlspotter/planck/overview
It started as my toy project and has no actual users, but I was able to implement an OCaml-syntax lexer/parser with Planck that is 100% compatible with the originals.
I don't recommend using it, but if you are interested... try it.
The stream parsers included in camlp4 are (to the best of my knowledge) LL(1) parsers. See:
http://caml.inria.fr/pub/docs/manual-camlp4/manual003.html
I have heard good things about Menhir.
The home page says at the top:
Menhir is a LR(1) parser generator for the OCaml programming language. That is, Menhir compiles LR(1) grammar specifications down to OCaml code. Menhir was designed and implemented by François Pottier and Yann Régis-Gianas.
Menhir is 90% compatible with ocamlyacc. Legacy ocamlyacc grammar specifications are accepted and compiled by Menhir. The resulting parsers run and produce correct parse trees.
I've recently been trying to teach myself how parsers (for languages/context-free grammars) work, and most of it seems to be making sense, except for one thing. I'm focusing my attention in particular on LL(k) grammars, for which the two main algorithms seem to be the LL parser (using a stack/parse table) and the recursive descent parser (simply using recursion).
As far as I can see, the recursive descent algorithm works on all LL(k) grammars and possibly more, whereas an LL parser works on all LL(k) grammars. A recursive descent parser is clearly much simpler than an LL parser to implement, however (just as an LL one is simpler than an LR one).
So my question is, what are the advantages/problems one might encounter when using either of the algorithms? Why might one ever pick LL over recursive descent, given that it works on the same set of grammars and is trickier to implement?
LL is usually a more efficient parsing technique than recursive descent. In fact, a naive (backtracking) recursive-descent parser will actually be O(k^n) in the worst case (where n is the input size). Some techniques such as memoization (which yields a Packrat parser) can improve this as well as extend the class of grammars accepted by the parser, but there is always a space tradeoff. LL parsers are (to my knowledge) always linear time.
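As a rough illustration of the memoization idea (a sketch, not a full Packrat parser, with a made-up grammar): caching results by input position guarantees that each (rule, position) pair is parsed at most once, at the cost of storing all of those results.

    from functools import lru_cache

    # Toy PEG-style rule:  S <- 'a' S 'a' / 'a'
    # A naive backtracking parser re-parses the same suffixes repeatedly;
    # memoizing on the input position means each position is parsed only once.

    def make_parser(text):
        @lru_cache(maxsize=None)
        def parse_S(pos):
            # alternative 1: 'a' S 'a'
            if pos < len(text) and text[pos] == 'a':
                mid = parse_S(pos + 1)
                if mid is not None and mid < len(text) and text[mid] == 'a':
                    return mid + 1
            # alternative 2: 'a'
            if pos < len(text) and text[pos] == 'a':
                return pos + 1
            return None                      # failure
        return parse_S

    parse_S = make_parser("aaaaa")
    print(parse_S(0))                        # end position of the match from 0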
On the flip side, you are correct in your intuition that recursive-descent parsers can handle a greater class of grammars than LL. Recursive descent can handle any grammar which is LL(*) (that is, unlimited lookahead) as well as a small set of ambiguous grammars. This is because recursive descent is actually a directly-encoded implementation of PEGs, or Parsing Expression Grammars. Specifically, the disjunctive operator (a | b) is not commutative, meaning that a | b does not equal b | a. A recursive-descent parser will try each alternative in order, so if a matches the input, it will succeed even if b would also have matched. This allows classic "longest match" ambiguities like the dangling else problem to be handled simply by ordering the disjunctions correctly.
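Here is a tiny sketch of that ordered choice (hypothetical rule A <- 'ab' / 'a'): the first alternative that matches wins, so reordering the alternatives changes the result, which is exactly the lever used to resolve dangling else by listing the else-bearing alternative first.

    # A <- 'ab' / 'a'   -- alternatives are tried in order; first match wins.

    def parse_A(text, pos):
        if text[pos:pos + 2] == 'ab':        # alternative 1: 'ab'
            return pos + 2
        if text[pos:pos + 1] == 'a':         # alternative 2: 'a' (only if 1 failed)
            return pos + 1
        return None                          # both alternatives failed

    def parse_A_swapped(text, pos):
        if text[pos:pos + 1] == 'a':         # now 'a' is tried first...
            return pos + 1                   # ...so the 'b' is never consumed
        if text[pos:pos + 2] == 'ab':
            return pos + 2
        return None

    print(parse_A('ab', 0))                  # 2
    print(parse_A_swapped('ab', 0))          # 1 -- a | b is not b | a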
With all of that said, it is possible to implement an LL(k) parser using recursive-descent so that it runs in linear time. This is done by essentially inlining the predict sets so that each parse routine determines the appropriate production for a given input in constant time. Unfortunately, such a technique eliminates an entire class of grammars from being handled. Once we get into predictive parsing, problems like dangling else are no longer solvable with such ease.
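A minimal sketch of that inlining (toy grammar and names invented): the routine for a nonterminal checks the next token against each production's predict (FIRST) set, commits in constant time, and never backtracks.

    # Toy LL(1) grammar:
    #   stmt -> 'print' NUM        predict set: {'print'}
    #         | 'skip'             predict set: {'skip'}
    #         | '{' stmt '}'       predict set: {'{'}

    def parse_stmt(toks, pos):
        tok = toks[pos]
        if tok == 'print':                   # predicted: stmt -> 'print' NUM
            int(toks[pos + 1])               # NUM
            return pos + 2
        elif tok == 'skip':                  # predicted: stmt -> 'skip'
            return pos + 1
        elif tok == '{':                     # predicted: stmt -> '{' stmt '}'
            pos = parse_stmt(toks, pos + 1)
            assert toks[pos] == '}'
            return pos + 1
        raise SyntaxError(f"unexpected token {tok!r}")

    print(parse_stmt(['{', 'print', '42', '}'], 0))   # 4: all tokens consumed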
As for why LL would be chosen over recursive-descent, it's mainly a question of efficiency and maintainability. Recursive-descent parsers are markedly easier to implement, but they're usually harder to maintain since the grammar they represent does not exist in any declarative form. Most non-trivial parser use-cases employ a parser generator such as ANTLR or Bison. With such tools, it really doesn't matter if the algorithm is directly-encoded recursive-descent or table-driven LL(k).
As a matter of interest, it is also worth looking into recursive-ascent, which is a parsing algorithm directly encoded after the fashion of recursive-descent, but capable of handling any LALR grammar. I would also dig into parser combinators, which are a functional way of composing recursive-descent parsers together.
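As a taste of the combinator style (a sketch, all names invented): each parser is a function from (text, position) to a (value, new position) pair or None, and small combinators for literals, sequencing, and ordered choice compose into larger recursive-descent parsers.

    def lit(s):                              # match a literal string
        def p(text, pos):
            end = pos + len(s)
            return (s, end) if text[pos:end] == s else None
        return p

    def seq(*parsers):                       # run parsers one after another
        def p(text, pos):
            values = []
            for parser in parsers:
                result = parser(text, pos)
                if result is None:
                    return None
                value, pos = result
                values.append(value)
            return values, pos
        return p

    def alt(*parsers):                       # ordered choice: first match wins
        def p(text, pos):
            for parser in parsers:
                result = parser(text, pos)
                if result is not None:
                    return result
            return None
        return p

    # greeting <- ('hello' / 'hi') ' ' 'world'
    greeting = seq(alt(lit('hello'), lit('hi')), lit(' '), lit('world'))
    print(greeting('hi world', 0))           # (['hi', ' ', 'world'], 8)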