What is the relation between parser combinators and recursive descent parsers?
The Wikipedia article on parser combinators is actually pretty reasonable. One of the first things we learn from it is that "Parser combinators use a top-down parsing strategy", i.e. recursive descent.
Combinators themselves are building blocks for parsers, but they lean toward recursive descent.
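To make the "building blocks" idea concrete, here is a minimal sketch in Python. The names char, seq, alt and lazy and the toy grammar are my own assumptions for illustration, not taken from any particular combinator library. Each combinator returns a parsing function, and composing them produces what is in effect a recursive descent parser.

```python
# A parser is a function: (text, pos) -> (value, new_pos) on success, None on failure.

def char(c):
    """Parser that matches the single character c."""
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse

def seq(*parsers):
    """Run the parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """Try the parsers in order and return the first success."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def lazy(thunk):
    """Delay lookup of a parser so that a rule can refer to itself."""
    def parse(text, pos):
        return thunk()(text, pos)
    return parse

# Hypothetical grammar:  parens -> '(' parens ')' | 'x'
parens = alt(seq(char("("), lazy(lambda: parens), char(")")), char("x"))

def accepts(text):
    """True if the whole text is derived from parens."""
    result = parens(text, 0)
    return result is not None and result[1] == len(text)
```

Calling accepts("((x))") walks the grammar top-down through nested function calls, which is the recursive descent behaviour the quote refers to.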
Recently I wrote a (highly optimized) LALR(1) parser (that could handle ambiguous grammars) and supplied it a very ambiguous grammar. After that, I wrote a recursive descent parser for the same grammar, but with all the ambiguity taken out. I've read many times that LALR(1) parsers are very efficient, and recursive descent parsers are very inefficient, so I naturally expected the LALR parser to run much faster, even though it had an ambiguous grammar. When I compared the results of the two runs, I was shocked: the recursive descent parser was much faster than the LALR parser! Why was the LALR parser slower than the recursive descent parser? Was it because the LALR parser had an ambiguous grammar?
I've watched this course by Alex Aiken and read through many other resources. But I'm struggling to find a clear classification of top-down parsers.
This document doesn't provide a clear classification either, but at least it gives some definitions I'll use in the post. So here is the classification I've come up with so far:
Backtracking VS Predictive
Backtracking
One solution to parsing would be to implement backtracking. Based on the information the parser currently has about the input, a decision is made to go with one particular production. If this choice leads to a dead end, the parser would have to backtrack to that decision point, moving backwards through the input, and start again making a different choice and so on until it either found the production that was the appropriate one or ran out of choices.
Predictive
A predictive parser is characterized by its ability to choose the production to apply solely on the basis of the next input symbol and the current nonterminal being processed.
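A small sketch of the difference between the two definitions, in Python. The toy grammar S -> 'a' 'b' | 'a' 'c' and both function names are assumptions made up for this illustration. The backtracking version restarts from the decision point for every production. The predictive version only works because the grammar has been left-factored into S -> 'a' T, T -> 'b' | 'c', so that one token of lookahead is enough.

```python
def parse_backtracking(tokens):
    """S -> 'a' 'b' | 'a' 'c', trying each production in turn."""
    for production in (["a", "b"], ["a", "c"]):
        pos = 0                      # backtrack: restart from the decision point
        for symbol in production:
            if pos < len(tokens) and tokens[pos] == symbol:
                pos += 1
            else:
                break                # dead end, try the next production
        else:
            if pos == len(tokens):
                return production
    return None                      # ran out of choices

def parse_predictive(tokens):
    """Left-factored grammar: S -> 'a' T ; T -> 'b' | 'c'. Input is never re-read."""
    pos = 0
    if pos >= len(tokens) or tokens[pos] != "a":
        return None
    pos += 1
    # The next input symbol alone decides which production of T applies.
    if pos < len(tokens) and tokens[pos] in ("b", "c"):
        chosen = tokens[pos]
        pos += 1
        return ["a", chosen] if pos == len(tokens) else None
    return None
```

On the input a c, the backtracking parser reads the 'a' twice (once per production), while the predictive parser reads every token exactly once.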
Recursive descent VS table-driven
Recursive descent
A recursive-descent parser consists of several small functions, one for each nonterminal in the grammar. As we parse a sentence, we call the functions that correspond to the left side nonterminal of the productions we are applying. If these productions are recursive, we end up calling the functions recursively.
Table driven
There is another method for implementing a predictive parser that uses a table to store that production along with an explicit stack to keep track of where we are in the parse.
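For comparison with the recursive-descent definition, here is a minimal sketch of the table-driven approach in Python. The grammar (balanced parentheses, S -> '(' S ')' S | ε), the hand-written TABLE and the name ll1_accepts are assumptions for illustration. A real generator would compute the table from the grammar's FIRST and FOLLOW sets.

```python
# Hypothetical LL(1) grammar for balanced parentheses:  S -> '(' S ')' S | ε
# TABLE[(nonterminal, lookahead)] gives the production body to expand.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],                 # S -> ε
    ("S", "$"): [],                 # S -> ε at end of input
}
NONTERMINALS = {"S"}

def ll1_accepts(text):
    tokens = list(text) + ["$"]     # "$" marks the end of the input
    stack = ["$", "S"]              # explicit stack replaces the call stack
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, lookahead))
            if body is None:
                return False        # no table entry: syntax error
            stack.extend(reversed(body))  # push so the first symbol ends up on top
        elif top == lookahead:
            pos += 1                # terminal matches: consume it
        else:
            return False
    return pos == len(tokens)
```

The loop does the same work as one function per nonterminal would, but the "where are we in the parse" information lives in stack rather than in the host language's call stack.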
As I understand it now, this gives four different types of parsers:
Recursive descent + backtracking
Recursive descent + prediction
Table-driven + backtracking
Table-driven + prediction
If I'm correct, can someone also tell me where in the following 4 types of parsers the LL(k) parser falls?
No. You have:
backtracking vs predictive
recursive descent vs table-driven
So you can have:
recursive descent backtracking
recursive descent predictive
table-driven with backtracking
table-driven predictive.
To be specific, 'Recursive descent with table/stack implementation' is a contradiction in terms.
All table-driven parser implementations need a stack. This is not a dichotomy.
where in the following 4 types of parsers the LL(k) parser falls?
Anywhere.
I studied LR(1) parsers and then LALR(1), and noticed that to construct an LALR(1) parser we should FIRST construct the LR(1) parser and then merge the states that have the same core. (For complex grammars, it's not easy to construct LR parsers.)
Now a question comes to mind: can we build an LALR(1) parser DIRECTLY, without constructing the LR(1) parser first? If yes, how?
Thanks in advance!
Parsing Techniques: A Practical Guide by Dick Grune and Ceriel J.H. Jacobs is worth getting. The Lemon parser generator (http://www.hwaci.com/sw/lemon/) has readable code too.
To implement a recursive descent parser, are the FIRST and FOLLOW sets required? And if so, can you still build the recursive descent parser given non-uniqueness in the FIRST and FOLLOW sets?
I'm having a hard time distinguishing between recursive descent and LL(1) parsing.
Thanks.
Recursive descent parsers do not have to be deterministic, i.e. one can construct recursive descent parsers that cannot decide which derivation to choose after a finite constant lookahead.
LL(k) parsers construct a parse tree incrementally: each new character extends the parse tree.
Nondeterministic recursive descent parsers can build a parse tree which is then discarded completely on the occurrence of a certain character.
Examples of recursive descent parsing that is not necessarily LL(k):
Parsing in PROLOG (backtracking)
Packrat Parsing (backtracking with memoization)
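To illustrate the second example, here is a rough sketch of backtracking recursive descent with optional memoization, in Python. The PEG in the docstring, the function make_parser and the call counter are all assumptions invented for this illustration. It uses functools.lru_cache as the memo table, which is one simple way to get the packrat effect, not the only one.

```python
from functools import lru_cache

def make_parser(text, memoize):
    """Build a recognizer for the PEG
         E <- T '+' E / T '-' E / T
         T <- '(' E ')' / 'n'
    Each rule maps a start position to an end position, or None on failure."""
    calls = {"T": 0}                       # how often the body of T actually runs
    wrap = lru_cache(maxsize=None) if memoize else (lambda f: f)

    def lit(c, pos):
        """Match the literal c at pos; a failed (None) position stays failed."""
        if pos is not None and pos < len(text) and text[pos] == c:
            return pos + 1
        return None

    @wrap
    def E(pos):
        for op in "+-":
            end = lit(op, T(pos))          # T is re-parsed for every alternative
            if end is not None:
                end = E(end)
                if end is not None:
                    return end
            # otherwise backtrack to pos and try the next alternative
        return T(pos)

    @wrap
    def T(pos):
        calls["T"] += 1
        end = lit("(", pos)
        if end is not None:
            end = lit(")", E(end))
            if end is not None:
                return end
        return lit("n", pos)

    return E, calls
```

For a nested input such as "((((n))))", the unmemoized parser re-runs T far more often than the memoized one, which runs it once per input position. Both return the same result; the memoized version trades memory for time.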
I've recently been trying to teach myself how parsers (for languages/context-free grammars) work, and most of it seems to be making sense, except for one thing. I'm focusing my attention in particular on LL(k) grammars, for which the two main algorithms seem to be the LL parser (using stack/parse table) and the Recursive Descent parser (simply using recursion).
As far as I can see, the recursive descent algorithm works on all LL(k) grammars and possibly more, whereas an LL parser works on all LL(k) grammars. A recursive descent parser is clearly much simpler than an LL parser to implement, however (just as an LL one is simpler than an LR one).
So my question is, what are the advantages/problems one might encounter when using either of the algorithms? Why might one ever pick LL over recursive descent, given that it works on the same set of grammars and is trickier to implement?
LL is usually a more efficient parsing technique than recursive-descent. In fact, a naive recursive-descent parser will actually be O(k^n) (where n is the input size) in the worst case. Some techniques such as memoization (which yields a Packrat parser) can improve this as well as extend the class of grammars accepted by the parser, but there is always a space tradeoff. LL parsers are (to my knowledge) always linear time.
On the flip side, you are correct in your intuition that recursive-descent parsers can handle a greater class of grammars than LL. Recursive-descent can handle any grammar which is LL(*) (that is, unlimited lookahead) as well as a small set of ambiguous grammars. This is because recursive-descent is actually a directly-encoded implementation of PEGs, or Parsing Expression Grammars. Specifically, the disjunctive operator (a | b) is not commutative, meaning that a | b does not equal b | a. A recursive-descent parser will try each alternative in order, so if a matches the input, it will succeed even if b would also have matched. This allows classic "longest match" ambiguities like the dangling else problem to be handled simply by ordering disjunctions correctly.
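A tiny sketch of why the order matters, in Python. The helper ordered_choice is a made-up name, and it only handles literal strings, which is enough to show the effect.

```python
def ordered_choice(alternatives, text, pos=0):
    """Return the end position of the first alternative that matches at pos, else None."""
    for literal in alternatives:
        if text.startswith(literal, pos):
            return pos + len(literal)
    return None

# Same two alternatives, different order, different result on the input "ab".
short_first = ordered_choice(["a", "ab"], "ab")   # commits to "a"
long_first = ordered_choice(["ab", "a"], "ab")    # commits to "ab"
```

Listing the longer alternative first is the same trick as putting the if/else production before the plain if production: the parser commits to the first alternative that succeeds, so the else attaches to the nearest if.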
With all of that said, it is possible to implement an LL(k) parser using recursive-descent so that it runs in linear time. This is done by essentially inlining the predict sets so that each parse routine determines the appropriate production for a given input in constant time. Unfortunately, such a technique eliminates an entire class of grammars from being handled. Once we get into predictive parsing, problems like dangling else are no longer solvable with such ease.
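As a sketch of what "inlining the predict sets" can look like, here is a small predictive recursive-descent evaluator in Python. The grammar, the Parser class and evaluate are assumptions for illustration only. Each routine inspects one character of lookahead and picks its production without ever backtracking.

```python
class Parser:
    """Predictive recursive descent for the hypothetical LL(1) grammar
         expr -> term rest
         rest -> '+' term rest | ε
         term -> DIGIT | '(' expr ')'
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """One character of lookahead; the empty string marks end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, c):
        if self.peek() != c:
            raise SyntaxError(f"expected {c!r} at position {self.pos}")
        self.pos += 1

    def expr(self):
        return self.rest(self.term())

    def rest(self, left):
        # Predict set of rest -> '+' term rest is {'+'}; anything else selects ε.
        if self.peek() == "+":
            self.expect("+")
            return self.rest(left + self.term())
        return left

    def term(self):
        look = self.peek()
        if "0" <= look <= "9":           # predict set of term -> DIGIT
            self.pos += 1
            return int(look)
        if look == "(":                  # predict set of term -> '(' expr ')'
            self.expect("(")
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError(f"unexpected {look!r} at position {self.pos}")

def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() != "":
        raise SyntaxError(f"trailing input at position {parser.pos}")
    return value
```

Every character is examined a constant number of times, so the running time is linear in the input. The price is that the grammar has to be LL(1) to begin with.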
As for why LL would be chosen over recursive-descent, it's mainly a question of efficiency and maintainability. Recursive-descent parsers are markedly easier to implement, but they're usually harder to maintain since the grammar they represent does not exist in any declarative form. Most non-trivial parser use-cases employ a parser generator such as ANTLR or Bison. With such tools, it really doesn't matter if the algorithm is directly-encoded recursive-descent or table-driven LL(k).
As a matter of interest, it is also worth looking into recursive-ascent, which is a parsing algorithm directly encoded after the fashion of recursive-descent, but capable of handling any LALR grammar. I would also dig into parser combinators, which are a functional way of composing recursive-descent parsers together.