Recently I wrote a (highly optimized) LALR(1) parser that could handle ambiguous grammars, and supplied it with a very ambiguous grammar. Afterwards I wrote a recursive descent parser for the same grammar, but with all the ambiguity removed. I've read many times that LALR(1) parsers are very efficient and recursive descent parsers are very inefficient, so I naturally expected the LALR parser to run much faster, even though it was working with an ambiguous grammar. When I compared the results of the two runs, I was shocked -- the recursive descent parser was much faster than the LALR parser! Why was the LALR parser slower than the recursive descent parser? Was it because the LALR parser had an ambiguous grammar?
Related
I've read that Earley is easier to use and that it can handle more cases than LL(k) (see https://www.wikiwand.com/en/Earley_parser):
Earley parsers are appealing because they can parse all context-free languages, unlike LR parsers and LL parsers, which are more typically used in compilers but which can only handle restricted classes of languages.
But I cannot find a simple example that shows Earley has an advantage over LL(k).
That quote (and the Wikipedia entry it comes from) does not claim that the Earley parsing algorithm is faster than LL(k), nor that it allows for shorter grammars. What it claims is that the Earley parser can parse grammars which cannot be parsed by LL(k) or LR(k) parsers. That is true; the Earley parser can parse any context-free grammar.
A simple example of a language which cannot be parsed by an LR(k) parser is the language of palindromes (sentences which read the same left-to-right as right-to-left). Here's a grammar for even-length palindromes over the alphabet {a, b}:
S → ε
S → a S a
S → b S b
You can add odd-length palindromes by adding two more productions (S → a and S → b), and it should be easy to see how to extend it to a larger alphabet.
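To make the palindrome example concrete, here is a minimal Earley recognizer in Python, hardcoded to the grammar above (including the odd-length productions). The names and representation are my own, and the sketch omits refinements -- such as the careful treatment of nullable rules in the general case -- that a production Earley parser would need:

    # Production bodies are tuples of symbols; uppercase names are nonterminals.
    GRAMMAR = {
        "S": [(), ("a",), ("b",), ("a", "S", "a"), ("b", "S", "b")],
    }

    def recognize(tokens, start="S"):
        # chart[i] holds Earley items (head, body, dot, origin) after i tokens.
        chart = [set() for _ in range(len(tokens) + 1)]
        chart[0] = {(start, body, 0, 0) for body in GRAMMAR[start]}
        for i in range(len(tokens) + 1):
            worklist = list(chart[i])
            while worklist:
                head, body, dot, origin = worklist.pop()
                if dot < len(body):
                    sym = body[dot]
                    if sym in GRAMMAR:                          # predictor
                        for b in GRAMMAR[sym]:
                            item = (sym, b, 0, i)
                            if item not in chart[i]:
                                chart[i].add(item)
                                worklist.append(item)
                    elif i < len(tokens) and tokens[i] == sym:  # scanner
                        chart[i + 1].add((head, body, dot + 1, origin))
                else:                                           # completer
                    for h, b, d, o in list(chart[origin]):
                        if d < len(b) and b[d] == head:
                            item = (h, b, d + 1, o)
                            if item not in chart[i]:
                                chart[i].add(item)
                                worklist.append(item)
        return any((start, body, len(body), 0) in chart[-1]
                   for body in GRAMMAR[start])

    print(recognize("abba"))  # True
    print(recognize("abab"))  # False

The chart simply carries every live possibility forward, which is how the algorithm sidesteps the problem that an LR(k) parser cannot locate the middle of a palindrome with only k symbols of lookahead.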
Note that the grammar is unambiguous: there is only one parse tree for every valid input. Ambiguity would not be an issue for Earley parsing anyway -- the parser can produce a representation of all possible parses of an ambiguous grammar, although that might take longer than parsing with an unambiguous grammar. LR(k) parsers, however, only exist for unambiguous grammars (and, as the above example shows, not for all unambiguous grammars).
In the above, I only mention LR(k) parsing, because LR(k) parsing is strictly more powerful than LL(k) parsing. Any grammar with an LL(k) parser can be parsed with an LR(k) parser, so if there is no LR(k) parser, there is no LL(k) parser either. The converse is not true, however: there are grammars which can be parsed by an LR(k) parser but by no LL(k') parser, for any value of k'. Moreover, there are languages which have an LR(1) grammar but no LL(k) grammar, for any value of k. The proofs of these assertions can be found in any good textbook on automaton theory.
Any LL(k) language can be parsed in time linear in the length of the input using LL(k), LR(k) or Earley parsers. That is, the three algorithms have asymptotically equal computational complexity for grammars which qualify for all three. But asymptotic complexity isn't the full story. If you have an LR(1) grammar, it is probably faster (by a constant factor) to use an LR(1) parser, because the individual steps are just lookups in precomputed tables.
If the grammar is also LL(1), then a well-written recursive descent or table-driven LL(1) parser is probably also faster. (Comparisons between LR(1) and LL(1) parsers are not as clear-cut; a lot depends on the quality of the parser code.)
But for values of k larger than 1, the Earley algorithm could well be the best choice, because of the size of LR(k) decision tables for larger values of k.
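To illustrate what "just lookups in precomputed tables" means, here is a sketch of an LR driver loop in Python. The toy grammar, the hand-built SLR(1) tables and all the names are invented for illustration:

    # Hand-built SLR(1) tables for the toy grammar
    #   rule 1: E -> E '+' 'n'        rule 2: E -> 'n'
    ACTION = {
        (0, 'n'): ('shift', 3),
        (1, '+'): ('shift', 2),
        (1, '$'): ('accept', 0),
        (2, 'n'): ('shift', 4),
        (3, '+'): ('reduce', 2), (3, '$'): ('reduce', 2),
        (4, '+'): ('reduce', 1), (4, '$'): ('reduce', 1),
    }
    GOTO = {(0, 'E'): 1}
    RULES = {1: ('E', 3), 2: ('E', 1)}    # rule number -> (head, body length)

    def lr_parse(tokens):
        tokens = list(tokens) + ['$']     # '$' marks end of input
        stack, i = [0], 0                 # the stack holds state numbers only
        while True:
            entry = ACTION.get((stack[-1], tokens[i]))
            if entry is None:
                return False              # syntax error
            op, arg = entry
            if op == 'shift':
                stack.append(arg)
                i += 1
            elif op == 'reduce':          # pop the body, then consult GOTO
                head, size = RULES[arg]
                del stack[-size:]
                stack.append(GOTO[(stack[-1], head)])
            else:
                return True               # accept

    print(lr_parse(['n', '+', 'n']))      # True
    print(lr_parse(['n', '+']))           # False

Each step of the loop is one or two dictionary lookups, which is why the constant factor is so small once the tables have been computed.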
Since LR(1) grammars are a broader class than LALR or SLR grammars, I have doubts about how to prove that a grammar is LR(1). If a grammar is SLR or LALR, does that mean that the grammar is also LR(1)?
In other words, since the full LR(1) construction with lookaheads is more complex to carry out, can I use the SLR method (which does not compute per-item lookaheads) or the LALR method to prove that a grammar is LR(1)?
Yes: if a grammar is SLR(k) or LALR(k), then it is also LR(k). The SLR and LALR constructions use coarser lookahead information than the canonical LR construction, so they can only add conflicts, never remove them; if a conflict is found while constructing the LR(k) parser, that same conflict will be present in an LALR or SLR parser for the same grammar.
Sorry for the amateurish question. I have a grammar that's LL and I want to write an LR grammar. Can I take the LL grammar, make minimal syntactic changes so that it fits an LR parser generator, and use it? Is that a bad idea? Are there structural differences between them that don't translate?
All LL(1) grammars are LR(1), so if you had an LR(1) parser generator, you could definitely use your LL(1) grammar, assuming the BNF syntax is that used by the parser generator.
But you probably don't have an LR(1) parser generator; more likely you have a parser generator which can only handle the LALR(1) subset of LR(1) grammars. All the same, you're probably fine: "most" LL(1) grammars are LALR(1), and it's pretty rare to find a useful LL(1) grammar which is not, since the patterns which break this are unlikely to arise in a practical grammar.
So it's probably possible. But it might not be a good idea.
Top-down parsers can't handle left recursion, and without left recursion you can't write a grammar which directly expresses left-associative operators -- which is to say most arithmetic operators. This problem is usually solved in practice by using a right-recursive grammar along with an idiosyncratic evaluation function which corrects the associativity (see the sketch below). That's less than ideal. Also, LL grammars created by mechanically removing left recursion tend to be very hard to read.
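Here is a sketch of one common form of that fix in Python (the names are invented): the right recursion is flattened into the EBNF-style rule expr → term ('-' term)*, the loop replaces the recursion, and the loop folds the parsed terms to the left so that 1 - 2 - 3 comes out as (1 - 2) - 3:

    def parse_expr(tokens):
        pos = 0

        def parse_term():
            nonlocal pos
            value = int(tokens[pos])   # a real parser would validate the token
            pos += 1
            return value

        node = parse_term()
        while pos < len(tokens) and tokens[pos] == '-':
            pos += 1
            node = ('-', node, parse_term())   # fold left: ((1 - 2) - 3)
        return node

    print(parse_expr(['1', '-', '2', '-', '3']))  # ('-', ('-', 1, 2), 3)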
So you are probably best off using a grammar designed for LR parsing. But you probably don't have to.
To implement a recursive descent parser, are the FIRST and FOLLOW sets required? And if so, can you still build a recursive descent parser given non-uniqueness in the FIRST and FOLLOW sets?
I'm having a hard time distinguishing between recursive descent and LL(1) parsing.
Thanks.
Recursive descent parsers do not have to be deterministic; that is, one can construct recursive descent parsers which cannot decide which production to choose within a finite, constant lookahead.
LL(k) parsers construct the parse tree incrementally: each new input symbol extends the parse tree.
Nondeterministic recursive descent parsers may build up a partial parse tree which is then discarded completely when a certain input symbol is encountered.
Examples of recursive descent parsing which is not necessarily LL(k) (a sketch of the backtracking style follows the list):
Parsing in PROLOG (backtracking)
Packrat Parsing (backtracking with memoization)
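Here is the sketch promised above: backtracking recursive descent in Python for the even-length palindrome grammar used earlier, which is not LL(k) for any k. Each alternative of S is tried in turn, and a generator does the unwinding; the names are my own:

    # S -> epsilon | 'a' S 'a' | 'b' S 'b'
    def parse_S(s, pos):
        """Yield every position at which an S starting at pos can end."""
        yield pos                                    # S -> epsilon
        for c in ('a', 'b'):                         # S -> c S c
            if pos < len(s) and s[pos] == c:
                for mid in parse_S(s, pos + 1):
                    if mid < len(s) and s[mid] == c:
                        yield mid + 1                # matched the closing c

    def is_even_palindrome(s):
        # Accept if some parse of S consumes the whole input.
        return any(end == len(s) for end in parse_S(s, 0))

    print(is_even_palindrome("abba"))  # True
    print(is_even_palindrome("abab"))  # False

No finite lookahead tells the parser where the middle of the palindrome is, so a deterministic LL(k) decision table cannot be built; the backtracking version simply tries every split and unwinds the ones that fail.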
What is the relation between parser combinators and recursive descent parsers?
The Wikipedia article on parser combinators is actually pretty reasonable. One of the first things we learn from it is that "Parser combinators use a top-down parsing strategy", i.e. recursive descent.
Combinators themselves are building blocks for parsers, but the parser you assemble from them is, operationally, a recursive descent parser.
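A sketch may make the relationship visible. In the miniature combinator library below (all names invented), each combinator returns a function from an input position to a parse result, so the assembled parser is literally a set of mutually recursive procedures -- a recursive descent parser, here with PEG-style ordered choice:

    # Each "parser" is a function (text, pos) -> (value, new_pos), or None on failure.
    def lit(expected):                        # match one literal character
        def parse(s, pos):
            if pos < len(s) and s[pos] == expected:
                return expected, pos + 1
            return None
        return parse

    def seq(*parsers):                        # match parsers one after another
        def parse(s, pos):
            values = []
            for p in parsers:
                result = p(s, pos)
                if result is None:
                    return None
                value, pos = result
                values.append(value)
            return values, pos
        return parse

    def alt(*parsers):                        # ordered choice: first success wins
        def parse(s, pos):
            for p in parsers:
                result = p(s, pos)
                if result is not None:
                    return result
            return None
        return parse

    def empty(s, pos):                        # always succeeds, consumes nothing
        return [], pos

    # Balanced parentheses: S -> '(' S ')' S | epsilon.
    # The lambdas delay the lookup of S so the definition can be recursive.
    S = alt(seq(lit('('), lambda s, p: S(s, p), lit(')'), lambda s, p: S(s, p)),
            empty)

    print(S("(()())", 0))   # (nested lists, 6): the whole input is consumed

Calling S("(()())", 0) walks down the grammar exactly as a hand-written recursive descent parser would; the combinators just save you the trouble of writing each procedure by hand.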