What happens if you directly use LL grammar for an LR parser, after making basic syntactical changes? - parsing

sorry for the amateurish question. I have a grammar that's LL and I want to write an LR grammar. Can I use the LL grammar, make minimal syntactical changes for it to fit with an LR parser and use it? Is that a bad idea? Are there structural differences between them that don't translate?

All LL(1) grammars are LR(1), so if you had an LR(1) parser generator, you could definitely use your LL(1) grammar, assuming the BNF syntax is that used by the parser generator.
But you probably don't have an LR(1) parser generator, but rather a parser generator which can only handle the LALR(1) subset of LR(1) grammars. All the same, you're probably fine. "Most" LL(1) grammars are in LALR(1), and it's pretty rare to find a useful LL(1) which is not. (This pattern is unlikely to arise in a practical grammar, for example.)
So it's probably possible. But it might not be a good idea.
Top-down parsers can't handle left-recursion, and without left-recursion you can't write a grammar which represents left-associative operators, which is to say most arithmetic operators. This problem is usually solved in practice by using a right-associative grammar along with an idiosyncratic evaluation function which corrects the associativity. That's less than ideal. Also, LL grammars created by mechanically removing left-recursion tend to be very hard to read.
So you are probably best off using a grammar designed for LR parsing. But you probably don't have to.

Related

What type of grammar is used to parse Lua?

I recently met the concept of LR, LL etc. Which category is Lua? Are there, or, can there be implementations that differ from the official code in this aspect?
LR, LL and so on are algorithms which attempt to find a parser for a given grammar. This is not possible for every grammar, and you can categorize on the basis of that possibility. But you have to be aware of the difference between a language and a grammar.
It might be possible to create an LR(k) parser for a given grammar, for some specific value of k. If so, the grammar is LR(k). Note that an LR(k) grammar is also an LR(k+1) grammar, and that an LL(k) grammar is also LR(k). So these are not categories in the sense that every grammar is in exactly one category.
Any language can be recognised by many different grammars. (In fact, an unlimited number). These grammars can be arbitrarily complex. You can always write a grammar for a given language which is not even context-free. We say that a language is <X> if there exists a grammar for that language which is <X>. But the fact that a specific grammar for that language is not <X> says nothing.
One interesting theorem demonstrates that if there is an LR(k) grammar for any language, then it is possible to derive an LR(1) grammar for that language. So while the k parameter is useful for describing grammars, languages can only be LR(0) or LR(1). This is not true of LL(k) languages, though.
Lua as a language is basically LR(1) and LL(2). The grammar is part of the reference manual, except that the published grammar doesn't specify operator precedences or a few rules having to do with newlines. The actual parser is a hand-written recursive-descent parser (at least, the last time I looked) with a couple of small deviations in order to handle operator precedence and the minor deviations from LL(1). However, there exist LALR(1) parsers for Lua as well.

Is there a type of parser generator that handles all deterministic context-free grammars?

I need a way of generating parsers for all deterministic context-free grammars.
I know that every deterministic context-free grammar can be parsed by some LR(k) parser. The problem is that I need to generate parsers for grammars of unknown k. So, to handle every deterministic context-free grammar, k would need to be infinite.
I also know that GLR parsers can parse all context-free grammars, deterministic or not. But I need to reject non-deterministic grammars. I'm not sure if GLR can detect that property from an input grammar.
Is there a type of parser generator that can handle all deterministic context-free grammars, while rejecting non-deterministic grammars, without needing a k input? (The only input is the grammar itself)
The problem of “given a CFG, decide whether it’s LR(k) for any k” is, surprisingly, undecidable! This means that it’s not possible for any parser generator to always be able to take an arbitrary grammar and determine which choice of k to use, or even if such a choice of k exists.
In practice, most grammars that we care about are fairly close to LR(1), for some definition of “fairly close,” which is why most parser generators focus on that simpler case.

How to make LALR(1) parser directly?

I studied LR(1) parsers and then LALR(1) and noticed that if we want to construct LALR(1) parsers, we should FIRST construct the LR(1) parser and then, by combining some states with the same core, we can go ahead for LALR(1) parser. (For complex grammars, it's not easy to construct LR parsers)
Now a question comes to mind: can we make LALR(1) parser DIRECTLY? Without using (Or maybe constructing) LR(1) parser? If Yes, How?
Thanks in advance!
PARSING TECHNIQUES A Practical Guide by Dick Grune and Ceriel J.H. Jacobs is worth getting. The Lemon Parser generator (http://www.hwaci.com/sw/lemon/) has readable code too.

What kind of parser is a pratt parser?

I'm implementing pratt's top down operator precedence parser and I'd like to know in which formal category it falls into - is it LR(1)?
Pratt parser are not LR parsers. And they're not exactly LL parsers either. In fact, Pratt parsers are generally hand-coded in some general purpose programming language; the technique is not based on an abstraction like push-down finite state automata. This makes it somewhat more difficult to prove assertions about a given Pratt parser, such as that it recognizes a particular formal language.
In general, Pratt parsers can easily be designed to recognize a language if the grammar is an operator precedence grammar, so they can be considered to be a dual of operator precedence parsing, even though operator precedence parsing is bottom-up and Pratt parsers are nominally top-down. Tracing a Pratt parser and the transitions of an operator precedence parser for the same language will show the similarity.
So I suppose that it might be possible to come up with a formalism for Pratt parsers, but as far as I know, none exists.

Difference between an LL and Recursive Descent parser?

I've recently being trying to teach myself how parsers (for languages/context-free grammars) work, and most of it seems to be making sense, except for one thing. I'm focusing my attention in particular on LL(k) grammars, for which the two main algorithms seem to be the LL parser (using stack/parse table) and the Recursive Descent parser (simply using recursion).
As far as I can see, the recursive descent algorithm works on all LL(k) grammars and possibly more, whereas an LL parser works on all LL(k) grammars. A recursive descent parser is clearly much simpler than an LL parser to implement, however (just as an LL one is simpler than an LR one).
So my question is, what are the advantages/problems one might encounter when using either of the algorithms? Why might one ever pick LL over recursive descent, given that it works on the same set of grammars and is trickier to implement?
LL is usually a more efficient parsing technique than recursive-descent. In fact, a naive recursive-descent parser will actually be O(k^n) (where n is the input size) in the worst case. Some techniques such as memoization (which yields a Packrat parser) can improve this as well as extend the class of grammars accepted by the parser, but there is always a space tradeoff. LL parsers are (to my knowledge) always linear time.
On the flip side, you are correct in your intuition that recursive-descent parsers can handle a greater class of grammars than LL. Recursive-descent can handle any grammar which is LL(*) (that is, unlimited lookahead) as well as a small set of ambiguous grammars. This is because recursive-descent is actually a directly-encoded implementation of PEGs, or Parser Expression Grammar(s). Specifically, the disjunctive operator (a | b) is not commutative, meaning that a | b does not equal b | a. A recursive-descent parser will try each alternative in order. So if a matches the input, it will succeed even if b would have matched the input. This allows classic "longest match" ambiguities like the dangling else problem to be handled simply by ordering disjunctions correctly.
With all of that said, it is possible to implement an LL(k) parser using recursive-descent so that it runs in linear time. This is done by essentially inlining the predict sets so that each parse routine determines the appropriate production for a given input in constant time. Unfortunately, such a technique eliminates an entire class of grammars from being handled. Once we get into predictive parsing, problems like dangling else are no longer solvable with such ease.
As for why LL would be chosen over recursive-descent, it's mainly a question of efficiency and maintainability. Recursive-descent parsers are markedly easier to implement, but they're usually harder to maintain since the grammar they represent does not exist in any declarative form. Most non-trivial parser use-cases employ a parser generator such as ANTLR or Bison. With such tools, it really doesn't matter if the algorithm is directly-encoded recursive-descent or table-driven LL(k).
As a matter of interest, it is also worth looking into recursive-ascent, which is a parsing algorithm directly encoded after the fashion of recursive-descent, but capable of handling any LALR grammar. I would also dig into parser combinators, which are a functional way of composing recursive-descent parsers together.

Resources