What type of grammar is used to parse Lua? - parsing

I recently met the concept of LR, LL etc. Which category is Lua? Are there, or, can there be implementations that differ from the official code in this aspect?

LR, LL and so on are algorithms which attempt to find a parser for a given grammar. This is not possible for every grammar, and you can categorize on the basis of that possibility. But you have to be aware of the difference between a language and a grammar.
It might be possible to create an LR(k) parser for a given grammar, for some specific value of k. If so, the grammar is LR(k). Note that an LR(k) grammar is also an LR(k+1) grammar, and that an LL(k) grammar is also LR(k). So these are not categories in the sense that every grammar is in exactly one category.
Any language can be recognised by many different grammars. (In fact, an unlimited number). These grammars can be arbitrarily complex. You can always write a grammar for a given language which is not even context-free. We say that a language is <X> if there exists a grammar for that language which is <X>. But the fact that a specific grammar for that language is not <X> says nothing.
One interesting theorem demonstrates that if there is an LR(k) grammar for any language, then it is possible to derive an LR(1) grammar for that language. So while the k parameter is useful for describing grammars, languages can only be LR(0) or LR(1). This is not true of LL(k) languages, though.
Lua as a language is basically LR(1) and LL(2). The grammar is part of the reference manual, except that the published grammar doesn't specify operator precedences or a few rules having to do with newlines. The actual parser is a hand-written recursive-descent parser (at least, the last time I looked) with a couple of small deviations in order to handle operator precedence and the minor deviations from LL(1). However, there exist LALR(1) parsers for Lua as well.

Related

What happens if you directly use LL grammar for an LR parser, after making basic syntactical changes?

sorry for the amateurish question. I have a grammar that's LL and I want to write an LR grammar. Can I use the LL grammar, make minimal syntactical changes for it to fit with an LR parser and use it? Is that a bad idea? Are there structural differences between them that don't translate?
All LL(1) grammars are LR(1), so if you had an LR(1) parser generator, you could definitely use your LL(1) grammar, assuming the BNF syntax is that used by the parser generator.
But you probably don't have an LR(1) parser generator, but rather a parser generator which can only handle the LALR(1) subset of LR(1) grammars. All the same, you're probably fine. "Most" LL(1) grammars are in LALR(1), and it's pretty rare to find a useful LL(1) which is not. (This pattern is unlikely to arise in a practical grammar, for example.)
So it's probably possible. But it might not be a good idea.
Top-down parsers can't handle left-recursion, and without left-recursion you can't write a grammar which represents left-associative operators, which is to say most arithmetic operators. This problem is usually solved in practice by using a right-associative grammar along with an idiosyncratic evaluation function which corrects the associativity. That's less than ideal. Also, LL grammars created by mechanically removing left-recursion tend to be very hard to read.
So you are probably best off using a grammar designed for LR parsing. But you probably don't have to.

Is there a type of parser generator that handles all deterministic context-free grammars?

I need a way of generating parsers for all deterministic context-free grammars.
I know that every deterministic context-free grammar can be parsed by some LR(k) parser. The problem is that I need to generate parsers for grammars of unknown k. So, to handle every deterministic context-free grammar, k would need to be infinite.
I also know that GLR parsers can parse all context-free grammars, deterministic or not. But I need to reject non-deterministic grammars. I'm not sure if GLR can detect that property from an input grammar.
Is there a type of parser generator that can handle all deterministic context-free grammars, while rejecting non-deterministic grammars, without needing a k input? (The only input is the grammar itself)
The problem of “given a CFG, decide whether it’s LR(k) for any k” is, surprisingly, undecidable! This means that it’s not possible for any parser generator to always be able to take an arbitrary grammar and determine which choice of k to use, or even if such a choice of k exists.
In practice, most grammars that we care about are fairly close to LR(1), for some definition of “fairly close,” which is why most parser generators focus on that simpler case.

What kind of parser is a pratt parser?

I'm implementing pratt's top down operator precedence parser and I'd like to know in which formal category it falls into - is it LR(1)?
Pratt parser are not LR parsers. And they're not exactly LL parsers either. In fact, Pratt parsers are generally hand-coded in some general purpose programming language; the technique is not based on an abstraction like push-down finite state automata. This makes it somewhat more difficult to prove assertions about a given Pratt parser, such as that it recognizes a particular formal language.
In general, Pratt parsers can easily be designed to recognize a language if the grammar is an operator precedence grammar, so they can be considered to be a dual of operator precedence parsing, even though operator precedence parsing is bottom-up and Pratt parsers are nominally top-down. Tracing a Pratt parser and the transitions of an operator precedence parser for the same language will show the similarity.
So I suppose that it might be possible to come up with a formalism for Pratt parsers, but as far as I know, none exists.

Converting a context-free grammar into a LL(1) grammar

First off, I have checked similar questions and none has quite the information I seek.
I want to know if, given a context-free grammar, it is possible to:
Know if there exists or not an equivalent LL(1) grammar. Equivalent in the sense that it should generate the same language.
Convert the context-free grammar to the equivalent LL(1) grammar, given it exists. The conversion should succeed if an equivalent LL(1) grammar exists. It is OK if it does not terminate if an equivalent does not exists.
If the answer to those questions are yes, are such algorithms or tools available somewhere ? My own researches have been fruitless.
Another answer mentions that the Dragon Book has an algorithms to eliminate left recursion and to left factor a context-free grammar. I have access to the book and checked it but it is unclear to me if the grammar is guaranteed to be LL(1). The restriction imposed on the context-free grammar (no null production and no cycle) are agreeable to me.
From university compiler courses I took I remember that LL(1) grammar is context free grammar, but context free grammar is much bigger than LL(1). There is no general algorithm (meaning not NP hard) to check and convert from context-free (that can be transformed to LL(1)) to LL(1).
Applying the bag of tricks like removing left recursion, removing first-follow conflict, left-factoring, etc. are similar to mathematical transformation when you want to integrate a function... You need experience that is sometimes very close to an art. The transformations are often inverse of each other.
One reason why LR type grammars are being used now a lot for generated parsers is that they cover much wider spectrum of context free grammars than LL(1).
Btw. e.g. C grammar can be expressed as LL(1), but C# cannot (e.g. lambda function x -> x + 1 comes to mind, where you cannot decide if you are seeing a parameter of lambda or a known variable).

How to determine if a language is LL(1)?

I have a grammar and I can check whether or not is is LL(1). However, is there any way to check if the language generated by the grammar is LL(1)? And what exactly is the difference between LL(1) grammars and LL(1) languages?
Any grammar that is LL(1) defines an LL(1) language. By definition, a language is LL(1) if there is some grammar that generates it that is LL(1), so the fact that you have an LL(1) grammar for the language automatically means that the language is LL(1).
To elaborate, a language is a set of strings and a grammar for that language is a means of describing that language. Some languages have LL(1) grammars while others do not. However, the fact that a grammar is not LL(1) does not mean that the language it describes is not. For example, consider this grammar:
A -> ab | ac
This grammar is not LL(1) because it contains a FIRST/FIRST conflict when trying to predict the production for A when seeing terminal a. However, it describes an LL(1) language, since the language is also described by the grammar
A -> aX
X -> b | c
So the language generated by these grammars (which just contains ab and ac) is indeed LL(1).
Determining whether the language described by an arbitrary grammar is LL(1) is much harder and to the best of my knowledge the only way to do it would be to either explicitly exhibit an LL(1) grammar for the language generated by the initial grammar (which is tricky) or to mathematically prove that no such grammar exists.
Hope this helps!

Resources