I recently came across the concepts of LR, LL, etc. Which category does Lua belong to? Are there, or can there be, implementations that differ from the official code in this respect?
LR, LL and so on are algorithms which attempt to find a parser for a given grammar. This is not possible for every grammar, and you can categorize grammars by whether the algorithm succeeds. But you have to be aware of the difference between a language and a grammar.
It might be possible to create an LR(k) parser for a given grammar, for some specific value of k. If so, the grammar is LR(k). Note that an LR(k) grammar is also an LR(k+1) grammar, and that an LL(k) grammar is also LR(k). So these are not categories in the sense that every grammar is in exactly one category.
Any language can be recognised by many different grammars (in fact, infinitely many). These grammars can be arbitrarily complex. You can always write a grammar for a given language which is not even context-free. We say that a language is <X> if there exists a grammar for that language which is <X>. But the fact that a specific grammar for that language is not <X> says nothing about the language.
One interesting theorem demonstrates that if there is an LR(k) grammar for a language, then it is possible to derive an LR(1) grammar for that language. So while the k parameter is useful for describing grammars, languages can only be LR(0) or LR(1). This is not true of LL(k) languages, though: for every k there are LL(k+1) languages that are not LL(k).
Lua as a language is basically LR(1) and LL(2). The grammar is part of the reference manual, except that the published grammar doesn't specify operator precedences or a few rules having to do with newlines. The actual parser is a hand-written recursive-descent parser (at least, the last time I looked), with a couple of small adjustments to handle operator precedence and the few places where the grammar is not LL(1). That said, LALR(1) parsers for Lua exist as well.
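To illustrate how a hand-written recursive-descent parser can handle operator precedence without spelling it out in the grammar, here is a minimal precedence-climbing sketch in Python. It is not Lua's actual parser: the operator table BINARY_PRIORITY, the tokenizer and the Parser class are hypothetical, and only integers, parentheses and five binary operators are supported.

```python
import re

# Hypothetical operator table: higher numbers bind tighter.
BINARY_PRIORITY = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOC = {"^"}  # '^' is right-associative, as in Lua


def tokenize(text):
    """Split the input into integer literals, operators and parentheses."""
    return re.findall(r"\d+|[-+*/^()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def primary(self):
        tok = self.advance()
        if tok == "(":
            tree = self.expr(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr(self, min_priority):
        """Parse a chain of operators whose priority is at least min_priority."""
        left = self.primary()
        while self.peek() in BINARY_PRIORITY and BINARY_PRIORITY[self.peek()] >= min_priority:
            op = self.advance()
            prio = BINARY_PRIORITY[op]
            # Left-associative operators require a strictly tighter operator
            # inside the right operand; right-associative ones allow an equal one.
            next_min = prio if op in RIGHT_ASSOC else prio + 1
            left = (op, left, self.expr(next_min))
        return left
```

For example, Parser("1+2*3").parse() builds ("+", 1, ("*", 2, 3)), and "1-2-3" groups to the left while "2^3^2" groups to the right. The grammar itself can stay flat; the priority table carries the precedence information.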
I know that every LL(1) grammar is also LR(1). But what about the relationship between LL(1) and LR(0)? Can an LL(1) grammar be LR(0) as well?
You ask two questions, one in the title and the other in the body of the post. Neither specifies whether you are asking about languages or grammars, but the basic answers are the same:
Are all LL(1) languages LR(0)?
No. A language which contains both a string and a proper prefix of that string cannot be LR(0). But many LL(1) languages have that form.
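As a small illustration, the hypothetical grammar S → a T, T → b | ε is LL(1) (T's alternatives are chosen on 'b' versus end of input), yet its language {a, ab} contains a string together with a proper prefix of that string, so it cannot be LR(0). The Python helper below only scans a finite list of strings for such a pair; it is a sanity check, not a decision procedure for infinite languages.

```python
def prefix_violation(strings):
    """Return (p, s) where p is a proper prefix of s, or None if there is none."""
    words = sorted(set(strings))
    # In lexicographic order a proper prefix always sorts before its
    # extensions, so only later words need to be compared.
    for i, p in enumerate(words):
        for s in words[i + 1:]:
            if s.startswith(p):
                return (p, s)
    return None
```

prefix_violation(["a", "ab"]) finds the pair ("a", "ab"), while a prefix-free set such as ["ab", "ac", "b"] yields None.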
Are some LL(1) languages LR(0)?
Sure.
(The unasked question) Are any LR(0) languages not LL(1)?
Yes. For example, the language {a^m b^n c | m ≥ n ≥ 0} is LR(0), but it has no LL(1) grammar.
I searched Google to find out whether it is possible to create an LL parser from a BNF grammar, but I saw that Wikipedia uses something like
S → F
S → ( S + F )
F → a
which is not a BNF grammar. Is it possible to use a BNF grammar to create an LL parser, or do you have to use an LL grammar?
Thanks
Yes, you can. BNF is just a notation for representing context-free grammars. Roughly, ::= is the arrow, a <symbol> that appears on the left is a non-terminal, and "literals" are terminal symbols (see here).
Keep in mind that not all context-free grammars are LL(1); some grammars have no LL parser at all.
If you use a parser generator, you'll quickly find out whether your grammar is LL or not. But if it isn't, it may be difficult to correct it (especially if you have no background in language theory).
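To make this concrete: in BNF the Wikipedia grammar could be written as <S> ::= <F> | "(" <S> "+" <F> and <F> ::= "a". The two alternatives of <S> start with different terminals ("a" via <F>, and "("), so one token of lookahead selects the rule and a recursive-descent (LL(1)) parser follows directly. A minimal Python sketch; the function names and the nested-tuple tree format are hypothetical.

```python
def parse(text):
    """Recognise S -> F | ( S + F ), F -> a and return a nested-tuple parse tree."""
    tokens = list(text.replace(" ", ""))  # every token is a single character here
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, got {peek()!r}")
        pos += 1

    def parse_S():
        # One token of lookahead picks the production.
        if peek() == "(":
            expect("(")
            s = parse_S()
            expect("+")
            f = parse_F()
            expect(")")
            return ("S", "(", s, "+", f, ")")
        return ("S", parse_F())

    def parse_F():
        expect("a")
        return ("F", "a")

    tree = parse_S()
    if peek() is not None:
        raise SyntaxError(f"trailing input at position {pos}")
    return tree
```

parse("(a+a)") returns ("S", "(", ("S", ("F", "a")), "+", ("F", "a"), ")"), while "a+a" is rejected because the grammar only allows a sum inside parentheses.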
I've been playing around with a lot of grammars that are not LL(1) recently, and many of them can be transformed into grammars that are LL(1).
However, I have never seen an example of an unambiguous language that is not LL(1) (that is, a language for which no unambiguous grammar is LL(1)), nor do I have any idea how I would go about proving that I had found one if I accidentally stumbled across one.
Does anyone know how to prove that a particular unambiguous language is not LL(1)?
I thought about the question for a while and then found this language on Wikipedia:
S -> A | B
A -> 'a' A 'b' | ε
B -> 'a' B 'b' 'b' | ε
They claim that the language described by the grammar above cannot be described by any LL(k) grammar. You asked only about LL(1), and that case is pretty straightforward. With only the first symbol as lookahead, you don't know whether the sequence is 'ab' (from A) or 'abb' (from B), or a longer one produced by more recursion, and therefore you cannot choose the right rule. So the language is definitely not LL(1).
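The lookahead argument can be checked mechanically for small cases. This Python sketch enumerates strings from A (a^n b^n) and B (a^n b^2n) up to a bound and, for a given lookahead length k, returns a pair that agrees on its first k symbols but must be derived through different productions of S. It illustrates the problem for this particular grammar only; it is not a proof that no LL(k) grammar exists for the language, and the function names are hypothetical.

```python
def strings_from_A(max_n):
    """a^n b^n for 0 <= n <= max_n (derived through S -> A)."""
    return ["a" * n + "b" * n for n in range(max_n + 1)]


def strings_from_B(max_n):
    """a^n b^2n for 0 <= n <= max_n (derived through S -> B)."""
    return ["a" * n + "b" * (2 * n) for n in range(max_n + 1)]


def lookahead_conflict(k, max_n):
    """Return (x, y), x from A and y from B, both nonempty, with equal first k symbols.

    Slicing a string shorter than k returns the whole string, which plays the
    role of seeing end of input inside the lookahead window.
    """
    for x in strings_from_A(max_n)[1:]:  # skip the empty string both share
        for y in strings_from_B(max_n)[1:]:
            if x[:k] == y[:k]:
                return (x, y)
    return None
```

lookahead_conflict(1, 5) returns ("ab", "abb"). More generally a^k b^k and a^k b^2k share their first k symbols, so the function finds a conflict for every k as long as max_n is at least k.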
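Also, apart from the empty string (which can be derived through either A or B, a detail that is easily fixed by deriving ε only once), every sequence generated by this grammar has exactly one derivation tree. So the language is unambiguous.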
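A hypothetical helper that counts parse trees for this particular grammar makes the uniqueness claim checkable: a string a^n b^m is derivable through A only when m = n and through B only when m = 2n, and each of those derivations is forced. The only string with two trees is the empty string (S → A → ε and S → B → ε).

```python
import re


def count_parse_trees(s):
    """Parse trees for s under S -> A | B, A -> a A b | ε, B -> a B b b | ε."""
    m = re.fullmatch(r"(a*)(b*)", s)
    if m is None:
        return 0  # not of the form a^n b^m, so not in the language
    n_a, n_b = len(m.group(1)), len(m.group(2))
    trees = 0
    if n_b == n_a:      # a^n b^n: exactly one tree through A
        trees += 1
    if n_b == 2 * n_a:  # a^n b^2n: exactly one tree through B
        trees += 1
    return trees
```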
The second part of your question is a little harder. It is much easier to prove that a language is LL(1) than the opposite (that no LL(1) grammar describes the language). I think you just create a grammar describing the language and then try to make it LL(1). When you discover a conflict that cannot be resolved, you somehow have to take advantage of it to construct a proof.
Is there a simple way to determine whether a grammar is LL(1), LR(0), SLR(1)... just by looking at the grammar, without doing any complex analysis?
For instance, to decide whether a BNF grammar is LL(1) you have to calculate FIRST and FOLLOW sets, which can be time-consuming in some cases.
Has anybody got an idea how to do this faster?
Any help would really be appreciated!
First off, a bit of pedantry. You cannot determine whether a language is LL(1) from inspecting a grammar for it, you can only make statements about the grammar itself. It is perfectly possible to write non-LL(1) grammars for languages for which an LL(1) grammar exists.
With that out of the way:
You could write a parser for the grammar and have a program calculate first and follow sets and other properties for you. After all, that's the big advantage of BNF grammars, they are machine comprehensible.
Inspect the grammar and look for violations of the constraints of various grammar types. For instance: LL(1) allows for right but not left recursion, thus, a grammar that contains left recursion is not LL(1). (For other grammar properties you're going to have to spend some quality time with the definitions, because I can't remember anything else off the top of my head right now :).
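A minimal sketch of the left-recursion check, under assumptions made for this example only: the grammar is a Python dict mapping each nonterminal to a list of alternatives, each alternative is a list of symbols ([] stands for ε), and any symbol that is not a dict key is a terminal. It adds an edge A → B whenever B can be the leftmost symbol of one of A's alternatives (looking past nullable nonterminals) and reports every nonterminal that can reach itself, so indirect left recursion is caught too. The function names are hypothetical.

```python
def nullable_set(grammar):
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            # Terminals are never in `nullable`, so an alternative containing
            # one cannot make lhs nullable; [] (epsilon) always does.
            if lhs not in nullable and any(all(s in nullable for s in alt) for alt in alts):
                nullable.add(lhs)
                changed = True
    return nullable


def left_recursive(grammar):
    """Return the set of nonterminals A with A =>+ A ... (direct or indirect)."""
    nullable = nullable_set(grammar)
    edges = {lhs: set() for lhs in grammar}
    for lhs, alts in grammar.items():
        for alt in alts:
            for sym in alt:
                if sym not in grammar:  # a terminal ends the leftmost position
                    break
                edges[lhs].add(sym)
                if sym not in nullable:
                    break
    result = set()
    for start in grammar:
        # Depth-first search: can `start` reach itself through leftmost edges?
        stack, seen = list(edges[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                result.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
    return result
```

For A → A + A | a this reports {"A"}; for the indirect case A → B | a, B → A + A it reports both A and B; a right-recursive grammar such as E → T + E | T, T → id yields the empty set.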
In answer to your main question: For a very simple grammar, it may be possible to determine whether it is LL(1) without constructing FIRST and FOLLOW sets, e.g.
A → A + A | a
is not LL(1), while
A → a | b
is.
But when you get more complex than that, you'll need to do some analysis.
A → B | a
B → A + A
This is not LL(1), because A is indirectly left-recursive (A ⇒ B ⇒ A + A), but that may not be immediately obvious.
The grammar rules for arithmetic quickly get very complex:
expr → term { '+' term }
term → factor { '*' factor }
factor → number | '(' expr ')'
This grammar handles only multiplication and addition, and already it's not immediately clear whether it is LL(1). It's still possible to check it by reading through the grammar, but as the grammar grows this becomes less feasible. If we're defining a grammar for an entire programming language, it's almost certainly going to take some complex analysis.
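The braces in this grammar are EBNF repetition, which maps directly onto a loop in a recursive-descent parser, and every decision (repeat the loop or stop, which factor alternative to take) can be made by looking at a single token. A minimal Python evaluator for exactly this grammar, assuming a hypothetical tokenizer that produces integers and the single-character tokens + * ( ):

```python
import re


def tokenize(text):
    """Integers become ints; '+', '*', '(' and ')' stay strings. Other characters are skipped."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[+*()]", text)]


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        # expr -> term { '+' term }
        value = term()
        while peek() == "+":
            advance()
            value += term()
        return value

    def term():
        # term -> factor { '*' factor }
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():
        # factor -> number | '(' expr ')'
        tok = advance()
        if isinstance(tok, int):
            return tok
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

evaluate("2+3*4") gives 14 and evaluate("(2+3)*4") gives 20: precedence comes from term being nested inside expr, not from any lookahead.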
That said, there are a few obvious telltale signs that the grammar is not LL(1) — like the A → A + A above — and if you can find any of these in your grammar, you'll know it needs to be rewritten if you're writing a recursive descent parser. But there's no shortcut to verify that the grammar is LL(1).
One aspect, "is the language/grammar ambiguous?", is a known undecidable question, like the Post correspondence problem and the halting problem.
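Although ambiguity cannot be decided in general, a bounded search can still exhibit a witness: a single string with two or more parse trees proves a grammar ambiguous. This Python sketch counts parse trees for the earlier grammar A → A + A | a by dynamic programming over token spans; it is specialized to that one grammar, and the function name is hypothetical.

```python
from functools import lru_cache


def count_trees(tokens):
    """Number of parse trees for tokens under A -> A + A | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Ways in which tokens[i:j] derives A.
        if j - i == 1:
            return 1 if tokens[i] == "a" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":  # split A -> A + A at this '+'
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

count_trees("a+a+a") returns 2, one tree grouping to the left and one to the right, which already shows the grammar is ambiguous. Absence of such a string up to some length proves nothing, which is exactly where undecidability bites.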
Straight from the book "Compilers: Principles, Techniques, & Tools" by Aho et al.
Page 223:
A grammar G is LL(1) if and only if whenever A -> alpha | beta are two distinct productions of G, the following conditions hold:
For no terminal "a" do both alpha and beta derive strings beginning with "a"
At most one of alpha and beta can derive the empty string
If beta derives the empty string in zero or more steps, then alpha does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if alpha derives the empty string in zero or more steps, then beta does not derive any string beginning with a terminal in FOLLOW(A).
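These conditions can be checked mechanically. The following Python sketch computes FIRST and FOLLOW by fixed-point iteration and tests every pair of alternatives against the three conditions. Assumptions specific to this example: the grammar is a dict from nonterminal to a list of alternatives (lists of symbols, [] for ε), the first key is the start symbol, '$' marks end of input, any symbol that is not a key is a terminal, and the names EPS, first_follow and ll1_conflicts are hypothetical.

```python
from itertools import combinations

EPS = None  # marker inside FIRST sets meaning "can derive the empty string"


def first_of_seq(seq, first, grammar):
    """FIRST of a symbol sequence; contains EPS if the whole sequence is nullable."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in grammar else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def first_follow(grammar):
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[next(iter(grammar))].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, first, grammar)
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue
                    rest = first_of_seq(alt[i + 1:], first, grammar)
                    add = (rest - {EPS}) | (follow[lhs] if EPS in rest else set())
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow


def ll1_conflicts(grammar):
    """List (nonterminal, alpha, beta, reason) for every violated condition."""
    first, follow = first_follow(grammar)
    problems = []
    for lhs, alts in grammar.items():
        for a, b in combinations(alts, 2):
            fa, fb = first_of_seq(a, first, grammar), first_of_seq(b, first, grammar)
            if (fa & fb) - {EPS}:
                problems.append((lhs, a, b, "FIRST sets overlap"))
            if EPS in fa and EPS in fb:
                problems.append((lhs, a, b, "both derive the empty string"))
            if EPS in fb and fa & follow[lhs]:
                problems.append((lhs, a, b, "FIRST(alpha) meets FOLLOW(A)"))
            if EPS in fa and fb & follow[lhs]:
                problems.append((lhs, a, b, "FIRST(beta) meets FOLLOW(A)"))
    return problems
```

For the classic expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | n it reports no conflicts, while A → A + A | a produces a FIRST-set overlap.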
Essentially, this is a matter of verifying that the grammar passes the pairwise disjointness test and does not involve left recursion. In particular, a grammar G that is left-recursive or ambiguous cannot be LL(1).
Check whether the grammar is ambiguous or not. If it is, then the grammar is not LL(1) because no ambiguous grammar is LL(1).
Yes, there are shortcuts for LL(1) grammars:
1) If A -> B1 | B2 | ... | Bn, then FIRST(Bi) ∩ FIRST(Bj) must be the empty set for every pair i ≠ j (it is not enough for the intersection of all n sets to be empty).
2) If A -> B1 | epsilon, then FIRST(B1) ∩ FOLLOW(A) must be the empty set.
3) If G is a grammar in which every non-terminal has only one production, then G is LL(1).
p0 S' → E
p1 E → id
p2 E → id ( E )
p3 E → E + id
Construct the LR(0) DFA, the FOLLOW set for E and the SLR action/goto tables.
Is this an LR(0) grammar? Prove your answer.
Using the SLR tables, show the steps (shifts, reductions, accept) of an LR parser parsing: id ( id + id )
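One way to check hand-made answers is to let a program build the tables. The Python sketch below is written for this four-production grammar only: it builds the canonical LR(0) item sets, computes FOLLOW (simplified because no production derives ε), fills the ACTION table either LR(0)-style (reduce on every lookahead) or SLR-style (reduce only on FOLLOW of the left-hand side), and runs the usual shift/reduce driver. The function names are hypothetical, and its state numbering need not match a hand-drawn DFA.

```python
# SLR(1) sketch for the exercise grammar only:
# p0 S' -> E, p1 E -> id, p2 E -> id ( E ), p3 E -> E + id.
PRODS = [("S'", ("E",)), ("E", ("id",)), ("E", ("id", "(", "E", ")")), ("E", ("E", "+", "id"))]
NONTERMS = {lhs for lhs, _ in PRODS}
TERMINALS = {s for _, rhs in PRODS for s in rhs if s not in NONTERMS} | {"$"}

def closure(items):
    """LR(0) closure of a set of items (production index, dot position)."""
    result, todo = set(items), list(items)
    while todo:
        p, dot = todo.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for q, (lhs, _) in enumerate(PRODS):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    todo.append((q, 0))
    return frozenset(result)

def build_states():
    """Canonical collection of LR(0) item sets plus the DFA transitions."""
    states, trans = [closure({(0, 0)})], {}
    symbols = sorted({s for _, rhs in PRODS for s in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            # GOTO: move the dot over sym wherever possible, then close.
            moved = {(p, d + 1) for p, d in states[i] if d < len(PRODS[p][1]) and PRODS[p][1][d] == sym}
            if moved:
                target = closure(moved)
                if target not in states:
                    states.append(target)
                trans[(i, sym)] = states.index(target)
        i += 1
    return states, trans

def follow_sets():
    """FIRST/FOLLOW by fixed-point iteration; simplified because no rule derives ε."""
    first = {n: set() for n in NONTERMS}
    follow = {n: set() for n in NONTERMS}
    follow["S'"].add("$")
    def fst(sym):
        return first[sym] if sym in NONTERMS else {sym}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            updates = [(first[lhs], fst(rhs[0]))]
            for i, sym in enumerate(rhs):
                if sym in NONTERMS:
                    nxt = fst(rhs[i + 1]) if i + 1 < len(rhs) else follow[lhs]
                    updates.append((follow[sym], nxt))
            for target, add in updates:
                if not add <= target:
                    target |= add
                    changed = True
    return follow

def build_table(states, trans, follow=None):
    """ACTION table; follow=None gives an LR(0) table (reduce on every lookahead)."""
    action, conflicts = {}, []
    for i, state in enumerate(states):
        for p, d in state:
            lhs, rhs = PRODS[p]
            if d < len(rhs):
                entries = [] if rhs[d] in NONTERMS else [(rhs[d], ("shift", trans[(i, rhs[d])]))]
            elif p == 0:
                entries = [("$", ("accept",))]
            else:
                lookaheads = TERMINALS if follow is None else follow[lhs]
                entries = [(t, ("reduce", p)) for t in lookaheads]
            for tok, act in entries:
                if action.get((i, tok), act) != act:
                    conflicts.append((i, tok, action[(i, tok)], act))
                action[(i, tok)] = act
    return action, conflicts

def parse(tokens, action, trans):
    """Run the LR driver; the GOTO part of the table is trans on nonterminals."""
    stack, pos, steps, tokens = [0], 0, [], tokens + ["$"]
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if act[0] == "shift":
            stack.append(act[1])
            steps.append(f"shift {tokens[pos]}")
            pos += 1
        elif act[0] == "reduce":
            lhs, rhs = PRODS[act[1]]
            del stack[len(stack) - len(rhs):]
            stack.append(trans[(stack[-1], lhs)])
            steps.append(f"reduce p{act[1]}")
        else:
            steps.append("accept")
            return steps
```

Calling build_table(states, trans) without FOLLOW sets gives the LR(0) table and reports exactly one conflict, shift versus reduce on '(' in the state holding E → id • and E → id • ( E ), which is why the grammar is not LR(0). With FOLLOW(E) = {$, ), +}, build_table(states, trans, follow_sets()) reports none, and parse("id ( id + id )".split(), action, trans) yields: shift id, shift (, shift id, reduce p1, shift +, shift id, reduce p3, shift ), reduce p2, accept.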