I've been playing around with a lot of grammars that are not LL(1) recently, and many of them can be transformed into grammars that are LL(1).
However, I have never seen an example of an unambiguous language that is not LL(1), that is, a language for which every unambiguous grammar fails to be LL(1). Nor do I have any idea how I would go about proving that I had found one if I accidentally stumbled across one.
Does anyone know how to prove that a particular unambiguous language is not LL(1)?
I was thinking about the question a while and then found this language at Wikipedia:
S -> A | B
A -> 'a' A 'b' | ε
B -> 'a' B 'b' 'b' | ε
They claim the language described by the grammar above cannot be described by any LL(k) grammar. You asked about LL(1) only, and that case is pretty straightforward: seeing only the first symbol 'a', you cannot tell whether the input is of the form a^n b^n (rule A) or a^n b^2n (rule B), and therefore you cannot choose the right production. So the language is definitely not LL(1).
Also, apart from the empty string (which both A and B can derive), every sequence generated by this grammar has exactly one derivation tree, so the language itself is unambiguous.
The second part of your question is a little harder. It is much easier to prove that a language is LL(1) than to prove the opposite (that no LL(1) grammar describes the language). I think you would first create a grammar describing the language and then try to make it LL(1). After discovering a conflict which cannot be resolved, you somehow have to take advantage of it and turn it into a proof.
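To see the conflict concretely, here is a minimal sketch that encodes the grammar above and computes a simplified version of the FIRST sets of A and B; the grammar encoding and helper function are my own, purely for illustration.

# Sketch: show that the two alternatives of S share the terminal 'a' in
# their FIRST sets, so one token of lookahead cannot pick between them.
# Grammar: S -> A | B ; A -> 'a' A 'b' | eps ; B -> 'a' B 'b' 'b' | eps
grammar = {
    "S": [["A"], ["B"]],
    "A": [["a", "A", "b"], []],
    "B": [["a", "B", "b", "b"], []],
}

def first(symbol, seen=frozenset()):
    """Simplified FIRST set: only looks at the first symbol of each
    production, which is enough for this particular grammar."""
    if symbol not in grammar:          # terminal: FIRST is the symbol itself
        return {symbol}
    if symbol in seen:                 # guard against runaway recursion
        return set()
    result = set()
    for production in grammar[symbol]:
        if not production:             # empty production contributes epsilon
            result.add("")
        else:
            result |= first(production[0], seen | {symbol})
    return result

first_A = first("A")
first_B = first("B")
print("FIRST(A) =", first_A)                         # contains 'a' and epsilon
print("FIRST(B) =", first_B)                         # contains 'a' and epsilon
print("conflict on:", (first_A & first_B) - {""})    # {'a'} -> not LL(1)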
Related
I recently came across the concepts of LR, LL, and so on. Which category does Lua fall into? Are there, or can there be, implementations that differ from the official code in this respect?
LR, LL and so on are parsing algorithms which attempt to build a parser for a given grammar. This is not possible for every grammar, and you can categorize grammars on the basis of that possibility. But you have to be aware of the difference between a language and a grammar.
It might be possible to create an LR(k) parser for a given grammar, for some specific value of k. If so, the grammar is LR(k). Note that an LR(k) grammar is also an LR(k+1) grammar, and that an LL(k) grammar is also LR(k). So these are not categories in the sense that every grammar is in exactly one category.
Any language can be recognised by many different grammars (in fact, an unlimited number). These grammars can be arbitrarily complex. You can always write a grammar for a given language which is not even context-free. We say that a language is <X> if there exists a grammar for that language which is <X>. But the fact that a specific grammar for that language is not <X> says nothing about the language itself.
One interesting theorem demonstrates that if there is an LR(k) grammar for any language, then it is possible to derive an LR(1) grammar for that language. So while the k parameter is useful for describing grammars, languages can only be LR(0) or LR(1). This is not true of LL(k) languages, though.
Lua as a language is basically LR(1) and LL(2). The grammar is part of the reference manual, except that the published grammar doesn't specify operator precedences or a few rules having to do with newlines. The actual parser is a hand-written recursive-descent parser (at least, the last time I looked) with a couple of small tweaks to handle operator precedence and the minor deviations from LL(1). However, LALR(1) parsers for Lua exist as well.
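For illustration only, here is a rough Python sketch of the precedence-climbing idea that such a hand-written recursive-descent parser can use for binary operators. It is not the Lua parser (which is written in C); the tokens and the precedence table are invented for the example.

# Rough sketch of precedence climbing in a recursive-descent parser.
# One loop replaces a whole cascade of per-precedence grammar rules.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expression(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] and return (ast, next_pos)."""
    left, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in PRECEDENCE \
            and PRECEDENCE[tokens[pos]] >= min_prec:
        op = tokens[pos]
        # recurse with a higher minimum precedence -> left associativity
        right, pos = parse_expression(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos

def parse_primary(tokens, pos):
    """A primary is just a number or a parenthesised expression here."""
    if tokens[pos] == "(":
        inner, pos = parse_expression(tokens, pos + 1)
        assert tokens[pos] == ")", "expected closing parenthesis"
        return inner, pos + 1
    return ("num", tokens[pos]), pos + 1

ast, _ = parse_expression(["1", "+", "2", "*", "3"])
print(ast)   # ('+', ('num', '1'), ('*', ('num', '2'), ('num', '3')))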
I can't seem to understand the difference between LALR(1) and LR(1) except that LALR(1) seems to have fewer states than LR(1) does.
I wonder if anyone has the example to show the difference and some explanation.
Thank you
There's an example in the Dragon book (Example 4.44; 4.58 if you have the second edition):
S' → S
S → aAd | bBd | aBe | bAe
A → c
B → c
Since the grammar only generates four strings, it's easy enough to create the LR(1) item sets. When you do that, you'll see that there are two sets whose items have the same cores but different lookaheads, corresponding to the prefixes ac and bc. There are no conflicts, so the grammar is LR(1).
The LALR algorithm combines states whose item sets have the same core, effectively merging their lookaheads. Here, that merge creates a reduce/reduce conflict, so the grammar is not LALR(1).
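To make that concrete, here is a tiny sketch of just the merge step for the two problematic states; the item notation and state names are mine, and this is not a full LALR construction.

# The two LR(1) states reached after 'ac' and 'bc' have the same LR(0)
# core but different lookaheads.  Items are written (production, lookahead).
state_after_ac = {("A -> c .", "d"), ("B -> c .", "e")}
state_after_bc = {("A -> c .", "e"), ("B -> c .", "d")}

def core(state):
    """The LR(0) core: the items with lookaheads stripped off."""
    return {item for item, _ in state}

# Same core, so LALR merges the two states into one.
assert core(state_after_ac) == core(state_after_bc)
merged = state_after_ac | state_after_bc

# After merging, both completed items A -> c. and B -> c. call for a
# reduction on lookahead 'd' (and likewise on 'e'): reduce/reduce conflict.
for lookahead in ("d", "e"):
    reducible = sorted(item for item, la in merged if la == lookahead)
    print(lookahead, "->", reducible)
# d -> ['A -> c .', 'B -> c .']
# e -> ['A -> c .', 'B -> c .']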
First off, I have checked similar questions and none has quite the information I seek.
I want to know if, given a context-free grammar, it is possible to:
Know whether or not there exists an equivalent LL(1) grammar, equivalent in the sense that it generates the same language.
Convert the context-free grammar to an equivalent LL(1) grammar, given that one exists. The conversion should succeed whenever an equivalent LL(1) grammar exists; it is OK if it does not terminate when no equivalent exists.
If the answer to those questions is yes, are such algorithms or tools available somewhere? My own research has been fruitless.
Another answer mentions that the Dragon Book has algorithms to eliminate left recursion and to left-factor a context-free grammar. I have access to the book and checked it, but it is unclear to me whether the resulting grammar is guaranteed to be LL(1). The restrictions imposed on the context-free grammar (no null productions and no cycles) are acceptable to me.
From the university compiler courses I took, I remember that every LL(1) grammar is a context-free grammar, but the class of context-free grammars is much larger than LL(1). There is no general algorithm that takes an arbitrary context-free grammar (even one whose language is LL(1)) and checks for or performs the conversion; the problem is undecidable in general, not merely computationally hard.
Applying the bag of tricks, like removing left recursion, removing FIRST/FOLLOW conflicts, left factoring, and so on, is similar to the mathematical transformations you use when you want to integrate a function: you need experience, and it is sometimes very close to an art. The transformations are also often inverses of each other.
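As an example of one of those tricks, here is a small sketch of eliminating immediate left recursion; the grammar encoding and function name are mine, not taken from any particular tool.

# A -> A alpha | beta   becomes   A -> beta A' ;  A' -> alpha A' | eps
def eliminate_immediate_left_recursion(nonterminal, productions):
    """productions: list of right-hand sides (each a list of symbols)."""
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    others    = [rhs      for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}        # nothing to do
    fresh = nonterminal + "'"
    return {
        nonterminal: [rhs + [fresh] for rhs in others],
        fresh:       [rhs + [fresh] for rhs in recursive] + [[]],   # [] is epsilon
    }

# E -> E + T | T   becomes   E -> T E' ;  E' -> + T E' | eps
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}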
One reason why LR-type grammars are now used a lot for generated parsers is that they cover a much wider spectrum of context-free grammars than LL(1) does.
By the way, the C grammar can be expressed as LL(1), for example, but C# cannot (the lambda expression x => x + 1 comes to mind, where on seeing x you cannot decide whether it is a lambda parameter or a reference to a known variable).
I have a grammar and I can check whether or not it is LL(1). However, is there any way to check if the language generated by the grammar is LL(1)? And what exactly is the difference between LL(1) grammars and LL(1) languages?
Any grammar that is LL(1) defines an LL(1) language. By definition, a language is LL(1) if there is some grammar that generates it that is LL(1), so the fact that you have an LL(1) grammar for the language automatically means that the language is LL(1).
To elaborate, a language is a set of strings and a grammar for that language is a means of describing that language. Some languages have LL(1) grammars while others do not. However, the fact that a grammar is not LL(1) does not mean that the language it describes is not. For example, consider this grammar:
A -> ab | ac
This grammar is not LL(1) because it contains a FIRST/FIRST conflict: on seeing the terminal a, you cannot predict which production for A to use. However, it describes an LL(1) language, since the language is also described by the grammar
A -> aX
X -> b | c
So the language generated by these grammars (which just contains ab and ac) is indeed LL(1).
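To illustrate, here is a hand-built predictive table and a tiny table-driven parser for the factored grammar; it is a textbook-style sketch of my own, not any particular tool's output.

# LL(1) table for A -> aX ; X -> b | c.  Every (nonterminal, lookahead)
# cell holds exactly one production, so the parser never has to guess.
table = {
    ("A", "a"): ["a", "X"],
    ("X", "b"): ["b"],
    ("X", "c"): ["c"],
}

def parse(tokens, start="A"):
    stack, pos = [start], 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos] if pos < len(tokens) else None
        if (top, lookahead) in table:
            stack.extend(reversed(table[(top, lookahead)]))   # expand nonterminal
        elif pos < len(tokens) and top == lookahead:
            pos += 1                                          # match terminal
        else:
            return False
    return pos == len(tokens)

print(parse(list("ab")), parse(list("ac")), parse(list("aa")))  # True True False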
Determining whether the language described by an arbitrary grammar is LL(1) is much harder, and to the best of my knowledge the only way to do it would be either to explicitly exhibit an LL(1) grammar for the language generated by the initial grammar (which is tricky) or to prove mathematically that no such grammar exists.
Hope this helps!
Is there a simple way to determine whether a grammar is LL(1), LR(0), SLR(1)... just from looking at the grammar, without doing any complex analysis?
For instance: to decide whether a BNF grammar is LL(1) you have to calculate FIRST and FOLLOW sets, which can be time-consuming in some cases.
Has anybody got an idea how to do this faster?
Any help would really be appreciated!
First off, a bit of pedantry. You cannot determine whether a language is LL(1) by inspecting a grammar for it; you can only make statements about the grammar itself. It is perfectly possible to write non-LL(1) grammars for languages for which an LL(1) grammar exists.
With that out of the way:
You could write a parser for the grammar notation and have a program calculate FIRST and FOLLOW sets and other properties for you. After all, that's the big advantage of BNF grammars: they are machine-comprehensible. (A rough sketch of such a computation follows after this list.)
Inspect the grammar and look for violations of the constraints of the various grammar types. For instance: LL(1) allows right recursion but not left recursion; thus, a grammar that contains left recursion is not LL(1). (For other grammar properties you're going to have to spend some quality time with the definitions, because I can't remember anything else off the top of my head right now :).
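Here is a rough sketch of the standard fixed-point computation of FIRST and FOLLOW from a grammar given in a simple Python encoding; the encoding and names are mine.

# Grammar format: {nonterminal: [list of right-hand sides]}.  EPS = epsilon.
EPS = ""

def first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # end-of-input marker

    def first_of(seq):
        """FIRST of a sequence of symbols, using the current 'first' sets."""
        out = set()
        for sym in seq:
            syms = first[sym] if sym in grammar else {sym}
            out |= syms - {EPS}
            if EPS not in syms:
                return out
        out.add(EPS)                             # every symbol was nullable
        return out

    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                # grow FIRST(nt)
                before = len(first[nt])
                first[nt] |= first_of(rhs)
                changed |= len(first[nt]) != before
                # grow FOLLOW for each nonterminal on the right-hand side
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    tail = first_of(rhs[i + 1:])
                    before = len(follow[sym])
                    follow[sym] |= tail - {EPS}
                    if EPS in tail:
                        follow[sym] |= follow[nt]
                    changed |= len(follow[sym]) != before
    return first, follow

# A -> aX ; X -> b | c   (the factored grammar from an earlier answer)
g = {"A": [["a", "X"]], "X": [["b"], ["c"]]}
fi, fo = first_follow(g, "A")
print(fi)   # FIRST:  A -> {'a'},  X -> {'b', 'c'}   (set print order may vary)
print(fo)   # FOLLOW: A -> {'$'},  X -> {'$'}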
In answer to your main question: For a very simple grammar, it may be possible to determine whether it is LL(1) without constructing FIRST and FOLLOW sets, e.g.
A → A + A | a
is not LL(1), while
A → a | b
is.
But when you get more complex than that, you'll need to do some analysis.
A → B | a
B → A + A
This is not LL(1), but that may not be immediately obvious.
The grammar rules for arithmetic quickly get very complex:
expr → term { '+' term }
term → factor { '*' factor }
factor → number | '(' expr ')'
This grammar handles only multiplication and addition, and already it's not immediately clear whether the grammar is LL(1). It's still possible to evaluate it by looking through the grammar, but as the grammar grows this becomes less feasible. If we're defining a grammar for an entire programming language, it's almost certainly going to take some complex analysis.
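As an illustration, the { } repetitions in the EBNF above translate directly into loops in a hand-written recursive-descent parser. The sketch below is my own, with a deliberately trivial tokeniser, and it evaluates the expression as it parses.

import re

def tokenize(text):
    return re.findall(r"\d+|[+*()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        assert self.peek() == tok, f"expected {tok!r}, got {self.peek()!r}"
        self.pos += 1

    def expr(self):                        # expr -> term { '+' term }
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):                        # term -> factor { '*' factor }
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):                      # factor -> number | '(' expr ')'
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        value = int(self.peek())
        self.pos += 1
        return value

print(Parser(tokenize("2*(3+4)")).expr())  # 14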
That said, there are a few obvious telltale signs that the grammar is not LL(1) — like the A → A + A above — and if you can find any of these in your grammar, you'll know it needs to be rewritten if you're writing a recursive descent parser. But there's no shortcut to verify that the grammar is LL(1).
One aspect, "is the language/grammar ambiguous", is a known undecidable question like the Post correspondence and halting problems.
Straight from the book "Compilers: Principles, Techniques, & Tools" by Aho et al.
Page 223:
A grammar G is LL(1) if and only if whenever A -> alpha | beta are two distinct productions of G, the following conditions hold:
For no terminal "a" do both alpha and beta derive strings beginning with "a"
At most one of alpha and beta can derive the empty string
If beta derives the empty string (in zero or more steps), then alpha does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if alpha derives the empty string, then beta does not derive any string beginning with a terminal in FOLLOW(A)
Essentially, this is a matter of verifying that the grammar passes the pairwise disjointness test and also does not involve left recursion. Or, more succinctly: a grammar G that is left-recursive or ambiguous cannot be LL(1).
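Those conditions can be checked mechanically once FIRST and FOLLOW are available (for example from a computation like the one sketched in an earlier answer); the following is a rough sketch of my own, not a quote from the book.

# Check the three conditions above for every pair of alternatives.
# first_of_seq maps a right-hand side to its FIRST set; follow maps a
# nonterminal to its FOLLOW set.  EPS marks epsilon.
from itertools import combinations

EPS = ""

def is_ll1(grammar, first_of_seq, follow):
    for nt, productions in grammar.items():
        for alpha, beta in combinations(productions, 2):
            fa, fb = first_of_seq(alpha), first_of_seq(beta)
            # 1. no terminal may begin strings derived from both alternatives
            if (fa & fb) - {EPS}:
                return False
            # 2. at most one alternative may derive the empty string
            if EPS in fa and EPS in fb:
                return False
            # 3. if one alternative is nullable, the other must not start
            #    with anything in FOLLOW(nt)
            if EPS in fb and fa & follow[nt]:
                return False
            if EPS in fa and fb & follow[nt]:
                return False
    return True

# Example with the factored grammar A -> aX ; X -> b | c and hand-written sets:
g      = {"A": [["a", "X"]], "X": [["b"], ["c"]]}
first  = {("a", "X"): {"a"}, ("b",): {"b"}, ("c",): {"c"}}
follow = {"A": {"$"}, "X": {"$"}}
print(is_ll1(g, lambda seq: first[tuple(seq)], follow))   # True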
Check whether the grammar is ambiguous or not. If it is, then the grammar is not LL(1) because no ambiguous grammar is LL(1).
Yes, there are shortcuts for checking whether a grammar is LL(1):
1) If A -> B1 | B2 | ... | Bn, then FIRST(B1), FIRST(B2), ..., FIRST(Bn) must be pairwise disjoint.
2) If A -> B1 | ε, then FIRST(B1) ∩ FOLLOW(A) must be empty.
3) If G is a grammar in which every nonterminal has only one production, then the grammar is LL(1).
p0 S' → E
p1 E → id
p2 E → id ( E )
p3 E → E + id
Construct the LR(0) DFA, the FOLLOW set for E and the SLR action/goto tables.
Is this an LR(0) grammar? Prove your answer.
Using the SLR tables, show the steps (shifts, reductions, accept) of an LR parser parsing: id ( id + id )