Is it possible to create an LL parser using a BNF grammar?

I searched Google to find out whether it was possible to create an LL parser using a BNF grammar, but I saw on Wikipedia that they use something like
S → F
S → ( S + F )
F → a
which is not a BNF grammar. Is it possible to use a BNF grammar to create an LL parser, or do you have to use only an LL grammar?
Thanks

Yes, you can. BNF is just a notation for representing context-free grammars. Roughly, ::= is the arrow, a <symbol> that appears on the left is a non-terminal, and "literals" are terminal symbols.
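To make this concrete, here is the Wikipedia grammar from the question written in BNF, plus a minimal Python sketch that reads it into a rule table. The helper `read_bnf` and the one-rule-per-line input format are assumptions for illustration; it handles only <non-terminals>, "quoted literals" and |, not full BNF.

```python
import re

# The grammar from the question, written in BNF notation.
BNF_TEXT = '''
<S> ::= <F> | "(" <S> "+" <F> ")"
<F> ::= "a"
'''

# A token is a <non-terminal>, a "quoted literal", or the | separator.
TOKEN = re.compile(r'<[^>]+>|"[^"]*"|\|')


def read_bnf(text):
    """Hypothetical helper: map each non-terminal to a list of alternatives.

    Each alternative is a list of ('N', name) for non-terminals and
    ('T', literal) for terminals. Assumes one rule per line.
    """
    grammar = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split('::=', 1)
        alternatives = [[]]
        for tok in TOKEN.findall(rhs):
            if tok == '|':
                alternatives.append([])  # start the next alternative
            elif tok.startswith('<'):
                alternatives[-1].append(('N', tok[1:-1]))
            else:
                alternatives[-1].append(('T', tok[1:-1]))
        grammar[lhs.strip()[1:-1]] = alternatives
    return grammar


grammar = read_bnf(BNF_TEXT)
```

Here `grammar['S']` holds the two alternatives S → F and S → ( S + F ), i.e. the same rule set as on Wikipedia in a different notation.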
Take into account, though, that not all context-free grammars are LL(1); there are grammars that have no LL parser.
If you use a parser generator, you'll quickly find out whether your grammar is LL or not. But if it isn't, it may be difficult to correct it (especially if you have no background in language theory).
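The Wikipedia grammar happens to be LL(1): one token of lookahead decides which S rule applies. A minimal hand-written recursive-descent recognizer for it might look like this (a sketch; single-character tokens and the `$` end marker are assumptions of the example).

```python
def recognize(text):
    """Recognizer for  S -> F | ( S + F )  and  F -> a.

    Tokens are single characters (spaces ignored) and '$' marks the end of
    input; both are assumptions of this sketch.
    """
    tokens = list(text.replace(' ', '')) + ['$']
    pos = 0

    def expect(tok):
        nonlocal pos
        if tokens[pos] != tok:
            raise SyntaxError(
                f'expected {tok!r} at position {pos}, got {tokens[pos]!r}')
        pos += 1

    def parse_S():
        # One token of lookahead picks the rule: '(' means S -> ( S + F ).
        if tokens[pos] == '(':
            expect('(')
            parse_S()
            expect('+')
            parse_F()
            expect(')')
        else:
            parse_F()

    def parse_F():
        expect('a')

    parse_S()
    expect('$')
    return True
```

For example, `recognize('((a + a) + a)')` succeeds, while `recognize('(a)')` raises `SyntaxError` because the rule S → ( S + F ) requires a `+`.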

Related

What type of grammar is used to parse Lua?

I recently came across the concepts of LR, LL, etc. Which category does Lua belong to? Are there, or can there be, implementations that differ from the official code in this respect?
LR, LL and so on are algorithms which attempt to find a parser for a given grammar. This is not possible for every grammar, and you can categorize on the basis of that possibility. But you have to be aware of the difference between a language and a grammar.
It might be possible to create an LR(k) parser for a given grammar, for some specific value of k. If so, the grammar is LR(k). Note that an LR(k) grammar is also an LR(k+1) grammar, and that an LL(k) grammar is also LR(k). So these are not categories in the sense that every grammar is in exactly one category.
Any language can be described by many different grammars (in fact, infinitely many). These grammars can be arbitrarily complex. You can always write a grammar for a given language which is not even context-free. We say that a language is <X> if there exists a grammar for that language which is <X>. But the fact that a specific grammar for that language is not <X> says nothing.
One interesting theorem shows that if a language has an LR(k) grammar, then it is possible to derive an LR(1) grammar for that language. So while the k parameter is useful for describing grammars, languages can only be LR(0) or LR(1). This is not true of LL(k) languages, though.
Lua as a language is basically LR(1) and LL(2). The grammar is part of the reference manual, except that the published grammar doesn't specify operator precedences or a few rules having to do with newlines. The actual parser is a hand-written recursive-descent parser (at least, the last time I looked) with a couple of small deviations in order to handle operator precedence and the minor deviations from LL(1). However, there exist LALR(1) parsers for Lua as well.
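One common way for a recursive-descent parser to handle operator precedence, without a separate grammar rule per precedence level, is precedence climbing. The sketch below is a generic Python illustration, not Lua's actual code; the priority table is an assumption, using a (left, right) pair per operator where a smaller right priority makes the operator right-associative.

```python
import operator
import re

# Assumed (left, right) priorities. A right priority lower than the left one
# makes the operator right-associative, as '^' is here.
PRIORITY = {'+': (10, 10), '-': (10, 10),
            '*': (11, 11), '/': (11, 11),
            '^': (14, 13)}
APPLY = {'+': operator.add, '-': operator.sub, '*': operator.mul,
         '/': operator.truediv, '^': operator.pow}


def parse_primary(tokens, pos):
    """primary -> number | '(' expr ')'; returns (value, next position)."""
    tok = tokens[pos]
    if tok == '(':
        value, pos = parse_subexpr(tokens, pos + 1, 0)
        if tokens[pos] != ')':
            raise SyntaxError("expected ')'")
        return value, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f'unexpected token {tok!r}')


def parse_subexpr(tokens, pos, limit):
    """Parse a primary followed by binary operators whose left priority
    exceeds `limit`; returns (value, next position)."""
    value, pos = parse_primary(tokens, pos)
    op = tokens[pos]
    while op in PRIORITY and PRIORITY[op][0] > limit:
        rhs, pos = parse_subexpr(tokens, pos + 1, PRIORITY[op][1])
        value = APPLY[op](value, rhs)
        op = tokens[pos]
    return value, pos


def evaluate(src):
    tokens = re.findall(r'\d+|[-+*/^()]', src) + ['$']
    value, pos = parse_subexpr(tokens, 0, 0)
    if tokens[pos] != '$':
        raise SyntaxError(f'unexpected token {tokens[pos]!r}')
    return value
```

With this table, `1-2-3` groups to the left and `2^3^2` groups to the right, and the loop only ever looks at the next token.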

Is there a well-defined and well-reasoned transformation from LL(*) to PEG?

I am in the process of investigating PEG (Parsing Expression Grammar) parsers, and one of the topics I'm looking into is equivalence with other parsing techniques.
I found a good paper about transforming regexes into equivalent PEGs: "From Regular Expressions to Parsing Expression Grammars".
I am hoping to find a similar treatment for LL(*) parsers but have so far come up empty-handed. It seems to me that a lot of the techniques described in that paper are also going to be applicable to the problem of LL(*) transformation; however, I'm not sufficiently steeped in the formalisms to be confident of my own analysis.
Your collective help would be much appreciated!
The Wikipedia article about PEG says it all, I think. PEG does recursive descent, using the order of the alternatives for disambiguation. In theory, the family of languages that can be parsed with recursive descent is the LL family, but because PEG has unlimited lookahead and no ambiguity, the family should be a larger one; whether it includes every context-free language is, as far as I know, still an open question.
Every LL(k) grammar can be implemented by a recursive-descent parser with k tokens of lookahead; therefore, every LL(k) grammar can be transformed into a PEG grammar by ordering the rules so that those requiring the longest lookahead are listed first.
This grammar is LL(k) as long as every expr is shorter than k tokens (otherwise the two rules share arbitrarily long prefixes):
params = expr
params = expr ',' params
To make it a PEG grammar for the same language, the rules must be reordered:
params = expr ',' params
params = expr
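To see why the order matters, here is a tiny PEG-style matcher built from parser combinators in Python. Everything in it is a hypothetical illustration: `expr` is reduced to a single lowercase letter, and `choice` implements PEG's ordered choice, which commits to the first alternative that succeeds.

```python
def expr(s, i):
    """Stand-in for expr: a single lowercase letter (an assumption)."""
    return i + 1 if i < len(s) and s[i].islower() else None


def literal(ch):
    def match(s, i):
        return i + 1 if i < len(s) and s[i] == ch else None
    return match


def seq(*parsers):
    """Match each parser in turn; fail if any of them fails."""
    def match(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return match


def choice(*parsers):
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def match(s, i):
        for p in parsers:
            j = p(s, i)
            if j is not None:
                return j
        return None
    return match


def params_original(s, i):
    # params <- expr / expr ',' params   (order of the LL(k) grammar)
    return choice(expr, seq(expr, literal(','), params_original))(s, i)


def params_reordered(s, i):
    # params <- expr ',' params / expr   (longer alternative first)
    return choice(seq(expr, literal(','), params_reordered), expr)(s, i)


def matches_all(rule, s):
    return rule(s, 0) == len(s)
```

With the original order, `params` matches just `a` in `a,b,c` and stops, so the input is not fully consumed; with the longer alternative first, all of `a,b,c` is matched.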

How to determine if a language is LL(1)?

I have a grammar and I can check whether or not it is LL(1). However, is there any way to check whether the language generated by the grammar is LL(1)? And what exactly is the difference between LL(1) grammars and LL(1) languages?
Any grammar that is LL(1) defines an LL(1) language. By definition, a language is LL(1) if there is some grammar that generates it that is LL(1), so the fact that you have an LL(1) grammar for the language automatically means that the language is LL(1).
To elaborate, a language is a set of strings and a grammar for that language is a means of describing that language. Some languages have LL(1) grammars while others do not. However, the fact that a grammar is not LL(1) does not mean that the language it describes is not. For example, consider this grammar:
A -> ab | ac
This grammar is not LL(1) because it contains a FIRST/FIRST conflict when trying to predict the production for A when seeing terminal a. However, it describes an LL(1) language, since the language is also described by the grammar
A -> aX
X -> b | c
So the language generated by these grammars (which just contains ab and ac) is indeed LL(1).
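The FIRST/FIRST conflict can be found mechanically. A minimal Python sketch, assuming ε-free grammars written as dictionaries (keys are non-terminals, every other symbol is a terminal):

```python
def first_sets(grammar):
    """FIRST sets for an epsilon-free grammar, computed to a fixed point.

    Symbols that are keys of `grammar` are non-terminals; all others are
    terminals. In an epsilon-free grammar only the first symbol matters.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_first_conflicts(grammar):
    """List (non-terminal, alt i, alt j, shared terminals) for each conflict."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alternatives in grammar.items():
        heads = [first[a[0]] if a[0] in grammar else {a[0]}
                 for a in alternatives]
        for i in range(len(heads)):
            for j in range(i + 1, len(heads)):
                shared = heads[i] & heads[j]
                if shared:
                    conflicts.append((nt, i, j, shared))
    return conflicts


original = {'A': [['a', 'b'], ['a', 'c']]}           # A -> ab | ac
factored = {'A': [['a', 'X']], 'X': [['b'], ['c']]}  # A -> aX, X -> b | c
```

`first_first_conflicts(original)` reports the shared terminal `a` between the two alternatives of A, while the left-factored grammar has no conflict.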
Determining whether the language described by an arbitrary grammar is LL(1) is much harder. To the best of my knowledge, the only way to do it would be either to explicitly exhibit an LL(1) grammar for the language generated by the initial grammar (which is tricky) or to prove mathematically that no such grammar exists.
Hope this helps!

Finding a language that is not LL(1)?

I've been playing around with a lot of grammars that are not LL(1) recently, and many of them can be transformed into grammars that are LL(1).
However, I have never seen an example of an unambiguous language that is not LL(1), in other words, a language for which no unambiguous grammar is LL(1). Nor do I have any idea how I would go about proving that I had found one if I accidentally stumbled across one.
Does anyone know how to prove that a particular unambiguous language is not LL(1)?
I was thinking about the question for a while and then found this language on Wikipedia:
S -> A | B
A -> 'a' A 'b' | ε
B -> 'a' B 'b' 'b' | ε
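They claim the language described by the grammar above cannot be described by any LL(k) grammar. You asked about LL(1) only, and for this grammar that part is pretty straightforward: seeing only the first symbol 'a', you don't know whether the input is 'ab' (from A) or 'abb' (from B), or a longer sentence of either kind, and therefore you cannot choose the right rule for S. Combined with the claim that no LL(k) grammar exists for it, the language is definitely not LL(1).
Also, apart from the empty string (which both A and B derive, an ambiguity that disappears if only one of them is allowed to produce ε), every sentence generated by this grammar has exactly one derivation tree. So the language is unambiguous.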
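A small Python check makes both points concrete, assuming that A derives exactly a^n b^n and B exactly a^n b^2n. The function `derivations` reports which of S's rules can produce a string. Note that this illustrates why this particular grammar would need unbounded lookahead; it does not by itself prove the claim about all LL(k) grammars.

```python
import re


def derivations(s):
    """Which of S -> A, S -> B can derive s.

    A derives exactly a^n b^n and B derives exactly a^n b^(2n); each is
    linear-recursive, so each yields at most one tree per string.
    """
    m = re.fullmatch(r'(a*)(b*)', s)
    if m is None:
        return []
    n_a, n_b = len(m.group(1)), len(m.group(2))
    rules = []
    if n_b == n_a:
        rules.append('A')
    if n_b == 2 * n_a:
        rules.append('B')
    return rules


def same_prefix_different_rule(k):
    """Two sentences whose first k symbols agree but need different S rules."""
    from_a = 'a' * k + 'b' * k
    from_b = 'a' * k + 'b' * (2 * k)
    return from_a, from_b
```

For example, `derivations('ab')` is `['A']`, `derivations('abb')` is `['B']`, `derivations('aab')` is `[]` and `derivations('')` is `['A', 'B']`. For any k, `same_prefix_different_rule(k)` returns two sentences that start with the same k symbols (all `a`) but require different rules for S.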
The second part of your question is a little harder. It is much easier to prove that a language is LL(1) than the opposite (that no LL(1) grammar describes the language). I think you would create a grammar describing the language and then try to make it LL(1). After discovering a conflict that cannot be resolved, you somehow have to take advantage of it to build a proof.

How to determine whether a language is LL(1), LR(0), SLR(1)

Is there a simple way to determine whether a grammar is LL(1), LR(0), SLR(1)... just by looking at the grammar, without doing any complex analysis?
For instance, to decide whether a BNF grammar is LL(1), you have to calculate FIRST and FOLLOW sets, which can be time-consuming in some cases.
Has anybody got an idea how to do this faster?
Any help would really be appreciated!
First off, a bit of pedantry. You cannot determine whether a language is LL(1) from inspecting a grammar for it, you can only make statements about the grammar itself. It is perfectly possible to write non-LL(1) grammars for languages for which an LL(1) grammar exists.
With that out of the way:
You could write a parser for the grammar and have a program calculate FIRST and FOLLOW sets and other properties for you. After all, that's the big advantage of BNF grammars: they are machine-readable.
Inspect the grammar and look for violations of the constraints of the various grammar classes. For instance, LL(1) allows right recursion but not left recursion; thus, a grammar that contains left recursion is not LL(1). (For other grammar properties you're going to have to spend some quality time with the definitions, because I can't remember anything else off the top of my head right now.)
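As an example of the first approach, here is a minimal Python sketch that computes FIRST and FOLLOW sets by iterating to a fixed point. The grammar representation (a dict from non-terminal to alternatives, with `[]` standing for an ε-alternative and `'$'` for end of input) is an assumption of the sketch; the example is the usual left-factored expression grammar.

```python
EPS = 'ε'


def first_of_seq(seq, first, grammar):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add('$')
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                # FIRST(nt) includes FIRST of every alternative.
                new = first_of_seq(alt, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
                # For nt -> ... B beta: FOLLOW(B) gets FIRST(beta) minus EPS,
                # plus FOLLOW(nt) when beta can vanish.
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue
                    rest = first_of_seq(alt[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


EXPR_GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}
first, follow = first_follow(EXPR_GRAMMAR, 'E')
```

For this grammar the sketch gives, for instance, FIRST(E) = { (, id } and FOLLOW(F) = { *, +, ), $ }.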
In answer to your main question: For a very simple grammar, it may be possible to determine whether it is LL(1) without constructing FIRST and FOLLOW sets, e.g.
A → A + A | a
is not LL(1), while
A → a | b
is.
But when you get more complex than that, you'll need to do some analysis.
A → B | a
B → A + A
This is not LL(1), but that may not be immediately obvious.
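Indirect left recursion like this can be detected mechanically by following, for each non-terminal, the non-terminals that can appear leftmost in its alternatives. A minimal Python sketch, assuming an ε-free grammar (with nullable symbols you would also have to look past them):

```python
def left_recursive(grammar):
    """Return the non-terminals A with A =>+ A ... (direct or indirect).

    Assumes an epsilon-free grammar, so only the first symbol of each
    alternative can start a derivation.
    """
    # Edge A -> B when some alternative of A begins with non-terminal B.
    edges = {nt: {alt[0] for alt in alts if alt and alt[0] in grammar}
             for nt, alts in grammar.items()}
    result = set()
    for start in grammar:
        seen, stack = set(), list(edges[start])
        while stack:
            nt = stack.pop()
            if nt == start:
                result.add(start)
                break
            if nt not in seen:
                seen.add(nt)
                stack.extend(edges[nt])
    return result


direct = {'A': [['A', '+', 'A'], ['a']]}                    # A -> A + A | a
indirect = {'A': [['B'], ['a']], 'B': [['A', '+', 'A']]}    # A -> B | a, B -> A + A
simple = {'A': [['a'], ['b']]}                              # A -> a | b
```

`left_recursive(indirect)` reports both A and B, even though neither rule mentions its own left-hand side first.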
The grammar rules for arithmetic quickly get very complex:
expr → term { '+' term }
term → factor { '*' factor }
factor → number | '(' expr ')'
This grammar handles only multiplication and addition, and already it's not immediately clear whether the grammar is LL(1). It's still possible to evaluate it by looking through the grammar, but as the grammar grows it becomes less feasible. If we're defining a grammar for an entire programming language, it's almost certainly going to take some complex analysis.
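For what it's worth, this particular grammar maps directly onto a recursive-descent parser in which each { ... } repetition becomes a loop that needs one token of lookahead. A minimal Python evaluator, assuming non-negative integer literals as the only kind of number:

```python
import re


def evaluate(src):
    """Recursive-descent evaluator for
         expr   -> term { '+' term }
         term   -> factor { '*' factor }
         factor -> number | '(' expr ')'
    The tokenizer keeps only digit runs and + * ( ); other characters
    (including spaces) are skipped.
    """
    tokens = re.findall(r'\d+|[+*()]', src) + ['$']
    pos = 0

    def advance(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f'expected {expected!r}, got {tok!r}')
        pos += 1
        return tok

    def expr():
        value = term()
        while tokens[pos] == '+':   # { '+' term }
            advance()
            value += term()
        return value

    def term():
        value = factor()
        while tokens[pos] == '*':   # { '*' factor }
            advance()
            value *= factor()
        return value

    def factor():
        if tokens[pos] == '(':
            advance()
            value = expr()
            advance(')')
            return value
        tok = advance()
        if not tok.isdigit():
            raise SyntaxError(f'unexpected token {tok!r}')
        return int(tok)

    value = expr()
    advance('$')
    return value
```

Each decision point looks only at `tokens[pos]`, which is an informal sign that this grammar behaves as LL(1).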
That said, there are a few obvious telltale signs that the grammar is not LL(1) — like the A → A + A above — and if you can find any of these in your grammar, you'll know it needs to be rewritten if you're writing a recursive descent parser. But there's no shortcut to verify that the grammar is LL(1).
One aspect, "is the language/grammar ambiguous?", is known to be undecidable, like the Post correspondence problem and the halting problem.
Straight from the book "Compilers: Principles, Techniques, & Tools" by Aho et al.
Page 223:
A grammar G is LL(1) if and only if whenever A -> alpha | beta are two distinct productions of G, the following conditions hold:
For no terminal "a" do both alpha and beta derive strings beginning with "a"
At most one of alpha and beta can derive the empty string
If beta derives the empty string in zero or more steps (beta =>* ε), then alpha does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if alpha =>* ε, then beta does not derive any string beginning with a terminal in FOLLOW(A).
Essentially, this amounts to verifying that the grammar passes the pairwise disjointness test and does not involve left recursion. A consequence is that a grammar G that is left-recursive or ambiguous cannot be LL(1).
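The three conditions can be checked pairwise for each non-terminal. The Python sketch below takes FIRST sets (without ε), nullability and FOLLOW(A) as hand-supplied inputs, an assumption that keeps it short; computing those sets is a separate fixed-point step.

```python
from itertools import combinations


def ll1_violations(alternatives, follow_a):
    """Check the pairwise LL(1) conditions for one non-terminal A.

    alternatives: {label: (FIRST set without epsilon, derives-empty flag)}
                  for each production A -> alpha
    follow_a:     the set FOLLOW(A)
    """
    problems = []
    for (x, (first_x, null_x)), (y, (first_y, null_y)) in combinations(
            alternatives.items(), 2):
        if first_x & first_y:                      # condition 1
            problems.append(f'FIRST/FIRST: {x} and {y}')
        if null_x and null_y:                      # condition 2
            problems.append(f'both {x} and {y} derive the empty string')
        if null_y and first_x & follow_a:          # condition 3
            problems.append(f'FIRST/FOLLOW: {y} is nullable, {x} meets FOLLOW(A)')
        if null_x and first_y & follow_a:          # condition 3, swapped
            problems.append(f'FIRST/FOLLOW: {x} is nullable, {y} meets FOLLOW(A)')
    return problems
```

For A -> ab | ac with FOLLOW(A) = {$}, the input `{'ab': ({'a'}, False), 'ac': ({'a'}, False)}` yields one FIRST/FIRST violation, while E' -> + T E' | ε with FOLLOW(E') = {$, )} passes all three conditions.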
Check whether the grammar is ambiguous or not. If it is, then the grammar is not LL(1) because no ambiguous grammar is LL(1).
Yes, there are some shortcuts for LL(1) grammars:
1) If A → B1 | B2 | ... | Bn, then FIRST(Bi) ∩ FIRST(Bj) must be the empty set for every pair i ≠ j (pairwise, not just the intersection of all n sets at once).
2) If A → B1 | epsilon, then FIRST(B1) ∩ FOLLOW(A) must be the empty set.
3) If G is a grammar in which every non-terminal has only one production, then G is LL(1).
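The difference between checking FIRST sets pairwise and intersecting them all at once matters. With three hypothetical FIRST sets, the overall intersection is empty even though every pair overlaps:

```python
from functools import reduce
from itertools import combinations

# Hypothetical FIRST(B1), FIRST(B2), FIRST(B3) for A -> B1 | B2 | B3.
firsts = [{'a', 'b'}, {'b', 'c'}, {'c', 'a'}]

# Intersecting all sets at once: empty, which would wrongly suggest no conflict.
all_at_once = reduce(set.intersection, firsts)

# Pairwise check: every pair of alternatives must have disjoint FIRST sets.
pairwise_ok = all(not (x & y) for x, y in combinations(firsts, 2))
```

Here `all_at_once` is empty while `pairwise_ok` is `False`: on input `b` a parser could not choose between B1 and B2.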
p0 S' → E
p1 E → id
p2 E → id ( E )
p3 E → E + id
Construct the LR(0) DFA, the FOLLOW set for E and the SLR action/goto tables.
Is this an LR(0) grammar? Prove your answer.
Using the SLR tables, show the steps (shifts, reductions, accept) of an LR parser parsing: id ( id + id )
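One possible worked sketch of the exercise, in Python; the state numbering is my own and the tables were built by hand, so treat them as something to check rather than an official solution. The LR(0) states, by their kernel items, are: 0 start; 1 S' → E ·, E → E · + id; 2 E → id ·, E → id · ( E ); 3 E → E + · id; 4 E → E + id ·; 5 E → id ( · E ); 6 E → id ( E · ), E → E · + id; 7 E → id ( E ) ·. State 2 holds both a completed item and a shift on '(', a shift/reduce conflict, so the grammar is not LR(0). FOLLOW(E) = { $, +, ) }, and SLR resolves the conflict by reducing E → id only on those symbols.

```python
# Production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {1: ('E', 1),   # p1 E -> id
               2: ('E', 4),   # p2 E -> id ( E )
               3: ('E', 3)}   # p3 E -> E + id

# Hand-built SLR(1) ACTION table; reductions are placed on FOLLOW(E) = {$, +, )}.
ACTION = {
    (0, 'id'): ('s', 2),
    (1, '+'): ('s', 3), (1, '$'): ('acc', None),
    (2, '('): ('s', 5),
    (2, '$'): ('r', 1), (2, '+'): ('r', 1), (2, ')'): ('r', 1),
    (3, 'id'): ('s', 4),
    (4, '$'): ('r', 3), (4, '+'): ('r', 3), (4, ')'): ('r', 3),
    (5, 'id'): ('s', 2),
    (6, ')'): ('s', 7), (6, '+'): ('s', 3),
    (7, '$'): ('r', 2), (7, '+'): ('r', 2), (7, ')'): ('r', 2),
}
GOTO = {(0, 'E'): 1, (5, 'E'): 6}


def slr_parse(tokens):
    """Run the table-driven LR loop and return the actions taken."""
    stack, trace = [0], []
    tokens = tokens + ['$']
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f'unexpected {tokens[pos]!r} in state {stack[-1]}')
        kind, arg = act
        if kind == 's':                    # shift: push state, consume token
            stack.append(arg)
            pos += 1
            trace.append(f's{arg}')
        elif kind == 'r':                  # reduce: pop |rhs| states, then GOTO
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            trace.append(f'r{arg}')
        else:
            trace.append('acc')
            return trace
```

Running it on id ( id + id ) gives s2, s5, s2, r1, s3, s4, r3, s7, r2, acc, where sN means shift to state N and rN means reduce by production pN.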
