SLR(1) and LALR(1) and reduce entries in parsing

I am completely confused!
I read the following example in one of my professor's notes.
1) We have an SLR(1) grammar G, given below. We use an SLR(1) parser generator to generate a parse table S for G, and an LALR(1) parser generator to generate a parse table L for G.
S->AB
A->dAa
A -> lambda (lambda is the empty string, i.e., the string of length 0)
B->aAb
Solution: the number of reduce (R) entries in S is greater than in L.
But on one site I read:
2) Suppose tables T1 and T2 are created with SLR(1) and LALR(1) parser generators, respectively, for a grammar G. If G is an SLR(1) grammar, which of the following is TRUE?
a) T1 and T2 have no differences.
b) The total number of non-error entries in T1 is lower than in T2.
c) The total number of error entries in T1 is lower than in T2.
Solution:
The LALR(1) algorithm generates exactly the same states as the SLR(1) algorithm, but it can generate different actions; it is capable of resolving more conflicts than the SLR(1) algorithm. However, if the grammar is SLR(1), both algorithms will produce exactly the same machine (so option (a) is right).
Could anyone explain to me which of these is true?
EDIT: In fact my question is: why, for a given SLR(1) grammar, are the LALR(1) and SLR(1) parse tables exactly the same (equal error and non-error entries, and an equal number of reduce entries), while for the above grammar the number of reduce entries in S is greater than in L?
I also saw in another book that, in general, the two tables can differ.
Summary:
1) For the grammar I wrote in question 1, why is the number of reduce entries different?
2) If we have an SLR(1) grammar, why is the table exactly the same (the numbers of reduce entries and error entries become the same)?

Both of these statements are true!
One of your questions was why SLR(1) and LALR(1) parsers have the same states as one another. SLR(1) parsers are formed by starting with an LR(0) automaton, then augmenting each production with lookahead information from FOLLOW sets. In an LALR(1) parser, we begin with an LR(1) parser (where each production has very precise lookahead information), then combine any two states that have the same underlying LR(0) state. This results in an LR(0) automaton with additional information because each LR(0) state corresponds to at least one LR(1) state and each LR(1) state corresponds to some underlying LR(0) state.
SLR(1) and LALR(1) parsers both have the same set of states, which are the same states as in an LR(0) parser. The parsers differ only in what actions they perform in each state.
In both SLR(1) and LALR(1) parsers, each item has an associated set of lookahead tokens. Whenever the parser enters a state with a reduce item in it, the parser will perform that reduction if the next token of input is in the lookahead set. In an SLR(1) parser, the lookahead set is the FOLLOW set for the nonterminal on the left-hand side of the production. In an LALR(1) parser, the lookahead set is, appropriately, called the LA set for the combination of the nonterminal in the production and the automaton state.
You can prove that the LA sets used in an LALR(1) parser are subsets of the FOLLOW sets used in SLR(1) parsers. This means that LALR(1) parsers will never have more reduce actions than SLR(1) parsers, and in some cases the LALR(1) parsers will choose to shift when an SLR(1) parser would have a shift/reduce conflict.
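As a concrete check, here is a small fixed-point computation of FIRST and FOLLOW for the grammar from question 1 (a sketch of my own in Python; the grammar encoding and all names are mine, not from the notes). It shows FOLLOW(A) = {a, b}: every SLR(1) reduce entry for A -> lambda must be placed on both of those tokens, while the LALR(1) LA set in an individual state can be smaller.

GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["d", "A", "a"], []],   # [] encodes the lambda production
    "B": [["a", "A", "b"]],
}
NONTERMINALS = set(GRAMMAR)

def first_of_seq(seq, first):
    # FIRST of a sequence of symbols; "" marks that the sequence is nullable.
    out = set()
    for sym in seq:
        if sym not in NONTERMINALS:       # terminal symbol
            out.add(sym)
            return out
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")                           # every symbol was nullable
    return out

def compute_first():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                new = first_of_seq(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first, start="S"):
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in NONTERMINALS:
                        continue
                    rest = first_of_seq(prod[i + 1:], first)
                    new = (rest - {""}) | (follow[nt] if "" in rest else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

print(sorted(compute_follow(compute_first())["A"]))   # ['a', 'b']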
Hope this helps!

Answer to Q1:
First of all you need to create the DFAs for the SLR(1) and LALR(1) parsers. I created the DFAs for both of them.
[Link to images of the SLR(1) and LALR(1) DFAs]
For SLR(1) I got 10 states with 10 reduce entries, whereas for LALR(1) I created the DFA for CLR(1), which has 13 states and gets merged down to 10 states with 7 reduce entries. That answers your first question.
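To see where the two counts come from (my own construction; the state grouping below is informal): the LR(0) automaton has three states whose item set contains the reduce item A -> . (the start state, the state entered on the token d, and the state entered on the a that starts B -> aAb), plus one completed-item state each for A -> dAa., B -> aAb., and S -> AB.. In the SLR(1) table, every reduce by A -> lambda is placed on all of FOLLOW(A) = {a, b}, giving 3 x 2 = 6 entries; A -> dAa. contributes 2 more (again on FOLLOW(A)), and B -> aAb. and S -> AB. contribute one each on $, for 10 in total. In the LALR(1) table, the computed lookahead for A -> lambda is just {a} in the start state and in the d state, and just {b} in the a state, while A -> dAa. keeps {a, b}; the count becomes 1 + 1 + 1 + 2 + 1 + 1 = 7.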
Answer to Q2:
G is an SLR(1) grammar, so there are surely no S-R or R-R conflicts in the SLR(1) table. LALR(1) has more power than SLR(1), so there is also no conflict in the LALR(1) table for the given grammar G. Let's go option by option.
(c): there are no conflicts in either T1 or T2, so this option gives no distinction (wrong option).
(b): Non-error entries means shift entries and reduce entries. Note that in bottom-up parsers, only the rule for placing reduce entries changes from parser to parser; the shift entries stay the same. For example, in LR(0) reduce entries are made in every column, in SLR(1) they are made in FOLLOW of the left-hand-side variable, while in CLR(1) and LALR(1) they are made in the lookahead symbols. Thus the reduce entries change from parser to parser but the shift entries are the same.
We also already saw in Q1 that the reduce entries of an SLR(1) parsing table can outnumber those of LALR(1), never the other way around. That proves option (b) incorrect.
(a): T1 and T2 may come out to be the same, but not always. The other important thing is that multiple-choice questions sometimes want you to choose the most appropriate option. Thus for me (a) is the answer.

Related

Theory: LL(k) parser vs parser for LL(k) grammars

I'm concerned about the very important difference between the terms "LL(k) parser" and "parser for LL(k) grammars". When an LL(1) backtracking parser is in question, it IS a parser for LL(k) grammars, because it can parse them, but it is NOT an LL(k) parser, because it does not use k tokens to look ahead from a single position in the grammar; instead it explores the possible cases with backtracking, even though it still reads k tokens while exploring.
Am I right?
The question may break down to the way the lookahead is performed. If the lookahead still processes the grammar with backtracking, that does not make it an LL(k) parser. To be an LL(k) parser, the parser must not use a backtracking mechanism over the grammar, because then it would be an "LL(1) parser with backtracking that can parse LL(k) grammars".
Am I right again?
I think the difference is related to the expectation that an LL(1) parser uses constant time per token, and an LL(k) parser uses at most k * constant time per token (linear in the lookahead), not the exponential time it would take in the case of a backtracking parser.
Update 1: to simplify: per token, is LL(k) parsing expected to run in time exponential in k, or linear in k?
Update 2: I have changed it to LL(k) because the question is irrelevant to the range of k (an integer or infinity).
An LL(k) parser needs to do the following at each point in the inner loop:
Collect the next k input symbols. Since this is done at each point in the input, this can be done in constant time by keeping the lookahead vector in a circular buffer.
If the top of the prediction stack is a terminal, then it is compared with the next input symbol; either both are discarded or an error is signalled. This is clearly constant time.
If the top of the prediction stack is a non-terminal, the action table is consulted, using the non-terminal, the current state and the current lookahead vector as keys. (Not all LL(k) parsers need to maintain a state; this is the most general formulation. But it doesn't make a difference to complexity.) This lookup can also be done in constant time, again by taking advantage of the incremental nature of the lookahead vector.
The prediction action is normally done by pushing the right-hand side of the selected production onto the stack. A naive implementation would take time proportional to the length of the right-hand side, which is not correlated with either the lookahead k or the length of the input N, but rather is related to the size of the grammar itself. It's possible to avoid the variability of this work by simply pushing a reference to the right-hand side, which can be used as though it were the list of symbols (since the list can't change during the parse).
However, that's not the full story. Executing a prediction action does not consume an input, and it's possible -- even likely -- that multiple predictions will be made for a single input symbol. Again, the maximum number of predictions is only related to the grammar itself, not to k nor to N.
More specifically, since the same non-terminal cannot be predicted twice in the same place without violating the LL property, the total number of predictions cannot exceed the number of non-terminals in the grammar. Therefore, even if you do push the entire right-hand side onto the stack, the total number of symbols pushed between consecutive shift actions cannot exceed the size of the grammar. (Each right-hand side can be pushed at most once. In fact, only one right-hand side for a given non-terminal can be pushed, but it's possible that almost every non-terminal has only one right-hand side, so that doesn't reduce the asymptote.) If instead only a reference is pushed onto the stack, the number of objects pushed between consecutive shift actions -- that is, the number of predict actions between two consecutive shift actions -- cannot exceed the size of the non-terminal alphabet. (But, again, it's possible that |V| is O(|G|).)
The linearity of LL(k) parsing was established, I believe, in Lewis and Stearns (1968), but I don't have that paper at hand right now, so I'll refer you to the proof in Sippu & Soisalon-Soininen's Parsing Theory (1988), where it is proved in Chapter 5 for strong LL(k) (as defined by Rosenkrantz & Stearns 1970), and in Chapter 8 for canonical LL(k).
In short, the time the LL(k) algorithm spends between shifting two successive input symbols is expected to be O(|G|), which is independent of both k and N (and, of course, constant for a given grammar).
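For the k = 1 case, a driver that meets those bounds fits in a few lines. A minimal sketch (my own code, not from the cited books; the table format is invented for the example): each iteration does one O(1) comparison or table lookup, and the predict pushes between two matches are bounded as argued above.

def ll1_parse(table, start, tokens):
    # table[(nonterminal, lookahead)] -> tuple of right-hand-side symbols
    stack = ["$", start]                  # prediction stack, end marker at bottom
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top == look:                   # terminal on top: match and consume
            pos += 1
        elif (top, look) in table:        # nonterminal on top: predict, push RHS
            stack.extend(reversed(table[(top, look)]))
        else:
            raise SyntaxError("unexpected %r while expecting %r" % (look, top))
    return pos == len(tokens)

# Example: S -> a S | b, parsing "aab".
print(ll1_parse({("S", "a"): ("a", "S"), ("S", "b"): ("b",)}, "S", "aab"))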
This does not really have any relation to LL(*) parsers, since an LL(*) parser does not just try successive LL(k) parses (which would not be possible, anyway). For the LL(*) algorithm presented by Terence Parr (which is the only reference I know of which defines what LL(*) means), there is no bound to the amount of time which could be taken between successive shift actions. The parser might expand the lookahead to the entire remaining input (which would, therefore, make the time complexity dependent on the total size of the input), or it might fail over to a backtracking algorithm, in which case it is more complicated to define what is meant by "processing an input symbol".
I suggest you read chapter 5.1 of Aho & Ullman, Volume 1.
https://dl.acm.org/doi/book/10.5555/578789
1) An LL(k) parser is a k-predictive algorithm (k is the lookahead, an integer >= 1).
2) An LL(k) parser can parse any LL(k) grammar (chapter 5.1.2).
3) For all a, b with a < b, an LL(a) grammar is also an LL(b) grammar; the reverse is not true.
4) An LL(k) parser is PREDICTIVE, so there is NO backtracking.
5) All LL(k) parsers run in O(n) time, where n is the length of the parsed sentence.
It is important to understand that an LL(3) parser does not parse faster than an LL(1) parser. But the LL(3) parser can parse MORE grammars than the LL(1) parser (see points 2 and 3).

How does LR parsing select a qualifying grammar production (to construct the parse tree from the leaves)?

I am reading a tutorial on LR parsing. The tutorial uses this example grammar:
S -> aABe
A -> Abc | b
B -> d
Then, to illustrate how the parsing algorithm works, the tutorial shows the process of parsing the word "abbcde" in a table.
I understand that at each step of the algorithm, a qualifying production (namely a grammar rule, illustrated in column 2 of the table) is found to match a segment of the string. But how does LR parsing choose among a set of qualifying productions (illustrated in column 3 of the table)?
An LR parse of a string traces out a rightmost derivation in reverse. In that sense, the ordering of the reductions applied is what you would get if you derived the string by always expanding out the rightmost nonterminal, then running that process backwards. (Try this out on your example - isn’t that neat?)
The specific mechanism by which LR parsers actually do this involves the use of a parsing automaton that tracks where within the grammar productions the parse happens to be, along with some lookahead information. There are several different flavors of LR parser (LR(0), SLR(1), LALR(1), LR(1), etc.), which differ on how the automaton is structured and how they use lookahead information. You may find it helpful to search for a tutorial on how these automata work, as that’s the heart of how LR parsers actually work.
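To make that concrete, here is the reduction sequence for "abbcde" (my own worked trace, independent of whatever table the tutorial shows):

a b b c d e      reduce the first b using A -> b
a A b c d e      reduce A b c using A -> Abc
a A d e          reduce d using B -> d
a A B e          reduce a A B e using S -> aABe
S

Read bottom-to-top, this is exactly the rightmost derivation S => aABe => aAde => aAbcde => abbcde.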

Example of grammar that works in LR(1) but not LALR(1)?

I can't seem to understand the difference between LALR(1) and LR(1) except that LALR(1) seems to have fewer states than LR(1) does.
I wonder if anyone has the example to show the difference and some explanation.
Thank you
There's an example in the Dragon book (Example 4.44; 4.58 if you have the second edition):
S' → S
S → aAd | bBd | aBe | bAe
A → c
B → c
Since the grammar only generates four strings, it's easy enough to create the LR item sets. When you do that, you'll see that there are two sets with the same items but different lookaheads, corresponding to the prefixes ac and bc. There are no conflicts, so the grammar is LR(1).
The LALR algorithm combines states whose items sets are the same, effectively merging their lookaheads. This creates a reduce/reduce conflict, so the grammar is not LALR(1).
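Concretely (my own rendering of the two item sets; each item is written "production, lookahead"), the state reached on the prefix ac is

A → c · , d
B → c · , e

and the state reached on bc is

A → c · , e
B → c · , d

Each of these states is conflict-free on its own, since the lookahead picks the reduction. LALR(1) merges them (same items, i.e., same core) into

A → c · , d/e
B → c · , d/e

and now on lookahead d (or e) both reductions apply: a reduce/reduce conflict.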

What is the difference between LALR and LR parsing? [duplicate]

I understand both LR and LALR are bottom-up parsing algorithms, but what's the difference between the two?
What's the difference between LR(0), LALR(1), and LR(1) parsing? How can I tell if a grammar is LR(0), LALR(1), or LR(1)?
At a high level, the difference between LR(0), LALR(1), and LR(1) is the following:
An LALR(1) parser is an "upgraded" version of an LR(0) parser that keeps track of more precise information to disambiguate the grammar. An LR(1) parser is a significantly more powerful parser that keeps track of even more precise information than an LALR(1) parser.
LALR(1) parsers are a constant factor larger than LR(0) parsers, and LR(1) parsers are usually exponentially larger than LALR(1) parsers.
Any grammar that can be parsed with an LR(0) parser can be parsed with an LALR(1) parser and any grammar that can be parsed with an LALR(1) parser can be parsed with an LR(1) parser. There are grammars that are LALR(1) but not LR(0) and LR(1) but not LALR(1).
More formally, an LR(k) parser is a bottom-up parser that works by maintaining a stack of terminals and nonterminals. The parser is controlled by a finite automaton that determines, based on the current state of the parser and the next k tokens of input, whether to shift a new token onto the stack or reduce the top symbols of the stack by applying a production in reverse.
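As a rough illustration of that control loop, here is a minimal shift-reduce driver for k = 1 (my own sketch; the ACTION/GOTO encoding is invented for the example, and the tables are assumed to be precomputed):

def lr_parse(action, goto_table, tokens):
    # action[(state, lookahead)] is ("shift", next_state),
    # ("reduce", lhs, rhs_length), or ("accept",).
    stack = [0]                            # stack of automaton states
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError("unexpected %r" % tokens[pos])
        if act[0] == "shift":              # consume the token, enter the new state
            stack.append(act[1])
            pos += 1
        elif act[0] == "reduce":           # pop the handle, follow GOTO on the lhs
            _, lhs, rhs_length = act
            if rhs_length:
                del stack[-rhs_length:]
            stack.append(goto_table[(stack[-1], lhs)])
        else:                              # accept
            return True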
In order to keep track of enough information to make a determination about whether to shift or reduce, LR(k) parsers have each state correspond to a "configurating set," a set of productions annotated with the following information:
How much of the production has been seen so far, and
What tokens to expect after the production has been completed (the lookahead)
The first of these pieces of information is used to determine whether the parser may need to do a reduction - if none of the productions in a current state have been completed, there's no reason to do a reduction. The second of these pieces of information is used when doing a reduction to determine whether the reduction should be performed. When deciding whether to reduce, an LR(k) parser looks at the next k tokens of the input stream. If they match the lookahead tokens, the parser will reduce; otherwise it does not perform the reduction.
Problems arise in an LR(k) parser when there are conflicts about what the parser should do in a given state. One type of conflict, a shift/reduce conflict, comes up when the parser is in a state where a production has been completed, but the lookahead symbols for that production overlap with the symbols that another, uncompleted production in the state could shift. This means that the parser can't tell whether to perform the reduction or not. A second type of conflict is a reduce/reduce conflict, where the parser knows it has to do a reduction, but two or more reductions are possible and it can't tell which to do.
Intuitively, as k gets larger and larger, the parser has more and more precise information available to it to determine when to shift and when to reduce. If a grammar is not LR(0), for example, the parser might have a state where given no lookahead at all it can't determine whether to shift or to reduce. However, that grammar might still be LR(1) because given an extra token of lookahead, it may be able to recognize that it should definitely shift and not reduce or definitely reduce and not shift.
The problem with LR(k) parsers is that as k gets larger, the number of states can increase exponentially. Lookahead in LR(k) parsers is handled by building more and more states in the parser to correspond to different combinations of productions and lookaheads, so as the number of possible lookaheads increases so does the number of states. Consequently, LR(1) parsers are commonly too large to be practical, and LR(2) or greater is almost unheard of in practice.
LALR(1) was invented as a compromise between the space efficiency of LR(0) parsers and the expressive power of LR(1) parsers. There are several ways to think about what an LALR(1) parser is. Originally, LALR(1) parsers were specified as a transformation that converts LR(1) automata into smaller automata. Although an LR(1) parser may have many more states than an LR(0) automaton, the only difference is that an LR(1) parser may have multiple copies of any particular state in an LR(0) automaton, each annotated with different lookahead information. An LALR(1) parser can be formed by starting with an LR(1) parser, then combining together all states that have the same "core" (the set of productions and their positions), then aggregating all the lookahead information together. This results in a parser that has the same number of states as an LR(0) parser but retains some amount of information about lookaheads to help avoid LR conflicts.
Another view of LALR(1) grammars uses the "LALR-by-SLR" method. LALR(1) parsers can be constructed by starting with an LR(0) parser for a grammar, then creating a new grammar for the language that annotates nonterminals with information about what states in the LR(0) parser they correspond to. The information about the FOLLOW sets of the nonterminals in that grammar can then be used to compute the lookaheads in the LR(0) parser.
The net result is that
LR(0) parsers are small, but not very expressive.
LALR(1) parsers are slightly larger due to the lookahead information, but very expressive.
LR(1) parsers are huge, but extremely expressive.
As for your second question - how do you determine whether a grammar is LR(1) or LALR(1) - the standard approach is to build the parsing automata for the LR(1) parser and LALR(1) parser and check for conflicts. To build the LR(1) parser, you build up the LR(1) configurating sets, then check whether any of those configurating sets have a shift/reduce conflict or a reduce/reduce conflict. To construct an LALR(1) parser, you can either build the LR(1) parser and then condense configurating sets with the same core, or use the LALR-by-SLR method based on the LR(0) parser for the language. More details about how to construct these configurating sets are available in most compilers textbooks. You can also check out the lecture notes from a compilers course I taught in Summer 2012, which cover all of the above parsing methods and a few others.
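For concreteness, the core of that construction is small. A sketch (my own representation: an item is a tuple (lhs, rhs, dot, lookahead), and first_of_seq is assumed to be a FIRST helper over symbol sequences, like the fixed-point sketch earlier on this page):

def closure(items, grammar, first_of_seq):
    # Add (production, lookahead) items until nothing new appears.
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, look) in list(items):
            if dot < len(rhs) and rhs[dot] in grammar:    # dot before a nonterminal
                for la in first_of_seq(rhs[dot + 1:] + (look,)):
                    for prod in grammar[rhs[dot]]:
                        item = (rhs[dot], tuple(prod), 0, la)
                        if item not in items:
                            items.add(item)
                            changed = True
    return frozenset(items)

def goto(items, symbol, grammar, first_of_seq):
    # Advance the dot over `symbol` in every item that allows it.
    moved = {(l, r, d + 1, la) for (l, r, d, la) in items
             if d < len(r) and r[d] == symbol}
    return closure(moved, grammar, first_of_seq)

The full construction starts from the closure of (S', (S,), 0, "$"), applies goto until no new state appears, and reports a conflict whenever some state contains a completed item whose lookahead collides with a shift symbol or with another completed item's lookahead.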
Hope this helps!
LR(0), SLR(1), LALR(1) parsers all have the same number of states. Minimal LR(1) parsers will have a few more states if the grammar requires it, to avoid reduce-reduce conflicts.
Canonical LR(1) parsers will have many more states, too many for medium or large computer languages.
SLR(1) parser generators build an LR(0) state machine and determine the k=1 lookaheads by examining the grammar (which may report erroneous conflicts).
LALR(1) parser generators build an LR(0) state machine and determine the k=1 lookaheads by examining the LR(0) state machine (which is very complicated).
Canonical LR(1) parser generators build an LR(1) state machine.
Minimal LR(1) parser generators build an LR(1) state machine and merge compatible states during the build process.
The parsing algorithm for a good LALR(1) parser is different in two ways: (1) it should have shift-reduce actions, which reduce the number of states by about 30% and make the parser faster, and (2) it must do one or more reductions when detecting a syntax error, which makes error recovery more complicated.
The parsing algorithm for a canonical LR(1) parser (1) does not have shift-reduce actions and (2) does not make any reductions when detecting a syntax error, which makes error recovery simpler.
There is another case, called minimal LR(1), which uses the same parsing algorithm and error recovery algorithm as LALR(1). Minimal LR(1) parsers offer the power of LR(1) and their size is almost as small as LALR(1). The LRSTAR Parser Generator creates minimal LR(1) parsers for C++ programmers.

How to determine whether a language is LL(1) LR(0) SLR(1)

Is there a simple way to determine whether a grammar is LL(1), LR(0), SLR(1)... just from looking at the grammar, without doing any complex analysis?
For instance: to decide whether a BNF grammar is LL(1) you have to calculate FIRST and FOLLOW sets, which can be time-consuming in some cases.
Has anybody got an idea how to do this faster?
Any help would really be appreciated!
First off, a bit of pedantry. You cannot determine whether a language is LL(1) from inspecting a grammar for it, you can only make statements about the grammar itself. It is perfectly possible to write non-LL(1) grammars for languages for which an LL(1) grammar exists.
With that out of the way:
You could write a parser for the grammar notation and have a program calculate the FIRST and FOLLOW sets and other properties for you. After all, that's the big advantage of BNF grammars: they are machine-comprehensible.
Inspect the grammar and look for violations of the constraints of various grammar types. For instance: LL(1) allows for right but not left recursion, thus, a grammar that contains left recursion is not LL(1). (For other grammar properties you're going to have to spend some quality time with the definitions, because I can't remember anything else off the top of my head right now :).
In answer to your main question: For a very simple grammar, it may be possible to determine whether it is LL(1) without constructing FIRST and FOLLOW sets, e.g.
A → A + A | a
is not LL(1), while
A → a | b
is.
But when you get more complex than that, you'll need to do some analysis.
A → B | a
B → A + A
This is not LL(1), but it may not be immediately obvious (A => B => A + A is indirect left recursion).
The grammar rules for arithmetic quickly get very complex:
expr → term { '+' term }
term → factor { '*' factor }
factor → number | '(' expr ')'
This grammar handles only multiplication and addition, and already it's not immediately clear whether the grammar is LL(1). It's still possible to evaluate it by looking through the grammar, but as the grammar grows it becomes less feasible. If we're defining a grammar for an entire programming language, it's almost certainly going to take some complex analysis.
That said, there are a few obvious telltale signs that the grammar is not LL(1) — like the A → A + A above — and if you can find any of these in your grammar, you'll know it needs to be rewritten if you're writing a recursive descent parser. But there's no shortcut to verify that the grammar is LL(1).
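For what it's worth, once the grammar has been shown to be LL(1), the EBNF above maps directly onto a recursive descent parser, with each { ... } repetition becoming a loop. A minimal sketch (my own code; it assumes the input is already split into tokens, and it evaluates as it parses):

class Parser:
    def __init__(self, tokens):
        self.toks = list(tokens) + ["$"]   # end marker
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError("expected %r, got %r" % (tok, self.peek()))
        self.pos += 1

    def expr(self):                        # expr -> term { '+' term }
        value = self.term()
        while self.peek() == "+":
            self.eat("+")
            value += self.term()
        return value

    def term(self):                        # term -> factor { '*' factor }
        value = self.factor()
        while self.peek() == "*":
            self.eat("*")
            value *= self.factor()
        return value

    def factor(self):                      # factor -> number | '(' expr ')'
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        value = int(self.peek())           # a number token
        self.pos += 1
        return value

print(Parser(["2", "*", "(", "3", "+", "4", ")"]).expr())   # 14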
One aspect, "is the language/grammar ambiguous?", is a known undecidable question, like the Post correspondence problem and the halting problem.
Straight from the book "Compilers: Principles, Techniques, & Tools" by Aho et al.:
Page 223:
A grammar G is LL(1) if and only if, whenever A -> alpha | beta are two distinct productions of G, the following conditions hold:
1. For no terminal "a" do both alpha and beta derive strings beginning with "a".
2. At most one of alpha and beta can derive the empty string.
3. If beta derives the empty string (in zero or more steps), then alpha does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if alpha derives the empty string, then beta does not derive any string beginning with a terminal in FOLLOW(A).
Essentially this is a matter of verifying that the grammar passes the pairwise disjointness test and also does not involve left recursion. Or, more succinctly: a grammar G that is left-recursive or ambiguous cannot be LL(1).
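That test is mechanical to run. A sketch (my own code; it assumes FIRST and FOLLOW have already been computed, e.g. by a fixed-point pass like the one sketched earlier on this page, with first_of_seq returning FIRST of a symbol sequence and "" marking nullability):

def predict_set(A, prod, first_of_seq, follow):
    # PREDICT(A -> prod) = FIRST(prod), plus FOLLOW(A) if prod is nullable.
    f = first_of_seq(prod)
    return (f - {""}) | (follow[A] if "" in f else set())

def is_ll1(grammar, first_of_seq, follow):
    for A, prods in grammar.items():
        sets = [predict_set(A, p, first_of_seq, follow) for p in prods]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:      # overlapping predict sets: not LL(1)
                    return False
    return True

A grammar passes exactly when the predict sets of each nonterminal's alternatives are pairwise disjoint, which is the pairwise disjointness test stated above.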
Check whether the grammar is ambiguous or not. If it is, then the grammar is not LL(1) because no ambiguous grammar is LL(1).
Yes, there are shortcuts for LL(1) grammars:
1) If A -> B1 | B2 | ... | Bn, then the grammar can be LL(1) only if the FIRST sets of the alternatives are pairwise disjoint: FIRST(Bi) intersection FIRST(Bj) = empty set for all i != j.
2) If A -> B1 | epsilon, then FIRST(B1) intersection FOLLOW(A) must be the empty set.
3) If G is a grammar in which every nonterminal derives only one production, then G is LL(1).
p0 S' → E
p1 E → id
p2 E → id ( E )
p3 E → E + id
Construct the LR(0) DFA, the FOLLOW set for E and the SLR action/goto tables.
Is this an LR(0) grammar? Prove your answer.
Using the SLR tables, show the steps (shifts, reductions, accept) of an LR parser parsing: id ( id + id )
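A hint on the LR(0) part (my own analysis, not part of the original exercise text): after shifting id, the state contains both E → id · and E → id · ( E ), so the LR(0) table has a shift/reduce conflict there and the grammar is not LR(0). SLR(1) resolves it, because FOLLOW(E) = { +, ), $ } does not contain (, so on ( the parser shifts, and on +, ), or $ it reduces by E → id.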
