I am currently studying compilers, and as I understand it, in LR(0) there are cases where we get "shift/reduce" or "reduce/reduce" conflicts, but it's impossible to have "shift/shift" conflicts! Why can't we have a "shift/shift" conflict?
Shift/reduce conflicts occur when the parser can't tell whether to shift (push the next input token atop the parsing stack) or reduce (pop a series of terminals and nonterminals from the parsing stack and replace them with the nonterminal on the left-hand side of some production). A reduce/reduce conflict is when the parser knows to reduce, but can't tell which reduction to perform.
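For example, the classic "dangling else" grammar fragment (assuming if/then/else are terminals) has a shift/reduce conflict:

    S → if E then S
    S → if E then S else S

After the parser has matched "if E then S" and the next token is else, it can either shift the else (continuing the second production) or reduce by the first production; both choices are plausible, so the two actions compete.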
If you were to have a shift/shift conflict, the parser would know that it needed to push the next token onto its parsing stack, but wouldn't know how to do it. Since there is only one way to push the token onto the parsing stack, there generally cannot be any conflicts of this form.
That said, it's theoretically possible for a shift/shift conflict to exist if you had a strange setup in which there were two or more transitions leading out of a given parsing state that were labeled with the same terminal symbol. The conflict in that case would be whether to shift and go to one state or shift and go to another. This could happen if you tried to compress an automaton into fewer states and did so incorrectly, or if you were trying to build a nondeterministic parsing automaton. In practice, this would never happen.
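To see this concretely: in a table-driven LR parser, the ACTION table maps a (state, lookahead) pair to a single action, and a conflict is just two actions competing for one cell. A minimal sketch in Python (the states and entries here are invented for illustration):

    # Each (state, lookahead) cell of a deterministic LR parser holds
    # exactly ONE action.
    ACTION = {
        (0, "b"): ("shift", 3),           # push 'b' and go to state 3
        (3, "$"): ("reduce", "A -> b"),   # pop the RHS of A -> b
    }
    # A shift/reduce conflict would be two rival entries for one cell, e.g.
    #   (3, "b"): ("shift", 4)   versus   (3, "b"): ("reduce", "A -> b").
    # A shift/shift "conflict" would need two shift entries with different
    # target states in the same cell, which the standard construction never
    # produces: the transition on a terminal is a function of the state.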
Hope this helps!
Related
Let's say I have got a set of LR(1) states and I want to try to convert it to LALR(1).
I did the first step of finding states that have the same LR(0) core, and now I'm doing the merge process. However, one such set of states can't be merged, because merging it would introduce reduce/reduce conflict(s). Should I:
Stop the conversion right away and report that the grammar that produced this state machine is not LALR(1),
or should I continue merging the other candidate sets and stop only when none of the remaining candidates can be merged?
The conflict is not going to go away with more merges, so you could stop immediately and report failure.
Most parser generators would continue to the end, though:
The user might appreciate knowing about all conflicts and not just the first one found, in order to debug their grammar;
Sometimes a simple heuristic to resolve conflicts succeeds in generating a usable parser. (Yacc, for example, first resolves using declared operator precedences, then by preferring shift to reduce, and finally by preferring the reduction which appears earlier in the grammar.)
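A classic textbook illustration of this situation (a version of this grammar appears in the Dragon Book) is:

    S → a A d | b B d | a B e | b A e
    A → c
    B → c

The LR(1) states containing the items [A → c·, d], [B → c·, e] and [A → c·, e], [B → c·, d] share the same LR(0) core, but merging them leaves both reductions A → c and B → c in one state with the overlapping lookaheads {d, e}: exactly the kind of reduce/reduce conflict described in the question, and no later merge can remove it.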
Why is a right-recursive grammar not appropriate for bottom-up LR(k) parsing?
I know bottom-up parsing starts at the leaves and builds up to the root node, whereas top-down starts at the root and makes its way down. I did a little research regarding these questions and saw how such grammars could be modified, but couldn't get a direct answer as to why this doesn't work. Any help is appreciated.
[OP changed the question title from "why can't ... be used" to "why isn't ... appropriate". That in turn changes my comment into an answer, so I am posting it as one.]
You can use left- or right-recursive rules with any LR(k) parsing algorithm.
Right-recursive rules do cause a funny property of the parsing process if you are going to build trees: you have to keep a stack as deep as the right recursion to track the collected nodes.
People will give you source files containing a million items in a list, so your stack must be that deep. With right-recursive rules the recursion may be deep enough that you run out of space if you have a fixed-size stack.
Often a parser stack is implemented using the processor's natural pushdown stack. Our common OSes (Windows, Linux) and their common compilers happen to offer exactly such fixed-size pushdown stacks, so in some sense they aggravate this problem.
With a left-recursive rule, you reduce after each list item, so the stack can stay at essentially unit depth, as the sketch below shows. That's a lot friendlier: it doesn't crash, and it uses the cache well.
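To make the difference concrete, here is a sketch of the two styles of list rule (the list/item names are made up):

    list → list ',' item    (left recursive: the parser reduces after every
                             item, so the stack stays shallow)
    list → item ',' list    (right recursive: every item must be shifted
                             before the first reduce, so the stack grows
                             with the length of the list)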
I have read this to understand more about the difference between top-down and bottom-up parsing. Can anyone explain the problems associated with left recursion in a top-down parser?
In a top-down parser, the parser begins with the start symbol and tries to guess which productions to apply to reach the input string. To do so, top-down parsers need to use contextual clues from the input string to guide their guesswork.
Most top-down parsers are directional parsers, which scan the input in some direction (typically, left to right) when trying to determine which productions to guess. The LL(k) family of parsers is one example of this - these parsers use information about the next k symbols of input to determine which productions to use.
Typically, the parser uses the next few tokens of input to guess productions by looking at which productions can ultimately lead to strings that start with the upcoming tokens. For example, if you had the production
A → bC
you wouldn't choose to use this production unless the next character to match was b. Otherwise, you'd be guaranteed there was a mismatch. Similarly, if the next input character was b, you might want to choose this production.
So where does left recursion come in? Well, suppose that you have these two productions:
A → Ab | b
This grammar generates all strings of one or more copies of the character b. If you see a b in the input as your next character, which production should you pick? If you choose Ab, then you're assuming there are multiple b's ahead of you even though you're not sure this is the case. If you choose b, you're assuming there's only one b ahead of you, which might be wrong. In other words, if you have to pick one of the two productions, you can't always choose correctly.
The issue with left recursion is that if you have a nonterminal that's left-recursive and find a string that might match it, you can't necessarily know whether to use the recursion to generate a longer string or avoid the recursion and generate a shorter string. Most top-down parsers will either fail to work for this reason (they'll report that there's some uncertainty about how to proceed and refuse to parse), or they'll potentially use extra memory to track each possible branch, running out of space.
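A minimal Python sketch of this failure mode in a naive recursive-descent parser for the grammar A → Ab | b (the function names are made up for illustration; the first function is deliberately broken to show the problem):

    # Naive recursive descent for the left-recursive grammar A -> A b | b.
    # To try A -> A b, the parser must first parse an A, but it has not
    # consumed any input yet, so it recurses forever (RecursionError).
    def parse_A_left_recursive(tokens, pos):
        pos = parse_A_left_recursive(tokens, pos)  # infinite recursion!
        if pos < len(tokens) and tokens[pos] == "b":
            return pos + 1
        raise SyntaxError("expected 'b'")

    # After eliminating the left recursion (A -> b A', A' -> b A' | epsilon),
    # the same language is handled with one token of lookahead and a loop.
    def parse_A(tokens, pos):
        if pos >= len(tokens) or tokens[pos] != "b":
            raise SyntaxError("expected 'b'")
        pos += 1
        while pos < len(tokens) and tokens[pos] == "b":
            pos += 1
        return pos

    print(parse_A(["b", "b", "b"], 0))  # prints 3: all three b's matched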
In short, top-down parsers usually try to guess what to do from limited information about the string. Because of this, they get confused by left recursion because they can't always accurately predict which productions to use.
Hope this helps!
Reasons
1) A grammar that is left-recursive (directly or indirectly) can't be converted into Greibach normal form (GNF)*, so the left recursion has to be eliminated, turning the rules into a right-recursive form.
2) Left-recursive grammars are also not LL(1), so again, eliminating the left recursion may result in an LL(1) grammar.
GNF
A grammar in which every production has the form A → aV, where a is a terminal and V is a string of nonterminals, is in Greibach normal form.
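For reference, the standard transformation that removes immediate left recursion rewrites

    A → Aα | β

as

    A → βA'
    A' → αA' | ε

where A' is a fresh nonterminal. For example, A → Ab | b becomes A → bA', A' → bA' | ε.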
I use the Bison output file to analyze the state (machine) transitions of the parser. I find that when the parser reduces by a rule, it goes back to a previous state, but sometimes it goes one state back and sometimes two or three states back. Can anyone tell me what rule determines which state the state machine goes back to after finishing a reduction?
Thanks in advance.
When an LR(k) machine performs a reduction, it pops the right-hand side of the production off the parser stack, revealing the state in which parsing of the production started. It then looks up the reduced non-terminal in the GOTO table for that state.
So the number of entries popped off the parser stack will be the number of symbols on the right-hand side of the reduced production. (In theory, an LR parser could optimize by not pushing all symbols onto the stack, which would allow it to pop fewer symbols off the stack. But as far as I know, bison doesn't do this particular optimization, because it would dramatically complicate the interface.)
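Here is a minimal sketch of that reduce step in a table-driven LR parser (the function name and table layout are assumptions for illustration, not bison's actual internals):

    def reduce_by(production, state_stack, GOTO):
        """Apply one LR reduction.

        production  -- a pair (lhs, rhs), e.g. ("expr", ["expr", "+", "term"])
        state_stack -- the parser's stack of state numbers
        GOTO        -- dict mapping (state, nonterminal) -> next state
        """
        lhs, rhs = production
        # Pop one state per right-hand-side symbol: this is why the parser
        # sometimes goes back one state and sometimes two or three. The
        # distance is exactly len(rhs).
        del state_stack[len(state_stack) - len(rhs):]
        # The newly exposed state is where parsing of this production began;
        # its GOTO entry for the reduced nonterminal gives the next state.
        state_stack.append(GOTO[(state_stack[-1], lhs)])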
Is a theoretical LR parser with infinite lookahead capable of parsing (unambiguous) languages which can be described by a context-free grammar?
Normally LR(k) parsers are restricted to deterministic context-free languages. I think this means that there always has to be exactly one grammar rule that can be applied currently, i.e. within the current lookahead context no more than one possible way of parsing is allowed to occur. The book "Language Implementation Patterns" states that a "... parser is nondeterministic - it cannot determine which alternative to choose." if the lookahead sets overlap. In contrast, a nondeterministic parser just chooses one way if there are multiple alternatives, and if at some point it becomes impossible to continue with the decision previously made, it goes back to the decision point and chooses the next alternative.
Wherever I read definitions of LR(k) parsers (like on Wikipedia or in the Dragon Book) I always read something like: "k is the number of lookahead tokens" or cases when "k > 1" but never if k can be infinite. Wouldn't an infinite lookahead be the same as trying all alternatives until one succeeds?
Could it be that k is assumed to be finite in order to (implicitly) distinguish LR(k) parsers from non-deterministic parsers?
You are raising several issues here that are difficult to answer in a short form. Nevertheless I will try.
First of all, what is "infinite lookahead"? There is no book that describes such a parser. If you have a clear idea of what it is, you need to describe it first. Only after that can we discuss this topic. For now, parsing theory discusses only LR(k) grammars, where k is finite.
Normally LR(k) parsers are restricted to deterministic context-free languages. I think this means that there always has to be exactly one grammar rule that can be applied currently.
This is wrong. LR(k) grammars may have "grammar conflicts". The Dragon Book briefly mentions them without going into any detail. "Grammars may have conflicts" means that some grammars do not have conflicts, while all other grammars have them. When a grammar does not have conflicts, there is ALWAYS only one applicable rule and the situation is relatively simple.
When a grammar has conflicts, this means that in certain cases more than one rule is applicable. Classic parsers cannot help here. What makes matters worse is that some input statements may have a set of correct parsings, not just one. From the standpoint of grammar theory, all these parsings have the same value and importance.
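For example, with the ambiguous grammar

    E → E + E | a

the input a + a + a has two correct parsings: one grouping to the left, (a + a) + a, and one to the right, a + (a + a).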
The book "Language Implementation Patterns" states that a "... parser
is nondeterministic - it cannot determine which alternative to
choose."
I have the impression that there is no definitive agreement on what "nondeterministic parser" means. I would tend to say that a nondeterministic parser just picks one of the alternatives randomly and goes on.
In practice, only two strategies for resolving conflicts are used. The first one is conflict resolution in the callback handler. The callback handler is regular code: the programmer who writes it checks whatever they want in any way they want. This code only gives back the result, i.e. which action to take. To the parser on top of it, this callback handler is a black box. There is no theory here.
The second approach is called "backtracking". The idea behind it is very simple: we do not know where to go, so let's try all possible alternatives. In this case all variants are tried; there is nothing nondeterministic here. There are several different flavors of backtracking.
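A minimal sketch of that "try all possible alternatives" idea, as a Python recognizer (the grammar encoding and names are invented for illustration; note this naive version would loop forever on left-recursive rules):

    def match(tokens, pos, symbol, grammar):
        """Return every position where a derivation of `symbol` can end,
        starting at tokens[pos:]. Trying every alternative of every
        nonterminal is the essence of backtracking."""
        if symbol not in grammar:  # terminal: must match the next token
            if pos < len(tokens) and tokens[pos] == symbol:
                return {pos + 1}
            return set()
        ends = set()
        for alternative in grammar[symbol]:   # try EVERY alternative
            positions = {pos}
            for sym in alternative:           # thread positions through the RHS
                positions = {e for p in positions
                             for e in match(tokens, p, sym, grammar)}
            ends |= positions
        return ends

    grammar = {"A": [["b", "A"], ["b"]]}      # right recursive: A -> b A | b
    tokens = ["b"] * 4
    print(len(tokens) in match(tokens, 0, "A", grammar))  # True: input derives A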
If this is not enough I can write a little bit more.
Nondeterminism means that in order to produce the correct result(s!), a finite state machine reads a token and then has N > 1 possible next states. You can recognize a nondeterministic FSM by a node that has more than one outgoing edge with the same label. Note that not every branch has to be valid, but the FSM can't pick just one. In practice you could fork here, resulting in N state machines, or you could try one branch completely and then come back and try the next one, until every outgoing state transition has been tested.
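The "fork" strategy can be simulated by carrying the whole set of reachable states forward, as in this minimal sketch (the transition table is invented for illustration):

    def run_nfa(transitions, start, accepting, tokens):
        """Simulate an NFA: after each token, keep every state the
        machine could possibly be in."""
        states = {start}
        for tok in tokens:
            # A (state, token) pair may lead to several next states,
            # which is exactly the N > 1 branching described above.
            states = {nxt for s in states
                      for nxt in transitions.get((s, tok), ())}
        return bool(states & accepting)

    # State 0 has two outgoing 'b' edges: a nondeterministic choice.
    transitions = {(0, "b"): {0, 1}, (1, "b"): {2}}
    print(run_nfa(transitions, 0, {2}, ["b", "b"]))  # True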