Deterministic finite automaton (DFA)

Can someone help me find the DFA for the language L:

Related

Deterministic pushdown automata vs. non-deterministic pushdown automata

How can you show that (a) deterministic pushdown automata (DPDA) are more powerful than finite automata, and (b) that they are less powerful than non-deterministic pushdown automata?
(1) First, show that any language that can be accepted by a finite automaton can also be accepted by a deterministic pushdown automaton. Recall that any language accepted by a finite automaton is accepted by a deterministic finite automaton, and a deterministic pushdown automaton can simulate a deterministic finite automaton simply by doing nothing interesting with its stack. Next, show that there is a non-regular language accepted by a DPDA; 0^n 1^n is a good candidate. Prove this language is non-regular using the pumping lemma or the Myhill-Nerode theorem, then show that the DPDA that pushes on 0s and switches to popping on 1s works, as in the sketch below.
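As a concrete (if informal) illustration of that DPDA, here is a minimal sketch in Python; the marker symbol and the two-phase control are my own encoding, not part of the original answer:

```python
def dpda_accepts(s):
    """Deterministic check for {0^n 1^n : n >= 0}, with an explicit stack."""
    stack = []
    phase = "push"                  # control state: reading 0s, then 1s
    for ch in s:
        if phase == "push" and ch == "0":
            stack.append("X")       # push one marker per 0
        elif ch == "1":
            phase = "pop"           # deterministic switch on the first 1
            if not stack:
                return False        # more 1s than 0s
            stack.pop()             # pop one marker per 1
        else:
            return False            # a 0 after a 1, or a stray symbol
    return not stack                # accept iff every 0 was matched by a 1

assert dpda_accepts("000111") and not dpda_accepts("00110")
```

Notice that at every step exactly one move is possible, which is what makes this a deterministic PDA.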
(2) First, note that NPDAs can accept any language accepted by DPDAs, since every DPDA is also an NPDA that happens not to make use of nondeterminism. Next, find a language that has an NPDA but no DPDA. Even-length palindromes over the alphabet {0, 1} might work. There is a simple NPDA for this that nondeterministically guesses when the first half of the input has been read and switches from pushing to popping. Showing there is no DPDA is more challenging. Perhaps you could argue as follows: suppose there were a DPDA. Then, in any configuration of the DPDA, only one transition would be possible. If a string w leads to an accepting state in the DPDA and empties the stack, then w00 may need to lead either to an accepting or a non-accepting state depending on w (since w00 either may or may not be an even-length palindrome), yet the DPDA, starting from the same state with the same empty stack, must behave identically on the extra 00 in every case. This is a contradiction, so our DPDA does not exist. The same argument fails for NPDAs, by the way, because there can be multiple computation paths through the machine, so one failed choice means nothing.
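For contrast, here is a sketch (again mine, not the answer's) that simulates the NPDA's guess by brute force, trying every possible midpoint; a single accepting branch suffices, which is exactly why the determinism argument above does not carry over:

```python
def npda_accepts(s):
    """Check for even-length palindromes over {0, 1} by simulating every
    nondeterministic guess of where the first half of the input ends."""
    for mid in range(len(s) + 1):          # each guess = one NPDA branch
        stack = list(s[:mid])              # pushing phase
        rest = s[mid:]
        if rest == "".join(reversed(stack)):
            return True                    # one accepting branch is enough
    return False

assert npda_accepts("0110") and not npda_accepts("0100")
```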
An NPDA is more powerful than a DPDA because it is allowed to have several transitions out of the same configuration. For every language accepted by a DPDA there is an NPDA (every DPDA is already an NPDA), but there are languages accepted by some NPDA that no DPDA accepts. One class of automata is said to be more powerful than another when it accepts a strictly larger set of languages.
This makes the NPDA/DPDA relationship different from that of DFAs (deterministic finite automata) and NFAs (non-deterministic finite automata), which are equivalent in power: for every language accepted by a DFA there is an NFA and vice versa, and there is no language for which we can construct an NFA but no DFA. Hence we can always convert an NFA to an equivalent DFA, but we cannot always convert an NPDA to a DPDA.

How easy is it to find a string that leads to a conflict in an SLR(1) parser compared to an LR(1) parser?

It is known that SLR(1) parsers usually have fewer states than LR(1) parsers. But because of this, is it easier or harder to find a string that leads to a conflict in an SLR(1) parser compared to an LR(1) parser, and why?
Thank you in advance.
Let’s say you have some CFG and you build both an SLR(1) parser and an LR(1) parser for that grammar. If you don’t have any shift/reduce or reduce/reduce conflicts, then you’re done - there aren’t any strings that lead to conflicts. On the other hand, if such a conflict exists, then yes, there is a string that leads to a conflict. You can find such a string by working backwards through the automaton: find a path from the start state to the state with the shift/reduce or reduce/reduce conflict and write out the series of terminals and nonterminals that take you there. If it’s all terminals, great! You’ve got your string. If there are any nonterminals there, since an LR parser traces a rightmost derivation in reverse, pick the rightmost nonterminal in the sequence you found and repeat this process to expand it out further. Eventually you’ll get your string.
In that sense, once you have the automata constructed, the exact same procedure will find a string that leads to a conflict. So the difficulty of finding a bad string basically boils down to building the SLR(1) versus the LR(1) automaton, and that's where the fact that LR(1) automata are a bit bigger than SLR(1) automata comes in. It will probably take a bit longer to find out that a grammar isn't LR(1) than to find out that it isn't SLR(1), simply because it takes more time to build LR(1) parsers. A sketch of the search procedure follows.
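Here is a minimal sketch of that search, assuming the automaton has already been built; the encoding (integer states, a transition dict, a set of conflicted states) is hypothetical, not from the answer:

```python
from collections import deque

def path_to_conflict(start, transitions, conflict_states):
    """Breadth-first search for the shortest sequence of grammar symbols
    that drives the LR automaton from `start` into a conflicted state.

    transitions: dict mapping (state, symbol) -> state
    conflict_states: set of states with a shift/reduce or
                     reduce/reduce conflict
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, symbols = queue.popleft()
        if state in conflict_states:
            return symbols              # may still contain nonterminals
        for (src, sym), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, symbols + [sym]))
    return None                         # no conflict reachable

# As described above, if the returned sequence contains nonterminals,
# repeatedly expand the rightmost one with one of its productions until
# only terminals remain.
```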

Removing ambiguity from context free grammars

Given an ambiguous grammar, we resolve operator precedence problems by rewriting the grammar so that it encodes the precedence rules. To resolve operator associativity problems, we make the grammar left-recursive or right-recursive according to the associativity of the operator involved.
Now, when the computer has to do the parsing, suppose it uses the recursive descent algorithm: must the grammar be unambiguous in the first place? Or does the grammar have different requirements depending on the algorithm?
If the grammar is left-recursive, the recursive descent algorithm doesn't terminate. So how do I give an unambiguous grammar (with the associativity problems solved) to the algorithm as input?
The grammar must be LL(k) to use the standard, efficient recursive descent algorithm with no backtracking. There are standard transformations useful for taking a general LR grammar (basically, any grammar parsable by a deterministic stack-based algorithm) to LL(k) form. They include left-recursion elimination and left factoring. These are extensive topics I won't attempt to cover here, but they are covered well in almost any good compiler text, and reasonably well in online notes available through search. Aho, Sethi, and Ullman's Compilers: Principles, Techniques, and Tools is a great reference for this and most other compiler basics.
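To make the left-recursion point concrete, here is a sketch (my example, with a made-up single-digit grammar): the left-recursive rule E -> E '-' T | T sends recursive descent into an infinite loop, but after left-recursion elimination it becomes E -> T ('-' T)*, which can be written as a loop that still yields left-associative results:

```python
def parse_expr(tokens, pos=0):
    """Parse E -> T ('-' T)* and return (value, next position)."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "-":
        rhs, pos = parse_term(tokens, pos + 1)
        value = value - rhs            # left-associative: ((a - b) - c)
    return value, pos

def parse_term(tokens, pos):
    """Parse T -> digit (a single-digit operand, for simplicity)."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected a digit at position {pos}")
    return int(tokens[pos]), pos + 1

print(parse_expr(list("9-3-2")))       # (4, 5): parsed as (9 - 3) - 2
```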

What is the difference between LALR and LR parsing? [duplicate]

I understand both LR and LALR are bottom-up parsing algorithms, but what's the difference between the two?
What's the difference between LR(0), LALR(1), and LR(1) parsing? How can I tell if a grammar is LR(0), LALR(1), or LR(1)?
At a high level, the difference between LR(0), LALR(1), and LR(1) is the following:
An LALR(1) parser is an "upgraded" version of an LR(0) parser that keeps track of more precise information to disambiguate the grammar. An LR(1) parser is a significantly more powerful parser that keeps track of even more precise information than an LALR(1) parser.
LALR(1) parsers are a constant factor larger than LR(0) parsers, and LR(1) parsers are usually exponentially larger than LALR(1) parsers.
Any grammar that can be parsed with an LR(0) parser can be parsed with an LALR(1) parser and any grammar that can be parsed with an LALR(1) parser can be parsed with an LR(1) parser. There are grammars that are LALR(1) but not LR(0) and LR(1) but not LALR(1).
More formally, an LR(k) parser is a bottom-up parser that works by maintaining a stack of terminals and nonterminals. The parser is controlled by a finite automaton that determines, based on the current state of the parser and the next k tokens of input, whether to shift a new token onto the stack or reduce the top symbols of the stack by applying a production in reverse.
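The control structure is easier to see in code. Below is a sketch of the table-driven driver loop, with a made-up table encoding (ACTION maps a state and lookahead to a shift, reduce, or accept entry; GOTO maps a state and a nonterminal to a state); none of this encoding comes from the answer itself:

```python
def lr_parse(tokens, action, goto):
    """Generic LR driver: shift states onto a stack, or pop a production's
    worth of states and push the GOTO state for the production's head."""
    stack = [0]                         # stack of automaton states
    tokens = list(tokens) + ["$"]       # end-of-input marker
    pos = 0
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r}")
        if act[0] == "accept":
            return True
        if act[0] == "shift":           # ("shift", next_state)
            stack.append(act[1])
            pos += 1
        else:                           # ("reduce", head, body_length)
            _, head, body_len = act
            del stack[len(stack) - body_len:]
            stack.append(goto[(stack[-1], head)])
```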
In order to keep track of enough information to make a determination about whether to shift or reduce, LR(k) parsers have each state correspond to a "configurating set," a set of productions annotated with the following information:
- how much of the production has been seen so far, and
- what tokens to expect after the production has been completed (the lookahead).
The first of these pieces of information is used to determine whether the parser may need to do a reduction - if none of the productions in the current state have been completed, there's no reason to do a reduction. The second is used when doing a reduction to determine whether the reduction should be performed: when deciding whether to reduce, an LR(k) parser looks at the next k tokens of the input stream, and if they match the lookahead tokens, the parser reduces; otherwise it does not.
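A short sketch may help make "configurating set" concrete. The following computes the closure of a set of LR(0) items for a small hypothetical grammar (the grammar and its encoding are mine, not the answer's); an item (head, body, dot) means that body[:dot] of the production head -> body has been matched so far:

```python
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",)],
}

def closure(items):
    """Add X -> . body for every nonterminal X appearing just after a dot,
    repeating until no new items appear."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:
                for production in GRAMMAR[body[dot]]:
                    item = (body[dot], production, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

# The start state of the LR(0) automaton for this grammar:
print(closure({("S", ("E",), 0)}))
```

An LR(1) construction would carry a lookahead component in each item as well; this sketch omits it for brevity.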
Problems arise in an LR(k) parser when there are conflicts about what the parser should do in a given state. One type of conflict, a shift/reduce conflict, comes up when the parser is in a state where a production has been completed, but the lookahead symbols for that production are also symbols on which another, uncompleted production in the state could shift. This means that the parser can't tell whether to perform the reduction or to shift. A second type of conflict is a reduce/reduce conflict, where the parser knows it has to do a reduction, but two or more reductions are possible and it can't tell which one to do.
Intuitively, as k gets larger and larger, the parser has more and more precise information available to it to determine when to shift and when to reduce. If a grammar is not LR(0), for example, the parser might have a state where given no lookahead at all it can't determine whether to shift or to reduce. However, that grammar might still be LR(1) because given an extra token of lookahead, it may be able to recognize that it should definitely shift and not reduce or definitely reduce and not shift.
The problem with LR(k) parsers is that as k gets larger, the number of states can increase exponentially. Lookahead in LR(k) parsers is handled by building more and more states in the parser to correspond to different combinations of productions and lookaheads, so as the number of possible lookaheads increases so does the number of states. Consequently, LR(1) parsers are commonly too large to be practical, and LR(2) or greater is almost unheard of in practice.
LALR(1) was invented as a compromise between the space efficiency of LR(0) parsers and the expressive power of LR(1) parsers. There are several ways to think about what an LALR(1) parser is. Originally, LALR(1) parsers were specified as a transformation that converts LR(1) automata into smaller automata. Although an LR(1) parser may have many more states than an LR(0) automaton, the only difference is that an LR(1) parser may have multiple copies of any particular state in an LR(0) automaton, each annotated with different lookahead information. An LALR(1) parser can be formed by starting with an LR(1) parser, then combining together all states that have the same "core" (the set of productions and their positions), then aggregating all the lookahead information together. This results in a parser that has the same number of states as an LR(0) parser but retains some amount of information about lookaheads to help avoid LR conflicts.
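A standard textbook grammar (my example, not part of the original answer) makes the cost of this merging concrete. Consider:

S -> a E a | b E b | a F b | b F a
E -> e
F -> e

In the LR(1) automaton, the state reached after reading "a e" contains the items [E -> e., a] and [F -> e., b], while the state reached after "b e" contains [E -> e., b] and [F -> e., a]; the disjoint lookaheads keep the two reductions apart. The two states share the same core, so LALR(1) merges them into a single state containing [E -> e., {a, b}] and [F -> e., {a, b}], which is a reduce/reduce conflict. The grammar is therefore LR(1) but not LALR(1).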
Another view of LALR(1) grammars uses the "LALR-by-SLR" method. LALR(1) parsers can be constructed by starting with an LR(0) parser for a grammar, then creating a new grammar for the language that annotates nonterminals with information about what states in the LR(0) parser they correspond to. The information about the FOLLOW sets of the nonterminals in that grammar can then be used to compute the lookaheads in the LR(0) parser.
The net result is that:
- LR(0) parsers are small, but not very expressive.
- LALR(1) parsers are slightly larger due to the lookahead information, but very expressive.
- LR(1) parsers are huge, but extremely expressive.
As for your second question - how do you determine whether a grammar is LR(1) or LALR(1) - the standard approach is to try to build the parsing automata for the LR(1) parser and the LALR(1) parser and check for conflicts. To build the LR(1) parser, you build up the LR(1) configurating sets, then check whether any of those sets has a shift/reduce conflict or a reduce/reduce conflict. To construct an LALR(1) parser, you can either build the LR(1) parser and then condense configurating sets with the same core, or you can use the LALR-by-SLR method based on the LR(0) parser for the language. More details about how to construct these configurating sets are available in most compilers textbooks. You can also check out the lecture notes from a compilers course I taught in Summer 2012, which cover all of the above parsing methods and a few others.
Hope this helps!
LR(0), SLR(1), LALR(1) parsers all have the same number of states. Minimal LR(1) parsers will have a few more states if the grammar requires it, to avoid reduce-reduce conflicts.
Canonical LR(1) parsers will have many more states, too many for medium or large computer languages.
SLR(1) parser generators build an LR(0) state machine and determine the k=1 lookaheads by examining the grammar's FOLLOW sets (an approximation that may report conflicts that are not real).
LALR(1) parser generators build an LR(0) state machine and determine the k=1 lookaheads by examining the LR(0) state machine (which is very complicated).
Canonical LR(1) parser generators build an LR(1) state machine.
Minimal LR(1) parser generators build an LR(1) state machine and merge compatible states during the build process.
The parsing algorithm for a good LALR(1) parser is different in two ways: (1) it should have shift-reduce actions, which reduce the number of states by about 30% and make the parser faster, and (2) it must do one or more reductions when detecting a syntax error, which makes error recovery more complicated.
The parsing algorithm for a canonical LR(1) parser (1) does not have shift-reduce actions and (2) does not make any reductions when detecting a syntax error, which makes error recovery simpler.
There is another case, called minimal LR(1), which uses the same parsing algorithm and error recovery algorithm as LALR(1). Minimal LR(1) parsers offer the power of LR(1) and their size is almost as small as LALR(1). The LRSTAR Parser Generator creates minimal LR(1) parsers for C++ programmers.

Converting a context-free grammar into an LL(1) grammar

First off, I have checked similar questions and none has quite the information I seek.
I want to know if, given a context-free grammar, it is possible to:
- Know whether or not an equivalent LL(1) grammar exists (equivalent in the sense that it generates the same language).
- Convert the context-free grammar into an equivalent LL(1) grammar, given that one exists. The conversion should succeed whenever an equivalent LL(1) grammar exists; it is OK if it does not terminate when no equivalent exists.
If the answer to those questions is yes, are such algorithms or tools available somewhere? My own research has been fruitless.
Another answer mentions that the Dragon Book has algorithms to eliminate left recursion and to left factor a context-free grammar. I have access to the book and checked it, but it is unclear to me whether the resulting grammar is guaranteed to be LL(1). The restrictions imposed on the context-free grammar (no null productions and no cycles) are agreeable to me.
From the university compiler courses I took, I remember that every LL(1) grammar is context-free, but the class of context-free grammars is much bigger than LL(1). There is no general algorithm to check whether a context-free grammar has an equivalent LL(1) grammar, or to carry out the conversion when one exists; the problem is undecidable, not merely intractable.
Applying the bag of tricks (removing left recursion, removing FIRST/FOLLOW conflicts, left factoring, and so on) is similar to the transformations you use when you want to integrate a function by hand: you need experience that is sometimes very close to an art. The transformations are often inverses of each other.
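One entry in that bag of tricks, left factoring, is easy to show on a toy grammar (my example, not the answer's). When two alternatives share a prefix, a predictive parser cannot choose between them with one token of lookahead, so the prefix is factored out: the rules

A -> x y | x z

become

A -> x A'
A' -> y | z

The parser now consumes the shared x first and only then uses the next token to choose between y and z.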
One reason why LR-type grammars are now used a lot for generated parsers is that they cover a much wider spectrum of context-free grammars than LL(1).
By the way, the C grammar can be expressed as LL(1), for example, but the C# grammar cannot (the lambda function x => x + 1 comes to mind, where you cannot decide whether you are seeing the parameter of a lambda or a known variable).
