I have an NFA like this:
and the question is:
Are ε and the empty set part of this NFA's language?
Your automaton needs to consume at least {b, a} in order to reach its end state. Since it's impossible to reach the end state without taking any transition, the empty set is not part of its language. And since there isn't a path from start to finish composed entirely of ε-transitions, there is no way to reach the end state on ε alone.
So no, the empty set and ε are not part of that NFA's language.
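As an aside, this check can be done mechanically: ε is accepted exactly when an accepting state lies in the ε-closure of the start state. Here is a minimal Python sketch of that test; the transition table is a made-up placeholder, since the original NFA is only given as an image:

def epsilon_closure(states, eps_edges):
    # All states reachable from `states` using ε-transitions alone.
    stack, closure = list(states), set(states)
    while stack:
        for target in eps_edges.get(stack.pop(), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure

# Hypothetical NFA (placeholder, not the one from the image): one
# ε-transition from q0 to q1, accepting state q2.
eps_edges = {'q0': ['q1']}
accepting = {'q2'}

# ε is accepted iff the ε-closure of the start state hits an accepting state.
print(bool(epsilon_closure({'q0'}, eps_edges) & accepting))  # False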
[image: my attempted DFA]
I tried this, but what I came up with is a DFA. What would an NFA look like?
By most definitions, every DFA is also an NFA. So your automaton is an NFA.
If you really want some nondeterminism, here is another NFA:
             0,1
-> ((even)) <---> (odd)
       ^
       | 0,1
       v
  (also odd)
As you can see, forcing an automaton to use nondeterminism is not very enlightening.
In most cases, nondeterminism makes an automaton more compact and simpler to write. But this automaton already uses only two states; it can hardly get any smaller.
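If you want to convince yourself that the nondeterministic version accepts the same language, here is a small Python sketch that simulates the three-state NFA above by tracking the set of states it could be in (the state names and transitions are read off the diagram; both input symbols behave identically):

# Transitions of the NFA in the diagram, same for input 0 and 1:
# 'even' can go to 'odd' or 'also odd'; both of those go back to 'even'.
TRANSITIONS = {
    'even':     {'odd', 'also odd'},
    'odd':      {'even'},
    'also odd': {'even'},
}

def accepts(string):
    current = {'even'}                    # start state
    for _ in string:                      # 0 and 1 behave the same here
        current = set().union(*(TRANSITIONS[s] for s in current))
    return 'even' in current              # 'even' is also the accepting state

print(accepts(''), accepts('01'), accepts('0'))  # True True False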
I am confused about how an automaton recognizes a language. Does the automaton go directly to the next state if there is an ɛ-transition? Suppose I have an automaton consisting of three states a, b, and c (where a is the initial state and c the accepting state) over the alphabet {0,1}. How does the following work?
a ----ɛ---> b
b ----0---> a
b ----1---> c
Is the string "1" accepted? What if we had
a---1--->b----ɛ--->c
? Would the string "1" be accepted?
Does the automaton go directly to the next state if there is an ɛ-transition?
Roughly speaking, yes. An ɛ-transition (in a non-deterministic finite automaton, or NFA, for short) is a transition that is not associated with the consumption of any symbol (0 or 1, in this case). Once you understand that, it's easy (in this case) to derive deterministic finite automata (or DFA, for short) that are equivalent to your NFAs and identify the languages that the latter describe.
Suppose I have an automaton [...] Is the string "1" accepted?
Yes. Here is a nicer diagram (courtesy of LaTeX and TikZ) of your first NFA:
An equivalent DFA would be:
Once you have that, it's easy to see that the language accepted by that NFA is the set of strings that
start with zero or more 0's,
end with exactly one 1.
The string "1", because it starts with zero 0 and ends with one 1, is indeed accepted.
What if we had [...]? Would the string "1" be accepted?
Yes. Here is a nicer diagram of your second NFA:
An equivalent DFA would be:
In fact, it's easy to see that "1" is the only accepted string, here.
I'm implementing the automatic construction of an LALR parse table for no reason at all. There are two flavors of this parser, LALR(0) and LALR(1), where the number signifies the amount of look-ahead.
I have gotten myself confused on what look-ahead means.
If my input stream is 'abc' and I have the following production, would I need 0 look-ahead, or 1?
P :== a E
Same question, but I can't choose the correct P production in advance by only looking at the 'a' in the input.
P :== a b E
| a b F
I'm additionally confused because I don't think the latter P-productions really come up when building an LALR parser generator. The reason is that the grammar is effectively left-factored automatically as we compute the closures.
I was working through this page and was OK until I got to the FIRST/FOLLOW section. My issue here is that I don't know why we are calculating these things, so I am having trouble forming an abstract picture of it in my head.
I almost get the idea that the look-ahead is not related to shifting input, but rather to deciding when to reduce.
I've been reading the Dragon book, but it is about as linear as a Tarantino script. It seems like a great reference for people who already know how to do this.
The first thing you need to do when learning about bottom-up parsing (such as LALR) is to remember that it is completely different from top-down parsing. Top-down parsing starts with a nonterminal, the left-hand-side (LHS) of a production, and guesses which right-hand-side (RHS) to use. Bottom-up parsing, on the other hand, starts by identifying the RHS and then figures out which LHS to select.
To be more specific, a bottom-up parser accumulates incoming tokens into a queue until a right-hand side is at the right-hand end of the queue. Then it reduces that RHS by replacing it with the corresponding LHS, and checks to see whether an appropriate RHS is at the right-hand edge of the modified accumulated input. It keeps on doing that until it decides that no more reductions will take place at that point in the input, and then reads a new token (or, in other words, takes the next input token and shifts it onto the end of the queue.)
This continues until the last token is read and all possible reductions are performed, at which point if what remains is the single non-terminal which is the "start symbol", it accepts the parse.
It is not obligatory for the parser to reduce a RHS just because it appears at the end of the current queue, but it cannot reduce a RHS which is not at the end of the queue. That means that it has to decide whether to reduce or not before it shifts any other token. Since the decision is not always obvious, it may examine one or more tokens which it has not yet read ("lookahead tokens", because it is looking ahead into the input) in order to decide. But it can only look at the next k tokens for some value of k, typically 1.
Here's a very simple example; a comma separated list:
1. Start -> List
2. List -> ELEMENT
3. List -> List ',' ELEMENT
Let's suppose the input is:
ELEMENT , ELEMENT , ELEMENT
At the beginning, the input queue is empty, and since no RHS is empty the only alternative is to shift:
queue                   remaining input              action
----------------------  ---------------------------  ------
                        ELEMENT , ELEMENT , ELEMENT  SHIFT
At the next step, the parser decides to reduce using production 2:
ELEMENT                 , ELEMENT , ELEMENT          REDUCE 2
Now there is a List at the end of the queue, so the parser could reduce using production 1, but it decides not to based on the fact that it sees a , in the incoming input. This goes on for a while:
List                    , ELEMENT , ELEMENT          SHIFT
List ,                  ELEMENT , ELEMENT            SHIFT
List , ELEMENT          , ELEMENT                    REDUCE 3
List                    , ELEMENT                    SHIFT
List ,                  ELEMENT                      SHIFT
List , ELEMENT          --                           REDUCE 3
Now the lookahead token is the "end of input" pseudo-token. This time, it does decide to reduce:
List                    --                           REDUCE 1
Start                   --                           ACCEPT
and the parse is successful.
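If it helps to see that loop in code, here is a minimal Python sketch of the same shift-reduce process. It is not a table-driven LALR parser; it simply hard-codes the two decisions described above: try the longer right-hand side first, and only reduce to Start when the lookahead is the end-of-input pseudo-token '$':

# Grammar: 1. Start -> List   2. List -> ELEMENT   3. List -> List ',' ELEMENT
RULES = {1: ('Start', ['List']),
         2: ('List', ['ELEMENT']),
         3: ('List', ['List', ',', 'ELEMENT'])}

def parse(tokens):
    queue, pos = [], 0
    while True:
        lookahead = tokens[pos] if pos < len(tokens) else '$'
        # Try longer right-hand sides first so rule 3 wins over rule 2.
        for n in (3, 2, 1):
            lhs, rhs = RULES[n]
            # Reduce only if the RHS sits at the right end of the queue, and
            # never reduce to Start unless the lookahead is '$'
            # (',' is not in FOLLOW(Start)).
            if queue[-len(rhs):] == rhs and not (lhs == 'Start' and lookahead != '$'):
                queue[-len(rhs):] = [lhs]
                print('REDUCE', n, '->', ' '.join(queue))
                break
        else:
            if pos == len(tokens):
                return queue == ['Start']
            queue.append(tokens[pos]); pos += 1
            print('SHIFT    ->', ' '.join(queue))

print(parse(['ELEMENT', ',', 'ELEMENT', ',', 'ELEMENT']))  # True

Running it prints the same sequence of SHIFT and REDUCE steps as the trace above.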
That still leaves a few questions. To start with, how do we use the FIRST and FOLLOW sets?
As a simple answer, the FOLLOW set of a non-terminal cannot be computed without knowing the FIRST sets for the non-terminals which might follow that non-terminal. And one way we can decide whether or not a reduction should be performed is to see whether the lookahead is in the FOLLOW set for the target non-terminal of the reduction; if not, the reduction certainly should not be performed. That algorithm is sufficient for the simple grammar above, for example: the reduction of Start -> List is not possible with a lookahead of ',', because ',' is not in FOLLOW(Start). Grammars whose only conflicts can be resolved in this way are SLR grammars (where S stands for "Simple", which it certainly is).
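To make the FOLLOW computation concrete, here is a short Python sketch for the toy grammar above (the encoding is mine; since the grammar has no ε-productions, the usual fixed-point algorithm becomes quite short). It reports that ',' is in FOLLOW(List) but not in FOLLOW(Start), which is exactly what licenses the REDUCE 3 steps and forbids a premature REDUCE 1:

# Toy grammar from the example (no ε-productions, which keeps things short).
GRAMMAR = [('Start', ['List']),
           ('List',  ['ELEMENT']),
           ('List',  ['List', ',', 'ELEMENT'])]
NT = {lhs for lhs, _ in GRAMMAR}

# FIRST sets by fixed-point iteration: a terminal is its own FIRST, and
# with no ε-productions only the first RHS symbol contributes.
FIRST = {x: ({x} if x not in NT else set()) for _, rhs in GRAMMAR for x in rhs}
FIRST.update({lhs: set() for lhs in NT})
changed = True
while changed:
    changed = False
    for lhs, rhs in GRAMMAR:
        if not FIRST[rhs[0]] <= FIRST[lhs]:
            FIRST[lhs] |= FIRST[rhs[0]]
            changed = True

# FOLLOW sets: '$' follows the start symbol; for A -> ... B x ..., FIRST(x)
# flows into FOLLOW(B); for A -> ... B (B at the end), FOLLOW(A) flows in.
FOLLOW = {a: ({'$'} if a == 'Start' else set()) for a in NT}
changed = True
while changed:
    changed = False
    for lhs, rhs in GRAMMAR:
        for i, sym in enumerate(rhs):
            if sym in NT:
                new = FIRST[rhs[i + 1]] if i + 1 < len(rhs) else FOLLOW[lhs]
                if not new <= FOLLOW[sym]:
                    FOLLOW[sym] |= new
                    changed = True

print(FOLLOW)  # {'Start': {'$'}, 'List': {',', '$'}}  (set order may vary)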
For most grammars, that is not sufficient, and more analysis has to be performed. It is possible that a symbol might be in the FOLLOW set of a non-terminal, but not in the context which led to the current stack configuration. In order to determine that, we need to know more about how we got to the current configuration; the various possible analyses lead to LALR, IELR and canonical LR parsing, amongst other possibilities.
I have read this to better understand the difference between top-down and bottom-up parsing. Can anyone explain the problems associated with left recursion in a top-down parser?
In a top-down parser, the parser begins with the start symbol and tries to guess which productions to apply to reach the input string. To do so, top-down parsers need to use contextual clues from the input string to guide their guesswork.
Most top-down parsers are directional parsers, which scan the input in some direction (typically, left to right) when trying to determine which productions to guess. The LL(k) family of parsers is one example of this - these parsers use information about the next k symbols of input to determine which productions to use.
Typically, the parser uses the next few tokens of input to guess productions by looking at which productions can ultimately lead to strings that start with the upcoming tokens. For example, if you had the production
A → bC
you wouldn't choose to use this production unless the next character to match was b. Otherwise, you'd be guaranteed there was a mismatch. Similarly, if the next input character was b, you might want to choose this production.
So where does left recursion come in? Well, suppose that you have these two productions:
A → Ab | b
This grammar generates all strings of one or more copies of the character b. If you see a b in the input as your next character, which production should you pick? If you choose Ab, then you're assuming there are multiple b's ahead of you even though you're not sure this is the case. If you choose b, you're assuming there's only one b ahead of you, which might be wrong. In other words, if you have to pick one of the two productions, you can't always choose correctly.
The issue with left recursion is that if you have a nonterminal that's left-recursive and find a string that might match it, you can't necessarily know whether to use the recursion to generate a longer string or avoid the recursion and generate a shorter string. Most top-down parsers will either fail to work for this reason (they'll report that there's some uncertainty about how to proceed and refuse to parse), or they'll potentially use extra memory to track each possible branch, running out of space.
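You can watch that failure happen. Here's a tiny Python sketch of a naive recursive-descent parser for A → Ab | b: trying the left-recursive alternative first means parse_A calls itself at the same input position without consuming anything, so it recurses forever (Python cuts it off with a RecursionError):

import sys
sys.setrecursionlimit(100)  # fail fast instead of exhausting the stack

def parse_A(s, pos):
    # Alternative A -> A b: recurse *before* consuming any input.
    end = parse_A(s, pos)   # same position as before: infinite descent
    if end is not None and end < len(s) and s[end] == 'b':
        return end + 1
    # Alternative A -> b
    if pos < len(s) and s[pos] == 'b':
        return pos + 1
    return None

try:
    parse_A('bbb', 0)
except RecursionError:
    print('stuck: parse_A re-entered at position 0 without consuming input')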
In short, top-down parsers usually try to guess what to do from limited information about the string. Because of this, they get confused by left recursion because they can't always accurately predict which productions to use.
Hope this helps!
Reasons
1) A grammar that is left-recursive (directly or indirectly) cannot be converted into Greibach normal form (GNF) as it stands, so the left recursion has to be eliminated first, rewriting it in a right-recursive form.
2) Left-recursive grammars are also not LL(1), so eliminating the left recursion may yield an LL(1) grammar.
GNF
A grammar whose productions have the form A -> aV (a single terminal followed by a string of nonterminals) is in Greibach normal form.
Can you check on this: https://dl.dropbox.com/u/25439537/finite%20automata.png
This homework has already been marked, so don't worry. I just want to clarify whether my answer is correct or not, because my teacher marked it as incorrect.
My answer is ((a+b)(a+b))*a
The first (a+b) signifies the upper arrows. The second (a+b) signifies the lower arrows. The last 'a' tells us that it should always end in 'a'.
I just want to collect evidence from a number of experts so that I can show it to my teacher.
I believe your answer is correct.
Let's consider the whole process as two parts: (1) start at the start state and return to it; and (2) go from start to end and accept. Obviously, part (1) is a loop.
For (1), starting at start, we first read either b or a. For b, the path back is b(a+b); for a, it's a(a+b). So (1) is b(a+b) + a(a+b), which is (a+b)(a+b).
For (2), it's 'a'.
So, the final result is (loop in (1))* (2) i.e. ( (a+b)(a+b) )* a.
Following the description above, you can also come up with a proof of the equivalence between the two. Proof part (a): every string accepted by the automaton is in the set ((a+b)(a+b))*a; part (b): every string in the set ((a+b)(a+b))*a is accepted by the automaton.
Your answer is wrong, because it doesn't provide for strings starting with b.
The path (start) -> b -> a+b -> a -> (end) is accepted by your finite automaton, but not by your regex. The simplest counterexample to your answer being correct is the regex's rejection of the string "baba".
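For what it's worth, you can check mechanically which strings the proposed regular expression matches. In Python's re syntax, (a+b) becomes (a|b):

import re

# The proposed answer ((a+b)(a+b))*a, written in Python regex syntax.
answer = re.compile(r'((a|b)(a|b))*a')

for s in ['a', 'aba', 'bba', 'baba']:
    print(s, bool(answer.fullmatch(s)))
# a True, aba True, bba True, baba False -- it matches only odd-length strings.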
By the way, if the teacher gave you that automaton without the "end" state having two concentric circles (to indicate an accepting state), it was probably a trick question. Having no accepting state means your automaton rejects everything. The best way to describe that would be to just write down {} (the empty set).