NPDA for this language with n and n-1

To draw the transition graph of an NPDA that accepts L, I think one could start by reading an a, writing an a, and moving right, then doing the same for b, so that some state q1 handles those "ab" moves. But then how is it possible to make the bb part come out to n-1?
I have something that I think works, but I'm teaching this to myself, so maybe someone can show me how the n and n-1 parts could be handled correctly.
EDIT:
I think this should be it now --

It's quite simple, actually. Each time you read "ab", you push a token onto the stack. Then, when you read "c", you pop one token. After that, for each "bb" you pop one more token. When the stack is empty, you accept.
For the NPDA itself, it depends on how you define acceptance. From your question, I don't understand what you mean by "writing an a, then moving right". Are you confusing an NPDA with a Turing machine? An NPDA is very similar to an NFA (nondeterministic finite automaton), except that it is equipped with a stack onto which you can push stack symbols. Read up more here: http://www.cs.duke.edu/~rodger/courses/cps140/lects/sectnpdaH.pdf
Below is the transition table, assuming stack symbols {A, Z}, where Z is the initial stack symbol. We accept when we are in the only accepting state qa and have finished reading the input tape. Assume that each transition always consumes one stack symbol, and that if there is no symbol left to consume while the NPDA is not in the accepting state, the NPDA rejects the string.
In the table below, the first column gives the state we are currently in together with the stack symbol we pop. Subsequent columns give the new state and the stack symbols pushed onto the stack upon reading a particular input character (or no character at all, for ε).
+--------++--------+---------+--------+--------+
|        ||   a    |    b    |   c    |   ε    |
+--------++--------+---------+--------+--------+
| (q1,Z) || (q2,Z) |    -    |   -    |   -    |
+--------++--------+---------+--------+--------+
| (q1,A) || (q2,A) |    -    | (q3,ε) |   -    |
+--------++--------+---------+--------+--------+
| (q2,Z) ||   -    | (q1,ZA) |   -    |   -    |
+--------++--------+---------+--------+--------+
| (q2,A) ||   -    | (q1,AA) |   -    |   -    |
+--------++--------+---------+--------+--------+
| (q3,Z) ||   -    |    -    |   -    | (qa,ε) |
+--------++--------+---------+--------+--------+
| (q3,A) ||   -    | (q4,A)  |   -    |   -    |
+--------++--------+---------+--------+--------+
| (q4,A) ||   -    | (q3,ε)  |   -    |   -    |
+--------++--------+---------+--------+--------+
The key idea here is that the number of A's in the stack represents how many times the phrase "ab" has appeared in the input string.
You can see that on reading "a" in state q1 with stack symbol Z, we push Z back onto the stack, meaning no "ab" has been completed yet, and move to q2.
In q2, we only accept "b"; any other input will cause the NPDA to hang (and hence reject). It pushes back the stack symbol it popped, plus an additional A, effectively increasing the number of A's in the stack by one, since this transition represents one more "ab" phrase in the input string.
The state q3 represents the part after successfully reading "c". Note that the transition from q1 to q3 must consume A, not Z, since we have the constraint n >= 1. Then, upon reading "b", it goes to state q4 to wait for another "b". Later, q4, upon reading "b", consumes the stack symbol A, matching one "bb" phrase against one "ab" phrase.
At state q3 we also want a transition to the accepting state qa once we reach stack symbol Z, corresponding to the point where we have seen n copies of "ab" and n-1 copies of "bb".
State q4 only accepts the input "b" with stack symbol A, reflecting that each newly found "bb" phrase must match one "ab" phrase found earlier.
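To sanity-check the table, here is a small Python simulation (a sketch in my own encoding, not part of the original construction): each transition pops the top stack symbol and pushes back the listed string, with its last character becoming the new top, and we explore every reachable configuration since the machine is nondeterministic.

# Transition table from above: (state, popped symbol, input char) -> moves.
# '' as the input char marks an epsilon-move.
DELTA = {
    ('q1', 'Z', 'a'): [('q2', 'Z')],
    ('q1', 'A', 'a'): [('q2', 'A')],
    ('q1', 'A', 'c'): [('q3', '')],
    ('q2', 'Z', 'b'): [('q1', 'ZA')],   # push Z back plus an A on top
    ('q2', 'A', 'b'): [('q1', 'AA')],
    ('q3', 'Z', ''):  [('qa', '')],
    ('q3', 'A', 'b'): [('q4', 'A')],
    ('q4', 'A', 'b'): [('q3', '')],
}

def accepts(w):
    frontier, seen = [('q1', 0, 'Z')], set()   # (state, input pos, stack)
    while frontier:
        config = frontier.pop()
        if config in seen:
            continue
        seen.add(config)
        state, i, stack = config
        if state == 'qa' and i == len(w):
            return True                  # accepting state, input exhausted
        if not stack:
            continue                     # nothing left to pop: branch dies
        top, rest = stack[-1], stack[:-1]
        for nxt, push in DELTA.get((state, top, ''), []):
            frontier.append((nxt, i, rest + push))
        if i < len(w):
            for nxt, push in DELTA.get((state, top, w[i]), []):
                frontier.append((nxt, i + 1, rest + push))
    return False

for s in ['abc', 'ababcbb', 'ababc', 'abcbb']:
    print(s, accepts(s))                 # True, True, False, False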
Hope this helps!

Related

LR(1) - How do I know how many items to pop off the node stack when there are epsilon productions?

Suppose I have this simple grammar (labeled):
1 || S' -> A;
2 || A -> a B C D z;
3 || B -> b E;
4 || E -> e | ;
5 || C -> c | ;
6 || D -> d | ;
I can construct the LR(1) parsing table, but I don't fully understand how to parse it. You can take a look at the table here. Suppose the input is:
a b z
If you look at production #2, you'll match a; then, in production #3, b; but then, at the end of production #3, there is a nullable production. The same applies at the end of production #2 (except for the z). When building the node stack, I have to pop off the number of symbols in the RHS of the production I am reducing, but how will I know how many when some of the symbols are nullable?
A symbol is a symbol and an empty right-hand side is empty. It's as simple as that.
If you are reducing an empty right-hand side, there are zero symbols and so you pop zero things off the stack.
If you are reducing a right-hand side with three symbols, there are three symbols and you need to pop three symbols off the stack (and associated states). It's irrelevant how many tokens each symbol on the right-hand side was associated with, because it has been reduced to one symbol, and that symbol was pushed onto the stack by the GOTO action.
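A minimal sketch of that reduce step in Python (with hypothetical Node and goto-table structures, not any particular generator's API) makes the rule concrete: the pop count is simply the length of the production's right-hand side, so an epsilon production pops zero entries.

class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

def reduce(stack, lhs, rhs, goto):
    # stack holds (state, node) pairs; rhs is the right-hand side being reduced.
    n = len(rhs)                          # 0 for an epsilon production like E -> ε
    children = [node for _, node in stack[len(stack) - n:]]
    del stack[len(stack) - n:]            # pop exactly n entries, even when n == 0
    exposed_state, _ = stack[-1]          # state uncovered after popping
    stack.append((goto[exposed_state][lhs], Node(lhs, children)))

For the input a b z, reducing E -> ε pops nothing and pushes a childless E node; reducing B -> b E then pops exactly two entries (the b and the E), no matter how many input tokens each of them originally covered.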

Rules to convert a CFG to an NPDA?

I have to define an FA by using this grammar:
S -> aSb
S -> c
S -> dA
A -> Sd
How do I manage the first rule and the last one?
For the second one I think I have to create another state (the final one) and link S and this new state. For the third one instead, I think I have to create the state "A" and link it to S by passing "d".
There are algorithms you can use to get a PDA from a CFG: look into top-down and bottom-up parsers, for instance. What I think of as the usual proof that PDAs accept languages generated by CFGs, and vice versa, uses such a construction.
An alternative is to understand the language generated by the grammar and to design a PDA for it directly. This is less mechanical, but has the potential to yield a more concise PDA. If you want to go this route, we can first simplify the grammar by recognizing that the nonterminal A can safely be replaced by the RHS of the only production for it:
S -> aSb
S -> c
S -> dSd // removed A -> Sd and replaced here
How does this grammar work?
You get a c in the middle from the 2nd production;
You get matching d's on the left and right of the c;
You get a's on the left matching b's on the right of the c.
A PDA should work as follows:
Read a's and d's until you see a c, pushing everything on the stack as you go. When you see the c, go to the next state, but don't push the c.
Read b's and d's, popping a's and d's from the stack, until one of the following happens:
The topmost stack symbol doesn't match input; crash.
You run out of input with symbols still on the stack; crash.
You run out of stack symbols with input remaining; crash.
You run out of stack and input simultaneously; accept.
Here's a transition table (s is the symbol popped, s' what is pushed back written top-first, with the options in s and s' aligned positionally):

q    s       x   q'   s'
--------------------------------
q0   a,d,Z   a   q0   aa,ad,aZ
q0   a,d,Z   d   q0   da,dd,dZ
q0   a,d,Z   c   q1   a,d,Z
q1   a       b   q1   -
q1   d       d   q1   -
If we accept in q1 by empty stack, these transitions are enough. If we want to accept by empty stack or accepting state, we could add a transition like f(q1, Z, -) = (q2, Z) and make q2 accepting; the PDA would transition there nondeterministically and would crash unless the input were also exhausted.
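Since this particular PDA is deterministic, a few lines of Python can check the construction (a sketch of the table above, accepting by empty stack and exhausted input):

def accepts(w):
    stack, i = [], 0
    # State q0: push a's and d's until we read a c (which is not pushed).
    while i < len(w) and w[i] in 'ad':
        stack.append(w[i])
        i += 1
    if i == len(w) or w[i] != 'c':
        return False                     # no c where expected: crash
    i += 1
    # State q1: b pops a matching a, d pops a matching d.
    match = {'b': 'a', 'd': 'd'}
    while i < len(w):
        if w[i] not in match or not stack or stack[-1] != match[w[i]]:
            return False                 # mismatch, or stack ran out early: crash
        stack.pop()
        i += 1
    return not stack                     # accept iff stack and input empty together

for s in ['c', 'acb', 'dcd', 'dacbd', 'adcdb', 'acd', 'ab']:
    print(s, accepts(s))                 # True for the first five, False for the rest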

Can a table-based LL parser handle repetition without right-recursion?

I understand how an LL recursive descent parser can handle rules of this form:
A = B*;
with a simple loop that checks whether to continue looping or not based on whether the lookahead token matches a terminal in the FIRST set of B. However, I'm curious about table-based LL parsers: how can rules of this form work there? As far as I know, the only way to handle repetition like this in one is through right recursion, but that messes up associativity in cases where a right-associative parse tree is not desired.
I'd like to know because I'm currently attempting to write an LL(1) table-based parser generator and I'm not sure how to handle a case like this without changing the intended parse tree shape.
The Grammar
Let's expand your EBNF grammar to simple BNF and assume that b is a terminal and <e> is the empty string:
A -> X
X -> BX
X -> <e>
B -> b
This grammar produces strings of terminal b's of any length.
The LL(1) Table
To construct the table, we will need to generate the first and follow sets (constructing an LL(1) parsing table).
First sets
First(α) is the set of terminals that begin strings derived from any string of grammar symbols α.
First(A) : b, <e>
First(X) : b, <e>
First(B) : b
Follow sets
Follow(A), for a nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form.
Follow(A) : $
Follow(X) : $
Follow(B) : b, $
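These sets can also be computed mechanically by fixpoint iteration. Here is a short Python sketch for exactly this grammar (my own encoding: '' stands for <e>, and each RHS is a list of symbols):

GRAMMAR = {
    'A': [['X']],
    'X': [['B', 'X'], []],   # [] is the production X -> <e>
    'B': [['b']],
}
TERMINALS = {'b'}

first = {nt: set() for nt in GRAMMAR}

def first_of(seq):
    # FIRST of a string of symbols; contains '' iff every symbol is nullable.
    out = set()
    for sym in seq:
        if sym in TERMINALS:
            out.add(sym)
            return out
        out |= first[sym] - {''}
        if '' not in first[sym]:
            return out
    out.add('')
    return out

changed = True
while changed:
    changed = False
    for nt, prods in GRAMMAR.items():
        for rhs in prods:
            new = first_of(rhs) - first[nt]
            if new:
                first[nt] |= new
                changed = True

follow = {nt: set() for nt in GRAMMAR}
follow['A'].add('$')          # the end marker follows the start symbol
changed = True
while changed:
    changed = False
    for nt, prods in GRAMMAR.items():
        for rhs in prods:
            for k, sym in enumerate(rhs):
                if sym not in GRAMMAR:
                    continue
                tail = first_of(rhs[k + 1:])
                new = (tail - {''}) | (follow[nt] if '' in tail else set())
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True

print(first)    # {'A': {'b', ''}, 'X': {'b', ''}, 'B': {'b'}}
print(follow)   # {'A': {'$'}, 'X': {'$'}, 'B': {'b', '$'}}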
Table
We can now construct the table based on the sets, $ is the end of input marker.
+---+---------+----------+
| | b | $ |
+---+---------+----------+
| A | A -> X | A -> X |
| X | X -> BX | X -> <e> |
| B | B -> b | |
+---+---------+----------+
The parser action always depends on the top of the parse stack and the next input symbol.
Terminal on top of the parse stack:
Matches the input symbol: pop stack, advance to the next input symbol
No match: parse error
Nonterminal on top of the parse stack:
Parse table contains production: apply production to stack
Cell is empty: parse error
$ on top of the parse stack:
$ is the input symbol: accept input
$ is not the input symbol: parse error
Sample Parse
Let us analyze the input bb. The initial parse stack contains the start symbol and the end marker A $.
+-------+-------+-----------+
| Stack | Input | Action |
+-------+-------+-----------+
| A $ | bb$ | A -> X |
| X $ | bb$ | X -> BX |
| B X $ | bb$ | B -> b |
| b X $ | bb$ | consume b |
| X $ | b$ | X -> BX |
| B X $ | b$ | B -> b |
| b X $ | b$ | consume b |
| X $ | $ | X -> <e> |
| $ | $ | accept |
+-------+-------+-----------+
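The whole loop fits in a few lines of Python. This sketch (my own encoding, with the table cells stored as RHS lists) drives the table directly and reproduces the trace above:

TABLE = {
    ('A', 'b'): ['X'], ('A', '$'): ['X'],
    ('X', 'b'): ['B', 'X'], ('X', '$'): [],   # X -> <e>
    ('B', 'b'): ['b'],
}

def parse(s):
    tokens = list(s) + ['$']
    stack = ['$', 'A']                     # top of stack is the end of the list
    i = 0
    while True:
        top, look = stack[-1], tokens[i]
        if top == look == '$':
            return True                    # accept
        if top == look:                    # terminal matches: consume it
            stack.pop()
            i += 1
        elif (top, look) in TABLE:         # nonterminal: apply production
            rhs = TABLE[(top, look)]
            stack.pop()
            stack.extend(reversed(rhs))    # push RHS, leftmost symbol on top
        else:
            return False                   # empty cell or mismatch: parse error

print(parse('bb'))    # True
print(parse('a'))     # False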
Conclusion
As you can see, rules of the form A = B* can be parsed without problems. The resulting concrete parse tree for input bb would be:

A
`-- X
    |-- B -- b
    `-- X
        |-- B -- b
        `-- X -- <e>
Yes, this is definitely possible. The standard method of rewriting to BNF and constructing a parse table is useful for figuring out how the parser should work, but as far as I can tell, what you're asking is how you can avoid the recursive part, which would otherwise give you the slanted binary tree/linked list form of AST.
If you're hand-coding the parser, you can simply use a loop, using the lookaheads from the parse table that indicate a recursive call to decide to go around the loop once more. (I.e., you could just use while with those lookaheads as the condition.) Then for each iteration, you simply append the constructed subtree as a child of the current parent. In your case, then, A would get several direct B-children.
Now, as I understand it, you're building a parser generator, and it might be easiest to follow the standard procedure, going via plain BNF. However, that's not really an issue; there is no substantive difference between iteration and recursion, after all. You simply have to have a class of "helper rules" that don't introduce new AST nodes, but instead append their result to the node of the nonterminal that triggered them. So when turning the repetition into X -> BX, rather than constructing X nodes, you have your X rule extend the child list of the A or X (whichever triggered it) by its own children. You'll still end up with A having several B children, and no X nodes in sight. A sketch of the idea follows.
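Here it is as hypothetical hand-rolled Python (the same trick works in generated code): the X rule loops on the very lookaheads that select X -> BX in the table, and appends directly to its parent's child list instead of creating X nodes.

class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []

def parse_A(tokens, i):
    a = Node('A')
    i = parse_X(tokens, i, parent=a)   # X is a helper rule: no X node is made
    return a, i

def parse_X(tokens, i, parent):
    # Loop instead of recursing; 'b' is the lookahead that selects X -> BX.
    while i < len(tokens) and tokens[i] == 'b':
        b = Node('B')
        b.children.append(Node('b'))
        parent.children.append(b)      # extend the parent's child list
        i += 1
    return i                           # anything else selects X -> <e>

tree, _ = parse_A(list('bb'), 0)
print([c.symbol for c in tree.children])   # ['B', 'B'] - flat, no X nodes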

Eliminating Left Recursion

So I have some grammar that doesn't work for a top-down parser due to it having left recursion:
L::= A a | B b
A::= L b | a a
B::= b B b | b a
So in order to fix this, I have to remove the left recursion. To do this, I do a little substitute-like-thing:
L::= A a | B b
A::= A a b | B b b | a a (I plugged in the possible values of "L")
A then turns to (I believe):
A::= a a A' | B b b A'
A'::= a b A' | ε
I'm fairly certain that I'm correct up to there (though I wouldn't be surprised if I'm not). Where I'm struggling is in removing the left recursion from "B b b". I've tried going about this so many ways, and I don't think any of them work. Here's one that seems the most logical, but ugly as well (which suggests it's probably wrong). I start by manipulating B::= b B b | b a:
B::= b B b | b a
B::= b B' (they both start with b, so maybe I can pull it out?)
B'::= B b | a
B'::= b B' b | a (substituted B's value in)
B'::= b B" | a
B"::= b B" |a B" | ε
So I guess to show what the finalized B's would be:
B::= b B'
B'::= b B" | a
B"::= b B" | a B" | ε
This seems way too ugly to be correct, especially since I'd have to plug it into the new "A" productions that I created.
Can someone help me out? No idea if I'm going about this the right way. I'm supposed to be able to create an LL(1) parse table afterward (should be able to do that part on my own).
Thanks.
In a parser that tries to expand nonterminals from the left, if some nonterminal can expand to a string with itself on the left, the parser may expand that nonterminal forever without actually parsing anything. This is left recursion. On the other hand, it is perfectly fine if a nonterminal expands to a string with some different nonterminal on the left, as long as no chain of expansions produces a string with the original nonterminal on the left. Similarly, it is fine if nonterminals aren't all the way to the right in the expansion, as long as they're not all the way to the left.
tl;dr There's nothing wrong with B b b or b B b. You've removed all the left recursion. You don't need to keep going.
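For reference, the standard transformation being applied here replaces A -> A a1 | ... | A an | b1 | ... | bm with A -> b1 A' | ... | bm A' and A' -> a1 A' | ... | an A' | ε. A short Python sketch (my own helper, with each RHS as a list of symbols) applied to the A from the question:

def eliminate_immediate(nt, productions):
    # Split the RHSs into left-recursive ones (A -> A alpha) and the rest.
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: productions}       # no immediate left recursion: done
    prime = nt + "'"
    return {
        nt: [rhs + [prime] for rhs in others],
        prime: [rhs + [prime] for rhs in recursive] + [[]],   # [] is epsilon
    }

# A -> A a b | B b b | a a  from the question:
print(eliminate_immediate('A', [['A', 'a', 'b'], ['B', 'b', 'b'], ['a', 'a']]))
# {'A': [['B', 'b', 'b', "A'"], ['a', 'a', "A'"]],
#  "A'": [['a', 'b', "A'"], []]}

Note that B b b A' keeps B on the left, which is fine: B never derives a string beginning with A, so there is no left-recursive cycle through it.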

Shift-reduce: when to stop reducing?

I'm trying to learn about shift-reduce parsing. Suppose we have the following grammar, using recursive rules that enforce order of operations, inspired by the ANSI C Yacc grammar:
S: A;
P
: NUMBER
| '(' S ')'
;
M
: P
| M '*' P
| M '/' P
;
A
: M
| A '+' M
| A '-' M
;
And we want to parse 1+2 using shift-reduce parsing. First, the 1 is shifted as a NUMBER. My question is, is it then reduced to P, then M, then A, then finally S? How does it know where to stop?
Suppose it does reduce all the way to S, then shifts '+'. We'd now have a stack containing:
S '+'
If we shift '2', the reductions might be:
S '+' NUMBER
S '+' P
S '+' M
S '+' A
S '+' S
Now, on the last line, the S on either side of the '+' could just as well have been left as P, M, A, or NUMBER, and any combination would still be a correct representation of the text. How does the parser "know" to make it
A '+' M
So that it can reduce the whole expression to A, then S? In other words, how does it know to stop reducing before shifting the next token? Is this a key difficulty in LR parser generation?
Edit: An addition to the question follows.
Now suppose we parse 1+2*3. Some shift/reduce operations are as follows:
Stack    | Input | Operation
---------+-------+----------------------------------------------
         | 1+2*3 |
NUMBER   | +2*3  | Shift
A        | +2*3  | Reduce (looking ahead, we know to stop at A)
A+       | 2*3   | Shift
A+NUMBER | *3    | Shift (looking ahead, we know to stop at M)
A+M      | *3    | Reduce (looking ahead, we know to stop at M)
Is this correct (granted, it's not fully parsed yet)? Moreover, does lookahead by 1 symbol also tell us not to reduce A+M to A, as doing so would result in an inevitable syntax error after reading *3 ?
The problem you're describing is an issue with creating LR(0) parsers - that is, bottom-up parsers that don't do any lookahead beyond the symbol they are currently processing. The grammar you've described doesn't appear to be an LR(0) grammar, which is why you run into trouble when trying to parse it without lookahead. It does appear to be LR(1), however, so by looking one symbol ahead in the input you could easily determine whether to shift or reduce. In this case, an LR(1) parser would look ahead when it had the 1 on the stack, see that the next symbol is a +, and realize that it shouldn't reduce past A (since that is the only thing it could reduce to that would still match a rule with + in the second position).
An interesting property of LR grammars is that for any grammar which is LR(k) for k>1, it is possible to construct an LR(1) grammar which is equivalent. However, the same does not extend all the way down to LR(0) - there are many grammars which cannot be converted to LR(0).
See here for more details on LR(k)-ness:
http://en.wikipedia.org/wiki/LR_parser
I'm not exactly sure of the Yacc/Bison parsing algorithm and when it prefers shifting over reducing; however, I know that Bison supports LR(1) parsing, which means it has a lookahead token. This means that tokens aren't pushed onto the stack immediately; rather, the parser waits until no more reductions can happen. Then, if shifting the next token makes sense, it applies that operation.
First of all, in your case, if you're evaluating 1 + 2, it will shift 1. It will reduce that token to an A because the '+' lookahead token indicates that this is the only valid course. Since there are no more reductions, it will shift the '+' token onto the stack and hold 2 as the lookahead. It will then shift the 2 and reduce it to an M, since A + M produces an A and the expression is complete.
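Bison's real decision comes from its LR tables, but the effect of that one-token lookahead is easy to see in a small operator-precedence sketch (my own simplification for this expression grammar, not Bison's algorithm): before reducing, peek at the next token, and only reduce while the operator already on the stack binds at least as tightly as the lookahead.

import operator

PREC = {'+': 1, '-': 1, '*': 2, '/': 2, '$': 0}
APPLY = {'+': operator.add, '-': operator.sub,
         '*': operator.mul, '/': operator.truediv}

def parse(tokens):
    tokens = list(tokens) + ['$']
    values, ops = [], []                 # operand stack and operator stack
    i = 0
    while True:
        values.append(int(tokens[i]))    # shift a NUMBER
        i += 1
        look = tokens[i]
        # Reduce only while the stacked operator outranks the lookahead:
        # with A+M on the stack and * ahead, this is what stops the reduce.
        while ops and PREC[ops[-1]] >= PREC[look]:
            op, b, a = ops.pop(), values.pop(), values.pop()
            values.append(APPLY[op](a, b))
        if look == '$':
            return values[0]
        ops.append(look)                 # shift the operator
        i += 1

print(parse('1+2*3'))   # 7: 2*3 reduces first, then 1+6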
