Normalizing structural differences in grammars - parsing

Consider the following grammar:
S → A | B
A → xy
B → xyz
This is what I think an LR(0) parser would do given the input xyz:
| xyz → shift
x | yz → shift
xy | z → reduce
A | z → shift
Az | → fail
If my assumption is correct and we changed rule B to read:
B → Az
now the grammar suddenly becomes acceptable by an LR(0) parser. I presume this new grammar describes exactly the same set of strings as the first grammar in this question.
What are the differences between the first and second grammars?
How do we decouple structural differences in our grammars from the language they describe?
Through normalization?
What kind of normalization?
To further clarify:
I want to describe a language to a parser without the structure of the grammar playing a role. I'd like to obtain the minimal, most fundamental description of a set of strings. For LR(k) grammars, I'd like to minimize the k.
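The presumption that both grammars describe the same set of strings can be checked by brute force. Neither grammar is recursive, so both languages are finite. The Python sketch below enumerates every string derivable up to a length bound using leftmost derivations. The helper `language` and the dict encoding are illustrative and do not come from the question. The sketch assumes the grammar has no empty productions, which holds here: sentential forms then never shrink, so over-long forms can be pruned.

```python
def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Assumes no epsilon productions, so sentential forms never get shorter
    and any form longer than max_len can be discarded."""
    results, seen = set(), set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        # position of the leftmost nonterminal, if any
        index = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if index is None:
            results.add("".join(form))
            continue
        for rhs in grammar[form[index]]:
            stack.append(form[:index] + tuple(rhs) + form[index + 1:])
    return results


# The two grammars from the question; terminals are the symbols that are not keys.
ORIGINAL = {"S": [["A"], ["B"]], "A": [["x", "y"]], "B": [["x", "y", "z"]]}
REWRITTEN = {"S": [["A"], ["B"]], "A": [["x", "y"]], "B": [["A", "z"]]}

print(sorted(language(ORIGINAL, "S", 5)))   # ['xy', 'xyz']
print(sorted(language(REWRITTEN, "S", 5)))  # ['xy', 'xyz']
```

Both grammars yield {xy, xyz}. The difference between them is therefore purely structural (the shape of the parse trees), not a difference in the language.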

I think your LR(0) parser is not a standard parser:
Source
An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose - either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).
So, when you have:
S->A|B
A->xy
B->Az
or
S->A|B
A->xy
B->xyz
LR(0) will never check the B rule, and for both of them it will fail.
State0 - Closure(S->°A):
S->°A
A->°xy
Arcs:
0 --> x --> 2
0 --> A --> 1
-------------------------
State1 - Goto(State0,A):
S->A°
Arcs:
1 --> $ --> Accept
-------------------------
State2 - Goto(State0,x):
A->x°y
Arcs:
2 --> y --> 3
-------------------------
State3 - Goto(State2,y):
A->xy°
Arcs:
-------------------------
But if you have
I->S
S->A|B
A->xy
B->xyz or B->Az
Both of them will accept xyz, but in different states:
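[B->xyz]
State0 - Closure(I->°S):
I->°S
S->°A
S->°B
A->°xy
B->°xyz
Arcs:
0 --> x --> 4
0 --> S --> 1
0 --> A --> 2
0 --> B --> 3
-------------------------
State1 - Goto(State0,S):
I->S°
Arcs:
1 --> $ --> Accept
-------------------------
State2 - Goto(State0,A):
S->A°
Arcs:
-------------------------
State3 - Goto(State0,B):
S->B°
Arcs:
-------------------------
State4 - Goto(State0,x):
A->x°y
B->x°yz
Arcs:
4 --> y --> 5
-------------------------
State5 - Goto(State4,y):
A->xy°
B->xy°z
Arcs:
5 --> z --> 6
-------------------------
State6 - Goto(State5,z):
B->xyz°
Arcs:
-------------------------

[B->Az] (the symbols after the comma are the lookaheads; $z means $ or z)
State0 - Closure(I->°S):
I->°S
S->°A
S->°B
A->°xy, $z
B->°Az, $
Arcs:
0 --> x --> 4
0 --> S --> 1
0 --> A --> 2
0 --> B --> 3
-------------------------
State1 - Goto(State0,S):
I->S°
Arcs:
1 --> $ --> Accept
-------------------------
State2 - Goto(State0,A):
S->A°, $
B->A°z, $
Arcs:
2 --> z --> 5
-------------------------
State3 - Goto(State0,B):
S->B°
Arcs:
-------------------------
State4 - Goto(State0,x):
A->x°y, $z
Arcs:
4 --> y --> 6
-------------------------
State5 - Goto(State2,z):
B->Az°, $
Arcs:
-------------------------
State6 - Goto(State4,y):
A->xy°, $z
Arcs:
-------------------------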
You can see that the Goto Table and the Action Table are different.
[B->xyz]
  | Stack   | Input | Action
--+---------+-------+---------------
1 | 0       | xyz$  | Shift
2 | 0 4     | yz$   | Shift
3 | 0 4 5   | z$    | Shift
4 | 0 4 5 6 | $     | Reduce B->xyz
5 | 0 3     | $     | Reduce S->B
6 | 0 1     | $     | Accept

[B->Az]
  | Stack   | Input | Action
--+---------+-------+---------------
1 | 0       | xyz$  | Shift
2 | 0 4     | yz$   | Shift
3 | 0 4 6   | z$    | Reduce A->xy
4 | 0 2     | z$    | Shift
5 | 0 2 5   | $     | Reduce B->Az
6 | 0 3     | $     | Reduce S->B
7 | 0 1     | $     | Accept
Simply put, when you change B->xyz to B->Az you add an action to the parse (the extra Reduce A->xy step). To find the differences, compare the Action Table and the Goto Table (see "Constructing LR(0) parsing tables").
When you have A->xy and B->xyz, there are two bottom-level handles (xy or xyz). When you have B->Az, there is only one bottom-level handle (xy), which can then be followed by an additional z.
I think this is related to local optimization (c=a+b; d=a+b -> c=a+b; d=c): using B->Az is an optimized form of B->xyz.
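As a cross-check, here is a small Python sketch that builds the canonical LR(0) item sets for both augmented grammars. It reports every state in which a completed item sits next to an item that still needs to shift a terminal, which is a shift/reduce conflict for a parser with zero lookahead. The sketch is my own, not taken from the answer, and the names `lr0_states` and `lr0_conflicts` are illustrative. It finds exactly one such state in each grammar: {A->xy°, B->xy°z} for B->xyz and {S->A°, B->A°z} for B->Az. Under these assumptions, neither grammar is strictly LR(0), and the traces above depend on seeing the next symbol ($ or z) to choose between shifting and reducing.

```python
from collections import deque

# Augmented grammars; production 0 is the added start rule I -> S.
# Each production is (left-hand side, tuple of right-hand-side symbols).
GRAMMAR_XYZ = [("I", ("S",)), ("S", ("A",)), ("S", ("B",)),
               ("A", ("x", "y")), ("B", ("x", "y", "z"))]
GRAMMAR_AZ = [("I", ("S",)), ("S", ("A",)), ("S", ("B",)),
              ("A", ("x", "y")), ("B", ("A", "z"))]


def closure(items, grammar):
    """LR(0) closure; an item is (production index, dot position)."""
    nonterminals = {lhs for lhs, _ in grammar}
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = grammar[prod][1]
        if dot < len(rhs) and rhs[dot] in nonterminals:
            # add X -> .gamma for every production of the nonterminal after the dot
            for index, (lhs, _) in enumerate(grammar):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(state, symbol, grammar):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in state
             if d < len(grammar[p][1]) and grammar[p][1][d] == symbol}
    return closure(moved, grammar) if moved else None


def lr0_states(grammar):
    """Canonical collection of LR(0) item sets, discovered breadth-first."""
    symbols = sorted({s for _, rhs in grammar for s in rhs})
    start = closure({(0, 0)}, grammar)
    states, queue = [start], deque([start])
    while queue:
        state = queue.popleft()
        for symbol in symbols:
            target = goto(state, symbol, grammar)
            if target is not None and target not in states:
                states.append(target)
                queue.append(target)
    return states


def show(item, grammar):
    lhs, rhs = grammar[item[0]]
    return f"{lhs} -> {' '.join(rhs[:item[1]] + ('.',) + rhs[item[1]:])}"


def lr0_conflicts(grammar):
    """Item sets where a reduction competes with a shift or another reduction."""
    nonterminals = {lhs for lhs, _ in grammar}
    conflicts = []
    for state in lr0_states(grammar):
        # completed items, excluding the accept item I -> S .
        reduces = [i for i in state if i[1] == len(grammar[i[0]][1]) and i[0] != 0]
        shifts = [i for i in state if i[1] < len(grammar[i[0]][1])
                  and grammar[i[0]][1][i[1]] not in nonterminals]
        if reduces and (shifts or len(reduces) > 1):
            conflicts.append(sorted(show(i, grammar) for i in state))
    return conflicts


for name, g in [("B->xyz", GRAMMAR_XYZ), ("B->Az", GRAMMAR_AZ)]:
    print(name, len(lr0_states(g)), "states, conflicts:", lr0_conflicts(g))
```

Both grammars produce 7 states, matching the numbering State0 to State6 used above.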

Related

How do I classify when there's a tie for a K value in KNN?

So I had this question I was debating with a friend.
The question is: what should be the minimum value of K so that "Naeem" is classified as:
F
B
Here are the values and the distances I calculated from the matrix:
Name | A | B | C | Class| Distance from Naeem
--------|-------|-------|---|------|--------------------
'Kamran'| 35 | 35 | 3 | 'A' | 15.17
'Zahid' | 22 | 50 | 2 | 'B' | 15.0
'Imran' | 63 | 200 | 1 | 'C' | 152.24
'Azfer' | 59 | 170 | 1 | 'D' | 122.0
'Raza' | 25 | 40 | 4 | 'E' | 15.75
'Aamir' | 35 | 150 | 1 | 'A' | 100.02
'Zia' | 25 | 120 | 3 | 'B' | 71.03
'Ishrat'| 26 | 90 | 4 | 'C' | 41.53
'Khalid'| 40 | 60 | 2 | 'F' | 10.44
'Naeem' | 37 | 50 | 2 | ? |
Now we agree that for Naeem to be of class F, K will be 1.
However, when it comes to Naeem being of class B, he says it will be K=3, because that's the first time a class-B neighbour is included. I say that for classification we must not have a tie between classes, which K=3 brings (F, A, B). Instead we need K=4, so that we have two neighbours of class B, and since the majority wins, Naeem is classified as B only when K=4.
Any insights on who's correct, or are we both misunderstanding something?
In my view, for 'Naeem' to be classified as 'F', the value of K must be 1.
For "Naeem" to be classified as B, K must be a value at which B has more votes than any other class. This first happens when K is set to 6.
K=1 gives {F}
K=2 gives {F,B}
K=3 gives {F,B,A}
K=4 gives {F,B,A,E}
K=5 gives {F,B,A,E,C}
K=6 gives {F,B,A,E,C,B}
For K=6, every other class appears once and B appears twice, so 'Naeem' is classified as B.
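The vote counts can be checked with a few lines of Python. This sketch assumes plain Euclidean distance over the three feature columns, which reproduces the distances in the table (for Khalid, sqrt(3² + 10² + 0²) ≈ 10.44). It treats a tie between the leading classes as "no decision". `DATA`, `NAEEM` and `knn_vote` are illustrative names.

```python
import math
from collections import Counter

# (name, (A, B, C), class) rows from the question's table
DATA = [
    ("Kamran", (35, 35, 3), "A"),
    ("Zahid", (22, 50, 2), "B"),
    ("Imran", (63, 200, 1), "C"),
    ("Azfer", (59, 170, 1), "D"),
    ("Raza", (25, 40, 4), "E"),
    ("Aamir", (35, 150, 1), "A"),
    ("Zia", (25, 120, 3), "B"),
    ("Ishrat", (26, 90, 4), "C"),
    ("Khalid", (40, 60, 2), "F"),
]
NAEEM = (37, 50, 2)


def knn_vote(query, data, k):
    """Winning class among the k nearest rows, or None when the top vote is tied."""
    ranked = sorted(data, key=lambda row: math.dist(query, row[1]))
    votes = Counter(label for _, _, label in ranked[:k]).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return None  # tie between the leading classes
    return votes[0][0]


for k in range(1, len(DATA) + 1):
    print(k, knn_vote(NAEEM, DATA, k))
```

For K from 1 to 9, this reports F at K=1, B at K=6 and a tie at every other K, which agrees with the list above.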

Designing a DFA

I want to design a DFA for the language of the following grammar, after fixing its ambiguity.
I thought and tried a lot but couldn't get a proper answer.
S->aA|aB|lambda
A->aA|aS
B->bB|aB|b
I recommend first getting an NFA by treating this as a regular grammar; then determinize the NFA, and then we can write down a new grammar that's equivalent to this one but unambiguous (for the same reason the determinized automaton is deterministic). Writing down the NFA for this grammar is easy: productions of the form X -> sY translate into transitions from state X to state Y on input s. Similarly, productions of the form X -> lambda mean X is an accepting state, and productions of the form X -> b imply a new accepting state, reached on b, that transitions to a dead state.
We need states for each nonterminal symbol S, A and B; and we will have transitions for every production. Our NFA looks like this:
Start state: (S). Accepting states: (S), (X).
(S) --a--> (A)
(S) --a--> (B)
(A) --a--> (A)
(A) --a--> (S)
(B) --a,b--> (B)
(B) --b--> (X)
(X) --a,b--> (Y)
(Y) --a,b--> (Y)
Here, states (S) and (X) are accepting, state (Y) is a dead state (we didn't really need to depict it explicitly, but bear with me), and this automaton is fully equivalent to the grammar. Now we need to determinize it. States of the determinized automaton correspond to subsets of states of the nondeterministic one. Our first deterministic state corresponds to the set containing just (S), and we figure out the other required subsets (of which there can be at most 32, since we have 5 states and 2^5 = 32) by following the transitions:
Q s Q'
{(S)} a {(A),(B)}
{(S)} b empty
{(A),(B)} a {(A),(B),(S)}
{(A),(B)} b {(B),(X)}
{(A),(B),(S)} a {(A),(B),(S)}
{(A),(B),(S)} b {(B),(X)}
{(B),(X)} a {(B),(Y)}
{(B),(X)} b {(B),(X),(Y)}
{(B),(Y)} a {(B),(Y)}
{(B),(Y)} b {(B),(X),(Y)}
{(B),(X),(Y)} a {(B),(Y)}
{(B),(X),(Y)} b {(B),(X),(Y)}
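The subset construction in the table can be reproduced mechanically. Below is a minimal Python sketch, assuming the NFA is encoded as a dict of dicts. The encoding and the name `determinize` are illustrative. It discovers the same six non-empty subsets plus the empty (dead) set, 7 in total, though not necessarily in the same order as the table.

```python
from itertools import chain

# NFA read off the grammar S -> aA | aB | lambda, A -> aA | aS, B -> bB | aB | b.
# "X" is the extra accepting state introduced for B -> b; "Y" is the dead state after X.
NFA = {
    "S": {"a": {"A", "B"}},
    "A": {"a": {"A", "S"}},
    "B": {"a": {"B"}, "b": {"B", "X"}},
    "X": {"a": {"Y"}, "b": {"Y"}},
    "Y": {"a": {"Y"}, "b": {"Y"}},
}
NFA_ACCEPTING = {"S", "X"}


def determinize(nfa, start, alphabet=("a", "b")):
    """Subset construction: the reachable subsets in discovery order and the
    DFA transition map {(subset, symbol): subset}."""
    subsets = [frozenset({start})]
    transitions = {}
    i = 0
    while i < len(subsets):
        current = subsets[i]
        i += 1
        for symbol in alphabet:
            # union of the NFA moves of every state in the current subset
            target = frozenset(chain.from_iterable(nfa[q].get(symbol, ()) for q in current))
            transitions[(current, symbol)] = target
            if target not in subsets:
                subsets.append(target)
    return subsets, transitions


subsets, transitions = determinize(NFA, "S")
for subset in subsets:
    print(sorted(subset), "accepting" if subset & NFA_ACCEPTING else "")
```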
We encountered six non-empty subsets, which we name q1 through q6 in the order they appear in the Q column above, plus the dead state (the empty set), which we name qD. All states whose subsets contain (S) or (X) are accepting, and q1 = {(S)} is the initial state. Our DFA looks like this:
Start state: (q1). Accepting states: (q1), (q3), (q4), (q6).
(q1) --a--> (q2)
(q1) --b--> (qD)
(qD) --a,b--> (qD)
(q2) --a--> (q3)
(q2) --b--> (q4)
(q3) --a--> (q3)
(q3) --b--> (q4)
(q4) --a--> (q5)
(q4) --b--> (q6)
(q5) --a--> (q5)
(q5) --b--> (q6)
(q6) --a--> (q5)
(q6) --b--> (q6)
Finally, we can read off the unambiguous regular grammar from our DFA:
(q1) -> a(q2) | b(qD) | lambda
(qD) -> a(qD) | b(qD)
(q2) -> a(q3) | b(q4)
(q3) -> a(q3) | b(q4) | lambda
(q4) -> a(q5) | b(q6) | lambda
(q5) -> a(q5) | b(q6)
(q6) -> a(q5) | b(q6) | lambda
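One way to see the difference is to count derivations. Below is a minimal Python sketch, assuming both grammars are written in right-linear form as (terminal, next nonterminal) pairs. In that encoding, None marks a production ending in a terminal (like B -> b), and a separate set lists the nonterminals that have a lambda production. The names and encoding are illustrative. For the original grammar, "aaab" has two derivations (S -> aA -> aaS -> aaaB -> aaab and S -> aB -> aaB -> aaaB -> aaab). For the grammar read off the DFA, every word has at most one derivation, and both grammars accept the same words.

```python
from itertools import product

# Right-linear grammars: nonterminal -> list of (terminal, next nonterminal or None).
# None marks a production that ends with a terminal, such as B -> b.
ORIGINAL = {
    "S": [("a", "A"), ("a", "B")],
    "A": [("a", "A"), ("a", "S")],
    "B": [("b", "B"), ("a", "B"), ("b", None)],
}
ORIGINAL_LAMBDA = {"S"}  # nonterminals with a lambda production

NEW = {
    "q1": [("a", "q2"), ("b", "qD")],
    "qD": [("a", "qD"), ("b", "qD")],
    "q2": [("a", "q3"), ("b", "q4")],
    "q3": [("a", "q3"), ("b", "q4")],
    "q4": [("a", "q5"), ("b", "q6")],
    "q5": [("a", "q5"), ("b", "q6")],
    "q6": [("a", "q5"), ("b", "q6")],
}
NEW_LAMBDA = {"q1", "q3", "q4", "q6"}


def count_derivations(grammar, lambdas, start, word):
    """Number of distinct derivations of `word` from `start`."""
    memo = {}

    def count(nonterminal, i):
        if (nonterminal, i) in memo:
            return memo[(nonterminal, i)]
        # a lambda production can finish the derivation only at the end of the word
        total = 1 if i == len(word) and nonterminal in lambdas else 0
        if i < len(word):
            for terminal, nxt in grammar[nonterminal]:
                if terminal != word[i]:
                    continue
                if nxt is None:
                    total += 1 if i + 1 == len(word) else 0
                else:
                    total += count(nxt, i + 1)
        memo[(nonterminal, i)] = total
        return total

    return count(start, 0)


def words(max_len, alphabet="ab"):
    """Every word over the alphabet up to max_len letters."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


print(count_derivations(ORIGINAL, ORIGINAL_LAMBDA, "S", "aaab"))  # 2
print(count_derivations(NEW, NEW_LAMBDA, "q1", "aaab"))           # 1
```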

How do I count merged cells separately in Google Spreadsheets?

I'm using countif(a1:a10, "*") to count the number of names in a guest list. However, some cells have been merged, e.g. for married couples or families where only one name is supplied.
See below for a concrete example, where the (+ x) cells are merged with the cell above:
| A | B |
1 | Cheah | Teo |
2 | Hadi's Family | Robinson |
3 | (+ wife) | Müller |
4 | (+ son) | Chan |
5 | Ganesan | Yeong |
6 | Chng | (+ wife) |
7 | Tan | Ng |
8 | Williams | (+ husband) |
9 | Brecht | (+ daughter) |
10 | Ahmad | |
In the example above, I would like to obtain 10 in column A and 9 in column B.
I've seen #MaxMakhrov's suggestion of last non-empty row, but haven't been able to get a working solution out of it yet.
Any ideas?
The formula for finding the last non-empty row may help:
=MAX(FILTER(ROW(A:A),A:A<>""))
But it gives the wrong result if the last cells are merged:
Example:
1 a
2
3 b > merged
4
In the example above, the formula returns 2 as the last non-empty row, because cells 2, 3 and 4 are merged and the merged value is stored only in row 2.
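To see why the formula gives the wanted count for column A but not for column B, here is a plain Python emulation of what =MAX(FILTER(ROW(A:A),A:A<>"")) computes. It is not Sheets code, and `last_non_empty_row` is an illustrative name. It assumes, as Google Sheets does when merging, that a merged block keeps its value only in its top-left cell, so the other cells of the block read as empty.

```python
def last_non_empty_row(column):
    """Emulate =MAX(FILTER(ROW(A:A), A:A<>"")) on a list of cell values.

    Row numbers are 1-based. A merged block is modelled as keeping its value
    only in its top-left cell; the other cells of the block are "".
    """
    rows = [row for row, value in enumerate(column, start=1) if value != ""]
    return max(rows)  # raises ValueError for an all-empty column


# Columns A and B from the question; cells merged into the cell above are "".
COLUMN_A = ["Cheah", "Hadi's Family", "", "", "Ganesan", "Chng", "Tan", "Williams", "Brecht", "Ahmad"]
COLUMN_B = ["Teo", "Robinson", "Müller", "Chan", "Yeong", "", "Ng", "", "", ""]

print(last_non_empty_row(COLUMN_A))  # 10, the wanted count
print(last_non_empty_row(COLUMN_B))  # 7, although the wanted count is 9
```

Column B fails because its last block (Ng plus rows 8 and 9) is merged, which is exactly the limitation described above.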

Left recursion elimination

I have this grammar
S->S+S|SS|(S)|S*|a
I want to know how to eliminate the left recursion from this grammar because the S+S is really confusing...
Let's see if we can simplify the given grammar.
S -> S*|S+S|SS|(S)|a
We can write it as:
S -> S*|SQ|SS|B|a
Q -> +S
B -> (S)
Now, you can eliminate left recursion in familiar territory.
S -> BS'|aS'
S' -> *S'|QS'|SS'|e
Q -> +S
B -> (S)
Note that e is epsilon/lambda.
We have removed the left recursion, so we no longer need Q and B; substituting them back gives:
S -> (S)S'|aS'
S' -> *S'|+SS'|SS'|e
You'll find this useful when dealing with left recursion elimination.
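The transformation used above (A -> Aα | β becomes A -> βA', A' -> αA' | e) is mechanical enough to code. Below is a minimal Python sketch for immediate left recursion only; it does not handle indirect left recursion such as A -> Bx, B -> Ay. Productions are lists of symbols, the empty list stands for e, and the function names are illustrative.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn (no bi starting with A) as
    A -> b1 A' | ... | bn A' and A' -> a1 A' | ... | am A' | e.
    Each production is a list of symbols; the empty list stands for e."""
    new = nonterminal + "'"
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    betas = [p for p in productions if not p or p[0] != nonterminal]
    if not alphas:  # not left-recursive: nothing to do
        return {nonterminal: productions}
    return {
        nonterminal: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }


def show(rules):
    """Format rules as 'A -> x y | z' lines, writing e for the empty production."""
    return [f"{lhs} -> " + " | ".join(" ".join(p) if p else "e" for p in prods)
            for lhs, prods in rules.items()]


# S -> S* | S+S | SS | (S) | a
S_RULES = [["S", "*"], ["S", "+", "S"], ["S", "S"], ["(", "S", ")"], ["a"]]
for line in show(eliminate_immediate_left_recursion("S", S_RULES)):
    print(line)
# S -> ( S ) S' | a S'
# S' -> * S' | + S S' | S S' | e
```

The output matches the grammar derived by hand above.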
My answer, using the theory from this reference:
How to Eliminate Left recursion in Context-Free-Grammar.
S --> S+S | SS | S* | a | (S)
Here S+S, SS and S* are of the Sα form (left-recursive rules), while a and (S) are of the β form (non-left-recursive rules).
We can write this as:
S ---> Sα1 | Sα2 | Sα3 | β1 | β2
Rules to convert it into an equivalent non-left-recursive grammar:
S ---> β1 | β2
Z ---> α1 | α2 | α3
Z ---> α1Z | α2Z | α3Z
S ---> β1Z | β2Z
Where
α1 = +S
α2 = S
α3 = *
And the β-productions, which do not start with S:
β1 = a
β2 = (S)
Grammar without left-recursion:
Non-left-recursive productions S --> βn:
S --> a | (S)
Introduce a new variable Z with the following productions: Z ---> αn and Z --> αnZ
Z --> +S | S | *
and
Z --> +SZ | SZ | *Z
And the new S productions, S --> βnZ:
S --> aZ | (S)Z
Second form (answer)
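The productions Z --> +S | S | * and Z --> +SZ | SZ | *Z can be combined as Z --> +SZ | SZ | *Z | ^, where ^ is the null symbol.
The alternative Z --> ^ lets Z disappear from a derivation, so the separate rules S --> a | (S) and Z --> +S | S | * are no longer needed.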
So second answer:
S --> aZ | (S)Z and Z --> +SZ | SZ | *Z| ^
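Once the left recursion is gone, the grammar can be fed directly to a backtracking top-down recognizer. Below is a minimal Python sketch written from the final grammar S --> aZ | (S)Z, Z --> +SZ | SZ | *Z | ^. The function names are illustrative. Each function yields every input position where its nonterminal can end. This terminates because S always consumes a symbol before recursing. The grammar is ambiguous, though, so the sketch may explore many parses and is only meant for short strings.

```python
def parse_s(text, i):
    """S --> aZ | (S)Z : yield every position where an S starting at i can end."""
    if i < len(text) and text[i] == "a":
        yield from parse_z(text, i + 1)
    if i < len(text) and text[i] == "(":
        for j in parse_s(text, i + 1):
            if j < len(text) and text[j] == ")":
                yield from parse_z(text, j + 1)


def parse_z(text, i):
    """Z --> +SZ | SZ | *Z | ^ : the ^ (empty) alternative simply ends at i."""
    yield i
    if i < len(text) and text[i] == "+":
        for j in parse_s(text, i + 1):
            yield from parse_z(text, j)
    if i < len(text) and text[i] == "*":
        yield from parse_z(text, i + 1)
    # Z --> SZ: S always consumes input before recursing, so this cannot loop
    for j in parse_s(text, i):
        yield from parse_z(text, j)


def recognize(text):
    return any(end == len(text) for end in parse_s(text, 0))


for sample in ["a", "a+a", "(a)*", "aa*", "(a+a)a", "a+", "(a", "+a"]:
    print(sample, recognize(sample))
```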

difference between top down and bottom up parsing techniques?

I guess the same logic is applied in both of them, i.e. replacing the matched strings with the corresponding non-terminals given by the production rules.
Why do they categorize LL as top down and LR as bottom-up?
Bottom up parsing:
Bottom-up parsing (also known as shift-reduce parsing) is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. It attempts to build trees upward toward the start symbol.
Top-down parsing:
Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis.
Top-down parsing:
Involves generating the string starting from the first (start) non-terminal.
Examples: recursive descent parsing, non-recursive descent parsing, LL parsing, etc.
Grammars with left recursion, or that still need left factoring, do not work.
Backtracking may occur.
Uses leftmost derivation.
Things Of Interest Blog
The difference between top-down parsing and bottom-up parsing
Given a formal grammar and a string produced by that grammar, parsing is figuring out the production process for that string.
In the case of context-free grammars, the production process takes the form of a parse tree. Before we begin, we always know two things about the parse tree: the root node, which is the initial symbol from which the string was originally derived, and the leaf nodes, which are all the characters of the string in order. What we don't know is the layout of nodes and branches between them.
For example, if the string is acddf, we know this much already:
      S
     /|\
     ???
  | | | | |
  a c d d f
Example grammar for use in this article
S → xyz | aBC
B → c | cd
C → eg | df
Bottom-up parsing
This approach is not unlike solving a jigsaw puzzle. We start at the bottom of the parse tree with individual characters. We then use the rules to connect the characters together into larger tokens as we go. At the end of the string, everything should have been combined into a single big S, and S should be the only thing we have left. If not, it's necessary to backtrack and try combining tokens in different ways.
With bottom-up parsing, we typically maintain a stack, which is the list of characters and tokens we've seen so far. At each step, we shift a new character onto the stack, and then reduce as far as possible by combining characters into larger tokens.
Example
String is acddf.
Steps
ε can't be reduced
a can't be reduced
ac can be reduced, as follows:
reduce ac to aB
aB can't be reduced
aBd can't be reduced
aBdd can't be reduced
aBddf can be reduced, as follows:
reduce aBddf to aBdC
aBdC can't be reduced
End of string. Stack is aBdC, not S. Failure! Must backtrack.
aBddf can't be reduced
ac can't be reduced
acd can be reduced, as follows:
reduce acd to aB
aB can't be reduced
aBd can't be reduced
aBdf can be reduced, as follows:
reduce aBdf to aBC
aBC can be reduced, as follows:
reduce aBC to S
End of string. Stack is S. Success!
Parse trees (a subtree is written as Node(children)):
a
a c
a B(c)
a B(c) d
a B(c) d d
a B(c) d d f
a B(c) d C(d f)
a c
a c d
a B(c d)
a B(c d) d
a B(c d) d f
a B(c d) C(d f)
S(a B(c d) C(d f))
Example 2
If all combinations fail, then the string cannot be parsed.
String is acdg.
Steps
ε can't be reduced
a can't be reduced
ac can be reduced, as follows:
reduce ac to aB
aB can't be reduced
aBd can't be reduced
aBdg can't be reduced
End of string. Stack is aBdg, not S. Failure! Must backtrack.
ac can't be reduced
acd can be reduced, as follows:
reduce acd to aB
aB can't be reduced
aBg can't be reduced
End of string. stack is aBg, not S. Failure! Must backtrack.
acd can't be reduced
acdg can't be reduced
End of string. Stack is acdg, not S. No backtracking is possible. Failure!
Parse trees (a subtree is written as Node(children)):
a
a c
a B(c)
a B(c) d
a B(c) d g
a c
a c d
a B(c d)
a B(c d) g
a c d
a c d g
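The shift-reduce search with backtracking described above can be sketched in a few lines of Python. At each point, the sketch tries every reduction whose right-hand side is on top of the stack, and only then shifts the next symbol. When a choice leads to a dead end, it backtracks. This mirrors the walk-through's "reduce as far as possible, then shift" order, but it is an illustrative sketch (`GRAMMAR` and `bottom_up` are my names), not the article's code. It terminates because the example grammar has no empty or unit-cycle productions.

```python
# Example grammar from the article: S -> xyz | aBC, B -> c | cd, C -> eg | df
GRAMMAR = [
    ("S", ("x", "y", "z")), ("S", ("a", "B", "C")),
    ("B", ("c",)), ("B", ("c", "d")),
    ("C", ("e", "g")), ("C", ("d", "f")),
]


def bottom_up(tokens, start="S"):
    """Backtracking shift-reduce recognizer: True if tokens reduce to `start`."""
    dead_ends = set()  # (stack, position) pairs already known to fail

    def search(stack, pos):
        if stack == (start,) and pos == len(tokens):
            return True
        if (stack, pos) in dead_ends:
            return False
        # first try every reduction whose right-hand side is on top of the stack
        for lhs, rhs in GRAMMAR:
            if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                if search(stack[:-len(rhs)] + (lhs,), pos):
                    return True
        # otherwise shift the next input symbol, if there is one
        if pos < len(tokens) and search(stack + (tokens[pos],), pos + 1):
            return True
        dead_ends.add((stack, pos))
        return False

    return search((), 0)


print(bottom_up("acddf"))  # True
print(bottom_up("acdg"))   # False
```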
Top-down parsing
For this approach we assume that the string matches S and look at the internal logical implications of this assumption. For example, the fact that the string matches S logically implies that either (1) the string matches xyz or (2) the string matches aBC. If we know that (1) is not true, then (2) must be true. But (2) has its own further logical implications. These must be examined as far as necessary to prove the base assertion.
Example
String is acddf.
Steps
Assertion 1: acddf matches S
Assertion 2: acddf matches xyz:
Assertion is false. Try another.
Assertion 2: acddf matches aBC i.e. cddf matches BC:
Assertion 3: cddf matches cC i.e. ddf matches C:
Assertion 4: ddf matches eg:
False.
Assertion 4: ddf matches df:
False.
Assertion 3 is false. Try another.
Assertion 3: cddf matches cdC i.e. df matches C:
Assertion 4: df matches eg:
False.
Assertion 4: df matches df:
Assertion 4 is true.
Assertion 3 is true.
Assertion 2 is true.
Assertion 1 is true. Success!
Parse trees (a subtree is written as Node(children); a nonterminal without children is not yet expanded):
S
S(a B C)
S(a B(c) C)
S(a B(c d) C)
S(a B(c d) C(d f))
Example 2
If, after following every logical lead, we can't prove the basic hypothesis ("The string matches S") then the string cannot be parsed.
String is acdg.
Steps
Assertion 1: acdg matches S:
Assertion 2: acdg matches xyz:
False.
Assertion 2: acdg matches aBC i.e. cdg matches BC:
Assertion 3: cdg matches cC i.e. dg matches C:
Assertion 4: dg matches eg:
False.
Assertion 4: dg matches df:
False.
False.
Assertion 3: cdg matches cdC i.e. g matches C:
Assertion 4: g matches eg:
False.
Assertion 4: g matches df:
False.
False.
False.
Assertion 1 is false. Failure!
Parse trees (a subtree is written as Node(children); a nonterminal without children is not yet expanded):
S
S(a B C)
S(a B(c) C)
S(a B(c d) C)
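The assertion-by-assertion search above maps naturally onto a backtracking recursive-descent matcher. Below is a minimal Python sketch in which `match` and `match_seq` are illustrative names. Each is a generator that yields every input position where the given symbol (or sequence of symbols) can finish matching. Trying "another" alternative after a false assertion is then just asking the generator for its next value.

```python
# Example grammar from the article: S -> xyz | aBC, B -> c | cd, C -> eg | df
GRAMMAR = {
    "S": [["x", "y", "z"], ["a", "B", "C"]],
    "B": [["c"], ["c", "d"]],
    "C": [["e", "g"], ["d", "f"]],
}


def match(symbol, text, pos):
    """Yield every position where `symbol` can finish matching, starting at pos."""
    if symbol not in GRAMMAR:  # terminal: it either matches here or not
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:  # try each alternative in turn
        yield from match_seq(alternative, text, pos)


def match_seq(symbols, text, pos):
    """Yield every end position for matching a sequence of symbols."""
    if not symbols:
        yield pos
        return
    for middle in match(symbols[0], text, pos):
        yield from match_seq(symbols[1:], text, middle)


def top_down(text, start="S"):
    """The string matches S if some way of matching S ends exactly at its end."""
    return any(end == len(text) for end in match(start, text, 0))


print(top_down("acddf"))  # True
print(top_down("acdg"))   # False
```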
Why left-recursion is a problem for top-down parsers
If our rules were left-recursive, for example something like this:
S → Sb
Then notice how our algorithm behaves:
Steps
Assertion 1: acddf matches S:
Assertion 2: acddf matches Sb:
Assertion 3: acddf matches Sbb:
Assertion 4: acddf matches Sbbb:
...and so on forever
Parse trees (a subtree is written as Node(children); a nonterminal without children is not yet expanded):
S
S(S b)
S(S(S b) b)
S(S(S(S b) b) b)
...
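A naive backtracking top-down matcher makes the problem concrete. In this self-contained Python sketch, the grammar S → Sb | a is a hypothetical example; the article only shows the S → Sb part. `max_depth` is an artificial guard so the demo stops with a readable message instead of hitting Python's recursion limit. The error shows that the matcher keeps expanding S without ever moving past input position 0.

```python
# Hypothetical left-recursive grammar: S -> Sb | a (the first alternative recurses on S)
GRAMMAR = {"S": [["S", "b"], ["a"]]}


def match(symbol, text, pos, depth=0, max_depth=50):
    """Naive top-down matcher; max_depth is an artificial guard for this demo."""
    if depth > max_depth:
        raise RuntimeError(
            f"no progress: still at position {pos} after {depth} nested expansions")
    if symbol not in GRAMMAR:  # terminal
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:
        yield from match_seq(alternative, text, pos, depth + 1, max_depth)


def match_seq(symbols, text, pos, depth, max_depth):
    if not symbols:
        yield pos
        return
    for middle in match(symbols[0], text, pos, depth, max_depth):
        yield from match_seq(symbols[1:], text, middle, depth, max_depth)


try:
    list(match("S", "ab", 0))
except RuntimeError as err:
    print(err)  # no progress: still at position 0 after 51 nested expansions
```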

Resources