I have a question that came up while doing one of the exercises for my parsing course at university. It concerns constructing a parsing table using canonical LR items.
The given grammar production rules are as follows:
S -> NP VP
NP -> NP PP
NP -> a n
VP -> v
VP -> VP NP
VP -> VP PP
PP -> p NP
Now the problem is that when I try to construct the table by working out the states, I run into this:
At I_0:
S' -> .S
S -> .NP VP
NP -> .NP PP
NP -> .a n
In this case, I have to perform a goto on NP for both S -> .NP VP and NP -> .NP PP. This would seem to mean that, when I create the parsing table, there are two states that NP could lead to. Am I missing something here?
Please note that while I can see this would not be a problem with lookahead, the exercise specifically asks for LR(0).
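One way to check this is to compute the item sets mechanically. The sketch below is only an illustration and not part of the exercise: the names GRAMMAR, closure, goto and show are made up here, and an item is represented as a (production index, dot position) pair. It suggests that goto(I_0, NP) advances the dot over NP in every item that has NP right after the dot and collects the results into one state, so S -> NP . VP and NP -> NP . PP both end up in the kernel of the same state.

```python
# Grammar from the question, augmented with S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("NP", "PP")),
    ("NP", ("a", "n")),
    ("VP", ("v",)),
    ("VP", ("VP", "NP")),
    ("VP", ("VP", "PP")),
    ("PP", ("p", "NP")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> .rhs for every nonterminal X that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Advance the dot over `symbol` in every item that allows it, then close."""
    moved = {
        (prod, dot + 1)
        for prod, dot in items
        if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol
    }
    return closure(moved)


def show(items):
    """Render a state as a list of dotted productions, ordered by production."""
    lines = []
    for prod, dot in sorted(items):
        lhs, rhs = GRAMMAR[prod]
        body = list(rhs[:dot]) + ["."] + list(rhs[dot:])
        lines.append(f"{lhs} -> {' '.join(body)}")
    return lines


I0 = closure({(0, 0)})   # the state I_0 from the question
I_NP = goto(I0, "NP")    # a single state, reached from I_0 on NP
print("\n".join(show(I_NP)))
```

The printed state starts with the two kernel items S -> NP . VP and NP -> NP . PP, followed by the closure items for VP and PP. The table then gets one goto entry for (I_0, NP), pointing at this one state.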
In Python, I have an input list like the following:
[('S', ['NP', 'VP']),
('A', ['V', 'NP']),
('VP', ['V', 'NP']),
('NP', ['DET', 'NP']),
('N', "'mouse'"),
('NP', "'mouse'"),
('DET', "'the'"),
('V', "'saw'"),
('N', "'Ron'"),
('NP', "'Ron'")]
This is the result of running the CYK algorithm with the following grammar:
S -> NP VP
VP -> A NP | V NP
NP -> N N | DET NP | 'chocolate' | 'cat' | 'John' | 'Ron' | 'mouse'
DET -> 'the'
N -> 'chocolate' | 'cat' | 'John' | 'Ron' | 'mouse'
V -> 'saw' | 'bought' | 'ate'
A -> V NP
The string I want to match is "Ron saw the mouse".
I want to produce output like this:
(S (NP Ron) (VP (V saw) (NP (DET the) (NP mouse))))
I am not sure how the algorithm should be constructed, especially with an ambiguous grammar, which may yield multiple outputs.
How should I structure the code? Any suggestions for a better approach, with or without recursion?
UPDATE---
I managed to get a single exact parse tree after adding extra position values for the parent and child nodes to the input list. But this does not solve my problem for ambiguous sentences.
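A common way to handle the ambiguous case is to store every backpointer in the CYK chart instead of just one, and then enumerate trees recursively from the chart. The following is a minimal sketch under some assumptions: it rebuilds the chart from the grammar above rather than starting from the flat list of tuples, and the names BINARY, LEXICAL, cyk_chart, trees and parse are invented for this illustration.

```python
from collections import defaultdict

# Binary rules (lhs, left child, right child) taken from the grammar above.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "A", "NP"),
    ("VP", "V", "NP"),
    ("NP", "N", "N"),
    ("NP", "DET", "NP"),
    ("A", "V", "NP"),
]
NOUNS = ["chocolate", "cat", "John", "Ron", "mouse"]
LEXICAL = {"NP": NOUNS, "N": NOUNS, "DET": ["the"], "V": ["saw", "bought", "ate"]}


def cyk_chart(words):
    """chart[(i, j)][A] lists every way A can derive words[i:j]."""
    chart = defaultdict(lambda: defaultdict(list))
    for i, word in enumerate(words):
        for lhs, vocab in LEXICAL.items():
            if word in vocab:
                chart[(i, i + 1)][lhs].append(word)  # lexical backpointer
    n = len(words)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        # Keep every split point, not just the first one.
                        chart[(i, j)][lhs].append((k, left, right))
    return chart


def trees(chart, symbol, i, j):
    """Enumerate all bracketed trees for `symbol` over words[i:j]."""
    out = []
    for pointer in chart[(i, j)].get(symbol, []):
        if isinstance(pointer, str):
            out.append(f"({symbol} {pointer})")
        else:
            k, left, right = pointer
            for left_tree in trees(chart, left, i, k):
                for right_tree in trees(chart, right, k, j):
                    out.append(f"({symbol} {left_tree} {right_tree})")
    return out


def parse(sentence, start="S"):
    words = sentence.split()
    return trees(cyk_chart(words), start, 0, len(words))


for tree in parse("Ron saw the mouse"):
    print(tree)
for tree in parse("John saw Ron mouse"):  # ambiguous under this grammar
    print(tree)
```

With this grammar, "Ron saw the mouse" yields the single tree shown above, while "John saw Ron mouse" yields two trees: one via VP -> V NP with NP -> N N, and one via VP -> A NP. The recursion depth is bounded by the sentence length, so recursion is usually fine here; the number of trees, however, can grow exponentially for highly ambiguous input.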
I am trying to understand a natural language interface for relational databases proposed by Fei Li (2014) (available as a GitHub project). Specifically, I don't understand the grammar they define for a ParseTree of a natural language query to a database. This question is somewhat like this one but with a more complex grammar.
Background: A natural language sentence can be parsed into a ParseTree (a common library for this is the Stanford Parser), which describes the grammatical relations of the words in a sentence.
The grammar for a valid ParseTree is:
Q -> (SClause)(ComplexCondition)*
SClause -> SELECT + GNP
ComplexCondition -> ON + (leftSubtree*rightSubtree)
leftSubtree -> GNP
rightSubtree -> GNP | VN | MIN | MAX
GNP -> (FN + GNP) | NP
NP -> NN + (NN)*(condition)*
condition -> VN | (ON + VN)
where
Q represents an entire query tree
+ a parent-child relationship
* a sibling relationship
SN is a SELECT node
ON is an OPERATOR node (e.g. =, <=)
FN is a FUNCTION node (e.g. AVG)
NN is a NAME node (e.g. a column in a db table)
VN is a VALUE node (i.e. a value in a column in a db table)
ComplexCondition must have one ON with a leftSubtree and rightSubtree
NP is one NN whose children are multiple NNs and conditions.
My questions:
Why is condition defined as VN or (ON+VN)? This would mean that something like the digit 5 by itself can be a condition. It would make more sense if only the latter, i.e. (ON+VN), were a condition (e.g. >5).
How can a rightSubtree be just a function (e.g. MIN)? By the way, I am reading the pipe | as logical or.
I understand that GNP is defined recursively, but at a terminal node the GNP node must be just an NP node, right? But an NP is defined as something that has children. How?
The authors of the GitHub project quoted above state: "Take a Value Node (VN) for example, according to the grammar, it is invalid if and only if it has children." I don't know how to infer this from the grammar.
Thanks for the help
Question:
Given the following grammar, fix it to an LR(0) grammar:
S -> S' $
S'-> aS'b | T
T -> cT | c
Thoughts
I've been trying this for quite some time, using automatic tools to check my fixed grammars, with no success. Our professor likes asking this kind of question on tests without giving us a methodology for approaching it (except for repeated trying). Is there any method that can be applied to answer this kind of question? Can anyone show how this method can be applied to this example?
I don't know of an automatic procedure, but the basic idea is to defer decisions. That is, if at a particular state in the parse, both shift and reduce actions are possible, find a way to defer the reduction.
In the LR(0) parser, you can make a decision based on the token you just shifted, but not on the token you (might be) about to shift. So you need to move decisions to the end of productions, in a manner of speaking.
For example, your language consists of all sentences { a^n c^m b^n $ | n ≥ 0, m > 0 }. If we restrict that to n > 0, then an LR(0) grammar can be constructed by deferring the reduction decision to the point following a b:
S -> S' $.
S' -> U | a S' b.
U -> a c T.
T -> b | c T.
That grammar is LR(0). In the original grammar, at the itemset including T -> c . and T -> c . T, both shift and reduce are possible: shift c and reduce before b. By moving the b into the production for T, we defer the decision until after the shift: after shifting b, a reduction is required; after c, the reduction is impossible.
But that forces every sentence to have at least one b. It omits sentences for which n = 0 (that is, the regular language c*$). That subset has an LR(0) grammar:
S -> S' $.
S' -> c | S' c.
We can construct the union of these two languages in a straightforward manner, renaming one of the S's:
S -> S1' $ | S2' $.
S1' -> U | a S1' b.
U -> a c T.
T -> b | c T.
S2' -> c | S2' c.
This grammar is LR(0), but the form in which the end-of-input sentinel $ has been included seems to be cheating. At least, it violates the rule for augmented grammars, because an augmented grammar's base rule is always S -> S' $ where S' and $ are symbols not used in the original grammar.
It might seem that we could avoid that technicality by right-factoring:
S -> S' $
S' -> S1' | S2'
Unfortunately, while that grammar is still deterministic, and does recognise exactly the original language, it is not LR(0).
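These LR(0) claims can be checked mechanically by building the LR(0) item sets and looking for a state that contains a completed item together with any other item. The sketch below is a rough illustration under stated assumptions: the helper names parse_grammar and lr0_conflicts are invented here, $ is treated as an ordinary terminal, the left-hand side of the first rule is taken as the start symbol, and every nonterminal is assumed to have at least one production.

```python
def parse_grammar(text):
    """Turn lines like "T -> c T | c" into a list of (lhs, rhs-tuple) pairs."""
    prods = []
    for line in text.strip().splitlines():
        lhs, alternatives = line.split("->")
        for alt in alternatives.split("|"):
            prods.append((lhs.strip(), tuple(alt.split())))
    return prods


def lr0_conflicts(text):
    """Return the LR(0) states holding a completed item plus any other item."""
    prods = parse_grammar(text)
    nonterminals = {lhs for lhs, _ in prods}

    def closure(items):
        result, work = set(items), list(items)
        while work:
            p, dot = work.pop()
            rhs = prods[p][1]
            if dot < len(rhs) and rhs[dot] in nonterminals:
                for i, (lhs, _) in enumerate(prods):
                    if lhs == rhs[dot] and (i, 0) not in result:
                        result.add((i, 0))
                        work.append((i, 0))
        return frozenset(result)

    # Start from every production of the start symbol (the first lhs).
    start = closure({(i, 0) for i, (lhs, _) in enumerate(prods)
                     if lhs == prods[0][0]})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        symbols = {prods[p][1][dot] for p, dot in state
                   if dot < len(prods[p][1])}
        for sym in symbols:
            nxt = closure({(p, dot + 1) for p, dot in state
                           if dot < len(prods[p][1]) and prods[p][1][dot] == sym})
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    # A reduce item must be alone in its state for the grammar to be LR(0).
    return [s for s in states
            if len(s) > 1 and any(dot == len(prods[p][1]) for p, dot in s)]


ORIGINAL = """
S -> S' $
S' -> a S' b | T
T -> c T | c
"""
SHARED = """
S1' -> U | a S1' b
U -> a c T
T -> b | c T
S2' -> c | S2' c
"""
UNION = "S -> S1' $ | S2' $" + SHARED
FACTORED = "S -> S' $\nS' -> S1' | S2'" + SHARED

for name, grammar in [("original", ORIGINAL), ("union", UNION),
                      ("factored", FACTORED)]:
    print(name, "is LR(0):", not lr0_conflicts(grammar))
```

Under these assumptions the check reports a conflict for the original grammar (the state containing T -> c . and T -> c . T) and for the right-factored grammar (the state containing S' -> S2' . and S2' -> S2' . c), and none for the union grammar.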
(Many thanks to @templatetypedef for checking the original answer, and identifying a flaw, and also to @Dennis, who observed that c* was omitted.)
Is it possible to use one of the parsing libraries (e.g. Parsec) for parsing something different than a String? And how would I do this?
For the sake of simplicity, let's assume the input is a list of ints [Int]. The task could be
drop leading zeros
parse the rest into the pattern (S+L+)*, where S is a number less than 10, and L is a number greater than or equal to 10.
return a list of tuples (Int,Int), where fst is the product of the S and snd is the product of the L integers
It would be great if someone could show how to write such a parser (or something similar).
Yes, as user5402 points out, Parsec can parse any instance of Stream, including arbitrary lists. As there are no predefined token parsers (as there are for text), you have to roll your own (myToken below), using e.g. tokenPrim.
The only thing I find a bit awkward is the handling of "source positions". SourcePos is an abstract type (rather than a type class) and forces me to use its "filename/line/column" format, which feels a bit unnatural here.
Anyway, here is the code (without the skipping of leading zeroes, for brevity):
import Text.Parsec

-- Match a single list element that satisfies the given predicate.
myToken :: (Show a) => (a -> Bool) -> Parsec [a] () a
myToken test = tokenPrim show incPos $ justIf test where
  -- Use the column of the source position as the list index.
  incPos pos _ _ = incSourceColumn pos 1
  justIf test x = if (test x) then Just x else Nothing

small = myToken (< 10)
large = myToken (>= 10)

-- One S+L+ group: product of the small ints, product of the large ints.
smallLargePattern = do
  smallints <- many1 small
  largeints <- many1 large
  let prod = foldl1 (*)
  return (prod smallints, prod largeints)

myIntListParser :: Parsec [Int] () [(Int,Int)]
myIntListParser = many smallLargePattern

testMe :: [Int] -> [(Int, Int)]
testMe xs = case parse myIntListParser "your list" xs of
  Left err -> error $ show err
  Right result -> result
Trying it all out:
*Main> testMe [1,2,55,33,3,5,99]
[(2,1815),(15,99)]
*Main> testMe [1,2,55,33,3,5,99,1]
*** Exception: "your list" (line 1, column 9):
unexpected end of input
Note the awkward line/column format in the error message.
Of course, one could write a function sanitiseSourcePos :: SourcePos -> MyListPosition to convert it.
There is very likely a way to get Parsec to use [a] as the stream type, but the idea behind parser combinators is actually very simple, and it's not very difficult to roll your own library.
A very accessible resource I would recommend is Monadic Parsing in Haskell by Graham Hutton and Erik Meijer.
Indeed, right now Erik Meijer is teaching an intro Haskell/functional programming course on edx.org, and Lecture 7 is all about functional parsers. As he states in the intro to the lecture:
"... No one can follow the path towards mastering functional programming without writing their own parser combinator library. We start by explaining what parsers are and how they can naturally be viewed as side-effecting functions. Next we define a number of basic parsers and higher-order functions for combining parsers. ..."
I am given the following sentence:
The bird tried to escape from the strong cage.
And the following grammar rules:
s->np, vp
np->det, n
np->det, adjp
adjp->adj, n
pp->p, np
comp->p, vp
vp->v, pp
vp->v, comp
I tried to derive the tree using a leftmost derivation and also through bottom-up analysis. Here is a simple chart I tried:
The question I have is whether it is possible to have two s nodes that lead up to a single s root.
More concretely, is this acceptable:
      s
    /   \
   s     s
  / \   / \
NP  VP VP  NP
According to your grammar, a prepositional phrase (pp) consists of a preposition (p) followed by a noun phrase (np). But your parse tree shows pps consisting only of a preposition ("to" and "from"). If you do the bottom-up parse with this in mind, you should arrive at the correct answer.
To answer your direct question, your grammar does not allow s to consist of two ss; only of a noun phrase (np) followed by a verb phrase (vp).
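To see the parse these rules give, here is a small sketch of a top-down parser for exactly this grammar. It rests on assumptions that are not in the question: the LEXICON mapping words to parts of speech is a guess at the intended tagging ("to" and "from" are both treated as p), and the names GRAMMAR, expand and parse are made up for this illustration.

```python
# The grammar rules from the question; every rule has exactly two children.
GRAMMAR = {
    "s": [("np", "vp")],
    "np": [("det", "n"), ("det", "adjp")],
    "adjp": [("adj", "n")],
    "pp": [("p", "np")],
    "comp": [("p", "vp")],
    "vp": [("v", "pp"), ("v", "comp")],
}
# Assumed part-of-speech tags for the words in the sentence.
LEXICON = {
    "the": "det", "bird": "n", "tried": "v", "to": "p", "escape": "v",
    "from": "p", "strong": "adj", "cage": "n",
}


def expand(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` matches from words[i]."""
    if symbol not in GRAMMAR:
        # A part-of-speech tag: it must match the current word.
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for left, right in GRAMMAR[symbol]:
        for left_tree, k in expand(left, words, i):
            for right_tree, j in expand(right, words, k):
                yield (symbol, left_tree, right_tree), j


def parse(sentence):
    """Return all trees rooted in s that cover the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return [tree for tree, end in expand("s", words, 0) if end == len(words)]


for tree in parse("The bird tried to escape from the strong cage."):
    print(tree)
```

With this lexicon there is a single tree: s splits into np ("the bird") and vp, the vp is v ("tried") plus comp, the comp is p ("to") plus another vp, and that inner vp is v ("escape") plus a pp covering "from the strong cage". No s node appears below the root, since the only rule for s is s -> np, vp.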