Why is this grammar not LL(1)? - parsing

I have been given a task to transform this grammar to LL(1)
E → E+E | E-E | E*E | E/E | E^E | -E | (E)| id | num
As a first step, I eliminated the ambiguity and got this:
E → E+T | E-T | T
T → T*P | T/P | P
P → P ^ F | F
F → id | num | -E | (E)
After that, I eliminated left recursion and got this:
E → TE'
E' → +TE' | -TE' | ɛ
T → PT'
T' → *PT' | /PT' | ɛ
P → FP'
P' → ^FP' | ɛ
F → id | num | -E | (E)
When I put this into JFLAP and click 'Build LL(1) parse table', I get a warning that the grammar above is not LL(1).
Why is this grammar not LL(1), and what do I need to change to make it LL(1)?

Your grammar is still ambiguous, so it can't be LL(1).
The production F → -E lets an expression containing lower-precedence operators appear at the unary-operator level, where it shouldn't.
Note that id + - id + id has two derivation trees: one groups it as id + -(id + id), the other as (id + -id) + id.
You shouldn't use E there, but a symbol that represents an atomic value. You could replace
F → id | num | -E | (E)
with
F → -A | A
A → id | num | (E)
Or F → -F | A if you want to allow multiple unary minuses.
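The conflict can also be checked mechanically. The following is a minimal sketch in Python, not what JFLAP does internally: it computes FIRST and FOLLOW sets and reports every (nonterminal, lookahead) pair for which two alternatives compete. The helper names (parse_grammar, first_of, ll1_conflicts) and the text format used to write the grammar are assumptions made for this illustration.

```python
EPS = "ε"

def parse_grammar(text):
    """Turn lines like "E' -> + T E' | ε" into {head: [body, ...]}."""
    grammar = {}
    for line in text.strip().splitlines():
        head, rhs = line.split("->", 1)
        bodies = [alt.split() for alt in rhs.split("|")]
        # An alternative consisting only of ε is stored as the empty body.
        grammar[head.strip()] = [[] if b == [EPS] else b for b in bodies]
    return grammar

def first_of(seq, first, grammar):
    """FIRST of a symbol sequence, given the FIRST sets computed so far."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in grammar else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol in seq can derive ε
    return out

def ll1_conflicts(grammar, start):
    """Return sorted (nonterminal, lookahead) pairs claimed by two alternatives."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:  # iterate FIRST and FOLLOW together to a fixed point
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                f = first_of(body, first, grammar)
                if not f <= first[head]:
                    first[head] |= f
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue
                    rest = first_of(body[i + 1:], first, grammar)
                    new = (rest - {EPS}) | (follow[head] if EPS in rest else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    conflicts = set()
    for head, bodies in grammar.items():
        seen = set()
        for body in bodies:
            f = first_of(body, first, grammar)
            predict = (f - {EPS}) | (follow[head] if EPS in f else set())
            conflicts |= {(head, tok) for tok in predict & seen}
            seen |= predict
    return sorted(conflicts)

COMMON = """
E -> T E'
E' -> + T E' | - T E' | ε
T -> P T'
T' -> * P T' | / P T' | ε
P -> F P'
P' -> ^ F P' | ε
"""
ASKED = COMMON + "F -> id | num | - E | ( E )"
FIXED = COMMON + "F -> - F | A\nA -> id | num | ( E )"

print(ll1_conflicts(parse_grammar(ASKED), "E"))
print(ll1_conflicts(parse_grammar(FIXED), "E"))
```

With F → -E, FOLLOW(E) inherits FOLLOW(F), so the operators end up in the FOLLOW sets of E', T' and P' and clash with their ε alternatives. With F → -F | A the second call reports no conflicts.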

<ParseException> Unexpected characters ($F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;'']

This is the yaml file:
tasks:
  test: {include: [bash_exec], args:['-c', 'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'], answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]}
When parsed, it yields the following error:
Unexpected characters ($F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;'']
This command
state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;'
produces the right output on the Linux command line but throws a YAML parser exception when run through YAML.
First, let's untangle the YAML file into a more readable format:
tasks:
  test: {
    include: [bash_exec],
    args:['-c', 'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'],
    answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]
  }
The first problem is args:[; YAML requires whitespace after the colon that separates a mapping key from its value (unless the key is a quoted scalar). Let's add it:
tasks:
  test: {
    include: [bash_exec],
    args: [
      '-c',
      'state --m=4 in=in4.db | cppextract -f , -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y | perl -lane '
      $F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' | state2 --id=Id.Date wq.db -'
    ],
    answer: '{{out}}/utestt.csv', n: 5, cols: [f,k]
  }
This makes it obvious what happens: You end the single-quoted scalar started with 'state right before the $ symbol. As we are in a YAML flow sequence (started by [), the parser expects a comma or the end of the sequence after that value. However, it finds a $ which is what it complains about.
Now obviously, you don't want to stop the scalar before the $; the ' is supposed to be part of the content. There are multiple ways to achieve this, but the most readable way is probably to define the value as a block scalar:
tasks:
  test:
    include: [bash_exec]
    args:
      - '-c'
      - >-
        state --m=4 in=in4.db | cppextract -f ,
        -P NEW_MODEL /stdin Id Date {a,b,b2}{c,d}L {d1,d2,d3,d4}{x,}y |
        perl -lane '$F[0] = (shift @F) .".$F[0]"; $, = ":"; print @F;' |
        state2 --id=Id.Date wq.db -
    answer:
      - '{{out}}/utestt.csv'
      - n: 5
      - cols: [f, k]
>- starts a folded block scalar, which can span multiple lines; the line breaks are folded into a space character, and the - strips the trailing newline. Note that I removed the surrounding flow mapping ({…}) and replaced it with a block mapping to be able to use a block scalar in it.
I also changed answer to be a sequence which it is not currently, but it looks like it should be (it is also erroneous in the YAML you show).
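One of the other ways to keep the ' as part of the content is to stay with a single-quoted scalar and double every embedded single quote, since '' is the escape for ' inside YAML single-quoted scalars. A minimal sketch in Python of that quoting rule follows; the helper name yaml_single_quoted and the sample command are made up for illustration, and the helper only handles single-line values.

```python
def yaml_single_quoted(value: str) -> str:
    """Wrap a one-line string in YAML single quotes, doubling embedded quotes."""
    if "\n" in value:
        # Multi-line content is better expressed as a block scalar.
        raise ValueError("this sketch only handles single-line values")
    return "'" + value.replace("'", "''") + "'"

# Hypothetical shell command containing single quotes (not the one from the question).
command = "echo 'hello world' | wc -w"
print("args: ['-c', " + yaml_single_quoted(command) + "]")
```

The block scalar shown above stays more readable for long pipelines, because the shell command can be pasted unchanged.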

How to get a parse NLP Tree object from bracketed parse string with nltk or spacy?

I have a sentence "You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre." and I can't manage to get an NLP parse tree like the following example:
(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))
I want to replicate the solution to this question https://stackoverflow.com/a/39320379 but I have a string sentence instead of the NLP tree.
BTW, I am using python 3
Use the Tree.fromstring() method:
>>> from nltk import Tree
>>> parse = Tree.fromstring('(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))')
>>> parse
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['You'])]), Tree('VP', [Tree('MD', ['could']), Tree('VP', [Tree('VB', ['say']), Tree('SBAR', [Tree('IN', ['that']), Tree('S', [Tree('NP', [Tree('PRP', ['they'])]), Tree('ADVP', [Tree('RB', ['regularly'])]), Tree('VP', [Tree('VB', ['catch']), Tree('NP', [Tree('NP', [Tree('DT', ['a']), Tree('NN', ['shower'])]), Tree(',', [',']), Tree('SBAR', [Tree('WHNP', [Tree('WDT', ['which'])]), Tree('S', [Tree('VP', [Tree('VBZ', ['adds']), Tree('PP', [Tree('TO', ['to']), Tree('NP', [Tree('NP', [Tree('PRP$', ['their']), Tree('NN', ['exhilaration'])]), Tree('CC', ['and']), Tree('NP', [Tree('FW', ['joie']), Tree('FW', ['de']), Tree('FW', ['vivre'])])])])])])])])])])])])]), Tree('.', ['.'])])])
>>> parse.pretty_print()
(ASCII-art tree: ROOT at the top, branching down through S, VP, SBAR, NP and PP to the part-of-speech tags and words on the bottom rows)
PRP MD VB IN PRP RB VB DT NN , WDT VBZ TO PRP$ NN CC FW FW FW .
You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre .
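To make the bracketed format itself less opaque, here is a minimal sketch in plain Python of how such a string maps onto a nested structure. It is not NLTK's implementation of Tree.fromstring(); the function names parse_bracketed and leaves are made up for this illustration, the tree is represented as nested lists, and error handling is kept to a minimum.

```python
import re

def parse_bracketed(text):
    """Parse '(LABEL child ...)' into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return [label, *children]

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected input after the closing bracket")
    return tree

def leaves(tree):
    """Collect the words of a nested-list tree from left to right."""
    out = []
    for child in tree[1:]:
        out.extend(leaves(child) if isinstance(child, list) else [child])
    return out

tree = parse_bracketed("(S (NP (PRP You)) (VP (MD could) (VP (VB say))) (. .))")
print(tree)
print(leaves(tree))
```

For real work the NLTK Tree object is the better choice, since it already provides leaves(), subtrees(), pretty_print() and similar methods.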
I am going to assume there is a good reason why you need the parse tree in that format. spaCy does a great job of parsing using a CNN (Convolutional Neural Network), is production-ready, and is super fast. Note that it produces dependency parses rather than bracketed constituency trees like the one in the question. You can do something like the below to see for yourself (and then read the spaCy docs):
import spacy
# 'en_core_web_sm' is the small English model; older spaCy versions used the shortcut spacy.load('en')
nlp = spacy.load('en_core_web_sm')
text = 'You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre.'
for token in nlp(text):
    print(token.dep_, end='\t')
    print(token.idx, end='\t')
    print(token.text, end='\t')
    print(token.tag_, end='\t')
    print(token.head.text, end='\t')
    print(token.head.tag_, end='\t')
    print(token.head.idx, end='\t')
    print(' '.join([w.text for w in token.subtree]), end='\t')
    print(' '.join([w.text for w in token.children]))
Now, you could make an algorithm to navigate this tree, and print accordingly (I couldn't find a quick example, sorry, but you can see the indexes and how to traverse the parse). Another thing you could do is to extract the CFG somehow, and then use NLTK to do the parsing and subsequent displaying in the format you desire. This is from the NLTK playbook (modified to work with Python 3):
import nltk
from nltk import CFG
grammar = CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
V -> "saw" | "ate"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "dog" | "cat" | "cookie" | "park"
PP -> P NP
P -> "in" | "on" | "by" | "with"
""")
text = 'Mary saw Bob'
sent = text.split()
rd_parser = nltk.RecursiveDescentParser(grammar)
for p in rd_parser.parse(sent):
    print(p)
    # (S (NP Mary) (VP (V saw) (NP Bob)))
However, you can see that you need to define the CFG (so if you tried your original text in place of the example's, you saw that it didn't understand the tokens not defined in the CFG).
It seems the easiest way to obtain your desired format is using Stanford's NLP parser. Taken from this SO question (and sorry, I haven't tested it):
from nltk.parse.stanford import StanfordParser  # needs the Stanford Parser jars installed separately
parser = StanfordParser(model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
parsed = parser.raw_parse('Jack payed up to 5% more for each unit')
for line in parsed:
    print(line, end=' ')  # This will print all in one line, as desired
I didn't test this because I don't have the time to install the Stanford Parser, which can be a bit of a cumbersome process (relative to installing Python modules), that is, assuming you are looking for a Python solution.
I hope this helps, and I'm sorry that it's not a direct answer.

'Unknown logical symbol map.Map.const' message in Why3

I'm experimenting with why3 by following their tutorial, but I get the message Unknown logical symbol map.Map.const for multiple provers. Here are the contents of the theory I'm trying to prove:
theory List
  type list 'a = Nil | Cons 'a (list 'a)
  predicate mem(x: 'a) (l: list 'a) = match l with
    | Nil -> false
    | Cons y r -> x = y || mem x r
  end
  goal G1: mem 2 (Cons 1 (Cons 2 (Cons 3 Nil)))
end
Here are the results of a variety of provers:
z3:
▶ why3 prove -P z3 demo_logic.why
File "/usr/local/share/why3/drivers/z3_bare.drv", line 172, characters 36-41:
Unknown logical symbol map.Map.const
cvc4:
▶ why3 prove -P cvc4 demo_logic.why
File "/usr/local/share/why3/drivers/cvc4_bare.drv", line 180, characters 36-41:
Unknown logical symbol map.Map.const
pvs:
▶ why3 prove -P pvs demo_logic.why
File "/usr/local/share/why3/drivers/pvs-common.gen", line 41, characters 18-23:
Unknown logical symbol map.Map.const
This is my why3 version information:
▶ why3 --version
Why3 platform, version -n 0.85+git (build date: Tue Mar 10 08:27:47 EDT 2015)
The timestamps on the .drv files mentioned in the error messages match the timestamp on my why3 executable.
Is there something wrong with my theory or my installation?
Edit to add: In the tutorial itself it says to use why3 demo_logic.why to prove the theory, but when I try that I get this result:
▶ why3 demo_logic.why
'demo_logic.why' is not a Why3 command.
If instead I just do why3 prove demo_logic.why, the result is just (approximately) an echo of the theory:
▶ why3 prove demo_logic.why
theory List
(* use why3.BuiltIn.BuiltIn *)
type list 'a =
| Nil
| Cons 'a (list 'a)
predicate mem (x:'a) (l:list 'a) =
match l with
| Nil -> false
| Cons y r -> x = y || mem x r
end
goal G1 : mem 2 (Cons 1 (Cons 2 (Cons 3 (Nil:list int))))
end
Did you previously install an older version of Why3? Problems when running provers are often caused by a new Why3 picking up the configuration file of an old Why3. I have seen your particular problem fixed by this:
rm ~/.why3.conf
why3 config --detect

Absolute vs relative vs "slash" URL?

If this full URL:
http://domain.com/dir/file.css
Is an "absolute URL", where the link will work from any website.
And this:
../dir/file.css
Is a "relative URL", where the link will only work from that directory path.
What is the combination of those two called…
/dir/file.css
Where the link will work from any location on that site?
Your first example is a URL. Your second and third examples are not URLs, they're paths. If the path begins with / then it's an absolute path, otherwise it's a relative path.
Web browsers generally understand how to interpret a path in relation to the "current" host and path.
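That interpretation step can be sketched with Python's standard urllib.parse.urljoin, which resolves a reference against the URL of the current page. The base URL below is a hypothetical page location chosen for illustration; it is not part of the question.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the links.
base = "http://domain.com/a/b/page.html"

# Full URL: used as-is, independent of the current page.
print(urljoin(base, "http://domain.com/dir/file.css"))
# Absolute path: keeps the scheme and host of the current page, replaces the path.
print(urljoin(base, "/dir/file.css"))
# Relative path: resolved against the directory of the current page (/a/b/).
print(urljoin(base, "../dir/file.css"))
```

Only the last form depends on which directory the current page lives in, which is why it stops working when the page moves.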
You’re basically talking about a URI scheme. In your example:
/dir/file.css
This is considered the path:
/dir/
And this is the filename:
file.css
So saying “hostname plus path & filename” is a safe bet. Or perhaps /dir/file.css can be considered the root path, since the / at the beginning anchors it to the hostname part of the URL.
This diagram from Wikipedia explains it well:
  foo://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose
  \_/   \_______________/ \_________/ \__/            \___/ \_/ \______________________/ \__/
   |            |              |        |               |    |             |               |
   |        userinfo       hostname   port              |    |           query         fragment
   |    \________________________________/\_____________|____|/ \__/        \__/
   |                    |                        |      |    |    |           |
   |                    |                        |      |    |    |           |
scheme              authority                  path     |    |    interpretable as keys
 name   \_______________________________________________|____|/      \____/      \_____/
   |                            |                       |    |         |            |
   |                    hierarchical part               |    |    interpretable as values
   |                                                    |    |
   |            path               interpretable as filename |
   |   ___________|____________                              |
  / \ /                        \                             |
  urn:example:animal:ferret:nose                interpretable as extension
                path
         _________|________
 scheme /                  \
  name  userinfo  hostname       query
  _|__   ___|__   ____|____   _____|_____
 /    \ /      \ /         \ /           \
 mailto:username@example.com?subject=Topic
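The same decomposition can be reproduced with Python's standard library, as a quick way to check which part of a URL is which. This is only an illustrative sketch; the URL is the example from the diagram, written with the usual @ between userinfo and hostname.

```python
from urllib.parse import urlsplit, parse_qs

url = ("foo://username:password@example.com:8042"
       "/over/there/index.dtb?type=animal&name=narwhal#nose")
parts = urlsplit(url)

print(parts.scheme)           # scheme name
print(parts.netloc)           # authority: userinfo, hostname and port together
print(parts.username, parts.password, parts.hostname, parts.port)
print(parts.path)             # absolute path, starts with /
print(parse_qs(parts.query))  # query split into keys and values
print(parts.fragment)         # fragment
```

A reference such as /dir/file.css contains only the path component; the scheme and authority are taken from the page it appears on.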

LALR Parser Generator Implementation Problem

I'm currently trying to implement an LALR parser generator as described in "Compilers: Principles, Techniques, and Tools" (also called the "Dragon Book").
A lot already works. The parser generator is currently able to generate the full goto-graph.
Example Grammar:
S' --> S
S --> C C
C --> c C
C --> d
Nonterminals: S', S, C
Terminals: c, d
Start: S'
The goto-graph:
I[0]:  S' --> . S   , $
       S  --> . C C , $
       C  --> . c C , c
       C  --> . c C , d
       C  --> . d   , c
       C  --> . d   , d
I[1]:  S' --> S .   , $
I[2]:  S  --> C . C , $
       C  --> . c C , $
       C  --> . d   , $
I[3]:  S  --> C C . , $
I[4]:  C  --> c . C , c
       C  --> c . C , d
       C  --> c . C , $
       C  --> . c C , c
       C  --> . c C , d
       C  --> . c C , $
       C  --> . d   , c
       C  --> . d   , d
       C  --> . d   , $
I[5]:  C  --> d .   , c
       C  --> d .   , d
       C  --> d .   , $
I[6]:  C  --> c C . , c
       C  --> c C . , d
       C  --> c C . , $
Transitions:
I[0] --S--> I[1]
I[0] --C--> I[2]
I[0] --c--> I[4]
I[0] --d--> I[5]
I[2] --C--> I[3]
I[2] --c--> I[4]
I[2] --d--> I[5]
I[4] --c--> I[4]
I[4] --C--> I[6]
I[4] --d--> I[5]
I have trouble implementing the algorithm that generates the action table!
My algorithm computes the following output:
state |     action
      |  c  |  d  |  $
------------------------
  0   | s4  | s5  |
------------------------
  1   |     |     | acc
------------------------
  2   | s4  | s5  |
------------------------
  3   |     |     | r?
------------------------
  4   | s4  | s5  |
------------------------
  5   | r?  | r?  | r?
------------------------
  6   | r?  | r?  | r?
sx... shift to state x
rx... reduce to state x
The r? means that I don't know how to get the state (the ?) to which the parser should reduce. Does anyone know an algorithm to get ? using the goto-graph above?
If anything is not described clearly enough, please ask and I will try to explain it better!
Thanks for your help!
A shift entry is labeled with the next state, but a reduce entry names a production.
When you shift, you push a state reference onto your stack and proceed to the next state.
When you reduce, this is for a specific production. The production was responsible for shifting n states onto your stack, where n is the number of symbols in that production. E.g. one for S', two for S, and two or one for C (i.e. for the first or second alternative for C).
After n entries are popped off the stack, you return to the state where you started processing that production. For that state and the nonterminal resulting from the production, you look up the goto table to find the next state, which is then pushed.
So the reduce entries identify a production. In fact it may be sufficient to know the resulting nonterminal, and the number of symbols to pop.
Your table thus should read
state |      action     |   goto
      |  c  |  d  |  $  |  C  |  S
------------------------------------
  0   | s4  | s5  |     |  2  |  1
------------------------------------
  1   |     |     | acc |     |
------------------------------------
  2   | s4  | s5  |     |  3  |
------------------------------------
  3   |     |     | r1  |     |
------------------------------------
  4   | s4  | s5  |     |  6  |
------------------------------------
  5   | r3  | r3  | r3  |     |
------------------------------------
  6   | r2  | r2  | r2  |     |
where rx indicates reduce by production x, with the productions numbered 0 to 3 in the order they are listed in the question (so r1 is S --> C C, r2 is C --> c C, and r3 is C --> d).
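The shift/reduce/goto steps described above can be sketched as a small table-driven driver in Python. This is an illustration under stated assumptions, not the Dragon Book's code: the productions are numbered 0 to 3 in the order they are listed in the question (so S --> C C is production 1), the goto of state 4 on C is state 6 as in the goto-graph, and the names PRODUCTIONS, ACTION, GOTO and lr_parse are made up for this example.

```python
# (head, number of symbols in the body), indexed by production number.
PRODUCTIONS = [("S'", 1), ("S", 2), ("C", 2), ("C", 1)]

ACTION = {
    (0, "c"): ("shift", 4), (0, "d"): ("shift", 5),
    (1, "$"): ("accept", None),
    (2, "c"): ("shift", 4), (2, "d"): ("shift", 5),
    (3, "$"): ("reduce", 1),
    (4, "c"): ("shift", 4), (4, "d"): ("shift", 5),
    (5, "c"): ("reduce", 3), (5, "d"): ("reduce", 3), (5, "$"): ("reduce", 3),
    (6, "c"): ("reduce", 2), (6, "d"): ("reduce", 2), (6, "$"): ("reduce", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 3, (4, "C"): 6}

def lr_parse(tokens):
    """Return the list of production numbers used, in reduction order."""
    tokens = list(tokens) + ["$"]
    stack = [0]          # stack of states
    reductions = []
    i = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[i]))
        if entry is None:
            raise SyntaxError("unexpected %r in state %d" % (tokens[i], stack[-1]))
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)   # push the next state and consume the token
            i += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[-length:]                       # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])     # goto on the head nonterminal
            reductions.append(arg)
        else:
            return reductions

print(lr_parse("cdd"))
print(lr_parse("dd"))
```

Note that a reduce entry never needs a target state of its own: the state to continue in comes from the goto table, looked up with whatever state is on top of the stack after popping.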
You need to pop the stack and find the next state from there.
The rx means: reduce using the production with the number x!
Then everything becomes clear!
Simply pop the body of the production and push the head back onto the top!
