CYK build parsing tree from table - parsing

How do I build a parse tree once I have the CYK table? I didn't understand Wikipedia's explanation.
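Example (each row covers spans one symbol longer than the row below it, so the top cell spans the whole input; "-" marks an empty cell):
A
-    C
A    -    B
B,D  C,F  E    B,D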
How do I return a list of rules applied to get from A to BCED (or any other combination of letters from the longest line of the table)?
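The question gives no grammar, so here is a minimal sketch of one common approach, using a hypothetical toy grammar in Chomsky normal form (not the asker's). While filling the table, each cell records, for every nonterminal it contains, a back-pointer: the split point and the two child nonterminals that produced it (or the word, for a lexical rule). Rebuilding the tree is then a recursive walk from the symbol in the top cell (A in the question) down through those back-pointers, and the list of rules falls out of a pre-order walk of that tree. The names `cyk`, `build_tree` and `rules_used` are illustrative only.

```python
from typing import Dict, List, Tuple, Union

# Back-pointer for one nonterminal in one cell: either the word it covers
# (lexical rule X -> 'w') or (split, left_symbol, right_symbol) for X -> Y Z.
BackPointer = Union[str, Tuple[int, str, str]]

# Hypothetical toy grammar in Chomsky normal form (not the asker's grammar).
LEXICAL: Dict[str, List[str]] = {"the": ["Det"], "dog": ["N"], "cat": ["N"], "saw": ["V"]}
BINARY: List[Tuple[str, str, str]] = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]


def cyk(words: List[str]) -> List[List[Dict[str, BackPointer]]]:
    """table[i][j] maps each nonterminal deriving words[i:j] to a back-pointer."""
    n = len(words)
    table: List[List[Dict[str, BackPointer]]] = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for head in LEXICAL.get(word, []):
            table[i][i + 1][head] = word
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point: words[i:k] + words[k:j]
                for head, left, right in BINARY:
                    # Keep only the first derivation found, i.e. one parse tree.
                    if head not in table[i][j] and left in table[i][k] and right in table[k][j]:
                        table[i][j][head] = (k, left, right)
    return table


def build_tree(table, symbol: str, i: int, j: int):
    """Follow back-pointers from cell (i, j) down to the words."""
    back = table[i][j][symbol]
    if isinstance(back, str):
        return (symbol, back)
    k, left, right = back
    return (symbol, build_tree(table, left, i, k), build_tree(table, right, k, j))


def rules_used(tree) -> List[str]:
    """List the rules of a tree in leftmost-derivation (pre-order) order."""
    symbol, *children = tree
    if isinstance(children[0], str):
        return [f"{symbol} -> {children[0]}"]
    left, right = children
    return [f"{symbol} -> {left[0]} {right[0]}"] + rules_used(left) + rules_used(right)


words = "the dog saw the cat".split()
table = cyk(words)
tree = build_tree(table, "S", 0, len(words))
for rule in rules_used(tree):
    print(rule)
```

If a table only stores sets of nonterminals (as in the question), the same information can be recovered at reconstruction time instead: for the top-cell symbol, try each split point k and each rule A -> X Y with X in the left cell and Y in the right cell, then recurse into both cells.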

Creating a PsiElement from an ASTNode with implied nodes and out of order children

Is it OK for a custom language plugin built with IntelliJ's plugin SDK to produce a PsiElement tree where
some PsiElements have no associated ASTNode, i.e. myPsiElement.getNode() == null
some PsiElements have children out of order, e.g. myPsiElement.getChildren()[0].getStartOffsetInParent() > myPsiElement.getChildren()[1].getStartOffsetInParent()
some PsiElements correspond to zero characters in the source: myPsiElement.getTextLength() == 0
Would any of these properties make it harder to take advantage of language plugin SDK features?
For background:
I'm creating a custom language plugin for IntelliJ per "Implementing a Parser and PSI."
The bottom of the diagram from the docs shows the relationship between ASTNodes and PsiElements.
IIUC, first, a lexer segments the text into tokens. Then a parser drops node start and end markers between tokens to specify the parse tree structure. IntelliJ internals lift that token stream with markers into a (not very abstract) ASTNode tree. Finally, language-specific plugin code builds a PSI tree from the AST.
From that diagram, it looks like every node in the PSI tree is associated with an ASTNode.
This relationship seems bidirectional, per PsiElement.getNode(). The diagram doesn't show an arrow between MyPsiFile and the MyElementType.FILE ASTNode, but PsiFile.getNode() suggests there has to be one.
For language-specific reasons, my existing parser produces trees that don't clearly fit this model.
Nodes can appear out of order.
The expression a + b parses to a node like (call + a b). Note that infix and postfix operators are shifted left to before their first operand.
The parser synthesizes some nodes that correspond to no tokens, so the relationship between them and an AST node is unclear. For example, I need to produce distinct trees for
for (x;;) body
for (;x;) body
for (;;x) body
so the parser inserts symbols like the init= in (for init= x body), which the definition of for can use to decide what to do with x.
There is a concept of a "non-physical" PsiElement (see PsiElement.isPhysical()):
/**
* Checks if an actual source or class file corresponds to the element. Non-physical elements include,
* for example, PSI elements created for the watch expressions in the debugger.
* Non-physical elements do not generate tree change events.
* Also, {@link PsiDocumentManager#getDocument(PsiFile)} returns null for non-physical elements.
*/
but it's unclear to me whether those are associated with ASTNodes, or whether they are the right fit for nodes that are implied by the program text.

Pre Order, In order, and Post Order Tree Traversals

I am confused about in-order, pre-order and post-order traversals, specifically this example: pre-order ABAB, post-order BABA, in-order AABB.
I understand that the root is the first element of the pre-order sequence and the last element of the post-order sequence, but I don't understand how to finish constructing the binary tree.
Your post is vague and doesn't make much sense, but I'll explain in-order, pre-order and post-order traversal, and constructing a binary tree, for you.
One reason your question doesn't make sense is that you haven't established an order for the elements you describe: ABAB, BABA and AABB mean absolutely nothing without a tree that shows where each element goes (and is each element a letter? Why are they duplicated?).
Another reason is that you appear to think pre-, post- and in-order have something to do with creating a binary tree; they don't.
Pre-order, in-order and post-order are all depth-first search algorithms for tree traversal. That is to say, they are ways of navigating a tree, not creating one. You may use these algorithms to find elements, or simply to print out all the contents of a tree; this is especially useful for a tree whose nodes are only linked via pointers (as opposed to, say, an array-based binary heap).
Imagine the following binary tree (the same for all examples):
        A
      /   \
     B     C
    / \   / \
   D   E F   G
Pre-order traversal visits a node first, then its whole left subtree, then its whole right subtree, applying the same rule recursively in each subtree. In the example tree, pre-order traversal starts at the root (A), goes left (B), goes left again (D); D has no children, so it backs up to B and goes right (E); then it backs up to A and does the same with the right subtree, giving the traversal sequence A B D E C F G.
In-order traversal is similar, but instead of displaying each node as soon as it reaches it, it first goes as deep to the left as it can and displays that node, then goes back up and displays the parent (hence 'in' order, between its two subtrees), and then does the same thing to the right, recursively, until it's done. In the example tree we'd print D first, go back up to B and print B, then E, then back up to A, and so on, so the final output is D B E A F C G. Wikipedia's example may make more sense, as it uses a more complicated tree.
In post-order we essentially print from the bottom up: we traverse the left subtree, then the right subtree, and only then print the root, recursively, e.g. D E B F G C A. Again, this example makes more sense with Wikipedia's more complicated tree.
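The three traversals can be written as three recursive functions that differ only in where the node's own value is emitted relative to its two subtrees. A minimal sketch on the example tree; the `Node` class here is an assumption for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def pre_order(node: Optional[Node]) -> List[str]:
    """Node, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node: Optional[Node]) -> List[str]:
    """Left subtree, then node, then right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node: Optional[Node]) -> List[str]:
    """Left subtree, then right subtree, then node."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


tree = Node("A", Node("B", Node("D"), Node("E")), Node("C", Node("F"), Node("G")))
print(" ".join(pre_order(tree)))   # A B D E C F G
print(" ".join(in_order(tree)))    # D B E A F C G
print(" ".join(post_order(tree)))  # D E B F G C A
```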
If you want to construct a tree, there are many ways to do so, but it depends entirely on what kind of ordering structure you want. Do you want a binary structure or an n-ary structure? Do you care about which element is on top, or do you only want the min/max (like a pairing heap or binary heap priority queue)? Do you have a search condition, such that the root of each subtree must be larger/smaller/satisfy some other condition relative to its children or its parent (like a binary search tree)?
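As one example of the last option (a search condition), here is a minimal sketch of an unbalanced binary search tree. The class and function names are hypothetical, and sending equal keys to the right is an arbitrary choice. Note how an in-order traversal of such a tree visits the keys in sorted order:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BSTNode:
    key: int
    left: Optional["BSTNode"] = None
    right: Optional["BSTNode"] = None


def insert(root: Optional[BSTNode], key: int) -> BSTNode:
    """Insert key so smaller keys go left and larger-or-equal keys go right (no balancing)."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def in_order(node: Optional[BSTNode]) -> List[int]:
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)
print(in_order(root))  # [20, 30, 40, 50, 60, 70, 80]
```

The shape of the tree depends on the insertion order (inserting already sorted keys produces a chain), which is why balanced variants exist.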
This post is also good at explaining the traversals if this isn't sufficient; it also explains why you need more than one type of ordering to reconstruct a tree from a sequence of nodes with the proper connections (if your original intent was to copy a binary tree structure).
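To illustrate reconstructing a tree from more than one traversal, here is a minimal sketch, assuming single-character labels and a hypothetical `Node` class. Candidate trees come from the pre-order/in-order pair: the first pre-order element is the root, and everything before it in the in-order sequence is its left subtree. When labels repeat, the root may match several in-order positions, so every position is tried, and the post-order sequence then filters the candidates. With distinct labels, the pre-order/in-order pair alone already determines the tree.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def post_order(node: Optional[Node]) -> str:
    if node is None:
        return ""
    return post_order(node.left) + post_order(node.right) + node.value


def from_pre_in(pre: str, ino: str) -> Iterator[Optional[Node]]:
    """Yield every binary tree with this pre-order and in-order sequence."""
    if sorted(pre) != sorted(ino):
        return  # different labels, so no tree fits both sequences
    if not pre:
        yield None  # the empty tree
        return
    root = pre[0]
    # With repeated labels the root may sit at several in-order positions.
    for i, label in enumerate(ino):
        if label != root:
            continue
        # ino[:i] is the left subtree; it fills the next i slots of pre.
        for left in from_pre_in(pre[1:i + 1], ino[:i]):
            for right in from_pre_in(pre[i + 1:], ino[i + 1:]):
                yield Node(root, left, right)


def reconstruct(pre: str, ino: str, post: str) -> List[Optional[Node]]:
    """Keep only the candidates whose post-order also matches."""
    return [t for t in from_pre_in(pre, ino) if post_order(t) == post]


def show(node: Optional[Node]) -> str:
    """Compact form value(left, right), with '-' for a missing child."""
    if node is None:
        return "-"
    if node.left is None and node.right is None:
        return node.value
    return f"{node.value}({show(node.left)}, {show(node.right)})"


for tree in reconstruct("ABAB", "AABB", "BABA"):
    print(show(tree))  # A(-, B(A(-, B), -))
```

For the sequences in the question, the pre-order/in-order pair yields two candidates and only one of them also has post-order BABA: a root A whose right child B has a left child A, which in turn has a right child B.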

What is the difference between a Parse Tree, an Annotated Parse Tree and an Activation Tree? (compiler)

I know what a parse tree is and what an abstract syntax tree is, but after reading a bit about annotated parse trees (which are drawn as detailed trees, just like parse trees), I feel that they are the same as parse trees.
Can anyone please explain differences among these three in detail ?
Thanks.
AN ANNOTATED PARSE TREE is a parse tree showing the values of the attributes at each node. The process of computing the attribute values at the nodes is called annotating or decorating the parse tree.
For example, the image linked below shows an annotated parse tree for 3*5+4n:
https://i.stack.imgur.com/WAwdZ.png
A parse tree is a representation of how a source text (of a program) has been decomposed to demonstrate that it matches a grammar for a language. Interior nodes in the tree are grammar nonterminals (BNF rule left-hand-side tokens), while the leaves are grammar terminals (all the other tokens), in the order required by the grammar rules.
An annotated parse tree is one in which various facts about the program have been attached to parse tree nodes. For example, one might compute the set of identifiers that each subtree mentions, and attach that set to the subtree. Compilers have to store the information they collect about the program somewhere; this is a convenient place for information that is derivable from the tree.
An activation tree is a conceptual snapshot of a set of procedures calling one another at runtime. Nodes in such a tree represent procedures that have run; children represent procedures called by their parent.
So a key difference between (annotated) parse trees and activation trees is what they are used to represent: compile time properties vs. runtime properties.
An annotated parse tree lets you integrate the entire compilation into the parse tree structure. CM Modula-3 does that, if I'm not mistaken.
To build an APT, declare an abstract base class for nodes, subclass it once per production, and declare the child nodes as fields.
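A minimal sketch of that recipe, assuming the usual expression grammar E -> E + T | T, T -> T * F | F, F -> digit (an assumption; the answers don't name a grammar) and a single synthesized attribute `val`. Each production becomes a subclass whose children are fields, and `annotate()` decorates every node with its value. The tree for 3*5+4 is built by hand here rather than by a parser:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ParseNode(ABC):
    """Abstract base class for parse-tree nodes; `val` holds the annotation."""

    val: int  # synthesized attribute, filled in by annotate()

    @abstractmethod
    def annotate(self) -> int:
        """Compute this node's attribute from its children and store it."""


@dataclass
class Digit(ParseNode):  # F -> digit
    lexval: int

    def annotate(self) -> int:
        self.val = self.lexval
        return self.val


@dataclass
class TermFromFactor(ParseNode):  # T -> F
    factor: ParseNode

    def annotate(self) -> int:
        self.val = self.factor.annotate()
        return self.val


@dataclass
class Mul(ParseNode):  # T -> T * F
    term: ParseNode
    factor: ParseNode

    def annotate(self) -> int:
        self.val = self.term.annotate() * self.factor.annotate()
        return self.val


@dataclass
class ExprFromTerm(ParseNode):  # E -> T
    term: ParseNode

    def annotate(self) -> int:
        self.val = self.term.annotate()
        return self.val


@dataclass
class Add(ParseNode):  # E -> E + T
    expr: ParseNode
    term: ParseNode

    def annotate(self) -> int:
        self.val = self.expr.annotate() + self.term.annotate()
        return self.val


# Parse tree for 3*5+4.
tree = Add(ExprFromTerm(Mul(TermFromFactor(Digit(3)), Digit(5))), TermFromFactor(Digit(4)))
print(tree.annotate())     # 19
print(tree.expr.term.val)  # 15, stored on the T -> T * F node
```

Before `annotate()` runs this is a plain parse tree; afterwards every node carries its `val`, which is what makes it an annotated one.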

Is there a rule of thumb for which symbol to shift first when generating LR(0) parser table?

I'm reading about LR(0) parsers in the book Modern Compiler Implementation in Java. Below is the parsing table as given in the book:
http://postimg.org/image/hyowddu1h/
Start symbol: S --> E$
Productions:
(1) E --> T + E
(2) E --> T
(3) T --> x
I tried making a parsing table from the given productions, but I didn't get the same table as the one in the book. I think I shifted the symbols correctly; my table is just different from the book's.
(Note: I started with state 0 instead of state 1 like in the book)
So is the parsing table unique, or is there a rule of thumb for deciding which symbol to shift first, or how to label the parser states correctly? I always shift the terminal symbols first, then the nonterminal symbols, as shown below:
http://postimg.org/image/76vbu2vu3/
Thanks in advance!
State numbers are not that important. In fact, they're not important at all. What is important is state identity; how you label the state to show that identity is up to you. You could use names, like hurricanes and tornados.
Two machines are the same if you can construct a one-to-one mapping between their state sets such that every labeled transition is preserved by the mapping. I'm pretty sure you can construct that mapping between the state machine in your textbook and the one you built.
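To see that the numbering is arbitrary, the sketch below builds the LR(0) states for the grammar in the question twice, exploring symbols in two different orders, and then searches for the one-to-one mapping described above. Assumptions: items are (lhs, rhs, dot) triples, $ is treated as end-of-input and never shifted (the parser accepts on it in the state containing S -> E.$), and states are numbered breadth-first from 0. The function names are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

Item = Tuple[str, Tuple[str, ...], int]  # (lhs, rhs, dot position)

GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("E", "$")],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("x",)],
}


def closure(items) -> FrozenSet[Item]:
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:  # dot before a nonterminal
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol) -> Optional[FrozenSet[Item]]:
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved) if moved else None


def build(symbol_order: List[str]):
    """Number states breadth-first, trying symbols in the given order."""
    start = closure({("S", ("E", "$"), 0)})
    states, index, transitions = [start], {start: 0}, {}
    i = 0
    while i < len(states):
        for symbol in symbol_order:
            target = goto(states[i], symbol)
            if target is None:
                continue
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(i, symbol)] = index[target]
        i += 1
    return states, transitions


def isomorphic(trans_a, trans_b, n_a: int, n_b: int) -> bool:
    """Search for a one-to-one state mapping that preserves every labeled transition."""
    mapping, queue = {0: 0}, deque([0])
    while queue:
        s = queue.popleft()
        out_a = {sym: t for (q, sym), t in trans_a.items() if q == s}
        out_b = {sym: t for (q, sym), t in trans_b.items() if q == mapping[s]}
        if out_a.keys() != out_b.keys():
            return False
        for sym, ta in out_a.items():
            tb = out_b[sym]
            if ta in mapping:
                if mapping[ta] != tb:
                    return False
            elif tb in mapping.values():
                return False
            else:
                mapping[ta] = tb
                queue.append(ta)
    return len(mapping) == n_a == n_b


states_a, trans_a = build(["x", "+", "E", "T"])  # terminals first
states_b, trans_b = build(["E", "T", "x", "+"])  # nonterminals first
print(trans_a == trans_b)                                           # False
print(isomorphic(trans_a, trans_b, len(states_a), len(states_b)))  # True
```

The two runs assign different numbers to the same item sets, so their transition tables differ, yet the mapping exists: the machines are the same.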

Prolog Parsing Output

I'm doing a piece of university coursework, and I'm stuck with some Prolog.
The coursework is to make a really rudimentary Watson (the machine that answers questions on Jeopardy!).
Anyway, I've managed to make it output the following:
noun_phrase(det(the),np2(adj(traitorous),np2(noun(tostig_godwinson)))),
verb_phrase(verb(was),np(noun(slain)))).
But the coursework specifies that I now need to extract the first and second nouns, and the verb, to make a more concise sentence, i.e. [tostig_godwinson, was, slain].
I much prefer programming in languages like C etc., so I'm a bit stuck. If this were a procedural language, I'd use parsing tools, but Prolog doesn't have any... What do I need to do to extract those parts?
Thank you in advance
In Prolog, the language is the parsing tool. Use the univ (=..) operator to do term inspection:
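% terminal(+Tree, ?Type, ?Item): Item is a word of category Type in Tree.
% Leaf case: a node such as noun(slain) wraps a single atomic word.
terminal(Tree, Type, Item) :-
    compound(Tree),
    Tree =.. [Type, Item],
    atomic(Item).
% Recursive case: =.. gives [Functor|Args]; descend into every argument.
terminal(Tree, Type, Item) :-
    compound(Tree),
    Tree =.. [_|Args],
    member(Node, Args),
    terminal(Node, Type, Item).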
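Now get a list of all nouns with findall(N, terminal(Tree, noun, N), Nouns) and pick the one you need with nth1/3, e.g. nth1(1, Nouns, FirstNoun); the verb works the same way with verb in place of noun.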
