I don't understand the relationship between a parse tree and a derivation. The parse tree is invariant with respect to the derivation, but does this mean that regardless of the derivation (rightmost or leftmost) the parse tree stays the same? Or does the parse tree change depending on the method (rightmost or leftmost)?
Please help me.
Sorry for my bad English.
The parse tree is a record of the derivation. Each non-leaf node of the tree is the result of a single derivation step.
At the root of the parse tree and at the beginning of the derivation, you find the grammar's start symbol. A derivation step replaces a non-terminal with the right-hand side of some production which has that non-terminal on the left-hand side. In the tree, the node corresponding to the non-terminal is given a sequence of children, each one a symbol in the right-hand side of the production. Terminal symbols become leaf nodes and non-terminals will eventually become the top of a subtree.
If the grammar is unambiguous, there is only one parse tree for each derivable sentence. But that parse tree represents a large number of possible derivations, unless the grammar is linear (that is, the right-hand side of every production contains at most one non-terminal). In a derivation which is being built, you can select any non-terminal for the next derivation step; in the parse tree, you can select any node representing a non-terminal which does not yet have children.
The leftmost and rightmost derivations are just two of these many possibilities. (Again, unless the grammar is linear, in which case the leftmost and rightmost derivations are the same derivation.) But a derivation doesn't have to be leftmost or rightmost. At each step, it can choose any non-terminal, not only the leftmost or rightmost one. In the tree representation, a possible derivation can be generated by any valid topological sort of the nodes.
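For example, take the unambiguous grammar S -> A B, A -> a, B -> b. The leftmost and rightmost derivations of "a b" are different derivations, but they correspond to the same parse tree (S at the root, with children A and B):

Leftmost:  S => A B => a B => a b
Rightmost: S => A B => A b => a b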
But that doesn't matter in practical terms. The only practically useful question is whether some sentence in the language has more than one parse tree, which is exactly the same as asking whether some sentence has more than one leftmost derivation (or, equivalently, more than one rightmost derivation).
I'm concerned about the very important difference between the terms "LL(k) parser" and "parser for LL(k) grammars". Consider an LL(1) parser with backtracking: it IS a parser for LL(k) grammars, because it can parse them, but it is NOT an LL(k) parser, because it does not use k tokens of lookahead from a single position in the grammar; instead it explores the possible cases with backtracking, even though it still reads k tokens while exploring.
Am I right?
The question may come down to the way the lookahead is performed. If the lookahead actually works by processing the grammar with backtracking, that does not make it an LL(k) parser. To be an LL(k) parser, the parser must not fall back on a backtracking mechanism, because then it would be an "LL(1) parser with backtracking that can parse LL(k) grammars".
Am I right again?
I think the difference comes down to the expectation that an LL(1) parser uses constant time per token, and an LL(k) parser uses at most k times a constant (i.e. time linear in the lookahead) per token, not the exponential time a backtracking parser could need.
Update 1: to simplify -- per token, is LL(k) parsing expected to run in time exponential in k, or linear in k?
Update 2: I have changed it to LL(k) because the question does not depend on the range of k (a fixed integer or unbounded).
An LL(k) parser needs to do the following at each point in the inner loop:
Collect the next k input symbols. Since this is done at each point in the input, this can be done in constant time by keeping the lookahead vector in a circular buffer.
If the top of the prediction stack is a terminal, then it is compared with the next input symbol; either both are discarded or an error is signalled. This is clearly constant time.
If the top of the prediction stack is a non-terminal, the action table is consulted, using the non-terminal, the current state and the current lookahead vector as keys. (Not all LL(k) parsers need to maintain a state; this is the most general formulation. But it doesn't make a difference to complexity.) This lookup can also be done in constant time, again by taking advantage of the incremental nature of the lookahead vector.
The prediction action is normally done by pushing the right-hand side of the selected production onto the stack. A naive implementation would take time proportional to the length of the right-hand side, which is not related to either the lookahead k or the length of the input N, but rather to the size of the grammar itself. It's possible to avoid the variability of this work by simply pushing a reference to the right-hand side, which can be used as though it were the list of symbols (since the list can't change during the parse).
However, that's not the full story. Executing a prediction action does not consume an input, and it's possible -- even likely -- that multiple predictions will be made for a single input symbol. Again, the maximum number of predictions is only related to the grammar itself, not to k nor to N.
More specifically, since the same non-terminal cannot be predicted twice in the same place without violating the LL property, the total number of predictions cannot exceed the number of non-terminals in the grammar. Therefore, even if you do push the entire right-hand side onto the stack, the total number of symbols pushed between consecutive shift actions cannot exceed the size of the grammar. (Each right-hand side can be pushed at most once. In fact, only one right-hand side for a given non-terminal can be pushed, but it's possible that almost every non-terminal has only one right-hand side, so that doesn't reduce the asymptote.) If instead only a reference is pushed onto the stack, the number of objects pushed between consecutive shift actions -- that is, the number of predict actions between two consecutive shift actions -- cannot exceed the size of the non-terminal alphabet. (But, again, it's possible that |V| is O(|G|).)
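To make the constant-work-per-step structure concrete, here is a minimal sketch of that inner loop, specialized to k = 1 and with no parser state; the grammar, table, and names are made up for illustration, not taken from any particular implementation.

# Toy LL(1) predictive parser for the grammar  S -> a S b | c
# TABLE maps (non-terminal, lookahead) -> right-hand side to predict.
TABLE = {('S', 'a'): ['a', 'S', 'b'],
         ('S', 'c'): ['c']}
NONTERMINALS = {'S'}

def parse(tokens):
    tokens = list(tokens) + ['$']        # end marker
    stack = ['$', 'S']                   # prediction stack, start symbol on top
    pos = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[pos]))   # constant-time table lookup
            if rhs is None:
                raise SyntaxError('no prediction for (%s, %s)' % (top, tokens[pos]))
            stack.extend(reversed(rhs))           # predict: push the right-hand side
        elif top == tokens[pos]:
            pos += 1                              # shift: terminal matches the input
        else:
            raise SyntaxError('expected %s, got %s' % (top, tokens[pos]))
    return pos == len(tokens)

print(parse('aacbb'))    # True: "a a c b b" is in the language

Each iteration does a constant amount of work; the only question, as discussed above, is how many predict iterations can occur between two consecutive shifts.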
The linearity of LL(k) parsing was established, I believe, in Lewis and Stearns (1968), but I don't have that paper at hand right now, so I'll refer you to the proof in Sippu & Soisalon-Soininen's Parsing Theory (1988), where it is proved in Chapter 5 for Strong LL(k) (as defined by Rosenkrantz & Stearns 1970), and in Chapter 8 for Canonical LL(k).
In short, the time the LL(k) algorithm spends between shifting two successive input symbols is expected to be O(|G|), which is independent of both k and N (and, of course, constant for a given grammar).
This does not really have any relation to LL(*) parsers, since an LL(*) parser does not just try successive LL(k) parses (which would not be possible, anyway). For the LL(*) algorithm presented by Terence Parr (which is the only reference I know of which defines what LL(*) means), there is no bound to the amount of time which could be taken between successive shift actions. The parser might expand the lookahead to the entire remaining input (which would, therefore, make the time complexity dependent on the total size of the input), or it might fail over to a backtracking algorithm, in which case it is more complicated to define what is meant by "processing an input symbol".
I suggest you read chapter 5.1 of Aho & Ullman, Volume 1.
https://dl.acm.org/doi/book/10.5555/578789
1. An LL(k) parser is a k-predictive algorithm (k is the lookahead, an integer >= 1).
2. An LL(k) parser can parse any LL(k) grammar. (chapter 5.1.2)
3. For all a, b with a < b, every LL(a) grammar is also an LL(b) grammar. But the reverse is not true.
4. An LL(k) parser is PREDICTIVE. So there is NO backtracking.
5. All LL(k) parsers run in O(n) time, where n is the length of the parsed sentence.
6. It is important to understand that an LL(3) parser does not parse faster than an LL(1) parser. But the LL(3) parser can parse MORE grammars than the LL(1) parser. (see points #2 and #3)
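For example (an illustrative grammar of my own, not taken from the book): the grammar S -> a b | a c is LL(2) but not LL(1), because one token of lookahead (always 'a') cannot choose between the two productions, while two tokens ('a b' versus 'a c') can. An LL(2) parser handles it; an LL(1) parser cannot. That is the hierarchy points #3 and #6 are describing.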
I am reading a tutorial on LR parsing. The tutorial uses this example grammar:
S -> aABe
A -> Abc | b
B -> d
Then, to illustrate how the parsing algorithm works, the tutorial shows the process of parsing the string "abbcde".
I understand that at each step of the algorithm, a qualifying production (i.e. a grammar rule, shown in column 2 of the table) is found to match a segment of the string. But how does the LR parser choose among a set of qualifying productions (shown in column 3 of the table)?
An LR parse of a string traces out a rightmost derivation in reverse. In that sense, the ordering of the reductions applied is what you would get if you derived the string by always expanding out the rightmost nonterminal, then running that process backwards. (Try this out on your example - isn’t that neat?)
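Concretely, the rightmost derivation of "abbcde" with your grammar is

S => aABe => aAde => aAbcde => abbcde

and the LR parser performs exactly those steps as reductions, back to front:

abbcde => aAbcde   (reduce the first b using A -> b)
aAbcde => aAde     (reduce Abc using A -> Abc)
aAde   => aABe     (reduce d using B -> d)
aABe   => S        (reduce aABe using S -> aABe)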
The specific mechanism by which LR parsers actually do this involves the use of a parsing automaton that tracks where within the grammar productions the parse happens to be, along with some lookahead information. There are several different flavors of LR parser (LR(0), SLR(1), LALR(1), LR(1), etc.), which differ on how the automaton is structured and how they use lookahead information. You may find it helpful to search for a tutorial on how these automata work, as that’s the heart of how LR parsers actually work.
In an unambiguous grammar, do the leftmost and rightmost derivations both produce the same parse tree?
I ask because I have read that a grammar in which some sentence has more than one parse tree is said to be ambiguous.
If the grammar is unambiguous, there is only one parse tree. (By definition.) So the leftmost and rightmost derivations generate the same tree.
You can think of a derivation as a tree walk. For a given tree, there are many different possible ways of traversing it. A leftmost derivation corresponds to a pre-order depth-first traversal; a rightmost derivation, read in reverse (which is the order an LR parser produces it), corresponds to a post-order traversal.
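Here is a tiny sketch of that correspondence (the tree encoding and names are just illustrative): a pre-order walk visits nodes in the order a leftmost derivation expands them, and a post-order walk visits them in the order a bottom-up parser reduces them.

# Parse tree for "a b" under S -> A B, A -> a, B -> b, as (label, children) pairs.
tree = ('S', [('A', [('a', [])]), ('B', [('b', [])])])

def preorder(node):
    label, children = node
    yield label
    for child in children:
        yield from preorder(child)

def postorder(node):
    label, children = node
    for child in children:
        yield from postorder(child)
    yield label

print(list(preorder(tree)))   # ['S', 'A', 'a', 'B', 'b'] -- top-down, leftmost-derivation order
print(list(postorder(tree)))  # ['a', 'A', 'b', 'B', 'S'] -- bottom-up, reduction order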
I am currently learning language processors and a topic that comes up very often is the direction in which elements in a grammar are consumed. Left to right or right to left.
I understand the concept, but there seem to be so many ways of writing these rules and I am not sure if they are all the same. What I've seen so far is:
Right/Left recursion,
Right/Left-most derivation,
Right/Left reduction, precedence, associativity etc.
Do these all mean the same thing?
No, they all have different meanings.
Right- and left-recursion refer to recursion within production rules. A production for a non-terminal is recursive if it can derive a sequence containing that non-terminal; it is left-recursive if the non-terminal can appear at the start (left edge) of the derived sequence, and right-recursive if it can appear at the end (right edge). A production can be recursive without being either left- or right-recursive, and it can even be both left- and right-recursive.
For example:
term: term '*' factor { /* left-recursive */ }
assignment: lval '=' assignment { /* right-recursive */ }
The above examples are both direct recursion; the non-terminal directly derives a sequence containing the non-terminal. Recursion can also be indirect; it is still recursion.
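For example (an illustrative fragment of my own, in the same style as above), neither rule below refers to itself directly, but a is still left-recursive, through b:

a: b 'x'          { /* indirectly left-recursive: a => b 'x' => a 'y' 'x' */ }
b: a 'y' | 'z'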
All common parsing algorithms process the input left-to-right, which is the first L in both LL and LR. Top-down (LL) parsing finds a leftmost derivation (the second L), while bottom-up (LR) parsing finds a rightmost derivation (the R).
Conceptually, both types of parser start with a single non-terminal (the start symbol) and "guess" a derivation, repeatedly expanding some non-terminal in the current sequence until the input text is derived. In a leftmost derivation, it is always the leftmost non-terminal which is expanded. In a rightmost derivation, it is always the rightmost non-terminal.
So a top-down parser always guesses which production to use for the first non-terminal, after which it needs to again work on whatever is now the first non-terminal. ("Guess" here is informal. It can look at the input to be matched -- or at least the next k tokens of the input -- in order to determine which production to use.) This is called top-down processing because it builds the parse tree from the top down.
It's easier (at least for me) to visualize the action of a bottom-up parser in reverse; it builds the parse tree bottom up by repeatedly reading just enough of the input to find some production, which will be the last derivation in the derivation chain. So it does produce a rightmost derivation, but it outputs it back-to-front.
In an LR grammar for an operator language (roughly speaking, a grammar for languages which look like arithmetic expressions), left- and right-associativity are modelled using left- and right-recursive grammar rules, respectively. "Associativity" is an informal description of the grammar, as is "precedence".
Precedence is modelled by using a series of grammar rules, each of which refers to the next rule (and which usually end up with a recursive production for handling parentheses -- '(' expr ')' -- which is neither left- nor right-recursive).
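A minimal sketch of such a cascade (the names are illustrative, written in the same yacc-like style as the examples above):

expr:   expr '+' term | term        { /* '+' is left-associative, lowest precedence */ }
term:   term '*' factor | factor    { /* '*' binds tighter than '+' */ }
factor: '(' expr ')' | NUMBER       { /* neither left- nor right-recursive */ }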
There is an older style of bottom-up parsing, called "operator precedence parsing", in which precedence is explicitly part of the language description. One common operator-precedence algorithm is the so-called Shunting Yard algorithm. But if you have an LALR(1) parser generator, like bison, you might as well use that instead, because it is both more general and more precise.
(I am NOT an expert on parser and compiler theory. I happen to be learning something related. And I'd like to share something I have found so far.)
I strongly suggest taking a look at this awesome article.
It explains and illustrates the LL and LR algorithms. You can clearly see why LL is called top-down and LR is called bottom-up.
Some quotation:
The primary difference between how LL and LR parsers operate is that an LL parser outputs a pre-order traversal of the parse tree and an LR parser outputs a post-order traversal.
...
We are converging on a very simple model for how LL and LR parsers operate. Both read a stream of input tokens and output that same token stream, inserting rules in the appropriate places to achieve a pre-order (LL) or post-order (LR) traversal of the parse tree.
...
When you see designations like LL(1), LR(0), etc. the number in parentheses is the number of tokens of lookahead.
And as to the acronyms: (source)
The first L in LR and LL means that the parser reads input text in one direction without backing up; that direction is typically left to right within each line, and top to bottom across the lines of the full input file.
The remaining R and L mean: right-most and left-most derivation, respectively.
These are 2 different parsing strategies. A parsing strategy determines the next non-terminal to rewrite. (source)
For left-most derivation, it is always the leftmost nonterminal.
For right-most derivation, it is always the rightmost nonterminal.
I have a tree, specifically a parse tree with tags at the nodes and strings/words at the leaves. I want to pass this tree as input into a neural network all the while preserving its structure.
Current approach
Assume we have some dictionary of words w1, w2, ..., wn.
Encode the words that appear in the parse tree as n-dimensional binary vectors, with a 1 in the i-th position whenever the word in the parse tree is wi.
Now, what about the tree structure? There are on the order of 2^n possible parent structures for n words appearing at the leaves, so we can't set a maximum number of input words and then just brute-force enumerate all trees.
Right now all I can think of is to approximate the tree by taking only the direct parent of each leaf. This can be represented by a binary vector as well, with dimension equal to the number of different tag types -- on the order of ~100, I suppose.
My input is then two vectors per word: the vector representation of the word itself and the vector representation of its parent tag.
Except this will lose a lot of the structure in the sentence. Is there a standard/better way of solving this problem?
You need a recursive neural network. Please see this repository for an example implementation: https://github.com/erickrf/treernn
The principle of a recursive (not recurrent) neural network is shown in this picture.
It learns a representation of each leaf, and then goes up through the parents to finally construct a representation of the whole structure.
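A minimal sketch of the idea in PyTorch (this is not the repository's code; the class, the binary-tree assumption, and all names are mine, just for illustration): each node's vector is computed from its children's vectors, bottom-up.

import torch
import torch.nn as nn

class TreeNode:
    """A parse-tree node: either a leaf holding a word id, or an inner node with children."""
    def __init__(self, word_id=None, children=None):
        self.word_id = word_id
        self.children = children or []

class TreeRNN(nn.Module):
    """Recursive (not recurrent) network: combines child representations into a parent representation."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Combines exactly two children; n-ary trees would need a different combiner.
        self.combine = nn.Linear(2 * dim, dim)

    def forward(self, node):
        if not node.children:                      # leaf: look up the word embedding
            return self.embed(torch.tensor([node.word_id]))
        child_vecs = [self(c) for c in node.children]
        return torch.tanh(self.combine(torch.cat(child_vecs, dim=-1)))

# Usage: the tree for the two-word input with word ids 0 and 1, assuming a binary tree.
model = TreeRNN(vocab_size=10, dim=8)
root = TreeNode(children=[TreeNode(word_id=0), TreeNode(word_id=1)])
sentence_vec = model(root)   # one vector for the whole structure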
Encode the tree structure: think of a recurrent neural network, where you have one chain that can be constructed with a for loop. But here you have a tree, so you need some kind of loop with branching. A recursive function call might work, with some Python overhead.
I suggest you build the neural network with a 'define by run' framework (like Chainer or PyTorch) to reduce that overhead, because your tree may have to be rebuilt differently for each data sample, which requires rebuilding the computation graph.
Read Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, with the original Torch7 implementation here and a PyTorch implementation; it may give you some ideas.
For encoding the tag at a node, I think the easiest way would be to encode it the same way you encode a word.
For example, a node's data is [word vector][tag vector]. If the node is a leaf, you have a word but may not have a tag (you did not say whether there is a tag at leaf nodes), so the leaf representation is [word vector][zero vector] (or [word vector][tag vector]). An inner node that has no word becomes [zero vector][tag vector]. Then inner nodes and leaves have data representations of the same dimension, and you may treat them equally (or not :3).
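A tiny sketch of that node layout (dimensions and names are purely illustrative):

import torch

WORD_DIM, TAG_DIM = 8, 4

def node_vector(word_vec=None, tag_vec=None):
    """Concatenate [word vector][tag vector], padding whichever part is missing with zeros."""
    w = word_vec if word_vec is not None else torch.zeros(WORD_DIM)
    t = tag_vec if tag_vec is not None else torch.zeros(TAG_DIM)
    return torch.cat([w, t])   # every node ends up with the same dimension

leaf = node_vector(word_vec=torch.randn(WORD_DIM))    # [word][zeros]
inner = node_vector(tag_vec=torch.randn(TAG_DIM))     # [zeros][tag]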
Encode each leaf node using (i) the sequence of nodes that connects it to the root node and (ii) the encoding of the leaf node that comes before it.
For (i), use a recurrent network whose input is tags. Feed this RNN the root tag, the second level tag, ..., and finally the parent tag (or their embeddings). Combine this with the leaf itself (the word or its embedding). Now, you have a feature that describes the leaf and its ancestors.
For (ii), also use a recurrent network! Simply start by computing the feature described above for the leftmost leaf and feed it to a second RNN. Keep doing this for each leaf, moving from left to right. At each step, the second RNN will give you a vector that represents the current leaf with its ancestors, plus the leaves that come before it and their ancestors.
Optionally, do (ii) bi-directionally and you will get a leaf feature that incorporates the whole tree!
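A rough sketch of that two-RNN idea (the use of GRUs and all names and dimensions are my own illustrative choices, not from any particular paper):

import torch
import torch.nn as nn

TAG_DIM, WORD_DIM, HID = 4, 8, 16

# RNN (i): encodes the tag path root -> ... -> parent of a leaf.
ancestor_rnn = nn.GRU(input_size=TAG_DIM, hidden_size=HID, batch_first=True)
# RNN (ii): runs left to right over the per-leaf features.
leaf_rnn = nn.GRU(input_size=HID + WORD_DIM, hidden_size=HID, batch_first=True)

def leaf_feature(tag_path, word_vec):
    """Encode one leaf from its ancestor tag sequence plus its own word vector."""
    _, h = ancestor_rnn(tag_path.unsqueeze(0))              # tag_path: (path_len, TAG_DIM)
    return torch.cat([h.squeeze(0).squeeze(0), word_vec])   # (HID + WORD_DIM,)

# Feed the per-leaf features, left to right, to the second RNN.
leaves = [leaf_feature(torch.randn(3, TAG_DIM), torch.randn(WORD_DIM)) for _ in range(5)]
outputs, _ = leaf_rnn(torch.stack(leaves).unsqueeze(0))     # one contextual vector per leaf

The first GRU summarizes a leaf's ancestors; the second gives each leaf a vector that also reflects everything to its left. Run the second one in both directions, as suggested above, and each leaf's vector reflects the whole tree.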