Theory: LL(k) parser vs parser for LL(k) grammars

I'm concerned about the important difference between the terms "LL(k) parser" and "parser for LL(k) grammars". When an LL(1) parser with backtracking is in question, it IS a parser for LL(k) grammars, because it can parse them, but it is NOT an LL(k) parser, because it does not use k tokens of lookahead from a single position in the grammar; instead it explores the possible cases with backtracking, even though it still consumes k tokens while exploring.
Am I right?
The question may come down to how the lookahead is performed. If the "lookahead" actually processes the grammar with backtracking, that does not make it an LL(k) parser. To be an LL(k) parser, the parser must not rely on a backtracking mechanism, because then it would be an "LL(1) parser with backtracking that can parse LL(k) grammars".
Am I right again?
I think the difference relates to the expectation that an LL(1) parser uses constant time per token, and an LL(k) parser uses at most k * constant time per token (linear in the lookahead), not exponential time as it would be in the case of a backtracking parser.
Update 1: to simplify - per token, is LL(k) parsing expected to run in time exponential in k, or linear in k?
Update 2: I have changed the title to LL(k) because the question does not depend on what range k comes from (a fixed integer or infinity).

An LL(k) parser needs to do the following at each point in the inner loop:
Collect the next k input symbols. Since this is done at each point in the input, this can be done in constant time by keeping the lookahead vector in a circular buffer.
If the top of the prediction stack is a terminal, then it is compared with the next input symbol; either both are discarded or an error is signalled. This is clearly constant time.
If the top of the prediction stack is a non-terminal, the action table is consulted, using the non-terminal, the current state and the current lookahead vector as keys. (Not all LL(k) parsers need to maintain a state; this is the most general formulation. But it doesn't make a difference to complexity.) This lookup can also be done in constant time, again by taking advantage of the incremental nature of the lookahead vector.
The prediction action is normally done by pushing the right-hand side of the selected production onto the stack. A naive implementation would take time proportional to the length of the right-hand side, which is not correlated with either the lookahead k or the length of the input N, but rather is related to the size of the grammar itself. It's possible to avoid the variability of this work by simply pushing a reference to the right-hand side, which can be used as though it were the list of symbols (since the list can't change during the parse).
However, that's not the full story. Executing a prediction action does not consume an input, and it's possible -- even likely -- that multiple predictions will be made for a single input symbol. Again, the maximum number of predictions is only related to the grammar itself, not to k nor to N.
More specifically, since the same non-terminal cannot be predicted twice in the same place without violating the LL property, the total number of predictions cannot exceed the number of non-terminals in the grammar. Therefore, even if you do push the entire right-hand side onto the stack, the total number of symbols pushed between consecutive shift actions cannot exceed the size of the grammar. (Each right-hand side can be pushed at most once. In fact, only one right-hand side for a given non-terminal can be pushed, but it's possible that almost every non-terminal has only one right-hand side, so that doesn't reduce the asymptote.) If instead only a reference is pushed onto the stack, the number of objects pushed between consecutive shift actions -- that is, the number of predict actions between two consecutive shift actions -- cannot exceed the size of the non-terminal alphabet. (But, again, it's possible that |V| is O(|G|).)
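To make that inner loop concrete, here is a minimal sketch in Python of a table-driven LL(1) parser (the k = 1 case, where the lookahead "vector" is a single token), for an invented toy grammar S -> a S b | <empty>; the grammar, table and token names are all illustrative, not taken from any particular textbook:

END = "$"
NONTERMINALS = {"S"}

# Action table: (non-terminal, lookahead token) -> right-hand side to predict.
TABLE = {
    ("S", "a"): ["a", "S", "b"],   # predict S -> a S b
    ("S", "b"): [],                # predict S -> <empty>
    ("S", END): [],                # predict S -> <empty>
}

def parse(tokens):
    tokens = list(tokens) + [END]
    pos = 0                          # tokens[pos] is the 1-token lookahead
    stack = [END, "S"]               # prediction stack, start symbol on top
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:      # consult the action table
            rhs = TABLE.get((top, tokens[pos]))
            if rhs is None:
                raise SyntaxError(f"no production for {top} on {tokens[pos]!r}")
            stack.extend(reversed(rhs))   # predict: push the right-hand side
        elif top == tokens[pos]:     # terminal: match and shift
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, got {tokens[pos]!r}")
    return pos == len(tokens)

print(parse("aabb"))   # True
# parse("aab") raises SyntaxError: expected 'b', got '$'

For k > 1, the table would be keyed on the k-token lookahead window instead of a single token, with the window maintained incrementally (e.g. in the circular buffer mentioned above); nothing else in the loop changes.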
The linearity of LL(k) parsing was established, I believe, in Lewis and Stearns (1968), but I don't have that paper at hand right now so I'll refer you to the proof in Sippu & Soisalon-Soininen's Parsing Theory (1988), where it is proved in Chapter 5 for Strong LL(k) (as defined by Rosenkrantz & Stearns 1970), and in Chapter 8 for Canonical LL(k).
In short, the time the LL(k) algorithm spends between shifting two successive input symbols is expected to be O(|G|), which is independent of both k and N (and, of course, constant for a given grammar).
This does not really have any relation to LL(*) parsers, since an LL(*) parser does not just try successive LL(k) parses (which would not be possible, anyway). For the LL(*) algorithm presented by Terence Parr (which is the only reference I know of which defines what LL(*) means), there is no bound to the amount of time which could be taken between successive shift actions. The parser might expand the lookahead to the entire remaining input (which would, therefore, make the time complexity dependent on the total size of the input), or it might fail over to a backtracking algorithm, in which case it is more complicated to define what is meant by "processing an input symbol".

I suggest you read chapter 5.1 of Aho & Ullman, Volume 1.
https://dl.acm.org/doi/book/10.5555/578789
1. An LL(k) parser is a k-predictive algorithm (k is the lookahead, an integer >= 1).
2. An LL(k) parser can parse any LL(k) grammar (chapter 5.1.2).
3. For all a, b with a < b, an LL(a) grammar is also an LL(b) grammar. But the reverse is not true.
4. An LL(k) parser is PREDICTIVE. So there is NO backtracking.
5. All LL(k) parsers run in O(n) time, where n is the length of the parsed sentence.
6. It is important to understand that an LL(3) parser does not parse faster than an LL(1) parser. But an LL(3) parser can parse MORE grammars than an LL(1) parser (see points 2 and 3).

Related

How does LR parsing select a qualifying grammar production (to construct the parse tree from the leaves)?

I am reading a tutorial on LR parsing. The tutorial uses this example grammar:
S -> aABe
A -> Abc | b
B -> d
Then, to illustrate how the parsing algorithm works, the tutorial shows the process of parsing the word "abbcde" below.
I understand that at each step of the algorithm, a qualifying production (namely a grammar rule, illustrated in column 2 of the table) is found to match a segment of the string. But how does the LR parser choose among a set of qualifying productions (illustrated in column 3 of the table)?
An LR parse of a string traces out a rightmost derivation in reverse. In that sense, the ordering of the reductions applied is what you would get if you derived the string by always expanding out the rightmost nonterminal, then running that process backwards. (Try this out on your example - isn’t that neat?)
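Working it out on the example: the rightmost derivation of "abbcde" is
S => aABe => aAde => aAbcde => abbcde
(always expanding the rightmost non-terminal), and reading it backwards gives exactly the parser's reduction order: reduce abbcde to aAbcde by A -> b, then to aAde by A -> Abc, then to aABe by B -> d, then to S by S -> aABe.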
The specific mechanism by which LR parsers actually do this involves the use of a parsing automaton that tracks where within the grammar productions the parse happens to be, along with some lookahead information. There are several different flavors of LR parser (LR(0), SLR(1), LALR(1), LR(1), etc.), which differ on how the automaton is structured and how they use lookahead information. You may find it helpful to search for a tutorial on how these automata work, as that’s the heart of how LR parsers actually work.

Why look ahead at most 1 input token?

I'm currently trying to implement an LL parser, but I have a question.
Do I need to look ahead at most 1 input token to verify that the user's input is syntactically correct, or is the lookahead there for another reason?
There are many different kinds of parsing algorithms. The one you're describing is called LL(1) and by definition it just uses one token of lookahead. However, there are other parsing algorithms that use more lookahead than this. For example, an LL(2) parser uses two tokens of lookahead, and an LL(*) parser has unbounded lookahead. There are grammars for which one token of lookahead isn't enough (that is, grammars that are LL(2) but not LL(1)). Here's an example:
S → n | n + S
Try working out why one token of lookahead isn't enough, but two tokens suffice.
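As a minimal sketch (in Python, with invented token names "n" and "+"), here is the two-token decision a hand-written parser for this grammar has to make; after seeing n it must also inspect the next token before committing to a production:

def parse_S(tokens, pos=0):
    # S -> n | n + S : the first token alone cannot distinguish the two.
    if pos >= len(tokens) or tokens[pos] != "n":
        raise SyntaxError("expected 'n'")
    lookahead2 = tokens[pos + 1] if pos + 1 < len(tokens) else None
    if lookahead2 == "+":            # window (n, +): commit to S -> n + S
        return parse_S(tokens, pos + 2)
    return pos + 1                   # window (n, end): commit to S -> n

assert parse_S(["n", "+", "n", "+", "n"]) == 5   # consumes the whole input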
The reason parsing algorithms try to keep the number of lookahead tokens low is for simplicity and efficiency. As the number of lookahead tokens increases, the size of the parsing table needed to drive the parser increases, as does the complexity of building those tables.

Difference between left/right recursive, left/right-most derivation, precedence, associativity etc

I am currently learning language processors and a topic that comes up very often is the direction in which elements in a grammar are consumed. Left to right or right to left.
I understand the concept but there seems to be so many ways of writing these rules and I am not sure if they are all the same. What I've seen so far is:
Right/Left recursion,
Right/Left-most derivation,
Right/Left reduction, precedence, associativity etc.
Do these all mean the same thing?
No, they all have different meanings.
Right- and left-recursion refer to recursion within production rules. A production for a non-terminal is recursive if it can derive a sequence containing that non-terminal; it is left-recursive if the non-terminal can appear at the start (left edge) of the derived sequence, and right-recursive if it can appear at the end (right edge). A production can be recursive without being either left- or right-recursive, and it can even be both left- and right-recursive.
For example:
term: term '*' factor { /* left-recursive */ }
assignment: lval '=' assignment { /* right-recursive */ }
The above examples are both direct recursion; the non-terminal directly derives a sequence containing the non-terminal. Recursion can also be indirect; it is still recursion.
All common parsing algorithms process left-to-right, which is the first L in LL and LR. Top-down (LL) parsing finds a leftmost derivation (the second L), while bottom-up (LR) parsing finds a rightmost derivation (the R).
Effectively, both types of parser start with a single non-terminal (the start symbol) and "guess" a derivation based on some non-terminal in the current sequence until the input text is derived. In a leftmost derivation, it is always the leftmost non-terminal which is expanded. In a rightmost derivation, it is always the rightmost non-terminal.
So a top-down parser always guesses which production to use for the first non-terminal, after which it needs to again work on whatever is now the first non-terminal. ("Guess" here is informal. It can look at the input to be matched -- or at least the next k tokens of the input -- in order to determine which production to use.) This is called top-down processing because it builds the parse tree from the top down.
It's easier (at least for me) to visualize the action of a bottom-up parser in reverse; it builds the parse tree bottom up by repeatedly reading just enough of the input to find some production, which will be the last derivation in the derivation chain. So it does produce a rightmost derivation, but it outputs it back-to-front.
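A tiny illustration (grammar invented here): take S -> A B, A -> a, B -> b.
leftmost:  S => A B => a B => a b
rightmost: S => A B => A b => a b
A top-down parser announces the productions of the leftmost derivation in order (S -> A B, A -> a, B -> b); a bottom-up parser discovers the productions of the rightmost derivation and emits them back-to-front (A -> a, B -> b, S -> A B).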
In an LR grammar for an operator language (roughly speaking, a grammar for languages which look like arithmetic expressions), left- and right- associativity are modelled using left- and right-recursive grammar rules, respectively. "Associativity" is an informal description of the grammar, as is "precedence".
Precedence is modelled by using a series of grammar rules, each of which refers to the next rule (and which usually end up with a recursive production for handling parentheses -- '(' expr ')' -- which is neither left- nor right-recursive).
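A typical cascade, in the same style as the earlier examples (names invented for illustration):
expr:   expr '+' term | term       /* '+' is left-associative */
term:   term '*' factor | factor   /* '*' binds tighter than '+' */
factor: '(' expr ')' | NUM         /* recursive, but neither left- nor right-recursive */
Each rule refers to the next one down, so multiplication groups before addition, and the parenthesis production restarts the cascade from the top.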
There is an older style of bottom-up parsing, called "operator precedence parsing", in which precedence is explicitly part of the language description. One common operator-precedence algorithm is the so-called Shunting Yard algorithm. But if you have an LALR(1) parser generator, like bison, you might as well use that instead, because it is both more general and more precise.
(I am NOT an expert on parser and compiler theory. I happen to be learning something related. And I'd like to share something I have found so far.)
I strongly suggest taking a look at this awesome article.
It explains and illustrates the LL and LR algorithms. You can clearly see why LL is called top-down and LR is called bottom-up.
Some quotation:
The primary difference between how LL and LR parsers operate is that an LL parser outputs a pre-order traversal of the parse tree and an LR parser outputs a post-order traversal.
...
We are converging on a very simple model for how LL and LR parsers operate. Both read a stream of input tokens and output that same token stream, inserting rules in the appropriate places to achieve a pre-order (LL) or post-order (LR) traversal of the parse tree.
...
When you see designations like LL(1), LR(0), etc. the number in parentheses is the number of tokens of lookahead.
And as to the acronyms: (source)
The first L in LR and LL means that the parser reads input text in one direction without backing up; that direction is typically Left to right within each line, and top to bottom across the lines of the full input file.
The remaining R and L mean right-most and left-most derivation, respectively.
These are 2 different parsing strategies. A parsing strategy determines the next non-terminal to rewrite. (source)
For left-most derivation, it is always the leftmost nonterminal.
For right-most derivation, it is always the rightmost nonterminal.

SLR(1) and LALR(1) and Reduce

I'm really confused!
I read the following example in one of my professor's notes.
1) We have an SLR(1) grammar G, as follows. We use an SLR(1) parser generator to generate a parse table S for G, and an LALR(1) parser generator to generate a parse table L for G.
S->AB
A->dAa
A->lambda (lambda is the empty string, i.e. a string of length 0)
B->aAb
Solution: the number of reduce (R) entries in S is greater than in L.
But on one site I read:
2) Suppose T1 and T2 are created with SLR(1) and LALR(1), respectively, for a grammar G. If G is an SLR(1) grammar, which of the following is TRUE?
a) T1 and T2 have no differences.
b) The total number of non-error entries in T1 is lower than in T2.
c) The total number of error entries in T1 is lower than in T2.
Solution:
The LALR(1) algorithm generates exactly the same states as the SLR(1) algorithm, but it can generate different actions; it is capable of resolving more conflicts than the SLR(1) algorithm. However, if the grammar is SLR(1), both algorithms will produce exactly the same machine (a is right).
Could anyone explain which of them is true?
EDIT: in fact my question is why, for a given SLR(1) grammar, the parse tables of LALR(1) and SLR(1) are exactly the same (error and non-error entries are equal and the numbers of reduce entries are equal), but for the above grammar, the number of reduce entries in S is greater than in L.
I saw in another book that in general we have:
Summary:
1) For the grammar I wrote in question 1, why is the number of reduce entries different?
2) If we have an SLR(1) grammar, why is the table exactly the same (the numbers of reduce and error entries are the same)?
Both of these statements are true!
One of your questions was why SLR(1) and LALR(1) parsers have the same states as one another. SLR(1) parsers are formed by starting with an LR(0) automaton, then augmenting each production with lookahead information from FOLLOW sets. In an LALR(1) parser, we begin with an LR(1) parser (where each production has very precise lookahead information), then combine any two states that have the same underlying LR(0) state. This results in an LR(0) automaton with additional information because each LR(0) state corresponds to at least one LR(1) state and each LR(1) state corresponds to some underlying LR(0) state.
SLR(1) and LALR(1) parsers both have the same set of states, which are the same states as in an LR(0) parser. The parsers differ only in what actions they perform in each state.
In both SLR(1) and LALR(1) parsers, each item has an associated set of lookahead tokens. Whenever the parser enters a state with a reduce item in it, the parser will perform that reduction if the next token of input is in the lookahead set. In an SLR(1) parser, the lookahead set is the FOLLOW set for the nonterminal on the left-hand side of the production. In an LALR(1) parser, the lookahead set is, appropriately, called the LA set for the combination of the nonterminal in the production and the automaton state.
You can prove that the LA sets used in an LALR(1) parser are subsets of the FOLLOW sets used in SLR(1) parsers. This means that LALR(1) parsers will never have more reduce actions than SLR(1) parsers, and in some cases the LALR(1) parsers will choose to shift when an SLR(1) parser would have a shift/reduce conflict.
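To make the SLR(1) side of that comparison concrete, here is a minimal Python sketch (the representation is invented for illustration) that computes the FOLLOW sets an SLR(1) parser would use for the grammar from question 1; the LALR(1) LA sets are subsets of these:

# FOLLOW-set computation for:  S -> AB,  A -> dAa | lambda,  B -> aAb
# An SLR(1) parser reduces A -> lambda on every token in FOLLOW(A).
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["d", "A", "a"], []],   # [] is the lambda production
    "B": [["a", "A", "b"]],
}
NONTERMINALS = set(GRAMMAR)
EPS, END = "eps", "$"

def first_of_seq(seq, first):
    """FIRST set of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        if sym not in NONTERMINALS:   # terminal: it starts the sequence
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:     # sym cannot vanish, stop here
            return out
    out.add(EPS)                      # the whole sequence can derive lambda
    return out

# FIRST sets by fixed-point iteration.
first = {n: set() for n in NONTERMINALS}
changed = True
while changed:
    changed = False
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            f = first_of_seq(rhs, first)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True

# FOLLOW sets by fixed-point iteration.
follow = {n: set() for n in NONTERMINALS}
follow["S"].add(END)
changed = True
while changed:
    changed = False
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of_seq(rhs[i + 1:], first)
                new = (tail - {EPS}) | (follow[lhs] if EPS in tail else set())
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True

print(follow)   # FOLLOW(A) = {'a', 'b'}; FOLLOW(S) = FOLLOW(B) = {'$'}

Since FOLLOW(A) = {a, b}, the SLR(1) table gets a reduce entry for A -> lambda under both a and b in every state containing that completed item, while the LALR(1) LA sets keep only the tokens that can actually follow in that state; that is consistent with table S in question 1 having more reduce entries than table L.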
Hope this helps!
Answer to Q1:
First of all you need to construct the DFAs for the SLR(1) and LALR(1) parsers. I created the DFAs for both of them.
Link to images of the SLR(1) and LALR(1) DFAs
For SLR(1) I got 10 states and 10 reduce entries, whereas for LALR(1) I created the DFA for CLR(1) with 13 states, which got minimized down to 10 states with 7 reduce entries. That answers your first question.
Answer to Q2:
G is an SLR(1) grammar, so there are surely no S-R or R-R conflicts in the SLR(1) table. LALR(1) has more power than SLR(1), so there is also no conflict in the LALR(1) table for the given grammar G. Let's go option by option:
(c): there is no error in either T1 or T2 (wrong option).
(b): Non-error entries means shift entries and reduce entries. It should be noted that in bottom-up parsers, only the rules for placing reduce entries change from parser to parser, while the shift entries remain the same. For example, in LR(0) reduce entries are made in every column; in SLR(1) they are made under the FOLLOW set of the left-hand-side variable; and in CLR(1) and LALR(1) they are made under the lookahead symbols. Thus reduce entries change from parser to parser but shift entries are the same.
We also already showed in Q1 that the reduce entries of the SLR(1) parsing table outnumber those of LALR(1), which proves option (b) incorrect.
(a): T1 and T2 may come out the same, but not always. Another important thing is that multiple-choice questions sometimes want you to choose the most appropriate option. Thus for me (a) is the answer.

Parsing cases that need much lookahead

Most parsing can be done by looking only at the next symbol (character for lexical analysis, token for parsing proper), and most of the remaining cases can be handled by looking at only one symbol after that.
Are there any practical cases - for programming languages or data formats in actual use - that need several or indefinitely many symbols of lookahead (or equivalently backtracking)?
As I recall, Fortran is one language in which you need a big lookahead buffer. Parsing Fortran requires (in theory) unbounded lookahead: since classic Fortran ignores blanks, DO 10 I = 1,5 begins a loop while DO 10 I = 1.5 is an assignment to a variable named DO10I, and a parser cannot tell which until it reaches the comma or the decimal point. Most implementations limit the length of a statement line, which puts a limit on the size of the lookahead buffer.
Also, see the selected answer for Why can't C++ be parsed with a LR(1) parser?. In particular, the quote:
"C++ grammar is ambiguous, context-dependent and potentially requires infinite lookahead to resolve some ambiguities".
Knuth proved that any LR(k) grammar can be mechanically transformed into an LR(1) one. Also, AFAIK, any LR(k) grammar can be parsed in time proportional to the length of the parsed string. Since that includes LR(1), I don't see what use there would be in implementing LR(k) parsing with k > 1.
I never studied the LR(k)->LR(1) transformation, so it may be that it is not practical in some cases, though.
