syntax directed definition for S -> '{' L '}' ; L-> L S | null - parsing

**Exercise 5.4.4** Write L-attributed SDD's analogous to that of Example 5.19
for the following productions, each of which represents a familiar flow-of-control
construct, as in the programming language C. You may need to generate a
three-address statement to jump to a particular label L, in which case you should
generate goto L.
c) S -> '{' L '}' ; L -> L S | null
This question is an exercise from the Dragon book. I am confused about whether L here is a list or something else. If it is a list, then I have attempted it this way:
List = new()
List.next = S.next
and the list continues with List1, List2, ..., Listn.
I just want to confirm whether I am on the right track.
Here is the book link: http://www.slideshare.net/rajarshisbaisthakurforever/dragon-book-for-compiler-2v2. The page number is 337, Chapter 5, Section 5.5, "Implementing L-Attributed SDD's".
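For comparison, here is one possible reading of the exercise as a small Python sketch. It is not taken from the book or from an accepted answer. The attribute rules in the comments are an assumption about how the inherited attribute next would usually flow through S -> '{' L '}' and L -> L1 S | null, and the helper names (make_label_gen, gen_list, gen_block, simple, jump_out) are hypothetical.

```python
import itertools

def make_label_gen():
    """Return a newlabel() function producing L1, L2, ... (hypothetical helper)."""
    counter = itertools.count(1)
    return lambda: f"L{next(counter)}"

def gen_list(stmts, next_label, newlabel):
    """L -> L1 S | null. Returns L.code as a list of three-address lines."""
    if not stmts:                       # L -> null: L.code is empty
        return []
    l1_next = newlabel()                # assumed rule: L1.next = newlabel()
    code = gen_list(stmts[:-1], l1_next, newlabel)   # L1.code
    code.append(f"{l1_next}:")          # label(L1.next)
    code += stmts[-1](next_label)       # assumed rule: S.next = L.next, then S.code
    return code

def gen_block(stmts, next_label, newlabel):
    """S -> '{' L '}': assumed rules L.next = S.next and S.code = L.code."""
    return gen_list(stmts, next_label, newlabel)

# Each statement is modelled as a function from its inherited `next` label
# to the lines of code it generates.
def simple(text):
    return lambda next_label: [text]

def jump_out(next_label):
    return [f"goto {next_label}"]

code = gen_block([simple("x = 1"), jump_out, simple("y = 2")],
                 "Lend", make_label_gen())
print("\n".join(code))
```

In this sketch the statement in the middle jumps to the label that is placed in front of the statement after it, which is the point of passing next down the list instead of creating a fresh list object per statement.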

Related

ANTLR 4 Parser Grammar

How can I improve my parser grammar so that, instead of creating an AST that contains a couple of decFunc rules for my test code, it creates only one, and sum becomes the second root? I have tried to solve this problem in several different ways, but I always get a left-recursion error.
This is my test code:
f :: [Int] -> [Int] -> [Int]
f x y = zipWith (sum) x y
sum :: [Int] -> [Int]
sum a = foldr(+) a
This image shows the parse tree with the two decFunc nodes:
http://postimg.org/image/w5goph9b7/
This is my grammar:
prog : stat+;
stat : decFunc | impFunc ;
decFunc : ID '::' formalType ( ARROW formalType )* NL impFunc
;
anotherFunc : ID+;
formalType : 'Int' | '[' formalType ']' ;
impFunc : ID+ '=' hr NL
;
hr : 'map' '(' ID* ')' ID*
| 'zipWith' '(' ('*' |'/' |'+' |'-') ')' ID+ | 'zipWith' '(' anotherFunc ')' ID+
| 'foldr' '(' ('*' |'/' |'+' |'-') ')' ID+
| hr op=('*'| '/' | '.&.' | 'xor' ) hr | DIGIT
| 'shiftL' hr hr | 'shiftR' hr hr
| hr op=('+'| '-') hr | DIGIT
| '(' hr ')'
| ID '(' ID* ')'
| ID
;
Your test input contains two instances of content that will match the decFunc rule. The generated parse-tree shows exactly that: two sub-trees, each having a decFunc as the root.
Antlr v4 will not produce a true AST where f and sum are the roots of separate sub-trees.
Is there anything I can do with the grammar to make both f and sum roots? – Jonny Magnam
Not directly in an Antlr v4 grammar. You could:
switch to Antlr v3, or another parser tool, and define the generated AST as you wish.
walk the Antlr v4 parse-tree and create a separate AST of your desired form.
just use the parse-tree directly with the realization that it is informationally equivalent to a classically defined AST and the implementation provides a number of practical benefits.
Specifically, the standard academic AST is mutable, meaning that every (or all but the first) visitor is custom, not generated, and that any change in the underlying grammar or an interim structure of the AST will require reconsideration and likely changes to every subsequent visitor and their implemented logic.
The Antlr v4 parse-tree is essentially immutable, allowing decorations to be accumulated against tree nodes without loss of relational integrity. Visitors all use a common base structure, greatly reducing brittleness due to grammar changes and effects of prior executed visitors. As a practical matter, tree-walks are easily constructed, fast, and mutually independent except where expressly desired. They can achieve a greater separation of concerns in design and easier code maintenance in practice.
Choose the right tool for the whole job, in whatever way you define it.
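The second option (walk the parse-tree and build a separate AST) can be sketched roughly as follows. This is only an illustration in Python: ParseNode is a hypothetical stand-in for a parse-tree node, not a class from the Antlr runtime, and FuncDef, leaf_text and build_ast are made-up names. The tree shape is assumed to follow the prog / stat / decFunc / impFunc rules from the question.

```python
from dataclasses import dataclass, field

@dataclass
class ParseNode:
    """Stand-in for a parse-tree node: a rule name (or token text) plus children."""
    label: str
    children: list = field(default_factory=list)

@dataclass
class FuncDef:
    """AST node for one function; the function name acts as the root."""
    name: str
    signature: list
    body: str

def leaf_text(node):
    """Join the token text found under a node, separated by spaces."""
    if not node.children:
        return node.label
    return " ".join(leaf_text(child) for child in node.children)

def build_ast(prog):
    """Walk prog -> stat -> decFunc and emit one FuncDef per decFunc subtree."""
    ast = []
    for stat in prog.children:
        for dec in stat.children:
            if dec.label != "decFunc":
                continue
            name = dec.children[0].label   # first child is the ID token
            signature = [leaf_text(c) for c in dec.children if c.label == "formalType"]
            body = next(leaf_text(c) for c in dec.children if c.label == "impFunc")
            ast.append(FuncDef(name, signature, body))
    return ast

def T(text):
    """Build a token leaf."""
    return ParseNode(text)

def int_list():
    """Parse-tree fragment for the type [Int]."""
    return ParseNode("formalType", [T("["), ParseNode("formalType", [T("Int")]), T("]")])

# Hand-built parse tree for: sum :: [Int] -> [Int]  /  sum a = foldr(+) a
sum_dec = ParseNode("decFunc", [
    T("sum"), T("::"), int_list(), T("->"), int_list(),
    ParseNode("impFunc", [T("sum"), T("a"), T("="),
                          ParseNode("hr", [T("foldr"), T("("), T("+"), T(")"), T("a")])]),
])
tree = ParseNode("prog", [ParseNode("stat", [sum_dec])])
ast = build_ast(tree)
print(ast)
```

With a real Antlr v4 parser the same walk would normally be written as a generated listener or visitor; the sketch only shows that the AST shape is decided in the walker, not in the grammar.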

What do you call this and how to read this? (Parsing for Scheme)

At the moment, I am learning how to parse Scheme in Java. Here is the basic list (I do not know what its formal name is) Edit: Grammar!
exp -> ( rest
| #f
| #t
| ' exp
| integer_constant
| string_constant
| identifier
rest -> )
| exp+ [ . exp ] )
My question is: What is that list called, like what is the formal name for it? "Parse list"? Edit: According to a comment, it's called grammar.
And how to read it? My guess is that the expression goes in between the left and right parentheses, for example: ( exp ).
Additionally, I guess any of the objects between the lines exp -> ( rest and rest -> ), that is #f, #t, ' exp, integer_constant, string_constant, identifier, go in place of the expression in the previous example. For example: ( #t )
And the last item on the list is | exp+ [ . exp] ), which I suppose is another expression to the right of the first right parenthesis such as for example with the previous example: ((#t) exp)?
Lastly, this bit [ . exp], the bracket just says it is optional?
If I am wrong, please correct me.
This is called a grammar. There are many different syntaxes for writing down grammars, but they are all quite similar to each other.
Here -> can be read as "is", | as "or", + as one or more, and [], as you suspected, as "optionally". The other symbols used here just stand for themselves. So this grammar can be read like this:
1. An expression is:
an opening parenthesis followed by a "rest" (see 2)
OR a hash mark followed by the letter f
OR a hash mark followed by the letter t
OR a single quote followed by an expression
OR an integer constant (like 123)
OR a string constant (like "foo")
OR an identifier (like foo)
2. A "rest" is:
a closing parenthesis
OR one or more expressions, optionally followed by a dot and one other expression, followed by a closing parenthesis
So foo is an expression (because identifiers are expressions), () is an expression (because ) is a "rest" and ( rest is an expression), (foo) is an expression (because foo is an expression, exp ) is a "rest" and ( rest is an expression), and so on.
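To make the reading concrete, here is a minimal recursive-descent sketch of this grammar, with one function per rule. It is written in Python for brevity (the question is about Java, but the structure carries over). The tokenizer, the function names and the way results are represented (Python lists, a ("dotted", items, tail) tuple, string constants kept with their quotes) are all assumptions made for this example.

```python
import re

def tokenize(text):
    # Parentheses, quote, string constants, then any other run of characters.
    return re.findall(r"""\(|\)|'|"[^"]*"|[^\s()'"]+""", text)

def peek(tokens, i):
    return tokens[i] if i < len(tokens) else None

def parse_exp(tokens, i):
    """exp -> ( rest | #f | #t | ' exp | integer | string | identifier"""
    tok = peek(tokens, i)
    if tok is None or tok in (")", "."):
        raise SyntaxError(f"unexpected {tok!r}")
    if tok == "(":
        return parse_rest(tokens, i + 1)
    if tok == "#f":
        return False, i + 1
    if tok == "#t":
        return True, i + 1
    if tok == "'":
        quoted, i = parse_exp(tokens, i + 1)
        return ["quote", quoted], i
    if re.fullmatch(r"-?\d+", tok):
        return int(tok), i + 1
    return tok, i + 1          # string constant (quotes kept) or identifier

def parse_rest(tokens, i):
    """rest -> ) | exp+ [ . exp ] )"""
    if peek(tokens, i) == ")":
        return [], i + 1
    first, i = parse_exp(tokens, i)          # exp+ needs at least one exp
    items = [first]
    while peek(tokens, i) not in (")", ".", None):
        item, i = parse_exp(tokens, i)
        items.append(item)
    dotted = peek(tokens, i) == "."
    if dotted:                                # optional ". exp"
        tail, i = parse_exp(tokens, i + 1)
    if peek(tokens, i) != ")":
        raise SyntaxError("expected ')'")
    return (("dotted", items, tail) if dotted else items), i + 1

def parse(text):
    tokens = tokenize(text)
    result, i = parse_exp(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(parse("(foo)"), parse("()"), parse("(a . b)"), parse("'(12 #t)"))
```

Note how the opening parenthesis is consumed in parse_exp and the closing one in parse_rest, exactly as the two rules split them.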

Is Mediawiki markup context-sensitive?

There seems to be disagreement over whether MediaWiki markup (the markup language used to create and edit Wikipedia articles) is context-free or context-sensitive.
See http://www.mediawiki.org/wiki/User_talk:Kanor#Response_to_article_in_Meatball
I would argue that it is obviously context-sensitive. One example of this would be terminal characters in wikimarkup lists. Lists are formed like:
* One thing
* Another thing
* Yet another thing
The end of a list item is indicated by a carriage return.
However, if the list is nested in, say a table or transclusion, then the end of a list item may either be a carriage return, or a table/transclusion terminal symbol. For example, the following seems to be valid markup:
{{Infobox person
* One thing
* Another thing
* Yet another thing}}
However, a parser would need to keep track of the context, e.g. the fact that it is currently nested within a transclusion, when it encounters the }} symbol, instead of an end-line (carriage-return) character, when determining the end of the last list item.
So... how is this possibly not context-sensitive?
"Context-sensitive" has a precise formal definition, and it does not appear to match your intuition. The grammar
S -> P | E
P -> '(' T '.' ')'
E -> '[' T '!' ']'
T -> <any context-free grammar fragment>
is context free (even regular, if T is regular), despite the fact that what comes after T (dot/exclamation mark) depends on the first character: There are no "context non-terminals" on the left hand side. Even arbitrary nesting isn't a problem:
S -> A | B
A -> '(' S ')'
B -> '[' S ']'
The parser has to remember which unmatched opening braces it has seen so far, but it does not need context in the sense of context-free/-sensitive grammars. These particular grammars aren't even ambiguous (again a formal term, also used in the Wiki user page you link to).
Context free does not mean "parser needs no working memory", or equivalently "parser can be restricted to look at every single token in complete isolation".
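As an illustration of "working memory without context sensitivity", here is a small Python sketch of a recognizer for the nested-bracket grammar above. The grammar as printed has no base case, so the sketch assumes an additional empty alternative for S. The function name matches is made up for this example.

```python
def matches(text):
    """Recognise S -> '(' S ')' | '[' S ']' | empty (the empty case is an assumption)."""
    closer = {"(": ")", "[": "]"}
    stack = []             # unmatched opening brackets seen so far
    seen_close = False     # the grammar nests strictly, so no opener may follow a closer
    for ch in text:
        if ch in closer:
            if seen_close:
                return False
            stack.append(closer[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
            seen_close = True
        else:
            return False
    return not stack

print(matches("([()])"), matches("([)]"))
```

The stack is exactly the "which unmatched opening braces have I seen" memory mentioned above; a pushdown automaton has it, and that is still within the context-free class.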

Better way to build lists of matches in parser generator

(I use Yecc, an Erlang parser generator similar to Yacc, so the syntax is different from Yacc)
The problem is simple: say we want to parse a Lisp-like syntax, and I want to match expression lists.
An expression list is a list of expressions separated by blank space.
In Erlang, [1,3,4] is a list and ++ concatenates two lists.
We want to match 1 (1+2) 3. The expression rule will match 1, (1+2) and 3. So I match on the list followed by one more expression, and if that does not match I fall back to matching a single expression. This builds the list recursively.
expressionlist -> expressionlist expression : '$1' ++ ['$2'].
expressionlist -> expression : ['$1'].
But I can do this too (invert the order):
expressionlist -> expression expressionlist : ['$1'] ++ '$2'.
expressionlist -> expression : ['$1'].
Both of these seem to work; I would like to know whether there is any difference.
With a separator
I want to match {name = albert , age = 43}. propdef matches name = value. So it is the same problem, but with an additional separator ','. Is there any difference from the first problem?
proplist -> propdef ',' proplist : ['$1'] ++ '$3'.
proplist -> propdef : ['$1'].
proplist -> '{' proplist '}' : '$2'.
proplist -> '{' '}' : [].
%% Could write this
%% proplist -> proplist ',' propdef : '$1' ++ ['$3'].
Thank you.
Since Yecc is an LALR parser generator, the use of left recursion or right recursion doesn't matter much. In the past, people would prefer left recursion in Yacc/Bison and similar tools because it allows the parser to keep reducing instead of shifting everything onto the stack until the end of the list, but these days stack space and speed aren't that important, so you can pick whatever suits you best. (Note that in an LL parser, left recursion causes an infinite loop, so for such parsers, right recursion is necessary.)
The more important thing, for your example above, is that '$1' ++ ['$2'] in the left recursive version will cause quadratic time complexity, since the "expressionlist" part is the one that's going to be longer and longer. You should never have the growing component on the left when you use the ++ operator. If you parse a list of thousands of elements, this complexity will hurt you. Using the right recursive version instead, ['$1'] ++ '$2' will give you linear time, even if the parser has to shift the whole list onto the stack before it starts reducing. You could try both versions and parse a really long list to see the difference.
Using a separator as in "propdef ',' proplist" does not change the problem.
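The difference in cost can be estimated with a small model. The Python sketch below does not run a parser; it only counts list cells copied, assuming that Erlang's Left ++ Right copies the left operand (cost proportional to the length of Left). The function names are hypothetical.

```python
def concat_cost(left_len):
    """Cost model for Erlang's `Left ++ Right`: the left operand is copied."""
    return left_len

def left_recursive_cost(n):
    """expressionlist -> expressionlist expression : '$1' ++ ['$2']."""
    cost, acc_len = 0, 1                  # the first reduction builds ['$1']
    for _ in range(n - 1):
        cost += concat_cost(acc_len)      # the growing list is on the left
        acc_len += 1
    return cost

def right_recursive_cost(n):
    """expressionlist -> expression expressionlist : ['$1'] ++ '$2'."""
    cost = 0
    for _ in range(n - 1):
        cost += concat_cost(1)            # only the one-element list is on the left
    return cost

print(left_recursive_cost(1000), right_recursive_cost(1000))
```

Under this model a 1000-element list costs 499500 cell copies with the left-recursive action and 999 with the right-recursive one. If left recursion is preferred for other reasons, a common workaround is to build the list reversed with ['$2' | '$1'] and reverse it once at the end.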

Producing Expressions from This Grammar with Recursive Descent

I've got a simple grammar. Actually, the grammar I'm using is more complex, but this is the smallest subset that illustrates my question.
Expr ::= Value Suffix
| "(" Expr ")" Suffix
Suffix ::= "->" Expr
| "<-" Expr
| Expr
| epsilon
Value matches identifiers, strings, numbers, et cetera. The Suffix rule is there to eliminate left-recursion. This matches expressions such as:
a -> b (c -> (d) (e))
That is, a graph where a goes to both b and the result of (c -> (d) (e)), and c goes to d and e. I'm trying to produce an abstract syntax tree for these expressions, but I'm running into difficulty because all of the operators can accept any number of operands on each side. I'd rather keep the logic for producing the AST within the recursive descent parsing methods, since it avoids having to duplicate the logic of extracting an expression. My current strategy is as follows:
1. If a Value appears, push it to the output.
2. If a From or To appears:
2.1. Output a separator.
2.2. Get the next Expr.
2.3. Create a Link node.
2.4. Pop the first set of operands from output into the Link until a separator appears.
2.5. Erase the separator discovered.
2.6. Pop the second set of operands into the Link until a separator.
2.7. Push the Link to the output.
If I run this through without obeying steps 2.3–2.7, I get a list of values and separators. For the expression quoted above, a -> b (c -> (d) (e)), the output should be:
A sep_1 B sep_2 C sep_3 D E
Applying the To rule would then yield:
A sep_1 B sep_2 (link from C to {D, E})
And subsequently:
(link from A to {B, (link from C to {D, E})})
The important thing to note is that sep_2, crucial to delimit the left-hand operands of the second ->, does not appear, so the parser believes that the expression was actually written:
a -> (b c -> (d) (e))
In order to solve this with my current strategy, I would need a way to produce a separator between adjacent expressions, but only if the current expression is a From or To expression enclosed in parentheses. If that's possible, then I'm just not seeing it and the answer ought to be simple. If there's a better way to go about this, however, then please let me know!
I haven't tried to analyze it in detail, but: "From or To expression enclosed in parentheses" starts to sound a lot like "context dependent", which recursive descent can't handle directly. To avoid context dependence you'll probably need a separate production for a From or To in parentheses vs. a From or To without the parens.
Edit: Though it may be too late to do any good, if my understanding of what you want to match is correct, I think I'd write it more like this:
Graph :=
| List Sep Graph
;
Sep := "->"
| "<-"
;
List :=
| Value List
;
Value := Number
| Identifier
| String
| '(' Graph ')'
;
It's hard to be certain, but I think this should at least be close to matching (only) the inputs you want, and should make it reasonably easy to generate an AST that reflects the input correctly.
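Here is a rough recursive-descent sketch along those lines, in Python. It assumes the grammar is read as "a Graph is a List, optionally followed by a Sep and another Graph" and "a List is one or more Values"; that reading, the tuple-based AST and all function names are assumptions made for this example, and a parenthesised group holding a single value is unwrapped for readability.

```python
import re

def tokenize(text):
    # Arrows, parentheses, then any run of other non-space characters.
    return re.findall(r"->|<-|\(|\)|[^\s()<>-]+", text)

def parse_value(tokens, i):
    """Value := atom | '(' Graph ')'"""
    if tokens[i] == "(":
        inner, i = parse_graph(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        if isinstance(inner, list) and len(inner) == 1:
            inner = inner[0]               # (d) is just d
        return inner, i + 1
    return tokens[i], i + 1

def parse_list(tokens, i):
    """List := Value+ (assumed reading)."""
    values = []
    while i < len(tokens) and tokens[i] not in (")", "->", "<-"):
        value, i = parse_value(tokens, i)
        values.append(value)
    if not values:
        raise SyntaxError("expected a value")
    return values, i

def parse_graph(tokens, i=0):
    """Graph := List [ Sep Graph ] (assumed reading)."""
    left, i = parse_list(tokens, i)
    if i < len(tokens) and tokens[i] in ("->", "<-"):
        op = tokens[i]
        right, i = parse_graph(tokens, i + 1)
        return ("link", op, left, right), i
    return left, i

def parse(text):
    tokens = tokenize(text)
    ast, i = parse_graph(tokens)
    if i != len(tokens):
        raise SyntaxError("trailing input")
    return ast

print(parse("a -> b (c -> (d) (e))"))
```

Because the parentheses are handled inside parse_value, the operands of the inner -> stay inside their own group, so no separator bookkeeping on an output stack is needed: the call stack delimits the operand sets.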

Resources