Different methods of implementing a specific parsing rule for a compiler


Let's say we have a rule in parsing tokens that specifies:
x -> [y[,y]*]
Here the brackets '[ ]' mean that everything inside them is optional, and the * means zero or more repetitions.
For example, it could be:
x : (empty)
OR
x : y
OR
x : y,y
and so on. (The above are examples of input that would match rule 'x', not how the code should look.)
I have already tried the following, which works:
x : y commaY
|
;
commaY : COMMA y commaY
|
;
For educational purposes, I would like to know whether there are alternative ways of writing the rules above that would also work.
Thank you in advance.

EDIT: my earlier answer was incorrect (as pointed out in the comments), but I cannot remove an accepted answer, so I have edited it instead.
You will need (at least) 2 rules for x -> [y[,y]*]. Here is another possibility:
x
: list
| /* eps */
;
list
: list ',' y
| y
;
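For comparison, the same language can be recognised by a hand-written loop, which is roughly what the left-recursive list rule amounts to. This is only a minimal Python sketch, not yacc output: it assumes the tokens arrive as a list of strings in which "y" stands for any y and "," is the comma, and the name parse_x is made up for illustration.

```python
def parse_x(tokens):
    """Recognise x -> [ y [ , y ]* ] and return the number of y items.

    Raises SyntaxError on input that the rule does not accept.
    """
    # x : /* eps */
    if not tokens:
        return 0
    pos = 0
    count = 0
    # list : y
    if tokens[pos] != "y":
        raise SyntaxError(f"expected y at position {pos}")
    pos += 1
    count += 1
    # list : list ',' y   (each loop iteration is one more reduction)
    while pos < len(tokens):
        if tokens[pos] != ",":
            raise SyntaxError(f"expected ',' at position {pos}")
        if pos + 1 >= len(tokens) or tokens[pos + 1] != "y":
            raise SyntaxError(f"expected y after ',' at position {pos}")
        pos += 2
        count += 1
    return count
```

The empty input, a single y, and y followed by any number of ", y" pairs are accepted; a trailing comma is rejected, just as it is by the grammar.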


Applying YACC to GCODE (GRBL)

GCode is a language used to tell multi-axis (CNC) robots how to move.
It looks like this :
M3 S5000 (Start Spindle Clockwise at 5000 RPM)
G21 (All units in mm)
G00 Z1.000000 (lift Z axis up by 1mm)
G00 X94.720505 Y-14.904622 (Go to this XY coordinate)
G01 Z0.000000 F100.0 (Penetrate at 100mm/m)
G01 X97.298434 Y-14.870127 F400 (cut to here)
G03 X98.003848 Y-14.275867 I-0.028107 J0.749174 (cut an arc)
G00 Z1.000000 (lift Z axis)
etc.
I have laid these commands out as sentences, but each token could be on a separate line.
In fact, there are no rules requiring numbers to be adjacent to their respective code letters. However, I already have a LEX lexer which gives me the tokens as described below.
Note that certain commands (M or G codes) have parameters.
In the case of M3, it can have an S (spindle speed) parameter.
G0 and G1 can have X,Y,Z,F etc.
G3 can have X,Y,Z,I,J,R...
However, a G code does not require ALL of those parameters: it may take just one, several, or all of them.
One thing to note here is that we are cutting a single path, then lifting the z axis.
That is, we move to a location above the work surface, penetrate, cut a path then lift off.
I would call this a 'block' or a 'path' and it is this that I'm interested in.
I need to be able to parse GCode in any messy format and then create a structure of 'blocks', where a block is any series of 'commands' between a Z axis down and a Z axis up.
I can tokenise this language using LEX (python PLY specifically).
And get :
type M value 3
type S value 5000
type COMMENT value "Start Spindle Clockwise at 5000 RPM"
type G value 21
type COMMENT value "All units in mm"
type G value 0
type Z value 1.0
etc.
Now, using YACC, I need a rule for a thing called a 'command'.
A command is any comment, or :
A 'G' or 'M' code followed by ANY of the appropriate parameter codes (X,Y,Z etc.)
Command ends when another command (comment, G or M) is encountered.
Then I need a thing called a 'block',
where a block is any set of 'commands' that come after a Z down and before a Z up.
There are 100 G codes and 100 M Codes and 25 parameter codes (A-Z minus G and M)
A rule for 'command' might look like :
command : G F H I J K L S T W X Y Z (how to specify ONE OF)
| M S F (How to specify one of)
| COMMENT
And then how would we define a block?
I realise this is a very long post, but can anyone give me even an idea as to whether YACC can do this? Otherwise I'll just write some code that converts the lex tokens into a tree manually.
Addendum @rici
Thank you for taking the time to understand this question.
By way of feedback:
My task in full is to get YACC to do the heavy lifting of separating chunks of code into blocks based on different use cases.
For example, when 'engraving', a block will often represent a letter or some other shape (in the XY plane). So a block will be defined by the movement of the Z axis in and out of the XY plane.
I want to be able to post-process blocks:
Hatch fill a 'block'. This will involve some fairly complicated calculation of path boundaries, tangents to those boundaries, tool diameter etc. This is the most pressing use case and I don't have a good solution for it yet, but I know it can be done because it can be done in Inkscape (a vector graphics application).
Rotate by n degrees. A fairly simple coordinate transformation; I have a solution for this already.
Iteratively deepen (extrude). Copy blocks and adjust the Z depth on each iteration. Simple.
etc.

If you just want to ensure that a G command is followed by something, you can do this:
g_modifier: F | H | I | J | K | L | S | T | W | X | Y | Z
m_modifier: S | F
g_command: G g_modifier | g_command g_modifier
m_command: M m_modifier | m_command m_modifier
command: g_command | m_command | COMMENT
If you want to split those into sequences using the presence of a Z modifier, that can be done. You might want the lexer to be able to produce two different Z token types, based on the sign of the argument, because the parser can only make syntax decisions based on tokens, not on semantic values.
Your question provides at least two different definitions of a block, making it a bit difficult to provide a clear answer.
"That is, we move to a location above the work surface, penetrate, cut a path then lift off. I would call this a 'block' or a 'path' and it is this that I'm interested in."
That would be, for example:
G00 X94.7 Y-14.9 (Move)
G01 Z0.0 (Penetrate)
G01 X97.2 Y-14.8 G03 X98.0 Y-14.2 I-0.02 J0.7 (Path)
G00 Z1.0 (Lift)
But later you say, "a block is any set of 'commands' that come after a Z down and before a Z up."
That would be just this part of the previous example:
G01 X97.2 Y-14.8 G03 X98.0 Y-14.2 I-0.02 J0.7 (Path)
Those are both possible, but obviously different. Here are some possible building blocks:
# This list doesn't include Z words
g_modifier: F | H | I | J | K | L | S | T | W | X | Y
g_command_no_z: G g_modifier
| g_command_no_z g_modifier
# This doesn't distinguish between Z up and Z down. If you want that to
# affect syntax, you need two different Z tokens, and then two different
# with_z non-terminals.
g_command_with_z: G Z
| g_command_no_z Z
| g_command_with_z g_modifier
# You might or might not want this.
# It's a non-empty sequence of G or M commands with no Z's.
path: command_no_z
| path command_no_z
command_no_z: COMMENT
| m_command
| g_command_no_z
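If the grouping ends up being done by hand on the lexer output instead, it could look roughly like the following Python sketch. It assumes tokens are (type, value) pairs as in the question, uses the second definition of a block (the commands after a Z down and before a Z up), and assumes that Z at or below a `surface` value of 0.0 means "down" and anything above means "up". The function names and the dictionary layout are made up for illustration.

```python
def group_commands(tokens):
    """Group (type, value) tokens into commands.

    A command starts at a G, M or COMMENT token; any other token is
    treated as a parameter of the most recent G or M command.
    """
    commands = []
    for kind, value in tokens:
        if kind in ("G", "M", "COMMENT"):
            commands.append({"code": kind, "value": value, "params": {}})
        else:
            if not commands or commands[-1]["code"] == "COMMENT":
                raise SyntaxError(f"parameter {kind} has no G or M command")
            commands[-1]["params"][kind] = value
    return commands


def split_blocks(commands, surface=0.0):
    """Collect the commands between a Z-down move and the next Z-up move.

    Assumption: Z <= surface means "down", Z > surface means "up".
    The Z moves themselves are not included in the block.
    """
    blocks = []
    current = None  # None while the tool is up
    for cmd in commands:
        z = cmd["params"].get("Z")
        if z is None:
            if current is not None:
                current.append(cmd)      # ordinary command while down
        elif z <= surface:
            if current is None:
                current = []             # Z down: start a new block
        else:
            if current is not None:
                blocks.append(current)   # Z up: close the block
            current = None
    return blocks
```

Because the up/down decision here looks at the value of Z, it is exactly the kind of semantic test that a yacc grammar cannot make on its own, which is why the two-token approach is suggested above.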

What is `where .force`?

I've been playing around with the idea of writing programs that run on Streams and properties with them, but I feel that I am stuck even with the simplest of things. When I look at the definition of repeat in Codata/Streams in the standard library, I find a construction that I haven't seen anywhere in Agda: λ where .force →.
Here, an excerpt of a Stream defined with this weird feature:
repeat : ∀ {i} → A → Stream A i
repeat a = a ∷ λ where .force → repeat a
Why does where appear in the middle of the lambda function definition, and what is the purpose of .force if it is never used?
I might be asking something that is in the documentation, but I can't figure out how to search for it.
Also, is there a place where I can find documentation to use "Codata" and proofs with it? Thanks!

Why does where appear in the middle of the lambda function definition?
Quoting the docs:
Anonymous pattern matching functions can be defined using one of the
two following syntaxes:
\ { p11 .. p1n -> e1 ; … ; pm1 .. pmn -> em }
\ where p11 .. p1n -> e1 … pm1 .. pmn -> em
So λ where is an anonymous pattern matching function. force is the field of Thunk and .force is a copattern in postfix notation (originally I said nonsense here, but thanks to @Cactus it's now fixed, see his answer).
Also, is there a place where I can find documentation to use "Codata" and proofs with it? Thanks!
Check out these papers
Normalization by Evaluation in the Delay Monad
A Case Study for Coinduction via Copatterns and Sized Types
Equational Reasoning about Formal Languages in Coalgebraic Style
Guarded Recursion in Agda via Sized Types

As one can see in the definition of Thunk, force is the field of the Thunk record type:
record Thunk {ℓ} (F : Size → Set ℓ) (i : Size) : Set ℓ where
  coinductive
  field force : {j : Size< i} → F j
So in the pattern-matching lambda, .force is not a dot pattern (why would it be? there is nothing prescribing the value of the parameter), but instead is simply syntax for the record field selector force. So the above code is equivalent to making a record with a single field called force with the given value, using copatterns:
repeat a = a ∷ as
  where
    force as = repeat a
or, which is actually where the .force syntax comes from, using postfix projection syntax:
repeat a = a ∷ as
  where
    as .force = repeat a
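As a loose analogy only (Python has no copatterns, sizes or termination checking), the idea of "define the tail by saying what its force yields" can be sketched with an explicit thunk object. The class and function names below mirror the Agda ones but are otherwise made up for illustration.

```python
class Thunk:
    """A delayed value: nothing is computed until force() is called."""

    def __init__(self, compute):
        self._compute = compute

    def force(self):
        return self._compute()


def repeat(a):
    """Infinite stream of a, represented as a (head, delayed tail) pair."""
    return (a, Thunk(lambda: repeat(a)))


def take(n, stream):
    """Return the first n elements of a stream as a list."""
    out = []
    for _ in range(n):
        head, tail = stream
        out.append(head)
        stream = tail.force()  # only now is the next cell built
    return out
```

Here `repeat` returns immediately even though the stream is infinite, because the recursive call sits behind `force`, which is the role the Thunk record plays in the Agda definition.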

Resolving left-recursion in my grammar

My grammar has a case of left-recursion in the sixth production rule.
I resolved this by replacing Rule 6 and 7 like this:
I couldn't find any indirect left recursions in this grammar.
The only thing that bothers me is the final production rule, which has a terminal surrounded by two non-terminals.
My two questions are:
Is my resolved left recursion correct?
Is the final production rule a left recursion? I am not sure how to treat this special case.

Yes, your resolution is correct. You may want to remove the epsilon rule for ease of use, but the accepted strings are correct.
X -> -
X -> -Z
Z -> +
Z -> +Z
Z -> X + Y
... and Y is of the form 0* 1 (no syntax collisions)
As a check, note that you could now replace this final rule with two new rules, one for each expansion of X:
Z -> - + Y
Z -> -Z + Y
This removes X entirely from the Z rules, and each Z rule would then begin with a terminal.
No, your final production rule is no longer left-recursive. X now must resolve to a string beginning with a terminal.
I have to admit, though, I'm curious about what use the language has. :-)
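The "does anything derive a string starting with itself" check can also be done mechanically. The following Python sketch is one way to do it, under two assumptions: no production is empty (so only the first symbol of each right-hand side matters), and Y, described above only as "0* 1", is encoded as Y -> 0 Y | 1. The function name and the dictionary encoding of the grammar are made up for illustration.

```python
def left_recursive_nonterminals(grammar):
    """Return the nonterminals that can derive a string starting with
    themselves, directly or indirectly.

    grammar maps each nonterminal to a list of right-hand sides, each a
    non-empty list of symbols.
    """
    # Direct left corners: A -> B ... puts B in corners[A].
    corners = {
        nt: {rhs[0] for rhs in prods if rhs[0] in grammar}
        for nt, prods in grammar.items()
    }
    # Transitive closure of the left-corner relation.
    changed = True
    while changed:
        changed = False
        for nt in corners:
            for other in list(corners[nt]):
                new = corners[other] - corners[nt]
                if new:
                    corners[nt] |= new
                    changed = True
    return {nt for nt, reach in corners.items() if nt in reach}


resolved = {
    "X": [["-"], ["-", "Z"]],
    "Z": [["+"], ["+", "Z"], ["X", "+", "Y"]],
    "Y": [["0", "Y"], ["1"]],  # assumed encoding of 0* 1
}
print(left_recursive_nonterminals(resolved))
```

For the rules listed in this answer the result is the empty set: Z can start with X, but X always starts with the terminal '-'.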

syntax directed definition for S -> '{' L '}' ; L-> L S | null

**Exercise 5.4.4** Write L-attributed SDD's analogous to that of Example 5.19
for the following productions, each of which represents a familiar flow-of-control
construct, as in the programming language C. You may need to generate a three
address statement to jump to a particular label L, in which case you should
generate goto L.
c) S -> '{' L '}' ; L -> L S | null
This is a question from the Dragon book exercises. I am confused about whether L is a list or something else. If it is a list, then I have attempted it this way:
List = new()
List.next = S.next
and the list continues with List1, List2, ..., Listn.
I just want to confirm whether I am on the right track.
Here is the book link: http://www.slideshare.net/rajarshisbaisthakurforever/dragon-book-for-compiler-2v2. It is on page 337, chapter 5, section 5.5, "Implementing L-Attributed SDD's".

bison shift/reduce conflict

In the following simple grammar, given the conflict in state 4,
can 'shift' become the action taken without changing the rules?
(I thought that shift was bison's preferred action by default.)
%token one two three
%%
start : a;
a : X Y Z;
X : one;
Z : two | three;
Y : two | ;
%%

Shift is bison's preferred action, and you can see in the state output that it will shift two in state 4. It will still report a shift-reduce conflict, but you can treat that as a warning if you like. (See %expect.) You'd probably be better off fixing the grammar:
start : a;
a : X Z | X Y Z;
X : one;
Y : two;
Z : two | three;

Shift is the default, but that results in the generated parser giving an error for the input 'one two', so that is probably not what you want. Instead, follow rici's advice and fix the grammar.
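The difference can be seen without running bison. The Python sketch below is a hand-written model of the two behaviours, not bison's generated parser: the first function treats any `two` after `one` as Y (which is what resolving the conflict as a shift amounts to), and the second follows the rewritten grammar a : X Z | X Y Z. Tokens are assumed to be plain strings, and both function names are made up for illustration.

```python
def parse_shift_preferred(tokens):
    """Original grammar with the conflict resolved as shift:
    after `one`, a `two` is always taken as Y."""
    pos = 0
    if pos >= len(tokens) or tokens[pos] != "one":      # X : one
        return False
    pos += 1
    if pos < len(tokens) and tokens[pos] == "two":      # Y : two (shift wins)
        pos += 1
    # otherwise Y : /* empty */
    if pos >= len(tokens) or tokens[pos] not in ("two", "three"):  # Z
        return False
    pos += 1
    return pos == len(tokens)


def parse_fixed(tokens):
    """Rewritten grammar a : X Z | X Y Z, where the choice between
    Y and Z is made only once the token after `two` is known."""
    if not tokens or tokens[0] != "one":
        return False
    rest = tokens[1:]
    if len(rest) == 1:                                  # a : X Z
        return rest[0] in ("two", "three")
    if len(rest) == 2:                                  # a : X Y Z
        return rest[0] == "two" and rest[1] in ("two", "three")
    return False
```

In this model, `parse_shift_preferred(["one", "two"])` rejects the input because the only `two` has already been consumed as Y, while `parse_fixed(["one", "two"])` accepts it.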
