How can I set the priority of custom operators (if that is possible)?
Just as * or / has higher priority than + or -, I want to add such a rule to my operators.
Precedence is decided by the table at the bottom of this page: http://msdn.microsoft.com/en-us/library/dd233228.aspx
In particular, the order (from lowest to highest precedence) is:
|, ',', ||, &, &&, <op, >op, =, |op, &op, &&&, |||, ^^^, ~~~, <<<, >>>, ^op, ::, -op, +op (binary), *op, /op, %op, **op, prefix operators (+op, -op, %, %%, &, &&, !op, ~op)
From the same page:
F# supports custom operator overloading. This means that you can
define your own operators. In the previous table, op can be any valid
(possibly empty) sequence of operator characters, either built-in or
user-defined. Thus, you can use this table to determine what sequence
of characters to use for a custom operator to achieve the desired
level of precedence. Leading . characters are ignored when the
compiler determines precedence.
Let's say I'm writing a parser that parses the following syntax:
foo.bar().baz = 5;
The grammar rules look something like this:
program: one or more statement
statement: expression followed by ";"
expression: one of:
- identifier (\w+)
- number (\d+)
- func call: expression "(" ")"
- dot operator: expression "." identifier
Two expressions have a problem: the func call and the dot operator. Both are recursive and look for another expression at the start, which causes a stack overflow. I will focus on the dot operator for this question.
We face a similar problem with the plus operator. However, rather than using an expression you would do something like this to solve it (look for a "term" instead):
add operation: term "+" term
term: one of:
- number (\d+)
- "(" expression ")"
The term then includes everything except the add operation itself. To ensure that multiple plus operators can be chained together without using parentheses, one would rather do:
add operation: term, one or more of ("+" followed by term)
I was thinking a similar solution could work for the dot operator or for function calls.
However, the dot operator works a little differently. We always evaluate from left to right and need to allow full expressions so that you can do function calls etc. in between. With parentheses, an example might be:
(foo.bar()).baz = 5;
Unfortunately, I do not want to require parentheses, which is what would happen if I followed the method used for the plus operator.
How could I go about implementing this?
Currently my parser never peeks ahead, but even if I do look ahead, it still seems tricky to accomplish.
The easy solution would be to use a bottom-up parser which doesn't drop into a bottomless pit on left recursion, but I suppose you have already rejected that solution.
I don't understand your objection to using a looping construct, though. Postfix modifiers like field lookup and function call are not really different from binary operators like addition (except, of course, for the fact that they will not need to claim an eventual right operand). Plus and minus intermingle freely, which you can parse with a repetition like:
additive: term ( '+' term | '-' term )*
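As a rough illustration of that repetition, here is a minimal Python sketch. It assumes that a term is just a number, and the names `tokenize` and `parse_additive` are made up for this example. The loop folds each new operand into the tree built so far, which is what gives left associativity without any left recursion:

```python
import re

def tokenize(text):
    # Numbers and the two additive operators; anything else is ignored
    # in this sketch (a real lexer would report an error instead).
    return re.findall(r"\d+|[+-]", text)

def parse_additive(tokens):
    """additive: term ( '+' term | '-' term )*   with term = number."""
    pos = 0
    node = int(tokens[pos])              # the first term
    pos += 1
    # One token of lookahead decides whether the loop continues.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right = int(tokens[pos + 1])     # the term after the operator
        pos += 2
        node = (op, node, right)         # fold to the left
    return node

tree = parse_additive(tokenize("1 - 2 + 3"))
print(tree)
```

The result nests the earlier operation inside the later one, so `1 - 2 + 3` groups as `(1 - 2) + 3`.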
Similarly, postfix modifiers can be easily parsed with something like:
postfixed: atom ( '.' ID | '(' opt-expr-list ')' )*
I'm using a form of extended BNF: parentheses group; | separates alternatives and binds less tightly than concatenation; and * means "zero or more repetitions" of the atom on its left.
Another postfix operator which falls into the same category is array/map subscripting ('[' expr ']'), although you might also have other postfix operators.
Note that like the additive syntax above, selecting the appropriate alternative does not require looking beyond the next token. It's hard to parse without being able to peek one token into the future. Fortunately, that's very little overhead.
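The postfixed rule could be sketched in Python roughly as follows. This is only an illustration under some assumptions: atoms are limited to identifiers and numbers (`\w+`), argument lists are always empty as in the question's grammar, and the names `tokenize`, `parse_postfixed` and the tuple-shaped nodes are invented for the example.

```python
import re

TOKEN = re.compile(r"\s*(\w+|[.()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def is_word(tokens, pos):
    return pos < len(tokens) and re.fullmatch(r"\w+", tokens[pos]) is not None

def parse_postfixed(tokens, pos=0):
    """postfixed: atom ( '.' ID | '(' ')' )*"""
    if not is_word(tokens, pos):
        raise SyntaxError("expected an identifier or number")
    node = tokens[pos]
    pos += 1
    # Each iteration peeks at one token and wraps the node built so far,
    # so the tree grows to the left without recursive descent into itself.
    while pos < len(tokens):
        if tokens[pos] == ".":
            if not is_word(tokens, pos + 1):
                raise SyntaxError("expected an identifier after '.'")
            node = ("dot", node, tokens[pos + 1])
            pos += 2
        elif tokens[pos] == "(":
            if tokens[pos + 1:pos + 2] != [")"]:
                raise SyntaxError("expected ')'")
            node = ("call", node)
            pos += 2
        else:
            break
    return node, pos

tree, end = parse_postfixed(tokenize("foo.bar().baz"))
print(tree)
```

For `foo.bar().baz` the call wraps `foo.bar` and the final dot wraps the call, which is the grouping `(foo.bar()).baz` without any parentheses in the input.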
One way could be for the dot operator to parse a non-dot expression, that is, a rule that is the same as expression but without the dot operator. This prevents recursion.
Then, when the non-dot expression has been parsed, check if a dot and an identifier follows. If this is not the case, we are done. If this is the case, wrap the current node up in a dot operation node. Then, keep track of the entire string text that has been parsed for this operation so far. Then revert everything back to before the operation was being parsed, and now re-parse a "custom expression", where the first directly-nested expression would really be trying to match the exact string that was parsed before rather than a real expression. Repeat until there are no more dot-identifier pairs (this should happen automatically by the new "custom expression").
This is messy, complicated and possibly slow, and I'm not entirely sure if it'll work but I'll try it out. I'd appreciate alternative solutions.
Trying to parse operators (+, -, =, <<, !=), using states like
%{
%}
OP ["+"|";"|":"|","|"*"|"/"|"="|"("|")"|"{"|"}"|"*"|"#"|"$"|
"<"|">"|"&"|"|"|"!"|]
DOUBOP [":="|".."|"<<"|">>"|"<>"|"<="|">="|"=>"|"**"|"!="|"{:"|"}:"|"\-"]
and later on
{DOUBOP} { printf("%s (operator)\n", yytext); }
{OP} { printf("%s (operator)\n", yytext); }
but Lex is identifying operators like "<<" as "<" and "<". I thought since it was in double quotes this would work, but I see that's not the case.
Is there any way I can give a regular expression precedence, i.e. have lex check for a double operator first, and then a single operator?
Thanks in advance.
[...] is a character class, not an eccentric type of parenthesis. If you want to parenthesize a sub-expression in a pattern, use ordinary parentheses. In this case, however, parentheses are not necessary. (Indeed, most of the quotes aren't necessary either, but they don't hurt and some of them would be useful.)
"==" recognises the two-character sequence consisting of two equal signs. "=="|"++" recognises either two equal signs or two plus signs.
By contrast, ["=="] recognises a single character, which could be either a quote or an equals sign. Since a character class is a set, the fact that each of those appears twice is irrelevant (although I think it would save a lot of grief if flex issued a warning). Similarly, ["=="|"<<"] recognises a single character if it is a quote, an equals sign, a vertical bar or a less than sign.
Flex pattern syntax is documented in the flex manual. It differs in a few ways from regexes in other systems, so it's worth reading the short document. However, character classes are mostly the same in all regex syntaxes in common use, especially the use of square brackets to delimit the set.
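Since square brackets mean the same thing in most regex dialects, the difference can be tried out in Python's `re` module. This is only an analogy, not flex itself: quotes have no special meaning in Python patterns, and Python tries alternatives from left to right while flex always prefers the longest match. That is why the two-character operators are listed first in the hypothetical `operators` pattern below.

```python
import re

# A character class matches ONE character from a set, so the quotes,
# the '|' and the repeated characters are just members of that set.
char_class = re.compile(r'["=="|"<<"]')
assert char_class.fullmatch("<")              # a single '<' matches
assert char_class.fullmatch('"')              # so does a quote
assert char_class.fullmatch("|")              # and a vertical bar
assert char_class.fullmatch("<<") is None     # never two characters

# Plain alternation for multi-character operators, with a character
# class only for the genuinely single-character ones.
operators = re.compile(r"<<|>>|==|!=|[<>=!+*/-]")
found = operators.findall("a << b != c < d")
print(found)
```

With this pattern `<<` comes out as one token rather than two `<` tokens.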
An easier way is to put all the single characters together in one character class and apply the * operator after the closing bracket.
i.e.
OP ["+"|";"|":"|","|"*"|"/"|"="|"("|")"|"{"|"}"|"*"|"#"|"$"|
"<"|">"|"&"|"|"|"!"|]*
There is one thing which I don't understand about reference modification in Cobol.
The example goes like this:
MOVE VARIABLE(VARIABLE2 +4:2) TO VARIABLE3
Now I do not quite understand what the "+4:2" refers to. Does it mean that the first two characters, 4 characters after the target, are moved? For example, if VARIABLE (the first) is filled with "123456789" and VARIABLE2 contains the 2nd and 3rd positions within that variable (so "23"), the target is "23 +4", meaning "789". Then the first two positions in the target (indicated by the ":2") are moved to VARIABLE3. So in the end VARIABLE3 would contain "78".
Am I understanding this right or am I making a false assumption about that instruction?
(VARIABLE2 +4:2) is a syntax error, because the starting position must be an arithmetic expression. There must be a space after the + for this reference modification to be valid. And, VARIABLE2 must be numeric and the expression shall evaluate to an integer.
Once corrected, 4 is added to the content of VARIABLE2. The result is the left-most (or starting) position within VARIABLE for the move. 2 characters are moved to VARIABLE3. If VARIABLE3 is longer than two characters, the remaining positions are filled with spaces.
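To make the arithmetic concrete, here is a small Python imitation of the corrected statement. The values are assumptions chosen for illustration: VARIABLE2 holds the number 3 and VARIABLE3 is five characters long. The helper names `reference_modify` and `move` are invented, and the sketch covers only this simple alphanumeric case.

```python
def reference_modify(data, start, length):
    """Imitate data(start:length): 1-based start, fixed length."""
    if start < 1 or length < 1 or start + length - 1 > len(data):
        raise ValueError("reference modification out of range")
    return data[start - 1:start - 1 + length]

def move(source, target_size):
    """Imitate an alphanumeric MOVE: truncate or pad with spaces on the right."""
    return source[:target_size].ljust(target_size)

variable = "123456789"
variable2 = 3                      # assumed numeric content of VARIABLE2
piece = reference_modify(variable, variable2 + 4, 2)   # start 7, length 2
variable3 = move(piece, 5)         # assumed 5-character VARIABLE3
print(repr(piece), repr(variable3))
```

The starting position is 3 + 4 = 7, so the two characters taken are the 7th and 8th, and the rest of the longer target is space-filled.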
From the 2002 COBOL standard:
8.7.1 Arithmetic operators
There are five binary arithmetic operators and two unary arithmetic operators that may be used in arithmetic expressions. They are represented by specific COBOL characters that shall be preceded by a space and followed by a space except that no space is required between a left parenthesis and a unary operator or between a unary operator and a left parenthesis.
Emphasis added.
Being new to Xtext I would like to know how to define the upper and lower boundaries regarding the occurrence of a letter.
I know of the following expressions / operators
exactly one (the default, no operator)
one or none (operator ?)
any (zero or more, operator *)
one or more (operator +)
Given the examples
<IS123A4>
<IS12>
<ISB123455>
how do I describe the grammar for the rule that 1-25 alphanumeric characters may appear after "IS"?
Currently, I have
`terminal ISCONCEPTAME : '<IS' ALPHANUM ALPHANUM? ALPHANUM? ALPHANUM?.....'>';`
`terminal ALPHANUM: ('a'..'z'|'A'..'Z'|'_'|INT);`
However, I am not sure if it is the right way to do it. I was thinking about something like
`terminal ISCONCEPTAME : '<IS' ALPHANUM{1,25} '>';`
Thanks for any input!
I have a set of chars which I define in the TYPE section as:
TAmpls = set of '1'..'9';
In my function I declare a new variable, in the var section, with type Tampls using:
myAmpls : Tampls;
I then un-assign everything in myAmpls using:
myAMpls := [];
I then find an integer (I'll call it n). If this number is not assigned in my set variable, I want to assign it, for this I have tried using:
if not chr(n) in myAmpls then include(myAmpls,chr(n));
But the compiler throws an error saying:
'Operator not applicable to this operand type'
If I remove the 'not', the code compiles fine, why is this?
I would have thought that whether or not n was already in myAmpls was boolean, so why can't I use 'not'?
Delphi operator precedence is detailed in the documentation. There you will find a table of the operators listing their precedence. I won't reproduce the table here, not least because it's hard to lay out in markdown!
You will also find this text:
An operator with higher precedence is evaluated before an operator with lower precedence, while operators of equal precedence associate to the left.
Your expression is:
not chr(n) in myAmpls
Now, not has higher precedence than in. Which means that not is evaluated first. So the expression is parsed as
(not chr(n)) in myAmpls
And that is a compile error, because not cannot be applied to a character operand. You need to add parentheses to give the expression the desired meaning:
not (chr(n) in myAmpls)