I am new in Eralng . get a little query about applying functions
assumming got a funciton defined :
mysum(X) -> fun(Y)-> X + Y end.
then try to calling like this
mysum(32)(332)
getting error
* 1: syntax error before: '('
so I had to
apply(mysum(32),[333])
or
M = mysum(32), M(333)
but I would like to know a little bit more , why it is not supporting , what is the disadvantage
As you expected, mysum return a function. you must enclose the evaluation inside parenthesis to satisfy the erlang parser:
(mysum(32))(332)
this spelling is obviously not ambiguous.
Your expression seems not ambiguous because you know that mysum(32) is a function, but the types are solved at run time in erlang, so the parser has no idea of what is mysum(32), it is expecting some help here to know what it has to do: the parenthesis, the apply or the intermediate variables, but it could be an operator or a separator.
Related
Let's say I'm writing a parser that parses the following syntax:
foo.bar().baz = 5;
The grammar rules look something like this:
program: one or more statement
statement: expression followed by ";"
expression: one of:
- identifier (\w+)
- number (\d+)
- func call: expression "(" ")"
- dot operator: expression "." identifier
Two expressions have a problem, the func call and the dot operator. This is because the expressions are recursive and look for another expression at the start, causing a stack overflow. I will focus on the dot operator for this quesition.
We face a similar problem with the plus operator. However, rather than using an expression you would do something like this to solve it (look for a "term" instead):
add operation: term "+" term
term: one of:
- number (\d+)
- "(" expression ")"
The term then includes everything except the add operation itself. To ensure that multiple plus operators can be chained together without using parenthesis, one would rather do:
add operation: term, one or more of ("+" followed by term)
I was thinking a similar solution could for for the dot operator or for function calls.
However, the dot operator works a little differently. We always evaluate from left-to-right and need to allow full expressions so that you can do function calls etc. in-between. With parenthesis, an example might be:
(foo.bar()).baz = 5;
Unfortunately, I do not want to require parenthesis. This would end up being the case if following the method used for the plus operator.
How could I go about implementing this?
Currently my parser never peeks ahead, but even if I do look ahead, it still seems tricky to accomplish.
The easy solution would be to use a bottom-up parser which doesn't drop into a bottomless pit on left recursion, but I suppose you have already rejected that solution.
I don't understand your objection to using a looping construct, though. Postfix modifiers like field lookup and function call are not really different from binary operators like addition (except, of course, for the fact that they will not need to claim an eventual right operand). Plus and minus intermingle freely, which you can parse with a repetition like:
additive: term ( '+' term | '-' term )*
Similarly, postfix modifiers can be easily parsed with something like:
postfixed: atom ( '.' ID | '(' opt-expr-list `)` )*
I'm using a form of extended BNF: parentheses group; | separates alternatives and binds less stringly than concatenation; and * means "zero or more repetitions" of the atom on its left.
Another postfix operator which falls into the same category is array/map subscripting ('[' expr ']'), although you might also have other postfix operators.
Note that like the additive syntax above, selecting the appropriate alternative does not require looking beyond the next token. It's hard to parse without being able to peek one token into the future. Fortunately, that's very little overhead.
One way could be for the dot operator to parse a non-dot expression, that is, a rule that is the same as expression but without the dot operator. This prevents recursion.
Then, when the non-dot expression has been parsed, check if a dot and an identifier follows. If this is not the case, we are done. If this is the case, wrap the current node up in a dot operation node. Then, keep track of the entire string text that has been parsed for this operation so far. Then revert everything back to before the operation was being parsed, and now re-parse a "custom expression", where the first directly-nested expression would really be trying to match the exact string that was parsed before rather than a real expression. Repeat until there are no more dot-identifier pairs (this should happen automatically by the new "custom expression").
This is messy, complicated and possibly slow, and I'm not entirely sure if it'll work but I'll try it out. I'd appreciate alternative solutions.
This syntax module is syntactically valid:
module mod1
syntax Empty =
;
And so is this one, which should be an equivalent grammar to the previous one:
module mod2
syntax Empty =
( )
;
(The resulting parser accepts only empty strings.)
Which means that you can make grammars such as this one:
module mod3
syntax EmptyOrKitchen =
( ) | "kitchen"
;
But, the following is not allowed (nested parenthesis):
module mod4
syntax Empty =
(( ))
;
I would have guessed that redundant parenthesis are allowed, since they are allowed in things like expressions, e.g. ((2)) + 2.
This problem came up when working with the data types for internal representation of rascal syntax definitions. The following code will create the same module as in the last example, namely mod4 (modulo some whitespace):
import Grammar;
import lang::rascal::format::Grammar;
str sm1 = definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
alt({seq([])})
],{})))))));
The problematic part of the data is on its own line - alt({seq([])}). If this code is changed to seq([]), then you get the same syntax module as mod2. If you further delete this whole expression, i.e. so that you get this:
str sm3 =
definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
], {})))))));
Then you get mod1.
So should such redundant parenthesis by printed by the definition2rascal(...) function? And should it matter with regards to making the resulting module valid or not?
Why they are not allowed is basically we wanted to see if we could do without. There is currently no priority relation between the symbol kinds, so in general there is no need to have a bracket syntax (like you do need to + and * in expressions).
Already the brackets have two different semantics, one () being the epsilon symbol and two (Sym1 Sym2 ...) being a nested sequence. This nested sequence is defined (syntactically) to expect at least two symbols. Now we could without ambiguity introduce a third semantics for the brackets with a single symbol or relax the requirement for sequence... But we reckoned it would be confusing that in one case you would get an extra layer in the resulting parse tree (sequence), while in the other case you would not (ignored superfluous bracket).
More detailed wise, the problem of printing seq([]) is not so much a problem of the meta syntax but rather that the backing abstract notation is more relaxed than the concrete notation (i.e. it is a bigger language or an over-approximation). The parser generator will generate a working parser for seq([]). But, there is no Rascal notation for an empty sequence and I guess the pretty printer should throw an exception.
Grammar:
rule: (a b)? a c ;
Input:
a d
Question: Which error message correct at position 2 for given input?
1. expected "b", "c".
2. expected "c".
P.S.
I write parser and I have choice (dilemma) take into account that "b" expected at position or not take.
The #1 error (expected "b", "c") want to say that input "a b" expected but because it optional it may not expected but possible.
I don't know possible is the same as expected or not?
Which error message better and correct #1 or #2?
Thanks for answers.
P.S.
In first case I define marker of testing as limit of position.
if(_inputPos > testing) {
_failure(_inputPos, _code[cp + {{OFFSET_RESULT}}]);
}
Limit moved in optional expressions:
OPTIONAL_EXPRESSION:
testing = _inputPos;
The "b" expression move _inputPos above the testing pos and add failure at _inputPos.
In second case I can define marker of testing as boolean flag.
if(!testing) {
_failure(_inputPos, _code[cp + {{OFFSET_RESULT}}]);
}
The "b" expression in this case not add failure because it tested (inner for optional expression).
What you think what is better and correct?
Testing defined as specific position and if expression above this position (_inputPos > testing) it add failure (even it inside optional expression).
Testing defined as flag and if this flag set that the failures not takes into account. After executing optional expression it restore (not reset!) previous value of testing (true or false).
Also failures not takes into account if rule not fails. They only reported if parsing fails.
P.S.
Changes at 06 Jan 2014
This question raised because it related to two different problems.
First problem:
Parsing expression grammar (PEG) describe only three atomic items of input:
terminal symbol
nonterminal symbol
empty string
This grammar does not provide such operation as lexical preprocessing an thus it does not provide such element as the token.
Second problem:
What is a grammar? Are two grammars can be considred equal if they accept the same input but produce different result?
Assume we have two grammar:
Grammar 1
rule <- type? identifier
Grammar 2
rule <- type identifier / identifier
They both accept the same input but produce (in PEG) different result.
Grammar 1 results:
{type : type, identifier : identifier}
{type : null, identifier : identifier}
Grammar 2 results:
{type : type, identifier : identifier}
{identifier : identifier}
Quetions:
Both grammar equal?
It is painless to do optimization of grammars?
My answer on both questions is negative. No equal, Not painless.
But you may ask. "But why this happens?".
I can answer to you. "Because this is not a problem. This is a feature".
In PEG parser expression ALWAYS consists from these parts.
ORDERED_CHOICE => SEQUENCE => EXPRESSION
And this explanation is the my answer on question "But why this happens?".
Another problem.
PEG parser does not recognize WHITESPACES because it does not have tokens and tokens separators.
Now look at this grammar (in short):
program <- WHITESPACE expr EOF
expr <- ruleX
ruleX <- 'X' WHITESPACE
WHITESPACE < ' '?
EOF <- ! .
All PEG grammar desribed in this manner.
First WHITESPACE at begin and other WHITESPACE (often) at the end of rule.
In this case in PEG optional WHITESPACE must be assumed as expected.
But WHITESPACE not means only space. It may be more complex [\t\n\r] and even comments.
But the main rule of error messages is the following.
If not possible to display all expected elements (or not possible to display at least one from all set of expected elements) in this case is more correct do not display anything.
More precisely required to display "unexpected" error mesage.
How you in PEG will display expected WHITESPACE?
Parser error: expected WHITESPACE
Parser error: expected ' ', '\t', '\n' , 'r'
What about start charcters of comments? They also may be part of WHITESPACE in some grammars.
In this case optional WHITESPACE will be reject all other potential expected elements because not possible correctly to display WHITESPACE in error message because WHITESPACE is too complex to display.
Is this good or bad?
I think this is not bad and required some tricks to hide this nature of PEG parsers.
And in my PEG parser I not assume that the inner expression at first position of optional (optional & zero_or_more) expression must be treated as expected.
But all other inner (except at the first position) must treated as expected.
Example 1:
List<int list; // type? ident
Here "List<int" is a "type". But missing ">" is not at the first position in optional "type?".
This failure take into account and report as "expected '>'"
This is because we not skip "type" but enter into "type" and after really optional "List" we move position from first to next real "expected" (that already outside of testing position) element.
"List" was in "testing" position.
If inner expression (inside optional expression) "fits in the limitation" not continue at next position then it not assumed as the expected input.
From this assumption has been asked main question.
You must just take into account that we are talking about PEG parsers and their error messages.
Here is your grammar:
What is clear here is that after the first a there are two possible inputs: b or c. Your error message should not prioritize one over the other.
The basic idea to produce an error message for an invalid input is to find the most far place you failed (if your grammar where d | (a b)? a c, d wouldn't be part of the error) and determine what are all possible inputs that could make you advance and say "expected '...' but got '...'". There are other approaches to try to recover the parser and force it to continue. If there is only one possible expected token, let's temporarily insert it into the token stream and continue as if it where there since ever. This would lead to better error detection as you can find errors beyond the point where the parser first stopped.
Im trying to model the EBNF expression
("declare" "namespace" ";")* ("declare" "variable" ";")*
I have built up the yacc (Im using MPPG) grammar, which seems to represent this, but it fails to match my test expression.
The test case i'm trying to match is
declare variable;
The Token stream from the lexer is
KW_Declare
KW_Variable
Separator
The grammar parse says there is a "Shift/Reduce conflict, state 6 on KW_Declare". I have attempted to solve this with "%left PrologHeaderList PrologBodyList", but neither solution works.
Program : Prolog;
Prolog : PrologHeaderList PrologBodyList;
PrologHeaderList : /*EMPTY*/
| PrologHeaderList PrologHeader;
PrologHeader : KW_Declare KW_Namespace Separator;
PrologBodyList : /*EMPTY*/
| PrologBodyList PrologBody;
PrologBody : KW_Declare KW_Variable Separator;
KW_Declare KW_Namespace KW_Variable Separator are all tokens with values "declare", "naemsapce", "variable", ";".
It's been a long time since I've used anything yacc-like, but here are a couple of suggestions that may or may not help.
It seems that you need a 2-token lookahead in this situation. The parser gets to the last PrologHeader, and it has to decide whether the next construct is a PrologHeader or a PrologBody, and it can't tell that from the KW_Declare. If there's a directive to increase lookahead in this situation, it will probably solve the problem.
You could also introduce context into your actions: rather than define PrologHeaderList and PrologBodyList, define PrologRuleList and have the actions throw an error if a header appears after a body. Ugly, but sometimes you have to do it: what appears simple in a grammar may not be simple in the generated parser.
A hackish approach might be to combine the tokens: rather than KW_Declare and KW_Variable, have your lexer recognize the space and use KW_Declare_Variable. Since both are keywords, you're not going to run into namespace collision problems.
The grammar at the top is regular so IIRC you can plot it out as a DFA (or a NDA and convert it to a DFA) and then convert the DFA to a grammar. It's bean a while so I'll leave the work as an exercise for the reader.
I'm learning Bison and at this time the only thing that I do was the rpcalc example, but now I want to implement a print function(like printf of C), but I don't know how to do this and I'm planning to have a syntax like this print ("Something here");, but I don't know how to build the print function and I don't know how to create that ; as a end of line. Thanks for your help.
You first need to ask yourself:
What are the [sub-]parts of my 'print ("something");' syntax ?
Once you identify these parts, "simply" describe them in the form of grammar syntax rules, along with applicable production rules. And then let Bison generate the parser for you; that's about it.
To put you on your way:
The semi-column is probably a element you will use to separate statemements (such a one "call" to print from another).
'print' itself is probably a keyword, or preferably a native function name of your language.
The print statement appears to take a literal string as [one of] its arguments. a literal string starts and ends with a double quote (and probably allow for escaped quotes within itself)
etc.
The bolded and italic expressions above are some of the entities (the 'symbols' in parser lingo) you'll likely need to define in the syntax for your language. For that you'll use Bison grammar rules, such as
stmt : print_stmt ';' | input_stmt ';'| some_other_stmt ';' ;
prnt_stmt : print '(' args ')'
{ printf( $3 ); }
;
args : arg ',' args;
...
Since the question asked about the semi-column, maybe some confusion was from the different uses thereof; see for example above how the ';' belong to your language's syntax whereby the ; (no quotes) at the end of each grammar rule are part of Bison's language.
Note: this is of course a simplistic implementation, aimed at showing the essential. Also the Bison syntax may be a tat off (been there / done it, but a long while back ;-) I then "met" ANTLR never to return to Bison, although I do see how its lightweight and fully self contained nature can make it appropriate in some cases)