unregister_name({local,Name}) ->
    _ = (catch unregister(Name));
unregister_name({global,Name}) ->
    _ = global:unregister_name(Name);
unregister_name({via, Mod, Name}) ->
    _ = Mod:unregister_name(Name);
unregister_name(Pid) when is_pid(Pid) ->
    Pid.
This is from gen_server.erl. If _ always matches and the match always evaluates to the right-hand-side expression, what are the _ = expression() lines doing here?
Typically _ = ... matches are used to quiet dialyzer warnings about unmatched function return values when its -Wunmatched_returns option is used. As the documentation explains:
-Wunmatched_returns
Include warnings for function calls which ignore a structured return value or
do not match against one of many possible return value(s).
By explicitly matching the return value against the _ "don't care" variable, you can use this useful dialyzer option without having to see warnings for return values you don't care about.
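For a concrete illustration, here is a minimal sketch (a hypothetical module, not from OTP) of the kind of call -Wunmatched_returns complains about, and the _ = idiom that silences it:

-module(unmatched_demo).
-export([careless/1, careful/1]).

%% A hypothetical helper with a structured return value.
-spec delete(term()) -> {ok, term()} | {error, not_found}.
delete(missing) -> {error, not_found};
delete(Key)     -> {ok, Key}.

%% dialyzer -Wunmatched_returns flags this call, because the
%% {ok, _} | {error, not_found} value is silently dropped.
careless(Key) ->
    delete(Key),
    ok.

%% The explicit match against _ documents that the value is
%% intentionally ignored, and the warning goes away.
careful(Key) ->
    _ = delete(Key),
    ok.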
In Erlang, the last expression of a function is its return value, so someone might be tempted to check what global:unregister_name/1 or Mod:unregister_name(Name) returns and pattern match on that.
The _ = expression() doesn't do anything in particular, but it hints that the return value should be ignored (for example, because those return values are not documented and might be subject to change). However, in the last clause, Pid is returned explicitly. This means that you can pattern match like this:
case unregister_name(Something) of
    Pid when is_pid(Pid) -> foo();
    _ -> bar()
end.
To sum up: those lines aren't doing anything there, but when someone else is reading the source code, they show the original programmer's intent.
Unfortunately, this particular function is not exported and is never used in a pattern match in the original module, so I don't have an example to back this up :)
And I'll note that I've since come across this:
The Power of Ten – Rules for Developing Safety Critical Code
Gerard J. Holzmann
NASA/JPL Laboratory for Reliable Software, Pasadena, CA 91109
[...]
Rule: The return value of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.

Rationale: This is possibly the most frequently violated rule, and therefore somewhat more suspect as a general rule. In its strictest form, this rule means that even the return value of printf statements and file close statements must be checked. One can make a case, though, that if the response to an error would rightfully be no different than the response to success, there is little point in explicitly checking a return value. This is often the case with calls to printf and close. In cases like these, it can be acceptable to explicitly cast the function return value to (void) – thereby indicating that the programmer explicitly and not accidentally decides to ignore a return value. In more dubious cases, a comment should be present to explain why a return value is irrelevant. In most cases, though, the return value of a function should not be ignored, especially if error return values must be propagated up the function call chain. Standard libraries famously violate this rule with potentially grave consequences. See, for instance, what happens if you accidentally execute strlen(0), or strcat(s1, s2, -1) with the standard C string library – it is not pretty. By keeping the general rule, we make sure that exceptions must be justified, with mechanical checkers flagging violations. Often, it will be easier to comply with the rule than to explain why noncompliance might be acceptable.
Here's my grammar for for statements:
FOR x > 0 {
    // something
}
// or
FOR x = 0; x > 0; x++ {
    // something
}
Both forms have the same prefix FOR, and I want to print the for_begin label after the InitExpression; however, the action right after FOR becomes unusable because of the conflict.
ForStmt
    : FOR {
          printf("for_begin_%d:\n", n);
      } Expression {
          printf("ifeq for_exit_%d\n", n);
      } ForBlock
    | FOR ForClause ForBlock
    ;

ForClause
    : InitExpression ';' {
          printf("for_begin_%d:\n", n);
      } Expression ';' Expression { printf("ifeq for_exit_%d\n", n); }
    ;
I tried changing it to something like:
ForStart
    : FOR
    | FOR InitExpression
    ;
or to use a flag marking where to print the for_begin label, but I still failed to resolve the conflict.
How can I make it not conflict?
How can the parser know which alternative of the FOR statement it sees?
It's possible that an InitExpression has an identifiable form, such as an assignment statement, which could not be used in a conditional expression. That strikes me as too restrictive for practical purposes -- there are many things you might do to initialise a loop other than a direct assignment -- but leaving that aside, it means that the earliest the InitExpression can be definitively identified is when the assignment operator is seen. If lvalues in your language can only be simple identifiers, that would make it the second lookahead token after the FOR, but in most useful languages lvalues can be much more complicated than simple identifiers, so it's likely that the InitExpression cannot be definitively identified with finite lookahead.
But it's more likely that the only significant difference between the two forms is that the expression in the first form is followed by a block (which I suppose cannot start with a semicolon) and the first expression in the second form is followed by a semicolon. So the parser knows what it is parsing at the end of the first expression and no earlier.
Normally, that would not cause a problem. Were it not for the mid-rule action which inserts a label, the parser would not have to make a reduction decision until it reached the end of the first expression, at which point it needs to decide whether to reduce the first expression as an InitExpression or an Expression. At that point, the lookahead token is either a semicolon or the first token of a block, so the lookahead token can guide the decision.
But the mid-rule action makes that impossible. The mid-rule action must either be reduced or not before shifting the token which immediately follows the FOR token, and -- as your examples show -- the lookahead token could be the same (x, in your examples) in both cases.
Fundamentally, the issue is that you want to build a one-pass compiler rather than just parsing the input into an AST and then walking the AST to generate assembler code (possibly after doing some other traversals over the AST in order to perform other analyses and allow for code optimisation). The one-pass code generator depends on mid-rule actions, and mid-rule actions in turn can easily generate unresolvable parsing conflicts. This issue is so notorious that there is a chapter in the Bison manual dedicated to it, which is well worth reading.
So there is no good solution. But in this case, there is a simple solution, because the action you want to take is just to insert a label, and inserting a label which happens never to be used does not in any way affect the code which will ultimately be executed. So you might as well insert a label immediately after the FOR token, whether you will need it or not, and then insert another label after the InitExpression if it turns out that there was one. You don't need to know which label to use until you reach the end of the conditional expression, which is much later.
As explained in the Bison manual chapter I already linked to, this cannot be done with mid-rule actions, because Bison doesn't attempt to compare mid-rule actions with each other. Even if two actions happen to be identical, Bison will still need to decide which one to execute, thereby generating a conflict. So instead of using a mid-rule action, you need to house the action in a marker non-terminal -- a non-terminal with an empty right-hand side, used only to trigger an action.
That would make the grammar look something like this:
ForLabel
    : %empty { $$ = n; printf("for_begin_%d:\n", n++); }
    ;
ForStmt
    : FOR
      ForLabel[label]
      Expression { printf("ifeq for_exit_%d\n", $label); }
      ForBlock   { printf("jmp for_begin_%d\n", $label);
                   printf("for_exit_%d:\n", $label); }
    | FOR
      ForLabel
      InitExpression ';'
      ForLabel[label]
      Expression ';'
      Expression { printf("ifeq for_exit_%d\n", $label); }
      ForBlock   { printf("jmp for_begin_%d\n", $label);
                   printf("for_exit_%d:\n", $label); }
    ;
([label] gives a name to a semantic value, so the actions can write $label instead of a rather mysterious and possibly incorrect $2 or $6. See Named References in the handy Bison manual.)
I have a problem understanding flex's yyunput behaviour.
I want to put back some characters.
For example:
My scanner found CALL{space}{cc}
cc N?Z|N?C|P[OE]?|M
%%
CALL{blank}{cc} {BEGIN CON; return yy::ez80asm_parser::make_CALL(loc);}
CALL{mmode}{blank}{cc} {BEGIN CON; return yy::ez80asm_parser::make_CALL(loc);}
CALL {BEGIN ARG; return yy::ez80asm_parser::make_CALL(loc);}
and I want to give back the {cc} so it will be scanned next time.
What do the two arguments of yyunput have to be? I couldn't find any helpful information about that function.
Any hints are welcome.
Jürgen
You can't "give back the {cc}" because the regular expression doesn't have pieces. (Flex does not do captures, either, so it wouldn't help to put parentheses around it.)
If you just want to rescan part of a token, it is much better to use yyless than unput, since yyless mostly just changes a pointer. With a single call to yyless you can return as many characters as you like, so you only need to know how many characters to return. (More precisely, you tell it how many characters you want to keep in yytext; the remainder are returned and yytext is truncated accordingly.)
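As a minimal illustration (adapted from the yyless example in the flex manual), the following scanner writes foobarbar on the input foobar, because yyless(3) keeps foo in yytext and returns bar to the input to be rescanned:

%%
foobar    ECHO; yyless(3);   /* keep "foo", rescan "bar" */
[a-z]+    ECHO;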
For reference, unput is a macro whose single argument is a single character which will be pushed onto the beginning of the unconsumed input, overwriting yytext as it goes. (In the C++ API, it calls the internal member function ::yyunput, supplying it an additional necessary argument. Don't call this function directly.)
If you need to push several characters onto the input, you need to unput them one at a time, starting with the last one. Since unput destroys the value of yytext, you need to make sure that you've already copied it if you need it before calling unput.
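For reference, the flex manual's idiom for putting back a whole string looks roughly like this (inside an action): copy yytext first, then unput the characters in reverse order:

{
    /* Copy first: each unput() call trashes yytext. */
    char *yycopy = strdup(yytext);
    for (int i = yyleng - 1; i >= 0; --i)
        unput(yycopy[i]);
    free(yycopy);
}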
In your case, I think neither of these is appropriate. What you probably want to do is not include the {cc} pattern in the match in the first place, which you can do with flex's trailing context operator /. (That assumes that you don't need to include the characters matched by {cc} in the semantic value you will be returning; in the example provided, yytext does not appear to be part of the semantic value, so the assumption should be safe.) To do so, you might write something like:
CALL{mmode}?{blank}/{cc} {BEGIN CON; return yy::ez80asm_parser::make_CALL(loc);}
CALL {BEGIN ARG; return yy::ez80asm_parser::make_CALL(loc);}
(Note: I combined your first two patterns into a single one since they seem to have the same action, but if you actually need the characters matched by {mmode} you might not want to do that.)
If that doesn't work, for whatever reason, use yyless. You'll need to know how many characters you want to return to the input, so I imagine you would end up with something like:
CALL{mmode}?{blank}{cc}  { BEGIN CON;
                           int to_keep = yyleng - 1;
                           switch (yytext[to_keep]) {  /* last char of {cc}: Z, C, E, O, P or M */
                             case 'C': case 'Z':
                               if (yytext[to_keep - 1] == 'N') --to_keep;
                               break;
                             case 'E': case 'O': --to_keep; break;
                             case 'P': case 'M': break;
                             default: assert(false); /* internal error */
                           }
                           yyless(to_keep);
                           return yy::ez80asm_parser::make_CALL(loc);
                         }
For details on the trailing context operator, see the Flex manual section on patterns (search for the word "trailing"; there is an important note towards the end as well) as well as the first paragraph of the following chapter on matching. yyless and unput are both documented in the chapter on actions, which includes examples of their usage.
Is there a good reason why the type of Prelude.read is
read :: Read a => String -> a
rather than returning a Maybe value?
read :: Read a => String -> Maybe a
Since the string might fail to parse, wouldn't the latter be more natural?
Or even an Either String a, where Left would contain the original string if it didn't parse, and Right the result if it did?
Edit:
I'm not trying to get others to write a corresponding wrapper for me. Just seeking reassurance that it's safe to do so.
Edit: As of GHC 7.6, readMaybe is available in the Text.Read module in the base package, along with readEither: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Text-Read.html#v:readMaybe
Great question! The type of read itself isn't changing anytime soon because that would break lots of things. However, there should be a maybeRead function.
Why isn't there? The answer is "inertia". There was a discussion in '08 which got derailed by a discussion over "fail."
The good news is that folks were sufficiently convinced to start moving away from fail in the libraries. The bad news is that the proposal got lost in the shuffle. There should be such a function, although one is easy to write (and there are zillions of very similar versions floating around many codebases).
See also this discussion.
Personally, I use the version from the safe package.
Yeah, it would be handy to have a read function that returns Maybe. You can make one yourself:
readMaybe :: (Read a) => String -> Maybe a
readMaybe s = case reads s of
                  [(x, "")] -> Just x
                  _         -> Nothing
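For example, in GHCi, with that definition in scope:

> readMaybe "42" :: Maybe Int
Just 42
> readMaybe "not a number" :: Maybe Int
Nothing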
Apart from inertia and/or changing insights, another reason might be that it's aesthetically pleasing to have a function that can act as a kind of inverse of show. That is, you want read . show to be the identity (for types which are instances of Show and Read) and show . read to be the identity on the range of show (i.e. show . read . show == show).
Having a Maybe in the type of read breaks the symmetry with show :: a -> String.
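For instance, in GHCi:

> read (show (42 :: Int)) :: Int     -- read . show is the identity
42
> show (read "  42  " :: Int)        -- show . read lands in the range of show
"42"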
As @augustss pointed out, you can make your own safe read function. However, his readMaybe isn't completely consistent with read, as it doesn't ignore whitespace at the end of a string. (I made this mistake once; I don't quite remember the context.)
Looking at the definition of read in the Haskell 98 report, we can modify it to implement a readMaybe that is perfectly consistent with read, and this is not too inconvenient because all the functions it depends on are defined in the Prelude:
readMaybe :: (Read a) => String -> Maybe a
readMaybe s = case [x | (x,t) <- reads s, ("","") <- lex t] of
                  [x] -> Just x
                  _   -> Nothing
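The difference shows up with trailing whitespace: reads leaves it in the remainder, so the simple [(x, "")] pattern rejects the input, while lex consumes it as end of input:

> reads "2 " :: [(Int, String)]
[(2," ")]
> lex " "
[("","")]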
This function (called readMaybe) is now in Text.Read in the base package! (As of base 4.6.)
I am curious why the comma ',' is a shortcut for and and not andalso in guard tests.
Since I'd call myself a "C native", I fail to see any shortcomings of short-circuit boolean evaluation.
I compiled some test code using the to_core flag to see what code is actually generated. Using the comma, I see the left-hand value and the right-hand value get evaluated and then and'ed together. With andalso you get a case block within the case block and no call to erlang:and/2.
I did no benchmark tests, but I daresay the andalso variant is the faster one.
To delve into the past:
Originally, guards consisted only of ,-separated tests, which were evaluated from left to right until either there were no more and the guard succeeded, or a test failed and the guard as a whole failed. Later, ; was added to allow alternate guards in the same clause. If guards evaluated both sides of a , before testing, then someone has gotten it wrong along the way. @Kay's example seems to imply that they do go from left to right as they should.
Boolean operators were only allowed much later in guards.
and, together with or, xor and not, is a boolean operator and was not intended for control. They are all strict and evaluate their arguments first, like the arithmetic operators +, -, * and /. Strict boolean operators exist in C as well.
The short-circuiting control operators andalso and orelse were added later to simplify some code. As you have said the compiler does expand them to nested case expressions so there is no performance gain in using them, just convenience and clarity of code. This would explain the resultant code you saw.
N.B. in guards there are tests and not expressions. There is a subtle difference which means that while using and and andalso is equivalent to , using orelse is not equivalent to ;. This is left to another question. Hint: it's all about failure.
So both and and andalso have their place.
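The failure difference is easy to see in the shell (as ordinary expressions, not guards): and is strict, so its second argument is evaluated, and crashes, even when the first argument already decides the result, while andalso short-circuits:

1> false andalso (1 div 0 > 0).
false
2> false and (1 div 0 > 0).
** exception error: bad argument in an arithmetic expression
     in operator  div/2
        called as 1 div 0

In a guard, the same badarith doesn't crash the clause: the test simply fails and the next clause is tried.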
Adam Lindberg's link is right. Using the comma does generate better BEAM code than using andalso. I compiled the following code using the +to_asm flag:
a(A,B) ->
    case ok of
        _ when A, B -> true;
        _ -> false
    end.

aa(A,B) ->
    case ok of
        _ when A andalso B -> true;
        _ -> false
    end.
which generates
{function, a, 2, 2}.
{label,1}.
{func_info,{atom,andAndAndalso},{atom,a},2}.
{label,2}.
{test,is_eq_exact,{f,3},[{x,0},{atom,true}]}.
{test,is_eq_exact,{f,3},[{x,1},{atom,true}]}.
{move,{atom,true},{x,0}}.
return.
{label,3}.
{move,{atom,false},{x,0}}.
return.
{function, aa, 2, 5}.
{label,4}.
{func_info,{atom,andAndAndalso},{atom,aa},2}.
{label,5}.
{test,is_atom,{f,7},[{x,0}]}.
{select_val,{x,0},{f,7},{list,[{atom,true},{f,6},{atom,false},{f,9}]}}.
{label,6}.
{move,{x,1},{x,2}}.
{jump,{f,8}}.
{label,7}.
{move,{x,0},{x,2}}.
{label,8}.
{test,is_eq_exact,{f,9},[{x,2},{atom,true}]}.
{move,{atom,true},{x,0}}.
return.
{label,9}.
{move,{atom,false},{x,0}}.
return.
I had only looked into what is generated with the +to_core flag, but obviously there is an optimization step between to_core and to_asm.
It's an historical reason. and was implemented before andalso, which was introduced in Erlang 5.1 (the only reference I can find right now is EEP-17). Guards have not been changed because of backwards compatibility.
The boolean operators and and or always evaluate the arguments on both sides of the operator. If you want the behaviour of the C operators && and || (where the second argument is evaluated only if needed; for example, when evaluating true orelse false, as soon as true is found as the first argument, the second argument is not evaluated, which would not be the case with or), use andalso and orelse.
I have read the GOLD Homepage ( http://www.devincook.com/goldparser/ ) docs, FAQ and Wikipedia to find out what practical application there could possibly be for GOLD. I was thinking along the lines of having a programming language (easily) available to my systems such as ABAP on SAP or X++ on Axapta - but it doesn't look feasible to me, at least not easily - even if you use GOLD.
The final use of the parsed result produced by GOLD escapes me - what do you do with the result of the parse?
EDIT: A practical example (description) would be great.
Parsing really consists of two phases. The first is "lexing", which converts the raw string of characters into something that the program can more readily understand (commonly called tokens).
As a simple example, the lexer would convert:
if (a + b > 2) then
In to:
IF_TOKEN LEFT_PAREN IDENTIFIER(a) PLUS_SIGN IDENTIFIER(b) GREATER_THAN NUMBER(2) RIGHT_PAREN THEN_TOKEN
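To make the lexing phase concrete, here is a tiny hand-written lexer sketch in Java that produces exactly that token stream (illustrative only; TinyLexer is hypothetical and not part of GOLD, and a real lexer is usually generated from regular expressions):

import java.util.ArrayList;
import java.util.List;

public class TinyLexer {
    public static List<String> lex(String src) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isLetter(c)) {          // keyword or identifier
                int start = i;
                while (i < src.length() && Character.isLetter(src.charAt(i))) i++;
                String word = src.substring(start, i);
                if (word.equals("if"))        tokens.add("IF_TOKEN");
                else if (word.equals("then")) tokens.add("THEN_TOKEN");
                else                          tokens.add("IDENTIFIER(" + word + ")");
            } else if (Character.isDigit(c)) {    // number literal
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add("NUMBER(" + src.substring(start, i) + ")");
            } else {                              // one-character tokens
                switch (c) {
                    case '(': tokens.add("LEFT_PAREN");   break;
                    case ')': tokens.add("RIGHT_PAREN");  break;
                    case '+': tokens.add("PLUS_SIGN");    break;
                    case '>': tokens.add("GREATER_THAN"); break;
                    default: throw new RuntimeException("unexpected character: " + c);
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("if (a + b > 2) then"));
    }
}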
The parser takes that stream of tokens, and attempts to make yet more sense out of them. In this case, it would try to match those tokens up to an IF_STATEMENT. To the parser, the IF_STATEMENT may well look like this:
IF ( BOOLEAN_EXPRESSION ) THEN
Where the result of the lexing phase is a token stream, the result of the parsing phase is a Parse Tree.
So, a parser could convert the above into:
if_statement
└── boolean_expression.operator = GREATER_THAN
    ├── expression.operator = PLUS_SIGN
    │   ├── identifier.string = "a"
    │   └── identifier.string = "b"
    └── numeric_constant.string = "2"
Here you see we have an IF_STATEMENT. An IF_STATEMENT has a single argument, which is a BOOLEAN_EXPRESSION. This was explained in some manner to the parser. When the parser is converting the token stream, it "knows" what an IF looks like, and knows what a BOOLEAN_EXPRESSION looks like, so it can make the proper assignments when it sees the code.
For example, if you have just:
if (a + b) then
The parser could know that it's not a boolean expression (because the + is arithmetic, not a boolean operator) and could throw an error at this point.
Next, we see that a BOOLEAN_EXPRESSION has 3 components: the operator (GREATER_THAN) and two sides, the left side and the right side.
On the left side, it points to yet another expression, the "a + b", while on the right it points to a NUMERIC_CONSTANT, in this case the string "2". Again, the parser "knows" this is a NUMERIC constant because we told it about strings of numbers. If it weren't numbers, it would be an IDENTIFIER (like "a" and "b" are).
Note, that if we had something like:
if (a + b > "XYZ") then
That "parses" just fine (expression on the left, string constant on the right). We don't know from looking at this whether this is a valid expression or not. We don't know if "a" or "b" reference Strings or Numbers at this point. So, this is something the parser can't decided for us, can't flag as an error, as it simply doesn't know. That will happen when we evaluate (either execute or try to compile in to code) the IF statement.
If we did:
if [a > b ) then
The parser can readily see that syntax error as a problem, and will throw an error. That string of tokens doesn't look like anything it knows about.
So, the point being that when you get a complete parse tree, you have some assurance that at first cut the "code looks good". Now during execution, other errors may well come up.
To evaluate the parse tree, you just walk the tree. You'll have some code associated with the major nodes of the parse tree during the compile or evaluation part. Let's assume that we have an interpreter.
public void execute_if_statement(ParseTreeNode node) {
    // We already know we have an IF_STATEMENT node
    Value value = evaluate_expression(node.getBooleanExpression());
    if (value.getBooleanResult() == true) {
        // we do the "then" part of the code
    }
}

public Value evaluate_expression(ParseTreeNode node) {
    Value result = null;
    if (node.isConstant()) {
        result = evaluate_constant(node);
        return result;
    }
    if (node.isIdentifier()) {
        result = lookupIdentifier(node);
        return result;
    }
    Value leftSide = evaluate_expression(node.getLeftSide());
    Value rightSide = evaluate_expression(node.getRightSide());
    if (node.getOperator() == '+') {
        if (!leftSide.isNumber() || !rightSide.isNumber()) {
            throw new RuntimeError("Must have numbers for adding");
        }
        int l = leftSide.getIntValue();
        int r = rightSide.getIntValue();
        int sum = l + r;
        return new Value(sum);
    }
    if (node.getOperator() == '>') {
        if (leftSide.getType() != rightSide.getType()) {
            throw new RuntimeError("You can only compare values of the same type");
        }
        if (leftSide.isNumber()) {
            int l = leftSide.getIntValue();
            int r = rightSide.getIntValue();
            boolean greater = l > r;
            return new Value(greater);
        } else {
            // do string compare instead
        }
    }
    // Other operators (and the string comparison) are left as an exercise.
    throw new RuntimeError("Unhandled expression");
}
So, you can see that we have a recursive evaluator here. You see how we're checking the run time types, and performing the basic evaluations.
What will happen is that execute_if_statement will evaluate its main expression. Even though we wanted only a BOOLEAN_EXPRESSION in the parse, all expressions are mostly the same for our purposes. So, execute_if_statement calls evaluate_expression.
In our system, all expressions have an operator and a left and right side. Each side of an expression is ALSO an expression, so you can see how we immediately try to evaluate those as well to get their real value. The one note is that if the expression consists of a CONSTANT, then we simply return the constant's value; if it's an identifier, we look it up as a variable (and that would be a good place to throw an "I can't find the variable 'a'" message); otherwise we're back to the left side/right side thing.
I hope you can see how a simple evaluator can work once you have a parse tree from the parser. Note how during evaluation, the major elements of the language are already in place, otherwise we'd have got a syntax error and never got to this phase. We can simply expect to "know" that when we have, for example, a PLUS operator, we're going to have 2 expressions, the left and right side. Or when we execute an IF statement, that we already have a boolean expression to evaluate. The parser is what does that heavy lifting for us.
Getting started with a new language can be a challenge, but you'll find once you get rolling, the rest become pretty straightforward and it's almost "magic" that it all works in the end.
I would recommend antlr.org for information; ANTLR is the 'free' tool I would use for any parsing work.
GOLD can be used for any kind of application where you have to apply context-free grammars to input.
Elaboration:
Essentially, CFGs apply to all programming languages. So if you wanted to develop a scripting language for your company, you'd need to write a parser -- or get a parsing program. Alternatively, if you wanted to have a semi-natural language for input from non-programmers in the company, you could use a parser to read that input and spit out more "machine-readable" data. Essentially, a context-free grammar allows you to describe far more inputs than a regular expression can. The GOLD system apparently makes the parsing problem somewhat easier than lex/yacc (the standard UNIX parsing tools).