I am looking at the following F# line
for i = 0 to 10 do
    Console.WriteLine("Hello")
And I am wondering: isn't the above line a statement as opposed to an expression?
Shouldn't everything be an expression in F#?
As already said, every syntactic construct in F# is an expression. F# does not distinguish between statements and expressions (and so I'd say that the Wikipedia quote posted by Robert is a bit misleading: F# does not have statements).
Actually, the above is not fully true, because some constructs in F# computation expressions such as let! are not expressions, but we can ignore that.
What does that mean? In C#, the syntax of for and method calls is defined something like this:
statement  := foreach(var v in <expression>) <statement>
            | { <statement> ... <statement> }
            | <expression>;
            | (...)
expression := <expression>.<ident>(<expression>, ..., <expression>)
            | <literal>
            | <expression> + <expression>
            | (...)
This is very simplified, but it should give you the idea: a statement is something that does not evaluate to a value. It can be a foreach loop (or another kind of loop), a statement block (with multiple statements) or an expression followed by a semicolon (where the result of the expression is void or is ignored). An expression is, for example, a method call, a primitive literal (string, int) or a binary operator.
This means that you cannot write certain things in C# - for example, the argument of method call cannot be a statement (because statements do not evaluate to a value!)
On the other hand, in F#, everything is an expression. This means there is just a single syntactic category:
expression := for v in <expression> do <expression>
            | <expression>; <expression>
            | <expression>.<ident>(<expression>, ..., <expression>)
            | <literal>
            | <expression> + <expression>
            | (...)
This means that in F# all syntactic constructs are expressions, including for and other loops. The body of for is also an expression, but it would not make sense if the expression evaluated to some value (e.g. 42), so the types require that the result of the body is unit (which does not carry any information). Similarly, the first expression in sequencing (<expr>; <expr>) should return unit, and the result of sequencing is the result of the second expression.
This makes the language simpler and more uniform, but you can write some odd things:
let x = (for i in 0 .. 10 do printfn "%d" i); 42
This will print numbers from 0 to 10 and then define a value x to be 42. The right-hand side is a sequencing of expressions (<expr>; <expr>) where the first one is a for loop (which has type unit, because it does not evaluate to anything) and the second one is 42, which evaluates to 42.
Every statement in F#, including if statements and loops, is a composable expression with a definite return type. Functions and expressions that do not return any value have a return type of unit.
http://en.wikipedia.org/wiki/F_Sharp_(programming_language)
In languages like F# statements are just expressions that return the value () of type unit. As the unit type has only one value it conveys no information so returning the value of type unit is saying "if I'm doing anything then it is by way of a side effect" like printing to the console or writing to disk.
Note that not everything is an expression in F#. Type definitions are not expressions. Patterns are not expressions. And so on...
So, the cppreference claims:
The expression in the middle of the conditional operator (between ? and :) is parsed as if parenthesized: its precedence relative to ?: is ignored.
However, it appears to me that the part of the expression after the ':' operator is also parsed as if it were between parentheses. I've tried to implement the ternary operator in my programming language (and you can see the results of parsing expressions here), and my parser pretends that the part of the expression after ':' is also parenthesized. For example, for the expression (1?1:0?2:0)-1, the interpreter for my programming language outputs 0, and this appears to be compatible with C. For instance, the C program:
#include <stdio.h>

int main(void) {
    /* Nested conditional without explicit parentheses after ':' */
    printf("%d\n",(1?1:0?2:0)-1);
    return 0;
}
Outputs 0.
Had I programmed the parser of my language so that, when parsing the ternary operator, it simply took the first already-parsed node after ':' as the third operand to '?:', it would output the same as ((1?1:0)?2:0)-1, that is, 1.
My question is whether this approach (pretending that the expression after the ':' is parenthesized) would always be compatible with C.
"Pretends that it is parenthesised" is some kind of description of operator precedence. But of course that has to be interpreted relative to precedence relations (including associativity). So in a-b*c and a*b-c, the subtraction effectively acts as though its arguments are parenthesised; only the left-hand argument is treated that way in a-b-c; and it is the comparison operator which causes grouping in a<b-c and a-b<c.
I'm sure you know all that since your parser seems to work for all these cases, but I say that because the ternary operator is right-associative and of lower precedence than any other operator [Note 1]. That means that the pseudo-parentheses imposed by operator precedence surround the right-hand argument (regardless of its dominating operator, since all operators have higher precedence), and also the left-hand argument unless its dominating operator is another conditional operator. But that wouldn't be the case in C, where the comma operator has lower precedence and would not be enclosed by the imaginary parentheses following the :.
It's important to understand what is meant by the precedence of a complex operator. In effect, to compute the precedence relations we first collapse the operator to a simple ?: which includes the enclosed (second) argument. This is not "as if the expression were parenthesized", because it is parenthesized. It is parenthesized between ? and :, which in this context are syntactically parenthetic.
In this sense, it is very similar to the usual analysis of the subscript operator as a postfix operator, although the brackets of the subscript operator enclose a second argument. The precedence of the subscript operator is logically what would result from considering it to be a single [], abstracting away the expression contained inside. This is also the same as the function call operator. That happens to be written with parentheses, but the precise symbols are not important: it is possible to imagine an alternative language in which function calls are written with different symbols, perhaps { and }. That wouldn't affect the grammar at all.
It might seem odd to think of ? and : to be "parenthetic", since they don't look parenthetic. But a parser doesn't see the shapes of the symbols. It is satisfied by being told that a ( is closed by a ) and, in this case, that a ? is closed by a :. [Note 2]
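To make the "? is closed by :" idea concrete, here is a rough sketch in Python (not your language, and not a full C grammar). Everything in it is my own assumption for illustration: the names tokenize, parse and show, the tiny operator table, and the tuple-shaped tree. It has no comma or assignment operators, so the C caveats about those do not come up.

```python
import re

# Binary operators and their precedences (higher binds tighter).
# The conditional operator is handled separately, below all of these.
BINARY_PREC = {"<": 1, "+": 2, "-": 2, "*": 3, "/": 3}

def tokenize(text):
    """Split the input into integers, identifiers and single-character symbols."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/<?:()]", text)

def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def primary():
        if peek() == "(":
            take("(")
            node = conditional()
            take(")")
            return node
        tok = take()
        if not re.fullmatch(r"\d+|[A-Za-z_]\w*", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def binary(min_prec):
        left = primary()
        while peek() in BINARY_PREC and BINARY_PREC[peek()] >= min_prec:
            op = take()
            right = binary(BINARY_PREC[op] + 1)  # left-associative
            left = (op, left, right)
        return left

    def conditional():
        cond = binary(1)
        if peek() != "?":
            return cond
        take("?")
        then = conditional()   # enclosed by '?' and ':', so any expression fits
        take(":")
        other = conditional()  # right-associative: may itself be a conditional
        return ("?:", cond, then, other)

    tree = conditional()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def show(node):
    """Render the tree with explicit parentheses around every operator."""
    if isinstance(node, str):
        return node
    if node[0] == "?:":
        return f"({show(node[1])} ? {show(node[2])} : {show(node[3])})"
    return f"({show(node[1])} {node[0]} {show(node[2])})"
```

With this sketch, show(parse("(1?1:0?2:0)-1")) groups the second conditional under the first one's ':' branch, which matches the grouping that makes the C program print 0.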
Having said all that, I tried your compiler on the conditional expression
d = 0 ? 0 : n / d
It parses this expression correctly, but the compiled code computes n / d before verifying whether d = 0 is true. That's not the way the conditional operator should work; in this case, it will lead to an unexpected divide by 0 exception. The conditional operator must first evaluate its left-hand argument, and then evaluate exactly one of the other two expressions.
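As a sketch of the evaluation order I mean (in Python, with a made-up tuple representation for the tree, integer division, and a hypothetical evaluate function, none of which come from your compiler), the condition is evaluated first and then only the selected branch:

```python
def evaluate(node, env):
    """Evaluate a tuple-based expression tree against a dict of variable values."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op = node[0]
    if op == "?:":
        _, cond, then, other = node
        # Evaluate the condition first, then exactly one of the two branches.
        return evaluate(then, env) if evaluate(cond, env) else evaluate(other, env)
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    if op == "=":
        return int(left == right)
    if op == "/":
        return left // right  # integer division is an assumption of this sketch
    raise ValueError(f"unknown operator {op!r}")

# d = 0 ? 0 : n / d
guarded = ("?:", ("=", "d", 0), 0, ("/", "n", "d"))
```

Because the division node is only visited when the condition is false, evaluate(guarded, {"n": 10, "d": 0}) never attempts to divide by zero.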
Notes:
1. In C, this is not quite correct. The comma operator has lower precedence, and there is a more complex interaction with assignment operators, which logically have the same precedence and are also right-associative.
2. In C-like languages those symbols are not used for any other purpose, so it's OK to just regard them as strange-looking parentheses and leave it at that. But as the case of the function-call operator shows (or, for that matter, the unary - operator), it is sometimes possible to reuse operator symbols for more than one purpose.
As a curiosity, it is not strictly necessary that open and close parentheses be different symbols, as long as they are not used for any other purpose. So, for example, if | is not used as an operator symbol (as it is in C), then you could use | a | to mean the absolute value of a without creating any ambiguities.
A precise analysis of the circumstances in which symbol reuse leads to actual ambiguities is beyond the scope of this answer.
I have written a lexer and parser to analyze linear algebra statements. Each statement consists of one or more expressions followed by one or more declarations. I am using menhir and OCaml to write the lexer and parser.
For example:
Ax = b, where A is invertible.
This should be read as A * x = b, (A, invertible)
In an expression all ids must be either an uppercase or lowercase symbol. I would like to overload the multiplication operator so that the user does not have to type in the '*' symbol.
However, since the lexer also needs to be able to read strings (such as "invertible" in this case), the "Ax" portion of the expression is sent over to the parser as a string. This causes a parser error since no strings should be encountered in the expression portion of the statement.
Here is the basic idea of the grammar
stmt :=
  | expr "."
  | decl "."
  | expr "," decl "."
expr :=
  | term
  | unop expr
  | expr binop expr
term :=
  | <int> num
  | <char> id
  | "(" expr ")"
decl :=
  | id "is" kinds
kinds :=
  | <string> kind
  | kind "and" kinds
Is there some way to separate the individual characters and tell the parser that they should be treated as multiplication? Is there a way to change the lexer so that it is smart enough to know that all character clusters before a comma are ids and all clusters after should be treated as strings?
It seems to me you have two problems:
1. You want your lexer to treat sequences of characters differently in different places.
2. You want multiplication to be indicated by adjacent expressions (no operator in between).
The first problem I would tackle in the lexer.
One question is why you say you need to use strings. This implies that there is a completely open-ended set of things you can say. It might be true, but if you can limit yourself to a smallish number, you can use keywords rather than strings. E.g., invertible would be a keyword.
If you really want to allow any string at all in such places, it's definitely still possible to hack a lexer so that it maintains a state describing what it has seen, and looks ahead to see what's coming. If you're not required to adhere to a pre-defined grammar, you could adjust your grammar to make this easier. (E.g., you could use commas for only one purpose.)
For the second problem, I'd say you need to add adjacency to your grammar. I.e., your grammar needs a rule that says something like term := term term. I suspect it's tricky to get this to work correctly, but it does work in OCaml (where adjacent expressions represent function application) and in awk (where adjacent expressions represent string concatenation).
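Here is a rough sketch of both ideas in Python rather than ocamllex/menhir, just to show the shape. The token kinds (ID, WORD, NUM, SYM, COMMA) and the function names are invented for this example, and it assumes the statement has the form expr "," decl "." so that the first comma is what switches the lexer from expression mode to word mode. Instead of a term := term term rule, it takes the simpler route of inserting an explicit * token between adjacent terms before parsing.

```python
import re

def lex(statement):
    """Tokenize 'expr, decl.' in two modes: expression before the first comma, words after."""
    tokens, in_expr = [], True
    for lexeme in re.findall(r"[A-Za-z]+|\d+|\S", statement):
        if lexeme == "," and in_expr:
            in_expr = False
            tokens.append(("COMMA", lexeme))
        elif lexeme.isalpha():
            if in_expr:
                # In expression mode every letter is its own identifier.
                tokens.extend(("ID", ch) for ch in lexeme)
            else:
                tokens.append(("WORD", lexeme))
        elif lexeme.isdigit():
            tokens.append(("NUM", lexeme))
        else:
            tokens.append(("SYM", lexeme))
    return tokens

def add_implicit_multiplication(tokens):
    """Insert an explicit '*' between two adjacent terms, e.g. 'A x' -> 'A * x'."""
    out = []
    for tok in tokens:
        if out:
            prev = out[-1]
            ends_term = prev[0] in ("ID", "NUM") or prev == ("SYM", ")")
            starts_term = tok[0] in ("ID", "NUM") or tok == ("SYM", "(")
            if ends_term and starts_term:
                out.append(("SYM", "*"))
        out.append(tok)
    return out
```

For "Ax = b, where A is invertible." the expression part comes out as A * x = b, and "invertible" arrives as a single WORD token. A statement that is only a declaration (no comma) would need a different way to pick the mode, which this sketch does not attempt.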
I am currently learning how to create a simple expression language using Irony. I'm having a little bit of trouble figuring out the best way to define function signatures, and determining whose responsibility it is to validate the input to those functions.
So far, I have a simple grammar that defines the basic elements of my language. This includes a handful of binary operators, parentheses, numbers, identifiers, and function calls. The BNF for my grammar looks something like this:
<expression> ::= <number> | <parenexp> | <binexp> | <fncall> | <identifier>
<parenexp> ::= ( <expression> )
<fncall> ::= <identifier> ( <argumentlist> )
<binexp> ::= <expression> <binop> <expression>
<binop> ::= + | - | * | / | %
... the rest of the grammar definition
Using the Irony parser, I am able to validate the syntax of various input strings to make sure they conform to this grammar:
x + y / z * AVG(a + b, p) -> Valid Syntax
x +/ AVG(x -> Invalid Syntax
All that is well and good, but now I want to go a step further and define the available functions, along with the number of parameters that each function requires. So for example, I want to have a function FOO that accepts one parameter and BAR that accepts two parameters:
FOO(a + b) * BAR(x + y, p + q) -> Valid
FOO(a + b, 13) -> Invalid
When the second statement is parsed, I'd like to be able to output an error message that is aware of the expected input for this function:
Too many arguments specified for function 'FOO'
I don't actually need to evaluate any of these statements, only validate the syntax of the statements and determine if they are valid expressions or not.
How exactly should I be doing this? I know that technically I could simply add the functions to the grammar like so:
<foofncall> ::= FOO( <expression> )
<barfncall> ::= BAR( <expression>, <expression> )
But something about this doesn't feel quite right. To me it seems like the grammar should only define a generic function call, and not every function available to the language.
How is this typically accomplished in other languages?
What are the components called that should handle the responsibilities of analyzing the basic syntax of the language grammar versus the more specific elements like function definitions? Should both responsibilities be handled by the same component?
While you can do typechecking directly in the grammar so that it's enforced by the parser, it's generally a bad idea to do so. Instead, the parser should just parse the basic syntax, and separate typechecking code should be used for typechecking.
In the normal case of a compiler, the parser just produces an abstract syntax tree or some equivalent representation of the program. Then, a typechecking pass is run over the AST that ensures all types match appropriately -- ensures that functions have the right number of arguments and those arguments have the right type, as well as ensuring that variables have the right type for what is assigned to them and how they are used.
Besides being generally simpler, this usually allows you to give better error messages -- instead of just 'Invalid', you can say 'too many arguments to FOO' or what have you.
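A minimal sketch of such a separate pass, written in Python rather than against Irony's API (which I am not showing here). The tuple-shaped AST, the ARITY table and the check_calls name are all assumptions made up for this illustration; in practice you would walk whatever tree your parser produces.

```python
# Hypothetical function table: name -> required number of arguments.
ARITY = {"FOO": 1, "BAR": 2}

def check_calls(node, errors=None):
    """Walk a tuple-based AST and collect arity errors for function calls.

    Nodes are assumed to look like ("call", name, [args]) or
    ("binop", op, left, right); anything else is a leaf.
    """
    if errors is None:
        errors = []
    if isinstance(node, tuple):
        if node[0] == "call":
            _, name, args = node
            if name not in ARITY:
                errors.append(f"Unknown function '{name}'")
            elif len(args) > ARITY[name]:
                errors.append(f"Too many arguments specified for function '{name}'")
            elif len(args) < ARITY[name]:
                errors.append(f"Too few arguments specified for function '{name}'")
            children = args
        else:
            children = node[2:]
        # Arguments and operands may contain further calls, so recurse.
        for child in children:
            check_calls(child, errors)
    return errors
```

The grammar stays generic (one fncall rule), and adding a new function only means adding an entry to the table.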
I find it very hard to find a simple indentation guide for F#.
Basically, I am wondering what's the rule for multiple-line statement indentation.
In C#, there is no problem because whitespace doesn't count.
Although I can write F# code according to my intuition and it works, I really want to know what's the rule for breaking one statement into multiple lines.
If I write
printfn "%d"
    1
it works as expected.
But if I write them in the same column, something goes wrong:
>
printfn "%A%A"
1
[];;
> //nothing is returned... and no error in this case
I want to confirm the basic rule for doing this. It's a little annoying when you can't be sure what you are doing.
Thanks in advance
I just tried another case:
List.iter
    (printfn "%d")
    [1..10];;
And it prints out 1 to 10.
Why is it not parsed as
List.iter
((printfn "%d")
[1..10]);;
As Yin points out, the rule is that arguments of a function should be indented further than the call to the function. To add more details, your first snippet is interpreted like this:
printfn "%A%A";
1;
[];
Each of these is a valid expression that returns something (function, number, empty list) and then ignores the result and continues. Because they are written in the top-level scope, F# Interactive doesn't emit a warning that you're ignoring some values. If they were in a do block or let declaration:
do
  printfn "%A%A"
  1
  []
The F# compiler would emit a warning when sequencing expressions (using ;) that do not return unit:
stdin(5,3): warning FS0193: This expression is a function value, i.e. is missing arguments. Its type is 'a -> 'b -> unit.
stdin(6,3): warning FS0020: This expression should have type 'unit', but has type 'int'. Use 'ignore' to discard the result of the expression, or 'let' to bind the result to a name.
stdin(5,3): warning FS0020: This expression should have type 'unit', but has type ''a list'. Use 'ignore' to discard the result of the expression, or 'let' to bind the result to a name.
In your second example, you should indent:
>
printfn "%A%A"
    1
    [];;
Otherwise the three expressions are three sequential expressions, not a single expression.
You can refer to the F# Language Specification for the firm rules, e.g. Chapter 15 in the specification.
How can I write a no-op statement in F#?
Specifically, how can I improve the second clause of the following match statement:
match list with
| [] -> printfn "Empty!"
| _ -> ignore 0
Use unit for empty side effect:
match list with
| [] -> printfn "Empty!"
| _ -> ()
The answer from Stringer is, of course, correct. I thought it may be useful to clarify how this works, because "()" isn't really an empty statement or empty side effect...
In F#, every valid piece of code is an expression. Constructs like let and match consist of some keywords, patterns and several sub-expressions. The F# grammar for let and match looks like this:
<expr> ::= let <pattern> = <expr>
           <expr>
       ::= match <expr> with
           | <pat> -> <expr>
This means that the body of let or the body of a clause of match must be some expression. It can be some function call such as ignore 0 or it can be some value. In your case it must be some expression of type unit, because printfn ".." is also of type unit.
The unit type is a type that has only one value, which is written as () (and it also means empty tuple with no elements). This is, indeed, somewhat similar to void in C# with the exception that void doesn't have any values.
BTW: The following code may look like a sequence of statements, but it is also an expression:
printf "Hello "
printf "world"
The F# compiler implicitly adds ; between the two lines and ; is a sequencing operator, which has the following structure: <expr>; <expr>. It requires that the first expression returns unit and returns the result of the second expression.
This is a bit surprising when you're coming from a C# background, but it makes the language surprisingly elegant and concise. It doesn't limit you in any way. For example, you can write:
if (a < 10 && (printfn "demo"; true)) then // ...
(This example isn't really useful - just a demonstration of the flexibility)