Why does the F# dot operator have such a low precedence?

F#'s member selection dot (.) operator, as used in
System.Console.WriteLine("test")
has a lower precedence than [space] such that the following
ignore System.Console.WriteLine("test")
must be written explicitly as
ignore (System.Console.WriteLine("test"))
even though the first form is what the notion of juxtaposed symbols would intuitively suggest. Having used CoffeeScript, I can appreciate how intuitive precedence can serve to de-clutter code.
Are there any efforts being made to rationalize this kerfuffle, perhaps something along the lines that incorporated the "lightweight" syntax of the early years?
==============
Upon review, the culprit is not the "." operator but the invocation operator "()", as in "f()". So, given:
type C() = class end
then the following intuitive syntax fails:
printfn "%A" C()    // syntax error FS0597
and must be written thus (as prescribed by the documentation):
printfn "%A" (C())  // OK
It seems intuitive that a string of symbols unbroken by white space should implicitly represent a block. In fact, the utility of juxtaposing is to create such a block.

a b.c is parsed as a (b.c), not (a b).c. So there are no efforts to rationalize this - it simply is not true.

Thanks to all those who responded.
My particular perplexity stemmed from treating () as an invocation operator. As an eager evaluation language, F# does not have or need such a thing. Instead, this is an expression boundary, as in (expression). In particular, () bounds the nothing expression, which is the only value of the type unit. Consequently, () is the stipulation of a value and not a direction to resolve the associated function (though that is the practical consequence when parameters are provided to functions, due to F#'s eager evaluation).
As a result, the following expression
ignore System.Console.WriteLine("test")
actually surfaces three distinct values,
ignore System.Console.WriteLine ("test")
which are interpreted according to the left-to-right precedence evaluation order of F# (which then permits partial function application and perhaps other things)
( ignore System.Console.WriteLine ) ("test")
...but the result of (ignore expr) will be unit, which does not expect a parameter. Hence, syntax error (strong typing, yea!). So, an expression boundary is required. In particular,
ignore ( System.Console.WriteLine ("test") )
or
ignore (System.Console.WriteLine "test")
or
ignore <| System.Console.WriteLine "test"
or
System.Console.WriteLine "test" |> ignore

Related

How is the conditional operator parsed?

So, the cppreference claims:
The expression in the middle of the conditional operator (between ? and :) is parsed as if parenthesized: its precedence relative to ?: is ignored.
However, it appears to me that the part of the expression after the ':' operator is also parsed as if it were between parentheses. I've tried to implement the ternary operator in my programming language (and you can see the results of parsing expressions here), and my parser pretends that the part of the expression after ':' is also parenthesized. For example, for the expression (1?1:0?2:0)-1, the interpreter for my programming language outputs 0, and this appears to be compatible with C. For instance, the C program:
#include <stdio.h>
int main() {
    printf("%d\n", (1?1:0?2:0)-1);
}
Outputs 0.
Had I programmed my parser so that, when parsing the ternary operator, it simply took the first already-parsed node after ':' as the third operand of '?:', it would output the same as ((1?1:0)?2:0)-1, that is, 1.
My question is whether this approach (pretending that the expression after the ':' is parenthesized) would always be compatible with C.
"Pretends that it is parenthesised" is some kind of description of operator precedence. But of course that has to be interpreted relative to precedence relations (including associativity). So in a-b*c and a*b-c, the subtraction effectively acts as though its arguments are parenthesised; only the left-hand argument is treated that way in a-b-c; and it is the comparison operator which causes grouping in a<b-c and a-b<c.
I'm sure you know all that since your parser seems to work for all these cases, but I say that because the ternary operator is right-associative and of lower precedence than any other operator [Note 1]. That means that the pseudo-parentheses imposed by operator precedence surround the right-hand argument (regardless of its dominating operator, since all operators have higher precedence), and also the left-hand argument unless its dominating operator is another conditional operator. But that wouldn't be the case in C, where the comma operator has lower precedence and would not be enclosed by the imaginary parentheses following the :.
It's important to understand what is meant by the precedence of a complex operator. In effect, to compute the precedence relations we first collapse the operator to a simple ?: which includes the enclosed (second) argument. This is not "as if the expression were parenthesized", because it is parenthesized. It is parenthesized between ? and :, which in this context are syntactically parenthetic.
In this sense, it is very similar to the usual analysis of the subscript operator as a postfix operator, although the brackets of the subscript operator enclose a second argument. The precedence of the subscript operator is logically what would result from considering it to be a single [], abstracting away the expression contained inside. This is also the same as the function call operator. That happens to be written with parentheses, but the precise symbols are not important: it is possible to imagine an alternative language in which function calls are written with different symbols, perhaps { and }. That wouldn't affect the grammar at all.
It might seem odd to think of ? and : to be "parenthetic", since they don't look parenthetic. But a parser doesn't see the shapes of the symbols. It is satisfied by being told that a ( is closed by a ) and, in this case, that a ? is closed by a :. [Note 2]
Having said all that, I tried your compiler on the conditional expression
d = 0 ? 0 : n / d
It parses this expression correctly, but the compiled code computes n / d before verifying whether d = 0 is true. That's not the way the conditional operator should work; in this case, it will lead to an unexpected divide by 0 exception. The conditional operator must first evaluate its left-hand argument, and then evaluate exactly one of the other two expressions.
Notes:
1. In C, this is not quite correct. The comma operator has lower precedence, and there is a more complex interaction with assignment operators, which logically have the same precedence and are also right-associative.
2. In C-like languages those symbols are not used for any other purpose, so it's OK to just regard them as strange-looking parentheses and leave it at that. But as the case of the function-call operator shows (or, for that matter, the unary - operator), it is sometimes possible to reuse operator symbols for more than one purpose.
As a curiosity, it is not strictly necessary that open and close parentheses be different symbols, as long as they are not used for any other purpose. So, for example, if | is not used as an operator symbol (as it is in C), then you could use | a | to mean the absolute value of a without creating any ambiguities.
A precise analysis of the circumstances in which symbol reuse leads to actual ambiguities is beyond the scope of this answer.
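To make the associativity argument above concrete, here is a minimal Python sketch of a precedence-climbing parser. It is not the asker's parser and not a C grammar: the names tokenize, Parser, evaluate and run are made up for illustration, and only integers, + - * / <, parentheses and ?: are supported, so the comma and assignment interactions from Note 1 are deliberately out of scope. The conditional is treated as the lowest-precedence, right-associative operator, with ? and : acting as a bracket pair around the middle operand, and evaluation visits only the selected branch.

```python
import re

# Binary operators and their precedence; the conditional sits below all of them.
BINARY = {"<": 1, "+": 2, "-": 2, "*": 3, "/": 3}


def tokenize(text):
    # Integers and single-character operators; anything else is silently skipped.
    return re.findall(r"\d+|[-+*/<()?:]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def conditional(self):
        cond = self.binary(1)
        if self.peek() != "?":
            return cond
        self.take("?")
        then = self.conditional()   # bracketed by ? and :, so a full expression
        self.take(":")
        other = self.conditional()  # right-associative: recurse at the same level
        return ("?:", cond, then, other)

    def binary(self, min_prec):
        left = self.primary()
        while BINARY.get(self.peek(), 0) >= min_prec:
            op = self.take()
            right = self.binary(BINARY[op] + 1)  # left-associative binary operators
            left = (op, left, right)
        return left

    def primary(self):
        tok = self.take()
        if tok == "(":
            inner = self.conditional()
            self.take(")")
            return inner
        if tok.isdigit():
            return ("num", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(node):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "?:":
        _, cond, then, other = node
        # Evaluate the condition first, then exactly one of the two branches.
        return evaluate(then) if evaluate(cond) else evaluate(other)
    _, left, right = node
    a, b = evaluate(left), evaluate(right)
    if kind == "+":
        return a + b
    if kind == "-":
        return a - b
    if kind == "*":
        return a * b
    if kind == "/":
        return a // b
    return int(a < b)  # the only remaining operator is "<"


def run(text):
    parser = Parser(tokenize(text))
    tree = parser.conditional()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return evaluate(tree)
```

With this sketch, run("(1?1:0?2:0)-1") and run("((1?1:0)?2:0)-1") give different results, matching the two groupings discussed in the question, and run("1 ? 5 : 1 / 0") never touches the division because the untaken branch is not evaluated.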

Error: Unexpected infix operator in expression, about a successfully compiled prefix operator

Playing around a little bit with infix operators, I was surprised about the following:
let (>~~~) = function null -> String.Empty | s -> s // compiles fine
match >~~~ input with .... // error: Unexpected infix operator in expression
Changing the first characters of the prefix operator (to !~~~ for instance) fixes it. That I get an error that the infix operator is unexpected is rather weird. Hovering shows the definition to be string -> string.
I'm not too surprised about the error, F# requires (iirc) that the first character of a prefix operator must itself be one of the predefined prefix operators. But why does it compile just fine, and when I use it, the compiler complains?
Update: the F# compiler seems to know in other cases just fine when I use an invalid character in my operator definition, it says "Invalid operator definition. Prefix operator definitions must use a valid prefix operator name."
The rules for custom operators in F# are quite tight, so even though you can define custom operators, there are a lot of rules about how they will behave and you cannot change those. In particular:
Only some operators (mainly those with ! and ~) can be used as prefix operators. With ~ you can also overload unary operators +, -, ~ and ~~, so if you define an operator named ~+., you can then use it as e.g. +. 42.
Other operators (including those starting with >) can only be used as infix. You can turn any operator into an ordinary function using parentheses, which is why e.g. (+) 1 2 is valid.
The ? symbol is special (it is used for dynamic invocation) and cannot appear as the first symbol of a custom operator.
I think the most intuitive way of thinking about this is that custom operators will behave like standard F# operators, but you can add additional symbols after the standard operator name.

How to resolve ambiguity in the definition of an LR(1) grammar?

I am writing a Golang compiler in OCaml, and argument lists are causing me a bit of a headache. In Go, you can group consecutive parameter names of the same type in the following way:
func f(a, b, c int) === func f(a int, b int, c int)
You can also have a list of types, without parameter names:
func g(int, string, int)
The two styles cannot be mix-and-matched; either all parameters are named or none are.
My issue is that when the parser sees a comma, it doesn't know what to do. In the first example, is a the name of a type or the name of a variable with more variables coming up? The comma has a dual role and I am not sure how to fix this.
I am using the Menhir parser generator tool for OCaml.
Edit: at the moment, my Menhir grammar follows exactly the rules as specified at http://golang.org/ref/spec#Function_types
As written, the Go grammar is not LALR(1). In fact, it is not LR(k) for any k. It is, however, unambiguous, so you could successfully parse it with a GLR parser, if you can find one (I'm pretty sure that there are several GLR parser generators for OCaml, but I don't know enough about any of them to recommend one).
If you don't want to (or can't) use a GLR parser, you can do it the same way Russ Cox did in the gccgo compiler, which uses bison. (bison can generate GLR parsers, but Cox doesn't use that feature.) His technique does not rely on the scanner distinguishing between type-names and non-type-names.
Rather, it just accepts parameter lists whose elements are either name_or_type or name name_or_type (actually, there are more possibilities than that, because of the ... syntax, but it doesn't change the general principle.) That's simple, unambiguous and LALR(1), but it is overly-accepting -- it will accept func foo(a, b int, c), for example -- and it does not produce the correct abstract syntax tree because it doesn't attach the type to the list of parameters being declared.
What that means is that once the argument list is fully parsed and is about to be inserted into the AST attached to some function declaration (for example), a semantic scan is performed to fix it up and, if necessary, produce an error message. That scan is done right-to-left over the list of declaration elements, so that the specified type can be propagated to the left.
It's worth noting that the grammar in the reference manual is also overly-accepting, because it does not express the constraint that "either all parameters are named or none are". That constraint could be expressed in an LR(1) grammar -- I'll leave that as an exercise for readers -- but the resulting grammar would be a lot more difficult to understand.
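The right-to-left fix-up described above can be sketched roughly as follows. This is an illustration in Python, not gccgo's actual code: the function name fix_up_parameters and the tuple representation are assumptions. Each parsed element is either a 1-tuple (a lone name_or_type) or a 2-tuple (name, type), and the ... syntax is ignored.

```python
def fix_up_parameters(elements):
    """Turn over-accepted parameter elements into (name, type) pairs.

    Each element is a 1-tuple ("x",) for a lone name_or_type, or a
    2-tuple ("x", "int") for a name followed by a type.
    """
    # No element carries a separate type: every element is itself a type.
    if all(len(element) == 1 for element in elements):
        return [(None, element[0]) for element in elements]

    fixed = []
    current_type = None
    # Scan right to left so a type can be propagated to the names before it.
    for element in reversed(elements):
        if len(element) == 2:
            name, current_type = element
        elif current_type is None:
            # A lone identifier with no type to its right, e.g. the final c
            # in func foo(a, b int, c).
            raise ValueError(f"parameter {element[0]!r} has no type")
        else:
            name = element[0]
        fixed.append((name, current_type))
    fixed.reverse()
    return fixed
```

For func f(a, b, c int) the parser would hand over [("a",), ("b",), ("c", "int")] and the scan attaches int to all three names; for func g(int, string, int) every element is taken as a type; for func foo(a, b int, c) the scan raises an error at c.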
You don't have ambiguity. The fact that the standard Go parser is LALR(1) proves that.
"is a the name of a type or the name of a variable with more variables coming up?"
So basically your grammar and the parser as a whole should be completely disconnected from the symbol table; don't be C – your grammar is not ambiguous therefore you can check the type name later in the AST.
These are the relevant rules (from http://golang.org/ref/spec); they are already correct.
Parameters = "(" [ ParameterList [ "," ] ] ")" .
ParameterList = ParameterDecl { "," ParameterDecl } .
ParameterDecl = [ IdentifierList ] [ "..." ] Type .
IdentifierList = identifier { "," identifier } .
I'll explain them to you:
IdentifierList = identifier { "," identifier } .
The curly braces represent the Kleene closure (in POSIX regular expression notation it's the asterisk). This rule says "an identifier name, optionally followed by a literal comma and an identifier, optionally followed by a literal comma and an identifier, etc… ad infinitum".
ParameterDecl = [ IdentifierList ] [ "..." ] Type .
The square brackets are nullability; this means that the enclosed part may or may not be present (in POSIX regular expression notation it's the question mark). So you have "maybe an IdentifierList, followed by maybe an ellipsis, followed by a type".
ParameterList = ParameterDecl { "," ParameterDecl } .
You can have several ParameterDecl in a list like e.g. func x(a, b int, c, d string).
Parameters = "(" [ ParameterList [ "," ] ] ")" .
This rule defines that a ParameterList is optional, is to be surrounded by parentheses, and may include an optional final comma literal, useful when you write something like:
func x(
    a, b int,
    c, d string, // <- note the final comma
)
The Go grammar is portable and can be parsed by any bottom-up parser with one token of lookahead.
Edit regarding "don't be C": I said this because C is context-sensitive and the way they solve this problem in many (all?) compilers is by wiring the symbol table to the lexer and lexing tokens differently depending on if they are defined as type names or variables. This is a hack and should not be done for unambiguous grammars!

Break a statement (expression) into multiple lines: how to indent

I found it's very hard to search for the simple indentation guide in F#.
Basically, I am wondering what's the rule for multiple-line statement indentation.
In C#, there is no problem because whitespace doesn't count.
Although I can write F# code according to my intuition and it works, I really want to know what's the rule for breaking one statement into multiple lines.
I write as
printfn "%d"
    1
It works as expected
And if I write them in the same column, something goes wrong.
>
printfn "%A%A"
1
[];;
> //nothing is returned... and no error in this case
I want to confirm the basic rule for doing this. It's a little annoying when you can't be sure what you are doing.
Thanks in advance
I just tried another case
List.iter
    (printfn "%d")
    [1..10];;
And it prints out 1 to 10.
Why isn't it interpreted as
List.iter
    ((printfn "%d")
    [1..10]);;
As Yin points out, the rule is that arguments of a function should be indented further than the call to the function. To add more details, your first snippet is interpreted like this:
printfn "%A%A";
1;
[];
Each of these is a valid expression that returns something (function, number, empty list) and then ignores the result and continues. Because they are written in the top-level scope, F# Interactive doesn't emit a warning that you're ignoring some values. If they were in a do block or let declaration:
do
    printfn "%A%A"
    1
    []
The F# compiler would emit a warning when sequencing expressions (using ;) that do not return unit:
stdin(5,3): warning FS0193: This expression is a function value, i.e. is missing arguments. Its type is 'a -> 'b -> unit.
stdin(6,3): warning FS0020: This expression should have type 'unit', but has type 'int'. Use 'ignore' to discard the result of the expression, or 'let' to bind the result to a name.
stdin(5,3): warning FS0020: This expression should have type 'unit', but has type ''a list'. Use 'ignore' to discard the result of the expression, or 'let' to bind the result to a name.
In your second example, you should indent:
>
printfn "%A%A"
    1
    [];;
Otherwise the three expressions are three sequential expressions, not a single expression.
You can refer to the F# Language Specification for the firm rules, e.g. Chapter 15 in the specification.
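As a rough way to picture the rule from the answers above (arguments must be indented further than the start of the call), here is a toy Python model. It is a deliberately simplified assumption, not the offside rule as the specification defines it: the hypothetical group_expressions function only compares each line's indentation with the line that opened the current expression.

```python
def group_expressions(lines):
    """Group source lines into expressions using a toy indentation rule.

    A line indented further than the line that started the current
    expression is treated as a continuation (an argument); any other
    line starts a new, separate expression.
    """
    groups = []
    start_indent = None
    for line in lines:
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        indent = len(line) - len(line.lstrip(" "))
        if groups and indent > start_indent:
            groups[-1].append(line.strip())
        else:
            groups.append([line.strip()])
            start_indent = indent
    return [" ".join(group) for group in groups]
```

Feeding it the same-column version of the printfn snippet yields three separate expressions, while the indented version yields the single expression printfn "%A%A" 1 [].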

Run method in computation expressions

What's the status of the Run() method in a computation expression? I've seen it in several examples (here, here, here), and I've seen it in F#'s compiler source, yet it's not in the spec or the MSDN documentation. I filed an issue in MS Connect about this and it was closed as "by design" without further explanations.
So is it deprecated/undocumented/unsupported? Should I avoid it?
UPDATE: MS Connect issue status was promptly changed and the MSDN page updated to include Run()
6.3.10 Computation Expressions
More specifically, computation expressions are of the form builder-expr { cexpr } where cexpr is, syntactically, the grammar of expressions with the additional constructs defined in comp-expr. Computation expressions are used for sequences and other non-standard interpretations of the F# expression syntax. The expression builder-expr { cexpr } translates to
let b = builder-expr in b.Run (b.Delay(fun () -> {| cexpr |}C))
for a fresh variable b. If no method Run exists on the inferred type of b when this expression is checked, then that call is omitted. Likewise, if no method Delay exists on the type of b when this expression is checked, then that call is omitted.
I think that the Run method was added quite late in the development process, so that's probably a reason why it is missing in the documentation. As desco explains, the method is used to "run" a computation expression. This means that whenever you write expr { ... }, the translated code will be wrapped in a call to Run.
The method is a bit problematic, because it breaks compositionality. For example, it is sensible to require that for any computation expression, the following two examples represent the same thing:
expr { let! a = foo()          expr { let! c = expr {
       let! b = bar(a)                                let! a = foo()
       let! c = woo(b)                                let! b = bar(a)
       return! zoo(c) }                               return! woo(b) }
                                      return! zoo(c) }
However, the Run method will be called only on the overall result in the left example and two times on the right (for the overall computation expression and for the nested one). A usual type signature of the method is M<T> -> T, which means that the right code will not even compile.
For this reason, it is a good idea to avoid it when creating monads (as they are usually defined and used e.g. in Haskell), because the Run method breaks some nice aspects of monads. However, if you know what you are doing, then it can be useful...
For example, in my break code, the computation builder executes its body immediately (in the declaration), so adding Run to unwrap the result doesn't break compositionality - composing means just running another code. However, defining Run for async and other delayed computations is not a good idea at all.
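The translation quoted from the spec, and the "Run is called once versus twice" point, can be mimicked in a few lines of Python. This is only an analogy under stated assumptions: run_builder and CountingBuilder are hypothetical names, the body is passed as a zero-argument callable standing in for the translated cexpr, and Bind/Return are left out entirely.

```python
def run_builder(builder, body):
    """Mimic `builder { cexpr }`: wrap in Delay and Run only if they exist."""
    if hasattr(builder, "Delay"):
        value = builder.Delay(body)   # b.Delay(fun () -> ...)
    else:
        value = body()                # no Delay: the body is used directly
    if hasattr(builder, "Run"):
        value = builder.Run(value)    # b.Run(...)
    return value


class CountingBuilder:
    """Toy builder whose Run forces the delayed body and counts its calls."""

    def __init__(self):
        self.run_calls = 0

    def Delay(self, thunk):
        return thunk

    def Run(self, thunk):
        self.run_calls += 1
        return thunk()


# One flat block: Run is applied once, to the overall result.
flat = CountingBuilder()
flat_result = run_builder(flat, lambda: 1 + 2)

# A nested block: Run is applied to the inner block and again to the outer one.
nested = CountingBuilder()
nested_result = run_builder(nested, lambda: run_builder(nested, lambda: 1) + 2)

# A builder with neither method: both calls are omitted.
plain_result = run_builder(object(), lambda: 42)
```

Both variants compute the same value here only because this toy Run simply forces the thunk; with a Run of type M<T> -> T the nested form changes the types involved, which is the compositionality problem described above.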

Resources