Some languages use a unary plus operator for implicit conversions, such as coercing a string to a number (e.g. JavaScript) or casting small number types to an int (e.g. most C-based languages), or provide it so that it can be overloaded.
Since the unary plus is primarily used for hackish purposes like this, and also since F# does not perform automatic widening conversions, I was surprised that F# includes the unary plus.
What adds to my surprise is that Haskell does not have a unary plus operator. Since the F# design was influenced by Haskell, I'm curious as to why it was decided that F# needed a unary plus when Haskell apparently didn't.
Can you give an example of a credible use for the unary plus in F#? If you can't, why is it included in the language at all?
I'll summarize the extended comments. Possible reasons (until a more authoritative answer is given):
Consistency with OCaml, from which F# is derived (if you're doing something wrong/unnecessary it's best to keep doing it so people know what to expect :-))
Overloading (mostly for custom types)
Symmetry with unary negation
F# has two core influences:
OCaml, with which it was originally compatible, and
the CLR, on which it is built.
As has been pointed out, OCaml has a unary plus operator, so from that point of view, it was natural for F# to have one as well.
As for the CLR... To my surprise, the Common Language Specification doesn't specify any requirements for languages to support operator overloading. However, it does specify semantics and naming conventions when the mechanism is used. So F# was allowed to opt out of using unary plus, just like C# and VB opted out of support for overloading compound assignment operators (+=, etc.).
The most common .NET languages aside from F# (C#, VB and C++/CLI) do allow it and have a unary plus. So from this point of view as well it would be natural for F# to have support for a unary plus operator.
There is a unary plus operator in standard mathematical notation. Most programming languages have standard math notation as the original influence and motivation for the syntax of arithmetic expressions.
According to this: "Used to declare an overload for the unary plus operator."
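To make the "overloading" and "symmetry with unary negation" points concrete, here is a minimal sketch in Python rather than F# (Python also exposes unary plus as an overloadable operator, through `__pos__`). The `Vector` type is hypothetical and only serves as an illustration; it says nothing about how F# implements the operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    """Hypothetical 2D vector type, used only to illustrate operator overloading."""
    x: float
    y: float

    def __neg__(self):
        # Unary minus: negate every component.
        return Vector(-self.x, -self.y)

    def __pos__(self):
        # Unary plus: return an equal value, the mirror image of __neg__.
        return Vector(+self.x, +self.y)


v = Vector(1.0, -2.0)
same = +v          # calls Vector.__pos__
opposite = -v      # calls Vector.__neg__
```

With both operators defined, expressions such as `+v` and `-v` can be written side by side for a custom type, which is essentially the symmetry argument.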
With a Backus-Naur form grammar (BNF), we can specify the syntax of the programming language in order to parse it and produce an abstract syntax tree (AST).
<if> ::= "if" <expression> "then" <action> "end"
But we can also specify the tokens with a BNF grammar, as the first usage of BNF did for ALGOL-60:
<digit> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<digit_with_zero> ::= <digit> | "0"
<integer> ::= <digit> | <integer> <digit_with_zero>
However, this usage of the BNF in order to lex (= produce a list of minimal meaningful units aka tokens) has been deprecated in favor of regular expressions (like [1-9][0-9]*).
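As a rough illustration of the two styles, the sketch below recognizes the same integers twice in Python: once with the regular expression and once with a recursive function that follows a rule of the form `<integer> ::= <digit> | <integer> <digit_with_zero>` (written that way here so that it accepts exactly the strings the regex accepts). The function names are made up for this example.

```python
import re

# Regex version: a non-zero digit followed by any number of digits.
INTEGER_RE = re.compile(r"[1-9][0-9]*")


def is_integer_regex(text: str) -> bool:
    return INTEGER_RE.fullmatch(text) is not None


NONZERO_DIGITS = set("123456789")          # <digit>
ALL_DIGITS = NONZERO_DIGITS | {"0"}        # <digit_with_zero>


def is_integer_bnf(text: str) -> bool:
    # <integer> ::= <digit> | <integer> <digit_with_zero>
    if len(text) == 1:
        return text in NONZERO_DIGITS
    if len(text) > 1:
        # The last character is a <digit_with_zero>; the rest must be an <integer>.
        return text[-1] in ALL_DIGITS and is_integer_bnf(text[:-1])
    return False
```

Both functions agree on inputs such as "7", "120" and "012"; the grammar-driven one simply needs more machinery to say the same thing.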
It seems clear that regexes are much more concise.
It also seems that keeping the structure of an if statement is interesting for the interpreter or the compiler which will handle the AST produced by the parser, but keeping the structure of an integer (or a float) is not.
But do you agree that BNF could be used for both lexing and parsing?
And do you agree with the reasons which make regex much more suited for lexing?
Or are there others?
Regular expressions (in the mathematical sense) are equivalent in power to regular grammars and regular grammars can be written in BNF. So in that sense, it is clearly possible to write a full grammar for any context-free language in pure BNF.
Indeed, it is not even necessary to maintain the lexer/parser dichotomy. Some programmers find it convenient to use scannerless parsing (the article is not great but it has some interesting references), although many of these are based on the PEG formalism (which is not context-free) rather than BNF. (These are not the same despite the superficial resemblance.)
That said, it might not be convenient. In general, like most questions related to the structure of parsers, the answer is going to be based less on theory and more on a combination of practicality (with reference to a specific use case) and programmer prejudice.
As is well known, purity is rarely the most practical. Most real-life parser and scanner generators deviate from the pure theoretical models in order to provide mechanisms which are easier to use, easier to implement efficiently, or more powerful. For example, the character class syntax ([a-zA-Z]), which is almost universal in scanner generators, is a clear extension to regular expression syntax which deliberately avoids the need to explicitly list the entire contents of the set. One could say that the listing is implicit and unambiguous in the example I just presented, but most scanner generators also allow the use of classes like [[:alnum:]] ("alphanumeric symbols"), where the precise list of matched symbols is either locale-dependent or, in the Unicode world, extensible in the future. Regardless, this is obviously a useful extension.
While it is true that some aspects of regular expressions are more compact than their equivalent regular grammars -- especially the Kleene star operator, which in BNF requires an additional non-terminal and thus an additional name -- there are also cases where the ability to name subexpressions makes regular grammars more compact. Many scanner generators, starting with Lex, allowed named subpatterns as another regular expression extension. Furthermore, it is possible (with some caveats) to add the Kleene star and other operators to BNF as macros, and many parser generators do so. So there is a certain convergence of notation.
As you say, one difference between scanners and parsers is that the scanner generally makes no attempt to parse the substructure of a lexeme. But it is not true that no lexeme has substructure, and these substructures often do need to be analysed. The most notorious example is probably floating point numbers, which have to be analysed into a multiplier and an exponent, and the multiplier also analysed into an integer part and a fractional part. This analysis is commonly done using primitive functions available in the scanner implementation language (such as strtod for C scanners), but that does mean a second lexical scan. (Using the built-in avoids the considerable inconvenience of writing a mathematically correct string-to-internal converter, which is a much more difficult problem than it first appears. Rolling your own number converter is not recommended.)
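A small Python sketch of that "second scan": a regular expression first recognizes the lexeme and exposes its parts, and the built-in `float` then does the actual conversion (playing the role `strtod` plays in a C scanner). The lexeme shape assumed here (digits, a dot, digits, optional exponent) and the name `scan_float` are choices made for the example, not any particular language's rules.

```python
import re

# Assumed lexeme shape: integer part, ".", fractional part, optional exponent.
FLOAT_RE = re.compile(
    r"(?P<int>[0-9]+)\.(?P<frac>[0-9]+)(?:[eE](?P<exp>[+-]?[0-9]+))?"
)


def scan_float(lexeme: str):
    """Return the lexeme's substructure and its numeric value."""
    match = FLOAT_RE.fullmatch(lexeme)
    if match is None:
        raise ValueError(f"not a float lexeme: {lexeme!r}")
    parts = {
        "int": match.group("int"),
        "frac": match.group("frac"),
        "exp": match.group("exp") or "0",
    }
    # Second scan: let the built-in converter handle correct rounding
    # instead of computing the value from the parts by hand.
    value = float(lexeme)
    return parts, value
```

The parts are available if some later phase wants them, but the value itself comes from the library routine, for the reason given above.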
Other lexemes with internal structure include string literals (which may contain escape sequences) and a large variety of more complex lexemes available in certain languages (dates and times, IP addresses, HTML tags, etc., etc.). All of these things tend to blur the boundary between scanning and parsing. Which is fine, because, as I said, the boundary is situational and not restrained by any absolute law of nature.
Still, it is certainly the case that many lexemes do not have any interesting internal structure, and furthermore that while it is easy to rewrite a regular expression as a regular grammar, it is considerably harder to rewrite it as an unambiguous, deterministic regular grammar, much less an LALR(1) regular grammar. (This is one of the reasons scannerless parsing is often associated with PEG, but it can also be solved with GLL or GLR parsers, at a slight loss of efficiency.)
I'm following along with Bob Nystrom's great book "Crafting Interpreters".
Please let me know if this question is too specific for this site - I've been trying for hours but couldn't figure this out on my own :)
In chapter Compiling Expressions, in function unary(), the function parsePrecedence(Precedence) is called with PREC_UNARY instead of PREC_UNARY + 1.
The book explains this is in order to enable "nesting" of unary operators. E.g.: --1.
However, in parsePrecedence(Precedence) no precedence level is checked before parsing prefix operators - it is checked only before infix ones. And unary is a prefix parser.
So passing PREC_UNARY or PREC_UNARY + 1 to parsePrecedence(Precedence) doesn't seem to make a difference. What am I missing?
The simple answer is that you are right: with this particular grammar, there is no difference because no binary (or postfix) operator has precedence PREC_UNARY, and the test that will be used is ≤.
All the same, the conventional answer is to use PREC_UNARY because unary prefix operators are (necessarily) right associative. This convention comes from the case of binary operators, where you need to use the operator's precedence plus one for left associative operators (the normal case) and the operator's precedence itself for right-associative operators (exponentiation and assignment, for example). (Assignment is actually somewhat more complicated, but I personally think the solution proposed by Bob Nystrom is more complicated than would have been necessary.)
Another conventional answer derives from the possibility of using a bottom-up operator precedence parser (Dijkstra's "shunting yard") instead of the top-down Pratt parser. Fully exploring bottom-up parsing goes well beyond the scope of this question; suffice it to say that the same principle applies with respect to associativity.
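To check the first point, here is a stripped-down Python sketch with the same shape as the book's `parsePrecedence` (prefix rule first with no precedence check, then an infix loop guarded by `<=`). It is not the book's C code: the token handling, the tuple-based tree, and the `unary_operand_prec` parameter are simplifications invented for this experiment, and malformed input is not handled.

```python
PREC_NONE, PREC_TERM, PREC_FACTOR, PREC_UNARY = 0, 1, 2, 3

# Precedence of each infix operator; no operator sits at PREC_UNARY.
INFIX = {"+": PREC_TERM, "-": PREC_TERM, "*": PREC_FACTOR, "/": PREC_FACTOR}


class Parser:
    def __init__(self, tokens, unary_operand_prec):
        self.tokens = tokens
        self.pos = 0
        # Precedence passed when parsing the operand of a unary minus.
        self.unary_operand_prec = unary_operand_prec

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse_precedence(self, precedence):
        token = self.advance()
        # Prefix position: the rule is chosen without any precedence check.
        if token == "-":
            left = ("neg", self.parse_precedence(self.unary_operand_prec))
        elif token == "(":
            left = self.parse_precedence(PREC_TERM)
            self.advance()  # consume ")"
        else:
            left = int(token)
        # Infix position: continue only while the next operator binds tightly enough.
        while precedence <= INFIX.get(self.peek(), PREC_NONE):
            op = self.advance()
            right = self.parse_precedence(INFIX[op] + 1)  # left associative
            left = (op, left, right)
        return left


def parse(tokens, unary_operand_prec):
    return Parser(tokens, unary_operand_prec).parse_precedence(PREC_TERM)


tokens = "- - 1 * 2".split()
tree_a = parse(tokens, PREC_UNARY)
tree_b = parse(tokens, PREC_UNARY + 1)
```

Both calls build the same tree, with the nested negation applied to `1` before the multiplication. The two values could only diverge if some infix or postfix operator were given precedence `PREC_UNARY`.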
I wonder if there is any difference in how the two features are implemented under the hood? I.e., aren't code quotations just built on top of the good old expression trees?
Thanks.
The two types are quite similar, but they are represented differently.
Quotations are designed in a more functional way. For example foo a b would be represented as a series of applications App(App(foo, a), b)
Quotations can represent some constructs that are available only in F# and using expression trees would hide them. For example there is Expr.LetRecursive for let rec declarations
Quotations were first designed before .NET 4.0. Back then expression trees could only represent C# expressions, so it wasn't possible to easily capture all F# constructs (quotations can capture any F# expression including imperative ones).
Quotations are also designed to be easy to process recursively. The ExprShape module contains patterns that allow you to handle all possible quotations with just three cases, ShapeVar, ShapeLambda and ShapeCombination (which is a lot easier than implementing the visitor pattern with tens of methods in C#).
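As a loose analogy in Python (these are not the real F# quotation types), the sketch below shows both ideas: a call `foo a b` stored as nested single-argument applications, and a traversal that only needs a handful of cases because the tree has so few node shapes. All class and function names are invented for the illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    func: object
    arg: object


def curried_call(func, *args):
    # foo a b  becomes  App(App(foo, a), b): one application per argument.
    return reduce(App, args, func)


def free_vars(expr):
    # One case per node shape is enough to walk any tree built from these nodes.
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, Lam):
        return free_vars(expr.body) - {expr.param}
    if isinstance(expr, App):
        return free_vars(expr.func) | free_vars(expr.arg)
    raise TypeError(f"unexpected node: {expr!r}")


call = curried_call(Var("foo"), Var("a"), Var("b"))
```

Here `call` equals `App(App(Var("foo"), Var("a")), Var("b"))`, and `free_vars` handles it without needing a dedicated "method call with N arguments" case.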
When you have an F# quotation, you can translate it to a C# expression tree using FSharp.Quotations.Evaluator. This is quite useful if you're calling, from F#, some .NET API that expects expression trees. As far as I know, there is no translation the other way round.
I've always found postfix languages like Factor to be far more readable than prefix (Lispy languages) and infix/postfix languages (all C-style languages, if we include both operators and functions).
Unlike prefix languages, you don't need delimiters everywhere. Unlike infix notation, there's no complex precedence order to remember. What isn't there to like?
These languages all seem to be concatenative, and thus nearly always stack-based.
Could a modern language be implemented that was applicative rather than concatenative, and was still postfix-based?
I've been mulling over creating a language that would be extremely well suited to creation of DSLs, by allowing definitions of functions that are infix, postfix, prefix, or even consist of multiple words. For example, you could define an infix multiplication operator as follows (where multiply(X,Y) is already defined):
a * b => multiply(a,b)
Or a postfix "squared" operator:
a squared => a * a
Or a C or Java-style ternary operator, which involves two keywords interspersed with variables:
a ? b : c => if a==true then b else c
Clearly there is plenty of scope for ambiguities in such a language, but if it is statically typed (with type inference), then most ambiguities could be eliminated, and those that remain could be considered a syntax error (to be corrected by adding brackets where appropriate).
Is there some reason I'm not seeing that would make this extremely difficult, impossible, or just a plain bad idea?
Edit: A number of people have pointed me to languages that may do this or something like this, but I'm actually interested in pointers to how I could implement my own parser for it, or problems I might encounter if doing so.
This is not too hard to do. You'll want to assign each operator a fixity (infix, prefix, or postfix) and a precedence. Make the precedence a real number; you'll thank me later. Operators of higher precedence bind more tightly than operators of lower precedence; at equal levels of precedence, you can require disambiguation with parentheses, but you'll probably prefer to permit some operators to be associative so you can write
x + y + z
without parentheses. Once you have a fixity, a precedence, and an associativity for each operator, you'll want to write an operator-precedence parser. This kind of parser is fairly simple to write; it scans tokens from left to right and uses one auxiliary stack. There is an explanation in the dragon book but I have never found it very clear, in part because the dragon book describes a very general case of operator-precedence parsing. But I don't think you'll find it difficult.
Another case you'll want to be careful of is when you have
prefix (e) postfix
where prefix and postfix have the same precedence. This case also requires parentheses for disambiguation.
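To give a feel for the table-driven approach, here is a small Python sketch. Note that it uses precedence climbing (recursion standing in for the explicit auxiliary stack), and that the operator table, the `neg` prefix operator, and the tuple-shaped output are all invented for the example. It also does not diagnose the equal-precedence prefix/postfix case described above, nor does it report malformed input.

```python
# symbol -> (fixity, precedence, associativity); precedences are real numbers.
OPERATORS = {
    "+": ("infix", 1.0, "left"),
    "*": ("infix", 2.0, "left"),
    "^": ("infix", 3.0, "right"),
    "neg": ("prefix", 2.5, None),
    "squared": ("postfix", 4.0, None),
}


def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_expr(min_prec, allow_equal=True):
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            left = parse_expr(0.0)
            pos += 1  # skip ")"
        elif token in OPERATORS and OPERATORS[token][0] == "prefix":
            left = (token, parse_expr(OPERATORS[token][1]))
        else:
            left = token  # an operand such as a variable name
        while True:
            op = peek()
            if op not in OPERATORS:
                break
            fixity, prec, assoc = OPERATORS[op]
            if fixity == "prefix":
                break
            # Stop when the operator binds less tightly than the context allows.
            if prec < min_prec or (prec == min_prec and not allow_equal):
                break
            pos += 1
            if fixity == "postfix":
                left = (op, left)
            else:
                # A right-associative operator may reappear at the same level
                # in its right operand; a left-associative one may not.
                right = parse_expr(prec, allow_equal=(assoc == "right"))
                left = (op, left, right)
        return left

    return parse_expr(0.0)


tree = parse("a * b squared".split())
```

With this table, `a * b squared` groups as `a * (b squared)` because `squared` has the higher precedence, while `x + y + z` groups to the left and `a ^ b ^ c` groups to the right. Since precedences are real numbers, a new operator can always be slotted between two existing levels without renumbering the table.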
My paper Unparsing Expressions with Prefix and Postfix Operators has an example parser in the back, and you can download the code, but it's written in ML, so its workings may not be obvious to the amateur. But the whole business of fixity and so on is explained in great detail.
What are you going to do about order of operations?
a * b squared
You might want to check out Scala which has a kind of unique approach to operators and methods.
Haskell has just what you're looking for.