Let bindings support punctuation enclosed in double backticks, but types don't? - f#

Using F# in Visual Studio 2012, this code compiles:
let ``foo.bar`` = 5
But this code does not:
type ``foo.bar`` = class end
Invalid namespace, module, type or union case name
According to section 3.4 of the F# language specification:
Any sequence of characters that is enclosed in double-backtick marks (````),
excluding newlines, tabs, and double-backtick pairs themselves, is treated
as an identifier.
token ident =
| ident-text
| `` [^ '\n' '\r' '\t']+ | [^ '\n' '\r' '\t'] ``
Section 5 defines type as:
type :=
( type )
type -> type -- function type
type * ... * type -- tuple type
typar -- variable type
long-ident -- named type, such as int
long-ident<types> -- named type, such as list<int>
long-ident< > -- named type, such as IEnumerable< >
type long-ident -- named type, such as int list
type[ , ... , ] -- array type
type lazy -- lazy type
type typar-defns -- type with constraints
typar :> type -- variable type with subtype constraint
#type -- anonymous type with subtype constraint
... and Section 4.2 defines long-ident as:
long-ident := ident '.' ... '.' ident
As far as I can tell from the spec, types are named with long-idents, and long-idents can be idents. Since idents support double-backtick-quoted punctuation, it therefore seems like types should too.
So am I misreading the spec? Or is this a compiler bug?

It definitely looks like the specification is not synchronized with the actual implementation, so there is a bug on one side or the other.
When you use an identifier in double backticks, the compiler treats it as a name and simply generates a type (or member) with the name you specified in backticks. It does not do any name mangling to make sure that the identifier is a valid type/member name.
This means that it is not too surprising that you cannot use identifiers that would clash with some standard meaning in the compiled code. In your example, it is the dot, but here are a few other examples:
type ``Foo.Bar``() = // Dot is not allowed because it represents namespace
    member x.Bar = 0
type ``Foo`1``() = // Single backtick is used to compile generic types
    member x.Bar = 0
type ``Foo+Bar``() = // + is used in the name of a nested type
    member x.Bar = 0
The above examples are not allowed as type names (because they clash with some standard meaning), but you can use them in let-bindings, because there are no such restrictions on variable names:
let ``foo`1`` = 0
let ``foo.bar`` = 2
let ``foo+bar`` = 1
This is definitely something that should be explained in the documentation & the specification, but I hope this helps to clarify what is going on.
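As a rough illustration of "generates a type with the name you specified", the sketch below reads the compiled names back through reflection. The names ``Foo Bar``, Outer and Inner are made up for this example, and the exact FullName prefix depends on how the code is compiled (script or project), so only the presence of + is relied on:

```fsharp
// A backticked name with a space is legal for a type and is emitted verbatim.
type ``Foo Bar``() =
    member x.Bar = 0

// A type declared inside a module is compiled as a nested type.
module Outer =
    type Inner() =
        member x.Value = 0

// The name is exactly what was written between the backticks.
let plainName = typeof<``Foo Bar``>.Name        // "Foo Bar"

// Generic type definitions carry a backtick plus their arity in the compiled name.
let genericName = typedefof<list<int>>.Name     // "FSharpList`1"

// Nested types use '+' as the separator in their full name.
let nestedName = typeof<Outer.Inner>.FullName
```

This shows where the dot, the single backtick and + already have a meaning in compiled type names, which is consistent with the restrictions described above.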

Related

Antlr4: access variable name in type annotation context (access grandparent's sibling rule context)

I need to parse variable declaration using Antlr4.
variableDeclaration
: name ':' type
;
name
: Name
;
type is the top-level rule for type expressions, so even simple types are parsed through a long production chain.
type
: // some expression
| lessComplexTypeExpression
;
... // many rules
verySimpleTypeExpression
: Many
| Simple
| Types
| But
| OneSpecificType
;
Unfortunately, I need to know the variable name when building OneSpecificType's application-specific node. I could pass the name through the whole stack of my custom visit methods, but the name parameter would be ignored by the majority of simple types.
What is the best way to access name.Name() from OneSpecificTypeContext? Is there an ANTLR4 feature for this?

What characters are allowed in F# identifiers, module, type and member names?

This question is about characters in identifiers, not keywords as identifiers.
I found this question on C# names, but couldn't readily find the same for F#. Normally this is hardly relevant, but when naming tests I often use the dot (.) and was surprised that it isn't supported in a module name, but is supported in a let-binding:
// Fails:
module ``Run Test.Me functions`` =
    [<Test>]
    let ``X.Add should add``() = test <@ X.Add 2 2 = 4 @>
// Succeeds:
module ``Run Test-Me functions`` =
    [<Test>]
    let ``X.Add should add``() = test <@ X.Add 2 2 = 4 @>
Outside of naming tests I don't see much use for this, but it made me wonder: what characters are supported by type and module names, and what characters for member names and let bindings?
Some tests:
module ``Weird.name`` = () // fails
module ``Weird-name`` = () // succeeds
module ``Weird()name`` = () // succeeds (?)
module ``Weird*name`` = () // fails
module ``Weird+name`` = () // fails
module ``Weird%name`` = () // succeeds (?)
module ``Weird/name`` = () // fails
module ``Weird\\name`` = () // fails
All of these names succeed in a let-binding or member name, but not all of them work as a type name or module name. At least that is consistent. But I can't find any rule or logic behind what is allowed and what is not.
Perhaps the limitation is imposed by the CLR / MSIL and not by F# itself?
Take a look at the F# Language Specification 4.0 - In section 3.4 Identifiers and Keywords.
Note that when an identifier is used for the name of a types, union
type case, module, or namespace, the following characters are not
allowed even inside double-backtick marks:
., +, $, &, [, ], /, \\, *, \", `
In addition to this list, the @ (at-sign) is allowed in any name, but will raise a warning:
warning FS1104: Identifiers containing '@' are reserved for use in F# code generation
As near as I can find:
The list of characters can be found in the F# compiler with the name IllegalCharactersInTypeAndNamespaceNames.
As this is used for generating IL, that leads to ECMA-335 - Common Language Infrastructure (CLI) Partitions I to VI which reads:
II.5.3 Identifiers - Identifiers are used to name entities. Simple
identifiers are equivalent to an ID. However, the ILAsm syntax allows
the use of any identifier that can be formed using the Unicode
character set (see Partition I). To achieve this, an identifier shall
be placed within single quotation marks.
ID is a contiguous string of characters which starts with either an
alphabetic character (A–Z, a–z)
or one of _, $, @, ` (grave accent), or ?,
and is followed by any number of
alphanumeric characters (A–Z, a–z, 0–9)
or the characters _, $, @, ` (grave accent), and ?
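To make the rule from section 3.4 concrete, here is a minimal sketch that checks a candidate name against the characters in the quoted list. It assumes that list is complete, and isUsableAsTypeName is a hypothetical helper written for this example, not the compiler's own check:

```fsharp
// Characters the quoted spec passage lists as disallowed in type,
// union case, module, and namespace names (even inside double backticks).
let illegalInTypeNames =
    [ '.'; '+'; '$'; '&'; '['; ']'; '/'; '\\'; '*'; '"'; '`' ]

// Hypothetical helper: true when none of the listed characters occur in the name.
let isUsableAsTypeName (name: string) =
    name |> Seq.forall (fun c -> not (List.contains c illegalInTypeNames))

// The module names tried in the question, paired with the helper's verdict.
let results =
    [ "Weird.name"; "Weird-name"; "Weird()name"; "Weird*name"
      "Weird+name"; "Weird%name"; "Weird/name"; "Weird\\name" ]
    |> List.map (fun n -> n, isUsableAsTypeName n)
```

Under that assumption, the verdicts line up with the fails/succeeds comments in the question: only the names containing -, () and % pass.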

Type to represent a string which is not empty or spaces in F#

I love the simplicity of types like
type Code = Code of string
But I would like to put some restrictions on the string (in this case, do not allow empty or spaces-only strings). Something like
type nonemptystring = ???
type Code = Code of nonemptystring
How do I define this type in an idiomatic F# way? I know I can make it a class with a constructor or a restricted module with a factory function, but is there an easy way?
A string is essentially a sequence of char values (in Haskell, BTW, String is a type alias for [Char]). A more general question, then, would be if it's possible to statically declare a list as having a given size.
Such a language feature is known as Dependent Types, and F# doesn't have it. The short answer, therefore, is that this is not possible to do in a declarative fashion.
The easiest, and probably also most idiomatic, way, then, would be to define Code as a single-case Discriminated Union:
type Code = Code of string
In the module that defines Code, you'd also define a function that clients can use to create Code values:
let tryCreateCode candidate =
    if System.String.IsNullOrWhiteSpace candidate
    then None
    else Some (Code candidate)
This function contains the run-time logic that prevents clients from creating empty Code values:
> tryCreateCode "foo";;
val it : Code option = Some (Code "foo")
> tryCreateCode "";;
val it : Code option = None
> tryCreateCode " ";;
val it : Code option = None
What prevents a client from creating an invalid Code value, then? For example, wouldn't a client be able to circumvent the tryCreateCode function and simply write Code ""?
This is where signature files come in. You create a signature file (.fsi), and in that declare types and functions like this:
type Code
val tryCreateCode : string -> Code option
Here, the Code type is declared, but its 'constructor' isn't. This means that you can't directly create values of this type. This, for example, doesn't compile:
Code ""
The error given is:
error FS0039: The value, constructor, namespace or type 'Code' is not defined
The only way to create a Code value is to use the tryCreateCode function.
As given here, you can no longer access the underlying string value of Code, unless you also provide a function for that:
let toString (Code x) = x
and declare it in the same .fsi file as above:
val toString : Code -> string
That may look like a lot of work, but is really only six lines of code, and three lines of type declaration (in the .fsi file).
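If a separate .fsi file is more than you want, a variation that is not part of the answer above is to mark the union case private and keep the factory in a module of the same name. This is only a sketch: the names tryCreate and value are chosen here for illustration, and code in the same enclosing module or namespace can still use the case directly, so the restriction only shows up for callers outside it.

```fsharp
// The case is private: outside the enclosing module or namespace,
// callers cannot construct or pattern match on Code directly.
type Code = private Code of string

module Code =
    // Run-time check, same logic as tryCreateCode above.
    let tryCreate (candidate: string) =
        if System.String.IsNullOrWhiteSpace candidate then None
        else Some (Code candidate)

    // Accessor for the wrapped string, same role as toString above.
    let value (Code s) = s

let good = Code.tryCreate "foo" |> Option.map Code.value
let bad = Code.tryCreate "   "
```

The trade-off is that the signature file approach hides the representation completely, while the private case keeps everything in one file.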
Unfortunately, there isn't a convenient syntax for declaring a restricted subset of a type, but I would leverage active patterns to do this. As you rightly say, you can make a type and check its validity when you construct it:
/// String type which can't be null or whitespace
type FullString (string) =
    let string =
        match (System.String.IsNullOrWhiteSpace string) with
        |true -> invalidArg "string" "string cannot be null or whitespace"
        |false -> string
    member this.String = string
Now, constructing this type naively may throw runtime exceptions and we don't want that! So let's use active patterns:
let (|FullStr|WhitespaceStr|NullStr|) (str : string) =
    match str with
    |null -> NullStr
    |str when System.String.IsNullOrWhiteSpace str -> WhitespaceStr
    |str -> FullStr(FullString(str))
Now we have something that we can use with pattern matching syntax to build our FullStrings. This function is safe at runtime because we only create a FullString if we're in the valid case.
You can use it like this:
let printString str =
    match str with
    |NullStr -> printfn "The string is null"
    |WhitespaceStr -> printfn "The string is whitespace"
    |FullStr fstr -> printfn "The string is %s" (fstr.String)

Xtext: Cross-References with Alternative

I've started my first Xtext project and I've run into a problem with a cross-reference (at least, that's what I think the problem is). I've got a DataType rule, an InterfaceDescription rule and an enumeration. What I want to do is describe an interface by letting the user choose a datatype from the enumeration or define a new one.
The enum works without a problem, but when I define a new DataType with "datatype test1" and use it inside the InterfaceDescription, I get the following error: 'XtextReconcilerJob' has encountered a problem. An internal error occurred during: "XtextReconcilerJob". Here is the error stack: http://pastebin.com/evFki2mB
DataType:
'datatype' name=ID ('mapto' mappedType = JAVAID)?
;
Interface:
interfaceType=InterfaceType name=ID datatype=([DataType]| DataTypeEnum)
;
enum InterfaceType:
INLET = 'inlet' |
OUTLET = 'outlet'
;
DataTypeEnum:
INT8 = 'int8' | INT16 = 'int16' | INT32 = 'int32' |
DOUBLE = 'double' | SINGLE = 'single' | REAL = 'real' |
BOOLEAN = 'boolean' | CHAR = 'char'
;
When I use the DataType Cross-Reference in another Rule, it works:
ParamList:
'param:' datatype=[DataType] name=ID
;
Does anyone know what the problem is?
There are a few issues with the grammar, that together cause this strange behaviour:
DataTypeEnum, despite its name, is not an enum but a strange object that might represent a few string values. This hides the issue of the alternate type assignment in the interface rule in the editor.
When generating the editor, some cryptic error messages appear in the output:
error(208): ../org.xtext.example.mydsl/src-gen/org/xtext/example/mydsl/parser/antlr/internal/InternalMyDsl.g:447:1: The following token definitions can never be matched because prior tokens match the same input: RULE_ID
ebnf2 is not supported for CrossReference - this means, that an extended construct, such as the '|' pattern is not allowed when defining references
By prefixing the DataTypeEnum with the enum keyword, the datatype attribute definition becomes erroneous in the editor, as there is no type in EMF that can be both an enum and an EObject, thus the problem location becomes obvious.
Finally, the runtime error was caused by the fact, that something was missing from the generated parser/lexer tooling, and the resulting model was also incorrect.
To be more constructive, I suggest replacing the offending line by defining a TypeReference element, that can either refer to types mapped to Java or data types. I could extend your grammar in the following way:
Interface:
interfaceType=InterfaceType name=ID datatype=(TypeReference)
;
TypeReference:
JavaTypeReference | DataTypeReference
;
JavaTypeReference:
type = [DataType]
;
DataTypeReference:
type = DataTypeEnum
;
enum DataTypeEnum:
INT8 = 'int8' | INT16 = 'int16' | INT32 = 'int32' |
DOUBLE = 'double' | SINGLE = 'single' | REAL = 'real' |
BOOLEAN = 'boolean' | CHAR = 'char'
;
PS.: I suggest adding some keywords to the language to ease parsing, especially error recovery. See the following blog post for details: http://zarnekow.blogspot.hu/2012/11/xtext-corner-7-parser-error-recovery.html

Can I use a type within its own type definition?

I'm trying to define the following type:
type lToken =
    LInt of int
    | LString of string
    | LList of lToken list
    | LFunction of string * LList
but I'm getting an error 'LList' is not defined.
Is there a way to do what I'm trying to do - i.e. use the types I'm defining inside their own type definition?
Thanks
As others pointed out, LList is not a name of a type, but just a name of discriminated union's constructor. In F#, cases of discriminated union happen to be compiled as .NET types, but that's just an implementation detail and you cannot refer to the generated types.
If you want to declare LFunction as a case that consists of a string and an LList, then you can either expand the definition (as Brian and Marcelo suggest) or declare a new type (using type ... and to declare recursive types):
type List = Token list
and Token =
    | LInt of int
    | LString of string
    | LList of List
    | LFunction of string * List
PS: If you're writing F# then I'd recommend following standard naming guidelines and using PascalCase with a more descriptive name for type names. What does "l" stand for? Could you expand it? (Thanks to type inference, you won't need to write the type name often anyway.)
LList is a constructor, not a type. Just use the associated type directly:
...
| LFunction of string * (lToken list)
(My ML is very rusty; I'm not sure whether the parentheses are right.)
LList is not the name of a type; lToken is. Perhaps you want lToken list there instead?
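Putting the answers together, here is a minimal sketch of the expanded definition with a value and a recursive function over it. The type is renamed to Token as suggested above, and countInts is a hypothetical function added only to show that the self-reference works:

```fsharp
// The type refers to itself by its own name (Token), not by a case name.
type Token =
    | LInt of int
    | LString of string
    | LList of Token list
    | LFunction of string * Token list

// Hypothetical example function: counts the LInt leaves in a token tree.
let rec countInts token =
    match token with
    | LInt _ -> 1
    | LString _ -> 0
    | LList items -> List.sumBy countInts items
    | LFunction (_, args) -> List.sumBy countInts args

// A function token whose arguments include a nested list.
let sample = LFunction ("f", [ LInt 1; LList [ LInt 2; LString "x" ] ])
let total = countInts sample
```

No type ... and is needed here, because a single union type can mention itself in its own cases.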
