Why can't a statement be added on the same line in a record expression? - f#

Hm, well that was a hard question to name appropriately. Anyway, I'm wondering why, given this type declaration
type T = {
    C : int
}
This does not compile:
let foo () =
    {
        C = printfn ""; 3
    }
but this does:
let foo () =
    {
        C =
            printfn ""; 3
    }
Compiler bug or by design?

"Works as designed" probably more than a "bug", but it's just an overall weird thing to do.
Semicolon is an expression sequencing operator (as in your intended usage), but also a record field separator. In the first case, the parser assumes the latter, and gets confused by it. In the second case, by indenting it you make it clear that the semicolon means expression sequencing.
You could get away with this without splitting the field over two lines:
let foo () =
    {
        C = (printfn ""; 3)
    }
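To make the two roles of the semicolon concrete, here is a minimal sketch. T is the type from the question; the Pair type and the message text are made up for illustration only.

```fsharp
type T = { C : int }
type Pair = { A : int; B : int }   // hypothetical type, only for this sketch

// Here ';' separates two record fields.
let pair = { A = 1; B = 2 }

// Here the parentheses make ';' sequence two expressions:
// printfn runs first, then 3 becomes the value of C.
let foo () = { C = (printfn "evaluating C"; 3) }

printfn "%d %d %d" pair.A pair.B (foo ()).C
```

The parentheses are the only difference from the version that fails to compile, so they are probably the least intrusive fix when you want to keep everything on one line.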

Generically add field to anonymous record using a function

I would like to be able to take an arbitrary record as a parameter and return an anonymous record with a field added using the copy-and-update syntax.
For example, this works:
let fooBar =
    {| Foo = ()
       Bar = () |}
let fooBarBaz = {| fooBar with Baz = () |}
But I would like to do this:
let fooBar =
    {| Foo = ()
       Bar = () |}
let inline addBaz a = {| a with Baz = () |} (* The input to a copy-and-update expression that creates an anonymous record must be either an anonymous record or a record *)
let fooBarBaz = addBaz fooBar
Is there a way to do this in F#?
No, that's not possible.
Think about it: if that function were possible, what would its type be?
Is it something that "fits" into the existing F# type system?
The type would have to be something like val addField: x: {| FieldName<1>: 't1; FieldName<2>: 't2; ... FieldName<n>: 'tn;|} -> {| FieldName<1>: 't1; FieldName<2>: 't2; ... FieldName<n>: 'tn; Baz: unit |}
Clearly, something like that is not representable in the F# type system.
UPDATE
It was mentioned that adding a record constraint would allow this, but this is quite far from reality. A record constraint would just restrict the argument to be a record; the problem that remains is how the type system would express that the function takes a type ^T when ^T : record and returns something like ^U when ^U : ^T_butWithAnAdditionalField
Also, on the SRTP side there is a way to "read" a field by adding a get_FieldName constraint, but not to write one. Moreover, this only lets us read a field of a known name, not of an arbitrary name.
Conclusion: the F# type system is very far away from being able to express things like this, and the SRTP mechanism is not there either.
Allowing an "is record" constraint shouldn't be that complicated but it won't solve anything here.
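For reference, here is a small sketch of the "read a field" side of SRTP mentioned above. The Person and Pet records and the getName helper are invented for this example; the point is only that the field name (Name) is fixed in the constraint.

```fsharp
type Person = { Name : string; Age : int }   // hypothetical record
type Pet = { Name : string }                 // hypothetical record

// Reads a field whose name is fixed in the constraint (here: Name).
// The field name itself cannot be abstracted over, and there is
// no comparable constraint for adding a new field to the result.
let inline getName (x: ^T) : string =
    (^T : (member get_Name : unit -> string) x)

let ada : Person = { Name = "Ada"; Age = 36 }
let rex : Pet = { Name = "Rex" }

printfn "%s %s" (getName ada) (getName rex)
```

So the constraint lets one function read Name from unrelated record types, but the result type never mentions the shape of the input, which is what addBaz would need.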
This is currently not possible. You can find a discussion about it in this proposal:
#807: Allow record to be a generic constraint
If I understand correctly, the solution they mention in the proposal is that you have to be able to constrain the input type to be record. Something that is currently not possible.
But as Gus points out in his answer, a second consideration is that you also need to have a way to specify the output type because the output type depends on the input type.
UPDATE:
There is some discussion going on about whether the linked proposal is relevant and whether the conclusion I draw from it is correct. First of all, I do not work on the compiler, so I don't know what the biggest hurdle would be. But the analysis below does not seem unreasonable to me.
First, let's get rid of generics to have a feel for what's going on:
type Foo =
    { Foo: int }
let addBaz (x: Foo) = // Foo -> {| Baz: unit; Foo: int|}
    {| x with Baz = () |}
This compiles and works as expected and does pretty much what is asked, but without the input argument being generic.
A few things to note:
There is no need to explicitly define the output type.
There is no need for a dynamic type system.
Because the compiler can infer the type definition from the code: an anonymous record with all fields from the input type Foo and the additional Baz field with unit type (this would also work if Foo already had a Baz field in which case it would be replaced).
If we make it generic we get the following compilation error
let addBaz (x: 'a) =
    {| x with Baz = () |}
FS3245: The input to a copy-and-update expression that creates an anonymous record must be either an anonymous record or a record
And that makes sense, because there is no way to say that x is a record. We could give it a class or just an int.
So, assuming we could constrain the input to be a record, could this in theory work?
Let's check a few cases by deducing types as the compiler would do for some examples:
type Foo =
    { Foo: int }
type BazInt =
    { Baz: int }
type BazUnit =
    { Baz: unit }
type MoreFields =
    { Bar: string
      Foo: int }
// Hypothetical: assumes 'a could somehow be constrained to record types (not valid F# today)
let addBaz (x: 'a) =
    {| x with Baz = () |}
let test () =
    let foo' = addBaz { Foo = 5 } // OK: addBaz monomorphizes* to `Foo -> {| Baz: unit; Foo: int |}`
    let bazInt' = addBaz { BazInt.Baz = 5 } // OK: addBaz monomorphizes* to `BazInt -> {| Baz: unit |}`
    let bazUnit' = addBaz { BazUnit.Baz = () } // OK: addBaz monomorphizes* to `BazUnit -> {| Baz: unit |}`
    let moreFields' = addBaz { MoreFields.Foo = 1; MoreFields.Bar = "" } // OK: addBaz monomorphizes* to `MoreFields -> {| Bar: string; Baz: unit; Foo: int |}`
*: I used monomorphization because it's the terminology I know from C++ templates; I'm not sure if .NET uses the same terminology. But AFAIK, it doesn't matter for this discussion and the idea is the same: you just replace the generic input type 'a with the concrete type used where the function is called.
UPDATE 2:
I now get what Gus points to with the return type. The problem isn't that it can't be deduced, as I showed. The problem is that you somehow need to write down the generic output type, which currently isn't possible in F#. The return type only becomes 'complete' when the function is called. In C++ this is perfectly fine, but in F# it isn't.

No assignment has given for field

I have the following code:
type Client =
    { Name : string; Income : int ; YearsInJob : int
      UsesCreditCard : bool; CriminalRecord : bool }
type QueryInfo =
    { Title : string
      Check : Client -> bool
      Positive : Decision
      Negative : Decision }
and Decision =
    | Result of string
    | Querys of QueryInfo
let tree =
    Querys {Title = "More than €40k"
        Check = (fun cl -> cl.Income > 40000)
        Positive = moreThan40
        Negative = lessThan40}
But in the last line:
    Querys {Title = "More than €40k"
        Check = (fun cl -> cl.Income > 40000)
        Positive = moreThan40
        Negative = lessThan40}
I get an error:
No assignment has given for field 'Check' of type 'Script.QueryInfo'
F# is white-space sensitive, which means that white-space is used to indicate scope. The code as given doesn't compile because Check appears too far to the left.
This, on the other hand, ought to compile (if moreThan40 and lessThan40 are correctly defined):
let tree =
    Querys {Title = "More than €40k"
            Check = (fun cl -> cl.Income > 40000)
            Positive = moreThan40
            Negative = lessThan40}
The curly brackets here do not indicate scope, but instead the start and end of a record. Because of the incorrect indentation in the OP, the compiler sees the Check binding as outside the scope of the record expression. That's the reason it complains that no value has been bound to the field Check.
Until you get used to significant white-space, it can be a bit annoying, but it does save you from a lot of explicit opening and closing of scopes (e.g. with curly brackets), so in my opinion, it's a benefit once you get used to it.
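For completeness, here is a self-contained sketch that should compile as a script. The two leaf values moreThan40 and lessThan40 and the decide function are placeholders I made up, since the question does not show them; only the alignment of the record fields matters.

```fsharp
type Client =
    { Name : string; Income : int; YearsInJob : int
      UsesCreditCard : bool; CriminalRecord : bool }

type QueryInfo =
    { Title : string
      Check : Client -> bool
      Positive : Decision
      Negative : Decision }
and Decision =
    | Result of string
    | Querys of QueryInfo

// Placeholder leaves, assumed only for this sketch.
let moreThan40 = Result "High income"
let lessThan40 = Result "Low income"

// Every field starts in the same column as Title.
let tree =
    Querys { Title = "More than €40k"
             Check = (fun cl -> cl.Income > 40000)
             Positive = moreThan40
             Negative = lessThan40 }

// Hypothetical helper: walk the tree until a Result is reached.
let rec decide client decision =
    match decision with
    | Result msg -> msg
    | Querys q ->
        if q.Check client then decide client q.Positive
        else decide client q.Negative
```

If aligning under the first field feels fragile, putting the opening brace on its own line and indenting every field by one level is another layout that avoids the problem.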

Need advice on the Swift while statement

The following code gives a compilation error:
var a : Int = 0
var b : Int = 3
var sum : Int = 0
while (sum = a+b) < 2 {
}
The error message is:
Cannot invoke '<' with an argument list of type '((()), IntegerLiteralConvertible)'
How can I solve this problem? (Of course, I can put the sum assignment statement outside the while statement, but this is not convenient.) Any other advice? Thanks
In many other languages, including C and Objective-C, sum = a+b would return the value of sum, so it could be compared.
In Swift, this doesn't work. This was done intentionally to avoid a common programmer error. From The Swift Programming Language:
Swift supports most standard C operators and improves several capabilities to eliminate common coding errors. The assignment operator (=) does not return a value, to prevent it from being mistakenly used when the equal to operator (==) is intended.
Since the assignment operator does not return a value, it can't be compared with another value.
It is not possible to overload the default assignment operator (=), but you could create a new operator or overload one of the compound operators to add this functionality. However, this would be unintuitive to future readers of your code, so you may want to simply move the assignment to a separate line.
In many languages, such as C, an assignment is an expression that yields the assigned value -- that is, when you write
sum = a + b
the new value of sum is available for another part of the expression:
doubleSum = (sum = a + b) * 2
Swift doesn't work that way -- the value of sum isn't available after the assignment, so it can't be compared in your while statement. From Apple's documentation:
This feature prevents the assignment operator (=) from being used by
accident when the equal to operator (==) is actually intended. By
making if x = y invalid, Swift helps you to avoid these kinds of
errors in your code.
The other answers explain why your code won't compile. Here is how you can clean it up without calculating sum in the while loop (I'm assuming you want to be able to reassign what sum's getter is, elsewhere.):
var a = 0, b = 3
var getSum = { a + b }
var sum: Int { return getSum() }
while sum < 2 {
    // loop body
}
...and if you're okay with invoking sum with parentheses:
var a = 0, b = 3
var sum = { a + b }
while sum() < 2 {
    // loop body
}
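In Swift 2 and earlier, you could also rewrite it as a C-style for loop, although you have to repeat the assignment and addition (C-style for loops were removed in Swift 3, so this no longer compiles in current Swift):
for sum = a+b; sum < 2; sum = a+b {
}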

Seq.map from one list to populate another list of different type?

This question is in follow up to an earlier question, Preserving Field names across the F#/C# boundary
Because of the current limitation encountered with F# type providers (see the earlier question), I want to map the type-provider-generated list to my own list of records, in which the record is, in part,
type inspection = {
    inspectionID : string;
    inspectorID : int;
    EstablishmentID : string;
    EstablishmentName : string; // other members elided
}
I think the way to do this will use Seq.map, but I am not certain. (Recall I am doing a learning exercise.) So here is what I tried:
type restaurantCsv = CsvProvider<"C:\somepath\RestaurantRatings2013.csv",HasHeaders=true>
// which generates a type, but it is an "erased" type, so member names do not propagate
// over to C#.
type RawInspectionData(filename : string) =
    member this.allData = restaurantCsv.Load(filename) // works fine
    member this.allInspections =
        this.allData.Data
        |> Seq.map(fun rcrd -> new inspection[{inspectionID = rcrd.InspectionID;}])
and, of course, the complete statement would have the other member names as part of the inspection, here elided for brevity. Someone pointed me to p 43 of F# For Scientists, which is why I thought to use this format with the curly braces. But this yields a syntax error, "Unexpected symbol '{' in expression. Expected ',', ']' or other token."
Hopefully, though, this snippet is adequate to show what I would like to do, create a Generated Type from the Erased Type. How can I accomplish this?
Your code is going in the right direction. When using Seq.map (which is like Select in LINQ), you need to turn a single element of the original sequence into a single element of the new sequence. So the lambda function just needs to create a single instance of the record.
A record is constructed using { Field1 = value1; Field2 = value2; ... } so you need:
type RawInspectionData(filename : string) =
    let allData = restaurantCsv.Load(filename) // works fine
    member this.allInspections =
        allData.Data
        |> Seq.map(fun rcrd -> {inspectionID = rcrd.InspectionID})
I also changed allData from a member to a local let definition (which makes it a private field of the class). I suppose that your original code new inspection[{...}] tried to create a singleton array with the element. To create an array you'd write [| { Field = value; ... } |] (and the compiler would infer the type of the array for you). But in this case, no arrays are needed.
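Here is a minimal sketch of the same pattern without the type provider, so it can be run on its own. The Inspection record and the rawRows tuples are stand-ins I made up for the provider's rows; the shape of the Seq.map call is what matters.

```fsharp
type Inspection =
    { InspectionID : string
      InspectorID : int }

// Stand-in for the rows a type provider would return (assumed shape).
let rawRows = [ ("A-1", 7); ("B-2", 9) ]

// One input row becomes exactly one record; no 'new' and no array brackets.
let inspections =
    rawRows
    |> Seq.map (fun (inspectionId, inspectorId) ->
        { InspectionID = inspectionId; InspectorID = inspectorId })
    |> Seq.toList

printfn "%A" inspections
```

The Seq.toList at the end is optional; it just forces the lazy sequence so the records exist as an ordinary list that can cross over to C#.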

Where/how to declare the unique key of variables in a compiler written in Ocaml?

I am writing a compiler of mini-pascal in Ocaml. I would like my compiler to accept the following code for instance:
program test;
var
  a,b : boolean;
  n : integer;
begin
  ...
end.
I have difficulties in dealing with the declaration of variables (the part following var). At the moment, the type of variables is defined like this in sib_syntax.ml:
type s_var =
  { s_var_name: string;
    s_var_type: s_type;
    s_var_uniqueId: s_uniqueId (* key *) }
Where s_var_uniqueId (instead of s_var_name) is the unique key of the variables. My first question is, where and how I could implement the mechanism of generating a new id (actually by increasing the biggest id by 1) every time I have got a new variable. I am wondering if I should implement it in sib_parser.mly, which probably involves a static variable cur_id and the modification of the part of binding, again don't know how to realize them in .mly. Or should I implement the mechanism at the next stage - the interpreter.ml? but in this case, the question is how to make the .mly consistent with the type s_var, what s_var_uniqueId should I provide in the part of binding?
Another question is about this part of statement in .mly:
id = IDENT COLONEQ e = expression
  { Sc_assign (Sle_var {s_var_name = id; s_var_type = St_void}, e) }
Here, I also need to provide the next level (the interpreter.ml) a variable of which I only know the s_var_name, so what could I do regarding its s_var_type and s_var_uniqueId here?
Could anyone help? Thank you very much!
The first question to ask yourself is whether you actually need a unique id. From my experience, they're almost never necessary or even useful. If what you're trying to do is making variables unique through alpha-equivalence, then this should happen after parsing is complete, and will probably involve some form of De Bruijn indices instead of unique identifiers.
Either way, a function which returns a new integer identifier every time it is called is:
let unique =
  let last = ref 0 in
  fun () -> incr last ; !last
let one = unique () (* 1 *)
let two = unique () (* 2 *)
So, you can simply assign { ... ; s_var_uniqueId = unique () } in your Menhir rules.
The more important problem you're trying to solve here is that of variable binding. Variable x is defined in one location and used in another, and you need to determine that it happens to be the same variable in both places. There are many ways of doing this, one of them being to delay the binding until the interpreter. I'm going to show you how to deal with this during parsing.
First, I'm going to define a context: it's a set of variables that allows you to easily retrieve a variable based on its name. You might want to create it with hash tables or maps, but to keep things simple I will be using List.assoc here.
type s_context = {
  s_ctx_parent : s_context option ;
  s_ctx_bindings : (string * (int * s_type)) list ;
  s_ctx_size : int ;
}
let empty_context parent = {
  s_ctx_parent = parent ;
  s_ctx_bindings = [] ;
  s_ctx_size = 0
}
let bind v_name v_type ctx =
  (* List.assoc takes the key first, then the association list *)
  try let _ = List.assoc v_name ctx.s_ctx_bindings in
      failwith "Variable is already defined"
  with Not_found ->
    { ctx with
      s_ctx_bindings = (v_name, (ctx.s_ctx_size, v_type))
                       :: ctx.s_ctx_bindings ;
      s_ctx_size = ctx.s_ctx_size + 1 }
let rec find v_name ctx =
  try 0, List.assoc v_name ctx.s_ctx_bindings
  with Not_found ->
    match ctx.s_ctx_parent with
    | Some parent -> let depth, found = find v_name parent in
                     depth + 1, found
    | None -> failwith "Variable is not defined"
So, bind adds a new variable to the current context, find looks for a variable in the current context and its parents, and returns both the bound data and the depth at which it was found. So, you could have all global variables in one context, then all parameters of a function in another context that has the global context as its parent, then all local variables in a function (when you'll have them) in a third context that has the function's main context as the parent, and so on.
So, for instance, find "x" ctx will return something like 0, (3, St_int) where 0 is the De Bruijn index of the variable, 3 is the position of the variable in the context identified by the De Bruijn index, and St_int is the type.
type s_var = {
  s_var_deBruijn: int;
  s_var_type: s_type;
  s_var_pos: int
}
let find v_name ctx =
  let deBruijn, (pos, typ) = find v_name ctx in
  { s_var_deBruijn = deBruijn ;
    s_var_type = typ ;
    s_var_pos = pos }
Of course, you need your functions to store their context, and make sure that the first argument is the variable at position 0 within the context:
type s_fun =
  { s_fun_name: string;
    s_fun_type: s_type;
    s_fun_params: s_context;
    s_fun_body: s_block; }
let context_of_paramlist parent paramlist =
  List.fold_left
    (fun ctx (v_name,v_type) -> bind v_name v_type ctx)
    (empty_context parent)
    paramlist
Then, you can change your parser to take into account the context. The trick is that instead of returning an object representing part of your AST, most of your rules will return a function that takes a context as an argument and returns an AST node.
For instance:
int_expression:
  (* Constant : ignore the context *)
  | c = INT { fun _ -> Se_const (Sc_int c) }
  (* Variable : look for the variable inside the context *)
  | id = IDENT { fun ctx -> Se_var (find id ctx) }
  (* Subexpressions : pass the context to both *)
  | e1 = int_expression o = operator e2 = int_expression
    { fun ctx -> Se_binary (o, e1 ctx, e2 ctx) }
;
So, you simply propagate the context "down" recursively through the expressions. The only clever parts are those when new contexts are created (you don't have this syntax yet, so I'm just adding a placeholder):
  | function_definition_expression (args, body)
    { fun ctx -> let ctx = context_of_paramlist (Some ctx) args in
                 { s_fun_params = ctx ;
                   s_fun_body = body ctx } }
As well as the global context (the program rule itself does not return a function, but the block rule does, and so a context is created from the globals and provided).
prog:
  PROGRAM IDENT SEMICOLON
  globals = variables
  main = block
  DOT
    { let ctx = context_of_paramlist None globals in
      { globals = ctx;
        main = main ctx } }
All of this makes the implementation of your interpreter much easier due to the DeBruijn indices: you can have a "stack" which holds your values (of type value) defined as:
type stack = value array list
Then, reading and writing variable x is as simple as:
let read stack x =
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos)
let write stack x value =
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos) <- value
Also, since we made sure that function parameters are in the same order as their position in the function context, if you want to call function f and its arguments are stored in the array args, then constructing the stack is as simple as:
let inner_stack = args :: stack in
(* Evaluate f.s_fun_body with inner_stack here *)
But I'm sure you'll have a lot more questions to ask when you start working on your interpreter ;)
How to create a global id generator:
let unique =
  let counter = ref (-1) in
  fun () -> incr counter; !counter
Test:
# unique ();;
- : int = 0
# unique ();;
- : int = 1
Regarding your more general design question: it seems that your data representation does not faithfully represent the compiler phases. If you must return a type-aware data-type (with this field s_var_type) after the parsing phase, something is wrong. You have two choices:
devise a more precise data representation for the post-parsing AST, that would be different from the post-typing AST, and not have those s_var_type fields. Typing would then be a conversion from the untyped to the typed AST. This is a clean solution that I would recommend.
admit that you must break the data representation semantics because you don't have enough information at this stage, and try to be at peace with the idea of returning garbage such as St_void after the parsing phase, to reconstruct the correct information later. This is less typed (as you have an implicit assumption on your data which is not apparent in the type), more pragmatic, ugly but sometimes necessary. I don't think it's the right decision in this case, but you will encounter situations where it's better to be a bit less typed.
I think the specific choice of unique id handling design depends on your position on this more general question, and your concrete decisions about types. If you choose a finer-typed representation of post-parsing AST, it's your choice to decide whether to include unique ids or not (I would, because generating a unique ID is dead simple and doesn't need a separate pass, and I would rather slightly complexify the grammar productions than the typing phase). If you choose to hack the type field with a dummy value, it's also reasonable to do that for variable ids if you wish to, putting 0 as a dummy value and defining it later; but still I personally would do that in the parsing phase.
