Dealing with .NET generic dictionaries in F#? - f#

I am not a functional programmer.
I am learning F#.
I got a problem here.
Let me start from following piece of code:
type XmlNode(tagName, innerValue) =
    member this.TagName = tagName
    member this.InnerValue = innerValue
    member this.Atts = Dictionary<string, obj>()
I don't use the F# dict because (as far as I know) it is read-only, whereas I obviously need to modify my attributes.
I am really struggling to write this in a purely functional way:
type XmlNode with
    member this.WriteTo (output: StringBuilder) =
        output.Append("<" + this.TagName) |> ignore
        //let writeAtts =
        //    List.map2 (fun key value -> " " + key + "=" + value.ToString())
        //        (List.ofSeq this.Atts.Keys) (List.ofSeq this.Atts.Values)
        //    |> List.reduce (fun acc str -> acc + " " + str)
        //output.Append((writeAtts)) |> ignore
        output.Append(">" + this.InnerValue + "</" + this.TagName + ">") |> ignore
        output
The code I commented out was my (probably stupid) attempt to use mapping and reduction to concatenate all the attributes into a single, correctly formatted string. It compiles OK.
But when I try to access my Atts property:
[<EntryPoint>]
let main argv =
    let root = new XmlNode("root", "test")
    root.Atts.Add("att", "val") // trying to add a new KVP
    let output = new StringBuilder()
    printfn "%O" (root.WriteTo(output))
    Console.ReadLine() |> ignore
    0 // return an integer exit code
...new attribute does not appear inside the Atts property, i.e. it remains empty.
So:
1) help me to make my code more functional.
2) and to understand how to deal with mutable dictionaries in F#.
Thank you.

First, your immediate problem: the way you defined the Atts property, it's not one value that is "stored" somewhere and is accessible via property. Instead, your definition means "every time somebody reads this property, create a new dictionary and return it". This is why your new attribute doesn't appear in the dictionary: it's a different dictionary every time you read root.Atts.
To create a property with a backing field and initial value, use member val:
type XmlNode(...) =
    ...
    member val Atts = Dictionary<string, obj>()
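A minimal sketch of the difference between the two kinds of property. Fresh and Stored are hypothetical type names used only for illustration; they are not from the question:

```fsharp
open System.Collections.Generic

// Plain 'member': the body is re-evaluated on every read,
// so each access returns a brand-new, empty dictionary.
type Fresh() =
    member this.Atts = Dictionary<string, obj>()

// 'member val': the initializer runs once per instance and the
// result is stored in a backing field.
type Stored() =
    member val Atts = Dictionary<string, obj>()

let fresh = Fresh()
fresh.Atts.Add("att", box "val")   // added to a dictionary that is then discarded
let freshCount = fresh.Atts.Count  // counts yet another new dictionary

let stored = Stored()
stored.Atts.Add("att", box "val")
let storedCount = stored.Atts.Count

printfn "Fresh: %d, Stored: %d" freshCount storedCount
```

With the plain member the count stays at zero, which is the behaviour described in the question; with member val the added entry is retained.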
Now, answers to some implied questions.
First order of business: "modify the attributes" and "purely functional" are contradictory ideas. Functional programming implies immutable data. Nothing changes ever. The way to advance your computation is to create a new datum at every step, without overwriting the previous one. This basic idea turns out to be immensely valuable in practice: safer threading, trivial "undo" scenarios, trivial parallelization, trivial distribution to other machines, and even reduced memory consumption via persistent data structures.
Immutability is a very important point, and I urge you not to gloss over it. Accepting it requires a mental shift. From my own experience (and that of other people I know), this is very hard when coming from imperative programming, but it is well worth it.
Second: do not use classes and properties. Technically speaking, object-oriented programming (in the sense of message passing) is not contradictory to functional, but the Enterprise flavor that is used in practice and implemented in C++, Java, C# et al., is contradictory, because it emphasizes this idea that "methods are operations that change an object's state", which is not functional (see above). So it's better to avoid object-oriented constructs, at least while you're learning. And especially since F# provides much better ways to encode data:
type XmlNode = { TagName: string; InnerValue: string; Atts: (string*string) list }
(notice how my Atts is not a dictionary; we'll come to this in a bit)
Similarly, to represent operations on your data, use functions, not methods:
let printNode (node: XmlNode) = (* we'll come to the implementation later *)
Third: why do you say that you "obviously" need to modify the attributes? The code you've shown does not call for this. For example, using my definition of XmlNode above, I can rewrite your code this way:
[<EntryPoint>]
let main argv =
    let root = { TagName = "root"; InnerValue = "test"; Atts = ["att", "val"] }
    printfn "%s" (printNode root)
    ...
But even if that was a real need, you shouldn't do it "in place". As I've described above while talking about immutability, you should not mutate the existing node, but rather create a new node that differs from the original one in whatever way you wanted to "modify":
let addAttr node name value = { node with Atts = (name, value) :: node.Atts }
In this implementation, I take a node and name/value of an attribute, and produce a new node whose Atts list consists of whatever was in the original node's Atts with the new attribute prepended.
The original Atts list stays intact, unmodified. But this does not mean twice the memory consumption: because we know that the original list never changes, we can reuse it: we create the new list by only allocating memory for the new item and including a reference to the old list as "other items". If the old list was subject to change, we couldn't do that, we would have to create a full copy (see "Defensive Copy"). This strategy is known as "Persistent Data Structure". It is one of the pillars of functional programming.
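The sharing described here can be observed directly. The following is a small sketch that repeats the record and addAttr definitions so that it runs on its own; the attribute values are made up for illustration:

```fsharp
type XmlNode = { TagName: string; InnerValue: string; Atts: (string * string) list }

let addAttr node name value = { node with Atts = (name, value) :: node.Atts }

let original = { TagName = "root"; InnerValue = "test"; Atts = [ "att", "val" ] }
let extended = addAttr original "id" "42"

// The original node is untouched.
let originalUnchanged = (original.Atts = [ "att", "val" ])

// The tail of the new list is the very same object as the old list:
// only one new cell was allocated for ("id", "42").
let tailIsShared = obj.ReferenceEquals(extended.Atts.Tail, original.Atts)

printfn "unchanged: %b, shared: %b" originalUnchanged tailIsShared
```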
Finally, for string formatting, I recommend using sprintf instead of StringBuilder. For output of this size any performance difference is unlikely to matter (StringBuilder remains the better fit when building very large strings in a loop), and sprintf additionally provides type safety. For example, the code sprintf "%s" 5 will not compile, complaining that the format expects a string, but the final argument 5 is a number. With this, we can implement the printNode function:
let printNode (node: XmlNode) =
    let atts = seq { for n, v in node.Atts -> sprintf " %s=\"%s\"" n v } |> String.concat ""
    sprintf "<%s%s>%s</%s>" node.TagName atts node.InnerValue node.TagName
For reference, here's your complete program, rewritten in functional style:
open System

type XmlNode = { TagName: string; InnerValue: string; Atts: (string*string) list }

let printNode (node: XmlNode) =
    let atts = seq { for n, v in node.Atts -> sprintf " %s=\"%s\"" n v } |> String.concat ""
    sprintf "<%s%s>%s</%s>" node.TagName atts node.InnerValue node.TagName

[<EntryPoint>]
let main argv =
    let root = { TagName = "root"; InnerValue = "test"; Atts = ["att", "val"] }
    printfn "%s" (printNode root)
    Console.ReadLine() |> ignore
    0
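If attribute names must be unique, which the Dictionary in the question would have enforced, the immutable Map type may be a closer fit than a list of pairs. A minimal sketch, assuming string-valued attributes; setAttr is a hypothetical helper, not part of the answer above:

```fsharp
type XmlNode = { TagName: string; InnerValue: string; Atts: Map<string, string> }

// Returns a new node; Map.add replaces the value if the key already exists.
let setAttr name value node = { node with Atts = Map.add name value node.Atts }

let printNode (node: XmlNode) =
    let atts =
        node.Atts
        |> Map.toSeq                                   // pairs in key order
        |> Seq.map (fun (n, v) -> sprintf " %s=\"%s\"" n v)
        |> String.concat ""
    sprintf "<%s%s>%s</%s>" node.TagName atts node.InnerValue node.TagName

let root = { TagName = "root"; InnerValue = "test"; Atts = Map.empty }

let updated =
    root
    |> setAttr "att" "val"
    |> setAttr "id" "1"
    |> setAttr "att" "new"   // overwrites the earlier "att" in the new map only

let xml = printNode updated
printfn "%s" xml
```

Note that a Map iterates in key order, so attributes are printed sorted by name rather than in insertion order.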

Related

Type inference F# - how to generate fresh variables?

I'm trying to implement Algorithm W for type inference in F#, and I would like to understand how to properly write the function that generates fresh variables.
Currently my function is:
let counter = ref -1
let generate_fresh_variable () : string =
    let list_variables = ['a' .. 'z'] |> List.map string
    counter.Value <- !counter + 1
    list_variables.Item(!counter)
but I'm not satisfied with this solution. Can someone suggest better ideas?
If you really want to do this with an impure function, I would write it like this:
let mutable counter = -1
let generate_fresh_variable () =
    counter <- counter + 1
    counter + int 'a'
    |> char
    |> string
Notes:
Reference cells are rarely needed in current F#. If you need impurity, prefer mutable variables instead. (If you do want to stick with a reference cell, note that conventions have shifted: older code updates it with := and reads it with !, while F# 6 and later recommend going through .Value and flag the operators with informational messages.)
There's no need to maintain a list of potential variable names (and there's especially no need to rebuild the entire list each time you generate a fresh variable).
What happens if you need more than 26 variables?
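For comparison, here is a sketch of a pure alternative in which the counter is passed in and the next counter is returned, so nothing is mutated. The names nameOf and fresh and the numbering scheme after 'z' ("a1", "b1", ...) are assumptions made for this illustration:

```fsharp
/// Turns a counter into a name: 0 -> "a", 25 -> "z", 26 -> "a1", 27 -> "b1", ...
let nameOf (n: int) : string =
    let letter = string (char (int 'a' + n % 26))
    if n < 26 then letter else letter + string (n / 26)

/// Pure: returns the fresh name together with the next counter state.
let fresh (counter: int) : string * int = nameOf counter, counter + 1

let first, c1 = fresh 0
let second, c2 = fresh c1

printfn "%s %s (next counter: %d)" first second c2
```

The caller has to thread the counter through the inference code explicitly, which is more verbose but keeps every function referentially transparent.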
If you wanted to use some more sophisticated F# tricks, you could create an infinite sequence of names using a sequence expression (which makes it very easy to handle the looping and dealing with >26 names):
let names = seq {
    for i in Seq.initInfinite id do
        for c in 'a' .. 'z' do
            if i = 0 then yield string c
            else yield string c + string i }
A function to get the fresh name would then pick the next name from the sequence. You need to do this using the underlying enumerator. Another nice trick is to hide the state in a local variable and return a function using lambda:
let freshName =
    let en = names.GetEnumerator()
    fun () ->
        ignore(en.MoveNext())
        en.Current
Then just call freshName() as many times as you need.
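For instance, drawing 27 names shows the rollover after 'z'. The sketch repeats the two definitions so that it runs on its own; only the drawn value is new:

```fsharp
let names = seq {
    for i in Seq.initInfinite id do
        for c in 'a' .. 'z' do
            if i = 0 then yield string c
            else yield string c + string i }

let freshName =
    let en = names.GetEnumerator()   // the enumerator is the hidden state
    fun () ->
        ignore (en.MoveNext())
        en.Current

// 26 single-letter names, then the first numbered one.
let drawn = List.init 27 (fun _ -> freshName ())

printfn "%s ... %s, %s" drawn.[0] drawn.[25] drawn.[26]
```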

F# hidden mutation

Anyone have a decent example, preferably practical/useful, they could post demonstrating the concept?
I came across this term somewhere that I'm now unable to find; it probably has something to do with a function returning a function that closes over some mutable variable, so there's no visible mutation.
The idea probably originated in the Haskell community, where mutation happens in an area not visible to the surrounding scope. I may be vague here, so I'm seeking help to understand more.
It's a good idea to hide mutation, so that consumers of the API won't inadvertently change something. This just means that you have to encapsulate your mutable data/state. This can be done via objects (yes, objects), but what you are referring to in your question can be done with a closure. The canonical example is a counter:
let countUp =
    // a closure cannot capture a 'let mutable' local, so the state lives in a ref cell
    let count = ref 0
    fun () ->
        count.Value <- count.Value + 1
        count.Value
countUp() // 1
countUp() // 2
countUp() // 3
You cannot access the mutable count variable directly.
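A small variation on the same idea, as a sketch: wrapping the closure in a factory function (makeCounter is a hypothetical name) gives every caller its own private state, which makes the encapsulation easier to see:

```fsharp
// Each call to makeCounter allocates a new, private ref cell.
let makeCounter () =
    let count = ref 0
    fun () ->
        count.Value <- count.Value + 1
        count.Value

let counterA = makeCounter ()
let counterB = makeCounter ()

let a1 = counterA ()
let a2 = counterA ()
let b1 = counterB ()   // unaffected by the calls on counterA

printfn "a1=%d a2=%d b1=%d" a1 a2 b1
```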
Another example would be using mutable state within a function so that you cannot observe it, and the function is, for all intents and purposes, referentially transparent. Take for example the following function that reverses a string not character-wise, but rather by taking individual text elements (which, depending on language, can be more than one character):
let reverseStringU s =
    if System.String.IsNullOrEmpty s then s else
    // collect the text elements; prepending to acc leaves them in reverse order
    let rec iter acc (ee : System.Globalization.TextElementEnumerator) =
        if not <| ee.MoveNext () then acc else
        let e = ee.GetTextElement ()
        iter (e :: acc) ee
    let inline append x s = (^s : (member Append : ^x -> ^s) (s, x))
    let sb = System.Text.StringBuilder s.Length
    System.Globalization.StringInfo.GetTextElementEnumerator s
    |> iter []
    |> List.fold (fun a e -> append e a) sb
    |> string
It uses a StringBuilder internally but you cannot observe this externally.
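Memoization is another common case of mutation that callers cannot observe, and it happens to use a mutable .NET Dictionary internally. A minimal sketch; memoize, slowSquare and the calls counter are hypothetical names, and the counter exists only to make the caching visible:

```fsharp
open System.Collections.Generic

// Wraps f so that each distinct argument is computed at most once.
// The dictionary is mutated, but it is private to the returned closure.
let memoize (f: 'a -> 'b) : 'a -> 'b =
    let cache = Dictionary<'a, 'b>()
    fun x ->
        match cache.TryGetValue x with
        | true, v -> v
        | _ ->
            let v = f x
            cache.[x] <- v
            v

// Instrumentation for the demo only: counts real evaluations.
let calls = ref 0
let slowSquare (x: int) =
    calls.Value <- calls.Value + 1
    x * x

let fastSquare = memoize slowSquare
let r1 = fastSquare 12
let r2 = fastSquare 12   // served from the cache

printfn "%d %d (evaluations: %d)" r1 r2 calls.Value
```

This sketch is not thread-safe; concurrent callers would need a ConcurrentDictionary or a lock.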

Avoiding using Option.Value

I have a type like this:
type TaskRow =
    {
        RowIndex : int
        TaskId : string
        Task : Task option
    }
A function returns a list of these records to be processed further. Some of the functions doing that processing are only relevant for TaskRow items where Task is Some. I'm wondering what the best way is to go about that.
The naive way would be doing
let taskRowsWithTasks = taskRows |> Seq.filter (fun row -> Option.isSome row.Task)
and passing that to those functions, simply assuming that Task will never be None and using Task.Value, risking an NRE if I don't pass in that one special list. That is exactly what the current C# code does but seems rather unidiomatic for F#. I shouldn't be 'assuming' things but rather let the compiler tell me what will work.
More 'functional' would be to pattern match every time the value is relevant and then do/return nothing (and use choose or the like) for None, but that seems repetitive and wasteful as the same work would be done multiple times.
Another thought was introducing a second, slightly different type:
type TaskRowWithTask =
    {
        RowIndex : int
        TaskId : string
        Task : Task
    }
The original list would then be filtered into a 'sublist' of this type, to be used where appropriate. I guess that would be okay from a functional perspective, but I wonder whether there's a nicer, idiomatic way without resorting to this kind of 'helper type'.
Thanks for any pointers!
There's quite a bit of value knowing that the tasks have already been filtered, so having two different types can be helpful. Instead of defining two different types (which, in F#, isn't that big a deal, though), you could also consider defining a generic Row type:
type Row<'a> = {
    RowIndex : int
    TaskId : string
    Item : 'a }
This enables you to define a projection like this:
let project = function
    | { RowIndex = ridx; TaskId = tid; Item = Some t } ->
        Some { RowIndex = ridx; TaskId = tid; Item = t }
    | _ -> None
let taskRowsWithTasks =
    taskRows
    |> Seq.map project
    |> Seq.choose id
If the initial taskRows value has the type seq<Row<Task option>>, then the resulting taskRowsWithTasks sequence has the type seq<Row<Task>>.
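A self-contained sketch of this approach. Since the Task type from the question is not shown, a plain string stands in for it here, and the sample rows are invented for illustration. The sketch also passes project directly to List.choose, which should be equivalent to mapping and then choosing with id:

```fsharp
type Row<'a> = { RowIndex: int; TaskId: string; Item: 'a }

// Row<'a option> -> Row<'a> option: keeps the row only if it carries an item.
let project (row: Row<'a option>) : Row<'a> option =
    row.Item
    |> Option.map (fun item -> { RowIndex = row.RowIndex; TaskId = row.TaskId; Item = item })

// Placeholder data: a string stands in for the real Task type.
let taskRows : Row<string option> list =
    [ { RowIndex = 0; TaskId = "T1"; Item = Some "write report" }
      { RowIndex = 1; TaskId = "T2"; Item = None }
      { RowIndex = 2; TaskId = "T3"; Item = Some "review code" } ]

// The type now guarantees that every row has a task; no Option.Value needed.
let taskRowsWithTasks : Row<string> list = taskRows |> List.choose project

for row in taskRowsWithTasks do
    printfn "%d %s %s" row.RowIndex row.TaskId row.Item
```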
I agree with you, the more "pure functional" way is to repeat the pattern match, I mean use a function with Seq.choose that does the filtering, instead of saving it to another structure.
let tasks = Seq.choose (fun {Task = t} -> t) taskRows
The problem is performance as it would be calculated many times, but you can use Seq.cache so behind the scenes it's saved into an intermediate structure, while keeping your code more "pure functional" looking.

Where/how to declare the unique key of variables in a compiler written in Ocaml?

I am writing a compiler of mini-pascal in Ocaml. I would like my compiler to accept the following code for instance:
program test;
var
  a,b : boolean;
  n : integer;
begin
  ...
end.
I have difficulties in dealing with the declaration of variables (the part following var). At the moment, the type of variables is defined like this in sib_syntax.ml:
type s_var =
  { s_var_name: string;
    s_var_type: s_type;
    s_var_uniqueId: s_uniqueId (* key *) }
Here s_var_uniqueId (instead of s_var_name) is the unique key of the variables. My first question is where and how I could implement the mechanism for generating a new id (in practice, by increasing the biggest id by 1) every time I get a new variable. I am wondering if I should implement it in sib_parser.mly, which probably involves a static variable cur_id and a modification of the binding part; again, I don't know how to realize them in .mly. Or should I implement the mechanism at the next stage, in interpreter.ml? In that case, the question is how to make the .mly consistent with the type s_var: what s_var_uniqueId should I provide in the binding part?
Another question is about this part of statement in .mly:
id = IDENT COLONEQ e = expression
  { Sc_assign (Sle_var {s_var_name = id; s_var_type = St_void}, e) }
Here, I also need to provide the next level (the interpreter.ml) a variable of which I only know the s_var_name, so what could I do regarding its s_var_type and s_var_uniqueId here?
Could anyone help? Thank you very much!
The first question to ask yourself is whether you actually need a unique id. In my experience, they're almost never necessary or even useful. If what you're trying to do is make variables unique up to alpha-equivalence, then this should happen after parsing is complete, and will probably involve some form of DeBruijn indices instead of unique identifiers.
Either way, a function which returns a new integer identifier every time it is called is:
let unique =
  let last = ref 0 in
  fun () -> incr last ; !last
let one = unique () (* 1 *)
let two = unique () (* 2 *)
So, you can simply assign { ... ; s_var_uniqueId = unique () } in your Menhir rules.
The more important problem you're trying to solve here is that of variable binding. Variable x is defined in one location and used in another, and you need to determine that it happens to be the same variable in both places. There are many ways of doing this, one of them being to delay the binding until the interpreter. I'm going to show you how to deal with this during parsing.
First, I'm going to define a context: it's a set of variables that allows you to easily retrieve a variable based on its name. You might want to create it with hash tables or maps, but to keep things simple I will be using List.assoc here.
type s_context = {
  s_ctx_parent : s_context option ;
  s_ctx_bindings : (string * (int * s_type)) list ;
  s_ctx_size : int ;
}
let empty_context parent = {
  s_ctx_parent = parent ;
  s_ctx_bindings = [] ;
  s_ctx_size = 0
}
let bind v_name v_type ctx =
  (* List.assoc takes the key first, then the association list *)
  try let _ = List.assoc v_name ctx.s_ctx_bindings in
      failwith "Variable is already defined"
  with Not_found ->
    { ctx with
      s_ctx_bindings = (v_name, (ctx.s_ctx_size, v_type))
                       :: ctx.s_ctx_bindings ;
      s_ctx_size = ctx.s_ctx_size + 1 }
let rec find v_name ctx =
  try 0, List.assoc v_name ctx.s_ctx_bindings
  with Not_found ->
    match ctx.s_ctx_parent with
    | Some parent -> let depth, found = find v_name parent in
                     depth + 1, found
    | None -> failwith "Variable is not defined"
So, bind adds a new variable to the current context, find looks for a variable in the current context and its parents, and returns both the bound data and the depth at which it was found. So, you could have all global variables in one context, then all parameters of a function in another context that has the global context as its parent, then all local variables in a function (when you'll have them) in a third context that has the function's main context as the parent, and so on.
So, for instance, find "x" ctx will return something like 0, (3, St_int) where 0 is the DeBruijn index of the variable, 3 is the position of the variable in the context identified by the DeBruijn index, and St_int is the type.
type s_var = {
  s_var_deBruijn: int;
  s_var_type: s_type;
  s_var_pos: int
}
let find v_name ctx =
  let deBruijn, (pos, typ) = find v_name ctx in
  { s_var_deBruijn = deBruijn ;
    s_var_type = typ ;
    s_var_pos = pos }
Of course, you need your functions to store their context, and make sure that the first argument is the variable at position 0 within the context:
type s_fun =
  { s_fun_name: string;
    s_fun_type: s_type;
    s_fun_params: s_context;
    s_fun_body: s_block; }
let context_of_paramlist parent paramlist =
  List.fold_left
    (fun ctx (v_name,v_type) -> bind v_name v_type ctx)
    (empty_context parent)
    paramlist
Then, you can change your parser to take into account the context. The trick is that instead of returning an object representing part of your AST, most of your rules will return a function that takes a context as an argument and returns an AST node.
For instance:
int_expression:
  (* Constant : ignore the context *)
  | c = INT { fun _ -> Se_const (Sc_int c) }
  (* Variable : look for the variable inside the context *)
  | id = IDENT { fun ctx -> Se_var (find id ctx) }
  (* Subexpressions : pass the context to both *)
  | e1 = int_expression o = operator e2 = int_expression
    { fun ctx -> Se_binary (o, e1 ctx, e2 ctx) }
;
So, you simply propagate the context "down" recursively through the expressions. The only clever parts are those when new contexts are created (you don't have this syntax yet, so I'm just adding a placeholder):
| function_definition_expression (args, body)
  { fun ctx -> let ctx = context_of_paramlist (Some ctx) args in
               { s_fun_params = ctx ;
                 s_fun_body = body ctx } }
As well as the global context (the program rule itself does not return a function, but the block rule does, and so a context is created from the globals and provided).
prog:
  PROGRAM IDENT SEMICOLON
  globals = variables
  main = block
  DOT
  { let ctx = context_of_paramlist None globals in
    { globals = ctx;
      main = main ctx } }
All of this makes the implementation of your interpreter much easier due to the DeBruijn indices: you can have a "stack" which holds your values (of type value) defined as:
type stack = value array list
Then, reading and writing variable x is as simple as:
let read stack x =
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos)
let write stack x value =
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos) <- value
Also, since we made sure that function parameters are in the same order as their position in the function context, if you want to call function f and its arguments are stored in the array args, then constructing the stack is as simple as:
let inner_stack = args :: stack in
(* Evaluate f.s_fun_body with inner_stack here *)
But I'm sure you'll have a lot more questions to ask when you start working on your interpreter ;)
How to create a global id generator:
let unique =
  let counter = ref (-1) in
  fun () -> incr counter; !counter
Test:
# unique ();;
- : int = 0
# unique ();;
- : int = 1
Regarding your more general design question: it seems that your data representation does not faithfully represent the compiler phases. If you must return a type-aware data-type (with this field s_var_type) after the parsing phase, something is wrong. You have two choices:
Devise a more precise data representation for the post-parsing AST, one that is different from the post-typing AST and does not have those s_var_type fields. Typing would then be a conversion from the untyped to the typed AST. This is a clean solution that I would recommend.
Admit that you must break the data representation semantics because you don't have enough information at this stage, and try to be at peace with the idea of returning garbage such as St_void after the parsing phase, to reconstruct the correct information later. This is less typed (as you have an implicit assumption on your data which is not apparent in the type), more pragmatic, ugly but sometimes necessary. I don't think it's the right decision in this case, but you will encounter situations where it's better to be a bit less typed.
I think the specific choice of unique id handling design depends on your position on this more general question, and your concrete decisions about types. If you choose a finer-typed representation of the post-parsing AST, it's up to you whether to include unique ids or not (I would, because generating a unique ID is dead simple and doesn't need a separate pass, and I would rather slightly complicate the grammar productions than the typing phase). If you choose to hack the type field with a dummy value, it's also reasonable to do that for variable ids if you wish to, putting 0 as a dummy value and defining it later; but still I personally would do that in the parsing phase.

Cyclic lists in F#

Is it just me, or does F# not cater for cyclic lists?
I looked at the FSharpList<T> class via Reflector and noticed that neither the 'structural equals' nor the length method checks for cycles. If two such primitive functions do not check, I can only guess that most list functions do not either.
If cyclic lists are not supported, why is that?
Thanks
PS: Am I even looking at the right list class?
There are many different lists/collection types in F#.
F# list type. As Chris said, you cannot initialize a recursive value of this type, because the type is not lazy and not mutable (Immutability means that you have to create it at once and the fact that it's not lazy means that you can't use F# recursive values using let rec). As ssp said, you could use Reflection to hack it, but that's probably a case that we don't want to discuss.
Another type is seq (which is actually IEnumerable) or the LazyList type from PowerPack. These are lazy, so you can use let rec to create a cyclic value. However, (as far as I know) none of the functions working with them take cyclic lists into account - if you create a cyclic list, it simply means that you're creating an infinite list, so the result of (e.g.) map will be a potentially infinite list.
Here is an example for LazyList type:
#r "FSharp.PowerPack.dll"
// Valid use of value recursion
let rec ones = LazyList.consDelayed 1 (fun () -> ones)
Seq.take 5 ones // Gives [1; 1; 1; 1; 1]
The question is what data types can you define yourself. Chris shows a mutable list and if you write operations that modify it, they will affect the entire list (if you interpret it as an infinite data structure).
You can also define a lazy (potentially cyclic) data type and implement operations that handle cycles, so that when you create a cyclic list and project it into another list, the result is again a cyclic list (and not a potentially infinite data structure).
The type declaration may look like this (I'm using object type, so that we can use reference equality when checking for cycles):
type CyclicListValue<'a> =
    Nil | Cons of 'a * Lazy<CyclicList<'a>>
and CyclicList<'a>(value:CyclicListValue<'a>) =
    member x.Value = value
The following map function handles cycles - if you give it a cyclic list, it will return a newly created list with the same cyclic structure:
let map f (cl:CyclicList<_>) =
    // 'start' is the first element of the list (used for cycle checking)
    // 'l' is the list we're processing
    // 'lazyRes' is a function that returns the first cell of the resulting list
    // (which is not available on the first call, but can be accessed
    // later, because the list is constructed lazily)
    let rec mapAux start (l:CyclicList<_>) lazyRes =
        match l.Value with
        | Nil -> new CyclicList<_>(Nil)
        // last cell before the cycle closes: map its value too, then link back
        | Cons(v, rest) when rest.Value = start ->
            new CyclicList<_>(Cons(f v, lazy lazyRes()))
        | Cons(v, rest) ->
            let value = Cons(f v, lazy mapAux start rest.Value lazyRes)
            new CyclicList<_>(value)
    let rec res = mapAux cl cl (fun () -> res)
    res
The F# list type is essentially a linked list, where each node has a 'next'. This in theory would allow you to create cycles. However, F# lists are immutable. So you could never 'make' this cycle by mutation, you would have to do it at construction time. (Since you couldn't update the last node to loop around to the front.)
You could write this to do it, however the compiler specifically prevents it:
let rec x = 1 :: 2 :: 3 :: x;;
let rec x = 1 :: 2 :: 3 :: x;;
------------------------^^
stdin(1,25): error FS0260: Recursive values cannot appear directly as a construction of the type 'List`1' within a recursive binding. This feature has been removed from the F# language. Consider using a record instead.
If you do want to create a cycle, you could do the following:
> type CustomListNode = { Value : int; mutable Next : CustomListNode option };;
type CustomListNode =
  {Value: int;
   mutable Next: CustomListNode option;}
> let head = { Value = 1; Next = None };;
val head : CustomListNode = {Value = 1;
                             Next = null;}
> let head2 = { Value = 2; Next = Some(head) } ;;
val head2 : CustomListNode = {Value = 2;
                              Next = Some {Value = 1;
                                           Next = null;};}
> head.Next <- Some(head2);;
val it : unit = ()
> head;;
val it : CustomListNode = {Value = 1;
                           Next = Some {Value = 2;
                                        Next = Some ...;};}
The answer is the same for all languages with tail-call optimization and first-class functions (function types): it's easy to emulate cyclic structures.
let rec x = seq { yield 1; yield! x};;
This is the simplest way to emulate such a structure, using the laziness of seq.
Of course you can hack list representation as described here.
As was said before, your problem here is that the list type is immutable, and for a list to be cyclic you'd have to have it stick itself into its last element, so that doesn't work. You can use sequences, of course.
If you have an existing list and want to create an infinite sequence on top of it that cycles through the list's elements, here's how you could do it:
let round_robin lst =
    let rec inner_rr l =
        seq {
            match l with
            | [] ->
                yield! inner_rr lst
            | h::t ->
                yield h
                yield! inner_rr t
        }
    if lst.IsEmpty then Seq.empty else inner_rr []
let listcycler_sequence = round_robin [1;2;3;4;5;6]
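An alternative sketch of the same cycling behaviour without recursion, assuming it is acceptable to copy the list into an array once. The function name cycle is hypothetical:

```fsharp
// Produces an endless sequence that repeats the items in order.
// An empty input yields an empty sequence instead of looping forever.
let cycle (items: 'a list) : seq<'a> =
    match items with
    | [] -> Seq.empty
    | _ ->
        let arr = Array.ofList items
        Seq.initInfinite (fun i -> arr.[i % arr.Length])

// Because the sequence is lazy, only the requested prefix is ever computed.
let firstSeven = cycle [1; 2; 3] |> Seq.take 7 |> List.ofSeq

printfn "%A" firstSeven
```

Taking seven elements of the cycled [1; 2; 3] gives [1; 2; 3; 1; 2; 3; 1]. As with any infinite sequence, functions that consume the whole input (Seq.length, Seq.toList without a preceding Seq.take) will not terminate.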

Resources