List of Dictionaries vs Sequence of Dictionaries - F#

I am having trouble understanding the difference between F#'s List and Seq in this example. I thought that the main difference was that Seq was kind of lazy, but I must be missing something.
This code snippet:
open System.Collections.Generic
let arr =
    ["a"; "b"; "c"]
    |> Seq.map (fun a -> let dic = Dictionary () in dic.Add("key", a); dic)
arr
|> Seq.iter (fun a ->
    printfn "here"
    a.["key"] <- "something")
arr
|> Seq.iter (fun a -> printfn "%s" a.["key"])
Gives
here
here
here
a
b
c
Whereas (replacing the first Seq with List)
open System.Collections.Generic
let arr =
    ["a"; "b"; "c"]
    |> List.map (fun a -> let dic = Dictionary () in dic.Add("key", a); dic)
arr
|> Seq.iter (fun a ->
    a.["key"] <- "something")
arr
|> Seq.iter (fun a -> printfn "%s" a.["key"])
Gives
something
something
something
Why do the Dictionary values not change when I use Seq? The elements are clearly visited as the here is printed.
Thanks in advance.

The reason is precisely that Seq is "kind of lazy", as you put it.
It's "lazy" in the sense that it gets evaluated every single time you ask it to. All of it. Up to the last non-lazy thing.
In particular, the call to Seq.map is lazy. It does not create a new structure in memory that is full of dictionaries. Instead, it creates something that you could call a "pipeline". This pipeline starts with your list ["a"; "b"; "c"] and then there is an instruction: every time somebody tries to iterate over this sequence, create a new dictionary for every element. The "every time" bit is important there - since you're iterating over the sequence twice (once to print "here" and another time to print the values), the dictionaries get created twice as well. The dictionary into which you push "something" and the dictionary from which you obtain "key" are not the same dictionary.
To illustrate further, try this:
let s = ["a";"b";"c"] |> Seq.map( fun x -> printfn "got %s" x; x )
s |> Seq.iter(printfn "here's %s")
s |> Seq.iter(printfn "again %s")
This will print the following:
got a
here's a
got b
here's b
got c
here's c
got a
again a
got b
again b
got c
again c
See how the "got" output happens twice for each element? That's because Seq.map works every time you iterate, not just once.
Not so with lists. Every time you call List.map, you create a whole new list in memory. It just sits there forever (where "forever" is defined as "until the garbage collector gets to it") and waits for you to do something with it. If you do multiple things with it, it's still the same list; it doesn't get re-created. That is why your dictionaries are always the same dictionaries: they don't get created anew, like the ones in Seq. That is why you can modify them and see the modifications next time you look.
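One way to check this directly is to compare object identity. The sketch below is only an illustration: makeDic is a hypothetical helper that is not part of the question's code, and the two collections are built from the same input purely for comparison.

```fsharp
open System.Collections.Generic

// Hypothetical helper: wrap a value in a fresh single-entry dictionary.
let makeDic (value: string) =
    let dic = Dictionary<string, string>()
    dic.Add("key", value)
    dic

let lazyDics = ["a"; "b"; "c"] |> Seq.map makeDic
let eagerDics = ["a"; "b"; "c"] |> List.map makeDic

// Each Seq.head call starts a new enumeration, which runs makeDic again,
// so the two results are different objects.
let seqSame = obj.ReferenceEquals(Seq.head lazyDics, Seq.head lazyDics)

// The list was built once, so both calls return the same stored object.
let listSame = obj.ReferenceEquals(List.head eagerDics, List.head eagerDics)

printfn "seq: %b, list: %b" seqSame listSame
```

Under these assumptions seqSame comes out false and listSame comes out true, which matches the behaviour described above.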
You can achieve a similar, though not quite identical effect with sequences with the help of Seq.cache. This function takes a regular on-demand-evaluating sequence and returns you a sequence that is identical, except every element only gets evaluated once.
Unlike a list though, Seq.cache will not evaluate the whole sequence the moment it's called. Instead, it will create a mutable cache, which gets updated every time you evaluate.
This is useful for cases when a sequence is very large, or even infinite, but you only need to work with a small finite number of elements at the start of it.
Illustration:
let s =
    ["a";"b";"c"]
    |> Seq.map( fun x -> printfn "got %s" x; x )
    |> Seq.cache
s |> Seq.iter(printfn "here's %s")
s |> Seq.iter(printfn "again %s")
Output:
got a
here's a
got b
here's b
got c
here's c
again a
again b
again c
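The large-or-infinite case mentioned above can be sketched in the same way. This is a minimal example and not part of the original answer: the names evaluations and squares are made up for illustration, and a ref cell is used only to count how often the generator function runs.

```fsharp
// Counts how many times the generator function has been invoked.
let evaluations = ref 0

// An infinite sequence of squares, wrapped in Seq.cache.
let squares =
    Seq.initInfinite (fun i ->
        evaluations.Value <- evaluations.Value + 1
        i * i)
    |> Seq.cache

// Only the elements that are actually requested get evaluated.
let firstThree = squares |> Seq.take 3 |> Seq.toList

// The first three elements come from the cache; only the new ones are computed.
let firstFive = squares |> Seq.take 5 |> Seq.toList

printfn "%A %A (generator calls: %d)" firstThree firstFive evaluations.Value
```

Nothing is computed when Seq.cache is called; the cache fills up as the sequence is consumed, so the infinite source is never a problem as long as only a finite prefix is requested.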

I added some printfns to both examples so you can see the difference:
open System.Collections.Generic
let arr =
    ["a"; "b"; "c"]
    |> Seq.map (fun a -> printfn "seq: %s" a
                         let dic = Dictionary ()
                         dic.Add("key", a)
                         dic)
arr
|> Seq.iter (fun a ->
    printfn "here seq"
    a.["key"] <- "something")
arr
|> Seq.iter (fun a -> printfn "%s" a.["key"])
produces the following output:
seq: a
here seq
seq: b
here seq
seq: c
here seq
seq: a
a
seq: b
b
seq: c
c
While this one:
let arr =
    ["a"; "b"; "c"]
    |> List.map (fun a -> printfn "list: %s" a
                          let dic = Dictionary ()
                          dic.Add("key", a)
                          dic)
arr
|> Seq.iter (fun a ->
    printfn "here list"
    a.["key"] <- "something")
arr
|> Seq.iter (fun a -> printfn "%s" a.["key"])
produces this output:
list: a
list: b
list: c
here list
here list
here list
something
something
something
As you can see the behavior is quite different.
Seq.map is lazy: it does no work when it is called, but builds a computation that runs only when the sequence is enumerated. Every enumeration starts from the beginning and maps each element as it is needed. Here the mapping function runs twice per element, once for each Seq.iter, and every run creates a new Dictionary, which is later discarded by the garbage collector.
On the other hand, List.map gets invoked only once and it goes over the whole input list creating a new list of dictionaries only one time.

Related

Why can't I go twice through the rows of a CSV provider?

In some languages after one goes through a lazy sequence it becomes exhausted. That is not the case with F#:
let mySeq = seq [1..5]
mySeq |> Seq.iter (fun x -> printfn "%A" <| x)
mySeq |> Seq.iter (fun x -> printfn "%A" <| x)
1
2
3
4
5
1
2
3
4
5
However, it looks like one can go only once through the rows of a CSV provider:
open FSharp.Data
[<Literal>]
let foldr = __SOURCE_DIRECTORY__ + @"\data\"
[<Literal>]
let csvPath = foldr + @"AssetInfoFS.csv"
type AssetsInfo = CsvProvider<Sample=csvPath,
                              HasHeaders=true,
                              ResolutionFolder=csvPath,
                              AssumeMissingValues=false,
                              CacheRows=false>
let assetInfo = AssetsInfo.Load(csvPath)
assetInfo.Rows |> Seq.iter (fun x -> printfn "%A" <| x) // Works fine 1st time
assetInfo.Rows |> Seq.iter (fun x -> printfn "%A" <| x) // 2nd time exception
Why does that happen?
According to the CSV Parser documentation, the CSV Type Provider is built on top of the CSV Parser. The CSV Parser works in streaming mode, most likely reading the file through a forward-only reader, which is why enumerating the rows a second time throws an exception. The CSV Parser also has a Cache method. Try setting CacheRows=true (or leaving it out of the declaration, since its default value is true) to avoid this issue:
CsvProvider<Sample=csvPath,
            HasHeaders=true,
            ResolutionFolder=csvPath,
            AssumeMissingValues=false,
            CacheRows=true>
The sequence iterator stays put where you point it; after the first loop, that is the end of the sequence.
If you want it to go back to the beginning, you have to set it there.
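The effect of a forward-only source can be reproduced without the CSV provider. The following is only an analogy under stated assumptions: a StringReader over in-memory text stands in for the file, the names reader and rows are made up, and unlike the provider this sketch yields an empty second pass instead of throwing.

```fsharp
open System.IO

// In-memory text stands in for the CSV file (assumption for illustration).
let reader = new StringReader("row1\nrow2\nrow3")

// Every enumeration restarts Seq.unfold, but the reader keeps its position.
let rows =
    Seq.unfold
        (fun () ->
            match reader.ReadLine() with
            | null -> None
            | line -> Some(line, ()))
        ()

// The first pass consumes the reader up to the end.
let firstPass = rows |> Seq.toList

// The second pass finds the reader already at the end and gets nothing.
let secondPass = rows |> Seq.toList

printfn "first: %A, second: %A" firstPass secondPass
```

Materializing the first pass (as firstPass does here) or caching the rows is what makes repeated iteration possible, which is the role CacheRows=true plays in the provider.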

why does Seq.isEmpty say not enough elements?

nums is indeed seq of int when I mouse over. Any idea what's going on?
This function line is intended to be the equivalent of C#'s DefaultIfEmpty Linq function.
The general idea is to take a space-delimited line of strings and write out which ones occur count number of times.
code:
open System
[<EntryPoint>]
let main argv =
    let tests = Console.ReadLine() |> int
    for i in [0..tests] do
        let (length, count) = Console.ReadLine()
                              |> (fun s -> s.Split [|' '|])
                              |> (fun split -> Int32.Parse(split.[0]), Int32.Parse(split.[1]))
        Console.ReadLine()
        |> (fun s -> s.Split [|' '|])
        |> Seq.map int
        |> Seq.take length
        |> Seq.groupBy (fun x -> x)
        |> Seq.map (fun (key, group) -> key, Seq.sum group)
        |> Seq.where (fun (_, countx) -> countx = count)
        |> Seq.map (fun (n, _) -> n)
        |> (fun nums -> if Seq.isEmpty nums then "-1" else String.Join(" ", nums))
        |> Console.WriteLine
    0 // return an integer exit code
Sample input:
3
9 2
4 5 2 5 4 3 1 3 4
So, sequences in F# use lazy evaluation. That means that when you use functions such as map, where, take etc, the results are not evaluated immediately.
The results are only evaluated when the sequence is actually enumerated. When you call Seq.isEmpty you trigger a call to MoveNext() which results in the first element of the result sequence being evaluated - in your case this results in a large chain of functions being evaluated.
In this case, the InvalidOperationException is actually being triggered by Seq.take which throws if the sequence doesn't have sufficient elements. This might surprise you coming from C# where Enumerable.Take will take up to the requested number of elements but could take fewer if you reach the end of the sequence.
If you want this behaviour in F#, you need to replace Seq.take with Seq.truncate.
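A small comparison of the two functions, as a sketch with made-up values (xs, truncated and takeResult are illustrative names only):

```fsharp
let xs = [4; 5; 2]

// Seq.truncate returns at most 5 elements and accepts a shorter input.
let truncated = xs |> Seq.truncate 5 |> Seq.toList

// Seq.take demands exactly 5 elements and fails once the input runs out.
// Because sequences are lazy, the failure only surfaces during enumeration
// (here triggered by Seq.toList), not when Seq.take itself is called.
let takeResult =
    try
        xs |> Seq.take 5 |> Seq.toList |> Some
    with :? System.InvalidOperationException -> None

printfn "truncate: %A, take: %A" truncated takeResult
```

This also shows why the exception in the question appears to come from Seq.isEmpty: that call is simply the first point at which the pipeline is enumerated.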

Merging two lists in F#

I wrote this function which merges two lists together but as I'm fairly new to functional programming I was wondering whether there is a better (simpler) way to do it?
let a = ["a"; "b"; "c"]
let b = ["d"; "b"; "a"]
let merge a b =
    // take all a and add b
    List.fold (fun acc elem ->
        let alreadyContains = acc |> List.exists (fun item -> item = elem)
        if alreadyContains = true then
            acc
        else
            elem :: acc |> List.rev
        ) b a
let test = merge a b
Expected result is: ["a"; "b"; "c"; "d"]. I'm reversing the list in order to keep the original order. I thought I would be able to achieve the same using List.foldBack (and dropping List.rev) but it results in an error:
Type mismatch. Expecting a
'a
but given a
'a list
The resulting type would be infinite when unifying ''a' and ''a list'
Why is there a difference when using foldBack?
You could use something like the following
let merge a b =
    a @ b
    |> Seq.distinct
    |> List.ofSeq
Note that this will preserve order and remove any duplicates.
In F# 4.0 this will be simplified to
let merge a b = a @ b |> List.distinct
If I wanted to write this in a way that is similar to your original version (using fold), then the main change I would make is to move List.rev outside of the function (you are calling List.rev every time you add a new element, which is wrong if you're adding an even number of elements!)
So, a solution very similar to yours would be:
let merge a b =
    (b, a)
    ||> List.fold (fun acc elem ->
        let alreadyContains = acc |> List.exists (fun item -> item = elem)
        if alreadyContains = true then acc
        else elem :: acc)
    |> List.rev
This uses the double-pipe operator ||> to pass two parameters to the fold function (this is not necessary, but I find it a bit nicer) and then passes the result to List.rev.
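As for the foldBack error in the question: a likely cause is the argument order. List.fold calls its function as (accumulator, element), whereas List.foldBack calls it as (element, accumulator) and takes the list before the initial state. If the lambda is kept as fun acc elem, the compiler treats acc as an element, and elem :: acc then forces an element to also be a list, hence the "infinite type" message. The sketch below is one possible foldBack version and not the code from the answers above; it merges a @ b and keeps the first occurrence of each item.

```fsharp
let a = ["a"; "b"; "c"]
let b = ["d"; "b"; "a"]

// List.foldBack walks the list from right to left.
// The folder receives the element first and the accumulator second.
let merge a b =
    List.foldBack
        // Put the element in front and drop any later copy of it,
        // so the leftmost occurrence of each item wins.
        (fun elem acc -> elem :: List.filter (fun item -> item <> elem) acc)
        (a @ b)
        []

let test = merge a b
printfn "%A" test
```

No List.rev is needed here because foldBack builds the result in the original order.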

Convert a sequence of dictionary keys to a set

The following code lists the set of keys found in a dictionary sequence (each dict is basically a row from a database). (I want to convert the keys to a set so I can compare 2 db tables)
for seqitem in tblseq do
    let keyset = seqitem.Keys |> Set.ofSeq // works correctly
    printfn ">>> List: %A" keyset
Rather than print the keyset however I want to return it from a function but am having a problem with type inference. Tried the following but it does not work;
What I want to do is return these values as either an array or a list (rather than print them)
let get_keyset tblseq =
    tblseq |> Seq.iter (fun x ->
        x.Keys |> Set.ofSeq
    )
What am I missing here?
Using Seq.map as ildjarn suggests is one option (you may want to add Array.ofSeq to the end to get an array of sets, as you say in your question).
An alternative approach is to use array comprehension:
let get_keyset (tblseq:seq<System.Collections.Generic.Dictionary<_, _>>) =
    [| for x in tblseq -> x.Keys |> Set.ofSeq |]
The notation [| .. |] says that you want to create an array of elements and the expression following -> specifies what should be produced as an element. The syntax is essentially just a nicer way for writing Seq.map (although it supports more features).
You can also use this syntax for creating sets (instead of calling Set.ofSeq). In this case, it doesn't make much sense, because Set.ofSeq is faster and shorter, but sometimes it is quite a neat option. It allows you to avoid type annotations, because you can get the key of a dictionary entry using the KeyValue pattern:
let get_keyset tblseq =
    [| for x in tblseq ->
        set [ for (KeyValue(k, v)) in x -> k ] |]
Use Seq.map rather than Seq.iter:
open System.Collections.Generic
let get_keyset tblseq =
    tblseq
    |> Seq.map (fun (x:Dictionary<_,_>) -> x.Keys |> set)
    |> Array.ofSeq

F# Manage multiple lazy sequences from a single method?

I am trying to figure out how to manage multiple lazy sequences from a single function in F#.
For example, in the code below, I am trying to get two sequences - one that returns all files in the directories, and one that returns a sequence of tuples of any directories that could not be accessed (for example due to permissions) with the exception.
While the below code compiles and runs, errorSeq never has any elements when used by other code, even though I know that UnauthorizedAccess exceptions have occurred.
I am using F# 2.0.
#light
open System.IO
open System
let rec allFiles errorSeq dir =
    Seq.append
        (try
            dir |> Directory.GetFiles
         with
            e -> Seq.append errorSeq [|(dir, e)|]
                 |> ignore
                 [||]
        )
        (try
            dir
            |> Directory.GetDirectories
            |> Seq.map (allFiles errorSeq)
            |> Seq.concat
         with
            e -> Seq.append errorSeq [|(dir, e)|]
                 |> ignore
                 Seq.empty
        )
[<EntryPoint>]
let main args =
    printfn "Arguments passed to function : %A" args
    let errorSeq = Seq.empty
    allFiles errorSeq args.[0]
    |> Seq.filter (fun x -> (Path.GetExtension x).ToLowerInvariant() = ".jpg")
    |> Seq.iter Console.WriteLine
    errorSeq
    |> Seq.iter (fun x ->
        Console.WriteLine("Error")
        x)
    0
If you wanted to take a more functional approach, here's one way to do it:
let rec allFiles (errorSeq, fileSeq) dir =
    let files, errs =
        try
            Seq.append (dir |> Directory.GetFiles) fileSeq, errorSeq
        with
            e -> fileSeq, Seq.append [dir,e] errorSeq
    let subdirs, errs =
        try
            dir |> Directory.GetDirectories, errs
        with
            e -> [||], Seq.append [dir,e] errs
    Seq.fold allFiles (errs, files) subdirs
Now we pass the sequence of errors and the sequence of files into the function each time and return new sequences created by appending to them within the function. I think that the imperative approach is a bit easier to follow in this case, though.
Seq.append returns a new sequence, so this
Seq.append errorSeq [|(dir, e)|]
|> ignore
[||]
has no effect. Perhaps you want your function to return a tuple of two sequences? Or use some kind of mutable collection to write errors as you encounter them?
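The mutable-collection option could look roughly like this. It is a sketch rather than a drop-in replacement: the errors parameter and the inner walk function are names introduced here, and a System.Collections.Generic.List is assumed as the collection.

```fsharp
open System
open System.IO
open System.Collections.Generic

// Lazily yields every file under root. Directories that cannot be read
// are recorded in the supplied mutable list instead of being lost.
let allFiles (errors: List<string * exn>) (root: string) =
    let rec walk (dir: string) : seq<string> =
        seq {
            let files =
                try
                    Directory.GetFiles dir
                with e ->
                    errors.Add((dir, e))
                    [||]
            yield! files
            let subdirs =
                try
                    Directory.GetDirectories dir
                with e ->
                    errors.Add((dir, e))
                    [||]
            for sub in subdirs do
                yield! walk sub
        }
    walk root

// Usage: the errors list fills up only while the file sequence is enumerated.
let errors = List<string * exn>()
let files = allFiles errors (Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString())) |> Seq.toList
printfn "files: %d, errors: %d" files.Length errors.Count
```

Because the file sequence is still lazy, the error list should be read only after the files have been consumed; reading it earlier would show it empty, much like errorSeq in the question.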

Resources