I have seq<Nullable<int>> and need to create nice plot, but without null values.
Here is my code:
open System
#r """..\packages\FSharp.Charting.0.90.14\lib\net40\FSharp.Charting.dll"""
#load """..\packages\FSharp.Charting.0.90.14\FSharp.Charting.fsx"""
open FSharp.Charting
//in a real world replaced by .csv with empty values
let seqWithNullInt = seq[Nullable 10 ; Nullable 20 ; Nullable (); Nullable 40; Nullable 50]
//let seqWithNullInt = seq[ 10 ; 20 ; 30; 40; 50] //works fine
let bothSeq = seqWithNullInt |> Seq.zip {1..5}
Chart.Line bothSeq // Error because of nullable int
And here is my vision:
How to skip null values? I don't want to replace them by nearest or something, I need to skip them from chart.. Is there any solution?
Something like this might work (note that I've used Option values instead of nullables, since that's more idiomatic in F#):
let neitherPairHasNoneInValue (pair1, pair2) =
pair1 |> snd |> Option.isSome && pair2 |> snd |> Option.isSome
let seqWithNone = Seq.ofList [Some 10; Some 20; None; Some 40; Some 50]
let pairsWithoutNone = seqWithNone
|> Seq.zip {1..5}
|> Seq.pairwise
|> Seq.filter neitherPairHasNoneInValue
printfn "%A" pairsWithoutNone
This will output [(1,10),(2,20) ; (4,40),(5,50)]. I don't know the FSharp.Charting API offhand so I can't tell you which function will take that list of X,Y pairs and draw the graph you want, but it should be relatively straightforward to get from there to your graph.
Related
I am looking for a type similar to sequences in F# where indices could be big integers, rather that being restricted to int. Does there exist anything like this?
By "big integer indices" I mean a type which allows for something equivalent to that:
let s = Seq.initInfinite (fun i -> i + 10I)
The following will generate an infinite series of bigints:
let s = Seq.initInfinite (fun i -> bigint i + 10I)
What i suspect you actually want though is a Map<'Key, 'Value>.
This lets you efficiently use a bigint as an index to look up whatever value it is you care about:
let map =
seq {
1I, "one"
2I, "two"
3I, "three"
}
|> Map.ofSeq
// val map : Map<System.Numerics.BigInteger,string> =
// map [(1, "one"); (2, "two"); (3, "three")]
map.TryFind 1I |> (printfn "%A") // Some "one"
map.TryFind 4I |> (printfn "%A") // None
The equivalent of initInfinite for BigIntegers would be
let inf = Seq.unfold (fun i -> let n = i + bigint.One in Some(n, n)) bigint.Zero
let biggerThanAnInt = inf |> Seq.skip (Int32.MaxValue) |> Seq.head // 2147483648
which takes ~2 min to run on my machine.
However, I doubt this is of any practical use :-) That is unless you start at some known value > Int32.MaxValue and stop reasonably soon (generating less than Int32.MaxValue items), which then could be solved by offsetting the BigInt indexes into the Int32 domain.
Theoretically you could amend the Seq module with functions working with BigIntegers to skip / window / ... an amount of items > Int32.MaxValue (e.g. by repeatedly performing the corresponding Int32 variant)
Since you want to index into a sequence, I assume you want a version of Seq.item that takes a BigInteger as index. There's nothing like that built into F#, but it's easy to define your own:
open System.Numerics
module Seq =
let itemI (index : BigInteger) source =
source |> Seq.item (int index)
Note that no new type is needed unless you're planning to create sequences that are longer than 2,147,483,647 items, which would probably not be practical anyway.
Usage:
let items = [| "moo"; "baa"; "oink" |]
items
|> Seq.itemI 2I
|> printfn "%A" // output: "oink"
My data is a SEQUENCE of:
[(40,"TX");(48,"MO");(15,"TX");(78,"TN");(41,"VT")]
My code is as follows:
type Csvfile = CsvProvider<somefile>
let data = Csvfile.GetSample().Rows
let nullid row =
row.Id = 15
let otherid row =
row.Id= 40
let iddata =
data
|> Seq.filter (not nullid)
|> Seq.filter (not otherid)
I create the functions.
Then I want to call the "not" of those functions to filter them out of a sequence.
But the issue is that I am getting errors for "row.Id" in the first two functions, because you can only do that with a type.
How do I solve this problem so I can accomplish this successfully.
My result should be a SEQUENCE of:
[(48,"MO);(78,"TN");(41,"VT")]
You can use >> operator to compose the two functions:
let iddata =
data
|> Seq.filter (nullid >> not)
|> Seq.filter (othered >> not)
See Function Composition and Pipelining.
Or you can make it more explicit:
let iddata =
data
|> Seq.filter (fun x -> not (nullid x))
|> Seq.filter (fun x -> not (othered x))
You can see that in action:
let input = [|1;2;3;4;5;6;7;8;9;10|];;
let is3 value =
value = 3;;
input |> Seq.filter (fun x -> not (is3 x));;
input |> Seq.filter (not >> is3);;
They both print val it : seq<int> = seq [1; 2; 4; 5; ...]
Please see below what an MCVE might look in your case, for an fsx file you can reference the Fsharp.Data dll with #r, for a compiled project just reference the dll an open it.
#if INTERACTIVE
#r #"..\..\SO2018\packages\FSharp.Data\lib\net45\FSharp.Data.dll"
#endif
open FSharp.Data
[<Literal>]
let datafile = #"C:\tmp\data.csv"
type CsvFile = CsvProvider<datafile>
let data = CsvFile.GetSample().Rows
In the end this is what you want to achieve:
data
|> Seq.filter (fun x -> x.Id <> 15)
|> Seq.filter (fun x -> x.Id <> 40)
//val it : seq<CsvProvider<...>.Row> = seq [(48, "MO"); (78, "TN"); (41, "VT")]
One way to do this is with SRTP, as they allow a way to do structural typing, where the type depends on its shape, for example in this case having the Id property. If you want you can define helper function for the two numbers 15 and 40, and use that in your filter, just like in the second example. However SRTP syntax is a bit strange, and it's designed for a use case where you need to apply a function to different types that have some similarity (basically like interfaces).
let inline getId row =
(^T : (member Id : int) row)
data
|> Seq.filter (fun x -> (getId x <> 15 ))
|> Seq.filter (fun x -> (getId x <> 40))
//val it : seq<CsvProvider<...>.Row> = seq [(48, "MO"); (78, "TN"); (41, "VT")]
Now back to your original post, as you correctly point out your function will show an error, as you define it to be generic, but it needs to operate on a specific Csv row type (that has the Id property). This is very easy to fix, just add a type annotation to the row parameter. In this case your type is CsvFile.Row, and since CsvFile.Row has the Id property we can access that in the function. Now this function returns a Boolean. You could make it return the actual row as well.
let nullid (row: CsvFile.Row) =
row.Id = 15
let otherid (row: CsvFile.Row) =
row.Id = 40
Then what is left is applying this inside a Seq.filter and negating it:
let iddata =
data
|> Seq.filter (not << nullid)
|> Seq.filter (not << otherid)
|> Seq.toList
//val iddata : CsvProvider<...>.Row list = [(48, "MO"); (78, "TN"); (41, "VT")]
I'm loading a sequence of records into a deedle data frame (from a database table). Is it possible to accumulate (for example sum cumulatively) the values, and get back a data frame? For example there is Series.scanValues but there is no Frame.scanValues. There is Frame.map, but it didn't do what I expected, it left all values as they were.
#if INTERACTIVE
#r #"Fsharp.Charting"
#load #"..\..\Deedle.fsx"
#endif
open FSharp.Charting
open FSharp.Charting.ChartTypes
open Deedle
type SeriesX = {
DataDate:DateTime
Series1:float
Series2:float
Series3:float
}
let rnd = new System.Random()
rnd.NextDouble() - 0.5
let data =
[for i in [100..-1..1] ->
{SeriesX.DataDate = DateTime.Now.AddDays(float -i)
SeriesX.Series1 = rnd.NextDouble() - 0.5
SeriesX.Series2 = rnd.NextDouble() - 0.5
SeriesX.Series3 = rnd.NextDouble() - 0.5
}
]
# now comes the deedle frame:
let df = data |> Frame.ofRecords
let df = df.IndexRows<DateTime>("DataDate")
df.["Series1"] |> Chart.Line
df.["Series1"].ScanValues((fun acc x -> acc + x),0.0) |> Chart.Line
let df' = df |> Frame.mapValues (Seq.scan (fun acc x -> acc + x) 0.0)
df'.["Series1"] |> Chart.Line
The last two lines just give me back the original values while I would like to have the accumulated values like in df.["Series1"].Scanvalues for Series1, Series2, and Series3.
For filtering and projection, series provides Where and Select methods
and corresponding Series.map and Series.filter functions (there is
also Series.mapValues and Series.mapKeys if you only want to transform
one aspect).
So you just apply your function to each Series:
let allSum =
df.Columns
|> Series.mapValues(Series.scanValues(fun acc v -> acc + (v :?> float)) 0.0)
|> Frame.ofColumns
and use Frame.ofColumns that to convert the result to the Frame.
Edit:
If you need to select only numerics columns, you can use the Frame.getNumericCols:
let allSum =
df
|> Frame.getNumericCols
|> Series.mapValues(Series.scanValues (+) 0.0)
|> Frame.ofColumns
without an explicit type cast code has become more beautiful :)
There is a Series.scanValues function. You can obtain a series from every column in your data frame like this: frame$column, which gets you a Series.
If you need all the columns at once to do the scan, you could first map each row into a single value (a tuple, for example) and the apply the Series.scanValues to that new column.
I'm trying to implement Kosaraju's algorithm on a large graph
as part of an assignment [MOOC Algo I Stanford on Coursera]
https://en.wikipedia.org/wiki/Kosaraju%27s_algorithm
The current code works on a small graph, but I'm hitting Stack Overflow during runtime execution.
Despite having read the relevant chapter in Expert in F#, or other available examples on websites and SO, i still don't get how to use continuation to solve this problem
Below is the full code for general purpose, but it will already fail when executing DFSLoop1 and the recursive function DFSsub inside. I think I'm not making the function tail recursive [because of the instructions
t<-t+1
G.[n].finishingtime <- t
?]
but i don't understand how i can implement the continuation properly.
When considering only the part that fails, DFSLoop1 is taking as argument a graph to which we will apply Depth-First Search. We need to record the finishing time as part of the algo to proceed to the second part of the algo in a second DFS Loop (DFSLoop2) [of course we are failing before that].
open System
open System.Collections.Generic
open System.IO
let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - SCC.txt";;
// let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - test1.txt";;
// val x : string [] =
let splitAtTab (text:string)=
text.Split [|'\t';' '|]
let splitIntoKeyValue (A: int[]) =
(A.[0], A.[1])
let parseLine (line:string)=
line
|> splitAtTab
|> Array.filter (fun s -> not(s=""))
|> Array.map (fun s-> (int s))
|> splitIntoKeyValue
let y =
x |> Array.map parseLine
//val it : (int * int) []
type Children = int[]
type Node1 =
{children : Children ;
mutable finishingtime : int ;
mutable explored1 : bool ;
}
type Node2 =
{children : Children ;
mutable leader : int ;
mutable explored2 : bool ;
}
type DFSgraphcore = Dictionary<int,Children>
let directgraphcore = new DFSgraphcore()
let reversegraphcore = new DFSgraphcore()
type DFSgraph1 = Dictionary<int,Node1>
let reversegraph1 = new DFSgraph1()
type DFSgraph2 = Dictionary<int,Node2>
let directgraph2 = new DFSgraph2()
let AddtoGraph (G:DFSgraphcore) (n,c) =
if not(G.ContainsKey n) then
let node = [|c|]
G.Add(n,node)
else
let c'= G.[n]
G.Remove(n) |> ignore
G.Add (n, Array.append c' [|c|])
let inline swaptuple (a,b) = (b,a)
y|> Array.iter (AddtoGraph directgraphcore)
y|> Array.map swaptuple |> Array.iter (AddtoGraph reversegraphcore)
for i in directgraphcore.Keys do
if reversegraphcore.ContainsKey(i) then do
let node = {children = reversegraphcore.[i] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
else
let node = {children = [||] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
directgraphcore.Clear |> ignore
reversegraphcore.Clear |> ignore
// for i in reversegraph1.Keys do printfn "%d %A" i reversegraph1.[i].children
printfn "pause"
Console.ReadKey() |> ignore
let num_nodes =
directgraphcore |> Seq.length
let DFSLoop1 (G:DFSgraph1) =
let mutable t = 0
let mutable s = -1
let mutable k = num_nodes
let rec DFSsub (G:DFSgraph1)(n:int) (cont:int->int) =
//how to make it tail recursive ???
G.[n].explored1 <- true
// G.[n].leader <- s
for j in G.[n].children do
if not(G.[j].explored1) then DFSsub G j cont
t<-t+1
G.[n].finishingtime <- t
// end of DFSsub
for i in num_nodes .. -1 .. 1 do
printfn "%d" i
if not(G.[i].explored1) then do
s <- i
( DFSsub G i (fun s -> s) ) |> ignore
// printfn "%d %d" i G.[i].finishingtime
DFSLoop1 reversegraph1
printfn "pause"
Console.ReadKey() |> ignore
for i in directgraphcore.Keys do
let node = {children =
directgraphcore.[i]
|> Array.map (fun k -> reversegraph1.[k].finishingtime) ;
leader = -1 ;
explored2= false ;
}
directgraph2.Add (reversegraph1.[i].finishingtime,node)
let z = 0
let DFSLoop2 (G:DFSgraph2) =
let mutable t = 0
let mutable s = -1
let mutable k = num_nodes
let rec DFSsub (G:DFSgraph2)(n:int) (cont:int->int) =
G.[n].explored2 <- true
G.[n].leader <- s
for j in G.[n].children do
if not(G.[j].explored2) then DFSsub G j cont
t<-t+1
// G.[n].finishingtime <- t
// end of DFSsub
for i in num_nodes .. -1 .. 1 do
if not(G.[i].explored2) then do
s <- i
( DFSsub G i (fun s -> s) ) |> ignore
// printfn "%d %d" i G.[i].leader
DFSLoop2 directgraph2
printfn "pause"
Console.ReadKey() |> ignore
let table = [for i in directgraph2.Keys do yield directgraph2.[i].leader]
let results = table |> Seq.countBy id |> Seq.map snd |> Seq.toList |> List.sort |> List.rev
printfn "%A" results
printfn "pause"
Console.ReadKey() |> ignore
Here is a text file with a simple graph example
1 4
2 8
3 6
4 7
5 2
6 9
7 1
8 5
8 6
9 7
9 3
(the one which is causing overflow is 70Mo big with around 900,000 nodes)
EDIT
to clarify a few things first
Here is the "pseudo code"
Input: a directed graph G = (V,E), in adjacency list representation. Assume that the vertices V are labeled
1, 2, 3, . . . , n.
1. Let Grev denote the graph G after the orientation of all arcs have been reversed.
2. Run the DFS-Loop subroutine on Grev, processing vertices according to the given order, to obtain a
finishing time f(v) for each vertex v ∈ V .
3. Run the DFS-Loop subroutine on G, processing vertices in decreasing order of f(v), to assign a leader
to each vertex v ∈ V .
4. The strongly connected components of G correspond to vertices of G that share a common leader.
Figure 2: The top level of our SCC algorithm. The f-values and leaders are computed in the first and second
calls to DFS-Loop, respectively (see below).
Input: a directed graph G = (V,E), in adjacency list representation.
1. Initialize a global variable t to 0.
[This keeps track of the number of vertices that have been fully explored.]
2. Initialize a global variable s to NULL.
[This keeps track of the vertex from which the last DFS call was invoked.]
3. For i = n downto 1:
[In the first call, vertices are labeled 1, 2, . . . , n arbitrarily. In the second call, vertices are labeled by
their f(v)-values from the first call.]
(a) if i not yet explored:
i. set s := i
ii. DFS(G, i)
Figure 3: The DFS-Loop subroutine.
Input: a directed graph G = (V,E), in adjacency list representation, and a source vertex i ∈ V .
1. Mark i as explored.
[It remains explored for the entire duration of the DFS-Loop call.]
2. Set leader(i) := s
3. For each arc (i, j) ∈ G:
(a) if j not yet explored:
i. DFS(G, j)
4. t + +
5. Set f(i) := t
Figure 4: The DFS subroutine. The f-values only need to be computed during the first call to DFS-Loop, and
the leader values only need to be computed during the second call to DFS-Loop.
EDIT
i have amended the code, with the help of an experienced programmer (a lisper but who has no experience in F#) simplifying somewhat the first part to have more quickly an example without bothering about non-relevant code for this discussion.
The code focuses only on half of the algo, running DFS once to get finishing times of the reversed tree.
This is the first part of the code just to create a small example
y is the original tree. the first element of a tuple is the parent, the second is the child. But we will be working with the reverse tree
open System
open System.Collections.Generic
open System.IO
let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - SCC.txt";;
// let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - test1.txt";;
// val x : string [] =
let splitAtTab (text:string)=
text.Split [|'\t';' '|]
let splitIntoKeyValue (A: int[]) =
(A.[0], A.[1])
let parseLine (line:string)=
line
|> splitAtTab
|> Array.filter (fun s -> not(s=""))
|> Array.map (fun s-> (int s))
|> splitIntoKeyValue
// let y =
// x |> Array.map parseLine
//let y =
// [|(1, 4); (2, 8); (3, 6); (4, 7); (5, 2); (6, 9); (7, 1); (8, 5); (8, 6);
// (9, 7); (9, 3)|]
// let y = Array.append [|(1,1);(1,2);(2,3);(3,1)|] [|for i in 4 .. 10000 do yield (i,4)|]
let y = Array.append [|(1,1);(1,2);(2,3);(3,1)|] [|for i in 4 .. 99999 do yield (i,i+1)|]
//val it : (int * int) []
type Children = int list
type Node1 =
{children : Children ;
mutable finishingtime : int ;
mutable explored1 : bool ;
}
type Node2 =
{children : Children ;
mutable leader : int ;
mutable explored2 : bool ;
}
type DFSgraphcore = Dictionary<int,Children>
let directgraphcore = new DFSgraphcore()
let reversegraphcore = new DFSgraphcore()
type DFSgraph1 = Dictionary<int,Node1>
let reversegraph1 = new DFSgraph1()
let AddtoGraph (G:DFSgraphcore) (n,c) =
if not(G.ContainsKey n) then
let node = [c]
G.Add(n,node)
else
let c'= G.[n]
G.Remove(n) |> ignore
G.Add (n, List.append c' [c])
let inline swaptuple (a,b) = (b,a)
y|> Array.iter (AddtoGraph directgraphcore)
y|> Array.map swaptuple |> Array.iter (AddtoGraph reversegraphcore)
// définir reversegraph1 = ... with....
for i in reversegraphcore.Keys do
let node = {children = reversegraphcore.[i] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
for i in directgraphcore.Keys do
if not(reversegraphcore.ContainsKey(i)) then do
let node = {children = [] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
directgraphcore.Clear |> ignore
reversegraphcore.Clear |> ignore
// for i in reversegraph1.Keys do printfn "%d %A" i reversegraph1.[i].children
printfn "pause"
Console.ReadKey() |> ignore
let num_nodes =
directgraphcore |> Seq.length
So basically the graph is (1->2->3->1)::(4->5->6->7->8->....->99999->10000)
and the reverse graph is (1->3->2->1)::(10000->9999->....->4)
here is the main code written in direct style
//////////////////// main code is below ///////////////////
let DFSLoop1 (G:DFSgraph1) =
let mutable t = 0
let mutable s = -1
let rec iter (n:int) (f:'a->unit) (list:'a list) : unit =
match list with
| [] -> (t <- t+1) ; (G.[n].finishingtime <- t)
| x::xs -> f x ; iter n f xs
let rec DFSsub (G:DFSgraph1) (n:int) : unit =
let my_f (j:int) : unit = if not(G.[j].explored1) then (DFSsub G j)
G.[n].explored1 <- true
iter n my_f G.[n].children
for i in num_nodes .. -1 .. 1 do
// printfn "%d" i
if not(G.[i].explored1) then do
s <- i
DFSsub G i
printfn "%d %d" i G.[i].finishingtime
// End of DFSLoop1
DFSLoop1 reversegraph1
printfn "pause"
Console.ReadKey() |> ignore
its not tail recursive, so we use continuations, here is the same code adapted to CPS style:
//////////////////// main code is below ///////////////////
let DFSLoop1 (G:DFSgraph1) =
let mutable t = 0
let mutable s = -1
let rec iter_c (n:int) (f_c:'a->(unit->'r)->'r) (list:'a list) (cont: unit->'r) : 'r =
match list with
| [] -> (t <- t+1) ; (G.[n].finishingtime <- t) ; cont()
| x::xs -> f_c x (fun ()-> iter_c n f_c xs cont)
let rec DFSsub (G:DFSgraph1) (n:int) (cont: unit->'r) : 'r=
let my_f_c (j:int)(cont:unit->'r):'r = if not(G.[j].explored1) then (DFSsub G j cont) else cont()
G.[n].explored1 <- true
iter_c n my_f_c G.[n].children cont
for i in maxnum_nodes .. -1 .. 1 do
// printfn "%d" i
if not(G.[i].explored1) then do
s <- i
DFSsub G i id
printfn "%d %d" i G.[i].finishingtime
DFSLoop1 reversegraph1
printfn "faré"
printfn "pause"
Console.ReadKey() |> ignore
both codes compile and give the same results for the small example (the one in comment) or the same tree that we are using , with a smaller size (1000 instead of 100000)
so i don't think its a bug in the algo here, we've got the same tree structure, just a bigger tree is causing problems. it looks to us the continuations are well written. we've typed the code explicitly. and all calls end with a continuation in all cases...
We are looking for expert advice !!! thanks !!!
I did not try to understand the whole code snippet, because it is fairly long, but you'll certainly need to replace the for loop with an iteration implemented using continuation passing style. Something like:
let rec iterc f cont list =
match list with
| [] -> cont ()
| x::xs -> f x (fun () -> iterc f cont xs)
I didn't understand the purpose of cont in your DFSub function (it is never called, is it?), but the continuation based version would look roughly like this:
let rec DFSsub (G:DFSgraph2)(n:int) cont =
G.[n].explored2 <- true
G.[n].leader <- s
G.[n].children
|> iterc
(fun j cont -> if not(G.[j].explored2) then DFSsub G j cont else cont ())
(fun () -> t <- t + 1)
Overflowing the stack when you recurse through hundreds of thousands of entries isn't bad at all, really. A lot of programming language implementations will choke on much shorter recursions than that. You're having serious programmer problems — nothing to be ashamed of!
Now if you want to do deeper recursions than your implementation will handle, you need to transform your algorithm so it is iterative and/or tail-recursive (the two are isomorphic — except that tail-recursion allows for decentralization and modularity, whereas iteration is centralized and non-modular).
To transform an algorithm from recursive to tail-recursive, which is an important skill to possess, you need to understand the state that is implicitly stored in a stack frame, i.e. those free variables in the function body that change across the recursion, and explicitly store them in a FIFO queue (a data structure that replicates your stack, and can be implemented trivially as a linked list). Then you can pass that linked list of reified frame variables as an argument to your tail recursive functions.
In more advanced cases where you have many tail recursive functions each with a different kind of frame, instead of simple self-recursion, you may need to define some mutually recursive data types for the reified stack frames, instead of using a list. But I believe Kosaraju's algorithm only involves self-recursive functions.
OK, so the code given above was the RIGHT code !
the problem lies with the compiler of F#
here is some words about it from Microsoft
http://blogs.msdn.com/b/fsharpteam/archive/2011/07/08/tail-calls-in-fsharp.aspx
Basically, be careful with the settings, in default mode, the compiler may NOT make automatically the tail calls. To do so, in VS2015, go to the Solution Explorer, right click with the mouse and click on "Properties" (the last element of the scrolling list)
Then in the new window, click on "Build" and tick the box "Generate tail calls"
It is also to check if the compiler did its job looking at the disassembly using
ILDASM.exe
you can find the source code for the whole algo in my github repository
https://github.com/FaguiCurtain/Learning-Fsharp/blob/master/Algo%20Stanford/Algo%20Stanford/Kosaraju_cont.fs
on a performance point of view, i'm not very satisfied. The code runs on 36 seconds on my laptop. From the forum with other fellow MOOCers, C/C++/C# typically executes in subsecond to 5s, Java around 10-15, Python around 20-30s.
So my implementation is clearly not optimized. I am now happy to hear about tricks to make it faster !!! thanks !!!!
I have an array of items, from which I'd like to sample.
I was under the impression that a Set would the a good structure to sample from, in a fold where I'd give back the original or a modified set with the retrieved element missing depending if I want replacement of not.
However, there seems to no method to retrieve an element directly from a Set.
Is there something I am missing ? or should I use Set of indices, along with a surrogate function that starts at some random position < Set.count and goes up until it finds a member ?
That is, something along this line
module Seq =
let modulo (n:int) start =
let rec next i = seq { yield (i + 1)%n ; yield! next (i+1)}
next start
module Array =
let Sample (withReplacement:bool) seed (entries:'T array) =
let prng, indexes = new Random(seed), Set(Seq.init (entries |> Array.length) id)
Seq.unfold (fun set -> let N = set |> Set.count
let next = Seq.modulo N (prng.Next(N)) |> Seq.truncate N |> Seq.tryFind(fun i -> set |> Set.exists ((=) i))
if next.IsSome then
Some(entries.[next.Value], if withReplacement then set else Set.remove next.Value set)
else
None)
Edit : Tracking positively what I gave, instead of tracking what I still can give would make it simpler and more efficient.
For sampling without replacement, you could just permute the source seq and take however many elements you want to sample
let sampleWithoutReplacement n s =
let a = Array.ofSeq s
seq { for i = a.Length downto 1 do
let j = rnd.Next i
yield a.[j]
a.[j] <- a.[i - 1] }
|> Seq.take n
To sample with replacement, just pick a random element n times from the source seq
let sampleWithReplacement n s =
let a = Array.ofSeq s
Seq.init n (fun _ -> a.[rnd.Next(a.Length)])
These may not be the most efficient methods with huge data sets however
Continuing our comments...if you want to randomly sample a sequence without slurping the entire thing into memory you could generate a set of random indices the size of your desired sample (not too different from what you already have):
let rand count max =
System.Random()
|> Seq.unfold (fun r -> Some(r.Next(max), r))
|> Seq.distinct
|> Seq.take count
|> set
let takeSample sampleSize inputSize input =
let indices = rand sampleSize inputSize
input
|> Seq.mapi (fun idx x ->
if Set.contains idx indices then Some x else None)
|> Seq.choose id
let inputSize = 100000
let input = Seq.init inputSize id
let sample = takeSample 50 inputSize input
printfn "%A" (Seq.toList sample)