Inconsistent IEnumerable ArgumentException while generating a complex object using FsCheck - f#

The Problem
In F#, I am using FsCheck to generate an object (which I'm then using in an Xunit test, but I can recreate entirely outside of Xunit, so I think we can forget about Xunit). Running the generation 20 times in FSI,
50% of the time, the generation runs successfully.
25% of the time, the generation throws:
System.ArgumentException: The input must be non-negative.
Parameter name: index
> at Microsoft.FSharp.Collections.SeqModule.Item[T](Int32 index, IEnumerable`1 source)
at FsCheck.GenBuilder.bind#62.Invoke(Int32 n, StdGen r0) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 63
at FsCheck.Gen.go#290-1[b](FSharpList`1 gs, FSharpList`1 acc, Int32 size, StdGen r0) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 295
at FsCheck.Gen.SequenceToList#297.Invoke(Int32 n, StdGen r) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 297
at FsCheck.GenBuilder.bind#62.Invoke(Int32 n, StdGen r0) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 63
at FsCheck.Gen.sample#155[a](Int32 size, Gen`1 gn, Int32 i, StdGen seed, FSharpList`1 samples) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 157
at FsCheck.Gen.Sample[a](Int32 size, Int32 n, Gen`1 gn) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 155
at <StartupCode$FSI_0026>.$FSI_0026.main#() in C:\projects\Alberta\Core\TestFunc\Script.fsx:line 57
Stopped due to error
25% of the time, the generation throws:
System.ArgumentException: The input sequence has an insufficient number of elements.
Parameter name: index
> at Microsoft.FSharp.Collections.IEnumerator.nth[T](Int32 index, IEnumerator`1 e)
at Microsoft.FSharp.Collections.SeqModule.Item[T](Int32 index, IEnumerable`1 source)
at FsCheck.GenBuilder.bind#62.Invoke(Int32 n, StdGen r0) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 63
at FsCheck.Gen.go#290-1[b](FSharpList`1 gs, FSharpList`1 acc, Int32 size, StdGen r0) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 295
at FsCheck.Gen.SequenceToList#297.Invoke(Int32 n, StdGen r) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 297
at FsCheck.GenBuilder.bind#62.Invoke(Int32 n, StdGen r0) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 63
at FsCheck.Gen.sample#155[a](Int32 size, Gen`1 gn, Int32 i, StdGen seed, FSharpList`1 samples) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 157
at FsCheck.Gen.Sample[a](Int32 size, Int32 n, Gen`1 gn) in C:\Users\Kurt\Projects\FsCheck\FsCheck\src\FsCheck\Gen.fs:line 155
at <StartupCode$FSI_0025>.$FSI_0025.main#() in C:\projects\Alberta\Core\TestFunc\Script.fsx:line 57
Stopped due to error
The Situation
The object is as follows:
type Event =
| InitEvent of string
| RefEvent of string
type Stream = Event seq
The object must be follow these rules to be valid:
All InitEvents must come before all RefEvents
All InitEvents strings must be unique
All RefEvent names must have an earlier corresponding InitEvent
But it's OK if some InitEvents NOT have later corresponding RefEvents
But it's OK if multiple RefEvents have the same name
The Working Workaround
If I have the generator call a function which returns a valid object and do a Gen.constant (function), I never run into the exceptions, but this is not the way FsCheck is meant to be run! :)
/// <summary>
/// This is a non-generator equivalent which is 100% reliable
/// </summary>
let randomStream size =
// valid names for a sample
let names = Gen.sample size size Arb.generate<string> |> List.distinct
// init events
let initEvents = names |> List.map( fun name -> name |> InitEvent )
// reference events
let createRefEvent name = name |> RefEvent
let genRefEvent = createRefEvent <!> Gen.elements names
let refEvents = Gen.sample size size genRefEvent
// combine
Seq.append initEvents refEvents
type MyGenerators =
static member Stream() = {
new Arbitrary<Stream>() with
override x.Generator = Gen.sized( fun size -> Gen.constant (randomStream size) )
}
// repeatedly running the following two lines ALWAYS works
Arb.register<MyGenerators>()
let foo = Gen.sample 10 10 Arb.generate<Stream>
The Broken Right Way?
I cannot seem to completely get away from generating a constant (need to store the list of names outside of the InitEvents so that RefEvent generation can get at them, but I can get more in line with how FsCheck generators seem to work:
type MyGenerators =
static member Stream() = {
new Arbitrary<Stream>() with
override x.Generator = Gen.sized( fun size ->
// valid names for a sample
let names = Gen.sample size size Arb.generate<string> |> List.distinct
// generate inits
let genInits = Gen.constant (names |> List.map InitEvent) |> Gen.map List.toSeq
// generate refs
let makeRef name = name |> RefEvent
let genName = Gen.elements names
let genRef = makeRef <!> genName
Seq.append <!> genInits <*> ( genRef |> Gen.listOf )
)
}
// repeatedly running the following two lines causes the inconsistent errors
// If I don't re-register my generator, I always get the same samples.
// Is this because FsCheck is trying to be deterministic?
Arb.register<MyGenerators>()
let foo = Gen.sample 10 10 Arb.generate<Stream>
What I've Already Checked
Sorry, forgot to mention in original question that I've tried to Debug in Interactive, and because of the inconsistent behavior, it's somewhat hard to track down. However, when the exceptions hit, it seems to be between the end of my generator code and what is asking for the generated samples -- while FsCheck is DOING the generation, it seems to be trying to process a malformed sequence. I'm further assuming that this is because I've incorrectly coded the generator.
IndexOutOfRangeException using FsCheck suggests a potentially similar situation. I have tried running my Xunit tests both via Resharper test runner as well as Xunit's console test runner on the real-world tests on which the above simplification is based. Both runners exhibit identical behavior, so the issue is somewhere else.
Other "How do I generate..." questions such as In FsCheck, how to generate a test record with non-negative fields? and How does one generate a "complex" object in FsCheck? deal with the creation of objects of a lesser complexity. The first was a great help for getting to the code I have, and the second gives a much-needed example of Arb.convert, but Arb.convert doesn't make sense if I'm converting from a "constant" list of randomly generated names. It all seems to come back to that -- the need to make random names, which are then pulled from to make a complete set of InitEvents, and some sequence of RefEvents, both which refer back to the "constant" list, doesn't match anything that I've yet come across.
I've looked through most examples of FsCheck generators I can find, including the included examples in FsCheck: https://github.com/fscheck/FsCheck/blob/master/examples/FsCheck.Examples/Examples.fs These also do not deal with an object needing internal consistency, and do not seem to apply to this case, even though they have been helpful overall.
Perhaps this means that I'm approaching the generation of the object from an unhelpful perspective. If there is a different way to generate an object which follows the above rules, I'm open to switching to it.
Further backing away from the problem, I've seen other SO posts which roughly say "If your object has such restrictions, then what happens when you receive an invalid object? Perhaps you need to rethink the way this object is consumed to better handle invalid cases." If, for example, I were able to initialize on-the-fly a never before seen name in a RefEvent, the entire need for giving an InitEvent first would go away -- the problem gracefully reduces to simply a sequence of RefEvents of some random name. I am open to this kind of solution, but it would require a bit of rework -- in the long run, it may be worth it. In the mean time, the question remains, how can you reliably generate a complex object which follows the above rules using FsCheck?
Thanks!
EDIT(S): Attempts to Solve
The code in Mark Seemann's answer works, but yields a slightly different object than I was looking for (I was unclear in my object rules -- now hopefully clarified). Putting his working code in my generator:
type MyGenerators =
static member Stream() = {
new Arbitrary<Stream>() with
override x.Generator =
gen {
let! uniqueStrings = Arb.Default.Set<string>().Generator
let initEvents = uniqueStrings |> Seq.map InitEvent
let! sortValues =
Arb.Default.Int32()
|> Arb.toGen
|> Gen.listOfLength uniqueStrings.Count
let refEvents =
Seq.zip uniqueStrings sortValues
|> Seq.sortBy snd
|> Seq.map fst
|> Seq.map RefEvent
return Seq.append initEvents refEvents
}
}
This yields an object where every InitEvent has a matching RefEvent, and there is only one RefEvent for each InitEvent. I'm trying to tweak the code so that I can get multiple RefEvents for each name, and not all names need to have a RefEvent. ex: Init foo, Init bar, Ref foo, Ref foo is perfectly valid. Trying to tweak this with:
type MyGenerators =
static member Stream() = {
new Arbitrary<Stream>() with
override x.Generator =
gen {
let! uniqueStrings = Arb.Default.Set<string>().Generator
let initEvents = uniqueStrings |> Seq.map InitEvent
// changed section starts
let makeRef name = name |> RefEvent
let genRef = makeRef <!> Gen.elements uniqueStrings
return! Seq.append initEvents <!> ( genRef |> Gen.listOf )
// changed section ends
}
}
The modified code still exhibits the inconsistent behavior. Interestingly, out of 20 sample runs, only three worked (down from 10), while the insufficient number of elements was thrown 8 times and The input must be non-negative was thrown 9 times -- these changes have made the edge case more than twice as likely to be hit. We're now down to a very small section of code with the error.
Mark quickly responded with another version to address changed requirements:
type MyGenerators =
static member Stream() = {
new Arbitrary<Stream>() with
override x.Generator =
gen {
let! uniqueStrings = Arb.Default.NonEmptySet<string>().Generator
let initEvents = uniqueStrings.Get |> Seq.map InitEvent
let! refEvents =
uniqueStrings.Get |> Seq.map RefEvent |> Gen.elements |> Gen.listOf
return Seq.append initEvents refEvents
}
}
This allowed for some names to not have a RefEvent.
FINAL CODE
A very minor tweak gets it so that duplicate RefEvents can happen:
type MyGenerators =
static member Stream() = {
new Arbitrary<Stream>() with
override x.Generator =
gen {
let! uniqueStrings = Arb.Default.NonEmptySet<string>().Generator
let initEvents = uniqueStrings.Get |> Seq.map InitEvent
let! refEvents =
//uniqueStrings.Get |> Seq.map RefEvent |> Gen.elements |> Gen.listOf
Gen.elements uniqueStrings.Get |> Gen.map RefEvent |> Gen.listOf
return Seq.append initEvents refEvents
}
}
Big thanks to Mark Seemann!

Gen
Here's one way to address the requirements:
open FsCheck
let streamGen = gen {
let! uniqueStrings = Arb.Default.Set<string>().Generator
let initEvents = uniqueStrings |> Seq.map InitEvent
let! sortValues =
Arb.Default.Int32()
|> Arb.toGen
|> Gen.listOfLength uniqueStrings.Count
let refEvents =
Seq.zip uniqueStrings sortValues
|> Seq.sortBy snd
|> Seq.map fst
|> Seq.map RefEvent
return Seq.append initEvents refEvents }
The semi-official answer on how to generate unique strings is to generate a Set<string>. Since Set<'a> also implements 'a seq, you can use all the normal Seq functions on it.
Generating the InitEvent values, then, is a simple map operation over the unique strings.
Since each RefEvent must have a corresponding InitEvent, you can reuse the same unique strings, but you may want to give the RefEvent values on option to appear in a different order. To do that, you can generate sortValues, which is a list of random int values. This list has the same length as the set of strings.
At this point, you have a list of unique strings, and a list of random integers. Here are some fake values that illustrate the concept:
> let uniqueStrings = ["foo"; "bar"; "baz"];;
val uniqueStrings : string list = ["foo"; "bar"; "baz"]
> let sortValues = [42; 1337; 42];;
val sortValues : int list = [42; 1337; 42]
You can now zip them:
> List.zip uniqueStrings sortValues;;
val it : (string * int) list = [("foo", 42); ("bar", 1337); ("baz", 42)]
Sorting such a sequence on its second element will give you a randomly shuffled list, and then you can map to only the first element:
> List.zip uniqueStrings sortValues |> List.sortBy snd |> List.map fst;;
val it : string list = ["foo"; "baz"; "bar"]
Since all InitEvent values must come before the RefEvent values, you can now append refEvents to initEvents, and return this combined list.
Verification
You can verify that streamGen works as intended:
open FsCheck.Xunit
open Swensen.Unquote
let isInitEvent = function InitEvent _ -> true | _ -> false
let isRefEvent = function RefEvent _ -> true | _ -> false
[<Property(MaxTest = 100000)>]
let ``All InitEvents must come before all RefEvents`` () =
Prop.forAll (streamGen |> Arb.fromGen) <| fun s ->
test <# s |> Seq.skipWhile isInitEvent |> Seq.forall isRefEvent #>
[<Property(MaxTest = 100000)>]
let ``All InitEvents strings must be unique`` () =
Prop.forAll (streamGen |> Arb.fromGen) <| fun s ->
let initEventStrings =
s |> Seq.choose (function InitEvent s -> Some s | _ -> None)
let distinctStrings = initEventStrings |> Seq.distinct
distinctStrings |> Seq.length =! (initEventStrings |> Seq.length)
[<Property(MaxTest = 100000)>]
let ``All RefEvent names must have an earlier corresponding InitEvent`` () =
Prop.forAll (streamGen |> Arb.fromGen) <| fun s ->
let initEventStrings =
s
|> Seq.choose (function InitEvent s -> Some s | _ -> None)
|> Seq.sort
|> Seq.toList
let refEventStrings =
s
|> Seq.choose (function RefEvent s -> Some s | _ -> None)
|> Seq.sort
|> Seq.toList
initEventStrings =! refEventStrings
These three properties all pass on my machine.
Looser requirements
Based on the looser requirements outlined in the comments to this answer, here's an updated generator that draws values from the InitEvents strings:
open FsCheck
let streamGen = gen {
let! uniqueStrings = Arb.Default.NonEmptySet<string>().Generator
let initEvents = uniqueStrings.Get |> Seq.map InitEvent
let! refEvents =
uniqueStrings.Get |> Seq.map RefEvent |> Gen.elements |> Gen.listOf
return Seq.append initEvents refEvents }
This time, uniqueStrings is a non-empty set of strings.
You can use Seq.map RefEvent to generate a sequence of all valid RefEvent values based on uniqueStrings, and then Gen.elements to defined a generator of valid RefEvent values that draws from that sequence of valid values. Finally, Gen.listOf creates lists of values generated by that generator.
Tests
These tests demonstrate that streamGen generates values according to the rules:
open FsCheck.Xunit
open Swensen.Unquote
let isInitEvent = function InitEvent _ -> true | _ -> false
let isRefEvent = function RefEvent _ -> true | _ -> false
[<Property(MaxTest = 100000)>]
let ``All InitEvents must come before all RefEvents`` () =
Prop.forAll (streamGen |> Arb.fromGen) <| fun s ->
test <# s |> Seq.skipWhile isInitEvent |> Seq.forall isRefEvent #>
[<Property(MaxTest = 100000)>]
let ``All InitEvents strings must be unique`` () =
Prop.forAll (streamGen |> Arb.fromGen) <| fun s ->
let initEventStrings =
s |> Seq.choose (function InitEvent s -> Some s | _ -> None)
let distinctStrings = initEventStrings |> Seq.distinct
distinctStrings |> Seq.length =! (initEventStrings |> Seq.length)
[<Property(MaxTest = 100000)>]
let ``All RefEvent names must have an earlier corresponding InitEvent`` () =
Prop.forAll (streamGen |> Arb.fromGen) <| fun s ->
let initEventStrings =
s
|> Seq.choose (function InitEvent s -> Some s | _ -> None)
|> Seq.sort
|> Set.ofSeq
test <# s
|> Seq.choose (function RefEvent s -> Some s | _ -> None)
|> Seq.forall initEventStrings.Contains #>
These three properties all pass on my machine.

Related

Trying to create a function, and then filtering a sequence by "not that function" in F#

My data is a SEQUENCE of:
[(40,"TX");(48,"MO");(15,"TX");(78,"TN");(41,"VT")]
My code is as follows:
type Csvfile = CsvProvider<somefile>
let data = Csvfile.GetSample().Rows
let nullid row =
row.Id = 15
let otherid row =
row.Id= 40
let iddata =
data
|> Seq.filter (not nullid)
|> Seq.filter (not otherid)
I create the functions.
Then I want to call the "not" of those functions to filter them out of a sequence.
But the issue is that I am getting errors for "row.Id" in the first two functions, because you can only do that with a type.
How do I solve this problem so I can accomplish this successfully.
My result should be a SEQUENCE of:
[(48,"MO);(78,"TN");(41,"VT")]
You can use >> operator to compose the two functions:
let iddata =
data
|> Seq.filter (nullid >> not)
|> Seq.filter (othered >> not)
See Function Composition and Pipelining.
Or you can make it more explicit:
let iddata =
data
|> Seq.filter (fun x -> not (nullid x))
|> Seq.filter (fun x -> not (othered x))
You can see that in action:
let input = [|1;2;3;4;5;6;7;8;9;10|];;
let is3 value =
value = 3;;
input |> Seq.filter (fun x -> not (is3 x));;
input |> Seq.filter (not >> is3);;
They both print val it : seq<int> = seq [1; 2; 4; 5; ...]
Please see below what an MCVE might look in your case, for an fsx file you can reference the Fsharp.Data dll with #r, for a compiled project just reference the dll an open it.
#if INTERACTIVE
#r #"..\..\SO2018\packages\FSharp.Data\lib\net45\FSharp.Data.dll"
#endif
open FSharp.Data
[<Literal>]
let datafile = #"C:\tmp\data.csv"
type CsvFile = CsvProvider<datafile>
let data = CsvFile.GetSample().Rows
In the end this is what you want to achieve:
data
|> Seq.filter (fun x -> x.Id <> 15)
|> Seq.filter (fun x -> x.Id <> 40)
//val it : seq<CsvProvider<...>.Row> = seq [(48, "MO"); (78, "TN"); (41, "VT")]
One way to do this is with SRTP, as they allow a way to do structural typing, where the type depends on its shape, for example in this case having the Id property. If you want you can define helper function for the two numbers 15 and 40, and use that in your filter, just like in the second example. However SRTP syntax is a bit strange, and it's designed for a use case where you need to apply a function to different types that have some similarity (basically like interfaces).
let inline getId row =
(^T : (member Id : int) row)
data
|> Seq.filter (fun x -> (getId x <> 15 ))
|> Seq.filter (fun x -> (getId x <> 40))
//val it : seq<CsvProvider<...>.Row> = seq [(48, "MO"); (78, "TN"); (41, "VT")]
Now back to your original post, as you correctly point out your function will show an error, as you define it to be generic, but it needs to operate on a specific Csv row type (that has the Id property). This is very easy to fix, just add a type annotation to the row parameter. In this case your type is CsvFile.Row, and since CsvFile.Row has the Id property we can access that in the function. Now this function returns a Boolean. You could make it return the actual row as well.
let nullid (row: CsvFile.Row) =
row.Id = 15
let otherid (row: CsvFile.Row) =
row.Id = 40
Then what is left is applying this inside a Seq.filter and negating it:
let iddata =
data
|> Seq.filter (not << nullid)
|> Seq.filter (not << otherid)
|> Seq.toList
//val iddata : CsvProvider<...>.Row list = [(48, "MO"); (78, "TN"); (41, "VT")]

How to create a dependency between observables?

I want a tool for testing Rx components that would work like this:
Given an order of the events specified as a 'v seq and a key selector function (keySelector :: 'v -> 'k) I want to create a Map<'k, IObservable<'k>> where the guarantee is that the groupped observables yield the values in the global order defined by the above enumerable.
For example:
makeObservables isEven [1;2;3;4;5;6]
...should produce
{ true : -2-4-6|,
false: 1-3-5| }
This is my attempt looks like this:
open System
open System.Reactive.Linq
open FSharp.Control.Reactive
let subscribeAfter (o1: IObservable<'a>) (o2 : IObservable<'b>) : IObservable<'b> =
fun (observer : IObserver<'b>) ->
let tempObserver = { new IObserver<'a> with
member this.OnNext x = ()
member this.OnError e = observer.OnError e
member this.OnCompleted () = o2 |> Observable.subscribeObserver observer |> ignore
}
o1.Subscribe tempObserver
|> Observable.Create
let makeObservables (keySelector : 'a -> 'k) (xs : 'a seq) : Map<'k, IObservable<'a>> =
let makeDependencies : ('k * IObservable<'a>) seq -> ('k * IObservable<'a>) seq =
let makeDep ((_, o1), (k2, o2)) = (k2, subscribeAfter o1 o2)
Seq.pairwise
>> Seq.map makeDep
let makeObservable x = (keySelector x, Observable.single x)
let firstItem =
Seq.head xs
|> makeObservable
|> Seq.singleton
let dependentObservables =
xs
|> Seq.map makeObservable
|> makeDependencies
dependentObservables
|> Seq.append firstItem
|> Seq.groupBy fst
|> Seq.map (fun (k, obs) -> (k, obs |> Seq.map snd |> Observable.concatSeq))
|> Map.ofSeq
[<EntryPoint>]
let main argv =
let isEven x = (x % 2 = 0)
let splits : Map<bool, IObservable<int>> =
[1;2;3;4;5]
|> makeObservables isEven
use subscription =
splits
|> Map.toSeq
|> Seq.map snd
|> Observable.mergeSeq
|> Observable.subscribe (printfn "%A")
Console.ReadKey() |> ignore
0 // return an integer exit code
...but the results are not as expected and the observed values are not in the global order.
Apparently the items in each group are yield correctly but when the groups are merged its more like a concat then a merge
The expected output is: 1 2 3 4 5
...but the actual output is 1 3 5 2 4
What am I doing wrong?
Thanks!
You describe wanting this:
{ true : -2-4-6|,
false: 1-3-5| }
But you're really creating this:
{ true : 246|,
false: 135| }
Since there's no time gaps between the items in the observables, the merge basically has a constant race condition. Rx guarantees that element 1 of a given sequence will fire before element 2, but Merge offers no guarantees around cases like this.
You need to introduce time gaps into your observables if you want Merge to be able to re-sequence in the original order.

Understanding Mutability in F# : case study

I'm a beginner in F#, and this is my first attempt at programming something serious. I'm sorry the code is a bit long, but there are some issues with mutability that I don't understand.
This is an implementation of the Karger MinCut Algorithm to calculate the mincut on a non-directed graph component. I won't discuss here how the algo works,
for more info https://en.wikipedia.org/wiki/Karger%27s_algorithm
What is important is it's a randomized algorithm, which is running a determined number of trial runs, and taking the "best" run.
I realize now that I could avoid a lot of the problems below if I did construct a specific function for each random trial, but I'd like to understand EXACTLY what is wrong in the implementation below.
I'm running the code on this simple graph (the mincut is 2 when we cut the graph
into 2 components (1,2,3,4) and (5,6,7,8) with only 2 edges between those 2 components)
3--4-----5--6
|\/| |\/|
|/\| |/\|
2--1-----7--8
the file simplegraph.txt should encode this graph as follow
(1st column = node number, other columns = links)
1 2 3 4 7
2 1 3 4
3 1 2 4
4 1 2 3 5
5 4 6 7 8
6 5 7 8
7 1 5 6 8
8 5 6 7
This code may look too much as imperative programming yet, I'm sorry for that.
So There is a main for i loop calling each trial.
the first execution, (when i=1) looks smooth and perfect,
but I have runtime error execution when i=2, because it looks some variables,
like WG are not reinitialized correctly, causing out of bound errors.
WG, WG1 and WGmin are type wgraphobj, which are a record of Dictionary objects
WG1 is defined outside the main loop and i make no new assignments to WG1.
[but its type is mutable though, alas]
I defined first WG with the instruction
let mutable WG = WG1
then at the beginning of the for i loop,
i write
WG <- WG1
and then later, i modify the WG object in each trial to make some calculations.
when the trial is finished and we go to the next trial (i is increased) i want to reset WG to its initial state being like WG1.
but it seems its not working, and I don't get why...
Here is the full code
MyModule.fs [some functions not necessary for execution]
namespace MyModule
module Dict =
open System.Collections.Generic
let toSeq d = d |> Seq.map (fun (KeyValue(k,v)) -> (k,v))
let toArray (d:IDictionary<_,_>) = d |> toSeq |> Seq.toArray
let toList (d:IDictionary<_,_>) = d |> toSeq |> Seq.toList
let ofMap (m:Map<'k,'v>) = new Dictionary<'k,'v>(m) :> IDictionary<'k,'v>
let ofList (l:('k * 'v) list) = new Dictionary<'k,'v>(l |> Map.ofList) :> IDictionary<'k,'v>
let ofSeq (s:('k * 'v) seq) = new Dictionary<'k,'v>(s |> Map.ofSeq) :> IDictionary<'k,'v>
let ofArray (a:('k * 'v) []) = new Dictionary<'k,'v>(a |> Map.ofArray) :> IDictionary<'k,'v>
Karger.fs
open MyModule.Dict
open System.IO
let x = File.ReadAllLines "\..\simplegraph.txt";;
// val x : string [] =
let splitAtTab (text:string)=
text.Split [|'\t';' '|]
let splitIntoKeyValue (s:seq<'T>) =
(Seq.head s, Seq.tail s)
let parseLine (line:string)=
line
|> splitAtTab
|> Array.filter (fun s -> not(s=""))
|> Array.map (fun s-> (int s))
|> Array.toSeq
|> splitIntoKeyValue
let y =
x |> Array.map parseLine
open System.Collections.Generic
// let graph = new Map <int, int array>
let graphD = new Dictionary<int,int seq>()
y |> Array.iter graphD.Add
let graphM = y |> Map.ofArray //immutable
let N = y.Length // number of nodes
let Nruns = 2
let remove_table = new Dictionary<int,bool>()
[for i in 1..N do yield (i,false)] |> List.iter remove_table.Add
// let remove_table = seq [|for a in 1 ..N -> false|] // plus court
let label_head_table = new Dictionary<int,int>()
[for i in 1..N do yield (i,i)] |> List.iter label_head_table.Add
let label = new Dictionary<int,int seq>()
[for i in 1..N do yield (i,[i])] |> List.iter label.Add
let mutable min_cut = 1000000
type wgraphobj =
{ Graph : Dictionary<int,int seq>
RemoveTable : Dictionary<int,bool>
Label : Dictionary<int,int seq>
LabelHead : Dictionary<int,int> }
let WG1 = {Graph = graphD;
RemoveTable = remove_table;
Label = label;
LabelHead = label_head_table}
let mutable WGmin = WG1
let IsNotRemoved x = //
match x with
| (i,false) -> true
| (i,true) -> false
let IsNotRemoved1 WG i = //
(i,WG.RemoveTable.[i]) |>IsNotRemoved
let GetLiveNode d =
let myfun x =
match x with
| (i,b) -> i
d |> toList |> List.filter IsNotRemoved |> List.map myfun
let rand = System.Random()
// subsets a dictionary given a sub_list of keys
let D_Subset (dict:Dictionary<'T,'U>) (sub_list:list<'T>) =
let z = Dictionary<'T,'U>() // create new empty dictionary
sub_list |> List.filter (fun k -> dict.ContainsKey k)
|> List.map (fun k -> (k, dict.[k]))
|> List.iter (fun s -> z.Add s)
z
// subsets a dictionary given a sub_list of keys to remove
let D_SubsetC (dict:Dictionary<'T,'U>) (sub_list:list<'T>) =
let z = dict
sub_list |> List.filter (fun k -> dict.ContainsKey k)
|> List.map (fun k -> (dict.Remove k)) |>ignore
z
// subsets a sequence by values in a sequence
let S_Subset (S:seq<'T>)(sub_list:seq<'T>) =
S |> Seq.filter (fun s-> Seq.exists (fun elem -> elem = s) sub_list)
let S_SubsetC (S:seq<'T>)(sub_list:seq<'T>) =
S |> Seq.filter (fun s-> not(Seq.exists (fun elem -> elem = s) sub_list))
[<EntryPoint>]
let main argv =
let mutable u = 0
let mutable v = 0
let mutable r = 0
let mutable N_cut = 1000000
let mutable cluster_A_min = seq [0]
let mutable cluster_B_min = seq [0]
let mutable WG = WG1
let mutable LiveNodeList = [0]
// when i = 2, i encounter problems with mutability
for i in 1 .. Nruns do
WG <- WG1
printfn "%d" i
for k in 1..(N-2) do
LiveNodeList <- GetLiveNode WG.RemoveTable
r <- rand.Next(0,N-k)
u <- LiveNodeList.[r] //selecting a live node
let uuu = WG.Graph.[u] |> Seq.map (fun s -> WG.LabelHead.[s] )
|> Seq.filter (IsNotRemoved1 WG)
|> Seq.distinct
let n_edge = uuu |> Seq.length
let x = rand.Next(1,n_edge)
let mutable ok = false //maybe we can take this out
while not(ok) do
// selecting the edge from node u
v <- WG.LabelHead.[Array.get (uuu |> Seq.toArray) (x-1)]
let vvv = WG.Graph.[v] |> Seq.map (fun s -> WG.LabelHead.[s] )
|> Seq.filter (IsNotRemoved1 WG)
|> Seq.distinct
let zzz = S_SubsetC (Seq.concat [uuu;vvv] |> Seq.distinct) [u;v]
WG.Graph.[u] <- zzz
let lab_u = WG.Label.[u]
let lab_v = WG.Label.[v]
WG.Label.[u] <- Seq.concat [lab_u;lab_v] |> Seq.distinct
if (k<N-1) then
WG.RemoveTable.[v]<-true
//updating Label_head for all members of Label.[v]
WG.LabelHead.[v]<- u
for j in WG.Label.[v] do
WG.LabelHead.[j]<- u
ok <- true
printfn "u= %d v=%d" u v
// end of for k in 1..(N-2)
// counting cuts
// u,v contain the 2 indexes of groupings
let cluster_A = WG.Label.[u]
let cluster_B = S_SubsetC (seq[for i in 1..N do yield i]) cluster_A // defined as complementary of A
// let WG2 = {Graph = D_Subset WG1.Graph (cluster_A |> Seq.toList)
// RemoveTable = remove_table
// Label = D_Subset WG1.Graph (cluster_A |> Seq.toList)
// LabelHead = label_head_table}
let cross_edge = // returns keyvalue pair (k,S')
let IsInCluster cluster (k,S) =
(k,S_Subset S cluster)
graphM |> toSeq |> Seq.map (IsInCluster cluster_B)
N_cut <-
cross_edge |> Seq.map (fun (k:int,v:int seq)-> Seq.length v)
|> Seq.sum
if (N_cut<min_cut) then
min_cut <- N_cut
WGmin <- WG
cluster_A_min <- cluster_A
cluster_B_min <- cluster_B
// end of for i in 1..Nruns
0 // return an integer exit code
Description of the algo: (i don't think its too essential to solve my problem)
at each trial, there are several steps. at each step, we merge 2 nodes into 1, (removing effectively 1) updating the graph. we do that 6 times until there are only 2 nodes left, which we define as 2 clusters, and we look at the number of cross edges between those 2 clusters. if we are "lucky" those 2 clusters would be (1,2,3,4) and (5,6,7,8) and find the right number of cuts.
at each step, the object WG is updated with the effects of merging 2 nodes
with only LiveNodes (the ones which are not eliminated as a result of merging 2 nodes) being perfectly kept up to date.
WG.Graph is the updated graph
WG.Label contains the labels of the nodes which have been merged into the current node
WG.LabelHead contains the label of the node into which that node has been merged
WG.RemoveTable says if the node has been removed or not.
Thanks in advance for anyone willing to take a look at it !
"It seems not working", because wgraphobj is a reference type, which is allocated on the stack, which means that when you're mutating the innards of WG, you're also mutating the innards of WG1, because they're the same innards.
This is precisely the kind of mess you get yourself into if you use mutable state. This is why people recommend to not use it. In particular, your use of mutable dictionaries undermines the robustness of your algorithm. I recommend using the F#'s own efficient immutable dictionary (called Map) instead.
Now, in response to your comment about WG.Graph <- GraphD giving compile error.
WG is mutable, but WG.Graph is not (but the contents of WG.Graph are again mutable). There is a difference, let me try to explain it.
WG is mutable in the sense that it points to some object of type wgraphobj, but you can make it, in the course of your program, to point at another object of the same type.
WG.Graph, on the other hand, is a field packed inside WG. It points to some object of type Dictionary<_,_>. And you cannot make it point to another object. You can create a different wgraphobj, in which the field Graph point to a different dictionary, but you cannot change where the field Graph of the original wgraphobj points.
In order to make the field Graph itself mutable, you can declare it as such:
type wgraphobj = {
mutable Graph: Dictionary<int, int seq>
...
Then you will be able to mutate that field:
WG.Graph <- GraphD
Note that in this case you do not need to declare the value WG itself as mutable.
However, it seems to me that for your purposes you can actually go the way of creating a new instance wgraphobj with the field Graph changed, and assigning it to the mutable reference WG:
WG.Graph <- { WG with Graph = GraphD }

Lists except - filter a seq of string which not equal any item.A of another list

I'm trying to create a filter function accept two list parameters and return all the items in the first seq after excluded these existing (equal to A) in the second list.
type R = { A: string; B: int; ...}
let filter (xxx: seq<string) (except: list<R>) =
xxx
|> Seq.filter (fun i ->
// returns all the items in xxx which not equal to any except.A
The simplest code would be:
type R = { A: string; B: int; }
let filter where except =
let except' = except |> List.map (fun x -> x.A) |> Set.ofList
where
|> Seq.filter (not << except'.Contains)
Notes:
Since the computation only uses R.A, we retrieve these R.A values only once for performance reasons.
Converting it to Set would eliminate duplicates as they only degrade performance and not affect the final result.
Since the type of except' is inferred as Set<string>, we can use member method except'.Contains instead of Set.contains.
I think one thing would be to do
let filter (xxx: seq<string>) (except: list<R>) =
xxx
|> Seq.filter (fun i -> except |> List.exists (fun t -> t.A = i) |> not)
Fluent LINQ implementation:
let filter (where: seq<string>) except =
let contains = set (where.Except(List.map (fun x -> x.A) except)) in
where.Where contains.Contains
There is now Seq.except:
xs
|> Seq.except ys
// All xs that are not in ys

Yet Another Value Restriction Question

In the following code Seq.generateUnique is constrained to be of type ((Assembly -> seq<Assembly>) -> seq<Assembly> -> seq<Assembly>).
open System
open System.Collections.Generic
open System.Reflection
module Seq =
let generateUnique =
let known = HashSet()
fun f initial ->
let rec loop items =
seq {
let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
if not (cachedSeq |> Seq.isEmpty) then
yield! cachedSeq
yield! loop (cachedSeq |> Seq.collect f)
}
loop initial
let discoverAssemblies() =
AppDomain.CurrentDomain.GetAssemblies() :> seq<_>
|> Seq.generateUnique (fun asm -> asm.GetReferencedAssemblies() |> Seq.map Assembly.Load)
let test() = printfn "%A" (discoverAssemblies() |> Seq.truncate 2 |> Seq.map (fun asm -> asm.GetName().Name) |> Seq.toList)
for _ in 1 .. 5 do test()
System.Console.Read() |> ignore
I'd like it to be generic, but putting it into a file apart from its usage yields a value restriction error:
Value restriction. The value
'generateUnique' has been inferred to
have generic type val
generateUnique : (('_a -> '_b) -> '_c
-> seq<'_a>) when '_b :> seq<'_a> and '_c :> seq<'_a> Either make the
arguments to 'generateUnique' explicit
or, if you do not intend for it to be
generic, add a type annotation.
Adding an explicit type parameter (let generateUnique<'T> = ...) eliminates the error, but now it returns different results.
Output without type parameter (desired/correct behavior):
["mscorlib"; "TEST"]
["FSharp.Core"; "System"]
["System.Core"; "System.Security"]
[]
[]
And with:
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
["mscorlib"; "TEST"]
Why does the behavior change? How could I make the function generic and achieve the desired behavior?
generateUnique is a lot like the standard memoize pattern: it should be used to calculate memoized functions from normal functions, not do the actual caching itself.
#kvb was right about the change in the definition required for this shift, but then you need to change the definition of discoverAssemblies as follows:
let discoverAssemblies =
//"memoize"
let generator = Seq.generateUnique (fun (asm:Assembly) -> asm.GetReferencedAssemblies() |> Seq.map Assembly.Load)
fun () ->
AppDomain.CurrentDomain.GetAssemblies() :> seq<_>
|> generator
I don't think that your definition is quite correct: it seems to me that f needs to be a syntactic argument to generateUnique (that is, I don't believe that it makes sense to use the same HashSet for different fs). Therefore, a simple fix is:
let generateUnique f =
let known = HashSet()
fun initial ->
let rec loop items =
seq {
let cachedSeq = items |> Seq.filter known.Add |> Seq.cache
if not (cachedSeq |> Seq.isEmpty) then
yield! cachedSeq
yield! loop (cachedSeq |> Seq.collect f)
}
loop initial

Resources