Does MailboxProcessor just duplicate IObservable? - f#

I want to process to types of a message
Add x makes program remember number x
Print makes it print all remembered numbers
Why would I write this:
open System
type Message =
| Add of int
| Print
let mailbox = new MailboxProcessor<Message>(fun inbox ->
let rec loop history = async{
let! msg=inbox.Receive()
match msg with
| Add x -> return! loop(history + x.ToString()+" ")
| Print ->
printfn "%s" history
return! loop(history)
}
loop ""
)
[<EntryPoint>]
let main argv =
mailbox.Start()
mailbox.Post(Add 12)
mailbox.Post(Add 56)
mailbox.Post(Print)
mailbox.Post(Add 34)
mailbox.Post(Print)
ignore <| Console.ReadLine()
0
instead of this:
open System
open System.Reactive.Subjects
type Message =
| Add of int
| Print
let subject = new Subject<Message>()
[<EntryPoint>]
let main argv =
subject
|> Observable.scan(fun history msg ->
match msg with
| Add x -> history + x.ToString()+" "
| Print ->
printfn "%s" history
history
) ""
|> Observable.subscribe(fun _->())
|> ignore
subject.OnNext(Add 12)
subject.OnNext(Add 56)
subject.OnNext(Print)
subject.OnNext(Add 34)
subject.OnNext(Print)
ignore <| Console.ReadLine()
0
The MailboxProcessor adds additional level of complexity. I need a state machine which takes a state and returns a state. But it forces me to take inbox, which is used to receive state.
Does it has any advantages to IObservable?

No, they're not duplicates of one another. MailboxProcessor and IObservable are low-level building blocks of two different models of computation - actor model and functional reactive programming respectively.
Both deal with asynchronicity, but emphasize different qualities. It's might be possible to build your solution in terms of one or the other - as you noticed in your simple example - but you will find one or the other more natural to use in a particular context.
MailboxProcessors are particularly useful for thread-safe, lock-free access to a resource, such as a file. You can have multiple threads manipulating the resource through an asynchronous interface, and the MailboxProcessor guarantees that only one of those requests is processed at a time.

Related

Combining two observables in F#

I'm having trouble understanding how to manage multiple observables that depend on each other. I want to define a function with the following signature:
clock:IObservable<unit> -> obs:IObservable<'a> -> IObservable<'a>
So that events from obs can only be emitted once per clock tick, and excess events from obs are discarded.
I have tried mapping the two observables and then merging them into one stream, but it is not the solution.
The built-in F# library for Observables comes with only a few basic functions, so this is not something you can easily do using the built-in primitives. You can probably use a function from the full Rx library, which has a nice F# wrapper and comes with huge number of operations (but that makes it a bit hard to figure out which one is the one that you need).
An alternative purely F# approach would be to use agent-based programming. This lets you nicely handle complex concurrency patterns. The following implements an agent that has Tick and Event as two kinds of messages. It remembers the last Event and when Tick happens, it trigges the returned observable with the last Event value:
open System
type WhenTickMessage<'T> =
| Tick
| Event of 'T
let whenTick (clock:IObservable<_>) (event:IObservable<_>) =
let result = new Event<_>()
let agent = MailboxProcessor.Start(fun inbox ->
let rec loop event = async {
let! msg = inbox.Receive()
match msg with
| Tick ->
event |> Option.iter (fun e -> result.Trigger(e))
return! loop None
| Event e ->
return! loop (Some e) }
loop None)
clock.Add(fun _ -> agent.Post Tick)
event.Add(fun e -> agent.Post (Event e))
result.Publish

Slow conversion of F# discriminated union case to string

I have around 100k discriminated union cases I have to convert to strings, but it seems to be extremely slow.
As a comparison, the following executes (in F# interactive) in 3seconds on average :
open System
let buf = Text.StringBuilder()
let s = DateTime.Now
for i in 1 .. 100000 do
Printf.bprintf buf "%A" "OtherFinancingInterest" //string
buf.Length <- 0
printfn "elapsed : %0.2f" (DateTime.Now - s).TotalMilliseconds
While the following executes (also in F# interactive) in over a minute...
open System
let buf = Text.StringBuilder()
let s = DateTime.Now
for i in 1 .. 100000 do
Printf.bprintf buf "%A" OtherFinancingInterest //DU
buf.Length <- 0
printfn "elapsed : %0.2f" (DateTime.Now - s).TotalMilliseconds
The discriminated union has 25 values (the result is still extremely slow, around 16 seconds with two cases, but less so than with 25). Any idea if that is "normal" or if I may be doing something wrong ?
Many thanks
The %A format specifier pretty prints any F# value. It uses reflection to do so. It should only really be used for debugging purposes, and not in normal application code.
Note that using %s in your first example using a string makes it a lot faster because there is no type checking needed at runtime.
For the DU, there is a hack you could use to make the reflection only happen once on application load:
type FinancingInterest =
| OtherFinancingInterest
open FSharp.Reflection
let private OtherFinancingInterestStringMap =
FSharpType.GetUnionCases typeof<FinancingInterest>
|> Array.map (fun c -> FSharpValue.MakeUnion(c, [||]) :?> FinancingInterest)
|> Array.map (fun x -> x, sprintf "%A" x)
|> Map.ofArray
type FinancingInterest with
member this.AsString = OtherFinancingInterestStringMap |> Map.find this
You would also use this with the %s format specifier:
Printf.bprintf buf "%s" OtherFinancingInterest.AsString
I had similar timings to yours in your example, and now this one comes down to 40ms.
This only works as long as all of the DU cases don't have an arguments. You will get an exception on application load as soon as you try anything like this:
type FinancingInterest =
| Foo of string
| OtherFinancingInterest
Having said all this, I think you're better off writing a simple function that explicitly converts your type into a string value, writing out the names in full with repetition if necessary. The names of discriminated union cases should not generally be thought of as data that affects your program. You would usually expect to be able to safely rename case names without affecting runtime behaviour at all.

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was myfirst attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
let rec mapRec f state list results =
match list with
| [] -> (List.rev results, state)
| item :: tail ->
let (result, newState) = f state item
match newState with
| Some x -> mapRec f newState tail (result :: results)
| None -> ([], None)
mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
match x with
| [] -> []
| [h] -> [f h]
| (h::t) ->
let cons = freshConsNoTail (f h)
mapToFreshConsTail cons f t
cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not particularly good example. F# list functions are used almost everywhere so it might be a good trade-off, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.

Avoid mutation in this example in F#

Coming from an OO background, I am having trouble wrapping my head around how to solve simple issues with FP when trying to avoid mutation.
let mutable run = true
let player1List = ["he"; "ho"; "ha"]
let addValue lst value =
value :: lst
while run do
let input = Console.ReadLine()
addValue player1List input |> printfn "%A"
if player1List.Length > 5 then
run <- false
printfn "all done" // daz never gunna happen
I know it is ok to use mutation in certain cases, but I am trying to train myself to avoid mutation as the default. With that said, can someone please show me an example of the above w/o using mutation in F#?
The final result should be that player1List continues to grow until the length of items are 6, then exit and print 'all done'
The easiest way is to use recursion
open System
let rec makelist l =
match l |> List.length with
|6 -> printfn "all done"; l
| _ -> makelist ((Console.ReadLine())::l)
makelist []
I also removed some the addValue function as it is far more idiomatic to just use :: in typical F# code.
Your original code also has a common problem for new F# coders that you use run = false when you wanted run <- false. In F#, = is always for comparison. The compiler does actually warn about this.
As others already explained, you can rewrite imperative loops using recursion. This is useful because it is an approach that always works and is quite fundamental to functional programming.
Alternatively, F# provides a rich set of library functions for working with collections, which can actually nicely express the logic that you need. So, you could write something like:
let player1List = ["he"; "ho"; "ha"]
let player2List = Seq.initInfinite (fun _ -> Console.ReadLine())
let listOf6 = Seq.append player1List list2 |> Seq.take 6 |> List.ofSeq
The idea here is that you create an infinite lazy sequence that reads inputs from the console, append it at the end of your initial player1List and then take first 6 elements.
Depending on what your actual logic is, you might do this a bit differently, but the nice thing is that this is probably closer to the logic that you want to implement...
In F#, we use recursion to do loop. However, if you know how many times you need to iterate, you could use F# List.fold like this to hide the recursion implementation.
[1..6] |> List.fold (fun acc _ -> Console.ReadLine()::acc) []
I would remove the pipe from match for readability but use it in the last expression to avoid extra brackets:
open System
let rec makelist l =
match List.length l with
| 6 -> printfn "all done"; l
| _ -> Console.ReadLine()::l |> makelist
makelist []

Pairwise Sequence Processing to compare db tables

Consider the following Use case:
I want to iterate through 2 db tables in parallel and find differences and gaps/missing records in either table. Assume that 1) pk of table is an Int ID field; 2) the tables are read in ID order; 3) records may be missing from either table (with corresponding sequence gaps).
I'd like to do this in a single pass over each db - using lazy reads. (My initial version of this program uses sequence objects and the data reader - unfortunately makes multiple passes over each db).
I've thought of using pairwise sequence processing and use Seq.skip within the iterations to try and keep the table processing in sync. However apparently this is very slow as I Seq.skip has a high overhead (creating new sequences under the hood) so this could be a problem with a large table (say 200k recs).
I imagine this is a common design pattern (compare concurrent data streams from different sources) and am interested in feedback/comments/links to similar projects.
Anyone care to comment?
Here's my (completely untested) take, doing a single pass over both tables:
let findDifferences readerA readerB =
let idsA, idsB =
let getIds (reader:System.Data.Common.DbDataReader) =
reader |> LazyList.unfold (fun reader ->
if reader.Read ()
then Some (reader.GetInt32 0, reader)
else None)
getIds readerA, getIds readerB
let onlyInA, onlyInB = ResizeArray<_>(), ResizeArray<_>()
let rec impl a b =
let inline handleOnlyInA idA as' = onlyInA.Add idA; impl as' b
let inline handleOnlyInB idB bs' = onlyInB.Add idB; impl a bs'
match a, b with
| LazyList.Cons (idA, as'), LazyList.Cons (idB, bs') ->
if idA < idB then handleOnlyInA idA as'
elif idA > idB then handleOnlyInB idB bs'
else impl as' bs'
| LazyList.Nil, LazyList.Nil -> () // termination condition
| LazyList.Cons (idA, as'), _ -> handleOnlyInA idA as'
| _, LazyList.Cons (idB, bs') -> handleOnlyInB idB bs'
impl idsA idsB
onlyInA.ToArray (), onlyInB.ToArray ()
This takes two DataReaders (one for each table) and returns two int[]s which indicate the IDs that were only present in their respective table. The code assumes that the ID field is of type int and is at ordinal index 0.
Also note that this code uses LazyList from the F# PowerPack, so you'll need to get that if you don't already have it. If you're targeting .NET 4.0 then I strongly recommend getting the .NET 4.0 binaries which I've built and hosted here, as the binaries from the F# PowerPack site only target .NET 2.0 and sometimes don't play nice with VS2010 SP1 (see this thread for more info: Problem with F# Powerpack. Method not found error).
When you use sequences, any lazy function adds some overhead on the sequence. Calling Seq.skip thousands of times on the same sequence will clearly be slow.
You can use Seq.zip or Seq.map2 to process two sequences at a time:
> Seq.map2 (+) [1..3] [10..12];;
val it : seq<int> = seq [11; 13; 15]
If the Seq module is not enough, you might need to write your own function.
I'm not sure if I understand what you try to do, but this sample function might help you:
let fct (s1: seq<_>) (s2: seq<_>) =
use e1 = s1.GetEnumerator()
use e2 = s2.GetEnumerator()
let rec walk () =
// do some stuff with the element of both sequences
printfn "%d %d" e1.Current e2.Current
if cond1 then // move in both sequences
if e1.MoveNext() && e2.MoveNext() then walk ()
else () // end of a sequence
elif cond2 then // move to the next element of s1
if e1.MoveNext() then walk()
else () // end of s1
elif cond3 then // move to the next element of s2
if e2.MoveNext() then walk ()
else () // end of s2
// we need at least one element in each sequence
if e1.MoveNext() && e2.MoveNext() then walk()
Edit :
The previous function was meant to extend functionality of the Seq module, and you'll probably want to make it a high-order function. As ildjarn said, using LazyList can lead to cleaner code:
let rec merge (l1: LazyList<_>) (l2: LazyList<_>) =
match l1, l2 with
| LazyList.Cons(h1, t1), LazyList.Cons(h2, t2) ->
if h1 <= h2 then LazyList.cons h1 (merge t1 l2)
else LazyList.cons h2 (merge l1 t2)
| LazyList.Nil, l2 -> l2
| _ -> l1
merge (LazyList.ofSeq [1; 4; 5; 7]) (LazyList.ofSeq [1; 2; 3; 6; 8; 9])
But I still think you should separate the iteration of your data, from the processing. Writing a high-order function to iterate is a good idea (at the end, it's not annoying if the iterator function code uses mutable enumerators).

Resources