F# Async actions with conflicts / ordering / mutex

F# makes it easy to define asynchronous computations using the async builder. You can write an entire program and then pass it to Async.RunSynchronously.
The problem I am having is that some async actions must not be run at the same time; they should be forced to wait for other async actions to complete. This is something like a mutex. However, I do not want to just chain them in serial because this would be inefficient.
Concrete Example: Download Cache
Suppose I want to fetch some remote files using a local file cache. In my application I call fetchFile : Async<string> in many places, but there is a risk that if I call fetchFile on the same URL at the same time, the cache will get corrupted by multiple writes. Instead, the fetchFile command should have behaviour like this:
If there is no cache, download the file to the cache and then read the cache contents
If the cache is currently being written to, wait for the write to finish and then read the contents
If the cache is present and complete, just read the cache contents
fetchFile on two different URLs should work in parallel
I am imagining some kind of stateful DownloadManager class to which requests can be sent and ordered internally.
How do F# programmers usually implement such logic with async?
Imaginary usage:
let dm = new DownloadManager()
let urls = [
    "https://www.google.com";
    "https://www.google.com";
    "https://www.wikipedia.org";
    "https://www.google.com";
    "https://www.bing.com";
]
let results =
    urls
    |> Seq.map dm.Download
    |> Async.Parallel
    |> Async.RunSynchronously
Note: I previously asked a question about how to run async actions in a semi-parallel fashion, but I have now realized that that approach is hard to compose.
Note: I do not have to worry about multiple instances of the application running at once. In-memory locking is sufficient.

I agree with @AMieres that a mailbox processor is a good way to do this. My version of the code is somewhat less general - it uses the mailbox processor directly for this one purpose, so it might be a bit simpler.
Our mailbox processor has only one message - you ask it to download a URL and it gives you back an async workflow that you can wait for to get your result:
type DownloadMessage =
    | Download of string * AsyncReplyChannel<Async<string>>
We need a helper function to asynchronously download a URL:
let asyncDownload url = async {
    let wc = new System.Net.WebClient()
    printfn "Downloading: %s" url
    return! wc.AsyncDownloadString(System.Uri(url)) }
In the mailbox processor, we keep a mutable cache (this is fine, because the mailbox processor processes messages synchronously). When we get a download request, we check whether we already have the download in the cache - if not, we start the download as a child async and add it to the cache - so the cache contains async workflows that represent the results of running downloads.
let downloadCache = MailboxProcessor.Start(fun inbox -> async {
    let cache = System.Collections.Generic.Dictionary<_, _>()
    while true do
        let! (Download(url, repl)) = inbox.Receive()
        if not (cache.ContainsKey url) then
            let! proc = asyncDownload url |> Async.StartChild
            cache.Add(url, proc)
        repl.Reply(cache.[url]) })
To actually download using cache, we just send a request to the mailbox processor and then wait for the result of the returned workflow (which may be shared by multiple requests).
let downloadUsingCache url = async {
    let! res = downloadCache.PostAndAsyncReply(fun ch -> Download(url, ch))
    return! res }
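This composes the same way as the imagined DownloadManager in the question; a quick usage sketch (my own, reusing URLs from the question):

// Duplicate URLs share one download; distinct URLs run in parallel.
let results =
    [ "https://www.google.com"
      "https://www.google.com"
      "https://www.wikipedia.org" ]
    |> List.map downloadUsingCache
    |> Async.Parallel
    |> Async.RunSynchronously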

UPDATE
Async.StartChild, as suggested by Tomas Petricek, works better than a lazy value, so I changed lazyDownload to asyncDownload.
You could use a MailboxProcessor as a download manager that handles the cache. The MailboxProcessor is an F# structure that processes a queue of messages one at a time, ensuring there are no collisions.
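For readers who have not used agents before, here is a minimal sketch of the pattern (my own illustration, separate from the download manager code that follows):

// Minimal agent sketch: Post queues a message; the body handles messages one at a time.
let echoAgent =
    MailboxProcessor<string>.Start(fun inbox -> async {
        while true do
            let! msg = inbox.Receive()
            printfn "got: %s" msg })

echoAgent.Post "hello"   // returns immediately; the agent handles it in order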
First you need a processor capable of maintaining a state:
let stateFull hndl initState =
    MailboxProcessor.Start(fun inbox ->
        let rec loop state : Async<unit> = async {
            try let! f = inbox.Receive()
                let! newState = f state
                return! loop newState
            with e -> return! loop (hndl e state)
        }
        loop initState
    )
The first parameter is a handler for errors, the second is the initial state, in this case a Map<string, Async<string>>. This is our downloadManager:
let downloadManager =
    stateFull (fun e s -> printfn "%A" e ; s) (Map.empty : Map<string, _>)
To invoke the MailboxProcessor we need to use PostAndReply:
let applyReplyS f (agent: MailboxProcessor<'a->Async<'a>>) =
    agent.PostAndReply(fun (reply:AsyncReplyChannel<'r>) ->
        fun v -> async {
            let st, r = f v
            reply.Reply r
            return st
        })
This function expects a folder function that checks the cache, adds an Async<string> if none is found, and returns the updated cache.
First the asyncDownload function:
let asyncDownload url =
    async {
        let started = System.DateTime.UtcNow.Ticks
        do! Async.Sleep 30
        let finished = System.DateTime.UtcNow.Ticks
        let r = sprintf "Downloaded %A it took: %dms %s" (started / 10000L) ((finished - started) / 10000L) url
        printfn "%s" r
        return r
    }
Just a dummy function that returns a string and timing information.
Now the folder function that checks the cache:
let folderCache url cache =
    cache
    |> Map.tryFind url
    |> Option.map(fun ld -> cache, ld)
    |> Option.defaultWith (fun () ->
        let ld = asyncDownload url |> Async.StartChild |> Async.RunSynchronously
        cache |> Map.add url ld, ld
    )
Finally, our download function:
let downloadUrl url =
    downloadManager
    |> applyReplyS (folderCache url)
// val downloadUrl: url: string -> Async<string>
Testing
let s = System.DateTime.UtcNow.Ticks
printfn "started %A" (s / 10000L)
let res =
    List.init 50 (fun i -> i, downloadUrl (string <| i % 5) )
    |> List.groupBy (snd >> Async.RunSynchronously)
    |> List.map (fun (t, ts) -> sprintf "%s - %A" t (ts |> List.map fst ) )
let f = System.DateTime.UtcNow.Ticks
printfn "finish %A" (f / 10000L)
printfn "elapsed %dms" ((f - s) / 10000L)
res |> printfn "Result: \n%A"
produces the following output:
started 63676683215256L
Downloaded 63676683215292L it took: 37ms "2"
Downloaded 63676683215292L it took: 36ms "3"
Downloaded 63676683215292L it took: 36ms "1"
Downloaded 63676683215291L it took: 38ms "0"
Downloaded 63676683215292L it took: 36ms "4"
finish 63676683215362L
elapsed 106ms
Result:
["Downloaded 63676683215291L it took: 38ms "0" - [0; 5; 10; 15; 20; 25; 30; 35; 40; 45]";
"Downloaded 63676683215292L it took: 36ms "1" - [1; 6; 11; 16; 21; 26; 31; 36; 41; 46]";
"Downloaded 63676683215292L it took: 37ms "2" - [2; 7; 12; 17; 22; 27; 32; 37; 42; 47]";
"Downloaded 63676683215292L it took: 36ms "3" - [3; 8; 13; 18; 23; 28; 33; 38; 43; 48]";
"Downloaded 63676683215292L it took: 36ms "4" - [4; 9; 14; 19; 24; 29; 34; 39; 44; 49]"]

I am offering a simplified version based on @Tomas Petricek's answer.
Let's assume that we have a download function that, given a url, returns an Async<string>. This is a dummy version:
let asyncDownload url =
    async {
        let started = System.DateTime.UtcNow.Ticks
        do! Async.Sleep 30
        let finished = System.DateTime.UtcNow.Ticks
        let r = sprintf "Downloaded %A it took: %dms %s" (started / 10000L) ((finished - started) / 10000L) url
        printfn "%s" r
        return r
    }
Here we have some simple generic Mailbox helper functions in their own module:
module Mailbox =
    let iterA hndl f =
        MailboxProcessor.Start(fun inbox ->
            async {
                while true do
                    try let! msg = inbox.Receive()
                        do! f msg
                    with e -> hndl e
            }
        )
    let callA hndl f = iterA hndl (fun ((replyChannel: AsyncReplyChannel<_>), msg) -> async {
        let! r = f msg
        replyChannel.Reply r
    })
    let call hndl f = callA hndl (fun msg -> async { return f msg } )
The purpose of this 'library' is to simplify the more typical uses of the MailboxProcessor. While it may look complicated and hard to understand, the important thing is what the functions do and how to use them.
In particular we are going to use Mailbox.call, which returns a mailbox agent capable of returning a value. Its signature is:
val call:
    hndl: (exn -> unit) ->
    f   : ('a  -> 'b )
       -> MailboxProcessor<AsyncReplyChannel<'b> * 'a>
The first parameter is an exception handler and the second a function that returns a value. Here is how we define our downloadManager:
let downloadManager =
    let dict = new System.Collections.Generic.Dictionary<string, _>()
    Mailbox.call (printfn "%A") (fun url ->
        if dict.ContainsKey url then dict.[url] else
        let result = asyncDownload url |> Async.StartChild |> Async.RunSynchronously
        dict.Add(url, result)
        result
    )
Our cache is a Dictionary. If a url is not present we call asyncDownload and start it as a child async. By using Async.StartChild we do not have to wait until it finishes downloading; we just return an async that waits for it to finish.
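As a side note, here is a minimal sketch of the Async.StartChild semantics this relies on (my own illustration, independent of the manager above):

// Async.StartChild starts the work immediately and returns an Async<'T>
// that completes when the child does; it can be awaited later.
let demo = async {
    let! child = Async.StartChild (async {
        do! Async.Sleep 100
        return "downloaded" })
    // other work could happen here while the child runs
    let! result = child      // wait for the child to finish
    printfn "%s" result }

demo |> Async.RunSynchronously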
To invoke the manager we use downloadManager.PostAndReply:
let downloadUrl url = downloadManager.PostAndReply(fun reply -> reply, url)
And here is a test:
let s = System.DateTime.UtcNow.Ticks
printfn "started %A" (s / 10000L)
let res =
    List.init 50 (fun i -> i, downloadUrl (string <| i % 5) )
    |> List.groupBy (snd >> Async.RunSynchronously)
    |> List.map (fun (t, ts) -> sprintf "%s - %A" t (ts |> List.map fst ) )
let f = System.DateTime.UtcNow.Ticks
printfn "finish %A" (f / 10000L)
printfn "elapsed %dms" ((f - s) / 10000L)
res |> printfn "Result: \n%A"
That produces:
started 63676682503885L
Downloaded 63676682503911L it took: 34ms 1
Downloaded 63676682503912L it took: 33ms 2
Downloaded 63676682503911L it took: 37ms 0
Downloaded 63676682503912L it took: 33ms 3
Downloaded 63676682503912L it took: 33ms 4
finish 63676682503994L
elapsed 109ms
Result:
["Downloaded 63676682503911L it took: 37ms 0 - [0; 5; 10; 15; 20; 25; 30; 35; 40; 45]";
"Downloaded 63676682503911L it took: 34ms 1 - [1; 6; 11; 16; 21; 26; 31; 36; 41; 46]";
"Downloaded 63676682503912L it took: 33ms 2 - [2; 7; 12; 17; 22; 27; 32; 37; 42; 47]";
"Downloaded 63676682503912L it took: 33ms 3 - [3; 8; 13; 18; 23; 28; 33; 38; 43; 48]";
"Downloaded 63676682503912L it took: 33ms 4 - [4; 9; 14; 19; 24; 29; 34; 39; 44; 49]"]

Related

Call async method in an inner lambda? "This construct may only be used within computation expressions"

I have the following code
let rec consume() : Async<unit> = async {
.....
listA
|> Seq.iter(fun i ->
.....
let listB : seq<...> option =
let c = getListB a b
match c with
| Some d -> Seq.filter(....) |> Some
| None -> None
match listB with .....
....
Now the function getListB has been changed to return Async<seq<B>> instead of seq<B>, so the code was converted to the following. However, getListB now blocks the execution. How can I rewrite it to be non-blocking? Simply changing the line to let! c = getListB a b doesn't work because the code is in an inner lambda; the error message is "This construct may only be used within computation expressions".
let rec consume() : Async<unit> = async {
.....
listA
|> Seq.iter(fun i ->
.....
let listB : seq<...> option =
let c = getListB a b |> Async.RunSynchronously
match c with
| Some d -> Seq.filter(....) |> Some
| None -> None
I believe the problem you are describing boils down to how to convert a seq<Async<'a>> into an Async<seq<'a>>. This is described comprehensively in this post by Scott Wlaschin.
This is a poor man's implementation of the concepts described in his post, which are far more powerful and generic. The general idea is that we want to delay the creation of the sequence until we have the values promised by each instance of Async<_>.
let traverseSequence ( seqAsync : seq<Async<'a>>) =
    let promiseOfAnEmptySequence = async { return Seq.empty }
    let delayedCalculation (asyncHead : Async<'a>) (asyncTail : Async<seq<'a>>) =
        async {
            let! calculatedHead = asyncHead
            return!
                async {
                    let! calculatedTail = asyncTail
                    // prepend the head to the tail so the original order is preserved
                    return Seq.append (Seq.singleton calculatedHead) calculatedTail
                }
        }
    Seq.foldBack delayedCalculation seqAsync promiseOfAnEmptySequence
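A hypothetical usage sketch (asyncDouble is made up for illustration):

// Turn a seq<Async<int>> into an Async<seq<int>> and run it.
let asyncDouble n = async { return n * 2 }

[1; 2; 3]
|> Seq.map asyncDouble      // seq<Async<int>>
|> traverseSequence         // Async<seq<int>>
|> Async.RunSynchronously   // seq with the doubled values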
The answer depends on whether you want to run each element of the sequence sequentially or in parallel.
In both cases, start by using Seq.map instead of Seq.iter, then you can put another async block inside the lambda such that the result of the map is seq<Async<'a>>.
Sequential
For this, you need to define some extra functions in an extra Async module.
module Async =
    let map f x =
        async {
            let! x = x
            return f x
        }
    let lift2 f x1 x2 =
        async {
            let! x1 = x1
            let! x2 = x2
            return f x1 x2
        }
    let return' x = async { return x }
    let mapM mFunc sequ =
        let consF x ys = lift2 (fun h t -> h::t) (mFunc x) ys
        Seq.foldBack(consF) sequ (return' [])
        |> map (Seq.ofList)
    let sequence sequ = mapM id sequ
You might have seen mapM called traverse elsewhere, they are basically just different names for the same concept.
The sequence function is just a special case of mapM where the supplied binding function is just the identity (id) function. It has type seq<Async<'a>> -> Async<seq<'a>>, i.e. it flips the Async from being inside the Seq to being outside.
You then simply pipe the result of your Seq.map to the sequence function, which gives you an async value.
Your example code isn't complete so I made up some example code to use this:
let sleep = Async.Sleep 100
let sleeps = Seq.init 15 (fun _ -> sleep)
let sequencedSleeps = Async.sequence sleeps
Async.RunSynchronously sequencedSleeps
Real: 00:00:01.632, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
val it : seq<unit> =
[null; null; null; null; null; null; null; null; null; null; null; null;
null; null; null]
Parallel
To execute each element of the sequence in parallel, instead of sequentially, you could do:
let pSequence sequ = Async.Parallel sequ |> Async.map (Seq.ofArray)
Example test code:
let pSleeps = pSequence sleeps;;
Async.RunSynchronously pSleeps;;
Real: 00:00:00.104, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
val it : seq<unit> = seq [null; null; null; null; ...]
Note how the execution time depends on the chosen approach.
For the cases where you're getting back a seq<unit> and so want to ignore the result, it can be useful to define some extra helper functions, such as:
let sequenceIgnore sequ = sequ |> Async.sequence |> Async.map (ignore)
let pSequenceIgnore sequ = sequ |> pSequence |> Async.map (ignore)
That lets you return a single unit rather than a superfluous sequence of them.
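For example (my own illustration), reusing the sleeps sequence from above:

// Both return unit; the first runs the sleeps one after another, the second in parallel.
sleeps |> sequenceIgnore  |> Async.RunSynchronously
sleeps |> pSequenceIgnore |> Async.RunSynchronously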

http download to disk with fsharp.data.dll and async workflows stalls

The following .fsx file is supposed to download binary tablebase files, which are posted as links on an html page on the internet, and save them to disk, using FSharp.Data.dll.
What happens is that the whole thing stalls after a while, well before it is done, without even throwing an exception or anything like that.
I am pretty sure I am somehow mis-handling the CopyToAsync() call in my async workflow. As this is supposed to run while I go for a nap, it would be nice if someone could tell me how it is supposed to be done correctly. (In more general terms: how do I handle a System.Threading.Task in an async workflow?)
#r #"E:\R\playground\DataTypeProviderStuff\packages\FSharp.Data.2.2.3\lib\net40\FSharp.Data.dll"
open FSharp.Data
open Microsoft.FSharp.Control.CommonExtensions
let document = HtmlDocument.Load("http://www.olympuschess.com/egtb/gaviota/")
let links =
document.Descendants ["a"] |> Seq.choose (fun x -> x.TryGetAttribute("href") |> Option.map (fun a -> a.Value()))
|> Seq.filter (fun v -> v.EndsWith(".cp4"))
|> List.ofSeq
let targetFolder = #"E:\temp\tablebases\"
let downloadUrls =
links |> List.map (fun name -> "http://www.olympuschess.com/egtb/gaviota/" + name, targetFolder + name )
let awaitTask = Async.AwaitIAsyncResult >> Async.Ignore
let fetchAndSave (s,t) =
async {
printfn "Starting with %s..." s
let! result = Http.AsyncRequestStream(s)
use fileStream = new System.IO.FileStream(t,System.IO.FileMode.Create)
do! awaitTask (result.ResponseStream.CopyToAsync(fileStream))
printfn "Done with %s." s
}
let makeBatches n jobs =
let rec collect i jl acc =
match i,jl with
| 0, _ -> acc,jl
| _, [] -> acc,jl
| _, x::xs -> collect (i-1) (xs) (acc # [x])
let rec loop remaining acc =
match remaining with
| [] -> acc
| x::xs ->
let r,rest = collect n remaining []
loop rest (acc # [r])
loop jobs []
let download () =
downloadUrls
|> List.map fetchAndSave
|> makeBatches 2
|> List.iter (fun l -> l |> Async.Parallel |> Async.RunSynchronously |> ignore )
|> ignore
download()
Note: I updated the code so it creates batches of 2 downloads at a time; only the first batch works. I also added the awaitTask from the first answer, as this seems to be the right way to do it.
News: what is also funny is that if I interrupt the stalled script and then #load it again into the same instance of fsi.exe, it stalls right away. I am starting to think it is a bug in the library I am using, or something like that.
Thanks, in advance!
Here fetchAndSave has been modified to handle the Task returned from CopyToAsync asynchronously. In your version you are waiting on the Task synchronously. Your script will appear to lock up as you are using Async.RunSynchronously to run the whole workflow. However the files do download as expected in the background.
let awaitTask = Async.AwaitIAsyncResult >> Async.Ignore
let fetchAndSave (s,t) = async {
    let! result = Http.AsyncRequestStream(s)
    use fileStream = new System.IO.FileStream(t,System.IO.FileMode.Create)
    do! awaitTask (result.ResponseStream.CopyToAsync(fileStream))
}
Of course you also need to call
do download()
on the last line of your script to kick things into motion.
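As an aside, on newer FSharp.Core versions Async.AwaitTask can be used instead of the AwaitIAsyncResult/Ignore combination; a minimal sketch, assuming the same FSharp.Data setup as above:

// Sketch only: CopyToAsync returns a plain Task, which Async.AwaitTask
// (available in newer FSharp.Core versions) turns into an Async<unit>.
let fetchAndSave (s, t) = async {
    let! result = Http.AsyncRequestStream(s)
    use fileStream = new System.IO.FileStream(t, System.IO.FileMode.Create)
    do! result.ResponseStream.CopyToAsync(fileStream) |> Async.AwaitTask
}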

Are F# Async.Parallel results guaranteed to be in order?

Are the results from F#'s Async.Parallel operation guaranteed to arrive in the order jobs were submitted? My sample code returns the results in order, but I can't find any mention in the MSDN docs, or the F# spec, assuring this must be the case -- that it's not a coincidence.
Here is my sample code:
let r = System.Random()
Async.Parallel [
    for i in 0..10 ->
        async {
            let rand_num = r.Next(10)
            do! Async.Sleep(rand_num) (* Simulate jobs taking a variable amount of time *)
            printfn "%i %i" i rand_num
            return i
        }
]
|> Async.RunSynchronously
|> printfn "%A"
And here's the output.
0 0
5 1
4 1
3 3
10 6
9 4
7 5
2 5
1 5
8 7
6 9
[|0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]
You can see that, in this run, the async functions complete in indeterminate order, yet the resulting array is sorted. Is this behavior guaranteed?
At the moment, the source of the function is written so that this guarantee is enforced. Looking at control.fs around line #1300 for the definition, we can see the function that puts the results into the output array is
let recordSuccess i res =
    results.[i] <- res;
    finishTask(Interlocked.Decrement count)
This function is called in this segment:
tasks |> Array.iteri (fun i p ->
queueAsync
innerCTS.Token
// on success, record the result
(fun res -> recordSuccess i res)
where tasks holds the original tasks in submission order. This guarantees that the output array is in the same order as the input.
UPDATE
The spec at least seems to imply that the order is fixed - it contains this code:
let rec fib x = if x < 2 then 1 else fib(x-1) + fib(x-2)
let fibs =
    Async.Parallel [ for i in 0..40 -> async { return fib(i) } ]
    |> Async.RunSynchronously
printfn "The Fibonacci numbers are %A" fibs //I changed this line to be accurate
System.Console.ReadKey(true)
If the spec didn't guarantee the output order, this code would be incorrect.

F# divide sequence up in blocks [duplicate]

I'm trying to learn F# by rewriting some C# algorithms I have into idiomatic F#.
One of the first functions I'm trying to rewrite is a batchesOf where:
[1..17] |> batchesOf 5
Which would split the sequence into batches with a max of five in each, i.e:
[[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]
My first attempt at doing this is kind of ugly: I resorted to using a mutable ref object after running into errors when trying to use a mutable variable inside the closure. Using ref is particularly unpleasant since, to dereference it, you have to use the ! operator, which inside a condition expression can be counter-intuitive to devs who read it as logical not. Another problem I ran into is that Seq.skip and Seq.take are not like their LINQ counterparts, in that they will throw an error if the requested size exceeds the size of the sequence.
let batchesOf size (sequence: _ seq) : _ list seq =
    seq {
        let s = ref sequence
        while not (!s |> Seq.isEmpty) do
            yield !s |> Seq.truncate size |> List.ofSeq
            s := System.Linq.Enumerable.Skip(!s, size)
    }
Anyway, what would be the most elegant/idiomatic way to rewrite this in F#, keeping the original behaviour but preferably without the mutable ref variable?
Implementing this function using the seq<_> type idiomatically is difficult - the type is inherently mutable, so there is no simple nice functional way. Your version is quite inefficient, because it uses Skip repeatedly on the sequence. A better imperative option would be to use GetEnumerator and just iterate over elements using IEnumerator. You can find various imperative options in this snippet: http://fssnip.net/1o
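For completeness, a minimal sketch of that imperative approach (my own, not taken from the linked snippet):

// Imperative sketch: pull elements from the enumerator and emit a batch
// whenever 'size' elements have been collected.
let batchesOf size (source: seq<_>) = seq {
    use e = source.GetEnumerator()
    let batch = ResizeArray<_>()
    while e.MoveNext() do
        batch.Add e.Current
        if batch.Count = size then
            yield List.ofSeq batch
            batch.Clear()
    if batch.Count > 0 then
        yield List.ofSeq batch }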
If you're learning F#, then it is better to try writing the function using F# list type. This way, you can use idiomatic functional style. Then you can write batchesOf using pattern matching with recursion and accumulator argument like this:
let batchesOf size input =
    // Inner function that does the actual work.
    // 'input' is the remaining part of the list, 'num' is the number of elements
    // in a current batch, which is stored in 'batch'. Finally, 'acc' is a list of
    // batches (in a reverse order)
    let rec loop input num batch acc =
        match input with
        | [] ->
            // We've reached the end - add current batch to the list of all
            // batches if it is not empty and return batch (in the right order)
            if batch <> [] then (List.rev batch)::acc else acc
            |> List.rev
        | x::xs when num = size - 1 ->
            // We've reached the end of the batch - add the last element
            // and add batch to the list of batches.
            loop xs 0 [] ((List.rev (x::batch))::acc)
        | x::xs ->
            // Take one element from the input and add it to the current batch
            loop xs (num + 1) (x::batch) acc
    loop input 0 [] []
As a footnote, the imperative version can be made a bit nicer using computation expression for working with IEnumerator, but that's not standard and it is quite advanced trick (for example, see http://fssnip.net/37).
A friend asked me this a while back. Here's a recycled answer. This works and is pure:
let batchesOf n =
    Seq.mapi (fun i v -> i / n, v) >>
    Seq.groupBy fst >>
    Seq.map snd >>
    Seq.map (Seq.map snd)
Or an impure version:
let batchesOf n =
    let i = ref -1
    Seq.groupBy (fun _ -> i := !i + 1; !i / n) >> Seq.map snd
These produce a seq<seq<'a>>. If you really must have an 'a list list as in your sample then just add ... |> Seq.map (List.ofSeq) |> List.ofSeq as in:
> [1..17] |> batchesOf 5 |> Seq.map (List.ofSeq) |> List.ofSeq;;
val it : int list list = [[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]
Hope that helps!
This can be done without recursion if you want
[0..20]
|> Seq.mapi (fun i elem -> (i/size),elem)
|> Seq.groupBy (fun (a,_) -> a)
|> Seq.map (fun (_,se) -> se |> Seq.map (snd));;
val it : seq<seq<int>> =
seq
[seq [0; 1; 2; 3; ...]; seq [5; 6; 7; 8; ...]; seq [10; 11; 12; 13; ...];
seq [15; 16; 17; 18; ...]; ...]
Depending on how you think, this may be easier to understand. Tomas' solution is probably more idiomatic F#, though.
Hurray, we can use List.chunkBySize, Seq.chunkBySize and Array.chunkBySize in F# 4, as mentioned by Brad Collins and Scott Wlaschin.
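For example:

[1..17] |> List.chunkBySize 5
// [[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]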
This isn't perhaps idiomatic but it works:
let batchesOf n l =
    let _, _, temp', res' =
        List.fold (fun (i, n, temp, res) hd ->
            if i < n then
                (i + 1, n, hd :: temp, res)
            else
                (1, i, [hd], (List.rev temp) :: res))
            (0, n, [], []) l
    (List.rev temp') :: res' |> List.rev
Here's a simple implementation for sequences:
let chunks size (items:seq<_>) =
    use e = items.GetEnumerator()
    let rec loop i acc =
        seq {
            if i = size then
                yield (List.rev acc)
                yield! loop 0 []
            elif e.MoveNext() then
                yield! loop (i+1) (e.Current::acc)
            else
                yield (List.rev acc)
        }
    if size = 0 then invalidArg "size" "must be greater than zero"
    if Seq.isEmpty items then Seq.empty else loop 0 []

let s = Seq.init 10 id
chunks 3 s
//output: seq [[0; 1; 2]; [3; 4; 5]; [6; 7; 8]; [9]]
My method involves converting the list to an array and recursively chunking the array:
let batchesOf (sz:int) lt =
    let arr = List.toArray lt
    let rec bite curr =
        if (curr + sz - 1 ) >= arr.Length then
            [Array.toList arr.[ curr .. (arr.Length - 1)]]
        else
            let curr1 = curr + sz
            (Array.toList (arr.[curr .. (curr + sz - 1)])) :: (bite curr1)
    bite 0
batchesOf 5 [1 .. 17]
[[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]
I found this to be a quite terse solution:
let partition n (stream:seq<_>) = seq {
    let enum = stream.GetEnumerator()
    let rec collect n partition =
        if n = 1 || not (enum.MoveNext()) then
            partition
        else
            collect (n-1) (partition @ [enum.Current])
    while enum.MoveNext() do
        yield collect n [enum.Current]
}
It works on a sequence and produces a sequence. The output sequence consists of lists of n elements from the input sequence.
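A quick check of the snippet above (my own test):

Seq.init 10 id |> partition 3 |> List.ofSeq
// [[0; 1; 2]; [3; 4; 5]; [6; 7; 8]; [9]]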
You can solve your task with an analog of Clojure's partition library function, shown below:
let partition n step coll =
    let rec split ss =
        seq {
            yield(ss |> Seq.truncate n)
            if Seq.length(ss |> Seq.truncate (step+1)) > step then
                yield! split <| (ss |> Seq.skip step)
        }
    split coll
Used as partition 5 5, it will provide you with the sought batchesOf 5 functionality:
[1..17] |> partition 5 5;;
val it : seq<seq<int>> =
seq
[seq [1; 2; 3; 4; ...]; seq [6; 7; 8; 9; ...]; seq [11; 12; 13; 14; ...];
seq [16; 17]]
As a bonus, by playing with n and step you can use it for slicing overlapping batches (aka sliding windows), and even apply it to infinite sequences, like below:
Seq.initInfinite(fun x -> x) |> partition 4 1;;
val it : seq<seq<int>> =
seq
[seq [0; 1; 2; 3]; seq [1; 2; 3; 4]; seq [2; 3; 4; 5]; seq [3; 4; 5; 6];
...]
Consider it a prototype only, as it does many redundant evaluations on the source sequence and is not likely fit for production purposes.
This version passes all the tests I could think of, including ones for lazy evaluation and single sequence evaluation:
let batchIn batchLength sequence =
    let padding = seq { for i in 1 .. batchLength -> None }
    let wrapped = sequence |> Seq.map Some
    Seq.concat [wrapped; padding]
    |> Seq.windowed batchLength
    |> Seq.mapi (fun i el -> (i, el))
    |> Seq.filter (fun t -> fst t % batchLength = 0)
    |> Seq.map snd
    |> Seq.map (Seq.choose id)
    |> Seq.filter (fun el -> not (Seq.isEmpty el))
I am still quite new to F# so if I'm missing anything - please do correct me, it will be greatly appreciated.

Map Reduce with F# agents

After playing with F# agents I tried to do a map reduce using them.
The basic structure I use is:
map supervisor, which queues up all the work to do in its state and receives work requests from map workers
reduce supervisor, which does the same thing as the map supervisor but for reduce work
a bunch of map and reduce workers that map and reduce; if one fails its work, it sends it back to the respective supervisor to be reprocessed.
The questions I wonder about are:
does this make any sense compared to a more traditional (yet very nice) map reduce like http://tomasp.net/blog/fsharp-parallel-aggregate.aspx that uses PSeq?
the way I implemented the map and reduce workers seems ugly; is there a better way?
it seems like I can create 1,000,000 map workers and 10,000,000 reduce workers, lol; how should I choose these numbers - the more the better?
Thanks a lot,
type Agent<'T> = MailboxProcessor<'T>
//This is the response the supervisor
//gives to the worker request for work
type 'work SupervisorResponse =
| Work of 'work //a piece of work
| NoWork//no work left to do
//This is the message to the supervisor
type 'work WorkMsg =
| ToDo of 'work //piles up work in the Supervisor queue
| WorkReq of AsyncReplyChannel<SupervisorResponse<'work>> //'
//The supervisor agent can be interacted with
type AgentOperation =
| Stop //stop the agent
| Status //yield the current status of supervisor
type 'work SupervisorMsg =
| WorkRel of 'work WorkMsg
| Operation of AgentOperation
//Supervises Map and Reduce workers
module AgentSupervisor=
let getNew (name:string) =
new Agent<SupervisorMsg<'work>>(fun inbox -> //'
let rec loop state = async {
let! msg = inbox.Receive()
match msg with
| WorkRel(m) ->
match m with
| ToDo(work) ->
let newState = work::state
return! loop newState
| WorkReq(replyChannel) ->
match state with
| [] ->
replyChannel.Reply(NoWork)
return! loop []
| [item] ->
replyChannel.Reply(Work(item))
return! loop []
| (item::remaining) ->
replyChannel.Reply(Work(item))
return! loop remaining
| Operation(op) ->
match op with
| Status ->
Console.WriteLine(name+" current Work Queue "+
string (state.Length))
return! loop state
| Stop ->
Console.WriteLine("Stoppped SuperVisor Agent "+name)
return()
}
loop [] )
let stop (agent:Agent<SupervisorMsg<'work>>) = agent.Post(Operation(Stop))
let status (agent:Agent<SupervisorMsg<'work>>) =agent.Post(Operation(Status))
//Code for the workers
type 'success WorkOutcome =
| Success of 'success
| Fail
type WorkerMsg =
| Start
| Stop
| Continue
module AgentWorker =
type WorkerSupervisors<'reduce,'work> =
{ Map:Agent<SupervisorMsg<'work>> ; Reduce:Agent<SupervisorMsg<'reduce>> }
let stop (agent:Agent<WorkerMsg>) = agent.Post(Stop)
let start (agent:Agent<WorkerMsg>) = agent.Start()
agent.Post(Start)
let getNewMapWorker( map, supervisors:WorkerSupervisors<'reduce,'work> ) =
new Agent<WorkerMsg>(fun inbox ->
let rec loop () = async {
let! msg = inbox.Receive()
match msg with
| Start -> inbox.Post(Continue)
return! loop ()
| Continue ->
let! supervisorOrder =
supervisors.Map.PostAndAsyncReply(
fun replyChannel ->
WorkRel(WorkReq(replyChannel)))
match supervisorOrder with
| Work(work) ->
let! res = map work
match res with
| Success(toReduce) ->
supervisors.Reduce
.Post(WorkRel(ToDo(toReduce)))
| Fail ->
Console.WriteLine("Map Fail")
supervisors.Map
.Post(WorkRel(ToDo(work)))
inbox.Post(Continue)
| NoWork ->
inbox.Post(Continue)
return! loop ()
| Stop ->
Console.WriteLine("Map worker stopped")
return ()
}
loop () )
let getNewReduceWorker(reduce,reduceSupervisor:Agent<SupervisorMsg<'work>>)=//'
new Agent<WorkerMsg>(fun inbox ->
let rec loop () = async {
let! msg = inbox.Receive()
match msg with
| Start -> inbox.Post(Continue)
return! loop()
| Continue ->
let! supervisorOrder =
reduceSupervisor.PostAndAsyncReply(fun replyChannel ->
WorkRel(WorkReq(replyChannel)))
match supervisorOrder with
| Work(work) ->
let! res = reduce work
match res with
| Success(toReduce) -> inbox.Post(Continue)
| Fail ->
Console.WriteLine("ReduceFail")
reduceSupervisor.Post(WorkRel(ToDo(work)))
inbox.Post(Continue)
| NoWork -> inbox.Post(Continue)
return! loop()
|Stop ->Console.WriteLine("Reduce worker stopped"); return ()
}
loop() )
open AgentWorker
type MapReduce<'work,'reduce>( numberMap:int ,
numberReduce: int,
toProcess:'work list,
map:'work->Async<'reduce WorkOutcome>,
reduce:'reduce-> Async<unit WorkOutcome>) =
let mapSupervisor= AgentSupervisor.getNew("MapSupervisor")
let reduceSupervisor = AgentSupervisor.getNew("ReduceSupervisor")
let workerSupervisors = {Map = mapSupervisor ; Reduce = reduceSupervisor }
let mapWorkers =
[for i in 1..numberMap ->
AgentWorker.getNewMapWorker(map,workerSupervisors) ]
let reduceWorkers =
[for i in 1..numberReduce ->
AgentWorker.getNewReduceWorker(reduce,workerSupervisors.Reduce) ]
member this.Start() =
//Post work to do
toProcess
|>List.iter(fun elem -> mapSupervisor.Post( WorkRel(ToDo(elem))))
//Start supervisors
mapSupervisor.Start()
reduceSupervisor.Start()
//start workers
List.iter( fun mapper -> mapper |>start) mapWorkers
List.iter( fun reducer ->reducer|>start) reduceWorkers
member this.Status() = (mapSupervisor|>AgentSupervisor.status)
(reduceSupervisor|>AgentSupervisor.status)
member this.Stop() =
List.map2(fun mapper reducer ->
mapper |>stop; reducer|>stop) mapWorkers reduceWorkers
//Run some tests
let map = function (n:int64) -> async{ return Success(n) }
let reduce = function (toto: int64) -> async{ return Success() }
let mp = MapReduce<int64,int64>( 1,1,[for i in 1L..1000000L->i],map,reduce)
mp.Start()
mp.Status()
mp.Stop()
I like to use a MailboxProcessor for the reduce part of the algorithm, and an async block invoked with Async.Parallel for the map part. It makes things more explicit, giving you finer control over exception handling, timeouts, and cancellation.
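A schematic sketch of that shape (the names and message type here are illustrative only, not taken from the code below):

// Schematic only: "map" runs through Async.Parallel, "reduce" is an agent that folds the results.
type ReduceMsg<'a, 'state> =
    | Item of 'a
    | Fetch of AsyncReplyChannel<'state>

let mapReduceSketch (mapper: 'i -> Async<'a>) (folder: 's -> 'a -> 's) (initial: 's) inputs =
    let reducer = MailboxProcessor.Start(fun inbox ->
        let rec loop state = async {
            let! msg = inbox.Receive()
            match msg with
            | Item x -> return! loop (folder state x)
            | Fetch reply -> reply.Reply state; return! loop state }
        loop initial)
    async {
        let! mapped = inputs |> Seq.map mapper |> Async.Parallel   // map phase, in parallel
        mapped |> Array.iter (Item >> reducer.Post)                // feed results to the reducer
        return! reducer.PostAndAsyncReply Fetch }                  // read back the folded state

// e.g. mapReduceSketch (fun n -> async { return n * n }) (+) 0 [1..10] |> Async.RunSynchronously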
The following code was designed with Brian's help, and with the help of his excellent F# block highlighting "F# Depth Colorizer" plug-in for VS2010.
This code is meant to pull RSS feeds from the Yahoo weather server in a map-reduce pattern. It demonstrates how we can control execution flow from outside the actual algorithm.
fetchWeather is the map part, and mailboxLoop is the reduce part of the algorithm.
#r "System.Xml.Linq.dll"
#r "FSharp.PowerPack.dll"
open System
open System.Diagnostics
open System.IO
open System.Linq
open System.Net
open System.Xml.Linq
open Microsoft.FSharp.Control.WebExtensions
type Weather (city, region, temperature) = class
member x.City = city
member x.Region = region
member x.Temperature : int = temperature
override this.ToString() =
sprintf "%s, %s: %d F" this.City this.Region this.Temperature
end
type MessageForActor =
| ProcessWeather of Weather
| ProcessError of int
| GetResults of (Weather * Weather * Weather list) AsyncReplyChannel
let parseRss woeid (rssStream : Stream) =
let xn str = XName.Get str
let yweather elementName = XName.Get(elementName, "http://xml.weather.yahoo.com/ns/rss/1.0")
let channel = (XDocument.Load rssStream).Descendants(xn "channel").First()
let location = channel.Element(yweather "location")
let condition = channel.Element(xn "item").Element(yweather "condition")
// If the RSS server returns error, condition XML element won't be available.
if not(condition = null) then
let temperature = Int32.Parse(condition.Attribute(xn "temp").Value)
ProcessWeather(new Weather(
location.Attribute(xn "city").Value,
location.Attribute(xn "region").Value,
temperature))
else
ProcessError(woeid)
let fetchWeather (actor : MessageForActor MailboxProcessor) woeid =
async {
let rssAddress = sprintf "http://weather.yahooapis.com/forecastrss?w=%d&u=f" woeid
let webRequest = WebRequest.Create rssAddress
use! response = webRequest.AsyncGetResponse()
use responseStream = response.GetResponseStream()
let weather = parseRss woeid responseStream
//do! Async.Sleep 1000 // enable this line to see amplified timing that proves concurrent flow
actor.Post(weather)
}
let mailboxLoop initialCount =
let chooseCityByTemperature op (x : Weather) (y : Weather) =
if op x.Temperature y.Temperature then x else y
let sortWeatherByCityAndState (weatherList : Weather list) =
weatherList
|> List.sortWith (fun x y -> x.City.CompareTo(y.City))
|> List.sortWith (fun x y -> x.Region.CompareTo(y.Region))
MailboxProcessor.Start(fun inbox ->
let rec loop minAcc maxAcc weatherList remaining =
async {
let! message = inbox.Receive()
let remaining = remaining - 1
match message with
| ProcessWeather weather ->
let colderCity = chooseCityByTemperature (<) minAcc weather
let warmerCity = chooseCityByTemperature (>) maxAcc weather
return! loop colderCity warmerCity (weather :: weatherList) remaining
| ProcessError woeid ->
let errorWeather = new Weather(sprintf "Error with woeid=%d" woeid, "ZZ", 99999)
return! loop minAcc maxAcc (errorWeather :: weatherList) remaining
| GetResults replyChannel ->
replyChannel.Reply(minAcc, maxAcc, sortWeatherByCityAndState weatherList)
}
let minValueInitial = new Weather("", "", Int32.MaxValue)
let maxValueInitial = new Weather("", "", Int32.MinValue)
loop minValueInitial maxValueInitial [] initialCount
)
let RunSynchronouslyWithExceptionAndTimeoutHandlers computation =
let timeout = 30000
try
Async.RunSynchronously(Async.Catch(computation), timeout)
|> function Choice1Of2 answer -> answer |> ignore
| Choice2Of2 (except : Exception) -> printfn "%s" except.Message; printfn "%s" except.StackTrace; exit -4
with
| :? System.TimeoutException -> printfn "Timed out waiting for results for %d seconds!" <| timeout / 1000; exit -5
let main =
// Should have script name, sync/async select, and at least one woeid
if fsi.CommandLineArgs.Length < 3 then
printfn "Expecting at least two arguments!"
printfn "There were %d arguments" (fsi.CommandLineArgs.Length - 1)
exit -1
let woeids =
try
fsi.CommandLineArgs
|> Seq.skip 2 // skip the script name and sync/async select
|> Seq.map Int32.Parse
|> Seq.toList
with
| except -> printfn "One of supplied arguments was not an integer: %s" except.Message; exit -2
let actor = mailboxLoop woeids.Length
let processWeatherItemsConcurrently woeids =
woeids
|> Seq.map (fetchWeather actor)
|> Async.Parallel
|> RunSynchronouslyWithExceptionAndTimeoutHandlers
let processOneWeatherItem woeid =
woeid
|> fetchWeather actor
|> RunSynchronouslyWithExceptionAndTimeoutHandlers
let stopWatch = new Stopwatch()
stopWatch.Start()
match fsi.CommandLineArgs.[1].ToUpper() with
| "C" -> printfn "Concurrent execution: "; processWeatherItemsConcurrently woeids
| "S" -> printfn "Synchronous execution: "; woeids |> Seq.iter processOneWeatherItem
| _ -> printfn "Unexpected run options!"; exit -3
let (min, max, weatherList) = actor.PostAndReply GetResults
stopWatch.Stop()
assert (weatherList.Length = woeids.Length)
printfn "{"
weatherList |> List.iter (printfn " %O")
printfn "}"
printfn "Coldest place: %O" min
printfn "Hottest place: %O" max
printfn "Completed in %d millisec" stopWatch.ElapsedMilliseconds
main
