I wrote a program in F# that asynchronously lists all directories on disk. An async task lists all files in a given directory and creates separate async tasks (daemons: I start them using Async.Start) to list subdirectories. They all communicate the results to the central MailboxProcessor.
My problem is: how do I detect that all the daemon tasks have finished and no more files will arrive? Essentially I need a barrier for all tasks that are (direct and indirect) children of my top task. I couldn't find anything like that in F#'s async model.
What I did instead is to create a separate MailboxProcessor where I register each task's start and termination. When the active count goes to zero, I'm done. But I'm not happy with that solution. Any other suggestions?
Have you tried using Async.Parallel? That is, rather than calling Async.Start for each subdirectory, combine the subdirectory tasks into a single async via Async.Parallel. You then end up with a (nested) fork-join task that you can RunSynchronously and await for the final result.
EDIT
Here is some approximate code, that shows the gist, if not the full detail:
open System.IO
let agent = MailboxProcessor.Start(fun mbox ->
    async {
        while true do
            let! msg = mbox.Receive()
            printfn "%s" msg
    })
let rec traverse dir =
    async {
        agent.Post(dir)
        let subDirs = Directory.EnumerateDirectories(dir)
        return! [ for d in subDirs do yield traverse d ]
                |> Async.Parallel |> Async.Ignore
    }
traverse "d:\\" |> Async.RunSynchronously
// now all will be traversed,
// though Post-ed messages to agent may still be in flight
EDIT 2
Here is the waiting version that uses replies:
open System.IO
let agent = MailboxProcessor.Start(fun mbox ->
    async {
        while true do
            let! dir, (replyChannel: AsyncReplyChannel<unit>) = mbox.Receive()
            printfn "%s" dir
            replyChannel.Reply()
    })
let rec traverse dir =
    async {
        let r = agent.PostAndAsyncReply(fun replyChannel -> dir, replyChannel)
        let subDirs = Directory.EnumerateDirectories(dir)
        do! [ for d in subDirs do yield traverse d ]
            |> Async.Parallel |> Async.Ignore
        do! r // wait for the agent's reply for this directory
    }
traverse "c:\\Projects\\" |> Async.RunSynchronously
// now all will be traversed to completion
You could just use Interlocked to increment and decrement as you begin/end tasks, and be all done when it goes to zero. I've used this strategy in similar code with MailboxProcessors.
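To illustrate that idea, here is a minimal sketch of the Interlocked counting strategy; the names startTracked, pending, and allDone are hypothetical, not part of any library:

```fsharp
open System.Threading

// Hypothetical sketch of the Interlocked counting strategy.
let pending = ref 0
let allDone = new ManualResetEventSlim(false)

let startTracked (work: Async<unit>) =
    // count the task *before* it starts, so the count can only
    // reach zero once the whole tree of tasks has finished
    Interlocked.Increment(&pending.contents) |> ignore
    async {
        try do! work
        finally
            if Interlocked.Decrement(&pending.contents) = 0 then
                allDone.Set()
    } |> Async.Start

// spawn every child via startTracked inside each task, then:
// allDone.Wait()
```

This works as a barrier as long as each task spawns its children through startTracked before it finishes, so the count never transiently hits zero while work remains.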
This is probably a learning exercise, but it seems that you would be happy with a lazy sequence of all of the directories. Stealing from Brian's answer above... (and I think something like this is in all of the F# books, which I don't have with me at home)
open System.IO
let rec traverse dir =
    seq {
        let subDirs = Directory.EnumerateDirectories(dir)
        yield dir
        for d in subDirs do
            yield! traverse d
    }
For what it is worth, I have found the Async workflow in F# very useful for "embarrassingly parallel" problems, though I haven't tried much general multitasking.
You may be better off just using Task.Factory.StartNew() and Task.WaitAll().
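A rough sketch of that Task-based variant, assuming the same directory-walking shape as the question; Task.WaitAll provides the barrier:

```fsharp
open System.IO
open System.Threading.Tasks

// Sketch: each call waits for its whole subtree via Task.WaitAll,
// so the outermost call returns only when everything is done.
let rec traverse dir =
    printfn "%s" dir
    let tasks =
        Directory.EnumerateDirectories(dir)
        |> Seq.map (fun d -> Task.Factory.StartNew(fun () -> traverse d) :> Task)
        |> Seq.toArray
    Task.WaitAll(tasks)

traverse "c:\\Projects\\"
```

Note that each level blocks a thread while waiting on its children, so for very deep trees the nested Async.Parallel approach above may scale better.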
Just for clarification: I thought there might have been a better solution similar to what one can do in Chapel. There you have a "sync" statement, a barrier that waits for all the tasks spawned within a statement to finish. Here's an example from the Chapel manual:
def concurrentUpdate(tree: Tree) {
    if requiresUpdate(tree) then
        begin update(tree);
    if !tree.isLeaf {
        concurrentUpdate(tree.left);
        concurrentUpdate(tree.right);
    }
}
sync concurrentUpdate(tree);
The "begin" statement creates a task that runs in parallel, somewhat similar to an F# async block started with Async.Start.
Related
Suppose I have a function loadCustomerProjection:
let loadCustomerProjection id =
    use session = store.OpenSession()
    let result = session.Load<CustomerReadModel>(id)
    match result with
    | NotNull -> Some result
    | _ -> None
session.Load is synchronous and returns a CustomerReadModel
session also provides a LoadAsync method, which I've been using like this:
let loadCustomerProjection id =
    use session = store.OpenSession()
    let y = async {
        let! x = session.LoadAsync<CustomerReadModel>(id) |> Async.AwaitTask
        return x
    }
    let x = y |> Async.RunSynchronously
    match x with
    | NotNull -> Some x
    | _ -> None
What I'm trying to understand:
Does the second version even make sense, and does it add any value in terms of non-blocking behavior, given that both have the same signature: Guid -> CustomerReadModel option?
Would it make more sense to have the signature Guid -> Async<CustomerReadModel> if loadCustomerProjection is called from within a Giraffe HttpHandler?
Or, considering the Giraffe context, would it be even better to have the signature Guid -> Task<CustomerReadModel>?
In my Giraffe handler, what I want to do at least is handle a null result via pattern matching as a 404 - so at some point I need an |> Async.AwaitTask call anyway.
In this particular case there is no difference between your first and second version, except that in the second you jump through all the hoops to call the async function and then immediately wait for it. The second version blocks exactly like the first one does.
As for the hot vs. cold comments in the other answers: because you wrapped the call to LoadAsync in an async computation expression, it stays cold (an async expression only executes when it is run). If, on the other hand, you wrote
let y =
    session.LoadAsync<CustomerReadModel>(id)
    |> Async.AwaitTask
Then the LoadAsync would start executing immediately.
If you want to support async operations, then it would indeed make sense to make the entire function async:
let loadCustomerProjection id =
    async {
        use session = store.OpenSession()
        let! x = session.LoadAsync<CustomerReadModel>(id) |> Async.AwaitTask
        match x with
        | NotNull -> return Some x
        | _ -> return None
    }
Whether you use Task or Async is up to you. Personally I prefer Async because it's native to F#. But in your case you might want to follow Giraffe's decision to use Task, to avoid conversions.
Since the main difference between a Task and an Async is that a Task is always hot, whereas an Async is cold (until you decide to run it), it makes sense to work with Async in your backend code and then, when it is time for Giraffe to use it, convert it into a Task or run it.
In your case you are just doing one thing. I imagine it is more useful to go full async if you have multiple async steps that you want to compose in some way.
See https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/asynchronous-and-concurrent-programming/async#combine-asynchronous-computations for example.
As mentioned in the other answer, the main difference between Task and Async is that Tasks are started immediately, whereas Asyncs must be started explicitly.
In the simple example above, it wouldn't make much difference whether you return 'T, Async<'T>, or Task<'T>. My preference would probably be Task<'T>, as that is what you are receiving in the function, and it makes sense to keep propagating the Task<_> until the final point of use.
Note that when you are using Giraffe you should have access to TaskBuilder.fs which gives you a task { } computation expression in the FSharp.Control.Tasks.V2 module.
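For illustration, here is a sketch of loadCustomerProjection written with the task { } builder; store, CustomerReadModel, and the NotNull active pattern are assumed to be defined as in the question:

```fsharp
open FSharp.Control.Tasks.V2

// Sketch only: store, CustomerReadModel, and the NotNull active pattern
// are assumed to exist as in the question.
let loadCustomerProjection (id: System.Guid) =
    task {
        use session = store.OpenSession()
        // inside task { }, let! binds a Task directly - no Async.AwaitTask needed
        let! x = session.LoadAsync<CustomerReadModel>(id)
        match x with
        | NotNull -> return Some x
        | _ -> return None
    }
// signature: Guid -> Task<CustomerReadModel option>
```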
Currently, I have a function that receives raw data from the outside, process it, and sends it to a callback:
let Process callback rawData =
    let data = rawData // transforming into actual data...
    callback data
callback is a function 'T -> unit. In my case specifically, it's the Post function of a MailboxProcessor (being called like Process mailbox.Post rawData)
The Process function is called multiple times, and each time I push the processed data into the mailbox queue. So far, so good.
Now I want to change this code in a way I can publish this processed data to various consumers, using the rx extensions for FSharp (FSharp.Control.Reactive). This means that callback will be either an Observable, or a function that publishes to subscribers. How do I do this?
I found two options:
Create a class that implements IObservable, and pass that object to the Process function. I'd like to avoid creating classes if possible.
Use the Subject.behavior. This does exactly what I want, except it requires an initial state, which doesn't make sense semantically in this case, and apparently Subjects are frowned upon (from a link in the ReactiveX site: http://davesexton.com/blog/post/To-Use-Subject-Or-Not-To-Use-Subject.aspx).
What would be the better way, from a functional programming perspective? Is there a better way?
Here's one idea: You can use an object expression to implement IObservable<_> without the overhead of an explicit class:
open System

let createObservable subscribe =
    { new IObservable<_> with
        member __.Subscribe(observer) =
            subscribe observer }
To use this, specify a subscribe function of type IObserver<_> -> IDisposable. No classes needed.
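For example, here is a toy subscribe function that pushes fixed demo data; the numbers value and the data are hypothetical:

```fsharp
open System

// A subscribe function that pushes a fixed list to each new observer.
let numbers =
    createObservable (fun (observer: IObserver<int>) ->
        [1; 2; 3] |> List.iter observer.OnNext
        observer.OnCompleted()
        // nothing to clean up in this toy example
        { new IDisposable with member __.Dispose() = () })

numbers
|> Observable.subscribe (printfn "got %d")
|> ignore
```

Observable.subscribe here is the one built into FSharp.Core, which wraps a plain function into an IObserver for you.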
Using the observe { .. } computation builder works, but there is a function in the FSharp.Control.Reactive library that does the same thing:
open FSharp.Control.Reactive
let obs = Observable.ofSeq [1;2;3;4;5]
If I were using the observe { .. } computation builder, I'd also use the fact that it supports the for loop, which makes your code a bit simpler:
let Process initialData = observe {
    for x in initialData do yield x }
Got it. FSharp.Control.Reactive provides the observe builder in the module FSharp.Control.Reactive.Builders. This allows you to create ad-hoc observables:
open FSharp.Control.Reactive.Builders

// In my real case, "initialData" is a byte stream and
// at each step I read a few bytes off of it
let Process initialData =
    let rec loop data =
        observe {
            match data with
            | x :: xs ->
                yield x
                yield! loop xs
            | [] -> ()
        }
    loop initialData

let obs = Process [1; 2; 3; 4; 5]
obs.Subscribe(fun d -> printfn "Consumer A: %A" d) |> ignore
obs.Subscribe(fun d -> printfn "Consumer B: %A" d) |> ignore
Threading.Thread.Sleep 1000
obs.Subscribe(fun d -> printfn "Late consumer: %A" d) |> ignore
It's important to note that this creates a cold observable, so the late consumer receives all events.
In the following code, both do! ag.AsyncAdd (Some i) and ag.AsyncAdd (Some i) (in the enqueue() function) work. What's the difference between them? It seems that do! ... makes the enqueuing and dequeuing calls more interleaved. How?
open FSharpx.Control
let test () =
    let ag = new BlockingQueueAgent<int option>(500)
    let enqueue () = async {
        for i = 1 to 15 do
            // ag.AsyncAdd (Some i) // works too
            do! ag.AsyncAdd (Some i)
            printfn "=> %d" i }
    async {
        do! [ for i = 1 to 10 do yield enqueue () ]
            |> Async.Parallel |> Async.Ignore
        for i = 1 to 5 do ag.Add None
    } |> Async.Start
    let rec dequeue () =
        async {
            let! m = ag.AsyncGet()
            match m with
            | Some v ->
                printfn "<= %d" v
                return! dequeue ()
            | None ->
                printfn "Done"
        }
    [ for i = 1 to 5 do yield dequeue () ]
    |> Async.Parallel |> Async.Ignore |> Async.RunSynchronously
    0
Inside any F# computation expression, any keyword that ends with ! tends to mean "Handle this one specially, according to the rules of this block". E.g., in an async { } block, the let! keyword means "await the result, then assign the result to this variable" and the do! keyword means "await this asynchronous operation, but throw away the result and don't assign it to anything". If you don't use a do! keyword, then you are not awaiting the result of that operation.
So with a do! keyword inside your enqueue function, you are doing the following fifteen times:
Kick off an AsyncAdd operation
Wait for it to complete
print "=> 1" (or 2, or 3...)
Without a do! keyword, you are doing the following:
Kick off fifteen AsyncAdd operations as fast as possible
After kicking each one off, print "=> 1" (or 2, or 3...)
It sounds like you don't yet fully understand how F#'s computation expressions work behind the scenes. I recommend reading Scott Wlaschin's excellent site to gain more understanding: first https://fsharpforfunandprofit.com/posts/concurrency-async-and-parallel/ and then https://fsharpforfunandprofit.com/series/computation-expressions.html so that when you read the second series of articles, you're building on a bit of existing knowledge.
From FSharpx source code (see comments):
/// Asynchronously adds item to the queue. The operation ends when
/// there is a place for the item. If the queue is full, the operation
/// will block until some items are removed.
member x.AsyncAdd(v: 'T, ?timeout) =
    agent.PostAndAsyncReply((fun ch -> AsyncAdd(v, ch)), ?timeout = timeout)
When you do not use do!, you do not block the enqueue thread when the queue is full (500 items, as specified in the constructor). So, when you changed the loops to a higher number, you spammed the MailboxProcessor queue (behind the scenes FSharpx uses a MailboxProcessor - check the docs for this class) with AsyncAdd messages from all iterations of all enqueue threads. This slows down another operation, agent.Scan:
and fullQueue () =
    agent.Scan(fun msg ->
        match msg with
        | AsyncGet(reply) -> Some(dequeueAndContinue(reply))
        | _ -> None)
because you have a lot of AsyncAdd and AsyncGet messages in the queue.
When you put do! before AsyncAdd, your enqueue threads are blocked once there are 500 items in the queue, and no additional messages are generated for the MailboxProcessor, so agent.Scan works fast. When a dequeue thread takes an item and the count drops to 499, an enqueue thread wakes up, adds a new item, goes to the next iteration of its loop, posts a new AsyncAdd message to the MailboxProcessor, and again sleeps until the next dequeue. Thus, the MailboxProcessor is not spammed with AsyncAdd messages from all iterations of one enqueue thread. Note: the queue of items and the queue of MailboxProcessor messages are different queues.
I have a simple TCP server that listens for client connections, and it looks quite simple:
let Listener (ip: IPAddress) (port: int32) =
    async {
        let listener = TcpListener(ip, port)
        listener.Start()
        _logger.Info(sprintf "Server bound to IP: %A - Port: %i" ip port)
        let rec listenerPending (listener: TcpListener) =
            async {
                if not (listener.Pending()) then return! listenerPending listener // MEMORY LEAK SOURCE
                printfn "Test"
            }
        listenerPending listener |> Async.Start
    }
Well, it looks simple, but I have a memory-leak problem: while it is waiting for a connection, it eats RAM like candy.
I suppose it is connected with the recursive function, but I have no idea what to do to stabilize it.
The problem is that your listenerPending function is not tail-recursive. This is a bit counter-intuitive - the return! keyword is not like an imperative "return" that breaks the current execution; rather, it is a call to another async that can be tail-recursive (as opposed to do!, which never is) if it is in the right position.
To illustrate, consider the following:
let rec demo n =
    async {
        if n > 0 then return! demo (n - 1)
        printfn "%d" n
    }
demo 10 |> Async.RunSynchronously
This actually prints the numbers from 0 to 10! This is because the code after return! still gets executed after the recursive call finishes. The structure of your code is similar, except that your loop never terminates (at least not quickly enough). You can fix it just by removing the code after return! (or perhaps moving it into an else branch).
Also, it's worth noting that your code is not really asynchronous - you don't have any non-blocking waiting in it. You should probably be doing something like this instead of a loop:
let! socket = listener.AcceptSocketAsync () |> Async.AwaitTask
In this SO post, adding
inSeq
|> Seq.length
|> printfn "%d lines read"
caused the lazy sequence in inSeq to be read in.
OK, I've expanded on that code and first want to print out that sequence (see the new program below).
When the Visual Studio (2012) debugger gets to
inSeq |> Seq.iter (fun x -> printfn "%A" x)
the read process starts over again. When I examine inSeq using the debugger, inSeq appears to have no elements in it.
If I have already read elements into inSeq, how can I examine those elements, and why won't they print with the call to Seq.iter?
open System
open System.Collections.Generic
open System.Text
open System.IO

#nowarn "40"

let rec readlines () =
    seq {
        let line = Console.ReadLine()
        if not (line.Equals("")) then
            yield line
            yield! readlines ()
    }

[<EntryPoint>]
let main argv =
    let inSeq = readlines ()
    inSeq
    |> Seq.length
    |> printfn "%d lines read"
    inSeq |> Seq.iter (fun x -> printfn "%A" x)
    // This will keep it alive long enough to read your output
    Console.ReadKey() |> ignore
    0
I've read somewhere that results of lazy evaluation are not cached. Is that what is going on here? How can I cache the results?
A sequence is not a "container" of items; rather, it's a "promise" to deliver items sometime in the future. You can think of it as a function that you call, except it returns its result in chunks, not all at once. If you call that function once, it returns the result once. If you call it a second time, it will return the result a second time.
Because your particular sequence is not pure, you can compare it to a non-pure function: you call it once, it returns a result; you call it a second time, it may return something different.
Sequences do not automatically "remember" their items after the first read - in exactly the same way that functions do not automatically "remember" their result after the first call. If you want that from a function, you can wrap it in a special "caching" wrapper. You can do the same for a sequence.
The general technique of "caching return value" is usually called "memoization". For F# sequences in particular, it is implemented in the Seq.cache function.
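A small sketch of Seq.cache in action (in the program above, you would write let inSeq = readlines () |> Seq.cache); the source value here is a hypothetical stand-in for the console-reading sequence:

```fsharp
// Seq.cache remembers elements after their first evaluation,
// so the side effect below runs at most once per element.
let source =
    seq {
        for i in 1 .. 3 do
            printfn "producing %d" i // stand-in for Console.ReadLine
            yield i
    }

let cached = source |> Seq.cache

cached |> Seq.length |> printfn "%d items" // forces the sequence once
cached |> Seq.iter (printfn "%d")          // replays cached items; no more "producing" lines
```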