F# and memory leaks in a simple rec function

I have a simple TCP server that listens for client connections, and it looks quite simple:

let Listener (ip:IPAddress) (port:int32) =
    async {
        let listener = TcpListener(ip, port)
        listener.Start()
        _logger.Info(sprintf "Server bound to IP: %A - Port: %i" ip port)
        let rec listenerPending (listener : TcpListener) =
            async {
                if not (listener.Pending()) then return! listenerPending listener // MEMORY LEAK SOURCE
                printfn "Test"
            }
        listenerPending listener |> Async.Start
    }
Well, it looks simple, but I have a memory-leak problem: while it is waiting for a connection, it eats RAM like candy.
I suspect this is connected with the recursive function, but I have no idea what to do to stabilize it.

The problem is that your listenerPending function is not tail-recursive. This is a bit counter-intuitive: the return! keyword is not like an imperative "return" that breaks out of the current execution, but rather a call to another computation, and that call is tail-recursive only when it appears in a tail position (as opposed to do!, which never is).
To illustrate, consider the following:
let rec demo n =
    async {
        if n > 0 then return! demo (n-1)
        printfn "%d" n
    }

demo 10 |> Async.RunSynchronously
This actually prints the numbers from 0 to 10! That is because the code after return! still gets executed once the recursive call finishes. The structure of your code is similar, except that your loop never terminates (at least not quickly enough), so those pending continuations keep accumulating. You can fix it just by removing the code after return! (or perhaps moving it to an else branch).
Also, it's worth noting that your code is not really asynchronous: there is no non-blocking waiting in it. Instead of polling in a loop, you should probably be doing something like this:
let! socket = listener.AcceptSocketAsync () |> Async.AwaitTask
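Putting both fixes together, here is a minimal sketch of a non-blocking accept loop, using the similar AcceptTcpClientAsync (the names listen and acceptLoop are made up, and real client handling is only hinted at in a comment):

```fsharp
open System.Net
open System.Net.Sockets

// A minimal sketch; assumes a handleClient function exists elsewhere.
let listen (ip: IPAddress) (port: int) =
    async {
        let listener = TcpListener(ip, port)
        listener.Start()
        let rec acceptLoop () =
            async {
                // Non-blocking wait: the thread is released until a client arrives.
                let! client = listener.AcceptTcpClientAsync() |> Async.AwaitTask
                // Handle each client on its own async so the loop keeps accepting:
                // Async.Start (handleClient client)
                client.Close()          // placeholder for real handling
                return! acceptLoop ()   // tail call: nothing accumulates
            }
        do! acceptLoop ()
    }
```

Because return! acceptLoop () is the last thing in the block, the recursion is a tail call, and no continuations pile up while the server waits.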


How many threads used by Async.Parallel

Take this code, for example:
let b id =
    async {
        printfn "running %A \n" id
        System.Threading.Thread.Sleep(2000)
        printfn "finish %A \n" id
    }

let works = [1..100]
works |> Seq.map b |> Async.Parallel |> Async.RunSynchronously
How many threads will be used?
The answer is: we can't know. And why would we want to?
Also, it's not clear what you're asking. Are you asking whether b will be run on a single thread from start to end? For this example the answer is yes, but a slight change could turn the answer into a no. Are you asking how many threads will be used in total? By ID or by runs? How many will run simultaneously?
It is often better just to think in terms of asyncs rather than threads, since threads are just resources used behind the scenes by asyncs.
Some more info.
There will be a thread pool that is used to run the asyncs, and the size of that pool is usually the number of logical processors. On my machine I have 8 cores with hyperthreading, so I guess there will be 16 threads in the thread pool.
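You can check both numbers yourself with the standard thread-pool API; a small probe (nothing here is specific to asyncs):

```fsharp
open System
open System.Threading

// Logical processor count: the usual default for the pool's minimum size.
printfn "processors: %d" Environment.ProcessorCount

// The pool's actual configured minimum worker/IO thread counts.
let mutable workers = 0
let mutable io = 0
ThreadPool.GetMinThreads(&workers, &io)
printfn "min worker threads: %d, min IO threads: %d" workers io
```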
The code in the question produces quite messy output, since printfn does not write atomically to the console. In order to see line-by-line output, I will use Console.WriteLine, which writes lines atomically.
open System

let t: bool[] = Array.zeroCreate 10000

let b id =
    async {
        let threadId = Threading.Thread.CurrentThread.ManagedThreadId
        t.[threadId] <- true
        Console.WriteLine $"running {id} on {threadId}"
        Threading.Thread.Sleep 2000
        Console.WriteLine $"finish {id} on {threadId}"
    }

[<EntryPoint>]
let main argv =
    Console.WriteLine "Start"
    [1..100]
    |> Seq.map b
    |> Async.Parallel
    |> Async.RunSynchronously
    |> ignore
    Console.WriteLine "Done"
    let n = t |> Array.filter id |> Array.length
    Console.WriteLine $"n = {n}"
    Console.ReadKey() |> ignore
    0
So here we keep track of which threads are used overall.
It turns out that the number of threads used varies from 19 to 23 on my machine, and I bet it can be both lower and higher. But why isn't it 16 on my machine? I suspect the reason for this higher-than-expected number is that the asyncs use Threading.Thread.Sleep instead of Async.Sleep. When a thread from the thread pool turns out to be long-running, a new thread must soon be allocated in its stead so that the corresponding processor can still be used effectively by the thread pool. If the length of the sleep is increased, the number of threads used overall goes up. If I try Threading.Thread.Sleep 100_000, then 100 threads are used.
In production, use Async.Sleep to actually sleep an async for a while, so that thread-pool threads are not blocked but returned to the pool, where they can do other work instead of sitting in a blocked sleep for a long time. When I do that here, only 33 threads are used. But 33 still seems high. Do I still have a problem? I don't think so. Let's explore some more to see why.
There is an overload of Async.Parallel which can be used to specify the degree of parallelism. Let's see what happens.
open System

let t: bool[] = Array.zeroCreate 10000

let b id =
    async {
        let threadId = Threading.Thread.CurrentThread.ManagedThreadId
        t.[threadId] <- true
        Console.WriteLine $"running {id} on {threadId}"
        // Threading.Thread.Sleep(2000)
        do! Async.Sleep 2000
        let threadId2 = Threading.Thread.CurrentThread.ManagedThreadId
        t.[threadId2] <- true
        Console.WriteLine $"finish {id} on {threadId2}"
        if threadId <> threadId2 then Console.WriteLine "NOT THE SAME THREAD!"
    }

let asyncParallel x y = Async.Parallel (y, x)

[<EntryPoint>]
let main argv =
    Console.WriteLine "Start"
    [1..100]
    |> Seq.map b
    |> asyncParallel 3
    |> Async.RunSynchronously
    |> ignore
    Console.WriteLine "Done"
    let n = t |> Array.filter id |> Array.length
    Console.WriteLine $"n = {n}"
    Console.ReadKey() |> ignore
    0
When I run this, I usually get n = 6, but I've also gotten 5. Why does it tend towards 6 but not always land exactly on 6? And why isn't it 3?
If the nasty Threading.Thread.Sleep is used instead of Async.Sleep, then n will tend towards 3.
If one more Async.Sleep is inserted into b, so that there are two, will n then tend towards 9? Surprisingly, no! It stays at 6.
So far it looks like the use of one or more bangs (the do! in our case) causes two threads to be used in total per run of b, instead of only one. I'm not sure why, but of course it doesn't mean two threads are used simultaneously by one run of b; rather they are used in sequence, before and after the Async.Sleep. (It can also happen that the same thread continues the work, which may explain why I got 5 rather than 6 one time.)
Now we can more easily guess why 33 threads were used when Threading.Thread.Sleep 100_000 was replaced with Async.Sleep 100_000. It's just one more than 32, which is 2 * 16, and 16 is the expected number of threads in the thread pool. When Async.Sleep 500_000 is used, the number of threads involved is still only 33 on my machine.
We haven't really reached any conclusions with the experiments, but we've gotten some insight into how things work.

Grokking F# async vs. System.Threading.Task

Let's say I have a function loadCustomerProjection:
let loadCustomerProjection id =
    use session = store.OpenSession()
    let result = session.Load<CustomerReadModel>(id)
    match result with
    | NotNull -> Some result
    | _ -> None
session.Load is synchronous and returns a CustomerReadModel.
session also provides a LoadAsync method, which I've been using like this:
let loadCustomerProjection id =
    use session = store.OpenSession()
    let y = async {
        let! x = session.LoadAsync<CustomerReadModel>(id) |> Async.AwaitTask
        return x
    }
    let x = y |> Async.RunSynchronously
    match x with
    | NotNull -> Some x
    | _ -> None
What I'm trying to understand:
Does the second version even make sense, and does it add any value in terms of non-blocking behavior, given that both have the same signature, Guid -> CustomerReadModel option?
Would it make more sense to have the signature Guid -> Async<CustomerReadModel> if loadCustomerProjection is called from within a Giraffe HttpHandler?
Or, considering the Giraffe context, would it be even better to have the signature Guid -> Task<CustomerReadModel>?
In my Giraffe handler, what I want to do at the very least is handle a null result via pattern matching as a 404, so at some point I need an |> Async.AwaitTask call anyway.
In this particular case there is no difference between your first and second version, except that in the second version you jump through all the hoops of calling the async function only to immediately wait for it. The second version blocks exactly like the first one does.
As for the hot vs. cold comments in the other answers: because you wrapped the call to LoadAsync in an async computation expression, it stays cold (an async expression only executes when it is run). If, on the other hand, you wrote
let y =
    session.LoadAsync<CustomerReadModel>(id)
    |> Async.AwaitTask
then the LoadAsync would start executing immediately.
If you want to support async operations, then it would indeed make sense to make the entire function async:
let loadCustomerProjection id =
    async {
        use session = store.OpenSession()
        let! x = session.LoadAsync<CustomerReadModel>(id) |> Async.AwaitTask
        match x with
        | NotNull -> return Some x
        | _ -> return None
    }
Whether you use Task or Async is up to you. Personally I prefer Async because it's native to F#, but in your case you might want to stick with Giraffe's decision to use Task, to avoid conversions.
Since the main difference between a Task and an Async is that a Task is always hot whereas an Async is cold (until you decide to run it), it makes sense to work with Async in your backend code and then, when it is time for Giraffe to use it, convert it into a Task or run it.
In your case you are just doing one thing. I imagine going fully async is more useful when you have multiple async steps that you want to compose in some way.
See https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/asynchronous-and-concurrent-programming/async#combine-asynchronous-computations for example.
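For a sense of what such composition looks like, here is a small sketch (fetchA and fetchB are made-up stand-ins for real async operations):

```fsharp
// Two made-up async operations standing in for real I/O.
let fetchA () = async { do! Async.Sleep 50; return 1 }
let fetchB () = async { do! Async.Sleep 50; return 2 }

// Sequential composition: the second starts after the first finishes.
let sequentialSum =
    async {
        let! a = fetchA ()
        let! b = fetchB ()
        return a + b
    }

// Parallel composition: both run at the same time.
let parallelSum =
    async {
        let! results = Async.Parallel [ fetchA (); fetchB () ]
        return Array.sum results
    }
```

Both are still cold at this point; nothing runs until you pass them to Async.RunSynchronously, Async.Start, or similar.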
As mentioned in the other answer, the main difference between Task and Async is that Tasks are started immediately, whereas Asyncs must be started explicitly.
In the simple example above it wouldn't make much difference whether you return 'T, Async<'T>, or Task<'T>. My preference would probably be Task<'T>, as that is what you receive inside the function, and it makes sense to keep propagating the Task<_> until the final point of use.
Note that when you are using Giraffe you have access to TaskBuilder.fs, which gives you a task { } computation expression in the FSharp.Control.Tasks.V2 module.
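As a sketch of what that looks like: loadAsync below is a made-up stand-in for session.LoadAsync, and on F# 6+ the task { } builder is built into FSharp.Core (on older versions TaskBuilder.fs provides the same via open FSharp.Control.Tasks.V2):

```fsharp
open System.Threading.Tasks

// Made-up stand-in for session.LoadAsync: returns null for unknown ids.
let loadAsync (id: int) : Task<string> =
    Task.FromResult (if id = 1 then "customer" else null)

// Inside task { }, let! awaits a Task directly: no Async.AwaitTask needed.
let loadProjection id =
    task {
        let! x = loadAsync id
        return if isNull x then None else Some x
    }
```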

What's different between async { ... AsyncAdd ... } and async { ... do AsyncAdd ... }?

In the following code, both do! ag.AsyncAdd (Some i) and plain ag.AsyncAdd (Some i) (in the function enqueue()) appear to work. What's the difference between them? It seems do! ... makes the enqueuing and dequeuing calls more interleaved? How?
open FSharpx.Control

let test () =
    let ag = new BlockingQueueAgent<int option>(500)
    let enqueue() = async {
        for i = 1 to 15 do
            // ag.AsyncAdd (Some i) // works too
            do! ag.AsyncAdd (Some i)
            printfn "=> %d" i }
    async {
        do! [ for i = 1 to 10 do yield enqueue() ]
            |> Async.Parallel |> Async.Ignore
        for i = 1 to 5 do ag.Add None
    } |> Async.Start
    let rec dequeue() =
        async {
            let! m = ag.AsyncGet()
            match m with
            | Some v ->
                printfn "<= %d" v
                return! dequeue()
            | None ->
                printfn "Done"
        }
    [ for i = 1 to 5 do yield dequeue() ]
    |> Async.Parallel |> Async.Ignore |> Async.RunSynchronously
    0
Inside any F# computation expression, any keyword that ends with ! tends to mean "Handle this one specially, according to the rules of this block". E.g., in an async { } block, the let! keyword means "await the result, then assign the result to this variable" and the do! keyword means "await this asynchronous operation, but throw away the result and don't assign it to anything". If you don't use a do! keyword, then you are not awaiting the result of that operation.
So with a do! keyword inside your enqueue function, you are doing the following fifteen times:
Kick off an AsyncAdd operation
Wait for it to complete
Print "=> 1" (or 2, or 3...)
Without the do! keyword, you are doing the following fifteen times:
Build an AsyncAdd computation and throw it away without running it (asyncs are cold, so the add may never actually happen; the compiler warns that the expression should have type unit)
Print "=> 1" (or 2, or 3...)
It sounds like you don't yet fully understand how F#'s computation expressions work behind the scenes. I recommend reading Scott Wlaschin's excellent site to gain more understanding: first https://fsharpforfunandprofit.com/posts/concurrency-async-and-parallel/ and then https://fsharpforfunandprofit.com/series/computation-expressions.html so that when you read the second series of articles, you're building on a bit of existing knowledge.
From FSharpx source code (see comments):
/// Asynchronously adds item to the queue. The operation ends when
/// there is a place for the item. If the queue is full, the operation
/// will block until some items are removed.
member x.AsyncAdd(v:'T, ?timeout) =
    agent.PostAndAsyncReply((fun ch -> AsyncAdd(v, ch)), ?timeout=timeout)
When you do not use do!, you do not block the enqueue thread when the queue is full (500 items, as you specify in the constructor). So when you changed the loops to higher numbers, you spammed the MailboxProcessor's message queue with AsyncAdd messages from all iterations of all enqueue threads (behind the scenes FSharpx uses a MailboxProcessor; check the docs for that class). This slows down another operation, agent.Scan:
and fullQueue() =
    agent.Scan(fun msg ->
        match msg with
        | AsyncGet(reply) -> Some(dequeueAndContinue(reply))
        | _ -> None)
because you have a lot of AsyncAdd and AsyncGet messages in the queue.
When you put do! before AsyncAdd, your enqueue threads will be blocked from the moment there are 500 items in the queue, and no additional messages will be generated for the MailboxProcessor, so agent.Scan will work fast. When a dequeue thread takes an item and the count drops to 499, an enqueue thread wakes up, adds a new item, moves to the next iteration of its loop, puts one new AsyncAdd message into the MailboxProcessor, and goes back to sleep until the next dequeue. Thus the MailboxProcessor is not spammed with AsyncAdd messages from all iterations of one enqueue thread. Note: the queue of items and the queue of MailboxProcessor messages are different queues.

Why does iterating a previously read-in sequence trigger a new read?

In this SO post, adding
inSeq
|> Seq.length
|> printfn "%d lines read"
caused the lazy sequence in inSeq to be read in.
OK, I've expanded on that code and want to first print out that sequence (see new program below).
When the Visual Studio (2012) debugger gets to
inSeq |> Seq.iter (fun x -> printfn "%A" x)
the read process starts over again. When I examine inSeq using the debugger, inSeq appears to have no elements in it.
If I have first read elements into inSeq, how can I see (examine) those elements and why won't they print out with the call to Seq.iter?
open System
open System.Collections.Generic
open System.Text
open System.IO

#nowarn "40"

let rec readlines () =
    seq {
        let line = Console.ReadLine()
        if not (line.Equals("")) then
            yield line
            yield! readlines ()
    }

[<EntryPoint>]
let main argv =
    let inSeq = readlines ()
    inSeq
    |> Seq.length
    |> printfn "%d lines read"
    inSeq |> Seq.iter (fun x -> printfn "%A" x)
    // This will keep it alive enough to read your output
    Console.ReadKey() |> ignore
    0
I've read somewhere that the results of lazy evaluation are not cached. Is that what is going on here? How can I cache the results?
A sequence is not a "container" of items; rather, it's a "promise" to deliver items sometime in the future. You can think of it as a function that you call, except it returns its result in chunks, not all at once. If you call that function once, it returns the result once. If you call it a second time, it will compute the result a second time.
Because your particular sequence is not pure, you can compare it to a non-pure function: you call it once, it returns a result; you call it a second time, it may return something different.
Sequences do not automatically "remember" their items after the first read, exactly the same way functions do not automatically "remember" their result after the first call. If you want that from a function, you can wrap it in a special "caching" wrapper. And you can do the same for a sequence.
The general technique of caching a return value is usually called "memoization". For F# sequences in particular, it is implemented by the Seq.cache function.
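A small demonstration of the difference, with a mutable counter standing in for the Console.ReadLine side effect:

```fsharp
// Counts how many times the underlying sequence body actually runs.
let mutable reads = 0

let source =
    seq {
        for i in 1..3 do
            reads <- reads + 1   // the side effect, like Console.ReadLine()
            yield i
    }

let cached = source |> Seq.cache

cached |> Seq.length |> printfn "%d items"   // first enumeration: reads the source
cached |> Seq.iter (printfn "%d")            // second enumeration: replayed from cache

// reads is now 3, not 6: the source was enumerated only once.
```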

Asynchronous barrier in F#

I wrote a program in F# that asynchronously lists all directories on disk. An async task lists all files in a given directory and creates separate async tasks (daemons: I start them using Async.Start) to list the subdirectories. They all communicate their results to a central MailboxProcessor.
My problem is: how do I detect that all the daemon tasks have finished and that no more files will be arriving? Essentially I need a barrier for all tasks that are (direct and indirect) children of my top task. I couldn't find anything like that in F#'s async model.
What I did instead was create a separate MailboxProcessor where I register each task's start and termination. When the active count goes to zero, I'm done. But I'm not happy with that solution. Any other suggestions?
Have you tried using Async.Parallel? That is, rather than Async.Start-ing each subdirectory, just combine the subdirectory tasks into a single async via Async.Parallel. Then you end up with a (nested) fork-join task that you can RunSynchronously and await for the final result.
EDIT
Here is some approximate code, that shows the gist, if not the full detail:
open System.IO

let agent = MailboxProcessor.Start(fun mbox ->
    async {
        while true do
            let! msg = mbox.Receive()
            printfn "%s" msg
    })

let rec traverse dir =
    async {
        agent.Post(dir)
        let subDirs = Directory.EnumerateDirectories(dir)
        return! [ for d in subDirs do yield traverse d ]
                |> Async.Parallel |> Async.Ignore
    }

traverse "d:\\" |> Async.RunSynchronously
// now all will be traversed,
// though Post-ed messages to agent may still be in flight
EDIT 2
Here is the waiting version that uses replies:
open System.IO

let agent = MailboxProcessor.Start(fun mbox ->
    async {
        while true do
            let! dir, (replyChannel: AsyncReplyChannel<unit>) = mbox.Receive()
            printfn "%s" dir
            replyChannel.Reply()
    })

let rec traverse dir =
    async {
        let r = agent.PostAndAsyncReply(fun replyChannel -> dir, replyChannel)
        let subDirs = Directory.EnumerateDirectories(dir)
        do! [ for d in subDirs do yield traverse d ]
            |> Async.Parallel |> Async.Ignore
        do! r // wait for the Post to finish
    }

traverse "c:\\Projects\\" |> Async.RunSynchronously
// now all will be traversed to completion
You could just use Interlocked to increment and decrement a counter as you begin/end tasks, and be all done when it reaches zero. I've used this strategy in similar code with MailboxProcessors.
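A minimal sketch of that counting strategy (the work item below is a stand-in for a real traversal task, and the names are made up):

```fsharp
open System.Threading

// Count of tasks still running; the event fires when it drops to zero.
let mutable pending = 0
let allDone = new ManualResetEventSlim(false)

// Increment before starting, decrement when done; zero means the barrier is reached.
let startCounted (work: Async<unit>) =
    Interlocked.Increment(&pending) |> ignore
    async {
        do! work
        if Interlocked.Decrement(&pending) = 0 then allDone.Set()
    }
    |> Async.Start

startCounted (async { do! Async.Sleep 100 })  // stand-in for the root task
allDone.Wait()  // the "barrier": blocks until every counted task has finished
```

In real traversal code, each parent must increment the counter for its children before performing its own decrement, otherwise the count can transiently hit zero while work is still pending.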
This is probably a learning exercise, but it seems that you would be happy with a lazy list of all of the files. Stealing from Brian's answer above... (and I think something like this is in all of the F# books, which I don't have with me at home)
open System.IO

let rec traverse dir =
    seq {
        let subDirs = Directory.EnumerateDirectories(dir)
        yield dir
        for d in subDirs do
            yield! traverse d
    }
For what it is worth, I have found the async workflow in F# very useful for "embarrassingly parallel" problems, though I haven't tried much general multitasking.
You may be better off just using Task.Factory.StartNew() and Task.WaitAll().
Just for clarification: I thought there might be a better solution, similar to what one can do in Chapel. There you have a "sync" statement, a barrier that waits for all the tasks spawned within a statement to finish. Here's an example from the Chapel manual:
def concurrentUpdate(tree: Tree) {
    if requiresUpdate(tree) then
        begin update(tree);
    if !tree.isLeaf {
        concurrentUpdate(tree.left);
        concurrentUpdate(tree.right);
    }
}
sync concurrentUpdate(tree);
The "begin" statement creates a task that is run in parallel, somewhat similar to F# "async" block with Async.Start.
