I'm being a little adventurous with my code for the amount of experience I have with F#, and I am a little worried about cross-threading issues.
Background:
I have a number of orders whose addresses I need to validate. Some of the orders can be validated against the Google Maps geocoding API, which allows 50 calls per second. The rest are Australian PO Boxes, which we don't have many of, but I need to validate them against a different API that only allows 1 call per second.
I have switched most of my code from async {} functions to task {} functions, and I am assuming that to run something on several threads at the same time it needs to be in an async {} function or block and be piped to Async.Parallel.
Question: Is this the right way to do this or will it fall over? I am wondering if I am fundamentally thinking about this the wrong way.
Notes:
I am passing a database context into the async function and updating the database within that function.
I will call this from a C# (WPF) application and report progress.
Am I going to have cross-threading issues?
let validateOrder
    (
        order: artooProvider.dataContext.``dbo.OrdersEntity``,
        httpClient: HttpClient,
        ctx: artooProvider.dataContext,
        isAuPoBox: bool
    ) =
    async {
        // Validate the address
        let! addressExceptions = ValidateAddress.validateAddress (order, httpClient, ctx, isAuPoBox) |> Async.AwaitTask
        // Save exceptions
        do! ctx.SubmitUpdatesAsync()
        // Return the exception count
        return ""
    }
let validateGMapOrders (httpClient: HttpClient, ctx: artooProvider.dataContext, orders: artooProvider.dataContext.``dbo.OrdersEntity`` list) =
    async {
        let ordersChunked = orders |> List.chunkBySize 50
        for fiftyOrders in ordersChunked do
            let! results =
                fiftyOrders
                |> List.map (fun order -> validateOrder (order, httpClient, ctx, false))
                |> Async.Parallel
            do! Async.Sleep(2000)
    }
let validateOrders (ctx: artooProvider.dataContext, progress: IProgress<DownloadProgressModel>) =
    task {
        let unvalidatedOrders =
            query {
                for orders in ctx.Dbo.Orders do
                    where (orders.IsValidated.IsNone)
                    select orders
            }
            |> Seq.toList

        let auPoBoxOrders =
            unvalidatedOrders
            |> List.filter (fun order -> isAUPoBox order)

        let gMapOrders =
            unvalidatedOrders
            |> List.filter (fun order -> not (isAUPoBox order))

        let googleHttpClient = new HttpClient()
        let auspostHttpclient = Auspost.AuspostApi.getApiClient ()

        // Google Maps validations
        do! validateGMapOrders (googleHttpClient, ctx, gMapOrders)

        // PO Box validations (note: index into auPoBoxOrders, not gMapOrders)
        for position in 0 .. auPoBoxOrders.Length - 1 do
            let! result = validateOrder (auPoBoxOrders[position], auspostHttpclient, ctx, true)
            do! Task.Delay(1000)

        return true
    }
When I have had to deal with rate-limited APIs, I hide the API behind a MailboxProcessor that keeps track of the time of the last call so it can comply with the rate limit, but appears as a normal async API from the outside.
Since you have two APIs with different rate limits, I'd parameterise the time delay and the processing action, then create one object for each API.
open System

type Request = string
type Response = string

type RateLimitedProcessor() =
    // Initialise 1s in the past so the processor is ready to start immediately.
    let mutable lastCall = DateTime.Now - TimeSpan(0, 0, 1)

    let mbox = new MailboxProcessor<Request * AsyncReplyChannel<Response>>((fun mbox ->
        let rec f () =
            async {
                let! (req, reply) = mbox.Receive()
                // TotalMilliseconds (the whole elapsed interval), not Milliseconds
                // (which is only the milliseconds component of the TimeSpan)
                let msSinceCall = int (DateTime.Now - lastCall).TotalMilliseconds
                // wait 1s between requests
                if msSinceCall < 1000 then
                    do! Async.Sleep (1000 - msSinceCall)
                lastCall <- DateTime.Now
                reply.Reply "Response"
                // Call self recursively to process the next incoming message
                return! f()
            }
        f()
    ))

    do mbox.Start()

    member __.Process(req: Request): Async<Response> =
        async {
            return! mbox.PostAndAsyncReply(fun reply -> req, reply)
        }

    interface IDisposable with
        member this.Dispose() = (mbox :> IDisposable).Dispose()
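For example, the same agent could be parameterised by the minimum delay and the processing action. A minimal sketch, assuming the real work is passed in as a handle function; the 20 ms / 1000 ms figures and the googleLimiter / auspostLimiter wiring in the comment are illustrative, not from the original code:
open System

// Sketch of a parameterised rate limiter: one type, one instance per API.
type RateLimitedProcessor<'Req, 'Res>(minDelayMs: int, handle: 'Req -> Async<'Res>) =
    let mbox =
        MailboxProcessor.Start(fun (inbox: MailboxProcessor<'Req * AsyncReplyChannel<'Res>>) ->
            let rec loop (lastCall: DateTime) =
                async {
                    let! (req, reply) = inbox.Receive()
                    let elapsed = int (DateTime.Now - lastCall).TotalMilliseconds
                    // enforce the minimum gap between calls
                    if elapsed < minDelayMs then
                        do! Async.Sleep(minDelayMs - elapsed)
                    let! res = handle req
                    reply.Reply res
                    return! loop DateTime.Now
                }
            // start "in the past" so the first request goes through immediately
            loop (DateTime.Now - TimeSpan.FromMilliseconds(float minDelayMs)))

    member __.Process(req: 'Req) : Async<'Res> =
        mbox.PostAndAsyncReply(fun reply -> req, reply)

// One instance per API, e.g. roughly 50/s for Google Maps and 1/s for the PO Box API:
// let googleLimiter  = RateLimitedProcessor(20,   fun order -> validateOrder (order, googleHttpClient,  ctx, false))
// let auspostLimiter = RateLimitedProcessor(1000, fun order -> validateOrder (order, auspostHttpclient, ctx, true))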
I am a little curious about the code example below and what people think.
The idea was to read from a NetworkStream (~20 msg/s) and, instead of doing the work on the main thread, pass things to a MailboxProcessor to handle and get things back for data bindings when done.
The usual way is to use PostAndReply, but I want to bind to a ListView or another control in C#, and I have to do magic with the last N items and filtering anyway.
Plus, Rx has some error handling.
The example below observes the numbers 2..10 and returns "hello X". On 8 it stops, as if it hit EOF. I used ToEnumerable because otherwise the other thread finishes first, but it works with Subscribe as well.
What bothers me:
Passing the Subject object around in the recursion. I don't see any problem having around 3-4 of those - is it a good idea?
The lifetime of the Subject.
open System
open System.Threading
open System.Reactive.Subjects
open System.Reactive.Linq // NuGet; also take System.Reactive.Core.
open System.Reactive.Concurrency

type SerializedLogger() =
    let _letters = new Subject<string>()

    // create the mailbox processor
    let agent = MailboxProcessor.Start(fun inbox ->
        // the message processing function
        let rec messageLoop (letters: Subject<string>) = async {
            // read a message
            let! msg = inbox.Receive()
            printfn "mailbox: %d in Thread: %d" msg Thread.CurrentThread.ManagedThreadId
            do! Async.Sleep 100
            // write it to the log
            match msg with
            | 8 -> letters.OnCompleted() // like EOF
            | x -> letters.OnNext(sprintf "hello %d" x)
            // loop to top
            return! messageLoop letters
        }
        // start the loop
        messageLoop _letters
    )

    // public interface
    member this.Log msg = agent.Post msg
    member this.Getletters() = _letters.AsObservable()

/// Print line with prefix 1.
let myPrint1 x = printfn "onNext - %s, Thread: %d" x Thread.CurrentThread.ManagedThreadId

// Actions
let onNext = new Action<string>(myPrint1)
let onCompleted = new Action(fun _ -> printfn "Complete")

[<EntryPoint>]
let main argv =
    async {
        printfn "Main is on: %d" Thread.CurrentThread.ManagedThreadId
        // test
        let logger = SerializedLogger()
        logger.Log 1 // ignored?
        let xObs =
            logger
                .Getletters() //.Where(fun x -> x <> "hello 5")
                .SubscribeOn(Scheduler.CurrentThread)
                .ObserveOn(Scheduler.CurrentThread)
                .ToEnumerable() // this
                //.Subscribe(onNext, onCompleted) // or with Dispose()
        [2..10] |> Seq.iter (logger.Log)
        xObs |> Seq.iter myPrint1
        while true do
            printfn "waiting"
            System.Threading.Thread.Sleep(1000)
        return 0
    } |> Async.RunSynchronously // return an integer exit code
I have done similar things, but using the plain F# Event type rather than Subject. It basically lets you create an IObservable and trigger its subscribers - much like your use of the more complex Subject. The event-based version would be:
type SerializedLogger() =
    let letterProduced = new Event<string>()
    let lettersEnded = new Event<unit>()

    let agent = MailboxProcessor.Start(fun inbox ->
        let rec messageLoop () = async {
            // Some code omitted
            match msg with
            | 8 -> lettersEnded.Trigger()
            | x -> letterProduced.Trigger(sprintf "hello %d" x)
            // ...

    member this.Log msg = agent.Post msg
    member this.LetterProduced = letterProduced.Publish
    member this.LettersEnded = lettersEnded.Publish
The important differences are:
Event cannot trigger OnCompleted, so I instead exposed two separate events. This is quite unfortunate! Given that Subject is very similar to events in all other aspects, this might be a good reason for using Subject instead of a plain event.
The nice aspect of using Event is that it is a standard F# type, so you do not need any external dependencies in the agent.
I noticed your comment noting that the first call to Log was ignored. That's because you subscribe to the event handler only after this call happens. I think you could use the ReplaySubject variation on the Subject idea here - it replays all events when you subscribe to it, so the one that happened earlier would not be lost (but there is a cost to the caching).
In summary, I think using Subject is probably a good idea - it is essentially the same pattern as using Event (which I think is quite standard way of exposing notifications from agents), but it lets you trigger OnCompleted. I would probably not use ReplaySubject, because of the caching cost - you just have to make sure to subscribe before triggering any events.
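If keeping that first message mattered more than the caching cost, swapping in ReplaySubject would be a small change. A sketch only, mirroring the SerializedLogger above (the ReplayingLogger name is mine):
open System.Reactive.Subjects
open System.Reactive.Linq

// Same shape as above, but messages logged before anyone subscribes
// are replayed to late subscribers instead of being lost.
type ReplayingLogger() =
    let _letters = new ReplaySubject<string>()   // caches every OnNext until disposed

    let agent = MailboxProcessor.Start(fun inbox ->
        let rec messageLoop () = async {
            let! msg = inbox.Receive()
            match msg with
            | 8 -> _letters.OnCompleted()
            | x -> _letters.OnNext(sprintf "hello %d" x)
            return! messageLoop () }
        messageLoop ())

    member this.Log msg = agent.Post msg
    member this.GetLetters() = _letters.AsObservable()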
Suppose I have a stream which only allows one request/response at a time but is used in several threads.
Requests/commands should be throttled such that a new request can only occur once
the previous request has been sent and a reply has been received.
The user would be able to do this
let! res = getResponse("longResp")
let! res2 = getResponse("shortResp")
and not really know or care about the throttle.
I have tried a modified version of Tomas Petricek's Throttling Agent that allows async with return values, but it requires the user to call getResponse("..") |> Enqueue |> w.Post, which is a recipe for disaster (in case they forget to do so).
Is there a good/idiomatic way of doing this in F#?
Make it explicit in your type system that the returned type needs to be unwrapped with another function. So instead of returning an Async<'T>, which as you pointed out can be started directly with Async.Start, rather return something like:
type Queuable<'T> = Queuable of Async<'T>
Then getResponse changes to return a Queuable:
let getResponse (s:string) =
    let r =
        async {
            do! write to your stream
            return! read from your stream
        }
    Queuable r
Provide a function that unwraps the Queuable:
let enqueue (Queuable q) = async {
    return! processor.PostAndAsyncReply(fun replyChannel -> replyChannel, q)
}
The processor is an agent that simply runs the Async workflow. Something like this:
// Started immediately so callers don't deadlock on PostAndAsyncReply
let processor = MailboxProcessor.Start(fun inbox ->
    let rec Loop() = async {
        let! (r: AsyncReplyChannel<_>, job) = inbox.Receive()
        let! res = job
        r.Reply res
        return! Loop() }
    Loop())
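From the caller's side, the usage from the question then looks the same, except that forgetting enqueue becomes a type error, because a Queuable<'T> cannot be started with Async.Start directly. A usage sketch (the printed output is just for illustration):
async {
    let! res  = getResponse "longResp"  |> enqueue
    let! res2 = getResponse "shortResp" |> enqueue
    printfn "%A %A" res res2
} |> Async.RunSynchronously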
Does anyone know of 'prior art' regarding the following subject:
I have data that takes a decent amount of time to load; it is historical levels for various stocks.
I would like to preload it somehow, to avoid the latency when using my app.
However, preloading it all in one chunk at start-up makes my app unresponsive at first, which is not user friendly.
So I would like to not load my data... unless the user is not requesting any and is just playing with what he already has, in which case I would like to load it little by little. So it is neither 'lazy' nor 'eager', more 'lazy when you need' and 'eager when you can', hence the acronym LWYNEWYC.
I have made the following, which seems to work, but I just wonder if there is a recognized and blessed approach for such a thing?
let r = LoggingFakeRepo () :> IQuoteRepository
r.getHisto "1" |> ignore    // prints "getting histo for 1" when called

let rc = RepoCached (r) :> IQuoteRepository
rc.getHisto "1" |> ignore   // prints "getting histo for 1" the first time only

let rcc = RepoCachedEager (r) :> IQuoteRepository
rcc.getHisto "100" |> ignore // prints "getting histo for" 1..100 by itself, BUT
                             // prints "getting histo for 100" immediately when called
And the classes
type IQuoteRepository =
    abstract getUnderlyings : string seq
    abstract getHisto : string -> string

type LoggingFakeRepo () =
    interface IQuoteRepository with
        member x.getUnderlyings =
            printfn "getting underlyings"
            [1 .. 100] |> List.map string :> _
        member x.getHisto udl =
            printfn "getting histo for %A" udl
            "I am a historical dataset in a disguised party"

type RepoCached (rep : IQuoteRepository) =
    let memoize f =
        let cache = new System.Collections.Generic.Dictionary<_, _>()
        fun x ->
            if cache.ContainsKey(x) then cache.[x]
            else
                let res = f x
                cache.[x] <- res
                res

    let udls = lazy (rep.getUnderlyings)
    let gethistom = memoize rep.getHisto

    interface IQuoteRepository with
        member x.getUnderlyings = udls.Force()
        member x.getHisto udl = gethistom udl
type Message = string * AsyncReplyChannel<string>

type RepoCachedEager (rep : IQuoteRepository) =
    let udls = rep.getUnderlyings
    let agent = MailboxProcessor<Message>.Start(fun inbox ->
        let repocached = RepoCached (rep) :> IQuoteRepository
        let rec loop l =
            async {
                try
                    let timeout = if l |> List.isEmpty then -1 else 50
                    let! (udl, replyChannel) = inbox.Receive(timeout)
                    replyChannel.Reply(repocached.getHisto udl)
                    do! loop l
                with
                | :? System.TimeoutException ->
                    // no request arrived within 50 ms: preload the next underlying
                    let udl :: xs = l
                    repocached.getHisto udl |> ignore
                    do! loop xs
            }
        loop (udls |> Seq.toList))

    interface IQuoteRepository with
        member x.getUnderlyings = udls
        member x.getHisto udl = agent.PostAndReply(fun reply -> udl, reply)
I like your solution. I think using an agent to implement background loading with a timeout is a great way to go - agents can nicely encapsulate mutable state, so it is clearly safe, and you can encode the behaviour you want quite easily.
I think asynchronous sequences might be another useful abstraction (if I'm correct, they are available in FSharpx these days). An asynchronous sequence represents a computation that asynchronously produces more values, so they might be a good way to separate the data loader from the rest of the code.
I think you'll still need an agent to synchronize at some point, but you can nicely separate different concerns using async sequences.
The code to load the data might look something like this:
let loadStockPrices repo = asyncSeq {
    // TODO: Not sure how you detect that the repository has no more data...
    while true do
        // Get the next item from the repository, preferably asynchronously!
        let! data = repo.AsyncGetNextHistoricalValue()
        // Return the value to the caller...
        yield data }
This code represents the data loader, and it separates it from the code that uses it. From the agent that consumes the data source, you can use AsyncSeq.iterAsync to consume the values and do something with them.
With iterAsync, the function that you specify as the consumer is asynchronous. It may block (e.g. using Sleep), and when it blocks, the source - that is, your loader - is also blocked. This is quite a nice implicit way to control the loader from the code that consumes the data.
A feature that is not in the library yet (but would be useful) is a partially eager evaluator that takes an AsyncSeq<'T> and returns a new AsyncSeq<'T>, but obtains a certain number of elements from the source as soon as possible and caches them (so that the consumer does not have to wait when it asks for a value, as long as the source can produce values fast enough).
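As a rough sketch of the consuming side, using AsyncSeq.iterAsync (the cacheAgent parameter and the 50 ms pause are assumptions, not part of the original code):
// Feed values from the loader into an agent that owns the mutable cache.
let startBackgroundLoading repo (cacheAgent: MailboxProcessor<_>) =
    let consume data = async {
        // hand the value over to the agent
        cacheAgent.Post data
        // pausing here also pauses the loader, because iterAsync pulls lazily
        do! Async.Sleep 50 }
    loadStockPrices repo
    |> AsyncSeq.iterAsync consume
    |> Async.Start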
I'd like to write some code that runs a sequence of F# scripts (.fsx). The thing is that I could have literally hundreds of scripts and if I do that:
let shellExecute program args =
    let startInfo = new ProcessStartInfo()
    do startInfo.FileName <- program
    do startInfo.Arguments <- args
    do startInfo.UseShellExecute <- true
    do startInfo.WindowStyle <- ProcessWindowStyle.Hidden
    //do printfn "%s" startInfo.Arguments
    let proc = Process.Start(startInfo)
    ()
scripts
|> Seq.iter (shellExecute "fsi")
it could put too much stress on my 2 GB system. Anyway, I'd like to run the scripts in batches of n, which also seems like a good exercise for learning Async (I guess it's the way to go).
I have started to write some code for that but unfortunately it doesn't work:
open System.Diagnostics

let p = shellExecute "fsi" @"C:\Users\Stringer\foo.fsx"

async {
    let! exit = Async.AwaitEvent p.Exited
    do printfn "process has exited"
}
|> Async.StartImmediate
foo.fsx is just a hello world script.
What would be the most idiomatic way of solving this problem?
I'd also like to figure out whether it's doable to retrieve a return code for each executed script, and if not, find another way. Thanks!
EDIT:
Thanks a lot for your insights and links! I've learned a lot.
I just want to add some code for running batches in parallel using Async.Parallel, as Tomas suggested. Please comment if there is a better implementation of my cut function.
module Seq =
    /// Returns a sequence of sequences of N elements from the source sequence.
    /// If the length of the source sequence is not a multiple
    /// of N, the last element of the returned sequence will have a length
    /// between 1 and N-1.
    let cut (count : int) (source : seq<'T>) =
        let rec aux s length = seq {
            if length < count then yield s
            else
                yield Seq.take count s
                if length <> count then
                    yield! aux (Seq.skip count s) (length - count)
        }
        aux source (Seq.length source)
let batchCount = 2
let filesPerBatch =
    let q = scripts.Length / batchCount
    q + (if scripts.Length % batchCount = 0 then 0 else 1)

let batchs =
    scripts
    |> Seq.cut filesPerBatch
    |> Seq.map Seq.toList
    |> Seq.map loop

Async.RunSynchronously (Async.Parallel batchs) |> ignore
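For illustration, cutting a small list behaves like this (the result is shown as a comment):
// Splitting [1..7] into pieces of 3; the last piece holds the remainder.
[1..7]
|> Seq.cut 3
|> Seq.map Seq.toList
|> Seq.toList
// [[1; 2; 3]; [4; 5; 6]; [7]]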
EDIT2:
So I had some trouble getting Tomas's guard code working. I guess the f function has to be called in the AddHandler method, otherwise we lose the event forever... Here's the code:
module Event =
    let guard f (e: IEvent<'Del, 'Args>) =
        let e = Event.map id e
        { new IEvent<'Args> with
            member this.AddHandler(d) = e.AddHandler(d); f() // must call f here!
            member this.RemoveHandler(d) = e.RemoveHandler(d); f()
            member this.Subscribe(observer) =
                let rm = e.Subscribe(observer) in f(); rm }
The interesting thing (as mentioned by Tomas) is that it looks like the Exited event is stored somewhere when the process terminates, even if the process was not started with EnableRaisingEvents set to true.
When this property is finally set to true, the event fires.
Since I'm not sure that this is the official specification (and I'm also a bit paranoid), I found another solution that consists of starting the process in the guard function, so we ensure that the code works in either situation:
let createStartInfo program args =
    new ProcessStartInfo
        (FileName = program, Arguments = args, UseShellExecute = false,
         WindowStyle = ProcessWindowStyle.Normal,
         RedirectStandardOutput = true)

let createProcess info =
    let p = new Process()
    do p.StartInfo <- info
    do p.EnableRaisingEvents <- true
    p

let rec loop scripts = async {
    match scripts with
    | [] -> printfn "FINISHED"
    | script :: scripts ->
        let args = sprintf "\"%s\"" script
        let p = createStartInfo "notepad" args |> createProcess
        let! exit =
            p.Exited
            |> Event.guard (fun () -> p.Start() |> ignore)
            |> Async.AwaitEvent
        let output = p.StandardOutput.ReadToEnd()
        do printfn "\nPROCESSED: %s, CODE: %d, OUTPUT: %A" script p.ExitCode output
        return! loop scripts
}
Notice I've replaced fsi.exe with notepad.exe so I can replay different scenarios step by step in the debugger and explicitly control the exit of the process myself.
I did some experiments and here is one way to deal with the problem discussed in the comments below my post and in the answer from Joel (which I think doesn't work currently, but could be fixed).
I think the specification of Process is that it can trigger the Exited event after we set the EnableRaisingEvents property to true (and will trigger the event even if the process has already completed before we set the property). To handle this case correctly, we need to enable raising of events after we attach a handler to the Exited event.
This is a problem, because if we use AwaitEvent, it blocks the workflow until the event fires. We cannot do anything after calling AwaitEvent from the workflow (and if we set the property before calling AwaitEvent, then we get a race...). Vladimir's approach is correct, but I think there is a simpler way to deal with this.
I'll create a function Event.guard taking an event and returning an event, which allows us to specify some function that will be executed after a handler is attached to the event. This means that if we do some operation (which in turn triggers the event) inside this function, the event will be handled.
To use it for the problem discussed here, we need to change my original solution as follows. Firstly, the shellExecute function must not set the EnableRaisingEvents property (otherwise, we could lose the event!). Secondly, the waiting code should look like this:
let rec loop scripts = async {
    match scripts with
    | [] -> printf "FINISHED"
    | script :: scripts ->
        let p = shellExecute fsi script
        let! exit =
            p.Exited
            |> Event.guard (fun () -> p.EnableRaisingEvents <- true)
            |> Async.AwaitEvent
        let output = p.StandardOutput.ReadToEnd()
        return! loop scripts }
Note the use of the Event.guard function. Roughly, it says that after the workflow attaches a handler to the p.Exited event, the provided lambda function will run (and will enable raising of events). However, we have already attached the handler to the event, so if this causes the event to fire immediately, we're fine!
The implementation (for both Event and Observable) looks like this:
module Event =
    let guard f (e: IEvent<'Del, 'Args>) =
        let e = Event.map id e
        { new IEvent<'Args> with
            member x.AddHandler(d) = e.AddHandler(d)
            member x.RemoveHandler(d) = e.RemoveHandler(d); f()
            member x.Subscribe(observer) =
                let rm = e.Subscribe(observer) in f(); rm }

module Observable =
    let guard f (e: IObservable<'Args>) =
        { new IObservable<'Args> with
            member x.Subscribe(observer) =
                let rm = e.Subscribe(observer) in f(); rm }
The nice thing is that this code is very straightforward.
Your approach looks great to me, I really like the idea of embedding process execution into asynchronous workflows using AwaitEvent!
The likely reason why it didn't work is that you need to set the EnableRaisingEvents property of the Process to true if you want it to ever trigger the Exited event (don't ask me why you have to do that; it sounds pretty silly to me!). Anyway, I made a couple of other changes to your code when testing it, so here is a version that worked for me:
open System
open System.Diagnostics

let shellExecute program args =
    // Configure the process to redirect output (so that we can read it)
    let startInfo =
        new ProcessStartInfo
            (FileName = program, Arguments = args, UseShellExecute = false,
             WindowStyle = ProcessWindowStyle.Hidden,
             RedirectStandardOutput = true)
    // Start the process
    // Note: We must enable raising events explicitly here!
    Process.Start(startInfo, EnableRaisingEvents = true)
Most importantly, the code now sets EnableRaisingEvents to true. I also changed the code to use a syntax where you specify properties of an object when constructing it (to make the code a bit more succinct) and I changed a few properties, so that I can read the output (RedirectStandardOutput).
Now, we can use the AwaitEvent method to wait until a process completes. I'll assume that fsi contains the path to fsi.exe and that scripts is a list of FSX scripts. If you want to run them sequentially, you could use a loop implemented using recursion:
let rec loop scripts = async {
    match scripts with
    | [] -> printf "FINISHED"
    | script :: scripts ->
        // Start the process in the background
        let p = shellExecute fsi script
        // Wait until the process completes
        let! exit = Async.AwaitEvent p.Exited
        // Read the output produced by the process; the exit code
        // is available in the `ExitCode` property of `Process`
        let output = p.StandardOutput.ReadToEnd()
        printfn "\nPROCESSED: %s, CODE: %d\n%A" script p.ExitCode output
        // Process the rest of the scripts
        return! loop scripts }

// This starts the workflow on a background thread, so that we can
// do other things in the meantime. You need to add `ReadLine` so that
// the console application doesn't quit immediately
loop scripts |> Async.Start
Console.ReadLine() |> ignore
Of course, you could also run the processes in parallel (or, for example, run 2 groups of them in parallel, etc.). To do that you would use Async.Parallel (in the usual way).
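For instance, a rough sketch that reuses the loop function above; splitting the scripts into two halves is just an illustration:
// Run two groups of scripts: each group sequentially, the two groups in parallel.
let half = List.length scripts / 2
let groups =
    [ scripts |> Seq.take half |> List.ofSeq
      scripts |> Seq.skip half |> List.ofSeq ]

groups
|> List.map loop       // each group becomes one sequential workflow
|> Async.Parallel      // run the groups side by side
|> Async.Ignore
|> Async.Start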
Anyway, this is a really nice example of using asynchronous workflows in an area where I haven't seen them used so far. Very interesting :-)
In response to Tomas's answer, would this be a workable solution to the race condition involved in starting the process, and then subscribing to its Exited event?
type Process with
    static member AsyncStart psi =
        let proc = new Process(StartInfo = psi, EnableRaisingEvents = true)
        let asyncExit = Async.AwaitEvent proc.Exited
        async {
            proc.Start() |> ignore
            let! args = asyncExit
            return proc
        }
Unless I'm mistaken, this would subscribe to the event prior to starting the process, and package it all up as an Async<Process> result.
This would allow you to rewrite the rest of the code like this:
let shellExecute program args =
    // Configure the process to redirect output (so that we can read it)
    let startInfo =
        new ProcessStartInfo(FileName = program, Arguments = args,
                             UseShellExecute = false,
                             WindowStyle = ProcessWindowStyle.Hidden,
                             RedirectStandardOutput = true)
    // Start the process
    Process.AsyncStart(startInfo)

let fsi = "PATH TO FSI.EXE"

let rec loop scripts = async {
    match scripts with
    | [] -> printf "FINISHED"
    | script :: scripts ->
        // Start the process in the background
        use! p = shellExecute fsi script
        // Read the output produced by the process; the exit code
        // is available in the `ExitCode` property of `Process`
        let output = p.StandardOutput.ReadToEnd()
        printfn "\nPROCESSED: %s, CODE: %d\n%A" script p.ExitCode output
        // Process the rest of the scripts
        return! loop scripts
    }
If that does the job, it's certainly a lot less code to worry about than Vladimir's Async.GetSubject.
What about a MailboxProcessor?
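For what it's worth, a minimal sketch of that idea: an agent that keeps at most maxParallel jobs running at a time. The ThrottlingAgent name, the message type, and the runScript helper mentioned in the comment are assumptions, loosely following Tomas Petricek's throttling agent mentioned earlier.
type ThrottleMessage =
    | Enqueue of Async<unit>
    | Completed

// Jobs are queued and at most maxParallel of them run concurrently.
type ThrottlingAgent(maxParallel: int) =
    let agent = MailboxProcessor.Start(fun inbox ->
        let startJob job =
            // run the job and report back when it finishes
            let wrapped = async {
                do! job
                inbox.Post Completed }
            Async.Start wrapped
        let rec loop running queued = async {
            let! msg = inbox.Receive()
            match msg with
            | Enqueue job when running < maxParallel ->
                startJob job
                return! loop (running + 1) queued
            | Enqueue job ->
                return! loop running (queued @ [job])
            | Completed ->
                match queued with
                | job :: rest ->
                    startJob job
                    return! loop running rest
                | [] ->
                    return! loop (running - 1) [] }
        loop 0 [])

    member __.Enqueue(job: Async<unit>) = agent.Post(Enqueue job)

// e.g. keep at most 4 scripts running at once (runScript : string -> Async<unit> is assumed):
// let throttle = ThrottlingAgent 4
// scripts |> List.iter (fun script -> throttle.Enqueue(runScript script))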
It is possible to simplify the version of Subject from the blog post: instead of returning an imitation of an event, getSubject can return a workflow.
The resulting workflow is itself a state machine with two states:
1. The event hasn't been triggered yet: all pending listeners should be registered.
2. The value is already set: any listener is served immediately.
In code it looks like this:
type SubjectState<'T> = Listen of ('T -> unit) list | Value of 'T
The getSubject implementation is straightforward:
let getSubject (e : IEvent<_, _>) =
    let state = ref (Listen [])

    let switchState v =
        let listeners =
            lock state (fun () ->
                match !state with
                | Listen ls ->
                    state := Value v
                    ls
                | _ -> failwith "Value is set twice"
            )
        for l in listeners do l v

    Async.StartWithContinuations(
        Async.AwaitEvent e,
        switchState,
        ignore,
        ignore
    )

    Async.FromContinuations(fun (cont, _, _) ->
        let ok, v =
            lock state (fun () ->
                match !state with
                | Listen ls ->
                    state := Listen (cont :: ls)
                    false, Unchecked.defaultof<_>
                | Value v ->
                    true, v
            )
        if ok then cont v
    )
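A usage sketch in the context of this thread (the awaitExit helper is an assumption): subscribe through getSubject before enabling events or starting the process, so the notification cannot be lost even if the process exits immediately.
// Sketch: capture the Exited notification before the race window opens.
let awaitExit (psi: System.Diagnostics.ProcessStartInfo) = async {
    let p = new System.Diagnostics.Process(StartInfo = psi)
    // getSubject attaches to the event right away and remembers the value
    let exited = getSubject p.Exited
    p.EnableRaisingEvents <- true
    p.Start() |> ignore
    // even if the process has already finished, the stored value is replayed here
    let! _ = exited
    return p.ExitCode }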