Lazy.. but eager data loader in F#

Does anyone know of 'prior art' regarding the following subject?
I have data that takes a fair amount of time to load: historical levels for various stocks.
I would like to preload it somehow, to avoid latency when using my app.
However, preloading it all in one chunk at startup makes my app unresponsive at first, which is not user friendly.
So I would like to not load my data... unless the user is not requesting any and is playing with what he already has, in which case I would like to load it little by little. So it is neither 'lazy' nor 'eager', more 'lazy when you need' and 'eager when you can', hence the acronym LWYNEWYC.
I have made the following, which seems to work, but I wonder if there is a recognized and blessed approach for such a thing?
let r = LoggingFakeRepo () :> IQuoteRepository
r.getHisto "1" |> ignore      // prints Getting histo for 1 when called
let rc = RepoCached (r) :> IQuoteRepository
rc.getHisto "1" |> ignore     // prints Getting histo for 1 the first time only
let rcc = RepoCachedEager (r) :> IQuoteRepository
rcc.getHisto "100" |> ignore  // prints Getting histo 1..100 by itself BUT
                              // prints Getting histo 100 immediately when called
And the classes
type IQuoteRepository =
    abstract getUnderlyings : string seq
    abstract getHisto : string -> string

type LoggingFakeRepo () =
    interface IQuoteRepository with
        member x.getUnderlyings =
            printfn "getting underlyings"
            [1 .. 100] |> List.map string :> _
        member x.getHisto udl =
            printfn "getting histo for %A" udl
            "I am a historical dataset in a disguised party"

type RepoCached (rep : IQuoteRepository) =
    let memoize f =
        let cache = new System.Collections.Generic.Dictionary<_, _>()
        fun x ->
            if cache.ContainsKey(x) then cache.[x]
            else
                let res = f x
                cache.[x] <- res
                res
    let udls = lazy (rep.getUnderlyings)
    let gethistom = memoize rep.getHisto
    interface IQuoteRepository with
        member x.getUnderlyings = udls.Force()
        member x.getHisto udl = gethistom udl

// The reply carries whatever getHisto returns, i.e. a string
type Message = string * AsyncReplyChannel<string>

type RepoCachedEager (rep : IQuoteRepository) =
    let udls = rep.getUnderlyings
    let agent = MailboxProcessor<Message>.Start(fun inbox ->
        let repocached = RepoCached (rep) :> IQuoteRepository
        let rec loop l =
            async {
                try
                    // wait forever once everything is preloaded, otherwise 50 ms
                    let timeout = if l |> List.isEmpty then -1 else 50
                    let! (udl, replyChannel) = inbox.Receive(timeout)
                    replyChannel.Reply(repocached.getHisto udl)
                    do! loop l
                with
                | :? System.TimeoutException ->
                    // nobody asked for anything: preload the next underlying
                    let udl::xs = l
                    repocached.getHisto udl |> ignore
                    do! loop xs
            }
        loop (udls |> Seq.toList))
    interface IQuoteRepository with
        member x.getUnderlyings = udls
        member x.getHisto udl = agent.PostAndReply(fun reply -> udl, reply)

I like your solution. I think using an agent to implement background loading with a timeout is a great way to go: agents nicely encapsulate mutable state, so it is clearly safe, and you can encode the behaviour you want quite easily.
I think asynchronous sequences might be another useful abstraction (if I'm correct, they are available in FSharpX these days). An asynchronous sequence represents a computation that asynchronously produces more values, so it might be a good way to separate the data loader from the rest of the code.
I think you'll still need an agent to synchronize at some point, but you can nicely separate the different concerns using async sequences.
The code to load the data might look something like this:
let loadStockPrices repo = asyncSeq {
    // TODO: Not sure how you detect that the repository has no more data...
    while true do
        // Get next item from the repository, preferably asynchronously!
        let! data = repo.AsyncGetNextHistoricalValue()
        // Return the value to the caller...
        yield data }
This code represents the data loader and separates it from the code that uses it. From the agent that consumes the data source, you can use AsyncSeq.iterAsync to consume the values and do something with them.
With iterAsync, the function that you specify as a consumer is asynchronous. It may block (e.g. using Sleep), and while it blocks, the source (that is, your loader) is also blocked. This is quite a nice implicit way to control the loader from the code that consumes the data.
A feature that is not in the library yet (but would be useful) is a partially eager evaluator that takes an AsyncSeq<'T> and returns a new AsyncSeq<'T>, but obtains a certain number of elements from the source as soon as possible and caches them (so that the consumer does not have to wait when it asks for a value, as long as the source can produce values fast enough).
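For comparison, the agent-with-timeout idea from the question can also be written without catching TimeoutException, by using MailboxProcessor.TryReceive, which returns None when the timeout elapses. The following is only a sketch: BackgroundLoader, its load function and the idleMs parameter are names made up for this illustration, not part of the original code.

```fsharp
type LoaderMessage = string * AsyncReplyChannel<string>

// Hypothetical loader: answers requests immediately and preloads 'keys'
// one at a time whenever no request arrives within 'idleMs' milliseconds.
type BackgroundLoader(load: string -> string, keys: string list, idleMs: int) =
    // The cache is only touched from inside the agent, so no locking is needed.
    let cache = System.Collections.Generic.Dictionary<string, string>()
    let get key =
        match cache.TryGetValue key with
        | true, value -> value
        | _ ->
            let value = load key
            cache.[key] <- value
            value
    let agent = MailboxProcessor<LoaderMessage>.Start(fun inbox ->
        let rec loop pending = async {
            match pending with
            | [] ->
                // Nothing left to preload: just serve requests.
                let! (key, reply) = inbox.Receive()
                reply.Reply(get key)
                return! loop []
            | next :: rest ->
                let! msg = inbox.TryReceive(idleMs)
                match msg with
                | Some (key, reply) ->
                    // A request arrived: serve it and keep the preload list as is.
                    reply.Reply(get key)
                    return! loop pending
                | None ->
                    // Idle: preload the next key in the background.
                    get next |> ignore
                    return! loop rest }
        loop keys)
    member this.Get key = agent.PostAndReply(fun reply -> (key, reply))

// Usage with a fake load function that counts how often it is called.
let loadCount = ref 0
let loader =
    BackgroundLoader(
        (fun key ->
            loadCount.Value <- loadCount.Value + 1
            "histo " + key),
        ["1"; "2"; "3"],
        20)
let requested = loader.Get "3"        // served right away
System.Threading.Thread.Sleep 500     // leave the agent idle so it can preload
let preloaded = loader.Get "1"        // answered from the cache
```

Because every recursive call is a return! outside of a try block, the loop stays tail recursive, which the try/with version does not guarantee.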

Related

Waiting for database rows to load using TableDependency and F#

I've got an F# project that loads some files to an outside subsystem and then uses Table Dependency to wait for some rows to be added to a table as a side effect.
Table Dependency is used in the type below to watch for the db changes. It fires a custom event when a row is added/changed/whatever:
// just using this type for the RecordChangedEvent to marshal the id we want into something
type AccountLoaded() =
    let mutable someId = ""
    // this property name matches the name of the table column (SomeId)
    member this.SomeId
        with get () = someId
        and set (value) = someId <- value

// AccountLoadWatcher
type AccountLoadWatcher() =
    let mutable _tableDependency = null
    let event = new Event<_>()
    interface IDisposable with
        member this.Dispose() =
            _tableDependency.Stop()
            _tableDependency.Dispose()
    // custom event we can send when an account is loaded
    [<CLIEvent>]
    member this.AccountLoaded = event.Publish
    member private this.NotifyAccountLoaded(sender : RecordChangedEventArgs<AccountLoaded>) =
        let accountLoaded = sender.Entity
        event.Trigger(accountLoaded.SomeId)
    member this.Watch() =
        _tableDependency <- DbLib.getTableDependency "dbo" "AccountTable" null
        _tableDependency.OnChanged.Add(this.NotifyAccountLoaded)
        _tableDependency.Start()
What I want to do is take the above object and just wait for all the rows with ids I care about to be loaded. What I have so far is:
let waitForRows(csvFileRows) =
    let idsToWaitFor = parseUniqueIdsFromAllRows csvFileRows
    let mutable collected = Set.empty
    let isInSet id = Set.contains id idsToWaitFor
    let notDone = not <| (Set.difference idsToWaitFor collected = Set.empty)
    let accountLoadedHandler id =
        collected <- collected.Add id
        printfn "Id loaded %s, waiting for %A\n" id (Set.difference idsToWaitFor collected)
    loadToSubsystem csvFileRows |> ignore
    // wait for all the watcher events; filtering each event object for ids we care about
    watcher.AccountLoaded
    |> Observable.takeWhile (fun _ -> notDone)
    |> Observable.filter (fun e -> isInSet e)
    |> Observable.subscribe accountLoadedHandler
    |> ignore
    doMoreWork()
But that just continues to doMoreWork without waiting for all the events I need above.
Do I need to use a task or async? F# agents?
Given that you are using Observable.takeWhile in your example, I'm assuming that you are using the FSharp.Control.Reactive wrapper to get access to the full range of reactive combinators.
Your approach has some good ideas, such as using takeWhile to wait until you collect all IDs, but the use of mutation is quite unfortunate - it might not even be safe to do this because of possible race conditions.
A nice alternative is to use one of the various scan functions to collect state as the events happen. You can use Observable.scanInit to start with an empty set and add all IDs, followed by Observable.takeWhile to keep accepting events until you have all the IDs you're waiting for. To actually wait (and block), you can use Observable.wait. Something like this:
let waitForRows(csvFileRows) =
    let idsToWaitFor = parseUniqueIdsFromAllRows csvFileRows
    let finalCollectedIDs =
        watcher.AccountLoaded
        |> Observable.scanInit Set.empty (fun collected id -> Set.add id collected)
        |> Observable.takeWhile (fun collected -> not (Set.isSubset idsToWaitFor collected))
        |> Observable.wait
    printfn "Completed. Final collected IDs are: %A" finalCollectedIDs
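The scan-then-takeWhile shape can be tried out on an ordinary sequence first, with no database or Rx involved. The sketch below uses Seq.scan and Seq.takeWhile from FSharp.Core as stand-ins for the observable combinators; the ids and the list of events are made up for the illustration.

```fsharp
// IDs we are waiting for, and a made-up stream of "row loaded" events.
let idsToWaitFor = set ["a"; "b"; "c"]
let events = ["x"; "a"; "b"; "a"; "c"; "d"]

// Accumulate the IDs seen so far without any mutable variable,
// and stop as soon as every awaited ID has been seen.
let states =
    events
    |> Seq.scan (fun collected id -> Set.add id collected) Set.empty
    |> Seq.takeWhile (fun collected -> not (Set.isSubset idsToWaitFor collected))
    |> Seq.toList

let lastState = List.last states
printfn "States kept: %d, last one: %A" states.Length lastState
```

One thing this makes visible: takeWhile drops the first state that fails the predicate, so the last state that is kept does not yet contain the final awaited ID ("c" here). I would expect the observable version to behave the same way, so treat the returned set as "everything before completion" rather than the full set.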

Is returning results from MailboxProcessor via Rx a good idea?

I am a little curious about the code example below and what people think of it.
The idea was to read from a NetworkStream (~20 msg/s) and, instead of doing the work in main, pass things to a MailboxProcessor to handle and get the results back for bindings when done.
The usual way is to use PostAndReply, but I want to bind to a ListView or other control in C#. I must do some magic with the last N items and filtering anyway.
Plus, Rx has some error handling.
The example below observes numbers from 2..10 and returns "hello X". On 8 it stops, as if it had reached EOF. I used ToEnumerable because otherwise the other thread finishes first, but it works with Subscribe as well.
What bothers me:
Passing Subject(obj) around in the recursion. I don't see any problems with having around 3-4 of those. Good idea?
The lifetime of the Subject.
open System
open System.Threading
open System.Reactive.Subjects
open System.Reactive.Linq // NuGet, take System.Reactive.Core also.
open System.Reactive.Concurrency

type SerializedLogger() =
    let _letters = new Subject<string>()
    // create the mailbox processor
    let agent = MailboxProcessor.Start(fun inbox ->
        // the message processing function
        let rec messageLoop (letters:Subject<string>) = async {
            // read a message
            let! msg = inbox.Receive()
            printfn "mailbox: %d in Thread: %d" msg Thread.CurrentThread.ManagedThreadId
            do! Async.Sleep 100
            // write it to the log
            match msg with
            | 8 -> letters.OnCompleted() // like EOF.
            | x -> letters.OnNext(sprintf "hello %d" x)
            // loop to top
            return! messageLoop letters
            }
        // start the loop
        messageLoop _letters
        )
    // public interface
    member this.Log msg = agent.Post msg
    member this.Getletters() = _letters.AsObservable()

/// Print line with prefix 1.
let myPrint1 x = printfn "onNext - %s, Thread: %d" x Thread.CurrentThread.ManagedThreadId
// Actions
let onNext = new Action<string>(myPrint1)
let onCompleted = new Action(fun _ -> printfn "Complete")

[<EntryPoint>]
let main argv =
    async {
        printfn "Main is on: %d" Thread.CurrentThread.ManagedThreadId
        // test
        let logger = SerializedLogger()
        logger.Log 1 // ignored?
        let xObs =
            logger
                .Getletters() //.Where( fun x -> x <> "hello 5")
                .SubscribeOn(Scheduler.CurrentThread)
                .ObserveOn(Scheduler.CurrentThread)
                .ToEnumerable() // this
                //.Subscribe(onNext, onCompleted) // or with Dispose()
        [2..10] |> Seq.iter (logger.Log)
        xObs |> Seq.iter myPrint1
        while true do
            printfn "waiting"
            System.Threading.Thread.Sleep(1000)
        return 0
    } |> Async.RunSynchronously // return an integer exit code
I have done similar things, but using the plain F# Event type rather than Subject. It basically lets you create an IObservable and trigger its subscribers, much like your use of the more complex Subject. The event-based version would be:
type SerializedLogger() =
    let letterProduced = new Event<string>()
    let lettersEnded = new Event<unit>()
    let agent = MailboxProcessor.Start(fun inbox ->
        // no Subject needs to be passed around any more
        let rec messageLoop () = async {
            // Some code omitted
            match msg with
            | 8 -> lettersEnded.Trigger()
            | x -> letterProduced.Trigger(sprintf "hello %d" x)
            // ...
    member this.Log msg = agent.Post msg
    member this.LetterProduced = letterProduced.Publish
    member this.LettersEnded = lettersEnded.Publish
The important differences are:
Event cannot trigger OnCompleted, so I instead exposed two separate events. This is quite unfortunate! Given that Subject is very similar to events in all other aspects, this might be a good reason for using subject instead of plain event.
The nice aspect of using Event is that it is a standard F# type, so you do not need any external dependencies in the agent.
I noticed your comment noting that the first call to Log was ignored. That's because you subscribe to the observable only after this call happens. I think you could use the ReplaySubject variation on the Subject idea here: it replays all events when you subscribe to it, so the one that happened earlier would not be lost (but there is a cost to caching).
In summary, I think using Subject is probably a good idea. It is essentially the same pattern as using Event (which I think is a quite standard way of exposing notifications from agents), but it lets you trigger OnCompleted. I would probably not use ReplaySubject, because of the caching cost; you just have to make sure to subscribe before triggering any events.
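To make the event-based variant concrete, here is a complete small version that runs without Rx. It is only a sketch: the type name LetterAgent is made up, the agent stops looping once it sees 8, and a ManualResetEvent stands in for whatever the real consumer would do when the stream ends.

```fsharp
type LetterAgent() =
    let letterProduced = Event<string>()
    let lettersEnded = Event<unit>()
    let agent = MailboxProcessor<int>.Start(fun inbox ->
        let rec messageLoop () = async {
            let! msg = inbox.Receive()
            match msg with
            | 8 ->
                // Treat 8 like EOF: signal the end and stop looping.
                lettersEnded.Trigger()
            | x ->
                letterProduced.Trigger(sprintf "hello %d" x)
                return! messageLoop () }
        messageLoop ())
    member this.Log msg = agent.Post msg
    member this.LetterProduced = letterProduced.Publish
    member this.LettersEnded = lettersEnded.Publish

// Subscribe BEFORE posting anything, so that no notification is missed.
let logger = LetterAgent()
let received = ResizeArray<string>()
let finished = new System.Threading.ManualResetEvent(false)
logger.LetterProduced.Add(fun letter -> received.Add letter)
logger.LettersEnded.Add(fun () -> finished.Set() |> ignore)

[2 .. 10] |> List.iter logger.Log
finished.WaitOne() |> ignore
printfn "%A" (List.ofSeq received)
```

The handlers run on whichever thread the agent is using, so anything that touches UI controls would still need to be marshalled to the UI thread.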

Random / State workflow in F#

I'm trying to wrap my head around mon-, err, workflows in F# and while I think that I have a pretty solid understanding of the basic "Maybe" workflow, trying to implement a state workflow to generate random numbers has really got me stumped.
My non-completed attempt can be seen here:
let randomInt state =
    let random = System.Random(state)
    // Generate random number and a new state as well
    random.Next(0,1000), random.Next()

type RandomWF (initState) =
    member this.Bind(rnd,rest) =
        let value, newState = rnd initState
        // How to feed "newState" into "rest"??
        value |> rest
    member this.Return a = a // Should I maybe feed "initState" into the computation here?

RandomWF(0) {
    let! a = randomInt
    let! b = randomInt
    let! c = randomInt
    return [a; b; c]
} |> printfn "%A"
Edit: Actually got it to work! Not exactly sure how it works though, so if anyone wants to lay it out in a good answer, it's still up for grabs. Here's my working code:
type RandomWF (initState) =
    member this.Bind(rnd,rest) =
        fun state ->
            let value, nextState = rnd state
            rest value nextState
    member this.Return a = fun _ -> a
    member this.Run x = x initState
There are two things that make it harder to see what your workflow is doing:
You're using a function type for the type of your monad,
Your workflow not only builds up the computation, it also runs it.
I think it's clearer to follow once you see how it would look without those two impediments. Here's the workflow defined using a DU wrapper type:
type Random<'a> =
    Comp of (int -> 'a * int)

let run init (Comp f) = f init

type Random<'a> with
    member this.Run(state) = fst <| run state this

type RandomBuilder() =
    member this.Bind(Comp m, f: 'a -> Random<_>) =
        Comp <| fun state ->
            let value, nextState = m state
            let comp = f value
            run nextState comp
    member this.Return(a) = Comp (fun s -> a, s)

let random = RandomBuilder()
And here is how you use it:
let randomInt =
    Comp <| fun state ->
        let rnd = System.Random(state)
        rnd.Next(0,1000), rnd.Next()

let rand =
    random {
        let! a = randomInt
        let! b = randomInt
        let! c = randomInt
        return [a; b; c ]
    }

rand.Run(0)
|> printfn "%A"
In this version you separately build up the computation (and store it inside the Random type), and then you run it passing in the initial state. Look at how types on the builder methods are inferred and compare them to what MSDN documentation describes.
Edit: Constructing a builder object once and using the binding as an alias of sorts is mostly convention, but it's well justified in that it makes sense for the builders to be stateless. I can see why having parameterized builders seems like a useful feature, but I can't honestly imagine a convincing use case for it.
The key selling point of monads is the separation of definition and execution of a computation.
In your case - what you want to be able to do is to take a representation of your computation and be able to run it with some state - perhaps 0, perhaps 42. You don't need to know the initial state to define a computation that will use it. By passing in the state to the builder, you end up blurring the line between definition and execution, and this simply makes the workflow less useful.
Compare that with async workflow - when you write an async block, you don't make the code run asynchronously. You only create an Async<'a> object representing a computation that will produce an object of 'a when you run it - but how you do it, is up to you. The builder doesn't need to know.
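The separation of definition and execution can be seen by defining one computation and running it with several initial states. The sketch below repeats the builder from above in condensed form so that it is self-contained; the name threeInts is made up here.

```fsharp
type Random<'a> = Comp of (int -> 'a * int)

let run init (Comp f) = f init

type RandomBuilder() =
    member this.Bind(Comp m, f: 'a -> Random<'b>) =
        Comp (fun state ->
            let value, nextState = m state
            run nextState (f value))
    member this.Return(a) = Comp (fun s -> a, s)

let random = RandomBuilder()

let randomInt =
    Comp (fun state ->
        let rnd = System.Random(state)
        rnd.Next(0, 1000), rnd.Next())

// Defined once, with no initial state in sight...
let threeInts =
    random {
        let! a = randomInt
        let! b = randomInt
        let! c = randomInt
        return [a; b; c] }

// ...and executed as often as needed, each time with a state of our choosing.
let fromZero, _ = run 0 threeInts
let fromFortyTwo, _ = run 42 threeInts
let fromZeroAgain, _ = run 0 threeInts
printfn "%A %A %A" fromZero fromFortyTwo fromZeroAgain
```

Running with the same initial state twice gives the same list, because all the state is threaded explicitly through the computation.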

Global state and Async Workflows in F#

A common example used to illustrate asynchronous workflows in F# is retrieving multiple webpages in parallel. One such example is given at: http://en.wikibooks.org/wiki/F_Sharp_Programming/Async_Workflows Code shown here in case the link changes in the future:
open System.Text.RegularExpressions
open System.Net

let download url =
    let webclient = new System.Net.WebClient()
    webclient.DownloadString(url : string)

let extractLinks html = Regex.Matches(html, @"http://\S+")

let downloadAndExtractLinks url =
    let links = (url |> download |> extractLinks)
    url, links.Count

let urls =
    [@"http://www.craigslist.com/";
     @"http://www.msn.com/";
     @"http://en.wikibooks.org/wiki/Main_Page";
     @"http://www.wordpress.com/";
     @"http://news.google.com/";]

let pmap f l =
    seq { for a in l -> async { return f a } }
    |> Async.Parallel
    |> Async.RunSynchronously // called Async.Run in early F# releases

let testSynchronous() = List.map downloadAndExtractLinks urls
let testAsynchronous() = pmap downloadAndExtractLinks urls

let time msg f =
    let stopwatch = System.Diagnostics.Stopwatch.StartNew()
    let temp = f()
    stopwatch.Stop()
    printfn "(%f ms) %s: %A" stopwatch.Elapsed.TotalMilliseconds msg temp

let main() =
    printfn "Start..."
    time "Synchronous" testSynchronous
    time "Asynchronous" testAsynchronous
    printfn "Done."

main()
What I would like to know is how one should handle changes in global state such as loss of a network connection? Is there an elegant way to do this?
One could check the state of the network prior to making the Async.Parallel call, but the state could change during execution. Assuming what one wanted to do was pause execution until the network was available again rather than fail, is there a functional way to do this?
First of all, there is one issue with the example: it uses Async.Parallel to run multiple operations in parallel, but the operations themselves are not implemented as asynchronous, so this will not avoid blocking an excessive number of threads in the thread pool.
Asynchronous. To make the code fully asynchronous, the download and downloadAndExtractLinks functions should be asynchronous too, so that you can use AsyncDownloadString of the WebClient:
let asyncDownload url = async {
    let webclient = new System.Net.WebClient()
    return! webclient.AsyncDownloadString(System.Uri(url : string)) }

let asyncDownloadAndExtractLinks url = async {
    let! html = asyncDownload url
    let links = extractLinks html
    return url, links.Count }

let pmap f l =
    seq { for a in l -> async { return! f a } }
    |> Async.Parallel
    |> Async.RunSynchronously
Retrying. Now, to answer the question: there is no built-in mechanism for handling errors such as network failure, so you will need to implement this logic yourself. The right approach depends on your situation. One common approach is to retry the operation a certain number of times and throw the exception only if it still does not succeed, e.g. after 10 attempts. You can write this as a primitive that takes another asynchronous workflow:
let rec asyncRetry times op = async {
    try
        return! op
    with e ->
        // reraise() cannot be used inside an async block, so rethrow with raise
        if times <= 1 then return raise e
        else return! asyncRetry (times - 1) op }
Then you can change the main function to build a workflow that retries each download up to 10 times:
let testAsynchronous() =
    pmap (fun url -> asyncRetry 10 (asyncDownloadAndExtractLinks url)) urls
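The retry primitive can be exercised without a network by feeding it a workflow that fails a couple of times before succeeding. This is a sketch with one deliberate change: it uses Async.Catch instead of try/with, which keeps the recursive call outside of an exception handler. The flaky workflow and its attempt counter are made up for the test.

```fsharp
// Retry 'op' until it succeeds or 'times' attempts have been used up.
let rec asyncRetry times (op: Async<'T>) : Async<'T> = async {
    let! outcome = Async.Catch op
    match outcome with
    | Choice1Of2 value -> return value
    | Choice2Of2 e when times <= 1 -> return raise e
    | Choice2Of2 _ -> return! asyncRetry (times - 1) op }

// Made-up operation: fails on the first two attempts, succeeds on the third.
let attempts = ref 0
let flaky = async {
    attempts.Value <- attempts.Value + 1
    if attempts.Value < 3 then failwith "network down"
    return "page contents" }

let result = asyncRetry 10 flaky |> Async.RunSynchronously
printfn "%s after %d attempts" result attempts.Value
```

To pause until the network is back rather than retrying immediately, a do! Async.Sleep with some delay could be added before the recursive call; the delay value would be a choice for your application.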
Shared state. Another problem is that Async.Parallel will only return once all the downloads have completed (if there is one faulty web site, you will have to wait). If you want to show the results as they come back, you will need something more sophisticated.
One nice way to do this is to use an F# agent: create an agent that stores the results obtained so far and can handle two messages, one that adds a new result and another that returns the current state. Then you can start multiple async tasks that send their results to the agent and, in a separate async workflow, you can use polling to check the current status (and e.g. update the user interface).
I wrote an MSDN series about agents and also two articles for developerFusion that have plenty of code samples with F# agents.
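A minimal sketch of such a result-collecting agent might look as follows. The message names, the fakeDownload function and the example URLs are made up for the illustration; a real version would call the asynchronous download instead and poll from a separate workflow rather than taking a single snapshot at the end.

```fsharp
type CollectorMessage =
    | AddResult of string * int
    | GetResults of AsyncReplyChannel<(string * int) list>

// The agent owns the list of results; nobody else can touch it.
let collector =
    MailboxProcessor<CollectorMessage>.Start(fun inbox ->
        let rec loop results = async {
            let! msg = inbox.Receive()
            match msg with
            | AddResult (url, count) ->
                return! loop ((url, count) :: results)
            | GetResults reply ->
                reply.Reply(List.rev results)
                return! loop results }
        loop [])

// Made-up stand-in for a download: "finds" as many links as the URL has characters.
let fakeDownload (url: string) = async {
    do! Async.Sleep 10
    return url, url.Length }

let sampleUrls = ["http://a.example/"; "http://bb.example/"]

// Each task posts its own result as soon as it is ready.
sampleUrls
|> List.map (fun url -> async {
    let! (u, count) = fakeDownload url
    collector.Post(AddResult (u, count)) })
|> Async.Parallel
|> Async.Ignore
|> Async.RunSynchronously

// Ask the agent for the current state at any time.
let snapshot = collector.PostAndReply(fun reply -> GetResults reply)
printfn "%A" snapshot
```

The order of the results depends on which task finishes first, so code that displays them should not rely on it.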

My First F# program

I just finished writing my first F# program. Functionality-wise, the code works the way I wanted, but I am not sure if it is efficient. I would much appreciate it if someone could review the code for me and point out the areas where it can be improved.
Thanks
Sudaly
open System
open System.IO
open System.IO.Pipes
open System.Text
open System.Collections.Generic
open System.Runtime.Serialization

[<DataContract>]
type Quote = {
    [<field: DataMember(Name="securityIdentifier") >]
    RicCode:string
    [<field: DataMember(Name="madeOn") >]
    MadeOn:DateTime
    [<field: DataMember(Name="closePrice") >]
    Price:float
    }

let m_cache = new Dictionary<string, Quote>()

let ParseQuoteString (quoteString:string) =
    let data = Encoding.Unicode.GetBytes(quoteString)
    let stream = new MemoryStream()
    stream.Write(data, 0, data.Length);
    stream.Position <- 0L
    let ser = Json.DataContractJsonSerializer(typeof<Quote array>)
    let results:Quote array = ser.ReadObject(stream) :?> Quote array
    results

let RefreshCache quoteList =
    m_cache.Clear()
    quoteList |> Array.iter(fun result->m_cache.Add(result.RicCode, result))

let EstablishConnection() =
    let pipeServer = new NamedPipeServerStream("testpipe", PipeDirection.InOut, 4)
    let mutable sr = null
    printfn "[F#] NamedPipeServerStream thread created, Wait for a client to connect"
    pipeServer.WaitForConnection()
    printfn "[F#] Client connected."
    try
        // Stream for the request.
        sr <- new StreamReader(pipeServer)
    with
    | _ as e -> printfn "[F#]ERROR: %s" e.Message
    sr

while true do
    let sr = EstablishConnection()
    // Read request from the stream.
    printfn "[F#] Ready to Receive data"
    sr.ReadLine()
    |> ParseQuoteString
    |> RefreshCache
    printfn "[F#]Quot Size, %d" m_cache.Count
    let quot = m_cache.["MSFT.OQ"]
    printfn "[F#]RIC: %s" quot.RicCode
    printfn "[F#]MadeOn: %s" (String.Format("{0:T}",quot.MadeOn))
    printfn "[F#]Price: %f" quot.Price
In general, you should try using immutable data types and avoid imperative constructs such as global variables and imperative loops. Although using them in F# is fine in many cases, they should be used only when there is a good reason for doing so. Here are a couple of examples where you could use a functional approach:
First of all, to make the code more functional, you should avoid using a global mutable cache. Instead, your RefreshCache function should return the data as the result (preferably using some functional data structure, such as the F# Map type):
let PopulateCache quoteList =
    quoteList
    // Generate a sequence of tuples containing key and value
    |> Seq.map (fun result -> result.RicCode, result)
    // Turn the sequence into an F# immutable map (replacement for hashtable)
    |> Map.ofSeq
The code that uses it would be changed like this:
let cache =
    sr.ReadLine()
    |> ParseQuoteString
    |> PopulateCache
printfn "[F#]Quot Size, %d" cache.Count
let quot = cache.["MSFT.OQ"]
// The rest of the sample stays the same
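As a small self-contained illustration of the immutable cache, the sketch below builds a Map from a few quotes and looks one up with Map.tryFind, which returns an option instead of throwing when the key is missing. The Quote record is simplified (no serialization attributes) and the quote values are made-up sample data.

```fsharp
type Quote = { RicCode: string; MadeOn: System.DateTime; Price: float }

// Build an immutable map keyed by RIC code.
let populateCache (quotes: seq<Quote>) =
    quotes
    |> Seq.map (fun q -> q.RicCode, q)
    |> Map.ofSeq

// Made-up sample data, just for the illustration.
let cache =
    populateCache [
        { RicCode = "MSFT.OQ"; MadeOn = System.DateTime(2010, 1, 4); Price = 30.95 }
        { RicCode = "IBM.N"; MadeOn = System.DateTime(2010, 1, 4); Price = 132.45 } ]

// tryFind makes the "not in cache" case explicit.
let describe ric =
    match Map.tryFind ric cache with
    | Some q -> sprintf "%s: %.2f" q.RicCode q.Price
    | None -> sprintf "%s: not in cache" ric

printfn "%s" (describe "MSFT.OQ")
printfn "%s" (describe "UNKNOWN")
```

Refreshing the cache then simply means binding a new map; nothing that still holds the old one is affected.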
In the EstablishConnection function, you definitely don't need to declare a mutable variable sr, because in case of an exception, the function will return null. I would instead use option type to make sure that this case is handled:
let EstablishConnection() =
    let pipeServer =
        new NamedPipeServerStream("testpipe", PipeDirection.InOut, 4)
    printfn "[F#] NamedPipeServerStream thread created..."
    pipeServer.WaitForConnection()
    printfn "[F#] Client connected."
    try // Wrap the result in 'Some' to denote success
        Some(new StreamReader(pipeServer))
    with e ->
        printfn "[F#]ERROR: %s" e.Message
        // Return 'None' to denote a failure
        None
The main loop can be written using a recursive function that stops when EstablishConnection fails:
let rec loop() =
    match EstablishConnection() with
    | Some(conn) ->
        printfn "[F#] Ready to Receive data"
        // rest of the code
        loop() // continue looping
    | _ -> () // Quit
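The recursive loop can be tried without named pipes by replacing the connection with an in-memory reader. In this sketch, establishConnection, the queue of pending "connections" and readRequest are all made up; the reader is disposed with 'use' inside a helper so that the recursive call stays in tail position.

```fsharp
open System.IO

// Made-up source of connections: each queued string plays the role of one client.
let pendingClients = System.Collections.Generic.Queue<string>(["first request"; "second request"])

let establishConnection () =
    if pendingClients.Count > 0 then
        Some (new StringReader(pendingClients.Dequeue()) :> TextReader)
    else
        None // no more clients: denotes failure, as in the version above

// 'use' disposes the reader as soon as the line has been read.
let readRequest (reader: TextReader) =
    use r = reader
    r.ReadLine()

// Collect the handled requests in an accumulator instead of mutating a global.
let rec loop handled =
    match establishConnection () with
    | Some reader ->
        let request = readRequest reader
        loop (request :: handled) // continue looping
    | None -> List.rev handled     // quit and return what was handled

let handledRequests = loop []
printfn "%A" handledRequests
```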
Just a couple thoughts...
You probably want a 'use' rather than a 'let' in a few places, as I think some of the objects in the program are IDisposable.
You may consider wrapping the EstablishConnection method and the final while loop in async blocks (and make other minor changes), so that e.g. you can wait asynchronously for connections without blocking a thread.
At first glance it is written in imperative style rather than functional style, which does make sense given that most of the program involves side effects (i.e. I/O). Line for line, it almost looks like a C# program.
Given the amount of I/O that is taking place, I don't know that there is much you can do to this particular program to make it more of a functional style of coding.
