F# observable filter with side effect

I have a number of events that are merged into one observable that executes some commands. If a command succeeds, some result takes place. In addition, the command should be logged.
In terms of code, this looks like:
```fsharp
let mevts = modifyingevents |> Observable.filter exec_action
mevts |> Observable.add (fun action -> self.OutlineEdited <- true)
```
where the function exec_action performs some side effect, such as editing a tree view, and returns whether it succeeded. If it succeeded, the property OutlineEdited is set to true.
I was hoping to follow this with something like:
```fsharp
mevts |> Observable.scan (fun log action -> action::log) []
```
but it turns out that Observable.filter runs its predicate once for each subscribed observer, which means that the side effect is repeated.
Can you please suggest another way to achieve the same result without having the exec_action executed twice? I am hoping to avoid having to use a mutable variable if possible.

This example illustrates nicely the difference between the IObservable<'T> type (used in this example via the Observable module) and the F# type IEvent<'T> (and the functions in the Event module).
When you use observables, every subscriber creates a new chain of operations (so side effects are executed once for every subscriber). If you use events, the state is shared and side effects are executed just once (regardless of the number of subscribers). On the other hand, events do not get garbage collected when you remove all subscribers from an event.
So, if you do not need the events to be removed when all subscribers are removed, you should get the behaviour you want just by using Event instead of Observable:
```fsharp
modifyingevents
|> Event.filter exec_action
|> Event.scan (fun log action -> action::log) []
```
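A small self-contained sketch of this behaviour, under stated assumptions: `execAction` is a hypothetical stand-in for exec_action that only counts its calls, and an `Event<string>` stands in for modifyingevents. Because Event.filter subscribes to its source once, the predicate runs once per raised event even though two consumers (the OutlineEdited handler and Event.scan) hang off the filtered event:

```fsharp
// Counts how many times the (hypothetical) side-effecting predicate runs.
let execCount = ref 0

// Hypothetical stand-in for exec_action: "performs" the action and reports success.
let execAction (action: string) =
    execCount.Value <- execCount.Value + 1
    action <> "invalid"

let modifyingEvent = Event<string>()   // stand-in for modifyingevents
let outlineEdited = ref false
let lastLog : string list ref = ref []

// Event.filter subscribes to the source once; both consumers below share it.
let succeeded = modifyingEvent.Publish |> Event.filter execAction
succeeded |> Event.add (fun _ -> outlineEdited.Value <- true)
succeeded
|> Event.scan (fun log action -> action :: log) []
|> Event.add (fun log -> lastLog.Value <- log)

modifyingEvent.Trigger "rename"
modifyingEvent.Trigger "invalid"
modifyingEvent.Trigger "move"
// execCount.Value = 3 (once per event), lastLog.Value = ["move"; "rename"]
```

Swapping Event.filter/Event.add/Event.scan for their Observable counterparts in this sketch would make execCount reach 6, since each of the two subscribers would get its own filter chain.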


How can I split a list of strings into chunks when they have some light markdown attributes, in F#?

I have a tool that is using a Telegram chatbot to interact with its users.
Telegram limits the call rate, so I use a queue system that gets flushed at regular intervals.
So, the current code is very basic:
```fsharp
// flush the message queue
let flushMessageQueue() =
    if not messageQueue.IsEmpty then // messageQueue is a ConcurrentQueue
        // get all the messages
        let messages =
            messageQueue
            |> Seq.unfold (fun q ->
                match q.TryDequeue () with
                | true, m -> Some (m, q)
                | _ -> None)
        // put all the messages in a single string
        let messagesString = String.Join("\n", messages)
        // send the data
        client.SendTextMessageAsync(chatId, messagesString, ParseMode.Markdown)
        |> Async.AwaitTask
        |> Async.RunSynchronously
        |> ignore
```
This is called at a regular interval, while messages are written with:
```fsharp
// broadcast message
let broadcastMessage message =
    messageQueue.Enqueue(message)
    printfn "%s" (message.Replace ("```", String.Empty))
```
But as messages became more complex, two problems came up at once:
- Part of the output is formatted text with simple markdown:
  - some blocks of lines are wrapped between ``` sections
  - there are some ``` sections inside some lines as well
- The text is UTF-8 and uses a bunch of symbols
An example of such text:
```
this is a group of lines
with one, or many many lines
```
and sometimes there are things ```like this``` as well
And... I found out that Telegram limits the message size to 4 KB as well.
So, I thought of two things:
1. I can maintain a state tracking the open/close ``` markers: pull lines from a queue, wrap each line in triple backticks based on that state, and push them into another queue that will be used to build the 4 KB blocks.
2. I can keep taking messages from the reformatted queue and aggregate them until I reach 4 KB or the end of the queue, and then loop around.
Is there an elegant way to do this in F#?
I remember seeing a snippet where a collection function was used to aggregate data until a certain size but it looked very inefficient as it was making a collection of line1, line1+line2, line1+line2+line3... and then picking the one with the right size.
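One possible shape for the two-step plan, as a sketch only: `normalizeFences` and `chunk` are hypothetical helpers, a fence is assumed to be a line consisting only of ``` (inline ``` pairs are left untouched), and size is measured by a caller-supplied function. A single left fold keeps a running size, so it never materializes the line1, line1+line2, ... prefixes:

```fsharp
/// Step 1 (hypothetical helper): a line that is exactly ``` toggles "inside a block".
/// The fence line itself is dropped and every line inside the block is wrapped on
/// its own, so any line boundary becomes a safe place to split a message.
let normalizeFences (lines: string list) : string list =
    let rec go inBlock (acc: string list) (rest: string list) =
        match rest with
        | [] -> List.rev acc
        | l :: tail when l.Trim() = "```" -> go (not inBlock) acc tail
        | l :: tail when inBlock -> go inBlock (("```" + l + "```") :: acc) tail
        | l :: tail -> go inBlock (l :: acc) tail
    go false [] lines

/// Step 2 (hypothetical helper): greedily pack lines, joined with "\n", into chunks
/// whose measured size stays within maxSize. A line larger than maxSize gets its own chunk.
let chunk (measure: string -> int) (maxSize: int) (lines: string list) : string list =
    let flush current = String.concat "\n" (List.rev current)
    let folder (chunks, current, size) line =
        // +1 accounts for the "\n" separator when the chunk is not empty
        let extra = measure line + (if List.isEmpty current then 0 else 1)
        if List.isEmpty current || size + extra <= maxSize then
            (chunks, line :: current, size + extra)
        else
            (flush current :: chunks, [ line ], measure line)
    let chunks, current, _ = List.fold folder ([], [], 0) lines
    List.rev (if List.isEmpty current then chunks else flush current :: chunks)
```

With `chunk String.length 5 ["ab"; "cd"; "efghij"; "k"]` this yields `["ab\ncd"; "efghij"; "k"]`. Whether the limit should be counted in characters or bytes is worth checking against the Bot API documentation; for a byte count, a measure such as `(fun (s: string) -> System.Text.Encoding.UTF8.GetByteCount s)` could be passed instead.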

F# Observable - Converting an event stream to a list

I was writing a unit test that verified the events fired from a class. I followed the standard "IEvent<_>, Publish, Trigger inside an F# type" pattern.
Can you recommend the "functional" way to achieve that?
Here are the options I can think of:
Convert the event stream into a list of strings and compare that list with an expected list.
(Not sure if there is a way) Convert the expected list into an event stream and compare the two streams.
A pointer to a code snippet would greatly help.
Thanks!
Edit 1: Answering Mark's question:
This is what I have as of now:
```fsharp
let expectedFiles = [ @"c:\a\1"
                      @"c:\a\2" ]

[<Fact>]
let ``Can find files from a folder`` () =
    let ad = new FileSearchAdapter()
    let foundFiles = ref []
    ad.FileFound
    |> Observable.scan (fun acc e -> e::acc) []
    |> Observable.add (fun acc -> foundFiles := acc)
    ad.FindFiles @"c:\a"
    Assert.Equal<string list>(expectedFiles, !foundFiles)
```
The issues I see here are (a) the use of a reference cell and (b) that Observable.add essentially overwrites the reference on every event.
Is there a functional way to achieve the same?
Events are all about side effects, so there is a limit to how much sense it makes to try to be all Functional about it.
(Yes: you can build Reactive systems where immutable event data flows through a system, being filtered and aggregated along the way, but at the source, the fact that an event is raised is a side effect.)
Given that a unit test tests a unit in isolation from its dependencies, testing that events are correctly raised exercises the isolated, 'un-functional' part of a system, so I don't think you have to do it in a Functional way.
Here's a simpler alternative:
```fsharp
open System.Collections.Generic

let ``Can find files from a folder`` () =
    let ad = new FileSearchAdapter()
    let foundFiles = List<string>()
    ad.FileFound.Add(fun (sender, args) -> foundFiles.Add args)
    ad.FindFiles @"c:\a"
    let expectedFiles = [ @"c:\a\1"; @"c:\a\2" ]
    expectedFiles = (foundFiles |> Seq.toList)
```
(This test function is just a normal function that returns bool, but I'm sure you know how to convert it to a unit test.)
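To make this approach concrete without a file system, here is a self-contained sketch. The real FileSearchAdapter isn't shown, so `FakeFileSearchAdapter` is a hypothetical stand-in that raises FileFound (an IEvent<string>) for a fixed list of paths. With a plain IEvent<string>, the handler receives the path directly rather than a (sender, args) pair:

```fsharp
/// Hypothetical stand-in for FileSearchAdapter: raises FileFound for each known
/// path under the requested folder instead of touching the file system.
type FakeFileSearchAdapter(knownFiles: string list) =
    let fileFound = Event<string>()
    member this.FileFound = fileFound.Publish
    member this.FindFiles(folder: string) =
        for f in knownFiles do
            if f.StartsWith(folder + @"\") then fileFound.Trigger f

/// Returns true when the raised events match the expected paths, in order.
let canFindFilesFromAFolder () =
    let ad = FakeFileSearchAdapter([ @"c:\a\1"; @"c:\b\3"; @"c:\a\2" ])
    let foundFiles = ResizeArray<string>()
    ad.FileFound.Add(fun path -> foundFiles.Add path)
    ad.FindFiles @"c:\a"
    List.ofSeq foundFiles = [ @"c:\a\1"; @"c:\a\2" ]
```

Note the verbatim `@"..."` strings: in a regular F# string, `\a` is an escape sequence (the alert character), so `"c:\a\1"` would not be the intended path.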

Why does order matter in this usage of Observable.merge?

I am trying to write a basic "game loop" using Observables in F#. Basically I conceptualize the fundamental input stream of events as two streams merged together: the key presses of the user (game uses just keyboard to begin with), and the regular ticks of the game (say, 60 times per second).
My problem seems to stem from the fact that one of the observed sequences, i.e. the ticks, is also the loop that calls DispatchEvents() on the Window allowing it to process its inputs and fire key pressed events, so one stream of events is actually driven by the other, if that makes sense. Here is the code:
```fsharp
open System
open System.IO
open SFML.Window
open SFML.Graphics
open System.Reactive
open System.Reactive.Linq
open System.Diagnostics

type InputEvent =
    | Tick of TimeSpan
    | KeyPressed of Keyboard.Key

[<EntryPoint; STAThread>]
let main _ =
    use window = new RenderWindow(VideoMode(640u, 480u), "GameWindow")
    window.SetVerticalSyncEnabled(true)
    let displayStream =
        Observable.Create(
            fun (observer: IObserver<TimeSpan>) ->
                let sw = Stopwatch.StartNew()
                while (window.IsOpen()) do
                    window.DispatchEvents() // this calls the KeyPressed event synchronously
                    window.Display() // this blocks until the next vertical sync
                    window.Clear()
                    observer.OnNext sw.Elapsed
                    sw.Restart()
                observer.OnCompleted()
                { new IDisposable with member this.Dispose() = () })
    let onDisplay elapsedTime =
        // draw game: code elided
        ()
    let inputEvents =
        Observable.merge
            (window.KeyPressed |> Observable.map (fun key -> KeyPressed(key.Code)))
            (displayStream |> Observable.map (fun t -> Tick(t)))
    use subscription =
        inputEvents.Subscribe(fun inputEvent ->
            match inputEvent with
            | Tick(t) -> onDisplay(t)
            | KeyPressed(key) -> printfn "%A" key)
    0
```
This works, however, if I change the order of parameters in Observable.merge:
```fsharp
let inputEvents =
    Observable.merge
        (displayStream |> Observable.map (fun t -> Tick(t)))
        (window.KeyPressed |> Observable.map (fun key -> KeyPressed(key.Code)))
```
Then the game renders (onDisplay is called), but I don't see KeyPressed events printed to the console. Why is that?
(If you're wondering what SFML is: it's the Simple and Fast Multimedia Library.)
In pseudo-code, what merge does is:
```
firstStream.Subscribe(...);
secondStream.Subscribe(...);
```
The subscribe function you pass to Observable.Create is synchronous and never yields control back to the caller. This means that merge itself is blocked from subscribing to any streams that come after displayStream. When you reorder the streams so that displayStream is first, you prevent merge from ever subscribing to your KeyPressed stream. This is why you are seeing this behavior.
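The effect can be reproduced without SFML or Rx. In this sketch (all names are hypothetical), `ticks` is an IObservable whose Subscribe runs its whole loop before returning and raises a key press on each iteration, standing in for DispatchEvents(). FSharp.Core's Observable.merge subscribes to its first argument and then to its second, matching the pseudo-code:

```fsharp
open System

/// Returns what a merged subscriber receives, depending on merge argument order.
let runMerged keysFirst =
    let keyPressed = Event<string>()       // stand-in for window.KeyPressed
    let received = ResizeArray<string>()
    // Badly behaved source: Subscribe runs the whole loop before returning,
    // raising a key press on each iteration like window.DispatchEvents().
    let ticks =
        { new IObservable<string> with
            member this.Subscribe(observer) =
                for i in 1 .. 3 do
                    keyPressed.Trigger(sprintf "key%d" i)
                    observer.OnNext(sprintf "tick%d" i)
                { new IDisposable with member this.Dispose() = () } }
    let keys = keyPressed.Publish :> IObservable<string>
    let merged =
        if keysFirst then Observable.merge keys ticks
        else Observable.merge ticks keys
    use subscription = merged |> Observable.subscribe (fun x -> received.Add x)
    List.ofSeq received

// keys first:  ["key1"; "tick1"; "key2"; "tick2"; "key3"; "tick3"]
// ticks first: ["tick1"; "tick2"; "tick3"], because keys are subscribed too late
printfn "%A" (runMerged true)
printfn "%A" (runMerged false)
```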
In some respects, your displayStream is behaving badly. Subscribe methods should not block.
So, either make sure displayStream is the last item in your list, or do some refactoring of your code. You could just use a Subject for displayStream. Then subscribe to everything and finally start the "display loop", where you execute the loop that is currently in your displayStream definition and each time through the loop, just call OnNext on the subject.
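A sketch of the suggested refactoring, with one substitution: instead of an Rx Subject it uses an F# Event<_> as the hot tick source (a Subject would be driven the same way, with OnNext in place of Trigger). Everything is subscribed first, and only then does the loop run; both events and the loop body are hypothetical stand-ins for the SFML window:

```fsharp
let ticks = Event<int>()          // plays the role of the Subject (frame number as payload)
let keyPressed = Event<string>()  // hypothetical stand-in for window.KeyPressed
let received = ResizeArray<string>()

// The merge order no longer matters: subscribing to hot events returns immediately.
let inputEvents =
    Observable.merge
        (ticks.Publish |> Observable.map (fun frame -> sprintf "tick %d" frame))
        (keyPressed.Publish |> Observable.map (fun key -> "key " + key))

let subscription = inputEvents |> Observable.subscribe (fun e -> received.Add e)

// The "display loop" now runs after everything is subscribed.
for frame in 1 .. 2 do
    keyPressed.Trigger(sprintf "K%d" frame)  // what window.DispatchEvents() would raise
    ticks.Trigger frame                      // what observer.OnNext used to do
subscription.Dispose()
// received now holds: "key K1", "tick 1", "key K2", "tick 2"
```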

How does F#'s async really work?

I am trying to learn how async and let! work in F#.
All the docs I've read seem confusing.
What's the point of running an async block with Async.RunSynchronously? Is this async or sync? It looks like a contradiction.
The documentation says that Async.StartImmediate runs in the current thread. If it runs in the same thread, it doesn't look very asynchronous to me... Or maybe asyncs are more like coroutines than threads. If so, when do they yield back and forth?
Quoting the MS docs:
"The line of code that uses let! starts the computation, and then the thread is suspended until the result is available, at which point execution continues."
If the thread waits for the result, why should I use it? It looks like a plain old function call.
And what does Async.Parallel do? It receives a sequence of Async<'T>. Why not a sequence of plain functions to be executed in parallel?
I think I'm missing something very basic here. I guess once I understand that, all the documentation and samples will start making sense.
A few things.
First, the difference between
```fsharp
let resp = req.GetResponse()
```
and
```fsharp
let! resp = req.AsyncGetResponse()
```
is that for the probably hundreds of milliseconds (an eternity to the CPU) where the web request is 'at sea', the former is using one thread (blocked on I/O), whereas the latter is using zero threads. This is the most common 'win' for async: you can write non-blocking I/O that doesn't waste any threads waiting for hard disks to spin around or network requests to return. (Unlike most other languages, you aren't forced to do inversion of control and factor things into callbacks.)
Second, Async.StartImmediate will start an async on the current thread. A typical use is with a GUI: you have some GUI app that wants to e.g. update the UI (e.g. to say "loading..." somewhere), then do some background work (load something off disk or whatever), and then return to the foreground UI thread to update the UI when completed ("done!"). StartImmediate enables an async to update the UI at the start of the operation and to capture the SynchronizationContext so that at the end of the operation it can return to the GUI thread to do a final update of the UI.
Next, Async.RunSynchronously is rarely used (one thesis is that you call it at most once in any app). In the limit, if you wrote your entire program async, then in the "main" method you would call RunSynchronously to run the program and wait for the result (e.g. to print out the result in a console app). This does block a thread, so it is typically only useful at the very 'top' of the async portion of your program, on the boundary with synchronous code. (The more advanced user may prefer StartWithContinuations; RunSynchronously is kinda the "easy hack" to get from async back to sync.)
Finally, Async.Parallel does fork-join parallelism. You could write a similar function that just takes functions rather than asyncs (like stuff in the TPL), but the typical sweet spot in F# is parallel I/O-bound computations, which are already async objects, so this is the most commonly useful signature. (For CPU-bound parallelism, you could use asyncs, but you could also use TPL just as well.)
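A small sketch of the fork-join point, with Async.Sleep standing in for real I/O (the `fakeDownload` name is hypothetical). A hundred 200 ms waits should finish in roughly the time of one rather than the 20 seconds a sequential loop would take, because a waiting `do!` does not hold a thread, and RunSynchronously is used once, at the top:

```fsharp
open System.Diagnostics

// Hypothetical I/O-bound job: Async.Sleep stands in for a network request.
let fakeDownload (id: int) =
    async {
        do! Async.Sleep 200   // the thread is released while "waiting"
        return id * 10
    }

let sw = Stopwatch.StartNew()
let results =
    [ 1 .. 100 ]
    |> List.map fakeDownload
    |> Async.Parallel          // fork-join: a single Async<int[]> for all jobs
    |> Async.RunSynchronously  // block once, at the top level
sw.Stop()
printfn "%d results in %d ms" results.Length sw.ElapsedMilliseconds
```

Async.Parallel returns the results in the order of the input sequence, so `results.[0]` is 10 and `results.[99]` is 1000.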
The point of async is to reduce the number of threads in use.
Consider the following example:
```fsharp
open System
open System.IO
open System.Net
// PSeq comes from a parallel-sequence library (e.g. the FSharp.Collections.ParallelSeq package)

let fetchUrlSync url =
    let req = WebRequest.Create(Uri url)
    use resp = req.GetResponse()
    use stream = resp.GetResponseStream()
    use reader = new StreamReader(stream)
    let contents = reader.ReadToEnd()
    contents

let sites = ["http://www.bing.com"
             "http://www.google.com"
             "http://www.yahoo.com"
             "http://www.search.com"]

// execute the fetchUrlSync function in parallel
let pagesSync = sites |> PSeq.map fetchUrlSync |> PSeq.toList
```
The above code does what you want: define a function and execute it in parallel. So why do we need async here?
Now consider something bigger: what if the number of sites is not 4 but, say, 10,000? Running all of those blocking calls in parallel would need 10,000 threads, which is a huge resource cost.
With async, by contrast:
```fsharp
let fetchUrlAsync url =
    async { let req = WebRequest.Create(Uri url)
            use! resp = req.AsyncGetResponse()
            use stream = resp.GetResponseStream()
            use reader = new StreamReader(stream)
            let contents = reader.ReadToEnd()
            return contents }

let pagesAsync = sites |> Seq.map fetchUrlAsync |> Async.Parallel |> Async.RunSynchronously
```
When execution reaches use! resp = req.AsyncGetResponse(), the current thread is given up and can be used for other purposes. If the response comes back after 1 second, the thread can spend that second processing other work; otherwise, it would be blocked, wasting a thread for 1 second.
So even if you are downloading 10,000 web pages in parallel asynchronously, the number of threads stays small.
I think you are not a .NET/C# programmer. Async tutorials usually assume that one knows .NET and how to program asynchronous I/O in C# (which takes a lot of code). The magic of the async construct in F# is not about parallelism, because simple parallelism can be achieved with other constructs, e.g. Parallel.For in the .NET Parallel Extensions. Asynchronous I/O, however, is more complex: as you saw, the thread gives up its execution, and when the I/O finishes, the waiting computation has to be resumed. This is what the async magic is for: in a few lines of concise code, you can express very complex control flow.
Many good answers here, but I thought I'd take a different angle on the question: how does F#'s async really work?
Unlike async/await in C#, F# developers can actually implement their own version of Async. This can be a great way to learn how Async works.
(For the interested the source code to Async can be found here: https://github.com/Microsoft/visualfsharp/blob/fsharp4/src/fsharp/FSharp.Core/control.fs)
As our fundamental building block for our DIY workflows we define:
```fsharp
type DIY<'T> = ('T->unit)->unit
```
This is a function that accepts another function (called the continuation) that is called when the result of type 'T is ready. This allows DIY<'T> to start a background task without blocking the calling thread. When the result is ready the continuation is called allowing the computation to continue.
The F# Async building block is a bit more complicated as it also includes cancellation and exception continuations but essentially this is it.
In order to support the F# workflow syntax we need to define a computation expression (https://msdn.microsoft.com/en-us/library/dd233182.aspx). While this is a rather advanced F# feature it's also one of the most amazing features of F#. The two most important operations to define are return & bind which are used by F# to combine our DIY<_> building blocks into aggregated DIY<_> building blocks.
adaptTask is used to adapt a Task<'T> into a DIY<'T>.
startChild allows starting several simultaneous DIY<'T> workflows; note that it doesn't start new threads to do so but reuses the calling thread.
Without any further ado here's the sample program:
```fsharp
open System
open System.Diagnostics
open System.Threading
open System.Threading.Tasks

// Our Do It Yourself Async workflow is a function accepting a continuation ('T->unit).
// The continuation is called when the result of the workflow is ready.
// This may happen immediately or after a while; the important thing is that
// we don't block the calling thread, which may then continue executing useful code.
type DIY<'T> = ('T->unit)->unit

// In order to support let!, do! and so on we implement a computation expression.
// The two most important operations are returnValue/bind, but delay is also generally
// good to implement.
module DIY =
    // returnValue is called when devs use return x in a workflow.
    // returnValue passes v immediately to the continuation.
    let returnValue (v : 'T) : DIY<'T> =
        fun a ->
            a v

    // bind is called when devs use let!/do! x in a workflow.
    // bind binds two DIY workflows together.
    let bind (t : DIY<'T>) (fu : 'T->DIY<'U>) : DIY<'U> =
        fun a ->
            let aa tv =
                let u = fu tv
                u a
            t aa

    let delay (ft : unit->DIY<'T>) : DIY<'T> =
        fun a ->
            let t = ft ()
            t a

    // Starts a DIY workflow as a subflow.
    // The workflow is executed right away, but it may complete later.
    // startChild itself should always complete immediately, so in order to
    // have something to return, it returns a DIY workflow (child).
    // postProcess checks whether the child has computed a value
    // (rv has some value) and whether a computation is ready
    // to receive the value (rca has some value).
    // If both are true, it invokes ca with v.
    let startChild (t : DIY<'T>) : DIY<DIY<'T>> =
        fun a ->
            let l = obj()
            let rv = ref None
            let rca = ref None
            let postProcess () =
                match !rv, !rca with
                | Some v, Some ca ->
                    ca v
                    rv := None
                    rca := None
                | _ , _ -> ()
            let receiver v =
                lock l <| fun () ->
                    rv := Some v
                    postProcess ()
            t receiver
            let child : DIY<'T> =
                fun ca ->
                    lock l <| fun () ->
                        rca := Some ca
                        postProcess ()
            a child

    let runWithContinuation (t : DIY<'T>) (f : 'T -> unit) : unit =
        t f

    // Adapts a task as a DIY workflow
    let adaptTask (t : Task<'T>) : DIY<'T> =
        fun a ->
            let action = Action<Task<'T>> (fun t -> a t.Result)
            ignore <| t.ContinueWith action

    // Because C# generics don't allow Task<void>, we need
    // a special version for the non-generic (unit) Task.
    let adaptUnitTask (t : Task) : DIY<unit> =
        fun a ->
            let action = Action<Task> (fun t -> a ())
            ignore <| t.ContinueWith action

    type DIYBuilder() =
        member x.Return(v) = returnValue v
        member x.Bind(t,fu) = bind t fu
        member x.Delay(ft) = delay ft

let diy = DIY.DIYBuilder()

open DIY

[<EntryPoint>]
let main argv =
    let delay (ms : int) = adaptUnitTask <| Task.Delay ms
    let delayedValue ms v =
        diy {
            do! delay ms
            return v
        }
    let complete =
        diy {
            let sw = Stopwatch ()
            sw.Start ()
            // Since we are executing these tasks concurrently
            // the time this takes should be roughly 700ms
            let! cd1 = startChild <| delayedValue 100 1
            let! cd2 = startChild <| delayedValue 300 2
            let! cd3 = startChild <| delayedValue 700 3
            let! d1 = cd1
            let! d2 = cd2
            let! d3 = cd3
            sw.Stop ()
            return sw.ElapsedMilliseconds,d1,d2,d3
        }
    printfn "Starting workflow"
    runWithContinuation complete (printfn "Result is: %A")
    printfn "Waiting for key"
    ignore <| Console.ReadKey ()
    0
```
The output of the program should be something like this:
```
Starting workflow
Waiting for key
Result is: (706L, 1, 2, 3)
```
When running the program, note that "Waiting for key" is printed immediately, as the console thread is not blocked by starting the workflow. After about 700 ms, the result is printed.
I hope this was interesting to some F# devs
Lots of great detail in the other answers, but as a beginner I got tripped up by the differences between C# and F#.
F# async blocks are a recipe for how the code should run, not actually an instruction to run it yet.
You build up your recipe, maybe combining with other recipes (e.g. Async.Parallel). Only then do you ask the system to run it, and you can do that on the current thread (e.g. Async.StartImmediate) or on a new task, or various other ways.
So it's a decoupling of what you want to do from who should do it.
The C# model is often called 'Hot Tasks' because the tasks are started for you as part of their definition, vs. the F# 'Cold Task' models.
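The 'cold' versus 'hot' distinction can be observed directly. In this sketch the counters are hypothetical instrumentation: defining the async runs nothing, and each RunSynchronously runs the recipe again, while Task.Run starts its work as soon as the task is created and runs it only once:

```fsharp
open System
open System.Threading.Tasks

// Cold: an async { } value is only a recipe.
let asyncRuns = ref 0
let recipe =
    async {
        asyncRuns.Value <- asyncRuns.Value + 1
        return 42
    }
let runsAfterDefinition = asyncRuns.Value   // 0: nothing has run yet
let first = Async.RunSynchronously recipe   // runs the recipe (asyncRuns = 1)
let second = Async.RunSynchronously recipe  // runs it again (asyncRuns = 2)

// Hot: Task.Run starts the work immediately; reading Result twice does not rerun it.
let taskRuns = ref 0
let work () =
    taskRuns.Value <- taskRuns.Value + 1
    42
let hot = Task.Run(Func<int>(work))
let total = hot.Result + hot.Result         // the work ran once (taskRuns = 1)
```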
The idea behind let! and Async.RunSynchronously is that sometimes you have an asynchronous activity that you need the results of before you can continue. For example, the "download a web page" function may not have a synchronous equivalent, so you need some way to run it synchronously. Or if you have an Async.Parallel, you may have hundreds of tasks all happening concurrently, but you want them all to complete before continuing.
As far as I can tell, the reason you would use Async.StartImmediate is that you have some computation that you need to run on the current thread (perhaps a UI thread) without blocking it. Does it use coroutines? I guess you could call it that, although there isn't a general coroutine mechanism in .Net.
So why does Async.Parallel require a sequence of Async<'T>? Probably because it's a way of composing Async<'T> objects. You could easily create your own abstraction that works with just plain functions (or a combination of plain functions and Asyncs), but it would just be a convenience function.
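Such a convenience function could look like the sketch below. `parallelFuncs` is a hypothetical name; it simply lifts each plain function into an Async and reuses Async.Parallel:

```fsharp
/// Hypothetical convenience wrapper: runs plain functions in parallel by
/// lifting each one into an Async and reusing Async.Parallel.
let parallelFuncs (fs: (unit -> 'T) list) : 'T[] =
    fs
    |> List.map (fun f -> async { return f () })
    |> Async.Parallel
    |> Async.RunSynchronously

let squares = parallelFuncs [ for i in 1 .. 5 -> fun () -> i * i ]
// squares = [|1; 4; 9; 16; 25|]
```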
In an async block you can have some synchronous and some async operations. For example, you may have a web site that shows the status of the user in several ways: whether they have bills due shortly, birthdays coming up, and homework due. None of these are in the same database, so your application will make three separate calls. You may want to make the calls in parallel, so that when the slowest one is done you can put the results together and display them; the overall time is then determined by the slowest call. You don't care about the order in which the results come back; you just want to know when all three have been received.
To finish the example, you may then want to synchronously do the work of building the UI to show this information. So in the end, the parts where order doesn't matter are done in parallel, and the parts where order matters are done synchronously.
You could do this with three threads, but then you have to keep track of them and resume the original thread when the third one finishes. That is more work; it is easier to let the .NET framework take care of it.
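The three-source scenario could be sketched with Async.StartChild, which (unlike Async.Parallel) lets the three calls return different types. The data-access functions are hypothetical, with Async.Sleep standing in for the database calls:

```fsharp
// Hypothetical data sources; Async.Sleep stands in for three separate database calls.
let getBills (user: string) = async { do! Async.Sleep 300; return [ "electricity" ] }
let getBirthdays (user: string) = async { do! Async.Sleep 100; return 2 }
let getHomework (user: string) = async { do! Async.Sleep 200; return Some "essay" }

let statusPage (user: string) =
    async {
        // Start all three calls concurrently; completion order does not matter.
        let! billsChild = Async.StartChild(getBills user)
        let! birthdaysChild = Async.StartChild(getBirthdays user)
        let! homeworkChild = Async.StartChild(getHomework user)
        // Wait for all three; the total time is roughly that of the slowest call.
        let! bills = billsChild
        let! birthdays = birthdaysChild
        let! homework = homeworkChild
        // Ordered, synchronous part: build the "UI" text.
        return sprintf "%s: %d bill(s), %d birthday(s), homework: %s" user bills.Length birthdays (defaultArg homework "none")
    }

let page = statusPage "alice" |> Async.RunSynchronously
// page = "alice: 1 bill(s), 2 birthday(s), homework: essay"
```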

Can the lock function be used to implement thread-safe enumeration?

I'm working on a thread-safe collection that uses Dictionary as a backing store.
In C# you can do the following:
```csharp
private IEnumerable<KeyValuePair<K, V>> Enumerate() {
    if (_synchronize) {
        lock (_locker) {
            foreach (var entry in _dict)
                yield return entry;
        }
    } else {
        foreach (var entry in _dict)
            yield return entry;
    }
}
```
The only way I've found to do this in F# is using Monitor, e.g.:
```fsharp
let enumerate() =
    if synchronize then
        seq {
            System.Threading.Monitor.Enter(locker)
            try for entry in dict -> entry
            finally System.Threading.Monitor.Exit(locker)
        }
    else seq { for entry in dict -> entry }
```
Can this be done using the lock function? Or, is there a better way to do this in general? I don't think returning a copy of the collection for iteration will work because I need absolute synchronization.
I don't think that you'll be able to do the same thing with the lock function, since you would be trying to yield from within it. Having said that, this looks like a dangerous approach in either language, since it means that the lock can be held for an arbitrary amount of time (e.g. if one thread calls Enumerate() but doesn't enumerate all the way through the resulting IEnumerable<_>, then the lock will continue to be held).
It may make more sense to invert the logic, providing an iter method along the lines of:
```fsharp
let iter f =
    if synchronize then
        lock locker (fun () -> Seq.iter f dict)
    else
        Seq.iter f dict
```
This brings the iteration back under your control, ensuring that the sequence is fully iterated (assuming that f doesn't block, which seems like a necessary assumption in any case) and that the lock is released immediately thereafter.
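A self-contained sketch of this iter-based design, wrapped in a hypothetical SyncDict type (the question's real collection isn't shown). Monitor.IsEntered is used only to check that the lock is held during the callback and released afterwards:

```fsharp
open System.Collections.Generic
open System.Threading

/// Hypothetical thread-safe wrapper: callers pass a function instead of
/// receiving a lazily evaluated sequence, so the lock is always released.
type SyncDict<'K, 'V when 'K : equality>(synchronize: bool) =
    let dict = Dictionary<'K, 'V>()
    let locker = obj ()
    member this.Add(key: 'K, value: 'V) = lock locker (fun () -> dict.Add(key, value))
    member this.Iter(f: KeyValuePair<'K, 'V> -> unit) =
        if synchronize then lock locker (fun () -> Seq.iter f dict)
        else Seq.iter f dict
    /// True when the calling thread currently holds the internal lock.
    member this.IsLockHeld = Monitor.IsEntered locker

let d = SyncDict<string, int>(true)
d.Add("a", 1)
d.Add("b", 2)
let total = ref 0
let heldDuringIter = ref false
d.Iter(fun kv ->
    total.Value <- total.Value + kv.Value
    heldDuringIter.Value <- d.IsLockHeld)
let heldAfterIter = d.IsLockHeld   // false: lock released once Iter returns
```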
EDIT
Here's an example of code that could hold the lock forever.
```fsharp
let cached = enumerate() |> Seq.cache
let firstFive = Seq.take 5 cached |> Seq.toList
```
We've taken the lock in order to start enumerating through the first 5 items. However, we haven't continued through the rest of the sequence, so the lock won't be released (maybe we would enumerate the rest of the way later based on user feedback or something, in which case the lock would finally be released).
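This scenario can be checked with a small experiment. The sketch reuses the question's Monitor-based sequence (synchronized branch only) over a hypothetical dictionary, and uses Monitor.IsEntered to observe the lock from the same thread:

```fsharp
open System.Collections.Generic
open System.Threading

let locker = obj ()
let dict = Dictionary<int, string>()
for i in 1 .. 10 do dict.Add(i, string i)

// The question's Monitor-based sequence (synchronized branch only).
let enumerate () =
    seq {
        Monitor.Enter locker
        try
            for entry in dict do
                yield entry
        finally
            Monitor.Exit locker
    }

let cached = enumerate () |> Seq.cache
let firstFive = Seq.take 5 cached |> Seq.toList
let heldAfterFive = Monitor.IsEntered locker  // true: the underlying enumerator is still open
let all = Seq.toList cached                   // finishing the enumeration runs the finally...
let heldAfterAll = Monitor.IsEntered locker   // false: ...and releases the lock
```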
In most cases, correctly written code will ensure that it disposes of the original enumerator, but there's no way to guarantee that in general. Therefore, your sequence expressions should be designed to be robust to only being enumerated part way. If you intend to require your callers to enumerate the collection all at once, then forcing them to pass you the function to apply to each element is better than returning a sequence which they can enumerate as they please.
I agree with kvb that the code is suspicious and that you probably don't want to hold the lock. However, there is a way to write the locking in a more comfortable way using the use keyword. It's worth mentioning it, because it may be useful in other situations.
You can write a function that starts holding a lock and returns IDisposable, which releases the lock when it is disposed:
```fsharp
let makeLock locker =
    System.Threading.Monitor.Enter(locker)
    { new System.IDisposable with
        member x.Dispose() =
            System.Threading.Monitor.Exit(locker) }
```
Then you can write for example:
```fsharp
let enumerate() = seq {
    if synchronize then
        use l0 = makeLock locker
        for entry in dict do
            yield entry
    else
        for entry in dict do
            yield entry }
```
This essentially implements a C#-like lock using the use keyword, which has similar properties (it lets you do something when leaving the scope). So this is much closer to the original C# version of the code.
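A quick check of how this behaves with a well-behaved consumer: Seq.take disposes its source enumerator when it is done, which runs Dispose on the `use` binding and releases the lock. The sketch uses makeLock over a hypothetical list of items:

```fsharp
open System.Threading

let makeLock (locker: obj) =
    Monitor.Enter locker
    { new System.IDisposable with
        member x.Dispose() = Monitor.Exit locker }

let locker = obj ()
let items = [ 1 .. 10 ]   // hypothetical data standing in for the dictionary

let enumerate () =
    seq {
        use l0 = makeLock locker   // taken on the first MoveNext, released on Dispose
        for x in items do
            yield x
    }

// Seq.take stops after three items and disposes the enumerator, releasing the lock.
let firstThree = enumerate () |> Seq.take 3 |> Seq.toList
let heldAfterTake = Monitor.IsEntered locker   // false
```

A consumer that never disposes the enumerator (such as the Seq.cache example) would still keep the lock, so this improves ergonomics rather than removing the risk kvb describes.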
