Suave - Control when responses are 'cached' or recalculated - f#

I want to understand how to control when responses are 'cached' versus when they are 'recalculated'.
As an example:
[<EntryPoint>]
let main [| port |] =
    let config =
        { defaultConfig with
            bindings = [ HttpBinding.mk HTTP IPAddress.Loopback (uint16 port) ]
            listenTimeout = TimeSpan.FromMilliseconds 3000.
        }
    let appDemo:WebPart =
        DateTime.Now.ToString()
        |> sprintf "Server timestamp: %s"
        |> Successful.OK
    startWebServer config appDemo
    0 // exit code; an entry point must return int
If I run the above webserver and hit it several times then each time I get the same timestamp back. Which I guess makes sense; appDemo is just an expression which is calculated first time around and never again, right?
In this circumstance, I might want appDemo to be 'recalculated' for every request. How do I do that? I can't seem to find an example in the docs.

Try this - not sure how high it scores on "idiomatic Suave" scale though:
let appDemo:WebPart =
    request (fun req ->
        DateTime.Now.ToString()
        |> sprintf "Server timestamp: %s"
        |> Successful.OK)
You're right that you're seeing the same value because it's captured at the time appDemo is evaluated. That's a property of how F# works, however, and has nothing to do with Suave caching it.
Note that the WebPart type is an alias for the function type HttpContext -> Async<HttpContext option>, so it inherently lends itself to being recalculated on each request rather than being calculated once.
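To see the difference outside Suave, here is a minimal sketch. The Handler type only mimics the shape of WebPart, and stamp, ok, eagerDemo and perRequestDemo are made-up names for illustration, not Suave APIs. A counter stands in for DateTime.Now so the effect is easy to observe.

```fsharp
// Hypothetical stand-in for the shape of Suave's WebPart (not the real type).
type Handler = string -> Async<string option>

let evaluations = ref 0

// Stand-in for DateTime.Now.ToString(): bumps a counter on every evaluation.
let stamp () =
    evaluations.Value <- evaluations.Value + 1
    sprintf "stamp #%d" evaluations.Value

// Plays the role of Successful.OK: takes a body, returns a handler.
let ok (body: string) : Handler =
    fun _ctx -> async { return Some body }

// The body is computed once, when this value is bound.
let eagerDemo : Handler = stamp () |> ok

// The body is computed inside the lambda, so once per invocation.
let perRequestDemo : Handler = fun ctx -> (stamp () |> ok) ctx

let run (h: Handler) = h "request" |> Async.RunSynchronously

printfn "%A" (run eagerDemo)      // Some "stamp #1"
printfn "%A" (run eagerDemo)      // Some "stamp #1" again
printfn "%A" (run perRequestDemo) // Some "stamp #2"
printfn "%A" (run perRequestDemo) // Some "stamp #3"
```

Wrapping the body in a function of the context is what the request combinator in the answer does for you.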

Related

F# CSV TypeProvider less robust in console application

I am trying to experiment with live data from the Coronavirus pandemic (unfortunately and good luck to all of us).
I have developed a small script and I am transitioning into a console application: it uses CSV type providers.
I have the following issue. Suppose we want to filter the Italian spread by province; we can use this code in a .fsx file:
open FSharp.Data
let provinceData = CsvProvider< @"https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-province/dpc-covid19-ita-province.csv" , IgnoreErrors = true>.GetSample()
let filterDataByProvince province =
    provinceData.Rows
    |> Seq.filter (fun x -> x.Sigla_provincia = province)
Since sequences are lazy, suppose I force the data for the province of Rome to be loaded into memory by adding:
let romeProvince = filterDataByProvince "RM" |> Seq.toArray
This works fine, run by FSI, locally.
Now, if I transition this code into a console application using a .fs file; I declare exactly the same functions and using exactly the same type provider loader; but instead of using the last line to gather the data, I put it into a main function:
[<EntryPoint>]
let main _ =
    let romeProvince = filterDataByProvince "RM" |> Seq.toArray
    Console.Read() |> ignore
    0
This results in the following runtime exception:
System.Exception
  HResult=0x80131500
  Message=totale_casi is missing
  Source=FSharp.Data
  StackTrace:
   at <StartupCode$FSharp-Data>.$TextRuntime.GetNonOptionalValue@139-4.Invoke(String message)
   at CoronaSchiatta.Evoluzione.provinceData@10.Invoke(Object parent, String[] row) in C:\Users\glddm\source\repos\CoronaSchiatta\CoronaSchiatta\CoronaEvolution.fs:line 10
   at FSharp.Data.Runtime.CsvHelpers.parseIntoTypedRows@174.GenerateNext(IEnumerable`1& next)
Can you explain that?
Some rows have an odd format, possibly, but the FSI session is robust to those, whilst the console version is fragile; why? How can I fix that?
I am using VS2019 Community Edition, targeting .NET Framework 4.7.2, F# runtime: 4.7.0.0;
as FSI, I am using the following: FSI Microsoft (R) F# Interactive version 10.7.0.0 for F# 4.7
PS: Please also be aware that if I use CsvFile, instead of type providers, as in:
let test = @"https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-province/dpc-covid19-ita-province.csv"
           |> CsvFile.Load |> (fun x -> x.Rows ) |> Seq.filter ( fun x-> x.[6 ] = "RM")
           |> Seq.iter ( fun x -> x.[9] |> Console.WriteLine )
Then it works like a charm also in the console application. Of course I would like to use type providers otherwise I have to add type definition, mapping the schema to the columns (and it will be more fragile). The last line was just a quick test.
Fragility
CSV Type Providers can be fragile if you don't have a good schema or sample.
Getting a runtime error almost certainly means that your data doesn't match the schema the provider inferred.
How do you figure it out? One way is to run through your data first:
provinceData.Rows |> Seq.iteri (fun i x -> printfn "Row %d: %A" (i + 1) x)
This runs up to Row 2150. And sure enough, the next row:
2020-03-11 17:00:00,ITA,19,Sicilia,994,In fase di definizione/aggiornamento,,0,0,
You can see the last value (totale_casi) is missing.
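If you would rather locate such rows without going through the provider, a dependency-free scan is one option. This is only a sketch: it assumes a naive comma split is good enough (it ignores quoted fields), tryTotaleCasi is a made-up helper, and the two rows are copied from the data shown in this thread.

```fsharp
open System

// Two rows from the data set: one complete, one with an empty last field.
let rows =
    [ "2020-02-24 18:00:00,ITA,13,Abruzzo,069,Chieti,CH,42.35103167,14.16754574,0"
      "2020-03-11 17:00:00,ITA,19,Sicilia,994,In fase di definizione/aggiornamento,,0,0," ]

// Hypothetical helper: read the last column (totale_casi) as an optional int.
// Assumption: fields never contain quoted commas.
let tryTotaleCasi (line: string) =
    let fields = line.Split(',')
    match Int32.TryParse(fields.[fields.Length - 1]) with
    | true, n -> Some n
    | _ -> None

// 1-based positions of the rows whose totale_casi is missing or not an int.
let badRows =
    rows
    |> List.mapi (fun i line -> i + 1, tryTotaleCasi line)
    |> List.filter (fun (_, value) -> Option.isNone value)
    |> List.map fst

printfn "%A" badRows // [2]
```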
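One of CsvProvider's options is InferRows. This is the number of rows the provider scans in order to build up a schema, and its default value happens to be 1000. The first row with a missing totale_casi comes well after that, so the inferred type does not allow for missing values.
Setting InferRows to 0 makes the provider use all rows for inference:
type COVID = CsvProvider<uri, InferRows = 0>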
A better way to prevent this from happening in the future is to manually define a sample from a sub-set of data:
type COVID = CsvProvider<"sample-dpc-covid19-ita-province.csv">
and sample-dpc-covid19-ita-province.csv is:
data,stato,codice_regione,denominazione_regione,codice_provincia,denominazione_provincia,sigla_provincia,lat,long,totale_casi
2020-02-24 18:00:00,ITA,13,Abruzzo,069,Chieti,CH,42.35103167,14.16754574,0
2020-02-24 18:00:00,ITA,13,Abruzzo,066,L'Aquila,AQ,42.35122196,13.39843823,
2020-02-24 18:00:00,ITA,13,Abruzzo,068,Pescara,PE,42.46458398,14.21364822,0
2020-02-24 18:00:00,ITA,13,Abruzzo,067,Teramo,TE,42.6589177,13.70439971,0
With this the type of totale_casi is now Nullable<int>.
If you don't mind NaN values, you can also use:
CsvProvider<..., AssumeMissingValues = true>
Why does FSI seem more robust?
FSI isn't more robust. This is my best guess:
Your schema source is being regularly updated.
Type Providers cache the schema so that it isn't regenerated every time you compile your code, which would be impractical. When you restart an FSI session, you end up regenerating your Type Provider, but not so with the console application. So FSI may sometimes appear less error-prone simply because it has worked with a newer source.

mutable state in collection

I'm pretty new to functional programming so this might be a question due to misconception, but I can't get my head around this - from an OOP point of view it seems so obvious...
scenario:
Assume you have an actor or micro-service like architecture approach where messages/requests are sent to some components that handle them and reply. Assume now, one of the components stores some of the data from the requests for future requests (e.g. it calculates a value and stores it in a cache so that the next time the same request occurs, no calculation is needed).
The data can be held in memory.
question:
How do you in functional programming in general, and especially in f#, handle such a scenario? I guess a static dictionary is not a functional approach and I don't want to include any external things like data stores if possible.
Or, more precisely:
If an application creates data that will be used later in the processing again, where do we store the data?
example: You have an application that executes some sort of tasks on some initial data. First, you store the initial data (e.g. add it to a dictionary), then you execute the first task that does some processing based on a subset of the data, then you execute the second task that adds additional data and so on until all tasks are done...
Now the basic approach (from my understanding) would be to define the data and use the tasks as some sort of processing-chain that forward the processed data, like initial-data -> task-1 -> task-2 -> ... -> done
but that does not fit an architecture where getting/adding data is done message-based and asynchronous.
approach:
My initial approach was this
type Record = { }
let private dummyStore = new System.Collections.Concurrent.ConcurrentBag<Record>()
let search comparison =
    let matchingRecords = dummyStore |> Seq.where (comparison)
    if matchingRecords |> Seq.isEmpty
    then EmptyFailedRequest
    else Record (matchingRecords |> Seq.head)
let initialize initialData =
    initialData |> Seq.iter (dummyStore.Add)
let add newRecord =
    dummyStore.Add(newRecord)
encapsulated in a module that looks to me like an OOP approach.
After @Gustavo asked me to provide an example and considering his suggestion I've realized that I could do it like this (go one level higher to the place where the functions are actually called):
let handleMessage message store =
    // all the operations from above but now with Seq<Record> -> ... -> Seq<Record>
    store
let agent = MailboxProcessor.Start(fun inbox->
    let rec messageLoop store = async{
        let! msg = inbox.Receive()
        let modifiedStore = handleMessage msg store
        return! messageLoop modifiedStore
        }
    messageLoop Seq.empty
    )
This answers the question well for me, since it removes mutability and shared state altogether. But when just looking at the first approach, I cannot think of any solution without the collection outside the functions.
Please note that this question is in f# to explain the environment, the syntax etc. I don't want a solution that works because f# is multi-paradigm, I would like to get a functional approach for that.
I've read all the questions that I could find on SO so far, but they either prove the theoretical possibility or they use collections for this scenario. If this is a duplicate, please point me in the right direction.
You can use a technique called memoization, which is very common in FP.
It consists precisely of keeping a dictionary with the calculated values.
Here's a sample implementation:
open System
open System.Collections.Concurrent
let getOrAdd (a:ConcurrentDictionary<'A,'B>) (b:_->_) k = a.GetOrAdd(k, b)
let memoize f =
    let dic = new ConcurrentDictionary<_,_>()
    getOrAdd dic f
Note that with memoize you can decorate any function and get a memoized version of it. Here's a sample:
let f x =
    printfn "calculating f (%i)" x
    2 * x
let g = memoize f // g is the memoized version of f
// test
> g 5 ;;
calculating f (5)
val it : int = 10
> g 5 ;;
val it : int = 10
You can see that in the second execution the value was not calculated.
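For the agent-based variant from the question, where the store is threaded through the message loop instead of living in a dictionary, a minimal sketch could look like the following. The Message type and the string-to-int Map are assumptions made for illustration, since the question does not define the message or record types.

```fsharp
// Hypothetical message type: add an entry, or look one up and reply.
type Message =
    | Add of key: string * value: int
    | TryGet of key: string * reply: AsyncReplyChannel<int option>

let agent =
    MailboxProcessor.Start(fun inbox ->
        // The store is an immutable Map passed from one iteration to the next.
        let rec loop (store: Map<string, int>) =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Add (k, v) ->
                    return! loop (Map.add k v store)
                | TryGet (k, reply) ->
                    reply.Reply(Map.tryFind k store)
                    return! loop store
            }
        loop Map.empty)

agent.Post(Add ("a", 1))
let found = agent.PostAndReply(fun ch -> TryGet ("a", ch))
let missing = agent.PostAndReply(fun ch -> TryGet ("b", ch))
printfn "%A %A" found missing // Some 1 None
```

The agent handles one message at a time, so no locking is needed and the only state is the argument of the recursive loop.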

F# Observable - Converting an event stream to a list

I was writing a unit test that verified the events fired from a class. I followed the standard "IEvent<_>, Publish, Trigger inside an FSharp type" pattern.
Can you recommend the "functional" way to achieve that?
Here are the options I can think of:
Convert the event stream into a list of strings and compare that list with an expected list
(not sure if there is a way) Convert the expected list into an event stream and compare the two streams.
A pointer to a code snippet would greatly help.
Thanks!
Edit 1: Answering Mark's question:
This is what I have as of now:
let expectedFiles = [ @"c:\a\1"
                      @"c:\a\2" ]
[<Fact>]
let ``Can find files from a folder`` () =
    let ad = new FileSearchAdapter()
    let foundFiles = ref []
    ad.FileFound
    |> Observable.scan (fun acc e -> e::acc) []
    |> Observable.add (fun acc -> foundFiles := acc)
    ad.FindFiles @"c:\a"
    Assert.Equal<string list>(expectedFiles, !foundFiles)
The issues here I feel are the [a] use of reference cell [b] the observable.add is essentially overwriting the reference for each event.
Is there a functional way to achieve the same?
Events are all about side-effects, so it's limited how much sense it makes to try to be all Functional about it.
(Yes: you can build Reactive systems where immutable event data flows through a system, being filtered and aggregated along the way, but at the source, that an event is raised, is a side-effect.)
Given that a unit test tests a unit in isolation from its dependencies, testing that events are correctly raised, exercises the isolated, 'un-functional' part of a system, so I don't think you have to do it in a Functional way.
Here's a simpler alternative:
open System.Collections.Generic
let ``Can find files from a folder`` () =
    let ad = new FileSearchAdapter()
    let foundFiles = List<string>()
    ad.FileFound.Add(fun (sender, args) -> foundFiles.Add args)
    ad.FindFiles @"c:\a"
    let expectedFiles = [ @"c:\a\1"; @"c:\a\2" ]
    expectedFiles = (foundFiles |> Seq.toList)
(This test function is just a normal function that returns bool, but I'm sure you know how to convert it to a unit test.)
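Since FileSearchAdapter isn't shown, here is a self-contained sketch of the same idea with a made-up FakeSearchAdapter that follows the Event/Publish/Trigger pattern. It assumes the event carries a plain string, in which case the handler passed to Add receives only that string.

```fsharp
// Hypothetical stand-in for the question's FileSearchAdapter.
type FakeSearchAdapter() =
    let fileFound = Event<string>()
    member this.FileFound = fileFound.Publish
    // Pretends to find two files named "1" and "2" in the given folder.
    member this.FindFiles(folder: string) =
        for name in [ "1"; "2" ] do
            fileFound.Trigger(folder + "\\" + name)

// Collects every raised event into a mutable list, then returns an F# list.
let collectFoundFiles () =
    let ad = FakeSearchAdapter()
    let found = ResizeArray<string>()
    ad.FileFound.Add(fun path -> found.Add path)
    ad.FindFiles @"c:\a"
    List.ofSeq found

printfn "%A" (collectFoundFiles ())
```

The mutable list never leaves collectFoundFiles, so callers only see an immutable result in the order the events were raised.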

Why does order matter in this usage of Observable.merge?

I am trying to write a basic "game loop" using Observables in F#. Basically I conceptualize the fundamental input stream of events as two streams merged together: the key presses of the user (game uses just keyboard to begin with), and the regular ticks of the game (say, 60 times per second).
My problem seems to stem from the fact that one of the observed sequences, i.e. the ticks, is also the loop that calls DispatchEvents() on the Window allowing it to process its inputs and fire key pressed events, so one stream of events is actually driven by the other, if that makes sense. Here is the code:
open System
open System.IO
open SFML.Window
open SFML.Graphics
open System.Reactive
open System.Reactive.Linq
open System.Diagnostics
type InputEvent =
    | Tick of TimeSpan
    | KeyPressed of Keyboard.Key
[<EntryPoint;STAThread>]
let main _ =
    use window = new RenderWindow(VideoMode(640u, 480u), "GameWindow")
    window.SetVerticalSyncEnabled(true)
    let displayStream =
        Observable.Create(
            fun (observer:IObserver<TimeSpan>) ->
                let sw = Stopwatch.StartNew()
                while (window.IsOpen()) do
                    window.DispatchEvents() // this calls the KeyPressed event synchronously
                    window.Display() // this blocks until the next vertical sync
                    window.Clear()
                    observer.OnNext sw.Elapsed
                    sw.Restart()
                observer.OnCompleted();
                { new IDisposable with member this.Dispose() = ()})
    let onDisplay elapsedTime =
        // draw game: code elided
    let inputEvents =
        Observable.merge
            (window.KeyPressed |> Observable.map (fun key -> KeyPressed(key.Code)))
            (displayStream |> Observable.map (fun t -> Tick(t)))
    use subscription =
        inputEvents.Subscribe(fun inputEvent ->
            match inputEvent with
            | Tick(t) -> onDisplay(t)
            | KeyPressed(key) -> printfn "%A" key)
    0
This works, however, if I change the order of parameters in Observable.merge:
let inputEvents =
    Observable.merge
        (displayStream |> Observable.map (fun t -> Tick(t)))
        (window.KeyPressed |> Observable.map (fun key -> KeyPressed(key.Code)))
Then the game renders (onDisplay is called), but I don't see KeyPressed events printed to the console. Why is that?
(If you're wondering what SFML is, here's the link.)
In pseudo-code, what merge does is:
firstStream.Subscribe(...);
secondStream.Subscribe(...);
The subscribe function you pass to Observable.Create is synchronous and never yields control back to the caller. This means that merge itself is blocked from trying to subscribe to any streams that come after displayStream. When you reorder the streams so that displayStream is first, you prevent it from ever subscribing to your KeyPressed stream. This is why you are seeing the behavior you see.
In some respects, your displayStream is behaving badly. Subscribe methods should not block.
So, either make sure displayStream is the last item in your list, or do some refactoring of your code. You could just use a Subject for displayStream. Then subscribe to everything and finally start the "display loop", where you execute the loop that is currently in your displayStream definition and each time through the loop, just call OnNext on the subject.
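The refactoring can be sketched without SFML or Rx. This is an illustration under assumptions: an F# Event plays the role of the Subject, keys is a stand-in for window.KeyPressed, and a three-iteration for loop replaces the real display loop.

```fsharp
type InputEvent =
    | Tick of int
    | KeyPressed of char

// Events used in place of a Subject: Trigger corresponds to OnNext.
let ticks = Event<int>()
let keys = Event<char>() // stand-in for window.KeyPressed

// Neither source blocks on subscribe, so the argument order no longer matters.
let inputEvents =
    Observable.merge
        (ticks.Publish |> Observable.map Tick)
        (keys.Publish |> Observable.map KeyPressed)

let seen = ResizeArray<InputEvent>()
let subscription = inputEvents.Subscribe(fun e -> seen.Add e)

// The "display loop" starts only after everything is subscribed.
for frame in 1 .. 3 do
    if frame = 2 then keys.Trigger 'x' // stand-in for window.DispatchEvents()
    ticks.Trigger frame

subscription.Dispose()
printfn "%A" (List.ofSeq seen) // [Tick 1; KeyPressed 'x'; Tick 2; Tick 3]
```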

Asynchronous crawling F#

When crawling web pages I need to be careful not to make too many requests to the same domain; for example, I want to leave 1 s between requests. From what I understand, it is the time between requests that matters. So to speed things up I want to use async workflows in F#, the idea being to make the requests at 1 s intervals but avoid blocking while waiting for each response.
let getHtmlPrimitiveAsyncTimer (uri : System.Uri) (timer:int) =
    async{
        let req = (WebRequest.Create(uri)) :?> HttpWebRequest
        req.UserAgent<-"Mozilla"
        try
            Thread.Sleep(timer)
            let! resp = (req.AsyncGetResponse())
            Console.WriteLine(uri.AbsoluteUri+" got response")
            use stream = resp.GetResponseStream()
            use reader = new StreamReader(stream)
            let html = reader.ReadToEnd()
            return html
        with
        | _ as ex -> return "Bad Link"
    }
Then I do something like:
let uri1 = System.Uri "http://rue89.com"
let timer = 1000
let jobs = [|for i in 1..10 -> getHtmlPrimitiveAsyncTimer uri1 timer|]
jobs
|> Array.mapi(fun i job ->
    Console.WriteLine("Starting job "+string i)
    Async.StartAsTask(job).Result)
Is this all right? I am very unsure about 2 things:
- Does the Thread.Sleep call work for delaying the request?
- Is using StartAsTask a problem?
I am a beginner (as you may have noticed) in F# (and in coding in general, actually), and everything involving threads scares me :)
Thanks!!
I think what you want to do is
- create 10 jobs, numbered 'n', each starting 'n' seconds from now
- run those all in parallel
Approximately like
let makeAsync uri n = async {
    // create the request
    do! Async.Sleep(n * 1000)
    // AsyncGetResponse etc
    }
let a = [| for i in 1..10 -> makeAsync uri i |]
let results = a |> Async.Parallel |> Async.RunSynchronously
Note that of course they won't all start at exactly the same instant. If, for example, you have a 4-core machine, 4 will start running very soon but then quickly execute up to the Async.Sleep, at which point the next 4 will run up until their sleeps, and so forth. Then in one second the first async wakes up and posts a request, another second later the 2nd async wakes up, and so on, so this should work. The 1 s is only approximate, since the timers start a very tiny bit staggered from one another. You may want to pad it a little, e.g. 1100 ms, if the cut-off you need is really exactly a second (network latencies and whatnot still leave a bit of this outside the control of your program).
Thread.Sleep is suboptimal. It will work OK for a small number of requests, but it blocks a thread for the whole delay, and threads are expensive, so it won't scale to a large number.
You don't need StartAsTask unless you want to interoperate with .NET Tasks or later do a blocking rendezvous with the result via .Result. If you just want these to all run and then block to collect all the results in an array, Async.Parallel will do that fork-join parallelism for you just fine. If they're just going to print results, you can fire-and-forget via Async.Start which will drop the results on the floor.
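Filling in the outline above, a runnable sketch might look like this. It makes no network calls: fakeFetch is a made-up stand-in for the AsyncGetResponse part, and the interval is shortened to 100 ms so the example finishes quickly.

```fsharp
open System.Diagnostics

// Hypothetical stand-in for the HTTP request, so the sketch runs offline.
let fakeFetch (uri: string) = async { return sprintf "response from %s" uri }

// Job n waits n * intervalMs without blocking a thread, then "fetches".
let makeAsync (clock: Stopwatch) (intervalMs: int) (uri: string) (n: int) =
    async {
        do! Async.Sleep(n * intervalMs)
        let startedAt = clock.ElapsedMilliseconds
        let! _body = fakeFetch uri
        return n, startedAt
    }

let clock = Stopwatch.StartNew()

// Async.Parallel returns the results in the same order as the input array.
let results =
    [| for i in 1 .. 5 -> makeAsync clock 100 "http://rue89.com" i |]
    |> Async.Parallel
    |> Async.RunSynchronously

for (n, startedAt) in results do
    printfn "job %d started its request after about %d ms" n startedAt
```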
(An alternative strategy is to use an agent as a throttle. Post all the http requests to a single agent, where the agent is logically single-threaded and sits in a loop, doing Async.Sleep for 1s, and then handling the next request. That's a nice way to make a general-purpose throttle... may be blog-worthy for me, come to think of it.)
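One possible shape for such a throttle, as a sketch rather than a finished design: callers ask the agent for permission before each request, and the agent hands out one permit per interval. makeThrottle and fetch are hypothetical names, the fetch itself is faked, and the interval is shortened to 100 ms.

```fsharp
// Hypothetical throttle: each message is a reply channel waiting for a permit.
let makeThrottle (intervalMs: int) =
    MailboxProcessor<AsyncReplyChannel<unit>>.Start(fun inbox ->
        let rec loop () =
            async {
                let! permit = inbox.Receive()
                permit.Reply()             // let exactly one caller proceed
                do! Async.Sleep intervalMs // then wait before granting the next permit
                return! loop ()
            }
        loop ())

let throttle = makeThrottle 100

// Waits for a permit without blocking a thread, then performs the (fake) request.
let fetch (uri: string) (n: int) =
    async {
        do! throttle.PostAndAsyncReply(fun ch -> ch)
        // real code would call AsyncGetResponse here
        return sprintf "job %d fetched %s" n uri
    }

let throttled =
    [| for i in 1 .. 3 -> fetch "http://rue89.com" i |]
    |> Async.Parallel
    |> Async.RunSynchronously

throttled |> Array.iter (printfn "%s")
```

Because the agent only spaces out the start of each request and does not wait for the response, slow responses do not delay the next permit.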
