How to use FSharpx TaskBuilder with functions taking parameters - f#

I have lately been programming with the FSharpx library, especially its TaskBuilder. Now I wonder whether it is possible to define a function that takes parameters and returns a result, such as:
let doTask(parameter:int) =
    let task = TaskBuilder(scheduler = TaskScheduler.Current)
    task {
        return! Task.Factory.StartNew(fun() -> parameter + 1)
    }
match FSharpx.Task.run doTask(1) with
| _ -> ()
Looking at the source code, I see that run expects a function taking no parameters and returning a Task<'a>. There don't seem to be any examples in the FSharpx TaskTests either.
I'd appreciate it if someone could advise how I should get a scenario like this working with FSharpx, or whether one isn't supposed to use the library like this for a reason I haven't grasped yet.
Edit: I believe I could wrap doTask as follows:
let wrapperDoTask() = doTask(101)
match FSharpx.Task.run wrapperDoTask with
| _ -> ()
And it might work. I'm not at a compiler right now, so this is a bit of hand-waving. Does anyone have an opinion on which direction to take, or did I just answer my own question? :)
Edit 2:
I think I need to edit this one more time based on MisterMetaphor's answer; his P.S. in particular was very informative. I use the FSharpx TaskBuilder to interoperate with C#, where, as noted, tasks are returned hot, i.e. already running (with some minor exceptions). This is connected to my recent question Translating async-await C# code to F# with respect to the scheduler, and relates to Orleans (I'll add some tags to beef up the context; maybe someone else is pondering these too).
When thinking in C# terms, what I am trying to achieve is to await the task result before returning, but without blocking. The behaviour I'm after is specifically that of await, not .Result. The difference is explained, for instance, in
Await, and UI, and deadlocks! Oh my!
Don't Block on Async Code.
Working out which context, scheduler, or behaviour is in play in C# terms is still somewhat fuzzy for me. Unfortunately, it looks like I can't ignore all the details when it comes to interop. :)

You need to use Task.run only if you want to wait for the task completion synchronously on the current thread. It takes a single parameter and you can consider that parameter a task factory -- i.e. a means to create a Task<_>. Unlike Async<_>, the Task<_> starts running as soon as it is created. That is not always a desirable behavior.
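To make the hot/cold distinction concrete, here is a small sketch that uses only the standard library rather than FSharpx. Task.Run stands in for any API that hands back an already-running Task<_>, and a ConcurrentQueue named `events` (purely illustrative) records which bodies have actually executed.

```fsharp
open System.Collections.Concurrent
open System.Threading.Tasks

// Illustrative log of which computation bodies have actually executed.
let events = ConcurrentQueue<string>()

// Hot: Task.Run schedules the body as soon as the Task<int> value is created.
let hot : Task<int> =
    Task.Run(fun () ->
        events.Enqueue "task body ran"
        1)

// Cold: an Async<int> is only a description; nothing runs until it is started.
let cold : Async<int> =
    async {
        events.Enqueue "async body ran"
        return 2
    }

// Wait for the hot task only so that the snapshot below is deterministic.
hot.Wait()
let afterCreation = List.ofSeq events   // only the task's entry so far

// Explicitly running the Async<int> is what finally executes its body.
let coldResult = Async.RunSynchronously cold
let afterRun = List.ofSeq events
```

This is why a function that merely builds an Async<_> can be passed around freely, while holding a Task<_> usually means the work is already under way.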
You could achieve similar results (a blocking wait for task completion) with (doTask 101).Result, but I think Task.run is more idiomatic F#, in that it uses a Result return type to signal an error instead of raising an exception. Which is better is arguable and depends on the situation, but in my experience, in simpler cases a special result type is more composable than exceptions.
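As a sketch of the task-factory idea, the snippet below defines a hypothetical `runFactory` helper. It is not FSharpx's Task.run, only a stand-in with a similar shape: it takes `unit -> Task<'a>`, blocks until completion, and returns F#'s built-in Result instead of throwing. A lambda (or the named wrapper from the question's edit) is what turns a parameterised function into such a factory.

```fsharp
open System.Threading.Tasks

// Hypothetical stand-in for a "run" helper: takes a task *factory*,
// blocks until the task finishes, and reports failure via Result.
let runFactory (factory: unit -> Task<'a>) : Result<'a, exn> =
    try
        let t = factory ()
        Ok t.Result            // .Result blocks; a faulted task throws here
    with ex ->
        Error ex

// A function that takes a parameter and returns a (hot) Task<int>.
let doTask (parameter: int) : Task<int> =
    Task.Run(fun () -> parameter + 1)

// A lambda closes over the argument, giving a unit -> Task<int> factory.
let r1 = runFactory (fun () -> doTask 101)

// The named wrapper from the question's edit works the same way.
let wrapperDoTask () = doTask 101
let r2 = runFactory wrapperDoTask

// A failing task ends up as Error rather than an exception at the call site.
let r3 = runFactory (fun () -> Task.FromException<int>(exn "boom"))
```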
Another point here is that you should avoid blocking waits (Task.run, .Wait(), .Result) as much as you can. (Ideally, you'd have one of those only at the top level of your program.)
P.S. This is out of scope of the question, but your doTask function looks funny. task { return! Task.Factory.StartNew( ... ) } is equivalent to Task.Factory.StartNew( ... ). What you probably wanted to do is task { return parameter + 1 }.
EDIT
So, in response to the OP's question edit :) If you need the await behaviour from C#, you just need to use let!, like this:
task {
    let! x = someTask 1 2 3
    return x + 5
}
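For comparison, the same await-style composition can be written with the built-in async workflow instead of the FSharpx builder, assuming the C# side hands back a hot Task<_>. `someTask` here is a hypothetical stand-in; Async.AwaitTask lets let! wait on it without blocking a thread, and the only blocking wait sits at the top level.

```fsharp
open System.Threading.Tasks

// Hypothetical C#-style API: returns a hot Task<int> that is already running.
let someTask (a: int) (b: int) (c: int) : Task<int> =
    Task.Run(fun () -> a + b + c)

// let! on Async.AwaitTask suspends the workflow without holding a thread,
// which is the closest async-workflow analogue of C#'s await.
let compute : Async<int> =
    async {
        let! x = someTask 1 2 3 |> Async.AwaitTask
        return x + 5
    }

// Handing the result back to C# as a Task<int>.
let asTask : Task<int> = Async.StartAsTask(compute)

// The single blocking wait, at the top level of the program.
let result = Async.RunSynchronously compute
```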

Related

How to handle blocking calls when using reactor in a JAX-RS-powered server?

To process HTTP requests, we have to make blocking calls (e.g. JDBC calls) as part of a Mono/Flux-based process. Our current plan looks something like this:
// I renamed getSomething to processJaxrsHttpRequest
CompletionStage<String> processJaxrsHttpRequest(String input) {
    return Mono.just(input)
            .map(in -> process(in))
            .flatMap(str -> Mono.fromCallable(() -> jdbcCall(str)).subscribeOn(fixedScheduler))
            .flatMap(str -> asyncHttpCall(str))
            .flatMap(str -> Mono.fromCallable(() -> jdbcCall(str)).subscribeOn(fixedScheduler))
            .toFuture();
}
where fixedScheduler is used concurrently across HTTP requests.
We were hoping to get some feedback on this strategy for handling blocking calls within a decent number of fluxes. Of course, we understand that if all our requests flowed through these blocking calls, we might as well not use Reactor (apart from its admittedly nice processing API).
Update: Thanks bsideup for this answer. However, I should have been a little more specific with my questions.
My overall question is how to effectively use a blocking call across multiple fluxes, where these fluxes can be created and subscribed to in large numbers. We tried the suggested approach, but it results in an explosion of threads and quickly runs out of memory (OOM). We are therefore thinking of using a shared scheduler. Here are my questions:
Is using a shared scheduler (fixedScheduler) what you would suggest in the situation I describe? If not, will you point me in any directions?
If using a shared scheduler is good, would this be a good implementation of it: Schedulers.newParallel("blocking-scheduler", maxNumThreads)?
Update 2: I just dug a bit into Schedulers#newParallel and realized that won't work, since it 'rejects' blocking calls.
Really appreciate any tips!
While subscribeOn is indeed one way of handling blocking calls, and your usage is okay, you can also use publishOn.
It moves downstream processing to the provided Scheduler, until another publishOn is specified:
CompletionStage<String> getSomething(String input) {
    return Mono.just(input)
            .map(in -> process(in)) // process must be non-blocking, or go after publishOn
            .publishOn(Schedulers.boundedElastic())
            .map(this::jdbcCall)
            .flatMap(str -> asyncHttpCall(str))
            .publishOn(Schedulers.boundedElastic())
            .map(this::jdbcCall)
            .toFuture();
}
As you can see, you can keep using async calls too. Just make sure you're not blocking non-blocking schedulers (in that example, I use publishOn again after flatMap because asyncHttpCall may complete on a non-blocking scheduler).

How to deal with checking for valid state in every method call

I have encountered some code that looks like this.
member this.Send (data:array<byte>) =
    if tcpClient.Connected then
        // Send something.
member this.Open () =
    if not tcpClient.Connected then
        // Connect.
Constantly checking whether the TcpClient is connected before performing an operation on it is a potential bug hive.
A similar problem would be to check whether or not something is null before performing an operation on that something.
What is the general approach to dealing with this?
I was thinking along the lines of a monad that abstracts this boring checking away.
EDIT:
Potentially I could write many methods that each have to check whether we are connected.
member this.SendName name =
    if tcpClient.Connected then
        // Send name
member this.ThrottleConnection percent =
    if tcpClient.Connected then
        // Throttle
member this.SendAsTest text =
    if tcpClient.Connected then
        // Send as text.
So, it depends on whether you want to do the check inside the wrapper class or outside of it. If you do the check inside the class, I don't see how a computation expression is really relevant; you're not binding operations.
A workflow expression would only be useful if you're doing the check outside the wrapper class (i.e. from the calling function). If you put together a connected builder, the resulting code would look like
connected {
    do! wrapper.Send(..)
    do! wrapper.Throttle(..)
    do! wrapper.SendAsTest(..)
}
However, that is really no simpler than
if wrapper.Connected then
    wrapper.Send(..)
    wrapper.Throttle(..)
    wrapper.SendAsTest(..)
So, kind of, what's the point, right?
It'd make more sense if you had multiple tcpClient wrapper objects and needed them all to be connected within your workflow. That's more what the "monadic" approach is for.
connected {
    do! wrapper1.Send(..)
    do! wrapper2.Throttle(..)
    do! wrapper3.SendAsText(..)
}
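Here is a minimal sketch of what such a connected builder might look like, assuming each wrapper operation reports success as `Some ()` and a dropped connection as `None`. `FakeConnection` and `ConnectedBuilder` are hypothetical names invented for illustration; the builder is essentially an option workflow, so the first disconnected wrapper short-circuits the rest.

```fsharp
// Hypothetical stand-in for a TcpClient wrapper; only the up/down flag matters.
type FakeConnection(isUp: bool) =
    let sent = ResizeArray<string>()
    member this.Connected = isUp
    member this.Sent = List.ofSeq sent
    // Some () on success, None when the connection is down.
    member this.Send(msg: string) : unit option =
        if isUp then
            sent.Add msg
            Some ()
        else
            None

// Hypothetical builder: Bind stops at the first None, like Option.bind.
type ConnectedBuilder() =
    member this.Bind(x: 'a option, f: 'a -> 'b option) : 'b option = Option.bind f x
    member this.Return(x: 'a) : 'a option = Some x
    member this.Zero() : unit option = Some ()

let connected = ConnectedBuilder()

let c1 = FakeConnection(true)
let c2 = FakeConnection(false)
let c3 = FakeConnection(true)

// c2 is down, so c3.Send is never called and the whole workflow yields None.
let outcome =
    connected {
        do! c1.Send "hello"
        do! c2.Send "throttle"
        do! c3.Send "text"
    }

// With every wrapper connected, all steps run and the result is Some ().
let allUp =
    connected {
        do! c1.Send "again"
        do! c3.Send "done"
    }
```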
However, for your specific example of doing the checks inside the wrapper class, as I said earlier, monads would not be applicable. One neat approach to that specific problem would be to mimic preconditions, as in http://laurent.le-brun.eu/site/index.php/2008/03/26/32-design-by-contract-with-fsharp. I don't know if it's much more intuitive than the if statements, but if you're looking for an interesting, F#-y way of doing things, that's the best I can come up with.
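One way to read the precondition idea, sketched here with a hypothetical `require` helper rather than anything taken from the linked article, is to fail fast when the connection is down instead of silently skipping the work. Note that this changes behaviour compared with the original if-checks: an operation on a closed connection now raises InvalidOperationException (via `invalidOp`).

```fsharp
// Hypothetical design-by-contract style precondition helper.
let require (condition: bool) (message: string) : unit =
    if not condition then invalidOp message

// Hypothetical client; Send returns the byte count as a stand-in for real I/O.
type Client(isConnected: bool) =
    member this.Connected = isConnected
    member this.Send(data: byte[]) : int =
        require this.Connected "Send requires an open connection"
        data.Length
```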
Ultimately your existing code is about as compact as it gets. Presumably not all of your functions would start with the same if statement, so there's nothing unnecessarily repetitive there.

Replacement for Future.transform?

I'm going over some older Dart code, addressing breaking changes with the latest Dart SDK. This one I can't figure out:
Future<DateTime> get lastsave =>
    client.lastsave.transform((int unixTs) =>
        new DateTime.fromMillisecondsSinceEpoch(unixTs * 1000, isUtc: true));
This now fails with:
The method 'transform' is not defined for the class 'Future<List<int>>'
From what I understand, the purpose of Future.transform() was to apply a synchronous transformation (see e.g. this discussion thread), i.e. to convert the async call to a sync call and return the value.
Has Future.transform been replaced with something else?
Must have been quite a while since that code was updated ;)
Just replace transform with then and it should work.
From https://groups.google.com/a/dartlang.org/forum/#!topic/misc/Boch2XH9Tmk
We have also improved the Future class, and made it simpler to use. One simple “then” methods lets you apply asynchronous or synchronous functions to the result of a future, merging the three methods “chain”, “transform” and “then”. Streams and Futures should make asynchronous Dart programs easier to write and read, and should reduce some types of programming errors.

Async.Parallel or Array.Parallel.Map?

I'm trying to implement a pattern I read from Don Syme's blog
(https://blogs.msdn.microsoft.com/dsyme/2010/01/09/async-and-parallel-design-patterns-in-f-parallelizing-cpu-and-io-computations/)
which suggests that there are opportunities for massive performance improvements from leveraging asynchronous I/O. I am currently trying to take a piece of code that "works" one way, using Array.Parallel.Map, and see if I can somehow achieve the same result using Async.Parallel, but I really don't understand Async.Parallel, and cannot get anything to work.
I have a piece of code (simplified below to illustrate the point) that successfully retrieves an array of data for one cusip. (A price series, for example)
let getStockData cusip =
    let D = DataProvider()
    let arr = D.GetPriceSeries(cusip)
    arr
let data = Array.Parallel.map (fun x -> getStockData x) stockCusips
So this approach constructs an array of arrays by making a connection over the internet to my data vendor for each stock (which could be as many as 3000), returning one price series per stock. I admittedly don't understand what goes on underneath Array.Parallel.map, but I am wondering whether this is a scenario where resources are wasted under the hood, and whether it could actually be faster using asynchronous I/O. To test this, I have attempted to write the function using asyncs. I think the function below follows the pattern in Don Syme's article using the URLs, but it won't compile with "let!".
let getStockDataAsync cusip =
    async {
        let D = DataProvider()
        let! arr = D.GetData(cusip)
        return arr
    }
The error I get is:
This expression was expected to have type Async<'a> but here has type obj
It compiles fine with "let" instead of "let!", but I had thought the whole point was that you need the exclamation point in order for the command to run without blocking a thread.
So the first question is: what's wrong with my syntax above in getStockDataAsync? And at a higher level, can anyone offer some additional insight into asynchronous I/O and whether the scenario I have presented would benefit from it, potentially making it much, much faster than Array.Parallel.map? Thanks so much.
F# asynchronous workflows allow you to implement asynchronous computations; however, F# makes a distinction between ordinary and asynchronous computations, and this difference is tracked by the type system. For example, a method that downloads a web page synchronously has the type string -> string (taking a URL and returning HTML), but a method that does the same thing asynchronously has the type string -> Async<string>. In the async block, you can use let! to call asynchronous operations, but all other (standard synchronous) methods have to be called using let. The problem with your example is that the GetData operation is an ordinary synchronous method, so you cannot invoke it with let!.
In the typical F# scenario, if you want to make the GetData member asynchronous, you'll need to implement it using an asynchronous workflow, so you'll also need to wrap it in the async block. At some point, you will reach a location where you really need to run some primitive operation asynchronously (for example, downloading data from a web site). F# provides several primitive asynchronous operations that you can call from async block using let! such as AsyncGetResponse (which is an asynchronous version of GetResponse method). So, in your GetData method, you'll for example write something like this:
open System.Net

let GetData (url:string) = async {
    let req = WebRequest.Create(url)
    let! rsp = req.AsyncGetResponse()
    use stream = rsp.GetResponseStream()
    use reader = new System.IO.StreamReader(stream)
    let! html = reader.AsyncReadToEnd()
    return CalculateResult(html) }
The summary is that you need to identify some primitive asynchronous operations (such as waiting for the web server or for the file system), use primitive asynchronous operations at that point and wrap all the code that uses these operations in async blocks. If there are no primitive operations that could be run asynchronously, then your code is CPU-bound and you can just use Parallel.map.
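The pattern above can be sketched roughly as follows. Since the real data vendor API is not shown, `getStockDataAsync` here is hypothetical and uses Async.Sleep as a stand-in for a primitive asynchronous I/O call; Async.Parallel then runs one workflow per CUSIP concurrently. The last binding shows the CPU-bound case, where Array.Parallel.map is the simpler fit.

```fsharp
// Hypothetical async data call: Async.Sleep simulates waiting on the network
// without holding a thread, then a fake "price series" is returned.
let getStockDataAsync (cusip: string) : Async<float[]> =
    async {
        do! Async.Sleep 100
        return [| float cusip.Length; 1.0; 2.0 |]
    }

let cusips = [| "AAA"; "BB"; "C" |]

// Async.Parallel starts all workflows and collects results in input order.
let data : float[][] =
    cusips
    |> Array.map getStockDataAsync
    |> Async.Parallel
    |> Async.RunSynchronously

// Purely CPU-bound work has nothing to await, so Array.Parallel.map suffices.
let cpuBound = Array.Parallel.map (fun x -> x * x) [| 1 .. 5 |]
```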
I hope this helps you understand how F# asynchronous workflows work. For more information, you can for example take a look at Don Syme's blog post, series about asynchronous programming by Robert Pickering, or my F# web cast.
@Tomas already has a great answer. I'll just add a couple of bits.
The idiom for F# asyncs is to name the method with an "Async" prefix (AsyncFoo, not FooAsync; the latter is an idiom already used by another .NET technology). So your functions should be getStockData and asyncGetStockData.
Inside an async workflow, whenever you use let! instead of let or do! instead of do, the thing on the right should have type Async<T> instead of T. Basically you need an existing async computation in order to 'go async' at this point in the workflow. Each Async<T> will itself be either some other async{...} workflow, or else an async "primitive". The primitives are defined in the F# library or created in user code via Async.FromBeginEnd or Async.FromContinuations which enable defining the low-level details of starting a computation, registering an I/O callback, releasing the thread, and then restarting the computation when getting called back. So you have to 'plumb' async all the way down to some truly-async-I/O-primitive in order to get the full benefits of async I/O.
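As a rough illustration of creating such a primitive with Async.FromContinuations, the sketch below wraps a hypothetical callback-based `startLookup` function (it merely queues work on the thread pool) into an Async<int>. In real code the callback would come from an actual I/O API, and failures should be routed to the error continuation, which this sketch ignores.

```fsharp
open System.Threading

// Hypothetical callback-style API: invokes the callback later on a pool thread.
let startLookup (cusip: string) (callback: int -> unit) : unit =
    ThreadPool.QueueUserWorkItem(fun _ -> callback cusip.Length) |> ignore

// Turn the callback API into an Async<int> primitive; no thread is held
// while waiting, and the success continuation resumes the workflow.
let asyncLookup (cusip: string) : Async<int> =
    Async.FromContinuations(fun (onSuccess, _onError, _onCancel) ->
        startLookup cusip onSuccess)

let total =
    async {
        let! a = asyncLookup "ABC"
        let! b = asyncLookup "DE"
        return a + b
    }
    |> Async.RunSynchronously
```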

Best practices to parallelize using async workflow

Let's say I wanted to scrape a webpage and extract some data. I'd most likely write something like this:
let getAllHyperlinks(url:string) =
    async {
        let req = WebRequest.Create(url)
        let! rsp = req.AsyncGetResponse()
        use stream = rsp.GetResponseStream()              // depends on rsp
        use reader = new System.IO.StreamReader(stream)   // depends on stream
        let! data = reader.AsyncReadToEnd()               // depends on reader
        return extractAllUrls(data) }                     // depends on data
The let! tells F# to execute the code in another thread, then bind the result to a variable, and continue processing. The sample above uses two let! statements, one to get the response and one to read all the data, so it spawns at least two threads (please correct me if I'm wrong).
Although the workflow above spawns several threads, the order of execution is serial because each item in the workflow depends on the previous item. It's not really possible to evaluate any items further down the workflow until the other threads return.
Is there any benefit to having more than one let! in the code above?
If not, how would this code need to change to take advantage of multiple let! statements?
The key is that we are not spawning any new threads. During the whole course of the workflow, there are 1 or 0 active threads being consumed from the ThreadPool. (One exception: up until the first '!', the code runs on the user thread that called Async.Run.) let! lets go of a thread while the Async operation is at sea, and then picks up a thread from the ThreadPool when the operation returns. The (performance) advantage is less pressure on the ThreadPool (and of course the major user advantage is the simple programming model, a million times better than all the BeginFoo/EndFoo/callback code you would otherwise write).
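On the asker's follow-up about benefiting from multiple let!s: in that example they are sequential by necessity. When two operations really are independent, one option (not mentioned in the answers here) is Async.StartChild, which starts a child computation and returns a handle to await later. The sketch below uses Async.Sleep as a stand-in for two independent I/O calls, `fetchA` and `fetchB`, both hypothetical.

```fsharp
// Hypothetical independent operations, each simulating ~200 ms of I/O.
let fetchA : Async<int> =
    async {
        do! Async.Sleep 200
        return 1
    }

let fetchB : Async<int> =
    async {
        do! Async.Sleep 200
        return 2
    }

let combined : Async<int> =
    async {
        // Start fetchB now; childB is an Async<int> that awaits its result.
        let! childB = Async.StartChild(fetchB)
        // fetchA runs while fetchB is still in flight.
        let! a = fetchA
        let! b = childB
        return a + b
    }

let total = Async.RunSynchronously combined
```

With the overlap, the total wait should be close to one delay rather than two, though the exact timing depends on the runtime.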
See also http://cs.hubfs.net/forums/thread/8262.aspx
I was writing an answer but Brian beat me to it. I fully agree with him.
I'd like to add that if you want to parallelize synchronous code, the right tool is PLINQ, not async workflows, as Don Syme explains.
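For the synchronous, CPU-bound case, a minimal PLINQ sketch from F# might look like this; `slowSquare` is a hypothetical placeholder for real work, and AsOrdered keeps the output in input order.

```fsharp
open System.Linq

// Hypothetical CPU-bound function: there is no I/O here to await.
let slowSquare (x: int) = x * x

let input = [| 1 .. 10 |]

// PLINQ partitions the input across cores; AsOrdered preserves input order.
let parallelQuery = input.AsParallel().AsOrdered()
let squares = parallelQuery.Select(fun x -> slowSquare x).ToArray()
```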
