I'm trying to implement a pattern I read from Don Syme's blog
(https://blogs.msdn.microsoft.com/dsyme/2010/01/09/async-and-parallel-design-patterns-in-f-parallelizing-cpu-and-io-computations/)
which suggests that there are opportunities for massive performance improvements from leveraging asynchronous I/O. I am currently trying to take a piece of code that "works" one way, using Array.Parallel.Map, and see if I can somehow achieve the same result using Async.Parallel, but I really don't understand Async.Parallel, and cannot get anything to work.
I have a piece of code (simplified below to illustrate the point) that successfully retrieves an array of data for one cusip. (A price series, for example)
let getStockData cusip =
let D = DataProvider()
let arr = D.GetPriceSeries(cusip)
return arr
let data = Array.Parallel.map (fun x -> getStockData x) stockCusips
So this approach contructs an array of arrays, by making a connection over the internet to my data vendor for each stock (which could be as many as 3000) and returns me an array of arrays (1 per stock, with a price series for each one). I admittedly don't understand what goes on underneath Array.Parallel.map, but am wondering if this is a scenario where there are resources wasted under the hood, and it actually could be faster using asynchronous I/O? So to test this out, I have attempted to make this function using asyncs, and I think that the function below follows the pattern in Don Syme's article using the URLs, but it won't compile with "let!".
let getStockDataAsync cusip =
async { let D = DataProvider()
let! arr = D.GetData(cusip)
return arr
}
The error I get is:
This expression was expected to have type Async<'a> but here has type obj
It compiles fine with "let" instead of "let!", but I had thought the whole point was that you need the exclamation point in order for the command to run without blocking a thread.
So the first question really is, what's wrong with my syntax above, in getStockDataAsync, and then at a higher level, can anyone offer some additional insight about asychronous I/O and whether the scenario I have presented would benefit from it, making it potentially much, much faster than Array.Parallel.map? Thanks so much.
F# asynchronous workflows allow you to implement asynchronous computations, however, F# makes a distinction between usual computation and asynchronous computations. This difference is tracked by the type-system. For example a method that downloads web page and is synchronous has a type string -> string (taking URL and returning HTML), but a method that does the same thing asynchronously has a type string -> Async<string>. In the async block, you can use let! to call asynchronous operations, but all other (standard synchronous) methods have to be called using let. Now, the problem with your example is that the GetData operation is ordinary synchronous method, so you cannot invoke it with let!.
In the typical F# scenario, if you want to make the GetData member asynchronous, you'll need to implement it using an asynchronous workflow, so you'll also need to wrap it in the async block. At some point, you will reach a location where you really need to run some primitive operation asynchronously (for example, downloading data from a web site). F# provides several primitive asynchronous operations that you can call from async block using let! such as AsyncGetResponse (which is an asynchronous version of GetResponse method). So, in your GetData method, you'll for example write something like this:
let GetData (url:string) = async {
let req = WebRequest.Create(url)
let! rsp = req.AsyncGetResponse()
use stream = rsp.GetResponseStream()
use reader = new System.IO.StreamReader(stream)
let html = reader.AsyncReadToEnd()
return CalculateResult(html) }
The summary is that you need to identify some primitive asynchronous operations (such as waiting for the web server or for the file system), use primitive asynchronous operations at that point and wrap all the code that uses these operations in async blocks. If there are no primitive operations that could be run asynchronously, then your code is CPU-bound and you can just use Parallel.map.
I hope this helps you understand how F# asynchronous workflows work. For more information, you can for example take a look at Don Syme's blog post, series about asynchronous programming by Robert Pickering, or my F# web cast.
#Tomas already has a great answer. I'll just say a couple bits in addition.
The idiom for F# asyncs is to name the method with an "Async" prefix (AsyncFoo, not FooAsync; the latter is an idiom already used by another .NET technology). So your functions should be getStockData and asyncGetStockData.
Inside an async workflow, whenever you use let! instead of let or do! instead of do, the thing on the right should have type Async<T> instead of T. Basically you need an existing async computation in order to 'go async' at this point in the workflow. Each Async<T> will itself be either some other async{...} workflow, or else an async "primitive". The primitives are defined in the F# library or created in user code via Async.FromBeginEnd or Async.FromContinuations which enable defining the low-level details of starting a computation, registering an I/O callback, releasing the thread, and then restarting the computation when getting called back. So you have to 'plumb' async all the way down to some truly-async-I/O-primitive in order to get the full benefits of async I/O.
Related
From the wikibook on F# there is a small section where it says:
What does let! do?#
let! runs an async<'a> object on its own thread, then it immediately
releases the current thread back to the threadpool. When let! returns,
execution of the workflow will continue on the new thread, which may
or may not be the same thread that the workflow started out on.
I have not found anywhere else in books or on the web where this fact (highlighted in bold) is stated.
Is this true for all let!/do! regardless of what the async object contains (e.g. Thread.Sleep()) and how it is started (e.g. Async.Start)?
Looking in the F# source code on github, I wasn't able to find the place where a call to bind executes on a new (TP) thread. Where in the code is the magic happening?
Which part of that statement do you find surprising? That parts of a single async can execute on different threadpool threads, or that a threadpool thread is necessarily being released and obtained on each bind?
If it's the latter, then I agree - it sounds wrong. Looking at the code, there are only a few places where a new work item is being queued on the threadpool (namely, the few Async module functions that use queueAsync internally), and Async.SwitchToNewThread spawns a non-threadpool thread and runs the continuation there. A bind alone doesn't seem to be enough to switch threads.
The spirit of the statement however seems to be about the former - no guarantees are made that parts of an async block will run on the same thread. The exact thread that you run on should be treated as an implementation detail, and when you yield control and await some result, you can be pretty sure that you'll land on a different thread at least some of the time.
No. An async operations might execute synchronously on the current thread, or it might wind up completing on a different thread. It depends entirely on how the async API in question is implemented.
See Do the new C# 5.0 'async' and 'await' keywords use multiple cores? for a decent explanation. The implementation details of F# and C# async are different, but the overall principles are the same.
The builder that implements the F# async computation expression is here.
I have encountered some code that looks like this.
member this.Send (data:array<byte>) =
if tcpClient.Connected then
// Send something.
member this.Open () =
if not tcpClient.Connected then
// Connect.
It's a potential bug hive with constantly checking to see if the TcpClient is connected before performing an operation on it.
A similar problem would be to check whether or not something is null before performing an operation on that something.
What is the general approach to dealing with this?
I was thinking along the lines of a monad that abstracts this boring checking away.
EDIT:
Potentially I can write many methods that each will have to check if we are connected.
member this.SendName name =
if tcpClient.Connected then
// Send name
member this.ThrottleConnection percent =
if tcpClient.Connected then
// Throttle
member this.SendAsTest text =
if tcpClient.Connected then
// Send as text.
So, it depends on whether you want to do the check inside the wrapper class or outside of it. Doing the check inside the class, I don't see how a computation expression is really relevant; you're not binding operations.
A workflow expression would only be useful if you're doing the check outside the wrapper class (i.e. from the calling function). If you create a connected builder together, the resulting code would look like
connected {
do! wrapper.Send(..)
do! wrapper.Throttle(..)
do! wrapper.SendAsTest(..)
}
However, that is really no simpler than
if wrapper.connected do
wrapper.Send(..)
wrapper.Throttle(..)
wrapper.SendAsTest(..)
So, kind of, what's the point, right?
It'd make more sense if you had multiple tcpClient wrapper objects and needed them all to be connected within your workflow. That's more what the "monadic" approach is for.
connected {
do! wrapper1.Send(..)
do! wrapper2.Throttle(..)
do! wrapper3.SendAsText(..)
}
However, specific to your example of doing the checks inside the wrapper class, like I said earlier, monads would not be applicable. One neat approach to that specific problem would be to try mimicking some preconditions like the following link http://laurent.le-brun.eu/site/index.php/2008/03/26/32-design-by-contract-with-fsharp. I don't know if it's much more intuitive than the if statements, but if you're looking for an fsharp-y way of doing things interestingly, that's the best I can come up with.
Ultimately your existing code is about as compact as it gets. Presumably not all of your functions would start with the same if statement, so there's nothing unnecessarily repetitive there.
I have been lately programming with the FSharpx library and especially its TaskBuilder. Now I wonder if it should be possible to define a function which takes parameters and takes a result. Such as
let doTask(parameter:int) =
let task = TaskBuilder(scheduler = TaskScheduler.Current)
task {
return! Task.Factory.StartNew(fun() -> parameter + 1)
}
match FSharpx.Task.run doTask(1) with
| _ -> ()
Looking at the source code I see run expects a function taking no parameters and returning a Task<'a>. There doesn't look like being examples on FSharpx TaskTests either.
I'd appreciate if someone could advice how should I get a scenario like this going with FSharpx or if one isn't supposed to use the library like this for a reason I haven't quite grasped as of yet.
<edit: I believe I could wrap doTask as follows
wrapperDoTask() = doTask(101)
match FSharpx.Task.run wrapperDoTask with
| _ -> ()
And it might work. I'm not with a compiler currently, so this is a bit of a handwaving. Does anyone have an opinion on any direction or did I just answer my own question? :)
<edit2:
I think I need to edit this one more time based on MisterMetaphor's answer. Especially his P.S., I think, was well informing. I use FSharpx TaskBuilder to interop with C#, in which, as noted, tasks are returned as hot (with some minor exceptions), already running. This is in connection with my recent question Translating async-await C# code to F# with respect to the scheduler and in relation Orleans (I'll add some tags to beef up the context, maybe someone else is pondering these too).
When thinking in C# terms, what I try to achieve is to await the task result before returning, but without blocking. The behaviour I'm after is especially of that of await not .Result. The difference can be read, for instance, from
Await, and UI, and deadlocks! Oh my!
Don't Block on Async Code.
Trying to think which context or scheduler or behavior or something is going on in terms of C# is somewhat fuzzy for me. Unfortunatelly it looks like I can't ignore all the details when it comes to interop. :)
You need to use Task.run only if you want to wait for the task completion synchronously on the current thread. It takes a single parameter and you can consider that parameter a task factory -- i.e. a means to create a Task<_>. Unlike Async<_>, the Task<_> starts running as soon as it is created. That is not always a desirable behavior.
You could achieve similar results (a blocking wait for task completion) with (doTask 101).Result, but I think Task.run is more idiomatic to F#, in a way that it uses a Result return type to signal an error instead of raising an exception. It might be arguable which is better, depending on situation, but in my experience in simpler cases a special result type is more composable than exceptions.
Another point here is that you should avoid blocking waits (Task.run, .Wait(), .Result) as much as you can. (Ideally, you'd have one of those only at the top level of your program.)
P.S. This if out of scope of the question, but your doTask function looks funny. task { return! Task.Factory.StartNew( ... ) } is equivalent to Task.Factory.StartNew( ... ). What you probably wanted to do is task { return parameter + 1 }.
EDIT
So, in response to OP's question edit :) If you need the await behavior from C#, you just need to use let! .... Like this:
task {
let! x = someTask 1 2 3
return x + 5
}
I'm going over some older Dart code, addressing breaking changes with the latest Dart SDK. This one I can't figure out:
Future<DateTime> get lastsave =>
client.lastsave.transform((int unixTs) =>
new DateTime.fromMillisecondsSinceEpoch(unixTs * 1000, isUtc:true));
=>
The method 'transform' is not defined for the class 'Future<List<int>>'
From what I understand, the purpose of Future.transform() was to apply a synchronous transformation (see e.g. this discussion thread). I.e. convert the async call to a sync call and return the value.
Has Future.transform been replaced with something else?
Must have been quite a while since that code was updated ;)
Just replace transform with then and it should work.
From https://groups.google.com/a/dartlang.org/forum/#!topic/misc/Boch2XH9Tmk
We have also improved the Future class, and made it simpler to use. One simple “then” methods lets you apply asynchronous or synchronous functions to the result of a future, merging the three methods “chain”, “transform” and “then”. Streams and Futures should make asynchronous Dart programs easier to write and read, and should reduce some types of programming errors.
Lets say I wanted to scrape a webpage, and extract some data. I'd most likely write something like this:
let getAllHyperlinks(url:string) =
async { let req = WebRequest.Create(url)
let! rsp = req.GetResponseAsync()
use stream = rsp.GetResponseStream() // depends on rsp
use reader = new System.IO.StreamReader(stream) // depends on stream
let! data = reader.AsyncReadToEnd() // depends on reader
return extractAllUrls(data) } // depends on data
The let! tells F# to execute the code in another thread, then bind the result to a variable, and continue processing. The sample above uses two let statements: one to get the response, and one to read all the data, so it spawns at least two threads (please correct me if I'm wrong).
Although the workflow above spawns several threads, the order of execution is serial because each item in the workflow depends on the previous item. Its not really possible to evaluate any items further down the workflow until the other threads return.
Is there any benefit to having more than one let! in the code above?
If not, how would this code need to change to take advantage of multiple let! statements?
The key is we are not spawning any new threads. During the whole course of the workflow, there are 1 or 0 active threads being consumed from the ThreadPool. (An exception, up until the first '!', the code runs on the user thread that did an Async.Run.) "let!" lets go of a thread while the Async operation is at sea, and then picks up a thread from the ThreadPool when the operation returns. The (performance) advantage is less pressure against the ThreadPool (and of course the major user advantage is the simple programming model - a million times better than all that BeginFoo/EndFoo/callback stuff you otherwise write).
See also http://cs.hubfs.net/forums/thread/8262.aspx
I was writing an answer but Brian beat me to it. I fully agree with him.
I'd like to add that if you want to parallelize synchronous code, the right tool is PLINQ, not async workflows, as Don Syme explains.