FSharp: Using CSV Type Provider Async - f#

I am using the csv type provider to collect some data from a series of files I have on Azure blob storage:
#r "../packages/FSharp.Data.2.0.9/lib/portable-net40+sl5+wp8+win8/FSharp.Data.dll"
open FSharp.Data
type censusDataContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/AK.TXT">
type stateCodeContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv">
let stateCodes = stateCodeContext.Load("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv");
let fetchStateData (stateCode:string)=
    let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
    censusDataContext.Load(uri).Rows
let usaData = stateCodes.Rows
              |> Seq.collect(fun r -> fetchStateData(r.Abbreviation))
              |> Seq.length
I now want to run these async and I am running into a problem with AsyncLoad:
let fetchStateDataAsync(stateCode:string)=
    async{
        let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
        let! stateData = censusDataContext.AsyncLoad(uri)
        return stateData.Rows
    }
let usaData = stateCodes.Rows
              |> Seq.collect(fun r -> fetchStateDataAsync(r.Abbreviation))
              |> Seq.length
The error message is
The type 'Async<seq<CsvProvider<...>.Row>>' is not compatible with the type 'seq<'a>'
Forgive my lack of async knowledge, but do I have to use something other than Seq.collect when applying async functions?
Thanks in advance

The problem is that making the code asynchronous (by wrapping it in an async { .. } block) changes the result type from seq<Row> to Async<seq<Row>>. That is, you now get an asynchronous computation that will eventually complete and return the sequence, whereas Seq.collect expects a function that returns a sequence directly.
To fix this, you need to start the computation and wait for the result. There are a number of options, such as running the downloads one by one sequentially. Probably the easiest option (and maybe the best, depending on what you want to do) is to run the computations in parallel:
let getAll =
    stateCodes.Rows
    |> Seq.map(fun r -> fetchStateDataAsync(r.Abbreviation))
    |> Async.Parallel
This gives you an asynchronous computation that runs all the downloads and returns an array of results. You can run this synchronously (and block) and get the results:
getAll
|> Async.RunSynchronously
|> Seq.collect id
|> Seq.length
If you want to run the downloads asynchronously in the background, you can do that too, but you need to specify what to do with the result. For example:
async {
    let! all = getAll
    all |> Seq.collect id |> Seq.length |> printfn "Length %d" }
|> Async.Start
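The sequential option mentioned above could look roughly like the following sketch. It is self-contained, so fetchFake is a hypothetical stand-in for fetchStateDataAsync that fabricates rows instead of downloading anything, and the state codes are only illustrative:

```fsharp
// Hypothetical stand-in for fetchStateDataAsync: fabricates three rows per state
// instead of downloading a file.
let fetchFake (stateCode: string) : Async<seq<string>> =
    async { return Seq.init 3 (fun i -> sprintf "%s-%d" stateCode i) }

// Runs the downloads one at a time; each let! waits for the current one to finish
// before the loop moves on to the next state.
let countSequentially (codes: seq<string>) : Async<int> =
    async {
        let total = ref 0
        for code in codes do
            let! rows = fetchFake code
            total.Value <- total.Value + Seq.length rows
        return total.Value
    }

let total = countSequentially [ "AK"; "AL"; "AZ" ] |> Async.RunSynchronously
printfn "Length %d" total
```

This trades throughput for predictability: only one request is in flight at a time, which may be preferable if the storage account throttles parallel downloads.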

Related

Is there a way in F# to chain computation?

I would like to create a chain of expressions and any of them can fail when the computation should just stop.
With Unix pipes it is usually like this:
bash-3.2$ echo && { echo 'a ok'; echo; } && { echo 'b ok'; echo; }
a ok
b ok
When something fails the pipeline stops:
echo && { echo 'a ok'; false; } && { echo 'b ok'; echo; }
a ok
I can handle Optionals but my problem is that I might want to do multiple things in each branch:
let someExternalOperation = callToAnAPI()
match someExternalOperation with
| None -> LogAndStop()
| Some x -> LogAndContinue()
Then I would like to keep going with other API calls and only stop if there is an error.
Is there something like that in F#?
Update1:
What I am trying to do is calling out to external APIs. Each call can fail. Would be nice to try to retry but not required.

You can use the F# Async and Result types together to represent the results of each API Call. You can then use the bind functions for those types to build a workflow in which you only continue processing when the previous calls were successful. In order to make that easier, you can wrap the Async<Result<_,_>> you would be working with for each api call in its own type and build a module around binding those results to orchestrate a chained computation. Here's a quick example of what that would look like:
First, we would lay out the type ApiCallResult to wrap Async and Result, and we would define ApiCallError to represent HTTP error responses or exceptions:
open System
open System.Net
open System.Net.Http
type ApiCallError =
    | HttpError of (int * string)
    | UnexpectedError of exn
type ApiCallResult<'a> = Async<Result<'a, ApiCallError>>
Next, we would create a module to work with ApiCallResult instances, allowing us to do things like bind, map, and return so that we can process the results of a computation and feed them into the next one.
module ApiCall =
    let ``return`` x : ApiCallResult<_> =
        async { return Ok x }
    let private zero () : ApiCallResult<_> =
        ``return`` []
    let bind<'a, 'b> (f: 'a -> ApiCallResult<'b>) (x: ApiCallResult<'a>) : ApiCallResult<'b> =
        async {
            let! result = x
            match result with
            | Ok value ->
                return! f value
            | Error error ->
                return Error error
        }
    let map f x = x |> bind (f >> ``return``)
    let combine<'a> (acc: ApiCallResult<'a list>) (cur: ApiCallResult<'a>) =
        acc |> bind (fun values -> cur |> map (fun value -> value :: values))
    let join results =
        results |> Seq.fold (combine) (zero ())
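Before wiring in real HTTP calls, the short-circuiting behaviour of bind can be checked in isolation. The following is only a sketch under simplifying assumptions: StepResult uses a plain string error instead of ApiCallError, and step and executed are hypothetical helpers that record which steps actually ran.

```fsharp
// Simplified stand-in for ApiCallResult: a string replaces ApiCallError.
type StepResult<'a> = Async<Result<'a, string>>

// Same shape as ApiCall.bind: continue on Ok, pass the Error through untouched.
let bind (f: 'a -> StepResult<'b>) (x: StepResult<'a>) : StepResult<'b> =
    async {
        let! result = x
        match result with
        | Ok value -> return! f value
        | Error e -> return Error e
    }

// Records the name of every step that actually executes.
let executed = ResizeArray<string>()

// Hypothetical step: logs its name, then succeeds or fails with a fixed outcome.
let step (name: string) (outcome: Result<int, string>) (input: int) : StepResult<int> =
    async {
        executed.Add name
        return outcome |> Result.map (fun v -> v + input)
    }

let outcome =
    async { return Ok 0 }
    |> bind (step "a" (Ok 1))
    |> bind (step "b" (Error "b failed"))
    |> bind (step "c" (Ok 3))
    |> Async.RunSynchronously

printfn "%A after running %A" outcome (List.ofSeq executed)
```

Step "c" never runs because bind only calls the next function when the previous result is Ok, which mirrors the `&&` behaviour from the shell example in the question.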
Then, you would have a module to simply do your API calls, however that works in your real scenario. Here's one that just handles GETs with query parameters, but you could make this more sophisticated:
module Api =
    let call (baseUrl: Uri) (queryString: string) : ApiCallResult<string> =
        async {
            try
                use client = new HttpClient()
                let url =
                    let builder = UriBuilder(baseUrl)
                    builder.Query <- queryString
                    builder.Uri
                printfn "Calling API: %O" url
                let! response = client.GetAsync(url) |> Async.AwaitTask
                // Read the body once: it is the payload on success and the error detail otherwise.
                let! content = response.Content.ReadAsStringAsync() |> Async.AwaitTask
                if response.IsSuccessStatusCode then
                    return Ok content
                else
                    return Error <| HttpError (response.StatusCode |> int, content)
            with ex ->
                return Error <| UnexpectedError ex
        }
    let getQueryParam name value =
        value |> WebUtility.UrlEncode |> sprintf "%s=%s" name
Finally, you would have your actual business workflow logic, where you call multiple APIs and feed the results of one into another. In the below example, anywhere you see callMathApi, it is making a call to an external REST API that may fail, and by using the ApiCall module to bind the results of the API call, it only proceeds to the next API call if the previous call was successful. You can declare an operator like >>= to eliminate some of the noise in the code when binding computations together:
module MathWorkflow =
    let private (>>=) x f = ApiCall.bind f x
    let private apiUrl = Uri "http://api.mathjs.org/v4/" // REST API for mathematical expressions
    let private callMathApi expression =
        expression |> Api.getQueryParam "expr" |> Api.call apiUrl
    let average values =
        values
        |> List.map (sprintf "%d")
        |> String.concat "+"
        |> callMathApi
        >>= fun sum ->
                sprintf "%s/%d" sum values.Length
                |> callMathApi
    let averageOfSquares values =
        values
        |> List.map (fun value -> sprintf "%d*%d" value value)
        |> List.map callMathApi
        |> ApiCall.join
        |> ApiCall.map (List.map int)
        >>= average
This example uses the Mathjs.org API to compute the average of a list of integers (making one API call to compute the sum, then another to divide by the number of elements), and also allows you to compute the average of the squares of a list of values, by calling the API asynchronously for each element in the list to square it, then joining the results together and computing the average. You can use these functions as follows (I added a printfn to the actual API call so it logs the HTTP requests):
Calling average:
MathWorkflow.average [1;2;3;4;5] |> Async.RunSynchronously
Outputs:
Calling API: http://api.mathjs.org/v4/?expr=1%2B2%2B3%2B4%2B5
Calling API: http://api.mathjs.org/v4/?expr=15%2F5
[<Struct>]
val it : Result<string,ApiCallError> = Ok "3"
Calling averageOfSquares:
MathWorkflow.averageOfSquares [2;4;6;8;10] |> Async.RunSynchronously
Outputs:
Calling API: http://api.mathjs.org/v4/?expr=2*2
Calling API: http://api.mathjs.org/v4/?expr=4*4
Calling API: http://api.mathjs.org/v4/?expr=6*6
Calling API: http://api.mathjs.org/v4/?expr=8*8
Calling API: http://api.mathjs.org/v4/?expr=10*10
Calling API: http://api.mathjs.org/v4/?expr=100%2B64%2B36%2B16%2B4
Calling API: http://api.mathjs.org/v4/?expr=220%2F5
[<Struct>]
val it : Result<string,ApiCallError> = Ok "44"
Ultimately, you may want to implement a custom Computation Builder to allow you to use a computation expression with the let! syntax, instead of explicitly writing the calls to ApiCall.bind everywhere. This is fairly simple, since you already do all the real work in the ApiCall module, and you just need to make a class with the appropriate Bind/Return members:
type ApiCallBuilder () =
    member __.Bind (x, f) = ApiCall.bind f x
    member __.Return x = ApiCall.``return`` x
    member __.ReturnFrom x = x
    member __.Zero () = ApiCall.``return`` ()
let apiCall = ApiCallBuilder()
With the ApiCallBuilder, you could rewrite the functions in the MathWorkflow module like this, making them a little easier to read and compose:
let average values =
    apiCall {
        let! sum =
            values
            |> List.map (sprintf "%d")
            |> String.concat "+"
            |> callMathApi
        return!
            sprintf "%s/%d" sum values.Length
            |> callMathApi
    }
let averageOfSquares values =
    apiCall {
        let! squares =
            values
            |> List.map (fun value -> sprintf "%d*%d" value value)
            |> List.map callMathApi
            |> ApiCall.join
        return! squares |> List.map int |> average
    }
These work as you described in the question, where each API call is made independently and the results feed into the next call, but if one call fails the computation is stopped and the error is returned. For example, if you change the URL used in the example calls here to the v3 API ("http://api.mathjs.org/v3/") without changing anything else, you get the following:
Calling API: http://api.mathjs.org/v3/?expr=2*2
[<Struct>]
val it : Result<string,ApiCallError> =
Error
(HttpError
(404,
"<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
</head>
<body>
<pre>Cannot GET /v3/</pre>
</body>
</html>
"))

F# idiomatic conversion of async while loop accumulation

What is the idiomatic F# way of handling an asynchronous while loop accumulation?
I'm working with the new (still in preview) Azure Cosmos DB SDK. Querying the database returns a CosmosResultSetIterator<T> which has a HasMoreResults property and a FetchNextSetAsync() method. My straight-up translation of the C# code looks like this:
let private fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) =
    let results = ResizeArray<'a>()
    async {
        while resultSetIterator.HasMoreResults do
            let! response = resultSetIterator.FetchNextSetAsync() |> Async.AwaitTask
            results.AddRange(response |> Seq.toArray)
        return Seq.toList results
    }

I would take a look at the AsyncSeq package. You can use it to create asynchronously computed sequences and then iterate them asynchronously or in parallel. This allows for the async-binding to be inside the sequence and the yield to occur asynchronously, so you don't have to build up an accumulator explicitly.
You can use it to do something like:
open FSharp.Control
let private fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) =
    asyncSeq {
        while resultSetIterator.HasMoreResults do
            let! response = resultSetIterator.FetchNextSetAsync() |> Async.AwaitTask
            yield! response |> AsyncSeq.ofSeq
    }

IMHO tail-recursion is preferable to while loops as it's one way to avoid mutation.
For example:
let fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) =
    let rec loop results =
        async {
            if resultSetIterator.HasMoreResults then
                let! vs = resultSetIterator.FetchNextSetAsync () |> Async.AwaitTask
                let vs = vs |> Seq.toList
                return! loop (vs::results)
            else
                // List.rev needed because batches are in reverse
                return results |> List.rev |> List.concat
        }
    loop []
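To try the recursive loop without a Cosmos DB account, one option is a small in-memory iterator. In this sketch FakeIterator is a hypothetical type that only mimics the two members used above (HasMoreResults and FetchNextSetAsync); it is not part of the Cosmos DB SDK, and fetchAll is the same loop rewritten against it.

```fsharp
open System.Threading.Tasks

// Hypothetical in-memory iterator: hands out one pre-built page per call.
type FakeIterator<'a>(pages: 'a list list) =
    let mutable remaining = pages
    member __.HasMoreResults = not (List.isEmpty remaining)
    member __.FetchNextSetAsync() : Task<seq<'a>> =
        match remaining with
        | page :: rest ->
            remaining <- rest
            Task.FromResult(Seq.ofList page)
        | [] -> Task.FromResult(Seq.empty)

// Same accumulation strategy as above: prepend each batch, reverse once at the end.
let fetchAll (iterator: FakeIterator<'a>) : Async<'a list> =
    let rec loop batches =
        async {
            if iterator.HasMoreResults then
                let! batch = iterator.FetchNextSetAsync() |> Async.AwaitTask
                return! loop (List.ofSeq batch :: batches)
            else
                return batches |> List.rev |> List.concat
        }
    loop []

let items =
    FakeIterator([ [ 1; 2 ]; [ 3 ]; [ 4; 5 ] ])
    |> fetchAll
    |> Async.RunSynchronously
```

Prepending batches and reversing once keeps each step cheap, whereas appending to the end of an immutable list on every page would copy the accumulated list each time.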

Very recently, FSharp.Control.TaskSeq was added to support tasks natively with seqs. The answer here by @Just another metaprogrammer can be rewritten as
#r "nuget: FSharp.Control.TaskSeq"
open FSharp.Control
let private fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) = taskSeq {
    while resultSetIterator.HasMoreResults do
        let! response = resultSetIterator.FetchNextSetAsync()
        yield! response |> TaskSeq.ofSeq
}

How to await an async method, similar to C#

How do I do a simple await in F#?
In C# I have code like this:
await collection.InsertOneAsync(DO);
var r = collection.ReplaceOneAsync((fun d -> d.Id = DO.Id), DO)
So I created a let await = ... so that my F# code looks more like my C# code.
My current F# code is this:
let awaits (t: Threading.Tasks.Task) = t |> Async.AwaitTask |> Async.RunSynchronously
let await (t: Threading.Tasks.Task<'T>) = t |> Async.AwaitTask |> Async.RunSynchronously
let Busca (numero) =
    let c = collection.Find(fun d -> d.Numero=numero).ToList()
    c
let Insere(DO: DiarioOficial) =
    //collection.InsertOneAsync(DO) |> Async.AwaitTask |> Async.RunSynchronously
    collection.InsertOneAsync(DO) |> awaits
let Salva (DO: DiarioOficial) =
    //let r = collection.ReplaceOneAsync((fun d -> d.Id = DO.Id), DO) |> Async.AwaitTask |> Async.RunSynchronously
    let r = collection.ReplaceOneAsync((fun d -> d.Id = DO.Id), DO) |> await
    r
I want a single definition instead of both await and awaits, but this is the best I could do, because in Insere the type is Task, while in Salva the type is Task<'T>.
If I use only await, I get this compile error:
FS0001 The type 'Threading.Tasks.Task' is not compatible with the type 'Threading.Tasks.Task<'a>'
If I use only awaits, it compiles, but I lose the return value of the Task<'T>.
I want to merge await and awaits into a single
let await = ...
How can I do this?

In F# we tend to use a different syntax. It is described, for example, here: https://fsharpforfunandprofit.com/posts/concurrency-async-and-parallel/
or here: https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/asynchronous-and-concurrent-programming/async
The idea when working with C# Tasks is to convert them to Async<'T> with Async.AwaitTask.
There are probably other ways to do it, but this is the most straightforward.

There are two parts to writing async code in both F# and C#:
1. You need to mark the method or code block as asynchronous. In C#, this is done using the async keyword. The F# equivalent is to use the async { ... } block (which is an expression, but otherwise, it is similar).
2. Inside an async method or an async { .. } block, you can make non-blocking calls. In C#, this is done using await and in F# it is done using let!. Note that this is not just a function call - the compiler handles this in a special way.
F# also uses Async<T> type rather than Task<T>, but those are easy to convert - e.g. using Async.AwaitTask. So, you probably want something like this:
let myAsyncFunction (DO: DiarioOficial) = async {
    let! _ = collection.InsertOneAsync(DO) |> Async.AwaitTask
    let! r = collection.ReplaceOneAsync((fun d -> d.Id = DO.Id), DO) |> Async.AwaitTask
    // More code goes here
    return r
}
I used let! to show the idea, but if you have an asynchronous operation that returns unit, you can also use do!
do! collection.InsertOneAsync(DO) |> Async.AwaitTask
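On the original question of handling both Task and Task<'T>: Async.AwaitTask has an overload for each, so inside an async block a separate helper per task type may not be needed. A minimal sketch, in which insertOneAsync and replaceOneAsync are hypothetical stand-ins for the driver calls (they are not the MongoDB API):

```fsharp
open System.Threading.Tasks

// Hypothetical stand-in for a call returning a plain Task (like InsertOneAsync).
let insertOneAsync (doc: string) : Task = Task.Delay 10

// Hypothetical stand-in for a call returning Task<'T> (like ReplaceOneAsync).
let replaceOneAsync (doc: string) : Task<int> = Task.FromResult doc.Length

let save (doc: string) : Async<int> =
    async {
        // Task overload: the awaited value is unit, so do! fits.
        do! insertOneAsync doc |> Async.AwaitTask
        // Task<'T> overload: the awaited value is the task's result.
        let! replaced = replaceOneAsync doc |> Async.AwaitTask
        return replaced
    }

// Block only once, at the outermost edge of the program.
let result = save "abc" |> Async.RunSynchronously
```

Keeping Async.RunSynchronously at the edge, rather than inside a helper that blocks on every call, lets the intermediate steps stay non-blocking.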

How to convert the download program to async?

I have the following code
open FSharp.Data
let downloadFile link =
    ......
    use os = File.Create(...)
    Http.RequestStream(....).ResponseStream.CopyTo(os)
let rec consume() = async {
    ......
    |> Seq.iter (fun x ->
        xxx |> Seq.iter(fun link ->
            downloadFile link
        ))
}
I found that the synchronous download prevents the code from running concurrently, so I'm trying to do something like the following. How can I change it to use the FSharp.Data Http.AsyncRequestStream? Maybe the CopyTo can be async too?
open FSharp.Data
let downloadFile link = async {
    ......
    use os = File.Create(...)
    Http.AsyncRequestStream(....).ResponseStream.CopyTo(os) // Error
}
let rec consume() = async {
    ......
    |> Seq.iter (fun x ->
        xxx |> Seq.iter(fun link ->
            downloadFile link |> Async.Start // do! downloadFile link????
        ))
}
consume() |> Async.RunSynchronously

Here's a skeleton solution, worthy of all the blank spots in your example:
let downloadFile link =
    async {
        ......
        use os = File.Create(...)
        let! resp = Http.AsyncRequestStream(....)
        return resp.ResponseStream.CopyTo(os)
    }
let consume link =
    async {
        let comps : Async<unit> [] =
            xxx
            |> Seq.map (fun link -> downloadFile link)
            |> Array.ofSeq
        return! Async.Parallel comps
    }
I think you should read up on asynchronicity and concurrency in general, as well as how to use it in F# in particular. From the OP it seems the whole thing is a bit hazy to you.
Edit: to answer the question in the comment:
With return! (or let!, or do!) you execute the nested workflow asynchronously, then pick up executing the current workflow from that point. That is, everything "below" the do! is put into a continuation that gets called once the thing "after" the do! finishes.
Whereas Async.Start fires up the workflow on (another) background thread and returns immediately without waiting for it to finish.
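The difference can be observed with a small experiment. This is only a sketch: work is a hypothetical workload that sleeps and then sets a flag, and the 200 ms delay is an arbitrary value chosen to make the ordering visible.

```fsharp
// Hypothetical workload: sleeps, then records that it has finished.
let work (finished: bool ref) =
    async {
        do! Async.Sleep 200
        finished.Value <- true
    }

let awaited = ref false
let started = ref false

let afterAwait, afterStart =
    async {
        // do! suspends this workflow until work has completed.
        do! work awaited
        let seenAfterAwait = awaited.Value
        // Async.Start queues the work and returns immediately.
        work started |> Async.Start
        let seenAfterStart = started.Value
        return seenAfterAwait, seenAfterStart
    }
    |> Async.RunSynchronously

printfn "after do!: %b, right after Async.Start: %b" afterAwait afterStart
```

The first flag is set by the time the line after do! runs, while the second is still unset right after Async.Start because the started workflow has not had time to finish. This is also why firing downloads with Async.Start inside consume gives the caller no way to know when they are done.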

Process function in parallel/async and append results, returning one list of results?

I have a function that returns a string[].
let asyncScrape url allParameters =
    allParameters
    |> Seq.map(fun v ->
        yearAndClassResultsAsync url v)
    |> Async.Parallel
    |> Async.RunSynchronously
I want to iterate through that string array, sending each string to a function called resultsBody (which returns a seq), and finally return a single sequence that is the concatenation of the results from resultsBody.
I tried doing something like below, but I'm rather lost as it returns:
seq<string[]>[]
and I just want a single combined
seq<string[]>
My attempts so far:
let parseSite html =
    Array.mapi (fun s -> resultsBody) html

Simplified, I think your problem is that you have a nested sequence of strings and you want to get as much parallelism as possible, rather than just at the innermost layer of the nesting.
One way you can do this is to also nest the Async.Parallel calls before calling Async.RunSynchronously. Here's a simple example of the technique:
let squareInt n = async { return n*n }
let inParallel (seqOfseqOfInts : seq<seq<int>>) =
    seqOfseqOfInts
    |> Seq.map // deal with each inner seq of ints
        (fun (seqOfInts : seq<int>) ->
            seqOfInts
            |> Seq.map squareInt
            |> Async.Parallel // this gives us Async<int[]>
        ) // this gives us seq<Async<int[]>>
    |> Async.Parallel // this gives us Async<int[][]>
    |> Async.RunSynchronously // this gives us int[][]
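Since the question asks for one combined result rather than a nested one, the jagged array can be flattened afterwards. A minimal sketch building on the same squareInt example; flattenedInParallel is a hypothetical name and the sample input is made up:

```fsharp
let squareInt n = async { return n * n }

// Same nested Async.Parallel technique, followed by Array.concat to flatten
// int[][] into a single int[].
let flattenedInParallel (seqOfSeqOfInts: seq<seq<int>>) : int[] =
    seqOfSeqOfInts
    |> Seq.map (fun inner -> inner |> Seq.map squareInt |> Async.Parallel)
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Array.concat

let flat = flattenedInParallel [ seq [ 1; 2 ]; seq [ 3 ]; seq [ 4; 5 ] ]
printfn "%A" flat
```

Async.Parallel returns results in the order of its inputs, so the flattened array keeps the original ordering even though the computations may finish in a different order.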
