I have the following code:
open FSharp.Data
let downloadFile link =
    ......
    use os = File.Create(...)
    Http.RequestStream(....).ResponseStream.CopyTo(os)
let rec consume() = async {
    ......
    |> Seq.iter (fun x ->
        xxx |> Seq.iter(fun link ->
            downloadFile link
        ))
}
I found that the synchronous download keeps the code from running concurrently, so I'm trying to do something like the following. How can I change it to use FSharp.Data's Http.AsyncRequestStream? Can the CopyTo be made async too?
open FSharp.Data
let downloadFile link = async {
    ......
    use os = File.Create(...)
    Http.AsyncRequestStream(....).ResponseStream.CopyTo(os) // Error
}
let rec consume() = async {
    ......
    |> Seq.iter (fun x ->
        xxx |> Seq.iter(fun link ->
            downloadFile link |> Async.Start // do! downloadFile link????
        ))
}
consume() |> Async.RunSynchronously
Here's a skeleton solution, worthy of all the blank spots in your example:
let downloadFile link =
    async {
        ......
        use os = File.Create(...)
        let! resp = Http.AsyncRequestStream(....)
        return resp.ResponseStream.CopyTo(os)
    }
let consume () =
    async {
        let comps : Async<unit> [] =
            xxx
            |> Seq.map (fun link -> downloadFile link)
            |> Array.ofSeq
        return! Async.Parallel comps
    }
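The skeleton still calls the blocking Stream.CopyTo. Since Stream.CopyToAsync returns a Task, Async.AwaitTask can turn it into an Async<unit> usable with do!. The sketch below is self-contained, so it assumes a hypothetical openSourceStream that returns an in-memory stream in place of the real HTTP response stream, and a MemoryStream in place of File.Create. It also shows that a for loop inside async can use let!/do! per item (which the lambda passed to Seq.iter in the question cannot), next to the Async.Parallel version.

```fsharp
open System.IO
open System.Text

// Hypothetical stand-in for the response stream of Http.AsyncRequestStream(...):
// an in-memory stream holding a few bytes per link.
let openSourceStream (link: string) : Stream =
    new MemoryStream(Encoding.UTF8.GetBytes("payload of " + link)) :> Stream

// Non-blocking copy: Stream.CopyToAsync returns a Task,
// and Async.AwaitTask turns it into an Async<unit> usable with do!.
let copyLinkAsync (link: string) (dest: Stream) = async {
    use src = openSourceStream link
    do! src.CopyToAsync(dest) |> Async.AwaitTask }

// Copies one link into a fresh MemoryStream (standing in for File.Create)
// and reports how many bytes were written.
let copyAndMeasure (link: string) = async {
    use dest = new MemoryStream()
    do! copyLinkAsync link dest
    return dest.Length }

let links = [ "a"; "b"; "c" ]

// Sequential: a for loop inside async can await each item in turn.
let sequentialSizes = async {
    let sizes = ResizeArray<int64>()
    for link in links do
        let! size = copyAndMeasure link
        sizes.Add(size)
    return List.ofSeq sizes }

// Concurrent: one Async per link, run together; results keep input order.
let parallelSizes = async {
    let! sizes = links |> List.map copyAndMeasure |> Async.Parallel
    return List.ofArray sizes }
```

In the real code, copyLinkAsync would bind the response with let! resp = Http.AsyncRequestStream(....) and copy from resp.ResponseStream instead.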
I think you should read up on asynchrony and concurrency in general, and on how to use them in F# in particular. From the OP it seems the whole thing is a bit hazy to you.
Edit: to answer the question in the comment:
With return! (or let!, or do!) you execute the nested workflow asynchronously, then pick up executing the current workflow from that point. That is, everything "below" the do! is put into a continuation that gets called once the thing "after" the do! finishes.
Whereas Async.Start fires up the workflow on (another) background thread and returns immediately without waiting for it to finish.
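A small sketch of that difference, using only FSharp.Core and System.Threading (the logging and gating are illustrative, not part of the original answer): with do! the parent logs "after" only once the child has finished, whereas with Async.Start the parent carries on immediately. In the second function the child waits on a ManualResetEventSlim that the parent opens only after logging "after", which makes the otherwise racy order deterministic.

```fsharp
open System.Threading

// do!: everything after it is a continuation that runs once the child completes.
let orderWithDoBang () =
    let log = ResizeArray<string>()
    async {
        log.Add "before"
        do! async { log.Add "child" }
        log.Add "after"
    }
    |> Async.RunSynchronously
    List.ofSeq log

// Async.Start: the child is queued on the thread pool and the parent
// continues without waiting for it.
let orderWithAsyncStart () =
    let log = ResizeArray<string>()
    let record msg = lock log (fun () -> log.Add msg)
    let gate = new ManualResetEventSlim(false)
    let finished = new ManualResetEventSlim(false)
    async {
        record "before"
        Async.Start(async {
            gate.Wait()        // blocks until the parent opens the gate
            record "child"
            finished.Set() })
        record "after"         // reached while the child is still waiting
        gate.Set()
    }
    |> Async.RunSynchronously
    finished.Wait()            // only so the caller can inspect the full log
    lock log (fun () -> List.ofSeq log)
```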
What is the idiomatic F# way of handling an asynchronous while loop accumulation?
I'm working with the new (still in preview) Azure Cosmos DB SDK. Querying the database returns a CosmosResultSetIterator<T> which has a HasMoreResults property and a FetchNextSetAsync() method. My straight-up translation of the C# code looks like this:
let private fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) =
    let results = ResizeArray<'a>()
    async {
        while resultSetIterator.HasMoreResults do
            let! response = resultSetIterator.FetchNextSetAsync() |> Async.AwaitTask
            results.AddRange(response |> Seq.toArray)
        return Seq.toList results
    }
I would take a look at the AsyncSeq package. You can use it to create asynchronously computed sequences and then iterate them asynchronously or in parallel. This allows for the async-binding to be inside the sequence and the yield to occur asynchronously, so you don't have to build up an accumulator explicitly.
You can use it to do something like:
open FSharp.Control
let private fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) =
    asyncSeq {
        while resultSetIterator.HasMoreResults do
            let! response = resultSetIterator.FetchNextSetAsync() |> Async.AwaitTask
            yield! response |> AsyncSeq.ofSeq
    }
IMHO tail-recursion is preferable to while loops as it's one way to avoid mutation.
For example:
let fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) =
    let rec loop results =
        async {
            if resultSetIterator.HasMoreResults then
                let! vs = resultSetIterator.FetchNextSetAsync () |> Async.AwaitTask
                let vs = vs |> Seq.toList
                return! loop (vs::results)
            else
                // List.rev needed because batches are in reverse
                return results |> List.rev |> List.concat
        }
    loop []
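To try the tail-recursive loop without Cosmos DB, here is a sketch that swaps in a hypothetical FakePager type exposing the same HasMoreResults / FetchNextSetAsync() shape. It is not the Cosmos SDK type, and its FetchNextSetAsync returns Task<seq<'a>> for simplicity.

```fsharp
open System.Threading.Tasks

// Hypothetical stand-in for CosmosResultSetIterator<'a>: serves pre-built pages.
type FakePager<'a>(pages: 'a list list) =
    let mutable remaining = pages
    member this.HasMoreResults = not (List.isEmpty remaining)
    member this.FetchNextSetAsync() : Task<seq<'a>> =
        match remaining with
        | page :: rest ->
            remaining <- rest
            Task.FromResult(Seq.ofList page)
        | [] -> Task.FromResult(Seq.empty<'a>)

// Same shape as the answer: each batch is consed onto an immutable list
// instead of being added to a mutable accumulator.
let fetchAll (pager: FakePager<'a>) =
    let rec loop results =
        async {
            if pager.HasMoreResults then
                let! vs = pager.FetchNextSetAsync() |> Async.AwaitTask
                return! loop (Seq.toList vs :: results)
            else
                // batches were prepended, so reverse before flattening
                return results |> List.rev |> List.concat
        }
    loop []
```

Because the recursive call is in return! position, each page is awaited and the loop continues without growing the stack.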
Very recently, FSharp.Control.TaskSeq was added, which supports task-based sequences natively. The answer here by Just another metaprogrammer can be rewritten as:
#r "nuget: FSharp.Control.TaskSeq"
open FSharp.Control
let private fetchItemsFromResultSet (resultSetIterator: CosmosResultSetIterator<'a>) = taskSeq {
    while resultSetIterator.HasMoreResults do
        let! response = resultSetIterator.FetchNextSetAsync()
        yield! response |> TaskSeq.ofSeq
}
How to do a simple await in F#?
In C# I have code like this:
await collection.InsertOneAsync(DO);
var r = await collection.ReplaceOneAsync(d => d.Id == DO.Id, DO);
So I defined a let await = ... helper to make my F# code look more like my C# code.
My current F# code is this:
let awaits (t: Threading.Tasks.Task) = t |> Async.AwaitTask |> Async.RunSynchronously
let await (t: Threading.Tasks.Task<'T>) = t |> Async.AwaitTask |> Async.RunSynchronously
let Busca (numero) =
    let c = collection.Find(fun d -> d.Numero=numero).ToList()
    c
let Insere(DO: DiarioOficial) =
    //collection.InsertOneAsync(DO) |> Async.AwaitTask |> Async.RunSynchronously
    collection.InsertOneAsync(DO) |> awaits
let Salva (DO: DiarioOficial) =
    //let r = collection.ReplaceOneAsync((fun d -> d.Id = DO.Id), DO) |> Async.AwaitTask |> Async.RunSynchronously
    let r = collection.ReplaceOneAsync((fun d -> d.Id = DO.Id), DO) |> await
    r
I want to have only one definition for await (awaits), but this is the best I could do, because in Insere the type is Task, while in Salva it is Task<'T>.
If I use only await, I get this compile error:
FS0001 The type 'Threading.Tasks.Task' is not compatible with the type 'Threading.Tasks.Task<'a>'
If I use only awaits, it compiles, but I lose the return type of the Task.
I want to merge await and awaits into a single definition:
let await = ...
How can I do this?
In F# we tend to use another syntax. It is described e.g. here: https://fsharpforfunandprofit.com/posts/concurrency-async-and-parallel/.
or here: https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/asynchronous-and-concurrent-programming/async
The idea when working with C# Tasks is to "convert" them to Async using Async.AwaitTask.
There are probably other ways to do it, but this is the most straightforward.
There are two parts of writing async code in both F# and C#.
You need to mark the method or code block as asynchronous. In C#, this is done using the async keyword. The F# equivalent is to use the async { ... } block (which is an expression, but otherwise, it is similar).
Inside an async method or an async { .. } block, you can make non-blocking calls. In C#, this is done using await and in F# it is done using let!. Note that this is not just a function call - the compiler handles this in a special way.
F# also uses Async<T> type rather than Task<T>, but those are easy to convert - e.g. using Async.AwaitTask. So, you probably want something like this:
let myAsyncFunction () = async {
    let! _ = collection.InsertOneAsync(DO) |> Async.AwaitTask
    let! r = collection.ReplaceOneAsync((fun d -> d.Id = DO.Id), DO) |> Async.AwaitTask
    // More code goes here
    return r
}
I used let! to show the idea, but if you have an asynchronous operation that returns unit, you can also use do!
do! collection.InsertOneAsync(DO) |> Async.AwaitTask
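Because Async.AwaitTask is overloaded for both Task and Task<'T>, the same call covers InsertOneAsync and ReplaceOneAsync inside an async block, so separate await/awaits helpers aren't needed. A runnable sketch, with hypothetical insertOneAsync/replaceOneAsync functions standing in for the MongoDB driver (which isn't referenced here):

```fsharp
open System.Threading.Tasks

// Hypothetical stand-ins: one returns a plain Task (like InsertOneAsync),
// the other a Task<'T> (like ReplaceOneAsync).
let insertOneAsync (doc: string) : Task = Task.Delay(10)
let replaceOneAsync (doc: string) : Task<string> = Task.FromResult("replaced " + doc)

let salvaAsync (doc: string) = async {
    do! insertOneAsync doc |> Async.AwaitTask         // Task -> Async<unit>
    let! r = replaceOneAsync doc |> Async.AwaitTask   // Task<string> -> Async<string>
    return r }

// Block once, at the outermost edge of the program.
let result = salvaAsync "DO-1" |> Async.RunSynchronously
```

Blocking with Async.RunSynchronously then happens in one place rather than after every call, as the question's helpers do.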
I'm doing many async web requests and using Async.Parallel. Something like:
xs
|> Seq.map (fun u -> downloadAsync u.Url)
|> Async.Parallel
|> Async.Catch
Some requests may throw exceptions; I want to log them and continue with the rest of the URLs. I found the Async.Catch function, but this stops the computation when the first exception is thrown. I know I can use a try...with expression within the async expression in order to compute the entire list, but I think this implies passing a log function to my downloadAsync function, changing its type. Is there any other way to catch the exceptions, log them and continue with the rest of the URLs?
The 'trick' is to move the catch into the map such that catching is parallelized as well:
open System
open System.IO
open System.Net
type T = { Url : string }
let xs = [
    { Url = "http://microsoft.com" }
    { Url = "thisDoesNotExists" } // throws when constructing Uri, before downloading
    { Url = "https://thisDotNotExist.Either" }
    { Url = "http://google.com" }
]
let isAllowedInFileName c =
    not <| Seq.contains c (Path.GetInvalidFileNameChars())
let downloadAsync url =
    async {
        use client = new WebClient()
        let fn =
            [|
                __SOURCE_DIRECTORY__
                url |> Seq.filter isAllowedInFileName |> String.Concat
            |]
            |> Path.Combine
        printfn "Downloading %s to %s" url fn
        return! client.AsyncDownloadFile(Uri(url), fn)
    }
xs
|> Seq.map (fun u -> downloadAsync u.Url |> Async.Catch)
|> Async.Parallel
|> Async.RunSynchronously
|> Seq.iter (function
    | Choice1Of2 () -> printfn "Succeeded"
    | Choice2Of2 exn -> printfn "Failed with %s" exn.Message)
(*
Downloading http://microsoft.com to httpmicrosoft.com
Downloading thisDoesNotExists to thisDoesNotExists
Downloading http://google.com to httpgoogle.com
Downloading https://thisDotNotExist.Either to httpsthisDotNotExist.Either
Succeeded
Failed with Invalid URI: The format of the URI could not be determined.
Failed with The remote name could not be resolved: 'thisdotnotexist.either'
Succeeded
*)
Here I wrapped the download into another async to capture the Uri construction exception.
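If the concern is that logging would change downloadAsync's type, the Async.Catch can also live in a separate wrapper applied inside the map. A minimal sketch, assuming a hypothetical fakeDownloadAsync that only parses the URL (no network access) and a hypothetical catchAndLog helper that logs failures and returns an option:

```fsharp
open System

// Hypothetical stand-in for downloadAsync: parses the URL and returns its host.
let fakeDownloadAsync (url: string) = async {
    let uri = Uri(url)   // throws UriFormatException for a relative string
    return uri.Host }

// Hypothetical helper: runs any Async<'T>, logs a failure and returns None,
// so the wrapped function keeps its own type.
let catchAndLog (log: string -> unit) (name: string) (work: Async<'T>) = async {
    let! outcome = Async.Catch work
    match outcome with
    | Choice1Of2 value -> return Some value
    | Choice2Of2 exn ->
        log (sprintf "%s failed: %s" name exn.Message)
        return None }

let urls = [ "http://microsoft.com"; "thisDoesNotExists"; "http://google.com" ]

// Each element catches its own exception, so one failure cannot cancel the others;
// Async.Parallel keeps the results in input order.
let results =
    urls
    |> List.map (fun url -> fakeDownloadAsync url |> catchAndLog (printfn "%s") url)
    |> Async.Parallel
    |> Async.RunSynchronously
```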
I have the following interface method:
Task<string[]> GetBlobsFromContainer(string containerName);
and its implementation in C#:
var container = await _containerClient.GetContainer(containerName);
var tasks = container.ListBlobs()
.Cast<CloudBlockBlob>()
.Select(b => b.DownloadTextAsync());
return await Task.WhenAll(tasks);
When I try to rewrite it in F#:
member this.GetBlobsFromContainer(containerName : string) : Task<string[]> =
    let task = async {
        let! container = containerClient.GetContainer(containerName) |> Async.AwaitTask
        return
            container.ListBlobs()
            |> Seq.cast<CloudBlockBlob>
            |> Seq.map (fun b -> b.DownloadTextAsync())
            |> ??
    }
    task |> ??
I'm stuck with the last lines.
How to return to Task<string[]> from F# properly?
I had to guess what the type of containerClient is, and the closest I found is CloudBlobClient (which does not have getContainer: string -> Task<CloudBlobContainer>, but it shouldn't be too hard to adapt). Then your function might look as follows:
open System
open System.Threading.Tasks
open Microsoft.WindowsAzure.Storage.Blob
open Microsoft.WindowsAzure.Storage
let containerClient : CloudBlobClient = null
let GetBlobsFromContainer(containerName : string) : Task<string[]> =
    async {
        let container = containerClient.GetContainerReference(containerName)
        return!
            container.ListBlobs()
            |> Seq.cast<CloudBlockBlob>
            |> Seq.map (fun b -> b.DownloadTextAsync() |> Async.AwaitTask)
            |> Async.Parallel
    } |> Async.StartAsTask
I changed the return type to be Task<string[]> instead of Task<string seq> as I suppose you want to keep the interface. Otherwise, I'd suggest getting rid of the Task and using Async in F#-only code.
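A self-contained sketch of the same pipeline (Async.AwaitTask, then Async.Parallel, then Async.StartAsTask), with a hypothetical downloadTextAsync in place of CloudBlockBlob.DownloadTextAsync() and a plain list of blob names in place of container.ListBlobs(), so it runs without the Azure Storage SDK:

```fsharp
open System.Threading.Tasks

// Hypothetical stand-in for CloudBlockBlob.DownloadTextAsync(): one Task<string> per blob.
let downloadTextAsync (name: string) : Task<string> =
    Task.FromResult("contents of " + name)

let getBlobsFromContainer (blobNames: string list) : Task<string[]> =
    async {
        return!
            blobNames
            |> List.map (fun name -> downloadTextAsync name |> Async.AwaitTask) // Task<string> -> Async<string>
            |> Async.Parallel                                                   // Async<string[]>, input order kept
    }
    |> Async.StartAsTask                                                        // Async<string[]> -> Task<string[]>

// .Result blocks; acceptable at the edge of a script, not inside async code.
let texts = (getBlobsFromContainer [ "a.txt"; "b.txt" ]).Result
```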
Will this work?
member this.GetBlobsFromContainer(containerName : string) : Task<string seq> =
    let aMap f x = async {
        let! a = x
        return f a }
    let task = async {
        let! container = containerClient.GetContainer(containerName) |> Async.AwaitTask
        return!
            container.ListBlobs()
            |> Seq.cast<CloudBlockBlob>
            |> Seq.map (fun b -> b.DownloadTextAsync() |> Async.AwaitTask)
            |> Async.Parallel
            |> aMap Array.toSeq
    }
    task |> Async.StartAsTask
I had to make some assumptions about containerClient etc. so I haven't been able to test this, but at least it compiles.
I am using the csv type provider to collect some data from a series of files I have on Azure blob storage:
#r "../packages/FSharp.Data.2.0.9/lib/portable-net40+sl5+wp8+win8/FSharp.Data.dll"
open FSharp.Data
type censusDataContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/AK.TXT">
type stateCodeContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv">
let stateCodes = stateCodeContext.Load("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv");
let fetchStateData (stateCode:string)=
    let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
    censusDataContext.Load(uri).Rows
let usaData = stateCodes.Rows
              |> Seq.collect(fun r -> fetchStateData(r.Abbreviation))
              |> Seq.length
I now want to run these asynchronously, and I am running into a problem with AsyncLoad:
let fetchStateDataAsync(stateCode:string)=
    async{
        let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
        let! stateData = censusDataContext.AsyncLoad(uri)
        return stateData.Rows
    }
let usaData = stateCodes.Rows
              |> Seq.collect(fun r -> fetchStateDataAsync(r.Abbreviation))
              |> Seq.length
The error message is
The type 'Async<seq<CsvProvider<...>.Row>>' is not compatible with the type 'seq<'a>'
Forgive my lack of async knowledge, but do I have to use something other than Seq.collect when applying async functions?
Thanks in advance
The problem is that turning code to asynchronous (by wrapping it in the async { .. } block) changes the result from seq<Row> to Async<seq<Row>> - that is, you now get an asynchronous computation that will eventually complete and return the sequence.
To fix this, you need to somehow start the computation and wait for the result. There are a number of options, such as running the computations one by one sequentially. Probably the easiest option (and maybe the best, depending on what you want to do) is to run them in parallel:
let getAll =
    stateCodes.Rows
    |> Seq.map(fun r -> fetchStateDataAsync(r.Abbreviation))
    |> Async.Parallel
This gives you an asynchronous computation that runs all the downloads and returns an array of results. You can run this synchronously (and block) and get the results:
getAll |> Async.RunSynchronously
|> Seq.collect id
|> Seq.length
If you want to run the downloads asynchronously in the background, you can do that too, but you need to specify what to do with the result. For example:
async {
    let! all = getAll
    all |> Seq.collect id |> Seq.length |> printfn "Length %d" }
|> Async.Start
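Both options mentioned above (sequential and parallel) can be compared with a hypothetical fetchStateDataAsync that returns an Async<seq<int>> without touching the network, standing in for the CSV provider's rows, and a hypothetical list of state codes. This is a sketch of the shapes involved, not of the CsvProvider API:

```fsharp
// Hypothetical stand-in: Async<seq<...>>, just like the question's function.
let fetchStateDataAsync (stateCode: string) : Async<seq<int>> =
    async { return Seq.init stateCode.Length id }

let stateCodes = [ "AK"; "AL"; "CA" ]   // hypothetical input

// Parallel: all fetches run concurrently; Async.Parallel yields seq<int>[].
let totalParallel =
    stateCodes
    |> Seq.map fetchStateDataAsync
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Seq.collect id                   // flatten seq<int>[] into seq<int>
    |> Seq.length

// Sequential: one fetch at a time, awaiting each with let! inside a for loop.
let totalSequential =
    async {
        let rows = ResizeArray<int>()
        for code in stateCodes do
            let! stateRows = fetchStateDataAsync code
            rows.AddRange(stateRows)
        return rows.Count
    }
    |> Async.RunSynchronously
```

Either way, the key step is the same: the Async<seq<Row>> values have to be run (and their results awaited) before Seq.collect can flatten them.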