I have the following F# program that retrieves a webpage from the internet:
open System.Net

[<EntryPoint>]
let main argv =
    let mutable pageData : byte[] = [| |]
    let fullURI = "http://www.badaddress.xyz"
    let wc = new WebClient()
    try
        pageData <- wc.DownloadData(fullURI)
        ()
    with
    | :? System.Net.WebException as err -> printfn "Web error: \n%s" err.Message
    | exn -> printfn "Unknown exception:\n%s" exn.Message
    0 // return an integer exit code
This works fine if the URI is valid and the machine has an internet connection and the web server responds properly etc. In an ideal functional programming world the results of a function would not depend on external variables not passed as arguments (side effects).
What I would like to know is what is the appropriate F# design pattern to deal with operations which might require the function to deal with recoverable external errors. For example if the website is down one might want to wait 5 minutes and try again. Should parameters like how many times to retry and delays between retries be passed explicitly or is it OK to embed these variables in the function?
In F#, when you want to handle recoverable errors you almost universally want to use the option or the Choice<_,_> type. In practice the only difference between them is that Choice allows you to return some information about the error while option does not. In other words, option is best when it doesn't matter how or why something failed (only that it did fail); Choice<_,_> is used when having information about how or why something failed is important. For example, you might want to write the error information to a log; or perhaps you want to handle an error situation differently based on why something failed -- a great use case for this is providing accurate error messages to help users diagnose a problem.
With that in mind, here's how I'd refactor your code to handle failures in a clean, functional style:
open System
open System.Net

/// Retrieves the content at the given URI.
let retrievePage (client : WebClient) (uri : Uri) =
    // Preconditions
    // (checkNonNull is a null-check helper, such as the one in ExtCore.)
    checkNonNull "uri" uri
    if not <| uri.IsAbsoluteUri then
        invalidArg "uri" "The URI must be an absolute URI."

    try
        // If the data is retrieved successfully, return it.
        client.DownloadData uri
        |> Choice1Of2
    with
    | :? System.Net.WebException as webExn ->
        // Return the URI and WebException so they can be used to diagnose the problem.
        Choice2Of2 (uri, webExn)
    | _ ->
        // Reraise any other exceptions -- we don't want to handle them here.
        reraise ()

/// Retrieves the content at the given URI.
/// If a WebException is raised when retrieving the content, the request
/// will be retried up to a specified number of times.
let rec retrievePageRetry (retryWaitTime : TimeSpan) remainingRetries (client : WebClient) (uri : Uri) =
    // Preconditions
    checkNonNull "uri" uri
    if not <| uri.IsAbsoluteUri then
        invalidArg "uri" "The URI must be an absolute URI."
    elif remainingRetries = 0u then
        invalidArg "remainingRetries" "The number of retries must be greater than zero (0)."

    // Try to retrieve the page.
    match retrievePage client uri with
    | Choice1Of2 _ as result ->
        // Successfully retrieved the page. Return the result.
        result
    | Choice2Of2 _ as error ->
        // Decrement the number of retries.
        let retries = remainingRetries - 1u

        // If there are no retries left, return the error along with the URI
        // for diagnostic purposes; otherwise, wait a bit and try again.
        if retries = 0u then error
        else
            // NOTE : If this is modified to use 'async', you MUST
            // change this to use 'Async.Sleep' here instead!
            System.Threading.Thread.Sleep retryWaitTime

            // Try retrieving the page again.
            retrievePageRetry retryWaitTime retries client uri

[<EntryPoint>]
let main argv =
    /// WebClient used for retrieving content.
    use wc = new WebClient ()

    /// The amount of time to wait before re-attempting to fetch a page.
    let retryWaitTime = TimeSpan.FromSeconds 2.0

    /// The maximum number of times we'll try to fetch each page.
    let maxPageRetries = 3u

    /// The URI to fetch.
    let fullURI = Uri ("http://www.badaddress.xyz", UriKind.Absolute)

    // Fetch the page data.
    match retrievePageRetry retryWaitTime maxPageRetries wc fullURI with
    | Choice1Of2 pageData ->
        printfn "Retrieved %u bytes from: %O" (Array.length pageData) fullURI
        0 // Success
    | Choice2Of2 (uri, error) ->
        printfn "Unable to retrieve the content from: %O" uri
        // WebException.Status is a WebExceptionStatus value, not an HTTP status code.
        printfn "Status: (%i) %O" (int error.Status) error.Status
        printfn "Message: %s" error.Message
        1 // Failure
Basically, I split your code out into two functions, plus the original main:
One function that attempts to retrieve the content from a specified URI.
One function containing the logic for retrying attempts; this 'wraps' the first function which performs the actual requests.
The original main function now only handles 'settings' (which you could easily pull from an app.config or web.config) and printing the final results. In other words, it's oblivious to the retrying logic -- you could modify the single line of code with the match statement and use the non-retrying request function instead if you wanted.
If you want to pull content from multiple URIs AND wait for a significant amount of time (e.g., 5 minutes) between retries, you should modify the retrying logic to use a priority queue or something instead of using Thread.Sleep or Async.Sleep.
Shameless plug: my ExtCore library contains some things to make your life significantly easier when building something like this, especially if you want to make it all asynchronous. Most importantly, it provides an asyncChoice workflow and collections functions designed to work with it.
As for your question about passing in parameters (like the retry timeout and number of retries) -- I don't think there's a hard-and-fast rule for deciding whether to pass them in or hard-code them within the function. In most cases, I prefer to pass them in, though if you have more than a few parameters to pass in, you're better off creating a record to hold them all and passing that instead. Another approach I've used is to make the parameters option values, where the defaults are pulled from a configuration file (though you'll want to pull them from the file once and assign them to some private field to avoid re-parsing the configuration file each time your function is called); this makes it easy to modify the default values you've used in your code, but also gives you the flexibility of overriding them when necessary.
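To make the record idea concrete, here is a minimal sketch. The RetrySettings record, the retry function, and the flaky operation are all hypothetical names invented for this illustration; the operation is a stand-in that fails twice and then succeeds, so the example runs without a network connection.

```fsharp
open System
open System.Threading

/// Hypothetical record bundling the retry parameters.
type RetrySettings =
    { /// Total number of attempts, including the first one.
      MaxAttempts : int
      /// How long to wait between attempts.
      WaitTime : TimeSpan }

/// Runs 'operation' until it succeeds or the attempts are used up.
let rec retry (settings : RetrySettings) (operation : unit -> Choice<'T, 'Error>) =
    if settings.MaxAttempts < 1 then
        invalidArg "settings" "MaxAttempts must be at least 1."

    match operation () with
    | Choice1Of2 _ as result -> result
    | Choice2Of2 _ as error when settings.MaxAttempts = 1 -> error
    | Choice2Of2 _ ->
        Thread.Sleep(settings.WaitTime)
        retry { settings with MaxAttempts = settings.MaxAttempts - 1 } operation

// A stand-in operation that fails on the first two calls.
let calls = ref 0
let flaky () : Choice<string, string> =
    calls.Value <- calls.Value + 1
    if calls.Value < 3 then Choice2Of2 (sprintf "attempt %d failed" calls.Value)
    else Choice1Of2 "page data"

let result =
    retry { MaxAttempts = 5; WaitTime = TimeSpan.FromMilliseconds 10.0 } flaky
```

Because the settings travel as one value, adding another knob later (say, a backoff factor) changes the record and not the signature of every function that passes it along.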
I am using Playwright in F# for web scraping and I noticed that the result is only returned some of the time.
I have this code.
let getContent (url:string) =
    task {
        use! playwright = Playwright.CreateAsync()
        let! browser = playwright.Chromium.LaunchAsync()
        printfn "URL %A" url
        let! page = browser.NewPageAsync()
        page.SetDefaultTimeout(15000f)
        let! goto = page.GotoAsync(url)
        let! price = page.Locator("//span[@class='norm-price ng-binding']").AllInnerTextsAsync()
        printfn "Price %A" price
    }
When I run the console program, it sometimes returns a result (a list of prices), but sometimes it just finishes with an empty result.
I really don't know what could be wrong. I also tried using the async wrapper instead of task, but the output is the same.
I increased the timeout to 15 s, but that doesn't help either.
Could it be that you do not await the task returned by getContent?
Maybe the program terminates before writing to the console. If the calling code is not asynchronous (and cannot propagate the task), you could try:
let printContent (url : string) =
    task { ... } |> Async.AwaitTask |> Async.RunSynchronously
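As a rough illustration of the difference, here is a self-contained sketch. getContentStub is a made-up stand-in for getContent that uses Task.Delay instead of a browser, so it runs without Playwright; the point is only that the caller blocks until the task has completed before the program moves on.

```fsharp
open System.Threading.Tasks

/// Hypothetical stand-in for getContent: waits briefly, then returns a result.
let getContentStub (url : string) : Task<string list> =
    task {
        do! Task.Delay(100)
        return [ sprintf "price from %s" url ]
    }

// Block until the task has finished. Without this (or an equivalent wait),
// a console program can reach the end of main while the task is still running.
let prices = (getContentStub "http://example.com").GetAwaiter().GetResult()
printfn "Price %A" prices
```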
Update 1:
Probably the page loads its price data asynchronously. The default timeout on the page specifies a maximum timeout; it does not make Playwright wait that long for data to arrive in the controlled browser instance. Most likely you'll have to wait for some request to finish or some element to appear on the page. Can you share the URL publicly?
I am struggling with how to set different cache response headers based on whether the result is an Ok or an Error. My code is something like the following (but with other types in the result):
let resultToJson (result:Result<'a,string>) : HttpHandler =
    match result with
    | Ok o -> Successful.ok (json o)
    | Error s -> ServerErrors.internalError (text s)
I can add the headers by doing something like the following:
let resultToJson (result:Result<'a,string>) : HttpHandler =
    fun (next : HttpFunc) (ctx : HttpContext) ->
        let response =
            let headers = ctx.Response.Headers
            match result with
            | Ok o ->
                headers.Add("Cache-Control", new StringValues("public, max-age=10, stale-while-revalidate=2"))
                headers.Add("Vary", new StringValues("Origin"))
                Successful.ok (json o)
            | Error s ->
                headers.Add("Cache-Control", new StringValues("no-cache"))
                ServerErrors.internalError (text s)
        response next ctx
But this does not feel right. I would like to use the standard HttpHandlers from the ResponseCaching module to set the right cache headers:
publicResponseCaching 10 (Some "Origin") // For Ok: Add 10 sec public cache, Vary by Origin
noResponseCaching // For Error: no caching
How do I achieve this?
The response cache handler is supposed to be piped into a normal pipeline. Your choice between Ok and Error is effectively a choose function, so you can use a choose that takes a list of handlers to be attempted in order. To reject a path, a handler just returns task { return None }; to move forward, it calls next ctx.
If you want to keep all the logic in one controller, like you have now, just keep your match and pipe your json/text response into one of the caching handlers.
let fn = (publicResponseCaching 30 None >=> json o) in fn next ctx
If it's nested inside a handler, instead of in a pipeline, you have to apply next and ctx yourself.
I found the solution to my problem.
Yes, I can chain the HttpHandlers like Gerard and Honza Brestan mentioned, using the fish operator (>=>). The reason I could not make that work in the first place was that I had also created a fish operator for the Result type in an opened module. Basically, I had created a proper fish soup.
As soon as I refactored my code so that the module containing the Result fish operator was not open in this scope, everything worked fine as expected.
Another point to remember is that response caching needs to be called before the finalizing HttpHandler, otherwise it will not be called:
// Simplified code
let resultToJson =
    function
    | Ok o -> publicResponseCaching 10 (Some "Origin") >=> Successful.ok(json o)
    | Error e -> noResponseCaching >=> ServerErrors.internalError(text e)
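To see why the order matters, here is a deliberately simplified model of handler composition. None of it is Giraffe's real code: Ctx, HttpFunc, HttpHandler, setHeader and writeText are stand-ins defined only for this sketch, and everything is synchronous. It assumes, as the point above implies, that a finalizing handler writes the response and never calls next.

```fsharp
open System.Collections.Generic

// Simplified stand-ins for the real types (assumptions for illustration only).
type Ctx = { Headers : Dictionary<string, string>; mutable Body : string option }
type HttpFunc = Ctx -> Ctx option
type HttpHandler = HttpFunc -> HttpFunc

/// Composition: 'first' receives 'second' (already wired to 'next') as its continuation.
let (>=>) (first : HttpHandler) (second : HttpHandler) : HttpHandler =
    fun next -> first (second next)

/// Sets a header, then continues down the pipeline.
let setHeader (key : string) (value : string) : HttpHandler =
    fun next ctx ->
        ctx.Headers.[key] <- value
        next ctx

/// Writes the body and finishes; it never calls its continuation.
let writeText (text : string) : HttpHandler =
    fun _ ctx ->
        ctx.Body <- Some text
        Some ctx

/// Runs a handler against a fresh context and returns that context.
let run (handler : HttpHandler) =
    let ctx = { Headers = Dictionary<string, string>(); Body = None }
    handler (fun c -> Some c) ctx |> ignore
    ctx

// Header handler first: the header is set before the body is written.
let cachedFirst = run (setHeader "Cache-Control" "no-cache" >=> writeText "oops")

// Finalizing handler first: setHeader is never reached.
let cachedLast = run (writeText "oops" >=> setHeader "Cache-Control" "no-cache")
```

In this model, cachedFirst ends up with the Cache-Control header and cachedLast does not, even though both write the same body.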
I was pretty comfortable with how async cancellation is done in C# with the TPL, but I am a little bit confused in F#. Apparently calling Async.CancelDefaultToken() is enough to cancel outgoing Async<'T> operations. But they are not cancelled as I expected; they just... vanish. I cannot detect the cancellation properly and tear down the stack.
For example, I have this code that depends on a C# library that uses TPL:
type WebSocketListener with
    member x.AsyncAcceptWebSocket = async {
        let! client = Async.AwaitTask <| x.AcceptWebSocketAsync Async.DefaultCancellationToken
        if(not(isNull client)) then
            return Some client
        else
            return None
    }

let rec AsyncAcceptClients(listener : WebSocketListener) =
    async {
        let! result = listener.AsyncAcceptWebSocket
        match result with
        | None -> printf "Stop accepting clients.\n"
        | Some client ->
            Async.Start <| AsyncAcceptMessages client
            do! AsyncAcceptClients listener
    }
When the CancellationToken passed to x.AcceptWebSocketAsync is cancelled, it returns null, and then the AsyncAcceptWebSocket method returns None. I can verify this with a breakpoint.
But AsyncAcceptClients (the caller) never gets that None value; the method just ends, and "Stop accepting clients.\n" is never displayed on the console. If I wrap everything in a try/finally:
let rec AsyncAcceptClients(listener : WebSocketListener) =
    async {
        try
            let! result = listener.AsyncAcceptWebSocket
            match result with
            | None -> printf "Stop accepting clients.\n"
            | Some client ->
                Async.Start <| AsyncAcceptMessages client
                do! AsyncAcceptClients listener
        finally
            printf "This message is actually printed"
    }
Then what I put in the finally block gets executed when listener.AsyncAcceptWebSocket returns None, but the code I have in the match still doesn't. (Actually, it prints the message in the finally block once for each connected client, so maybe I should move to an iterative approach?)
However, if I use a custom CancellationToken rather than Async.DefaultCancellationToken, everything works as expected, and the "Stop accepting clients.\n" message is printed on screen.
What is going on here?
There are two things about the question:
First, when a cancellation happens in F#, AwaitTask does not return null; instead, the task throws an OperationCanceledException. So you do not get back a None value, you get an exception (and then F# also runs your finally block).
The confusing thing is that cancellation is a special kind of exception that cannot be handled in user code inside the async block - once your computation is cancelled, it cannot be un-cancelled and it will always stop (you can do cleanup in finally). You can work around this (see this SO answer), but it might cause unexpected things.
Second, I would not use default cancellation token - that's shared by all async workflows and so it might do unexpected things. You can instead use Async.CancellationToken which gives you access to a current cancellation token (which F# automatically propagates for you - so you do not have to pass it around by hand as you do in C#).
EDIT: Clarified how F# async handles cancellation exceptions.
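A small sketch of that behaviour, assuming nothing beyond FSharp.Core and the BCL. The worker, the log queue and the two events are names made up for this example; the workflow is started with its own CancellationTokenSource rather than the default token, and it reads the current token through Async.CancellationToken.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading

let log = ConcurrentQueue<string>()
let started = new ManualResetEventSlim(false)
let cleanedUp = new ManualResetEventSlim(false)

let worker =
    async {
        try
            // The token this workflow was started with; no need to pass it in by hand.
            let! token = Async.CancellationToken
            log.Enqueue (sprintf "started (cancellable: %b)" token.CanBeCanceled)
            started.Set()
            do! Async.Sleep 60000
            // Not reached if the workflow is cancelled during the sleep.
            log.Enqueue "finished sleeping"
        finally
            // Cleanup still runs on cancellation.
            log.Enqueue "cleanup"
            cleanedUp.Set()
    }

let cts = new CancellationTokenSource()
Async.Start(worker, cts.Token)

started.Wait()      // make sure the workflow is running before cancelling
cts.Cancel()
cleanedUp.Wait(TimeSpan.FromSeconds 5.0) |> ignore

printfn "%A" (List.ofSeq log)
```

The log should contain the "started" entry and "cleanup", but not "finished sleeping": the code after the cancelled operation is skipped, while the finally block runs.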
Looking at:
member this.PostAndReply : (AsyncReplyChannel<'Reply> -> 'Msg) * ?int -> 'Reply
I can't figure out why the signature looks so counter-intuitive to me. What we want to do is post a message to an agent and wait for a reply.
Why do we have to give it a weird function as a 'message'?
See again this MSDN snippet:
let rec loop() =
    printf "> "
    let input = Console.ReadLine()
    printThreadId("Console loop")
    let reply = agent.PostAndReply(fun replyChannel -> input, replyChannel)
    if (reply <> "Stopping.") then
        printfn "Reply: %s" reply
        loop()
    else
        ()
loop()
I'd rather prefer something like this:
member this.PostAndReply : 'Msg * ?int -> 'Reply
Thanks
This type signature looks pretty confusing when you see it for the first time, but it does make sense.
The F# library design
The idea behind the signature is that when you call PostAndReply you need to give it a function that:
constructs a message of type 'Msg (to be sent to the agent)
after the F# runtime builds a channel for sending messages back to the caller (channels are represented as values of type AsyncReplyChannel<'Reply>).
The message that you construct needs to contain the reply channel, but the F# library does not know how you want to represent your messages (and so it does not know how you want to store the reply channel in the message). As a result, the library asks you to write a function that will construct the message for the agent after the system constructs the channel.
Your alternative suggestion
The problem with your suggestion is that if PostAndReply had a type 'Msg -> 'Reply, the message that the agent receives after it calls Receive would be of the following type:
'Msg * AsyncReplyChannel<'Reply>
... so every message received by the agent would also have to carry a channel for sending replies back. However, you probably don't want to send a reply back for every received message, so this wouldn't really work. Maybe you could use something like:
'Msg * option<AsyncReplyChannel<'Reply>>
... but that's just getting more complicated (and it still isn't quite right, because you can only reply to some messages from 'Msg, but not to all of them).
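To illustrate, here is a small sketch of an agent whose message type decides for itself where the reply channel lives. CounterMsg and counter are names invented for this example: Increment is fire-and-forget, and only GetTotal carries a channel.

```fsharp
/// Only GetTotal expects an answer, so only it carries a reply channel.
type CounterMsg =
    | Increment of int
    | GetTotal of AsyncReplyChannel<int>

let counter =
    MailboxProcessor<CounterMsg>.Start(fun inbox ->
        let rec loop total =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Increment n ->
                    return! loop (total + n)
                | GetTotal replyChannel ->
                    replyChannel.Reply total
                    return! loop total
            }
        loop 0)

// Fire-and-forget messages: no reply channel involved.
counter.Post (Increment 2)
counter.Post (Increment 3)

// The library creates the channel, and the function wraps it into a message.
let total = counter.PostAndReply(fun replyChannel -> GetTotal replyChannel)
printfn "Total: %d" total
```

The function passed to PostAndReply is exactly the "weird function" from the signature: it takes the freshly built channel and returns a 'Msg, here a GetTotal case.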
Knowing an RPC call to a server method that returns unit is a message passing call, I want to force the call to be asynchronous and be able to fire the next server call only after the first one has gone to the server.
Server code:
[<Rpc>]
let FirstCall value =
    printfn "%s" value
    async.Zero()

[<Rpc>]
let SecondCall() =
    "test"
Client code:
|>! OnClick (fun _ _ -> async {
        do! Server.FirstCall "test"
        do Server.SecondCall() |> ignore
    } |> Async.Start)
This seems to crash on the client because the call returns unit. Replacing the server and client code with:
[<Rpc>]
let FirstCall value =
    printfn "%s" value
    async { return () }

let! _ = Server.FirstCall "test"
didn't fix the problem, while the following did:
[<Rpc>]
let FirstCall value =
    printfn "%s" value
    async { return "" }

let! _ = Server.FirstCall "test"
Is there another way to force a message passing call to be asynchronous instead?
This is most definitely a bug. I added it here:
https://bugs.intellifactory.com/websharper/show_bug.cgi?id=468
Your approach is completely legit. Your workaround is also probably the best for now, i.e. instead of returning Async<unit>, return something like Async<int> with a zero and ignore it.
We are busy with preparing the 2.4 release due next week and the fix will make it there. Thanks!
Also, in 2.4 we'll be dropping synchronous calls, so you will have to use Async throughout for RPC, as discussed in https://bugs.intellifactory.com/websharper/show_bug.cgi?id=467 -- primarily motivated by new targets (Android and WP7) that do not support sync AJAX.