How to get Async Values from within Seq.map in F#

In the following code, I am using Microsoft.Playwright to scrape a webpage.
The first line returns a System.Collections.Generic.IReadOnlyList of element handles.
I'm not sure how to write the second line correctly, since it needs to run the async operation for every single line (it doesn't currently work):
let! lines = Async.AwaitTask(page.QuerySelectorAllAsync("#statementLines>div"))
let! values = lines |> Seq.map( fun line -> Async.AwaitTask(line.GetAttributeAsync("data-statementlineid")))

To execute a sequence of async computations and obtain their results, use Async.Sequential or Async.Parallel for sequential or parallel execution respectively.
Both methods take a sequence of asyncs as a parameter and return their results as an array. Both guarantee that the results in the array are in the same order as the async computations in the input sequence.
let! lines = Async.AwaitTask(page.QuerySelectorAllAsync("#statementLines>div"))
let! values =
    lines
    |> Seq.map (fun line -> Async.AwaitTask (line.GetAttributeAsync("data-statementlineid")))
    |> Async.Sequential
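For reference, here is a self-contained sketch of the same pattern that runs without Playwright. The function `getIdAsync` is a hypothetical stand-in for `line.GetAttributeAsync(...)` and just returns a completed `Task<string>`. It assumes FSharp.Core 4.7 or later, where `Async.Sequential` is available. In the sequential part the task-returning call is wrapped in `async { ... }`, so each task is only created when its async actually runs; calling `getIdAsync` directly inside `Seq.map` creates the tasks as soon as the sequence is enumerated.

```fsharp
open System.Threading.Tasks

// Hypothetical stand-in for line.GetAttributeAsync("data-statementlineid"):
// any function returning Task<string> fits the same pattern.
let getIdAsync (line: int) : Task<string> =
    Task.FromResult(sprintf "id-%d" line)

let lines = [ 1; 2; 3 ]

let workflow =
    async {
        // Runs one async at a time. Wrapping the call in async { ... } means
        // each task is created only when its async starts.
        let! sequential =
            lines
            |> Seq.map (fun line -> async { return! Async.AwaitTask(getIdAsync line) })
            |> Async.Sequential

        // Starts all asyncs together; the result array still follows input order.
        let! parallelResults =
            lines
            |> Seq.map (fun line -> Async.AwaitTask(getIdAsync line))
            |> Async.Parallel

        return sequential, parallelResults
    }

let sequential, parallelResults = Async.RunSynchronously workflow
```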

Related

How to get the String Value of a Json element on the Same line in F#

I have some code that I would like to put on a single line instead of binding a separate variable (maybe with a pipe?):
let! creditText = row.EvalOnSelectorAsync("div.moneyout","node => node.innerText") |> Async.AwaitTask
let JSonElementString = creditText.Value.GetString()
I would like to have something like:
let! creditText = row.EvalOnSelectorAsync("div.moneyout","node => node.innerText") |> Async.AwaitTask |> (fun js -> js.Value.GetString)
I can see what is happening - at the point of the function the variable is still async. How can I make it pipe the result through to a function on the same line?
I do not understand why you want to do this. The code is perfectly readable and succinct when it is written as two lines. Condensing it into a single line only makes it harder to understand.
That said, if you wanted to do this, the best option would be to have a map operation for either Async<T> or Task<T>. This exists in various libraries, but you can also easily define it yourself:
module Async =
    let map f a = async {
        let! r = a
        return f r }
Using this, you can now write:
let! creditText =
    row.EvalOnSelectorAsync("div.moneyout","node => node.innerText")
    |> Async.AwaitTask
    |> Async.map (fun creditText -> creditText.Value.GetString())
But as I said above, I think this is a bad idea and your two-line version is nicer.
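A minimal runnable sketch of the `Async.map` approach, with the `map` helper repeated so the snippet stands alone. Playwright is replaced by a hypothetical `fakeEvalAsync` that returns a `Task<Nullable<JsonElement>>`; that return type is an assumption based on the `.Value.GetString()` call in the question. It uses System.Text.Json, which ships with .NET Core 3.0 and later.

```fsharp
open System
open System.Text.Json
open System.Threading.Tasks

module Async =
    /// Applies f to the result of an async computation once it completes.
    let map f a =
        async {
            let! r = a
            return f r
        }

// Hypothetical stand-in for row.EvalOnSelectorAsync(...), returning Task<Nullable<JsonElement>>.
let fakeEvalAsync () : Task<Nullable<JsonElement>> =
    use doc = JsonDocument.Parse("\"12.50\"")
    // Clone so the element stays valid after the JsonDocument is disposed.
    Task.FromResult(Nullable(doc.RootElement.Clone()))

let creditText =
    async {
        let! text =
            fakeEvalAsync ()
            |> Async.AwaitTask
            |> Async.map (fun js -> js.Value.GetString())
        return text
    }
    |> Async.RunSynchronously
```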

Why does iterating previously read-in sequence trigger a new read?

In this SO post, adding
inSeq
|> Seq.length
|> printfn "%d lines read"
caused the lazy sequence in inSeq to be read in.
OK, I've expanded on that code and want to first print out that sequence (see new program below).
When the Visual Studio (2012) debugger gets to
inSeq |> Seq.iter (fun x -> printfn "%A" x)
the read process starts over again. When I examine inSeq using the debugger, inSeq appears to have no elements in it.
If I have first read elements into inSeq, how can I see (examine) those elements and why won't they print out with the call to Seq.iter?
open System
open System.Collections.Generic
open System.Text
open System.IO
#nowarn "40"
let rec readlines () =
    seq {
        let line = Console.ReadLine()
        // ReadLine returns null at the end of input, so treat null like an empty line
        if not (String.IsNullOrEmpty line) then
            yield line
            yield! readlines ()
    }
[<EntryPoint>]
let main argv =
    let inSeq = readlines ()
    inSeq
    |> Seq.length
    |> printfn "%d lines read"
    inSeq |> Seq.iter (fun x -> printfn "%A" x)
    // This will keep it alive enough to read your output
    Console.ReadKey() |> ignore
    0
I've read somewhere that results of lazy evaluation are not cached. Is that what is going on here? How can I cache the results?
A sequence is not a "container" of items; rather, it's a "promise" to deliver items sometime in the future. You can think of it as a function that you call, except it returns its result in chunks, not all at once. If you call that function once, it returns the result once. If you call it a second time, it will return the result a second time.
Because your particular sequence is not pure, you can compare it to a non-pure function: you call it once, it returns a result; you call it second time, it may return something different.
Sequences do not automatically "remember" their items after the first read, in exactly the same way that functions do not automatically "remember" their result after the first call. If you want that from a function, you can wrap it in a special "caching" wrapper, and you can do the same for a sequence.
The general technique of "caching return value" is usually called "memoization". For F# sequences in particular, it is implemented in the Seq.cache function.
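A small sketch of the difference, using a hypothetical `numbers` sequence that counts how many elements it has produced. Without caching, each full enumeration reruns the generator; with `Seq.cache`, the second enumeration replays the stored elements. In the question's program, the corresponding change would be `let inSeq = readlines () |> Seq.cache`.

```fsharp
// Counts how many elements the underlying generator has produced so far.
let produced = ref 0

// A hypothetical impure sequence: every element it yields bumps the counter.
let numbers () =
    seq {
        for i in 1 .. 3 do
            produced.Value <- produced.Value + 1
            yield i
    }

// Without caching, every enumeration re-runs the generator.
let plain = numbers ()
let lengthPlain = Seq.length plain
let sumPlain = Seq.sum plain
let producedPlain = produced.Value   // both enumerations ran the generator

produced.Value <- 0

// With Seq.cache, the first enumeration stores the elements and later ones replay them.
let cached = numbers () |> Seq.cache
let lengthCached = Seq.length cached
let sumCached = Seq.sum cached
let producedCached = produced.Value  // only the first enumeration ran the generator

printfn "without cache: %d elements produced; with cache: %d" producedPlain producedCached
```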

(Mis)understanding Seq.cache

I have the following code (inside a larger function, but that shouldn't matter):
let ordersForTask = Dictionary<_, _>()
let getOrdersForTask task =
    match ordersForTask.TryGetValue task with
    | true, orders -> orders
    | false, _ ->
        let orders =
            scenario.Orders
            |> Seq.filter (fun (order : Order) -> order.Tasks.Contains(task))
            |> Seq.cache
        ordersForTask.Add(task, orders)
        orders
From what I understand, that should lead to order.Tasks.Contains() being called exactly once for each pair of order and task values, no matter how often getOrdersForTask is called with the same task value, because the input sequence is only iterated (at most) once. However, that does not appear to be the case; if I call the function n times for all the values of task that I have, profiling shows n * number of orders * number of tasks calls to Contains().
Replacing Seq.cache with Seq.toList has the effect I expected, but I want to avoid incurring the cost of Seq.toList.
What am I misunderstanding about Seq.cache or doing wrong in its usage?
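One way to check the expectation in isolation is to count predicate calls in a tiny version of the same code. `Order`, the sample data and the call counter are hypothetical stand-ins for the question's types and `scenario.Orders`; the dictionary is created once, outside the getter, as in the snippet above. In this sketch the predicate runs once per (order, task) pair, however many times the getter is called.

```fsharp
open System.Collections.Generic

// Hypothetical stand-ins for the question's Order type and scenario.Orders.
type Order = { Id: int; Tasks: HashSet<string> }

let orders =
    [ { Id = 1; Tasks = HashSet<string>([ "a"; "b" ]) }
      { Id = 2; Tasks = HashSet<string>([ "b" ]) }
      { Id = 3; Tasks = HashSet<string>([ "a" ]) } ]

// Counts how often the filter predicate (and so Contains) runs.
let containsCalls = ref 0

// Created once, outside the getter, so cached results survive between calls.
let ordersForTask = Dictionary<string, seq<Order>>()

let getOrdersForTask (task: string) =
    match ordersForTask.TryGetValue task with
    | true, cached -> cached
    | false, _ ->
        let result =
            orders
            |> Seq.filter (fun order ->
                containsCalls.Value <- containsCalls.Value + 1
                order.Tasks.Contains task)
            |> Seq.cache
        ordersForTask.Add(task, result)
        result

// Call the getter repeatedly and force a full enumeration each time.
for _ in 1 .. 5 do
    for t in [ "a"; "b" ] do
        getOrdersForTask t |> Seq.length |> ignore

// 3 orders * 2 tasks = 6 predicate calls in total.
printfn "predicate calls: %d" containsCalls.Value
```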

Why data parameter comes last

Why does the data parameter in F# come last, as the following code snippet shows?
let startsWith (lookFor: string) (s: string) = s.StartsWith(lookFor)
let str1 =
    "hello"
    |> startsWith "h"
I think part of your answer is in your question. The |> (forward pipe) operator lets you specify the last parameter to a function before you call it. If the parameters were in the opposite order, then that wouldn't work. The best examples of the power of this are with chaining of functions that operate on lists. Each function takes a list as its last parameter and returns a list that can be passed to the next function.
From http://www.tryfsharp.org/Learn/getting-started#chaining-functions:
[0..100]
|> List.filter (fun x -> x % 2 = 0)
|> List.map (fun x -> x * 2)
|> List.sum
The |> operator allows you to reorder your code by specifying the last argument of a function before you call it. This example is functionally equivalent to the previous code, but it reads much more cleanly. First, it creates a list of numbers. Then, it pipes that list of numbers to filter out the odds. Next, it pipes that result to List.map to double it. Finally, it pipes the doubled numbers to List.sum to add them together. The Forward Pipe Operator reorganizes the function chain so that your code reads the way you think about the problem instead of forcing you to think inside out.
As mentioned in the comments there is also the concept of currying, but I don't think that is as easy to grasp as chaining functions.
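A short sketch of why the order matters for pipelining, using a hypothetical data-first variant `startsWithDataFirst` for comparison. With the data last, `startsWith "h"` is a partially applied (curried) function of type `string -> bool` that can be passed straight to `List.filter` or used after `|>`; with the data first, a lambda is needed to put the argument in the right place.

```fsharp
// Data-last, as in the question: `startsWith "h"` alone is a function string -> bool.
let startsWith (lookFor: string) (s: string) = s.StartsWith(lookFor)

// Hypothetical data-first variant, only for comparison.
let startsWithDataFirst (s: string) (lookFor: string) = s.StartsWith(lookFor)

let words = [ "hello"; "world"; "hi" ]

// Partial application plugs straight into the pipeline.
let viaPipe = words |> List.filter (startsWith "h")

// With the data first, a lambda is needed to move the element into place.
let viaLambda = words |> List.filter (fun w -> startsWithDataFirst w "h")

// `x |> f` is the same as `f x`.
let samePipe = ("hello" |> startsWith "h") = startsWith "h" "hello"
```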

Working with large text files?

I need to import a large text file (55MB, 525000 * 25), manipulate the data and produce some output. As usual I started exploring with F# Interactive, and I'm getting some really strange behaviour.
Is this file too large, or is my code wrong?
The first test was to import the file and simply compute the sum over one column (not the end goal, just a first test):
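open System.IO

// path is the location of the CSV file, defined elsewhere
let calctest =
    // use disposes the reader once the whole text has been read into csv
    use reader = new StreamReader(path)
    let csv = reader.ReadToEnd()
    csv.Split([|'\n'|])
    |> Seq.skip 1
    |> Seq.map (fun line -> line.Split([|','|]))
    |> Seq.filter (fun a -> a.[11] = "M")
    |> Seq.map (fun values -> float(values.[14]))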
As expected, this produces a seq of float both in the type check and in F# Interactive. If I now add:
|> Seq.sum
The type check passes and says this should return a float, but if I run it in F# Interactive I get this error:
System.IndexOutOfRangeException: Index was outside the bounds of the array
Then I removed the last line again and thought I would look at the seq of float in a text file:
let writetest =
    let str = calctest |> Seq.map (fun i -> i.ToString())
    System.IO.File.WriteAllLines("test.txt", str)
Again, this passes the type check but throws errors in interactive.
Can the standard StreamReader not handle that amount of data, or am I going wrong somewhere? Should I use something other than StreamReader?
Thanks.
Seq is lazy: the mapping and filtering are only actually performed when something enumerates the sequence, which is why you don't see the error until you add Seq.sum. Are you sure all rows have at least 15 columns? That's probably the problem.
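A small sketch of both points, using a hypothetical in-memory string instead of the real file. One common way to end up with a short row is the empty string that `Split('\n')` produces after a trailing newline; splitting that on `','` gives a one-element array, so indexing past it throws only when the sequence is finally enumerated.

```fsharp
open System

// Hypothetical file contents; the trailing newline makes Split produce a final empty string.
let csv = "name,value\na,1\nb,2\n"

// Building this pipeline runs nothing yet: Seq.skip and Seq.map are lazy.
let values =
    csv.Split([| '\n' |])
    |> Seq.skip 1
    |> Seq.map (fun line -> line.Split([| ',' |]))
    |> Seq.map (fun cols -> float (cols.[1]))   // fails for the empty last "row"

// The IndexOutOfRangeException only surfaces once Seq.sum enumerates the sequence.
let failedOnSum =
    try
        values |> Seq.sum |> ignore
        false
    with :? IndexOutOfRangeException -> true

// Skipping rows that are too short before indexing avoids the exception.
let safeSum =
    csv.Split([| '\n' |])
    |> Seq.skip 1
    |> Seq.map (fun line -> line.Split([| ',' |]))
    |> Seq.filter (fun cols -> cols.Length > 1)
    |> Seq.sumBy (fun cols -> float (cols.[1]))
```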
I would advise you to use the CSV Type Provider (CsvProvider from FSharp.Data) instead of just calling String.Split. That way you'll be sure not to get an accidental IndexOutOfRangeException, and commas escaped inside quoted fields will be handled correctly.
Additionally, you're reading the whole CSV file into memory by calling reader.ReadToEnd(). The CsvProvider supports streaming if you set its CacheRows parameter to false. That's not a problem with a 55MB file, but it might be with something much larger.
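The answer recommends the CsvProvider; as a standard-library alternative (not what the answer proposes), `File.ReadLines` also avoids loading the whole file, because it enumerates the file line by line. The function `sumColumn` and its parameters are hypothetical; for the question's data the flag and value indices would be 11 and 14. Rows that are too short are skipped rather than causing an exception, and this simple split still does not handle quoted commas the way a real CSV parser would.

```fsharp
open System.IO

/// Hypothetical helper: sums column `valueIndex` over rows whose column `flagIndex`
/// equals `flag`. File.ReadLines yields one line at a time instead of reading the
/// whole file up front, and the file is closed when the enumeration finishes.
let sumColumn (path: string) (flagIndex: int) (flag: string) (valueIndex: int) =
    File.ReadLines(path)
    |> Seq.skip 1                                        // skip the header row
    |> Seq.map (fun line -> line.Split([| ',' |]))
    |> Seq.filter (fun cols ->
        // Skip short or empty rows instead of throwing IndexOutOfRangeException.
        cols.Length > max flagIndex valueIndex && cols.[flagIndex] = flag)
    |> Seq.sumBy (fun cols -> float (cols.[valueIndex]))

// Usage with a small temporary file; for the question's data this would be
// something like sumColumn path 11 "M" 14.
let tmp = Path.GetTempFileName()
File.WriteAllLines(tmp, [| "id,flag,value"; "1,M,2.5"; "2,F,1.0"; "3,M,4.0" |])
let total = sumColumn tmp 1 "M" 2
File.Delete tmp
```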
