F#.Data HTML Parser Extracting Strings From Nodes

I am trying to use FSharp.Data's HTML parser to extract a list of strings from the href attributes of links.
I can print the links to the console, but I'm struggling to get them into a list.
Here is a working snippet that prints the links I want:
let results = HtmlDocument.Load(myUrl)
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.iter (fun x -> x |> Seq.iter (fun y -> y.AttributeValue("href") |> printf "%A"))
How do I store those strings in the variable links instead of printing them?
Cheers,

By the very last line, you have a sequence of sequences: for each td.pagenav you have a bunch of <a> elements, each of which has an href. That's why you need two nested Seq.iter calls: the outer one iterates over the outer sequence, and on each iteration the inner one iterates over the inner sequence.
To flatten a sequence of sequences, use Seq.collect. Further, to convert a sequence to a list, use Seq.toList or List.ofSeq (they're equivalent):
let a = [ [1;2;3]; [4;5;6] ]
let b = a |> Seq.collect id |> Seq.toList
> val b : int list = [1; 2; 3; 4; 5; 6]
Applying this to your code:
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.collect (fun x -> x |> Seq.map (fun y -> y.AttributeValue("href")))
    |> Seq.toList
Or you could make it a bit cleaner by applying Seq.collect at the point where you first encounter a nested sequence:
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.collect (fun x -> x.Elements("a"))
    |> Seq.map (fun y -> y.AttributeValue("href"))
    |> Seq.toList
That said, I would rather rewrite this as a list comprehension. Looks even cleaner:
let links =
    [ for td in results.Descendants "td" do
        if td.HasClass "pagenav" then
            for a in td.Elements "a" ->
                a.AttributeValue "href" ]
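To see the difference between Seq.map and Seq.collect without loading a page, here is a minimal sketch that uses plain records in place of HTML nodes. The Anchor and Td types and the sample data are made up for illustration; they are not FSharp.Data types.

```fsharp
// Hypothetical stand-ins for HTML nodes (not part of FSharp.Data).
type Anchor = { Href : string }
type Td = { Class : string; Links : Anchor list }

let cells =
    [ { Class = "pagenav"; Links = [ { Href = "/page/1" }; { Href = "/page/2" } ] }
      { Class = "other";   Links = [ { Href = "/ignored" } ] }
      { Class = "pagenav"; Links = [ { Href = "/page/3" } ] } ]

// Seq.map keeps one inner list per cell, so the result is nested.
let nested =
    cells
    |> Seq.filter (fun c -> c.Class = "pagenav")
    |> Seq.map (fun c -> c.Links |> List.map (fun a -> a.Href))
    |> Seq.toList

// Seq.collect maps and flattens in one step, so the result is flat.
let flat =
    cells
    |> Seq.filter (fun c -> c.Class = "pagenav")
    |> Seq.collect (fun c -> c.Links)
    |> Seq.map (fun a -> a.Href)
    |> Seq.toList

printfn "%A" nested   // [["/page/1"; "/page/2"]; ["/page/3"]]
printfn "%A" flat     // ["/page/1"; "/page/2"; "/page/3"]
```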

Related

What's an alternative to Seq.iter so that I can return the result of the operation for the last item?

Seq.iter returns a unit. However, I want to iterate through my collection and return the last result.
Consider the following code:
let updatedGrid = grid |> Map.toSeq
                       |> Seq.map snd
                       |> Seq.iter (fun c -> grid |> setCell c)
Note: setCell returns a new Map.
Here's the actual code:
let setCell cell (grid:Map<(int * int), Cell>) =
    grid |> Map.map (fun k v -> match k with
                                | c when c = (cell.X, cell.Y) -> { v with State=cell.State }
                                | _ -> v)
let cycleThroughCells (grid:Map<(int * int), Cell>) =
    let updatedGrid = grid |> Map.toSeq
                           |> Seq.map snd
                           |> Seq.iter (fun c -> grid |> setCell c
                                                      |> ignore)
    updatedGrid
Again, I just want to take the result of the last operation in the iter function
[UPDATED]
I think this works (using map):
let cycleThroughCells (grid:Map<(int * int), Cell>) =
    let updatedGrid = grid |> Map.toSeq
                           |> Seq.map snd
                           |> Seq.map (fun c -> grid |> setCell c)
                           |> Seq.last
    updatedGrid
As I said in a comment, it seems like you almost certainly want a fold so that the updated grid is passed to each successive call; otherwise the modifications are all dropped except for the last one.
I think this would do the trick:
let cycleThroughCells (grid:Map<(int * int), Cell>) =
    grid
    |> Map.toSeq
    |> Seq.map snd
    |> Seq.fold (fun grid c -> grid |> setCell c) grid
If you reorder the arguments to setCell so that the grid argument comes first, the last line can simply be |> Seq.fold setCell grid.
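The difference between the map-then-last version and the fold version can be checked on a tiny grid. This is only a sketch: the Cell record below is an assumed shape (the question does not show its definition), and the two-cell grid and the updates are invented for the demonstration.

```fsharp
// Assumed shape of Cell; the original post does not show its definition.
type Cell = { X : int; Y : int; State : bool }

let setCell cell (grid : Map<(int * int), Cell>) =
    grid
    |> Map.map (fun k v ->
        if k = (cell.X, cell.Y) then { v with State = cell.State } else v)

// A grid with two cells switched off, and two updates that switch them on.
let grid =
    Map.ofList [ ((0, 0), { X = 0; Y = 0; State = false })
                 ((0, 1), { X = 0; Y = 1; State = false }) ]
let updates = [ { X = 0; Y = 0; State = true }; { X = 0; Y = 1; State = true } ]

// map + last: every call starts from the original grid,
// so only the last update survives.
let viaLast = updates |> Seq.map (fun c -> grid |> setCell c) |> Seq.last

// fold: each call receives the grid produced by the previous call.
let viaFold = updates |> Seq.fold (fun g c -> g |> setCell c) grid

// Count the cells that are switched on.
let countOn g = g |> Map.toSeq |> Seq.filter (fun (_, c) -> c.State) |> Seq.length

printfn "%d %d" (countOn viaLast) (countOn viaFold)   // prints "1 2"
```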
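I don't think a built-in alternative exists, but you can define your own using fold. The version below runs f on each item and returns the last item wrapped in an option (None for an empty sequence):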
let tapSeq f s = Seq.fold (fun _ x -> f x; Some(x)) None s

How do I identify the max length from a Map's value set?

let numbers = [1;2;2;3;3;3;4;5;5]
let map = numbers |> Seq.groupBy id
                  |> Map.ofSeq
I want to do this:
map.Values |> List.max
or...
let longestSequence = Map.map (fun (k, v) -> List.max(List.ofSeq(v)));
You can get something similar to Dictionary.Values with Map.toSeq >> Seq.map snd, so you can get the longest collected sequence in your map like this:
> map |> Map.toSeq |> Seq.map snd |> Seq.maxBy Seq.length;;
val it : seq<int> = seq [3; 3; 3]
Of course, when your list is already sorted, it seems strange to take the detour through a Map, since
> numbers |> Seq.groupBy id |> Seq.map snd |> Seq.maxBy Seq.length;;
val it : seq<int> = seq [3; 3; 3]
will do the same ;)
Also, if you think about the problem, you can write a List.fold (with an additional map over the result) that does this as well and only needs to traverse the (sorted) list once. Maybe you can try to do this yourself ^^
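One possible shape for that single-pass fold is sketched below. It assumes equal values sit next to each other (as in a sorted list), and the longestRun name and the state layout are choices made for this illustration, not something from the answer.

```fsharp
// Longest run of equal adjacent values, found in a single pass.
// State: (current run, best run so far), each stored as Some (value, length).
let longestRun (xs : int list) =
    let step (current, best) x =
        // Extend the current run if x repeats its value, otherwise start a new run.
        let current' =
            match current with
            | Some (v, n) when v = x -> Some (x, n + 1)
            | _ -> Some (x, 1)
        // Keep the old best unless the current run is strictly longer.
        let best' =
            match current', best with
            | Some (_, n), Some (_, m) when n <= m -> best
            | _ -> current'
        (current', best')
    xs
    |> List.fold step (None, None)
    |> snd
    // The additional map over the result: rebuild the run from (value, length).
    |> Option.map (fun (v, n) -> List.replicate n v)

let numbers = [1;2;2;3;3;3;4;5;5]
printfn "%A" (longestRun numbers)   // Some [3; 3; 3]
```

The result is an option so that an empty list yields None instead of throwing, which is where it differs from Seq.maxBy.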

GroupBy Year then take Pairwise diffs except for the head value then Flatten Using Deedle and F#

I have the following variable:
data:seq<(DateTime*float)>
and I want to do something like the following F# code but using Deedle:
data
|> Seq.groupBy (fun (k,v) -> k.Year)
|> Seq.map (fun (k,v) ->
    let vals = v |> Seq.pairwise
    let first = seq { yield v |> Seq.head }
    let diffs = vals |> Seq.map (fun ((t0,v0),(t1,v1)) -> (t1, v1 - v0))
    (k, diffs |> Seq.append first))
|> Seq.collect snd
This works fine using F# sequences but I want to do it using Deedle series. I know I can do something like:
(data:Series<DateTime,float>) |> Series.groupBy (fun k v -> k.Year)...
But then I need to take the within-year diffs, except for the head value, which should just be the value itself, and then flatten the results into one series. I am a bit confused by the Deedle syntax.
Thanks!
I think the following might be doing what you need:
ts
|> Series.groupInto
    (fun k _ -> k.Month)
    (fun m s ->
        let first = series [ fst s.KeyRange => s.[fst s.KeyRange]]
        Series.merge first (Series.diff 1 s))
|> Series.values
|> Series.mergeAll
The groupInto function lets you specify a function that is called on each of the groups.
For each group, we create series with the differences using Series.diff and append a series with the first value at the beginning using Series.merge.
At the end, we get all the nested series & flatten them using Series.mergeAll.

key based functional fold

I have map-reduce code in which each thread groups its results by some key, and the reduce step then merges the results. My current approach is to search for a specific key's index in the accumulator and then use mapi to combine the result for that key only, leaving the rest unmodified:
let rec groupFolder sequence acc =
    match sequence with
    | (by:string, what) :: rest ->
        let index = acc |> Seq.tryFindIndex( fun (byInAcc, _) -> byInAcc.Equals(by) )
        match index with
        | Some (idx) ->
            acc |> Seq.mapi( fun i (byInAcc, whatInAcc) -> if i = idx then (by, (what |> Array.append whatInAcc) ) else byInAcc, whatInAcc )
                |> groupFolder rest
        | None -> acc |> Seq.append( seq{ yield (by, what) } )
                      |> groupFolder rest
My question is: is there a more functional way to achieve this?
As an example input to this reducer:
let GroupsCommingFromMap = [| seq { yield! [|("key1", [|1;2;3|] ); ("key2", [|1;2;3|] ); ("key3", [|1;2;3|]) |] }, seq { yield! [|("key1", [|4;5;6|] ); ("key2", [|4;5;6|] ); ("key3", [|4;5;6|]) |] } |];;
GroupsCommingFromMap |> Seq.reduce( fun acc i ->
    acc |> groupFolder (i |> Seq.toList))
The expected result should contain all of key1..key3, each with the array 1..6.
From the code you posted, it is not very clear what you're trying to do. Could you include some sample inputs (together with the output that you would like to get)? And does your code actually work on any of the inputs? (It has an incomplete pattern match, so I doubt that.)
Anyway, you can implement key-based map reduce using Seq.groupBy. For example:
let mapReduce mapper reducer input =
    input
    |> Seq.map mapper
    |> Seq.groupBy fst
    |> Seq.map (fun (k, vs) ->
        k, vs |> Seq.map snd |> Seq.reduce reducer)
Here:
The mapper takes a value from the input sequence and turns it into a key-value pair. The mapReduce function then groups the values by key.
The reducer is then used to reduce all values associated with each key.
This lets you create a word count function like this (using a simple mapper that returns the word as the key with 1 as the value, and a reducer that just adds all the numbers):
"hello world hello people hello world".Split(' ')
|> mapReduce (fun w -> w, 1) (+)
EDIT: The example you mentioned does not really have a "mapper" part; instead it has an array of arrays as input. So perhaps it is easier to write this directly using Seq.groupBy, like this:
let GroupsCommingFromMap =
    [| [|("key1", [|1;2;3|] ); ("key2", [|1;2;3|] ); ("key3", [|1;2;3|]) |]
       [|("key1", [|4;5;6|] ); ("key2", [|4;5;6|] ); ("key3", [|4;5;6|]) |] |]
GroupsCommingFromMap
|> Seq.concat
|> Seq.groupBy fst
|> Seq.map (fun (k, vs) -> k, vs |> Seq.map snd |> Array.concat)
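If you want to keep the shape of an explicit key-based fold, one option is to fold into an immutable Map so that the lookup by key replaces the index search. This is a sketch under the assumption that keys are strings and values are int arrays, as in the sample input; addGroup and mergeByKey are names invented here.

```fsharp
// Merge one (key, values) pair into the accumulator map.
let addGroup (acc : Map<string, int[]>) (key, values) =
    match Map.tryFind key acc with
    | Some existing -> Map.add key (Array.append existing values) acc
    | None -> Map.add key values acc

// Flatten the per-thread groups and fold them into a single map.
let mergeByKey groups =
    groups |> Seq.concat |> Seq.fold addGroup Map.empty

let input =
    [| [| ("key1", [|1;2;3|]); ("key2", [|1;2;3|]); ("key3", [|1;2;3|]) |]
       [| ("key1", [|4;5;6|]); ("key2", [|4;5;6|]); ("key3", [|4;5;6|]) |] |]

let merged = mergeByKey input
printfn "%A" (Map.toList merged)
```

Compared with Seq.groupBy, this makes the accumulator explicit, which may be convenient if the partial results arrive one batch at a time.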

Is there a way to write this in F#?

let is_sum_greater_than_10 list =
    list
    |> Seq.filter (filter)
    |> Seq.sum
    |> (10 >)
This does not compile. Looking at the last line, |> (10 >), is there a way to write this so that the left side is piped into the right for binary operators?
Thanks
You can use a partial application of the < operator, using the (operator-symbol) syntax:
let is_sum_greater_than_10 list =
    list
    |> Seq.filter filter
    |> Seq.sum
    |> (<) 10
You can also see this as an equivalent of a lambda application:
let is_sum_greater_than_10 list =
    list
    |> Seq.filter filter
    |> Seq.sum
    |> (fun x y -> x < y) 10
or just a lambda:
let is_sum_greater_than_10 list =
    list
    |> Seq.filter filter
    |> Seq.sum
    |> (fun y -> 10 < y)
You can use a slightly modified version of your example, albeit this is in an infix expression notation:
let ``is sum greater than 10`` filter list =
    (list
     |> Seq.filter filter
     |> Seq.sum) > 10
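The argument order of a partially applied operator is easy to get backwards, so here is a small sketch of how the piped value lands. The flip helper is not part of FSharp.Core; it is defined here only for illustration.

```fsharp
// (op) a  means  fun b -> a op b, so the piped value ends up on the RIGHT.
let sum = 12

let viaLess    = sum |> (<) 10   // 10 < 12, true: the sum is greater than 10
let viaGreater = sum |> (>) 10   // 10 > 12, false: looks like "> 10" but is not

// Hypothetical helper that swaps the two arguments of a function.
let flip f x y = f y x

let viaFlip = sum |> flip (>) 10 // 12 > 10, true: reads in the natural order

printfn "%b %b %b" viaLess viaGreater viaFlip   // prints "true false true"
```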
