I'm learning F# and I've started to play around with both sequences and match expressions.
I'm writing a web scraper that's looking through HTML similar to the following and taking the last URL in a parent <span> with the paging class.
<html>
<body>
<span class="paging">
Link to Google
The Link I want
</span>
</body>
</html>
My attempt to get the last URL is as follows:
type AnHtmlPage = FSharp.Data.HtmlProvider<"http://somesite.com">
let findMaxPageNumber (page:AnHtmlPage)=
    page.Html.Descendants()
    |> Seq.filter(fun n -> n.HasClass("paging"))
    |> Seq.collect(fun n -> n.Descendants() |> Seq.filter(fun m -> m.HasName("a")))
    |> Seq.last
    |> fun n -> n.AttributeValue("href")
However, I'm running into issues when the class I'm searching for is absent from the page. In particular, I get an ArgumentException with the message: Additional information: The input sequence was empty.
My first thought was to build another function that matched empty sequences and returned an empty string when the paging class wasn't found on a page.
let findUrlOrReturnEmptyString (span:seq<HtmlNode>) =
    match span with
    | Seq.empty -> String.Empty // <----- This is invalid
    | span ->
        span
        |> Seq.collect(fun (n:HtmlNode) -> n.Descendants() |> Seq.filter(fun m -> m.HasName("a")))
        |> Seq.last
        |> fun n -> n.AttributeValue("href")
let findMaxPageNumber (page:AnHtmlPage)=
    page.Html.Descendants()
    |> Seq.filter(fun n -> n.HasClass("paging"))
    |> findUrlOrReturnEmptyString
My issue now is that Seq.empty is not a literal and cannot be used in a pattern. Most examples of pattern matching use the empty list [] in their patterns, so I'm wondering: how can I use a similar approach to match empty sequences?
The suggestion that ildjarn gave in the comments is a good one: if you feel that using match would create more readable code, then make an active pattern to check for empty seqs:
let (|EmptySeq|_|) a = if Seq.isEmpty a then Some () else None
let s0 = Seq.empty<int>
match s0 with
| EmptySeq -> "empty"
| _ -> "not empty"
Run that in F# interactive, and the result will be "empty".
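The same active pattern can take the place of the invalid `Seq.empty` case from the question. The following is a minimal sketch. It assumes the href values have already been pulled out into a `seq<string>`, and `lastHrefOrEmpty` is a hypothetical name. FSharp.Data is left out so the snippet runs on its own:

```fsharp
open System

// Partial active pattern: matches when the sequence has no elements
let (|EmptySeq|_|) (a: seq<'T>) = if Seq.isEmpty a then Some () else None

/// Hypothetical helper: the last href in the sequence, or "" when there is none
let lastHrefOrEmpty (hrefs: seq<string>) =
    match hrefs with
    | EmptySeq -> String.Empty
    | nonEmpty -> Seq.last nonEmpty

printfn "%A" (lastHrefOrEmpty [ "http://google.com"; "page2.html" ])
printfn "%A" (lastHrefOrEmpty Seq.empty)
```

`Seq.isEmpty` followed by `Seq.last` enumerates the input twice. That is cheap for a list, but it repeats the work for a lazily evaluated query.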
You can use a when guard to further qualify the case:
match span with
| sequence when Seq.isEmpty sequence -> String.Empty
| span ->
    span
    |> Seq.collect (fun (n: HtmlNode) ->
        n.Descendants()
        |> Seq.filter (fun m -> m.HasName("a")))
    |> Seq.last
    |> fun n -> n.AttributeValue("href")
ildjarn is right, though: in this case, an if...then...else may be the more readable alternative.
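For comparison, here is a sketch of the if...then...else version. Next to it is a variant that skips the emptiness check entirely by using `Seq.tryLast` (F# 4.0 and later) with `Option.defaultValue` (F# 4.1 and later). The input is again assumed to be a plain `seq<string>` of hrefs, and both function names are hypothetical:

```fsharp
/// if...then...else version of the empty-sequence check
let lastHrefIfElse (hrefs: seq<string>) =
    if Seq.isEmpty hrefs then "" else Seq.last hrefs

/// Seq.tryLast returns None for an empty sequence instead of throwing
let lastHrefTry (hrefs: seq<string>) =
    hrefs |> Seq.tryLast |> Option.defaultValue ""
```

In the original pipeline, you would replace `Seq.last` with `Seq.tryLast` and then add `Option.map (fun n -> n.AttributeValue("href"))` before the default.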
Use a guard clause:
match myseq with
| s when Seq.isEmpty s -> "empty"
| _ -> "not empty"
Building on the answer from @rmunn, you can make a more general sequence-equality active pattern.
let (|Seq|_|) test input =
    if Seq.compareWith Operators.compare input test = 0
    then Some ()
    else None
match [] with
| Seq [] -> "empty"
| _ -> "not empty"
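The same pattern also works for non-empty literals. The sketch below renames it `SeqEq` (a hypothetical name) for clarity. It assumes element-wise comparison with `compare` is what you want, so the element type must support comparison. The `#seq` annotations let a list literal be compared against any sequence:

```fsharp
// Parameterized partial active pattern: succeeds when the input equals
// the expected sequence element by element
let (|SeqEq|_|) (expected: #seq<'T>) (input: #seq<'T>) =
    if Seq.compareWith compare input expected = 0 then Some () else None

let describe (xs: seq<int>) =
    match xs with
    | SeqEq [] -> "empty"
    | SeqEq [ 1; 2; 3 ] -> "one to three"
    | _ -> "something else"
```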
I am trying to create a recursive function that conditionally calls itself; so far it is defined as follows:
let rec crawlPage (page : String, nestingLevel : int) =
    HtmlDocument.Load(page)
    |> fun m -> m.CssSelect("a")
    |> List.map(fun a -> a.AttributeValue("href"))
    |> Seq.distinctBy id
    |> Seq.map (fun x -> baseUrl + x)
    |> Seq.map (fun x ->
        match nestingLevel with
        // Compiler says it is expecting 'a but given seq<'a> in reference to the recursive call
        | _ when (nestingLevel > 0) -> crawlPage(x, (nestingLevel - 1))
        | _ when (nestingLevel <= 0) -> ignore
        | _ -> (* To silence warnings.*) ignore
    )
Is it Seq.map (fun x -> ...) that cannot handle the returned sequence, or is it the match expression? Since the compiler underlines crawlPage, it seems that the match expression cannot handle the returned seq. How can this be done?
The rule is that all the matching branches must return the same type, so you have to:
Replace ignore with Seq.singleton x to indicate that this branch yields nothing more except the x itself.
At the end, concat (flat map) the seq<seq<string>> to transform it to a seq<string>.
The code would be:
|> Seq.map (fun x ->
    match nestingLevel with
    | _ when (nestingLevel > 0) -> crawlPage(x, (nestingLevel - 1))
    | _ -> Seq.singleton x)
|> Seq.concat
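You can try the "same type in every branch" rule without network access. This sketch replaces `HtmlDocument.Load(...).CssSelect("a")` with a hypothetical in-memory `links` map from each page to the pages it links to. Otherwise it keeps the shape of the fixed code. `Seq.map` followed by `Seq.concat` is equivalent to `Seq.collect`, which is used here:

```fsharp
// Hypothetical stand-in for the links found on each page
let links =
    Map.ofList [ ("/", [ "/a"; "/b" ]); ("/a", [ "/c" ]); ("/b", []); ("/c", []) ]

let rec crawl (page: string, nestingLevel: int) : seq<string> =
    Map.find page links
    |> Seq.distinct
    // Both branches return seq<string>, so the match type-checks
    |> Seq.collect (fun x ->
        match nestingLevel with
        | _ when nestingLevel > 0 -> crawl (x, nestingLevel - 1)
        | _ -> Seq.singleton x)
```

With this structure, a page is only yielded at the last nesting level, and a page with no links contributes nothing at deeper levels.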
The existing post answers your specific question, but I think it is worth noting that there are a few other changes that could be made to your code snippet. Some of them are a matter of personal preference, but I believe they make your code simpler:
You can use a sequence expression, which lets you handle recursive calls nicely with yield! (and non-recursive ones with yield)
You do not actually need match, because you have just two branches that are more easily tested using ordinary if
I would also avoid the |> fun m -> m.Xyz pattern, because it's not necessary here.
With all those tweaks, my preferred version of the code snippet would be:
let rec crawlPage (page : String, nestingLevel : int) = seq {
    let urls =
        HtmlDocument.Load(page).CssSelect("a")
        |> List.map(fun a -> a.AttributeValue("href"))
        |> Seq.distinctBy id
        |> Seq.map (fun x -> baseUrl + x)
    for x in urls do
        if nestingLevel > 0 then
            yield! crawlPage(x, (nestingLevel - 1))
        else
            yield x }
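The sequence-expression version can be run the same way against a hypothetical in-memory `links` map that stands in for the downloaded pages. This is a sketch only; the URL strings are made up:

```fsharp
// Hypothetical stand-in for the links found on each page
let links =
    Map.ofList [ ("/", [ "/a"; "/b" ]); ("/a", [ "/c" ]); ("/b", []); ("/c", []) ]

let rec crawlPage (page: string, nestingLevel: int) : seq<string> = seq {
    let urls = Map.find page links |> Seq.distinct
    for x in urls do
        if nestingLevel > 0 then
            // Recursive case: splice the child sequence into this one
            yield! crawlPage (x, nestingLevel - 1)
        else
            // Leaf case: yield the single URL
            yield x }
```

`yield!` flattens the nested sequence in place. That is why no separate `Seq.concat` step is needed.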
I'm trying to write a tail-recursive function that takes a list of distinct words and a list of all words, and returns a list with the number of occurrences of each word. I'm actually reading the words from files in a directory, but I can't get the tail-recursive function to compile. This is what I have so far:
let countOccurence (word:string) list =
    List.filter (fun x -> x.Equals(word)) list
//(all words being a list of all words across several files)
let distinctWords = allWords |> Seq.distinct
let rec wordCloud distinct (all:string list) acc =
    match distinct with
    | head :: tail -> wordCloud distinct tail Array.append(acc, (countOccurence head all)) //<- What am I doing with my life?
    | [] -> 0
I realize this is probably a fairly straightforward question, but I've been banging my head for an hour on this final piece of the puzzle. Any thoughts?
There are several issues with the statement as given:
Use of Array.append to manipulate lists
Typos
Incorrect use of whitespace to group things together
Try expressing the logic as a series of steps instead of putting everything into a single, unreadable line of code. Here's what I did to understand the problems with the above expression:
let rec wordCloud distinct (all:string list) acc =
    match distinct with
    | head :: tail ->
        let count = countOccurence head all
        let acc' = acc |> List.append count
        // Recurse on the remaining distinct words, keeping the full word list
        wordCloud tail all acc'
    | [] -> acc
This compiles, but I don't know if it does what you want it to do...
Notice the replacement of Array.append with List.append.
This is still tail recursive, since the call to wordCloud sits in the tail position.
After several more hours of work, I came up with this:
let countOccurance (word:string) list =
    let count = List.filter (fun x -> word.Equals(x)) list
    (word, count.Length)
let distinctWords = allWords |> Seq.distinct |> Seq.toList
let print (tup:string*int) =
    match tup with
    | (a,b) -> printfn "%A: %A" a b
let rec wordCloud distinct (all:string list) (acc:(string*int) list) =
    match distinct with
    | [] -> acc
    | head :: tail ->
        let accumSoFar = acc @ [(countOccurance head all)]
        wordCloud tail all accumSoFar
let acc = []
let cloud = (wordCloud distinctWords allWords acc)
let rec printTup (tupList:(string*int) list) =
    match tupList with
    | [] -> 0
    | head :: tail ->
        printfn "%A" head
        printTup tail
printTup cloud
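`acc @ [x]` copies the whole accumulator on every step, so building the result takes quadratic time in the number of distinct words. A common alternative is to prepend with `::` and reverse once at the end. The recursive call stays in tail position. This is a sketch with the same tuple shape; `wordCloudRev` is a hypothetical name:

```fsharp
let countOccurance (word: string) (words: string list) =
    (word, words |> List.filter (fun x -> x = word) |> List.length)

let rec wordCloudRev distinct (all: string list) (acc: (string * int) list) =
    match distinct with
    | [] -> List.rev acc                        // restore the original order once
    | head :: tail ->
        // O(1) prepend instead of acc @ [...]
        wordCloudRev tail all (countOccurance head all :: acc)

let sampleWords = [ "a"; "b"; "a"; "c"; "a"; "b" ]
let sampleCloud = wordCloudRev (List.distinct sampleWords) sampleWords []
```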
This problem actually has a pretty straightforward solution, if you take a step back and simply type in what you want to do.
/// When you want to make a tag cloud...
let makeTagCloud (words: string list) =
    // ...take a list of all words...
    words
    // ...then walk along the list...
    |> List.fold (fun cloud word ->
        // ...and check if you've seen that word...
        match cloud |> Map.tryFind word with
        // ...if you have, bump the count...
        | Some count -> cloud |> Map.add word (count+1)
        // ...if not, add it to the map...
        | None -> cloud |> Map.add word 1) Map.empty
    // ...and change the map back into a list when you are done.
    |> Map.toList
Reads like poetry ;)
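F# 4.0 and later also provide `List.countBy`, which does the same grouping in a single call. The sketch below compares the two and repeats `makeTagCloud` so the block runs on its own. `countBy` keeps the order in which keys first appear, whereas `Map.toList` returns pairs sorted by key. For that reason, the `countBy` result is sorted before the comparison:

```fsharp
let makeTagCloud (words: string list) =
    words
    |> List.fold (fun cloud word ->
        match cloud |> Map.tryFind word with
        | Some count -> cloud |> Map.add word (count + 1)
        | None -> cloud |> Map.add word 1) Map.empty
    |> Map.toList

// Built-in alternative: group by the word itself and count each group
let makeTagCloudCountBy (words: string list) = words |> List.countBy id

let sample = [ "b"; "a"; "b"; "c"; "b" ]
```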
I have a list:
[objA;objB;objC;objD]
I need to do the following reduction:
obj -> obj -> obj
i.e.:
objA objB -> objB'
and then put the result back into the original list so that I get:
[objB';objC;objD]
I am trying to do the following:
let rec resolveConflicts = function
    | [] -> []
    | [c] -> resolveConflict c
    | [x;y] -> resolveConflict x
               |> List.filter getMovesRelatedtoY
               |> List.append y
               |> resolveConflict
    | [x;y::tail] ->
        let y' = resolveConflict x
                 |> List.filter getMovesRelatedtoY
                 |> List.append y
        resolveConflicts y'::tail
This syntax is not correct; maybe I am not even using the right tool. I am open to any well-suited solution so that I can learn.
As to why I filter the list and append one to another: every conflict is a list of moves.
To match first element, second element and the rest of the list, use this pattern:
| fst::snd::rest ->
You can match any fixed number of leading elements using this style:
| fst::snd::thrd:: ... ::rest ->
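Applied to the question, a sketch might look like the following. It assumes a hypothetical `combine : 'T -> 'T -> 'T` that merges the first two elements, standing in for the question's filter-and-append logic. The merged result goes back at the head of the list, and the step repeats until at most one element remains. For a non-empty list, the full recursion gives the same result as `[ List.reduce combine items ]`:

```fsharp
/// One step: [a; b; c; d] -> [combine a b; c; d]
let resolveFirstPair (combine: 'T -> 'T -> 'T) (items: 'T list) =
    match items with
    | first :: second :: rest -> combine first second :: rest
    | shorter -> shorter          // zero or one element: nothing to merge

/// Repeat the step until at most one element is left
let rec resolveAll (combine: 'T -> 'T -> 'T) (items: 'T list) =
    match items with
    | first :: second :: rest -> resolveAll combine (combine first second :: rest)
    | shorter -> shorter
```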
Is there already a way to do something like a chooseTill or a foldTill, which processes elements until a None option is received? Really, any of the higher-order functions with a "till" option. Granted, it makes no sense for something like map, but I find I need this kind of thing pretty often, and I want to make sure I'm not reinventing the wheel.
In general, it would be pretty easy to write something like this, but I'm curious whether there is already a way to do it, or whether it exists in some known library:
open System
let chooseTill predicate (sequence:seq<'a>) =
    seq {
        let finished = ref false
        for elem in sequence do
            if not !finished then
                match predicate elem with
                | Some(x) -> yield x
                | None -> finished := true
    }
let foldTill predicate seed list =
    let rec foldTill' acc = function
        | [] -> acc
        | (h::t) -> match predicate acc h with
                    | Some(x) -> foldTill' x t
                    | None -> acc
    foldTill' seed list
let (++) a b = a.ToString() + b.ToString()
let abcdef =
    foldTill (fun acc v ->
        if Char.IsWhiteSpace v then None
        else Some(acc ++ v)) "" ("abcdef ghi" |> Seq.toList)
// result is "abcdef"
I think you can get that easily by combining Seq.scan and Seq.takeWhile:
open System
"abcdef ghi"
|> Seq.scan (fun (_, state) c -> c, state + (string c)) ('x', "")
|> Seq.takeWhile (fst >> Char.IsWhiteSpace >> not)
|> Seq.last |> snd
The idea is that Seq.scan does something like Seq.fold, but instead of waiting for the final result, it yields the intermediate states as it goes. You can then keep taking intermediate states until you reach the stopping condition (here, a whitespace character). In the above example, the state is the current character together with the concatenated string, so that we can check whether the character was whitespace.
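A small numeric example shows the difference between `Seq.fold` and `Seq.scan`. It also shows how `Seq.takeWhile` cuts the stream of states:

```fsharp
// Seq.fold keeps only the final state
let total = Seq.fold (+) 0 [ 1; 2; 3 ]                     // 6

// Seq.scan yields the initial state and every intermediate state
let states = Seq.scan (+) 0 [ 1; 2; 3 ] |> List.ofSeq      // [0; 1; 3; 6]

// Keep states while the running total stays below 5, then take the last one
let capped =
    Seq.scan (+) 0 [ 1; 2; 3; 4 ]
    |> Seq.takeWhile (fun s -> s < 5)
    |> Seq.last                                             // 3
```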
A more general version based on a function that returns option could look like this:
let foldWhile f initial input =
    // Generate sequence of all intermediate states
    input |> Seq.scan (fun stateOpt inp ->
        // If the current state is not 'None', then calculate a new one
        // if 'f' returns 'None' then the overall result will be 'None'
        stateOpt |> Option.bind (fun state -> f state inp)) (Some initial)
    // Take only 'Some' states and get the last one
    |> Seq.takeWhile Option.isSome
    |> Seq.last |> Option.get
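As a usage check, `foldWhile` can reproduce the question's `foldTill` example, which concatenates characters up to the first whitespace. The definition is repeated so the block is self-contained. Because `Seq.scan` always yields the initial `Some initial` state, `Seq.last` never sees an empty sequence:

```fsharp
open System

let foldWhile f initial input =
    input
    |> Seq.scan (fun stateOpt inp -> stateOpt |> Option.bind (fun state -> f state inp)) (Some initial)
    |> Seq.takeWhile Option.isSome
    |> Seq.last
    |> Option.get

// Concatenate characters until the first whitespace character
let abcdef =
    foldWhile (fun (acc: string) (c: char) ->
        if Char.IsWhiteSpace c then None else Some (acc + string c)) "" "abcdef ghi"

// Sum numbers until the first non-positive one
let positivePrefixSum =
    foldWhile (fun acc x -> if x > 0 then Some (acc + x) else None) 0 [ 1; 2; -1; 5 ]
```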
Suppose I have the following code:
type Vehicle =
    | Car of string * int
    | Bike of string
let xs = [ Car("family", 8); Bike("racing"); Car("sports", 2); Bike("chopper") ]
I can filter the above list using an incomplete pattern match in an imperative for loop:
> for Car(kind, _) in xs do
> printfn "found %s" kind;;
found family
found sports
val it : unit = ()
but it causes warning FS0025: Incomplete pattern matches on this expression. For example, the value 'Bike (_)' may indicate a case not covered by the pattern(s). Unmatched elements will be ignored.
Since ignoring unmatched elements is my intention, is there a way to get rid of this warning?
And is there a way to make this work with list comprehensions without causing a MatchFailureException? E.g. something like this:
> [for Car(_, seats) in xs -> seats] |> List.sum;;
val it : int = 10
Two years ago, your code was valid, and it was the standard way to do it. Since then, the language has been cleaned up, and the design decision was to favour the explicit syntax. For this reason, I think it's not a good idea to ignore the warning.
The standard replacement for your code is:
for x in xs do
    match x with
    | Car(kind, _) -> printfn "found %s" kind
    | _ -> ()
(You could also use higher-order functions, as in pad's sample.)
For the other one, List.sumBy would fit well:
xs |> List.sumBy (function Car(_, seats) -> seats | _ -> 0)
If you prefer to stick with comprehensions, this is the explicit syntax:
[for x in xs do
    match x with
    | Car(_, seats) -> yield seats
    | _ -> ()
] |> List.sum
You can silence any warning via the #nowarn directive or --nowarn: compiler option (pass the warning number, here 25 as in FS0025).
But more generally, no, the best thing is to explicitly filter, as in the other answer (e.g. with choose).
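For completeness, the `#nowarn` route would look like the sketch below. It relies on the documented behaviour of a `for pattern in ...` loop with an incomplete pattern, which skips non-matching elements. Keep in mind that `#nowarn "25"` hides every FS0025 warning in the file, including ones in unrelated `match` expressions:

```fsharp
#nowarn "25"   // suppress FS0025 (incomplete pattern matches) for this file

type Vehicle =
    | Car of string * int
    | Bike of string

let xs = [ Car("family", 8); Bike("racing"); Car("sports", 2); Bike("chopper") ]

let found = ResizeArray<string>()
// Elements that are not Car are skipped by the loop
for Car(kind, _) in xs do
    found.Add kind
```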
To state explicitly that you want to ignore unmatched cases, you can use List.choose and return None for the unmatched elements. Your code could be written more idiomatically as follows:
let _ = xs |> List.choose (function | Car(kind, _) -> Some kind
                                    | _ -> None)
           |> List.iter (printfn "found %s")
let sum = xs |> List.choose (function | Car(_, seats) -> Some seats
                                      | _ -> None)
             |> List.sum