Finding objects by class or id in HTML parser - F#

There is a nice library for parsing HTML files in F#. I can easily get all <a> objects:
let links = results.Descendants ["a"]
But what about searching for objects with specific classes or ids? Does this library provide such functionality?
Unfortunately, the documentation for this library is quite poor, so I don't really know what exactly I can do with it.

There is a work-in-progress pull request adding CSS selectors to F# Data. If you can help us by testing and reviewing it, that would be great!
In the meantime, you can use standard F# collection processing functions. For example, to find <a> elements whose class attribute is exactly "fl", you can write:
open FSharp.Data
// Print the text of every <a> element whose class attribute is exactly "fl"
results.Descendants ["a"]
|> Seq.filter (fun a ->
    (a.TryGetAttribute("class") |> Option.map (fun cls -> cls.Value())) = Some "fl")
|> Seq.iter (fun l -> printfn "%s" (l.InnerText()))
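The filter above compares the whole class attribute with "fl", so an element with class="btn fl" would be missed, because HTML allows several space-separated class names. Below is a minimal sketch of a token-based check. `hasClass` is a hypothetical helper, not part of F# Data; it works on the raw attribute text as a `string option`, so it runs without the library.

```fsharp
open System

/// Returns true when a class attribute value (if present) contains `name`
/// as one of its whitespace-separated tokens.
let hasClass (name : string) (classAttr : string option) =
    match classAttr with
    | Some value ->
        value.Split([| ' '; '\t'; '\r'; '\n' |], StringSplitOptions.RemoveEmptyEntries)
        |> Array.contains name
    | None -> false

// Usage on raw attribute values
printfn "%b" (hasClass "fl" (Some "btn fl"))   // true
printfn "%b" (hasClass "fl" (Some "flex"))     // false
```

In the pipeline above it could replace the equality test as `a.TryGetAttribute("class") |> Option.map (fun cls -> cls.Value()) |> hasClass "fl"`. An id attribute holds a single value, so for ids the exact comparison against `Some "..."` is already the right check.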

Related

Any way to "open Seq" or similar effect?

a
|> Seq.map fixLine
|> Seq.map splitCells
|> Seq.map getName
|> Seq.where (fun a -> not <| Seq.isEmpty a)
|> Seq.map fixName
I always find it annoying to keep writing Seq. on every line. Is there a good way to omit it?
For example, I could use List.map for lists and plain map for sequences, or split the code into different modules when I'm working with both sequences and lists:
a
|> map fixLine
|> map splitCells
|> map getName
|> where (fun a -> not <| isEmpty a)
|> map fixName
That looks much better.
You could also just define aliases for the functions you want:
let map = Seq.map
let where = Seq.filter
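FSharp.Core marks the Seq module with [<RequireQualifiedAccess>], so the compiler objects to `open Seq`. A small module of aliases can be opened instead. In the sketch below, `SeqAliases` is a hypothetical module name, and `fixLine`/`splitCells` are hypothetical stand-ins (trim a line, split it on commas) included only so that the pipeline runs.

```fsharp
/// Unqualified aliases for the Seq functions used in the pipeline
module SeqAliases =
    let map = Seq.map
    let where = Seq.filter

open SeqAliases

// Hypothetical stand-ins for the question's processing steps
let fixLine (line : string) = line.Trim()
let splitCells (line : string) = line.Split(',')

let cells =
    Seq.ofList [ " a,b "; "   "; "c" ]
    |> map fixLine
    |> where (fun (line : string) -> line.Length > 0)   // drop blank lines
    |> map splitCells
    |> List.ofSeq
```

Keeping the aliases in their own module means they only shadow names in files that explicitly open it.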
Or you could make it even more terse by defining your own operators:
let (|!>) s f = Seq.map f s
let (|*>) s f = Seq.filter f s
a
|!> fixLine
|!> splitCells
|!> getName
|*> (fun a -> not (Seq.isEmpty a))
|!> fixName
But at this point your code becomes way too cryptic: someone looking at it will have a hard time understanding what's going on.
And finally, you could make the original code look a bit better by noticing that a composition of maps is a map of composition:
a
|> Seq.map (fixLine >> splitCells >> getName)
|> Seq.filter (not << Seq.isEmpty)
|> Seq.map fixName
This is the solution that I personally would prefer.
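The identity this relies on, that mapping twice gives the same result as mapping once with the composed function, can be checked on a small example. `addOne` and `timesTwo` are hypothetical stand-ins for the question's processing steps.

```fsharp
let addOne x = x + 1
let timesTwo x = x * 2

// Two passes: map addOne, then map timesTwo
let twoPasses = [ 1; 2; 3 ] |> Seq.map addOne |> Seq.map timesTwo |> List.ofSeq
// One pass with the composed function addOne >> timesTwo
let onePass = [ 1; 2; 3 ] |> Seq.map (addOne >> timesTwo) |> List.ofSeq

// `not << List.isEmpty` reads as "is not empty"
let nonEmpty = [ [ 1 ]; []; [ 2; 3 ] ] |> List.filter (not << List.isEmpty)
```

Both `twoPasses` and `onePass` are `[4; 6; 8]`; the composed version just walks the sequence once.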
In general, my personal experience shows that, despite the first impulse to "fix" the repetitiveness by making the repetitive parts themselves smaller, there is usually a better solution that would make your code not only look better, but better factored as well.
I don't think there is an easy way to avoid repeating the Seq; this is just one place where F# makes things a bit more explicit (so that you know what's going on).
But you can use the F# Core Fluent library, which gives you a more C#-like syntax with dot notation:
a.map(fixLine).map(splitCells).map(getName).filter(isEmpty >> not).map(fixName)

Using an F# type provider to instantiate types and present them as properties

I'd like to do the following:
let allTypes = AllTypes (t, assemblies)
... where AllTypes is a type provider, the properties of which are instances of all types in the given array of assemblies that subclass type t. (All of the types have a single constructor that takes no arguments.)
Is this doable using F# type providers? I have no experience creating my own provider, and I don't want to waste my time attempting to do this if it isn't feasible.
I'd greatly appreciate any links to pages that would get me started coding this.
There's a lot of activity going on in the FSharp.Data GitHub repo. There is a learning curve, but following that repo might be useful.
Beyond that, this intro tutorial covers some of the basics, and here's a Type Provider starter pack that's been prepared by the F# open source community.
The fsharp.org site, and this projects page covers a cross-section of what's going on (including type providers).
You could take the list that Mark suggests here and turn it into a type provider. I think an exploratory way of interacting with namespaces would be useful. Why not? I'd use it. Please publish on GitHub if you get around to it.
You don't need a type provider for that; you can write that code using basic reflection:
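open System
open System.Reflection

let allTypes (baseClass : Type) (assemblies : Assembly seq) =
    assemblies
    |> Seq.collect (fun x -> x.GetExportedTypes())
    // Skip abstract types and open generic types, which cannot be instantiated
    |> Seq.filter (fun x ->
        baseClass.IsAssignableFrom x && not x.IsAbstract && not x.ContainsGenericParameters)
    |> Seq.collect (fun x -> x.GetConstructors())
    // Keep only public parameterless constructors and invoke them
    |> Seq.filter (fun x -> x.GetParameters().Length = 0)
    |> Seq.map (fun x -> x.Invoke([||]))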
The allTypes function has this signature: Type -> Assembly seq -> obj seq.
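A variant of the same reflection idea checks for a public parameterless constructor with Type.GetConstructor(Type.EmptyTypes) and then calls Activator.CreateInstance, instead of invoking each ConstructorInfo directly. `allTypesViaActivator` is a hypothetical name, and the usage example picks System.ArgumentException and the assembly that defines it only as a convenient, always-available test case.

```fsharp
open System
open System.Reflection

/// Instantiates every public, concrete, non-generic type in `assemblies` that is
/// assignable to `baseClass` (this includes `baseClass` itself) and has a public
/// parameterless constructor.
let allTypesViaActivator (baseClass : Type) (assemblies : Assembly seq) : obj seq =
    assemblies
    |> Seq.collect (fun asm -> asm.GetExportedTypes())
    |> Seq.filter (fun t ->
        baseClass.IsAssignableFrom t
        && not t.IsAbstract
        && not t.ContainsGenericParameters
        && not (isNull (t.GetConstructor(Type.EmptyTypes))))
    |> Seq.map (fun t -> Activator.CreateInstance(t))

// Usage: every public ArgumentException type defined next to ArgumentException
let instances =
    allTypesViaActivator typeof<ArgumentException> [ typeof<ArgumentException>.Assembly ]
    |> List.ofSeq
printfn "Created %d instances" instances.Length
```

If only strict subclasses are wanted, add `t <> baseClass` to the filter.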

Get a column by name as array from CsvFile.Load (or create dictionary of arrays from csv)

I have the following code to load a csv. What is the best way to get a column from "msft" (preferably by name) as an array? Or should I be loading the data in a different way to do this?
#r "FSharp.Data.dll"
open FSharp.Data.Csv
let msft = CsvFile.Load("http://ichart.finance.yahoo.com/table.csv?s=MSFT").Cache()
Edit: Alternatively, what would be an efficient way to import a CSV into a dictionary of arrays keyed by column name? If I should really be asking this as a new question, please let me know; I'm not yet familiar with all the Stack Overflow conventions.
Building on Latkin's answer, this seems like the more functional or F# way of doing what you want.
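// Project one column out of a sequence of rows into an array
let getVector columnAccessor rows =
    rows |> Seq.map columnAccessor |> Array.ofSeq
(* Now we can get the column all at once. Piping msft.Data in first
   lets F# infer the row type, so that x.Close can be resolved. *)
let closes = msft.Data |> getVector (fun x -> x.Close)
(* Or we can bind the rows once and pass in any column accessor. *)
let column accessor = msft.Data |> getVector accessor
let closes' = column (fun x -> x.Close)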
I hope that this helps.
I went through this example as well. Something like the following should do it.
let data =
    msft.Data
    |> Seq.fold (fun acc row -> row.Date :: acc) []
    |> List.rev
Here I pipe the msft.Data rows into a fold that collects one field (Date) from each row into a list. Because the fold prepends each value, the list is reversed at the end to keep the original row order. Please check the documentation for all functions mentioned. I have not run this.
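To see the fold behave on concrete data, here is a self-contained sketch. The `Row` record and its values are hypothetical stand-ins for the rows of msft.Data; the plain Seq.map version is included for comparison.

```fsharp
open System

// Hypothetical row type standing in for the rows of msft.Data
type Row = { Date : DateTime; Close : float }

let rows =
    [ { Date = DateTime(2013, 1, 2); Close = 10.5 }
      { Date = DateTime(2013, 1, 3); Close = 11.0 } ]

// Fold: each date is prepended, so the list comes out reversed and is flipped back
let datesByFold =
    rows
    |> Seq.fold (fun acc row -> row.Date :: acc) []
    |> List.rev

// The same column with a plain map, which keeps the order without the extra reverse
let datesByMap = rows |> Seq.map (fun row -> row.Date) |> List.ofSeq
```

Without the List.rev the fold would return the dates newest-first, which is easy to miss.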
When you say you want a column "by name", it's not clear whether you mean "someone passes me the column name as a string" or "I use the column name in my code." Type providers are perfect for the latter case, but do not really help with the former.
For the latter case, you could use this:
let closes = msft.Data |> Seq.map (fun x -> x.Close) |> Array.ofSeq
If the former, you might want to consider reading in the data some other way, perhaps to a dictionary keyed by column names.
The whole point of type providers is to make all of this strongly typed and code-focused, and to move away from passing column names as strings which might or might not be valid.
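For the string-keyed case (and the question's edit), a dictionary of arrays keyed by column name can be built without the type provider. The sketch below is a deliberately naive parser: it assumes a header row, comma separators and no quoted fields, and `csvToColumns` is a hypothetical helper name. Values stay as strings; a column can be converted afterwards (for example with Array.map float) once its type is known. The sample text and its values are made up.

```fsharp
open System
open System.Collections.Generic

/// Naive CSV-to-columns parser: the first line is the header, fields are
/// separated by commas and never quoted. Each column name maps to an array
/// holding that column's values, in row order.
let csvToColumns (text : string) : IDictionary<string, string[]> =
    let lines =
        text.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
        |> Array.map (fun line -> line.TrimEnd('\r'))   // tolerate \r\n line endings
        |> Array.filter (fun line -> line <> "")
        |> Array.map (fun line -> line.Split(','))
    let header = lines.[0]
    let rows = lines.[1..]
    header
    |> Array.mapi (fun i name -> name, rows |> Array.map (fun row -> row.[i]))
    |> dict

// Usage with a tiny inline sample
let columns = csvToColumns "Date,Close\n2013-01-02,10.5\n2013-01-03,11.0\n"
let closeValues = columns.["Close"] |> Array.map float
```

For real data with quoted fields, a proper CSV reader should do the parsing; only the final "group values by header" step is the point here.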

Splitting a string list list

I'm quite (very) new to F# and I'm scratching my head over a little problem. I have a string list list that I'm trying to manipulate and transform. This is probably trivial.
The following data is being read in from a CSV file:
1,ABC,3
1,DEF,3
1,XYZ,1
2,ABC,2
2,XYZ,1
3,DEF,2
3,XYZ,2
Rightly or wrongly, I'm reading this into a string list list. The data is denormalized: the field at index 0 of each record is an identifier. For now I'm just trying to split the outer list up so that I end up with a string list list list representing the following:
1,ABC,3    2,ABC,2    3,DEF,2
1,DEF,3    2,XYZ,1    3,XYZ,2
1,XYZ,1
The results above will then be pushed into my typed model and fed into the rest of the application.
In your code:
csvRecords
|> Seq.groupBy (fun record -> (record.Item 0))
|> List.ofSeq
|> List.map(toTypedModel)
record.Item 0 isn't a good way to get the first element of a list. You should either use List.head or pattern matching for that purpose.
Your example would look like:
csvRecords
|> Seq.groupBy List.head
|> Seq.map toTypedModel
|> List.ofSeq
I also changed the order so that toTypedModel is applied to the sequence; this avoids allocating an unnecessary intermediate list.
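A runnable sketch on the question's sample data, using the pattern-matching alternative to List.head mentioned above. `recordId` is a hypothetical helper, and the last step keeps only the grouped records (dropping the keys) to produce the string list list list the question asks for; in the real code this is where toTypedModel would go.

```fsharp
// The sample records from the question, one string list per CSV line
let csvRecords : string list list =
    [ [ "1"; "ABC"; "3" ]
      [ "1"; "DEF"; "3" ]
      [ "1"; "XYZ"; "1" ]
      [ "2"; "ABC"; "2" ]
      [ "2"; "XYZ"; "1" ]
      [ "3"; "DEF"; "2" ]
      [ "3"; "XYZ"; "2" ] ]

// Pattern-matching alternative to List.head; "" is an arbitrary key for empty records
let recordId (record : string list) =
    match record with
    | first :: _ -> first
    | [] -> ""

// Group by identifier and drop the keys, giving a string list list list
let grouped : string list list list =
    csvRecords
    |> Seq.groupBy recordId
    |> Seq.map (fun (_, records) -> List.ofSeq records)
    |> List.ofSeq
```

Unlike List.head, the match does not throw on an empty record.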
Use Seq.groupBy (this version assumes each record has been parsed into a three-element tuple rather than a list):
input
|> Seq.groupBy (fun (a, _, _) -> a)
|> Seq.toList

Shorthand for calling a method on an object in F#

Is there a way to "generate" functions like this one:
fun x -> x.ToString()
I would like to be able to turn an instance method into a static method which takes "this" as a parameter, like so:
items |> Seq.filter my_predicate |> Seq.map Object.ToString
This has been discussed several times on the F# hub; see for example instance methods as functions. This is quite a tricky problem, so as far as I know there are no plans to have something like this in the first version of F#, but it would be great to have it eventually :-).
Another workaround that you could do is to add static member as an extension method in F#:
type System.Object with
    static member ObjToString(o : obj) = o.ToString()
open System
[ 1 .. 10 ] |> Seq.map Object.ObjToString;;
But that is a bit ugly. Also, it seems that this works only if you use a different name for the method. I guess that F# doesn't allow you to overload an existing method with an extension method and always prefers the intrinsic one.
I don't know if I understood you exactly, but for this specific example you could write:
items |> Seq.filter my_predicate |> Seq.map (fun x -> x.ToString())
or
let f (x : 'T) = x.ToString()
items |> Seq.filter my_predicate |> Seq.map f
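For the ToString case specifically, F# already has a first-class function: the built-in `string` function converts a value to its text form (calling ToString for most types), so it can be passed straight to Seq.map. Newer compilers (F# 8 and later) also accept the shorthand lambda `_.ToString()`. In the sketch below, the list and `myPredicate` are hypothetical stand-ins for the question's `items` and `my_predicate`.

```fsharp
// Hypothetical stand-ins for the question's items and my_predicate
let items = [ 1; 2; 3 ]
let myPredicate x = x > 1

// `string` is the built-in conversion function, usable directly as a mapping function
let texts = items |> Seq.filter myPredicate |> Seq.map string |> List.ofSeq
printfn "%A" texts   // ["2"; "3"]
```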

Resources