I can find specific row with HtmlProvider.
Is it possible to get another html information from HtmlProvider.Tables.Row?
type Provider = HtmlProvider<"
<table><tbody>
<tr><td class=\"orange\" >something2</td><td>20.1</td></tr>
<tr><td class=\"grean\" >something</td><td>23.5</td></tr>
<tr><td class=\"orange\" >something3</td><td>20.0</td></tr>
</tbody></table>">// backslashes are for escaping
let wantedRow = Provider.GetSample().Tables.Table1.Rows
|> Seq.filter (fun c->if float (c.Column2)=20.0 then true else false)
|> Seq.head
Now I have wantedRow and can extract something3 string.
But I need to get class of that row (orange).
Something like this written in pseudocode (GetHtml is pseudo..):
(*Pseudo code warning *)
let tdTag= wantedRow.GetHtml.Descendants["td"] |>Seq.head
let classStr = tdOfWantedRow.AttributeValue ("class") //orange
Is it possible to get such information with ease of HtmlProvider?
The Tables functionality of the HtmlProvider treats the table as data, discarding html attributes. To get at the HTML itself, you can resort to treating it like an HTML document:
let wantedRow = Provider.GetSample().Html.Descendants("tr")
|> Seq.filter (fun x -> float((x.Descendants("td") |> Seq.item 1).InnerText()) = float(20.0))
|> Seq.head
let cssClass = (wantedRow.Descendants() |> Seq.head).Attribute("class").Value()
printf "%s\n" cssClass
// prints "orange"
(of course, in a real world example, you'll want some safeguards in case nodes don't exist or the floats don't parse, but this should get you headed in the right direction)
Related
I have a text file that contains the following and I need to retrieve the value assigned to taskId, which in this case is AWc34YBAp0N7ZCmVka2u.
projectKey=ProjectName
serverUrl=http://localhost:9090
serverVersion=10.5.32.3
strong text**interfaceUrl=http://localhost:9090/interface?id=ProjectName
taskId=AWc34YBAp0N7ZCmVka2u
taskUrl=http://localhost:9090/api/ce/task?id=AWc34YBAp0N7ZCmVka2u
I have two different ways of reading the file that I've wrote.
let readLines (filePath:string) = seq {
use sr = new StreamReader (filePath)
while not sr.EndOfStream do
yield sr.ReadLine ()
}
readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
printfn "%s" line
)
and
let readLines (filePath:string) =
(File.ReadAllLines filePath)
readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
printfn "%s" line
)
At this point, I don't know how to approach getting the value I need. Options that, I think, are on the table are:
use Contains()
Regex
Record type
Active Pattern
How can I get this value returned and fail if it doesn't exist?
I think all the options would be reasonable - it depends on how complex the file will actually be. If there is no escaping then you can probably just look for = in the line and use that to split the line into a key value pair. If the syntax is more complex, this might not always work though.
My preferred method would be to use Split on string - you can then filter to find values with your required key, map to get the value and use Seq.head to get the value:
["foo=bar"]
|> Seq.map (fun line -> line.Split('='))
|> Seq.filter (fun kvp -> kvp.[0] = "foo")
|> Seq.map (fun kvp -> kvp.[1])
|> Seq.head
Using active patterns, you could define a pattern that takes a string and splits it using = into a list:
let (|Split|) (s:string) = s.Split('=') |> List.ofSeq
This then lets you get the value using Seq.pick with a pattern matching that looks for strings where the substring before = is e.g. foo:
["foo=bar"] |> Seq.pick (function
| Split ["foo"; value] -> Some value
| _ -> None)
The active pattern trick is quite neat, but it might be unnecessarily complicating the code if you only need this in one place.
I have a function processing a DataTable looking for any row that has a column with a certain value. It looks like this:
let exists =
let mutable e = false
for row in dt.Rows do
if row.["Status"] :?> bool = false
then e <- true
e
I'm wondering if there is a way to do this in a single expression. For example, Python has the "any" function which would do it something like this:
exists = any(row for row in dt.Rows if not row["Status"])
Can I write a similar one-liner in F# for my exists function?
You can use the Seq.exists function, which takes a predicate and returns true if the predicate holds for at least one element of the sequence.
let xs = [1;2;3]
let contains2 = xs |> Seq.exists (fun x -> x = 2)
But in your specific case, it won't work right away, because DataTable.Rows is of type DataRowCollection, which only implements IEnumerable, but not IEnumerable<T>, and so it won't be considered a "sequence" in F# sense, which means that Seq.* functions won't work on it. To make them work, you have to first cast the sequence to the correct type with Seq.cast:
let exists =
dt.Rows |>
Seq.cast<DataRow> |>
Seq.exists (fun r -> not (r.["Status"] :?> bool) )
Something like this (untested):
dt.Rows |> Seq.exists (fun row -> not (row.["Status"] :?> bool))
https://msdn.microsoft.com/visualfsharpdocs/conceptual/seq.exists%5b%27t%5d-function-%5bfsharp%5d
I have managed to read my text file which contains line by line random numbers. When I output lines using printfn "%A" lines I get seq ["45"; "5435" "34"; ... ] so I assume that lines must be a datatype list.
open System
let readLines filePath = System.IO.File.ReadLines(filePath);;
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
I am now trying to sort the list by lowest to highest but it does not have the .sortBy() method. Any chance anyone can tell me how to manually do this? I have tried turning it to an array to sort it but it doesn't work.
let array = [||]
let counter = 0
for i in lines do
array.[counter] = i
counter +1
Console.ReadKey <| ignore
Thanks in advance.
If all the lines are integers, you can just use Seq.sortBy int, like so:
open System
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.sortBy int
If some of the lines may not be valid integers, then you'd need to run through a parsing and validation step. E.g.:
let tryParseInt s =
match System.Int32.TryParse s with
| true, n -> Some n
| false, _ -> None
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.choose tryParseInt |> Seq.sort
Note that the tryParseInt function I just wrote is returning the int value, so I used Seq.sort instead of Seq.sortBy int, and the output of that function chain is going to be a sequence of ints rather than a sequence of strings. If you really wanted a sequence of strings, but only the strings that could be parsed to ints, you could have done it like this:
let tryParseInt s =
match System.Int32.TryParse s with
| true, _ -> Some s
| false, _ -> None
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.choose tryParseInt |> Seq.sortBy int
Note how I'm returning s from this version of tryParseInt, so that Seq.choose is keeping the strings (but throwing away any strings that failed to validate through System.Int32.TryParse). There's plenty more possibilities, but that should give you enough to get started.
All the comments are valid but I'm a bit more concerned about your very imperative loop.
So here's an example:
To read all the lines:
open System.IO
let file = #"c:\tmp\sort.csv"
let lines = File.ReadAllLines(file)
To sort the lines:
let sorted = Seq.sort lines
sorted |> Seq.length // to get the number of lines
sorted |> Seq.map (fun x -> x.Length) // to iterate over all lines and get the length of each line
You can also use a list comprehension syntax:
[for l in sorted -> l.ToUpper()]
Seq will work for all kinds of collections but you can replace it with Array (mutable) or List (F# List).
I am completely at loss why this code doesn't mutate a member variable in a sequence of types:
for p in prescrs do
p.ATC <- "A"
for c in p.Drug.Components do
for s in c.Substances do
s.DoseTotal.Adjust <- adjustKg
s.DoseTotal.Time <- "DAY"
s.DoseTotal.Unit <- s.DrugConcentration.Unit
s.DoseRate.Adjust <- adjustKg
s.DoseRate.Time <- "DAY"
s.DoseRate.Unit <- s.DrugConcentration.Unit
prescrs is a sequence of Prescriptions which is a very simple 'POCO' defined as a type with member values. I don't have clue why this doesn't work.
I tried a simple test case like:
type IterTest () =
member val Name = "" with get, set
member val IterTests = [] |> List.toSeq : IterTest seq with get, set
let iterseq =
[
new IterTest(Name = "Test1")
new IterTest(Name = "Test2")
]
|> List.toSeq
iterseq |> Seq.iter(fun x -> x.IterTests <- iterseq)
iterseq |> Seq.iter(fun x ->
x.IterTests
|> Seq.iter(fun x' -> x'.Name <- "itered"))
But here the result is as expected. So, can't even quite reproduce my problem???
Found a solution (without really understanding the problem above). When I first convert the prescrs sequence to a list like:
let prescrs = prescrs |> Seq.toList
and then do the imperative looping, properties do get mutated.
Try this sample:
type Mutable() =
member val Iterated = false with get, set
let muts = Seq.init 5 (fun _ -> printfn "init"; Mutable())
let muts2 = muts // try again with let muts2 = muts |> List.ofSeq
printfn "Before iter"
for a in muts2 do
printfn "iter"
a.Iterated <- true
printfn "After iter"
muts2 |> List.ofSeq
and check how iter and init are interleaved.
Seqs are lazy, but are not cached once computed. So even if you imperatively try to mutate some of the elements in your prescrs sequence, it all goes away once you pull prescrs again. If you change prescrs into a concrete collection type like list before doing the mutation, you no longer hit the same problem. Note that things might get even trickier if what you have is a seq inside a seq inside a seq.
The best idea would be to avoid mutation in the first place though.
I have a function that returns a string[].
let asyncScrape url allParameters =
allParameters
|> Seq.map(fun v ->
yearAndClassResultsAsync url v)
|> Async.Parallel
|> Async.RunSynchronously
I want to iterate through that string array, sending each string to a method called resultsBody (that returns a seq), and then finally returning an single sequence that is the concatenation of the results from resultsBody.
I tried doing something like below, but I'm rather lost as it returns:
seq<string[]>[]
and I just want a single combined
seq<string[]>
My attempts so far:
let parseSite html =
Array.mapi (fun s -> resultsBody) html
Simplified, I think your problem is that you have a nested sequence of strings and you want to get as much parallelism as possible, rather than just at the innermost layer of the nesting.
One way you can do this is to also nest the Async.Parallel calls before calling Async.RunSynchronously. Here's a simple example of the technique:
let squareInt n = async { return n*n }
let inParallel (seqOfseqOfInts : seq<seq<int>>) =
seqOfseqOfInts
|> Seq.map // deal with each inner seq of ints
(fun (seqOfInts : seq<int>) ->
seqOfInts
|> Seq.map squareInt
|> Async.Parallel // this gives us Async<int[]>
) // this gives us seq<Async<int[]>>
|> Async.Parallel // this gives us Async<int[][]>
|> Async.RunSynchronously // this gives us int[][]