How to filter rows using Deedle - f#

In order to get comfortable with Deedle I made up a CSV file that represents a log of video rentals.
RentedOn,Shop,Title
12/dec/2013 00:00:00,East,Rambo
12/dec/2013 00:00:00,West,Rocky
12/dec/2013 00:00:00,West,Rambo
12/dec/2013 00:00:00,East,Rambo
13/dec/2013 00:00:00,East,Rocky
13/dec/2013 00:00:00,East,Rocky
13/dec/2013 00:00:00,East,Rocky
14/dec/2013 00:00:00,West,Rocky 2
I have the following function, that groups the rentals by Shop (East or West):
let overview =
    __SOURCE_DIRECTORY__ + "/rentallog.csv"
    |> Frame.ReadCsv
    |> Frame.groupRowsByString "Shop"
    |> Frame.nest
    |> Series.map (fun dtc df ->
        df.GetSeries<string>("Title") |> Series.groupBy (fun k v -> v)
        |> Frame.ofColumns |> Frame.countValues )
    |> Frame.ofRows
I'd like to be able to filter the rows by the date in the RentedOn column, but I'm not sure how to do this. I know it probably involves the Frame.filterRowValues function, but I'm unsure of the best way to use it. Any guidance on how to filter would be appreciated.
Update based on @jeremyh's advice:
let overview rentedOnDate =
    let addRentedDate (f:Frame<_,_>) =
        f.AddSeries ("RentedOnDate", f.GetSeries<DateTime>("RentedOn"))
        f
    __SOURCE_DIRECTORY__ + "/rentallog.csv"
    |> Frame.ReadCsv
    |> addRentedDate
    |> Frame.filterRowValues (fun row -> row.GetAs<DateTime>("RentedOnDate") = rentedOnDate)
    |> Frame.groupRowsByString "Shop"
    |> Frame.nest
    |> Series.map (fun dtc df ->
        df.GetSeries<string>("Title") |> Series.groupBy (fun k v -> v)
        |> Frame.ofColumns |> Frame.countValues )
    |> Frame.ofRows
Thanks,
Rob

Hey I think that you might get a faster answer if you add an f# tag to your question too.
I used the following link to answer your question which has some helpful examples.
This is the solution I came up with. Please note that I added a new column RentedOnDate that actually has a DateTime type that I do the filtering on.
let overview rentedOnDate =
    let rentalLog =
        __SOURCE_DIRECTORY__ + "/rentallog.csv"
        |> Frame.ReadCsv
    rentalLog
    |> Frame.addSeries "RentedOnDate" (rentalLog.GetSeries<DateTime>("RentedOn"))
    |> Frame.filterRowValues (fun row -> row.GetAs<DateTime>("RentedOnDate") = rentedOnDate)
    |> Frame.groupRowsByString "Shop"
    |> Frame.nest
    |> Series.map (fun dtc df ->
        df.GetSeries<string>("Title") |> Series.groupBy (fun k v -> v)
        |> Frame.ofColumns |> Frame.countValues )
    |> Frame.ofRows

// Testing
overview (DateTime.Parse "12/dec/2013 00:00:00")
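If it helps to check what the filtered pipeline should produce, here is a minimal sketch of the same filter, group, and count steps over plain F# records, with no Deedle involved. The Rental type, the rentals list, and the overviewFor function are hypothetical stand-ins for the CSV data above; they are not part of Deedle.

```fsharp
open System

// Hypothetical record mirroring the three CSV columns.
type Rental = { RentedOn: DateTime; Shop: string; Title: string }

// The eight rows of the sample log, typed in by hand.
let rentals =
    [ DateTime(2013, 12, 12), "East", "Rambo"
      DateTime(2013, 12, 12), "West", "Rocky"
      DateTime(2013, 12, 12), "West", "Rambo"
      DateTime(2013, 12, 12), "East", "Rambo"
      DateTime(2013, 12, 13), "East", "Rocky"
      DateTime(2013, 12, 13), "East", "Rocky"
      DateTime(2013, 12, 13), "East", "Rocky"
      DateTime(2013, 12, 14), "West", "Rocky 2" ]
    |> List.map (fun (d, s, t) -> { RentedOn = d; Shop = s; Title = t })

// Keep only the rows for one day, then count titles per shop.
let overviewFor (rentedOnDate: DateTime) =
    rentals
    |> List.filter (fun r -> r.RentedOn.Date = rentedOnDate.Date)
    |> List.groupBy (fun r -> r.Shop)
    |> List.map (fun (shop, rows) -> shop, rows |> List.countBy (fun r -> r.Title))

let result = overviewFor (DateTime(2013, 12, 12))
// result = [("East", [("Rambo", 2)]); ("West", [("Rocky", 1); ("Rambo", 1)])]
```

The Deedle version should report the same counts for that date, only laid out as a frame with shops as rows and titles as columns.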

Related

F#.Data HTML Parser Extracting Strings From Nodes

I am trying to use FSharp.Data's HTML Parser to extract a string List of links from href attributes.
I can print the links to the console, but I'm struggling to get them into a list.
Working snippet of code that prints the wanted links:
let results = HtmlDocument.Load(myUrl)
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.iter (fun x -> x |> Seq.iter (fun y -> y.AttributeValue("href") |> printf "%A"))
How do I store those strings in the variable links instead of printing them out?
Cheers,
On the very last line, you end up with a sequence of sequences - for each td.pagenav you have a bunch of <a>, each of which has a href. That's why you have to have two nested Seq.iters - first you iterate over the outer sequence, and on each iteration you iterate over the inner sequence.
To flatten a sequence of sequences, use Seq.collect. Further, to convert a sequence to a list, use Seq.toList or List.ofSeq (they're equivalent):
let a = [ [1;2;3]; [4;5;6] ]
let b = a |> Seq.collect id |> Seq.toList
> val b : int list = [1; 2; 3; 4; 5; 6]
Applying this to your code:
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.collect (fun x -> x |> Seq.map (fun y -> y.AttributeValue("href")))
    |> Seq.toList
Or you could make it a bit cleaner by applying Seq.collect at the point where you first encounter a nested sequence:
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.collect (fun x -> x.Elements("a"))
    |> Seq.map (fun y -> y.AttributeValue("href"))
    |> Seq.toList
That said, I would rather rewrite this as a list comprehension. Looks even cleaner:
let links =
    [ for td in results.Descendants "td" do
        if td.HasClass "pagenav" then
            for a in td.Elements "a" ->
                a.AttributeValue "href" ]
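To try the flattening without loading a page, here is a small sketch that swaps the parsed HTML for a hypothetical TableCell record (my own stand-in, not an FSharp.Data type) and compares the Seq.collect pipeline with the list comprehension.

```fsharp
// Hypothetical stand-in for a parsed <td>: a CSS class plus the hrefs of its <a> children.
type TableCell = { CssClass: string; Hrefs: string list }

let cells =
    [ { CssClass = "pagenav"; Hrefs = [ "/page/1"; "/page/2" ] }
      { CssClass = "other"; Hrefs = [ "/ignored" ] }
      { CssClass = "pagenav"; Hrefs = [ "/page/3" ] } ]

// Filter, then flatten the nested lists with Seq.collect.
let viaCollect =
    cells
    |> Seq.filter (fun c -> c.CssClass = "pagenav")
    |> Seq.collect (fun c -> c.Hrefs)
    |> Seq.toList

// The same thing as a list comprehension.
let viaComprehension =
    [ for c in cells do
        if c.CssClass = "pagenav" then
            for h in c.Hrefs do
                yield h ]

// Both evaluate to ["/page/1"; "/page/2"; "/page/3"]
```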

F# sort by indexes

Let's say I have two lists:
let listOfValues = [100..105] //can be list of strings or whatever
let indexesToSortBy = [1;2;0;4;5;3]
Now I need listOfValues_sorted: 102;100;101;105;103;104
It can be done with zip and "conversion" to Tuple:
let listOfValues_sorted =
    listOfValues
    |> Seq.zip indexesToSortBy
    |> Seq.sortBy( fun x-> fst x)
    |> Seq.iter(fun c -> printfn "%i" (snd c))
But I guess, there is better solution for that?
I think your solution is pretty close. I would do this:
let listOfValues_sorted =
    listOfValues
    |> Seq.zip indexesToSortBy
    |> Seq.sortBy fst
    |> Seq.toList
    |> List.unzip
    |> snd
You can collapse fun x -> fst x into simply fst. List.unzip then returns a pair of lists (the sorted indexes and the reordered values), so take snd for the values or fst for the indexes.
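If indexesToSortBy is a complete set of indexes you could simply index into the list. Note that this reads each index as the position to take a value from, so for the lists above it yields [101; 102; 100; 104; 105; 103], which is the inverse of the ordering asked for in the question:
indexesToSortBy |> List.map (fun x -> listOfValues |> List.item x )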
Your example sounds like precisely what the List.permute function is for:
let listOfValues = [100..105]
let indexesToSortBy = [|1;2;0;4;5;3|] // Note 0-based indexes
listOfValues |> List.permute (fun i -> indexesToSortBy.[i])
// Result: [102; 100; 101; 105; 103; 104]
Two things: First, I made indexesToSortBy an array since I'll be looking up a value inside it N times, and doing that in a list would lead to O(N^2) run time. Second, List.permute expects to be handed a 0-based index into the original list, so I subtracted 1 from all the indexes in your original indexToSortBy list. With these two changes, this produces exactly the same ordering as the let listOfValues_sorted = ... example in your question.
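For comparison, here is a short sketch that runs the three approaches from this thread side by side on the lists from the question. The names viaZip, viaPermute, viaItem, and indexArray are mine, chosen only for illustration.

```fsharp
let listOfValues = [100..105]
let indexesToSortBy = [1; 2; 0; 4; 5; 3]

// Pair each value with its target position, sort by position, keep the values.
let viaZip =
    List.zip indexesToSortBy listOfValues
    |> List.sortBy fst
    |> List.map snd

// List.permute sends the element at index i to index (indexMap i).
let indexArray = List.toArray indexesToSortBy
let viaPermute = listOfValues |> List.permute (fun i -> indexArray.[i])

// Plain indexing takes the element found at each index instead,
// which applies the inverse permutation.
let viaItem = indexesToSortBy |> List.map (fun i -> List.item i listOfValues)

// viaZip     = [102; 100; 101; 105; 103; 104]
// viaPermute = [102; 100; 101; 105; 103; 104]
// viaItem    = [101; 102; 100; 104; 105; 103]
```

So the zip-and-sort and List.permute versions agree with the ordering requested in the question, while the List.item version answers a slightly different question.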

What's an alternative to Seq.iter so that I can return the result of the operation for the last item?

Seq.iter returns a unit. However, I want to iterate through my collection and return the last result.
Consider the following code:
let updatedGrid = grid |> Map.toSeq
                       |> Seq.map snd
                       |> Seq.iter (fun c -> grid |> setCell c)
NOTE: setCell returns a new Map.
Here's the actual code:
let setCell cell (grid:Map<(int * int), Cell>) =
    grid |> Map.map (fun k v -> match k with
                                | c when c = (cell.X, cell.Y) -> { v with State=cell.State }
                                | _ -> v)

let cycleThroughCells (grid:Map<(int * int), Cell>) =
    let updatedGrid = grid |> Map.toSeq
                           |> Seq.map snd
                           |> Seq.iter (fun c -> grid |> setCell c
                                                      |> ignore)
    updatedGrid
Again, I just want to take the result of the last operation in the iter function
[UPDATED]
I think this works (using map):
let cycleThroughCells (grid:Map<(int * int), Cell>) =
    let updatedGrid = grid |> Map.toSeq
                           |> Seq.map snd
                           |> Seq.map (fun c -> grid |> setCell c)
                           |> Seq.last
    updatedGrid
As I said in a comment, it seems like you almost certainly want a fold so that the updated grid is passed to each successive call; otherwise the modifications are all dropped except for the last one.
I think this would do the trick:
let cycleThroughCells (grid:Map<(int * int), Cell>) =
    grid
    |> Map.toSeq
    |> Seq.map snd
    |> Seq.fold (fun grid c -> grid |> setCell c) grid
and if you reorder the arguments to setCell so that the grid argument comes first then the last line can just be |> Seq.fold setCell grid.
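To make the difference between Seq.map followed by Seq.last and Seq.fold concrete, here is a small sketch. The CellState and Cell definitions, the two-cell grid, and the updates list are assumptions of mine, since the question does not show the Cell type; setCell follows the question's version.

```fsharp
// Assumed definitions: the question does not show the real Cell type.
type CellState = Dead | Alive
type Cell = { X: int; Y: int; State: CellState }

// As in the question: returns a new map with one cell's state replaced.
let setCell cell (grid: Map<(int * int), Cell>) =
    grid
    |> Map.map (fun k v ->
        if k = (cell.X, cell.Y) then { v with State = cell.State } else v)

// A two-cell grid, both dead, and two updates that bring each cell to life.
let grid =
    [ for x in 0..1 -> (x, 0), { X = x; Y = 0; State = Dead } ] |> Map.ofList
let updates =
    [ { X = 0; Y = 0; State = Alive }; { X = 1; Y = 0; State = Alive } ]

// Every update is applied to the ORIGINAL grid, so only the last one survives.
let viaLast = updates |> Seq.map (fun c -> grid |> setCell c) |> Seq.last

// The accumulator carries each update forward to the next call.
let viaFold = updates |> Seq.fold (fun g c -> g |> setCell c) grid

let aliveCount (g: Map<(int * int), Cell>) =
    g |> Map.toSeq |> Seq.filter (fun (_, c) -> c.State = Alive) |> Seq.length

// aliveCount viaLast = 1, aliveCount viaFold = 2
```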
I don't think one exists but you can define your own using fold:
let tapSeq f s = Seq.fold (fun _ x -> f x; Some(x)) None s

GroupBy Year then take Pairwise diffs except for the head value then Flatten Using Deedle and F#

I have the following variable:
data:seq<(DateTime*float)>
and I want to do something like the following F# code but using Deedle:
data
|> Seq.groupBy (fun (k,v) -> k.Year)
|> Seq.map (fun (k,v) ->
    let vals = v |> Seq.pairwise
    let first = seq { yield v |> Seq.head }
    let diffs = vals |> Seq.map (fun ((t0,v0),(t1,v1)) -> (t1, v1 - v0))
    (k, diffs |> Seq.append first))
|> Seq.collect snd
This works fine using F# sequences but I want to do it using Deedle series. I know I can do something like:
(data:Series<DateTime,float>) |> Series.groupBy (fun k v -> k.Year)...
But then I need to take the within-year diffs for each group, except for the head value, which should just be the value itself, and then flatten the results into one series. I am a bit confused by the Deedle syntax.
Thanks!
I think the following might be doing what you need:
ts
|> Series.groupInto
    (fun k _ -> k.Year)
    (fun y s ->
        let first = series [ fst s.KeyRange => s.[fst s.KeyRange] ]
        Series.merge first (Series.diff 1 s))
|> Series.values
|> Series.mergeAll
The groupInto function lets you specify a function that should be called on each of the groups.
For each group, we create a series with the differences using Series.diff and prepend a series holding the first value using Series.merge.
At the end, we get all the nested series and flatten them using Series.mergeAll.
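If you want something to compare the Deedle output against, here is a plain-sequence sketch of the same computation on a tiny made-up data set. The sample values and the name expected are hypothetical; the logic follows the Seq-based code in the question.

```fsharp
open System

// Hypothetical sample: two observations in each of two years.
let data : seq<DateTime * float> =
    Seq.ofList
        [ DateTime(2020, 1, 1), 1.0
          DateTime(2020, 6, 1), 3.0
          DateTime(2021, 1, 1), 10.0
          DateTime(2021, 6, 1), 14.0 ]

// Per year: keep the first observation as is, then pairwise differences.
let expected =
    data
    |> Seq.groupBy (fun (k, _) -> k.Year)
    |> Seq.collect (fun (_, group) ->
        let first = Seq.head group
        let diffs =
            group
            |> Seq.pairwise
            |> Seq.map (fun ((_, v0), (t1, v1)) -> t1, v1 - v0)
        Seq.append (Seq.singleton first) diffs)
    |> Seq.toList

// expected holds the values 1.0, 2.0, 10.0, 4.0 against the original four dates.
```

The first value of each year (1.0 and 10.0) passes through unchanged, and the differences restart at every year boundary.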

Is there a way to write this in F#?

let is_sum_greater_than_10 list =
    list
    |> Seq.filter (filter)
    |> Seq.sum
    |> (10 >)
This does not compile. Looking at the last line, "|> (10 >)", is there a way to write this so that the left-hand side is piped into the right-hand side for binary operators?
Thanks
You can use a partial application of the < operator, using the (operator-symbol) syntax:
let is_sum_greater_than_10 list =
    list
    |> Seq.filter filter
    |> Seq.sum
    |> (<) 10
You can also see this as an equivalent of a lambda application:
let is_sum_greater_than_10 list =
    list
    |> Seq.filter filter
    |> Seq.sum
    |> (fun x y -> x < y) 10
or just a lambda:
let is_sum_greater_than_10 list =
    list
    |> Seq.filter filter
    |> Seq.sum
    |> (fun y -> 10 < y)
You can use a slightly modified version of your example, albeit this is in an infix expression notation:
let ``is sum greater than 10`` filter list =
    (list
     |> Seq.filter filter
     |> Seq.sum) > 10
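Because the operand order is easy to get backwards with partially applied operators, here is a small sketch contrasting (<) 10 with (>) 10. The function names and the isEven predicate are mine, used only for illustration.

```fsharp
// (<) 10 is the function (fun y -> 10 < y), so piping a sum into it
// asks "is the sum greater than 10?".
let isSumGreaterThan10 (keep: int -> bool) (xs: seq<int>) =
    xs |> Seq.filter keep |> Seq.sum |> (<) 10

// (>) 10 is (fun y -> 10 > y), which asks "is the sum less than 10?".
let isSumLessThan10 (keep: int -> bool) (xs: seq<int>) =
    xs |> Seq.filter keep |> Seq.sum |> (>) 10

let isEven n = n % 2 = 0

// The even numbers in [2; 4; 6; 7] sum to 12.
let a = isSumGreaterThan10 isEven [2; 4; 6; 7]   // 10 < 12, so true
let b = isSumLessThan10 isEven [2; 4; 6; 7]      // 10 > 12, so false
```

If the flipped reading feels error-prone, the explicit lambda (fun sum -> sum > 10) says the same thing as (<) 10 with the operands in their natural order.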
