Suggestion for fast performance expanding apply with deedle - f#

The Stats.expandingXXXX functions are pretty fast. However there is no public api to do a expandingWindow apply. The following solution i created is really slow when it comes to large dataset like 100k. Any suggestion is appreciated?
let ExpWindowApply f minSize data =
let keys = dataSeries.Keys
let startKey = dataSeries.FirstKey()
let values = keys
|> Seq.map(fun k ->
let ds = data.Between(startKey,k)
match ds with
|_ when ds.ValueCount >= minSize -> f ds.Values
|_ -> Double.NaN
)
let result = Series(keys, values)
result
I understand the Stats.expandingXXX function are actually special cases where the function being applied can be iterately calculated based on previous loop's state. And not all function can take advantage of states from previous calculation. Is there anything better way than Series.Between in terms of creating a window of data?
Update
For those who are also interested in the similar issue. The answer provides alternative implementation and insight into rarely documented series vector and index operation. But it doesn't improve performance.

The expanding functions in Deedle are fast because they are using an efficient online algorithm that makes it possible to calculate the statistics on the fly with just one pass - rather than actually building the intermediate series for the sub-ranges.
There is a built-in function aggregate that lets you do something this - though it works in the reversed way. For example, if you want to sum all elements starting from the current one to the end, you can write:
let s = series [ for i in 1 .. 10 -> i, float i ]
s |> Series.aggregateInto
(Aggregation.WindowWhile(fun _ _ -> true))
(fun seg -> seg.Data.FirstKey())
(fun seg -> OptionalValue(Stats.sum seg.Data))
If you want to do the same thing using the underlying representation, you can directly use the addressing scheme that Deedle uses to link the keys (in the index) with values (in the data vector). This is an ugly mutable sample, but you can encapsulate it into something nicer:
[ let firstAddr = s.Index.Locate(s.FirstKey())
for k in s.Index.KeySequence ->
let lastAddr = s.Index.Locate(k)
seq {
let a = ref firstAddr
while !a <> lastAddr do
yield s.Vector.GetValue(!a).Value
a := s.Index.AddressOperations.AdjustBy(!a, +1L) } |> Seq.sum ]

Related

F# hidden mutation

Anyone have a decent example, preferably practical/useful, they could post demonstrating the concept?
I came across this term somewhere that I’m unable to find, probably it has to do something with a function returning a function while enclosing on some mutable variable. So there’s no visible mutation.
Probably Haskell community has originated the idea where mutation happens in another area not visible to the scope. I maybe vague here so seeking help to understand more.
It's a good idea to hide mutation, so the consumers of the API won't inadvartently change something unexpectedly. This just means that you have to encapsulate your mutable data/state. This can be done via objects (yes, objects), but what you are referring to in your question can be done with a closure, the canonical example is a counter:
let countUp =
let mutable count = 0
(fun () -> count <- count + 1
count)
countUp() // 1
countUp() // 2
countUp() // 3
You cannot access the mutable count variable directly.
Another example would be using mutable state within a function so that you cannot observe it, and the function is, for all intents and purposes, referentially transparent. Take for example the following function that reverses a string not character-wise, but rather by taking individual text elements (which, depending on language, can be more than one character):
let reverseStringU s =
if Core.string.IsNullOrEmpty s then s else
let rec iter acc (ee : System.Globalization.TextElementEnumerator) =
if not <| ee.MoveNext () then acc else
let e = ee.GetTextElement ()
iter (e :: acc) ee
let inline append x s = (^s : (member Append : ^x -> ^s) (s, x))
let sb = System.Text.StringBuilder s.Length
System.Globalization.StringInfo.GetTextElementEnumerator s
|> iter []
|> List.fold (fun a e -> append e a) sb
|> string
It uses a StringBuilder internally but you cannot observe this externally.

Fourier transformation with mathdotnet in F#

I am trying to use the Math.NET numerics implementation of the FFT algorithm, but I must be doing something wrong because the output is always unit
The following is the the setup:
open MathNet.Numerics
open MathNet.Numerics.Statistics
open MathNet.Numerics.IntegralTransforms
let rnd = new Random()
let rnddata = Array.init 100 (fun u -> rnd.NextDouble())
let x = rnddata |> Array.Parallel.map (fun d -> MathNet.Numerics.complex.Create(d, 0.0) )
then when I run this:
let tt = MathNet.Numerics.IntegralTransforms.Fourier.BluesteinForward(x, FourierOptions.Default)
I receive an empty output below?
val tt : unit = ()
Any ideas why?
I think the Fourier.BluesteinForward method stores the results in the input array (by overwriting whatever was there originally).
If you do not need the input after running the transform, you can just use x and read the results (this saves some memory copying, which is why Math.NET does that by default). Otherwise, you can clone the array and wrap it in a more functional style code like this:
let bluesteinForward input =
let output = Array.copy input
MathNet.Numerics.IntegralTransforms.Fourier.BluesteinForward
(output, FourierOptions.Default)
output

Avoiding using Option.Value

I have a type like this:
type TaskRow =
{
RowIndex : int
TaskId : string
Task : Task option
}
A function returns a list of these records to be processed further. Some of the functions doing that processing are only relevant for TaskRow items where Task is Some. I'm wondering what the best way is to go about that.
The naive way would be doing
let taskRowsWithTasks = taskRows |> Seq.filter (fun row -> Option.isSome row.Task)
and passing that to those functions, simply assuming that Task will never be None and using Task.Value, risking an NRE if I don't pass in that one special list. That is exactly what the current C# code does but seems rather unidiomatic for F#. I shouldn't be 'assuming' things but rather let the compiler tell me what will work.
More 'functional' would be to pattern match every time the value is relevant and then do/return nothing (and use choose or the like) for None, but that seems repetitive and wasteful as the same work would be done multiple times.
Another thought was introducing a second, slightly different type:
type TaskRowWithTask =
{
RowIndex : int
TaskId : string
Task : Task
}
The original list would then be filtered into a 'sublist' of this type one to be used where appropriate. I guess that would be okay from a functional perspective, but I wonder whether there's a nicer, idiomatic way without resorting to this kind of 'helper type'.
Thanks for any pointers!
There's quite a bit of value knowing that the tasks have already been filtered, so having two different types can be helpful. Instead of defining two different types (which, in F#, isn't that big a deal, though), you could also consider defining a generic Row type:
type Row<'a> = {
RowIndex : int
TaskId : string
Item : 'a }
This enables you to define a projection like this:
let project = function
| { RowIndex = ridx; TaskId = tid; Item = Some t } ->
Some { RowIndex = ridx; TaskId = tid; Item = t }
| _ -> None
let taskRowsWithTasks =
taskRows
|> Seq.map project
|> Seq.choose id
If the initial taskRows value has the type seq<Row<Task option>>, then the resulting taskRowsWithTasks sequence has the type seq<Row<Task>>.
I agree with you, the more "pure functional" way is to repeat the pattern match, I mean use a function with Seq.choose that does the filtering, instead of saving it to another structure.
let tasks = Seq.choose (fun {Task = t} -> t) taskRows
The problem is performance as it would be calculated many times, but you can use Seq.cache so behind the scenes it's saved into an intermediate structure, while keeping your code more "pure functional" looking.

Convert procedural solution to functional one. F#

I'm trying to wrap my head around functional programming using F#. I'm working my way through the Project Euler problems, and I feel like I am just writing procedural code in F#. For instance, this is my solution to #3.
let Calc() =
let mutable limit = 600851475143L
let mutable factor = 2L // Start with the lowest prime
while factor < limit do
if limit % factor = 0L then
begin
limit <- limit / factor
end
else factor <- factor + 1L
limit
This works just fine, but all I've really done is taken how I would solve this problem in c# and converted it to F# syntax. Looking back over several of my solutions, this is becoming a pattern. I think that I should be able to solve this problem without using mutable, but I'm having trouble not thinking about the problem procedurally.
Why not with recursion?
let Calc() =
let rec calcinner factor limit =
if factor < limit then
if limit % factor = 0L then
calcinner factor (limit/factor)
else
calcinner (factor + 1L) limit
else limit
let limit = 600851475143L
let factor = 2L // Start with the lowest prime
calcinner factor limit
For algorithmic problems (like project Euler), you'll probably want to write most iterations using recursion (as John suggests). However, even mutable imperative code sometimes makes sense if you are using e.g. hashtables or arrays and care about performance.
One area where F# works really well which is (sadly) not really covered by the project Euler exercises is designing data types - so if you're interested in learning F# from another perspective, have a look at Designing with types at F# for Fun and Profit.
In this case, you could also use Seq.unfold to implement the solution (in general, you can often compose solutions to sequence processing problems using Seq functions - though it does not look as elegant here).
let Calc() =
// Start with initial state (600851475143L, 2L) and generate a sequence
// of "limits" by generating new states & returning limit in each step
(600851475143L, 2L)
|> Seq.unfold (fun (limit, factor) ->
// End the sequence when factor is greater than limit
if factor >= limit then None
// Update limit when divisible by factor
elif limit % factor = 0L then
let limit = limit / factor
Some(limit, (limit, factor))
// Update factor
else
Some(limit, (limit, factor + 1L)) )
// Take the last generated limit value
|> Seq.last
In functional programming when I think mutable I think heap and when trying to write code that is more functional, you should use the stack instead of the heap.
So how do you get values on to the stack for use with a function?
Place the value in the function's parameters.
let result01 = List.filter (fun x -> x % 2 = 0) [0;1;2;3;4;5]
here both a function an a list of values are hard coded into the List.filter parameter's.
Bind the value to a name and then reference the name.
let divisibleBy2 = fun x -> x % 2 = 0
let values = [0;1;2;3;4;5]
let result02 = List.filter divisibleBy2 values
here the function parameter for list.filter is bound to divisibleBy2 and the list parameter for list.filter is bound to values.
Create a nameless data structure and pipe it into the function.
let result03 =
[0;1;2;3;4;5]
|> List.filter divisibleBy2
here the list parameter for list.filter is forward piped into the list.filter function.
Pass the result of a function into the function
let result04 =
[ for i in 1 .. 5 -> i]
|> List.filter divisibleBy2
Now that we have all of the data on the stack, how do we process the data using only the stack?
One of the patterns often used with functional programming is to put data into a structure and then process the items one at a time using a recursive function. The structure can be a list, tree, graph, etc. and is usually defined using a discriminated union. Data structures that have one or more self references are typically used with recursive functions.
So here is an example where we take a list and multiply all the values by 2 and put the result back onto the stack as we progress. The variable on the stack holding the new values is accumulator.
let mult2 values =
let rec mult2withAccumulator values accumulator =
match values with
| headValue::tailValues ->
let newValue = headValue * 2
let accumulator = newValue :: accumulator
mult2withAccumulator tailValues accumulator
| [] ->
List.rev accumulator
mult2withAccumulator values []
We use an accumulator for this which being a parameter to a function and not defined mutable is stored on the stack. Also this method is using pattern matching and the list discriminated union. The accumulator holds the new values as we process the items in the input list and then when there are not more items in the list ([]) we just reverse the list to get the new list in the correct order because the new items are concatenated to the head of the accumulator.
To understand the data structure (discriminated union) for a list you need to see it, so here it is
type list =
| Item of 'a * List
| Empty
Notice how the end of the definition of an item is List referring back to itself, and that a list can ben an empty list, which is when used with pattern match is [].
A quick example of how list are built is
empty list - []
list with one int value - 1::[]
list with two int values - 1::2::[]
list with three int values - 1::2::3::[]
Here is the same function with all of the types defined.
let mult2 (values : int list) =
let rec mult2withAccumulator (values : int list) (accumulator : int list) =
match (values : int list) with
| (headValue : int)::(tailValues : int list) ->
let (newValue : int) = headValue * 2
let (accumulator : int list) =
(((newValue : int) :: (accumulator : int list)) : int list)
mult2withAccumulator tailValues accumulator
| [] ->
((List.rev accumulator) : int list)
mult2withAccumulator values []
So putting values onto the stack and using self referencing discriminated unions with pattern matching will help to solve a lot of problems with functional programming.

Verifying a set of objects have been mapped correctly

I'm looking for a clean set of ways to manage Test Specific Equality in F# unit tests. 90% of the time, the standard Structural Equality fits the bill and I can leverage it with unquote to express the relation between my result and my expected.
TL;DR "I can't find a clean way to having a custom Equality function for one or two properties in a value which 90% of is well served by Structural Equality, does F# have a way to match an arbitrary record with custom Equality for just one or two of its fields?"
Example of a general technique that works for me
When verifying a function that performs a 1:1 mapping of a datatype to another, I'll often extract matching tuples from both sides of in some cases and compare the input and output sets. For example, I have an operator:-
let (====) x y = (x |> Set.ofSeq) = (y |> Set.ofSeq)
So I can do:
let inputs = ["KeyA",DateTime.Today; "KeyB",DateTime.Today.AddDays(1); "KeyC",DateTime.Today.AddDays(2)]
let trivialFun (a:string,b) = a.ToLower(),b
let expected = inputs |> Seq.map trivialFun
let result = inputs |> MyMagicMapper
test <# expected ==== actual #>
This enables me to Assert that each of my inputs has been mapped to an output, without any superfluous outputs.
The problem
The problem is when I want to have a custom comparison for one or two of the fields.
For example, if my DateTime is being passed through a slightly lossy serialization layer by the SUT, I need a test-specific tolerant DateTime comparison. Or maybe I want to do a case-insensitive verification for a string field
Normally, I'd use Mark Seemann's SemanticComparison library's Likeness<Source,Destination> to define a Test Specific equality, but I've run into some roadblocks:
tuples: F# hides .ItemX on Tuple so I can't define the property via a .With strongly typed field name Expression<T>
record types: TTBOMK these are sealed by F# with no opt-out so SemanticComparison can't proxy them to override Object.Equals
My ideas
All I can think of is to create a generic Resemblance proxy type that I can include in a tuple or record.
Or maybe using pattern matching (Is there a way I can use that to generate an IEqualityComparer and then do a set comparison using that?)
Alternate failing test
I'm also open to using some other function to verify the full mapping (i.e. not abusing F# Set or involving too much third party code. i.e. something to make this pass:
let sut (a:string,b:DateTime) = a.ToLower(),b + TimeSpan.FromTicks(1L)
let inputs = ["KeyA",DateTime.Today; "KeyB",DateTime.Today.AddDays(1.0); "KeyC",DateTime.Today.AddDays(2.0)]
let toResemblance (a,b) = TODO generate Resemblance which will case insensitively compare fst and tolerantly compare snd
let expected = inputs |> List.map toResemblance
let result = inputs |> List.map sut
test <# expected = result #>
Firstly, thanks to all for the inputs. I was largely unaware of SemanticComparer<'T> and it definitely provides a good set of building blocks for building generalized facilities in this space. Nikos' post gives excellent food for thought in the area too. I shouldn't have been surprised Fil exists too - #ptrelford really does have a lib for everything (the FSharpValue point is also v valuable)!
We've thankfully arrived at a conclusion to this. Unfortunately it's not a single all-encompassing tool or technique, but even better, a set of techniques that can be used as necessary in a given context.
Firstly, the issue of ensuring a mapping is complete is really an orthogonal concern. The question refers to an ==== operator:-
let (====) x y = (x |> Set.ofSeq) = (y |> Set.ofSeq)
This is definitely the best default approach - lean on Structural Equality. One thing to note is that, being reliant on F# persistent sets, it requires your type to support : comparison (as opposed to just : equality).
When doing set comparisons off the proven Structural Equality path, a useful technique is to use HashSet<T> with a custom IEqualityComparer:-
[<AutoOpen>]
module UnorderedSeqComparisons =
let seqSetEquals ec x y =
HashSet<_>( x, ec).SetEquals( y)
let (==|==) x y equals =
let funEqualityComparer = {
new IEqualityComparer<_> with
member this.GetHashCode(obj) = 0
member this.Equals(x,y) =
equals x y }
seqSetEquals funEqualityComparer x y
the equals parameter of ==|== is 'a -> 'a -> bool which allows one to use pattern matching to destructure args for the purposes of comparison. This works well if either the input or the result side are naturally already tuples. Example:
sut.Store( inputs)
let results = sut.Read()
let expecteds = seq { for x in inputs -> x.Name,x.ValidUntil }
test <# expecteds ==|== results
<| fun (xN,xD) (yN,yD) ->
xF=yF
&& xD |> equalsWithinASecond <| yD #>
While SemanticComparer<'T> can do a job, it's simply not worth bothering for tuples with when you have the power of pattern matching. e.g. Using SemanticComparer<'T>, the above test can be expressed as:
test <# expecteds ==~== results
<| [ funNamedMemberComparer "Item2" equalsWithinASecond ] #>
using the helper:
[<AutoOpen>]
module MemberComparerHelpers =
let funNamedMemberComparer<'T> name equals = {
new IMemberComparer with
member this.IsSatisfiedBy(request: PropertyInfo) =
request.PropertyType = typedefof<'T>
&& request.Name = name
member this.IsSatisfiedBy(request: FieldInfo) =
request.FieldType = typedefof<'T>
&& request.Name = name
member this.GetHashCode(obj) = 0
member this.Equals(x, y) =
equals (x :?> 'T) (y :?> 'T) }
let valueObjectMemberComparer() = {
new IMemberComparer with
member this.IsSatisfiedBy(request: PropertyInfo) = true
member this.IsSatisfiedBy(request: FieldInfo) = true
member this.GetHashCode(obj) = hash obj
member this.Equals(x, y) =
x.Equals( y) }
let (==~==) x y mcs =
let ec = SemanticComparer<'T>( seq {
yield valueObjectMemberComparer()
yield! mcs } )
seqSetEquals ec x y
All of the above is best understood by reading Nikos Baxevanis' post NOW!
For types or records, the ==|== technique can work (except critically you lose Likeness<'T>s verifying coverage of fields). However the succinctness can make it a valuable tool for certain sorts of tests :-
sut.Save( inputs)
let expected = inputs |> Seq.map (fun x -> Mapped( base + x.ttl, x.Name))
let likeExpected x = expected ==|== x <| (fun x y -> x.Name = y.Name && x.ValidUntil = y.ValidUntil)
verify <# repo.Store( is( likeExpected)) #> once

Resources