Write PolynomialFeatures function from scikit-learn for F# - f#

I recently started to learn F#, I have learnt about the fslab for data science, however I cannot seem to find any function similar to the PolynomialFeatures from Scikit-Learn (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)
A simple example for this function is
f [x;y] 2 -> [1;x;y;x**2;x*y;y**2]
I wonder if anyone has written a general function like PolynomialFeatures for F#, thanks.

I guess not many people use F# nowsaday, I figured out the answer by looking at the PolynomialFeatures source code in scikit-learn (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/preprocessing/data.py).
However, F# does not have the "combinations_w_r" (or any equivalent) as in Python, I then looked at the Rosettacode (http://rosettacode.org/wiki/Combinations_with_repetitions) and luckily their OCAML code is exactly same as F#, I combined them all to the following code
let PolyFeature ndgree (x:float list) =
let rec combsWithRep xxs k =
match k, xxs with
| 0, _ -> [[]]
| _, [] -> []
| k, x::xs ->
List.map (fun ys -> x::ys) (combsWithRep xxs (k-1))
# combsWithRep xs k
let rec genCombtill n xlen =
match n with
| 0 -> List.empty
| n -> (genCombtill (n-1) xlen) # combsWithRep [0..(xlen-1)] n
let rec mulList list1 =
match list1 with
| head :: tail -> head * (mulList tail)
| [] -> 1.0
let mul2List (b:float list) (a:int list) = [for i in a -> b.[i]] |> mulList
1.0 :: ((genCombtill ndgree x.Length) |> List.map (mul2List x))
Test
> PolyFeature 2 [2.0;3.0];;
val it : float list = [1.0; 2.0; 3.0; 4.0; 6.0; 9.0]
The code works as expected, however I believe my code is not optimized and probably will be slow with large list and high order of polynomial.

Related

Is there a standard F# function for separating a sequence of Choices?

I'm looking for a standard F# function that takes a sequence of 2-choices and returns a pair of sequences:
let separate (choices : seq<Choice<'T1, 'T2>>) : seq<'T1> * seq<'T2> = ...
A naive implementation is pretty simple:
let separate choices =
let ones =
choices
|> Seq.choose (function
| Choice1Of2 one -> Some one
| _ -> None)
let twos =
choices
|> Seq.choose (function
| Choice2Of2 two -> Some two
| _ -> None)
ones, twos
This works fine, but iterates the sequence twice, which is less than ideal. Is this function defined in one of the semi-standard libraries? I looked around, but couldn't find it. (If it exists, I'm sure it goes by some other name.)
For bonus points, versions that work with 3-choices, 4-choices, and so on, would also be nice, as would versions for List, Array, etc. Thanks.
I can't find builtin implementation but can write my own.
It uses IEnumerator<> based approach, so it will work with any collection type but it's not optimal (e.g. arrays will work slower than could be). Order is reversed (easy to fix with ResizeArray but more code). Also this version is not lazy, but can be easily adapted to work with Choice<'a, 'b, 'c> and others
let splitChoices2 (choices: Choice<'a, 'b> seq) =
let rec inner (it: IEnumerator<_>) acc1 acc2 =
if it.MoveNext() then
match it.Current with
| Choice1Of2 c1 -> inner it (c1 :: acc1) acc2
| Choice2Of2 c2 -> inner it acc1 (c2 :: acc2)
else
acc1, acc2
inner (choices.GetEnumerator()) [] []
let choices = [
Choice1Of2 11
Choice2Of2 "12"
Choice1Of2 21
Choice2Of2 "22"
]
choices |> splitChoices2 |> printfn "%A"
Update: ResizeArray based approach without reversed order and potentially less expensive enumeration
let splitChoices2 (choices: Choice<'a, 'b> seq) =
let acc1 = ResizeArray()
let acc2 = ResizeArray()
for el in choices do
match el with
| Choice1Of2 c1 -> acc1.Add c1
| Choice2Of2 c2 -> acc2.Add c2
acc1, acc2
This is sort of inspired by TraverseA but has come out quite different. Here is a single pass solution (UPDATE: however while the core algorithm might be single pass from List to List, but getting it to match your type signature, and ordering the result the same way makes it 3*O(n), it depends how important the ordering and type signature are to you)
let choices = seq {Choice1Of2(1) ; Choice2Of2(2) ; Choice2Of2(3) ; Choice1Of2(4)}
let seperate' choices =
let rec traverse2ChoicesA tupleSeq choices =
match choices with
| [] -> fst tupleSeq |> List.rev |>Seq.ofList , snd tupleSeq |> List.rev |> Seq.ofList
| (Choice1Of2 f)::tl -> traverse2ChoicesA (f::fst tupleSeq, snd tupleSeq) tl
| (Choice2Of2 s)::tl -> traverse2ChoicesA (fst tupleSeq, s::snd tupleSeq) tl
traverse2ChoicesA ([],[]) <| List.ofSeq choices
seperate' choices;;
val seperate' : choices:seq<Choice<'a,'b>> -> seq<'a> * seq<'b>
val it : seq<int> * seq<int> = ([1; 4], [2; 3])
Update: To be clear, if ordering and List instead of Seq are ok then this is a single pass:
let choices = [Choice1Of2(1) ; Choice2Of2(2) ; Choice2Of2(3) ; Choice1Of2(4)]
let seperate' choices =
let rec traverse2ChoicesA (tupleSeq) choices =
match choices with
| [] -> tupleSeq
| (Choice1Of2 f)::tl -> traverse2ChoicesA (f :: fst tupleSeq, snd tupleSeq) tl
| (Choice2Of2 s)::tl -> traverse2ChoicesA (fst tupleSeq, s:: snd tupleSeq) tl
traverse2ChoicesA ([],[]) choices
seperate' choices;;
val choices : Choice<int,int> list =
[Choice1Of2 1; Choice2Of2 2; Choice2Of2 3; Choice1Of2 4]
val seperate' : choices:Choice<'a,'b> list -> 'a list * 'b list
val it : int list * int list = ([4; 1], [3; 2])
You might find something more general, performant and with appropriate type signature in the FSharpPlus "semi-standard" library using TraverseA?

Why is this F# sequence function not tail recursive?

Disclosure: this came up in FsCheck, an F# random testing framework I maintain. I have a solution, but I do not like it. Moreover, I do not understand the problem - it was merely circumvented.
A fairly standard implementation of (monadic, if we're going to use big words) sequence is:
let sequence l =
let k m m' = gen { let! x = m
let! xs = m'
return (x::xs) }
List.foldBack k l (gen { return [] })
Where gen can be replaced by a computation builder of choice. Unfortunately, that implementation consumes stack space, and so eventually stack overflows if the list is long enough.The question is: why? I know in principle foldBack is not tail recursive, but the clever bunnies of the F# team have circumvented that in the foldBack implementation. Is there a problem in the computation builder implementation?
If I change the implementation to the below, everything is fine:
let sequence l =
let rec go gs acc size r0 =
match gs with
| [] -> List.rev acc
| (Gen g)::gs' ->
let r1,r2 = split r0
let y = g size r1
go gs' (y::acc) size r2
Gen(fun n r -> go l [] n r)
For completeness, the Gen type and computation builder can be found in the FsCheck source
Building on Tomas's answer, let's define two modules:
module Kurt =
type Gen<'a> = Gen of (int -> 'a)
let unit x = Gen (fun _ -> x)
let bind k (Gen m) =
Gen (fun n ->
let (Gen m') = k (m n)
m' n)
type GenBuilder() =
member x.Return(v) = unit v
member x.Bind(v,f) = bind f v
let gen = GenBuilder()
module Tomas =
type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)
let unit x = Gen (fun _ f -> f x)
let bind k (Gen m) =
Gen (fun n f ->
m n (fun r ->
let (Gen m') = k r
m' n f))
type GenBuilder() =
member x.Return v = unit v
member x.Bind(v,f) = bind f v
let gen = GenBuilder()
To simplify things a bit, let's rewrite your original sequence function as
let rec sequence = function
| [] -> gen { return [] }
| m::ms -> gen {
let! x = m
let! xs = sequence ms
return x::xs }
Now, sequence [for i in 1 .. 100000 -> unit i] will run to completion regardless of whether sequence is defined in terms of Kurt.gen or Tomas.gen. The issue is not that sequence causes a stack overflow when using your definitions, it's that the function returned from the call to sequence causes a stack overflow when it is called.
To see why this is so, let's expand the definition of sequence in terms of the underlying monadic operations:
let rec sequence = function
| [] -> unit []
| m::ms ->
bind (fun x -> bind (fun xs -> unit (x::xs)) (sequence ms)) m
Inlining the Kurt.unit and Kurt.bind values and simplifying like crazy, we get
let rec sequence = function
| [] -> Kurt.Gen(fun _ -> [])
| (Kurt.Gen m)::ms ->
Kurt.Gen(fun n ->
let (Kurt.Gen ms') = sequence ms
(m n)::(ms' n))
Now it's hopefully clear why calling let (Kurt.Gen f) = sequence [for i in 1 .. 1000000 -> unit i] in f 0 overflows the stack: f requires a non-tail-recursive call to sequence and evaluation of the resulting function, so there will be one stack frame for each recursive call.
Inlining Tomas.unit and Tomas.bind into the definition of sequence instead, we get the following simplified version:
let rec sequence = function
| [] -> Tomas.Gen (fun _ f -> f [])
| (Tomas.Gen m)::ms ->
Tomas.Gen(fun n f ->
m n (fun r ->
let (Tomas.Gen ms') = sequence ms
ms' n (fun rs -> f (r::rs))))
Reasoning about this variant is tricky. You can empirically verify that it won't blow the stack for some arbitrarily large inputs (as Tomas shows in his answer), and you can step through the evaluation to convince yourself of this fact. However, the stack consumption depends on the Gen instances in the list that's passed in, and it is possible to blow the stack for inputs that aren't themselves tail recursive:
// ok
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> unit i]
f 0 (fun list -> printfn "%i" list.Length)
// not ok...
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> Gen(fun _ f -> f i; printfn "%i" i)]
f 0 (fun list -> printfn "%i" list.Length)
You're correct - the reason why you're getting a stack overflow is that the bind operation of the monad needs to be tail-recursive (because it is used to aggregate values during folding).
The monad used in FsCheck is essentially a state monad (it keeps the current generator and some number). I simplified it a bit and got something like:
type Gen<'a> = Gen of (int -> 'a)
let unit x = Gen (fun n -> x)
let bind k (Gen m) =
Gen (fun n ->
let (Gen m') = k (m n)
m' n)
Here, the bind function is not tail-recursive because it calls k and then does some more work. You can change the monad to be a continuation monad. It is implemented as a function that takes the state and a continuation - a function that is called with the result as an argument. For this monad, you can make bind tail recursive:
type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)
let unit x = Gen (fun n f -> f x)
let bind k (Gen m) =
Gen (fun n f ->
m n (fun r ->
let (Gen m') = k r
m' n f))
The following example will not stack overflow (and it did with the original implementation):
let sequence l =
let k m m' =
m |> bind (fun x ->
m' |> bind (fun xs ->
unit (x::xs)))
List.foldBack k l (unit [])
let (Gen f) = sequence [ for i in 1 .. 100000 -> unit i ]
f 0 (fun list -> printfn "%d" list.Length)

How can I implement a tail-recursive list append?

A simple append function like this (in F#):
let rec app s t =
match s with
| [] -> t
| (x::ss) -> x :: (app ss t)
will crash when s becomes big, since the function is not tail recursive. I noticed that F#'s standard append function does not crash with big lists, so it must be implemented differently. So I wondered: How does a tail recursive definition of append look like? I came up with something like this:
let rec comb s t =
match s with
| [] -> t
| (x::ss) -> comb ss (x::t)
let app2 s t = comb (List.rev s) t
which works, but looks rather odd. Is there a more elegant definition?
Traditional (not tail-recursive)
let rec append a b =
match a, b with
| [], ys -> ys
| x::xs, ys -> x::append xs ys
With an accumulator (tail-recursive)
let append2 a b =
let rec loop acc = function
| [] -> acc
| x::xs -> loop (x::acc) xs
loop b (List.rev a)
With continuations (tail-recursive)
let append3 a b =
let rec append = function
| cont, [], ys -> cont ys
| cont, x::xs, ys -> append ((fun acc -> cont (x::acc)), xs, ys)
append(id, a, b)
Its pretty straight-forward to convert any non-tail recursive function to recursive with continuations, but I personally prefer accumulators for straight-forward readability.
In addition to what Juliet posted:
Using sequence expressions
Internally, sequence expressions generate tail-recursive code, so this works just fine.
let append xs ys =
[ yield! xs
yield! ys ]
Using mutable .NET types
David mentioned that F# lists can be mutated - that's however limited only to F# core libraries (and the feature cannot be used by users, because it breaks the functional concepts). You can use mutable .NET data types to implement a mutation-based version:
let append (xs:'a[]) (ys:'a[]) =
let ra = new ResizeArray<_>(xs)
for y in ys do ra.Add(y)
ra |> List.ofSeq
This may be useful in some scenarios, but I'd generally avoid mutation in F# code.
From a quick glance at the F# sources, it seems the tail is internally mutable. A simple solution would be to reverse the first list before consing its elements to the second list. That, along with reversing the list, are trivial to implement tail recursively.

F# Split list into sublists based on comparison of adjacent elements

I've found this question on hubFS, but that handles a splitting criteria based on individual elements. I'd like to split based on a comparison of adjacent elements, so the type would look like this:
val split = ('T -> 'T -> bool) -> 'T list -> 'T list list
Currently, I am trying to start from Don's imperative solution, but I can't work out how to initialize and use a 'prev' value for comparison. Is fold a better way to go?
//Don's solution for single criteria, copied from hubFS
let SequencesStartingWith n (s:seq<_>) =
seq { use ie = s.GetEnumerator()
let acc = new ResizeArray<_>()
while ie.MoveNext() do
let x = ie.Current
if x = n && acc.Count > 0 then
yield ResizeArray.to_list acc
acc.Clear()
acc.Add x
if acc.Count > 0 then
yield ResizeArray.to_list acc }
This is an interesting problem! I needed to implement exactly this in C# just recently for my article about grouping (because the type signature of the function is pretty similar to groupBy, so it can be used in LINQ query as the group by clause). The C# implementation was quite ugly though.
Anyway, there must be a way to express this function using some simple primitives. It just seems that the F# library doesn't provide any functions that fit for this purpose. I was able to come up with two functions that seem to be generally useful and can be combined together to solve this problem, so here they are:
// Splits a list into two lists using the specified function
// The list is split between two elements for which 'f' returns 'true'
let splitAt f list =
let rec splitAtAux acc list =
match list with
| x::y::ys when f x y -> List.rev (x::acc), y::ys
| x::xs -> splitAtAux (x::acc) xs
| [] -> (List.rev acc), []
splitAtAux [] list
val splitAt : ('a -> 'a -> bool) -> 'a list -> 'a list * 'a list
This is similar to what we want to achieve, but it splits the list only in two pieces (which is a simpler case than splitting the list multiple times). Then we'll need to repeat this operation, which can be done using this function:
// Repeatedly uses 'f' to take several elements of the input list and
// aggregate them into value of type 'b until the remaining list
// (second value returned by 'f') is empty
let foldUntilEmpty f list =
let rec foldUntilEmptyAux acc list =
match f list with
| l, [] -> l::acc |> List.rev
| l, rest -> foldUntilEmptyAux (l::acc) rest
foldUntilEmptyAux [] list
val foldUntilEmpty : ('a list -> 'b * 'a list) -> 'a list -> 'b list
Now we can repeatedly apply splitAt (with some predicate specified as the first argument) on the input list using foldUntilEmpty, which gives us the function we wanted:
let splitAtEvery f list = foldUntilEmpty (splitAt f) list
splitAtEvery (<>) [ 1; 1; 1; 2; 2; 3; 3; 3; 3 ];;
val it : int list list = [[1; 1; 1]; [2; 2]; [3; 3; 3; 3]]
I think that the last step is really nice :-). The first two functions are quite straightforward and may be useful for other things, although they are not as general as functions from the F# core library.
How about:
let splitOn test lst =
List.foldBack (fun el lst ->
match lst with
| [] -> [[el]]
| (x::xs)::ys when not (test el x) -> (el::(x::xs))::ys
| _ -> [el]::lst
) lst []
the foldBack removes the need to reverse the list.
Having thought about this a bit further, I've come up with this solution. I'm not sure that it's very readable (except for me who wrote it).
UPDATE Building on the better matching example in Tomas's answer, here's an improved version which removes the 'code smell' (see edits for previous version), and is slightly more readable (says me).
It still breaks on this (splitOn (<>) []), because of the dreaded value restriction error, but I think that might be inevitable.
(EDIT: Corrected bug spotted by Johan Kullbom, now works correctly for [1;1;2;3]. The problem was eating two elements directly in the first match, this meant I missed a comparison/check.)
//Function for splitting list into list of lists based on comparison of adjacent elements
let splitOn test lst =
let rec loop lst inner outer = //inner=current sublist, outer=list of sublists
match lst with
| x::y::ys when test x y -> loop (y::ys) [] (List.rev (x::inner) :: outer)
| x::xs -> loop xs (x::inner) outer
| _ -> List.rev ((List.rev inner) :: outer)
loop lst [] []
splitOn (fun a b -> b - a > 1) [1]
> val it : [[1]]
splitOn (fun a b -> b - a > 1) [1;3]
> val it : [[1]; [3]]
splitOn (fun a b -> b - a > 1) [1;2;3;4;6;7;8;9;11;12;13;14;15;16;18;19;21]
> val it : [[1; 2; 3; 4]; [6; 7; 8; 9]; [11; 12; 13; 14; 15; 16]; [18; 19]; [21]]
Any thoughts on this, or the partial solution in my question?
"adjacent" immediately makes me think of Seq.pairwise.
let splitAt pred xs =
if Seq.isEmpty xs then
[]
else
xs
|> Seq.pairwise
|> Seq.fold (fun (curr :: rest as lists) (i, j) -> if pred i j then [j] :: lists else (j :: curr) :: rest) [[Seq.head xs]]
|> List.rev
|> List.map List.rev
Example:
[1;1;2;3;3;3;2;1;2;2]
|> splitAt (>)
Gives:
[[1; 1; 2; 3; 3; 3]; [2]; [1; 2; 2]]
I would prefer using List.fold over explicit recursion.
let splitOn pred = function
| [] -> []
| hd :: tl ->
let (outer, inner, _) =
List.fold (fun (outer, inner, prev) curr ->
if pred prev curr
then (List.rev inner) :: outer, [curr], curr
else outer, curr :: inner, curr)
([], [hd], hd)
tl
List.rev ((List.rev inner) :: outer)
I like answers provided by #Joh and #Johan as these solutions seem to be most idiomatic and straightforward. I also like an idea suggested by #Shooton. However, each solution had their own drawbacks.
I was trying to avoid:
Reversing lists
Unsplitting and joining back the temporary results
Complex match instructions
Even Seq.pairwise appeared to be redundant
Checking list for emptiness can be removed in cost of using Unchecked.defaultof<_> below
Here's my version:
let splitWhen f src =
if List.isEmpty src then [] else
src
|> List.foldBack
(fun el (prev, current, rest) ->
if f el prev
then el , [el] , current :: rest
else el , el :: current , rest
)
<| (List.head src, [], []) // Initial value does not matter, dislike using Unchecked.defaultof<_>
|> fun (_, current, rest) -> current :: rest // Merge temporary lists
|> List.filter (not << List.isEmpty) // Drop tail element

F# permutations

I need to generate permutations on a given list. I managed to do it like this
let rec Permute (final, arr) =
if List.length arr > 0 then
for x in arr do
let n_final = final # [x]
let rest = arr |> List.filter (fun a -> not (x = a))
Permute (n_final, rest)
else
printfn "%A" final
let DoPermute lst =
Permute ([], lst)
DoPermute lst
There are obvious issues with this code. For example, list elements must be unique. Also, this is more-less a same approach that I would use when generating straight forward implementation in any other language. Is there any better way to implement this in F#.
Thanks!
Here's the solution I gave in my book F# for Scientists (page 166-167):
let rec distribute e = function
| [] -> [[e]]
| x::xs' as xs -> (e::xs)::[for xs in distribute e xs' -> x::xs]
let rec permute = function
| [] -> [[]]
| e::xs -> List.collect (distribute e) (permute xs)
For permutations of small lists, I use the following code:
let distrib e L =
let rec aux pre post =
seq {
match post with
| [] -> yield (L # [e])
| h::t -> yield (List.rev pre # [e] # post)
yield! aux (h::pre) t
}
aux [] L
let rec perms = function
| [] -> Seq.singleton []
| h::t -> Seq.collect (distrib h) (perms t)
It works as follows: the function "distrib" distributes a given element over all positions in a list, example:
distrib 10 [1;2;3] --> [[10;1;2;3];[1;10;2;3];[1;2;10;3];[1;2;3;10]]
The function perms works (recursively) as follows: distribute the head of the list over all permutations of its tail.
The distrib function will get slow for large lists, because it uses the # operator a lot, but for lists of reasonable length (<=10), the code above works fine.
One warning: if your list contains duplicates, the result will contain identical permutations. For example:
perms [1;1;3] = [[1;1;3]; [1;1;3]; [1;3;1]; [1;3;1]; [3;1;1]; [3;1;1]]
The nice thing about this code is that it returns a sequence of permutations, instead of generating them all at once.
Of course, generating permutations with an imperative array-based algorithm will be (much) faster, but this algorithm has served me well in most cases.
Here's another sequence-based version, hopefully more readable than the voted answer.
This version is similar to Jon's version in terms of logic, but uses computation expressions instead of lists. The first function computes all ways to insert an element x in a list l. The second function computes permutations.
You should be able to use this on larger lists (e.g. for brute force searches on all permutations of a set of inputs).
let rec inserts x l =
seq { match l with
| [] -> yield [x]
| y::rest ->
yield x::l
for i in inserts x rest do
yield y::i
}
let rec permutations l =
seq { match l with
| [] -> yield []
| x::rest ->
for p in permutations rest do
yield! inserts x p
}
It depends on what you mean by "better". I'd consider this to be slightly more elegant, but that may be a matter of taste:
(* get the list of possible heads + remaining elements *)
let rec splitList = function
| [x] -> [x,[]]
| x::xs -> (x, xs) :: List.map (fun (y,l) -> y,x::l) (splitList xs)
let rec permutations = function
| [] -> [[]]
| l ->
splitList l
|> List.collect (fun (x,rest) ->
(* permute remaining elements, then prepend head *)
permutations rest |> List.map (fun l -> x::l))
This can handle lists with duplicate elements, though it will result in duplicated permutations.
In the spirit of Cyrl's suggestion, here's a sequence comprehension version
let rec permsOf xs =
match xs with
| [] -> List.toSeq([[]])
| _ -> seq{ for x in xs do
for xs' in permsOf (remove x xs) do
yield (x::xs')}
where remove is a simple function that removes a given element from a list
let rec remove x xs =
match xs with [] -> [] | (x'::xs')-> if x=x' then xs' else x'::(remove x xs')
IMHO the best solution should alleviate the fact that F# is a functional language so imho the solution should be as close to the definition of what we mean as permutation there as possible.
So the permutation is such an instance of list of things where the head of the list is somehow added to the permutation of the rest of the input list.
The erlang solution shows that in a pretty way:
permutations([]) -> [[]];
permutations(L) -> [[H|T] H<- L, T <- permutations( L--[H] ) ].
taken fron the "programming erlang" book
There is a list comprehension operator used, in solution mentioned here by the fellow stackoverflowers there is a helper function which does the similar job
basically I'd vote for the solution without any visible loops etc, just pure function definition
I'm like 11 years late, but still in case anyone needs permutations like I did recently. Here's Array version of permutation func, I believe it's more performant:
[<RequireQualifiedAccess>]
module Array =
let private swap (arr: _[]) i j =
let buf = arr.[i]
arr.[i] <- arr.[j]
arr.[j] <- buf
let permutations arr =
match arr with
| null | [||] -> [||]
| arr ->
let last = arr.Length - 1
let arr = Array.copy arr
let rec perm arr k =
let arr = Array.copy arr
[|
if k = last then
yield arr
else
for i in k .. last do
swap arr k i
yield! perm arr (k + 1)
|]
perm arr 0

Resources