Equivalent of repeated takeWhile calls: does this function have a "standard" name? - f#

I have a scenario where the standard List.groupBy function isn't what I want, but I don't know the right name for this function so it's making it hard to search for.
I have a list of items of type 'T, and a 'T -> 'k key-producing function. The items are already somewhat "grouped" together in the list, so that when you map the list through the key function, its result will tend have the same key in a row several times, e.g. [1; 1; 1; 2; 2; 1; 1; 3; 3; 3; 1; 1]. What I want is to get a list of lists, where the inner list contains all the items for which the key-producing function returned the same value -- but it should NOT group the different sequences of 1's together.
In other words, say my data was a list of strings, and the key-producing function was String.length. So the input is:
["a"; "e"; "i"; "to"; "of"; "o"; "u"; "and"; "for"; "the"; "I"; "O"]
The output I'm looking for would be:
[["a"; "e"; "i"]; ["to"; "of"]; ["o"; "u"]; ["and"; "for"; "the"]; ["I"; "O"]]
To think of it another way: this is like taking the first item of the list and storing the result of calling the key function. Then you'd use takeWhile (fun x -> keyFun x = origKeyFunResult) to generate the first segment. Then when that takeWhile stops returning values, you record when it stopped, and the value of keyFun x on the first value that didn't return the original result -- and go on from there. (Except that that would be O(N*M) where M is the number of sequences, and would devolve into O(N^2) in many cases -- whereas it should be possible to implement this function in O(N) time).
Now, I can write that function pretty easily. That's not the question. What I want to know is whether there's a standard name for this function. Because I thought it would be called groupBy, but that's something else. (List.groupBy String.length would return [(1, ["a"; "e"; "i"; "o"; "u"; "I"; "O"]); (2, ["to"; "of"]), (3, ["and"; "for"; "the"])], but what I want in this case is for the "a/e/i", "o/u", and "I/O" lists to remain separated, and I don't want the value that the key-generating returns to be in the output data).
Maybe there isn't a standard name for this function. But if there is, what is it?

I'm a little late and it seems that you have found a solution, and it seems that there doesn't exists a single function i F# that can handle the problem.
Just for the challenge I tried to find some usable solutions and came up with the following (whether they are efficient or not is up the reader to deside):
open System
module List =
/// <summary>
/// Generic List Extension:
/// Given a comparer function the list will be chunked into sub lists
/// starting when ever comparer finds a difference.
/// </summary>
let chunkByPredicate (comparer : 'T -> 'T -> bool) list =
let rec func (i : int, lst : 'T list) : 'T list list =
if i >= lst.Length then
List.empty
else
let first = lst.[i]
let chunk = lst |> List.skip(i) |> List.takeWhile (fun s -> comparer first s)
List.append [chunk] (func((i + chunk.Length), lst))
func (0, list) |> List.where (fun lst -> not (List.isEmpty lst))
// 1. Using List.fold to chunk by string length
let usingListFold (data : string list) =
printfn "1. Using List.fold: "
data
|> List.fold (fun (acc : string list list) s ->
if acc.Length > 0 then
let last = acc.[acc.Length - 1]
let lastLength = last.[0].Length
if lastLength = s.Length then
List.append (acc |> List.take (acc.Length - 1)) [(last |> List.append [s])]
else
List.append acc [[s]]
else
[[s]]) ([])
|> List.iter (printfn "%A")
printfn ""
// 2. Using List.chunkByPredicate
let usingListChunkByPredicate<'a> (predicate : 'a -> 'a -> bool, data : 'a list) =
printfn "2. Using List.chunkByPredicate: "
data
|> List.chunkByPredicate predicate
|> List.iter (printfn "%A")
printfn ""
[<EntryPoint>]
let main argv =
let data = ["a"; "e"; "i"; "to"; "of"; "o"; "u"; "and"; "for"; "the"; "I"; "O"]
usingListFold data
usingListChunkByPredicate<string>((fun first s -> first.Length = s.Length), data)
let intData = [0..50]
usingListChunkByPredicate<int>((fun first n -> first / 10 = n / 10), intData)
Console.ReadLine() |> ignore
0

Related

Remove same values from two lists and compare them using List.fold in F#

I am trying to create a function that takes two lists that removes values in one list that are also in the other. E.g if we have the lists [1;2;3] and [1;2;3;4] then the first list becomes empty []
and the second list is just [4]. At the end I just when to compare both lists.
I am trying to use List.fold for this since I want to understand it better. Also I created my own folder function that deletes elements from a list.
I am very new to F# so I only came up with a partial solution
let rec delete x list =
match list with
| [] -> []
| hd:: tl when hd = x -> tl
| hd:: tl-> hd:: delete x tl
let myFunc list1 list2 =
let x = list1 |> List.fold(delete) [] list2
let y = list2 |> List.fold(delete) [] list1
x = y
but this does not work and the compiler is telling me "The type '('a -> 'b)' does not support the 'equality' constraint because it is a function type" when I try to use the delete function with the list.fold method.
Although you say you are trying to use List.fold for this to understand it better, there is another List function that makes this simpler. This is to use List.except which is one of a number of methods that treats lists as sets.
let list1 = [1;2;3]
let list2 = [1;2;3;4]
let myFunc list1 list2=
list1 |> List.except list2, list2 |> List.except list1
printfn "%A" (myFunc list1 list2)
[],[4]
If you want to understand List.fold here you could try and create an explicit implementation of except using List.fold. However, again, this is simpler to implement using List.filter.
let list1 = [1;2;3]
let list2 = [1;2;3;4]
let except exclude src =
src |> List.filter (fun i -> exclude |> List.contains i |> not)
let myFuncCustom list1 list2 =
(list1 |> except list2), (list2 |> except list1)
printfn "%A" (myFuncCustom list1 list2)
[],[4]
So really you want to implement filter using List.fold. In this case you would actually need List.foldBack:
let filter f src =
List.foldBack (fun item filtered ->
if f item then item :: filtered else filtered) src []
You can use List.fold but then results are reversed and you need to pipe this into List.rev. And note that List.fold only takes three arguments: the first a folder function; second the accumulator which becomes the output - in this case a list too; and, the last, the source list to fold over. (Let us expand List.contains as well):
let list1 = [1;2;3]
let list2 = [1;2;3;4]
let rec contains item = function
| [] -> false
| hd::tl when hd = item -> true
| hd::tl -> contains item tl
let filter f src =
src
|> List.fold (fun filtered item ->
if f item then item :: filtered else filtered) []
|> List.rev
let except exclude src =
src |> filter (fun i -> exclude |> contains i |> not)
let myFuncCustom list1 list2 =
(list1 |> except list2), (list2 |> except list1)
printfn "%A" (myFuncCustom list1 list2)
[],[4]
This should be what you want:
let difference list blacklist =
let folder acc a =
if List.contains a blacklist
then acc
else a::acc
List.fold folder [] list
difference [1;2;3;4] [1;2;3] // [4]
difference [1;2;3] [1;2;3;4] // []
Looking at the code you posted, there seems to be some confusion on how fold works.
the arguments to fold are
A function that somehow combines a given state with an element of the list. This function can be as simple as summing the two arguments together resulting in a single scalar or it can be something really complicated that creates some weird data structure.
An initial state which must be of the type that you want fold to produce
And, of course, the list you want to fold over
Fold iterates the list, by calling your fold function for every element of the list.
The first time your fold function is called, it will get the initial state. Every other time it will get the state produced from the previous iteration.
Fold will return the last state that was produced by your fold function (or the initial state if the list is empty)
As your goal is to better understand fold I try to explain fold instead of explaining how you achive your goal.
fold is bacially a for loop for immutable data-types. It allows you to eliminate mutable variables. For example,
lets assume you want to sum all values of an integer list. In an "imperative" style you are probaly used to
write something like this.
(* This xs is used through all exampes *)
let xs = [1..10]
(* Example A1 *)
let mutable sum = 0
for x in xs do
sum <- sum + x
(* sum = 55 *)
Before you loop through a list, you define a mutable sum and then mutate the sum and updating it on everey iteration.
This is how you achive it with List.fold.
(* Example A2 *)
let sum =
List.fold (fun sum x ->
sum + x
) 0 xs
(* sum = 55 *)
You can think of List.fold as the following.
The function is the body of the loop that gets executed for every item in your list.
The second argument to List.fold (here 0) is the state you want to compute. This is the sum.
The last argument of List.fold is finally the list you want to traverse.
The function always gets two arguments. The state and the next item of your list. Your function must return
the next state.
With the for-loop you also have state. But the state is outside of the for-loop and you achieve your goal
by mutating the state.
You also can think of the List.fold by mentally mapping the values to the lambda function you provide. The second
argument 0 will be sum in your lambda and x in your lambda is one value of xs. The result of your lambda is
the sum for the next call.
Let's say you want to compute three things on the fly. A mutable version looks like this
(* Helper Function *)
let isEven x = x &&& 1 = 0
(* Example B1 *)
let mutable count = 0
let mutable evens = 0
let mutable sum = 0
for x in xs do
count <- count + 1
if isEven x then
evens <- evens + 1
sum <- sum + x
(* count=10; evens=5; sum=55 *)
Here we compute the amount of values in a list, how many even values exists, and the sum in one go.
List.fold only allows one state, but the state can be a complex value. For example a tuple with three values. The
same example with List.fold looks like this:
(* Example B2 *)
let count,evens,sum =
List.fold (fun (count,evens,sum) x ->
(count+1), (if isEven x then evens + 1 else evens), (sum + x)
) (0,0,0) xs
(* count=10; evens=5; sum=55 *)
To better understand fold it is crucial to understand recursion and immutable data-strucutres like how list works.
You could implement fold yourself like this:
(* Self-defined fold *)
let rec myFold f state xs =
match xs with
| [] -> state
| x::rest -> myFold f (f state x) rest
(* Example C *)
let sum = myFold (fun sum x -> sum + x) 0 xs
(* sum = 55 *)
fold just do two things, it checks if the list is empty and in that case returns the state. Or it removes one element from the top of your list and calls itself recursively by
Keeping the function.
Producing the next state with (f state x)
Use the remaining list rest
Maybe you wonder about performance. This is tail-recursive, and tail-recursive functions are basically turned into for-loops by the compiler. So it has no performance penalty compared to the code that mutate things.
This is at least the case in F#. Just a reminder, not every compiler or run-time for other languages support tail-recursion.

Sequence of incorrect length generated by function

Why is the following function returning a sequence of incorrect length when the repl variable is set to false?
open MathNet.Numerics.Distributions
open MathNet.Numerics.LinearAlgebra
let sample (data : seq<float>) (size : int) (repl : bool) =
let n = data |> Seq.length
// without replacement
let rec generateIndex idx =
let m = size - Seq.length(idx)
match m > 0 with
| true ->
let newIdx = DiscreteUniform.Samples(0, n-1) |> Seq.take m
let idx = (Seq.append idx newIdx) |> Seq.distinct
generateIndex idx
| false ->
idx
let sample =
match repl with
| true ->
DiscreteUniform.Samples(0, n-1)
|> Seq.take size
|> Seq.map (fun index -> Seq.item index data)
| false ->
generateIndex (seq [])
|> Seq.map (fun index -> Seq.item index data)
sample
Running the function...
let requested = 1000
let dat = Normal.Samples(0., 1.) |> Seq.take 10000
let resultlen = sample dat requested false |> Seq.length
printfn "requested -> %A\nreturned -> %A" requested resultlen
Resulting lengths are wrong.
>
requested -> 1000
returned -> 998
>
requested -> 1000
returned -> 1001
>
requested -> 1000
returned -> 997
Any idea what mistake I'm making?
First, there's a comment I want to make about coding style. Then I'll get to the explanation of why your sequences are coming back with different lengths.
In the comments, I mentioned replacing match (bool) with true -> ... | false -> ... with a simple if ... then ... else expression, but there's another coding style that you're using that I think could be improved. You wrote:
let sample (various_parameters) = // This is a function
// Other code ...
let sample = some_calculation // This is a variable
sample // Return the variable
While F# allows you to reuse names like that, and the name inside the function will "shadow" the name outside the function, it's generally a bad idea for the reused name to have a totally different type than the original name. In other words, this can be a good idea:
let f (a : float option) =
let a = match a with
| None -> 0.0
| Some value -> value
// Now proceed, knowing that `a` has a real value even if had been None before
Or, because the above is exactly what F# gives you defaultArg for:
let f (a : float option) =
let a = defaultArg a 0.0
// This does exactly the same thing as the previous snippet
Here, we are making the name a inside our function refer to a different type than the parameter named a: the parameter was a float option, and the a inside our function is a float. But they're sort of the "same" type -- that is, there's very little mental difference between "The caller may have specified a floating-point value or they may not" and "Now I definitely have a floating-point value". But there's a very large mental gap between "The name sample is a function that takes three parameters" and "The name sample is a sequence of floats". I strongly recommend using a name like result for the value you're going to return from your function, rather than re-using the function name.
Also, this seems unnecessarily verbose:
let result =
match repl with
| true ->
DiscreteUniform.Samples(0, n-1)
|> Seq.take size
|> Seq.map (fun index -> Seq.item index data)
| false ->
generateIndex (seq [])
|> Seq.map (fun index -> Seq.item index data)
result
Anytime I find myself writing "let result = (something) ; result" at the end of my function, I usually just want to replace that whole code block with just the (something). I.e., the above snippet could just become:
match repl with
| true ->
DiscreteUniform.Samples(0, n-1)
|> Seq.take size
|> Seq.map (fun index -> Seq.item index data)
| false ->
generateIndex (seq [])
|> Seq.map (fun index -> Seq.item index data)
Which in turn can be replaced with an if...then...else expression:
if repl then
DiscreteUniform.Samples(0, n-1)
|> Seq.take size
|> Seq.map (fun index -> Seq.item index data)
else
generateIndex (seq [])
|> Seq.map (fun index -> Seq.item index data)
And that's the last expression in your code. In other words, I would probably rewrite your function as follows (changing ONLY the style, and making no changes to the logic):
open MathNet.Numerics.Distributions
open MathNet.Numerics.LinearAlgebra
let sample (data : seq<float>) (size : int) (repl : bool) =
let n = data |> Seq.length
// without replacement
let rec generateIndex idx =
let m = size - Seq.length(idx)
if m > 0 then
let newIdx = DiscreteUniform.Samples(0, n-1) |> Seq.take m
let idx = (Seq.append idx newIdx) |> Seq.distinct
generateIndex idx
else
idx
if repl then
DiscreteUniform.Samples(0, n-1)
|> Seq.take size
|> Seq.map (fun index -> Seq.item index data)
else
generateIndex (seq [])
|> Seq.map (fun index -> Seq.item index data)
If I can figure out why your sequences have the wrong length, I'll update this answer with that information as well.
UPDATE: Okay, I think I see what's happening in your generateIndex function that's giving you unexpected results. There are two things tripping you up: one is sequence laziness, and the other is randomness.
I copied your generateIndex function into VS Code and added some printfn statements to look at what's going on. First, the code I ran, and then the results:
let rec generateIndex n size idx =
let m = size - Seq.length(idx)
printfn "m = %d" m
match m > 0 with
| true ->
let newIdx = DiscreteUniform.Samples(0, n-1) |> Seq.take m
printfn "Generating newIdx as %A" (List.ofSeq newIdx)
let idx = (Seq.append idx newIdx) |> Seq.distinct
printfn "Now idx is %A" (List.ofSeq idx)
generateIndex n size idx
| false ->
printfn "Done, returning %A" (List.ofSeq idx)
idx
All those List.ofSeq idx calls are so that F# Interactive would print more than four items of the seq when I print it out (by default, if you try to print a seq with %A, it will only print out four values and then print an ellipsis if there are more values available in the seq). Also, I turned n and size into parameters (that I don't change between calls) so that I could test it easily. I then called it as generateIndex 100 5 (seq []) and got the following result:
m = 5
Generating newIdx as [74; 76; 97; 78; 31]
Now idx is [68; 28; 65; 58; 82]
m = 0
Done, returning [37; 58; 24; 48; 49]
val it : seq<int> = seq [12; 69; 97; 38; ...]
See how the numbers keep changing? That was my first clue that something was up. See, seqs are lazy. They don't evaluate their contents until they have to. You shouldn't think of a seq as a list of numbers. Instead, think of it as a generator that will, when asked for numbers, produce them according to some rule. In your case, the rule is "Choose random integers between 0 and n-1, then take m of those numbers". And the other thing about seqs is that they do not cache their contents (although there's a Seq.cache function available that will cache their contents). Therefore, if you have a seq based on a random number generator, its results will be different each time, as you can see in my output. When I printed out newIdx, it printed out as [74; 76; 97; 78; 31], but when I appended it to an empty seq, the result printed out as [68; 28; 65; 58; 82].
Why this difference? Because Seq.append does not force evaluation. It simply creates a new seq whose rule is "take all items from the first seq, then when that one exhausts, take all items from the second seq. And when that one exhausts, end." And Seq.distinct does not force evaluation either; it simply creates a new seq whose rule is "take the items from the seq handed to you, and start handing them out when asked. But memorize them as you go, and if you've handed one of them out before, don't hand it out again." So what you are passing around between your calls to generateIdx is an object that, when evaluated, will pick a set of random numbers between 0 and n-1 (in my simple case, between 0 and 100) and then reduce that set down to a distinct set of numbers.
Now, here's the thing. Every time you evaluate that seq, it will start from the beginning: first calling DiscreteUniform.Samples(0, n-1) to generate an infinite stream of random numbers, then selecting m numbers from that stream, then throwing out any duplicates. (I'm ignoring the Seq.append for now, because it would create unnecessary mental complexity and it isn't really part of the bug anyway). Now, at the start of each go-round of your function, you check the length of the sequence, which does cause it to be evaluated. That means that it selects (in the case of my sample code) 5 random numbers between 0 and 99, then makes sure that they're all distinct. If they are all distinct, then m = 0 and the function will exit, returning... not the list of numbers, but the seq object. And when that seq object is evaluated, it will start over from the beginning, choosing a different set of 5 random numbers and then throwing out any duplicates. Therefore, there's still a chance that at least one of that set of 5 numbers will end up being a duplicate, because the sequence whose length was tested (which we know contained no duplicates, otherwise m would have been greater than 0) was not the sequence that was returned. The sequence that was returned has a 1.0 * 0.99 * 0.98 * 0.97 * 0.96 chance of not containing any duplicates, which comes to about 0.9035. So there's a just-under-10% chance that even though you checked Seq.length and it was 5, the length of the returned seq ends up being 4 after all -- because it was choosing a different set of random numbers than the one you checked.
To prove this, I ran the function again, this time only picking 4 numbers so that the result would be completely shown at the F# Interactive prompt. And my run of generateIndex 100 4 (seq []) produced the following output:
m = 4
Generating newIdx as [36; 63; 97; 31]
Now idx is [39; 93; 53; 94]
m = 0
Done, returning [47; 94; 34]
val it : seq<int> = seq [48; 24; 14; 68]
Notice how when I printed "Done, returning (value of idx)", it had only 3 values? Even though it eventually returned 4 values (because it picked a different selection of random numbers for the actual result, and that selection had no duplicates), that demonstrated the problem.
By the way, there's one other problem with your function, which is that it's far slower than it needs to be. The function Seq.item, in some circumstances, has to run through the sequence from the beginning in order to pick the nth item of the sequence. It would be far better to store your data in an array at the start of your function (let arrData = data |> Array.ofSeq), then replace
|> Seq.map (fun index -> Seq.item index data)
with
|> Seq.map (fun index -> arrData.[index])
Array lookups are done in constant time, so that takes your sample function down from O(N^2) to O(N).
TL;DR: Use Seq.distinct before you take m values from it and the bug will go away. You can just replace your entire generateIdx function with a simple DiscreteUniform.Samples(0, n-1) |> Seq.distinct |> Seq.take size. (And use an array for your data lookups so that your function will run faster). In other words, here's the final almost-final version of how I would rewrite your code:
let sample (data : seq<float>) (size : int) (repl : bool) =
let arrData = data |> Array.ofSeq
let n = arrData |> Array.length
if repl then
DiscreteUniform.Samples(0, n-1)
|> Seq.take size
|> Seq.map (fun index -> arrData.[index])
else
DiscreteUniform.Samples(0, n-1)
|> Seq.distinct
|> Seq.take size
|> Seq.map (fun index -> arrData.[index])
That's it! Simple, easy to understand, and (as far as I can tell) bug-free.
Edit: ... but not completely DRY, because there's still a bit of repeated code in that "final" version. (Credit to CaringDev for pointing it out in the comments below). The Seq.take size |> Seq.map is repeated in both branches of the if expression, so there's a way to simplify that expression. We could do this:
let randomIndices =
if repl then
DiscreteUniform.Samples(0, n-1)
else
DiscreteUniform.Samples(0, n-1) |> Seq.distinct
randomIndices
|> Seq.take size
|> Seq.map (fun index -> arrData.[index])
So here's a truly-final version of my suggestion:
let sample (data : seq<float>) (size : int) (repl : bool) =
let arrData = data |> Array.ofSeq
let n = arrData |> Array.length
let randomIndices =
if repl then
DiscreteUniform.Samples(0, n-1)
else
DiscreteUniform.Samples(0, n-1) |> Seq.distinct
randomIndices
|> Seq.take size
|> Seq.map (fun index -> arrData.[index])

Return value in F# - incomplete construct

I've trying to learn F#. I'm a complete beginner, so this might be a walkover for you guys :)
I have the following function:
let removeEven l =
let n = List.length l;
let list_ = [];
let seq_ = seq { for x in 1..n do if x % 2 <> 0 then yield List.nth l (x-1)}
for x in seq_ do
let list_ = list_ # [x];
list_;
It takes a list, and return a new list containing all the numbers, which is placed at an odd index in the original list, so removeEven [x1;x2;x3] = [x1;x3]
However, I get my already favourite error-message: Incomplete construct at or before this point in expression...
If I add a print to the end of the line, instead of list_:
...
print_any list_;
the problem is fixed. But I do not want to print the list, I want to return it!
What causes this? Why can't I return my list?
To answer your question first, the compiler complains because there is a problem inside the for loop. In F#, let serves to declare values (that are immutable and cannot be changed later in the program). It isn't a statement as in C# - let can be only used as part of another expression. For example:
let n = 10
n + n
Actually means that you want the n symbol to refer to the value 10 in the expression n + n. The problem with your code is that you're using let without any expression (probably because you want to use mutable variables):
for x in seq_ do
let list_ = list_ # [x] // This isn't assignment!
list_
The problematic line is an incomplete expression - using let in this way isn't allowed, because it doesn't contain any expression (the list_ value will not be accessed from any code). You can use mutable variable to correct your code:
let mutable list_ = [] // declared as 'mutable'
let seq_ = seq { for x in 1..n do if x % 2 <> 0 then yield List.nth l (x-1)}
for x in seq_ do
list_ <- list_ # [x] // assignment using '<-'
Now, this should work, but it isn't really functional, because you're using imperative mutation. Moreover, appending elements using # is really inefficient thing to do in functional languages. So, if you want to make your code functional, you'll probably need to use different approach. Both of the other answers show a great approach, although I prefer the example by Joel, because indexing into a list (in the solution by Chaos) also isn't very functional (there is no pointer arithmetic, so it will be also slower).
Probably the most classical functional solution would be to use the List.fold function, which aggregates all elements of the list into a single result, walking from the left to the right:
[1;2;3;4;5]
|> List.fold (fun (flag, res) el ->
if flag then (not flag, el::res) else (not flag, res)) (true, [])
|> snd |> List.rev
Here, the state used during the aggregation is a Boolean flag specifying whether to include the next element (during each step, we flip the flag by returning not flag). The second element is the list aggregated so far (we add element by el::res only when the flag is set. After fold returns, we use snd to get the second element of the tuple (the aggregated list) and reverse it using List.rev, because it was collected in the reversed order (this is more efficient than appending to the end using res#[el]).
Edit: If I understand your requirements correctly, here's a version of your function done functional rather than imperative style, that removes elements with odd indexes.
let removeEven list =
list
|> Seq.mapi (fun i x -> (i, x))
|> Seq.filter (fun (i, x) -> i % 2 = 0)
|> Seq.map snd
|> List.ofSeq
> removeEven ['a'; 'b'; 'c'; 'd'];;
val it : char list = ['a'; 'c']
I think this is what you are looking for.
let removeEven list =
let maxIndex = (List.length list) - 1;
seq { for i in 0..2..maxIndex -> list.[i] }
|> Seq.toList
Tests
val removeEven : 'a list -> 'a list
> removeEven [1;2;3;4;5;6];;
val it : int list = [1; 3; 5]
> removeEven [1;2;3;4;5];;
val it : int list = [1; 3; 5]
> removeEven [1;2;3;4];;
val it : int list = [1; 3]
> removeEven [1;2;3];;
val it : int list = [1; 3]
> removeEven [1;2];;
val it : int list = [1]
> removeEven [1];;
val it : int list = [1]
You can try a pattern-matching approach. I haven't used F# in a while and I can't test things right now, but it would be something like this:
let rec curse sofar ls =
match ls with
| even :: odd :: tl -> curse (even :: sofar) tl
| even :: [] -> curse (even :: sofar) []
| [] -> List.rev sofar
curse [] [ 1; 2; 3; 4; 5 ]
This recursively picks off the even elements. I think. I would probably use Joel Mueller's approach though. I don't remember if there is an index-based filter function, but that would probably be the ideal to use, or to make if it doesn't exist in the libraries.
But in general lists aren't really meant as index-type things. That's what arrays are for. If you consider what kind of algorithm would require a list having its even elements removed, maybe it's possible that in the steps prior to this requirement, the elements can be paired up in tuples, like this:
[ (1,2); (3,4) ]
That would make it trivial to get the even-"indexed" elements out:
thelist |> List.map fst // take first element from each tuple
There's a variety of options if the input list isn't guaranteed to have an even number of elements.
Yet another alternative, which (by my reckoning) is slightly slower than Joel's, but it's shorter :)
let removeEven list =
list
|> Seq.mapi (fun i x -> (i, x))
|> Seq.choose (fun (i,x) -> if i % 2 = 0 then Some(x) else None)
|> List.ofSeq

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/cacheing)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
LazyList.delayed (fun () ->
match input with
| LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
| LazyList.Cons(x,LazyList.Nil) ->
LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
| LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
let groups = GroupBy comp xs
if comp x y then
LazyList.consf
(LazyList.consf x (fun () ->
let (LazyList.Cons(firstGroup,_)) = groups
firstGroup))
(fun () ->
let (LazyList.Cons(_,otherGroups)) = groups
otherGroups)
else
LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x
You seem to want a function that has signature
(`a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):
let groupBy (fun: 'a -> bool) (input: seq) =
seq {
let cache = ref (new System.Collections.Generic.List())
for e in input do
(!cache).Add(e)
if not (fun e) then
yield !cache
cache := new System.Collections.Generic.List()
if cache.Length > 0 then
yield !cache
}
An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs#(y : _))
| comp x y = (x : firstGroup) : otherGroups
| otherwise = [x] : groups
where groups#(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let rec partitions (first:option<_>) =
seq {
match first with
| Some first' -> //'
(* The following value is always overwritten;
it represents the first element of the next subsequence to output, if any *)
let next = ref None
(* This function generates a subsequence to output,
setting next appropriately as it goes *)
let rec iter item =
seq {
yield item
if (en.MoveNext()) then
let curr = en.Current
if (cmp item curr) then
yield! iter curr
else // consumed one too many - pass it on as the start of the next sequence
next := Some curr
else
next := None
}
yield iter first' (* ' generate the first sequence *)
yield! partitions !next (* recursively generate all remaining sequences *)
| None -> () // return an empty sequence if there are no more values
}
let first = if en.MoveNext() then Some en.Current else None
partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let next() = if en.MoveNext() then Some en.Current else None
(* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
let rec seqStartingWith start =
match next() with
| Some y when cmp start y ->
let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
seq { yield start; yield! fst (Lazy.force rest_next) },
lazy Lazy.force (snd (Lazy.force rest_next))
| next -> seq { yield start }, lazy next
let rec iter start =
seq {
match (Lazy.force start) with
| None -> ()
| Some start ->
let (first,next) = seqStartingWith start
yield first
yield! iter next
}
Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
// simple unfold that assumes f captured a mutable variable
let iter f = Seq.unfold (fun _ ->
match f() with
| Some(x) -> Some(x,())
| None -> None) ()
seq {
let state = ref (Unstarted)
use ie = input.GetEnumerator()
let innerMoveNext() =
match !state with
| Unstarted ->
if ie.MoveNext()
then let cur = ie.Current
state := InnerOkay(cur); Some(cur)
else state := Finished; None
| InnerOkay(last) ->
if ie.MoveNext()
then let cur = ie.Current
if f last cur
then state := InnerOkay(cur); Some(cur)
else state := NeedNewInner(cur); None
else state := Finished; None
| NeedNewInner(last) -> state := InnerOkay(last); Some(last)
| Finished -> None
let outerMoveNext() =
match !state with
| Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
| InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
| Finished -> None
yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
let baseTime = DateTime.Now
seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
Seq.zip contiguousTimeStamps numbers
|> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
printfn "about to do a group"
for x in group do
printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:seq<_>) = seq {
let doneWithThisGroup = ref false
let areMore = ref true
use e = input.GetEnumerator()
let Next() = areMore := e.MoveNext(); !areMore
// deal with length 0 or 1, seed 'prev'
if not(e.MoveNext()) then () else
let prev = ref e.Current
while !areMore do
yield seq {
while not(!doneWithThisGroup) do
if Next() then
let next = e.Current
doneWithThisGroup := not(comp !prev next)
yield !prev
prev := next
else
// end of list, yield final value
yield !prev
doneWithThisGroup := true }
doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x

Handy F# snippets [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
There are already two questions about F#/functional snippets.
However what I'm looking for here are useful snippets, little 'helper' functions that are reusable. Or obscure but nifty patterns that you can never quite remember.
Something like:
open System.IO
let rec visitor dir filter=
seq { yield! Directory.GetFiles(dir, filter)
for subdir in Directory.GetDirectories(dir) do
yield! visitor subdir filter}
I'd like to make this a kind of handy reference page. As such there will be no right answer, but hopefully lots of good ones.
EDIT Tomas Petricek has created a site specifically for F# snippets http://fssnip.net/.
Perl style regex matching
let (=~) input pattern =
System.Text.RegularExpressions.Regex.IsMatch(input, pattern)
It lets you match text using let test = "monkey" =~ "monk.+" notation.
Infix Operator
I got this from http://sandersn.com/blog//index.php/2009/10/22/infix-function-trick-for-f go to that page for more details.
If you know Haskell, you might find yourself missing infix sugar in F#:
// standard Haskell call has function first, then args just like F#. So obviously
// here there is a function that takes two strings: string -> string -> string
startsWith "kevin" "k"
//Haskell infix operator via backQuotes. Sometimes makes a function read better.
"kevin" `startsWith` "K"
While F# doesn't have a true 'infix' operator, the same thing can be accomplished almost as elegantly via a pipeline and a 'backpipeline' (who knew of such a thing??)
// F# 'infix' trick via pipelines
"kevin" |> startsWith <| "K"
Multi-Line Strings
This is pretty trivial, but it seems to be a feature of F# strings that is not widely known.
let sql = "select a,b,c \
from table \
where a = 1"
This produces:
val sql : string = "select a,b,c from table where a = 1"
When the F# compiler sees a back-slash followed by a carriage return inside a string literal, it will remove everything from the back-slash to the first non-space character on the next line. This allows you to have multi-line string literals that line up, without using a bunch of string concatenation.
Generic memoization, courtesy of the man himself
let memoize f =
let cache = System.Collections.Generic.Dictionary<_,_>(HashIdentity.Structural)
fun x ->
let ok, res = cache.TryGetValue(x)
if ok then res
else let res = f x
cache.[x] <- res
res
Using this, you could do a cached reader like so:
let cachedReader = memoize reader
Simple read-write to text files
These are trivial, but make file access pipeable:
open System.IO
let fileread f = File.ReadAllText(f)
let filewrite f s = File.WriteAllText(f, s)
let filereadlines f = File.ReadAllLines(f)
let filewritelines f ar = File.WriteAllLines(f, ar)
So
let replace f (r:string) (s:string) = s.Replace(f, r)
"C:\\Test.txt" |>
fileread |>
replace "teh" "the" |>
filewrite "C:\\Test.txt"
And combining that with the visitor quoted in the question:
let filereplace find repl path =
path |> fileread |> replace find repl |> filewrite path
let recurseReplace root filter find repl =
visitor root filter |> Seq.iter (filereplace find repl)
Update Slight improvement if you want to be able to read 'locked' files (e.g. csv files which are already open in Excel...):
let safereadall f =
use fs = new FileStream(f, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
use sr = new StreamReader(fs, System.Text.Encoding.Default)
sr.ReadToEnd()
let split sep (s:string) = System.Text.RegularExpressions.Regex.Split(s, sep)
let fileread f = safereadall f
let filereadlines f = f |> safereadall |> split System.Environment.NewLine
For performance intensive stuff where you need to check for null
let inline isNull o = System.Object.ReferenceEquals(o, null)
if isNull o then ... else ...
Is about 20x faster then
if o = null then ... else ...
Active Patterns, aka "Banana Splits", are a very handy construct that let one match against multiple regular expression patterns. This is much like AWK, but without the high performance of DFA's because the patterns are matched in sequence until one succeeds.
#light
open System
open System.Text.RegularExpressions
let (|Test|_|) pat s =
if (new Regex(pat)).IsMatch(s)
then Some()
else None
let (|Match|_|) pat s =
let opt = RegexOptions.None
let re = new Regex(pat,opt)
let m = re.Match(s)
if m.Success
then Some(m.Groups)
else None
Some examples of use:
let HasIndefiniteArticle = function
| Test "(?: |^)(a|an)(?: |$)" _ -> true
| _ -> false
type Ast =
| IntVal of string * int
| StringVal of string * string
| LineNo of int
| Goto of int
let Parse = function
| Match "^LET\s+([A-Z])\s*=\s*(\d+)$" g ->
IntVal( g.[1].Value, Int32.Parse(g.[2].Value) )
| Match "^LET\s+([A-Z]\$)\s*=\s*(.*)$" g ->
StringVal( g.[1].Value, g.[2].Value )
| Match "^(\d+)\s*:$" g ->
LineNo( Int32.Parse(g.[1].Value) )
| Match "^GOTO \s*(\d+)$" g ->
Goto( Int32.Parse(g.[1].Value) )
| s -> failwithf "Unexpected statement: %s" s
Maybe monad
type maybeBuilder() =
member this.Bind(v, f) =
match v with
| None -> None
| Some(x) -> f x
member this.Delay(f) = f()
member this.Return(v) = Some v
let maybe = maybeBuilder()
Here's a brief intro to monads for the uninitiated.
Option-coalescing operators
I wanted a version of the defaultArg function that had a syntax closer to the C# null-coalescing operator, ??. This lets me get the value from an Option while providing a default value, using a very concise syntax.
/// Option-coalescing operator - this is like the C# ?? operator, but works with
/// the Option type.
/// Warning: Unlike the C# ?? operator, the second parameter will always be
/// evaluated.
/// Example: let foo = someOption |? default
let inline (|?) value defaultValue =
defaultArg value defaultValue
/// Option-coalescing operator with delayed evaluation. The other version of
/// this operator always evaluates the default value expression. If you only
/// want to create the default value when needed, use this operator and pass
/// in a function that creates the default.
/// Example: let foo = someOption |?! (fun () -> new Default())
let inline (|?!) value f =
match value with Some x -> x | None -> f()
'Unitize' a function which doesn't handle units
Using the FloatWithMeasure function http://msdn.microsoft.com/en-us/library/ee806527(VS.100).aspx.
let unitize (f:float -> float) (v:float<'u>) =
LanguagePrimitives.FloatWithMeasure<'u> (f (float v))
Example:
[<Measure>] type m
[<Measure>] type kg
let unitize (f:float -> float) (v:float<'u>) =
LanguagePrimitives.FloatWithMeasure<'u> (f (float v))
//this function doesn't take units
let badinc a = a + 1.
//this one does!
let goodinc v = unitize badinc v
goodinc 3.<m>
goodinc 3.<kg>
OLD version:
let unitize (f:float -> float) (v:float<'u>) =
let unit = box 1. :?> float<'u>
unit * (f (v/unit))
Kudos to kvb
Scale/Ratio function builder
Again, trivial, but handy.
//returns a function which will convert from a1-a2 range to b1-b2 range
let scale (a1:float<'u>, a2:float<'u>) (b1:float<'v>,b2:float<'v>) =
let m = (b2 - b1)/(a2 - a1) //gradient of line (evaluated once only..)
(fun a -> b1 + m * (a - a1))
Example:
[<Measure>] type m
[<Measure>] type px
let screenSize = (0.<px>, 300.<px>)
let displayRange = (100.<m>, 200.<m>)
let scaleToScreen = scale displayRange screenSize
scaleToScreen 120.<m> //-> 60.<px>
Transposing a list (seen on Jomo Fisher's blog)
///Given list of 'rows', returns list of 'columns'
let rec transpose lst =
match lst with
| (_::_)::_ -> List.map List.head lst :: transpose (List.map List.tail lst)
| _ -> []
transpose [[1;2;3];[4;5;6];[7;8;9]] // returns [[1;4;7];[2;5;8];[3;6;9]]
And here is a tail-recursive version which (from my sketchy profiling) is mildly slower, but has the advantage of not throwing a stack overflow when the inner lists are longer than 10000 elements (on my machine):
let transposeTR lst =
let rec inner acc lst =
match lst with
| (_::_)::_ -> inner (List.map List.head lst :: acc) (List.map List.tail lst)
| _ -> List.rev acc
inner [] lst
If I was clever, I'd try and parallelise it with async...
F# Map <-> C# Dictionary
(I know, I know, System.Collections.Generic.Dictionary isn't really a 'C#' dictionary)
C# to F#
(dic :> seq<_>) //cast to seq of KeyValuePair
|> Seq.map (|KeyValue|) //convert KeyValuePairs to tuples
|> Map.ofSeq //convert to Map
(From Brian, here, with improvement proposed by Mauricio in comment below. (|KeyValue|) is an active pattern for matching KeyValuePair - from FSharp.Core - equivalent to (fun kvp -> kvp.Key, kvp.Value))
Interesting alternative
To get all of the immutable goodness, but with the O(1) lookup speed of Dictionary, you can use the dict operator, which returns an immutable IDictionary (see this question).
I currently can't see a way to directly convert a Dictionary using this method, other than
(dic :> seq<_>) //cast to seq of KeyValuePair
|> (fun kvp -> kvp.Key, kvp.Value) //convert KeyValuePairs to tuples
|> dict //convert to immutable IDictionary
F# to C#
let dic = Dictionary()
map |> Map.iter (fun k t -> dic.Add(k, t))
dic
What is weird here is that FSI will report the type as (for example):
val it : Dictionary<string,int> = dict [("a",1);("b",2)]
but if you feed dict [("a",1);("b",2)] back in, FSI reports
IDictionary<string,int> = seq[[a,1] {Key = "a"; Value = 1; } ...
Tree-sort / Flatten a tree into a list
I have the following binary tree:
___ 77 _
/ \
______ 47 __ 99
/ \
21 _ 54
\ / \
43 53 74
/
39
/
32
Which is represented as follows:
type 'a tree =
| Node of 'a tree * 'a * 'a tree
| Nil
let myTree =
Node
(Node
(Node (Nil,21,Node (Node (Node (Nil,32,Nil),39,Nil),43,Nil)),47,
Node (Node (Nil,53,Nil),54,Node (Nil,74,Nil))),77,Node (Nil,99,Nil))
A straightforward method to flatten the tree is:
let rec flatten = function
| Nil -> []
| Node(l, a, r) -> flatten l # a::flatten r
This isn't tail-recursive, and I believe the # operator causes it to be O(n log n) or O(n^2) with unbalanced binary trees. With a little tweaking, I came up with this tail-recursive O(n) version:
let flatten2 t =
let rec loop acc c = function
| Nil -> c acc
| Node(l, a, r) ->
loop acc (fun acc' -> loop (a::acc') c l) r
loop [] (fun x -> x) t
Here's the output in fsi:
> flatten2 myTree;;
val it : int list = [21; 32; 39; 43; 47; 53; 54; 74; 77; 99]
LINQ-to-XML helpers
namespace System.Xml.Linq
// hide warning about op_Explicit
#nowarn "77"
[<AutoOpen>]
module XmlUtils =
/// Converts a string to an XName.
let xn = XName.op_Implicit
/// Converts a string to an XNamespace.
let xmlns = XNamespace.op_Implicit
/// Gets the string value of any XObject subclass that has a Value property.
let inline xstr (x : ^a when ^a :> XObject) =
(^a : (member get_Value : unit -> string) x)
/// Gets a strongly-typed value from any XObject subclass, provided that
/// an explicit conversion to the output type has been defined.
/// (Many explicit conversions are defined on XElement and XAttribute)
/// Example: let value:int = xval foo
let inline xval (x : ^a when ^a :> XObject) : ^b =
((^a or ^b) : (static member op_Explicit : ^a -> ^b) x)
/// Dynamic lookup operator for getting an attribute value from an XElement.
/// Returns a string option, set to None if the attribute was not present.
/// Example: let value = foo?href
/// Example with default: let value = defaultArg foo?Name "<Unknown>"
let (?) (el:XElement) (name:string) =
match el.Attribute(xn name) with
| null -> None
| att -> Some(att.Value)
/// Dynamic operator for setting an attribute on an XElement.
/// Example: foo?href <- "http://www.foo.com/"
let (?<-) (el:XElement) (name:string) (value:obj) =
el.SetAttributeValue(xn name, value)
OK, this has nothing to do with snippets, but I keep forgetting this:
If you are in the interactive window, you hit F7 to jump back to the code window (without deselecting the code which you just ran...)
Going from code window to F# window (and also to open the F# window) is Ctrl Alt F
(unless CodeRush has stolen your bindings...)
Weighted sum of arrays
Calculating a weighted [n-array] sum of a [k-array of n-arrays] of numbers, based on a [k-array] of weights
(Copied from this question, and kvb's answer)
Given these arrays
let weights = [|0.6;0.3;0.1|]
let arrs = [| [|0.0453;0.065345;0.07566;1.562;356.6|] ;
[|0.0873;0.075565;0.07666;1.562222;3.66|] ;
[|0.06753;0.075675;0.04566;1.452;3.4556|] |]
We want a weighted sum (by column), given that both dimensions of the arrays can be variable.
Array.map2 (fun w -> Array.map ((*) w)) weights arrs
|> Array.reduce (Array.map2 (+))
First line: Partial application of the first Array.map2 function to weights yields a new function (Array.map ((*) weight) which is applied (for each weight) to each array in arr.
Second line: Array.reduce is like fold, except it starts on the second value and uses the first as the initial 'state'. In this case each value is a 'line' of our array of arrays. So applying an Array.map2 (+) on the first two lines means that we sum the first two arrays, which leaves us with a new array, which we then (Array.reduce) sum again onto the next (in this case last) array.
Result:
[|0.060123; 0.069444; 0.07296; 1.5510666; 215.40356|]
Performance testing
(Found here and updated for latest release of F#)
open System
open System.Diagnostics
module PerformanceTesting =
let Time func =
let stopwatch = new Stopwatch()
stopwatch.Start()
func()
stopwatch.Stop()
stopwatch.Elapsed.TotalMilliseconds
let GetAverageTime timesToRun func =
Seq.initInfinite (fun _ -> (Time func))
|> Seq.take timesToRun
|> Seq.average
let TimeOperation timesToRun =
GC.Collect()
GetAverageTime timesToRun
let TimeOperations funcsWithName =
let randomizer = new Random(int DateTime.Now.Ticks)
funcsWithName
|> Seq.sortBy (fun _ -> randomizer.Next())
|> Seq.map (fun (name, func) -> name, (TimeOperation 100000 func))
let TimeOperationsAFewTimes funcsWithName =
Seq.initInfinite (fun _ -> (TimeOperations funcsWithName))
|> Seq.take 50
|> Seq.concat
|> Seq.groupBy fst
|> Seq.map (fun (name, individualResults) -> name, (individualResults |> Seq.map snd |> Seq.average))
DataSetExtensions for F#, DataReaders
System.Data.DataSetExtensions.dll adds the ability to treat a DataTable as an IEnumerable<DataRow> as well as unboxing the values of individual cells in a way that gracefully handles DBNull by supporting System.Nullable. For example, in C# we can get the value of an integer column that contains nulls, and specify that DBNull should default to zero with a very concise syntax:
var total = myDataTable.AsEnumerable()
.Select(row => row.Field<int?>("MyColumn") ?? 0)
.Sum();
There are two areas where DataSetExtensions are lacking, however. First, it doesn't support IDataReader and second, it doesn't support the F# option type. The following code does both - it allows an IDataReader to be treated as a seq<IDataRecord>, and it can unbox values from either a reader or a dataset, with support for F# options or System.Nullable. Combined with the option-coalescing operator in another answer, this allows for code such as the following when working with a DataReader:
let total =
myReader.AsSeq
|> Seq.map (fun row -> row.Field<int option>("MyColumn") |? 0)
|> Seq.sum
Perhaps a more idiomatic F# way of ignoring database nulls would be...
let total =
myReader.AsSeq
|> Seq.choose (fun row -> row.Field<int option>("MyColumn"))
|> Seq.sum
Further, the extension methods defined below are usable from both F# and from C#/VB.
open System
open System.Data
open System.Reflection
open System.Runtime.CompilerServices
open Microsoft.FSharp.Collections
/// Ported from System.Data.DatasetExtensions.dll to add support for the Option type.
[<AbstractClass; Sealed>]
type private UnboxT<'a> private () =
// This class generates a converter function based on the desired output type,
// and then re-uses the converter function forever. Because the class itself is generic,
// different output types get different cached converter functions.
static let referenceField (value:obj) =
if value = null || DBNull.Value.Equals(value) then
Unchecked.defaultof<'a>
else
unbox value
static let valueField (value:obj) =
if value = null || DBNull.Value.Equals(value) then
raise <| InvalidCastException("Null cannot be converted to " + typeof<'a>.Name)
else
unbox value
static let makeConverter (target:Type) methodName =
Delegate.CreateDelegate(typeof<Converter<obj,'a>>,
typeof<UnboxT<'a>>
.GetMethod(methodName, BindingFlags.NonPublic ||| BindingFlags.Static)
.MakeGenericMethod([| target.GetGenericArguments().[0] |]))
|> unbox<Converter<obj,'a>>
|> FSharpFunc.FromConverter
static let unboxFn =
let theType = typeof<'a>
if theType.IsGenericType && not theType.IsGenericTypeDefinition then
let genericType = theType.GetGenericTypeDefinition()
if typedefof<Nullable<_>> = genericType then
makeConverter theType "NullableField"
elif typedefof<option<_>> = genericType then
makeConverter theType "OptionField"
else
invalidOp "The only generic types supported are Option<T> and Nullable<T>."
elif theType.IsValueType then
valueField
else
referenceField
static member private NullableField<'b when 'b : struct and 'b :> ValueType and 'b:(new:unit -> 'b)> (value:obj) =
if value = null || DBNull.Value.Equals(value) then
Nullable<_>()
else
Nullable<_>(unbox<'b> value)
static member private OptionField<'b> (value:obj) =
if value = null || DBNull.Value.Equals(value) then
None
else
Some(unbox<'b> value)
static member inline Unbox =
unboxFn
/// F# data-related extension methods.
[<AutoOpen>]
module FsDataEx =
type System.Data.IDataReader with
/// Exposes a reader's current result set as seq<IDataRecord>.
/// Reader is closed when sequence is fully enumerated.
member this.AsSeq =
seq { use reader = this
while reader.Read() do yield reader :> IDataRecord }
/// Exposes all result sets in a reader as seq<seq<IDataRecord>>.
/// Reader is closed when sequence is fully enumerated.
member this.AsMultiSeq =
let rowSeq (reader:IDataReader) =
seq { while reader.Read() do yield reader :> IDataRecord }
seq {
use reader = this
yield rowSeq reader
while reader.NextResult() do
yield rowSeq reader
}
/// Populates a new DataSet with the contents of the reader. Closes the reader after completion.
member this.ToDataSet () =
use reader = this
let dataSet = new DataSet(RemotingFormat=SerializationFormat.Binary, EnforceConstraints=false)
dataSet.Load(reader, LoadOption.OverwriteChanges, [| "" |])
dataSet
type System.Data.IDataRecord with
/// Gets a value from the record by name.
/// DBNull and null are returned as the default value for the type.
/// Supports both nullable and option types.
member this.Field<'a> (fieldName:string) =
this.[fieldName] |> UnboxT<'a>.Unbox
/// Gets a value from the record by column index.
/// DBNull and null are returned as the default value for the type.
/// Supports both nullable and option types.
member this.Field<'a> (ordinal:int) =
this.GetValue(ordinal) |> UnboxT<'a>.Unbox
type System.Data.DataRow with
/// Identical to the Field method from DatasetExtensions, but supports the F# Option type.
member this.Field2<'a> (columnName:string) =
this.[columnName] |> UnboxT<'a>.Unbox
/// Identical to the Field method from DatasetExtensions, but supports the F# Option type.
member this.Field2<'a> (columnIndex:int) =
this.[columnIndex] |> UnboxT<'a>.Unbox
/// Identical to the Field method from DatasetExtensions, but supports the F# Option type.
member this.Field2<'a> (column:DataColumn) =
this.[column] |> UnboxT<'a>.Unbox
/// Identical to the Field method from DatasetExtensions, but supports the F# Option type.
member this.Field2<'a> (columnName:string, version:DataRowVersion) =
this.[columnName, version] |> UnboxT<'a>.Unbox
/// Identical to the Field method from DatasetExtensions, but supports the F# Option type.
member this.Field2<'a> (columnIndex:int, version:DataRowVersion) =
this.[columnIndex, version] |> UnboxT<'a>.Unbox
/// Identical to the Field method from DatasetExtensions, but supports the F# Option type.
member this.Field2<'a> (column:DataColumn, version:DataRowVersion) =
this.[column, version] |> UnboxT<'a>.Unbox
/// C# data-related extension methods.
[<Extension; AbstractClass; Sealed>]
type CsDataEx private () =
/// Populates a new DataSet with the contents of the reader. Closes the reader after completion.
[<Extension>]
static member ToDataSet(this:IDataReader) =
this.ToDataSet()
/// Exposes a reader's current result set as IEnumerable{IDataRecord}.
/// Reader is closed when sequence is fully enumerated.
[<Extension>]
static member AsEnumerable(this:IDataReader) =
this.AsSeq
/// Exposes all result sets in a reader as IEnumerable{IEnumerable{IDataRecord}}.
/// Reader is closed when sequence is fully enumerated.
[<Extension>]
static member AsMultipleEnumerable(this:IDataReader) =
this.AsMultiSeq
/// Gets a value from the record by name.
/// DBNull and null are returned as the default value for the type.
/// Supports both nullable and option types.
[<Extension>]
static member Field<'T> (this:IDataRecord, fieldName:string) =
this.Field<'T>(fieldName)
/// Gets a value from the record by column index.
/// DBNull and null are returned as the default value for the type.
/// Supports both nullable and option types.
[<Extension>]
static member Field<'T> (this:IDataRecord, ordinal:int) =
this.Field<'T>(ordinal)
Handling arguments in a command line application:
//We assume that the actual meat is already defined in function
// DoStuff (string -> string -> string -> unit)
let defaultOutOption = "N"
let defaultUsageOption = "Y"
let usage =
"Scans a folder for and outputs results.\n" +
"Usage:\n\t MyApplication.exe FolderPath [IncludeSubfolders (Y/N) : default=" +
defaultUsageOption + "] [OutputToFile (Y/N): default=" + defaultOutOption + "]"
let HandlArgs arr =
match arr with
| [|d;u;o|] -> DoStuff d u o
| [|d;u|] -> DoStuff d u defaultOutOption
| [|d|] -> DoStuff d defaultUsageOption defaultOutOption
| _ ->
printf "%s" usage
Console.ReadLine() |> ignore
[<EntryPoint>]
let main (args : string array) =
args |> HandlArgs
0
(I had a vague memory of this technique being inspired by Robert Pickering, but can't find a reference now)
A handy cache function that keeps up to max (key,reader(key)) in a dictionary and use a SortedList to track the MRU keys
let Cache (reader: 'key -> 'value) max =
let cache = new Dictionary<'key,LinkedListNode<'key * 'value>>()
let keys = new LinkedList<'key * 'value>()
fun (key : 'key) -> (
let found, value = cache.TryGetValue key
match found with
|true ->
keys.Remove value
keys.AddFirst value |> ignore
(snd value.Value)
|false ->
let newValue = key,reader key
let node = keys.AddFirst newValue
cache.[key] <- node
if (keys.Count > max) then
let lastNode = keys.Last
cache.Remove (fst lastNode.Value) |> ignore
keys.RemoveLast() |> ignore
(snd newValue))
Creating XElements
Nothing amazing, but I keep getting caught out by the implicit conversion of XNames:
#r "System.Xml.Linq.dll"
open System.Xml.Linq
//No! ("type string not compatible with XName")
//let el = new XElement("MyElement", "text")
//better
let xn s = XName.op_Implicit s
let el = new XElement(xn "MyElement", "text")
//or even
let xEl s o = new XElement(xn s, o)
let el = xEl "MyElement" "text"
Pairwise and pairs
I always expect Seq.pairwise to give me [(1,2);(3;4)] and not [(1,2);(2,3);(3,4)]. Given that neither exist in List, and that I needed both, here's the code for future reference. I think they're tail recursive.
//converts to 'windowed tuples' ([1;2;3;4;5] -> [(1,2);(2,3);(3,4);(4,5)])
let pairwise lst =
let rec loop prev rem acc =
match rem with
| hd::tl -> loop hd tl ((prev,hd)::acc)
| _ -> List.rev acc
loop (List.head lst) (List.tail lst) []
//converts to 'paged tuples' ([1;2;3;4;5;6] -> [(1,2);(3,4);(5,6)])
let pairs lst =
let rec loop rem acc =
match rem with
| l::r::tl -> loop tl ((l,r)::acc)
| l::[] -> failwith "odd-numbered list"
| _ -> List.rev acc
loop lst []
Naive CSV reader (i.e., won't handle anything nasty)
(Using filereadlines and List.transpose from other answers here)
///Given a file path, returns a List of row lists
let ReadCSV =
filereadlines
>> Array.map ( fun line -> line.Split([|',';';'|]) |> List.ofArray )
>> Array.toList
///takes list of col ids and list of rows,
/// returns array of columns (in requested order)
let GetColumns cols rows =
//Create filter
let pick cols (row:list<'a>) = List.map (fun i -> row.[i]) cols
rows
|> transpose //change list of rows to list of columns
|> pick cols //pick out the columns we want
|> Array.ofList //an array output is easier to index for user
Example
"C:\MySampleCSV"
|> ReadCSV
|> List.tail //skip header line
|> GetColumns [0;3;1] //reorder columns as well, if needs be.
Date Range
simple but useful list of dates between fromDate and toDate
let getDateRange fromDate toDate =
let rec dates (fromDate:System.DateTime) (toDate:System.DateTime) =
seq {
if fromDate <= toDate then
yield fromDate
yield! dates (fromDate.AddDays(1.0)) toDate
}
dates fromDate toDate
|> List.ofSeq
toggle code to sql
More trivial than most on this list, but handy nonetheless:
I'm always taking sql in and out of code to move it to a sql environment during development. Example:
let sql = "select a,b,c "
+ "from table "
+ "where a = 1"
needs to be 'stripped' to:
select a,b,c
from table
where a = 1
keeping the formatting. It's a pain to strip out the code symbols for the sql editor, then put them back again by hand when I've got the sql worked out. These two functions toggle the sql back and forth from code to stripped:
// reads the file with the code quoted sql, strips code symbols, dumps to FSI
let stripForSql fileName =
File.ReadAllText(fileName)
|> (fun s -> Regex.Replace(s, "\+(\s*)\"", ""))
|> (fun s -> s.Replace("\"", ""))
|> (fun s -> Regex.Replace(s, ";$", "")) // end of line semicolons
|> (fun s -> Regex.Replace(s, "//.+", "")) // get rid of any comments
|> (fun s -> printfn "%s" s)
then when you are ready to put it back into your code source file:
let prepFromSql fileName =
File.ReadAllText(fileName)
|> (fun s -> Regex.Replace(s, #"\r\n", " \"\r\n+\"")) // matches newline
|> (fun s -> Regex.Replace(s, #"\A", " \""))
|> (fun s -> Regex.Replace(s, #"\z", " \""))
|> (fun s -> printfn "%s" s)
I'd love to get rid of the input file but can't even begin to grok how to make that happen. anyone?
edit:
I figured out how to eliminate the requirement of a file for these functions by adding a windows forms dialog input/output. Too much code to show, but for those who would like to do such a thing, that's how I solved it.
Pascal's Triangle (hey, someone might find it useful)
So we want to create a something like this:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
Easy enough:
let rec next = function
| [] -> []
| x::y::xs -> (x + y)::next (y::xs)
| x::xs -> x::next xs
let pascal n =
seq { 1 .. n }
|> List.scan (fun acc _ -> next (0::acc) ) [1]
The next function returns a new list where each item[i] = item[i] + item[i + 1].
Here's the output in fsi:
> pascal 10 |> Seq.iter (printfn "%A");;
[1]
[1; 1]
[1; 2; 1]
[1; 3; 3; 1]
[1; 4; 6; 4; 1]
[1; 5; 10; 10; 5; 1]
[1; 6; 15; 20; 15; 6; 1]
[1; 7; 21; 35; 35; 21; 7; 1]
[1; 8; 28; 56; 70; 56; 28; 8; 1]
[1; 9; 36; 84; 126; 126; 84; 36; 9; 1]
[1; 10; 45; 120; 210; 252; 210; 120; 45; 10; 1]
For the adventurous, here's a tail-recursive version:
let rec next2 cont = function
| [] -> cont []
| x::y::xs -> next2 (fun l -> cont <| (x + y)::l ) <| y::xs
| x::xs -> next2 (fun l -> cont <| x::l ) <| xs
let pascal2 n =
set { 1 .. n }
|> Seq.scan (fun acc _ -> next2 id <| 0::acc)) [1]
Flatten a List
if you have something like this:
let listList = [[1;2;3;];[4;5;6]]
and want to 'flatten' it down to a singe list so the result is like this:
[1;2;3;4;5;6]
it can be done thusly:
let flatten (l: 'a list list) =
seq {
yield List.head (List.head l)
for a in l do yield! (Seq.skip 1 a)
}
|> List.ofSeq
List comprehensions for float
This [23.0 .. 1.0 .. 40.0] was marked as deprecated a few versions backed.
But apparently, this works:
let dl = 9.5 / 11.
let min = 21.5 + dl
let max = 40.5 - dl
let a = [ for z in min .. dl .. max -> z ]
let b = a.Length
(BTW, there's a floating point gotcha in there. Discovered at fssnip - the other place for F# snippets)
Parallel map
let pmap f s =
seq { for a in s -> async { return f s } }
|> Async.Parallel
|> Async.Run

Resources