Formulate an Arbitrary for pair of lists of ints of the same length - f#

I need some help with this exercise about generators in F#.
The functions
List.zip : ('a list -> 'b list -> ('a * 'b) list)
and
List.unzip : (('a * 'b) list -> 'a list * 'b list)
are inverse of each other, under the condition that they operate on lists
of the same length.
Formulate an Arbitrary for pairs of int lists of the same length.
I tried to write some code:
let length xs ys =
    List.length xs = List.length ys
let samelength =
    Arb.filter length Arb.from<int list>
It doesn't work, I get a type mismatch at length in samelength:
Error: type mismatch. Expecting a 'a list -> bool but given a 'a list -> 'b list -> bool. The type bool does not match the type 'a list -> bool.
Edit:
As suggested I tried to follow the outline of steps but I'm stuck.
let sizegen =
    Arb.filter (fun x -> x > 0) Arb.from<int>
let listgen =
    let size = sizegen
    let xs = Gen.listOfLength size
    let ys = Gen.listOfLength size
    xs, ys
And of course I get a type mismatch error:
Error: type mismatch. Expected to have type int but here has type Arbitrary<int>
Edit
I solved the exercise, but my generator does not seem to be used when I run the test; it looks like another one is invoked.
let samelength (xs, ys) =
    List.length xs = List.length ys
let arbMyGen2 = Arb.filter samelength Arb.from<int list * int list>
type MyGeneratorZ =
    static member arbMyGen2() =
        {
            new Arbitrary<int list * int list>() with
                override x.Generator = arbMyGen2 |> Arb.toGen
                override x.Shrinker t = Seq.empty
        }
let _ = Arb.register<MyGeneratorZ>()
let pro_zip (xs: int list, ys: int list) =
    (xs, ys) = List.unzip(List.zip xs ys)
do Check.Quick pro_zip
I get the error:
Error:  System.ArgumentException: list1 is 1 element shorter than list2
But why? My generator should only generate two lists of the same length.

If we look at the API reference for the Arb module, and hover over the definition of filter, you'll see that the type of Arb.filter is:
pred:('a -> bool) -> a:Arbitrary<'a> -> Arbitrary<'a>
This means that the predicate should be a function of one parameter that returns a bool. But your length function is a function of two parameters. You want to turn it into a function of just one parameter.
Think of it this way. When you write Arb.filter length Arb.from<int list>, what you're saying is "I want to generate an arbitrary int list (just one at a time), and filter it according to the length rule." But the length rule you've written takes two lists and compares their length. If FsCheck generates just a single list of ints, what will it compare its length to? There's no second list to compare to, so the compiler can't actually turn your code into something that makes sense.
What you probably wanted to do (though there's a problem with this, which I'll get to in a minute) was generate a pair of lists, then pass it to your length predicate. I.e., you probably wanted Arb.from<int list * int list>. That will generate a pair of integer lists, completely independent from each other. Then you'll still get a type mismatch in your length function, but you just have to turn its signature from let length xs ys = to let length (xs,ys) =, i.e. have it receive a single argument that contains a pair of lists, instead of having each list as a separate argument. After those tweaks, your code looks like:
let length (xs,ys) =
    List.length xs = List.length ys
let samelength =
    Arb.filter length Arb.from<int list * int list>
But there are still problems with this. Specifically, if we look at the FsCheck documentation, we find this warning:
When using Gen.filter, be sure to provide a predicate with a high chance of returning true. If the predicate discards 'too many' candidates, it may cause tests to run slower, or to not terminate at all.
This applies to Arb.filter just as much as to Gen.filter, by the way. The way your code currently stands, this is a problem, because your filter will discard most pairs of lists. Since the lists are generated independently of each other, it will most often happen that they have different lengths, so your filter will return false most of the time. I'd suggest a different approach. Since you've said that this is an exercise, I won't write the code for you since you'll learn more by doing it yourself; I'll just give you an outline of the steps you'll want to take.
1. Generate a non-negative int n that will be the size of both lists in the pair. (For bonus points, use Gen.sized to get the "current size" of the data you should generate, and generate n as a value between 0 and size, so that your list-pair generator, like FsCheck's default list generator, will create lists that start small and slowly grow larger.)
2. Use Gen.listOfLength n, together with an element generator such as Arb.generate<int>, to generate both lists. (You could even do Gen.two (Gen.listOfLength n Arb.generate<int>) to easily generate a pair of lists of the same size.)
3. Don't forget to write an appropriate shrinker for a pair of lists, because the exercise wants you to generate a proper Arbitrary, and an Arbitrary that doesn't have a shrinker is not very useful in practice. You can probably do something with Arb.mapFilter here, where the mapper is id because you're already generating lists of matching length, but the filter is your length predicate. Then use Arb.fromGenShrink to turn your generator and shrinker functions into a proper Arbitrary instance.
If that outline isn't enough for you to get it working, ask another question about wherever you're stuck and I'll be glad to help out however I can.
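As one possible direction for the shrinker step (an alternative to the Arb.mapFilter idea, not the only way to do it), here is a minimal sketch that needs no FsCheck at all. The name shrinkSameLength is hypothetical; the idea is to drop the element at the same index from both lists, so every smaller candidate still has matching lengths. A function of this shape is what you would hand to Arb.fromGenShrink together with your generator.

```fsharp
// Hypothetical shrinker: drop the element at the same index from both lists,
// so every smaller candidate is still a pair of equal-length lists.
// Assumes the input pair already has equal lengths.
let shrinkSameLength (xs: int list, ys: int list) : seq<int list * int list> =
    // Remove the element at index i from a list.
    let dropAt i l =
        l |> List.indexed |> List.filter (fun (j, _) -> j <> i) |> List.map snd
    seq {
        for i in 0 .. List.length xs - 1 do
            yield dropAt i xs, dropAt i ys
    }

// Print the candidates produced for a pair of three-element lists.
shrinkSameLength ([1; 2; 3], [4; 5; 6]) |> Seq.iter (printfn "%A")
```

An empty pair yields no candidates, which is how a shrinker signals that there is nothing smaller to try.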
Edit:
In your edit where you're trying to write a list generator using sizegen, you have the following code that doesn't work:
let listgen =
    let size = sizegen
    let xs = Gen.listOfLength size
    let ys = Gen.listOfLength size
    xs, ys
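Here sizegen is an Arbitrary<int> (that is what Arb.filter returns, as your error message says), and you want to extract an int from it. Arb.toGen turns it into a Gen<int>; from there, there are several ways to get at the generated int, but the simplest is the gen { ... } computation expression that FsCheck provides.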
BTW, if you don't know what computation expressions are, they're some of F#'s most powerful features: they are highly complex under the hood, but they allow you to write very simple-looking code. You should bookmark https://fsharpforfunandprofit.com/series/computation-expressions.html and https://fsharpforfunandprofit.com/series/map-and-bind-and-apply-oh-my.html and plan to read them later. Don't worry if you don't understand them on your first, or second, or even fifth reading: that's fine. Just keep coming back to these two series of articles, and using computation expressions like gen or seq in practice, and eventually the concepts will become clear. And every time you read these series, you'll learn more, and get closer to that moment of enlightenment when it all "clicks" in your brain.
But back to your code. As I said, you want to use the gen { ... } computation expression. Inside a gen { ... } expression, the let! assignment will "unwrap" a Gen<Foo> object into the generated Foo, which you can then use in further code. Which is what you want to do with your size int. So we'll just wrap a gen { ... } expression around your code, and get the following:
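let listgen =
    gen {
        // sizegen is an Arbitrary<int>; Arb.toGen gives the Gen<int> that let! can unwrap
        let! size = sizegen |> Arb.toGen
        // Gen.listOfLength also needs an element generator, and its result is a Gen, so unwrap it too
        let! xs = Gen.listOfLength size Arb.generate<int>
        let! ys = Gen.listOfLength size Arb.generate<int>
        return (xs, ys)
    }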
Note that I also added a return keyword on the last line. Inside a computation expression, return has the opposite effect of let!. The let! keyword unwraps a value (the type goes from Gen<Foo> to Foo), while the return keyword wraps a value (the type goes from Foo to Gen<Foo>). So that return line takes an int list * int list and turns it into a Gen<int list * int list>. There's some very complex code going on under the hood, but at the surface level of the computation expression, you just need to think in terms of "unwrapping" and "wrapping" types to decide whether to use let! or return.
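To see the same unwrap/wrap pattern without needing FsCheck installed, here is a small sketch using F#'s built-in async { ... } builder in place of gen { ... }. It is only an analogy: Async<'T> plays the role of Gen<'T>, and the names sizeAsync and pairAsync are made up for illustration.

```fsharp
// Built-in analogue of gen { ... }: async { ... } wraps and unwraps Async<'T>.
let sizeAsync : Async<int> = async { return 3 }   // return wraps: int -> Async<int>

let pairAsync : Async<int list * int list> =
    async {
        let! size = sizeAsync                      // let! unwraps: Async<int> -> int
        let xs = List.init size id                 // plain let: no wrapper involved
        let ys = List.init size (fun i -> i * 10)
        return (xs, ys)                            // wrap the pair back up
    }

let xs, ys = Async.RunSynchronously pairAsync
printfn "%A %A" xs ys   // [0; 1; 2] [0; 10; 20]
```

Both lists are built from the same unwrapped size, which is exactly why the gen version produces lists of equal length.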

Related

F#: What to call a combination of map and fold, or of map and reduce?

A simple example, inspired by this question:
module SimpleExample =
    let fooFold projection folder state source =
        source |> List.map projection |> List.fold folder state
    // val fooFold :
    //   projection:('a -> 'b) ->
    //     folder:('c -> 'b -> 'c) -> state:'c -> source:'a list -> 'c
    let fooReduce projection reducer source =
        source |> List.map projection |> List.reduce reducer
    // val fooReduce :
    //   projection:('a -> 'b) -> reducer:('b -> 'b -> 'b) -> source:'a list -> 'b
    let game = [0, 5; 10, 15]
    let minX, maxX = fooReduce fst min game, fooReduce fst max game
    let minY, maxY = fooReduce snd min game, fooReduce snd max game
What would be a natural name for the functions fooFold and fooReduce in this example? Alas, mapFold and mapReduce are already taken.
mapFold is part of the F# library and does a fold operation over the input to return a tuple of 'result list * 'state, similar to scan, but without the initial state and the need to provide the tuple as part of the state yourself. Its signature is:
val mapFold : ('State -> 'T -> 'Result * 'State) -> 'State -> 'T list
-> 'Result list * 'State
Since the projection can easily be integrated into the folder, the fooFold function is only included for illustration purposes.
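To make the last two points concrete, here is a minimal sketch: a variant of fooFold with the projection integrated into the folder (so no intermediate list is built), next to a call to the library's List.mapFold for comparison. The sample inputs are chosen only for illustration.

```fsharp
// Variant of fooFold with the projection integrated into the folder:
// a single pass over the list, no intermediate mapped list.
let fooFold projection folder state source =
    source |> List.fold (fun acc x -> folder acc (projection x)) state

// The library's List.mapFold, for comparison: it returns both the mapped
// list and the final state.
let doubled, total = List.mapFold (fun acc x -> (x * 2, acc + x)) 0 [1; 2; 3]

let sumOfFirsts = fooFold fst (+) 0 [0, 5; 10, 15]
printfn "%A %d %d" doubled total sumOfFirsts   // [2; 4; 6] 6 10
```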
And MapReduce:
MapReduce is an algorithm for processing huge datasets on certain
kinds of distributable problems using a large number of nodes
Now for a more complex example, where the fold/reduce is not directly applied to the input, but to the groupings following a selection of the keys.
The example has been borrowed from a Python library, where it is called - perhaps misleadingly - reduceby.
module ComplexExample =
    let fooFold keySelection folder state source =
        source
        |> Seq.groupBy keySelection
        |> Seq.map (fun (k, xs) ->
            k, Seq.fold folder state xs)
    // val fooFold :
    //   keySelection:('a -> 'b) ->
    //     folder:('c -> 'a -> 'c) -> state:'c -> source:seq<'a> -> seq<'b * 'c>
    //     when 'b : equality
    let fooReduce keySelection projection reducer source =
        source
        |> Seq.groupBy keySelection
        |> Seq.map (fun (k, xs) ->
            k, xs |> Seq.map projection |> Seq.reduce reducer)
    // val fooReduce :
    //   keySelection:('a -> 'b) ->
    //     projection:('a -> 'c) ->
    //       reducer:('c -> 'c -> 'c) -> source:seq<'a> -> seq<'b * 'c>
    //     when 'b : equality
    type Project = { name : string; state : string; cost : decimal }
    let projects =
        [ { name = "build roads"; state = "CA"; cost = 1000000M }
          { name = "fight crime"; state = "IL"; cost = 100000M }
          { name = "help farmers"; state = "IL"; cost = 2000000M }
          { name = "help farmers"; state = "CA"; cost = 200000M } ]
    fooFold (fun x -> x.state) (fun acc x -> acc + x.cost) 0M projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
    fooReduce (fun x -> x.state) (fun x -> x.cost) (+) projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
What would be the natural name for the functions fooFold and fooReduce here?
I'd probably call the first two mapAndFold and mapAndReduce (though I agree that mapFold and mapReduce would be good names if they were not already taken). Alternatively, I'd go with mapThenFold (etc.), which is perhaps more explicit, but it reads a bit cumbersome.
For the more complex ones, reduceBy and foldBy sound good. The issue is that this would not work if you also wanted a version of those functions that do not do the mapping operation. If you wanted that, you'd probably need mapAndFoldBy and mapAndReduceBy (as well as just foldBy and reduceBy). This gets a bit ugly, but I'm afraid that's the best you can do.
More generally, the issue when comparing names with Python is that Python allows overloading whereas F# functions do not. Every function that would otherwise be one of several overloads therefore needs its own unique name, so you have to come up with a consistent naming scheme that does not make the names unbearably long.
(I experienced this when coming up with names for the functions in the Deedle library, which is somewhat inspired by Pandas. You can see for example the aggregation functions in Deedle for an example - there is a pattern in the naming to deal with the fact that each function needs a unique name.)
I have a different opinion from Thomas.
First, I think that not having overloads is a good thing, and giving every operation a unique name is also
a good thing. I would also say that giving long names to rarely used functions is even more important
and should not be avoided.
Writing longer names is rarely a problem, as we programmers usually use an IDE with auto-completion.
But reading and understanding is different. Knowing what a function does because of a long descriptive name
is better than a short name.
A long descriptive function name gets more important the less often a function is used. It helps with reading and
understanding the code. A short and less descriptive function name that is rarely used causes confusion. The
confusion would only increase if it were just an overload of another function name.
Yes, naming things can be hard; that's the reason why it's important and shouldn't be avoided.
As for what you describe: I would have named them mapFold and mapReduce, as those describe exactly what they do.
There is already a mapFold in F#, and in my opinion, the F# devs fucked up either with the naming, arguments or the
output of the function. But anyhow, they just fucked up.
I would usually have expected mapFold to do a map and then a fold. Actually it does, but it also returns the intermediate
list that is created along the way, something I would not expect it to return. And I would also expect it to take two
functions instead of one.
Now to Thomas's suggestion of naming it mapAndFold or mapThenFold. I would expect different behaviour
from those two functions. mapThenFold tells exactly what it does: map, and then fold on the result. I think the then is
not important. That's also why I would name it mapFold or mapReduce. Writing it this way already suggests a then.
But mapAndFold or mapAndReduce does not say anything about the order of execution. It just says it does two things
or somehow returns this AND that.
With that in mind, I would say that the F# library should either have named its mapFold mapAndFold, or changed the return
value to just return the fold (and have two arguments instead of one). But hey, it's fucked up now, we cannot change it anymore.
As for mapReduce, I think you are a little bit mistaken. The mapReduce algorithm is named that way, because it just does
map and then reduce. And that's it.
But functional programming, with its stateless and more descriptive operations, sometimes has additional benefits. Technically
a map is less powerful than a for/fold, as it only describes how values are changed, regardless of their order
or position in a list. But because of this limitation, you can run it in parallel, even on a big computer cluster. And that's all
the MapReduce algorithm you cite does.
But that doesn't mean a mapReduce must always run its operation on a big cluster or in parallel. In my opinion you could
just name it mapReduce and that's fine. Everybody will know what it does, and I think nobody expects it to suddenly run on
a cluster.
In general I think the mapFold that F# provides is silly, here are 4 examples how I think it should have been provided.
let double x = x * 2
let add x y = x + y
mapFold double add 0 [1..10] // 110
mapAndFold double add 0 [1..10] // [2;4;6;8;10;12;14;16;18;20] * 110
mapReduce double add [1..10] // Some (110)
mapAndReduce double add [1..10] // Some ([2;4;6;8;10;12;14;16;18;20] * 110)
Well, mapFold doesn't work that way, so you have the following options:
1. Implement mapReduce the way you have it, and ignore the inconsistency with mapFold.
2. Provide mapAndReduce and mapReduce.
3. Make your mapReduce return the same crap as the default implementation of mapFold does and provide mapThenReduce.
4. Like (3) but also add mapThenFold.
Option 4 has the most compatibility and expectation of what already exists in F#. But that doesn't mean you must do it that way.
In my opinion I would just:
implement mapReduce returning the result of map and then reduce.
I wouldn't care about a mapAndReduce version that returns a list and the result.
Provide a mapThenFold expecting two function arguments returning the result just of fold.
As a general note: implementing mapReduce just by calling map and then reduce is somewhat pointless. I would
expect it to have a more low-level implementation that does both things while traversing the data structure only once.
If not, I can just call map and then reduce myself anyway.
So an implementation should look like:
let mapReduce mapper reducer xs =
    let rec loop state xs =
        match xs with
        | [] -> state
        | x::xs -> loop (reducer state (mapper x)) xs
    match xs with
    | [] -> ValueNone
    | [x] -> ValueSome (mapper x)
    | x::xs -> ValueSome (loop (mapper x) xs)
let double x = x * 2
let add x y = x + y
let some110 = mapReduce double add [1..10]

unexpected return type from list comprehension

I am teaching myself a bit of F# by doing a bit of simple matrix mathematics. I decided to write a set of simple functions for combining two matrices as I thought that this would be a good way of learning list comprehensions. However when I compile it my unit tests produce a type mismatch exception.
//return a column from the matrix as a list
let getColumn(matrix: list<list<double>>, column:int) =
    [for row in matrix do yield row.Item(column)]
//return a row from the matrix as a list
let getRow(matrix: list<list<double>>, column:int) =
    matrix.Item(column)
//find the minimum width of the matrices in order to avoid index out of range exceptions
let minWidth(matrix1: list<list<double>>,matrix2: list<list<double>>) =
    let width1 = [for row in matrix1 do yield row.Length] |> List.min
    let width2 = [for row in matrix2 do yield row.Length] |> List.min
    if width1 > width2 then width2 else width1
//find the minimum height of the matrices in order to avoid index out of range exceptions
let minHeight(matrix1: list<list<double>>,matrix2: list<list<double>>) =
    let height1 = matrix1.Length
    let height2 = matrix2.Length
    if height1 > height2 then height2 else height1
//combine the two matrices
let concat(matrix1: list<list<double>>,matrix2: list<list<double>>) =
    let width = minWidth(matrix1, matrix2)
    let height = minHeight(matrix1, matrix2)
    [for y in 0 .. height do yield [for x in 0 .. width do yield (List.fold2 (fun acc a b -> acc + (a*b)), getRow(matrix1, y), getColumn(matrix2, x))]]
I was expecting the function to return a list of lists of type
double list list
However what it actually returns looks more like some kind of lambda expression
((int -> int list -> int list -> int) * double list * double list) list list
Can somebody tell me what is being returned, and how to force it to be evaluated into the list of lists that I originally expected?
There's a short answer and a long answer to your question.
The short answer
The short version is that F# functions (like List.fold2) take multiple parameters not with commas the way you think they do, but with spaces in between. I.e., you should NOT call List.fold2 like this:
List.fold2 (folder, state, list1, list2)
but rather like this:
List.fold2 folder state list1 list2
Now, if you just remove the commas in your List.fold2 call, you'll see that the compiler complains about your getRow(matrix1, y) call, and tells you to put parentheses around it. (And the outer pair of parentheses around List.fold2 isn't actually needed.) List.fold2 also needs an initial value for the accumulator, which your call leaves out; 0.0 is the natural choice for a sum of products. So this:
(List.fold2 (fun acc a b -> acc + (a*b)), getRow(matrix1, y), getColumn(matrix2, x))
Needs to turn into this:
List.fold2 (fun acc a b -> acc + (a*b)) 0.0 (getRow(matrix1, y)) (getColumn(matrix2, x))
The long answer
The way F# functions take multiple parameters is actually very different from most other languages such as C#. In fact, all F# functions take exactly one parameter! "But wait," you're probably thinking right now, "you just now showed me the syntax for F# functions taking multiple parameters!" Yes, I did. What's going on under the hood is a combination of currying and partial application. I'd write a long explanation, but Scott Wlaschin has already written one, that's much better than I could have written, so I'll just point you to the https://fsharpforfunandprofit.com/series/thinking-functionally.html series to help you understand what's going on here. (The sections on currying and partial application are the ones you want, but I'd recommend reading the series in order because the later parts build on concepts introduced in earlier parts).
And yes, this "long" answer appears shorter than the "short" answer, but if you go read that series (and then the rest of Scott Wlaschin's excellent site), you'll find that it's much longer than the short answer. :-)
If you have more questions, I'll be happy to try to explain.
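As a quick taste of what that series covers, here is a minimal sketch of the difference between curried and tupled parameters, ending with a List.fold2 call in curried form. The names add, add5, addTupled and dot are made up for illustration.

```fsharp
// Curried: parameters separated by spaces; supplying fewer arguments
// than the function expects yields another function.
let add x y = x + y
let add5 = add 5            // int -> int (partial application)

// Tupled: a single parameter that happens to be a pair.
let addTupled (x, y) = x + y

// List.fold2 is curried: folder, then initial state, then the two lists.
let dot (row: float list) (col: float list) =
    List.fold2 (fun acc a b -> acc + a * b) 0.0 row col

printfn "%d %d %g" (add5 10) (addTupled (1, 2)) (dot [1.0; 2.0] [3.0; 4.0])   // 15 3 11
```

Writing List.fold2 (f, s, l1, l2) instead builds a single tuple and passes it as the first argument, which is why the compiler infers a tuple containing a partially applied function.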

summing elements from a user defined datatype

Having covered the predefined datatypes in F# (e.g. lists) and how to sum the elements of a list or a sequence, I'm trying to learn how to work with user-defined datatypes. Say I create a data type, call it list1:
type list1 =
    | A
    | B of int * list1
Where:
A stands for an empty list
B builds a new list by adding an int in front of another list
so 1,2,3,4, will be represented with the list1 value:
B(1, B(2, B(3, B(4, A))))
From the wikibook I learned that with a list I can sum the elements by doing:
List.sum [1; 2; 3; 4]
But how do I go about summing the elements of a user defined datatype? Any hints would be greatly appreciated.
Edit: I'm able to take advantage of the match operator:
let rec sumit (l: list1) : int =
    match l with
    | (B(x1, A)) -> x1
    | (B(x1, B(x2, A))) -> (x1+x2)
sumit (B(3, B(4, A)))
I get:
val it : int = 7
How can I make it so that if I have more than 2 ints it still sums the elements (i.e. (B(3, B(4, B(5, A)))) gets 12)?
One good general approach to questions like this is to write out your algorithm in word form or pseudocode form, then once you've figured out your algorithm, convert it to F#. In this case where you want to sum the lists, that would look like this:
The first step in figuring out an algorithm is to carefully define the specifications of the problem. I want an algorithm to sum my custom list type. What exactly does that mean? Or, to be more specific, what exactly does that mean for the two different kinds of values (A and B) that my custom list type can have? Well, let's look at them one at a time. If a list is of type A, then that represents an empty list, so I need to decide what the sum of an empty list should be. The most sensible value for the sum of an empty list is 0, so the rule is "If the list is of type A, then the sum is 0". Now, if the list is of type B, then what does the sum of that list mean? Well, the sum of a list of type B would be its int value, plus the sum of the sublist.
So now we have a "sum" rule for each of the two types that list1 can have. If A, the sum is 0. If B, the sum is (value + sum of sublist). And that rule translates almost verbatim into F# code!
let rec sum (lst : list1) =
    match lst with
    | A -> 0
    | B (value, sublist) -> value + sum sublist
A couple things I want to note about this code. First, one thing you may or may not have seen before (since you seem to be an F# beginner) is the rec keyword. This is required when you're writing a recursive function: due to internal details in how the F# parser is implemented, if a function is going to call itself, you have to declare that ahead of time when you declare the function's name and parameters. Second, this is not the best way to write a sum function, because it is not actually tail-recursive, which means that it might throw a StackOverflowException if you try to sum a really, really long list. At this point in your learning F# you maybe shouldn't worry about that just yet, but eventually you will learn a useful technique for turning a non-tail-recursive function into a tail-recursive one. It involves adding an extra parameter usually called an "accumulator" (and sometimes spelled acc for short), and a properly tail-recursive version of the above sum function would have looked like this:
let sum (lst : list1) =
    let rec tailRecursiveSum (acc : int) (lst : list1) =
        match lst with
        | A -> acc
        | B (value, sublist) -> tailRecursiveSum (acc + value) sublist
    tailRecursiveSum 0 lst
If you're already at the point where you can understand this, great! If you're not at that point yet, bookmark this answer and come back to it once you've studied tail recursion, because this technique (turning a non-tail-recursive function into a tail-recursive one with the use of an inner function and an accumulator parameter) is a very valuable one that has all sorts of applications in F# programming.
Besides tail-recursion, generic programming may be a concept of importance for the functional learner. Why go to the trouble of creating a custom data type, if it only can hold integer values?
The sum of all elements of a list can be abstracted as the repeated application of the addition operator to all elements of the list and an accumulator primed with an initial state. This can be generalized as a functional fold:
type 'a list1 = A | B of 'a * 'a list1
let fold folder (state : 'State) list =
    let rec loop s = function
        | A -> s
        | B(x : 'T, xs) -> loop (folder s x) xs
    loop state list
// val fold :
// folder:('State -> 'T -> 'State) -> state:'State -> list:'T list1 -> 'State
B(1, B(2, B(3, B(4, A))))
|> fold (+) 0
// val it : int = 10
Making also the sum function generic needs a little black magic called statically resolved type parameters. The signature isn't pretty, it essentially tells you that it expects the (+) operator on a type to successfully compile.
let inline sum xs = fold (+) Unchecked.defaultof<_> xs
// val inline sum :
// xs: ^a list1 -> ^b
// when ( ^b or ^a) : (static member ( + ) : ^b * ^a -> ^b)
B(1, B(2, B(3, B(4, A))))
|> sum
// val it : int = 10

Simple exercise of OCaml about list

Good Morning everyone,
I have to do a programming exercise, but I'm stuck!
The exercise asks for a function that, given a non-empty list of integers, returns the first number with the maximum number of occurrences.
For example:
mode [1;2;5;1;2;3;4;5;5;4;5;5] ==> 5
mode [2;1;2;1;1;2] ==> 2
mode [-1;2;1;2;5;-1;5;5;2] ==> 2
mode [7] ==> 7
Important: the exercise must be in functional programming
My idea is:
let rec occurences_counter xs i = match xs with
  |[] -> failwith "Error"
  |x :: xs when x = i -> 1 + occurences_counter xs i
  |x :: xs -> occurences_counter xs i;;
In this function I'm stuck:
let rec mode (l : int list) : int = match l with
  |[] -> failwith "Error"
  |[x] -> x
  |x::y::l when occurences_counter l x >= occurences_counter l y -> x :: mode l
  |x::y::l when occurences_counter l y > occurences_counter l x -> y :: mode l;;
Thanks in advance, i'm newbie in programming and in stackoverflow
Sorry for my english
One solution: first calculate a list of pairs (number, occurrences).
Hint: use List.assoc.
Then loop over that list of pairs to find the maximum number of occurrences, and return the corresponding number.
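A minimal sketch of this idea, written in F# (as one of the answers in this thread is) rather than in the OCaml the exercise asks for. Apart from the name mode, which follows the question, the details are assumptions for illustration: the pairs are built with List.distinct and a count per number, where an OCaml version would use List.assoc for the lookup.

```fsharp
// Sketch of the "list of (number, occurrences) pairs" approach.
let mode (xs: int list) : int =
    // (number, occurrences) pairs, in order of first appearance in xs.
    let counts =
        xs
        |> List.distinct
        |> List.map (fun x -> x, xs |> List.filter (fun y -> y = x) |> List.length)
    match counts with
    | [] -> invalidArg "xs" "mode: empty list"
    | first :: rest ->
        // Replace the best pair only on a strictly larger count,
        // so ties are won by the number that appeared first.
        rest
        |> List.fold (fun (bx, bn) (x, n) -> if n > bn then (x, n) else (bx, bn)) first
        |> fst

printfn "%d" (mode [1; 2; 5; 1; 2; 3; 4; 5; 5; 4; 5; 5])   // 5
```

Counting each distinct number by rescanning the list makes this quadratic, which is fine for an exercise but not for large inputs.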
One suggestion:
your algorithm could be simplified if you sort the list before. This has O(N log(N)) complexity. Then measure the longest sequence of identical numbers.
This is a good strategy because you delegate the hard part of the work to a well known algorithm.
It is probably not the most beautiful code, but here is what I came up with (F#). First I transform every element into an intermediate format. This format contains the element itself, the position of its occurrence, and the number of times it occurred.
type T<'a> = {
    Element: 'a
    Position: int
    Occurred: int
}
The idea is that those records can be added. So you can first transform every element, and then add them together. So a list like
[1;3]
will be first transformed to
[{Element=1;Position=0;Occurred=1}; {Element=3;Position=1;Occurred=1}]
You can only add two records with the same "Element". The lower of the two Positions is taken, and the Occurred values are added together. So if you for example have
{Element=3;Position=1;Occurred=2} {Element=3;Position=3;Occurred=2}
the result will be
{Element=3;Position=1;Occurred=4}
The idea that I had in mind was a Monoid. But for a real Monoid you would also have to define how to add different Elements together. After trying some things out, I felt that the restriction of only adding the same Element was much easier. I created a small module with the type, including some helper functions for creating, adding and comparing.
module Occurred =
    type T<'a> = {
        Element: 'a
        Position: int
        Occurred: int
    }
    let create x pos occ = {Element=x; Position=pos; Occurred=occ}
    let sameElements x y = x.Element = y.Element
    let add x y =
        if not <| sameElements x y then failwith "Cannot add two different Occurred"
        create x.Element (min x.Position y.Position) (x.Occurred + y.Occurred)
    let compareOccurredPosition x y =
        let occ = compare x.Occurred y.Occurred
        let pos = compare x.Position y.Position
        match occ,pos with
        | 0,x -> x * -1
        | x,_ -> x
With this setup I then wrote two additional functions. The first is an aggregate function that turns every element into an Occurred.T and groups them by x.Element (the result is a list of lists). It then uses List.reduce on each inner list to add the Occurred values with the same Element together. The result is a list that contains a single Occurred.T for every Element, with its first Position and the total number of occurrences.
let aggregate =
    List.mapi (fun i x -> Occurred.create x i 1)
    >> List.groupBy (fun occ -> occ.Element)
    >> List.map (fun (x,occ) -> List.reduce Occurred.add occ)
You could now use that aggregate function to implement different aggregation logic. In your case you only want the one with the highest number of occurrences and the lowest position. I wrote another function that does that.
let firstMostOccurred =
    List.sortWith (fun x y -> (Occurred.compareOccurredPosition x y) * -1) >> List.head >> (fun x -> x.Element)
One note: Occurred.compareOccurredPosition is written so that it sorts everything in ascending order. I think people expect a comparison to go from the smallest to the biggest element by default. So by default the first element would be the one with the lowest occurrence count and the biggest Position. By multiplying its result by -1 you turn it into a descending sort function. The reason I did that is so that I could use List.head. I could also use List.last to get the last element, but I felt that it would be better not to go through the whole list again just to get the last element. On top of that, you didn't want an Occurred.T, you wanted the element itself, so I unwrap the Element to get the number.
Here is everything in action
let ll = [
    [1;2;5;1;2;3;4;5;5;4;5;5]
    [2;1;2;1;1;2]
    [-1;2;1;2;5;-1;5;5;2]
    [7]
]
ll
|> List.map aggregate
|> List.map firstMostOccurred
|> List.iter (printfn "%d")
This code will now print
5
2
2
7
It still has some rough edges:
Occurred.add throws an exception if you try to add Occurred values with different Elements
List.head throws an exception for empty lists
In both cases no code is written to handle those cases or to make sure an exception will not be raised.
You need to process your input list while maintaining a state that stores the number of occurrences of each number. Basically, the state can be a map, where keys are in the domain of list elements, and values are in the domain of natural numbers. If you use Map, the algorithm will be of O(NlogN) complexity. You can also use an associative list (i.e., a list of type ('key,'value) list) to implement the map. This will lead to quadratic complexity. Another approach is to use a hash table or an array of length equal to the size of the input domain. Both will give you linear complexity.
After you collected the statistics, (i.e., a mapping from element to the number of its occurrences) you need to go through the set of winners, and choose the one, that was first on the list.
In OCaml the solution would look like this:
open Core_kernel.Std

let mode xs : int =
  (* 1. frequency of each element *)
  List.fold xs ~init:Int.Map.empty ~f:(fun stat x ->
      Map.change stat x (function
          | None -> Some 1
          | Some n -> Some (n+1))) |>
  (* 2. spectrum: frequency -> elements with that frequency *)
  Map.fold ~init:Int.Map.empty ~f:(fun ~key:x ~data:n modes ->
      Map.add_multi modes ~key:n ~data:x) |>
  (* 3. first input element that belongs to the highest-frequency set *)
  Map.max_elt |> function
  | None -> invalid_arg "mode: empty list"
  | Some (_,ms) -> List.find_exn xs ~f:(List.mem ms)
The algorithm is the following:
Run through the input and compute the frequency of each element.
Run through the statistics and compute the spectrum (i.e., a mapping from frequency to elements).
Get the set of elements that have the highest frequency, and find the first element of the input list that is in this set.
For example, if we take sample [1;2;5;1;2;3;4;5;5;4;5;5],
stats = {1 => 2; 2 => 2; 3 => 1; 4 => 2; 5 => 5}
modes = {1 => [3]; 2 => [1;2;4]; 5 => [5]}
You need to install the Core library to play with it. Use coretop to try this function in the toplevel, or corebuild to compile it, like this:
corebuild test.byte --
if the source code is stored in test.ml
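For F# readers, the same idea could be sketched roughly as follows. This is not a line-by-line port of the OCaml code: instead of building the full spectrum it only looks up the highest frequency, and the function name mode and the choice of Map are assumptions made for illustration.

```fsharp
// A minimal sketch: most frequent element, ties broken by first position in the input.
let mode (xs: int list) : int =
    if List.isEmpty xs then invalidArg "xs" "mode: empty list"
    // Step 1: count the occurrences of each element (O(N log N) with Map).
    let stats =
        xs
        |> List.fold (fun (m: Map<int, int>) x ->
            match Map.tryFind x m with
            | Some n -> Map.add x (n + 1) m
            | None -> Map.add x 1 m) Map.empty
    // Step 2: find the highest frequency among all elements.
    let best = stats |> Map.toSeq |> Seq.map snd |> Seq.max
    // Step 3: return the first input element that reaches that frequency.
    xs |> List.find (fun x -> Map.find x stats = best)
```

Calling mode [1;2;5;1;2;3;4;5;5;4;5;5] gives 5, and mode [2;1;2;1;1;2] gives 2, because 2 appears before 1 in the input.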

cons operator (::) in F#

The :: operator in F# always prepends elements to the list. Is there an operator that appends to the list? I'm guessing that using the @ operator
[1; 2; 3] @ [4]
would be less efficient than appending one element.
As others said, there is no such operator, because it wouldn't make much sense. I actually think that this is a good thing, because it makes it easier to realize that the operation will not be efficient. In practice, you shouldn't need the operator - there is usually a better way to write the same thing.
Typical scenario: The situation where you might think you need to append elements to the end is common enough that it is worth describing.
Adding elements to the end seems necessary when you're writing a tail-recursive version of a function using an accumulator parameter. For example, an (inefficient) implementation of the filter function for lists would look like this:
let filter f l =
    let rec filterUtil acc l =
        match l with
        | [] -> acc
        | x::xs when f x -> filterUtil (acc @ [x]) xs
        | x::xs -> filterUtil acc xs
    filterUtil [] l
In each step, we need to append one element to the accumulator (which stores elements to be returned as the result). This code can be easily modified to use the :: operator instead of appending elements to the end of the acc list:
let filter f l =
    let rec filterUtil acc l =
        match l with
        | [] -> List.rev acc // (1)
        | x::xs when f x -> filterUtil (x::acc) xs // (2)
        | x::xs -> filterUtil acc xs
    filterUtil [] l
In (2), we're now adding elements to the front of the accumulator and when the function is about to return the result, we reverse the list (1), which is a lot more efficient than appending elements one by one.
Lists in F# are singly-linked and immutable. This means consing onto the front is O(1) (create an element and have it point to an existing list), whereas snocing onto the back is O(N) (as the entire list must be replicated; you can't change the existing final pointer, you must create a whole new list).
If you do need to "append one element to the back", then e.g.
l @ [42]
is the way to do it, but this is a code smell.
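To make the difference concrete, here is a small sketch that checks which cells are shared after each operation. The value names are made up for illustration; LanguagePrimitives.PhysicalEquality compares object references rather than contents.

```fsharp
let xs = [2; 3; 4]

// Consing allocates one new cell whose tail is the existing list.
let consed = 1 :: xs
let consSharesTail = LanguagePrimitives.PhysicalEquality (List.tail consed) xs

// Appending at the back has to copy every cell of xs into a new list.
let appended = xs @ [5]
let appendReusesXs = LanguagePrimitives.PhysicalEquality appended xs

printfn "cons shares tail: %b, append reuses xs: %b" consSharesTail appendReusesXs
```

The first check is true and the second is false, while xs itself is left unchanged by both operations.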
The cost of appending two standard lists is proportional to the length of the list on the left. In particular, the cost of
xs @ [x]
is proportional to the length of xs—it is not a constant cost.
If you want a list-like abstraction with a constant-time append, you can use John Hughes's function representation, which I'll call hlist. I'll try to use OCaml syntax, which I hope is close enough to F#:
type 'a hlist = 'a list -> 'a list (* a John Hughes list *)
let empty : 'a hlist = let id xs = xs in id
let append xs ys = fun tail -> xs (ys tail)
let singleton x = fun tail -> x :: tail
let cons x xs = append (singleton x) xs
let snoc xs x = append xs (singleton x)
let to_list : 'a hlist -> 'a list = fun xs -> xs []
The idea is that you represent a list functionally as a function from "the rest of the elements" to "the final list". This works great if you are going to build up the whole list before you look at any of the elements. Otherwise you'll have to deal with the linear cost of append or use another data structure entirely.
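For reference, the same representation might look like this in F#. It is a direct transliteration of the OCaml definitions, and the type name HList and the module layout are assumptions for illustration, not an existing library API.

```fsharp
// A Hughes list: a function from "the rest of the elements" to "the final list".
type HList<'a> = 'a list -> 'a list

module HList =
    // The empty list adds nothing in front of the tail.
    let empty (tail: 'a list) : 'a list = tail
    // A one-element list conses its element onto the tail.
    let singleton (x: 'a) : HList<'a> = fun tail -> x :: tail
    // Appending is function composition, so it takes constant time.
    let append (xs: HList<'a>) (ys: HList<'a>) : HList<'a> = fun tail -> xs (ys tail)
    let cons x xs = append (singleton x) xs
    let snoc xs x = append xs (singleton x)
    // Materialise the ordinary list once, at the end.
    let toList (xs: HList<'a>) : 'a list = xs []

// Usage: build by snoc-ing at the back, then convert once.
let built = HList.snoc (HList.snoc (HList.snoc HList.empty 1) 2) 3
let result = HList.toList (HList.cons 0 built)
```

Here result is [0; 1; 2; 3]. Each snoc only wraps one more closure, and the actual list cells are allocated when toList is called.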
I'm guessing that using @ operator [...] would be less efficient, than appending one element.
If it is, it will be a negligible difference. Both appending a single item and concatenating a list to the end are O(n) operations. As a matter of fact I can't think of a single thing that @ has to do, which a single-item append function wouldn't.
Maybe you want to use another data structure. We have double-ended queues ("deques" for short) in FSharpx. You can read more about them at http://jackfoxy.com/double-ended-queues-for-fsharp
The efficiency (or lack of) comes from iterating through the list to find the final element. So declaring a new list with [4] is going to be negligible for all but the most trivial scenarios.
Try using a double-ended queue instead of a list. I recently added 4 versions of deques (Okasaki's spelling) to FSharpx.Core (available through NuGet; source code at FSharpx.Core.Datastructures). See my article about using deques, "Double-ended queues for F#".
I've suggested to the F# team that the cons operator, ::, and the active pattern discriminator be made available for other data structures with a head/tail signature.

Resources