F# mergesort tail-recursion

I tried to write tail-recursive code for mergesort. The code compiles and runs, but the output is wrong: it only outputs one integer. How can I fix this code so that the list of integers is sorted and output?
let rec merge L L2 P =
    match L, L2 with
    | [], [] -> P
    | [], _ -> L2
    | _, [] -> L
    | hd::t1, hd2::t2 ->
        if hd <= hd2 then
            merge t1 L2 (P @ [hd])
        else
            merge L t2 (P @ [hd2])
//
// mergesort:
//
let rec ms L =
    match L with
    | [] -> []
    | e::[] -> L
    | _ ->
        let mid = List.length L / 2
        let (L, L2) = List.splitAt mid L
        merge (ms L) (ms L2) []

Your problem is in the merge function. Imagine you sort the list [2;1]. It turns into merge [2] [1] [], then into merge [2] [] [1], and finally the third case of the match yields [2], discarding the 1 that was accumulated in P. The second and third cases of the match must account for P somehow.
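If you want to keep the three-argument shape, a minimal sketch of that fix is to put the accumulated prefix P in front of whatever remains in the base cases. The name mergeAcc is hypothetical and is only used here to keep it apart from the question's merge.

```fsharp
// Sketch: the question's merge, with the accumulator P taken into account
// in every base case. `mergeAcc` is a hypothetical name.
let rec mergeAcc L L2 P =
    match L, L2 with
    | [], rest
    | rest, [] -> P @ rest            // keep what was accumulated so far
    | hd::t1, hd2::t2 ->
        if hd <= hd2 then mergeAcc t1 L2 (P @ [hd])
        else mergeAcc L t2 (P @ [hd2])

let rec ms L =
    match L with
    | [] | [_] -> L
    | _ ->
        let mid = List.length L / 2
        let (left, right) = List.splitAt mid L
        mergeAcc (ms left) (ms right) []

printfn "%A" (ms [2; 1])              // [1; 2]
```

Note that P @ [hd] copies P on every step, so this version does quadratic work per merge; it is meant to show the missing cases, not to be fast.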
In fact, you do not need to manipulate three lists in your merge at all; two are quite enough if we refactor it to:
let rec merge l1 l2 =
    match (l1,l2) with
    | (x,[]) -> x
    | ([],y) -> y
    | (x::tx,y::ty) ->
        if x <= y then x::merge tx l2
        else y::merge l1 ty
and change the last line of ms to merge (ms L) (ms L2). This variant works as expected:
ms List<int>.Empty returns []
ms [2;1] returns [1;2]
etc.
Update: As @kvb pointed out, the merge function above is not tail recursive. This is correct, and refactoring it into a tail-recursive version takes a bit more work: it introduces an accumulator acc that is filled in via a continuation function:
let merge l1 l2 =
    let rec mergeAux continuation l1 l2 =
        match l1, l2 with
        | l1, [] -> continuation l1
        | [], l2 -> continuation l2
        | x::tx, y::ty ->
            if x <= y then mergeAux (fun acc -> continuation(x::acc)) tx l2
            else mergeAux (fun acc -> continuation(y::acc)) l1 ty
    mergeAux id l1 l2
Now the implementation is tail-recursive, which is easy to check with:
let rand = System.Random() in
List.init 1000000 (fun _ -> rand.Next(-10000,10000)) |> ms
>
val it : int list =
[-10000; -10000; -10000; -10000; -10000; -10000; -10000; ...

Even after the changes made for the accepted answer, you still do not have a tail-recursive merge sort. The last line of the merge sort, merge (ms L) (ms L2), calls ms twice and then calls merge. For a function to be tail recursive, every recursive call must be the very last thing the function does, and here neither call to ms is in that position. This scenario is where continuations are needed: you make one call to ms and pass it a continuation that makes the second call to ms, which in turn gets a continuation that makes the call to merge.
I would actually remove the continuation from the merge function, as it is not needed and it makes the merge function more difficult to read than implementing it with an accumulator parameter. Lastly, for easy callability from the outside, I would nest the merge function as well as the ms function inside a mergeSort function that only takes one list parameter. There's no need to expose the rest of the details to callers. My implementation of a fully tail-recursive merge sort in F# would be as follows:
let mergeSort ls =
    let rec merge l1 l2 res =
        match l1, l2 with
        | [], [] -> res |> List.rev
        | x::xs, [] -> merge xs [] (x::res)
        | [], y::ys -> merge [] ys (y::res)
        | x::xs, y::ys when x < y -> merge xs (y::ys) (x::res)
        | xs, y::ys -> merge xs ys (y::res)
    let rec ms ls cont =
        match ls with
        | [] -> cont []
        | [x] -> cont [x]
        | xs ->
            let ys, zs = List.splitAt ((List.length xs) / 2) xs
            ms ys (fun ys' -> ms zs (fun zs' -> cont (merge ys' zs' [])))
    ms ls id
Note that there are ways to do this more efficiently in terms of memory usage, which would likely also help the speed thanks to fewer memory allocations, but since that is beyond the scope of this question I'm not going to get into it in this answer.

Related

F# merge sort error value restriction [duplicate]

let rec merge = function
    | ([], ys) -> ys
    | (xs, []) -> xs
    | (x::xs, y::ys) ->
        if x < y then x :: merge (xs, y::ys)
        else y :: merge (x::xs, ys)
let rec split = function
    | [] -> ([], [])
    | [a] -> ([a], [])
    | a::b::cs ->
        let (M,N) = split cs
        (a::M, b::N)
let rec mergesort = function
    | [] -> []
    | L ->
        let (M, N) = split L
        merge (mergesort M, mergesort N)
mergesort [5;3;2;1] // Will throw an error.
I took this code from another Stack Overflow question, but when I run mergesort with a list I get an error:
stdin(192,1): error FS0030: Value restriction. The value 'it' has been inferred to have generic type
val it : '_a list when '_a : comparison
How would I fix this problem? What is the problem? The more information, the better (so I can learn :) )
Your mergesort function is missing a case, which causes the compiler to infer the signature 'a list -> 'b list instead of 'a list -> 'a list, which it should be. The only branch that returns without recursing is [] -> [], so nothing ties the element type of the result to that of the input. The reason it should be 'a list -> 'a list is that you're not looking to change the type of the list in mergesort.
Try changing your mergesort function to this; that should fix the problem:
let rec mergesort = function
    | [] -> []
    | [a] -> [a]
    | L ->
        let (M, N) = split L
        merge (mergesort M, mergesort N)
Another problem with your code, however, is that neither merge nor split is tail recursive, so you will get stack overflow exceptions on large lists (try calling the corrected mergesort like this: mergesort [for i in 1000000..-1..1 -> i]).
You can make your split and merge functions tail recursive by using the accumulator pattern:
let split list =
    let rec aux l acc1 acc2 =
        match l with
        | [] -> (acc1,acc2)
        | [x] -> (x::acc1,acc2)
        | x::y::tail ->
            aux tail (x::acc1) (y::acc2)
    aux list [] []
let merge l1 l2 =
    let rec aux l1 l2 result =
        match l1, l2 with
        | [], [] -> result
        | [], h :: t | h :: t, [] -> aux [] t (h :: result)
        | h1 :: t1, h2 :: t2 ->
            if h1 < h2 then aux t1 l2 (h1 :: result)
            else aux l1 t2 (h2 :: result)
    List.rev (aux l1 l2 [])
You can read more about the accumulator pattern here; the examples are in Lisp, but it's a general pattern that works in any language that provides tail call optimization.
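To see the accumulator pattern in isolation, here is a minimal sketch on a much simpler function, summing a list. The names sumNaive and sumAcc are made up for this illustration.

```fsharp
// Not tail recursive: the addition happens after the recursive call returns,
// so every element costs one stack frame.
let rec sumNaive xs =
    match xs with
    | [] -> 0L
    | x::rest -> x + sumNaive rest

// Tail recursive: the running total travels in the accumulator `acc`,
// so the recursive call is the last thing the helper does.
let sumAcc xs =
    let rec aux acc xs =
        match xs with
        | [] -> acc
        | x::rest -> aux (acc + x) rest
    aux 0L xs

printfn "%d" (sumAcc [1L .. 1000000L])   // 500000500000
```

The split and merge functions above follow the same shape: the partial result is passed along as an extra argument instead of being built after the recursive call returns.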

Merge sort for f sharp

This is my code. When I enter a very large number I get a stack overflow error. Does anyone know why? I'm not really sure what is causing it; it only happens with large numbers, small ones work fine.
//
// merge two sorted lists into one:
//
let rec merge L1 L2 =
    if L1 = [] && L2 = [] then
        []
    else if L1 = [] then
        L2
    else if L2 = [] then
        L1
    else if L1.Head <= L2.Head then
        L1.Head :: merge L1.Tail L2
    else
        L2.Head :: merge L1 L2.Tail
//
// mergesort:
//
let rec mergesort L =
    match L with
    | [] -> []
    | E::[] -> L
    | _ ->
        let mid = List.length L / 2
        let (L1, L2) = List.splitAt mid L
        merge (mergesort L1) (mergesort L2)
In both of your functions the problem is that the last step you take is not the recursive call but something else:
in merge it is the :: operation
in mergesort it is the merge
So you have to get to a point where the very last thing is the recursive call!
One possibility in situations where you have more than one recursive call to make is to use continuations - the idea is to pass a function around that should be called with the result of the current step and then continue the computation from there.
This is a tail-recursive version of mergesort using this technique:
let mergesort xs =
    let rec msort xs cont =
        match xs with
        | [] -> cont []
        | [x] -> cont xs
        | _ ->
            let mid = List.length xs / 2
            let (xs', xs'') = List.splitAt mid xs
            msort xs' (fun ys' -> msort xs'' (fun ys'' -> cont (merge ys' ys'')))
    msort xs id
As you can see, the idea is not too hard: instead of first calling both recursive paths, it starts with just one half but adds a continuation that basically says:
once I have the result of mergesort xs', I take the result ys' and continue by mergesorting xs'' and then merge those
Of course the second step is done in just the same way (push the merge into the continuation).
The very first continuation is usually the identity, as you can see in the very last line ;)
And here is something similar for your merge:
let merge xs ys =
    let rec mrg xs ys cont =
        match (xs, ys) with
        | ([], ys) -> cont ys
        | (xs, []) -> cont xs
        | (x::xs', y::ys') ->
            if x < y
            then mrg xs' ys (fun rs -> cont (x::rs))
            else mrg xs ys' (fun rs -> cont (y::rs))
    mrg xs ys id
Those will of course take as much space on the heap (probably more), but that is usually no problem, and your stack should be fine ;)
Each recursive call requires stack space. The more times mergesort calls itself, the more stack is used.
You avoid the stack overflow in a recursive function by taking advantage of tail recursion. It simply means that when the last thing a function does is call itself, the call is removed and turned into a jump instead, saving stack space.
This is tricky to do in your case because you have to call mergesort twice, and only one of those calls can be last. The solution is to use a continuation: you only call mergesort once, but pass it a function to call, which will call mergesort the second time.
Search the internet for F# examples of a merge sort that uses continuations.
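As a smaller illustration of the same idea, here is a minimal sketch on another function that needs two recursive calls: summing a binary tree. The Tree type and the sumTree function are hypothetical names made up for this example and are not part of the question.

```fsharp
// Hypothetical example type: a binary tree of ints.
type Tree =
    | Leaf
    | Node of Tree * int * Tree

let sumTree tree =
    // `cont` says what to do with the sum of the subtree `t` once it is known.
    let rec go t cont =
        match t with
        | Leaf -> cont 0
        | Node (l, v, r) ->
            // Only one recursive call is made directly (left subtree);
            // the call for the right subtree lives inside the continuation.
            go l (fun sl -> go r (fun sr -> cont (sl + v + sr)))
    go tree id

let t = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
printfn "%d" (sumTree t)   // 6
```

Every call to go and to cont is in tail position, so the pending work is stored in closures on the heap rather than in stack frames, which is the same trade-off as in the continuation-based mergesort.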

How can I convert this imperative style merge-sort implementation into functional style?

This is how I implemented merge-sort in F# using imperative style:
// List<'T> here is the .NET System.Collections.Generic.List
open System.Collections.Generic

let merge (l1: List<string>, l2: List<string>) =
    let r: List<string> = new List<string>()
    let mutable (i,j, cnt1, cnt2) = (0,0, l1.Count, l2.Count)
    while i < cnt1 && j < cnt2 do
        if l1.[i] <= l2.[j] then
            r.Add (l1.[i])
            i <- i + 1
        else
            r.Add (l2.[j])
            j <- j + 1
    if i = cnt1 then
        while j < cnt2 do
            r.Add (l2.[j])
            j <- j + 1
    else
        while i < cnt1 do
            r.Add (l1.[i])
            i <- i + 1
    r
Can you convert this to alternate 'functional' styled implementation and explain how it works, if possible? Even though I am studying list comprehensions and all that at the moment, I can't come up with an idea to use it here.
You're using the .NET List<'T>, which is aliased as ResizeArray<'T> in F# to avoid confusion with the F# list type. If you use a functional list, the merge function would look like this:
let merge(xs, ys) =
    let rec loop xs ys acc =
        match xs, ys with
        | [], [] -> List.rev acc (* 1 *)
        | [], y::ys' -> loop xs ys' (y::acc) (* 2 *)
        | x::xs', [] -> loop xs' ys (x::acc) (* 3 *)
        | x::xs', y::_ when x <= y -> loop xs' ys (x::acc) (* 4 *)
        | _::_, y::ys' -> loop xs ys' (y::acc) (* 5 *)
    loop xs ys []
To explain this function in terms of your imperative version:
The 4th and 5th patterns correspond to the first while loop, where you compare the two current elements and add the smaller one to the resulting list.
The 2nd and 3rd patterns are similar to your 2nd and 3rd while loops.
The first pattern is the case where i = cnt1 and j = cnt2 and we should return the result. Since a new element is always prepended to the accumulator, we need to reverse it to get a list in increasing order.
To be precise, your merge function is just one part of the merge-sort algorithm. You need a function to split a list into two halves, call merge-sort on the two halves, and merge the two sorted halves into one. The split function used below is left for you as an exercise.
let rec mergeSort ls =
    match ls with
    | [] | [_] -> ls
    | _ ->
        let xs, ys = split ls
        let xs', ys' = mergeSort xs, mergeSort ys
        merge(xs', ys')
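For readers who want to run the code right away, one possible split is sketched below. It is an assumption of this example rather than the answer's intended solution: it simply cuts the list in the middle with List.splitAt (available in F# 4.0 and later). The merge and mergeSort functions are repeated from above so that the snippet is self-contained.

```fsharp
// Hypothetical `split`: cut the list in the middle (needs F# 4.0+ for List.splitAt).
let split ls = List.splitAt (List.length ls / 2) ls

// merge as in the answer above (tupled arguments, tail-recursive loop)
let merge (xs, ys) =
    let rec loop xs ys acc =
        match xs, ys with
        | [], [] -> List.rev acc
        | [], y::ys' -> loop xs ys' (y::acc)
        | x::xs', [] -> loop xs' ys (x::acc)
        | x::xs', y::_ when x <= y -> loop xs' ys (x::acc)
        | _::_, y::ys' -> loop xs ys' (y::acc)
    loop xs ys []

let rec mergeSort ls =
    match ls with
    | [] | [_] -> ls
    | _ ->
        let xs, ys = split ls
        merge (mergeSort xs, mergeSort ys)

printfn "%A" (mergeSort ["pear"; "apple"; "fig"])   // ["apple"; "fig"; "pear"]
```

Any split works as long as both halves are non-empty for lists of two or more elements; otherwise mergeSort would never terminate.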
To add a simpler but naive alternative to pad's:
let rec merge x y =
    match (x, y) with
    | ([], []) -> []
    | ([], rest) -> rest
    | (rest, []) -> rest
    | (fx :: xs, fy :: _) when fx <= fy -> fx :: merge xs y
    | (fx :: _, fy :: ys) -> fy :: merge x ys
Similarly to pad's, we're pattern matching over the function parameters.
I first put them into a tuple so that I can pattern match on both at the same time.
I then take care of the base cases, where both or either of the parameters are empty.
I then use a when guard to check which first item is smaller.
Finally, I take the smaller first item and cons it onto the result of another call to merge, passing the rest of the list the smaller item was taken from and the whole of the other list. So if the first item of x is smaller, I prepend the first item of x (fx in this case) to the result of a call to merge, passing in the rest of x (xs) and the whole of y (because the first item of y was larger).

Splitting a list into list of lists based on predicate

(I am aware of this question, but it relates to sequences, which is not my problem here)
Given this input (for example):
let testlist =
[
"*text1";
"*text2";
"text3";
"text4";
"*text5";
"*text6";
"*text7"
]
let pred (s:string) = s.StartsWith("*")
I would like to be able to call MyFunc pred testlist and get this output:
[
["*text1";"*text2"];
["*text5";"*text6";"*text7"]
]
This is my current solution, but I don't really like the nested List.revs (ignore the fact that it takes Seq as input)
let shunt pred sq =
    let shunter (prevpick, acc) (pick, a) =
        match pick, prevpick with
        | (true, true) -> (true, (a :: (List.head acc)) :: (List.tail acc))
        | (false, _) -> (false, acc)
        | (true, _) -> (true, [a] :: acc)
    sq
    |> Seq.map (fun a -> (pred a, a))
    |> Seq.fold shunter (false, [])
    |> snd
    |> List.map List.rev
    |> List.rev
There is a List.partition function in the F# core library (in case you want this just to have it working, and not to learn how to write recursive functions yourself). Using this function, you can write this:
> testlist |> List.partition (fun s -> s.StartsWith("*"))
val it : string list * string list =
(["*text1"; "*text2"; "*text5"; "*text6"; "*text7"], ["text3"; "text4"])
Note that this function returns a tuple instead of returning a list of lists. This is a bit different to what you wanted, but if the predicate returns just true or false, then this makes more sense.
The implementation of partition function that returns tuples is also a bit simpler, so it may be useful for learning purposes:
let partition pred list =
    // Helper function, which keeps results collected so
    // far in 'accumulator' arguments outTrue and outFalse
    let rec partitionAux list outTrue outFalse =
        match list with
        | [] ->
            // We need to reverse the results (as we collected
            // them in the opposite order!)
            List.rev outTrue, List.rev outFalse
        // Append element to one of the lists, depending on 'pred'
        | x::xs when pred x -> partitionAux xs (x::outTrue) outFalse
        | x::xs -> partitionAux xs outTrue (x::outFalse)
    // Run the helper function
    partitionAux list [] []
Edit: rev-less version using foldBack added below.
Here's some code that uses lists and tail-recursion:
//divides a list L into chunks for which all elements match pred
let divide pred L =
    let rec aux buf acc L =
        match L,buf with
        //no more input and an empty buffer -> return acc
        | [],[] -> List.rev acc
        //no more input and a non-empty buffer -> return acc + rest of buffer
        | [],buf -> List.rev (List.rev buf :: acc)
        //found something that matches pred: put it in the buffer and go to next in list
        | h::t,buf when pred h -> aux (h::buf) acc t
        //found something that doesn't match pred. Continue but don't add an empty buffer to acc
        | h::t,[] -> aux [] acc t
        //found input that doesn't match pred. Add buffer to acc and continue with an empty buffer
        | h::t,buf -> aux [] (List.rev buf :: acc) t
    aux [] [] L
usage:
> divide pred testlist;;
val it : string list list =
[["*text1"; "*text2"]; ["*text5"; "*text6"; "*text7"]]
Using a list as the data structure for a buffer means that it always needs to be reversed when outputting the contents. This may not be a problem if individual chunks are modestly sized. If speed or efficiency becomes an issue, you could use a Queue<'a> or a List<'a> for the buffers, for which appending is fast. But using these data structures instead of lists also means that you lose the powerful list pattern matching. In my opinion, being able to pattern match lists outweighs the presence of a few List.rev calls.
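For comparison, a rough sketch of the mutable-buffer variant mentioned above might look like this. It uses ResizeArray (the F# alias for the .NET List<'T>) for both the current chunk and the result; divideBuffered is a hypothetical name, and the input is assumed to be any sequence.

```fsharp
// Sketch: same chunking idea, but with mutable ResizeArray buffers,
// so no List.rev is needed. `divideBuffered` is a hypothetical name.
let divideBuffered pred (items: seq<'a>) =
    let result = ResizeArray<'a list>()
    let buffer = ResizeArray<'a>()
    // Move the current chunk (if any) into the result and start a new one.
    let flush () =
        if buffer.Count > 0 then
            result.Add(List.ofSeq buffer)
            buffer.Clear()
    for item in items do
        if pred item then buffer.Add item
        else flush ()
    flush ()   // do not lose a chunk that runs to the end of the input
    List.ofSeq result

let testlist = ["*text1"; "*text2"; "text3"; "text4"; "*text5"; "*text6"; "*text7"]
let pred (s: string) = s.StartsWith("*")
printfn "%A" (divideBuffered pred testlist)
```

As the paragraph above says, the price is that the loop body is written with if/else and mutation instead of list patterns.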
Here's a streaming version that outputs the result one block at a time. This avoids the List.rev on the accumulator in the previous example:
let dividestream pred L =
    let rec aux buf L =
        seq {
            match L, buf with
            | [],[] -> ()
            | [],buf -> yield List.rev buf
            | h::t,buf when pred h -> yield! aux (h::buf) t
            | h::t,[] -> yield! aux [] t
            | h::t,buf ->
                yield List.rev buf
                yield! aux [] t
        }
    aux [] L
List.foldBack can be used to avoid reversing the accumulated chunks as well.
Update: here's a version using foldBack:
//divides a list L into chunks for which all elements match pred
let divide2 pred L =
    let f x (acc,buf) =
        match pred x,buf with
        | true,buf -> (acc,x::buf)
        | false,[] -> (acc,[])
        | false,buf -> (buf::acc,[])
    let rest,remainingBuffer = List.foldBack f L ([],[])
    match remainingBuffer with
    | [] -> rest
    | buf -> buf :: rest
Just reverse the list once up front, and then build the structure in order easily:
let Shunt p l =
    let mutable r = List.rev l
    let mutable result = []
    while not r.IsEmpty do
        let mutable thisBatch = []
        while not r.IsEmpty && not(p r.Head) do
            r <- r.Tail
        while not r.IsEmpty && p r.Head do
            thisBatch <- r.Head :: thisBatch
            r <- r.Tail
        if not thisBatch.IsEmpty then
            result <- thisBatch :: result
    result
The outer while deals with each 'batch', and the first inner while skips over any that don't match the predicate, followed by another while that grabs all those that do and stores them in the current batch. If there was anything in this batch (the final one may be empty), prepend it to the final result.
This is an example where I think locally imperative code is simply superior to a purely functional counterpart. The code above is so easy to write and to reason about.
Another version of shunt:
let shunt pred lst =
    let rec tWhile pred lst =
        match lst with
        | [] -> [], []
        | hd :: tl when pred hd ->
            let taken, rest = tWhile pred tl
            (hd :: taken), rest
        | lst -> [], lst
    let rec collect = function
        | [] -> []
        | lst ->
            let taken, rest = tWhile pred lst
            taken :: (collect (snd (tWhile (fun x -> not (pred x)) rest)))
    collect lst
This one avoids List.rev but it's not tail recursive - so only suitable for small lists.
yet another one...
let partition pred lst =
    let rec trec xs cont =
        match xs with
        | [] -> ([],[]) |> cont
        | h::t when pred h -> (fun (y,n) -> h::y,n) >> cont |> trec t
        | h::t -> (fun (y,n) -> y,h::n) >> cont |> trec t
    trec lst id
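Then we can define shunt (note that this puts all matching items into one list and all non-matching items into another, rather than grouping consecutive runs):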
let shunt pred lst = lst |> partition pred |> (fun (x,y) -> [x;y])
