F# sort using head::tail

I am trying to write a recursive function that uses head::tail. I understand that head is the first element of the list and tail is all the other elements in the list. I also understand how recursion works. What I am wondering is how to go about sorting the elements in the list. Is there a way to compare the head to every element in the tail and then choose the smallest element? My background is in C++ and I am not allowed to use List.sort(). Any idea how to go about it? I have looked at the tutorials on the MSDN site and still have had no luck.

Here is a recursive, list-based implementation of the quicksort algorithm in F#:
let rec quicksort list =
    match list with
    | [] -> []
    | h::t ->
        let lesser = List.filter ((>) h) t
        let greater = List.filter ((<=) h) t
        (quicksort lesser) @ [h] @ (quicksort greater)

You need to decide a sorting methodology before worrying about the data structure used. If you were to do, say, insertion sort, you would likely want to start from the end of the list and insert an item at each recursion level, being careful how you handle the insertion itself.
Technically, at any particular level you only have access to one data element; however, you can pass a particular data element as a parameter to preserve it. For instance, here is the inserting part of an insertion sort algorithm. It assumes the list is already sorted.
let rec insert i l =
    match l with
    | [] -> [i]
    | h::t ->
        if h > i then
            i::l
        else
            h::(insert i t)
Note how I now have access to two elements, the cached one and the remainder. Another variation would be a merge sort, where you have two sorted lists and therefore two items to work with at any particular iteration.
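To show how the insert helper above could be turned into a complete sort, here is a minimal sketch. The name insertionSort is an assumption for illustration and is not part of the original answer; the recursion sorts the tail first and then inserts the head into the sorted result.

```fsharp
// Insert i into an already sorted list, keeping it sorted.
let rec insert i l =
    match l with
    | [] -> [i]
    | h :: t -> if h > i then i :: l else h :: insert i t

// Hypothetical wrapper: sort the tail, then insert the head into the result.
let rec insertionSort l =
    match l with
    | [] -> []
    | h :: t -> insert h (insertionSort t)

printfn "%A" (insertionSort [3; 1; 4; 1; 5; 9; 2; 6])
```

Neither function is tail recursive, so this is only suitable for lists of moderate length.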
Daniel's commented answer mentions a particular implementation (quicksort) if you are interested.
Finally, lists aren't optimal for sorting algorithms due to their rigid structure and the number of allocations required. Given that comparison-based sorting algorithms are all worse than O(n) (O(n log n) at best), you can convert your list to an array and back, which costs O(n) each way, in order to improve performance without hurting the asymptotic complexity.
EDIT:
Note that the above isn't tail recursive. To make it so, you would need to do something like this:
let insert i l =
    let rec loop l acc =
        match l with
        // acc holds the elements already passed, in reverse order
        | [] -> List.fold (fun rest e -> e :: rest) [i] acc
        | h::t ->
            if h > i then
                // put the passed elements back in front of i::l
                List.fold (fun rest e -> e :: rest) (i::l) acc
            else
                loop t (h::acc)
    loop l []
I don't remember offhand the best way to reverse a list so went with an example from https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/lists

Related

How to split F# result type list into lists of inner type

I have a list/sequence as follows Result<DataEntry, exn> []. This list is populated by calling multiple API endpoints in parallel based on some user inputs.
I don't care if some of the calls fail as long as at least 1 succeeds. I then need to perform multiple operations on the success list.
My question is how to partition the Result list into exn [] and DataEntry [] lists. I tried the following:
// allData is Result<DataEntry, exn> []
let filterOutErrors (input: Result<DataEntry, exn>) =
    match input with
    | Ok v -> true
    | _ -> false
let values, err = allData |> Array.partition filterOutErrors
This in principle meets the requirement, since values contains all the success cases, but understandably the compiler can't infer the inner types, so both values and err contain Result<DataEntry, exn>.
Is there any way to split a list of result Result<Success, Err> such that you end up with separate lists of the inner type?
Remember that Seq / List / Array are foldable, so you can use fold to convert a Seq / List / Array of 'Ts into any other type 'S. Here you want to go from Result<DataEntry, exn> [] to, e.g., the tuple list<DataEntry> * list<exn>. We can define the following folder function, which takes a state s of type list<'a> * list<'b> and a Result<'a, 'b>, and returns the updated tuple of lists list<'a> * list<'b>:
let listFolder s r =
    match r with
    | Ok data -> (data :: (fst s), snd s)
    | Error err -> (fst s, err :: (snd s))
then you can fold over your array as follows:
let (values, err) = Seq.fold listFolder ([], []) allData
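One thing to keep in mind with the fold above is that consing onto the state yields both lists in reverse input order. If order matters, a possible variation is to fold from the right instead. The sketch below assumes a small sample array standing in for allData (with string playing the role of DataEntry) and a hypothetical helper name splitResults.

```fsharp
// Hypothetical sample data standing in for allData; string plays the role of DataEntry.
let allData : Result<string, exn> array =
    [| Ok "a"; Error (System.Exception "boom"); Ok "b" |]

// Array.foldBack visits elements from right to left, so consing keeps input order.
let splitResults (results: Result<'a, 'b> array) =
    Array.foldBack
        (fun r (oks, errs) ->
            match r with
            | Ok v -> (v :: oks, errs)
            | Error e -> (oks, e :: errs))
        results
        ([], [])

let values, errors = splitResults allData
printfn "%A" values
printfn "%d" (List.length errors)
```

Here values has type string list and errors has type exn list, so the inner types are recovered in a single pass.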
You can extract the good and the bad like this.
let values =
    allData
    |> Array.choose (fun r ->
        match r with
        | Result.Ok ok -> Some ok
        | Result.Error _ -> None)
let err =
    allData
    |> Array.choose (fun r ->
        match r with
        | Result.Ok _ -> None
        | Result.Error error -> Some error)
You seem confused about whether you have arrays or lists. The F# code you use, in the snippet and in your question text, all points to use of arrays, in spite of you several times mentioning lists.
It has recently been recommended that we use array instead of the [] symbol in types, since there are inconsistencies in the way F# uses the symbol [] to mean list in some places, and array in other places. There is also the symbol [||] for arrays, which may add more confusion.
So that would be recommending Result<DataEntry,exn> array in this case.
The answer from Víctor G. Adán is functional, but it's a downside that the API requires you to pass in two empty lists, exposing the internal implementation.
You could wrap this into a "starter" function, but then the code grows, requires nested functions or using modules and the intention is obscured.
The answer from Bent Tranberg, while more readable, requires two passes over the data, and it seems inefficient to map into the Option type just to be able to filter on it using Array.choose.
I propose KISS'ing it with some good old mutation.
open System.Collections.Generic
let splitByOkAndErrors xs =
    let oks = List<'T>()
    let errors = List<'V>()
    for x in xs do
        match x with
        | Ok v -> oks.Add v
        | Error e -> errors.Add e
    (oks |> seq, errors |> seq)
I know, I know: mutation, yuck, right? I believe you should not shy away from it even in F#. Use the right tool for every situation: the mutation is kept local to the function, so from the outside it is still pure. The API is clean, just taking in the collection of Result values to split. There are no concepts like folding, recursive calls, or list cons pattern matching to understand, and the function won't reverse the input order. You also have the option to return an array or a seq, that is, you are not confined to a linked list that can only be extended in O(1) at the head, which in my experience seldom fits well into a business case. Win, win, win in my book.
In general, I hope to see F# grow into a more multi-paradigm programming language in the community's mind. It's nice to see these functional solutions, but I fear they scare some people away unnecessarily, as F# is already multi-paradigm!

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    let rec mapRec f state list results =
        match list with
        | [] -> (List.rev results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some x -> mapRec f newState tail (result :: results)
            | None -> ([], None)
    mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) ->
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a System.Collections.Generic.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.
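To make the laziness concrete, here is a small self-contained sketch of the same pipeline. The infinite input sequence is an assumption chosen for illustration: the pipeline only terminates because Seq.takeWhile stops pulling elements once the running sum reaches 1000.

```fsharp
// Hypothetical input: the infinite sequence 1, 2, 3, ...
let input = Seq.initInfinite (fun i -> i + 1)

let sums =
    input
    |> Seq.scan (fun sum v -> v + sum) 0       // running sums: 0, 1, 3, 6, ...
    |> Seq.takeWhile (fun sum -> sum < 1000)   // stop at the first sum >= 1000
    |> Seq.toList

printfn "%d" (List.last sums)   // prints 990
```

Note that Seq.scan also emits the initial state, so the resulting list starts with 0.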

How does the implementation of list in F# work?

I'm curious as to how the list module/type works in F#, specifically does it optimise this?
let xs = ["1"; "2"; "3"]
let ys = "0"::xs
let zs = ["hello"; "world"] @ xs
I've looked over some of the source; https://github.com/fsharp/fsharp/blob/68e37d03dfc15f8105aeb0ac70b846f82b364901/src/fsharp/FSharp.Core/prim-types.fs#L3493 seems to be the relevant area.
I would like to know if xs is copied when making ys.
I would have thought it's easy to just point to the existing list if you just cons element.
If you are concatenating I imagine it might be impossible as it would require mutating the last element of the list to point to the next one?
If someone could annotate/paste snippets of code from FSharp.Core that would be ideal.
So the implementation of List is a little odd. It is actually implemented as a discriminated union. From the spec:
type 'T list =
    | ([])
    | (::) of 'T * 'T list
So you can think of :: as a function that takes two arguments and creates a tuple (which is fast as it is independent of the list size).
@ is much more complicated. Here is the implementation:
let (@) l1 l2 =
    match l1 with
    | [] -> l2
    | (h::t) ->
        match l2 with
        | [] -> l1
        | _ ->
            let res = [h]
            let lastCons = PrivateListHelpers.appendToFreshConsTail res t
            PrivateListHelpers.setFreshConsTail lastCons l2;
            res
The two weird functions basically mutate the list in place. appendToFreshConsTail copies the list and returns the last element. setFreshConsTail then changes the last element so that its next element is set to l2 rather than [] joining the lists.
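One way to observe the sharing from user code is to compare object references. The following is a rough sketch using the lists from the question. The cons check follows directly from the union definition above; the append check merely prints what the running FSharp.Core does, and that result is assumed (not guaranteed here) to match the implementation quoted above.

```fsharp
let xs = ["1"; "2"; "3"]
let ys = "0" :: xs
let zs = ["hello"; "world"] @ xs

// Cons allocates one new cell whose tail is the existing list object.
let consSharesTail = obj.ReferenceEquals(List.tail ys, xs)

// Skip the two copied cells of the left operand and compare what follows with xs.
let appendSharesRight = obj.ReferenceEquals(List.tail (List.tail zs), xs)

printfn "cons shares tail: %b" consSharesTail
printfn "append shares right operand: %b" appendSharesRight
```

So xs is never copied when building ys, and for an append only the left operand needs to be copied.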

Comparing values in loop inside function

I want to make a function that takes an integer list as argument and compares every value and returns the largest value. In C# I would simply iterate through every value in the list, save the largest to a variable and return it, I'm hoping F# works similarly but the syntax is kinda iffy for me, here's what my code looks like. Also max2 is a function that compares 2 values and returns the largest.
let max_list list =
    let a = 0 : int
    match list with
    | head :: tail -> (for i in list do a = max2 i a) a
    | [] -> failwith "sry";;
You could use a mutable variable and write the code using a for loop, just like in C#. However, if you're doing this to learn F# and functional concepts, then it's a good idea to use recursion.
In this case, the recursive function is a bit longer, but it demonstrates the key concepts, including pattern matching, so learning the tricks is something that will be useful when writing more complicated F# code.
The key idea is to write a function that takes the largest value found so far and calls itself recursively until it reaches the end of the list.
let max_list list =
    // Inner recursive function that takes the largest value found so far
    // and a list to be processed (if it is empty, it returns 'maxSoFar')
    let rec loop maxSoFar list =
        match list with
        // If the head value is greater than what we found so far, use it as new greater
        | head::tail when head > maxSoFar -> loop head tail
        // If the head is smaller, use the previous maxSoFar value
        | _::tail -> loop maxSoFar tail
        // At the end, just return the largest value found so far
        | [] -> maxSoFar
    // Start with head as the greatest and tail as the rest to be processed
    // (fails for empty list - but you could match here to give better error)
    loop (List.head list) (List.tail list)
As a final note, this will be slow because it uses generic comparison (via an interface). You can make the function faster using let inline max_list list = (...). That way, the code will use native comparison instruction when used with primitive types like int (this is really a special case - the problem only really happens with generic comparison)
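For comparison, the loop-with-a-mutable-variable approach mentioned at the start of this answer could be sketched roughly as follows. The name maxListLoop and the error message are assumptions for illustration; the function is restricted to int lists to match the question.

```fsharp
// Imperative variant: keep the best value seen so far in a mutable local.
let maxListLoop (list: int list) =
    match list with
    | [] -> failwith "empty list"
    | head :: tail ->
        let mutable best = head
        for i in tail do
            if i > best then best <- i
        best

printfn "%d" (maxListLoop [3; 9; 2; 7])   // prints 9
```

Note that F# uses <- for assignment to a mutable variable; a = ... inside the loop, as in the question, is a comparison.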
Also know that you can write a nice one-liner using reduce:
let max_list list = List.reduce (fun max x -> if x > max then x else max) list
If your intention is to be able to find the maximum value of items in a list where the value of the items is found by the function max2 then this approach works:
let findMax list =
    list
    |> List.map (fun i -> i, max2 i)
    |> List.maxBy snd
    |> fst

cons operator (::) in F#

The :: operator in F# always prepends elements to the list. Is there an operator that appends to the list? I'm guessing that using the @ operator
[1; 2; 3] @ [4]
would be less efficient than appending one element.
As others said, there is no such operator, because it wouldn't make much sense. I actually think that this is a good thing, because it makes it easier to realize that the operation will not be efficient. In practice, you shouldn't need the operator - there is usually a better way to write the same thing.
Typical scenario: the situation in which you might think you need to append elements to the end is common enough that it is worth describing.
Adding elements to the end seems necessary when you're writing a tail-recursive version of a function using the accumulator parameter. For example a (inefficient) implementation of filter function for lists would look like this:
let filter f l =
    let rec filterUtil acc l =
        match l with
        | [] -> acc
        | x::xs when f x -> filterUtil (acc @ [x]) xs
        | x::xs -> filterUtil acc xs
    filterUtil [] l
In each step, we need to append one element to the accumulator (which stores elements to be returned as the result). This code can be easily modified to use the :: operator instead of appending elements to the end of the acc list:
let filter f l =
    let rec filterUtil acc l =
        match l with
        | [] -> List.rev acc // (1)
        | x::xs when f x -> filterUtil (x::acc) xs // (2)
        | x::xs -> filterUtil acc xs
    filterUtil [] l
In (2), we're now adding elements to the front of the accumulator and when the function is about to return the result, we reverse the list (1), which is a lot more efficient than appending elements one by one.
Lists in F# are singly-linked and immutable. This means consing onto the front is O(1) (create an element and have it point to an existing list), whereas snocing onto the back is O(N) (as the entire list must be replicated; you can't change the existing final pointer, you must create a whole new list).
If you do need to "append one element to the back", then e.g.
l @ [42]
is the way to do it, but this is a code smell.
The cost of appending two standard lists is proportional to the length of the list on the left. In particular, the cost of
xs @ [x]
is proportional to the length of xs—it is not a constant cost.
If you want a list-like abstraction with a constant-time append, you can use John Hughes's function representation, which I'll call hlist. I'll try to use OCaml syntax, which I hope is close enough to F#:
type 'a hlist = 'a list -> 'a list (* a John Hughes list *)
let empty : 'a hlist = let id xs = xs in id
let append xs ys = fun tail -> xs (ys tail)
let singleton x = fun tail -> x :: tail
let cons x xs = append (singleton x) xs
let snoc xs x = append xs (singleton x)
let to_list : 'a hlist -> 'a list = fun xs -> xs []
The idea is that you represent a list functionally as a function from "the rest of the elements" to "the final list". This works great if you are going to build up the whole list before you look at any of the elements. Otherwise you'll have to deal with the linear cost of append or use another data structure entirely.
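Since the snippet above is written in OCaml, here is a rough F# rendering of the same idea. The names (empty, singleton, append, snoc, toList) follow the OCaml version but are otherwise assumptions of this sketch. An hlist is simply a function of type 'a list -> 'a list.

```fsharp
// An hlist maps "the rest of the elements" to "the final list".
let empty (tail: 'a list) = tail
let singleton x = fun tail -> x :: tail
let append xs ys = fun tail -> xs (ys tail)
let snoc xs x = append xs (singleton x)
let toList xs = xs []

// Build a list by repeatedly adding to the back, then materialize it once.
let built = List.fold snoc empty [1; 2; 3] |> toList

printfn "%A" built   // prints [1; 2; 3]
```

Each snoc only composes two functions, so it is constant time; the linear work happens once, in toList.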
I'm guessing that using the @ operator [...] would be less efficient than appending one element.
If it is, it will be a negligible difference. Both appending a single item and concatenating a list to the end are O(n) operations. As a matter of fact, I can't think of a single thing that @ has to do which a single-item append function wouldn't.
Maybe you want to use another data structure. We have double-ended queues (or short "Deques") in fsharpx. You can read more about them at http://jackfoxy.com/double-ended-queues-for-fsharp
The efficiency (or lack of) comes from iterating through the list to find the final element. So declaring a new list with [4] is going to be negligible for all but the most trivial scenarios.
Try using a double-ended queue instead of a list. I recently added 4 versions of deques (Okasaki's spelling) to FSharpx.Core (available through NuGet; source code at FSharpx.Core.Datastructures). See my article about using deques, Double-ended queues for F#.
I've suggested to the F# team that the cons operator, ::, and the active pattern discriminator be made available for other data structures with a head/tail signature.

Resources