Sequence in F# folding triples

I've googled and read, and I'm trying to find a "correct" way to do it, but every question I read on SO seems to have completely different answers.
Here is the gist of my problem. files has the type signature of a seq of triples (a: string, b: string, c: Int64). Being new to F#, I'm still not fluent in expressing type signatures (or, for that matter, understanding them). a is a filename, b is an internal identifier, and c is a value representing the length (size) of the file. baseconfig is a string from earlier in the code.
ignore(files
       |> Seq.filter( fun(x,y,z) -> y = baseconfig) // used to filter only files we want
       |> Seq.fold( fun f n ->
              if( (fun (_,_,z) -> z) n > 50L*1024L*1024L) then
                  zipfilex.Add((fun (z:string, _, _) -> z) n)
                  printfn("Adding 50mb to zip")
                  zipfilex.CommitUpdate()
                  zipfilex.BeginUpdate()
                  ("","",0L)
              else
                  zipfilex.Add((fun (z, _, _) -> z) n)
                  ("", "", (fun (_, _, z:Int64) -> z) n + (fun (_, _, z:Int64) -> z) f)
          ) ("","",0L)
)
What this chunk of code is supposed to do is iterate through each file in files, add it to a zip archive (but not really - it just goes on a list to be committed later), and when the pending files exceed 50 MB, commit them to the zip archive. Adding a file is cheap, committing is expensive, so I try to mitigate the cost by batching it.
So far the code kinda works... except for the ObjectDisposedException I got when it approached 150 MB of committed files. But I'm not sure this is the right way to do such an operation. It feels like I'm using Seq.fold in an unconventional way, yet I don't know of a better one.
Bonus question: Is there a better way to snipe values out of tuples? fst and snd only work for two-element tuples, and I realize you can define your own functions instead of inlining them like I did, but it seems there should be a better way.
Update: In my previous attempts at fold, I couldn't understand why I couldn't just use an Int64 as the accumulator. It turns out I was missing some critical parentheses. A slightly simpler version is below; it also eliminates all the crazy tuple extraction.
ignore(foundoldfiles
       |> Seq.filter( fun (x,y,z) -> y = baseconfig)
       |> Seq.fold( fun (a) (f,g,j) ->
              zipfilex.Add( f)
              if( a > 50L*1024L*1024L) then
                  printfn("Adding 50mb to zip")
                  zipfilex.CommitUpdate()
                  zipfilex.BeginUpdate()
                  0L
              else
                  a + j
          ) 0L
)
Update 2: I'm going to have to go with an imperative solution. F# is somehow re-entering this block of code after the zip file is closed in the statement that follows it, which explains the ObjectDisposedException. I have no idea how that works or why.
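A plausible cause (not confirmed in the question) is that a seq is lazy: the code that produces or consumes it only runs when the sequence is actually enumerated, which may happen after a resource it captures has been disposed. A minimal sketch of that behavior, using a hypothetical makeNumbers function:
// Minimal sketch: the body of a seq runs when it is enumerated, not when the value is created.
let makeNumbers () =
    seq { printfn "side effect runs now"
          yield 1
          yield 2 }

let numbers = makeNumbers ()        // nothing is printed here
numbers |> Seq.iter (printfn "%d")  // "side effect runs now" is printed only at this point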

As an alternative to the "dirty" imperative style, you can extend the Seq module with a general and reusable function for chunking. The function is a bit like fold, but it takes a lambda that returns option<'State>. If it returns None, then a new chunk is started; otherwise the element is added to the previous chunk. Then you can write an elegant solution:
files
|> Seq.filter (fun (x, y, z) -> y = baseconfig)
|> Seq.chunkBy (fun (x, y, z) sum ->
       if sum + z > 50L*1024L*1024L then None
       else Some(sum + z)) 0L
|> Seq.iter (fun files ->
       zipfilex.BeginUpdate()
       for f, _, _ in files do zipfilex.Add(f)
       zipfilex.CommitUpdate())
The implementation of the chunkBy function is a bit longer - it needs to use IEnumerator directly & it can be expressed using recursion:
module Seq =
    let chunkBy f initst (files: seq<_>) =
        let en = files.GetEnumerator()
        let rec loop chunk st = seq {
            if not (en.MoveNext()) then
                if chunk <> [] then yield chunk
            else
                match f en.Current st with
                | Some(nst) -> yield! loop (en.Current :: chunk) nst
                | None ->
                    yield chunk
                    yield! loop [en.Current] initst }
        loop [] initst
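As a quick sanity check of chunkBy with plain integers (hypothetical values): the running sum is capped at 5, the element that overflows the cap starts the next chunk, and elements inside each chunk come back in reverse order (which doesn't matter for the zipping scenario above).
[1; 2; 3; 4; 5]
|> Seq.chunkBy (fun x sum -> if sum + x > 5 then None else Some(sum + x)) 0
|> List.ofSeq
// [[2; 1]; [4; 3]; [5]]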

I don't think your problem benefits from the use of fold. It's most useful when building immutable structures. My opinion, in this case, is that it makes what you're trying to do less clear. The imperative solution works nicely:
let mutable a = 0L
for (f, g, j) in foundoldfiles do
    if g = baseconfig then
        zipfilex.Add(f)
        if a > 50L * 1024L * 1024L then
            printfn "Adding 50mb to zip"
            zipfilex.CommitUpdate()
            zipfilex.BeginUpdate()
            a <- 0L
        else
            a <- a + j

Here's my take:
let inline zip a b = a, b

foundoldfiles
|> Seq.filter (fun (_, internalid, _) -> internalid = baseconfig)
|> zip 0L
||> Seq.fold (fun acc (filename, _, filesize) ->
    zipfilex.Add filename
    let acc = acc + filesize
    if acc > 50L*1024L*1024L then
        printfn "Adding 50mb to zip"
        zipfilex.CommitUpdate ()
        zipfilex.BeginUpdate ()
        0L
    else acc)
|> ignore
Some notes:
The zip helper function makes for a clean pipeline through the entire function without any overhead, and in more complex scenarios it helps with type inference, since the state gets shifted from the right to the left side of the call to fold (though that doesn't matter or help in this particular case)
The use of _ to locally discard elements of the tuple that you don't need makes the code easier to read
Pipelining into ignore rather than wrapping the entire expression in extra parentheses also makes the code easier to read
Wrapping the arguments of unary functions in parentheses looks bizarre; you can't use parentheses for non-unary curried functions, so using them for unary functions is inconsistent. My policy is to reserve parentheses for constructor calls and tupled-function calls
EDIT: P.S. if( a > 50L*1024L*1024L) then is incorrect logic -- the if needs to take into account the accumulator plus the current filesize. E.g., if the first file was >= 50MB then the if wouldn't trigger.

If you're not fond of mutable variables and imperative loops, you could always rewrite this using a functional loop:
let rec loop acc = function
    | (file, id, size) :: files ->
        if id = baseconfig then
            zipfilex.Add file
            if acc > 50L*1024L*1024L then
                printfn "Adding 50mb to zip"
                zipfilex.CommitUpdate()
                zipfilex.BeginUpdate()
                loop 0L files
            else
                loop (acc + size) files
        else
            loop acc files
    | [] -> ()
loop 0L foundoldfiles
The advantage of this is that it explicitly states the three different ways the inductive case can proceed and how the accumulator is transformed in each case (so you're less likely to get this wrong - witness the bug in Daniel's for loop version).
You could even move the baseconfig check into a when clause:
let rec loop acc = function
    | (file, id, size) :: files when id = baseconfig ->
        zipfilex.Add file
        if acc > 50L*1024L*1024L then
            printfn "Adding 50mb to zip"
            zipfilex.CommitUpdate()
            zipfilex.BeginUpdate()
            loop 0L files
        else
            loop (acc + size) files
    | _ :: files -> loop acc files
    | [] -> ()
loop 0L foundoldfiles
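One note: these list patterns assume foundoldfiles is an F# list, while the question's value is a seq of triples, so it would need to be materialized first, for example:
loop 0L (List.ofSeq foundoldfiles)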

Related

How to stop the traversal of a list

I'm starting out in F# and I have this question.
Assume I have two lists a and b of the same length. I traverse these lists simultaneously and at each step test a condition on the elements of a and b, together with the result of a previous calculation. If this test fails, there is no need to keep going.
I wrote this code:
let mutable (i : int) = 0
let mutable (good : bool) = true
let mutable (previous : int) = 0
while good && i < len do
    good <- test a.[i] b.[i] previous
    previous <- my_func a.[i] b.[i]
    i <- i + 1
I saw this code, which is much, much better:
List.zip a b |> List.fold (fun (x, y) (a, b) -> (x && test a b y, my_func a b)) (true, 0)
But with my code, as soon as the test fails, the process finishes, which is not the case with the second code.
Is there a way, using the design of the second code, to stop the process?
Thank you
I assume you are only interested in whether the final result is good.
As mentioned by Brian, you can use Seq.scan, which behaves like Seq.fold but returns all the intermediate states rather than just the final state. By using Seq instead of List you are also using a lazy sequence, so functions can terminate early. To do what you want, you can use Seq.scan together with Seq.forall, which will check that all values of a given sequence satisfy a certain condition - the nice thing here is that this can terminate as soon as the condition is false.
Putting all this together, I get something like this:
Seq.zip a b
|> Seq.scan (fun (good, prev) (a, b) ->
       test a b prev, my_func a b) (true, 0)
|> Seq.forall (fun (good, _) -> good)
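To see the short-circuiting in action, here is a run with hypothetical stand-ins for test and my_func (they are not from the original question):
// Hypothetical stand-ins for the question's test and my_func
let test x y prev = x = y && prev >= 0
let my_func x y = x + y

let a = [1; 2; 3; 4]
let b = [1; 2; 0; 4]

Seq.zip a b
|> Seq.scan (fun (good, prev) (a, b) -> test a b prev, my_func a b) (true, 0)
|> Seq.forall (fun (good, _) -> good)
// false - Seq.forall stops at the first failing state, so the last pair (4, 4) is never examined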
Three ideas:
You could write your own variation of fold that stops when a flag is set to false. Personally, I think that would be cumbersome, though.
You could use Seq.scan to lazily accumulate all the results of my_func and then examine them, similar to this answer. Since Seq is lazy, the scan would short circuit the way you want. This is tricky to get right, though.
You could walk the lists recursively. IMHO, this is the simplest functional solution. Something like this:
let rec examine listA listB prev =
    match listA, listB with
    | headA :: tailA, headB :: tailB ->
        if test headA headB prev then
            let prev' = my_func headA headB
            examine tailA tailB prev'
        else false
    | [], [] -> true
    | _ -> false
examine listA listB 0

F# - Insert element in sorted list (tail-recursive)

I am trying to convert the following normal-recursive code to tail-recursive in F#, but I am failing miserably.
let rec insert elem lst =
    match lst with
    | [] -> [elem]
    | hd :: tl ->
        if hd > elem then
            elem :: lst
        else
            hd :: (insert elem tl)

let lst1 : int list = []   // annotated to avoid a value restriction error when compiled standalone
let lst2 = [1; 2; 3; 5]

printfn "\nInserting 4 in an empty list: %A" (insert 4 lst1)
printfn "\nInserting 4 in a sorted list: %A" (insert 4 lst2)
Can you guys help? Unfortunately I am a beginner in F#. Also, can anyone point me to a good tutorial to understand tail recursion?
The point of tail recursion is the following: the last operation before returning from a function is a call to itself; this is called a tail call, and is where tail recursion gets its name from (the recursive call is in the last, i.e. tail, position).
Your function is not tail recursive because at least one of its branches has an operation after the recursive call (the list cons operator).
The usual way of converting a recursive function into a tail-recursive function is to add an argument to accumulate intermediate results (the accumulator). When it comes to lists, and when you realize that the only elementary list operation is prepending an element, this also means that after you are through with processing your list, it will be reversed, and thus the resulting accumulator will usually have to be reversed again.
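As a minimal illustration of that accumulate-then-reverse pattern (a generic sketch, not part of the original answer), here is a tail-recursive map:
// The recursive call is the last operation; results are prepended to an
// accumulator, and the accumulator is reversed once at the end.
let mapTail f lst =
    let rec loop acc = function
        | [] -> List.rev acc
        | x :: rest -> loop (f x :: acc) rest
    loop [] lst

mapTail ((+) 1) [1; 2; 3]  // [2; 3; 4]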
With all these points in mind, and given that we do not want to change the function's public interface by adding a parameter that is superfluous from the caller's point of view, we move the real work to an internal subfunction. This particular function is slightly more complicated because after the element has been inserted, there is nothing else to do but concatenate the two partial lists again, one of which is now in reverse order while the other is not. We create a second internal function to handle that part, and so the whole function looks as follows:
let insert elm lst =
    let rec iter acc = function
        | [] -> List.rev (elm :: acc)
        | (h :: t) as ls ->
            if h > elm then finish (elm :: ls) acc
            else iter (h :: acc) t
    and finish acc = function
        | [] -> acc
        | h :: t -> finish (h :: acc) t
    iter [] lst
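A quick check against the examples from the question (expected results in the comments):
insert 4 []            // [4]
insert 4 [1; 2; 3; 5]  // [1; 2; 3; 4; 5]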
For further study, Scott Wlaschin's F# for Fun and Profit is a great resource; tail recursion is covered in a larger chapter about recursive types and folds: https://fsharpforfunandprofit.com/posts/recursive-types-and-folds

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    let rec mapRec f state list results =
        match list with
        | [] -> (List.rev results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some x -> mapRec f newState tail (result :: results)
            | None -> ([], None)
    mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h :: t) ->
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Seq.scan generates a sequence of running sums over the whole input, but since it is generated lazily, Seq.takeWhile stops the computation as soon as the exit condition is met.
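For instance (a hypothetical check using Seq.initInfinite as the input), the pipeline terminates even though the source sequence never does:
let input = Seq.initInfinite (fun i -> i + 1)   // 1, 2, 3, ...

input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
|> Seq.last                                     // 990 = 1 + 2 + ... + 44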

Some basic seq and list questions [duplicate]

Possible Duplicate:
Linked list partition function and reversed results
Actually I don't care about the input or output type; any of seq, array, or list will do (it doesn't have to be generic). Currently my code takes a list as input and produces a (list * list) as output:
let takeWhile predicator list =
    let rec takeWhileRec newList remain =
        match remain with
        | [] -> (newList |> List.rev, remain)
        | x :: xs ->
            if predicator x then
                takeWhileRec (x :: newList) xs
            else
                (newList |> List.rev, remain)
    takeWhileRec [] list
However, there is a pitfall. As far as I can see, List.rev is O(n^2), which would likely dominate the overall speed? I think it is even slower than the ugly solution: Seq.takeWhile, then count, and then take the tail n times... which is still O(n).
(If there is a C# List, then I would use that without having to reverse it...)
A side question: what's the difference between Array.ofList and List.toArray, or more generally, A.ofB and B.toA in List, Seq, and Array?
Is seq myList identical to List.toSeq myList?
Another side question: does nested Seq.append have the same complexity as Seq.concat?
e.g.
Seq.append (Seq.append (Seq.append a b) c) d // looks awful
Seq.concat [a; b; c; d]
1) The relevant implementation of List.rev is in local.fs in the compiler - it is:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let rec revAcc xs acc =
    match xs with
    | [] -> acc
    | h :: t -> revAcc t (h :: acc)

let rev xs =
    match xs with
    | [] -> xs
    | [_] -> xs
    | h1 :: h2 :: t -> revAcc t [h2; h1]
The comment does seem odd, as there is no obvious mutation. Note that this is in fact O(n), not O(n^2).
2) As pad said, there is no difference - I prefer to use the to... versions, as I think
A
|> List.map ...
|> List.toArray
looks nicer than
A
|> List.map ...
|> Array.ofList
but that is just me.
3)
Append (compiler source):
[<CompiledName("Append")>]
let append (source1: seq<'T>) (source2: seq<'T>) =
    checkNonNull "source1" source1
    checkNonNull "source2" source2
    fromGenerator(fun () -> Generator.bindG (toGenerator source1) (fun () -> toGenerator source2))
Note that for each append we get an extra generator that has to be walked through. In comparison, the concat implementation will just have a single extra function rather than n, so using concat is probably better.
To answer your questions:
1) The time complexity of List.rev is O(n), and the worst-case complexity of takeWhile is also O(n), so using List.rev doesn't increase the complexity of the function. Using ResizeArray could help you avoid List.rev, but you have to tolerate a bit of mutation.
let takeWhile predicate list =
    let rec loop (acc: ResizeArray<_>) rest =
        match rest with
        | x :: xs when predicate x -> acc.Add(x); loop acc xs
        | _ -> (acc |> Seq.toList, rest)
    loop (ResizeArray()) list
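For example (hypothetical input):
takeWhile (fun x -> x < 3) [1; 2; 3; 4; 1]
// ([1; 2], [3; 4; 1])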
2) There is no difference. Array.ofList and List.toArray use the same function internally (see here and here).
3) I think Seq.concat has the same complexity as a chain of nested Seq.appends. In the context of List and Array, concat is more efficient than append because you have more information with which to pre-allocate space for the output.
How about this:
let takeWhile pred =
    let cont = ref true
    List.partition (pred >> fun r -> !cont && (cont := r; r))
It uses a single library function, List.partition, which is efficiently implemented.
Hope this is what you meant :)

Conversion to tail recursion

Hey guys, I'm trying to get cozy with functional programming (particularly with F#), and I've hit a wall when it comes to building tail-recursive functions. I'm pretty good at turning basic recursion (where the function basically calls itself once per invocation) into tail recursion, but I now have a slightly more complicated situation.
In my case, the function must accept a single list as a parameter. When the function is called, I have to remove the first element from the list, and then recur using the remainder of the list. Then I need to apply the first element which I removed in some way to the result of the recursion. Next, I remove the second element and do the same thing (note: when I say "remove the second element", that is from the original list, so the list passed to the recursive call includes the first element as well). I do the same for the third, fourth, etc. elements of the list.
Is there a way to convert the above situation into a tail-recursive function? Maybe nested tail-recursive functions??? Thank you for any answers.
Okay, so here's my basic code. This particular one is a permutation generator (I'm not too concerned with the permutation part, though - it's the recursion I'd like to focus on):
let rec permutationsOther str =
    match str with
    | value :: [] ->
        [[value]]
    | _ ->
        let list =
            List.map (fun a ->                                  // apply the "remove" step to every element a
                let lst = List.filter (fun b -> b <> a) str     // remove element a from the list
                let permutedLst = permutationsOther lst         // recursive call
                consToAll a permutedLst) str                    // consToAll is my own function which performs the "cons" operation with a and every element in the list permutedLst
        List.reduce (fun acc elem -> elem @ acc) list           // flatten the list of lists produced by map into a single list
I hope this is clear enough - I'll be happy to provide clarifications if needed.
By the way, I have just found a way to rewrite this particular function so that it only uses a single recursion, but it was more of a fluke than an informed decision. Still, this has encouraged me that there may be a general method of turning multiple recursion into single recursion, but I have not yet found it.
Conversion to CPS should do the trick:
NOTE 1: The source of the sample was typed directly in the browser, so it may contain errors :( But I hope it demonstrates the general idea.
NOTE 2: consToAll function should be converted to CPS too: consToAll: 'T -> 'T list list -> ('T list list -> 'R) -> 'R
let remove x l = List.filter ((<>) x) l // from original post: should duplicates also be removed ???

let permute l =
    let rec loop k l =
        match l with
        | [] -> k []
        | [value] -> k [[value]]
        | _ -> filter l [] l (fun r -> r |> List.reduce (fun acc elem -> elem @ acc) |> k)
    and filter l acc orig fk =
        match l with
        | [] -> fk acc
        | x :: xs ->
            remove x orig
            |> loop (fun res ->
                consToAll x res (fun rs -> filter xs (rs :: acc) orig fk))
    loop id l
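The answer assumes consToAll has also been converted to CPS. Since the original consToAll isn't shown, here is one possible sketch matching the signature given in NOTE 2 (an assumption: it conses x onto every inner list, then passes the result to the continuation k):
// Hypothetical CPS consToAll : 'T -> 'T list list -> ('T list list -> 'R) -> 'R
let rec consToAll x lss k =
    match lss with
    | [] -> k []
    | ls :: rest -> consToAll x rest (fun rs -> k ((x :: ls) :: rs))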
