Better way to get tree representation of directory using F#?

I am new(ish) to F# and am trying to get a tree representation of a filesystem directory. Here's what I came up with:
type FSEntry =
    | File of name:string
    | Directory of name:string * entries:seq<FSEntry>

let BuildFSDirectoryTreeNonTailRecursive path =
    let rec GetEntries (directoryInfo:System.IO.DirectoryInfo) =
        directoryInfo.EnumerateFileSystemInfos("*", System.IO.SearchOption.TopDirectoryOnly)
        |> Seq.map (fun info ->
            match info with
            | :? System.IO.FileInfo as file -> File (file.Name)
            | :? System.IO.DirectoryInfo as dir -> Directory (dir.Name, GetEntries dir)
            | _ -> failwith "Illegal FileSystemInfo type"
        )
    let directoryInfo = System.IO.DirectoryInfo path
    Directory (path, GetEntries directoryInfo)
But... pretty sure that isn't tail recursive. I took a look at the generated IL and didn't see any tail prefix. Is there a better way to do this? I tried using an accumulator but didn't see how that helps. I tried mutual recursive functions and got nowhere. Maybe a continuation would work but I found that confusing.
(I know that stack-depth won't be an issue in this particular case but still would like to know how to tackle this non-tail recursion problem in general)
OTOH, it does seem to work. The following prints out what I am expecting:
let PrintFSEntry fsEntry =
    let rec printFSEntryHelper indent entry =
        match entry with
        | File name -> printfn "%s%s" indent name
        | Directory(name, entries) ->
            printfn "%s\\%s" indent name
            entries
            |> Seq.sortBy (function | File name -> 0 | Directory (name, entries) -> 1)
            |> Seq.iter (printFSEntryHelper (indent + " "))
    printFSEntryHelper "" fsEntry
This should probably be a different question but... how does one go about testing BuildFSDirectoryTreeNonTailRecursive? I suppose I could create an interface and mock it like I would in C#, but I thought F# had better approaches.
Edited: Based on the initial comments, I specified that I know stack space probably isn't an issue. I also specify I'm mainly concerned with testing the first function.

To expand on my comment from earlier - unless you anticipate working with inputs that would cause a stack overflow without tail recursion, there's nothing to be gained from making a function tail-recursive. For your case, the limiting factor is the ~260 characters in path name, beyond which most Windows APIs will start to break. You'll hit that way before you start running out of stack space due to non-tail recursion.
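For the general question of how to make a tree walk tail-recursive, continuation-passing style is the usual technique. The following is only a minimal sketch on a hypothetical in-memory `Tree` type (not the `FSEntry` type above, and not touching the filesystem): every recursive call is in tail position, and the pending work is carried by the continuation `k` on the heap instead of on the call stack.

```fsharp
// Hypothetical tree type, used only for illustration.
type Tree =
    | Leaf of string
    | Node of string * Tree list

// Counts leaves in continuation-passing style.
// 'go' walks a list of trees; 'k' receives the count for that list.
let countLeaves tree =
    let rec go trees k =
        match trees with
        | [] -> k 0
        | Leaf _ :: rest -> go rest (fun n -> k (n + 1))
        | Node (_, children) :: rest ->
            // Count the children first, then the remaining siblings.
            go children (fun c -> go rest (fun r -> k (c + r)))
    go [ tree ] id

let sample =
    Node ("root", [ Leaf "a"; Node ("sub", [ Leaf "b"; Leaf "c" ]); Node ("empty", []) ])

let leafCount = countLeaves sample
```

Whether the calls are actually compiled as tail calls depends on compiler settings (tail calls are typically off in debug builds), so treat this as a shape to follow rather than a guarantee.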
As for testing, you want your functions to be as close to a pure function as possible. This involves refactoring out the pieces of the function that are side-effecting. This is the case with both of your functions - one of them implicitly depends on the filesystem, the other prints text directly to the standard output.
I guess the refactoring I suggest is fairly close to Mark Seemann's points: few mocks - checked, few interfaces - checked, function composition - checked. The example you have however doesn't lend itself nicely to it, because it's an extremely thin veneer over EnumerateFileSystemInfos. I can get rid of System.IO like this:
type FSInfo = DirInfo of string * string | FileInfo of string

let build enumerate path =
    let rec entries path =
        enumerate path
        |> Seq.map (fun info ->
            match info with
            | DirInfo (name, path) -> Directory(name, entries path)
            | FileInfo name -> File name)
    Directory(path, entries path)
And now I'm left with an enumerate: string -> seq<FSInfo> function that can easily be replaced with a test implementation that doesn't even touch the drive. Then the default implementation of enumerate would be:
let enumerateFileSystem path =
    // Fully qualified, since no 'open System.IO' is in scope here.
    let directoryInfo = System.IO.DirectoryInfo(path)
    directoryInfo.EnumerateFileSystemInfos("*", System.IO.SearchOption.TopDirectoryOnly)
    |> Seq.map (fun info ->
        match info with
        | :? System.IO.FileInfo as file -> FileInfo (file.Name)
        | :? System.IO.DirectoryInfo as dir -> DirInfo (dir.Name, dir.FullName)
        | _ -> failwith "Illegal FileSystemInfo type")
You can see that it has virtually the same shape as the build function, minus recursion, since the entire 'core' of your logic is in EnumerateFileSystemInfos which lives beyond your code. This is a slight improvement, not in any way test-induced damage, but still it's not something that will make it onto anyone's slides anytime soon.
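To make the testing point concrete, here is a rough sketch of a test double for enumerate. The types and build are repeated from above so the snippet stands alone; `fakeFileSystem`, `fakeEnumerate` and `flatten` are hypothetical names introduced only for this illustration.

```fsharp
type FSEntry =
    | File of name: string
    | Directory of name: string * entries: seq<FSEntry>

type FSInfo =
    | DirInfo of string * string
    | FileInfo of string

let build enumerate path =
    let rec entries path =
        enumerate path
        |> Seq.map (fun info ->
            match info with
            | DirInfo (name, path) -> Directory (name, entries path)
            | FileInfo name -> File name)
    Directory (path, entries path)

// Hypothetical in-memory "filesystem": full path -> entries directly under it.
let fakeFileSystem =
    Map.ofList [
        "root", [ FileInfo "a.txt"; DirInfo ("sub", "root/sub") ]
        "root/sub", [ FileInfo "b.txt" ]
    ]

// Test double for enumerate; unknown paths are treated as empty directories.
let fakeEnumerate path =
    match Map.tryFind path fakeFileSystem with
    | Some infos -> Seq.ofList infos
    | None -> Seq.empty

// Forces the lazy sequences into a flat list so results compare structurally.
let rec flatten entry =
    match entry with
    | File name -> [ name ]
    | Directory (name, entries) ->
        (name + "/") :: (entries |> Seq.collect flatten |> List.ofSeq)

let names = build fakeEnumerate "root" |> flatten
```

With this setup, a test only needs to compare `names` against the expected listing; nothing touches the drive.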

Related

How to split F# result type list into lists of inner type

I have a list/sequence as follows Result<DataEntry, exn> []. This list is populated by calling multiple API endpoints in parallel based on some user inputs.
I don't care if some of the calls fail as long as at least 1 succeeds. I then need to perform multiple operations on the success list.
My question is how to partition the Result list into exn [] and DataEntry [] lists. I tried the following:
// allData is Result<DataEntry, exn> []
let filterOutErrors (input: Result<DataEntry, exn>) =
match input with
| Ok v -> true
| _ -> false
let values, err = allData |> Array.partition filterOutErrors
This in principle meets the requirement since values contains all the success cases but understandably the compiler can't infer the types so both values and err contains Result<DataEntry, exn>.
Is there any way to split a list of result Result<Success, Err> such that you end up with separate lists of the inner type?
Remember that Seq / List / Array are foldable, so you can use fold to convert a Seq / List / Array of 'Ts into any other type 'S. Here you want to go from Result<DataEntry, exn> [] to, e.g., the tuple list<DataEntry> * list<exn>. We can define the following folder function, which takes a state s of type list<'a> * list<'b> and a Result<'a, 'b>, and returns the updated tuple list<'a> * list<'b>:
let listFolder s r =
    match r with
    | Ok data -> (data :: (fst s), snd s)
    | Error err -> (fst s, err :: (snd s))
then you can fold over your array as follows:
let (values, err) = Seq.fold listFolder ([], []) allData
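One thing worth checking with this approach is ordering: because the folder prepends, a left fold returns both lists in reverse input order. The sketch below uses hypothetical sample data (int and string standing in for DataEntry and exn) to show that, and one possible way to keep input order by folding from the right with Array.foldBack.

```fsharp
let listFolder s r =
    match r with
    | Ok data -> (data :: fst s, snd s)
    | Error err -> (fst s, err :: snd s)

// Hypothetical sample data: int stands in for DataEntry, string for exn.
let allData : Result<int, string> [] =
    [| Ok 1; Error "a"; Ok 2; Error "b" |]

// Left fold: elements are prepended, so both lists come out reversed.
let values, errs = Seq.fold listFolder ([], []) allData

// Right fold: visits the array from the end, so prepending preserves order.
let valuesInOrder, errsInOrder =
    Array.foldBack (fun r s -> listFolder s r) allData ([], [])
```

Calling List.rev on each result of the left fold would be an equally reasonable alternative.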
You can extract the good and the bad like this.
let values =
    allData
    |> Array.choose (fun r ->
        match r with
        | Result.Ok ok -> Some ok
        | Result.Error _ -> None)

let err =
    allData
    |> Array.choose (fun r ->
        match r with
        | Result.Ok _ -> None
        | Result.Error error -> Some error)
You seem confused about whether you have arrays or lists. The F# code you use, in the snippet and in your question text, all points to use of arrays, in spite of you several times mentioning lists.
It has recently been recommended that we use array instead of the [] symbol in types, since there are inconsistencies in the way F# uses the symbol [] to mean list in some places, and array in other places. There is also the symbol [||] for arrays, which may add more confusion.
So that would be recommending Result<DataEntry,exn> array in this case.
The answer from Víctor G. Adán is functional, but it's a downside that the API requires you to pass in two empty lists, exposing the internal implementation.
You could wrap this into a "starter" function, but then the code grows, requires nested functions or using modules and the intention is obscured.
The answer from Bent Tranberg, while more readable, requires two passes over the data, and it seems inefficient to map into an Option type just to be able to filter on it using Array.choose.
I propose KISS'ing it with some good old mutation.
open System.Collections.Generic

let splitByOkAndErrors xs =
    let oks = List<'T>()
    let errors = List<'V>()
    for x in xs do
        match x with
        | Ok v -> oks.Add v
        | Error e -> errors.Add e
    (oks |> seq, errors |> seq)
I know, I know: mutation, yuck, right? I believe you should not shy away from it even in F#; use the right tool for each situation. The mutation is kept local to the function, so from the caller's point of view it is still pure. The API is clean, just taking in the list of Result to split. There are no concepts like folding, recursive calls, or list cons pattern matching to understand, and the function won't reverse the input list. You also have the option to return an array or seq; that is, you are not confined to a linked list that can only be extended in O(1) at the head, which in my experience seldom fits business cases well. Win, win, win in my book.
In general, I hope to see F# grow into a more multi-paradigm programming language in the community's mind. It's nice to see these functional solutions, but I fear they scare some people away unnecessarily, as F# is already multi-paradigm!

Re-implementing List.map in OCaml/F# with correct side effect order?

According to this previous answer
You could implement List.map like this:
let rec map project = function
  | [] -> []
  | head :: tail ->
    project head :: map project tail ;;
but instead, it is implemented like this:
let rec map project = function
  | [] -> []
  | head :: tail ->
    let result = project head in
    result :: map project tail ;;
They say that it is done this way to make sure the projection function is called in the expected order in case it has side effects, e.g.
map print_int [1;2;3] ;;
should print 123, but the first implementation would print 321. However, when I test both of them myself in OCaml and F#, they produce exactly the same 123 result.
(Note that I am testing this in the OCaml and F# REPLs--Nick in the comments suggests this might be the cause of my inability to reproduce, but why?)
What am I misunderstanding? Can someone elaborate why they should produce different orders and how I can reproduce? This runs contrary to my previous understanding of OCaml code I've written in the past so this was surprising to me and I want to make sure not to repeat the mistake. When I read the two, I read it as exactly the same thing with an extraneous intermediary binding.
My only guess is that the order of expression evaluation using cons is right to left, but that seems very odd?
This is being done purely as research to better understand how OCaml executes code, I don't really need to create my own List.map for production code.
The point is that the order of function application in OCaml is unspecified, not that it will be in some specific undesired order.
When evaluating this expression:
project head :: map project tail
OCaml is allowed to evaluate project head first or it can evaluate map project tail first. Which one it chooses to do is unspecified. (In theory it would probably be admissible for the order to be different for different calls.) Since you want a specified order, you need to use the form with let.
The fact that the order is unspecified is documented in Section 6.7 of the OCaml manual. See the section Function application:
The order in which the expressions expr, argument1, …, argumentn are evaluated is not specified.
(The claim that the evaluation order is unspecified isn't something you can test. No number of cases of a particular order prove that that order is always going to be chosen.)
So when you have an implementation of map like this:
let rec map f = function
  | [] -> []
  | a::l -> f a :: map f l
none of the function applications (f a) within the map calls are guaranteed to be evaluated sequentially in the order you'd expect. So when you try this:
map print_int [1;2;3]
you get the output
321- : unit list = [(); (); ()]
since those function applications weren't guaranteed to execute in any specific order.
Now when you implement the map like this:
let rec map f = function
  | [] -> []
  | a::l -> let r = f a in r :: map f l
you're forcing the function applications to be executed in the order you're expecting because you explicitly make a call to evaluate let r = f a.
So now when you try:
map print_int [1;2;3]
you will get
123- : unit list = [(); (); ()]
because you've explicitly made an effort to evaluate the function applications in order.

F#: shortest way to convert a string option to a string

The objective is to convert a string option that comes out of some nicely typed computation to a plain string that can then be passed to the UI/printf/URL/other things that just want a string and know nothing of option types. None should just become the empty string.
The obvious way is to do a match or an if on the input:
input |> fun s -> match s with | Some v -> v | _ -> ""
or
input |> fun s -> if s.IsSome then s.Value else ""
but while still being one-liners, these still take up quite a lot of line space. I was hoping to find the shortest possible method for doing this.
You can also use the function defaultArg input "" which in your code that uses forward pipe would be:
input |> fun s -> defaultArg s ""
Here's another way of writing the same but without the lambda:
input |> defaultArg <| ""
It would be better if we had a version in the F# core with the arguments flipped. Still, I think this is the shortest way without relying on other libraries or user-defined functions.
UPDATE
Now in F# 4.1 FSharp.Core provides Option.defaultValue which is the same but with arguments flipped, so now you can simply write:
Option.defaultValue "" input
Which is pipe-forward friendly:
input |> Option.defaultValue ""
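As a small illustration of how this reads at a call site, here is a sketch that wraps it in a helper and uses it when building a string. The names `toText` and `greet` are hypothetical, introduced only for this example.

```fsharp
// Hypothetical helper: None becomes the empty string.
let toText (input: string option) = input |> Option.defaultValue ""

// Hypothetical consumer that only understands plain strings.
let greet (name: string option) = sprintf "Hello, %s!" (toText name)

let withName = greet (Some "Ada")
let withoutName = greet None

// defaultArg gives the same result with the arguments in the other order.
let sameAsDefaultArg = defaultArg None "" = toText None
```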
The obvious way is to write yourself a function to do it, and if you put it in an Option module, you won't even notice it's not part of the core library:
module Option =
    let defaultTo defValue opt =
        match opt with
        | Some x -> x
        | None -> defValue
Then use it like this:
input |> Option.defaultTo ""
The NuGet package FSharpX.Extras has Option.getOrElse which can be composed nicely.
let x = stringOption |> Option.getOrElse ""
The best solution I found so far is input |> Option.fold (+) "".
...which is just a shortened version of input |> Option.fold (fun s t -> s + t) "".
I suspect that it's the shortest I'll get, but I'd like to hear if there are other short ways of doing this that would be easier to understand by non-functional programmers.

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    let rec mapRec f state list results =
        match list with
        | [] -> (List.rev results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some x -> mapRec f newState tail (result :: results)
            | None -> ([], None)
    mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) ->
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere so it might be a good trade-off, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.
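To see the early exit in action, here is a small sketch that runs the same pipeline over an infinite input (an assumption made for the demonstration; the original input is unspecified). If the pipeline were not lazy, this would never terminate.

```fsharp
// Hypothetical input: the infinite sequence 1, 2, 3, ...
let input = Seq.initInfinite (fun i -> i + 1)

// Running sums 0, 1, 3, 6, ... are produced on demand and cut off
// as soon as a sum reaches 1000.
let sums =
    input
    |> Seq.scan (fun sum v -> v + sum) 0
    |> Seq.takeWhile (fun sum -> sum < 1000)
    |> List.ofSeq

let lastSum = List.last sums
```

Note that Seq.scan emits the initial state first, so `sums` starts with 0, and Seq.takeWhile drops the first sum that fails the test rather than including it.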

Simpler way of pattern matching against start of list in F#

I'm trying to write a string processing function in F#, which looks like this:
let rec Process html =
    match html with
    | '-' :: '-' :: '>' :: tail -> ("→" |> List.ofSeq) @ Process tail
    | head :: tail -> head :: Process tail
    | [] -> []
My pattern matching expression against several elements is a bit ugly (the whole '-' :: '-' :: '>' thing). Is there any way to make it better? Also, is what I'm doing efficient if I were to process large texts? Or is there another way?
Clarification: what I mean is, e.g., being able to write something like this:
match html with
| "-->" :: tail ->
I agree with others that using a list of characters for doing serious string manipulation is probably not ideal. However, if you'd like to continue to use this approach, one way to get something close to what you're asking for is to define an active pattern. For instance:
let rec (|Prefix|_|) (s: string) l =
    if s = "" then
        Some l
    else
        match l with
        | c :: (Prefix (s.Substring(1)) xs) when c = s.[0] -> Some xs
        | _ -> None
Then you can use it like:
let rec Process html =
    match html with
    | Prefix "-->" tail -> ("→" |> List.ofSeq) @ Process tail
    | head :: tail -> head :: Process tail
    | [] -> []
Is there any way to make it better?
Sure:
let process (s: string) = s.Replace("-->", "→")
Also, is what I'm doing efficient if I were to process large texts?
No, it is incredibly inefficient. Allocation and garbage collection are expensive and you're doing both for every single character.
Or is there another way?
Try the Replace member. If that doesn't work, try a regular expression. If that doesn't work, write a lexer (e.g. using fslex). Ultimately, what you want for efficiency is a state machine processing a stream of chars and outputting its result by mutating in-place.
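As a rough illustration of the last option, here is a minimal sketch of a single pass over the characters that writes into a StringBuilder. The function name `replaceArrows` is hypothetical, and for this particular replacement String.Replace already does the job; the sketch only shows the shape such a hand-written scanner might take.

```fsharp
open System.Text

// Hypothetical hand-written scanner: replaces each "-->" with an arrow
// in one left-to-right pass, appending to a mutable buffer.
let replaceArrows (html: string) =
    let sb = StringBuilder(html.Length)
    let mutable i = 0
    while i < html.Length do
        if i + 2 < html.Length
           && html.[i] = '-'
           && html.[i + 1] = '-'
           && html.[i + 2] = '>' then
            sb.Append("\u2192") |> ignore   // U+2192 is the right arrow
            i <- i + 3
        else
            sb.Append(html.[i]) |> ignore
            i <- i + 1
    sb.ToString()

let result = replaceArrows "a-->b--c"
```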
I think you should avoid using list<char> and use strings with, e.g., String.Replace, String.Contains, etc. System.String and System.StringBuilder will be much better for manipulating text than list<char>.
For simple problems, using String and StringBuilder directly as Brian mentioned is probably the best way. For more complicated problems, you may want to check out some sophisticated parsing library like FParsec for F#.
This question may be some help to give you ideas for another way of approaching your problem - using list<> to contain lines, but using String functions within each line.

Resources