Filesystem implementation in F#

I'm coming from a C# background and have just started learning F#.
I seem to be stuck on an assignment that looks rather simple.
I need to implement a filesystem with the GIVEN type below (changing it is not an option), where 'a is meant for names and 'b for data.
type Tree<'a,'b> =
    | Node of list<'a * Tree<'a,'b>>
    | Leaf of 'b
The filesystem is then an instance of this type in which both 'a (file and directory names) and 'b (file contents) are strings. Directories should be nodes and files leaves.
type FileSystem = Tree<string,string>
// / (directory)
// /home (directory)
// /home/.emacs (file)
// /home/.ssh (directory)
// /home/.ssh/id_rsa (file)
// /usr (directory)
// /usr/bin (directory)
// /usr/bin/bash (file)
// /usr/bin/emacs (file)
I am stuck at...
let fs = Node["/",
                Node["home",
                       Node[".ssh",
                              Node["id_rsa", Leaf("id_rsa_DATA")]]]]
Any attempt to add more nodes or leaves to any of the nodes fails because the compiler expects a tree and not a list.
What is the proper syntax to build a tree like this?
With this tree I need to implement:
ls : FileSystem -> list<string> // list root directory
subfs : FileSystem -> list<string> -> FileSystem // return the filesystem you find at that path
ls' : FileSystem -> list<string> -> list<string> // list data of directory files at given path (fail if it's just a file)
Edit:
So this is it:
let fs = Node["/",
                Node["home/",
                       Node[".ssh/",
                              Node["id_rsa", Leaf("id_rsa_DATA")];
                            ".emacs", Leaf(".emacs_DATA")];
                     "usr/",
                       Node["bin/",
                              Node["bash", Leaf("bash_DATA");
                                   "emacs", Leaf("emacs_DATA")];
                            ".emacs", Leaf(".emacs_DATA")]]]
Now how can I list directories and subfilesystem (ls, subfs, ls')?

Your list elements (the 'a * Tree<'a,'b>) need to be a string * Node for a directory (or sub-directory), and a string * Leaf for a file.
That way you'd end up with something like this:
type Tree<'a,'b> =
    | Node of list<'a * Tree<'a,'b>>
    | Leaf of 'b
let fs = Node["/",
                Node ["someFile.txt", Leaf("content");
                      "aFile.txt", Leaf("some content");
                      "someDir/",
                        Node [ "anotherFile.txt", Leaf("more content");
                               "script.sh", Leaf("script content") ];
                      "logfile.log", Leaf("log content") ] ]

Is this closer to what you want?
That is, each node is a tuple of some value 'a and a list of children of type Tree<'a,'b>?
type Tree<'a,'b> =
    | Node of 'a * Tree<'a,'b> list
    | Leaf of 'b
let fs =
    Node("/",
         [Node("home",
               [Node(".ssh",
                     [Node("id_rsa",
                           [Leaf("id_rsa_DATA")])])])])
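// Walk a list of trees, reporting the .ssh directory and every leaf.
let rec f n =
    match n with
    | [] -> ()
    | Node(".ssh", nodeList) :: tail ->
        printfn "found .ssh"
        f tail
        f nodeList
    | Node(_, nodeList) :: tail ->
        // any other directory: just keep descending
        f tail
        f nodeList
    | Leaf(leafStr) :: tail ->
        printfn "found leaf %s" leafStr
        f tail
f [fs]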
The recursive cases for this structure are a little more complicated to work with than they need to be, so you might like this definition better:
type Tree<'a,'b> =
| Node of 'a * Tree<'a,'b> list
// omit the leaf and define it as a node with an empty list of children
Unless of course you would like to be more explicit about what you are working with:
type FileSys<'a,'b> =
| Folder of 'a * FileSys<'a,'b> list
| File of 'b
and if we are talking about a system of file and folder labels, does it need to be generic?
type FileSys =
| Folder of string * FileSys list
| File of string
Often, a solution that uses discriminated unions in preference to generic types is the simpler way to go. If you do that, you are putting more trust in F#'s rich set of structures and pattern matching rather than first building up a generic set of tools so that you can subsequently reason about your problem. With F#, reasoning about your problem space and building tooling around it can happen at the same time, in an exploratory fashion.
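Returning to the type given in the question, the three requested functions could be sketched roughly as follows. This is only a sketch under some assumptions that are not in the question: the FileSystem value is treated as the root directory itself (so "/" is implicit), a path is a list of names such as ["usr"; "bin"], and ls' is read as "list the names at that path". The error messages are made up.

```fsharp
type Tree<'a,'b> =
    | Node of list<'a * Tree<'a,'b>>
    | Leaf of 'b

type FileSystem = Tree<string,string>

// Assumption: the value itself is the root directory, so "/" is implicit.
let ssh : FileSystem = Node [ ("id_rsa", Leaf "id_rsa_DATA") ]
let home : FileSystem = Node [ (".emacs", Leaf ".emacs_DATA"); (".ssh", ssh) ]
let bin : FileSystem = Node [ ("bash", Leaf "bash_DATA"); ("emacs", Leaf "emacs_DATA") ]
let usr : FileSystem = Node [ ("bin", bin) ]
let fs : FileSystem = Node [ ("home", home); ("usr", usr) ]

// Names of the entries directly inside a directory; fails on a file.
let ls (fs : FileSystem) : list<string> =
    match fs with
    | Node entries -> entries |> List.map fst
    | Leaf _ -> failwith "not a directory"

// Follow the path one name at a time; the empty path returns the tree itself.
let rec subfs (fs : FileSystem) (path : list<string>) : FileSystem =
    match path, fs with
    | [], _ -> fs
    | name :: _, Leaf _ -> failwithf "cannot descend into a file at '%s'" name
    | name :: rest, Node entries ->
        match entries |> List.tryFind (fun (n, _) -> n = name) with
        | Some (_, child) -> subfs child rest
        | None -> failwithf "no entry named '%s'" name

// List the directory found at the given path; fails if the path ends at a file.
let ls' (fs : FileSystem) (path : list<string>) : list<string> =
    ls (subfs fs path)
```

With this layout, ls fs lists the two top-level directories, and ls' fs ["usr"; "bin"] lists the entries of /usr/bin. If the outer "/" wrapper from the question has to be kept, the same functions should still work once "/" is passed as the first path element.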

Related

'Anonymous type variables are not permitted in this declaration' error when adding parameters to discriminated union cases in F#

So I have some (I'm assuming rather unusual) code for building Function Trees. Here it is right now:
type FunctionTree<'Function> =
    | BranchNode of seq<FunctionTree<'Function>>
    | Leaf of (a:'Function -> unit) with
        member __.Execute() = do a
The expression a:'Function -> unit is what makes the compiler throw a fit, giving me the error 'Anonymous type variables are not permitted in this declaration' and I have no idea why. I've tried adding a variable to the BranchNode, adding (yucky) double parentheses around the expression but nothing seems to have worked.
Answer to the compiler error question
This does not compile...
Leaf of (a:'Function -> unit)
...because field names can be attached to the fields of a DU case, not to the parameters inside a function type that appears in a DU case. In contrast, this compiles...
Leaf of a: ('Function -> unit)
...because the field name a now names the whole field, whose type is ('Function -> unit).
Additional discussion about the code
However, there is another issue. The member Execute that you are adding is not being added to the Leaf node, as your code implies. It is being added to the entire function tree. Consequently, you will not have access to the label a inside your implementation of Execute. Think of it like this...
type FunctionTree<'Function> =
    | BranchNode of seq<FunctionTree<'Function>>
    | Leaf of a: ('Function -> unit)
    with member __.Execute() = do a
... with the member shifted to the left to clarify that it applies to the entire union, not just the leaf case. That explains why the above code now has a different compiler error... a is not defined. The field name a is used to clarify the instantiation of a Leaf case. The field name a is not available elsewhere.
let leaf = Leaf(a = myFunc)
Consequently, the label a is not available to your Execute member. You would need to do something like this...
    with member x.Execute(input) =
            match x with
            | BranchNode(b) -> b |> Seq.iter(fun n -> n.Execute(input))
            | Leaf(f) -> f(input) |> ignore
Notice in the above code that the x value is a FunctionTree.
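Put together, a complete version of the type with the member might look like the sketch below. The log list and the functions first and second are made up purely for the illustration, and the input parameter is annotated explicitly, which the fragment above leaves to inference.

```fsharp
type FunctionTree<'Function> =
    | BranchNode of seq<FunctionTree<'Function>>
    | Leaf of a: ('Function -> unit)
    // The member belongs to the whole union, so it has to match on x
    // to get at the function stored in a Leaf.
    member x.Execute(input: 'Function) : unit =
        match x with
        | BranchNode children -> children |> Seq.iter (fun n -> n.Execute(input))
        | Leaf f -> f input

// Hypothetical leaf functions that record what they were called with.
let log = System.Collections.Generic.List<string>()
let first (n: int) = log.Add(sprintf "first %d" n)
let second (n: int) = log.Add(sprintf "second %d" n)

let tree = BranchNode (seq [ Leaf first; Leaf second ])
tree.Execute(3)
```

After the call, log should hold one entry per leaf, in the order the leaves appear in the branch.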
Alternative implementation
We could continue going. However, I think the following may implement what you are aiming for:
type FunctionTree<'T> =
    | BranchNode of seq<FunctionTree<'T>>
    | LeafNode of ('T -> unit)
let rec evaluate input tree =
    match tree with
    | LeafNode(leaf) -> leaf(input)
    | BranchNode(branch) -> branch |> Seq.iter (evaluate input)
BranchNode([
    LeafNode(printfn "%d")
    LeafNode(printfn "%A")
])
|> evaluate 42

Customise FsCheck output

I am testing with FsCheck and NUnit in Visual Studio.
The current problem: I managed to generate random graphs (for testing some graph functionality), but when a test fails, FsCheck prints the whole graph without using ToString. It literally dumps the raw list of records, and you cannot make sense of it.
I would also need to inspect not only the input graph but also some other data that I create while running a property.
So how can I change the output behaviour of FsCheck so that, when a test fails, it (1) actually calls my ToString method on the input graph and (2) outputs further information?
EDIT:
Here is my current test setup.
module GraphProperties
open NUnit.Framework
open FsCheck
open FsCheck.NUnit
let generateRandomGraph =
    gen {
        let graph: Graph<int,int> = Graph<_,_>.Empty()
        // fill in random nodes and transitions...
        return graph
    }
type MyGenerators =
    static member Graph() =
        { new Arbitrary<Graph<int,int>>() with
            override this.Generator = generateRandomGraph
            override this.Shrinker _ = Seq.empty }
[<TestFixture>]
type NUnitTest() =
    [<Property(Arbitrary=[|typeof<MyGenerators>|], QuietOnSuccess = true)>]
    member __.cloningDoesNotChangeTheGraph (originalGraph: Graph<int,int>) =
        let newGraph = clone originalGraph
        newGraph = originalGraph
FsCheck uses sprintf "%A" to convert test parameters to strings in the test output, so what you need to do is control how your types are formatted by the %A formatter. According to How do I customize output of a custom type using printf?, the way to do that is with the StructuredFormatDisplay attribute. The value of that attribute should be a string in the format PreText {PropertyName} PostText, where PropertyName should be a property (not a function!) on your type. E.g., let's say you have a tree structure with some complicated information in the leaves, but for your testing you only need to know about the number of leaves, not what's in them. So you'd start with a data type like this:
// Example 1
type ComplicatedRecord = { ... }
type Tree =
    | Leaf of ComplicatedRecord
    | Node of Tree list
    with
        member x.LeafCount =
            match x with
            | Leaf _ -> 1
            | Node leaves -> leaves |> List.sumBy (fun x -> x.LeafCount)
        override x.ToString() =
            // For test output, we don't care about leaf data, just count
            match x with
            | Leaf _ -> "Tree with a total of 1 leaf"
            | Node _ -> sprintf "Tree with a total of %d leaves" x.LeafCount
Now, so far that's not what you want. This type does not have a custom %A format declared, so FsCheck (and anything else that uses sprintf "%A" to format it) will end up outputting the whole complicated structure of the tree and all its irrelevant-to-the-test leaf data. To make FsCheck output what you want to see, you'll need to set up a property, not a function (ToString won't work for this purpose) that will output what you want to see. E.g.:
// Example 2
type ComplicatedRecord = { ... }
[<StructuredFormatDisplay("{LeafCountAsString}")>]
type Tree =
    | Leaf of ComplicatedRecord
    | Node of Tree list
    with
        member x.LeafCount =
            match x with
            | Leaf _ -> 1
            | Node leaves -> leaves |> List.sumBy (fun x -> x.LeafCount)
        member x.LeafCountAsString = x.ToString()
        override x.ToString() =
            // For test output, we don't care about leaf data, just count
            match x with
            | Leaf _ -> "Tree with a total of 1 leaf"
            | Node _ -> sprintf "Tree with a total of %d leaves" x.LeafCount
NOTE: I haven't tested this in F#, just typed it into the Stack Overflow comment box -- so it's possible that I've messed up the ToString() part. (I don't remember, and can't find with a quick Google, whether overrides should be after or before the with keyword). But I know that the StructuredFormatDisplay attribute is what you want, because I've used this myself to get custom output out of FsCheck.
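For what it's worth, here is a minimal self-contained sketch that can be pasted into F# Interactive to check the behaviour without FsCheck. It assumes a simplified Tree whose leaves hold a plain int (a stand-in for ComplicatedRecord), and the property name Summary is made up for the illustration.

```fsharp
// The attribute tells the %A formatter to print the Summary property
// instead of dumping the whole structure.
[<StructuredFormatDisplay("{Summary}")>]
type Tree =
    | Leaf of int            // stand-in for the complicated leaf data
    | Node of Tree list
    member x.LeafCount : int =
        match x with
        | Leaf _ -> 1
        | Node children -> children |> List.sumBy (fun c -> c.LeafCount)
    // Must be a property, not a method, for StructuredFormatDisplay to use it.
    member x.Summary = sprintf "Tree with a total of %d leaves" x.LeafCount

let sample = Node [ Leaf 1; Node [ Leaf 2; Leaf 3 ] ]
let shown = sprintf "%A" sample
printfn "%s" shown
```

Since FsCheck formats failing inputs with the same %A formatter, the text printed here should be what shows up in a failing test report.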
By the way, you could also have set a StructuredFormatDisplay attribute on the complicated record type in my example as well. For example, if you have a test where you care about the tree structure but not about the contents of the leaves, you'd write it like:
// Example 3
[<StructuredFormatDisplay("LeafRecord")>] // Note no {} and no property
type ComplicatedRecord = { ... }
type Tree =
    | Leaf of ComplicatedRecord
    | Node of Tree list
    with
        member x.LeafCount =
            match x with
            | Leaf _ -> 1
            | Node leaves -> leaves |> List.sumBy (fun x -> x.LeafCount)
        override x.ToString() =
            // For test output, we don't care about leaf data, just count
            match x with
            | Leaf _ -> "Tree with a total of 1 leaf"
            | Node _ -> sprintf "Tree with a total of %d leaves" x.LeafCount
Now all your ComplicatedRecord instances, no matter their contents, will show up as the text LeafRecord in your output, and you'll be better able to focus on the tree structure instead -- and there was no need to set a StructuredFormatDisplay attribute on the Tree type.
This isn't a totally ideal solution, as you might need to adjust the StructuredFormatDisplay attribute from time to time, as needed by the various tests you're running. (For some tests you might want to focus on one part of the leaf data, for others you'd want to ignore the leaf data entirely, and so on). And you'll probably want to take the attribute out before you go to production. But until FsCheck acquires a "Give me a function to format failed test data with" config parameter, this is the best way to get your test data formatted the way you need it.
You can also use labels to display whatever you want when a test fails: https://fscheck.github.io/FsCheck/Properties.html#And-Or-and-Labels

How to implement data structure using functional approach? (Linked list, tree etc)

I am new to functional programming and am learning F#, so sorry if this question is stupid.
I want to get to grips with the syntax and implement some simple data structures, but I don't know how to do it.
What should the implementation of a linked list look like?
I tried to create a type with a mutable property and define a set of methods to work with it, but it looks like an object-oriented linked list...
The basic list type in F# is already an immutable singly linked list.
That said, you can easily recreate a linked list with a simple union type:
type LinkedList<'t> = Node of 't * LinkedList<'t> | End
A node holds a value and a reference to the next node, or it is the end of the list.
You can simply make a new list by hand:
Node(1, Node(2, Node(3, End))) //LinkedList<int> = Node (1,Node (2,Node (3,End)))
Or make a new linked list by feeding it an F# list:
let rec toLinkedList = function
    | [] -> End
    | x::xs -> Node (x, (toLinkedList xs))
Walking through it:
let rec walk = function
    | End -> printfn "%s" "End"
    | Node(value, list) -> printfn "%A" value; walk list
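Beyond printing, the usual list operations can be written the same way, with pattern matching and recursion instead of mutable fields. The functions fold, length and map below are illustrative names, not part of any library, and this is just one possible sketch:

```fsharp
type LinkedList<'t> =
    | Node of 't * LinkedList<'t>
    | End

// Combine all values from the front of the list to the back (tail-recursive).
let rec fold f acc items =
    match items with
    | End -> acc
    | Node (value, rest) -> fold f (f acc value) rest

// Count the nodes by folding with a counter.
let length items = fold (fun n _ -> n + 1) 0 items

// Build a new list with f applied to every value; the input is left unchanged.
// Note: this simple version is not tail-recursive.
let rec map f items =
    match items with
    | End -> End
    | Node (value, rest) -> Node (f value, map f rest)

let sample = Node (1, Node (2, Node (3, End)))
let doubled = map (fun x -> x * 2) sample
```

Nothing is ever modified in place: map returns a new list, which is what makes this feel different from the object-oriented version.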
The same concepts would apply for a tree structure as well.
A tree would look something like
type Tree<'leaf,'node> =
| Leaf of 'leaf
| Node of 'node * Tree<'leaf,'node> list
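As a rough illustration of working with such a tree, the sketch below collects the leaf values and computes the depth. The function names leaves and depth and the sample value are made up for the example.

```fsharp
type Tree<'leaf,'node> =
    | Leaf of 'leaf
    | Node of 'node * Tree<'leaf,'node> list

// All leaf values, left to right.
let rec leaves tree =
    match tree with
    | Leaf x -> [ x ]
    | Node (_, children) -> List.collect leaves children

// Number of levels, counting a single leaf as depth 1.
let rec depth tree =
    match tree with
    | Leaf _ -> 1
    | Node (_, children) -> 1 + (children |> List.map depth |> List.fold max 0)

let sample = Node ("root", [ Leaf 1; Node ("inner", [ Leaf 2; Leaf 3 ]) ])
```

Both functions follow the shape of the type: one case per union case, with recursion over the children of a Node.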
The F# Wikibook has a good article on data structures in F#.

Subset of Union members as "parameter" in Pattern matching

Let us have a type definition for a tree with several types of binary nodes, among other types of nodes, i.e.
type Tree =
| BinaryNodeA of Tree * Tree
| BinaryNodeB of Tree * Tree
| [Other stuff...]
I want to manipulate this tree using a recursive function that could, e.g., swap subnodes of any kind of binary node (by constructing a new node). The problem that is driving me crazy: How to match all BinaryNodes so that Node flavor becomes "a parameter" so as to have generic swap that can be applied to any BinaryNode flavor to return swapped node of that flavor?
I know how to match all Trees that are BinaryNodes by using an active pattern:
let (|BinaryNode|_|) (tree : Tree) =
    match tree with
    | BinaryNodeA _ | BinaryNodeB _ -> Some(tree)
    | _ -> None
But that's not good enough because the following does not seem achievable:
match tree with
| [cases related to unary nodes..]
| BinaryNode a b -> BinaryNode b a
In other words, I have not found a way to use the BinaryNode flavor as if it were a parameter like a and b. Instead, it seems I have to match each BinaryNode flavor separately. This could have practical significance if there were a large number of binary node flavors. The type Tree is the AST for an Fsyacc/Fslex-generated parser/lexer, which limits the options for restructuring it. Any ideas?
You just need to change the definition of your active pattern:
type Flavor = A | B
let (|BinaryNode|_|) (tree : Tree) =
    match tree with
    | BinaryNodeA(x,y) -> Some(A,x,y)
    | BinaryNodeB(x,y) -> Some(B,x,y)
    | _ -> None
let mkBinaryNode f t1 t2 =
    match f with
    | A -> BinaryNodeA(t1,t2)
    | B -> BinaryNodeB(t1,t2)
Then you can achieve what you want like this:
match tree with
| [cases related to unary nodes..]
| BinaryNode(f,t1,t2) -> mkBinaryNode f t2 t1
But if this is a common need then it might make sense to alter the definition of Tree to include flavor instead of dealing with it using active patterns.
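To see the pieces working together, here is a small end-to-end sketch. The Leaf case is a hypothetical stand-in for the "other stuff" in the real AST, and swapAll is an illustrative name for a function that swaps the children of every binary node, whatever its flavor.

```fsharp
type Tree =
    | BinaryNodeA of Tree * Tree
    | BinaryNodeB of Tree * Tree
    | Leaf of int   // hypothetical stand-in for the other node types

type Flavor = A | B

// Expose the flavor alongside the two children.
let (|BinaryNode|_|) (tree : Tree) =
    match tree with
    | BinaryNodeA (x, y) -> Some (A, x, y)
    | BinaryNodeB (x, y) -> Some (B, x, y)
    | _ -> None

// Rebuild a node of the same flavor.
let mkBinaryNode flavor t1 t2 =
    match flavor with
    | A -> BinaryNodeA (t1, t2)
    | B -> BinaryNodeB (t1, t2)

// Swap the children of every binary node, keeping each node's flavor.
let rec swapAll tree =
    match tree with
    | BinaryNode (f, t1, t2) -> mkBinaryNode f (swapAll t2) (swapAll t1)
    | other -> other

let input = BinaryNodeA (Leaf 1, BinaryNodeB (Leaf 2, Leaf 3))
let swapped = swapAll input
```

Adding another binary flavor then means touching only Flavor, the active pattern and mkBinaryNode; swapAll stays as it is.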

Better way to get tree representation of directory using F#?

I am new(ish) to F# and am trying to get a tree representation of a filesystem directory. Here's what I came up with:
type FSEntry =
    | File of name:string
    | Directory of name:string * entries:seq<FSEntry>
let BuildFSDirectoryTreeNonTailRecursive path =
    let rec GetEntries (directoryInfo:System.IO.DirectoryInfo) =
        directoryInfo.EnumerateFileSystemInfos("*", System.IO.SearchOption.TopDirectoryOnly)
        |> Seq.map (fun info ->
            match info with
            | :? System.IO.FileInfo as file -> File (file.Name)
            | :? System.IO.DirectoryInfo as dir -> Directory (dir.Name, GetEntries dir)
            | _ -> failwith "Illegal FileSystemInfo type"
        )
    let directoryInfo = System.IO.DirectoryInfo path
    Directory (path, GetEntries directoryInfo)
But... pretty sure that isn't tail recursive. I took a look at the generated IL and didn't see any tail prefix. Is there a better way to do this? I tried using an accumulator but didn't see how that helps. I tried mutual recursive functions and got nowhere. Maybe a continuation would work but I found that confusing.
(I know that stack-depth won't be an issue in this particular case but still would like to know how to tackle this non-tail recursion problem in general)
OTOH, it does seem to work. The following prints out what I am expecting:
let PrintFSEntry fsEntry =
    let rec printFSEntryHelper indent entry =
        match entry with
        | File name -> printfn "%s%s" indent name
        | Directory(name, entries) ->
            printfn "%s\\%s" indent name
            entries
            |> Seq.sortBy (function | File name -> 0 | Directory (name, entries) -> 1)
            |> Seq.iter (printFSEntryHelper (indent + " "))
    printFSEntryHelper "" fsEntry
This should probably be a different question but... how does one go about testing BuildFSDirectoryTreeNonTailRecursive? I suppose I could create an interface and mock it like I would in C#, but I thought F# had better approaches.
Edited: Based on the initial comments, I specified that I know stack space probably isn't an issue. I also specify I'm mainly concerned with testing the first function.
To expand on my comment from earlier - unless you anticipate working with inputs that would cause a stack overflow without tail recursion, there's nothing to be gained from making a function tail-recursive. For your case, the limiting factor is the ~260 characters in path name, beyond which most Windows APIs will start to break. You'll hit that way before you start running out of stack space due to non-tail recursion.
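On the general question of how to avoid deep non-tail recursion over a tree: one common technique is to keep the pending work in an explicit list and carry the result in an accumulator, so that the only recursive call is in tail position. The sketch below applies this to counting files in the FSEntry type from the question; countFiles and the sample tree are made up for the illustration, and it is a different operation from building the tree.

```fsharp
type FSEntry =
    | File of name: string
    | Directory of name: string * entries: seq<FSEntry>

// Count files using an explicit worklist instead of the call stack.
let countFiles root =
    let rec loop pending count =
        match pending with
        | [] -> count
        | File _ :: rest -> loop rest (count + 1)
        // Push the directory's children onto the worklist and keep going.
        | Directory (_, entries) :: rest -> loop (List.ofSeq entries @ rest) count
    loop [ root ] 0

let sample =
    Directory ("root",
               seq [ File "a.txt"
                     Directory ("sub", seq [ File "b.txt"; File "c.txt" ]) ])
```

Every call to loop is the last thing its branch does, so the depth of the tree no longer translates into stack depth; the pending list grows on the heap instead.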
As for testing, you want your functions to be as close to a pure function as possible. This involves refactoring out the pieces of the function that are side-effecting. This is the case with both of your functions - one of them implicitly depends on the filesystem, the other prints text directly to the standard output.
I guess the refactoring I suggest is fairly close to Mark Seemann's points: few mocks - checked, few interfaces - checked, function composition - checked. The example you have, however, doesn't lend itself nicely to it, because it's an extremely thin veneer over EnumerateFileSystemInfos. I can get rid of System.IO like this:
type FSInfo = DirInfo of string * string | FileInfo of string
let build enumerate path =
    let rec entries path =
        enumerate path
        |> Seq.map (fun info ->
            match info with
            | DirInfo (name, path) -> Directory(name, entries path)
            | FileInfo name -> File name)
    Directory(path, entries path)
And now I'm left with an enumerate: string -> seq<FSInfo> function that can easily be replaced with a test implementation that doesn't even touch the drive. Then the default implementation of enumerate would be:
let enumerateFileSystem path =
    // Fully qualified so the snippet works without "open System.IO"
    let directoryInfo = System.IO.DirectoryInfo(path)
    directoryInfo.EnumerateFileSystemInfos("*", System.IO.SearchOption.TopDirectoryOnly)
    |> Seq.map (fun info ->
        match info with
        | :? System.IO.FileInfo as file -> FileInfo (file.Name)
        | :? System.IO.DirectoryInfo as dir -> DirInfo (dir.Name, dir.FullName)
        | _ -> failwith "Illegal FileSystemInfo type")
You can see that it has virtually the same shape as the build function, minus recursion, since the entire 'core' of your logic is in EnumerateFileSystemInfos which lives beyond your code. This is a slight improvement, not in any way test-induced damage, but still it's not something that will make it onto anyone's slides anytime soon.
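To make the testing point concrete, here is a sketch of how build could be exercised with an in-memory enumerate that never touches the drive. The fakeEnumerate function, its paths and the names helper are hypothetical and exist only for this example.

```fsharp
type FSEntry =
    | File of name: string
    | Directory of name: string * entries: seq<FSEntry>

type FSInfo = DirInfo of string * string | FileInfo of string

let build enumerate path =
    let rec entries path =
        enumerate path
        |> Seq.map (fun info ->
            match info with
            | DirInfo (name, path) -> Directory (name, entries path)
            | FileInfo name -> File name)
    Directory (path, entries path)

// Hypothetical test double: a fixed listing keyed by path.
let fakeEnumerate path =
    match path with
    | "root" -> seq [ FileInfo "a.txt"; DirInfo ("sub", "root/sub") ]
    | "root/sub" -> seq [ FileInfo "b.txt" ]
    | _ -> Seq.empty

// Flatten the tree to a list of names so it is easy to compare in a test.
let rec names entry =
    match entry with
    | File name -> [ name ]
    | Directory (name, entries) -> name :: List.collect names (List.ofSeq entries)

let tree = build fakeEnumerate "root"
```

A test can then compare names tree against the expected listing, while production code passes enumerateFileSystem instead.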
