Can this be made more functional? - f#

Just starting out with f#, I come OO C# background and I have the following code that reads a text file of uk postscodes, it then hits an api end point with post codes, I test the result to see if post is valid or not:
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines postCodeFile
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
let validatePostCode postCode =
Request.createUrl Get (postCodeValidDaterUrl + postCode)
|> getResponse
|> run
let translateResponse response =
match response.statusCode with
| 200 -> true
| _ -> false
let validPostCode = validatePostCode >> translateResponse
lines |> Seq.iter(fun x -> validPostCode(x) |> printfn "%s-%b" x)
Any suggestions on making it more functional?

Critique / code review
You've already made this about as functional as it can be. I'll go through your code and explain why your decisions were good, and in a few cases, where you could make minor improvements.
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
Good to have this as a named variable; in real code, this would of course be a parameter to the script or a function parameter. So having it as a named variable is useful.
let readLines filePath = System.IO.File.ReadLines(filePath)
This function isn't strictly necessary; since System.IO.File.ReadLines takes a single parameter, you would be able to pipe into it (e.g., postCodeFile |> System.IO.File.ReadLines |> Seq.iter(...). But I like the shorter name, so I'd probably write it this way as well.
let lines = readLines postCodeFile
I'd probably leave off creating the lines name, and instead do postCodeFile |> readLines |> Seq.iter (...) in the last line of your code. There's nothing inherently wrong with the way you've done it, but you don't use the lines variable anywhere else so there's no real reason to give it a name. F#'s pipes allow you to skip naming your intermediate steps.
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
Again, good to give this a name so that you can turn it into a parameter or a config file variable later. Only thing I see here that could be improved is the spelling: ValidDater should have been Validator.
let validatePostCode postCode =
Request.createUrl Get (postCodeValidDaterUrl + postCode)
|> getResponse
|> run
Looks good.
let translateResponse response =
match response.statusCode with
| 200 -> true
| _ -> false
Could be simpler: let translateResponse response = (response.statusCode = 200). But the match expression lets you expand it later if you have an API that could return other status codes, like 204, to indicate success as well. I'd probably go with the simpler comparison here, and add the match statement only if it's needed.
let validPostCode = validatePostCode >> translateResponse
Nice.
lines |> Seq.iter(fun x -> validPostCode(x) |> printfn "%s-%b" x)
As I mentioned earlier, lines is an intermediate step, so I'd probably change this to postCodeFile |> readLines |> Seq.iter (...) since that allows you to skip giving names to your intermediate steps.
Why this is good
Two things you did well here:
You wrote each function to do just one thing, and composed functions together to create larger "building blocks" of code. E.g., validatePostCode just sends off a request, and a different function decides whether the response indicates a valid code. This is good.
You separated (as much as you could) your I/O from your business logic. Specifically, the part of the code that validates the post codes doesn't try to read them in, or write out the results; it just says "Is this valid, or not?" That means that if you later need to swap out the API you call, or if you can do some internal checking on post codes that doesn't need to go hit the outside world, you can swap that out easily later. It's usually good practice to write your code in "layers", with I/O as the "outside" layer of your code, validation just "inside" the I/O layer, and then business logic inside the validation layer — so that your business-logic code can trust that it has received only valid data. You don't have any business logic in this simple example, but you have the I/O and validation layers properly separated. Well done.

I don't think the aim here should be to make the code "more functional". Being functional is not an inherent value. In F#, it makes sense to keep the core of your logic functional, but if you are doing a lot of I/O, then it makes sense to follow more imperative style.
My version of your code would look like this (somewhat overlapping with some suggestions by #rmunn):
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
let lines = System.IO.File.ReadLines(postCodeFile)
let validatePostCode postCode =
Request.createUrl Get (postCodeValidDaterUrl + postCode)
|> getResponse
|> run
let translateResponse response =
response.statusCode = 200
for line in lines do
let valid = translateResponse (validatePostCode line)
printfn "%s-%b" line valid
My changes are:
I removed the readLines helper and just call File.ReadLines directly. There is no need to introduce F# alias for .NET methods if it does not serve some greater purpose like providing a more F#-friendly API that is reused in multiple places.
Like #rmunn, I replaced the match with response.statusCode = 200. I only use match when I need to bind new variables as part of matching. When just testing Boolean conditions, I think if is better.
I replaced your composed function and Seq.iter with a normal for loop. The code is imperative anyway, so I do not see why you wouldn't want to use a built-in language construct. I eliminated validPostCode because you're only using the composed function in one place, so introducing it does not simplify code.

This is just my style, it is not necessarily more functional or better:
open System.IO
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
let validatePostCode postCode =
Request.createUrl Get (postCodeValidDaterUrl + postCode)
|> getResponse
|> run
|> fun response -> response.statusCode = 200
File.ReadLines postCodeFile
|> Seq.iter (fun code -> validatePostCode code |> printfn "%s-%b" code )

Related

formatting Composite function in f#

I have a recursive function in f# that iterates a string[] of commands that need to be run, each command runs a new command to generate a map to be passed to the next function.
The commands run correctly but are large and cumbersome to read, I believe that there is a better way to order / format these composite functions using pipe syntax however coming from c# as a lot of us do i for the life of me cannot seem to get it to work.
my command is :
let rec iterateCommands (map:Map<int,string array>) commandPosition =
if commandPosition < commands.Length then
match splitCommand(commands.[0]).[0] with
|"comOne" ->
iterateCommands (map.Add(commandPosition,create(splitCommand commands.[commandPosition])))(commandPosition+1)
The closest i have managed is by indenting the function but this is messy :
iterateCommands
(map.Add
(commandPosition,create
(splitCommand commands.[commandPosition])
)
)
(commandPosition+1)
Is it even possible to reformat this in f#? From what i have read i believe it possible, any help would be greatly appreciated
The command/variable types are:
commandPosition - int
commands - string[]
splitCommand string -> string[]
create string[] -> string[]
map : Map<int,string[]>
and of course the map.add map -> map + x
It's often hard to make out what is going on in a big statement with multiple inputs. I'd give names to the individual expressions, so that a reader can jump into any position and have a rough idea what's in the values used in a calculation, e.g.
let inCommands = splitCommand commands.[commandPosition]
let map' = map.Add (commandPosition, inCommands)
iterateCommands map' inCommands
Since I don't know what is being done here, the names aren't very meaningful. Ideally, they'd help to understand the individual steps of the calculation.
It'd be a bit easier to compose the call if you changed the arguments around:
let rec iterateCommands commandPosition (map:Map<int,string array>) =
// ...
That would enable you to write something like:
splitCommand commands.[commandPosition]
|> create
|> (fun x -> commandPosition, x)
|> map.Add
|> iterateCommands (commandPosition + 1)
The fact that commandPosition appears thrice in the composition is, in my opinion, a design smell, as is the fact that the type of this entire expression is unit. It doesn't look particularly functional, but since I don't understand exactly what this function attempts to do, I can't suggest a better design.
If you don't control iterateCommands, and hence can't change the order of arguments, you can always define a standard functional programming utility function:
let flip f x y = f y x
This enables you to write the following against the original version of iterateCommands:
splitCommand commands.[commandPosition]
|> create
|> (fun x -> commandPosition, x)
|> map.Add
|> (flip iterateCommands) (commandPosition + 1)

Avoid mutation in this example in F#

Coming from an OO background, I am having trouble wrapping my head around how to solve simple issues with FP when trying to avoid mutation.
let mutable run = true
let player1List = ["he"; "ho"; "ha"]
let addValue lst value =
value :: lst
while run do
let input = Console.ReadLine()
addValue player1List input |> printfn "%A"
if player1List.Length > 5 then
run <- false
printfn "all done" // daz never gunna happen
I know it is ok to use mutation in certain cases, but I am trying to train myself to avoid mutation as the default. With that said, can someone please show me an example of the above w/o using mutation in F#?
The final result should be that player1List continues to grow until the length of items are 6, then exit and print 'all done'
The easiest way is to use recursion
open System
let rec makelist l =
match l |> List.length with
|6 -> printfn "all done"; l
| _ -> makelist ((Console.ReadLine())::l)
makelist []
I also removed some the addValue function as it is far more idiomatic to just use :: in typical F# code.
Your original code also has a common problem for new F# coders that you use run = false when you wanted run <- false. In F#, = is always for comparison. The compiler does actually warn about this.
As others already explained, you can rewrite imperative loops using recursion. This is useful because it is an approach that always works and is quite fundamental to functional programming.
Alternatively, F# provides a rich set of library functions for working with collections, which can actually nicely express the logic that you need. So, you could write something like:
let player1List = ["he"; "ho"; "ha"]
let player2List = Seq.initInfinite (fun _ -> Console.ReadLine())
let listOf6 = Seq.append player1List list2 |> Seq.take 6 |> List.ofSeq
The idea here is that you create an infinite lazy sequence that reads inputs from the console, append it at the end of your initial player1List and then take first 6 elements.
Depending on what your actual logic is, you might do this a bit differently, but the nice thing is that this is probably closer to the logic that you want to implement...
In F#, we use recursion to do loop. However, if you know how many times you need to iterate, you could use F# List.fold like this to hide the recursion implementation.
[1..6] |> List.fold (fun acc _ -> Console.ReadLine()::acc) []
I would remove the pipe from match for readability but use it in the last expression to avoid extra brackets:
open System
let rec makelist l =
match List.length l with
| 6 -> printfn "all done"; l
| _ -> Console.ReadLine()::l |> makelist
makelist []

Working with large text files?

I need to import a large text file (55MB) (525000 * 25) and manipulate the data and produce some output. As usual I started exploring with f# interactive, and I get some really strange behaviours.
Is this file too large or my code wrong?
First test was to import and simply comute the sum over one column (not the end goal but first test):
let calctest =
let reader = new StreamReader(path)
let csv = reader.ReadToEnd()
csv.Split([|'\n'|])
|> Seq.skip 1
|> Seq.map (fun line -> line.Split([|','|]))
|> Seq.filter (fun a -> a.[11] = "M")
|> Seq.map (fun values -> float(values.[14]))
As expected this produces a seq of float both in typecheck and in interactive. If I know add:
|> Seq.sum
Type check works and says this function should return a float but if I run it in interactive I get this error:
System.IndexOutOfRangeException: Index was outside the bounds of the array
Then I removed the last line again and thought I look at the seq of float in a text file:
let writetest =
let str = calctest |> Seq.map (fun i -> i.ToString())
System.IO.File.WriteAllLines("test.txt", str )
Again, this passes the type check but throws errors in interactive.
Can the standard StreamReader not handle that amount of data? or am I going wrong somewhere? Should I use a different function then Streamreader?
Thanks.
Seq is lazy, which means that only when you add the Seq.sum is all the mapping and filtering actually being done, that's why you don't see the error before adding that line. Are you sure you have 15 columns on all rows? That's probably the problem
I would advise you to use the CSV Type Provider instead of just doing a string.Split, that way you'll be sure to not have an accidental IndexOutOfRangeException, and you'll handle , escaping correctly.
Additionaly, you're reading the whole csv file into memory by calling reader.ReadToEnd(), the CsvProvider supports streaming if you set the Cache parameter to false. It's not a problem with a 55MB file, but if you have something much larger it might be

How to invoke the function in Seq.whatever without "printf"?

I'm new to f# and I tried to write a program supposed to go through all files in a given dir and for each file of type ".txt" to add an id number + "DONE" to the file.
my program:
//const:
[<Literal>]
let notImportantString= "blahBlah"
let mutable COUNT = 1.0
//funcs:
//addNumber --> add the sequence number COUNT to each file.
let addNumber (file : string) =
let mutable str = File.ReadAllText(file)
printfn "%s" str//just for check
let num = COUNT.ToString()
let str4 = str + " " + num + "\n\n\n DONE"
COUNT <- COUNT + 1.0
let str2 = File.WriteAllText(file,str4)
file
//matchFunc --> check if is ".txt"
let matchFunc (file : string) =
file.Contains(".txt")
//allFiles --> go through all files of a given dir
let allFiles dir =
seq
{ for file in Directory.GetFiles(dir) do
yield file
}
////////////////////////////
let dir = "D:\FSharpTesting"
let a = allFiles dir
|> Seq.filter(matchFunc)
|> Seq.map(addNumber)
printfn "%A" a
My question:
Tf I do not write the last line (printfn "%A" a) the files will not change.(if I DO write this line it works and change the files)
when I use debugger I see that it doesn't really computes the value of 'a' when it arrives to the line if "let a =......" it continues to the printfn line and than when it "sees" the 'a' there it goes back and computes the answer of 'a'.
why is it and how can i "start" the function without printing??
also- Can some one tells me why do I have to add file as a return type of the function "addNumber"?
(I added this because that how it works but I don't really understand why....)
last question-
if I write the COUNT variable right after the line of the [] definition
it gives an error and says that a constant cannot be "mutable" but if a add (and this is why I did so) another line before (like the string) it "forgets" the mistakes and works.
why that? and if you really cannot have a mutable const how can I do a static variable?
if I do not write the last line (printfn "%A" a) the files will not change.
F# sequences are lazy. So to force evaluation, you can execute some operation not returning a sequence. For example, you can call Seq.iter (have side effects, return ()), Seq.length (return an int which is the length of the sequence) or Seq.toList (return a list, an eager data structure), etc.
Can some one tells me why do I have to add file : string as a return type of the function "addNumber"?
Method and property access don't play nice with F# type inference. The type checker works from left to right, from top to bottom. When you say file.Contains, it doesn't know which type this should be with Contains member. Therefore, your type annotation is a good hint to F# type checker.
if I write the COUNT variable right after the line of the [<Literal>] definition
it gives an error and says that a constant cannot be "mutable"
Quoting from MSDN:
Values that are intended to be constants can be marked with the Literal attribute. This attribute has the effect of causing a value to be compiled as a constant.
A mutable value can change its value at some point in your program; the compiler complains for a good reason. You can simply delete [<Literal>] attribute.
To elaborate on Alex's answer -- F# sequences are lazily evaluated. This means that each element in the sequence is generated "on demand".
The benefit of this is that you don't waste computation time and memory on elements you don't ever need. Lazy evaluation does take a little getting used to though -- specifically because you can't assume order of execution (or that execution will even happen at all).
Your problem has a simple fix: just use Seq.iter to force execution/evaluation of the sequence, and pass the 'ignore' function to it since we don't care about the values returned by the sequence.
let a = allFiles dir
|> Seq.filter(matchFunc)
|> Seq.map(addNumber)
|> Seq.iter ignore // Forces the sequence to execute
Seq.map is intended to map one value to another, not generally to mutate a value. seq<_> represents a lazily generated sequence so, as Alex pointed out, nothing will happen until the sequence is enumerated. This is probably a better fit for codereview, but here's how I would write this:
Directory.EnumerateFiles(dir, "*.txt")
|> Seq.iteri (fun i path ->
let text = File.ReadAllText(path)
printfn "%s" text
let text = sprintf "%s %d\n\n\n DONE" text (i + 1)
File.WriteAllText(path, text))
Seq.map requires a return type, as do all expressions in F#. If a function performs an action, as opposed to computing a value, it can return unit: (). Regarding COUNT, a value cannot be mutable and [<Literal>] (const in C#). Those are precise opposites. For a static variable, use a module-scoped let mutable binding:
module Counter =
let mutable count = 1
open Counter
count <- count + 1
But you can avoid global mutable data by making count a function with a counter variable as a part of its private implementation. You can do this with a closure:
let count =
let i = ref 0
fun () ->
incr i
!i
let one = count()
let two = count()
f# is evaluated from top to bottom, but you are creating only lazy values until you do printfn. So, printfn is actually the first thing that gets executed which in turn executes the rest of your code. I think you can do the same thing if you tack on a println after Seq.map(addNumber) and do toList on it which will force evaluation as well.
This is a general behaviour of lazy sequence. you have the same in, say C# using IEnumerable, for which seq is an alias.
In pseudo code :
var lazyseq = "abcdef".Select(a => print a); //does not do anything
var b = lazyseq.ToArray(); //will evaluate the sequence
ToArray triggers the evaluation of a sequence :
This illustrate the fact that a sequence is just a description, and does not tell you when it will be enumerated : this is in control of the consumer of the sequence.
To go a bit further on the subject, you might want to look at this page from F# wikibook:
let isNebraskaCity_bad city =
let cities =
printfn "Creating cities Set"
["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
|> Set.ofList
cities.Contains(city)
let isNebraskaCity_good =
let cities =
printfn "Creating cities Set"
["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
|> Set.ofList
fun city -> cities.Contains(city)
Most notably, Sequence are not cached (although you can make them so). You see then that the dintinguo between the description and the runtime behaviour can have important consequence as the sequence itself is recomputed which can incur a very high cost and introduce quadratic number of operations if each value is itself linear to get !

F# currying efficiency?

I have a function that looks as follows:
let isInSet setElems normalize p =
normalize p |> (Set.ofList setElems).Contains
This function can be used to quickly check whether an element is semantically part of some set; for example, to check if a file path belongs to an html file:
let getLowerExtension p = (Path.GetExtension p).ToLowerInvariant()
let isHtmlPath = isInSet [".htm"; ".html"; ".xhtml"] getLowerExtension
However, when I use a function such as the above, performance is poor since evaluation of the function body as written in "isInSet" seems to be delayed until all parameters are known - in particular, invariant bits such as (Set.ofList setElems).Contains are reevaluated each execution of isHtmlPath.
How can best I maintain F#'s succint, readable nature while still getting the more efficient behavior in which the set construction is preevaluated.
The above is just an example; I'm looking for a general approach that avoids bogging me down in implementation details - where possible I'd like to avoid being distracted by details such as the implementation's execution order since that's usually not important to me and kind of undermines a major selling point of functional programming.
As long as F# doesn't differentiate between pure and impure code, I doubt we'll see optimisations of that kind. You can, however, make the currying explicit.
let isInSet setElems =
let set = Set.ofList setElems
fun normalize p -> normalize p |> set.Contains
isHtmlSet will now call isInSet only once to obtain the closure, at the same time executing ofList.
The answer from Kha shows how to optimize the code manually by using closures directly. If this is a frequent pattern that you need to use often, it is also possible to define a higher-order function that constructs the efficient code from two functions - the first one that does pre-processing of some arguments and a second one which does the actual processing once it gets the remaining arguments.
The code would look like this:
let preProcess finit frun preInput =
let preRes = finit preInput
fun input -> frun preRes input
let f : string list -> ((string -> string) * string) -> bool =
preProcess
Set.ofList // Pre-processing of the first argument
(fun elemsSet (normalize, p) -> // Implements the actual work to be
normalize p |> elemsSet.Contains) // .. done once we get the last argument
It is a question whether this is more elegant though...
Another (crazy) idea is that you could use computation expressions for this. The definition of computation builder that allows you to do this is very non-standard (it is not something that people usually do with them and it isn't in any way related to monads or any other theory). However, it should be possible to write this:
type CurryBuilder() =
member x.Bind((), f:'a -> 'b) = f
member x.Return(a) = a
let curry = new CurryBuilder()
In the curry computation, you can use let! to denote that you want to take the next argument of the function (after evaluating the preceeding code):
let f : string list -> (string -> string) -> string -> bool = curry {
let! elems = ()
let elemsSet = Set.ofList elems
printf "elements converted"
let! normalize = ()
let! p = ()
printf "calling"
return normalize p |> elemsSet.Contains }
let ff = f [ "a"; "b"; "c" ] (fun s -> s.ToLower())
// Prints 'elements converted' here
ff "C"
ff "D"
// Prints 'calling' two times
Here are some resources with more information about computation expressions:
The usual way of using computation expressions is described in free sample chapter of my book: Chapter 12: Sequence Expressions and Alternative Workflows (PDF)
The example above uses some specifics of the translation which is in full detailes described in the F# specification (PDF)
#Kha's answer is spot on. F# cannot rewrite
// effects of g only after both x and y are passed
let f x y =
let xStuff = g x
h xStuff y
into
// effects of g once after x passed, returning new closure waiting on y
let f x =
let xStuff = g x
fun y -> h xStuff y
unless it knows that g has no effects, and in the .NET Framework today, it's usually impossible to reason about the effects of 99% of all expressions. Which means the programmer is still responsible for explicitly coding evaluation order as above.
Currying does not hurt. Currying sometimes introduces closures as well. They are usually efficient too.
refer to this question I asked before. You can use inline to boost performance if necessary.
However, your performance problem in the example is mainly due to your code:
normalize p |> (Set.ofList setElems).Contains
here you need to perform Set.ofList setElems even you curry it. It costs O(n log n) time.
You need to change the type of setElems to F# Set, not List now. Btw, for small set, using lists is faster than sets even for querying.

Resources