Useful F# Scripts - f#

I have been investigating F# for development and have found that, for my situation, it currently adds the most value in scripts that simplify complex tasks.
My most common complex task is concatenating files (mostly SQL related).
I do this often, and every time I try to improve my F# script for it.
This is my best effort so far:
open System.IO
let path = "C:\\FSharp\\"
let pattern = "*.txt"
let out_path = path + "concat.out"
File.Delete(out_path)
Directory.GetFiles(path, pattern)
|> Array.collect (fun file -> File.ReadAllLines(file))
|> (fun content -> File.WriteAllLines(out_path, content) )
I'm sure others have scripts that make their complex or boring tasks easier.
What F# scripts have you used to do this or what other purposes for F# scripts have you found useful?
I found the best way for me to improve my F# was to browse other scripts to get ideas on how to tackle specific situations. Hopefully this question will help me and others in the future. :)
I have found an article on generating F# scripts that may be of interest:
http://blogs.msdn.com/chrsmith/archive/2008/09/12/scripting-in-f.aspx

I use F# in a similar way when I need to quickly pre-process some data or convert data between various formats. F# has a great advantage that you can create higher-order functions for doing all sorts of similar tasks.
For example, I needed to load some data from SQL database and generate Matlab script files that load the data. I needed to do this for a couple of different SQL queries, so I wrote these two functions:
open System.Data
open System.Data.SqlClient
open System.IO
// Runs the specified query 'str' and reads results using 'f'
let query str f = seq {
    let conn = new SqlConnection("<conn.str>")
    let cmd = new SqlCommand(str, conn)
    conn.Open()
    use rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection)
    while rdr.Read() do yield f(rdr) }
// Simple function to save all data to the specified file
let save file data =
    File.WriteAllLines(@"C:\...\" + file, data |> Array.ofSeq)
Now I could easily write specific calls to read the data I need, convert them to F# data types, do some pre-processing (if needed) and print the outputs to a file. For example, for processing companies I had something like:
let comps =
    query "SELECT [ID], [Name] FROM [Companies] ORDER BY [ID]"
          (fun rdr -> rdr.GetString(1) )
let cdata =
    seq { yield "function res = companies()"
          yield " res = {"
          for name in comps do yield sprintf " %s" name
          yield " };"
          yield "end" }
save "companies.m" cdata
Generating output as a sequence of strings is also pretty neat, though you could probably write a more efficient computation builder using StringBuilder.
Another example of using F# interactively is described in Chapter 13 of my functional programming book (you can get the source code here). It connects to the World Bank database (which contains a lot of information about various countries), extracts some data, explores the structure of the data, converts it to F# data types, and calculates and visualizes some results. I think this is one of many kinds of tasks that can be done very nicely in F#.

Sometimes, if I want a brief overview of an XML structure (or a recursive list to use elsewhere, such as in searches), I can print out a tabbed list of the nodes in the XML using the following script:
open System
open System.Xml
let path = "C:\\XML\\"
let xml_file = path + "Test.xml"
let print_element level (node:XmlNode) =
    [ for tabs in 1..level -> " " ] @ [node.Name]
    |> (String.concat "" >> printfn "%s")
let rec print_tree level (element:XmlNode) =
    element
    |> print_element level
    |> (fun _ -> [ for child in element.ChildNodes -> print_tree (level+1) child ])
    |> ignore
new XmlDocument()
|> (fun doc -> doc.Load(xml_file); doc)
|> (fun doc -> print_tree 0 doc.DocumentElement)
I am sure it can be optimised or cut down, and I would welcome others' improvements on this code. :)

(For an alternative snippet see the answer below.)
This snippet transforms an XML document using an XSLT. I wasn't sure of the best way to use the XslCompiledTransform and XmlDocument objects in F#, but it seemed to work. I am sure there are better ways and would be happy to hear about them.
(* Transforms an XML document given an XSLT. *)
open System.IO
open System.Text
open System.Xml
open System.Xml.Xsl
let path = "C:\\XSL\\"
let file_xml = path + "test.xml"
let file_xsl = path + "xml-to-xhtml.xsl"
(* Compile XSL file to allow transforms *)
let compile_xsl (xsl_file:string) =
    new XslCompiledTransform() |> (fun compiled -> compiled.Load(xsl_file); compiled)
let load_xml (xml_file:string) =
    new XmlDocument() |> (fun doc -> doc.Load(xml_file); doc)
(* Transform an XML document given an XSL (compiled) *)
let transform (xsl_file:string) (xml_file:string) =
    new MemoryStream()
    |> (fun mem -> (compile_xsl xsl_file).Transform((load_xml xml_file), new XmlTextWriter(mem, Encoding.UTF8)); mem)
    |> (fun mem -> mem.Position <- (int64)0; mem.ToArray())
(* Return an XML document that has been transformed *)
transform file_xsl file_xml
|> (fun bytes -> File.WriteAllBytes(path + "out.html", bytes))

After clarifying approaches to writing F# code with existing .NET classes, the following useful code came up for transforming XML documents given XSL documents. Because the function is curried, you can also create a custom function that transforms XML documents with one specific XSL document (see example):
let transform =
    (fun xsl ->
        let xsl_doc = new XslCompiledTransform()
        xsl_doc.Load(string xsl)
        (fun xml ->
            let doc = new XmlDocument()
            doc.Load(string xml)
            let mem = new MemoryStream()
            xsl_doc.Transform(doc.CreateNavigator(), null, mem)
            mem
        )
    )
This allows you to transform docs this way:
let result = transform "report.xsl" "report.xml"
or you can create another function which can be used multiple times:
let transform_report = transform "report.xsl"
let reports = [| "report1.xml"; "report2.xml" |]
let results = [ for report in reports -> transform_report report ]

Related

Can this be made more functional?

I'm just starting out with F#, coming from an OO C# background. The following code reads a text file of UK postcodes and hits an API endpoint with each postcode; I then test the result to see whether the postcode is valid:
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines postCodeFile
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
let validatePostCode postCode =
    Request.createUrl Get (postCodeValidDaterUrl + postCode)
    |> getResponse
    |> run
let translateResponse response =
    match response.statusCode with
    | 200 -> true
    | _ -> false
let validPostCode = validatePostCode >> translateResponse
lines |> Seq.iter(fun x -> validPostCode(x) |> printfn "%s-%b" x)
Any suggestions on making it more functional?
Critique / code review
You've already made this about as functional as it can be. I'll go through your code and explain why your decisions were good, and in a few cases, where you could make minor improvements.
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
Good to have this as a named variable; in real code, this would of course be a parameter to the script or a function parameter. So having it as a named variable is useful.
let readLines filePath = System.IO.File.ReadLines(filePath)
This function isn't strictly necessary; since System.IO.File.ReadLines takes a single parameter, you would be able to pipe into it (e.g., postCodeFile |> System.IO.File.ReadLines |> Seq.iter (...)). But I like the shorter name, so I'd probably write it this way as well.
let lines = readLines postCodeFile
I'd probably leave off creating the lines name, and instead do postCodeFile |> readLines |> Seq.iter (...) in the last line of your code. There's nothing inherently wrong with the way you've done it, but you don't use the lines variable anywhere else so there's no real reason to give it a name. F#'s pipes allow you to skip naming your intermediate steps.
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
Again, good to give this a name so that you can turn it into a parameter or a config file variable later. Only thing I see here that could be improved is the spelling: ValidDater should have been Validator.
let validatePostCode postCode =
    Request.createUrl Get (postCodeValidDaterUrl + postCode)
    |> getResponse
    |> run
Looks good.
let translateResponse response =
    match response.statusCode with
    | 200 -> true
    | _ -> false
Could be simpler: let translateResponse response = (response.statusCode = 200). But the match expression lets you expand it later if you have an API that could return other status codes, like 204, to indicate success as well. I'd probably go with the simpler comparison here, and add the match statement only if it's needed.
let validPostCode = validatePostCode >> translateResponse
Nice.
lines |> Seq.iter(fun x -> validPostCode(x) |> printfn "%s-%b" x)
As I mentioned earlier, lines is an intermediate step, so I'd probably change this to postCodeFile |> readLines |> Seq.iter (...) since that allows you to skip giving names to your intermediate steps.
Why this is good
Two things you did well here:
You wrote each function to do just one thing, and composed functions together to create larger "building blocks" of code. E.g., validatePostCode just sends off a request, and a different function decides whether the response indicates a valid code. This is good.
You separated (as much as you could) your I/O from your business logic. Specifically, the part of the code that validates the post codes doesn't try to read them in, or write out the results; it just says "Is this valid, or not?" That means that if you later need to swap out the API you call, or if you can do some internal checking on post codes that doesn't need to go hit the outside world, you can swap that out easily later. It's usually good practice to write your code in "layers", with I/O as the "outside" layer of your code, validation just "inside" the I/O layer, and then business logic inside the validation layer — so that your business-logic code can trust that it has received only valid data. You don't have any business logic in this simple example, but you have the I/O and validation layers properly separated. Well done.
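The layering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `looksLikePostCode` and `checkAll` are hypothetical names, the regular expression is a rough approximation rather than the official UK postcode rules, and an in-memory list stands in for the file.

```fsharp
open System.Text.RegularExpressions

// Validation layer: pure, no I/O. The pattern is a rough, hypothetical
// approximation of a UK postcode, used only for illustration.
let looksLikePostCode (code: string) =
    Regex.IsMatch(code.Trim().ToUpperInvariant(),
                  @"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")

// Pure core: accepts any validator, so an HTTP-based one could be swapped in.
let checkAll (isValid: string -> bool) (codes: seq<string>) =
    codes |> Seq.map (fun code -> code, isValid code) |> List.ofSeq

// I/O layer: in the real script the input would come from File.ReadLines.
let results = checkAll looksLikePostCode [ "SW1A 1AA"; "not a code" ]
for (code, ok) in results do
    printfn "%s-%b" code ok
```

Because `checkAll` only sees a `string -> bool`, the composed `validPostCode` from the question could be passed in unchanged.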
I don't think the aim here should be to make the code "more functional". Being functional is not an inherent value. In F#, it makes sense to keep the core of your logic functional, but if you are doing a lot of I/O, then it makes sense to follow a more imperative style.
My version of your code would look like this (somewhat overlapping with some suggestions by @rmunn):
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
let lines = System.IO.File.ReadLines(postCodeFile)
let validatePostCode postCode =
    Request.createUrl Get (postCodeValidDaterUrl + postCode)
    |> getResponse
    |> run
let translateResponse response =
    response.statusCode = 200
for line in lines do
    let valid = translateResponse (validatePostCode line)
    printfn "%s-%b" line valid
My changes are:
I removed the readLines helper and just call File.ReadLines directly. There is no need to introduce an F# alias for a .NET method if it does not serve some greater purpose, like providing a more F#-friendly API that is reused in multiple places.
Like @rmunn, I replaced the match with response.statusCode = 200. I only use match when I need to bind new variables as part of matching. When just testing Boolean conditions, I think if is better.
I replaced your composed function and Seq.iter with a normal for loop. The code is imperative anyway, so I do not see why you wouldn't want to use a built-in language construct. I eliminated validPostCode because you're only using the composed function in one place, so introducing it does not simplify code.
This is just my style, it is not necessarily more functional or better:
open System.IO
let postCodeFile = "c:\\tmp\\randomPostCodes.txt"
let postCodeValidDaterUrl = "https://api.postcodes.io/postcodes/"
let validatePostCode postCode =
    Request.createUrl Get (postCodeValidDaterUrl + postCode)
    |> getResponse
    |> run
    |> fun response -> response.statusCode = 200
File.ReadLines postCodeFile
|> Seq.iter (fun code -> validatePostCode code |> printfn "%s-%b" code )

Custom FsCheck Arbitrary type broken in Xunit but working in LINQPad and regular F# program

I'm trying to implement a custom Arbitrary that generates glob syntax patterns like a*c?. I think my implementation is correct, it's just that, when running the test with Xunit, FsCheck doesn't seem to be using the custom arbitrary Pattern to generate the test data. When I use LINQPad however everything works as expected. Here's the code:
open Xunit
open FsCheck
type Pattern = Pattern of string with
    static member op_Explicit(Pattern s) = s
type MyArbitraries =
    static member Pattern() =
        (['a'..'c']@['?'; '*'])
        |> Gen.elements
        |> Gen.nonEmptyListOf
        |> Gen.map (List.map string >> List.fold (+) "")
        |> Arb.fromGen
        |> Arb.convert Pattern string
Arb.register<MyArbitraries>() |> ignore
[<Fact>]
let test () =
    let prop (Pattern p) = p.Length = 0
    Check.QuickThrowOnFailure prop
This is the output:
Falsifiable, after 2 tests (0 shrinks) (StdGen (1884571966,296370531)): Original: Pattern null with exception: System.NullReferenceException ...
And here is the code I'm running in LINQPad along with the output:
open FsCheck
type Pattern = Pattern of string with
    static member op_Explicit(Pattern s) = s
type MyArbitraries =
    static member Pattern() =
        (['a'..'c']@['?'; '*'])
        |> Gen.elements
        |> Gen.nonEmptyListOf
        |> Gen.map (List.map string >> List.fold (+) "")
        |> Arb.fromGen
        |> Arb.convert Pattern string
Arb.register<MyArbitraries>() |> ignore
let prop (Pattern p) = p.Length = 0
Check.Quick prop
Falsifiable, after 1 test (0 shrinks) (StdGen (1148389153,296370531)): Original: Pattern "a*"
As you can see FsCheck generates a null value for the Pattern in the Xunit test although I'm using Gen.elements and Gen.nonEmptyListOf to control the test data. Also, when I run it a couple times, I'm seeing test patterns that are out of the specified character range. In LINQPad those patterns are generated correctly. I also tested the same with a regular F# console application in Visual Studio 2017 and there the custom Arbitrary works as expected as well.
What is going wrong? Is FsCheck falling back to the default string Arbitrary when running in Xunit?
You can clone this repo to see for yourself: https://github.com/bert2/GlobMatcher
(I don't want to use Prop.forAll, because each test will have multiple custom Arbitrarys and Prop.forAll doesn't go well with that. As far as I know I can only tuple them up, because the F# version of Prop.forAll only accepts a single Arbitrary.)
Don't use Arb.register. This method mutates global state, and due to the built-in parallelism support in xUnit.net 2, it's undetermined when it runs.
If you don't want to use the FsCheck.Xunit Glue Library, you can use Prop.forAll, which works like this:
[<Fact>]
let test () =
    let prop (Pattern p) = p.Length = 0
    Check.QuickThrowOnFailure (Prop.forAll (MyArbitraries.Pattern()) prop)
(I'm writing this partially from memory, so I may have made some small syntax mistakes, but hopefully, this should give you an idea on how to proceed.)
If, on the other hand, you choose to use FsCheck.Xunit, you can register your custom Arbitraries in a Property annotation, like this:
[<Property(Arbitrary = [|typeof<MyArbitraries>|])>]
let test (Pattern p) = p.Length = 0
As you can see, this takes care of much of the boilerplate; you don't even have to call Check.QuickThrowOnFailure.
The Arbitrary property takes an array of types, so when you have more than one, this still works.
If you need to write many properties with the same array of Arbitraries, you can create your own custom attribute that derives from the [<Property>] attribute. Here's an example:
type Letters =
    static member Char() =
        Arb.Default.Char()
        |> Arb.filter (fun c -> 'A' <= c && c <= 'Z')
type DiamondPropertyAttribute() =
    inherit PropertyAttribute(
        Arbitrary = [| typeof<Letters> |],
        QuietOnSuccess = true)
[<DiamondProperty>]
let ``Diamond is non-empty`` (letter : char) =
    let actual = Diamond.make letter
    not (String.IsNullOrWhiteSpace actual)
All that said, I'm not too fond of 'registering' Arbitraries like this. I much prefer using the combinator library, because it's type-safe, which this whole type-based mechanism isn't.

How to traverse String[][] in F#

Context: Microsoft Visual Studio 2015 Community; F#
I've been learning F# for about 1/2 a day. I do have a vague idea of how to do functional programming from a year spent fiddling with mLite.
The following script traverses a folder tree and pulls in log files. The files have entries delimited by ~ and there may be one or more there.
open System
open System.IO
let files =
    System.IO.Directory.GetFiles("C:\\scratch\\snapshots\\", "*.log", SearchOption.AllDirectories)
let readFile (file: string) =
    //Console.WriteLine(file)
    let text = File.ReadAllText(file)
    text
let dataLines (line: string) =
    line.Split('~')
let data =
    files |> Array.map readFile |> Array.map dataLines
So at this point data contains a String[][] and I'm at a bit of a loss to figure out how to turn it into a String[], the idea being that I want to convert all the logs into one long vector so that I can do some other transformations on it. For example, each log line begins with a datetime so having turned it all into one long list I can then sort on the datetime.
Where to from here?
As stated in the comments, you can use Array.concat:
files |> Array.map readFile |> Array.map dataLines |> Array.concat
Now some refactoring: the composition of two maps is equivalent to a single map of the composition of both functions.
files |> Array.map (readFile >> dataLines) |> Array.concat
Finally map >> concat is equivalent to collect. So your code becomes:
files |> Array.collect (readFile >> dataLines)
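The equivalence can be checked on in-memory data. A minimal sketch, assuming the strings in `texts` stand in for the contents of the log files (they are made up for illustration):

```fsharp
// Stand-ins for the file contents: each string plays the role of one log file.
let texts = [| "a~b"; "c"; "d~e~f" |]
let dataLines (text: string) = text.Split('~')

// map followed by concat ...
let viaConcat = texts |> Array.map dataLines |> Array.concat
// ... yields the same flat array as a single collect.
let viaCollect = texts |> Array.collect dataLines

printfn "%A" viaCollect
printfn "%b" (viaConcat = viaCollect)
```

Either result is a plain string[], so it can be passed on to Array.sort or Array.sortBy for the datetime ordering mentioned in the question.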

formatting Composite function in f#

I have a recursive function in F# that iterates over a string[] of commands that need to be run; each command runs a new command to generate a map to be passed to the next function.
The commands run correctly but are large and cumbersome to read. I believe there is a better way to order and format these composite functions using pipe syntax; however, coming from C# as a lot of us do, I cannot for the life of me get it to work.
My command is:
let rec iterateCommands (map:Map<int,string array>) commandPosition =
    if commandPosition < commands.Length then
        match splitCommand(commands.[0]).[0] with
        |"comOne" ->
            iterateCommands (map.Add(commandPosition,create(splitCommand commands.[commandPosition])))(commandPosition+1)
The closest I have managed is indenting the function call, but this is messy:
iterateCommands
    (map.Add
        (commandPosition,create
            (splitCommand commands.[commandPosition])
        )
    )
    (commandPosition+1)
Is it even possible to reformat this in F#? From what I have read I believe it is; any help would be greatly appreciated.
The command/variable types are:
commandPosition : int
commands : string[]
splitCommand : string -> string[]
create : string[] -> string[]
map : Map<int,string[]>
and of course map.Add : map -> map + x
It's often hard to make out what is going on in a big statement with multiple inputs. I'd give names to the individual expressions, so that a reader can jump into any position and have a rough idea what's in the values used in a calculation, e.g.
let inCommands = splitCommand commands.[commandPosition]
let map' = map.Add (commandPosition, create inCommands)
iterateCommands map' (commandPosition + 1)
Since I don't know what is being done here, the names aren't very meaningful. Ideally, they'd help to understand the individual steps of the calculation.
It'd be a bit easier to compose the call if you changed the arguments around:
let rec iterateCommands commandPosition (map:Map<int,string array>) =
    // ...
That would enable you to write something like:
splitCommand commands.[commandPosition]
|> create
|> (fun x -> commandPosition, x)
|> map.Add
|> iterateCommands (commandPosition + 1)
The fact that commandPosition appears thrice in the composition is, in my opinion, a design smell, as is the fact that the type of this entire expression is unit. It doesn't look particularly functional, but since I don't understand exactly what this function attempts to do, I can't suggest a better design.
If you don't control iterateCommands, and hence can't change the order of arguments, you can always define a standard functional programming utility function:
let flip f x y = f y x
This enables you to write the following against the original version of iterateCommands:
splitCommand commands.[commandPosition]
|> create
|> (fun x -> commandPosition, x)
|> map.Add
|> (flip iterateCommands) (commandPosition + 1)
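To see the shape of that pipeline in isolation, here is a small self-contained sketch. `describe` is a hypothetical stand-in that only mirrors the argument order of the original iterateCommands (map first, then position); it is not the asker's function.

```fsharp
let flip f x y = f y x

// Hypothetical stand-in with the same argument order as the original
// iterateCommands: the map first, then the position.
let describe (map: Map<int, string[]>) (position: int) =
    sprintf "%d entries, next position %d" map.Count position

let position = 0
let map : Map<int, string[]> = Map.empty

let summary =
    [| "comOne"; "arg" |]
    |> (fun x -> position, x)           // pair the value with its key
    |> map.Add                          // the method accepts the piped tuple
    |> (flip describe) (position + 1)   // supply the second argument first

printfn "%s" summary
```

`flip describe` has type int -> Map<int, string[]> -> string, so applying it to the position leaves a function that takes the map from the pipe.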

Working with large text files?

I need to import a large text file (55MB) (525000 * 25) and manipulate the data and produce some output. As usual I started exploring with f# interactive, and I get some really strange behaviours.
Is this file too large or my code wrong?
The first test was to import the file and simply compute the sum over one column (not the end goal, just a first test):
let calctest =
    let reader = new StreamReader(path)
    let csv = reader.ReadToEnd()
    csv.Split([|'\n'|])
    |> Seq.skip 1
    |> Seq.map (fun line -> line.Split([|','|]))
    |> Seq.filter (fun a -> a.[11] = "M")
    |> Seq.map (fun values -> float(values.[14]))
As expected, this produces a seq of float both in the type check and in interactive. If I now add:
    |> Seq.sum
The type check works and says this function should return a float, but if I run it in interactive I get this error:
System.IndexOutOfRangeException: Index was outside the bounds of the array
Then I removed the last line again and thought I would look at the seq of float in a text file:
let writetest =
    let str = calctest |> Seq.map (fun i -> i.ToString())
    System.IO.File.WriteAllLines("test.txt", str )
Again, this passes the type check but throws errors in interactive.
Can the standard StreamReader not handle that amount of data, or am I going wrong somewhere? Should I use a different function than StreamReader?
Thanks.
Seq is lazy, which means the mapping and filtering are only actually performed once you add Seq.sum; that's why you don't see the error before adding that line. Are you sure you have at least 15 columns on all rows? That's probably the problem.
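The laziness is easy to reproduce on a few in-memory rows. A minimal sketch, assuming made-up data in which the last row is empty (as a trailing newline in the file would produce):

```fsharp
// Two rows have enough columns; the last one is too short.
let rows = [ "a,1.5"; "b,2.5"; "" ]

// Building the pipeline evaluates nothing, so no exception is raised here.
let values =
    rows
    |> Seq.map (fun line -> line.Split(','))
    |> Seq.map (fun cols -> float cols.[1])

// The short row is only touched when the sequence is consumed.
let outcome =
    try
        values |> Seq.sum |> sprintf "sum = %f"
    with
    | :? System.IndexOutOfRangeException ->
        "IndexOutOfRangeException raised during Seq.sum"

printfn "%s" outcome
```

The same applies to the WriteAllLines attempt in the question: writing the lines consumes the sequence, so the exception surfaces there as well.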
I would advise you to use the CSV Type Provider instead of just doing a string.Split; that way you'll be sure not to get an accidental IndexOutOfRangeException, and you'll handle comma escaping correctly.
Additionally, you're reading the whole CSV file into memory by calling reader.ReadToEnd(); the CsvProvider supports streaming if you set the CacheRows parameter to false. It's not a problem with a 55MB file, but it might be if you have something much larger.
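If pulling in the type provider is not an option, a rough alternative is to stream the file with File.ReadLines and skip rows that are too short. The sketch below is an assumption-laden illustration: the sample file, its column layout (flag in column 1, amount in column 2, rather than the question's 11 and 14) and the names `samplePath` and `total` are all made up, and a plain Split still does not handle quoted or escaped commas.

```fsharp
open System.IO

// Hypothetical sample data written to a temp file so the sketch is runnable.
let samplePath = Path.GetTempFileName()
File.WriteAllLines(samplePath, [| "id,flag,amount"; "1,M,10.5"; "2,F,3.0"; "3,M,4.5"; "" |])

let total =
    File.ReadLines samplePath                       // streams line by line
    |> Seq.skip 1                                   // skip the header row
    |> Seq.map (fun line -> line.Split(','))
    |> Seq.filter (fun cols -> cols.Length > 2 && cols.[1] = "M")  // guard short rows
    |> Seq.sumBy (fun cols -> float cols.[2])

printfn "%f" total
File.Delete samplePath
```

The length check in the filter is what keeps the empty trailing line from raising IndexOutOfRangeException.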

Resources