type Title = string
type Document = Title * Element list
and Element = Par of string | Sec of Document;;
let s1 = ("Background", [Par "Bla"])
let s21 = ("Expressions", [Sec("Arithmetical Expressions", [Par "Bla"]);
                           Sec("Boolean Expressions", [Par "Bla"])])
let s222 = ("Switch statements", [Par "Bla"])
let s223 = ("Repeat statements", [Par "Bla"])
let s22 = ("Statements",[Sec("Basics", [Par "Bla"]) ; Sec s222; Sec s223])
let s23 = ("Programs", [Par "Bla"])
let s2 = ("The Programming Language", [Sec s21; Sec s22; Sec s23])
let s3 = ("Tasks", [Sec("Frontend", [Par "Bla"]);
                    Sec("Backend", [Par "Bla"])])
let doc = ("Compiler project", [Par "Bla"; Sec s1; Sec s2; Sec s3]);;
Define an F# function toc: Document → ToC that generates the table of contents of a document. For example, for the document above, the table of contents should pair each heading with its section number, as in the listing below (a plausible definition of the ToC type is sketched after the listing):
[([], "Compiler project");
([1], "Background");
([2], "The Programming Language");
([2;1], "Expressions");
([2;1;1], "Arithmetical Expressions");
([2;1;2], "Boolean Expressions");
([2;2], "Statements");
([2;2;1], "Basics");
([2;2;2], "Switch statements");
([2;2;3], "Repeat statements");
([2;3], "Programs");
([3], "Tasks");
([3;1], "Frontend");
([3;2], "Backend")]
You can do this relatively easily by iterating over the sections in a sequence expression, keeping the number of the current section and yielding all the titles with the current section number.
// Given the current level (as a list of numbers) and a list of elements,
// generate a list of headings in the elements and prepend `idx` indices
// to each heading that we generate
let rec generateToc idx elems = seq {
    // We only care about nested sections, so this gets a list containing
    // just the sections, and we later add numbers to them using `Seq.indexed`
    let nested = elems |> Seq.choose (function
        | Sec(title, elems) -> Some(title, elems) | _ -> None)
    // For every nested section, we yield the title of the section and then
    // recursively call `generateToc` to get the nested section titles
    // (we append the current section number to `idx` when making the recursive call)
    for i, (title, elems) in Seq.indexed nested do
        yield List.rev (i+1::idx), title
        yield! generateToc (i+1::idx) elems }
// To generate document TOC, we just yield the document title and
// then call `generateToc` on the document elements
let generateDocToc (title, elems) = seq {
    yield [], title
    yield! generateToc [] elems }
generateDocToc doc
|> Seq.iter (printfn "%A")
You seem to be missing quite a bit from your question. For example: what mutually recursive auxiliary functions do you imagine would be helpful for doing this?
To me it looks like this should be quite straightforward as just a single recursive function (a sketch follows the outline below). The basic logic could be:
For each element, if it's a paragraph ignore it, or if it's a subsection then generate the corresponding TOC.
Append a new index (one per subsection, starting with 1) to all of the entries in each subsection.
Add ([], title) to the beginning of this TOC.
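A minimal sketch of that single recursive function, following the three steps above (one possible shape, not necessarily what the answer's author had in mind):
// One possible shape for the single recursive function described above
let rec toc ((title, elems) : Document) : (int list * string) list =
    let subEntries =
        elems
        |> List.choose (function Sec doc -> Some (toc doc) | Par _ -> None)
        |> List.mapi (fun i entries ->
            // prepend the subsection number (starting at 1) to every entry
            entries |> List.map (fun (idx, t) -> (i + 1 :: idx, t)))
        |> List.concat
    ([], title) :: subEntries

toc doc   // should reproduce the table of contents listed in the question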
I need to write a Deedle FrameData (including "ID" column and additional "Delta" column with blank entries) to CSV. While I can generate a 2D array of the FrameData, I am unable to write it correctly to a CSV file.
module SOQN =
    open System
    open System.IO
    open Deedle
    open FSharp.Data

    // TestInput.csv
    // ID,Alpha,Beta,Gamma
    // 1,no,1,hi
    // ...

    // TestOutput.csv
    // ID,Alpha,Beta,Gamma,Delta
    // 1,"no","1","hi",""
    // ...

    let inputCsv = @"D:\TestInput.csv"
    let outputCsv = @"D:\TestOutput.csv"

    let (df:Frame<obj,string>) = Frame.ReadCsv(inputCsv, hasHeaders=true, inferTypes=false, separators=",", indexCol="ID")

    // See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
    let data4Frame (frame:Frame<_,_>) = frame.GetFrameData()

    // See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
    let boxOptional obj =
        match obj with
        | Deedle.OptionalValue.Present obj -> box (obj.ToString())
        | _ -> box ""

    // See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
    let frameToArray (data:FrameData) =
        let transpose (array:'T[,]) =
            Array2D.init (array.GetLength(1)) (array.GetLength(0)) (fun i j -> array.[j, i])
        data.Columns
        |> Seq.map (fun (typ, vctr) -> vctr.ObjectSequence |> Seq.map boxOptional |> Array.ofSeq)
        |> array2D
        |> transpose

    let main =
        printfn ""
        printfn "Output Deedle FrameData To CSV"
        printfn ""
        let dff = data4Frame df
        let rzlt = frameToArray dff
        printfn "rzlt: %A" rzlt
        do
            use writer = new StreamWriter(outputCsv)
            writer.WriteLine("ID,Alpha,Beta,Gamma,Delta")
            // writer.WriteLine rzlt
        0

    [<EntryPoint>]
    main
    |> ignore
What am I missing?
I would not use FrameData to do this - frame data is mostly internal and while there are some legitimate uses for it, I don't think it makes sense for this task.
If you simply want to add an empty Delta column to your input CSV, then you can do this:
let df : Frame<int, _> = Frame.ReadCsv("C:/temp/test-input.csv", indexCol="ID")
df.AddColumn("Delta", [])
df.SaveCsv("C:/temp/test-output.csv", ["ID"])
This does almost everything you need - it writes the ID column and the extra Delta column.
The only caveat is that it does not add the extra quotes around the data. Quotes are not required by the CSV specification unless you need to escape a comma in a value, and I don't think there is an easy way to get Deedle to add them.
So I think you would then have to write the CSV file yourself. The following shows how to do this, but it does not correctly escape quotes and commas (which is why you should use SaveCsv even if it does not put in the quotes when they're not needed):
use writer = new StreamWriter("C:/temp/test-output.csv")
writer.WriteLine("ID,Alpha,Beta,Gamma,Delta")
for key, row in Series.observations df.Rows do
    writer.Write(key)
    for value in Series.valuesAll row do
        writer.Write(",")
        writer.Write(sprintf "\"%O\"" (if value.IsSome then value.Value else box ""))
    writer.WriteLine()
You can find an example of writing to CSV in the source of the library (it uses FrameData there).
After adding a wrapper:
type FrameData with
    member frameData.SaveCsv(path:string, ?includeRowKeys, ?keyNames, ?separator, ?culture) =
        use writer = new StreamWriter(path)
        writeCsv writer (Some path) separator culture includeRowKeys keyNames frameData
you could write it like this:
dff.SaveCsv outputCsv
Currently I have a function for returning the first element of each list (of floats) within a list, into a separate list.
let firstElements list =
    match list with
    | head::_ -> head
    | [] -> 0.00
My question is, how do I expand this to collect the elements at the same index into separate lists when I don't know how long the lists are? For example, given
let biglist = [[1;2;3];[4;5;6];[7;8;9]]
If I did not know the length of this list, what is the most efficient and safest way to get
[[1;4;7];[2;5;8];[3;6;9]]
List.transpose has been added recently to FSharp.Core
let biglist = [[1;2;3];[4;5;6];[7;8;9]]
let res = biglist |> List.transpose
//val res : int list list = [[1; 4; 7]; [2; 5; 8]; [3; 6; 9]]
You can use the recently added List.transpose function. But it is always good to be able to create such functions yourself. If you want to solve the problem yourself, think of a general algorithm that solves it. One would be:
From the first element of each list you create a new list
You drop the first element of each list
If you end up with empty lists, you stop; otherwise repeat from step 1)
This could be a first attempt to solve the problem. The function names are made up at this point.
let transpose lst =
    if allEmpty lst
    then // Some Default value, we don't know yet
    else ...
The else branch looks like the following. First we want to pick the first element of every sub-list. We imagine a function pickFirsts that does this task, so we could write pickFirsts lst. The result is a list that itself becomes the first element of the new list.
The rest of the new list is the result of transposing the remaining lists. Again we imagine a function that drops the first element of every sub-list, dropFirsts lst. On that result we need to repeat step 1), which we do with a recursive call to transpose.
Overall we get:
let rec transpose lst =
    if allEmpty lst
    then // Some Default value, we don't know yet
    else (pickFirsts lst) :: (transpose (dropFirsts lst))
At this point we can think about the default value. transpose needs to return a value for the case where it ends up with a list of empty lists. Since we cons an element onto the result of transpose, that result must be a list, and the best default value is the empty list. So we end up with:
let rec transpose lst =
    if allEmpty lst
    then []
    else (pickFirsts lst) :: (transpose (dropFirsts lst))
Next we need to implement the remaining functions allEmpty, pickFirsts and dropFirsts.
pickFirsts is easy. We need to iterate over each sub-list and return its first value. We get the first value of a list with List.head, and iterating over a list and turning every element into something new is what List.map does.
let pickFirsts lst = List.map List.head lst
dropFirsts needs to iterate over each sub-list and just remove its first element, or in other words keep the remainder/tail of each list.
let dropFirsts lst = List.map List.tail lst
The remaining allEmpty is a predicate that returns true or false depending on whether we have a list of empty lists or not. Because the return value is a bool rather than another list, we need a function that can collapse a list into a value of another type; this is usually the reason to use List.fold. An implementation could look like this:
let allEmpty lst =
    let folder acc x =
        match x with
        | [] -> acc
        | _ -> false
    List.fold folder true lst
It starts with true as the accumulator. As long as it finds empty lists it returns the accumulator unchanged. As soon as one element is found, in any list, it returns false (not empty) as the new accumulator.
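As an aside (not part of the original answer), the same predicate can be written more compactly with a built-in function:
// Equivalent, shorter version: true only if every sub-list is empty
let allEmpty lst = List.forall List.isEmpty lst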
The whole code:
let allEmpty lst =
    let folder acc x =
        match x with
        | [] -> acc
        | _ -> false
    List.fold folder true lst

let pickFirsts lst = List.map List.head lst
let dropFirsts lst = List.map List.tail lst

let rec transpose lst =
    if allEmpty lst
    then []
    else (pickFirsts lst) :: (transpose (dropFirsts lst))

transpose [[1;2;3];[4;5;6];[7;8;9]]
Another approach would be to turn the input into a two-dimensional mutable array (with length checking), do the transformation there, and turn the mutable array back into an immutable list; a rough sketch follows.
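A sketch of that array-based variant, assuming rectangular input (the names below are made up for illustration):
// Copy the lists into a mutable 2D array, transpose by swapping indices,
// and read the array back out as an immutable list of lists.
let transposeViaArray (lst : 'a list list) =
    let rows = List.length lst
    let cols = if rows = 0 then 0 else List.length (List.head lst)
    // length checking: reject ragged input
    if lst |> List.exists (fun row -> List.length row <> cols) then
        invalidArg "lst" "all inner lists must have the same length"
    let arr = Array2D.init rows cols (fun i j -> lst.[i].[j])
    [ for j in 0 .. cols - 1 -> [ for i in 0 .. rows - 1 -> arr.[i, j] ] ]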
I have managed to read my text file, which contains random numbers line by line. When I output lines using printfn "%A" lines I get seq ["45"; "5435"; "34"; ... ], so I assume that lines must be some list-like datatype.
open System
let readLines filePath = System.IO.File.ReadLines(filePath);;
let lines = readLines @"C:\Users\Dan\Desktop\unsorted.txt"
I am now trying to sort the list from lowest to highest, but it does not have a .sortBy() method. Any chance anyone can tell me how to do this manually? I have tried turning it into an array to sort it, but it doesn't work.
let array = [||]
let counter = 0
for i in lines do
    array.[counter] = i
    counter + 1
Console.ReadKey <| ignore
Thanks in advance.
If all the lines are integers, you can just use Seq.sortBy int, like so:
open System
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines @"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.sortBy int
If some of the lines may not be valid integers, then you'd need to run through a parsing and validation step. E.g.:
let tryParseInt s =
    match System.Int32.TryParse s with
    | true, n -> Some n
    | false, _ -> None

let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines @"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.choose tryParseInt |> Seq.sort
Note that the tryParseInt function I just wrote is returning the int value, so I used Seq.sort instead of Seq.sortBy int, and the output of that function chain is going to be a sequence of ints rather than a sequence of strings. If you really wanted a sequence of strings, but only the strings that could be parsed to ints, you could have done it like this:
let tryParseInt s =
    match System.Int32.TryParse s with
    | true, _ -> Some s
    | false, _ -> None

let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines @"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.choose tryParseInt |> Seq.sortBy int
Note how I'm returning s from this version of tryParseInt, so that Seq.choose is keeping the strings (but throwing away any strings that failed to validate through System.Int32.TryParse). There's plenty more possibilities, but that should give you enough to get started.
All the comments are valid but I'm a bit more concerned about your very imperative loop.
So here's an example:
To read all the lines:
open System.IO
let file = @"c:\tmp\sort.csv"
let lines = File.ReadAllLines(file)
To sort the lines:
let sorted = Seq.sort lines
sorted |> Seq.length // to get the number of lines
sorted |> Seq.map (fun x -> x.Length) // to iterate over all lines and get the length of each line
You can also use a list comprehension syntax:
[for l in sorted -> l.ToUpper()]
Seq will work for all kinds of collections but you can replace it with Array (mutable) or List (F# List).
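For instance, assuming the lines value read with File.ReadAllLines above (a string[]), the same sort via the Array and List modules might look like this:
let sortedArray = lines |> Array.sort                 // sorts the string[] directly
let sortedList  = lines |> List.ofArray |> List.sort  // convert to an F# list first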
Hi I'm looking to find the best way to read in a fixed width text file using F#. The file will be plain text, from one to a couple of thousand lines long and around 1000 characters wide. Each line contains around 50 fields, each with varying lengths. My initial thoughts were to have something like the following
type MyRecord = {
    Name : string
    Address : string
    Postcode : string
    Tel : string
}

let format = [
    (0,10)
    (10,50)
    (50,7)
    (57,20)
]
and read each line one by one, assigning each field according to the format tuples (where the first item is the start index and the second is the field width in characters).
Any pointers would be appreciated.
The hardest part is probably to split a single line according to the column format. It can be done something like this:
let splitLine format (line : string) =
    format |> List.map (fun (index, length) -> line.Substring(index, length))
This function has the type (int * int) list -> string -> string list. In other words, format is an (int * int) list. This corresponds exactly to your format list. The line argument is a string, and the function returns a string list.
You can map a list of lines like this:
let result = lines |> List.map (splitLine format)
You can also use Seq.map or Array.map, depending on how lines is defined. Such a result will be a string list list, and you can now map over such a list to produce a MyRecord list.
You can use File.ReadLines to get a lazily evaluated sequence of strings from a file.
Please note that the above is only an outline of a possible solution. I left out boundary checks, error handling, and such. The above code may contain off-by-one errors.
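In the same spirit (and with the same caveats), the pieces above might be combined roughly as follows; toMyRecord and readFixedWidth are names made up for this sketch, and the field order is assumed to match the format list (Name, Address, Postcode, Tel):
open System.IO

// Hypothetical mapping from the split fields to the record in the question
let toMyRecord (fields : string list) =
    match fields with
    | [name; address; postcode; tel] ->
        { Name = name.Trim()
          Address = address.Trim()
          Postcode = postcode.Trim()
          Tel = tel.Trim() }
    | _ -> failwith "unexpected number of fields"

// Lazily read the file, split each line by the format, and build the records
let readFixedWidth path =
    File.ReadLines path
    |> Seq.map (splitLine format >> toMyRecord)
    |> Seq.toList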
Here's a solution with a focus on custom validation and error handling for each field. This might be overkill for a data file consisting of just numeric data!
First, for these kinds of things, I like to use the parser in Microsoft.VisualBasic.dll as it's already available without using NuGet.
For each row, we can return the array of fields, and the line number (for error reporting)
#r "Microsoft.VisualBasic.dll"
// for each row, return the line number and the fields
let parserReadAllFields fieldWidths textReader =
let parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader=textReader)
parser.SetFieldWidths fieldWidths
parser.TextFieldType <- Microsoft.VisualBasic.FileIO.FieldType.FixedWidth
seq {while not parser.EndOfData do
yield parser.LineNumber,parser.ReadFields() }
Next, we need a little error handling library (see http://fsharpforfunandprofit.com/rop/ for more)
type Result<'a> =
    | Success of 'a
    | Failure of string list

module Result =

    let succeedR x =
        Success x

    let failR err =
        Failure [err]

    let mapR f xR =
        match xR with
        | Success a -> Success (f a)
        | Failure errs -> Failure errs

    let applyR fR xR =
        match fR,xR with
        | Success f,Success x -> Success (f x)
        | Failure errs,Success _ -> Failure errs
        | Success _,Failure errs -> Failure errs
        | Failure errs1, Failure errs2 -> Failure (errs1 @ errs2)
Then define your domain model. In this case, it is the record type with a field for each field in the file.
type MyRecord =
    {id:int; name:string; description:string}
And then you can define your domain-specific parsing code. For each field I have created a validation function (validateId, validateName, etc).
Fields that don't need validation can pass through the raw data (validateDescription).
In fieldsToRecord the various fields are combined using applicative style (<!> and <*>).
For more on this, see http://fsharpforfunandprofit.com/posts/elevated-world-3/#validation.
Finally, readRecords maps each input row to a record Result and keeps only the successful ones. The failures are written to a log in handleResult.
module MyFileParser =
    open Result

    let createRecord id name description =
        {id=id; name=name; description=description}

    let validateId (lineNo:int64) (fields:string[]) =
        let rawId = fields.[0]
        match System.Int32.TryParse(rawId) with
        | true, id -> succeedR id
        | false, _ -> failR (sprintf "[%i] Can't parse id '%s'" lineNo rawId)

    let validateName (lineNo:int64) (fields:string[]) =
        let rawName = fields.[1]
        if System.String.IsNullOrWhiteSpace rawName then
            failR (sprintf "[%i] Name cannot be blank" lineNo )
        else
            succeedR rawName

    let validateDescription (lineNo:int64) (fields:string[]) =
        let rawDescription = fields.[2]
        succeedR rawDescription // no validation

    let fieldsToRecord (lineNo,fields) =
        let (<!>) = mapR
        let (<*>) = applyR
        let validatedId = validateId lineNo fields
        let validatedName = validateName lineNo fields
        let validatedDescription = validateDescription lineNo fields
        createRecord <!> validatedId <*> validatedName <*> validatedDescription

    /// print any errors and only return good results
    let handleResult result =
        match result with
        | Success record -> Some record
        | Failure errs -> printfn "ERRORS %A" errs; None

    /// return a sequence of records
    let readRecords parserOutput =
        parserOutput
        |> Seq.map fieldsToRecord
        |> Seq.choose handleResult
Here's an example of the parsing in practice:
// Set up some sample text
let text = """01name1description1
02name2description2
xxname3badid-------
yy badidandname
"""
// create a low-level parser
let textReader = new System.IO.StringReader(text)
let fieldWidths = [| 2; 5; 11 |]
let parserOutput = parserReadAllFields fieldWidths textReader
// convert to records in my domain
let records =
    parserOutput
    |> MyFileParser.readRecords
    |> Seq.iter (printfn "RECORD %A") // print each record
The output will look like:
RECORD {id = 1;
        name = "name1";
        description = "description";}
RECORD {id = 2;
        name = "name2";
        description = "description";}
ERRORS ["[3] Can't parse id 'xx'"]
ERRORS ["[4] Can't parse id 'yy'"; "[4] Name cannot be blank"]
By no means is this the most efficient way to parse a file (I think there are some CSV parsing libraries available on NuGet that can do validation while parsing) but it does show how you can have complete control over validation and error handling if you need it.
A record of 50 fields is a bit unwieldy; therefore alternative approaches that allow dynamic generation of the data structure may be preferable (e.g. System.Data.DataRow).
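As a rough illustration of that DataRow idea (the column names and the (name, start, length) description shape are invented for this sketch):
open System.Data

// Columns are generated from the format description, so nothing has to be
// written out by hand for 50 fields.
let buildTable (format : (string * int * int) list) (lines : seq<string>) =
    let table = new DataTable()
    for (name, _, _) in format do
        table.Columns.Add(name, typeof<string>) |> ignore
    for line in lines do
        let row = table.NewRow()
        for (name, start, length) in format do
            row.[name] <- box (line.Substring(start, length).Trim())
        table.Rows.Add(row) |> ignore
    table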
If it has to be a record anyway, you can at least spare yourself the manual assignment to each record field and populate it with the help of reflection instead. This trick relies on the fields being supplied in the order in which they are defined. I am assuming that every fixed-width column represents a record field, so that the start indices are implied.
open Microsoft.FSharp.Reflection

type MyRecord = {
    Name : string
    Address : string
    City : string
    Postcode : string
    Tel : string } with

    static member CreateFromFixedWidth format (line : string) =
        let fields =
            format
            |> List.fold (fun (index, acc) length ->
                let str = line.[index .. index + length - 1].Trim()
                index + length, box str :: acc )
                (0, [])
            |> snd
            |> List.rev
            |> List.toArray
        FSharpValue.MakeRecord(
            typeof<MyRecord>,
            fields ) :?> MyRecord
Example data:
"Postman Pat " +
"Farringdon Road " +
"London " +
"EC1A 1BB" +
"+44 20 7946 0813"
|> MyRecord.CreateFromFixedWidth [16; 16; 16; 8; 16]
// val it : MyRecord = {Name = "Postman Pat";
// Address = "Farringdon Road";
// City = "London";
// Postcode = "EC1A 1BB";
// Tel = "+44 20 7946 0813";}
//Return a tuple from a text file:
let ExtractFromLine (line:string) =
let strings = line.Split('\t') //data members are spaced by tab
let strlist = Array.toList(strings) //each data member is now a list of str
let year = System.Int32.Parse(strlist.Head) //year is first in file, so Head
let values = List.map System.Double.Parse strlist.Tail //tail are all values
let average = List.average values //not part of text file
let min = List.min values //not part of text file
let max = List.max values //not part of text file
(year, values, average, min, max) //return tuple with all info
//----------
let rec createList fileline =
if fileline = [] then
[]
else
let (year, values, average, min, max) = ExtractFromLine fileline.Head
let l = (year, values, average, min, max) :: createList fileline.Tail
l
//------------
let main argv =
let file = ReadFile "data.txt"
let biglist = createList file //recursive function to make a list of tuples
printfn"%A" biglist //This prints the year, all values, average, min, and max for each tuple created
I now have a giant list of tuples with all of the information that I need.
Have I retained the possibility of accessing all the elements inside and performing calculations on them? I program in C++, and the solution is doable in that language, but F# is so much more powerful in my opinion. I'm sure it's possible, I'm just missing the basics.
For example, how do I print the average of all the values for all years?
I'm thinking of a for loop, but I'm not sure how to iterate.
for(all tuples in biglist)
printfn"%A:%A" tuple.year tuple.average
It's wrong obviously, but I think you guys understand what I'm trying to do.
The above question involves pulling data from one tuple at a time across the list. What if I wanted to print the largest average? This would involve accessing each tuple's average member and comparing them to return the largest one. Do I have to create another list containing these averages?
I learned about fst and snd but I had a hard time applying it to this example.
You don't have to answer all questions if it is too much, but any help is greatly appreciated as I start out in this language, thank you
You can loop in F#, but it's a construct from the imperative programming world. A more idiomatic approach is to access the items of the list recursively.
Below is some sample code that creates tuples, constructs a list, and then accesses the items and checks which one is bigger. Looking at your code, the average was the third item in the tuple; that's why I've added a trd function, which takes a 5-item tuple and returns its third item.
The prcsLst function takes 2 arguments: a list and a starting max value. The idea is that when processing the list we take the head (the first item of the list) and compare its average with the current max. Whichever is bigger is passed to the next recursive round, together with the list's tail (the list without the first item). In this case I passed in the average of the first item as the initial max.
You can run the example in F# Interactive to see the results.
// create sample tuples
let t1 = (2014, 35, 18, 5, 45)
let t2 = (2014, 32, 28, 8, 75)
let t3 = (2014, 25, 11, 9, 55)
let t4 = (2015, 16, 13, 2, 15)
let t5 = (2015, 29, 15, 1, 35)
// create sample list
let lst = [t1;t2;t3;t4;t5]
// a function to return third item in a tuple
let trd (_,_,t,_,_) = t
// process list recursively
let rec prcsLst l max =
    match l with
    | [] -> max
    | hd::tl ->
        if (trd hd) > max then
            prcsLst tl (trd hd)
        else
            prcsLst tl max
// invoke the method on the sample list
// as a starting point use the first item in the list
prcsLst lst (trd t1);;
On a mobile so forgive me for not doing any code examples :)
I suspect that the missing piece of your puzzle is called pattern matching. In F# you address elements of a tuple like so:
let (y, v, Av, mn, mx) = mytuple
Note that you can also use this in function declarations, and when doing a 'match'.
(there is an exception for 'twoples' where you can use the functions 'fst' and 'snd')
Another thing you should play with is the |> operator.
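For example (a minimal sketch using the 5-tuples and the lst list from the other answer, where the third item plays the role of the average), pattern matching plus |> covers both of the question's cases:
// Print year and average for each tuple, destructuring it in the lambda
lst |> List.iter (fun (year, _, average, _, _) -> printfn "%A: %A" year average)

// Largest average across the whole list, without writing the recursion by hand
let largestAverage =
    lst |> List.map (fun (_, _, average, _, _) -> average) |> List.max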