I need to write a Deedle FrameData (including "ID" column and additional "Delta" column with blank entries) to CSV. While I can generate a 2D array of the FrameData, I am unable to write it correctly to a CSV file.
module SOQN =
open System
open Deedle
open FSharp.Data
// TestInput.csv
// ID,Alpha,Beta,Gamma
// 1,no,1,hi
// ...
// TestOutput.csv
// ID,Alpha,Beta,Gamma,Delta
// 1,"no","1","hi",""
// ...
let inputCsv = #"D:\TestInput.csv"
let outputCsv = #"D:\TestOutput.csv"
let (df:Frame<obj,string>) = Frame.ReadCsv(inputCsv, hasHeaders=true, inferTypes=false, separators=",", indexCol="ID")
// See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
let data4Frame (frame:Frame<_,_>) = frame.GetFrameData()
// See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
let boxOptional obj =
match obj with
| Deedle.OptionalValue.Present obj -> box (obj.ToString())
| _ -> box ""
// See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
let frameToArray (data:FrameData) =
let transpose (array:'T[,]) =
Array2D.init (array.GetLength(1)) (array.GetLength(0)) (fun i j -> array.[j, i])
data.Columns
|> Seq.map (fun (typ, vctr) -> vctr.ObjectSequence |> Seq.map boxOptional |> Array.ofSeq)
|> array2D
|> transpose
let main =
printfn ""
printfn "Output Deedle FrameData To CSV"
printfn ""
let dff = data4Frame df
let rzlt = frameToArray dff
printfn "rzlt: %A" rzlt
do
use writer = new StreamWriter(outputCsv)
writer.WriteLine("ID,Alpha,Beta,Gamma,Delta")
// writer.WriteLine rzlt
0
[<EntryPoint>]
main
|> ignore
What am I missing?
I would not use FrameData to do this - frame data is mostly internal and while there are some legitimate uses for it, I don't think it makes sense for this task.
If you simply want to add an empty Delta column to your input CSV, then you can do this:
let df : Frame<int, _> = Frame.ReadCsv("C:/temp/test-input.csv", indexCol="ID")
df.AddColumn("Delta", [])
df.SaveCsv("C:/temp/test-output.csv", ["ID"])
This does almost everything you need - it writes the ID column and the extra Delta column.
The only caveat is that it does not add the extra quotes around the data. This is not required by the CSV specification unless you need to escape a comma in a column and I don't think there is an easy way to get Deedle to do this.
So, I think then you'd have to write your own writing to a CSV file. The following shows how to do this, but it does not correctly escape quotes and commas (which is why you should use SaveCsv even if it does not put in the quotes when they're not needed):
use writer = new StreamWriter("C:/temp/test-output.csv")
writer.WriteLine("ID,Alpha,Beta,Gamma,Delta")
for key, row in Series.observations df.Rows do
writer.Write(key)
for value in Series.valuesAll row do
writer.Write(",")
writer.Write(sprintf "\"%O\"" (if value.IsSome then value.Value else box ""))
writer.WriteLine()
You can get the example of writing to csv from source of the library (it uses FrameData there)
After adding wrapper:
type FrameData with
member frameData.SaveCsv(path:string, ?includeRowKeys, ?keyNames, ?separator, ?culture) =
use writer = new StreamWriter(path)
writeCsv writer (Some path) separator culture includeRowKeys keyNames frameData
you could write like this:
dff.SaveCsv outputCsv
Related
Trying to learn F# and got stuck when trying to find a better approach of converting a csv file to a json array where each row + header is a json object in that array.
After some trial and error I finally caved and went for an ugly approach with mutable list and map. Are there any better ways this can be implemented?
let csvFileToJsonList (csvFile: FSharp.Data.CsvFile) =
let mutable tempList = List.empty<Map<string,string>>
let heads =
match csvFile.Headers with
| Some h -> h
| None -> [|"Missing"|] // what to do here?
let nbrOfColumns = csvFile.NumberOfColumns
for row in csvFile.Rows do
let columns = row.Columns
let mutable tempMap = Map.empty<string,string>
for i = 0 to nbrOfColumns-1 do
tempMap <- tempMap.Add(heads.[i], columns.[i])
tempList <- tempMap :: tempList
System.Text.Json.JsonSerializer.Serialize(tempList)
This outputs the following which is the goal:
[
{
"Header1": "Row1Val1",
"Header2": "Row1Val2",
"Header3": "Row1Val3",
"Header4": "Row1Val4",
"Header5": "Row1Val5"
},
{
"Header1": "Row2Val1",
"Header2": "Row2Val2",
"Header3": "Row2Val3",
"Header4": "Row2Val4",
"Header5": "Row2Val5"
}
]
This is about as simple as I could make it, although a longer version might be more readable for you:
let csvFileToJsonList (csvFile: FSharp.Data.CsvFile) =
let heads = csvFile.Headers |> Option.defaultValue [||]
csvFile.Rows
|> Seq.map (fun row -> Seq.zip heads row.Columns |> Map)
|> System.Text.Json.JsonSerializer.Serialize
This produces the output in the original order, which I'm assuming is preferable (your solution reverses the order).
This also assumes some headers exist, otherwise the output will be empty objects.
Description: For each row use Seq.zip to produce a sequence of header-value tuples. Pass that to the Map constructor to create a map, providing a sequence of maps, which can be serialized.
Note that using dict instead of Map might be a bit faster.
You also could use CsvProvider to create a typed object (Row)
open FSharp.Data
type Persons =
CsvProvider<"David,Raab,19.02.1983",
Schema="First (string), Last (string), BirthDay(string)",
HasHeaders=true>
let parseCsv (reader:System.IO.TextReader) = [
let data = Persons.Load reader
for row in data.Rows do
Map [
("First", row.First)
("Last", row.Last)
("Birthday", row.BirthDay)
]
]
Returning a List of a map instead of Json, but i guess you will know how to change it to Serialze the data.
I am doing some data-mining using Fsharp. To ensure that no blank values occur during the datamining process I need to make sure that empty values do not make it through what I am parsing therefore I am using a double option. An masked version of the data is shown below...
type structure = {
time: double option
pressure: double option
force: double option }
let rawData =
[| {| time = Some(15); pressure = Some(50); force = Some(100)|}
{| time = Some(16); pressure = Some(55); force = Some(110)|}
{| time = Some(17); pressure = None); force = Some(110)|}
{| time = Some(16); pressure = Some(65); force = None|}
{| time = Some(18); pressure = Some(70); force = Some(120)|} |]
I am currently saving this into a Deedle Data frame and saving to a .csv. However when I do this the values have "Some()" associated with them. They also have blank values for the None values.
How would I be able to take the "Some()" around the numbers away and turn the None values to "NaN" then save this to a .csv?
How would I be able to take the "Some()" around the numbers away and turn the None values to "NaN"
Create a function with signature double option -> string. You can process an option using a match expression.
let doubleOptionToStringFlattened =
function
| Some(d:double) -> d.ToString()
| None -> "NaN"
https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/options
then save this to a .csv
let csvLineFromStructure(s:structure) =
[| s.time; s.pressure; s.force |]
|> Array.map doubleOptionToStringFlattened
|> String.concat ","
// then create a line per structure:
let csvFromStructures(structures:structure[]) =
structures |> Array.map csvLineFromStructure |> String.concat Environment.NewLine
I have managed to read my text file which contains line by line random numbers. When I output lines using printfn "%A" lines I get seq ["45"; "5435" "34"; ... ] so I assume that lines must be a datatype list.
open System
let readLines filePath = System.IO.File.ReadLines(filePath);;
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
I am now trying to sort the list by lowest to highest but it does not have the .sortBy() method. Any chance anyone can tell me how to manually do this? I have tried turning it to an array to sort it but it doesn't work.
let array = [||]
let counter = 0
for i in lines do
array.[counter] = i
counter +1
Console.ReadKey <| ignore
Thanks in advance.
If all the lines are integers, you can just use Seq.sortBy int, like so:
open System
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.sortBy int
If some of the lines may not be valid integers, then you'd need to run through a parsing and validation step. E.g.:
let tryParseInt s =
match System.Int32.TryParse s with
| true, n -> Some n
| false, _ -> None
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.choose tryParseInt |> Seq.sort
Note that the tryParseInt function I just wrote is returning the int value, so I used Seq.sort instead of Seq.sortBy int, and the output of that function chain is going to be a sequence of ints rather than a sequence of strings. If you really wanted a sequence of strings, but only the strings that could be parsed to ints, you could have done it like this:
let tryParseInt s =
match System.Int32.TryParse s with
| true, _ -> Some s
| false, _ -> None
let readLines filePath = System.IO.File.ReadLines(filePath)
let lines = readLines #"C:\Users\Dan\Desktop\unsorted.txt"
let sorted = lines |> Seq.choose tryParseInt |> Seq.sortBy int
Note how I'm returning s from this version of tryParseInt, so that Seq.choose is keeping the strings (but throwing away any strings that failed to validate through System.Int32.TryParse). There's plenty more possibilities, but that should give you enough to get started.
All the comments are valid but I'm a bit more concerned about your very imperative loop.
So here's an example:
To read all the lines:
open System.IO
let file = #"c:\tmp\sort.csv"
let lines = File.ReadAllLines(file)
To sort the lines:
let sorted = Seq.sort lines
sorted |> Seq.length // to get the number of lines
sorted |> Seq.map (fun x -> x.Length) // to iterate over all lines and get the length of each line
You can also use a list comprehension syntax:
[for l in sorted -> l.ToUpper()]
Seq will work for all kinds of collections but you can replace it with Array (mutable) or List (F# List).
Hi I'm looking to find the best way to read in a fixed width text file using F#. The file will be plain text, from one to a couple of thousand lines long and around 1000 characters wide. Each line contains around 50 fields, each with varying lengths. My initial thoughts were to have something like the following
type MyRecord = {
Name : string
Address : string
Postcode : string
Tel : string
}
let format = [
(0,10)
(10,50)
(50,7)
(57,20)
]
and read each line one by one, assigning each field by the format tuple(where the first item is the start character and the second is the number of characters wide).
Any pointers would be appreciated.
The hardest part is probably to split a single line according to the column format. It can be done something like this:
let splitLine format (line : string) =
format |> List.map (fun (index, length) -> line.Substring(index, length))
This function has the type (int * int) list -> string -> string list. In other words, format is an (int * int) list. This corresponds exactly to your format list. The line argument is a string, and the function returns a string list.
You can map a list of lines like this:
let result = lines |> List.map (splitLine format)
You can also use Seq.map or Array.map, depending on how lines is defined. Such a result will be a string list list, and you can now map over such a list to produce a MyRecord list.
You can use File.ReadLines to get a lazily evaluated sequence of strings from a file.
Please note that the above is only an outline of a possible solution. I left out boundary checks, error handling, and such. The above code may contain off-by-one errors.
Here's a solution with a focus on custom validation and error handling for each field. This might be overkill for a data file consisting of just numeric data!
First, for these kinds of things, I like to use the parser in Microsoft.VisualBasic.dll as it's already available without using NuGet.
For each row, we can return the array of fields, and the line number (for error reporting)
#r "Microsoft.VisualBasic.dll"
// for each row, return the line number and the fields
let parserReadAllFields fieldWidths textReader =
let parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader=textReader)
parser.SetFieldWidths fieldWidths
parser.TextFieldType <- Microsoft.VisualBasic.FileIO.FieldType.FixedWidth
seq {while not parser.EndOfData do
yield parser.LineNumber,parser.ReadFields() }
Next, we need a little error handling library (see http://fsharpforfunandprofit.com/rop/ for more)
type Result<'a> =
| Success of 'a
| Failure of string list
module Result =
let succeedR x =
Success x
let failR err =
Failure [err]
let mapR f xR =
match xR with
| Success a -> Success (f a)
| Failure errs -> Failure errs
let applyR fR xR =
match fR,xR with
| Success f,Success x -> Success (f x)
| Failure errs,Success _ -> Failure errs
| Success _,Failure errs -> Failure errs
| Failure errs1, Failure errs2 -> Failure (errs1 # errs2)
Then define your domain model. In this case, it is the record type with a field for each field in the file.
type MyRecord =
{id:int; name:string; description:string}
And then you can define your domain-specific parsing code. For each field I have created a validation function (validateId, validateName, etc).
Fields that don't need validation can pass through the raw data (validateDescription).
In fieldsToRecord the various fields are combined using applicative style (<!> and <*>).
For more on this, see http://fsharpforfunandprofit.com/posts/elevated-world-3/#validation.
Finally, readRecords maps each input row to the a record Result and chooses the successful ones only. The failed ones are written to a log in handleResult.
module MyFileParser =
open Result
let createRecord id name description =
{id=id; name=name; description=description}
let validateId (lineNo:int64) (fields:string[]) =
let rawId = fields.[0]
match System.Int32.TryParse(rawId) with
| true, id -> succeedR id
| false, _ -> failR (sprintf "[%i] Can't parse id '%s'" lineNo rawId)
let validateName (lineNo:int64) (fields:string[]) =
let rawName = fields.[1]
if System.String.IsNullOrWhiteSpace rawName then
failR (sprintf "[%i] Name cannot be blank" lineNo )
else
succeedR rawName
let validateDescription (lineNo:int64) (fields:string[]) =
let rawDescription = fields.[2]
succeedR rawDescription // no validation
let fieldsToRecord (lineNo,fields) =
let (<!>) = mapR
let (<*>) = applyR
let validatedId = validateId lineNo fields
let validatedName = validateName lineNo fields
let validatedDescription = validateDescription lineNo fields
createRecord <!> validatedId <*> validatedName <*> validatedDescription
/// print any errors and only return good results
let handleResult result =
match result with
| Success record -> Some record
| Failure errs -> printfn "ERRORS %A" errs; None
/// return a sequence of records
let readRecords parserOutput =
parserOutput
|> Seq.map fieldsToRecord
|> Seq.choose handleResult
Here's an example of the parsing in practice:
// Set up some sample text
let text = """01name1description1
02name2description2
xxname3badid-------
yy badidandname
"""
// create a low-level parser
let textReader = new System.IO.StringReader(text)
let fieldWidths = [| 2; 5; 11 |]
let parserOutput = parserReadAllFields fieldWidths textReader
// convert to records in my domain
let records =
parserOutput
|> MyFileParser.readRecords
|> Seq.iter (printfn "RECORD %A") // print each record
The output will look like:
RECORD {id = 1;
name = "name1";
description = "description";}
RECORD {id = 2;
name = "name2";
description = "description";}
ERRORS ["[3] Can't parse id 'xx'"]
ERRORS ["[4] Can't parse id 'yy'"; "[4] Name cannot be blank"]
By no means is this the most efficient way to parse a file (I think there are some CSV parsing libraries available on NuGet that can do validation while parsing) but it does show how you can have complete control over validation and error handling if you need it.
A record of 50 fields is a bit unwieldy, therefore alternate approaches which allow dynamic generation of the data structure may be preferable (eg. System.Data.DataRow).
If it has to be a record anyway, you could spare at least the manual assignment to each record field and populate it with the help of Reflection instead. This trick relies on the field order as they are defined. I am assuming that every column of fixed width represents a record field, so that start indices are implied.
open Microsoft.FSharp.Reflection
type MyRecord = {
Name : string
Address : string
City : string
Postcode : string
Tel : string } with
static member CreateFromFixedWidth format (line : string) =
let fields =
format
|> List.fold (fun (index, acc) length ->
let str = line.[index .. index + length - 1].Trim()
index + length, box str :: acc )
(0, [])
|> snd
|> List.rev
|> List.toArray
FSharpValue.MakeRecord(
typeof<MyRecord>,
fields ) :?> MyRecord
Example data:
"Postman Pat " +
"Farringdon Road " +
"London " +
"EC1A 1BB" +
"+44 20 7946 0813"
|> MyRecord.CreateFromFixedWidth [16; 16; 16; 8; 16]
// val it : MyRecord = {Name = "Postman Pat";
// Address = "Farringdon Road";
// City = "London";
// Postcode = "EC1A 1BB";
// Tel = "+44 20 7946 0813";}
I am trying to use the Math.NET numerics implementation of the FFT algorithm, but I must be doing something wrong because the output is always unit
The following is the the setup:
open MathNet.Numerics
open MathNet.Numerics.Statistics
open MathNet.Numerics.IntegralTransforms
let rnd = new Random()
let rnddata = Array.init 100 (fun u -> rnd.NextDouble())
let x = rnddata |> Array.Parallel.map (fun d -> MathNet.Numerics.complex.Create(d, 0.0) )
then when I run this:
let tt = MathNet.Numerics.IntegralTransforms.Fourier.BluesteinForward(x, FourierOptions.Default)
I receive an empty output below?
val tt : unit = ()
Any ideas why?
I think the Fourier.BluesteinForward method stores the results in the input array (by overwriting whatever was there originally).
If you do not need the input after running the transform, you can just use x and read the results (this saves some memory copying, which is why Math.NET does that by default). Otherwise, you can clone the array and wrap it in a more functional style code like this:
let bluesteinForward input =
let output = Array.copy input
MathNet.Numerics.IntegralTransforms.Fourier.BluesteinForward
(output, FourierOptions.Default)
output