How to read a file into a seq of lines in F#

This is the C# version:
public static IEnumerable<string> ReadLinesEnumerable(string path) {
    using ( var reader = new StreamReader(path) ) {
        var line = reader.ReadLine();
        while ( line != null ) {
            yield return line;
            line = reader.ReadLine();
        }
    }
}
But translating this directly into F# requires a mutable variable.

If you're using .NET 4.0, you can just use File.ReadLines.
> let readLines filePath = System.IO.File.ReadLines(filePath);;
val readLines : string -> seq<string>

open System.IO
let readLines (filePath:string) = seq {
    use sr = new StreamReader (filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}

To answer the question of whether there is a library function that encapsulates this pattern: there isn't a function exactly for this, but Seq.unfold lets you generate a sequence from some state. You can use it to implement the functionality above like this:
new StreamReader(filePath) |> Seq.unfold (fun sr ->
    match sr.ReadLine() with
    | null -> sr.Dispose(); None
    | str -> Some(str, sr))
The sr value represents the stream reader and is passed as the state. As long as it gives you non-null values, you can return Some containing an element to generate and the state (which could change if you wanted). When it reads null, we dispose it and return None to end the sequence. This isn't a direct equivalent, because it doesn't properly dispose the StreamReader when an exception is thrown.
In this case, I would definitely use a sequence expression (which is more elegant and more readable in most cases), but it's useful to know that it could also be written using a higher-order function.

let lines = File.ReadLines(path)
// To check
lines |> Seq.iter(fun x -> printfn "%s" x)

On .NET 2/3 you can do:
let readLines filePath = File.ReadAllLines(filePath) |> Seq.cast<string>
and on .NET 4:
let readLines filePath = File.ReadLines(filePath);;

In order to avoid the "System.ObjectDisposedException: Cannot read from a closed TextReader." exception, use:
let lines = seq { yield! System.IO.File.ReadLines "/path/to/file.txt" }

Related

F# sequences returning one item at a time

I am new to F# and I am trying to convert a DbDataReader class from C# to F#. The DbDataReader reads and returns a CSV line as a list:
member this.ReadCSV =
    seq {
        use textReader = File.OpenText(filename)
        while (not textReader.EndOfStream) do
            let line = textReader.ReadLine()
            // linedata is a class variable that holds the current data for use by DBDataReader
            linedata <- line |> Seq.toList |> splitDelimited delimiter "" []
            yield true
        yield false
    }
The DbDataReader class has a Read() function that, when called, should move the cursor to the next line in the input. I implemented this using an index variable as below, but this seems inefficient when processing a file with millions of rows. Is there a more efficient way of doing this?
override this.Read() =
    let hasRows = Seq.item idx this.ReadCSV
    idx <- idx + 1
    hasRows
Why not use an Enumerator instead of a Sequence?
If you really want to use a Sequence in FP style, then yield the values from the sequence and replace the loop that calls the this.Read() method with a for..in expression that uses the sequence as the enumerable expression:
member this.CSV =
    seq {
        use textReader = File.OpenText(filename)
        while (not textReader.EndOfStream) do
            let line = textReader.ReadLine()
            yield (line |> Seq.toList |> splitDelimited delimiter "" [])
    }
for linedata in this.CSV do
    ...
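The enumerator suggestion at the top of this answer could look roughly like the sketch below. It holds one enumerator for the whole file, so each Read() call advances a single line instead of re-walking the sequence from the start as Seq.item does. CsvCursor is a hypothetical stand-in for the question's reader class, and the plain comma split is an assumption that replaces the splitDelimited helper, which the question does not show.

```fsharp
open System
open System.IO

// Hypothetical stand-in for the question's reader class. The naive comma
// split replaces the question's splitDelimited helper (not shown there).
type CsvCursor(filename : string) =
    // One enumerator for the lifetime of the cursor; the file is read
    // strictly forward, one line per MoveNext call.
    let lines = File.ReadLines(filename).GetEnumerator()
    let mutable linedata : string list = []

    // The fields of the line the cursor currently points at.
    member this.Current = linedata

    // Advances to the next line; returns false once the input is exhausted.
    member this.Read() =
        if lines.MoveNext() then
            linedata <- lines.Current.Split(',') |> List.ofArray
            true
        else
            false

    interface IDisposable with
        member this.Dispose() = lines.Dispose()
```

A caller would create the cursor with `use cursor = new CsvCursor(path)` and loop with `while cursor.Read() do ...`, reading `cursor.Current` inside the loop.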

F# - Write Deedle FrameData To CSV

I need to write a Deedle FrameData (including "ID" column and additional "Delta" column with blank entries) to CSV. While I can generate a 2D array of the FrameData, I am unable to write it correctly to a CSV file.
module SOQN =
    open System
    open Deedle
    open FSharp.Data
    // TestInput.csv
    // ID,Alpha,Beta,Gamma
    // 1,no,1,hi
    // ...
    // TestOutput.csv
    // ID,Alpha,Beta,Gamma,Delta
    // 1,"no","1","hi",""
    // ...
    let inputCsv = @"D:\TestInput.csv"
    let outputCsv = @"D:\TestOutput.csv"
    let (df:Frame<obj,string>) = Frame.ReadCsv(inputCsv, hasHeaders=true, inferTypes=false, separators=",", indexCol="ID")
    // See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
    let data4Frame (frame:Frame<_,_>) = frame.GetFrameData()
    // See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
    let boxOptional obj =
        match obj with
        | Deedle.OptionalValue.Present obj -> box (obj.ToString())
        | _ -> box ""
    // See http://www.fssnip.net/sj/title/Insert-Deedle-frame-into-Excel
    let frameToArray (data:FrameData) =
        let transpose (array:'T[,]) =
            Array2D.init (array.GetLength(1)) (array.GetLength(0)) (fun i j -> array.[j, i])
        data.Columns
        |> Seq.map (fun (typ, vctr) -> vctr.ObjectSequence |> Seq.map boxOptional |> Array.ofSeq)
        |> array2D
        |> transpose
    let main =
        printfn ""
        printfn "Output Deedle FrameData To CSV"
        printfn ""
        let dff = data4Frame df
        let rzlt = frameToArray dff
        printfn "rzlt: %A" rzlt
        do
            use writer = new StreamWriter(outputCsv)
            writer.WriteLine("ID,Alpha,Beta,Gamma,Delta")
            // writer.WriteLine rzlt
        0
    [<EntryPoint>]
    main
    |> ignore
What am I missing?
I would not use FrameData to do this - frame data is mostly internal and while there are some legitimate uses for it, I don't think it makes sense for this task.
If you simply want to add an empty Delta column to your input CSV, then you can do this:
let df : Frame<int, _> = Frame.ReadCsv("C:/temp/test-input.csv", indexCol="ID")
df.AddColumn("Delta", [])
df.SaveCsv("C:/temp/test-output.csv", ["ID"])
This does almost everything you need - it writes the ID column and the extra Delta column.
The only caveat is that it does not add the extra quotes around the data. This is not required by the CSV specification unless you need to escape a comma in a column and I don't think there is an easy way to get Deedle to do this.
So if you do need the quotes, I think you'd have to write your own CSV-writing code. The following shows how to do this, but it does not correctly escape quotes and commas (which is why you should use SaveCsv even if it does not put in the quotes when they're not needed):
use writer = new StreamWriter("C:/temp/test-output.csv")
writer.WriteLine("ID,Alpha,Beta,Gamma,Delta")
for key, row in Series.observations df.Rows do
    writer.Write(key)
    for value in Series.valuesAll row do
        writer.Write(",")
        writer.Write(sprintf "\"%O\"" (if value.IsSome then value.Value else box ""))
    writer.WriteLine()
You can take the CSV-writing code from the source of the library as an example (it uses FrameData there).
After adding a wrapper:
type FrameData with
    member frameData.SaveCsv(path:string, ?includeRowKeys, ?keyNames, ?separator, ?culture) =
        use writer = new StreamWriter(path)
        writeCsv writer (Some path) separator culture includeRowKeys keyNames frameData
you could write it like this:
dff.SaveCsv outputCsv

F#: How to enumerate through multiple files correctly?

I have a bunch of files several MiB in size which are very simple:
Their size is always a multiple of 8 bytes
They only contain doubles in little endian, so can be read with BinaryReader's ReadDouble() method
When lexicographically sorted, they contain all values in the sequence they need to be.
I can't keep everything in memory as a float list or float array so I need a float seq that goes through the necessary files when actually being accessed. The portion that goes through the sequence actually does it in imperative style using GetEnumerator() because I don't want any resource leaks and want to close all files correctly.
My first functional approach was:
let readFile file =
    let rec readReader (maybeReader : BinaryReader option) =
        match maybeReader with
        | None ->
            let openFile() =
                printfn "Opening the file"
                new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
                |> Some
                |> readReader
            seq { yield! openFile() }
        | Some reader when reader.BaseStream.Position >= reader.BaseStream.Length ->
            printfn "Closing the file"
            reader.Dispose()
            Seq.empty
        | Some reader ->
            reader.BaseStream.Position |> printfn "Reading from position %d"
            let bytesToRead = Math.Min(1048576L, reader.BaseStream.Length - reader.BaseStream.Position) |> int
            let bytes = reader.ReadBytes bytesToRead
            let doubles = Array.zeroCreate<float> (bytesToRead / 8)
            Buffer.BlockCopy(bytes, 0, doubles, 0, bytesToRead)
            seq {
                yield! doubles
                yield! readReader maybeReader
            }
    readReader None
And then, when I have a string list containing all the files, I can say something like:
let values = files |> Seq.collect readFile
use ve = values.GetEnumerator()
// Do stuff that only gets partial data from one file
However, this only closes the files when the reader reaches its end (which is clear when looking at the function). So as a second approach I implemented the file enumerating imperatively:
type FileEnumerator(file : string) =
    let reader = new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
    let mutable _current : float = Double.NaN
    do file |> printfn "Enumerator active for %s"
    interface IDisposable with
        member this.Dispose() =
            reader.Dispose()
            file |> printfn "Enumerator disposed for %s"
    interface IEnumerator with
        member this.Current = _current :> obj
        member this.Reset() = reader.BaseStream.Position <- 0L
        member this.MoveNext() =
            let stream = reader.BaseStream
            if stream.Position >= stream.Length then false
            else
                _current <- reader.ReadDouble()
                true
    interface IEnumerator<float> with
        member this.Current = _current
type FileEnumerable(file : string) =
    interface IEnumerable with
        member this.GetEnumerator() = new FileEnumerator(file) :> IEnumerator
    interface IEnumerable<float> with
        member this.GetEnumerator() = new FileEnumerator(file) :> IEnumerator<float>
let readFile' file = new FileEnumerable(file) :> float seq
Now, when I say
let values = files |> Seq.collect readFile'
use ve = values.GetEnumerator()
// do stuff with the enumerator
disposing the enumerator correctly bubbles through to my imperative enumerator.
While this is a feasible solution for what I want to achieve (I could make it faster by reading it blockwise like the first functional approach but for brevity I didn't do it here) I wonder if there is a truly functional approach for this avoiding the mutable state in the enumerator.
I don't quite get what you mean when you say that using GetEnumerator() will prevent resource leaks and allow you to close all files correctly. The code below would be my attempt at this (ignoring the block copy part for demonstration purposes), and I think it results in the files being properly closed.
let eof (br : BinaryReader) =
    br.BaseStream.Position = br.BaseStream.Length
let readFileAsFloats filePath =
    seq{
        use file = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read)
        use reader = new BinaryReader(file)
        while (not (eof reader)) do
            yield reader.ReadDouble()
    }
let readFilesAsFloats filePaths =
    filePaths |> Seq.collect readFileAsFloats
let floats = readFilesAsFloats ["D:\\floatFile1.txt"; "D:\\floatFile2.txt"]
Is that what you had in mind?
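The claim that `use` inside a sequence expression releases the resource even when enumeration stops early can be checked with a small sketch. It swaps the files for a hypothetical Tracked resource that logs when it is opened and closed, so it runs without any real files; Tracked, readAll and the log are illustrative names, not part of the question or the answer.

```fsharp
open System

// Hypothetical resource that records when it is created and disposed.
type Tracked(name : string, log : ResizeArray<string>) =
    do log.Add("open " + name)
    interface IDisposable with
        member this.Dispose() = log.Add("close " + name)

// Same shape as readFileAsFloats: a `use` binding inside a seq expression.
let readAll (log : ResizeArray<string>) (name : string) =
    seq {
        use _resource = new Tracked(name, log)
        for i in 1 .. 3 do
            yield sprintf "%s:%d" name i
    }

let log = ResizeArray<string>()

// Stop after four items, i.e. in the middle of the second "file".
let firstFour =
    [ "a"; "b" ]
    |> Seq.collect (readAll log)
    |> Seq.truncate 4
    |> List.ofSeq

// Both resources should show up as closed, although "b" was never read to the end.
printfn "%A" (List.ofSeq log)
```

Disposing the outer enumerator (done here implicitly by List.ofSeq) is what triggers the cleanup of the partially read inner sequence, which is the same effect the question gets from `use ve = values.GetEnumerator()`.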

Is it possible to use the pipeline operator to call a method on a returned object?

Is it possible to call a method on a returned object using the pipeline infix operator?
Example, I have a .Net class (Class1) with a method (Method1). I can currently code it like this:
let myclass = new Class1()
let result = myclass.Method1()
I know I could also code it as such
let result = new Class1().Method1()
However, I would like to be able to pipeline it (I am using the ? below where I don't know what to do):
new Class1()
|> ?.Method1()
Furthermore, say I had a method which returns an object, and I want to only reference it if that method didn't return null (otherwise bail?)
new Class1()
|> ?.Method1()
|> ?? ?.Method2()
Or to make it clearer, here is some C# code:
public void foo()
{
    var myclass = new Class1();
    Class2 class2 = myclass.Method1();
    if (class2 == null)
    {
        return;
    }
    class2.Method2();
}
You can define something similar to your (??) operator fairly easily (but operators can't start with a question mark):
let (~??) f x =
    if (x <> null) then
        f x
Unfortunately, your pipelined code will need to be a bit more verbose (also, note that you can drop the new keyword for calling constructors):
Class1()
|> fun x -> x.Method1()
Putting it all together:
Class1()
|> fun x -> x.Method1()
|> ~?? (fun x -> x.Method2())
Using a custom operator as 'kvb' suggests is definitely an option. Another approach that you may find interesting in this case is to define your own 'computation expression' that automatically performs the check for null value at every point you specify. The code that uses it would look like this:
open System.Windows.Forms
// this function returns (0) null, or (1) btn whose parent is
// null or (2) button whose parent is not null
let test = function
    | 1 -> new Button(Text = "Button")
    | 2 -> new Button(Text = "Button", Parent = new Button(Text = "Parent"))
    | _ -> null
let res =
    safe { let! btn = test(2) // specify number here for testing
           // if btn = null, this part of the computation will not execute
           // and the computation expression immediately returns null
           printfn "Text = %s" btn.Text
           let! parent = btn.Parent // safe access to parent
           printfn "Parent = %s" parent.Text // will never be null!
           return parent }
As you can see, when you want to use a value that can potentially be 'null', you use let! inside the computation expression. The computation expression can be defined so that it immediately returns null if the value is null and runs the rest of the computation otherwise. Here is the code:
type SafeNullBuilder() =
    member x.Return(v) = v
    member x.Bind(v, f) =
        if v = null then null else f(v)
let safe = new SafeNullBuilder()
BTW: If you want to learn more about this, it is very similar to the 'Maybe' monad in Haskell (or to a computation working with the F# option type).
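For comparison, the option-based variant mentioned above might look like the following minimal sketch. MaybeBuilder, maybe and tryParent are hypothetical names chosen for illustration; the only library call used is Option.bind from FSharp.Core.

```fsharp
// A minimal "maybe" builder over F# option (illustrative names throughout).
type MaybeBuilder() =
    member this.Return(v) = Some v
    // Runs the rest of the computation only when a value is present.
    member this.Bind(v, f) = Option.bind f v

let maybe = MaybeBuilder()

// Hypothetical lookup standing in for a call that may have no result.
let tryParent (name : string) =
    match name with
    | "child" -> Some "parent"
    | _ -> None

// Some "found parent"
let found =
    maybe { let! p = tryParent "child"
            return "found " + p }

// None: the computation stops at the first missing value
let missing =
    maybe { let! p = tryParent "orphan"
            return "found " + p }
```

The structure mirrors SafeNullBuilder, with None playing the role of null and the type system tracking which values may be absent.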

f# byte[] -> hex -> string conversion

I have byte array as input. I would like to convert that array to string that contains hexadecimal representation of array values. This is F# code:
let ByteToHex bytes =
    bytes
    |> Array.map (fun (x : byte) -> String.Format("{0:X2}", x))
let ConcatArray stringArray = String.Join(null, (ByteToHex stringArray))
This produces the result I need, but I would like to make it more compact so that I have only one function.
I could not find a function that would concatenate the string representation of each byte at the end of ByteToHex.
I tried Array.concat and concat_map, and I tried with lists, but the best I could get is an array or list of strings.
Questions:
What would be simplest, most elegant way to do this?
Is there string formatting construct in F# so that I can replace String.Format from System assembly?
Example input: [|0x24uy; 0xA1uy; 0x00uy; 0x1Cuy|] should produce string "24A1001C"
There is nothing inherently wrong with your example. If you'd like to get it down to a single expression, then use the String.concat function.
let ByteToHex bytes =
    bytes
    |> Array.map (fun (x : byte) -> System.String.Format("{0:X2}", x))
    |> String.concat System.String.Empty
Under the hood, String.concat will just call into String.Join. Your code may have to be altered slightly, though, because based on your sample you import System. This may create a name resolution conflict between the F# String module and System.String.
If you want to transform and accumulate in one step, fold is your answer. sprintf is the F# string format function.
let ByteToHex (bytes:byte[]) =
    bytes |> Array.fold (fun state x-> state + sprintf "%02X" x) ""
This can also be done with a StringBuilder
open System.Text
let ByteToHex (bytes:byte[]) =
    (StringBuilder(), bytes)
    ||> Array.fold (fun state -> sprintf "%02X" >> state.Append)
    |> string
produces:
[|0x24uy; 0xA1uy; 0x00uy; 0x1Cuy|] |> ByteToHex;;
val it : string = "24A1001C"
Here's another answer:
let hashFormat (h : byte[]) =
    let sb = StringBuilder(h.Length * 2)
    let rec hashFormat' = function
        | _ as currIndex when currIndex = h.Length -> sb.ToString()
        | _ as currIndex ->
            sb.AppendFormat("{0:X2}", h.[currIndex]) |> ignore
            hashFormat' (currIndex + 1)
    hashFormat' 0
The upside of this one is that it's tail-recursive and that it pre-allocates the exact amount of space in the string builder as will be required to convert the byte array to a hex-string.
For context, I have it in this module:
module EncodingUtils
open System
open System.Text
open System.Security.Cryptography
open Newtonsoft.Json
let private hmacmd5 = new HMACMD5()
let private encoding = System.Text.Encoding.UTF8
let private enc (str : string) = encoding.GetBytes str
let private json o = JsonConvert.SerializeObject o
let md5 a = a |> (json >> enc >> hmacmd5.ComputeHash >> hashFormat)
Meaning I can pass md5 any object and get back a JSON hash of it.
Here's another. I'm learning F#, so feel free to correct me with more idiomatic ways of doing this:
let bytesToHexString (bytes : byte[]) : string =
    bytes
    |> Seq.map (fun c -> c.ToString("X2"))
    |> Seq.reduce (+)
Looks fine to me. Just to point out another, in my opinion very helpful, function in the Printf module: have a look at ksprintf. It passes the formatted string to a function of your choice (in this case, the identity function).
val ksprintf : (string -> 'd) -> StringFormat<'a,'d> -> 'a
sprintf, but call the given 'final' function to generate the result.
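As a rough illustration of that signature, here is a minimal sketch that applies ksprintf with the identity function to the hex conversion from the question. The names byteToHex and bytesToHex are made up for this example.

```fsharp
// ksprintf formats like sprintf, then hands the resulting string to the
// continuation given as its first argument (here: id, so the string itself).
let byteToHex (b : byte) = Printf.ksprintf id "%02X" b

// Format every byte and join the pieces with the F# String module.
let bytesToHex (bytes : byte[]) =
    bytes |> Array.map byteToHex |> String.concat ""

// "24A1001C"
let example = bytesToHex [| 0x24uy; 0xA1uy; 0x00uy; 0x1Cuy |]
```

With id as the continuation this behaves like plain sprintf; ksprintf becomes more useful when the continuation does something else with the string, such as logging it or raising an exception.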
To be honest, that doesn't look terrible (although I also have very little F# experience). Does F# offer an easy way to iterate (foreach)? If this was C#, I might use something like (where raw is a byte[] argument):
StringBuilder sb = new StringBuilder();
foreach (byte b in raw) {
    sb.Append(b.ToString("x2"));
}
return sb.ToString();
I wonder how that translates to F#...
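A fairly literal F# translation of that loop could look like the sketch below; the function name bytesToHex is an assumption. F# has for ... in for iteration, and the result of Append is piped to ignore because each line in the loop body must have type unit.

```fsharp
open System.Text

// Direct translation of the C# loop: one StringBuilder, one Append per byte.
let bytesToHex (raw : byte[]) =
    let sb = StringBuilder(raw.Length * 2)
    for b in raw do
        sb.Append(b.ToString("x2")) |> ignore
    sb.ToString()

// "24a1001c" (lowercase, because the C# snippet uses the "x2" format)
let example = bytesToHex [| 0x24uy; 0xA1uy; 0x00uy; 0x1Cuy |]
```

Switching the format string to "X2" gives the uppercase output that the question asks for.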
