Writing (string * int) [] to text file in F# - f#

I am quite new to F# and this is my first post, so I hope I can provide an adequate description of my problem.
As the title says, I need to write my (string * int) [] array to a text file. I know of System.IO.File.WriteAllText (string, string), but is there a way to write to a text file without converting to a string first?
Right now I have a sort of word count: a sorted array that pairs each word with the number of times it occurs. Like so:
[|("and", 130); ("he", 128); ("that", 103); ("was", 80); ...|]
The file directory of the text file can just be the location of the .fsx file from my project.

You do need to convert your array somehow. This is usually a good thing, since it forces you to control the format of your file. Of course you could use some sort of serializer, but that's probably a bit much for such a simple task.
I would probably do something like this:
open System.IO
let words = [|("and", 130); ("he", 128); ("that", 103); ("was", 80) |]
let lines = words |> Array.map (fun (w, c) -> sprintf "%s;%i" w c)
File.WriteAllLines(__SOURCE_DIRECTORY__ + @"\file.csv", lines)
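As a rough sketch of the same idea, the formatting can be split into a pair of small functions so the file can also be read back into a (string * int) []. The names toLine, ofLine and the file name wordcount.csv are made up for illustration, and the code assumes that no word contains a semicolon. Path.Combine is used here instead of concatenating a backslash so that the path also works on non-Windows systems.

```fsharp
open System.IO

// Format one (word, count) pair as a "word;count" line.
let toLine (word: string, count: int) = sprintf "%s;%i" word count

// Parse a "word;count" line back into a pair (assumes no ';' inside the word).
let ofLine (line: string) =
    match line.Split ';' with
    | [| word; count |] -> word, int count
    | _ -> failwithf "Unexpected line: %s" line

let words = [| ("and", 130); ("he", 128); ("that", 103); ("was", 80) |]

// Hypothetical file name; Path.Combine picks the right separator for the OS.
let path = Path.Combine(__SOURCE_DIRECTORY__, "wordcount.csv")

// Write one line per pair, then read the file back to check the round trip.
File.WriteAllLines(path, words |> Array.map toLine)
let roundTripped = File.ReadAllLines path |> Array.map ofLine
```

If the words can contain the separator, a proper CSV library or a serializer is probably the safer choice.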

Related

Format built-in types for pretty printing in Deedle

I understand that in order to pretty print things like discriminated unions in Deedle, you have to override ToString(). But what about built in types, like float?
Specifically, I want floats in one column to be displayed as percentages, or at the very least, to not have a million digits past the decimal.
Is there a way to do this?
There is no built-in support for doing this - it sounds like a useful addition, so if you want to contribute this to Deedle, please open an issue to discuss this! We'd be happy to accept a pull request that adds this feature.
As a workaround, I think your best chance is to transform the data in the frame before printing. Something like this should do the trick:
let df = frame [ "A" => series [ 1 => 0.001 ] ]
df |> Frame.map (fun r c (v:float) ->
    if c = "A" then box (sprintf "%f%%" (v*100.0)) else box v)
This creates a new frame where all float values of a column named A are transformed using the formatting function sprintf "%f%%" (v*100.0) and the rest is left unchanged.
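The formatting function itself is plain F# and does not depend on Deedle, so the number of decimals can be limited with an ordinary precision specifier. The following is a minimal sketch; asPercent is a hypothetical helper name, not part of Deedle.

```fsharp
// Hypothetical helper: show a fraction as a percentage with two decimals.
let asPercent (v: float) = sprintf "%.2f%%" (v * 100.0)

let samples = [ 0.001; 0.25; 1.0 ]
let formatted = samples |> List.map asPercent
// formatted = ["0.10%"; "25.00%"; "100.00%"]
```

Inside the Frame.map call above, box (asPercent v) could then take the place of box (sprintf "%f%%" (v*100.0)).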

How to format strings to print in a file with F#

This code prints float numbers to the file in the format f,ffffff (with a comma) and all on one line, but I need them printed as f.ffffff (with a dot), with each number on its own line. Any ideas on how to do this?
CODE EDITED
module writeFiles =
    let (w:float[]) = [|-1.3231725; 1.052134922; 1.23082055; 1.457748868; -0.3481141253; -0.06886428466; -1.473392229; 0.1103078722; -1.047231857; -2.641890652; -1.335060286; -0.9839854216; 0.1844535984; 3.087001584; -0.008467130841; 1.175365466; 1.637297522; 5.557832631; -0.2906445452; -0.4052301538; 1.766454088; -2.604325471; -1.807107036; -2.471407376; -2.204730614;|]
    let write secfilePath=
        for j in 0 .. 24 do
            let z = w.[j].ToString()
            File.AppendAllText(secfilePath, z)
            //File.AppendAllLines(secfilePath, z)
        done
There are a couple of things that could be done better in your code:
You're opening the file again every time you add a number.
z does not need to be mutable.
You can pass a format pattern and/or a culture to the ToString call.
You can iterate over filterMod.y directly instead of using a for loop with array indexer access.
I would probably go with something more like this:
open System.IO
open System.Globalization

module writeFiles =
    let write secfilePath =
        let data =
            filterMod.y
            |> Array.map (fun x -> x.ToString(CultureInfo.InvariantCulture))
        File.AppendAllLines(secfilePath, data)
It prepares an array of strings, where every number of filterMod.y gets formatted using CultureInfo.InvariantCulture, which makes it use . as the decimal separator. It then uses AppendAllLines to write the whole array to the file at once, with every element on a separate line.
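To illustrate the effect of the format provider in isolation, here is a small sketch that does not touch the file system. The comma-based NumberFormatInfo is built by hand only to imitate a culture that uses , as the decimal separator; the variable names are illustrative.

```fsharp
open System.Globalization

let value = 1.052134922

// A number format that uses ',' as the decimal separator, as many European cultures do.
let commaFormat = NumberFormatInfo(NumberDecimalSeparator = ",")
let withComma = value.ToString(commaFormat)                          // "1,052134922"

// The invariant culture always uses '.' as the decimal separator.
let withDot = value.ToString(CultureInfo.InvariantCulture)           // "1.052134922"

// A format pattern can be combined with the culture, e.g. six fixed decimals.
let sixDecimals = value.ToString("F6", CultureInfo.InvariantCulture) // "1.052135"

// One formatted string per number, ready for File.WriteAllLines or File.AppendAllLines.
let lines =
    [| 1.5; -0.25 |]
    |> Array.map (fun x -> x.ToString("F6", CultureInfo.InvariantCulture))
```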

File transform in F#

I am just starting to work with F# and trying to understand typical idioms and effective ways of thinking and working.
The task at hand is a simple transform of a tab-delimited file to one which is comma-delimited. A typical input line will look like:
let line = "#ES# 01/31/2006 13:31:00 1303.00 1303.00 1302.00 1302.00 2514 0"
I started out with looping code like this:
// inFile and outFile defined in preceding code not shown here
for line in File.ReadLines(inFile) do
    let typicalArray = line.Split '\t'
    let transformedLine = typicalArray |> String.concat ","
    outFile.WriteLine(transformedLine)
I then replaced the split/concat pair of operations with a single Regex.Replace():
for line in File.ReadLines(inFile) do
    let transformedLine = Regex.Replace(line, "\t",",")
    outFile.WriteLine(transformedLine)
And now, finally, have replaced the looping with a pipeline:
File.ReadLines(inFile)
|> Seq.map (fun x -> Regex.Replace(x, "\t", ","))
|> Seq.iter (fun y -> outFile.WriteLine(y))
// other housekeeping code below here not shown
While all versions work, the final version seems to me the most intuitive. Is this how a more experienced F# programmer would accomplish this task?
I think all three versions are perfectly fine, idiomatic code that F# experts would write.
I generally prefer writing code using built-in language features (like for loops and if conditions) if they let me solve the problem I have. These are imperative, but I think using them is a good idea when the API requires imperative code (like outFile.WriteLine). As you mentioned - you started with this version (and I would do the same).
Using higher-order functions is nice too - although I would probably do that only if I wanted to write data transformation and get a new sequence or list of lines - this would be handy if you were using File.WriteAllLines instead of writing lines one-by-one. Although, that could be also done by simply wrapping your second version with sequence expression:
let transformed =
    seq { for line in File.ReadLines(inFile) -> Regex.Replace(line, "\t",",") }
File.WriteAllLines(outFilePath, transformed)
I do not think there is any objective reason to prefer one of the versions. My personal stylistic preference is to use for and refactor to sequence expressions (if needed), but others will likely disagree.
A side note: if you want to write to the same file that you are reading from, you need to remember that Seq is lazily evaluated.
Using Array instead of Seq makes sure the file has been fully read and closed by the time it is opened for writing.
This works:
let lines =
    file |> File.ReadAllLines
    |> Array.map(fun line -> ..modify line..)
File.WriteAllLines(file, lines)
This does not (it causes a file access violation):
let lines =
    file |> File.ReadLines
    |> Seq.map(fun line -> ..modify line..)
File.WriteAllLines(file, lines)
(potential overlap with another discussion here, where intermediate variable helps with the same problem)
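The laziness can be observed without any files. In the sketch below, a sequence expression stands in for File.ReadLines and records each element as it is produced; the names reads, source and so on are illustrative only.

```fsharp
// Record when each element is actually produced.
let reads = System.Collections.Generic.List<string>()

// Stand-in for File.ReadLines: elements are produced only on demand.
let source =
    seq {
        for i in 1 .. 3 do
            reads.Add(sprintf "read %d" i)
            yield i
    }

// Seq.map only describes the work; nothing has been read yet.
let lazyDoubled = source |> Seq.map (fun x -> x * 2)
let readsBeforeForcing = reads.Count   // 0

// Array.ofSeq forces the whole sequence, so all reading happens here.
let eagerDoubled = lazyDoubled |> Array.ofSeq
let readsAfterForcing = reads.Count    // 3
```

With File.ReadLines the reading likewise starts only when the sequence is consumed, which in the failing version happens inside File.WriteAllLines, after the same file has already been opened for writing.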

Writing F# code to parse "2 + 2" into code

Extremely just-started-yesterday new to F#.
What I want: To write code that parses the string "2 + 2" into (using as an example code from the tutorial project) Expr.Add(Expr.Num 2, Expr.Num 2) for evaluation. Some help to at least point me in the right direction or tell me it's too complex for my first F# project. (This is how I learn things: By bashing my head against stuff that's hard)
What I have: My best guess at code to extract the numbers. Probably horribly off base. Also, a lack of clue.
let script = "2 + 2";
let rec scriptParse xs =
    match xs with
    | [] -> (double)0
    | y::ys -> (double)y
let split = (script.Split([|' '|]))
let f x = (split[x]) // "This code is not a function and cannot be applied."
let list = [ for x in 0..script.Length -> f x ]
let result = scriptParse
Thanks.
The immediate issue that you're running into is that split is an array of strings. To access an element of this array, the syntax is split.[x], not split[x] (which would apply split to the singleton list [x], assuming it were a function). Note that F# 6 and later also accept split[x] as indexing syntax.
Here are a few other issues:
Your definition of list is probably wrong: x ranges up to the length of script, not the length of the array split. If you want to convert an array or other sequence to a list you can just use List.ofSeq or Seq.toList instead of an explicit list comprehension [...].
Your "casts" to double are a bit odd - that's not the right syntax for performing conversions in F#, although it will work in this case. double is a function, so the parentheses are unnecessary and what you are doing is really calling double 0 and double y. You should just use 0.0 for the first case, and in the second case, it's unclear what you are converting from.
In general, it would probably be better to do a bit more design up front to decide what your overall strategy will be, since it's not clear to me that you'll be able to piece together a working parser based on your current approach. There are several well known techniques for writing a parser - are you trying to use a particular approach?
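As one possible starting point, here is a minimal hand-written sketch that handles only whitespace-separated integers joined by +. The Expr type below is an assumption about the shape of the tutorial's type (only the two cases mentioned in the question), and tokenize, parse and eval are hypothetical names. Anything beyond this (operator precedence, parentheses) would call for one of the established techniques, such as recursive descent or a parser combinator library.

```fsharp
// Assumed shape of the tutorial's expression type (only the two cases used here).
type Expr =
    | Num of int
    | Add of Expr * Expr

// Split on spaces and drop empty entries: "2 + 2" becomes ["2"; "+"; "2"].
let tokenize (script: string) =
    script.Split([| ' ' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> List.ofArray

// Parse "number (+ number)*", building a left-nested Add tree.
let parse (script: string) =
    let rec loop acc tokens =
        match tokens with
        | [] -> acc
        | "+" :: n :: rest -> loop (Add(acc, Num(int n))) rest
        | _ -> failwithf "Unexpected input near %A" tokens
    match tokenize script with
    | first :: rest -> loop (Num(int first)) rest
    | [] -> failwith "Empty input"

// Evaluate the tree by structural recursion.
let rec eval expr =
    match expr with
    | Num n -> n
    | Add(a, b) -> eval a + eval b

let parsed = parse "2 + 2"   // Add (Num 2, Num 2)
let result = eval parsed     // 4
```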

Is this a better (more functional way) to write the following fsharp code?

I have pieces of code like this in a project and I realize it's not
written in a functional way:
let data = Array.zeroCreate(3 + (int)firmwareVersions.Count * 27)
data.[0] <- 0x09uy //drcode
data.[1..2] <- firmwareVersionBytes //Number of firmware versions
let mutable index = 0
let loops = firmwareVersions.Count - 1
for i = 0 to loops do
    let nameBytes = ASCIIEncoding.ASCII.GetBytes(firmwareVersions.[i].Name)
    let timestampBytes = this.getTimeStampBytes firmwareVersions.[i].Timestamp
    let sizeBytes = BitConverter.GetBytes(firmwareVersions.[i].Size) |> Array.rev
    data.[index + 3 .. index + 10] <- nameBytes
    data.[index + 11 .. index + 24] <- timestampBytes
    data.[index + 25 .. index + 28] <- sizeBytes
    data.[index + 29] <- firmwareVersions.[i].Status
    index <- index + 27
firmwareVersions is a List which is part of a csharp library.
It has no (and should not have any) knowledge of how it will be converted into
an array of bytes. I realize the code above is very non-functional, so I tried
changing it like this:
let headerData = Array.zeroCreate(3)
headerData.[0] <- 0x09uy
headerData.[1..2] <- firmwareVersionBytes
let getFirmwareVersionBytes (firmware : FirmwareVersion) =
    let nameBytes = ASCIIEncoding.ASCII.GetBytes(firmware.Name)
    let timestampBytes = this.getTimeStampBytes firmware.Timestamp
    let sizeBytes = BitConverter.GetBytes(firmware.Size) |> Array.rev
    Array.concat [nameBytes; timestampBytes; sizeBytes]
let data =
    firmwareVersions.ToArray()
    |> Array.map (fun f -> getFirmwareVersionBytes f)
    |> Array.reduce (fun acc b -> Array.concat [acc; b])
let fullData = Array.concat [headerData;data]
So now I'm wondering if this is a better (more functional) way
to write the code. If so... why and what improvements should I make,
if not, why not and what should I do instead?
Suggestions, feedback, remarks?
Thank you
Update
Just wanted to add some more information.
This is part of some library that handles the data for a binary communication
protocol. The only upside I see of the first version of the code is that
people implementing the protocol in a different language (which is the case
in our situation as well) might get a better idea of how many bytes every
part takes up and where exactly they are located in the byte stream... just a remark.
(As not everybody understands English, but all our partners can read code)
I'd be inclined to inline everything because the whole program becomes so much shorter:
let fullData =
    [| yield 0x09uy                   // drcode
       yield! firmwareVersionBytes    // number of firmware versions (a byte array)
       for firmware in firmwareVersions do
           yield! ASCIIEncoding.ASCII.GetBytes(firmware.Name)
           yield! this.getTimeStampBytes firmware.Timestamp
           yield! BitConverter.GetBytes(firmware.Size) |> Array.rev |]
If you want to convey the positions of the bytes, I'd put them in comments at the end of each line.
I like your first version better because the indexing gives a better picture of the offsets, which are an important piece of the problem (I assume). The imperative code features the byte offsets prominently, which might be important if your partners can't/don't read the documentation. The functional code emphasises sticking together structures, which would be OK if the byte offsets are not important enough to be mentioned in the documentation either.
Indexing is normally accidental complexity, in which case it should be avoided. For example, your first version's loop could be for firmwareVersion in firmwareVersions instead of for i = 0 to loops.
Also, like Brian says, using constants for the offsets would make the imperative version even more readable.
How often does the code run?
The advantage of 'array concatenation' is that it does make it easier to 'see' the logical portions. The disadvantage is that it creates a lot of garbage (allocating temporary arrays) and may also be slower if used in a tight loop.
Also, I think perhaps your "Array.reduce(...)" can just be "Array.concat".
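A small sketch of that equivalence, using made-up byte chunks in place of the per-firmware arrays:

```fsharp
let chunks = [| [| 1uy; 2uy |]; [| 3uy |]; [| 4uy; 5uy |] |]

// Pairwise concatenation: allocates a new intermediate array at every step.
let viaReduce = chunks |> Array.reduce (fun acc b -> Array.concat [ acc; b ])

// Single call: Array.concat accepts the whole sequence of arrays at once.
let viaConcat = chunks |> Array.concat

// Array.collect combines the map and the concat (identity mapping here).
let viaCollect = chunks |> Array.collect id
```

All three produce the same bytes. One difference worth keeping in mind is that Array.reduce throws on an empty input array, whereas Array.concat and Array.collect simply return an empty array.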
Overall I prefer the first way (just create one huge array), though I would factor it differently to make the logic more apparent (e.g. have a named constant HEADER_SIZE, etc.).
While we're at it, I'd probably add some asserts to ensure that e.g. nameBytes has the expected length.
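The refactoring described above might look roughly like the following sketch. Everything here is an assumption made for illustration: the record type is a stand-in for the C# FirmwareVersion class (with the timestamp already converted to bytes and Size taken to be a 4-byte int), the constant names are invented, and the field lengths are read off the slice ranges in the question. Explicit checks are used instead of assert so that they also run in release builds.

```fsharp
open System
open System.Text

// Hypothetical stand-in for the C# FirmwareVersion class.
type FirmwareVersion =
    { Name: string; Timestamp: byte[]; Size: int; Status: byte }

// Layout constants taken from the slice ranges in the question.
let HeaderSize = 3
let NameLength = 8
let TimestampLength = 14
let SizeLength = 4
let StatusLength = 1
let RecordSize = NameLength + TimestampLength + SizeLength + StatusLength // 27 bytes

// Write one firmware record into data, starting at offset.
let writeRecord (data: byte[]) (offset: int) (fw: FirmwareVersion) =
    let nameBytes = Encoding.ASCII.GetBytes(fw.Name)
    // Reversed as in the question (big-endian output on a little-endian host).
    let sizeBytes = BitConverter.GetBytes(fw.Size) |> Array.rev
    if nameBytes.Length <> NameLength then invalidArg "fw" "Name must be 8 ASCII bytes"
    if fw.Timestamp.Length <> TimestampLength then invalidArg "fw" "Timestamp must be 14 bytes"
    Array.blit nameBytes 0 data offset NameLength
    Array.blit fw.Timestamp 0 data (offset + NameLength) TimestampLength
    Array.blit sizeBytes 0 data (offset + NameLength + TimestampLength) SizeLength
    data.[offset + RecordSize - 1] <- fw.Status

// Build the full message: 1 code byte, 2 count bytes, then one record per version.
let encode (countBytes: byte[]) (versions: FirmwareVersion list) =
    let data = Array.zeroCreate<byte> (HeaderSize + versions.Length * RecordSize)
    data.[0] <- 0x09uy
    Array.blit countBytes 0 data 1 2
    versions |> List.iteri (fun i fw -> writeRecord data (HeaderSize + i * RecordSize) fw)
    data
```

This keeps the single preallocated array of the first version while moving the magic numbers into named constants, so the byte layout stays visible to readers who only look at the code.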
