F# records, usage, code clarity - f#

Background:
I find myself harnessing F# Records a lot. Currently I am working on a project for packet dissection & replay of a proprietary binary protocol (a protocol that is very strangely designed ...).
We define the skeleton record for the packet.
type bytes = byte array
type packetSkeleton = {
part1 : bytes
part2 : bytes
... }
Now, it is easy to use this to 'dissect' our packet, (really just giving names to the byte fields).
let dissect (raw : bytes) =
let slice a b = raw.[a..b]
{ part1 = slice 0 4
part2 = slice 4 5
... }
This works perfectly even for longish packets, we can even use some neat recursive functions if there is a predicable pattern to the slicing.
So I dissect the packet, pull out the fields that I need and create a packet based off the packetSkeleton using the fields I took from the dissection, which by now is starting to look a bit awkward:
let createAuthStub a b c d e f g h ... =
{ part1 = a; part2 = b
part3 = d; ...
}
Then, after creating the populated stub, I need to deserialise it to a form that can be put on the wire:
(* packetSkeleton -> byte array *)
let deserialise (packet : packetSkeleton) =
[| packet.part1; packet.part2; ... |]
let xab = dissect buf
let authStub = createAuthStub xab.part1 1 2 xab.part9 ...
deserialise authStub |> send
So it ends up that I have 3 areas, the record type, the creation of the record for a given packet, and the deserialised byte array. Something tells me that this is a poor design choice on my part in terms of code clarity, and I can already feel it starting to shoot me in the foot even at this early stage.
Questions:
a) Am I using the correct datatype for such a project? Is my approach correct?
b) Should I just give up on trying to make this code feel clean?
As I am kinda coding this by touch and go, I would appreciate some insights!
P.S I realise that this problem is quite suited for C, but F# is more fun (additionally verification of the dissector later on sounds appealing)!

If a packet could be rather large packetSkeleton might grow unwieldy. Another option is to work with the raw bytes and define a module that reads/writes each part.
module Packet
let Length = 42
let GetPart1 src = src.[0..3]
let SetPart1 src dst = Array.blit src 0 dst 0 4
let GetPart2 src = src.[4..5]
let SetPart2 src dst = Array.blit src 0 dst 4 2
...
open Packet
let createAuthStub bytes b c =
let resp = Array.zeroCreate Packet.Length
SetPart1 (GetPart1 bytes)
SetPart2 b resp
SetPart3 c resp
SetPart4 (GetPart9 bytes)
resp
This removes the need for de/serialization functions (and probably helps performance a bit).
EDIT
Creating a wrapper type is another option
type Packet(bytes: byte[]) =
new() = Packet(Array.zeroCreate Packet.Length)
static member Length = 42
member x.Part1
with get() = bytes.[0..3]
and set value = Array.blit value 0 bytes 0 4
...
which might reduce code a bit:
let createAuthStub (req: Packet) b c =
let resp = Packet()
resp.Part1 <- req.Part1
resp.Part2 <- b
resp.Part3 <- c
resp.Part4 <- req.Part9
resp

I think your approach is essentially sound - but of course, it is difficult to tell without knowing more details.
I think one key idea that shows in your code and that is key to functional architecture is the separation between types (used to model the problem domain) and the processing functionality that creates the values of the domain model, processes it and formats them.
In your case:
The types bytes and packetSkeleton model the problem domain
The function createAuthStub processes your domain (and I agree with Daniel that it might be more readable if it took the whole packetSkeleton as an argument)
The function deserialize turns your domain back to bytes
I think this way of structuring code is quite good, because it separates different concerns of the program. I even wrote an article that tries to describe this as a more general programming approach.

Related

Performant, idiomatic way to concatenate two chars that are not in a list into a string

I've done most of my development in C# and am just learning F#. Here's what I want to do in C#:
string AddChars(char char1, char char2) => char1.ToString() + char2.ToString();
EDIT: added ToString() method to the C# example.
I want to write the same method in F# and I don't know how to do it other than this:
let addChars char1 char2 = Char.ToString(char1) + Char.ToString(char2)
Is there a way to add concatenate these chars into a string without converting both into strings first?
Sidenote:
I also have considered making a char array and converting that into a string, but that seems similarly wasteful.
let addChars (char1:char) (char2: char) = string([|char1; char2|])
As I said in my comment, your C# code is not going to do what you want ( i.e. concatenate the characters into a string). In C#, adding a char and a char will result in an int. The reason for this is because the char type doesn't define a + operator, so C# reverts to the nearest compatable type that does, which just happens to be int. (Source)
So to accomplish this behavior, you will need to do something similar to what you are already trying to do in F#:
char a = 'a';
char b = 'b';
// This is the wrong way to concatenate chars, because the
// chars will be treated as ints and the result will be 195.
Console.WriteLine(a + b);
// These are the correct ways to concatenate characters into
// a single string. The result of all of these will be "ab".
// The third way is the recommended way as it is concise and
// involves creating the fewest temporary objects.
Console.WriteLine(a.ToString() + b.ToString());
Console.WriteLine(Char.ToString(a) + Char.ToString(b));
Console.WriteLine(new String(new[] { a, b }));
(See https://dotnetfiddle.net/aEh1FI)
F# is the same way in that concatenating two or more chars doesn't result in a String. Unlike C#, it results instead in another char, but the process is the same - the char values are treated like int and added together, and the result is the char representation of the sum.
So really, the way to concatenate chars into a String in F# is what you already have, and is the direct translation of the C# equivalent:
let a = 'a'
let b = 'b'
// This is still the wrong way (prints 'Ã')
printfn "%O" (a + b)
// These are still the right ways (prints "ab")
printfn "%O" (a.ToString() + b.ToString())
printfn "%O" (Char.ToString(a) + Char.ToString(b))
printfn "%O" (String [| a;b |]) // This is still the best way
(See https://dotnetfiddle.net/ALwI3V)
The reason the "String from char array" approach is the best way is two-fold. First, it is the most concise, since you can see that that approach offers the shortest line of code in both languages (and the difference only increases as you add more and more chars together). And second, only one temporary object is created (the array) before the final String, whereas the other two methods involve making two separate temporary String objects to feed into the final result.
(Also, I'm not sure if it works this way as the String constructors are hidden in external sources, but I imagine that the array passed into the constructor would be used as the String's backing data, so it wouldn't end up getting wasted at all.) Strings are immutable, but using the passed array directly as the created String's backing data could result in a situation where a reference to the array could be held elsewhere in the program and jeopardize the String's immutability, so this speculation wouldn't fly in practice. (Credit: #CaringDev)
Another option you could do in F# that could be more idiomatic is to use the sprintf function to combine the two characters (Credit: #rmunn):
let a = 'a'
let b = 'b'
let s = sprintf "%c%c" a b
printfn "%O" s
// Prints "ab"
(See https://dotnetfiddle.net/Pp9Tee)
A note of warning about this method, however, is that it is almost certainly going to be much slower than any of the other three methods listed above. That's because instead of processing array or String data directly, sprintf is going to be performing more advanced formatting logic on the output. (I'm not in a position where I could benchmark this myself at the moment, but plugged into #TomasPetricek's benckmarking code below, I wouldn't be surprised if you got performance hits of 10x or more.)
This might not be a big deal as for a single conversion it will still be far faster than any end-user could possibly notice, but be careful if this is going to be used in any performance-critical code.
The answer by #Abion47 already lists all the possible sensible methods I can think of. If you are interested in performance, then you can run a quick experiment using the F# Interactive #time feature:
#time
open System
open System.Text
let a = 'a'
let b = 'b'
Comparing the three methods, the one with String [| a; b |] turns out to be about twice as fast as the methods involving ToString. In practice, that's probably not a big deal unless you are doing millions of such operations (as my experiment does), but it's an interesting fact to know:
// 432ms, 468ms, 472ms
for i in 0 .. 10000000 do
let s = a.ToString() + b.ToString()
ignore s
// 396ms 440ms, 458ms
for i in 0 .. 10000000 do
let s = Char.ToString(a) + Char.ToString(b)
ignore s
// 201ms, 171ms, 170ms
for i in 0 .. 10000000 do
let s = String [| a;b |]
ignore s

F# / Simplest way to validate array length at COMPILE time

I have some scientific project. There are vectors / square matrices of various lengths there. Obviously (for example) a vector of length 2 cannot be added to a vector of length 3 (and so on and so forth). There are several NET libraries, which deal with vectors / matrices. All of them either have generic vectors / matrices OR have some very specific vectors / matrices, which do not suite the needs.
Most, if not all, of these libraries can create a vector from a list or array. Unfortunately, If I mistakenly give an input array of the wrong length, then I will get a vector of the wrong length and then everything will blow up at run time!
I wonder if it is possible to check array length at compile time so that to get a compile error if, let’s say, I try to pass a 5-element array to a vector of length 2 “constructor”. After all, printfn does almost that!
F# type providers come to mind, but I am not sure how to apply them here.
Thanks a lot!
Thanks to the OP for an interesting question. My answer frequency has dropped not because of unwillingness to help but rather that there a few questions that tickles my interest.
We don't have dependent types in F# and F# doesn't support generics with numerical type arguments (like C++).
However we could create distinct types for different dimensions like Dim1, Dim2 and so on and provide them as type arguments.
This would allow us to have a type signature for apply that applies a vector a matrix like this:
let apply (m : Matrix<'R, 'C>) (v : Vector<'C>) : Vector<'R> = …
The code won't compile unless the columns of the matrix matches the length of the vector. In addition; the resulting vector has the length that is rows of the columns.
One way to do this is defining an interface IDimension and some concrete implementions representing the different dimensions.
type IDimension =
interface
abstract Size : int
end
type Dim1 () = class interface IDimension with member x.Size = 1 end end
type Dim2 () = class interface IDimension with member x.Size = 2 end end
The vector and the matrix can then be implemented like this
type Vector<'Dim when 'Dim :> IDimension
and 'Dim : (new : unit -> 'Dim)
> () =
class
let dim = new 'Dim()
let vs = Array.zeroCreate<float> dim.Size
member x.Dim = dim
member x.Values = vs
end
type Matrix<'RowDim, 'ColumnDim when 'RowDim :> IDimension
and 'RowDim : (new : unit -> 'RowDim)
and 'ColumnDim :> IDimension
and 'ColumnDim : (new : unit -> 'ColumnDim)
> () =
class
let rowDim = new 'RowDim()
let columnDim = new 'ColumnDim()
let vs = Array.zeroCreate<float> (rowDim.Size*columnDim.Size)
member x.RowDim = rowDim
member x.ColumnDim = columnDim
member x.Values = vs
end
Finally this allows us to write code like this:
let m76 = Matrix<Dim7, Dim6> ()
let v6 = Vector<Dim6> ()
let v7 = apply m76 v6 // Vector<Dim7>
// Doesn't compile because v7 has the wrong dimension
let vv = apply m76 v7
If you need a wide range of dimensions (because you have an algebra increments/decrements the dimensions of vectors/matrices) you could support that using some smart variant of church numerals.
If this is usable or not is entirely up the reader I think.
PS.
Perhaps unit of measures could have been used for this as well if they applied to more types than floats.
The general term for what you're looking for is dependent types, but F# does not support them.
I've seen an experiment in using type providers to mimic one particular flavor of dependent types (constraining the domain of a primitive type), but I wouldn't expect it to be possible to achieve what you want using type providers in their current form. They seem to be too whimsical for that.
Print format strings appear to be doing that (and in fact printers are a "Hello World" application for dependent types), but actually they work because they get special treatment by the compiler, and the mechanism for that is not extensible.
You're doomed to ensure correct lengths at runtime.
My best bet would be to use structs to encode actual vectors and ensure correctness on the API level that way, map them to arrays at the point where you're interacting with those matrix algebra libraries, then map the results back to structs with ample assertions when done.
The comment from #Justanothermetaprogrammer qualifies as an answer. Here is how it works in the real example. The matrix implementation in the example is based on MathNet.Numerics.LinearAlgebra:
open MathNet.Numerics.LinearAlgebra
type RealMatrix2x2 =
| RealMatrix2x2 of Matrix<double>
static member private createInternal (a : #seq<#seq<double>>) =
matrix a |> RealMatrix2x2
static member create
(
(a11, a12),
(a21, a22)
) =
RealMatrix2x2.createInternal
[|
[| a11; a12|]
[| a21; a22|]
|]
let m2 =
(
(1., 2.),
(3., 4.)
)
|> RealMatrix2x2.create
The tuple signatures and "re-mapping" into #seq<#seq<double>> can be easily code-generated using, for example, Excel or any other convenient tool for as many dimensions as necessary. In fact, the whole class along with any other necessary operator overrides (like multiplication of RealMatrix2x2 by RealMatrix2x2, ...) can be code generated for all necessary dimensions.

mutable state in collection

I'm pretty new to functional programming so this might be a question due to misconception, but I can't get my head around this - from an OOP point of view it seems so obvious...
scenario:
Assume you have an actor or micro-service like architecture approach where messages/requests are sent to some components that handle them and reply. Assume now, one of the components stores some of the data from the requests for future requests (e.g. it calculates a value and stores it in a cache so that the next time the same request occurs, no calculation is needed).
The data can be hold in memory.
question:
How do you in functional programming in general, and especially in f#, handle such a scenario? I guess a static dictionary is not a functional approach and I don't want to include any external things like data stores if possible.
Or more precise:
If an application creates data that will be used later in the processing again, where do we store the data?
example: You have an application that executes some sort of tasks on some initial data. First, you store the inital data (e.g. add it to a dictionary), then you execute the first task that does some processing based on a subset of the data, then you execute the second task that adds additional data and so on until all tasks are done...
Now the basic approach (from my understanding) would be to define the data and use the tasks as some sort of processing-chain that forward the processed data, like initial-data -> task-1 -> task-2 -> ... -> done
but that does not fit an architecture where getting/adding data is done message-based and asynchronous.
approach:
My initial approach was this
type Record = { }
let private dummyStore = new System.Collections.Concurrent.ConcurrentBag<Record>()
let search comparison =
let matchingRecords = dummyStore |> Seq.where (comparison)
if matchingRecords |> Seq.isEmpty
then EmptyFailedRequest
else Record (matchingRecords |> Seq.head)
let initialize initialData =
initialData |> Seq.iter (dummyStore.Add)
let add newRecord =
dummyStore.Add(newRecord)
encapsulated in a module that looks to me like an OOP approach.
After #Gustavo asked me to provide an example and considering his suggestion I've realized that I could do it like this (go one level higher to the place where the functions are actually called):
let handleMessage message store =
// all the operations from above but now with Seq<Record> -> ... -> Seq<Record>
store
let agent = MailboxProcessor.Start(fun inbox->
let rec messageLoop store = async{
let! msg = inbox.Receive()
let modifiedStore = handleMessage msg store
return! messageLoop modifiedStore
}
messageLoop Seq.empty
)
This answers the question for me well since it removed mutability and shared state at all. But when just looking at the first approach, I cannot think of any solution w/o the collection outside the functions
Please note that this question is in f# to explain the environment, the syntax etc. I don't want a solution that works because f# is multi-paradigm, I would like to get a functional approach for that.
I've read all questions that I could find on SO so far but they either prove the theoretical possibility or they use collections for this scenario - if duplicated please point me the right direction.
You can use a technique called memoization which is very common in FP.
And it consists precisely on keeping a dictionary with the calculated values.
Here's a sample implementation:
open System
open System.Collections.Concurrent
let getOrAdd (a:ConcurrentDictionary<'A,'B>) (b:_->_) k = a.GetOrAdd(k, b)
let memoize f =
let dic = new ConcurrentDictionary<_,_>()
getOrAdd dic f
Note that with memoize you can decorate any function and get a memoized version of it. Here's a sample:
let f x =
printfn "calculating f (%i)" x
2 * x
let g = memoize f // g is the memoized version of f
// test
> g 5 ;;
calculating f (5)
val it : int = 10
> g 5 ;;
val it : int = 10
You can see that in the second execution the value was not calculated.

F# ref-mutable vars vs object fields

I'm writing a parser in F#, and it needs to be as fast as possible (I'm hoping to parse a 100 MB file in less than a minute). As normal, it uses mutable variables to store the next available character and the next available token (i.e. both the lexer and the parser proper use one unit of lookahead).
My current partial implementation uses local variables for these. Since closure variables can't be mutable (anyone know the reason for this?) I've declared them as ref:
let rec read file includepath =
let c = ref ' '
let k = ref NONE
let sb = new StringBuilder()
use stream = File.OpenText file
let readc() =
c := stream.Read() |> char
// etc
I assume this has some overhead (not much, I know, but I'm trying for maximum speed here), and it's a little inelegant. The most obvious alternative would be to create a parser class object and have the mutable variables be fields in it. Does anyone know which is likely to be faster? Is there any consensus on which is considered better/more idiomatic style? Is there another option I'm missing?
You mentioned that local mutable values cannot be captured by a closure, so you need to use ref instead. The reason for this is that mutable values captured in the closure need to be allocated on the heap (because closure is allocated on the heap).
F# forces you to write this explicitly (using ref). In C# you can "capture mutable variable", but the compiler translates it to a field in a heap-allocated object behind the scene, so it will be on the heap anyway.
Summary is: If you want to use closures, mutable variables need to be allocated on the heap.
Now, regarding your code - your implementation uses ref, which creates a small object for every mutable variable that you're using. An alternative would be to create a single object with multiple mutable fields. Using records, you could write:
type ReadClosure = {
mutable c : char
mutable k : SomeType } // whatever type you use here
let rec read file includepath =
let state = { c = ' '; k = NONE }
// ...
let readc() =
state.c <- stream.Read() |> char
// etc...
This may be a bit more efficient, because you're allocating a single object instead of a few objects, but I don't expect the difference will be noticeable.
There is also one confusing thing about your code - the stream value will be disposed after the function read returns, so the call to stream.Read may be invalid (if you call readc after read completes).
let rec read file includepath =
let c = ref ' '
use stream = File.OpenText file
let readc() =
c := stream.Read() |> char
readc
let f = read a1 a2
f() // This would fail!
I'm not quite sure how you're actually using readc, but this may be a problem to think about. Also, if you're declaring it only as a helper closure, you could probably rewrite the code without closure (or write it explicitly using tail-recursion, which is translated to imperative loop with mutable variables) to avoid any allocations.
I did the following profiling:
let test() =
tic()
let mutable a = 0.0
for i=1 to 10 do
for j=1 to 10000000 do
a <- a + float j
toc("mutable")
let test2() =
tic()
let a = ref 0.0
for i=1 to 10 do
for j=1 to 10000000 do
a := !a + float j
toc("ref")
the average for mutable is 50ms, while ref 600ms. The performance difference is due to that mutable variables are in stack, while ref variables are in managed heap.
The relative difference is big. However, 10^8 times of access is a big number. And the total time is acceptable. So don't worry too much about the performance of ref variables. And remember:
Premature optimization is the root of
all evil.
My advice is you first finish your parser, then consider optimizing it. You won't know where the bottomneck is until you actually run the program. One good thing about F# is that its terse syntax and functional style well support code refactoring. Once the code is done, optimizing it would be convenient. Here's an profiling example.
Just another example, we use .net arrays everyday, which is also in managed heap:
let test3() =
tic()
let a = Array.create 1 0.0
for i=1 to 10 do
for j=1 to 10000000 do
a.[0] <- a.[0] + float j
toc("array")
test3() runs about the same as ref's. If you worry too much of variables in managed heap, then you won't use array anymore.

Is this a better (more functional way) to write the following fsharp code?

I have pieces of code like this in a project and I realize it's not
written in a functional way:
let data = Array.zeroCreate(3 + (int)firmwareVersions.Count * 27)
data.[0] <- 0x09uy //drcode
data.[1..2] <- firmwareVersionBytes //Number of firmware versions
let mutable index = 0
let loops = firmwareVersions.Count - 1
for i = 0 to loops do
let nameBytes = ASCIIEncoding.ASCII.GetBytes(firmwareVersions.[i].Name)
let timestampBytes = this.getTimeStampBytes firmwareVersions.[i].Timestamp
let sizeBytes = BitConverter.GetBytes(firmwareVersions.[i].Size) |> Array.rev
data.[index + 3 .. index + 10] <- nameBytes
data.[index + 11 .. index + 24] <- timestampBytes
data.[index + 25 .. index + 28] <- sizeBytes
data.[index + 29] <- firmwareVersions.[i].Status
index <- index + 27
firmwareVersions is a List which is part of a csharp library.
It has (and should not have) any knowledge of how it will be converted into
an array of bytes. I realize the code above is very non-functional, so I tried
changing it like this:
let headerData = Array.zeroCreate(3)
headerData.[0] <- 0x09uy
headerData.[1..2] <- firmwareVersionBytes
let getFirmwareVersionBytes (firmware : FirmwareVersion) =
let nameBytes = ASCIIEncoding.ASCII.GetBytes(firmware.Name)
let timestampBytes = this.getTimeStampBytes firmware.Timestamp
let sizeBytes = BitConverter.GetBytes(firmware.Size) |> Array.rev
Array.concat [nameBytes; timestampBytes; sizeBytes]
let data =
firmwareVersions.ToArray()
|> Array.map (fun f -> getFirmwareVersionBytes f)
|> Array.reduce (fun acc b -> Array.concat [acc; b])
let fullData = Array.concat [headerData;data]
So now I'm wondering if this is a better (more functional) way
to write the code. If so... why and what improvements should I make,
if not, why not and what should I do instead?
Suggestions, feedback, remarks?
Thank you
Update
Just wanted to add some more information.
This is part of some library that handles the data for a binary communication
protocol. The only upside I see of the first version of the code is that
people implementing the protocol in a different language (which is the case
in our situation as well) might get a better idea of how many bytes every
part takes up and where exactly they are located in the byte stream... just a remark.
(As not everybody understand english, but all our partners can read code)
I'd be inclined to inline everything because the whole program becomes so much shorter:
let fullData =
[|yield! [0x09uy; firmwareVersionBytes; firmwareVersionBytes]
for firmware in firmwareVersions do
yield! ASCIIEncoding.ASCII.GetBytes(firmware.Name)
yield! this.getTimeStampBytes firmware.Timestamp
yield! BitConverter.GetBytes(firmware.Size) |> Array.rev|]
If you want to convey the positions of the bytes, I'd put them in comments at the end of each line.
I like your first version better because the indexing gives a better picture of the offsets, which are an important piece of the problem (I assume). The imperative code features the byte offsets prominently, which might be important if your partners can't/don't read the documentation. The functional code emphasises sticking together structures, which would be OK if the byte offsets are not important enough to be mentioned in the documentation either.
Indexing is normally accidental complexity, in which case it should be avoided. For example, your first version's loop could be for firmwareVersion in firmwareVersion instead of for i = 0 to loops.
Also, like Brian says, using constants for the offsets would make the imperative version even more readable.
How often does the code run?
The advantage of 'array concatenation' is that it does make it easier to 'see' the logical portions. The disadvantage is that it creates a lot of garbage (allocating temporary arrays) and may also be slower if used in a tight loop.
Also, I think perhaps your "Array.reduce(...)" can just be "Array.concat".
Overall I prefer the first way (just create one huge array), though I would factor it differently to make the logic more apparent (e.g. have a named constant HEADER_SIZE, etc.).
While we're at it, I'd probably add some asserts to ensure that e.g. nameBytes has the expected length.

Resources