Suppose I have a two-step process: first data collection/cleaning, then some operation on the result.
For example:
#r "nuget: Deedle"
open System
open Deedle
type Person =
    { Name:string; Birthday:DateTime }
let fixB b =
    if b > DateTime(2023,01,01) then OptionalValue.Missing else OptionalValue b
let peopleRecds =
    [ { Name = "Joe"; Birthday = DateTime(9999,12,31) }
      { Name = "Jim"; Birthday = DateTime(2000,12,31) } ]
let df = Frame.ofRecords peopleRecds
let step1 = df.Clone()
step1.ReplaceColumn("Birthday", df |> Frame.mapRowValues (fun row -> fixB (row.GetAs<DateTime>"Birthday")))
// __SOURCE_DIRECTORY__ has no trailing separator, so add one before the file name
step1.SaveCsv(__SOURCE_DIRECTORY__ + "/step1.csv")
let step1' = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/step1.csv")
step1.Print()
Name Birthday
0 -> Joe <missing>
1 -> Jim 12/31/2000 12:00:00 AM
Whether I save and reload the frame (step1') or not (step1), I would like to continue with step 2 without having to handle the two cases differently.
let payout b =
    match b with
    | OptionalValue.Present c -> if c > DateTime(2000,01,01) then 100 else 0
    | OptionalValue.Missing -> 0
let step2 = step1.Clone()
step2.AddColumn("Payout", step1 |> Frame.mapRowValues (fun row -> payout (row.TryGetAs<DateTime>"Birthday")))
Error: System.InvalidCastException: Object must implement IConvertible.
The first issue is that the way you use mapRowValues stores values of type OptionalValue<'T> in the data frame (Deedle often unwraps these automatically, but apparently not in this case). OptionalValue<'T> does not implement IConvertible, which causes the error later on. You can avoid this by calculating the birthday column as follows:
let fixB b =
    if b > DateTime(2023,01,01) then None else Some b
let bday =
    df.Columns.["Birthday"].As<DateTime>()
    |> Series.mapAll (fun _ v -> Option.bind fixB v)
step1.ReplaceColumn("Birthday", bday)
The second issue, with saving and loading the data frame, is that the CSV parser does not seem to infer automatically that Birthday is a DateTime column. You can solve this by passing an explicit schema (you can also disable saving of the row keys, so that the frame you load is exactly the same as the one you saved):
step1.SaveCsv(__SOURCE_DIRECTORY__ + "/step1.csv", includeRowKeys=false)
let step1' = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/step1.csv", schema="string,date")
How would I write F# records to a CSV file? Ideally there would be one row per record instance. My record type and the final output (a map) look like this:
type Family =
    { Month : int
      Year : int
      Income : float
      Family : int
      Dogs : int
      Cats : int }
let monthly =
    timeMap
    |> Seq.ofList
    |> Seq.map (fun ((month, year), rows) ->
        { Month = month
          Year = year
          Income = rows.Inc
          Family = familyMap.[(month, year)].Children
          Dogs = familyMap.[(month, year)].Dogs
          Cats = familyMap.[(month, year)].Cats })
    |> List.ofSeq
let map =
    monthly
    |> List.map (fun x -> (x.Year, x.Month), x)
    |> Map.ofList
EDIT
This is what I have tried, but I get errors saying that A, B, C, D, E and F are not defined, plus a message recommending the syntax new Type(args). This last one shows up under >> MyCsvType.
type MyCsvType = CsvProvider<Schema = "A (int), B (int), C (float), D (int), E (int), F (int)", HasHeaders = false>
let myCsvBuildRow (x:Family) = MyCsvType.Row(x.A,x.B,x.C,x.D,x.E,x.F)
let myCsvBuildTable = (Seq.map myCsvBuildRow) >> Seq.toList >> MyCsvType
let myCsv = monthly|> myCsvBuildTable
myCsv.SaveToString()
Your code is almost there, except that the myCsvBuildRow function needs to access members of the Family type using their correct names. In your version, you are accessing names such as A, B, etc., but those are the names of columns in your CSV file, not the names of members of the F# record. The following does the trick for me:
open FSharp.Data
type MyCsvType = CsvProvider<Schema = "A (int), B (int), C (float), D (int), E (int), F (int)", HasHeaders = false>
let myCsvBuildRow (x:Family) =
    MyCsvType.Row(x.Month, x.Year, x.Income, x.Family, x.Dogs, x.Cats)
let myCsvBuildTable data =
    new MyCsvType(Seq.map myCsvBuildRow data)
let myCsv = monthly |> myCsvBuildTable
myCsv.SaveToString()
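If you would rather not depend on the type provider, one option is to format each record as a comma-separated line yourself using only the standard library. This is a minimal sketch: the names familyToCsvLine and familiesToCsv are my own, and it does no quoting or escaping, which is only acceptable here because every field is numeric.

```fsharp
open System
open System.Globalization

type Family =
    { Month : int
      Year : int
      Income : float
      Family : int
      Dogs : int
      Cats : int }

// Hypothetical helper: one CSV line per record, fields in declaration order.
// The float uses the invariant culture so the decimal separator is always '.'.
let familyToCsvLine (x : Family) =
    [ string x.Month
      string x.Year
      x.Income.ToString(CultureInfo.InvariantCulture)
      string x.Family
      string x.Dogs
      string x.Cats ]
    |> String.concat ","

// Joins all lines; pass the result to File.WriteAllText if you need a file.
let familiesToCsv (rows : seq<Family>) =
    rows |> Seq.map familyToCsvLine |> String.concat Environment.NewLine
```

With this approach you can feed monthly (or the values of the map) straight into familiesToCsv.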
While defining an inventory system for an RPG, I came across an odd issue. I'm trying to add items that a player would get from a shop. While adding, I make sure not to go over the weight limit, and I increase the quantity of an item if it is already in the inventory bag; otherwise I simply add the item.
So far so good; this looks pretty sane. My issue is that when I update my abstract class, IntelliSense tells me that the property is not defined for the type I'm using. In fact, it can't find any of the properties of the abstract class. It could be a silly mistake, but I've been scratching my head over this for quite some time and would appreciate some help!
UPDATE
Here's the compiler error: The type 'InventoryItem' does not contain a field 'Quantity'. ..\InventoryItems.fs 188
[<AbstractClass>]
type InventoryItem() =
    abstract member ItemName : string
    abstract member ItemDescription : string
    abstract member ItemWeight : float<kg>
    abstract member ItemPrice : float<usd>
    abstract member Quantity : int with get, set
let makeBagItemsDistinct (bag: InventoryItem array) =
    bag |> Seq.distinct |> Seq.toArray
type Inventory =
    { Bag : InventoryItem array
      Weight : float<kg> }
    member x.addItem (ii: InventoryItem) : Inventory =
        // Refuse the item if the bag is already full or would become too heavy
        if x.Weight >= MaxWeight then x
        elif (x.Weight + ii.ItemWeight) >= MaxWeight then x
        else
            let oItemIndex = x.Bag |> Array.tryFindIndex (fun x -> x = ii)
            match oItemIndex with
            | Some index ->
                // There is already an item of this type in the bag
                let item = x.Bag |> Array.find (fun x -> x = ii)
                let newBag =
                    x.Bag
                    |> Array.filter ((<>) item)
                    |> Array.append [| { item with Quantity = item.Quantity + ii.Quantity } |]
                    |> makeBagItemsDistinct
                let inventory = { x with Bag = newBag }
                { inventory with Weight = inventory.Weight + item.ItemWeight }
            | None ->
                let newBag = x.Bag |> Array.append [| ii |] |> makeBagItemsDistinct
                let inventory = { x with Bag = newBag }
                { inventory with Weight = inventory.Weight + ii.ItemWeight }
The { x with ... } copy-and-update syntax works with records only; you are trying to use it on InventoryItem, which is a class.
You might want to switch to a record if you always want to copy InventoryItem on change, as you are already doing with Inventory.
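A minimal sketch of that record-based alternative, assuming one record type is enough to describe all items. The fields and units mirror the abstract members in the question; the addQuantity helper and the sample values are hypothetical.

```fsharp
[<Measure>] type kg
[<Measure>] type usd

// A record instead of an abstract class, so copy-and-update ({ x with ... }) compiles.
type InventoryItem =
    { ItemName : string
      ItemDescription : string
      ItemWeight : float<kg>
      ItemPrice : float<usd>
      Quantity : int }

// Returns a new item with the combined quantity; the original value is untouched.
let addQuantity (existing : InventoryItem) (extra : InventoryItem) =
    { existing with Quantity = existing.Quantity + extra.Quantity }
```

One design consequence to keep in mind: records compare structurally, so the x = ii test in addItem would also compare Quantity. Matching on ItemName is probably closer to what you intend.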
I want to sort instances of a class into collection classes that, besides a list member, also contain further information needed for the sorting process.
The following is a very simplified example of my problem. Although it doesn't make much sense on its own, I hope it still helps to explain my question.
type ItemType = Odd|Even //realworld: more than two types possible
type Item(number) =
    member this.number = number
    member this.Type = if (this.number % 2) = 0 then Even else Odd
type NumberTypeCollection(numberType:ItemType , ?items:List<Item>) =
    member this.ItemType = numberType
    member val items:List<Item> = defaultArg items List.empty<Item> with get,set
    member this.append(item:Item) = this.items <- item::this.items
let addToCollection (collections:List<NumberTypeCollection>) (item:Item) =
    let possibleItem =
        collections
        |> Seq.where (fun c -> c.ItemType = item.Type) //in my realworld code, several groups may be returned
        |> Seq.tryFind(fun _ -> true)
    match possibleItem with
    | Some(f) ->
        f.append item
        collections
    | None -> NumberTypeCollection(item.Type, [item]) :: collections
let rec findTypes (collections:List<NumberTypeCollection>) (items:List<Item>) =
    match items with
    | [] -> collections
    | h::t ->
        let newCollections = (h |> addToCollection collections)
        findTypes newCollections t
let items = [Item(1);Item(2);Item(3);Item(4)]
let finalCollections = findTypes List.empty<NumberTypeCollection> items
I'm unsatisfied with the addToCollection function, since it requires the items in NumberTypeCollection to be mutable. Maybe there are further issues.
What would be a proper functional solution to this?
Edit: I'm sorry, my code was too simplified. Here is a slightly more complex example that should illustrate why I chose a mutable class member (although this could still be the wrong decision):
open System
type Origin = Afrika|Asia|Australia|Europa|NorthAmerika|SouthAmerica
type Person(income, taxrate, origin:Origin) =
    member this.income = income
    member this.taxrate = taxrate
    member this.origin = origin
type PersonGroup(origin:Origin , ?persons:List<Person>) =
    member this.origin = origin
    member val persons:List<Person> = defaultArg persons List.empty<Person> with get,set
    member this.append(person:Person) = this.persons <- person::this.persons
//just some calculations to group people into some subgroups
let isInGroup (person:Person) (personGroup:PersonGroup) =
    let avgIncome =
        personGroup.persons
        |> Seq.map (fun p -> float(p.income * p.taxrate) / 100.0)
        |> Seq.average
    Math.Abs ( (avgIncome / float person.income) - 1.0 ) < 0.5
let addToGroup (personGroups:List<PersonGroup>) (person:Person) =
    let possibleItem =
        personGroups
        |> Seq.where (fun p -> p.origin = person.origin)
        |> Seq.where (isInGroup person)
        |> Seq.tryFind(fun _ -> true)
    match possibleItem with
    | Some(f) ->
        f.append person
        personGroups
    | None -> PersonGroup(person.origin, [person]) :: personGroups
let rec findPersonGroups (persons:List<Person>) (personGroups:List<PersonGroup>) =
    match persons with
    | [] -> personGroups
    | h::t ->
        let newGroup = (h |> addToGroup personGroups)
        findPersonGroups t newGroup
let persons = [Person(1000,20, Afrika);Person(1300,22,Afrika);Person(500,21,Afrika);Person(400,20,Afrika)]
let c = findPersonGroups persons List.empty<PersonGroup>
What I should emphasize: there can be several different groups with the same origin.
Tomas' solution using Seq.groupBy is the best approach if you only need to generate your collections once; it's simple and concise.
If you want to be able to add/remove items in a functional, referentially transparent style for this type of problem, I suggest you move away from seq and start using Map.
You have a setup that is fundamentally dictionary-like: you have a unique key and a value. The functional F# equivalent of a dictionary is Map, an immutable data structure based on an AVL tree. You can insert, remove and look up keys in O(log n) time. When you add to or remove from a Map, the old Map is preserved and you receive a new one.
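A tiny illustration of that persistence, with made-up keys and values: every Map.add returns a new map, and the maps you already hold stay exactly as they were.

```fsharp
let m1 = Map.ofList [ ("odd", [1; 3]) ]
// Map.add returns a new map; m1 itself is not modified.
let m2 = m1 |> Map.add "even" [2]
// Adding an existing key replaces its binding in the returned map.
let m3 = m2 |> Map.add "odd" [5; 1; 3]
```

Because Map.add already replaces an existing binding, the Map.remove step in the append function below is not strictly necessary.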
Here is your code expressed in this style
type ItemType =
    | Odd
    | Even
type Item (number) =
    member this.Number = number
    member this.Type = if (this.Number % 2) = 0 then Even else Odd
type NumTypeCollection = { Items : Map<ItemType, Item list> }
/// Functions on NumTypeCollection
module NumberTypeCollection =
    /// Create empty collection
    let empty = { Items = Map.empty }
    /// Append one item to the collection
    let append (item : Item) numTypeCollection =
        let key = item.Type
        match Map.containsKey key numTypeCollection.Items with
        | true ->
            let value = numTypeCollection.Items |> Map.find key
            let newItems =
                numTypeCollection.Items
                |> Map.remove key
                |> Map.add key (item :: value) // append item
            { Items = newItems }
        | false -> { Items = numTypeCollection.Items |> Map.add key [item] }
    /// Append a list of items to the collection
    let appendList (items : Item list) numTypeCollection =
        items |> List.fold (fun acc it -> append it acc) numTypeCollection
Then call it using:
let items = [Item(1);Item(2);Item(3);Item(4)]
let finalCollections = NumberTypeCollection.appendList items (NumberTypeCollection.empty)
If I understand your problem correctly, you're trying to group the items by their type. The easiest way to do that is to use the standard library function Seq.groupBy. The following should implement the same logic as your code:
items
|> Seq.groupBy (fun item -> item.Type)
|> Seq.map (fun (key, values) ->
    NumberTypeCollection(key, List.ofSeq values))
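To see what Seq.groupBy produces, here is a standalone sketch on plain integers; the classify function is a stand-in for the Item.Type property. Each key is paired with the elements that map to it, and the groups appear in the order their keys are first encountered.

```fsharp
type ItemType = Odd | Even

// Stand-in for the Item.Type property in the question.
let classify n = if n % 2 = 0 then Even else Odd

let grouped =
    [1; 2; 3; 4]
    |> Seq.groupBy classify
    // Materialize each group as a list so the result is easy to inspect.
    |> Seq.map (fun (key, values) -> key, List.ofSeq values)
    |> List.ofSeq
```

Here grouped is [(Odd, [1; 3]); (Even, [2; 4])], which is the same shape as the collections built by the mutable version, computed in one pass.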
Maybe there are further issues.
Probably. It's difficult to tell, since it's hard to work out the purpose of the OP's code... Still:
Why do you even need an Item class? Instead, you could simply have an itemType function:
let itemType i = if i % 2 = 0 then Even else Odd
This function is referentially transparent, which means that you can replace a call to it with its value if you wish. That makes it as good as a property getter, and you've saved yourself from introducing a new type.
Why define a NumberTypeCollection class? Why not a simple record?
type NumberTypeList = { ItemType : ItemType; Numbers : int list }
You can implement addToCollection something like this:
let addToCollection collections i =
    let candidate =
        collections
        |> Seq.filter (fun c -> c.ItemType = (itemType i))
        |> Seq.tryHead
    match candidate with
    | Some x ->
        let x' = { x with Numbers = i :: x.Numbers }
        collections |> Seq.filter ((<>) x) |> Seq.append [x']
    | None ->
        collections |> Seq.append [{ ItemType = (itemType i); Numbers = [i] }]
Being immutable, it doesn't mutate the input collections; instead it returns a new sequence of NumberTypeList values.
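To show how this would be driven, here is a self-contained sketch that folds addToCollection over a list of numbers, starting from an empty sequence (that starting point and the sample input are my assumptions; the function body follows the one above, with a type annotation added).

```fsharp
type ItemType = Odd | Even

let itemType i = if i % 2 = 0 then Even else Odd

type NumberTypeList = { ItemType : ItemType; Numbers : int list }

let addToCollection (collections : seq<NumberTypeList>) i =
    let candidate =
        collections
        |> Seq.filter (fun c -> c.ItemType = itemType i)
        |> Seq.tryHead
    match candidate with
    | Some x ->
        // Replace the matching group with an extended copy.
        let x' = { x with Numbers = i :: x.Numbers }
        collections |> Seq.filter ((<>) x) |> Seq.append [x']
    | None ->
        // Start a new group for this item type.
        collections |> Seq.append [{ ItemType = itemType i; Numbers = [i] }]

// Thread the collections through the items without any mutation.
let finalCollections =
    [1; 2; 3; 4]
    |> List.fold addToCollection Seq.empty
    |> List.ofSeq
```

Since each number is prepended, the groups list their numbers newest first, and the most recently updated group ends up at the front.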
Also notice the use of Seq.tryHead instead of Seq.tryFind(fun _ -> true).
Still, if you're attempting to group items, then Tomas' suggestion of using Seq.groupBy is more appropriate.
I'm looking for the best way to read a fixed-width text file using F#. The file will be plain text, from one to a couple of thousand lines long and around 1000 characters wide. Each line contains around 50 fields, each with a different length. My initial thought was to have something like the following:
type MyRecord = {
    Name : string
    Address : string
    Postcode : string
    Tel : string
    }
let format = [
    (0,10)
    (10,50)
    (50,7)
    (57,20)
    ]
and then read the file line by line, assigning each field according to its format tuple (where the first item is the start index and the second is the width in characters).
Any pointers would be appreciated.
The hardest part is probably to split a single line according to the column format. It can be done something like this:
let splitLine format (line : string) =
    format |> List.map (fun (index, length) -> line.Substring(index, length))
This function has the type (int * int) list -> string -> string list. In other words, format is an (int * int) list. This corresponds exactly to your format list. The line argument is a string, and the function returns a string list.
You can map a list of lines like this:
let result = lines |> List.map (splitLine format)
You can also use Seq.map or Array.map, depending on how lines is defined. Such a result will be a string list list, and you can now map over such a list to produce a MyRecord list.
You can use File.ReadLines to get a lazily evaluated sequence of strings from a file.
Please note that the above is only an outline of a possible solution. I left out boundary checks, error handling, and such. The above code may contain off-by-one errors.
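As one possible way to fill in that outline for the four-field MyRecord from the question, here is a sketch. The names splitLineSafe, toRecord and readRecords are mine, and clamping each field to the line length before trimming is just one choice of boundary handling.

```fsharp
open System.IO

type MyRecord =
    { Name : string
      Address : string
      Postcode : string
      Tel : string }

// Like splitLine, but clamps each (start, width) pair to the line length,
// so short lines yield empty or truncated fields instead of throwing.
let splitLineSafe format (line : string) =
    format
    |> List.map (fun (start, width) ->
        if start >= line.Length then ""
        else line.Substring(start, min width (line.Length - start)).Trim())

// Assumes the format list has exactly four entries, in MyRecord field order.
let toRecord fields =
    match fields with
    | [ name; address; postcode; tel ] ->
        Some { Name = name; Address = address; Postcode = postcode; Tel = tel }
    | _ -> None

// File.ReadLines is lazy, so the file is streamed rather than loaded at once.
let readRecords format (path : string) =
    File.ReadLines path
    |> Seq.map (splitLineSafe format)
    |> Seq.choose toRecord
```

With 50 fields the toRecord match gets long, which is one argument for the reflection-based approach further down.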
Here's a solution with a focus on custom validation and error handling for each field. This might be overkill for a data file consisting of just numeric data!
First, for these kinds of things, I like to use the parser in Microsoft.VisualBasic.dll as it's already available without using NuGet.
For each row, we can return the array of fields and the line number (for error reporting):
#r "Microsoft.VisualBasic.dll"
// for each row, return the line number and the fields
let parserReadAllFields fieldWidths textReader =
    let parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader=textReader)
    parser.SetFieldWidths fieldWidths
    parser.TextFieldType <- Microsoft.VisualBasic.FileIO.FieldType.FixedWidth
    seq {
        while not parser.EndOfData do
            yield parser.LineNumber, parser.ReadFields() }
Next, we need a little error handling library (see http://fsharpforfunandprofit.com/rop/ for more)
type Result<'a> =
    | Success of 'a
    | Failure of string list
module Result =
    let succeedR x =
        Success x
    let failR err =
        Failure [err]
    let mapR f xR =
        match xR with
        | Success a -> Success (f a)
        | Failure errs -> Failure errs
    let applyR fR xR =
        match fR,xR with
        | Success f,Success x -> Success (f x)
        | Failure errs,Success _ -> Failure errs
        | Success _,Failure errs -> Failure errs
        | Failure errs1, Failure errs2 -> Failure (errs1 @ errs2)
Then define your domain model. In this case, it is the record type with a field for each field in the file.
type MyRecord =
    {id:int; name:string; description:string}
And then you can define your domain-specific parsing code. For each field I have created a validation function (validateId, validateName, etc).
Fields that don't need validation can pass through the raw data (validateDescription).
In fieldsToRecord the various fields are combined using applicative style (<!> and <*>).
For more on this, see http://fsharpforfunandprofit.com/posts/elevated-world-3/#validation.
Finally, readRecords maps each input row to a record Result and keeps only the successful ones. The failed ones are logged by handleResult.
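Before the full parser, a tiny standalone check of why the applicative style matters here: applyR keeps going after a failure and concatenates the error lists, so one bad row can report several problems at once. The Result type and the two operators are copied from the code above; makePair and the sample values are illustrative.

```fsharp
type Result<'a> =
    | Success of 'a
    | Failure of string list

let mapR f xR =
    match xR with
    | Success a -> Success (f a)
    | Failure errs -> Failure errs

// Applicative apply: when both sides fail, the error lists are concatenated.
let applyR fR xR =
    match fR, xR with
    | Success f, Success x -> Success (f x)
    | Failure errs, Success _ -> Failure errs
    | Success _, Failure errs -> Failure errs
    | Failure errs1, Failure errs2 -> Failure (errs1 @ errs2)

let (<!>) = mapR
let (<*>) = applyR

let makePair a b = (a, b)

let ok = makePair <!> Success 1 <*> Success "x"
let bad : Result<int * string> = makePair <!> Failure ["bad id"] <*> Failure ["blank name"]
```

This is the same mechanism that produces the two-error line for row 4 in the output further down.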
module MyFileParser =
    open Result
    let createRecord id name description =
        {id=id; name=name; description=description}
    let validateId (lineNo:int64) (fields:string[]) =
        let rawId = fields.[0]
        match System.Int32.TryParse(rawId) with
        | true, id -> succeedR id
        | false, _ -> failR (sprintf "[%i] Can't parse id '%s'" lineNo rawId)
    let validateName (lineNo:int64) (fields:string[]) =
        let rawName = fields.[1]
        if System.String.IsNullOrWhiteSpace rawName then
            failR (sprintf "[%i] Name cannot be blank" lineNo )
        else
            succeedR rawName
    let validateDescription (lineNo:int64) (fields:string[]) =
        let rawDescription = fields.[2]
        succeedR rawDescription // no validation
    let fieldsToRecord (lineNo,fields) =
        let (<!>) = mapR
        let (<*>) = applyR
        let validatedId = validateId lineNo fields
        let validatedName = validateName lineNo fields
        let validatedDescription = validateDescription lineNo fields
        createRecord <!> validatedId <*> validatedName <*> validatedDescription
    /// print any errors and only return good results
    let handleResult result =
        match result with
        | Success record -> Some record
        | Failure errs -> printfn "ERRORS %A" errs; None
    /// return a sequence of records
    let readRecords parserOutput =
        parserOutput
        |> Seq.map fieldsToRecord
        |> Seq.choose handleResult
Here's an example of the parsing in practice:
// Set up some sample text
let text = """01name1description1
02name2description2
xxname3badid-------
yy badidandname
"""
// create a low-level parser
let textReader = new System.IO.StringReader(text)
let fieldWidths = [| 2; 5; 11 |]
let parserOutput = parserReadAllFields fieldWidths textReader
// convert to records in my domain
let records =
    parserOutput
    |> MyFileParser.readRecords
    |> Seq.iter (printfn "RECORD %A") // print each record
The output will look like:
RECORD {id = 1;
name = "name1";
description = "description";}
RECORD {id = 2;
name = "name2";
description = "description";}
ERRORS ["[3] Can't parse id 'xx'"]
ERRORS ["[4] Can't parse id 'yy'"; "[4] Name cannot be blank"]
By no means is this the most efficient way to parse a file (I think there are some CSV parsing libraries available on NuGet that can do validation while parsing) but it does show how you can have complete control over validation and error handling if you need it.
A record with 50 fields is a bit unwieldy, so alternative approaches that generate the data structure dynamically may be preferable (e.g. System.Data.DataRow).
If it has to be a record anyway, you could at least avoid assigning each record field manually and populate it via reflection instead. This trick relies on the order in which the fields are declared. I am assuming that every fixed-width column corresponds to a record field, so that the start indices are implied.
open Microsoft.FSharp.Reflection
type MyRecord =
    { Name : string
      Address : string
      City : string
      Postcode : string
      Tel : string }
    static member CreateFromFixedWidth format (line : string) =
        let fields =
            format
            |> List.fold (fun (index, acc) length ->
                let str = line.[index .. index + length - 1].Trim()
                index + length, box str :: acc) (0, [])
            |> snd
            |> List.rev
            |> List.toArray
        FSharpValue.MakeRecord(typeof<MyRecord>, fields) :?> MyRecord
Example data:
"Postman Pat     " +
"Farringdon Road " +
"London          " +
"EC1A 1BB" +
"+44 20 7946 0813"
|> MyRecord.CreateFromFixedWidth [16; 16; 16; 8; 16]
// val it : MyRecord = {Name = "Postman Pat";
// Address = "Farringdon Road";
// City = "London";
// Postcode = "EC1A 1BB";
// Tel = "+44 20 7946 0813";}
I'm not sure about the phrase "exclusive state management" in the title; it was my best attempt at stating the problem concisely.
I am porting some of my C# code to F#, trying to make it as idiomatic as I can. I have an entity that requests a number of IDs from a sequence in my database and then dispenses these IDs to anyone who needs one. Once an ID is given out, it should no longer be available to anybody else. Hence there must be some sort of state associated with that entity that keeps track of the remaining IDs. Since mutable state is not idiomatic, I can write something like this:
let createIdManager =
    let idToStartWith = 127
    let allowed = 10
    let givenOut = 0
    (idToStartWith, allowed, givenOut)
let getNextAvailableId (idToStartWith, allowed, givenOut) =
    if givenOut < allowed
    then ((idToStartWith, allowed, givenOut + 1), Some(idToStartWith + givenOut))
    else ((idToStartWith, allowed, givenOut), None)
let (idManager, idOpt) = getNextAvailableId createIdManager
match idOpt with
| Some(id) -> printf "Yay!"
| None -> reloadIdManager idManager |> getNextAvailableId
This approach is idiomatic (as far as I can tell) but extremely fragile; there are many ways to get it wrong. My biggest concern is that once an ID has been handed out and a newer copy of the ID manager has been made, nothing stops you from using the older copy and getting the same ID again.
So how do I do exclusive state management, per se, in F#?
If you only need to initialize the set of ids once then you can simply hide a mutable reference to a list inside a local function scope, as in:
let nextId =
    let idsRef = ref <| loadIdsFromDatabase()
    fun () ->
        match idsRef.Value with
        | [] ->
            None
        | id::ids ->
            idsRef.Value <- ids
            Some id
let id1 = nextId ()
let id2 = nextId ()
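To try the closure approach without a database, here is a runnable sketch in which a hypothetical loadIdsFromDatabase stub returns a fixed range (127 to 136, matching the ten IDs starting at 127 in the question); the rest follows the code above.

```fsharp
// Hypothetical stand-in for the real database call.
let loadIdsFromDatabase () = [127 .. 136]

let nextId =
    // The ref cell is captured by the closure and cannot be reached from outside,
    // so there is no stale copy of the state that callers could reuse.
    let idsRef = ref (loadIdsFromDatabase ())
    fun () ->
        match idsRef.Value with
        | [] -> None
        | id :: ids ->
            idsRef.Value <- ids
            Some id
```

Note that this hides mutable state rather than removing it, so it is not thread-safe as written; concurrent callers would need something like a lock around the update.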
You could use a state monad (written as a computation expression).
First we declare the state-monad
type State<'s,'a> = State of ('s -> 'a * 's)
type StateBuilder<'s>() =
    member x.Return v : State<'s,_> = State(fun s -> v,s)
    member x.Bind(State v, f) : State<'s,_> =
        State(fun s ->
            let (a,s) = v s
            let (State v') = f a
            v' s)
let withState<'s> = StateBuilder<'s>()
let runState (State f) init = f init
Then we define your 'IdManager' and a function to get the next available id as well as the new state after the execution of the function.
type IdManager = {
    IdToStartWith : int
    Allowed : int
    GivenOut : int
    }
let getNextId state =
    if state.Allowed > state.GivenOut then
        Some (state.IdToStartWith + state.GivenOut), { state with GivenOut = state.GivenOut + 1 }
    else
        None, state
Finally we define our logic that requests the ids and execute the state-monad.
let idStateProcess =
    withState {
        let! id1 = State(getNextId)
        printfn "Got id %A" id1
        let! id2 = State(getNextId)
        printfn "Got id %A" id2
        //...
        return ()
    }
let initState = { IdToStartWith = 127; Allowed = 10; GivenOut = 0 }
let (_, postState) =
    runState
        idStateProcess
        initState //This should be loaded from database in your case
Output:
Got id Some 127
Got id Some 128
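As a check that the state really is threaded through, here is a compact, self-contained variant of the same code that returns the two IDs instead of printing them and also inspects the final state (the name twoIds is mine).

```fsharp
type State<'s, 'a> = State of ('s -> 'a * 's)

type StateBuilder<'s>() =
    member x.Return v : State<'s, _> = State(fun s -> v, s)
    member x.Bind(State v, f) : State<'s, _> =
        State(fun s ->
            let (a, s) = v s
            let (State v') = f a
            v' s)

let withState<'s> = StateBuilder<'s>()
let runState (State f) init = f init

type IdManager = { IdToStartWith : int; Allowed : int; GivenOut : int }

let getNextId state =
    if state.Allowed > state.GivenOut then
        Some (state.IdToStartWith + state.GivenOut), { state with GivenOut = state.GivenOut + 1 }
    else
        None, state

// Hypothetical process: request two ids and return both.
let twoIds =
    withState {
        let! id1 = State getNextId
        let! id2 = State getNextId
        return (id1, id2)
    }

let (ids, postState) = runState twoIds { IdToStartWith = 127; Allowed = 10; GivenOut = 0 }
```

Inside the computation, each step only ever sees the state returned by the previous step, so there is no older copy available to reuse by mistake; the caller only gets the final state back from runState.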