I am trying to read a csv file (without headers) using F#-data. So far I have got:
let filename = #"data.csv"
let file = File.OpenText(filename)
let data = CsvFile.Load(file)
for row in data.Rows do
// ..
I would like to convert each row to an array of integers. How can I accomplish this?
The row value is an object that exposes all values of the row via row.Columns. This is an array of strings, so you can use Array.map and turn each value into a float using the float function.
for row in data.Rows do
let asFloatArray = Array.map float row.Columns
printfn "%A" asFloatArray // TODO: Do something useful here :-)
Related
I have a function to read a byte from an array:
let readByte array address = (* []<byte> -> int -> byte *)
(* ... *)
In an other function where I use this function very often, I'd like to do partial application:
let myOtherFunction array =
let readByteAsInt = readByte array
(* ... *)
let x = readByteAsInt address
Basically this works. In my example x is now of type byte but I would need it as an int. In this simple example I could cast the return value when assigning to x but as I said, I use readByteAsInt very often and I don't want to do the cast on every call.
I have tried
let readByteAsInt = int (readByte array)
and
let readByteAsInt = readByte array |> int
but both result in an error when calling readByteAsInt:
FS0003: This value is not a function and cannot be applied
How can I change the return type of the partial application?
Changing the signature of readByte to return an int is not an option as I do need byte elsewhere.
readByte array is a function, and int is another function; you want to compose them so that the output of readByte array becomes the input of int. You can write readByte array >> int - note that it's >> for composition, instead of |> for application.
Like #kaya3 mentioned, you can use function composition with (<<) or (>>). But even without function application or composition, you could just create another function directly with a parameter instead.
let readByteAsInt pos = int (readByte array pos)
let readByteAsInt pos = readByte array pos |> int
Or the function compositon way
let readByteAsInt = int << readByte array
let readByteAsInt = readByte array >> int
Keep in mind, that function composition, does not always work in F# when the function has a generic input, then you must define a function with arguments explicitly.
I have an array of strings (they were urls that are casted to strings), in the middle of each string they all have a UUID().uuidString with a count variable and a .jpg in them.
let arr = [ htps://firebasestorage.googleapis.com/myApp.appspot.com/o/users%Mxd6EUO5l2LK-mKa%2F4606E275-B2C5-4A69-B997-01423ABFE3B7%2FBE26726D-B8E5-47C8-9A18-504D23B99090_3.jpg?alt=media&token=e215e6a1-f5b9-431e-83a3,
htps://firebasestorage.googleapis.com/myAapp.appspot.com/o/users%-Ll_Mxd6EUO5l2LK-mKa%2F4606E275-B2C5-4A69-B997-01423ABFE3B7%2FBE26726D-B8E5-47C8-9A18-504D23B99090_1.jpg?alt=media&token=f350cf36-4c4e-4faf,
htps://firebasestorage.googleapis.com/myAPp.appspot.com/o/users%mKa%2F4606E275-B2C5-4A69-B997-01423ABFE3B7%2FBE26726D-B8E5-47C8-9A18-504D23B99090_2.jpg?alt=media&token=123uyqtr
....]
The first element has this in the middle of it: 2FBE26726D-B8E5-47C8-9A18-504D23B99090_3.jpg
The second element has this in the middle of it: 2FBE26726D-B8E5-47C8-9A18-504D23B99090_1.jpg
The third element has this in the middle of it: 2FBE26726D-B8E5-47C8-9A18-504D23B99090_2.jpg
The fourth element and on and on ..
How can I sort these strings in this array based on either the substring of the UUID with the _x.jpg or just the _x.jpg alone?
FYI I have access to the UUID beforehand
You can sort the array this way
Convert the strings (back) to URL.
Get the lastPathComponent of each URL.
extract the substring from the last underscore character to the end.
Compare the strings with compare: and numeric option or localizedStandardCompare:
If your starting array is something like this
let array = ["htps://firebasestorage.googleapis.com/myApp.appspot.com/o/users%Mxd6EUO5l2LK-mKa%2F4606E275-B2C5-4A69-B997-01423ABFE3B7%2FBE26726D-B8E5-47C8-9A18-504D23B99090_3.jpg?alt=media&token=e215e6a1-f5b9-431e-83a3","htps://firebasestorage.googleapis.com/myAapp.appspot.com/o/users%-Ll_Mxd6EUO5l2LK-mKa%2F4606E275-B2C5-4A69-B997-01423ABFE3B7%2FBE26726D-B8E5-47C8-9A18-504D23B99090_1.jpg?alt=media&token=f350cf36-4c4e-4faf","htps://firebasestorage.googleapis.com/myAPp.appspot.com/o/users%mKa%2F4606E275-B2C5-4A69-B997-01423ABFE3B7%2FBE26726D-B8E5-47C8-9A18-504D23B99090_2.jpg?alt=media&token=123uyqtr"]
Give this one a try.
You can split based on your jpg suffix, and then based on your UUID.
let sortedArray = array.sorted { (first, second) -> Bool in
let firstIndex = Int((first.components(separatedBy: ".jpg")[0]).components(separatedBy: "FBE26726D-B8E5-47C8-9A18-504D23B99090_")[1]) ?? -1
let secondIndex = Int((second.components(separatedBy: ".jpg")[0]).components(separatedBy: "FBE26726D-B8E5-47C8-9A18-504D23B99090_")[1]) ?? -1
return firstIndex < secondIndex
}
I use the CSV Type Provider quite a bit which uses tuples for all the underlying data. My issue is that I run into csv files with lots of columns and find that building the infrastructure around manipulating data of that size to be rather cumbersome. How do I build a function that will update only a specific subset of the columns without having to create huge pattern matching statements? See code example below.
// I want to generate lots of sample data for the first three columns
// And I would like to avoid creating giant record types in order to
// do this task as the csv's are always changing and the records would
// have to change as well
type CumbersomeCSVRow = // 17 fields now, but imagine 212
string * string * string * Option<int> * string *
Option<int> * string * string * string * string *
string * string * Option<float> * string *
Option<float> * Option<float> * string
// Here is a sample row of data
let sampleCumbersomeRow : CumbersomeCSVRow =
("First","Second","Third",Some(52),"MSCI",None,"B74A123","",
"Airlines","Transportation","","",Some(1.04293),"Updated",
Some(0.95),Some(56.7423),"Measured")
// Record type of the sample data that I want to 'insert' into the cumbersome data type
type FirstThreeStrings =
{ First : string; Second : string; Third : string}
// and some instances of the new data
let generatedFrontMatters =
// imagine lots of sample data and workflows to create it, hence the records
seq { for letter in ["A";"B";"C"] ->
{ First = letter;
Second = letter + letter
Third = letter + letter + letter } }
Take a look at the FSharp.Reflection module
// Your Code --------------------------------------------------
// I want to generate lots of sample data for the first three columns
// And I would like to avoid creating giant record types in order to
// do this task as the csv's are always changing and the records would
// have to change as well
type CumbersomeCSVRow = // 17 fields now, but imagine 212
string * string * string * Option<int> * string *
Option<int> * string * string * string * string *
string * string * Option<float> * string *
Option<float> * Option<float> * string
// Here is a sample row of data
let sampleCumbersomeRow : CumbersomeCSVRow =
("First","Second","Third",Some(52),"MSCI",None,"B74A123","",
"Airlines","Transportation","","",Some(1.04293),"Updated",
Some(0.95),Some(56.7423),"Measured")
// Record type of the sample data that I want to 'insert' into the cumbersome data type
type FirstThreeStrings =
{ First : string; Second : string; Third : string}
// and some instances of the new data
let generatedFrontMatters =
// imagine lots of sample data and workflows to create it, hence the records
seq { for letter in ["A";"B";"C"] ->
{ First = letter;
Second = letter + letter
Third = letter + letter + letter } }
// Response ---------------------------------------------------
open FSharp.Reflection
// create a static back matter to append to
// the slicing here is the part that will need to change as the data changes
// ++ maybe there's a better way to handle this part??
let staticBackMatter =
sampleCumbersomeRow
|> FSharpValue.GetTupleFields
|> (fun x -> x.[3 .. 16])
// cast the front matter using FSharp.Reflection
let castedFrontMatters = Seq.map FSharpValue.GetRecordFields generatedFrontMatters
// append the data arrays together, create a tuple using reflection, and downcast the result
// to your type
let generatedRows =
castedFrontMatters
|> Seq.map (fun frontArray -> Array.append frontArray staticBackMatter)
|> Seq.map (fun data -> FSharpValue.MakeTuple(data,typeof<CumbersomeCSVRow>) :?> CumbersomeCSVRow)
|> Seq.toList
Whether this is proper F#, I don't know.
i am reading my data as a frame on F# as follows
let myannual = Frame.ReadCsv("data/annual.csv")
My frame consists of time series columns and a year column, and I would like to index my time series by year. I cannot do it as follows
let myyears = [| for i in myannual.GetColumn<float>("yyyy").Values -> float i |]
let myindexedframe = myannual.IndexRows(myyears)
What should I do? Any feedback is appreciated!
The ReadCsv method takes an optional parameter indexCol that can be used to specify the index column - and you also need to provide a type parameter to tell Deedle what is the type of the index:
let myannual = Frame.ReadCsv<int>("data/annual.csv", indexCol="yyy")
Your approach would work too, but you'd need to use IndexRowsWith, which takes a sequence of new indices (and it is better to use int because float is imprecise for years):
let myyears = [| for i in myannual.GetColumn<float>("yyyy").Values -> int i |]
let myindexedframe = myannual.IndexRowsWith(myyears)
The IndexRows method takes just the name of a column (very similar to using the indexCol parameter when calling ReadCsv):
let myindexedframe = myannual.IndexRows<int>("yyyy")
I'm trying to create a communication library that interacts with hardware. The protocol is made up of byte arrays with a header (source/destination address, command number, length) and a command specific payload. I'm creating Record Types for each of the commands to make them more user friendly.
Is there a more idiomatic way of converting an array to a record than
let data = [0;1]
type Rec = {
A : int
B : int
}
let convert d =
{
A = d.[0]
B = d.[1]
}
This can become very tedious when the records are much larger.
A few comments:
You record type definition is bogus - there should be no = in there. I assume you want
type Rec = {
A : int
B : int
}
You mentioned byte arrays, but your data value is a List. Accessing List items by index is expensive (O(n)) and should be avoided. If you meant to declare it as an array, the syntax is let data = [|0;1|]
But I wonder if records are the right fit here. If your goal is to have a single function that accepts a byte array and returns back various strongly-typed interpretations of that data, then a discriminated union might be best.
Maybe something along these lines:
// various possible command types
type Commands =
| Command1 of byte * int // maybe payload of Command1 is known to be an int
| Command2 of byte * string // maybe payload of Command1 is known to be a string
// active pattern for initial data decomposition
let (|Command|) (bytes : byte[]) =
(bytes.[0], bytes.[1], Array.skip 2 bytes)
let convert (bytes : byte[]) =
match bytes with
| Command(addr, 1uy, [| intData |]) ->
Command1(addr, int intData)
| Command(addr, 2uy, strData) ->
Command2(addr, String(Text.Encoding.ASCII.GetChars(strData)))
| _ ->
failwith "unknown command type"
// returns Command1(0x10, 42)
convert [| 0x10uy; 0x01uy; 0x2Auy |]
// returns Command2(0x10, "foobar")
convert [| 0x10uy; 0x02uy; 0x66uy; 0x6Fuy; 0x6Fuy; 0x62uy; 0x61uy; 0x72uy |]