I have a text file of rain measured over 3 years, where each number after the year corresponds to the month.
For example, in
2002 1.17 0.78 7.11 5.17 5.84 4.29 1.12 4.06 1.9 2.07 1.47 1.53
2001 1.24 3.22 1.61 3.33 6.55 2.4 3.5 1.32 3.9 6.04 1.69 1.13
2000 1.277 1.4 1.17 5.74 6.48 4.81 4.07 3.19 6.2 1.95 2.65 1.7
In 2002, Average rainfall in Feb was 0.78.
I made a list of tuples called mylist, in the format (year,values,average,min,max) where year is an int, values is a float list, average is an int that averages all of 'values', min is an int holding the smallest 'value', and max is an int holding the largest.
My question:
How do I calculate the average of the n'th elements in the list, like the average of month January, Feb, March....
I have:
let months = [ "Jan"; "Feb"; "Mar"; "Apr"; "May"; "Jun"; "Jul"; "Aug"; "Sep"; "Oct"; "Nov"; "Dec" ] //string list
and I'm thinking of something along the lines of:
mylist |> List.map (fun (_,values, _, _, _) -> average 0th index across all tuples, print Jan, then average 1st index across all tuples, print Feb, etc...)
or
mylist |> List.map (fun (_,values, _, _, _) -> printfn"%A %A" List.average 0thIndex,0thMonth....List.average 1stIndex, 1stMonth, etc...)
But I'm not familiar enough with the functional language to know all the operations on lists and maps. I'm more comfortable with Java and C.
I would map values to list of lists:
let vs = mylist |> List.map (fun (_, values, _, _, _) -> values)
Then transpose it to get a list of lists where each inner list holds the values for one month.
let mvs = vs |> transpose
And then calculate averages using:
let avgs = mvs |> List.map List.average
Use transpose from this answer (recent versions of FSharp.Core also provide a built-in List.transpose).
Oh, and if you want to print them in a nice way:
avgs |> List.iteri (fun i avg -> printfn "Month %s average: %f" months.[i] avg)
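Putting the steps above together, here is a minimal sketch using the sample rows from the question. It assumes a FSharp.Core version that ships List.transpose; mkRow is a hypothetical helper that only exists here to build the five-element tuples, and the average/min/max fields are floats in this sketch.

```fsharp
// Hypothetical helper: builds (year, values, average, min, max) from the raw values.
let mkRow year (values: float list) =
    (year, values, List.average values, List.min values, List.max values)

// Sample data copied from the question.
let mylist =
    [ mkRow 2002 [ 1.17; 0.78; 7.11; 5.17; 5.84; 4.29; 1.12; 4.06; 1.9; 2.07; 1.47; 1.53 ]
      mkRow 2001 [ 1.24; 3.22; 1.61; 3.33; 6.55; 2.4; 3.5; 1.32; 3.9; 6.04; 1.69; 1.13 ]
      mkRow 2000 [ 1.277; 1.4; 1.17; 5.74; 6.48; 4.81; 4.07; 3.19; 6.2; 1.95; 2.65; 1.7 ] ]

let months = [ "Jan"; "Feb"; "Mar"; "Apr"; "May"; "Jun"; "Jul"; "Aug"; "Sep"; "Oct"; "Nov"; "Dec" ]

// Extract the value lists, turn rows (years) into columns (months),
// then average each month across all years.
let monthlyAverages =
    mylist
    |> List.map (fun (_, values, _, _, _) -> values)
    |> List.transpose
    |> List.map List.average

// Pair each month name with its average and print it.
List.zip months monthlyAverages
|> List.iter (fun (month, avg) -> printfn "%s average: %.3f" month avg)
```

List.zip requires both lists to have the same length, so this only works if every year has exactly 12 values.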
Good evening! I am a very new programmer getting my feet wet with F#. I am attempting to do some simple data analysis and plotting but I cannot figure out how access the data properly. I get everything set up and use the CSVProvider and it works perfectly:
#load @"packages\FsLab\FsLab.fsx"
#load @"packages\FSharp.Charting\FSharp.Charting.fsx"
open Deedle
open FSharp.Data
type Pt = CsvProvider<"C:/Users/berkl/Test10/CGC.csv">
let data = Pt.Load("C:/Users/berkl/Test10/CGC.csv")
Then, I pull out the data for a specific entry:
let test = data.Rows |> Seq.filter (fun r -> r.``Patient number`` = 2104)
This works as expected and prints the following to FSI:
test;;
val it : seq<CsvProvider<...>.Row> =
seq
[(2104, "Cita 1", "Nuevo", "Femenino", nan, nan, nan);
(2104, "Cita 2", "Establecido", "", 18.85191818, 44.0, 103.0);
(2104, "Cita 3", "Establecido", "Femenino", 17.92617533, 46.0, 108.0);
(2104, "Cita 4", "Establecido", "Femenino", nan, nan, nan); ...]
Here is where I'm at a loss. I want to take out the fifth column and plot it against the sixth column. And I don't know how to access it.
What I can do so far is access a single value in one of the columns:
let Finally = Seq.item 1 test
let PtHt = Finally.Ht_cm
Any help is much appreciated!!
I would probably recommend using the XPlot library instead of F# Charting, because that is the one that's going to be available in FsLab in the long term (it is cross-platform).
To create a chart using XPlot, you need to give it a sequence of pairs with X and Y values:
#load "packages/FsLab/FsLab.fsx"
open XPlot.Plotly
Chart.Scatter [ for x in 0.0 .. 0.1 .. 10.0 -> x, sin x ]
In your example, you can get the required format using sequence comprehensions (as in the above example) or using Seq.map as in the existing answer - both options do the same thing:
// Using sequence comprehensions
Chart.Scatter [ for row in test -> row.Ht_cm, row.Wt_kg ]
// Using Seq.map and piping
test |> Seq.map (fun row -> row.Ht_cm, row.Wt_kg) |> Chart.Scatter
The key thing is that you need to produce one sequence (or a list) containing the X and Y values as a tuple (rather than producing two separate sequences).
What you want to do is transform your sequence of rows to a sequence of values from a column. You use Seq.map for any such transformation.
In your case, you could do (modulo the correct column names which I don't have)
let col5 =
    test
    |> Seq.map (fun row -> row.Ht_cm)
let col6 =
    test
    |> Seq.map (fun row -> row.Wt_kg)
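The FSI output in the question shows nan in several rows, which a chart usually cannot place. A small sketch of pairing the two columns and dropping incomplete rows before plotting follows. The Visit record is a hypothetical stand-in for the provided row type, and the column names Ht_cm and Wt_kg are assumed, as above.

```fsharp
// Hypothetical stand-in for the CsvProvider row; only the two columns used here.
type Visit = { Ht_cm: float; Wt_kg: float }

// Values taken from the last two columns of the FSI output in the question.
let test =
    [ { Ht_cm = nan; Wt_kg = nan }
      { Ht_cm = 44.0; Wt_kg = 103.0 }
      { Ht_cm = 46.0; Wt_kg = 108.0 }
      { Ht_cm = nan; Wt_kg = nan } ]

// Build (x, y) pairs and keep only those where both values are real numbers.
let points =
    test
    |> Seq.map (fun row -> row.Ht_cm, row.Wt_kg)
    |> Seq.filter (fun (x, y) -> not (System.Double.IsNaN x || System.Double.IsNaN y))
    |> Seq.toList

printfn "%A" points
```

The resulting list of pairs is in the shape a scatter chart function expects.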
I'm an R developer that is interested in getting good at F# so this question is part of a broader theme of how to shape and reshape data.
Question:
There are three months in the NYC Flight Delays dataset where there were more than 7000 weather delays. I would like to filter out all other months so that I have only those three months alone to analyze. How would this be done in F#? Is the long-term F# solution just to call R? Or are there robust data libraries in .NET that can already do these sort of tasks.
You can use the CSV Type Provider from FSharp.Data to get strongly typed access to your data, even directly from the internet address:
#r "../packages/FSharp.Data.2.2.5/lib/net40/FSharp.Data.dll"
open System
open FSharp.Data
type FlightDelays =
    CsvProvider<"https://raw.githubusercontent.com/wiki/arunsrinivasan/flights/NYCflights14/delays14.csv">
This gives you strongly typed access to the data source. As an example, to find all the months with weather delays more than 7000, you can do something like this:
let monthsWithDelaysOver7k =
    FlightDelays.GetSample().Rows
    |> Seq.filter (fun r -> not (Double.IsNaN r.Weather_delay))
    |> Seq.groupBy (fun r -> r.Year, r.Month)
    |> Seq.map (fun ((y, m), rs) -> y, m, rs |> Seq.sumBy (fun r -> r.Weather_delay))
    |> Seq.filter (fun (y, m, d) -> d >= 7000.)
Converted to a list, the data looks like this:
> monthsWithDelaysOver7k |> Seq.toList;;
val it : (int * int * float) list =
[(2014, 1, 118753.0); (2014, 2, 59567.0); (2014, 4, 7618.0);
(2014, 5, 11594.0); (2014, 6, 15928.0); (2014, 7, 54298.0);
(2014, 10, 7241.0)]
You can now use monthsWithDelaysOver7k to get all the rows in those months.
You can probably write some more efficient queries than the above, but this should give you an idea about how to approach the problem.
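One way to go from the qualifying months back to the rows themselves is to collect the (year, month) keys into a set and filter on membership. This is only a sketch: the Flight record and its sample values are hypothetical, standing in for the rows the type provider would give you.

```fsharp
// Hypothetical row shape; the real rows come from the CSV type provider.
type Flight = { Year: int; Month: int; Weather_delay: float }

// Made-up sample rows, for illustration only.
let rows =
    [ { Year = 2014; Month = 1; Weather_delay = 5000.0 }
      { Year = 2014; Month = 1; Weather_delay = 4000.0 }
      { Year = 2014; Month = 2; Weather_delay = 100.0 }
      { Year = 2014; Month = 3; Weather_delay = nan }
      { Year = 2014; Month = 3; Weather_delay = 8000.0 } ]

// Set of (year, month) keys whose total weather delay reaches the threshold.
let busyMonths =
    rows
    |> Seq.filter (fun r -> not (System.Double.IsNaN r.Weather_delay))
    |> Seq.groupBy (fun r -> r.Year, r.Month)
    |> Seq.filter (fun (_, rs) -> (rs |> Seq.sumBy (fun r -> r.Weather_delay)) >= 7000.0)
    |> Seq.map fst
    |> Set.ofSeq

// Keep every original row (including ones with a NaN delay) that falls in those months.
let rowsInBusyMonths =
    rows |> List.filter (fun r -> Set.contains (r.Year, r.Month) busyMonths)

printfn "%d rows in %d months" (List.length rowsInBusyMonths) (Set.count busyMonths)
```

Filtering against the original rows rather than the NaN-filtered ones is a design choice here; swap the source if you want the NaN rows dropped as well.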
I have a text file of rain measured over 3 years, where each number after the year corresponds to the month.
For example, in
2002 1.17 0.78 7.11 5.17 5.84 4.29 1.12 4.06 1.9 2.07 1.47 1.53
2001 1.24 3.22 1.61 3.33 6.55 2.4 3.5 1.32 3.9 6.04 1.69 1.13
2000 1.277 1.4 1.17 5.74 6.48 4.81 4.07 3.19 6.2 1.95 2.65 1.7
In 2002, Average rainfall in Feb was 0.78.
I made a list of tuples, in the format (year,values,average,min,max) where year is an int, values is a float list, average is an int that averages all of 'values', min is an int holding the smallest 'value', and max is an int holding the largest.
My question:
I am able to print the smallest value from all the tuples using List.minBy fourth mylist, and the year it came from (because it's a single element), but how do I match that number to the month it came from?
I have
let TupleWithSmallestValue= List.minBy min biglist
let (year,_,_,min,_) = TupleWithSmallestValue
printfn"Min rainfall:\t%A; %A" min year
and I was thinking something along the lines of :
List.map (fun (year,(Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec), avg, min, max) -> somehow print the min and the month it came from )
But I know that's wrong and I'm trying to address each value to make it correspond to a month, just like I did with my example above. If my question is clear enough, how do I do this? I am a C++ programmer but I like the style of this language. Just missing some basics.
Something like this should work:
let months = [ "Jan"; "Feb"; ...; "Dec" ]
let (year, values, _, min, _) = TupleWithSmallestValue
let month =
    values
    |> List.mapi (fun i x -> (i, x))
    |> List.find (fun (m, v) -> v = min)
    |> fst
    |> List.nth months
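An alternative sketch that avoids the index lookup: pair every value with its month name up front, then take the minimum over all years in one pass. The rainfall list below reuses the sample rows from the question but keeps only the year and the values; that simplified shape is an assumption made for this example.

```fsharp
let months = [ "Jan"; "Feb"; "Mar"; "Apr"; "May"; "Jun"; "Jul"; "Aug"; "Sep"; "Oct"; "Nov"; "Dec" ]

// (year, monthly values) pairs from the question's sample; other tuple fields omitted.
let rainfall =
    [ (2002, [ 1.17; 0.78; 7.11; 5.17; 5.84; 4.29; 1.12; 4.06; 1.9; 2.07; 1.47; 1.53 ])
      (2001, [ 1.24; 3.22; 1.61; 3.33; 6.55; 2.4; 3.5; 1.32; 3.9; 6.04; 1.69; 1.13 ])
      (2000, [ 1.277; 1.4; 1.17; 5.74; 6.48; 4.81; 4.07; 3.19; 6.2; 1.95; 2.65; 1.7 ]) ]

// Flatten to (year, month name, value) triples, then pick the triple with the smallest value.
let (minYear, minMonth, minValue) =
    rainfall
    |> List.collect (fun (year, values) ->
        List.zip months values |> List.map (fun (month, v) -> year, month, v))
    |> List.minBy (fun (_, _, v) -> v)

printfn "Min rainfall: %.2f in %s %d" minValue minMonth minYear
```

Because the month name travels with the value, there is no need to search for the minimum a second time or compare floats for equality.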
I have a record that holds one DateTime array and 2 double arrays all of the same length and related by index.
I want to get the average of the delta between the 2 double arrays based on the hour of the DateTime, so I'll have 24 averages in the end. All data at index 0 of the arrays are related and happen at the same time, and so on for all the indexes. Perhaps I should instead have 1 array of tuples or records that each hold just one DateTime and 2 doubles.
But anyway this is what I have so far:
let data2 = [ for i in 0..data.Date.Length-1 do
                yield data.Date.[i].Hour, data.High.[i] - data.Low.[i] ]
And here is where my inexperience hurts the most. The only thing I can think of is to do some kind of matching or if statements that go through all of those 24 hours (0 - 23) and having individual mutable values for each hour. There must be an easier way. I've been unsuccessful so far in finding a way.
I think you want to do
let grouped = data2 |> Seq.groupBy (fst) |> Seq.map (fun (a,b) -> Seq.average (b |> Seq.map (snd)))
Here Seq.groupBy will group all the elements which have an identical first element. You can then take the average with Seq.average.
Note:
I think your original expression for data2 would be better written as
data.Date |> Array.mapi (fun i t -> t.Hour,data.High.[i]-data.Low.[i])
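If you also want to know which hour each average belongs to, keep the group key in the result. A minimal sketch follows; the three arrays are hypothetical sample data standing in for the fields of your record.

```fsharp
open System

// Hypothetical sample data standing in for data.Date, data.High and data.Low.
let dates =
    [| DateTime(2020, 1, 1, 9, 0, 0)
       DateTime(2020, 1, 1, 10, 0, 0)
       DateTime(2020, 1, 2, 9, 30, 0) |]
let highs = [| 10.0; 12.0; 11.0 |]
let lows = [| 8.0; 11.0; 7.0 |]

// Zip the parallel arrays into (hour, high - low) pairs.
let deltas =
    Array.zip3 dates highs lows
    |> Array.map (fun (d, h, l) -> d.Hour, h - l)

// Group by hour, average each group, and keep the hour as the key.
let averageByHour =
    deltas
    |> Array.groupBy fst
    |> Array.map (fun (hour, xs) -> hour, xs |> Array.averageBy snd)
    |> Array.sortBy fst

averageByHour |> Array.iter (fun (hour, avg) -> printfn "Hour %d: %f" hour avg)
```

Hours with no data simply do not appear in the result, so you may get fewer than 24 entries.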
I would like to sort some tab separated data that is of the following form.
Marketing, Advertising, PR Graduate, Trainees Oil, Gas, Alternative Energy
Marketing, Advertising, PR Graduate, Trainees Public Sector & Services
Marketing, Advertising, PR Graduate, Trainees Recruitment Sales
Marketing, Advertising, PR Graduate, Trainees Secretarial, PAs, Administration
Marketing, Advertising, PR Graduate, Trainees Senior Appointments
Marketing, Advertising, PR Graduate, Trainees Telecommunications
Marketing, Advertising, PR Graduate, Trainees Transport, Logistics
Other Graduate, Trainees Banking, Insurance, Finance
Other Graduate, Trainees Customer Services
Other Graduate, Trainees Education
Other Graduate, Trainees Health, Nursing
Other Graduate, Trainees Legal
Other Graduate, Trainees Management Consultancy
There is a mixture of single-word and multi-word phrases. The words within a phrase have commas between them. The phrases are tab delimited.
I need to compare it with another set of data where the text cells have been helpfully sorted alphabetically.
Obviously this makes direct comparison difficult (impossible).
Following ovastus's suggestion below I have the following code
open System;;
open System.IO;;
#load @"BigDataModule.fs";;
open BigDataModule;;
let sample = "TruncatedData.txt";;
let outputFile = "SortedOutput.csv";;
let sortWithinRow (row:string) =
    let columns = row.Split([|'\t'|])
    let sortedColumns =
        Seq.append
            (columns |> Seq.take (columns.Length) |> Seq.sort)
            [ columns.[columns.Length - 1] ]
    sortedColumns |> String.concat ",";;
sample |> readLines |> Seq.map sortWithinRow |> saveTo (outputFile);;
Where readLines and saveTo are functions in my own Big Data module for reading in files and saving outputs.
When I get the output from this script, unfortunately the sort has not produced the desired result and the rows are still not sorted alphabetically.
If anyone can help me to further refine my script I will be very grateful.
I apologise for wasting time, having originally underdetermined the problem by oversimplifying the format of the input.
EDIT 1: Clarified I have saved the data as a csv file and will do this in F#.
EDIT 2: I have gotten rid of all of the extraneous parts of the data set, I just need to sort within these rows. I have also given further details of some code I have tried.
EDIT 3:
This was the original data frame I entered, which was an oversimplification
Alpha Bravo Tango Delta 15.00
Bravo Delta Tango 20.30
Delta Alpha Tango 6.17
Charlie Tango Foxtrot Alpha 19.13
I'm not sure if I understand correctly what you want, but if you want to generate this output:
Alpha Bravo Delta Tango 15.00
Bravo Delta Tango 20.30
Alpha Delta Tango 6.17
Alpha Charlie Foxtrot Tango 19.13
You can do it like this:
open System
let sample = """Alpha Bravo Tango Delta 15.00
Bravo Delta Tango 20.30
Delta Alpha Tango 6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]
let sortWithinRow (row:string) =
    let columns = row.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
    let sortedColumns =
        Seq.append
            (columns |> Seq.take (columns.Length - 1) |> Seq.sort)
            [ columns.[columns.Length - 1] ]
    sortedColumns |> String.concat " "
sample |> Seq.map sortWithinRow |> String.concat "\n"
What about the following?
sample |>
Seq.map (fun x -> x.Split('\t')) |>
Seq.map (Seq.map (fun x -> x.Trim())) |>
Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
Seq.map Seq.sort |>
Seq.map (String.concat "\t") |>
String.concat "\n";;
I can't type \t in a way that will paste for an example, so for an executable example I had to switch field delimiters to spaces
open System
let sample2 = """Alpha Bravo Tango Delta 15.00
Bravo Delta Tango 20.30
Delta Alpha Tango 6.17
Charlie Tango Foxtrot Alpha 19.13""".Split [|'\n'|]
sample2 |>
Seq.map (fun x -> x.Split([|" "|], StringSplitOptions.None)) |>
Seq.map (Seq.map (fun x -> x.Trim())) |>
Seq.map (Seq.filter (fun x -> not (String.IsNullOrEmpty(x)))) |>
Seq.map Seq.sort |>
Seq.map (String.concat "\t") |>
String.concat "\n";;
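For the real tab-delimited phrases, the tabs can be written as "\t" escapes so the example survives copy and paste. This is a sketch only: the two rows come from the question's sample, but where exactly the tabs fall in them is my assumption, and sortPhrases is a hypothetical name.

```fsharp
// Two rows from the question's sample, with the assumed tab positions written as "\t".
let rows =
    [ "Marketing, Advertising, PR\tGraduate, Trainees\tOil, Gas, Alternative Energy"
      "Other\tGraduate, Trainees\tBanking, Insurance, Finance" ]

// Split a row on tabs, trim each phrase, sort the phrases, and join them with tabs again.
// Commas inside a phrase are left alone because only '\t' is a delimiter here.
let sortPhrases (row: string) =
    row.Split([| '\t' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun phrase -> phrase.Trim())
    |> Array.sort
    |> String.concat "\t"

let sortedRows = rows |> List.map sortPhrases

sortedRows |> List.iter (printfn "%s")
```

Splitting only on the tab character keeps multi-word phrases such as "Graduate, Trainees" intact while they are reordered.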
Try using F# Data
[<Literal>]
let sample = """Text1,Text2,Text3,Text4,ValueField
Alpha,Bravo,Tango,Delta,15.00
Bravo,Delta,Tango,,20.30
Delta,Alpha,Tango,,6.17
Charlie,Tango,Foxtrot,Alpha,19.13"""
open FSharp.Data
let csv = CsvProvider<sample, Separator = ",">.Load("input.csv")
let sortedData =
    csv.Data
    |> Seq.sortBy (fun row -> row.Text1)
    |> Seq.map (fun row -> row.Columns |> String.concat ",")
System.IO.File.WriteAllLines("output.csv", sortedData)
If you want to sort by multiple fields you can just tuple them in the sorting function:
|> Seq.sortBy (fun row -> row.Text1, row.Text3)