I am looking to merge two Deedle (F#) frames based on a specific column in each frame, in a similar manner to pandas.DataFrame.merge. The perfect example of this would be a primary frame that contains columns of data and a (city, state) column, along with an information frame that contains the following columns: (city, state); lat; long. If I want to add the lat and long columns to my primary frame, I would merge the two frames on the (city, state) column.
Here is an example:
let primaryFrame =
    [ (0, "Job Name", box "Job 1")
      (0, "City, State", box "Reno, NV")
      (1, "Job Name", box "Job 2")
      (1, "City, State", box "Portland, OR")
      (2, "Job Name", box "Job 3")
      (2, "City, State", box "Portland, OR")
      (3, "Job Name", box "Job 4")
      (3, "City, State", box "Sacramento, CA") ]
    |> Frame.ofValues
let infoFrame =
    [ (0, "City, State", box "Reno, NV")
      (0, "Lat", box "Reno_NV_Lat")
      (0, "Long", box "Reno_NV_Long")
      (1, "City, State", box "Portland, OR")
      (1, "Lat", box "Portland_OR_Lat")
      (1, "Long", box "Portland_OR_Long") ]
    |> Frame.ofValues
// see code for merge_On below.
let mergedFrame =
    primaryFrame
    |> merge_On infoFrame "City, State" null
Which would result in 'mergedFrame' looking like this:
> mergedFrame.Format();;
val it : string =
"     Job Name City, State    Lat             Long
0 -> Job 1    Reno, NV       Reno_NV_Lat     Reno_NV_Long
1 -> Job 2    Portland, OR   Portland_OR_Lat Portland_OR_Long
2 -> Job 3    Portland, OR   Portland_OR_Lat Portland_OR_Long
3 -> Job 4    Sacramento, CA <missing>       <missing>
I have come up with a way of doing this (the 'merge_On' function used in the example above), but being a Sales Engineer who is new to F#, I imagine there is a more idiomatic/efficient way of doing this. Below are my functions for doing this, along with a 'removeDuplicateRows' function, which does what you would expect and was needed for 'merge_On'; if you want to comment on a better way of doing this as well, please do.
let removeDuplicateRows column (frame : Frame<'a, 'b>) =
    let nonDupKeys =
        frame.GroupRowsBy(column).RowKeys
        |> Seq.distinctBy (fun (a, b) -> a)
        |> Seq.map (fun (a, b) -> b)
    frame.Rows.[nonDupKeys]

let merge_On (infoFrame : Frame<'c, 'b>) mergeOnCol missingReplacement
             (primaryFrame : Frame<'a,'b>) =
    let frame = primaryFrame.Clone()
    let infoFrame =
        infoFrame
        |> removeDuplicateRows mergeOnCol
        |> Frame.indexRows mergeOnCol
    let initialSeries = frame.GetColumn(mergeOnCol)
    let infoFrameRows = infoFrame.RowKeys
    for colKey in infoFrame.ColumnKeys do
        let newSeries =
            [ for v in initialSeries.ValuesAll do
                if Seq.contains v infoFrameRows then
                    let key = infoFrame.GetRow(v)
                    yield key.[colKey]
                else
                    yield box missingReplacement ]
        frame.AddColumn(colKey, newSeries)
    frame
Thanks for your help!
UPDATE:
Switched Frame.indexRowsString to Frame.indexRows to handle cases where the types in 'mergeOnCol' are not strings.
Got rid of infoFrame.Clone() as suggested by Tomas
The way Deedle does joining of frames (only in row/column keys) sadly means that it does not have a nice built-in function to do joining of frames over a non-key column.
As far as I can see, your approach looks very good to me. You do not need Clone on the infoFrame (because you are not mutating the frame) and I think you can replace infoFrame.GetRow with infoFrame.TryGetRow (and then you won't need to get the keys in advance), but other than that, your code looks fine!
I came up with an alternative and a bit shorter way of doing this, which looks as follows:
// Index the info frame by city/state, so that we can do lookup
let infoByCity = infoFrame |> Frame.indexRowsString "City, State"
// Create a new frame with the same row indices as 'primaryFrame'
// containing the additional information from infoFrame.
let infoMatched =
    primaryFrame.Rows
    |> Series.map (fun k row ->
        // For every row, we get the "City, State" value of the row and then
        // find the corresponding row with additional information in infoFrame. Using
        // 'ValueOrDefault' will automatically give missing when the key does not exist
        infoByCity.Rows.TryGet(row.GetAs<string>("City, State")).ValueOrDefault)
    // Now turn the series of rows into a frame
    |> Frame.ofRows
// Now we have two frames with matching keys, so we can join!
primaryFrame.Join(infoMatched)
This is a bit shorter and maybe more self-explanatory, but I have not done any tests to check which is faster. Unless performance is a primary concern, I think going with the more readable version is a good default choice though!
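To make the idea behind both versions concrete without depending on Deedle, here is a minimal sketch of the same lookup-then-join using a plain F# Map. The record types Job and Location and the sample values are hypothetical stand-ins for the frames in the question, not part of Deedle's API:

```fsharp
// Plain-F# sketch of the lookup join; record types and sample data are hypothetical.
type Job = { Name: string; CityState: string }
type Location = { Lat: string; Long: string }

let jobs =
    [ { Name = "Job 1"; CityState = "Reno, NV" }
      { Name = "Job 2"; CityState = "Portland, OR" }
      { Name = "Job 4"; CityState = "Sacramento, CA" } ]

// Index the info table by the merge key (the role Frame.indexRowsString plays above).
let infoByCity =
    Map.ofList
        [ "Reno, NV", { Lat = "Reno_NV_Lat"; Long = "Reno_NV_Long" }
          "Portland, OR", { Lat = "Portland_OR_Lat"; Long = "Portland_OR_Long" } ]

// Left join: every job is kept; a key with no match yields None instead of failing.
let merged =
    jobs |> List.map (fun job -> job, Map.tryFind job.CityState infoByCity)

for (job, location) in merged do
    match location with
    | Some l -> printfn "%s: %s, %s" job.Name l.Lat l.Long
    | None -> printfn "%s: <missing>" job.Name
```

Here Map.tryFind plays the part that TryGet and ValueOrDefault play in the Deedle version: a missing key becomes a missing value rather than an exception.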
Related
I am looking for a way to sum two sequences by element in F#.
For example, if I have these two sequences:
let first = seq [ 183.24; 170.15;174.17]
let second = seq [25.524;24.069;24.5]
I want to get the following result:
third list = [208.764;194.219;198.67]
What would be the simplest or the best way to achieve this?
You can use the zip function:
let third = Seq.zip first second |> Seq.map (fun (x, y) -> x + y)
It creates a new sequence of tuples in which the first element comes from first and the second from second; you can then map over it and add the two elements.
As pointed out in the comments, map2 is another option; map2 is equivalent to zip followed by map.
The easiest way to do this is to use Seq.map2:
let first = seq [ 183.24; 170.15;174.17]
let second = seq [25.524;24.069;24.5]
//seq [208.764; 194.219; 198.67]
let third = Seq.map2 (+) first second
I want to map over the values of the Title column of my dataframe.
The solution I came up with is the following:
df.Columns.[ [ "Title"; "Amount" ] ]
|> Frame.mapCols(fun k s ->
    if k = "Title"
    then s |> Series.mapValues (string >> someModif >> box)
    else s.Observations |> Series)
Since s is of type ObjectSeries<_> I have to cast it to string, modify it then box it back.
Is there a recommended way to map over the values of a single column?
Another option would be to add a TitleMapped column with:
df?TitleMapped <- df?Title |> Series.mapValues (...your mapping fn...)
...and then throw away the Title column with df |> Frame.dropCol "Title" (or not bother if you don't care whether it stays or not).
Or, if you don't like the "imperativeness" of <-, you can do something like:
df?Title
|> Series.mapValues (...your mapping fn...)
|> fun x -> Frame( ["Title"], [x] )
|> Frame.join JoinKind.Left (df |> Frame.dropCol "Title")
You can use GetColumn:
df.GetColumn<string>("Title")
|> Series.mapValues(someModif)
Or in more F#-style:
df
|> Frame.getCol "Title"
|> Series.mapValues(string >> someModif)
In some cases, you may want to map over values of a specific column and keep that mapped column in the frame. Supposing we have a frame called someFrame with 2 columns (Col1 and Col2) and we want to transform Col1 (for example, Col1 + Col2), what I usually do is:
someFrame
|> Frame.replaceCol "Col1"
    (Frame.mapRowValues (fun row ->
        row.GetAs<float>("Col1") + row.GetAs<float>("Col2"))
        someFrame)
If you want to create a new column instead of replacing one, all you have to do is use "addCol" in place of "replaceCol" and choose a new column name instead of the "Col1" used in the example. I don't know if this is the most efficient way, but it has worked for me so far.
//Return a tuple from a text file:
let ExtractFromLine (line:string) =
    let strings = line.Split('\t') //data members are spaced by tab
    let strlist = Array.toList(strings) //each data member is now a list of str
    let year = System.Int32.Parse(strlist.Head) //year is first in file, so Head
    let values = List.map System.Double.Parse strlist.Tail //tail are all values
    let average = List.average values //not part of text file
    let min = List.min values //not part of text file
    let max = List.max values //not part of text file
    (year, values, average, min, max) //return tuple with all info
//----------
let rec createList fileline =
    if fileline = [] then
        []
    else
        let (year, values, average, min, max) = ExtractFromLine fileline.Head
        let l = (year, values, average, min, max) :: createList fileline.Tail
        l
//------------
let main argv =
    let file = ReadFile "data.txt"
    let biglist = createList file //recursive function to make a list of tuples
    printfn"%A" biglist //This prints the year, all values, average, min, and max for each tuple created
I now have a giant list of tuples with all of the information that I need.
Have I retained the possibility of accessing all elements inside and performing calculations on them? I program in C++, and the solution is doable in that language, but F# is so much more powerful in my opinion. I'm sure it's possible, I'm just missing the basics.
For example, how do I print the average of all the values for all years?
I'm thinking of a for loop, but I'm not sure how to iterate.
for(all tuples in biglist)
printfn"%A:%A" tuple.year tuple.average
It's wrong obviously, but I think you guys understand what I'm trying to do.
The above question involves pulling data from one tuple at a time across the list. What if I wanted to print the largest average? This would involve accessing each tuple's average data member and comparing them to return the largest one. Do I have to create another list containing these averages?
I learned about fst and snd but I had a hard time applying it to this example.
You don't have to answer all questions if it is too much, but any help is greatly appreciated as I start out in this language, thank you
You can loop in F#, but it's a construct from the imperative programming world. A more idiomatic approach is to access the items of the list recursively.
Below is some sample code that creates tuples, constructs a list, and then accesses the items and checks which one is bigger. Looking at your code, the average is the third item in the tuple. That's why I've added a trd function: it takes a 5-item tuple and returns the third item.
The prcsLst function takes 2 arguments: a list and a starting max value. The idea is that when processing the list we take the head (the first item in the list) and compare its average with the current max. Whichever is bigger is passed to the next recursive round together with the list's tail (the list without the first item). In this case, I passed in the average of the first item as the initial max.
You can run the example in F# Interactive to see the results.
// create sample tuples
let t1 = (2014, 35, 18, 5, 45)
let t2 = (2014, 32, 28, 8, 75)
let t3 = (2014, 25, 11, 9, 55)
let t4 = (2015, 16, 13, 2, 15)
let t5 = (2015, 29, 15, 1, 35)
// create sample list
let lst = [t1;t2;t3;t4;t5]
// a function to return third item in a tuple
let trd (_,_,t,_,_) = t
// process list recursively
let rec prcsLst l max =
    match l with
    | [] -> max
    | hd::tl ->
        if (trd hd) > max then
            prcsLst tl (trd hd)
        else
            prcsLst tl max
// invoke the method on the sample list
// as a starting point use the first item in the list
prcsLst lst (trd t1);;
On a mobile so forgive me for not doing any code examples :)
I suspect that the missing piece of your puzzle is called pattern matching. In F# you address elements of a tuple like so:
let (y, v, Av, mn, mx) = mytuple
Note that you can also use this in function declarations, and when doing a 'match'.
(there is an exception for 'twoples' where you can use the functions 'fst' and 'snd')
Another thing you should play with is the |> operator.
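Since this answer has no code, here is a minimal sketch of what pattern matching on the tuples and the |> operator could look like for the two things the question asks about (printing every average, and finding the largest one). The sample rows are made up; they only follow the question's (year, values, average, min, max) shape:

```fsharp
// Sample rows in the question's (year, values, average, min, max) shape; numbers are made up.
let biglist =
    [ (2014, [ 1.0; 3.0 ], 2.0, 1.0, 3.0)
      (2015, [ 4.0; 6.0 ], 5.0, 4.0, 6.0)
      (2016, [ 2.0; 4.0 ], 3.0, 2.0, 4.0) ]

// Destructure each tuple in the loop header; '_' skips the fields we do not need.
for (year, _, average, _, _) in biglist do
    printfn "%d: %f" year average

// Largest average: List.maxBy compares by the projected field,
// so no separate list of averages has to be built.
let (bestYear, _, bestAverage, _, _) =
    biglist |> List.maxBy (fun (_, _, average, _, _) -> average)

printfn "Largest average %f in %d" bestAverage bestYear
```

A record type with named fields would probably read better than a 5-tuple once the data is used in more than one place, but the tuple version stays closest to the code in the question.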
I am new to F# and functional programming in general. Given a scenario where you want to iterate over a sequence or list of strings, and map that to a new list of a different type, WITH an accumulator, what is the correct functional approach? I can achieve this in F# using mutable variables, but I am struggling to find the right function to use for this. It's similar to map I think, but there is the notion of state.
In other words, I want to transform a list of strings into a list of win forms radio buttons, but for each new button I want to add 20 to the previous y coordinate. Something like:
new RadioButton(Text=str,Location=new Point(20,y+20),Width=350)
You can use List.fold:
open System.Drawing
open System.Windows.Forms
let getButtons () =
let strings = ["a"; "b"; "c"]
let (_, pointsRev) = List.fold (fun (offset, l) s -> (offset+20, (new RadioButton(Text=s, Location = new Point(20, offset), Width = 350))::l)) (0, []) strings
pointsRev |> List.rev
The state is a pair containing the current offset and the current output list. The output list is built in reverse order so has to be reversed at the end.
You could also use Seq.map2:
let points = Seq.map2 (fun offset s -> new RadioButton(Text=s, Location = new Point(20, offset))) (Seq.initInfinite ((*)20)) strings |> List.ofSeq
You can access and change a variable by reference, like this:
let x = ref 0
x := !x + 5
new Point(20,!x+20)
and you can use such a variable inside closures.
You can also use mapi: http://msdn.microsoft.com/en-us/library/ee353425.aspx
and compute y from the index i, for example new Point(20,i*20+20)
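A minimal sketch of the mapi suggestion, assuming a hypothetical Button record in place of RadioButton so that it runs without a Windows Forms reference:

```fsharp
// Hypothetical stand-in for RadioButton so the sketch runs without Windows Forms.
type Button = { Text: string; X: int; Y: int }

let strings = [ "a"; "b"; "c" ]

// List.mapi passes the zero-based index along with each element,
// so the y coordinate can be derived from the position instead of mutable state.
let buttons =
    strings |> List.mapi (fun i s -> { Text = s; X = 20; Y = i * 20 + 20 })

buttons |> List.iter (fun b -> printfn "%s at (%d, %d)" b.Text b.X b.Y)
```

This works here because the offset depends only on the position in the list; when the next state depends on earlier elements in some other way, a fold is the more general tool.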
Using List.fold is a great idea (see the accepted answer).
Being an F# beginner myself, I split the fold out into a separate function and renamed some variables so I could understand things more clearly. This seems to work:
let buttonNames = ["Button1Name"; "Button2Name"]
let createRadioButton (offset, radioButtons) name =
    let newRadioButton = new RadioButton(Text=name, Location=new Point(20, offset), Width=350)
    (offset + 20, newRadioButton::radioButtons)
let (_, buttonsReversed) = buttonNames |> List.fold createRadioButton (0, [])
let buttons = buttonsReversed |> List.rev
Anyone have a decent example, preferably practical/useful, they could post demonstrating the concept?
(Edit: a small Ocaml FP Koan to start things off)
The Koan of Currying (A koan about food, that is not about food)
A student came to Jacques Garrigue and said, "I do not understand what currying is good for." Jacques replied, "Tell me your favorite meal and your favorite dessert". The puzzled student replied that he liked okonomiyaki and kanten, but while his favorite restaurant served great okonomiyaki, their kanten always gave him a stomach ache the following morning. So Jacques took the student to eat at a restaurant that served okonomiyaki every bit as good as the student's favorite, then took him across town to a shop that made excellent kanten where the student happily applied the remainder of his appetite. The student was sated, but he was not enlightened ... until the next morning when he woke up and his stomach felt fine.
My examples will cover using it for the reuse and encapsulation of code. This is fairly obvious once you look at these and should give you a concrete, simple example that you can think of applying in numerous situations.
We want to map a function over a tree. If the function needs more than one argument, it can be curried and partially applied before it is passed in, since the value at each node is supplied as its final argument. It doesn't have to be curried, but writing another function (assuming this function is also used elsewhere with other arguments) would be a waste.
type 'a tree = E of 'a | N of 'a * 'a tree * 'a tree
let rec tree_map f tree = match tree with
  | N(x,left,right) -> N(f x, tree_map f left, tree_map f right)
  | E(x) -> E(f x)
let sample_tree = N(1,E(3),E(4))
let multiply x y = x * y
let sample_tree2 = tree_map (multiply 3) sample_tree
but this is the same as:
let sample_tree2 = tree_map (fun x -> x * 3) sample_tree
So this simple case isn't convincing on its own. Currying does become powerful once you use the language more and come across these situations naturally. The other example uses currying for code reuse: recurrence relations that generate prime numbers. There is an awful lot of similarity between them:
(* gcd and lcm are assumed to be defined elsewhere *)
let rec f_recurrence f a seed n =
  match n with
  | n when n = a -> seed (* a bare pattern 'a' would match every n *)
  | _ -> let prev = f_recurrence f a seed (n-1) in
         prev + (f n prev)
let rowland = f_recurrence gcd 1 7
let cloitre = f_recurrence lcm 1 1
let rowland_prime n = (rowland (n+1)) - (rowland n)
let cloitre_prime n = ((cloitre (n+1))/(cloitre n)) - 1
Ok, now rowland and cloitre are partially applied versions of the curried f_recurrence: f, a, and seed are already bound, so we can get any index of either sequence without knowing or worrying about f_recurrence.
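For readers who want to run the recurrence, here is a sketch of the Rowland half ported to F#. This is a port under assumptions, not the answer's own code: the gcd helper is supplied here because the answer does not show one, and the names are changed to F# style.

```fsharp
// Euclid's algorithm; supplied here because the original answer assumes gcd exists.
let rec gcd a b = if b = 0 then a else gcd b (a % b)

// General recurrence: x(a) = seed, x(n) = x(n-1) + f n x(n-1).
let rec fRecurrence f a seed n =
    if n = a then seed
    else
        let prev = fRecurrence f a seed (n - 1)
        prev + f n prev

// Partial application fixes f, a and seed; only the index n remains open.
let rowland = fRecurrence gcd 1 7
let rowlandPrime n = rowland (n + 1) - rowland n

printfn "%A" (List.map rowlandPrime [ 1 .. 10 ])
```

The differences produced for n = 1 to 10 are 1, 1, 1, 5, 3, 1, 1, 1, 1, 11, so every value other than 1 is a prime, which is the point of Rowland's sequence. The naive recursion recomputes earlier terms on every call, so this is only suitable for small n.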
While the previous examples answered the question, here are two simpler examples of how Currying can be beneficial for F# programming.
open System.IO
let appendFile (fileName : string) (text : string) =
    let file = new StreamWriter(fileName, true)
    file.WriteLine(text)
    file.Close()
// Call it normally
appendFile @"D:\Log.txt" "Processing Event X..."
// If you curry the function, you don't need to keep specifying the
// log file name.
let curriedAppendFile = appendFile @"D:\Log.txt"
// Adds data to "Log.txt"
curriedAppendFile "Processing Event Y..."
And don't forget you can curry the printf family of functions! In the curried version, notice the distinct lack of a lambda.
// Non curried, Prints 1 2 3
List.iter (fun i -> printf "%d " i) [1 .. 3];;
// Curried, Prints 1 2 3
List.iter (printf "%d ") [1 .. 3];;
Currying describes the process of transforming a function with multiple arguments into a chain of single-argument functions. Example in C#, for a three-argument function:
Func<T1, Func<T2, Func<T3, T4>>> Curry<T1, T2, T3, T4>(Func<T1, T2, T3, T4> f)
{
    return a => b => c => f(a, b, c);
}
void UseACurriedFunction()
{
    var curryCompare = Curry<string, string, bool, int>(String.Compare);
    var a = "SomeString";
    var b = "SOMESTRING";
    Console.WriteLine(String.Compare(a, b, true));
    Console.WriteLine(curryCompare(a)(b)(true));
    //partial application
    var compareAWithB = curryCompare(a)(b);
    Console.WriteLine(compareAWithB(true));
    Console.WriteLine(compareAWithB(false));
}
Now, the boolean argument is probably not the argument you'd most likely want to leave open with a partial application. This is one reason why the order of arguments in F# functions can seem a little odd at first. Let's define a different C# curry function:
Func<T3, Func<T2, Func<T1, T4>>> BackwardsCurry<T1, T2, T3, T4>(Func<T1, T2, T3, T4> f)
{
    return a => b => c => f(c, b, a);
}
Now, we can do something a little more useful:
void UseADifferentlyCurriedFunction()
{
    var curryCompare = BackwardsCurry<string, string, bool, int>(String.Compare);
    var caseSensitiveCompare = curryCompare(false);
    var caseInsensitiveCompare = curryCompare(true);
    var format = Curry<string, string, string, string>(String.Format)("Results of comparing {0} with {1}:");
    var strings = new[] {"Hello", "HELLO", "Greetings", "GREETINGS"};
    foreach (var s in strings)
    {
        var caseSensitiveCompareWithS = caseSensitiveCompare(s);
        var caseInsensitiveCompareWithS = caseInsensitiveCompare(s);
        var formatWithS = format(s);
        foreach (var t in strings)
        {
            Console.WriteLine(formatWithS(t));
            Console.WriteLine(caseSensitiveCompareWithS(t));
            Console.WriteLine(caseInsensitiveCompareWithS(t));
        }
    }
}
Why are these examples in C#? Because in F#, function declarations are curried by default. You don't usually need to curry functions; they're already curried. The major exception to this is framework methods and other overloaded functions, which take a tuple containing their multiple arguments. You therefore might want to curry such functions, and, in fact, I came upon this question when I was looking for a library function that would do this. I suppose it is missing (if indeed it is) because it's pretty trivial to implement:
let curry f a b c = f(a, b, c)
//overload resolution failure: there are two overloads with three arguments.
//let curryCompare = curry String.Compare
//This one might be more useful; it works because there's only one 3-argument overload
let backCurry f a b c = f(c, b, a)
let intParse = backCurry Int32.Parse
let intParseCurrentCultureAnyStyle = intParse CultureInfo.CurrentCulture NumberStyles.Any
let myInt = intParseCurrentCultureAnyStyle "23"
let myOtherInt = intParseCurrentCultureAnyStyle "42"
To get around the failure with String.Compare, since as far as I can tell there's no way to specify which 3-argument overload to pick, you can use a non-general solution:
let curryCompare s1 s2 (b:bool) = String.Compare(s1, s2, b)
let backwardsCurryCompare (b:bool) s1 s2 = String.Compare(s1, s2, b)
I won't go into detail about the uses of partial function application in F# because the other answers have covered that already.
It's a fairly simple process. Take a function, bind one of its arguments and return a new function. For example:
let concatStrings left right = left + right
let makeCommandPrompt = concatStrings "c:\> "
Now by currying the simple concatStrings function, you can easily add a DOS style command prompt to the front of any string! Really useful!
Okay, not really. A more useful case, I find, is when I want to make a function that returns data to me in a stream-like manner.
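// F# uses ||| and <<< for bitwise OR and shift; 'data' holds byte values as ints
let readDWORD (data: int[]) i =
    data.[i] ||| (data.[i + 1] <<< 8) ||| (data.[i + 2] <<< 16) |||
    (data.[i + 3] <<< 24) //I've actually used this function in Python.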
The convenient part about it is that rather than creating an entire class for this sort of thing, calling the constructor, calling obj.readDWORD(), you just have a function that can't be mutated out from under you.
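As a sketch of what that looks like in practice: binding the array once gives a reader function that only needs an offset. The packet contents and the name readFromPacket are made up for illustration, and readDWORD is restated here in F# operators (||| and <<<) so that the block runs on its own:

```fsharp
// Little-endian 32-bit read; 'data' holds byte values stored as ints (an assumption).
let readDWORD (data: int[]) i =
    data.[i] ||| (data.[i + 1] <<< 8) ||| (data.[i + 2] <<< 16) ||| (data.[i + 3] <<< 24)

// Hypothetical 8-byte buffer containing two little-endian DWORDs.
let packet = [| 0x78; 0x56; 0x34; 0x12; 0x01; 0x00; 0x00; 0x00 |]

// Partial application: the array is bound once, the offset stays open.
let readFromPacket = readDWORD packet

printfn "0x%X" (readFromPacket 0)
printfn "%d" (readFromPacket 4)
```

The reader closes over packet, so callers can be handed readFromPacket without getting access to the array itself.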
You know you can map a function over a list? For example, mapping a function to add one to each element of a list:
> List.map ((+) 1) [1; 2; 3];;
val it : int list = [2; 3; 4]
This is actually already using currying, because the (+) operator was partially applied to create a function that adds one to its argument, but you can squeeze a little more out of this example by altering it to map the same function over a list of lists:
> List.map (List.map ((+) 1)) [[1; 2]; [3]];;
val it : int list list = [[2; 3]; [4]]
Without currying you could not partially apply these functions and would have to write something like this instead:
> List.map((fun xs -> List.map((fun n -> n + 1), xs)), [[1; 2]; [3]]);;
val it : int list list = [[2; 3]; [4]]
I gave a good example of simulating currying in C# on my blog. The gist is that you can create a function that is closed over a parameter (in my example, a function for calculating sales tax that is closed over the value for a given municipality) out of an existing multi-parameter function.
What is appealing here is instead of having to make a separate function specifically for calculating sales tax in Cook County, you can create (and reuse) the function dynamically at runtime.
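The blog post itself is not reproduced here, so the following is only a sketch of the same idea in F#, where no simulation is needed because functions are curried by default. The function names and the rate are hypothetical; the rate is for illustration only and is not an actual tax rate:

```fsharp
// Multi-parameter function with the rate first, so it can be fixed by partial application.
let salesTax (rate: decimal) (amount: decimal) = amount * rate

// Hypothetical rate, for illustration only; not an actual county's tax rate.
let exampleCountyTax = salesTax 0.10m

printfn "%M" (exampleCountyTax 200.00m)
printfn "%M" (exampleCountyTax 49.99m)
```

A function for another municipality is created the same way at runtime, by applying salesTax to a different rate.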