Map over values of one column - f#

I want to map over the values of the Title column of my dataframe.
The solution I came up with is the following:
df.Columns.[ [ "Title"; "Amount" ] ]
|> Frame.mapCols (fun k s ->
    if k = "Title"
    then s |> Series.mapValues (string >> someModif >> box)
    else s.Observations |> Series)
Since s is of type ObjectSeries<_> I have to cast it to string, modify it then box it back.
Is there a recommended way to map over the values of a single column?

Another option would be to add a TitleMapped column with:
df?TitleMapped <- df?Title |> Series.mapValues (...your mapping fn...)
...and then throw away the Title column with df |> Frame.dropCol "Title" (or not bother if you don't care whether it stays or not).
Or, if you don't like the "imperativeness" of <-, you can do something like:
df?Title
|> Series.mapValues (...your mapping fn...)
|> fun x -> Frame( ["Title"], [x] )
|> Frame.join JoinKind.Left (df |> Frame.dropCol "Title")

You can use GetColumn:
df.GetColumn<string>("Title")
|> Series.mapValues(someModif)
Or in more F#-style:
df
|> Frame.getCol "Title"
|> Series.mapValues(string >> someModif)
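If the mapped values should end up back in the frame, one possible sketch is to combine GetColumn with Frame.replaceCol. The Item record, the sample rows and the body of someModif below are made up for illustration; only the column names come from the question, and the snippet assumes Deedle is available from NuGet:

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical sample data standing in for the frame from the question
type Item = { Title: string; Amount: float }
let df =
    Frame.ofRecords
        [ { Title = "first"; Amount = 1.0 }
          { Title = "second"; Amount = 2.0 } ]

// Placeholder for the question's someModif
let someModif (s: string) = s.ToUpper()

// Read the column as a typed series and map over its values
let mapped = df.GetColumn<string>("Title") |> Series.mapValues someModif

// Build a new frame in which "Title" holds the mapped values
let updated = df |> Frame.replaceCol "Title" mapped
```

This avoids the string/box round trip because GetColumn<string> already gives a typed series.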

In some cases, you may want to map over values of a specific column and keep that mapped column in the frame. Supposing we have a frame called someFrame with 2 columns (Col1 and Col2) and we want to transform Col1 (for example, Col1 + Col2), what I usually do is:
someFrame
|> Frame.replaceCol "Col1"
    (Frame.mapRowValues (fun row ->
        row.GetAs<float>("Col1") + row.GetAs<float>("Col2"))
        someFrame)
If you want to create a new column instead of replacing the existing one, swap Frame.replaceCol for Frame.addCol and pick a new column name in place of "Col1". I don't know whether this is the most efficient way, but it has worked for me so far.

Related

Reading text file, iterating over lines to find a match, and return the value with FSharp

I have a text file that contains the following and I need to retrieve the value assigned to taskId, which in this case is AWc34YBAp0N7ZCmVka2u.
projectKey=ProjectName
serverUrl=http://localhost:9090
serverVersion=10.5.32.3
interfaceUrl=http://localhost:9090/interface?id=ProjectName
taskId=AWc34YBAp0N7ZCmVka2u
taskUrl=http://localhost:9090/api/ce/task?id=AWc34YBAp0N7ZCmVka2u
I have written two different ways of reading the file.
open System.IO

let readLines (filePath:string) = seq {
    use sr = new StreamReader (filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}
readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
    printfn "%s" line
)
and
let readLines (filePath:string) =
    (File.ReadAllLines filePath)
readLines (FindFile currentDirectory "../**/sample.txt")
|> Seq.iter (fun line ->
    printfn "%s" line
)
At this point, I don't know how to approach getting the value I need. Options that, I think, are on the table are:
use Contains()
Regex
Record type
Active Pattern
How can I get this value returned and fail if it doesn't exist?
I think all the options would be reasonable - it depends on how complex the file will actually be. If there is no escaping then you can probably just look for = in the line and use that to split the line into a key value pair. If the syntax is more complex, this might not always work though.
My preferred method would be to use Split on string - you can then filter to find values with your required key, map to get the value and use Seq.head to get the value:
["foo=bar"]
|> Seq.map (fun line -> line.Split('='))
|> Seq.filter (fun kvp -> kvp.[0] = "foo")
|> Seq.map (fun kvp -> kvp.[1])
|> Seq.head
Using active patterns, you could define a pattern that takes a string and splits it using = into a list:
let (|Split|) (s:string) = s.Split('=') |> List.ofSeq
This then lets you get the value using Seq.pick with a pattern matching that looks for strings where the substring before = is e.g. foo:
["foo=bar"] |> Seq.pick (function
    | Split ["foo"; value] -> Some value
    | _ -> None)
The active pattern trick is quite neat, but it might be unnecessarily complicating the code if you only need this in one place.
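Applied to the file from the question, the split-based approach could be sketched roughly as below. The names tryFindValue and findValue are hypothetical helpers, and the sample list stands in for the lines returned by readLines; the sketch also assumes that the first = on a line separates the key from the value:

```fsharp
// Hypothetical helper: returns Some value for the first "key=value" line
// whose key matches, or None when no line matches
let tryFindValue (key: string) (lines: seq<string>) =
    lines
    |> Seq.tryPick (fun line ->
        // Split on the first '=' only, so values containing '=' stay intact
        match line.Split([| '=' |], 2) with
        | [| k; v |] when k.Trim() = key -> Some (v.Trim())
        | _ -> None)

// Hypothetical helper: fails with a descriptive message when the key is missing
let findValue key lines =
    match tryFindValue key lines with
    | Some v -> v
    | None -> failwithf "Key '%s' not found" key

// Stand-in for the lines read from sample.txt
let sample =
    [ "projectKey=ProjectName"
      "taskId=AWc34YBAp0N7ZCmVka2u"
      "taskUrl=http://localhost:9090/api/ce/task?id=AWc34YBAp0N7ZCmVka2u" ]

let taskId = findValue "taskId" sample
```

Keeping the Option-returning version separate lets the caller decide whether a missing key should be an exception or handled some other way.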

Get the count of distinct values from a Sequence in F#

I have a sequence of country names in F#. I want to count how many times each distinct country appears in the sequence.
The countBy examples in the Microsoft docs and MSDN use if and else to compute the keys, but since I have ~240 distinct entries, I assume I don't need to write an elif branch for each entry, right?
So, is there an option to use another sequence to get the keys for the countBy?
#load "packages/FsLab/FsLab.fsx"
open FSharp.Data
open System
type City = JsonProvider<"city.list.json",SampleIsList=true>
let cities = City.GetSamples()
let Countries = seq { for city in cities do yield city.Country.ToString() } |> Seq.sort
let DistinctCountries = Countries |> Seq.distinct
//Something like this
let Count = Seq.countBy DistinctCountries Countries
Anyone interested in my city.list.json
Update
My input sequence looks something like this (with many more entries), with each code repeated once for every city of that country in the original list:
{ "AR","AR","AR","MX","MX" }
As a result I expect:
{("AR", 3),("MX", 2),...}
Countries |> Seq.countBy id
id is the identity function fun x -> x. Use this because the "key" here is the sequence item itself.
You can group the countries and then count the number of entries in each group:
let countsByCountry =
    Countries
    |> Seq.groupBy id
    |> Seq.map (fun (c, cs) -> c, Seq.length cs)
This combination is also implemented as a single function, countBy:
let countsByCountry = Countries |> Seq.countBy id
So, is there an option to use another sequence to get the keys for the countBy?
You do not need to get the keys from somewhere, the function passed to Seq.countBy generates the keys. You should be able to get away with this:
let count =
    cities
    |> Seq.countBy (fun c -> c.Country.ToString())
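As a quick check against the sample input from the question's update, a minimal sketch (the list literal stands in for the real sequence of country codes):

```fsharp
// Sample input from the question's update
let sampleCountries = [ "AR"; "AR"; "AR"; "MX"; "MX" ]

// countBy with id: each element is its own key
let sampleCounts = sampleCountries |> Seq.countBy id |> Seq.toList
// sampleCounts = [("AR", 3); ("MX", 2)]

// The same result spelled out with groupBy followed by a length
let sampleCountsViaGroup =
    sampleCountries
    |> Seq.groupBy id
    |> Seq.map (fun (c, cs) -> c, Seq.length cs)
    |> Seq.toList
```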

Deedle Equivalent to pandas.merge

I am looking to merge two Deedle (F#) frames based on a specific column in each frame, in a similar manner to pandas.DataFrame.merge. The perfect example of this would be a primary frame that contains columns of data and a (city, state) column, along with an information frame that contains the following columns: (city, state); lat; long. If I want to add the lat and long columns to my primary frame, I would merge the two frames on the (city, state) column.
Here is an example:
let primaryFrame =
    [(0, "Job Name", box "Job 1")
     (0, "City, State", box "Reno, NV")
     (1, "Job Name", box "Job 2")
     (1, "City, State", box "Portland, OR")
     (2, "Job Name", box "Job 3")
     (2, "City, State", box "Portland, OR")
     (3, "Job Name", box "Job 4")
     (3, "City, State", box "Sacramento, CA")] |> Frame.ofValues
let infoFrame =
    [(0, "City, State", box "Reno, NV")
     (0, "Lat", box "Reno_NV_Lat")
     (0, "Long", box "Reno_NV_Long")
     (1, "City, State", box "Portland, OR")
     (1, "Lat", box "Portland_OR_Lat")
     (1, "Long", box "Portland_OR_Long")] |> Frame.ofValues
// see code for merge_On below.
let mergedFrame = primaryFrame
                  |> merge_On infoFrame "City, State" null
Which would result in 'mergedFrame' looking like this:
> mergedFrame.Format();;
val it : string =
"    Job Name City, State    Lat             Long
0 -> Job 1    Reno, NV       Reno_NV_Lat     Reno_NV_Long
1 -> Job 2    Portland, OR   Portland_OR_Lat Portland_OR_Long
2 -> Job 3    Portland, OR   Portland_OR_Lat Portland_OR_Long
3 -> Job 4    Sacramento, CA <missing>       <missing>
I have come up with a way of doing this (the 'merge_On' function used in the example above), but being a Sales Engineer who is new to F#, I imagine there is a more idiomatic/efficient way of doing this. Below are my functions for doing this, along with a 'removeDuplicateRows' function that does what you would expect and was needed for 'merge_On'; if you want to comment on a better way of doing this as well, please do.
let removeDuplicateRows column (frame : Frame<'a, 'b>) =
    let nonDupKeys = frame.GroupRowsBy(column).RowKeys
                     |> Seq.distinctBy (fun (a, b) -> a)
                     |> Seq.map (fun (a, b) -> b)
    frame.Rows.[nonDupKeys]
let merge_On (infoFrame : Frame<'c, 'b>) mergeOnCol missingReplacement
             (primaryFrame : Frame<'a,'b>) =
    let frame = primaryFrame.Clone()
    let infoFrame = infoFrame
                    |> removeDuplicateRows mergeOnCol
                    |> Frame.indexRows mergeOnCol
    let initialSeries = frame.GetColumn(mergeOnCol)
    let infoFrameRows = infoFrame.RowKeys
    for colKey in infoFrame.ColumnKeys do
        let newSeries =
            [for v in initialSeries.ValuesAll do
                if Seq.contains v infoFrameRows then
                    let key = infoFrame.GetRow(v)
                    yield key.[colKey]
                else
                    yield box missingReplacement ]
        frame.AddColumn(colKey, newSeries)
    frame
Thanks for your help!
UPDATE:
Switched Frame.indexRowsString to Frame.indexRows to handle cases where the types in the 'mergeOnCol' are not strings.
Got rid of infoFrame.Clone() as suggested by Tomas
The way Deedle joins frames (only on row/column keys) sadly means that it has no nice built-in function for joining frames over a non-key column.
As far as I can see, your approach looks very good to me. You do not need Clone on the infoFrame (because you are not mutating the frame) and I think you can replace infoFrame.GetRow with infoFrame.TryGetRow (and then you won't need to get the keys in advance), but other than that, your code looks fine!
I came up with an alternative and a bit shorter way of doing this, which looks as follows:
// Index the info frame by city/state, so that we can do lookup
let infoByCity = infoFrame |> Frame.indexRowsString "City, State"
// Create a new frame with the same row indices as 'primaryFrame'
// containing the additional information from infoFrame.
let infoMatched =
    primaryFrame.Rows
    |> Series.map (fun k row ->
        // For every row, we get the "City, State" value of the row and then
        // find the corresponding row with additional information in infoFrame. Using
        // 'ValueOrDefault' will automatically give missing when the key does not exist
        infoByCity.Rows.TryGet(row.GetAs<string>("City, State")).ValueOrDefault)
    // Now turn the series of rows into a frame
    |> Frame.ofRows
// Now we have two frames with matching keys, so we can join!
primaryFrame.Join(infoMatched)
This is a bit shorter and maybe more self-explanatory, but I have not done any tests to check which is faster. Unless performance is a primary concern, I think going with the more readable version is a good default choice though!

F# exists where function?

I have a function processing a DataTable looking for any row that has a column with a certain value. It looks like this:
let exists =
    let mutable e = false
    for row in dt.Rows do
        if row.["Status"] :?> bool = false
        then e <- true
    e
I'm wondering if there is a way to do this in a single expression. For example, Python has the "any" function which would do it something like this:
exists = any(row for row in dt.Rows if not row["Status"])
Can I write a similar one-liner in F# for my exists function?
You can use the Seq.exists function, which takes a predicate and returns true if the predicate holds for at least one element of the sequence.
let xs = [1;2;3]
let contains2 = xs |> Seq.exists (fun x -> x = 2)
But in your specific case, it won't work right away, because DataTable.Rows is of type DataRowCollection, which only implements IEnumerable, but not IEnumerable<T>, and so it won't be considered a "sequence" in F# sense, which means that Seq.* functions won't work on it. To make them work, you have to first cast the sequence to the correct type with Seq.cast:
let exists =
    dt.Rows |>
    Seq.cast<DataRow> |>
    Seq.exists (fun r -> not (r.["Status"] :?> bool) )
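For anyone who wants to try this without a real data source, here is a small self-contained sketch. The table contents are made up for illustration; only the "Status" column name comes from the question:

```fsharp
open System.Data

// Hypothetical table with a single boolean "Status" column
let dt = new DataTable()
dt.Columns.Add("Status", typeof<bool>) |> ignore
dt.Rows.Add([| box true |]) |> ignore
dt.Rows.Add([| box false |]) |> ignore

// DataRowCollection is a non-generic IEnumerable, so cast it first
let anyFalse =
    dt.Rows
    |> Seq.cast<DataRow>
    |> Seq.exists (fun row -> not (row.["Status"] :?> bool))
// anyFalse = true, because the second row has Status = false
```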
Something like this (untested):
dt.Rows |> Seq.cast<DataRow> |> Seq.exists (fun row -> not (row.["Status"] :?> bool))
https://msdn.microsoft.com/visualfsharpdocs/conceptual/seq.exists%5b%27t%5d-function-%5bfsharp%5d

F#: Updating a single tuple in a list of tuples

I have a list of tuples like so:
let scorecard = [ for i in 0 .. 39 -> i,0 ]
I want to identify the nth tuple in it. I was thinking about it in this way:
let foundTuple = scorecard |> Seq.find(fun (x,y) -> x = 10)
I then want to create a new tuple based on the found one:
let newTuple = (fst foundTuple, snd foundTuple + 1)
And have a new list with that updated value
Does anyone have some code that matches this pattern? I think I have to split the list into 2 sublists: 1 list has 1 element (the tuple I want to replace) and the other list has the remaining elements. I then create a new list with the replacing tuple and the list of unchanged tuples...
You can use List.mapi which creates a new list using a specified projection function - but it also calls the projection function with the current index and so you can decide what to do based on this index.
For example, to increment the second element of a list of integers, you can do:
let oldList = [0;0;0;0]
let newList = oldList |> List.mapi (fun index v -> if index = 1 then v + 1 else v)
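For the scorecard from the question, where each tuple already carries its own index, a plain List.map with a pattern on the tuple may be enough. This is only a sketch; incrementScore is a hypothetical helper name:

```fsharp
let scorecard = [ for i in 0 .. 39 -> i, 0 ]

// Hypothetical helper: returns a new list in which the score of the
// tuple whose first element equals `target` is incremented by one
let incrementScore target card =
    card
    |> List.map (fun (i, score) ->
        if i = target then (i, score + 1) else (i, score))

let updated = incrementScore 10 scorecard
```

The original scorecard is left untouched, so there is no need to split the list into sublists by hand.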
Depending on the problem, it might make sense to use the Map type instead of list - map represents a mapping from keys to values and does not need to copy the entire contents when you change just a single value. So, for example:
// Map keys from 0 to 3 to values 0
let m = Map.ofList [0,0;1,0;2,0;3,0]
// Set the value at index 1 to 10 and get a new map
Map.add 1 10 m
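A rough sketch of the same idea for a 40-entry scorecard, incrementing a score rather than overwriting it. The increment helper is hypothetical, and treating a missing key as a score of 0 is an assumption made here, not something from the question:

```fsharp
// Scorecard as a map from index to score
let scores = Map.ofList [ for i in 0 .. 39 -> i, 0 ]

// Hypothetical helper: increment the score stored under `key`,
// treating a missing key as a score of 0
let increment key (m: Map<int, int>) =
    let current = m |> Map.tryFind key |> Option.defaultValue 0
    m |> Map.add key (current + 1)

let updatedScores = scores |> increment 10
```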
I went back and thought about the problem and decided to use an array, which is mutable.
let scorecard = [| for i in 0 .. 39 -> i,0 |]
Since tuples are not mutable, I need to create a new tuple based on the existing one and overwrite it in the array:
let targetTuple = scorecard.[3]
let newTuple = (fst targetTuple, snd targetTuple + 1)
scorecard.[3] <- newTuple
I am using "<-", which is a code smell in F#. I wonder if there is a comparable purely functional equivalent?
