FSharp.Data: Transform multiple columns to a single column (dictionary result) - f#

I am using FSharp.Data to transform HTML table data, i.e.
type RawResults = HtmlProvider<url>
let results = RawResults.Load(url).Tables
for row in results.Table1.Rows do
    printfn " %A " row
Example output:
("Model: Generic", "Submit Date: July 22, 2016")
("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min.")
~~~ hundreds of more rows ~~~~
I am trying to split those "two column"-based elements into a single column sequence to eventually get to a dictionary result.
Desired dictionary key:value result:
["Model", Generic]
["Submit Date", July 22, 2016]
["Gene", "Sequencing Failed"]
~~~~
How can you iter (or split?) the two columns (Column1 & Column2) to pipe both of those individual columns to produce a dictionary result?
let summaryDict =
    results.Table1.Rows
    |> Seq.skip 1
    |> Seq.iter (fun x -> x.Column1 ......
    |> ....

Use the built-in string API to split over the :. I usually prefer to wrap String.Split in curried form:
let split (separator : string) (s : string) = s.Split (separator.ToCharArray ())
Additionally, while not required, when working with two-element tuples, I often find it useful to define a helper module with functions related to this particular data structure. You can put various functions in such a module (e.g. curry, uncurry, swap, etcetera), but in this case, a single function is all you need:
module Tuple2 =
    let mapBoth f g (x, y) = f x, g y
With these building blocks, you can easily split each tuple element over :, as shown in this FSI session:
> [
    ("Model: Generic", "Submit Date: July 22, 2016")
    ("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min.") ]
  |> List.map (Tuple2.mapBoth (split ":") (split ":"));;
val it : (string [] * string []) list =
  [([|"Model"; " Generic"|], [|"Submit Date"; " July 22, 2016"|]);
   ([|"Gene"; " Sequencing Failed"|], [|"Exectime"; " 5 hrs. 21 min."|])]
At this point, you still need to strip leading whitespace, as well as convert the arrays into your desired format, but I trust you can take it from here (otherwise, please ask).
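For completeness, here is one way the remaining steps could look. This is only a sketch: toKeyValue is a helper name made up for this example, the rows are the two sample rows from the question rather than real provider output, and it assumes every cell has the form "Key: Value". It splits on the first colon only, so a value that itself contains a colon stays intact.

```fsharp
// Split "Key: Value" at the first colon only and trim both halves.
// Cells without a colon are skipped (an assumption about the data).
let toKeyValue (s : string) =
    match s.Split([| ':' |], 2) with
    | [| key; value |] -> Some (key.Trim(), value.Trim())
    | _ -> None

// Sample rows copied from the question; in real code these would be
// (row.Column1, row.Column2) taken from results.Table1.Rows.
let rows =
    [ ("Model: Generic", "Submit Date: July 22, 2016")
      ("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min.") ]

let summaryDict =
    rows
    |> Seq.collect (fun (c1, c2) -> [ c1; c2 ]) // two columns -> one column
    |> Seq.choose toKeyValue
    |> dict
```

summaryDict.["Model"] then gives "Generic", and summaryDict.["Submit Date"] gives "July 22, 2016".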

Related

F# - Ask for 3 numbers and then find the minimum?

I am new to programming so this should be an easy one.
I want to write a code that asks for 3 numbers and then finds the minimum. Something like that:
let main(): Unit =
    putline ("Please enter 3 numbers:")
    putline ("First number: ")
    let a = getline ()
    putline ("Second number: ")
    let b = getline ()
    putline("Third number: ")
    let c = getline ()
    if (a<b && a<c) then putline ("Minimum:" + a)
    elif (b<c && b<a) then putline ("Minimum:" + b)
    else putline ("Minimum:" + c)
I am sorry if this is terrible but I am still new to this. Also I am not allowed to use the dictionary. Any advice?
You can use the F# function min, which gives you the minimum of two values.
min 1 2 // 1
To get the minimum of three values you can use it twice:
min (min a b) c
A cleaner way to write this with F# piping is:
a |> min b |> min c
Alternatively, put the items in a list and use List.min:
[ a; b; c ] |> List.min
If, for some reason, you decide to expand beyond three numbers, you could consider using Seq.reduce:
let xs = [0;-5;3;4]
xs
|> Seq.reduce min
|> printfn "%d"
// prints -5 to stdout
You can use min as the reducer because it takes two arguments and returns a value of the same type, which is exactly the shape Seq.reduce expects.
First, your putline function. I'm assuming it is supposed to take a value and print it to the console with a newline. The built-in F# function for this is printfn, and you would use it something like this:
let a = 1
printfn "Minimum: %d" a
The %d gets replaced with the value of a as, in this case, a is an integer. You would use %f for a float, %s for a string... the details will all be in the documentation.
So we could write your putline function like this:
let putline s = printfn "%s" s
This function has the signature val putline : s:string -> unit: it accepts a string and returns nothing. This brings us to your next problem: you try to say putline ("Minimum:" + a). This won't work, as adding a number and a string isn't allowed. What you could do is convert a to a string, and you have several ways to do this:
putline (sprintf "Minimum: %d" a)
putline ("Minimum:" + a.ToString())
sprintf is related to printfn but gives you back a string rather than printing to the console; a.ToString() converts a to a string, allowing it to be concatenated with the preceding string. However, just using printfn instead of putline will work here!
You also have a logic problem: you don't consider ties. What's the minimum of 1, 1, 3? When a and b are equal, neither a<b nor b<a holds, so your code falls through to the else branch and says 3. Try using <= rather than <.
For reading data from the console, there is already an answer on the site, "Read from Console in F#", that you can look at.
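Putting the pieces together, the parsing and the minimum could be sketched like this. tryMinOfThree is a name invented for this example, and it assumes the three inputs arrive as strings (for instance from Console.ReadLine()) and should be whole numbers.

```fsharp
open System

// Parse three strings as integers and return the smallest one,
// or None if any of them is not a valid integer.
let tryMinOfThree (a : string) (b : string) (c : string) =
    let parsed =
        [ a; b; c ]
        |> List.map (fun (s : string) ->
            match Int32.TryParse s with
            | true, n -> Some n
            | false, _ -> None)
    if List.forall Option.isSome parsed then
        Some (parsed |> List.choose id |> List.min)
    else
        None
```

Because List.min does the comparison, ties such as 1, 1, 3 need no special handling. The result can be printed with a match on Some n / None and printfn "Minimum: %d" n.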

Times a str is shown

I've made a function to read a .txt file and turn it into a string.
From here I need help with collecting how many times a word is shown.
But I'm not sure where to go from here and any kind of help with any of the bulletpoints would be greatly appreciated.
Let's go through this step by step then, creating a function for each bit:
Convert words starting with an upper-case to a lower-case word so that all words are lower case.
Split the string into a sequence of words:
let getWords (s: string) =
    s.Split(' ')
Turns "hello world" into ["hello"; "world"]
Sort the amount of times a word is shown. A word in this sense is a sequence of characters without whitespaces or punctuation (!#= etc)
Part #1: Format a word in lower without punctuation:
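open System
// True for any character that is not punctuation
let isNotPunctuation (c: char) =
    not (Char.IsPunctuation(c))
// Lower-case the word and drop punctuation characters
let formatWord (s: string) =
    let chars =
        s.ToLowerInvariant()
        |> Seq.filter isNotPunctuation
        |> Seq.toArray
    new String(chars)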
Turns "Hello!" into "hello".
Part #2: Group the list of words by the formatted version of it.
let groupWords (words: string seq) =
    words
    |> Seq.groupBy formatWord
This returns a sequence of tuples. In each tuple the first part is the key (the result of formatWord) and the second part is the sequence of words that share that key.
Turns ["hello"; "world"; "hello"] into
[("hello", ["hello"; "hello"]);
("world", ["world"])]
Sort from most frequent word shown and to less frequent word.
let sortWords group =
    group
    |> Seq.sortByDescending (fun g -> Seq.length (snd g))
Sort the list descending (biggest first) by the length (count) of items in the second part - see the above representation.
Now we just need to clean up the output:
let output group =
    group
    |> Seq.map fst
This picks the first part of the tuple from the group:
Turns ("hello", ["hello"; "hello"]) into "hello".
Now we have all the functions, we can stick them together into one chain:
let s = "some long string with some repeated words again and some other words"
let finished =
    s
    |> getWords
    |> groupWords
    |> sortWords
    |> output
printfn "%A" finished
//seq ["some"; "words"; "long"; "string"; ...]
Here's another way using Regex:
open System.Text.RegularExpressions
let str = "Some (very) long string with some repeated words again, and some other words, and some punctuation too."
str
|> (Regex @"\W+").Split
|> Seq.choose(fun s -> if s = "" then None else Some (s.ToLower()))
|> Seq.countBy id
|> Seq.sortByDescending snd
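If you want the counts back as data rather than just evaluating the pipeline, the same idea can be wrapped in a function. This is a sketch only; wordFrequencies is a name made up here, and it treats every run of non-word characters (anything other than letters, digits and underscore) as a separator.

```fsharp
open System.Text.RegularExpressions

// Count how often each word occurs, most frequent first.
// Words are lower-cased before counting.
let wordFrequencies (text : string) =
    Regex.Split(text, @"\W+")
    |> Seq.filter (fun w -> w <> "")          // Split can yield empty strings at the edges
    |> Seq.map (fun w -> w.ToLowerInvariant())
    |> Seq.countBy id                         // (word, count) pairs
    |> Seq.sortByDescending snd
    |> Seq.toList
```

For example, wordFrequencies "Hello, world! hello." evaluates to [("hello", 2); ("world", 1)].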

My function outputs a "Seq" and not a String in F#?

I am curious why when I run this, the function "parsenumber" gets outputted as a Seq and not just an int.
|> Seq.map (fun ((number,income,sex,house),data) ->
    let parsenumber = number |> Seq.skip 9 |> Seq.take 5
    let numberofpets = data |> Seq.map (fun row -> (1.0 - float row.pets))
    ((parsenumber,income,sex,house),numberofpets))
This is the result:
(<seq>, "30050", "Male", "Yes")
(<seq>, "78000", "Female", "No")
How can I change this so it outputs the number and not <seq>.
With Seq.skip and Seq.take, I am trying to skip the first 9 integers of each observation in number and return the last 5.
RAW CSV DATA:
10000000001452,30050,Male,Yes
10000000001455,78000,Female,No
What I want as a result:
('01452','30050','Male','Yes')
('01455','78000','Female','No')
What I am actually getting:
(<seq>, "30050", "Male", "Yes")
(<seq>, "78000", "Female", "No")
I need the output to show the actual number instead of <seq>.
When you say, "I am trying to skip the first 9 integers of each observation in number and return the last 5", did you mean "digits" rather than "integers"? I.e., is number a string originally? Then you should use number.Substring(9, 5) instead of Seq.skip and Seq.take. The Seq.skip and Seq.take functions are defined as returning sequences — that's what they're for. When you interpret a string as a sequence, it returns a sequence of characters. If you use the .Substring method, it returns a string.
BTW, if you want to use .Substring, you'll need to tell F# what type you expect number to be: calling methods of a parameter is one place where F#'s type inference can't figure out what type you have. (Because in theory, you could have defined your own type with a .Substring method and meant to call that type). To explicitly declare the type of the number parameter, you'd use a colon and the type name, so that fun ((number,income,sex,house),data) -> would become fun ((number : string, income, sex, house), data) ->. So your entire Seq.map expression would become:
|> Seq.map (fun ((number : string, income, sex, house), data) ->
    let parsenumber = number.Substring(9, 5)
    let numberofpets = data |> Seq.map (fun row -> (1.0 - float row.pets))
    ((parsenumber, income, sex, house), numberofpets))
Also, parsenumber isn't a function, so that's not a good name for it. Possibly parsednumber would be better, though if I understood more about what you're trying to do then there's probably an even better suggestion.
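To see the difference in isolation, here is a small sketch using the first number from the raw CSV data in the question. The names asSeq, asString and rejoined are just for illustration.

```fsharp
let number = "10000000001452"

// Seq.skip / Seq.take treat the string as a seq<char>,
// so the result is a seq<char> (shown as <seq> when printed inside a tuple).
let asSeq = number |> Seq.skip 9 |> Seq.take 5

// Substring stays a string.
let asString = number.Substring(9, 5)

// If you prefer the Seq functions, the characters can be
// turned back into a string via a char array.
let rejoined = System.String(Seq.toArray asSeq)
```

Both asString and rejoined are the string "01452".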

F# map a seq to another seq of shorter length

I have a sequence of strings like this (lines in a file)
[20150101] error a
details 1
details 2
[20150101] error b
details
[20150101] error c
I am trying to map this to a sequence of strings like this (log entries)
[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c
I can do this in an imperative way (by translating the code I would write in C#) - this works but it reads like pseudo-code because I have omitted the referenced functions:
let getLogEntries logFilePath =
    seq {
        let logEntryLines = new ResizeArray<string>()
        for lineOfText in getLinesOfText logFilePath do
            if isStartOfNewLogEntry lineOfText && logEntryLines.Any() then
                yield joinLines logEntryLines
                logEntryLines.Clear()
            logEntryLines.Add(lineOfText)
        if logEntryLines.Any() then
            yield joinLines logEntryLines
    }
Is there a more functional way of doing this?
I can't use Seq.map since it's not a one to one mapping, and Seq.fold doesn't seem right because I suspect it will process the entire input sequence before returning the results (not great if I have very large log files). I assume my code above isn't the ideal way to do this in F# because it's using ResizeArray<string>.
In general, when there is no built-in function that you can use, the functional way to solve things is to use recursion. Here, you can recursively walk over the input, remember the items of the last chunk (since the last [xyz] Info line) and produce new results when you reach a new starting block. In F#, you can write this nicely with sequence expressions:
let rec joinDetails (lines:string list) lastChunk = seq {
    match lines with
    | [] ->
        // We are at the end - if there are any records left, produce a new item!
        if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
    | line::lines when line.StartsWith("[") ->
        // New block starting. Produce a new item and then start a new chunk
        if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
        yield! joinDetails lines [line]
    | line::lines ->
        // Ordinary line - just add it to the last chunk that we're collecting
        yield! joinDetails lines (line::lastChunk) }
Here is an example showing the code in action:
let lines =
    [ "[20150101] error a"
      "details 1"
      "details 2"
      "[20150101] error b"
      "details"
      "[20150101] error c" ]
joinDetails lines []
There is not much in-built in Seq that is going to help you, so you have to roll your own solution. Ultimately, parsing a file like this involves iterating and maintaining state, but what F# does is encapsulate that iteration and state by means of computation expressions (hence your use of the seq computation expression).
What you've done isn't bad but you could extract your code into a generic function that computes the chunks (i.e. sequences of strings) in an input sequence without knowledge of the format. The rest, i.e. parsing an actual log file, can be made purely functional.
I have written this function in the past to help with this.
let chunkBy chunkIdentifier source =
    seq {
        let chunk = ref []
        for sourceItem in source do
            let isNewChunk = chunkIdentifier sourceItem
            if isNewChunk && !chunk <> [] then
                yield !chunk
                chunk := [ sourceItem ]
            else chunk := !chunk @ [ sourceItem ]
        yield !chunk
    }
It takes a chunkIdentifier function which returns true if the input is the start of a new chunk.
Parsing a log file is simply a case of extracting the lines, computing the chunks and joining each chunk:
logEntryLines
|> chunkBy (fun line -> line.[0] = '[')
|> Seq.map (fun s -> String.Join (" ", s))
By encapsulating the iteration and mutation as much as possible, while creating a reusable function, it's more in the spirit of functional programming.
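If the whole input fits in memory, the grouping can also be written with no mutable state at all by folding from the end of the list. Note this is an eager sketch, not a lazy one, so it does not address the concern about very large files; joinEntries and isStart are names invented for this example.

```fsharp
// Eager, mutation-free grouping: walk the lines from the end, collecting
// detail lines until the start line that owns them is reached.
let joinEntries (isStart : string -> bool) (lines : string list) =
    let folder (line : string) (details : string list, entries : string list) =
        if isStart line then
            ([], String.concat " " (line :: details) :: entries)
        else
            (line :: details, entries)
    match List.foldBack folder lines ([], []) with
    | [], entries -> entries
    // Detail lines that come before the first start line become their own entry.
    | orphans, entries -> String.concat " " orphans :: entries

let entries =
    [ "[20150101] error a"; "details 1"; "details 2"
      "[20150101] error b"; "details"
      "[20150101] error c" ]
    |> joinEntries (fun line -> line.StartsWith("["))
```

Here entries holds the three joined log entries from the question, in their original order.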
Alternatively, another two variants:
let lst = ["[20150101] error a";
           "details 1";
           "details 2";
           "[20150101] error b";
           "details";
           "[20150101] error c";]
let fun1 (xs:string list) =
    let sb = new System.Text.StringBuilder(xs.Head)
    xs.Tail
    |> Seq.iter (fun x ->
        match x.[0] with
        | '[' -> sb.Append("\n" + x) |> ignore
        | _ -> sb.Append(" " + x) |> ignore)
    sb.ToString()
lst |> fun1 |> printfn "%s"
printfn ""
let fun2 (xs:string list) =
    List.fold (fun acc (x:string) ->
        let separator = match x.[0] with '[' -> "\n" | _ -> " "
        acc + separator + x) xs.Head xs.Tail
lst |> fun2 |> printfn "%s"
Print:
[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c
[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c
Link:
https://dotnetfiddle.net/3KcIwv

Joining two lists of records and calculating a result

I have two lists of records with the following types:
type AverageTempType = {Date: System.DateTime; Year: int64; Month: int64; AverageTemp: float}
type DailyTempType = {Date: System.DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float}
I want to get a new list which is made up of the DailyTempType "joined" with the AverageTempType. Ultimately though for each daily record I want the Daily Temp - Average temp for the matching month.
I think I can do this with loops as per below and massage this into a reasonable output:
let MatchLoop =
    for i in DailyData do
        for j in AverageData do
            if (i.Year = j.Year && i.Month = j.Month)
            then printfn "%A %A %A %A %A" i.Year i.Month i.Day i.DailyTemp j.AverageTemp
            else printfn "NOMATCH"
I have also tried to do this with pattern matching, but I can't quite get there. I'm not sure how to define the list correctly in the input type and then iterate to get a result, and I'm not sure whether this approach even makes sense:
let MatchPattern (x:DailyTempType) (y:AverageTempType) =
    match (x,y) with
    |(x,y) when (x.Year = y.Year && x.Month = y.Month) ->
        printfn "match"
    |(_,_) -> printfn "nomatch"
I have looked into Deedle which I think can do this relatively easily but I am keen to understand how to do it a lower level.
What you can do is to create a map of the monthly average data. You can think of a map as a read-only dictionary:
let averageDataMap =
    averageData
    |> Seq.map (fun x -> ((x.Year, x.Month), x))
    |> Map.ofSeq
This particular map is a Map<(int64 * int64), AverageTempType>, which, in plainer words, means that the keys into the map are tuples of year and month, and the value associated with each key is an AverageTempType record.
This enables you to find all the matching month data, based on the daily data:
let matches =
    dailyData
    |> Seq.map (fun x -> (x, averageDataMap |> Map.tryFind (x.Year, x.Month)))
Here, matches has the data type seq<DailyTempType * AverageTempType option>. Again, in plainer words, this is a sequence of tuples, where the first element of each tuple is the original daily observation, and the second element is the corresponding monthly average, if a match was found, or None if no matching monthly average was found.
If you want to print the values as in the OP, you can do this:
matches
|> Seq.map snd
|> Seq.map (function | Some _ -> "Match" | None -> "No match")
|> Seq.iter (printfn "%s")
This expression starts with the matches; then pulls out the second element of each tuple; then again maps a Some value to the string "Match", and a None value to the string "No match"; and finally prints each string.
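To get what the question ultimately asks for, the daily temperature minus the average for the matching month, the same map lookup can feed a subtraction. The following is a sketch under assumptions: the sample records are made up for illustration, and the names averageTempByMonth and deviations are not from the question. Daily records without a matching month are simply dropped here.

```fsharp
open System

type AverageTempType = { Date: DateTime; Year: int64; Month: int64; AverageTemp: float }
type DailyTempType = { Date: DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float }

// Made-up sample data, for illustration only.
let averageData : AverageTempType list =
    [ { Date = DateTime(2016, 7, 1); Year = 2016L; Month = 7L; AverageTemp = 10.0 } ]

let dailyData : DailyTempType list =
    [ { Date = DateTime(2016, 7, 22); Year = 2016L; Month = 7L; Day = 22L; DailyTemp = 12.5 }
      { Date = DateTime(2016, 8, 1); Year = 2016L; Month = 8L; Day = 1L; DailyTemp = 8.0 } ]

// Monthly averages keyed by (year, month).
let averageTempByMonth =
    averageData
    |> Seq.map (fun a -> ((a.Year, a.Month), a.AverageTemp))
    |> Map.ofSeq

// For each daily record that has a matching month:
// (year, month, day, DailyTemp - AverageTemp).
let deviations =
    dailyData
    |> List.choose (fun d ->
        averageTempByMonth
        |> Map.tryFind (d.Year, d.Month)
        |> Option.map (fun avg -> (d.Year, d.Month, d.Day, d.DailyTemp - avg)))
```

With the sample data above, deviations contains a single entry for 22 July 2016 with a difference of 2.5; the August record has no monthly average and is skipped. Use Seq.map instead of List.choose if you would rather keep unmatched days as None.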
I would first convert the AverageTempType seq to a Map keyed by (Year, Month), which reduces the cost of the join. A tuple key avoids the collisions that a summed key such as Year + Month would produce (for example, 2015 + 2 and 2016 + 1 are equal):
let toMap (avg:AverageTempType seq) = avg |> Seq.groupBy(fun a -> (a.Year, a.Month)) |> Map.ofSeq
Then you can join and return an option, so the consuming code can do whatever it wants (print, store, error, etc.):
let join (avg:AverageTempType seq) (dly:DailyTempType seq) =
    let avgMap = toMap avg
    dly |> Seq.map (fun d -> d.Year, d.Month, d.Day, d.DailyTemp, Map.tryFind (d.Year, d.Month) avgMap)
