Splitting a string list list - f#

I'm quite (very) new to F# and I'm scratching my head over a little problem. I have a string list list that I'm trying to manipulate and transform. This is probably trivial.
The following data is being read in from a CSV file:
1,ABC,3
1,DEF,3
1,XYZ,1
2,ABC,2
2,XYZ,1
3,DEF,2
3,XYZ,2
Rightly or wrongly, I'm reading this into a string list list. The data is a non-normalized set of records, where the field at index 0 of each record is an identifier. At the moment I'm just trying to split the outer list by identifier so that I end up with a string list list list representing the following (one column per group):
1,ABC,3 2,ABC,2 3,DEF,2
1,DEF,3 2,XYZ,1 3,XYZ,2
1,XYZ,1
The results above will then be pushed into my Typed model and fed into the rest of the application.

In your code:
csvRecords
|> Seq.groupBy (fun record -> (record.Item 0))
|> List.ofSeq
|> List.map(toTypedModel)
record.Item 0 isn't a good way to get the first element of a list. You should either use List.head or pattern matching for that purpose.
Your example would look like:
csvRecords
|> Seq.groupBy List.head
|> Seq.map toTypedModel
|> List.ofSeq
I also changed the order so that toTypedModel is applied to the sequence before it is converted to a list, which avoids allocating an unnecessary intermediate list.
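As a minimal sketch of the grouping step on its own, assuming the rows from the question have already been parsed into a string list list (the name grouped is illustrative, and toTypedModel is left out because its definition is not shown in the question):

```fsharp
// Sample rows from the question, already split into fields.
let csvRecords =
    [ [ "1"; "ABC"; "3" ]
      [ "1"; "DEF"; "3" ]
      [ "1"; "XYZ"; "1" ]
      [ "2"; "ABC"; "2" ]
      [ "2"; "XYZ"; "1" ]
      [ "3"; "DEF"; "2" ]
      [ "3"; "XYZ"; "2" ] ]

// Group by the identifier field, then drop the key,
// leaving one inner list of rows per identifier.
let grouped : string list list list =
    csvRecords
    |> List.groupBy List.head
    |> List.map snd

printfn "%A" grouped
```

Note that List.head throws on an empty row, so pattern matching may be the safer choice if blank lines can appear in the file.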

Use Seq.groupBy:
input
|> Seq.groupBy (fun (a,b,c) -> a)
|> Seq.toList

Related

What are the essential functions to find duplicate elements within a list?

Translated, how can I simplify the following function:
let numbers = [ 3;5;5;8;9;9;9 ]
let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.filter (fun set -> set.Length > 1)
                            |> List.map (fun set -> set.[0])
I'm sure this is a duplicate. However, I am unable to locate the question on this site.
UPDATE
let getDuplicates numbers =
    numbers |> List.groupBy id
            |> List.choose (fun (k,v) -> match v.Length with
                                         | x when x > 1 -> Some k
                                         | _ -> None)
Simplifying your function:
Whenever you have a filter followed by a map, you can probably replace the pair with a choose. The purpose of choose is to run a function for each value in the list, and return only the items which return Some value (None values are removed, which is the filter portion). Whatever value you put inside Some is the map portion:
let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.choose( fun( set ) ->
                                if set.Length > 1
                                    then Some( set.[0] )
                                    else None )
We can take it one additional step by removing the map. In this case, keeping the tuple which contains the key is helpful, because it eliminates the need to get the first item of the list:
let getDuplicates = numbers |> List.groupBy id
                            |> List.choose( fun( key, set ) ->
                                if set.Length > 1
                                    then Some key
                                    else None )
Is this simpler than the original? Perhaps. Because choose combines two purposes, it is by necessity more complex than those purposes kept separate (the filter and the map), and this makes it harder to understand at a glance, perhaps undoing the more "simplified" code. More on this later.
Decomposing the concept
Simplifying the code wasn't the direct question, though. You asked about functions useful in finding duplicates. At a high level, how do you find a duplicate? It depends on your algorithm and specific needs:
Your given algorithm uses the "put items in buckets based on their value", and "look for buckets with more than one item". This is a direct match to List.groupBy and List.choose (or filter/map)
A different algorithm could be to "iterate through all items", "modify an accumulator as we see each", then "report all items which have been seen multiple times". This is kind of like the first algorithm, where something like List.fold is replacing List.groupBy, but if you need to drag some other kind of state along, it may be helpful.
Perhaps you need to know how many times there are duplicates. A different algorithm satisfying these requirements may be "sort the items so they are always ascending", and "flag if the next item is the same as the current item". In this case, you have a List.sort followed by a List.toSeq then Seq.windowed:
let getDuplicates = numbers |> List.sort
                            |> List.toSeq
                            |> Seq.windowed 2
                            |> Seq.choose( function
                                | [|x; y|] when x = y -> Some x
                                | _ -> None )
Note that this returns a sequence with [5; 9; 9], informing you that 9 is duplicated twice.
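The accumulator-based algorithm mentioned earlier has no code of its own in this answer, so here is a rough sketch of one way it could look. It assumes a Map from item to occurrence count as the accumulator; the names countOccurrences and duplicatesWithCounts are made up for illustration:

```fsharp
// Fold over the items, carrying a Map of item -> number of times seen so far.
let countOccurrences items =
    items
    |> List.fold (fun seen item ->
        match Map.tryFind item seen with
        | Some n -> Map.add item (n + 1) seen
        | None -> Map.add item 1 seen) Map.empty

// Report only the items seen more than once, together with their counts.
let duplicatesWithCounts items =
    countOccurrences items
    |> Map.toList
    |> List.filter (fun (_, n) -> n > 1)

printfn "%A" (duplicatesWithCounts [ 3; 5; 5; 8; 9; 9; 9 ])
```

Because the accumulator is an ordinary value, other state can be carried in the same fold by widening it to a tuple or record.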
These were algorithms mostly based on List functions. There are already two answers, one mutable, the other not, which are based on sets and existence.
My point is, a complete list of functions helpful to finding duplicates would read like a who's who list of existing collection functions -- it all depends on what you're trying to do and your specific requirements. I think your choice of List.groupBy and List.choose is probably about as simple as it gets.
Simplifying for maintainability
The last thought on simplification is to remember that simplifying code will improve the readability of your code to a certain extent. "Simplifying" beyond that point will most likely involve tricks, or obscure intent. If I were to look back at a sample of code I wrote even several weeks and a couple of projects ago, the shortest and perhaps simplest code would probably not be the easiest to understand. Thus the last point -- simplifying future code maintainability may be your goal. If this is the case, your original algorithm modified only keeping the groupBy tuple and adding comments as to what each step of the pipeline is doing may be your best bet:
// combine numbers into common buckets specified by the number itself
let getDuplicates = numbers |> List.groupBy id
                            // only look at buckets with more than one item
                            |> List.filter( fun (_,set) -> set.Length > 1)
                            // change each bucket to only its key
                            |> List.map( fun (key,_) -> key )
The comments on the original question already show that your code was unclear to people unfamiliar with it. Is this a question of experience? Definitely. But regardless of whether we work on a team or are lone wolves, optimizing code (where possible) for quick understanding should probably be close to everyone's top priority. (climbing down off soapbox...) :)
Regardless, best of luck.
If you don't mind using a mutable collection in a local scope, this could do it:
open System.Collections.Generic
let getDuplicates numbers =
    let known = HashSet()
    numbers |> List.filter (known.Add >> not) |> set
You can wrap the last three operations in a List.choose:
let duplicates =
    numbers
    |> List.groupBy id
    |> List.choose ( function
        | _, x::_::_ -> Some x
        | _ -> None )
Here's a solution which uses only basic functions and immutable data structures:
let findDups elems =
    let findDupsHelper (oneOccurrence, manyOccurrences) elem =
        if oneOccurrence |> Set.contains elem
        then (oneOccurrence, manyOccurrences |> Set.add elem)
        else (oneOccurrence |> Set.add elem, manyOccurrences)
    List.fold findDupsHelper (Set.empty, Set.empty) elems |> snd

F#: Generating a word count summary

I am new to programming and F# is my first .NET language.
I would like to read the contents of a text file, count the number of occurrences of each word, and then return the 10 most common words and the number of times each of them appears.
My questions are: Is using a dictionary encouraged in F#? How would I write the code if I wish to use a dictionary? (I have browsed through the Dictionary class on MSDN, but I am still puzzling over how I can update the value to a key.) Do I always have to resort to using Map in functional programming?
While there's nothing wrong with the other answers, I'd like to point out that there's already a specialized function that counts the occurrences of each unique key in a sequence: Seq.countBy. Plumbing the relevant parts of Reed's and torbonde's answers together:
let countWordsTopTen (s : string) =
    s.Split([|','|])
    |> Seq.countBy (fun s -> s.Trim())
    |> Seq.sortBy (snd >> (~-))
    |> Seq.truncate 10
"one, two, one, three, four, one, two, four, five"
|> countWordsTopTen
|> printfn "%A" // seq [("one", 3); ("two", 2); ("four", 2); ("three", 1); ...]
My questions are: Is using a dictionary encouraged in F#?
Using a Dictionary is fine from F#, though it does use mutability, so it's not quite as common.
How would I write the code if I wish to use a dictionary?
If you read the file and have a string of comma-separated values, you could parse it using something similar to:
open System.Collections.Generic

// Just an example of input - this would come from your file...
let strings = "one, two, one, three, four, one, two, four, five"
let words =
    strings.Split([|','|])
    |> Array.map (fun s -> s.Trim())
let dict = Dictionary<_,_>()
// Increment the count for a known word, or start it at 1
words
|> Array.iter (fun w ->
    match dict.TryGetValue w with
    | true, v -> dict.[w] <- v + 1
    | false, _ -> dict.[w] <- 1)
// Creates a sequence of tuples, with (word,count) in order
let topTen =
    dict
    |> Seq.sortBy (fun kvp -> -kvp.Value)
    |> Seq.truncate 10
    |> Seq.map (fun kvp -> kvp.Key, kvp.Value)
I would say an obvious choice for this task is to use the Seq module, which is really one of the major workhorses in F#. As Reed said, using dictionary is not as common, since it is mutable. Sequences, on the other hand, are immutable. An example of how to do this using sequences is
let strings = "one, two, one, three, four, one, two, four, five"
let words =
    strings.Split([|','|])
    |> Array.map (fun s -> s.Trim())
let topTen =
    words
    |> Seq.groupBy id
    |> Seq.map (fun (w, ws) -> (w, Seq.length ws))
    |> Seq.sortBy (snd >> (~-))
    |> Seq.truncate 10
I think the code speaks pretty much for itself, although maybe the second last line requires a short explanation:
The snd-function gives the second entry in a pair (i.e. snd (a,b) is b), >> is the functional composition operator (i.e. (f >> g) a is the same as g (f a)) and ~- is the unary minus operator. Note here that operators are essentially functions, but when using (and declaring) them as functions, you have to wrap them in parentheses. That is, -3 is the same as (~-) 3, where in the last case we have used the operator as a function.
In total, what the second last line does, is sort the sequence by the negative value of the second entry in the pair (the number of occurrences).
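To make the operator-as-function point concrete, here is a small sketch. The sample list counts is invented for illustration, and Seq.sortByDescending is shown only as an alternative available in newer FSharp.Core versions, not as what the answer above uses:

```fsharp
// The unary minus operator used as an ordinary function.
let negated = (~-) 3

// snd picks the count out of the pair, then (~-) negates it.
let sortKey = (snd >> (~-)) ("one", 3)

// Illustrative (word, count) pairs with no tied counts.
let counts = [ ("three", 1); ("one", 3); ("two", 2) ]

// Sorting ascending by the negated count puts the largest count first.
let viaNegation = counts |> Seq.sortBy (snd >> (~-)) |> Seq.toList

// The same ordering expressed with sortByDescending.
let viaDescending = counts |> Seq.sortByDescending snd |> Seq.toList

printfn "%d %d %A" negated sortKey viaNegation
```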

FSharp order of function parameters

Since functions in FSharp with multiple parameters get curried inherently into functions with only one parameter, should the signature of Seq.filter have to be
Seq.filter predicate source
?
How different will it be from
Seq.filter source predicate
Thanks
The first order (predicate, sequence) is more appropriate for chaining sequence combinators via the |> operator. Typically, you have a single sequence to which you apply a number of operations/transformations, consider something like
xs |> Seq.map ... |> Seq.filter ... |> Seq. ...
etc. Reversing the order of the parameters to (source, predicate) would prohibit that (or at least make it much more awkward to express). That (and maybe also partial application) is why for (almost) all the default Seq combinators the last parameter is the sequence the operation is applied to.
The reason it is
Seq.filter predicate source
instead of
Seq.filter source predicate
is so that you can do this
source
|> Seq.filter predicate
Since you are more likely to build a new function using Seq.filter predicate
let isEven = Seq.filter (fun x -> x % 2 = 0)
you can now do
source |> isEven
There are functions in F# where the order of parameters does not follow this convention because of its history of coming from OCaml. See: Different argument order for getting N-th element of Array, List or Seq
Yes, Seq.filter takes the predicate followed by the sequence to filter. If you want to provide them in the other order, you could write a function to reverse the arguments:
let flip f a b = f b a
then you could write
(flip Seq.filter) [1..10] (fun i -> i > 3)
The existing order is more convenient however since it makes partial application more useful e.g.
[1..3] |> Seq.map ((*)2) |> Seq.filter (fun i -> i > 2)
You also have ||> for piping a pair of values into a function that accepts two arguments, or for partially applying the first two arguments of a function that accepts more. : )
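A short sketch of ||> in action, in case it is unfamiliar. The values are arbitrary, and flip is the helper defined in the answer above, repeated here so the snippet stands alone:

```fsharp
// ||> unpacks a pair and passes its items as the first two arguments,
// so this is the same as List.fold (+) 0 [1; 2; 3].
let total = (0, [ 1; 2; 3 ]) ||> List.fold (+)

// Same helper as in the answer above.
let flip f a b = f b a

// With flip, the source can come first in the pair and the predicate second.
let bigOnes =
    ([ 1 .. 10 ], (fun i -> i > 3))
    ||> flip Seq.filter
    |> Seq.toList

printfn "%d %A" total bigOnes
```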

Is there a more generic way of iterating,filtering, applying operations on collections in F#?

Let's take this code:
open System
open System.IO
let lines = seq {
    use sr = new StreamReader(@"d:\a.h")
    while not sr.EndOfStream do yield sr.ReadLine()
}
lines |> Seq.iter Console.WriteLine
Console.ReadLine()
Here I am reading all the lines in a seq, and to go over it, I am using Seq.iter. If I have a list I would be using List.iter, and if I have an array I would be using Array.iter. Isn't there a more generic traversal function I could use, instead of having to keep track of what kind of collection I have? For example, in Scala, I would just call a foreach and it would work regardless of the fact that I am using a List, an Array, or a Seq.
Am I doing it wrong?
You may or may not need to keep track of what type of collection you deal with, depending on your situation.
For simple iteration over items, nothing prevents you from using Seq.iter on lists or arrays in F#: both are also sequences, or IEnumerables from the .NET standpoint. Using Array.iter over an array, or List.iter over a list, simply offers a more efficient traversal based on the specific properties of each collection type. As the signature Seq.iter : ('T -> unit) -> seq<'T> -> unit shows, the result is unit, so the collection type no longer matters after the traversal.
In other situations you may want to consider the types of the input and output arguments and use the specialized functions, if you care about further composition. For example, if you need to filter a list and continue using the result, then
List.filter : ('T -> bool) -> 'T list -> 'T list will preserve the type of the underlying collection, but Seq.filter : ('T -> bool) -> seq<'T> -> seq<'T> applied to a list will return a sequence, not a list:
let alist = [1;2;3;4] |> List.filter (fun x -> x%2 = 0) // alist is still a list
let aseq = [1;2;3;4] |> Seq.filter (fun x -> x%2 = 0) // aseq is not a list anymore
Seq.iter works on lists and arrays just as well.
The type seq is actually an alias for the interface IEnumerable<'T>, which list and array both implement. So, as BLUEPIXY indicated, you can use Seq.* functions on arrays or lists.
A less functional-looking way would be the following:
for x in [1..10] do
printfn "%A" x
Lists and arrays are treated as Seq.
let printAll seqData =
    seqData |> Seq.iter (printfn "%A")
    Console.ReadLine() |> ignore
printAll lines
printAll [1..10]
printAll [|1..10|]

f# - looping through array

I have decided to take up f# as my functional language.
My problem: Given a bunch of 50-digit numbers in a file, get the first 10 digits of the sum of each line. (Euler problem, for those who know.)
for example (simplified):
1234567890
The sum is 45
The first "ten" digits or in our case the "first" digit is 4.
Here's my problem:
I read my file of numbers.
I can split it using "\n" and now I have each line, and then I try to convert it to a char array, but the problem comes here: I can't access each element of that array.
let total =
    lines.Split([|'\n'|])
    |> Seq.map (fun line -> line.ToCharArray())
    |> Seq.take 1
    |> Seq.toList
    |> Seq.length
I get each line, convert it to an array, take the first array (for testing only), try to convert it to a list, and then get the length of the list. But this length is the number of arrays I have (i.e., 1). It should be 50, as that's how many elements there are in the array.
Does anyone know how to pipeline it to access each char?
Seq.take 1 still returns a seq<char array> (containing a single array). To get only the first array itself you could use Seq.nth 0 (Seq.item 0 or Seq.head in current F#).
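A small sketch of the difference, using a made-up two-line input in place of the real file contents and Seq.head to pull out the first array:

```fsharp
// Stand-in for the file contents: two short lines instead of 50-digit ones.
let lines = "12345\n67890"

let charArrays =
    lines.Split([| '\n' |])
    |> Seq.map (fun line -> line.ToCharArray())

// Seq.take 1 keeps the outer sequence, so the length counts arrays, not chars.
let outerLength = charArrays |> Seq.take 1 |> Seq.length

// Seq.head returns the first array itself, so the length counts its chars.
let innerLength = charArrays |> Seq.head |> Array.length

printfn "%d %d" outerLength innerLength
```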
My final answer:
open System

let total =
    lines.Split([|'\n'|])
    |> Seq.map (fun line -> line.ToCharArray() |> Array.toSeq)
    |> Seq.map (fun eachSeq ->
        eachSeq
        |> Seq.take 50 //get rid of the \r
        |> Seq.map (fun c -> Double.Parse(c.ToString()))
        |> Seq.skip 10
        |> Seq.sum)
    |> Seq.average
is what i got finally and it's working :).
Basically, after I convert it to a char array, I make it a sequence. So now I have a sequence of sequences. Then I can loop through each sequence.
I'm not 100% sure what you're asking for, but I believe you're trying to write something like this:
lines.Split([|'\n'|]) |> Seq.map (fun line -> line.Length)
This converts the lines into a sequence of integers, one per line, each giving the length of that line.
Here's my solution:
string(Seq.sumBy bigint.Parse (data.Split[|'\n'|])).Substring(0, 10)
I copied the data into a string, each line separated by x. Then the answer is one line (wrapped for SO):
let ans13 = data |> String.split ['x'] |> Seq.map Math.BigInt.Parse
|> Seq.reduce (+)
If you are reading it from a file, you'd add the file reading code:
let ans13 = IO.File.ReadAllLines("filename") |> Seq.map Math.BigInt.Parse
|> Seq.reduce (+)
Edit: Actually, I'm not sure we're talking about the same Euler problem -- this is for 13, but your description sounds slightly different. To get the first 10 digits after the summing, do:
printfn "%s" <| String.sub (string ans13) 0 10
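The String.split, String.sub and Math.BigInt names above appear to come from older F# libraries. A rough equivalent using names available in current F# might look like the following; firstTenDigitsOfSum and the sample numbers are made up for illustration, and the input is assumed to be one decimal number per string:

```fsharp
// Sums decimal strings of arbitrary length and returns the first ten digits of the total.
// Assumes the total has at least ten digits; Substring throws otherwise.
let firstTenDigitsOfSum (numbers: seq<string>) =
    let total = numbers |> Seq.sumBy (fun (s: string) -> bigint.Parse s)
    (string total).Substring(0, 10)

// Illustrative input; real data could come from System.IO.File.ReadAllLines.
let sample = [ "12345678901234567890"; "1" ]

printfn "%s" (firstTenDigitsOfSum sample)
```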

Resources