Times a str is shown - f#

I've made a function to read a .txt file and turn it into a string.
From here I need help counting how many times each word appears.
I'm not sure where to go from here, and any help with any of the bullet points would be greatly appreciated.

Let's go through this step by step then, creating a function for each bit:
Convert words starting with an upper-case to a lower-case word so that all words are lower case.
Split the string into a sequence of words:
let getWords (s: string) =
    s.Split(' ')
Turns "hello world" into [|"hello"; "world"|]
Sort the amount of times a word is shown. A word in this sense is a sequence of characters without whitespaces or punctuation (!#= etc)
Part #1: Format a word in lower case without punctuation:
open System
let isNotPunctuation c =
    not (Char.IsPunctuation(c))
let formatWord (s: string) =
    let chars =
        s.ToLowerInvariant()
        |> Seq.filter isNotPunctuation
        |> Seq.toArray
    new String(chars)
Turns "Hello!" into "hello".
Part #2: Group the list of words by the formatted version of it.
let groupWords (words: string seq) =
    words
    |> Seq.groupBy formatWord
This returns a sequence of tuples: the first part of each tuple is the key (the result of formatWord) and the second part is a sequence of the words that produced that key.
Turns ["hello"; "world"; "hello"] into
[("hello", ["hello"; "hello"]);
 ("world", ["world"])]
Sort from most frequent word shown and to less frequent word.
let sortWords group =
    group
    |> Seq.sortByDescending (fun g -> Seq.length (snd g))
Sort the list descending (biggest first) by the length (count) of items in the second part - see the above representation.
Now we just need to clean up the output:
let output group =
    group
    |> Seq.map fst
This picks the first part of the tuple from the group:
Turns ("hello", ["hello"; "hello"]) into "hello".
Now we have all the functions, we can stick them together into one chain:
let s = "some long string with some repeated words again and some other words"
let finished =
    s
    |> getWords
    |> groupWords
    |> sortWords
    |> output
printfn "%A" finished
//seq ["some"; "words"; "long"; "string"; ...]
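The chain above drops the counts in its last step, while the question asks how many times each word is shown. A minimal sketch that keeps them could look like the following. The name wordCounts is made up for illustration, formatWord is a compact copy of the one above, and splitting on spaces, tabs and newlines is an assumption about the input file.

```fsharp
open System

// Lower-case a word and strip punctuation, as in Part #1 above.
let formatWord (s: string) =
    s.ToLowerInvariant()
    |> Seq.filter (fun c -> not (Char.IsPunctuation c))
    |> Seq.toArray
    |> System.String

// Hypothetical helper: returns (word, count) pairs, most frequent first.
let wordCounts (text: string) =
    text.Split([| ' '; '\t'; '\n'; '\r' |], StringSplitOptions.RemoveEmptyEntries)
    |> Seq.map formatWord
    |> Seq.filter (fun w -> w <> "")   // drop tokens that were only punctuation
    |> Seq.countBy id
    |> Seq.sortByDescending snd
    |> Seq.toList
```

Seq.countBy does the grouping and the counting in one step, so the separate groupWords and output functions are not needed in this variant.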

Here's another way using Regex
open System.Text.RegularExpressions
let str = "Some (very) long string with some repeated words again, and some other words, and some punctuation too."
str
|> (Regex @"\W+").Split
|> Seq.choose(fun s -> if s = "" then None else Some (s.ToLower()))
|> Seq.countBy id
|> Seq.sortByDescending snd

Related

How to reverse each string in a string list at \n?

I have this function which first reads the content of some files, then I have made the contents of the files into separate strings in a list. Then I want to access each element in the list and split the string when \n appears and reverse it. However I haven't been able to do the last part. How do I split each string in a string list when it comes across \n and then reverse the content?
An example: if I have a file that says "aaa\nbbb\n" and another that says "ccc\nddd\n", I want to split the strings at \n and make the result say "ddd\nccc\nbbb\naaa\n".
Right now the output of the code below is Some "ccc\nddd\naaa\nbbb\n"
let tac (filenames:string list) : string option =
    let bal = List.map (fun x -> readFile(x)) filenames
    let mutable ral = []
    for elem in bal do
        let wal = [elem.Value]
        ral <- ral @ wal
    let sal = List.choose id ral |> List.rev |> String.concat """"""
    try
        Some(sal)
    with
    | _ -> None
To be able to run your code, I simplified it slightly, so I have just:
let bal = ["aa\nbb"; "cc\ndd"]
let mutable ral = []
for elem in bal do
    let wal = [elem]
    ral <- ral @ wal
let sal = ral |> List.rev |> String.concat ""
It seems that your code is also using option values in some way, but that's not relevant to this question. I also replaced """""" with the much simpler "". In your original code, you also have:
try Some(sal) with _ -> None
This is not necessary, because Some(sal) can never throw an exception. Now, in the code shown above, you have a for loop and then you use List.rev. The for loop is just recreating the same list, so this is not making much sense. You could either change the loop to reverse the list and drop List.rev, or you could drop the loop and keep List.rev. I'll do the latter:
let bal = ["aa\nbb"; "cc\ndd"]
let sal = bal |> List.rev |> String.concat ""
This takes a list of strings, reverses it and then concatenates the strings in reversed order. You also want to reverse each string. To do this, you can take characters of the string, reverse them and then turn them into strings and concatenate those:
"abc" |> Seq.rev |> Seq.map string |> String.concat ""
To do this for all strings in your original list, you can use List.map:
let sal =
    bal
    |> List.map (fun s ->
        s |> Seq.rev |> Seq.map string |> String.concat "")
    |> List.rev |> String.concat ""
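The example in the question ("ddd\nccc\nbbb\naaa\n") suggests that the lines should be reversed rather than the characters within each string. Under that reading, a minimal sketch is to split every string at \n, flatten the result into one list of lines, reverse that list, and put the newlines back. The name tacLines is made up for illustration, and the sketch assumes the file contents have already been read into plain strings.

```fsharp
// Hypothetical helper: reverse the order of all lines across all inputs.
let tacLines (contents: string list) : string =
    contents
    |> List.collect (fun s ->
        // RemoveEmptyEntries drops the empty piece after a trailing '\n'
        // (and, as a side effect, any blank lines in the input).
        s.Split([| '\n' |], System.StringSplitOptions.RemoveEmptyEntries)
        |> List.ofArray)
    |> List.rev
    |> List.map (fun line -> line + "\n")
    |> String.concat ""

// tacLines ["aaa\nbbb\n"; "ccc\nddd\n"] gives "ddd\nccc\nbbb\naaa\n"
```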

Remove All but First Occurrence of a Character in a List of Strings

I have a list of names, and I need to output a single string that shows the letters from the names in the order they appear without the duplicates (e.g. If the list is ["John"; "James"; "Jack"], the output string should be Johnamesck). I've got a solution (folding all the names into a string then parse), but I feel like I'm cheesing it a bit by making my string mutable.
I also want to state this is not a school assignment, just an exercise from a work colleague as I'm coming into F# from only ever knowing Java Web stuff.
Here is my working solution (for insight purposes):
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
let mutable concatedNames = ["John"; "James"; "Jack"] |> List.fold (+) ""
let greaterThanOne (length : int) = length > 1
let stripChars (str : string) letter =
    let parts = str.Split([| letter |])
    match greaterThanOne (Array.length parts) with
    | true ->
        seq {
            yield Array.head parts
            yield string letter
            yield! Array.tail parts
        }
        |> String.concat ""
    | _ -> str
let killAllButFirstLower = lower |> List.iter (fun letter -> concatedNames <- (stripChars concatedNames letter))
let killAllButFirstUpper = upper |> List.iter (fun letter -> concatedNames <- (stripChars concatedNames letter))
printfn "All names with duplicate letters removed: %s" concatedNames
I originally wanted to do this explicitly with functions alone and had a solution previous to above
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
:
:
:
let lowerStripped = [""]
let killLowerDuplicates = lower |> List.iter (fun letter ->
    match lowerStripped.Length with
    | 1 ->
        (stripChars concatedNames letter)::lowerStripped |> ignore
    | _ -> (stripChars (List.head lowerStripped) letter)::lowerStripped |> ignore
    )
let upperStripped = [List.head lowerStripped]
let killUpperDuplicates = lower |> List.iter (fun letter -> (stripChars (List.head upperStripped) letter)::upperStripped |> ignore)
let strippedAll = List.head upperStripped
printfn "%s" strippedAll
But I couldn't get this working because I realized the consed lists weren't going anywhere (not to mention this is probably inefficient). The idea was that by doing it this way, once I parsed everything, the first element of the list would be the desired string.
I understand it may be strange asking a question I already have a solution to, but I feel like using mutable is just me not letting go of my Imperative habits (as I've read it should be rare to need to use it) and I want to more reinforce pure functional. So is there a better way to do this? Is the second solution a feasible route if I can somehow pipe the result somewhere?
You can use Seq.distinct to remove duplicates and retain ordering, so you just need to convert the list of strings to a single string, which can be done with String.concat "":
let distinctChars s =
    s
    |> String.concat ""
    |> Seq.distinct
    |> Array.ofSeq
    |> System.String
If you run distinctChars ["John"; "James"; "Jack"], you will get back:
"Johnamesck"
This should do the trick:
let removeDuplicateCharacters strings =
    // Treat each string as a seq<char>, flattening them into one big seq<char>
    let chars = strings |> Seq.collect id // The id function (f(x) = x) is built in to F#
                                          // We use it here because we want to collect the characters themselves
    chars
    |> Seq.mapi (fun i c -> i,c) // Get the index of each character in the overall sequence
    |> Seq.choose (fun (i,c) ->
        if i = (chars |> Seq.findIndex ((=) c)) // Is this character's index the same as the first occurrence's index?
        then Some c // If so, return (Some c) so that `choose` will include it,
        else None)  // Otherwise, return None so that `choose` will ignore it
    |> Seq.toArray // Convert the seq<char> into a char []
    |> System.String // Call the new String(char []) constructor with the chosen characters
Basically, we just treat the list of strings as one big sequence of characters, and choose the ones where the index in the overall sequence is the same as the index of the first occurrence of that character.
Running removeDuplicateCharacters ["John"; "James"; "Jack";] gives the expected output: "Johnamesck".
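On the question of whether the second attempt is a feasible route: the consed lists are discarded because List.iter returns unit, but the same idea works if the intermediate string is threaded through List.fold as the accumulator. The following is only a sketch of that approach; stripChars is a condensed version of the one in the question, and removeDuplicatesWithFold is a made-up name.

```fsharp
// Keep the first occurrence of 'letter' in 'str' and remove the rest.
let stripChars (str: string) (letter: char) =
    match str.Split([| letter |]) with
    | parts when parts.Length > 1 ->
        parts.[0] + string letter + String.concat "" (Array.tail parts)
    | _ -> str

// Hypothetical helper: fold over every letter, passing the partially
// stripped string along as the accumulator instead of mutating a variable.
let removeDuplicatesWithFold (names: string list) =
    [ 'a' .. 'z' ] @ [ 'A' .. 'Z' ]
    |> List.fold stripChars (String.concat "" names)

// removeDuplicatesWithFold ["John"; "James"; "Jack"] gives "Johnamesck"
```

This makes one pass over the string per letter, so Seq.distinct is still the shorter and cheaper choice; the fold mainly shows how to replace the mutable variable.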

F# combining two sequences

I have two sequences that I would like to combine somehow as I need the result of the second one printed right next to the first. The code is currently where playerItems refers to a list:
seq state.player.playerItems
|> Seq.map (fun i -> i.name)
|> Seq.iter (printfn "You have a %s")
seq state.player.playerItems
|> Seq.map (fun i -> i.description) |> Seq.iter (printfn "Description = %s")
The result currently is
You have a Keycard
You have a Hammer
You have a Wrench
You have a Screw
Description = Swipe to enter
Description = Thump
Description = Grab, Twist, Let go, Repeat
Description = Twisty poke
However, I need it to be
You have a Keycard
Description = Swipe to enter
You have a Hammer
Description = Thump
Any help with this would be very appreciated.
As Foggy Finder said in the comments, in your specific case you really don't have two sequences, you have one sequence and you want to print two lines for each item, which can be done with a single Seq.iter like this:
state.player.playerItems // The "seq" beforehand is not necessary
|> Seq.iter (fun player -> printfn "You have a %s\nDescription = %s" player.name player.description)
However, I'll also tell you about two ways to combine two sequences, for the time when you really do have two different sequences. First, if you want to turn the two sequences into a sequence of tuples, you'd use Seq.zip:
let colors = Seq.ofList ["red"; "green"; "blue"]
let numbers = Seq.ofList [25; 73; 42]
let pairs = Seq.zip colors numbers
printfn "%A" pairs
// Prints: seq [("red", 25); ("green", 73); ("blue", 42)]
If you want to combine the two sequences in some other way than producing tuples, use Seq.map2 and pass it a two-parameter function:
let colors = Seq.ofList ["red"; "green"; "blue"]
let numbers = Seq.ofList [25; 73; 42]
let combined = Seq.map2 (fun clr num -> sprintf "%s: %d" clr num) colors numbers
printfn "%A" combined
// Prints: seq ["red: 25"; "green: 73"; "blue: 42"]
Finally, if all you want is to perform some side-effect for each pair of items in the two sequences, then Seq.iter2 is your friend:
let colors = Seq.ofList ["red"; "green"; "blue"]
let numbers = Seq.ofList [25; 73; 42]
Seq.iter2 (fun clr num -> printfn "%s: %d" clr num) colors numbers
That would print the following three lines to the console:
red: 25
green: 73
blue: 42
Note how in the Seq.iter2 example, I'm not storing the result. That's because the result of Seq.iter2 is always (), the "unit" value that is F#'s equivalent of void. (Except that it's much more useful than void, for reasons that are beyond the scope of this answer. Search Stack Overflow for "[F#] unit" and you should find some interesting questions and answers.)
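Going back to the single-sequence case from the start of this answer, here is a self-contained sketch that can be run without the game state. The Item record and the describe function are assumptions made up for illustration; the question does not show the real type behind playerItems. The lines are returned as a list so that they are easy to inspect before printing.

```fsharp
// Hypothetical stand-in for the element type of state.player.playerItems.
type Item = { name: string; description: string }

// Produce the two output lines for each item, in order.
let describe (items: Item list) : string list =
    items
    |> List.collect (fun item ->
        [ sprintf "You have a %s" item.name
          sprintf "Description = %s" item.description ])

let items =
    [ { name = "Keycard"; description = "Swipe to enter" }
      { name = "Hammer"; description = "Thump" } ]

describe items |> List.iter (printfn "%s")
```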

F# Canopy - Generate Random Letters and or Numbers and use in a variable

I am using F# Canopy to complete some web testing. I am trying to create and load a random number with or without letters, not that important and use it to paste to my website.
The code I am currently using is
let genRandomNumbers count =
    let rnd = System.Random()
    List.init count
let l = genRandomNumbers 1
"#CompanyName" << l()
#CompanyName is the ID of the element I am trying to pass l into. As it stands I am receiving the error 'The expression was expected to have type string but here it has type a list'.
Any help would be greatly appreciated.
The << operator in canopy writes a string to the selector (I haven't used it but the documentation looks pretty clear), but your function returns a list. If you want the random string to work, you could do something like this (not tested code):
let randomNumString n = genRandomNumbers n |> List.map string |> List.reduce (+)
This maps your random list to strings, then concatenates all the strings together using the first element as the accumulator seed. You could also do a fold:
let randomNumString n =
    genRandomNumbers n
    |> List.fold (fun acc i -> acc + (string i)) ""
Putting it all together
let rand = new System.Random()
let genRandomNumbers count = List.init count (fun _ -> rand.Next())
let randomNumString n = genRandomNumbers n |> List.map string |> List.reduce (+)
"#CompanyName" << (randomNumString 1)
In general, F# won't do any type promotion for you. Since the << operator wants a string on the right hand side, you need to map your list to a string somehow. That means iterating over each element, converting the number to a string, and adding all the elements together into one final string.
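Since the question also mentions letters, here is a small sketch that builds a random alphanumeric string directly instead of going through a list of numbers. The name randomAlphaNum and the chosen alphabet are assumptions for illustration, and the canopy call in the trailing comment is untested.

```fsharp
// Hypothetical helper: 'length' random characters drawn from letters and digits.
let randomAlphaNum (rnd: System.Random) (length: int) : string =
    let alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    Array.init length (fun _ -> alphabet.[rnd.Next(alphabet.Length)])
    |> System.String

let rnd = System.Random()
let companyName = randomAlphaNum rnd 8
// With canopy opened, this string could then be written to the element:
// "#CompanyName" << companyName
```

Creating the System.Random once and passing it in avoids getting repeated values from several generators created in quick succession.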

How to make word freq counter more efficient?

I have written this F# code to count word frequencies in a list and return a tuple to C#. Could you tell me how can I make the code more efficient or shorter?
let rec internal countword2 (tail : string list) wrd ((last : string list), count) =
    match tail with
    | [] -> last, wrd, count
    | h::t -> countword2 t wrd (if h = wrd then last, count+1 else last @ [h], count)
let internal countword1 (str : string list) wrd =
    let temp, wrd, count = countword2 str wrd ([], 0) in
    temp, wrd, count
let rec public countword (str : string list) =
    match str with
    | [] -> []
    | h::_ ->
        let temp, wrd, count = countword1 str h in
        [(wrd, count)] @ countword temp
Even pad's version can be made more efficient and concise:
let countWords = Seq.countBy id
Example:
countWords ["a"; "a"; "b"; "c"] //returns: seq [("a", 2); ("b", 1); ("c", 1)]
If you want to count word frequencies in a string list, your approach seems to be overkill. Seq.groupBy is well-fitted for this purpose:
let public countWords (words: string list) =
    words
    |> Seq.groupBy id
    |> Seq.map (fun (word, sq) -> word, Seq.length sq)
    |> Seq.toList
Your solution iterates over the input list several times, once for every new word that it finds. Instead of doing that, you could iterate over the list just once and build a dictionary that holds the number of occurrences of every word.
To do this in a functional style, you can use F# Map, which is an immutable dictionary:
let countWords words =
    // Increment the number of occurrences of 'word' in the map 'counts'
    // If it isn't already in the dictionary, add it with count 1
    let increment counts word =
        match Map.tryFind word counts with
        | Some count -> Map.add word (count + 1) counts
        | _ -> Map.add word 1 counts
    // Start with an empty map and call 'increment'
    // to add all words to the dictionary
    words |> List.fold increment Map.empty
You can also implement the same thing in an imperative style, which is going to be more efficient, but less elegant (and you don't get all benefits of functional style). However, standard mutable Dictionary can be used nicely from F# too (this is going to be similar to C# version, so I won't write it here).
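For reference, a rough sketch of what that imperative version might look like is given below. It is not from the original answer; the name countWordsMutable is made up, and the result is converted to a list of tuples on the assumption that this is the shape the C# caller wants.

```fsharp
open System.Collections.Generic

// Hypothetical imperative variant: one pass over the input, mutable Dictionary.
let countWordsMutable (words: seq<string>) : (string * int) list =
    let counts = Dictionary<string, int>()
    for word in words do
        match counts.TryGetValue word with
        | true, n -> counts.[word] <- n + 1   // seen before: bump the count
        | _ -> counts.[word] <- 1             // first occurrence
    counts
    |> Seq.map (fun kv -> kv.Key, kv.Value)
    |> List.ofSeq
```

Dictionary does not promise any particular enumeration order, so sort the result if the order matters.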
Finally, if you want a simple solution using just standard F# functions, you can use Seq.groupBy as suggested by pad. This would be probably almost as efficient as the Dictionary based version. But then, if you're just learning F# then writing a few recursive functions like countWords yourself is a great way to learn!
To give you some comments about your code - the complexity of your approach is slightly higher, but that should probably be fine. There are however some common issues:
In your countword2 function, you have if h = wrd then ... else last @ [h], count. The call last @ [h] is inefficient, because it needs to clone the entire list last. Instead of this, you could just write h::last to add the word to the beginning, because the order does not matter.
On the last line, you're using @ again in [(wrd, count)] @ countword temp. This is not necessary. If you're adding a single element to the beginning of a list, you should use: (wrd,count)::(countword temp).
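Applying both comments to the original code might give something like the sketch below. The helper name countOne is made up, and the two helper functions from the question are merged into one. Because words are now prepended with ::, the leftover list comes out reversed, so the order of the later entries in the result differs from the original.

```fsharp
// Count occurrences of 'wrd' in 'rest', collecting all other words with (::).
let rec countOne (rest: string list) (wrd: string) (others: string list, count: int) =
    match rest with
    | [] -> others, count
    | h :: t ->
        if h = wrd then countOne t wrd (others, count + 1)
        else countOne t wrd (h :: others, count)

let rec countword (words: string list) : (string * int) list =
    match words with
    | [] -> []
    | h :: _ ->
        let others, count = countOne words h ([], 0)
        (h, count) :: countword others

// countword ["a"; "b"; "a"; "c"] gives [("a", 2); ("c", 1); ("b", 1)]
```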

Resources