How to generate tuples with FsCheck - F#

This is my JSON generator:
let strGen = Arb.Default.String()
             |> Arb.toGen
strGen
|> Gen.arrayOf
|> Gen.map (String.concat "\", \"")
|> Gen.map (fun strs -> "[\"" + strs + "\"]")
How can I get the strings that the JSON was created from in my test body, so that I can assert on the final result?

My original answer was to use Gen.map2 to combine two generators, one for the string array and one for the json string. But Gen.map2 is specifically designed to let two independent generators be combined, i.e., the result of one generator won't affect the result of the other one. (E.g., rolling two dice: the result of the first die is independent of the result of the second die). What you need is a simple Gen.map that takes the string array generator and produces a tuple of (string array, json). Like so:
open FsCheck

let strGen = Arb.Default.String() |> Arb.toGen
let arrayGen = strGen |> Gen.arrayOf
arrayGen |> Gen.map (fun array ->
    let json =
        array
        |> String.concat "\", \""
        |> fun strs -> "[\"" + strs + "\"]"
    array, json)
Unlike my answer below which combined two independent generators, here there is only ONE generator, whose value is used to produce both the array and the json values. So these values will be dependent rather than independent, and the json will always match the string array.
Original, INCORRECT, answer below, preserved in case the contrast between the two answers is useful:
Easy. Just save the array generator, and re-use it later, using Gen.map2 to combine the array and the json. E.g.:
let strGen = Arb.Default.String()
             |> Arb.toGen
let arrayGen = strGen |> Gen.arrayOf
let jsonGen =
    arrayGen
    |> Gen.map (String.concat "\", \"")
    |> Gen.map (fun strs -> "[\"" + strs + "\"]")
Gen.map2 (fun array json -> array,json) arrayGen jsonGen
And now you have a generator that produces a 2-tuple. The first element of the tuple is the string array, and the second element is the json that was generated.
BTW, your JSON-creating code isn't quite correct yet: if a generated string contains quotation marks (or backslashes), you'll need to escape them, or your generated JSON will be invalid. But I'll let you handle that, or ask a new question about it if you're not sure how. The "single responsibility principle" applies to Stack Overflow questions, too: each question should ideally be about just one subject.
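For illustration only, a minimal sketch of that escaping, assuming the JSON string rules from RFC 8259 (quotes, backslashes and control characters below U+0020 must be escaped). escapeJson and toJsonArray are hypothetical helpers, not part of FsCheck, and they assume non-null input strings (FsCheck's default string generator may also produce null, which you would filter or map separately).

```fsharp
open System.Text

// Hypothetical helper: escapes one string for use inside a JSON string literal.
// Handles quotes, backslashes and control characters (below U+0020).
let escapeJson (s: string) =
    let sb = StringBuilder()
    for c in s do
        match c with
        | '"' -> sb.Append("\\\"") |> ignore
        | '\\' -> sb.Append("\\\\") |> ignore
        | c when c < ' ' -> sb.Append(sprintf "\\u%04x" (int c)) |> ignore
        | c -> sb.Append(c) |> ignore
    sb.ToString()

// Hypothetical replacement for the question's JSON builder: escapes each element.
let toJsonArray (strs: string[]) =
    strs
    |> Array.map (fun s -> "\"" + escapeJson s + "\"")
    |> String.concat ", "
    |> fun body -> "[" + body + "]"
```

Inside the Gen.map from the corrected answer, you would then build the tuple as array, toJsonArray array.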

I couldn't put the code in a comment, so here's a cleaned-up version:
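open System
open System.Text.RegularExpressions
open FsCheck

// Keep only non-empty strings made of letters, digits and spaces,
// so nothing in them needs escaping in the JSON output.
// (String.IsNullOrEmpty is the .NET method; FSharp.Core has no String.isNullOrEmpty.)
let isDigitOrWord (i: string) =
    not (String.IsNullOrEmpty i) && Regex.IsMatch(i, "^[a-zA-Z0-9 ]*$")

let strGen = Arb.Default.String() |> Arb.toGen
Gen.arrayOf strGen
|> Gen.map (fun array ->
    let array = array |> Array.filter isDigitOrWord
    let json =
        array
        |> String.concat "\", \""
        |> fun strs -> if String.IsNullOrEmpty strs then strs else "\"" + strs + "\""
        |> fun strs -> "[" + strs + "]"
    array, json)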

Related

How to reverse each string in a string list at \n?

I have a function that reads the contents of some files into a list of strings, one string per file. I then want to go through each element of the list, split the string wherever \n appears, and reverse it, but I haven't been able to do that last part. How do I split each string in a string list at \n and then reverse the content?
For example, if I have one file containing "aaa\nbbb\n" and another containing "ccc\nddd\n", I want to split the strings at \n and produce "ddd\nccc\nbbb\naaa\n".
Right now, the code below outputs Some "ccc\nddd\naaa\nbbb\n".
let tac (filenames:string list) : string option =
    let bal = List.map (fun x -> readFile(x)) filenames
    let mutable ral = []
    for elem in bal do
        let wal = [elem.Value]
        ral <- ral @ wal
    let sal = List.choose id ral |> List.rev |> String.concat """"""
    try
        Some(sal)
    with
    | _ -> None
To be able to run your code, I simplified it slightly, so I have just:
let bal = ["aa\nbb"; "cc\ndd"]
let mutable ral = []
for elem in bal do
    let wal = [elem]
    ral <- ral @ wal
let sal = bal |> List.rev |> String.concat ""
It seems that your code also uses option values in some way, but that's not relevant to this question. I also replaced """""" with the much simpler "". In your original code, you also have:
try Some(sal) with _ -> None
This is not necessary, because Some(sal) can never throw an exception. Now, in the code shown above, you have a for loop and then you use List.rev. The for loop just recreates the same list, so it doesn't achieve anything. You could either change the loop to reverse the list and drop List.rev, or drop the loop and keep List.rev. I'll do the latter:
let bal = ["aa\nbb"; "cc\ndd"]
let sal = bal |> List.rev |> String.concat ""
This takes a list of strings, reverses it and then concatenates the strings in reversed order. You also want to reverse each string. To do this, you can take characters of the string, reverse them and then turn them into strings and concatenate those:
"abc" |> Seq.rev |> Seq.map string |> String.concat ""
To do this for all strings in your original list, you can use List.map:
let sal =
    bal
    |> List.map (fun s ->
        s |> Seq.rev |> Seq.map string |> String.concat "")
    |> List.rev |> String.concat ""
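Note that reversing the characters turns "aaa\nbbb\n" into "\nbbb\naaa", not into the "ddd\nccc\nbbb\naaa\n" shown in the question. If the goal is to reverse the order of the lines instead, here is a sketch that splits on '\n' first. reverseLines is a hypothetical name; it assumes every line ends with '\n', and it drops empty lines.

```fsharp
// Sketch: reverse the order of lines across all file contents (not the characters).
// Assumes every line ends with '\n'; empty lines are dropped by the filter below.
let reverseLines (contents: string list) =
    contents
    |> List.collect (fun s ->
        s.Split('\n')
        |> Array.filter (fun line -> line <> "")
        |> Array.toList)
    |> List.rev
    |> List.map (fun line -> line + "\n")
    |> String.concat ""
```

With the question's example, reverseLines ["aaa\nbbb\n"; "ccc\nddd\n"] gives "ddd\nccc\nbbb\naaa\n".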

Remove All but First Occurrence of a Character in a List of Strings

I have a list of names, and I need to output a single string that shows the letters from the names in the order they appear, without duplicates (e.g. if the list is ["John"; "James"; "Jack"], the output string should be Johnamesck). I've got a solution (folding all the names into one string and then parsing it), but I feel like I'm cheating a bit by making my string mutable.
I should also mention that this is not a school assignment, just an exercise from a colleague, as I'm coming to F# having only ever done Java web development.
Here is my working solution (for insight purposes):
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
let mutable concatedNames = ["John"; "James"; "Jack"] |> List.fold (+) ""
let greaterThanOne (length : int) = length > 1
let stripChars (str : string) letter =
    let parts = str.Split([| letter |])
    match greaterThanOne (Array.length parts) with
    | true ->
        seq {
            yield Array.head parts
            yield string letter
            yield! Array.tail parts
        }
        |> String.concat ""
    | _ -> str
let killAllButFirstLower = lower |> List.iter (fun letter -> concatedNames <- (stripChars concatedNames letter))
let killAllButFirstUpper = upper |> List.iter ( fun letter -> concatedNames <- (stripChars concatedNames letter))
printfn "All names with duplicate letters removed: %s" concatedNames
I originally wanted to do this purely with functions, and had this solution before the one above:
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
:
:
:
let lowerStripped = [""]
let killLowerDuplicates = lower |> List.iter (fun letter ->
    match lowerStripped.Length with
    | 1 ->
        (stripChars concatedNames letter)::lowerStripped |> ignore
    | _ -> (stripChars (List.head lowerStripped) letter)::lowerStripped |> ignore
    )
let upperStripped = [List.head lowerStripped]
let killUpperDuplicates = lower |> List.iter ( fun letter -> (stripChars (List.head upperStripped) letter)::upperStripped |> ignore )
let strippedAll = List.head upperStripped
printfn "%s" strippedAll
But I couldn't get this working because I realized the consed lists weren't going anywhere (not to mention this is probably inefficient). The idea was that by doing it this way, once I parsed everything, the first element of the list would be the desired string.
I understand it may be strange to ask a question I already have a solution to, but I feel like using mutable is just me not letting go of my imperative habits (I've read that it should rarely be needed), and I want to reinforce a purely functional style. So is there a better way to do this? Is the second solution a feasible route if I can somehow pipe the result somewhere?
You can use Seq.distinct to remove duplicates and retain ordering, so you just need to convert the list of strings to a single string, which can be done with String.concat "":
let distinctChars s =
    s
    |> String.concat ""
    |> Seq.distinct
    |> Array.ofSeq
    |> System.String
If you run distinctChars ["John"; "James"; "Jack"], you will get back:
"Johnamesck"
This should do the trick:
let removeDuplicateCharacters strings =
    // Treat each string as a seq<char>, flattening them into one big seq<char>
    let chars = strings |> Seq.collect id // The id function (f(x) = x) is built in to F#
                                          // We use it here because we want to collect the characters themselves
    chars
    |> Seq.mapi (fun i c -> i,c) // Get the index of each character in the overall sequence
    |> Seq.choose (fun (i,c) ->
        if i = (chars |> Seq.findIndex ((=) c)) // Is this character's index the same as the first occurrence's index?
        then Some c // If so, return (Some c) so that `choose` will include it,
        else None) // Otherwise, return None so that `choose` will ignore it
    |> Seq.toArray // Convert the seq<char> into a char []
    |> System.String // Call the new String(char []) constructor with the chosen characters
Basically, we just treat the list of strings as one big sequence of characters, and choose the ones where the index in the overall sequence is the same as the index of the first occurrence of that character.
Running removeDuplicateCharacters ["John"; "James"; "Jack";] gives the expected output: "Johnamesck".
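Both answers avoid mutation. The findIndex version rescans the sequence for every character, so it does quadratic work on long inputs. As an alternative sketch (a swapped-in technique, not taken from either answer), a single fold can thread a Set of the characters seen so far through the string; firstOccurrences is a hypothetical name.

```fsharp
// Single pass: keep a character only the first time it is seen.
// The fold state is (characters seen so far, kept characters in reverse order).
let firstOccurrences (names: string list) =
    let _, keptReversed =
        names
        |> String.concat ""
        |> Seq.fold (fun (seen: Set<char>, kept: char list) c ->
            if seen.Contains c then (seen, kept)
            else (seen.Add c, c :: kept)) (Set.empty, [])
    // Prepending is O(1), so reverse once at the end to restore the original order
    System.String(Array.ofList (List.rev keptReversed))
```

Like the other answers, this is case-sensitive: 'J' and 'j' count as different characters.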

F# Writing to file changes behavior on return type

I have the following function that converts CSV files to a specific text format (the one expected by the CNTKTextFormat reader):
open System.IO
open FSharp.Data
open Deedle

let convert (inFileName : string) =
    let data = Frame.ReadCsv(inFileName)
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    use outFile = new StreamWriter(outFileName, false)
    data.Rows.Observations
    |> Seq.map(fun kvp ->
        let row = kvp.Value |> Series.observations |> Seq.map(fun (k,v) -> v) |> Seq.toList
        match row with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            outFile.WriteLine(sprintf "|labels %A |features %s" label body)
            printf "%A" label
        | _ ->
            failwith "Bad data."
    )
    |> ignore
Strangely, the output file is empty after running this in the F# Interactive panel, and the printf prints nothing at all.
If I remove the ignore to make sure that there are actual rows being processed (evidenced by returning a seq of nulls), instead of an empty file I get:
val it : seq<unit> = Error: Cannot write to a closed TextWriter.
Before, I was declaring the StreamWriter with let and disposing it manually, but that also produced empty files, or just a few lines (say 5 out of thousands).
What is happening here? And how do I fix the file writing?
Seq.map returns a lazy sequence, which is not evaluated until it is iterated over. You are not currently iterating over it within convert, so no rows are processed. If you return a seq<unit> and iterate over it outside convert, outFile will already be closed, which is why you see the exception.
You should use Seq.iter instead:
data.Rows.Observations
|> Seq.iter (fun kvp -> ...)
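To see the laziness in isolation, here is a minimal sketch with no Deedle or file I/O. The ref-cell counter is purely illustrative; it just records how often the mapping function actually runs.

```fsharp
// Counts how many times the mapping function actually runs.
let calls = ref 0

let mapped =
    [1; 2; 3]
    |> Seq.map (fun x ->
        calls.Value <- calls.Value + 1
        x * 2)

// Building the pipeline runs nothing: the counter is still 0 here.
let callsBeforeIteration = calls.Value

// Seq.iter enumerates the sequence, so the mapping function now runs once per element.
mapped |> Seq.iter ignore
let callsAfterIteration = calls.Value
```

callsBeforeIteration is 0 and callsAfterIteration is 3, which is exactly why the WriteLine calls inside Seq.map never ran before the StreamWriter was disposed.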
Apart from the solutions already mentioned, you could also avoid the StreamWriter altogether and use one of the standard .NET functions, File.WriteAllLines. You would prepare a sequence of converted lines, and then write that to the file:
let convert (inFileName : string) =
    let lines =
        Frame.ReadCsv(inFileName).Rows.Observations
        |> Seq.map(fun kvp ->
            let row = kvp.Value |> Series.observations |> Seq.map snd |> Seq.toList
            match row with
            | label::data ->
                let body = data |> List.map string |> String.concat " "
                printf "%A" label
                sprintf "|labels %A |features %s" label body
            | _ ->
                failwith "Bad data."
        )
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    File.WriteAllLines(outFileName, lines)
Update based on the discussion in the comments: Here's a solution that avoids Deedle altogether. I'm making some assumptions about your input file format here, based on another question you posted today: Label is in column 1, features follow.
let lines =
    File.ReadLines inFileName
    |> Seq.map (fun line ->
        match Seq.toList(line.Split ',') with
        | label::data ->
            let body = data |> String.concat " "
            printf "%A" label
            // label is a plain string here, so %s avoids the quotes that %A would add
            sprintf "|labels %s |features %s" label body
        | _ ->
            failwith "Bad data."
    )
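Putting the Deedle-free pieces together, a sketch that separates the per-line conversion (easy to test on its own) from the file handling. toCntkLine and convertFile are hypothetical names; this assumes plain comma-separated lines with no quoted fields and no header row, the label in the first column, and an input file whose extension is not already .txt (otherwise the output would overwrite the input).

```fsharp
open System.IO

// Hypothetical pure helper: "label,f1,f2,..." -> "|labels label |features f1 f2 ..."
let toCntkLine (line: string) =
    match line.Split ',' |> Array.toList with
    | label :: features ->
        sprintf "|labels %s |features %s" label (String.concat " " features)
    | [] -> failwith "Bad data." // Split always returns at least one element

// File.ReadLines is lazy, but File.WriteAllLines enumerates the whole
// sequence before it returns, so every converted line gets written.
let convertFile (inFileName: string) =
    let outFileName = Path.ChangeExtension(inFileName, ".txt")
    let lines = File.ReadLines inFileName |> Seq.map toCntkLine
    File.WriteAllLines(outFileName, lines)
```

Path.ChangeExtension is used here instead of the Substring arithmetic, so it also works for extensions that are not exactly three characters long.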
As Lee already mentioned, Seq.map is lazy. And that's also why you were getting "Cannot write to a closed TextWriter": the use keyword disposes of its IDisposable when it goes out of scope. In this case, that's at the end of your function. Since Seq.map is lazy, your function was returning an unevaluated sequence object, which had closed over the StreamWriter in your use statement -- but by the time you evaluated that sequence (in whatever part of your code checked for the Seq of nulls, or in the F# Interactive window), the StreamWriter had already been disposed by going out of scope.
Change Seq.map to Seq.iter and both of your problems will be solved.

How to manipulate list elements in F#

I'm currently working my way through a project using F#. I'm quite new to functional programming, and while I'm familiar with the idea of list items being immutable, I'm still having a bit of an issue:
I have a list of strings of the format
["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
What I would like to do is turn each list element into its own list without the initial comma separated string. The output should look something like this:
["1"; "2"; "3"; "4"; "5"]
["1"; "2"]
["1"]
I've found myriad ways to concatenate list elements and my best guesses thus far (unfolding, or something of the sort) have been fruitless. Any help or a point in the right direction would be much appreciated. Thanks!
Just for the fun of it, here's an outline of how to parse the strings using FParsec, a parser combinator library.
First, you import some modules:
open FParsec.Primitives
open FParsec.CharParsers
Then, you can define a parser that will match all strings enclosed by parentheses:
let betweenParentheses p s = between (pstring "(") (pstring ")") p s
This will match any string enclosed in parentheses, such as "(42)", "(foo)", "(1,2,3,4,5)", etc., depending on the specific parser p passed as the first argument.
In order to parse numbers like "(1,2,3,4,5)" or "(1,2)", you can combine betweenParentheses with FParsec's built-in sepBy and pint32:
let pnumbers s = betweenParentheses (sepBy pint32 (pstring ",")) s
pint32 is a parser of integers, and sepBy is a parser that reads a list of values, separated by a string - in this case ",".
In order to parse an entire 'group' of values, such as "(states, (1,2,3,4,5))" or "(alpha, (1,2))", you can again use betweenParentheses and pnumbers:
let pgroup s =
    betweenParentheses
        (manyTill anyChar (pstring ",") >>. spaces >>. pnumbers) s
The manyTill combinator parses any char until it encounters a comma. Next, the pgroup parser expects any number of spaces, and then the format defined by pnumbers.
Finally, you can define a function that runs the pgroup parser on a string:
// string -> int32 list option
let parseGroup s =
    match run pgroup s with
    | Success (result, _, _) -> Some result
    | Failure _ -> None
Since this function returns an option, you can use List.choose to map the strings that can be parsed:
> ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
|> List.choose parseGroup;;
val it : int32 list list = [[1; 2; 3; 4; 5]; [1; 2]; [1]]
Using FParsec is most likely overkill, unless you have some more flexible formatting rules than what can easily be addressed with .NET's standard string API.
You can also just use Char.IsDigit (at least based on your sample data) like so:
open System

// Signature is string -> string list
let getDigits (input : string) =
    input.ToCharArray()
    |> Array.filter Char.IsDigit
    |> Array.map (fun c -> c.ToString())
    |> List.ofArray

// Signature is string list -> string list list
let convertToDigits input =
    input
    |> List.map getDigits
And testing it out in F# interactive:
> let sampleData = ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"];;
val sampleData : string list =
["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
> let test = convertToDigits sampleData;;
val test : string list list = [["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
NOTE: If your numbers can have more than one digit, this will split them into individual digits. If you don't want that, you'll have to use a regex, String.Split, or something similar.
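A sketch of the regex route for multi-digit numbers: match whole runs of digits instead of single characters, so "12" stays one element. getNumbers and convertToNumbers are hypothetical names, and this assumes the label before the first comma contains no digits (otherwise those digits would be picked up too).

```fsharp
open System.Text.RegularExpressions

// Match every run of one or more digits and return them as strings.
let getNumbers (input: string) =
    Regex.Matches(input, @"\d+")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Same shape as convertToDigits: string list -> string list list
let convertToNumbers (input: string list) =
    input |> List.map getNumbers
```

For example, getNumbers "(states, (10,2,345))" returns ["10"; "2"; "345"].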
You can achieve this with the built-in string manipulation API in .NET. You don't have to make it particularly fancy, but it helps to provide some slim, curried adapters over the string API:
open System
let removeWhitespace (x : string) = x.Replace(" ", "")
let splitOn (separator : string) (x : string) =
    x.Split([| separator |], StringSplitOptions.RemoveEmptyEntries)
let trim c (x : string) = x.Trim [| c |]
The only slightly tricky step comes after you've used splitOn to split the whitespace-free string "(states,(1,2,3,4,5))" into [|"(states"; "1,2,3,4,5))"|]. Now you have an array with two elements, and you want the second one. You can get it by first taking Seq.tail of that array, which throws away the first element, and then taking Seq.head of the resulting sequence, which gives you the first element of the remaining sequence.
Using these building blocks, you can extract the desired data like this:
let result =
    ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
    |> List.map (
        removeWhitespace
        >> splitOn ",("
        >> Seq.tail
        >> Seq.head
        >> trim ')'
        >> splitOn ","
        >> Array.toList)
Result:
val result : string list list = [["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
The most unsafe part is the Seq.tail >> Seq.head combination: it throws an exception if splitOn returns fewer than two elements. A safer alternative would be to use something like the following trySecond helper function:
let trySecond xs =
    match xs |> Seq.truncate 2 |> Seq.toList with
    | [_; second] -> Some second
    | _ -> None
Using this function, you can rewrite the data extraction function to be a bit more robust:
let result' =
    ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
    |> List.map (removeWhitespace >> splitOn ",(" >> trySecond)
    |> List.choose id
    |> List.map (trim ')' >> splitOn "," >> Array.toList)
The result is the same as before.
As @JWosty suggested, start with a single list item and match it using a regular expression.
open System.Text.RegularExpressions

let text = "(states, (1,2,3,4,5))"
// Match all numbers into group "number"; the pattern ends with \)\) because the input ends with "))"
let pattern = @"^\(\w+,\s*\((?:(?<number>\d+),)*(?<number>\d+)\)\)$"
let numberMatch = Regex.Match(text, pattern)
let values =
    numberMatch.Groups.["number"].Captures // get all captures from the group
    |> Seq.cast<Capture> // cast each item because regex captures are non-generic (i.e. IEnumerable instead of IEnumerable<'a>)
    |> Seq.map (fun m -> m.Value) // get the matched (string) value for each capture
    |> Seq.map int // parse as int
    |> Seq.toList // listify
Doing this for a list of input texts is just a matter of passing this logic to List.map.
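For example, a sketch that wraps the logic above in a function and maps it over the sample input. parseNumbers is a hypothetical name; an input that doesn't match the pattern simply yields an empty list, because the "number" group then has no captures.

```fsharp
open System.Text.RegularExpressions

let pattern = @"^\(\w+,\s*\((?:(?<number>\d+),)*(?<number>\d+)\)\)$"

// Both (?<number>...) groups share one name, so all their captures land in one collection, in order.
let parseNumbers (text: string) =
    Regex.Match(text, pattern).Groups.["number"].Captures
    |> Seq.cast<Capture>
    |> Seq.map (fun c -> int c.Value)
    |> Seq.toList

let parsed =
    ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
    |> List.map parseNumbers
```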
What I like about this solution is that it doesn't rely on magic numbers; the core of it is just a regular expression. Also, parsing each match as an integer is pretty safe, because we only match digits.
Similar to Luiso's answer, but it should avoid exceptions. Note that I split on '(' and ')' so I can isolate the tuple. Then I try to get the tuple (the second element) before splitting it on ',' to get the final result. I use pattern matching to avoid exceptions.
open System

let values = ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
let new_list =
    values
    |> List.map (fun i -> i.Split([|'(';')'|], StringSplitOptions.RemoveEmptyEntries))
    |> List.map (fun i -> i |> Array.tryItem 1)
    |> List.map (function
        | Some i -> i.Split(',') |> Array.toList
        | None -> [])
printfn "%A" new_list
gives you:
[["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
This snippet should do about what you ask:
let values = ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
let mapper (value:string) =
    let index = value.IndexOf('(', 2) + 1
    value.Substring(index, value.Length - index - 2).Split(',') |> Array.toList
values |> List.map mapper
Output:
val it : string list list = [["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
As I see it, every item in your original list is a tuple of a string and a variable-size tuple of ints. In any case, the code above removes the first item of the tuple, takes the remaining variable-size tuple (the numbers inside the parentheses), calls the .NET String.Split() method on it, and turns the resulting array into a list. Hope this helps.

F# Basics: Folding 2 lists together into a string

A little rusty from my Scheme days, I'd like to take 2 lists, one of numbers and one of strings, and fold them together into a single string where each pair is written like "{(ushort)5, "bla bla bla"},\n". I have most of it; I'm just not sure how to write the fold properly:
let splitter = [|","|]
let indexes =
    indexStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let values =
    valueStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let pairs = List.zip indexes values
printfn "%A" pairs
let result = pairs |> Seq.fold
                 (fun acc a -> String.Format("{0}, \{(ushort){1}, \"{2}\"\}\n",
                                             acc, (List.nth a 0), (List.nth a 1)))
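You're missing two things: the initial state of the fold, which is an empty string, and the fact that you can't use List.nth (or other list functions) on tuples in F#; destructure each tuple with a pattern instead. Also, literal braces in String.Format are escaped by doubling them ({{ and }}), not with a backslash.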
open System

let splitter = [|","|]
let indexes =
    indexStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let values =
    valueStr.Split(splitter, System.StringSplitOptions.None) |> Seq.toList
let pairs = List.zip indexes values
printfn "%A" pairs
let result =
    pairs
    |> Seq.fold (fun acc (index, value) ->
        String.Format("{0}{{(ushort){1}, \"{2}\"}},\n", acc, index, value)) ""
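A self-contained version of this fold that can be pasted into F# Interactive. The values of indexStr and valueStr below are made-up samples, not from the question.

```fsharp
open System

// Hypothetical sample inputs standing in for indexStr and valueStr
let indexStr = "5,7"
let valueStr = "bla bla bla,foo"

let indexes = indexStr.Split([| "," |], StringSplitOptions.None) |> Seq.toList
let values = valueStr.Split([| "," |], StringSplitOptions.None) |> Seq.toList

// Fold over the zipped pairs, starting from the empty string.
// {{ and }} produce literal braces in String.Format.
let result =
    List.zip indexes values
    |> List.fold (fun (acc: string) (index, value) ->
        String.Format("{0}{{(ushort){1}, \"{2}\"}},\n", acc, index, value)) ""
```

Here result is "{(ushort)5, \"bla bla bla\"},\n{(ushort)7, \"foo\"},\n".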
fold2 version
let result =
    List.fold2
        (fun acc index value ->
            String.Format("{0}{{(ushort){1}, \"{2}\"}},\n", acc, index, value))
        ""
        indexes
        values
If you are concerned with speed, you may want to use a StringBuilder, since it doesn't create a new string every time you append.
open System.Text

let result =
    List.fold2
        (fun (sb: StringBuilder) index value ->
            sb.AppendFormat("{{(ushort){0}, \"{1}\"}},\n", index, value))
        (StringBuilder())
        indexes
        values
    |> string
Fold probably isn't the best method for this task. It's a lot easier to map and concat, like this:
let l1 = "a,b,c,d,e".Split([|','|])
let l2 = "1,2,3,4,5".Split([|','|])
let pairs =
    Seq.zip l1 l2
    |> Seq.map (fun (x, y) -> sprintf "(ushort)%s, \"%s\"" x y)
    |> String.concat "\n"
I think you want List.fold2; then you can dispense with the zip entirely. (At the time of writing, the List module had a fold2 function but Seq didn't; newer versions of FSharp.Core also provide Seq.fold2.)
The types of your named variables and the type of the result you hope for are all implicit, so it's difficult to help, but if you are trying to accumulate a list of strings you might consider something along the lines of
let result =
    pairs
    |> Seq.fold (fun prev (l, r) ->
        String.Format("{0}, {{(ushort){1}, \"{2}\"}}\n", prev, l, r)) ""
My F#/Caml is very rusty so I may have the order of arguments wrong. Also note your string formation is quadratic; in my own code I would go with something more along these lines:
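let strings =
    List.fold2 (fun ss l r ->
        String.Format("{{(ushort){0}, \"{1}\"}}\n", l, r) :: ss)
        [] indexes values
// Consing builds the list in reverse, so restore the original order before concatenating
let result = strings |> List.rev |> String.concat ", "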
This won't cost you quadratic time and it's a little easier to follow. I've checked MSDN and believe I have the correct order of arguments on fold2.
Keep in mind I know Caml not F# and so I may have details or order of arguments wrong.
Perhaps this:
open System
open System.Text

let strBuilder = new StringBuilder()
for (i,v) in Seq.zip indexes values do
    strBuilder.Append(String.Format("{{(ushort){0}, \"{1}\"}},\n", i, v))
    |> ignore
let result = strBuilder.ToString()
With F#, sometimes it's better to go imperative...
map2 or fold2 is the right way to go. Here's my take, using the (||>) operator:
let l1 = [| "a"; "b"; "c"; "d"; "e" |]
let l2 = [| "1"; "2"; "3"; "4"; "5" |]
let pairs =
    (l1, l2)
    ||> Seq.map2 (sprintf "(ushort)%s, \"%s\"")
    |> String.concat "\n"
