Convert multiple consequent invalid characters to one underline? - f#

The following string is not a valid file name.
"File name\r\n\t\t\t\t\r\n\t\t\t\t (Revised 2018-05-31 15:35:41.16).txt"
The following code converts it to a valid file name.
let fn = """File name
(Revised 2018-05-31 15:35:41.16).txt""";;
let invalid = System.IO.Path.GetInvalidFileNameChars();;
String.Join("",
fn |> Seq.filter(fun x ->
not (Array.exists (fun y -> y = x) invalid)
)
)
// "File name (Revised 2018-05-31 153541.16).txt"
It just removes these invalid characters. How to convert these invalid to a _? For these multiple consequent invalid characters, I want them to be replaced to only one _. So the expected result should be
"File name_ (Revised 2018-05-31 15_35_41.16).txt"

This should work:
open System.Text.RegularExpressions
let normalizeFileName name =
let invalidPattern =
System.IO.Path.GetInvalidFileNameChars()
|> Seq.map (string >> Regex.Escape)
|> String.concat ""
|> sprintf "[%s]+"
Regex.Replace(name, invalidPattern, "_")

Related

Can I make return type vary with parameter a bit like sprintf in F#?

In the F# core libraries there are functions whose signature seemingly changes based on the parameter at compile-time:
> sprintf "Hello %i" ;;
val it : (int -> string) = <fun:it#1>
> sprintf "Hello %s" ;;
val it : (string -> string) = <fun:it#2-1>
Is it possible to implement my own functions that have this property?
For example, could I design a function that matches strings with variable components:
matchPath "/products/:string/:string" (fun (category : string) (sku : string) -> ())
matchPath "/tickets/:int" (fun (id : int) -> ())
Ideally, I would like to do avoid dynamic casts.
There are two relevant F# features that make it possible to do something like this.
Printf format strings. The compiler handles format strings like "hi %s" in a special way. They are not limited just to printf and it's possible to use those in your library in a somewhat different way. This does not let you change the syntax, but if you were happy to specify your paths using e.g. "/products/%s/%d", then you could use this. The Giraffe library defines routef function, which uses this trick for request routing:
let webApp =
choose [
routef "/foo/%s/%s/%i" fooHandler
routef "/bar/%O" (fun guid -> text (guid.ToString()))
]
Type providers. Another option is to use F# type providers. With parameterized type providers, you can write a type that is parameterized by a literal string and has members with types that are generated by some F# code you write based on the literal string parameter. An example is the Regex type provider:
type TempRegex = Regex< #"^(?<Temperature>[\d\.]+)\s*°C$", noMethodPrefix = true >
TempRegex().Match("21.3°C").Temperature.TryValue
Here, the regular expression on the first line is static parameter of the Regex type provider. The type provider generates a Match method which returns an object with properties like Temperature that are based on the literal string. You would likely be able to use this and write something like:
MatchPath<"/products/:category/:sku">.Match(fun r ->
printfn "Got category %s and sku %s" r.Category r.Sku)
I tweaked your example so that r is an object with properties that have names matching to those in the string, but you could use a lambda with multiple parameters too. Although, if you wanted to specify types of those matches, you might need a fancier syntax like "/product/[category:int]/[sku:string]" - this is just a string you have to parse in the type provider, so it's completely up to you.
1st: Tomas's answer is the right answer.
But ... I had the same question.
And while I could understand it conceptually as "it has to be 'the string format thing' or 'the provider stuff'"
I could not tell my self that I got until I tried an implementation
... And it took me a bit .
I used FSharp.Core's printfs and Giraffe's FormatExpressions.fs as guidelines
And came up with this naive gist/implementation, inspired by Giraffe FormatExpressions.fs
BTW The trick is in this bit of magic fun (format: PrintfFormat<_, _, _, _, 'T>) (handle: 'T -> 'R)
open System.Text.RegularExpressions
// convert format pattern to Regex Pattern
let rec toRegexPattern =
function
| '%' :: c :: tail ->
match c with
| 'i' ->
let x, rest = toRegexPattern tail
"(\d+)" + x, rest
| 's' ->
let x, rest = toRegexPattern tail
"(\w+)" + x, rest
| x ->
failwithf "'%%%c' is Not Implemented\n" x
| c :: tail ->
let x, rest = toRegexPattern tail
let r = c.ToString() |> Regex.Escape
r + x, rest
| [] -> "", []
// Handler Factory
let inline Handler (format: PrintfFormat<_, _, _, _, 'T>) (handle: 'T -> string) (decode: string list -> 'T) =
format.Value.ToCharArray()
|> List.ofArray
|> toRegexPattern
|> fst, handle, decode
// Active Patterns
let (|RegexMatch|_|) pattern input =
let m = Regex.Match(input, pattern)
if m.Success then
let values =
[ for g in Regex(pattern).Match(input).Groups do
if g.Success && g.Name <> "0" then yield g.Value ]
Some values
else
None
let getPattern (pattern, _, _) = pattern
let gethandler (_, handle, _) = handle
let getDecoder (_, _, decode) = decode
let Router path =
let route1 =
Handler "/xyz/%s/%i"
(fun (category, id) ->
// process request
sprintf "handled: route1: %s/%i" category id)
(fun values ->
// convert matches
values |> List.item 0,
values
|> List.item 1
|> int32)
let route2 =
Handler "/xyz/%i"
(fun (id) -> sprintf "handled: route2: id: %i" id) // handle
(fun values -> values|> List.head |> int32) // decode
// Router
(match path with
| RegexMatch (getPattern route2) values ->
values
|> getDecoder route2
|> gethandler route2
| RegexMatch (getPattern route1) values ->
values
|> getDecoder route1
|> gethandler route1
| _ -> failwith "No Match")
|> printf "routed: %A\n"
let main argv =
try
let arg = argv |> Array.skip 1 |> Array.head
Router arg
0 // return an integer exit code
with
| Failure msg ->
eprintf "Error: %s\n" msg
-1

Remove All but First Occurrence of a Character in a List of Strings

I have a list of names, and I need to output a single string that shows the letters from the names in the order they appear without the duplicates (e.g. If the list is ["John"; "James"; "Jack"], the output string should be Johnamesck). I've got a solution (folding all the names into a string then parse), but I feel like I'm cheesing it a bit by making my string mutable.
I also want to state this is not a school assignment, just an exercise from a work colleague as I'm coming into F# from only ever knowing Java Web stuff.
Here is my working solution (for insight purposes):
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
let mutable concatedNames = ["John"; "James"; "Jack"] |> List.fold (+) ""
let greaterThanOne (length : int) = length > 1
let stripChars (str : string) letter =
let parts = str.Split([| letter |])
match greaterThanOne (Array.length parts) with
| true -> seq {
yield Array.head parts
yield string letter
yield! Array.tail parts
}
|> String.concat ""
| _ -> str
let killAllButFirstLower = lower |> List.iter (fun letter -> concatedNames <- (stripChars concatedNames letter))
let killAllButFirstUpper = upper |> List.iter ( fun letter -> concatedNames <- (stripChars concatedNames letter))
printfn "All names with duplicate letters removed: %s" concatedNames
I originally wanted to do this explicitly with functions alone and had a solution previous to above
let lower = ['a' .. 'z']
let upper = ['A' .. 'Z']
:
:
:
let lowerStripped = [""]
let killLowerDuplicates = lower |> List.iter (fun letter ->
match lowerStripped.Length with
| 1 ->
(stripChars concatedNames letter)::lowerStripped |> ignore
| _ -> (stripChars (List.head lowerStripped) letter)::lowerStripped |> ignore
)
let upperStripped = [List.head lowerStripped]
let killUpperDuplicates = lower |> List.iter ( fun letter -> (stripChars (List.head upperStripped) letter)::upperStripped |> ignore )
let strippedAll = List.head upperStripped
printfn "%s" strippedAll
But I couldn't get this working because I realized the consed lists weren't going anywhere (not to mention this is probably inefficient). The idea was that by doing it this way, once I parsed everything, the first element of the list would be the desired string.
I understand it may be strange asking a question I already have a solution to, but I feel like using mutable is just me not letting go of my Imperative habits (as I've read it should be rare to need to use it) and I want to more reinforce pure functional. So is there a better way to do this? Is the second solution a feasible route if I can somehow pipe the result somewhere?
You can use Seq.distinct to remove duplicates and retain ordering, so you just need to convert the list of strings to a single string, which can be done with String.concat "":
let distinctChars s = s |> String.concat ""
|> Seq.distinct
|> Array.ofSeq
|> System.String
If you run distinctChars ["John"; "James"; "Jack"], you will get back:
"Johnamesck"
This should do the trick:
let removeDuplicateCharacters strings =
// Treat each string as a seq<char>, flattening them into one big seq<char>
let chars = strings |> Seq.collect id // The id function (f(x) = x) is built in to F#
// We use it here because we want to collect the characters themselves
chars
|> Seq.mapi (fun i c -> i,c) // Get the index of each character in the overall sequence
|> Seq.choose (fun (i,c) ->
if i = (chars |> Seq.findIndex ((=) c)) // Is this character's index the same as the first occurence's index?
then Some c // If so, return (Some c) so that `choose` will include it,
else None) // Otherwise, return None so that `choose` will ignore it
|> Seq.toArray // Convert the seq<char> into a char []
|> System.String // Call the new String(char []) constructor with the choosen characters
Basically, we just treat the list of strings as one big sequence of characters, and choose the ones where the index in the overall sequence is the same as the index of the first occurrence of that character.
Running removeDuplicateCharacters ["John"; "James"; "Jack";] gives the expected output: "Johnamesck".

F# Writing to file changes behavior on return type

I have the following function that convert csv files to a specific txt schema (expected by CNTKTextFormat Reader):
open System.IO
open FSharp.Data;
open Deedle;
let convert (inFileName : string) =
let data = Frame.ReadCsv(inFileName)
let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
use outFile = new StreamWriter(outFileName, false)
data.Rows.Observations
|> Seq.map(fun kvp ->
let row = kvp.Value |> Series.observations |> Seq.map(fun (k,v) -> v) |> Seq.toList
match row with
| label::data ->
let body = data |> List.map string |> String.concat " "
outFile.WriteLine(sprintf "|labels %A |features %s" label body)
printf "%A" label
| _ ->
failwith "Bad data."
)
|> ignore
Strangely, the output file is empty after running in the F# interactive panel and that printf yields no printing at all.
If I remove the ignore to make sure that there are actual rows being processed (evidenced by returning a seq of nulls), instead of an empty file I get:
val it : seq<unit> = Error: Cannot write to a closed TextWriter.
Before, I was declaring the StreamWriter using let and disposing it manually, but I also generated empty files or just a few lines (say 5 out of thousands).
What is happening here? Also, how to fix the file writing?
Seq.map returns a lazy sequence which is not evaluated until it is iterated over. You are not currently iterating over it within convert so no rows are processed. If you return a Seq<unit> and iterate over it outside convert, outFile will already be closed which is why you see the exception.
You should use Seq.iter instead:
data.Rows.Observations
|> Seq.iter (fun kvp -> ...)
Apart from the solutions already mentioned, you could also avoid the StreamWriter altogether, and use one of the standard .Net functions, File.WriteAllLines. You would prepare a sequence of converted lines, and then write that to the file:
let convert (inFileName : string) =
let lines =
Frame.ReadCsv(inFileName).Rows.Observations
|> Seq.map(fun kvp ->
let row = kvp.Value |> Series.observations |> Seq.map snd |> Seq.toList
match row with
| label::data ->
let body = data |> List.map string |> String.concat " "
printf "%A" label
sprintf "|labels %A |features %s" label body
| _ ->
failwith "Bad data."
)
let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
File.WriteAllLines(outFileName, lines)
Update based on the discussion in the comments: Here's a solution that avoids Deedle altogether. I'm making some assumptions about your input file format here, based on another question you posted today: Label is in column 1, features follow.
let lines =
File.ReadLines inFileName
|> Seq.map (fun line ->
match Seq.toList(line.Split ',') with
| label::data ->
let body = data |> List.map string |> String.concat " "
printf "%A" label
sprintf "|labels %A |features %s" label body
| _ ->
failwith "Bad data."
)
As Lee already mentioned, Seq.map is lazy. And that's also why you were getting "Cannot write to a closed TextWriter": the use keyword disposes of its IDisposable when it goes out of scope. In this case, that's at the end of your function. Since Seq.map is lazy, your function was returning an unevaluated sequence object, which had closed over the StreamWriter in your use statement -- but by the time you evaluated that sequence (in whatever part of your code checked for the Seq of nulls, or in the F# Interactive window), the StreamWriter had already been disposed by going out of scope.
Change Seq.map to Seq.iter and both of your problems will be solved.

How to manipulate list elements in F#

I'm currently working my way through a project using F#. I'm quite new to functional programming, and while I'm familiar with the idea of list items being immutable, I'm still having a bit of an issue:
I have a list of strings of the format
["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
What I would like to do is turn each list element into its own list without the initial comma separated string. The output should look something like this:
["1"; "2"; "3"; "4"; "5"]
["1"; "2"]
["1"]
I've found myriad ways to concatenate list elements and my best guesses thus far (unfolding, or something of the sort) have been fruitless. Any help or a point in the right direction would be much appreciated. Thanks!
Just for the fun of it, here's an outline of how to parse the strings using FParsec, a parser combinator library.
First, you import some modules:
open FParsec.Primitives
open FParsec.CharParsers
Then, you can define a parser that will match all strings enclosed by parentheses:
let betweenParentheses p s = between (pstring "(") (pstring ")") p s
This will match any string enclosed in parentheses, such as "(42)", "(foo)", "(1,2,3,4,5)", etc., depending on the specific parser p passed as the first argument.
In order to parse numbers like "(1,2,3,4,5)" or "(1,2)", you can combine betweenParentheses with FParsec's built-in sepBy and pint32:
let pnumbers s = betweenParentheses (sepBy pint32 (pstring ",")) s
pint32 is a parser of integers, and sepBy is a parser that reads a list of values, separated by a string - in this case ",".
In order to parse an entire 'group' of values, such as "(states, (1,2,3,4,5))" or "(alpha, (1,2))", you can again use betweenParentheses and pnumbers:
let pgroup s =
betweenParentheses
(manyTill anyChar (pstring ",") >>. spaces >>. pnumbers) s
The manyTill combination parses any char value until it encounters ,. Next, the pgroup parser expects any number of spaces, and then the format defined by pnumbers.
Finally, you can define a function that runs the pgroup parser on a string:
// string -> int32 list option
let parseGroup s =
match run pgroup s with
| Success (result, _, _) -> Some result
| Failure _ -> None
Since this function returns an option, you can use List.choose to map the strings that can be parsed:
> ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
|> List.choose parseGroup;;
val it : int32 list list = [[1; 2; 3; 4; 5]; [1; 2]; [1]]
Using FParsec is most likely overkill, unless you have some more flexible formatting rules than what can easily be addressed with .NET's standard string API.
You can also just use Char.IsDigit (at least based on your sample data) like so:
open System
// Signature is string -> string list
let getDigits (input : string) =
input.ToCharArray()
|> Array.filter Char.IsDigit
|> Array.map (fun c -> c.ToString())
|> List.ofArray
// signature is string list -> string list list
let convertToDigits input =
input
|> List.map getDigits
And testing it out in F# interactive:
> let sampleData = ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"];;
val sampleData : string list =
["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
> let test = convertToDigits sampleData;;
val test : string list list = [["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
NOTE: If you have more than 1 digit numbers, this will split them into individual elements in the list. If you don't want that you'll have to use regex or string.split or something else.
You can achieve this with the built-in string manipulation API in .NET. You don't have to make it particular fancy, but it helps to provide some slim, curried Adapters over the string API:
open System
let removeWhitespace (x : string) = x.Replace(" ", "")
let splitOn (separator : string) (x : string) =
x.Split([| separator |], StringSplitOptions.RemoveEmptyEntries)
let trim c (x : string) = x.Trim [| c |]
The only slightly tricky step is once you've used splitOn to split "(states, (1,2,3,4,5))" into [|"(states"; "1,2,3,4,5))"|]. Now you have an array with two elements, and you want the second element. You can do this by first taking Seq.tail of that array, throwing away the first element, and then taking Seq.head of the resulting sequence, giving you the first element of the remaining sequence.
Using these building blocks, you can extract the desired data like this:
let result =
["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
|> List.map (
removeWhitespace
>> splitOn ",("
>> Seq.tail
>> Seq.head
>> trim ')'
>> splitOn ","
>> Array.toList)
Result:
val result : string list list = [["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
The most unsafe part is the Seq.tail >> Seq.head combination. It can fail if the input list has fewer than two elements. A safer alternative would be to use something like the following trySecond helper function:
let trySecond xs =
match xs |> Seq.truncate 2 |> Seq.toList with
| [_; second] -> Some second
| _ -> None
Using this function, you can rewrite the data extraction function to be a bit more robust:
let result' =
["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
|> List.map (removeWhitespace >> splitOn ",(" >> trySecond)
|> List.choose id
|> List.map (trim ')' >> splitOn "," >> Array.toList)
The result is the same as before.
As #JWosty suggested, start with a single list item and match it using regular expressions.
let text = "(states, (1,2,3,4,5))"
// Match all numbers into group "number"
let pattern = #"^\(\w+,\s*\((?:(?<number>\d+),)*(?<number>\d+)\)$"
let numberMatch = System.Text.RegularExpressions.Regex.Match(text, pattern)
let values =
numberMatch.Groups.["number"].Captures // get all matches from the group
|> Seq.cast<Capture> // cast each item because regex captures are non-generic (i.e. IEnumerable instead of IEnumerable<'a>)
|> Seq.map (fun m -> m.Value) // get the matched (string) value for each capture
|> Seq.map int // parse as int
|> Seq.toList // listify
Doing this for a list of input texts is just a matter of passing this logic to List.map.
What I like about this solution is that it doesn't use magic numbers but the core of it is just a regular expression. Also parsing each match as integer is pretty safe because we only match digits.
Similar to Luiso's answer, but should avoid exceptions. Note that I split on '(' and ')' so I can isolate the tuple. Then I try to get the tuple only before splitting it on ',' to get the final result. I use pattern matching to avoid exceptions.
open System
let values = ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
let new_list = values |> List.map(fun i -> i.Split([|'(';')'|], StringSplitOptions.RemoveEmptyEntries))
|> List.map(fun i -> i|> Array.tryItem(1))
|> List.map(function x -> match x with
| Some i -> i.Split(',') |> Array.toList
| None -> [])
printfn "%A" new_list
gives you:
[["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
This snippet should do about you ask:
let values = ["(states, (1,2,3,4,5))"; "(alpha, (1,2))"; "(final, (1))"]
let mapper (value:string) =
let index = value.IndexOf('(', 2) + 1;
value.Substring(index, value.Length - index - 2).Split(',') |> Array.toList
values |> List.map mapper
Output:
val it : string list list = [["1"; "2"; "3"; "4"; "5"]; ["1"; "2"]; ["1"]]
As I see it every item on you original list is a tuple of a string and a tuple of int of variable size, in any case what the code above does is removing the first item of the tuple and then then use the remaining variable size tuple (the numbers inside the parens), then call the .Net string.Split() function and turns the resulting array to a list. Hope this helps

Magic sprintf function - how to wrap it?

I am trying to wrap a call to sprintf function. Here's my attempt:
let p format args = "That was: " + (sprintf format args)
let a = "a"
let b = "b"
let z1 = p "A %s has invalid b" a
This seems to work, output is
val p : format:Printf.StringFormat<('a -> string)> -> args:'a -> string
val a : string = "a"
val b : string = "b"
val z1 : string = "That was: A a has invalid b"
But it wouldn't work with more than one arg:
let z2 = p "A %s has invalid b %A" a b
I get compile-time error:
let z2 = p "A %s has invalid b %A" a b;;
---------^^^^^^^^^^^^^^^^^^^^^^^^^^^
stdin(7,10): error FS0003: This value is not a function and cannot be applied
How can I create a single function which would work with any number of args?
UPD. Tomas has suggested to use
let p format = Printf.kprintf (fun s -> "This was: " + s) format
It works indeed. Here's an example
let p format = Printf.kprintf (fun s -> "This was: " + s) format
let a = p "something like %d" 123
// val p : format:Printf.StringFormat<'a,string> -> 'a
// val a : string = "This was: something like 123"
But the thing is that main purpose of my function is to do some work except for formatring, so I tried to use the suggested code as follows
let q format =
let z = p format // p is defined as suggested
printf z // Some work with formatted string
let z = q "something like %d" 123
And it doesn't work again:
let z = q "something like %d" 123;;
----------^^^^^^^^^^^^^^^^^^^
stdin(30,15): error FS0001: The type ''c -> string' is not compatible with the type 'Printf.TextWriterFormat<('a -> 'b)>'
How could I fix it?
For this to work, you need to use currying - your function p needs to take the format and return a function returned by one of the printf functions (which can then be a function taking one or more arguments).
This cannot be done using sprintf (because then you would have to propagate the arguments explicitly. However, you can use kprintf which takes a continuation as the first argument::
let p format = Printf.kprintf (fun s -> "This was: " + s) format
The continuation is called with the formatted string and so you can do whatever you need with the resulting string before returning.
EDIT: To answer your extended question, the trick is to put all the additional work into the continuation:
let q format =
let cont z =
// Some work with formatted string
printf "%s" z
Printf.kprintf cont format

Resources