F# check if a string contains only numbers

I am trying to figure out a nice way to check if a string contains only digits. This is the result of my effort, but it seems really verbose:
    open System
    let isDigit c = Char.IsDigit c
    let rec strContainsOnlyNumber (s:string) =
        let charList = List.ofSeq s
        match charList with
        | x :: xs ->
            if isDigit x then
                strContainsOnlyNumber (String.Concat (Array.ofList xs))
            else
                false
        | [] -> true
For example, it seems really ugly that I have to convert a string to a char list and then back to a string.
Can you figure out a better solution?

There are a few different options for approaching this.
Since System.String is already a sequence of characters (which is what lets you convert it to a list), you can skip the list conversions and use Seq.forall to test it directly:
    let strContainsOnlyNumber (s:string) = s |> Seq.forall Char.IsDigit
If you want to see if it's a valid number, you can parse it into a number directly:
    let strContainsOnlyNumber (s:string) = System.Int32.TryParse s |> fst
Note that this will also return true for things like "-342" (which contains -, but is a valid number).
Another approach would be to use a regular expression:
    let numberCheck = System.Text.RegularExpressions.Regex("^[0-9]+$")
    let strContainsOnlyNumbers (s:string) = numberCheck.IsMatch s
This matches only the ASCII digits 0-9 and requires at least one of them, but could be adapted to include other symbols in numbers if needed.
If the goal is to later use the string as a number, my suggestion would be to just do a conversion, and store in an option:
    let tryToInt (s:string) =
        match System.Int32.TryParse s with
        | true, v -> Some v
        | false, _ -> None
This will allow you to check to see if the value was a number (via Option.isSome), pattern match to use the results, and more.
Note that conversion to floating point numbers is nearly identical: just change Int32.TryParse to Double.TryParse if you want to handle float values.
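
To make the difference between the approaches above concrete, here is a minimal sketch that runs the digit check and an option-returning parse over a few inputs. The names onlyDigits and tryToIntInvariant are illustrative, and pinning the parse to CultureInfo.InvariantCulture is an assumption added here so that the result does not depend on machine settings; the answer above uses the single-argument TryParse.

```fsharp
open System
open System.Globalization

/// True when every character is a digit (vacuously true for the empty string).
let onlyDigits (s: string) = s |> Seq.forall Char.IsDigit

/// Variation on tryToInt that parses with the invariant culture (an assumption of this sketch).
let tryToIntInvariant (s: string) =
    match Int32.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture) with
    | true, v -> Some v
    | false, _ -> None

// Print both results side by side for a few sample inputs.
for input in [ "123"; "-342"; ""; "12a" ] do
    printfn "%A -> onlyDigits=%b, parsed=%A" input (onlyDigits input) (tryToIntInvariant input)
```

Note that the two checks disagree on "-342" (not all digits, but a valid integer) and on the empty string (all digits vacuously, but not a valid integer), so which one is right depends on what "only numbers" should mean.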

Related

How can I determine if a list of discriminated union types are of the same case?

Suppose I have a DU like so:
    type DU = Number of int | Word of string
And suppose I create a list of them:
    [Number(1); Word("abc"); Number(2)]
How can I write a function that returns true for a list of DUs where all the elements are the same case? For the above list it should return false.
The general approach I'd use here would be to map the union values into tags identifying the cases, and then check if the resulting set of tags has at most one element.
    let allTheSameCase (tagger: 'a -> int) (coll: #seq<'a>) =
        let cases =
            coll
            |> Seq.map tagger
            |> Set.ofSeq
        Set.count cases <= 1
For the tagger function, you can assign the tags by hand:
    allTheSameCase (function Number _ -> 0 | Word _ -> 1) lst
or use reflection (note that you might need to set binding flags as necessary):
    open Microsoft.FSharp.Reflection
    let reflectionTagger (case: obj) =
        let typ = case.GetType()
        if FSharpType.IsUnion(typ)
        then
            let info, _ = FSharpValue.GetUnionFields(case, typ)
            info.Tag
        else -1 // or fail, depending what makes sense in the context.
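
As a rough usage sketch of the tag-based check with hand-assigned tags, the following puts the pieces together for the DU from the question. The helper name duTag and the sample lists are made up for illustration.

```fsharp
type DU =
    | Number of int
    | Word of string

/// Maps every element to a tag and checks that at most one distinct tag occurs.
let allTheSameCase (tagger: 'a -> int) (coll: seq<'a>) =
    let cases = coll |> Seq.map tagger |> Set.ofSeq
    Set.count cases <= 1

// Hand-assigned tags, one per union case (hypothetical helper).
let duTag du =
    match du with
    | Number _ -> 0
    | Word _ -> 1

let mixed = [ Number 1; Word "abc"; Number 2 ]
let numbersOnly = [ Number 1; Number 2 ]

printfn "%b" (allTheSameCase duTag mixed)        // false
printfn "%b" (allTheSameCase duTag numbersOnly)  // true
printfn "%b" (allTheSameCase duTag [])           // true: an empty list has no conflicting cases
```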
In case you wanted to check that the elements of a list are of a specific union case, it's straightforward to provide a predicate function.
    let isNumbers = List.forall (function Number _ -> true | _ -> false)
If you do not care which union case, as long as they are all the same, you need to spell them all out explicitly. Barring reflection magic to get a property not exposed inside F#, you also need to assign some value to each case. To avoid having to think up arbitrary values, we can employ an active pattern which maps to a different DU behind the scenes.
    let (|IsNumber|IsWord|) = function
        | Number _ -> IsNumber
        | Word _ -> IsWord
    let isSameCase src =
        src |> Seq.groupBy (|IsNumber|IsWord|) |> Seq.length <= 1
I had the exact same use case recently, and the solution can be much simpler than reflection or explicit pattern matching; GetType does all the magic:
    let AreAllElementsOfTheSameType seq = // seq<'a> -> bool
        if Seq.isEmpty seq then true else
            let t = (Seq.head seq).GetType ()
            seq |> Seq.forall (fun e -> (e.GetType ()) = t)

Parsing int or float with FParsec

I'm trying to parse a file, using FParsec, which consists of either float or int values. I'm facing two problems that I can't find a good solution for.
1
Both pint32 and pfloat will successfully parse the same string but give different answers, e.g. pint32 will return 3 when parsing the string "3.0" and pfloat will return 3.0 when parsing the same string. Is it possible to try parsing a floating point value using pint32 and have it fail if the string is "3.0"?
In other words, is there a way to make the following code work:
    let parseFloatOrInt lines =
        let rec loop intvalues floatvalues lines =
            match lines with
            | [] -> floatvalues, intvalues
            | line::rest ->
                match run floatWs line with
                | Success (r, _, _) -> loop intvalues (r::floatvalues) rest
                | Failure _ ->
                    match run intWs line with
                    | Success (r, _, _) -> loop (r::intvalues) floatvalues rest
                    | Failure _ -> loop intvalues floatvalues rest
        loop [] [] lines
This piece of code will correctly place all floating point values in the floatvalues list, but because pfloat returns "3.0" when parsing the string "3", all integer values will also be placed in the floatvalues list.
2
The above code example seems a bit clumsy to me, so I'm guessing there must be a better way to do it. I considered combining them using choice, however both parsers must return the same type for that to work. I guess I could make a discriminated union with one option for float and one for int and convert the output from pint32 and pfloat using the |>> operator. However, I'm wondering if there is a better solution?
You're on the right path thinking about defining domain data and separating definition of parsers and their usage on source data. This seems to be a good approach, because as your real-life project grows further, you would probably need more data types.
Here's how I would write it:
    /// The resulting type, or DSL
    type MyData =
        | IntValue of int
        | FloatValue of float
        | Error // special case for all parse failures
    // Then, let's define individual parsers:
    let pMyInt =
        pint32
        |>> IntValue
    // this is an alternative version of float parser.
    // it ensures that the value has non-zero fractional part.
    // caveat: the naive approach would treat values like 42.0 as integer
    let pMyFloat =
        pfloat
        >>= (fun x -> if x % 1.0 = 0.0 then fail "Not a float" else preturn (FloatValue x))
    let pError =
        // this parser must consume some input,
        // otherwise combined with `many` it would hang in a dead loop
        skipAnyChar
        >>. preturn Error
    // Now, the combined parser:
    let pCombined =
        [ pMyFloat; pMyInt; pError ] // note, future parsers will be added here;
                                     // mind the order as float supersedes the int,
                                     // and Error must be the last
        |> List.map (fun p -> p .>> ws) // I'm too lazy to add whitespace skipping
                                        // into each individual parser
        |> List.map attempt // each parser is optional
        |> choice // on each iteration, one of the parsers must succeed
        |> many // a loop
Note that the code above is capable of working with any source: strings, streams, or whatever. Your real app may need to work with files, but unit testing can be simplified by using just a string list.
// Now, applying the parser somewhere in the code:
    let maybeParseResult =
        match run pCombined myStringData with
        | Success(result, _, _) -> Some result
        | Failure(_, _, _) -> None // or anything that indicates general parse failure
UPD. I have edited the code according to comments. pMyFloat was updated to ensure that the parsed value has non-zero fractional part.
FParsec has the numberLiteral parser that can be used to solve the problem.
As a start you can use the example available at the link above:
    open FParsec
    open FParsec.Primitives
    open FParsec.CharParsers
    type Number =
        | Int of int64
        | Float of float
    // -?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?
    let numberFormat =
        NumberLiteralOptions.AllowMinusSign
        ||| NumberLiteralOptions.AllowFraction
        ||| NumberLiteralOptions.AllowExponent
    let pnumber : Parser<Number, unit> =
        numberLiteral numberFormat "number"
        |>> fun nl ->
                if nl.IsInteger then Int (int64 nl.String)
                else Float (float nl.String)

How to add a condition that a parsed number must satisfy in FParsec?

I am trying to parse an int32 with FParsec, with the additional restriction that the number must be less than some maximum value. Is there a way to do this without writing my own custom parser (as below), and/or is my custom parser (below) the appropriate way of achieving the requirements?
I ask because most of the built-in library functions seem to revolve around a char satisfying certain predicates and not any other type.
    let pRow: Parser<int> =
        let error = messageError ("int parsed larger than maxRows")
        let mutable res = Reply(Error, error)
        fun stream ->
            let reply = pint32 stream
            if reply.Status = Ok && reply.Result <= 1000000 then
                res <- reply
            res
UPDATE
Below is an attempt at a more fitting FParsec solution based on the direction given in the comment below:
    let pRow2: Parser<int> =
        pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
Is this the correct way to do it?
You've done excellent research and almost answered your own question.
Generally, there are two approaches:
Unconditionally parse out an int and let later code check it for validity;
Use a guard rule bound to the parser. In this case (>>=) is the right tool.
To make a good choice, ask yourself whether an integer that fails the guard rule should get "another chance" by triggering another parser.
Here's what I mean. Usually, in real-life projects, parsers are combined in some chains. If one parser fails, the following one is attempted. For example, in this question, some programming language is parsed, so it needs something like:
    let pContent =
        pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
Theoretically, your DSL may need to differentiate a "small int value" from another type:
    /// The resulting type, or DSL
    type Output =
        | SmallValue of int
        | LargeValueAndString of int * string
        | Comment of string
    let pSmallValue =
        pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
        |>> SmallValue
    let pLargeValueAndString =
        pint32 .>> ws .>>. (manyTill ws)
        |>> LargeValueAndString
    let pComment =
        manyTill ws
        |>> Comment
    let pCombined =
        [ pSmallValue; pLargeValueAndString; pComment]
        |> List.map attempt // each parser is optional
        |> choice // on each iteration, one of the parsers must succeed
        |> many // a loop
Built this way, pCombined will return:
"42 ABC" gets parsed as [ SmallValue 42 ; Comment "ABC" ]
"1234567 ABC" gets parsed as [ LargeValueAndString(1234567, "ABC") ]
As we see, the guard rule impacts how the parsers are applied, so the guard rule has to be within the parsing process.
If, however, you don't need such complication (e.g., an int is parsed unconditionally), your first snippet is just fine.
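
For the guard-rule approach on its own, a minimal sketch of how the (>>=) version behaves when run against input might look like this. It assumes an F# script with the FParsec NuGet package available (hence the #r line); maxRows and tryRow are names invented for this sketch.

```fsharp
#r "nuget: FParsec"
open FParsec

// Upper bound taken from the pRow2 snippet in the question.
let maxRows = 1048576

/// Parses an int32 and rejects it inside the parser when it exceeds maxRows.
let pRow : Parser<int, unit> =
    pint32 >>= fun x ->
        if x <= maxRows then preturn x
        else fail "int parsed larger than maxRows"

/// Hypothetical helper: runs pRow and collapses the result to an option.
let tryRow (text: string) =
    match run pRow text with
    | Success (value, _, _) -> Some value
    | Failure _ -> None

printfn "%A" (tryRow "42")
printfn "%A" (tryRow "2000000")
```

Because pint32 has already consumed the digits by the time the guard fails, an alternative parser only gets a chance at the same input if the guarded parser is wrapped in attempt, which is what pCombined above does.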

stopword removal in F#

I am trying to write code to remove stopwords like "the" and "this" from a string list.
I wrote this code:
    let rec public stopword (a : string list, b :string list) =
        match [a.Head] with
        |["the"]|["this"] -> stopword (a.Tail, b)
        |[] -> b
        |_ -> stopword (a.Tail, b @ [a.Head])
I ran this in the interactive:
    stopword (["this";"is";"the"], []);;
I got this error:
    This expression was expected to have type string list but here has type 'a * 'b
Match expressions in F# are very powerful, although the syntax is confusing at first.
You need to match the list like so:
    let rec stopword a =
        match a with
        |"the"::t |"this"::t -> stopword t
        |h::t ->h::(stopword t)
        |[] -> []
The actual error is due to the function expecting a tuple argument. You would have to call the function with:
    let result = stopword (["this";"is";"the"], [])
Edit: since the original question was changed, the above answer is no longer valid. The logical error in the actual function is that it matches on the single-element list [a.Head], so the [] case never fires. The tail is taken until the list is empty, and on the next recursive call the function chokes trying to get the head of this empty list.
The function in itself is not correctly implemented though and much more complicated than necessary.
    let isNoStopword (word:string) =
        match word with
        | "the"|"this" -> false
        | _ -> true
    let removeStopword (a : string list) =
        a |> List.filter(isNoStopword)
    let test = removeStopword ["this";"is";"the"]
Others have mentioned the power of pattern matching in this case. In practice, you usually have a set of stopwords you want to remove. And the when guard allows us to pattern match quite naturally:
    let rec removeStopwords (stopwords: Set<string>) = function
        | x::xs when Set.contains x stopwords -> removeStopwords stopwords xs
        | x::xs -> x::(removeStopwords stopwords xs)
        | [] -> []
The problem with this function and @John's answer is that they are not tail-recursive. They can run out of stack on a long list that contains only a few stopwords. It's a good idea to use the higher-order functions in the List module, which are tail-recursive:
    let removeStopwords (stopwords: Set<string>) xs =
        xs |> List.filter (stopwords.Contains >> not)
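
If you would rather keep the explicit recursion with an accumulator, as the question's two-argument function tried to do, one possible tail-recursive sketch is shown below next to the List.filter form. The names removeStopwordsAcc and removeStopwordsFilter and the sample stopword set are assumptions made for the example.

```fsharp
// Example stopword set (illustrative only).
let stopwords = set [ "the"; "this" ]

/// Tail-recursive: kept words are consed onto `acc` and reversed once at the end.
let removeStopwordsAcc (stopwords: Set<string>) (words: string list) =
    let rec loop acc remaining =
        match remaining with
        | [] -> List.rev acc
        | w :: rest when Set.contains w stopwords -> loop acc rest
        | w :: rest -> loop (w :: acc) rest
    loop [] words

/// Same result using List.filter.
let removeStopwordsFilter (stopwords: Set<string>) (words: string list) =
    words |> List.filter (fun w -> not (Set.contains w stopwords))

printfn "%A" (removeStopwordsAcc stopwords [ "this"; "is"; "the"; "end" ])
printfn "%A" (removeStopwordsFilter stopwords [ "this"; "is"; "the"; "end" ])
```

Consing onto the accumulator and reversing once avoids the repeated list appends (b @ [a.Head]) of the original attempt, which copy the accumulated list on every step.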

How do I use Some/None Options in this F# example?

I am new to F# and I have this code:
    if s.Contains("-") then
        let x,y =
            match s.Split [|'-'|] with
            | [|a;b|] -> int a, int b
            | _ -> 0,0
Notice that we validate that there is a '-' in the string before we split the string, so the match is really unnecessary. Can I rewrite this with Options?
I changed this code, it was originally this (but I was getting a warning):
    if s.Contains("-") then
        let [|a;b|] = s.Split [|'-'|]
        let x,y = int a, int b
NOTE: I am splitting a range of numbers (range is expressed in a string) and then creating the integer values that represent the range's minimum and maximum.
The match is not unnecessary, the string might be "1-2-3" and you'll get a three-element array.
Quit trying to get rid of the match, it is your friend, not your enemy. :) Your enemy is the mistaken attempt at pre-validation (the "if contains" logic, which was wrong).
I think you may enjoy this two-part blog series.
http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!180.entry
http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!181.entry
EDIT
Regarding Some/None comment, yes, you can do
    let parseRange (s:string) =
        match s.Split [|'-'|] with
        | [|a;b|] -> Some(int a, int b)
        | _ -> None
    let Example s =
        match parseRange s with
        | Some(lo,hi) -> printfn "%d - %d" lo hi
        | None -> printfn "range was bad"
    Example "1-2"
    Example "1-2-3"
    Example "1"
where the return value of parseRange is Some (success) or None (failure), and the rest of the program can decide later what to do based on that.
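
One thing parseRange does not cover is a piece that is not a number at all: int a throws for input such as "a-b". A possible extension, sketched here under the assumption that such input should also map to None, replaces int with an option-returning parse. The names tryInt and tryParseRange are made up for the sketch, and parsing with the invariant culture is likewise an assumption.

```fsharp
open System
open System.Globalization

/// Option-returning integer parse (invariant culture is an assumption of this sketch).
let tryInt (s: string) =
    match Int32.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture) with
    | true, v -> Some v
    | false, _ -> None

/// Some (lo, hi) only when the string has exactly two '-'-separated integer parts.
let tryParseRange (s: string) =
    match s.Split [| '-' |] with
    | [| a; b |] ->
        match tryInt a, tryInt b with
        | Some lo, Some hi -> Some (lo, hi)
        | _ -> None
    | _ -> None

for input in [ "1-2"; "1-2-3"; "1"; "a-b" ] do
    printfn "%A -> %A" input (tryParseRange input)
```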

Resources