How to parse a string of integers only in some range with Parsec?

I'm trying to learn Parsec by parsing a date string of format "YYYYMMDD", for example "20161030". And my solution is:
import Control.Monad (replicateM)
import Text.Parsec
import Text.Parsec.String (Parser)

date :: Parser (String, String, String)
date = do
    year <- replicateM 4 digit
    month <- replicateM 2 digit
    day <- replicateM 2 digit
    return (year, month, day)
But the problem is that my code also accepts "20161356" as a valid date.
How can I validate that "MM" is between 1 and 12 and "DD" is between 1 and 31?

You could add a guard as suggested by Thomas M. DuBuisson:
date :: Parser (String, String, String)
date = do
    year <- replicateM 4 digit
    month <- replicateM 2 digit
    day <- replicateM 2 digit
    guard $ read month > 0 && read month <= 12 && read day > 0 && read day <= 31
    return (year, month, day)
However, this results in a bad error message:
λ> parse date "" "20161356"
Left (line 1, column 9):unknown parse error
We can fix this by combining guard with <?> to provide a better error message:
date :: Parser (String, String, String)
date = do
    year <- replicateM 4 digit
    month <- replicateM 2 digit
    guard (read month > 0 && read month <= 12) <?> "valid month (1–12)"
    day <- replicateM 2 digit
    guard (read day > 0 && read day <= 31) <?> "valid day (1–31)"
    return (year, month, day)
With this approach, you get a more useful error message:
λ> parse date "" "20161356"
Left (line 1, column 7):
expecting valid month (1–12)
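If you also want the fields as numbers rather than strings, one option is to fold the digit parsing and the range check into a single reusable parser. This is a sketch assuming the parsec package; `boundedNumber` is a hypothetical helper name, not part of Parsec, and it reports errors with `fail` instead of `guard`/`<?>`, so its message text differs from the output shown above.

```haskell
import Control.Monad (replicateM)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper (not part of Parsec): parse exactly n digits and
-- fail with a descriptive message unless the number lies in [lo, hi].
boundedNumber :: Int -> Int -> Int -> String -> Parser Int
boundedNumber n lo hi what = do
  ds <- replicateM n digit
  let value = read ds
  if value >= lo && value <= hi
    then return value
    else fail (what ++ " must be between " ++ show lo ++ " and " ++ show hi)

-- The date parser from the question, now returning numbers.
date :: Parser (Int, Int, Int)
date = do
  year  <- boundedNumber 4 0 9999 "year"
  month <- boundedNumber 2 1 12 "month"
  day   <- boundedNumber 2 1 31 "day"
  return (year, month, day)
```

Because the check runs right after the month digits are consumed, the error is still reported at the month's position, just as with the `guard` version.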
As a side note, I think it is valuable to validate (or at least sanity-check) the date in the parser: it ensures that the date validation composes with the rest of your parser and error-handling code. You can't forget to check the date later in your code, and the error is reported at the right position in the input, which is very useful if you're parsing documents with lots of dates.
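Along the same lines, a 1 to 31 day check still accepts impossible dates such as "20160230". A sketch that delegates the final check to the time package's `fromGregorianValid` (which returns `Nothing` for dates that do not exist) might look like this; `fixedDigits` and `dateDay` are illustrative names, and parsec and time are assumed to be available.

```haskell
import Control.Monad (replicateM)
import Data.Time.Calendar (Day, fromGregorianValid)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Illustrative helper: read a fixed-width run of digits as a number.
fixedDigits :: Read a => Int -> Parser a
fixedDigits n = read <$> replicateM n digit

-- Parse "YYYYMMDD" into a calendar Day. fromGregorianValid returns
-- Nothing for impossible dates, so "20161356" and "20160230" both fail.
dateDay :: Parser Day
dateDay = do
  year  <- fixedDigits 4
  month <- fixedDigits 2
  day   <- fixedDigits 2
  maybe (fail "not a valid calendar date") return
        (fromGregorianValid year month day)
```

This trades the per-field error messages for a single "not a valid calendar date" error, but it also handles month lengths and leap years.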

Related

Issues Creating Records with FsCheck

This question is a follow-up to an earlier question on using FsCheck to generate records. The original question was answered with a well-composed example solution. However, before the answer was shared, I attempted to create a generator, which is included below. Unfortunately, the generated records of type QueryRequest = {Symbol: string; StartDate: DateTime; EndDate: DateTime} have the following issues:
Missing symbol
Start dates earlier than January 1, 2000
End dates later than January 1, 2019
Original:
{ Symbol = ""
StartDate = 8/9/2057 4:07:10 AM
EndDate = 10/14/2013 6:15:32 PM }
Shrunk:
{ Symbol = ""
StartDate = 8/9/2057 12:00:00 AM
EndDate = 10/14/2013 12:00:00 AM }
Since I am still in the process of becoming familiar with F#, I would appreciate suggestions/feedback on: how to address the aforementioned issues, and opportunities to improve the code in terms of structure, composition, etc.
namespace Parser
module DataGenerators =
    open System
    open FsCheck
    type QueryRequest = {Symbol: string; StartDate: DateTime; EndDate: DateTime}
    type Tweet =
        static member GenerateRecords (year, month, day, symbol) =
            try
                let startDate = DateTime (year, month, day)
                let endDate = startDate.AddDays 1.0
                Some {Symbol = symbol; StartDate = startDate; EndDate = endDate}
            with
            | :? ArgumentOutOfRangeException -> None
        static member Combine (years: int list) (months: int list) (days: int list) (symbols: string list) =
            let rec loop acc years months days symbols =
                match years, months, days, symbols with
                | [], [], [], [] -> acc
                | year :: years, month :: months, day :: days, symbol :: symbols -> loop ((year, month, day, symbol) :: acc) years months days symbols
                | _, _, _, _ -> acc
            loop [] years months days symbols
        static member Generate () =
            let years = Gen.choose (2000, 2019) |> Gen.sample 0 10
            let months = Gen.choose (1, 12) |> Gen.sample 0 10
            let days = Gen.choose(1, 31) |> Gen.sample 0 10
            let symbols = Gen.elements ["ORCL"; "IBM"; "AAPL"; "GOOGL"] |> Gen.sample 0 10
            Tweet.Combine years months days symbols
            |> List.map Tweet.GenerateRecords
            |> List.fold (fun acc r -> match r with Some q -> q :: acc | None -> acc) []
I cannot reproduce your issue; the following yields true over thousands of executions:
Tweet.Generate()
|> List.forall (fun q ->
    q.StartDate <= q.EndDate &&
    q.StartDate >= DateTime(2000, 1, 1) &&
    q.EndDate <= DateTime(2019, 12, 31) &&
    ["ORCL"; "IBM"; "AAPL"; "GOOGL"] |> List.contains q.Symbol)
However, you can simplify Tweet like so:
type Tweet =
    static member GenerateRecords ((year, month, day), symbol) =
        try
            let startDate = DateTime (year, month, day)
            let endDate = startDate.AddDays 1.0
            Some {Symbol = symbol; StartDate = startDate; EndDate = endDate}
        with
        | :? ArgumentOutOfRangeException -> None
    static member Generate () =
        let years = Gen.choose (2000, 2019) |> Gen.sample 0 10
        let months = Gen.choose (1, 12) |> Gen.sample 0 10
        let days = Gen.choose(1, 31) |> Gen.sample 0 10
        let symbols = Gen.elements ["ORCL"; "IBM"; "AAPL"; "GOOGL"] |> Gen.sample 0 10
        let dates = List.zip3 years months days
        List.zip dates symbols
        |> List.choose Tweet.GenerateRecords

FSharp.Data: Transform multiple columns to a single column (dictionary result)

I am using FSharp.Data to transform HTML table data like this:
type RawResults = HtmlProvider<url>
let results = RawResults.Load(url).Tables
for row in results.Table1.Rows do
    printfn " %A " row
Example output:
("Model: Generic", "Submit Date: July 22, 2016")
("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min.")
~~~ hundreds of more rows ~~~~
I am trying to split those "two column"-based elements into a single column sequence to eventually get to a dictionary result.
Desired dictionary key:value result:
["Model", Generic]
["Submit Date", July 22, 2016]
["Gene", "Sequencing Failed"]
~~~~
How can you iter (or split?) the two columns (Column1 & Column2) to pipe both of those individual columns to produce a dictionary result?
let summaryDict =
results.Table1.Rows
|> Seq.skip 1
|> Seq.iter (fun x -> x.Column1 ......
|> ....
Use the built-in string API to split over the :. I usually prefer to wrap String.Split in curried form:
let split (separator : string) (s : string) = s.Split (separator.ToCharArray ())
Additionally, while not required, when working with two-element tuples, I often find it useful to define a helper module with functions related to this particular data structure. You can put various functions in such a module (e.g. curry, uncurry, swap, etcetera), but in this case, a single function is all you need:
module Tuple2 =
    let mapBoth f g (x, y) = f x, g y
With these building blocks, you can easily split each tuple element over :, as shown in this FSI session:
> [
    ("Model: Generic", "Submit Date: July 22, 2016")
    ("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min.") ]
  |> List.map (Tuple2.mapBoth (split ":") (split ":"));;
val it : (string [] * string []) list =
[([|"Model"; " Generic"|], [|"Submit Date"; " July 22, 2016"|]);
([|"Gene"; " Sequencing Failed"|], [|"Exectime"; " 5 hrs. 21 min."|])]
At this point, you still need to strip leading whitespace, as well as convert the arrays into your desired format, but I trust you can take it from here (otherwise, please ask).

Basic f# error - pattern matching is implying the wrong type

The following code takes two parameters. The first is a list of triples; the triple (d,m,y) is meant to represent a date.
The second is an integer representing a month.
The code is meant to count the number of dates in the list that fall in that month.
P.S. I guess this probably looks like a homework question, but it's not. It's from a course I did earlier in the year in ML, and I'm trying to redo all the exercises in F#. So it's only for my benefit.
let rec number_in_month (dates : (int * int * int) list, month) =
    match dates with
    | [] -> 0
    | (_,y,_) when month = y -> 1 + number_in_month(dates.Tail, month)
    | _ -> number_in_month(dates.Tail, month)
but it gives the error :
This expression was expected to have type
(int * int * int) list but here has type
'a * 'b * 'c
Any idea what I'm doing wrong?
Your second pattern match is trying to match a single date (_,y,_) but it is being matched against your list of dates. Try matching using (_,y,_)::_ instead.
More idiomatic would be to match using (_,y,_)::tail and to use tail instead of dates.Tail later in the expression.
The code can also be tightened up (including the fix MarkP suggested). Note the use of type inference, so the type of dates does not need to be annotated:
let rec number_in_month dates month =
    match dates with
    | [] -> 0
    | (_,y,_)::tail ->
        (number_in_month tail month) + (if y = month then 1 else 0)
let data = [(1,2,3);(1,2,3);(1,5,7);(1,9,2);(1,9,2);(1,9,2)]
number_in_month data 5
number_in_month data 2
http://www.tryfsharp.org/create/bradgonesurfing/datefinder.fsx

Making a Read instance in Haskell

I have a data type
data Time = Time {hour :: Int,
minute :: Int
}
for which i have defined the instance of Show as being
instance Show Time where
    show (Time hour minute) = (if hour > 10
                                 then (show hour)
                                 else ("0" ++ show hour))
                              ++ ":" ++
                              (if minute > 10
                                 then (show minute)
                                 else ("0" ++ show minute))
which prints out times in the format 07:09.
Now, there should be symmetry between Show and Read, so after reading (but not truly, I think, understanding) this and this, and reading the documentation, I have come up with the following code:
instance Read Time where
    readsPrec _ input =
        let hourPart = takeWhile (/= ':')
            minutePart = tail . dropWhile (/= ':')
        in (\str -> [(newTime
                        (read (hourPart str) :: Int)
                        (read (minutePart str) :: Int), "")]) input
This works, but the "" part makes it seem wrong. So my question ends up being:
Can anyone explain to me the correct way to implement Read to parse "07:09" into newTime 7 9 and/or show me?
I'll use isDigit and keep your definition of Time.
import Data.Char (isDigit)
data Time = Time {hour :: Int,
minute :: Int
}
You used but didn't define newTime, so I wrote one myself so my code compiles!
newTime :: Int -> Int -> Time
newTime h m | between 0 23 h && between 0 59 m = Time h m
            | otherwise = error "newTime: hours must be in range 0-23 and minutes 0-59"
  where between low high val = low <= val && val <= high
Firstly, your Show instance is slightly wrong, because show $ Time 10 10 gives "010:010":
instance Show Time where
    show (Time hour minute) = (if hour > 9 -- oops
                                 then (show hour)
                                 else ("0" ++ show hour))
                              ++ ":" ++
                              (if minute > 9 -- oops
                                 then (show minute)
                                 else ("0" ++ show minute))
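As an aside, `Text.Printf` from base can do the zero-padding for you, which sidesteps the 9-versus-10 boundary entirely. A minimal sketch, assuming the same `Time` type as the question:

```haskell
import Text.Printf (printf)

data Time = Time { hour :: Int, minute :: Int }

-- "%02d" pads each field to at least two digits with leading zeros,
-- so 7 becomes "07" and 10 stays "10".
instance Show Time where
  show (Time h m) = printf "%02d:%02d" h m
```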
Let's have a look at readsPrec:
*Main> :i readsPrec
class Read a where
readsPrec :: Int -> ReadS a
...
-- Defined in GHC.Read
*Main> :i ReadS
type ReadS a = String -> [(a, String)]
-- Defined in Text.ParserCombinators.ReadP
That's a parser - it should return the unmatched remaining string instead of just "", so you're right that the "" is wrong:
*Main> read "03:22" :: Time
03:22
*Main> read "[23:34,23:12,03:22]" :: [Time]
*** Exception: Prelude.read: no parse
It can't parse it because you threw away the ,23:12,03:22] in the first read.
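For comparison, the standard instances follow this contract: `reads` hands back whatever input it did not consume, which is exactly what lets list parsing carry on after each element. A small check using only the Prelude:

```haskell
-- reads returns each successful parse paired with the input
-- that was left over after it.
intWithRest :: [(Int, String)]
intWithRest = reads "42 rest"

listWithRest :: [([Int], String)]
listWithRest = reads "[1,2] tail"
```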
Let's refactor that a bit to eat the input as we go along:
instance Read Time where
    readsPrec _ input =
        let (hours,rest1) = span isDigit input
        in case rest1 of
            (':':rest2) ->
                let (mins,rest3) = splitAt 2 rest2
                    hour = read hours :: Int
                    minute = read mins :: Int
                in
                    if not (null hours) && all isDigit mins && length mins == 2
                       && hour <= 23 && minute <= 59 then -- it looks valid
                        [(newTime hour minute,rest3)]
                    else [] -- don't give any parse if it was invalid
            _ -> [] -- no colon after the hours, so no parse
This gives, for example:
*Main> read "[23:34,23:12,03:22]" :: [Time]
[23:34,23:12,03:22]
*Main> read "34:76" :: Time
*** Exception: Prelude.read: no parse
It does, however, allow "3:45" and interprets it as "03:45". I'm not sure that's a good idea, so perhaps we could add another test length hours == 2.
I'm going off all this split and span stuff if we're doing it this way, so maybe I'd prefer:
instance Read Time where
    readsPrec _ (h1:h2:':':m1:m2:therest) =
        let hour = read [h1,h2] :: Int -- lazily doesn't get evaluated unless the digits are valid
            minute = read [m1,m2] :: Int
        in
            if all isDigit [h1,h2,m1,m2] && hour <= 23 && minute <= 59 then -- it looks valid
                [(newTime hour minute,therest)]
            else [] -- don't give any parse if it was invalid
    readsPrec _ _ = [] -- don't give any parse if it was invalid
Which actually seems cleaner and simpler to me.
This time it doesn't allow "3:45":
*Main> read "3:40" :: Time
*** Exception: Prelude.read: no parse
*Main> read "03:40" :: Time
03:40
*Main> read "[03:40,02:10]" :: [Time]
[03:40,02:10]
If the input to readsPrec is a string that contains some other characters after a valid representation of a Time, those other characters should be returned as the second element of the tuple.
So for the string 12:34 bla, the result should be [(newTime 12 34, " bla")]. Your implementation would cause an error for that input. This means that something like read "[12:34]" :: [Time] would fail because it would call Time's readsPrec with "12:34]" as the argument (because readList would consume the [, then call readsPrec with the remaining string, and then check that the remaining string returned by readsPrec is either ] or a comma followed by more elements).
To fix your readsPrec you should rename minutePart to something like afterColon and then split that into the actual minute part (with takeWhile isDigit for example) and whatever comes after the minute part. Then the stuff that came after the minute part should be returned as the second element of the tuple.
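Another option, not used in either answer above, is to write the parser with `Text.ParserCombinators.ReadP` from base and convert it with `readP_to_S`; the remainder bookkeeping described above then happens automatically. This is a sketch assuming the `Time` type from the question (with `Eq` derived for testing); it folds the range check into the parser instead of calling `newTime`, and `twoDigits` and `timeP` are illustrative names.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, count, pfail, readP_to_S, satisfy, skipSpaces)

data Time = Time { hour :: Int, minute :: Int } deriving Eq

instance Show Time where
  show (Time h m) = pad h ++ ":" ++ pad m
    where pad n = if n < 10 then '0' : show n else show n

-- Exactly two decimal digits, converted to an Int.
twoDigits :: ReadP Int
twoDigits = read <$> count 2 (satisfy isDigit)

-- "HH:MM" with range checks. Leading spaces are skipped so that
-- "[03:40, 02:10]" also parses; nothing after the minutes is consumed,
-- so readP_to_S returns it as the remainder.
timeP :: ReadP Time
timeP = do
  skipSpaces
  h <- twoDigits
  _ <- char ':'
  m <- twoDigits
  if h <= 23 && m <= 59 then return (Time h m) else pfail

instance Read Time where
  readsPrec _ = readP_to_S timeP
```

With this instance, reading "12:34 bla" yields the time together with the remainder " bla", as the answer above describes.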

Parsing on dates with F#

Is there a 'date parser' library that does for dates what FParsec does for strings?
That is, you specify rules, and it matches dates against them to recognize the supplied patterns.
Conversely, are there any libraries that generate dates based on some parsing rules?
The idea would be to give the user 'real-time' completion to guide them toward a valid future FParsec match.
(Does this problem of generative parsing have a name in the secluded parsing circles?)
You can define a simple domain-specific language (DSL) to express these kinds of rules. The type corresponding to your "parser" is actually just a function that takes a date and returns a Boolean:
open System

type DateClassifier = DC of (DateTime -> bool)
You can easily define some simple functions:
// Succeeds when the date is a Wednesday
let wednesday = DC (fun dt -> dt.DayOfWeek = DayOfWeek.Wednesday)
// Succeeds if the date is after the specified limit
let after limit = DC (fun dt -> dt > limit)
// Succeeds if the day of the month is the specified value
let day d = DC (fun dt -> dt.Day = d)
// Takes two date classifiers and succeeds if either of them succeeds
let (<|>) (DC f) (DC g) = DC (fun dt -> f dt || g dt)
// Takes two date classifiers and succeeds if both of them succeed
let (<&>) (DC f) (DC g) = DC (fun dt -> f dt && g dt)
To specify your condition ("the next Wednesday after the 5th of the month"), you'll need a helper that builds a classifier that succeeds on any day after the 5th. This can be done as follows (it is a bit inefficient, but it is composed from the existing primitives, which is nice):
let afterDay d =
    [ for n in d + 1 .. 31 -> day n ] |> Seq.reduce (<|>)
Your specification (or "parser"), which succeeds for any Wednesday after the current date that falls after the 5th of its month, is then:
after DateTime.Now <&> wednesday <&> afterDay 5
