I have a data type
data Time = Time {hour :: Int,
                  minute :: Int
                 }
for which I have defined the Show instance as
instance Show Time where
  show (Time hour minute) = (if hour > 10
                             then (show hour)
                             else ("0" ++ show hour))
                            ++ ":" ++
                            (if minute > 10
                             then (show minute)
                             else ("0" ++ show minute))
which prints times in the format 07:09.
Now, there should be symmetry between Show and Read, so after reading (but, I think, not truly understanding) this and this, and reading the documentation, I have come up with the following code:
instance Read Time where
  readsPrec _ input =
    let hourPart   = takeWhile (/= ':')
        minutePart = tail . dropWhile (/= ':')
    in (\str -> [(newTime
                    (read (hourPart str) :: Int)
                    (read (minutePart str) :: Int), "")]) input
This works, but the "" part makes it seem wrong. So my question ends up being:
Can anyone explain to me the correct way to implement Read to parse "07:09" into newTime 7 9 and/or show me?
I'll use isDigit and keep your definition of Time.
import Data.Char (isDigit)
data Time = Time {hour :: Int,
                  minute :: Int
                 }
You used but didn't define newTime, so I wrote one myself so my code compiles!
newTime :: Int -> Int -> Time
newTime h m | between 0 23 h && between 0 59 m = Time h m
            | otherwise = error "newTime: hours must be in range 0-23 and minutes 0-59"
  where between low high val = low <= val && val <= high
Firstly, your show instance is a little wrong because show $ Time 10 10 gives "010:010"
instance Show Time where
  show (Time hour minute) = (if hour > 9 -- oops
                             then (show hour)
                             else ("0" ++ show hour))
                            ++ ":" ++
                            (if minute > 9 -- oops
                             then (show minute)
                             else ("0" ++ show minute))
Let's have a look at readsPrec:
*Main> :i readsPrec
class Read a where
readsPrec :: Int -> ReadS a
...
-- Defined in GHC.Read
*Main> :i ReadS
type ReadS a = String -> [(a, String)]
-- Defined in Text.ParserCombinators.ReadP
That's a parser - it should return the unmatched remaining string instead of just "", so you're right that the "" is wrong:
*Main> read "03:22" :: Time
03:22
*Main> read "[23:34,23:12,03:22]" :: [Time]
*** Exception: Prelude.read: no parse
It can't parse it because you threw away the ,23:12,03:22] in the first read.
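To see what "returning the unmatched remaining string" looks like in practice, here is a small sketch that uses the standard reads for Int. The function name hourAndMinute is made up for this illustration and is not part of the question's code; it only shows how each step hands its leftover input to the next one.

```haskell
-- Parse "<int>:<int>" from the front of a string and return whatever is left.
-- Each call to reads consumes one number and hands back the rest of the input.
hourAndMinute :: String -> [((Int, Int), String)]
hourAndMinute s =
  [ ((h, m), rest')
  | (h, ':' : rest) <- reads s     -- leading number, then a literal colon
  , (m, rest')      <- reads rest  -- second number; rest' is the unconsumed tail
  ]
```

For the input "23:34,23:12]" this yields [((23,34),",23:12]")], so a caller such as the list parser can carry on from the comma.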
Let's refactor that a bit to eat the input as we go along:
instance Read Time where
  readsPrec _ input =
    let (hours,rest1) = span isDigit input
        hour = read hours :: Int
        (c:rest2) = rest1
        (mins,rest3) = splitAt 2 rest2
        minute = read mins :: Int
    in
      if c==':' && all isDigit mins && length mins == 2 then -- it looks valid
        [(newTime hour minute,rest3)]
      else [] -- don't give any parse if it was invalid
Gives for example
*Main> read "[23:34,23:12,03:22]" :: [Time]
[23:34,23:12,03:22]
*Main> read "34:76" :: Time
*** Exception: Prelude.read: no parse
It does, however, allow "3:45" and interprets it as "03:45". I'm not sure that's a good idea, so perhaps we could add another test length hours == 2.
I'm going off all this split and span stuff if we're doing it this way, so maybe I'd prefer:
instance Read Time where
  readsPrec _ (h1:h2:':':m1:m2:therest) =
    let hour   = read [h1,h2] :: Int -- lazily doesn't get evaluated unless valid
        minute = read [m1,m2] :: Int
    in
      if all isDigit [h1,h2,m1,m2] then -- it looks valid
        [(newTime hour minute,therest)]
      else [] -- don't give any parse if it was invalid
  readsPrec _ _ = [] -- don't give any parse if it was invalid
Which actually seems cleaner and simpler to me.
This time it doesn't allow "3:45":
*Main> read "3:40" :: Time
*** Exception: Prelude.read: no parse
*Main> read "03:40" :: Time
03:40
*Main> read "[03:40,02:10]" :: [Time]
[03:40,02:10]
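One thing to keep in mind: a newTime that signals bad ranges with error only complains when the resulting value is forced, not while parsing. If out-of-range values should count as a failed parse instead, the range check can live inside the parser. The following is a minimal sketch using ReadP from base rather than hand-written pattern matching; the names twoDigits, timeP and parseTimePrefix are invented for the example, and Time gets a derived Show here just to keep the block short.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, count, pfail, readP_to_S, satisfy)

data Time = Time Int Int deriving (Eq, Show)

-- Exactly two decimal digits, so "3:45" is rejected.
twoDigits :: ReadP Int
twoDigits = read <$> count 2 (satisfy isDigit)

-- hh:mm with the range check done as part of parsing.
timeP :: ReadP Time
timeP = do
  h <- twoDigits
  _ <- char ':'
  m <- twoDigits
  if h <= 23 && m <= 59 then pure (Time h m) else pfail

-- Same shape as readsPrec without the precedence argument.
parseTimePrefix :: ReadS Time
parseTimePrefix = readP_to_S timeP
```

With this, parseTimePrefix "07:09]" gives [(Time 7 9,"]")], while "34:76" and "3:45" both give []. Plugging it into an instance would be readsPrec _ = parseTimePrefix.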
If the input to readsPrec is a string that contains some other characters after a valid representation of a Time, those other characters should be returned as the second element of the tuple.
So for the string 12:34 bla, the result should be [(newTime 12 34, " bla")]. Your implementation would cause an error for that input. This means that something like read "[12:34]" :: [Time] would fail because it would call Time's readsPrec with "12:34]" as the argument (because readList would consume the [, then call readsPrec with the remaining string, and then check that the remaining string returned by readsPrec is either ] or a comma followed by more elements).
To fix your readsPrec you should rename minutePart to something like afterColon and then split that into the actual minute part (with takeWhile isDigit for example) and whatever comes after the minute part. Then the stuff that came after the minute part should be returned as the second element of the tuple.
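A sketch of that fix, assuming the newTime helper from the answer above and a zero-padding Show instance (the pad helper and the derived Eq are additions for this example):

```haskell
import Data.Char (isDigit)

data Time = Time { hour :: Int, minute :: Int } deriving Eq

newTime :: Int -> Int -> Time
newTime h m
  | between 0 23 h && between 0 59 m = Time h m
  | otherwise = error "newTime: hours must be in range 0-23 and minutes 0-59"
  where between low high val = low <= val && val <= high

instance Show Time where
  show (Time h m) = pad h ++ ":" ++ pad m
    where pad n = (if n < 10 then "0" else "") ++ show n

instance Read Time where
  readsPrec _ input =
    let (hourPart, afterHour)  = span isDigit input
        afterColon             = drop 1 afterHour
        (minutePart, leftover) = span isDigit afterColon
    in [ (newTime (read hourPart) (read minutePart), leftover)  -- leftover, not ""
       | not (null hourPart)
       , take 1 afterHour == ":"
       , not (null minutePart) ]
```

Now readsPrec 0 "12:34 bla" returns the Time together with " bla", and read "[12:34,07:09]" :: [Time] works because the list parser gets to see the comma and the closing bracket.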
I'd like to parse all days from a text like this:
Ignore this
Also this
2019-09-05
More to ignore
2019-09-06
2019-09-07
Using Trifecta, I've defined a function to parse a day:
dayParser :: Parser Day
dayParser = do
  dayString <- tillEnd
  parseDay dayString
tillEnd :: Parser String
tillEnd = manyTill anyChar (try eof <|> eol)
parseDay :: String -> Parser Day
parseDay s = maybe failure return dayMaybe
  where
    dayMaybe = parseTime' dayFormat s
    failure = unexpected $ "Failed to parse date. Expected format: " ++ dayFormat
    -- %-m makes the parser accept months consisting of a single digit
    dayFormat = "%Y-%-m-%-d"
eol :: Parser ()
eol = char '\n' <|> char '\r' >> return ()
-- "%Y-%-m-%-d" for example
type TimeFormat = String
-- Given a time format and a string, parses the string to a time.
parseTime' :: (Monad m, ParseTime t) => TimeFormat -> String -> m t
-- True means that the parser tolerates whitespace before and after the date
parseTime' = parseTimeM True defaultTimeLocale
Parsing a day this way works. What I'm having trouble with is ignoring anything in the text that's not a day.
The following can't work since it assumes the number of text blocks that aren't a day:
daysParser :: Parser [Day]
daysParser = do
  -- Ignore everything that's not a day
  _ <- manyTill anyChar $ try dayParser
  days <- many $ token dayParser
  _ <- manyTill anyChar $ try dayParser
  -- There might be more days after this...
  return days
I reckon there's a straightforward way to express this with Trifecta but I can't seem to find it.
Here's the whole module including an example text to parse:
{-# LANGUAGE QuasiQuotes #-}
module DateParser where
import Text.RawString.QQ
import Data.Time
import Text.Trifecta
import Control.Applicative ( (<|>) )
-- "%Y-%-m-%-d" for example
type TimeFormat = String
dayParser :: Parser Day
dayParser = do
  dayString <- tillEnd
  parseDay dayString
tillEnd :: Parser String
tillEnd = manyTill anyChar (try eof <|> eol)
parseDay :: String -> Parser Day
parseDay s = maybe failure return dayMaybe
  where
    dayMaybe = parseTime' dayFormat s
    failure = unexpected $ "Failed to parse date. Expected format: " ++ dayFormat
    -- %-m makes the parser accept months consisting of a single digit
    dayFormat = "%Y-%-m-%-d"
eol :: Parser ()
eol = char '\n' <|> char '\r' >> return ()
-- Given a time format and a string, parses the string to a time.
parseTime' :: (Monad m, ParseTime t) => TimeFormat -> String -> m t
-- True means that the parser tolerates whitespace before and after the date
parseTime' = parseTimeM True defaultTimeLocale
daysParser :: Parser [Day]
daysParser = do
  -- Ignore everything that's not a day
  _ <- manyTill anyChar $ try dayParser
  days <- many $ token dayParser
  _ <- manyTill anyChar $ try dayParser
  -- There might be more days after this...
  return days
test = parseString daysParser mempty text1
text1 = [r|
Ignore this
Also this
2019-09-05
More to ignore
2019-09-06
2019-09-07|]
There are three large problems here.
First, the way you're defining dayParser, it's always trying to parse the rest of the text as a date. For example, if your input text is "2019-01-01 foo bar", then dayParser would first consume the whole string, so that dayString == "2019-01-01 foo bar", and then will try to parse that string as a date. Which, of course, would fail.
In order to have a saner behavior, you could only bite off the beginning of the string that kinda looks like a date and try to parse that, like:
dayParser =
  parseDay =<< many (digit <|> char '-')
This implementation bites off the beginning of the input consisting of digits and dashes, and tries to parse that as a date.
Note that this is a quick-n-dirty implementation. It is imprecise. For example, this implementation would accept input like "2019-01-0123456" and try to parse that as a date, and of course will fail. From your question, it is not clear whether you'd want to still parse 2019-01-01 and leave the rest, or whether you want to not consider that a proper date. If you wanted to be super-precise about this, you could specify the exact format as precisely as you want, e.g.:
dayParser = do
  y <- count 4 digit
  void $ char '-'
  m <- try (count 2 digit) <|> count 1 digit
  void $ char '-'
  d <- try (count 2 digit) <|> count 1 digit
  parseDay $ y ++ "-" ++ m ++ "-" ++ d
This implementation expects exactly the format of the date.
Second, there is a logical problem: your daysParser tries to first parse some garbage, then parse many days, and then parse some garbage again. This logic does not admit a case where the many dates have some garbage between them.
Third problem is much more tricky. You see, the way the try combinator works - if the parser fails, then try will roll back the input position, but if the parser succeeds, then the input remains consumed! This means that you cannot use try as a zero-consumption lookahead, the way you're trying to do in manyTill anyChar $ try dayParser. Such a parser will parse until it finds a date, and then it will consume the date, leaving nothing for the next parser and causing it to fail.
I will illustrate with a simpler example. Consider this:
> parseString (many (char 'a')) mempty "aaa"
Success "aaa"
Cool, it parses three 'a's. Now let's put an optional try at the beginning (a bare try (char 'b') *> ... would make the whole parser fail, since *> requires its left side to succeed):
> parseString (optional (try (char 'b')) *> many (char 'a')) mempty "aaa"
Success "aaa"
Awesome, this still works: the try fails without consuming anything, optional turns that failure into a success, and then we parse three 'a's as before.
Now let's change the try from 'b' to 'a':
> parseString (optional (try (char 'a')) *> many (char 'a')) mempty "aaa"
Success "aa"
Look what happened: the try has consumed the first 'a', leaving only two to be parsed by many.
We can even extend it to more fully resemble your approach:
> p = manyTill anyChar (try (char 'a')) *> many (char 'a')
> parseString p mempty "aaa"
Success "aa"
> parseString p mempty "cccaaa"
Success "aa"
See what happens? manyTill correctly skips all the 'c's up to the first 'a', but then it also consumes that first 'a'!
There appears to be no sane way (that I see) to have a zero-consumption lookahead like this. You always have to consume the first successful hit.
If I had this problem, I would probably resort to recursion: parsing chars one by one, at every step looking if I can get a day, and concatenating in a list. Something like this:
data WhatsThis = AChar Char | ADay Day | EOF
daysParser = do
  r <- (ADay <$> dayParser) <|> (AChar <$> anyChar) <|> (EOF <$ eof)
  case r of
    ADay d -> do
      rest <- daysParser
      pure $ d : rest
    AChar _ ->
      daysParser
    EOF ->
      pure []
It tries to parse a day, and if that fails, just skips a char, unless there are no more chars. If day parsing succeeded, it calls itself recursively, then prepends the day to the result of the recursive call.
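The same skip-or-parse recursion can be tried out without Trifecta. The sketch below swaps in ReadP from base (so it is not the Trifecta API), uses a plain (year, month, day) triple instead of Data.Time's Day, and does no calendar validation; the names YMD, ymd and allDates are invented for the illustration.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, count, eof, get, munch1, readP_to_S, satisfy, (<++))

-- Stand-in for Day: just the three numbers, unchecked.
type YMD = (Int, Int, Int)

ymd :: ReadP YMD
ymd = do
  y <- count 4 (satisfy isDigit)
  _ <- char '-'
  m <- munch1 isDigit
  _ <- char '-'
  d <- munch1 isDigit
  pure (read y, read m, read d)

-- Try a date at the current position; if that fails, drop one character and
-- retry; stop at end of input. (<++) only falls through when the left side fails.
allDates :: ReadP [YMD]
allDates =
  ((:) <$> ymd <*> allDates) <++ (get *> allDates) <++ ([] <$ eof)

extractDates :: String -> [YMD]
extractDates s = case readP_to_S allDates s of
  [(ds, "")] -> ds
  _          -> []
```

On the question's sample text, extractDates returns the three dates in order and ignores the other lines.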
Note that this approach is not very composable: it always consumes everything till the end of the input. If you want to compose it with something else, you may want to consider replacing eof with a parameter:
daysParser stop = do
  r <- (ADay <$> dayParser) <|> (AChar <$> anyChar) <|> (EOF <$ stop)
  ...
For an assignment on parsers I've been working on the following code in Haskell;
Imports;
import ParseLib.Simple
import Prelude hiding ((<*>), (<$>),(<$),(<*))
import Data.Char
import Data.Time.Calendar hiding (Day)
Data Types;
data DateTime = DateTime { date :: Date
                         , time :: Time
                         , utc :: Bool }
  deriving (Eq, Ord)
data TimeUTC = Eps | Z deriving Show
data Date = Date { year :: Year
                 , month :: Month
                 , day :: Day }
  deriving (Eq, Ord)
newtype Year = Year { unYear :: Int } deriving (Eq, Ord)
newtype Month = Month { unMonth :: Int } deriving (Eq, Ord)
newtype Day = Day { unDay :: Int } deriving (Eq, Ord)
data Time = Time { hour :: Hour
                 , minute :: Minute
                 , second :: Second }
  deriving (Eq, Ord)
newtype Hour = Hour { unHour :: Int } deriving (Eq, Ord)
newtype Minute = Minute { unMinute :: Int } deriving (Eq, Ord)
newtype Second = Second { unSecond :: Int } deriving (Eq, Ord)
data Result = SyntaxError | Invalid DateTime | Valid DateTime deriving (Eq, Ord)
instance Show DateTime where
show = printDateTime
instance Show Result where
show SyntaxError = "date/time with wrong syntax"
show (Invalid _) = "good syntax, but invalid date or time values"
show (Valid x) = "valid date: " ++ show x
With show instances for all the newtypes equal to this;
instance Show Year where
  show (Year a) = show a
The parsers look like this;
-- Exercise 1
parseDateTime :: Parser Char DateTime
parseDateTime = DateTime <$> parseDate <*> parseTime <*> parseUTC
parseDigits :: Int -> Parser Char Int
parseDigits n (xs) | length f == n = [(read f :: Int, drop n xs)]
                   | otherwise = []
  where f = take n xs
-- Time parsing
parseTime :: Parser Char Time
parseTime = Time <$> parseHour <*> parseMinute <*> parseSeconds
parseHour :: Parser Char Hour
parseHour xs | length (checkT) == 2 = [(Hour(read f :: Int), (drop 3 xs))]
             | length (checkT) == 3 = [(Hour(read f :: Int), (drop 2 xs))]
             | otherwise = []
  where f = take 3 xs
        checkT = if (head f) == 'T' then drop 1 f else f
parseMinute :: Parser Char Minute
parseMinute = Minute <$> (parseDigits 2)
parseSeconds :: Parser Char Second
parseSeconds = Second <$> (parseDigits 2)
-- Date parsing
parseDate :: Parser Char Date
parseDate = Date <$> parseYear <*> parseMonth <*> parseDay
parseYear :: Parser Char Year
parseYear = Year <$> (parseDigits 4)
parseMonth :: Parser Char Month
parseMonth = Month <$> (parseDigits 2)
parseDay :: Parser Char Day
parseDay = Day <$> (parseDigits 2)
-- UTC parsing
parseUTC :: Parser Char Bool
parseUTC = utcToBool <$> (option ((\a -> Z) <$> satisfy (=='Z')) Eps)
utcToBool :: TimeUTC -> Bool
utcToBool Eps = False
utcToBool Z = True
-- Exercise 2
isFinished :: [(a, [b])] -> Maybe a
isFinished [] = Nothing
isFinished ((result, rest):xs) | null rest = Just result
                               | otherwise = isFinished xs
run :: Parser a b -> [a] -> Maybe b
run p s = isFinished r
  where r = p s
-- Exercise 3
printDateTime :: DateTime -> String
printDateTime (DateTime (Date year month day)(Time hour minute second)timezone) = f year 4 ++ f month 2 ++ f day 2 ++ "T" ++ f hour 2 ++ f minute 2 ++ f second 2 ++ endOrZ timezone
  where f s a = take (a - length (show s)) (repeat '0') ++ show s
        endOrZ False = ""
        endOrZ True = "Z"
-- Exercise 4
parsePrint s = fmap printDateTime $ run parseDateTime s
So now my problem. After finishing exercise 4 we are supposed to test "parsePrint": if the string that goes in is correct it returns Just "inputstring", otherwise Nothing.
So I tried to test it with some strings that were given to us and I got the following error;
*Main> parsePrint "20111012T083945"
*** Exception: Prelude.read: no parse
Just "20111012T*Main>
Now I found some threads on here about the same error; these were mostly about missing quotes and one was about the Read instance. Yet I'm still not really able to figure out the problem.
I don't expect the flat-out answer, seeing it's for a school assignment, but if someone could point me in the right direction I would be really grateful, as I've been trying to figure it out for a couple of hours now.
-- Indents got a bit messed up when I copied the code here, but I don't think that would be the problem.
I'm starting to learn Haskell and wish to parse a PPM image as an exercise. The structure of the PPM format is rather simple, but it is tricky. It's described here. First of all, I defined a type for a PPM Image:
data Pixel = Pixel { red :: Int, green :: Int, blue :: Int} deriving(Show)
data BitmapFormat = TextualBitmap | BinaryBitmap deriving(Show)
data Header = Header { format :: BitmapFormat
                     , width :: Int
                     , height :: Int
                     , colorDepth :: Int} deriving(Show)
data PPM = PPM { header :: Header
               , bitmap :: [Pixel]
               }
bitmap should contain the entire image. This is where the first challenge comes - the part that contains the actual image data in PPM can be either textual or binary (described in the header).
For textual bitmaps I wrote the following function:
parseTextualBitmap :: String -> [Pixel]
parseTextualBitmap = map textualPixel . chunksOf 3 . wordsBy isSpace
  where textualPixel (r:g:b:[]) = Pixel (read r) (read g) (read b)
I'm not sure what to do with binary bitmaps, though. Using read converts a string representation of numbers to numbers. I want to convert "\x01" to 1 of type Int.
The second challenge is parsing the header. I wrote the following function:
parseHeader :: String -> Header
parseHeader = constructHeader . wordsBy isSpace . filterComments
  where
    filterComments = unlines . map (takeWhile (/= '#')) . lines
    formatFromText s
      | s == "P6" = BinaryBitmap
      | s == "P3" = TextualBitmap
    constructHeader (format:width:height:colorDepth:_) =
      Header (formatFromText format) (read width) (read height) (read colorDepth)
Which works pretty well. Now I should write the function exported by the module (let's call it parsePPM), which gets the entire file content (String) and returns a PPM. The function should call parseHeader, determine the bitmap format, call the appropriate parse(Textual|Binary)Bitmap and then construct a PPM with the result. Once parseHeader returns, I should start decoding the bitmap from the point where parseHeader stopped. However, I cannot know at which point of the string parseHeader stopped. The only solution I could think of is that instead of Header, parseHeader returns (Header,String), where the second element of the tuple is the remainder seen by constructHeader (which is currently matched as _). But I'm not really sure it's the "Haskell Way" of doing things.
To sum up my questions:
1. How do I decode the binary format into a list of Pixel
2. How can I know in which point the header ends
Since I'm learning Haskell by myself I have no one to actually review my code, so in addition to answering my questions I will appreciate any comment about the way I code (coding style, bugs, alternative ways to do things, etc...).
Let's start with question 2 because it is easier to answer. Your approach is correct: as you parse things, you remove those characters from the input string, and return a tuple containing the result of the parse and the remaining string. However, there is no reason to write all this from scratch (except perhaps as an academic exercise) - there are plenty of parsers which will take care of this issue for you. The one I will use is Parsec. If you are new to monadic parsing you should first read the section on Parsec in RWH.
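For reference, the hand-rolled "result plus remaining string" style looks roughly like this. It is a sketch using only base; the helper names token, number and dimensions are invented here, and comment handling is left out.

```haskell
import Data.Char (isDigit, isSpace)

-- One whitespace-delimited token, plus the input that follows it.
token :: String -> Maybe (String, String)
token s = case break isSpace (dropWhile isSpace s) of
  ("", _)     -> Nothing
  (tok, rest) -> Just (tok, rest)

-- A non-negative decimal number, plus the input that follows it.
number :: String -> Maybe (Int, String)
number s = do
  (tok, rest) <- token s
  if all isDigit tok then Just (read tok, rest) else Nothing

-- Width, height and colour depth; the returned String is where the
-- bitmap data starts.
dimensions :: String -> Maybe ((Int, Int, Int), String)
dimensions s0 = do
  (w, s1) <- number s0
  (h, s2) <- number s1
  (d, s3) <- number s2
  Just ((w, h, d), s3)
```

Each step threads the remaining input to the next by hand, which is exactly the bookkeeping a parser library does for you.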
As for question 1, if you use ByteString instead of String, then parsing single bytes is easy since single bytes are the atomic elements of ByteStrings!
There is also the issue of the Char/ByteString interface. With Parsec, this is a non-issue since you can treat a ByteString as a sequence of Byte or Char - we will see this later.
I decided to just write the full parser - this is a very simple language so with all the primitives defined for you in the Parsec library, it is very easy and very concise.
The file header:
import Text.Parsec.Combinator
import Text.Parsec.Char
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Pos
import Data.ByteString (ByteString, pack)
import qualified Data.ByteString.Char8 as C8
import Control.Monad (replicateM)
import Data.Monoid
import Data.List.Split (chunksOf) -- from the split package, used for grouping samples into pixels
First, we write the 'primitive' parsers - that is, parsing bytes, parsing textual numbers, and parsing whitespace (which the PPM format uses as a separator):
parseIntegral :: (Read a, Integral a) => Parser a
parseIntegral = fmap read (many1 digit)
digit parses a single digit - you'll notice that many function names explain what the parser does - and many1 will apply the given parser 1 or more times. Then we read the resulting string to return an actual number (as opposed to a string). In this case, the input ByteString is being treated as text.
parseByte :: Integral a => Parser a
parseByte = fmap (fromIntegral . fromEnum) $ tokenPrim show (\pos tok _ -> updatePosChar pos tok) Just
For this parser, we parse a single Char - which is really just a byte - and convert it to its numeric value with fromEnum. We could safely make the return type Parser Word8 because the universe of values that can be returned is [0..255].
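The Char-to-number step on its own, outside Parsec, can be sketched with ord from Data.Char. This assumes the String holds exactly one byte per Char (for example, data read in binary mode); toPixels and decodeBinary are names made up for the example.

```haskell
import Data.Char (ord)

data Pixel = Pixel { red :: Int, green :: Int, blue :: Int } deriving (Eq, Show)

-- Group a flat list of samples into pixels; an incomplete trailing
-- group is silently dropped.
toPixels :: [Int] -> [Pixel]
toPixels (r:g:b:rest) = Pixel r g b : toPixels rest
toPixels _            = []

-- "\x01" becomes 1, and so on; every three bytes form one pixel.
decodeBinary :: String -> [Pixel]
decodeBinary = toPixels . map ord
```

For instance, decodeBinary "\x00\x0f\x07\x0f\x00\x0f" gives [Pixel 0 15 7, Pixel 15 0 15] (shown with positional fields for brevity).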
whitespace1 :: Parser ()
whitespace1 = many1 (oneOf "\n ") >> return ()
oneOf takes a list of Char and parses any one of the characters in the order given - again, the ByteString is being treated as Text.
Now we can write the parser for the header.
parseHeader :: Parser Header
parseHeader = do
  f <- choice $ map try $
         [string "P3" >> return TextualBitmap
         ,string "P6" >> return BinaryBitmap]
  w <- whitespace1 >> parseIntegral
  h <- whitespace1 >> parseIntegral
  d <- whitespace1 >> parseIntegral
  return $ Header f w h d
A few notes. choice takes a list of parsers and tries them in order. try p takes the parser p, and 'remembers' the state before p starts parsing. If p succeeds, then try p == p. If p fails, then the state before p started is restored and you pretend you never tried p. This is necessary due to how choice behaves.
For the pixels, we have two choices as of now:
parseTextual :: Header -> Parser [Pixel]
parseTextual h = do
  xs <- replicateM (3 * width h * height h) (whitespace1 >> parseIntegral)
  return $ map (\[a,b,c] -> Pixel a b c) $ chunksOf 3 xs
We could use many1 (whitespace1 >> parseIntegral) - but this wouldn't enforce the fact that we know what the length should be. Then, converting the list of numbers to a list of pixels is trivial.
For binary data:
parseBinary :: Header -> Parser [Pixel]
parseBinary h = do
  whitespace1
  xs <- replicateM (3 * width h * height h) parseByte
  return $ map (\[a,b,c] -> Pixel a b c) $ chunksOf 3 xs
Note how the two are almost identical. You could probably generalize this function (it would be especially useful if you decided to parse the other types of pixel data - monochrome and greyscale).
Now to bring it all together:
parsePPM :: Parser PPM
parsePPM = do
  h <- parseHeader
  fmap (PPM h) $
    case format h of
      TextualBitmap -> parseTextual h
      BinaryBitmap -> parseBinary h
This should be self-explanatory. Parse the header, then parse the body based on the format. Here are some examples to try it on. They are the ones from the specification page.
example0 :: ByteString
example0 = C8.pack $ unlines
  ["P3"
  , "4 4"
  , "15"
  , " 0 0 0 0 0 0 0 0 0 15 0 15"
  , " 0 0 0 0 15 7 0 0 0 0 0 0"
  , " 0 0 0 0 0 0 0 15 7 0 0 0"
  , "15 0 15 0 0 0 0 0 0 0 0 0" ]
example1 :: ByteString
example1 = C8.pack ("P6 4 4 15 ") <>
  pack [0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 15, 0, 0, 0, 0, 15, 7,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 7, 0, 0, 0, 15,
        0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
Several notes: this doesn't handle comments, which are part of the spec. The error messages are not very useful; you can use the <?> function to create your own error messages. The spec also indicates 'The lines should not be longer than 70 characters.' - this is also not enforced.
edit:
Just because you see do-notation, doesn't necessarily mean that you are working with impure code. Some monads (like this parser) are still pure - they are just used for convenience. For example, you can write your parser with the type parser :: String -> (a, String), or, what we have done here, is we use a new type: data Parser a = Parser (String -> (a, String)) and have parser :: Parser a; we then write a monad instance for Parser to get the useful do-notation. To be clear, Parsec supports monadic parsing, but our parser is not monadic - or rather, uses the Identity monad, which is just newtype Identity a = Identity { runIdentity :: a }, and is only necessary because if we used type Identity a = a we would have 'overlapping instances' errors everywhere, which is not good.
>:i Parser
type Parser = Parsec ByteString ()
-- Defined in `Text.Parsec.ByteString'
>:i Parsec
type Parsec s u = ParsecT s u Data.Functor.Identity.Identity
-- Defined in `Text.Parsec.Prim'
So then, the type of Parser is really ParsecT ByteString () Identity. That is, the parser input is ByteString, the user state is () - which just means we aren't using the user state, and the monad in which we are parsing is Identity. ParsecT is itself just a newtype of:
forall b.
State s u
-> (a -> State s u -> ParseError -> m b)
-> (ParseError -> m b)
-> (a -> State s u -> ParseError -> m b)
-> (ParseError -> m b)
-> m b
All those functions in the middle are just used to pretty-print errors. If you are parsing 10's of thousands of characters and an error occurs, you won't be able to just look at it and see where that happened - but Parsec will tell you the line and column. If we specialize all the types to our Parser, and pretend that Identity is just type Identity a = a, then all the monads disappear and you can see that the parser is not impure. As you can see, Parsec is a lot more powerful than is required for this problem - I just used it due to familiarity, but if you were willing to write your own primitive functions like many and digit, then you could get away with using newtype Parser a = Parser (ByteString -> (a, ByteString)).
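To make that last point concrete, here is a minimal sketch of such a hand-written parser type with the instances that enable do-notation. It departs from the text in two labeled ways: it works on String rather than ByteString, and it wraps the result in Maybe so a parser can fail. The names many0 and number are invented for the example.

```haskell
import Data.Char (isDigit)

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (f a) rest  -- thread the leftover input

digit :: Parser Char
digit = Parser $ \s -> case s of
  (c:cs) | isDigit c -> Just (c, cs)
  _                  -> Nothing

-- One or more repetitions.
many1 :: Parser a -> Parser [a]
many1 p = do
  x  <- p
  xs <- many0 p
  pure (x : xs)

-- Zero or more repetitions: fall back to the empty list when p fails.
many0 :: Parser a -> Parser [a]
many0 p = Parser $ \s -> case runParser (many1 p) s of
  Nothing -> Just ([], s)
  r       -> r

number :: Parser Int
number = read <$> many1 digit
```

Here runParser number "42 rest" is Just (42," rest"), and nothing in it touches IO: the do-notation in many1 is only sugar over the function inside the newtype.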
I'm working on an instance of Read ComplexInt.
Here's what was given:
data ComplexInt = ComplexInt Int Int
deriving (Show)
and
module Parser (Parser,parser,runParser,satisfy,char,string,many,many1,(+++)) where
import Data.Char
import Control.Monad
import Control.Monad.State
type Parser = StateT String []
runParser :: Parser a -> String -> [(a,String)]
runParser = runStateT
parser :: (String -> [(a,String)]) -> Parser a
parser = StateT
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = parser $ \s -> case s of
  [] -> []
  a:as -> [(a,as) | f a]
char :: Char -> Parser Char
char = satisfy . (==)
alpha,digit :: Parser Char
alpha = satisfy isAlpha
digit = satisfy isDigit
string :: String -> Parser String
string = mapM char
infixr 5 +++
(+++) :: Parser a -> Parser a -> Parser a
(+++) = mplus
many, many1 :: Parser a -> Parser [a]
many p = return [] +++ many1 p
many1 p = liftM2 (:) p (many p)
Here's the given exercise:
"Use Parser to implement Read ComplexInt, where you can accept either the simple integer
syntax "12" for ComplexInt 12 0 or "(1,2)" for ComplexInt 1 2, and illustrate that read
works as expected (when its return type is specialized appropriately) on these examples.
Don't worry (yet) about the possibility of minus signs in the specification of natural
numbers."
Here's my attempt:
data ComplexInt = ComplexInt Int Int
deriving (Show)
instance Read ComplexInt where
readsPrec _ = runParser parseComplexInt
parseComplexInt :: Parser ComplexInt
parseComplexInt = do
  statestring <- getContents
  case statestring of
    if '(' `elem` statestring
      then do process1 statestring
      else do process2 statestring
  where
    process1 ststr = do
      number <- read(dropWhile (not(isDigit)) ststr) :: Int
      return ComplexInt number 0
    process2 ststr = do
      numbers <- dropWhile (not(isDigit)) ststr
      number1 <- read(takeWhile (not(isSpace)) numbers) :: Int
      number2 <- read(dropWhile (not(isSpace)) numbers) :: Int
      return ComplexInt number1 number2
Here's my error (my current error, as I'm sure there will be more once I sort this one out, but I'll take this one step at time):
Parse error in pattern: if ')' `elem` statestring then
do { process1 statestring }
else
do { process2 statestring }
I based my structure of the if-then-else statement on the structure used in this question: "parse error on input" in Haskell if-then-else conditional
I would appreciate any help with the if-then-else block as well as with the code in general, if you see any obvious errors.
Let's look at the code around the parse error.
case statestring of
  if '(' `elem` statestring
    then do process1 statestring
    else do process2 statestring
That's not how case works. It's supposed to be used like so:
case statestring of
  "foo" -> -- code for when statestring == "foo"
  'b':xs -> -- code for when statestring begins with 'b'
  _ -> -- code for none of the above
Since you're not making any sort of actual use of the case, just get rid of the case line entirely.
(Also, since they're only followed by a single statement each, the dos after then and else are superfluous.)
You stated you were given some functions to work with, but then didn't use them! Perhaps I misunderstood. Your code seems jumbled and doesn't seem to achieve what you would like it to. You have a call to getContents, which has type IO String, but that function is supposed to be in the Parser monad, not the IO monad.
If you actually would like to use them, here is how:
readAsTuple :: Parser ComplexInt
readAsTuple = do
  _ <- char '('
  x <- many digit
  _ <- char ','
  y <- many digit
  _ <- char ')'
  return $ ComplexInt (read x) (read y)
readAsNum :: Parser ComplexInt
readAsNum = do
  x <- many digit
  return $ ComplexInt (read x) 0
instance Read ComplexInt where
  readsPrec _ = runParser (readAsTuple +++ readAsNum)
This is fairly basic, as strings like " 42" (ones with spaces) will fail.
Usage:
> read "12" :: ComplexInt
ComplexInt 12 0
> read "(12,1)" :: ComplexInt
ComplexInt 12 1
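If you want to experiment with the same grammar without the exercise's Parser module, it can be sketched with ReadP from base. This is a stand-in, not the module you were given; natural and complexInt are names invented here, and munch1 is used so that an empty digit string is never handed to read.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, munch1, readP_to_S, (+++))

data ComplexInt = ComplexInt Int Int deriving (Eq, Show)

-- One or more digits, read as an Int.
natural :: ReadP Int
natural = read <$> munch1 isDigit

-- Either "(a,b)" or a bare "a", which stands for ComplexInt a 0.
complexInt :: ReadP ComplexInt
complexInt = pair +++ single
  where
    pair   = ComplexInt <$> (char '(' *> natural)
                        <*> (char ',' *> natural <* char ')')
    single = (\n -> ComplexInt n 0) <$> natural

instance Read ComplexInt where
  readsPrec _ = readP_to_S complexInt
```

As with the version above, read "12" gives ComplexInt 12 0 and read "(1,2)" gives ComplexInt 1 2.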
The Read type-class has a method called readsPrec; defining this method is sufficient to fully define the read instance for the type, and gives you the function read automatically.
What is readsPrec?
readsPrec :: Int -> String -> [(a, String)].
The first parameter is the precedence context; you can think of this as the precedence of the last thing that was parsed. This can range from 0 to 11. The default is 0. For simple parses like this you don't even use it. For more complex (ie recursive) datatypes, changing the precedence context may change the parse.
The second parameter is the input string.
The output is the list of possible parses, each paired with the string remaining after that parse terminates. For example:
>runStateT (char 'h') "hello world"
[('h',"ello world")]
Note that parsing is non-deterministic; every matching parse is returned.
>runStateT (many1 (char 'a')) "aa"
[("a","a"),("aa","")]
read only looks at the parses whose remaining string is empty (trailing whitespace is allowed). If exactly one parse of the form (x, "") is among the results, read returns x. If there is none, you get the error no parse, and if there is more than one, you get the error ambiguous parse.
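A small experiment that shows how read treats several candidate parses. The type Digits and its deliberately non-deterministic instance are invented for this illustration; readMaybe comes from Text.Read in base.

```haskell
import Data.Char (isDigit)
import Text.Read (readMaybe)

newtype Digits = Digits String deriving (Eq, Show)

-- Return every non-empty prefix of the leading digits as a candidate parse,
-- each with its own leftover input.
instance Read Digits where
  readsPrec _ s = [ (Digits (take n ds), drop n s) | n <- [1 .. length ds] ]
    where ds = takeWhile isDigit s
```

readsPrec 0 "123" produces three candidates, but only the last one leaves nothing behind, so read "123" :: Digits is Digits "123". For "12x" every candidate has leftover input, so readMaybe "12x" :: Maybe Digits is Nothing.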
I use n <- getLine to read a price from the user. How can I check whether the value is valid? (A price may contain digits and '.', and must be greater than 0.)
It doesn't work:
isFloat = do
  n <- getLine
  let val = case reads n of
              ((v,_):_) -> True
              _ -> False
If The Input Is Always Valid Or Exceptions Are OK
If you have users entering decimal numbers in the form of "123.456" then this can simply be converted to a Float or Double using read:
n <- getLine
let val = read n
Or in one line (having imported Control.Monad):
n <- liftM read getLine
To Catch Erroneous Input
The above code fails with an exception if the users enter invalid entries. If that's a problem then use reads and listToMaybe (from Data.Maybe):
n <- liftM (fmap fst . listToMaybe . reads) getLine
If that code looks complex then don't sweat it - the below is the same operation but doing all the work with explicit case statements:
n <- getLine
let val = case reads n of
            ((v,_):_) -> Just v
            _ -> Nothing
Notice we pattern match to get the first element of the tuple in the head of the list, The head of the list being (v,_) and the first element is v. The underscore (_) just means "ignore the value in this spot".
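Note that the ((v,_):_) pattern ignores whatever input is left over after the number. If leftover text should make the input invalid, and the price must also be positive as the question asks, one possible sketch uses readMaybe from Text.Read (validPrice is a name chosen for this example):

```haskell
import Text.Read (readMaybe)

-- Just the price when the whole string is a number greater than 0,
-- Nothing otherwise.
validPrice :: String -> Maybe Double
validPrice s = case readMaybe s of
  Just p | p > 0 -> Just p
  _              -> Nothing
```

Usage in the original setting would be along the lines of n <- getLine followed by case validPrice n of ...; "12.50" is accepted, while "12.5abc", "-3" and "0" are rejected.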
If Floating Point Isn't Acceptable
Floating values are well known to be approximate, and not suitable for real world financial computations (but perhaps homework, depending on your professor). In this case you'd want to read the values into a Rational (from Data.Ratio).
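n <- liftM maybeRational getLine
...
where
  maybeRational :: String -> Maybe Rational
  maybeRational str =
    let (a,b) = break (=='.') str
        frac = drop 1 b -- digits after the decimal point
        scale = 10 ^ length frac -- "123.456" becomes 123456 % 1000
    in liftM2 (\whole part -> (whole * scale + part) % scale) (readMaybe a) (readMaybe frac)
  readMaybe = fmap fst . listToMaybe . reads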
In addition to the parsing advice provided by TomMD, consider using the appropriate monad for error reporting. It allows you to conveniently chain computations which can fail, avoiding explicit error checking on every step.
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad (when)
import Control.Monad.Except (MonadError, throwError)
parsePrice :: MonadError String m => String -> m Double
parsePrice s = do
  x <- case reads s of
         [(x, "")] -> return x
         _ -> throwError "Not a valid real number."
  when (x <= 0) $ throwError "Price must be positive."
  return x
main = do
  n <- getLine
  case parsePrice n of
    Left err -> putStrLn err
    Right x -> putStrLn $ "Price is " ++ show x