I have a monadic parser that I'm implementing as an exercise. Its signature looks like this:
type Parser err src target = ExceptT err (State [src]) target
I've already implemented many basic helpers, but I've come across a use case where a negative lookahead is necessary. In particular, I think I'd like to make something of this signature:
notFollowedBy :: e -> Parser e s t -> Parser e s t' -> Parser e s t
notFollowedBy followedByError parser shouldFail = -- ...
My thought is that it can be used in context like this:
foo = letter `notFollowedBy'` digit
  where notFollowedBy' = notFollowedBy FollowedByDigitError
I'm struggling to implement notFollowedBy, though, for a variety of reasons:
I need a way to run shouldFail such that I can invert its ExceptT result (i.e. if it throws I want to catch it and do nothing, but if it doesn't throw I need to throw followedByError).
catchError won't do here (I don't think), and I couldn't figure out a way to use runExceptT to get the Either e t' from shouldFail.
Before I run shouldFail I need to save the state from StateT, because after running shouldFail I need to restore the state (as if this parser wasn't run). But I'm using the lazy StateT, so it's unclear whether I need to switch everything to the strict one just to allow for this case.
My best stab doesn't even compile, but it looks like this:
notFollowedBy :: (t' -> e) -> Parser e s t -> Parser e s t' -> Parser e s t
notFollowedBy onUnexpected parser shouldFail = do
  parsed <- parser
  state <- get -- This isn't strict
  result <- runExceptT shouldFail -- This doesn't typecheck
  case result of
    Left err -> put state >> return parsed
    Right t -> put state >> throwError $ onUnexpected t
(As an implementation note, the first parameter is actually (t' -> e), because I want to allow customizing the error thrown based on information returned by the second parser. But I don't think that matters for my question.)
The typecheck failure is due to it expecting ExceptT (ExceptT e (State [s])) t but getting Parser e s t (which is ExceptT e (State [s]) t).
After poring over the docs and reading some of the source for ExceptT and StateT, my best guess is that I need to emulate catchError (which matches on an Either) and then use liftCatch. This is my sloppy stab at that (which also doesn't compile):
notFollowedBy :: (t' -> e) -> Parser e s t -> Parser e s t' -> Parser e s t
notFollowedBy onUnexpected parser shouldFail = do
  state <- get
  result <- parser
  catchSuccess' result shouldFail (throwError . onUnexpected)
  put state
  return result
  where catchSuccess' result = liftCatch (catchSuccess result)
        catchSuccess r (Left l) _ = Right r
        catchSuccess _ (Right r) h = Left (h r)
The typechecker seems to be unhappy about a lot of things the second time around. In particular, it seems like liftCatch (from State.Lazy) is not what we want (because it expects catchSuccess' to return an ExceptT).
At this point I'm just flailing around making random permutations trying to appease the compiler. Can anyone offer any suggestions about how I could implement notFollowedBy?
edit: After consulting how I implemented option (below), it seems like the state ordering is not an issue (although why that is remains a mystery to me). So my primary issue is creating the reverse of catchError (catchSuccessAndSuppressError).
option :: Parser e s t -> Parser e s t -> Parser e s t
option parserA parserB = do
  state <- get
  parserA `catchError` \_ -> put state >> parserB
tl;dr
I'm trying to write a function with this signature that throws followedByError when shouldFail does not throw an exception (when run after running parser). The state upon return should be the same as the state after parser is run.
type Parser err src target = ExceptT err (State [src]) target
notFollowedBy :: e -> Parser e s t -> Parser e s t' -> Parser e s t
notFollowedBy followedByError parser shouldFail = -- ...
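One possible way to get the inverted result is to run shouldFail one layer down: runExceptT shouldFail is a plain State [s] action that returns Either e t', and lift brings that action back into the parser without throwing. The sketch below uses the transformers modules directly (with the mtl classes, the lift on get and put can be dropped). It takes the (t' -> e) variant of the first argument; satisfy and runParser are hypothetical helpers added only so the example can be run.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.State (State, get, put, runState)

type Parser err src target = ExceptT err (State [src]) target

-- Run `parser`, then probe with `shouldFail`. The probe's effect on the
-- input is always undone, and its outcome is inverted.
notFollowedBy :: (t' -> e) -> Parser e s t -> Parser e s t' -> Parser e s t
notFollowedBy onUnexpected parser shouldFail = do
  parsed <- parser
  saved  <- lift get                      -- input remaining after `parser`
  result <- lift (runExceptT shouldFail)  -- Either e t'; nothing is thrown here
  lift (put saved)                        -- rewind whatever the probe consumed
  case result of
    Left _  -> return parsed              -- probe failed: overall success
    Right t -> throwE (onUnexpected t)    -- probe succeeded: overall failure

-- Hypothetical primitive (not from the question): consume one matching token.
satisfy :: e -> (s -> Bool) -> Parser e s s
satisfy err p = do
  input <- lift get
  case input of
    (x : xs) | p x -> lift (put xs) >> return x
    _              -> throwE err

-- Hypothetical runner: returns the result and the remaining input.
runParser :: Parser e s t -> [s] -> (Either e t, [s])
runParser p = runState (runExceptT p)
```

With a letter parser and a digit parser built from satisfy, running notFollowedBy (const "followed by digit") letter digit on "ab" should give (Right 'a', "b"), and on "a1" it should give (Left "followed by digit", "1"). In both cases the remaining input is what letter left behind. Nothing in this sketch appears to require the strict StateT.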
Related
(I use trifecta parser lib). I'm trying to make a parser that parses integers into Right and literal sequences (alphabet, numeral symbols and "-" are allowed) into Left:
*Lib> parseString myParser mempty "123 qwe 123qwe 123-qwe-"
Success [Right 123,Left "qwe",Left "123qwe",Left "123-qwe-"]
That is what I invented:
myParser :: Parser [Either String Integer]
myParser = sepBy1 (try (Right . read <$> (some digit <* notFollowedBy (choice [letter, char '-'])))
                   <|> Left <$> some (choice [alphaNum, char '-']))
                  (char ' ')
My problem is that I don't understand why try is needed there (and in any other similar situations). When try is not used, an error appears:
*Lib> parseString myParser mempty "123 qwe 123qwe 123-qwe-"
Failure (ErrInfo {_errDoc = (interactive):1:12: error: expected: digit
1 | 123 qwe 123qwe 123-qwe-<EOF>
  |            ^ , _errDeltas = [Columns 11 11]})
So try puts the parsing cursor back to where we started on failure. Imagine try isn't used:
123qwe
   ^ failed there, the cursor position remains there
On the other hand, <|> is like "either". When the first parser fails, it should run the second parser, Left <$> some (choice [alphaNum, char '-']), and consume just "qwe".
Somewhere I'm wrong.
The second parser would indeed consume the "qwe" part if only it were given a chance to run. But it isn't given that chance.
Look at the definition of (<|>) for Parser:
Parser m <|> Parser n = Parser $ \ eo ee co ce d bs ->
  m eo (\e -> n (\a e' -> eo a (e <> e')) (\e' -> ee (e <> e')) co ce d bs) co ce d bs
Hmm... Maybe not such a good idea to look at that. But let's push through nevertheless. To make sense of all those eo, ee, etc., let's look at their explanations on the Parser definition:
The first four arguments are behavior continuations:
epsilon success: the parser has consumed no input and has a result as well as a possible Err; the position and chunk are unchanged (see pure)
epsilon failure: the parser has consumed no input and is failing with the given Err; the position and chunk are unchanged (see empty)
committed success: the parser has consumed input and is yielding the result, set of expected strings that would have permitted this parse to continue, new position, and residual chunk to the continuation.
committed failure: the parser has consumed input and is failing with a given ErrInfo (user-facing error message)
In your case we clearly have "committed failure" - i.e. the Right parser has consumed some input and failed. So in this case it's going to call the fourth continuation - denoted ce in the definition of (<|>).
And now look at the body of the definition: the fourth continuation is passed to parser m unchanged:
m eo (\e -> n (\a e' -> eo a (e <> e')) (\e' -> ee (e <> e')) co ce d bs) co ce d bs
                                                                             ^
                                                                             |
                                                                             here it is
This means that the parser returned from (<|>) will call the fourth continuation in all cases in which parser m calls it. Which means that it will fail with "committed failure" in all cases in which the parser m fails with "committed failure". Which is exactly what you observe.
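To see the same rule without the continuations, here is a deliberately tiny model. It is not trifecta's implementation, only a sketch of the "consumed vs. not consumed" distinction, and every name in it (Outcome, P, orElse, try', number, word) is made up for illustration. A failure carries a flag saying whether input was consumed. The choice operator only tries its right-hand side after a failure that consumed nothing, and try' turns a committed failure back into an epsilon failure.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit)

-- The Bool records whether the parser consumed input.
data Outcome a
  = Success Bool a String  -- consumed?, result, remaining input
  | Failure Bool           -- consumed? (True = "committed failure")
  deriving (Eq, Show)

newtype P a = P { runP :: String -> Outcome a }

-- Toy counterpart of (<|>): only an epsilon failure lets the alternative run.
orElse :: P a -> P a -> P a
orElse (P m) (P n) = P $ \s -> case m s of
  Failure False -> n s
  other         -> other   -- success, or committed failure passed through

-- Toy counterpart of try: pretend that a failing parser consumed nothing.
try' :: P a -> P a
try' (P m) = P $ \s -> case m s of
  Failure True -> Failure False
  other        -> other

-- Digits not followed by a letter or '-'; fails *after* consuming the digits.
number :: P String
number = P $ \s -> case span isDigit s of
  ([], _)                             -> Failure False
  (_, c : _) | isAlpha c || c == '-'  -> Failure True
  (ds, rest)                          -> Success True ds rest

-- One or more alphanumeric characters or '-'.
word :: P String
word = P $ \s -> case span (\c -> isAlphaNum c || c == '-') s of
  ([], _)   -> Failure False
  (w, rest) -> Success True w rest
```

In this model, runP (number `orElse` word) "123qwe" is Failure True, because word never runs, while runP (try' number `orElse` word) "123qwe" is Success True "123qwe" "".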
From Brent Yorgey's 2013 Penn class, after getting help on defining a Functor Parser, I'm attempting to make an Applicative Parser:
--p1 <*> p2 represents the parser which first runs p1 (which will
--consume some input and produce a function), then passes the
--remaining input to p2 (which consumes more input and produces
--some value), then returns the result of applying the function to the
--value
Here's my attempt:
instance Applicative (Parser) where
pure x = Parser $ \_ -> Just (x, [])
(Parser f) <*> (Parser g) = case (\ys -> f ys) of Nothing -> Parser Nothing
Just (_, xs) -> Parser $ g xs
However, I'm getting compile-time errors on the apply (<*>) definition.
Intuitively, I believe that using <*> achieves AND functionality.
If I have a parser for foo and a parser for bar, then I should be able to use apply <*> to say: foo followed by bar. In other words, input of foobar should successfully match, whereas foobip would not. It would fail on the second parser.
However, I believe that the types are:
Parser (a -> b) -> Parser a -> Parser b
So, that makes me think that my intuition is not entirely correct.
Please give me a tip to guide me towards understanding how to implement apply.
Your code is predicated on a misunderstanding of what a Parser is. Don't worry, virtually everybody makes this mistake.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Let's break down what this means.
String -> Maybe (a, String)
[1]       [2]   [3] [4]
[1]: I take a string and return Maybe (a, String)
[2]: I might not succeed in parsing the input into the desired datatype
[3]: The desired type I am parsing the String into
[4]: Remaining input after having consumed the amount of data required to parse a
A Parser is a function from text input to Maybe a tuple of a value and the rest of the text. A Parser is emphatically not a tuple; otherwise you wouldn't have a parser, just data in a tuple.
I'm not going to tell you how to implement <*> and nobody else should either as it would deprive you of the experience.
However, I'll give you pure so you understand the basic pattern:
pure a = Parser (\s -> Just (a, s))
See? It's a function of s -> Maybe (a, s). I intentionally mimicked the type variables in my terms to make it more obvious.
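To make the "it's a function" point concrete without giving away <*>, here is a small sketch that can be loaded into GHCi. The names pureP and satisfy are hypothetical. pureP has the same body as pure above, written as a standalone function so that no Applicative instance is needed, and satisfy is one example of a primitive that actually consumes input.

```haskell
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Same body as `pure` above; consumes nothing and hands back all of the input.
pureP :: a -> Parser a
pureP a = Parser (\s -> Just (a, s))

-- Hypothetical primitive: succeed on the first character if it satisfies
-- the predicate, returning it together with the rest of the input.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s -> case s of
  (c : rest) | p c -> Just (c, rest)
  _                -> Nothing
```

Here runParser (pureP 'x') "abc" is Just ('x', "abc"), runParser (satisfy (== 'f')) "foo" is Just ('f', "oo"), and runParser (satisfy (== 'f')) "bar" is Nothing. Notice that each result carries the leftover input, which is exactly what the next parser in a sequence needs.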
I want to parse some text in which certain fields have structure most of the time but occasionally (due to special casing, typos etc) this structure is missing.
E.g. Regular case is Cost: 5, but occasionally it will read Cost: 5m or Cost: 3 + 1 per ally, or some other random stuff.
In the case of the normal parser (p) not working, I'd like to fallback to a parser which just takes the whole line as a string.
To this end, I'd like to create a combinator of type Parser a -> Parser b -> Either a b. However, I cannot work out how to inspect the results of attempting to see if the first parser succeeds or not, without doing something like case parse p "" txt of ....
I can't see a built-in combinator, but I'm sure there's some easy way to solve this that I'm missing.
I think you want something like this
eitherParse :: Parser a -> Parser b -> Parser (Either a b)
eitherParse a b = fmap Left (try a) <|> fmap Right b
The try is just to ensure that if a consumes some input and then fails, you'll backtrack properly. Then you can use the normal methods for running a parser to yield an Either ParseError (Either a b),
which is quite easy to transform into your Either a b:
case parse p "" str of
  Right (Left a) -> useA a
  Right (Right b) -> useB b
  Left err -> handleParserError err
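Applied to the Cost example from the question, this might look roughly as follows. It is a sketch against Parsec's String parser; cost, rawLine and costField are hypothetical names, and it assumes each field is parsed on its own (hence the eof), which the question does not spell out.

```haskell
import Text.Parsec (digit, eof, many, many1, noneOf, parse, string, try, (<|>))
import Text.Parsec.String (Parser)

eitherParse :: Parser a -> Parser b -> Parser (Either a b)
eitherParse a b = fmap Left (try a) <|> fmap Right b

-- Structured case: "Cost: " followed by digits and nothing else.
cost :: Parser Int
cost = read <$> (string "Cost: " *> many1 digit <* eof)

-- Fallback: take the whole line as it is.
rawLine :: Parser String
rawLine = many (noneOf "\n")

costField :: Parser (Either Int String)
costField = eitherParse cost rawLine
```

parse costField "" "Cost: 5" should give Right (Left 5). For "Cost: 5m", cost consumes "Cost: 5" and then fails at eof, try rewinds to the start, and rawLine returns the whole text, so the result should be Right (Right "Cost: 5m").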
Try this: (<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
As a rule you could use it this way:
try p <|> q
I've been coding up an attoparsec parser and have been hitting a pattern where I want to turn parsers into recursive parsers (recursively combining them with the monad bind >>= operator).
So I created a function to turn a parser into a recursive parser as follows:
recursiveParser :: (a -> A.Parser a) -> a -> A.Parser a
recursiveParser parser a = (parser a >>= recursiveParser parser) <|> return a
Which is useful if you have a recursive data type like
data Expression = ConsExpr Expression Expression | EmptyExpr
parseRHS :: Expression -> Parser Expression
parseRHS e = ConsExpr e <$> parseFoo
parseExpression :: Parser Expression
parseExpression = parseLHS >>= recursiveParser parseRHS
  where parseLHS = parseRHS EmptyExpr
Is there a more idiomatic solution? It almost seems like recursiveParser should be some kind of fold... I also saw sepBy in the docs, but this method seems to suit me better for my application.
EDIT: Oh, actually now that I think about it should actually be something similar to fix... Don't know how I forgot about that.
EDIT2: Rotsor makes a good point with his alternative for my example, but I'm afraid my AST is actually a bit more complicated than that. It actually looks something more like this (although this is still simplified)
data Segment = Choice1 Expression
             | Choice2 Expression
data Expression = ConsExpr Segment Expression
                | Token String
                | EmptyExpr
where the string a -> b brackets to the right and c:d brackets to the left, with : binding more tightly than ->.
I.e. a -> b evaluates to
(ConsExpr (Choice1 (Token "a")) (Token "b"))
and c:d evaluates to
(ConsExpr (Choice2 (Token "d")) (Token "c"))
I suppose I could use foldl for the one and foldr for the other but there's still more complexity in there. Note that it's recursive in a slightly strange way, so "a:b:c -> e:f -> :g:h ->" is actually a valid string, but "-> a" and "b:" are not. In the end fix seemed simpler to me. I've renamed the recursive method like so:
fixParser :: (a -> A.Parser a) -> a -> A.Parser a
fixParser parser a = (parser a >>= fixParser parser) <|> pure a
Thanks.
Why not just parse a list and fold it into whatever you want later?
Maybe I am missing something, but this looks more natural to me:
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
parseExpression :: Parser Expression
parseExpression = consChain <$> many1 parseFoo
And it's shorter too.
As you can see, consChain is now independent from parsing and can be useful somewhere else. Also, if you separate out the result folding, the somewhat unintuitive recursive parsing simplifies down to many or many1 in this case.
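Because consChain is an ordinary function, it can be tried in GHCi without any parser at all. In the sketch below, the Atom constructor is not part of the original type; it is a hypothetical addition made only so that the list elements can be told apart in the output.

```haskell
data Expression
  = ConsExpr Expression Expression
  | EmptyExpr
  | Atom String  -- hypothetical, for illustration only
  deriving (Eq, Show)

-- Left fold: each new element becomes the right child of a new ConsExpr.
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
```

consChain [Atom "x", Atom "y"] evaluates to ConsExpr (ConsExpr EmptyExpr (Atom "x")) (Atom "y"), which is the same left-nested shape that recursiveParser parseRHS builds starting from parseRHS EmptyExpr.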
You may want to take a look at how many is implemented too:
many :: (Alternative f) => f a -> f [a]
many v = many_v
  where many_v = some_v <|> pure []
        some_v = (:) <$> v <*> many_v
It has a lot in common with your recursiveParser:
some_v is similar to parser a >>= recursiveParser parser
many_v is similar to recursiveParser parser
You may ask why I called your recursive parser function unintuitive. This is because this pattern allows parser argument to affect the parsing behaviour (a -> A.Parser a, remember?), which may be useful, but not obviously (I don't see a use case for this yet). The fact that your example does not use this feature makes it look redundant.
As part of the 4th exercise here
I would like to use a reads type function such as readHex with a parsec Parser.
To do this I have written a function:
liftReadsToParse :: Parser String -> (String -> [(a, String)]) -> Parser a
liftReadsToParse p f = p >>= \s -> if null (f s) then fail "No parse" else (return . fst . head ) (f s)
Which can be used, for example in GHCI, like this:
*Main Numeric> parse (liftReadsToParse (many1 hexDigit) readHex) "" "a1"
Right 161
Can anyone suggest any improvement to this approach with regard to:
Will the term (f s) be memoised, or evaluated twice in the case of a null (f s) returning False?
Handling multiple successful parses, i.e. when length (f s) is greater than one, I do not know how parsec deals with this.
Handling the remainder of the parse, i.e. (snd . head) (f s).
This is a nice idea. A more natural approach that would make
your ReadS parser fit in better with Parsec would be to
leave off the Parser String at the beginning of the type:
liftReadS :: ReadS a -> String -> Parser a
liftReadS reader = maybe (unexpected "no parse") (return . fst) .
                   listToMaybe . filter (null . snd) . reader
This "combinator" style is very idiomatic Haskell - once you
get used to it, it makes function definitions much easier
to read and understand.
You would then use liftReadS like this in the simple case:
> parse (many1 hexDigit >>= liftReadS readHex) "" "a1"
(Note that listToMaybe is in the Data.Maybe module.)
In more complex cases, liftReadS is easy to use inside any
Parsec do block.
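For instance, a do block that reads a hexadecimal literal with a "0x" prefix could look like this. It is a sketch; hexLiteral is a made-up name and the "0x" prefix is an assumption, not something from the exercise.

```haskell
import Data.Maybe (listToMaybe)
import Numeric (readHex)
import Text.Parsec (hexDigit, many1, parse, string, unexpected)
import Text.Parsec.String (Parser)

liftReadS :: ReadS a -> String -> Parser a
liftReadS reader = maybe (unexpected "no parse") (return . fst) .
                   listToMaybe . filter (null . snd) . reader

-- Hypothetical example: "0x" followed by one or more hex digits.
hexLiteral :: Parser Integer
hexLiteral = do
  _      <- string "0x"
  digits <- many1 hexDigit
  liftReadS readHex digits  -- hand the collected digits to the ReadS function
```

With these definitions, parse hexLiteral "" "0x1f" should give Right 31.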
Regarding some of your other questions:
The function reader is applied only once now, so there is nothing to "memoize".
It is common and accepted practice to ignore all except the first parse in a ReadS parser in most cases, so you're fine.
To answer the first part of your question: no, (f s) will not be memoised; you would have to do that manually:
liftReadsToParse p f = p >>= \s -> let fs = f s in if null fs then fail "No parse"
                                                   else (return . fst . head ) fs
But I'd use pattern matching instead:
liftReadsToParse p f = p >>= \s -> case f s of
  [] -> fail "No parse"
  (answer, _) : _ -> return answer