Haskell - Parsec with state - parsing

I have a file in which the state of a game is saved in String format. This string consists of a list of moves, seperated by ,. From this list of moves I have to reconstruct the game state. Thus, conceptually, for each move I parse I want to modify the gamestate appropriatly and pass this gamestate to the parsing of the next move. Conceptually, this could be equivalent with having an empty list at the start and for each move consing the parsed move to that list. At the end you should have a list with all parsed moves.
I made the code example below as a simplified version to parse alfabetic letters and push these on the list. The core concept I want to learn is how to have an initial state, pass this on for each parsing cycle and return the final state using parsec. someState is the empty list initially.
parseExample :: State -> Parser [Char]
parseExample someState = do spaces
c <- char
c : someState
return someState

The simplest way to incorporate a "state" into a parser is not to do it at all. Let's say we have a tic-tac-toe board:
data Piece = X | O | N deriving (Show)
type Board = [[Piece]]
To parse a list of moves:
X11,O00,X01
into a board [[O,X,N],[N,X,N],[N,N,N]] representing the game state:
O | X |
---+---+---
| X |
---+---+---
| |
we can separate the parser, which just generates a list of moves:
data Move = Move Piece Int Int
moves :: Parser [Move]
moves = sepBy move (char ',')
where move = Move <$> piece <*> num <*> num
piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
from the functions that regenerate the game state:
board0 :: Board
board0 = [[N,N,N],[N,N,N],[N,N,N]]
game :: [Move] -> Board
game = foldl' turn board0
turn :: Board -> Move -> Board
turn brd (Move p r c) = brd & ix r . ix c .~ p
and then connect them together in a loadGame function:
loadGame :: String -> Board
loadGame str =
case parse moves "" str of
Left err -> error $ "parse error: " ++ show err
Right mvs -> game mvs
This should be your go-to solution for this kind of problem: parse first into a simple stateless intermediate form, and then process this intermediate form in a "stateful" computation.
If you really want to build up the state during the parse, there are several ways to do it. In this particular case, given the definition of turn above, we can parse directly into a Board by incorporating the fold from the game function into the parser:
moves1 :: Parser Board
moves1 = foldl' turn board0 <$> sepBy move (char ',')
where move = Move <$> piece <*> num <*> num
piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
but this won't generalize too well if you have multiple parsers that need to operate on a single underlying state.
To actually thread a state through a set of parsers, you can use the "user state" feature of Parsec. Define a parser with a Board user state:
type Parser' = Parsec String Board
and then a parser for a single move that modifies the user state:
move' :: Parser' ()
move' = do
m <- Move <$> piece <*> num <*> num
modifyState (flip turn m)
where piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
Note that move''s return type is () because its action is implemented as a side-effect on the user state.
Now, the act of simply parsing a list of moves:
moves' :: Parser' ()
moves' = sepBy move' (char ',')
will generate the final game state:
loadGame' :: String -> Board
loadGame' str =
case runParser (moves' >> getState) [[N,N,N],[N,N,N],[N,N,N]] "" str of
Left err -> error $ "parse error: " ++ show err
Right brd -> brd
Here, loadGame' runs the parser on the user state with moves' and then uses a getState call to fetch the final board.
A nearly equivalent solution, since ParsecT is a monad transformer, is to create a ParsecT ... (State Board) monad transformer stack with a standard State layer. For example:
type Parser'' = ParsecT String () (Control.Monad.State.State Board)
move'' :: Parser'' ()
move'' = do
m <- Move <$> piece <*> num <*> num
modify (flip turn m)
where piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
moves'' :: Parser'' ()
moves'' = void $ sepBy move'' (char ',')
loadGame'' :: String -> Board
loadGame'' str =
case runState (runParserT moves'' () "" str) board0 of
(Left err, _) -> error $ "parse error: " ++ show err
(Right (), brd) -> brd
However, both these approaches of building up a state while parsing are weird and non-standard. A parser written in this form will be harder to understand and modify than the standard approach. Also, the intended usage for a user state is to maintain state that's necessary for the parser to decide how to perform the actual parse. For example, if you were parsing a language with dynamic operator precedence, you might want to maintain the current set of operator precedences as states, so when you parse an infixr 8 ** line, you can modify the state to correctly parse subsequent expressions. Using the user state to actually build up the result of the parse is not the intended usage.
Anyway, here's the code I used:
import Control.Lens
import Control.Monad
import Control.Monad.State
import Data.Foldable
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.String
data Piece = X | O | N deriving (Show)
type Board = [[Piece]]
data Move = Move Piece Int Int
-- *Standard parsing approach
moves :: Parser [Move]
moves = sepBy move (char ',')
where move = Move <$> piece <*> num <*> num
piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
board0 :: Board
board0 = [[N,N,N],[N,N,N],[N,N,N]]
game :: [Move] -> Board
game = foldl' turn board0
turn :: Board -> Move -> Board
turn brd (Move p r c) = brd & ix r . ix c .~ p
loadGame :: String -> Board
loadGame str =
case parse moves "" str of
Left err -> error $ "parse error: " ++ show err
Right mvs -> game mvs
-- *Incoporate fold into parser
moves1 :: Parser Board
moves1 = foldl' turn board0 <$> sepBy move (char ',')
where move = Move <$> piece <*> num <*> num
piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
-- *Non-standard effectful parser
type Parser' = Parsec String Board
move' :: Parser' ()
move' = do
m <- Move <$> piece <*> num <*> num
modifyState (flip turn m)
where piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
moves' :: Parser' ()
moves' = void $ sepBy move' (char ',')
loadGame' :: String -> Board
loadGame' str =
case runParser (moves' >> getState) board0 "" str of
Left err -> error $ "parse error: " ++ show err
Right brd -> brd
-- *Monad transformer stack
type Parser'' = ParsecT String () (Control.Monad.State.State Board)
move'' :: Parser'' ()
move'' = do
m <- Move <$> piece <*> num <*> num
modify (flip turn m)
where piece = X <$ char 'X' <|> O <$ char 'O'
num = read . (:[]) <$> digit
moves'' :: Parser'' ()
moves'' = void $ sepBy move'' (char ',')
loadGame'' :: String -> Board
loadGame'' str =
case runState (runParserT moves'' () "" str) board0 of
(Left err, _) -> error $ "parse error: " ++ show err
(Right (), brd) -> brd
-- *Tests
main = do
print $ loadGame "X11,O00,X01"
print $ loadGame' "X11,O00,X01"
print $ loadGame'' "X11,O00,X01"

You probably want to use foldl (if I am understanding your question correctly). So you'll end up with a function like:
module Main where
import Data.Text
import Data.String
main :: IO ()
main =
putStrLn (show $ parseGameState "a, b, c")
data State = State deriving (Show)
parseGameState :: String -> [State]
parseGameState stateString = parsedState where
parsedState = Prelude.foldl mkNewStateFromPreviousAndMove [] moves where
moves = splitOn (fromString ",") (fromString stateString)
mkNewStateFromPreviousAndMove oldStates move = oldStates ++ [newState previousState move] where
previousState = Prelude.last oldStates
newState previousState move = State
What this is doing is:
Taking the CSV move string as input.
It is then splitting that string into a list of move strings.
Then, we start with an empty list, and fold the move strings into this list by applying mkNewStateFromPreviousAndMove to each element in the moves list and the last element of the list which is built up by the fold.
Note, you will need to add the following deps to your package.yaml file (if using stack):
text
This dep is used to split strings.

Related

Adding error handling to function - Haskell

I am completely new to haskell and seen examples online of how to add error handling but I'm not sure how to incorporate it in my context. Below is an example of the code which works before trying to handle errors.
expr'::Parser Double
expr' = term' `chainl1'` addop
term'::Parser Double
term' = factor' `chainl1` mulop
chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> pure a
chainl1 ::Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
where
rest a = (do
f <- op
b <- p
rest (f a b)) <|> pure a
addop, mulop :: Parser (Double -> Double -> Double)
I've since expanded this to let addop and mulop return error messages if something irregular is found. This causes the function definition to change to:
addop, mulop :: Parser (Either String (Double -> Double -> Double))
In other programming languages I would check if f <- op is a String and return the string. However I'm not sure how to go about this in Haskell. The idea is that this error message returns all the way back to term'. Hence its function definition also needs to change eventually. This is all in the attempt to build a Monadic Parser.
If you're using parsec then you can make your code more general to work with the ParsecT monad transformer:
import Text.Parsec hiding (chainl1)
import Control.Monad.Trans.Class (lift)
expr' :: ParsecT String () (Either String) Double
expr' = term' `chainl1` addop
term' :: ParsecT String () (Either String) Double
term' = factor' `chainl1` mulop
factor' :: ParsecT String () (Either String) Double
factor' = read <$> many1 digit
chainl1 :: Monad m => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
chainl1 p op = p >>= rest
where
rest a = (do
f <- op
b <- p
rest (f a b))
<|> pure a
addop, mulop :: ParsecT String () (Either String) (Double -> Double -> Double)
addop = (+) <$ char '+' <|> (-) <$ char '-'
mulop = ((*) <$ char '*' <* lift (Left "error")) <|> (/) <$ char '/' <|> (**) <$ char '^'
I don't know what kind of errors you would want to return, so I've just made an error if an '*' is encountered in the input.
You can run the parser like this:
ghci> runParserT (expr' <* eof) () "buffer" "1+2+3"
Right (Right 6.0)
ghci> runParserT (expr' <* eof) () "buffer" "1+2*3"
Left "error"
The answer based on parsec implementation.
Actually the operator <|> is what you need. It handles any parsing errors. In expression a <|> b if the parser a fails then the parser b will be run (expect if the parser a consume some input before fails; for handle this case you can use combinator try like this: try a <|> b).
But if you want to handle error depending to the kind of error then you should do like #Noughtmare answered. But then I recomend you to do that:
Define your type for errors. It will be bugless to handle errors.
data MyError
= ME_DivByZero
| ...
You can simplify type signature if you define type alias for your parser.
type MyParser = ParsecT String () (Either MyError)
Then signatires will look like this:
expr' :: MyParser Double
addop, mulop :: MyParser (Double -> Double -> Double)
Use throwError to throw your errors and catchError to handle your errors, that will be more idiomatic. So it's look like this:
f <- catchError op $ \case
ME_DivByZero -> ...
ME_... -> ...
err -> throwError err -- rethrow error

Lazy parsing of large files with millions of data points

I have a parser I'm trying to write and I've gone over multiple versions of it and I can't seem to keep the memory usage down. I'm trying to parse the wikipedia sql dumps and in this example take the page entries file and throw them all in one giant vector (13 million pages). I got this working before, but I sharded the file into smaller pieces and aggregated them from the filesystem. Anyway, below is my attempt to parse these files in one pass. I'm pulling out the lines "INSERT INTO ... (data),(data)..(data);". I'm letting the other lines just fail in the parser and not yield values. The insert lines average 1MB in size and 8k entries.
I've tried what feels like a thousand versions of this with deepseqs and outputting lists of page entries per line instead of the vector I have now and using lineC with a partial parser instead of the linesUnboundedC. I assume there must be some high level concept I'm missing that is keeping the memory usage exponential. The file is 5GB and I'm easily blowing through 16GB of memory, but I can't seem to nail down a lazy thunk memory leak anywhere. I would assume that this code would be able to store each line independently and then move on without the extra memory to the next line, but I can't figure out what is wrong. I had no issue counting the amount of entries or using a more complex monoid to get data from the file using the same parser.
Any help is much appreciated!
{-# LANGUAGE OverloadedStrings #-}
module Parser.Helper where
import Conduit
import Control.DeepSeq
import Control.Monad (void)
import Control.Monad.Primitive
import Data.Attoparsec.Text
import Data.Maybe (catMaybes)
import Data.Text (Text)
import qualified Data.Vector as V
import Data.Vector.Generic (Vector)
import System.IO (openFile,IOMode(..))
data Page = Page !Int !Text !Bool
deriving Show
instance NFData Page where
rnf (Page pid title isRedirect) = pid `seq` title `seq` isRedirect `seq` ()
parseFile
:: (Foldable t, Vector v e)
=> FilePath -> Parser (t e) -> IO [v e]
parseFile fp parser =
do
handle <- openFile fp ReadMode
runConduit $ sourceHandle handle
.| decodeUtf8LenientC
.| peekForeverE linesUnboundedC
.| vectors parser
.| sinkList
vectors
:: (Foldable t, Vector v e, MonadBase base m, Control.Monad.Primitive.PrimMonad base)
=> Parser (t e) -> ConduitM Text (v e) m ()
vectors parser =
do
vectorBuilderC (1024*1024)
(\f ->
peekForeverE $ do
ml <- await
case ml of
Nothing -> return ()
Just l ->
case parseOnly parser l of
Left _ -> return ()
Right v -> mapM_ f v
)
parsePageLine :: Parser (V.Vector Page)
parsePageLine =
do
string "INSERT INTO" *> skipWhile (/= '(')
V.fromList . catMaybes <$> sepBy' parsePageField (char ',') <* char ';'
parsePageField :: Parser (Maybe Page)
parsePageField =
do
void $ char '('
pid <- parseInt <* char ','
namespace <- parseInt <* char ','
title <- parseTextField <* char ','
_ <- skipField <* char ','
_ <- skipField <* char ','
redirect <- parseInt <* char ','
void $ sepBy' skipField (char ',')
void $ char ')'
let ret = case namespace == 0 of
True -> Just $ Page pid title (redirect == 1)
False -> Nothing
return $ force ret
parseTextField :: Parser Text
parseTextField = char '\'' *> scan False f <* char '\''
where
f :: Bool -> Char -> Maybe Bool
f False '\'' = Nothing
f False '\\' = Just True
f _ _ = Just False
skipField :: Parser ()
skipField = void $ scan False f
where
f :: Bool-> Char -> Maybe Bool
f False '\\' = Just True
f False ',' = Nothing
f False ')' = Nothing
f _ _ = Just False
parseInt :: Parser Int
parseInt = signed $ decimal

Indentation-aware parsing of expression trees

As a follow-up to this question, I am now trying to parse an expression language that has variables and case ... of ... expressions. The syntax should be indentation-based:
Expressions can span multiple lines, as long as every line is indented relative to the first one; i.e. this should be parsed as a single application:
f x y
z
q
Each alternative of a case expression needs to be on its own line, indented relative to the case keyword. Right-hand sides can span multiple lines.
case E of
C -> x
D -> f x
y
should be parsed into a single case with two alternatives, with x and f x y as the right-hand sides
I've simplified my code into the following:
import qualified Text.Megaparsec.Lexer as L
import Text.Megaparsec hiding (space)
import Text.Megaparsec.Char hiding (space)
import Text.Megaparsec.String
import Control.Monad (void)
import Control.Applicative
data Term = Var String
| App [Term]
| Case Term [(String, Term)]
deriving Show
space :: Parser ()
space = L.space (void spaceChar) empty empty
name :: Parser String
name = try $ do
s <- some letterChar
if s `elem` ["case", "of"]
then fail $ unwords ["Unexpected: reserved word", show s]
else return s
term :: Parser () -> Parser Term
term sp = App <$> atom `sepBy1` try sp
where
atom = choice [ caseBlock
, Var <$> L.lexeme sp name
]
caseBlock = L.lineFold sp $ \sp' ->
Case <$>
(L.symbol sp "case" *> L.lexeme sp (term sp) <* L.symbol sp' "of") <*>
alt sp' `sepBy` try sp' <* sp
alt sp' = L.lineFold sp' $ \sp'' ->
(,) <$> L.lexeme sp' name <* L.symbol sp' "->" <*> term sp''
As you can see, I am trying to use the technique from this answer to separate alternatives with sp'aces that are more indented than the case keyword.
Problems
This seems to work for single expressions made up of application only:
λ» parseTest (L.lineFold space term) "x y\n z"
App [Var "x",Var "y",Var "z"]
It doesn't work for list of such expressions using the technique from the linked answer:
λ» parseTest (L.lineFold space $ \sp -> (term sp `sepBy` try sp)) "x\n y\nz"
3:1:
incorrect indentation (got 1, should be greater than 1)
case expressions fail out of the gate when trying to use line-folding:
λ» parseTest (L.lineFold space term) "case x of\n C -> y\n D -> z"
1:5:
Unexpected: reserved word "case"
case works without line folding for the outermost expression, for one alternative only:
λ» parseTest (term space) "case x of\n C -> y\n z"
App [Case (App [Var "x"]) [("C",App [Var "y",Var "z"])]]
But case fails as soon as I have multiple alternatives:
λ» parseTest (term space) "case x of\n C -> y\n D -> z"
3:2:
incorrect indentation (got 2, should be greater than 2)
What am I doing wrong?
I'm answering since I promised to take a look at this. This problem represents a rather difficult problem for Parsec-like parsers in their current state. I probably could make it work after spending much more time that I have available, but in the slot of time I can spend answering this, I only got this far:
module Main (main) where
import Control.Applicative
import Control.Monad (void)
import Text.Megaparsec
import Text.Megaparsec.String
import qualified Data.List.NonEmpty as NE
import qualified Text.Megaparsec.Lexer as L
data Term = Var String
| App [Term]
| Case Term [(String, Term)]
deriving Show
scn :: Parser ()
scn = L.space (void spaceChar) empty empty
sc :: Parser ()
sc = L.space (void $ oneOf " \t") empty empty
name :: Parser String
name = try $ do
s <- some letterChar
if s `elem` ["case", "of"]
then (unexpected . Label . NE.fromList) ("reserved word \"" ++ s ++ "\"")
else return s
manyTerms :: Parser [Term]
manyTerms = many pTerm
pTerm :: Parser Term
pTerm = caseBlock <|> app -- parse a term first
caseBlock :: Parser Term
caseBlock = L.indentBlock scn $ do
void (L.symbol sc "case")
t <- Var <$> L.lexeme sc name -- not sure what sort of syntax case of
-- case expressions should have, so simplified to vars for now
void (L.symbol sc "of")
return (L.IndentSome Nothing (return . Case t) alt)
alt :: Parser (String, Term)
alt = L.lineFold scn $ \sc' ->
(,) <$> L.lexeme sc' name <* L.symbol sc' "->" <*> pTerm -- (1)
app :: Parser Term
app = L.lineFold scn $ \sc' ->
App <$> ((Var <$> name) `sepBy1` try sc' <* scn)
-- simplified here, with some effort should be possible to go from Var to
-- more general Term in applications
Your original grammar is left-recursive because every term can be either a case expression or an application and if it's an application, then the first part of it again can be either case expression or application, etc. You'll need to deal with that somehow.
Here is a session:
λ> parseTest pTerm "x y\n z"
App [Var "x",Var "y",Var "z"]
λ> parseTest pTerm "x\n y\nz"
App [Var "x",Var "y"]
λ> parseTest manyTerms "x\n y\nz"
[App [Var "x",Var "y"],App [Var "z"]]
λ> parseTest pTerm "case x of\n C -> y\n D -> z"
Case (Var "x") [("C",App [Var "y"]),("D",App [Var "z"])]
λ> parseTest pTerm "case x of\n C -> y\n z"
3:3:
incorrect indentation (got 3, should be equal to 2)
This last result is because of (1) in the code. Introducing a parameter to app makes it impossible to use it without thinking of context (it would be no longer stand-alone expression, but factored-out part of something). We can see that if you indent z with respect to start of y application, not the entire alternative, it works:
λ> parseTest pTerm "case x of\n C -> y\n z"
Case (Var "x") [("C",App [Var "y",Var "z"])]
Finally, case expression works:
λ> parseTest pTerm "case x of\n C -> y\n D -> z"
Case (Var "x") [("C",App [Var "y"]),("D",App [Var "z"])]
My advice here would be to take a look at some pre-processor and use Megaparsec on top of that. The tools in Text.Megaparsec.Lexer are not that easy to apply in this case, but they are the best we could come up with and they work fine for simple indentation-sensitive grammars.

Parsing juxtaposition-based, indentation-aware syntax using Text.Parsec.Layout

I'm trying to parse a small language with Haskell-like syntax, using parsec-layout. The two key features that don't seem to interact too well with each other are:
Function application syntax is juxtaposition, i.e. if F and E are terms, F E is the syntax for F applied to E.
Indentation can be used to denote nesting, i.e. the following two are equivalent:
X = case Y of
A -> V
B -> W
X = case Y of A -> V; B -> W
I haven't managed to figure out a combination of skipping and keeping whitespace that would allow me to parse a list of such definitions. Here's my simplified code:
import Text.Parsec hiding (space, runP)
import Text.Parsec.Layout
import Control.Monad (void)
type Parser = Parsec String LayoutEnv
data Term = Var String
| App Term Term
| Case Term [(String, Term)]
deriving Show
name :: Parser String
name = spaced $ (:) <$> upper <*> many alphaNum
kw :: String -> Parser ()
kw = void . spaced . string
reserved :: String -> Parser ()
reserved s = try $ spaced $ string s >> notFollowedBy alphaNum
term :: Parser Term
term = foldl1 App <$> part `sepBy1` space
where
part = choice [ caseBlock
, Var <$> name
]
caseBlock = Case <$> (reserved "case" *> term <* reserved "of") <*> laidout alt
alt = (,) <$> (name <* kw "->") <*> term
binding :: Parser (String, Term)
binding = (,) <$> (name <* kw "=") <*> term
-- https://github.com/luqui/parsec-layout/issues/1
trim :: String -> String
trim = reverse . dropWhile (== '\n') . reverse
runP :: Parser a -> String -> Either ParseError a
runP p = runParser (p <* eof) defaultLayoutEnv "" . trim
If I try to run it on input like
s = unlines [ "A = case B of"
, " X -> Y Z"
, "C = D"
]
via runP (laidout binding) s, it fails on the application Y Z:
(line 2, column 10):
expecting space or semi-colon
However, if I change the definition of term to
term = foldl1 App <$> many1 part
then it doesn't stop parsing the term at the start of the (unindented!) third line, leading to
(line 3, column 4):
expecting semi-colon
I think the problem has to do with that name already eliminates the following space, so the sepBy1 in the definition of term doesn't see it.
Consider these simplified versions of term:
term0 = foldl1 App <$> (Var <$> name) `sepBy1` space
term1 = foldl1 App <$> (Var <$> name') `sepBy1` space
name' = (:) <$> upper <*> many alphaNum
term3 = foldl1 App <$> many (Var <$> name)
Then:
runP term0 "A B C" -- fails
runP term1 "A B C" -- succeeds
runP term3 "A B C" -- succeeds
I think part of the solution is to define
part = [ caseBlock, Var <$> name' ]
where name' is as above. However, there are still some issues.

Unplanned greedy behaviour in uu-parsinglib

The problem
I came across a problem today and I do not know how to solve it. It is very strange to me, because the code I've written should (according to my current knowledge) is correct.
So below you can find a sample parser combinators. The most important one is pOperator, which in very simple way (only for demonstration purposes) builds an operator AST.
It consumes "x" and can consume multiple "x" separated by whitespaces.
I've got also pParens combinator which is defined like:
pPacked pParenL (pWSpaces *> pParenR)
so it consumes Whitespaces before closing bracket.
Sample input / output
The correct input/output SHOULD be:
in: "(x)"
out: Single "x"
in: "(x )"
out: Single "x"
but I'm getting:
in: "(x)"
out: Single "x"
in: "(x )"
out: Multi (Single "x") (Single "x")
-- Correcting steps:
-- Inserted 'x' at position LineColPos 0 3 3 expecting one of ['\t', ' ', 'x']
but in the second example I'm getting error - and the parser behaves like it greedy eats some tokens (and there is no greedy operation).
I would be thankful for any help with it.
Sample code
import Prelude hiding(lex)
import Data.Char hiding (Space)
import qualified Text.ParserCombinators.UU as UU
import Text.ParserCombinators.UU hiding(parse)
import qualified Text.ParserCombinators.UU.Utils as Utils
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
data El = Multi El El
| Single String
deriving (Show)
---------- Example core grammar ----------
pElement = Single <$> pSyms "x"
pOperator = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)
---------- Basic combinators ----------
applyAll x (f:fs) = applyAll (f x) fs
applyAll x [] = x
pSpace = pSym ' '
pTab = pSym '\t'
pWSpace = pSpace <|> pTab
pWSpaces = pMany pWSpace
pWSpaces1 = pMany1 pWSpace
pMany1 p = (:) <$> p <*> pMany p
pSyms [] = pReturn []
pSyms (x : xs) = (:) <$> pSym x <*> pSyms xs
pParenL = Utils.lexeme $ pSym '('
pParenR = Utils.lexeme $ pSym ')'
pParens = pPacked pParenL (pWSpaces *> pParenR)
---------- Program ----------
pProgram = pParens pOperator
-- if you replace it with following line, it works:
-- pProgram = pParens pElement
-- so it seems like something in pOperator is greedy
tests = [ ("test", "(x)")
, ("test", "(x )")
]
---------- Helpers ----------
type Parser a = P (Str Char String LineColPos) a
parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)
main :: IO ()
main = do
mapM_ (\(desc, p) -> putStrLn ("\n=== " ++ desc ++ " ===") >> run pProgram p) tests
return ()
run :: Show t => Parser t -> String -> IO ()
run p inp = do let (a, errors) = parse p inp
putStrLn ("-- Result: \n" ++ show a)
if null errors then return ()
else do putStr ("-- Correcting steps: \n")
show_errors errors
putStrLn "-- "
where show_errors :: (Show a) => [a] -> IO ()
show_errors = sequence_ . (map (putStrLn . show))
IMPORTANT
pOperator = applyAll <$> pElement <*> pMany (flip <$> (Multi <$ pWSpaces1) <*> pElement)
is equivalent to:
foldr pChainl pElement (Multi <$ pWSpaces1)
according to: Combinator Parsing: A Short Tutorial
And it is used to define operator precedense.
The definition of pMany reads:
pMany :: IsParser p => p a -> p [a]
pMany p = pList p
and this suggest the solution. When seeing the space we should not commit immediately to the choice to continue with more x-es so we define:
pMany :: IsParser p => p a -> p [a]
pMany_ng p = pList_ng p
Of course you may also call pList_ng immediately. Even better would be to write:
pParens (pChainr_ng (pMulti <$ pWSpaces1) px) --
I did not test it since I am not sure whether between x-es there should be at least one space etc.
Doaitse

Resources