Parse Within Parse - Attoparsec [duplicate] - parsing

given the following type and function, meant to parse a field of a CSV field into a string:
type Parser resultType = ParsecT String () Identity resultType
cell :: Parser String
I have implemented the following function:
customCell :: String -> Parser res -> Parser res
customCell typeName subparser =
cell
>>= either (const $ unexpected typeName)
return . parse (subparser <* eof) ""
Though I cannot stop thinking that I am not using the Monad concept as much as desired and that eventually there is a better way to merge the result of the inner with the outer parser, specially on what regards its failure.
Does anybody know how could I do so, or is this code what is meant to be done?
PS - I now realised that my type simplification is probably not appropriate and that maybe what I want is to replace the underlying Identity Monad by the Either Monad.... Unfortunately, I do not feel enough acquainted with monad transformers yet.
PS2 - What the hell is the underlying monad good for anyway?

Elaborating on #Daniel Wagner's answer... The way parsers are normally built with Parsec, you start with low-level parsers that parse specific characters (e.g., a plus sign or a digit), and you build parsers on top of them using combinators (like a many1 combinator that turns a parser that reads a single digit into a parser that reads one or more digits, or a monadic parse that parsers "one or more digits" followed by a "plus sign" followed by "one or more digits").
However, each parser, whether it's a low-level digit parser or a higher-level "addition expression" parser, is intended to be applied directly to the same input stream.
What you don't typically do is write a parser that gobbles a chunk of the input stream to produce, say, a String and another parser that parses that String (instead of the original input stream) and try to combine them. This is the kind of "vertical composition" that isn't directly supported by Parsec and looks unnatural and non-monadic.
As pointed out in the comments, there are some situations where vertical composition is the cleanest overall approach (like when you have one language embedded within the components or expressions of another language), but it's not the usual approach taken by a Parsec parser.
The bottom line in your application is that a cell parser that produces only a String is too specialized to be useful. A more useful Parsec framework for CSV files would be:
import Text.Parsec
import Text.Parsec.String
-- | `csv cell` parses a CSV file each of whose elements is parsed by `cell`
csv :: Parser a -> Parser [[a]]
csv cell = many (row cell)
-- | `row cell` parses a newline-terminated row of comma separated
-- `cell`-expressions
row :: Parser a -> Parser [a]
row cell = sepBy cell (char ',') <* char '\n'
Now, you can write a custom cell parser that parses positive integers:
customCell :: Parser Int
customCell = read <$> many1 digit
and parse CSV files:
> parse (csv customCell) "" "1,2,3\n4,5,6\n"
Right [[1,2,3],[4,5,6]]
>
Here, instead of having a cell subparser that explicitly parses a comma-delimited cell into a string to be fed to a different parser, the "cell" is an implicit context in which a supplied cell parser is called to parse the underlying input stream at the appropriate point where one would expect a comma-delimited cell in the middle of a row in the middle of the input stream.

Sadly I know of no parser library or parser generator for Haskell that supports vertical parser composition like this. Something like what you wrote is about as good as it gets. Dang!

Related

Generate parser that runs a received parser on the output of another parser and monadically joins the results

given the following type and function, meant to parse a field of a CSV field into a string:
type Parser resultType = ParsecT String () Identity resultType
cell :: Parser String
I have implemented the following function:
customCell :: String -> Parser res -> Parser res
customCell typeName subparser =
cell
>>= either (const $ unexpected typeName)
return . parse (subparser <* eof) ""
Though I cannot stop thinking that I am not using the Monad concept as much as desired and that eventually there is a better way to merge the result of the inner with the outer parser, specially on what regards its failure.
Does anybody know how could I do so, or is this code what is meant to be done?
PS - I now realised that my type simplification is probably not appropriate and that maybe what I want is to replace the underlying Identity Monad by the Either Monad.... Unfortunately, I do not feel enough acquainted with monad transformers yet.
PS2 - What the hell is the underlying monad good for anyway?
Elaborating on #Daniel Wagner's answer... The way parsers are normally built with Parsec, you start with low-level parsers that parse specific characters (e.g., a plus sign or a digit), and you build parsers on top of them using combinators (like a many1 combinator that turns a parser that reads a single digit into a parser that reads one or more digits, or a monadic parse that parsers "one or more digits" followed by a "plus sign" followed by "one or more digits").
However, each parser, whether it's a low-level digit parser or a higher-level "addition expression" parser, is intended to be applied directly to the same input stream.
What you don't typically do is write a parser that gobbles a chunk of the input stream to produce, say, a String and another parser that parses that String (instead of the original input stream) and try to combine them. This is the kind of "vertical composition" that isn't directly supported by Parsec and looks unnatural and non-monadic.
As pointed out in the comments, there are some situations where vertical composition is the cleanest overall approach (like when you have one language embedded within the components or expressions of another language), but it's not the usual approach taken by a Parsec parser.
The bottom line in your application is that a cell parser that produces only a String is too specialized to be useful. A more useful Parsec framework for CSV files would be:
import Text.Parsec
import Text.Parsec.String
-- | `csv cell` parses a CSV file each of whose elements is parsed by `cell`
csv :: Parser a -> Parser [[a]]
csv cell = many (row cell)
-- | `row cell` parses a newline-terminated row of comma separated
-- `cell`-expressions
row :: Parser a -> Parser [a]
row cell = sepBy cell (char ',') <* char '\n'
Now, you can write a custom cell parser that parses positive integers:
customCell :: Parser Int
customCell = read <$> many1 digit
and parse CSV files:
> parse (csv customCell) "" "1,2,3\n4,5,6\n"
Right [[1,2,3],[4,5,6]]
>
Here, instead of having a cell subparser that explicitly parses a comma-delimited cell into a string to be fed to a different parser, the "cell" is an implicit context in which a supplied cell parser is called to parse the underlying input stream at the appropriate point where one would expect a comma-delimited cell in the middle of a row in the middle of the input stream.
Sadly I know of no parser library or parser generator for Haskell that supports vertical parser composition like this. Something like what you wrote is about as good as it gets. Dang!

Approaches For Deeply Extending a Parser Library

The material on parser combinators I have found covers building up complex parsers though composition, but I would like to know if there are any good approaches for defining parsers by tweaking the composed parsers of a library without completely duplicating the original library's logic.
For example, here is a simplified CSV parser defined in Real world Haskell
import Text.ParserCombinators.Parsec
csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'
Assuming csvFile is defined in one library, can another library create its own CSV parser using a custom version of the cell parser without having to rewrite the line and csvFile parsers as well? Can the source library be rewritten to make this possible? This is simple enough for the CSV parser but I am interested in a broadly applicable solution.
Generally you'd need to abstract over the signature of the components you want to replace. For instance, in the CSV example we'd need to extend the type of csvFile to allow us to slot in a custom cell.
line cell = sepBy cell (char ',')
csvFile cell = endBy (line cell) eol
Obviously this gets unwieldy quickly. We can package all of the extension points desired up into a dictionary however and pass it around
data LanguageDefinition =
LanguageDefinition { cell :: Parser Cell
, ...
}
Then we parameterize the entire parser combinator library over this LanguageDefinition
data Parsers = Parsers { line :: Parser Line, csvFile :: Parser [Line], ... }
mkParsers :: LanguageDefinition -> Parsers
This is exactly the approach taken by the generalized Token parsing modules of Parsec: see Text.Parsec.Token and Text.Parsec.Language.
Even more generic approaches can be taken which abstract more and more things into the dictionary that's getting passed around. Effectively this becomes an object- or OCaml module oriented method of organizing code and can be very effective.
The folk Expression Problem states that there's a tension between introducing more functionality and introducing more variants. In this case, you're asking for a new variant so you need to fix the functionality (and list it all out in a dictionary). This will open a path to introduce new variants.

Insert a character into parser combinator character stream in Haskell

This question is related to both Parsec and uu-parsinglib. When we write parser combinators, they process characters streams from compiler. Is it somehow possible to parse a character and put it back (or return another character back) to the input stream?
I want for example to parse input "test + 5", parse the t, e, s, t and after recognition of test pattern, put for example v character back into the character stream, so while continuating the parsing process we are matching against v + 5
I do not want to use this in any particular case for now - I want to deeply learn the possibilities.
I'm not sure if it's possible with these parsers directly, but in general you can accomplish it by combining parsers with some streaming that allows injecting leftovers.
For example, using attoparsec-conduit you can turn a parser into a conduit using
sinkParser :: (AttoparsecInput a, MonadThrow m)
=> Parser a b -> Consumer a m b
where Consumer is a special kind of conduit that doesn't produce any output, only receives input and returns a final value.
Since conduits support leftovers, you can create a helper method that converts a parser that optionally returns a value to be pushed into the stream into a conduit:
import Data.Attoparsec.Types
import Data.Conduit
import Data.Conduit.Attoparsec
import Data.Functor
reinject :: (AttoparsecInput a, MonadThrow m)
=> Parser a (Maybe a, b) -> Consumer a m b
reinject p = do
(lo, r) <- sinkParser p
maybe (return ()) leftover lo
return r
Then you convert standard parsers to conduits using sinkParser and these special parsers using reinject, and then combine conduits instead of parsers.
I think the simplest way to archive this is to build a multi-layered parser. Think of a lexer + parser combination. This is a clean approach to this problem.
You have to separate the two kind of parsing. The search-and-replace parsing goes to the first parser and the build-the-AST parsing to the second. Or you can create an intermediate token representation.
import Text.Parsec
import Text.Parsec.String
parserLvl1 :: Parser String
parserLvl1 = many (try (string "test" >> return 'v') <|> anyChar)
parserLvl2 :: Parser Plus
parserLvl2 = do text1 <- many (noneOf "+")
char '+'
text2 <- many (noneOf "+")
return $ Plus text1 text2
data Plus = Plus String String
deriving Show
wholeParse :: String -> Either ParseError Plus
wholeParse source = do res1 <- parse parserLvl1 "lvl1" source
res2 <- parse parserLvl2 "lvl2" res1
return res2
Now you can parse your example. wholeParse "test+5" results in Right (Plus "v" "5").
Possible variations:
Create a class and an instance for combining wrapped parser stages. (Possibly carrying parser state.)
Create an intermediate representation, a stream of tokens
This is easily done in uu-parsinglib using the pSwitch function. But the question is why you want to do so? Because the v is missing from the input? In that case uu-parsinglib will perform error correction automatically so you do not need something like this. Otherwise you can write
pSwitch :: (st1 -> (st2, st2 -> st1)) -> P st2 a -> P st1 a
pInsert_v = pSwitch (\st1 -> (prepend v st2, id) (pSucceed ())
It depends on your actual state type how the v is actually added, so you will have to define the function prepend yourself. I do not know e.g. how such an insertion would influence the current position in the file etc.
Doaitse Swierstra

Using Parsec to write a Read instance

Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
However, I already have a very robust parser who does that very thing for me.
It's actually not that robust, your parser has problems with superfluous parentheses, it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
= case parse (withremaining $ needsParens (prec >= threshold) parsecParser) "" input of
Left _ -> []
Right result -> [result]
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.

Using Haskell's Parsec for Programming Language Converter

Say I have two languages (A & B). My goal is to write some type of program to convert the syntax found in A to the equivalent of B. Currently my solution has been to use Haskell's Parsec to perform this task. As someone who is new to Haskell and functional programming for that matter however, finding just a simple example in Parsec has been quite difficult. The examples I have found on the web are either incomplete examples (frustrating for a new Haskell programmer) or too much removed from my goal.
So can someone provide me with an amazingly trivial and explicit example of using Parsec for something related to what I'd like to achieve? Or perhaps even some tutorial that parallels my goal as well.
Thanks.
Consider the following simple grammar of a CSV document (In ABNF):
csv = *crow
crow = *(ccell ',') ccell CR
ccell = "'" *(ALPHA / DIGIT) "'"
We want to write a converter that converts this grammar into a TSV (tabulator separated values) document:
tsv = *trow
trow = *(tcell HTAB) tcell CR
tcell = DQUOTE *(ALPHA / DIGIT) DQUOTE
First of all, let's create an algebraic data type that descibes our abstract syntax tree. Type synonyms are included to ease understandment:
data XSV = [Row]
type Row = [Cell]
type Cell = String
Writing a parser for this grammar is pretty simple. We write a parser as if we would describe the ABNF:
csv :: Parser XSV
csv = XSV <$> many crow
crow :: Parser Row
crow = do cells <- ccell `sepBy` (char ',')
newline
return cells
ccell :: Parser Cell
ccell = do char '\''
content <- many (digit <|> letter)
char '\''
return content
This parser uses do-notation. After a do, a sequence of statements follows. For parsers, these statements are simply other parsers. One can use <- to bind the result of a parser. This way, one builds a big parser by chaining multiple smaller parsers. To obtain interesting effects, one can also combine parser using special combinators (such as a <|> b, which parses either a or b or many a, which parses as many as as possible). Please be aware that Parsec does not backtrack by default. If a parser might fail after consuming characters, prefix it with try to enable backtracking for one instance. try slows down parsing.
The result is a parser csv that parses our CSV document into an abstract syntax tree. Now it is easy to turn that into another language (such as TSV):
xsvToTSV :: XSV -> String
xsvToTSV xst = unlines (map toLines xst) where
toLines = intersperse '\t'
Connecting these two things one gets a conversion function:
csvToTSV :: String -> Maybe String
csvToTSV document = case parse csv "" document of
Left _ -> Nothing
Right xsv -> xsvToTSV xsv
And that is all! Parsec has lots of other functions to build up extremely sophisticated parsers. The book Real World Haskell has a nice chapter about parsers, but it's a little bit outdated. Most of that is still true, though. If you have further questions, feel free to ask.

Resources