parsec: feeding output of one parser to another [duplicate] - parsing

This question already has answers here:
Generate parser that runs a received parser on the output of another parser and monadically joins the results
I use (abuse) parsers to do some string transformation, e.g. normalizeWS :: Parser String removes duplicate whitespace and normalizeCase maps specific strings to lower case. I use parsers because the input data has some structure; for example, literal strings have to be left untransformed. Is there an elegant way to feed the output of one parser as input to the next and thus form a transformation pipeline? Something in the vein of normalizeWS . normalizeCase (which of course doesn't work)?
Many thanks in advance!

I solved the problem using this approach ... maybe there is a more elegant way
preprocessor :: Parser String
preprocessor = normalizeCase `feeds` expandKettensatz `feeds` normalizeWs
feeds :: Parser String -> Parser String -> Parser String
feeds p1 p2 = do
  s <- p1
  setInput s
  p2

If you have functions like
normalizeWhitespace :: Stream s m Char => ParsecT s u m String
normalizeCase :: Stream s m Char => Set String -> ParsecT s u m String
You could chain them together using runParser and >>=:
runBoth :: Stream s Identity Char => Set String -> SourceName -> s -> Either ParseError String
runBoth wordSet src input = do
  input <- runParser normalizeWhitespace () src input
  runParser (normalizeCase wordSet) () src input
But this doesn't give you a parser that you can chain together with other parsers.
This isn't terribly surprising, as parser composition in Parsec is all about
composing parsers that operate on the same stream, whereas these operate on
different streams.
Having multiple different streams is pretty common too: using the output of a tokenization or lexing pass as the input to parsing can make the process easier to understand, but Parsec is a little easier to use out of the box as a direct parser, without a separate lexing/tokenization pass.
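If you do want a single composable parser, here is a hedged sketch, similar in spirit to the asker's feeds above (feedsInto is a made-up helper, not a library function): it runs the inner parser on the String produced by the outer one and surfaces any inner failure as an ordinary parse error.
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run `inner` on the output of `outer`, inside one composable parser.
feedsInto :: Parser String -> Parser a -> Parser a
feedsInto outer inner = do
  s <- outer
  either (parserFail . show) return (parse inner "<intermediate>" s)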

Related

Unpacking nested applicative functors in F#

Hi, I am attempting to make a combinator parser, and I am currently trying to make it read headers and create parsers based on which header is parsed. I.e. a header of int, float, string will result in Parser<Parser<int>*Parser<float>*Parser<string>>.
I am wondering, however, how you would unpack the "inner" parsers and end up with something like Parser<int*float*string>?
The Parser type is: type Parser<'a> = Parser of (string -> Result<'a * string, string>)
I'm not sure that your idea with nested parsers is going to work: if you parse a header dynamically, then you'll need to produce a list of parsers of the same type. The way you wrote this suggests that the type of the parser would depend on the input, which is not possible in F#.
So, I'd expect that you will need to define a type like this:
type Value = Int of int | String of string | Float of float
And then your parser that parses a header will produce something like:
let parseHeaders args : Parser<Parser<Value> list> = (...)
The next question is, what do you want to do with the nested parsers? Presumably, you'll need to turn them into a single parser that parses the whole line of data (if this is something like a CSV file). Typically, you'd define a function sequence:
val sequence : sep:Parser<unit> -> parsers:Parser<'a> list -> Parser<'a list>
This takes a separator (say, parser to recognize a comma) and a list of parsers and produces a single parser that runs all the parsers in a sequence with the separator in between.
Then you can do:
parseHeaders input |> map (fun parsers -> sequence (char ',') parsers)
And you get a single parser Parser<Parser<Value list>>. You now want to run the nested parser on the rest of the input that is left after running the outer parser, which recognizes headers. The following function does the trick:
let unwrap (Parser f:Parser<Parser<'a>>) = Parser (fun s ->
  match f s with
  | Result.Ok(Parser nested, rest) -> nested rest
  | Result.Error e -> Result.Error e)

Generate parser that runs a received parser on the output of another parser and monadically joins the results

Given the following type and function, meant to parse a field of a CSV file into a string:
type Parser resultType = ParsecT String () Identity resultType
cell :: Parser String
I have implemented the following function:
customCell :: String -> Parser res -> Parser res
customCell typeName subparser =
  cell >>= either (const $ unexpected typeName) return
         . parse (subparser <* eof) ""
Though I cannot stop thinking that I am not using the Monad concept as much as I should, and that there is probably a better way to merge the result of the inner parser with the outer one, especially with regard to failure.
Does anybody know how could I do so, or is this code what is meant to be done?
PS - I have now realised that my type simplification is probably not appropriate and that maybe what I want is to replace the underlying Identity monad with the Either monad... Unfortunately, I do not feel acquainted enough with monad transformers yet.
PS2 - What the hell is the underlying monad good for anyway?
Elaborating on @Daniel Wagner's answer... The way parsers are normally built with Parsec, you start with low-level parsers that parse specific characters (e.g., a plus sign or a digit), and you build parsers on top of them using combinators (like a many1 combinator that turns a parser that reads a single digit into a parser that reads one or more digits, or a monadic parser that parses "one or more digits" followed by a "plus sign" followed by "one or more digits").
However, each parser, whether it's a low-level digit parser or a higher-level "addition expression" parser, is intended to be applied directly to the same input stream.
What you don't typically do is write a parser that gobbles a chunk of the input stream to produce, say, a String and another parser that parses that String (instead of the original input stream) and try to combine them. This is the kind of "vertical composition" that isn't directly supported by Parsec and looks unnatural and non-monadic.
As pointed out in the comments, there are some situations where vertical composition is the cleanest overall approach (like when you have one language embedded within the components or expressions of another language), but it's not the usual approach taken by a Parsec parser.
The bottom line in your application is that a cell parser that produces only a String is too specialized to be useful. A more useful Parsec framework for CSV files would be:
import Text.Parsec
import Text.Parsec.String
-- | `csv cell` parses a CSV file each of whose elements is parsed by `cell`
csv :: Parser a -> Parser [[a]]
csv cell = many (row cell)
-- | `row cell` parses a newline-terminated row of comma separated
-- `cell`-expressions
row :: Parser a -> Parser [a]
row cell = sepBy cell (char ',') <* char '\n'
Now, you can write a custom cell parser that parses positive integers:
customCell :: Parser Int
customCell = read <$> many1 digit
and parse CSV files:
> parse (csv customCell) "" "1,2,3\n4,5,6\n"
Right [[1,2,3],[4,5,6]]
>
Here, instead of having a cell subparser that explicitly parses a comma-delimited cell into a String to be fed to a different parser, the "cell" is an implicit context: a supplied cell parser is invoked to parse the underlying input stream at exactly the point where one would expect a comma-delimited cell, in the middle of a row, in the middle of the input stream.
Sadly I know of no parser library or parser generator for Haskell that supports vertical parser composition like this. Something like what you wrote is about as good as it gets. Dang!
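To make that "implicit context" point concrete, here is a small hypothetical variation (wordCell is made up for illustration, not part of the answer): a different cell parser plugged into the same csv/row framework above.
import Text.Parsec
import Text.Parsec.String (Parser)

-- a cell is any run of characters up to the next comma or newline
wordCell :: Parser String
wordCell = many1 (noneOf ",\n")

-- parse (csv wordCell) "" "a,b\nc,d\n"  evaluates to  Right [["a","b"],["c","d"]]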

Insert a character into parser combinator character stream in Haskell

This question is related to both Parsec and uu-parsinglib. When we write parser combinators, they process streams of characters. Is it somehow possible to parse a character and put it back (or return another character back) to the input stream?
I want, for example, to parse the input "test + 5": parse the t, e, s, t, and, after recognition of the test pattern, put for example a v character back into the character stream, so that while continuing the parsing process we are matching against v + 5.
I do not want to use this in any particular case for now - I want to deeply learn the possibilities.
I'm not sure if it's possible with these parsers directly, but in general you can accomplish it by combining parsers with a streaming layer that allows injecting leftovers.
For example, using attoparsec-conduit you can turn a parser into a conduit using
sinkParser :: (AttoparsecInput a, MonadThrow m)
=> Parser a b -> Consumer a m b
where Consumer is a special kind of conduit that doesn't produce any output, only receives input and returns a final value.
Since conduits support leftovers, you can create a helper that converts a parser (one that optionally returns a value to be pushed back into the stream) into a conduit:
import Data.Attoparsec.Types
import Data.Conduit
import Data.Conduit.Attoparsec
import Data.Functor
reinject :: (AttoparsecInput a, MonadThrow m)
=> Parser a (Maybe a, b) -> Consumer a m b
reinject p = do
  (lo, r) <- sinkParser p
  maybe (return ()) leftover lo
  return r
Then you convert standard parsers to conduits using sinkParser and these special parsers using reinject, and then combine conduits instead of parsers.
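As a hedged usage sketch (the replaceTest parser is made up here, and the exact composition operators depend on your conduit version; with older conduit you would write the pipeline with $$ instead of runConduit and .|): a first stage recognizes "test" and reinjects "v" plus the untouched remainder as leftover, and a second ordinary parser stage then consumes that leftover.
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit (runConduit, yield, (.|))
import Data.Conduit.Attoparsec (sinkParser)
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T

-- made-up first stage: consume "test", push back "v" followed by the rest
replaceTest :: A.Parser (Maybe T.Text, ())
replaceTest = do
  _    <- A.string "test"
  rest <- A.takeText
  return (Just (T.append "v" rest), ())

example :: IO T.Text
example = runConduit $
  yield "test + 5" .| (reinject replaceTest >> sinkParser A.takeText)
-- expected result: "v + 5"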
I think the simplest way to achieve this is to build a multi-layered parser. Think of a lexer + parser combination. This is a clean approach to this problem.
You have to separate the two kinds of parsing. The search-and-replace parsing goes into the first parser and the build-the-AST parsing into the second. Or you can create an intermediate token representation.
import Text.Parsec
import Text.Parsec.String
parserLvl1 :: Parser String
parserLvl1 = many (try (string "test" >> return 'v') <|> anyChar)
parserLvl2 :: Parser Plus
parserLvl2 = do
  text1 <- many (noneOf "+")
  char '+'
  text2 <- many (noneOf "+")
  return $ Plus text1 text2
data Plus = Plus String String
  deriving Show
wholeParse :: String -> Either ParseError Plus
wholeParse source = do
  res1 <- parse parserLvl1 "lvl1" source
  res2 <- parse parserLvl2 "lvl2" res1
  return res2
Now you can parse your example. wholeParse "test+5" results in Right (Plus "v" "5").
Possible variations:
Create a class and an instance for combining wrapped parser stages. (Possibly carrying parser state.)
Create an intermediate representation, a stream of tokens (sketched below)
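A minimal sketch of that token-stream variation (the Token type and satisfyTok are made up here, not part of the answer): the first pass produces a [Token], and a second Parsec pass consumes that token list instead of a String, using tokenPrim.
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- an intermediate token representation produced by the first pass
data Token = TWord String | TPlus | TNum Integer
  deriving (Show, Eq)

type TokParser = Parsec [Token] ()

-- accept a single token satisfying a predicate
satisfyTok :: (Token -> Bool) -> TokParser Token
satisfyTok p = tokenPrim show
                         (\pos _ _ -> incSourceColumn pos 1)  -- crude position update
                         (\t -> if p t then Just t else Nothing)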
This is easily done in uu-parsinglib using the pSwitch function. But the question is why you would want to do so. Because the v is missing from the input? In that case uu-parsinglib will perform error correction automatically, so you do not need something like this. Otherwise you can write:
pSwitch :: (st1 -> (st2, st2 -> st1)) -> P st2 a -> P st1 a
pInsert_v = pSwitch (\st1 -> (prepend 'v' st1, id)) (pSucceed ())
It depends on your actual state type how the v is actually added, so you will have to define the function prepend yourself. I do not know e.g. how such an insertion would influence the current position in the file etc.
Doaitse Swierstra

Haskell Parsec, adapting oneOf to [String]

I'm going through the Write yourself a scheme in 48 hours tutorial.
symbol :: Parser Char
symbol = oneOf "!#$%&|*+-/:<=>?#^_~"
This is great for symbols, but what if I have a list of keywords (e.g. struct, int)?
Can oneOf be adapted to lists? This is ideally what I want, depicted below:
keywords :: Parser String
keywords = oneOf ["struct","int",..etc]
Or should I import Text.Parsec.Char and try to mapM string over the list of keywords?
I'm attempting to tokenize and just wanted to know what best practices were from others who have gone down this road.
The docs say to use something like this:
divOrMod = string "div"
       <|> string "mod"
http://hackage.haskell.org/packages/archive/parsec/3.0.0/doc/html/Text-Parsec-Char.html
The general form of this is the choice combinator, which has the following type:
choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a
Basically, you give it a list of parsers, and it tries them in order until one succeeds. choice is implemented using (<|>), so it's the same as that approach.
In your case, to match a list of keywords but no other parsers, you can just map string over a list of Strings and then use choice on that.
On the other hand, mapM string would do something entirely different--it would expect all of the parsers to succeed in order.
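A hedged sketch of that suggestion (the keyword list is just an example; try is added so that a partially matched keyword does not consume input before the next alternative is tried):
import Text.Parsec
import Text.Parsec.String (Parser)

keywords :: Parser String
keywords = choice (map (try . string) ["struct", "int", "float"])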

Using Parsec to write a Read instance

Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser that does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
However, I already have a very robust parser that does that very thing for me.
It's actually not that robust: your parser has problems with superfluous parentheses; it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some places, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
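For concreteness, here is one possible hedged sketch of such a needsParens helper (it is only pseudocode in this answer, not a library function); it is used again in the conversion code below:
import Text.Parsec
import Text.Parsec.String (Parser)

-- needsParens True p requires at least one pair of parentheses around p;
-- needsParens False p accepts p wrapped in any number of balanced parentheses.
needsParens :: Bool -> Parser a -> Parser a
needsParens True  p = inParens (needsParens False p)
needsParens False p = try (inParens (needsParens False p)) <|> p

inParens :: Parser a -> Parser a
inParens = between (char '(' >> spaces) (spaces >> char ')')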
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
  = case parse (withRemaining $ needsParens (prec >= threshold) parsecParser) "" input of
      Left _       -> []
      Right result -> [result]
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.
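A minimal hedged sketch of that ReadPrec route, for a hypothetical two-field type rather than the asker's Scheme type:
import Text.Read (Read(..), lexP, parens)
import Text.Read.Lex (Lexeme(Ident))
import Text.ParserCombinators.ReadPrec (prec, step)

data Point = Point Int Int deriving Show

instance Read Point where
  readPrec = parens $ prec 10 $ do
    Ident "Point" <- lexP      -- expect the constructor name
    x <- step readPrec         -- read the fields at higher precedence
    y <- step readPrec
    return (Point x y)
-- read "Point 1 2" :: Point  evaluates to  Point 1 2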
