Parse multiple instances of a string using ReadP

Problem Statement: Parse the string "AB" in "ABCDFGABIJABGHA"
Constraint: Using ReadP
Expected Solution : (["AB","AB","AB"],"whatever is left")
Attempt:
getAll :: ReadP a -> ReadP [a]
getAll p = many loop
  where
    loop = p <|> (get >> loop)
readP_to_S (getAll $ string "AB") "ABCDFGABIJABGHA"
[([],"ABCDFGABIJABGHA"),(["AB"],"CDFGABIJABGHA"),(["AB","AB"],"IJABGHA"),(["AB"],"IJABGHA"),(["AB","AB","AB"],"GHA"),(["AB","AB"],"GHA"),(["AB","AB"],"GHA"),(["AB"],"GHA")]
I wanted the last state to be (["AB","AB","AB"],"GHA"). Is it possible to do the same using ReadP?

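The problem is that <|> is a symmetric choice in ReadP: both alternatives are explored, so loop also skips over occurrences of p that it could have matched. If you want your parser to match all the ps without exception, use the provided left-biased choice <++, which only tries its right alternative when the left one fails. Note that many is itself symmetric, so readP_to_S still lists the shorter prefixes; the complete match is the last result.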
getAll :: ReadP a -> ReadP [a]
getAll p = many loop
  where
    loop = p <++ (get >> loop)
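If a single result is wanted rather than the last element of the result list, one option is to make the repetition left-biased as well. The following is only a sketch under that assumption: skipTo, manyGreedy and getAllGreedy are hypothetical helper names, not part of ReadP; only get, string, (<++) and readP_to_S come from Text.ParserCombinators.ReadP.

```haskell
import Text.ParserCombinators.ReadP

-- Skip characters until p matches; (<++) commits to p whenever p succeeds.
skipTo :: ReadP a -> ReadP a
skipTo p = p <++ (get *> skipTo p)

-- Left-biased repetition (hypothetical helper): keep going while q succeeds
-- and fall back to the empty list only once q fails.
manyGreedy :: ReadP a -> ReadP [a]
manyGreedy q = ((:) <$> q <*> manyGreedy q) <++ pure []

-- Collect every occurrence of p and stop right after the last one.
getAllGreedy :: ReadP a -> ReadP [a]
getAllGreedy p = manyGreedy (skipTo p)
```

With these definitions, readP_to_S (getAllGreedy (string "AB")) "ABCDFGABIJABGHA" should yield the single result (["AB","AB","AB"],"GHA"), because every choice point commits to its left branch as soon as that branch succeeds.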

Related

Writing a parser for Persons in haskell

I'm trying to write a parser for a data type Person. I have to write it in just one line using <$> and <*>, and I have tried a lot, but I'm getting really overwhelmed.
The parser type is as usual:
newtype Parser a = Parser (String -> [(a,String)])
And I have this function:
parse :: Parser a -> String -> Maybe a
that returns the first complete parse.
e.g.
if I have this easy function:
upper :: Parser Char
upper = satisfy isUpper
If I run parse upper "A" I get Just 'A'
I also have a funnier function like this:
name :: Parser String
name = (:) <$> (satisfy isUpper) <*> (many $ satisfy isAlpha)
which, as you can see, accepts every string of alphabetic characters that begins with an uppercase letter.
so:
*Main> parse name "hello"
Nothing
*Main> parse name "Hello"
Just "Hello"
So far everything is fine. The only problem is that I have to do something similar for the data type Person.
so, I have this:
data Person = Person String deriving (Eq, Show)
And then, in just one line, I have to write the parser for Person. The name should satisfy the parser name, that is, it should be a sequence of alphabetic characters where the first one is uppercase.
And it should work so:
> parse parserPerson "Chuck"
Just (Person "Chuck")
> parse parserPerson "chuck"
Nothing
where:
parserPerson :: Parser Person
parserPerson = ???
As you can see, before "Chuck" there is Person, so I somehow have to use *> to get it.
And that's it, just a line with <$>, <*> and *> that works that way.
I don't have a clue, and I'm going crazy with this. Maybe someone could help me.
EDIT
satisfy :: (Char -> Bool) -> Parser Char -- parse a desired character
satisfy p = Parser check
  where
    check (c:s) | p c = [(c,s)] -- successful
    check _ = [ ] -- no parse
and many (like some) is a function from Control.Applicative.
As tsorn said, the answer was really easy...
parserPerson :: Parser Person
parserPerson = Person <$> name
and it works because the Functor instance was defined.
instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> map (\(a,b) -> (f a, b)) $ p s
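For reference, the pieces above can be assembled into one self-contained module. This is a minimal sketch, not the asker's actual code: the Applicative and Alternative instances, the runParser helper and the body of parse are assumptions, since the question only shows the Functor instance and the type of parse.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlpha, isUpper)

newtype Parser a = Parser (String -> [(a, String)])

-- Hypothetical helper: unwrap the parsing function.
runParser :: Parser a -> String -> [(a, String)]
runParser (Parser p) = p

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f a, rest) | (a, rest) <- p s]

-- Assumed instance: run the first parser, then the second on what is left.
instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  Parser pf <*> Parser pa =
    Parser $ \s -> [(f a, s2) | (f, s1) <- pf s, (a, s2) <- pa s1]

-- Assumed instance: collect the results of both alternatives.
instance Alternative Parser where
  empty = Parser (const [])
  Parser p <|> Parser q = Parser $ \s -> p s ++ q s

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser check
  where
    check (c : s) | ok c = [(c, s)]
    check _ = []

-- Return the first parse that consumes the whole input, if there is one.
parse :: Parser a -> String -> Maybe a
parse p s = case [a | (a, "") <- runParser p s] of
  (a : _) -> Just a
  [] -> Nothing

name :: Parser String
name = (:) <$> satisfy isUpper <*> many (satisfy isAlpha)

data Person = Person String deriving (Eq, Show)

-- fmap wraps the parsed name in the Person constructor.
parserPerson :: Parser Person
parserPerson = Person <$> name
```

Under these assumptions, parse parserPerson "Chuck" gives Just (Person "Chuck") and parse parserPerson "chuck" gives Nothing. No *> is needed, because Person is a constructor applied to the result, not something that appears in the input.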

The value of x is undefined here, so this reference is not allowed

I wrote a very simple parser combinator library which seems to work alright (https://github.com/mukeshsoni/tinyparsec).
I then tried writing a parser for JSON with the library. The code for the JSON parser is here - https://github.com/mukeshsoni/tinyparsec/blob/master/src/example_parsers/JsonParser.purs
The grammar for JSON is recursive -
data JsonVal
  = JsonInt Int
  | JsonString String
  | JsonBool Boolean
  | JsonObj (List (Tuple String JsonVal))
Which means the parser for json object must again call the parser for jsonVal. The code for jsonObj parser looks like this -
jsonValParser
  = jsonIntParser <|> jsonBoolParser <|> jsonStringParser <|> jsonObjParser
propValParser :: Parser (Tuple String JsonVal)
propValParser = do
  prop <- stringLitParser
  _ <- symb ":"
  val <- jsonValParser
  pure (Tuple prop val)
listOfPropValParser :: Parser (List (Tuple String JsonVal))
listOfPropValParser = sepBy propValParser (symb ",")
jsonObjParser :: Parser JsonVal
jsonObjParser = do
  _ <- symb "{"
  propValList <- listOfPropValParser
  _ <- symb "}"
  pure (JsonObj propValList)
But when I try to build it, I get the following error: The value of propValParser is undefined here, so this reference is not allowed.
I found similar issues on Stack Overflow but could not understand why the error happens or how I should refactor my code so that it handles the recursive references from jsonValParser to propValParser.
Any help would be appreciated.
See https://stackoverflow.com/a/36991223/139614 for a similar case - you'll need to make use of the fix function, or introduce Unit -> ... in front of a parser somewhere to break the cyclic definition.
I managed to get rid of the error by wrapping the blocks that were throwing the error inside a do block and starting the do block with a no-op -
listOfPropValParser :: Parser (List (Tuple String JsonVal))
listOfPropValParser = do
  _ <- pure 1 -- does nothing but defer the execution of the second line
  sepBy propValParser (symb ",")
Had to do the same for jsonValParser.
jsonValParser = do
  _ <- pure 1
  jsonIntParser <|> jsonBoolParser <|> jsonStringParser <|> jsonObjParser
The idea is to defer the execution of the code which might lead to cyclic dependency. The added line, _ <- pure 1, does exactly that. I think it might be doing the same as fix from Data.Fix does or what defer from Data.Lazy does.

Monadic parse with uu-parsinglib

I'm trying to create a monadic parser using uu-parsinglib. I thought I had it covered, but I'm getting some unexpected results in testing.
A cut down example of my parser is:
pType :: Parser ASTType
pType = addLength 0 $
  do (Amb n_list) <- pName
     let r_list = filter nameFilter n_list
     case r_list of
       (ASTName_IDName a : [] ) -> return (ASTType a)
       (ASTName_TypeName a : [] ) -> return (ASTType a)
       _ -> pFail
  where nameFilter :: ASTName' -> Bool
        nameFilter a =
          case a of
            (ASTName_IDName _) -> True
            (ASTName_TypeName _) -> True
            _ -> False
data ASTType = ASTType ASTName
data ASTName = Amb [ASTName']
data ASTName'
  = ASTName_IDName ASTName
  | ASTName_TypeName ASTName
  | ASTName_OtherName ASTName
  | ASTName_Simple String
pName is an ambiguous parser. What I want the type parser to do is apply a post-filter and return all alternatives that satisfy nameFilter, wrapped as ASTType.
If there are none, it should fail.
(I realise the example I've given will fail if there is more than one valid match in the list, but the example serves its purpose)
Now, this all works as far as I can see. The problem arises when you use it in more complicated grammars, where odd matches seem to occur. I suspect the problem is the addLength 0 part.
What I would like to do is separate out the monadic and applicative parts: create a monadic parser with the filtering component, and then apply pName using the <**> operator.
Alternatively
I'd settle for a really good explanation of what addLength is doing.
I've put together a fudge/workaround to use for monadic parsing with uu-parsinglib. The only way I ever use monadic parsers is to analyse the results of an overly generous initial parser and selectively fail them.
bind' :: Parser a -> (a -> Parser b) -> Parser b
bind' a@(P _ _ _ l') b = let (P t nep e _) = (a >>= b) in P t nep e l'
The important thing to remember when using this parser is that
a -> M b
must consume no input. It must either return a transformed version of a, or fail.
WARNING
Testing on this is only minimal currently, and its behaviour is not enforced by type. It is a fudge.

Parsing a particular string in Haskell

I'm using the parsec Haskell library.
I want to parse strings of the following kind:
[[v1]][[v2]]
xyz[[v1]][[v2]]
[[v1]]xyz[[v2]]
etc.
I'm interested in collecting only the values v1 and v2, and storing them in a data structure.
I tried with the following code:
import Text.ParserCombinators.Parsec
quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))
parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input
main = do {
  c <- getContents;
  case parse quantifiedVars "(stdin)" c of {
    Left e -> do { putStrLn "Error parsing input:"; print e; };
    Right r -> do { putStrLn "ok"; mapM_ print r; };
  }
}
In this way, if the input is "[[v1]][[v2]]" the program works fine, returning the following output:
"v1"
"v2"
If the input is "xyz[[v1]][[v2]]" the program doesn't work. In particular, I want only what is contained in [[...]], ignoring "xyz".
Also, I want to store the content of [[...]] in a data structure.
How do you solve this problem?
You need to restructure your parser. You are using combinators in very strange locations, and they mess things up.
A var is a varName between "[[" and "]]". So, write that:
var = between (string "[[") (string "]]") varName
A varName should have some kind of format (I don't think that you want to accept "%A¤%&", do you?), so you should make a parser for that; but in case it really can be anything, just do this:
varName = many $ noneOf "]"
Then, a text containing vars is something with vars separated by non-vars.
varText = someText *> var `sepEndBy` someText
... where someText is anything except a '[':
someText = many $ noneOf "["
Things get more complicated if you want this to be parseable:
bla bla [ bla bla [[somevar]blabla]]
Then you need a better parser for varName and someText:
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))
-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))
-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["
PS. (<*>), (*>) and (<$>) are from Control.Applicative.
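Putting the simple variant of the answer together gives something like the sketch below. The type signatures, the eof check and the reuse of the question's parseSL name are additions for illustration; the parsers themselves are the ones given above.

```haskell
import Text.ParserCombinators.Parsec

-- A variable is a name between "[[" and "]]".
var :: Parser String
var = between (string "[[") (string "]]") varName

-- Simple variant: a name is anything up to the next ']'.
varName :: Parser String
varName = many (noneOf "]")

-- Text between variables: anything up to the next '['.
someText :: Parser String
someText = many (noneOf "[")

-- Skip leading text, then collect variables separated by ignored text.
varText :: Parser [String]
varText = someText *> (var `sepEndBy` someText)

-- Collect the variable names into a list; eof rejects trailing garbage.
parseSL :: String -> Either ParseError [String]
parseSL = parse (varText <* eof) "(unknown)"
```

For example, parseSL "xyz[[v1]][[v2]]" and parseSL "[[v1]]xyz[[v2]]" should both return Right ["v1","v2"]. Input containing a lone '[' or ']' is rejected by this simple variant, which is the case the more elaborate varName and someText handle.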

Haskell Parsec items numeration

I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input like this:
- First type A\n
-- First type B\n
- Second type A\n
-- First type B\n
--Second type B\n
And my output should be:
<h1>1 First type A\n</h1>
<h2>1.1 First type B\n</h2>
<h1>2 Second type A\n</h1>
<h2>2.1 First type B\n</h2>
<h2>2.2 Second type B\n</h2>
I have come to this part, but I cannot get any further:
title1 = do{
  ;(count 1 (char '-'))
  ;s <- manyTill anyChar newline
  ;return (h1 << s)
}
title2 = do{
  ;(count 2 (char '-'))
  ;s <- manyTill anyChar newline
  ;return (h2 << s)
}
text = do {
  ;many (choice [try(title1),try(title2)])
}
main :: IO ()
main = do t <- ...
          case ... of
            Left err -> putStr "Error: " >> print err
            Right x -> putStrLn $ prettyHtml x
This is ok, but it does not include the numbering.
Any ideas?
Thanks!
You probably want to use GenParser with a state containing the current section numbers as a list in reverse order, so section 1.2.3 will be represented as [3,2,1], and maybe the length of the list to avoid repeatedly counting it. Something like
data SectionState = SectionState {nums :: [Int], depth :: Int}
Then give your parser actions the type "GenParser Char SectionState a". You can read and replace the current state in your parser actions using "getState" and "setState". When you get a series of "-" at the start of a line, count them and compare the count with "depth" in the state, manipulate the "nums" list appropriately, and then emit "nums" in reverse order (I suggest keeping nums in reverse order because most of the time you want to access the least significant item, so putting it at the head of the list is both easier and more efficient).
See Text.ParserCombinators.Parsec.Prim for details of GenParser. The more usual Parser type is just "type Parser a = GenParser Char () a". You probably want to say
type MyParser a = GenParser Char SectionState a
somewhere near the start of your code.
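As a rough sketch of that approach: the helper names bump, heading and numberHeadings are made up for illustration, the output is built as plain strings instead of Text.XHtml values to keep the example self-contained, and updateState is used as a shorthand for getState followed by setState.

```haskell
import Data.List (intercalate)
import Text.ParserCombinators.Parsec

-- Counters are stored least significant first: section 1.2 is [2,1].
data SectionState = SectionState { nums :: [Int], depth :: Int }

type MyParser a = GenParser Char SectionState a

-- Advance the counters for a heading at the given level (1 for "-", 2 for "--").
bump :: Int -> SectionState -> SectionState
bump level (SectionState ns d)
  | level > d = SectionState (replicate (level - d) 1 ++ ns) level
  | otherwise = case drop (d - level) ns of
      (n : rest) -> SectionState (n + 1 : rest) level
      [] -> SectionState [1] level

-- One heading line: leading dashes, optional spaces, title, end of line.
heading :: MyParser String
heading = do
  dashes <- many1 (char '-')
  skipMany (char ' ')
  title <- many (noneOf "\n")
  (() <$ newline) <|> eof
  updateState (bump (length dashes))
  st <- getState
  let number = intercalate "." (map show (reverse (nums st)))
      tag = "h" ++ show (depth st)
  return ("<" ++ tag ++ ">" ++ number ++ " " ++ title ++ "</" ++ tag ++ ">")

-- Run the heading parser over the whole input, starting with no counters.
numberHeadings :: String -> Either ParseError [String]
numberHeadings = runParser (many heading <* eof) (SectionState [] 0) "(input)"
```

Applied to the sample input from the question, numberHeadings should produce the headings numbered 1, 1.1, 2, 2.1 and 2.2. Replacing the string concatenation with h1 << ... and h2 << ... from Text.XHtml is left as it was in the question.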
