Choosing the right ReadP parse result - parsing

I'm trying to parse a RFC5322 email address. My parser works in the sense that among the results, one of them is correct. However, how do I go about selecting the “correct” result?
Given the string Foo Bar <foo#bar.com>, my parser should produce a value of Address (Just "Foo Bar") "foo#bar.com".
Alternatively, given the string foo#bar.com, my parser should produce a value of Address Nothing "foo#bar.com".
The value with the name included is preferred.
My parser looks like this:
import Control.Applicative
import Data.Char
import qualified Data.Text as T
import Text.ParserCombinators.ReadP
onlyEmail :: ReadP Address
onlyEmail = do
skipSpaces
email <- many1 $ satisfy isAscii
skipSpaces
return $ Address Nothing (T.pack email)
withName :: ReadP Address
withName = do
skipSpaces
name <- many1 (satisfy isAscii)
skipSpaces
email <- between (char '<') (char '>') (many1 $ satisfy isAscii)
skipSpaces
return $ Address (Just $ T.pack name) (T.pack email)
rfc5322 :: ReadP Address
rfc5322 = withName <|> onlyEmail
When I run the parser with readP_to_S rfc5322 "Foo Bar <foo#bar.com>", it produces the following results:
[ (Address {addressName = Nothing, addressEmail = "F"},"oo Bar <foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Fo"},"o Bar <foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo"},"Bar <foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo "},"Bar <foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo B"},"ar <foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Ba"},"r <foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar"},"<foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar "},"<foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <"},"foo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <f"},"oo#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <fo"},"o#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo"},"#bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#"},"bar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#b"},"ar.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#ba"},"r.com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#bar"},".com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#bar."},"com>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#bar.c"},"om>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#bar.co"},"m>")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#bar.com"},">")
, (Address {addressName = Just "Foo Bar", addressEmail = "foo#bar.com"},"")
, (Address {addressName = Just "Foo Bar ", addressEmail = "foo#bar.com"},"")
, (Address {addressName = Nothing, addressEmail = "Foo Bar <foo#bar.com>"},"")
]
In this case, the result I actually want appears third-last in the list. How do I express that preference?

You should not to do preference. Your problem is that your partial parsers are accepting the more bigger string set than really need.
For example, my solution:
import Control.Bool
import Control.Applicative
import Data.Char
import qualified Data.Text as T
import Data.Text (Text)
import Text.ParserCombinators.ReadP
email :: ReadP Text
email = do
l <- part
a <- char '#'
d <- part
return . T.pack $ l ++ a:d
where
part = munch1 (isAscii <&&> (/='#') <&&> (/='<') <&&> (/='>'))
name :: ReadP Text
name = T.pack <$> chainr1 part sep
where
part = munch1 (isAlpha <||> isDigit <||> (=='\''))
sep = (\xs ys -> xs ++ ' ':ys) <$ munch1 (==' ')
onlyEmail :: ReadP Address
onlyEmail = Address Nothing <$> email
withName :: ReadP Address
withName = do
n <- name
skipSpaces
e <- between (char '<') (char '>') email
return $ Address (Just n) e
address :: ReadP Address
address = skipSpaces *> (withName <|> onlyEmail)
main = print $ readP_to_S address "Foo Bar <foo#bar.com>"
Will be printed:
[(Address (Just "Foo Bar") "foo#bar.com","")]

Related

Is it possible to match a changing JSON key to a sum type data constructor with aeson inside a larger record type?

So I have this data type ItemType which is decoded using its data constructor name (see the FromJSON instance).
import Data.Aeson
import Data.Aeson.Types
import Data.Char (toLower)
import GHC.Generics
data ItemType =
MkLogin Login
| MkCard Card
| MkIdentity Identity
| MkSecureNote Note
deriving (Generic, Show)
lowercase :: String -> String
lowercase "" = ""
lowercase (s:ss) = toLower s : ss
stripPrefix :: String -> String
stripPrefix ('M':'k':ss) = ss
stripPrefix str = str
-- | Decode value using ItemType data constructor names
instance FromJSON ItemType where
parseJSON = genericParseJSON defaultOptions
{ constructorTagModifier = lowercase . stripPrefix
, sumEncoding = ObjectWithSingleField }
and what I want to do is add this type as a field to a larger record type called Item
data Item =
Item { _object :: String
, _id :: String
, _organizationId :: Maybe Int
, _folderId :: Maybe Int
, _type :: Int
, _name :: String
, _notes :: String
, _favorite :: Bool
, ??? :: ItemType -- don't know how to add this without a different field name
, _collectionIds :: [Int]
, _revisionDate :: Maybe String
} deriving (Generic, Show)
instance FromJSON Item where
parseJSON =
genericParseJSON defaultOptions { fieldLabelModifier = stripUnderscore }
However I don't want to create a new field name for the type. Instead I want to use the data constructor which aeson matched on ItemType as the field name because the key of the ItemType field in the JSON object I'm trying to model changes depending upon what ItemType it is. So in this case the key is either "login", "card", "identity", "secureNote". Perhaps I should be using TaggedObject for the sumEncoding, but I'm not totally sure how it works.
Example JSON list of Item objects: https://i.imgur.com/xzHy9MU.png. Here you can see the ItemType field by the keys "login", "card", "identity" depending on what type they are.
You can use a rather ugly hack to pre-process the incoming JSON Value, so that actual JSON input like:
{
"id": "foo",
"bool": false
}
is parsed as if it had been:
{
"id": "foo",
"itemtype": {"bool" : false}
}
which can be handled directly by the generic parsers using the ObjectWithSingleField sum encoding method.
As a simplified example, given:
data ItemType =
MkInt Int
| MkBool Bool
deriving (Generic, Show)
instance FromJSON ItemType where
parseJSON = genericParseJSON defaultOptions
{ constructorTagModifier = map toLower . \('M':'k':ss) -> ss
, sumEncoding = ObjectWithSingleField }
and:
data Item =
Item { _id :: String
, _itemtype :: ItemType
}
deriving (Generic, Show)
you can write a FromJSON instance for Item that nests an "int" or "bool" field inside an "itemtype" field. (A duplicate of the original field is left in place but ignored by the generic parser.)
instance FromJSON Item where
parseJSON v = do
v' <- withObject "Item" nest v
genericParseJSON defaultOptions { fieldLabelModifier = \('_':ss) -> ss } v'
where nest o = Object <$> (HM.insert "itemtype" <$> item <*> pure o)
where item = subObj "int" <|> subObj "bool" <|> fail "no item type field"
subObj k = (\v -> object [(k,v)]) <$> o .: k
Full code:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TupleSections #-}
import Control.Applicative
import Data.Aeson
import Data.Aeson.Types
import Data.Char (toLower)
import GHC.Generics
import qualified Data.HashMap.Strict as HM
data ItemType =
MkInt Int
| MkBool Bool
deriving (Generic, Show)
instance FromJSON ItemType where
parseJSON = genericParseJSON defaultOptions
{ constructorTagModifier = map toLower . \('M':'k':ss) -> ss
, sumEncoding = ObjectWithSingleField }
data Item =
Item { _id :: String
, _itemtype :: ItemType
}
deriving (Generic, Show)
instance FromJSON Item where
parseJSON v = do
v' <- withObject "Item" nest v
genericParseJSON defaultOptions { fieldLabelModifier = \('_':ss) -> ss } v'
where nest o = Object <$> (HM.insert "itemtype" <$> item <*> pure o)
where item = subObj "int" <|> subObj "bool" <|> fail "no item type field"
subObj k = (\v -> object [(k,v)]) <$> o .: k
test1, test2, test3 :: Either String Item
test1 = eitherDecode "{\"id\":\"foo\",\"bool\":false}"
test2 = eitherDecode "{\"id\":\"foo\",\"int\":10}"
test3 = eitherDecode "{\"id\":\"foo\"}"
main = do
print test1
print test2
print test3
Generally, though, unless you're doing this a lot, it's probably better for the sake of clarity and readability to just abandon the generics and write the necessary boilerplate. It's not that onerous, even for your original example. Yes, you have to keep the type and instance in sync, but a couple of simple tests should catch any problems. So, for example, something like:
instance FromJSON Item where
parseJSON = withObject "Item" $ \o ->
Item <$> o .: "object"
<*> o .: "id"
<*> o .:? "organizationId"
<*> o .:? "folderId"
<*> o .: "type"
<*> o .: "name"
<*> o .: "notes"
<*> o .: "favorite"
<*> parseItemType o
<*> o .: "collectionIds"
<*> o .:? "revisionDate"
where parseItemType o =
MkLogin <$> o .: "login"
<|> MkCard <$> o .: "card"
<|> MkIdentity <$> o .: "identity"
<|> MkSecureNote <$> o .: "securenote"
One way would be to have no field at all for ItemType in the Item data declaration. Then use either a tuple or a custom pair type to hold both pieces; so:
data ItemWithType = ItemWithType ItemType Item
instance FromJSON ItemWithType where
parseJSON v = liftA2 ItemWithType (parseJSON v) (parseJSON v)
You can also skip defining ItemWithType and just use
\o -> liftA2 (,) (parseJSON o) (parseJSON o)
directly to parse a tuple of the fields with a consistent name and the object under the variable key.

Strange behaviour parsing an imperative language using Parsec

I'm trying to parse a fragment of the Abap language with Parsec in haskell. The statements in Abap are delimited by dots. The syntax for function definition is:
FORM <name> <arguments>.
<statements>.
ENDFORM.
I will use it as a minimal example.
Here is my attempt at writing the corresponding type in haskell and the parser. The GenStatement-Constructor is for all other statements except function definition as described above.
module Main where
import Control.Applicative
import Data.Functor.Identity
import qualified Text.Parsec as P
import qualified Text.Parsec.String as S
import Text.Parsec.Language
import qualified Text.Parsec.Token as T
type Args = String
type Name = String
data AbapExpr -- ABAP Program
= Form Name Args [AbapExpr]
| GenStatement String [AbapExpr]
deriving (Show, Read)
lexer :: T.TokenParser ()
lexer = T.makeTokenParser style
where
caseSensitive = False
keys = ["form", "endform"]
style = emptyDef
{ T.reservedNames = keys
, T.identStart = P.alphaNum <|> P.char '_'
, T.identLetter = P.alphaNum <|> P.char '_'
}
dot :: S.Parser String
dot = T.dot lexer
reserved :: String -> S.Parser ()
reserved = T.reserved lexer
identifier :: S.Parser String
identifier = T.identifier lexer
argsP :: S.Parser String
argsP = P.manyTill P.anyChar (P.try (P.lookAhead dot))
genericStatementP :: S.Parser String
genericStatementP = P.manyTill P.anyChar (P.try dot)
abapExprP = P.try (P.between (reserved "form")
(reserved "endform" >> dot)
abapFormP)
<|> abapStmtP
where
abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP
Testing the parser with the following input results in a strange behaviour.
-- a wrapper for convenience
parse :: S.Parser a -> String -> Either P.ParseError a
parse = flip P.parse "Test"
testParse1 = parse abapExprP "form foo arg1 arg2 arg2. form bar arg1. endform. endform."
results in
Right (GenStatement "form foo arg1 arg2 arg2" [GenStatement "form bar arg1" [GenStatement "endform" [GenStatement "endform" []]]])
so it seems the first brach always fails and only the second generic branch is successful. However if the second branch (parsing generic statements) is commented parsing forms suddenly succeeds:
abapExprP = P.try (P.between (reserved "form")
(reserved "endform" >> dot)
abapFormP)
-- <|> abapStmtP
where
abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
-- abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP
Now we get
Right (Form "foo" "arg1 arg2 arg2" [Form "bar" "arg1" []])
How is this possible? It seems that the first branch succeeds so why doesn't it work in the first example - what am I missing?
Many thanks in advance!
Looks for me that your parser genericStatementP parses any character until a dot appears (you are using P.anyChar). Hence it doesn't recognize the reserved keywords for your lexer.
I think you must define:
type Args = [String]
and:
argsP :: S.Parser [String]
argsP = P.manyTill identifier (P.try (P.lookAhead dot))
genericStatementP :: S.Parser String
genericStatementP = identifier
With these changes I get the following result:
Right (Form "foo" ["arg1","arg2","arg2"] [Form "bar" ["arg1"] []])

How to parse a sequence of " " separated values in Haskell

I am a beginner in Haskell and i need to parse a sequence of valuses spearate by something.
The following parser is generated with tokenparser:
m_semiSep1 p parses and returns a semicolon-separated sequence of one or more p's.
But i dont quite understand how it is created. I need one that returns a comma separated sequence of p`s. Can you give me a hint how can i do that. I also need to parse a sequence of "=|" separated values but i suppose that it will be the same as with the comma. This is the code i am working on:
def = emptyDef{ commentStart = "{-"
, commentEnd = "-}"
, identStart = letter
, identLetter = alphaNum
, opStart = oneOf "^~&=:-|,"
, opLetter = oneOf "^~&=:-|,"
, reservedOpNames = ["~", "&", "^", ":=", "|-", ","]
, reservedNames = ["true", "false", "nop",
"if", "then", "else", "fi",
"while", "do", "od"]
}
TokenParser{ parens = m_parens
, identifier = m_identifier
, reservedOp = m_reservedOp
, reserved = m_reserved
, semiSep1 = m_semiSep1
, whiteSpace = m_whiteSpace } = makeTokenParser def
You can use sepBy in parsec. sepBy cell deli parses something like cell deli cell deli...
For example:
Prelude> :m Text.ParserCombinators.Parsec
Prelude Text.ParserCombinators.Parsec> let csv = (many letter) `sepBy` (char ',') :: Parser [String]
Prelude Text.ParserCombinators.Parsec> parse csv "" "xx,yy,zz"
Right ["xx","yy","zz"]
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Combinator.html#v:sepBy

Parsing a particular string in Haskell

I'm using the parsec Haskell library.
I want to parse strings of the following kind:
[[v1]][[v2]]
xyz[[v1]][[v2]]
[[v1]]xyz[[v2]]
etc.
I'm interesting to collect only the values v1 and v2, and store these in a data structure.
I tried with the following code:
import Text.ParserCombinators.Parsec
quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))
parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input
main = do {
c <- getContents;
case parse quantifiedVars "(stdin)" c of {
Left e -> do { putStrLn "Error parsing input:"; print e; };
Right r -> do{ putStrLn "ok"; mapM_ print r; };
}
}
In this way, if the input is "[[v1]][[v2]]" the program works fine, returning the following output:
"v1"
"v2"
If the input is "xyz[[v1]][[v2]]" the program doesn't work. In particular, I want only what is contained in [[...]], ignoring "xyz".
Also, I want to store the content of [[...]] in a data structure.
How do you solve this problem?
You need to restructure your parser. You are using combinators in very strange locations, and they mess things up.
A var is a varName between "[[" and "]]". So, write that:
var = between (string "[[") (string "]]") varName
A varName should have some kind of format (I don't think that you want to accept "%A¤%&", do you?), so you should make a parser for that; but in case it really can be anything, just do this:
varName = many $ noneOf "]"
Then, a text containing vars, is something with vars separated by non-vars.
varText = someText *> var `sepEndBy` someText
... where someText is anything except a '[':
someText = many $ noneOf "["
Things get more complicated if you want this to be parseable:
bla bla [ bla bla [[somevar]blabla]]
Then you need a better parser for varName and someText:
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))
-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))
-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["
PS. (<*>), (*>) and (<$>) is from Control.Applicative.

Scala Parser fails on <init>

Edit: I was able to fix it by making changing val to lazy val in the MessageParser class. I forgot that I had previously tested it using def instead of val. Can someone make it clear why this change fixes it?
So, I am currently writing an IRC Server. I decided to use Scala's Combinator Parser library to help me parse the messages. I've been able to correctly parse a message through a test program, but when I attempted to incorporate my parser into an echo server I already wrote I receive the following error message when I make a connection to my server:
Connected to the target VM, address: '127.0.0.1:55567', transport: 'socket'
Exception in thread "main" java.lang.ExceptionInInitializerError
at IRCServer.main(IRCServer.scala)
Caused by: java.lang.NullPointerException
at messages.MessageParser.<init>(MessageParser.scala:11)
at net.Connection.<init>(Connection.scala:14)
at net.Server.start(Server.scala:14)
at IRCServer$.<init>(IRCServer.scala:12)
at IRCServer$.<clinit>(IRCServer.scala)
... 1 more
Disconnected from the target VM, address: '127.0.0.1:55567', transport: 'socket'
The Connection class handles a listener Socket created from a ServerSocket
class Connection(socket: Socket) extends Thread {
private val out = new PrintStream(socket.getOutputStream)
private val in = new BufferedReader(new InputStreamReader(socket.getInputStream))
private val parser = new MessageParser
override def run(): Unit = {
var line = ""
while({(line = in.readLine); line != null}) {
Console.println("received: " + line)
parser.parseLine(line.trim)
out.println("out: " + line)
}
}
}
And the following is my MessageParser:
class MessageParser extends JavaTokenParsers {
def parseLine(line :CharSequence) = {
parseAll(message, line)
}
val message: Parser[Any] = opt(":"~prefix)~command~opt(params) ^^ (x=> {println("message: "+x)})
val prefix: Parser[Any] = nick~"!"~user~"#"~host | servername ^^ (x=> {println("prefix: " +x)})
val nick: Parser[Any] = letter~rep(letter | wholeNumber | special) ^^ (x=> {println("nick: " +x)})
val special: Parser[Any] = "-" | "[" | "]" | "\\" | "`" | "^" | "{" | "}" ^^ (x=> {println("special: " +x)})
val user: Parser[Any] = """[^\s#]+""".r ^^ (x=> {println("user: " +x)})
val host: Parser[Any] = """[\w\.]+\w+""".r ^^ (x=> {println("host: " +x)})
val servername: Parser[Any] = host ^^ (x=> {println("servername: " +x)})
val command: Parser[Any] = """([A-Za-z]+)|([0-9]{3})""".r ^^ (x=> {println("command: " +x)})
val params: Parser[Any] = rep(param)~opt(":"~tail) ^^ (x=> {println("params: " +x)})
val param: Parser[Any] = """[^:][\S]*""".r
val tail: Parser[Any] = """.*$""".r ^^ (x=> {println("tail: " +x)})
val letter: Parser[Any] = """[A-Za-z]""".r ^^ (x=> {println("letter: " +x)})
}
I'm not quite sure what could be causing this. Hopefully I'm just being blind to something small.
lazy val values are populated as-needed; val values are populated in the order you specify. With a parser, earlier entries refer to later ones which don't exist yet. So they'd better be lazy val or def (which one depends on the parser; the packrat parser likes lazy val, while the others usually assume def, but I'm not sure that they require it).
Catch the exception with the following code:
try {
//your code here
} catch {
case err: ExceptionInInitializerError => err.getCause.printStackTrace
}
This will help you to find the reason of the exception.

Resources