How to combine StringParser with ParserT in PureScript?

How can I combine functions from both modules:
import Text.Parsing.Parser.String (noneOf)
import Text.Parsing.StringParser.Combinators (many)
For my parser userInput:
userInput :: Parser String
userInput = many (noneOf ['\t', '\n'])
PureScript complains:
Cannot unify type
Text.Parsing.Parser.ParserT Prim.String _0
with type
Text.Parsing.StringParser.Parser
Documentation on StringParser and ParserT.

Related

Haskell interpret literal types written to file

I write values of a record type (ClientCore) to a file using show. Reading the file back with readFile returns a String. What would be the easiest way to split this String and get back the individual ClientCore values? I'm struggling a bit here (note: Haskell beginner getting my feet wet).
{-# LANGUAGE OverloadedStrings, DeriveGeneric #-}
import System.IO
import Data.Text
import Data.Aeson
import Web.Scotty
import GHC.Generics
import qualified Data.ByteString.Lazy as BL
import Control.Applicative
import Data.Monoid ((<>))
data ClientCore = ClientCore { clId :: Int
                             , clName :: String
                             , clCore :: String
                             , clClass :: Int
                             } deriving (Show, Generic)
parseReadClientCore getFile = undefined
constructClientData :: String -> String -> String -> String -> ClientCore
constructClientData clId' clName' clCore' clClass' =
    ClientCore { clId = parse_clId
               , clName = clName'
               , clCore = clCore'
               , clClass = parse_clClass
               }
  where
    parse_clId = read $ clId' :: Int
    parse_clClass = read $ clClass' :: Int
newClientCore :: IO ()
newClientCore = do
    putStr "Client ID: "; clId <- getLine
    putStr "Name: "; clName <- getLine
    putStr "Core business: "; clCore <- getLine
    putStr "Classification: "; clClass <- getLine
    postClientCore <- return (constructClientData clId clName clCore clClass)
    appendFile "haskelltypes.txt" $ (show postClientCore)
readClientCore :: IO ()
readClientCore = do
    getFile <- readFile "haskelltypes.txt"
    return (parseReadClientCore getFile)
Read is the inverse of Show and will let you convert one String into one ClientCore, provided ClientCore also derives Read.
I think it will be easier to first split the file into strings that each represent one ClientCore, and then read each piece. The easiest way is to append a newline after each value when writing the file, and then use lines to split the contents.
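The newline-plus-lines idea can be sketched as follows. This is only a minimal illustration, not the asker's final code: the helper names serializeClientCore, parseClientCores, appendClientCore and readClientCores are hypothetical, Read and Eq are added to the deriving clause as an assumption, and readMaybe is used so that malformed lines are skipped instead of crashing.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

data ClientCore = ClientCore
  { clId    :: Int
  , clName  :: String
  , clCore  :: String
  , clClass :: Int
  } deriving (Show, Read, Eq)  -- Read is needed to parse what show produced

-- Hypothetical helper: one record per line. show escapes any newline
-- inside a field, so each record stays on a single line.
serializeClientCore :: ClientCore -> String
serializeClientCore c = show c ++ "\n"

-- Hypothetical helper: split the file contents into lines and read each
-- one; lines that do not parse as a ClientCore are dropped.
parseClientCores :: String -> [ClientCore]
parseClientCores = mapMaybe readMaybe . lines

-- Append one record to the file at the given path.
appendClientCore :: FilePath -> ClientCore -> IO ()
appendClientCore path = appendFile path . serializeClientCore

-- Read every record back from the file at the given path.
readClientCores :: FilePath -> IO [ClientCore]
readClientCores path = parseClientCores <$> readFile path
```

With this layout, readClientCores "haskelltypes.txt" would return the list of records previously written with appendClientCore.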

Parsec sepBy Haskell

I wrote a function and it compiles, but I'm not sure whether it works the way I intend or how to call it in the terminal. Essentially, I want to take a string like ("age",5),("age",6) and turn it into a list of tuples [("age1",5)...]. I am trying to write a function that separates the items at the commas, and either I don't know how to call it in the terminal or I wrote it wrong.
items :: Parser (String,Integer) -> Parser [(String,Integer)]
items p = do { p <- sepBy strToTup (char ",");
return p }
I'm not sure what you want, and I don't know what Parser is.
Starting from a string such as:
thestring = "(\"age\",5),(\"age\",6),(\"age\",7)"
I would first remove the commas between the tuples with a regular expression:
import Text.Regex
rgx = mkRegex "\\),\\("
thestring' = subRegex rgx thestring ")("
This gives:
>>> thestring'
"(\"age\",5)(\"age\",6)(\"age\",7)"
Then I would split:
import Data.List.Split
thelist = split (startsWith "(") thestring'
which gives:
>>> thelist
["(\"age\",5)","(\"age\",6)","(\"age\",7)"]
This is what you want, if I understood correctly.
That's probably not the best way. Since every element of the final list has the form ("age", X), you could instead extract all the numbers (I haven't tried it, but it should not be difficult) and build the final list from them. That may be better.
Apologies if this has nothing to do with your question.
Edit
JFF ("just for fun"), another way:
import Data.Char (isDigit)
import Data.List.Split
thestring = "(\"age\",15),(\"age\",6),(\"age\",7)"
ages = (split . dropBlanks . dropDelims . whenElt) (not . isDigit) thestring
map (\age -> "(age," ++ age ++ ")") ages
-- result: ["(age,15)","(age,6)","(age,7)"]
Or rather:
>>> map (\age -> ("age",age)) ages
[("age","15"),("age","6"),("age","7")]
Or if you want integers:
>>> map (\age -> ("age", read age :: Int)) ages
[("age",15),("age",6),("age",7)]
Or if you want age1, age2, ...:
import Data.List.Index
imap (\i age -> ("age" ++ show (i+1), read age :: Int)) ages
-- result: [("age1",15),("age2",6),("age3",7)]
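For comparison, the sepBy idea from the question can also be sketched directly in Parsec. This is a minimal sketch under assumptions: the question does not show strToTup, so tupleP below is a hypothetical stand-in for it, and itemsP and parseItems are likewise illustrative names. Note that char takes a Char (','), not a String.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the question's strToTup:
-- parses one tuple such as ("age",5).
tupleP :: Parser (String, Integer)
tupleP = do
  _    <- char '('
  name <- between (char '"') (char '"') (many (noneOf "\""))
  _    <- char ','
  n    <- many1 digit
  _    <- char ')'
  return (name, read n)

-- Zero or more tuples separated by single commas.
itemsP :: Parser [(String, Integer)]
itemsP = tupleP `sepBy` char ','

-- Run the parser on a whole string; eof rejects trailing garbage.
parseItems :: String -> Either ParseError [(String, Integer)]
parseItems = parse (itemsP <* eof) "(input)"
```

In GHCi, parseItems "(\"age\",5),(\"age\",6)" should give Right [("age",5),("age",6)]; numbering the names (age1, age2, ...) can then be done on the resulting list, for example with zipWith.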

Strange behaviour parsing an imperative language using Parsec

I'm trying to parse a fragment of the ABAP language with Parsec in Haskell. Statements in ABAP are terminated by dots. The syntax for a function definition is:
FORM <name> <arguments>.
<statements>.
ENDFORM.
I will use it as a minimal example.
Here is my attempt at writing the corresponding type and parser in Haskell. The GenStatement constructor covers all statements other than the function definition described above.
module Main where
import Control.Applicative
import Data.Functor.Identity
import qualified Text.Parsec as P
import qualified Text.Parsec.String as S
import Text.Parsec.Language
import qualified Text.Parsec.Token as T
type Args = String
type Name = String
data AbapExpr -- ABAP Program
  = Form Name Args [AbapExpr]
  | GenStatement String [AbapExpr]
  deriving (Show, Read)
lexer :: T.TokenParser ()
lexer = T.makeTokenParser style
  where
    caseSensitive = False
    keys = ["form", "endform"]
    style = emptyDef
      { T.reservedNames = keys
      , T.identStart = P.alphaNum <|> P.char '_'
      , T.identLetter = P.alphaNum <|> P.char '_'
      }
dot :: S.Parser String
dot = T.dot lexer
reserved :: String -> S.Parser ()
reserved = T.reserved lexer
identifier :: S.Parser String
identifier = T.identifier lexer
argsP :: S.Parser String
argsP = P.manyTill P.anyChar (P.try (P.lookAhead dot))
genericStatementP :: S.Parser String
genericStatementP = P.manyTill P.anyChar (P.try dot)
abapExprP = P.try (P.between (reserved "form")
                             (reserved "endform" >> dot)
                             abapFormP)
        <|> abapStmtP
  where
    abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
    abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP
Testing the parser with the following input results in a strange behaviour.
-- a wrapper for convenience
parse :: S.Parser a -> String -> Either P.ParseError a
parse = flip P.parse "Test"
testParse1 = parse abapExprP "form foo arg1 arg2 arg2. form bar arg1. endform. endform."
results in
Right (GenStatement "form foo arg1 arg2 arg2" [GenStatement "form bar arg1" [GenStatement "endform" [GenStatement "endform" []]]])
So it seems the first branch always fails and only the second, generic branch succeeds. However, if the second branch (parsing generic statements) is commented out, parsing forms suddenly succeeds:
abapExprP = P.try (P.between (reserved "form")
                             (reserved "endform" >> dot)
                             abapFormP)
     -- <|> abapStmtP
  where
    abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
    -- abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP
Now we get
Right (Form "foo" "arg1 arg2 arg2" [Form "bar" "arg1" []])
How is this possible? It seems that the first branch succeeds so why doesn't it work in the first example - what am I missing?
Many thanks in advance!
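It looks to me like your parser genericStatementP consumes any character until a dot appears (you are using P.anyChar), so it does not respect the reserved keywords of your lexer. Inside a form, many abapExprP therefore swallows "endform." as a generic statement, the closing reserved "endform" then finds nothing left to match, and P.try backtracks out of the whole form branch into abapStmtP.
I think you must define: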
type Args = [String]
and:
argsP :: S.Parser [String]
argsP = P.manyTill identifier (P.try (P.lookAhead dot))
genericStatementP :: S.Parser String
genericStatementP = identifier
With these changes I get the following result:
Right (Form "foo" ["arg1","arg2","arg2"] [Form "bar" ["arg1"] []])
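If generic statements should still be free-form text rather than a single identifier, another option is to make the statement parser refuse to start at the closing keyword. The following is a simplified sketch, not the answer's code: it uses plain Parsec combinators instead of the token lexer, is case-sensitive, flattens GenStatement to hold just its text, and the helper names keyword and untilDot are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified version of the question's type (an assumption of this sketch).
data AbapExpr
  = Form String String [AbapExpr]
  | GenStatement String
  deriving (Show, Eq)

-- Hypothetical helper: a whole keyword (not a prefix of a longer word),
-- followed by optional whitespace. Fails without consuming input.
keyword :: String -> Parser ()
keyword w = try (string w *> notFollowedBy alphaNum) *> spaces

-- Hypothetical helper: everything up to the terminating dot; the dot and
-- any following whitespace are consumed.
untilDot :: Parser String
untilDot = manyTill anyChar (char '.') <* spaces

formP :: Parser AbapExpr
formP = do
  keyword "form"
  name <- many1 (alphaNum <|> char '_') <* spaces
  args <- untilDot
  body <- many exprP
  keyword "endform" *> char '.' *> spaces
  return (Form name args body)

-- A generic statement may be anything up to a dot, except that it must
-- not begin with "endform"; this lets the enclosing form see its terminator.
stmtP :: Parser AbapExpr
stmtP = notFollowedBy (keyword "endform") *> (GenStatement <$> untilDot)

exprP :: Parser AbapExpr
exprP = formP <|> stmtP
```

Under these assumptions, parse (exprP <* eof) "Test" "form foo arg1 arg2 arg2. form bar arg1. endform. endform." should yield Right (Form "foo" "arg1 arg2 arg2" [Form "bar" "arg1" []]), because the inner many exprP now stops at "endform" instead of consuming it.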

Parsing CSV header into list of parsers

I want to parse the first line of a CSV file and get a list of parsers as a result, but I am failing miserably.
After some simplification I arrived at code that I think should work, but it does not, and I don't understand why.
Here it is:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text
import Data.Attoparsec.Text
import Control.Applicative
doTestSep :: [String] -> Parser [String]
doTestSep t = do
    (endOfLine >> return t)
        <|> (char ';' *> doTestParse t)
doTestParse :: [String] -> Parser [String]
doTestParse t = do
    (string "<FIELD1>" *> doTestSep ("field1" : t))
        <|> (string "<FIELD2>" *> doTestSep ("field2" : t))
test = parseOnly (doTestParse []) "<FIELD1>"
I call test, expecting to get something like
> Right ["field1"]
but instead I get
> Left "Failed reading: takeWith"
What am I doing wrong?
The problem was the input: my real title line will always end with \n or \r\n, which endOfLine consumes, but my example input had no \n in it.
The working version is:
test = parseOnly (doTestParse []) "<FIELD1>\n"
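If the header might also be the last line of a file without a trailing newline, one possible variation is to accept end of input as well as end of line. This is a sketch under assumptions: the names headerSep, headerField and parseHeader are hypothetical renamings of the functions above, and the accumulated list is reversed at the end so the fields come out in file order (the original prepends, which yields reverse order).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Text (Text)

-- After a field: either the header ends (newline or end of input),
-- or a ';' introduces the next field.
headerSep :: [String] -> Parser [String]
headerSep acc =
      ((endOfLine <|> endOfInput) >> return (reverse acc))
  <|> (char ';' *> headerField acc)

-- One known field tag, then whatever follows it.
headerField :: [String] -> Parser [String]
headerField acc =
      (string "<FIELD1>" *> headerSep ("field1" : acc))
  <|> (string "<FIELD2>" *> headerSep ("field2" : acc))

-- Parse a complete header line.
parseHeader :: Text -> Either String [String]
parseHeader = parseOnly (headerField [])
```

With this variation, both parseHeader "<FIELD1>" and parseHeader "<FIELD1>\n" should succeed with Right ["field1"].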

Parsing a particular string in Haskell

I'm using the parsec Haskell library.
I want to parse strings of the following kind:
[[v1]][[v2]]
xyz[[v1]][[v2]]
[[v1]]xyz[[v2]]
etc.
I'm interested in collecting only the values v1 and v2 and storing them in a data structure.
I tried with the following code:
import Text.ParserCombinators.Parsec
quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))
parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input
main = do {
    c <- getContents;
    case parse quantifiedVars "(stdin)" c of {
        Left e -> do { putStrLn "Error parsing input:"; print e; };
        Right r -> do { putStrLn "ok"; mapM_ print r; };
    }
}
In this way, if the input is "[[v1]][[v2]]" the program works fine, returning the following output:
"v1"
"v2"
If the input is "xyz[[v1]][[v2]]" the program doesn't work. In particular, I want only what is contained in [[...]], ignoring "xyz".
Also, I want to store the content of [[...]] in a data structure.
How do you solve this problem?
You need to restructure your parser. You are using combinators in very strange locations, and they mess things up.
A var is a varName between "[[" and "]]". So, write that:
var = between (string "[[") (string "]]") varName
A varName should have some kind of format (I don't think that you want to accept "%A¤%&", do you?), so you should make a parser for that; but in case it really can be anything, just do this:
varName = many $ noneOf "]"
Then a text containing vars is a sequence of vars separated by non-vars:
varText = someText *> var `sepEndBy` someText
... where someText is anything except a '[':
someText = many $ noneOf "["
Things get more complicated if you want this to be parseable:
bla bla [ bla bla [[somevar]blabla]]
Then you need a better parser for varName and someText:
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))
-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))
-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["
PS. (<*>), (*>) and (<$>) are from Control.Applicative.
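Putting the simple pieces above together gives a small module that can be loaded into GHCi. This is only a sketch of the basic variant (no single brackets inside the text); the type signatures and the parseVars wrapper are additions for illustration, and the applicative operators come from the Prelude in recent GHC versions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A variable: a name enclosed in [[ and ]].
var :: Parser String
var = between (string "[[") (string "]]") varName

-- Anything up to the first ']'.
varName :: Parser String
varName = many (noneOf "]")

-- Filler text between variables: anything up to the first '['.
someText :: Parser String
someText = many (noneOf "[")

-- Optional leading text, then variables separated (and optionally
-- followed) by more filler text.
varText :: Parser [String]
varText = someText *> (var `sepEndBy` someText)

-- Hypothetical wrapper: collect the variable names into a list.
parseVars :: String -> Either ParseError [String]
parseVars = parse (varText <* eof) "(input)"
```

For the three inputs from the question ("[[v1]][[v2]]", "xyz[[v1]][[v2]]" and "[[v1]]xyz[[v2]]"), parseVars should return Right ["v1","v2"], so the list itself can serve as the data structure holding the values.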
