How to parse a sequence of " " separated values in Haskell - parsing

I am a beginner in Haskell and I need to parse a sequence of values separated by something.
The following parser is generated with makeTokenParser:
m_semiSep1 p parses and returns a semicolon-separated sequence of one or more p's.
But I don't quite understand how it is created. I need one that returns a comma-separated sequence of p's. Can you give me a hint on how to do that? I also need to parse a sequence of "=|"-separated values, but I suppose that will work the same way as with the comma. This is the code I am working on:
def = emptyDef{ commentStart = "{-"
              , commentEnd = "-}"
              , identStart = letter
              , identLetter = alphaNum
              , opStart = oneOf "^~&=:-|,"
              , opLetter = oneOf "^~&=:-|,"
              , reservedOpNames = ["~", "&", "^", ":=", "|-", ","]
              , reservedNames = ["true", "false", "nop",
                                 "if", "then", "else", "fi",
                                 "while", "do", "od"]
              }

TokenParser{ parens = m_parens
           , identifier = m_identifier
           , reservedOp = m_reservedOp
           , reserved = m_reserved
           , semiSep1 = m_semiSep1
           , whiteSpace = m_whiteSpace } = makeTokenParser def

You can use sepBy from Parsec. sepBy cell delim parses something like cell delim cell delim ...
For example:
Prelude> :m Text.ParserCombinators.Parsec
Prelude Text.ParserCombinators.Parsec> let csv = (many letter) `sepBy` (char ',') :: Parser [String]
Prelude Text.ParserCombinators.Parsec> parse csv "" "xx,yy,zz"
Right ["xx","yy","zz"]
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Combinator.html#v:sepBy
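
For the concrete question: the token parser built by makeTokenParser also exposes a commaSep1 field, which is to commas exactly what semiSep1 is to semicolons, and for the "=|" separator you can combine sepBy1 with the symbol (or reservedOp) parser yourself. A sketch, assuming the def record from the question is in scope (the m_commaSep1, m_symbol, commaList and eqBarList names are only illustrative):

-- commaSep1 and symbol are existing fields of TokenParser; we just bind them
-- the same way the question binds semiSep1.
TokenParser{ commaSep1 = m_commaSep1
           , symbol    = m_symbol } = makeTokenParser def

-- one or more p's separated by commas, e.g. "p, p, p"
commaList p = m_commaSep1 p

-- one or more p's separated by "=|", e.g. "p =| p =| p"
eqBarList p = p `sepBy1` m_symbol "=|"

sepBy1 comes from the same Parsec module as sepBy and requires at least one p, matching the behaviour of semiSep1.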

Related

FParsec: match strings which have one of two patterns

I'm trying to learn FParsec and am trying to match strings which follow one of two patterns.
The string can either be an ordinary string like "string" or it can be a string with one dot in it, like "st.ring".
The parser should look like this: Parser<(string Option * string),unit>. The first string is optional, depending on whether the string is split by a dot or not. The optional string represents the part of the string before the ".".
I have tried a few different things but I feel this attempt was the closest:
let charstilldot = manyCharsTill anyChar (pstring ".")
let parser = opt(charstilldot) .>>. (many1Chars anyChar)
This works with input like "st.ring" but not with "string", since no dot exists in the latter.
I would very much appreciate some help, thank you!
EDIT:
I have a solution which basically parses the parts in order and swaps the arguments depending on whether there is a dot in the string or not:
let colTargetWithoutDot : Parser<string Option,unit> = spaces |>> fun _ -> None
let colTargetWithDot = (pstring "." >>. alphastring) |>> Some
let specificColumn = alphastring .>>. (colTargetWithDot <|> colTargetWithoutDot) |>> (fun (h,t) ->
match h,t with
| h,None -> (None,h)
| h,Some(t) -> (Some(h),t))
However, this is not pretty, so I would still appreciate another solution!
I think the main problem here is that charstilldot consumes characters even when it fails. In that situation, many1Chars then fails because the entire input has already been consumed. The easiest way to address this is to use attempt to roll back when there is no dot:
let charstilldot = attempt (manyCharsTill anyChar (pstring "."))
let parser = opt(charstilldot) .>>. (many1Chars anyChar)
Result:
"str.ing" -> (Some "str", "ing")
"string" -> (None, "string")
I think there are other good solutions as well, but I've tried to give you one that requires the least change to your current code.
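
The same pitfall exists in Parsec, which the other questions here use: manyTill consumes input before it fails, and try is what restores it. A minimal Haskell sketch of the analogous fix (the charsTillDot and parser names are only illustrative):

import Control.Applicative ((<$>), (<*>))
import Text.Parsec
import Text.Parsec.String (Parser)

-- 'try' plays the role of FParsec's 'attempt': if no dot turns up, the
-- characters consumed while searching for it are handed back.
charsTillDot :: Parser String
charsTillDot = try (manyTill anyChar (string "."))

parser :: Parser (Maybe String, String)
parser = (,) <$> optionMaybe charsTillDot <*> many1 anyChar

-- parse parser "" "st.ring"  gives  Right (Just "st", "ring")
-- parse parser "" "string"   gives  Right (Nothing, "string")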

Strange behaviour parsing an imperative language using Parsec

I'm trying to parse a fragment of the ABAP language with Parsec in Haskell. The statements in ABAP are delimited by dots. The syntax for function definition is:
FORM <name> <arguments>.
<statements>.
ENDFORM.
I will use it as a minimal example.
Here is my attempt at writing the corresponding type in Haskell and the parser. The GenStatement constructor is for all statements other than the function definitions described above.
module Main where

import Control.Applicative
import Data.Functor.Identity
import qualified Text.Parsec as P
import qualified Text.Parsec.String as S
import Text.Parsec.Language
import qualified Text.Parsec.Token as T

type Args = String
type Name = String

data AbapExpr -- ABAP Program
  = Form Name Args [AbapExpr]
  | GenStatement String [AbapExpr]
  deriving (Show, Read)

lexer :: T.TokenParser ()
lexer = T.makeTokenParser style
  where
    caseSensitive = False
    keys = ["form", "endform"]
    style = emptyDef
      { T.reservedNames = keys
      , T.identStart = P.alphaNum <|> P.char '_'
      , T.identLetter = P.alphaNum <|> P.char '_'
      }

dot :: S.Parser String
dot = T.dot lexer

reserved :: String -> S.Parser ()
reserved = T.reserved lexer

identifier :: S.Parser String
identifier = T.identifier lexer

argsP :: S.Parser String
argsP = P.manyTill P.anyChar (P.try (P.lookAhead dot))

genericStatementP :: S.Parser String
genericStatementP = P.manyTill P.anyChar (P.try dot)

abapExprP = P.try (P.between (reserved "form")
                             (reserved "endform" >> dot)
                             abapFormP)
        <|> abapStmtP
  where
    abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
    abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP
Testing the parser with the following input results in a strange behaviour.
-- a wrapper for convenience
parse :: S.Parser a -> String -> Either P.ParseError a
parse = flip P.parse "Test"
testParse1 = parse abapExprP "form foo arg1 arg2 arg2. form bar arg1. endform. endform."
results in
Right (GenStatement "form foo arg1 arg2 arg2" [GenStatement "form bar arg1" [GenStatement "endform" [GenStatement "endform" []]]])
So it seems the first branch always fails and only the second, generic branch succeeds. However, if the second branch (parsing generic statements) is commented out, parsing forms suddenly succeeds:
abapExprP = P.try (P.between (reserved "form")
                             (reserved "endform" >> dot)
                             abapFormP)
        -- <|> abapStmtP
  where
    abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
    -- abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP
Now we get
Right (Form "foo" "arg1 arg2 arg2" [Form "bar" "arg1" []])
How is this possible? It seems that the first branch succeeds so why doesn't it work in the first example - what am I missing?
Many thanks in advance!
It looks to me like your parser genericStatementP parses any character until a dot appears (you are using P.anyChar). Hence it doesn't recognize the reserved keywords of your lexer.
I think you must define:
type Args = [String]
and:
argsP :: S.Parser [String]
argsP = P.manyTill identifier (P.try (P.lookAhead dot))
genericStatementP :: S.Parser String
genericStatementP = identifier
With these changes I get the following result:
Right (Form "foo" ["arg1","arg2","arg2"] [Form "bar" ["arg1"] []])

Re-define "stringLiteral" token in Parsec.Token

I am developing a Pascal language parser in Haskell using the Parsec library, and I need to redefine some of the tokens defined in Parsec.Token.
Specifically, here is my case:
I need to change how the stringLiteral token is matched. In the default definition it is something between '"' characters (see this), but I need it to be between '\'' (apostrophes). How can I make this modification to Parsec's behaviour?
Thanks!!!
You are talking about adjusting a field of a data type named GenTokenParser. It looks like you are using a function that automatically fills in the data type with sensible defaults and you just want to adjust one thing. Here you go:
myMakeTokenParser langDef =
  let defaults = makeTokenParser langDef   -- 'default' is a reserved word in Haskell
  in defaults { stringLiteral = newStringLit }
  where
    -- Note: 'lexeme' and 'stringChar' here are the helper parsers defined
    -- inside makeTokenParser in Text.Parsec.Token, so this is easiest to do
    -- as a copy of the library's own definition (see the self-contained
    -- sketch below for a simpler alternative).
    newStringLit = lexeme (
      do{ str <- between (char '\'')
                         (char '\'' <?> "end of string")
                         (many stringChar)
        ; return (foldr (maybe id (:)) "" str)
        }
      <?> "literal string")

Parsing a particular string in Haskell

I'm using the parsec Haskell library.
I want to parse strings of the following kind:
[[v1]][[v2]]
xyz[[v1]][[v2]]
[[v1]]xyz[[v2]]
etc.
I'm interested in collecting only the values v1 and v2 and storing them in a data structure.
I tried with the following code:
import Text.ParserCombinators.Parsec
quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))
parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input
main = do {
    c <- getContents;
    case parse quantifiedVars "(stdin)" c of {
      Left e  -> do { putStrLn "Error parsing input:"; print e; };
      Right r -> do { putStrLn "ok"; mapM_ print r; };
    }
  }
In this way, if the input is "[[v1]][[v2]]" the program works fine, returning the following output:
"v1"
"v2"
If the input is "xyz[[v1]][[v2]]" the program doesn't work. In particular, I want only what is contained in [[...]], ignoring "xyz".
Also, I want to store the content of [[...]] in a data structure.
How do you solve this problem?
You need to restructure your parser. You are using combinators in very strange locations, and they mess things up.
A var is a varName between "[[" and "]]". So, write that:
var = between (string "[[") (string "]]") varName
A varName should have some kind of format (I don't think that you want to accept "%A¤%&", do you?), so you should make a parser for that; but in case it really can be anything, just do this:
varName = many $ noneOf "]"
Then, a text containing vars is something with vars separated by non-vars:
varText = someText *> var `sepEndBy` someText
... where someText is anything except a '[':
someText = many $ noneOf "["
Things get more complicated if you want this to be parseable:
bla bla [ bla bla [[somevar]blabla]]
Then you need a better parser for varName and someText:
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))
-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))
-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["
PS. (<*>), (*>) and (<$>) are from Control.Applicative.
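
For the simple case without nested brackets, the pieces above combine into the following self-contained sketch (nothing here beyond standard Parsec and Control.Applicative; the varText name comes from the answer):

import Control.Applicative ((*>))
import Text.ParserCombinators.Parsec

-- a variable name: anything up to the closing brackets
varName :: Parser String
varName = many (noneOf "]")

-- a variable: a name between [[ and ]]
var :: Parser String
var = between (string "[[") (string "]]") varName

-- filler text: anything that cannot start a variable
someText :: Parser String
someText = many (noneOf "[")

-- leading filler, then vars separated (and optionally ended) by filler
varText :: Parser [String]
varText = someText *> var `sepEndBy` someText

-- parse varText "(test)" "xyz[[v1]][[v2]]"  gives  Right ["v1","v2"]
-- parse varText "(test)" "[[v1]]xyz[[v2]]"  gives  Right ["v1","v2"]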

Scala: Using StandardTokenParser for parsing hexadecimal numbers

I am using Scala's combinator parsers by extending scala.util.parsing.combinator.syntactical.StandardTokenParsers. This class provides the following methods:
def ident : Parser[String] for parsing identifiers and
def numericLit : Parser[String] for parsing a number (decimal I suppose)
I am using scala.util.parsing.combinator.lexical.Scanners from scala.util.parsing.combinator.lexical.StdLexical for lexing.
My requirement is to parse a hexadecimal number (without the 0x prefix) which can be of any length. Basically a grammar like: ([0-9]|[a-f])+
I tried integrating a regex parser but there are type issues there. Other ways of extending the lexer's delimiters and the grammar rules lead to "token not found"!
As I thought, the problem can be solved by extending the behaviour of the lexer rather than the parser. The standard lexer accepts only decimal digits, so I created a new lexer:
class MyLexer extends StdLexical {
  override type Elem = Char
  override def digit = (super.digit | hexDigit)
  lazy val hexDigits = Set[Char]() ++ "0123456789abcdefABCDEF".toArray
  lazy val hexDigit = elem("hex digit", hexDigits.contains(_))
}
And my parser (which has to be a StandardTokenParsers) can be extended as follows:
object ParseAST extends StandardTokenParsers {
  override val lexical: MyLexer = new MyLexer()
  lexical.delimiters += ("(", ")", ",", "#")
  ...
}
The construction of the "number" from digits is taken care of by the StdLexical class:
class StdLexical {
  ...
  def token: Parser[Token] =
    ...
    | digit ~ rep(digit) ^^ { case first ~ rest => NumericLit(first :: rest mkString "") }
}
Since StdLexical just gives the parsed number as a String, that is not a problem for me, as I am not interested in the numeric value anyway.
You can use RegexParsers with an action associated with the token in question.
import scala.util.parsing.combinator._

object HexParser extends RegexParsers {
  val hexNum: Parser[Int] =
    """[0-9a-f]+""".r ^^ { case s: String => Integer.parseInt(s, 16) }

  def seq: Parser[Any] = repsep(hexNum, ",")
}
This defines a parser that reads comma-separated hex numbers with no leading 0x, and it will actually return an Int.
val result = HexParser.parse(HexParser.seq, "1, 2, f, 10, 1a2b34d")
scala> println(result)
[1.21] parsed: List(1, 2, 15, 16, 27439949)
Note that there is no way to distinguish decimal-notation numbers. Also, I'm using Integer.parseInt, which is limited to the size of your Int. To handle any length you may have to make your own parser and use BigInteger or arrays.
