I am working on a project for a class and we are tasked with writing a scanner for numbers, symbols, comments, arithmetic operators, parentheses, and EOF in both Python and Racket. I am working on the Racket version and I have written the following line to define one or more characters as a symbol:
[(any-char) (token-CHAR (string->character lexeme))]
I have the following line to define one or more digits as a number:
[(:+ digit) (token-NUM (string->number lexeme))]
I am very new to Racket, this is my third program, so I am not exactly sure how to approach this, so any suggestions are greatly appreciated. I have scoured the Racket documentation, but I wasn't able to find what I was looking for.
Thanks!
Here is a minimal getting-started example - heavily commented.
#lang racket
;;; IMPORT
;; Import the lexer tools
(require parser-tools/yacc
parser-tools/lex
(prefix-in : parser-tools/lex-sre) ; names from lex-sre are prefixed with :
; to avoid name collisions
syntax/readerr)
;;; REGULAR EXPRESSIONS
;; Names for regular expressions matching letters and digits.
;; Note that :or and :/ are prefixed with a : due to (prefix-in : ...) above
(define-lex-abbrevs
[letter (:or (:/ "a" "z") (:/ #\A #\Z) )]
[digit (:/ #\0 #\9)])
;;; TOKENS
;; Tokens such as numbers (and identifiers and strings) carry a value
;; In the example only the NUMBER token is used, but you may need more.
(define-tokens value-tokens (NUMBER IDENTIFIER STRING))
;; Tokens that don't carry a value.
(define-empty-tokens op-tokens (newline := = < > + - * / ^ EOF))
;;; LEXER
;; Here the lexer (aka the scanner) is defined.
;; The construct lexer-src-pos evaluates to a function which scans an input port
;; returning one position-token at a time.
;; A position token contains besides the actual token also source location information
;; (i.e. you can see where in the file the token was read)
(define lex
(lexer-src-pos
[(eof) ; input: end of file
'EOF] ; output: the symbol EOF
[(:or #\tab #\space) ; input: whitespace (newlines are handled by the next rule)
(return-without-pos (lex input-port))] ; output: the next token
; (i.e. skip the whitespace)
[#\newline ; input: newline
(token-newline)] ; output: a newline-token
; ; note: (token-newline) returns 'newline
[(:or ":=" "+" "-" "*" "/" "^" "<" ">" "=") ; input: an operator
(string->symbol lexeme)] ; output: corresponding symbol
[(:+ digit) ; input: digits
(token-NUMBER (string->number lexeme))])) ; output: a NUMBER token whose value is
; ; the number
; ; note: (token-value token)
; returns the number
;;; TEST
(define input (open-input-string "123+456"))
(lex input) ; (position-token (token 'NUMBER 123) (position 1 #f #f) (position 4 #f #f))
(lex input) ; (position-token '+ (position 4 #f #f) (position 5 #f #f))
(lex input) ; (position-token (token 'NUMBER 456) (position 5 #f #f) (position 8 #f #f))
(lex input) ; (position-token 'EOF (position 8 #f #f) (position 8 #f #f))
;; Let's make it a little easier to play with the lexer.
(define (string->tokens s)
(port->tokens (open-input-string s)))
(define (port->tokens in)
(define token (lex in))
(if (eq? (position-token-token token) 'EOF)
'()
(cons token (port->tokens in))))
(map position-token-token (string->tokens "123*45/3")) ; strip positions
; Output:
; (list (token 'NUMBER 123)
; '*
; (token 'NUMBER 45)
; '/
; (token 'NUMBER 3))
For the MVE below, parseString "2!=2" outputs [] rather than the expected Not (Oper Eq 2 2); the input is supposed to be handled by pOper. My guess is that the anonymous function in pOper needs three arguments (three strings) to work, but because the function is only partially applied, just one argument is passed. Is there a way to work around this that preserves the type signature of pOper, handles Not, and leaves the type definitions unchanged?
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))
type Parser a = ReadP a
data Value =
  IntVal Int
  deriving (Eq, Show, Read)
data Exp =
    Const Value
  | Oper Op Exp Exp
  | Not Exp
  deriving (Eq, Show, Read)
data Op = Plus | Minus | Eq
  deriving (Eq, Show, Read)
space :: Parser Char
space = satisfy isSpace
spaces :: Parser String
spaces = many space
space1 :: Parser String
space1 = many1 space
symbol :: String -> Parser String
symbol = token . string
token :: Parser a -> Parser a
token combinator = (do spaces
                       combinator)
parseString input = readP_to_S (do
  e <- pExpr
  token eof
  return e) input
pExpr :: Parser Exp
pExpr = chainl1 pTerm pOper
pTerm :: Parser Exp
pTerm =
  (do
    pv <- numConst
    skipSpaces
    return pv)
numConst :: Parser Exp
numConst =
  (do
    skipSpaces
    y <- munch isDigit
    return (Const (IntVal (read y)))
  )
-- Parser for an operator
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = symbol "+" >> return (Oper Plus)
    <|> (symbol "-" >> return (Oper Minus))
    <|> (symbol "=" >> return (Oper Eq))
    <|> (symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2)))
There's nothing wrong with your parser for !=. Rather, your parser for operators in general is broken: it only parses the first operator correctly. A simpler version of your pOper would be
pOper = a >> b
    <|> (c >> d)
But because of precedence, this isn't the same as (a >> b) <|> (c >> d). Actually, it's a >> (b <|> (c >> d))! So the symbol your first alternative parses is accidentally mandatory. It would parse 2+!=2 instead.
So, you could fix this by just adding in the missing parentheses. But if, like me, you find it a little tacky to rely so much on operator precedence for semantic meaning, consider something that's more obviously safe, using the type system to separate the clauses from the delimiters:
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = asum [ symbol "+" >> return (Oper Plus)
             , symbol "-" >> return (Oper Minus)
             , symbol "=" >> return (Oper Eq)
             , symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2))
             ]
This way, you have a list of independent parsers, not a single parser built with alternation. asum (from Data.Foldable) does the work of combining that list into alternatives. It means the same thing, of course, but you don't have to learn any operator precedence tables, because , can only be a list item separator.
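To see the precedence problem in isolation, here is a minimal sketch using ReadP. The parsers broken and fixed, the helper run, and the string labels they return are hypothetical stand-ins for the question's pOper, not the original code; asum is imported from Data.Foldable.

```haskell
import Control.Applicative ((<|>))
import Data.Foldable (asum)
import Text.ParserCombinators.ReadP

-- Hypothetical stand-ins for pOper: each alternative returns a label.
-- Because >> binds more loosely than <|>, this parses as
--   string "+" >> (return "plus" <|> (string "-" >> return "minus"))
-- so the leading "+" is mandatory for every alternative.
broken :: ReadP String
broken = string "+" >> return "plus"
     <|> (string "-" >> return "minus")

-- One list element per alternative; the commas cannot be misread.
fixed :: ReadP String
fixed = asum
  [ string "+" >> return "plus"
  , string "-" >> return "minus"
  ]

-- Run a parser and require it to consume the whole input.
run :: ReadP a -> String -> [(a, String)]
run p = readP_to_S (p <* eof)

main :: IO ()
main = do
  print (run broken "-")   -- []
  print (run broken "+-")  -- [("minus","")]
  print (run fixed "-")    -- [("minus","")]
  print (run fixed "+-")   -- []
```

The broken version rejects a lone "-" and accepts "+-", which mirrors the 2+!=2 behaviour described above.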
The best way I can think of to solve the problem is by making these two modifications: 1) this alternative in the expression
pExpr :: Parser Exp
pExpr =
  (do pv <- chainl1 pTerm pOper
      pv2 <- pOper2 pv
      return pv2)
  <|> chainl1 pTerm pOper
And 2) this helper function to deal with infix patterns
pOper2 :: Exp -> Parser Exp
pOper2 e1 = (do
  symbol "!="
  e2 <- numConst
  return (Not (Oper Eq e1 e2)))
This is the output, although I don't know whether there will be problems once other operators such as / and *, which have different associativity, are taken into account as well.
ghci> parseString "2+4+6"
[(Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6)),"")]
ghci> parseString "2+4+6 != 2"
[(Not (Oper Eq (Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6))) (Const (IntVal 2))),"")]
ghci> parseString "2 != 4"
[(Not (Oper Eq (Const (IntVal 2)) (Const (IntVal 4))),"")]
I'm currently working on a parser for a simple programming language written in Haskell. I ran into a problem when I tried to allow for binary operators with differing associativities and precedences. Normally this wouldn't be an issue, but since my language allows users to define their own operators, the precedence of operators isn't known by the compiler until the program has already been parsed.
Here are some of the data types I've defined so far:
data Expr
  = Var String
  | Op String Expr Expr
  | ..
data Assoc
  = LeftAssoc
  | RightAssoc
  | NonAssoc
type OpTable =
  Map.Map String (Assoc, Int)
At the moment, the compiler parses all operators as if they were right-associative with equal precedence. So if I give it an expression like a + b * c < d the result will be Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d"))).
I'm trying to write a function called fixExpr which takes an OpTable and an Expr and rearranges the Expr based on the associativities and precedences listed in the OpTable. For example:
operators :: OpTable
operators =
  Map.fromList
    [ ("<", (NonAssoc, 4))
    , ("+", (LeftAssoc, 6))
    , ("*", (LeftAssoc, 7))
    ]
expr :: Expr
expr = Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d")))
fixExpr operators expr should evaluate to Op "<" (Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))) (Var "d").
How do I define the fixExpr function? I've tried multiple solutions and none of them have worked.
An expression e may be an atomic term n (e.g. a variable or literal), a parenthesised expression, or an application of an infix operator ○.
e ⩴ n | (e) | e1 ○ e2
We need the parentheses to know whether the user entered a * b + c, which we happen to associate as a * (b + c) and need to reassociate as (a * b) + c, or if they entered a * (b + c) literally, which should not be reassociated. Therefore I’ll make a small change to the data type:
data Expr
  = Var String
  | Group Expr
  | Op String Expr Expr
  | …
Then the method is simple:
1. The rebracketing of an expression ⟦e⟧ applies recursively to all its subexpressions.
   1. ⟦n⟧ = n
   2. ⟦(e)⟧ = (⟦e⟧)
   3. ⟦e1 ○ e2⟧ = ⦅⟦e1⟧ ○ ⟦e2⟧⦆
2. A single reassociation step ⦅e⦆ removes redundant parentheses on the right, and reassociates nested operator applications leftward in two cases: if the left operator has higher precedence, or if the two operators have equal precedence, and are both left-associative. It leaves nested infix applications alone, that is, associating rightward, in the opposite cases: if the right operator has higher precedence, or the operators have equal precedence and right associativity. If the associativities are mismatched, then the result is undefined.
   1. ⦅e ○ n⦆ = e ○ n
   2. ⦅e1 ○ (e2)⦆ = ⦅e1 ○ e2⦆
   3. ⦅e1 ○ (e2 ● e3)⦆ =
      1. ⦅e1 ○ e2⦆ ● e3, if:
         a. P(○) > P(●); or
         b. P(○) = P(●) and A(○) = A(●) = L
      2. e1 ○ (e2 ● e3), if:
         a. P(○) < P(●); or
         b. P(○) = P(●) and A(○) = A(●) = R
      3. undefined otherwise
NB.: P(o) and A(o) are respectively the precedence and associativity (L or R) of operator o.
This can be translated fairly literally to Haskell:
-- Needs: import Data.Maybe (fromMaybe) and import qualified Data.Map as Map.
-- Assoc also needs "deriving Show" for the error message below.
fixExpr operators = reassoc
  where
    -- 1.1
    reassoc e@Var{} = e
    -- 1.2
    reassoc (Group e) = Group (reassoc e)
    -- 1.3
    reassoc (Op o e1 e2) = reassoc' o (reassoc e1) (reassoc e2)
    -- 2.1
    reassoc' o e1 e2@Var{} = Op o e1 e2
    -- 2.2
    reassoc' o e1 (Group e2) = reassoc' o e1 e2
    -- 2.3
    reassoc' o1 e1 r@(Op o2 e2 e3) = case compare prec1 prec2 of
      -- 2.3.1a
      GT -> assocLeft
      -- 2.3.2a
      LT -> assocRight
      EQ -> case (assoc1, assoc2) of
        -- 2.3.1b
        (LeftAssoc, LeftAssoc) -> assocLeft
        -- 2.3.2b
        (RightAssoc, RightAssoc) -> assocRight
        -- 2.3.3
        _ -> error $ concat
          [ "cannot mix ‘", o1
          , "’ ("
          , show assoc1
          , " "
          , show prec1
          , ") and ‘"
          , o2
          , "’ ("
          , show assoc2
          , " "
          , show prec2
          , ") in the same infix expression"
          ]
      where
        (assoc1, prec1) = opInfo o1
        (assoc2, prec2) = opInfo o2
        assocLeft = Op o2 (Group (reassoc' o1 e1 e2)) e3
        assocRight = Op o1 e1 r
    opInfo op = fromMaybe (notFound op) (Map.lookup op operators)
    notFound op = error $ concat
      [ "no precedence/associativity defined for ‘"
      , op
      , "’"
      ]
Note the recursive call in assocLeft: by reassociating the operator applications, we may have revealed another association step, as in a chain of left-associative operator applications like a + b + c + d = (((a + b) + c) + d).
I insert Group constructors in the output for illustration, but they can be removed at this point, since they’re only necessary in the input.
This hasn’t been tested very thoroughly at all, but I think the idea is sound, and should accommodate modifications for more complex situations, even if the code leaves something to be desired.
An alternative that I’ve used is to parse expressions as “flat” sequences of operators applied to terms, and then run a separate parsing pass after name resolution, using e.g. Parsec’s operator precedence parser facility, which would handle these details automatically.
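As a rough illustration of the "flat sequence, then a separate pass" idea, here is a sketch that uses precedence climbing rather than Parsec's facility. The Flat type and the resolve function are hypothetical names introduced for this example; Expr, Assoc and OpTable follow the question. It returns Nothing for an unknown operator or for chained non-associative operators, and it does not try to detect mixed left/right associativity at the same precedence.

```haskell
import qualified Data.Map as Map

data Expr = Var String | Op String Expr Expr
  deriving (Eq, Show)

data Assoc = LeftAssoc | RightAssoc | NonAssoc
  deriving (Eq, Show)

type OpTable = Map.Map String (Assoc, Int)

-- Hypothetical flat form: a first operand followed by (operator, operand) pairs.
type Flat = (Expr, [(String, Expr)])

-- Precedence climbing over the flat form.
resolve :: OpTable -> Flat -> Maybe Expr
resolve table (first, rest) = do
  (e, leftover) <- climb 0 first rest
  if null leftover then Just e else Nothing
  where
    climb :: Int -> Expr -> [(String, Expr)] -> Maybe (Expr, [(String, Expr)])
    climb _ lhs [] = Just (lhs, [])
    climb minPrec lhs ops@((o, rhs) : more) = do
      (assoc, prec) <- Map.lookup o table
      if prec < minPrec
        then Just (lhs, ops)          -- operator belongs to an enclosing level
        else do
          -- Right-associative operators may recurse at the same precedence.
          let nextMin = if assoc == RightAssoc then prec else prec + 1
          (rhs', remaining) <- climb nextMin rhs more
          case remaining of
            ((o2, _) : _)
              | assoc == NonAssoc
              , Just (_, prec2) <- Map.lookup o2 table
              , prec2 == prec -> Nothing   -- e.g. a < b < c
            _ -> climb minPrec (Op o lhs rhs') remaining

operators :: OpTable
operators = Map.fromList
  [ ("<", (NonAssoc, 4))
  , ("+", (LeftAssoc, 6))
  , ("*", (LeftAssoc, 7))
  ]

main :: IO ()
main = do
  -- a + b * c < d
  print (resolve operators (Var "a", [("+", Var "b"), ("*", Var "c"), ("<", Var "d")]))
  -- a < b < c is rejected
  print (resolve operators (Var "a", [("<", Var "b"), ("<", Var "c")]))
```

For the question's example a + b * c < d, this yields the tree the question asks for, with "<" at the root.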
I was looking at these two resources (https://github.com/racket/parser-tools/blob/master/parser-tools-lib/parser-tools/examples/calc.rkt and https://gist.github.com/gcr/1318240) and although I don't fully understand yet how the main calc function works I wondered if it's possible to extend this to work for a simple c like program just without functions? So it will lex, parse and evaluate ifs, whiles and print statements. So something like (define-empty-tokens op-tokens ( newline = OC CC (open-curly/closed-curly for block statements) DEL PRINT WHILE (WHILE exp S) S IF S1 S2 (IF exp S1 S2) OP CP + - * / || % or && == != >= <= > < EOF ))
Here is how I've extended it (the code of the first link) so far to also work with booleans:
So in calcl I added these two lines:
[ (:= 2 #\|) (token-||)]
[(:or "=" "+" "-" "*" "/" "%" "&&" "==" "!=" ">=" "<=" ">" "<") (string->symbol lexeme)]
And then later:
(define calcp
(parser
(start start)
(end newline EOF)
(tokens value-tokens op-tokens)
(error (lambda (a b c) (void)))
(precs (right =)
(left ||)
(left &&)
(left == !=)
(left <= >= < >)
(left - +)
(left * / %)
)
(grammar
(start [() #f]
[(error start) $2]
[(exp) $1])
(exp [(NUM) $1]
[(VAR) (hash-ref vars $1 (lambda () 0))]
[(VAR = exp) (begin (hash-set! vars $1 $3)
$3)]
[(exp || exp) (if (not(and (equal? $1 0) (equal? $3 0) )) 1 0) ]
[(exp && exp) (and $1 $3)]
[(exp == exp) (equal? $1 $3)]
[(exp != exp) (not(equal? $1 $3))]
[(exp < exp) (< $1 $3)]
[(exp > exp) (> $1 $3)]
[(exp >= exp) (>= $1 $3)]
[(exp <= exp) (<= $1 $3)]
[(exp + exp) (+ $1 $3)]
[(exp - exp) (- $1 $3)]
[(exp * exp) (* $1 $3)]
[(exp / exp) (/ $1 $3)]
[(exp % exp) (remainder $1 $3)]
[(OP exp CP) $2]))))
But I'm struggling to understand the above code as well as the code below. I would like to change it so that it also works for ifs, whiles, etc., if that's at all possible.
(define (calc ip)
(port-count-lines! ip)
(letrec ((one-line
(lambda ()
(let ((result (calcp (lambda () (calcl ip)) )))
(when result (printf "~a\n" result) (one-line))
)
) ))
(one-line))
)
Also, this guy seems to be relying on newlines to mark the end of a statement. i.e. you can't have more than 1 statement on one line. I want the program to recognise two statements on one line and evaluate them separately by somehow looking ahead and checking whether there's a new undeclared variable, special keyword or open/closed bracket etc.
Update:
I managed, with the below rules, to build in brag an AST for arith expressions but how do I get rid of all but the important parens so that I can evaluate it?
Eg: with input list: (list (token 'NUM 17) '+ (token 'NUM 1) '* (token 'NUM 3) '/ 'OP (token 'NUM 6) '- (token 'NUM 5) 'CP)
I'm getting back:
'(exp (((((factor 17)))) + (((((factor 1))) * ((factor 3))) / ((((((factor 6)))) - (((factor 5))))))))
Here are my rules:
exp : add
/add : add ('+' mul)+ | add ('-' mul)+ | mul
/mul : mul ('*' atom)+ | mul ('/' atom)+ | mul ('%' atom)+ | atom
/atom : /OP add /CP | factor
factor : NUM | ID
You cannot easily implement a language with conditionals and looping constructs using an evaluator based on immediate evaluation.
That should be clear at least for loops. If you have something like (using a super-simplified syntax):
repeat 3 { i = i + 1 }
If you evaluate during the parse, i = i + 1 will be evaluated exactly once, since the string is parsed exactly once. In order for it to be evaluated several times, the parser needs to convert i = i + 1 into something that can be evaluated several times when the repeat is evaluated.
This something is usually an Abstract Syntax Tree (AST), or possibly a list of virtual machine operations. With Scheme, you could also just turn the expression being parsed into a function (a closure) that is called each time the loop body runs.
All of this is totally practical and not even particularly difficult, but you do need to be prepared to do some reading, both about parsing and about generating executables. For the latter, I highly recommend the classic Structure and Interpretation of Computer Programs (Abelson & Sussman).
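To make the point about loops concrete, here is a small sketch of an AST that is built once and then evaluated as often as needed. It is written in Haskell for brevity; the types Expr and Stmt, the functions evalExpr and exec, and the rule that unset variables read as 0 are all assumptions made for this illustration, not part of the calculator code in the question.

```haskell
import qualified Data.Map as Map

-- Variable store; unset variables are assumed to read as 0.
type Env = Map.Map String Int

data Expr
  = Num Int
  | Ref String
  | Add Expr Expr

data Stmt
  = Assign String Expr
  | Repeat Int [Stmt]   -- repeat n { body }

evalExpr :: Env -> Expr -> Int
evalExpr _   (Num n)   = n
evalExpr env (Ref x)   = Map.findWithDefault 0 x env
evalExpr env (Add a b) = evalExpr env a + evalExpr env b

-- Executing a statement returns the updated store.
exec :: Env -> Stmt -> Env
exec env (Assign x e)    = Map.insert x (evalExpr env e) env
exec env (Repeat n body) = iterate runBody env !! n
  where runBody e = foldl exec e body

-- repeat 3 { i = i + 1 }, parsed once into a tree
program :: Stmt
program = Repeat 3 [Assign "i" (Add (Ref "i") (Num 1))]

main :: IO ()
main = print (Map.toList (exec Map.empty program))  -- [("i",3)]
```

The body of the loop exists as data, so exec can run it three times even though the source text was parsed only once.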
Based on what I've seen here and here I managed to implement the lex and parse phase of an interpreter for a simple c-like program. It doesn't have functions in it but it has assignments, variables, conditionals, loops and print statements (As well as arithmetic, it also contains logical expressions.). I'm posting it below, in case others may find it useful (The whole thing including the evaluation phase and samples of input is here):
(require parser-tools/yacc ; provides you with the lexer, the parser and the lexeme tools, e.g. string->symbol, string->number etc. In general, with the ability to map the literals in the input
parser-tools/lex
(prefix-in : parser-tools/lex-sre))
(define-tokens value-tokens (NUM VAR ))
(define-empty-tokens op-tokens ( newline = OC CC DEL OP CP + - * / || % or && == != >= <= > < EOF PRINT WHILE IF ELSE ))
(define vars (make-hash)) ;to store the values in the variables
(define-lex-abbrevs
(lower-letter (:/ "a" "z"))
(upper-letter (:/ #\A #\Z))
(digit (:/ "0" "9")))
(define calcl ;lexer is mapping the literals to their tokens or values
(lexer
[(eof) 'EOF]
[(:or #\tab #\space #\return #\newline ) (calcl input-port)]
[ (:= 2 #\|) (token-||)]
[(:or "=" "+" "-" "*" "/" "%" "&&" "==" "!=" ">=" "<=" ">" "<") (string->symbol lexeme)]
["(" 'OP]
[")" 'CP]
["{" 'OC]
["}" 'CC]
[ "print" 'PRINT ]
[#\, 'DEL ]
[ "while" 'WHILE ]
[ "if" 'IF ]
[ "else" 'ELSE ]
[(:+ (:or lower-letter upper-letter)) (token-VAR (string->symbol lexeme))]
[(:+ digit) (token-NUM (string->number lexeme))]
))
(define calcp ;defines how to parse the program and how the program structure is made up (recursively)
(parser
(start start);refers to the block named 'start' below. Every parser has a start, end, tokens definition, optional error message when needed and operator precedence def.
(end EOF)
(tokens value-tokens op-tokens )
(error (λ(ok? name value) (if (boolean? value) (printf "Couldn't parse: ~a\n" name) (printf "Couldn't parse: ~a\n" value))))
(precs ;sets the precedence of the operators in relation to each other - that is, to which operand they bind stronger
(left DEL);from lowest to highest
(right =)
(left ||)
(left &&)
(left == !=)
(left <= >= < >)
(left - +)
(left * / %)
(right OP)
(left CP)
(right OC)
(left CC)
)
(grammar ;what is the grammar of my program?
(start
[() '()] ; returns empty list when it matches onto nothing
[(statements) `(,$1)]
[(statements start) `(,$1,$2)] ;we can have more than one statement - one example of the recursiveness
)
(statements ;what type of major statements we might have
[(var = exp ) `(assign ,$1 ,$3)]
[(IF ifState) $2]
[(WHILE while) $2]
[(PRINT printVals) `(print ,$2)]
)
(ifState ; It's assumed that you cannot have an if inside an if unless it's within curly braces
[(OP exp CP statements) `(if ,$2 ,$4)] ;combinations of different major statements
[(OP exp CP block) `(if ,$2 ,$4)]
[(OP exp CP block ELSE statements ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP statements ELSE block ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP statements ELSE statements ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP block ELSE block ) `(if ,$2 ,$4 ,$6)]
)
(while
[(OP exp CP block) `(while ,$2, $4)]
)
(block
[(OC start CC) $2] ;we can have statements or an entire program wrapped in curly braces - a block
)
(var
[(VAR) $1]
)
(printVals
[(exp DEL printVals ) `(,$1 ,$3)]
[(exp) $1]
)
(exp [(NUM) $1] ; smallest (most reducible) chunk in an expression when the chunk is an integer
[(VAR) $1] ; smallest (most reducible) chunk in an expression when the chunk is a variable
[(exp || exp) `((lambda (a b) (or a b)) ,$1 ,$3) ]
[(exp && exp) `((lambda (a b) (and a b)) ,$1 ,$3)]
[(exp == exp) `(equal? ,$1 ,$3)]
[(exp != exp) `(not(equal? ,$1 ,$3))]
[(exp < exp) `(< ,$1 ,$3)]
[(exp > exp) `(> ,$1 ,$3)]
[(exp >= exp) `(>= ,$1 ,$3)]
[(exp <= exp) `(<= ,$1 ,$3)]
[(exp + exp) `(+ ,$1 ,$3)]
[(exp - exp) `(- ,$1 ,$3)]
[(exp * exp) `(* ,$1 ,$3)]
[(exp / exp) `(quotient ,$1 ,$3)]
[(exp % exp) `(modulo ,$1 ,$3)]
[(OP exp CP) $2]) ;when the expressions are wrapped parentheses
)
)
)
; This procedure calls the parser, passing it a lambda that calls the lexer on the input. The parser uses that lambda to get tokens from the lexer, as many at a time as it sees fit,
; i.e. according to what the parsing rules above prescribe
(define (calceval ip)
(calcp (lambda () (calcl ip))))
To see the evaluator (unfortunately without much comments yet) see here
I have a language, PLANG, that supports evaluating a polynomial on a sequence of points (numbers). The language allows expressions of the form {{poly C1 C2 … Ck} {P1 P2 … Pℓ}}, where all Ci and all Pj are valid AE expressions (and both k ≥ 1 and ℓ ≥ 1).
I was trying to write a parser for this language. Here is what I have so far:
(define-type PLANG
[Poly (Listof AE) (Listof AE)])
(define-type AE
[Num Number]
[Add AE AE]
[Sub AE AE]
[Mul AE AE]
[Div AE AE])
(: parse-sexpr : Sexpr -> AE)
;; to convert s-expressions into AEs
(define (parse-sexpr sexpr)
(match sexpr
[(number: n) (Num n)]
[(list '+ lhs rhs) (Add (parse-sexpr lhs)
(parse-sexpr rhs))]
[(list '- lhs rhs) (Sub (parse-sexpr lhs)
(parse-sexpr rhs))]
[(list '* lhs rhs) (Mul (parse-sexpr lhs)
(parse-sexpr rhs))]
[(list '/ lhs rhs) (Div (parse-sexpr lhs)
(parse-sexpr rhs))]
[else (error 'parse-sexpr "bad syntax in ~s"
sexpr)]))
(: parse : String -> PLANG)
;; parses a string containing a PLANG expression to a PLANG AST
(define (parse str)
(let ([code (string->sexpr str)])
(parse-sexpr (code) )))
(test (parse "{{poly 1 2 3} {1 2 3}}")
=> (Poly (list (Num 1) (Num 2) (Num 3))
(list (Num 1) (Num 2) (Num 3))))
(test (parse "{{poly } {1 2} }")
=error> "parse: at least one coefficient is
required in ((poly) (1 2))")
(test (parse "{{poly 1 2} {} }")
=error> "parse: at least one point is
required in ((poly 1 2) ())")
When I try to run it, I get the following errors:
Type Checker: Cannot apply expression of type (U (Listof Sexpr) Boolean Real String Symbol), since it is not a function type in: (code)
. Type Checker: type mismatch
expected: Poly
given: AE in: (parse-sexpr (code))
. Type Checker: Summary: 2 errors encountered in:
(code)
(parse-sexpr (code))
>
Any help would be appreciated..
The first problem is caused by an extra pair of parentheses. Keep in mind that in Racket, Typed Racket, and #lang pl, parentheses usually mean function application like this:
(function argument ...)
So when you write (code), it tries to interpret code as a function, to call it with zero arguments.
You can fix this problem by replacing (code) with code in the body of the parse function.
(define (parse str)
(let ([code (string->sexpr str)])
(parse-sexpr code)))
The second problem happens because you specified that the parse function should return a PLANG, but it instead returns the result of parse-sexpr which returns an AE.
Another way of wording this is that you've implemented parsing for AEs, but not for PLANGs.
In the code below I can correctly parse white spaces after each of the tokens using Parsec:
whitespace = skipMany (space <?> "")
number :: Parser Integer
number = result <?> "number"
  where
    result = do {
      ds <- many1 digit;
      whitespace;
      return (read ds)
    }
table = result
  where
    result = [
      [Infix (genParser '*' (*)) AssocLeft,
       Infix (genParser '/' div) AssocLeft],
      [Infix (genParser '+' (+)) AssocLeft,
       Infix (genParser '-' (-)) AssocLeft]]
genParser s f = char s >> whitespace >> return f
factor = parenExpr <|> number <?> "parens or number"
  where
    parenExpr = do {
      char '(';
      x <- expr;
      char ')';
      whitespace;
      return x
    }
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
However I get a parse error when trying to only parse white spaces before, and after the operators:
whitespace = skipMany (space <?> "")
number :: Parser Integer
number = result <?> "number"
  where
    result = do {
      ds <- many1 digit;
      return (read ds)
    }
table = result
  where
    result = [
      [Infix (genParser '*' (*)) AssocLeft,
       Infix (genParser '/' div) AssocLeft],
      [Infix (genParser '+' (+)) AssocLeft,
       Infix (genParser '-' (-)) AssocLeft]]
genParser s f = whitespace >> char s >> whitespace >> return f
factor = parenExpr <|> number <?> "parens or number"
  where
    parenExpr = do {
      char '(';
      x <- expr;
      char ')';
      return x
    }
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
The parse error is:
$ ./parsec_example < <(echo "2 * 2 * 3")
"(stdin)" (line 2, column 1):
unexpected end of input
expecting "*"
Why does this happen? Is there some other way to parse white space around just the operators?
When I test your code, 2 * 2 * 3 parses correctly, but 2 + 2 does not. Parsing fails because the parser for * consumes some input and backtracking isn't enabled at that position, so other parsers cannot be tried.
An expression parser created by buildExpressionParser tries to parse each operator in turn until one succeeds. When parsing 2 + 2, the following occurs:
The first 2 is matched by number. The rest of the input is + 2 (note the space at the beginning).
The parser genParser '*' (*) is applied to the input. It consumes the space, but does not match the + character.
The other infix operator parsers automatically fail because some input was consumed by genParser '*' (*).
You can fix this by wrapping the critical part of the parser in try. This saves the input until after char s succeeds. If char s fails, then buildExpressionParser can backtrack and try another infix operator.
genParser s f = try (whitespace >> char s) >> whitespace >> return f
The drawback of this parser is that, because it backtracks to before the leading whitespace before an infix operator, it repeatedly scans whitespace. It is usually better to parse whitespace after a successful match, like the OP's first parser example.
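For comparison, here is a sketch of the "whitespace after each token" style, which needs no try at all. The helpers lexeme, op and parseExpr are names chosen for this example (they are not from the question), and the code assumes the parsec package is available.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

whitespace :: Parser ()
whitespace = skipMany space

-- Run a parser, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* whitespace

number :: Parser Integer
number = lexeme (read <$> many1 digit) <?> "number"

-- An operator parser fails without consuming input when the character
-- does not match, so the next operator in the table can still be tried.
op :: Char -> a -> Parser a
op c f = lexeme (char c) >> return f

table :: OperatorTable String () Identity Integer
table =
  [ [Infix (op '*' (*)) AssocLeft, Infix (op '/' div) AssocLeft]
  , [Infix (op '+' (+)) AssocLeft, Infix (op '-' (-)) AssocLeft]
  ]

factor :: Parser Integer
factor = parens expr <|> number <?> "parens or number"
  where parens p = lexeme (char '(') *> p <* lexeme (char ')')

expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"

-- Skip leading whitespace once, then require the whole input to be consumed.
parseExpr :: String -> Either ParseError Integer
parseExpr = parse (whitespace *> expr <* eof) "(input)"

main :: IO ()
main = do
  print (parseExpr "2 * 2 * 3")
  print (parseExpr "2 + 2")
  print (parseExpr " (1 + 2) * 3\n")
```

Because every token parser leaves the input positioned at the start of the next token, whitespace is scanned only once, and a trailing newline (as produced by echo) is consumed by the last token.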