lalrpop produces error: unresolved import lexer

I am trying out parser lalrpop with the following grammar:
use std::str::FromStr;
grammar;
pub Term: i32 = {
    <n:Num> => n,
    "(" <t:Term> ")" => t,
};
Num: i32 = <s:r"[0-9]+"> => i32::from_str(s).unwrap();
When I try to use this parser, I get the following error and many similar ones:
error[E0432]: unresolved import `self::__lalrpop_util::lexer`
--> /home/martijn/rust/quadratic-equations-2d/target/debug/build/quadratic-equations-2d-3e9c9d94e3adc760/out/lang.rs:22:31
|
22 | use self::__lalrpop_util::lexer::Token;
| ^^^^^ could not find `lexer` in `__lalrpop_util`
Why doesn't this work?

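You need to enable the lexer feature of your lalrpop-util dependency. When the grammar does not declare its own lexer, the generated parser uses lalrpop's built-in lexer. That lexer lives in lalrpop_util::lexer and is only compiled when the feature is enabled. Add it to your Cargo.toml like so: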
[dependencies]
lalrpop-util = {version = "0.19", features = ["lexer"]}
See https://github.com/lalrpop/lalrpop/issues/538.

Related

Using the Earley library to parse with features and unification

The Earley parsing library is great for writing linguistic parsers in Haskell. CFGs can be specified in an intuitive way, and there is excellent support for backtracking and ambiguity. A simple example:
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Text.Earley
np = rule ("John" <|> "Mary")
vp = rule ("runs" <|> "walks")
sentence = do
  subj <- np
  pred <- vp
  return $ (++) <$> subj <*> pred
sentence can be used to parse ["John", "runs"] or ["Mary", "walks"], among other inputs.
It would be nice to be able to use Earley to write parsers for feature-based CFGs (FCFGs), where nonterminals are complexes of a label and a feature bundle, and feature matching can happen via unification (for example, the Earley parser in NLTK parses FCFGs). However, it is not clear how to do this using Earley, or whether it can even be done. An example of something we might want, in something like BNF:
np[sg] ::= "John" | "Mary"
np[?x] ::= det n[?x]
n[pl] ::= "boys" | "girls"
det ::= "the"
vp[sg] ::= "runs" | "walks"
vp[pl] ::= "run" | "walk"
s ::= np[?x] vp[?x]
Under this FCFG, ["John", "runs"] is an s (since their number features match, as required by the s rule), and ["the", "boys", "walks"] isn't an s (since ["the", "boys"] parses to np[pl] and ["walks"] parses to vp[sg]).
One can in general rewrite an FCFG into an equivalent CFG, but this can be highly inconvenient and can blow up the size of the grammar, especially when there are many features ranging over many possible values.
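The following is a minimal sketch of that rewrite for the toy grammar above, which makes the blowup concrete. It assumes a hypothetical encoding with one feature (number) taking two values, and at most one feature slot per nonterminal. Every rule that mentions ?x is copied once per value, and the feature is folded into the nonterminal name, so np[sg] becomes np_SG. All names here (Slot, Sym, Rule, render, toCFG) are illustrative and not part of Earley.

```haskell
import Data.List (nub)

-- Hypothetical encoding of the toy FCFG: one feature with two values.
data Feature = SG | PL deriving (Show, Eq, Enum, Bounded)

-- A feature slot is either a fixed value or the rule's shared variable ?x.
data Slot = Fixed Feature | VarX deriving (Show, Eq)

-- A right-hand-side symbol: a terminal, or a nonterminal with an optional slot.
data Sym = T String | N String (Maybe Slot) deriving (Show, Eq)

-- Rule lhs lhsSlot rhs, one alternative per rule.
data Rule = Rule String (Maybe Slot) [Sym] deriving (Show, Eq)

fcfg :: [Rule]
fcfg =
  [ Rule "np" (Just (Fixed SG)) [T "John"]
  , Rule "np" (Just (Fixed SG)) [T "Mary"]
  , Rule "np" (Just VarX) [N "det" Nothing, N "n" (Just VarX)]
  , Rule "n" (Just (Fixed PL)) [T "boys"]
  , Rule "n" (Just (Fixed PL)) [T "girls"]
  , Rule "det" Nothing [T "the"]
  , Rule "vp" (Just (Fixed SG)) [T "runs"]
  , Rule "vp" (Just (Fixed SG)) [T "walks"]
  , Rule "vp" (Just (Fixed PL)) [T "run"]
  , Rule "vp" (Just (Fixed PL)) [T "walk"]
  , Rule "s" Nothing [N "np" (Just VarX), N "vp" (Just VarX)]
  ]

-- Render one rule as a plain CFG rule, with ?x instantiated to v and the
-- feature folded into the nonterminal's name.
render :: Feature -> Rule -> String
render v (Rule lhs slot rhs) = unwords (nt lhs slot : "->" : map sym rhs)
  where
    nt name Nothing = name
    nt name (Just (Fixed f)) = name ++ "_" ++ show f
    nt name (Just VarX) = name ++ "_" ++ show v
    sym (T t) = show t
    sym (N name s) = nt name s

-- Instantiate every rule at every feature value. Rules without ?x render
-- identically for each value, so nub keeps a single copy of them.
toCFG :: [Rule] -> [String]
toCFG = concatMap (\r -> nub [render v r | v <- [minBound .. maxBound]])
```

Here toCFG fcfg yields 13 plain rules from 11 feature rules. One of them is a dead rule, np_SG -> det n_SG, because n_SG has no rules. With several independent feature variables in one rule, the copies multiply: a rule with k variables over n values needs up to n^k copies.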
You're not actually doing any particularly interesting unification here, so perhaps it's enough to toss a very simple nondeterminism applicative of your own into the mix. The standard one is [], but for this case, even Maybe looks like enough. Like this:
{-# Language OverloadedStrings #-}
{-# Language TypeApplications #-}
import Control.Applicative
import Control.Monad
import Data.Foldable
import Text.Earley
data Feature = SG | PL deriving (Eq, Ord, Read, Show)
(=:=) :: (Feature, a) -> (Feature, b) -> Maybe (a, b)
(fa, a) =:= (fb, b) = (a, b) <$ guard (fa == fb)
data NP = Name String | Determined String String deriving (Eq, Ord, Read, Show)
np :: Grammar r (Prod r e String (Feature, NP))
np = rule . asum $
  [ fmap (\name -> (SG, Name name)) ("John" <|> "Mary")
  , liftA2 (\det n -> (PL, Determined det n)) "the" ("boys" <|> "girls")
  ]
vp :: Grammar r (Prod r e String (Feature, String))
vp = rule . asum $
  [ (,) SG <$> ("runs" <|> "walks")
  , (,) PL <$> ("run" <|> "walk")
  ]
s :: Grammar r (Prod r e String (Maybe (NP, String)))
s = liftA2 (liftA2 (=:=)) np vp
test :: [String] -> IO ()
test = print . allParses @() (parser s)
Try it out in ghci:
> sequence_ [test (words n ++ [v]) | n <- ["John", "the boys"], v <- ["walks", "walk"]]
([(Just (Name "John","walks"),2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,3)],Report {position = 3, expected = [], unconsumed = []})
([(Just (Determined "the" "boys","walk"),3)],Report {position = 3, expected = [], unconsumed = []})
So the result needs a bit of interpretation (a successful parse of Nothing really counts as a failed parse), but perhaps that's not so bad? Not sure. Certainly it's unfortunate that you don't get to reuse Earley's error-reporting and nondeterminism machinery; to get either, you would probably have to fork Earley.
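If the Nothing results get in the way, a small post-processing step can drop them. This sketch uses mapMaybe from Data.Maybe. successfulParses is a hypothetical helper name, and it only assumes the [(result, position)] shape that allParses returns as the first component of its result.

```haskell
import Data.Maybe (mapMaybe)

-- Drop the parses whose feature check produced Nothing and unwrap the rest,
-- keeping the end position that allParses pairs with each result.
successfulParses :: [(Maybe a, Int)] -> [(a, Int)]
successfulParses = mapMaybe (\(m, pos) -> fmap (\x -> (x, pos)) m)
```

In the module above one might then write successfulParses (fst (allParses @() (parser s) input)). The Report is discarded, so this does not recover Earley's error reporting.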
If you need to do real unification, you could look into returning an IntBindingT t Identity (from the unification-fd package) instead of a Maybe. At least until your features are themselves recursive, though, this is probably enough and much, much simpler.

Parsec: Parsing expression between slashes

I'm trying to parse simple expressions between slashes. Example: / 1+2*3 / should evaluate to 7.
I was trying this:
module Test where
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.Combinator (between)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    ops = ["+","*","-","/",";"]
    names = ["def","extern"]
    style = emptyDef {
               Tok.commentLine = "#"
             , Tok.reservedOpNames = ops
             , Tok.reservedNames = names
             }
integer :: Parser Int
integer = fromIntegral <$> Tok.integer lexer
parens :: Parser a -> Parser a
parens = Tok.parens lexer
braces :: Parser a -> Parser a
braces = Tok.braces lexer
slashes :: Parser a -> Parser a
slashes = between (reserved "/") (reserved "/")
reserved :: String -> Parser ()
reserved = Tok.reserved lexer
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
binary s f assoc = Ex.Infix (reservedOp s >> return f) assoc
table = [[binary "*" (*) Ex.AssocLeft,
          binary "/" div Ex.AssocLeft]
        ,[binary "+" (+) Ex.AssocLeft,
          binary "-" (-) Ex.AssocLeft]]
factor :: Parser Int
factor = try integer
     <|> parens expr
expr :: Parser Int
expr = Ex.buildExpressionParser table factor
programInSlashes :: Parser Int
programInSlashes = slashes expr
programInBraces :: Parser Int
programInBraces = braces expr
which works okay for programInBraces:
*Test> parse programInBraces "" "{ 1+2*3/4 }"
Right 2
however, programInSlashes does fail:
*Test> parse programInSlashes "" "/ 1+2*3/4 /"
Left (line 1, column 12):
unexpected end of input
expecting end of "/", integer or "("
Clearly the problem is that / is both an operator and the delimiter for the program itself. But as the language isn't ambiguous we should be able to parse that, no?
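I think you can still use Text.Parsec.Expr to parse the interior expression, and embed backtracking in the / case so that a / followed by end of input is left for the closing delimiter. Use reservedOp rather than reserved here: with emptyDef, reserved "/" fails when a digit directly follows, as in 3/4. For example, replace binary "/" div Ex.AssocLeft in table with:
Ex.Infix (try $ do { reservedOp "/"; notFollowedBy eof; return div }) Ex.AssocLeft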
You can also parse the exterior language and the interior expression in separate passes. I’ve done this in a compiler for a language with custom operators: first parse the program without touching infix expressions, then run another pass to parse infix expressions according to the operators in scope.
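A complete sketch of the first suggestion could look like the following, trimmed to the pieces needed here. It goes beyond the answer in two ways. First, the delimiters also use reservedOp. Second, the whole input must be one slashed program, because the notFollowedBy eof check only recognizes the closing slash when nothing but whitespace follows it. divide and programInSlashes's trailing eof are choices made for this example.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef {Tok.reservedOpNames = ["+", "*", "-", "/"]}

integer :: Parser Int
integer = fromIntegral <$> Tok.integer lexer

-- reservedOp, not reserved: with emptyDef, reserved "/" fails when a digit
-- follows the slash, as in 3/4.
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

binary :: String -> (Int -> Int -> Int) -> Ex.Operator String () Identity Int
binary s f = Ex.Infix (f <$ reservedOp s) Ex.AssocLeft

-- Division backtracks when its "/" is followed (after whitespace) by end of
-- input, so that slash is left for the closing delimiter.
divide :: Ex.Operator String () Identity Int
divide = Ex.Infix (div <$ try (reservedOp "/" *> notFollowedBy eof)) Ex.AssocLeft

table :: Ex.OperatorTable String () Identity Int
table = [[binary "*" (*), divide], [binary "+" (+), binary "-" (-)]]

factor :: Parser Int
factor = integer <|> Tok.parens lexer expr

expr :: Parser Int
expr = Ex.buildExpressionParser table factor

slashes :: Parser a -> Parser a
slashes = between (reservedOp "/") (reservedOp "/")

-- The whole input must be one slashed program.
programInSlashes :: Parser Int
programInSlashes = Tok.whiteSpace lexer *> slashes expr <* eof
```

With this, parse programInSlashes "" "/ 1+2*3 /" gives Right 7 and "/ 1+2*3/4 /" gives Right 2. If more input could follow the closing slash, the eof lookahead would need to be replaced by something that recognizes whatever comes next.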

Expression parser for unary operator

I was trying to make an expression parser for two operators, of which only ^ is unary (postfix), so with an operand it looks like R^.
The problem is that whenever ^ appears anywhere other than at the end, the parser just stops there and returns whatever it parsed successfully. This means "R;S;T^" parses successfully, but "R^;S;T^" stops at R. However, "(R^);S;T^" works fine.
I followed an online post that handled unary minus, but that is a prefix operator (for example -X). The author's original solution gave errors at reservedOp2, so I modified it to reservedOp2 name = try (string name >> notFollowedBy (oneOf opChar)), and it produces the behaviour described above. I need it to work with or without parentheses.
import Control.Applicative
import Text.ParserCombinators.Parsec hiding (many,optional,(<|>))
import Text.ParserCombinators.Parsec.Expr
import Text.Parsec.Language (haskell)
import qualified Text.Parsec.Token as P
import Text.Parsec.String (Parser)
data RT = Comp RT RT
        | Conv RT
        | Var String
        deriving (Show)
whiteSpace = P.whiteSpace haskell
word = P.identifier haskell
parens = P.parens haskell
opChar = "^;"
reservedOp2 :: String -> CharParser st ()
reservedOp2 name = try (string name >> notFollowedBy (oneOf opChar) >> whiteSpace)
term = parens relexpr
   <|> Var <$> word
   <?> "term"
table = [ [postfix "^" Conv]
        , [binary ";" Comp AssocLeft]
        ]
prefix name fun = Prefix $ reservedOp2 name >> return fun
binary name fun = Infix $ reservedOp2 name >> return fun
postfix name fun = Postfix $ reservedOp2 name >> return fun
relexpr :: Parser RT
relexpr = buildExpressionParser table term <?> "expression"
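Your parser fails on e.g. "R^;S" because postfix "^" Conv fails at notFollowedBy (oneOf opChar), since '^' is followed by ';'. The try then backtracks, the postfix operator is treated as absent, and the expression ends after R. The fix is to remove ';' from opChar: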
opChar = "^"
Or, even easier, if you can, just use reservedOp from haskell:
reservedOp2 = P.reservedOp haskell
Either of these changes fixes the parsing of your examples.
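For reference, here is a self-contained version with the first fix applied. It is a sketch that swaps the deprecated Text.ParserCombinators.Parsec modules for their Text.Parsec equivalents. It also adds a hypothetical parseRT helper that skips leading whitespace and requires the whole input to be consumed.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.Language (haskell)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

data RT = Comp RT RT | Conv RT | Var String
  deriving (Show, Eq)

-- Only '^' may not directly follow an operator; ';' is allowed, so the
-- postfix "^" in "R^;S" is no longer rejected.
opChar :: String
opChar = "^"

reservedOp2 :: String -> Parser ()
reservedOp2 name = try (string name >> notFollowedBy (oneOf opChar) >> P.whiteSpace haskell)

term :: Parser RT
term = P.parens haskell relexpr
   <|> Var <$> P.identifier haskell
   <?> "term"

-- Postfix ^ binds tighter than the left-associative ;
table :: OperatorTable String () Identity RT
table = [ [Postfix (Conv <$ reservedOp2 "^")]
        , [Infix (Comp <$ reservedOp2 ";") AssocLeft]
        ]

relexpr :: Parser RT
relexpr = buildExpressionParser table term <?> "expression"

-- Skip leading whitespace and insist on consuming the whole input.
parseRT :: String -> Either ParseError RT
parseRT = parse (P.whiteSpace haskell *> relexpr <* eof) ""
```

All three inputs from the question should then parse. For example, parseRT "R^;S;T^" gives Right (Comp (Comp (Conv (Var "R")) (Var "S")) (Conv (Var "T"))).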

How can I parse a float with a comma in place of the decimal point?

I want to parse Float values from a file where they are stored using a comma as the decimal separator. Thus I need a function myParse :: String -> Float such that, for instance, myParse "23,46" == 23.46.
I have some ideas about how to do this, but they all seem overcomplicated, for example:
- replace , with a . in the string and use read; or
- follow this FP Complete blog post (entitled Parsing Floats With Parsec), and challenge the curse of the monomorphism restriction.
Is there a simpler way, or do I really need to use a parsing library? In the second case, could you please paste some suggestions in order to get me started? The monomorphism restriction scares me, and I believe that there has to be a way to do this without using language extensions.
Replacing , by . and then calling read is straightforward enough; you just need to remember to use your own specialized function instead of plain old read:
readFloatWithComma :: String -> Float
readFloatWithComma = read . sanitize
  where
    sanitize = map (\c -> if c == ',' then '.' else c)
In GHCi:
λ> readFloatWithComma "23,46"
23.46
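Note that read throws an exception when the sanitized string is not a valid number. If the file may contain bad values, a total variant using readMaybe from Text.Read is a one-line change. readFloatWithCommaMaybe is a name chosen here for illustration.

```haskell
import Text.Read (readMaybe)

-- Same comma-to-dot replacement as readFloatWithComma, but returns Nothing
-- instead of throwing when the result is not a valid Float literal.
readFloatWithCommaMaybe :: String -> Maybe Float
readFloatWithCommaMaybe = readMaybe . map (\c -> if c == ',' then '.' else c)
```

Haskell's number syntax requires a digit before the decimal point, so an input such as ",5" also gives Nothing rather than 0.5.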
Regarding the parsec approach: despite what the article you link to suggests, the monomorphism restriction need not be a worry, as long as you have type signatures for all your top-level bindings. In particular, the following code compiles without any language extensions (at least in GHC 7.10.1):
import Text.Parsec
import Text.Parsec.String ( Parser )
import Control.Applicative hiding ( (<|>) )
infixr 5 <++>
(<++>) :: Applicative f => f [a] -> f [a] -> f [a]
a <++> b = (++) <$> a <*> b
infixr 5 <:>
(<:>) :: Applicative f => f a -> f [a] -> f [a]
a <:> b = (:) <$> a <*> b
number :: Parser String
number = many1 digit
plus :: Parser String
plus = char '+' *> number
minus :: Parser String
minus = char '-' <:> number
integer :: Parser String
integer = plus <|> minus <|> number
float :: Parser Float
float = fmap rd $ integer <++> decimal <++> exponent
  where rd       = read :: String -> Float
        decimal  = option "" $ ('.' <$ char ',') <:> number
        exponent = option "" $ oneOf "eE" <:> integer
In GHCi:
λ> parseTest float "23,46"
23.46

Is there any trick about translating BNF to Parsec program?

The following BNF matches a function call chain (like x(y)(z)...):
expr = term T
T = (expr) T
  | EMPTY
term = (expr)
     | VAR
Translating it into a Parsec program looks tricky:
term :: Parser Term
term = parens expr <|> var
expr :: Parser Term
expr = do whiteSpace
          e <- term
          maybeAddSuffix e
  where addSuffix e0 = do e1 <- parens expr
                          maybeAddSuffix $ TermApp e0 e1
        maybeAddSuffix e = addSuffix e
                           <|> return e
Could you list the common patterns for translating BNF into a Parsec program?
The simplest thing you could do, if your grammar is sizeable, is to just use the Alex/Happy combo. It is fairly straightforward to use and accepts BNF-style grammars directly (no manual translation needed). Perhaps most importantly, it produces blazingly fast parsers/lexers.
If you are dead set on doing it with parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages: first lexing, then parsing. Parsec will do both!
First write the appropriate types:
{-# LANGUAGE LambdaCase #-}
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char
import Control.Applicative hiding ((<|>))
import Control.Monad
data Term = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)
type Lexer = Parsec [Char] () -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token
Parsing a single token is simple. For simplicity, a variable is 1 or more letters. You can of course change this however you like.
oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|>
           (char ')' >> return RParen) <|>
           (Str <$> many1 letter)
Parsing the entire token stream is just parsing a single token many times, possibly separated by whitespace:
lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)
Note the placement of spaces: this way, white space is accepted at the beginning and end of the string.
Since Parser uses a custom token type, you have to use a custom satisfy function. Fortunately, this is almost identical to the existing satisfy.
satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show
                       (\src _ _ -> incSourceColumn src 1)
                       (\x -> if f x then Just x else Nothing)
Then we can write parsers for each of the primitive tokens.
lparen = satisfy' $ \case { LParen -> True ; _ -> False }
rparen = satisfy' $ \case { RParen -> True ; _ -> False }
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })
As you may imagine, parens would be useful for our purposes. It is very straightforward to write.
parens :: Parser a -> Parser a
parens = between lparen rparen
Now the interesting parts.
term, expr, var :: Parser Term
term = parens expr <|> var
var = Var <$> strTok
These two should be fairly obvious to you.
Parsec contains the combinators option and optionMaybe, which are useful when you need to "maybe do something".
expr = do
  e0 <- term
  option e0 (parens expr >>= \e1 -> return (App e0 e1))
The last line means: try the parser given to option, and if it fails without consuming input, return e0 instead.
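The difference between option and optionMaybe can be checked in isolation. So can the caveat that option only falls back to its default when its parser fails without consuming input. This is a small sketch on ordinary Char input; the parser names and the run helper are made up for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- option d p: run p, and return d if p fails without consuming input.
signChar :: Parser Char
signChar = option '+' (char '-')

-- optionMaybe p: the same idea, but the result says whether p matched.
maybeSign :: Parser (Maybe Char)
maybeSign = optionMaybe (char '-')

-- Here p consumes 'a' before failing on a mismatch, so option does NOT fall
-- back to its default; wrapping p in try restores the fallback.
ab, abTry :: Parser String
ab = option "none" (sequence [char 'a', char 'b'])
abTry = option "none" (try (sequence [char 'a', char 'b']))

-- Just the result on success, Nothing on a parse error.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

For instance, run signChar "5" gives Just '+', run ab "ac" gives Nothing, and run abTry "ac" gives Just "none".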
For testing you can do:
tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""
The eof attached to each parser makes sure that the entire input is consumed; the string cannot be a member of the grammar if there are extra trailing characters. Note that this expr accepts at most one parenthesized argument after a term, because it drops the trailing T in T = (expr) T. Your example x(y)(z) is therefore rejected, even though your BNF does allow it:
>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input
whereas the following is accepted:
>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))
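To accept the whole chain from your BNF, map the repetition in T = (expr) T | EMPTY onto many. Folding the arguments to the left then gives the same nesting as your TermApp accumulation. The sketch below keeps the two-stage lexer/parser design. The tok and exactly helpers are a small generalization of satisfy' introduced here; they are not part of Parsec.

```haskell
{-# LANGUAGE LambdaCase #-}
import Control.Monad ((>=>))
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (tokenPrim)

data Term = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec String ()   -- a lexer consumes Chars
type Parser = Parsec [Token] () -- a parser consumes Tokens

oneToken :: Lexer Token
oneToken = (LParen <$ char '(') <|> (RParen <$ char ')') <|> (Str <$> many1 letter)

lexer :: Lexer [Token]
lexer = spaces *> many1 (oneToken <* spaces)

-- Accept one token when f returns Just; each token advances the column by 1.
tok :: (Token -> Maybe a) -> Parser a
tok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

exactly :: Token -> Parser ()
exactly t = tok (\t' -> if t' == t then Just () else Nothing)

parens :: Parser a -> Parser a
parens = between (exactly LParen) (exactly RParen)

var :: Parser Term
var = tok (\case { Str s -> Just (Var s); _ -> Nothing })

-- expr = term T, where T = (expr) T | EMPTY is zero or more parenthesized
-- arguments: many (parens expr), folded left into nested App nodes.
term, expr :: Parser Term
term = parens expr <|> var
expr = foldl App <$> term <*> many (parens expr)

tokAndParse :: String -> Either ParseError Term
tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""
```

With this expr, tokAndParse "x(y)(z)" gives Right (App (App (Var "x") (Var "y")) (Var "z")). "(x(y))(z)" still parses to the same term.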
