I'm trying to write a parser in Scala for SML using tokens. It almost works the way I want, except that it currently parses
let fun f x = r and fun g y in r end;
instead of
let fun f x = r and g y in r end;
How do I change my code so that it recognizes that it doesn't need a FunToken for the second function?
def parseDef: Def = {
  currentToken match {
    case ValToken => {
      eat(ValToken);
      val nme: String = currentToken match {
        case IdToken(x) => {advance; x}
        case _ => error("Expected a name after VAL.")
      }
      eat(EqualToken);
      VAL(nme, parseExp)
    }
    case FunToken => {
      eat(FunToken);
      val fnme: String = currentToken match {
        case IdToken(x) => {advance; x}
        case _ => error("Expected a function name after FUN.")
      }
      val xnme: String = currentToken match {
        case IdToken(x) => {advance; x}
        case _ => error("Expected a parameter name after the function name.")
      }
      def parseAnd: Def = currentToken match {
        case AndToken => {eat(AndToken); FUN(fnme, xnme, parseExp, parseAnd)}
        case _ => NOFUN
      }
      FUN(fnme, xnme, parseExp, parseAnd)
    }
    case _ => error("Expected VAL or FUN.");
  }
}
Just implement the right grammar. Instead of
def ::= "val" id "=" exp | fun
fun ::= "fun" id id "=" exp ["and" fun]
SML's grammar actually is
def ::= "val" id "=" exp | "fun" fun
fun ::= id id "=" exp ["and" fun]
Btw, I think there are other problems with your parsing of fun. AFAICS, you are not parsing any "=" in the fun case. Moreover, after an "and", you are not even parsing any identifiers, just the function body.
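A minimal recursive-descent sketch of the corrected grammar follows. It assumes hypothetical token and AST types (ValToken, FunToken, AndToken, EqualToken, IdToken, VAL, FUN, NOFUN) modeled on the names in the question, and it replaces parseExp with a stand-in that reads a single identifier, since the real definitions are not shown. The FunToken is consumed once in parseDef. parseFun then reads the name, the parameter and the "=" for every binding, and after an "and" it expects an identifier, not a FunToken.

```scala
object SmlDefs {
  // Hypothetical tokens and AST nodes, modeled on the names in the question.
  sealed trait Token
  case object ValToken extends Token
  case object FunToken extends Token
  case object AndToken extends Token
  case object EqualToken extends Token
  final case class IdToken(name: String) extends Token

  sealed trait Exp
  final case class Var(name: String) extends Exp // stand-in for real expressions

  sealed trait Def
  final case class VAL(name: String, body: Exp) extends Def
  final case class FUN(name: String, param: String, body: Exp, next: Def) extends Def
  case object NOFUN extends Def

  class DefParser(tokens: List[Token]) {
    private var rest: List[Token] = tokens

    private def currentToken: Option[Token] = rest.headOption
    private def advance(): Unit = { rest = rest.tail }
    private def eat(t: Token): Unit =
      if (currentToken.contains(t)) advance()
      else sys.error(s"Expected $t but found ${currentToken.getOrElse("end of input")}")

    private def parseId(what: String): String = currentToken match {
      case Some(IdToken(x)) => advance(); x
      case _                => sys.error(s"Expected $what.")
    }

    // Hypothetical stand-in for the question's parseExp: a single identifier.
    private def parseExp: Exp = Var(parseId("an expression"))

    // def ::= "val" id "=" exp | "fun" fun
    def parseDef: Def = currentToken match {
      case Some(ValToken) =>
        eat(ValToken)
        val name = parseId("a name after VAL")
        eat(EqualToken)
        VAL(name, parseExp)
      case Some(FunToken) =>
        eat(FunToken) // consumed once, before the whole chain of bindings
        parseFun
      case _ => sys.error("Expected VAL or FUN.")
    }

    // fun ::= id id "=" exp ["and" fun]
    private def parseFun: Def = {
      val name = parseId("a function name")
      val param = parseId("a parameter name")
      eat(EqualToken)
      val body = parseExp
      val next =
        if (currentToken.contains(AndToken)) { eat(AndToken); parseFun } // no FunToken here
        else NOFUN
      FUN(name, param, body, next)
    }
  }
}
```

With this structure, the token sequence for `fun f x = r and g y = r` yields a two-element FUN chain. Writing `and fun g ...` is rejected because parseFun expects an identifier after "and".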
You could inject the FunToken back into your input stream with an "uneat" function. This is not the most elegant solution, but it's the one that requires the least modification of your current code.
def parseAnd: Def = currentToken match {
  case AndToken => {
    eat(AndToken);
    uneat(FunToken);
    FUN(fnme, xnme, parseExp, parseAnd)
  }
  case _ => NOFUN
}
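The question does not show how eat and currentToken are implemented, so here is one way uneat could work. It assumes the remaining tokens are held in an immutable List behind a mutable variable. TokenStream and its methods are hypothetical names used for illustration.

```scala
object Pushback {
  // Hypothetical token type; only the cases needed for the demo.
  sealed trait Token
  case object FunToken extends Token
  case object AndToken extends Token
  final case class IdToken(name: String) extends Token

  class TokenStream(tokens: List[Token]) {
    private var rest: List[Token] = tokens

    def currentToken: Option[Token] = rest.headOption

    def eat(t: Token): Unit =
      if (rest.headOption.contains(t)) rest = rest.tail
      else sys.error(s"Expected $t")

    // Put a token back at the front of the input, so the next currentToken/eat
    // sees it as if it had never been consumed.
    def uneat(t: Token): Unit = { rest = t :: rest }
  }
}
```

Prepending to a List is constant time, so pushing a token back costs nothing, and several tokens can be pushed back in a row if needed.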
I'm just fooling about and strangely found it a bit tricky to parse nested brackets in a simple recursive function.
For example, if the program's purpose is to look up user details, it may go from {{name surname} age} to {Bob Builder age} and then to Bob Builder 20.
Here is a mini-program for summing totals in curly brackets that demonstrates the concept.
// Parses the string recursively, eliminating one bracket pair at a time
def parse(s: String): String = {
  if (!s.contains("{")) s
  else {
    parse(resolvePair(s))
  }
}
// Sums one pair and returns the string, starting at the deepest nested pair
// e.g.
// {2+10} lollies and {3+{4+5}} peanuts
// should return:
// {2+10} lollies and {3+9} peanuts
def resolvePair(s: String): String = {
  ??? // Replace the deepest nested pair with its sumString result
}
// Sums the values in a string, returning the result as a string
// e.g. sumString("3+8") returns "11"
def sumString(s: String): String = {
  val v = s.split("\\+")
  v.map(_.toInt).sum.toString
}
// Should return "12 lollies and 12 peanuts"
parse("{2+10} lollies and {3+{4+5}} peanuts")
Any ideas for a clean bit of code to replace the ??? would be great. I'm searching for an elegant solution to this problem mostly out of curiosity.
Parser combinators can handle this kind of situation:
import scala.util.parsing.combinator.RegexParsers
object BraceParser extends RegexParsers {
  override def skipWhitespace = false
  def number = """\d+""".r ^^ { _.toInt }
  def sum: Parser[Int] = "{" ~> (number | sum) ~ "+" ~ (number | sum) <~ "}" ^^ {
    case x ~ "+" ~ y => x + y
  }
  def text = """[^{}]+""".r
  def chunk = sum ^^ {_.toString } | text
  def chunks = rep1(chunk) ^^ {_.mkString} | ""
  def apply(input: String): String = parseAll(chunks, input) match {
    case Success(result, _) => result
    case failure: NoSuccess => scala.sys.error(failure.msg)
  }
}
Then:
BraceParser("{2+10} lollies and {3+{4+5}} peanuts")
//> res0: String = 12 lollies and 12 peanuts
It takes some up-front investment to get comfortable with parser combinators, but I think it is really worth it.
To help you decipher the syntax above:
- Regular expressions and strings have implicit conversions to primitive parsers with String results; they have type Parser[String].
- The ^^ operator applies a function to the parsed result.
  - It can convert a Parser[String] into a Parser[Int] by doing ^^ {_.toInt}.
  - Parser is a monad, and Parser[T].^^(f) is equivalent to Parser[T].map(f).
- The ~, ~> and <~ operators require inputs to appear in a certain sequence.
  - ~> and <~ drop one side of the input from the result.
  - case a ~ b lets you pattern match on the results.
  - Parser is a monad, and (p ~ q) ^^ { case a ~ b => f(a, b) } is equivalent to for (a <- p; b <- q) yield f(a, b).
  - (p <~ q) ^^ f is equivalent to for (a <- p; _ <- q) yield f(a).
- rep1 is a repetition of one or more elements.
- | tries to match the input with the parser on its left and, if that fails, tries the parser on the right.
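Here is a small demo of the equivalences listed above. It assumes the scala-parser-combinators module is on the classpath (it has been a separate library since Scala 2.11). The object and parser names are illustrative.

```scala
import scala.util.parsing.combinator.RegexParsers

object Equivalences extends RegexParsers {
  val number: Parser[Int] = """\d+""".r ^^ { _.toInt }

  // p ^^ f is the same as p.map(f)
  val viaCaret: Parser[Int] = """\d+""".r ^^ (_.toInt)
  val viaMap: Parser[Int] = regex("""\d+""".r).map(_.toInt)

  // ~ with pattern matching vs. a for comprehension (flatMap/map)
  val sumTilde: Parser[Int] = number ~ ("+" ~> number) ^^ { case a ~ b => a + b }
  val sumFor: Parser[Int] = for (a <- number; _ <- literal("+"); b <- number) yield a + b

  // rep1 and |: one or more numbers, each either bare or in parentheses
  val numbers: Parser[List[Int]] = rep1(number | ("(" ~> number <~ ")"))
}
```

Each pair (viaCaret/viaMap, sumTilde/sumFor) should produce the same result on the same input. Whitespace between tokens is skipped because RegexParsers skips it by default.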
How about
def resolvePair(s: String): String = {
  val open = s.lastIndexOf('{')       // last opening brace starts an innermost pair
  val close = s.indexOf('}', open)    // first closing brace after it
  if ((open >= 0) && (close > open)) {
    val (a, b) = s.splitAt(open + 1)
    val (c, d) = b.splitAt(close - open - 1)
    resolvePair(a.dropRight(1) + sumString(c) + d.drop(1))
  } else
    s
}
I know it's ugly but I think it works fine.
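A regex-based alternative is sketched below. The pattern \{([^{}]*)\} only matches a pair with no braces inside, which is an innermost pair. This variant replaces every innermost pair in one pass rather than only the deepest one, but parse still reaches the same final string. RegexBraces is a hypothetical name. The parse loop also stops when nothing changes, which guards against unbalanced input.

```scala
import scala.util.matching.Regex

object RegexBraces {
  // An innermost pair: "{", then no braces, then "}".
  private val innermost: Regex = """\{([^{}]*)\}""".r

  def sumString(s: String): String = s.split("\\+").map(_.trim.toInt).sum.toString

  // Replaces every innermost pair in one pass (not only the deepest one).
  def resolvePair(s: String): String =
    innermost.replaceAllIn(s, m => Regex.quoteReplacement(sumString(m.group(1))))

  // Repeat until nothing changes (also stops on unbalanced braces).
  def parse(s: String): String = {
    val next = resolvePair(s)
    if (next == s) s else parse(next)
  }
}
```

For the question's input, the first pass yields "12 lollies and {3+9} peanuts" and the second pass yields the final string.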
The following parser-combinator snippet demonstrates an attempt to generalise binary comparison ops like > by using Ordered[T]. Gt seems to accomplish this at the AST level, but I'm having trouble extending the concept to the parsers.
The intGt parser works, but is it possible to generalise it around Ordered[T] so that we don't need to write a second parser for floatGt (and hence one for every supported orderable type times every supported op - no thanks)?
import scala.util.parsing.combinator.JavaTokenParsers
object DSL extends JavaTokenParsers {
  // AST
  abstract class Expr[+T] { def eval: T }
  case class Literal[T](t: T) extends Expr[T] { def eval = t }
  case class Gt[T <% Ordered[T]](l: Expr[T], r: Expr[T]) extends Expr[Boolean] {
    def eval = l.eval > r.eval // view bound implicitly wraps the eval result as Ordered[T]
  }
  // Parsers
  lazy val intExpr: Parser[Expr[Int]] = wholeNumber ^^ { case x => Literal(x.toInt) }
  lazy val floatExpr: Parser[Expr[Float]] = decimalNumber ^^ { case x => Literal(x.toFloat) }
  lazy val intGt: Parser[Expr[Boolean]] = intExpr ~ (">" ~> intExpr) ^^ { case l ~ r => Gt(l, r) }
}
I tried playing around and this is the best I could come up with in the time I had:
import scala.util.parsing.combinator.JavaTokenParsers
object DSL extends JavaTokenParsers {
  // AST
  abstract class Expr[+T] { def eval: T }
  case class Literal[T](t: T) extends Expr[T] { def eval = t }
  case class BinOp[T, U](
      val l: Expr[T],
      val r: Expr[T],
      val evalOp: (T, T) => U) extends Expr[U] {
    def eval = evalOp(l.eval, r.eval)
  }
  case class OrderOp[O <% Ordered[O]](symbol: String, op: (O, O) => Boolean)
  def gtOp[O <% Ordered[O]] = OrderOp[O](">", _ > _)
  def gteOp[O <% Ordered[O]] = OrderOp[O](">=", _ >= _)
  def ltOp[O <% Ordered[O]] = OrderOp[O]("<", _ < _)
  def lteOp[O <% Ordered[O]] = OrderOp[O]("<=", _ <= _)
  def eqOp[O <% Ordered[O]] = OrderOp[O]("==", _.compareTo(_) == 0)
  def ops[O <% Ordered[O]] =
    Seq(gtOp[O], gteOp[O], ltOp[O], lteOp[O], eqOp[O])
  def orderExpr[O <% Ordered[O]](
      subExpr: Parser[Expr[O]],
      orderOp: OrderOp[O])
      : Parser[Expr[Boolean]] =
    subExpr ~ (orderOp.symbol ~> subExpr) ^^
      { case l ~ r => BinOp(l, r, orderOp.op) }
  // Parsers
  lazy val intExpr: Parser[Expr[Int]] =
    wholeNumber ^^ { case x => Literal(x.toInt) }
  lazy val floatExpr: Parser[Expr[Float]] =
    decimalNumber ^^ { case x => Literal(x.toFloat) }
  lazy val intOrderOps: Parser[Expr[Boolean]] =
    ops[Int].map(orderExpr(intExpr, _)).reduce(_ | _)
  lazy val floatOrderOps: Parser[Expr[Boolean]] =
    ops[Float].map(orderExpr(floatExpr, _)).reduce(_ | _)
}
Essentially, I defined a small case class OrderOp that relates a string representing an ordering operation to a function which will evaluate that operation. I then defined a function ops capable of creating a Seq[OrderOp] of all such ordering operations for a given Orderable type. These operations can then be turned into parsers using orderExpr, which takes the sub-expression parser and the operation. This is mapped over all the ordering operations for your int and float types.
Some issues with this approach:
- There is only one node type in the AST type hierarchy for all binary operations. This isn't a problem if all you are ever doing is evaluating, but if you ever wanted to do rewriting operations (eliminating obvious tautologies or contradictions, for instance) then there is insufficient information to do this with the current definition of BinOp.
- I still needed to map orderExpr for each distinct type. There may be a way to fix this, but I ran out of time.
- orderExpr expects the left and right subexpressions to be parsed with the same parser.
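The sketch below shows one alternative. It assumes Scala 2.x with scala-parser-combinators. It swaps the deprecated view bound <% Ordered[T] for a context bound T: Ordering, and it keeps the operator symbol inside a single Compare node so that a later rewriting pass could still inspect it. It uses BigDecimal instead of Float to avoid the deprecation warning that Ordering[Float] triggers in Scala 2.13. OrderingDSL, Compare, comparisonOp and comparison are illustrative names. As noted above, it still needs one comparison(...) call per type.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object OrderingDSL extends JavaTokenParsers {
  // AST (illustrative names)
  abstract class Expr[+T] { def eval: T }
  case class Literal[T](t: T) extends Expr[T] { def eval: T = t }

  // The operator symbol stays in the node, so later passes can inspect it.
  case class Compare[T](symbol: String, l: Expr[T], r: Expr[T])(implicit ord: Ordering[T])
      extends Expr[Boolean] {
    def eval: Boolean = {
      val c = ord.compare(l.eval, r.eval)
      symbol match {
        case ">"   => c > 0
        case ">="  => c >= 0
        case "<"   => c < 0
        case "<="  => c <= 0
        case "=="  => c == 0
        case other => throw new IllegalArgumentException(s"unknown operator $other")
      }
    }
  }

  // Longer symbols first, so ">=" is not read as ">" followed by a stray "=".
  val comparisonOp: Parser[String] = ">=" | "<=" | "==" | ">" | "<"

  // One generic parser for any T with an Ordering instance.
  def comparison[T: Ordering](sub: Parser[Expr[T]]): Parser[Expr[Boolean]] =
    sub ~ comparisonOp ~ sub ^^ { case l ~ op ~ r => Compare(op, l, r) }

  lazy val intExpr: Parser[Expr[Int]] = wholeNumber ^^ (x => Literal(x.toInt))
  lazy val decimalExpr: Parser[Expr[BigDecimal]] = decimalNumber ^^ (x => Literal(BigDecimal(x)))

  lazy val intComparison: Parser[Expr[Boolean]] = comparison(intExpr)
  lazy val decimalComparison: Parser[Expr[Boolean]] = comparison(decimalExpr)
}
```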
I'm playing around with a toy HTML parser, to help familiarize myself with Scala's parsing combinators library:
import scala.util.parsing.combinator._
sealed abstract class Node
case class TextNode(val contents: String) extends Node
case class Element(
  val tag: String,
  val attributes: Map[String, Option[String]],
  val children: Seq[Node]
) extends Node
object HTML extends RegexParsers {
  val node: Parser[Node] = text | element
  val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
  val label: Parser[String] = """(\w[:\w]*)""".r
  val value: Parser[String] = """("[^"]*"|\w+)""".r
  val attribute: Parser[(String, Option[String])] = label ~ (
    "=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
  ) ^^ { case (k ~ v) => k -> v }
  val element: Parser[Element] = (
    ("<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">")
    ~ rep(node) ~
    ("</" ~> label <~ ">")
  ) ^^ {
    case (tag ~ attributes ~ children ~ close) => Element(tag, Map(attributes: _*), children)
  }
}
What I'm realizing I want is some way to make sure my opening and closing tags match.
I think to do that, I need some sort of flatMap combinator, roughly Parser[A] => (A => Parser[B]) => Parser[B], so I can use the opening tag to construct the parser for the closing tag. But I don't see anything matching that signature in the library.
What's the proper way to do this?
You can write a method that takes a tag name and returns a parser for a closing tag with that name:
object HTML extends RegexParsers {
  lazy val node: Parser[Node] = text | element
  val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
  val label: Parser[String] = """(\w[:\w]*)""".r
  val value: Parser[String] = """("[^"]*"|\w+)""".r
  val attribute: Parser[(String, Option[String])] = label ~ (
    "=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
  ) ^^ { case (k ~ v) => k -> v }
  val openTag: Parser[String ~ Seq[(String, Option[String])]] =
    "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">"
  def closeTag(name: String): Parser[String] = "</" ~> name <~ ">"
  val element: Parser[Element] = openTag.flatMap {
    case (tag ~ attrs) =>
      rep(node) <~ closeTag(tag) ^^
        (children => Element(tag, attrs.toMap, children))
  }
}
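Note that you also need to make node lazy, since it refers to text and element, which are defined after it and would otherwise still be null when node is initialized. Now you get a nice clean error message for unmatched tags: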
scala> HTML.parse(HTML.element, "<a></b>")
res0: HTML.ParseResult[Element] =
[1.6] failure: `a' expected but `b' found
<a></b>
     ^
I've been a little more verbose than necessary for the sake of clarity. If you want concision you can skip the openTag and closeTag methods and write element like this, for example:
val element = "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" >> {
  case (tag ~ attrs) =>
    rep(node) <~ "</" ~> tag <~ ">" ^^
      (children => Element(tag, attrs.toMap, children))
}
I'm sure more concise versions would be possible, but in my opinion even this is edging toward unreadability.
There is a flatMap on Parser, and also an equivalent method named into and an operator >>, which might be more convenient aliases (flatMap is still needed when used in for comprehensions). It is indeed a valid way to do what you're looking for.
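A for comprehension desugars to flatMap and map, so the matching-tag check can also be written that way. This is a minimal sketch without attributes, assuming scala-parser-combinators. ForHtml and its node classes are illustrative and are not the question's exact types.

```scala
import scala.util.parsing.combinator.RegexParsers

object ForHtml extends RegexParsers {
  sealed abstract class Node
  final case class TextNode(contents: String) extends Node
  final case class Element(tag: String, children: List[Node]) extends Node

  lazy val node: Parser[Node] = text | element
  val text: Parser[Node] = """[^<]+""".r ^^ (s => TextNode(s))
  val label: Parser[String] = """\w[:\w]*""".r

  // The closing-tag parser is built from the name the opening tag returned.
  lazy val element: Parser[Node] = for {
    tag      <- "<" ~> label <~ ">"
    children <- rep(node)
    _        <- "</" ~> literal(tag) <~ ">"
  } yield Element(tag, children)
}
```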
Alternatively, you can check that the tags match with ^?.
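Here is a sketch of the ^? approach. It parses any closing tag and accepts the result only if the names agree; otherwise it fails with a custom message. This uses the two-argument form of ^?, which takes a partial function and an error-message function. TagCheck, matching and mismatch are illustrative names. Elements are rendered as plain strings to keep the example short, and scala-parser-combinators is assumed to be available.

```scala
import scala.util.parsing.combinator.RegexParsers

object TagCheck extends RegexParsers {
  val label: Parser[String] = """\w+""".r
  val text: Parser[String] = """[^<]+""".r
  val openTag: Parser[String] = "<" ~> label <~ ">"
  val closeTag: Parser[String] = "</" ~> label <~ ">"

  // Defined only when the names agree; renders the element as name(children).
  private val matching: PartialFunction[String ~ List[String] ~ String, String] = {
    case open ~ children ~ close if open == close => open + children.mkString("(", ",", ")")
  }

  // Error message used when `matching` is not defined for the parsed triple.
  private def mismatch(t: String ~ List[String] ~ String): String = t match {
    case open ~ _ ~ close => s"</$close> does not match <$open>"
  }

  lazy val element: Parser[String] =
    (openTag ~ rep(text | element) ~ closeTag).^?(matching, mismatch)
}
```

Compared with flatMap, this parses the closing tag before checking it, so the error is reported after the whole element has been read.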
You are looking at the wrong place. It's a normal mistake, though. You want a method Parser[A] => (A => Parser[B]) => Parser[B], but you looked at the docs of Parsers, not Parser.
Look at the API documentation for the Parser class itself.
There's a flatMap, also known as into or >>.
So I have a bunch of parsers like this:
import scala.language.postfixOps // for the postfix ? used below
import scala.util.parsing.combinator.RegexParsers
object MyParser extends RegexParsers {
  override val skipWhitespace = false
  def blockLine = ((id ~ args) <~ ":") ~ ".*?".r ^^ {
    case (blockID ~ argList) ~ rest => ???
  }
  def args = (("[" ~> rep1sep(arg, ", ") <~ "]")?) ^^ {
    case Some(argList) =>
      argList.zipWithIndex.map {
        case ((Some(k), v), index) => k -> v
        case ((None, v), index) => "arg" + index -> v
      }
    case None => List()
  }
  def arg = ((id <~ "=")?) ~ argtext ^^ {
    case Some(name) ~ value => Some(name) -> value.toString()
    case None ~ value => None -> value
  }
  def argtext = "[^\\[\\],]+".r
  def id = "[a-zA-Z]*".r
  // ... many other parsers not shown ...
}
Essentially, I want to re-use the parsers id and args in blockLine, but rather than getting the nested tree of List()s and ~s, I want to get back the original string that was matched. The purpose of this is doing some smart text-preprocessing (using the same parsers that I will use later for the actual parsing) to insert some text in the middle of the line. Something like:
def blockLine = (rawText(id ~ args) <~ ":") ~ ".*?".r ^^ {
  case first ~ rest => first + "{" + rest
}
The higher purpose of the preprocessor is to go through and convert indentation-delimited blocks into curly-braces delimited blocks, so I can run the pre-processed file through a normal parser later. Is there any easy way to do this?
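One possible approach is sketched below. rawText is a hypothetical helper, not part of the library. It runs a parser and, on success, returns the slice of input between the starting offset and the offset where that parser stopped. It relies on ParseResult.map leaving failures untouched. args is simplified here and RawTextDemo is an illustrative name; scala-parser-combinators is assumed.

```scala
import scala.util.parsing.combinator.RegexParsers

object RawTextDemo extends RegexParsers {
  override val skipWhitespace = false

  // Hypothetical helper: run p, but return the exact input text it consumed.
  def rawText[T](p: Parser[T]): Parser[String] = Parser { in =>
    val r = p(in)
    r.map(_ => in.source.subSequence(in.offset, r.next.offset).toString)
  }

  def id: Parser[String] = "[a-zA-Z]+".r
  def args: Parser[List[String]] = "[" ~> repsep("[^\\[\\],]+".r, ", ") <~ "]"

  def blockLine: Parser[String] = (rawText(id ~ opt(args)) <~ ":") ~ ".*".r ^^ {
    case first ~ rest => first + "{" + rest
  }
}
```

Because rawText wraps an arbitrary parser, the same id and args parsers can be reused unchanged for the later, structured parse.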
I'm trying to parse nested expressions like GroupParser.parse("{{a}{{c}{d}}}").
After many hours I now have the following snippet, which parses {a} well but fails with
[1.5] failure: ``}'' expected but `{' found
{{a}{{b}{c}}}
    ^
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
sealed abstract class Expr
case class ValueNode(value: String) extends Expr
object GroupParser extends StandardTokenParsers {
  lexical.delimiters ++= List("{", "}")
  def vstring = ident ^^ { case s => ValueNode(s) }
  def expr = ( vstring | parens )
  def parens: Parser[Expr] = "{" ~> expr <~ "}"
  def parse(s: String) = {
    val tokens = new lexical.Scanner(s)
    phrase(expr)(tokens)
  }
}
Any hints?
The problem isn't nesting, it's sequencing. Your grammar would allow arbitrary nesting of expressions inside curlies, but doesn't say that an expression can be sequenced so the parser can't handle {a} followed immediately by {{b}{c}}. You can code sequencing using explicit recursion in your grammar or by using one of the rep variants in http://www.scala-lang.org/api/current/scala/util/parsing/combinator/Parsers.html
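Here is a sketch of the explicit-sequencing fix with rep1, assuming scala-parser-combinators. Group is a hypothetical node type. Unlike the answer further down, this version keeps every pair of braces as its own node instead of collapsing {{a}} to ValueNode(a).

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object SeqGroupParser extends StandardTokenParsers {
  lexical.delimiters ++= List("{", "}")

  sealed abstract class Expr
  final case class ValueNode(value: String) extends Expr
  final case class Group(items: List[Expr]) extends Expr // hypothetical node type

  def vstring: Parser[Expr] = ident ^^ (s => ValueNode(s))
  // rep1 expresses sequencing: a group holds one or more expressions in a row.
  def group: Parser[Expr] = "{" ~> rep1(expr) <~ "}" ^^ (items => Group(items))
  def expr: Parser[Expr] = vstring | group

  def parse(s: String): ParseResult[Expr] = phrase(expr)(new lexical.Scanner(s))
}
```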
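Can expressions be repeated multiple times? If so, something along these lines would work. Here + is the postfix form of rep1, so expr then yields a List[Expr], and parens, declared as Parser[Expr], would have to wrap that list in an Expr node to type-check: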
def expr = ( vstring | parens )+
However, it is not clear what your grammar is, or why your example should be acceptable.
This parses the two examples you gave:
import scala.util.parsing.combinator.syntactical._
sealed abstract class Expr
case class ValueNode(value: String) extends Expr
case class ValueListNode(value: List[Expr]) extends Expr
object GroupParser extends StandardTokenParsers {
  lexical.delimiters ++= List("{", "}")
  def vstring = ident ^^ { case s => ValueNode(s) }
  def parens: Parser[Expr] = "{" ~> ( expr ) <~ "}"
  def expr = vstring | parens
  def exprList: Parser[Expr] = "{" ~> rep1( expr | exprList ) <~ "}" ^^ {
    case l => ValueListNode(l)
  }
  def anyExpr = expr | exprList
  def parse(s: String) = {
    val tokens = new lexical.Scanner(s)
    phrase(anyExpr)(tokens)
  }
  def test(s: String) = {
    parse(s) match {
      case Success(tree, _) =>
        println("Tree: " + tree)
      case e: NoSuccess => Console.err.println(e)
    }
  }
  def main(args: Array[String]): Unit = {
    test("{a}")
    test("{{a}}")
    test("{{a}{{b}{c}}}")
  }
}
It succeeds with the following output:
Tree: ValueNode(a)
Tree: ValueNode(a)
Tree: ValueListNode(List(ValueNode(a), ValueListNode(List(ValueNode(b), ValueNode(c)))))