scala parsing with nested parens - parsing

I'm trying to parse nested expressions like GroupParser.parse("{{a}{{c}{d}}}")
After many hours I now have the following snippet, which parses {a} well but fails with
[1.5] failure: ``}'' expected but `{' found
{{a}{{b}{c}}}
^
sealed abstract class Expr
case class ValueNode(value:String) extends Expr
object GroupParser extends StandardTokenParsers {
lexical.delimiters ++= List("{","}")
def vstring = ident ^^ { case s => ValueNode(s) }
def expr = ( vstring | parens )
def parens:Parser[Expr] = "{" ~> expr <~ "}"
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(expr)(tokens)
}
}
any hints?

The problem isn't nesting, it's sequencing. Your grammar allows arbitrary nesting of expressions inside curlies, but it doesn't say that expressions can be sequenced, so the parser can't handle {a} followed immediately by {{b}{c}}. You can code sequencing using explicit recursion in your grammar or by using one of the rep variants in http://www.scala-lang.org/api/current/scala/util/parsing/combinator/Parsers.html
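To make the "explicit recursion" option concrete, here is a minimal sketch. It assumes the scala-parser-combinators module is on the classpath (it has been a separate module since Scala 2.11). GroupNode, SeqParser, leaf, exprs and group are names made up for this illustration and are not from the question.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

sealed abstract class Expr
case class ValueNode(value: String) extends Expr
// Hypothetical node type holding the sequence found inside one pair of braces.
case class GroupNode(items: List[Expr]) extends Expr

object SeqParser extends StandardTokenParsers {
  lexical.delimiters ++= List("{", "}")

  def leaf: Parser[Expr] = ident ^^ (s => ValueNode(s))

  // Sequencing via explicit recursion: exprs ::= expr [exprs]
  def exprs: Parser[List[Expr]] = expr ~ opt(exprs) ^^ {
    case first ~ rest => first :: rest.getOrElse(Nil)
  }

  // A group is one or more expressions in a row, wrapped in braces.
  def group: Parser[Expr] = "{" ~> exprs <~ "}" ^^ (items => GroupNode(items))

  def expr: Parser[Expr] = leaf | group

  def parse(s: String): ParseResult[Expr] = phrase(expr)(new lexical.Scanner(s))
}
```

With this grammar both SeqParser.parse("{a}") and SeqParser.parse("{{a}{{b}{c}}}") should succeed. Replacing exprs with rep1(expr) gives the same language with less code.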

Can expressions be repeated multiple times? If so, this would work:
def expr = ( vstring | parens )+
However, it is not clear what your grammar is, or why your example should be acceptable.

This parses the two examples you gave:
import scala.util.parsing.combinator.syntactical._
sealed abstract class Expr
case class ValueNode(value:String) extends Expr
case class ValueListNode(value:List[Expr]) extends Expr
object GroupParser extends StandardTokenParsers {
lexical.delimiters ++= List("{","}")
def vstring = ident ^^ { case s => ValueNode(s) }
def parens:Parser[Expr] = "{" ~> ( expr ) <~ "}"
def expr = vstring | parens
def exprList:Parser[Expr] = "{" ~> rep1( expr | exprList ) <~ "}" ^^ {
case l => {
ValueListNode(l)
}
}
def anyExpr = expr | exprList
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(anyExpr)(tokens)
}
def test(s: String) = {
parse(s) match {
case Success(tree, _) =>
println("Tree: " + tree)
case e: NoSuccess => Console.err.println(e)
}
}
def main(args: Array[String]) = {
test("{a}")
test("{{a}}")
test("{{a}{{b}{c}}}")
}
}
And succeeds with output:
Tree: ValueNode(a)
Tree: ValueNode(a)
Tree: ValueListNode(List(ValueNode(a), ValueListNode(List(ValueNode(b), ValueNode(c)))))

Related

How to embed Scala code inside a specially defined syntax?

I don't know if this info is relevant to the question, but I am learning Scala parser combinators.
Using some examples (in this master's thesis) I was able to write a simple functional (in the sense that it is non-imperative) programming language.
Is there a way to improve my parser/evaluator such that it could allow/evaluate input like this:
<%
import scala.<some package / classes>
import weka.<some package / classes>
%>
some DSL code (lambda calculus)
<%
System.out.println("asdasd");
J48 j48 = new J48();
%>
as input written in the guest language (DSL)?
Should I use reflection or something similar* to evaluate such input?
Is there some source code you would recommend studying (maybe the Groovy sources)?
Maybe this is something similar: runtime compilation, but I am not sure this is the best alternative.
EDIT
Complete answer given below with "{" and "}". Maybe "{{" would be better.
The question is what the meaning of such import statements should be.
Perhaps you could start by allowing references to Java methods in your language (the lambda calculus, I guess?).
For example:
java.lang.System.out.println "foo"
If you have that, you can then add resolution of unqualified names like
println "foo"
But here comes the first problem: println exists in System.out and System.err, or, to be more correct: it is a method of PrintStream, and both System.err and System.out are PrintStreams.
Hence you would need some notion of Objects, Classes, Types, and so on to do it right.
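As a rough illustration of why a receiver object and its type are needed, here is a small sketch using plain Java reflection from Scala. JavaCall, resolve and call are hypothetical helper names, and the sketch only handles public static fields and methods taking a single String.

```scala
import java.lang.reflect.Method

object JavaCall {
  // Resolves e.g. ("java.lang.System", "out", "println") to the receiver object
  // stored in the static field and the method declared on the field's type.
  def resolve(className: String, field: String, method: String): (AnyRef, Method) = {
    val f = Class.forName(className).getField(field)
    (f.get(null), f.getType.getMethod(method, classOf[String]))
  }

  // Invokes the resolved method on the resolved receiver.
  def call(className: String, field: String, method: String, arg: String): Unit = {
    val (receiver, m) = resolve(className, field, method)
    m.invoke(receiver, arg)
    ()
  }
}
```

Resolving println through System.out and through System.err yields the same Method object, declared on java.io.PrintStream. Only the receiver differs, which is why an unqualified println is ambiguous unless the language has some notion of objects and types.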
I managed to run Scala code embedded in my interpreted DSL.
Insertion of DSL vars into the Scala code and recovering the return value come as a bonus. :)
Minimal relevant code, from parsing and interpreting through to run-time execution of the embedded Scala code (Main, Parser, AST and Interpreter):
object Main extends App {
val ast = Parser1 parse "some dsl code here"
Interpreter eval ast
}
object Parser1 extends RegexParsers with ImplicitConversions {
import AST._
val separator = ";"
def parse(input: String): Expr = parseAll(program, input).get
type P[+T] = Parser[T]
def program = rep1sep(expr, separator) <~ separator ^^ Sequence
def expr: Parser[Expr] = (assign /*more calls here*/)
def scalacode: P[Expr] = "{" ~> rep(scala_text) <~ "}" ^^ {case l => Scalacode(l.flatten)}
def scala_text = text_no_braces ~ "$" ~ ident ~ text_no_braces ^^ {case a ~ b ~ c ~ d => List(a, b + c, d)}
//more rules here
def assign = ident ~ ("=" ~> atomic_expr) ^^ Assign
//more rules here
def atomic_expr = (
ident ^^ Var
//more calls here
| "(" ~> expr <~ ")"
| scalacode
| failure("expression expected")
)
def text_no_braces = """[a-zA-Z0-9\"\'\+\-\_!##%\&\(\)\[\]\/\?\:;\.\>\<\,\|= \*\\\n]*""".r //| fail("Scala code expected")
def ident = """[a-zA-Z]+[a-zA-Z0-9]*""".r
}
object AST {
sealed abstract class Expr
// more classes here
case class Scalacode(items: List[String]) extends Expr
case class Literal(v: Any) extends Expr
case class Var(name: String) extends Expr
}
object Interpreter {
import AST._
val env = collection.immutable.Map[VarName, VarValue]()
def run(code: String) = {
val code2 = "val res_1 = (" + code + ")"
interpret.interpret(code2)
val res = interpret.valueOfTerm("res_1")
if (res == None) Literal() else Literal(res.get)
}
class Context(private var env: Environment = initEnv) {
def eval(e: Expr): Any = e match {
case Scalacode(l: List[String]) => {
val r = l map {
x =>
if (x.startsWith("$")) {
eval(Var(x.drop(1)))
} else {
x
}
}
eval(run(r.mkString))
}
case Assign(id, expr) => env += (id -> eval(expr))
//more pattern matching here
case Literal(v) => v
case Var(id) => {
env getOrElse(id, sys.error("Undefined " + id))
}
}
}
}

Using regex in StandardTokenParsers

I'm trying to use regex in my StandardTokenParsers based parser. For that, I've subclassed StdLexical as follows:
class CustomLexical extends StdLexical{
def regex(r: Regex): Parser[String] = new Parser[String] {
def apply(in:Input) = r.findPrefixMatchOf(in.source.subSequence(in.offset, in.source.length)) match {
case Some(matched) => Success(in.source.subSequence(in.offset, in.offset + matched.end).toString,
in.drop(matched.end))
case None => Failure("string matching regex `" + r + "' expected but " + in.first + " found", in)
}
}
override def token: Parser[Token] =
( regex("[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r) ^^ { StringLit(_) }
| identChar ~ rep( identChar | digit ) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
| ...
But I'm a little confused on how I would define a Parser that takes advantage of this. I have a parser defined as:
def mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
which should be used to identify valid file paths. I tried then:
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
But this is obviously not right. I get an error:
scala: type mismatch;
found : scala.util.matching.Regex
required: McfpDSL.this.Parser[String]
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
^
What is the proper way of using the extension made on my StdLexical subclass?
If you really want to use token-based parsing and reuse StdLexical, I would advise updating the syntax for "TargetFolder" so that the value after the equals sign is a proper string literal. In other words, require the path to be enclosed in quotes. From that point on you don't need to extend StdLexical anymore.
Then comes the problem of converting a regexp to a parser. Scala already has RegexParsers for this (which implicitly converts a regexp to a Parser[String]), but unfortunately that's not what you want here, because it works on streams of Char (type Elem = Char in RegexParsers) while you are working on a stream of tokens.
So we will indeed have to define our own conversion from Regex to Parser[String] (but at the syntactic level rather than the lexical level, in other words in the token parser).
import scala.util.parsing.combinator.syntactical._
import scala.util.matching.Regex
import scala.util.parsing.input._
object MyParser extends StandardTokenParsers {
import lexical.StringLit
def regexStringLit(r: Regex): Parser[String] = acceptMatch(
"string literal matching regex " + r,
{ case StringLit( s ) if r.unapplySeq(s).isDefined => s }
)
lexical.delimiters += "="
lexical.reserved += "TargetFolder"
lazy val mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
lazy val mFilePath: Parser[String] = regexStringLit("([a-zA-Z]:\\\\[\\w\\\\?]*)|(/[\\w/]*)".r)
def parseTargetFolder( s: String ) = { mTargetFolder( new lexical.Scanner( s ) ) }
}
Example:
scala> MyParser.parseTargetFolder("""TargetFolder = "c:\Dir1\Dir2" """)
res12: MyParser.ParseResult[String] = [1.31] parsed: c:\Dir1\Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "/Dir1/Dir2" """)
res13: MyParser.ParseResult[String] = [1.29] parsed: /Dir1/Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "Hello world" """)
res14: MyParser.ParseResult[String] =
[1.16] failure: identifier matching regex ([a-zA-Z]:\\[\w\\?]*)|(/[\w/]*) expected
TargetFolder = "Hello world"
^
Note that I also fixed your "target folder" regexp here: you were missing parens around the two alternatives, and there were unneeded spaces.
Just call your function regex when you want to get a Parser[String] from a Regex:
def p: Parser[String] = regex("".r)
Or make regex implicit to let the compiler call it automatically for you:
implicit def regex(r: Regex): Parser[String] = ...
// =>
def p: Parser[String] = "".r

Creating a recursive data structure using parser combinators in scala

I'm a beginner in Scala, working on S99 to try to learn the language. One of the problems involves converting from a string to a tree data structure. I can do it "manually", but I also want to see how to do it using Scala's parser combinator library.
The data structure for the tree is
sealed abstract class Tree[+T]
case class Node[+T](value: T, left: Tree[T], right: Tree[T]) extends Tree[T] {
override def toString = "T(" + value.toString + " " + left.toString + " " + right.toString + ")"
}
case object End extends Tree[Nothing] {
override def toString = "."
}
object Node {
def apply[T](value: T): Node[T] = Node(value, End, End)
}
And the input is supposed to be a string, like this: a(b(d,e),c(,f(g,)))
I can parse the string using something like
trait Tree extends JavaTokenParsers{
def leaf: Parser[Any] = ident
def child: Parser[Any] = node | leaf | ""
def node: Parser[Any] = ident~"("~child~","~child~")" | leaf
}
But how can I use the parsing library to build the tree? I know that I can use ^^ to convert, for example, some string into an integer. My confusion comes from needing to 'know' the left and the right subtrees when creating an instance of Node. How can I do that, or is that a sign that I want to do something different?
Am I better off taking the thing the parser returns ((((((a~()~(((((b~()~d)~,)~e)~)))~,)~(((((c~()~)~,)~(((((f~()~g)~,)~)~)))~)))~)) for the example input above), and building the tree based on that, rather than use parser operators like ^^ or ^^^ to build the tree directly?
It is possible to do this cleanly with ^^, and you're fairly close:
object TreeParser extends JavaTokenParsers{
def leaf: Parser[Node[String]] = ident ^^ (Node(_))
def child: Parser[Tree[String]] = node | leaf | "" ^^ (_ => End)
def node: Parser[Tree[String]] =
ident ~ ("(" ~> child) ~ ("," ~> child <~ ")") ^^ {
case v ~ l ~ r => Node(v, l, r)
} | leaf
}
And now:
scala> TreeParser.parseAll(TreeParser.node, "a(b(d,e),c(,f(g,)))").get
res0: Tree[String] = T(a T(b T(d . .) T(e . .)) T(c . T(f T(g . .) .)))
In my opinion the easiest way to approach this kind of problem is to type the parser methods with the results you want, and then add the appropriate mapping operations with ^^ until the compiler is happy.

Scala: Parsing matching token

I'm playing around with a toy HTML parser, to help familiarize myself with Scala's parsing combinators library:
import scala.util.parsing.combinator._
sealed abstract class Node
case class TextNode(val contents : String) extends Node
case class Element(
val tag : String,
val attributes : Map[String,Option[String]],
val children : Seq[Node]
) extends Node
object HTML extends RegexParsers {
val node: Parser[Node] = text | element
val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
val label: Parser[String] = """(\w[:\w]*)""".r
val value : Parser[String] = """("[^"]*"|\w+)""".r
val attribute : Parser[(String,Option[String])] = label ~ (
"=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
) ^^ { case (k ~ v) => k -> v }
val element: Parser[Element] = (
("<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" )
~ rep(node) ~
("</" ~> label <~ ">")
) ^^ {
case (tag ~ attributes ~ children ~ close) => Element(tag, Map(attributes : _*), children)
}
}
What I'm realizing I want is some way to make sure my opening and closing tags match.
I think to do that, I need some sort of flatMap combinator, roughly Parser[A] => (A => Parser[B]) => Parser[B],
so I can use the opening tag to construct the parser for the closing tag. But I don't see anything matching that signature in the library.
What's the proper way to do this?
You can write a method that takes a tag name and returns a parser for a closing tag with that name:
object HTML extends RegexParsers {
lazy val node: Parser[Node] = text | element
val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
val label: Parser[String] = """(\w[:\w]*)""".r
val value : Parser[String] = """("[^"]*"|\w+)""".r
val attribute : Parser[(String, Option[String])] = label ~ (
"=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
) ^^ { case (k ~ v) => k -> v }
val openTag: Parser[String ~ Seq[(String, Option[String])]] =
"<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">"
def closeTag(name: String): Parser[String] = "</" ~> name <~ ">"
val element: Parser[Element] = openTag.flatMap {
case (tag ~ attrs) =>
rep(node) <~ closeTag(tag) ^^
(children => Element(tag, attrs.toMap, children))
}
}
Note that you also need to make node lazy. Now you get a nice clean error message for unmatched tags:
scala> HTML.parse(HTML.element, "<a></b>")
res0: HTML.ParseResult[Element] =
[1.6] failure: `a' expected but `b' found
<a></b>
^
I've been a little more verbose than necessary for the sake of clarity. If you want concision you can skip the openTag and closeTag methods and write element like this, for example:
val element = "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" >> {
case (tag ~ attrs) =>
rep(node) <~ "</" ~> tag <~ ">" ^^
(children => Element(tag, attrs.toMap, children))
}
I'm sure more concise versions would be possible, but in my opinion even this is edging toward unreadability.
There is a flatMap on Parser, and also an equivalent method named into and an operator >>, which might be more convenient aliases (flatMap is still needed when used in for comprehensions). It is indeed a valid way to do what you're looking for.
Alternatively, you can check that the tags match with ^?.
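For the ^? route, here is a minimal sketch, again assuming the scala-parser-combinators module is on the classpath. TagCheck and pair are hypothetical names, and the grammar is cut down to an empty element with no attributes or children.

```scala
import scala.util.parsing.combinator.RegexParsers

object TagCheck extends RegexParsers {
  def label: Parser[String] = """\w+""".r

  // Parse both tags unconditionally, then let ^? accept the result only when
  // the names agree. The second function builds the failure message otherwise.
  def pair: Parser[String] =
    (("<" ~> label <~ ">") ~ ("</" ~> label <~ ">")).^?(
      { case open ~ close if open == close => open },
      { case open ~ close => s"closing tag </$close> does not match <$open>" }
    )
}
```

TagCheck.parseAll(TagCheck.pair, "<a></a>") should succeed, while "<a></b>" should fail with the custom message. Compared with flatMap, the check runs only after the closing tag has already been consumed, so the failure is reported at the position where pair started rather than at the mismatched name.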
You are looking in the wrong place. It's a common mistake, though. You want a method Parser[A] => (A => Parser[B]) => Parser[B], but you looked at the docs of Parsers, not Parser.
Look here.
There's a flatMap, also known as into or >>.

Scala parsing mutually recursive functions for SML

I'm trying to write a parser in Scala for SML with Tokens. It almost works the way I want it to work, except for the fact that this currently parses
let fun f x = r and fun g y in r end;
instead of
let fun f x = r and g y in r end;
How do I change my code so that it recognizes that it doesn't need a FunToken for the second function?
def parseDef:Def = {
currentToken match {
case ValToken => {
eat(ValToken);
val nme:String = currentToken match {
case IdToken(x) => {advance; x}
case _ => error("Expected a name after VAL.")
}
eat(EqualToken);
VAL(nme,parseExp)
}
case FunToken => {
eat(FunToken);
val fnme:String = currentToken match {
case IdToken(x) => {advance; x}
case _ => error("Expected a name after VAL.")
}
val xnme:String = currentToken match {
case IdToken(x) => {advance; x}
case _ => error("Expected a name after VAL.")
}
def parseAnd:Def = currentToken match {
case AndToken => {eat(AndToken); FUN(fnme,xnme,parseExp,parseAnd)}
case _ => NOFUN
}
FUN(fnme,xnme,parseExp,parseAnd)
}
case _ => error("Expected VAL or FUN.");
}
}
Just implement the right grammar. Instead of
def ::= "val" id "=" exp | fun
fun ::= "fun" id id "=" exp ["and" fun]
SML's grammar actually is
def ::= "val" id "=" exp | "fun" fun
fun ::= id id "=" exp ["and" fun]
Btw, I think there are other problems with your parsing of fun. AFAICS, you are not parsing any "=" in the fun case. Moreover, after an "and", you are not even parsing any identifiers, just the function body.
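A hedged sketch of how the corrected grammar could map onto a hand-written recursive descent parser. The token and AST types here are simplified stand-ins loosely modelled on the names in the question, and exp() is a placeholder that accepts a single identifier instead of a real expression parser.

```scala
// Hypothetical, simplified token and AST types.
sealed trait Token
case object ValToken extends Token
case object FunToken extends Token
case object AndToken extends Token
case object EqualToken extends Token
case class IdToken(name: String) extends Token

sealed trait Def
case class VAL(name: String, body: String) extends Def
case class FUN(name: String, param: String, body: String, next: Def) extends Def
case object NOFUN extends Def

class DefParser(input: List[Token]) {
  private var rest: List[Token] = input

  // Consumes the expected token or aborts.
  private def eat(t: Token): Unit = rest match {
    case head :: tail if head == t => rest = tail
    case other => sys.error(s"Expected $t but found ${other.headOption}")
  }

  private def ident(): String = rest match {
    case IdToken(x) :: tail => rest = tail; x
    case other => sys.error(s"Expected an identifier but found ${other.headOption}")
  }

  // Placeholder for parseExp: an expression is just one identifier here.
  private def exp(): String = ident()

  // def ::= "val" id "=" exp | "fun" fun
  def parseDef(): Def = rest match {
    case ValToken :: _ =>
      eat(ValToken)
      val name = ident()
      eat(EqualToken)
      VAL(name, exp())
    case FunToken :: _ =>
      eat(FunToken) // "fun" is consumed once, here, and never inside parseFun
      parseFun()
    case other => sys.error(s"Expected VAL or FUN but found ${other.headOption}")
  }

  // fun ::= id id "=" exp ["and" fun]
  private def parseFun(): Def = {
    val name = ident()
    val param = ident()
    eat(EqualToken)
    val body = exp()
    val next = rest match {
      case AndToken :: _ => eat(AndToken); parseFun()
      case _ => NOFUN
    }
    FUN(name, param, body, next)
  }
}
```

Because "fun" is eaten in parseDef and not in parseFun, the token stream for fun f x = r and g y = s parses into two chained FUN nodes without needing a second FunToken.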
You could inject the FunToken back into your input stream with an "uneat" function. This is not the most elegant solution, but it's the one that requires the least modification of your current code.
def parseAnd:Def = currentToken match {
case AndToken => { eat(AndToken);
uneat(FunToken);
FUN(fnme,xnme,parseExp,parseAnd) }
case _ => NOFUN
}

Resources