How to embed Scala code inside a specially defined syntax? - parsing

I don't know if this info is relevant to the question, but I am learning Scala parser combinators.
Using some examples (in this master's thesis) I was able to write a simple functional (in the sense of non-imperative) programming language.
Is there a way to improve my parser/evaluator such that it could allow/evaluate input like this:
<%
import scala.<some package / classes>
import weka.<some package / classes>
%>
some DSL code (lambda calculus)
<%
System.out.println("asdasd");
J48 j48 = new J48();
%>
as input written in the guest language (DSL)?
Should I use reflection or something similar to evaluate such input?
Is there any source code you would recommend studying (maybe the Groovy sources?)?
Maybe runtime compilation is something similar, but I am not sure it is the best alternative.
EDIT
Complete answer given below, using "{" and "}" as delimiters. Maybe "{{" would be better.

The question is what such import statements should mean.
Perhaps start by allowing references to Java methods in your language (the lambda calculus, I guess?).
For example:
java.lang.System.out.println "foo"
If you have that, you can then add resolution of unqualified names like
println "foo"
But here comes the first problem: println exists in System.out and System.err, or, to be more correct: it is a method of PrintStream, and both System.err and System.out are PrintStreams.
Hence you would need some notion of Objects, Classes, Types, and so on to do it right.
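The first step (calling a fully qualified Java member) can be sketched with plain Java reflection. This is only an illustration under simplifying assumptions: `JavaCall`, `resolveStaticField` and `invoke` are hypothetical names, primitive parameter types are ignored, and the overload choice is a crude heuristic, which is exactly the point where a real notion of types becomes necessary.

```scala
import java.lang.reflect.Method

// Hypothetical helpers, not part of any library
object JavaCall {
  // Resolve a qualified static field such as "java.lang.System.out":
  // everything before the last dot is the class, the rest is the field name.
  def resolveStaticField(path: String): AnyRef = {
    val lastDot = path.lastIndexOf('.')
    val cls = Class.forName(path.substring(0, lastDot))
    cls.getField(path.substring(lastDot + 1)).get(null)
  }

  // Call a public method by name. Overloads such as PrintStream.println(String)
  // and println(Object) both accept a String, so prefer an exact match on the
  // runtime argument classes, otherwise take the first compatible overload.
  // Primitive parameter types (int, char, ...) are not handled in this sketch.
  def invoke(target: AnyRef, name: String, args: Seq[AnyRef]): AnyRef = {
    val candidates: Seq[Method] = target.getClass.getMethods.toSeq.filter { m =>
      m.getName == name &&
        m.getParameterCount == args.length &&
        m.getParameterTypes.zip(args).forall { case (p, a) => p.isInstance(a) }
    }
    val exact = candidates.find(_.getParameterTypes.toList == args.map(_.getClass).toList)
    val chosen = exact.orElse(candidates.headOption).getOrElse(
      throw new NoSuchMethodException(s"${target.getClass.getName}.$name"))
    chosen.invoke(target, args: _*) // returns null for void methods
  }
}
```

With these helpers, `java.lang.System.out.println "foo"` would roughly become `JavaCall.invoke(JavaCall.resolveStaticField("java.lang.System.out"), "println", Seq("foo"))`.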

I managed to run Scala code embedded in my interpreted DSL.
Inserting DSL variables into the Scala code and recovering the returned value come as a bonus. :)
Here is the minimal relevant code, from parsing and interpreting through to the run-time execution of the embedded Scala code (Main, Parser, AST and Interpreter):
import scala.util.parsing.combinator.{ImplicitConversions, RegexParsers}

object Main extends App {
  val ast = Parser1 parse "some dsl code here"
  // eval is defined on Interpreter.Context, not on Interpreter itself
  new Interpreter.Context().eval(ast)
}
object Parser1 extends RegexParsers with ImplicitConversions {
  import AST._
  val separator = ";"
  def parse(input: String): Expr = parseAll(program, input).get
  type P[+T] = Parser[T]
  def program = rep1sep(expr, separator) <~ separator ^^ Sequence
  def expr: Parser[Expr] = (assign /*more calls here*/)
  // "{ ... }" block: each chunk is split into text, "$name" and text
  def scalacode: P[Expr] = "{" ~> rep(scala_text) <~ "}" ^^ {case l => Scalacode(l.flatten)}
  def scala_text = text_no_braces ~ "$" ~ ident ~ text_no_braces ^^ {case a ~ b ~ c ~ d => List(a, b + c, d)}
  //more rules here
  def assign = ident ~ ("=" ~> atomic_expr) ^^ Assign
  //more rules here
  def atomic_expr = (
    ident ^^ Var
    //more calls here
    | "(" ~> expr <~ ")"
    | scalacode
    | failure("expression expected")
  )
  def text_no_braces = """[a-zA-Z0-9\"\'\+\-\_!##%\&\(\)\[\]\/\?\:;\.\>\<\,\|= \*\\\n]*""".r //| fail("Scala code expected")
  def ident = """[a-zA-Z]+[a-zA-Z0-9]*""".r
}
object AST {
  sealed abstract class Expr
  // more classes here
  case class Scalacode(items: List[String]) extends Expr
  case class Literal(v: Any) extends Expr
  case class Var(name: String) extends Expr
}
object Interpreter {
  import AST._
  // DSL variable environment: variable name -> value
  type Environment = Map[String, Any]
  val initEnv: Environment = Map()
  // `interpret` (not shown) is assumed to be an embedded Scala REPL,
  // e.g. an instance of scala.tools.nsc.interpreter.IMain
  def run(code: String) = {
    val code2 = "val res_1 = (" + code + ")"
    interpret.interpret(code2)
    val res = interpret.valueOfTerm("res_1")
    // Unit stands in for "no value"
    Literal(res.getOrElse(()))
  }
  class Context(private var env: Environment = initEnv) {
    def eval(e: Expr): Any = e match {
      case Scalacode(l: List[String]) => {
        // Replace each "$name" chunk with the current value of DSL variable `name`
        val r = l map {
          x =>
            if (x.startsWith("$")) {
              eval(Var(x.drop(1)))
            } else {
              x
            }
        }
        eval(run(r.mkString))
      }
      case Assign(id, expr) => env += (id -> eval(expr))
      //more pattern matching here
      case Literal(v) => v
      case Var(id) => {
        env getOrElse(id, sys.error("Undefined " + id))
      }
    }
  }
}
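The `$name` substitution that `Context.eval` performs on `Scalacode` chunks can also be expressed as a single regex replacement over the raw code string. The sketch below swaps the chunk-splitting grammar for `Regex.replaceAllIn`; `Splice` and `splice` are hypothetical names, and the DSL environment is assumed to be a plain `Map[String, Any]`.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not the author's implementation
object Splice {
  // "$" followed by an identifier of the same shape as Parser1.ident
  private val VarRef: Regex = """\$([a-zA-Z][a-zA-Z0-9]*)""".r

  // Replace every $name with the toString of its value from the environment.
  // As with r.mkString in Context.eval, a String value is inserted unquoted.
  def splice(code: String, env: Map[String, Any]): String =
    VarRef.replaceAllIn(code, (m: Regex.Match) => {
      val name = m.group(1)
      val value = env.getOrElse(name, sys.error("Undefined " + name))
      // quoteReplacement keeps "$" or "\" in a value from being read as a group reference
      Regex.quoteReplacement(value.toString)
    })
}
```

For example, `Splice.splice("println($x + 1)", Map("x" -> 41))` gives `println(41 + 1)`. Substituting on the raw text would let `scala_text` stop insisting on a `$` in every chunk, at the cost of having no way to write a literal `$name` in the embedded code.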

Related

Errors and failures in Scala Parser Combinators

I would like to implement a parser for a given language using Scala Parser Combinators. However, the software that will compile the language does not implement all of the language's features, so I would like parsing to fail if those features are used. I tried to put together a small example below:
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~ "world" ^^ { case _ => ??? } |
    "hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
I.e., the parser succeeds on "hello" followed by some identifier, but fails if the identifier is "world". I see that the Parsers class has failure() and err() parsers, but I cannot figure out how to use them, as they return Parser[Nothing] instead of a String. The documentation does not seem to cover this use case...
In this case you want err, not failure, since if the first parser in a disjunction fails you'll just move on to the second, which isn't what you want.
The other issue is that ^^ is the equivalent of map, but you want flatMap, since err("whatever") is a Parser[Nothing], not a Nothing. You could use the flatMap method on Parser, but in this context it's more idiomatic to use the (completely equivalent) >> operator:
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~> "world" >> (x => err(s"Can't say hello to the $x!")) |
    "hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
Or, a little more simply:
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~ "world" ~> err(s"Can't say hello to the world!") |
    "hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
Either approach should do what you want.
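To see the difference between `failure` and `err` inside an alternation concretely, here is a small sketch; `ErrVsFailure`, `withFailure` and `withErr` are made-up names, and like all code in this thread it assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object ErrVsFailure extends JavaTokenParsers {
  // failure(...) produces a Failure, so | backtracks and tries the next alternative
  def withFailure: Parser[String] =
    "hello" ~ "world" ~> failure("no world here") |
      "hello" ~ ident ^^ { case _ ~ id => s"hi, $id" }

  // err(...) produces an Error, which | does not backtrack over
  def withErr: Parser[String] =
    "hello" ~ "world" ~> err("Can't say hello to the world!") |
      "hello" ~ ident ^^ { case _ ~ id => s"hi, $id" }
}
```

On "hello world", `withFailure` falls through to the second alternative and yields "hi, world", while `withErr` stops with the error; on "hello bob" both yield "hi, bob".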
You could also use the ^? method:
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~> ident ^? (
      { case id if id != "world" => s"hi, $id" },
      s => s"Should not use '$s' here."
    )
}
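Unlike `err`, `^?` reports a rejected value as an ordinary `Failure`, so an enclosing `|` can still try other alternatives. A sketch; `PartialMapDemo` and the `withFallback` branch are hypothetical and exist only to show that behavior:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object PartialMapDemo extends JavaTokenParsers {
  // Same shape as the ^? answer above
  def test: Parser[String] =
    "hello" ~> ident ^? (
      { case id if id != "world" => s"hi, $id" },
      s => s"Should not use '$s' here."
    )

  // Because ^? yields a Failure (not an Error), | still tries this branch
  def withFallback: Parser[String] =
    test | "hello" ~ "world" ^^^ "fallback for world"
}
```

So `^?` suits "reject this value here", while `err` suits "reject it and stop parsing".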

Using regex in StandardTokenParsers

I'm trying to use regex in my StandardTokenParsers-based parser. For that, I've subclassed StdLexical as follows:
class CustomLexical extends StdLexical {
  def regex(r: Regex): Parser[String] = new Parser[String] {
    def apply(in: Input) = r.findPrefixMatchOf(in.source.subSequence(in.offset, in.source.length)) match {
      case Some(matched) => Success(in.source.subSequence(in.offset, in.offset + matched.end).toString,
        in.drop(matched.end))
      case None => Failure("string matching regex `" + r + "' expected but " + in.first + " found", in)
    }
  }
  override def token: Parser[Token] =
    ( regex("[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r) ^^ { StringLit(_) }
    | identChar ~ rep( identChar | digit ) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
    | ...
But I'm a little confused about how to define a Parser that takes advantage of this. I have a parser defined as:
def mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
which should be used to identify valid file paths. I tried then:
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
But this is obviously not right. I get an error:
scala: type mismatch;
found : scala.util.matching.Regex
required: McfpDSL.this.Parser[String]
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
^
What is the proper way of using the extension made on my StdLexical subclass?
If you really want to use token-based parsing and reuse StdLexical, I would advise updating the syntax for "TargetFolder" so that the value after the equals sign is a proper string literal; in other words, require the path to be enclosed in quotes. From that point on you no longer need to extend StdLexical.
Then comes the problem of converting a regexp to a parser. Scala already has RegexParsers for this (it implicitly converts a regexp to a Parser[String]), but unfortunately that's not what you want here, because it works on streams of Char (type Elem = Char in RegexParsers) while you are working on a stream of tokens.
So we will indeed have to define our own conversion from Regex to Parser[String] (but at the syntactic level rather than lexical level, or in other words in the token parser).
import scala.util.parsing.combinator.syntactical._
import scala.util.matching.Regex
import scala.util.parsing.input._
object MyParser extends StandardTokenParsers {
  import lexical.StringLit
  // Accept a string-literal token only if the regex matches its whole content
  def regexStringLit(r: Regex): Parser[String] = acceptMatch(
    "string literal matching regex " + r,
    { case StringLit( s ) if r.unapplySeq(s).isDefined => s }
  )
  lexical.delimiters += "="
  lexical.reserved += "TargetFolder"
  lazy val mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
  lazy val mFilePath: Parser[String] = regexStringLit("([a-zA-Z]:\\\\[\\w\\\\?]*)|(/[\\w/]*)".r)
  def parseTargetFolder( s: String ) = { mTargetFolder( new lexical.Scanner( s ) ) }
}
Example:
scala> MyParser.parseTargetFolder("""TargetFolder = "c:\Dir1\Dir2" """)
res12: MyParser.ParseResult[String] = [1.31] parsed: c:\Dir1\Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "/Dir1/Dir2" """)
res13: MyParser.ParseResult[String] = [1.29] parsed: /Dir1/Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "Hello world" """)
res14: MyParser.ParseResult[String] =
[1.16] failure: identifier matching regex ([a-zA-Z]:\\[\w\\?]*)|(/[\w/]*) expected
TargetFolder = "Hello world"
^
Note that I also fixed your "target folder" regexp here: you were missing parentheses around the two alternatives, and you had unneeded spaces.
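`regexStringLit` relies on `Regex.unapplySeq`, which only succeeds when the regex matches the whole string literal, whereas the `findPrefixMatchOf` used in the question's `CustomLexical` accepts any matching prefix. A quick standalone check of that difference, with the path regex written as a triple-quoted string (`RegexMatchDemo` and its methods are illustrative names):

```scala
import scala.util.matching.Regex

object RegexMatchDemo {
  // Same pattern as mFilePath; in a triple-quoted string each regex
  // backslash needs only one backslash in the source
  val path: Regex = """([a-zA-Z]:\\[\w\\?]*)|(/[\w/]*)""".r

  // unapplySeq succeeds only if the regex matches the entire string
  def matchesWhole(s: String): Boolean = path.unapplySeq(s).isDefined

  // findPrefixMatchOf succeeds if some prefix of the string matches
  def matchesPrefix(s: String): Boolean = path.findPrefixMatchOf(s).isDefined
}
```

This is why "Hello world" is rejected in the session above even though it is a valid string literal, and why "/Dir1 and more" would pass a prefix check but not `regexStringLit`.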
Just call your function regex when you want to get a Parser[String] from a Regex:
def p: Parser[String] = regex("".r)
Or make regex implicit to let the compiler call it automatically for you:
implicit def regex(r: Regex): Parser[String] = ...
// =>
def p: Parser[String] = "".r

Creating a recursive data structure using parser combinators in scala

I'm a beginner in Scala, working through S99 to learn the language. One of the problems involves converting a string into a tree data structure. I can do it "manually", but I also want to see how to do it using Scala's parser combinator library.
The data structure for the tree is
sealed abstract class Tree[+T]
case class Node[+T](value: T, left: Tree[T], right: Tree[T]) extends Tree[T] {
  override def toString = "T(" + value.toString + " " + left.toString + " " + right.toString + ")"
}
case object End extends Tree[Nothing] {
  override def toString = "."
}
object Node {
  def apply[T](value: T): Node[T] = Node(value, End, End)
}
And the input is supposed to be a string, like this: a(b(d,e),c(,f(g,)))
I can parse the string using something like
trait Tree extends JavaTokenParsers {
  def leaf: Parser[Any] = ident
  def child: Parser[Any] = node | leaf | ""
  def node: Parser[Any] = ident~"("~child~","~child~")" | leaf
}
But how can I use the parsing library to build the tree? I know that I can use ^^ to convert, for example, a string into an integer. My confusion comes from needing to 'know' the left and right subtrees when creating an instance of Node. How can I do that, or is that a sign that I should be doing something different?
Am I better off taking the thing the parser returns ((((((a~()~(((((b~()~d)~,)~e)~)))~,)~(((((c~()~)~,)~(((((f~()~g)~,)~)~)))~)))~)) for the example input above), and building the tree based on that, rather than use parser operators like ^^ or ^^^ to build the tree directly?
It is possible to do this cleanly with ^^, and you're fairly close:
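import scala.util.parsing.combinator.JavaTokenParsers

object TreeParser extends JavaTokenParsers {
  // A bare identifier becomes a node with two empty subtrees
  def leaf: Parser[Node[String]] = ident ^^ (Node(_))
  // A missing child ("") becomes End
  def child: Parser[Tree[String]] = node | leaf | "" ^^ (_ => End)
  def node: Parser[Tree[String]] =
    ident ~ ("(" ~> child) ~ ("," ~> child <~ ")") ^^ {
      case v ~ l ~ r => Node(v, l, r)
    } | leaf
}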
And now:
scala> TreeParser.parseAll(TreeParser.node, "a(b(d,e),c(,f(g,)))").get
res0: Tree[String] = T(a T(b T(d . .) T(e . .)) T(c . T(f T(g . .) .)))
In my opinion the easiest way to approach this kind of problem is to type the parser methods with the results you want, and then add the appropriate mapping operations with ^^ until the compiler is happy.

Scala parser combinators: retrieving the original string that the parser consumed

So I have a bunch of parsers like this:
import scala.language.postfixOps
import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
  override val skipWhitespace = false
  def blockLine = ((id ~ args) <~ ":") ~ ".*?".r ^^ {
    case (blockID ~ argList) ~ rest => ???
  }
  def args = (("[" ~> rep1sep(arg, ", ") <~ "]")?) ^^ {
    case Some(argList) =>
      argList.zipWithIndex.map {
        case ((Some(k), v), index) => k -> v
        case ((None, v), index) => "arg" + index -> v
      }
    case None => List()
  }
  def arg = ((id <~ "=")?) ~ argtext ^^ {
    case Some(name) ~ value => Some(name) -> value.toString()
    case None ~ value => None -> value
  }
  def argtext = "[^\\[\\],]+".r
  def id = "[a-zA-Z]*".r
  // ... many other parsers not shown ...
}
Essentially, I want to re-use the parsers id and args in blockLine, but rather than getting the nested tree of List()s and ~s, I want to get back the original string that was matched. The purpose of this is doing some smart text-preprocessing (using the same parsers that I will use later for the actual parsing) to insert some text in the middle of the line. Something like:
def blockLine = (rawText(id ~ args) <~ ":") ~ ".*?".r ^^ {
  case first ~ rest => first + "{" + rest
}
The higher purpose of the preprocessor is to go through and convert indentation-delimited blocks into curly-braces delimited blocks, so I can run the pre-processed file through a normal parser later. Is there any easy way to do this?
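One possible approach is a small hand-written combinator (the `rawText` from the question, which is not part of the library) that runs a parser, ignores its structured result, and returns the slice of input between the offsets before and after it. This sketch assumes a `CharSequence`-backed input such as the one `parseAll` builds from a `String`, and `skipWhitespace = false` as in the question (with `true`, whitespace skipped before the first token would end up in the slice). `id`, `argtext` and `args` are simplified stand-ins for the question's parsers.

```scala
import scala.util.parsing.combinator.RegexParsers

object RawTextDemo extends RegexParsers {
  override val skipWhitespace = false

  // Hypothetical combinator: on success, replace p's result with the raw text it
  // consumed; ParseResult.map leaves a NoSuccess untouched.
  def rawText[T](p: Parser[T]): Parser[String] = Parser { in =>
    val result = p(in)
    result.map(_ => in.source.subSequence(in.offset, result.next.offset).toString)
  }

  // Simplified stand-ins for the question's id and args
  def id: Parser[String] = "[a-zA-Z]*".r
  def argtext: Parser[String] = "[^\\[\\],]+".r
  def args: Parser[Option[List[String]]] = opt("[" ~> rep1sep(argtext, ", ") <~ "]")

  // ".*" rather than ".*?": a reluctant quantifier at the end of the pattern
  // matches the empty prefix and leaves the rest of the line unconsumed
  def blockLine: Parser[String] = (rawText(id ~ args) <~ ":") ~ ".*".r ^^ {
    case first ~ rest => first + "{" + rest
  }
}
```

With this, `RawTextDemo.parseAll(RawTextDemo.blockLine, "foo[a, b]: body")` yields "foo[a, b]{ body", reusing the same `id` and `args` parsers for preprocessing.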

scala parsing with nested parens

I'm trying to parse nested expressions like GroupParser.parse("{{a}{{c}{d}}}").
After many hours I now have the following snippet, which parses {a} well but fails with
[1.5] failure: ``}'' expected but `{' found
{{a}{{b}{c}}}
^
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

sealed abstract class Expr
case class ValueNode(value:String) extends Expr
object GroupParser extends StandardTokenParsers {
  lexical.delimiters ++= List("{","}")
  def vstring = ident ^^ { case s => ValueNode(s) }
  def expr = ( vstring | parens )
  def parens:Parser[Expr] = "{" ~> expr <~ "}"
  def parse(s:String) = {
    val tokens = new lexical.Scanner(s)
    phrase(expr)(tokens)
  }
}
Any hints?
The problem isn't nesting, it's sequencing. Your grammar allows arbitrary nesting of expressions inside curly braces, but doesn't say that expressions can be sequenced, so the parser can't handle {a} followed immediately by {{b}{c}}. You can code sequencing using explicit recursion in your grammar or by using one of the rep variants in http://www.scala-lang.org/api/current/scala/util/parsing/combinator/Parsers.html
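The explicit-recursion option could look like the following sketch. It swaps `StandardTokenParsers` for `RegexParsers` to stay short, and `SeqGroupParser`, `Leaf` and `Group` are hypothetical names; `items` expresses "one item, optionally followed by more items" without using `rep1`.

```scala
import scala.util.parsing.combinator.RegexParsers

object SeqGroupParser extends RegexParsers {
  sealed trait Expr
  case class Leaf(value: String) extends Expr
  case class Group(items: List[Expr]) extends Expr

  def leaf: Parser[Expr] = "[a-z]+".r ^^ (Leaf(_))

  // item  ::= leaf | "{" items "}"
  def item: Parser[Expr] = leaf | "{" ~> items <~ "}" ^^ (Group(_))

  // items ::= item [items]   (sequencing by explicit recursion instead of rep1)
  def items: Parser[List[Expr]] = item ~ opt(items) ^^ {
    case head ~ tail => head :: tail.getOrElse(Nil)
  }
}
```

Parsing "{{a}{{b}{c}}}" with `item` gives a `Group` whose two children are the groups for {a} and {{b}{c}}; the recursion is on the right, so there is no left recursion for the combinators to loop on.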
Can expressions be repeated multiple times? If so, this would work:
def expr = ( vstring | parens )+
However, it is not clear what your grammar is, or why your example should be accepted.
This parses the two examples you gave:
import scala.util.parsing.combinator.syntactical._
sealed abstract class Expr
case class ValueNode(value:String) extends Expr
case class ValueListNode(value:List[Expr]) extends Expr
object GroupParser extends StandardTokenParsers {
  lexical.delimiters ++= List("{","}")
  def vstring = ident ^^ { case s => ValueNode(s) }
  def parens:Parser[Expr] = "{" ~> ( expr ) <~ "}"
  def expr = vstring | parens
  def exprList:Parser[Expr] = "{" ~> rep1( expr | exprList ) <~ "}" ^^ {
    case l => {
      ValueListNode(l)
    }
  }
  def anyExpr = expr | exprList
  def parse(s:String) = {
    val tokens = new lexical.Scanner(s)
    phrase(anyExpr)(tokens)
  }
  def test(s: String) = {
    parse(s) match {
      case Success(tree, _) =>
        println("Tree: " + tree)
      case e: NoSuccess => Console.err.println(e)
    }
  }
  def main(args: Array[String]) = {
    test("{a}")
    test("{{a}}")
    test("{{a}{{b}{c}}}")
  }
}
And succeeds with output:
Tree: ValueNode(a)
Tree: ValueNode(a)
Tree: ValueListNode(List(ValueNode(a), ValueListNode(List(ValueNode(b), ValueNode(c)))))
