The following parser combinator snippet demonstrates an aim of generalising binary comparison ops like > by using Ordered[T]. Gt seems to accomplish this at the AST level, but I'm having trouble extending the concept.
The intGt parser works, but is it possible to generalise this around Ordered[T] so that we don't need to write a second parser for floatGt (and hence one for every supported orderable type times every supported op - no thanks)?
import scala.util.parsing.combinator.JavaTokenParsers

object DSL extends JavaTokenParsers {
// AST
abstract class Expr[+T] { def eval: T }
case class Literal[T](t: T) extends Expr[T] { def eval = t }
case class Gt[T <% Ordered[T]](l: Expr[T], r: Expr[T]) extends Expr[Boolean] {
def eval = l.eval > r.eval // view-bound implicitly wraps eval result as Ordered[T]
}
// Parsers
lazy val intExpr: Parser[Expr[Int]] = wholeNumber ^^ { case x => Literal(x.toInt) }
lazy val floatExpr: Parser[Expr[Float]] = decimalNumber ^^ { case x => Literal(x.toFloat) }
lazy val intGt: Parser[Expr[Boolean]] = intExpr ~ (">" ~> intExpr) ^^ { case l ~ r => Gt(l, r) }
}
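For reference, the working int-only version can be exercised like this (a quick check; the input string here is made up):

DSL.parseAll(DSL.intGt, "3 > 2").get.eval  // true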
I tried playing around and this is the best I could come up with in the time I had:
import scala.util.parsing.combinator.JavaTokenParsers
object DSL extends JavaTokenParsers {
// AST
abstract class Expr[+T] { def eval: T }
case class Literal[T](t: T) extends Expr[T] { def eval = t }
case class BinOp[T,U](
val l : Expr[T],
val r : Expr[T],
val evalOp : (T, T) => U) extends Expr[U] {
def eval = evalOp(l.eval, r.eval)
}
case class OrderOp[O <% Ordered[O]](symbol : String, op : (O, O) => Boolean)
def gtOp[O <% Ordered[O]] = OrderOp[O](">", _ > _)
def gteOp[O <% Ordered[O]] = OrderOp[O](">=", _ >= _)
def ltOp[O <% Ordered[O]] = OrderOp[O]("<", _ < _)
def lteOp[O <% Ordered[O]] = OrderOp[O]("<=", _ <= _)
def eqOp[O <% Ordered[O]] = OrderOp[O]("==", _.compareTo(_) == 0)
def ops[O <% Ordered[O]] =
Seq(gtOp[O], gteOp[O], ltOp[O], lteOp[O], eqOp[O])
def orderExpr[O <% Ordered[O]](
subExpr : Parser[Expr[O]],
orderOp : OrderOp[O])
: Parser[Expr[Boolean]] =
subExpr ~ (orderOp.symbol ~> subExpr) ^^
{ case l ~ r => BinOp(l, r, orderOp.op) }
// Parsers
lazy val intExpr: Parser[Expr[Int]] =
wholeNumber ^^ { case x => Literal(x.toInt) }
lazy val floatExpr: Parser[Expr[Float]] =
decimalNumber ^^ { case x => Literal(x.toFloat) }
lazy val intOrderOps : Parser[Expr[Boolean]] =
ops[Int].map(orderExpr(intExpr, _)).reduce(_ | _)
lazy val floatOrderOps : Parser[Expr[Boolean]] =
ops[Float].map(orderExpr(floatExpr, _)).reduce(_ | _)
}
Essentially, I defined a small case class OrderOp that relates the string representing an ordering operation to a function that evaluates that operation. I then defined a function ops capable of creating a Seq[OrderOp] of all such ordering operations for a given Ordered type. These operations can then be turned into parsers using orderExpr, which takes the sub-expression parser and the operation; orderExpr is then mapped over all the ordering operations for the Int and Float types.
Some issues with this approach:
There is only one node type in the AST type hierarchy for all binary operations. This isn't a problem if all you are ever doing is evaluating, but if you ever wanted to do rewriting operations (eliminating obvious tautologies or contradictions, for instance) then there is insufficient information to do this with the current definition of BinOp.
I still needed to map orderExpr for each distinct type. There may be a way to fix this, but I ran out of time (a small refactoring is sketched after this list).
orderExpr expects the left and right subexpressions to be parsed with the same parser.
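On that second point, one minor cleanup (a sketch only, not tested; the helper name orderOps is made up) is to move the map/reduce into a single helper inside the same DSL object, so that each supported type needs just one line. A parser per type is still required:

def orderOps[O <% Ordered[O]](subExpr: Parser[Expr[O]]): Parser[Expr[Boolean]] =
  ops[O].map(orderExpr(subExpr, _)).reduce(_ | _)

lazy val intOrderOps: Parser[Expr[Boolean]] = orderOps(intExpr)
lazy val floatOrderOps: Parser[Expr[Boolean]] = orderOps(floatExpr)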
Related
I'm just fooling about and, surprisingly, found it a bit tricky to parse nested brackets in a simple recursive function.
For example, if the program's purpose is to look up user details, it may go from {{name surname} age} to {Bob Builder age} and then to Bob Builder 20.
Here is a mini-program for summing totals in curly brackets that demonstrates the concept.
// Parses string recursively by eliminating brackets
def parse(s: String): String = {
if (!s.contains("{")) s
else {
parse(resolvePair(s))
}
}
// Sums one pair and returns the string, starting at deepest nested pair
// e.g.
// {2+10} lollies and {3+{4+5}} peanuts
// should return:
// {2+10} lollies and {3+9} peanuts
def resolvePair(s: String): String = {
??? // Replace the deepest nested pair with its sumString result
}
// Sums values in a string, returning the result as a string
// e.g. sumString("3+8") returns "11"
def sumString(s: String): String = {
val v = s.split("\\+")
v.foldLeft(0)(_.toInt + _.toInt).toString
}
// Should return "12 lollies and 12 peanuts"
parse("{2+10} lollies and {3+{4+5}} peanuts")
Any ideas for a clean bit of code that could replace the ??? would be great. It's mostly out of curiosity that I'm searching for an elegant solution to this problem.
Parser combinators can handle this kind of situation:
import scala.util.parsing.combinator.RegexParsers
object BraceParser extends RegexParsers {
override def skipWhitespace = false
def number = """\d+""".r ^^ { _.toInt }
def sum: Parser[Int] = "{" ~> (number | sum) ~ "+" ~ (number | sum) <~ "}" ^^ {
case x ~ "+" ~ y => x + y
}
def text = """[^{}]+""".r
def chunk = sum ^^ {_.toString } | text
def chunks = rep1(chunk) ^^ {_.mkString} | ""
def apply(input: String): String = parseAll(chunks, input) match {
case Success(result, _) => result
case failure: NoSuccess => scala.sys.error(failure.msg)
}
}
Then:
BraceParser("{2+10} lollies and {3+{4+5}} peanuts")
//> res0: String = 12 lollies and 12 peanuts
There is some investment before getting comfortable with parser combinators but I think it is really worth it.
To help you decipher the syntax above:
regular expressions and strings have implicit conversions that create primitive parsers with String results; they have type Parser[String].
the ^^ operator lets you apply a function to the parsed elements
it can convert a Parser[String] into a Parser[Int] by doing ^^ {_.toInt}
Parser is a monad and Parser[T].^^(f) is equivalent to Parser[T].map(f)
the ~, ~> and <~ require some inputs to be in a certain sequence
the ~> and <~ drop one side of the input out of the result
the case a ~ b lets you pattern match the results
Parser is a monad and (p ~ q) ^^ { case a ~ b => f(a, b) } is equivalent to for (a <- p; b <- q) yield (f(a, b))
(p <~ q) ^^ f is equivalent to for (a <- p; _ <- q) yield f(a)
rep1 is a repetition of one or more elements
| tries to match the input with the parser on its left and, if that fails, tries the parser on the right
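To make those points concrete, here is a throwaway sketch (the object and rule names Demo, num and pair are made up) showing ~, ~>, <~ and ^^ together:

import scala.util.parsing.combinator.RegexParsers

object Demo extends RegexParsers {
  def num: Parser[Int] = """\d+""".r ^^ { _.toInt }  // map the String result to Int
  def pair: Parser[(Int, Int)] =
    "(" ~> (num ~ ("," ~> num)) <~ ")" ^^ { case a ~ b => (a, b) }  // keep only the two numbers
}

Demo.parseAll(Demo.pair, "(1, 2)")  // Success((1,2), ...)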
How about
def resolvePair(s: String): String = {
val open = s.lastIndexOf('{')
val close = s.indexOf('}', open)
if((open >= 0) && (close > open)) {
val (a,b) = s.splitAt(open+1)
val (c,d) = b.splitAt(close-open-1)
resolvePair(a.dropRight(1)+sumString(c).toString+d.drop(1))
} else
s
}
I know it's ugly but I think it works fine.
I don't know if this info is relevant to the question, but I am learning Scala parser combinators.
Using some examples (in this master thesis) I was able to write a simple functional (in the sense that it is non-imperative) programming language.
Is there a way to improve my parser/evaluator such that it could allow/evaluate input like this:
<%
import scala.<some package / classes>
import weka.<some package / classes>
%>
some DSL code (lambda calculus)
<%
System.out.println("asdasd");
J48 j48 = new J48();
%>
as input written in the guest language (DSL)?
Should I use reflection or something similar to evaluate such input?
Is there some source code recommendation to study (maybe the Groovy sources?)?
Maybe runtime compilation is something similar, but I am not sure it is the best alternative.
EDIT
Complete answer given below with "{" and "}". Maybe "{{" would be better.
The question is what the meaning of such import statements should be.
Perhaps start by allowing references to Java methods in your language (the lambda calculus, I guess?).
For example:
java.lang.System.out.println "foo"
If you have that, you can then add resolution of unqualified names like
println "foo"
But here comes the first problem: println exists in System.out and System.err, or, to be more correct: it is a method of PrintStream, and both System.err and System.out are PrintStreams.
Hence you would need some notion of Objects, Classes, Types, and so on to do it right.
I managed to run Scala code embedded in my interpreted DSL.
Insertion of DSL vars into the Scala code and recovery of the return value come as a bonus. :)
Minimal relevant code, from parsing and interpreting through to run-time execution of the embedded Scala code (Main, Parser, AST and Interpreter):
object Main extends App {
val ast = Parser1 parse "some dsl code here"
Interpreter eval ast
}
object Parser1 extends RegexParsers with ImplicitConversions {
import AST._
val separator = ";"
def parse(input: String): Expr = parseAll(program, input).get
type P[+T] = Parser[T]
def program = rep1sep(expr, separator) <~ separator ^^ Sequence
def expr: Parser[Expr] = (assign /*more calls here*/)
def scalacode: P[Expr] = "{" ~> rep(scala_text) <~ "}" ^^ {case l => Scalacode(l.flatten)}
def scala_text = text_no_braces ~ "$" ~ ident ~ text_no_braces ^^ {case a ~ b ~ c ~ d => List(a, b + c, d)}
//more rules here
def assign = ident ~ ("=" ~> atomic_expr) ^^ Assign
//more rules here
def atomic_expr = (
ident ^^ Var
//more calls here
| "(" ~> expr <~ ")"
| scalacode
| failure("expression expected")
)
def text_no_braces = """[a-zA-Z0-9\"\'\+\-\_!##%\&\(\)\[\]\/\?\:;\.\>\<\,\|= \*\\\n]*""".r //| fail("Scala code expected")
def ident = """[a-zA-Z]+[a-zA-Z0-9]*""".r
}
object AST {
sealed abstract class Expr
// more classes here
case class Scalacode(items: List[String]) extends Expr
case class Literal(v: Any) extends Expr
case class Var(name: String) extends Expr
}
object Interpreter {
import AST._
val env = collection.immutable.Map[VarName, VarValue]()
def run(code: String) = {
val code2 = "val res_1 = (" + code + ")"
interpret.interpret(code2)
val res = interpret.valueOfTerm("res_1")
if (res == None) Literal() else Literal(res.get)
}
class Context(private var env: Environment = initEnv) {
def eval(e: Expr): Any = e match {
case Scalacode(l: List[String]) => {
val r = l map {
x =>
if (x.startsWith("$")) {
eval(Var(x.drop(1)))
} else {
x
}
}
eval(run(r.mkString))
}
case Assign(id, expr) => env += (id -> eval(expr))
//more pattern matching here
case Literal(v) => v
case Var(id) => {
env getOrElse(id, sys.error("Undefined " + id))
}
}
}
}
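The interpret value used in run above is not shown in the excerpt; presumably it is an instance of the Scala REPL's IMain. A minimal sketch of constructing one (assuming a 2.10/2.11-era scala-compiler on the classpath; the val name interpret just mirrors the usage above):

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

val settings = new Settings
settings.usejavacp.value = true   // reuse the host JVM's classpath
val interpret = new IMain(settings)
// interpret.interpret("val res_1 = (1 + 1)") followed by
// interpret.valueOfTerm("res_1") should then yield Some(2)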
So I have a bunch of parsers like this:
object MyParser extends RegexParsers{
override val skipWhitespace = false
def blockLine = ((id ~ args) <~ ":") ~ ".*?".r ^^ {
case (blockID ~ argList) ~ rest => ???
}
def args = (("[" ~> rep1sep(arg, ", ") <~ "]")?) ^^ {
case Some(argList) =>
argList.zipWithIndex.map{
case ((Some(k), v), index) => k -> v
case ((None, v), index) => "arg" + index -> v
}
case None => List()
}
def arg = ((id <~ "=")?) ~ argtext ^^ {
case Some(name) ~ value => Some(name) -> value.toString()
case None ~ value=> None -> value
}
def argtext = "[^\\[\\],]+".r
def id = "[a-zA-Z]*".r
... many other parsers not shown...
}
Essentially, I want to re-use the parsers id and args in blockLine, but rather than getting the nested tree of List()s and ~s, I want to get back the original string that was matched. The purpose of this is to do some smart text preprocessing (using the same parsers that I will use later for the actual parsing) to insert some text in the middle of the line. Something like:
def blockLine = (rawText(id ~ args) <~ ":") ~ ".*?".r ^^ {
case first ~ rest => first + "{" + rest
}
The higher purpose of the preprocessor is to go through and convert indentation-delimited blocks into curly-braces delimited blocks, so I can run the pre-processed file through a normal parser later. Is there any easy way to do this?
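One way to get at this (a sketch only; rawText is not part of the library, the name is made up, and it assumes the input is backed by a CharSequenceReader, which is what parseAll on a String gives you) is a combinator that runs the inner parser just for its extent and returns the slice of input it consumed:

def rawText[T](p: Parser[T]): Parser[String] = new Parser[String] {
  def apply(in: Input) = p(in) match {
    case Success(_, next) =>
      // return the characters between the reader offsets before and after the inner parse
      Success(in.source.subSequence(in.offset, next.offset).toString, next)
    case ns: NoSuccess => ns
  }
}

With rawText in scope inside MyParser, blockLine could then be written roughly as sketched above.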
I am using Scala's combinator parser as follows:
def a = b ~ c ^^ { case x ~ y => A(x,y) }
def b = ... { B() }
def c = ... { C() }
Now I have a feature change that requires a reference to the previously parsed B to be passed in as a val of C during parsing. So C's constructor is something like:
C(ref: B)
The only way I can imagine achieving this is a dirty patchwork: assigning the parsed B instance to a var and handing it to def c in between the parsing steps of a. Something like the following:
var temp:B = null
def a = ( b ^^ { case x => temp = x } )
~ c(temp) ^^ { case x ~ y => A(x, y) }
Is there a standard, clean way of doing this? The definition of a can't be broken; it is used in many places in the rest of the code.
The other solution is to use a var instead of a val in C and have the following:
def a = (b ~ c) ^^ { case x ~ y => y.ref = x; A(x, y) }
But this is also not acceptable: it would "work" now, but it would involve extra effort and boilerplate code in future development.
I've not tested this code, as this is a small part and all the changes require a lot of effort, so I want an expert opinion first.
Without changing the definition of a, there is no way to do this cleanly. The ~ combinator produces a new Parser which applies b and c in sequence, then tuples (well, logically tuples) up the results and returns them as its result. The key point is that the application of c is not a function of the output of b, thus there is nothing you can do to get the results of b inside the application of c.
What I would do is add a new combinator which does what you want. I'm not feeling particularly creative name-wise, but I think this should give you a rough idea:
implicit def cleverParserSyntax[A](left: Parser[A]) = new {
def ~%[B](right: A => Parser[B]): Parser[A ~ B] = for {
lr <- left
rr <- right(lr)
} yield new ~(lr, rr)
}
def a = b ~% c ^^ { case x ~ y => A(x,y) }
def b = ... { B() }
def c(res: B) = ... { C(res) }
I'm not sure if I understand the problem correctly but if C depends on B why not express this in functional way?
case class B(...)
case class C(b: B, ...)
case class A(b: B, c: C)
def b: Parser[B] = ... ^^ { B(...) }
def c: Parser[B => C] = ... ^^ { C(_, ...) }
def a: Parser[A] = b ~ c ^^ { case b ~ c => A(b, c(b)) }
This way your problem is solved and you have your dependencies expressed explicitly and in a concise way.
I'd do this:
case class B(x: String)
case class C(b: B, x: String)
case class A(b: B, c: C)
class MyParser extends RegexParsers {
def a = b >> c ^^ { case x ~ y => A(x, y) }
def b = "\\w+".r ^^ B
def c(b: B) = "\\d+".r ^^ (x => new ~(b, C(b, x)))
}
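For illustration, a possible run (the input string is made up; b consumes a word, then c, having seen that B, consumes digits):

val p = new MyParser
p.parseAll(p.a, "abc 123")
// yields something like Success(A(B(abc),C(B(abc),123)), ...)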
Now, if B happens much before C, things get more complicated. If things get that hairy, I suggest searching for the paper on Scala's parser combinators, which goes into a lot of very advanced features.
I'm trying to figure out how to terminate a repetition of words using a keyword. An example:
class CAQueryLanguage extends JavaTokenParsers {
def expression = ("START" ~ words ~ "END") ^^ { x =>
println("expression: " + x);
x
}
def words = rep(word) ^^ { x =>
println("words: " + x)
x
}
def word = """\w+""".r
}
When I execute
val caql = new CAQueryLanguage
caql.parseAll(caql.expression, "START one two END")
It prints words: List(one, two, END), indicating the words parser has consumed the END keyword in my input, leaving the expression parser unable to match. I would like END to not be matched by words, which will allow expression to successfully parse.
Is this what you are looking for?
import scala.util.parsing.combinator.syntactical._
object CAQuery extends StandardTokenParsers {
lexical.reserved += ("START", "END")
lexical.delimiters += (" ")
def query:Parser[Any]= "START" ~> rep1(ident) <~ "END"
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(query)(tokens)
}
}
println(CAQuery.parse("""START a END""")) //List(a)
println(CAQuery.parse("""START a b c END""")) //List(a, b, c)
If you would like more details, you can check out this blog post
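For completeness, a sketch that stays with JavaTokenParsers instead of switching to a token-based lexer (the class name is made up, and it assumes END never needs to be parsed as an ordinary data word): guard word with not so that the repetition stops before the keyword.

import scala.util.parsing.combinator.JavaTokenParsers

class CAQueryLanguage2 extends JavaTokenParsers {
  def expression = "START" ~> words <~ "END"
  def words = rep(word)
  // not(...) succeeds without consuming input only when its argument fails,
  // so rep(word) stops as soon as the next token is END
  def word = not("END") ~> """\w+""".r
}

val caql = new CAQueryLanguage2
caql.parseAll(caql.expression, "START one two END")  // Success(List(one, two), ...)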