I'm just fooling about and strangely found it a bit tricky to parse nested brackets in a simple recursive function.
For example, if the program's purpose is to look up user details, it may go from {{name surname} age} to {Bob Builder age} and then to Bob Builder 20.
Here is a mini-program for summing totals in curly brackets that demonstrates the concept.
// Parses string recursively by eliminating brackets
def parse(s: String): String = {
  if (!s.contains("{")) s
  else {
    parse(resolvePair(s))
  }
}
// Sums one pair and returns the string, starting at deepest nested pair
// e.g.
// {2+10} lollies and {3+{4+5}} peanuts
// should return:
// {2+10} lollies and {3+9} peanuts
def resolvePair(s: String): String = {
  ??? // Replace the deepest nested pair with its sumString result
}
// Sums values in a string, returning the result as a string
// e.g. sumString("3+8") returns "11"
def sumString(s: String): String = {
  val v = s.split("\\+")
  v.foldLeft(0)(_.toInt + _.toInt).toString
}
// Should return "12 lollies and 12 peanuts"
parse("{2+10} lollies and {3+{4+5}} peanuts")
Any ideas for a clean bit of code that could replace the ??? would be great. It's mostly out of curiosity that I'm searching for an elegant solution to this problem.
Parser combinators can handle this kind of situation:
import scala.util.parsing.combinator.RegexParsers
object BraceParser extends RegexParsers {
  override def skipWhitespace = false
  def number = """\d+""".r ^^ { _.toInt }
  def sum: Parser[Int] = "{" ~> (number | sum) ~ "+" ~ (number | sum) <~ "}" ^^ {
    case x ~ "+" ~ y => x + y
  }
  def text = """[^{}]+""".r
  def chunk = sum ^^ { _.toString } | text
  def chunks = rep1(chunk) ^^ { _.mkString } | ""
  def apply(input: String): String = parseAll(chunks, input) match {
    case Success(result, _) => result
    case failure: NoSuccess => scala.sys.error(failure.msg)
  }
}
Then:
BraceParser("{2+10} lollies and {3+{4+5}} peanuts")
//> res0: String = 12 lollies and 12 peanuts
There is some investment required before you get comfortable with parser combinators, but I think it is really worth it.
To help you decipher the syntax above:
Regular expressions and strings have implicit conversions that create primitive parsers with string results; they have type Parser[String].
The ^^ operator applies a function to the parsed elements;
for instance it can convert a Parser[String] into a Parser[Int] by doing ^^ { _.toInt }.
Parser is a monad, and Parser[T].^^(f) is equivalent to Parser[T].map(f).
The ~, ~> and <~ operators require inputs to appear in a certain sequence;
~> and <~ additionally drop one side of the input out of the result.
The case a ~ b syntax lets you pattern match on the combined results.
Parser is a monad, and (p ~ q) ^^ { case a ~ b => f(a, b) } is equivalent to for (a <- p; b <- q) yield f(a, b).
(p <~ q) ^^ f is equivalent to for (a <- p; _ <- q) yield f(a).
rep1 is a repetition of 1 or more elements.
| tries to match the input with the parser on its left and, if that fails, tries the parser on the right.
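As a tiny illustration (my own, not part of the answer above), here is the same "a+b" parser written once in operator style and once as the equivalent for comprehension:
import scala.util.parsing.combinator.RegexParsers
object PlusParser extends RegexParsers {
  val number: Parser[Int] = """\d+""".r ^^ { _.toInt }
  // operator style: keep the left number, drop the "+", keep the right number
  val sum1: Parser[Int] = (number <~ "+") ~ number ^^ { case a ~ b => a + b }
  // equivalent monadic style, using Parser's map and flatMap
  val sum2: Parser[Int] = for (a <- number <~ "+"; b <- number) yield a + b
}
PlusParser.parseAll(PlusParser.sum1, "3+8").get  // 11
PlusParser.parseAll(PlusParser.sum2, "3+8").get  // 11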
How about
def resolvePair(s: String): String = {
  val open = s.lastIndexOf('{')
  val close = s.indexOf('}', open)
  if ((open >= 0) && (close > open)) {
    val (a, b) = s.splitAt(open + 1)
    val (c, d) = b.splitAt(close - open - 1)
    resolvePair(a.dropRight(1) + sumString(c) + d.drop(1))
  } else
    s
}
I know it's ugly but I think it works fine.
I don't know if this info is relevant to the question, but I am learning Scala parser combinators.
Using some examples (from this master's thesis) I was able to write a simple functional (in the sense that it is non-imperative) programming language.
Is there a way to improve my parser/evaluator such that it could allow/evaluate input like this:
<%
import scala.<some package / classes>
import weka.<some package / classes>
%>
some DSL code (lambda calculus)
<%
System.out.println("asdasd");
J48 j48 = new J48();
%>
as input written in the guest language (DSL)?
Should I use reflection or something similar to evaluate such input?
Is there some source code recommendation to study (maybe the Groovy sources?)?
Maybe runtime compilation is something similar to what I need, but I am not sure it is the best alternative.
EDIT
Complete answer given below with "{" and "}". Maybe "{{" would be better.
The question is what the meaning of such import statements should be.
Perhaps you should start by allowing references to Java methods in your language (the lambda calculus, I guess?).
For example:
java.lang.System.out.println "foo"
If you have that, you can then add resolution of unqualified names like
println "foo"
But here comes the first problem: println exists in System.out and System.err, or, to be more correct: it is a method of PrintStream, and both System.err and System.out are PrintStreams.
Hence you would need some notion of Objects, Classes, Types, and so on to do it right.
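For a first step, a sketch like the following (my own illustration, nothing here is assumed to be from the original project) resolves a fully qualified call such as java.lang.System.out.println "foo" with plain Java reflection:
object QualifiedCall {
  // Look up java.lang.System, read its static `out` field, then reflectively
  // invoke println(String) on the resulting PrintStream.
  def callPrintln(message: String): Unit = {
    val systemClass = Class.forName("java.lang.System")
    val out = systemClass.getField("out").get(null) // static field access
    val printlnMethod = out.getClass.getMethod("println", classOf[String])
    printlnMethod.invoke(out, message)
  }
}
QualifiedCall.callPrintln("foo") // prints: foo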
I figured out how to run Scala code embedded in my interpreted DSL.
Inserting DSL vars into the Scala code and recovering the return value comes as a bonus. :)
Minimal relevant code, from parsing and interpreting through to run-time execution of the embedded Scala code (Main, Parser, AST and Interpreter):
object Main extends App {
  val ast = Parser1 parse "some dsl code here"
  Interpreter eval ast
}
import scala.util.parsing.combinator.{ImplicitConversions, RegexParsers}
object Parser1 extends RegexParsers with ImplicitConversions {
  import AST._
  val separator = ";"
  def parse(input: String): Expr = parseAll(program, input).get
  type P[+T] = Parser[T]
  def program = rep1sep(expr, separator) <~ separator ^^ Sequence
  def expr: Parser[Expr] = (assign /*more calls here*/)
  def scalacode: P[Expr] = "{" ~> rep(scala_text) <~ "}" ^^ { case l => Scalacode(l.flatten) }
  def scala_text = text_no_braces ~ "$" ~ ident ~ text_no_braces ^^ { case a ~ b ~ c ~ d => List(a, b + c, d) }
  //more rules here
  def assign = ident ~ ("=" ~> atomic_expr) ^^ Assign
  //more rules here
  def atomic_expr = (
    ident ^^ Var
    //more calls here
    | "(" ~> expr <~ ")"
    | scalacode
    | failure("expression expected")
  )
  def text_no_braces = """[a-zA-Z0-9\"\'\+\-\_!##%\&\(\)\[\]\/\?\:;\.\>\<\,\|= \*\\\n]*""".r //| fail("Scala code expected")
  def ident = """[a-zA-Z]+[a-zA-Z0-9]*""".r
}
object AST {
  sealed abstract class Expr
  // more classes here
  case class Scalacode(items: List[String]) extends Expr
  case class Literal(v: Any) extends Expr
  case class Var(name: String) extends Expr
}
object Interpreter {
  import AST._
  // VarName, VarValue, Environment, initEnv and the `interpret` instance
  // (an embedded scala.tools.nsc.interpreter.IMain) are defined elsewhere in the project.
  val env = collection.immutable.Map[VarName, VarValue]()
  def run(code: String) = {
    val code2 = "val res_1 = (" + code + ")"
    interpret.interpret(code2)
    val res = interpret.valueOfTerm("res_1")
    if (res.isEmpty) Literal(()) else Literal(res.get)
  }
  class Context(private var env: Environment = initEnv) {
    def eval(e: Expr): Any = e match {
      case Scalacode(l: List[String]) =>
        val r = l map { x =>
          if (x.startsWith("$")) {
            // splice in the current value of a DSL variable
            eval(Var(x.drop(1)))
          } else {
            x
          }
        }
        eval(run(r.mkString))
      case Assign(id, expr) => env += (id -> eval(expr))
      //more pattern matching here
      case Literal(v) => v
      case Var(id) => env getOrElse (id, sys.error("Undefined " + id))
    }
  }
}
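The embedded-Scala evaluation step itself can be isolated into a few lines. A self-contained sketch, assuming Scala 2.10/2.11 with scala-compiler on the classpath (the object and method names here are mine, not verbatim from the project above):
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain
object EmbeddedScala {
  private val settings = new Settings
  settings.usejavacp.value = true // reuse the host application's classpath
  private val interpret = new IMain(settings)
  // Bind the snippet's result to a known name, run it, then read the value back.
  def run(code: String): Option[Any] = {
    interpret.interpret("val res_1 = (" + code + ")")
    interpret.valueOfTerm("res_1")
  }
}
EmbeddedScala.run("1 + 2") // Some(3)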
I'm trying to split input on some keywords, without a delimiter such as whitespace.
import scala.util.parsing.combinator._
object MyParser extends JavaTokenParsers {
  def expr = (text | keyword).+
  def text = ".+".r ^^ ("'" + _ + "'")
  def keyword = "ID".r ^^ ("[" + _ + "]")
}
val p = MyParser
p.parse(p.expr, "fooIDbar") match {
  case p.Success(r, _) => r foreach print
  case x => println(x.toString)
}
This outputs the following:
>> 'fooIDbar'
But what I really want is this:
>> 'foo'[ID]'bar'
It seems text engulfs all the characters. I tried to express [text that doesn't contain the keyword], but I couldn't. How can I express this in a regex or with Scala parsers? Or is there any other solution?
I have seen some related posts, but they don't work in my case.
First, the keyword is a constant word, so you don't need a regex; a plain string is enough.
Second, text should be a string that doesn't contain the keyword, not just any string. Try this:
import util.parsing.combinator._
object MyParser extends JavaTokenParsers {
  def expr = (text | keyword).+
  def text = """((?!ID).)+""".r ^^ ("'" + _ + "'")
  def keyword = "ID" ^^ ("[" + _ + "]")
}
val p = MyParser
p.parse(p.expr, "fooIDbar") match {
  case p.Success(r, _) => r foreach print
  case x => println(x.toString)
}
As for the trick of writing a regex that does not match something, read this.
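With that text rule, the snippet above prints 'foo'[ID]'bar': text now stops in front of every occurrence of ID, so the keyword parser gets its chance to consume it.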
I'm playing around with a toy HTML parser, to help familiarize myself with Scala's parsing combinators library:
import scala.util.parsing.combinator._
sealed abstract class Node
case class TextNode(val contents: String) extends Node
case class Element(
  val tag: String,
  val attributes: Map[String, Option[String]],
  val children: Seq[Node]
) extends Node
object HTML extends RegexParsers {
  val node: Parser[Node] = text | element
  val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
  val label: Parser[String] = """(\w[:\w]*)""".r
  val value: Parser[String] = """("[^"]*"|\w+)""".r
  val attribute: Parser[(String, Option[String])] = label ~ (
    "=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
  ) ^^ { case (k ~ v) => k -> v }
  val element: Parser[Element] = (
    ("<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">")
      ~ rep(node) ~
      ("</" ~> label <~ ">")
  ) ^^ {
    case (tag ~ attributes ~ children ~ close) => Element(tag, Map(attributes: _*), children)
  }
}
What I'm realizing I want is some way to make sure my opening and closing tags match.
I think to do that, I need some sort of flatMap combinator with a signature like Parser[A] => (A => Parser[B]) => Parser[B],
so I can use the opening tag to construct the parser for the closing tag. But I don't see anything matching that signature in the library.
What's the proper way to do this?
You can write a method that takes a tag name and returns a parser for a closing tag with that name:
object HTML extends RegexParsers {
  lazy val node: Parser[Node] = text | element
  val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
  val label: Parser[String] = """(\w[:\w]*)""".r
  val value: Parser[String] = """("[^"]*"|\w+)""".r
  val attribute: Parser[(String, Option[String])] = label ~ (
    "=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
  ) ^^ { case (k ~ v) => k -> v }
  val openTag: Parser[String ~ Seq[(String, Option[String])]] =
    "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">"
  def closeTag(name: String): Parser[String] = "</" ~> name <~ ">"
  val element: Parser[Element] = openTag.flatMap {
    case (tag ~ attrs) =>
      rep(node) <~ closeTag(tag) ^^
        (children => Element(tag, attrs.toMap, children))
  }
}
Note that you also need to make node lazy. Now you get a nice clean error message for unmatched tags:
scala> HTML.parse(HTML.element, "<a></b>")
res0: HTML.ParseResult[Element] =
[1.6] failure: `a' expected but `b' found
<a></b>
^
I've been a little more verbose than necessary for the sake of clarity. If you want concision you can skip the openTag and closeTag methods and write element like this, for example:
val element = "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" >> {
  case (tag ~ attrs) =>
    rep(node) <~ "</" ~> tag <~ ">" ^^
      (children => Element(tag, attrs.toMap, children))
}
I'm sure more concise versions would be possible, but in my opinion even this is edging toward unreadability.
There is a flatMap on Parser, and also an equivalent method named into and an operator >>, which might be more convenient aliases (flatMap is still needed when used in for comprehensions). It is indeed a valid way to do what you're looking for.
Alternatively, you can check that the tags match with ^?.
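For completeness, here is a sketch of the ^? route (my own, assumed to live inside the same HTML object as above): parse the open tag, children and close tag unconditionally, then accept the result only when the two tag names agree.
val elementChecked: Parser[Element] =
  (("<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">") ~ rep(node) ~ ("</" ~> label <~ ">")) ^? (
    // accept only matching tags
    { case tag ~ attrs ~ children ~ close if tag == close => Element(tag, attrs.toMap, children) },
    // otherwise produce an error message
    { case tag ~ _ ~ _ ~ close => "closing tag </" + close + "> does not match <" + tag + ">" }
  )
The drawback compared to flatMap is that the mismatch is only reported after the whole element has been parsed.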
You are looking in the wrong place. It's a common mistake, though. You want a method Parser[A] => (A => Parser[B]) => Parser[B], but you looked at the docs of Parsers, not Parser.
Look here.
There's a flatMap, also known as into or >>.
So I have a bunch of parsers like this:
object MyParser extends RegexParsers {
  override val skipWhitespace = false
  def blockLine = ((id ~ args) <~ ":") ~ ".*?".r ^^ {
    case (blockID ~ argList) ~ rest => ???
  }
  def args = (("[" ~> rep1sep(arg, ", ") <~ "]")?) ^^ {
    case Some(argList) =>
      argList.zipWithIndex.map {
        case ((Some(k), v), index) => k -> v
        case ((None, v), index) => "arg" + index -> v
      }
    case None => List()
  }
  def arg = ((id <~ "=")?) ~ argtext ^^ {
    case Some(name) ~ value => Some(name) -> value.toString()
    case None ~ value => None -> value
  }
  def argtext = "[^\\[\\],]+".r
  def id = "[a-zA-Z]*".r
  // ... many other parsers not shown ...
}
Essentially, I want to re-use the parsers id and args in blockLine, but rather than getting the nested tree of List()s and ~s, I want to get back the original string that was matched. The purpose of this is to do some smart text preprocessing (using the same parsers that I will use later for the actual parsing) to insert some text in the middle of the line. Something like:
def blockLine = (rawText(id ~ args) <~ ":") ~ ".*?".r ^^ {
  case first ~ rest => first + "{" + rest
}
The higher purpose of the preprocessor is to go through and convert indentation-delimited blocks into curly-braces delimited blocks, so I can run the pre-processed file through a normal parser later. Is there any easy way to do this?
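One possible shape for such a combinator, sketched under the assumption that it sits inside the same RegexParsers subclass (the name rawText and the approach are mine, not an existing library method): run the inner parser and, on success, return the slice of input it consumed instead of its result.
def rawText[T](p: Parser[T]): Parser[String] = new Parser[String] {
  def apply(in: Input) = p(in) match {
    case Success(_, next) =>
      // keep the characters between where the inner parser started and stopped
      Success(in.source.subSequence(in.offset, next.offset).toString, next)
    case ns: NoSuccess => ns
  }
}
With that in place, blockLine can be written roughly as in the snippet above, since rawText(id ~ args) yields the matched text as a plain String.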
I'm trying to figure out how to terminate a repetition of words using a keyword. An example:
class CAQueryLanguage extends JavaTokenParsers {
  def expression = ("START" ~ words ~ "END") ^^ { x =>
    println("expression: " + x)
    x
  }
  def words = rep(word) ^^ { x =>
    println("words: " + x)
    x
  }
  def word = """\w+""".r
}
When I execute
val caql = new CAQueryLanguage
caql.parseAll(caql.expression, "START one two END")
It prints words: List(one, two, END), indicating that the words parser has consumed the END keyword in my input, leaving the expression parser unable to match. I would like END not to be matched by words, which would allow expression to parse successfully.
Is this what you are looking for?
import scala.util.parsing.combinator.syntactical._
object CAQuery extends StandardTokenParsers {
  lexical.reserved += ("START", "END")
  lexical.delimiters += (" ")
  def query: Parser[Any] = "START" ~> rep1(ident) <~ "END"
  def parse(s: String) = {
    val tokens = new lexical.Scanner(s)
    phrase(query)(tokens)
  }
}
println(CAQuery.parse("""START a END""")) //List(a)
println(CAQuery.parse("""START a b c END""")) //List(a, b, c)
If you would like more details, you can check out this blog post
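If you prefer to stay with the original JavaTokenParsers grammar, another option (my own sketch, not from the blog post) is to guard word with not, which acts as negative lookahead and fails without consuming input:
import scala.util.parsing.combinator.JavaTokenParsers
class CAQueryLanguage2 extends JavaTokenParsers {
  def expression = "START" ~> words <~ "END"
  // rep stops right before END because not("END") fails there without consuming it
  def words = rep(not("END") ~> word)
  def word = """\w+""".r
}
val caql2 = new CAQueryLanguage2
caql2.parseAll(caql2.expression, "START one two END") // parses to List(one, two)
Note that this also stops at any word that merely starts with END; a real grammar would add a word-boundary check to the keyword.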