Errors and failures in Scala Parser Combinators

I would like to implement a parser for a given language using Scala Parser Combinators. However, the software that will compile the language does not implement all of the language's features, so I would like the parser to fail if those features are used. I put together a small example below:
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~ "world" ^^ { case _ => ??? } |
    "hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
That is, the parser should succeed on "hello" followed by an identifier, but fail if the identifier is "world". I see that the Parsers class has failure() and err() parsers, but I cannot figure out how to use them, as they return a Parser[Nothing] instead of a String. The documentation does not seem to cover this use case...

In this case you want err, not failure, since if the first parser in a disjunction fails you'll just move on to the second, which isn't what you want.
The other issue is that ^^ is the equivalent of map, but you want flatMap, since err("whatever") is a Parser[Nothing], not a Nothing. You could use the flatMap method on Parser, but in this context it's more idiomatic to use the (completely equivalent) >> operator:
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~> "world" >> (x => err(s"Can't say hello to the $x!")) |
    "hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
Or, a little more simply:
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~ "world" ~> err(s"Can't say hello to the world!") |
    "hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
Either approach should do what you want.
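To see the difference between err and failure concretely, here is a small self-contained sketch, assuming the scala-parser-combinators library is on the classpath. The object name ErrVsFailure and the parser names withErr and withFailure are illustrative; the two parsers differ only in the first alternative:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object ErrVsFailure extends JavaTokenParsers {
  // err(...) produces an Error, which `|` does not recover from,
  // so "hello world" is rejected instead of reaching the ident rule.
  def withErr: Parser[String] =
    "hello" ~ "world" ~> err("Can't say hello to the world!") |
    "hello" ~ ident ^^ { case _ ~ id => s"hi, $id" }

  // failure(...) produces an ordinary Failure, so `|` backtracks
  // and the second alternative accepts "hello world" after all.
  def withFailure: Parser[String] =
    "hello" ~ "world" ~> failure("Can't say hello to the world!") |
    "hello" ~ ident ^^ { case _ ~ id => s"hi, $id" }
}
```

With withErr, parseAll rejects "hello world" with the Error message, while withFailure silently returns "hi, world". One caveat: a literal such as "world" also matches the first five characters of "worldwide", so the err branch would fire on "hello worldwide" too; the ^? approach compares the whole identifier instead.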

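You could use the ^? method, which applies a partial function to the parsed result and fails with the message built by its second argument when the partial function is not defined: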
object TestFail extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~> ident ^? (
      { case id if id != "world" => s"hi, $id" },
      s => s"Should not use '$s' here."
    )
}
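A sketch of how the message from the second argument of ^? reaches the caller, using the same parser. The object name CaretQuestion and the run helper, which turns the ParseResult into an Either, are illustrative additions:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object CaretQuestion extends JavaTokenParsers {
  def test: Parser[String] =
    "hello" ~> ident ^? (
      { case id if id != "world" => s"hi, $id" },
      s => s"Should not use '$s' here."
    )

  // Right(greeting) on success, Left(message) when parsing fails
  def run(input: String): Either[String, String] =
    parseAll(test, input) match {
      case ns: NoSuccess => Left(ns.msg)
      case ok            => Right(ok.get)
    }
}
```

Because ^? produces an ordinary Failure rather than an Error, an enclosing | could still try another alternative after it.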

Related

How to embed Scala code inside a specially defined syntax?

I don't know if this info is relevant to the question, but I am learning Scala parser combinators.
Using some examples (in this master's thesis) I was able to write a simple functional (in the sense that it is non-imperative) programming language.
Is there a way to improve my parser/evaluator so that it accepts and evaluates input like this:
<%
import scala.<some package / classes>
import weka.<some package / classes>
%>
some DSL code (lambda calculus)
<%
System.out.println("asdasd");
J48 j48 = new J48();
%>
as input written in the guest language (DSL)?
Should I use reflection or something similar to evaluate such input?
Is there any source code you would recommend studying (maybe the Groovy sources?)?
Runtime compilation might be an option, but I am not sure it is the best alternative.
EDIT
A complete answer is given below, using "{" and "}" as delimiters. Maybe "{{" would be better.
The question is what such import statements should mean.
Perhaps you could start by allowing references to Java methods in your language (the lambda calculus, I guess?).
For example:
java.lang.System.out.println "foo"
If you have that, you can then add resolution of unqualified names like
println "foo"
But here comes the first problem: println exists in both System.out and System.err or, more precisely, it is a method of PrintStream, and both System.err and System.out are PrintStreams.
Hence you would need some notion of Objects, Classes, Types, and so on to do it right.
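A minimal sketch of that first step, parsing a fully qualified Java reference followed by a string argument with JavaTokenParsers. The JavaCall case class and the parser names are hypothetical, and resolving or invoking the method (e.g. via reflection) is deliberately left out:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical AST node: a dotted Java path plus one string argument
case class JavaCall(path: List[String], arg: String)

object JavaRefParser extends JavaTokenParsers {
  // e.g. java.lang.System.out.println "foo"
  def qualifiedName: Parser[List[String]] = rep1sep(ident, ".")

  def call: Parser[JavaCall] = qualifiedName ~ stringLiteral ^^ {
    // strip the surrounding quotes (escape sequences are left as-is)
    case path ~ lit => JavaCall(path, lit.substring(1, lit.length - 1))
  }
}
```

The unqualified form println "foo" parses to JavaCall(List("println"), "foo"), which is exactly where the resolution problem appears: the path alone does not say which PrintStream is meant.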
I managed to run Scala code embedded in my interpreted DSL.
Inserting DSL variables into the Scala code and recovering the returned value come as a bonus. :)
Here is the minimal relevant code, from parsing and interpreting through to run-time execution of the embedded Scala code (Main, Parser, AST and Interpreter):
object Main extends App {
  val ast = Parser1 parse "some dsl code here"
  (new Interpreter.Context()).eval(ast)
}
object Parser1 extends RegexParsers with ImplicitConversions {
  import AST._
  val separator = ";"
  def parse(input: String): Expr = parseAll(program, input).get
  type P[+T] = Parser[T]
  def program = rep1sep(expr, separator) <~ separator ^^ Sequence
  def expr: Parser[Expr] = (assign /*more calls here*/)
  // "{ ... }" holds Scala text; "$name" marks a DSL variable to splice in
  def scalacode: P[Expr] = "{" ~> rep(scala_text) <~ "}" ^^ {case l => Scalacode(l.flatten)}
  def scala_text = text_no_braces ~ "$" ~ ident ~ text_no_braces ^^ {case a ~ b ~ c ~ d => List(a, b + c, d)}
  //more rules here
  def assign = ident ~ ("=" ~> atomic_expr) ^^ Assign
  //more rules here
  def atomic_expr = (
    ident ^^ Var
    //more calls here
    | "(" ~> expr <~ ")"
    | scalacode
    | failure("expression expected")
  )
  def text_no_braces = """[a-zA-Z0-9\"\'\+\-\_!##%\&\(\)\[\]\/\?\:;\.\>\<\,\|= \*\\\n]*""".r //| fail("Scala code expected")
  def ident = """[a-zA-Z]+[a-zA-Z0-9]*""".r
}
object AST {
  sealed abstract class Expr
  // more classes here
  case class Scalacode(items: List[String]) extends Expr
  case class Literal(v: Any) extends Expr
  case class Var(name: String) extends Expr
}
object Interpreter {
  import AST._
  // Assumed to be defined elsewhere (not shown): `interpret`, a scala.tools.nsc.interpreter.IMain
  // instance, plus the VarName, VarValue and Environment types and initEnv
  val env = collection.immutable.Map[VarName, VarValue]()
  // Evaluates Scala source in the embedded REPL and wraps the result
  def run(code: String) = {
    val code2 = "val res_1 = (" + code + ")"
    interpret.interpret(code2)
    val res = interpret.valueOfTerm("res_1")
    Literal(res.getOrElse(())) // Unit when the REPL produced no value
  }
  class Context(private var env: Environment = initEnv) {
    def eval(e: Expr): Any = e match {
      case Scalacode(l: List[String]) => {
        // Replace each "$name" item with the current value of that DSL variable
        val r = l map {
          x =>
            if (x.startsWith("$")) {
              eval(Var(x.drop(1)))
            } else {
              x
            }
        }
        eval(run(r.mkString))
      }
      case Assign(id, expr) => env += (id -> eval(expr))
      //more pattern matching here
      case Literal(v) => v
      case Var(id) => {
        env getOrElse(id, sys.error("Undefined " + id))
      }
    }
  }
}
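As far as I can tell, the scala_text rule requires a "$" in every chunk, so a block without any DSL variable would not parse. A sketch of a more forgiving alternative, assuming the embedded code contains no nested braces, splits the block into raw text and $name placeholders and then substitutes DSL values, the same job the Scalacode case of eval performs. The names EmbeddedScala, Raw, Placeholder and substitute are hypothetical, and evaluating the resulting string (e.g. with IMain) is not shown:

```scala
import scala.util.parsing.combinator.RegexParsers

object EmbeddedScala extends RegexParsers {
  override val skipWhitespace = false // keep the embedded code's spacing intact

  sealed trait Piece
  case class Raw(text: String) extends Piece
  case class Placeholder(name: String) extends Piece

  // Any run of characters other than '$', '{' and '}'
  def raw: Parser[Piece] = """[^${}]+""".r ^^ (s => Raw(s))
  // "$name" refers to a DSL variable
  def placeholder: Parser[Piece] = "$" ~> """[a-zA-Z][a-zA-Z0-9]*""".r ^^ (n => Placeholder(n))
  def block: Parser[List[Piece]] = "{" ~> rep(raw | placeholder) <~ "}"

  // Splice the current DSL values into the Scala source text
  def substitute(pieces: List[Piece], env: Map[String, Any]): String =
    pieces.map {
      case Raw(t)         => t
      case Placeholder(n) => env.getOrElse(n, sys.error("Undefined " + n)).toString
    }.mkString
}
```

For example, parsing "{println($x + 1)}" and substituting with Map("x" -> 41) yields the source string "println(41 + 1)".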

Using regex in StandardTokenParsers

I'm trying to use regular expressions in my StandardTokenParsers-based parser. For that, I've subclassed StdLexical as follows:
import scala.util.matching.Regex
import scala.util.parsing.combinator.lexical.StdLexical
class CustomLexical extends StdLexical {
  // Character-level parser that matches r at the current input position
  def regex(r: Regex): Parser[String] = new Parser[String] {
    def apply(in: Input) = r.findPrefixMatchOf(in.source.subSequence(in.offset, in.source.length)) match {
      case Some(matched) => Success(in.source.subSequence(in.offset, in.offset + matched.end).toString,
        in.drop(matched.end))
      case None => Failure("string matching regex `" + r + "' expected but " + in.first + " found", in)
    }
  }
  override def token: Parser[Token] =
    ( regex("[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r) ^^ { StringLit(_) }
    | identChar ~ rep( identChar | digit ) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
    | ...
But I'm a little confused about how to define a Parser that takes advantage of this. I have a parser defined as:
def mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
where mFilePath should recognize valid file paths. I then tried:
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
But this is obviously not right. I get an error:
scala: type mismatch;
found : scala.util.matching.Regex
required: McfpDSL.this.Parser[String]
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
^
What is the proper way of using the extension made on my StdLexical subclass?
If you really want to use token-based parsing and reuse StdLexical, I would advise updating the syntax for "TargetFolder" so that the value after the equals sign is a proper string literal; in other words, require the path to be enclosed in quotes. From that point on you no longer need to extend StdLexical.
Then comes the problem of converting a regexp to a parser. Scala already has RegexParsers for this (it implicitly converts a regexp to a Parser[String]), but unfortunately that's not what you want here, because it works on a stream of Char (type Elem = Char in RegexParsers) while you are working on a stream of tokens.
So we will indeed have to define our own conversion from Regex to Parser[String] (but at the syntactic level rather than lexical level, or in other words in the token parser).
import scala.util.parsing.combinator.syntactical._
import scala.util.matching.Regex
import scala.util.parsing.input._
object MyParser extends StandardTokenParsers {
  import lexical.StringLit
  // Accepts a StringLit token only if its whole content matches r
  def regexStringLit(r: Regex): Parser[String] = acceptMatch(
    "string literal matching regex " + r,
    { case StringLit( s ) if r.unapplySeq(s).isDefined => s }
  )
  lexical.delimiters += "="
  lexical.reserved += "TargetFolder"
  lazy val mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
  lazy val mFilePath: Parser[String] = regexStringLit("([a-zA-Z]:\\\\[\\w\\\\?]*)|(/[\\w/]*)".r)
  def parseTargetFolder( s: String ) = { mTargetFolder( new lexical.Scanner( s ) ) }
}
Example:
scala> MyParser.parseTargetFolder("""TargetFolder = "c:\Dir1\Dir2" """)
res12: MyParser.ParseResult[String] = [1.31] parsed: c:\Dir1\Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "/Dir1/Dir2" """)
res13: MyParser.ParseResult[String] = [1.29] parsed: /Dir1/Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "Hello world" """)
res14: MyParser.ParseResult[String] =
[1.16] failure: identifier matching regex ([a-zA-Z]:\\[\w\\?]*)|(/[\w/]*) expected
TargetFolder = "Hello world"
^
Note that I also fixed your "target folder" regexp here: you were missing parentheses around the two alternatives, and it contained unneeded spaces.
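For comparison, if quoting the path is not an option and the rest of the grammar does not need StdLexical, the same rule can be written at the character level with RegexParsers, where the Regex-to-Parser[String] conversion is implicit. A sketch with an illustrative object name PathParser, reusing the corrected regex:

```scala
import scala.util.parsing.combinator.RegexParsers

object PathParser extends RegexParsers {
  // RegexParsers' implicit `regex` conversion turns this Regex into a Parser[String]
  def mFilePath: Parser[String] = """([a-zA-Z]:\\[\w\\?]*)|(/[\w/]*)""".r
  def mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
  def parseTargetFolder(s: String): ParseResult[String] = parseAll(mTargetFolder, s)
}
```

Here the path is written without quotes, e.g. TargetFolder = c:\Dir1\Dir2, and whitespace around "=" is skipped automatically.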
Just call your function regex when you want to get a Parser[String] from a Regex:
def p: Parser[String] = regex("".r)
Or make regex implicit to let the compiler call it automatically for you:
implicit def regex(r: Regex): Parser[String] = ...
// =>
def p: Parser[String] = "".r

How to parse this structure: "name[arg,arg]" with scala combinator parsers?

I have several strings like these:
name[arg,arg,arg]
name[arg,arg]
name[arg]
name
I want to parse them with Scala parser combinators, and this is the best I have managed so far:
object TaskDepParser extends JavaTokenParsers {
  def name: Parser[String] = "[^\\[\\],]+".r
  def expr: Parser[(String, Option[List[String]])] =
    name ^^ { a => (a, None) } |
    name ~ "[" ~ repsep(name, ",") ~ "]" ^^ { case name~_~args~_ => (name, Some(args)) }
}
It works on name, but fails on name[arg], saying string matching regex `\z' expected but `[' found. Is it possible to fix it?
@TonyK has already given the answer in his comment, but I want to point out that Scala parser combinators can already parse optional values:
object TaskDepParser extends JavaTokenParsers {
  def name: Parser[String] = """[^\[\],]+""".r
  def expr: Parser[(String, Option[List[String]])] =
    name ~ opt("[" ~> repsep(name, ",") <~ "]") ^^ { case name ~ args => (name, args) }
}
With ~> and <~ you can keep only the right or left result respectively, which avoids unnecessary pattern matching in ^^. Furthermore, I would use triple quotes for regex strings to avoid lots of escaping.
I think it might work if you flip the alternatives around. The name is consumed by the first rule, which succeeds, and then you get a failure on the remaining input.
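Both suggestions can be checked side by side in a small sketch. exprFlipped is a name introduced here for the reordered version, and expr is the opt-based version:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object TaskDepParser extends JavaTokenParsers {
  def name: Parser[String] = """[^\[\],]+""".r

  // Longer alternative first: a bare name only wins when no "[" follows
  def exprFlipped: Parser[(String, Option[List[String]])] =
    name ~ "[" ~ repsep(name, ",") ~ "]" ^^ { case n ~ _ ~ args ~ _ => (n, Some(args)) } |
    name ^^ { n => (n, None) }

  // Single rule with an optional argument list
  def expr: Parser[(String, Option[List[String]])] =
    name ~ opt("[" ~> repsep(name, ",") <~ "]") ^^ { case n ~ args => (n, args) }
}
```

The original order fails because | commits to the first alternative that succeeds: name alone matches "name" in "name[arg]", and parseAll then rejects the leftover "[arg]".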

Scala: Parsing matching token

I'm playing around with a toy HTML parser, to help familiarize myself with Scala's parsing combinators library:
import scala.util.parsing.combinator._
sealed abstract class Node
case class TextNode(val contents : String) extends Node
case class Element(
  val tag : String,
  val attributes : Map[String,Option[String]],
  val children : Seq[Node]
) extends Node
object HTML extends RegexParsers {
  val node: Parser[Node] = text | element
  val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
  val label: Parser[String] = """(\w[:\w]*)""".r
  val value : Parser[String] = """("[^"]*"|\w+)""".r
  val attribute : Parser[(String,Option[String])] = label ~ (
    "=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
  ) ^^ { case (k ~ v) => k -> v }
  val element: Parser[Element] = (
    ("<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" )
      ~ rep(node) ~
    ("</" ~> label <~ ">")
  ) ^^ {
    case (tag ~ attributes ~ children ~ close) => Element(tag, Map(attributes : _*), children)
  }
}
What I'm realizing I want is some way to make sure my opening and closing tags match.
I think to do that, I need some sort of flatMap combinator with a signature like Parser[A] => (A => Parser[B]) => Parser[B],
so that I can use the opening tag to construct the parser for the closing tag. But I don't see anything matching that signature in the library.
What's the proper way to do this?
You can write a method that takes a tag name and returns a parser for a closing tag with that name:
object HTML extends RegexParsers {
  lazy val node: Parser[Node] = text | element
  val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
  val label: Parser[String] = """(\w[:\w]*)""".r
  val value : Parser[String] = """("[^"]*"|\w+)""".r
  val attribute : Parser[(String, Option[String])] = label ~ (
    "=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
  ) ^^ { case (k ~ v) => k -> v }
  val openTag: Parser[String ~ Seq[(String, Option[String])]] =
    "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">"
  def closeTag(name: String): Parser[String] = "</" ~> name <~ ">"
  val element: Parser[Element] = openTag.flatMap {
    case (tag ~ attrs) =>
      rep(node) <~ closeTag(tag) ^^
        (children => Element(tag, attrs.toMap, children))
  }
}
Note that you also need to make node lazy. Now you get a nice clean error message for unmatched tags:
scala> HTML.parse(HTML.element, "<a></b>")
res0: HTML.ParseResult[Element] =
[1.6] failure: `a' expected but `b' found
<a></b>
^
I've been a little more verbose than necessary for the sake of clarity. If you want concision you can skip the openTag and closeTag methods and write element like this, for example:
val element = "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" >> {
  case (tag ~ attrs) =>
    rep(node) <~ "</" ~> tag <~ ">" ^^
      (children => Element(tag, attrs.toMap, children))
}
I'm sure more concise versions would be possible, but in my opinion even this is edging toward unreadability.
There is a flatMap on Parser, and also an equivalent method named into and an operator >>, which might be more convenient aliases (flatMap is still needed when used in for comprehensions). It is indeed a valid way to do what you're looking for.
Alternatively, you can check that the tags match with ^?.
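A sketch of that ^? variant, simplified to ignore attributes. The object MiniHTMLCheck and its small AST are illustrative names: any opening and closing labels are parsed, and the partial function accepts only matching pairs:

```scala
import scala.util.parsing.combinator.RegexParsers

object MiniHTMLCheck extends RegexParsers {
  sealed trait Node
  case class TextNode(contents: String) extends Node
  case class Element(tag: String, children: List[Node]) extends Node

  val label: Parser[String] = """\w[:\w]*""".r
  lazy val node: Parser[Node] = text | element
  lazy val text: Parser[Node] = """[^<]+""".r ^^ (s => TextNode(s))

  // Parse <open> children </close>, then keep the result only if open == close
  lazy val element: Parser[Node] =
    ("<" ~> label <~ ">") ~ rep(node) ~ ("</" ~> label <~ ">") ^? (
      { case open ~ children ~ close if open == close => Element(open, children) },
      { case open ~ _ ~ close => s"closing tag </$close> does not match <$open>" }
    )
}
```

The trade-off compared with flatMap is that the mismatch is only reported after the whole element, children included, has been parsed, rather than at the closing tag itself.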
You are looking in the wrong place. It's a common mistake, though. You want a method Parser[A] => (A => Parser[B]) => Parser[B], but you looked at the docs of Parsers, not Parser.
Look at the documentation of the Parser class instead.
There's a flatMap, also known as into or >>.
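Since for-comprehensions desugar to flatMap and map, the closing-tag check can also be written as a for-comprehension. A minimal sketch with attributes omitted; MiniHTML and its case classes are illustrative names, not part of the question's code:

```scala
import scala.util.parsing.combinator.RegexParsers

object MiniHTML extends RegexParsers {
  sealed trait Node
  case class TextNode(contents: String) extends Node
  case class Element(tag: String, children: List[Node]) extends Node

  val label: Parser[String] = """\w[:\w]*""".r
  lazy val node: Parser[Node] = text | element
  lazy val text: Parser[Node] = """[^<]+""".r ^^ (s => TextNode(s))

  // Parser for a closing tag with one specific name
  def closeTag(name: String): Parser[String] = "</" ~> name <~ ">"

  // Desugars to flatMap/flatMap/map: the opening tag picks the closing parser
  lazy val element: Parser[Node] = for {
    tag      <- "<" ~> label <~ ">"
    children <- rep(node)
    _        <- closeTag(tag)
  } yield Element(tag, children)
}
```

Each generator can depend on earlier results, which is exactly what the closeTag(tag) step needs.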

Scala parser combinators: retrieving the original string that the parser consumed

So I have a bunch of parsers like this:
object MyParser extends RegexParsers{
  override val skipWhitespace = false
  def blockLine = ((id ~ args) <~ ":") ~ ".*?".r ^^ {
    case (blockID ~ argList) ~ rest => ???
  }
  def args = (("[" ~> rep1sep(arg, ", ") <~ "]")?) ^^ {
    case Some(argList) =>
      argList.zipWithIndex.map{
        case ((Some(k), v), index) => k -> v
        case ((None, v), index) => "arg" + index -> v
      }
    case None => List()
  }
  def arg = ((id <~ "=")?) ~ argtext ^^ {
    case Some(name) ~ value => Some(name) -> value.toString()
    case None ~ value=> None -> value
  }
  def argtext = "[^\\[\\],]+".r
  def id = "[a-zA-Z]*".r
  // ... many other parsers not shown...
}
Essentially, I want to reuse the parsers id and args in blockLine, but rather than getting the nested tree of List()s and ~s, I want to get back the original string that was matched. The purpose of this is to do some smart text preprocessing (using the same parsers that I will use later for the actual parsing) to insert some text in the middle of the line. Something like:
def blockLine = (rawText(id ~ args) <~ ":") ~ ".*?".r ^^ {
  case first ~ rest => first + "{" + rest
}
The broader purpose of the preprocessor is to convert indentation-delimited blocks into curly-brace-delimited blocks, so that I can later run the preprocessed file through a normal parser. Is there an easy way to do this?
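One possible approach, sketched here rather than taken from the library: a hypothetical rawText combinator that runs the wrapped parser and returns the slice of the input between its start and end offsets, using Reader.source and Reader.offset. It assumes skipWhitespace = false as in the question (with whitespace skipping on, the slice would include any leading whitespace). The id and args parsers are simplified stand-ins, and the rest of the line is matched with .* because a lazy .*? succeeds on the empty string when used as a prefix match:

```scala
import scala.util.parsing.combinator.RegexParsers

object RawTextDemo extends RegexParsers {
  override val skipWhitespace = false

  // Hypothetical helper: run p, but return the exact input text it consumed
  def rawText[T](p: Parser[T]): Parser[String] = Parser { in =>
    val result = p(in)
    // map leaves a NoSuccess untouched; on success, slice the source text
    result.map(_ => in.source.subSequence(in.offset, result.next.offset).toString)
  }

  // Simplified versions of the question's id and args parsers
  def id: Parser[String] = "[a-zA-Z]*".r
  def argtext: Parser[String] = "[^\\[\\],]+".r
  def args: Parser[List[String]] = opt("[" ~> rep1sep(argtext, ", ") <~ "]") ^^ (_.getOrElse(Nil))

  def blockLine: Parser[String] = (rawText(id ~ args) <~ ":") ~ ".*".r ^^ {
    case first ~ rest => first + "{" + rest
  }
}
```

For the line "foo[a, b=1]: rest", blockLine returns "foo[a, b=1]{ rest", while id and args remain usable on their own for the real parse.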

Resources