Scala Parser fails on <init> - parsing

Edit: I was able to fix it by changing val to lazy val in the MessageParser class. I forgot that I had previously tested it using def instead of val. Can someone explain why this change fixes it?
So, I am currently writing an IRC Server. I decided to use Scala's Combinator Parser library to help me parse the messages. I've been able to correctly parse a message through a test program, but when I attempted to incorporate my parser into an echo server I already wrote I receive the following error message when I make a connection to my server:
Connected to the target VM, address: '127.0.0.1:55567', transport: 'socket'
Exception in thread "main" java.lang.ExceptionInInitializerError
at IRCServer.main(IRCServer.scala)
Caused by: java.lang.NullPointerException
at messages.MessageParser.<init>(MessageParser.scala:11)
at net.Connection.<init>(Connection.scala:14)
at net.Server.start(Server.scala:14)
at IRCServer$.<init>(IRCServer.scala:12)
at IRCServer$.<clinit>(IRCServer.scala)
... 1 more
Disconnected from the target VM, address: '127.0.0.1:55567', transport: 'socket'
The Connection class handles a Socket accepted from a ServerSocket:
class Connection(socket: Socket) extends Thread {
private val out = new PrintStream(socket.getOutputStream)
private val in = new BufferedReader(new InputStreamReader(socket.getInputStream))
private val parser = new MessageParser
override def run(): Unit = {
var line = ""
while({(line = in.readLine); line != null}) {
Console.println("received: " + line)
parser.parseLine(line.trim)
out.println("out: " + line)
}
}
}
And the following is my MessageParser:
class MessageParser extends JavaTokenParsers {
def parseLine(line :CharSequence) = {
parseAll(message, line)
}
val message: Parser[Any] = opt(":"~prefix)~command~opt(params) ^^ (x=> {println("message: "+x)})
val prefix: Parser[Any] = nick~"!"~user~"#"~host | servername ^^ (x=> {println("prefix: " +x)})
val nick: Parser[Any] = letter~rep(letter | wholeNumber | special) ^^ (x=> {println("nick: " +x)})
val special: Parser[Any] = "-" | "[" | "]" | "\\" | "`" | "^" | "{" | "}" ^^ (x=> {println("special: " +x)})
val user: Parser[Any] = """[^\s#]+""".r ^^ (x=> {println("user: " +x)})
val host: Parser[Any] = """[\w\.]+\w+""".r ^^ (x=> {println("host: " +x)})
val servername: Parser[Any] = host ^^ (x=> {println("servername: " +x)})
val command: Parser[Any] = """([A-Za-z]+)|([0-9]{3})""".r ^^ (x=> {println("command: " +x)})
val params: Parser[Any] = rep(param)~opt(":"~tail) ^^ (x=> {println("params: " +x)})
val param: Parser[Any] = """[^:][\S]*""".r
val tail: Parser[Any] = """.*$""".r ^^ (x=> {println("tail: " +x)})
val letter: Parser[Any] = """[A-Za-z]""".r ^^ (x=> {println("letter: " +x)})
}
I'm not quite sure what could be causing this. Hopefully I'm just being blind to something small.

lazy val values are populated as needed; val values are populated in the order you declare them. In a parser, earlier entries refer to later ones, which don't exist yet (they are still null) while the constructor is running. So they had better be lazy val or def (which one depends on the parser; the packrat parser likes lazy val, while the others usually assume def, but I'm not sure that they require it).
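To see the initialization-order effect in isolation, here is a minimal sketch that does not use the parser library at all. The class and member names (EagerRules, LazyRules, shout, name) are made up for illustration; the only point is that a val declared later is still null while an earlier val is being initialized.

```scala
import scala.util.Try

// Hypothetical classes for illustration; no parser library involved.
class EagerRules {
  // `name` is declared below, so it is still null while this line runs.
  val shout: String = name.toUpperCase
  val name: String = "irc"
}

class LazyRules {
  // Each lazy val is computed on first access, so declaration order no longer matters.
  lazy val shout: String = name.toUpperCase
  lazy val name: String = "irc"
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    // The EagerRules constructor throws a NullPointerException, so the Try is a Failure.
    println(Try(new EagerRules).isFailure)
    // The lazy version initializes `name` on demand and succeeds.
    println(new LazyRules().shout)
  }
}
```

The same thing happens in MessageParser: a val parser that calls a method on another val parser declared further down hits a null receiver during construction.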

Catch the exception with the following code:
try {
//your code here
} catch {
case err: ExceptionInInitializerError => err.getCause.printStackTrace()
}
This will help you find the cause of the exception.

Related

Errors and failures in Scala Parser Combinators

I would like to implement a parser for a defined language using Scala Parser Combinators. However, the software that will compile the language does not implement all of the language's features, so I would like parsing to fail if these features are used. I tried to put together a small example below:
object TestFail extends JavaTokenParsers {
def test: Parser[String] =
"hello" ~ "world" ^^ { case _ => ??? } |
"hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
I.e., the parser succeeds on "hello" + some identifier, but fails if the identifier is "world". I see that there are failure() and err() parsers in the Parsers class, but I cannot figure out how to use them, as they return Parser[Nothing] instead of a String. The documentation does not seem to cover this use case…
In this case you want err, not failure, since if the first parser in a disjunction fails you'll just move on to the second, which isn't what you want.
The other issue is that ^^ is the equivalent of map, but you want flatMap, since err("whatever") is a Parser[Nothing], not a Nothing. You could use the flatMap method on Parser, but in this context it's more idiomatic to use the (completely equivalent) >> operator:
object TestFail extends JavaTokenParsers {
def test: Parser[String] =
"hello" ~> "world" >> (x => err(s"Can't say hello to the $x!")) |
"hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
Or, a little more simply:
object TestFail extends JavaTokenParsers {
def test: Parser[String] =
"hello" ~ "world" ~> err(s"Can't say hello to the world!") |
"hello" ~ ident ^^ { case "hello" ~ id => s"hi, $id" }
}
Either approach should do what you want.
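As a rough sketch of the difference between failure and err in a disjunction, the following puts both variants side by side. It assumes the scala-parser-combinators module is on the classpath; the object and parser names (ErrVsFailure, greet, withFailure, withErr) are invented for this illustration.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Illustrative names only; requires the scala-parser-combinators module.
object ErrVsFailure extends JavaTokenParsers {
  // The "normal" rule: hello followed by any identifier.
  val greet: Parser[String] = "hello" ~> ident ^^ (id => s"hi, $id")

  // failure(...) produces a Failure, so | backtracks and tries greet.
  val withFailure: Parser[String] =
    "hello" ~ "world" ~> failure("Can't say hello to the world!") | greet

  // err(...) produces an Error, which stops | from trying greet.
  val withErr: Parser[String] =
    "hello" ~ "world" ~> err("Can't say hello to the world!") | greet

  def main(args: Array[String]): Unit = {
    println(parseAll(withFailure, "hello world")) // falls through to greet
    println(parseAll(withErr, "hello world"))     // reports the error instead
    println(parseAll(withErr, "hello there"))     // other identifiers still parse
  }
}
```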
You could use the ^? method:
object TestFail extends JavaTokenParsers {
def test: Parser[String] =
"hello" ~> ident ^? (
{ case id if id != "world" => s"hi, $id" },
s => s"Should not use '$s' here."
)
}

How to parse set of properties in unspecified order?

I have a Regex parser that processes a custom property file. In my file, I have the following structure:
...
[NodeA]
propA=val1
propB=val2
propC=val3
[NodeB]
...
I defined a parser that processes NodeA as follows:
lazy val parserA: Parser[String] = "propA" ~> "=" ~> mPropA
lazy val parserB: Parser[String] =
...
lazy val nodeA: Parser[NodeA] = "[" ~> "NodeA" ~> "]" ~> parserA ~ parserB ~ parserC ^^ {
case iPropA ~ iPropB ~ iPropC => new NodeA(iPropA, iPropB, iPropC)
}
This works fine as it stands. The problem is if NodeA comes with a different property order, in which case I get a parsing error. For example:
[NodeA]
propC=val3
propA=val1
propB=val2
Is there any way to define my parser such that it accepts an unspecified ordering of NodeA's properties?
I'm still not sure I understand your problem, but what about:
import scala.util.parsing.combinator.JavaTokenParsers
object Test extends App with JavaTokenParsers {
case class Prop(name: String, value: String)
case class Node(name: String, propA: Prop, propB: Prop, propC: Prop)
lazy val prop = (ident <~ "=") ~ ident ^^ {
case p ~ v => (p, v)
}
lazy val node = "[" ~> ident <~ "]"
lazy val props = repN(3, prop) ^^ {
_.sorted map Prop.tupled
}
lazy val nodes = rep(node ~ props) ^^ {
_ map { case node ~ List(a, b, c) => Node(node, a, b, c) }
}
val in =
"""[NodeA]
propA=val1
propB=val2
propC=val3
[NodeB]
propC=val3
propA=val1
propB=val2"""
println(parseAll(nodes, in))
}

Using regex in StandardTokenParsers

I'm trying to use regex in my StandardTokenParsers based parser. For that, I've subclassed StdLexical as follows:
class CustomLexical extends StdLexical{
def regex(r: Regex): Parser[String] = new Parser[String] {
def apply(in:Input) = r.findPrefixMatchOf(in.source.subSequence(in.offset, in.source.length)) match {
case Some(matched) => Success(in.source.subSequence(in.offset, in.offset + matched.end).toString,
in.drop(matched.end))
case None => Failure("string matching regex `" + r + "' expected but " + in.first + " found", in)
}
}
override def token: Parser[Token] =
( regex("[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r) ^^ { StringLit(_) }
| identChar ~ rep( identChar | digit ) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
| ...
But I'm a little confused on how I would define a Parser that takes advantage of this. I have a parser defined as:
def mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
which should be used to identify valid file paths. I tried then:
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
But this is obviously not right. I get an error:
scala: type mismatch;
found : scala.util.matching.Regex
required: McfpDSL.this.Parser[String]
def mFilePath: Parser[String] = "[a-zA-Z]:\\\\[\\w\\\\?]* | /[\\w/]*".r
^
What is the proper way of using the extension made on my StdLexical subclass?
If you really want to use token-based parsing and reuse StdLexical, I would advise updating the syntax for "TargetFolder" so that the value after the equals sign is a proper string literal. In other words, require the path to be enclosed in quotes. From that point you don't need to extend StdLexical anymore.
Then comes the problem of converting a regexp to a parser. Scala already has RegexParsers for this (which implicitly converts a regexp to a Parser[String]), but unfortunately that's not what you want here because it works on streams of Char (type Elem = Char in RegexParsers) while you are working on a stream of tokens.
So we will indeed have to define our own conversion from Regex to Parser[String] (but at the syntactic level rather than lexical level, or in other words in the token parser).
import scala.util.parsing.combinator.syntactical._
import scala.util.matching.Regex
import scala.util.parsing.input._
object MyParser extends StandardTokenParsers {
import lexical.StringLit
def regexStringLit(r: Regex): Parser[String] = acceptMatch(
"string literal matching regex " + r,
{ case StringLit( s ) if r.unapplySeq(s).isDefined => s }
)
lexical.delimiters += "="
lexical.reserved += "TargetFolder"
lazy val mTargetFolder: Parser[String] = "TargetFolder" ~> "=" ~> mFilePath
lazy val mFilePath: Parser[String] = regexStringLit("([a-zA-Z]:\\\\[\\w\\\\?]*)|(/[\\w/]*)".r)
def parseTargetFolder( s: String ) = { mTargetFolder( new lexical.Scanner( s ) ) }
}
Example:
scala> MyParser.parseTargetFolder("""TargetFolder = "c:\Dir1\Dir2" """)
res12: MyParser.ParseResult[String] = [1.31] parsed: c:\Dir1\Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "/Dir1/Dir2" """)
res13: MyParser.ParseResult[String] = [1.29] parsed: /Dir1/Dir2
scala> MyParser.parseTargetFolder("""TargetFolder = "Hello world" """)
res14: MyParser.ParseResult[String] =
[1.16] failure: identifier matching regex ([a-zA-Z]:\\[\w\\?]*)|(/[\w/]*) expected
TargetFolder = "Hello world"
^
Note that I also fixed your "target folder" regexp here: it was missing parentheses around the two alternatives and contained unneeded spaces.
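To illustrate why the spaces matter, here is a small sketch using only scala.util.matching.Regex. The object and helper names (RegexSpacesDemo, fullMatch) are made up; the two patterns are the ones from the question and from the code above, written as triple-quoted strings.

```scala
import scala.util.matching.Regex

object RegexSpacesDemo {
  // The pattern from the question: the spaces around | are part of each alternative.
  val spaced: Regex = """[a-zA-Z]:\\[\w\\?]* | /[\w/]*""".r
  // The pattern used in MyParser above.
  val fixed: Regex = """([a-zA-Z]:\\[\w\\?]*)|(/[\w/]*)""".r

  // Whole-string match, the same kind of check that unapplySeq performs.
  def fullMatch(r: Regex, s: String): Boolean =
    r.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    println(fullMatch(spaced, "/Dir1/Dir2"))  // false: second alternative needs a leading space
    println(fullMatch(spaced, " /Dir1/Dir2")) // true: the space is matched literally
    println(fullMatch(fixed, "/Dir1/Dir2"))   // true
    println(fullMatch(fixed, """c:\Dir1\Dir2""")) // true
  }
}
```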
Just call your function regex when you want to get a Parser[String] from a Regex:
def p: Parser[String] = regex("".r)
Or make regex implicit to let the compiler call it automatically for you:
implicit def regex(r: Regex): Parser[String] = ...
// =>
def p: Parser[String] = "".r

Scala: Parsing matching token

I'm playing around with a toy HTML parser, to help familiarize myself with Scala's parsing combinators library:
import scala.util.parsing.combinator._
sealed abstract class Node
case class TextNode(val contents : String) extends Node
case class Element(
val tag : String,
val attributes : Map[String,Option[String]],
val children : Seq[Node]
) extends Node
object HTML extends RegexParsers {
val node: Parser[Node] = text | element
val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
val label: Parser[String] = """(\w[:\w]*)""".r
val value : Parser[String] = """("[^"]*"|\w+)""".r
val attribute : Parser[(String,Option[String])] = label ~ (
"=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
) ^^ { case (k ~ v) => k -> v }
val element: Parser[Element] = (
("<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" )
~ rep(node) ~
("</" ~> label <~ ">")
) ^^ {
case (tag ~ attributes ~ children ~ close) => Element(tag, Map(attributes : _*), children)
}
}
What I'm realizing I want is some way to make sure my opening and closing tags match.
I think to do that, I need some sort of flatMap combinator, roughly Parser[A] => (A => Parser[B]) => Parser[B], so I can use the opening tag to construct the parser for the closing tag. But I don't see anything matching that signature in the library.
What's the proper way to do this?
You can write a method that takes a tag name and returns a parser for a closing tag with that name:
object HTML extends RegexParsers {
lazy val node: Parser[Node] = text | element
val text: Parser[TextNode] = """[^<]+""".r ^^ TextNode
val label: Parser[String] = """(\w[:\w]*)""".r
val value : Parser[String] = """("[^"]*"|\w+)""".r
val attribute : Parser[(String, Option[String])] = label ~ (
"=" ~> value ^^ Some[String] | "" ^^ { case _ => None }
) ^^ { case (k ~ v) => k -> v }
val openTag: Parser[String ~ Seq[(String, Option[String])]] =
"<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">"
def closeTag(name: String): Parser[String] = "</" ~> name <~ ">"
val element: Parser[Element] = openTag.flatMap {
case (tag ~ attrs) =>
rep(node) <~ closeTag(tag) ^^
(children => Element(tag, attrs.toMap, children))
}
}
Note that you also need to make node lazy. Now you get a nice clean error message for unmatched tags:
scala> HTML.parse(HTML.element, "<a></b>")
res0: HTML.ParseResult[Element] =
[1.6] failure: `a' expected but `b' found
<a></b>
^
I've been a little more verbose than necessary for the sake of clarity. If you want concision you can skip the openTag and closeTag methods and write element like this, for example:
val element = "<" ~> label ~ rep(whiteSpace ~> attribute) <~ ">" >> {
case (tag ~ attrs) =>
rep(node) <~ "</" ~> tag <~ ">" ^^
(children => Element(tag, attrs.toMap, children))
}
I'm sure more concise versions would be possible, but in my opinion even this is edging toward unreadability.
There is a flatMap on Parser, and also an equivalent method named into and an operator >>, which might be more convenient aliases (flatMap is still needed when used in for comprehensions). It is indeed a valid way to do what you're looking for.
Alternatively, you can check that the tags match with ^?.
You are looking in the wrong place. It's a common mistake, though. You want a method Parser[A] => (A => Parser[B]) => Parser[B], but you looked at the docs of Parsers, not Parser.
Look at the documentation for Parser instead.
There's a flatMap, also known as into or >>.
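For intuition about what that signature buys you, here is a deliberately tiny hand-rolled parser type. It is not the library's Parser; the names P, lit, name and emptyElement are invented for this sketch, and it only handles an element with no attributes or children. The point is that flatMap lets the parser for the closing tag be built from the name that the opening tag produced.

```scala
// A hypothetical toy parser: consumes a prefix of the input and returns the rest.
final case class P[A](run: String => Option[(A, String)]) {
  def map[B](f: A => B): P[B] =
    P(in => run(in).map { case (a, rest) => (f(a), rest) })

  // The shape asked about: P[A] => (A => P[B]) => P[B]
  def flatMap[B](f: A => P[B]): P[B] =
    P(in => run(in).flatMap { case (a, rest) => f(a).run(rest) })
}

object TagMatchSketch {
  // Matches an exact string at the start of the input.
  def lit(s: String): P[String] =
    P(in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None)

  // Reads one or more letters.
  val name: P[String] = P { in =>
    val n = in.takeWhile(_.isLetter)
    if (n.isEmpty) None else Some((n, in.drop(n.length)))
  }

  // <tag></tag>: the closing literal depends on the tag parsed earlier.
  val emptyElement: P[String] =
    for {
      _   <- lit("<")
      tag <- name
      _   <- lit(">")
      _   <- lit("</" + tag + ">")
    } yield tag

  def main(args: Array[String]): Unit = {
    println(emptyElement.run("<a></a>")) // matching tags
    println(emptyElement.run("<a></b>")) // mismatched tags
  }
}
```

The for comprehension desugars to the flatMap and map defined on P, which is the same reason the library keeps flatMap available alongside into and >>.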

Return type of "|" in Scala's parser combinators

I was reading Bernie Pope's slides on "Parser combinators in Scala". He quotes the method signature type of the "alternative" combinator |:
def | [U >: T](q: => Parser[U]): Parser[U]
and asks, "Homework: why doesn’t | have this type instead?"
def | [U](q: => Parser[U]): Parser[Either[T,U]]
case class Stooge(name: String)
val moe: Parser[String] = "Moe"
val larry: Parser[String] = "Larry"
val curly: Parser[String] = "Curly"
val shemp: Parser[String] = "Shemp"
val stooge: Parser[Stooge] = (moe | larry | curly | shemp) ^^ { s => Stooge(s) }
Now, imagine the code you would have to write instead of { s => Stooge(s) } if you were working with an s: Either[Either[Either[String,String],String],String] instead of an s: String.
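As a sketch of what that would look like, assuming the hypothetical Either-returning | nests to the left the way a left-associative operator would, the mapping function would need something like the unwrap below. The names EitherNesting, Nested and unwrap are illustrative only.

```scala
object EitherNesting {
  // The result type of (moe | larry | curly | shemp) under the Either-based signature.
  type Nested = Either[Either[Either[String, String], String], String]

  // Every alternative ends up at a different nesting depth and must be unpacked by hand.
  def unwrap(s: Nested): String = s match {
    case Left(Left(Left(a)))  => a // came from moe
    case Left(Left(Right(b))) => b // came from larry
    case Left(Right(c))       => c // came from curly
    case Right(d)             => d // came from shemp
  }

  def main(args: Array[String]): Unit = {
    val larry: Nested = Left(Left(Right("Larry")))
    println(unwrap(larry))
  }
}
```

With the actual signature, U >: T lets all four alternatives share the type String, so none of this unpacking is needed.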

Resources