Scala 2.9: is there an easy way to log all ParseResults? - parsing

I've written a lexer and parser using scala.util.parsing.combinators.Parsers. I have a bug in at least one of my productions, but I have so many of them that it is difficult to eyeball them to determine the problem.
What I need is a log of every attempt my Parser makes to match the input with any production; logging all the Success and Failure objects when they are instantiated would be lovely. Unfortunately, the only way I can see to do this is to extend a lot of the basic classes provided by the library, then rewriting my massive parser to extend the new classes.
Is there an easy way to get this logging behavior?

You could use the log combinator to wrap productions of your grammar. Here's the definition in Parsers.scala:
def log[T](p: => Parser[T])(name: String): Parser[T] = Parser{ in =>
println("trying "+ name +" at "+ in)
val r = p(in)
println(name +" --> "+ r)
r
}
Otherwise, I think you should be able to override success and failure, but it would be quite uninformative, since you won't know what production called them.

Related

Result vs raise in F# async?

It seems like there are two ways to return errors in an async workflow: raise and Result.
let willFailRaise = async {
return raise <| new Exception("oh no!")
}
let willFailResult = async {
return Result.Error "oh no!"
}
For the caller, the handling is a bit different:
async {
try
let! x = willFailRaise
// ...
with error ->
System.Console.WriteLine(error)
}
async {
let! maybeX = willFailResult
match maybeX with
| Result.Ok x ->
// ...
| Result.Error error ->
System.Console.WriteLine(error)
}
My questions are:
What are the advantages / disadvantages of each approach?
Which approach is more idiomatic F#?
It depends on what kind of error we are talking about. Basically there are three kinds:
Domain errors (e.g. user provided invalid data, user with this email is already registered, etc.)
Infrastructure errors (e.g you can't connect to another microservice or DB)
Panics (e.g. NullReferenceExceptionor StackOverflowException etc.), which are caused by programmers' mistakes.
While both approaches can get the job done, usually your concern should be to make your code as self-documented and easy-to-read as possible. Which means the following:
Domain errors: definitely go for Result. Those "errors" are expected, they are part of your workflow. Using Result reflects your business rules in function's signature, which is very useful.
Infrastructure failures: it depends. If you have microservices, then probably those failures are expected and maybe it would be more convenient to use Result. If not -- go for exceptions.
Panics: definitely Exception. First of all, you can't cover everything with Result, you gonna need global exception filter either way. Second thing -- if you try to cover all possible panics - code becomes a nasty disaster extremely fast, that will kill the whole point of using Result for domain errors.
So really this has nothing to do with Async or C# interop, it's about code readability and maintainability.
As for C# iterop -- don't worry, Result has all the methods to help, like IsError and so on. But you can always add an extension method:
[<AutoOpen>]
module Utils =
type Result<'Ok, 'Error> with
member this.Value =
match this with
| Ok v -> v
| Error e -> Exception(e.ToString()) |> raise
This is one of the many aspects of F# programming that suffers from the mind-split at the core of the language and its community.
On one hand you have "F# the .NET Framework language" where exceptions are the mechanism for handling errors, on the other - "F# the functional programming language" that borrows its idioms from the Haskell side of the world. This is where Result (also known as Either) comes from.
The answer to the question "which one is idiomatic" will change depending who you ask and what they have seen, but my experience has taught me that when in doubt, you're better off using exceptions. Result type has its uses in moderation, but result-heavy programming style easily gets out of hand, and once that happens it's not a pretty sight.
Raise
Advantages
Better .NET interop as throwing exceptions is fairly common in .NET
Can create custom Exceptions
Easier to get the stack trace as it's right there
You probably have to deal with exceptions from library code anyways in most standard async operations such as reading from a webpage
Works with older versions of F#
Disadvantages:
If you aren't aware of it potentially throwing an exception, you might not know to catch the exception. This could result in runtime explosions
Result
Advantages
Any caller of the async function will have to deal with the error, so runtime explosions should be avoided
Can use railway oriented programming style, which can make your code quite clean
Disadvantages
Only available in F# 4.1 or later
Difficult for non-F# languages to use it
The API for Result is not comprehensive.
There are only the functions bind, map, and mapError
Some functions that would be nice to have:
bimap : ('TSuccess -> 'a) -> ('TError -> 'e) -> Result<'TSuccess,'TError> -> Result<'a, 'e>
fold : ('TSuccess -> 'T) -> ('TError -> 'T) -> Result<'TSuccess, 'TError> -> 'T
isOk

Implementing Jena Dataset provider over MongoDB

I have started to implement a set of classes that provide a direct interface to MongoDB for persistence, similar in spirit to the now-unmaintained SDB persistor implementation for RDBMS.
I am using the time-honored technique of creating the necessary concrete classes from the interfaces and doing a println in each method, therein allowing me to trace the execution. I have gotten all the way to where the engine is calling out to my cursor set up:
public ExtendedIterator<Triple> find(Node s, Node p, Node o) {
System.out.println("+++ MongoGraph:extenditer:find(" + s + p + o + ")");
// TBD Need to turn s,p,o into a match expression! Easy!
MongoCursor cur = this.coll.find().iterator();
ExtendedIterator<Triple> curs = new JenaMongoCursorIterator(cur);
return curs;
}
Sadly, when I later call this:
while(rs.hasNext()) {
QuerySolution soln = rs.nextSolution() ;
System.out.println(soln);
}
It turns out rs.hasNext() is always false even though material is present in the MongoCursor (I can debug-print it in the find() method). Also, the trace print in the next() function in my concrete iterator JenaMongoCursorIterator (which extends NiceIterator which I believe is OK) is never hit. In short, the basic setup seems good but then the engine never cranks the iterator on find()
Trying to use SDB as a guide is completely overwhelming for someone not intimately familiar with the software architecture. It's fully factored and filled with interfaces and factories and although that is excellent, it is difficult to nav.
Has anyone tried to create their own persistor implementation and if so, what are the basic steps to getting a "hello world" running? Hello World in this case is ANY implementation, non-optimized, that can call next() on something to produce a Triple.
TLDR: It is now working.
I was coding too eagerly and JenaMongoCursorIterator contained a method hasNexl which of course did not override hasNext (with a t ) in the default implementation of NiceIterator which returns false.
This is the sort of problem that eclipse and visual debugging and tracing makes a lot easier to resolve than regular jdb. jdb is fine if you know the software architecture pretty well but if you don't, the multiple open source files and being able to mouse over vars and such provides a tremendous boost in the amount of context that can be created to home in on the problem.

writing a function to do type cast

I'm trying to write a function that does type casting, which seems to be a frequently occurring activity in Rascal code. But I can't seem to get it right. The following and several variations on it fail.
public &T cast(type[&T] tp, value v) throws str {
if (tp tv := v)
return tv;
else
throw "cast failed";
}
Can someone help me out?
Some more info: I frequently use pattern matching against a pattern of the form "Type Var" (i.e. against a variable declaration) in order to tell Rascal that an expression has a certain type, e.g.
map[str,value] m := myexp
This is usually in cases where I know that myexp has type map[str,value], but omitting the matching would make Rascal's type checking mechanism complain.
In order to be a bit more defensive against mistakes, I usually wrap the matching construct in an if-then-else where an exception is raised if the match fails:
if (map[str,value] m := myexp) {
// use m
} else {
throw "cast failed";
}
I would like to shorten all such similar pieces of code using a single function that does the job generically, so that I can write instead
cast(#map[str,value], myexp)
PS. Also see How to cast a value type to Map in Rascal?
It seems that the best way to write this, if you truly need to do this, is the following:
public map[str,value] cast(map[str,value] v) = v;
public default map[str,value] cast(value v) { throw "cast failed!"; }
Then you could just say
m = cast(myexp);
and it would do what you want to do -- the actual pattern matching is moved into the function signature for cast, with a case specific to the type you are wanting to use and a case that handles everything that doesn't otherwise match.
However, I'm still not sure why you are using type value, either here (inside the map) or in the linked question. The "standard" Rascal way of handling cases where you could have one of multiple choices is to define these with a user-defined data type and constructors. You could then use pattern matching to match the constructors, or use the is and has keywords to interrogate a value to check to see if it was created using a specific constructor or if it has a specific field, respectively. The rule for fields is that all occurrences of a field in the constructor definitions for a given ADT have the same type. So, it may help to know more about your usage scenario to see if this definition of cast is the best option or if there is a better solution to your problem.
EDITED
If you are reading JSON, an alternate way to do it is to use the JSON grammar and AST that also live in that part of the library (I think the one you are using is more of a stream reader, like our current text readers and writers, but I would need to look at the code more to be sure). You can then do something like this (long output included to give an idea of the results):
rascal>import lang::json::\syntax::JSON;
ok
rascal>import lang::json::ast::JSON;
ok
rascal>import lang::json::ast::Implode;
ok
ascal>js = buildAST(parse(#JSONText, |project://rascal/src/org/rascalmpl/library/lang/json/examples/twitter01.json|));
Value: object((
"since_id":integer(0),
"refresh_url":string("?since_id=202744362520678400&q=amsterdam&lang=en"),
"page":integer(1),
"since_id_str":string("0"),
"completed_in":float(0.058),
"results_per_page":integer(25),
"next_page":string("?page=2&max_id=202744362520678400&q=amsterdam&lang=en&rpp=25"),
"max_id_str":string("202744362520678400"),
"query":string("amsterdam"),
"max_id":integer(202744362520678400),
"results":array([
object((
"from_user":string("adekamel"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/2206104506\\/339515338_normal.jpg"),
"in_reply_to_status_id_str":string("202730522013728768"),
"to_user_id":integer(215350297),
"from_user_id_str":string("366868475"),
"geo":null(),
"in_reply_to_status_id":integer(202730522013728768),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/2206104506\\/339515338_normal.jpg"),
"to_user_id_str":string("215350297"),
"from_user_name":string("nurul amalya \\u1d54\\u1d25\\u1d54"),
"created_at":string("Wed, 16 May 2012 12:56:37 +0000"),
"id_str":string("202744362520678400"),
"text":string("#Donnalita122 #NaishahS #fatihahmS #oishiihotchoc #yummy_DDG #zaimar93 #syedames I\'m here at Amsterdam :O"),
"to_user":string("Donnalita122"),
"metadata":object(("result_type":string("recent"))),
"iso_language_code":string("en"),
"from_user_id":integer(366868475),
"source":string("<a href="http:\\/\\/blackberry.com\\/twitter" rel="nofollow">Twitter for BlackBerry\\u00ae<\\/a>"),
"id":integer(202744362520678400),
"to_user_name":string("Rahmadini Hairuddin")
)),
object((
"from_user":string("kelashby"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/1861086809\\/me_beach_normal.JPG"),
"to_user_id":integer(0),
"from_user_id_str":string("291446599"),
"geo":null(),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/1861086809\\/me_beach_normal.JPG"),
"to_user_id_str":string("0"),
"from_user_name":string("Kelly Ashby"),
"created_at":string("Wed, 16 May 2012 12:56:25 +0000"),
"id_str":string("202744310872018945"),
"text":string("45 days til freedom! Cannot wait! After Paris: London, maybe Amsterdam, then southern France, then CANADA!!!!"),
"to_user":null(),
"metadata":object(("result_type":string("recent"))),
"iso_language_code":string("en"),
"from_user_id":integer(291446599),
"source":string("<a href="http:\\/\\/mobile.twitter.com" rel="nofollow">Mobile Web<\\/a>"),
"id":integer(202744310872018945),
"to_user_name":null()
)),
object((
"from_user":string("johantolsma"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/1961917557\\/image_normal.jpg"),
"to_user_id":integer(0),
"from_user_id_str":string("23632499"),
"geo":null(),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/1961917557\\/image_normal.jpg"),
"to_user_id_str":string("0"),
"from_user_name":string("Johan Tolsma"),
"created_at":string("Wed, 16 May 2012 12:56:16 +0000"),
"id_str":string("202744274050236416"),
"text":string("RT #agerolemou: Office space for freelancers in Amsterdam http:\\/\\/t.co\\/6VfHuLeK"),
"to_user":null(),
"metadata":object(("result_type":string("recent"))),
"iso_language_code":string("en"),
"from_user_id":integer(23632499),
"source":string("<a href="http:\\/\\/itunes.apple.com\\/us\\/app\\/twitter\\/id409789998?mt=12" rel="nofollow">Twitter for Mac<\\/a>"),
"id":integer(202744274050236416),
"to_user_name":null()
)),
object((
"from_user":string("hellosophieg"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/2213055219\\/image_normal.jpg"),
"to_user_id":integer(0),
"from_user_id_str":string("41153106"),
"geo":null(),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/2213055219\\/image_normal.jp...
rascal>js is object;
bool: true
rascal>js.members<0>;
set[str]: {"since_id","refresh_url","page","since_id_str","completed_in","results_per_page","next_page","max_id_str","query","max_id","results"}
rascal>js.members["results_per_page"];
Value: integer(25)
You can then use pattern matching, over the types defined in lang::json::ast::json, to extract the information you need.
The code has a bug. This is the fixed code:
public &T cast(type[&T] tp, value v) throws str {
if (&T tv := v)
return tv;
else
throw "cast failed";
}
Note that we do not wish to include this in the standard library. Rather lets collect cases where we need it and find out how to fix it in another way.
If you find you need this casting often, then you might be avoiding the better parts of Rascal, such as pattern based dispatch. See also the answer by Mark Hills.

How to understand the "isCommitted" property of ParserResult?

I'm reading the source of polux's great parsers, and found there is a special isCommitted property which I can't understand:
class ParseResult<A> {
final bool isSuccess;
final bool isCommitted;
/// [:null:] if [:!isSuccess:]
final A value;
final String text;
final Position position;
final Expectations expectations;
// ...
}
You can see there is already a isSuccess to indicate the parse result is successful or not, why do we need a isCommitted? I tried to read related code, but still don't understand.
If you want to see the source, you can find it here.
The short answer is: don't worry about isCommited, it's for internal purposes only.
The long answer is: you can call commited on a paser, which means that once it has succeeded, you know for sure that it's pointless to backtrack (very much like Prolog's cut). For instance consider a grammar like this:
expr() => str('(') + rec(expr) str(')') ^ ...
| num()
Assume we parse the string "(...". Once we have recognized the parenthesis, we know for sure that if ... turns out not to be an expr, there is no need to rewind to the start of the string and try to parse a num, since a num will never start with a parenthesis anyway. We can fail early. This is done by marking ( as being a "commit point":
expr() => str('(').commited + rec(expr) str(')') ^ ...
| num()
This is an optimisation which should be used with great care because it breaks the modularity of parsers with respect to |. I personally never had to use it so far.
Whenever you call commited on a parser, it returns a new parser whose isCommited property is true. It is then used by | to decide whether to backtrack or not. This is what isCommited is used for. As an end user you should never have to care. I should probably make it private.
This feature is inspired by Polyparse's commit.

How to get lexer to output EOF in Scala TokenParser?

If I define a Lexical to feed a TokenParser, I'm having trouble getting the TokenParser to actually output an EOF token. In particular, some of the methods in Parser[T] (acceptIf, acceptMatch, and phrase) directly check whether the Reader is atEnd, so there's no chance for an EOF token to get added to the token stream before an error is returned.
Since the Tokens trait actually defines an EOF token, I'm sure there must be some simple way to output it, but at this point all I can think to do is to create my own Reader that doesn't return true for atEnd until after at least one EOF has been output or adding a '\032' character to the input so that the Reader doesn't realize it's at the end until after it has emitted that character.
Please tell me I'm missing an easier way...
You don't need to do this at all. Use
new YourLexical.Scanner("foo")
to create a Reader[YourLexical.Token], which will respond to #atEnd automatically.
Then you can hand this Reader to a TokenParser implementing your syntax directly as input:
class YourTokenParser ... {
...
def program: Parser[...] = ...
def parse(s: String) =
phrase(program)(new YourLexical.Scanner(s))
}

Resources