How to understand the "isCommitted" property of ParseResult? - parsing

I'm reading the source of polux's great parsers, and found there is a special isCommitted property which I can't understand:
class ParseResult<A> {
  final bool isSuccess;
  final bool isCommitted;
  /// [:null:] if [:!isSuccess:]
  final A value;
  final String text;
  final Position position;
  final Expectations expectations;
  // ...
}
You can see there is already an isSuccess field to indicate whether the parse succeeded, so why do we need isCommitted? I tried to read the related code, but I still don't understand.
If you want to see the source, you can find it here.

The short answer is: don't worry about isCommitted, it's for internal purposes only.
The long answer is: you can call committed on a parser, which means that once it has succeeded, you know for sure that it's pointless to backtrack (very much like Prolog's cut). For instance, consider a grammar like this:
expr() => str('(') + rec(expr) + str(')') ^ ...
        | num()
Assume we parse the string "(...". Once we have recognized the parenthesis, we know for sure that if ... turns out not to be an expr, there is no need to rewind to the start of the string and try to parse a num, since a num will never start with a parenthesis anyway. We can fail early. This is done by marking ( as being a "commit point":
expr() => str('(').committed + rec(expr) + str(')') ^ ...
        | num()
This is an optimisation which should be used with great care because it breaks the modularity of parsers with respect to |. I personally never had to use it so far.
Whenever you call committed on a parser, it returns a new parser whose results have their isCommitted property set to true. It is then used by | to decide whether to backtrack or not. This is what isCommitted is used for. As an end user you should never have to care. I should probably make it private.
This feature is inspired by Polyparse's commit.
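To make the interaction between a commit point and | concrete, here is a minimal, language-neutral sketch in Python. It is not the library's code: Result, lit, committed, seq, alt and demo are hypothetical names invented for this illustration, and the real implementation may scope the committed flag differently (in this sketch the flag simply propagates outward through seq).

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Result:
    success: bool
    committed: bool
    value: Any
    pos: int


# A parser takes the input text and a start position and returns a Result.
Parser = Callable[[str, int], Result]


def lit(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text: str, pos: int) -> Result:
        if text.startswith(s, pos):
            return Result(True, False, s, pos + len(s))
        return Result(False, False, None, pos)
    return parse


def committed(p: Parser) -> Parser:
    """Mark p as a commit point: a success of p sets the committed flag."""
    def parse(text: str, pos: int) -> Result:
        r = p(text, pos)
        return Result(r.success, r.committed or r.success, r.value, r.pos)
    return parse


def seq(p: Parser, q: Parser) -> Parser:
    """Run p, then q; the committed flag survives a later failure."""
    def parse(text: str, pos: int) -> Result:
        r1 = p(text, pos)
        if not r1.success:
            return r1
        r2 = q(text, r1.pos)
        flag = r1.committed or r2.committed
        if not r2.success:
            return Result(False, flag, None, r2.pos)
        return Result(True, flag, (r1.value, r2.value), r2.pos)
    return parse


def alt(p: Parser, q: Parser) -> Parser:
    """Try p; fall back to q only if p failed without committing."""
    def parse(text: str, pos: int) -> Result:
        r = p(text, pos)
        if r.success or r.committed:
            return r
        return q(text, pos)
    return parse


def demo(text: str, use_commit: bool) -> Tuple[bool, List[str]]:
    """Parse text with expr = '(' '1' ')' | num and log each attempt at num."""
    attempts: List[str] = []

    def num(t: str, pos: int) -> Result:
        attempts.append("num")
        return lit("1")(t, pos)

    open_paren = committed(lit("(")) if use_commit else lit("(")
    paren_expr = seq(seq(open_paren, lit("1")), lit(")"))
    result = alt(paren_expr, num)(text, 0)
    return result.success, attempts


print(demo("(x)", use_commit=False))  # the failed branch is retried as num
print(demo("(x)", use_commit=True))   # fails early, num is never attempted
```

Both calls fail on "(x)", but only the uncommitted version rewinds and tries num. That difference is the whole effect of the flag, which is why an end user rarely needs to look at it.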

Related

Does Dart have a comma operator?

Consider the following line of code, which doesn't compile in Dart because the language lacks a comma operator, although comparable code is perfectly fine in JavaScript or C++:
final foo = (ArgumentError.checkNotNull(value), value) * 2;
The closest I could get with an ugly workaround is
final foo = last(ArgumentError.checkNotNull(value), value) * 2;
with function
T last<T>(void op, T ret) => ret;
Is there a better solution?
Dart does not have a comma operator similar to the one in JavaScript.
There is no obviously better solution than what you already have.
The work-around operation you introduced is how I would solve it. I usually call it seq for "sequence" if I write it.
There is sadly no good way to use an extension operator because you need to be generic on the second operand and operators cannot be generic. You could use an extension method like:
extension Seq on void {
  T seq<T>(T next) => next;
}
Then you can write ArgumentError.checkNotNull(value).seq(value).
(For what it's worth, the ArgumentError.checkNotNull function has been changed to return its value, but that change was made after releasing Dart 2.7, so it will only be available in the next release after that).
If the overhead doesn't matter, you can use closures without arguments for a similar effect (and also more complex operations than just a sequence of expressions).
final foo = () {
  ArgumentError.checkNotNull(value);
  return value;
}();
This is not great for hot paths due to the overhead incurred by creating and calling a closure, but can work reasonably well outside those.
If you need this kind of test-plus-initialization pattern more than once, the cleanest way would arguably be to put it in a function of its own, anyway.
T ensureNotNull<T>(T value) {
  ArgumentError.checkNotNull(value);
  return value;
}
final foo = ensureNotNull(value);

Usage of ParentMap in Clang

There seem to be no examples online. According to the documentation, ParentMap's constructor accepts "Stmt *ASTRoot", which may mean that the ParentMap instance will later look up parents within the AST subtree under "ASTRoot". But how do I get the root node of a translation unit? I tried
virtual bool VisitTranslationUnitDecl(TranslationUnitDecl *decl) {
    //decl->dump();
    Stmt *stmt = decl->getBody();
    mParentMap = new ParentMap(stmt);
    return true;
}
The goal is to create a ParentMap around the root node and then use it in other Visit*** callbacks during the scan. But decl->getBody() is null. decl->dump() prints everything, and even when scanning the AST a second time decl->getBody() is still null.
How do I get the root Stmt of an AST? What is the right or better way to use ParentMap?
ParentMap is not really intended to be used on its own. You can use ASTContext::getParents, which constructs and maintains ParentMap.

writing a function to do type cast

I'm trying to write a function that does type casting, which seems to be a frequently occurring activity in Rascal code. But I can't seem to get it right. The following and several variations on it fail.
public &T cast(type[&T] tp, value v) throws str {
  if (tp tv := v)
    return tv;
  else
    throw "cast failed";
}
Can someone help me out?
Some more info: I frequently use pattern matching against a pattern of the form "Type Var" (i.e. against a variable declaration) in order to tell Rascal that an expression has a certain type, e.g.
map[str,value] m := myexp
This is usually in cases where I know that myexp has type map[str,value], but omitting the matching would make Rascal's type checking mechanism complain.
In order to be a bit more defensive against mistakes, I usually wrap the matching construct in an if-then-else where an exception is raised if the match fails:
if (map[str,value] m := myexp) {
  // use m
} else {
  throw "cast failed";
}
I would like to shorten all such similar pieces of code using a single function that does the job generically, so that I can write instead
cast(#map[str,value], myexp)
PS. Also see How to cast a value type to Map in Rascal?
It seems that the best way to write this, if you truly need to do this, is the following:
public map[str,value] cast(map[str,value] v) = v;
public default map[str,value] cast(value v) { throw "cast failed!"; }
Then you could just say
m = cast(myexp);
and it would do what you want to do -- the actual pattern matching is moved into the function signature for cast, with a case specific to the type you are wanting to use and a case that handles everything that doesn't otherwise match.
However, I'm still not sure why you are using type value, either here (inside the map) or in the linked question. The "standard" Rascal way of handling cases where you could have one of multiple choices is to define these with a user-defined data type and constructors. You could then use pattern matching to match the constructors, or use the is and has keywords to interrogate a value to check to see if it was created using a specific constructor or if it has a specific field, respectively. The rule for fields is that all occurrences of a field in the constructor definitions for a given ADT have the same type. So, it may help to know more about your usage scenario to see if this definition of cast is the best option or if there is a better solution to your problem.
EDITED
If you are reading JSON, an alternate way to do it is to use the JSON grammar and AST that also live in that part of the library (I think the one you are using is more of a stream reader, like our current text readers and writers, but I would need to look at the code more to be sure). You can then do something like this (long output included to give an idea of the results):
rascal>import lang::json::\syntax::JSON;
ok
rascal>import lang::json::ast::JSON;
ok
rascal>import lang::json::ast::Implode;
ok
rascal>js = buildAST(parse(#JSONText, |project://rascal/src/org/rascalmpl/library/lang/json/examples/twitter01.json|));
Value: object((
"since_id":integer(0),
"refresh_url":string("?since_id=202744362520678400&q=amsterdam&lang=en"),
"page":integer(1),
"since_id_str":string("0"),
"completed_in":float(0.058),
"results_per_page":integer(25),
"next_page":string("?page=2&max_id=202744362520678400&q=amsterdam&lang=en&rpp=25"),
"max_id_str":string("202744362520678400"),
"query":string("amsterdam"),
"max_id":integer(202744362520678400),
"results":array([
object((
"from_user":string("adekamel"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/2206104506\\/339515338_normal.jpg"),
"in_reply_to_status_id_str":string("202730522013728768"),
"to_user_id":integer(215350297),
"from_user_id_str":string("366868475"),
"geo":null(),
"in_reply_to_status_id":integer(202730522013728768),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/2206104506\\/339515338_normal.jpg"),
"to_user_id_str":string("215350297"),
"from_user_name":string("nurul amalya \\u1d54\\u1d25\\u1d54"),
"created_at":string("Wed, 16 May 2012 12:56:37 +0000"),
"id_str":string("202744362520678400"),
"text":string("#Donnalita122 #NaishahS #fatihahmS #oishiihotchoc #yummy_DDG #zaimar93 #syedames I\'m here at Amsterdam :O"),
"to_user":string("Donnalita122"),
"metadata":object(("result_type":string("recent"))),
"iso_language_code":string("en"),
"from_user_id":integer(366868475),
"source":string("<a href="http:\\/\\/blackberry.com\\/twitter" rel="nofollow">Twitter for BlackBerry\\u00ae<\\/a>"),
"id":integer(202744362520678400),
"to_user_name":string("Rahmadini Hairuddin")
)),
object((
"from_user":string("kelashby"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/1861086809\\/me_beach_normal.JPG"),
"to_user_id":integer(0),
"from_user_id_str":string("291446599"),
"geo":null(),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/1861086809\\/me_beach_normal.JPG"),
"to_user_id_str":string("0"),
"from_user_name":string("Kelly Ashby"),
"created_at":string("Wed, 16 May 2012 12:56:25 +0000"),
"id_str":string("202744310872018945"),
"text":string("45 days til freedom! Cannot wait! After Paris: London, maybe Amsterdam, then southern France, then CANADA!!!!"),
"to_user":null(),
"metadata":object(("result_type":string("recent"))),
"iso_language_code":string("en"),
"from_user_id":integer(291446599),
"source":string("<a href="http:\\/\\/mobile.twitter.com" rel="nofollow">Mobile Web<\\/a>"),
"id":integer(202744310872018945),
"to_user_name":null()
)),
object((
"from_user":string("johantolsma"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/1961917557\\/image_normal.jpg"),
"to_user_id":integer(0),
"from_user_id_str":string("23632499"),
"geo":null(),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/1961917557\\/image_normal.jpg"),
"to_user_id_str":string("0"),
"from_user_name":string("Johan Tolsma"),
"created_at":string("Wed, 16 May 2012 12:56:16 +0000"),
"id_str":string("202744274050236416"),
"text":string("RT #agerolemou: Office space for freelancers in Amsterdam http:\\/\\/t.co\\/6VfHuLeK"),
"to_user":null(),
"metadata":object(("result_type":string("recent"))),
"iso_language_code":string("en"),
"from_user_id":integer(23632499),
"source":string("<a href="http:\\/\\/itunes.apple.com\\/us\\/app\\/twitter\\/id409789998?mt=12" rel="nofollow">Twitter for Mac<\\/a>"),
"id":integer(202744274050236416),
"to_user_name":null()
)),
object((
"from_user":string("hellosophieg"),
"profile_image_url_https":string("https:\\/\\/si0.twimg.com\\/profile_images\\/2213055219\\/image_normal.jpg"),
"to_user_id":integer(0),
"from_user_id_str":string("41153106"),
"geo":null(),
"profile_image_url":string("http:\\/\\/a0.twimg.com\\/profile_images\\/2213055219\\/image_normal.jp...
rascal>js is object;
bool: true
rascal>js.members<0>;
set[str]: {"since_id","refresh_url","page","since_id_str","completed_in","results_per_page","next_page","max_id_str","query","max_id","results"}
rascal>js.members["results_per_page"];
Value: integer(25)
You can then use pattern matching, over the types defined in lang::json::ast::JSON, to extract the information you need.
The code has a bug. This is the fixed code:
public &T cast(type[&T] tp, value v) throws str {
  if (&T tv := v)
    return tv;
  else
    throw "cast failed";
}
Note that we do not wish to include this in the standard library. Rather, let's collect cases where we need it and find out how to fix them in another way.
If you find you need this casting often, then you might be avoiding the better parts of Rascal, such as pattern based dispatch. See also the answer by Mark Hills.

ANTLR Parse tree modification

I'm using ANTLR4 to create a parse tree for my grammar, and I want to modify certain nodes in the tree. This will include removing certain nodes and inserting new ones. The purpose is optimization for the language I am writing. I have yet to find a solution to this problem. What would be the best way to go about this?
While there is currently no real support or tools for tree rewriting, it is very possible to do. It's not even that painful.
The ParseTreeListener or your MyBaseListener can be used with a ParseTreeWalker to walk your parse tree.
From here, you can remove nodes with ParserRuleContext.removeLastChild(). When doing this, however, you have to watch out for ParseTreeWalker.walk:
public void walk(ParseTreeListener listener, ParseTree t) {
    if ( t instanceof ErrorNode) {
        listener.visitErrorNode((ErrorNode)t);
        return;
    }
    else if ( t instanceof TerminalNode) {
        listener.visitTerminal((TerminalNode)t);
        return;
    }
    RuleNode r = (RuleNode)t;
    enterRule(listener, r);
    int n = r.getChildCount();
    for (int i = 0; i<n; i++) {
        walk(listener, r.getChild(i));
    }
    exitRule(listener, r);
}
You must replace removed nodes with something if the walker has already visited the parents of those nodes; I usually pick empty ParserRuleContext objects. This is needed because of the cached value of n in the method above, and it prevents the ParseTreeWalker from throwing a NullPointerException.
When adding nodes, make sure to set the mutable parent on the ParserRuleContext to the new parent. Also, because of the cached n in the method above, a good strategy is to detect where the changes need to be made before the walk reaches the place where you want them to go, so the ParseTreeWalker will walk over them in the same pass (otherwise you might need multiple passes).
Your pseudo code should look like this:
public void enterRewriteTarget(@NotNull MyParser.RewriteTargetContext ctx){
    if(shouldRewrite(ctx)){
        ArrayList<ParseTree> nodesReplaced = replaceNodes(ctx);
        addChildTo(ctx, createNewParentFor(nodesReplaced));
    }
}
I've used this method to write a transpiler that compiled a synchronous internal language into asynchronous javascript. It was pretty painful.
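The cached child count is easier to see on a toy tree. The following is a rough Python model of the walk loop above, not ANTLR code: Node, walk, remove_self, replace_self and run are hypothetical names, and get_child is assumed to return None for an out-of-range index. It shows why removing a node whose parent is already being walked fails, while swapping in a placeholder keeps the walk intact.

```python
from typing import Callable, List, Optional, Tuple


class Node:
    def __init__(self, name: str, children: Optional[List["Node"]] = None):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = list(children or [])
        for child in self.children:
            child.parent = self

    def get_child(self, i: int) -> Optional["Node"]:
        # Assumption of this sketch: an out-of-range index yields None.
        return self.children[i] if 0 <= i < len(self.children) else None


def walk(node: Optional[Node], on_enter: Callable) -> None:
    on_enter(node)
    n = len(node.children)  # cached once, like n in the walk method above
    for i in range(n):
        walk(node.get_child(i), on_enter)


def remove_self(node: Node) -> None:
    """Shrink the parent's child list while the parent is still being walked."""
    node.parent.children.remove(node)


def replace_self(node: Node) -> None:
    """Swap the node for an empty placeholder so the child count is unchanged."""
    siblings = node.parent.children
    placeholder = Node("<empty>")
    placeholder.parent = node.parent
    siblings[siblings.index(node)] = placeholder


def run(edit: Callable[[Node], None]) -> Tuple[List[str], List[str]]:
    """Walk root(a, b, c), apply edit when entering b, report what happened."""
    root = Node("root", [Node("a"), Node("b"), Node("c")])
    visited: List[str] = []

    def on_enter(node):
        visited.append(node.name)  # fails if the walker hands us None
        if node.name == "b":
            edit(node)

    walk(root, on_enter)
    return visited, [child.name for child in root.children]


try:
    run(remove_self)
except AttributeError as err:
    print("walk broke after removal:", err)

print(run(replace_self))
```

Note that in this model the placeholder is not itself visited in the same pass, because the walker had already fetched the original node at that index before the swap.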
Another approach would be to write a ParseTreeVisitor that converts the tree back to a string. (This can be trivial in some cases, because you are only calling TerminalNode.getText() and concatenating the results in aggregateResult(..).)
You then add the modifications to this visitor so that the resulting string representation contains the modifications you try to achieve.
Then parse the string and you get a parse tree with the desired modifications.
This is certainly hackish in some ways, since you parse the string twice. On the other hand, the solution does not rely on ANTLR implementation details.
I needed something similar for simple transformations. I ended up using a ParseTreeWalker and a custom ...BaseListener in which I overrode the enter... methods. Inside these methods, ParserRuleContext.children is available and can be manipulated.
class MyListener extends ...BaseListener {
    @Override
    public void enter...(...Context ctx) {
        super.enter...(ctx);
        ctx.children.add(...);
    }
}
new ParseTreeWalker().walk(new MyListener(), parseTree);

Is None less evil than null?

In F# it's a big deal that they do not have null values and do not want to support them. Still, the programmer has to handle None cases, much as C# programmers have to check != null.
Is None really less evil than null?
The problem with null is that you have the possibility to use it almost everywhere, i.e. introduce invalid states where this is neither intended nor makes sense.
Having an 'a option is always an explicit thing. You state that an operation can either produce Some meaningful value or None, which the compiler can enforce to be checked and processed correctly.
By discouraging null in favor of an 'a option-type, you basically have the guarantee that any value in your program is somehow meaningful. If some code is designed to work with these values, you cannot simply pass invalid ones, and if there is a function of option-type, you will have to cover all possibilities.
Of course it is less evil!
If you don't check against None, then in most cases you'll get a type error, meaning that the program won't compile and therefore cannot crash with a NullReferenceException (at runtime, None is represented as null).
For example:
let myObject : option<_> = getObjectToUse() // you get a Some<'T>, added explicit typing for clarity
match myObject with
| Some o -> o.DoSomething()
| None -> ... // you have to explicitly handle this case
It is still possible to achieve C#-like behavior, but it is less intuitive, as you have to explicitly say "ignore that this can be None":
let o = myObject.Value // throws NullReferenceException if myObject = None
In C#, you're not forced to consider the case of your variable being null, so it is possible that you simply forget to make a check. Same example as above:
var myObject = GetObjectToUse(); // you get back a nullable type
myObject.DoSomething(); // no type error, but a runtime error if myObject is null
Edit: Stephen Swensen is absolutely right, my example code had some flaws, was writing it in a hurry. Fixed. Thank you!
Let's say I show you a function definition like this:
val getPersonByName : (name : string) -> Person
What do you think happens when you pass in a name of a person who doesn't exist in the data store?
Does the function throw a NotFound exception?
Does it return null?
Does it create the person if they don't exist?
Short of reading the code (if you have access to it), reading the documentation (if someone was kind enough to write it), or just calling the function, you have no way of knowing. And that's basically the problem with null values: they look and act just like non-null values, at least until runtime.
Now let's say you have a function with this signature instead:
val getPersonByName : (name : string) -> option<Person>
This definition makes it very explicit what happens: you'll either get a person back or you won't, and this information is communicated in the function's return type. Usually, you have a better guarantee that both cases of an option type are handled than you do with a potentially null value.
I'd say option types are much more benevolent than nulls.
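The same idea can be sketched outside F#. The Python below is only an approximation: Python does not check exhaustiveness at compile time, so the explicitness comes from a helper that demands a handler for each case. Some, Nothing, fold, get_age_by_name and the PEOPLE table are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


@dataclass(frozen=True)
class Some(Generic[T]):
    value: T


@dataclass(frozen=True)
class Nothing:
    pass


# Hypothetical data store for the example: name -> age.
PEOPLE: Dict[str, int] = {"alice": 31}


def get_age_by_name(name: str) -> Union[Some[int], Nothing]:
    """The return type itself says that the lookup may come back empty."""
    return Some(PEOPLE[name]) if name in PEOPLE else Nothing()


def fold(opt: Union[Some[T], Nothing],
         if_some: Callable[[T], R],
         if_none: Callable[[], R]) -> R:
    """Callers cannot reach the value without supplying both handlers."""
    if isinstance(opt, Some):
        return if_some(opt.value)
    return if_none()


def describe(name: str) -> str:
    return fold(
        get_age_by_name(name),
        lambda age: f"{name} is {age}",
        lambda: f"{name} not found",
    )


print(describe("alice"))
print(describe("bob"))
```

The missing case is part of the signature rather than something discovered when the program runs, which is the point made about option<Person> above.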
Whereas null introduces a potential source of run-time error (NullReferenceException) every time you dereference an object in C#, None forces you to make the sources of run-time error explicit in F#.
For example, invoking GetHashCode on a given object causes C# to silently inject a source of run-time error:
class Foo {
    int m;
    Foo(int n) { m=n; }
    int Hash() { return m; }
    static int hash(Foo o) { return o.Hash(); }
};
In contrast, the equivalent code in F# is expected to be null free:
type Foo =
  { m: int }
  member foo.Hash() = foo.m
let hash (o: Foo) = o.Hash()
If you really wanted an optional value in F# then you would use the option type and you must handle it explicitly or the compiler will give a warning or error:
let maybeHash (o: Foo option) =
  match o with
  | None -> 0
  | Some o -> o.Hash()
You can still get NullReferenceException in F# by circumventing the type system (which is required for interop):
> hash (box null |> unbox);;
System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.FSharp.Core.LanguagePrimitives.IntrinsicFunctions.UnboxGeneric[T](Object source)
at <StartupCode$FSI_0021>.$FSI_0021.main@()
Stopped due to error

Resources