Remove unnecessary layout symbol from expression - Rascal

I am trying to achieve this in Rascal:
select a from myTable; => select a from myTable
Basically I just want to remove the unnecessary layout from an expression, and also the comments, in order to get the minimal form of my expression.
How can I achieve that in Rascal?

You can replace a specific parse tree node with itself using a visit with =>, substituting the children back in. In the example below, the first case replaces only the spaces around the + of addition expressions in that way.
Alternatively, you can replace all layout in one go by visiting the tree and matching on the Layout non-terminal (it may have a different name in your grammar). The example below also shows how that is done.
module NormalizeLayout

import demo::lang::Pico::Syntax;
import ParseTree;
import IO;

// a stub space
private Layout space = appl(prod(layouts("Layout"), [], {}), [char(32)]);

void example() {
  t = parse(#start[Program],
            "begin
            ' declare
            ' a : natural;
            '
            ' a := 1 +
            ' 1
            'end");

  t = visit (t) {
    // this one sets the spaces only of a very specific rule, namely + expressions:
    case (Expression) `<Expression l> + <Expression r>`
      => (Expression) `<Expression l> + <Expression r>`

    // as an alternative here we match all layout in the entire program and replace it
    case Layout _ => space
  }

  println(t); // begin declare a : natural ; a := 1 + 1 end
}
The main difficulty lies in obtaining an example parse tree for a space, since the (NT) and [NT] notations do not support parsing directly from a layout non-terminal. So in this example I jumped down to the generic level and invented a parse tree for Layout by hand (see the ParseTree module for how parse trees are defined under the hood in Rascal).
Another (type-safe) trick for obtaining such an example tree is extracting one from an example, like so:
private Layout space() {
  visit ((Expression) `1 + 1`) {
    case Layout x : return x;
  }
  fail;
}
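With that helper in place, the stub constant in the visit above can be replaced by a call (a small sketch, assuming the same Pico grammar as before):

t = visit (t) {
  case Layout _ => space()
}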

Related

Why is the ordered choice of toChoiceParser() ignored when adding a plus() parser?

I am stuck at one point with the Dart package petitparser: it seems that the "priority rule" ("parse p1; if that doesn't work, parse p2" - ordered choice) is ignored by the toChoiceParser() if a plus() parser is added.
import 'package:petitparser/petitparser.dart';

// This parser should check from left to right if a nestedTerm, e.g. '(0)' or '(()', exists.
// If this is not the case, then it looks if a singleCharacter exists, either '(', ')' or '0' (lower priority).
// In case 1 everything works perfectly. But if the process is repeated any number of times, as in case 2,
// then it seems that it no longer recognizes that a nestedTerm exists and that this should actually lead
// to the same terminal output as in case 1 due to the higher priority. Where is my fallacy?

void main() {
  final definition = ExpressionDefinition();
  final parser = definition.build();
  print(parser.parse('(0)').toString());
  // Terminal output in case 1: ['(' (nestedTerm), '0' (singleCharacter), ')' (nestedTerm)]
  // Terminal output in case 2: ['(' (singleCharacter), '0' (singleCharacter), ')' (singleCharacter)]
}

class ExpressionDefinition extends GrammarDefinition {
  @override
  Parser start() => ref0(term).end();

  // Case 1 (parses only once):
  Parser term() => ref0(nestedTerm) | ref0(singleCharacter);

  // Case 2 (parses one or more times):
  // Parser term() => (ref0(nestedTerm) | ref0(singleCharacter)).plus();

  Parser nestedTerm() =>
      char('(').map((value) => "'$value' (nestedTerm)") &
      ref0(term) &
      char(')').map((value) => "'$value' (nestedTerm)");

  Parser singleCharacter() =>
      char('(').map((value) => "'$value' (singleCharacter)") |
      char(')').map((value) => "'$value' (singleCharacter)") |
      char('0').map((value) => "'$value' (singleCharacter)");
}
However, for my current project, the "priority rule" should also work in this case (in this example case 2).
Can anyone find my fallacy? Thanks a lot for your support!
Probably the easiest way to understand what is going on is to compare the parse traces of the two parsers; see also the section on debugging grammars I recently added:
import 'package:petitparser/debug.dart';

void main() {
  // ...
  trace(parser).parse('(0)');
}
You will see that in case 2 the nested-term is correctly started, but then for the inside of the nested-term the plus() parser eagerly consumes the remaining input characters 0 and ). This then causes the outer nested-term to fail because it cannot be completed with a ) anymore. As a consequence the complete input is consumed using single-characters.
From the examples given it is not entirely clear what you expect to get. Removing char(')') from the singleCharacter parser would solve the issue described.
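For illustration, here is a hedged sketch of that suggestion, showing only the two rules of ExpressionDefinition that change. With ')' no longer an alternative of singleCharacter, the inner plus() can no longer swallow the closing parenthesis, so the outer nestedTerm can complete:

  // Case 2, but with char(')') removed from singleCharacter:
  Parser term() => (ref0(nestedTerm) | ref0(singleCharacter)).plus();

  Parser singleCharacter() =>
      char('(').map((value) => "'$value' (singleCharacter)") |
      char('0').map((value) => "'$value' (singleCharacter)");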

How to prove a theorem about a type class outside the original type class definition?

I was trying alternative ways to write the proof below, which comes from this question and Isabelle 2020's Rings.thy (in particular, I added the note div_mult_mod_eq[of a b] line to test the use of the note command):
lemma mod_div_decomp:
  fixes a b
  obtains q r where "q = a div b" and "r = a mod b"
    and "a = q * b + r"
proof -
  from div_mult_mod_eq have "a = a div b * b + a mod b" by simp
  note div_mult_mod_eq[of a b]
  moreover have "a div b = a div b" ..
  moreover have "a mod b = a mod b" ..
  note that
  ultimately show thesis by blast
qed
However, if I write it in a separate .thy file, there is an error about type unification at the note line:
Type unification failed: Variable 'a::{plus,times} not of sort semiring_modulo
Failed to meet type constraint:
Term: a :: 'a
Type: ??'a
The problem goes away if I enclose the whole proof in a type class class ... begin ... end block, as follows:
theory "test"
imports Main
HOL.Rings
begin
...
class semiring_modulo = comm_semiring_1_cancel + divide + modulo +
assumes div_mult_mod_eq: "a div b * b + a mod b = a"
begin
(* ... inserted proof here *)
end
...
end
My questions are:
Is this the correct way to prove a theorem about a type class, i.e. to write a separate class definition in a different file?
Is it always necessary to duplicate type class definitions as I did?
If not, what is the proper way to prove a theorem about a type class outside of its original place of definition?
There are two ways to prove things in type classes (for Isabelle/HOL, basically sort = type class):
Proving in the context of the type class:
context semiring_modulo
begin
...
end
(slightly less clean) Add the sort constraints to the type:
lemma mod_div_decomp:
  fixes a b :: "'a :: {semiring_modulo}"
  obtains q r where "q = a div b" and "r = a mod b"
    and "a = q * b + r"
semiring_modulo subsumes plus and times, but you can also type {semiring_modulo,plus,times} to really have all of them.
The documentation of classes contains more examples.
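For instance, a minimal sketch of the first approach, reusing the proof from Rings.thy quoted in the question (the primed lemma name is only there to avoid a clash with the existing fact):

context semiring_modulo
begin

lemma mod_div_decomp':
  obtains q r where "q = a div b" and "r = a mod b"
    and "a = q * b + r"
proof -
  from div_mult_mod_eq have "a = a div b * b + a mod b" by simp
  moreover have "a div b = a div b" ..
  moreover have "a mod b = a mod b" ..
  ultimately show thesis by blast
qed

end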
The issue you ran into is related to how Isabelle implements polymorphism. Sorts represent subsets of all types, and we characterize them by a set of intersected classes. By attaching a sort to a variable, we restrict the space of terms with which that variable can be instantiated. One way of looking at this is as an assumption that the variable belongs to a certain sort. In your case, type inference for (+), (*), div, and mod apparently gives you {plus,times}, which is insufficient for div_mult_mod_eq. To restrict the variable further you can add an explicit type annotation, as Mathias explained.
Note that the simp call in the line above the note should run into the same problem.

customising the parse return value, retaining unnamed terminals

Consider the grammar:
TOP ⩴ 'x' Y 'z'
Y ⩴ 'y'
Here's how to get the exact value ["TOP","x",["Y","y"],"z"] with various parsers (not written manually, but generated from the grammar):
xyz__Parse-Eyapp.eyp
%strict
%tree
%%
start:
    TOP { shift; use JSON::MaybeXS qw(encode_json); print encode_json $_[0] };
TOP:
    'x' Y 'z' { shift; ['TOP', (scalar @_) ? @_ : undef] };
Y:
    'y' { shift; ['Y', (scalar @_) ? @_ : undef] };
%%
xyz__Regexp-Grammars.pl
use 5.028;
use strictures;
use Regexp::Grammars;
use JSON::MaybeXS qw(encode_json);

print encode_json $/{TOP} if (do { local $/; readline; }) =~ qr{
    <nocontext:>
    <TOP>

    <rule: TOP>
        <[anon=(x)]> <[anon=Y]> <[anon=(z)]>
        <MATCH=(?{ ['TOP', $MATCH{anon} ? $MATCH{anon}->@* : undef] })>

    <rule: Y>
        <[anon=(y)]>
        <MATCH=(?{ ['Y', $MATCH{anon} ? $MATCH{anon}->@* : undef] })>
}msx;
Code elided for the next two parsers. With Pegex, the functionality is achieved by inheriting from Pegex::Receiver. With Marpa-R2, the customisation of the return value is quite limited, but nested arrays are possible out of the box with a configuration option.
I have demonstrated that the desired customisation is possible, although it's not always easy or straightforward. These pieces of code attached to the rules are run as the tree is assembled.
In Raku, however, the parse method returns nothing but nested Match objects, which are unwieldy. They do not retain the unnamed terminals! (Just to make sure what I'm talking about: these are the two pieces of data on the RHS of the TOP rule whose values are 'x' and 'z'.) Apparently only data springing forth from named declarators is added to the tree.
Assigning to the match variable (analogous to how it works in Regexp-Grammars) seems to have no effect. Since the terminals do not make it into the match variable, actions don't help, either.
In summary, here's the grammar and ordinary parse value:
grammar {rule TOP { x <Y> z }; rule Y { y };}.parse('x y z')
How do you get the value ["TOP","x",["Y","y"],"z"] from it? You are not allowed to change the shape of the rules, because that would potentially spoil user-attached semantics; otherwise anything else is fair game. I still think the key to the solution is the match variable, but I can't see how.
Not a full answer, but the Match.chunks method gives you a list of the input string tokenized into captured and non-captured parts.
It does not, however, give you the ability to distinguish between non-capturing literals in the regex and implicitly matched whitespace.
You could circumvent that by adding positional captures and using Match.caps:
my $m = grammar {rule TOP { (x) <Y> (z) }; rule Y { (y) }}.parse('x y z');
sub transform(Pair $p) {
    given $p.key {
        when Int { $p.value.Str }
        when Str { ($p.key, $p.value.caps.map(&transform)).flat }
    }
}

say $m.caps.map(&transform);
This produces
(x (Y y) z)
so pretty much what you wanted, except that the top-level TOP is missing (which you'll likely only get in there if you hard-code it).
Note that this doesn't cover all edge cases; for example when a capture is quantified, $p.value is an Array, not a match object, so you'll need another level of .map in there, but the general idea should be clear.
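As a small illustration of the hard-coding mentioned above (a sketch; $m and &transform are taken from the snippet):

say ('TOP', |$m.caps.map(&transform)); # (TOP x (Y y) z)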

Pass values as arguments to rules

When implementing real world (TM) languages, I often encounter a situation like this:
(* language Foo *)
type A = ... (* parsed by parse_A *)
type B = ... (* parsed by parse_B *)
type collection = { as : A list ; bs : B list }
(* parser ParseFoo.mly *)
parseA : ... { A ( ... ) }
parseB : ... { B ( ... ) }
parseCollectionElement : parseA { .. } | parseB { .. }
parseCollection : nonempty_list (parseCollectionElement) { ... }
Obviously (in functional style), it would be best to pass the partially parsed collection record to each invocation of the semantic actions of parseA and parseB and update the list elements accordingly.
Is that even possible using menhir, or does one have to use the ugly hack of using a mutable global variable?
Well, you're quite limited in what you're allowed to do in menhir/ocamlyacc semantic actions. If you find this really frustrating, you can try parsec-like parser combinators, e.g. mparser, which allow you to use OCaml in your rules to its full extent.
My personal approach to this kind of problem is to stay at the most primitive level in the parser, without trying to define anything sophisticated, and to lift the parser output to a higher level later.
But your case looks simple enough to me. Here, instead of using a parametrized Menhir rule, you can just write a list rule by hand and build the collection in its semantic action; see the sketch below. nonempty_list is syntactic sugar that, like any sugar, works in most cases but is less general.
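A minimal sketch of that idea, assuming the parseA and parseB rules from the question exist and, for the sake of the sketch, return the payloads of A and B elements; the payload types and the add helper are invented for illustration:

%{
  (* hypothetical payload types, invented for illustration *)
  type elem = A of int | B of string
  type collection = { xs : int list ; ys : string list }

  let add c = function
    | A x -> { c with xs = x :: c.xs }
    | B y -> { c with ys = y :: c.ys }
%}

%%

(* each element rule returns a plain value; no mutable global state involved *)
element:
  | a = parseA { A a }
  | b = parseB { B b }

(* a hand-written left-recursive list rule that threads the growing
   collection through its own semantic actions *)
collection:
  | e = element                 { add { xs = [] ; ys = [] } e }
  | c = collection; e = element { add c e }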

Position of a node?

Rascal is rooted in term rewriting. Does it have built-in support for term/node position as commonly defined in term rewriting so that I can query for the position of a sub-term inside a term or the other way around?
I don't believe explicit positions are commonly part of the semantics of term rewriting, but nevertheless Rascal defines all kinds of operations on terms for which positions are explicit or can be made explicit. Please also have a look at the manuals at http://www.rascal-mpl.org
The main operation on terms is pattern matching, using normal first-order congruence, deep (higher-order) match, negative match, disjunctive match, etc.:
if (and(and(_, _), _) := and(and(true(), false()), false())) // pattern match operator :=
  println("yes!");

and(true(), b) = b;        // function definition, aka rewrite rule
and(false(), _) = false();

[ a | and(a, b) <- booleanList ]; // comprehension with a pattern as filter on a generator

innermost visit (t) {      // explicit automated traversal with strategies
  case and(a, b) => or(a, b)
}

b.leftHandSide = true(); // assign a new child term to the leftHandSide field of the term
                         // assigned to the b variable (non-destructively, you get a new b)
b[0] = false();          // same, but to the anonymous first child
Then there are the normal projection operators: indexing into the children with term[0], and child-by-name with term.myChildName if a many-sorted term signature was defined using field labels.
If you want to know at which position a sub-term occurs among the children, I would perhaps write it as such:
int getPos(node t, value child) = [*pre, child, *_] := getChildren(t) ? size(pre) : -1;
but there are other ways of achieving the same.
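For example, assuming the and/true/false constructors used in the snippets above are declared, a hypothetical check of this helper:

getPos(and(true(), false()), false()); // 1: false() is the second child
getPos(and(true(), false()), true());  // 0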
Rascal does not have pointers to the parents of a term.
