How to handle multiple optionals in grammar - rascal

I would like to know how I can handle multiple optionals without concrete pattern matching for each possible permutation.
Below is a simplified example of the problem I am facing:
lexical Int = [0-9]+;
syntax Bool = "True" | "False";
syntax Period = "Day" | "Month" | "Quarter" | "Year";
layout Standard = [\ \t\n\f\r]*;
syntax Optionals = Int? i Bool? b Period? p;
str printOptionals(Optionals opt){
str res = "";
if(!isEmpty("<opt.i>")) { // opt has i is always true (same for opt.i?)
res += printInt(opt.i);
}
if(!isEmpty("<opt.b>")){
res += printBool(opt.b);
}
if(!isEmpty("<opt.p>")) {
res += printPeriod(opt.period);
}
return res;
}
str printInt(Int i) = "<i>";
str printBool(Bool b) = "<b>";
str printPeriod(Period p) = "<p>";
However this gives the error message:
The called signature: printInt(opt(lex("Int"))), does not match the declared signature: str printInt(sort("Int"));
How do I get rid of the opt part when I know it is there?

I'm not sure how ideal this is, but you could do this for now:
if (/Int i := opt.i) {
res += printInt(i);
}
This will extract the Int from within opt.i if it is there, but the match will fail if Int was not provided as one of the options.

The current master on github has the following feature to deal with optionals: they can be iterated over.
For example:
if (Int i <- opt.i) {
res += printInt(i);
}
The <- will produce false immediately if the optional value is absent, and otherwise loop once through and bind the value which is present to the pattern.

An untyped solution is to project out the element from the parse tree:
rascal>opt.i.args[0];
Tree: `1`
Tree: appl(prod(lex("Int"),[iter(\char-class([range(48,57)]))],{}),[appl(regular(iter(\char-class([range(48,57)]))),[char(49)])[#loc=|file://-|(0,1,<1,0>,<1,1>)]])[#loc=|file://-|(0,1,<1,0>,<1,1>)]
However, then to transfer this back to an Int you'd have to pattern match, like so:
rascal>if (Int i := opt.i.args[0]) { printInt(i); }
str: "1"
One could write a generic cast function to help out here:
rascal>&T cast(type[&T] t, value v) { if (&T a := v) return a; throw "cast exception"; }
ok
rascal>printInt(cast(#Int, opt.i.args[0]))
str: "1"
Still, I believe Rascal is missing a feature here. Something like this would be a good feature request:
rascal>Int j = opt.i.value;
rascal>opt.i has value
bool: true

Related

Building expression parser with Dart petitparser, getting stuck on node visitor

I've got more of my expression parser working (Dart PetitParser to get at AST datastructure created with ExpressionBuilder). It appears to be generating accurate ASTs for floats, parens, power, multiply, divide, add, subtract, unary negative in front of both numbers and expressions. (The nodes are either literal strings, or an object that has a precedence with a List payload that gets walked and concatenated.)
I'm stuck now on visiting the nodes. I have clean access to the top node (thanks to Lukas), but I'm stuck on deciding whether or not to add a paren. For example, in 20+30*40, we don't need parens around 30*40, and the parse tree correctly has the node for this closer to the root so I'll hit it first during traversal. However, I don't seem to have enough data when looking at the 30*40 node to determine if it needs parens before going on to the 20+.. A very similar case would be (20+30)*40, which gets parsed correctly with 20+30 closer to the root, so once again, when visiting the 20+30 node I need to add parens before going on to *40.
This has to be a solved problem, but I never went to compiler school, so I know just enough about ASTs to be dangerous. What "a ha" am I missing?
// rip-common.dart:
import 'package:petitparser/petitparser.dart';
// import 'package:petitparser/debug.dart';
class Node {
int precedence;
List<dynamic> args;
Node([this.precedence = 0, this.args = const []]) {
// nodeList.add(this);
}
#override
String toString() => 'Node($precedence $args)';
String visit([int fromPrecedence = -1]) {
print('=== visiting $this ===');
var buf = StringBuffer();
var parens = (precedence > 0) &&
(fromPrecedence > 0) &&
(precedence < fromPrecedence);
print('<$fromPrecedence $precedence $parens>');
// for debugging:
var curlyOpen = '';
var curlyClose = '';
buf.write(parens ? '(' : curlyOpen);
for (var arg in args) {
if (arg is Node) {
buf.write(arg.visit(precedence));
} else if (arg is String) {
buf.write(arg);
} else {
print('not Node or String: $arg');
buf.write('$arg');
}
}
buf.write(parens ? ')' : curlyClose);
print('$buf for buf');
return '$buf';
}
}
class RIPParser {
Parser _make_parser() {
final builder = ExpressionBuilder();
var number = char('-').optional() &
digit().plus() &
(char('.') & digit().plus()).optional();
// precedence 5
builder.group()
..primitive(number.flatten().map((a) => Node(0, [a])))
..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a]));
// negation is a prefix operator
// precedence 4
builder.group()..prefix(char('-').trim(), (op, a) => Node(4, [op, a]));
// power is right-associative
// precedence 3
builder.group()..right(char('^').trim(), (a, op, b) => Node(3, [a, op, b]));
// multiplication and addition are left-associative
// precedence 2
builder.group()
..left(char('*').trim(), (a, op, b) => Node(2, [a, op, b]))
..left(char('/').trim(), (a, op, b) => Node(2, [a, op, b]));
// precedence 1
builder.group()
..left(char('+').trim(), (a, op, b) => Node(1, [a, op, b]))
..left(char('-').trim(), (a, op, b) => Node(1, [a, op, b]));
final parser = builder.build().end();
return parser;
}
Result _result(String input) {
var parser = _make_parser(); // eventually cache
var result = parser.parse(input);
return result;
}
String parse(String input) {
var result = _result(input);
if (result.isFailure) {
return result.message;
} else {
print('result.value = ${result.value}');
return '$result';
}
}
String visit(String input) {
var result = _result(input);
var top_node = result.value; // result.isFailure ...
return top_node.visit();
}
}
// rip_cmd_example.dart
import 'dart:io';
import 'package:rip_common/rip_common.dart';
void main() {
print('start');
String input;
while (true) {
input = stdin.readLineSync();
if (input.isEmpty) {
break;
}
print(RIPParser().parse(input));
print(RIPParser().visit(input));
}
;
print('done');
}
As you've observed, the ExpressionBuilder already assembles the tree in the right precedence order based on the operator groups you've specified.
This also happens for the wrapping parens node created here: ..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a])). If I test for this node, I get back the input string for your example expressions: var parens = precedence == 0 && args.length == 1 && args[0] is Node;.
Unless I am missing something, there should be no reason for you to track the precedence manually. I would also recommend that you create different node classes for the different operators: ValueNode, ParensNode, NegNode, PowNode, MulNode, ... A bit verbose, but much easier to understand what is going on, if each of them can just visit (print, evaluate, optimize, ...) itself.

Abstract Syntax Tree for Source Code including Expressions

I am building a new simple programming language (just to learn how compilers work in my free time).
I have already built a lexer which can tokenize my source code into lexemes.
However, I am now stuck on how to form an Abstract Syntax Tree from the tokens, where the source code might contain an expression (with operator precedence).
For simplicity, I shall include only 4 basic operators: +, -, /, and * in addition to brackets (). Operator precedence will follow BODMAS rule.
I realize I might be able to convert the expression from infix to prefix/postfix, form the tree and substitute it.
However, I am not sure if that is possible. Even if it is possible, I am not sure how efficient it might be or how difficult it might be to implement.
Is there some trivial way to form the tree in-place without having to convert to prefix/postfix first?
I came across the Shunting Yard algorithm which seems to do this. However, I found it to be quite a complicated algorithm. Is there something simpler, or should I go ahead with implementing the Shunting Yard algorithm?
Currently, the following program is tokenized by my lexer as follows:
I am demonstrating using a Java program for syntax familiarity.
Source Program:
public class Hello
{
public static void main(String[] args)
{
int a = 5;
int b = 6;
int c = 7;
int r = a + b * c;
System.out.println(r);
}
}
Lexer output:
public
class
Hello
{
public
static
void
main
(
String
[
]
args
)
{
int
a
=
5
;
int
b
=
6
;
int
c
=
7
;
int
r
=
a
+
b
*
c
;
System
.
out
.
println
(
r
)
;
}
}
// I know this might look ugly that I use a global variable ret to return parsed subtrees
// but please bear with it, I got used to this for various performance/usability reasons
var ret, tokens
function get_precedence(op) {
// this is an essential part, cannot parse an expression without the precedence checker
if (op == '*' || op == '/' || op == '%') return 14
if (op == '+' || op == '-') return 13
if (op == '<=' || op == '>=' || op == '<' || op == '>') return 11
if (op == '==' || op == '!=') return 10
if (op == '^') return 8
if (op == '&&') return 6
if (op == '||') return 5
return 0
}
function parse_primary(pos) {
// in the real language primary is almost everything that can be on the sides of +
// but here we only handle numbers detected with the JavaScript 'typeof' keyword
if (typeof tokens[pos] == 'number') {
ret = {
type: 'number',
value: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_operator(pos) {
// let's just reuse the function we already wrote insted of creating another huge 'if'
if (get_precedence(tokens[pos]) != 0) {
ret = {
type: 'operator',
operator: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_expr(pos) {
var stack = [], code = [], n, op, next, precedence
pos = parse_primary(pos)
if (pos == undefined) {
// error, an expression can only start with a primary
return undefined
}
stack.push(ret)
while (true) {
n = pos
pos = parse_operator(pos)
if (pos == undefined) break
op = ret
pos = parse_primary(pos)
if (pos == undefined) break
next = ret
precedence = get_precedence(op.operator)
while (stack.length > 0 && get_precedence(stack[stack.length - 1].operator) >= precedence) {
code.push(stack.pop())
}
stack.push(op)
code.push(next)
}
while(stack.length > 0) {
code.push(stack.pop())
}
if (code.length == 1) ret = code[0]
else ret = {
type: 'expr',
stack: code,
}
return n
}
function main() {
tokens = [1, '+', 2, '*', 3]
var pos = parse_expr(0)
if (pos) {
console.log('parsed expression AST')
console.log(ret)
}
else {
console.log('unable to parse anything')
}
}
main()
Here is your bare-bones implementation of shunting yard expression parsing. This is written in JavaScript. This is as minimalistic and simple as you can get. Tokenizing is left off for brevity, you give the parse the array of tokens (you call them lexemes).
The actual Shunting Yard is the parse_expr function. This is the "classic" implementation that uses the stack, this is my preference, some people prefer functional recursion.
Functions that parse various syntax elements are usually called "parselets". here we have three of them, one for expression, others are for primary and operator. If a parselet detects the corresponding syntax construction at the position pos it will return the next position right after the construct, and the construct itself in AST form is returned via the global variable ret. If the parselet does not find what it expects it returns undefined.
It is now trivially simple to add support for parens grouping (, just extend parse_primary with if (parse_group())... else if (parse_number())... etc. In the meantime your parse_primary will grow real big supporting various things, prefix operators, function calls, etc.

Parsing values contained inside nested brackets

I'm just fooling about and strangely found it a bit tricky to parse nested brackets in a simple recursive function.
For example, if the program's purpose it to lookup user details, it may go from {{name surname} age} to {Bob Builder age} and then to Bob Builder 20.
Here is a mini-program for summing totals in curly brackets that demonstrates the concept.
// Parses string recursively by eliminating brackets
def parse(s: String): String = {
if (!s.contains("{")) s
else {
parse(resolvePair(s))
}
}
// Sums one pair and returns the string, starting at deepest nested pair
// e.g.
// {2+10} lollies and {3+{4+5}} peanuts
// should return:
// {2+10} lollies and {3+9} peanuts
def resolvePair(s: String): String = {
??? // Replace the deepest nested pair with it's sumString result
}
// Sums values in a string, returning the result as a string
// e.g. sumString("3+8") returns "11"
def sumString(s: String): String = {
val v = s.split("\\+")
v.foldLeft(0)(_.toInt + _.toInt).toString
}
// Should return "12 lollies and 12 peanuts"
parse("{2+10} lollies and {3+{4+5}} peanuts")
Any ideas to a clean bit of code that could replace the ??? would be great. It's mostly out of curiosity that I'm searching for an elegant solution to this problem.
Parser combinators can handle this kind of situation:
import scala.util.parsing.combinator.RegexParsers
object BraceParser extends RegexParsers {
override def skipWhitespace = false
def number = """\d+""".r ^^ { _.toInt }
def sum: Parser[Int] = "{" ~> (number | sum) ~ "+" ~ (number | sum) <~ "}" ^^ {
case x ~ "+" ~ y => x + y
}
def text = """[^{}]+""".r
def chunk = sum ^^ {_.toString } | text
def chunks = rep1(chunk) ^^ {_.mkString} | ""
def apply(input: String): String = parseAll(chunks, input) match {
case Success(result, _) => result
case failure: NoSuccess => scala.sys.error(failure.msg)
}
}
Then:
BraceParser("{2+10} lollies and {3+{4+5}} peanuts")
//> res0: String = 12 lollies and 12 peanuts
There is some investment before getting comfortable with parser combinators but I think it is really worth it.
To help you decipher the syntax above:
regular expression and strings have implicit conversions to create primitive parsers with strings results, they have type Parser[String].
the ^^ operator allows to apply a function to the parsed elements
it can convert a Parser[String] into a Parser[Int] by doing ^^ {_.toInt}
Parser is a monad and Parser[T].^^(f) is equivalent to Parser[T].map(f)
the ~, ~> and <~ requires some inputs to be in a certain sequence
the ~> and <~ drop one side of the input out of the result
the case a ~ b allows to pattern match the results
Parser is a monad and (p ~ q) ^^ { case a ~ b => f(a, b) } is equivalent to for (a <- p; b <- q) yield (f(a, b))
(p <~ q) ^^ f is equivalent to for (a <- p; _ <- q) yield f(a)
rep1 is a repetition of 1 or more element
| tries to match an input with the parser on its left and if failing it will try the parser on the right
How about
def resolvePair(s: String): String = {
val open = s.lastIndexOf('{')
val close = s.indexOf('}', open)
if((open >= 0) && (close > open)) {
val (a,b) = s.splitAt(open+1)
val (c,d) = b.splitAt(close-open-1)
resolvePair(a.dropRight(1)+sumString(c).toString+d.drop(1))
} else
s
}
I know it's ugly but I think it works fine.

Pattern combining type test and literal

The active pattern in this question fails to compile after upgrading to VS 2012 RTM. It provides a way to do a type test and match a literal within a single pattern. For example:
let (|Value|_|) value =
match box value with
| :? 'T as x -> Some x
| _ -> None
let getValue (name: string) (r: IDataReader) =
match r.[name] with
| null | :? DBNull | Value "" -> Unchecked.defaultof<_>
| v -> unbox v
Can this be done without the active pattern? I realize a when guard could be used (:? string as s when s = "") but it can't be combined with other patterns.
kvb's variation (which doesn't do quite the same thing since it assumes the type test succeeds) can be modified to produce a similar pattern:
let (|Value|_|) x value =
match box value with
| :? 'T as y when x = y -> Some()
| _ -> None
However, there is a subtle performance difference. The original active pattern translates to:
public static FSharpOption<T> |Value|_|<a, T>(a value)
{
object obj = value;
if (!LanguagePrimitives.IntrinsicFunctions.TypeTestGeneric<T>(obj))
{
return null;
}
return FSharpOption<T>.Some((T)((object)obj));
}
that is, it does a type test and cast. It's usage (match x with Value "" -> ...) translates to:
FSharpOption<string> fSharpOption = MyModule.|Value|_|<object, string>(obj);
if (fSharpOption != null && string.Equals(fSharpOption.Value, ""))
{
...
}
Most notably, the typed value returned from the pattern is matched using the typical compiler transformations for patterns (string.Equals for strings).
The updated pattern translates to:
public static FSharpOption<Unit> |Value|_|<T, a>(T x, a value)
{
object obj = value;
if (LanguagePrimitives.IntrinsicFunctions.TypeTestGeneric<T>(obj))
{
T y = (T)((object)obj);
T y3 = y;
if (LanguagePrimitives.HashCompare.GenericEqualityIntrinsic<T>(x, y3))
{
T y2 = (T)((object)obj);
return FSharpOption<Unit>.Some(null);
}
}
return null;
}
which uses generic equality and is less efficient than matching against a literal. The usage is a bit simpler since equality is baked into the pattern:
FSharpOption<Unit> fSharpOption = MyModule.|Value|_|<string, object>("", obj);
if (fSharpOption != null)
{
...
}
Anyway, it works. But I like the original better.
You should be able to use a parameterized active pattern:
let (|Value|_|) v x =
if unbox x = v then
Some()
else None
The usage should look exactly like what you've got now.
Edit
While I don't know if the breaking change was intentional, I believe that active patterns with generic return types unrelated to the input types should usually be avoided. When combined with type inference, they can easily mask subtle errors. Consider the following example, using your original (|Value|_|) pattern:
match [1] with
| Value [_] -> "Singleton"
| _ -> "Huh?"
It seems like this isn't something you would actually ever attempt - the name implies that Value should only be used with literals; parameterized active patterns enable exactly this scenario.

Does F# Seq.sort return a copy of the input sequence?

Here is some unexpected (by me) behaviour in F#. I have a simple class that sorts a sequence :
type MyQueue<'a when 'a : comparison> ( values : 'a[] ) =
let vals =
Seq.sort values
member this.First = Seq.nth 0 vals
override this.ToString() =
Seq.fold ( fun s a -> s + a.ToString() + ";" ) "" vals
I have written a slightly contrived unit test (in C#) to test this:
private class TestObject : IComparable
{
public TestObject( double Value )
{
this.Value = Value;
}
public void Update(double NewValue)
{
this.Value = NewValue;
}
public double Value { get ; private set; }
public int CompareTo(object Comparable)
{
return this.Value.CompareTo( (Comparable as TestObject).Value );
}
public override string ToString ()
{
return Value.ToString();
}
}
[Test]
public void TestUpdate_OK()
{
var nums = new double[]{7,4,3,12,11,3,8};
var values = nums.Select( n => new TestObject(n) ).ToArray();
var q = new MyQueue<TestObject>( values );
Console.WriteLine ( q.ToString() );
// update one of the values in the collection - should not re-sort the collection
values[3].Update( 2.0 );
Console.WriteLine ( q.ToString() );
Assert.AreEqual( q.First.Value, 3.0 );
}
the Seq.sort does sort the sequence, and the first output is correct :
3;3;4;7;8;11;12;
However, updating the test (reference type) object causes the sequence to be re-sorted :
2;3;3;4;7;8;11;
I expected that the vals in the MyQueue object would now be unsorted, since the value in the reference object has changed, but the Seq.sort appears to have been performed again. I don't understand, I thought the object of functional programming was to avoid side effects. Why do I get this behaviour?
The cause of this is the statement let vals = Seq.sort values is not actually sorting the values until some code consumes the vals variable i.e what your Seq.fold does in toString method, it consumes the vals sequence and at that time the sorting happens and whatever values are there in the values array at that time, those values are sorted, so basically the sorting is happening at the time when you call toString method.
Also, I won't call it FP :) as you are basically doing OOPs by creating type with private state and that state is accessed by type members.
Your problem is related to how sequences works and not in general applicable to FP.
Does F# Seq.sort return a copy of the input sequence?
Yes. What else could it do – to change the order of a set of value types you need to copy (true in all .NET languages).
(This includes LINQ operators in C# and VB: the lazy aspect is that the copy is only made when the first copied element is needed, and at that point a complete new collection is created.)
You can actually check this directly in the sourcecode for f# here but in short what it does is call Seq.toArray, sort the array in place and return that array back as the sequence.

Resources