Transform AST tree to another AST tree - parsing

I have a simple PEG parser that generates an AST. Every operator is right associative, so parsing A + B + C + D returns tree [1]. Is there a simple way to transform tree [1] into tree [2], the one a left-associative operator would produce?
[1]    +              [2]        +
      / \                       / \
     A   +                     +   D
        / \                   / \
       B   +                 +   C
          / \               / \
         C   D             A   B

PEG parsers generate right-associative trees by default. You can add a post-processing step to invert them, like this:
{
  function invert(tree, acc) {
    if (acc === undefined) {
      acc = tree[0];
    }
    if (tree.length == 1) {
      return acc;
    }
    return invert(tree[2], [acc, tree[1], tree[2][0]]);
  }
}

Start
  = expr:Expression {
      return invert(expr);
    }

Expression
  = head:Integer tail:(_ "+" _ Expression)* {
      var result = [head];
      for (var i = 0; i < tail.length; i++) {
        result.push(tail[i][1]);
        result.push(tail[i][3]);
      }
      return result;
    }

Integer "integer"
  = [0-9]+ { return parseInt(text(), 10); }

_ "whitespace"
  = [ \t\n\r]*
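
For intuition, the same inversion can be written as a flatten-then-fold. Here is a sketch in Scala (the Tree encoding and names are mine, not the array form the grammar above uses): flatten the right-leaning parse into a first operand plus a list of (operator, operand) pairs, then fold left to rebuild the tree left-associatively.

sealed trait Tree
case class Leaf(v: String) extends Tree
case class Op(op: String, l: Tree, r: Tree) extends Tree

def toLeftAssoc(t: Tree): Tree = {
  // Flatten A + (B + (C + D)) into (A, List(("+", B), ("+", C), ("+", D))).
  // Left children are kept whole, so parenthesized subtrees survive intact.
  def flatten(t: Tree): (Tree, List[(String, Tree)]) = t match {
    case Op(op, l, r) =>
      val (first, rest) = flatten(r)
      (l, (op, first) :: rest)
    case leaf => (leaf, Nil)
  }
  // Rebuild ((A + B) + C) + D with a left fold.
  val (first, rest) = flatten(t)
  rest.foldLeft(first) { case (acc, (op, x)) => Op(op, acc, x) }
}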

Is this handling of ambiguities in dypgen normal or is it not?

I would like to know whether this is a bug or behavior intended by the author.
Here is a minimal example of a dypgen grammar:
{
open Parse_tree
let dyp_merge = Dyp.keep_all
}
%start main
%layout [' ' '\t']
%%
main:
| a "\n" { $1 }
a:
| ms b { Mt ($1,$2) }
| b <Mt(_,_)> kon1 b { Koo ($1, $2, $3) }
| b <Mt(_,_)> kon2 b { Koo ($1, $2, $3) }
| b { $1 }
b:
| k { $1 }
| ns b { Nt ($1,$2) }
/* If you comment this line out, it will work with the permutation, but I need the 'n' ! */
/* | b <Nt(_,_)> kon1 b { Koo ($1, $2, $3) }
| b <Nt(_,_)> kon2 b { Koo ($1, $2, $3) }*/
k:
| k kon1 k { Koo ($1, $2, $3) }
| k kon2 k { Koo ($1, $2, $3) }
| ks { $1 }
ms:
| words <M(_)> { $1 }
ns:
| words <N(_)> { $1 }
ks:
| words <K(_)> { $1 }
kon1:
| words <U(_)> { $1 }
kon2:
| 'y' { Y($1) }
words:
| chain { $1 }
throw_away:
| word "|" throw_away { $3 }
| word { $1 }
chain:
| word "|" throw_away { $1 }
| word "|" chain { $3 }
| word { $1 }
word:
| ('m' ['1'-'9']?) { M ($1) }
| ('n' ['1'-'9']?) { N ($1) }
| ('K' ['1'-'9']?) { K ($1) }
| ('u' ['1'-'9']?) { U ($1) }
The example can handle grammars like the following. Think of the ? and * as regular-expression operators, and of 'm', 'n', and 'K' as lexemes.
s = m? n* K
The 'K', 'm', and 'n' can also be replaced by the same letters followed by a number between 1 and 9,
or they can be replaced by lists delimited by '|', such as
m1
n1|n2
K|K|K or K1|K2|K3
and these lists can also be mixed, as in
m1|n1|K1
All these lists are parsed as possible ambiguities, which are globally merged (in the usual dypgen sense) with
let dyp_merge = Dyp.keep_all
If you type in:
m1|n1|K1 m1|n1|K1 m1|n1|K1
you get the results:
m1 > n1 > K1
n1 > n1 > K1
If you type in
K1|K2
you get
K1
K2
Now the interesting point:
In the grammar there is another feature: a "coordination binding" in the style of natural languages, with 'u' or with 'y'.
This can bind these lists of "phrases" (a 'K' letter with an optional fronting 'm' and an optional number of 'n's) into something like "K1 and K2".
The grammar can parse:
K1|K2 u K3|K4
K1|K2 y K3|K4
As I understood it, both should give the same result.
But the difference between the two coordination bindings is:
the lexeme 'u' is defined as a list of ambiguities in the same way as m, n, and K, and can also be mixed with 'K's, 'm's, and 'n's;
the lexeme 'y' is defined without this list feature.
And this makes a (surprising) difference:
K1|K2 u K3|K4
is parsed as:
koo { K1 u K4 }
koo { K2 u K4 }
and
K1|K2 y K3|K4
is parsed as:
koo2 { K1 y K3 }
koo2 { K2 y K3 }
koo2 { K1 y K4 }
koo2 { K2 y K4 }
In the first case, the second part of the u-coordination is not permuted.
In the second case, the second part of the coordination is permuted (as dypgen normally does with ambiguities).
Why does this make a difference?
(It must somehow be connected to the m's and n's, because if the rules for the 'n's are left out, it works.)
Best regards, and thank you for thinking about it.
gwf
The minimal example is in the style of the dypgen demos. To try it, make a folder "abc" in the demos directory and put all the fully cited files below into it.
The "parse_tree":
type tree =
  | M of string
  | Mt of tree * tree
  | N of string
  | Nt of tree * tree
  | K of string
  | U of string
  | Y of string
  | Koo of tree * tree * tree
  | Koo2 of tree * tree * tree * tree
A file "printit.ml":
open Parse_tree
let print_abc abc =
  let rec aux1 t = match t with
    | Koo (x1, k, x2) -> (
        print_string "\x1b[1m\x1b[31mkoo {\x1b[21m\027[0m ";
        aux1 x1;
        print_string "";
        aux1 k;
        print_string "";
        aux1 x2;
        print_string "\x1b[1m\x1b[31m}\x1b[21m\027[0m")
    | Koo2 (k1, x1, k2, x2) -> (
        print_string "\x1b[1m\x1b[31mkoo2 {\x1b[21m\027[0m ";
        aux1 k1;
        print_string " ";
        aux1 x1;
        print_string "";
        aux1 k2;
        print_string "";
        aux1 x2;
        print_string "\x1b[1m\x1b[31m}\x1b[21m\027[0m")
    | M (w) -> print_string (w ^ " ")
    | K (w) -> print_string (w ^ " ")
    | N (w) -> print_string (w ^ " ")
    | U (w) -> print_string (w ^ " ")
    | Y (w) -> print_string (w ^ " ")
    | Nt (p, l)
    | Mt (p, l) -> (
        print_string "";
        aux1 p;
        print_string " > ";
        aux1 l)
  in
  let aux2 t = aux1 t; print_newline () in
  List.iter aux2 abc
and the "main" program:
open Parse_tree
open Printit

let () = print_endline "
please try:
K1|K2 u K3|K4
and
K1|K2 y K3|K4
"

let lexbuf = Dyp.from_channel (Abc_parser.pp ()) stdin

let _ =
  try
    while true do
      (Dyp.flush_input lexbuf;
       try
         let pf = Abc_parser.main lexbuf in
         print_abc (List.map (fun (x, _) -> x) pf)
       with
         Dyp.Syntax_error -> Printf.printf "Syntax error\n\n");
      flush stdout
    done
  with Failure _ -> exit 0
and the "Makefile"
SOURCES = printit.ml abc_parser.dyp abc.ml
REP = -I ../../dyplib
CAMLC = ocamlc $(REP)
DYPGEN = ../../dypgen/dypgen --ocamlc "-I ../../dyplib"
LIBS=dyp.cma
all: abc
SOURCES1 = $(SOURCES:.mll=.ml)
SOURCES2 = $(SOURCES1:.dyp=.ml)
OBJS = $(SOURCES2:.ml=.cmo)
abc: parse_tree.cmi $(OBJS)
$(CAMLC) -o abc $(LIBS) $(OBJS)
.SUFFIXES: .ml .mli .cmo .cmi .dyp
.ml.cmo:
$(CAMLC) -c $<
.mli.cmi:
$(CAMLC) -c $<
.dyp.ml:
$(DYPGEN) $<
$(CAMLC) -c $*.mli
clean:
rm -f *.cm[iox] *~ .*~ *.o
rm -f abc
rm -f *.extract_type *_temp.ml
rm -f *parser.ml *parser.mli
I use dypgen but not the ambiguity handling.
A "merge point" is a point in the input stream where two parses of the same nonterminal are completed. If at this point the AST constructed by your action code is identical, you can safely discard either one of the two parses: there may be two ways to parse the non-terminal but the result is the same for both.
If the result is different, by default dypgen just throws one out unless you tell it to keep all the alternatives (which you have).
I'm not fully sure I understand your grammar; however, there is a tricky thing in it that may explain your problem.
Dypgen is GLR, but it does NOT do proper GLR. If you have a recursion like
x = A x | A
y = y B | B
dypgen does tail and head optimisation and converts the recursion into a loop. You have that in your "throw_away". A real LR parser can only handle left recursion and would barf on right recursion; dypgen handles both.
In a backtracking parser, if you have something like A*A as a grammar, it first fails on the trailing A because the leading A* ate up all the A's in the input, so it backtracks. GLR doesn't backtrack; it forks a new parse instead. But it cannot do that if it has tail- or head-optimised the recursion into a loop.
I suspect this is related to your problem. If you try to parse A*A* on AAAAA input, it should give 6 possible parses.
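A quick way to see where those six parses come from (a plain enumeration of split points in Scala, not a GLR simulation; names are mine):

val input = "AAAAA"
// Each way of splitting the input between the two A* repetitions is one parse.
val parses = (0 to input.length).map(i => (input.take(i), input.drop(i)))
parses.foreach { case (l, r) => println(s"first A*: '$l'  second A*: '$r'") }
println(parses.length)  // 6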

F# - Expected to be option but has a type

I was trying to build a binary tree in F#, but when I tried to test my code I ran into the error above.
Here is my code:
type TreeNode<'a> = { Key: int; Val: 'a }
type Tree<'a> = { LT: Tree<'a> option; TreeNode: TreeNode<'a>; RT: Tree<'a> option }

// insert a node according to Binary Tree operation
let rec insert (node: TreeNode<'a>) (tree: Tree<'a> option) =
    match tree with
    | None -> { LT = None; RT = None; TreeNode = node }
    | Some t when node.Key < t.TreeNode.Key -> insert node t.LT
    | Some t when node.Key > t.TreeNode.Key -> insert node t.RT

let t =
    seq { for i in 1 .. 10 -> { Key = i; Val = i } }
    |> Seq.fold (fun a i -> insert i a) None
Your insert function takes option<Tree<'T>> but returns Tree<'T>. When performing the fold, you need to keep state of the same type, so if you want to use None to represent an empty tree, the state needs to be an optional type.
The way to fix this is to wrap the result of insert in Some:
let tree =
    seq { for i in 1 .. 10 -> { Key = i; Val = i } }
    |> Seq.fold (fun a i -> Some(insert i a)) None
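The same typing point, sketched in Scala for comparison (the Tree type and names here are mine): the fold's accumulator stays an Option throughout, and each step wraps insert's result in Some.

sealed trait Tree
case class Node(key: Int, left: Option[Tree], right: Option[Tree]) extends Tree

def insert(key: Int, tree: Option[Tree]): Tree = tree match {
  case None => Node(key, None, None)
  case Some(n @ Node(k, l, r)) =>
    if (key < k) n.copy(left = Some(insert(key, l)))
    else if (key > k) n.copy(right = Some(insert(key, r)))
    else n // equal keys: keep the existing node
}

// The accumulator has type Option[Tree], so the result of insert is wrapped in Some.
val t = (1 to 10).foldLeft(None: Option[Tree])((acc, i) => Some(insert(i, acc)))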
I worked it out now... It should be like below:
type TreeNode<'a> = { Key: int; Val: 'a }
type Tree<'a> = { TreeNode: TreeNode<'a>; RT: Tree<'a> option; LT: Tree<'a> option }

// insert a node according to Binary Tree operation
let rec insert (node: TreeNode<'a>) (tree: Tree<'a> option) =
    match tree with
    | None -> { LT = None; RT = None; TreeNode = node }
    | Some t when node.Key < t.TreeNode.Key -> { TreeNode = t.TreeNode; LT = Some(insert node t.LT); RT = t.RT }
    | Some t when node.Key > t.TreeNode.Key -> { TreeNode = t.TreeNode; RT = Some(insert node t.RT); LT = t.LT }
    | Some t -> t // equal keys: keep the existing node, making the match complete

let t =
    seq { for i in 1 .. 10 -> { Key = i; Val = i } }
    |> Seq.fold (fun a i -> Some(insert i a)) None

How can I make my Grails domain class behave like a Number?

I have a bunch of Grails domain classes that I want to be able to treat as Numbers, Integers in particular. For the most part, they are just numeric values with a few extra properties that are used as meta-data. Here's an example:
class Score {
    String meaning
    Integer value
    static hasMany = [responses: Response]
    static constraints = {
        meaning blank: false, maxSize: 100, unique: true
        value min: 1, unique: true // Assume we're using a base-1 ranking system, where 1 is the lowest
    }
}
I tried to add @Delegate to the value field, but it didn't seem to have any impact: I still couldn't do 7 + myScore. All I get is missing-method exceptions, because Integer doesn't have a signature matching plus(Score).
What is the correct way to go about doing this, since @Delegate doesn't seem to work?
Note: I also have a need to turn my domain classes into various Collections with meta data, but I expect it will be the same solution.
I'm imagining everyone has moved on, now that this question is over a year and a half old, but still, I'm surprised that no one offered up Groovy categories as a solution. Indeed, it seems to me that categories are the most natural solution to this problem. Without changing the "Grailsiness" of the original domain class, you can still instill the desired numeric behavior relatively easily.
First define the category class:
class ScoreCategory {
    static asType(Score s, Class newClass) {
        switch (newClass) {
            case Integer.class:
            case Integer.TYPE:
            case Number.class: return s?.value ?: 0
            default: throw new GroovyCastException("Cannot cast to ${newClass}")
        }
    }

    static boolean equals(Score me, def you) {
        you instanceof Score && me as int == you as int
    }

    static Score plus(Score a, b) { plusImpl(a, b) }
    static Score plus(Score a, Number b) { plusImpl(a, b) }
    static Score plus(Number a, Score b) { plusImpl(a, b) }
    private static plusImpl(a, b) { new Score(value: (a as int) + (b as int)) }

    static Score minus(Score a, b) { minusImpl(a, b) }
    static Score minus(Score a, Number b) { minusImpl(a, b) }
    static Score minus(Number a, Score b) { minusImpl(a, b) }
    private static minusImpl(a, b) { a + -b }

    static Score multiply(Score a, Number b) { multImpl(a, b) }
    static Score multiply(Number a, Score b) { multImpl(a, b) }
    private static multImpl(a, b) { new Score(value: (a as int) * (b as int)) }

    static Integer div(Score a, b) { (a as int).intdiv(b as int) }
    static Score div(Score a, Number b) { new Score(value: (a as int).intdiv(b as int)) }

    static Score negative(Score a) { new Score(value: -(a as int)) }
    static Score abs(Score a) { new Score(value: (a as int).abs()) }
}
Then, at some suitably global place in the application declare the mixins:
Number.metaClass.mixin ScoreCategory
Score.metaClass.mixin ScoreCategory
After all this, as demonstrated below, Score objects should now behave like numeric quantities:
def (w, x, y, z) = [54, 78, 21, 32]
def (s1, s2) = [new Score(value:w), new Score(value:x)]
assert (s1 + s2) == new Score(value: w + x)
assert (s1 + y) == new Score(value: w + y)
assert (z + s2) == new Score(value: z + x)
assert (s1 - s2) == new Score(value: w - x)
assert (s1 - y) == new Score(value: w - y)
assert (z - s2) == new Score(value: z - x)
assert (s1 * y) == new Score(value: w * y)
assert (z * s2) == new Score(value: z * x)
assert (s2 / s1) == x.intdiv(w)
assert (s1 / y) == new Score(value: w / y)
Your answer can be operator overloading. Here you overload the plus method to operate on integers and doubles:
class Score {
    ...
    int plus(int otherValue) {
        return otherValue + value
    }

    double plus(double otherValue) {
        return (value as double) + otherValue
    }

    // And you can avoid my pitfall by overriding the asType method
    Object asType(Class clazz) {
        if (clazz == Integer) {
            return value
        } else if (clazz == Double) {
            return value as Double
        } else {
            super.asType(clazz)
        }
    }
}
assert new Score(value: 4) + 15 == 19
assert new Score(value: 4) + 15.32 == 19.32
assert 15.32 + (new Score(value: 4) as Double) == 19.32
assert 15 + (new Score(value: 4) as Integer) == 19
Add an overload for the plus operator on the Number metaclass:
Number.metaClass.plus = { Score s -> delegate + s.value }
Add this in BootStrap.groovy in your app, or in doWithDynamicMethods in a plugin.
Edit:
As pointed out in the comments, this doesn't work if the Score is on the left side of the operation. Add a plus method to the Score class to handle that:
Number plus(Number n) { value + n }
You can also extend Number.
class Score extends Number {
    Integer value
    public int intValue() { return value }
    public double doubleValue() { return value }
    public long longValue() { return value }
    public float floatValue() { return value }
}

Score sc = new Score(value: 5)
println 10 + sc
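For comparison outside Groovy: the underlying problem (the left operand's class has no plus overload for your type) is solved in Scala with an implicit class rather than metaclass changes. A minimal sketch, with all names mine:

object ScoreOps {
  case class Score(value: Int)
  // Gives Int the missing + overload for Score via an implicit conversion.
  implicit class IntPlusScore(val n: Int) extends AnyVal {
    def +(s: Score): Int = n + s.value
  }
}

import ScoreOps._
println(7 + Score(5)) // 12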

Pretty print expression with as few parentheses as possible?

My Question: What is the cleanest way to pretty print an expression without redundant parentheses?
I have the following representation of lambda expressions:
Term ::= Fun(String x, Term t)
| App(Term t1, Term t2)
| Var(String x)
By convention App is left associative, that is a b c is interpreted as (a b) c and function bodies stretch as far to the right as possible, that is, λ x. x y is interpreted as λ x. (x y).
I have a parser that does a good job, but now I want a pretty printer. Here's what I currently have (pseudo scala):
term match {
  case Fun(v, t) => "(λ %s.%s)".format(v, prettyPrint(t))
  case App(s, t) => "(%s %s)".format(prettyPrint(s), prettyPrint(t))
  case Var(v)    => v
}
The above printer always puts ( ) around expressions (except for atomic variables). Thus for Fun(x, App(Fun(y, x), y)) it produces
(λ x.((λ y.x) y))
I would like to have
λ x.(λ y.x) y
Here I'll use a simple grammar for infix expressions, with associativity and precedence defined as follows; operators are listed in ascending order of precedence:
E -> E + T | E - T | T left associative
T -> T * F | T / F | F left associative
F -> G ^ F | G right associative
G -> - G | ( E ) | NUM
Given an abstract syntax tree (AST), we convert the AST to a string with only the necessary parentheses, as described in the pseudocode below. We examine relative precedence and associativity as we recursively descend the tree to determine when parentheses are necessary. Note that all decisions to wrap parentheses around an expression must be made in the parent node.
toParenString(AST) {
  if (AST.type == NUM)  // simple atomic type (no operator)
    return toString(AST)
  else if (AST.type == UNARY_MINUS)  // prefix unary operator
    if (AST.arg.type != NUM AND
        precedence(AST.op) > precedence(AST.arg.op))
      return "-(" + toParenString(AST.arg) + ")"
    else
      return "-" + toParenString(AST.arg)
  else {  // binary operation
    var useLeftParen =
      AST.leftarg.type != NUM AND
      (precedence(AST.op) > precedence(AST.leftarg.op) OR
       (precedence(AST.op) == precedence(AST.leftarg.op) AND
        isRightAssociative(AST.op)))
    var useRightParen =
      AST.rightarg.type != NUM AND
      (precedence(AST.op) > precedence(AST.rightarg.op) OR
       (precedence(AST.op) == precedence(AST.rightarg.op) AND
        isLeftAssociative(AST.op)))
    var leftString
    if (useLeftParen)
      leftString = "(" + toParenString(AST.leftarg) + ")"
    else
      leftString = toParenString(AST.leftarg)
    var rightString
    if (useRightParen)
      rightString = "(" + toParenString(AST.rightarg) + ")"
    else
      rightString = toParenString(AST.rightarg)
    return leftString + AST.op + rightString
  }
}
Don't you just have to check the types of the arguments of App?
I'm not sure how to write this in Scala...
term match {
  case Fun(v: String, t: Term) => "λ %s.%s".format(v, prettyPrint(t))
  case App(s: Fun, t: App)     => "(%s) (%s)".format(prettyPrint(s), prettyPrint(t))
  case App(s: Term, t: App)    => "%s (%s)".format(prettyPrint(s), prettyPrint(t))
  case App(s: Fun, t: Term)    => "(%s) %s".format(prettyPrint(s), prettyPrint(t))
  case App(s: Term, t: Term)   => "%s %s".format(prettyPrint(s), prettyPrint(t))
  case Var(v: String)          => v
}
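Putting the two answers together, here is a sketch of a complete minimal-parentheses printer for the question's Term type in Scala (the function name pretty is mine): App is treated as left associative and Fun bodies stretch to the right, so a Fun needs parentheses as the left child of an App, and anything non-atomic needs them as the right child.

sealed trait Term
case class Fun(x: String, t: Term) extends Term
case class App(t1: Term, t2: Term) extends Term
case class Var(x: String) extends Term

def pretty(term: Term): String = term match {
  case Var(v)    => v
  case Fun(v, t) => s"λ $v.${pretty(t)}" // the body stretches right; no parens needed here
  case App(s, t) =>
    // Left child: wrap a Fun, or its body would swallow the argument.
    val left = s match {
      case Fun(_, _) => s"(${pretty(s)})"
      case _         => pretty(s)
    }
    // Right child: wrap anything non-atomic, since App associates to the left.
    val right = t match {
      case Var(v) => v
      case _      => s"(${pretty(t)})"
    }
    s"$left $right"
}

// pretty(Fun("x", App(Fun("y", Var("x")), Var("y")))) returns "λ x.(λ y.x) y"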

F# How to Percentile Rank An Array of Doubles?

I am trying to take a numeric array in F#, and rank all the elements so that ties get the same rank. Basically I'm trying to replicate the algorithm I have below in C#, but just for an array of doubles. Help?
rankMatchNum = 0;
rankMatchSum = 0;
previousScore = -999999999;
for (int i = 0; i < factorStocks.Count; i++)
{
    // The 1st time through it won't ever match the previous score...
    if (factorStocks[i].factors[factorName + "_R"] == previousScore)
    {
        rankMatchNum = rankMatchNum + 1;     // The count of matching ranks
        rankMatchSum = rankMatchSum + i + 1; // The rank itself...
        for (int j = 0; j <= rankMatchNum; j++)
        {
            factorStocks[i - j].factors[factorName + "_WR"] = rankMatchSum / (rankMatchNum + 1);
        }
    }
    else
    {
        rankMatchNum = 0;
        rankMatchSum = i + 1;
        previousScore = factorStocks[i].factors[factorName + "_R"];
        factorStocks[i].factors[factorName + "_WR"] = i + 1;
    }
}
Here's how I would do it, although this isn't a direct translation of your code. I've done things in a functional style, piping results from one transformation to another.
let rank seq =
    seq
    |> Seq.countBy (fun x -> x)      // count repeated numbers
    |> Seq.sortBy (fun (k, v) -> k)  // order by key
    |> Seq.fold
        (fun (r, l) (_, n) ->        // accumulate the number of items seen and the grouped average ranks
            let r'' = r + n          // the rank after this group is processed
            let avg = List.averageBy float [r + 1 .. r'']  // average rank for this group
            r'', ([for _ in 1 .. n -> avg]) :: l)          // prepend a list with avg repeated n times
        (0, [])                      // seed the fold with rank 0 and an empty list
    |> snd                           // keep the list component, drop the final rank
    |> List.rev                      // reverse (the fold built the list backwards)
    |> List.collect (fun l -> l)     // flatten the per-group lists into the final list
Or to copy Mehrdad's style:
let rank arr =
    let lt item = arr |> Seq.filter (fun x -> x < item) |> Seq.length
    let lte item = arr |> Seq.filter (fun x -> x <= item) |> Seq.length
    let avgR item = [(lt item) + 1 .. (lte item)] |> List.averageBy float
    Seq.map avgR arr
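The same counting idea ports directly to Scala; a sketch for comparison (quadratic like the original; names are mine):

def rank(xs: Seq[Double]): Seq[Double] =
  xs.map { item =>
    val lt  = xs.count(_ < item)  // items strictly below this one
    val lte = xs.count(_ <= item) // items at or below this one
    ((lt + 1) to lte).sum.toDouble / (lte - lt) // average rank of the tie group
  }

// rank(Seq(1.0, 2.0, 2.0, 4.0, 3.0, 3.0)) == Seq(1.0, 2.5, 2.5, 6.0, 4.5, 4.5)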
I think that you'll probably find this problem far easier to solve in F# if you rewrite the above in a declarative manner rather than in an imperative manner. Here's my off-the-top-of-my-head approach to rewriting the above declaratively:
First we need a wrapper class to decorate our items with a property carrying the rank.
class Ranked<T> {
    public T Value { get; private set; }
    public double Rank { get; private set; }

    public Ranked(T value, double rank) {
        this.Value = value;
        this.Rank = rank;
    }
}
Here, then, is your algorithm in a declarative manner. Note that elements is your input sequence, and the resulting sequence is in the same order as elements. The delegate func selects the value that you want to rank elements by.
static class IEnumerableExtensions {
    public static IEnumerable<Ranked<T>> Rank<T, TRank>(
        this IEnumerable<T> elements,
        Func<T, TRank> func
    ) {
        // Order the groups by key once, so group order and rank order agree.
        var groups = elements.GroupBy(x => func(x))
                             .OrderBy(g => g.Key)
                             .ToArray();
        var ranks = groups.Aggregate(
                              (IEnumerable<double>)new List<double>(),
                              (x, g) =>
                                  x.Concat(
                                      Enumerable.Repeat(
                                          Enumerable.Range(x.Count() + 1, g.Count()).Sum() / (double)g.Count(),
                                          g.Count()
                                      )
                                  )
                          )
                          .GroupBy(r => r)
                          .Select(r => r.Key)
                          .ToArray();
        var dict = groups.Select((g, i) => new { g.Key, Index = i })
                         .ToDictionary(x => x.Key, x => ranks[x.Index]);
        foreach (T element in elements) {
            yield return new Ranked<T>(element, dict[func(element)]);
        }
    }
}
Usage:
class MyClass {
    public double Score { get; private set; }
    public MyClass(double score) { this.Score = score; }
}

List<MyClass> list = new List<MyClass>() {
    new MyClass(1.414),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(1.414),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(1.618)
};

foreach (var item in list.Rank(x => x.Score)) {
    Console.WriteLine("Score = {0}, Rank = {1}", item.Value.Score, item.Rank);
}
Output:
Score = 1.414, Rank = 1.5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 1.414, Rank = 1.5
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 1.618, Rank = 3
Note that I do not require the input sequence to be ordered. The resulting code is simpler if you enforce such a requirement on the input sequence. Note further that we do not mutate the input sequence, nor do we mutate the input items. This makes F# happy.
From here you should be able to rewrite this in F# easily.
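For instance, a sketch of the same declarative shape in Scala, under the same no-mutation constraints (helper names are mine):

def rankBy[T](elements: Seq[T])(f: T => Double): Seq[(T, Double)] = {
  val groups = elements.groupBy(f).toSeq.sortBy(_._1) // groups in key order
  val (_, rankOf) = groups.foldLeft((0, Map.empty[Double, Double])) {
    case ((seen, acc), (key, items)) =>
      val n = items.size
      val avg = ((seen + 1) to (seen + n)).sum.toDouble / n // average rank of the tie group
      (seen + n, acc + (key -> avg))
  }
  elements.map(e => (e, rankOf(f(e)))) // original order preserved
}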
This is not a very efficient algorithm (O(n²)), but it's quite short and readable:
let percentile arr =
    let rank item =
        ((arr |> Seq.filter (fun i -> i < item) |> Seq.length |> float) + 1.0)
        / float (Array.length arr) * 100.0
    Array.map rank arr
You might adjust the expression fun i -> i < item (or the + 1.0 expression) to achieve your desired way of ranking results:
let arr = [|1.0;2.0;2.0;4.0;3.0;3.0|]
percentile arr |> print_any;;
[|16.66666667; 33.33333333; 33.33333333; 100.0; 66.66666667; 66.66666667|]
Mehrdad's solution is very nice but a bit slow for my purposes. The initial sorting can be done once. Rather than traversing the lists each time to get the number of items < or <= the target, we can use counters. This is more imperative (it could have used a fold):
let GetRanks2 (arr) =
    let tupleList = arr |> Seq.countBy (fun x -> x) |> Seq.sortBy (fun (x, count) -> x)
    let map = new System.Collections.Generic.Dictionary<int, float>()
    let mutable index = 1
    for (item, count) in tupleList do
        let c = count
        let avgRank =
            let mutable s = 0
            for i = index to index + c - 1 do
                s <- s + i
            float s / float c
        map.Add(item, avgRank)
        index <- index + c
    map
