customising the parse return value, retaining unnamed terminals - parsing

Consider the grammar:
TOP ⩴ 'x' Y 'z'
Y ⩴ 'y'
Here's how to get the exact value ["TOP","x",["Y","y"],"z"] with various parsers (not written manually, but generated from the grammar):
xyz__Parse-Eyapp.eyp
%strict
%tree
%%
start:
TOP { shift; use JSON::MaybeXS qw(encode_json); print encode_json $_[0] };
TOP:
'x' Y 'z' { shift; ['TOP', (scalar #_) ? #_ : undef] };
Y:
'y' { shift; ['Y', (scalar #_) ? #_ : undef] };
%%
xyz__Regexp-Grammars.pl
use 5.028;
use strictures;
use Regexp::Grammars;
use JSON::MaybeXS qw(encode_json);
print encode_json $/{TOP} if (do { local $/; readline; }) =~ qr{
<nocontext:>
<TOP>
<rule: TOP>
<[anon=(x)]> <[anon=Y]> <[anon=(z)]>
<MATCH=(?{['TOP', $MATCH{anon} ? $MATCH{anon}->#* : undef]})>
<rule: Y>
<[anon=(y)]>
<MATCH=(?{['Y', $MATCH{anon} ? $MATCH{anon}->#* : undef]})>
}msx;
Code elided for the next two parsers. With Pegex, the functionality is achieved by inheriting from Pegex::Receiver. With Marpa-R2, the customisation of the return value is quite limited, but nested arrays are possible out of the box with a configuration option.
I have demonstrated that the desired customisation is possible, although it's not always easy or straight-forward. These pieces of code attached to the rules are run as the tree is assembled.
The parse method returns nothing but nested Match objects that are unwieldy. They do not retain the unnamed terminals! (Just to make sure what I'm talking about: these are the two pieces of data at the RHS of the TOP rule whose values are 'x' and 'z'.) Apparently only data springing forth from named declarators are added to the tree.
Assigning to the match variable (analog to how it works in Regexp-Grammars) seems to have no effect. Since the terminals do no make it into the match variable, actions don't help, either.
In summary, here's the grammar and ordinary parse value:
grammar {rule TOP { x <Y> z }; rule Y { y };}.parse('x y z')
How do you get the value ["TOP","x",["Y","y"],"z"] from it? You are not allowed to change the shape of rules because that would potentially spoil user attached semantics, otherwise anything else is fair game. I still think the key to the solution is the match variable, but I can't see how.

Not a full answer, but the Match.chunks method gives you a few of the input string tokenized into captured and non-captured parts.
It does, however, does not give you the ability to distinguish between non-capturing literals in the regex and implicitly matched whitespace.
You could circumvent that by adding positional captures, and use Match.caps
my $m = grammar {rule TOP { (x) <Y> (z) }; rule Y { (y) }}.parse('x y z');
sub transform(Pair $p) {
given $p.key {
when Int { $p.value.Str }
when Str { ($p.key, $p.value.caps.map(&transform)).flat }
}
}
say $m.caps.map(&transform);
This produces
(x (Y y) z)
so pretty much what you wanted, except that the top-level TOP is missing (which you'll likely only get in there if you hard-code it).
Note that this doesn't cover all edge cases; for example when a capture is quantified, $p.value is an Array, not a match object, so you'll need another level of .map in there, but the general idea should be clear.

Related

Maxima: Is there any way to make functions defined within the main function be local, in a similar way to local variables?

I wonder if there is any way to make functions defined within the main function be local, in a similar way to local variables. For example, in this function that calculates the gradient of a scalar function,
grad(var,f) := block([aux],
aux : [gradient, DfDx[i]],
gradient : [],
DfDx[i] := diff(f(x_1,x_2,x_3),var[i],1),
for i in [1,2,3] do (
gradient : append(gradient, [DfDx[i]])
),
return(gradient)
)$
The variable gradient that has been defined inside the main function grad(var,f) has no effect outside the main function, as it is inside the aux list. However, I have observed that the function DfDx, despite being inside the aux list, does have an effect outside the main function.
Is there any way to make the sub-functions defined inside the main function to be local only, in a similar way to what can be made with local variables? (I know that one can kill them once they have been used, but perhaps there is a more elegant way)
To address the problem you are needing to solve here, another way to compute the gradient is to say
grad(var, e) := makelist(diff(e, var1), var1, var);
and then you can say for example
grad([x, y, z], sin(x)*y/z);
to get
cos(x) y sin(x) sin(x) y
[--------, ------, - --------]
z z 2
z
(There isn't a built-in gradient function; this is an oversight.)
About local functions, bear in mind that all function definitions are global. However you can approximate a local function definition via local, which saves and restores all properties of a symbol. Since the function definition is a property, local has the effect of temporarily wiping out an existing function definition and later restoring it. In between you can create a temporary function definition. E.g.
foo(x) := 2*x;
bar(y) := block(local(foo), foo(x) := x - 1, foo(y));
bar(100); /* output is 99 */
foo(100); /* output is 200 */
However, I don't this you need to use local -- just makelist plus diff is enough to compute the gradient.
There is more to say about Maxima's scope rules, named and unnamed functions, etc. I'll try to come back to this question tomorrow.
To compute the gradient, my advice is to call makelist and diff as shown in my first answer. Let me take this opportunity to address some related topics.
I'll paste the definition of grad shown in the problem statement and use that to make some comments.
grad(var,f) := block([aux],
aux : [gradient, DfDx[i]],
gradient : [],
DfDx[i] := diff(f(x_1,x_2,x_3),var[i],1),
for i in [1,2,3] do (
gradient : append(gradient, [DfDx[i]])
),
return(gradient)
)$
(1) Maxima works mostly with expressions as opposed to functions. That's not causing a problem here, I just want to make it clear. E.g. in general one has to say diff(f(x), x) when f is a function, instead of diff(f, x), likewise integrate(f(x), ...) instead of integrate(f, ...).
(2) When gradient and Dfdx are to be the local variables, you have to name them in the list of variables for block. E.g. block([gradient, Dfdx], ...) -- Maxima won't understand block([aux], aux: ...).
(3) Note that a function defined with square brackets instead of parentheses, e.g. f[x] := ... instead of f(x) := ..., is a so-called array function in Maxima. An array function is a memoizing function, i.e. if f[x] is called two or more times, the return value is only computed once, and then returned every time thereafter. Sometimes that's a useful optimization when the domain of the function comprises a finite set.
(4) Bear in mind that x_1, x_2, x_3, are distinct symbols, not related to each other, and not related to x[1], x[2], x[3], even if they are displayed the same. My advice is to work with subscripted symbols x[i] when i is a variable.
(5) About building up return values, try to arrange to compute the whole thing at one go, instead of growing the result incrementally. In this case, makelist is preferable to for plus append.
(6) The return function in Maxima acts differently than in other programming languages; it's a little hard to explain. A function returns the value of the last expression which was evaluated, so if gradient is that last expression, you can just write grad(var, f) := block(..., gradient).
Hope this helps, I know it's obscure and complex. The Maxima programming language was not designed before being implemented, and some of the decisions are clearly questionable at the long interval of more than 50 years (!) later. That's okay, they were figuring it out as they went along. There was not a body of established results which could provide a point of reference; the original authors were contributing to what's considered common knowledge today.

Define a variable which evaluates when expression is evaluated, but not substitutes its definition to expression

Let's say, I want to declare an elliptic integral as
K(k):=elliptic_kc (k^2);
k:=<something like tanh()*coth()...>
The problem is that maxima will always substitute elliptic_kc(x^2) in place of K(x), and k's definition in place of k.
I want to prevent it, while still allowing numeric evaluation of K(), k, and simplifying expressions with these symbols.
...
A function, can be declared as "noun" for disabling substitution. But this also disables its evaluation.
Well, I use various strategies. Sometimes one approach works better than another.
(1) Put a single quote ' on function names to nounify that specific function call. At a later time, ev(expr, nouns) verbifies any nouns, so the functions are called. E.g. foo: 'integrate(sin(x), x); yields a noun expression. Then ev(foo, nouns); (which can be abbreviated to foo, nouns; at the console input) to actually calculate it.
(2) Don't define functions, but just let them be undefined symbols. Then substitute a lambda expression when you want to evaluate them. E.g. foo: f(2); then later subst(f = lambda([x], x + 1), foo);.
(3) Don't assign values, but let them be undefined, then substitute values later on. E.g. foo: a + b; then later subst([a = 123, b = y*z], foo);.

maps,filter,folds and more? Do we really need these in Erlang?

Maps, filters, folds and more : http://learnyousomeerlang.com/higher-order-functions#maps-filters-folds
The more I read ,the more i get confused.
Can any body help simplify these concepts?
I am not able to understand the significance of these concepts.In what use cases will these be needed?
I think it is majorly because of the syntax,diff to find the flow.
The concepts of mapping, filtering and folding prevalent in functional programming actually are simplifications - or stereotypes - of different operations you perform on collections of data. In imperative languages you usually do these operations with loops.
Let's take map for an example. These three loops all take a sequence of elements and return a sequence of squares of the elements:
// C - a lot of bookkeeping
int data[] = {1,2,3,4,5};
int squares_1_to_5[sizeof(data) / sizeof(data[0])];
for (int i = 0; i < sizeof(data) / sizeof(data[0]); ++i)
squares_1_to_5[i] = data[i] * data[i];
// C++11 - less bookkeeping, still not obvious
std::vec<int> data{1,2,3,4,5};
std::vec<int> squares_1_to_5;
for (auto i = begin(data); i < end(data); i++)
squares_1_to_5.push_back((*i) * (*i));
// Python - quite readable, though still not obvious
data = [1,2,3,4,5]
squares_1_to_5 = []
for x in data:
squares_1_to_5.append(x * x)
The property of a map is that it takes a collection of elements and returns the same number of somehow modified elements. No more, no less. Is it obvious at first sight in the above snippets? No, at least not until we read loop bodies. What if there were some ifs inside the loops? Let's take the last example and modify it a bit:
data = [1,2,3,4,5]
squares_1_to_5 = []
for x in data:
if x % 2 == 0:
squares_1_to_5.append(x * x)
This is no longer a map, though it's not obvious before reading the body of the loop. It's not clearly visible that the resulting collection might have less elements (maybe none?) than the input collection.
We filtered the input collection, performing the action only on some elements from the input. This loop is actually a map combined with a filter.
Tackling this in C would be even more noisy due to allocation details (how much space to allocate for the output array?) - the core idea of the operation on data would be drowned in all the bookkeeping.
A fold is the most generic one, where the result doesn't have to contain any of the input elements, but somehow depends on (possibly only some of) them.
Let's rewrite the first Python loop in Erlang:
lists:map(fun (E) -> E * E end, [1,2,3,4,5]).
It's explicit. We see a map, so we know that this call will return a list as long as the input.
And the second one:
lists:map(fun (E) -> E * E end,
lists:filter(fun (E) when E rem 2 == 0 -> true;
(_) -> false end,
[1,2,3,4,5])).
Again, filter will return a list at most as long as the input, map will modify each element in some way.
The latter of the Erlang examples also shows another useful property - the ability to compose maps, filters and folds to express more complicated data transformations. It's not possible with imperative loops.
They are used in almost every application, because they abstract different kinds of iteration over lists.
map is used to transform one list into another. Lets say, you have list of key value tuples and you want just the keys. You could write:
keys([]) -> [];
keys([{Key, _Value} | T]) ->
[Key | keys(T)].
Then you want to have values:
values([]) -> [];
values([{_Key, Value} | T}]) ->
[Value | values(T)].
Or list of only third element of tuple:
third([]) -> [];
third([{_First, _Second, Third} | T]) ->
[Third | third(T)].
Can you see the pattern? The only difference is what you take from the element, so instead of repeating the code, you can simply write what you do for one element and use map.
Third = fun({_First, _Second, Third}) -> Third end,
map(Third, List).
This is much shorter and the shorter your code is, the less bugs it has. Simple as that.
You don't have to think about corner cases (what if the list is empty?) and for experienced developer it is much easier to read.
filter searches lists. You give it function, that takes element, if it returns true, the element will be on the returned list, if it returns false, the element will not be there. For example filter logged in users from list.
foldl and foldr are used, when you have to do additional bookkeeping while iterating over the list - for example summing all the elements or counting something.
The best explanations, I've found about those functions are in books about Lisp: "Structure and Interpretation of Computer Programs" and "On Lisp" Chapter 4..

Position of a node?

Rascal is rooted in term rewriting. Does it have built-in support for term/node position as commonly defined in term rewriting so that I can query for the position of a sub-term inside a term or the other way around?
I don't believe explicit positions are commonly defined in the semantics of term rewriting, but nevertheless Rascal defines all kinds of operations on terms such that positions are explicit or can be made explicit. Please also have a look at he manuals at http://www.rascal-mpl.org
The main operation on terms is pattern matching using normal first order congruence, deep (higher order) match, negative match, disjunctive match, etc:
if (and(and(_, _), _) := and(and(true(),false()), false())) // pattern match operator :=
println("yes!");
and(true(), b) = b; // function definition, aka rewrite rule
and(false(), _) = false();
[ a | and(a,b) <- booleanList]; // comprehension with pattern as filter on a generator
innermost visit (t) { // explicit automated traversal with strategies
case and(a,b) => or(a,b)
}
b.leftHandSide = true(); // assign new child term to the leftHandSide field of the term assigned to the b variable (non-destructively, you get a new b)
b[0] = false(); // same but to the anonymous first child.
Then there are the normal projection operators, index on the children term[0] and child-by-name: term.myChildName if there was a many sorted term signature defined using field labels.
if you want to know at which position a sub-child is, I would perhaps write it as such:
int getPos(node t, value child) = [*pre, child, *_] := getChildren(t) ? size(pre) : -1;
but there are other ways of achieving the same.
Rascal does not have pointers to the parents of a term.

Find all possible pairs between the subsets of N sets with Erlang

I have a set S. It contains N subsets (which in turn contain some sub-subsets of various lengths):
1. [[a,b],[c,d],[*]]
2. [[c],[d],[e,f],[*]]
3. [[d,e],[f],[f,*]]
N. ...
I also have a list L of 'unique' elements that are contained in the set S:
a, b, c, d, e, f, *
I need to find all possible combinations between each sub-subset from each subset so, that each resulting combination has exactly one element from the list L, but any number of occurrences of the element [*] (it is a wildcard element).
So, the result of the needed function working with the above mentioned set S should be (not 100% accurate):
- [a,b],[c],[d,e],[f];
- [a,b],[c],[*],[d,e],[f];
- [a,b],[c],[d,e],[f],[*];
- [a,b],[c],[d,e],[f,*],[*];
So, basically I need an algorithm that does the following:
take a sub-subset from the subset 1,
add one more sub-subset from the subset 2 maintaining the list of 'unique' elements acquired so far (the check on the 'unique' list is skipped if the sub-subset contains the * element);
Repeat 2 until N is reached.
In other words, I need to generate all possible 'chains' (it is pairs, if N == 2, and triples if N==3), but each 'chain' should contain exactly one element from the list L except the wildcard element * that can occur many times in each generated chain.
I know how to do this with N == 2 (it is a simple pair generation), but I do not know how to enhance the algorithm to work with arbitrary values for N.
Maybe Stirling numbers of the second kind could help here, but I do not know how to apply them to get the desired result.
Note: The type of data structure to be used here is not important for me.
Note: This question has grown out from my previous similar question.
These are some pointers (not a complete code) that can take you to right direction probably:
I don't think you will need some advanced data structures here (make use of erlang list comprehensions). You must also explore erlang sets and lists module. Since you are dealing with sets and list of sub-sets, they seems like an ideal fit.
Here is how things with list comprehensions will get solved easily for you: [{X,Y} || X <- [[c],[d],[e,f]], Y <- [[a,b],[c,d]]]. Here i am simply generating a list of {X,Y} 2-tuples but for your use case you will have to put real logic here (including your star case)
Further note that with list comprehensions, you can use output of one generator as input of a later generator e.g. [{X,Y} || X1 <- [[c],[d],[e,f]], X <- X1, Y1 <- [[a,b],[c,d]], Y <- Y1].
Also for removing duplicates from a list of things L = ["a", "b", "a"]., you can anytime simply do sets:to_list(sets:from_list(L)).
With above tools you can easily generate all possible chains and also enforce your logic as these chains get generated.

Resources