Sequential, cumulative operations on Rascal trees - rascal

Given immutability of data in Rascal, what is the preferred method for these, if later operations depend on the results of earlier ones?
For example, assigning annotation values to every node in a tree, where values of higher nodes depend on values of lower ones. If you write a single visit statement with multiple cases, then insertion statements at lower levels don't change the tree, so higher level operations may have nothing on which to operate. On the other hand, surrounding each case statement with a visit statement -- and rebinding your tree variable to the new tree after every visit -- is cumbersome and, worse, seems to make the results depend on the order of the statements.

Subtle question :-) The visit statement has subtle semantics, especially if you look at it from an OO perspective. While it is actively traversing a value it is also building a new one. Depending on the strategy (order) of the traversal, successive matches in its case statements "see" different things.
It is sometimes insightful to imagine visit as a code generator which generates a set of mutually recursive functions that take the visited part as an argument and return a new value. The body of the visit (the cases) turns into a switch statement, which is the core of these generated functions. Then, depending on the traversal order, the recursive steps are positioned either before the switch statement (bottom-up) or after it (top-down):
// bottom-up visit, sketched as one of the generated functions
value f(value x) {
  newChildren = map(f, children of x);  // the recursive steps come first (bottom-up)
  x = newX(x, newChildren);             // rebuild x from the new children
  switch (x) {                          // the visit's cases
    case y : return whatever(y);
  }
  return x;                             // no case matched: keep the rebuilt node
}
// a top-down visit would run the switch first and do the recursive steps afterwards
Hence the code in the switch cases of a bottom-up visit sees the trees produced by the recursive calls (although no tree has actually been updated in place).
Here's an example:
rascal>data T = tee(T, T) | i(int i);
ok
rascal>t = tee(tee(i(1),i(2)),tee(i(3),i(4)));
T: tee(
  tee(
    i(1),
    i(2)),
  tee(
    i(3),
    i(4)))
rascal>visit(t) {
>>>>>>> case i(x) => i(x+1)
>>>>>>> case tee(i(x),y) => tee(i(x+1),y)
>>>>>>>}
T: tee(
  tee(
    i(3), // here you see how 1 turned into 3 by incrementing twice
    i(3)), // increment happened once here
  tee(
    i(5), // increment happened twice here too
    i(5))) // increment happened once here
You can see that some nodes were incremented twice, because the second case matched on a tee after its child i node had already been visited and replaced by a different, already incremented i node.
Experimenting with the other strategies will give you other results; see http://tutor.rascal-mpl.org/Rascal/Rascal.html#/Rascal/Expressions/Visit/Visit.html . Note that the variables in the scope of the visit statement are shared by all levels of the recursion, which gives you the power to emulate zipper-like behavior (you can always store the previously visited node in a temporary).
As an aside, the language design tries to avoid the need for more involved "functional programming design patterns" such as zippers, because they complicate the type system and the way people interact with it. To make such things type-check in a visit that recurses over a heterogeneous data type, you would need a PhD in polymorphism to understand when it is type correct. Secretly, the visit statement simulates a set of built-in type-safe rank-2 higher-order polymorphic functions, but that's all under the hood.

Related

Guarantee Print Order After Parallelism

I have X cores doing unique work in parallel; however, their output needs to be printed in order.
Object {
    Data data
    int order
}
I've tried putting the objects in a min heap after they're done with their parallel work; however, even that is too much of a bottleneck.
Is there any way I could have work done in parallel and guarantee the print order? Is there a known term for my problem? Have others encountered it before?
Is there any way I could have work done in parallel and guarantee the print order?
Needless to say, we design parallelized routines with a focus on efficiency, not on constraining the order of the calculations; the ordering should be imposed only when the results are printed at the end, once everything is done. In fact, parallel routines often do their calculations conspicuously out of order (e.g., striding on each thread) to minimize thread and synchronization overhead.
The only question is how you structure the results to allow efficient storage and efficient, ordered retrieval. I often just use a mutable buffer or a pre-populated array. It’s very efficient in terms of both storage and retrieval. Or you can use a dictionary, too. It depends upon the nature of your Data. But I’d avoid the order property pattern in your result Object.
Just make sure you’re using optimized build if using standard Swift collections, as this can have a material impact on performance.
Q : Is there a known term for my problem?
Yes, there is. A contradiction:
Definition of contradiction …
2a: a proposition, statement, or phrase that asserts or implies both the truth and falsity of something // "… both parts of a contradiction cannot possibly be true …" — Thomas Hobbes
2b: a statement or phrase whose parts contradict each other // "a round square is a contradiction in terms"
3a: logical incongruity
3b: a situation in which inherent factors, actions, or propositions are inconsistent or contrary to one another
(source: Merriam-Webster)
Computer science, having borrowed the terms { PARALLEL | SERIAL | CONCURRENT } from the theory of systems, respects the distinctive ( and never overlapping ) properties of each such class of operations, where:
[PARALLEL] orchestration of units-of-work implies that any and every work-unit a) starts, b) gets executed, and c) gets finished at the same time, i.e. all enter and leave the [PARALLEL]-section at once and are elaborated at the very same time, not otherwise.
[SERIAL] orchestration of units-of-work implies that all work-units are processed in one static, known, particular order, each (known) next one starting only after the previous one has finished its work, i.e. one after another, not otherwise.
[CONCURRENT] orchestration of units-of-work permits more than one unit-of-work to be started, if resources and system conditions permit (scheduler priorities obeyed), resulting in an unknown order of execution and an unknown time of completion, as both depend on unknown externalities (system conditions and the (non-)availability of the resources a particular work-unit will need).
Whereas there is an a-priori known, inherently embedded sense of ORDER in [SERIAL]-type processing (it was already pre-wired into the orchestration code of the units-of-work), there is no such meaning in [CONCURRENT] processing, where opportunistic scheduling makes any wished-for order a non-deterministic outcome of the system state, skewed by the coincidence of all the other externalities; and in true [PARALLEL] processing the wished-for order is, by definition, a single trivial value, since all units-of-work start, execute, and finish at the same time, so each of them has no other choice than to be both first and last at once.
Q : Is there any way I could have work done in parallel and guarantee the print order?
No, not unless you intentionally (or unknowingly) violate the [PARALLEL] orchestration rules and re-introduce a re-[SERIAL]-ising logic into the work-units, so as to imperatively enforce a wished-for ordering that is neither known to, nor natural for, the originally [PARALLEL] orchestration of the work-units (a common practice in Python, where the GIL imposes exactly such monopolist, one-step-at-a-time stepping).
Q : Have others encountered it before?
Yes. Since 2011, every semester this or a similar question has reappeared here on Stack Overflow, in growing numbers every year.

Why is splitting a Rust's std::collections::LinkedList O(n)?

The .split_off method on std::collections::LinkedList is described as having O(n) time complexity. From the docs:
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
Why not O(1)?
I know that linked lists are not trivial in Rust. There are several resources going into the hows and whys, like this book and this article among several others, but I haven't had the chance to dive into those or the standard library's source code yet.
Is there a concise explanation of the extra work needed when splitting a linked list in (safe) Rust?
Is this the only way? And if not, why was this implementation chosen?
The method LinkedList::split_off(&mut self, at: usize) first has to traverse the list from the start (or the end) to the position at, which takes O(min(at, n - at)) time. The actual split-off is a constant-time operation (as you said). And since this min() expression is confusing, we can legally replace it with the larger bound n. Thus: O(n).
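As a quick illustration with the actual standard-library API (a small usage sketch, not a benchmark):
use std::collections::LinkedList;

fn main() {
    let mut list: LinkedList<i32> = (0..10).collect();
    // Walks min(at, n - at) links to reach index 4, then detaches the tail in O(1).
    let tail = list.split_off(4);
    assert_eq!(list.into_iter().collect::<Vec<_>>(), vec![0, 1, 2, 3]);
    assert_eq!(tail.into_iter().collect::<Vec<_>>(), vec![4, 5, 6, 7, 8, 9]);
}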
Why was the method designed like that? The problem goes deeper than this particular method: most of the LinkedList API in the standard library is not really useful.
Due to its cache unfriendliness, a linked list is often a bad choice to store sequential data. But linked lists have a few nice properties which make them the best data structure for a few, rare situations. These nice properties include:
Inserting an element in the middle in O(1), if you already have a pointer to that position
Removing an element from the middle in O(1), if you already have a pointer to that position
Splitting the list into two lists at an arbitrary position in O(1), if you already have a pointer to that position
Notice anything? The linked list is designed for situations where you already have a pointer to the position that you want to do stuff at.
Rust's LinkedList, like many others, just stores pointers to the start and the end. To have a pointer to an element inside the linked list, you need something like an Iterator; in our case, that's IterMut. An iterator over a collection can function like a pointer to a specific element and can be advanced carefully (i.e. not with a for loop). And in fact, there is IterMut::insert_next which allows you to insert an element in the middle of the list in O(1). Hurray!
But this method is unstable. And methods to remove the current element or to split the list off at that position are missing. Why? Because of the vicious circle that is:
LinkedList lacks almost all features that make linked lists useful at all
Thus (nearly) everyone recommends not to use it
Thus (nearly) no one uses LinkedList
Thus (nearly) no one cares about improving it
Goto 1
Please note that there are a few brave souls occasionally trying to improve the situation. There is the tracking issue about insert_next, where people argue that Iterator might be the wrong concept to perform these O(1) operations and that we want something like a "cursor" instead. And here someone suggested a bunch of methods to be added to IterMut (including cut!).
Now someone just has to write a nice RFC and someone needs to implement it. Maybe then LinkedList won't be nearly useless anymore.
Edit 2018-10-25: someone did write an RFC. Let's hope for the best!
Edit 2019-02-21: the RFC was accepted! Tracking issue.
Maybe I'm misunderstanding your question, but in a linked list, the links of each node have to be followed to proceed to the next node. If you want to get to the third node, you start at the first, follow its link to the second, then finally arrive at the third.
This traversal's complexity is proportional to the target node index n, because n nodes have to be processed/traversed, so it is a linear O(n) operation, not a constant-time O(1) operation. The part where the list is "split off" is of course constant time, but the overall split operation's complexity is dominated by the O(n) cost of getting to the split-off node before the split can even be made.
One way in which it could be O(1) would be if a pointer already existed to the node at which the list is to be split off, but that is different from specifying a target node index. Alternatively, an index structure could be kept, mapping each node index to the corresponding node pointer, but that would cost extra space and processing overhead to keep it in sync with list operations.
pub fn split_off(&mut self, at: usize) -> LinkedList<T>
Splits the list into two at the given index. Returns everything after the given index, including the index.
This operation should compute in O(n) time.
The documentation is either:
unclear, if n is supposed to be the index,
pessimistic, if n is supposed to be the length of the list (the usual meaning).
The proper complexity, as can be seen in the implementation, is O(min(at, n - at)) (whichever is smaller). Since at must be smaller than n, the documentation is correct that O(n) bounds the complexity (the bound is reached for at = n / 2); however, such a loose bound is unhelpful.
That is, the fact that list.split_off(5) takes the same time whether list.len() is 10 or 1,000,000 is quite important!
As to why this complexity: it is an inherent consequence of the structure of a doubly linked list. There is no O(1) indexing operation in a linked list, after all. The operation implemented in C, C++, C#, D, F#, ... would have the exact same complexity.
Note: I encourage you to write a pseudo-code implementation of a linked list with the split_off operation; you'll realize this is the best you can get without altering the data structure to be something else.
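For the curious, here is a rough sketch of that exercise: a toy Box-based singly linked list (not how std implements LinkedList, which is a doubly linked list built on raw pointers) with a split_off. The recursion is just to keep the sketch simple; the point is that reaching index at costs at link hops, while the split itself is a single pointer move.
// Toy singly linked list, only to show where the cost of split_off goes.
struct Node<T> {
    value: T,
    next: Option<Box<Node<T>>>,
}

pub struct List<T> {
    head: Option<Box<Node<T>>>,
}

impl<T> List<T> {
    pub fn new() -> Self {
        List { head: None }
    }

    pub fn push_front(&mut self, value: T) {
        self.head = Some(Box::new(Node { value, next: self.head.take() }));
    }

    /// Returns everything from index `at` onward, leaving the first `at`
    /// elements in `self`.
    pub fn split_off(&mut self, at: usize) -> List<T> {
        // O(at): hop down `at` links to find the split point...
        fn walk<U>(link: &mut Option<Box<Node<U>>>, at: usize) -> Option<Box<Node<U>>> {
            if at == 0 {
                // ...O(1): the split itself is a single pointer move.
                link.take()
            } else {
                let node = link.as_mut().expect("index out of bounds");
                walk(&mut node.next, at - 1)
            }
        }
        List { head: walk(&mut self.head, at) }
    }
}

fn main() {
    let mut list = List::new();
    for v in (0..5).rev() {
        list.push_front(v); // list now holds 0, 1, 2, 3, 4
    }
    let tail = list.split_off(2); // walks 2 links, then detaches 2, 3, 4
    assert_eq!(tail.head.as_ref().map(|n| n.value), Some(2));
    assert!(list.head.is_some());
}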

How can we implement loop constructs in genetic programming?

I've been playing around with genetic programming for some time and started wondering how to implement looping constructs.
In the case of for loops I can think of 3 parameters:
start: starting value of counter
end: counter upper limit
expression: the expression to execute while counter < end
Now the tricky part is the expression because it generates the same value in every iteration unless counter is somehow injected into it. So I could allow the symbol for counter to be present in the expressions but then how do I prevent it from appearing outside of for loops?
Another problem is using the result of the expression. I could have a for loop which sums the results, another one that multiplies them together but that's limiting and doesn't seem right. I would like a general solution, not one for every operator.
So does anyone know a good method to implement loops in genetic programming?
Well, that's tricky. Genetic programming (the original Koza-style GP) is best suited to functional-style programming, i.e. there is no internal execution state and every node is a function that returns (and maybe takes) values, like Lisp. That is a problem when the node is a loop: it is not clear what such a node should return.
You could also design your loop node as a binary node. One parameter is a boolean expression that is evaluated before every iteration; if it returns true, the loop body is executed. The second parameter would be the loop expression itself.
The problem, as you already mentioned, is that there is then no way for the loop expression to change between iterations. You can solve this by introducing a concept of internal state, or variables. But that leaves you with other problems, like the need to define the number of variables. A variable can be realized e.g. by a tuple of functions: a setter (one argument, no return value, or it can return its argument) and a getter (no arguments, returns the value of the variable).
Regarding the way the loop result is processed, you could step from GP to strongly typed GP, or STGP for short. It is essentially GP with types. Your loop could then effectively be a function that returns a list of values (e.g. numbers), and you could have other functions that take such lists and calculate other values...
There is another GP algorithm (my favourite), called Grammatical Evolution (or GE), which uses a context-free grammar to generate the programs. It can be used to encode type information like in STGP. You could also define the grammar in a way that classical C-like for and while loops can be generated. Some extensions to it, like Dynamically Defined Functions, could be used to implement variables dynamically.
If there is anything unclear, just comment on the answer and I'll try to clarify it.
The issue with zegkjan's answer is that there is more than one way to skin a cat.
There's actually a simpler, and at times better, solution for creating GP data structures than Koza trees: using stacks instead.
This method is called Stack-Based Genetic Programming, and it is quite old (1993). It removes trees entirely: you have an instruction list and a data stack (while your function and terminal sets remain the same). You iterate through the instruction list, pushing values onto the data stack, popping values to consume them, and pushing the new value(s) your instruction produces back onto the stack. For example, consider the following genetic program.
0: PUSH TERMINAL X
1: PUSH TERMINAL X
2: MULTIPLY (A,B)
Iterating through this program will give you:
step1: DATASTACK X
step2: DATASTACK X, X
step3: DATASTACK X^2
Once you have executed all of the program's instructions, you just take off the stack however many elements you care about (you can get multiple values out of a GP program this way). This ends up being a fast and extremely flexible method (good memory locality, and neither the number of parameters nor the number of returned elements matters) that you can implement fairly quickly. A minimal sketch of such an evaluator is shown below.
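Here is that sketch in Rust. The instruction set is made up for the example (just a terminal X and a MULTIPLY function), and the choice to silently skip instructions that would underflow the data stack is one common convention, not the only one.
// A minimal sketch of stack-based GP evaluation.
#[derive(Clone, Copy)]
enum Instr {
    PushX,    // push the terminal X onto the data stack
    Multiply, // pop two values, push their product
}

fn eval(program: &[Instr], x: f64) -> Option<f64> {
    let mut data: Vec<f64> = Vec::new();
    for instr in program {
        match instr {
            Instr::PushX => data.push(x),
            Instr::Multiply => {
                // Skip instructions that would underflow the data stack.
                if data.len() >= 2 {
                    let b = data.pop().unwrap();
                    let a = data.pop().unwrap();
                    data.push(a * b);
                }
            }
        }
    }
    // Take however many values you care about off the top of the stack.
    data.pop()
}

fn main() {
    // 0: PUSH TERMINAL X, 1: PUSH TERMINAL X, 2: MULTIPLY  ->  X^2
    let program = [Instr::PushX, Instr::PushX, Instr::Multiply];
    assert_eq!(eval(&program, 3.0), Some(9.0));
}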
To loop in this method, you can create a separate stack, an execution stack, where new special operators push and pop multiple statements from the execution stack at once, to be executed afterwards.
Additionally, you can simply include a jump statement that moves backwards in your program list, add a dedicated loop statement, or keep a separate stack holding loop information. With this, a genetic program could theoretically develop its own for loop. For example:
0: PUSH TERMINAL X
1: START_LOOP 2
2: PUSH TERMINAL X
3: MULTIPLY (A, B)
4: DECREMENT_LOOP_NOT_ZERO
step1: DATASTACK X
LOOPSTACK
step2: DATASTACK X
LOOPSTACK [1,2]
step3: DATASTACK X, X
LOOPSTACK [1,2]
step4: DATASTACK X^2
LOOPSTACK [1,2]
step5: DATASTACK X^2
LOOPSTACK [1,1]
step6: DATASTACK X^2, X
LOOPSTACK [1,1]
step7: DATASTACK X^3
LOOPSTACK [1,1]
step8: DATASTACK X^3
LOOPSTACK [1,0]
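A sketch of how such a loop stack could be interpreted, again in Rust, with hypothetical opcodes that roughly mirror START_LOOP and DECREMENT_LOOP_NOT_ZERO above (here the loop stack holds pairs of loop-body start index and remaining iteration count):
#[derive(Clone, Copy)]
enum Instr {
    PushX,
    Multiply,
    StartLoop(u32), // push (body start index, iteration count) onto the loop stack
    DecLoopNotZero, // decrement the count; jump back to the body while it is non-zero
}

fn eval(program: &[Instr], x: f64) -> Option<f64> {
    let mut data: Vec<f64> = Vec::new();
    let mut loops: Vec<(usize, u32)> = Vec::new();
    let mut pc = 0;
    while pc < program.len() {
        match program[pc] {
            Instr::PushX => data.push(x),
            Instr::Multiply => {
                if data.len() >= 2 {
                    let b = data.pop().unwrap();
                    let a = data.pop().unwrap();
                    data.push(a * b);
                }
            }
            Instr::StartLoop(count) => loops.push((pc + 1, count)),
            Instr::DecLoopNotZero => {
                if let Some(top) = loops.last_mut() {
                    top.1 = top.1.saturating_sub(1);
                    if top.1 > 0 {
                        pc = top.0; // jump back to the loop body
                        continue;   // skip the pc increment below
                    }
                    loops.pop();    // count reached zero: fall through
                }
            }
        }
        pc += 1;
    }
    data.pop()
}

fn main() {
    // The five-instruction program traced above: computes X^3.
    let program = [
        Instr::PushX,
        Instr::StartLoop(2),
        Instr::PushX,
        Instr::Multiply,
        Instr::DecLoopNotZero,
    ];
    assert_eq!(eval(&program, 2.0), Some(8.0));
}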
Note, however, that with any method it may be difficult for a GP program to actually evolve a member that contains a loop, and even if it does, it's likely that such a mechanism would produce a poor fitness evaluation at the start, likely removing it from the population anyway. To fix this type of problem (potentially good innovations dying early due to poor early fitness), you'll need to include the concept of demes in your genetic program, to isolate genetically disparate populations.

Does using single-case discriminated union types have implications on performance?

It is nice to have a wrapper for every primitive value, so that there is no way to misuse it. I suspect this convenience comes at a price. Is there a performance drop? Should I rather use bare primitive values instead if the performance is a concern?
Yes, there's going to be a performance drop when using single-case union types to wrap primitive values. Union cases are compiled into classes, so you'll pay the price of allocating (and later, collecting) the class and you'll also have an additional indirection each time you fetch the value held inside the union case.
Depending on the specifics of your application, and how often you'll incur these additional overheads, it may still be worth doing if it makes your code safer and more modular.
I've written a lot of performance-sensitive code in F#, and my personal preference is to use F# unit-of-measure types whenever possible to "tag" primitive types (e.g., ints). This keeps them from being misused (thanks to the F# compiler's type checker) but also avoids any additional run-time overhead, since the measure types are erased when the code is compiled. If you want some examples of this, I've used this design pattern extensively in my fsharp-tools projects.
Jack has much more experience with writing high-performance F# code than I do, so I think his answer is absolutely right (I also think the idea to use units of measure is pretty interesting).
Just to put things in context, I wrote a really basic test (using just F# Interactive, so things may differ in Release mode) to compare the performance. It allocates an array of wrapped (vs. non-wrapped) int values. This is probably the scenario where non-wrapped types are really a good choice, because the array will be just a contiguous block of memory.
#time
// Using a simple wrapped `int` type
type Foo = Foo of int
let foos = Array.init 1000000 Foo
// Add all 'foos' 1k times and ignore the results
for i in 0 .. 1000 do
    let mutable total = 0
    for Foo f in foos do total <- total + f
On my machine, the for loop takes on average something around 1050ms. Now, the unwrapped version:
let bars = Array.init 1000000 id
for i in 0 .. 1000 do
    let mutable total = 0
    for b in bars do total <- total + b
On my machine, this takes about 700ms.
So, there is certainly some performance penalty, but perhaps smaller than one would expect (some 33%). And this is a test that does virtually nothing but unwrap the values in a loop; in code that does something useful, the overhead would be a lot smaller.
This may be an issue if you're writing high-performance code, something that will process lots of data, or something that takes some time and that users will run frequently (like compilers and tools). On the other hand, if your application is not performance critical, then this is not likely to be a problem.
From F# 4.1 onwards, adding the [<Struct>] attribute to suitable single-case discriminated unions will increase performance and reduce the number of memory allocations performed.

Splitting and runtime of log n

Sorry, I made a mistake in my earlier question. Because of that I didn't get the answer I wanted.
The teacher told us that every time you divide something by 2, the run-time is likely to be log n. For instance, if we divide an array into two, each time we traverse one of the array, the run-time would be log n. However, we may run into a case with LinkedList where we may be easily misled. For instance, we may have an algorithm to set the nth element of the list to something else by starting from either the head or the tail in order to have a run-time of less than n. Logically, we may think that the run time would be log n, but it's not. Why is that? And how do you determine that?
Do we need to absolutely have splitting to get a run-time of log n? I don't think it makes any logical sense to say the run-time of n when the maximum run-time of the loop is n/2.
I think some concepts need a bit of refining here, because time complexity is a property of an algorithm, not of the data structure you're operating on.
The teacher told us that every time you divide something by 2, the run-time is likely to be log n. For instance, if we divide an array into two, each time we traverse one of the array, the run-time would be log n.
Now, traversing an array, like
for (int i = 0; i < array.size; i++) {
    variable = array[i];
}
runs in O(n): the time needed to perform such an operation varies linearly with the size of the array. You will get O(log n) for operations like a binary search on a sorted array, but you cannot generalize this to all array operations, and especially not to those that need to iterate over the whole array.
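For contrast, here is a quick sketch (in Rust, since the snippet above is language-agnostic): the linear scan touches every element, while the binary search discards half of the remaining range on each step, which is where the log n comes from. The log n only applies to algorithms that actually get to throw away half of the input per step.
use std::cmp::Ordering;

// O(n): every element is visited once.
fn linear_sum(array: &[i64]) -> i64 {
    let mut total = 0;
    for &value in array {
        total += value;
    }
    total
}

// O(log n): each iteration halves the remaining range (requires a sorted array).
fn binary_search(array: &[i64], target: i64) -> Option<usize> {
    let (mut lo, mut hi) = (0, array.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match array[mid].cmp(&target) {
            Ordering::Equal => return Some(mid),
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
        }
    }
    None
}

fn main() {
    let array: Vec<i64> = (0..16).collect();
    assert_eq!(linear_sum(&array), 120);             // 16 iterations
    assert_eq!(binary_search(&array, 11), Some(11)); // 4 probes here
}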
Now, this sentence
For instance, we may have an algorithm to set the nth element of the list to something else by starting from either the head or the tail in order to have a run-time of less than n.
leads me to believe that you think that the n used in big O and what you call the "nth element" are directly related. They aren't. On a linked list, your only option for reaching element n is to go to the start of the list and follow the links down to the element you're looking for (or, in the case of a doubly linked list, to start from whichever of the head or the tail is closer to it), so this operation has a time complexity of O(n), i.e. linearly related to the length of the collection.

Resources