multi dimension dynamic array allocation - gfortran

I want to use dynamic allocation for multi-block CFD code, where, the index (i,j,k) varies for different blocks. I really dont know, how to allocate arbitrary array index for n number of blocks and pass it to subroutines. I have given a sample code, which gives error message "Error: Expression at (1) must be scalar", on compilation using gfortran.
common/iteration/nb
integer, dimension (:),allocatable::nib,njb,nkb
real, dimension (:,:,:,:),allocatable::x,y,z
allocate (nib(nb),njb(nb),nkb(nb))
do l=1,nb
ni=nib(l)
nj=njb(l)
nk=nkb(l)
allocate (x(l,ni,nj,nk),y(l,ni,nj,nk),z(l,ni,nj,nk))
enddo
call gridatt (x,y,z,nib,njb,nkb)
deallocate(x,y,z,nib,njb,nkb)
end
subroutine gridatt (x,y,z,nib,njb,nkb)
common/iteration/nb
integer, dimension (nb)::nib,njb,nkb
real, dimension (nb,nib,njb,nkb)::x,y,z
do l=1,nb
read(7,*)nib(l),njb(l),nkb(l)
read(7,*)(((x(l,i,j,k),i=1,nib(l)),j=1,njb(l)),k=1,nkb(l)),
$ (((y(l,i,j,k),i=1,nib(l)),j=1,njb(l)),k=1,nkb(l)),
$ (((z(l,i,j,k),i=1,nib(l)),j=1,njb(l)),k=1,nkb(l))
enddo
return
end

The error message gfortran gives, is as good as they get. It points to nib in the line
real, dimension (nb,nib,njb,nkb)::x,y,z
nib is declared as an array. This is not allowed. (What would the size of x, y, and z be in this dimension?)
Apart from this, I don't really understand your description of what it is that you are trying to do, and the sample code you show doesn't make much sense to me.
common/iteration/nb
integer, dimension (:),allocatable::nib,njb,nkb
real, dimension (:,:,:,:),allocatable::x,y,z
allocate (nib(nb),njb(nb),nkb(nb))
When writing new code, using modules to communicate between program units is highly preferred. Old style common blocks are to be avoided.
You are trying to allocate nib, njb, and nkb with size nb. The problem is that nb hasn't been given a value yet (and won't be given one anywhere in the code).
do l=1,nb
ni=nib(l)
nj=njb(l)
nk=nkb(l)
allocate (x(l,ni,nj,nk),y(l,ni,nj,nk),z(l,ni,nj,nk))
enddo
Again the problem with nb not having a value. This loop runs for an unknown number of times. You're also using the arrays nib, njb, and nkb, which don't contain any values yet.
In each iteration of the loop x, y, and z get allocated. This will lead to a runtime error in the second iteration, because you can not allocate an already allocated variable. Even if the allocations would work, this loop would be useless, because the three arrays would be reset in each iteration and would eventually be set to the dimensions of the last allocation.
Now that I'm writing this, I'm starting to think that what it is that you are trying to do, is create so called 'jagged arrays': you want to create a block in x(1,:,:,:) that differs in size in the second, third, and/or fourth dimension from the block in x(2,:,:,:), and so on. This is simply not possible in fortran.
One way to achieve this would be to create a user defined type with an allocatable, three dimensional array component, and create an array of this type. You can then allocate the array component to the desired size for each element of the array of user defined type.
This would look something like the following (disclaimer: untested and just one possible way of achieving your goal).
type :: blocktype
real, dimension(:, :, :), allocatable :: x, y, z
end type blocktype
type(blocktype), dimension(nb) :: myblocks
You can then run a loop to allocate x, y, and z to a different size for each array element. This is assuming nb has been set to the required value, and nib, njb, and nkb contain the desired sizes for the different blocks.
do block = 1, nb
ni = nib(block)
nj = njb(block)
nk = nkb(block)
allocate(myblocks(block)%x(ni, nj, nk))
allocate(myblocks(block)%y(ni, nj, nk))
allocate(myblocks(block)%z(ni, nj, nk))
enddo
If you want to do it like this, you will definitely want to put your procedures in modules, because that way you automatically get explicit interfaces, which are required for passing around such arrays of user defined type.
One afterthought: Don't use implicit typing, not even in sample code. Always use implicit none.

Related

How to get top K elements from count-min-sketch?

I'm reading how the probabilistic data structure count-min-sketch is used in finding the top k elements in a data stream. But I cannot seem to wrap my head around the step where we maintain a heap to get our final answer.
The problem:
We have a stream of items [B, C, A, B, C, A, C, A, A, ...]. We are asked to find out the top k most frequently appearing
items.
My understanding is that, this can be done using micro-batching, in which we accumulate N items before we start doing some real work.
The hashmap+heap approach is easy enough for me to understand. We traverse the micro-batch and build a frequency map (e.g. {B:34, D: 65, C: 9, A:84, ...}) by counting the elements. Then we maintain a min-heap of size k by traversing the frequency map, adding to and evicting from the heap with each [item]:[freq] as needed. Straightforward enough and nothing fancy.
Now with CMS+heap, instead of a hashmap, we have this probabilistic lossy 2D array, which we build by traversing the micro-batch. The question is: how do we maintain our min-heap of size k given this CMS?
The CMS only contains a bunch of numbers, not the original items. Unless I also keep a set of unique elements from the micro-batch, there is no way for me to know which items I need to build my heap against at the end. But if I do, doesn't that defeat the purpose of using CMS to save memory space?
I also considered building the heap in real-time when we traverse the list. With each item coming in, we can quickly update the CMS and get the cumulative frequency of that item at that point. But the fact that this frequency number is cumulative does not help me much. For example, with the example stream above, we would get [B:1, C:1, A:1, B:2, C:2, A:2, C:3, A:3, A:4, ...]. If we use the same logic to update our min-heap, we would get incorrect answers (with duplicates).
I'm definitely missing something here. Please help me understand.
Keep a hashmap of size k, key is id, value is Item(id, count)
Keep a minheap of size k with Item
As events coming in, update the count-min 2d array, get the min, update Item in the hashmap, bubble up/bubble down the heap to recalculate the order of the Item. If heap size > k, poll min Item out and remove id from hashmap as well
Below explanation comes from a comment from this Youtube video:
We need to store the keys, but only K of them (or a bit more). Not all.
When every key comes, we do the following:
Add it to the count-min sketch.
Get key count from the count-min sketch.
Check if the current key is in the heap. If it presents in the heap, we update its count value there. If it not present in the heap, we check if heap is already full. If not full, we add this key to the heap. If heap is full, we check the minimal heap element and compare its value with the current key count value. At this point we may remove the minimal element and add the current key (if current key count > minimal element value).

How can we implement loop constructs in genetic programming?

I've been playing around with genetic programming for some time and started wondering how to implement looping constructs.
In the case of for loops I can think of 3 parameters:
start: starting value of counter
end: counter upper limit
expression: the expression to execute while counter < end
Now the tricky part is the expression because it generates the same value in every iteration unless counter is somehow injected into it. So I could allow the symbol for counter to be present in the expressions but then how do I prevent it from appearing outside of for loops?
Another problem is using the result of the expression. I could have a for loop which sums the results, another one that multiplies them together but that's limiting and doesn't seem right. I would like a general solution, not one for every operator.
So does anyone know a good method to implement loops in genetic programming?
Well, that's tricky. Genetic programming (the original Koza-style GP) is best suited for functional-style programming, i.e. there is no internal execution state and every node is a function that returns (and maybe takes) values, like lisp. That is a problem when the node is some loop - it is not clear what the node should return.
You could also design your loop node as a binary node. One parameter is a boolean expression that will be called before every loop and if true is returned, the loop will be executed. The second parameter would be the loop expression.
The problem you already mentioned, that there is no way of changing the loop expression. You can solve this by introducing a concept of some internal state or variables. But that leaves you with another problems like the need to define the number of variables. A variable can be realized e.g. by a tuple of functions - a setter (one argument, no return value, or it can return the argument) and getter (no arguments, returns the value of the variable).
Regarding the way of handling the loop result processing, you could step from GP to strongly typed GP or STGP for short. It is essentialy a GP with types. Your loop could then be effectively a function that returns a list of values (e.g. numbers) and you could have other functions that take such lists and calculate other values...
There is another GP algorithm (my favourite), called Grammatical Evolution (or GE) which uses context-free grammar to generate the programs. It can be used to encode type information like in STGP. You could also define the grammar in a way that classical c-like for and while loops can be generated. Some extensions to it, like Dynamically Defined Functions, could be used to implement variables dynamically.
If there is anything unclear, just comment on the answer and I'll try to clarify it.
The issue with zegkjan's answer is that there is more than one way to skin a cat.
Theres actually a simpler, and at times, better solution to creating GP datastructures than koza trees, instead using stacks.
This method is called Stack Based Genetic Programming, which is quite old (1993). This method of genetic programming removes trees entirely, you have a instruction list, and a data stack (where your functional and terminal set remain the same). You iterate through your instruction list, pushing values to the data stack, and pulling values to consume them, and returning a new value/values to the stack given your instruction. For example, consider the following genetic program.
0: PUSH TERMINAL X
1: PUSH TERMINAL X
2: MULTIPLY (A,B)
Iterating through this program will give you:
step1: DATASTACK X
step2: DATASTACK X, X
step3: DATASTACK X^2
Once you have executed all program list statements, you then just take off the number of elements from the stack that you care about (you can get multiple values out of the GP program this way). This ends up being a fast and extremely flexible method (memory locality, number of parameters doesn't matter, nor number of elements returned) you can implement fairly quickly.
To loop in this method, you can create a separate stack, an execution stack, where new special operators are used to push and pop multiple statements from the execution stack at once to be executed afterwards.
Additionally you can simply include a jump statement to move backwards in your program list, make a loop statement, or have a separate stack holding loop information. With this a genetic program could theoretically develop its own for loop.
0: PUSH TERMINAL X
1: START_LOOP 2
2: PUSH TERMINAL X
3: MULTIPLY (A, B)
4: DECREMENT_LOOP_NOT_ZERO
step1: DATASTACK X
LOOPSTACK
step2: DATASTACK X
LOOPSTACK [1,2]
step3: DATASTACK X, X
LOOPSTACK [1,2]
step4: DATASTACK X^2
LOOPSTACK [1,2]
step5: DATASTACK X^2
LOOPSTACK [1,1]
step6: DATASTACK X^2, X
LOOPSTACK [1,1]
step7: DATASTACK X^3
LOOPSTACK [1,1]
step8: DATASTACK X^3
LOOPSTACK [1,0]
Note however, with any method, it may be difficult for a GP program to actually evolve a member that has a loop, and even if it does, its likely that such a mechanism would result in a poor fitness evaluation at the start, likely removing it from the population any way. To fix this type of problem (potentially good innovations dying early due to poor early fitness), you'll need to include the concepts of demes in your genetic program to isolate genetically disparate populations.

Does tuple_size count the elements in the tuple like lists:length()?

Is there ever a reason where I would do something like:
{foobar, NumberOfElementsInFoobarTuple, ...}
I would imagine the internal datastructure for tuple_size knows it's size already, like the binary type. With binaries it's still beneficial to keep track of byte size as a variable instead of calling byte_size(). It doesn't make sense but that's how it is. I'm creating this tuple through list_to_tuple so it may have varying size.
A tuple is one fixed size block compared to a list which is a linked list of list cells each of which has a head and a tail. So a tuple explicitly contains its size and tuple_size(Tuple) just directly returns this value.

Splitting and runtime of log n

Sorry, I made a mistake in my earlier question. Because of that I didn't get the answer I wanted.
The teacher told us that every time you divide something by 2, the run-time is likely to be log n. For instance, if we divide an array into two, each time we traverse one of the array, the run-time would be log n. However, we may run into a case with LinkedList where we may be easily misled. For instance, we may have an algorithm to set the nth element of the list to something else by starting from either the head or the tail in order to have a run-time of less than n. Logically, we may think that the run time would be log n, but it's not. Why is that? And how do you determine that?
Do we need to absolutely have splitting to get a run-time of log n? I don't think it makes any logical sense to say the run-time of n when the maximum run-time of the loop is n/2.
I think some concepts need a bit of refining here, because the time complexity is only related to algorithm, not to the size of the data structure you're operating on.
The teacher told us that every time you divide something by 2, the run-time is likely to be log n. For instance, if we divide an array into two, each time we traverse one of the array, the run-time would be log n.
Now, traversing an array, like
for (int i = 0; i < array.size; i++) {
variable = array[i];
}
runs in O(n): the time needed to perform such an operation varies linearly with the size of the array. You will have O(log n) for operations like a binary search on an array, but you cannot generalize this concept to all array operations, and especially not to those who need to iterate over the array.
Now, this sentence
For instance, we may have an algorithm to set the nth element of the list to something else by starting from either the head or the tail in order to have a run-time of less than n.
leads me to believe that you think that the n as used in big O and what you call the "nth element" are directly related. They aren't. On a linked list your only option to go to element n is to go to the start of the list and follow the links down the element you're looking for (or in the case of a double linked list, go to the start or end depending on the position of the element you're looking for), so this operation has a time complexity of O(n), ie linearly related to the length of the collection.

Does erlang implement record copy-and-modify in any clever way?

given:
-record(foo, {a, b, c}).
I do something like this:
Thing = #foo{a={1,2}, b={3,4}, c={5,6}},
Thing1 = Thing#foo{a={7,8}}.
From a semantic view, Thing and Thing1 are unique entities. However, from a language implementation standpoint, making a full copy of Thing to generate Thing1 would be intensely wasteful. For example, if the record were a megabyte in size and I made a thousand "copies," each modifying a couple of bytes, I've just burned a gigabyte. If the internal structure kept track of a representation of the parent structure and each derivative marked up that parent in a way that indicated its own change but preserved everyone elses' versions, the derivates could be created with a minimum of memory overhead.
My question is this: is erlang doing anything clever - internally - to keep the overhead of the usual erlang scribble;
Thing = #ridiculously_large_record,
Thing1 = make_modified_copy(Thing),
Thing2 = make_modified_copy(Thing1),
Thing3 = make_modified_copy(Thing2),
Thing4 = make_modified_copy(Thing3),
Thing5 = make_modified_copy(Thing4)
...to a minimum?
I ask because there would be a number of changes to the way that I did cross-process communications if this were the case.
The exact workings of the garbage collection and memory allocation is only known to a few. Thankfully, they are very happy to share their knowledge and the following is based on what I have learnt from the erlang-questions mailing list and by discussing with OTP developers.
When messaging between processes, the content is always copied as there is no shared heap between processes. The only exception is binaries bigger than 64 bytes, where only a reference is copied.
When executing code in one process, only parts are updated. Let's analyze tuples, as that is the example you provided.
A tuple is actually a structure that keeps references to the actual data somewhere on the heap (except for small integers and maybe one more data type which I can't remember). When you update a tuple, using for example setelement/3, a new tuple is created with the given element replaced, however for all other elements only the reference is copied. There is one exception which I have never been able to take advantage of.
The garbage collector keeps track of each tuple and understands when it is safe to reclaim any tuple that is no longer used. It might be that the data referenced by the tuple is still in use, in which case the data itself is not collected.
As always, Erlang gives you some tools to understand exactly what is going on. The efficiency guide details how to use erts_debug:size/1 and erts_debug:flat_size/1 to understand the size of the data structure when used internally in a process and when copied. The trace tools also allows you to understand when, what and how much was garbage collected.
The record foo is of arity four (holding four words), but the whole structure is 14 words in size. Any immediate (pids, ports, small integers, atoms, catch and nil) can be stored directly in the tuples array. Any other term which can't fit into a word, such as other tuples, are not stored directly but referenced by boxed pointers (a boxed pointer is an erlang term with a forwarding address to the real eterm ... just internals).
In your case a new tuple of same arity is created and the atom foo and all the pointers are copied from the previous tuple except for index two, a, which points to the new tuple {7,8} which constitutes 3 words. In all 5 + 3 new words are created on the heap and only 3 words are copied from the old tuple the other 9 words are not touched.
Excessively large tuples are not recommended. When updating a tuple, the whole tuple, i.e the array and not the deep content, needs to copied and then updated in other to preserve a persistent data structure. This will also generate increased garbage, forcing the garbage collector to heat up which also hurts performance. The dict and array modules avoids using large tuples for this reason and have a shallow tuple tree instead.
I can definitely verify what people have already pointed out:
a record is just a tuple with the record name as the first element and all the fields just the following tuple element
when an element of a tuple is changed, updating a field in a record in your case, only the top level tuple is new, all the elements are just reused
This works just because we have immutable data. So in your example each time you update a value in a #foo record none of the data in the elements are copied and only a new 4-element tuple (5 words) is created. Erlang will never does a deep copy in this type of operation or when passing arguments in function calls.
In conclusion:
Thing = #foo{a={1,2}, b={3,4}, c={5,6}},
Thing1 = Thing#foo{a={7,8}}.
Here, if Thing is not used again, it will probably be updated in place and copying of the tuple will be avoided, as the Efficiency Guide says. (tuple and record syntax is complied into something like setelement, I think)
Thing = #ridiculously_large_record,
Thing1 = make_modified_copy(Thing),
Thing2 = make_modified_copy(Thing1),
...
Here the tuples are actually copied every time.
I guess that it would be theoretically possible make an interesting optimization to this. If the compiler could perform escape analysis on the return value of make_modified_copy and detect that the only reference to it is the one returned, in could save this information about the function. When it encounter a call the that function it would know that it is safe to modify the return value in place.
This would only be possible to do on inter module calls, because of the code replace feature.
Maybe one day we will have it.

Resources