Contiguous memory in Forth arrays? - forth

I know that variable test 5 cells allot is not guaranteed to allocate a contiguous block of memory, while create test 1 , 2 , 3 , 4 , 5 , will definitely create a contiguous block of memory.
variable is defined as : variable create 0 , ;
Is alloting more cells to the variable not guaranteed to extend the block of memory contiguously because create can only be called once per word?
Example:
create test 1 , 2 ,
test 3 , 4 , 5 , <<<< This won't necessarily extend the array contiguously, correct?
Are my assumptions correct?

The wording in the standard gives VARIABLE and CREATE freedom to put the data in different memory regions. If they do, obviously CREATE or ALLOT can't extend the region created by VARIABLE.
CREATE can be called many times from any word.
Your example may not quite do what you think. The second line calls test, leaving its address on the stack. Then it lays down three cells which do extend the region allocated for test.
Your assumption about the definition of VARIABLE is not correct for all implementations.

Related

Optimized machine learning technique

Question: I'm looking for a technique that I can use to reduce the number of iterations my application has to perform to find the optimal variable combination out of all possible variable combinations without testing every variable combination.
Current situation: I have a list of variables and each variable has a valid list of values. At the moment I'm creating a cartesian product of the list of valid variable values and I run logic across each possible variable combination. This means I'm wanting to run 2 000 000 different iterations and this takes a lot of time. I'm not interested in how to more efficiently run 2 000 000 different variable combinations but, instead after a technique I could use to hone in on an optimal variable combination without running through all the combinations.
Example: lets say I've got 3 variables named "one", "two" & "three". Each variable can be any value between 1 and 2. This means I have 2 to the power of 3 or a 8 different variable combinations. My list of possible variable combinations would look something like:
[
[one:1,two:1,three:1],
[one:1,two:1,three:2],
[one:1,two:2,three:1],
[one:1,two:2,three:2],
[one:2,two:1,three:1],
[one:2,two:1,three:2],
[one:2,two:2,three:1],
[one:2,two:2,three:2]
]
I would then run logic against each possible variable combination and this gives me the result of that variable combination. The end result being that I know which variable combination gives me the best result. This works great across smaller variable sets but takes days across larger sets.

Does number of global variables affect performance?

I wonder if having too many global variables in Lua would slow down when accessing a global variable.
For example, if my program has 10,000 global variables, would this be slower to call a global function A than calling the function in a program that has only 100 global variables?
How does Lua find a registered global variable? Does it use some kind of hash map?
Pretty much everything that stores data in Lua is a table of some form (local variables being the notable exception). This includes globals. So global accesses are table accesses, which are implemented as hash tables.
Access to a hash entry is amortized O(1), and therefore does not vary based on the number of entries in the table. Now, "amortized O(1)" does allow for some non-constant variance, but that will not be based on the size of the table so much as the way the hashes for the keys collide. Obviously, larger tables mean that more collisions are possible, but larger tables also use more hash entries, thus lowering the chances of collision. So the two generally cancel out.
Basically, you shouldn't care. Access to a global will not be the principle performance issue you will face in any real program.
In Lua, global variables are stored in a table called _G. Adding a key to a large table might occasionally be slow due to the possibility of having to resize the table. I don't know the threshold for when this slowdown becomes noticeable. As far as I know, accessing a key should be fast regardless of the table's size.

Lua: Is decreasing iterator from inside of loop possible?

array={}
size=10
math.randomseed(os.time())
for i=1,size do
array[i]=math.random(size)
if(i>1) then
for j=1,i do
if array[j]==array[i] then
i=i-1
break
end
end
end
end
for i=1,size do
print(array[i])
end
The code above was meant to generate an array of random numbers from 1 to 'size' avoiding repeating values. I tried to achieve that by repeating top level 'for' loop once more if newly generated value was present before somewhere in array - by decreasing its iterator. Somehow it doesn't work. Why?
Is modifying iterator value from inside of loop not possible?
Example output with repeating values in array:
>lua5.1 "pairsss.lua"
2
1
10
6
5
2
5
7
7
4
>Exit code: 0
The solution to your problem is to shuffle the array, like Random iteration to fill a table in Lua.
To answer your question, from Lua 5.1 reference manual:
§2.4.5 – For Statement
All three control expressions are evaluated only once, before the loop starts. They must all result in numbers.
That means, no matter how you change the value of i inside the for loop, it doesn't affect how the iteration is done.
You can use a set instead of an array, as done by the author of question Randomize numbers in Lua with no repeats. As one of the answers points out, as your set gets closer in size to your range of randome numbers (say, you have random numbers 1 to 100 and your set is size 50) it will be more and more difficult to find a number that hasn't already been picked. You can see that for a set of size 50 and picking a random # from 1 to 100, then by the time you have the set half full, you have a 25-50 % chance of finding the random pick is already in use in your set. In that case, shuffling is the way to go, as explained in one of the answers to that post (Randomize numbers in Lua with no repeats).

Memory Locations of Variables when Using IA-32 Assembly Language

Quick question on memory locations in IA-32 assembly language that i cannot seem to find the answer for anywhere else.
On IA-32 each memory address is 4 bytes long (e.g. 0x0040120e). Each of these addresses points to a 1 byte value (or in the case of a larger value, the first byte of it). Now look at these two simple IA-32 assembly language statements:
var1 db 2
var2 db 3
This will place var1 and var2 in adjacent memory cells (let's say 0x0040120e and 0f). Now I realize that the define directive db allocates 1 byte to the value. But, in the case above I have two values (2 and 3) that in fact only requires two bits each, to be stored.
Questions:
When using the db directive, do these two values still consume a full byte, even though they are smaller than 1 byte?
Is using a full byte for values that could get away with less, still the common way to go (as we have so much memory that we don't care)?
Does integers 0 to 255 then generally take up 1 byte and integers 256 to (2^16 - 1) take up 2 bytes (a word), etc.?
Thank you,
Magnus
EDIT 1: Made questions more clear (apologies for the back and forth)
EDIT 2: Added a structured reply below, based on other posters' input
yes. the B in DB is for Byte.
You could use a nibble for each, like so:
combined db 0x23
but you'd have to
a) shift the result for 4 bits right if you need the "2".
b) mask the leftmost 4 bits if you need the "3".
Hardly worth the effort these days ;-)
Yes, since the architecture is byte-addressable and cannot address anything smaller than a byte.
This means that data requiring less than one byte will need to share its address with other data.
In practice this means that you're going to have to know which bits in the pointed-out byte are used for this particular value.
For hardware registers this sort of mapping is very common.
EDIT: Ah, you seem to mean "values of the same variable" when you said "2 and 3". I thought you meant 2-bit and 3-bit values. You need to decide how many bits are needed at most for a particular variable, for all the values you need that variable to be able to store. There are variable-length encodings for integers of course, but that's generally rarely used in assembly and not what you'd typically use for some general-purpose variable.
You generally should expect to reserve all bits required for all values that a variable need to hold, up front. Otherwise, if you're worried about "wasting memory", you would need to move all other variables as soon as you get some "vacant bits" somewhere. That would end up costing fantastically much. Also, knowing the size of a variable is constant makes it possible to generate (or write) the proper code to handle it, otherwise you would of course also need to explicitly store somewhere "the size of the value held in variable x is now y bits". That becomes extremely painful very very quickly.
My initial question was a bit unstructured, so for the benefit of other searchers stopping by here I will use the answers received from #unwind and #geert3 to create a structured response. Again, this was my fault due to the initial poor structuring and creds for the answers goes to #unwind and #geert3.
When using the db directive you allocate 1 byte to the variable, and even if the variable takes up less space than 1 byte, it will still consume that full 1-byte address spot. As one might guess, that wastes a few bits of memory, but that is okay as you have enough memory and not too bothered about wasting a couple of bits. The reason you want to use the full 1-byte memory location is that it is easier to reference the variable when it is alone in the address slot (see #geert3's note on how to access it if you use less than a byte), and additionally, in case you want to reuse the variable later, it is great to know you have space for any number up to 255.
Yes, see answer to 1
Yes, you would normally allocate multiples of a byte to a variable, in a byte-addressable system

how to generate a bidimensional array with different "branch" lengths very fast

I am a Delphi programmer.
In a program I have to generate bidimensional arrays with different "branch" lengths.
They are very big and the operation takes a few seconds (annoying).
For example:
var a: array of array of Word;
i: Integer;
begin
SetLength(a, 5000000);
for i := 0 to 4999999 do
SetLength(a[i], Diff_Values);
end;
I am aware of the command SetLength(a, dim1, dim2) but is not applicable. Not even setting a min value (> 0) for dim2 and continuing from there because min of dim2 is 0 (some "branches" can be empty).
So, is there a way to make it fast? Not just by 5..10% but really FAST...
Thank you.
When dealing with a large amount of data, there's a lot of work that has to be done, and this places a theoretical minimum on the amount of time it can be done in.
For each of 5 million iterations, you need to:
Determine the size of the "branch" somehow
Allocate a new array of the appropriate size from the memory manager
Zero out all the memory used by the new array (SetLength does this for you automatically)
Step 1 is completely under your control and can possibly be optimized. 2 and 3, though, are about as fast as they're gonna get if you're using a modern version of Delphi. (If you're on an old version, you might benefit from installing FastMM and FastCode, which can speed up these operations.)
The other thing you might do, if appropriate, is lazy initialization. Instead of trying to allocate all 5 million arrays at once, just do the SetLength(a, 5000000); at first. Then when you need to get at a "branch", first check if its length = 0. If so, it hasn't been initialized, so initialize it to the proper length. This doesn't save time overall, in fact it will take slightly longer in total, but it does spread out the initialization time so the user doesn't notice.
If your initialization is already as fast as it will get, and your situation is such that lazy initialization can't be used here, then you're basically out of luck. That's the price of dealing with large amounts of data.
I just tested your exact code, with a constant for Diff_Values, timed it using GetTickCount() for rudimentary timing. If Diff_Values is 186 it takes 1466 milliseconds, if Diff_Values is 187 it fails with Out of Memory. You know, Out of Memory means Out of Address Space, not really Out of Memory.
In my opinion you're allocating so much data you run out of RAM and Windows starts paging, that's why it's slow. On my system I've got enough RAM for the process to allocate as much as it wants; And it does, until it fails.
Possible solutions
The obvious one: Don't allocate that much!
Figure out a way to allocate all data into one contiguous block of memory: helps with address space fragmentation. Similar to how a bi dimensional array with fixed size on the "branches" is allocated, but if your "branches" have different sizes, you'll need to figure a different mathematical formula, based on your data.
Look into other data structures, possibly ones that cache on disk (to brake the 2Gb address space limit).
In addition to Mason's points, here are some more ideas to consider:
If the branch lengths never change after they are allocated, and you have an upper bound on the total number of items that will be stored in the array across all branches, then you might be able to save some time by allocating one huge chunk of memory and divvying up the "branches" within that chunk yourself. Your array would become a 1 dimensional array of pointers, and each entry in that array points to the start of the data for that branch. You keep track of the "end" of the used space in your big block with a single pointer variable, and when you need to reserve space for a new "branch" you take the current "end" pointer value as the start of the new branch and increment the "end" pointer by the amount of space that branch requires. Don't forget to round up to dword boundaries to avoid misalignment penalties.
This technique will require more use of pointers, but it offers the potential of eliminating all the heap allocation overhead, or at least replacing the general purpose heap allocation with a purpose-built very simple, very fast suballocator that matches your specific use pattern. It should be faster to execute, but it will require more time to write and test.
This technique will also avoid heap fragmentation and reduces the releasing of all the memory to a single deallocation (instead of millions of separate allocations in your present model).
Another tip to consider: If the first thing you always do with the each newly allocated array "branch" is assign data into every slot, then you can eliminate step 3 in Mason's example - you don't need to zero out the memory if all you're going to do is immediately assign real data into it. This will cut your memory write operations by half.
Assuming you can fit the entire data structure into a contiguous block of memory, you can do the allocation in one shot and then take over the indexing.
Note: Even if you can't fit the data into a single contiguous block of memory, you can still use this technique by allocating multiple large blocks and then piecing them together.
First off form a helper array, colIndex, which is to contain the index of the first column of each row. Set the length of colIndex to RowCount+1. You build this by setting colIndex[0] := 0 and then colIndex[i+1] := colIndex[i] + ColCount[i]. Do this in a for loop which runs up to and including RowCount. So, in the final entry, colIndex[RowCount], you store the total number of elements.
Now set the length of a to be colIndex[RowCount]. This may take a little while, but it will be quicker than what you were doing before.
Now you need to write a couple of indexers. Put them in a class or a record.
The getter looks like this:
function GetItem(row, col: Integer): Word;
begin
Result := a[colIndex[row]+col];
end;
The setter is obvious. You can inline these access methods for increased performance. Expose them as an indexed property for convenience to the object's clients.
You'll want to add some code to check for validity of row and col. You need to use colIndex for the latter. You can make this checking optional with {$IFOPT R+} if you want to mimic range checking for native indexing.
Of course, this is a total non-starter if you want to change any of your column counts after the initial instantiation!

Resources