How is an array element located without iterating over the array?

It is said that array element access is fast in a programming language if you know the numerical index of the element. I assume this is because computer memory is rather like an array itself, with each memory slot having a numerical address, so knowing the memory address we can go directly to that address and access the value stored there.
How is that memory location accessed so quickly? Is this done at the level of microcode?

The local variable actually references the memory location of the first element in the array. The elements of the array are of a fixed size (such as 4 bytes each for an integer) and stored in order in a contiguous block of memory. For a single-dimensional array, the memory location of an element is calculated as: memory location of first element + (size in bytes of a single element * index of element). Note that the index used in the calculation is zero-based, which is why most languages use zero-based indices for arrays. The elements of an array might actually be references to objects rather than the objects themselves, which keeps the element sizes fixed even when the objects are of variable size.
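This is exactly the arithmetic a compiler emits for an indexed access. A small C sketch making the address calculation explicit:
#include <stdio.h>

int main(void) {
    int a[5] = {10, 20, 30, 40, 50};
    /* a[3] is computed as: address of a[0] + 3 * sizeof(int) */
    int *p = (int *)((char *)a + 3 * sizeof(int));
    printf("%d %d\n", a[3], *p);   /* prints: 40 40 */
    return 0;
}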
Extra explanation for multidimensional arrays: element locations in a multidimensional array can be calculated, in the column-major system, as: location of element [0][0] + (size of single element * (index for 1st dimension + (index for 2nd dimension * length of 1st dimension) + ... + (index for nth dimension * length of (n-1)th dimension * ... * length of 1st dimension))). There are two systems, column major and row major. This math is still rather simple for a computer even though it looks complicated here. Multidimensional arrays could also be implemented as an array of arrays, which would be slightly slower and less memory efficient.
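As an illustration, here is the row-major flattening that C itself uses for a 2D array (a minimal sketch; the column-major offset is shown in a comment):
#include <stdio.h>

#define ROWS 3
#define COLS 4

int main(void) {
    int flat[ROWS * COLS] = {0};
    int i = 2, j = 1;
    /* row-major (C): offset = i * COLS + j          */
    /* column-major (Fortran): offset = j * ROWS + i */
    flat[i * COLS + j] = 42;
    printf("%d\n", flat[i * COLS + j]);   /* prints: 42 */
    return 0;
}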
Out-of-bounds checking: the calculation for the element position does not itself account for indices larger than the length of the array, which can lead to accessing memory locations outside the array. The C language does not prevent this. Java will throw an ArrayIndexOutOfBoundsException, but accessing the array becomes slightly more expensive due to the check being performed. Whether out-of-bounds checking is implemented at all is language specific.
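The check itself is just a comparison and a branch before the address calculation. A hedged C sketch of what a bounds-checked access amounts to:
#include <stdio.h>
#include <stdlib.h>

int checked_get(const int *a, size_t len, size_t i) {
    if (i >= len) {   /* the extra comparison bounds-checked languages pay for */
        fprintf(stderr, "index %zu out of bounds for length %zu\n", i, len);
        abort();
    }
    return a[i];
}

int main(void) {
    int a[3] = {1, 2, 3};
    printf("%d\n", checked_get(a, 3, 2));   /* prints: 3 */
    return 0;
}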

Related

Lua table, if the key starts from 1000, will there be a performance loss?

a={}
a[1000]=1
I don't define anything else. Will storage space be occupied for indices 1 to 999?
Also, given that a[1] through a[999] are nil, will a traversal still step through indices 1 to 1000 in sequence?
No, there is not going to be space allocated for other (1-999) elements (unless you create 1000 elements and then delete 1-999). Lua supports "sparse" arrays and will use the hash part of the table to store those key/value pairs.
If you are asking whether a[1000] is going to be slower if 1-999 elements are not present, then it's possible, as in this case the hash part of the table is going to be used (instead of the array part), but you'll have to benchmark to see if there is any observable difference that matters in your case.
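You can see the sparseness directly; a minimal Lua sketch:
local a = {}
a[1000] = 1               -- stored in the hash part; no array slots for 1..999
for k, v in pairs(a) do   -- pairs() visits only keys that actually exist,
  print(k, v)             -- so nothing steps through 1..999
end                       --> prints: 1000  1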

Swift 3 and Index of a custom linked list collection type

In Swift 3, Collection indices have to conform to Comparable instead of Equatable.
The full story can be read in swift-evolution/0065.
Here's a relevant quote:
Usually an index can be represented with one or two Ints that
efficiently encode the path to the element from the root of a data
structure. Since one is free to choose the encoding of the “path”, we
think it is possible to choose it in such a way that indices are
cheaply comparable. That has been the case for all of the indices
required to implement the standard library, and a few others we
investigated while researching this change.
In my implementation of a custom linked list collection a node (pointing to a successor) is the opaque index type. However, given two instances, it is not possible to tell if one precedes another without risking traversal of a significant part of the chain.
I'm curious, how would you implement Comparable for a linked list index with O(1) complexity?
The only idea that I currently have is to somehow count steps while advancing the index, storing it within the index type as a property and then comparing those values.
A serious downside of this solution is that indices must be invalidated when mutating the collection. And while that seems reasonable for arrays, I do not want to lose the huge benefit linked lists have: they do not invalidate indices of unchanged nodes.
EDIT:
It can be done at the cost of two additional integers as collection properties, assuming the singly linked list implements front insert, front remove and back append. Any meddling around in the middle would break the O(1) complexity requirement anyway.
Here's my take on it.
a) I introduced one private integer type property to my custom Index type: depth.
b) I introduced two private integer type properties to the collection: startDepth and endDepth, which both default to zero for an empty list.
Each front insert decrements the startDepth.
Each front remove increments the startDepth.
Each back append increments the endDepth.
Thus all indices in startIndex..<endIndex have a corresponding integer range startDepth..<endDepth.
c) Whenever the collection vends an index, via startIndex or endIndex, it inherits its corresponding depth value from the collection. When the collection is asked to advance the index by invoking index(after:), I simply initialize a new Index instance with an incremented depth value (depth + 1).
Conforming to Comparable boils down to comparing the left-hand side depth value to the right-hand side one.
Note that because I expand the integer range from both sides as well, all the depth values for the middle indices remain unchanged (and thus are not invalidated).
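A minimal sketch of such a depth-carrying index (the names Node and ListIndex are mine, not from the original code):
final class Node<Element> {
    var value: Element
    var next: Node<Element>?
    init(_ value: Element) { self.value = value }
}

struct ListIndex<Element>: Comparable {
    let node: Node<Element>?   // nil plays the role of endIndex
    let depth: Int             // drawn from the collection's startDepth..<endDepth range

    static func == (lhs: ListIndex, rhs: ListIndex) -> Bool {
        return lhs.depth == rhs.depth
    }

    static func < (lhs: ListIndex, rhs: ListIndex) -> Bool {
        return lhs.depth < rhs.depth
    }
}
// index(after:) then simply vends ListIndex(node: i.node?.next, depth: i.depth + 1)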
Conclusion:
I traded the benefit of O(1) index comparisons for a minor increase in memory footprint and a few integer increments and decrements. I expect index lifetimes to be short and the number of collections relatively small.
If anyone has a better solution I'd gladly take a look at it!
I may have another solution. If you use floats instead of integers, you can get roughly O(1) insertion-in-the-middle performance by setting the sortIndex of the inserted node to a value between the predecessor's and the successor's sortIndex. This requires storing (and updating) the sortIndex on your nodes (I imagine this should not be too hard, since it only changes on insertion or removal and can always be propagated 'up').
In your index(after:) method you need to query the successor node, but since you use your node as the index, that is straightforward.
One caveat is the finite precision of floating point: if on insertion the distance between the two sort indices is too small, you need to reindex at least part of the list. Since you said you only expect small scale, I would just go through the whole list and use each node's position for that.
This approach has all the benefits of your own, with the added benefit of good performance on insertion in the middle.
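A sketch of just the midpoint step (assuming each node carries a Double sortIndex; the function name is mine):
func midSortIndex(between predecessor: Double, and successor: Double) -> Double? {
    let mid = (predecessor + successor) / 2
    // When floating-point precision is exhausted the midpoint collapses onto
    // one of its neighbours; returning nil signals that a reindex is needed.
    guard mid > predecessor && mid < successor else { return nil }
    return mid
}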

Fortran entries of array change seemingly at random

I have been working with a FORTRAN program. I have noticed seemingly random changes in a 1D matrix I'm working with. It is a matrix of 4000 integers. Values are added to the matrix one by one, starting with index 1 and iterating by 1 for each added value. The matrix does not get fully "filled", currently only 100 values are placed into the matrix. So one would expect that the first 100 entries of the matrix will be non-zero (all added values are non-zero) and the remaining 3900 entries will be 0. However, several of the entries of the matrix end up being large negative numbers, but I'm certain that no portion of my code touches these entries.
What could be causing this issue? I'm sorry but I can't post the code for you all to work with.
The code has several other large matrices, taking up a total of ~100 MB of space. Could this potentially be a memory issue?
Thanks!
You have to initialize your array, otherwise it will almost always contain garbage. This would do it:
array = 0.0e0 ! real array
or
array = 0.0d0 ! double precision
or
array = 0 ! integer
A "matrix" is two-dimensional; your array is one-dimensional.
Things do not change unless you ask them to change.
FORTRAN does not initialize variables, other than (as I recall) in a labeled COMMON. As such, they typically start out with garbage values. Try initializing your data with a DATA statement. If you have to initialize a labeled COMMON, you will have to supply a BLOCK DATA subprogram.
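For example, a minimal sketch for an integer array of the size in the question (the name IARR is illustrative):
      INTEGER IARR(4000)
      DATA IARR /4000*0/
In modern free-form Fortran you can instead initialize at the declaration: integer :: iarr(4000) = 0.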

Redis Data Structure Space Requirements

What is the difference in space between sorted sets and lists in Redis? My guess is that sorted sets are some kind of balanced binary tree, and lists are a linked list. On top of the three values that I'm encoding for each of them (key, score, value; I'll munge together score and value for the linked list), the overhead is that the linked list needs to keep track of one other node and the binary tree needs to keep track of two, so the space overhead of using a sorted set is O(N).
If my value and score are both longs, and the pointers to the other nodes are also longs, it seems like the space overhead of a single node goes from 3 longs to 4 longs on a 64-bit computer, which is a 33% increase in space.
Is this true?
It is much more than your estimation. Let's suppose ziplists are not used (i.e. you have a significant number of items).
A Redis list is a classical double-linked list: 3 pointers (prev, next, value) per item.
A sorted set is a dictionary plus a skip list. In the dictionary, items are stored with 3 pointers as well (key, value, next). The skip list memory footprint is more complex to evaluate: each node takes 1 double (score), 2 pointers (obj, backward), plus n pairs of (pointer, span value) with n between 1 and 32. Most items take only 1 or 2 such pairs.
In other words, when it is not represented as a ziplist, a sorted set is by far the Redis data structure with the most overhead. Compared to a list, the memory overhead is more than 200% (i.e. 3 times).
Note: the best way to evaluate memory consumption with Redis is to try to build a big list or sorted set with pseudo-data and use INFO to get the memory footprint.
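For example, along these lines (EVAL requires Redis 2.6+ and MEMORY USAGE requires 4.0+; the key names are illustrative):
redis-cli eval "for i = 1, 100000 do redis.call('RPUSH', KEYS[1], i) end" 1 biglist
redis-cli eval "for i = 1, 100000 do redis.call('ZADD', KEYS[1], i, i) end" 1 bigzset
redis-cli memory usage biglist
redis-cli memory usage bigzset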

What happens when increasing the size of a dynamic array?

I'd like to understand what happens when the size of a dynamic array is increased.
My understanding so far:
Existing array elements will remain unchanged.
New array elements are initialised to 0
All array elements are contiguous in memory.
When the array size is increased, will the extra memory be tacked onto the existing memory block, or will the existing elements be copied to an entirely new memory block?
Does changing the size of a dynamic array have consequences for pointers referencing existing array elements?
Thanks,
[edit] Incorrect assumption struck out. (New array elements are initialised to 0)
Existing array elements will remain unchanged: yes
New array elements are initialized to 0: yes (see update). The original answer here was: no, unless it is an array of compiler-managed types such as string, another array, or a variant.
All array elements are contiguous in memory: yes
When the array size is increased, the array will be copied. From the doc:
...memory for a dynamic array is reallocated when you assign a value to the array or pass it to the SetLength procedure.
So yes, increasing the size of a dynamic array does have consequences for pointers referencing existing array elements.
If you want to keep references to existing elements, use their index in the array (0-based).
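For illustration, a minimal Delphi sketch of the hazard (the names are mine, not from any particular codebase):
procedure GrowExample;
var
  A: array of Integer;
  P: PInteger;
begin
  SetLength(A, 4);
  P := @A[0];
  SetLength(A, 1024);   // may move the block; P can now be a dangling pointer
  // A[0] remains safe to use here; P does not
end;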
Update
Comments by Rob and David prompted me to check the initialization of dynamic arrays in Delphi 5 (as I have that readily available anyway). First I used some code to create various types of dynamic arrays and inspected them in the debugger. They were all properly initialized, but that could still have been a result of prior initialization of the memory locations where they were allocated. So I checked the RTL. It turns out D5 already has the FillChar statement in the DynArraySetLength method that Rob pointed to:
// Set the new memory to all zero bits
FillChar((PChar(p) + elSize * oldLength)^, elSize * (newLength - oldLength), 0);
In practice Embarcadero will always zero initialise new elements simply because to do otherwise would break so much code.
In fact it's a shame that they don't officially guarantee the zero allocation, because it is so useful. The point is that often, at the call site, when writing SetLength you don't know whether you are growing or shrinking the array. But the implementation of SetLength does know; clearly it has to. So it really makes sense to have a well-defined action on any new elements.
What's more, if they want people to be able to switch easily between managed and native worlds then zero allocation is desirable since that's what fits with the managed code.
