How is data laid out in RAM? - memory

I have a basic architecture question. How are multi-dimensional arrays laid out in memory? Is it correct that the data is laid out linearly in memory? If so, is it correct that in row-major order the data is stored row by row (first row, then second row, ...) and in column-major order it is stored column by column?
Thanks

The representation of an array depends upon the programming language. Most languages (the C abortion and its progeny being notable exceptions) represent arrays using a descriptor. The descriptor specifies the number of dimensions, the upper and lower bounds of each dimension, and where the data is located.
Usually, all the data for the array is stored contiguously. Even when stored contiguously, the ordering depends upon the language. In some languages, [0, 0, 0] is stored next to [1, 0, 0] (column major, e.g., FORTRAN). In others, [0, 0, 0] is next to [0, 0, 1], while [0, 0, 0] and [1, 0, 0] are far apart (row major, e.g., Pascal). Some languages, such as Ada, leave the ordering up to the compiler implementation.
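As a rough illustration only (not any particular language's actual runtime layout), such a descriptor might look like this in C:
/* Hypothetical descriptor for an array of up to 3 dimensions; real
   implementations (FORTRAN, Ada, etc.) differ in the details. */
struct descriptor {
    int     ndims;     /* number of dimensions */
    int     lower[3];  /* lower bound of each dimension */
    int     upper[3];  /* upper bound of each dimension */
    double *data;      /* where the element data is located */
};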

Each array is stored in sequence, naturally. It makes no sense to spread data all over the place.
Example in C:
#include <stdio.h>

int main(void) {
    int matrix[10][10];
    matrix[9][1] = 1234;
    printf("%d\n", matrix[9][1]);                // prints 1234
    printf("%d\n", ((int *)matrix)[9 * 10 + 1]); // prints 1234: row-major offset
    return 0;
}
Of course there is nothing forcing you to organize the data this way; if you want to make a mess, you can.
For example, if instead of using an array of arrays you decide to dynamically allocate your matrix:
#include <stdlib.h>

int **matrix = malloc(10 * sizeof(int *));
for (int i = 0; i < 10; ++i)
    matrix[i] = malloc(10 * sizeof(int));
Each row in the above example is itself stored in sequence, but the matrix as a whole is certainly not contiguous: there are 11 separately allocated memory blocks, and the memory manager is free to place them wherever it sees fit.
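If you need a dynamically allocated matrix to be contiguous after all, a common alternative (a sketch, one of several ways) is to allocate a single block and compute the offsets yourself:
#include <stdlib.h>

/* One contiguous block holding all 10 * 10 elements */
int *matrix = malloc(10 * 10 * sizeof(int));
/* element [i][j] lives at matrix[i * 10 + j] (row major) */
matrix[9 * 10 + 1] = 1234;
free(matrix);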

Related

Check if a list contains any element of another list in Dart

I have an array:
const list1 = [0, 1, 2];
How do I check if other arrays contain any of the target array elements?
For example:
[2, 3] // returns true
[2, 3, 4] // returns true
[3, 4] // returns false
Using list1.indexWhere(list2.contains) should be fine for small lists, but for large lists, the asymptotic runtime complexity would be O(m * n) where m and n are the sizes of the lists.
A different way to pose the problem of checking if a list contains any element of another list is to check if the set-intersection of two lists is non-empty. The direct way to implement that would be:
var contains = list1.toSet().intersection(list2.toSet()).isNotEmpty;
Since the default Set implementation is a LinkedHashSet, lookups would be O(1), and computing the intersection would be linear with respect to one of the Sets. However, converting each List to a Set would take linear time, making the whole operation take O(m + n).
That's asymptotically efficient, but it computes the entire intersection just to determine whether it's empty, which is wasteful. You can do a bit better by using .any to stop at the first match, and by noting that only the collection being searched needs to be a Set, not the one being iterated:
var set2 = list2.toSet();
var contains = list1.any(set2.contains);
Note that if you can use Sets in the first place instead of Lists, then the conversion cost would disappear and make the operation O(m).
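For instance, a sketch assuming both collections can be kept as Sets from the start (set1 and set2 are hypothetical stand-ins for the lists above):
final set1 = {0, 1, 2};
final set2 = {2, 3};
// Iterates set1 once; each lookup in set2 is O(1)
final contains = set1.any(set2.contains);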
final contains = list1.indexWhere((e) => list2.contains(e)) > -1;
Explanation
indexWhere returns the index of the first element for which the test function returns true, or -1 if no element matches.
contains returns true if the given element is present in the list.

What is the most memory-efficient array of nullable vectors when most of the second dimension will be empty?

I have a large fixed-size array of variable-sized arrays of u32. Most of the second dimension arrays will be empty (i.e. the first array will be sparsely populated). I think Vec is the most suitable type for both dimensions (Vec<Vec<u32>>). Because my first array might be quite large, I want to find the most space-efficient way to represent this.
I see two options:
I could use a Vec<Option<Vec<u32>>>. I'm guessing that since Option is a tagged union, this would result in each cell being sizeof(Vec<u32>) rounded up to the next word boundary for the tag.
I could directly use Vec::with_capacity(0) for all cells. Does an empty Vec allocate zero heap until it's used?
Which is the most space-efficient method?
Actually, both Vec<Vec<T>> and Vec<Option<Vec<T>>> have the same space efficiency.
A Vec contains a pointer that will never be null, so the compiler is smart enough to recognize that in the case of Option<Vec<T>>, it can represent None by putting 0 in the pointer field. The question "What is the overhead of Rust's Option type?" contains more information.
What about the backing storage the pointer points to? A Vec doesn't allocate (same link as the first) when you create it with Vec::new or Vec::with_capacity(0); in that case, it uses a special, non-null "empty pointer". Vec only allocates space on the heap when you push something or otherwise force it to allocate. Therefore, the space used for the Vec itself and for its backing storage is the same in both cases.
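Both claims are easy to check; a minimal sketch (the size equality holds on current rustc via the niche optimization described above):
use std::mem::size_of;

fn main() {
    // The None case reuses Vec's never-null pointer, so no extra tag is needed
    assert_eq!(size_of::<Vec<u32>>(), size_of::<Option<Vec<u32>>>());

    // An empty Vec has not touched the heap yet
    let v: Vec<u32> = Vec::with_capacity(0);
    assert_eq!(v.capacity(), 0);
}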
Vec<Vec<T>> is a decent starting point. Each entry costs 3 pointers, even if it is empty, and for filled entries there can be additional per-allocation overhead. But depending on which trade-offs you're willing to make, there might be a better solution.
Vec<Box<[T]>>: this reduces the size of an entry from 3 pointers to 2 pointers (a boxed slice stores only a pointer and a length, with no spare capacity). The downside is that changing the number of elements in a box is both inconvenient (convert to and from Vec<T>) and more expensive (reallocation).
HashMap<usize, Vec<T>>: this saves a lot of memory if the outer collection is sufficiently sparse. The downsides are higher access cost (hashing, scanning) and a higher per-element memory overhead.
If the collection is only filled once and you never resize the inner collections, you could use a split data structure. This not only reduces the per-entry size to 1 pointer-sized index, it also eliminates the per-allocation overhead:
struct Nested<T> {
    data: Vec<T>,
    indices: Vec<usize>, // indices[i] points one past the last element of the i-th slice
}

impl<T> Nested<T> {
    fn get_range(&self, i: usize) -> std::ops::Range<usize> {
        assert!(i < self.indices.len());
        if i > 0 {
            self.indices[i - 1]..self.indices[i]
        } else {
            0..self.indices[i]
        }
    }

    pub fn get(&self, i: usize) -> &[T] {
        let range = self.get_range(i);
        &self.data[range]
    }

    pub fn get_mut(&mut self, i: usize) -> &mut [T] {
        let range = self.get_range(i);
        &mut self.data[range]
    }
}
For additional memory savings you can shrink the indices to u32, limiting you to about 4 billion elements per collection.
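A hypothetical fill-once constructor for this structure (from_rows is my name, not part of the answer):
impl<T> Nested<T> {
    /// Build the structure in one pass; the inner slices cannot be resized later.
    pub fn from_rows<I: IntoIterator<Item = Vec<T>>>(rows: I) -> Self {
        let mut data = Vec::new();
        let mut indices = Vec::new();
        for row in rows {
            data.extend(row);
            indices.push(data.len()); // one past the end of this row's slice
        }
        Nested { data, indices }
    }
}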

In Lua Torch, the product of two zero matrices has NaN entries

I have encountered a strange behavior of the torch.mm function in Lua/Torch. Here is a simple program that demonstrates the problem.
iteration = 0
a = torch.Tensor(2, 2)
b = torch.Tensor(2, 2)
prod = torch.Tensor(2, 2)
a:zero()
b:zero()
repeat
    prod = torch.mm(a, b)
    ent = prod[{2, 1}]
    iteration = iteration + 1
until ent ~= ent -- ent ~= ent is true only when ent is NaN
print("error at iteration " .. iteration)
print(prod)
The program consists of a single loop that multiplies two zero 2x2 matrices and tests whether the entry ent of the product matrix is NaN. It seems that the program should run forever, since the product should always be 0, and hence ent should be 0. However, the program prints:
error at iteration 548
0.000000 0.000000
nan nan
[torch.DoubleTensor of size 2x2]
Why is this happening?
Update:
The problem disappears if I replace prod = torch.mm(a,b) with torch.mm(prod,a,b), which suggests that something is wrong with the memory allocation.
My version of Torch was compiled without the BLAS & LAPACK libraries. After I recompiled Torch with OpenBLAS, the problem disappeared. However, I am still interested in its cause.
The part of the code that auto-generates the Lua wrapper for torch.mm can be found here.
When you write prod = torch.mm(a,b) within your loop it corresponds to the following C code behind the scenes (generated by this wrapper thanks to cwrap):
/* this is the tensor that will hold the results */
arg1 = THDoubleTensor_new();
THDoubleTensor_resize2d(arg1, arg5->size[0], arg6->size[1]);
arg3 = arg1;
/* .... */
luaT_pushudata(L, arg1, "torch.DoubleTensor");
/* effective matrix multiplication operation that will fill arg1 */
THDoubleTensor_addmm(arg1,arg2,arg3,arg4,arg5,arg6);
So:
a new result tensor is created and resized with the proper dimensions,
but this new tensor is NOT initialized, i.e. there is no calloc or explicit fill here so it points to junk memory and could contain NaN-s,
this tensor is pushed on the stack so as to be available on the Lua side as the return value.
The last point means that this returned tensor is different from the initial prod one (i.e. within the loop, prod shadows the initial value).
On the other hand calling torch.mm(prod,a,b) does use your initial prod tensor to store the results (behind the scenes there is no need to create a dedicated tensor in that case). Since in your code snippet you do not initialize / fill it with given values it could also contain junk.
In both cases the core operation is a gemm multiplication like C = beta * C + alpha * A * B, with beta = 0 and alpha = 1. The naive implementation looks like this:
real *a_ = a;
for (i = 0; i < m; i++)
{
    real *b_ = b;
    for (j = 0; j < n; j++)
    {
        real sum = 0;
        for (l = 0; l < k; l++)
            sum += a_[l*lda] * b_[l];
        b_ += ldb;
        /*
         * WARNING: beta*c[j*ldc+i] could give NaN even if beta=0
         * if the other operand c[j*ldc+i] is NaN!
         */
        c[j*ldc+i] = beta*c[j*ldc+i] + alpha*sum;
    }
    a_++;
}
Comments are mine.
So:
with torch.mm(a,b): at each iteration, a new result tensor is created without being initialized (it could contain NaN-s). So every iteration presents a risk of returning NaN-s (see the warning above),
with torch.mm(prod,a,b): there is the same risk, since you do not initialize the prod tensor. BUT: this risk only exists at the first iteration of the repeat / until loop, since right after that prod is filled with 0-s and re-used for the subsequent iterations.
This is why you do not observe the problem in that case (it can occur at most once, at the first iteration).
In case 1: this should be improved at the Torch level, i.e. the wrapper should make sure to initialize the output (e.g. with THDoubleTensor_fill(arg1, 0);).
In case 2: you should initialize prod yourself and use the torch.mm(prod,a,b) construct to avoid any NaN problem.
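A minimal sketch of that second fix:
-- Zero the output tensor once, then let torch.mm fill it in place
prod = torch.Tensor(2, 2):zero()
torch.mm(prod, a, b) -- no fresh uninitialized tensor is created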
--
EDIT: this problem is now fixed (see this pull request).

subset without using brute-force

I'm trying to find a generic solution to the following problem without using brute force, because it is for an iOS app.
Let's say I have 3 arrays of elements.
arr_a = [1,1,2,3,3,3,2,1,2];
arr_b = [2,2,2,2,3,2,2];
arr_c = [3,2,2,2,1,2,2,3,2,2,3];
These arrays will only contain the elements 1, 2 or 3. As you may have noticed, they might contain all of these elements or just some of them.
Assuming there's a solution, how would you randomly pick X, Y and Z values from the arrays, where X is the number of 1s, Y the number of 2s and Z the number of 3s?

Lua: understanding table array part and hash part

In section 4, Tables, of The Implementation of Lua 5.0 there is an example:
local t = {100, 200, 300, x = 9.3}
So we have t[4] == nil. If I write t[0] = 0, this will go to the hash part.
If I write t[5] = 500, where will it go? The array part or the hash part?
I would be eager to hear the answer for the Lua 5.1, Lua 5.2 and LuaJIT 2 implementations if there is a difference.
Contiguous integer keys starting from 1 always go in the array part.
Keys that are not positive integers always go in the hash part.
Other than that, it is unspecified, so you cannot predict where t[5] will be stored according to the spec (and it may or may not move between the two, for example if you create then delete t[4].)
LuaJIT 2 is slightly different - it will also store t[0] in the array part.
If you need it to be predictable (which is probably a design smell), stick to pure-array tables (contiguous integer keys starting from 1; if you want to leave a gap, use a value of false instead of nil) or pure hash tables (avoid non-negative integer keys).
Quoting from The Implementation of Lua 5.0:
The array part tries to store the values corresponding to integer keys from 1 to some limit n. Values corresponding to non-integer keys or to integer keys outside the array range are stored in the hash part.
The indices of the array part start from 1, which is why t[0] = 0 goes to the hash part.
The computed size of the array part is the largest n such that at least half the slots between 1 and n are in use (to avoid wasting space with sparse arrays) and there is at least one used slot between n/2+1 and n (to avoid a size n when n/2 would do).
According to this rule, in the example table:
local t = {100, 200, 300, x = 9.3}
The array part, which holds 3 elements, may have a size of 3, 4 or 5. (EDIT: the size should be 4; see #dualed's comment.)
Assume that the array part has a size of 4. When you write t[5] = 500, the array part can no longer hold the element t[5]. What if the array part is resized to 8? With a size of 8, the array part holds 4 elements, which is not less than half of the array size. And the range from n/2+1 to n, which in this case is 5 to 8, contains one used slot: t[5]. So an array size of 8 satisfies both requirements, and in this case t[5] will go to the array part.
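Putting the whole example together (a sketch; where each key actually lands is an internal detail that plain Lua code cannot observe):
local t = {100, 200, 300, x = 9.3} -- array part: 100, 200, 300; hash part: x
t[0] = 0   -- array indices start at 1, so this goes to the hash part
t[5] = 500 -- may trigger a rehash; with an array size of 8 it fits in the array part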
