While creating and running a small piece of code to see how global and local variables are initialized, the global array outputs zeros (because global variables are initialized to zero).
But why do the printed values of the local array change randomly every time you re-build the program?
#include <iostream>
using namespace std;
int global[2];
int main()
{
    int local[2];
    cout << "global is: " << global[0] << global[1] << "\n";
    cout << "local is: " << local[0] << local[1] << "\n";
    return 0;
}
Global variables have static storage duration in C++ (behaviour defined by the standard). Their values are set to 0, and they also have a special place in memory.
Local variables, on the other hand, have indeterminate values if left uninitialized. When you run the program, it simply reads whatever data happens to be at the memory where the array is allocated; those are the seemingly random values you encounter.
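If you want the local array to hold defined values, initialize it explicitly. A minimal sketch (not part of the original question), reusing the arrays from the snippet above:

#include <iostream>
using namespace std;
int global[2];          // zero-initialized: static storage duration
int main()
{
    int local[2] = {};  // empty braces value-initialize every element to 0
    cout << "global is: " << global[0] << global[1] << "\n";  // prints 00
    cout << "local is: " << local[0] << local[1] << "\n";     // now also prints 00, on every run
    return 0;
}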
Related
When we declare a variable, say an int, I would like to know the steps involved in memory allocation and initialisation, and also for a pointer.
int x = 5;
Now, at compile time, 4 bytes are allocated to the integer x, but when does the memory get filled with the value 5? Does the initialisation take place during compilation or at run time?
Similarly, consider
int x = 5;
int* p = &x;
In these two lines, what is the process of allocation and initialisation?
Variable initialization depends on the kind of variable. Global and static variables are initialised at compile time, while automatic variables are managed entirely at run time.
Global variables
At compile time, the value of every global variable is known. These values are written by the compiler to specific sections of an object file.
At link time, all the object files are gathered and memory locations are determined for each variable. This makes the address of every variable known, in case one of these addresses is assigned to another variable.
As a result, an executable file is generated that contains a description of the content of every section (text, data, rodata, etc.). The values of all initialized global variables are written in the data or rodata sections.
At run time, the loader reads the description of the different sections and asks the OS for memory. It then copies the content of every section to its memory location.
This is how variables are initialised with a value determined at compile or link time.
The only exception is variables that are initialized to zero (or not initialized at all). They are placed in a special section (frequently named bss). To reduce the size of executable files, these zero values are not written in the executable; instead, before main() is executed, a runtime procedure zeroes out the whole content of the bss section.
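As a rough illustration, here is a hedged sketch of where such variables typically end up with a common ELF toolchain (the section names and exact behaviour are implementation details, not guaranteed by the language):

int initialized_global = 5;      /* value 5 is stored in the executable's data section         */
const int read_only_global = 7;  /* typically placed in a read-only section such as rodata     */
int zeroed_global;               /* bss: only its size is recorded; zeroed before main() runs  */

int main(void)
{
    return initialized_global + read_only_global + zeroed_global;
}

On typical Unix-like systems, the size utility reports the text, data, and bss sizes of the resulting executable, so you can see the effect of adding or removing an initializer.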
Automatic variables
The procedure is completely different. One does not know the location of these variables before the program runs, so the only way is to compute their values with machine instructions.
So the compiler first determines whether these variables will live in registers or in memory, and, when entering the function, the first instructions reserve stack space for the local variables and initialize their values. This is done by means of regular machine instructions.
In case the value is the address of another variable (say y = &x), there are two cases, sketched in the example below:
* if x is a local (automatic) variable, the address is computed by writing to y the sum of the stack pointer register and an offset determined by the compiler;
* if x is a global or static variable, then at link time, once the addresses of global variables are known, the linker modifies the instructions generated by the compiler so that the proper address is written to the register or stack location used to represent y.
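A small sketch of those two cases (the variable names are made up for illustration):

int g;                      /* global: its final address is fixed at link/load time          */

int main(void)
{
    int x;                  /* automatic: lives at some offset from the stack pointer        */
    int *p_local  = &x;     /* computed at run time: stack pointer + compiler-chosen offset  */
    int *p_global = &g;     /* the linker patches the instruction with g's real address      */
    return (p_local != p_global);   /* use the pointers so the compiler keeps them           */
}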
There are situations where there is no way around assigning the value at run time:
if user_input == "yes":
    my_var = 5
else:
    my_var = 7
But normally it depends on the approach the compiler writer has implemented. If you use a different compiler or a different language, things might be different.
I'm trying to learn OpenCL, but I'm having a hard time deciding which address spaces to use: I only find resources that state what these address spaces are, not why they exist or when to use them. The resources are also scattered, so with this question I hope to gather all this information: what the address spaces are, why they exist, when to use which, and what the advantages and disadvantages are regarding memory and performance.
As I understand it (which is probably too simplified), the GPU has two physical types of memory: global memory, far from the actual processors, so slow but pretty big and available to all workers, and local memory, close to the actual processors, so fast but small and not accessible from other workers.
Intuitively, the local qualifier makes sure a variable is placed on local memory and the global qualifier makes sure a variable is placed on global memory, though I'm not sure this is exactly what happens. This leaves the private and constant qualifiers. What's the purpose of those?
There are also some implicit qualifiers. For example, the specification mentions the generic address space, which is used for arguments with no qualifiers, I think. What does this do exactly? And then there are local function variables. What's the address space for those?
Here is an example using my intuition, but without knowing what I'm actually doing:
Example:
Say I pass an array of type long and length 10000 to a kernel which I will only use to read, then I would declare it global const as it must be available to all workers and it will not change. Why wouldn't I use the constant qualifier? When setting the buffer for this array via the CPU, I actually also just could have made the array read-only, which in my eyes says the same as declaring it const. So again, when and why would I declare something constant or global const?
When performing memory-intensive tasks, would it be better to copy the array to a local array inside the kernel? My guess is that local memory would be too small, but what if the array only had a length of 10? When would the array be too big/small? More general: when is it worth copying data from global to local memory?
Say I also want to pass the length of this array, then I would add const int length to the arguments of my kernel, but I'm unsure why I would omit the global qualifier except because I have seen other people do it. After all, length must be accessible for all workers. If I'm right, then length would have a generic address space, but again, I don't really know what that means.
I hope someone with some experience can clear this up. That would be great not only for me, but I hope also for other enthusiasts who want to gain some practical knowledge concerning memory management on the GPU.
Constant: A small portion of cached global memory visible to all workers. Read-only; use it if you can.
Global: Slow, visible to all, read or write. It is where all your data ends up, so some accesses to it are always necessary.
Local: Do you need to share something within a local group? Use local! Do all your local workers access the same global memory? Use local!
Local memory is only visible inside a local work-group and is limited in size, but it is very fast.
Private: Memory that is only visible to a single worker; think of it like registers. Anything declared inside a kernel without an address-space qualifier is private by default.
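To make that concrete, here is a hedged kernel sketch (all names invented for illustration) that touches all four address spaces:

__kernel void address_spaces_demo(__global float *out,        // global: large, visible to all workers, read/write
                                  __constant float *coeffs,   // constant: small cached read-only region
                                  __local float *scratch)     // local: shared only within one work-group
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    float tmp = coeffs[0] * gid;       // tmp is private: one copy per worker, register-like
    scratch[lid] = tmp;                // stage the value in fast local memory
    barrier(CLK_LOCAL_MEM_FENCE);      // make the write visible to the whole work-group
    out[gid] = scratch[lid];           // final result goes back to global memory
}

The host decides how big scratch is when it sets that kernel argument.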
Say I pass an array of type long and length 10000 to a kernel which I
will only use to read, then I would declare it global const as it must
be available to all workers and it will not change. Why wouldn't I use
the constant qualifier?
Actually, yes, you can and should use the constant qualifier. It places your data in constant memory (a small portion of read-only memory quickly accessible by all workers). This is the same mechanism GPUs use to transfer uniforms to all vertex shaders.
When setting the buffer for this array via the CPU, I actually also
just could have made the array read-only, which in my eyes says the
same as declaring it const. So again, when and why would I declare
something constant or global const?
Not really: when you create a read-only buffer, you are only telling OpenCL that you plan to use it read-only, so it can optimize behind the scenes, but you can still write to it from a kernel.
global const is just a safeguard for the developer so you don't accidentally write to it; doing so will give an error at compile time.
Basically the same as const in plain C host-side code: programs will also work fine if all memory is non-const.
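A minimal illustration of that safeguard (hypothetical kernel):

__kernel void safeguard_demo(__global const float *in, __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = in[gid];
    // in[gid] = 0.0f;   // uncommenting this line gives a compile-time error:
    //                   // you cannot write through a pointer-to-const
}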
When performing memory-intensive tasks, would it be better to copy the array to a local array inside the kernel? My guess is that local memory would be too small, but what if the array only had a length of 10? When would the array be too big/small? More general: when is it worth copying data from global to local memory?
It is only worth it if the data is read by multiple workers; if each worker reads a single, distinct value from global memory, it is not worth it (a sketch of the useful case follows the examples below).
Useful here:
Worker0 -> Reads 0,1,2,3
Worker1 -> Reads 0,1,2,3
Worker2 -> Reads 0,1,2,3
Worker3 -> Reads 0,1,2,3
Not useful here:
Worker0 -> Reads 0
Worker1 -> Reads 1
Worker2 -> Reads 2
Worker3 -> Reads 3
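Here is a hedged sketch of the "useful" pattern above: a made-up kernel where every worker reads the same handful of coefficients, so the group stages them in local memory once (the computation itself is only a toy; the point is the access pattern):

__kernel void shared_coeffs_demo(__global const float *input,
                                 __global float *output,
                                 __global const float *coeffs,   // e.g. 4 values read by every worker
                                 __local float *coeff_cache,     // host allocates room for coeff_count floats
                                 const int coeff_count)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    if (lid < coeff_count)                 // cooperative copy: a few workers load the shared values once
        coeff_cache[lid] = coeffs[lid];
    barrier(CLK_LOCAL_MEM_FENCE);          // wait until the whole cache is filled

    float sum = 0.0f;
    for (int i = 0; i < coeff_count; ++i)  // every worker reads 0,1,2,3... from fast local memory
        sum += coeff_cache[i];
    output[gid] = sum * input[gid];        // toy result
}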
Say I also want to pass the length of this array, then I would add
const int length to the arguments of my kernel, but I'm unsure why I
would omit the global qualifier except because I have seen other
people do it. After all, length must be accessible for all workers. If
I'm right, then length would have a generic address space, but again,
I don't really know what that means.
When you don't specify a qualifier on a kernel parameter, it typically defaults to constant, which is what you want for those small elements: fast access by all workers.
The rule OpenCL compilers normally follow for kernel parameters is: if it is only read and fits in constant memory, use constant; otherwise, use global.
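A short signature sketch of that rule (all names are made up):

__kernel void scale(__global const long *data,   // 10000 longs: too big for constant, so global const
                    __global long *result,
                    const int length)            // by-value scalar: no explicit address-space qualifier;
                                                 // small and read-only, so it gets the fast treatment
{
    int gid = get_global_id(0);
    if (gid < length)
        result[gid] = 2 * data[gid];
}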
I read that it is faster and better to keep most of your functions local instead of global.
So I'm doing this:
input = require("input")
draw = require("draw")
And then in input.lua for example:
local tableOfFunctions = {isLetter = isLetter, numpadCheck = numpadCheck, isDigit = isDigit, toUpper = toUpper}
return tableOfFunctions
Where isLetter, numpadCheck etc are local functions for that file.
Then I call the functions like so:
input.isLetter(key)
Now, my question is: am I reinventing the wheel with this? Aren't global functions stored in a Lua table anyway? I do like the way it looks with the input. before the function name; it keeps things nice and tidy, so I may keep it if it's not bad coding practice.
Reinventing wheels tailored to your personal needs is a centerpiece of Lua.
The method you describe is presented as a valid one by the Lua creator himself in his book.
Everything in Lua is stored inside tables. The "faster" local functions (as well as faster local variables) come from the way globals and upvalues are looked up.
Below is a quote of the relevant part of a more detailed explanation about speed that happened to occur on a game's forum.
Apart from that, locals are recommended for cleanliness of the code and error-proofing.
In Lua, a table is created with {}; this operator reserves a certain amount of memory for the table. That reserved space stays constant and unmovable; the exceptions are implementation details that a script writer should not concern himself with.
Any variable you assign a table to
a={};
b={ c={a} }
is just a pointer to the table in memory. A pointer takes up either 32 or 64 bits, and that's it.
Whenever you pass a table around, only that pointer is copied.
Whenever you query a table in a table:
return b.c[1]
the computer follows the pointer stored in b, finds a table object in RAM, queries it for the key "c", takes the pointer to another table, queries that for the key 1, and then returns the pointer to table a. It is just simple pointer hopping, with a workload on par with arithmetic.
Every function has an associated table, _ENV; any variable lookup
return a
is actually a query to that table
return _ENV.a
If the variable is local, it is found right there in a single step.
If there is no local with the given name, then global variables are queried; those actually reside in the top-level table, the _ENV of the root function of the script (it is the require or dofile function that loads and executes the script).
Usually, a link to the global table is stored in any other _ENV as _G. So access to a global variable
return b
is actually something like
return _ENV.b or _ENV._G.b
Thus it is about 3 pointer jumps instead of 1.
Here is a convoluted example that should give you an insight into the amount of work that implies:
-- RUN THIS IN A STANDALONE LUA INTERPRETER
local depth=100--how many pointers will be in a chain
local q={};--a table
local a={};--a start of pointer chain
local b=a; -- intermediate variable
for i=1,depth do b.a={} b=b.a end; --setup chain
local t=os.clock();
print(q)
print(os.clock()-t);--time of previous line execution
t=os.clock(); --start of pointer chain traversal
b=a
while b.a do b=b.a end
print(b)
print(os.clock()-t)--time of pointer traversal
When the pointer chain is about 100 elements long, system-load fluctuations may actually cause the second time to be smaller. Direct access gets notably faster only when you increase depth to thousands of intermediate pointers or more.
Note that whenever you query an uninitialized variable, all 3 jumps are taken.
Globals are stored in the reserved table _G (the contents of which you can examine at any time), but it is good programming practice to avoid the use of globals.
Unless there is a very good reason not to, your table input should be local as well.
From Programming in Lua:
It is good programming style to use local variables whenever possible. Local variables help you avoid cluttering the global environment with unnecessary names. Moreover, the access to local variables is faster than to global ones.
I tried to declare a __global memory chunk inside the kernel, like:
__global float arr[200];
I assumed this would create an array in global memory that I could refer to in the kernel. The program compiled successfully, but when I ran it, it reported:
error: variable with automatic storage duration
cannot be stored in the named address space
I don't know why this happens.
In order to use global memory, do we have to create a buffer on the host side before using it?
If I want to create an array shared by all the threads, what can I do, other than passing another argument for this global array?
You can allocate it at program scope, at least in OpenCL 2:
__global float arr[200];
kernel void foo()
{
    if (get_global_id(0) == 0)
        arr[0] = 3;
}
Though be careful with initialization, of course: there is no way to synchronize the work-items across the dispatch, so it is not really practical to initialize it and use it in the same kernel if you have multiple work-groups.
It doesn't really make much sense to allocate it at kernel scope. If the work-groups are serialized, what would the lifetime be of a global array allocated in the kernel code? Should it outlast a work-group, a dispatch, or stay around permanently to be shared between that kernel and the next? The obvious answer might be that it would have the same lifetime as the kernel, but then it would be impossible to initialize and use it without a race. If it is persistent across multiple kernels, then host allocation or program-scope allocation makes more sense.
Why is passing a new argument such a problem?
A __global memory object can be allocated only via an API call on the host side.
You can also use a __local memory object, which can be allocated via an API call on the host side as well as declared inside the kernel, and which is visible to all threads within the work-group.
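For reference, here is a hedged sketch of how those two cases usually look (identifiers invented; the host-side calls appear only in the comments):

__kernel void uses_both(__global float *arr,     // backed by a buffer made with clCreateBuffer on the host,
                                                 // then bound with clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf)
                        __local float *scratch)  // sized from the host: clSetKernelArg(kernel, 1, n_bytes, NULL)
{
    int lid = get_local_id(0);
    scratch[lid] = arr[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);                // scratch is now shared by the whole work-group
}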
I've been getting into some assembly lately and it's fun, as it challenges everything I have learned. I was wondering if I could ask a few questions.
When running an executable, does the entire executable get loaded into memory?
From a bit of fiddling I've found that constants aren't really constants? Is that just a compiler thing?
const int i = 5;
_asm { mov i, 0 } // i is now 0 and compiles fine
So are all variables that are assigned a constant value embedded into the file as well?
Meaning:
int a = 1;
const int b = 2;
void something()
{
    const int c = 3;
    int d = 4;
}
Will I find all of these variables embedded in the file (in a hex editor or something)?
If the executable is loaded into memory, then "constants" are technically using memory? I've read people around the net saying that constants don't use memory; is this true?
Your executable's text (i.e. code) and data segments get mapped into the process's virtual address space when the executable starts up, but the bytes might not actually be copied from the disk until those memory locations are accessed. See http://en.wikipedia.org/wiki/Demand_paging
C-language constants actually exist in memory, because you have to be able to take the address of them. (That is, &i.) Constants are usually found in the .rdata segment of your executable image.
A constant is going to take up memory somewhere: if you have the constant number 42 in your program, there must be somewhere in memory where the 42 is stored, even if that means it's stored as the operand of an immediate-mode instruction.
The OS loads the code and data segments in order to prepare them for execution.
If the executable has a resource segment, the application loads parts of it on demand.
It's true that const variables take up memory space, but compilers are free to optimize for memory usage and code size and embed their values directly in the code (in case they don't detect any address references for those variables).
const char * strings (a.k.a. C strings) are usually interned by compilers to save memory.
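To tie this together, here is a hedged C sketch (not from the original posts) of what typically happens; the exact placement is up to the compiler and linker:

const int table[4] = {1, 2, 3, 4};   /* its address is taken below, so it must exist in memory,
                                        typically in a read-only data section of the executable  */

int sum(void)
{
    const int k = 42;                /* never address-taken: the compiler may fold it into
                                        immediate operands and use no data memory for it at all  */
    int s = 0;
    for (int i = 0; i < 4; ++i)
        s += table[i] * k;           /* indexing table forms addresses into the array            */
    return s;
}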