OS memory allocation addresses - memory

Quick curious question: are memory allocation addresses chosen by the language compiler, or is it the OS that chooses the addresses for the requested memory?
This comes from a doubt about virtual memory, which can be quickly explained as "let the process think it owns all the memory". But what happens on 64-bit architectures, where only 48 bits are used for memory addresses, if the process wants a higher address?
Let's say you do int *a = malloc(sizeof(int)); and there is no memory left from the previous system call, so more must be requested from the OS. Does the compiler determine the address at which this memory is allocated, or does it just ask the OS for memory and use the address the OS returns?

It would not be the compiler, especially since this is dynamic memory allocation. Compilation is done well before you actually execute your program.
Memory reservation for static variables happens at compile time, but the memory itself is allocated at start-up, before the user-defined main() runs.
Static variables can be given space in the executable file itself, which is then memory mapped into the process address space. This is one of the few times I can imagine the compiler actually "deciding" on an address.
During dynamic memory allocation your program would ask the OS for some memory and it is the OS that returns a memory address. This address is then stored in a pointer for example.

The dynamic memory allocation in C/C++ is simply done by runtime library functions. Those functions can do pretty much as they please as long as their behavior is standards-compliant. A trivial implementation of compliant but useless malloc() looks like this:
void * malloc(size_t size) {
return NULL;
}
The requirements are fairly relaxed -- the pointer has to be suitably aligned and the pointers must be unique unless they've been previously free()d. You could have a rather silly but somewhat portable and absolutely not thread-safe memory allocator done the way below. There, the addresses come from a pool that was decided upon by the compiler.
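#include <stddef.h>
#include <stdint.h>
// 1/4 of available address space, but at most 2^30.
#define HEAPSIZE (1UL << ( ((sizeof(void*)>4) ? 4 : sizeof(void*)) * 8 - 2 ))
// A pseudo-portable alignment size for pointers to arbitrary types. Breaks
// when faced with SIMD data types.
#define ALIGNMENT (sizeof(intptr_t) > sizeof(double) ? sizeof(intptr_t) : sizeof(double))
// Padding needed to round n up to the next multiple of ALIGNMENT.
#define PADDING(n) ((ALIGNMENT - (n) % ALIGNMENT) % ALIGNMENT)
void * malloc(size_t size)
{
    static char buffer[HEAPSIZE];
    static char * next = NULL;
    void * result;
    if (next == NULL) {
        // Round the start of the pool up to an aligned address.
        next = buffer + PADDING((uintptr_t)buffer);
    }
    if (size == 0) return NULL;
    // Compare against the remaining space so the subtraction cannot wrap.
    if (size > HEAPSIZE - (size_t)(next - buffer)) return NULL;
    result = next;
    next += size;
    // Keep the next allocation aligned, but never step past the pool.
    if (PADDING(size) <= HEAPSIZE - (size_t)(next - buffer)) next += PADDING(size);
    else next = buffer + HEAPSIZE;
    return result;
}
void free(void * ptr)
{
    (void)ptr; // This allocator never reclaims memory.
}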
Practical memory allocators don't depend upon such static memory pools, but rather call the OS to provide them with newly mapped memory.
The proper way of thinking about it is: you don't know what particular pointer you are going to get from malloc(). You can only know that it's unique and points to properly aligned memory if you've called malloc() with a non-zero argument. That's all.

Related

CUDA "out of memory" with plenty of memory in the VRAM [duplicate]

Seems like there are a lot of questions on here about moving double (or int, or float, etc) 2d arrays from host to device. This is NOT my question.
I have already moved all of the data onto the GPU, and the __global__ kernel calls several __device__ functions.
In these device functions, I have tried the following:
To allocate:
__device__ double** matrixCreate(int rows, int cols, double initialValue)
{
double** temp; temp=(double**)malloc(rows*sizeof(double*));
for(int j=0;j<rows;j++) {temp[j]=(double*)malloc(cols*sizeof(double));}
//Set initial values
for(int i=0;i<rows;i++)
{
for(int j=0;j<cols;j++)
{
temp[i][j]=initialValue;
}
}
return temp;
}
To deallocate:
__device__ void matrixDestroy(double** temp,int rows)
{
for(int j=0;j<rows;j++) { free( temp[j] ); }
free(temp);
}
For single-dimension arrays the __device__ mallocs work great, but I can't seem to keep it stable in the multidimensional case. By the way, the variables are sometimes used like this:
double** z=matrixCreate(2,2,0);
double* x=z[0];
However, care is always taken to ensure no calls to free are made on active data. The code is actually an adaptation of CPU-only code, so I know nothing funny is going on with the pointers or memory. Basically I'm just redefining the allocators and putting __device__ on the serial portions. I just want to run the whole serial bit 10000 times, and the GPU seems like a good way to do it.
++++++++++++++ UPDATE +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Problem solved by Vyas. As per the CUDA specification, the device heap size is initially set to 8 MB; if your mallocs exceed this, Nsight will not launch and the kernel crashes. Use the following in host code:
float increaseHeap=10;
cudaDeviceSetLimit(cudaLimitMallocHeapSize, size[0]*increaseHeap);
Worked for me!
The GPU-side malloc() is a suballocator from a limited heap. Depending on the number of allocations, the heap may be getting exhausted. You can change the size of the backing heap using cudaDeviceSetLimit(cudaLimitMallocHeapSize, size_t size); the limit must be set before launching any kernel that uses device-side malloc(). For more info, see the CUDA programming guide.

Process memory mapping in C++

#include <iostream>
int main(int argc, char** argv) {
int* heap_var = new int[1];
/*
* Page size 4KB == 4*1024 == 4096
*/
heap_var[1025] = 1;
std::cout << heap_var[1025] << std::endl;
return 0;
}
// Output: 1
In the above code, I allocated 4 bytes of space on the heap. Since the OS maps virtual memory to physical memory in pages (4 KB each), a 4 KB block of my virtual memory's heap would get mapped to physical memory. As a test I tried to access other addresses in my allocated page/heap block, and it worked. However, I shouldn't have been allowed to access more than 4096 bytes from the start (index 1025 is past that, since an int is 4 bytes).
I'm confused about why I can access an offset of 4*1025 bytes (more than the size of the allocated page) from the start of the heap block and not get a segfault.
Thanks.
The platform allocator likely reserved far more than one page, since it plans to use that memory "bucket" for other allocations and probably keeps some internal state there; in release builds the virtual memory chunk is likely much larger than a single page. You also don't know where within that particular page the memory has been allocated (you can find out by masking some bits), and without knowing the platform/arch (I'm assuming x86_64) there is no telling that this page is even 4 KB; it could be a 2 MB "huge" page or something similar.
But by accessing memory outside the array bounds you're triggering undefined behavior, which can show up as crashes in the case of reads or data corruption in the case of writes.
Don't use memory that you don't own.
I should also mention that this is likely unrelated to C++, since the new[] operator usually just invokes malloc/calloc behind the scenes in the core platform library (be that libSystem on OSX, glibc or musl or whatever else on Linux, or even an intercepting allocator). The segfaults you experience usually come from guard pages around heap blocks or, in the absence of guard pages, from simply touching unmapped memory.
NB: Don't try this at home. There are cases where you may intentionally trigger what would be considered undefined behavior in general, because on that specific platform you know exactly what's there. A good example is abusing the opaque pthread_t on Linux to get the tid without the overhead of an extra syscall, but you have to make sure you're using the right libc, the right build type of that libc, the right version of that libc, the right compiler that it was built with, etc.

shm_open and mmap with stl vector

I've read that the STL vector does not work well with SYS V shared memory. But if I use POSIX shm_open and then mmap with NULL (mmap(NULL, LARGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)), give it a much larger size than my object, which contains my vector, and add additional items to the vector after mapping, can there be a problem other than exceeding the LARGE_SIZE space? A related question: is it guaranteed on a recent SUSE Linux that, when mapped to the same start address (using the above syntax) in unrelated processes, my object will be mapped directly, with no (system) copy performed to propagate changed values between the processes (as happens with a normal open and normal files when mmap-ed)?
Thanks!
Edit:
Is this correct then?:
void* mem = allocate_memory_with_mmap(); // say from a shared region
MyType* ptr = new ( mem ) MyType( args );
ptr->~MyType(); //is this really needed?
now in an unrelated process:
MyType* myptr = (MyType*)fetch_address_from_mmap(...)
myptr->printHelloWorld();
myptr->myvalue = 1; //writes to shared memory
myptr->~MyType(); //is this really needed?
now if I want to free memory
munmap(address...) //but this done only once, when none of the processes use it any more
You are missing the fact that the STL vector is usually just a tuple of (mem pointer, mem size, element count), where actual memory for contained objects is received from the allocator template parameter.
Placing an instance of std::vector in shared memory does not make any sense. You probably want to check out boost::interprocess library instead.
Edit 0:
Memory allocation and object construction are two distinct phases, though they are combined in a single statement like the one below (unless operator new is redefined for MyType):
// allocates from process heap and constructs
MyType* ptr = new MyType( args );
You can split these two phases with placement new:
void* mem = allocate_memory_somehow(); // say from a shared region
MyType* ptr = new ( mem ) MyType( args );
Though now you will have to explicitly call the destructor and release the memory:
ptr->~MyType();
release_memory_back_to_where_it_came_from( ptr );
This is essentially how you can construct objects in shared memory in C++. Note though that types that store pointers are not suitable for shared memory since any pointer in one process memory space does not make any sense in the other. Use explicit sizes and offsets instead.
Hope this helps.

Does continuous reassigning of character strings lead to a memory leak?

I have two questions:
Q1. Character pointers are used to point to a location where a given string is stored. If we keep reassigning the string, does it lead to a memory leak?
On a Linux system I see:
$ cat chk.c
#include <stdio.h>
#define VP (void *)
int main()
{
char *str;
str = "ABC";
printf("str = %p points to %s\n", VP str, str);
str = "CBA";
printf("str = %p points to %s\n", VP str, str);
return 0;
}
$ cc chk.c && ./a.out
str = 0x8048490 points to ABC
str = 0x80484ab points to CBA
$
Q2. What is the maximum length of a string that can be assigned as above?
Can your sample code memory leak?
No.
You are assigning constant strings already in your program so no extra memory allocation happens.
Memory leaks come from forgotten malloc()-type calls, or from calls that internally do malloc()-type operations that you may not be aware of. Beware of functions that return a pointer to memory. Some, such as strdup(), return heap memory that leaks unless the caller frees it; others return a pointer to internal static storage and tend not to be thread safe. Better are functions like snprintf(), where the caller provides both a memory buffer and a maximum size. These functions don't leak.
Maximum string length: there tends to be no artificial limit except available memory. Memory on the stack may be limited by various constraints (char can_be_too_big[1000000] may fail), but memory from malloc() is limited only by how much free memory you have (char *ok = malloc(1000000)). In theory SIZE_MAX, the largest value of size_t, bounds the amount you can allocate, but in practice the limit is much smaller.
Memory leaks only occur when we allocate memory using malloc/realloc/calloc and forget to free it. In the above example, nowhere are we allocating memory ourselves, so there are no memory leaks AFAIK.
OK, to be more specific, usually what happens (OS-specific, but AFAIK this is universal, possibly in a spec somewhere) is that somewhere in your executable's image are the strings "ABC" and "CBA"; these are embedded in your program itself. When you do str="ABC" you are saying, "I want this string pointer to point to the address in my compiled program that contains the string ABC". This is why documentation distinguishes between "strings" at runtime and "string literals". Since you didn't allocate space for your literal (the compiler baked it into your program), you don't have to deallocate space for it.
Anyway, when your process gets unloaded the OS frees up this resource as a natural side effect of unloading your program. In fact, in general, it is impossible to leak after a program exits because the OS will deallocate any resources you forget to, even heinous leaks, on program exit. (this is not entirely true - you can cause another program, which is not unloaded, to leak if you do linked library stuff - but it's close enough). It's just one of those things that the OS takes care of.
When you allocate memory for some pointer p and then change the value of p without freeing that memory or making another pointer point to it, a memory leak occurs.
(E.g)
char* p = malloc(sizeof(char)*n);
char* q= "ABC";
Then if you assign,
p=q;
Then there will be a memory leak.
If you don't use memory allocation, then there won't be any memory leak.
And,
char* q= "ABC";
In this statement q points to a string literal stored in a constant location. Hence the characters q points to must not be modified.
(E.g)
char* q = "ABC";
q[1] = 'b';
These statements invoke undefined behavior and will typically result in a segmentation fault.

Beginner assembly programming memory usage question

I've been getting into some assembly lately and it's fun, as it challenges everything I have learned. I was wondering if I could ask a few questions.
When running an executable, does the entire executable get loaded into memory?
From a bit of fiddling I've found that constants aren't really constants. Is it just a compiler thing?
const int i = 5;
_asm { mov i, 0 } // i is now 0 and compiles fine
So are all variables assigned with a constant value embedded into the file as well?
Meaning:
int a = 1;
const int b = 2;
void something()
{
const int c = 3;
int d = 4;
}
Will I find all of these variables embedded in the file (in a hex editor or something)?
If the executable is loaded into memory, then "constants" are technically using memory? I've read people on the net saying that constants don't use memory. Is this true?
Your executable's text (i.e. code) and data segments get mapped into the process's virtual address space when the executable starts up, but the bytes might not actually be copied from the disk until those memory locations are accessed. See http://en.wikipedia.org/wiki/Demand_paging
C-language constants actually exist in memory, because you have to be able to take the address of them. (That is, &i.) Constants are usually found in the .rdata segment of your executable image.
A constant is going to take up memory somewhere--if you have the constant number 42 in your program, there must be somewhere in memory where the 42 is stored, even if that means that it's stored as the argument of an immediate-mode instruction.
The OS loads the code and data segments in order to prepare them for execution.
If the executable has a resource segment, the application loads parts of it on demand.
It's true that const variables take memory space, but compilers are free to optimize for memory usage and code size and embed their values in the code (provided they don't detect any address references to those variables).
String literals (const char *, a.k.a. C strings) are usually pooled by compilers to save memory.

Resources