How can I solve this problem: I want to use a byte buffer of 1 MB or more. With a static array this is not possible because I get a stack overflow. I have thought about GetMem and FreeMem, or using TMemoryStream, but I have not understood exactly how to solve it. I need the buffer to copy a file using TFileStream with Read/Write.
I don't want to load the whole file into memory at once and then write it to disk in one go; I have found a solution for that, but it is not what I need.
Thanks very much. Daniela.
If you have a stack overflow then your variable doesn't fit on the stack. You are clearly using a local variable.
Solve the problem by using the heap instead. Either GetMem or SetLength.
One easy solution is to use a dynamic array. Its data is allocated on the heap, so you will avoid the stack overflow. The advantage of dynamic arrays over working directly with memory allocation functions is that they are reference counted, so the memory they allocate is automatically freed once the last reference goes out of scope.
var buffer:array of byte;
begin
SetLength(buffer,100000);
...
//Will be freed here as buffer goes out of scope
end;
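For the file copy described in the question, a minimal sketch could look like the following. It assumes a 1 MiB dynamic-array buffer and a hypothetical helper name, CopyFileBuffered; the unit names are written without scope prefixes (SysUtils, Classes), which may need adjusting to System.SysUtils and System.Classes depending on project settings.

```pascal
program BufferedCopy;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  BufferSize = 1024 * 1024; // 1 MiB, allocated on the heap

// Copies SrcName to DstName in BufferSize chunks and returns the bytes copied.
// CopyFileBuffered is a hypothetical helper name used for illustration.
function CopyFileBuffered(const SrcName, DstName: string): Int64;
var
  Src, Dst: TFileStream;
  Buffer: array of Byte; // dynamic array: only the reference is on the stack
  BytesRead: Integer;
begin
  Result := 0;
  SetLength(Buffer, BufferSize);
  Src := TFileStream.Create(SrcName, fmOpenRead or fmShareDenyWrite);
  try
    Dst := TFileStream.Create(DstName, fmCreate);
    try
      repeat
        // Read returns the number of bytes actually read (0 at end of file)
        BytesRead := Src.Read(Buffer[0], BufferSize);
        if BytesRead > 0 then
        begin
          Dst.WriteBuffer(Buffer[0], BytesRead);
          Inc(Result, BytesRead);
        end;
      until BytesRead <= 0;
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
  // Buffer is released automatically when the function returns
end;

begin
  if ParamCount = 2 then
    Writeln(CopyFileBuffered(ParamStr(1), ParamStr(2)), ' bytes copied')
  else
    Writeln('usage: BufferedCopy <source> <destination>');
end.
```

Only one buffer's worth of the file is in memory at any time, so the file size does not matter.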
Your buffer variable is allocated on the stack, and the default maximum stack size used by the Delphi compiler is 1 MiB. So one solution is to set a higher limit, either in the project options or with the following global directive:
{$MAXSTACKSIZE 4194304} // eg. now maximum is 4 MiB
The other way is to use the heap instead of the stack, through any kind of dynamically allocated memory. In your case the best solution is probably a dynamic array.
Performance note: allocating on the stack is faster than allocating on the heap.
In the following procedure, will the array be allocated on the stack?
procedure One;
var
arr: array[0..1023] of byte;
begin
end;
What is the largest item that can go on the stack?
Is there a speed difference between accessing a variable on the stack and on the heap?
In the following procedure, will the array be allocated on the stack?
Yes, provided that the local variable is not captured by an anonymous method. Captured local variables reside on the heap.
What is the largest item that can go on the stack?
It depends on how large the stack is, and how much of the stack has already been used, and how much of the stack is used by calls made by the function itself. The stack is a fixed size, determined when the thread is created. The stack overflows if it grows beyond that size. On Windows at least, the default stack size is 1MB, so I would not expect you to encounter problems with a 1KB array as can be seen here.
Is there a speed difference between accessing a variable on the stack and on the heap?
By and large no, but again this depends. Variables on the stack are probably accessed more frequently, and so are more likely to be in the CPU cache. But for a decently sized object, like the 1KB array we can see here, I would not expect there to be any difference in access time. In terms of the underlying memory architecture, there's no difference between stack and heap, it's all just memory.
Now, where there is a difference in performance is in allocation. Heap allocation is more expensive than stack allocation. And especially if you have a multi-threaded application, heap allocation can be a bottleneck. In particular, the default Delphi memory manager does not scale well in multi-threaded use.
Let's go back to the basics. Frankly, I have never used the New and Dispose functions before. However, after reading the New() documentation and the included examples on Embarcadero Technologies' website, and the Delphi Basics explanation of New(), I am left with some questions:
What are the advantages of using System.New() instead of a local variable, other than just sparing a tiny amount of memory?
Common code examples for New() are more or less as follows:
var
pCustRec : ^TCustomer;
begin
New(pCustRec);
pCustRec^.Name := 'Her indoors';
pCustRec^.Age := 55;
Dispose(pCustRec);
end;
In what circumstances is the above code more appropriate than the code below?
var
CustRec : TCustomer;
begin
CustRec.Name := 'Her indoors';
CustRec.Age := 55;
end;
If you can use a local variable, do so. That's a rule with practically no exceptions. This results in the cleanest and most efficient code.
If you need to allocate on the heap, use dynamic arrays, GetMem or New. Use New when allocating a record.
Examples of being unable to use the stack include structures whose size is not known at compile time, or very large structures. But for records, which are the primary use case for New, these concerns seldom apply.
So, if you are faced with a choice of stack vs heap for a record, invariably the stack is the correct choice.
From a different perspective:
Both can suffer from buffer overflow and can be exploited.
If a local variable overflows, you get stack corruption.
If a heap variable overflows, you get heap corruption.
Some say that stack corruptions are easier to exploit than heap corruptions, but that is not true in general.
Note that there are various mechanisms in operating systems, processor architectures, libraries and languages that try to help prevent these kinds of exploits.
For instance there is DEP (Data Execution Prevention), ASLR (Address Space Layout Randomization) and more are mentioned at Wikipedia.
A statically declared local variable reserves space on the limited stack. Dynamically allocated memory is located on the heap, which is basically all the memory available.
As mentioned, the stack space is limited, so you should avoid large local variables and also large parameters which are passed by value (absence of var/const in the parameter declaration).
A word on memory usage:
1. Simple types (Integer, Char, Double, etc.) are located directly on the stack. The number of bytes used can be determined with the SizeOf(variable) function.
2. The same applies to record variables and static arrays.
3. Pointers and object references require 4/8 bytes; so do string and dynamic array variables, which are references to heap data.
Every object (that is, class instances) is always allocated on the heap.
Value structures (simple numerical types, records containing only those types) can be allocated on the stack or on the heap.
The content of dynamic arrays and strings is always allocated on the heap. Only the reference pointer can be allocated on the stack. If you write:
procedure MyFunc;
var s: string;
...
Here, 4/8 bytes are allocated on the stack, but the string content (the text characters) will always be allocated on the heap.
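As a small check of that claim, the sketch below compares SizeOf for a local string variable with SizeOf(Pointer). The helper name LocalStringFootprint is made up for this example, and it assumes long strings are in effect (the default in Delphi; forced here with {$H+} for Free Pascal).

```pascal
program StringOnStack;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

// Returns the size of a local string variable itself, in bytes.
// LocalStringFootprint is a hypothetical name used for illustration.
function LocalStringFootprint(CharCount: Integer): Integer;
var
  s: string;
begin
  s := StringOfChar('x', CharCount); // the characters live on the heap
  Result := SizeOf(s);               // only the reference is measured here
end;

begin
  Writeln('SizeOf(Pointer)         = ', SizeOf(Pointer));
  Writeln('10-char string variable = ', LocalStringFootprint(10));
  Writeln('1000000-char string var = ', LocalStringFootprint(1000000));
end.
```

All three numbers should be the same on a given platform, whatever the length of the text.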
So using New()/Dispose() brings little benefit. If the record contains no reference-counted types, you may use GetMem()/FreeMem() instead, since there is no internal pointer to set to zero.
The main drawback of New()/Dispose() is that, in case an exception occurs, you need to use a try...finally block:
var
pCustRec : ^TCustomer;
begin
New(pCustRec);
try
pCustRec^.Name := 'Her indoors';
pCustRec^.Age := 55;
finally
Dispose(pCustRec);
end;
end;
Whereas allocating on the stack lets the compiler do it for you, in a hidden manner:
var
CustRec : TCustomer;
begin // here a try... is generated
CustRec.Name := 'Her indoors';
CustRec.Age := 55;
end; // here a finally + CustRec cleaning is generated
That's why I almost never use New()/Dispose(), but allocate on stack, or even better within a class.
The usual case for heap allocation is when the object must outlive the function that created it:
It is being returned as a function result or via a var/out parameter, either directly or by returning some container.
It's being stored in some object, struct or collection that is passed in or otherwise accessible inside the procedure (this includes being signaled/queued off to another thread).
In cases of limited stack space you might prefer allocation from the heap.
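To illustrate the "must outlive the function" case, here is a minimal sketch in which a record is created with New inside a function and returned to the caller. The TCustomer declaration is an assumption modelled on the examples above (its real definition is not shown in the thread), and NewCustomer is a hypothetical helper.

```pascal
program OutliveDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Assumed layout, modelled on the TCustomer used in the examples above
  PCustomer = ^TCustomer;
  TCustomer = record
    Name: string;
    Age: Integer;
  end;

// The record lives on the heap, so it is still valid after the function returns.
function NewCustomer(const AName: string; AAge: Integer): PCustomer;
begin
  New(Result); // allocates the record and initialises the managed Name field
  Result^.Name := AName;
  Result^.Age := AAge;
end;

var
  Cust: PCustomer;
begin
  Cust := NewCustomer('Her indoors', 55);
  try
    Writeln(Cust^.Name, ' is ', Cust^.Age);
  finally
    Dispose(Cust); // finalises Name and releases the block
  end;
end.
```

A local TCustomer variable inside NewCustomer could not be used this way, because its storage is gone as soon as the function returns.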
I would like to allocate space (dynamic size) with a byte array and get a pointer to the "spacearea" and free it later if I don't need it anymore.
I know about VirtualAlloc, VirtualAllocEx and LocalAlloc.
Which one is the best and how can I free the memory afterwards?
Thank you for your help.
I don't think it is a good idea to use the WinAPI for that instead of the native Pascal functions.
You can simply define an array of bytes as
var yourarray: array of byte;
then it can be allocated by
setlength(yourarray, yoursize);
and freed by
setlength(yourarray, 0);
Such an array is reference counted and you can access individual bytes as yourarray[byteid]
Or if you really want pointers, you can use:
var p: pointer;
GetMem(p, yoursize);
FreeMem(p);
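Put together, both variants might look like the sketch below. The function names (FillAndSum, SumViaDynArray, SumViaGetMem) are hypothetical; the point is only that @Buf[0] gives a pointer to the first byte of a dynamic array, and that a GetMem block should be released in a finally section.

```pascal
program HeapBuffers;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Fills Count bytes starting at P with Value and returns their sum.
function FillAndSum(P: PByte; Count: Integer; Value: Byte): Int64;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to Count - 1 do
  begin
    P^ := Value;
    Inc(Result, P^);
    Inc(P);
  end;
end;

// Variant 1: dynamic array; @Buf[0] is a pointer to the first byte (Size > 0).
function SumViaDynArray(Size: Integer): Int64;
var
  Buf: array of Byte;
begin
  SetLength(Buf, Size);
  Result := FillAndSum(@Buf[0], Size, 1);
  SetLength(Buf, 0); // optional: happens anyway when Buf goes out of scope
end;

// Variant 2: raw block from GetMem; it must be released with FreeMem.
function SumViaGetMem(Size: Integer): Int64;
var
  P: Pointer;
begin
  GetMem(P, Size);
  try
    Result := FillAndSum(P, Size, 1);
  finally
    FreeMem(P);
  end;
end;

begin
  Writeln(SumViaDynArray(2 * 1024 * 1024)); // 2097152
  Writeln(SumViaGetMem(2 * 1024 * 1024));   // 2097152
end.
```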
It is better to use GetMem/FreeMem, a dynamic array, or a RawByteString. Note that GetMem/FreeMem, dynamic arrays and RawByteString all use the heap, not the stack, for their allocation.
There is no benefit in using VirtualAlloc/VirtualFree instead of GetMem/FreeMem. For big blocks, the memory manager (which implements the heap) will call the VirtualAlloc/VirtualFree APIs itself, but for smaller blocks it is more efficient to rely on the heap.
Since memory from VirtualAlloc/VirtualFree is local to the current process, the only reason to use it is to create a memory block that can hold executable code, e.g. for stub wrappers of classes or interfaces, possibly via the VirtualAllocEx/VirtualFreeEx APIs (but I doubt that is what you need).
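If you want memory shared between processes, note that GlobalAlloc/GlobalFree do not provide it on 32-bit and 64-bit Windows, where they allocate from the process's own heap. Use a memory-mapped file (CreateFileMapping/MapViewOfFile) instead.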
VirtualAlloc is a page allocation function. It is the low-level user-space function for allocating memory. But you must understand that the memory returned from VirtualAlloc is aligned to a multiple of the page size.
On 32-bit Windows the page size is normally 4096 bytes. On other systems it might be larger.
So this makes VirtualAlloc useful when you need whole pages of memory. VirtualAlloc can allocate large "ranges of pages". The pages are virtual: they are mappings onto underlying system RAM and may be swapped out to the page file at any time, which is why it is called VirtualAlloc, with the emphasis on virtual.
Using VirtualAlloc and VirtualAllocEx you can also just reserve some pages of memory. Reserved pages are a range that is held in the reserved state until you are sure it will be used, at which point you can commit the pages, and only then are the underlying resources needed for the pages allocated.
Use VirtualFree to free the pages you allocated or reserved with VirtualAlloc.
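The reserve, commit and free steps can be sketched as follows. This is Windows-only: the sizes and the function name ReserveThenCommit are assumptions for illustration, the unit is referred to as Windows (Winapi.Windows in newer Delphi versions, depending on unit scope settings), and on other platforms the sketch compiles to a stub that just returns False.

```pascal
program ReserveCommit;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{$IFDEF MSWINDOWS}
uses
  Windows;

const
  ReserveSize = 16 * 1024 * 1024; // address space to reserve (illustrative)
  CommitSize  = 64 * 1024;        // part that actually gets backing storage

// Reserves a range, commits its first CommitSize bytes, writes to them and
// releases everything. Returns True if every step succeeded.
function ReserveThenCommit: Boolean;
var
  Base, Committed: Pointer;
begin
  Result := False;
  Base := VirtualAlloc(nil, ReserveSize, MEM_RESERVE, PAGE_NOACCESS);
  if Base = nil then Exit;
  try
    Committed := VirtualAlloc(Base, CommitSize, MEM_COMMIT, PAGE_READWRITE);
    if Committed = nil then Exit;
    FillChar(Committed^, CommitSize, $AA); // safe: these pages are committed
    Result := PByte(Committed)^ = $AA;
  finally
    VirtualFree(Base, 0, MEM_RELEASE); // the size must be 0 with MEM_RELEASE
  end;
end;
{$ELSE}
function ReserveThenCommit: Boolean;
begin
  Result := False; // VirtualAlloc is a Windows API; nothing to demonstrate here
end;
{$ENDIF}

begin
  Writeln('reserve/commit succeeded: ', ReserveThenCommit);
end.
```

Touching the reserved but uncommitted part of the range would raise an access violation, which is the practical difference between the two states.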
The difference between VirtualAlloc and LocalAlloc is that LocalAlloc allocates from a heap, and a heap is a mechanism of allocating blocks of memory from much larger blocks of reserved pages. Internally, a heap allocates large sections of memory using VirtualAlloc, and then divides those pages up into smaller blocks that you see as buffers returned from functions like malloc, getmem and LocalAlloc.
LocalAlloc could be thought of as the Windows built-in version of malloc or getmem. A call to LocalAlloc is similar to calling malloc in C++ or to calling getmem in Delphi. In fact you could override GetMem in Delphi to use LocalAlloc and your Delphi application would probably run just the same.
Call LocalFree to free some memory allocated with LocalAlloc. Internally this will mark the block of memory as available to the next caller.
So the main consideration now when deciding is on overhead. If you need to allocate often then you should use LocalAlloc or getmem, because committing and reserving virtual pages is a more time consuming process.
In other words, use getmem or LocalAlloc unless you have a very special reason not to.
In all my tests with Delphi 5 versus C++ compilers the Delphi 5 getmem was faster, although that was five years ago. Since then allocators like hoard are available that might be faster. But it is hard to say what is faster when there are so many variables.
But for sure all the heap functions like LocalAlloc, malloc and getmem should be much faster than allocating and freeing with VirtualAlloc, which is normally used to reserve memory internally for heap functions like LocalAlloc and getmem.
For Pascal programs, prefer getmem or SetLength because this is more portable. Or you can write your own wrapper function to LocalAlloc or whatever the OS heap function is.
The functions that you have listed are WinAPI functions, which are platform dependent. Obviously, for deallocation you should use the functions of the same API that you used for allocation.
If you want to use the Delphi memory manager, then GetMemory and FreeMemory are the obvious choice. However, if you need your pointer to be aligned to the system page size (which is a requirement for some low-level libraries) or you are going to use large buffer sizes, then the Windows API virtual memory functions VirtualAlloc and VirtualFree are your best friends.
Ok, I asked about the difference between a stack overflow and a buffer overflow yesterday and almost got voted down to oblivion, with no new information.
So it got me thinking and I decided to rephrase my question in the hopes that I get reply which actually solves my issue.
So here goes nothing.
I am aware of four memory segments (correct me if I am wrong): the code, data, stack and heap. Now AFAIK the code segment stores the code, while the data segment stores the data related to the program. What seriously confuses me is the purpose of the stack and the heap!
From what I have understood, when you run a function, all the related data to the function gets stored in the stack and when you recursively call a function inside a function, inside of a function... While the function is waiting on the output of the previous function, the function and its necessary data don't pop out of the stack. So you end up with a stack overflow. (Again please correct me if I am wrong)
Also I know what the heap is for. As I have read someplace, it's for dynamically allocating data when a program is executing. But this raises more questions than it answers. What happens when I initially initialize my variables in the code.. Are they in the code segment or in the data segment or in the heap? Where do arrays get stored? Is it that after my code executes all that was in my heap gets erased? All in all, please tell me about the heap in a more simplified manner than just "it's for malloc and alloc", because I am not sure I completely understand what those terms are!
I hope people when answering don't get lost in the technicalities and can keep the terms simple for a layman to understand (even if the concept to be described isn't laymanish) and keep educating us with the technical terms as we go along. I also hope this is not too big a question, because I seriously think these questions could not be asked separately!
What is the stack for?
Every program is made up of functions / subroutines / whatever your language of choice calls them. Almost always, those functions have some local state. Even in a simple for loop, you need somewhere to keep track of the loop counter, right? That has to be stored in memory somewhere.
The thing about functions is that the other thing they almost always do is call other functions. Those other functions have their own local state - their local variables. You don't want your local variables to interfere with the locals in your caller. The other thing that has to happen is, when FunctionA calls FunctionB and then has to do something else, you want the local variables in FunctionA to still be there, and have their same values, when FunctionB is done.
Keeping track of these local variables is what the stack is for. Each function call is done by setting up what's called a stack frame. The stack frame typically includes the return address of the caller (for when the function is finished), the values for any method parameters, and storage for any local variables.
When a second function is called, then a new stack frame is created, pushed onto the top of the stack, and the call happens. The new function can happily work away on its stack frame. When that second function returns, its stack frame is popped (removed from the stack) and the caller's frame is back in place just like it was before.
So that's the stack. So what's the heap? It's got a similar use - a place to store data. However, there's often a need for data that lives longer than a single stack frame. It can't go on the stack, because when the function call returns, its stack frame is cleaned up and boom - there goes your data. So you put it on the heap instead. The heap is a basically unstructured chunk of memory. You ask for x number of bytes, and you get it, and can then party on it. In C / C++, heap memory stays allocated until you explicitly deallocate. In garbage collected languages (Java/C#/Python/etc.) heap memory will be freed when the objects on it aren't used anymore.
To tackle your specific questions from above:
What's the difference between a stack overflow and a buffer overflow?
They're both cases of running over a memory limit. A stack overflow is specific to the stack; you've written your code (recursion is a common, but not the only, cause) so that it has too many nested function calls, or you're storing a lot of large stuff on the stack, and it runs out of room. Most OS's put a limit on the maximum size the stack can reach, and when you hit that limit you get the stack overflow. Modern hardware can detect stack overflows and it's usually doom for your process.
A buffer overflow is a little different. So first question - what's a buffer? Well, it's a bounded chunk of memory. That memory could be on the heap, or it could be on the stack. But the important thing is you have X bytes that you know you have access to. You then write some code that writes X + more bytes into that space. The compiler has probably already used the space beyond your buffer for other things, and by writing too much, you've overwritten those other things. Buffer overruns are often not seen immediately, as you don't notice them until you try to do something with the other memory that's been trashed.
Also, remember how I mentioned that return addresses are stored on the stack too? This is the source of many security issues due to buffer overruns. You have code that uses a buffer on the stack and has an overflow vulnerability. A clever hacker can structure the data that overflows the buffer to overwrite that return address, to point to code in the buffer itself, and that's how they get code to execute. It's nasty.
What happens when I initially initialize my variables in the code.. Are they in the code segment or in the data segment or in the heap?
I'm going to talk from a C / C++ perspective here. Assuming you've got a variable declaration:
int i;
That reserves (typically) four bytes on the stack. If instead you have:
char *buffer = malloc(100);
That actually reserves two chunks of memory. The call to malloc allocates 100 bytes on the heap. But you also need storage for the pointer, buffer. That storage is, again, on the stack, and on a 32-bit machine will be 4 bytes (64-bit machine will use 8 bytes).
Where do arrays get stored...???
It depends on how you declare them. If you do a simple array:
char str[128];
for example, that'll reserve 128 bytes on the stack. C never hits the heap unless you explicitly ask it to by calling an allocation method like malloc.
If instead you declare a pointer (like buffer above) the storage for the pointer is on the stack, the actual data for the array is on the heap.
Is it that after my code executes all that was in my heap gets erased...???
Basically, yes. The OS will clean up the memory used by a process after it exits. The heap is a chunk of memory in your process, so the OS will clean it up. Although it depends on what you mean by "clean it up." The OS marks those chunks of RAM as now free, and will reuse it later. If you had explicit cleanup code (like C++ destructors) you'll need to make sure those get called, the OS won't call them for you.
All in all, please tell me about heap in a more simplified manner than just, its for malloc and alloc?
The heap is, much like its name, a bunch of free bytes that you can grab a piece at a time, do whatever you want with, then throw back to use for something else. You grab a chunk of bytes by calling malloc, and you throw it back by calling free.
Why would you do this? Well, there's a couple of common reasons:
You don't know how many of a thing you need until run time (based on user input, for example). So you dynamically allocate on the heap as you need them.
You need large data structures. On Windows, for example, a thread's stack is limited by default to 1 meg. If you're working with large bitmaps, for example, that'll be a fast way to blow your stack and get a stack overflow. So you grab that space off the heap, which is usually much, much larger than the stack.
The code, data, stack and heap?
Not really a question, but I wanted to clarify. The "code" segment contains the executable bytes for your application. Typically code segments are read only in memory to help prevent tampering. The data segment contains constants that are compiled into the code - things like strings in your code or array initializers need to be stored somewhere, the data segment is where they go. Again, the data segment is typically read only.
The stack is a writable section of memory, and usually has a limited size. The OS will initialize the stack and the C startup code calls your main() function for you. The heap is also a writable section of memory. It's reserved by the OS, and functions like malloc and free manage getting chunks out of it and putting them back.
So, that's the overview. I hope this helps.
With respect to the stack: this is precisely where the parameters and local variables of functions and procedures are stored. To be more precise, only the parameters and local variables of the currently executing function are accessible from the stack. Variables that belong to the chain of functions called before it are still on the stack, but are not accessible until the current function has completed its work.
With respect to global variables, I believe these are stored in the data segment and are always accessible from any function within the program.
With respect to the heap: this is additional memory that can be allotted to your program whenever you need it (malloc or new). You need to know where the allocated memory is in the heap (its address, i.e. a pointer) so that you can access it when you need to. In case you lose the address, the memory becomes inaccessible, but the data still remains there. Depending on the platform and language, this memory has to be either freed manually by your program (or a memory leak occurs) or garbage collected. The heap is comparatively huge next to the stack and hence can be used to store large volumes of data (like files, streams etc). That's why objects and files are created on the heap and a pointer to the object or file is stored on the stack.
In terms of C/C++ programs, the data segment stores static (global) variables, the stack stores local variables, and the heap stores dynamically allocated variables (anything you malloc or new to get a pointer to). The code segment only stores the machine code (the part of your program that gets executed by the CPU).
GetMem allows you to allocate a buffer of arbitrary size. Somewhere, the size information is retained by the memory manager, because you don't need to tell it how big the buffer is when you pass the pointer to FreeMem.
Is that information for internal use only, or is there any way to retrieve the size of the buffer pointed to by a pointer?
It would seem that the size of a block referenced by a pointer returned by GetMem() must be available from somewhere, given that FreeMem() does not require that you identify the size of memory to be freed - the system must be able to determine that, so why not the application developer?
But, as others have said, the precise details of the memory management involved are NOT defined by the system per se.... Delphi has always had a replaceable memory manager architecture, and the "interface" defined for compatible memory managers does not require that they provide this information for an arbitrary pointer.
The default memory manager will maintain the necessary information in whatever way suits it, but some other memory manager will almost certainly use an entirely different, if superficially similar, mechanism. So even if you hack a solution based on intimate knowledge of one memory manager, your solution will almost certainly break if you change the memory manager (or if it is changed for you, e.g. by a change in the system-defined memory manager which you are perhaps using by default, as occurred between Delphi 2005 and 2006).
In general, it's not an unreasonable assumption on the part of the RTL/memory manager that the application should already know how big a piece of memory a GetMem() allocated pointer refers to, given that the application asked for it in the first place! :)
And if your application did NOT allocate the pointer then your application's memory manager has absolutely no way of knowing how big the block it references may be. It may be a pointer into the middle of some larger block, for example - only the source of the pointer can possibly know how it relates to the memory it references!
But, if your application really does need to maintain such information about its own pointers, then it could of course easily devise a means to achieve this with a simple singleton class or function library through which GetMem()/FreeMem() requests are routed, to maintain a record of the associated requested size for each currently allocated pointer. Such a mechanism could then of course easily expose this information as required, entirely reliably and independently of whatever memory manager is in use.
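A function-library version of that idea could be sketched like this. Everything here (TAllocEntry, Entries, TrackedGetMem, TrackedSize, TrackedFreeMem) is hypothetical naming; the table is a plain dynamic array with a linear search and is not thread-safe, so treat it as an illustration rather than something to ship.

```pascal
program SizeRegistry;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TAllocEntry = record
    Ptr: Pointer;
    Size: NativeUInt;
  end;

var
  Entries: array of TAllocEntry; // hypothetical bookkeeping table, not thread-safe

// Returns the index of P in the table, or -1 if P was not allocated through it.
function FindEntry(P: Pointer): Integer;
var
  i: Integer;
begin
  Result := -1;
  for i := 0 to High(Entries) do
    if Entries[i].Ptr = P then
    begin
      Result := i;
      Exit;
    end;
end;

// Allocates through the normal memory manager and records the requested size.
function TrackedGetMem(Size: NativeUInt): Pointer;
begin
  GetMem(Result, Size);
  SetLength(Entries, Length(Entries) + 1);
  Entries[High(Entries)].Ptr := Result;
  Entries[High(Entries)].Size := Size;
end;

// Requested size of a tracked block; 0 for pointers the table does not know.
function TrackedSize(P: Pointer): NativeUInt;
var
  i: Integer;
begin
  i := FindEntry(P);
  if i >= 0 then
    Result := Entries[i].Size
  else
    Result := 0;
end;

procedure TrackedFreeMem(P: Pointer);
var
  i: Integer;
begin
  i := FindEntry(P);
  if i < 0 then Exit;                   // not ours: leave it alone
  Entries[i] := Entries[High(Entries)]; // overwrite with the last entry
  SetLength(Entries, Length(Entries) - 1);
  FreeMem(P);
end;

var
  P: Pointer;
begin
  P := TrackedGetMem(1000);
  Writeln(TrackedSize(P)); // 1000
  TrackedFreeMem(P);
  Writeln(TrackedSize(P)); // 0
end.
```

Because the size is stored by the application itself, the result does not depend on which memory manager is installed.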
This may in fact be the only option if an "accurate" record is required, as a given memory manager implementation may allocate a larger block of memory for a given size of data than is actually requested. I do not know if any memory manager does in fact do this, but it could do so in theory, for efficiency's sake.
It is for internal use as it depends on the MemoryManager used. BTW, that's why you need to use the pair GetMem/FreeMem from the same MemoryManager; there is no canonical way of knowing how the memory has been reserved.
In Delphi, if you look at FastMM4, you can see that the memory is allocated in small, medium or large blocks:
the small blocks are allocated in pools of fixed size blocks (block size is defined at the pool level in the block type)
TSmallBlockType = packed record
{True = Block type is locked}
BlockTypeLocked: Boolean;
{Bitmap indicating which of the first 8 medium block groups contain blocks
of a suitable size for a block pool.}
AllowedGroupsForBlockPoolBitmap: byte;
{The block size for this block type}
BlockSize: Word;
the medium blocks are also allocated in pools but have a variable size
{Medium block layout:
Offset: -8 = Previous Block Size (only if the previous block is free)
Offset: -4 = This block size and flags
Offset: 0 = User data / Previous Free Block (if this block is free)
Offset: 4 = Next Free Block (if this block is free)
Offset: BlockSize - 8 = Size of this block (if this block is free)
Offset: BlockSize - 4 = Size of the next block and flags
{Get the block header}
LBlockHeader := PCardinal(Cardinal(APointer) - BlockHeaderSize)^;
{Get the medium block size}
LBlockSize := LBlockHeader and DropMediumAndLargeFlagsMask;
the large blocks are allocated individually with the required size
TLargeBlockHeader = packed record
{Points to the previous and next large blocks. This circular linked
list is used to track memory leaks on program shutdown.}
PreviousLargeBlockHeader: PLargeBlockHeader;
NextLargeBlockHeader: PLargeBlockHeader;
{The user allocated size of the Large block}
UserAllocatedSize: Cardinal;
{The size of this block plus the flags}
BlockSizeAndFlags: Cardinal;
end;
Is that information for internal use only, or is there any way to retrieve the size of the buffer pointed to by a pointer?
Do these two `alternatives' contradict each other?
It's for internal use only.
There is some information before the allocated area to store meta information. This means, each time you allocate a piece of memory, a bigger piece is allocated and the first bytes are used for meta information. The returned pointer is to the block following this meta information.
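The same layout can be imitated at application level, which shows the principle without relying on any memory manager's private format. In this sketch the 16-byte header size and the names HdrGetMem, HdrSize and HdrFreeMem are assumptions made up for illustration; it is not how FastMM or any other manager actually stores its data.

```pascal
program HeaderPrefix;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  HeaderSize = 16; // assumed header size; a multiple of 16 keeps the alignment

type
  PSizeField = ^NativeUInt;

// Allocates Size + HeaderSize bytes, stores Size in the header and returns
// a pointer to the area just after the header.
function HdrGetMem(Size: NativeUInt): Pointer;
var
  Raw: Pointer;
begin
  GetMem(Raw, Size + HeaderSize);
  PSizeField(Raw)^ := Size;
  Result := Pointer(NativeUInt(Raw) + HeaderSize);
end;

// Steps back over the header to read the size that was requested.
// Only valid for pointers returned by HdrGetMem.
function HdrSize(P: Pointer): NativeUInt;
begin
  Result := PSizeField(NativeUInt(P) - HeaderSize)^;
end;

// Releases the whole block, header included.
procedure HdrFreeMem(P: Pointer);
var
  Raw: Pointer;
begin
  if P = nil then Exit;
  Raw := Pointer(NativeUInt(P) - HeaderSize);
  FreeMem(Raw);
end;

var
  P: Pointer;
begin
  P := HdrGetMem(1000);
  Writeln(HdrSize(P)); // 1000
  HdrFreeMem(P);
end.
```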
I can imagine that the format changes with another version of the memory manager, so don't count on this.
That information is for internal use only.
Note that memory managers don't need to store the size as part of the memory returned. Many memory managers store it in an internal table and use the start address of the chunk they handed out as a lookup key in that table.