How to detect "dangling pointers" if "Assigned()" can't do it? - delphi

In another question, I found out that the Assigned() function is identical to Pointer <> nil. It has always been my understanding that Assigned() was detecting these dangling pointers, but now I've learned it does not. Dangling pointers are those which were created at one point, but have since been freed and haven't been set to nil yet.
If Assigned() can't detect dangling pointers, then what can? I'd like to check my object to make sure it's really a valid created object before I try to work with it. I don't use FreeAndNil as many recommend, because I like to be direct. I just use SomeObject.Free.
Access Violations are my worst enemy - I do all I can to prevent their appearance.

If you have an object variable in scope and it may or may not be a valid reference, FreeAndNil is what you should be using. That or fixing your code so that your object references are more tightly managed so it's never a question.
Access Violations shouldn't be thought of as an enemy. They're bugs: they mean you made a mistake that needs to be fixed. (Or that there's a bug in some code you're relying on, but I find most often that I'm the one who screwed up, especially when dealing with the RTL, VCL, or Win32 API.)

It is sometimes possible to detect when the address a pointer points to resides in a memory block that is on the heap's list of freed memory blocks. However, this requires comparing the pointer to potentially every block in the heap's free list which could contain thousands of blocks. So, this is potentially a computationally intensive operation and something you would not want to do frequently except perhaps in a severe diagnostic mode.
This technique only works while the memory block that the pointer used to point to continues to sit in the heap free list. As new objects are allocated from the heap, it is likely that the freed memory block will be removed from the heap free list and put back into active play as the home of a new, different object. The original dangling pointer still points to the same address, but the object living at that address has changed. If the newly allocated object is of the same (or compatible) type as the original object now freed, there is practically no way to know that the pointer originated as a reference to the previous object. In fact, in this very special and rare situation, the dangling pointer will actually work perfectly well. The only observable problem might be if someone notices that the data has changed out from under the pointer unexpectedly.
Unless you are allocating and freeing the same object types over and over again in rapid succession, chances are slim that the new object allocated from that freed memory block will be the same type as the original. When the types of the original and the new object are different, you have a chance of figuring out that the content has changed out from under the pointer. However, to do that you need a way to know the type of the original object that the pointer referred to. In many situations in native compiled applications, the type of the pointer variable itself is not retained at runtime. A pointer is a pointer as far as the CPU is concerned - the hardware knows very little of data types. In a severe diagnostic mode it's conceivable that you could build a lookup table to associate every pointer variable with the type allocated and assigned to it, but this is an enormous task.
That's why Assigned() is not an assertion that the pointer is valid. It just tests that the pointer is not nil.
Why did Borland create the Assigned() function to begin with? To further hide pointerisms from novice and occasional programmers. Function calls are easier to read and understand than pointer operations.
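A minimal sketch of the point above, assuming a console program and using TStringList only as a convenient stand-in class. It shows that Assigned() is just a nil test: it still passes after Free, and fails only once the variable itself has been cleared.

```delphi
program AssignedDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  A, B: TStringList;
begin
  A := TStringList.Create;
  A.Free;
  // A still holds the old address, so the nil test passes even though
  // the object is gone. Do not dereference A at this point.
  Writeln('After Free:       ', Assigned(A));

  B := TStringList.Create;
  FreeAndNil(B);
  // FreeAndNil cleared the variable, so the nil test now fails.
  Writeln('After FreeAndNil: ', Assigned(B));
end.
```

The first line should report that A is still "assigned" and the second that B is not, which is exactly why Assigned() cannot be used to validate an object reference.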

The bottom line is that you should not be attempting to detect dangling pointers in code. If you are going to refer to pointers after they have been freed, set the pointer to nil when you free it. But the best approach is not to refer to pointers after they have been freed.
So, how do you avoid referring to pointers after they have been freed? There are a couple of common idioms that get you a long way.
Create objects in a constructor and destroy them in the destructor. Then you simply cannot refer to the pointer before creation or after destruction.
Use a local variable pointer that is created at the beginning of the function and destroyed as the last act of the function.
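The two idioms above can be sketched roughly as follows. TReport and CountLines are hypothetical names invented for this illustration; the only assumption is the standard TStringList class.

```delphi
program OwnershipIdioms;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Idiom 1: the owner creates FLines in its constructor and frees it in
  // its destructor, so FLines is valid for the owner's whole lifetime.
  TReport = class
  private
    FLines: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure AddLine(const S: string);
    function LineCount: Integer;
  end;

constructor TReport.Create;
begin
  inherited Create;
  FLines := TStringList.Create;
end;

destructor TReport.Destroy;
begin
  FLines.Free;
  inherited Destroy;
end;

procedure TReport.AddLine(const S: string);
begin
  FLines.Add(S);
end;

function TReport.LineCount: Integer;
begin
  Result := FLines.Count;
end;

// Idiom 2: a local object is created at the top of the routine and
// freed in the finally block as the routine's last act.
function CountLines(const Text: string): Integer;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := Text;
    Result := Lines.Count;
  finally
    Lines.Free;
  end;
end;

var
  Report: TReport;
begin
  Report := TReport.Create;
  try
    Report.AddLine('first');
    Writeln(Report.LineCount);
  finally
    Report.Free;
  end;
  Writeln(CountLines('one' + sLineBreak + 'two'));
end.
```

In neither case is there a moment where the reference is reachable but the object is not, so no Assigned() check is needed.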
One thing I would strongly recommend is to avoid writing if Assigned() tests into your code unless it is expected behaviour that the pointer may not be created. Your code will become hard to read and you will also lose track of whether the pointer being nil is to be expected or is a bug.
Of course we all do make mistakes and leave dangling pointers. Using FreeAndNil is one cheap way to ensure that dangling pointer access is detected. A more effective method is to use FastMM in full debug mode. I cannot recommend this highly enough. If you are not using this wonderful tool, you should start doing so ASAP.
If you find yourself struggling with dangling pointers and you find it hard to work out why then you probably need to refactor the code to fit into one of the two idioms above.
You can draw a parallel with array indexing errors. My advice is not to check in code for validity of index. Instead use range checking and let the tools do the work and keep the code clean. The exception to this is where the input comes from outside your program, e.g. user input.
My parting shot: only ever write if Assigned if it is normal behaviour for the pointer to be nil.

Use a memory manager, such as FastMM, that provides debugging support, in particular to fill a block of freed memory with a given byte pattern. You can then dereference the pointer to see if it points at a memory block that starts with the byte pattern, or you can let the code run normally and raise an AV if it tries to access a freed memory block through a dangling pointer. The AV's reported memory address will usually be either exactly the byte pattern or close to it.

Nothing can find a dangling (once valid but then not) pointer. It's your responsibility to either make sure it's set to nil when you free its content, or to limit the scope of the pointer variable to only be available within the scope it's valid. (The second is the better solution whenever possible.)

The core point is that the way how objects are implemented in Delphi has some built-in design drawbacks:
there is no distinction between an object and a reference to an object. For "normal" variables, say a scalar (like int) or a record, these two use cases can be told apart: there's either a type Integer or TSomeRec, or a type like PInteger = ^Integer or PSomeRec = ^TSomeRec, which are different types. This may sound like a negligible technicality, but it isn't: a SomeRec: TSomeRec denotes "this scope is the original owner of that record and controls its lifecycle", while SomeRec: PSomeRec tells "this scope uses a transient reference to some data, but has no control over the record's lifecycle". So, as dumb as it may sound, for objects there's virtually no one who has explicit control over other objects' lifecycles. The result is - surprise - that the lifecycle state of objects may in certain situations be unclear.
an object reference is just a simple pointer. Basically, that's ok, but the problem is that there's sure a lot of code out there which treats object references as if they were a 32bit or 64bit integer number. So if e.g. Embarcadero wanted to change the implementation of an object reference (and make it not a simple pointer any more), they would break a lot of code.
But if Embarcadero wanted to eliminate dangling object pointers, they would have to redesign Delphi object references:
when an object is freed, all references to it must be freed, too. This is only possible by double-linking both, i.e. the object instance must carry a list with all of the references to it, that is, all memory addresses where such pointers are (on the lowest level). Upon destruction, that list is traversed, and all those pointers are set to nil
a somewhat more comfortable solution would be for the "one" holding such a reference to register a callback to get informed when a referenced object is destroyed. In code: when I have a reference FSomeObject: TSomeObject I would want to be able to write in e.g. SetSomeObject: FSomeObject.OnDestruction := Self.HandleDestructionOfSomeObject. But then FSomeObject can't be a pointer; instead, it would have to be at least an (advanced) record type
Of course I can implement all that by myself, but that is tedious, and isn't it something that should be addressed by the language itself? They also managed to implement for x in ...

Related

Why freeing not empty TList<Int64> does not cause memory leak?

Freeing not empty TList<Integer> does not cause memory leak because Integer is equal to pointer in size, and TList handles pointers perfectly. (This is as far as I understand it.) Freeing not empty TList<String> also does not cause memory leak, as String itself is a pointer and is carefully freed somewhere in Delphi's internals when it's no longer needed.
However, freeing any not empty TList<SomeClass> always produces memory leak, and it's understood why.
The thing I do not understand is why freeing not empty TList<Int64> does not produce memory leak.
Sorry for the noob question.
A TList<T> is simply a wrapper around a dynamic array of T. A dynamic array of T is a managed type and so does not need explicit destruction.
This leaves the elements of the array. Since Int64 is a value type, it needs no explicit destruction.
As a general rule, you need only destroy that which you created. You created the list, you need to destroy it. You did not create the elements themselves, so you do not need to destroy them.
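A small sketch of that rule, assuming a Delphi version with generics (the unit is named System.Generics.Collections when unit scope names are in use). The Int64 elements need nothing beyond freeing the list, while the objects were created by the caller and so are freed by the caller.

```delphi
program ListCleanup;

{$APPTYPE CONSOLE}

uses
  Generics.Collections;

var
  Numbers: TList<Int64>;
  Objects: TList<TObject>;
  Obj: TObject;
begin
  // Value-type elements: only the list itself was created, so only the
  // list is destroyed. Its internal array is released along with it.
  Numbers := TList<Int64>.Create;
  try
    Numbers.Add(1);
    Numbers.Add(2);
  finally
    Numbers.Free;
  end;

  // Object elements: each TObject.Create below is something we created,
  // so each one must be freed before the list goes away.
  Objects := TList<TObject>.Create;
  try
    Objects.Add(TObject.Create);
    Objects.Add(TObject.Create);
  finally
    for Obj in Objects do
      Obj.Free;
    Objects.Free;
  end;
end.
```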

Why does my program crash after I call ReallocMemory?

I'm trying to modify the VirtualTreeView to see data in the tree nodes in the design mode.
The node memory is allocated in a private static method, so I can't do anything about it. I'm trying to reallocate the memory afterwards to match the new size.
For the test purposes I'm trying to reallocate the same amount of memory:
ReallocMemory(Node, sizeof(Node^))
But the IDE hangs up in the random iteration throwing a lot of AV. Since my knowledge of memory allocation is pretty lacking I think I'm forgetting something. Could you point me please?
ReallocMemory is a function. It returns the new pointer value; it does not modify its argument. You want to call ReallocMem instead, or else use the result of the function:
ReallocMem(Node, SizeOf(Node^));
or
Node := ReallocMemory(Node, SizeOf(Node^));
When either of those functions cannot resize the block of memory in-place, it allocates new memory, copies the old contents into the new buffer, and then frees the original buffer. If you ignore the ReallocMemory result, then you have discarded the new pointer and retained the old, stale pointer in the Node variable. Continued use of a stale pointer would explain access violations and other unpredictable behavior.
There are two versions of those functions for C++ compatibility. C++ doesn't have Delphi's "compiler magic," which is what allows the compiler to have a single ReallocMem function that accepts and modifies any pointer type.
The ReallocMemory function looks like the C++ realloc function, but they don't behave quite the same way, which is why it's safe to directly overwrite the input variable with the function's return value. When reallocation fails, the function throws an exception, just like ReallocMem, whereas realloc just returns a null pointer.
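As a self-contained illustration of the two calling styles described above, here is a minimal sketch that uses a plain PInteger buffer instead of a VirtualTreeView node; the buffer sizes are arbitrary.

```delphi
program ReallocDemo;

{$APPTYPE CONSOLE}

var
  P: PInteger;
begin
  GetMem(P, SizeOf(Integer));
  P^ := 42;

  // ReallocMem takes P as a var parameter and updates it in place.
  ReallocMem(P, 1024 * SizeOf(Integer));

  // ReallocMemory returns the (possibly moved) block; the result must be
  // stored, otherwise P may keep pointing at the old, freed block.
  P := ReallocMemory(P, 4096 * SizeOf(Integer));

  // The old contents are carried over to the resized block.
  Writeln(P^);

  FreeMem(P);
end.
```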

why is stack and heap both required for memory allocation

I've searched a while but no conclusive answer is present on why value types have to be allotted on the stack while the reference types i.e. dynamic memory or the objects have to reside on the heap.
Why can't the same be allotted on the stack?
They can be. In practice they're not, because the stack is typically a scarcer resource than the heap and allocating reference types on the stack may exhaust it quickly. Further, if a function returns data allocated on its stack, it will require copying semantics on the caller's part or risk returning something that will be overwritten by the next function call.
Value types, typically local variables, can be brought in and out of scope quickly and easily with native machine instructions. Copy semantics for value types on return is trivial as most fit into machine registers. This happens often and should be as cheap as possible.
It is not correct that value types always live on the stack. Read Jon Skeet's article on the topic:
Memory in .NET - what goes where
I understand that the stack paradigm (nested allocations/deallocations) cannot handle certain algorithms which need non-nested object lifetimes.
just as the static allocation paradigm cannot handle recursive procedure calls. (e.g. naive calculation of fibonacci(n) as f(n-1) + f(n-2))
I'm not aware of a simple algorithm that would illustrate this fact though. any suggestions would be appreciated :-)
Local variables are allocated in the stack. If that was not the case, you wouldn't be able to have variables pointing to the heap when allocating variable's memory. You CAN allocate things in the stack if you want, just create a buffer big enough locally and manage it yourself.
Anything a method puts on the stack will vanish when the method exits. In .net and Java, it would be perfectly acceptable (in fact desirable) if a class object vanished as soon as the last reference to it vanished, but it would be fatal for an object to vanish while references to it still exist. It is not in the general case possible for the compiler to know, when a method creates an object, whether any references to that object will continue to exist after the method exits. Absent such assurance, the only safe way to allocate class objects is to store them on the heap.
Incidentally, in .net, one major advantage of mutable value types is that they can be passed by reference without surrendering perpetual control over them. If class 'foo', or a method thereof, has a structure 'boz' which one of foo's methods passes by reference to method 'bar', it is possible for bar, or the methods it calls, to do whatever they want to 'boz' until they return, but once 'bar' returns any references it held to 'boz' will be gone. This often leads to much safer and cleaner semantics than the promiscuously-sharable references used for class objects.

Records in Delphi

some questions about records in Delphi:
As records are almost like classes, why not use only classes instead of records?
In theory, memory is allocated for a record when it is declared by a variable; but, and how is memory released after?
I can understand the utility of storing pointers to records in a list object, but with generic containers (TList<T>), is there still a need to use pointers? If not, how do I delete/release each record in a generic container? If I want to delete a specific record from a generic container, how do I do it?
There are lots of differences between records and classes; and no, a "pointer to record" is not the same thing as a "class". Each has its own pros and cons; one of the important things about software development is to understand these so you can more easily choose the most appropriate for a given situation.
This question is based on a false premise. Records are not almost like classes, in the same way that Integers are not almost like Doubles.
Classes must always be dynamically instantiated, whereas this is a possibility, but not a requirement for records.
Instances of classes (which we call objects) are always passed around by reference, meaning that multiple sections of code will share and act on the same instance. This is something important to remember, because you may unintentionally modify an object as a side-effect; although when done intentionally it's a powerful feature. Records on the other hand are passed by value; you need to explicitly indicate if you're passing them by reference.
Classes do not 'copy as easily as records'. When I say copy, I mean a separate instance duplicating a source. (This should be obvious in light of the value/reference comment above).
Records tend to work very nicely with typed files (because they're so easy to copy).
Records can overlay fields with other fields (case x of/unions)
These were comments on certain situational benefits of records; conversely, there are also situational benefits for classes that I'll not elaborate on.
Perhaps the easiest way to understand this is to be a little pedantic about it. Let's clarify: memory is not really allocated "when it's declared"; it's allocated when the variable comes into scope, and deallocated when it goes out of scope. So for a local variable, it's allocated just before the start of the routine, and deallocated just after the end. For a class field, it's allocated when the object is created, and deallocated when it's destroyed.
Again, there are pros and cons...
It can be slower and require more memory to copy entire records (as with generics) than to just copy the references.
Passing records around by reference (using pointers) is a powerful technique whereby you can easily have something else modify your copy of the record. Without this, you'd have to pass your record by value (i.e. copy it) receive the changed record as a result, copy it again to your own structures.
Are pointers to records like classes? No, not at all. Just two of the differences:
Classes support polymorphic inheritance.
Classes can implement interfaces.
For 1 and 2: records are value types, while classes are reference types. Records are allocated on the stack, or directly in the memory space of any larger variable that contains them, instead of through a pointer, and are automatically cleaned up by the compiler when they go out of scope.
As for your third question, a TList<TMyRecord> internally declares an array of TMyRecord for storage space. All the records in it will be cleaned up when the list is destroyed. If you want to delete a specific one, use the Delete method to delete by index, or the Remove method to find and delete. But be aware that since it's a value type, everything you do will be making copies of the record, not copying references to it.
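The copy behaviour described above is easy to trip over, so here is a minimal sketch. TMyRecord and its fields are made up for the example, and a Delphi version with Generics.Collections is assumed.

```delphi
program RecordListDemo;

{$APPTYPE CONSOLE}

uses
  Generics.Collections;

type
  TMyRecord = record
    Id: Integer;
    Score: Integer;
  end;

var
  List: TList<TMyRecord>;
  Rec: TMyRecord;
begin
  List := TList<TMyRecord>.Create;
  try
    Rec.Id := 1;
    Rec.Score := 10;
    List.Add(Rec);           // the list stores its own copy of Rec

    Rec := List[0];          // reading an item yields another copy
    Rec.Score := 99;         // changes the local copy only
    Writeln(List[0].Score);  // the stored record is unchanged here

    List[0] := Rec;          // write the modified copy back explicitly
    Writeln(List[0].Score);

    List.Delete(0);          // remove by index; nothing to free per record
    Writeln(List.Count);
  finally
    List.Free;               // releases the storage for any remaining records
  end;
end.
```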
One of the main benefits of records is, when you have a large "array of record". This is created in memory by allocating space for all records in one contiguous RAM space, which is extremely fast. If you had used "array of TClass" instead, each object in the array would have to be allocated by itself, which is slow.
There has been a lot of work to improve the speed of allocating memory, in order to improve the speed of strings and objects, but it will never be as fast as replacing 100,000 memory allocations with 1 memory allocation.
However, if you use array of record, don't copy the record around in local variables. That may easily kill the speed benefit.
1) To allow for inheritance and polymorphism, classes have some overhead. Records do not allow them, and in some situations may be somewhat faster and simpler to use. Unlike classes, that are always allocated in the heap and managed through references, records can be allocated on the stack also, accessed directly, and assigned each other without requiring to call an "Assign" method.
Also records are useful to access memory blocks with a given structure, because their memory layout is exactly how you define it. A class instance memory layout is controlled by the compiler and has additional data to make objects work (i.e. the pointer to the Virtual Method Table).
2) Unless you allocate records dynamically, using New() or GetMem(), record's memory is managed by the compiler as ordinals, floats or static arrays: global variables memory is allocated at startup and released when the program terminates, and local variables are allocated on the stack entering a function/procedure/method and released exiting. Allocating/releasing memory in the stack is faster because it doesn't require calls to the memory manager, it's just very few assembler instructions to change the stack registers. But be aware that allocating large structure on the stack may cause a stack overflow, because the maximum stack size is fixed and not very large (see linker options).
If records are fields of a class, they are allocated when the class is created and released when the class is freed.
3) One of the advantages of generics is to eliminate the need of low-level pointer management - but be aware of the inner workings.
There are a few other differences between a class and a record. Classes can use polymorphism, and expose interfaces. Records can not implement destructors (although since Delphi 2006 they can now implement constructors and methods).
Records are very useful in segmenting memory into a more logical structure since the first data item in the record is at the same address point of the pointer to the record itself. This is not the case for classes.

Is it possible to get the size of the type that a pointer points to in Delphi 7?

I want to get the size of any "record" type in following function. But seems it doesn't work:
function GetDataSize(P : Pointer) : Integer;
begin
  Result := SizeOf(P^); // **How to write the code?**
end;
For example, the size of following record is 8 bytes
SampleRecord = record
  Age1 : Integer;
  Age2 : Integer;
end;
But GetDataSize(@a) always returns 1 (a is a variable of SampleRecord type of course). What should I do?
I noticed that Delphi has a procedure procedure New(var P: Pointer) which can allocate the memory block corresponds to the size of the type that P points to. How can it gets the size?
The reason New knows how much memory to allocate is that New is compiler magic. It's a language built-in, so when the compiler sees you call it, it rewrites it to something like this:
// New(foo);
foo := System._New(SizeOf(foo^), TypeInfo(TypeOf(foo^)));
TypeOf here is a made-up Delphi function for expository purposes. The compiler knows the declared type of foo because it knows where all your variable declarations are. You can look at the implementation of _New in System.pas. Similar rewriting occurs for Dispose so it knows what kind of finalization to do before freeing the memory.
The ideas of variables and declarations are compile-time concepts. At run time, they cease to exist. At run time, a pointer is just an address. The type of what it points to was determined at compile time. Types are what determine something's size.
If you need to write a function that accepts pointers to multiple things with different sizes, then you'll just have to provide a second parameter that describes what the first one points to.
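One way to read that advice is sketched below, reusing the SampleRecord type from the question. ClearData is a hypothetical helper written for this example; the caller supplies the size with SizeOf, where the type is still known at compile time.

```delphi
program SizeParamDemo;

{$APPTYPE CONSOLE}

type
  SampleRecord = record
    Age1: Integer;
    Age2: Integer;
  end;

// Hypothetical helper: P carries no type information at run time, so the
// caller passes the size of whatever P points to as a second parameter.
procedure ClearData(P: Pointer; Size: Integer);
begin
  FillChar(P^, Size, 0);
end;

var
  a: SampleRecord;
begin
  a.Age1 := 30;
  a.Age2 := 40;

  // SizeOf is evaluated by the compiler, which knows the declared type of a.
  ClearData(@a, SizeOf(a));

  Writeln(a.Age1, ' ', a.Age2);
  Writeln(SizeOf(SampleRecord));
end.
```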
Check out another question here, "How to know what type is a var." The asker wondered how to determine more information about a variable given only its address.
You cannot find the size of a data structure using a variable of type Pointer, because the compiler cannot make a guess and check it, since a pointer can point to whatever data type you can think of. You can read some information here.
There's no safe way to determine the size of a record that a pointer points to. However, if you allocated the memory that the pointer points to, you can ask the size of that memory block. But then again, since you allocated that block, you should already know the size of that block!
The Delphi memory manager keeps track of every block of memory that gets allocated. With information from the memory manager it is possible to find this information, if your pointer points to the beginning of a memory block. However, if you allocated a large block of memory, loaded some data in it and your pointer points to some data inside this block, this method would be quite unreliable.
Also, if you use referenced types (dynamic arrays, strings, classes, etc.) in your record, the size it returns will still be unusable since you get the size of the reference (4 bytes) instead of the size of the data that is referenced to.
The New() command just uses the type information of the datatype that you pass to it to get its size. To see exactly how it does this, you could check the Delphi source code. Open \source\Win32\rtl\sys\System.pas and search for "_New" (with the underscore in front of it). Using this source code might help you to understand how Delphi handles memory allocations, although the source code can be really complex.
Delphi has a built-in memory manager. I believe new has access to the heap object and uses HeapSize() (or similar routines) to get the size of a block, for some pointer.

Resources