Understanding memory allocation for TList<RecordType> - delphi

I have to store a TList of something that can easily be implemented as a record in Delphi (five simple fields). However, it's not clear to me what happens when I do TList<TMyRecordType>.Add(R).
Since R is a local variable in the procedure in which I create the my TList, I assume that the memory for it will be released when the function returns. Does this leave an invalid record pointer in the list? Or does the list know to copy-on-assign? If the former, I assume I would have to manually manager the memory for R with New() and Dispose(), is that correct?
Alternatively, I can "promote" my record type to a class type by simply declaring the fields public (without even bothering with making them formal properties). Is that considered OK, or ought I to take the time to build out the class with private fields and public properties?

Simplified: records are blobs of data and are passed around by value - i.e. by copying them - by default. TList<T> stores values in an array of type T. So, TList<TMyRecordType>.Add(R) will copy the value R into the array at position Count, and increment the Count by one. No need to worry about allocation or deallocation of memory.
More complex issues that you usually don't need to worry about: if your record contains fields of a string type, an interface type, a dynamic array, or a record which itself contains fields of one of these types, then it's not just a simply copy of data; instead, CopyRecord from System.pas is used, which ensures that reference counts are updated correctly. But usually you don't need to worry about this detail unless you are using Move to shift the bits around yourself, or doing similar low-level operations.

Related

Is a dynamic array automatically deallocated when length is decreased?

I alread know, that a dynamic array is automatically deallocated/freed after use.
Does the same applies for resizing, especially decreasing? The manual and most help sites only cover increasing the array size.
test: array of TLabel;
SetLength(test, 10);
// fill array here
SetLength(test, 2); // <=== are entries 3-10 are automatically destroyed?
are entries 3-10 are automatically destroyed?
No, they are not automatically destroyed because those entries are dynamically allocated (and are not managed types). Only the pointers that refer to those items are released. It is your responsibility to destroy the items if necessary, because the compiler has no way to guarantee you wouldn't still use them from another reference (or have already destroyed them).
I must also point out that technically items "3-10" is wrong. Dynamic array are zero based. So the references for entries 2 to 9 are the ones released.
I alread know, that a dynamic array is automatically deallocated/freed after use
In addition, your question indicates you don't properly understand this. It seems you believed that when your array goes out of scope the labels referenced would be automatically destroyed. This is incorrect!
No matter where how or why some/all dynamic array entries are released Delphi won't automatically destroy objects types or any dynamically allocated pointer memory. Delphi only automatically releases memory for primitives (Integer, TDateTime, Double short strings), records and managed types1 (interfaces, long strings, other dynamic arrays).
1 Of course this is via reference counting. I.e. reference is reduced by 1; and the underlying object/string/array is released if and only if refCount is reduced to zero.
As whosrdaddy pointed out, if you want automatic destruction of contained objects, then you need to use a container that implements an ownership concept. TObjectList is an example. Although it doesn't work exactly like a dynamic array, it's behaviour is similar enough that it can usually be used as a replacement very easily.

What is the fastest mean to transfer a record in DCOM

I want to transfer some records with the following structure between two Windows PC computer using COM/DCOM. I prefer to transfer an array, say 100 members of TARec, at a time, not each record individually. Currently I am doing this using IStrings. I am looking to improve it using the raw records, to save the time to encode/decode the strings at both ends. Please share your experience.
type
TARec = record
A : TDateTime;
B : WORD;
C : Boolean;
D : Double;
end;
All the record's field type are OLE compatible. Many thanks in advance.
As Rudy suggests in the comments, if your data contains simple value types then a variant byte array can be a very efficient approach and quite simple to implement.
Since you have stated that your data already resides in an array, the basic approach would be:
Create a byte array of the required size to hold all your record data (use VarArrayCreate with type varByte)
Lock the array to obtain a pointer that is safe to use to reference the array contents in memory (VarArrayLock will lock and return a pointer to the array data)
Use CopyMemory to directly copy the data from your array of records to the byte array memory.
Unlock the variant array (VarArrayUnlock) and pass it through your COM/DCOM interface
On the other ('receiving') side you simply reverse the process:
Declare an array of records of the required size
Lock the variant byte array to obtain a pointer to the memory holding the bytes
Copy the byte array data into your record array
Unlock the byte array
This exact approach is something I have used very successfully in a very demanding COM/DCOM scenario (w.r.t efficiency/performance) in the past.
Things to be careful of:
If your data ever changes to include more complex types such as strings or dynamic arrays then additional work will be required to correctly transport these through a byte array.
If your data structure ever changes then the code on both sides of the interface will need to be updated accordingly. One way to protect against this is to incorporate some mechanism for the data to be identified as valid or not by the receiver. This could include a "version number" for example and/or a value (in a 'header' as part of the byte array, in addition to the array data, or passed as a separate parameter entirely - precise details don't really matter). If the receiver finds a version number or size that it is not expecting then it can report this gracefully rather than naively processing the data incorrectly and (most likely) crashing or throwing exceptions as a result.
Alignment/packing issues. Even with the same declaration for the record type, if code is compiled with different alignment settings then the size required for each record in memory could change (which is why a "version number" for the data structure format might not be reliable on its own). One way to avoid this would be to declare the record as packed, though this comes at the cost of a slight reduction in efficiency (and still relies on both sides of the interface agreeing that the data structure is packed).
There are just things to bear in mind however, not prescriptive. Just how complex/robust your implementation needs to be will be determined by your specific case.

When exactly does a dynamic array get garbage collected?

Dynamic arrays are reference counted, and so the memory is freed automatically by the compiler. My question is, when exactly does this automatic freeing occur? Does it happen immediately, or at the end of the containing procedure?
Here is a concrete example
procedure DoStuff;
var data:TBytes;
begin
data:=GetData; // lets say data now contains 1 Gig of data.
DoStuffWithData(data);
// I now want to free up this 1Gig of memory before continuing.
// Is this call needed, or would the memory be freed in the next line anyway?
Finalize(data);
data:=GetMoreData; // The first array now has no remaining references
DoStuffWithData(data);
end
Is the call to Finalize() redundant?
The call to Finalize isn't quite redundant. It's true that the dynamic array's reference count will be decremented on the next line (therefore destroying the array, probably), but that will only happen after the new dynamic array is allocated. Just before the return of GetMoreData, but before the assignment takes place, there will be two dynamic arrays in memory. If you destroy the first one manually in advance, then you'll only have one array in memory at a time.
The second array that you store in data will get destroyed as DoStuff returns (assuming DoStuffWithData doesn't store a copy of the dynamic-array reference elsewhere, increasing its reference count).
When exactly does this automatic freeing occur? Does it happen immediately, or at the end of the containing procedure?
Dynamic memory associated with managed types (dynamic arrays fall into this class) is freed when the reference count is set to 0. This can happen at the following points:
The reference variable is assigned a new value. A call to Finalize can be thought of as the special case where the new values is nil.
The reference variable goes out of scope. For example:
The exit of a function is reached; local variables go out of scope.
An object is destroyed and its members go out of scope.
A pointer to a record is destroyed with the Dispose function; all fields of the record go out of scope.
A unit is finalized and all global variables defined in the unit are finalized.
Note that the various cases above only result in memory being freed when the reference that is being finalized or is leaving scope is the last remaining reference. In other words, when the reference count is 1.
In your specific example, assuming the Finalize is removed, you are creating a new dynamic array and assigning it to a variable that already holds a dynamic array. This then falls into the class described by item 1 in the list above. So in that sense the call to Finalize is superfluous.
Rob has explained the order in which the allocation and deallocation happens which is a good point. To the best of my knowledge that is an implementation detail that is not explicitly documented. However, I'd be astounded if that detail was ever changed.

How to handle billions of objects without "Outofmemory" error

I have an application which may needs to process billions of objects.Each object of is of TRange class type. These ranges are created at different parts of an algorithm which depends on certain conditions and other object properties. As a result, if you have 100 items, you can't directly create the 100th object without creating all the prior objects. If I create all the (billions of) objects and add to the collection, the system will throw Outofmemory error. Now I want to iterate through each object mainly for two purposes:
To apply an operation for each TRange object(eg:Output certain properties)
To get a cumulative sum of a certain property.(eg: Each range has a weight property and I want to retreive totalweight that is a sum of all the range weights).
How do I effectively create an Iterator for these object without raising Outofmemory?
I have handled the first case by passing a function pointer to the algorithm function. For eg:
procedure createRanges(aProc: TRangeProc);//aProc is a pointer to function that takes a //TRange
var range: TRange;
rangerec: TRangeRec;
begin
range:=TRange.Create;
try
while canCreateRange do begin//certain conditions needed to create a range
rangerec := ReturnRangeRec;
range.Update(rangerec);//don't create new, use the same object.
if Assigned(aProc) then aProc(range);
end;
finally
range.Free;
end;
end;
But the problem with this approach is that to add a new functionality, say to retrieve the Total weight I have mentioned earlier, either I have to duplicate the algorithm function or pass an optional out parameter. Please suggest some ideas.
Thank you all in advance
Pradeep
For such large ammounts of data you need to only have a portion of the data in memory. The other data should be serialized to the hard drive. I tackled such a problem like this:
I Created an extended storage that can store a custom record either in memory or on the hard drive. This storage has a maximum number of records that can live simultaniously in memory.
Then I Derived the record classes out of the custom record class. These classes know how to store and load themselves from the hard drive (I use streams).
Everytime you need a new or already existing record you ask the extended storage for such a record. If the maximum number of objects is exceeded, the storage streams some of the least used record back to the hard drive.
This way the records are transparent. You always access them as if they are in memory, but they may get loaded from hard drive first. It works really well. By the way RAM works in a very similar way so it only holds a certain subset of all you data on your hard drive. This is your working set then.
I did not post any code because it is beyond the scope of the question itself and would only confuse.
Look at TgsStream64. This class can handle a huge amounts of data through file mapping.
http://code.google.com/p/gedemin/source/browse/trunk/Gedemin/Common/gsMMFStream.pas
But the problem with this approach is that to add a new functionality, say to retrieve the Total weight I have mentioned earlier, either I have to duplicate the algorithm function or pass an optional out parameter.
It's usually done like this: you write a enumerator function (like you did) which receives a callback function pointer (you did that too) and an untyped pointer ("Data: pointer"). You define a callback function to have first parameter be the same untyped pointer:
TRangeProc = procedure(Data: pointer; range: TRange);
procedure enumRanges(aProc: TRangeProc; Data: pointer);
begin
{for each range}
aProc(range, Data);
end;
Then if you want to, say, sum all ranges, you do it like this:
TSumRecord = record
Sum: int64;
end;
PSumRecord = ^TSumRecord;
procedure SumProc(SumRecord: PSumRecord; range: TRange);
begin
SumRecord.Sum := SumRecord.Sum + range.Value;
end;
function SumRanges(): int64;
var SumRec: TSumRecord;
begin
SumRec.Sum := 0;
enumRanges(TRangeProc(SumProc), #SumRec);
Result := SumRec.Sum;
end;
Anyway, if you need to create billions of ANYTHING you're probably doing it wrong (unless you're a scientist, modelling something extremely large scale and detailed). Even more so if you need to create billions of stuff every time you want one of those. This is never good. Try to think of alternative solutions.
"Runner" has a good answer how to handle this!
But I would like to known if you could do a quick fix: make smaller TRange objects.
Maybe you have a big ancestor? Can you take a look at the instance size of TRange object?
Maybe you better use packed records?
This part:
As a result, if you have 100 items,
you can't directly create the 100th
object without creating all the prior
objects.
sounds a bit like calculating Fibonacci. May be you can reuse some of the TRange objects instead of creating redundant copies? Here is a C++ article describing this approach - it works by storing already calculated intermediate results in a hash map.
Handling billions of objects is possible but you should avoid it as much as possible. Do this only if you absolutely have to...
I did create a system once that needed to be able to handle a huge amount of data. To do so, I made my objects "streamable" so I could read/write them to disk. A larger class around it was used to decide when an object would be saved to disk and thus removed from memory. Basically, when I would call an object, this class would check if it's loaded or not. If not, it would re-create the object again from disk, put it on top of a stack and then move/write the bottom object from this stack to disk. As a result, my stack had a fixed (maximum) size. And it allowed me to use an unlimited amount of objects, with a reasonable good performance too.
Unfortunately, I don't have that code available anymore. I wrote it for a previous employer about 7 years ago. I do know that you would need to write a bit of code for the streaming support plus a bunch more for the stack controller which maintains all those objects. But it technically would allow you to create an unlimited number of objects, since you're trading RAM memory for disk space.

Records in Delphi

some questions about records in Delphi:
As records are almost like classes, why not use only classes instead of records?
In theory, memory is allocated for a record when it is declared by a variable; but, and how is memory released after?
I can understand the utility of pointers to records into a list object, but with Generics Containers (TList<T>), are there need to use pointer yet? if not, how to delete/release each record into a Generic Container? If I wanna delete a specific record into a Generic Container, how to do it?
There are lots of differences between records and classes; and no "Pointer to record" <> "Class". Each has its own pros and cons; one of the important things about software development is to understand these so you can more easily choose the most appropriate for a given situation.
This question is based on a false premise. Records are not almost like classes, in the same way that Integers are not almost like Doubles.
Classes must always be dynamically instantiated, whereas this is a possibility, but not a requirement for records.
Instances of classes (which we call objects) are always passed around by reference, meaning that multiple sections of code will share and act on the same instance. This is something important to remember, because you may unintentionally modify an object as a side-effect; although when done intentionally it's a powerful feature. Records on the other hand are passed by value; you need to explicitly indicate if you're passing them by reference.
Classes do not 'copy as easily as records'. When I say copy, I mean a separate instance duplicating a source. (This should be obvious in light of the value/reference comment above).
Records tend to work very nicely with typed files (because they're so easy to copy).
Records can overlay fields with other fields (case x of/unions)
These were comments on certain situational benefits of records; conversely, there are also situational benefits for classes that I'll not elaborate on.
Perhaps the easiest way to understand this is to be a little pedantic about it. Let's clarify; memory is not really allocated 'when its declared', it's allocated when the variable is in scope, and deallocated when it goes out of scope. So for a local variable, it's allocated just before the start of the routine, and deallocated just after the end. For a class field, it's allocated when the object is created, and deallocated when it's destroyed.
Again, there are pros and cons...
It can be slower and require more memory to copy entire records (as with generics) than to just copy the references.
Passing records around by reference (using pointers) is a powerful technique whereby you can easily have something else modify your copy of the record. Without this, you'd have to pass your record by value (i.e. copy it) receive the changed record as a result, copy it again to your own structures.
Are pointers to records like classes? No, not at all. Just two of the differences:
Classes support polymorphic inheritance.
Classes can implement interfaces.
For 1 and 2: records are value types, while classes are reference types. They're allocated on the stack, or directly in the memory space of any larger variable that contains them, instead of through a pointer, and automatically cleaned up by the compiler when they go out of scope.
As for your third question, a TList<TMyRecord> internally declares an array of TMyRecord for storage space. All the records in it will be cleaned up when the list is destroyed. If you want to delete a specific one, use the Delete method to delete by index, or the Remove method to find and delete. But be aware that since it's a value type, everything you do will be making copies of the record, not copying references to it.
One of the main benefits of records is, when you have a large "array of record". This is created in memory by allocating space for all records in one contiguous RAM space, which is extremely fast. If you had used "array of TClass" instead, each object in the array would have to be allocated by itself, which is slow.
There has been a lot of work to improve the speed of allocating memory, in order to improve the speed of strings and objects, but it will never be as fast as replacing 100,000 memory allocations with 1 memory allocation.
However, if you use array of record, don't copy the record around in local variables. That may easily kill the speed benefit.
1) To allow for inheritance and polymorphism, classes have some overhead. Records do not allow them, and in some situations may be somewhat faster and simpler to use. Unlike classes, that are always allocated in the heap and managed through references, records can be allocated on the stack also, accessed directly, and assigned each other without requiring to call an "Assign" method.
Also records are useful to access memory blocks with a given structure, because their memory layout is exactly how you define it. A class instance memory layout is controlled by the compiler and has additional data to make objects work (i.e. the pointer to the Virtual Method Table).
2) Unless you allocate records dynamically, using New() or GetMem(), record's memory is managed by the compiler as ordinals, floats or static arrays: global variables memory is allocated at startup and released when the program terminates, and local variables are allocated on the stack entering a function/procedure/method and released exiting. Allocating/releasing memory in the stack is faster because it doesn't require calls to the memory manager, it's just very few assembler instructions to change the stack registers. But be aware that allocating large structure on the stack may cause a stack overflow, because the maximum stack size is fixed and not very large (see linker options).
If records are fields of a class, they are allocated when the class is created and released when the class is freed.
3) One of the advantages of generics is to eliminate the need of low-level pointer management - but be aware of the inner workings.
There are a few other differences between a class and a record. Classes can use polymorphism, and expose interfaces. Records can not implement destructors (although since Delphi 2006 they can now implement constructors and methods).
Records are very useful in segmenting memory into a more logical structure since the first data item in the record is at the same address point of the pointer to the record itself. This is not the case for classes.

Resources