About Data from TClientDataSet - delphi

I made a function that copies the Data from A TClientDataSet to B.
In production, the code will dynamically fill a TClientDataSet, like the following:
procedure LoadClientDataSet(const StringSql: String; ParamsAsArray: array of Variant; CdsToLoad: TClientDataSet);
begin
//There is a shared TClientDataSet that retrieves
//all data from a TRemoteDataModule in here.
//the following variables (and code) are here only to demonstration of the algorithm;
GlobalMainThreadVariable.SharedClientDataSet.CommandText := StringSql;
//Handle Parameters
GlobalMainThreadVariable.SharedClientDataSet.Open;
CdsToLoad.Data:= GlobalMainThreadVariable.SharedClientDataSet.Data;
GlobalMainThreadVariable.SharedClientDataSet.Close;
end;
That said:
It is safe to do so? (By safe I mean, what kind of exceptions should I expect and how to deal with them?)
How do I release ".Data's" memory?

The data storage residing behind the Data property is reference counted. Therefore you don't have to bother releasing it.
If you want to dive deeper into the intrinsics of TClientDataSet I recommend reading the really excellent book from Cary Jensen: Delphi in Depth: ClientDataSets

By assigning the Data property like you did duplicates the records. You have now two different isntances of TClientDataset with two different set of records with precisely the same structure, same row count and same field values.
It´s safe to do that if the receiving TClientDataset does not have any field structure previously defined or the existing structure is compatible with the Data being assigned. However, if we are talking about a huge number of records, the assignment may take long time and, in an extreme circumstance, it could exaust the computer´s memory (it all the depends on the computer´s configuration).
In order to release the Data, just close the dataset.
If you prefer to have two instances of TClientDataset but one single instance of the records, my suggestion is to use the TClientDataset.CloneCursor method, that instead of copying the Data, just assign a reference to it in a different dataset. In this case it´s the very same Data shared between two different datasets.

Related

What is the fastest mean to transfer a record in DCOM

I want to transfer some records with the following structure between two Windows PC computer using COM/DCOM. I prefer to transfer an array, say 100 members of TARec, at a time, not each record individually. Currently I am doing this using IStrings. I am looking to improve it using the raw records, to save the time to encode/decode the strings at both ends. Please share your experience.
type
TARec = record
A : TDateTime;
B : WORD;
C : Boolean;
D : Double;
end;
All the record's field type are OLE compatible. Many thanks in advance.
As Rudy suggests in the comments, if your data contains simple value types then a variant byte array can be a very efficient approach and quite simple to implement.
Since you have stated that your data already resides in an array, the basic approach would be:
Create a byte array of the required size to hold all your record data (use VarArrayCreate with type varByte)
Lock the array to obtain a pointer that is safe to use to reference the array contents in memory (VarArrayLock will lock and return a pointer to the array data)
Use CopyMemory to directly copy the data from your array of records to the byte array memory.
Unlock the variant array (VarArrayUnlock) and pass it through your COM/DCOM interface
On the other ('receiving') side you simply reverse the process:
Declare an array of records of the required size
Lock the variant byte array to obtain a pointer to the memory holding the bytes
Copy the byte array data into your record array
Unlock the byte array
This exact approach is something I have used very successfully in a very demanding COM/DCOM scenario (w.r.t efficiency/performance) in the past.
Things to be careful of:
If your data ever changes to include more complex types such as strings or dynamic arrays then additional work will be required to correctly transport these through a byte array.
If your data structure ever changes then the code on both sides of the interface will need to be updated accordingly. One way to protect against this is to incorporate some mechanism for the data to be identified as valid or not by the receiver. This could include a "version number" for example and/or a value (in a 'header' as part of the byte array, in addition to the array data, or passed as a separate parameter entirely - precise details don't really matter). If the receiver finds a version number or size that it is not expecting then it can report this gracefully rather than naively processing the data incorrectly and (most likely) crashing or throwing exceptions as a result.
Alignment/packing issues. Even with the same declaration for the record type, if code is compiled with different alignment settings then the size required for each record in memory could change (which is why a "version number" for the data structure format might not be reliable on its own). One way to avoid this would be to declare the record as packed, though this comes at the cost of a slight reduction in efficiency (and still relies on both sides of the interface agreeing that the data structure is packed).
There are just things to bear in mind however, not prescriptive. Just how complex/robust your implementation needs to be will be determined by your specific case.

TClientDataSet and limit by memory

We have a system that creates reports out of our data. And we can deal with a lot of data. The idea of over 150,000 rows is not out of the question.
Unfortunately, our experience with TClientDataSet is its limitations, because it often results in an 'insufficient memory for this operation' error, when the data gets too big.
So the question is this: Does there exist a generally available implementation of TDataSet that can handle a large amount of data (such as streaming directly to a file and not keeping the entire dataset in memory)?
I am willing to implement such a class myself. But as far as I understand TClientDataSet, it needs to be able to contain the data itself before it can save it to a file/stream. In addition, loading the data again should also be possible as a stream rather than loading in an entire TClientDataSet object, because then we wouldn't have solved the issue.
You can use either FireBird or Interbase in embedded mode.
Is there really any need to cache all the data on the client before reporting? If not, maybe rethink how you're querying and processing data to generate these reports and see if there's a way that involves less client-side data (which comes with a bonus of less data transmitted over the network).
If you've been down that road before and you really do need all this data client side, then you could look at custom data structures. A TList<T> of records (even if you need to build your own indexes) takes a lot less memory than a TClientDataSet does.
KBMMemTable is a nice alternative to TClientDataset
http://www.components4programmers.com/products/kbmmemtable/
We are using it for years and it is very useful and fast.
Wanted to underline that the capacities of the TClientDataset could be something bigger.
Test on the limits of TClientDataset - appending xxx,xxx records, putting the single record in whole (repeated) to create an idea on the size.
// Begin Load Record to TCLientDataset for Reverse (on Reverse) Processing
dxMemData1.Append;
dxMemData1['NT_Rec_No'] := 1000;
dxMemData1['NT_User'] := 'DEV\Administrator';
dxMemData1['NT_Type'] := 'Information';
dxMemData1['Ora_Timestamp'] := '20170706033859.000';
dxMemData1['Ora_Host'] := 'DEV0001';
dxMemData1['Ora_SID'] := 'Oracle.orcl';
dxMemData1['Ora_Event_Id'] := '34';
dxMemData1['NT_Message'] := Memo1.Text;
dxMemData1.Post;
// End Load Record to TCLientDataset for Reverse (on Reverse) Processing
String on the memo1 is a of 100 characters - ansi
did several tests and managed to keep going to something from 600,000 to 900,000 records without crashing.
Difference can be made by making the text on the memo bigger - this did decrease the max number before crash - which means it is not a matter of exact max. record number - but of size consumed - my guess.
Tested the same with TdxMemData (devexpress), this time I could reach almost the double of records

Delphi - How to genericize query result set

I am working with multiple databases within the same application. I am using drivers from two different companies. Both companies have tTable and tQuery Descendants that work well.
I need a way to have generic access to the data, regardless of which driver/tQuery component I am using to return a set of data. This data would NOT tie to components, just to my logic.
For example...(pseudocode) Let's create a function which can run against either tQuery component
function ListAllTables(NameOfDatabase : String) :ReturnSet??
begin
If NameOfDataBase = 'X' then use tQuery(Vendor 1)
else use tQuery(Vendor 2)
RunQuery;
Return Answer...
end;
When NORMALLY running a query, I do
Query.Open;
While not Query.EOF do
begin
Read my rows..
next;
end;
If I am CALLING ListAllTables, what is my return type so that I can iterate through the rows? Each tQuery Vendor is different, so I can't use that (can I, and if so, would I want to?) I could build a Memory Table, and pass that back, but that seems like extra work for ListAllRows to build a memory table, and then to pass it back to the calling routine so that it can "un-build", i.e. iterate through the rows...
What are your ideas and suggestions?
Thanks
GS
Almost all Delphi datasets descend from TDataset, and most useful behavior is defined on TDataset.
Therefore, if you assign each table or query to a variable of type TDataset, you should be able to perform your logic on that dataset in a vendor neutral fashion.
I would also isolate the production of the datasets into a set of factory functions that only create the vendor-specific dataset and return it as a TDataset. Each factory function goes in it's own unit. Then, only those small units need have any knowledge of the vendor specific components.
You can use IProviderSupport to generalize the query execution. I am expecting, that used query's are supporting IProviderSupport. This interface allows to set command text, parameters, execute commands, etc.
The common denominator for used query's is TDataSet. So, you will need pass TDataSet reference.
For example:
var
oDS: TDataSet;
...
if NameOfDataBase = 'X' then
oDS := T1Query.Create(nil)
else
oDS := T2Query.Create(nil);
(oDS as IProviderSupport).PSSetCommandText('select * from mytab');
oDS.Open;
while not oDS.Eof do begin
//
oDS.Next;
end;
Perhaps you could consider a universal data access component such as UniDAC or AnyDAC. This allows you to use only one component set to access different databases in a consistent way.
You might also be interested in DataAbstract from RemObjects. A powerful data abstraction, multi-tier, remoting solution with a lot of features. Not inexpensive, but excellent value for the money.
Relevant links:
http://www.da-soft.com/anydac/
http://www.devart.com/unidac/
http://www.remobjects.com/da/
A simple approach would be to have ListAllTables return (or populate) a stringlist.

How to handle billions of objects without "Outofmemory" error

I have an application which may needs to process billions of objects.Each object of is of TRange class type. These ranges are created at different parts of an algorithm which depends on certain conditions and other object properties. As a result, if you have 100 items, you can't directly create the 100th object without creating all the prior objects. If I create all the (billions of) objects and add to the collection, the system will throw Outofmemory error. Now I want to iterate through each object mainly for two purposes:
To apply an operation for each TRange object(eg:Output certain properties)
To get a cumulative sum of a certain property.(eg: Each range has a weight property and I want to retreive totalweight that is a sum of all the range weights).
How do I effectively create an Iterator for these object without raising Outofmemory?
I have handled the first case by passing a function pointer to the algorithm function. For eg:
procedure createRanges(aProc: TRangeProc);//aProc is a pointer to function that takes a //TRange
var range: TRange;
rangerec: TRangeRec;
begin
range:=TRange.Create;
try
while canCreateRange do begin//certain conditions needed to create a range
rangerec := ReturnRangeRec;
range.Update(rangerec);//don't create new, use the same object.
if Assigned(aProc) then aProc(range);
end;
finally
range.Free;
end;
end;
But the problem with this approach is that to add a new functionality, say to retrieve the Total weight I have mentioned earlier, either I have to duplicate the algorithm function or pass an optional out parameter. Please suggest some ideas.
Thank you all in advance
Pradeep
For such large ammounts of data you need to only have a portion of the data in memory. The other data should be serialized to the hard drive. I tackled such a problem like this:
I Created an extended storage that can store a custom record either in memory or on the hard drive. This storage has a maximum number of records that can live simultaniously in memory.
Then I Derived the record classes out of the custom record class. These classes know how to store and load themselves from the hard drive (I use streams).
Everytime you need a new or already existing record you ask the extended storage for such a record. If the maximum number of objects is exceeded, the storage streams some of the least used record back to the hard drive.
This way the records are transparent. You always access them as if they are in memory, but they may get loaded from hard drive first. It works really well. By the way RAM works in a very similar way so it only holds a certain subset of all you data on your hard drive. This is your working set then.
I did not post any code because it is beyond the scope of the question itself and would only confuse.
Look at TgsStream64. This class can handle a huge amounts of data through file mapping.
http://code.google.com/p/gedemin/source/browse/trunk/Gedemin/Common/gsMMFStream.pas
But the problem with this approach is that to add a new functionality, say to retrieve the Total weight I have mentioned earlier, either I have to duplicate the algorithm function or pass an optional out parameter.
It's usually done like this: you write a enumerator function (like you did) which receives a callback function pointer (you did that too) and an untyped pointer ("Data: pointer"). You define a callback function to have first parameter be the same untyped pointer:
TRangeProc = procedure(Data: pointer; range: TRange);
procedure enumRanges(aProc: TRangeProc; Data: pointer);
begin
{for each range}
aProc(range, Data);
end;
Then if you want to, say, sum all ranges, you do it like this:
TSumRecord = record
Sum: int64;
end;
PSumRecord = ^TSumRecord;
procedure SumProc(SumRecord: PSumRecord; range: TRange);
begin
SumRecord.Sum := SumRecord.Sum + range.Value;
end;
function SumRanges(): int64;
var SumRec: TSumRecord;
begin
SumRec.Sum := 0;
enumRanges(TRangeProc(SumProc), #SumRec);
Result := SumRec.Sum;
end;
Anyway, if you need to create billions of ANYTHING you're probably doing it wrong (unless you're a scientist, modelling something extremely large scale and detailed). Even more so if you need to create billions of stuff every time you want one of those. This is never good. Try to think of alternative solutions.
"Runner" has a good answer how to handle this!
But I would like to known if you could do a quick fix: make smaller TRange objects.
Maybe you have a big ancestor? Can you take a look at the instance size of TRange object?
Maybe you better use packed records?
This part:
As a result, if you have 100 items,
you can't directly create the 100th
object without creating all the prior
objects.
sounds a bit like calculating Fibonacci. May be you can reuse some of the TRange objects instead of creating redundant copies? Here is a C++ article describing this approach - it works by storing already calculated intermediate results in a hash map.
Handling billions of objects is possible but you should avoid it as much as possible. Do this only if you absolutely have to...
I did create a system once that needed to be able to handle a huge amount of data. To do so, I made my objects "streamable" so I could read/write them to disk. A larger class around it was used to decide when an object would be saved to disk and thus removed from memory. Basically, when I would call an object, this class would check if it's loaded or not. If not, it would re-create the object again from disk, put it on top of a stack and then move/write the bottom object from this stack to disk. As a result, my stack had a fixed (maximum) size. And it allowed me to use an unlimited amount of objects, with a reasonable good performance too.
Unfortunately, I don't have that code available anymore. I wrote it for a previous employer about 7 years ago. I do know that you would need to write a bit of code for the streaming support plus a bunch more for the stack controller which maintains all those objects. But it technically would allow you to create an unlimited number of objects, since you're trading RAM memory for disk space.

Approaches for caching calculated values

In a Delphi application we are working on we have a big structure of related objects. Some of the properties of these objects have values which are calculated at runtime and I am looking for a way to cache the results for the more intensive calculations. An approach which I use is saving the value in a private member the first time it is calculated. Here's a short example:
unit Unit1;
interface
type
TMyObject = class
private
FObject1, FObject2: TMyOtherObject;
FMyCalculatedValue: Integer;
function GetMyCalculatedValue: Integer;
public
property MyCalculatedValue: Integer read GetMyCalculatedValue;
end;
implementation
function TMyObject.GetMyCalculatedValue: Integer;
begin
if FMyCalculatedValue = 0 then
begin
FMyCalculatedValue :=
FObject1.OtherCalculatedValue + // This is also calculated
FObject2.OtherValue;
end;
Result := FMyCalculatedValue;
end;
end.
It is not uncommon that the objects used for the calculation change and the cached value should be reset and recalculated. So far we addressed this issue by using the observer pattern: objects implement an OnChange event so that others can subscribe, get notified when they change and reset cached values. This approach works but has some downsides:
It takes a lot of memory to manage subscriptions.
It doesn't scale well when a cached value depends on lots of objects (a list for example).
The dependency is not very specific (even if a cache value depends only on one property it will be reset also when other properties change).
Managing subscriptions impacts the overall performance and is hard to maintain (objects are deleted, moved, ...).
It is not clear how to deal with calculations depending on other calculated values.
And finally the question: can you suggest other approaches for implementing cached calculated values?
If you want to avoid the Observer Pattern, you might try to use a hashing approach.
The idea would be that you 'hash' the arguments, and check if this match the 'hash' for which the state is saved. If it does not, then you recompute (and thus save the new hash as key).
I know I make it sound like I just thought about it, but in fact it is used by well-known softwares.
For example, SCons (Makefile alternative) does it to check if the target needs to be re-built preferably to a timestamp approach.
We have used SCons for over a year now, and we never detected any problem of target that was not rebuilt, so their hash works well!
You could store local copies of the external object values which are required. The access routine then compares the local copy with the external value, and only does the recalculation on a change.
Accessing the external objects properties would likewise force a possible re-evaluation of those properties, so the system should keep itself up-to-date automatically, but only re-calculate when it needs to. I don't know if you need to take steps to avoid circular dependencies.
This increases the amount of space you need for each object, but removes the observer pattern. It also defers all calculations until they are needed, instead of performing the calculation every time a source parameter changes. I hope this is relevant for your system.
unit Unit1;
interface
type
TMyObject = class
private
FObject1, FObject2: TMyOtherObject;
FObject1Val, FObject2Val: Integer;
FMyCalculatedValue: Integer;
function GetMyCalculatedValue: Integer;
public
property MyCalculatedValue: Integer read GetMyCalculatedValue;
end;
implementation
function TMyObject.GetMyCalculatedValue: Integer;
begin
if (FObject1.OtherCalculatedValue &LT;&GT; FObjectVal1)
or (FObject2.OtherValue &LT;&GT; FObjectVal2) then
begin
FMyCalculatedValue :=
FObject1.OtherCalculatedValue + // This is also calculated
FObject2.OtherValue;
FObjectVal1 := FObject1.OtherCalculatedValue;
FObjectVal2 := Object2.OtherValue;
end;
Result := FMyCalculatedValue;
end;
end.
In my work I use Bold for Delphi that can manage unlimited complex structures of cached values depending on each other. Usually each variable only holds a small part of the problem. In this framework that is called derived attributes. Derived because the value is not saved in the database, It just depends on on other derived attributes or persistant attributes in the database.
The code behind such attribute is written in Delphi as a procedure or in OCL (Object Constraint Language) in the model. If you write it as Delphi code you have to subscribe to the depending variables. So if attribute C depends on A and B then whenever A or B changes the code for recalc C is called automatically when C is read. So the first time C is read A and B is also read (maybe from the database). As long as A and B is not changed you can read C and got very fast performance. For complex calculations this can save quite a lot of CPU-time.
The downside and bad news is that Bold is not offically supported anymore and you cannot buy it either. I suppose you can get if you ask enough people, but I don't know where you can download it. Around 2005-2006 it was downloadable for free from Borland but not anymore.
It is not ready for D2009 as someone have to port it to Unicode.
Another option is ECO with dot.net from Capable Objects. ECO is a plugin in Visual Studio. It is a supported framwork that have the same idea and author as Bold for Delphi. Many things are also improved, for example databinding is used for the GUI-components. Both Bold and ECO use a model as a central point with classes, attributes and links. Those can be persisted in a database or a xml-file. With the free version of ECO the model can have max 12 classes, but as I remember there is no other limits.
Bold and ECO contains lot more than derived attributes that makes you more productive and allow you to think on the problem instead of technical details of database or in your case how to cache values. You are welcome with more questions about those frameworks!
Edit:
There is actually a download link for Embarcadero registred users for Bold for Delphi for D7, quite old... I know there was updates for D2005, ad D2006.

Resources