TClientDataSet and memory limits - Delphi

We have a system that creates reports from our data, and we can be dealing with a lot of data - over 150,000 rows is not out of the question.
Unfortunately, our experience with TClientDataSet has been one of limitations: it often raises an 'insufficient memory for this operation' error when the data gets too big.
So the question is this: is there a generally available implementation of TDataSet that can handle a large amount of data (for example by streaming directly to a file rather than keeping the entire dataset in memory)?
I am willing to implement such a class myself. But as far as I understand TClientDataSet, it needs to hold all of the data itself before it can save it to a file/stream. In addition, loading the data again should also be possible as a stream rather than loading an entire TClientDataSet object, because otherwise we wouldn't have solved the issue.

You can use either Firebird or InterBase in embedded mode.
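For illustration only, a minimal sketch of scanning an embedded Firebird database row by row with FireDAC; the access components, file path and table name are assumptions, not part of the answer:
uses
  FireDAC.Comp.Client, FireDAC.Stan.Def, FireDAC.DApt, FireDAC.Phys.FB;

var
  Conn: TFDConnection;
  Query: TFDQuery;
begin
  Conn := TFDConnection.Create(nil);
  Query := TFDQuery.Create(nil);
  try
    // Point the Firebird driver at a local database file; with the embedded
    // client library deployed next to the executable there is no server to run.
    Conn.DriverName := 'FB';
    Conn.Params.Add('Database=C:\Reports\staging.fdb'); // hypothetical path
    Conn.Connected := True;

    Query.Connection := Conn;
    Query.SQL.Text := 'SELECT * FROM REPORT_ROWS'; // hypothetical table
    Query.Open;
    while not Query.Eof do
    begin
      // feed the current row to the report writer instead of caching it all
      Query.Next;
    end;
  finally
    Query.Free;
    Conn.Free;
  end;
end;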

Is there really any need to cache all the data on the client before reporting? If not, maybe rethink how you're querying and processing data to generate these reports and see if there's a way that involves less client-side data (which comes with a bonus of less data transmitted over the network).
If you've been down that road before and you really do need all this data client side, then you could look at custom data structures. A TList<T> of records (even if you need to build your own indexes) takes a lot less memory than a TClientDataSet does.
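For illustration, a minimal sketch of that idea; TReportRow, its fields and the column names are placeholders for your actual result set:
uses
  Generics.Collections, DB;

type
  TReportRow = record
    Id: Integer;
    Customer: string;
    Total: Currency;
  end;

procedure LoadRows(Query: TDataSet; Rows: TList<TReportRow>);
var
  Row: TReportRow;
begin
  while not Query.Eof do
  begin
    // Each entry costs only the record's fields, not the per-row
    // bookkeeping a TClientDataSet keeps.
    Row.Id := Query.FieldByName('ID').AsInteger;
    Row.Customer := Query.FieldByName('CUSTOMER').AsString;
    Row.Total := Query.FieldByName('TOTAL').AsCurrency;
    Rows.Add(Row);
    Query.Next;
  end;
  // Build your own index here if you need ordered access,
  // e.g. Rows.Sort(...) with a custom IComparer<TReportRow>.
end;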

kbmMemTable is a nice alternative to TClientDataSet
http://www.components4programmers.com/products/kbmmemtable/
We have been using it for years and it is very useful and fast.

I wanted to underline that the capacity of TClientDataSet can be somewhat bigger than suggested.
A test of the limits of TClientDataSet - appending xxx,xxx records, posting the same single record over and over to get an idea of the size it can hold.
// Begin Load Record to TClientDataSet for Reverse (on Reverse) Processing
dxMemData1.Append;
dxMemData1['NT_Rec_No'] := 1000;
dxMemData1['NT_User'] := 'DEV\Administrator';
dxMemData1['NT_Type'] := 'Information';
dxMemData1['Ora_Timestamp'] := '20170706033859.000';
dxMemData1['Ora_Host'] := 'DEV0001';
dxMemData1['Ora_SID'] := 'Oracle.orcl';
dxMemData1['Ora_Event_Id'] := '34';
dxMemData1['NT_Message'] := Memo1.Text;
dxMemData1.Post;
// End Load Record to TClientDataSet for Reverse (on Reverse) Processing
The string in Memo1 is 100 characters long - ANSI.
I did several tests and managed to get somewhere between 600,000 and 900,000 records in without crashing.
Making the text in the memo bigger decreased the maximum number of records before the crash, which suggests it is not a matter of an exact maximum record count but of total memory consumed - my guess.
I tested the same with TdxMemData (DevExpress); this time I could reach almost double the number of records.
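For reference, a minimal sketch of how such a stress test can be driven (ClientDataSet1, Memo1 and the exception handler are illustrative; the remaining fields from the snippet above would be filled the same way):
var
  Count: Integer;
begin
  Count := 0;
  try
    // Keep appending the same record until the dataset gives up.
    while True do
    begin
      ClientDataSet1.Append;
      ClientDataSet1['NT_Rec_No'] := Count;
      ClientDataSet1['NT_Message'] := Memo1.Text; // ~100 ANSI characters
      ClientDataSet1.Post;
      Inc(Count);
    end;
  except
    on E: Exception do
      ShowMessage(Format('Gave up after %d records: %s', [Count, E.Message]));
  end;
end;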

Related

About Data from TClientDataSet

I made a function that copies the Data from TClientDataSet A to B.
In production, the code will dynamically fill a TClientDataSet, like the following:
procedure LoadClientDataSet(const StringSql: String; ParamsAsArray: array of Variant; CdsToLoad: TClientDataSet);
begin
  //There is a shared TClientDataSet that retrieves
  //all data from a TRemoteDataModule in here.
  //The following variables (and code) are here only to demonstrate the algorithm.
  GlobalMainThreadVariable.SharedClientDataSet.CommandText := StringSql;
  //Handle Parameters
  GlobalMainThreadVariable.SharedClientDataSet.Open;
  CdsToLoad.Data := GlobalMainThreadVariable.SharedClientDataSet.Data;
  GlobalMainThreadVariable.SharedClientDataSet.Close;
end;
That said:
Is it safe to do so? (By safe I mean: what kind of exceptions should I expect and how do I deal with them?)
How do I release ".Data's" memory?
The data storage residing behind the Data property is reference counted. Therefore you don't have to bother releasing it.
If you want to dive deeper into the intrinsics of TClientDataSet I recommend reading the really excellent book from Cary Jensen: Delphi in Depth: ClientDataSets
Assigning the Data property like you did duplicates the records. You now have two different instances of TClientDataSet with two different sets of records that have precisely the same structure, row count and field values.
It's safe to do that if the receiving TClientDataSet has no field structure previously defined, or if the existing structure is compatible with the Data being assigned. However, if we are talking about a huge number of records, the assignment may take a long time and, in an extreme case, it could exhaust the computer's memory (it all depends on the computer's configuration).
In order to release the Data, just close the dataset.
If you prefer to have two instances of TClientDataSet but a single instance of the records, my suggestion is to use the TClientDataSet.CloneCursor method, which instead of copying the Data just assigns a reference to it in a different dataset. In that case the very same Data is shared between two different datasets.
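A minimal sketch of that approach (Master and Clone are illustrative names; Master is assumed to already hold the data):
var
  Master, Clone: TClientDataSet;
begin
  Clone := TClientDataSet.Create(nil);
  try
    // Reset = True resets the clone's filters, ranges and index to defaults
    // instead of copying them from Master; either way the underlying record
    // store is shared, so no data is duplicated in memory.
    Clone.CloneCursor(Master, True);
    // ... navigate and filter Clone independently of Master ...
  finally
    Clone.Free;
  end;
end;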

Optimizing looping through dataset

I have some code like this:
dxMemOrdered : TdxMemData;

while not qrySandbox2.Eof do
begin
  dxMemOrdered.Append;
  dxMemOrderedTotal.AsCurrency := qrySandbox2.FieldByName('TOTAL').AsCurrency;
  dxMemOrdered.Post;
  qrySandbox2.Next;
end;
This code executes in a thread. When there are a huge number of records, say 400,000, it takes around 25 minutes to work through them. Is there any way I can reduce the time by optimizing the loop? Any help would be appreciated.
Update
Based on the suggestions i made the following changes
dxMemOrdered : TdxMemData;

qrySandbox2.DisableControls;
while not qrySandbox2.Recordset.EOF do
begin
  dxMemOrdered.Append;
  dxMemOrderedTotal.AsCurrency := qrySandbox2.Recordset.Fields['TOTAL'].Value;
  dxMemOrdered.Post;
  qrySandbox2.Next;
end;
qrySandbox2.EnableControls;
and my run time has improved from 15 minutes to 2 minutes. Thank you guys.
Without seeing more code, the only suggestion I can make is to make sure that any visual control using the memory table is disabled. Suppose you have a TcxGrid called Grid that is linked to your dxMemOrdered memory table:
var
  dxMemOrdered: TdxMemData;
...
Grid.BeginUpdate;
try
  while not qrySandbox2.Eof do
  begin
    dxMemOrdered.Append;
    dxMemOrderedTotal.AsCurrency := qrySandbox2.FieldByName('TOTAL').AsCurrency;
    dxMemOrdered.Post;
    qrySandbox2.Next;
  end;
finally
  Grid.EndUpdate;
end;
Some ideas in order of performance gain vs work to do by you:
1) Check if the SQL dialect you are using lets you write queries that SELECT from one table and INSERT into another directly. This depends on the database you're using.
2) Make sure that if your datasets are not coupled to visual controls, you call DisableControls/EnableControls around this loop.
3) Does this code have to run in the main program thread? Maybe you can send it off to a separate thread while the user/program continues doing something else.
4) When you have to deal with really large data, bulk insertion is the way to go. Many databases have options to bulk insert data from text files. Writing to a text file first and then bulk inserting is way faster than individual inserts (see the staging sketch below). Again, this depends on your database type.
[Edit: I just saw you adding the info that it's TdxMemData, so some of these no longer apply. And you're already threading - missed that ;-). I leave these suggestions in for other readers with similar problems.]
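A rough illustration of point 4: stage the rows in a plain text file, then run your database's own bulk-load command on it. StageRowsToTextFile, the single TOTAL column and the UTF-8 encoding are illustrative choices, not something from the question; the bulk-load statement itself (BULK INSERT, LOAD DATA, external tables, ...) depends entirely on your database.
uses
  Classes, SysUtils, DB;

procedure StageRowsToTextFile(Source: TDataSet; const FileName: string);
var
  Writer: TStreamWriter;
begin
  Writer := TStreamWriter.Create(FileName, False, TEncoding.UTF8);
  try
    Source.DisableControls;
    try
      Source.First;
      while not Source.Eof do
      begin
        // one value per line; add more columns and a separator as needed
        Writer.WriteLine(CurrToStr(Source.FieldByName('TOTAL').AsCurrency));
        Source.Next;
      end;
    finally
      Source.EnableControls;
    end;
  finally
    Writer.Free;
  end;
end;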
It's much better to let SQL do the work instead of iterating through a loop in Delphi. Try a query such as
insert into dxMemOrdered (total)
select total from qrySandbox2
Is 'total' the only field in dxMemOrdered? I hope it's not the primary key, otherwise you are likely to have collisions, meaning that rows will not be added.
There's actually a lot you could do to speed up your thread.
The first would be to look at the problem in a broader perspective:
Am I fetching data from a cached / fast disk, possibly moved in memory?
Am I doing the right thing when aggregating totals by hand? SQL engines are especially optimized to do these things; all you'd need to do is define an additional logical field in which to store the SQL-aggregated result.
Another little optimization that may bring an improvement over large amounts of looping is to not use constructs like:
Recordset.Fields['TOTAL'].Value
Recordset.FieldByName('TOTAL').Value
but to add the fields with the Fields Editor and then access the right field directly. You'll save a whole scan of the fields collection, which is otherwise performed for every field lookup, on every record.
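In code, with the names from the question, that means resolving the field once before the loop (TotalSource is an illustrative name):
var
  TotalSource: TField;
begin
  // Look the field up once instead of once per record.
  TotalSource := qrySandbox2.FieldByName('TOTAL');
  qrySandbox2.DisableControls;
  try
    while not qrySandbox2.Eof do
    begin
      dxMemOrdered.Append;
      dxMemOrderedTotal.AsCurrency := TotalSource.AsCurrency;
      dxMemOrdered.Post;
      qrySandbox2.Next;
    end;
  finally
    qrySandbox2.EnableControls;
  end;
end;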

How does AdoQuery handle blobs?

I am testing some database components such as SDAC and others, and I found out something interesting:
When I execute a query with TADOQuery that has a lot of blob fields and I fetch all rows (FetchAll), the memory usage of my application gets close to 1.8 GB and everything works fine.
Using other components, the same query executed on the same database throws an Out of Memory exception because it exceeds 1.8 GB of memory usage.
I know I should not return all those rows and should use pagination and so on, but I am curious how ADO manages to get all the rows when other components can't.
I think ADO is compressing the blobs in memory, but this is only a guess.
Does anyone know why memory usage in ADO is so good?
I cannot speak for SDAC, but I can for AnyDAC's TADQuery:
if you exclude fiBlobs from FetchOptions.Items, AnyDAC will not fetch BLOB values immediately, but will defer fetching until the application really needs a BLOB value;
setting FormatOptions.InlineDataSize to a smaller value will reduce memory usage when fetching a large result set with many character fields;
by specifying FormatOptions.MapRules, the application can choose a more compact data type representation.
There are also a few other techniques that reduce memory usage when fetching large result sets. To use them properly, a developer should know what kind of data will be returned. The price of using some of these options may be slightly reduced fetch performance.
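For illustration, a sketch of those options using today's FireDAC identifiers (FireDAC is AnyDAC's successor; older AnyDAC versions use AD-prefixed names, and FDQuery1 is an assumed design-time component):
uses
  FireDAC.Stan.Intf, FireDAC.Stan.Option, FireDAC.Comp.Client;

begin
  // 1) Defer BLOB fetching until a BLOB value is actually read.
  FDQuery1.FetchOptions.Items := FDQuery1.FetchOptions.Items - [fiBlobs];

  // 2) Keep less character data inline in each record buffer.
  FDQuery1.FormatOptions.InlineDataSize := 64;

  // 3) Map wide string columns to a more compact ANSI representation
  //    (only if the data really is single-byte).
  FDQuery1.FormatOptions.OwnMapRules := True;
  with FDQuery1.FormatOptions.MapRules.Add do
  begin
    SourceDataType := dtWideString;
    TargetDataType := dtAnsiString;
  end;

  FDQuery1.Open;
end;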

TStringList of objects taking up tons of memory in Delphi XE

I'm working on a simulation program.
One of the first things the program does is read in a huge file (28 MB, about 79,000 lines), parse each line (about 150 fields), create an object for it, and add it to a TStringList.
It also reads in another file, which adds more objects during the run. At the end, it ends up being about 85,000 objects.
I was working with Delphi 2007, and the program used a lot of memory, but it ran OK. I upgraded to Delphi XE, migrated the program over, and now it uses a LOT more memory and ends up running out of memory halfway through the run.
In Delphi 2007 it would end up using 1.4 GB after reading in the initial file, which is obviously a huge amount, but in XE it ends up using almost 1.8 GB, which is even bigger and leads to running out of memory and getting the error.
So my questions are:
Why is it using so much memory?
Why is it using so much more memory in XE than in 2007?
What can I do about this? I can't change how big or long the file is, and I do need to create an object for each line and store it somewhere.
Thanks
Just one idea which may save memory.
You could let the data stay in the original files, and just point to it from in-memory structures.
For instance, that is what we do for browsing big log files almost instantly: we memory-map the log file content, then parse it quickly to create indexes of useful information in memory, then read the content dynamically. No string is created during the reading - only pointers to each line beginning, with dynamic arrays containing the needed indexes. Calling TStringList.LoadFromFile would definitely be much slower and more memory-consuming.
The code is here - see the TSynLogFile class. The trick is to read the file only once, and make all indexes on the fly.
For instance, here is how we retrieve a line of text from the UTF-8 file content:
function TMemoryMapText.GetString(aIndex: integer): string;
begin
  if (self = nil) or (cardinal(aIndex) >= cardinal(fCount)) then
    result := ''
  else
    result := UTF8DecodeToString(fLines[aIndex], GetLineSize(fLines[aIndex], fMapEnd));
end;
We use the exact same trick to parse JSON content; such a mixed approach is used by the fastest XML access libraries.
To handle your high-level data and query it fast, you may try to use dynamic arrays of records and our optimized TDynArray and TDynArrayHashed wrappers (in the same unit). Arrays of records will consume less memory and will be faster to search in because the data won't be fragmented (even faster if you use ordered indexes or hashes), and you'll be able to have high-level access to the content (you can define custom functions to retrieve the data from the memory-mapped file, for instance). Dynamic arrays don't suit fast deletion of items (or you'll have to use lookup tables) - but you wrote that you are not deleting much data, so it won't be a problem in your case.
So you won't have any duplicated structure any more: only logic in RAM, and data in memory-mapped file(s) - I added an "s" here because the same logic could perfectly well map to several source data files (you'd need some "merge" and "live refresh" logic, AFAIK).
It's hard to say why your 28 MB file expands to 1.4 GB worth of objects when you parse it out into objects without seeing the code and the class declarations. Also, you say you're storing it in a TStringList instead of a TList or TObjectList. This sounds like you're using it as some sort of string->object key/value mapping. If so, you might want to look at the TDictionary class in the Generics.Collections unit in XE.
As for why you're using more memory in XE, it's because the string type changed from an ANSI string to a UTF-16 string in Delphi 2009. If you don't need Unicode, you could use a TDictionary keyed on AnsiString to save space.
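A minimal sketch of that key/value mapping; TMyObject is a placeholder standing in for the poster's per-line class:
uses
  Generics.Collections;

type
  TMyObject = class  // placeholder for your parsed-line class
  end;

var
  Objects: TObjectDictionary<AnsiString, TMyObject>;
  Obj: TMyObject;
begin
  // doOwnsValues makes the dictionary free the objects when it is destroyed.
  Objects := TObjectDictionary<AnsiString, TMyObject>.Create([doOwnsValues]);
  try
    Objects.Add('key parsed from a line', TMyObject.Create);
    if Objects.TryGetValue('key parsed from a line', Obj) then
    begin
      // use Obj here
    end;
  finally
    Objects.Free;
  end;
end;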
Also, to save even more memory, there's another trick you could use if you don't need all 79,000 of the objects right away: lazy loading. The idea goes something like this:
Read the file into a TStringList. (This will use about as much memory as the file size. Maybe twice as much if it gets converted into Unicode strings.) Don't create any data objects.
When you need a specific data object, call a routine that checks the string list and looks up the string key for that object.
Check if that string has an object associated with it. If not, create the object from the string and associate it with the string in the TStringList.
Return the object associated with the string.
This will keep both your memory usage and your load time down, but it's only helpful if you don't need all (or a large percentage) of the objects immediately after loading.
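A minimal sketch of that lazy-loading routine; TDataObject and CreateFromLine are illustrative stand-ins for the poster's per-line class and parser:
uses
  Classes;

type
  TDataObject = class
  public
    constructor CreateFromLine(const Line: string);
  end;

constructor TDataObject.CreateFromLine(const Line: string);
begin
  inherited Create;
  // parse the ~150 fields out of Line here
end;

// Return the object for line Index, creating and caching it on first access.
function GetObject(Lines: TStringList; Index: Integer): TDataObject;
begin
  Result := TDataObject(Lines.Objects[Index]);
  if Result = nil then
  begin
    Result := TDataObject.CreateFromLine(Lines[Index]);
    Lines.Objects[Index] := Result;
  end;
end;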
In Delphi 2007 (and earlier), a string is an Ansi string, that is, every character occupies 1 byte of memory.
In Delphi 2009 (and later), a string is a Unicode string, that is, every character occupies 2 bytes of memory.
AFAIK, there is no way to make a Delphi 2009+ TStringList object use Ansi strings. Are you really using any of the features of the TStringList? If not, you could use an array of strings instead.
Then, naturally, you can choose between
type
  TAnsiStringArray = array of AnsiString;
  // or
  TUnicodeStringArray = array of string; // In Delphi 2009+, string = UnicodeString
Reading through the comments, it sounds like you need to lift the data out of Delphi and into a database.
From there it is easy to match organ donors to recipients *)
SELECT pw.* FROM patients_waiting pw
INNER JOIN organs_available oa ON (pw.bloodtype = oa.bloodtype)
AND (pw.tissuetype = oa.tissuetype)
AND (pw.organ_needed = oa.organ_offered)
WHERE oa.id = '15484'
This query shows the patients that might match against new organ donor 15484.
In memory you only handle the few patients that match.
*) simplified beyond all recognition, but still.
In addition to Andreas' post:
Before Delphi 2009, a string header occupied 8 bytes. Starting with Delphi 2009, a string header takes 12 bytes. So every unique string uses 4 bytes more than before, on top of the fact that each character now takes twice the memory (for example, a 100-character string grows from roughly 8 + 101 = 109 bytes to 12 + 202 = 214 bytes, including the terminating null).
Also, starting with Delphi 2009, every TObject uses 8 bytes instead of 4. So for each single object created, Delphi now uses 4 more bytes. Those 4 bytes were added to support the TMonitor class, I believe.
If you're in desperate need of memory savings, here's a little trick that can help if you have a lot of string values that repeat themselves.
var
  uUniqueStrings : TStringList; // created elsewhere with Sorted := True, so that Find works

function ReduceStringMemory(const S : String) : string;
var
  idx : Integer;
begin
  if not uUniqueStrings.Find(S, idx) then
    idx := uUniqueStrings.Add(S);
  Result := uUniqueStrings[idx];
end;
Note that this will help ONLY if you have a lot of string values that repeat themselves. For example, this code uses 150 MB less on my system:
var
  sl : TStringList;
  I : Integer;
begin
  sl := TStringList.Create;
  try
    for I := 0 to 5000000 do
      sl.Add(ReduceStringMemory(StringOfChar('A', 5)));
  finally
    sl.Free;
  end;
end;
I also read in a lot of strings in my program that can approach a couple of GB for large files.
Short of waiting for 64-bit XE2, here is one idea that might help you:
I found storing individual strings in a stringlist to be slow and wasteful in terms of memory. I ended up blocking the strings together. My input file has logical records, which may contain between 5 and 100 lines. So instead of storing each line in the stringlist, I store each record. Processing a record to find the line I need adds very little time to my processing, so this is possible for me.
If you don't have logical records, you might just want to pick a blocking size, and store every (say) 10 or 100 strings together as one string (with a delimiter separating them).
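A minimal sketch of the blocking idea, assuming a delimiter character that never appears in the data (BlockSize, Delim and GetLine are illustrative):
uses
  Classes;

const
  BlockSize = 100;
  Delim = #1;

// Fetch one logical line from a list that stores BlockSize lines per entry,
// joined with Delim.
function GetLine(Blocks: TStringList; LineIndex: Integer): string;
var
  Parts: TStringList;
begin
  Parts := TStringList.Create;
  try
    Parts.Delimiter := Delim;
    Parts.StrictDelimiter := True;
    Parts.DelimitedText := Blocks[LineIndex div BlockSize];
    Result := Parts[LineIndex mod BlockSize];
  finally
    Parts.Free;
  end;
end;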
The other alternative is to store them in a fast and efficient on-disk file. The one I'd recommend is the open-source Synopse Big Table by Arnaud Bouchez.
May I suggest you try the JEDI code library (JCL) class TAnsiStringList, which is like the TStringList from Delphi 2007 in that it is made up of AnsiStrings.
Even then, as others have mentioned, XE will be using more memory than Delphi 2007.
I really don't see the value of loading the full text of a giant flat file into a stringlist. Others have suggested a big-table approach such as Arnaud Bouchez's, or using SQLite, or something like that, and I agree with them.
I think you could also write a simple class that loads the entire file into memory and provides a way to add line-by-line object links into a giant in-memory AnsiChar buffer.
Starting with Delphi 2009, not only strings but also every TObject has doubled in size. (See Why Has the Size of TObject Doubled In Delphi 2009?) But this would not explain the increase if there are only 85,000 objects; only if these objects contain many nested objects could their size be a relevant part of the memory usage.
Are there many duplicate strings in your list? Maybe storing only unique strings will help reduce the memory size. See my question about a string pool for a possible (but maybe too simple) answer.
Are you sure you don't suffer from a case of memory fragmentation?
Be sure to use the latest FastMM (currently 4.97), then take a look at the UsageTrackerDemo demo that contains a memory map form showing the actual usage of the Delphi memory.
Finally take a look at VMMap that shows you how your process memory is used.

Fast read of a Nexus database table

I want to read the entire contents of a table into memory, as quickly as possible. I am using Nexus database, but there might be some techniques I could use that are applicable to all database types in Delphi.
The table I am looking at has 60,000 records with 20 columns. So not a huge data set.
From my profiling, I have found the following so far:
Accessing tables directly using TnxTable is no faster or slower than using a SQL query and 'SELECT * FROM TableName'
The simple act of looping through the rows, without actually reading or copying any data, takes the majority of the time.
The performance I am getting is
Looping through all records takes 3.5 seconds
Looping through all the records, reading the values and storing them, takes 3.7 seconds (i.e. only 0.2 seconds more)
A sample of my code
var
  query: TnxQuery;
begin
  query.SQL.Text := 'SELECT * FROM TableName';
  query.Active := True;
  while not query.Eof do
    query.Next;
end;
This takes 3.5 seconds on a 60,000 row table.
Does this performance sound reasonable? Are there other approaches I can take that would let me read the data faster?
I am currently reading data from a server on the same computer, but eventually this may be from another server on a LAN.
You should be using BlockRead mode with a TnxTable for optimal read speed:
nxTable.BlockReadOptions := [gboBlobs, gboBookmarks];
//leave out gboBlobs if you want to access blobs only as needed
//leave out gboBookmarks if no bookmark support is required
nxTable.BlockReadSize := 1024*1024; //1MB
// setting block read size performs an implicit First
// while block read mode is active only calls to Next and First are allowed for navigation
try
  while not nxTable.Eof do begin
    // do something....
    nxTable.Next;
  end;
finally
  nxTable.BlockReadSize := 0;
end;
Also, if you don't need to set a range on a specific index, make sure to use the sequential access index for the fastest possible access.
