Approaches for caching calculated values - delphi

In a Delphi application we are working on we have a big structure of related objects. Some of the properties of these objects have values which are calculated at runtime and I am looking for a way to cache the results for the more intensive calculations. An approach which I use is saving the value in a private member the first time it is calculated. Here's a short example:
unit Unit1;
interface
type
TMyObject = class
private
FObject1, FObject2: TMyOtherObject;
FMyCalculatedValue: Integer;
function GetMyCalculatedValue: Integer;
public
property MyCalculatedValue: Integer read GetMyCalculatedValue;
end;
implementation
function TMyObject.GetMyCalculatedValue: Integer;
begin
if FMyCalculatedValue = 0 then
begin
FMyCalculatedValue :=
FObject1.OtherCalculatedValue + // This is also calculated
FObject2.OtherValue;
end;
Result := FMyCalculatedValue;
end;
end.
It is not uncommon that the objects used for the calculation change and the cached value should be reset and recalculated. So far we addressed this issue by using the observer pattern: objects implement an OnChange event so that others can subscribe, get notified when they change and reset cached values. This approach works but has some downsides:
It takes a lot of memory to manage subscriptions.
It doesn't scale well when a cached value depends on lots of objects (a list for example).
The dependency is not very specific (even if a cache value depends only on one property it will be reset also when other properties change).
Managing subscriptions impacts the overall performance and is hard to maintain (objects are deleted, moved, ...).
It is not clear how to deal with calculations depending on other calculated values.
And finally the question: can you suggest other approaches for implementing cached calculated values?

If you want to avoid the Observer Pattern, you might try to use a hashing approach.
The idea would be that you 'hash' the arguments, and check if this match the 'hash' for which the state is saved. If it does not, then you recompute (and thus save the new hash as key).
I know I make it sound like I just thought about it, but in fact it is used by well-known softwares.
For example, SCons (Makefile alternative) does it to check if the target needs to be re-built preferably to a timestamp approach.
We have used SCons for over a year now, and we never detected any problem of target that was not rebuilt, so their hash works well!

You could store local copies of the external object values which are required. The access routine then compares the local copy with the external value, and only does the recalculation on a change.
Accessing the external objects properties would likewise force a possible re-evaluation of those properties, so the system should keep itself up-to-date automatically, but only re-calculate when it needs to. I don't know if you need to take steps to avoid circular dependencies.
This increases the amount of space you need for each object, but removes the observer pattern. It also defers all calculations until they are needed, instead of performing the calculation every time a source parameter changes. I hope this is relevant for your system.
unit Unit1;
interface
type
TMyObject = class
private
FObject1, FObject2: TMyOtherObject;
FObject1Val, FObject2Val: Integer;
FMyCalculatedValue: Integer;
function GetMyCalculatedValue: Integer;
public
property MyCalculatedValue: Integer read GetMyCalculatedValue;
end;
implementation
function TMyObject.GetMyCalculatedValue: Integer;
begin
if (FObject1.OtherCalculatedValue <> FObjectVal1)
or (FObject2.OtherValue <> FObjectVal2) then
begin
FMyCalculatedValue :=
FObject1.OtherCalculatedValue + // This is also calculated
FObject2.OtherValue;
FObjectVal1 := FObject1.OtherCalculatedValue;
FObjectVal2 := Object2.OtherValue;
end;
Result := FMyCalculatedValue;
end;
end.

In my work I use Bold for Delphi that can manage unlimited complex structures of cached values depending on each other. Usually each variable only holds a small part of the problem. In this framework that is called derived attributes. Derived because the value is not saved in the database, It just depends on on other derived attributes or persistant attributes in the database.
The code behind such attribute is written in Delphi as a procedure or in OCL (Object Constraint Language) in the model. If you write it as Delphi code you have to subscribe to the depending variables. So if attribute C depends on A and B then whenever A or B changes the code for recalc C is called automatically when C is read. So the first time C is read A and B is also read (maybe from the database). As long as A and B is not changed you can read C and got very fast performance. For complex calculations this can save quite a lot of CPU-time.
The downside and bad news is that Bold is not offically supported anymore and you cannot buy it either. I suppose you can get if you ask enough people, but I don't know where you can download it. Around 2005-2006 it was downloadable for free from Borland but not anymore.
It is not ready for D2009 as someone have to port it to Unicode.
Another option is ECO with dot.net from Capable Objects. ECO is a plugin in Visual Studio. It is a supported framwork that have the same idea and author as Bold for Delphi. Many things are also improved, for example databinding is used for the GUI-components. Both Bold and ECO use a model as a central point with classes, attributes and links. Those can be persisted in a database or a xml-file. With the free version of ECO the model can have max 12 classes, but as I remember there is no other limits.
Bold and ECO contains lot more than derived attributes that makes you more productive and allow you to think on the problem instead of technical details of database or in your case how to cache values. You are welcome with more questions about those frameworks!
Edit:
There is actually a download link for Embarcadero registred users for Bold for Delphi for D7, quite old... I know there was updates for D2005, ad D2006.

Related

Difference between Record and Packed Record [duplicate]

While reviewing some code in our legacy Delphi 7 program, I noticed that everywhere there is a record it is marked with packed. This of course means that the record is stored byte-for-byte and not aligned to be faster for the CPU to access. The packing seems to have been done blindly as an attempt to outsmart the compiler or something -- basically valuing a few bytes of memory instead of faster access
An example record:
TFooTypeRec = packed record
RID : Integer;
Description : String;
CalcInTotalIncome : Boolean;
RequireAddress : Boolean;
end;
Should I fix this and make every record normal or "not" packed? Or with modern CPUs and memory is this negligible and probably a waste of time? Are there any problems that can result from unpacking?
There is no way to answer this question without a full understanding of how each of those packed records are used in your application code. It is the same as asking "Should I change this variable declaration from Int64 to Byte ?"
Without knowing what values that variable will be expected and required to maintain the answer could be yes. Or it could be no.
Similarly in your case. If a record needs to be packed then it should be left packed. If it does not need to be packed then there is no harm in not packing it. If you are not sure or cannot tell, then the safest course is to leave them as they are.
As a guide to making this determination (should you decide to proceed), situations where record packing is required or recommended include:
persistence of record values
sharing of record values with [potentially] differently compiled code
strict compatibility with externally defined structures
deliberately overlaying a type layout over differently structured memory
This isn't necessarily an exhaustive list, and what these all have in common is:
records comprising a series of values in adjacent bytes that must and can be relied upon by any potential producer or consumer of the record without possibility of interference from the compiler or other factors
What I would recommend is that (if possible and practical) you determine what purpose packing serves in each case and add documentation to that effect to the record declaration itself so that anyone in the future with the same question doesn't have to go through that discovery process, e.g.:
type
TSomeRecordType = packed record
// This record must be packed as it is used for persistence
..
end;
TSomeExternType = packed record
// This record must be packed as it is required to be compatible
// in memory with an externally defined struct (ref: extern code docs)
..
end;
The main idea of using packed records is not that you save a few bytes of memory! Instead, it is about guaranteeing that the variables are where you expect them to be in memory. Without such a guarantee, it would be impossible (or, at least, difficult) to manage memory manually on the heap and write to and read from files.
Hence, the program might malfunction if you 'unpack' the records!
If the record is stored/retrieved as packed or transfered in any way to a receiver that expects it to be packed, then do not change it.
Update :
There is a string type declared in your example. It looks suspicious, since storing the record in a binary file will not preserve the string content.
Packed record have length exactly size like members are.
No packed record are optimised (thay are aligned -> consequently higher) for better performance.

Does the compiler optimize (close) identical FieldByName calls?

In some code I'm maintaining, I see two different methods used in TClientDataSet.OnCalcFields event handlers:
with DataSet do
begin
// 1. Call FieldByName twice
if AMinDate > FieldByName(SPlanAllocatieFromDate).AsDateTime then
AMinDate := FieldByName(sPlanAllocatieFromDate).AsDateTime;
// 2. Put the retrieved FieldByName value in a temp var
lEmpID := FieldByName(SPlanAllocatieEmpID).AsInteger;
if lEmpID <> 0 then lTSAllocatedEmpIDs.Add(IntToStr(lEmpID));
end;
Will the compiler (Delphi XE2, Win32 app) optimize method 2 to use a temp var? The two FieldByNames are quite close, you could even say nested.
If not, I should rewrite 1. because OnCalcFields executes often.
BTW. I know about Fields[] versus FieldByName(), or using a temp TField var when running an EOF loop, those are not the issue here.
No version of the Delphi compiler does anything like this.
Such optimizations would require the compiler to be able to prove that the two calls to FieldByName would always give the same result, and there is currently no provision for flagging a method as being deterministic.
Note that it is quite possible in theory (if unlikely in reality) for the two calls NOT to give the same result, in this case e.g. if a different thread deletes a field out of the collection between the first and second call. Generally, the compiler does not know or care at the call site what a particular method call actually does.
Does the compiler optimize (close) identical FieldByName calls?
No it does not.
The compiler does not look inside function calls to see what is within. It therefore has no way to prove that the value returned by successive calls to a function would be the same. Likewise it has no way to prove that the function has no side-effects. These are the two prerequisites for the optimisation under consideration.
You will need to perform the optimisation yourself, by explicitly adding and using a local variable to store the value returned by a single call to FieldByName.
Beyond the consideration of performance, I would argue that the use of a local variable to hold the field is semantically much better. This makes it clear to the reader that all actions are performed on the same field. That reason alone would be enough to persuade me to make the change you describe. Don't repeat yourself.
And while we are in code review mode, you might care to reconsider the use of with.

About Data from TClientDataSet

I made a function that copies the Data from A TClientDataSet to B.
In production, the code will dynamically fill a TClientDataSet, like the following:
procedure LoadClientDataSet(const StringSql: String; ParamsAsArray: array of Variant; CdsToLoad: TClientDataSet);
begin
//There is a shared TClientDataSet that retrieves
//all data from a TRemoteDataModule in here.
//the following variables (and code) are here only to demonstration of the algorithm;
GlobalMainThreadVariable.SharedClientDataSet.CommandText := StringSql;
//Handle Parameters
GlobalMainThreadVariable.SharedClientDataSet.Open;
CdsToLoad.Data:= GlobalMainThreadVariable.SharedClientDataSet.Data;
GlobalMainThreadVariable.SharedClientDataSet.Close;
end;
That said:
It is safe to do so? (By safe I mean, what kind of exceptions should I expect and how to deal with them?)
How do I release ".Data's" memory?
The data storage residing behind the Data property is reference counted. Therefore you don't have to bother releasing it.
If you want to dive deeper into the intrinsics of TClientDataSet I recommend reading the really excellent book from Cary Jensen: Delphi in Depth: ClientDataSets
By assigning the Data property like you did duplicates the records. You have now two different isntances of TClientDataset with two different set of records with precisely the same structure, same row count and same field values.
It´s safe to do that if the receiving TClientDataset does not have any field structure previously defined or the existing structure is compatible with the Data being assigned. However, if we are talking about a huge number of records, the assignment may take long time and, in an extreme circumstance, it could exaust the computer´s memory (it all the depends on the computer´s configuration).
In order to release the Data, just close the dataset.
If you prefer to have two instances of TClientDataset but one single instance of the records, my suggestion is to use the TClientDataset.CloneCursor method, that instead of copying the Data, just assign a reference to it in a different dataset. In this case it´s the very same Data shared between two different datasets.

Delphi - How to genericize query result set

I am working with multiple databases within the same application. I am using drivers from two different companies. Both companies have tTable and tQuery Descendants that work well.
I need a way to have generic access to the data, regardless of which driver/tQuery component I am using to return a set of data. This data would NOT tie to components, just to my logic.
For example...(pseudocode) Let's create a function which can run against either tQuery component
function ListAllTables(NameOfDatabase : String) :ReturnSet??
begin
If NameOfDataBase = 'X' then use tQuery(Vendor 1)
else use tQuery(Vendor 2)
RunQuery;
Return Answer...
end;
When NORMALLY running a query, I do
Query.Open;
While not Query.EOF do
begin
Read my rows..
next;
end;
If I am CALLING ListAllTables, what is my return type so that I can iterate through the rows? Each tQuery Vendor is different, so I can't use that (can I, and if so, would I want to?) I could build a Memory Table, and pass that back, but that seems like extra work for ListAllRows to build a memory table, and then to pass it back to the calling routine so that it can "un-build", i.e. iterate through the rows...
What are your ideas and suggestions?
Thanks
GS
Almost all Delphi datasets descend from TDataset, and most useful behavior is defined on TDataset.
Therefore, if you assign each table or query to a variable of type TDataset, you should be able to perform your logic on that dataset in a vendor neutral fashion.
I would also isolate the production of the datasets into a set of factory functions that only create the vendor-specific dataset and return it as a TDataset. Each factory function goes in it's own unit. Then, only those small units need have any knowledge of the vendor specific components.
You can use IProviderSupport to generalize the query execution. I am expecting, that used query's are supporting IProviderSupport. This interface allows to set command text, parameters, execute commands, etc.
The common denominator for used query's is TDataSet. So, you will need pass TDataSet reference.
For example:
var
oDS: TDataSet;
...
if NameOfDataBase = 'X' then
oDS := T1Query.Create(nil)
else
oDS := T2Query.Create(nil);
(oDS as IProviderSupport).PSSetCommandText('select * from mytab');
oDS.Open;
while not oDS.Eof do begin
//
oDS.Next;
end;
Perhaps you could consider a universal data access component such as UniDAC or AnyDAC. This allows you to use only one component set to access different databases in a consistent way.
You might also be interested in DataAbstract from RemObjects. A powerful data abstraction, multi-tier, remoting solution with a lot of features. Not inexpensive, but excellent value for the money.
Relevant links:
http://www.da-soft.com/anydac/
http://www.devart.com/unidac/
http://www.remobjects.com/da/
A simple approach would be to have ListAllTables return (or populate) a stringlist.

How to handle billions of objects without "Outofmemory" error

I have an application which may needs to process billions of objects.Each object of is of TRange class type. These ranges are created at different parts of an algorithm which depends on certain conditions and other object properties. As a result, if you have 100 items, you can't directly create the 100th object without creating all the prior objects. If I create all the (billions of) objects and add to the collection, the system will throw Outofmemory error. Now I want to iterate through each object mainly for two purposes:
To apply an operation for each TRange object(eg:Output certain properties)
To get a cumulative sum of a certain property.(eg: Each range has a weight property and I want to retreive totalweight that is a sum of all the range weights).
How do I effectively create an Iterator for these object without raising Outofmemory?
I have handled the first case by passing a function pointer to the algorithm function. For eg:
procedure createRanges(aProc: TRangeProc);//aProc is a pointer to function that takes a //TRange
var range: TRange;
rangerec: TRangeRec;
begin
range:=TRange.Create;
try
while canCreateRange do begin//certain conditions needed to create a range
rangerec := ReturnRangeRec;
range.Update(rangerec);//don't create new, use the same object.
if Assigned(aProc) then aProc(range);
end;
finally
range.Free;
end;
end;
But the problem with this approach is that to add a new functionality, say to retrieve the Total weight I have mentioned earlier, either I have to duplicate the algorithm function or pass an optional out parameter. Please suggest some ideas.
Thank you all in advance
Pradeep
For such large ammounts of data you need to only have a portion of the data in memory. The other data should be serialized to the hard drive. I tackled such a problem like this:
I Created an extended storage that can store a custom record either in memory or on the hard drive. This storage has a maximum number of records that can live simultaniously in memory.
Then I Derived the record classes out of the custom record class. These classes know how to store and load themselves from the hard drive (I use streams).
Everytime you need a new or already existing record you ask the extended storage for such a record. If the maximum number of objects is exceeded, the storage streams some of the least used record back to the hard drive.
This way the records are transparent. You always access them as if they are in memory, but they may get loaded from hard drive first. It works really well. By the way RAM works in a very similar way so it only holds a certain subset of all you data on your hard drive. This is your working set then.
I did not post any code because it is beyond the scope of the question itself and would only confuse.
Look at TgsStream64. This class can handle a huge amounts of data through file mapping.
http://code.google.com/p/gedemin/source/browse/trunk/Gedemin/Common/gsMMFStream.pas
But the problem with this approach is that to add a new functionality, say to retrieve the Total weight I have mentioned earlier, either I have to duplicate the algorithm function or pass an optional out parameter.
It's usually done like this: you write a enumerator function (like you did) which receives a callback function pointer (you did that too) and an untyped pointer ("Data: pointer"). You define a callback function to have first parameter be the same untyped pointer:
TRangeProc = procedure(Data: pointer; range: TRange);
procedure enumRanges(aProc: TRangeProc; Data: pointer);
begin
{for each range}
aProc(range, Data);
end;
Then if you want to, say, sum all ranges, you do it like this:
TSumRecord = record
Sum: int64;
end;
PSumRecord = ^TSumRecord;
procedure SumProc(SumRecord: PSumRecord; range: TRange);
begin
SumRecord.Sum := SumRecord.Sum + range.Value;
end;
function SumRanges(): int64;
var SumRec: TSumRecord;
begin
SumRec.Sum := 0;
enumRanges(TRangeProc(SumProc), #SumRec);
Result := SumRec.Sum;
end;
Anyway, if you need to create billions of ANYTHING you're probably doing it wrong (unless you're a scientist, modelling something extremely large scale and detailed). Even more so if you need to create billions of stuff every time you want one of those. This is never good. Try to think of alternative solutions.
"Runner" has a good answer how to handle this!
But I would like to known if you could do a quick fix: make smaller TRange objects.
Maybe you have a big ancestor? Can you take a look at the instance size of TRange object?
Maybe you better use packed records?
This part:
As a result, if you have 100 items,
you can't directly create the 100th
object without creating all the prior
objects.
sounds a bit like calculating Fibonacci. May be you can reuse some of the TRange objects instead of creating redundant copies? Here is a C++ article describing this approach - it works by storing already calculated intermediate results in a hash map.
Handling billions of objects is possible but you should avoid it as much as possible. Do this only if you absolutely have to...
I did create a system once that needed to be able to handle a huge amount of data. To do so, I made my objects "streamable" so I could read/write them to disk. A larger class around it was used to decide when an object would be saved to disk and thus removed from memory. Basically, when I would call an object, this class would check if it's loaded or not. If not, it would re-create the object again from disk, put it on top of a stack and then move/write the bottom object from this stack to disk. As a result, my stack had a fixed (maximum) size. And it allowed me to use an unlimited amount of objects, with a reasonable good performance too.
Unfortunately, I don't have that code available anymore. I wrote it for a previous employer about 7 years ago. I do know that you would need to write a bit of code for the streaming support plus a bunch more for the stack controller which maintains all those objects. But it technically would allow you to create an unlimited number of objects, since you're trading RAM memory for disk space.

Resources