Recommended data structure for Dictionary+List - Delphi

I am trying to optimize a piece of code that holds a bunch of relations between objects (pointers).
The situation is the following:
From a series of objects, a series of drawings is generated:
TElement (A) ---> generates zero, one or more TDrawing (B)
(Classes are not relevant, so I'll call them A and B.)
The current implementation is made with a dictionary
TDictionary<B, A>
so every object of class B comes from a unique A.
But identifying all the objects of class B that belong to a given object of class A is very inefficient with this structure, as the whole dictionary needs to be looped to find all the keys with a certain value.
The immediate solution is changing the data structure to the following:
TDictionary<TElement, TList<TDrawing>>
which is still not completely efficient, as a lot of small memory allocations need to be done when filling the dictionary and its lists.
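For concreteness, a minimal sketch of that structure (my illustration, not the asker's code; a TObjectDictionary created with doOwnsValues frees the lists together with the dictionary):
uses
  Generics.Collections;
var
  Links: TObjectDictionary<TElement, TList<TDrawing>>;
  // created once: Links := TObjectDictionary<TElement, TList<TDrawing>>.Create([doOwnsValues]);

procedure AddLink(Source: TElement; Drawing: TDrawing);
var
  List: TList<TDrawing>;
begin
  // Lazily create one list per element on first use.
  if not Links.TryGetValue(Source, List) then
  begin
    List := TList<TDrawing>.Create;
    Links.Add(Source, List);
  end;
  List.Add(Drawing);
end;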
I wonder if someone could give me a hint about a good implementation; I would be grateful!
(sorry for my English)

Quite ugly code, but I have come up with this solution:
PLinks: TDictionary<TElement, TObject>;
where the values are either a single object or a list. When adding a value for a key that is not yet present, the value is stored as the bare object. Only when adding a second value for the same key is the value converted to a list.
procedure AddLink(const source: TElement; drawing: TDrawing);
// Note: "object" is a reserved word in Delphi, so the parameter is named drawing.
var
  v: TObject;
  list: TList<TDrawing>;
begin
  if not PLinks.ContainsKey(source) then
    PLinks.Add(source, drawing)
  else begin
    v := PLinks[source];
    if v is TList<TDrawing> then
      (v as TList<TDrawing>).Add(drawing)
    else begin // v is a single TDrawing
      list := TList<TDrawing>.Create;
      list.Add(v as TDrawing); // re-add the previous single item
      list.Add(drawing);
      PLinks[source] := list;
    end;
  end;
end;
And obviously, when looping through the values, you must check whether each value is a single object or a list.
It's not very elegant, mostly because compile-time type checking is lost, but in terms of speed and memory savings it's OK.
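For the read side, a small helper along these lines (my sketch, not part of the original post) hides the type test so callers always see a uniform list:
procedure GetLinks(const source: TElement; target: TList<TDrawing>);
var
  v: TObject;
begin
  if not PLinks.TryGetValue(source, v) then
    Exit; // no drawings for this element
  if v is TList<TDrawing> then
    target.AddRange(TList<TDrawing>(v))
  else
    target.Add(v as TDrawing);
end;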

Related

ClientDataSet (random record) playing the last record - error "At beginning of table"

An error occurs when playing the last record in the table: "At beginning of table". How do I fix it?
procedure TForm1.btnNextClick(Sender: TObject);
begin
  self.ListBox1.ItemIndex := Random(ListBox1.Items.Count) - 0;
  AddALL();
  begin
    ClientDataSet1.RecNo := Random(ClientDataSet1.RecordCount) - 0;
    PlayFile(self.exePath + '\' + self.ClientDataSet1.FieldByName('mp3').AsString, MediaPlayer1, Image2);
  end
end;
Val Marinov seems to have given you a good answer to your question. I just want to add some points that don't directly answer your question but may help you avoid making some mistakes.
You have some code
self.ListBox1.ItemIndex := Random(ListBox1.Items.Count)
which you want to use to set the listbox's ItemIndex to a random, valid value. A couple of points about this code (and the similar RecNo line) are worth examining:
1. How Random works with zero-based and one-based ranges
The online help for the Random function says
In Delphi code, Random returns a random number within the range 0 <= X < Range. If Range is not specified, the result is a real-type random number within the range 0 <= X < 1.
For a ListBox, the range of valid ItemIndex values is 0..Items.Count - 1, and Random(ListBox1.Items.Count) returns an integer in exactly that range (the trailing - 0 is a no-op), so the ListBox line is fine. RecNo is different: it is one-based, with valid values 1..RecordCount. Random(ClientDataSet1.RecordCount) can return 0, and assigning 0 to RecNo is precisely what raises the "At beginning of table" exception. Shift the result into the valid range:
ClientDataSet1.RecNo := Random(ClientDataSet1.RecordCount) + 1;
Called like that, Random returns a value in 0..RecordCount - 1, and the + 1 maps it onto the valid one-based record numbers.
2. Unnecessary use of self.
Your code is liberally sprinkled with the self qualifier. Having to use self like that is usually a sign of bad or sloppy coding.
In your TForm1.btnNextClick, the self in the first line tells the compiler that the ListBox1 you are referring to is the TListBox component on your TForm1, rather than some other ListBox1 variable which may also be in scope (e.g. a global variable called ListBox1) when the line is compiled. But the way to avoid that problem is to avoid having the other ListBox1 in scope in the first place.
I suggest you simply delete all the instances of self., because you shouldn't need them.
3. Avoid setting the dataset's RecNo directly
Finally, don't get into the habit of relying on the fact that TClientDataSet allows you to assign a value to RecNo; it is rarely a good idea, and few dataset types support it.
If you want to go to a random record, it is better to use
DataSet.First;
DataSet.MoveBy(Random(X));
I leave it to you to work out what the argument X to Random should be to move to a valid, random record, based on what the online help says about Random.
Record Numbers
Client datasets support a second way of moving directly to a given record in the dataset: setting the RecNo property of the dataset. RecNo is a one-based number indicating the sequential number of the current record relative to the beginning of the dataset.
You can read the RecNo property to determine the current absolute record number, and write the RecNo property to set the current record. There are two important things to keep in mind with respect to RecNo:
Attempting to set RecNo to a number less than one, or to a number greater than the number of records in the dataset, results in an "At beginning of table" or an "At end of table" exception, respectively.
The record number of any given record is not guaranteed to be constant. For instance, changing the active index on a dataset alters the record number of all records in the dataset.
NOTE: You can determine the number of records in the dataset by inspecting the dataset's RecordCount property. When setting RecNo, never attempt to set it to a number higher than RecordCount.
See: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/DB_TDataSet_RecNo.html

How can I avoid running out of memory with a growing TDictionary?

TDictionary<TKey,TValue> uses an internal array that is doubled if it is full:
newCap := Length(FItems) * 2;
if newCap = 0 then
  newCap := 4;
Rehash(newCap);
This performs well for a medium number of items, but near the upper limit it is very unfortunate: the doubling might throw an EOutOfMemory exception even though almost half of the memory is still available.
Is there any way to influence this behaviour? How do other collection classes deal with this scenario?
You need to understand how a dictionary works. A dictionary contains an array of "hash buckets" where the items you insert are placed. That's a finite number, so once you fill them up you need to allocate more buckets; there's no way around it. Since the assignment of items to buckets is based on the result of a hash function, you can't simply add buckets to the end of the array and put stuff in there: you need to re-allocate the whole bucket array, re-hash everything, and put it in the (new) corresponding buckets.
Given this behavior, the only way to make the dictionary not re-allocate once full is to make sure it never gets full. If you know the number of items you'll insert in the dictionary, pass it as a parameter to the constructor and you'll be done: no more dictionary reallocations.
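For example (a minimal sketch, mine): the constructor's optional capacity argument pre-sizes the internal array, so filling up to that count should not trigger a grow-and-rehash:
uses
  SysUtils, Generics.Collections; // System.* unit names in newer Delphi versions

procedure FillDict;
var
  Dict: TDictionary<Integer, string>;
  i: Integer;
begin
  // Pre-size for the expected number of items up front.
  Dict := TDictionary<Integer, string>.Create(1000000);
  try
    for i := 1 to 1000000 do
      Dict.Add(i, IntToStr(i)); // should not reallocate during the fill
  finally
    Dict.Free;
  end;
end;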
If you can't do that (you don't know the number of items you'll have in the dictionary), you'll need to reconsider what made you select TDictionary in the first place and pick a data structure that offers a better compromise for your particular algorithm. For example, you could use binary search trees, as they do their balancing by rotating information in existing nodes; no re-allocations are ever needed.

Understanding memory allocation for TList<RecordType>

I have to store a TList of something that can easily be implemented as a record in Delphi (five simple fields). However, it's not clear to me what happens when I do TList<TMyRecordType>.Add(R).
Since R is a local variable in the procedure in which I create my TList, I assume that the memory for it will be released when the function returns. Does this leave an invalid record pointer in the list? Or does the list know to copy on assignment? If the former, I assume I would have to manage the memory for R manually with New() and Dispose(); is that correct?
Alternatively, I can "promote" my record type to a class type by simply declaring the fields public (without even bothering to make them formal properties). Is that considered OK, or ought I to take the time to build out the class with private fields and public properties?
Simplified: records are blobs of data and are passed around by value - i.e. by copying them - by default. TList<T> stores values in an array of type T. So, TList<TMyRecordType>.Add(R) will copy the value R into the array at position Count, and increment the Count by one. No need to worry about allocation or deallocation of memory.
More complex issues that you usually don't need to worry about: if your record contains fields of a string type, an interface type, a dynamic array, or a record which itself contains fields of one of these types, then it's not just a simple copy of data; instead, CopyRecord from System.pas is used, which ensures that reference counts are updated correctly. But usually you don't need to worry about this detail unless you are using Move to shift the bits around yourself, or doing similar low-level operations.
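A small sketch illustrating the value-copy behaviour (the record type and values are mine):
uses
  Generics.Collections;

type
  TMyRecordType = record
    Id: Integer;
    Name: string; // managed field: copied via CopyRecord, refcount updated
  end;

procedure Demo;
var
  List: TList<TMyRecordType>;
  R: TMyRecordType;
begin
  List := TList<TMyRecordType>.Create;
  try
    R.Id := 1;
    R.Name := 'first';
    List.Add(R);            // the list stores a copy of R's current value
    R.Id := 99;             // changing the local afterwards...
    Assert(List[0].Id = 1); // ...does not affect the stored copy
  finally
    List.Free;
  end;
end;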

How to handle billions of objects without "Outofmemory" error

I have an application which may need to process billions of objects. Each object is of the TRange class type. These ranges are created at different parts of an algorithm which depends on certain conditions and other object properties. As a result, if you have 100 items, you can't directly create the 100th object without creating all the prior objects. If I create all the (billions of) objects and add them to the collection, the system will throw an out-of-memory error. Now I want to iterate through each object, mainly for two purposes:
To apply an operation to each TRange object (e.g. output certain properties)
To get a cumulative sum of a certain property (e.g. each range has a weight property and I want to retrieve a total weight that is the sum of all the range weights).
How do I effectively create an iterator for these objects without raising an out-of-memory error?
I have handled the first case by passing a function pointer to the algorithm function, e.g.:
procedure createRanges(aProc: TRangeProc); // aProc points to a procedure that takes a TRange
var
  range: TRange;
  rangerec: TRangeRec;
begin
  range := TRange.Create;
  try
    while canCreateRange do begin // certain conditions needed to create a range
      rangerec := ReturnRangeRec;
      range.Update(rangerec); // don't create a new object, reuse the same one
      if Assigned(aProc) then aProc(range);
    end;
  finally
    range.Free;
  end;
end;
But the problem with this approach is that to add new functionality, say retrieving the total weight I mentioned earlier, I either have to duplicate the algorithm function or pass an optional out parameter. Please suggest some ideas.
Thank you all in advance,
Pradeep
For such large amounts of data you need to keep only a portion of the data in memory. The rest should be serialized to the hard drive. I tackled such a problem like this:
I created an extended storage that can store a custom record either in memory or on the hard drive. This storage has a maximum number of records that can live simultaneously in memory.
Then I derived the record classes from the custom record class. These classes know how to store and load themselves from the hard drive (I use streams).
Every time you need a new or already-existing record, you ask the extended storage for it. If the maximum number of objects is exceeded, the storage streams some of the least-used records back to the hard drive.
This way the records are transparent: you always access them as if they were in memory, but they may get loaded from the hard drive first. It works really well. By the way, RAM works in a very similar way: it only holds a certain subset of all your data on your hard drive. This is your working set.
I did not post any code because it is beyond the scope of the question itself and would only confuse.
Look at TgsStream64. This class can handle huge amounts of data through file mapping.
http://code.google.com/p/gedemin/source/browse/trunk/Gedemin/Common/gsMMFStream.pas
But the problem with this approach is that to add a new functionality, say to retrieve the Total weight I have mentioned earlier, either I have to duplicate the algorithm function or pass an optional out parameter.
It's usually done like this: you write an enumerator function (like you did) which receives a callback function pointer (you did that too) plus an untyped pointer ("Data: pointer"). You define the callback type so that its first parameter is the same untyped pointer:
TRangeProc = procedure(Data: pointer; range: TRange);

procedure enumRanges(aProc: TRangeProc; Data: pointer);
begin
  {for each range}
  aProc(Data, range); // pass the caller's Data through to the callback
end;
Then if you want to, say, sum all ranges, you do it like this:
TSumRecord = record
  Sum: int64;
end;
PSumRecord = ^TSumRecord;

procedure SumProc(SumRecord: PSumRecord; range: TRange);
begin
  SumRecord.Sum := SumRecord.Sum + range.Value;
end;

function SumRanges(): int64;
var
  SumRec: TSumRecord;
begin
  SumRec.Sum := 0;
  enumRanges(TRangeProc(SumProc), @SumRec);
  Result := SumRec.Sum;
end;
Anyway, if you need to create billions of ANYTHING you're probably doing it wrong (unless you're a scientist, modelling something extremely large scale and detailed). Even more so if you need to create billions of stuff every time you want one of those. This is never good. Try to think of alternative solutions.
"Runner" has a good answer how to handle this!
But I would like to known if you could do a quick fix: make smaller TRange objects.
Maybe you have a big ancestor? Can you take a look at the instance size of TRange object?
Maybe you better use packed records?
This part:
"As a result, if you have 100 items, you can't directly create the 100th object without creating all the prior objects."
sounds a bit like calculating Fibonacci. Maybe you can reuse some of the TRange objects instead of creating redundant copies? Here is a C++ article describing this approach: it works by storing already-calculated intermediate results in a hash map.
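A rough Delphi equivalent of that memoization idea (my illustration; a TDictionary serves as the hash map):
uses
  Generics.Collections;

var
  FibCache: TDictionary<Integer, Int64>; // memo table: argument -> result

function Fib(N: Integer): Int64;
begin
  if N < 2 then
    Exit(N);
  if not FibCache.TryGetValue(N, Result) then
  begin
    Result := Fib(N - 1) + Fib(N - 2); // computed once per distinct N...
    FibCache.Add(N, Result);           // ...then served from the cache
  end;
end;

initialization
  FibCache := TDictionary<Integer, Int64>.Create;
finalization
  FibCache.Free;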
Handling billions of objects is possible but you should avoid it as much as possible. Do this only if you absolutely have to...
I did create a system once that needed to be able to handle a huge amount of data. To do so, I made my objects "streamable" so I could read/write them to disk. A larger class around it was used to decide when an object would be saved to disk and thus removed from memory. Basically, whenever I accessed an object, this class would check whether it was loaded or not. If not, it would re-create the object from disk, put it on top of a stack, and then move/write the bottom object of this stack to disk. As a result, my stack had a fixed (maximum) size. And it allowed me to use an unlimited number of objects, with reasonably good performance too.
Unfortunately, I don't have that code available anymore; I wrote it for a previous employer about 7 years ago. I do know that you would need to write a bit of code for the streaming support, plus a bunch more for the stack controller which maintains all those objects. But it technically would allow you to create an unlimited number of objects, since you're trading RAM for disk space.

Approaches for caching calculated values

In a Delphi application we are working on we have a big structure of related objects. Some of the properties of these objects have values which are calculated at runtime and I am looking for a way to cache the results for the more intensive calculations. An approach which I use is saving the value in a private member the first time it is calculated. Here's a short example:
unit Unit1;

interface

type
  TMyObject = class
  private
    FObject1, FObject2: TMyOtherObject;
    FMyCalculatedValue: Integer;
    function GetMyCalculatedValue: Integer;
  public
    property MyCalculatedValue: Integer read GetMyCalculatedValue;
  end;

implementation

function TMyObject.GetMyCalculatedValue: Integer;
begin
  if FMyCalculatedValue = 0 then
  begin
    FMyCalculatedValue :=
      FObject1.OtherCalculatedValue + // This is also calculated
      FObject2.OtherValue;
  end;
  Result := FMyCalculatedValue;
end;

end.
It is not uncommon that the objects used for the calculation change and the cached value should be reset and recalculated. So far we addressed this issue by using the observer pattern: objects implement an OnChange event so that others can subscribe, get notified when they change and reset cached values. This approach works but has some downsides:
It takes a lot of memory to manage subscriptions.
It doesn't scale well when a cached value depends on lots of objects (a list for example).
The dependency is not very specific (even if a cached value depends on only one property, it will also be reset when other properties change).
Managing subscriptions impacts the overall performance and is hard to maintain (objects are deleted, moved, ...).
It is not clear how to deal with calculations depending on other calculated values.
And finally the question: can you suggest other approaches for implementing cached calculated values?
If you want to avoid the observer pattern, you might try a hashing approach.
The idea is that you hash the arguments and check whether the result matches the hash stored with the cached value. If it does not, you recompute (and store the new hash along with the new result).
I know I make it sound like I just thought of it, but in fact it is used by well-known software.
For example, SCons (a Makefile alternative) uses it to check whether a target needs to be rebuilt, in preference to a timestamp approach.
We have used SCons for over a year now, and we have never seen a target that was not rebuilt, so their hashing works well!
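A minimal Delphi sketch of that idea (my own illustration; System.Hash and THashBobJenkins only exist in recent RTL versions, and any hash over the inputs would do):
uses
  System.Hash;

type
  TMyObject = class
  private
    FInputA, FInputB: Integer;
    FCachedValue: Integer;
    FCachedHash: Integer;
    FHasCache: Boolean;
  public
    function CachedValue: Integer;
  end;

function TMyObject.CachedValue: Integer;
var
  H: Integer;
begin
  // Hash the calculation's inputs; if they change, so does the hash.
  H := THashBobJenkins.GetHashValue(FInputA, SizeOf(FInputA));
  H := THashBobJenkins.GetHashValue(FInputB, SizeOf(FInputB), H);
  if not FHasCache or (H <> FCachedHash) then
  begin
    FCachedValue := FInputA * FInputB; // stand-in for the expensive calculation
    FCachedHash := H;
    FHasCache := True;
  end;
  Result := FCachedValue;
end;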
You could store local copies of the external object values which are required. The access routine then compares the local copy with the external value, and only does the recalculation on a change.
Accessing the external objects' properties would likewise force a possible re-evaluation of those properties, so the system should keep itself up to date automatically, but only recalculate when it needs to. I don't know if you need to take steps to avoid circular dependencies.
This increases the amount of space you need for each object, but removes the observer pattern. It also defers all calculations until they are needed, instead of performing the calculation every time a source parameter changes. I hope this is relevant for your system.
unit Unit1;

interface

type
  TMyObject = class
  private
    FObject1, FObject2: TMyOtherObject;
    FObject1Val, FObject2Val: Integer;
    FMyCalculatedValue: Integer;
    function GetMyCalculatedValue: Integer;
  public
    property MyCalculatedValue: Integer read GetMyCalculatedValue;
  end;

implementation

function TMyObject.GetMyCalculatedValue: Integer;
begin
  if (FObject1.OtherCalculatedValue <> FObject1Val)
    or (FObject2.OtherValue <> FObject2Val) then
  begin
    FMyCalculatedValue :=
      FObject1.OtherCalculatedValue + // This is also calculated
      FObject2.OtherValue;
    FObject1Val := FObject1.OtherCalculatedValue;
    FObject2Val := FObject2.OtherValue;
  end;
  Result := FMyCalculatedValue;
end;

end.
In my work I use Bold for Delphi, which can manage unlimited complex structures of cached values depending on each other. Usually each variable holds only a small part of the problem. In this framework these are called derived attributes: derived because the value is not saved in the database, it just depends on other derived attributes or persistent attributes in the database.
The code behind such an attribute is written in Delphi as a procedure, or in OCL (Object Constraint Language) in the model. If you write it as Delphi code, you have to subscribe to the variables it depends on. So if attribute C depends on A and B, then whenever A or B changes, the code to recalculate C is called automatically the next time C is read. The first time C is read, A and B are also read (maybe from the database). As long as A and B are unchanged, you can read C and get very fast performance. For complex calculations this can save quite a lot of CPU time.
The downside, and the bad news, is that Bold is not officially supported anymore and you cannot buy it either. I suppose you can get it if you ask enough people, but I don't know where you can download it. Around 2005-2006 it was downloadable for free from Borland, but not anymore.
It is not ready for D2009, as someone would have to port it to Unicode.
Another option is ECO with .NET from Capable Objects. ECO is a plugin for Visual Studio. It is a supported framework that has the same idea and author as Bold for Delphi. Many things are also improved; for example, databinding is used for the GUI components. Both Bold and ECO use a model as a central point, with classes, attributes and links. Those can be persisted in a database or an XML file. With the free version of ECO the model can have at most 12 classes, but as I remember there are no other limits.
Bold and ECO contain a lot more than derived attributes; they make you more productive and let you think about the problem instead of technical details, whether of the database or, in your case, of how to cache values. You are welcome to ask more questions about those frameworks!
Edit:
There is actually a download link to Bold for Delphi for D7 for registered Embarcadero users; quite old... I know there were updates for D2005 and D2006.
