Best way to sort an array - delphi

Say I have an array of records which I want to sort based on one of the fields in the record. What's the best way to achieve this?
TExample = record
SortOrder : integer;
SomethingElse : string;
end;
var SomeVar : array of TExample;

You can add pointers to the elements of the array to a TList, then call TList.Sort with a comparison function, and finally create a new array and copy the values out of the TList in the desired order.
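A minimal sketch of that TList approach, assuming a pre-generics Delphi. The names PExample, CompareBySortOrder and Sorted are made up for this illustration and are not from the question:

```pascal
program SortViaTList;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  TExample = record
    SortOrder: Integer;
    SomethingElse: string;
  end;
  PExample = ^TExample; // hypothetical helper type for this sketch

// Comparison callback in the shape TList.Sort expects.
function CompareBySortOrder(Item1, Item2: Pointer): Integer;
begin
  // Compare explicitly rather than subtracting, which could overflow.
  if PExample(Item1)^.SortOrder < PExample(Item2)^.SortOrder then
    Result := -1
  else if PExample(Item1)^.SortOrder > PExample(Item2)^.SortOrder then
    Result := 1
  else
    Result := 0;
end;

var
  SomeVar, Sorted: array of TExample;
  List: TList;
  i: Integer;
begin
  SetLength(SomeVar, 3);
  SomeVar[0].SortOrder := 30; SomeVar[0].SomethingElse := 'thirty';
  SomeVar[1].SortOrder := 10; SomeVar[1].SomethingElse := 'ten';
  SomeVar[2].SortOrder := 20; SomeVar[2].SomethingElse := 'twenty';

  List := TList.Create;
  try
    // Store a pointer to each array element, then sort the pointers.
    for i := 0 to High(SomeVar) do
      List.Add(@SomeVar[i]);
    List.Sort(@CompareBySortOrder);

    // Copy the records out in sorted order into a new array.
    SetLength(Sorted, List.Count);
    for i := 0 to List.Count - 1 do
      Sorted[i] := PExample(List[i])^;
  finally
    List.Free;
  end;

  for i := 0 to High(Sorted) do
    Writeln(Sorted[i].SortOrder, ' ', Sorted[i].SomethingElse);
end.
```

The source array must not be resized while the list holds pointers into it, since SetLength may move the elements.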
However, if you're using the next version, D2009, there is a new collections library which can sort arrays. It takes an optional IComparer<TExample> implementation for custom sorting orders. Here it is in action for your specific case:
TArray.Sort<TExample>(SomeVar, TDelegatedComparer<TExample>.Construct(
  function(const Left, Right: TExample): Integer
  begin
    Result := TComparer<Integer>.Default.Compare(Left.SortOrder, Right.SortOrder);
  end));

(I know this is a year later, but still useful stuff.)
Skamradt's suggestion to pad integer values assumes you are going to sort using a string compare. This would be slow. Calling format() for each insert, slower still. Instead, you want to do an integer compare.
You start with a record type:
TExample = record
SortOrder : integer;
SomethingElse : string;
end;
You didn't state how the records were stored, or how you wanted to access them once sorted. So let's assume you put them in a Dynamic Array:
var MyDA: Array of TExample;
...
SetLength(MyDA,NewSize); //allocate memory for the dynamic array
for i:=0 to NewSize-1 do begin //fill the array with records
MyDA[i].SortOrder := SomeInteger;
MyDA[i].SomethingElse := SomeString;
end;
Now you want to sort this array by the integer value SortOrder. If what you want out is a TStringList (so you can use the ts.Find method) then you should add each string to the list and add the SortOrder as a pointer. Then sort on the pointer:
var tsExamples: TStringList; //declare it somewhere (global or local)
...
tsExamples := tStringList.create; //allocate it somewhere (and free it later!)
...
tsExamples.Clear; //now let's use it
tsExamples.sorted := False; //don't want to sort after every add
tsExamples.Capacity := High(MyDA)+1; //don't want to increase size with every add
//an empty dynamic array has High() = -1
for i:=0 to High(MyDA) do begin
tsExamples.AddObject(MyDA[i].SomethingElse,TObject(MyDA[i].SortOrder));
end;
Note the trick of casting the Integer SortOrder into a TObject pointer, which is stored in the TStringList.Objects property. (This depends upon Integer and Pointer being the same size, which holds on 32-bit targets; on a 64-bit target a pointer is 8 bytes, so cast through NativeInt instead.) Somewhere we must define a function to compare the TObject pointers:
function CompareObjects(ts: TStringList; Item1, Item2: Integer): Integer;
begin
  // CompareValue is declared in the Math unit
  Result := CompareValue(Integer(ts.Objects[Item1]), Integer(ts.Objects[Item2]));
end;
Now we can sort tsExamples on .Objects by calling .CustomSort instead of .Sort (which would sort on the string value).
tsExamples.CustomSort(@CompareObjects); //Sort the list
The TStringList is now sorted, so you can iterate over it from 0 to .Count-1 and read the strings in sorted order.
But suppose you didn't want a TStringList, just an array in sorted order. Or the records contain more data than just the one string in this example, and your sort order is more complex. You can skip the step of adding every string, and just add the array index as Items in a TList. Do everything above the same way, except use a TList instead of TStringList:
var Mlist: TList; //a list of Pointers
...
Mlist := TList.Create; //allocate it somewhere (and free it later!)
for i:=0 to High(MyDA) do
  Mlist.Add(Pointer(i)); //cast the array index as a Pointer
Mlist.Sort(@CompareRecords); //using the compare function below
function CompareRecords(Item1, Item2: Pointer): Integer; //TList.Sort passes Pointers
var i,j: integer;
begin
  i := integer(Item1); //recover the index into MyDA
  j := integer(Item2); // and use it to access any field
  Result := SomeFunctionOf(MyDA[i].SomeField) - SomeFunctionOf(MyDA[j].SomeField);
end;
Now that Mlist is sorted, use it as a lookup table to access the array in sorted order:
for i:=0 to Mlist.Count-1 do begin
Something := MyDA[integer(Mlist[i])].SomeField;
end;
As i iterates over the TList, we get back the array indexes in sorted order. We just need to cast them back to integers, since the TList thinks they're pointers.
I like doing it this way, but you could also put real pointers to array elements in the TList by adding the address of the array element instead of its index. Then to use them you would cast them as pointers to TExample records. This is what Barry Kelly and CoolMagic said to do in their answers.

If you need to sort by a string, use a sorted TStringList and add each record with TStringList.AddObject(string, Pointer(int_val)).
But if you need to sort by an integer field and a string, use a TObjectList and, after adding all the records, call TObjectList.Sort with the appropriate comparison function as its parameter.

This all depends on the number of records you are sorting. If you are sorting fewer than a few hundred, the other sort methods work fine. If you are going to be sorting more, take a good look at the old trusty Turbo Power SysTools project. Its source includes a very good sort algorithm, one that sorts millions of records efficiently.
If you are going to use the TStringList method of sorting a list of records, make sure that your integer is zero-padded to a fixed width (right-aligned) before inserting it into the list. For example, you can use Format('%.10d',[rec.sortorder]) to right-align to 10 digits.
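Here is a small sketch of why the padding matters, assuming non-negative values only (a leading minus sign would break the ordering). The sample values are made up:

```pascal
program PaddedKeys;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  Values: array[0..3] of Integer = (100, 9, 25, 3); // made-up sample data

var
  Keys: TStringList;
  i: Integer;
begin
  Keys := TStringList.Create;
  try
    // '%.10d' pads with leading zeros, so 9 becomes '0000000009'.
    for i := Low(Values) to High(Values) do
      Keys.Add(Format('%.10d', [Values[i]]));

    // With equal-width keys, string order matches numeric order.
    Keys.Sort;

    for i := 0 to Keys.Count - 1 do
      Writeln(Keys[i]);
  finally
    Keys.Free;
  end;
end.
```

Without the padding, '100' would sort before '25' and '9', because strings are compared character by character.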

The quicksort algorithm is often used when fast sorting is required. Delphi uses (or used) it for TList.Sort, for example.
A Delphi list can be used to sort anything, but it is a heavyweight container that is meant to look like an array of pointers to structures. It stays heavyweight even with tricks like Guy Gordon's in this thread (storing an index or anything else in place of the pointers, or storing the values directly if they fit in 32 bits): we still need to construct a list, and so on.
Consequently, an alternative for sorting an array of records easily and quickly might be to use the qsort C runtime function from msvcrt.dll.
Here is a declaration that might be good (warning: this code is portable on Windows only).
type TComparatorFunction = function(lpItem1: Pointer; lpItem2: Pointer): Integer; cdecl;
procedure qsort(base: Pointer; num: Cardinal; size: Cardinal; lpComparatorFunction: TComparatorFunction) cdecl; external 'msvcrt.dll';
Full example here.
Notice that directly sorting the array of records can be slow if the records are big. In that case, sorting an array of pointers to the records can be faster (somewhat like the list approach).
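A hedged usage sketch for that declaration, Windows only. I declare the two size parameters as NativeUInt here (an assumption on my part, to match size_t on 64-bit targets; the declaration above uses Cardinal), and CompareExamples is a name invented for this example:

```pascal
program QSortDemo;
{$APPTYPE CONSOLE}

type
  TExample = record
    SortOrder: Integer;
    SomethingElse: string;
  end;
  PExample = ^TExample;

  TComparatorFunction = function(lpItem1, lpItem2: Pointer): Integer; cdecl;

// Assumption: NativeUInt is used for num and size to match C's size_t.
procedure qsort(base: Pointer; num, size: NativeUInt;
  lpComparatorFunction: TComparatorFunction); cdecl; external 'msvcrt.dll';

// qsort passes pointers to two array elements.
function CompareExamples(lpItem1, lpItem2: Pointer): Integer; cdecl;
begin
  if PExample(lpItem1)^.SortOrder < PExample(lpItem2)^.SortOrder then
    Result := -1
  else if PExample(lpItem1)^.SortOrder > PExample(lpItem2)^.SortOrder then
    Result := 1
  else
    Result := 0;
end;

var
  Items: array of TExample;
  i: Integer;
begin
  SetLength(Items, 3);
  Items[0].SortOrder := 3; Items[0].SomethingElse := 'three';
  Items[1].SortOrder := 1; Items[1].SomethingElse := 'one';
  Items[2].SortOrder := 2; Items[2].SomethingElse := 'two';

  // Guard against an empty array before taking the address of Items[0].
  if Length(Items) > 0 then
    qsort(@Items[0], Length(Items), SizeOf(TExample), CompareExamples);

  for i := 0 to High(Items) do
    Writeln(Items[i].SortOrder, ' ', Items[i].SomethingElse);
end.
```

qsort moves the records as raw bytes. That only permutes the string pointers inside them, so the reference counts stay consistent, but note that qsort is not a stable sort.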

With an array, I'd use either quicksort or possibly heapsort, and just change the comparison to use TExample.SortOrder; the swap part still just acts on the array and swaps pointers. If the array is very large and there are a lot of insertions and deletions, you may want a linked list structure instead.
C based routines, there are several here
http://www.yendor.com/programming/sort/
Another site, but has pascal source
http://www.dcc.uchile.cl/~rbaeza/handbook/sort_a.html

Use one of the sort algorithms proposed by Wikipedia. The Swap function should swap array elements using a temporary variable of the same type as the array elements. Use a stable sort if you want entries with the same SortOrder integer value to stay in the order they were in originally.
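As an illustration of a stable sort, here is a sketch using insertion sort (my choice for brevity, not something the answer prescribes). It is O(n²), so it only suits small arrays; the procedure name is made up:

```pascal
program StableSortDemo;
{$APPTYPE CONSOLE}

type
  TExample = record
    SortOrder: Integer;
    SomethingElse: string;
  end;

// Insertion sort: elements are only shifted past strictly greater keys,
// so records with equal SortOrder keep their original relative order.
procedure StableSortBySortOrder(var A: array of TExample);
var
  i, j: Integer;
  Tmp: TExample; // temporary of the same type as the array elements
begin
  for i := 1 to High(A) do
  begin
    Tmp := A[i];
    j := i - 1;
    while (j >= 0) and (A[j].SortOrder > Tmp.SortOrder) do
    begin
      A[j + 1] := A[j];
      Dec(j);
    end;
    A[j + 1] := Tmp;
  end;
end;

var
  Items: array of TExample;
  i: Integer;
begin
  SetLength(Items, 4);
  Items[0].SortOrder := 2; Items[0].SomethingElse := 'a';
  Items[1].SortOrder := 1; Items[1].SomethingElse := 'b';
  Items[2].SortOrder := 2; Items[2].SomethingElse := 'c';
  Items[3].SortOrder := 1; Items[3].SomethingElse := 'd';

  StableSortBySortOrder(Items);

  // Prints: 1 b, 1 d, 2 a, 2 c (one pair per line).
  for i := 0 to High(Items) do
    Writeln(Items[i].SortOrder, ' ', Items[i].SomethingElse);
end.
```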

TStringList has an efficient Sort method.
If you want sorting, use a TStringList object with its Sorted property set to True.
NOTE: For more speed, add the objects to an unsorted TStringList and set the property to True at the end.
NOTE: To sort by an integer field, convert it to a string.
NOTE: If there are duplicate values, this method is not valid.
Regards.

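If you have Delphi XE2 or newer, you can try:
var
  someVar: array of TExample;
  list: TList<TExample>;
  sortedVar: TArray<TExample>; // ToArray returns TArray<TExample>
begin
  list := TList<TExample>.Create;
  try
    list.AddRange(someVar);
    list.Sort; // uses the default comparer; pass an IComparer<TExample> to sort by SortOrder
    sortedVar := list.ToArray;
  finally
    list.Free;
  end;
end;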
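As far as I know, the default comparer for a record type compares the raw bytes of the record, which does not order by SortOrder numerically. A sketch that hands the list an explicit comparer instead (the procedure name Demo and the sample data are made up):

```pascal
program SortWithGenericList;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections, System.Generics.Defaults;

type
  TExample = record
    SortOrder: Integer;
    SomethingElse: string;
  end;

procedure Demo;
var
  SomeVar, SortedVar: TArray<TExample>;
  List: TList<TExample>;
  i: Integer;
begin
  SetLength(SomeVar, 3);
  SomeVar[0].SortOrder := 3; SomeVar[0].SomethingElse := 'three';
  SomeVar[1].SortOrder := 1; SomeVar[1].SomethingElse := 'one';
  SomeVar[2].SortOrder := 2; SomeVar[2].SomethingElse := 'two';

  // The comparer given to the constructor is what List.Sort uses.
  List := TList<TExample>.Create(TComparer<TExample>.Construct(
    function(const Left, Right: TExample): Integer
    begin
      Result := TComparer<Integer>.Default.Compare(Left.SortOrder, Right.SortOrder);
    end));
  try
    List.AddRange(SomeVar);
    List.Sort;
    SortedVar := List.ToArray;
  finally
    List.Free;
  end;

  for i := 0 to High(SortedVar) do
    Writeln(SortedVar[i].SortOrder, ' ', SortedVar[i].SomethingElse);
end;

begin
  Demo;
end.
```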

I created a very simple example that works correctly if the sort field is a string.
Type
THuman = Class
Public
Name: String;
Age: Byte;
Constructor Create(Name: String; Age: Integer);
End;
Constructor THuman.Create(Name: String; Age: Integer);
Begin
Self.Name:= Name;
Self.Age:= Age;
End;
Procedure Test();
Var
Human: THuman;
Humans: Array Of THuman;
List: TStringList;
Begin
SetLength(Humans, 3);
Humans[0]:= THuman.Create('David', 41);
Humans[1]:= THuman.Create('Brian', 50);
Humans[2]:= THuman.Create('Alex', 20);
List:= TStringList.Create;
List.AddObject(Humans[0].Name, TObject(Humans[0]));
List.AddObject(Humans[1].Name, TObject(Humans[1]));
List.AddObject(Humans[2].Name, TObject(Humans[2]));
List.Sort;
Human:= THuman(List.Objects[0]);
Showmessage('The first person on the list is the human ' + Human.name + '!');
List.Free;
End;

Related

Memory copy and memory compare array of array of Single

In Delphi, I declared a 3x3 matrix table as an array of array of Single, like this:
m_Table: array [0..2] of array [0..2] of Single;
Now I want to memory compare the content with another table, or memory copy the table content from another table. I know that I can create a nested loop to do that, but I want to do the job without any loop, if possible.
My question is, is it correct to copy or compare the memory like this:
CompareMem(m_Table, other.m_Table, 9 * SizeOf(Single));
CopyMemory(m_Table, other.m_Table, 9 * SizeOf(Single));
If not, what is the correct way to do that?
And as a subsidiary question, is there a better way to get the length to copy instead of 9 * SizeOf(Single), like e.g. SizeOf(m_Table^)?
Regards
The code in the question works fine. Personally I would say that Move is the idiomatic way to copy memory. Further I would use SizeOf(m_Table) to obtain the size of the type.
I would point out that your comparison differs from the floating point equality operator. Perhaps that's what you want, but you should be aware of this. For instance zero and minus zero compare equal using floating point comparison but not with memory compare. And NaNs always compare not equal, even with identical bit patterns.
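A tiny sketch of the zero versus minus zero case. The variable names are made up, and the minus zero is built from its IEEE 754 bit pattern so that the example does not depend on how the compiler folds the constant -0.0:

```pascal
program ZeroVsMinusZero;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  PlusZero, MinusZero: Single;
begin
  PlusZero := 0.0;
  // Sign bit set, everything else clear: IEEE 754 negative zero.
  PCardinal(@MinusZero)^ := $80000000;

  // The floating-point comparison treats the two zeros as equal...
  Writeln('Floating-point equal: ', PlusZero = MinusZero);
  // ...but their bytes differ, so the memory comparison reports a mismatch.
  Writeln('Memory equal: ', CompareMem(@PlusZero, @MinusZero, SizeOf(Single)));
end.
```

The first line reports TRUE and the second FALSE, which is exactly the difference described above.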
Let me also comment that it would make your code more extendible if you declared a type for these matrices. Without that you won't be able to write functions that accept such objects.
The correct and easiest way may be to define a type:
type
TMatrix3x3 = array [0..2,0..2] of Single;
Then you can directly write:
var
v1, v2: TMatrix3x3;
begin
  FillChar(v1, SizeOf(v1), 0);
  Move(v1, v2, SizeOf(v1));
  if CompareMem(@v1, @v2, SizeOf(v1)) then
    Writeln('equals');
end;
Using SizeOf() makes your code safe and readable.
You may define a wrapper type with methods:
{ TMatrix3x3 }
type
TMatrix3x3 = record
v: array [0..2,0..2] of Single;
procedure Zero;
procedure Copy(var dest: TMatrix3x3);
procedure Fill(const source: TMatrix3x3);
function Equals(const other: TMatrix3x3): boolean;
end;
procedure TMatrix3x3.Copy(var dest: TMatrix3x3);
begin
move(v,dest,sizeof(v));
end;
function TMatrix3x3.Equals(const other: TMatrix3x3): boolean;
begin
  result := CompareMem(@v, @other.v, sizeof(v));
end;
procedure TMatrix3x3.Fill(const source: TMatrix3x3);
begin
move(source,v,sizeof(v));
end;
procedure TMatrix3x3.Zero;
begin
fillchar(v,sizeof(v),0);
end;
You can then add advanced features such as implicit assignment and operators, if needed.
But don't reinvent the wheel if you really need to work with matrix arithmetic. Use an existing, fully tested library, which will save you a lot of trouble and debugging time.
You should use the standard TMatrix type from the System.Math.Vectors unit; then you can compare two matrices directly, as in if Matrix1 = Matrix2 then, and assign with Matrix1 := Matrix2.

understanding TDictionary on IntegerList

Can I create a TDictionary directly on a TList class? It looks like double work to create a TDictionary whose keys and values are always the same data, as in BlackList.Add(1, 1);
var
BlackList: TDictionary<Integer, Integer>;
ResultList: TDictionary<Integer, Integer>;
TestListA: TLIst<Integer>;
TestListB: TLIst<Integer>;
i: Integer;
begin
BlackList.Add(1, 1);
BlackList.Add(2, 2);
for i := 0 to TestListA.Count - 1 do
begin
if BlackList.ContainsValue(TestListA[i]) then
begin
// no action ...
end
else
begin
ResultList.Add(i, TestListA[i]);
end;
end;
for i := 0 to TestListB.Count - 1 do
begin
if BlackList.ContainsValue(TestListB[i]) then
begin
// no action ...
end
else
begin
if not(ResultList.ContainsValue(TestListB[i])) then
ResultList.Add(i, TestListB[i]);
end;
end;
end;
The purpose of this algorithm is to compare two Integer lists and find all duplicates, excluding numbers from a blacklist. The first question is here: Find common elements in two Integer List.
The whole purpose, in this particular code, of using TDictionary is to take advantage of O(1) lookup. With TList, which is an array, you do not have that property. Lookup is O(n) in general, O(log n) if sorted. So you cannot extract the O(1) lookup performance from a TList by any means, hence the use of TDictionary.
So it is the performance motivation that is driving the use of TDictionary. As I said in your previous question, the overhead of setting up dictionaries is only worthwhile if the lists are large. You would need to do some benchmarking to quantify what large means in this context.
As for your dictionary, it does not matter what values you use since only the keys are significant. So use zero always. Ideally what you want is a hash set type based on the same algorithm as the dictionary. But I don't believe there is such a thing in the stock RTL. I expect that libraries like Spring4D offer that functionality.
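A sketch of using dictionary keys as a makeshift hash set, under one reading of the stated goal (values present in both lists and not blacklisted). The function name CommonExcluding and the Byte value type are my own choices; the values stored are never read:

```pascal
program CommonValues;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

// Returns the values that occur in both A and B and are not in BlackList.
function CommonExcluding(const A, B, BlackList: array of Integer): TArray<Integer>;
var
  Banned, Seen: TDictionary<Integer, Byte>; // only the keys matter
  Found: TList<Integer>;
  V: Integer;
begin
  Banned := TDictionary<Integer, Byte>.Create;
  Seen := TDictionary<Integer, Byte>.Create;
  Found := TList<Integer>.Create;
  try
    for V in BlackList do
      Banned.AddOrSetValue(V, 0);

    // Remember every non-blacklisted value of A.
    for V in A do
      if not Banned.ContainsKey(V) then
        Seen.AddOrSetValue(V, 0);

    // ContainsKey is the hashed lookup; ContainsValue would scan linearly.
    for V in B do
      if Seen.ContainsKey(V) then
      begin
        Found.Add(V);
        Seen.Remove(V); // report each common value only once
      end;

    Result := Found.ToArray;
  finally
    Found.Free;
    Seen.Free;
    Banned.Free;
  end;
end;

var
  V: Integer;
begin
  // 1 is blacklisted, so only 3 and 4 are printed.
  for V in CommonExcluding([1, 2, 3, 4], [3, 4, 5, 1], [1]) do
    Writeln(V);
end.
```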

What is the canonical way to write a hasher function for TEqualityComparer.Construct?

Consider the following record:
TMyRecord = record
b: Boolean;
// 3 bytes of padding in here with default record alignment settings
i: Integer;
end;
I wish to implement IEqualityComparer<TMyRecord>. In order to do so I want to call TEqualityComparer<TMyRecord>.Construct. This needs to be supplied with a TEqualityComparison<TMyRecord> which presents no problems to me.
However, Construct also requires a THasher<TMyRecord> and I would like to know the canonical method for implementing that. The function needs to have the following form:
function MyRecordHasher(const Value: TMyRecord): Integer;
begin
Result := ???
end;
I expect that I need to call BobJenkinsHash on both fields of the record value and then combine them somehow. Is this the right approach, and how should I combine them?
The reason I don't use TEqualityComparer<TMyRecord>.Default is that it uses CompareMem and so will be incorrect due to the record's padding.
The Effective Java (by Joshua Bloch) section about overriding hashCode could be useful. It shows how the individual parts of the object (or record) can be combined to efficiently construct a hashCode.
A good hash function tends to produce unequal hash codes for unequal
objects. This is exactly what is meant by the third provision of the
hashCode contract. Ideally, a hash function should distribute any
reasonable collection of unequal instances uniformly across all
possible hash values. Achieving this ideal can be extremely difficult.
Luckily it is not too difficult to achieve a fair approximation. Here
is a simple recipe:
1. Store some constant nonzero value, say 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
   a. Compute an int hash code c for the field: ..... details omitted ....
   b. Combine the hash code c computed in step a into result as follows: result = 37*result + c;
3. Return result.
When you are done writing the hashCode method, ask yourself whether equal instances have equal hash codes. If not, figure out why
and fix the problem.
This can be translated into Delphi code as follows:
{$IFOPT Q+}
{$DEFINE OverflowChecksEnabled}
{$Q-}
{$ENDIF}
function CombinedHash(const Values: array of Integer): Integer;
var
Value: Integer;
begin
Result := 17;
for Value in Values do begin
Result := Result*37 + Value;
end;
end;
{$IFDEF OverflowChecksEnabled}
{$Q+}
{$ENDIF}
This then allows the implementation of MyRecordHasher:
function MyRecordHasher(const Value: TMyRecord): Integer;
begin
Result := CombinedHash([IfThen(Value.b, 0, 1), Value.i]);
end;
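To show how such a hasher plugs into TEqualityComparer<TMyRecord>.Construct, here is a self-contained sketch. MakeComparer and MakeRec are names invented for this example, and Ord(Value.b) stands in for the IfThen call so that the Math unit is not needed:

```pascal
program RecordComparerDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections, System.Generics.Defaults;

type
  TMyRecord = record
    b: Boolean;
    i: Integer;
  end;

{$Q-} // the multiplication is expected to wrap around
function CombinedHash(const Values: array of Integer): Integer;
var
  Value: Integer;
begin
  Result := 17;
  for Value in Values do
    Result := Result * 37 + Value;
end;

// Builds a comparer that looks only at the fields, never at the padding.
function MakeComparer: IEqualityComparer<TMyRecord>;
begin
  Result := TEqualityComparer<TMyRecord>.Construct(
    function(const Left, Right: TMyRecord): Boolean
    begin
      Result := (Left.b = Right.b) and (Left.i = Right.i);
    end,
    function(const Value: TMyRecord): Integer
    begin
      Result := CombinedHash([Ord(Value.b), Value.i]);
    end);
end;

function MakeRec(b: Boolean; i: Integer): TMyRecord;
begin
  Result.b := b;
  Result.i := i;
end;

var
  Dict: TDictionary<TMyRecord, string>;
begin
  Dict := TDictionary<TMyRecord, string>.Create(MakeComparer);
  try
    Dict.Add(MakeRec(True, 42), 'first');
    Writeln(Dict.ContainsKey(MakeRec(True, 42)));  // TRUE: same field values
    Writeln(Dict.ContainsKey(MakeRec(False, 42))); // FALSE: b differs
  finally
    Dict.Free;
  end;
end.
```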

Get length of record field of type array

I'm writing a wrapper for communication with an external binary API. The API uses PDUs (packed binary records) for communication. Strings are arrays of AnsiChar and are zero-terminated:
type
TSomePDU = packed record
//...
StringField: array[0..XYZ] of AnsiChar;
//...
end;
PSomePDU = ^TSomePDU;
I want to write a FillPDUString procedure that would accept a String and fill the char array, but I want to avoid keeping track of MaxLength wherever the procedure is used, so I need somehow to get the declared array size given a pointer to the field:
function GetMaxSize(const Field: array of AnsiChar): Integer;
begin
// ???
end;
//...
GetMaxSize(ARecord.StringField);
Is this possible?
If I understand you correctly, then you can use Delphi's Length function
Here's how to get the length:
function GetMaxSize(const Value: PSomePDU): Integer;
begin
Result := Length(Value.StringField);
end;
To obtain the number of elements that an array contains, use Length.
ElementCount := Length(ARecord.StringField);
Use low and high to obtain the bounds of any Delphi array.
MinIndex := low(ARecord.StringField);
MaxIndex := high(ARecord.StringField);
Using the latter approach, with low and high, allows you to avoid assuming that an array is 0-based.
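For the FillPDUString procedure the question asks about, one possible sketch uses an open array parameter, which carries its own bounds (and is always zero-based inside the procedure). The Id field and the bound of 7 are placeholders invented for this example, since the question elides the real layout:

```pascal
program FillPDUDemo;
{$APPTYPE CONSOLE}

type
  TSomePDU = packed record
    Id: Word;                             // hypothetical extra field
    StringField: array[0..7] of AnsiChar; // bound chosen only for this demo
  end;

// Copies S into Field, truncating so that a terminating #0 always fits.
procedure FillPDUString(var Field: array of AnsiChar; const S: AnsiString);
var
  N: Integer;
begin
  if Length(Field) = 0 then
    Exit;
  // Zero the whole field first; this also provides the terminator.
  FillChar(Field[0], Length(Field) * SizeOf(AnsiChar), 0);
  N := Length(S);
  if N > High(Field) then
    N := High(Field); // leave the last element as #0
  if N > 0 then
    Move(S[1], Field[0], N);
end;

var
  PDU: TSomePDU;
begin
  FillPDUString(PDU.StringField, 'Hello, world');
  // Only the first seven characters fit into the eight-element field.
  Writeln(PAnsiChar(@PDU.StringField[0]));
end.
```

The caller never passes a maximum length; Length(Field) and High(Field) inside the procedure reflect whatever array was handed in.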

Move() to Insert/Delete item(s) from a dynamic array of string

Using System.Move() to insert/delete item(s) from an array of string is not as easy as insert/delete it from other array of simple data types. The problem is ... string is reference counted in Delphi. Using Move() on reference-counted data types needs deeper knowledge on internal compiler behaviour.
Can someone here explain the needed steps for me to achieve that, or better with some snippet codes, or direct me to a good reference on the internet?
Oh, Please don't tell me to use the "lazy-but-slow way", that is, for loop, I know that.
I've demonstrated how to delete items from a dynamic array before:
Delphi Q&A: How do I delete an element from an array?
In that article, I start with the following code:
type
TXArray = array of X;
procedure DeleteX(var A: TXArray; const Index: Cardinal);
var
ALength: Cardinal;
i: Cardinal;
begin
ALength := Length(A);
Assert(ALength > 0);
Assert(Index < ALength);
for i := Index + 1 to ALength - 1 do
A[i - 1] := A[i];
SetLength(A, ALength - 1);
end;
You cannot go wrong with that code. Use whatever value for X you want; in your case, replace it with string. If you want to get fancier and use Move, then there's a way to do that, too.
procedure DeleteX(var A: TXArray; const Index: Cardinal);
var
ALength: Cardinal;
TailElements: Cardinal;
begin
ALength := Length(A);
Assert(ALength > 0);
Assert(Index < ALength);
Finalize(A[Index]);
TailElements := ALength - Index - 1; // number of elements after Index
if TailElements > 0 then
Move(A[Index + 1], A[Index], SizeOf(X) * TailElements);
Initialize(A[ALength - 1]);
SetLength(A, ALength - 1);
end;
Since X is string, the Finalize call is equivalent to assigning the empty string to that array element. I use Finalize in this code, though, because it will work for all array-element types, even types that include records, interfaces, strings, and other arrays.
For inserting, you just shift things the opposite direction:
procedure InsertX(var A: TXArray; const Index: Cardinal; const Value: X);
var
ALength: Cardinal;
TailElements: Cardinal;
begin
ALength := Length(A);
Assert(Index <= ALength);
SetLength(A, ALength + 1);
Finalize(A[ALength]);
TailElements := ALength - Index;
if TailElements > 0 then
  Move(A[Index], A[Index + 1], SizeOf(X) * TailElements);
Initialize(A[Index]);
A[Index] := Value; // also runs when appending (TailElements = 0)
end;
Use Finalize when you're about to do something that's outside the bounds of the language, such as using the non-type-safe Move procedure to overwrite a variable of a compiler-managed type. Use Initialize when you're re-entering the defined part of the language. (The language defines what happens when an array grows or shrinks with SetLength, but it doesn't define how to copy or delete strings without using a string-assignment statement.)
You don't state if it is important for you to keep the array elements in the same order or not.
If the order is not relevant, you can do something really, really fast like this:
procedure RemoveRecord(Index: integer);
begin
FRecords[Index]:= FRecords[High(FRecords)]; { Copy the last element over the 'deleted' element }
SetLength(FRecords, Length(FRecords)-1); { Cut the last element }
end;
{ I haven't tested the code to see it compiles, but you got the idea anyway... }
Sorting the list
If you have a HUGE list that needs to be modified by the user, you can use methods similar to the one above (which break the list order). When the user is done editing (after multiple deletes), you present them with a button called "Sort list". Now they can run the lengthy sort operation.
Of course, I assume above that your list can be sorted by a certain parameter.
Sorting the list automatically
An alternative is to automate the sorting process. When the user deletes something from the list, start a timer. Keep resetting the timer if the user keeps deleting items. When the timer finally fires, do the sorting and stop the timer.
To insert a string, simply add a string (the lazy way) to the end of the array (which is an array of pointers), and then use Move to change the order of the elements of this array (of pointers).
If I wanted to insert a string into the middle of a list of strings, I'd use TStringList.Insert. (It does it quickly using System.Move.)
Any particular reason why you're using an array instead of a TStringList?
Call UniqueString() on it, before messing with it.
http://docwiki.embarcadero.com/VCL/en/System.UniqueString
Then you have a string with a single reference.
Chances are that this is what Delete and Insert do too, and I doubt you'll be faster.
Just wanting to add this for any people that come here in the future.
Modifying Rob's code, I came up with this way of doing it that uses the newer TArray<T> type constructions.
type
TArrayExt = class(TArray)
class procedure Delete<T>(var A: TArray<T>; const Index: Cardinal; Count: Cardinal = 1);
end;
implementation
class procedure TArrayExt.Delete<T>(var A: TArray<T>; const Index: Cardinal;
Count: Cardinal = 1);
var
ALength: Cardinal;
i: Cardinal;
begin
ALength := Length(A);
Assert(ALength > 0);
Assert(Count > 0);
Assert(Count <= ALength - Index);
Assert(Index < ALength);
for i := Index + Count to ALength - 1 do
A[i - Count] := A[i];
SetLength(A, ALength - Count);
end;
A similar thing can be done for the insert.
(Not looking for this to get marked as the answer, just looking to provide an example that was too long to fit in the comments on Rob's excellent answer.)
(Fixed to address Rob's comments below.)
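For the insert mentioned above, a loop-based sketch in the same spirit might look like this. The class is declared without the TArray ancestor to keep the example short, and the name TArrayInsert is made up:

```pascal
program ArrayInsertDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TArrayInsert = class // hypothetical helper class for this sketch
    class procedure Insert<T>(var A: TArray<T>; Index: Integer;
      const Value: T); static;
  end;

class procedure TArrayInsert.Insert<T>(var A: TArray<T>; Index: Integer;
  const Value: T);
var
  i: Integer;
begin
  Assert((Index >= 0) and (Index <= Length(A)));
  SetLength(A, Length(A) + 1);
  // Shift the tail up by one, starting from the end so nothing is overwritten.
  for i := High(A) downto Index + 1 do
    A[i] := A[i - 1];
  A[Index] := Value;
end;

var
  Names: TArray<string>;
  S: string;
begin
  Names := TArray<string>.Create('a', 'c');
  TArrayInsert.Insert<string>(Names, 1, 'b');
  for S in Names do
    Writeln(S); // a, b, c (one per line)
end.
```

Because it uses plain assignments, the compiler handles the reference counting for managed types such as string.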
Move() works fine with reference-counted types like strings or interfaces, and is actually used internally in Delphi's arrays and lists. In the general case, however, Move() is no longer valid because of the managed records feature.
If you use System.Move to put items into an array of string, you should be aware that the strings that were there before the Move (and are now overwritten) had a reference count of either -1 for constant strings or > 0 for variable strings. Constant strings should not be altered, but variable strings should be treated accordingly: you should manually lower their reference count (before they're overwritten!). To do that, you could try something like this:
Dec(PStrRec(IntPtr(SomeString)-12).refCnt);
But if the reference count reaches zero, you should also finalize the associated memory, something Delphi itself does a whole lot better if you let it work its compiler magic for strings. Oh, and also: if the strings you're copying come from the same array you're writing into, the needed administration becomes very cumbersome, very quickly!
So if there is any way to avoid all this manual housekeeping, I would advise letting Delphi handle it itself.
