Improve speed on Crc16 calculation - delphi

I need to calculate Crc16 checksums with a $1021 polynom over large files, below is my current implementation but it's rather slow on large files (eg a 90 MB file takes about 9 seconds).
So my question is how to improve my current implementation (to make it faster), I have googled and looked at some samples implementing a table lookup but my problem is that I don't understand how to modify them to include the polynom (probably my math is failing).
{ based on http://miscel.dk/MiscEl/CRCcalculations.html }
function Crc16(const Buffer: PByte; const BufSize: Int64;
const Polynom: WORD=$1021; const Seed: WORD=0): Word;
var
i,j: Integer;
begin
Result := Seed;
for i:=0 to BufSize-1 do
begin
Result := Result xor (Buffer[i] shl 8);
for j:=0 to 7 do begin
if (Result and $8000) <> 0 then
Result := (Result shl 1) xor Polynom
else Result := Result shl 1;
end;
end;
Result := Result and $FFFF;
end;

If you want this to be fast, you need to implement a table-lookup CRC algorithm.
See chapter 10 of A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS INDEX V3.00 (9/24/96)

Look for CRC routines from jclMath.pas unit of Jedi Code Library. It uses CRC lookup tables.
http://jcl.svn.sourceforge.net/viewvc/jcl/trunk/jcl/source/common/

Your Result variable is a Word, which means there are 64k possible values it could have upon entry to the inner loop. Calculate the 64k possible results that the loop could generate and store them in an array. Then, instead of looping eight times for each byte of the input buffer, simply look up the next value of the checksum in the array. Something like this:
function Crc16(const Buffer: PByte; const BufSize: Int64;
const Polynom: Word = $1021; const Seed: Word = 0): Word;
{$J+}
const
Results: array of Word = nil;
OldPolynom: Word = 0;
{$J-}
var
i, j: Integer;
begin
if (Polynom <> OldPolynom) or not Assigned(Results) then begin
SetLength(Results, 65535);
for i := 0 to Pred(Length(Results)) do begin
Results[i] := i;
for j := 0 to 7 do
if (Results[i] and $8000) <> 0 then
Results[i] := (Results[i] shl 1) xor Polynom
else
Results[i] := Results[i] shl 1;
end;
OldPolynom := Polynom;
end;
Result := Seed;
for i := 0 to Pred(BufSize) do
Result := Results[Result xor (Buffer[i] shl 8)];
end;
That code recalculates the lookup table any time Polynom changes. If that parameter varies among a set of values, then consider caching the lookup tables you generate for them so you don't waste time calculating the same tables repeatedly.
If Polynom will always be $1021, then don't even bother having a parameter for it. Calculate all 64k values in advance and hard-code them in a big array, so your entire function is reduced to just the last three lines of my function above.

Old thread, i know. Here is my implementation (just one loop):
function crc16( s : string; bSumPos : Boolean = FALSE ) : Word;
var
L, crc, sum, i, x, j : Word;
begin
Result:=0;
L:=length(s);
if( L > 0 ) then
begin
crc:=$FFFF;
sum:=length(s);
for i:=1 to L do
begin
j:=ord(s[i]);
sum:=sum+((i) * j);
x:=((crc shr 8) xor j) and $FF;
x:=x xor (x shr 4);
crc:=((crc shl 8) xor (x shl 12) xor (x shl 5) xor x) and $FFFF;
end;
Result:=crc+(Byte(bSumPos) * sum);
end;
end;
Nice thing is also that you can create an unique id with it, for example to get an unique identifier for a filename, like:
function uniqueId( s : string ) : Word;
begin
Result:=crc16( s, TRUE );
end;
Cheers,
Erwin Haantjes

Related

Byte array to Signed integer in Delphi

source array(4 bytes)
[$80,$80,$80,$80] =integer 0
[$80,$80,$80,$81] = 1
[$80,$80,$80,$FF] = 127
[$80,$80,$81,$01] = 128
need to convert this to integer.
below is my code and its working at the moment.
function convert(b: array of Byte): Integer;
var
i, st, p: Integer;
Negative: Boolean;
begin
result := 0;
st := -1;
for i := 0 to High(b) do
begin
if b[i] = $80 then Continue // skip leading 80
else
begin
st := i;
Negative := b[i] < $80;
b[i] := abs(b[i] - $80);
Break;
end;
end;
if st = -1 then exit;
for i := st to High(b) do
begin
p := round(Power(254, High(b) - i));
result := result + b[i] * p;
result := result - (p div 2);
end;
if Negative then result := -1 * result
end;
i'm looking for a better function?
Update:
file link
https://drive.google.com/file/d/0ByBA4QF-YOggZUdzcXpmOS1aam8/view?usp=sharing
in uploaded file ID field offset is from 5 to 9
NEW:
Now i got into new problem which is decoding date field
Date field hex [$80,$8F,$21,$C1] -> possible date 1995-12-15
* in uploaded file date field offset is from 199 to 203
Just an example of some improvements as outlined by David.
The array is passed by reference as a const.
The array is fixed in size.
The use of floating point calculations are converted directly into a constant array.
Const
MaxRange = 3;
Type
TMySpecial = array[0..MaxRange] of Byte;
function Convert(const b: TMySpecial): Integer;
var
i, j: Integer;
Negative: Boolean;
Const
// Pwr[i] = Round(Power(254,MaxRange-i));
Pwr: array[0..MaxRange] of Cardinal = (16387064,64516,254,1);
begin
for i := 0 to MaxRange do begin
if (b[i] <> $80) then begin
Negative := b[i] < $80;
Result := Abs(b[i] - $80)*Pwr[i] - (Pwr[i] shr 1);
for j := i+1 to MaxRange do
Result := Result + b[j]*Pwr[j] - (Pwr[j] shr 1);
if Negative then
Result := -Result;
Exit;
end;
end;
Result := 0;
end;
Note that less code lines is not always a sign of good performance.
Always measure performance before optimizing the code in order to find real bottlenecks.
Often code readability is better than optimizing over the top.
And for future references, please tell us what the algorithm is supposed to do.
Code for testing:
const
X : array[0..3] of TMySpecial =
(($80,$80,$80,$80), // =integer 0
($80,$80,$80,$81), // = 1
($80,$80,$80,$FF), // = 127
($80,$80,$81,$01)); // = 128
var
i,j: Integer;
sw: TStopWatch;
begin
sw := TStopWatch.StartNew;
for i := 1 to 100000000 do
for j := 0 to 3 do
Convert(X[j]);
WriteLn(sw.ElapsedMilliseconds);
ReadLn;
end.

How can I write comparison functions for unsigned 64 bit integers for versions of the compiler without support for UInt64?

How can I do that with Delphi 6? UInt64 is not known in Delphi 6. It was introduced in later versions.
var
i, j: Int64;
if UInt64(i) < UInt64(j) then ...
I am thinking of an asm procedure.
function UInt64CompareLT(i, j: Int64): Boolean;
asm
???
end;
function UInt64CompareGT(i, j: Int64): Boolean;
asm
???
end;
The Int64Rec type from SysUtils is designed for the task of picking out the parts of a 64 bit integer.
If you happen to be using a Delphi that pre-dates this type, define it yourself:
type
Int64Rec = packed record
case Integer of
0: (Lo, Hi: Cardinal);
1: (Cardinals: array [0..1] of Cardinal);
2: (Words: array [0..3] of Word);
3: (Bytes: array [0..7] of Byte);
end;
What's more, you only need a single function that returns -1 for less than, 1 for greater than and 0 for equals. Something like this:
function CompareUInt64(const i, j: Int64): Integer;
begin
if Int64Rec(i).Hi < Int64Rec(j).Hi then
Result := -1
else if Int64Rec(i).Hi > Int64Rec(j).Hi then
Result := 1
else if Int64Rec(i).Lo < Int64Rec(j).Lo then
Result := -1
else if Int64Rec(i).Lo > Int64Rec(j).Lo then
Result := 1
else
Result := 0;
end;
The idea is that you first compare the high order part, and only if that is equal do you then go on to compare the low order part.
This can be made simpler with a compare function for Cardinal.
function CompareCardinal(const i, j: Cardinal): Integer;
begin
if i < j then
Result := -1
else if i > j then
Result := 1
else
Result := 0;
end;
function CompareUInt64(const i, j: Int64): Integer;
begin
Result := CompareCardinal(Int64Rec(i).Hi, Int64Rec(j).Hi);
if Result = 0 then
Result := CompareCardinal(Int64Rec(i).Lo, Int64Rec(j).Lo);
end;
Finally, should you need the boolean functions of your question, they can be implemented on top of this more general function.
There's no need in using assembler (but, sure, you can do it): you can compare hi and low parts of Int64 instead:
function UInt64CompareLT(i, j: Int64): Boolean;
begin
if (LongWord(i shr 32) < LongWord(j shr 32)) then
Result := true
else if (LongWord(i shr 32) > LongWord(j shr 32)) then
Result := false
else if (LongWord(i and $FFFFFFFF) < LongWord(j and $FFFFFFFF)) then
Result := true
else
Result := false;
end;
function UInt64CompareGT(i, j: Int64): Boolean;
begin
if (LongWord(i shr 32) > LongWord(j shr 32)) then
Result := true
else if (LongWord(i shr 32) < LongWord(j shr 32)) then
Result := false
else if (LongWord(i and $FFFFFFFF) > LongWord(j and $FFFFFFFF)) then
Result := true
else
Result := false;
end;
Using assembler is possible, but would bind your code to a specific machine architecture.
You can achieve this in pure Pascal as follows:
type
//Delcare a variant record to have 2 ways to access the same data in memory.
T64Bit = record
case Integer of
0: (I64: Int64;);
1: (Small: Cardinal; Big: Cardinal);
end;
var
I, J: T64Bit;
begin
//You can set the value via normal Int64 assignment as follows:
I.I64 := 1;
J.I64 := 2;
//Then you can compare the "big" and "small" parts on the number using
//unsigned 32-bit comparisons as follows.
if (I.Big < J.Big) or ((I.Big = J.Big) and (I.Small< J.Small)) then
//The logic is as follows...
// If the big part is less than, the the small part doesn't matter
// If the big parts are equal, then the comparison of the small parts determines the result.

Fast linear list that excludes duplicates

I have the following code:
procedure TCellBlock.GeneratePtoQ;
var
x,y: integer;
i: integer;
Change: cardinal;
begin
i:= 0;
//Walk the grid of changed (alive) cells
for x:= GridMaxX downto 1 do begin
for y:= GridMaxY downto 1 do begin
if Active[cIndexP][x, y] then begin
Active[cIndexP][x,y]:= false;
//Put active items on the stack.
ToDo[i]:= x shl 16 or y;
Inc(i);
end; {if}
end; {for y}
end; {for x}
while i > 0 do begin
Dec(i);
y:= ToDo[i] and $FFFF;
x:= ToDo[i] shr 16;
//Calculate the cell, Change = (oldval XOR newval)
Change:= Grid[x,y].GeneratePtoQ;
//Mark the cells in the grid that need to be recalculated next generation.
Active[cIndexQ][x,y]:= Active[cIndexQ][x,y] or (Change <> 0);
Active[cIndexQ][x+1,y+1]:= Active[cIndexQ][x+1,y+1] or ((Change and $cc000000) <> 0);
Active[cIndexQ][x+1,y]:= Active[cIndexQ][x+1,y] or ((Change and $ff000000) <> 0);
Active[cIndexQ][x,y+1]:= Active[cIndexQ][x,y+1] or ((Change and $cccccccc) <> 0);
end; {while}
end;
The above is a code snippet of a test program that calculates conway's game of life.
The code needs to be as fast as possible. And for this purpose I'm trying different approaches.
It walks though a grid of active cells, looks to see which cells are active and puts those
on a stack.
Next it processes the items on the stack and sees which cells have changed.
If a cell has changed it updates the changes into the grid for the next generation.
I store cells in 32bit cardinals (4 bits Y, 8 bits X) and the P (even) generations are offset 1,1 pixel relative to the Q (odd) generations, this way I only have to take 3 neighbors into account instead of 8.
Question
I want to get rid of the grid, I just want to deal with the stack.
How do I implement a stack that eliminates duplicates?
Note that it needs to be as fast as possible and I'm not above using dirty tricks to get that.
if i understood what you asked you want the stack to have no duplication values. i'm not a delphi person but if it was java i would created a hashmap/ map tree and add each value to the map and before adding it to the stack check if it's already in the hash. you can also add all the values th the hash iterate it but you will loose the order of the hash.
Personally I'd take a completely different approach. First I don't see how you don't have to take all neighbours into account just because of using a 1,1 offset and then I doubt that bitshifting tricks make the algorithm much faster (often enough it's the contrary, but then it could be mem bandwidth constrained in which case we'd win a bit)
So I'd just go for the one thing that should bring by far the largest performance gain: Making the algorithm multithreaded. In our world of Quad/Hex/Octacores worrying about a few percent performance increases while wasting 300% or more seems silly. So if we'd ignore the active grids and check all fields the algorithm would be trivial with some great scaling, especially since one could easily vectorize the algorithm, but then that's not especially work efficient so I'd try some different approaches towards multithreading an algorithm that only takes the active cells in account.
First instead of getting rid of the grid I'd double it: One src and one dest grid - that are swapped each round. No locking to access the grid necessary, don't have to worry about when updating the fields and no stale entries (important for multithreading we want to use the cache after all).
Now the simplest solution would be to use some kind of concurrent list structure (no idea about delphi libraries) for the active cells and let each thread steal from it and add new active cells to another. With a good lock-free implementation of a concurrent queue (basically whatever the replacement of this is in delphi) or something similar could be quite nice and simple. For a better performance instead of adding single nodes to the list, I'd think about adding whole chunks to the list, say in sizes of 10 or so - more work with less overhead but if we make the chunks too large we lose parallelism.
I can think of other solutions like giving every thread one list of active cells to work through (or more exactly one list for all and different offsets) but then we have to between each run gather all new entries (not much synchronization overhead but some copying) into a list - worth a try I assume.
If your goal is speed (and only speed). There is a few tricks that can speed things up a LOT. My own implementation of the Conway's Game of Life use those tricks to make it faster. Note that it is VERY expensive on memory.
Each cells are an object
Each cell object contains its X/Y coordinates
Each cell object contains a "live" counters of the number of Alive neighbors. (When a cell turns On/Off, it notify it's neighbor so they update their counters.
To make #3 works, when the next generation is calculated, cells are not turned On/Off right away. They are instead stacked into a list until all cells are calculated.
Each cell has a counter which indicate which is the last generation they changed on. That avoid calculating the same cell twice. (My alternative to the stack that eliminates duplicates)
The list of #5 is reused on the next generation, as only the neighbors of a cell that changed on the previous generation can change on the current one.
There are some of the tricks I use to speed up the generation. Some of the tricks listed here will get you a lot more than multithreading your implementation. But using both those and multithread will get the most performance possible.
As for the multithread subject, read Voo's entry.
I've been thinking about it and I think I have a solution.
some background
Here's how the data is in laid out in memory
00 A 08 B 10 18 The bits of Individual int32's are layout like this:
01 | 09 | 11 19 00 04 08 0C 10 14 18 1C // N-Mask: $33333333
02 | 0A | 12 1A 01 05 09 0D 11 15 19 1D // S-Mask: $cccccccc
03 | 0B | 13 1B 02 06 0A 0E 12 16 1A 1E // W-Mask: $000000ff
04 | 0C | 14 1C 03 07 0B 0F 13 17 1B 1F // E-Mask: $ff000000
05 | 0D | 15 1D //SE-Mask: $cc000000
06 | 0E | 16 1E //NW-Mask: $00000033
07 V 0F V 17 1F I can mask of different portions if need be.
-- Figure A: Grid -- -- Figure B: cell -- -- Table C: masks --
I haven't decided on the size of the building block, but this is the general idea.
Even generations are called P, odd generations are called Q.
They are staggered like this
+----------------+<<<<<<<< P 00 04 08 0C //I use a 64K lookup
|+---------------|+ 01 05* 09* 0D //table to lookup
|| || 02 06* 0A* 0E //the inner* 2x2 bits from
|| || 03 07 0B 0F //a 4x4 grid.
+----------------+| //I need to do 8 lookups for a 32 bit cell
+----------------+<<<<<<<< Q
- Figure D: Cells are staggered - -- Figure E: lookup --
This way when generating P -> Q, I only need to look at P itself and its S, SE, E neighbors, instead of all 8 neighbors, ditto for Q -> P. I need only look at Q itself and its N, NW and W neighbors.
Also notice that the staggering saves me time in translating the result of the lookup, because I have to do less bit shifting to put the results in place.
When I loop though a grid (Figure A) I walk though the cells (Figure B) in the order shown in figure A. Always in strictly increasing order in a P-cycle and always in decreasing order in a Q-cycle.
In fact the Q cycle works in exactly the opposite order from the P-cycle, this speeds things up by reusing the cache as much as possible.
I want to minimize using pointers as much as possible, because pointers cannot be predicted and are not accessed sequentially (they jump all over the place) So I want to use arrays, stacks and queues as much as possible.
What data do to need to keep track of
I need to keep track of only the cells that change. If a cell (that is an int32) does not change from one generation to the next I remove it from consideration.
This is what the code in the question does. It uses a grid to keep track of the changes, but I want to use a stack, not a grid; and I only want to deal with active cells I don't want to know about stable or dead cells.
Some background on the data
Notice how the cell itself is always monotonically increasing. As is its S-neighbor, as well as the E and SE-neighbor. I can use this info to cheat.
The solution
I use a stack to keep track of the cell itself and its S neighbor and a queue to keep track of its E and SE neighbor and when I'm done I merge the two.
Suppose in the Grid the following cells come out as active after I've calculated them:
00, 01, 08 and 15
I make the following two stacks:
stack A stack B
00 08 a) -A: Cell 00 itself in stack A and its E-neighbor in B
01 09 a) Cell 00's S neighbor in stack A and its SE-n'bor in B
02 0A b) -B: Cell 01 is already in the stack, we only add S/SE
08 10 c) -C: Cell 08 goes into the stack as normal
09 11 c) We'll sort out the merge later.
15 1D d) -D: Cell 15 and neighbors go on as usual.
16 1E d)
Now I push members from stack A and B onto a new stack C so that stack C has
no duplicates and it strictly increasing:
Here's the pseudo code to process the two queues:
a:= 0; b:= 0; c:=0;
while not done do begin
if stack[a] <= stack[b] then begin
stack[c]:= stack[a]; inc(a); inc(c);
if stack[a] = stack[b] then inc(b);
end
else begin
stack[c]:= stack[b]; inc(b); inc(c);
end;
end; {while}
And even better
I don't have to actually do the two stacks and the merging as two separate steps, if I make A a stack and B a queue, I can do the second step described in the pseudo code and the building of the two stacks in one pass.
Note
As a cell changes its S, E or SE border does not necessary need to change, but I can test for that using the masks in table C, and only add the cells that really need checking in the next generation to the list.
Benefits
Using this scheme, I only ever have to walk through one stack with active cells when calculating cells, so I don't waste time looking at dead or inactive cells.
I only do sequential memory accesses, maximizing cache usage.
Building the stack with new changes for the next generations only requires one extra temporary queue, which I process in strictly sequential order.
I do no sorting and the minimum of comparisons.
I don't have to keep track of the neighbors of each individual cells (int32), I only need to keep track of the neighbors (S,E,SE, N,W,NW) of the grids, this keeps the memory overhead to a minimum.
I don't need to keep track of a cells status, I only need to count dead cells (A cell is either dead, because it was dead before, or because it changed into dead. All the active cells are in my TODO stack, this saves bookkeeping time and memory space.
The algorithm runs in o(n) time where (n) is the number of active cells, it excludes dead cells, stable cells and cells that oscillate with period 2.
I only ever deal with 32 bit cardinals, which is the much faster than using int16's.
Mostly #Ken, the complete sourcecode for the test program:
Note that 99,9% of the time is spend in displaying, because I haven't done anything to
optimize that.
I've created a new SDI-main app and posted the code in that and because I'm lazy I haven't bothered to rename or repaint any controls.
Project file: sdiapp.dpr
program Sdiapp;
uses
Forms,
SDIMAIN in 'SDIMAIN.pas'; {Form1}
{$R *.RES}
begin
Application.Initialize;
Application.CreateForm(TForm1, Form1);
Application.Run;
end.
Main form: sdimain.pas
unit SDIMAIN;
interface
uses Windows, Classes, Graphics, Forms, Controls, Menus,
Dialogs, StdCtrls, Buttons, ExtCtrls, ComCtrls, ImgList, StdActns,
ActnList, ToolWin;
{--------------------------------------------
p and q are bit arrays of 16x16 bits, grouped
as in 8 int32's as follows
P00 P04 P08 P0c P10 P14 P18 P1c
P01 P05 P09 P0d P11 P15 P19 P1d
P02 P06 P0a P0e P12 P16 P1a P1e
P03 P07 P0b P0f P13 P17 P1b P1f
|
+----> The bits per int32 are grouped as follows
The int32's are grouped as follows
P0 P1
P2 P3
P4 P5
P6 P7
P and Q are staggered as follows:
+---------------------------------+ <---- P
| +-------------------------------|-+ <----Q
| | | |
| | | |
... ...
| | | |
+-|-------------------------------+ |
+---------------------------------+
Generations start counting from 0,
all even generations are stored in P.
all odd generations are stored in Q.
When generating P->Q, the S, SE and E neighbors are checked.
When generating Q->P, the N, NW and W neighbors are checked.
The westernmost P edge in a grid is stored inside that grid.
Ditto for all easternmost Q edges.
--------------------------------------------}
const
cClearQState = $fffffff0;
cClearPState = $fffff0ff;
cIndexQ = 1;
cIndexP = 0;
ChangeSelf = 0;
ChangeNW = 1;
ChangeW = 2;
ChangeN = 3;
ChangeSE = 1;
ChangeE = 2;
ChangeS = 3;
const
//A Grid is 128 x 128 pixels.
GridSizeX = 512 div 8; //should be 128/8, 1024 for testing.
GridSizeY = GridSizeX * 2; //32 totaal: 16x32x4bytes = 2048 x 2 (p+q) = 4k per block.
GridMaxX = GridSizeX - 1;
GridMaxY = GridSizeY - 1;
NumberOfCells = GridSizeX * GridSizeY;
CellSizeX = 8;
CellSizeY = 4;
CellMaxX = CellSizeX - 1;
CellMaxY = CellSizeY - 1;
type
TUnit = Cardinal;
TBytes = array[0..3] of byte;
TChange = array[0..3] of boolean;
type
TCellBlock = class;
TFlags = record
case boolean of
true: (whole: cardinal);
false: (part: array[0..3] of byte);
end;
//TActiveList = array[0..GridMaxX, 0..GridMaxY] of boolean;
//TActive = array[0..1] of TActiveList;
TToDoList = array[-1..NumberOfCells] of cardinal; //Padding on both sides.
TNewRow = TFlags;
PCell = ^TCell;
TCell = record
public
p: TUnit;
q: TUnit;
procedure SetPixel(x,y: integer; InP: Boolean = true);
function GeneratePtoQ: cardinal; inline;
function GenerateQtoP: cardinal; inline;
end;
//A grid contains pointers to an other grid, a unit or nil.
//A grid can contain grids (and nils) or units (and nils), but not both.
PGrid = ^TGrid;
TGrid = array[0..GridMaxX,0..GridMaxY] of TCell;
TCellBlock = class(TPersistent)
private
FHasCells: boolean;
FLevel: integer;
FGrid: TGrid;
ToDoP: TToDoList;
ToDoQ: TToDoList;
PCount: integer;
QCount: integer;
FParent: TCellBlock;
FMyX,FMyY: integer;
N,W,NW: TCellBlock;
S,E,SE: TCellBlock;
procedure GeneratePtoQ; virtual;
procedure GenerateQtoP; virtual;
procedure UpdateFlagsPtoQ; virtual;
procedure UpdateFlagsQtoP; virtual;
procedure Generate; virtual;
procedure Display(ACanvas: TCanvas); virtual;
procedure SetPixel(x,y: integer);
property Grid: TGrid read FGrid write FGrid;
public
constructor Create(AParent: TCellBlock);
destructor Destroy; override;
property Parent: TCellBlock read FParent;
property HasCells: boolean read FHasCells;
property Level: integer read FLevel;
property MyX: integer read FMyX;
property MyY: integer read FMyY;
end;
TCellParent = class(TCellBlock)
private
procedure GeneratePtoQ; override;
procedure GenerateQtoP; override;
//procedure Display(Startx,StartY: integer; ACanvas: TCanvas); override;
public
constructor CreateFromChild(AChild: TCellBlock; ChildX, ChildY: integer);
constructor CreateFromParent(AParent: TCellParent);
destructor Destroy; override;
end;
type
TForm1 = class(TForm)
ToolBar1: TToolBar;
ToolButton9: TToolButton;
ToolButton1: TToolButton;
ToolButton2: TToolButton;
ToolButton3: TToolButton;
ToolButton4: TToolButton;
ActionList1: TActionList;
FileNew1: TAction;
FileOpen1: TAction;
FileSave1: TAction;
FileSaveAs1: TAction;
FileExit1: TAction;
EditCut1: TEditCut;
EditCopy1: TEditCopy;
EditPaste1: TEditPaste;
HelpAbout1: TAction;
StatusBar: TStatusBar;
ImageList1: TImageList;
Image1: TImage;
Timer1: TTimer;
Label1: TLabel;
procedure FileNew1Execute(Sender: TObject);
procedure FileSave1Execute(Sender: TObject);
procedure FileExit1Execute(Sender: TObject);
procedure Timer1Timer(Sender: TObject);
procedure FileOpen1Execute(Sender: TObject);
procedure ToolButton4Click(Sender: TObject);
private
MyBlock: TCellBlock;
MyBitmap: TBitmap;
BitmapData: array[0..1024,0..(1024 div 32)] of integer;
procedure InitLookupTable;
procedure RestartScreen;
public
{ Public declarations }
end;
var
Form1: TForm1;
const
cLiveCell = $88888888;
cLiveVerticalP = $40404040;
cLiveVerticalQ = $04040404;
cLiveTop = $00000088;
cLiveBottom = $88000000;
cLivePCorner = $00000040;
cLiveQCorner = $04000000;
cUnstableCell = $22222222;
cUnstableVerticalP = $10101010;
cUnstableVerticalQ = $01010101;
cUnstableTop = $00000022;
cUnstableBottom = $22000000;
cUnstablePCorner = $00000010;
cUnstableQCorner = $01000000;
cAllDead = $00000000;
cAllLive = $ffffffff;
cLiveRow = $8;
cLive2x2 = $4;
cUnstableRow = $2;
cUnstable8x4 = $22;
cUnstable2x2 = $1;
cUnstable2x4 = $11;
cStateMask: array [0..7] of cardinal =
($fffffff0, $ffffff0f, $fffff0ff, $ffff0fff, $fff0ffff, $ff0fffff, $f0ffffff, $0fffffff);
var
LookupTable: array[0..$FFFF] of byte;
Generation: int64;
implementation
uses about, sysutils, clipbrd, Math;
{$R *.dfm}
type
bool = longbool;
procedure getCPUticks(var i : int64);
begin
asm
mov ECX,i;
RDTSC; //cpu clock in EAX,EDX
mov [ECX],EAX;
mov [ECX+4],EDX;
end;
end;
function IntToBin(AInt: integer): string;
var
i: integer;
begin
i:= SizeOf(AInt)*8;
Result:= StringOfChar('0',i);
while (i > 0) do begin
if Odd(AInt) then Result[i]:= '1';
AInt:= AInt shr 1;
Dec(i);
end; {while}
end;
constructor TCellBlock.Create(AParent: TCellBlock);
begin
inherited Create;
FParent:= AParent;
ToDoQ[-1]:= $ffffffff;
ToDoP[-1]:= $ffffffff;
end;
destructor TCellBlock.Destroy;
begin
inherited Destroy;
end;
procedure TCell.SetPixel(x: Integer; y: Integer; InP: Boolean = true);
var
Mask: cardinal;
Offset: Integer;
begin
//0,0 is the topleft pixel, no correction for p,q fase.
x:= x mod 8;
y:= y mod 4;
Offset:= x * 4 + y;
Mask:= 1 shl Offset;
if (InP) then p:= p or Mask else q:= q or Mask;
end;
procedure TCellBlock.SetPixel(x: Integer; y: Integer);
var
GridX, GridY: integer;
x1,y1: integer;
i: integer;
begin
x:= x + (GridSizeX div 2) * CellSizeX;
y:= y + (GridSizeY div 2) * CellSizeY;
if Odd(Generation) then begin
Dec(x); Dec(y);
QCount:= 0;
end
else PCount:= 0;
GridX:= x div CellSizeX;
GridY:= y div CellSizeY;
if (GridX in [0..GridMaxX]) and (GridY in [0..GridMaxY]) then begin
Grid[GridX,GridY].SetPixel(x,y);
i:= 0;
for x1:= 1 to GridMaxX-1 do begin
for y1:= 1 to GridMaxY-1 do begin
case Odd(Generation) of
false: begin
ToDoP[i]:= (x1 shl 16 or y1);
Inc(PCount);
end;
true: begin
ToDoQ[i]:= (x1 shl 16 or y1);
Inc(QCount);
end;
end; {case}
Inc(i);
end; {for y}
end; {for x}
end; {if}
end;
//GeneratePtoQ
//This procedure generates the Q data and QState-flags
//using the P-data and PState-flags.
procedure TCellBlock.Generate;
begin
if Odd(Generation) then GenerateQtoP
else GeneratePtoQ;
Inc(Generation);
end;
const
MaskS = $cccccccc;
MaskE = $ff000000;
MaskSE = $cc000000;
procedure TCellBlock.GeneratePtoQ;
var
x,y: integer;
i: integer;
Change: cardinal;
ToDoA: TToDoList;
ToDoB: TToDoList;
A, B: integer;
done: boolean;
Address: cardinal;
begin
i:= 0;
A:= 0; B:= 0;
ToDoA[-1]:= $ffffffff;
ToDoB[-1]:= $ffffffff;
while (i < PCount) do begin
y:= ToDoP[i] and $FFFF;
x:= ToDoP[i] shr 16;
Inc(i);
if (x = GridMaxX) or (y = GridMaxY) then continue; //Skip the loop.
Change:= Grid[x,y].GeneratePtoQ;
if (Change <> 0) then begin
Address:= (x shl 16 or y);
if ToDoA[A-1] <> Address then begin
ToDoA[A]:= Address; Inc(A);
end;
if (Change and MaskS) <> 0 then begin
ToDoA[A]:= Address + 1;
Inc(A);
end; {if S changed}
if ((Change and MaskE) <> 0) then begin
Address:= Address + (1 shl 16);
if ToDoB[B-1] <> Address then begin
ToDoB[B]:= Address;
Inc(B);
end;
if ((Change and MaskSE) <> 0) then begin
ToDoB[B]:= Address + 1;
Inc(B);
end; {if SE changed}
end; {if E changed}
end; {if whole cell changed}
end; {while}
ToDoA[A]:= $ffffffff;
ToDoB[B]:= $ffffffff;
ToDoB[B+1]:= $ffffffff;
a:= 0; b:= 0; QCount:= 0;
Done:= (ToDoA[a] = $ffffffff) and (ToDoB[b] = $ffffffff);
while not done do begin
if ToDoA[a] <= ToDoB[b] then begin
ToDoQ[QCount]:= ToDoA[a]; inc(a); inc(QCount);
if ToDoA[a] = ToDoB[b] then inc(b);
end
else begin
ToDoQ[QCount]:= ToDoB[b]; inc(b); inc(QCount);
end;
Done:= (ToDoA[a] = $ffffffff) and (ToDoB[b] = $ffffffff);
end; {while}
end;
const
MaskN = $33333333;
MaskW = $000000ff;
MaskNW = $00000033;
procedure TCellBlock.GenerateQtoP;
var
x,y: integer;
i: integer;
Change: cardinal;
ToDoA: TToDoList;
ToDoB: TToDoList;
A, B: integer;
done: boolean;
Address: cardinal;
begin
i:= 0;
A:= 0; B:= 0;
ToDoA[-1]:= $ffffffff;
ToDoB[-1]:= $ffffffff;
while (i < QCount) do begin
y:= ToDoQ[i] and $FFFF;
x:= ToDoQ[i] shr 16;
Inc(i);
if (x = 0) or (y = 0) then Continue; //Skip the rest of the loop.
Change:= Grid[x,y].GenerateQtoP;
if (Change <> 0) then begin
Address:= (x shl 16 or y);
if ToDoA[A-1] <> Address then begin
ToDoA[A]:= Address; Inc(A);
end;
if (Change and MaskN) <> 0 then begin
ToDoA[A]:= Address - 1;
Inc(A);
end; {if N changed}
if ((Change and MaskW) <> 0) then begin
Address:= Address - (1 shl 16);
if ToDoB[B-1] <> Address then begin
ToDoB[B]:= Address;
Inc(B);
end;
if ((Change and MaskNW) <> 0) then begin
ToDoB[B]:= Address - 1;
Inc(B);
end; {if NW changed}
end; {if W changed}
end; {if whole cell changed}
end; {while}
ToDoA[A]:= $ffffffff;
ToDoB[B]:= $ffffffff;
ToDoB[B+1]:= $ffffffff;
a:= 0; b:= 0; PCount:= 0;
Done:= (ToDoA[a] = $ffffffff) and (ToDoB[b] = $ffffffff);
while not done do begin
if ToDoA[a] <= ToDoB[b] then begin
ToDoP[PCount]:= ToDoA[a]; inc(a); inc(PCount);
if ToDoA[a] = ToDoB[b] then inc(b);
end
else begin
ToDoP[PCount]:= ToDoB[b]; inc(b); inc(PCount);
end;
Done:= (ToDoA[a] = $ffffffff) and (ToDoB[b] = $ffffffff);
end; {while}
end;
(*
procedure TCellBlock.GenerateQtoP;
var
x,y: integer;
i: integer;
Change: cardinal;
begin
i:= 0;
for x:= 0 to GridMaxX - 1 do begin
for y:= 0 to GridMaxY -1 do begin
if Active[cIndexQ][x, y] then begin
Active[cIndexQ][x, y]:= false;
ToDo[i]:= x shl 16 or y;
Inc(i);
end; {if}
end; {for y}
end; {for x}
while i > 0 do begin
Dec(i);
y:= ToDo[i] and $FFFF;
x:= ToDo[i] shr 16;
Change:= Grid[x,y].GenerateQtoP;
Active[cIndexP][x,y]:= Active[cIndexP][x,y] or (Change <> 0);
Active[cIndexP][x-1,y-1]:= Active[cIndexP][x-1,y-1] or ((Change and $00000033) <> 0);
Active[cIndexP][x-1,y]:= Active[cIndexP][x-1,y] or ((Change and $000000ff) <> 0);
Active[cIndexP][x,y-1]:= Active[cIndexP][x,y-1] or ((Change and $33333333) <> 0);
end; {while}
end; (**)
procedure TCellBlock.UpdateFlagsPtoQ;
begin
//nog in te vullen.
end;
procedure TCellBlock.UpdateFlagsQtoP;
begin
//nog in te vullen
end;
function TCell.GeneratePtoQ: cardinal;
var
NewQ: cardinal;
Change: cardinal;
const
Mask1 = $f;
Mask2 = $ff;
Mask4 = $ffff;
Row1Mask = $33333333; //0011-0011-0011-0011-0011-0011-0011-0011
Row2Mask = $cccccccc; //1100-1100-1100-1100-1100-1100-1100-1100
function MakeNewBrick(p0,p1,p2,p3: cardinal): cardinal; inline;
var
Row1, Row2: cardinal;
begin
//Generate new Brick using a 2x2 grid of bricks ordered like:
//p0 p1
//p2 p3
//First row inside P0
if (p0 <> 0) then Row1:=
LookupTable[p0 and $ffff] or
LookupTable[(p0 shr 8) and $ffff] shl 8 or
LookupTable[(p0 shr 16)] shl 16 or
LookupTable[(p0 shr 24) or (p1 and $ff) shl 8] shl 24
else Row1:= LookupTable[(p1 and $ff) shl 8] shl 24;
(**)
p0:= ((p0 and $cccccccc)) or ((p2 and $33333333));
p1:= ((p1 and $cc)) or ((p3 and $33));
if (p0 <> 0) then Row2:=
LookupTable[p0 and $ffff] or
LookupTable[(p0 shr 8) and $ffff] shl 8 or
LookupTable[(p0 shr 16)] shl 16 or
LookupTable[(p0 shr 24) or ((p1 and $ff) shl 8)] shl 24
else Row2:= LookupTable[(p1 and $ff) shl 8] shl 24;
Result:= (Row1 and Row1Mask) or (Row2 and Row2Mask);
end;
begin
NewQ:= MakeNewBrick(Self.p, PGrid(#Self)^[1,0].p, PGrid(#Self)^[0,1].p, PGrid(#Self)^[1,1].p);
Result:= NewQ xor q;
q:= NewQ;
end;
function TCell.GenerateQtoP: cardinal;
var
Offset: integer;
NewP: cardinal;
Change: cardinal;
const
Row1Mask = $33333333; //0011-0011-0011-0011-0011-0011-0011-0011
Row2Mask = $cccccccc; //1100-1100-1100-1100-1100-1100-1100-1100
function MakeNewBrick(q0,q1,q2,q3: cardinal): cardinal; inline;
var
Row1, Row2: cardinal;
begin
//Generate new Brick using a 2x2 grid of bricks ordered like:
//q3 q2
//q1 q0
if (q0 <> 0) then Row1:=
LookupTable[(q0 shr 16)] shl 26 or
LookupTable[(q0 shr 8 ) and $ffff] shl 18 or
LookupTable[(q0 ) and $ffff] shl 10 or
LookupTable[((q0 and $ff) shl 8) or (q1 shr 24)] shl 2
else Row1:= LookupTable[(q1 shr 24)] shl 2;
(*
q0:= ((q0 and $33333333) shl 2) or ((q2 and $cccccccc) shr 2);
q1:= ((q1 and $33000000) shl 2) or ((q3 and $cc000000) shr 2);
if (q0 <> 0) then Row2:=
LookupTable[(q0 shr 16) and $ffff] shl 24 or
LookupTable[(q0 shr 8) and $ffff] shl 16 or
LookupTable[(q0 ) and $ffff] shl 8 or
LookupTable[((q0 and $ff) shl 8) or (q1 shr 24)]
else Row2:= LookupTable[(q1 shr 24)];
(**)
q0:= ((q0 and $33333333)) or ((q2 and $cccccccc));
q1:= ((q1 and $33000000)) or ((q3 and $cc000000));
if (q0 <> 0) then Row2:=
LookupTable[(q0 shr 16)] shl 22 or
LookupTable[(q0 shr 8) and $ffff] shl 14 or
LookupTable[(q0 ) and $ffff] shl 6 or
LookupTable[((q0 and $ff) shl 8) or (q1 shr 24)] shr 2
else Row2:= LookupTable[(q1 shr 24)] shr 2;
Result:= (Row1 and Row2Mask) or (Row2 and Row1Mask);
end;
begin
Offset:= -1;
NewP:= MakeNewBrick(Self.q, PGrid(#Self)^[Offset,0].q, PGrid(#Self)^[0,Offset].q, PGrid(#Self)^[Offset, Offset].q);
Result:= NewP xor P;
P:= NewP;
end;
procedure TCellBlock.Display(ACanvas: TCanvas);
var
GridX,GridY: integer;
//Offset: integer;
procedure DisplayCell(ACell: TCell);
var
x,y,x1,y1: integer;
Row, Mask: integer;
DoPixel: boolean;
Offset: integer;
DrawOffset: integer;
InP: boolean;
begin
DrawOffset:= (Generation and 1);
InP:= not(Odd(Generation));
for y:= 0 to CellMaxY do begin
for x:= 0 to CellMaxX do begin
//if (x = 0) or (y = 0) then ACanvas.Pixels[GridX*16+x+Offset,GridY*16+y+Offset]:= clBtnFace;
//0,0 is the topleft pixel, no correction for p,q fase.
x1:= x mod 8;
y1:= y mod 4;
Offset:= x1 * 4 + y1;
Mask:= 1 shl Offset;
if (InP) then DoPixel:= (ACell.p and Mask) <> 0
else DoPixel:= (ACell.q and Mask) <> 0;
if DoPixel then ACanvas.Pixels[GridX*CellSizeX+x+DrawOffset, GridY*CellSizeY+y+DrawOffset]:= clBlack;
end; {for x}
end; {for y}
end; (**)
begin
ACanvas.Rectangle(-1,-1,1000,1000);
FillChar(Form1.BitmapData, SizeOf(Form1.BitmapData), #0);
for GridY:= 0 to GridMaxY do begin
for GridX:= 0 to GridMaxX do begin
if Int64(Grid[GridX, GridY]) <> 0 then begin
DisplayCell(Grid[GridX,GridY]);
end;
end;
end;
end;
//--------------------------------------
//A Parent is every layer above the ground level
//the tree grows from the bottom up.
//A new parent is placed on top of the last one and
//always has one and only one child to start with, from there
//the tree grows down again.
constructor TCellParent.CreateFromChild(AChild: TCellBlock; ChildX: Integer; ChildY: Integer);
begin
inherited Create(nil);
end;
constructor TCellParent.CreateFromParent(AParent: TCellParent);
begin
inherited Create(AParent);
end;
destructor TCellParent.Destroy;
begin
inherited Destroy;
end;
procedure TCellParent.GeneratePtoQ;
begin
end;
procedure TCellParent.GenerateQtoP;
begin
end;
//The bitmap for the lookup table is as follows:
// 0 2 4 6
// +----+
// 1 |3 5| 7
// 8 |A C| E
// +----+
// 9 B D F
// The inner 2x2 cells are looked up.
// so 0241358AC make up bit 3 etc.
procedure TForm1.InitLookupTable;
const
//Masks for normal order.
MaskNW = $0757; //0000-0111-0101-0111
MaskSW = $0EAE; //0000-1110-1010-1110
MaskNE = $7570; //0111-0101-0111-0000
MaskSE = $EAE0; //1110-1010-1110-0000
//Bitlocations for normal order
BitNW = $0020; //0000-0000-0010-0000
BitSW = $0040; //0000-0000-0100-0000
BitNE = $0200; //0000-0020-0000-0000
BitSE = $0400; //0000-0100-0000-0000
//Lookup table also has a shifted order. here the bottom half of the N word
//and the top half of the south word combine.
//Like so:
// 2 6 A E
// 3 7 B F
// 0 4 8 C
// 1 5 9 D
//Mask for split order.
Mask2NW = $0D5D; // 0000-1101-0101-1101
Mask2SW = $0BAB; // 0000-1011-1010-1011
Mask2NE = $D5D0; // 1101-0101-1101-0000
Mask2SE = $BAB0; // 1011-1010-1011-0000
//Bitlocations for split order
Bit2NW = $0080; // 0000-0000-1000-0000
Bit2SW = $0010; // 0000-0000-0001-0000
Bit2NE = $0800; // 0000-1000-0000-0000
Bit2SE = $0100; // 0000-0001-0000-0000
ResultNW = $01;
ResultSW = $02;
ResultNE = $10;
ResultSE = $20;
Result2NW = $04;
Result2SW = $08;
Result2NE = $40;
Result2SE = $80;
var
i: integer;
iNW, iNE, iSW, iSE: cardinal;
Count: integer;
ResultByte: byte;
function GetCount(a: integer): integer;
var
c: integer;
begin
Result:= 0;
for c:= 0 to 15 do begin
if Odd(a shr c) then Inc(Result);
end; {for c}
end; {GetCount}
begin
//Fill the normal lookup.
for i:= 0 to $ffff do begin
ResultByte:= 0;
iNW:= i and MaskNW;
Count:= GetCount(iNW);
case Count of //count excluding bit itself
3: ResultByte:= ResultNW;
2: if ((i and BitNW) <> 0) then ResultByte:= ResultNW;
end;
iSW:= i and MaskSW;
Count:= GetCount(iSW);
case Count of
3: ResultByte:= ResultByte or ResultSW;
2: if ((i and BitSW) <> 0) then ResultByte:= ResultByte or ResultSW;
end;
iNE:= i and MaskNE;
Count:= GetCount(iNE);
case Count of
3: ResultByte:= ResultByte or ResultNE;
2: if ((i and BitNE) <> 0) then ResultByte:= ResultByte or ResultNE;
end;
iSE:= i and MaskSE;
Count:= GetCount(iSE);
case Count of
3: ResultByte:= ResultByte or ResultSE;
2: if ((i and BitSE) <> 0) then ResultByte:= ResultByte or ResultSE;
end;
LookupTable[i]:= ResultByte;
end; {for i}
//Fill the shifted lookup.
for i:= 0 to $ffff do begin
ResultByte:= 0;
iNW:= i and Mask2NW;
Count:= GetCount(iNW);
case Count of //count excluding bit itself
3: ResultByte:= Result2NW;
2: if ((i and Bit2NW) <> 0) then ResultByte:= Result2NW;
end;
iSW:= i and Mask2SW;
Count:= GetCount(iSW);
case Count of
3: ResultByte:= ResultByte or Result2SW;
2: if ((i and Bit2SW) <> 0) then ResultByte:= ResultByte or Result2SW;
end;
iNE:= i and Mask2NE;
Count:= GetCount(iNE);
case Count of
3: ResultByte:= ResultByte or Result2NE;
2: if ((i and Bit2NE) <> 0) then ResultByte:= ResultByte or Result2NE;
end;
iSE:= i and Mask2SE;
Count:= GetCount(iSE);
case Count of
3: ResultByte:= ResultByte or Result2SE;
2: if ((i and Bit2SE) <> 0) then ResultByte:= ResultByte or Result2SE;
end;
LookupTable[i]:= LookupTable[i] or ResultByte;
end; {for i} (**)
end;
procedure TForm1.RestartScreen;
begin
MyBlock.Free;
MyBlock:= TCellBlock.Create(nil);
//MyBlock.SetPixel(5,7);
//MyBlock.SetPixel(6,7);
//MyBlock.SetPixel(7,7);
//MyBlock.SetPixel(7,6);
//MyBlock.SetPixel(6,5);
MyBlock.SetPixel(10,0);
MyBlock.SetPixel(11,0);
MyBlock.SetPixel(9,1);
MyBlock.SetPixel(10,1);
MyBlock.SetPixel(10,2);
end;
procedure TForm1.Timer1Timer(Sender: TObject);
begin
if Assigned(MyBlock) then begin
MyBlock.Generate;
MyBlock.Display(Image1.Canvas);
end;
end;
procedure TForm1.ToolButton4Click(Sender: TObject);
begin
if Assigned(MyBlock) then begin
MyBlock.Generate;
MyBlock.Display(Image1.Canvas);
end;
end;
procedure TForm1.FileNew1Execute(Sender: TObject);
begin
InitLookupTable;
FillChar(BitmapData, SizeOf(BitmapData), #0);
MyBitmap:= TBitmap.Create;
MyBitmap.SetSize(1024,1024);
MyBitmap.PixelFormat:= pf1bit;
MyBitmap.Monochrome:= true;
//MyBitmap.Handle:= CreateBitmap(1000,1000,1,2,nil);
Generation:= 0;
RestartScreen;
MyBlock.Display(Image1.Canvas);
//if (Sender = FileNew1) then Timer1.Enabled:= not(Timer1.Enabled);
end;
procedure TForm1.FileOpen1Execute(Sender: TObject);
var
i,a: integer;
start, eind: int64;
Diff: double;
LowDiff: double;
begin
LowDiff:= MaxInt;
for a:= 0 to 10 do begin
FileNew1Execute(Sender);
GetCPUTicks(start);
for i:= 0 to 1000 do begin
MyBlock.Generate;
end;
GetCPUTicks(eind);
//Label1.Caption:= IntToStr(Eind - Start);
Diff:= Eind - start;
LowDiff:= Min(Diff, LowDiff);
Label1.Caption:= Format('%10.0n',[lowdiff]) + ' CPU cycles per 1,000 generations';
Clipboard.AsText:= Label1.Caption;
end; {for a}
MyBlock.Display(Image1.Canvas);
end;
procedure TForm1.FileSave1Execute(Sender: TObject);
begin
Timer1.Enabled:= not(Timer1.Enabled);
end;
procedure TForm1.FileExit1Execute(Sender: TObject);
begin
Close;
end;
initialization
Generation:= 0;
end.
Stackoverflow does not allow me to post the form file due to a size limit, but I hope you can manage without.

how to improve the code (Delphi) for loading and searching in a dictionary?

I'm a Delphi programmer.
I have made a program who uses dictionaries with words and expressions (loaded in program as "array of string").
It uses a search algorithm based on their "checksum" (I hope this is the correct word).
A string is transformed in integer based on this:
var
FHashSize: Integer; //stores the value of GetHashSize
HashTable, HashTableNoCase: array[Byte] of Longword;
HashTableInit: Boolean = False;
const
AnsiLowCaseLookup: array[AnsiChar] of AnsiChar = (
#$00, #$01, #$02, #$03, #$04, #$05, #$06, #$07,
#$08, #$09, #$0A, #$0B, #$0C, #$0D, #$0E, #$0F,
#$10, #$11, #$12, #$13, #$14, #$15, #$16, #$17,
#$18, #$19, #$1A, #$1B, #$1C, #$1D, #$1E, #$1F,
#$20, #$21, #$22, #$23, #$24, #$25, #$26, #$27,
#$28, #$29, #$2A, #$2B, #$2C, #$2D, #$2E, #$2F,
#$30, #$31, #$32, #$33, #$34, #$35, #$36, #$37,
#$38, #$39, #$3A, #$3B, #$3C, #$3D, #$3E, #$3F,
#$40, #$61, #$62, #$63, #$64, #$65, #$66, #$67,
#$68, #$69, #$6A, #$6B, #$6C, #$6D, #$6E, #$6F,
#$70, #$71, #$72, #$73, #$74, #$75, #$76, #$77,
#$78, #$79, #$7A, #$5B, #$5C, #$5D, #$5E, #$5F,
#$60, #$61, #$62, #$63, #$64, #$65, #$66, #$67,
#$68, #$69, #$6A, #$6B, #$6C, #$6D, #$6E, #$6F,
#$70, #$71, #$72, #$73, #$74, #$75, #$76, #$77,
#$78, #$79, #$7A, #$7B, #$7C, #$7D, #$7E, #$7F,
#$80, #$81, #$82, #$83, #$84, #$85, #$86, #$87,
#$88, #$89, #$8A, #$8B, #$8C, #$8D, #$8E, #$8F,
#$90, #$91, #$92, #$93, #$94, #$95, #$96, #$97,
#$98, #$99, #$9A, #$9B, #$9C, #$9D, #$9E, #$9F,
#$A0, #$A1, #$A2, #$A3, #$A4, #$A5, #$A6, #$A7,
#$A8, #$A9, #$AA, #$AB, #$AC, #$AD, #$AE, #$AF,
#$B0, #$B1, #$B2, #$B3, #$B4, #$B5, #$B6, #$B7,
#$B8, #$B9, #$BA, #$BB, #$BC, #$BD, #$BE, #$BF,
#$C0, #$C1, #$C2, #$C3, #$C4, #$C5, #$C6, #$C7,
#$C8, #$C9, #$CA, #$CB, #$CC, #$CD, #$CE, #$CF,
#$D0, #$D1, #$D2, #$D3, #$D4, #$D5, #$D6, #$D7,
#$D8, #$D9, #$DA, #$DB, #$DC, #$DD, #$DE, #$DF,
#$E0, #$E1, #$E2, #$E3, #$E4, #$E5, #$E6, #$E7,
#$E8, #$E9, #$EA, #$EB, #$EC, #$ED, #$EE, #$EF,
#$F0, #$F1, #$F2, #$F3, #$F4, #$F5, #$F6, #$F7,
#$F8, #$F9, #$FA, #$FB, #$FC, #$FD, #$FE, #$FF);
implementation
function GetHashSize(const Count: Integer): Integer;
begin
if Count < 65 then
Result := 256
else
Result := Round(IntPower(16, Ceil(Log10(Count div 4) / Log10(16))));
end;
function Hash(const Hash: LongWord; const Buf; const BufSize: Integer): LongWord;
var P: PByte;
I: Integer;
begin
P := #Buf;
Result := Hash;
for I := 1 to BufSize do
begin
Result := HashTable[Byte(Result) xor P^] xor (Result shr 8);
Inc(P);
end;
end;
function HashStrBuf(const StrBuf: Pointer; const StrLength: Integer; const Slots: LongWord): LongWord;
var P: PChar;
I, J: Integer;
begin
if not HashTableInit then
InitHashTable;
P := StrBuf;
if StrLength <= 48 then // Hash all characters for short strings
Result := Hash($FFFFFFFF, P^, StrLength)
else
begin
// Hash first 16 bytes
Result := Hash($FFFFFFFF, P^, 16);
// Hash last 16 bytes
Inc(P, StrLength - 16);
Result := Hash(Result, P^, 16);
// Hash 16 bytes sampled from rest of string
I := (StrLength - 48) div 16;
P := StrBuf;
Inc(P, 16);
for J := 1 to 16 do
begin
Result := HashTable[Byte(Result) xor Byte(P^)] xor (Result shr 8);
Inc(P, I + 1);
end;
end;
// Mod into slots
if Slots <> 0 then
Result := Result mod Slots;
end;
procedure InitHashTable;
var I, J: Byte;
R: LongWord;
begin
for I := $00 to $FF do
begin
R := I;
for J := 8 downto 1 do
if R and 1 <> 0 then
R := (R shr 1) xor $EDB88320
else
R := R shr 1;
HashTable[I] := R;
end;
Move(HashTable, HashTableNoCase, Sizeof(HashTable));
for I := Ord('A') to Ord('Z') do
HashTableNoCase[I] := HashTableNoCase[I or 32];
HashTableInit := True;
end;
The result of the HashStrBuf is "and (FHashSize - 1)" and is used as index in an "array of array of Integer" (of FHashSize size) to store the index of the string from that "array of string".
This way, when searches for a string, it's transformed in "checksum" and then the code searches in the "branch" with this index comparing this string with the strings from dictionary who have the same "checksum".
Ideally each string from dictionary should have unique checksum. But in the "real world" about 2/3 share the same "checksum" with other words. Because of that the search is not that fast.
In these dictionaries strings are composed of this characters: ['a'..'z',#224..#246,#248..#254,#154,#156..#159,#179,#186,#191,#190,#185,'0'..'9', '''']
Is there any way to improve the "hashing" so the strings would have more unique "checksums"?
Oh, one way is to increase the size of that "array of array of Integer" (FHashSize) but it cannot be increased too much because it takes a lot of Ram.
Another thing: these dictionaries are stored on HDD only as words/expressions (not the "checksums"). Their "checksum" is generated at program startup. But it takes a lot of seconds to do that...
Is there any way to speed up the startup of the program? Maybe by improving the "hashing" function, maybe by storing the "checksums" on HDD and loading them from there...
Any input would be appreciated...
PS: here is the code to search:
function TDictionary.LocateKey(const Key: AnsiString): Integer;
var i, j, l, H: Integer;
P, Q: PChar;
begin
Result := -1;
l := Length(Key);
H := HashStrBuf(#Key[1], l, 0) and (FHashSize - 1);
P := #Key[1];
for i := 0 to High(FHash[H]) do //FHash is that "array of array of integer"
begin
if l <> FKeys.ItemSize[FHash[H][i]] then //FKeys.ItemSize is an byte array with the lengths of strings from dictionary
Continue;
Q := FKeys.Pointer(FHash[H][i]); //pointer to string in dictionary
for j := 0 to l - 1 do
if (P + j)^ <> (Q + j)^ then
Break;
if j = l then
begin
Result := FHash[H][i];
Exit;
end;
end;
end;
Don't reinvent the wheel!
IMHO your hashing is far from efficient, and your collision algorithm can be improved.
Take a look for instance at the IniFiles unit, and the THashedStringList.
It's a bit old, but a good start for a string list using hashes.
There are a lot of good Delphi implementation of such, like in SuperObject and a lot of other code...
Take a look at our SynBigTable unit, which can handle arrays of data in memory or in file very fast, with full indexed searches. Or our latest TDynArray wrapper around any dynamic array of data, to implement TList-like methods to it, including fast binary search. I'm quite sure it could be faster than your hand-tuned code using hashing, if you use an ordered index then fast binary search.
Post-Scriptum:
About pure hashing speed of a string content, take a look at this function - rename RawByteString into AnsiString, PPtrInt into PPointer, and PtrInt into Integer for Delphi 7:
function Hash32(const Text: RawByteString): cardinal;
function SubHash(P: PCardinalArray): cardinal;
{$ifdef HASINLINE}inline;{$endif}
var s1,s2: cardinal;
i, L: PtrInt;
const Mask: array[0..3] of cardinal = (0,$ff,$ffff,$ffffff);
begin
if P<>nil then begin
L := PPtrInt(PtrInt(P)-4)^; // fast lenght(Text)
s1 := 0;
s2 := 0;
for i := 1 to L shr 4 do begin // 16 bytes (4 DWORD) by loop - aligned read
inc(s1,P^[0]);
inc(s2,s1);
inc(s1,P^[1]);
inc(s2,s1);
inc(s1,P^[2]);
inc(s2,s1);
inc(s1,P^[3]);
inc(s2,s1);
inc(PtrUInt(P),16);
end;
for i := 1 to (L shr 2)and 3 do begin // 4 bytes (DWORD) by loop
inc(s1,P^[0]);
inc(s2,s1);
inc(PtrUInt(P),4);
end;
inc(s1,P^[0] and Mask[L and 3]); // remaining 0..3 bytes
inc(s2,s1);
result := s1 xor (s2 shl 16);
end else
result := 0;
end;
begin // use a sub function for better code generation under Delphi
result := SubHash(pointer(Text));
end;
There is even a pure asm version, even faster, in our SynCommons.pas unit. I don't know any faster hashing function around (it's faster than crc32/adler32/IniFiles.hash...). It's based on adler32, but use DWORD aligned reading and summing for even better speed. This could be improved with SSE asm, of course, but here is a fast pure Delphi hash function.
Then don't forget to use "multiplication"/"binary and operation" for hash resolution, just like in IniFiles. It will reduce the number of iteration to your list of hashs.
But since you didn't provide the search source code, we are not able to know what could be improved here.
If you are using Delphi 7, consider using Julian Bucknall's lovely Delphi data types code, EzDsl (Easy Data Structures Library).
Now you don't have to reinvent the wheel as another wise person has also said.
You can download ezdsl, a version that I have made work with both Delphi 7, and recent unicode delphi versions, here.
In particular the unit name EHash contains a hash table implementation, which has various hashing algorithms plug-inable, or you can write your own plugin function that just does the hashing function of your choice.
As a word to the wise, if you are using a Unicode Delphi version; I would be careful about hashing your unicode strings with a code library like this, without checking how its hashing algorithms perform on your system. The OP here is using Delphi 7, so Unicode is not a factor for the original question.
I think you'll find a database (without checksums) a lot quicker. Maybe try sqlite which will give you a single file database. There are many Delphi Libraries available.

Convert Int64 to Base30 and Back

For a registration code I want to convert an Int64 to base30 (30 so that only uppercase characters and excluding 0,O,I,1,etc.) and back.
This is not too difficult using functions like:
const
Base = 30;
Base30CharSet = '23456789ABCDEFGHJKLMNPRSTVWXYZ';
function ConvertIntToBase30(ANumber: Int64): string;
begin
if(ANumber = 0) then
Result := Copy(Base30CharSet, 1, 1)
else begin
Result := '';
while(ANumber <> 0) do begin
Result := Copy(Base30CharSet, (ANumber mod Base)+1, 1) + Result;
ANumber := ANumber div Base;
end;
end;
end;
function ConvertBase30ToInt(ANumber: string): Int64;
var
i: integer;
begin
Result := 0;
for i := 1 to Length(ANumber) do begin
Result := Result + (Pos(ANumber[i], Base30CharSet)-1);
if(i < Length(ANumber)) then
Result := Result * Base;
end;
end;
The snag is that I am interested in the Int64's bits, so I could be dealing with a number like $FFFFFFFFFFFFFFFF = -1.
To work around this I thought I would store and remove the sign (abs()) and include the sign as an extra character appended to the base30 result. The problem the occurs at the lower limit of Int64 as calling abs(-9223372036854775808) results in an overflow.
Does anyone have a solution or better algorithm to solve this problem?
The way to deal with it is having a character to indicate it is a negative number so that you can decode back. For negative number, just flip the bit from 1 to 0 and remove the sign bit before encoding and when decode, do a flip back and add the sign bit. Below is working codes
function InvertIntOff(const ANumberL, ANumberH: Integer): Int64;
asm
XOR EAX,$FFFFFFFF
XOR EDX,$FFFFFFFF
end;
function InvertIntOn(const ANumberL, ANumberH: Integer): Int64;
asm
XOR EAX,$FFFFFFFF
XOR EDX,$FFFFFFFF
OR EDX,$80000000
end;
function ConvertIntToBase(ANumber: Int64): string;
const
CBaseMap: array[0..31] of Char = (
'2','3','4','5','6','7','8','9', //0-7
'A','B','C','D','E','F','G','H', //8-15
'J','K','L','M','N', //16-20
'P','Q','R','S','T','U','V','X','W','Y','Z'); //21-31
var
I: Integer;
begin
SetLength(Result, 15);
I := 0;
if ANumber < 0 then
begin
Inc(I);
Result[I] := '1';
ANumber := InvertIntOff(ANumber and $FFFFFFFF, (ANumber and $FFFFFFFF00000000) shr 32);
end;
while ANumber <> 0 do
begin
Inc(I);
Result[I] := CBaseMap[ANumber and $1F];
ANumber := ANumber shr 5;
end;
SetLength(Result, I);
end;
function ConvertBaseToInt(const ABase: string): Int64;
var
I, Index: Integer;
N: Int64;
begin
Result := 0;
if Length(ABase) > 0 then
begin
if ABase[1] = '1' then
Index := 2
else
Index := 1;
for I := Index to Length(ABase) do
begin
case ABase[I] of
'2'..'9':
N := Ord(ABase[I]) - Ord('2');
'A'..'H':
N := Ord(ABase[I]) - Ord('A') + 8;
'J'..'N':
N := Ord(ABase[I]) - Ord('J') + 16;
'P'..'Z':
N := Ord(ABase[I]) - Ord('P') + 21;
else
raise Exception.Create('error');
end;
if I > Index then
Result := Result or (N shl ((I - Index) * 5))
else
Result := N;
end;
if ABase[1] = '1' then
Result := InvertIntOn(Result and $FFFFFFFF, (Result and $FFFFFFFF00000000) shr 32);
end;
end;
procedure TestBase32;
var
S: string;
begin
S := ConvertIntToBase(-1);
ShowMessage(S + ' / ' + IntToStr(ConvertBaseToInt(S)) + ' ? -1');
S := ConvertIntToBase(-31);
ShowMessage(S + ' / ' + IntToStr(ConvertBaseToInt(S)) + ' ? -31');
S := ConvertIntToBase(1);
ShowMessage(S + ' / ' + IntToStr(ConvertBaseToInt(S)) + ' ? 1');
S := ConvertIntToBase(123456789);
ShowMessage(S + ' / ' + IntToStr(ConvertBaseToInt(S)) + ' ? 123456789');
S := ConvertIntToBase(-123456789);
ShowMessage(S + ' / ' + IntToStr(ConvertBaseToInt(S)) + ' ? -123456789');
end;
I think you are almost there by considering abs()...
But rather than using abs() why not simply ignore the sign for processing the value of the Int64 itself ? As far as I can tell, you are in fact already doing this so only one minor addition is needed to the encoding routine:
if aNumber < 0 then
// negative
else
// positive;
The only problem then is the LOSS of sign information in the resulting Base30 string. So treat that as a separate problem to be solved using the new information gained from the aNumber < 0 test...
I see you have excluded all chars that could be confused for 0 or 1 but have also excluded 0 and 1 themselves. You could therefore use 0 and 1 to indicate positive or negative (or vice versa).
Depending on the purpose of these routines, the placement of the 0/1 in the result could be entirely arbitrary (if you wished to obfuscate things and make the placement of the 0/1 random rather than a consistent lead/trail character).
When encoding simply drop a sign indicator into the result string at random, and when decoding handle the 0/1 character whenever as the sign marker it is encountered, but skipped for the purposes of decoding the value.
Of course, if obfuscation is not an issue then simply consistently pre or post fix the sign indicator.
You could even simply choose to use '1' to indicate negative and the LACK of a '1' to indicate/assume positive (this would simplify the zero value case a little I think)
The easy answer is to turn range checking off, even just for the method that you're calling abs in.
If you don't care about an extra char or two you could split the int64 into words or dwords and string those together. I would be more tempted to go to base32 and use bit shifts for speed and ease of use. Then your encoding becomes
Base32CharSet[(ANumber shr 5) % 32]
and a similar pos() based approach for the decode.

Resources