Does function overloading have runtime overhead in Delphi?

Does function overloading have runtime overhead in Delphi? - delphi

Is there any additional runtime overhead in calling overloaded functions?
(I ask this specifically for Delphi, in case the answer isn't the same for all compiled languages)
I think not as that should be resolved during compile time, but you can never be sure can you?

Of course you can be sure, because it is documented. Is the compiler which resolves it at compile time, so there's no additional overhead on calling overloaded functions in Delphi.
[Edit]
I did a small test for you:
var
j: Integer;
st: string;
procedure DoNothing(i: Integer); overload;
begin
j := i;
end;
procedure DoNothing(s: string); overload;
begin
st := s;
end;
procedure DoNothingI(i: integer);
begin
j := i;
end;
procedure TForm2.Button1Click(Sender: TObject);
const
MaxIterations = 10000000;
var
StartTick, EndTick: Cardinal;
I: Integer;
begin
StartTick := GetTickCount;
for I := 0 to MaxIterations - 1 do
DoNothing(I);
EndTick := GetTickCount;
Label1.Caption := Format('Overlaod ellapsed ticks: %d [j:%d]', [EndTick - StartTick, j]);
StartTick := GetTickCount;
for I := 0 to MaxIterations - 1 do
DoNothingI(I);
EndTick := GetTickCount;
Label1.Caption := Format('%s'#13'Normal ellapsed ticks: %d [j:%d]', [Label1.Caption, EndTick - StartTick, j]);
end;
Result: Almost all the time 31 Ticks (milliseconds) for both on my dev machine, sometimes overload takes only 16 ticks.

Overloading is resolved at compile time (no overhead), but overriding has overhead!
virtual is faster than dynamic:
http://docwiki.embarcadero.com/RADStudio/en/Methods
Virtual versus Dynamic
In Delphi for Win32, virtual and dynamic methods are semantically equivalent.
However, they differ in the implementation of method-call dispatching at run time: virtual methods optimize for speed, while dynamic methods optimize for code size.

Related

Fast way to get total line number of a large file

I'm dealing with large text files (bigger than 100MB). I need the total number of lines as fast as possible. I'm currently using the code below (update: added try-finally):
var
SR: TStreamReader;
totallines: int64;
str: string;
begin
SR:=TStreamReader.Create(myfilename, TEncoding.UTF8);
try
totallines:=0;
while not SR.EndOfStream do
begin
str:=SR.ReadLine;
inc(totallines);
end;
finally
SR.Free;
end;
end;
Is there any faster way to get totallines?

Program LineCount;
{$APPTYPE CONSOLE}
{$WEAKLINKRTTI ON}
{$RTTI EXPLICIT METHODS([]) PROPERTIES([]) FIELDS([])}
{$SetPEFlags 1}
{ Compile with XE8 or above... }
USES
SysUtils,
BufferedFileStream;
VAR
LineCnt: Int64;
Ch: Char;
BFS: TReadOnlyCachedFileStream;
function Komma(const S: string; const C: Char = ','): string;
{ About 4 times faster than Comma... }
var
I: Integer; // loops through separator position
begin
Result := S;
I := Length(S) - 2;
while I > 1 do
begin
Insert(C, Result, I);
I := I - 3;
end;
end; {Komma}
BEGIN
writeln('LineCount - Copyright (C) 2020 by Walter L. Chester.');
writeln('Counts lines in the given textfile.');
if ParamCount <> 1 then
begin
writeln('USAGE: LineCount <filename>');
writeln;
writeln('No file size limit! Counts lines: takes 4 minutes on a 16GB file.');
Halt;
end;
if not FileExists(ParamStr(1)) then
begin
writeln('File not found!');
halt;
end;
writeln('Counting lines in file...');
BFS := TReadOnlyCachedFileStream.Create(ParamStr(1), fmOpenRead);
try
LineCnt := 0;
while BFS.Read(ch,1) = 1 do
begin
if ch = #13 then
Inc(LineCnt);
if (LineCnt mod 1000000) = 0 then
write('.');
end;
writeln;
writeln('Total Lines: ' + Komma(LineCnt.ToString));
finally
BFS.Free;
end;
END.

The answer is simply: No. Your algorithm is the fastest but the implementation isn't. You must read the whole file and count the lines. At least if lines are not fixed size.
How you read the file may impact the global performance.
Read the file block by block in a binary buffer (Array of bytes) as large as possible. Then count the lines in the buffer and loop with the block in same buffer.

Is there a native syntax to access the outer function Result variable from inner function in Delphi?

Consider:
function OuterFunc: integer;
function InnerFunc: integer;
begin
// Here I'd like to access the OuterFunc.Result variable
// for both reading and writing its value
OuterFunc.Result := OuterFunc.Result + 12;
end;
begin
end;
Is there a native syntax to access the OuterFunc Result variable inside InnerFunc? Or is the only way to do this to pass it like a parameter, as in the following?
function OuterFunc: integer;
function InnerFunc(var outerResult: integer): integer;
begin
end;
var
i: integer;
begin
i := InnerFunc(Result);
end;

You can assign result to functions by assigning to the function name, which actually was the original way in Pascal:
function MyFunc: integer;
begin
MyFunc := 2;
// is equal to the following
Result := 2;
end;
So in your case you can write
function OuterFunc: integer;
function InnerFunc: integer;
begin
OuterFunc := 12;
end;
begin
end;
Beware however, that using the function name in a statement block anyware else than on the left side of the assignment operator results in a recursive call, and is therefore different from how the predefined Result works.
In other words, you can not access a previously set value of OuterFunc from within InnerFunc. You would need to use e.g. a local variable in the outer scope defined before InnerFunc to be accessible also from InnerFunc:
function OuterFunc: integer;
var
OuterResult: integer;
function InnerFunc: integer;
begin
OuterResult := 0;
OuterResult := OuterResult + 12;
end;
begin
Result := OuterResult;
end;
For more details refer to Function Declarations in the documentation.

Another option, except for using the native Pascal syntax (as displayed by Tom Brunberg's answer), is converting the local function into a procedure.
function OuterFunc: integer;
procedure InnerFunc(out innerResult: integer);
begin
{OuterFunc's} Result := 0;
innerReuslt := -1;
end;
var
i: integer;
begin
InnerFunc( i );
end;
Since this is your INNER local function you would not break some external API/contract by this simple change.
Twice so since your original code has InnerFunc being the de facto procedure, making no use of its own Result neither by caller, nor by callee.
function OuterFunc: integer;
// function InnerFunc: integer;
procedure InnerFunc;
begin
// here i'd like to access OuterFunc.Result variable
// for both reading and writing its value
// OuterFunc.Result := OuterFunc.Result + 12;
Result := Result + 12;
end;
begin
InnerFunc();
end;
But okay, let's assume you just forgot using BOTH results of BOTH functions, but you did originally intend to.
Still there are few ways at your disposal to cut corners and to hack over the Delphi language intentions-limitations.
Starting with that procedure approach, you may add a function shorthand, if you want to use such a function in expressions.
Though it looks a bit ugly and adds a redirection call for the CPU (you can not inline local functions and if you could Delphi inline implementation is bogged with "register dances"), so slows things down somewhat (but depending on how much you call it w.r.t. other work - that extra work might be non-noticeable).
function OuterFunc: integer;
procedure InnerFunc(out innerResult: integer); overload;
begin
innerResult := +2;
// {OuterFunc's} Result := Result + innerResult;
Inc( Result, innerResult );
end;
function InnerFunc: integer; overload;
begin
InnerFunc( Result );
end;
var
i: integer;
begin
// InnerFunc( i );
i := InnerFunc();
end;
And yet another hack is declaring the variables overlapping.
function OuterFunc: integer;
var Outer_Result: integer absolute Result;
i: integer;
function InnerFunc: integer;
begin
Result := +2;
Inc( Outer_Result, Result );
end;
begin
i := InnerFunc();
end;
Now, this approach might kill the optimizations, like placing the "result" in the CPU registers, forcing using the RAM for it, which is slower.
Additionally, once you might wish to change the type of the OuterFunc and if you forget to change the type of the Outer_Result var accordingly - you screwed yourself.
function OuterFunc: double; // was - integer; Proved to be not enough since 2020
var Outer_Result: integer absolute Result; // and here we forgot to sync type changing.... ooooops!
i: integer;
function InnerFunc: integer;
....
So less hackish way to express that intention (at the price of allocating and accessing yet one more in-RAM variable) would be this:
function OuterFunc: integer;
{$T+} // we need to enable type checking: predictability is safety
var Outer_Result: ^integer;
i: integer;
function InnerFunc: integer;
begin
Result := +2;
Inc( Outer_Result^, Result );
end;
begin
Outer_Result := #Result;
i := InnerFunc();
end;
But all these options are hack-arounds, breaking conceptual clarity, thus hampering ability for people to read/understand the program in the future.
If you need the variable - then do declare the variable. That would be the most clear option here. Afterall programs are more written for the future programmers to read them than for the computers to compile them. :-)
function OuterFunc: integer;
var the_Outer_Result: integer;
function InnerFunc;
begin
Result := +2;
Inc( the_Outer_Result, Result );
end;
var
i: integer;
begin
the_Outer_Result := 0;
.....
I := InnerFunc();
.....
Result := the_Outer_Result;
end;
That way you would not fight with the language, but give up and use it as it was intended to use. Fighting and outsmarting the language is always fun, but in the long term, when you have to maintain the code any human being last read 5 years ago and port it to newer versions of Delphi/libraries/Windows - then such the non-natural smart tricks tend to become quite annoying.

Find the Maximum in a List of calculated values using the Parallel Programming Library

I have a list of values. I'd like to find the maximum value. This is a common task. A simple version might be:
iBest := -1;
iMax := -1e20;
for i := 0 to List.Count - 1 do
begin
if List[i].Value > iMax then
begin
iBest := i;
iMax := List[i].Value;
end;
end;
In my case, the .Value getter is the performance bottleneck as it invokes a time consuming calculation (~100ms) which returns the final value.
How can I make this parallel using the Parallel Programming Library?

If the value is a calculated value and you can afford to cache, a simple solution might look something like this:
program Project1;
{$APPTYPE CONSOLE}
uses
SysUtils, Threading, DateUtils, Math, Generics.Collections, StrUtils;
type
TFoo = class
private
FCachedValue : double;
function GetValue : double;
public
property CalculateValue : double read GetValue;
property CachedValue : double read FCachedValue;
end;
TFooList = class(TObjectList<TFoo>)
public
procedure CalculateValues;
function GetMaxValue(var BestIndex : integer) : double;
end;
function TFoo.GetValue : double;
begin
sleep(10); // simulate taking some time... make up a value
FCachedValue := DateUtils.MilliSecondOfTheSecond(now);
result := FCachedValue;
end;
procedure TFooList.CalculateValues;
begin
TParallel.For(0, Count - 1,
procedure (j:integer)
begin
self[j].CalculateValue;
end);
end;
function TFooList.GetMaxValue(var BestIndex : Integer) : double;
var
i, iBest : integer;
maxval : double;
begin
CalculateValues;
iBest := 0;
maxval := self[0].CachedValue;
for i := 0 to self.Count - 1 do
begin
if self[i].CachedValue > maxval then
begin
iBest := i;
maxval := self[i].CachedValue;
end;
end;
BestIndex := iBest;
result := maxval;
end;
var
LFooList : TFooList;
i, iBest : integer;
maxval : double;
begin
LFooList := TFooList.Create(true);
try
for i := 0 to 9999 do LFooList.Add(TFoo.Create);
maxval := LFooList.GetMaxValue(iBest);
WriteLn(Format('Max value index %d', [iBest]));
WriteLn(Format('Max value %.6f', [maxval]));
finally
LFooList.Free;
end;
ReadLn;
end.
This way your object retains a cache of the last calculated value, which you can refresh at any time, but which you can also access quickly. It's somewhat easier to parallelize a full calculation of the list than it is to paralellize the min/max search, and if the bottleneck is the calculation then it makes sense to restrict the added complexity to that operation alone (where you know the overhead is worth it).

Is there a GetMouseMovePointsEx function in Lazarus?

In this other question I asked: Drawing on a paintbox - How to keep up with mouse movements without delay?.
The function GetMouseMovePointsEx was brought to my attention by Sebastian Z, however in Lazarus I am unable to find this function.
He mentioned that in Delphi XE6 it is in Winapi.Windows.pas, in Lazarus though it is not in Windows.pas.
I understand Lazarus is by no means an exact copy of Delphi but this function sounds like it could be the answer I am looking for in that other question. Im just having a hard time finding where it is and even getting any Delphi documentation on it. I do have Delphi XE but right now it is not installed and my project is been written in Lazarus.
I did a Find in Files... search from the Lazarus IDE targeting the install folder and the only result that came back was from one of the fpc sources in:
lazarus\fpc\2.6.4\source\packages\winunits-jedi\src\jwawinuser.pas
I am not sure if I should use the above unit or not, or whether Lazarus has a different variant to GetMouseMovePointsEx?
Does anyone using Lazarus have any experience with GetMouseMovePointsEx and if so where can I find it?
Thanks.

Here's a quick example using Delphi. What you still need to do is filter out the points you've already received.
type
TMouseMovePoints = array of TMouseMovePoint;
const
GMMP_USE_HIGH_RESOLUTION_POINTS = 2;
function GetMouseMovePointsEx(cbSize: UINT; var lppt: TMouseMovePoint; var lpptBuf: TMouseMovePoint; nBufPoints: Integer; resolution: DWORD): Integer; stdcall; external 'user32.dll';
function GetMessagePosAsTPoint: TPoint;
type
TMakePoints = packed record
case Integer of
1: (C : Cardinal);
2: (X : SmallInt; Y : SmallInt);
end;
var
Tmp : TMakePoints;
begin
Tmp.C := GetMessagePos;
Result.X := Tmp.X;
Result.Y := Tmp.Y;
end;
function GetMousePoints: TMouseMovePoints;
var
nVirtualWidth: Integer;
nVirtualHeight: Integer;
nVirtualLeft: Integer;
nVirtualTop: Integer;
cpt: Integer;
mp_in: MOUSEMOVEPOINT;
mp_out: array[0..63] of MOUSEMOVEPOINT;
mode: Integer;
Pt: TPoint;
I: Integer;
begin
Pt := GetMessagePosAsTPoint;
nVirtualWidth := GetSystemMetrics(SM_CXVIRTUALSCREEN) ;
nVirtualHeight := GetSystemMetrics(SM_CYVIRTUALSCREEN) ;
nVirtualLeft := GetSystemMetrics(SM_XVIRTUALSCREEN) ;
nVirtualTop := GetSystemMetrics(SM_YVIRTUALSCREEN) ;
cpt := 0 ;
mode := GMMP_USE_DISPLAY_POINTS ;
FillChar(mp_in, sizeof(mp_in), 0) ;
mp_in.x := pt.x and $0000FFFF ;//Ensure that this number will pass through.
mp_in.y := pt.y and $0000FFFF ;
mp_in.time := GetMessageTime;
cpt := GetMouseMovePointsEx(SizeOf(MOUSEMOVEPOINT), mp_in, mp_out[0], 64, mode) ;
for I := 0 to cpt - 1 do
begin
case mode of
GMMP_USE_DISPLAY_POINTS:
begin
if (mp_out[i].x > 32767) then
mp_out[i].x := mp_out[i].x - 65536;
if (mp_out[i].y > 32767) then
mp_out[i].y := mp_out[i].y - 65536;
end;
GMMP_USE_HIGH_RESOLUTION_POINTS:
begin
mp_out[i].x := ((mp_out[i].x * (nVirtualWidth - 1)) - (nVirtualLeft * 65536)) div nVirtualWidth;
mp_out[i].y := ((mp_out[i].y * (nVirtualHeight - 1)) - (nVirtualTop * 65536)) div nVirtualHeight;
end;
end;
end;
if cpt > 0 then
begin
SetLength(Result, cpt);
for I := 0 to cpt - 1 do
begin
Result[I] := mp_out[I];
end;
end
else
SetLength(Result, 0);
end;
// the following is for demonstration purposes only, it still needs some improvements like filtering out points that were already processed. But it's good enough for painting a blue line on a TImage
procedure TForm1.Image1MouseMove(Sender: TObject; Shift: TShiftState; X,
Y: Integer);
var
MMPoints: TMouseMovePoints;
Pt: TPoint;
I: Integer;
begin
Image1.Canvas.Pen.Color := clBlue;
MMPoints := GetMousePoints;
for I := 0 to Length(MMPoints) - 1 do
begin
Pt.x := MMPoints[I].x;
Pt.y := MMPoints[I].y;
Pt := Image1.ScreenToClient(Pt);
if I = 0 then
Image1.Canvas.MoveTo(PT.X, pt.y)
else
Image1.Canvas.LineTo(PT.X, pt.y);
end;
end;

This function is implemented as part of the Win32 library. It is no more a Delphi or FPC function than it is a C++ or VB function. You import it from Win32.
In Delphi, this importing is achieved by way of the declaration of the function in the Windows unit. If you examine the source of this unit you'll find lots of type and constant declarations, as well as functions. The functions are typically implemented using the external keyword which indicates that the implementation is external to this code. The Windows unit is what is known as a header translation. That is it is a translation of the C/C++ header files from the Win32 SDK.
So you need a header translation with this function. The JEDI header translations are the most usual choice. And it seems that you've already found them. If the versions supplied with FPC serve your needs, use them.
Sometimes you might find yourself on the bleeding edge of progress and need to use a function that has not been included in any of the standard header translations. In that scenario it's usually simple enough to perform the translation yourself.

What is the fastest way to Parse a line in Delphi?

I have a huge file that I must parse line by line. Speed is of the essence.
Example of a line:
Token-1 Here-is-the-Next-Token Last-Token-on-Line
^ ^
Current Position
Position after GetToken
GetToken is called, returning "Here-is-the-Next-Token" and sets the CurrentPosition to the position of the last character of the token so that it is ready for the next call to GetToken. Tokens are separated by one or more spaces.
Assume the file is already in a StringList in memory. It fits in memory easily, say 200 MB.
I am worried only about the execution time for the parsing. What code will produce the absolute fastest execution in Delphi (Pascal)?

Use PChar incrementing for speed of processing
If some tokens are not needed, only copy token data on demand
Copy PChar to local variable when actually scanning through characters
Keep source data in a single buffer unless you must handle line by line, and even then, consider handling line processing as a separate token in the lexer recognizer
Consider processing a byte array buffer that has come straight from the file, if you definitely know the encoding; if using Delphi 2009, use PAnsiChar instead of PChar, unless of course you know the encoding is UTF16-LE.
If you know that the only whitespace is going to be #32 (ASCII space), or a similarly limited set of characters, there may be some clever bit manipulation hacks that can let you process 4 bytes at a time using Integer scanning. I wouldn't expect big wins here though, and the code will be as clear as mud.
Here's a sample lexer that should be pretty efficient, but it assumes that all source data is in a single string. Reworking it to handle buffers is moderately tricky due to very long tokens.
type
TLexer = class
private
FData: string;
FTokenStart: PChar;
FCurrPos: PChar;
function GetCurrentToken: string;
public
constructor Create(const AData: string);
function GetNextToken: Boolean;
property CurrentToken: string read GetCurrentToken;
end;
{ TLexer }
constructor TLexer.Create(const AData: string);
begin
FData := AData;
FCurrPos := PChar(FData);
end;
function TLexer.GetCurrentToken: string;
begin
SetString(Result, FTokenStart, FCurrPos - FTokenStart);
end;
function TLexer.GetNextToken: Boolean;
var
cp: PChar;
begin
cp := FCurrPos; // copy to local to permit register allocation
// skip whitespace; this test could be converted to an unsigned int
// subtraction and compare for only a single branch
while (cp^ > #0) and (cp^ <= #32) do
Inc(cp);
// using null terminater for end of file
Result := cp^ <> #0;
if Result then
begin
FTokenStart := cp;
Inc(cp);
while cp^ > #32 do
Inc(cp);
end;
FCurrPos := cp;
end;

Here is a lame ass implementation of a very simple lexer. This might give you an idea.
Note the limitations of this example - no buffering involved, no Unicode (this is an excerpt from a Delphi 7 project). You would probably need those in a serious implementation.
{ Implements a simpe lexer class. }
unit Simplelexer;
interface
uses Classes, Sysutils, Types, dialogs;
type
ESimpleLexerFinished = class(Exception) end;
TProcTableProc = procedure of object;
// A very simple lexer that can handle numbers, words, symbols - no comment handling
TSimpleLexer = class(TObject)
private
FLineNo: Integer;
Run: Integer;
fOffset: Integer;
fRunOffset: Integer; // helper for fOffset
fTokenPos: Integer;
pSource: PChar;
fProcTable: array[#0..#255] of TProcTableProc;
fUseSimpleStrings: Boolean;
fIgnoreSpaces: Boolean;
procedure MakeMethodTables;
procedure IdentProc;
procedure NewLineProc;
procedure NullProc;
procedure NumberProc;
procedure SpaceProc;
procedure SymbolProc;
procedure UnknownProc;
public
constructor Create;
destructor Destroy; override;
procedure Feed(const S: string);
procedure Next;
function GetToken: string;
function GetLineNo: Integer;
function GetOffset: Integer;
property IgnoreSpaces: boolean read fIgnoreSpaces write fIgnoreSpaces;
property UseSimpleStrings: boolean read fUseSimpleStrings write fUseSimpleStrings;
end;
implementation
{ TSimpleLexer }
constructor TSimpleLexer.Create;
begin
makeMethodTables;
fUseSimpleStrings := false;
fIgnoreSpaces := false;
end;
destructor TSimpleLexer.Destroy;
begin
inherited;
end;
procedure TSimpleLexer.Feed(const S: string);
begin
Run := 0;
FLineNo := 1;
FOffset := 1;
pSource := PChar(S);
end;
procedure TSimpleLexer.Next;
begin
fTokenPos := Run;
foffset := Run - frunOffset + 1;
fProcTable[pSource[Run]];
end;
function TSimpleLexer.GetToken: string;
begin
SetString(Result, (pSource + fTokenPos), Run - fTokenPos);
end;
function TSimpleLexer.GetLineNo: Integer;
begin
Result := FLineNo;
end;
function TSimpleLexer.GetOffset: Integer;
begin
Result := foffset;
end;
procedure TSimpleLexer.MakeMethodTables;
var
I: Char;
begin
for I := #0 to #255 do
case I of
'#', '&', '}', '{', ':', ',', ']', '[', '*',
'^', ')', '(', ';', '/', '=', '-', '+', '#', '>', '<', '$',
'.', '"', #39:
fProcTable[I] := SymbolProc;
#13, #10: fProcTable[I] := NewLineProc;
'A'..'Z', 'a'..'z', '_': fProcTable[I] := IdentProc;
#0: fProcTable[I] := NullProc;
'0'..'9': fProcTable[I] := NumberProc;
#1..#9, #11, #12, #14..#32: fProcTable[I] := SpaceProc;
else
fProcTable[I] := UnknownProc;
end;
end;
procedure TSimpleLexer.UnknownProc;
begin
inc(run);
end;
procedure TSimpleLexer.SymbolProc;
begin
if fUseSimpleStrings then
begin
if pSource[run] = '"' then
begin
Inc(run);
while pSource[run] <> '"' do
begin
Inc(run);
if pSource[run] = #0 then
begin
NullProc;
end;
end;
end;
Inc(run);
end
else
inc(run);
end;
procedure TSimpleLexer.IdentProc;
begin
while pSource[Run] in ['_', 'A'..'Z', 'a'..'z', '0'..'9'] do
Inc(run);
end;
procedure TSimpleLexer.NumberProc;
begin
while pSource[run] in ['0'..'9'] do
inc(run);
end;
procedure TSimpleLexer.SpaceProc;
begin
while pSource[run] in [#1..#9, #11, #12, #14..#32] do
inc(run);
if fIgnoreSpaces then Next;
end;
procedure TSimpleLexer.NewLineProc;
begin
inc(FLineNo);
inc(run);
case pSource[run - 1] of
#13:
if pSource[run] = #10 then inc(run);
end;
foffset := 1;
fRunOffset := run;
end;
procedure TSimpleLexer.NullProc;
begin
raise ESimpleLexerFinished.Create('');
end;
end.

I made a lexical analyser based on a state engine (DFA). It works with a table and is pretty fast. But there are possible faster options.
It also depends on the language. A simple language can possibly have a smart algorithm.
The table is an array of records each containing 2 chars and 1 integer. For each token the lexer walks through the table, startting at position 0:
state := 0;
result := tkNoToken;
while (result = tkNoToken) do begin
if table[state].c1 > table[state].c2 then
result := table[state].value
else if (table[state].c1 <= c) and (c <= table[state].c2) then begin
c := GetNextChar();
state := table[state].value;
end else
Inc(state);
end;
It is simple and works like a charm.

If speed is of the essence, custom code is the answer. Check out the Windows API that will map your file into memory. You can then just use a pointer to the next character to do your tokens, marching through as required.
This is my code for doing a mapping:
procedure TMyReader.InitialiseMapping(szFilename : string);
var
// nError : DWORD;
bGood : boolean;
begin
bGood := False;
m_hFile := CreateFile(PChar(szFilename), GENERIC_READ, 0, nil, OPEN_EXISTING, 0, 0);
if m_hFile <> INVALID_HANDLE_VALUE then
begin
m_hMap := CreateFileMapping(m_hFile, nil, PAGE_READONLY, 0, 0, nil);
if m_hMap <> 0 then
begin
m_pMemory := MapViewOfFile(m_hMap, FILE_MAP_READ, 0, 0, 0);
if m_pMemory <> nil then
begin
htlArray := Pointer(Integer(m_pMemory) + m_dwDataPosition);
bGood := True;
end
else
begin
// nError := GetLastError;
end;
end;
end;
if not bGood then
raise Exception.Create('Unable to map token file into memory');
end;

I think the biggest bottleneck is always going to be getting the file into memory. Once you have it in memory (obviously not all of it at once, but I would work with buffers if I were you), the actual parsing should be insignificant.

This begs another question - How big?
Give us a clue like # of lines or # or Mb (Gb)?
Then we will know if it fits in memory, needs to be disk based etc.
At first pass I would use my WordList(S: String; AList: TStringlist);
then you can access each token as Alist[n]...
or sort them or whatever.

Speed will always be relative to what you are doing once it is parsed. A lexical parser by far is the fastest method of converting to tokens from a text stream regardless of size. TParser in the classes unit is a great place to start.
Personally its been a while since I needed to write a parser, but another more dated yet tried and true method would be to use LEX/YACC to build a grammar then have it convert the grammar into code you can use to perform your processing. DYacc is a Delphi version...not sure if it still compiles or not, but worth a look if you want to do things old school. The dragon book here would be of big help, if you can find a copy.

Rolling your own is the fastest way for sure. For more on this topic, you could see Synedit's source code which contains lexers (called highlighters in the project's context) for about any language on the market. I suggest you take one of those lexers as a base and modify for your own usage.

The fastest way to write the code would probably be to create a TStringList and assign each line in your text file to the CommaText property. By default, white space is a delimiter, so you will get one StringList item per token.
MyStringList.CommaText := s;
for i := 0 to MyStringList.Count - 1 do
begin
// process each token here
end;
You'll probably get better performance by parsing each line yourself, though.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Does function overloading have runtime overhead in Delphi? - delphi

Is there any additional runtime overhead in calling overloaded functions? (I ask this specifically for Delphi, in case the answer isn't the same for all compiled languages) I think not as that should be resolved during compile time, but you can never be sure can you?

Related

Fast way to get total line number of a large file

Is there a native syntax to access the outer function Result variable from inner function in Delphi?

Find the Maximum in a List of calculated values using the Parallel Programming Library

Is there a GetMouseMovePointsEx function in Lazarus?

What is the fastest way to Parse a line in Delphi?

Categories

Resources