What is the fastest way to Parse a line in Delphi?

What is the fastest way to Parse a line in Delphi? - delphi

I have a huge file that I must parse line by line. Speed is of the essence.
Example of a line:
Token-1 Here-is-the-Next-Token Last-Token-on-Line
^ ^
Current Position
Position after GetToken
GetToken is called, returning "Here-is-the-Next-Token" and sets the CurrentPosition to the position of the last character of the token so that it is ready for the next call to GetToken. Tokens are separated by one or more spaces.
Assume the file is already in a StringList in memory. It fits in memory easily, say 200 MB.
I am worried only about the execution time for the parsing. What code will produce the absolute fastest execution in Delphi (Pascal)?

Use PChar incrementing for speed of processing
If some tokens are not needed, only copy token data on demand
Copy PChar to local variable when actually scanning through characters
Keep source data in a single buffer unless you must handle line by line, and even then, consider handling line processing as a separate token in the lexer recognizer
Consider processing a byte array buffer that has come straight from the file, if you definitely know the encoding; if using Delphi 2009, use PAnsiChar instead of PChar, unless of course you know the encoding is UTF16-LE.
If you know that the only whitespace is going to be #32 (ASCII space), or a similarly limited set of characters, there may be some clever bit manipulation hacks that can let you process 4 bytes at a time using Integer scanning. I wouldn't expect big wins here though, and the code will be as clear as mud.
Here's a sample lexer that should be pretty efficient, but it assumes that all source data is in a single string. Reworking it to handle buffers is moderately tricky due to very long tokens.
type
TLexer = class
private
FData: string;
FTokenStart: PChar;
FCurrPos: PChar;
function GetCurrentToken: string;
public
constructor Create(const AData: string);
function GetNextToken: Boolean;
property CurrentToken: string read GetCurrentToken;
end;
{ TLexer }
constructor TLexer.Create(const AData: string);
begin
FData := AData;
FCurrPos := PChar(FData);
end;
function TLexer.GetCurrentToken: string;
begin
SetString(Result, FTokenStart, FCurrPos - FTokenStart);
end;
function TLexer.GetNextToken: Boolean;
var
cp: PChar;
begin
cp := FCurrPos; // copy to local to permit register allocation
// skip whitespace; this test could be converted to an unsigned int
// subtraction and compare for only a single branch
while (cp^ > #0) and (cp^ <= #32) do
Inc(cp);
// using null terminater for end of file
Result := cp^ <> #0;
if Result then
begin
FTokenStart := cp;
Inc(cp);
while cp^ > #32 do
Inc(cp);
end;
FCurrPos := cp;
end;

Here is a lame ass implementation of a very simple lexer. This might give you an idea.
Note the limitations of this example - no buffering involved, no Unicode (this is an excerpt from a Delphi 7 project). You would probably need those in a serious implementation.
{ Implements a simpe lexer class. }
unit Simplelexer;
interface
uses Classes, Sysutils, Types, dialogs;
type
ESimpleLexerFinished = class(Exception) end;
TProcTableProc = procedure of object;
// A very simple lexer that can handle numbers, words, symbols - no comment handling
TSimpleLexer = class(TObject)
private
FLineNo: Integer;
Run: Integer;
fOffset: Integer;
fRunOffset: Integer; // helper for fOffset
fTokenPos: Integer;
pSource: PChar;
fProcTable: array[#0..#255] of TProcTableProc;
fUseSimpleStrings: Boolean;
fIgnoreSpaces: Boolean;
procedure MakeMethodTables;
procedure IdentProc;
procedure NewLineProc;
procedure NullProc;
procedure NumberProc;
procedure SpaceProc;
procedure SymbolProc;
procedure UnknownProc;
public
constructor Create;
destructor Destroy; override;
procedure Feed(const S: string);
procedure Next;
function GetToken: string;
function GetLineNo: Integer;
function GetOffset: Integer;
property IgnoreSpaces: boolean read fIgnoreSpaces write fIgnoreSpaces;
property UseSimpleStrings: boolean read fUseSimpleStrings write fUseSimpleStrings;
end;
implementation
{ TSimpleLexer }
constructor TSimpleLexer.Create;
begin
makeMethodTables;
fUseSimpleStrings := false;
fIgnoreSpaces := false;
end;
destructor TSimpleLexer.Destroy;
begin
inherited;
end;
procedure TSimpleLexer.Feed(const S: string);
begin
Run := 0;
FLineNo := 1;
FOffset := 1;
pSource := PChar(S);
end;
procedure TSimpleLexer.Next;
begin
fTokenPos := Run;
foffset := Run - frunOffset + 1;
fProcTable[pSource[Run]];
end;
function TSimpleLexer.GetToken: string;
begin
SetString(Result, (pSource + fTokenPos), Run - fTokenPos);
end;
function TSimpleLexer.GetLineNo: Integer;
begin
Result := FLineNo;
end;
function TSimpleLexer.GetOffset: Integer;
begin
Result := foffset;
end;
procedure TSimpleLexer.MakeMethodTables;
var
I: Char;
begin
for I := #0 to #255 do
case I of
'#', '&', '}', '{', ':', ',', ']', '[', '*',
'^', ')', '(', ';', '/', '=', '-', '+', '#', '>', '<', '$',
'.', '"', #39:
fProcTable[I] := SymbolProc;
#13, #10: fProcTable[I] := NewLineProc;
'A'..'Z', 'a'..'z', '_': fProcTable[I] := IdentProc;
#0: fProcTable[I] := NullProc;
'0'..'9': fProcTable[I] := NumberProc;
#1..#9, #11, #12, #14..#32: fProcTable[I] := SpaceProc;
else
fProcTable[I] := UnknownProc;
end;
end;
procedure TSimpleLexer.UnknownProc;
begin
inc(run);
end;
procedure TSimpleLexer.SymbolProc;
begin
if fUseSimpleStrings then
begin
if pSource[run] = '"' then
begin
Inc(run);
while pSource[run] <> '"' do
begin
Inc(run);
if pSource[run] = #0 then
begin
NullProc;
end;
end;
end;
Inc(run);
end
else
inc(run);
end;
procedure TSimpleLexer.IdentProc;
begin
while pSource[Run] in ['_', 'A'..'Z', 'a'..'z', '0'..'9'] do
Inc(run);
end;
procedure TSimpleLexer.NumberProc;
begin
while pSource[run] in ['0'..'9'] do
inc(run);
end;
procedure TSimpleLexer.SpaceProc;
begin
while pSource[run] in [#1..#9, #11, #12, #14..#32] do
inc(run);
if fIgnoreSpaces then Next;
end;
procedure TSimpleLexer.NewLineProc;
begin
inc(FLineNo);
inc(run);
case pSource[run - 1] of
#13:
if pSource[run] = #10 then inc(run);
end;
foffset := 1;
fRunOffset := run;
end;
procedure TSimpleLexer.NullProc;
begin
raise ESimpleLexerFinished.Create('');
end;
end.

I made a lexical analyser based on a state engine (DFA). It works with a table and is pretty fast. But there are possible faster options.
It also depends on the language. A simple language can possibly have a smart algorithm.
The table is an array of records each containing 2 chars and 1 integer. For each token the lexer walks through the table, startting at position 0:
state := 0;
result := tkNoToken;
while (result = tkNoToken) do begin
if table[state].c1 > table[state].c2 then
result := table[state].value
else if (table[state].c1 <= c) and (c <= table[state].c2) then begin
c := GetNextChar();
state := table[state].value;
end else
Inc(state);
end;
It is simple and works like a charm.

If speed is of the essence, custom code is the answer. Check out the Windows API that will map your file into memory. You can then just use a pointer to the next character to do your tokens, marching through as required.
This is my code for doing a mapping:
procedure TMyReader.InitialiseMapping(szFilename : string);
var
// nError : DWORD;
bGood : boolean;
begin
bGood := False;
m_hFile := CreateFile(PChar(szFilename), GENERIC_READ, 0, nil, OPEN_EXISTING, 0, 0);
if m_hFile <> INVALID_HANDLE_VALUE then
begin
m_hMap := CreateFileMapping(m_hFile, nil, PAGE_READONLY, 0, 0, nil);
if m_hMap <> 0 then
begin
m_pMemory := MapViewOfFile(m_hMap, FILE_MAP_READ, 0, 0, 0);
if m_pMemory <> nil then
begin
htlArray := Pointer(Integer(m_pMemory) + m_dwDataPosition);
bGood := True;
end
else
begin
// nError := GetLastError;
end;
end;
end;
if not bGood then
raise Exception.Create('Unable to map token file into memory');
end;

I think the biggest bottleneck is always going to be getting the file into memory. Once you have it in memory (obviously not all of it at once, but I would work with buffers if I were you), the actual parsing should be insignificant.

This begs another question - How big?
Give us a clue like # of lines or # or Mb (Gb)?
Then we will know if it fits in memory, needs to be disk based etc.
At first pass I would use my WordList(S: String; AList: TStringlist);
then you can access each token as Alist[n]...
or sort them or whatever.

Speed will always be relative to what you are doing once it is parsed. A lexical parser by far is the fastest method of converting to tokens from a text stream regardless of size. TParser in the classes unit is a great place to start.
Personally its been a while since I needed to write a parser, but another more dated yet tried and true method would be to use LEX/YACC to build a grammar then have it convert the grammar into code you can use to perform your processing. DYacc is a Delphi version...not sure if it still compiles or not, but worth a look if you want to do things old school. The dragon book here would be of big help, if you can find a copy.

Rolling your own is the fastest way for sure. For more on this topic, you could see Synedit's source code which contains lexers (called highlighters in the project's context) for about any language on the market. I suggest you take one of those lexers as a base and modify for your own usage.

The fastest way to write the code would probably be to create a TStringList and assign each line in your text file to the CommaText property. By default, white space is a delimiter, so you will get one StringList item per token.
MyStringList.CommaText := s;
for i := 0 to MyStringList.Count - 1 do
begin
// process each token here
end;
You'll probably get better performance by parsing each line yourself, though.

Related

Why do I get a stack overflow when I call the Windows API from my Delphi program?

My form supports drag'n'drop of files from the Windows Explorer:
uses
ShellApi, System.IOUtils;
procedure TFormMain.FormCreate(Sender: TObject);
begin
DragAcceptFiles(Self.Handle, True);
end;
procedure TFormMain.WMDropFiles(var Msg: TMessage);
var
hDrop: THandle;
FileCount, NameLen, i: Integer;
CurrFile: String;
FileSysEntries: TArray<String>;
begin
inherited;
hDrop := Msg.wParam;
try
FileCount := DragQueryFile(hDrop, $FFFFFFFF, nil, 0);
for i := 0 to FileCount - 1 do
begin
NameLen := DragQueryFile(hDrop, i, nil, 0) + 1; //+1 for NULL
SetLength(CurrFile, NameLen);
DragQueryFile(hDrop, i, PWideChar(CurrFile), NameLen);
//If I don't do this...
SetLength(CurrFile, StrLen(PWideChar(CurrFile)));
if DirectoryExists(CurrFile) then
begin
//...I get a stack overflow here!
FileSysEntries := TDirectory.GetFiles(CurrFile, '*.*', TSearchOption.soAllDirectories);
//Rest removed for clarity...
end;
end;
finally
DragFinish(hDrop);
end;
end;
Now if I don't strip the NULL (#0) character off the CurrFile string (see 2nd SetLength) I get a stack overflow when I call TDirectory.GetFiles and I'm now sure why.
Is the second SetLength (that strips #0) really necessary or should I do NameLen - 1 for the first SetLength? Or maybe something else?

I see a few issues:
you are calling DragAcceptFiles() only in the Form's OnCreate event. If the Form's HWND is ever re-created during the Form's lifetime (it can happen!), you will lose the ability to receive WM_DROPFILES messages.
You would need to call DragAcceptFiles() again with the updated HWND. You can override the Form's virtual CreateWnd() method to handle that.
Alternatively, you can override the Form's virtual CreateParams() method to enable the WS_EX_ACCEPTFILES extended window style for each HWND that is created.
your message handler is calling inherited. You don't need to do that. The default handler will not do anything with the message.
you are over-allocating memory for CurrFile. You technically DO NOT need to include the null terminator when calling SetLength(), as it will automatically allocate extra space for one (a Delphi string is implicitly null-terminated, so that PChar casts can be used with C-style APIs that expect null-terminated character pointers).
If you DO include the null terminator in the string's length, you have to explicitly shrink the strings length afterwards, which you are doing (but not as efficiently as you could be, as DragQueryFile(i) will tell you the length to use without a null terminator, so you don't have to calculate it manually with StrLen()). But, it is better to simply not over-allocate to begin with.
Apparently having that extra #0 in the string's length is causing problems for TDirectory.GetFiles() (or more likely, TPath, which TDirectory uses internally). You should file a bug report about that. But, you do need to make sure you don't leave the terminating #0 in the string's length to begin with, since filesystem path APIs don't accept it anyway.
Try this instead:
uses
ShellApi, System.IOUtils;
procedure TFormMain.CreateWnd;
begin
inherited;
DragAcceptFiles(Self.Handle, True);
end;
procedure TFormMain.WMDropFiles(var Msg: TMessage);
var
hDrop: THandle;
FileCount, NameLen, i: Integer;
CurrFile: String;
FileSysEntries: TArray<String>;
begin
hDrop := Msg.wParam;
try
FileCount := DragQueryFile(hDrop, $FFFFFFFF, nil, 0);
for i := 0 to FileCount - 1 do
begin
NameLen := DragQueryFile(hDrop, i, nil, 0);
SetLength(CurrFile, NameLen);
DragQueryFile(hDrop, i, PChar(CurrFile), NameLen + 1);
if TDirectory.Exists(CurrFile) then
begin
FileSysEntries := TDirectory.GetFiles(CurrFile, '*.*', TSearchOption.soAllDirectories);
//...
end;
end;
finally
DragFinish(hDrop);
end;
end;

Replacing string assignment by PChar operation

I have a puzzling result that I'm struggling to understand.
I've been attempting to improve the speed of this routine
function TStringRecord.GetWord: String;
begin
// return the next word in Input
Result := '';
while (PC^ <> #$00) and not PC^.IsLetter do begin
inc(FPC);
end;
while (PC^ <> #$00) and PC^.IsLetter do begin
Result := Result + PC^;
inc(FPC);
end;
end;
by replacing the Result := Result + PC^ by a pointer-based operation. This
is my attempt:
function TStringRecord.GetWord2: String;
var
Len : Integer;
StartPC,
DestPC : PChar;
begin
// return the next word in Input
Result := '';
while (PC^ <> #$00) and not PC^.IsLetter do begin
inc(FPC);
end;
Len := Length(Input);
SetLength(Result, Len);
StartPC := PChar(Result);
DestPC := PChar(Result);
while (PC^ <> #$00) and PC^.IsLetter do begin
WStrPLCopy(DestPC, PC, 1);
inc(FPC);
inc(DestPC);
end;
SetLength(Result, DestPC - StartPC);
end;
According to my line profiler, WStrPLCopy(DestPC, PC, 1) takes 50 times longer
than Result := Result + PC^. As far as I can tell, this is because on entry
to WStrPLCopy there is a call to _WStrFromPWChar which seems to copy many more
characters than the one necessary. How can I avoid this, or can someone suggest
an alternative PChar-based method?
The remainder of my code is below:
TStringRecord = record
private
FPC: PChar;
FInput: String;
procedure SetInput(const Value: String);
public
function NextWord : String;
function NextWord2 : String;
property Input : String read FInput write SetInput;
property PC : PChar read FPC;
end;
procedure TStringRecord.SetInput(const Value: String);
begin
FInput := Value;
FPC := PChar(Input);
end;

This is how I would write it:
function TStringRecord.GetWord: String;
var beg: PChar;
begin
// return the next word in Input
while (FPC^ <> #0) and not FPC^.IsLetter do
inc(FPC);
beg := FPC;
while (FPC^ <> #0) and FPC^.IsLetter do
inc(FPC);
SetString(result, beg, FPC-beg);
end;
With this, code is very readable, and you have a single memory allocation, and I guess you could not write anything faster (but by inlining PC^.IsLetter, which is the only call to an external piece of code).

Delphi's TPerlRegEx.EscapeRegExChars() always return an empty string?

With Delphi XE4, try the following code:
procedure TForm3.Button1Click(Sender: TObject);
var
myStr: string;
begin
Edit1.Text := TPerlRegEx.EscapeRegExChars('test');
end;
The result (Edit1.Text) is empty.
Is this a bug or I'm missing something? I previously had no problem with this TPerlRegEx.EscapeRegExChars function with the version from regular-expressions.info pre-DelphiXE.
Update 2: Just upgrading an app written in D2010 and encountering this bug, but just wondering how such an obvious bug can exist this long... now I'm seriously considering making my code compatible to Free Pascal, but I really like the antonymous method...
Update 1: I'm using Delphi XE4 Update 1.

It appears to be a bug. If that's the case, both the XE4 and XE5 versions contain it. I've opened a QC report to report it for XE4..XE6.
The problem appears to be with the last line of the function:
Result.Create(Tmp, 0, J);
Stepping through in the debugger shows that the Tmp (a TCharArray) correctly contains 't','e','s','t', #0, #0, #0, #0 at that point, yet Result contains '' when the function actually returns, as setting a breakpoint on the end; following that line indicates that result contains '' at that point (and when the function returns).
Providing a replacement version in a class helper with a minor change to actually store the return value from the call to Create fixes the problem:
type
TPerlRegExHelper = class helper for TPerlRegEx
public
class function EscapeRegExCharsEx(const S: string): string; static;
end;
class function TPerlRegExHelper.EscapeRegExCharsEx(const S: string): string;
var
I, J: Integer;
Tmp: TCharArray;
begin
SetLength(Tmp, S.Length * 2);
J := 0;
for I := Low(S) to High(S) do
begin
case S[I] of
'.', '[', ']', '(', ')', '?', '*', '+', '{', '}', '^', '$', '|', '\':
begin
Tmp[J] := '\';
Inc(j);
Tmp[J] := S[I];
end;
#0:
begin
Tmp[J] := '\';
Inc(j);
Tmp[J] := '0';
end;
else
Tmp[J] := S[I];
end;
Inc(J);
end;
{ Result.Create(Tmp, 0, J); } // The problem code from the original
Result := String.Create(Tmp, 0, J);
end;
The XE3 (and the open-source version you mention) implement the logic totally differently, using the more standard manipulation of Result beginning at the first line of the function with Result := S;, and then using System.Insert as needed to add room for the escape characters.

This is a bug introduced in the XE4 release that is still present in XE6. Previous versions were fine. It looks like the changes were made in readiness for some future switch to immutable strings.
Rather ironically the bug is caused by the string never being assigned a value at all. It's one thing to set out not to mutate a string, but quite another never to initialize it!
So to the analysis of the bug. The method in question in TPerlRegEx.EscapeRegExChars defined in the System.RegularExpressionsCore unit. This is a class function that returns a string. Its signature is:
class function EscapeRegExChars(const S: string): string;
The XE4 implementation makes but one reference to the result variable. As follows:
Result.Create(Tmp, 0, J);
Here, Tmp is an array of char containing the escaped text to be returned, and J is the length of that text.
So, it seems clear that the author intended for this code to assign to the function return variable Result. Sadly that does not occur. Why not? Well, the Create method being called is defined in the helper for string. This is TStringHelper defined in the System.SysUtils unit. There are three Create overloads and the one in play here is:
class function Create(const Value: array of Char; StartIndex: Integer;
Length: Integer): string; overload; static;
Note that this is a class static function. That means that it is not an instance method and has no Self pointer. So when called like this:
Result.Create(Tmp, 0, J);
It is simply a function call whose return value is ignored. It might appear that the result variable would be set but remember that this Create is a class static method. It therefore has no instance. The compiler simply uses the type of Result to resolve the method. The code is equivalent to:
string.Create(Tmp, 0, J);
Nothing more exciting than a call to a function whose return value is simply ignored. Defeated by the extended syntax that allows us to ignore function return values.
The fix to the code is simple enough. Replace that final line with
Result := string.Create(Tmp, 0, J);
You could apply the fix in a copy of the unit, and include that unit in your code. An alternative to that, my preferred option, is to use a code hook. Like this:
unit FixTPerlRegExEscapeRegExChars;
interface
implementation
uses
System.SysUtils, Winapi.Windows, System.RegularExpressionsCore;
procedure PatchCode(Address: Pointer; const NewCode; Size: Integer);
var
OldProtect: DWORD;
begin
if VirtualProtect(Address, Size, PAGE_EXECUTE_READWRITE, OldProtect) then
begin
Move(NewCode, Address^, Size);
FlushInstructionCache(GetCurrentProcess, Address, Size);
VirtualProtect(Address, Size, OldProtect, #OldProtect);
end;
end;
type
PInstruction = ^TInstruction;
TInstruction = packed record
Opcode: Byte;
Offset: Integer;
end;
procedure RedirectProcedure(OldAddress, NewAddress: Pointer);
var
NewCode: TInstruction;
begin
NewCode.Opcode := $E9;//jump relative
NewCode.Offset := NativeInt(NewAddress)-NativeInt(OldAddress)-SizeOf(NewCode);
PatchCode(OldAddress, NewCode, SizeOf(NewCode));
end;
function EscapeRegExChars(Self: TPerlRegEx; const S: string): string;
var
I, J: Integer;
Tmp: TCharArray;
begin
SetLength(Tmp, S.Length * 2);
J := 0;
for I := Low(S) to High(S) do
begin
case S[I] of
'.', '[', ']', '(', ')', '?', '*', '+', '{', '}', '^', '$', '|', '\':
begin
Tmp[J] := '\';
Inc(j);
Tmp[J] := S[I];
end;
#0:
begin
Tmp[J] := '\';
Inc(j);
Tmp[J] := '0';
end;
else
Tmp[J] := S[I];
end;
Inc(J);
end;
Result := string.Create(Tmp, 0, J);
end;
initialization
RedirectProcedure(#TPerlRegEx.EscapeRegExChars, #EscapeRegExChars);
end.
Add this unit to your project and the calls to TPerlRegEx.EscapeRegExChars will start working again.
{$APPTYPE CONSOLE}
uses
System.RegularExpressionsCore,
FixTPerlRegExEscapeRegExChars in 'FixTPerlRegExEscapeRegExChars.pas';
begin
Writeln(TPerlRegEx.EscapeRegExChars('test'));
Readln;
end.
Output
test
QC#124091

How to execute a SQL script using dbExpress?

I'm migrating an old Delphi application (using ZeosDB) to Delphi XE2. I want to use dbExpress as a ZeosDB replacement for database access to Firebird 2.5 or MS-SQL.
There are a lot of sql scripts for creating tables, view and stored procedures I need to run. The Firebird script commands are seperated with ^, MS-SQL script commands with "GO".
How can I run these scripts on the database using a dbexpress connection? ZeosDB provides a TZSqlProcessor, but I can't find any equivalent component for dbExpress.

I do not use DBExpress but as far as I am aware, you can execute (either by Execute or ExecuteDirect) only one SQL command at a time. In other words you cannot put the whole script into the Execute method.
This is not related to different command syntax used by FireBird and MS SQL (^ vs. GO). You have to understand the '^' sign or 'GO' command is not a "TSQL Command"! Both are specific command delimiters used by respective application used to execute commands against the SQL engines. Instead it is difference between "Firebird Manager" (or how it's called) and "SQL Query Profiler" (or "SQL Server Management Studio").
The solution is to use some kind of parser, split the script into a list of single commands, and TSQLConnection.Execute these commands one-by-one.
Something like this pseudocode:
var
DelimiterPos: Integer;
S: String;
Command: String;
begin
S:= ScriptFile; // ScriptFile: String - your whole script
While True Do
begin
DelimiterPos:= Pos('^', ScriptFile);
if DelimiterPos = 0 then DelimiterPos:= Length(S);
Command:= Copy(S, 1, DelimiterPos - 1);
SQLConnection.Execute(Command);
Delete(S, 1, DelimiterPos);
if Lengh(S) = 0 Then Exit;
end;
end;
Please note that the sample above will work correctly only in cases that the '^' sign is not used anywhere in the script but a command separator.
As a sidenote, I am sure there are some already built components that will do that for you (like TZSQLProcessor). I am not aware of any to point you to.
Sidenote 2: I am pretty sure, that you'll have to modify your scripts to be fully compatible with MS SQL. Eventhough Firebird and MS SQL are both SQL servers there is always difference in DML/DDL syntax.
Edit:
If you can "rewrite" the SQL script into the code, you could use Jedi VCL jvStringHolder component. Put each separate command as one item (of type TStrings) in jvStringHolder.
Creating the parser is rather complicated, but not undoable. With the inspiration from SynEdit i made these clases to exactly what you need: Load the script with TSQLScript.ParseScript, then iterate through Command[index: integer] property. The SQLLexer is not full SQL Lexer, but implements keywords separation with respec to comments, brackets, code folding etc. I've also added a special syntax into comments ($ sign in comment block) that helps me put titles into the script.
This is full copy-paste from one of my projects. I'm not giving any more explanation, but I hope you can get the idea and make it running in your project.
unit SQLParser;
interface
type
TTokenKind = (tkUknown, tkEOF, tkComment, tkKeyword, tkIdentifier,
tkCommentParam, tkCommentParamValue, tkCommandEnd, tkCRLF);
TBlockKind = (bkNone, bkLineComment, bkBlockComment);
TSQLLexer = class
private
FBlockKind: TBlockKind;
FParseString: String;
FPosition: PChar;
FTokenKind: TTokenKind;
FTokenPosition: PChar;
function GetToken: String;
procedure Reset;
procedure SetParseString(Value: String);
protected
procedure ReadComment;
procedure ReadCommentParam;
procedure ReadCommentParamValue;
procedure ReadCRLF;
procedure ReadIdentifier;
procedure ReadSpace;
public
constructor Create(ParseString: String);
function NextToken: TTokenKind;
property Position: PChar read FPosition;
property SQLText: String read FParseString write SetParseString;
property Token: String read GetToken;
property TokenKind: TTokenKind read FTokenKind;
property TokenPosition: PChar read FTokenPosition;
end;
implementation
uses SysUtils;
{ TSQLLexer }
constructor TSQLLexer.Create(ParseString: string);
begin
inherited Create;
FParseString:= ParseString;
Reset;
end;
function TSQLLexer.GetToken;
begin
SetString(Result, FTokenPosition, FPosition - FTokenPosition);
end;
function TSQLLexer.NextToken: TTokenKind;
begin
case FBlockKind of
bkLineComment, bkBlockComment: ReadComment;
else
case FPosition^ of
#0: FTokenKind:= tkEOF;
#1..#9, #11, #12, #14..#32:
begin
ReadSpace;
NextToken;
end;
#10, #13: ReadCRLF;
'-':
if PChar(FPosition +1)^ = '-' then
ReadComment
else
Inc(FPosition);
'/':
if PChar(FPosition +1)^ = '*' then
ReadComment
else
Inc(FPosition);
'a'..'z', 'A'..'Z': ReadIdentifier;
';':
begin
FTokenPosition:= FPosition;
Inc(FPosition);
FTokenKind:= tkCommandEnd;
end
else
Inc(FPosition);
end;
end;
Result:= FTokenKind;
end;
procedure TSQLLexer.ReadComment;
begin
FTokenPosition:= FPosition;
if not (FBlockKind in [bkLineComment, bkBlockComment]) then
begin
if FPosition^ = '/' then
FBlockKind:= bkBlockComment
else
FBlockKind:= bkLineComment;
Inc(FPosition, 2);
end;
case FPosition^ of
'$': ReadCommentParam;
':': ReadCommentParamValue;
else
while not CharInSet(FPosition^, [#0, '$']) do
begin
if FBlockKind = bkBlockComment then
begin
if (FPosition^ = '*') And (PChar(FPosition + 1)^ = '/') then
begin
Inc(FPosition, 2);
FBlockKind:= bkNone;
Break;
end;
end
else
begin
if CharInSet(Fposition^, [#10, #13]) then
begin
ReadCRLF;
FBlockKind:= bkNone;
Break;
end;
end;
Inc(FPosition);
end;
FTokenKind:= tkComment;
end;
end;
procedure TSQLLexer.ReadCommentParam;
begin
Inc(FPosition);
ReadIdentifier;
FTokenKind:= tkCommentParam;
end;
procedure TSQLLexer.ReadCommentParamValue;
begin
Inc(FPosition);
ReadSpace;
FTokenPosition:= FPosition;
while not CharInSet(FPosition^, [#0, #10, #13]) do
Inc(FPosition);
FTokenKind:= tkCommentParamValue;
end;
procedure TSQLLexer.ReadCRLF;
begin
while CharInSet(FPosition^, [#10, #13]) do
Inc(FPosition);
FTokenKind:= tkCRLF;
end;
procedure TSQLLexer.ReadIdentifier;
begin
FTokenPosition:= FPosition;
while CharInSet(FPosition^, ['a'..'z', 'A'..'Z', '_']) do
Inc(FPosition);
FTokenKind:= tkIdentifier;
if Token = 'GO' then
FTokenKind:= tkKeyword;
end;
procedure TSQLLexer.ReadSpace;
begin
while CharInSet(FPosition^, [#1..#9, #11, #12, #14..#32]) do
Inc(FPosition);
end;
procedure TSQLLexer.Reset;
begin
FTokenPosition:= PChar(FParseString);
FPosition:= FTokenPosition;
FTokenKind:= tkUknown;
FBlockKind:= bkNone;
end;
procedure TSQLLexer.SetParseString(Value: String);
begin
FParseString:= Value;
Reset;
end;
end.
The parser:
type
TScriptCommand = class
private
FCommandText: String;
public
constructor Create(ACommand: String);
property CommandText: String read FCommandText write FCommandText;
end;
TSQLScript = class
private
FCommands: TStringList;
function GetCount: Integer;
function GetCommandList: TStrings;
function GetCommand(index: Integer): TScriptCommand;
protected
procedure AddCommand(AName: String; ACommand: String);
public
Constructor Create;
Destructor Destroy; override;
procedure ParseScript(Script: TStrings);
property Count: Integer read GetCount;
property CommandList: TStrings read GetCommandList;
property Command[index: integer]: TScriptCommand read GetCommand;
end;
{ TSQLScriptCommand }
constructor TScriptCommand.Create(ACommand: string);
begin
inherited Create;
FCommandText:= ACommand;
end;
{ TSQLSCript }
constructor TSQLScript.Create;
begin
inherited;
FCommands:= TStringList.Create(True);
FCommands.Duplicates:= dupIgnore;
FCommands.Sorted:= False;
end;
destructor TSQLScript.Destroy;
begin
FCommands.Free;
inherited;
end;
procedure TSQLScript.AddCommand(AName, ACommand: String);
var
ScriptCommand: TScriptCommand;
S: String;
begin
if AName = '' then
S:= SUnnamedCommand
else
S:= AName;
ScriptCommand:= TScriptCommand.Create(ACommand);
FCommands.AddObject(S, ScriptCommand);
end;
function TSQLScript.GetCommand(index: Integer): TScriptCommand;
begin
Result:= TScriptCommand(FCommands.Objects[index]);
end;
function TSQLScript.GetCommandList: TStrings;
begin
Result:= FCommands;
end;
function TSQLScript.GetCount: Integer;
begin
Result:= FCommands.Count;
end;
procedure TSQLScript.ParseScript(Script: TStrings);
var
Title: String;
Command: String;
LastParam: String;
LineParser: TSQLLexer;
IsNewLine: Boolean;
LastPos: PChar;
procedure AppendCommand;
var
S: String;
begin
SetString(S, LastPos, LineParser.Position - LastPos);
Command:= Command + S;
LastPos:= LineParser.Position;
end;
procedure FinishCommand;
begin
if Command <> '' then
AddCommand(Title, Command);
Title:= '';
Command:= '';
LastPos:= LineParser.Position;
if LastPos^ = ';' then Inc(LastPos);
end;
begin
LineParser:= TSQLLexer.Create(Script.Text);
try
LastPos:= LineParser.Position;
IsNewLine:= True;
repeat
LineParser.NextToken;
case LineParser.TokenKind of
tkComment: LastPos:= LineParser.Position;
tkCommentParam:
begin
LastParam:= UpperCase(LineParser.Token);
LastPos:= LineParser.Position;
end;
tkCommentParamValue:
if LastParam = 'TITLE' then
begin
Title:= LineParser.Token;
LastParam:= '';
LastPos:= LineParser.Position;
end;
tkKeyword:
if (LineParser.Token = 'GO') and IsNewLine then FinishCommand
else
AppendCommand;
tkEOF:
FinishCommand;
else
AppendCommand;
end;
IsNewLine:= LineParser.TokenKind in [tkCRLF, tkCommandEnd];
until LineParser.TokenKind = tkEOF;
finally
LineParser.Free;
end;
end;

You need to use the TSQLConnection. This is component have a two methods, Execute and ExecuteDirect. The first does not accept parameters, but the second method does.
Using the first method:
procedure TForm1.Button1Click(Sender: TObject);
var
MeuSQL: String;
begin
MeuSQL := 'INSERT INTO YOUR_TABLE ('FIELD1', 'FIELD2') VALUES ('VALUE1', 'VALUE2')';
SQLConnection.ExecuteDirect(MeuSQL);
end;
If you want, you can use a transaction.
Using the second method:
procedure TForm1.Button1Click(Sender: TObject);
var
MySQL: string;
MyParams: TParams;
begin
MySQL := 'INSERT INTO TABLE ("FIELD1", "FIELD2") VALUE (:PARAM1, :PARAM2)';
MyParams.Create;
MyParams.CreateParam(ftString, 'PARAM1', ptInput).Value := 'Seu valor1';
MyParams.CreateParam(ftString, 'PARAM2', ptInput).Value := 'Seu valor2';
SQLConnection1.Execute(MySQL,MyParams, Nil);
end;

I am about 90% sure that you can't, at least not without parsing the individual commands between the GO's, and then serially executing each of them, which as you have already pointed out, is problematic.
(I would be happy to be disproved on the above, and quite interested in seeing the solution...)
If you are simply using the script as initialisation logic (e.g. to create tables, etc), another solution you could consider would be to fire off the scripts in a batch file and executing them via 'Sqlcmd', which could be executed via your delphi app (using ShellExecute), which then waits for it to complete before continuing.
Not as elegant as using a component, but if it is just for initialisation logic it may be a quick, acceptable compromise. I certainly wouldn't consider the above for any processing post-initialisation.

This doesn't appear to be a dbExpress limitation, but a SQL language limitation. I'm not sure about T-SQL, but it seems like the GO is similar to an anonymous block in Oracle PL/SQL. You can put the following PL/SQL code in the TSqlDataSet.CommandText and call ExecSQL to create multiple tables. Maybe T-SQL has a similar way to do this:
begin
execute immediate 'CREATE TABLE suppliers
( supplier_id number(10) not null,
supplier_name varchar2(50) not null,
contact_name varchar2(50)
)';
execute immediate 'CREATE TABLE customers
( customer_id number(10) not null,
customer_name varchar2(50) not null,
address varchar2(50),
city varchar2(50),
state varchar2(25),
zip_code varchar2(10),
CONSTRAINT customers_pk PRIMARY KEY (customer_id)
)';
end;

I don't know how often you need to create those tables, but how about putting all separate SQL create scripts in a table, with sequential/version numbering?
Than you can go through that table and execute 'm one by one.
You'll need to split your scripts once, but after that it's much more maintainable.

What is the fastest way of stripping non alphanumeric characters from a string in Delphi7?

The characters allowed are A to Z, a to z, 0 to 9. The least amount of code or a single function would be best as the system is time critical on response to input.

If I understand you correctly you could use a function like this:
function StripNonAlphaNumeric(const AValue: string): string;
var
SrcPtr, DestPtr: PChar;
begin
SrcPtr := PChar(AValue);
SetLength(Result, Length(AValue));
DestPtr := PChar(Result);
while SrcPtr[0] <> #0 do begin
if SrcPtr[0] in ['a'..'z', 'A'..'Z', '0'..'9'] then begin
DestPtr[0] := SrcPtr[0];
Inc(DestPtr);
end;
Inc(SrcPtr);
end;
SetLength(Result, DestPtr - PChar(Result));
end;
This will use PChar for highest speed (at the cost of less readability).
Edit: Re the comment by gabr about using DestPtr[0] instead of DestPtr^: This should compile to the same bytes anyway, but there are nice applications in similar code, where you need to look ahead. Suppose you would want to replace newlines, then you could do something like
function ReplaceNewlines(const AValue: string): string;
var
SrcPtr, DestPtr: PChar;
begin
SrcPtr := PChar(AValue);
SetLength(Result, Length(AValue));
DestPtr := PChar(Result);
while SrcPtr[0] <> #0 do begin
if (SrcPtr[0] = #13) and (SrcPtr[1] = #10) then begin
DestPtr[0] := '\';
DestPtr[1] := 't';
Inc(SrcPtr);
Inc(DestPtr);
end else
DestPtr[0] := SrcPtr[0];
Inc(SrcPtr);
Inc(DestPtr);
end;
SetLength(Result, DestPtr - PChar(Result));
end;
and therefore I don't usually use the ^.

uses JclStrings;
S := StrKeepChars('mystring', ['A'..'Z', 'a'..'z', '0'..'9']);

Just to add a remark.
The solution using a set is fine in Delphi 7, but it can cause some problems in Delphi 2009 because sets can't be of char (they are converted to ansichar).
A solution you can use is:
case key of
'A'..'Z', 'a'..'z', '0'..'9' : begin end; // No action
else
Key := #0;
end;
But the most versatile way is of course:
if not ValidChar(key) then
Key := #0;
In that case you can use ValidChar in multiple locations and if it need to be changed you only have to change it once.

OnKeypress event
begin
if not (key in ['A'..'Z','a'..'z','0'..'9']) then
Key := #0;
end;

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

What is the fastest way to Parse a line in Delphi? - delphi

I think the biggest bottleneck is always going to be getting the file into memory. Once you have it in memory (obviously not all of it at once, but I would work with buffers if I were you), the actual parsing should be insignificant.

This begs another question - How big? Give us a clue like # of lines or # or Mb (Gb)? Then we will know if it fits in memory, needs to be disk based etc. At first pass I would use my WordList(S: String; AList: TStringlist); then you can access each token as Alist[n]... or sort them or whatever.

Rolling your own is the fastest way for sure. For more on this topic, you could see Synedit's source code which contains lexers (called highlighters in the project's context) for about any language on the market. I suggest you take one of those lexers as a base and modify for your own usage.

Related

Why do I get a stack overflow when I call the Windows API from my Delphi program?

Replacing string assignment by PChar operation

Delphi's TPerlRegEx.EscapeRegExChars() always return an empty string?

How to execute a SQL script using dbExpress?

What is the fastest way of stripping non alphanumeric characters from a string in Delphi7?

Categories

Resources