Strange behavior of move with strings - delphi

I am testing some enhanced string related functions with which I am trying to use move as a way to copy strings around for faster, more efficient use without delving into pointers.
While testing a function for making a delimited string from a TStringList, I encountered a strange issue. The compiler referenced the bytes contained through the index when it was empty and when a string was added to it through move, index referenced the characters contained.
Here is a small downsized barebone code sample:-
unit UI;
interface
uses
System.SysUtils, System.Types, System.UITypes, System.Rtti, System.Classes,
System.Variants, FMX.Types, FMX.Controls, FMX.Forms, FMX.Dialogs, FMX.Layouts,
FMX.Memo;
type
TForm1 = class(TForm)
Results: TMemo;
procedure FormCreate(Sender: TObject);
end;
var
Form1: TForm1;
implementation
{$R *.fmx}
function StringListToDelimitedString
( const AStringList: TStringList; const ADelimiter: String ): String;
var
Str : String;
Temp1 : NativeInt;
Temp2 : NativeInt;
DelimiterSize : Byte;
begin
Result := ' ';
Temp1 := 0;
DelimiterSize := Length ( ADelimiter ) * 2;
for Str in AStringList do
Temp1 := Temp1 + Length ( Str );
SetLength ( Result, Temp1 );
Temp1 := 1;
for Str in AStringList do
begin
Temp2 := Length ( Str ) * 2;
// Here Index references bytes in Result
Move ( Str [1], Result [Temp1], Temp2 );
// From here the index seems to address characters instead of bytes in Result
Temp1 := Temp1 + Temp2;
Move ( ADelimiter [1], Result [Temp1], DelimiterSize );
Temp1 := Temp1 + DelimiterSize;
end;
end;
procedure TForm1.FormCreate(Sender: TObject);
var
StrList : TStringList;
Str : String;
begin
// Test 1 : StringListToDelimitedString
StrList := TStringList.Create;
Str := '';
StrList.Add ( 'Hello1' );
StrList.Add ( 'Hello2' );
StrList.Add ( 'Hello3' );
StrList.Add ( 'Hello4' );
Str := StringListToDelimitedString ( StrList, ';' );
Results.Lines.Add ( Str );
StrList.Free;
end;
end.
Please devise a solution and if possible, some explanation. Alternatives are welcome too.

Let's look at the crucial bit of code:
// Here Index references bytes in Result
Move ( Str [1], Result [Temp1], Temp2 );
// From here the index seems to address characters instead of bytes in Result
Temp1 := Temp1 + Temp2;
Move ( ADelimiter [1], Result [Temp1], DelimiterSize );
Now, some explanations. When you index a string, you are always indexing characters. You are never indexing bytes. It looks to me as though you wish to index bytes. In which case using the string index operator makes life hard. So I suggest that you index bytes as follows.
First of all initialise Temp1 to 0 rather than 1 since we will be using zero-based indexing.
When you need to index Result using a zero-based byte index, do so like this:
PByte(Result)[Temp1]
So your code becomes:
Temp1 := 0;
for Str in AStringList do
begin
Temp2 := Length(Str)*2;
Move(Str[1], PByte(Result)[Temp1], Temp2);
Temp1 := Temp1 + Temp2;
Move(ADelimiter[1], PByte(Result)[Temp1], DelimiterSize);
Temp1 := Temp1 + DelimiterSize;
end;
In fact I think I'd write it like this, avoiding all string indexing:
Temp1 := 0;
for Str in AStringList do
begin
Temp2 := Length(Str)*2;
Move(Pointer(Str)^, PByte(Result)[Temp1], Temp2);
Temp1 := Temp1 + Temp2;
Move(Pointer(ADelimiter)^, PByte(Result)[Temp1], DelimiterSize);
Temp1 := Temp1 + DelimiterSize;
end;
I'd suggest better names than Temp1 and Temp2. I also question the use of NativeInt here. I'd normally expect to see Integer. Not least because a Delphi string is indexed by signed 32 bit values. You cannot have a string with length greater than 2GB.
Note also that you are not allocating enough memory. You forgot to account for the length of the delimiter. Fix that and your function looks like this:
function StringListToDelimitedString(const AStringList: TStringList;
const ADelimiter: String): String;
var
Str: String;
Temp1: Integer;
Temp2: Integer;
DelimiterSize: Integer;
begin
Temp1 := 0;
DelimiterSize := Length(ADelimiter) * SizeOf(Char);
for Str in AStringList do
inc(Temp1, Length(Str) + DelimiterSize);
SetLength(Result, Temp1);
Temp1 := 0;
for Str in AStringList do
begin
Temp2 := Length(Str) * SizeOf(Char);
Move(Pointer(Str)^, PByte(Result)[Temp1], Temp2);
inc(Temp1, Temp2);
Move(Pointer(ADelimiter)^, PByte(Result)[Temp1], DelimiterSize);
inc(Temp1, DelimiterSize);
end;
end;
If you want to avoid pointers, then write it like this:
function StringListToDelimitedString(const AStringList: TStringList;
const ADelimiter: String): String;
var
Str: String;
StrLen: Integer;
ResultLen: Integer;
DelimiterLen: Integer;
ResultIndex: Integer;
begin
DelimiterLen := Length(ADelimiter);
ResultLen := 0;
for Str in AStringList do
inc(ResultLen, Length(Str) + DelimiterLen);
SetLength(Result, ResultLen);
ResultIndex := 1;
for Str in AStringList do
begin
StrLen := Length(Str);
Move(Pointer(Str)^, Result[ResultIndex], StrLen*SizeOf(Char));
inc(ResultIndex, StrLen);
Move(Pointer(ADelimiter)^, Result[ResultIndex], DelimiterLen*SizeOf(Char));
inc(ResultIndex, DelimiterLen);
end;
end;

System.Move works with untyped pointers and counter of bytes. System.Copy and SysUtils.StrLCopy work with strings (Pascal strings and C strings respectively) and counter of chars. But char and byte are different types, so when you move from string/char context to pointers/bytes context - you should re-calculate length in chars to length in bytes. By the way, same is about indices, Result [Temp1] calculates in characters, not in bytes. And always did.
Correct solution is not mixing citizens of different planets. If you want pointers - use pointers. If you want characters and strings - use characters and strings. But do not mix them! divide and conquer and always separate and make clear when you're using raw piinters and where you use typed strings! Otherwise you're misleading yourself;
function StringListToDelimitedString
( const AStringList: TStrings; const ADelimiter: String ): String;
var
Str : array of String;
Lengths : array of Integer;
Temp1 : NativeInt;
Count, TotalChars : Integer;
PtrDestination: PByte;
PCurStr: ^String;
CurLen: Integer;
Procedure Add1(const Source: string);
var count: integer; // all context is in bytes, not chars here!
Ptr1, Ptr2: PByte;
begin
if Source = '' then exit;
Ptr1 := #Source[ 1 ];
Ptr2 := #Source[ Length(Source)+1 ];
count := ptr2 - ptr1;
Move( Source[1], PtrDestination^, count);
Inc(PtrDestination, count);
end;
begin // here all context is in chars and typed strings, not bytes
Count := AStringList.Count;
if Count <= 0 then exit('');
SetLength(Str, Count); SetLength(Lengths, Count);
TotalChars := 0;
for Temp1 := 0 to Count - 1 do begin
PCurStr := #Str[ Temp1 ];
PCurStr^ := AStringList[ Temp1 ]; // caching content, avoiding extra .Get(I) calls
CurLen := Length ( PCurStr^ ); // caching length, avoind extra function calls
Lengths[ Temp1 ] := CurLen;
Inc(TotalChars, CurLen);
end;
SetLength ( Result, TotalChars + ( Count-1 )*Length( ADelimiter ) );
PtrDestination := Pointer(Result[1]);
// Calls UniqueString to get a safe pointer - but only once
for Temp1 := Low(Str) to High(Str) do
begin
Add1( Str[ Temp1 ] );
Dec( Count );
if Count > 0 // not last string yet
then Add1( Delimeter );
end;
end;
Now, the correct solution i believe would be stopping inventing bicycles and using ready-made and tested libraryes, for example.
Str := JclStringList().Add(['Hello1','Hello2','Hello3','Hello4']).Join(';');
Or, if you really need to add a delimiter PAST THE LAST string (which is usually carefully avoided) then
Str := JclStringList().Add(['Hello1','Hello2','Hello3','Hello4', '']).Join(';');
The original claim of squeezing single percents of CPU power just does not hold with the original code. The illusion of fast pointer operations is just shadowed by a suboptimal code caring not about performance at all.
function StringListToDelimitedString
( const AStringList: TStringList; const ADelimiter: String ): String;
TStringList is a class. Class instance creation and deletion are expensive (slow) operations. Delphi made a flexible framework of those classes - but the speed suffers. So if you want to get few extra percents of speed and pay with sacrificing reliability and flexibility - the don't use classes.
DelimiterSize : Byte;
It should be NativeInt instead just as the rest of muneric variables there. You think you just saved few bytes - but you forced CPU to use non-native datatype and insert typecasts every now and then. It is nothing but an explicitly introduced delay. Ironically, you even did not saved those bytes, cause Delphi would just pad three bytes more to allocate next variable on 32-bits boundary. That is a typical "memory alignment" optimization.
Result := ' ';
This value would never be used. So it is just a loss of time.
for Str in AStringList do
This construction, requiring instantiating TInterfacedObject and calling its virtual methods and then reference-counting it with global locking - is expensive (slow) operation. And twice slow in multithreading taskloads. If you need to squeeze few percetns of speed - you should avoid loosing tens of percents on for-in loops. Those hi-level loops are handy and reliable and flexible - but they pay with speed for that.
for Str in AStringList do
Then You do it twice. But you DON'T KNOW how it that stringlist implemented. How efficiently does it gets the string ? It may even pass messages to another process like TMemo.Lines do! So you should minimize all accesses to that class and its multitude of internal virtual members. Cache all the strings ONCE in some local variable, do not fetch TWICE every of those!
Move ( Str [1], Result [Temp1], Temp2 );
Now we come to a really interesting question - is there even hypothetical place to gain any speed advantage by usiong pointers and bytes? Open CPU window and look how that line is actually implemented!
Strings are reference-counted! When you do Str2 := Str1; no data is copied but only pointers do. But as you start accessing the real memory buffer inside the string - that Str[1] expression - the compiler can not more count references, so Delphi is forced here to decrease reference coutner to ONE SINGLE. That is, Delphi is forced here to call UniqueString over Str and over Result; the System.UniqueString checks the refcounter and if it is >1 makes a special local copy of string (copying all the data into a newly allocated special buffer). Then you do a Move - just like Delphi RTL does itself. I cannto get where any advantage of speed may come from?
Move ( ADelimiter [1], Result [Temp1], DelimiterSize )
And here the same operations are done yet again. And they are costly operations! At least an extra procedure is called, at worst the new buffer is allocated and all the content being copied.
Resume:
The boundary between reference-counted strings and raw pointers is a costly one and every time you cross it - you force Delphi to pay a price.
Mixing those boundaries in the same code make the price paid again and again and again. It also confuses yourself where your counters and indices refer to bytes and where they refer to chars.
Delphi optimized casual string operations for years. And did a pretty good job there. Outperforming Delphi is possible - but you would need to understand in very very fine details - up to each CPU assembler instruction - what goes under the curtains of Pascal sources in your program. That is a dirty and tedious work. There will be no mopre that luxury of using those reliable and flexible things as for-in loop and TStrings classes.
In the end you would most probably get a few percents of speed gain, that no one would ever notice. But you will pay for that with a code much harder to understand, write, read and test. Will those few percents of speed worth unmaintainable code ? I doubt it.
So unless you are forced to do so, my advice is to skip wasting your time and just do a usual Str := JclStringList().Add(['Hello1','Hello2','Hello3','Hello4']).Join(';');
Reliability and flexibility is almost always more preferable than sheer speed.
And sorry to tell that, while i do not know a lot about speed optimizations, i easily saw a speed-damaging issues in your code, that you intended to be faster than Delphi itself.
My experience is miles and miles away of even trying to outperform Delphi in strings field. And i do not think you have any chances other but wasting a lot of time to finally get performance worse than stock one.

Related

How to read last line in a text file using Delphi

I need to read the last line in some very large textfiles (to get the timestamp from the data). TStringlist would be a simple approach but it returns an out of memory error. I'm trying to use seek and blockread, but the characters in the buffer are all nonsense. Is this something to do with unicode?
Function TForm1.ReadLastLine2(FileName: String): String;
var
FileHandle: File;
s,line: string;
ok: 0..1;
Buf: array[1..8] of Char;
k: longword;
i,ReadCount: integer;
begin
AssignFile (FileHandle,FileName);
Reset (FileHandle); // or for binary files: Reset (FileHandle,1);
ok := 0;
k := FileSize (FileHandle);
Seek (FileHandle, k-1);
s := '';
while ok<>1 do begin
BlockRead (FileHandle, buf, SizeOf(Buf)-1, ReadCount); //BlockRead ( var FileHandle : File; var Buffer; RecordCount : Integer {; var RecordsRead : Integer} ) ;
if ord (buf[1]) <>13 then //Arg to integer
s := s + buf[1]
else
ok := ok + 1;
k := k-1;
seek (FileHandle,k);
end;
CloseFile (FileHandle);
// Reverse the order in the line read
setlength (line,length(s));
for i:=1 to length(s) do
line[length(s) - i+1 ] := s[i];
Result := Line;
end;
Based on www.delphipages.com/forum/showthread.php?t=102965
The testfile is a simple CSV I created in excel ( this is not the 100MB I ultimately need to read).
a,b,c,d,e,f,g,h,i,j,blank
A,B,C,D,E,F,G,H,I,J,blank
1,2,3,4,5,6,7,8,9,0,blank
Mary,had,a,little,lamb,His,fleece,was,white,as,snow
And,everywhere,that,Mary,went,The,lamb,was,sure,to,go
You really have to read the file in LARGE chunks from the tail to the head.
Since it is so large it does not fit the memory - then reading it line by line from start to end would be very slow. With ReadLn - twice slow.
You also has to be ready that the last line might end with EOL or may not.
Personally I would also account for three possible EOL sequences:
CR/LF aka #13#10=^M^J - DOS/Windows style
CR without LF - just #13=^M - Classic MacOS file
LF without CR - just #10=^J - UNIX style, including MacOS version 10
If you are sure your CSV files would only ever be generated by native Windows programs it would be safe to assume full CR/LF be used. But if there can be other Java programs, non-Windows platforms, mobile programs - I would be less sure. Of course pure CR without LF would be the least probable case of them all.
uses System.IOUtils, System.Math, System.Classes;
type FileChar = AnsiChar; FileString = AnsiString; // for non-Unicode files
// type FileChar = WideChar; FileString = UnicodeString;// for UTF16 and UCS-2 files
const FileCharSize = SizeOf(FileChar);
// somewhere later in the code add: Assert(FileCharSize = SizeOf(FileString[1]);
function ReadLastLine(const FileName: String): FileString; overload; forward;
const PageSize = 4*1024;
// the minimal read atom of most modern HDD and the memory allocation atom of Win32
// since the chances your file would have lines longer than 4Kb are very small - I would not increase it to several atoms.
function ReadLastLine(const Lines: TStringDynArray): FileString; overload;
var i: integer;
begin
Result := '';
i := High(Lines);
if i < Low(Lines) then exit; // empty array - empty file
Result := Lines[i];
if Result > '' then exit; // we got the line
Dec(i); // skip the empty ghost line, in case last line was CRLF-terminated
if i < Low(Lines) then exit; // that ghost was the only line in the empty file
Result := Lines[i];
end;
// scan for EOLs in not-yet-scanned part
function FindLastLine(buffer: TArray<FileChar>; const OldRead : Integer;
const LastChunk: Boolean; out Line: FileString): boolean;
var i, tailCRLF: integer; c: FileChar;
begin
Result := False;
if Length(Buffer) = 0 then exit;
i := High(Buffer);
tailCRLF := 0; // test for trailing CR/LF
if Buffer[i] = ^J then begin // LF - single, or after CR
Dec(i);
Inc(tailCRLF);
end;
if (i >= Low(Buffer)) and (Buffer[i] = ^M) then begin // CR, alone or before LF
Inc(tailCRLF);
end;
i := High(Buffer) - Max(OldRead, tailCRLF);
if i - Low(Buffer) < 0 then exit; // no new data to read - results would be like before
if OldRead > 0 then Inc(i); // the CR/LF pair could be sliced between new and previous buffer - so need to start a bit earlier
for i := i downto Low(Buffer) do begin
c := Buffer[i];
if (c=^J) or (c=^M) then begin // found EOL
SetString( Line, #Buffer[i+1], High(Buffer) - tailCRLF - i);
exit(True);
end;
end;
// we did not find non-terminating EOL in the buffer (except maybe trailing),
// now we should ask for more file content, if there is still left any
// or take the entire file (without trailing EOL if any)
if LastChunk then begin
SetString( Line, #Buffer[ Low(Buffer) ], Length(Buffer) - tailCRLF);
Result := true;
end;
end;
function ReadLastLine(const FileName: String): FileString; overload;
var Buffer, tmp: TArray<FileChar>;
// dynamic arrays - eases memory management and protect from stack corruption
FS: TFileStream; FSize, NewPos: Int64;
OldRead, NewLen : Integer; EndOfFile: boolean;
begin
Result := '';
FS := TFile.OpenRead(FileName);
try
FSize := FS.Size;
if FSize <= PageSize then begin // small file, we can be lazy!
FreeAndNil(FS); // free the handle and avoid double-free in finally
Result := ReadLastLine( TFile.ReadAllLines( FileName, TEncoding.ANSI ));
// or TEncoding.UTF16
// warning - TFIle is not share-aware, if the file is being written to by another app
exit;
end;
SetLength( Buffer, PageSize div FileCharSize);
OldRead := 0;
repeat
NewPos := FSize - Length(Buffer)*FileCharSize;
EndOfFile := NewPos <= 0;
if NewPos < 0 then NewPos := 0;
FS.Position := NewPos;
FS.ReadBuffer( Buffer[Low(Buffer)], (Length(Buffer) - OldRead)*FileCharSize);
if FindLastLine(Buffer, OldRead, EndOfFile, Result) then
exit; // done !
tmp := Buffer; Buffer := nil; // flip-flop: preparing to broaden our mouth
OldRead := Length(tmp); // need not to re-scan the tail again and again when expanding our scanning range
NewLen := Min( 2*Length(tmp), FSize div FileCharSize );
SetLength(Buffer, NewLen); // this may trigger EOutOfMemory...
Move( tmp[Low(tmp)], Buffer[High(Buffer)-OldRead+1], OldRead*FileCharSize);
tmp := nil; // free old buffer
until EndOfFile;
finally
FS.Free;
end;
end;
PS. Note one extra special case - if you would use Unicode chars (two-bytes ones) and would give odd-length file (3 bytes, 5 bytes, etc) - you would never be ble to scan the starting single byte (half-widechar). Maybe you should add the extra guard there, like Assert( 0 = FS.Size mod FileCharSize)
PPS. As a rule of thumb you better keep those functions out of the form class, - because WHY mixing them? In general you should separate concerns into small blocks. Reading file has nothing with user interaction - so should better be offloaded to an extra UNIT. Then you would be able to use functions from that unit in one form or 10 forms, in main thread or in multi-threaded application. Like LEGO parts - they give you flexibility by being small and separate.
PPPS. Another approach here would be using memory-mapped files. Google for MMF implementations for Delphi and articles about benefits and problems with MMF approach. Personally I think rewriting the code above to use MMF would greatly simplify it, removing several "special cases" and the troublesome and memory copying flip-flop. OTOH it would demand you to be very strict with pointers arithmetic.
https://en.wikipedia.org/wiki/Memory-mapped_file
https://msdn.microsoft.com/en-us/library/ms810613.aspx
http://torry.net/quicksearchd.php?String=memory+map&Title=No
Your char type is two byte, so that buffer is 16 byte. Then with blockread you read sizeof(buffer)-1 byte into it, and check the first 2 byte char if it is equal to #13.
The sizeof(buffer)-1 is dodgy (where does that -1 come from?), and the rest is valid, but only if your input file is utf16.
Also your read 8 (or 16) characters each time, but compare only one and then do a seek again. That is not very logical either.
If your encoding is not utf16, I suggest you change the type of a buffer element to ansichar and remove the -1
In response to kopiks suggestion, I figured out how to do it with TFilestream, it works ok with the simple test file, though there may be some further tweeks when I use it on a variety of csv files. Also, I don't make any claims that this is the most efficient method.
procedure TForm1.Button6Click(Sender: TObject);
Var
StreamSize, ApproxNumRows : Integer;
TempStr : String;
begin
if OpenDialog1.Execute then begin
TempStr := ReadLastLineOfTextFile(OpenDialog1.FileName,StreamSize, ApproxNumRows);
// TempStr := ReadFileStream('c:\temp\CSVTestFile.csv');
ShowMessage ('approximately '+ IntToStr(ApproxNumRows)+' Rows');
ListBox1.Items.Add(TempStr);
end;
end;
Function TForm1.ReadLastLineOfTextFile(const FileName: String; var StreamSize, ApproxNumRows : Integer): String;
const
MAXLINELENGTH = 256;
var
Stream: TFileStream;
BlockSize,CharCount : integer;
Hash13Found : Boolean;
Buffer : array [0..MAXLINELENGTH] of AnsiChar;
begin
Hash13Found := False;
Result :='';
Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
StreamSize := Stream.size;
if StreamSize < MAXLINELENGTH then
BlockSize := StreamSize
Else
BlockSize := MAXLINELENGTH;
// for CharCount := 0 to Length(Buffer)-1 do begin
// Buffer[CharCount] := #0; // zeroing the buffer can aid diagnostics
// end;
CharCount := 0;
Repeat
Stream.Seek(-(CharCount+3), 2); //+3 misses out the #0,#10,#13 at the end of the file
Stream.Read( Buffer[CharCount], 1);
Result := String(Buffer[CharCount]) + result;
if Buffer[CharCount] =#13 then
Hash13Found := True;
Inc(CharCount);
Until Hash13Found OR (CharCount = BlockSize);
ShowMessage(Result);
ApproxNumRows := Round(StreamSize / CharCount);
end;
Just thought of a new solution.
Again, there could be better ones, but this one is the best i thought of.
function GetLastLine(textFilePath: string): string;
var
list: tstringlist;
begin
list := tstringlist.Create;
try
list.LoadFromFile(textFilePath);
result := list[list.Count-1];
finally
list.free;
end;
end;

Best way to check if a character is contained in an array of char

I know, I can write
if C in ['#', ';'] then ...
if C is an AnsiChar.
But this
function CheckValid(C: Char; const Invalid: array of Char; OtherParams: TMyParams): Boolean;
begin
Result := C in Invalid; // <-- Error because Invalid is an array not a set
//maybe other tests...
//Result := Result and OtherTestsOn(OtherParams);
end;
yields E2015: Operator not applicable to this operand type.
Is there an easy way to check if a character is contained in an array of characters (other than iterate through the array)?
I know you don't want to, but this is one of those cases where iterating through the array really is your best option, for performance reasons:
function CheckValid(C: Char; const Invalid: array of Char): Boolean;
var
I: Integer;
begin
Result := False;
for I := Low(Invalid) to High(Invalid) do begin
if Invalid[I] = C then begin
Result = True;
Exit;
end;
end;
end;
Or:
function CheckValid(C: Char; const Invalid: array of Char): Boolean;
var
Ch: Char;
begin
Result := False;
for Ch in Invalid do begin
if Ch = C then begin
Result = True;
Exit;
end;
end;
end;
Converting the input data to strings just to search it can cause huge performance bottlenecks, especially if the function is called often, such as in a loop.
If one tries to avoid iterating through the array and if speed is of no concern then IndexOfAny can be helpful:
function CheckValid(C: Char; const Invalid: array of Char; OtherParams: TMyParams): Boolean;
begin
Result := string(C).IndexOfAny(Invalid) >= 0;
//maybe other test...
//....
end;
From the Delphi documentation:
[IndexOfAny r]eturns an integer indicating the position of the first given character found in the 0-based string. [It returns -1 if the character is not found.]
If speed is of concern, this should be avoided as #RemyLebeau explains in the comments:
Casting C to String to call IndexOfAny() will create 1 temp String. [...] if CheckValid() is called often, those conversions can be a BIG performance bottleneck, not to mention a waste of memory.
In this case #RemyLebeau's answer is the better solution.

How to insert string at index in TMemoryStream?

How can I insert a string at a specified index in TMemoryStream? If you added a string
"a" to an existing string "b" at Index 0 it would move it forward "ab" etc. For example this is what TStringBuilder.Insert does.
Expand the stream so that there is room for the text to be inserted. So, if the text to be inserted has length N, then you need to make the stream N bytes larger.
Copy all the existing content, starting from the insertion point, to the right to make room for the insertion. A call to Move will get this done. You'll be moving this text N bytes to the right.
Write the inserted string at the insertion point.
I'm assuming an 8 bit encoding. If you use a 16 bit encoding, then the stream needs to be grown by 2N bytes, and so on.
You will find that this is a potentially expensive operation. If you care about performance you will do whatever you can to avoid ever having to do this.
P.S. I'm sorry if I have offended any right-to-left language readers with my Anglo-centric assumption that strings run from left to right!
You asked for some code. Here it is:
procedure TMyStringBuilder.Insert(Index: Integer; const Str: string);
var
N: Integer;
P: Char;
begin
N := Length(Str);
if N=0 then
exit;
FStream.Size := FStream.Size + N*SizeOf(Char);
P := PChar(FStream.Memory);
Move((P + Index)^, (P + Index + N)^, (FStream.Size - N - Index)*SizeOf(Char));
Move(Pointer(Str)^, (P + Index)^, N*SizeOf(Char));
end;
Note that I wrote this code, and then looked at the code in TStringBuilder. It's pretty much identical to that!
The fact that the code you end up writing for this operation is identical to that in TStringBuilder should cause you to stop and contemplate. It's very likely that this new string builder replacement class that you are building will end up with the same implementation as the original. It's highly likely that your replacement will perform no better than the original, and quite plausible that the performance of the replacement will be worse.
It looks a little to me as though you are optimising prematurely. According to your comments below you have not yet timed your code to prove that time spent in TStringBuilder methods is your bottleneck. That's really the first thing that you need to do.
Assuming that you do this timing, and prove that TStringBuilder methods are your bottleneck, you then need to identify why that code is performing below par. And then you need to work out how the code could be improved. Simply repeating the implementation of the original class is not going to yield any benefits.
To move the existing data to make some room for the new string data, you can use pointer operation and Move procedure for faster operation. But it only has to be done if the insertion index is lower then the size of the original stream. If the index is larger than stream size then you could: (1) expand the stream size to accomodate the index number and fill the extra room with zero values or spaces, or (2) reduce the index value to the stream size, so the string will be inserted or appended in the end of the stream.
Depend on your needs, you could: (1) make a class derived from TMemoryStream, or (2) make a function to process an instance of TMemoryStream. Here's the first case:
type
TExtMemoryStream = class(TMemoryStream)
public
procedure InsertString(Index: Integer; const S: string);
end;
procedure TExtMemoryStream.InsertString(Index: Integer; const S: string);
var
SLength, OldSize: Integer;
Src, Dst, PointerToS: ^Char;
begin
if Index > Size then Index := Size;
SLength := Length(S);
OldSize := Size;
SetSize(Size + SLength);
Src := Memory; Inc(Src, Index);
Dst := Src; Inc(Dst, SLength);
Move(Src^, Dst^, OldSize - Index);
PointerToS := #S[1];
Move(PointerToS^, Src^, SLength);
end;
or the second case:
procedure InsertStringToMemoryStream(MS: TMemoryStream;
Index: Integer; const S: string);
var
SLength, OldSize: Integer;
Src, Dst, PointerToS: ^Char;
begin
if Index > MS.Size then Index := MS.Size;
SLength := Length(S);
OldSize := MS.Size;
MS.SetSize(MS.Size + SLength);
Src := MS.Memory; Inc(Src, Index);
Dst := Src; Inc(Dst, SLength);
Move(Src^, Dst^, OldSize - Index);
PointerToS := #S[1];
Move(PointerToS^, Src^, SLength);
end;
There, hope it helps :)

how to improve the code (Delphi) for loading and searching in a dictionary?

I'm a Delphi programmer.
I have made a program who uses dictionaries with words and expressions (loaded in program as "array of string").
It uses a search algorithm based on their "checksum" (I hope this is the correct word).
A string is transformed in integer based on this:
var
FHashSize: Integer; //stores the value of GetHashSize
HashTable, HashTableNoCase: array[Byte] of Longword;
HashTableInit: Boolean = False;
const
AnsiLowCaseLookup: array[AnsiChar] of AnsiChar = (
#$00, #$01, #$02, #$03, #$04, #$05, #$06, #$07,
#$08, #$09, #$0A, #$0B, #$0C, #$0D, #$0E, #$0F,
#$10, #$11, #$12, #$13, #$14, #$15, #$16, #$17,
#$18, #$19, #$1A, #$1B, #$1C, #$1D, #$1E, #$1F,
#$20, #$21, #$22, #$23, #$24, #$25, #$26, #$27,
#$28, #$29, #$2A, #$2B, #$2C, #$2D, #$2E, #$2F,
#$30, #$31, #$32, #$33, #$34, #$35, #$36, #$37,
#$38, #$39, #$3A, #$3B, #$3C, #$3D, #$3E, #$3F,
#$40, #$61, #$62, #$63, #$64, #$65, #$66, #$67,
#$68, #$69, #$6A, #$6B, #$6C, #$6D, #$6E, #$6F,
#$70, #$71, #$72, #$73, #$74, #$75, #$76, #$77,
#$78, #$79, #$7A, #$5B, #$5C, #$5D, #$5E, #$5F,
#$60, #$61, #$62, #$63, #$64, #$65, #$66, #$67,
#$68, #$69, #$6A, #$6B, #$6C, #$6D, #$6E, #$6F,
#$70, #$71, #$72, #$73, #$74, #$75, #$76, #$77,
#$78, #$79, #$7A, #$7B, #$7C, #$7D, #$7E, #$7F,
#$80, #$81, #$82, #$83, #$84, #$85, #$86, #$87,
#$88, #$89, #$8A, #$8B, #$8C, #$8D, #$8E, #$8F,
#$90, #$91, #$92, #$93, #$94, #$95, #$96, #$97,
#$98, #$99, #$9A, #$9B, #$9C, #$9D, #$9E, #$9F,
#$A0, #$A1, #$A2, #$A3, #$A4, #$A5, #$A6, #$A7,
#$A8, #$A9, #$AA, #$AB, #$AC, #$AD, #$AE, #$AF,
#$B0, #$B1, #$B2, #$B3, #$B4, #$B5, #$B6, #$B7,
#$B8, #$B9, #$BA, #$BB, #$BC, #$BD, #$BE, #$BF,
#$C0, #$C1, #$C2, #$C3, #$C4, #$C5, #$C6, #$C7,
#$C8, #$C9, #$CA, #$CB, #$CC, #$CD, #$CE, #$CF,
#$D0, #$D1, #$D2, #$D3, #$D4, #$D5, #$D6, #$D7,
#$D8, #$D9, #$DA, #$DB, #$DC, #$DD, #$DE, #$DF,
#$E0, #$E1, #$E2, #$E3, #$E4, #$E5, #$E6, #$E7,
#$E8, #$E9, #$EA, #$EB, #$EC, #$ED, #$EE, #$EF,
#$F0, #$F1, #$F2, #$F3, #$F4, #$F5, #$F6, #$F7,
#$F8, #$F9, #$FA, #$FB, #$FC, #$FD, #$FE, #$FF);
implementation
function GetHashSize(const Count: Integer): Integer;
begin
if Count < 65 then
Result := 256
else
Result := Round(IntPower(16, Ceil(Log10(Count div 4) / Log10(16))));
end;
function Hash(const Hash: LongWord; const Buf; const BufSize: Integer): LongWord;
var P: PByte;
I: Integer;
begin
P := #Buf;
Result := Hash;
for I := 1 to BufSize do
begin
Result := HashTable[Byte(Result) xor P^] xor (Result shr 8);
Inc(P);
end;
end;
function HashStrBuf(const StrBuf: Pointer; const StrLength: Integer; const Slots: LongWord): LongWord;
var P: PChar;
I, J: Integer;
begin
if not HashTableInit then
InitHashTable;
P := StrBuf;
if StrLength <= 48 then // Hash all characters for short strings
Result := Hash($FFFFFFFF, P^, StrLength)
else
begin
// Hash first 16 bytes
Result := Hash($FFFFFFFF, P^, 16);
// Hash last 16 bytes
Inc(P, StrLength - 16);
Result := Hash(Result, P^, 16);
// Hash 16 bytes sampled from rest of string
I := (StrLength - 48) div 16;
P := StrBuf;
Inc(P, 16);
for J := 1 to 16 do
begin
Result := HashTable[Byte(Result) xor Byte(P^)] xor (Result shr 8);
Inc(P, I + 1);
end;
end;
// Mod into slots
if Slots <> 0 then
Result := Result mod Slots;
end;
procedure InitHashTable;
var I, J: Byte;
R: LongWord;
begin
for I := $00 to $FF do
begin
R := I;
for J := 8 downto 1 do
if R and 1 <> 0 then
R := (R shr 1) xor $EDB88320
else
R := R shr 1;
HashTable[I] := R;
end;
Move(HashTable, HashTableNoCase, Sizeof(HashTable));
for I := Ord('A') to Ord('Z') do
HashTableNoCase[I] := HashTableNoCase[I or 32];
HashTableInit := True;
end;
The result of the HashStrBuf is "and (FHashSize - 1)" and is used as index in an "array of array of Integer" (of FHashSize size) to store the index of the string from that "array of string".
This way, when searches for a string, it's transformed in "checksum" and then the code searches in the "branch" with this index comparing this string with the strings from dictionary who have the same "checksum".
Ideally each string from dictionary should have unique checksum. But in the "real world" about 2/3 share the same "checksum" with other words. Because of that the search is not that fast.
In these dictionaries strings are composed of this characters: ['a'..'z',#224..#246,#248..#254,#154,#156..#159,#179,#186,#191,#190,#185,'0'..'9', '''']
Is there any way to improve the "hashing" so the strings would have more unique "checksums"?
Oh, one way is to increase the size of that "array of array of Integer" (FHashSize) but it cannot be increased too much because it takes a lot of Ram.
Another thing: these dictionaries are stored on HDD only as words/expressions (not the "checksums"). Their "checksum" is generated at program startup. But it takes a lot of seconds to do that...
Is there any way to speed up the startup of the program? Maybe by improving the "hashing" function, maybe by storing the "checksums" on HDD and loading them from there...
Any input would be appreciated...
PS: here is the code to search:
function TDictionary.LocateKey(const Key: AnsiString): Integer;
var i, j, l, H: Integer;
P, Q: PChar;
begin
Result := -1;
l := Length(Key);
H := HashStrBuf(#Key[1], l, 0) and (FHashSize - 1);
P := #Key[1];
for i := 0 to High(FHash[H]) do //FHash is that "array of array of integer"
begin
if l <> FKeys.ItemSize[FHash[H][i]] then //FKeys.ItemSize is an byte array with the lengths of strings from dictionary
Continue;
Q := FKeys.Pointer(FHash[H][i]); //pointer to string in dictionary
for j := 0 to l - 1 do
if (P + j)^ <> (Q + j)^ then
Break;
if j = l then
begin
Result := FHash[H][i];
Exit;
end;
end;
end;
Don't reinvent the wheel!
IMHO your hashing is far from efficient, and your collision algorithm can be improved.
Take a look for instance at the IniFiles unit, and the THashedStringList.
It's a bit old, but a good start for a string list using hashes.
There are a lot of good Delphi implementation of such, like in SuperObject and a lot of other code...
Take a look at our SynBigTable unit, which can handle arrays of data in memory or in file very fast, with full indexed searches. Or our latest TDynArray wrapper around any dynamic array of data, to implement TList-like methods to it, including fast binary search. I'm quite sure it could be faster than your hand-tuned code using hashing, if you use an ordered index then fast binary search.
Post-Scriptum:
About pure hashing speed of a string content, take a look at this function - rename RawByteString into AnsiString, PPtrInt into PPointer, and PtrInt into Integer for Delphi 7:
function Hash32(const Text: RawByteString): cardinal;
function SubHash(P: PCardinalArray): cardinal;
{$ifdef HASINLINE}inline;{$endif}
var s1,s2: cardinal;
i, L: PtrInt;
const Mask: array[0..3] of cardinal = (0,$ff,$ffff,$ffffff);
begin
if P<>nil then begin
L := PPtrInt(PtrInt(P)-4)^; // fast lenght(Text)
s1 := 0;
s2 := 0;
for i := 1 to L shr 4 do begin // 16 bytes (4 DWORD) by loop - aligned read
inc(s1,P^[0]);
inc(s2,s1);
inc(s1,P^[1]);
inc(s2,s1);
inc(s1,P^[2]);
inc(s2,s1);
inc(s1,P^[3]);
inc(s2,s1);
inc(PtrUInt(P),16);
end;
for i := 1 to (L shr 2)and 3 do begin // 4 bytes (DWORD) by loop
inc(s1,P^[0]);
inc(s2,s1);
inc(PtrUInt(P),4);
end;
inc(s1,P^[0] and Mask[L and 3]); // remaining 0..3 bytes
inc(s2,s1);
result := s1 xor (s2 shl 16);
end else
result := 0;
end;
begin // use a sub function for better code generation under Delphi
result := SubHash(pointer(Text));
end;
There is even a pure asm version, even faster, in our SynCommons.pas unit. I don't know any faster hashing function around (it's faster than crc32/adler32/IniFiles.hash...). It's based on adler32, but use DWORD aligned reading and summing for even better speed. This could be improved with SSE asm, of course, but here is a fast pure Delphi hash function.
Then don't forget to use "multiplication"/"binary and operation" for hash resolution, just like in IniFiles. It will reduce the number of iteration to your list of hashs.
But since you didn't provide the search source code, we are not able to know what could be improved here.
If you are using Delphi 7, consider using Julian Bucknall's lovely Delphi data types code, EzDsl (Easy Data Structures Library).
Now you don't have to reinvent the wheel as another wise person has also said.
You can download ezdsl, a version that I have made work with both Delphi 7, and recent unicode delphi versions, here.
In particular the unit name EHash contains a hash table implementation, which has various hashing algorithms plug-inable, or you can write your own plugin function that just does the hashing function of your choice.
As a word to the wise, if you are using a Unicode Delphi version; I would be careful about hashing your unicode strings with a code library like this, without checking how its hashing algorithms perform on your system. The OP here is using Delphi 7, so Unicode is not a factor for the original question.
I think you'll find a database (without checksums) a lot quicker. Maybe try sqlite which will give you a single file database. There are many Delphi Libraries available.

case insensitive Pos

Is there any comparable function like Pos that is not case-sensitive in D2010 (unicode)?
I know I can use Pos(AnsiUpperCase(FindString), AnsiUpperCase(SourceString)) but that adds a lot of processing time by converting the strings to uppercase every time the function is called.
For example, on a 1000000 loop, Pos takes 78ms while converting to uppercase takes 764ms.
str1 := 'dfkfkL%&/s"#<.676505';
for i := 0 to 1000000 do
PosEx('#<.', str1, 1); // Takes 78ms
for i := 0 to 1000000 do
PosEx(AnsiUpperCase('#<.'), AnsiUpperCase(str1), 1); // Takes 764ms
I know that to improve the performance of this specific example I can convert the strings to uppercase first before the loop, but the reason why I'm looking to have a Pos-like function that is not case-sensitive is to replace one from FastStrings. All the strings I'll be using Pos for will be different so I will need to convert each and every one to uppercase.
Is there any other function that might be faster than Pos + convert the strings to uppercase?
The built-in Delphi function to do that is in both the AnsiStrings.ContainsText for AnsiStrings and StrUtils.ContainsText for Unicode strings.
In the background however, they use logic very similar to your logic.
No matter in which library, functions like that will always be slow: especially to be as compatible with Unicode as possible, they need to have quite a lot of overhead. And since they are inside the loop, that costs a lot.
The only way to circumvent that overhead, is to do those conversions outside the loop as much as possible.
So: follow your own suggestion, and you have a really good solution.
--jeroen
This version of my previous answer works in both D2007 and D2010.
In Delphi 2007 the CharUpCaseTable is 256 bytes
In Delphi 2010 it is 128 KB (65535*2).
The reason is Char size. In the older version of Delphi my original code only supported the current locale character set at initialization. My InsensPosEx is about 4 times faster than your code. Certainly it is possible to go even faster, but we would lose simplicity.
type
TCharUpCaseTable = array [Char] of Char;
var
CharUpCaseTable: TCharUpCaseTable;
procedure InitCharUpCaseTable(var Table: TCharUpCaseTable);
var
n: cardinal;
begin
for n := 0 to Length(Table) - 1 do
Table[Char(n)] := Char(n);
CharUpperBuff(#Table, Length(Table));
end;
function InsensPosEx(const SubStr, S: string; Offset: Integer = 1): Integer;
var
n: Integer;
SubStrLength: Integer;
SLength: Integer;
label
Fail;
begin
Result := 0;
if S = '' then Exit;
if Offset <= 0 then Exit;
SubStrLength := Length(SubStr);
SLength := Length(s);
if SubStrLength > SLength then Exit;
Result := Offset;
while SubStrLength <= (SLength-Result+1) do
begin
for n := 1 to SubStrLength do
if CharUpCaseTable[SubStr[n]] <> CharUpCaseTable[s[Result+n-1]] then
goto Fail;
Exit;
Fail:
Inc(Result);
end;
Result := 0;
end;
//...
initialization
InitCharUpCaseTable({var}CharUpCaseTable);
I have also faced the problem of converting FastStrings, which used a Boyer-Moore (BM) search to gain some speed, for D2009 and D2010. Since many of my searches are looking for a single character only, and most of these are looking for non-alphabetic characters, my D2010 version of SmartPos has an overload version with a widechar as the first argument, and does a simple loop through the string to find these. I use uppercasing of both arguments to handle the few non-case-sensitive case. For my applications, I believe the speed of this solution is comparable to FastStrings.
For the 'string find' case, my first pass was to use SearchBuf and do the uppercasing and accept the penalty, but I have recently been looking into the possibility of using a Unicode BM implementation. As you may be aware, BM does not scale well or easily to charsets of Unicode proportions, but there is a Unicode BM implementation at Soft Gems. This pre-dates D2009 and D2010, but looks as if it would convert fairly easily. The author, Mike Lischke, solves the uppercasing issue by including a 67kb Unicode uppercasing table, and this may be a step too far for my modest requirements. Since my search strings are usually short (though not as short as your single three-character example) the overhead for Unicode BM may also be a price not worth paying: the BM advantage increases with the length of the string being searched for.
This is definitely a situation where benchmarking with some real-world application-specific examples will be needed before incorporating that Unicode BM into my own applications.
Edit: some basic benchmarking shows that I was right to be wary of the "Unicode Tuned Boyer-Moore" solution. In my environment, UTBM results in bigger code, longer time. I might consider using it if I needed some of the extras this implementation provides (handling surrogates and whole-words only searches).
Here's one that I wrote and have been using for years:
function XPos( const cSubStr, cString :string ) :integer;
var
nLen0, nLen1, nCnt, nCnt2 :integer;
cFirst :Char;
begin
nLen0 := Length(cSubStr);
nLen1 := Length(cString);
if nLen0 > nLen1 then
begin
// the substr is longer than the cString
result := 0;
end
else if nLen0 = 0 then
begin
// null substr not allowed
result := 0;
end
else
begin
// the outer loop finds the first matching character....
cFirst := UpCase( cSubStr[1] );
result := 0;
for nCnt := 1 to nLen1 - nLen0 + 1 do
begin
if UpCase( cString[nCnt] ) = cFirst then
begin
// this might be the start of the substring...at least the first
// character matches....
result := nCnt;
for nCnt2 := 2 to nLen0 do
begin
if UpCase( cString[nCnt + nCnt2 - 1] ) <> UpCase( cSubStr[nCnt2] ) then
begin
// failed
result := 0;
break;
end;
end;
end;
if result > 0 then
break;
end;
end;
end;
Why not just convert the both the substring and the source string to lower or upper case within the regular Pos statement. The result will effectively be case-insensitive because both arguments are all in one case. Simple and lite.
The Jedi Code Library has StrIPos and thousands of other useful functions to complement Delphi's RTL. When I still worked a lot in Delphi, JCL and its visual brother JVCL were among the first things I added to a freshly installed Delphi.
Instead 'AnsiUpperCase' you can use Table it is much faster.
I have reshape my old code. It is very simple and also very fast.
Check it:
type
TAnsiUpCaseTable = array [AnsiChar] of AnsiChar;
var
AnsiTable: TAnsiUpCaseTable;
procedure InitAnsiUpCaseTable(var Table: TAnsiUpCaseTable);
var
n: cardinal;
begin
for n := 0 to SizeOf(TAnsiUpCaseTable) -1 do
begin
AnsiTable[AnsiChar(n)] := AnsiChar(n);
CharUpperBuff(#AnsiTable[AnsiChar(n)], 1);
end;
end;
function UpCasePosEx(const SubStr, S: string; Offset: Integer = 1): Integer;
var
n :integer;
SubStrLength :integer;
SLength :integer;
label
Fail;
begin
SLength := length(s);
if (SLength > 0) and (Offset > 0) then begin
SubStrLength := length(SubStr);
result := Offset;
while SubStrLength <= SLength - result + 1 do begin
for n := 1 to SubStrLength do
if AnsiTable[SubStr[n]] <> AnsiTable[s[result + n -1]] then
goto Fail;
exit;
Fail:
inc(result);
end;
end;
result := 0;
end;
initialization
InitAnsiUpCaseTable(AnsiTable);
end.
I think, converting to upper or lower case before Pos is the best way, but you should try to call AnsiUpperCase/AnsiLowerCase functions as less as possible.
On this occasion I couldn't find any approach that was even as good as, let alone better than Pos() + some form of string normalisation (upper/lowercase conversion).
This is not entirely surprising as when benchmarked the Unicode string handling in Delphi 2009 I found that the Pos() RTL routine has improved significantly since Delphi 7, explained in part by the fact that aspects of the FastCode libraries have been incorporated into the RTL for some time now.
The FastStrings library on the other hand has not - iirc - been significantly updated for a long time now. In tests I found that many FastStrings routines have in fact been overtaken by the equivalent RTL functions (with a couple of exceptions, explained by the unavoidable overhead incurred by the additional complications of Unicode).
The "Char-Wise" processing of the solution presented by Steve is the best so far imho.
Any approach that involves normalising the entire strings (both string and sub-string) risks introducing errors in any character-based position in the results due to the fact that with Unicode strings a case conversion may result in a change in the length of the string (some characters convert to more/fewer characters in a case conversion).
These may be rare cases but Steve's routine avoids them and is only about 10% slower than the already quite fast Pos + Uppercase (your benchmarking results don't tally with mine on that score).
Often the simple solution is the one you'd want to use:
if AnsiPos(AnsiupperCase('needle'), AnsiupperCase('The Needle in the haystack')) <> 0 then
DoSomething;
Reference:
http://www.delphibasics.co.uk/RTL.asp?Name=ansipos
http://www.delphibasics.co.uk/RTL.asp?Name=UpCase
Any program on Windows can call a shell-API function, which keeps your code-size down. As usual, read the program from the bottom up. This has been tested with Ascii-strings only, not wide strings.
program PrgDmoPosIns; {$AppType Console} // demo case-insensitive Pos function for Windows
// Free Pascal 3.2.2 [2022/01/02], Win32 for i386
// FPC.EXE -vq -CoOr -Twin32 -oPrgStrPosDmo.EXE PrgStrPosDmo.LPR
// -vq Verbose: Show message numbers
// -C Code generation:
// o Check overflow of integer operations
// O Check for possible overflow of integer operations - Integer Overflow checking turns on Warning 4048
// r Range checking
// -Twin32 Target 32 bit Windows operating systems
// 29600 bytes code, 1316 bytes data, 35,840 bytes file
function StrStrIA( pszHaystack, pszNeedle : PChar ) : PChar; stdcall; external 'shlwapi.dll'; // dynamic link to Windows API's case-INsensitive search
// https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-strstria
// "FPC\3.2.2\Source\Packages\winunits-base\src\shlwapi.pp" line 557
function StrPos( strNeedle, strHaystk : string ) : SizeInt; // return the position of Needle within Haystack, or zero if not found
var
intRtn : SizeInt; // function result
ptrHayStk , // pointers to
ptrNeedle , // search strings
strMchFnd : PChar ; // pointer to match-found string, or null-pointer/empty-string when not found
bolFnd : boolean; // whether Needle was found within Haystack
intLenHaystk , // length of haystack
intLenMchFnd : SizeInt; // length of needle
begin
strHayStk := strHayStk + #0 ; // strings passed to API must be
strNeedle := strNeedle + #0 ; // null-terminated
ptrHayStk := Addr( strHayStk[ 1 ] ) ; // set pointers to point at first characters of
ptrNeedle := Addr( strNeedle[ 1 ] ) ; // null-terminated strings, so API gets C-style strings
strMchFnd := StrStrIA( ptrHayStk, ptrNeedle ); // call Windows to perform search; match-found-string now points inside the Haystack
bolFnd := ( strMchFnd <> '' ) ; // variable is True when match-found-string is not null/empty
if bolFnd then begin ; // when Needle was yes found in Haystack
intLenMchFnd := Length( strMchFnd ) ; // get length of needle
intLenHaystk := Length( strHayStk ) ; // get length of haystack
intRtn := intLenHaystk - intLenMchFnd; // set function result to the position of needle within haystack, which is the difference in lengths
end else // when Needle was not found in Haystack
intRtn := 0 ; // set function result to tell caller needle does not appear within haystack
StrPos := intRtn ; // pass function result back to caller
end; // StrPos
procedure TstOne( const strNeedle, strHayStk : string ); // run one test with this Needle
var
intPos : SizeInt; // found-match location of Needle within Haystack, or zero if none
begin
write ( 'Searching for : [', strNeedle, ']' ); // bgn output row for this test
intPos := StrPos( strNeedle, strHaystk ); // get Needle position
writeln(' StrPos is ' , intPos ); // end output row for this test
end; // TstOne
procedure TstAll( ); // run all tests with various Needles
const
strHayStk = 'Needle in a Haystack'; // all tests will search in this string
begin
writeln( 'Searching in : [', strHayStk, ']' ); // emit header row
TstOne ( 'Noodle' , strHayStk ); // test not-found
TstOne ( 'Needle' , strHayStk ); // test found at yes-first character
TstOne ( 'Haystack' , strHayStk ); // test found at not-first character
end; // TstAll
begin // ***** MAIN *****
TstAll( ); // run all tests
end.
function TextPos(const ASubText, AText: UnicodeString): Integer;
var
res: Integer;
begin
{
Locates a substring in a given text string without case sensitivity.
Returns the index of the first occurence of ATextin AText,
or zero if the text was not found
}
res := FindNLSString(LOCALE_USER_DEFAULT, FIND_FROMSTART or LINGUISTIC_IGNORECASE, PWideChar(AText), Length(AText), PWideChar(ASubText), Length(ASubText), nil);
Result := (res+1); //convert zero-based to one-based index, and -1 not found to zero.
end;
And in case you don't have the definitions:
function FindNLSString(Locale: LCID; dwFindNLSStringFlags: DWORD; lpStringSource: PWideChar; cchSource: Integer; lpStringValue: PWideChar; cchValue: Integer; cchFound: PInteger): Integer; stdcall; external 'Kernel32.dll';
const
FIND_FROMSTART = $00400000; // look for value in source, starting at the
LINGUISTIC_IGNORECASE = $00000010; // linguistically appropriate 'ignore

Resources