Pos() within utf8 string boundaries

Pos() within utf8 string boundaries - delphi

I'd like to have a Pos() adapted to be used specifying boundaries within the Source string, rather than have it perform the search in the entire data.
Let's say I have a string which is 100 chars long, I want to perform the Pos only between the 5th and 20th character of the (unicode/utf8) string.
The code should be adapted from the ASM fastcode implementation in delphi, and obviously avoid pre-copying the portion of the string to a temporal one, as the purpose is making it faster than that.
My scenario:
I have a string which is accessed many times, and each time, a portion of it is copied to another temporal string, then a Pos is performed on it. I want to avoid the intermediary copy every time, and rather perform the Pos within the boundaries I specify.
Edit: question edited after new one was deemed a duplicate.
I would still like a solution that expands on the current XE3 FastCode assembly implementation, as that would fit my goal here.

Here is an alternative that is not based on asm.
It will also work on a 64-bit application.
function PosExUBound(const SubStr, Str: UnicodeString; Offset,EndPos: Integer): Integer; overload;
var
I, LIterCnt, L, J: NativeInt;
PSubStr, PS: PWideChar;
begin
L := Length(SubStr);
if (EndPos > Length(Str)) then
EndPos := Length(Str);
{ Calculate the number of possible iterations. Not valid if Offset < 1. }
LIterCnt := EndPos - Offset - L + 1;
{- Only continue if the number of iterations is positive or zero (there is space to check) }
if (Offset > 0) and (LIterCnt >= 0) and (L > 0) then
begin
PSubStr := PWideChar(SubStr);
PS := PWideChar(Str);
Inc(PS, Offset - 1);
Dec(L);
I := 0;
J := L;
repeat
if PS[I + J] <> PSubStr[J] then
begin
Inc(I);
J := L;
Dec(LIterCnt);
if (LIterCnt < 0)
then Exit(0);
end
else
if (J > 0) then
Dec(J)
else
Exit(I + Offset);
until false;
end;
Result := 0;
end;
I will leave it as an excercise to implement an AnsiString overloaded version.
BTW, the purepascal parts of the Pos() functions in XE3 are to put it mildly poorly written. See QC111103 Inefficient loop in Pos() for purepascal. Give it a vote if you like.

Related

Delphi: What are faster pure Pascal approachs to find the position of a character in a Unicode string?

Background
Added Later
I have made a pure Pascal function to find the position of a character in a Unicode string as follows:
function CharPosEx(const chChr: Char; const sStr: string;
const iOffset: Integer=1): Integer;
var
PStr : PChar;
PRunIdx: PChar;
PEndIdx: PChar;
iLenStr: Integer;
begin
Result := 0;
iLenStr := Length(sStr);
if (iLenStr = 0) or (iOffset <= 0) or (iOffset > iLenStr) then Exit;
PStr := Pointer(sStr);
PEndIdx := #PStr[iLenStr - 1];
PRunIdx := #PStr[iOffset - 1];
repeat
if PRunIdx^ = chChr then begin
Result := PRunIdx - PStr + 1;
Exit;
end;
Inc(PRunIdx);
until PRunIdx > PEndIdx;
end;
I decide to not use the built-in StrUtils.PosEx() because I want to create a UTF16_CharPosEx function based on an optimized pure Pascal function of CharPosEx. I'm trying to find a faster generic solution like the pure Pascal approachs of the Fastcode Project.
The Original Statements
According to the accepted answer to the question, Delphi: fast Pos with 64-bit, the fastest pure Pascal function to find the position of a substring in a string is PosEx_Sha_Pas_2() of the Fastcode Project.
For the fastest pure Pascal function to find the position of a character in a string, I noticed that the Fastcode Project has CharPos(), CharPosIEx(), and CharPosEY() for a left-to-right matching, as well as CharPosRev() for a right-to-left matching.
However, the problem is that all Fastcode functions were developed before Delphi 2009, which was the first Delphi release that supports Unicode.
I'm interested in CharPos(), and CharPosEY(). I want to re-benchmark them because there are some optimization techniques that are useless nowadays, such as loop unrolling technique that was occasionally implemented in Fastcode functions.
However, I cannot recompile the benchmark project for each of the CharPos family challenges because I have been using Delphi XE3 here, therefore I cannot conclude which one is the fastest.
Questions
Anyone here know or can conlude which one is the fastest pure Pascal implementations for each of the mentioned Fastcode challenges, especially for CharPos() and CharPosEY()?
Other approaches out of the Fastcode Project solution are welcome.
Notes
The Unicode string term I used here refers to a string whose the type is UnicodeString regardless its encoding scheme.
If encoding scheme matters, what I mean is the fixed-width 16-bit encoding scheme (UCS-2).

Many of the solutions to find a character in a string amongst the fastcode examples, uses a technique to read the string in in larger chunks into a register and then analyze the register bytes for a match. this works fine when the characters are single bytes, but are not optimal when characters are 16 bit unicode.
Some examples even use a lookup table, but that is also not optimal in a unicode string search.
I find that the fastcode purepascal PosEx_Sha_Pas_2 string search routine works very good both in 32/64 bit mode even for single character search.
You might as well use that routine.
I stripped out some parts not needed out of the PosEx_Sha_Pas_2 into CharPosEx_LU_Pas and gained some percent in execution time:
function CharPosEx_LU_Pas(c: Char; const S: string; Offset: Integer = 1): Integer;
var
len: Integer;
p, pStart, pStop: PChar;
label
Loop0, Loop4,
TestT, Test0, Test1, Test2, Test3, Test4,
AfterTestT, AfterTest0,
Ret;
begin;
p := Pointer(S);
if (p = nil) or (Offset < 1) then
begin;
Exit(0);
end;
len := PLongInt(PByte(p) - 4)^; // <- Modified to fit 32/64 bit
if (len < Offset) then
begin;
Exit(0);
end;
pStop := p + len;
pStart := p;
p := p + Offset + 3;
if p < pStop then
goto Loop4;
p := p - 4;
goto Loop0;
Loop4:
if c = p[-4] then
goto Test4;
if c = p[-3] then
goto Test3;
if c = p[-2] then
goto Test2;
if c = p[-1] then
goto Test1;
Loop0:
if c = p[0] then
goto Test0;
AfterTest0:
if c = p[1] then
goto TestT;
AfterTestT:
p := p + 6;
if p < pStop then
goto Loop4;
p := p - 4;
if p < pStop then
goto Loop0;
Exit(0);
Test3:
p := p - 2;
Test1:
p := p - 2;
TestT:
p := p + 2;
if p <= pStop then
goto Ret;
Exit(0);
Test4:
p := p - 2;
Test2:
p := p - 2;
Test0:
Inc(p);
Ret:
Result := p - pStart;
end;
I claim no originality to this snippet as it was a simple task to strip out those code parts not needed from PosEx_Sha_Pas_2.
Benchmark 32 bit (101 character string, last character matches):
50000000 repetitions.
System.Pos: 1547 ms
PosEX_Sha_Pas_2: 1292 ms
CharPosEx: 2315 ms
CharPosEx_LU_Pas: 1103 ms
SysUtils.StrScan: 2666 ms
Benchmark 64 bit (101 character string, last character matches):
50000000 repetitions.
System.Pos: 20928 ms
PosEX_Sha_Pas_2: 1783 ms
CharPosEx: 2874 ms
CharPosEx_LU_Pas: 1728 ms
SysUtils.StrScan: 3115 ms

How to insert string at index in TMemoryStream?

How can I insert a string at a specified index in TMemoryStream? If you added a string
"a" to an existing string "b" at Index 0 it would move it forward "ab" etc. For example this is what TStringBuilder.Insert does.

Expand the stream so that there is room for the text to be inserted. So, if the text to be inserted has length N, then you need to make the stream N bytes larger.
Copy all the existing content, starting from the insertion point, to the right to make room for the insertion. A call to Move will get this done. You'll be moving this text N bytes to the right.
Write the inserted string at the insertion point.
I'm assuming an 8 bit encoding. If you use a 16 bit encoding, then the stream needs to be grown by 2N bytes, and so on.
You will find that this is a potentially expensive operation. If you care about performance you will do whatever you can to avoid ever having to do this.
P.S. I'm sorry if I have offended any right-to-left language readers with my Anglo-centric assumption that strings run from left to right!
You asked for some code. Here it is:
procedure TMyStringBuilder.Insert(Index: Integer; const Str: string);
var
N: Integer;
P: Char;
begin
N := Length(Str);
if N=0 then
exit;
FStream.Size := FStream.Size + N*SizeOf(Char);
P := PChar(FStream.Memory);
Move((P + Index)^, (P + Index + N)^, (FStream.Size - N - Index)*SizeOf(Char));
Move(Pointer(Str)^, (P + Index)^, N*SizeOf(Char));
end;
Note that I wrote this code, and then looked at the code in TStringBuilder. It's pretty much identical to that!
The fact that the code you end up writing for this operation is identical to that in TStringBuilder should cause you to stop and contemplate. It's very likely that this new string builder replacement class that you are building will end up with the same implementation as the original. It's highly likely that your replacement will perform no better than the original, and quite plausible that the performance of the replacement will be worse.
It looks a little to me as though you are optimising prematurely. According to your comments below you have not yet timed your code to prove that time spent in TStringBuilder methods is your bottleneck. That's really the first thing that you need to do.
Assuming that you do this timing, and prove that TStringBuilder methods are your bottleneck, you then need to identify why that code is performing below par. And then you need to work out how the code could be improved. Simply repeating the implementation of the original class is not going to yield any benefits.

To move the existing data to make some room for the new string data, you can use pointer operation and Move procedure for faster operation. But it only has to be done if the insertion index is lower then the size of the original stream. If the index is larger than stream size then you could: (1) expand the stream size to accomodate the index number and fill the extra room with zero values or spaces, or (2) reduce the index value to the stream size, so the string will be inserted or appended in the end of the stream.
Depend on your needs, you could: (1) make a class derived from TMemoryStream, or (2) make a function to process an instance of TMemoryStream. Here's the first case:
type
TExtMemoryStream = class(TMemoryStream)
public
procedure InsertString(Index: Integer; const S: string);
end;
procedure TExtMemoryStream.InsertString(Index: Integer; const S: string);
var
SLength, OldSize: Integer;
Src, Dst, PointerToS: ^Char;
begin
if Index > Size then Index := Size;
SLength := Length(S);
OldSize := Size;
SetSize(Size + SLength);
Src := Memory; Inc(Src, Index);
Dst := Src; Inc(Dst, SLength);
Move(Src^, Dst^, OldSize - Index);
PointerToS := #S[1];
Move(PointerToS^, Src^, SLength);
end;
or the second case:
procedure InsertStringToMemoryStream(MS: TMemoryStream;
Index: Integer; const S: string);
var
SLength, OldSize: Integer;
Src, Dst, PointerToS: ^Char;
begin
if Index > MS.Size then Index := MS.Size;
SLength := Length(S);
OldSize := MS.Size;
MS.SetSize(MS.Size + SLength);
Src := MS.Memory; Inc(Src, Index);
Dst := Src; Inc(Dst, SLength);
Move(Src^, Dst^, OldSize - Index);
PointerToS := #S[1];
Move(PointerToS^, Src^, SLength);
end;
There, hope it helps :)

List Intersection

I want to compute a list "intersection". The problem is:
L1 = [1, 0, 2, 3, 1 , 3, 0, 5]
L2 = [3, 5]
Then the result will be
L3 = [0, 0, 0, 1, 0, 1, 0, 1]
Then i will convert this result in a byte. In this case will be 21 in decimal format.
I want to make in delphi and I need this do efficiently. Is there a way to solve this problem better than O(m*n)?

Here's a function that should do what you want. I defined L2 as a set instead of an array since you said all your values will fit in a Byte. Its complexity is O(n); checking set membership runs in constant time. But since the result needs to fit in a byte, the length of L1 must be bound at 8, so the complexity of this function is actually O(1).
function ArrayMembersInSet(const L1: array of Byte; const L2: set of Byte): Byte;
var
i: Integer;
b: Byte;
begin
Assert(Length(L1) <= 8,
'List is to long to fit in result');
Result := 0;
for i := 0 to High(L1) do begin
b := L1[i];
if b in L2 then
Result := Result or (1 shl (7 - i));
end;
end;

Rob's answer will work for this specific case. For a more general case where two lists have to be compared, you can do it in O(m+n) time if both lists are sorted. (Or O(n log n) time if you have to sort them first, but that's still a lot faster than O(m*n).)
The basic List Comparison algorithm looks like this:
procedure ListCompare(list1, list2: TWhateverList; [Add extra params here]);
var
i, j: integer;
begin
i := 0;
j := 0;
while (i < list1.Count) and (j < list2.Count) do
begin
if list1[i] < list2[j] then
begin
//handle this appropriately
inc(i);
end
else if list1[i] > list2[j] then
begin
//handle this appropriately
inc(j);
end
else //both elements are equal
begin
//handle this appropriately
inc(i);
inc(j);
end;
end;
//optional cleanup, if needed:
while (i < list1.Count) do
begin
//handle this appropriately
inc(i);
end;
while (j < list2.Count) do
begin
//handle this appropriately
inc(j);
end;
end;
This can be customized for a whole bunch of tasks, including list intersection, by changing the "handle this appropriately" places, and it's guaranteed to not run for more steps than are in both lists put together. For list intersection, have the equals case add the value to some output and the other two do nothing but advance the counters, and you can leave off the optional cleanup.
One way to use this algorithm is to make the extra params at the top into function pointers and pass in routines that will handle the appropriate cases, or nil to do nothing. (Just make sure you check for nil before invoking them if you go that route!) That way you only have to write the basic code once.

Well no matter what you will need to visit each element in each list comparing the values. A nested loop would accomplish this in O(n^2) and the conversion should just be local work.
EDIT: I noticed that you want a better runtime than O(n*m).

how to improve the code (Delphi) for loading and searching in a dictionary?

I'm a Delphi programmer.
I have made a program who uses dictionaries with words and expressions (loaded in program as "array of string").
It uses a search algorithm based on their "checksum" (I hope this is the correct word).
A string is transformed in integer based on this:
var
FHashSize: Integer; //stores the value of GetHashSize
HashTable, HashTableNoCase: array[Byte] of Longword;
HashTableInit: Boolean = False;
const
AnsiLowCaseLookup: array[AnsiChar] of AnsiChar = (
#$00, #$01, #$02, #$03, #$04, #$05, #$06, #$07,
#$08, #$09, #$0A, #$0B, #$0C, #$0D, #$0E, #$0F,
#$10, #$11, #$12, #$13, #$14, #$15, #$16, #$17,
#$18, #$19, #$1A, #$1B, #$1C, #$1D, #$1E, #$1F,
#$20, #$21, #$22, #$23, #$24, #$25, #$26, #$27,
#$28, #$29, #$2A, #$2B, #$2C, #$2D, #$2E, #$2F,
#$30, #$31, #$32, #$33, #$34, #$35, #$36, #$37,
#$38, #$39, #$3A, #$3B, #$3C, #$3D, #$3E, #$3F,
#$40, #$61, #$62, #$63, #$64, #$65, #$66, #$67,
#$68, #$69, #$6A, #$6B, #$6C, #$6D, #$6E, #$6F,
#$70, #$71, #$72, #$73, #$74, #$75, #$76, #$77,
#$78, #$79, #$7A, #$5B, #$5C, #$5D, #$5E, #$5F,
#$60, #$61, #$62, #$63, #$64, #$65, #$66, #$67,
#$68, #$69, #$6A, #$6B, #$6C, #$6D, #$6E, #$6F,
#$70, #$71, #$72, #$73, #$74, #$75, #$76, #$77,
#$78, #$79, #$7A, #$7B, #$7C, #$7D, #$7E, #$7F,
#$80, #$81, #$82, #$83, #$84, #$85, #$86, #$87,
#$88, #$89, #$8A, #$8B, #$8C, #$8D, #$8E, #$8F,
#$90, #$91, #$92, #$93, #$94, #$95, #$96, #$97,
#$98, #$99, #$9A, #$9B, #$9C, #$9D, #$9E, #$9F,
#$A0, #$A1, #$A2, #$A3, #$A4, #$A5, #$A6, #$A7,
#$A8, #$A9, #$AA, #$AB, #$AC, #$AD, #$AE, #$AF,
#$B0, #$B1, #$B2, #$B3, #$B4, #$B5, #$B6, #$B7,
#$B8, #$B9, #$BA, #$BB, #$BC, #$BD, #$BE, #$BF,
#$C0, #$C1, #$C2, #$C3, #$C4, #$C5, #$C6, #$C7,
#$C8, #$C9, #$CA, #$CB, #$CC, #$CD, #$CE, #$CF,
#$D0, #$D1, #$D2, #$D3, #$D4, #$D5, #$D6, #$D7,
#$D8, #$D9, #$DA, #$DB, #$DC, #$DD, #$DE, #$DF,
#$E0, #$E1, #$E2, #$E3, #$E4, #$E5, #$E6, #$E7,
#$E8, #$E9, #$EA, #$EB, #$EC, #$ED, #$EE, #$EF,
#$F0, #$F1, #$F2, #$F3, #$F4, #$F5, #$F6, #$F7,
#$F8, #$F9, #$FA, #$FB, #$FC, #$FD, #$FE, #$FF);
implementation
function GetHashSize(const Count: Integer): Integer;
begin
if Count < 65 then
Result := 256
else
Result := Round(IntPower(16, Ceil(Log10(Count div 4) / Log10(16))));
end;
function Hash(const Hash: LongWord; const Buf; const BufSize: Integer): LongWord;
var P: PByte;
I: Integer;
begin
P := #Buf;
Result := Hash;
for I := 1 to BufSize do
begin
Result := HashTable[Byte(Result) xor P^] xor (Result shr 8);
Inc(P);
end;
end;
function HashStrBuf(const StrBuf: Pointer; const StrLength: Integer; const Slots: LongWord): LongWord;
var P: PChar;
I, J: Integer;
begin
if not HashTableInit then
InitHashTable;
P := StrBuf;
if StrLength <= 48 then // Hash all characters for short strings
Result := Hash($FFFFFFFF, P^, StrLength)
else
begin
// Hash first 16 bytes
Result := Hash($FFFFFFFF, P^, 16);
// Hash last 16 bytes
Inc(P, StrLength - 16);
Result := Hash(Result, P^, 16);
// Hash 16 bytes sampled from rest of string
I := (StrLength - 48) div 16;
P := StrBuf;
Inc(P, 16);
for J := 1 to 16 do
begin
Result := HashTable[Byte(Result) xor Byte(P^)] xor (Result shr 8);
Inc(P, I + 1);
end;
end;
// Mod into slots
if Slots <> 0 then
Result := Result mod Slots;
end;
procedure InitHashTable;
var I, J: Byte;
R: LongWord;
begin
for I := $00 to $FF do
begin
R := I;
for J := 8 downto 1 do
if R and 1 <> 0 then
R := (R shr 1) xor $EDB88320
else
R := R shr 1;
HashTable[I] := R;
end;
Move(HashTable, HashTableNoCase, Sizeof(HashTable));
for I := Ord('A') to Ord('Z') do
HashTableNoCase[I] := HashTableNoCase[I or 32];
HashTableInit := True;
end;
The result of the HashStrBuf is "and (FHashSize - 1)" and is used as index in an "array of array of Integer" (of FHashSize size) to store the index of the string from that "array of string".
This way, when searches for a string, it's transformed in "checksum" and then the code searches in the "branch" with this index comparing this string with the strings from dictionary who have the same "checksum".
Ideally each string from dictionary should have unique checksum. But in the "real world" about 2/3 share the same "checksum" with other words. Because of that the search is not that fast.
In these dictionaries strings are composed of this characters: ['a'..'z',#224..#246,#248..#254,#154,#156..#159,#179,#186,#191,#190,#185,'0'..'9', '''']
Is there any way to improve the "hashing" so the strings would have more unique "checksums"?
Oh, one way is to increase the size of that "array of array of Integer" (FHashSize) but it cannot be increased too much because it takes a lot of Ram.
Another thing: these dictionaries are stored on HDD only as words/expressions (not the "checksums"). Their "checksum" is generated at program startup. But it takes a lot of seconds to do that...
Is there any way to speed up the startup of the program? Maybe by improving the "hashing" function, maybe by storing the "checksums" on HDD and loading them from there...
Any input would be appreciated...
PS: here is the code to search:
function TDictionary.LocateKey(const Key: AnsiString): Integer;
var i, j, l, H: Integer;
P, Q: PChar;
begin
Result := -1;
l := Length(Key);
H := HashStrBuf(#Key[1], l, 0) and (FHashSize - 1);
P := #Key[1];
for i := 0 to High(FHash[H]) do //FHash is that "array of array of integer"
begin
if l <> FKeys.ItemSize[FHash[H][i]] then //FKeys.ItemSize is an byte array with the lengths of strings from dictionary
Continue;
Q := FKeys.Pointer(FHash[H][i]); //pointer to string in dictionary
for j := 0 to l - 1 do
if (P + j)^ <> (Q + j)^ then
Break;
if j = l then
begin
Result := FHash[H][i];
Exit;
end;
end;
end;

Don't reinvent the wheel!
IMHO your hashing is far from efficient, and your collision algorithm can be improved.
Take a look for instance at the IniFiles unit, and the THashedStringList.
It's a bit old, but a good start for a string list using hashes.
There are a lot of good Delphi implementation of such, like in SuperObject and a lot of other code...
Take a look at our SynBigTable unit, which can handle arrays of data in memory or in file very fast, with full indexed searches. Or our latest TDynArray wrapper around any dynamic array of data, to implement TList-like methods to it, including fast binary search. I'm quite sure it could be faster than your hand-tuned code using hashing, if you use an ordered index then fast binary search.
Post-Scriptum:
About pure hashing speed of a string content, take a look at this function - rename RawByteString into AnsiString, PPtrInt into PPointer, and PtrInt into Integer for Delphi 7:
function Hash32(const Text: RawByteString): cardinal;
function SubHash(P: PCardinalArray): cardinal;
{$ifdef HASINLINE}inline;{$endif}
var s1,s2: cardinal;
i, L: PtrInt;
const Mask: array[0..3] of cardinal = (0,$ff,$ffff,$ffffff);
begin
if P<>nil then begin
L := PPtrInt(PtrInt(P)-4)^; // fast lenght(Text)
s1 := 0;
s2 := 0;
for i := 1 to L shr 4 do begin // 16 bytes (4 DWORD) by loop - aligned read
inc(s1,P^[0]);
inc(s2,s1);
inc(s1,P^[1]);
inc(s2,s1);
inc(s1,P^[2]);
inc(s2,s1);
inc(s1,P^[3]);
inc(s2,s1);
inc(PtrUInt(P),16);
end;
for i := 1 to (L shr 2)and 3 do begin // 4 bytes (DWORD) by loop
inc(s1,P^[0]);
inc(s2,s1);
inc(PtrUInt(P),4);
end;
inc(s1,P^[0] and Mask[L and 3]); // remaining 0..3 bytes
inc(s2,s1);
result := s1 xor (s2 shl 16);
end else
result := 0;
end;
begin // use a sub function for better code generation under Delphi
result := SubHash(pointer(Text));
end;
There is even a pure asm version, even faster, in our SynCommons.pas unit. I don't know any faster hashing function around (it's faster than crc32/adler32/IniFiles.hash...). It's based on adler32, but use DWORD aligned reading and summing for even better speed. This could be improved with SSE asm, of course, but here is a fast pure Delphi hash function.
Then don't forget to use "multiplication"/"binary and operation" for hash resolution, just like in IniFiles. It will reduce the number of iteration to your list of hashs.
But since you didn't provide the search source code, we are not able to know what could be improved here.

If you are using Delphi 7, consider using Julian Bucknall's lovely Delphi data types code, EzDsl (Easy Data Structures Library).
Now you don't have to reinvent the wheel as another wise person has also said.
You can download ezdsl, a version that I have made work with both Delphi 7, and recent unicode delphi versions, here.
In particular the unit name EHash contains a hash table implementation, which has various hashing algorithms plug-inable, or you can write your own plugin function that just does the hashing function of your choice.
As a word to the wise, if you are using a Unicode Delphi version; I would be careful about hashing your unicode strings with a code library like this, without checking how its hashing algorithms perform on your system. The OP here is using Delphi 7, so Unicode is not a factor for the original question.

I think you'll find a database (without checksums) a lot quicker. Maybe try sqlite which will give you a single file database. There are many Delphi Libraries available.

case insensitive Pos

Is there any comparable function like Pos that is not case-sensitive in D2010 (unicode)?
I know I can use Pos(AnsiUpperCase(FindString), AnsiUpperCase(SourceString)) but that adds a lot of processing time by converting the strings to uppercase every time the function is called.
For example, on a 1000000 loop, Pos takes 78ms while converting to uppercase takes 764ms.
str1 := 'dfkfkL%&/s"#<.676505';
for i := 0 to 1000000 do
PosEx('#<.', str1, 1); // Takes 78ms
for i := 0 to 1000000 do
PosEx(AnsiUpperCase('#<.'), AnsiUpperCase(str1), 1); // Takes 764ms
I know that to improve the performance of this specific example I can convert the strings to uppercase first before the loop, but the reason why I'm looking to have a Pos-like function that is not case-sensitive is to replace one from FastStrings. All the strings I'll be using Pos for will be different so I will need to convert each and every one to uppercase.
Is there any other function that might be faster than Pos + convert the strings to uppercase?

The built-in Delphi function to do that is in both the AnsiStrings.ContainsText for AnsiStrings and StrUtils.ContainsText for Unicode strings.
In the background however, they use logic very similar to your logic.
No matter in which library, functions like that will always be slow: especially to be as compatible with Unicode as possible, they need to have quite a lot of overhead. And since they are inside the loop, that costs a lot.
The only way to circumvent that overhead, is to do those conversions outside the loop as much as possible.
So: follow your own suggestion, and you have a really good solution.
--jeroen

This version of my previous answer works in both D2007 and D2010.
In Delphi 2007 the CharUpCaseTable is 256 bytes
In Delphi 2010 it is 128 KB (65535*2).
The reason is Char size. In the older version of Delphi my original code only supported the current locale character set at initialization. My InsensPosEx is about 4 times faster than your code. Certainly it is possible to go even faster, but we would lose simplicity.
type
TCharUpCaseTable = array [Char] of Char;
var
CharUpCaseTable: TCharUpCaseTable;
procedure InitCharUpCaseTable(var Table: TCharUpCaseTable);
var
n: cardinal;
begin
for n := 0 to Length(Table) - 1 do
Table[Char(n)] := Char(n);
CharUpperBuff(#Table, Length(Table));
end;
function InsensPosEx(const SubStr, S: string; Offset: Integer = 1): Integer;
var
n: Integer;
SubStrLength: Integer;
SLength: Integer;
label
Fail;
begin
Result := 0;
if S = '' then Exit;
if Offset <= 0 then Exit;
SubStrLength := Length(SubStr);
SLength := Length(s);
if SubStrLength > SLength then Exit;
Result := Offset;
while SubStrLength <= (SLength-Result+1) do
begin
for n := 1 to SubStrLength do
if CharUpCaseTable[SubStr[n]] <> CharUpCaseTable[s[Result+n-1]] then
goto Fail;
Exit;
Fail:
Inc(Result);
end;
Result := 0;
end;
//...
initialization
InitCharUpCaseTable({var}CharUpCaseTable);

I have also faced the problem of converting FastStrings, which used a Boyer-Moore (BM) search to gain some speed, for D2009 and D2010. Since many of my searches are looking for a single character only, and most of these are looking for non-alphabetic characters, my D2010 version of SmartPos has an overload version with a widechar as the first argument, and does a simple loop through the string to find these. I use uppercasing of both arguments to handle the few non-case-sensitive case. For my applications, I believe the speed of this solution is comparable to FastStrings.
For the 'string find' case, my first pass was to use SearchBuf and do the uppercasing and accept the penalty, but I have recently been looking into the possibility of using a Unicode BM implementation. As you may be aware, BM does not scale well or easily to charsets of Unicode proportions, but there is a Unicode BM implementation at Soft Gems. This pre-dates D2009 and D2010, but looks as if it would convert fairly easily. The author, Mike Lischke, solves the uppercasing issue by including a 67kb Unicode uppercasing table, and this may be a step too far for my modest requirements. Since my search strings are usually short (though not as short as your single three-character example) the overhead for Unicode BM may also be a price not worth paying: the BM advantage increases with the length of the string being searched for.
This is definitely a situation where benchmarking with some real-world application-specific examples will be needed before incorporating that Unicode BM into my own applications.
Edit: some basic benchmarking shows that I was right to be wary of the "Unicode Tuned Boyer-Moore" solution. In my environment, UTBM results in bigger code, longer time. I might consider using it if I needed some of the extras this implementation provides (handling surrogates and whole-words only searches).

Here's one that I wrote and have been using for years:
function XPos( const cSubStr, cString :string ) :integer;
var
nLen0, nLen1, nCnt, nCnt2 :integer;
cFirst :Char;
begin
nLen0 := Length(cSubStr);
nLen1 := Length(cString);
if nLen0 > nLen1 then
begin
// the substr is longer than the cString
result := 0;
end
else if nLen0 = 0 then
begin
// null substr not allowed
result := 0;
end
else
begin
// the outer loop finds the first matching character....
cFirst := UpCase( cSubStr[1] );
result := 0;
for nCnt := 1 to nLen1 - nLen0 + 1 do
begin
if UpCase( cString[nCnt] ) = cFirst then
begin
// this might be the start of the substring...at least the first
// character matches....
result := nCnt;
for nCnt2 := 2 to nLen0 do
begin
if UpCase( cString[nCnt + nCnt2 - 1] ) <> UpCase( cSubStr[nCnt2] ) then
begin
// failed
result := 0;
break;
end;
end;
end;
if result > 0 then
break;
end;
end;
end;

Why not just convert the both the substring and the source string to lower or upper case within the regular Pos statement. The result will effectively be case-insensitive because both arguments are all in one case. Simple and lite.

The Jedi Code Library has StrIPos and thousands of other useful functions to complement Delphi's RTL. When I still worked a lot in Delphi, JCL and its visual brother JVCL were among the first things I added to a freshly installed Delphi.

Instead 'AnsiUpperCase' you can use Table it is much faster.
I have reshape my old code. It is very simple and also very fast.
Check it:
type
TAnsiUpCaseTable = array [AnsiChar] of AnsiChar;
var
AnsiTable: TAnsiUpCaseTable;
procedure InitAnsiUpCaseTable(var Table: TAnsiUpCaseTable);
var
n: cardinal;
begin
for n := 0 to SizeOf(TAnsiUpCaseTable) -1 do
begin
AnsiTable[AnsiChar(n)] := AnsiChar(n);
CharUpperBuff(#AnsiTable[AnsiChar(n)], 1);
end;
end;
function UpCasePosEx(const SubStr, S: string; Offset: Integer = 1): Integer;
var
n :integer;
SubStrLength :integer;
SLength :integer;
label
Fail;
begin
SLength := length(s);
if (SLength > 0) and (Offset > 0) then begin
SubStrLength := length(SubStr);
result := Offset;
while SubStrLength <= SLength - result + 1 do begin
for n := 1 to SubStrLength do
if AnsiTable[SubStr[n]] <> AnsiTable[s[result + n -1]] then
goto Fail;
exit;
Fail:
inc(result);
end;
end;
result := 0;
end;
initialization
InitAnsiUpCaseTable(AnsiTable);
end.

I think, converting to upper or lower case before Pos is the best way, but you should try to call AnsiUpperCase/AnsiLowerCase functions as less as possible.

On this occasion I couldn't find any approach that was even as good as, let alone better than Pos() + some form of string normalisation (upper/lowercase conversion).
This is not entirely surprising as when benchmarked the Unicode string handling in Delphi 2009 I found that the Pos() RTL routine has improved significantly since Delphi 7, explained in part by the fact that aspects of the FastCode libraries have been incorporated into the RTL for some time now.
The FastStrings library on the other hand has not - iirc - been significantly updated for a long time now. In tests I found that many FastStrings routines have in fact been overtaken by the equivalent RTL functions (with a couple of exceptions, explained by the unavoidable overhead incurred by the additional complications of Unicode).
The "Char-Wise" processing of the solution presented by Steve is the best so far imho.
Any approach that involves normalising the entire strings (both string and sub-string) risks introducing errors in any character-based position in the results due to the fact that with Unicode strings a case conversion may result in a change in the length of the string (some characters convert to more/fewer characters in a case conversion).
These may be rare cases but Steve's routine avoids them and is only about 10% slower than the already quite fast Pos + Uppercase (your benchmarking results don't tally with mine on that score).

Often the simple solution is the one you'd want to use:
if AnsiPos(AnsiupperCase('needle'), AnsiupperCase('The Needle in the haystack')) <> 0 then
DoSomething;
Reference:
http://www.delphibasics.co.uk/RTL.asp?Name=ansipos
http://www.delphibasics.co.uk/RTL.asp?Name=UpCase

Any program on Windows can call a shell-API function, which keeps your code-size down. As usual, read the program from the bottom up. This has been tested with Ascii-strings only, not wide strings.
program PrgDmoPosIns; {$AppType Console} // demo case-insensitive Pos function for Windows
// Free Pascal 3.2.2 [2022/01/02], Win32 for i386
// FPC.EXE -vq -CoOr -Twin32 -oPrgStrPosDmo.EXE PrgStrPosDmo.LPR
// -vq Verbose: Show message numbers
// -C Code generation:
// o Check overflow of integer operations
// O Check for possible overflow of integer operations - Integer Overflow checking turns on Warning 4048
// r Range checking
// -Twin32 Target 32 bit Windows operating systems
// 29600 bytes code, 1316 bytes data, 35,840 bytes file
function StrStrIA( pszHaystack, pszNeedle : PChar ) : PChar; stdcall; external 'shlwapi.dll'; // dynamic link to Windows API's case-INsensitive search
// https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-strstria
// "FPC\3.2.2\Source\Packages\winunits-base\src\shlwapi.pp" line 557
function StrPos( strNeedle, strHaystk : string ) : SizeInt; // return the position of Needle within Haystack, or zero if not found
var
intRtn : SizeInt; // function result
ptrHayStk , // pointers to
ptrNeedle , // search strings
strMchFnd : PChar ; // pointer to match-found string, or null-pointer/empty-string when not found
bolFnd : boolean; // whether Needle was found within Haystack
intLenHaystk , // length of haystack
intLenMchFnd : SizeInt; // length of needle
begin
strHayStk := strHayStk + #0 ; // strings passed to API must be
strNeedle := strNeedle + #0 ; // null-terminated
ptrHayStk := Addr( strHayStk[ 1 ] ) ; // set pointers to point at first characters of
ptrNeedle := Addr( strNeedle[ 1 ] ) ; // null-terminated strings, so API gets C-style strings
strMchFnd := StrStrIA( ptrHayStk, ptrNeedle ); // call Windows to perform search; match-found-string now points inside the Haystack
bolFnd := ( strMchFnd <> '' ) ; // variable is True when match-found-string is not null/empty
if bolFnd then begin ; // when Needle was yes found in Haystack
intLenMchFnd := Length( strMchFnd ) ; // get length of needle
intLenHaystk := Length( strHayStk ) ; // get length of haystack
intRtn := intLenHaystk - intLenMchFnd; // set function result to the position of needle within haystack, which is the difference in lengths
end else // when Needle was not found in Haystack
intRtn := 0 ; // set function result to tell caller needle does not appear within haystack
StrPos := intRtn ; // pass function result back to caller
end; // StrPos
procedure TstOne( const strNeedle, strHayStk : string ); // run one test with this Needle
var
intPos : SizeInt; // found-match location of Needle within Haystack, or zero if none
begin
write ( 'Searching for : [', strNeedle, ']' ); // bgn output row for this test
intPos := StrPos( strNeedle, strHaystk ); // get Needle position
writeln(' StrPos is ' , intPos ); // end output row for this test
end; // TstOne
procedure TstAll( ); // run all tests with various Needles
const
strHayStk = 'Needle in a Haystack'; // all tests will search in this string
begin
writeln( 'Searching in : [', strHayStk, ']' ); // emit header row
TstOne ( 'Noodle' , strHayStk ); // test not-found
TstOne ( 'Needle' , strHayStk ); // test found at yes-first character
TstOne ( 'Haystack' , strHayStk ); // test found at not-first character
end; // TstAll
begin // ***** MAIN *****
TstAll( ); // run all tests
end.

function TextPos(const ASubText, AText: UnicodeString): Integer;
var
res: Integer;
begin
{
Locates a substring in a given text string without case sensitivity.
Returns the index of the first occurence of ATextin AText,
or zero if the text was not found
}
res := FindNLSString(LOCALE_USER_DEFAULT, FIND_FROMSTART or LINGUISTIC_IGNORECASE, PWideChar(AText), Length(AText), PWideChar(ASubText), Length(ASubText), nil);
Result := (res+1); //convert zero-based to one-based index, and -1 not found to zero.
end;
And in case you don't have the definitions:
function FindNLSString(Locale: LCID; dwFindNLSStringFlags: DWORD; lpStringSource: PWideChar; cchSource: Integer; lpStringValue: PWideChar; cchValue: Integer; cchFound: PInteger): Integer; stdcall; external 'Kernel32.dll';
const
FIND_FROMSTART = $00400000; // look for value in source, starting at the
LINGUISTIC_IGNORECASE = $00000010; // linguistically appropriate 'ignore

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart