Retrieving second word to last but one word in a sentence in plsql - stored-procedures

I have written a program which executes properly in PL/SQL, but the output is wrong. Logically I do not see any issue with the code. Here is the code:
begin
inpstr := rtrim(ltrim(regexp_replace(inp,'\s{2,}','')));
lastpos := instr(inpstr,' ',-1);
out3 := SUBSTR(inpstr, INSTR(inpstr,' ',-1) + 1);
pos := instr(inpstr,' ');
out1 := substr(inpstr,1,pos);
out2 := substr(inpstr,pos,lastpos);
end;
O/P:
dbms_output.put_line('First Word:: '||out1||' '||'Second Word:: '||out2||' '||'Lastword:: '||out3);
PROCEDURE SPLITWORDS compiled
anonymous block completed
First Word:: Welcome Second Word:: to the world of analyti Lastword:: analytics!
The second word should retrieve 'to the world of', but the output also includes 'analyti'.
Could anyone tell me what's wrong with my code?
Thanks,
Dex.

When you assign out2, you at least have to subtract pos from lastpos.
In your sample text "Welcome to the world of analytics!", lastpos is 24 and pos is 8, so the substring starting at position 8 with a length of 24 is " to the world of analyti".

The third parameter of the substr function is the length of the substring, not the location of the ending character. Subtracting pos from lastpos (see below) would allow your code to work.
out2 := substr(inpstr, pos, lastpos - pos);
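To make the start-plus-length behaviour concrete, here is a small Python sketch. The helpers substr and instr are hypothetical stand-ins written for this illustration; they only imitate Oracle's SUBSTR and INSTR for positive, 1-based positions and are not part of any library.

```python
def substr(s, start, length):
    """Imitate Oracle SUBSTR(s, start, length) for a positive, 1-based start."""
    return s[start - 1:start - 1 + length]


def instr(s, sub, from_end=False):
    """Imitate INSTR(s, sub) and INSTR(s, sub, -1): 1-based position, 0 if absent."""
    idx = s.rfind(sub) if from_end else s.find(sub)
    return idx + 1


inpstr = "Welcome to the world of analytics!"
pos = instr(inpstr, " ")                     # first space: 8
lastpos = instr(inpstr, " ", from_end=True)  # last space: 24

wrong = substr(inpstr, pos, lastpos)         # third argument used as an end position
right = substr(inpstr, pos, lastpos - pos)   # third argument used as a length

print(repr(wrong))  # ' to the world of analyti'
print(repr(right))  # ' to the world of'
```

Note that the corrected result still starts with the space at position 8; starting at pos + 1 with a length of lastpos - pos - 1 would drop it.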

Can I use an If statement inside a While statement?

while not eof(MyFile) do
begin
Readln(MyFile,sLine);
iCheck := pos('*',sLine);
if iCheck >= 0 then
K := K + 1;
else
K := 1;
sLine := sArrayParty[K];
end;
K is my counter. MyFile is set to a textfile.
I am trying to run an if statement inside a while statement to check if there is a * in a line read into sLine. If so, I want my counter to be incremented, thereby putting the following text into the next index of the array. So basically I have a text file with information separated by * as delimiters and I want each piece of information to populate a new index. But Delphi chokes every time I put that if statement in.
You have a clear syntax error in your code, and the compiler explains exactly what it is:
[DCC Error] Unit1.pas(22): E2153 ';' not allowed before 'ELSE'
So read the words the error message contains, and remove the ; before the else:
if iCheck > 0 then
K := K + 1
else
K := 1;
The number in parentheses after Unit1.pas (in my example, 22) is the exact line number where the compiler stopped, so the line before the else would be line 21. The line number shown at the bottom of the editor window tells you clearly which line that is.
If there is no '*', the Pos function returns 0;
if there is a '*', it returns the position of the first occurrence.
So if iCheck >= 0 then
should be changed to if iCheck > 0 then
As I remember, you can also load the line into a string list and set the string list's Delimiter property to '*',
i.e.
strlst.Delimiter := '*';
strlst.DelimitedText := sline;
so you can get each string by traversing strlst.
(I used this a long time ago and don't have Delphi at hand to test it, so please correct me if anything is wrong.)
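The two ideas above (testing whether the delimiter is present, and splitting a line into fields) can be sketched in Python. The function names and the sample line are made up for this illustration; the one thing to watch is that Python's str.find reports "not found" as -1, whereas Delphi's Pos reports it as 0.

```python
def has_delimiter(line, delimiter="*"):
    """True if the delimiter occurs in the line.

    str.find returns -1 when nothing is found (Delphi's Pos returns 0),
    so the comparison here is >= 0 rather than > 0.
    """
    return line.find(delimiter) >= 0


def split_fields(line, delimiter="*"):
    """Split one line into its fields, one list entry per piece of information."""
    return line.split(delimiter)


sample = "alpha*beta*gamma"   # hypothetical line from the text file
print(has_delimiter(sample))  # True
print(split_fields(sample))   # ['alpha', 'beta', 'gamma']
```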

Algol60 passing integer element of array as parameter - error bad type

I have the following problem.
When I try to run the code with arun file.obj (I have compiled with algol.exe file)
BEGIN
INTEGER PROCEDURE fun(tab,index,lower,upper);
INTEGER tab,index,lower,upper;
BEGIN
INTEGER t;
text (1, "Start");
t := 0;
FOR index := lower STEP 1 UNTIL upper DO
t := t + tab;
fun := t;
END;
INTEGER ARRAY t[1:10];
INTEGER i,result,lower,upper;
lower := 1;
upper := 10;
FOR i := 1 STEP 1 UNTIL 10 DO
t[i] := i;
i := 1;
result := fun(t[i],i,lower,upper);
END FINISH;
I am still getting error:
ERROR 3
ADD PBASE PROC LOC
07D4 0886 1 13
083A 0842 0 115
The compiler I use is "The Rogalgol Algol60" product of RHA (Minisystems) Ltd.
Error 3 means "3 Procedure called where the actual and the formal parameter types do not match."
But I do not understand why. The cause of the error is t[i] (if I change it to i, it is OK).
Does anyone know what I am doing wrong?
I compile the code in DOSBox (on Linux).
The problem is that the indexed element of the integer array that you're passing to your procedure isn't the same type as the integer that the procedure is expecting. I can't remember what an integer array is full of, but I guess it isn't integers... I have to admit I never use them, but can't remember why. Possibly because of limitations like this. I stick to Real arrays and EBCDIC ones.
You can almost certainly fix it by defining a new integer, j; inserting "j := t[i];" before your invocation of 'fun'; then invoking 'fun' with 'j' rather than t[i].
BTW you may want to make the array (and the 'for' loop) zero-relative. ALGOL is mostly zero-relative and I think it may save memory if you go with the flow.
Let me know if this helps....

Delphi - Calculate Minimum Maximum and Average in items in a list

I am building a simple application to calculate the min, max and average of values in a list.
The values are actually temperatures. I think I am almost correct, but there are 2 errors.
var
Count, Average, Sum,i, Max, Min, K : Integer;
Temperatures : Array of Integer;
NoItems : Double;
begin
Count := 0;
Sum := 0;
Max := 0;
Min := 0;
Average := 0;
Count := lstTemp.Items.Count;
{Calculate Sum of Values in the list}
for i := 0 to Count - 1 do
Sum := Sum + StrToInt(lstTemp.Items[i]);
{Calculate Min and Max}
SetLength(Temperatures,Count);
for K:=0 to Count-1 do
Temperatures[K] := lstTemp.Items[K];
if (Temperatures[K] > Max) then
Max := Temperatures[K];
if (Temperatures[K] < Min) then
Min := Temperatures[K];
{Calculate Average}
Average := Sum / Count;
edtAvg.Text:=IntToStr(Average); //Display Average
edtAvg.Text:=IntToStr(Min); //Display Minimum Temp.
edtAvg.Text:=IntToStr(Max); //Display Maximum Temp.
end;
So the 2 Errors are
Error: Incompatible types: got "AnsiString" expected "LongInt"
This is for Average := Sum / Count;
Error: Incompatible types: got "Set Of Byte" expected "Double"
This Error is for Temperatures[K] := lstTemp.Items[K];
Any ideas how to solve this?
Sum and Count are both Integers, so I don't know why it shouldn't work!
Thanks
There are a number of problems. First, when you write
for K:=0 to Count-1 do
  Temperatures[K] := lstTemp.Items[K];
  if (Temperatures[K] > Max) then
    Max := Temperatures[K];
  if (Temperatures[K] < Min) then
    Min := Temperatures[K];
you actually do
for K:=0 to Count-1 do
  Temperatures[K] := lstTemp.Items[K];
if (Temperatures[K] > Max) then
  Max := Temperatures[K];
if (Temperatures[K] < Min) then
  Min := Temperatures[K];
which is nonsense. You want all these lines to be part of the for loop:
for K:=0 to Count-1 do
begin
Temperatures[K] := lstTemp.Items[K];
if (Temperatures[K] > Max) then
Max := Temperatures[K];
if (Temperatures[K] < Min) then
Min := Temperatures[K];
end;
Second, in order for this algorithm to work, the initial value of Min (Max) needs to be larger (smaller) than the values in the list. This might work for Max := 0, but probably not for Min := 0. You need to set Min to a very large value before you run the loop, obviously. The best value you can use is the highest-possible signed 32-bit integer value, that is, 2^31 - 1, which is the value of the MaxInt constant.
Third,
Temperatures[K] := lstTemp.Items[K];
is probably wrong. Temperatures is an array of integers, while lstTemp.Items[K] is a string (at least according to StrToInt(lstTemp.Items[i])), so you need
Temperatures[K] := StrToInt(lstTemp.Items[K]);
Fourth, you declare Average as an integer, but it needs to be a floating-point number (obviously), like real or double.
Fifth,
edtAvg.Text:=IntToStr(Average); //Display Average
edtAvg.Text:=IntToStr(Min); //Display Minimum Temp.
edtAvg.Text:=IntToStr(Max); //Display Maximum Temp.
is not technically incorrect, but will most likely not do what you want.
Sixth, although not an error, there is no need for you to initialise Count and Average to 0. Finally, you only need a single for loop.
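As a rough illustration of the second and final points (the starting values of Min and Max, and doing everything in one loop), here is a Python sketch. MAX_INT and the stats function are names invented for this example; MAX_INT simply mirrors the value of Delphi's MaxInt, and the sample temperatures are the ones used in another answer in this thread.

```python
MAX_INT = 2**31 - 1  # same value as MaxInt for a signed 32-bit Integer


def stats(items, initial_min=MAX_INT, initial_max=-MAX_INT - 1):
    """Single pass over string items: returns (minimum, maximum, average)."""
    lowest, highest, total = initial_min, initial_max, 0
    for text in items:
        value = int(text)            # like StrToInt: raises on non-numeric input
        lowest = min(lowest, value)
        highest = max(highest, value)
        total += value
    return lowest, highest, total / len(items)   # true division gives a float


temps = ["72", "93", "84", "76", "82"]
print(stats(temps))                 # (72, 93, 81.4)
print(stats(temps, initial_min=0))  # the minimum is wrongly reported as 0
```

The second call shows why Min := 0 does not work: no temperature in the list is below 0, so the initial value is never replaced. The sketch assumes a non-empty list; an empty one would divide by zero.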
There is (at least in Delphi 2010 - unit Math) one function that will calculate the mean and standard deviation in one step and functions that return the minimum and maximum values in an array. BTW, Mean is the arithmetic average of all the values and is the correct term. (I copied an example that I am working on and modified to your example - it compiles at least):
type
a = array of double;
var
Temperatures : a;
Average,stddev3, Max, Min : extended;
// Compiler insists on extended for these properties
begin
Max := Math.MaxValue(Temperatures);
Min := Math.MinValue(Temperatures);
Math.MeanAndStdDev(Temperatures ,Average,stddev3);
end;
For the maximum value in an array use (it takes an array of double and returns double):
function MaxValue(const Data: array of Double): Double;
For the minimum value use the corresponding:
function MinValue(const Data: array of Double): Double;
I agree that average cannot be an integer, but there are 2 similar functions for integer arrays:
function MinIntValue(const Data: array of Integer): Integer; and
function MaxIntValue(const Data: array of Integer): Integer;
0909EM's reply was very well done, but I have a few disagreements. First, I don't believe there's a need to set any sentinel value at all; simply use the first temperature value. Second, if we put a Begin and End around every single line If statement we'd approach COBOL-like levels of English verbosity. As it is, it's a crying shame this simple problem takes so much code. Third, I would not use StrToIntDef. Remember these lines from the Zen Of Python (I don't care if you don't know Python; everyone should memorize it, at least until we get an I Ching of Intersimone):
Errors should never pass silently.
Unless explicitly silenced.
If a user passes incorrect data into the temperature stats procedure, StrToIntDef is going to silently convert these values to zeroes, an unexpected and undesired behavior. The caller is going to get back answers that they assume are ok (because of no errors), yet will have incorrect values (especially the average). It is a far better thing to let the procedure blow up so testing will reveal the incorrect input.
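A small Python sketch of that concern, with made-up readings: to_int_default is a hypothetical helper that behaves roughly like StrToIntDef, while the plain int conversion behaves roughly like StrToInt.

```python
def to_int_default(text, default=0):
    """Rough analogue of StrToIntDef: swallow the error and return a default."""
    try:
        return int(text)
    except ValueError:
        return default


readings = ["72", "93", "oops", "76"]   # hypothetical input with one bad entry

silent = [to_int_default(r) for r in readings]
print(silent)                      # [72, 93, 0, 76]
print(sum(silent) / len(silent))   # 60.25, skewed by the hidden zero

try:
    strict = [int(r) for r in readings]
except ValueError as err:
    # The bad entry is reported instead of being averaged in as zero.
    print("rejected:", err)
```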
I'd also replace the For loops with For...in. I banged this together:
program temps;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils, System.Classes, Generics.Collections, Math;
Var
someTemps : TStringList;
Procedure TempStats(temperatures : TStringList);
Var
temps : TList<Real>;
minTemp, maxTemp, sumTemps : Real;
numTemps : Integer;
tempStr : String;
temp : Real;
avgTemp : Real;
Begin
numTemps := temperatures.Count;
If numTemps > 0 then
Begin
temps := TList<Real>.Create;
For tempStr in temperatures Do
temps.Add(StrToFloat(tempStr));
minTemp := temps[0];
maxTemp := temps[0];
sumTemps := 0;
For temp in temps Do
Begin
minTemp := Min(minTemp, temp);
maxTemp := Max(maxTemp, temp);
sumTemps := sumTemps + temp;
End;
avgTemp := sumTemps / numTemps;
WriteLn(avgTemp:0:2);
WriteLn(minTemp:0:2);
WriteLn(maxTemp:0:2);
temps.Free;
End
Else
WriteLn('No temperatures passed.');
End;
Begin
someTemps := TStringList.Create;
someTemps.AddStrings(TArray<String>.Create('72', '93', '84', '76', '82'));
TempStats(someTemps);
ReadLn;
someTemps.Clear;
TempStats(someTemps);
someTemps.Free;
ReadLn;
end.
Firstly, consider using StrToIntDef (String To Integer with a Default value) instead of StrToInt (String to Integer). This will yield the following:
value := StrToIntDef('Abcdef', 0); // value will be zero
vs
value := StrToInt('Abcdef'); // exception
But the question is do you want integers or floating point values for your temperatures? (eg 1 or 1.6?) If you want floating point values, maybe use StrToFloatDef...
Second, I've seen lots of grads who use Delphi make this mistake: try to always use begin and end. It helps because it makes it really clear what you are doing inside an if/for/while and what you intend to do outside.
for i := 0 to lstTemp.Items.Count - 1 do
begin
// Sum all the items in the list
Sum := Sum + StrToIntDef(lstTemp.Items[i], 0);
end;
Next up, your array is a bit pointless. The SetLength and adding items bit is OK, but it's not very functional when you could just use the items in the list. All you need to do is hang onto the max and min values.
Then your last problem is that Average isn't going to be a whole integer; it's going to have a fractional part. E.g. 5 divided by 2 is 2.5, not 2 and not 3. You could use Trunc to return just the integer part, or change Average so that it's a floating-point number...
for K:=0 to lstTemp.Items.Count-1 do
begin
if (StrToIntDef(lstTemp.Items[K], 0) > Max) then
begin
Max := StrToIntDef(lstTemp.Items[K], 0);
end;
if (StrToIntDef(lstTemp.Items[K], 1000) < Min) then // note, really high number
begin
Min := StrToIntDef(lstTemp.Items[K], 1000);
end;
end;
{Calculate Average}
Average := Trunc(Sum / Count); // do you really want to trunc this? I suspect not.
if Min = 1000 then // just in case
begin
Min := 0;
end;
The final problem you will face is that you're always setting the text of the same text box...
edtAvg.Text:=IntToStr(Average); //Display Average
edtMin.Text:=IntToStr(Min); //Display Minimum Temp. (I assume this is supposed to be edtMin)
edtMax.Text:=IntToStr(Max); //Display Maximum Temp. (I assume this is supposed to be edtMax)
I suppose the final improvement I'd make is noticing that you only need one for loop...
for K:=0 to lstTemp.Items.Count-1 do
begin
// Sum all the items in the list
Sum := Sum + StrToIntDef(lstTemp.Items[K], 0);
if (StrToIntDef(lstTemp.Items[K], Low(Integer)) > Max) then // A really low value
begin
Max := StrToIntDef(lstTemp.Items[K], Low(Integer));
end;
if (StrToIntDef(lstTemp.Items[K], High(Integer)) < Min) then // A really high value
begin
Min := StrToIntDef(lstTemp.Items[K], High(Integer));
end;
end;
The most important idea on how to solve this is to read your error messages properly. On a previous question you commented: "the error was saying it is an overloaded function or something". That attitude is not going to help you understand the problem. You need to read the error messages properly.
In this question you give the following description of your errors:
So the 2 Errors are Error: Incompatible types: got "AnsiString" expected "LongInt" This is for Average := Sum / Count; Error: Incompatible types: got "Set Of Byte" expected "Double" This Error is for Temperatures[K] := lstTemp.Items[K];
However, the description does not correspond to the errors you should be seeing based on the code provided.
It looks like you didn't read your errors, and just blindly started making changes in the hopes you would accidentally do something right. Because you didn't read the errors, you didn't notice that they changed. So when you came to us looking for help, you provided old errors with new code or vice-versa.
If you had actually read your error messages properly, you might have been able to solve the problem yourself. At the least, you would have been able to ask a better question with a description that actually matched the code.
Average := Sum / Count;
Average, Sum and Count are all declared as Integer. The error message you should be getting is: "Incompatible types: Integer and Extended".
If you read the error message, it should give you a clue to read up on Integer and Extended.
The problem here is that, in maths, division produces a Rational number. And correspondingly the result of a division operation in a program is not an Integer. So you need to declare Average as either Double or Extended.
Temperatures[K] := lstTemp.Items[K];
Temperatures is declared as an array of Integer. You haven't shown the declaration of lstTemp, but based on other code it's one of the standard Delphi Controls that has Items declared as TStrings. So the error message you should be getting is: "Incompatible types: Integer and string".
If you read the error message, it should give you a clue to do the same thing you did 5 lines earlier.
The reason for this error is that Delphi is a "strongly typed" language. The compiler tries to prevent you from making certain kinds of mistakes because it is much better to catch them early. Imagine what might happen if one of the values in lstTemp were 'Hello'. That cannot be converted to an Integer, and would cause a "run-time" error in your program.
To fix this problem you need to tell the compiler: "I know the value is a string and could be any string, but I want you to convert it to an Integer". You do this by calling the StrToInt function. NOTE: You will still get a run time error if an invalid string is passed to the function, but by being forced to explicitly do the conversion, you can think about whether you want to do some pre-validation of your input data.
You asked about the errors reported by the compiler. That's just one kind of error you'll face when programming - and usually the easiest to resolve. You'll also encounter logic errors: where your program compiles successfully, but doesn't behave correctly. Andreas's excellent answer has covered those already, so I'll not repeat them.
However, I will give you some valuable advice. Once you've gotten over the hurdle of resolving compiler errors, and are able to do so easily - you need to as quickly as possible:
Get into the habit of testing your code thoroughly.
Learn how to use the integrated debugger.
Learn about its limitations.
Learn other debugging techniques: logging, profiling, pre- and post-condition checking.
Finally, as a response to alcalde's rant about there not being any simple functions to get Min, Max, Sum or Avg: I offer another possible implementation.
Basically the rant was about the fact that he'd far rather write something along the lines of:
begin
if (lstTemp.Count > 0) then
begin
edtMin.Text := lstTemp.Min;
edtMax.Text := lstTemp.Max;
edtAvg.Text := lstTemp.Average;
end
else
begin
ShowMessage('List is empty');
end;
end;
Obviously the above code won't compile, but with a little work we can achieve something similar.
He's perfectly right on two counts: (1) that this implementation would be cleaner, much easier to maintain and with less chance of errors. (2) Delphi doesn't provide a way to simply do that.
In fact, if you follow a top-down design approach, this might be your initial pseudo-code. You should be taught about top-down design, if not demand your money back. :)
The whole point behind the top-down-design approach is that you're looking for an ideal implementation. You're not worrying about what is/isn't there. If the current library and tools don't provide a Min function, you can write your own.
You are a programmer, you have the power!
I sometimes like to call this "wishful thinking programming". You're wishing: if other things were in place, I could implement the functionality much more easily, like "this". Then you go about making your wish come true.
Without further ado, here's the implementation. You will need to use the Math unit.
type
{ We will call existing functions that take TDoubleArray as input }
TDoubleArray = array of Double;
TStringsHelper = class(TStrings)
{ A useful class to help us convert TStrings into TDoubleArray }
public
class function Using(AStrings: TStrings): TStringsHelper;
function AsDoubleArray: TDoubleArray;
end;
{ TStringsHelper }
function TStringsHelper.AsDoubleArray: TDoubleArray;
var
LoopI: Integer;
begin
SetLength(Result, Count);
for LoopI := 0 to Count - 1 do
begin
Result[LoopI] := StrToFloat(Strings[LoopI]);
end;
end;
class function TStringsHelper.Using(AStrings: TStrings): TStringsHelper;
begin
Result := TStringsHelper(AStrings);
end;
var
LTemperatures: TDoubleArray;
begin
{ This code is almost the same as our "ideal" implementation }
if (lstTemp.Items.Count > 0) then
begin
LTemperatures := TStringsHelper.Using(lstTemp.Items).AsDoubleArray;
edtMin.Text := FloatToStr(MinValue(LTemperatures));
edtMax.Text := FloatToStr(MaxValue(LTemperatures));
edtAvg.Text := FloatToStr(Mean(LTemperatures));
end
else
begin
ShowMessage('List is empty');
end;
end;
What values are in lstTemp.Items[i]?
I suppose the values are integers (without floating points), because you are using IntToStr.
Average cannot be an integer. An Integer is a whole number (4 bytes) without a fractional part: simple numbers such as 2, 3, 50, 1500, -100.
Assume that Sum = 100 and Count = 3.
What will Average be?
So you have to use a floating-point variable type, Double for example.
I hope it helps...
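To answer the Sum = 100, Count = 3 question in runnable form, here is a short Python sketch. Python's / and // operators play roughly the roles of Delphi's / (always a floating-point result) and div (integer division); the variable names are only for illustration.

```python
total, count = 100, 3

average = total / count      # true division: a float, like Delphi's "/"
whole_part = total // count  # integer division: like Delphi's "div"

print(average)               # a float slightly above 33.33
print(round(average, 2))     # 33.33
print(whole_part)            # 33
```

Storing the result of / in an integer variable is exactly what the compiler refuses to do, which is why Average has to be declared as a floating-point type.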

Delphi RichEdit find line containing string but not after parentheses

I am trying to get a routine that will find a string that does not follow a parenthesis. For instance, if the file open in the RichEdit contains these lines of CNC code, I want it to find the first two and ignore the third. In the second line it should only find and highlight the first occurrence of the search string. The search string (mach.TOOL_CHANGE_CALL) in this example is 'T'.
N1T1M6
N1T1M6(1/4-20 TAP .5 DP.)
(1/4-20 TAP .5 DP.)
I have gotten this far, but am stumped.
procedure TMainForm.ToolButton3Click(Sender: TObject); // find tool number
var
row:integer;
sel_str:string;
par:integer;
tool:integer;
tool_flag:integer ;
line_counter:integer;
tool_pos:integer;
line_begin:integer;
RE:TRichEdit;
begin
RE:=(ActiveMDIChild as TMDIChild).RichEdit1;
line_counter:=0;
tool_flag:=0;
tool_pos:=0;
row:=SendMessage(RE.Handle,EM_LINEFROMCHAR,-1, RE.SelStart);
while tool_flag =0 do
begin
RE.Perform(EM_LINESCROLL,0,line_counter);
sel_str := RE.Lines[Line_counter];
tool:=pos(mach.TOOL_CHANGE_CALL,sel_str);
par:=pos('(',sel_str);
if par=0 then
par:=pos('[',sel_str);
tool_pos:=tool_pos+length(sel_str);
if (tool>0) and (par = 0) then
begin
RE.SetFocus;
tool_pos:=tool_pos + line_counter-1;
line_begin:=tool_pos-tool;
RE.SelStart := line_begin;
RE.SelLength := Length(sel_str);
tool_flag:=1;
end;
inc (line_counter);
end;
end;
The result I get is that it ignores the third string, but it also ignores the second string. It also will not find subsequent occurrences of the string in the file; it just starts back at the beginning of the text and finds the first one again. How can I get it to find the second example and then find the next 'T' at the next click of the button? I also need it to highlight the entire line the search string was found on.
Given the samples you posted, you can use Delphi (XE and higher) regular expressions to match the text you've indicated. Here, I've put the three sample lines you've shown into a TMemo (Memo1 in the code below), evaluate the regular expression, and put the matches found into Memo2 - as long as your TRichEdit contains only plain text, you can use the same code by replacing Memo1 and Memo2 with RichEdit1 and RichEdit2 respectively.
I've updated the code in both snippets to show how to get the exact position (as an offset from the first character) and length of the match result; you can use this to highlight the match in the richedit using SelStart and SelLength.
uses
RegularExpressions;
procedure TForm1.Button1Click(Sender: TObject);
var
Regex: TRegEx;
MatchResult: TMatch;
begin
Memo1.Lines.Clear;
Memo1.Lines.Add('N1T1M6');
Memo1.Lines.Add('N1T1M6(1/4-20 TAP .5 DP.)');
Memo1.Lines.Add('(1/4-20 TAP .5 DP.)');
Memo2.Clear;
// See the text below for an explanation of the regular expression
Regex := TRegEx.Create('^\w+T\w+', [roMultiLine]);
MatchResult := Regex.Match(Memo1.Lines.Text);
while MatchResult.Success do
begin
Memo2.Lines.Add(MatchResult.Value +
' Index: ' + IntToStr(MatchResult.Index) +
' Length: ' + IntToStr(MatchResult.Length));
MatchResult := MatchResult.NextMatch;
end;
end;
If you're using a version of Delphi that doesn't include regular expression support, you can use the free TPerlRegEx with some minor code changes to produce the same results:
uses
PerlRegEx;
procedure TForm1.Button1Click(Sender: TObject);
var
Regex: TPerlRegEx;
begin
Memo1.Lines.Clear;
Memo1.Lines.Add('N1T1M6');
Memo1.Lines.Add('N1T1M6(1/4-20 TAP .5 DP.)');
Memo1.Lines.Add('(1/4-20 TAP .5 DP.)');
Memo2.Clear;
Regex := TPerlRegEx.Create;
try
Regex.RegEx := '^\w+T\w+';
Regex.Options := [preMultiLine];
Regex.Subject := Memo1.Lines.Text;
if Regex.Match then
begin
repeat
Memo2.Lines.Add(Regex.MatchedText +
' Offset: ' + IntToStr(Regex.MatchedOffset) +
' Length: ' + IntToStr(Regex.MatchedLength));
until not Regex.MatchAgain;
end;
finally
Regex.Free;
end;
end;
The regular expression above (^\w+T\w+) means:
Options: ^ and $ match at line breaks
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “T” literally «T»
Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Created with RegexBuddy
You can find a decent tutorial regarding regular expressions here. The tool I used for working out the regular expression (and actually producing much of the Delphi code for both examples) was RegexBuddy - I'm not affiliated with the company that produces it, but just a user of that product.
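For readers without Delphi at hand, the same pattern can be tried in Python's re module. This is only a sketch of the matching step, not a port of the Delphi code: the three sample lines are joined with "\n" here, and Python reports 0-based offsets, so the numbers will differ from what a Delphi memo with CR/LF line breaks reports.

```python
import re

text = "\n".join([
    "N1T1M6",
    "N1T1M6(1/4-20 TAP .5 DP.)",
    "(1/4-20 TAP .5 DP.)",
])

# re.MULTILINE makes ^ match at the start of every line, not just the string.
pattern = re.compile(r"^\w+T\w+", re.MULTILINE)

matches = [(m.group(), m.start(), len(m.group())) for m in pattern.finditer(text)]
for value, offset, length in matches:
    print(value, "offset:", offset, "length:", length)
# N1T1M6 offset: 0 length: 6
# N1T1M6 offset: 7 length: 6
```

The third line is skipped because it starts with "(", which is not a word character, and the "T" in "TAP" on the second line is never reached because the pattern is anchored to the start of the line.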

Is There A Fast GetToken Routine For Delphi?

In my program, I process millions of strings that have a special character, e.g. "|" to separate tokens within each string. I have a function to return the n'th token, and this is it:
function GetTok(const Line: string; const Delim: string; const TokenNum: Byte): string;
{ LK Feb 12, 2007 - This function has been optimized as best as possible }
var
I, P, P2: integer;
begin
P2 := Pos(Delim, Line);
if TokenNum = 1 then begin
if P2 = 0 then
Result := Line
else
Result := copy(Line, 1, P2-1);
end
else begin
P := 0; { To prevent warnings }
for I := 2 to TokenNum do begin
P := P2;
if P = 0 then break;
P2 := PosEx(Delim, Line, P+1);
end;
if P = 0 then
Result := ''
else if P2 = 0 then
Result := copy(Line, P+1, MaxInt)
else
Result := copy(Line, P+1, P2-P-1);
end;
end; { GetTok }
I developed this function back when I was using Delphi 4. It calls the very efficient PosEx routine that was originally developed by Fastcode and is now included in the StrUtils library of Delphi.
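For reference, the behaviour of GetTok (return the n'th token, or an empty string if the line has fewer tokens) can be sketched in Python as follows. The name get_tok is made up for this sketch, it assumes token_num is at least 1, and it says nothing about Delphi performance; it only pins down the expected results.

```python
def get_tok(line, delim, token_num):
    """Return the token_num-th (1-based) token of line, or '' if there are too few."""
    start = 0
    # Skip over the first token_num - 1 delimiters.
    for _ in range(token_num - 1):
        hit = line.find(delim, start)
        if hit < 0:
            return ""            # ran out of delimiters before reaching the token
        start = hit + len(delim)
    end = line.find(delim, start)
    return line[start:] if end < 0 else line[start:end]


line = "This is a string|that needs|parsing"
print([get_tok(line, "|", n) for n in (1, 2, 3, 4)])
# ['This is a string', 'that needs', 'parsing', '']
```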
I recently upgraded to Delphi 2009 and my strings are all Unicode. This GetTok function still works and still works well.
I have gone through the new libraries in Delphi 2009 and there are many new functions and additions to it.
But I have not seen a GetToken function like I need in any of the new Delphi libraries, in the various fastcode projects, and I can't find anything with a Google search other than Zarko Gajic's: Delphi Split / Tokenizer Functions, which is not as optimized as what I already have.
Any improvement, even 10% would be noticeable in my program. I know an alternative is StringLists and to always keep the tokens separate, but this has a big overhead memory-wise and I'm not sure if I did all that work to convert whether it would be any faster.
Whew. So after all this long winded talk, my question really is:
Do you know of any very fast implementations of a GetToken routine? An assembler-optimized version would be ideal.
If not, are there any optimizations that you can see to my code above that might make an improvement?
Followup: Barry Kelly mentioned a question I asked a year ago about optimizing the parsing of the lines in a file. At that time I hadn't even thought of my GetTok routine, which was not used for that read or parsing. It is only now that I saw the overhead of my GetTok routine, which led me to ask this question. Until Carl Smotricz and Barry's answers, I had never thought of connecting the two. So obvious, but it just didn't register. Thanks for pointing that out.
Yes, my Delim is a single character, so obviously I have some major optimization I can do. My use of Pos and PosEx in the GetTok routine (above) blinded me to the idea that I can do it faster with a character by character search instead, with bits of code like:
while (cp^ > #0) and (cp^ <= Delim) do
Inc(cp);
I'm going to go through everyone's answers and try the various suggestions and compare them. Then I'll post the results.
Confusion: Okay, now I'm really perplexed.
I took Carl and Barry's recommendation to go with PChars, and here is my implementation:
function GetTok(const Line: string; const Delim: string; const TokenNum: Byte): string;
{ LK Feb 12, 2007 - This function has been optimized as best as possible }
{ LK Nov 7, 2009 - Reoptimized using PChars instead of calls to Pos and PosEx }
{ See; https://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi }
var
I: integer;
PLine, PStart: PChar;
begin
PLine := PChar(Line);
PStart := PLine;
inc(PLine);
for I := 1 to TokenNum do begin
while (PLine^ <> #0) and (PLine^ <> Delim) do
inc(PLine);
if I = TokenNum then begin
SetString(Result, PStart, PLine - PStart);
break;
end;
if PLine^ = #0 then begin
Result := '';
break;
end;
inc(PLine);
PStart := PLine;
end;
end; { GetTok }
On paper, I don't think you can do much better than this.
So I put both routines to the task and used AQTime to see what's happening. The run I had included 1,108,514 calls to GetTok.
AQTime timed the original routine at 0.40 seconds. The million calls to Pos took 0.10 seconds. A half a million of the TokenNum = 1 copies took 0.10 seconds. The 600,000 PosEx calls only took 0.03 seconds.
Then I timed my new routine with AQTime for the same run and exactly the same calls. AQTime reports that my new "fast" routine took 3.65 seconds, which is 9 times as long. The culprit according to AQTime was the first loop:
while (PLine^ <> #0) and (PLine^ <> Delim) do
inc(PLine);
The while line, which was executed 18 million times, was reported at 2.66 seconds. The inc line, executed 16 million times, was said to take 0.47 seconds.
Now I thought I knew what was happening here. I had a similar problem with AQTime in a question I posed last year: Why is CharInSet faster than Case statement?
Again it was Barry Kelly who clued me in. Basically, an instrumenting profiler like AQTime does not necessarily do the job for microoptimization. It adds an overhead to each line which may swamp the results which is shown clearly in these numbers. The 34 million lines executed in my new "optimized code" overwhelm the several million lines of my original code, with apparently little or no overhead from the Pos and PosEx routines.
Barry gave me a sample of code using QueryPerformanceCounter to check that he was correct, and in that case he was.
Okay, so let's do the same now with QueryPerformanceCounter to prove that my new routine is faster and not 9 times slower as AQTime says it is. So here I go:
function TimeIt(const Title: string): double;
var i: Integer;
start, finish, freq: Int64;
Seconds: double;
begin
QueryPerformanceCounter(start);
for i := 1 to 250000 do
GetTokOld('This is a string|that needs|parsing', '|', 1);
for i := 1 to 250000 do
GetTokOld('This is a string|that needs|parsing', '|', 2);
for i := 1 to 250000 do
GetTokOld('This is a string|that needs|parsing', '|', 3);
for i := 1 to 250000 do
GetTokOld('This is a string|that needs|parsing', '|', 4);
QueryPerformanceCounter(finish);
QueryPerformanceFrequency(freq);
Seconds := (finish - start) / freq;
Result := Seconds;
end;
So this will test 1,000,000 calls to GetTok.
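The same measuring approach (read a high-resolution counter before and after a fixed batch of calls, with no instrumentation in between) looks roughly like this in Python, where time.perf_counter stands in for QueryPerformanceCounter. The routine being timed, get_tok_split, is a throwaway stand-in written for this sketch, not a translation of either Delphi version.

```python
import time


def get_tok_split(line, delim, token_num):
    """Stand-in routine to time: n'th token via split, '' if there are too few."""
    parts = line.split(delim)
    return parts[token_num - 1] if token_num <= len(parts) else ""


def time_it(func, calls_per_token=250_000):
    """Time calls_per_token calls for each of tokens 1..4; returns seconds."""
    line = "This is a string|that needs|parsing"
    start = time.perf_counter()
    for token_num in (1, 2, 3, 4):
        for _ in range(calls_per_token):
            func(line, "|", token_num)
    return time.perf_counter() - start


# A small batch keeps the demonstration quick; the default is 4 x 250,000 calls.
print(f"{time_it(get_tok_split, calls_per_token=1000):.4f} s")
```

Timings from a sketch like this are only meaningful relative to each other on the same machine, which is also how the figures in this post should be read.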
My old procedure with the Pos and PosEx calls took 0.29 seconds.
The new one with PChars took 2.07 seconds.
Now I am completely befuddled! Can anyone tell me why the PChar procedure is not only slower, but is 8 to 9 times slower!?
Mystery solved! Andreas said in his answer to change the Delim parameter from a string to a Char. I'll always be using just a Char, so at least for my implementation this is very possible. I was amazed at what happened.
The time for the 1 million calls went down from 1.88 seconds to .22 seconds.
And surprisingly, the time for my original Pos/PosEx routine went UP from .29 to .44 seconds when I changed its Delim parameter to a Char.
Frankly, I'm disappointed by Delphi's optimizer. That Delim is a constant parameter. The optimizer should have noticed that the same conversion is happening within the loop and should have moved it out so that it would only be done once.
Double checking my Code generation parameters, yes I do have Optimization True and String format checking Off.
Bottom line is that the new PChar routine with Andreas's fix is about 25% faster than my original (.22 versus .29).
I still want to follow up on the other comments here and test them out.
Turning off optimization and turning on String format checking only increases the time from .22 to .30. It adds about the same to the original.
The advantage to using assembler code, or calling routines written in assembler like Pos or PosEx is that they are NOT subject to what code generation options you have set. They will always run the same way, a pre-optimized and non-bloated way.
I have reaffirmed in the last couple of days that the best way to compare code for microoptimization is to look at and compare the assembler code in the CPU window. It would be nice if Embarcadero could make that window a bit more convenient, and allow us to copy portions to the clipboard or to print sections of it.
Also, I unfairly slammed AQTime earlier in this post, thinking that the extra time added for my new routine was solely because of the instrumentation it added. Now that I go back and check with the Char parameter instead of String, the while loop is down to .30 seconds (from 2.66) and the inc line is down to .14 seconds (from .47). Strange that the inc line would go down as well. But I'm getting worn out from all this testing already.
I took Carl's idea of looping by characters, and rewrote that code with that idea. It makes another improvement, down to .19 seconds from .22. So here is now the best so far:
function GetTok(const Line: string; const Delim: Char; const TokenNum: Byte): string;
{ LK Nov 8, 2009 - Reoptimized using PChars instead of calls to Pos and PosEx }
{ See; https://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi }
var
I, CurToken: Integer;
PLine, PStart: PChar;
begin
CurToken := 1;
PLine := PChar(Line);
PStart := PLine;
for I := 1 to length(Line) do begin
if PLine^ = Delim then begin
if CurToken = TokenNum then
break
else begin
CurToken := CurToken + 1;
inc(PLine);
PStart := PLine;
end;
end
else
inc(PLine);
end;
if CurToken = TokenNum then
SetString(Result, PStart, PLine - PStart)
else
Result := '';
end;
There may still be some minor optimizations to this, such as the CurToken = TokenNum comparison, where both operands should be the same type, Integer or Byte, whichever is faster.
But let's say, I'm happy now.
Thanks again to the StackOverflow Delphi community.
It makes a big difference what "Delim" is expected to be. If it's expected to be a single character, you're far better off stepping through the string character by character, ideally through a PChar, and testing specifically.
If it's a long string, Boyer-Moore and similar searches have a set-up phase for skip tables, and the best way would be to build the tables once, and reuse them for each subsequent find. That means you need state between calls, and this function would be better off as a method on an object instead.
You might be interested in this answer I gave to a question some time before, about the fastest way to parse a line in Delphi. (But I see that it is you that asked the question! Nevertheless, in solving your problem, I would hew to how I described parsing, not using PosEx like you are using, depending on what Delim normally looks like.)
UPDATE: OK, I spent about 40 minutes looking at this. If you know the delimiter is going to be a character, you're pretty much always better off with the second version (i.e. PChar scanning), but you have to pass Delim as a character. At the time of writing, you're converting the PLine^ expression - of type Char - to a string for comparison with Delim. That will be very slow; even indexing into the string, with Delim[1] will also be somewhat slow.
However, depending on how large your lines are, and how many delimited pieces you want to pull out, you may be better off with a resumable approach, rather than skipping unwanted delimited pieces inside the tokenizing routine. If you call GetTok with successively increasing indexes, like you are currently doing in your mini benchmark, you'll end up with O(n*n) performance, where n is the number of delimited sections. That can be turned into O(n) if you save the state of the scan and restore it for the next iteration, or pack all extracted items into an array.
Here's a version that does all tokenization once, and returns an array. It needs to tokenize twice though, in order to know how large to make the array. On the other hand, only the second tokenization needs to extract the strings:
// Do all tokenization up front.
function GetTok4(const Line: string; const Delim: Char): TArray<string>;
var
cp, start: PChar;
count: Integer;
begin
// Count sections
count := 1;
cp := PChar(Line);
start := cp;
while True do
begin
if cp^ <> #0 then
begin
if cp^ <> Delim then
Inc(cp)
else
begin
Inc(cp);
Inc(count);
end;
end
else
begin
Break; // the initial count := 1 already accounts for the last section
end;
end;
SetLength(Result, count);
cp := start;
count := 0;
while True do
begin
if cp^ <> #0 then
begin
if cp^ <> Delim then
Inc(cp)
else
begin
SetString(Result[count], start, cp - start);
Inc(cp);
start := cp; // the next section begins just past the delimiter
Inc(count);
end;
end
else
begin
SetString(Result[count], start, cp - start);
Break;
end;
end;
end;
Here's the resumable approach. The loads and stores of the current position and delimiter character do have a cost, though:
type
TTokenizer = record
private
FSource: string;
FCurrPos: PChar;
FDelim: Char;
public
procedure Reset(const ASource: string; ADelim: Char); inline;
function GetToken(out AResult: string): Boolean; inline;
end;
procedure TTokenizer.Reset(const ASource: string; ADelim: Char);
begin
FSource := ASource; // keep reference alive
FCurrPos := PChar(FSource);
FDelim := ADelim;
end;
function TTokenizer.GetToken(out AResult: string): Boolean;
var
cp, start: PChar;
delim: Char;
begin
// copy members to locals for better optimization
cp := FCurrPos;
delim := FDelim;
if cp^ = #0 then
begin
AResult := '';
Exit(False);
end;
start := cp;
while (cp^ <> #0) and (cp^ <> Delim) do
Inc(cp);
SetString(AResult, start, cp - start);
if cp^ = Delim then
Inc(cp);
FCurrPos := cp;
Result := True;
end;
Here's the full program I used for benchmarking.
Here are the results:
*** count=3, Length(src)=200
GetTok1: 595 ms
GetTok2: 547 ms
GetTok3: 2366 ms
GetTok4: 407 ms
GetTokBK: 226 ms
*** count=6, Length(src)=350
GetTok1: 1587 ms
GetTok2: 1502 ms
GetTok3: 6890 ms
GetTok4: 679 ms
GetTokBK: 334 ms
*** count=9, Length(src)=500
GetTok1: 3055 ms
GetTok2: 2912 ms
GetTok3: 13766 ms
GetTok4: 947 ms
GetTokBK: 446 ms
*** count=12, Length(src)=650
GetTok1: 4997 ms
GetTok2: 4803 ms
GetTok3: 23021 ms
GetTok4: 1213 ms
GetTokBK: 543 ms
*** count=15, Length(src)=800
GetTok1: 7417 ms
GetTok2: 7173 ms
GetTok3: 34644 ms
GetTok4: 1480 ms
GetTokBK: 653 ms
Depending on the characteristics of your data, whether the delimiter is likely to be a character or not, and how you work with it, different approaches may be faster.
(I made a mistake in my earlier program, I wasn't measuring the same operations for each style of routine. I updated the pastebin link and benchmark results.)
Your new function (the one with PChar) should declare "Delim" as Char and not as String. In your current implementation the compiler has to convert the PLine^ char into a string to compare it with "Delim". And that happens in a tight loop, resulting in an enormous performance hit.
function GetTok(const Line: string; const Delim: Char{<<==}; const TokenNum: Byte): string;
{ LK Feb 12, 2007 - This function has been optimized as best as possible }
{ LK Nov 7, 2009 - Reoptimized using PChars instead of calls to Pos and PosEx }
{ See; http://stackoverflow.com/questions/1694001/is-there-a-fast-gettoken-routine-for-delphi }
var
I: integer;
PLine, PStart: PChar;
begin
PLine := PChar(Line);
PStart := PLine;
for I := 1 to TokenNum do begin
while (PLine^ <> #0) and (PLine^ <> Delim) do
inc(PLine);
if I = TokenNum then begin
SetString(Result, PStart, PLine - PStart);
break;
end;
if PLine^ = #0 then begin
Result := '';
break;
end;
inc(PLine);
PStart := PLine;
end;
end; { GetTok }
Delphi compiles to VERY efficient code; in my experience, it was very difficult to do better in assembler.
I think you should just point a PChar (they still exist, don't they? I parted ways with Delphi around 4.0) at the beginning of the string and increment it while counting "|"s, until you've found n-1 of them. I suspect that will be faster than calling PosEx repeatedly.
Take note of that position, then increment the pointer some more until you hit the next pipe. Pull out your substring. Done.
I'm only guessing, but I wouldn't be surprised if this was close to the quickest this problem can be solved.
EDIT: Here's what I had in mind. This code is, alas, uncompiled and untested, but it should demonstrate what I meant.
In particular, Delim is treated as a single char, which I believe makes a world of difference if that will fulfill the requirements, and the character at PLine is tested only once. Finally, there's no more comparison against TokenNum; I believe it's faster to decrement a counter to 0 for counting delimiters.
function GetTok(const Line: string; const Delim: string; const TokenNum: Byte): string;
var
Del: Char;
PLine, PStart: PChar;
Nth, I, P0, P9: Integer;
begin
Del := Delim[1];
Nth := TokenNum - 1; // delimiters to skip before the wanted token starts
P0 := 1;
P9 := Length(Line) + 1;
PLine := PChar(Line);
for I := 1 to P9 do begin
if PLine^ = Del then begin
if Nth = 0 then begin
P9 := I; // this delimiter ends the wanted token
break;
end;
Dec(Nth);
if Nth = 0 then P0 := I + 1; // the wanted token starts after this delimiter
end;
Inc(PLine);
end;
if Nth = 0 then
Result := Copy(Line, P0, P9 - P0)
else
Result := '';
end;
Using assembler would be a micro-optimization. There are much greater gains to be had by optimizing the algorithm. Not doing work beats doing work in the fastest possible way, every time.
One example would be if you have places in your program where you need several tokens of the same line. Another procedure that returns an array of tokens which you can then index into should be faster than calling your function more than once, especially if you let the procedure not return all tokens, but only as many as you need.
But in general I agree with Carl's answer (+1), using a PChar for scanning would probably be faster than your current code.
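The array-returning idea can be sketched roughly as follows. This is only an illustration under stated assumptions: GetFirstTokens, TTokenArray and the MaxTokens parameter are hypothetical names that do not appear in the question, and the delimiter is assumed to be a single Char.

```pascal
program FirstTokens;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TTokenArray = array of string;

// Hypothetical routine: returns at most MaxTokens leading tokens of Line
// and stops scanning as soon as enough tokens have been collected.
function GetFirstTokens(const Line: string; const Delim: Char;
  MaxTokens: Integer): TTokenArray;
var
  P, PStart: PChar;
  Count: Integer;
begin
  Result := nil;
  if MaxTokens <= 0 then
    Exit;
  SetLength(Result, MaxTokens);
  Count := 0;
  P := PChar(Line);
  PStart := P;
  while Count < MaxTokens do
  begin
    if (P^ = Delim) or (P^ = #0) then
    begin
      SetString(Result[Count], PStart, P - PStart);
      Inc(Count);
      if P^ = #0 then
        Break;              // end of line: nothing left to scan
      PStart := P + 1;      // next token starts after the delimiter
    end;
    Inc(P);
  end;
  SetLength(Result, Count); // trim if the line had fewer tokens
end;

var
  Toks: TTokenArray;
  I: Integer;
begin
  Toks := GetFirstTokens('This is a string|that needs|parsing', '|', 2);
  for I := 0 to High(Toks) do
    Writeln(Toks[I]);       // prints the first two tokens, one per line
end.
```

The line is scanned once, and the scan ends early when only the leading tokens are needed, which is where the saving over repeated GetTok calls would come from.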
This is a function that I've had in my personal library for quite some time that I use extensively. I believe this is the most current version of it. I've had multiple versions in the past being optimized for a variety of different reasons. This one tries to take into account Quoted strings, but if that code is removed it makes the function a slight bit faster.
I actually have a number of other routines, CountSections and ParseSectionPOS being a couple of examples.
Unfortunately this routine is ANSI/PChar based only, although I don't think it would be difficult to move it to Unicode. Maybe I've already done that... I'll have to check on that.
Note: This routine is 1 based in the ParseNum indexing.
function ParseSection(ParseLine: string; ParseNum: Integer; ParseSep: Char; QuotedStrChar:char = #0) : string;
var
wStart, wEnd : integer;
wIndex : integer;
wLen : integer;
wQuotedString : boolean;
begin
result := '';
wQuotedString := false;
if not (ParseLine = '') then
begin
wIndex := 1;
wStart := 1;
wEnd := 1;
wLen := Length(ParseLine);
while wEnd <= wLen do
begin
if (QuotedStrChar <> #0) and (ParseLine[wEnd] = QuotedStrChar) then
wQuotedString := not wQuotedString;
if not wQuotedString and (ParseLine[wEnd] = ParseSep) then
begin
if wIndex=ParseNum then
break
else
begin
inc(wIndex);
wStart := wEnd+1;
end;
end;
inc(wEnd);
end;
result := copy(ParseLine, wStart, wEnd-wStart);
if (length(result) > 0) and (QuotedStrChar <> #0) and (result[1] = QuotedStrChar) then
result := AnsiDequotedStr(result, QuotedStrChar);
end;
end; { ParseSection }
In your code, I think this is the only line that can be optimized:
Result := copy(Line, P+1, MaxInt)
If you calculate the new Length there, it might get a bit faster, but not the 10% you are looking for.
Your tokenizing algorithm seems pretty OK.
For optimizing it, I would run it through a profiler (like AQTime from AutomatedQA) with a representative subset of your production data. That will point you to the weakest spot.
The only RTL function that comes close is this one in the Classes unit:
procedure TStrings.SetDelimitedText(const Value: string);
It tokenizes, but it honors both QuoteChar and Delimiter, whereas you only use a Delimiter.
It uses the SetString function in the System unit which is a pretty fast way to set the content of a string based on a PChar/PAnsiChar/PUnicodeChar and a length.
That might get you some improvement as well; on the other hand, Copy is really fast too.
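For comparison, the RTL route goes through the DelimitedText property of TStringList. The following is a minimal sketch, assuming a Delphi version that has StrictDelimiter and a project where the Classes unit name resolves without a scope prefix; it makes no claim about speed relative to GetTok.

```pascal
program DelimitedTextDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

var
  Parts: TStringList;
  I: Integer;
begin
  Parts := TStringList.Create;
  try
    Parts.Delimiter := '|';
    Parts.StrictDelimiter := True; // without this, spaces also act as separators
    Parts.DelimitedText := 'This is a string|that needs|parsing';
    for I := 0 to Parts.Count - 1 do
      Writeln(Parts[I]);           // one token per line, three in total
  finally
    Parts.Free;
  end;
end.
```

QuoteChar is still honored in this mode, so lines that contain the quote character may tokenize differently from a plain delimiter scan.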
I'm not the person who always blames the algorithm, but if I look at the first piece of source,
the problem is that for token N, you redo the Pos/PosEx calls for tokens 1..N-1 as well.
This means that for N items you do N + (N-1) + (N-2) + ... + 1 Pos calls (roughly 0.5*N^2), while only N are needed.
If you simply cache the position of the last found result, e.g. in a record that is passed by VAR parameter, you can gain a lot.
type
TLastPosition = record
elementnr : integer; // last tokennumber
elementpos: integer; // character index of last match
end;
and then something like:
if tokennum=(lastposition.elementnr+1) then
begin
newpos:=posex(delim,line,lastposition.elementpos);
end;
Unfortunately, I don't have the time now to write it out, but I hope you get the idea.
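Written out, the cached-position idea might look roughly like this. It is a sketch under assumptions rather than the answerer's code: NextTok is a hypothetical name, the delimiter is assumed to be one character long, tokens are requested strictly in order, and the record must be zeroed before the first call.

```pascal
program CachedTok;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  StrUtils;

type
  TLastPosition = record
    elementnr: Integer;  // last token number
    elementpos: Integer; // character index of last match (0 before first call)
  end;

// Hypothetical sequential variant: each call resumes where the previous
// one stopped, so a full pass costs one PosEx call per token.
function NextTok(const Line, Delim: string; var Last: TLastPosition;
  out Tok: string): Boolean;
var
  StartPos, NewPos: Integer;
begin
  Tok := '';
  StartPos := Last.elementpos + 1;
  Result := StartPos <= Length(Line) + 1;
  if not Result then
    Exit;                           // previous call consumed the last token
  NewPos := PosEx(Delim, Line, StartPos);
  if NewPos = 0 then
    NewPos := Length(Line) + 1;     // final token runs to the end of the line
  Tok := Copy(Line, StartPos, NewPos - StartPos);
  Inc(Last.elementnr);
  Last.elementpos := NewPos;
end;

var
  Last: TLastPosition;
  Tok: string;
begin
  Last.elementnr := 0;
  Last.elementpos := 0;
  while NextTok('This is a string|that needs|parsing', '|', Last, Tok) do
    Writeln(Last.elementnr, ': ', Tok);
end.
```

A caller that needs random access would still have to fall back to the original routine whenever the requested token number is not Last.elementnr + 1.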
