I have the following code:
procedure p1(const s: string);
var
  i, l: integer;
  function skip: boolean; // inline not possible
  begin
    while (i <= l) and (s[i] <= ' ') do inc(i);
    result := i <= l;
  end;
begin
  // skip() is VERY often called here
end;

procedure p2(const s: string);
  function skip(const s: string; var i: integer; l: integer): boolean; inline;
  begin
    while (i <= l) and (s[i] <= ' ') do inc(i);
    result := i <= l;
  end;
var
  i, l: integer;
begin
  // skip(s,i,l) is VERY often called here
end;
Which one would you prefer?
The first one is more readable, but slower, because skip() cannot be inlined.
The second one is faster, but very ugly, because all the parameters must be specified on every call. Do you know another solution that is both readable and fast?
Don't prematurely optimize.
Stick with the clearer code unless you really need the performance boost.
The clearer one being the second one.
The 2nd one is more legible. I'm not dogmatically opposed to the use of global variables, but the 2nd example just looks cleaner.
Since the 2nd example is faster as well... then your answer is simple. The 2nd one.
....and as a side note, if you really need that much speed out of the thing, inline asm and unrolling the loop can be possible options... but I don't know how this code is being used, or if that would make a difference.
Related
I have a TMemo that displays text from a query. I would like to remove all chars between '{' and '}' so this string '{color:black}😊{color}{color:black}{color}' would end up like this 😊.
MemoComments.Lines.Text := StringReplace(MemoComments.Lines.Text, '{'+ * +'}', '', rfReplaceAll);
I know that the * in my code is wrong. It's just a placeholder. How can I do this the right way?
Is this possible, or do I have to create a complicated loop?
This is a case where you can use a regular expression. I trust someone will publish such an answer for you very shortly.
However, just for the sake of completeness, I want to show that a loop-based approach isn't complicated at all, but rather straightforward:
function ExtractContent(const S: string): string;
var
  i, c: Integer;
  InBracket: Boolean;
begin
  SetLength(Result, S.Length);
  InBracket := False;
  c := 0;
  for i := 1 to S.Length do
  begin
    if S[i] = '{' then
      InBracket := True
    else if S[i] = '}' then
      InBracket := False
    else if not InBracket then
    begin
      Inc(c);
      Result[c] := S[i];
    end;
  end;
  SetLength(Result, c);
end;
Notice that I avoid unnecessary heap allocations.
(Personally, I have never been a huge fan of regular expressions. To me, the correctness of the above algorithm is obvious, it can only be interpreted in one way, and it is clearly written in a performant way. A regex, on the other hand, is a bit more like "magic". But I am a bit of a dinosaur, I admit that.)
Looks like you want a sort of regular expression, which Delphi fortunately offers in their RTL.
s := TRegEx.Replace('{color:black}😊{color}{color:black}{color}', '{.*?}', '', []);
or using the memo:
MemoComments.Lines.Text := TRegEx.Replace(MemoComments.Lines.Text, '{.*?}', '', []);
In this expression, {.*?}, .*? means any number (*) of any character (.), but as few as possible to match the rest of the expression (*?). That last bit is very powerful. By default, regexes are 'greedy', which means that .* would just match as many characters as possible, so it would take everything up to the last }, including the smiley and all the other color codes in between.
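The difference between the lazy and the greedy form can be seen in a minimal console sketch. It assumes a Delphi version that ships System.RegularExpressions (the same unit the calls above rely on); the program name and the shortened ASCII sample string are made up for illustration.

```pascal
program LazyVsGreedy;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // Hypothetical sample, same shape as the string in the question
  Sample = '{color:black}Hi{color}{color:black}{color}';

begin
  // Lazy .*? stops at the first closing brace, so every tag is
  // matched on its own and only the tags are removed.
  WriteLn('[', TRegEx.Replace(Sample, '{.*?}', '', []), ']');  // [Hi]

  // Greedy .* runs to the last closing brace, so the whole string
  // is a single match and nothing survives.
  WriteLn('[', TRegEx.Replace(Sample, '{.*}', '', []), ']');   // []
end.
```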
Pitfalls/cons
Like Andreas, I'm not a huge fan of regular expressions either. The awkward syntax can be hard to decipher, especially if you don't use them a lot.
Also, a seemingly simple regex can be hard to execute making it actually very slow sometimes, especially when working with larger strings. I recently bumped into one that was so magical, it was stuck for minutes on verifying whether a string of about 1000 characters matched a certain pattern.
The used expression is actually an example of that. It will have to look forward after the .*? part, to check whether it can satisfy the rest of the expression already. If not, go back, take another character, and look forward again. For this expression that's not an issue, but if an expression has multiple parts of variable length, this can be a CPU intensive process!
My earlier version, {[^}]*} is, theoretically at least, more efficient, because instead of any character, it just matches all characters that are not a }. Easier to execute, but harder to read. In the answer above I went for readability over performance, but it's always something to keep in mind.
Note that my first version, \{[^\}]*\} looked even more convoluted. I was using \ to escape the braces, since they also have a special meaning in regexes (they delimit quantifiers such as {2,5}), but escaping doesn't seem necessary in this case.
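As a small sketch under the same assumptions (System.RegularExpressions available, made-up program name and sample string), the negated-class variants can be checked against the same kind of input. Both forms should strip the tags just like the lazy pattern does:

```pascal
program NegatedClass;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // Hypothetical sample, same shape as the string in the question
  Sample = '{color:black}Hi{color}{color:black}{color}';

begin
  // [^}]* consumes anything except a closing brace, so a match
  // can never run past the end of its own tag.
  WriteLn(TRegEx.Replace(Sample, '{[^}]*}', '', []));     // Hi

  // The escaped form describes the same pattern.
  WriteLn(TRegEx.Replace(Sample, '\{[^\}]*\}', '', []));  // Hi
end.
```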
Lastly, there are different regex dialects, which is not helpful either.
That said
Fortunately Delphi wraps the PCRE library, which is open source, highly optimized, well maintained, well documented, and implements the most commonly used dialect.
And for operations like this they can be brief and easy to write, fast enough to use, and if you use them more often, it also becomes easier to read and write them, especially if you use a tool like regex101.com, where you can try out and debug regexes.
This question is about how to code a spell corrector and is not a duplicate of Delphi Spell Checker component.
Two years ago, I found and used Peter Norvig's Python spell corrector from his website, but its performance did not seem high. Interestingly, implementations of the same task in more languages have recently been added to the list on his page.
Some lines in Peter's page include syntax like:
[a + c + b for a, b in splits for c in alphabet]
How would this be translated into Delphi?
I am interested in how the Delphi experts at SO would apply the same theory to the same task in a reasonable number of lines, with mediocre or better performance. This is not meant to put down any language but to learn how each one implements the task differently.
Thanks so much in advance.
[Edit]
I will quote Marcelo Toledo, who contributed the C version: "...While the purpose of this article [C version] was to show the algorithms, not to highlight Python...". Although his C version has the second-highest line count, according to his article it performs well when the dictionary file is huge. So this question is not meant to highlight any language but to ask for a Delphi solution, and it is not at all intended as a competition, though Peter is influential in directing Google Research.
[Update]
I was enlightened by David's suggestion and studied the theory and routines on Peter's page. I wrote a very rough and inefficient routine; unlike the versions in other languages, mine is a GUI program. I am a beginner in Delphi and dare not post my complete code (it is poorly written), so I will outline how I did it. Comments are welcome so that the routine can be improved.
My hardware and software are old, but they are enough for my work (my specialization is not computer- or programming-related).
AMD Athlon Dual Core Processor
2.01 GHz, 480 MB memory
Windows XP SP2
IDE Delphi 7.0
This is the snapshot and record of processing time of 'correct' word.
I tried GetTickCount, TDateTime, and QueryPerformanceCounter to measure the time taken to correct a word, but GetTickCount and TDateTime reported 0 ms for each check, so I had to use QueryPerformanceCounter. Maybe there are other ways to measure it more precisely.
The total is 72 lines, not including the function that records the check time. The number of lines may not be a yardstick, as Marcelo mentioned above. This post is to discuss how to do the task differently. Delphi experts at SO will of course use the minimum number of lines with the best performance.
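For reference, a minimal sketch of timing with QueryPerformanceCounter is shown below. It is Windows-only, the program name is made up, and the summing loop is only a stand-in for the real call to correct(). On newer Delphi versions the unit may need to be referenced as Winapi.Windows.

```pascal
program TimingSketch;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  Freq, StartTicks, StopTicks: Int64;
  i: Integer;
  Sum: Int64;
begin
  // Ticks per second of the high-resolution counter
  if not QueryPerformanceFrequency(Freq) then
  begin
    WriteLn('No high-resolution counter available');
    Halt(1);
  end;

  QueryPerformanceCounter(StartTicks);

  // Stand-in workload; replace with the code being measured
  Sum := 0;
  for i := 1 to 1000000 do
    Inc(Sum, i);

  QueryPerformanceCounter(StopTicks);

  // Convert the tick difference to milliseconds
  WriteLn(Format('Elapsed: %.3f ms (sum = %d)',
    [(StopTicks - StartTicks) * 1000.0 / Freq, Sum]));
end.
```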
procedure Tmajorform.FormCreate(Sender: TObject);
begin
  loaddict;
end;

procedure Tmajorform.loaddict;
var
  fs: TFilestream;
  templist: TStringlist;
  p1: tperlregex;
  w1: string;
begin
  // load that big.txt (6.3M, is Adventures of Sherlock Holmes)
  // templist.loadfromstream
  // Use Tperlregex to tokenize (I used the regular expression library by Jan Goyvaerts)
  // The load and tokenize time is about 7-8 seconds on my machine. Maybe there are other ways to
  // speed up loading and tokenizing.
end;

procedure Tmajorform.edits1(str: string);
var
  i: integer;
  ch: char;
begin
  // This is to simulate Peter's page in order to quickly generate all possible combinations.
  // I do not know how to use a set in Delphi. I used an array.
  // Peter said his routine edits1 would generate 494 elements for 'something'. Mine will
  // generate 469. I do not know why. Before ignoring duplicates, mine is over 500. After setting
  // duplicate ignore, there are 469 unique elements for 'something'.
end;

procedure Tmajorform.correct(str: string);
var
  i, j: integer;
begin
  // This is a loop and binary search to add candidate words to the list.
end;

procedure Tmajorform.Button2Click(Sender: TObject);
var
  str: string;
begin
  // Trigger correct(str: string);
end;
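As a rough, hedged sketch of how edits1 could look in Delphi 7-compatible code: the standalone procedure below mirrors the four edit kinds from Peter's page (delete, transpose, replace, insert) and uses a sorted TStringList with Duplicates = dupIgnore as a stand-in for a set. The program name, the Edits1 signature and the variable names are assumptions for illustration, not the code from the question. Comparing its printed count with your own may help locate the discrepancy mentioned in the comments above.

```pascal
program Edits1Sketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Adds every string at edit distance 1 from AWord to Dest.
// Dest should be sorted with Duplicates = dupIgnore so it behaves like a set.
procedure Edits1(const AWord: string; Dest: TStringList);
var
  i: Integer;
  c: Char;
  a, b: string;
begin
  for i := 0 to Length(AWord) do
  begin
    a := Copy(AWord, 1, i);          // left part of the split
    b := Copy(AWord, i + 1, MaxInt); // right part of the split

    if b <> '' then
      Dest.Add(a + Copy(b, 2, MaxInt));                // delete
    if Length(b) > 1 then
      Dest.Add(a + b[2] + b[1] + Copy(b, 3, MaxInt));  // transpose

    for c := 'a' to 'z' do
    begin
      if b <> '' then
        Dest.Add(a + c + Copy(b, 2, MaxInt));          // replace
      Dest.Add(a + c + b);                             // insert
    end;
  end;
end;

var
  Variants: TStringList;
begin
  Variants := TStringList.Create;
  try
    Variants.Sorted := True;
    Variants.Duplicates := dupIgnore; // silently drop repeated candidates
    Edits1('something', Variants);
    WriteLn('Unique candidates: ', Variants.Count);
  finally
    Variants.Free;
  end;
end.
```

Because the list stays sorted, TStringList.Find can later serve as the binary search when checking candidates against the dictionary.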
It seems that using TFileStream speeds up loading by 1-2 seconds. I tried using the CreateFileMapping method but failed, and it seemed a little complicated. Maybe there are other ways to load a huge file quickly. Since big.txt is not actually big compared with the corpora that are available, there should be a more efficient way to load larger and larger files.
Another point is that Delphi 7.0 does not have built-in regular expressions. I had a look at the other languages that do the spell check on Peter's page, and they largely call their built-in regular expressions directly. Of course, a real expert does not need any built-in class or library and can build one himself. For a beginner, some classes or libraries are a convenience.
Your comments are welcome.
[Update]
I continued the research and added an edits2 function (edit distance 2). This adds about another 12 lines of code. Peter said edit distance 2 would cover almost all possibilities: 'something' will have 114,324 possibilities. My function generates 102,727 UNIQUE possibilities for it. Of course, the list of suggested words also grows.
With edits2, the response time for a correction is noticeably longer, as it increases the data by about 200 times. But I find that some suggested corrections are clearly implausible, since a typist would not make the kind of error that leads to them. So edit distance 1 will be better, provided that the big.txt file is sufficiently big to include more correct words.
Below is the snapshot of tracking edits 2 correct time.
This is a Python list comprehension. It forms the Cartesian product of splits and alphabet.
Each item of splits is a tuple which is unpacked into a and b. Each item of alphabet is put into a variable called c. Then the 3 variables are concatenated, assuming that they are strings. The result of the list comprehension expression is a list containing elements of the form a + c + b, one element for each item in the Cartesian product.
In Python it could be written equivalently as
res = []
for a, b in splits:
    for c in alphabet:
        res.append(a + c + b)
In Delphi it would be
res := TStringList.Create;
// for-in loops need Delphi 2005 or later; use indexed loops in Delphi 7
for split in splits do
  for c in alphabet do
    res.Add(split.a + c + split.b);
I suggest you read up on Python list comprehensions to get a better understanding of this very powerful Python feature.
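Since the question mentions Delphi 7, which has no for-in loop, here is a minimal sketch of the same Cartesian product with indexed loops. The TSplit record, the shortened alphabet and the sample word are all made up to keep the output small.

```pascal
program InsertsSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical record standing in for Python's (a, b) tuple
  TSplit = record
    a, b: string;
  end;

const
  Alphabet = 'abc'; // shortened for the example
  Source = 'hi';    // sample word

var
  Splits: array of TSplit;
  Res: TStringList;
  i, j: Integer;
begin
  // Build every (left, right) split of the word, including the empty ends
  SetLength(Splits, Length(Source) + 1);
  for i := 0 to Length(Source) do
  begin
    Splits[i].a := Copy(Source, 1, i);
    Splits[i].b := Copy(Source, i + 1, MaxInt);
  end;

  Res := TStringList.Create;
  try
    // Equivalent of [a + c + b for a, b in splits for c in alphabet]
    for i := 0 to High(Splits) do
      for j := 1 to Length(Alphabet) do
        Res.Add(Splits[i].a + Alphabet[j] + Splits[i].b);

    WriteLn(Res.CommaText); // ahi,bhi,chi,hai,hbi,hci,hia,hib,hic
  finally
    Res.Free;
  end;
end.
```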
How to get the entire code of a method in memory so I can calculate its hash at runtime?
I need to make a function like this:
type
  TProcedureOfObject = procedure of object;

function TForm1.CalculateHashValue(AMethod: TProcedureOfObject): string;
var
  MemStream: TMemoryStream;
begin
  result := '';
  MemStream := TMemoryStream.Create;
  try
    // how to get the code of AMethod into TMemoryStream?
    result := MD5(MemStream); // I already have the MD5 function
  finally
    MemStream.Free;
  end;
end;
I use Delphi 7.
Edit:
Thank you to Marcelo Cantos & gabr for pointing out that there is no consistent way to find the procedure size due to compiler optimization. And thank you to Ken Bourassa for reminding me of the risks. The target procedure (the procedure whose hash I would like to compute) is my own and I don't call any other routines from it, so I can guarantee that it won't change.
After reading the answers and Delphi 7 help file about the $O directive, I have an idea.
I'll make the target procedure like this:
{$O-} // $O applies to whole routines, so it is switched outside the procedure
procedure TForm1.TargetProcedure(Sender: TObject);
begin
  // do things here
  asm
    nop;
    nop;
    nop;
    nop;
    nop;
  end;
end;
{$O+}
The 5 successive nops at the end of the procedure would act like a bookmark. One could predict the end of the procedure with gabr's trick, and then scan for the 5 nops nearby to find out the hopefully correct size.
Now while this idea sounds worth trying, I...uhm... don't know how to put it into working Delphi code. I have no experience with lower-level programming, such as how to get the entry point and put the entire code of the target procedure into a TMemoryStream while scanning for the 5 nops.
I'd be very grateful if someone could show me some practical examples.
Marcelo has correctly stated that this is not possible in general.
The usual workaround is to use an address of the method that you want to calculate the hash for and an address of the next method. For the time being the compiler lays out methods in the same order as they are defined in the source code and this trick works.
Be aware that subtracting two method addresses may give you a slightly too large result: the first method may actually end a few bytes before the next method starts.
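A minimal sketch of this workaround, using plain procedures rather than methods to keep it short. TargetProcedureEnd is a made-up marker routine, and the whole approach rests on the assumption stated above that the compiler emits routines in source order. For a method, the address would be taken as @TForm1.TargetProcedure instead.

```pascal
program CodeSizeSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

procedure TargetProcedure;
begin
  WriteLn('doing the real work');
end;

// Hypothetical marker routine: it must stay directly after
// TargetProcedure in the source for the subtraction to make sense.
procedure TargetProcedureEnd;
begin
end;

var
  Stream: TMemoryStream;
  StartAddr, EndAddr: PAnsiChar;
begin
  StartAddr := PAnsiChar(@TargetProcedure);
  EndAddr := PAnsiChar(@TargetProcedureEnd);

  Stream := TMemoryStream.Create;
  try
    // Copy the bytes between the two entry points; this may include
    // a few padding bytes after the real end of TargetProcedure.
    if EndAddr > StartAddr then
      Stream.WriteBuffer(StartAddr^, EndAddr - StartAddr);
    WriteLn('Captured ', Stream.Size, ' bytes');
    // Stream could now be handed to the existing MD5 function.
  finally
    Stream.Free;
  end;
end.
```

If the size comes out as 0, the layout assumption did not hold for that build and the result should not be trusted.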
The only way I can think of, is turning on TD32 debuginfo, and try JCLDebug to see if you can find the length in the debuginfo using it. Relocation shouldn't affect the length, so the length in the binary should be the same as in mem.
Another way would be to scan the code for a ret (or ret imm16) opcode. That is less safe, but probably would guard at least part of the function, without having to mess with debuginfo.
The potential deal breaker though is short routines that are tail-call optimized (iow they jump instead of ret). But I don't know if Delphi does that.
You might struggle with this. Functions are defined by their entry point, but I don't think that there is any consistent way to find out the size. In fact, optimisers can do screwy things like merge two similar functions into a common shared function with multiple entry points (whether or not Delphi does stuff like this, I don't know).
EDIT: The 5-nop trick isn't guaranteed to work either. In addition to Remy's caveats (see his comment below), the compiler merely has to guarantee that the nops are the last thing to execute, not that they are the last thing to appear in the function's binary image. Turning off optimisations is a rather baroque "solution" that still won't fix all the issues that others have raised.
In short, there are simply too many variables here for what you are trying to do. A better approach would be to target compilation units for checksumming (assuming it satisfies whatever overall objective you have).
I achieve this by letting Delphi generate a MAP file and sorting the symbols by start address in ascending order. The length of each procedure or method is then the next symbol's start address minus this symbol's start address. This is most likely as brittle as the other solutions suggested here, but I have this code working in production right now and it has worked fine for me so far.
My implementation that reads the map-file and calculate sizes can be found here at line 3615 (TEditorForm.RemoveUnusedCode).
Even if you achieve it, there are a few things you need to be aware of...
The hash will change many times, even if the function itself didn't change.
For example, the hash will change if your function calls another function whose address changed since the last build. I think the hash might also change if your function calls itself recursively and your unit (not necessarily your function) changed since the last build.
As for how it could be achieved, gabr's suggestion seems to be the best one... But it's really prone to break over time.
Code Complete says it is good practice to always use block identifiers, both for clarity and as a defensive measure.
Since reading that book, I've been doing that religiously. Sometimes it seems excessive though, as in the case below.
Is Steve McConnell right to insist on always using block identifiers? Which of these would you use?
//naughty and brief
with myGrid do
  for currRow := FixedRows to RowCount - 1 do
    if RowChanged(currRow) then
      if not(RecordExists(currRow)) then
        InsertNewRecord(currRow)
      else
        UpdateExistingRecord(currRow);

//well behaved and verbose
with myGrid do begin
  for currRow := FixedRows to RowCount - 1 do begin
    if RowChanged(currRow) then begin
      if not(RecordExists(currRow)) then begin
        InsertNewRecord(currRow);
      end //if it didn't exist, so insert it
      else begin
        UpdateExistingRecord(currRow);
      end; //else it existed, so update it
    end; //if any change
  end; //for each row in the grid
end; //with myGrid
I have always followed the 'well-behaved and verbose' style, except for those unnecessary extra comments at the ends of blocks.
Somehow it makes more sense to be able to look at code and make sense of it quickly than to spend at least a couple of seconds deciphering which block ends where.
Tip: in Visual Studio, the keyboard shortcut Ctrl + ] jumps between the beginning and end of a C# block.
If you use Visual Studio, having curly braces at the beginning and end of a C# block also helps because this shortcut lets you jump between them.
Personally, I prefer the first one, as IMHO the "end;" doesn't tell me much, and once everything is close together, I can tell by the indentation what happens when.
I believe blocks are more useful when having large statements. You could make a mixed approach, where you insert a few "begin ... end;"s and comment what they are ending (for instance use it for the with and the first if).
IMHO you could also break this into more methods, for example, the part
if not(RecordExists(currRow)) then begin
  InsertNewRecord(currRow);
end //if it didn't exist, so insert it
else begin
  UpdateExistingRecord(currRow);
end; //else it existed, so update it
could be in a separate method.
I would use whichever my company has set for its coding standards.
That being said, I would prefer to use the second, more verbose, block. It is a lot easier to read. I might, however, leave off the block-ending comments in some cases.
I think it depends somewhat on the situation. Sometimes you simply have a method like this:
void Foo(bool state)
{
    if (state)
        TakeActionA();
    else
        TakeActionB();
}
I don't see how making it look like this:
void Foo(bool state)
{
    if (state)
    {
        TakeActionA();
    }
    else
    {
        TakeActionB();
    }
}
improves readability at all.
I'm a Python developer, so I see no need for block identifiers. I'm quite happy without them. Indentation is enough of an indicator for me.
Block identifiers are not only easier to read, they are also much less error-prone when you change something in the if/else logic, or simply add a line without noticing that it is not in the same logical block as the rest of the code.
I would use the second code block. The first one looks prettier and more familiar, but I think this is a problem of the language and not of the block identifiers.
If it is possible I use checkstyle to ensure that brackets are used.
If I remember correctly, CC also gave some advice about comments, especially about not restating in comments what the code does, but explaining why it does what it does.
I'd say he's right just for the sake that the code can still be interpreted correctly if the indentation is incorrect. I always like to be able to find the start and end block identifiers for loops when I skim through code, and not rely on proper indentation.
It's never always one way or the other. Because I trust myself, I would use the shorter, more terse style. But if you're in a team environment where not everyone is of the same skill and maintainability is important, you may want to opt for the latter.
My knee-jerk reaction would be the second listing (with the repetitive comments removed from the end of the lines, like everyone's been saying), but after thinking about it more deeply I'd go with the first plus a one or two line useful comment beforehand explaining what's going on (if needed). Obviously in this toy example, even the comment before the concise answer would probably not be needed, but in other examples it might.
Having less (but still readable) and easy to understand code on the screen helps keep your brain space free for future parts of the code IMO.
I'm with those who prefer more concise code.
And it looks like preferring a verbose version to a concise one is more a matter of personal choice than of universal suitability. (Well, within a company it may become a (mini-)universal rule.)
It's like excessive parentheses: some people prefer it like (F1 and F2) or ((not F2) and F3) or (A - (B * C)) < 0, and not necessarily because they do not know about the precedence rules. It's just more clear to them that way.
I vote for a happy medium. The rule I would use is to use the bracketing keywords any time the content is multiple lines. In action:
// clear and succinct
with myGrid do begin
  for currRow := FixedRows to RowCount - 1 do begin
    if RowChanged(currRow) then begin
      if not(RecordExists(currRow)) then
        InsertNewRecord(currRow)
      else
        UpdateExistingRecord(currRow);
    end; // if RowChanged
  end; // next currRow
end; // with myGrid
Commenting the end is really useful for HTML-like languages, and also for malformed C code such as an endless succession of if/else/if/else.
Frequent // comments at the end of code lines (per your Well Behaved and Verbose example) make the code harder to read imho -- when I see them I end up scanning the 'obvious' comments for something special that typically isn't there.
I prefer comments only where the obvious isn't (i.e. overall and / or unique functionality)
Personally I recommend always using block identifiers in languages that support them (but follow your company's coding standards, as @Muad'Dib suggests).
The reason is that, in non-Pythonic languages, whitespace is (generally) not meaningful to the compiler but it is to humans.
So
with myGrid do
  for currRow := FixedRows to RowCount - 1 do
    if RowChanged(currRow) then
      Log(currRow);
      if not(RecordExists(currRow)) then
        InsertNewRecord(currRow)
      else
        UpdateExistingRecord(currRow);
appears to do one thing but does something quite different.
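The effect can be reproduced in a tiny self-contained sketch. The program and variable names are made up, and the two counters merely stand in for Log and the record check:

```pascal
program MisleadingIndent;
{$APPTYPE CONSOLE}

var
  i, Logged, Checked: Integer;
begin
  Logged := 0;
  Checked := 0;

  for i := 1 to 3 do
    if Odd(i) then
      Inc(Logged);
      Inc(Checked); // indented like part of the loop, but it runs once, after the loop

  WriteLn(Logged, ' ', Checked); // prints "2 1"
end.
```

With begin..end around the loop body, Checked would end up as 3, which is what the indentation suggests.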
I would eliminate the end-of-line comments, though. Use an IDE that highlights blocks. I think Castalia will do that for Delphi. How often do you read code printouts anymore?
Every once in a while I'm editing some long pair of if-then-else statements (or worse, nested if-then-else statements) , like, say, this:
if A < B then
begin
  DoSomething;
  DoSomethingElse;
  {...and more statements going on and on and on...}
  FinallyWrapUpThisBit;
end
else
begin
  DoThis;
  DoThat;
  {...and more statements going on and on and on...}
  FinallyWrapUpThisBit;
end;
...and I find myself wanting to "collapse" the first begin-end pair to bring up the lower "else" part (usually because I'm referring to something above the if-then statement). Maybe it would just say "begin..." and have a [+] sign to the left of it to expand it out again.
I've explored the "fold" functions in the IDE, but none of the commands seem to do this. It seems like my CodeRush for my old D6 did this, but I could be imagining things. (I have a very active imagination...).
Do any of the IDE plug-ins like Castalia (or some other one) do this?
With plain Delphi out of the box, you would have to surround your begin...end with
{$region 'begin...end'}
....
{$endregion}
which can be done through a code template...
I remember Castalia for the nice colored visualization of code blocks (begin..end) but I don't remember if it was foldable.
Use the refactoring tools to move the conditional branches' code into separate functions. Then you won't need to fold anything. You might also find that you can merge code that's common to the two branches, such as that call to FinallyWrapUpThisBit.
Another big helper here would be CNPack. It is a wizard which installs into Delphi and will colorize your begin/end pairs, making it MUCH easier to follow the code. It doesn't exactly do code folding, for that you need to use the {$REGION} {$ENDREGION} tags.