Related
The goto statement is taboo at my work.
So the following question is born...
Is there a situation possible where a goto is the only valid solution?
Originally GOTO was added to Pascal for error handling, including inter procedural forms that Borland(/Embarcadero) never implemented (example: GOTOing from a inner procedure to the parent), just like Borland never implemented other inner function functionality like passing inner functions to procedure-typed parameters.(*)
In that way GOTO can be considered the precursor to exceptions.
There still some practical uses: The last time I checked, jumping out of a nested IF statement with goto was still faster in Delphi then letting the code exit from a nested if naturally.
Optimizations like these are sometimes used in e.g. compression code, and other complex tree processing code with deeply nested loops or conditional statements.
Such routines often still use goto for errorhandling, because it is faster. (exceptions are not only slow, but their border conditions inhibit some optimizations).
One could see this as part of the plain Pascal level of Object Pascal, just like C++ still allows plain C nearly completely.
(of course, since the optimized compression code in Delphi is only delivered in .o form, it is hard to find examples in the Delphi codebase. The JPEG code has some, but that is a C translation)
(*) Original pascal, and IIRC even Turbo Pascal doesn't allow prematurely exiting a procedure with EXIT. Same for CONTINUE and BREAK.
Is there a situation possible where a GOTO is the only valid solution?
I suppose it depends on what you mean by valid. I suppose you are asking if there exists a program that can only be written with the use of the goto statement. In which case the answer is that there is no such program. Delphi is Turing complete with or without the goto statement.
However, if we are prepared to widen the discussion to include other languages, there are situations where goto is a good solution, even the best solution. The scenario that most commonly comes to mind is implementing tidy-up and error handling in languages without structured exception handling. If you peruse the Linux source code you will find that goto is widely used. I expect that the same is true of the Windows source code.
Goto is very old. It predates sub-routines like functions and procedures! It is also very dangerous and can make your code less readable (to others, or to yourself a few months later).
In theory it's not possible to have a situation where goto is required. I won't repeat the theory about Turing tape machines here, but using selection and iteration, you can re-order the code so in all possible input values the same output comes about.
In practice though, it's sometimes 'handy' and 'better readable' to 'jump away' from the flow of code in certain conditions, and that's where Exceptions come in. raise breaks away from the current execution, and jump to the closest finally or except section. This is safer because they work cascaded, and provide a better way to handle the context in case of one of these border conditions. (And there's also breakand abort and exit)
GOTO is never necessary. Any computable algorithm can be expressed with assignment and the combination of IF...THEN, BEGIN...END, and your choice of WHILE...DO...END or REPEAT...UNTIL. You don't even need subroutines. :)
This is known as the structured program theorem.
For a proof, see the 1966 paper, Flow Diagrams, Turing Machines and Languages with Only Two Formation Rules (PDF) by Corrado Böhm and Giuseppe Jacopini.
Something like 15 years ago I used the goto statement in Delphi to convert one of Bob Jenkins's hash functions from C to Pascal. The C function has a switch() statement without breaks after each case, and you can't do that with Pascal's case statement. So I converted it into a bunch of Pascal labels and gotos. I guess you would still have to do it the same way with the newest Delphi versions.
Edit: I guess using gotos would still be a reasonable way to do this. Gets the job done, easy to understand, limited to a short block of code, not dangerous.
I have a colleague in my team which is extensively using closures in our projects developed in Delphi. Personal, I don't like this because is making code harder to read and I believe that closures should be used ONLY when you need them.
In the other hand I've read Can someone explain Anonymous methods to me? and other links related to this, and I'm taking into account that maybe I'm wrong, so I'm asking you to give me some examples when is better to use closures instead of a 'old-fashion' approach (not using closures).
I believe that this question calls for a very subjective judgement. I am an old-school delphi developer, and inclined to agree with you. Not only do closures add certain risks (as David H points out in comments) they also reduce readability for all classically trained Delphi developers. So why were they added to the language at all? In Delphi XE, the syntax-formatting function and closures weren't working well together, for example, and this increased my mistrust of closures; How much stuff gets added to the Delphi compiler, that the IDE hasn't been fully upgraded to support? You know you're a cranky old timer when you admit publically that you would have been happy if the Delphi language was frozen at the Delphi 7 level and never improved again. But Delphi is a living, powerful, evolving syntax. And that's a good thing. Repeat that to yourself when you find the old-crank taking over. Give it a try.
I can think of at least ten places where Anonymous methods really make sense and thus, reasons why you should use them, notwithstanding my earlier comment that I mistrust them. I will only point out the two that I have decided to personally use, and the limits that I place on myself when I use them:
Sort methods in container classes in the Generics.Collections accept an anonymous method so that you can easily provide a sorting bit of logic without having to write a regular (non-anonymous) function that matches the same signature that the sort method expects. The new generics syntax fits hand in hand with this style, and though it looks alien to you at first, it grows on you and becomes if not exactly really nice to use, at least more convenient than the alternatives.
TThread methods like Synchronize are overloaded, and in addition to supporting a single TThreadMethod as a parameter Thread.Synchronize(aClassMethodWithoutParameters), it has always been a source of pain to me, to get the parameters into that synchronize method. now you can use a closure (anonymous method), and pass the parameters in.
Limits that I recommend in writing anonymous methods:
A. I have a personal rule of thumb of only ONE closure per function, and whenever there is more than one, refactor out that bit of code to its own method. This keeps the cyclomatic complexity of your "methods" from going insane.
B. Also, inside each closure, I prefer to have only a single method invocation, and its parameters, and if I end up writing giant blocks of code, I rewrite those to be methods. Closures are for variable capture, not a carte-blanche for writing endlessly-twisted spaghetti code.
Sample sort:
var
aContainer:TList<TPair<String, Integer>>;
begin
aContainer.Sort(
TMyComparer.Construct(
function (const L, R: TPair<String, Integer>): integer
begin
result := SysUtils.CompareStr(L.Key,R.Key);
end ) {Construct end} ); {aContainer.Sort end}
end;
Update: one comment points to "language uglification", I believe that the uglification refers to the difference between having to write:
x.Sort(
TMyComparer.Construct(
function (const L, R: TPair<String, Integer>): integer
begin
result := SysUtils.CompareStr(L.Key,R.Key);
end ) );
Instead of, the following hypothetical duck-typed (or should I have said inferred types) syntax that I just invented here for comparison:
x.Sort( lambda( [L,R], [ SysUtils.CompareStr(L.Key,R.Key) ] ) )
Some other languages like Smalltalk, and Python can write lambdas more compactly because they are dynamically typed. The need for an IComparer, for example, as the type passed to a Sort() method in a container, is an example of complexity caused by the interface-flavor that strongly typed languages with generics have to follow in order to implement traits like ordering, required for sortability. I don't think there was a nice way to do this. Personally I hate seeing procedure, begin and end keywords inside a function invocation parenthesis, but I don't see what else could reasonably have been done.
How to get the entire code of a method in memory so I can calculate its hash at runtime?
I need to make a function like this:
type
TProcedureOfObject = procedure of object;
function TForm1.CalculateHashValue (AMethod: TProcedureOfObject): string;
var
MemStream: TMemoryStream;
begin
result:='';
MemStream:=TMemoryStream.Create;
try
//how to get the code of AMethod into TMemoryStream?
result:=MD5(MemStream); //I already have the MD5 function
finally
MemStream.Free;
end;
end;
I use Delphi 7.
Edit:
Thank you to Marcelo Cantos & gabr for pointing out that there is no consistent way to find the procedure size due to compiler optimization. And thank you to Ken Bourassa for reminding me of the risks. The target procedure (the procedure I would like to compute the hash) is my own and I don't call another routines from there, so I could guarantee that it won't change.
After reading the answers and Delphi 7 help file about the $O directive, I have an idea.
I'll make the target procedure like this:
procedure TForm1.TargetProcedure(Sender: TObject);
begin
{$O-}
//do things here
asm
nop;
nop;
nop;
nop;
nop;
end;
{$O+}
end;
The 5 succesive nops at the end of the procedure would act like a bookmark. One could predict the end of the procedure with gabr's trick, and then scan for the 5 nops nearby to find out the hopefully correct size.
Now while this idea sounds worth trying, I...uhm... don't know how to put it into working Delphi code. I have no experience on lower level programming like how to get the entry point and put the entire code of the target procedure into a TMemoryStream while scanning for the 5 nops.
I'd be very grateful if someone could show me some practical examples.
Marcelo has correctly stated that this is not possible in general.
The usual workaround is to use an address of the method that you want to calculate the hash for and an address of the next method. For the time being the compiler lays out methods in the same order as they are defined in the source code and this trick works.
Be aware that substracting two method addresses may give you a slightly too large result - the first method may actually end few bytes before the next method starts.
The only way I can think of, is turning on TD32 debuginfo, and try JCLDebug to see if you can find the length in the debuginfo using it. Relocation shouldn't affect the length, so the length in the binary should be the same as in mem.
Another way would be to scan the code for a ret or ret opcode. That is less safe, but probably would guard at least part of the function, without having to mess with debuginfo.
The potential deal breaker though is short routines that are tail-call optimized (iow they jump instead of ret). But I don't know if Delphi does that.
You might struggle with this. Functions are defined by their entry point, but I don't think that there is any consistent way to find out the size. In fact, optimisers can do screwy things like merge two similar functions into a common shared function with multiple entry points (whether or not Delphi does stuff like this, I don't know).
EDIT: The 5-nop trick isn't guaranteed to work either. In addition to Remy's caveats (see his comment below), The compiler merely has to guarantee that the nops are the last thing to execute, not that they are last thing to appear in the function's binary image. Turning off optimisations is a rather baroque "solution" that still won't fix all the issues that others have raised.
In short, there are simply too many variables here for what you are trying to do. A better approach would be to target compilation units for checksumming (assuming it satisfies whatever overall objective you have).
I achieve this by letting Delphi generate a MAP-file and sorting symbols based on their start address in ascending order. The length of each procedure or method is then the next symbols start address minus this symbols start address. This is most likely as brittle as the other solutions suggested here but I have this code working in production right now and it has worked fine for me so far.
My implementation that reads the map-file and calculate sizes can be found here at line 3615 (TEditorForm.RemoveUnusedCode).
Even if you would achieve it, there is a few things you need to be aware of...
The hash will change many times, even if the function itself didn't change.
For example, the hash will change if your function call another function that changed address since the last build. I think the hash might also change if your function calls itself recursively and your unit (not necessarily your function) changed since the last build.
As for how it could be achieved, gabr's suggestion seems to be the best one... But it's really prone to break over time.
Code Complete says it is good practice to always use block identifiers, both for clarity and as a defensive measure.
Since reading that book, I've been doing that religiously. Sometimes it seems excessive though, as in the case below.
Is Steve McConnell right to insist on always using block identifiers? Which of these would you use?
//naughty and brief
with myGrid do
for currRow := FixedRows to RowCount - 1 do
if RowChanged(currRow) then
if not(RecordExists(currRow)) then
InsertNewRecord(currRow)
else
UpdateExistingRecord(currRow);
//well behaved and verbose
with myGrid do begin
for currRow := FixedRows to RowCount - 1 do begin
if RowChanged(currRow) then begin
if not(RecordExists(currRow)) then begin
InsertNewRecord(currRow);
end //if it didn't exist, so insert it
else begin
UpdateExistingRecord(currRow);
end; //else it existed, so update it
end; //if any change
end; //for each row in the grid
end; //with myGrid
I have always been following the 'well-behaved and verbose' style, except those unnecessary extra comments at the end blocks.
Somehow it makes more sense to be able to look at code and make sense out of it faster, than having to spend at least couple seconds before deciphering which block ends where.
Tip: Visual studio KB shortcut for C# jump begin and end: Ctrl + ]
If you use Visual Studio, then having curly braces for C# at the beginning and end of block helps also by the fact that you have a KB shortcut to jump to begin and end
Personally, I prefer the first one, as IMHO the "end;" don't tell me much, and once everything is close, I can tell by the identation what happens when.
I believe blocks are more useful when having large statements. You could make a mixed approach, where you insert a few "begin ... end;"s and comment what they are ending (for instance use it for the with and the first if).
IMHO you could also break this into more methods, for example, the part
if not(RecordExists(currRow)) then begin
InsertNewRecord(currRow);
end //if it didn't exist, so insert it
else begin
UpdateExistingRecord(currRow);
end; //else it existed, so update it
could be in a separate method.
I would use whichever my company has set for its coding standards.
That being said, I would prefer to use the second, more verbose, block. It is a lot easier to read. I might, however, leave off the block-ending comments in some cases.
I think it depends somewhat on the situation. Sometimes you simply have a method like this:
void Foo(bool state)
{
if (state)
TakeActionA();
else
TakeActionB();
}
I don't see how making it look like this:
void Foo(bool state)
{
if (state)
{
TakeActionA();
}
else
{
TakeActionB();
}
}
Improves on readability at all.
I'm a Python developer, so I see no need for block identifiers. I'm quite happy without them. Indentation is enough of an indicator for me.
Block identifier are not only easier to read they are much less error prone if you are changing something in the if else logic or simply adding a line and don't recognizing that the line is not in the same logical block then the rest of the code.
I would use the second code block. The first one looks prettier and more familiar but I think this a problem of the language and not the block identifiers
If it is possible I use checkstyle to ensure that brackets are used.
If I remember correctly, CC also gave some advices about comments. Especially about not rewriting what code does in comments, but explaining why it does what it does.
I'd say he's right just for the sake that the code can still be interpreted correctly if the indentation is incorrect. I always like to be able to find the start and end block identifiers for loops when I skim through code, and not rely on proper indentation.
It's never always one way or the other. Because I trust myself, I would use the shorter, more terse style. But if you're in a team environment where not everyone is of the same skill and maintainability is important, you may want to opt for the latter.
My knee-jerk reaction would be the second listing (with the repetitive comments removed from the end of the lines, like everyone's been saying), but after thinking about it more deeply I'd go with the first plus a one or two line useful comment beforehand explaining what's going on (if needed). Obviously in this toy example, even the comment before the concise answer would probably not be needed, but in other examples it might.
Having less (but still readable) and easy to understand code on the screen helps keep your brain space free for future parts of the code IMO.
I'm with those who prefer more concise code.
And it looks like prefering a verbose version to a concise one is more of a personal choice, than of a universal suitableness. (Well, within a company it may become a (mini-)universal rule.)
It's like excessive parentheses: some people prefer it like (F1 and F2) or ((not F2) and F3) or (A - (B * C)) < 0, and not necessarily because they do not know about the precedence rules. It's just more clear to them that way.
I vote for a happy medium. The rule I would use is to use the bracketing keywords any time the content is multiple lines. In action:
// clear and succinct
with myGrid do begin
for currRow := FixedRows to RowCount - 1 do begin
if RowChanged(currRow) then begin
if not(RecordExists(currRow))
InsertNewRecord(currRow);
else
UpdateExistingRecord(currRow);
end; // if RowChanged
end; // next currRow
end; // with myGrid
commenting the end is really usefull for html-like languages so do malformed C code like an infinite succession of if/else/if/else
frequent // comments at the end of code lines (per your Well Behaved and Verbose example) make the code harder to read imho -- when I see it I end up scanning the 'obvious' comments form something special that typically isn't there.
I prefer comments only where the obvious isn't (i.e. overall and / or unique functionality)
Personally I recommend always using block identifiers in languages that support them (but follow your company's coding standards, as #Muad'Dib suggests).
The reason is that, in non-Pythonic languages, whitespace is (generally) not meaningful to the compiler but it is to humans.
So
with myGrid do
for currRow := FixedRows to RowCount - 1 do
if RowChanged(currRow) then
Log(currRow);
if not(RecordExists(currRow)) then
InsertNewRecord(currRow)
else
UpdateExistingRecord(currRow);
appears to do one thing but does something quite different.
I would eliminate the end-of-line comments, though. Use an IDE that highlights blocks. I think Castalia will do that for Delphi. How often do you read code printouts anymore?
Since the age of the dinosaurs, Turbo Pascal and nowadays Delphi have a Write() and WriteLn() procedure that quietly do some neat stuff.
The number of parameters is variable;
Each variable can be of all sorts of types; you can supply integers, doubles, strings, booleans, and mix them all up in any order;
You can provide additional parameters for each argument:
Write('Hello':10,'World!':7); // alignment parameters
It even shows up in a special way in the code-completion drowdown:
Write ([var F:File]; P1; [...,PN] )
WriteLn ([var F:File]; [ P1; [...,PN]] )
Now that I was typing this I've noticed that Write and WriteLn don't have the same brackets in the code completion dropdown. Therefore it looks like this was not automatically generated, but it was hard-coded by someone.
Anyway, am I able to write procedures like these myself, or is all of this some magic hardcoded compiler trickery?
Writeln is what we call a compiler "magic" function. If you look in System.pas, you won't find a Writeln that is declared anything like what you would expect. The compiler literally breaks it all down into individual calls to various special runtime library functions.
In short, there is no way to implement your own version that does all the same things as the built-in writeln without modifying the compiler.
As the Allen said you can't write your own function that does all the same things.
You can, however, write a textfile driver that does something custom and when use standard Write(ln) to write to your textfile driver. We did that in ye old DOS days :)
("Driver" in the context of the previous statement is just a piece of Pascal code that is hooked into the system by switching a pointer in the System unit IIRC. Been a long time since I last used this trick.)
As far as I know, the pascal standards don't include variable arguments.
Having said that, IIRC, GNU Pascal let's you say something like:
Procecdure Foo(a: Integer; b: Integer; ...);
Try searching in your compiler's language docs for "Variable Argument Lists" or "conformant arrays". Here's an example of the later: http://www.gnu-pascal.de/demos/conformantdemo.pas.
As the prev poster said, writeln() is magic. I think the problem has to do with how the stack is assembled in a pascal function, but it's been a real long time since I've thought about where things were on the stack :)
However, unless you're writing the "writeln" function (which is already written), you probably don't need to implement a procedure with a variable arguments. Try iteration or recursion instead :)
It is magic compiler behaviour rather than regular procedure. And no, there is no way to write such subroutines (unfortunately!). Code generation resolves count of actual parameters and their types and translates to appropriate RTL calls (eg. Str()) at compile time. This opposes frequently suggested array of const (single variant array formal parameter, actually) which leads to doing the same at runtime. I'm finding later approach clumsy, it impairs code readability somewhat, and Bugland (Borland/Inprise/Codegear/Embarcadero/name it) broke Code Insight for variant open array constructors (yes, i do care, i use OutputDebugString(PChar(Format('...', [...])))) and code completion does not work properly (or at all) there.
So, closest possible way to simulate magic behaviour is to declare lot of overloaded subroutines (really lot of them, one per specific formal parameter type in the specific position). One could call this a kludge too, but this is the only way to get flexibility of variable parameter list and can be hidden in the separate module.
PS: i left out format specifiers aside intentionally, because syntax doesn't allow to semicolons use where Str(), Write() and Writeln() are accepting them.
Yes, you can do it in Delphi and friends (e.g. free pascal, Kylix, etc.) but not in more "standard" pascals. Look up variant open array parameters, which are used with a syntax something like this:
procedure MyProc(args : array of const);
(it's been a few years and I don't have manuals hand, so check the details before proceeding). This gives you an open array of TVarData (or something like that) that you can extract RTTI from.
One note though: I don't think you'll be able to match the x:y formatting syntax (that is special), and will probably have to go with a slightly more verbose wrapper.
Most is already said, but I like to add a few things.
First you can use the Format function. It is great to convert almost any kind of variable to string and control its size. Although it has its flaws:
myvar := 1;
while myvar<10000 do begin
Memo.Lines.Add(Format('(%3d)', [myVar]));
myvar := myvar * 10;
end;
Produces:
( 1)
( 10)
(100)
(1000)
So the size is the minimal size (just like the :x:y construction).
To get a minimal amount of variable arguments, you can work with default parameters and overloaded functions:
procedure WriteSome(const A1: string; const A2: string = ''; const A3: string = '');
or
procedure WriteSome(const A1: string); overload;
procedure WriteSome(const A1: Integer); overload;
You cannot write your own write/writeln in old Pascal. They are generated by the compiler, formatting, justification, etc. That's why some programmers like C language, even the flexible standard functions e.g. printf, scanf, can be implemented by any competent programmers.
You can even create an identical printf function for C if you are inclined to create something more performant than the one implemented by the C vendor. There's no magic trickery in them, your code just need to "walk" the variable arguments.
P.S.
But as MarkusQ have pointed out, some variants of Pascal(Free Pascal, Kylix, etc) can facilitate variable arguments. I last tinker with Pascal, since DOS days, Turbo Pascal 7.
Writeln is not "array of const" based, but decomposed by the compiler into various calls that convert the arguments to string and then call the primitive writestring. The "LN" is just a function that writes the lineending as a string. (OS dependant). The procedure variables (function pointers) for the primitives are part of the file type (Textrec/filerec), which is why they can be customized. (e.g. AssignCrt in TP)
If {$I+} mode is on, after each element, a call to the iocheck function is made.
The GPC construct made above is afaik the boundless C open array. FPC (and afaik Delphi too) support this too, but with different syntax.
procedure somehting (a:array of const);cdecl;
will be converted to be ABI compatible to C, printf style. This means that the relevant function (somehting in this case) can't get the number of arguments, but must rely on formatstring parsing. So this is something different from array of const, which is safe.
Although not a direct answer to you question, I would like to add the following comment:
I have recently rewritten some code using Writeln(...) syntax into using a StringList, filling the 'lines' with Format(...) and just plain IntToStr(...), FloatToStr(...) functions and the like.
The main reason for this change was speed improvement. Using a StringList and SaveFileTo is much, much more quicker than the WriteLn, Write combination.
If you are writing a program which creates a lot of text files (I was working on a web site creation program), this makes a lot of difference.