How does the compiler handle case statements? - delphi

In the documentation under case statements, it says:
Each value represented by a caseList must be unique in the case
statement;
And the example shown,
case I of
1..5: Caption := 'Low';
6..9: Caption := 'High';
0, 10..99: Caption := 'Out of range';
else
Caption := ;
end
is equivalent to the nested conditional:
if I in [1..5] then
Caption := 'Low';
else if I in [6..10] then
Caption := 'High';
else if (I = 0) or (I in [10..99]) then
Caption := 'Out of range'
else
Caption := ;
So the first quote suggests it is handled like a set (read the comments here at least one person is with me on this).
Now I know the part
where selectorExpression is any expression of an ordinal type smaller
than 32 bits
contradicts with the properties of sets as it is mentioned her under sets:
The base type can have no more than 256 possible values, and their
ordinalities must fall between 0 and 255
What is really bugging me is that why it is a necessity to have a unique values in the caseList. If it is equivalent to the if statement, the second value would be just not tested because the compiler already found a prior match?

The documentation takes a specific case statement which is equivalent to a specific if statement.
In general, any case statement can be rewritten as an if statement using the same approach. However, the inverse is not true.
The documentation uses an equivalent if statement to explain the logical behaviour (or semantics) of the case statement. It's not a representation of what the compiler does internally.
How does the compiler handle case statement?
First note there are 2 aspects to this question.
Semantically the compiler must handle the case statement as indicated in the documentation. This includes:
Ensuring the values of each caseList entry can be evaluated at compile-time.
Ensuring the caseList entries are unique.
Whichever caseList entry matches, the corresponding caseList-statement is called.
If no caseList entries match, the else statements are called.
However, the compiler is entitled to optimise the implementation as it sees fit provided the optimised byte/machine code is logically equivalent.
Johan's answer describes common optimisations: jump-lists and reordering.
These are much easier to apply given the strict semantics.
What is really bugging me is that why it is a necessity to have a unique values in the caseList.
The uniqueness is required to remove ambiguity. Which caseList-statement should be used if more than one matches?
It could call the first matching caseList-statement and ignore the rest. (SQL Server CASE statement behaves like this.) Also see [1] below.
It could call all of them. (If I remember correctly the MANTIS programming language uses this semantic for its version of a case statement.)
It could report an error requiring the programmer to disambiguate the caseList. (Simply put, this is what Delphi's specification requires. Many other languages use the same approach. Quibbling about it is unproductive especially since this choice is unlikely to be much of a hindrance.)
If it is equivalent to the if statement the second value would be just not tested because the compiler already found a prior match.
[1] I'd like to point out that this can make code much more difficult to read. This behaviour is "ok" when using magic literals, but when using const identifiers it becomes dangerous. If 2 different consts have the same value, it won't be immediately apparent that the latter caseList-statement where the caseList also matches won't be called. Also case statements would be subject to behaviour change due to a simple caseList re-ordering.
const
NEW_CUSTOMER = 0;
EDIT_CUSTOMER = 1;
...
CANCEL_OPERATION = 0;
case UserAction of
NEW_CUSTOMER : ...;
EDIT_CUSTOMER : ...;
...
CANCEL_OPERATION : ...; { Compiler error is very helpful. }
end;
Contradicts with the properties of sets
There's no contradiction. The fact that each caseList value must be unique does not in any way imply it must be "handled like a set". That's your incorrect assumption. Other's making the same assumption are likewise incorrect.
How the uniqueness constraint is checked, is up to the compiler. We can only speculate. But I'd guess the most efficient would be to maintain an ordered list of ranges. Work through each of the caseList of values and ranges, find its position in the above list. If it overlaps, then report an error, otherwise add it to the list.

case statement
The compiler prefers to transform the case statement to a jump table.
In order to make this possible duplicate case labels are not allowed.
Furthermore the compiler does not have to test the case statements in the order you've declared them. For optimization reasons it will reorder these elements as it sees fit; so even if it does not use a jump table, it still cannot allow duplicate case labels.
It is for these same reason (and to keep programmers mentally sane) that the case statement does not allow fall-throughs.
if statement
A (series of) if statements are processed in the order they are declared.
The compiler will test them one by one (and jump out when appropriate).
If the if-statement does not contain duplicates the code will perform the same as an equivalent case statement, but the generated code will most likely be different.
If statements are never converted into jump tables.
about sets
The reason sets are limited to 256 elements is that every element in a set takes up one bit of space.
256 bits = 32 bytes.
Allowing more than 256 elements in a set would grow the representation in memory too much, hampering performance.
The case statement does not use sets, it merely uses set logic in its semantics.

Related

what is LLVM ExtractValueInst?

I was studying one llvm code, where i have found a line,
const ExtractValueInst *EI = cast<ExtractValueInst>(I);
st.setValue(I, st.getValue(EI->getAggregateOperand()));
Now, I understand why cast<...> is being used, but I can't relate with the ExtarctValueInst, Can you give me one example what is this instruction in IR and what is the equivalence C code? and also I want to know about getAggregateOperand() function also. Thank you in advance.
Suppose you have a function that returns a 32-bit integer that isn't really one integer, but rather a set of smaller bitfields and/or bools. Perhaps the bottom six bits are an integer in the range 0-63, the seventh bit is a boolean, etc.
Somewhere you have a call to that function, and a little below the call, you have code that uses the i6 that's a part of the return value. So you create an extractvalue to extract a value from the composite return value. (If the composite were in main memory you'd probably create a getelementptr/load pair, but since it's most likely in a CPU register you create an extractvalue.)
This is quite often used e.g. in exception handling; a catch clause has a single parameter which is a composite of two things, and the catch clause tests one of the two components to determine whether to catch the exception or pass it on.

Please, explain what does the following line of "with.. do" represent

In the code: pbOut is a TPaintBox.
with pbOut.Canvas, Font do
begin
....
end;
The question is, i want to understand what does the comma in with.. do structure do? Is it the same as writing with pbOut.Canvas.Font do or not?
The comma is simply a separator for a list of things to 'scope to' using with.
As to your question, strictly speaking: it depends...
You might make the assumption that pbOut.Canvas always has a Font property (because you know TCanvas has a Font property). But nothing forces a Canvas variable to be of the TCanvas type, so you actually have to consider 2 possibilities.
In your case the first possibility explained below most likely applies, but it's important to be aware of the other.
As to your specific question:
Is it the same as writing with pbOut.Canvas.Font do or not?
It's almost the same, but not quite. That doesn't expose pbOut.Canvas, which the original code does.
Possibility 1
If Font is a member of pbOut.Canvas, then the statement is equivalent to:
with pbOut.Canvas do
begin
with pbOut.Canvas.Font do
begin
{ See notes underneath }
end;
end;
Here you are able to reference members of either pbOut.Canvas or pbOut.Canvas.Font directly; without fully qualifying the reference.
Any members in both pbOut.Canvas and pbOut.Canvas.Font that have have the same identifier would clash. The compiler will favour the inner with item. This means you would still have to fully qualify the pbOut.Canvas member to access it.
Possibility 2
On the other hand, if Font is not an accessible member of pbOut.Canvas, then the statement is equivalent to:
with pbOut.Canvas do
begin
with Font do
begin
{ See notes underneath }
end;
end;
Similar to the previous construct, you can access members of either pbOut.Canvas or Font directly; without fully qualifying the reference.
I must point out that the with statement is not really a useful construct.
It leads to confusing, difficult to read code because you always have to consider what an unqualified identifier really refers to.
There also a risk that adding/removing/renaming members in an existing object or record type might unexpectedly break existing code. Any code using with on instance of the modified type might unexpectedly scope differently.
The with list form in your question is even worse than single with because of the added trickery covered in my answer.
The with statement saves a small amount of code to more explicitly qualify certain identifiers; i.e. minimal "typing" savings, but at a great cost. So it is advised to not use with.
Look into documentation:
A with statement is a shorthand for referencing the fields of a record or the fields, properties, and methods of an object. The syntax of a with statement is:
with obj do statement
or:
with obj1, ..., objn do statement
where obj is an expression yielding a reference to a record, object instance, class instance, interface or class type (metaclass) instance, and statement is any simple or structured statement. Within the statement, you can refer to fields, properties, and methods of obj using their identifiers alone, that is, without qualifiers.
When multiple objects or records appear after with, the entire statement is treated like a series of nested with statements. Thus:
with obj1, obj2, ..., objn do statement
is equivalent to:
with obj1 do
with obj2 do
...
with objn do
// statement
In this case, each variable reference or method name in statement is interpreted, if possible, as a member of objn; otherwise it is interpreted, if possible, as a member of objn1; and so forth. The same rule applies to interpreting the objs themselves, so that, for instance, if objn is a member of both obj1 and obj2, it is interpreted as obj2.objn.
Since a with statement requires a variable or a field to operate upon, using it with properties can be tricky at times. A with statement expects variables it operates on to be available by reference.
Generally using with is a recipe for disaster, since by the look of it, it is difficult to know if a statement refers to one of the with blocks (and which one) or an outer scope.
To answer the question,
Is it the same as writing with pbOut.Canvas.Font do or not?
If you only want to access the Font properties it is the same, but not if you want to access other members of pbOut.Canvas.
For yours and others sake, do not use with statements at all.

Dart int and double being interned? Treated specially by identical()?

Dart has both:
an equality operator == and
a top-level function named identical().
By the choice of syntax, it feels natural to want to use Dart's == operator more frequently than identical(), and I like that. In fact, the Section on Equality of the Idiomatic Dart states that "in practice, you will rarely need to use" identical().
In a recent answer to one of my questions concerning custom filters, it seems that Angular Dart favors use of identical() rather than == when trying to determine whether changes to a model have reached a steady state. (Which can make sense, I suppose, for large models for reasons of efficiency.)
This got me to thinking about identity of int's and so I wrote some tests of identical() over ints. While I expected that small ints might be "interned/cached" (e.g. similar to what is done by Java's Integer.valueOf()), to my surprise, I can't seem to generate two ints that are equal but not identical. I get similar results for double.
Are int and double values being interned/cached? Or maybe identical() is treating them specially? Coming from a Java background, I used to equate equate Dart's:
== to Java's equal() method and
identical() to Java's equality test ==.
But that now seems wrong. Anyone know what is going on?
Numbers are treated specially. If their bit-pattern is the same they must be identical (although it is still debated if this includes the different versions of NaNs).
The main reasons are expectations, leaking of internal details and efficiency.
Expectations: users expect numbers to be identical. It goes against common sense that x == y (for two integers) but not identical(x, y).
Leaking of internal details: the VM uses SMIs (SMall Integers) to represent integers in a specific range (31 bits on 32-bit machines, 63 on 64-bit machines). These are canonicalized and are always identical. Exposing this internal implementation detail would lead to inconsistent results depending on which platform you run.
Efficiency: the VM wants to unbox numbers wherever it can. For example, inside a method doubles are frequently moved into registers. However, keeping track of the original box can be cumbersome and difficult.
foo(x, y) {
var result = x;
while(y-- > 0) {
result += x;
}
return result;
}
Suppose, that the VM optimizes this function and moves result into a register (unboxing x in the process). This allows for a tight loop where result is then efficiently modified. The difficult case happens, when y is 0. The loop wouldn't execute and foo would return x directly. In other words, the following would need to be true:
var x = 5.0;
identical(x, foo(x, 0)); // should be true.
If the VM unboxed the result variable in the method foo it would need to allocate a fresh box for the result and the identical call would therefore return false.
By modifying the definition of identical all these problems are avoided. It comes with a small cost to the identical check, though.
Seems like I posted too quickly. I just stumbled on Dart Issue 13084: Spec says identical(1.0, 1) is true, even if they have different types which led me to the Dart section on Object Identity of the language spec. (I had previously search for equality in the spec but not object identity.)
Here is an excerpt:
The predefined dart function identical() is defined such that identical(c1, c2) iff:
- c1 evaluates to either null or an instance of
bool and c1 == c2, OR
- c1 and c2 are instances of int and c1 == c2, OR
- c1 and c2 are constant strings and c1 == c2, OR
- c1 and c2 are instances of double and one of the following holds: ...
and there are more clauses dealing with lists, maps and constant objects. See the language spec for the full details. Hence, identical() is much more than just a simple test for reference equality.
I can't remember the source for this, but somewhere on dartlang.org or the issue tracker it was said that num, int and double are indeed getting special treatment. One of those special treatments is that you can't subclass those types for performance reasons, but there may be more. What exactly this special treatment entails can probably only be answered by the developers, or maybe someone who knows the specification by heart, but one thing can be inferred:
The numeric types are dart objects - they have methods you can call on their instances. But they also have qualities of primitive data types, as you can do int i = 3;, while a pure object should have a new keyword somewhere. This is different from Java, where there are real primitive types and real objects wrapping them and exposing instance methods.
While the technical details certainly are more complex, if you think about dart numerics as a blend of object and primitive, your comparison to Java still makes sense. In Java, new Integer(5).equals(new Integer(5)) evaluates to true, and so does 5==5.
I am aware this is not a very technically correct answer, but I hope it's still useful to make sense of the behaviour of dart numerics when coming from a Java background.

Access Delphi record fields via . or ^

I came across something in the Delphi language that I hadn't noticed before. Consider a simple record and a pointer to that record:
TRecord = record
value : double;
end;
PTRecord = ^TRecord;
Now declare a variable of type PTRecord:
var x : PTRecord;
and create some space:
x := new (PTRecord);
I noticed that I can access the value field using both the '.' notation and the '^.' notation. Thus the following two lines appear to be operationally equivalent, the compiler doesn't complain and runtime works fine:
x.value := 4.5;
x^.value := 2.3;
I would have thought that '^.' is the correct and only way to access value? My question is, is it ok to use the simpler dot notation or will I run into trouble if I don't use the pointer indirection '^.'? Perhaps this is well known behavior but it's the first time I've come across it.
It is perfectly sound and safe to omit the caret. Of course, logic demands the caret, but since the expression x.value doesn't have a meaningful intepretation in its own, the compiler will assume you actually mean x^.value. This feature is a part of the so-called 'extended syntax'. You can read about this at the documentation.
When Extended syntax is in effect (the default), you can omit the caret when referencing pointers.
Delphi has supported that syntax for close to two decades. When you use the . operator, the compiler will apply the ^ operator implicitly. Both styles are correct. There is no chance for your program doing the wrong thing since there is no case where presence or absence of the ^ will affect the interpretation of the subsequent . operator.
Although this behavior is controlled by the "extended syntax" option, nobody ever disables that option. You can safely rely on it being set in all contexts. It also controls availability of the implicit Result variable, and the way character pointers are compatible with array syntax.
This is called implicit pointer dereferencing for the structured types and inherited from the Delphi 1. This language extension aims to make accessing members of classes (classes are structured types and also instances are implicit pointers) by the means of membership operator (.) only, avoiding requirement for explicit dereference operator (^).
You can safely rely on the presence of this extension in the all Delphi compilers. For more flexibility you can test for the presence of this extension by using $IFOPT X+ conditional directive.

Should I use unsigned integers for counting members?

Should I use unsigned integers for my count class members?
Answer
For example, assume a class
TList <T> = class
private
FCount : Cardinal;
public
property Count : Cardinal read FCount;
end;
That does make sense, doesn't it? The number of items stored in a list can't be negative, so why not use an unsigned integer type for it? I think it's in general a good principle to always use the least general (ergo the most special) type possible.
Now, iterating over a list looks like this:
for I := 0 to List.Count - 1 do
Writeln (List [I]);
When the number of items stored in the list is zero, the compiler tries to evaluate
List.Count - 1
which results in a nice Integer overflow (underflow to be exact). Combined with the fact that the debugger does not show the appropriate location where the exception occured, this was very hard to find for me.
Let me add that if you have overflow checking turned off, the resulting errors will be even harder to track, because then you will often access memory that doesn't belong to you - and that results in undefined behaviour.
I will be using plain Integers for all my count members from now on to avoid situations like this.
If that's complete nonsense, please point it out to me :)
(I just spent an hour tracking an integer overflow in my code, so I decided to share that - most people on here will know that of course, but perhaps I can save someone some time.)
No, definitely not. Delphi idiom is to use integers here. Don't fight the language.
In a 32 bit environment you'll not have more elements in the list, except if you try to build a bitmap.
Let's be clear: every programmer who is going to have to use your code is going to hate you for using a Cardinal instead of an integer.
Unsigned integers are almost always more trouble than they're worth because you usually end up mixing signed and unsigned integers in expressions at some point. That means that the type will need to be widened (and probably have a performance hit) to get correct semantics (ideally the compiler does this as per language definition), or else you'll need to be very careful in your range checking.
Take C/C++ for example: size_t is the type of the integer for memory sizes and allocation, and is unsigned, but ptrdiff_t is the type for the offset you get when you subtract one pointer from another, and necessarily is signed. Want to know how many elements you've allocated in an array? Perhaps you subtract the first element address from the last+1 element address and divide by sizeof(element-type)? Well, now you've just mixed signed and unsigned integers.
Regarding your statement that "I think it's in general a good principle to always use the least general (ergo the most special) type possible." - actually I think it's a good principle to use the data type that will cause you least angst and trouble.
Generally for me that's a signed int since:
I don't usually have lists with 231 or more elements in them.
You shouldn't have lists that big either :-)
I don't like the hassle of having special edge cases in my code.
But it's really a style issue. If 'purity' of code is more important to you than brevity of code, your method is best (with modifications to catch the edge cases). Myself, I prefer brevity since edge cases tend to clutter the code and reduce understanding.
Don't.
It's not just going against a programming idiom, it's an explicit request to the compiler to use unsigned arithmetic, that is either prone to anomalous behaviors (if you don't guard against overflows) or to irrelevant runtime exceptions (if you guard against overflows, a temporary overflow will be fatal, f.i when you subtract before adding, even if the final result is positive, and I'm referring to the CPU opcode-level ordering of operations, which may not bear a trivial relationship to what you have in your code).
Keep in mind "unsigned" does not translate to "positive", it translates to "doesn't have a sign", which is different. The term "unsigned" was picked for good reason (and naming it "Cardinal" in Delphi was a poor choice IMO).
Unsigned types are for raw storage specifications, bitwise operations, ASM code, embedded controllers and other specialty uses. When you're doing high-level programming, you should forget you ever heard about unsigned types.
Moral: use iterators and foreach when you can, because it avoids this question altogether.
Boundary conditions frequently present problems. Allowing for a type that can go negative may just shift the issue. Perhaps it shifts it in a way that's easier to debug, perhaps not. I started off using integers for counting loops like that, but later on switched to cardinals to help me catch errors.

Resources