Should I use unsigned integers for counting members? - delphi

Should I use unsigned integers for my count class members?
Answer
For example, assume a class
TList <T> = class
private
FCount : Cardinal;
public
property Count : Cardinal read FCount;
end;
That does make sense, doesn't it? The number of items stored in a list can't be negative, so why not use an unsigned integer type for it? I think it's in general a good principle to always use the least general (ergo the most special) type possible.
Now, iterating over a list looks like this:
for I := 0 to List.Count - 1 do
Writeln (List [I]);
When the number of items stored in the list is zero, the compiler tries to evaluate
List.Count - 1
which results in a nice Integer overflow (underflow to be exact). Combined with the fact that the debugger does not show the appropriate location where the exception occured, this was very hard to find for me.
Let me add that if you have overflow checking turned off, the resulting errors will be even harder to track, because then you will often access memory that doesn't belong to you - and that results in undefined behaviour.
I will be using plain Integers for all my count members from now on to avoid situations like this.
If that's complete nonsense, please point it out to me :)
(I just spent an hour tracking an integer overflow in my code, so I decided to share that - most people on here will know that of course, but perhaps I can save someone some time.)

No, definitely not. Delphi idiom is to use integers here. Don't fight the language.
In a 32 bit environment you'll not have more elements in the list, except if you try to build a bitmap.
Let's be clear: every programmer who is going to have to use your code is going to hate you for using a Cardinal instead of an integer.

Unsigned integers are almost always more trouble than they're worth because you usually end up mixing signed and unsigned integers in expressions at some point. That means that the type will need to be widened (and probably have a performance hit) to get correct semantics (ideally the compiler does this as per language definition), or else you'll need to be very careful in your range checking.
Take C/C++ for example: size_t is the type of the integer for memory sizes and allocation, and is unsigned, but ptrdiff_t is the type for the offset you get when you subtract one pointer from another, and necessarily is signed. Want to know how many elements you've allocated in an array? Perhaps you subtract the first element address from the last+1 element address and divide by sizeof(element-type)? Well, now you've just mixed signed and unsigned integers.

Regarding your statement that "I think it's in general a good principle to always use the least general (ergo the most special) type possible." - actually I think it's a good principle to use the data type that will cause you least angst and trouble.
Generally for me that's a signed int since:
I don't usually have lists with 231 or more elements in them.
You shouldn't have lists that big either :-)
I don't like the hassle of having special edge cases in my code.
But it's really a style issue. If 'purity' of code is more important to you than brevity of code, your method is best (with modifications to catch the edge cases). Myself, I prefer brevity since edge cases tend to clutter the code and reduce understanding.

Don't.
It's not just going against a programming idiom, it's an explicit request to the compiler to use unsigned arithmetic, that is either prone to anomalous behaviors (if you don't guard against overflows) or to irrelevant runtime exceptions (if you guard against overflows, a temporary overflow will be fatal, f.i when you subtract before adding, even if the final result is positive, and I'm referring to the CPU opcode-level ordering of operations, which may not bear a trivial relationship to what you have in your code).
Keep in mind "unsigned" does not translate to "positive", it translates to "doesn't have a sign", which is different. The term "unsigned" was picked for good reason (and naming it "Cardinal" in Delphi was a poor choice IMO).
Unsigned types are for raw storage specifications, bitwise operations, ASM code, embedded controllers and other specialty uses. When you're doing high-level programming, you should forget you ever heard about unsigned types.

Moral: use iterators and foreach when you can, because it avoids this question altogether.

Boundary conditions frequently present problems. Allowing for a type that can go negative may just shift the issue. Perhaps it shifts it in a way that's easier to debug, perhaps not. I started off using integers for counting loops like that, but later on switched to cardinals to help me catch errors.

Related

How does Rust store types at runtime?

A u32 takes 4 bytes of memory, a String takes 3 pointer-sized integers (for location, size, and reserved space) on the stack, plus some amount on the heap.
This to me implies that Rust doesn't know, when the code is executed, what type is stored at a particular location, because that knowledge would require more memory.
But at the same time, does it not need to know what type is stored at 0xfa3d2f10, in order to be able to interpret the bytes at that location? For example, to know that the next bytes form the spec of a String on the heap?
How does Rust store types at runtime?
It doesn't, generally.
Rust doesn't know, when the code is executed, what type is stored at a particular location
Correct.
does it not need to know what type is stored
No, the bytes in memory should be correct, and the rest of the code assumes as much. The offsets of fields in a struct are baked-in to the generated machine code.
When does Rust store something like type information?
When performing dynamic dispatch, a fat pointer is used. This is composed of a pointer to the data and a pointer to a vtable, a collection of functions that make up the interface in question. The vtable could be considered a representation of the type, but it doesn't have a lot of the information that you might think goes into "a type" (unless the trait requires it). Dynamic dispatch isn't super common in Rust as most people prefer static dispatch when it's possible, but both techniques have their benefits.
There's also concepts like TypeId, which can represent one specific type, but only of a subset of types. It also doesn't provide much capability besides "are these the same type or not".
Isn't this all terribly brittle?
Yes, it can be, which is one of the things that makes Rust so interesting.
In a language like C or C++, there's not much that safeguards the programmer from making dumb mistakes that go out and mess up those bytes floating around in memory. Making those mistakes is what leads to bugs due to memory safety. Instead of interpreting your password as a password, it's interpreted as your username and printed out to an attacker (oops!)
Rust provides safeguards against that in the form of a strong type system and tools like the borrow checker, but still all done at compile time. Unsafe Rust enables these dangerous tools with the tradeoff that the programmer is now expected to uphold all the guarantees themselves, much like if they were writing C or C++ again.
See also:
When does type binding happen in Rust?
How does Rust implement reflection?
How do I print the type of a variable in Rust?
How to introspect all available methods and members of a Rust type?

Which is the quicker operator in objective c

I use == and != a lot in my code and I was wondering which is quicker in objective c so that I can make my app as fast as possible.
Situation
I have a variable which is one of two things and I want the quickest method to see which one it is
Thanks in advance
You should not worry about this level of detail for performance reasons, unless you've identified a performance issue.
However, wondering to satisfy an inquiring mind is a different matter! :-) The answer is they are identical.
A comparison is usually compiled as an instruction which sets condition flags; this could be a specific comparison instruction or something like an arithmetic instruction which sets condition codes; followed by a conditional jump which tests the condition flags - and a test for "equal" is the same cost as for "not equal", just a different setting of those condition flags.
This also means that statements such as if([some method call]) ... and if(![some method call]) ... have the same cost - the "not" operator produces no extra code.
You can test yourself.
Check current milliseconds before and after operating.
I guess there's no differences..
If you really need to know,
you could make a lot of operating with loop.
then you will get the answer.
This is silly. You would have to execute millions of iterations of code using the 2 versions of if statement in order to even detect a difference in speed. This is a triviality, and not worth worrying about.
As the other poster said, == and != should take exactly the same amount of time for non-floatingpoint values. For floating point, there might be some differences, since for an equal comparison the processor has to first normalize the 2 floating point values, then compare them, and normalizing is relatively time-consuming. I don't know if testing for non-equality if slower than equality. IT's unlikely but not impossible.

String indexing vs. dynamic array indexing in Delphi

In Delphi, why are AnsiStrings indexed from one and dynamic arrays indexed from zero? Is this a historical accident, to make AnsiStrings work more like ShortStrings, or is there some deeper logic at work?
One of the contributing factors that led to "Pascal" strings being 1 indexed instead of 0 indexed was that the length of the string was stored in the zeroth byte. Yes, that could have been hidden from the programmer's view by having the compiler internally add a constant offset to the string index expression (as was done in Delphi's long strings later) but in the beginning things were much simpler. Allocate a block of memory, store the length in byte zero, index the char data from byte 1. End of story.
As I recall UCSD Pascal was using this length-in-zero-byte convention long before Turbo Pascal came along.
As for why dynamic arrays are zero based, I don't recall any specific reason but I would guess it reflects the dynamic array's kinship to dynamically allocating a buffer and indexing off the buffer pointer. The array types that you would use to create array pointer types were zero based arrays. The first byte is found at buffer pointer + 0 offset. This is the C rationalization for zero based everything. There was no compelling reason to carry string's 1 based indexing pattern over to compiler managed arrays when string's 1 based indexing was already (and had always been) the exception rather than the norm.
It may well be that because the string type was the first array-like data type that everyone first encountered and possibly the most used data type across the board, there may be a perception of a bias towards 1 based indexing in the language. However, if you look closely I think you'll find arrays in Pascal (distinct from string) have never been inherently 1 based, especially when dynamically allocated.
The reason for the Delphi string tradition of 1-based strings is quite simple. The tradition comes from the implementation of old style Turbo Pascal strings. That data type stored the length of the string in the first byte of the variable, index 0. The string data began in the next byte, index 1.
You can still use that data type today. It's now called ShortString. As is immediately obvious from it's implementation, there is a 255 character limit. This limit led to the introduction of huge strings, if I recall correctly, in Delphi 2. When huge strings were introduced the language designers chose to retain 1-based indexing to make it easier for developers to switch from short strings to huge strings.
I guess Turbo Pascal didn't invent the idea of using element 0 for length. It's just that I'm too young to remember what came before then!
Dynamic arrays weren't bound by the past in the same way and had a free choice. I don't know why zero based was chosen. Perhaps because it fits more easily with the prevailing fashion on platform on which Delphi existed at that time, namely Windows. That's just a guess though. Danny Thorpe worked on the Delphi compiler at that time, and even he can't remember the rationale!
The Delphi language designers are currently moving towards zero based string indexing for huge strings. The initial steps in this direction can be seen in XE3 in the TStringHelper class which uses 0-based indexing. And also in the ZEROBASEDSTRINGS conditional which allows you to opt in to 0-based indexing. Expect the next generation Delphi compiler to use 0-based indexing only. The times they are changin'.
Historical accident.
Pascal strings and arrays traditionally start at 1.
C - and perhaps consequently AnsiStrings - start at 0.
I don't know the rationale for "breaking with Pascal tradition" for Dynamic arrays, which also start at zero. But it makes sense, and I agree with it ...
IMHO...

Why do Delphi and Free Pascal usually prefer a signed-integer data type to unsigned one?

I'm not a Pascal newbie, but I still don't know until now why Delphi and Free Pascal usually declares parameters and returned values as signed integers whereas I see them should always be positive. For example:
Pos() returns type of Integer. Is it possible to be a negative?
SetLength() declares the NewLength parameter as a type of Integer. Is there a negative length for string?
System.THandle declared as Longint. Is there a negative number for handles?
There are many decisions like those in Delphi and Free Pascal. What considerations were behind this?
In Pascal, Integer (signed) is the base type. All other integer number types are a subranges of integer. (this is not entirely true in Borland dialects, given longint in TP and int64 in Delphi, but close enough).
An important reason for that if the intermediate result of calculations gets negative, and you calculate with unsigned integers, range check errors will trigger, and since most older programming languages DON'T assume 2-complement integers, the result (with range checks off) might even be corrupt.
The THandle case is much simpler. Delphi didn't have a proper 32-bit unsigned till D4, but only a 31-bit cardinal. (since 32-bit unsigned integer is not a subrange of integer, the later unsigned ints are a subset of int64, which moved the problem to uint64 which was only added in D2010 or so)
So in many places in the headers signed types are used where the winapi uses unsigned types, probably to avoid the 32th bit getting accidentally corrupt in those versions, and the custom stuck.
But the winapi case is different from the general case.
Added later Some Pascal (and Modula2/3) implementations circumvent this trap by setting the integer at a size larger than the wordsize, and require all numeric types to declare a proper subrange, like in the below program.
The first holds the primary assumption that everything is a subset of integer, and the second allows the compiler to scale nearly everything down again to fit in registers, specially if the CPU has some operations for larger than word operations. (like x86 where 32-bit * 32-bit mul gives a 64-bit result, or can detect wordsize overflows using status bits (e.g. to generate range exceptions for adds without doing a full 2*wordsize add)
var x : 0..20;
y : -10..10;
begin
// any expression of x and y has a range -10..20
Turbo Pascal and Delphi emulate an integer type twice the wordsize for their 16-bit and 32-bit offerings. The handling of the highest unsigned type is hacky at best.
Well, for a start THandle is declared incorrectly. It's unsigned in the Windows headers and should be so in Delphi. In fact I think this was corrected in a recent release of Delphi.
I'd imagine that the preference for signed over unsigned is largely historical and not particularly significant. However, I can think of one example where it is important. Consider the for loop:
for i := 0 to Count-1 do
If i is unsigned and Count is 0 then this loop runs from 0 to $FFFFFFFF which is not what you want. Using a signed integer loop variable avoids that problem.
Pascal is a victim of its syntax here. The equivalent C or C++ loop has no such trouble
for (unsigned int i=0; i<Count; i++)
due to the syntactic difference and use of a comparison operator as stopping condition.
This could also be the reason why Length() on a string or dynamic array returns a signed value. And so for consistency, SetLength() should accept signed values. And given that the return value of Pos() is used to index strings, it should be signed also.
Here's another Stack Overflow discussion of the topic: Should I use unsigned integers for counting members?
Of course, I'm speculating wildly here. Perhaps there was no design and just out of habit the precedent of using signed values was set and became enshrined.
Some string related search functions return -1 when nothing is found.
I believe the reasoning behind this is that MaxInt is 2GB which is the maximum size for strings in 32 bit Delphi. This because a single process can have up to 2GB memory
There are many reasons for using signed integers, even some that might apply when you do not intend to return a negative value.
Imagine I write code that calls Pos, and I want to do math with the results. Would you rather have a negative result (Pos('x',s)-5) raise a range-check exception, underflow and become a very large unsigned number around 4 billion, or go negative, if Pos('x',s) returns 1? Either one is a source of problems for new users who seldom think about these cases, but the long-established tradition is that by using Integer results, it's your job to check for negative and zero results and not use them as string offsets. There is an advantage for beginning and for advanced programmers, in using Integer, and not having "negative" values roll under and become large unsigned values or raise range exceptions.
Secondly, remember that in beginning programming, one usually introduces Integer (signed) types long before one introduces unsigned types like Cardinal. Beginners often work with functions like Pos, and it makes sense to use the type that will create the least-unfriendly set of side effects. There are no negative side effects to having a range larger than the one you absolutely need (the range you probably need for Pos is 1 to maximum-string-length-in-delphi). There is zero benefit in 32-bit Delphi to using the Cardinal type for Pos, and there definitely ARE downsides to choosing it.
Once you get to 64-bit delphi, however, you could theoretically have strings LARGER than an Integer can hold, and moving to Cardinal wouldn't fix all your potential problems. However, the chance of anyone having a 2+ GB string is probably nil, and Delphi 64-bit compiler doesn't allow a >2 GB string, anyway. In my testing, I can achieve an almost 1 GB String in 64 bit Delphi. So the practical length limit for a Win64 string is about a billion (1073741814) characters, which is using nearly 2 GB of actual RAM. At that limit, I either get EIntOverflow or EAccessViolation, and it seems I am hitting Delphi run time library (RTL) bugs, not properly defined limits, so your mileage may vary.

Recommended initialization values for numbers

Assume you have a variety of number or int based variables that you want to be initialized to some default value. But using 0 could be problematic because 0 is meaningful and could have side affects.
Are there any conventions around this?
I have been working in Actionscript lately and have a variety of value objects with optional parameters so for most variables I set null but for numbers or ints I can't use null. An example:
package com.website.app.model.vo
{
public class MyValueObject
{
public function MyValueObject (
_id:String=null,
_amount:Number=0,
_isPurchased:Boolean=false
)
{ // Constructor
if( _id != null ) this.id = _id;
if( _amount != 0 ) this.amount = _amount;
if( _isPurchased != false ) this.isPurchased = _isPurchased;
}
public var id:String;
public var amount:Number;
public var isPurchased:Boolean;
}
}
The difficulty is that using 0 in the above code might be problematic if the value is not ever changed from its initial value. It is easy to detect if a variable has a null value. But detecting 0 may not be so easy because 0 might be a legitimate value. I want to set a default value to make the parameter optional but I also want to later detect in my code if the value was changed from its default without hard to debug side affects.
I suppose I could use something like -1 for a value. I was wondering if there are any well known coding conventions for this kind of thing? I suppose it depends on the nature of the variable and the data.
This is first my stack overflow question. Hopefully the gist of my question makes sense.
A lot of debuggers will use 0xdeadbeef for initializing registers. I always get a chuckle when I see that.
But, in all honesty, your question contains its own answer - use a value that your variable is not ever expected to become. It doesn't matter what the value is.
Since you asked in a comment I'll talk a little bit about C and C++. For efficiency reasons local variables and allocated memory are not initialized by default. But debug builds often do this to help catch errors. A common value used is 0xcdcdcdcd which is reasonably unlikely. It has the high bit set and is either a rather large unsigned or rather large negative signed number. As a pointer address it is odd which will cause an alignment exception if used on anything but a char (but not on X86). It has no special meaning as a 32 bit floating point number so it isn't a perfect choice.
Occasionally you'll see a partially aligned value in a variable such as 0xcdcd0000 or 0x0000cdcd. These can be treated as suspcious at the very least.
Sometimes different values will be used depending on the allocation area of library. That gives you a clue where a bad value may have originated (i.e., it itself wasn't initialized but it was copied from an unititialized value).
The ideal value would be invalid no matter what alignment you read from memory and is invalid over all primitive types. It also should look suspicious to a human so even if they do not know the convention they can suspect something is a foot. That's why 0xdeadbeef can be a good choice because the (hex viewing) programmer will recognize that as the work of a human and not random chance. Note also that it is odd and has the high bit set so it has that going for it.
The value -1 is often traditionally used as an "out of range" or "invalid" value to indicate failure or non-initialised data. Then again, that goes right down the pan if -1 is a semantically valid value for the variable...or you're using an unsigned type.
You seem to like null (and for a good reason), so why not just use it throughout?
In ActionScript you can only assign Number.NaN to variables that are typed Number, not int or uint.
That being said, because AS3 does not support named arguments you can always look at the arguments array (it's a built-in array that all functions have, unless you use the ...rest construct). If that array's length is less than the position of your numeric argument you know it wasn't passed in.
I often use a maximum value for this. As you say, zero often is a valid value. Generally max-int, while theoretically valid, is safe to exclude. But not always; be careful.
I like 0xD15EA5ED, it's similar to 0xDEADBEEF but is usually more accurate when debugging.

Resources