Is it safe to use enums across interop boundaries? (Delphi)

I'm writing a set of DLLs that allows other developers to write their own DLLs as extensions. In the Delphi code I make wide use of enums and enum sets, and I pass enums through the DLLs. I know I can safely use an enum through a DLL across different projects compiled with Delphi. However, I'm not sure how portable that is across other languages.
Is it safe to pass enums through a DLL while supporting various other languages? Or should I cast them to integers instead?

Enums need to be passed as integers (Word or DWORD), and you should use the compiler directive {$MINENUMSIZE} (a.k.a. {$Z}) to ensure that they're the proper size; otherwise the Delphi compiler will choose different sizes based on the number of enum values.
If you're planning to interop with C/C++ code on the Windows OS, use {$MINENUMSIZE 4}.
The documentation for {$MINENUMSIZE} addresses interop with C/C++; see the third paragraph:
The $Z directive controls the minimum storage size of Delphi enumerated types.
An enumerated type is stored as an unsigned byte if the enumeration has no more than 256 values, and if the type was declared in the {$Z1} state (the default). If an enumerated type has more than 256 values, or if the type was declared in the {$Z2} state, it is stored as an unsigned word. Finally, if an enumerated type is declared in the {$Z4} state, it is stored as an unsigned double word.
The {$Z2} and {$Z4} states are useful for interfacing with C and C++ libraries, which usually represent enumerated types as words or double words.
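To make the size contract concrete, here is a hedged sketch of the C side of such a boundary (the names are illustrative, not from the question). With {$MINENUMSIZE 4} in effect on the Delphi side, the enum occupies 4 bytes, matching the int-sized enum that C/C++ compilers on Windows typically use:

/* Illustrative C header for a DLL boundary shared with Delphi.
   Assumes the matching Delphi enum is declared under {$MINENUMSIZE 4},
   so both sides agree the value is 4 bytes wide. */
typedef enum PluginState {
    PS_IDLE    = 0,
    PS_RUNNING = 1,
    PS_STOPPED = 2
} PluginState;   /* sizeof(PluginState) == 4 with typical Windows compilers */

/* Function exported by the DLL, passing the enum by value across the boundary. */
__declspec(dllimport) void __stdcall SetPluginState(PluginState state);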

Is repr(C) a preprocessor directive?

I've seen some Rust codebases use the #[repr(C)] macro (is that what it's called?), but I couldn't find much information about it other than that it makes the type's memory layout the same as C's.
Here's what I would like to know: is this a preprocessor directive restricted to the compiler rather than part of the language itself (even though there aren't any other compiler front-ends for Rust), and why does Rust even have a memory layout different from C's? (It's just that I've never had to do this in another language.)
Here's a situation that demonstrates what I mean: if someone creates another compiler for Rust, are they required to implement this macro, or is it a compiler-specific thing?
#[repr(C)] is not a preprocessor directive, since Rust doesn't use a preprocessor [1]. It is an attribute. Rust doesn't have a complete specification, but the repr attribute is mentioned in the Rust reference, so it is absolutely a part of the language. Implementation-wise, attributes are parsed the same way all other Rust code is, and are stored in the same AST. Rust has no "attribute pass": attributes are an actual part of the language. If someone else were to implement a Rust compiler, they would need to implement #[repr(C)].
Furthermore, #[repr(C)] can't be implemented without some compiler magic. In the absence of a #[repr(...)], Rust compilers are free to arrange the fields of a struct/enum however they want to (and they do take advantage of this for optimization purposes!).
Rust does have a good reason for using its own memory layout. If compilers aren't tied to how a struct is written in the source code, they can do optimisations like not storing struct fields that are never read from, reordering fields for better performance, enum tag pooling [2], and using spare bits of NonZero* fields in the struct to store data (the last one isn't happening yet, but might in the future). But the main reason is that Rust has things that just don't make sense in C. For instance, Rust has zero-sized types (like () and [i8; 0]) which can't exist in C, as well as trait vtables, enums with fields, and generic types, all of which cause problems when trying to translate them to C.
[1] Okay, you could use the C preprocessor with Rust if you really wanted to. Please don't.
[2] For example, enum Food { Apple, Pizza(Topping) } and enum Topping { Pineapple, Mushroom, Garlic } can be stored in just 1 byte, since there are only 4 possible Food values that can be created.
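To make the interop purpose concrete, here is a hedged sketch (the Point type is made up for illustration, not taken from any of the code above). Given a Rust definition such as #[repr(C)] struct Point { x: i32, y: i32 }, the matching C declaration would be:

#include <stdint.h>

/* Illustrative C counterpart of a hypothetical Rust type
   #[repr(C)] struct Point { x: i32, y: i32 }.
   With repr(C), Rust lays the fields out in declaration order using C's
   size and alignment rules, so the struct can be passed across an
   extern "C" boundary by value or by pointer. */
typedef struct Point {
    int32_t x;
    int32_t y;
} Point;

/* A function exported from the Rust side with extern "C" could be declared as: */
extern int32_t manhattan(const Point *p);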
What is this?
It is not a macro; it is an attribute.
The Rust Book has a good chapter on what macros are, and it mentions that there are "attribute-like macros":
The term macro refers to a family of features in Rust: declarative macros with macro_rules! and three kinds of procedural macros:
Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
Attribute-like macros that define custom attributes usable on any item
Function-like macros that look like function calls but operate on the tokens specified as their argument
Attribute-like macros are macros that you use like attributes. For example:
#[route(GET, "/")]
fn index() {}
It does look like the repr attribute, doesn't it? 😃
So what is an attribute then?
Luckily Rust has great resources like rust-by-example which includes:
An attribute is metadata applied to some module, crate or item. This metadata can be used to/for:
conditional compilation of code
set crate name, version and type (binary or library)
disable lints (warnings)
enable compiler features (macros, glob imports, etc.)
link to a foreign library
mark functions as unit tests
mark functions that will be part of a benchmark
The Rust reference is also something you usually look at when you need to know something in more depth (it has a chapter on attributes).
To the compiler authors out there:
If you were to write a Rust compiler and wanted to support things like the standard library or other crates, then you would absolutely need to implement these, because those libraries use them and depend on them.
Otherwise you could come up with a subset of Rust that your compiler supports, but then most people wouldn't use it.
Why does Rust not just use the C layout?
The Nomicon explains why Rust needs to be able to reorder struct fields, for example: to save space and be more efficient. This is related to, among other things, generics and monomorphization. With repr(C), struct fields must be laid out in the same order as in the definition.
The C representation is designed for dual purposes. One purpose is for creating types that are interoperable with the C Language. The second purpose is to create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type.

No F# generics with constant "template arguments"?

It just occurred to me that F# generics do not seem to accept constant values as "template parameters".
Suppose one wanted to create a type RangedInt such that it behaves like an int but is guaranteed to contain only a sub-range of integer values.
A possible approach could be a discriminated union, similar to:
type RangedInt = | Valid of int | Invalid
But this does not work either, as there is no type-specific storage of the range information, and two RangedInt instances should be of different types if their ranges differ.
Still being a bit C++-infested, to me it would look similar to:
template<int low,int high>
class RangedInteger { ... };
Now the question arising is twofold:
Did I miss something and constant values for F# generics exist?
If I did not miss that, what would be the idiomatic way to accomplish such a RangedInt<int,int> in F#?
Having found Tomas Petricek's blog about custom numeric types, the equivalent of my question for that blog article would be: what if he had built not an IntegerZ5 but an IntegerZn<int> custom type family?
The language feature you're requesting is called Dependent Types, and F# doesn't have that feature.
It's not a particularly common language feature, and even Haskell (which most other Functional programming languages 'look up to') doesn't really have it.
There are languages with Dependent Types out there, but none of them I would consider mainstream. Probably the one I hear about the most is Idris.
Did I miss something and constant values for F# generics exist?
While F# has much stronger type inference than other .NET languages, at its heart it is built on .NET.
And .NET generics only support a small subset of what is possible with C++ templates. All type arguments to generic types must be types, and there is no defaulting of type arguments either.
If I did not miss that, what would be the idiomatic way to accomplish such a RangedInt in F#?
It would depend on the details. Setting the limits at runtime is one possibility – this would be the usual approach in .NET. Another would be units of measure (this seems less likely to be a fit).
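To illustrate the runtime-limits idea, here is a minimal, hedged sketch (written in C only for illustration; in F# this would typically be a record or class whose constructor validates the value, with the bounds held as ordinary fields):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical ranged integer whose bounds are ordinary runtime data
   rather than "template parameters"; every construction is checked. */
typedef struct RangedInt {
    int low;
    int high;
    int value;
    bool valid;   /* plays the role of the Valid/Invalid cases above */
} RangedInt;

static RangedInt ranged_int_make(int low, int high, int value) {
    RangedInt r = { low, high, value, value >= low && value <= high };
    return r;
}

int main(void) {
    RangedInt r = ranged_int_make(0, 10, 42);
    printf("valid: %s\n", r.valid ? "yes" : "no");   /* prints "valid: no" */
    return 0;
}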
What if he had built not an IntegerZ5 but an IntegerZn<int> custom type family?
I see two reasons:
It is an example, and avoiding generics keeps things simpler allowing focus on the point of the example.
What other underlying type would one use anyway? On contemporary systems smaller types (byte, Int16 etc.) are less efficient (unless space at runtime is the overwhelming concern); long would add size without benefit (it is only going to hold 5 possible values).

DWScript basetype efficiency

The limited set of base types in DWScript can be very convenient, but doesn't it add a lot of overhead? Considering that integers are Int64, this should be quite a bit of overhead when you're working with byte values, for example. Does DWScript optimize for this internally? If not, is there a way to use language extensions to add other base types, such as Byte?
If this would cause problems with type inference, and if it's possible to handle it in language extensions, inference for integer values could default to the main Integer type, or the extension could select the smallest fitting data type, for example.
DWS uses Variants as an internal base type for storing all values. Since a Variant is significantly larger than a Byte, there's really nothing to be gained by using the Byte type in scripts.

Negative array indexing and placement in memory (pointing)

In Fortran you can declare an array with any suitable (integral) range, for example:
real*8 array(-10:10)
I believe that Fortran, when passing by reference, will always pass around array(1) as the reference, but I'm not sure.
I'm using Fortran pointers, and I believe that Fortran points at the "1st" element's address, i.e. array(1), not array(-10). However, I'm not sure.
How does Fortran deal with negative array indexing in memory? And is it implementation defined?
Edit: To add a little more detail, I'm passing a malloc'd block from C to Fortran by using a Fortran pointer to point at the address, which is done by calling a Fortran routine from within C. That is, the C side does:
void * pointer = malloc(blockSize*sizeof(double));
fortranpoint_(pointer);
And the fortranPoint routine looks like:
real*8, target :: block(5, -6:6, 0:0)
real*8, pointer :: array(:,:,:)
entry fortranPoint(block)
array => block
return
The problem is that sometimes when it later tries to access say:
array(1, -6, 0)
I am not sure if this is accessing the address at the beginning of the block or somewhere before it. I now think this is implementation defined, but would like to know the details of each implementation.
Fortran array argument ABI depends on the compiler, and perhaps more crucially, on whether the called procedure has an explicit or implicit interface.
For an implicit interface, typically the address of the first element is passed [1]. In the callee, the procedure then adds an offset depending on how the array dummy argument is declared. E.g. if the array dummy argument is declared somearray(-10:10), then a reference to somearray(x) is calculated as
address_of_first_element_passed_in_to_the_procedure + x + 10
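As a rough sketch of that address calculation in C terms (names are illustrative; this assumes 8-byte reals and a dummy argument declared somearray(-10:10)):

#include <stddef.h>

/* Given the address of the first element that the caller passed in, the
   callee computes the address of somearray(x) for a dummy argument declared
   as somearray(-10:10) by subtracting the declared lower bound. */
static double *element_address(double *first_element_passed_in, ptrdiff_t x) {
    return first_element_passed_in + (x - (-10));   /* i.e. + x + 10 */
}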
If the procedure has an explicit interface, typically an array descriptor structure is passed rather than the address of the first element. In this structure, the callee can find information on the bounds of each dimension and, of course, a pointer to the actual data, allowing it to calculate the correct offset, similarly to the case of an implicit interface.
[1] Note that this is the first element in memory, that is, the lowest index for each dimension. Not somearray(1) regardless of how the array was declared.
To answer your updated question, for C/Fortran interoperability, use the ISO_C_BINDING feature which is nowadays widely available. This provides a standardized way to pass information between C and Fortran.
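As a hedged sketch of what that looks like from the C side (the names and interface are illustrative, not taken from the question's code):

#include <stdlib.h>

/* C side of a hypothetical interface built with ISO_C_BINDING.
   The Fortran side would declare something along the lines of:
       subroutine fortran_point(block, n) bind(C, name="fortran_point")
         use iso_c_binding
         integer(c_int), value :: n
         real(c_double) :: block(n)
   so no trailing underscore or compiler-specific name mangling is involved. */
extern void fortran_point(double *block, int n);

int main(void) {
    const int block_size = 5 * 13 * 1;   /* matches block(5, -6:6, 0:0) */
    double *block = malloc(block_size * sizeof(double));
    if (block != NULL) {
        fortran_point(block, block_size);
        free(block);
    }
    return 0;
}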
If the dummy argument for a regular array in Fortran is declared A(:) (or with more dimensions), the SHAPE is passed, not the specific index range. So the procedure will default to one-indexing. You can override this with a declaration in the procedure of A(-10:), or A(StartIndex:), where StartIndex is another argument.
Fortran pointers do include the index range, but the passing mechanism will be compiler dependent. Code interfacing this to C is likely to be OS and compiler dependent. As already suggested, I'd use a regular array and the ISO C Binding. It is MUCH easier than the old ways of figuring out the compiler's passing mechanisms, and it is standard and portable. If you have a large existing Fortran code base, you could write a "glue" Fortran procedure that maps between the regular Fortran variable declarations and the ISO C Binding names. While the types will have formally different names, in practice they will be the same if you select the correct ISO C types. The ISO C Binding has been available for many years now -- can you upgrade the compiler on the problem target platform? If not, I'd use a regular Fortran array and either use zero-indexing on the C side, or explicitly pass the desired indices as arguments.
There are examples of ISO C Binding usage on other Stack Overflow questions.
The interface to a procedure is explicit if it is declared so that it is known to the compiler in the caller. The simplest way is to place the procedures in a module and "use" the module in the caller. Having explicit interfaces helps avoid bugs, since the compiler can check consistency between the arguments of the caller and the callee. It is a little bit like C header files, only easier.

How to also prepare for 64-bits when migrating to Delphi 2010 and Unicode

As 64-bit support is not expected in the next version, it is no longer an option to wait and migrate our existing code base to Unicode and 64-bit in one go.
However, it would be nice if we could already prepare our code for 64-bit while doing our Unicode translation. This would minimize the impact if it finally appears in version 2020.
Any suggestions on how to approach this without introducing too much clutter if it doesn't arrive until 2020?
There's another similar question, but I'll repeat my reply here too, to make sure as many people as possible see this info:
First up, a disclaimer: although I work for Embarcadero, I can't speak for my employer. What I'm about to write is based on my own opinion of how a hypothetical 64-bit Delphi should work, but there may or may not be competing opinions and other foreseen or unforeseen incompatibilities and events that cause alternative design decisions to be made.
That said:
There are two integer types, NativeInt and NativeUInt, whose size will float between 32-bit and 64-bit depending on the platform. They've been around for quite a few releases. No other integer types will change size depending on the bitness of the target.
Make sure that any place that relies on casting a pointer value to an integer or vice versa uses NativeInt or NativeUInt for the integer type. TComponent.Tag should be NativeInt in later versions of Delphi.
I'd suggest not using NativeInt or NativeUInt for non-pointer-based values. Try to keep your code semantically the same between 32-bit and 64-bit. If you need 32 bits of range, use Integer; if you need 64 bits, use Int64. That way your code should run the same on both bitnesses. Only if you're casting to and from a Pointer value of some kind, such as a reference or a THandle, should you use NativeInt.
Pointer-like things should follow similar rules to pointers: object references (obviously), but also things like HWND, THandle, etc.
Don't rely on internal details of strings and dynamic arrays, like their header data.
Our general policy on API changes for 64-bit should be to keep the same API between 32-bit and 64-bit where possible, even if it means that the 64-bit API does not necessarily take advantage of the machine. For example, TList will probably only handle MaxInt div SizeOf(Pointer) elements, in order to keep Count, indexes etc. as Integer. Because the Integer type won't float (i.e. change size depending on bitness), we don't want to have ripple effects on customer code: any indexes that round-tripped through an Integer-typed variable, or for-loop index, would be truncated and potentially cause subtle bugs.
Where APIs are extended for 64-bit, they will most likely be done with an extra function / method / property to access the extra data, and this API will also be supported in 32-bit. For example, the Length() standard routine will probably return values of type Integer for arguments of type string or dynamic array; if one wants to deal with very large dynamic arrays, there may be a LongLength() routine as well, whose implementation in 32-bit is the same as Length(). Length() would throw an exception in 64-bit if applied to a dynamic array with more than 2^32 elements.
Related to this, there will probably be improved error checking for narrowing operations in the language, especially narrowing 64-bit values to 32-bit locations. This would hit the usability of assigning the return value of Length to locations of type Integer if Length() returned Int64. On the other hand, specifically for compiler-magic functions like Length(), there may be some advantage taken of the magic, e.g. to switch the return type based on context. But a similar advantage can't be taken in non-magic APIs.
Dynamic arrays will probably support 64-bit indexing. Note that Java arrays are limited to 32-bit indexing, even on 64-bit platforms.
Strings probably will be limited to 32-bit indexing. We have a hard time coming up with realistic reasons for people wanting 4GB+ strings that really are strings, and not just managed blobs of data, for which dynamic arrays may serve just as well.
Perhaps a built-in assembler, but with restrictions, like not being able to freely mix with Delphi code; there are also rules around exceptions and stack frame layout that need to be followed on x64.
First, look at the places where you interact with non-Delphi libraries and API calls; they might differ. On Win32, functions exported with the stdcall calling convention are decorated like _SomeFunction@4 (the @4 indicating the size of the parameters in bytes). On Win64, there is only one calling convention, and the functions in a DLL are no longer decorated. If you import functions from DLL files, you might need to adjust the declarations.
Keep in mind that a 64-bit exe cannot load a 32-bit DLL, so if you depend on third-party DLL files, you should check for 64-bit versions of those files as well.
Also, look at Integers: if you depend on their maximum value, for example if you let them overflow and wait for the moment that happens, it will cause trouble if the size of an integer changes.
Also, when working with streams, if you serialize data that includes an integer, a changed integer size will cause trouble, since your stream will be out of sync.
So, in places where you depend on the size of an integer or pointer, you will need to make adjustments. When serializing such data, keep this size issue in mind as well, as it might cause data incompatibilities between the 32-bit and 64-bit versions.
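A minimal sketch of that serialization point (written in C for illustration; the record fields are hypothetical): pin the on-disk field widths with fixed-size types so the stream format does not change when the native integer or pointer size does.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical save routine: each field is written at an explicitly fixed
   width, so 32-bit and 64-bit builds produce the same stream layout.
   (Byte order would still need attention if the file moves between
   architectures.) */
static int save_item(FILE *stream, int32_t id, int64_t timestamp) {
    if (fwrite(&id, sizeof id, 1, stream) != 1) return -1;
    if (fwrite(&timestamp, sizeof timestamp, 1, stream) != 1) return -1;
    return 0;
}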
Also, the FreePascal compiler with the Lazarus IDE already supports 64-bit. This alternative Object Pascal compiler is not 100% compatible with the Borland/Codegear/Embarcadero dialect of Pascal, so just recompiling with it for 64-bit might not be that simple, but it might help point out problems with 64-bit.
The conversion to 64-bit should not be very painful. Start by being intentional about the size of an integer where it matters. Don't use "Integer"; instead use Int32 for integers sized at 32 bits and Int64 for integers sized at 64 bits. In the last bit-size transition the definition of Integer went from Int16 to Int32, so you're playing it safe by specifying the exact bit depth.
If you have any inline assembly, create a Pascal equivalent and write some unit tests to ensure that they operate the same way. Perform some timing tests of both and see whether the assembly still runs fast enough to be worth keeping. If it does, you will want to maintain both versions as changes are needed.
Use NativeInt for integers that can contain casted pointers.
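On the C side of an interop boundary, the analogous rule is to use intptr_t/uintptr_t for pointer-sized integers; a minimal, hedged sketch (the callback scheme is hypothetical):

#include <stdint.h>

/* A pointer value round-tripped through an integer must use a type that is
   pointer-sized on both 32-bit and 64-bit targets: intptr_t/uintptr_t here,
   analogous to NativeInt/NativeUInt on the Delphi side. */
typedef void (*Callback)(void *user_data);

static void invoke_later(Callback cb, uintptr_t user_data_as_int) {
    /* ...queue it somewhere; for illustration, just call it immediately... */
    cb((void *)user_data_as_int);
}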
