What should we expect from the floating point support in 64-bit Delphi compiler?
Will the 64-bit compiler use SSE to implement floating point arithmetic?
Will the 64-bit compiler support the current 80-bit floating point type (Extended)?
These questions are closely related, so I ask them as a single question.
I made two posts on the subject (here and there). To summarize: yes, the 64-bit compiler uses SSE2 (double precision), but it doesn't use SSE (single precision). Everything is converted to double-precision floats and computed using SSE2 (edit: however, there is an option to control that, see below).
This means, for instance, that while maths on double-precision floats is fast, maths on single-precision floats is slow (lots of redundant conversions between single and double precision are thrown in), "Extended" is aliased to "Double", and the precision of intermediate computations is limited to double precision.
Edit: There was an undocumented (at the time) directive that controls SSE code generation: {$EXCESSPRECISION OFF} enables proper single-precision SSE code generation, which brings performance back within expectations.
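As a minimal sketch (the procedure and data are made up for illustration), the directive goes near the top of a unit that does heavy single-precision work on Win64:

  {$IFDEF CPUX64}
    {$EXCESSPRECISION OFF} // let the Win64 compiler emit single-precision SSE code
  {$ENDIF}

  procedure ScaleAll(var Data: array of Single; Factor: Single);
  var
    I: Integer;
  begin
    // With excess precision disabled, these multiplications can stay in
    // single precision instead of being widened to Double and back.
    for I := 0 to High(Data) do
      Data[I] := Data[I] * Factor;
  end;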
According to Marco van de Voort in his answer to: How should I prepare my 32-bit Delphi programs for an eventual 64-bit compiler:
The x87 FPU is deprecated on x64, and in general SSE2 will be used for floating point. So floating point and its exception handling might work slightly differently, and Extended might not be 80-bit (but 64-bit or, less likely, 128-bit). This also relates to the usual rounding (coprocessor control word) changes when interfacing with C code that expects a different FPU control word.
PhiS commented on that answer with:
I wouldn't say that the x87 FPU is deprecated, but it is certainly the case that Microsoft have decided to do their best to make it that way (and they really don't seem to like 80-bit FP values), although it is clearly technically possible to use the FPU/80-bit floats on Win64.
I just posted an answer to your other question, but I guess it actually should go here:
Obviously, nobody except for Embarcadero can answer this for sure before the product is released.
It is very likely that any decent x64 compiler will use the SSE2 instruction set as a baseline and therefore attempt to do as much floating point computation using SSE features as possible, minimising the use of the x87 FPU. However, it should also be said that there is no technical reason that would prevent the use of the x87 FPU in x64 application code (despite rumours to the contrary which have been around for some time; if you want more info on that point, please have a look at Agner Fog's Calling Convention Manual, specifically chapter 6.1 "Can floating point registers be used in 64-bit Windows?").
Edit 1: Delphi XE2 Win64 indeed does not support 80-bit floating-point calculations out of the box (see e.g. the discussion here), although it allows one to read/write such values. One can bring such capabilities back to Delphi Win64 using a record + class operators, as is done in this TExtendedX87 type (although caveats apply).
For the Double = Extended part:
Read Allen Bauer's Twitter account, kylix_rd:
http://twitter.com/kylix_rd
In hindsight this is logical: while SSE2 registers are 128 bits wide, they are used as two 64-bit doubles.
We won't know for sure how the 64-bit Delphi compiler will implement floating point arithmetic until Embarcadero actually ships it. Anything prior to that is just speculation. But once we know for sure it'll be too late to do anything about it.
Allen Bauer's tweets do seem to indicate that they'll use SSE2 and that the Extended type may be reduced to 64 bits instead of 80 bits. I think that would be a bad idea, for a variety of reasons. I've written up my thoughts in a QualityCentral report, "Extended should remain an 80-bit type on 64-bit platforms".
If you don't want your code to drop from 80-bit precision to 64-bit precision when you move to 64-bit Delphi, click on the QualityCentral link and vote for my report. The more votes, the more likely Embarcadero will listen. If they do use SSE2 for 64-bit floating point, which makes sense, then adding 80-bit floating point using the FPU will be extra work for Embarcadero. I doubt they'll do that work unless a lot of developers ask for it.
If you really need it, then you can use the TExtendedX87 unit by Philipp M. Schlüter (PhiS on SO) as mentioned in this Embarcadero forum thread.
@PhiS: when you update your answer with the info from mine, I'll remove mine.
Related
In Xamarin.iOS project properties, under "iOS Build" there's an option for: "Perform all 32 bit float operations as 64 bit float".
Microsoft seems to say that using 32-bit floats "affects precision and, possibly, compatibility" in a bad way, so it would be better to use 64-bit precision.
But the tooltip in Visual Studio (when hovering over "Perform all 32 bit float operations as 64 bit float") says that "using 64...is slightly incompatible with .net code."
So which one is it?
You have misread the statement in your first point: Microsoft doesn't say that using 32-bit is bad and that you therefore need to use 64-bit. It's rather the opposite.
Basically, it is always preferable to use 64-bit float operations. They are enabled by default and according to the Floating Point Operations in Xamarin.iOS docs:
While this higher precision is closer to what developers expect from floating point operations in C# on the desktop, on mobile, the performance impact can be significant.
Let's see what the Code Analysis tool is:
Xamarin.iOS analysis is a set of rules that check your project settings to help you determine if better/more optimized settings are available.
So, even though it is preferable to use 64-bit floats, this isn't always the best choice. When you run the Code analysis tool, it will scan your project to see if there is a better suited configuration for your solution (it depends on the project's flow).
Occasionally, 64-bit floats may do you more harm than good. In that case, the linter will warn you with XIA0005: Float32Rule, which will suggest that you uncheck the option, as Microsoft's message says.
I have recently started doing some work with COBOL, where I have only ever done work in z/OS Assembler on a Mainframe before.
I know that COBOL will be translated into Mainframe machine-code, but I am wondering if it is possible to see the generated code?
I want to use this to better understand the inner workings of COBOL.
For example, if I was to compile a COBOL program, I would like to see the assembly that results from the compile. Is something like this possible?
Relenting, only because of this: "I want to use this to better understand the inner workings of COBOL".
The simple answer is that there is, for Enterprise COBOL on z/OS, a compiler option, LIST. LIST will provide what is known as the "pseudo assembler" output in your compile listing (and some other useful stuff for understanding the executable program). Another compiler option, OFFSET, shows the displacement from the start of the program of the code generated for each COBOL verb. LIST (which inherently has the offset already) and OFFSET are mutually exclusive. So you need to specify LIST and NOOFFSET.
Compiler options can be specified on the PARM of the EXEC PGM= for the compiler. Since the PARM is limited to 100 characters, compiler options can also be specified in a data set with a DDName of SYSOPTF (the use of which is, in turn, enabled by a compiler option).
A third way to specify compiler options is to include them in the program source, using the PROCESS or (more commonly, since it is shorter) CBL statement, e.g. CBL LIST,NOOFFSET as the first statement in the source.
It is likely that you have a "panel" to compile your programs. This may have a field allowing options to be specified.
However, be aware of a couple of things: it is possible, when installing the compiler, to "nail in" compiler options (which means they can't be changed by the application programmer); it is possible, when installing the compiler, to prevent the use of PROCESS/CBL statements.
The reason for the above is standardisation. There are compiler options which affect code generation, and using different code-generation options within the same system can cause unwanted effects. Even across systems, different code-generation options may not be desirable if programmers are prone to expect the "normal" options.
It is unlikely that listing-only options will be "nailed", but if you are prevented from specifying options, then you may need to make a special request. This is not common, but you may be unlucky. Not my fault if it doesn't work for you.
The compiler options, and how you can specify them, are documented in the Enterprise COBOL Programming Guide for your specific release. There you will also find the documentation of the pseudo-assembler (be aware that it appears in the document as "pseudo-assembler", "pseudoassembler" and "pseudo assembler", for no good reason).
When you see the pseudo-assembler, you will see that it is not in the same format as an Assembler statement (I've never discovered why, but as far as I know it has been that way for more than 40 years). The line with the pseudo-assembler will also contain the machine-code in the format you are already familiar with from the output of the Assembler.
Don't expect to see a compiled COBOL program looking like an Assembler program that you would write. Enterprise COBOL adheres to a language Standard (1985) with IBM Extensions. The answer to "why does it do it like that?" will be "because", except for optimisations (see later).
What you see will depend heavily on the version of your compiler, because in the summer of 2013 IBM introduced V5, with entirely new code generation and optimisation. Up to V4.2, the code generator dated back to "ESA", which meant that the more than 600 machine instructions introduced since ESA, as well as the extended registers, were not available to Enterprise COBOL programs. The same COBOL program compiled with V4.2 and with V6.1 (the latest version at the time of writing) will be markedly different, not only because of the different instructions, but also because the structure of an executable COBOL program was redesigned.
Then there's optimisation. With V4.2, there was one level of possible optimisation, and the optimised code was generally "recognisable". With V5+, there are three levels of optimisation (you get level zero without asking for it) and the optimisations are much more extreme, including, well, extreme stuff. If you have V5+ and want to know a bit more about what is going on, use OPT(0) to get a grip on what is happening, and then note the effects of OPT(1) and OPT(2) (and realise, from the increased compile times, how much work is put into the optimisation).
There's not really a substantial amount of official documentation of the internals. A web search will reveal some stuff. IBM's Compiler Cafe:COBOL Cafe Forum - IBM is a good place if you want more knowledge of V5+ internals, as a couple of the developers post there. For up to V4.2, here may be as good a place as any to ask further specific questions.
Are floating point operations in Delphi deterministic?
I.e., will I get the same result from an identical floating point mathematical operation in the same code compiled with the Delphi Win32 compiler as I would with the Win64 compiler, or the OS X compiler, or the iOS compiler, or the Android compiler?
This is a crucial question as I'm implementing multiplayer support in my game engine, and I'm concerned that the predictive results on the client side could very often differ from the definitive (and authoritative) determination of the server.
The consequence of this would be the appearance of "lag" or "jerkiness" on the client side when the authoritative game state data overrules the predictive state on the client side.
Since I don't realistically have the ability to test dozens of different devices on different platforms compiled with the different compilers in anything resembling a "controlled condition", I figure it's best to put this question to the Delphi developers out there to see if anyone has an internal understanding of floating point determinism at a low-level on the compilers.
I think there is no simple answer. A similar question was discussed here.
In general, there are two standards for the representation of floating point numbers:
IEEE 754-1985 and IEEE 754-2008.
All modern (and even quite old) CPUs follow these standards, which guarantees some things:
The binary representation of the same standard floating type will be equal.
The results of some operations (not all, only the basic operations!) are guaranteed to be equal, but only if the compiler emits the same kind of instruction; I am not sure that is always the case.
But if you use some extended operations, such as square root, the result may vary even between different models of desktop CPUs. You can read this article for some details:
http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
P.S. As tmyklebu mentioned, square root is in fact also defined by IEEE 754, so it is possible to guarantee the same result for the same input for add, subtract, multiply, divide and square root. A few other operations are also defined by IEEE, but for all the details it is better to read the standard itself.
Putting aside the standards for floating point calculations for a moment, consider that the 32- and 64-bit compilers compile to use the old FPU vs the newer SSE instructions. I would find it difficult to trust that every calculation would always come out exactly the same on different hardware implementations. Better to go the safe route and assume that if it's within a small delta of difference, you evaluate it as equal.
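One way to implement that "equal within a small delta" route (the tolerance value here is made up; pick one suited to your game's units) is Delphi's System.Math.SameValue:

  uses
    System.Math;

  function RoughlyEqual(const A, B: Double): Boolean;
  begin
    // SameValue compares with a tolerance rather than exact bit equality,
    // which absorbs small cross-compiler rounding differences.
    Result := SameValue(A, B, 1E-9);
  end;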
From experience I can tell that the results are different: 32-bit works with Extended precision by default, while 64-bit works with double precision by default.
Consider the statement

  var
    x, y, z: Double;
  ...
  x := y * z;

In Win32 this will execute as x := Double(Extended(y) * Extended(z));
In Win64 this will execute as x := Double(Double(y) * Double(z));
Even if you put a lot of effort into ensuring that you use the same precision and mode, whenever you call third-party libraries you need to consider that they may internally change these flags.
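A defensive sketch of that point (SomeThirdPartyRoutine is a hypothetical stand-in): save and restore the rounding and exception settings around such calls with the helpers in System.Math:

  uses
    System.Math;

  procedure CallThirdPartySafely;
  var
    SavedRounding: TRoundingMode;
    SavedMask: TArithmeticExceptionMask;
  begin
    SavedRounding := GetRoundMode;
    SavedMask := GetExceptionMask;
    try
      SomeThirdPartyRoutine; // may silently alter the FPU/SSE control flags
    finally
      SetRoundMode(SavedRounding);  // restore our known-good state
      SetExceptionMask(SavedMask);
    end;
  end;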
Question
Are there any resources for learning how to use assembly in Delphi?
Background Information
I've found and read some general assembly and instruction set references (x86, MMX, SSE etc). But I'm finding it difficult to apply that information in Delphi. General things like how to get the value of a class property etc.
I would like to have the option to use assembly when optimising code.
I understand:
It will be difficult to beat the compiler.
High-level optimisation techniques (such as choosing different algorithms, caching, etc.) are much more likely to increase performance by several orders of magnitude than low-level assembly optimisations.
Profiling is vital. I'm using Sampling Profiler for real-world performance analysis and cpu cycle counts for low-level details.
I am interested in learning how to use assembly in Delphi because:
It can't hurt to have another tool in the toolbox.
It will help with understanding the compiler generated assembly output.
Understanding what the compiler is doing may help with writing better performing pascal code.
I'm curious.
Here is a resource that could be helpful...
www.guidogybels.eu/docs/Using%20Assembler%20in%20Delphi.pdf
(I wanted to add a comment to @Glenn with this info, but am forced to use the Answer mechanism as I am new to this forum and don't have enough rep...)
Most optimization involves creating better algorithms: that is usually where the 'order of magnitude' speed improvements can be obtained.
The x64 assembly world is a big change from the x86 assembly world, which means that with the introduction of x64 in Delphi XE2 (very soon now), you will have to write all your assembly code twice.
Getting yourself a better algorithm in Delphi relieves you of writing that assembly code at all.
The major area where assembly can help (but often smartly crafted Delphi code helps a lot too) is low-level bit/byte twiddling, for instance when doing encryption. On the other hand, FastMM (the fast memory manager for Delphi) has almost all of its code written in Delphi.
As Marco already wrote: starting from the disassembled code is often a good start. But assembly optimizations can go very far.
An example you can use as a starting point is for instance the SynCrypto unit which has an option for using either Delphi or assembly code.
The way I read your post, you aren't looking so much for assembler resources as for resources explaining how Delphi declarations are structured within memory, so you can access them via assembler. This is indeed a difficult thing to find, but not impossible.
Here is a good resource I've found to begin to understand how Delphi structures its declarations. Since assembler only involves itself with discrete data addresses and CPU-defined data types, you'll be fine with any Delphi structure as long as you understand it and access it properly.
The only other concern is how to interact with Delphi procedure and function headers to get at the data you want (assuming you want to write your assembler using Delphi's inline facility), but that just requires an understanding of the standard function calling conventions. This and this will be useful to that end; a tiny example follows below.
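As a minimal illustration of the calling-convention point (Win32 register convention assumed; the function is invented for the example):

  function AddInts(A, B: Integer): Integer;
  asm
    // With Delphi's default Win32 register convention, A arrives in EAX,
    // B in EDX, and the Integer result is returned in EAX.
    ADD EAX, EDX
  end;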
Now using actual assembler (linked OBJ files) as opposed to the inline assembler is another topic, which will vary depending on the assembler chosen. You can find information on that as well, but if you have an interest you can always ask that question, too.
HTH.
To use BASM efficiently, you need knowledge of both (1) how Delphi does things at a low level and (2) assembly. Most of the time, you will not find both of these things described in one place.
However, Dennis Christensen's BASM for Beginners and this Delphi3000 article go in that direction. For more specific questions, besides Stack Overflow, Embarcadero's BASM forum is also quite useful.
The simplest solution is always to code it in Pascal first and look at the generated assembler.
Speed-wise, assembler is usually only a win in tight loops; in general code there is hardly any improvement, if any. I have only one piece of assembler in my code, and its benefit comes from recoding a floating point vector operation in fixed-point SSE. The saturation provided by the SIMD instruction sets is an additional bonus.
Even worse, much of the ill-advised assembler code floating around the web is actually slower than the Pascal equivalents on modern processors, because the trade-offs of processors have changed over time.
Update:
Then simply load the class property into a local variable in the prologue of your procedure, before you enter the assembler loop, or move the assembler to a separate procedure; a sketch follows below. Choose your battles.
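A sketch of the first suggestion (TThing and SumToCount are invented for illustration; Win32 inline assembler assumed):

  type
    TThing = class
    private
      FCount: Integer;
    public
      property Count: Integer read FCount;
      function SumToCount: Integer;
    end;

  function TThing.SumToCount: Integer;
  var
    LCount, LSum: Integer;
  begin
    LCount := Count; // read the property in plain Pascal first
    asm
      MOV ECX, LCount // the asm loop only ever touches the local
      XOR EAX, EAX    // running sum
      XOR EDX, EDX    // counter
    @Loop:
      CMP EDX, ECX
      JGE @Done
      INC EDX
      ADD EAX, EDX    // sums 1..Count
      JMP @Loop
    @Done:
      MOV LSum, EAX
    end;
    Result := LSum;
  end;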
Studying RTL/VCL source might also yield ideas how to access certain constructs.
By the way, not all low-level optimization is done using assembler. A lot can also be done at the Pascal level with some pointer knowledge, and things like cache optimization can sometimes be done at the Pascal level too (see e.g. Cache optimization of rotating bitmaps).
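For instance, a pointer-based loop of the kind meant here (the routine is made up for illustration):

  procedure InvertBuffer(P: PByte; Count: Integer);
  var
    I: Integer;
  begin
    // Walking a raw pointer avoids per-element array indexing and range
    // checking while staying in plain Pascal.
    for I := 1 to Count do
    begin
      P^ := not P^;
      Inc(P);
    end;
  end;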
Can I work with large numbers (more than 10^400) using a built-in method in Delphi?
Not built-in, but you might want to check out MPArith for arbitrary precision maths.
There is also a Delphi BigInt library on SourceForge. I haven't tried it, however, but I include it for completeness.
You could implement your own large-number routines using Delphi's operator overloading, for example for add, subtract, multiply and divide; a sketch follows below.
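A toy sketch of that idea (base-10 digits for clarity, not efficiency; TBigNum and every name in it are invented for illustration):

  uses
    System.Math;

  type
    TBigNum = record
    private
      FDigits: array of Byte; // base-10 digits, least significant first
    public
      class function FromString(const S: string): TBigNum; static;
      function ToString: string;
      class operator Add(const A, B: TBigNum): TBigNum;
    end;

  class function TBigNum.FromString(const S: string): TBigNum;
  var
    I: Integer;
  begin
    SetLength(Result.FDigits, Length(S));
    for I := 1 to Length(S) do
      Result.FDigits[Length(S) - I] := Ord(S[I]) - Ord('0');
  end;

  function TBigNum.ToString: string;
  var
    I: Integer;
  begin
    Result := '';
    for I := High(FDigits) downto 0 do
      Result := Result + Chr(Ord('0') + FDigits[I]);
  end;

  class operator TBigNum.Add(const A, B: TBigNum): TBigNum;
  var
    I, Carry, Sum: Integer;
  begin
    SetLength(Result.FDigits, Max(Length(A.FDigits), Length(B.FDigits)) + 1);
    Carry := 0;
    for I := 0 to High(Result.FDigits) do
    begin
      Sum := Carry;
      if I <= High(A.FDigits) then Inc(Sum, A.FDigits[I]);
      if I <= High(B.FDigits) then Inc(Sum, B.FDigits[I]);
      Result.FDigits[I] := Sum mod 10;
      Carry := Sum div 10;
    end;
    if Result.FDigits[High(Result.FDigits)] = 0 then
      SetLength(Result.FDigits, Length(Result.FDigits) - 1); // drop the leading zero
  end;

With that in place, Writeln((TBigNum.FromString('999') + TBigNum.FromString('1')).ToString) prints 1000 via the Add operator.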
Intel has also added new instructions for multiplication, and possibly also for division, in their latest chip design to come out in the near future.
One of these instructions is called: mulx
Intel mentions multiple carry streams, which would allow multiplication to be accelerated as well.
x86 already had subtract-with-borrow and add-with-carry, so now these new instructions do more or less the same for long multiplication and division and such; there are two methods to do multiplication, and by using both, apparently this becomes possible.
In the future, Delphi will probably support these new instructions as well, which could make programming something like this extra interesting.
For now these 4 basic operations might take you somewhere... or perhaps nowhere.
It depends a bit on what you want to do. What kind of maths? Just basic maths like add/sub/mul/div, or more complex maths like cosine, sine, tan and all kinds of other functions?
As far as I know, operator overloading is available for records. I can vaguely remember that it might have been added to classes as well, but take that with a grain of salt for now.
Operator overloading used to have a bug when converting between types, but it has been solved in later Delphi versions, so it should be good to go.