how to use SSE instruction in the x64 architecture in c++? - sse

Currently I am using Visual C++ inline assembly to embed some core function using SSE; however I juts realised that inline assembly is not supported in x64 mode.
How can I use SSE when I build my software in x64 architecture?

The modern method to use assembly instructions in C/C++ is to use intrinsics. Intrinsics have several advantages over inline assembly such as:
You don't have to worry about 32-bit and 64-bit mode.
You don't need to worry about registers and register spilling.
No need to worry AT&T and Intel Syntax.
No need to worry about calling conversions.
The compiler can optimize intrinsics further which it won't do with inline assembly.
Intrinsics are compatible (for the most intrinsics) with GCC, MSVC, ICC, and Clang.
I also like intrinsics because it's easy to emulate hardware with them for example to prepare for AVX512.
You can find the list of Intrinsics MSVC supports here. Intel has better information on intrinsics as well which agrees mostly with MSVC's intrinsics.
But sometimes you still need or want inline assembly. In my opinion it's really stupid that Microsoft does not allow inline assembly in 64-bit mode. This means they have to define intrinsics for several things that other compilers can still do with inline assembly. One example is CPUID. Visual Studio has an intrinsic for CPUID but GCC still uses inline assembly. Another example is adc. For a long time MSVC had no intrinsic for adc but now it appears they do.
Additionally, because they have to create intrinsics for everything it causes confusion. They have to create an intrinsic for mulx but the Intel's documentation for this is wrong. They also have to create intrinics for adcx and adox as well but their documentation disagrees with Intel's and the generated assembly shows that no intrinsic produces adox. So once again the programmer is left waiting for an intrinsic for adox. If they had just allowed inline assembly then there would be no problem.
But back to SSE. With few exceptions, e.g. _mm_set_epi64x in 32-bit mode on MSVC (I don't know if that's been fixed) the SSE/AVX/AVX2 intrinsics work as expected with MSVC, GCC, ICC, and Clang.

Related

Compile Delphi unit with SSE (fma)

In Free-Pascal you can determine if the code is compiled using SSE2/3/64 instructions via the conditional defines from
https://www.freepascal.org/docs-html/current/prog/prog.html#QQ2-333-379,
Table G.3: Possible FPU defines when compiling using FPC
FPUSSE2 SSE 2 instructions on Intel I386 and higher.
FPUSSE3 SSE 3 instructions on Intel I386 and higher, AMD64.
FPUSSE64 SSE64 FPU on Intel I386 and higher, AMD64.
I know the Delphi 64-bit compilers use SSE in the Win RTL, but my question is:
Is there a known method in Delphi to check at compile time, if a unit is compiled with SSE instructions, especially if a*b + c is computed with hardware fma instructions?
Is there a known method in Delphi to check at compile time, if a unit is compiled with SSE instructions?
On Intel platforms, if the CPUX64 conditional is defined then the compiler generates floating point code using SSE instructions. Otherwise, x87 instructions are generated.
No Delphi compiler generates code using FMA instructions. The floating point codegen used by dcc64 has not changed materially since its initial release in XE2.

If we can run 32 bit executables on 64 bit Windows, why can't we convert it?

WoW64 makes it possible to run 32 bit applications on 64 bit Windows. If the conversion from 32 bit instructions to 64 bit instructions can be made at runtime, why can't we convert the executable itself to 64 bit?
That is because WoW64 doesn't convert 32bit instructions to 64bit.
Your 32bit executable is run in 32 bit mode by switching your CPU to compatibility mode. There are some conversions inlined for API/driver calls, but most of the code is not converted.
*This is only true for the x86-64 architecture. The IA-64 architecture doesn't support this, so WoW actually converts your code to 64bit, but with a significant performance penalty.
*.NET code compiled to MSIL is JITted to the correct architecture. Probably the same happens with other architectures with an Intermediate Language like Java, but i'm no expert there.
Yes, Wow64 lets you run 32-bit programs but they still run in 32-bit mode - no code alterations are performed. Such automatic translation would be impossible for a native application.
The number one problem is that native applications have no annotations explaining what the code does. Just one example: the compiler compiles pointer manipulation code and uses a 32-bit register to hold this pointer value on 32-bit platform and emits bare machine code for that - the runtime will have no idea that this was a pointer and it needs to be placed in 64-bit register on 64-bit platform.
Managed environments such as Java and .NET can deal with it - the compiler emits "intermediate language" code with necessary annotations that is then compiled for the target platform before the code is first run.

How and where can I write ARM Assembly codes in Embarcadero Delphi XE5 with Android?

How and where can I write ARM Assembly codes in Embarcadero Delphi XE5 with Android?
That would be the best, if I can write it inline.
Delphi mobile compiler do not support the asm ... end blocks.
But the "old good way" is still available, since we are talking about a Native compiler.
What you can do is compile your own module with an external assembler (e.g. GNU AS), then link it to your Delphi XE* application.
For instance, System.RTTI uses low-level asm tricks via external statically linked files:
procedure RawInvoke(CodeAddress: Pointer; ParamBlock: PParamBlock);
external 'librtlhelper.a' name 'rtti_raw_invoke';
procedure RawIntercept;
external 'librtlhelper.a' name 'rtti_raw_intercept';
Take a look at this Japanese article - Google translate is your friend!
It is not possible.
Use Atomic Instrinsics Instead of Assembly Language.
Quote:
The Delphi mobile compilers do not support a built-in assembler.
The Delphi mobile compilers do not support inline assembler. The documentation makes this clear:
The inline assembler is available on:
DCC32.EXE, the Delphi Command Line Compiler
DCC64.EXE, the Delphi 64-bit Command Line Compiler
DCCOSX.EXE, the Delphi Cross Compiler for OS X
You'll need to find an assembler to create something that the Delphi mobile compiler can consume, for instance a shared library.
You can not.
LLVM - which is the engine behind Delphi Mobile - has its kind of an assembler language: http://llvm.org/docs/CommandGuide/llvm-as.html
But it would hardly be ARM kind or x86 kind, since LLVM tries to be CPU-agnostic.
Anyway Delphi officially has no support neither for CPU-native assembling language nor for LLVM kind of it.
http://docwiki.embarcadero.com/RADStudio/XE4/en/Migrating_Delphi_Code_to_iOS_from_Desktop#Use_Atomic_Instrinsics_Instead_of_Assembly_Language
http://docwiki.embarcadero.com/RADStudio/XE4/en/Using_Inline_Assembly_Code

OMF format to COFF format

is there any tool like Borland "coff2omf.exe"for converting Borland OMF LIB format to
MS VC++ COFF LIB format ?
actually i want to create .obj file in delphi and use that in MSVC++ .
Yes, there is such a tool.
See this tool.
This utility can be used for converting object files between COFF/PE, OMF, ELF and Mach-O formats for all 32-bit and 64-bit x86 platforms. Can modify symbol names in object files. Can build, modify and convert function libraries across platforms. Can dump object files and executable files. Also includes a very good disassembler supporting the SSE4, AVX, AVX2, FMA and XOP instruction sets. Source code included (GPL). Manual.
Note that this http://www.agner.org web site is the best resource I know about low-level optimization. All the linked information is worth reading, if you want to deal with performance.
But for using the Delphi-generated .obj with VC++, it won't be easily feasible, but for very small part of code. You will need the Delphi RTL used in your code. An external .dll is much better. Note also that some types (like strings or dynamic arrays) won't be easily modifiable in VC++.
To the best of my knowledge there is no such tool. Using Agner Fog's object file converter, the tool that Arnaud refers to, I've never succeeded in converting a Delphi unit into a COFF .obj that can be linked to an MSVC program.
I do believe that it's not realistic to take Delphi source code, compile it, and then use the generated object in MSVC. The other direction is quite possible. You can compile C code to an object, and link that object to your Delphi executable. When you do this you need to resolve any dependencies that the compiled object has.
But to link a Delphi object into a C/C++ program is going to require whatever part of the Delphi RTL that you use. And that's going to be tricky unless you happen not to use any part of the Delphi RTL, which seems unlikely.
In your situation I think your options are:
Port the code to C or C++.
Compile the Delphi code into a dynamic library and link to that from your C++ program.

What should be tested in 64-bit Delphi

Delphi with 64 bit compilation is now in Beta, but only invited beta-testers will get their hands on this version.
What should be tested by the beta testers?
Embarcadero will probably provide a tester's guide for the beta testers. But, here are some ideas:
Memory allocation, alignment, heap and stack. 32-bit could use up to 4GB (well, 3.5) of address space on a 64-bit version of Windows with the /LARGEADDRESSAWARE switch: Delphi64 should be able to use much more. Try allocating 8, 16, and 32 GB. (Even if you have less RAM, the allocation should work since it's a virtual address space.) Now read and write values into it a certain spots: check your allocation and pointers all work. Have a look at what Process Explorer reports for the app. Inspect your stack: it runs top-down, unlike the heap - what does it looks like, what addresses is it using? What does the 16-byte alignment look like? Is that alignment kept for all internal Pascal functions, or only those that call external code? In the 32-bit VCL, there were some bits of code that weren't safe for addresses larger than 2GB. Have those been fixed? Does anything break when it's allocated in, say, the 53rd GB of your program's address space? (Try allocating a huge amount, and then dynamically creating forms, controls etc - they'll probably be created with high addresses.) Does the memory manager fragment? How fast are memory moves and copies?
Compiler warnings. (This one is important.) Upgrade your programs - compile them without changes, and see what warnings / errors you get; fix any; and then fix bugs that occur even though you weren't warned. What issues did you encounter? Should the compiler have warned you, but didn't? Do you get warnings when truncating a pointer when casting to an integer? What about more complex issues: if you use the Single floating-point type, what happens? Warning, or is it silently represented as a double? What if you pass in a parameter to a method that's a different size - for example, PostMessage and you pass in a 32-bit-sized value to the handle parameter - will the compiler be smart enough to guess that if the size is wrong, your code might be wrong, even though it's often valid to pass a smaller type to a larger parameter? Under what circumstances should it do so? (Another thing: what if you pass a 64-bit pointer to a 32-bit type in a method expecting a pointer to a 64-bit type - the type safety should yell loudly, but does it? A use case for that is reading blocks from a binary file, something that could easily cause problems with the wrong-sized types.) ...etc.
Compiler warnings are probably one of the most useful tools for people who upgrade, so the compiler should produce as many as possible, in as many situations as possible, with as few false positives as possible. Remember Delphi is used by a wide range of programmers - you may know what a warning means or recognize bad code even if the compiler is silent, but anything that will help novices (or good programmers having a bad day) is important.
Custom controls & WinAPI. You probably have a few customs controls or bits of code that make heavy use of Windows APIs instead of the VCL. Are there any Windows API-specific issues?
Language compatibility. Does the old file IO code work - AssignFile, etc? RTTI? If you have an event signature with an Integer type, and an event handler is auto-created by the IDE, is it generated as Integer or a size-specific integer type depending on the platform that's currently set? What if the event is NativeInt, what then? (I've seen bugs in event handler method signature generation before, though only on the C++ side.)
Different types of application. We can assume GUI programs have been tested well. What about console and service applications?
C++Builder compatible file generation. C++Builder won't be 64-bit in XE2, but hopefully will in XE3. Delphi can produce ..hpp and .obj files for Pascal code, though. What happens in for a 64-bit platform? Can you produce those files, even though they're useless? Does the compiler generate C++-specific warnings in 64-bit mode, or does it give up and not let you do it? In 32-bit mode, is there anything you can do for 64-bit compatibility that will generate a warning building the C++ header?
Linker. Can you link .lib and .obj files created with other compilers? (I'd expect .lib yes, .obj no.) Does the linker use COFF or OMF for 64-bit - have they changed? This thread implies an ELF format. Has it changed for 32-bit too? Does this affect the DCU format, will we still get ultra-fast compiling / linking?
COM and 64-bit plugins. Are there any marshalling issues? Can you build a 64-bit plugin for Explorer now?
Calling conventions. Safecall's supposed to be the only 'calling convention' (if safecall counts...) that's still different - does it still work? Function and procedure pointers, and closures (object method pointers): do they work? What do they look like in the debug inspector? Given all calling conventions are now the same, if you mix calling conventions in your method declaration and your calling pointer, what happens? Is there any legacy stuff around that will break or does it transparently work? Does it now give you an (erroneous) warning that the types are incompatible?
Floating point math. The Delphi 64 preview said floating point would be double only. Can Delphi handle long doubles? Are there any compatibility routines for handling the old Real (48 bits, I think??) type? Does the compiler generate SSE or SSE2 code or a mix, and how good is it?
Performance. This is their first go at a 64-bit compiler; it will probably be refined over the next few releases. But are there any obvious performance problems, with:
compiling; linking; IDE insight?
Generated code: are your programs faster or slower? Is FP math faster or slower? Does inline work, and does it generate any unnecessary header/footer bits around inlined methods?
Debugging. This is probably easiest to test through the whole process of testing everything else, but how well does the 64-bit debugger work? Does it have all functionality of the 32-bit one? Do IDE debug visualiser plugins still work? What if you debug a non-Delphi 64-bit program or attach to a process, instead of running normally?
Misc Is Delphi itself compiled as a 64-bit program? If not, why not? (Are they "eating their own dogfood"?) Code inspect the new VCL (assuming the preview comes with VCL source.) What have they done to make the VCL 32/64 compatible? Are there any errors, or if you already know 64-bit code well from other IDEs, are there better approaches they could take instead?
...etc. I could keep typing for hours, but I think that's a good start though :)
I'm sure Embarcadero will provide some testing guidance. For what it's worth this is what I'd test; Mostly because it's the stuff I care about:
Small console application should work.
Allows me to allocate a 4Gb flat hunk of memory. Don't really need that, but it will be the first thing my console application tries, right after WriteLn('I''m using all 64 bits!!!!');
Can create 64bit DLL and the DLL can be imported and used from other environment.
Do some simple things and look at the generated assembler, just for kicks.
Can create Firebird 64bit compatible UDF's
I'd probably try compiling my "utility" units, because they do an fair amount of pointer manipulation, see how they work.
If the VCL works I'd put it through it's paces: create small form, put a button on it, ShowMessage.
Generally speaking the only thing I really need 64bit Delphi for is Firebird 64bit UDFs. That's minor and can be "fixed" using FPC. I assume the best testing will be done by people that actually need 64 bit delphi. And those people don't need testing suggestions.
The base foundation stuff would come first, to ensure that Delphi 64 can be used for what Delphi 32 can't be used:
compiler correctness: first and foremost, no internal errors, no incorrect code-gen
ability to compile to 64bit DLLs and stability of those
stress the memory manager: with large objects, fragmented allocation, multi-threaded allocations, etc.
multi-threading: is it stable? is it efficient? does it scale? that for core RTL functions and units, and not forgetting the reference-counted types.
floating point: does the compiler deliver proper SSE? are the maths functions properly implemented and correct? what happens if you stress the SSE register set with complex expressions?
And as a bonus, ability to accept 64bit object files from the usual C++ compilers.
Non visual stuff ... I think. There is already success from some beta testers that already ported their libraries. I don't know the preview but from the information I don't have I would assume more complex non visual scenarios currently make sense. Anyone who knows it better please correct me ...
I think the preview first allows you to setup a migration strategy, this would be my intention. The VCL ... intended to work on one code base and maybe backport your code to purepascal instead of assembler.
Mike

Resources