Where is Clang's '_mm256_pow_ps' intrinsic?

Where is Clang's '_mm256_pow_ps' intrinsic? - clang

I can't seem to find the intrinsics for either _mm_pow_ps or _mm256_pow_ps, both of which are supposed to be included with 'immintrin.h'.
Does Clang not define these or are they in a header I'm not including?

That's not an intrinsic; it's an Intel SVML library function name that confusingly uses the same naming scheme as actual intrinsics. There's no vpowps instruction. (AVX512ER on Xeon Phi does have the semi-related vexp2ps instruction...)
IDK if this naming scheme is to trick people into depending on Intel tools when writing SIMD code with their compiler (which comes with SVML), or because their compiler does treat it like an intrinsic/builtin for doing constant propagation if inputs are known, or some other reason.
For functions like that and _mm_sin_ps to be usable, you need Intel's Short Vector Math Library (SVML). Most people just avoid using them. If it has an implementation of something you want, though, it's worth looking into. IDK what other vector pow implementations exist.
In the intrinsics finder, you can avoid seeing these non-portable functions in your search results if you leave the SVML box unchecked.
There are some "composite" intrinsics like _mm_set_epi8() that typically compile to multiple loads and shuffles which are portable across compilers, and do inline instead of being calls to library functions.
Also note that sqrtps is a native machine instruction, so _mm_sqrt_ps() is a real intrinsic. IEEE 754 specifies mul, div, add, sub, and sqrt as "basic" operations that are requires to produce correctly-rounded results (error <= 0.5ulp), so sqrt() is special and does have direct hardware support, unlike most other "math library" functions.
There are various libraries of SIMD math functions. Some of them come with C++ wrapper libraries that allow a+b instead of _mm_add_ps(a,b).
glibc libmvec - since glibc 2.22, to support OpenMP 4.0 vector math functions. GCC knows how to auto-vectorize some functions like cos(), sin(), and probably pow() using it. This answer shows one inconvenient way of using it explicitly for manual vectorization. (Hopefully better ways are possible that don't have mangled names in the source code).
Agner Fog's VCL has some math functions like exp and log. (Formerly GPL licensed, now Apache).
https://github.com/microsoft/DirectXMath (MIT license) - I think portable to non-Windows, and doesn't require DirectX.
https://sleef.org/ - apparently great performance, with variable accuracy you can choose. Formerly only supported on MSVC on Windows, the support matrix on its web site now includes GCC and Clang for x86-64 GNU/Linux and AArch64.
Intel's own SVML (comes with ICC; ICC auto-vectorizes with SVML by default). Confusingly has its prototypes in immintrin.h along with actual intrinsics. Maybe they want to trick people into writing code that's dependent on Intel tools/libraries. Or maybe they think fewer includes are better and that everyone should use their compiler...
Also related: Intel MKL (Math Kernel Library), with matrix BLAS functions.
AMD ACML - end-of-life closed-source freeware. I think it just has functions that loop over arrays/matrices (like Intel MKL), not functions for single SIMD vectors.
sse_mathfun (zlib license) SSE2 and ARM NEON. Hasn't been updated since about 2011 it seems. But does have implementations of single-vector math / trig functions.

Related

What makes libadalang special?

I have been reading about libadalang 1 2 and I am very impressed by it. However, I was wondering if this technique has already been used and another language supports a library for syntactically and semantically analyzing its code. Is this a unique approach?

C and C++: libclang "The C Interface to Clang provides a relatively small API that exposes facilities for parsing source code into an abstract syntax tree (AST), loading already-parsed ASTs, traversing the AST, associating physical source locations with elements within the AST, and other facilities that support Clang-based development tools." (See libtooling for a C++ API)
Python: See the ast module in the Python Language Services section of the Python Library manual. (The other modules can be useful, as well.)
Javascript: The ongoing ESTree effort is attempting to standardize parsing services over different Javascript engines.
C# and Visual Basic: See the .NET Compiler Platform ("Roslyn").
I'm sure there are lots more; those ones just came off the top of my head.
For a practical and theoretical grounding, you should definitely (re)visit the classical textbook Structure and Interpretation of Computer Programs by Abelson & Sussman (1st edition 1985, 2nd edition 1996), which helped popularise the idea of Metacircular Interpretation -- that is, interpreting a computer program as a formal datastructure which can be interpreted (or otherwise analysed) programmatically.

You can see "libadalang" as ASIS Mark II. AdaCore seems to be attempting to rethink ASIS in a way that will support both what ASIS already can do, and more lightweight operations, where you don't require the source to compile, to provide an analysis of it.
Hopefully the final API will be nicer than that of ASIS.
So no, it is not a unique approach. It has already been done for Ada. (But I'm not aware of similar libraries for other languages.)

abstract syntax tree for imperative languages

I am looking for an abstract syntax tree representation that can be used for common imperative languages (Java, C, python, ruby, etc). I would like this to be as close to source as possible (as opposed to something like LLVM). I found Rose online but it is only able to handle C and Fortran. Does this exist?

You won't find "one" universal AST that can represent many languages. People have been searching for 50 years.
The essential reason is that an AST node implicitly represents the precise language semantics of the operator it encodes, and different languages have different semantics for what are apparently the same operators.
For example, the "+" operator in modern Fortran will add integers, reals, complex values, and slices of arrays of such things. Java "+" will add integers, reals, and glue strings together. If I wrote "a+b" in "universal AST", how would you know which semantic effect the corresponding AST encoded?
What you can do is build a system in which the ASTs for different languages are represented uniformly, so that you can share tool infrastructure across many languages. This is done by many Program Transformation Systems (PTS), where you provide the grammar (or pick one from an available library), and the PTS parses and builds an AST using its uniform representation. Most PTS provide additional support to analyze and transform the code.
So, all you need is a PTS and some sweat to define a grammar. That's really not true; getting a grammar right for a real language is actually pretty hard. Worse, there's a lot to Life After Parsing because you need the meaning of symbols and additional inferences such as control and data flow analysis. So you need full front ends (e.g., parsing, name/type resolution, flow analysis, ...), or as much as you can get, if you don't want to be distracted for months before beginning your real work.
What this means in practice is you want to find a tool that handles the languages of interest to you, with mature front ends already available:
Rose (you already found this) handle C, C++ and Fortran. It has no built-in parsing capability of its own; its front ends are custom built. So it is apparantly hard to extend to other languages. But it has good flow analysis capabilities and provides means to transform the code via hand-write AST walks/smashes.
Clang handles C and C++. Clang also uses hand-built front ends. It can also transform code, again by hand-written AST walks/smashes, with a small amount of pattern matching support. As I understand it, you have to use the LLVM part of Clang to do flow analysis.
Our DMS Software Reengineering Toolkit has full front ends for C, C++, Java and COBOL, and full parsers for many more languages such as Python. DMS provides pattern-based analysis and source-to-source transformation. It operates directly from a grammar (see one for Oberon, Nicklaus Wirth's latest language). (I don't know of any tool that handles Ruby, which is famously hard to parse; I understand its grammar is ambiguous, and DMS is good at handling ambiguous grammars).

alea.cuBase and CUBLAS

I'm starting down the exciting road of GPU programming, and if I'm going to do some heavyweight number-crunching, I'd like to use the best libraries that are out there. I would especially like to use cuBLAS from an F# environment. CUDAfy offers the full set of drivers from their solution, and I have also been looking at Alea.cuBase, which has thrown up a few questions.
The Alea.cuSamples project on GitHub makes a cryptic reference to an Examples solution: "For more advanced test, please go to the MatrixMul projects in the Examples solution." However, I can't find any trace of these mysterious projects.
Does anyone know the location of the elusive "MatrixMul projects in the Examples solution"?
Given that cuSamples performs a straightfoward matrix multiplication, would the more advanced version, wherever it lives, use cuBLAS?
If not, is there a way to access cuBLAS from Alea.cuBase a la CUDAfy?

With Alea GPU V2, the new version we have now two options:
Alea Unbound library provides optimized matrix multiplication implementations http://quantalea.com/static/app/tutorial/examples/unbound/matrixmult.html
Alea GPU has cuBlas integrated, see tutorial http://quantalea.com/static/app/tutorial/examples/cublas/index.html

The matrixMulCUBLAS project is a C++ project that ships with the CUDA SDK, https://developer.nvidia.com/cuda-downloads. This uses cuBLAS to get astonishingly quick matrix multiplication (139 GFlops) on my home laptop.

Can Vala be used without GObject?

I'm new to Vala. I'm not familiar with GObject. As I understand it, GObject was spun off from the GLib project from GNOME. Correct me if I'm wrong.
I do like the syntax and implementation of Vala very much, yet it is not in my intentions to write desktop applications for GNOME.
I also know (think I know) that Vala does not have a standard library other than GObject itself.
So my question is: Can Vala be used without GObject and if it can, is it usable (are there optimal and maintained base libraries for common things like type conversions, math, string manipulation, buffers, etc... available)?

There is some other Vala profiles like Dova and Posix.

TLDR: I recommend using Vala with GLib/GObject, because it was designed on top of them.
While there may be alternative profiles for valac they are either unfinished or deprecated.
The whole point of Vala is to reduce the amount of boilerplate required to write GLib and Gtk+ applications in C.
It also adds some nice other improvements over C, like string and array being simple data types instead of error prone pointers.
It mostly wraps all the concepts present in GObject like:
classes
properties
inheritance
delegates
async methods
reference counting (which is manual in C + GObject, and automatic aka ARC in Vala)
type safety of objects
generics
probably much more ...
All of these concepts can be implemented without using GObject/GLib/Gio, but that would mean to basically rewrite GObject/GLib/Gio which doesn't make much sense.
If you don't want to write GUI applications GLib can be used to write console applications as well, using GIO or GTK+ is optional in Vala, applications work on a headless server as well.
I think that there is even some effort in Qt to eventually switch to the GLib main loop, which would make interoperability of Qt and GLib much easier.
A good example of a framework that uses GLib is GStreamer which is used across different desktop environments as well.
In summary:
GLib is a basic cross platform application framework
GObject is the object system used by the GLib ecosystem
GIO is an I/O abstraction (network, filesystem, etc.) based on GLib + GObject
GTK+ is a graphic UI toolkit based on GLib + GObject + GIO + others
GNOME is a desktop environment based on all the "G" technologies
Vala is a high level programming language designed to reduce the boiler plate neded to use the "G" libraries from the C language.
GTK+ originally came from GIMP and was since split into the different "G" libraries that are the basis for GNOME today.
Vala also has very powerful binding mechanisms to make it easy to write so called "VAPI" files for any kind of C library out there.
With the correct VAPI bindings you don't have to worry about manual memory management, pointers, zero termination of strings and arrays and some other tedious things that make writing correct C code so difficult.

Here is another profile that you can use Aroop. (Note it is still under heavy development). What I hope is it is good if you need high performance. Please check the features here.

Floating point support in 64-bit compiler

What should we expect from the floating point support in 64-bit Delphi compiler?
Will 64-bit compiler use SSE to
implement floating point arithmetic?
Will 64-bit compiler support the
current 80-bit floating type
(Extended)?
These questions are closely related, so I ask them as a single question.

I made two posts on the subject (here and there), to summarize, yes, the 64bit compiler uses SSE2 (double precision), but it doesn't use SSE (single precision). Everything is converted to double precision floats, and computed using SSE2 (edit: however there is an option to control that)
This means f.i. that if Maths on double precision floats is fast, maths on single precision is slow (lots of redundant conversions between single and double precisions are thrown in), "Extended" is aliased to "Double", and intermediate computations precision is limited to double precision.
Edit: There was an undocumented (at the time) directive that controls SSE code generation, {$EXCESSPRECISION OFF} activates SSE code generation, which brings back performance within expectations.

According to Marco van de Voort in his answer to: How should I prepare my 32-bit Delphi programs for an eventual 64-bit compiler:
x87 FPU is deprecated on x64, and in general SSE2 will be used for florating point. so floating point and its exception handling might work slightly differently, and extended might not be 80-bit (but 64-bit or, less likely 128-bit). This also relates to the usual rounding (copro controlwork) changes when interfacing wiht C code that expects a different fpu word.
PHis commented on that answer with:
I wouldn't say that the x87 FPU is deprecated, but it is certainly the case that Microsoft have decided to do their best to make it that way (and they really don't seem to like 80-bit FP values), although it is clearly technically possible to use the FPU/80-bit floats on Win64.

I just posted an answer to your other question, but I guess it actually should go here:
Obviously, nobody except for Embarcadero can answer this for sure before the product is released.
It is very likely that any decent x64 compiler will use the SSE2 instruction set as a baseline and therefore attempt to do as much floating point computation using SSE features as possible, minimising the use of the x87 FPU. However, it should also be said that there is no technical reason that would prevent the use of the x87 FPU in x64 application code (despite rumours to the contrary which have been around for some time; if you want more info on that point, please have a look at Agner Fog's Calling Convention Manual, specifically chapter 6.1 "Can floating point registers be used in 64-bit Windows?").
Edit 1: Delphi XE2 Win64 indeed does not support 80-bit floating-point calculations out of the box (see e.g. discussuion here (although it allows one to read/write such values). One can bring such capabilities back to Delphi Win64 using a record + class operators, as is done in this TExtendedX87 type (although caveats apply).

For the double=extended bit:
Read ALlen Bauer's Twitter account Kylix_rd:
http://twitter.com/kylix_rd
In hindsight logical, because while SSE2 regs are 128 bit, they are used as two 64-bit doubles.

We won't know for sure how the 64-bit Delphi compiler will implement floating point arithmetic until Embarcadero actually ships it. Anything prior to that is just speculation. But once we know for sure it'll be too late to do anything about it.
Allen Bauer's tweets do seem to indicate that they'll use SSE2 and that the Extended type may be reduced to 64 bits instead of 80 bits. I think that would be a bad idea, for a variety of reasons. I've written up my thoughts in a QualityCentral report Extended should remain an 80-bit type on 64-bit platforms
If you don't want your code to drop from 80-bit precision to 64-bit precision when you move to 64-bit Delphi, click on the QualityCentral link and vote for my report. The more votes, the more likely Embarcadero will listen. If they do use SSE2 for 64-bit floating point, which makes sense, then adding 80-bit floating point using the FPU will be extra work for Embarcadero. I doubt they'll do that work unless a lot of developers ask for it.

If you really need it, then you can use the TExtendedX87 unit by Philipp M. Schlüter (PhiS on SO) as mentioned in this Embarcadero forum thread.
#PhiS: when you update your answer with the info from mine, I'll remove mine.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart