OpenCL-compliant DSP (signal processing)

On the Khronos website, OpenCL is said to be open to DSPs. But when I look at the websites of DSP-making companies, like Texas Instruments, Freescale, NXP or Analog Devices, I can't find any mention of OpenCL.
So does anyone know if an OpenCL-compliant DSP exists?
Edit: As this question seems surprising, let me add the reason I asked it. From the khronos.org page:
"OpenCL 1.0 at a glance
OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs"
So I think it would be interesting to know whether this is true: whether DSPs, which are particularly well suited to certain complex calculations, can really be programmed using OpenCL.

The OpenCL spec seems to support using a chip that has one or more programmable GPU shader cores as an expensive DSP. It does not appear that the spec makes many allowances for DSP chips that were not designed to support being used as a programmable GPU shader in a graphics pipeline.

I finally found one: the SNU-Samsung OpenCL Framework is able to use Texas Instruments C64x DSPs. More info here:
http://aces.snu.ac.kr/Center_for_Manycore_Programming/SNU-SAMSUNG_OpenCL_Framework.html

Related

How to leverage the Neural Engine on Apple Silicon/M1 processors as a developer?

I am struggling to find an answer for this on SO, Google, or Apple's Developer Documentation.
Does Apple provide APIs in any language that allow developers to leverage the Neural Engine of the new M1 chips on macOS?
Searching Apple's Developer Documentation brings up a lot of functions in the Metal Performance Shaders library, which seems to use GPU acceleration.
Searching SO with the tags apple-m1 or apple-silicon and the keyword "neural" gives nothing useful.
Searching r/AppleDevelopers for "neural" turns up nothing.
I assume there has to be some information about how to develop using the neural cores. Are these cores only available to Apple developers and commercial partners?
Apple doesn't provide a direct public API for the Neural Engine.
Some frameworks may be accelerated by the Neural Engine:
Accelerate
CoreML
BNNS
MLCompute

Is OpenCL a shared, distributed or hybrid memory system

I'm having a hard time understanding whether OpenCL, and in particular OpenCL 2.0+, is a shared, distributed, or distributed shared memory architecture, in particular on a computer that has many OpenCL devices in the same PC.
I can see that it's a shared-memory system in that all devices can access global memory, but there's a network-like aspect to the compute units that makes me question whether it could classically be classed as a distributed shared memory architecture.
Looking at it from a generic OpenCL coding perspective, your answer is "yes, maybe, unless it's not."
If you are talking about some specific hardware, there are (somewhere) clear and concise answers of how things work on the chip(s) and how OpenCL uses them.
By examining the OpenCL capabilities at runtime, you can modify some parameters of your OpenCL program or choose whichever of several kernels is the best fit.
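To make the capability-driven kernel selection concrete, here is a minimal Python sketch. The device properties are hypothetical stand-ins for what you would query with `clGetDeviceInfo` (the values and kernel names are made up for illustration, not from any real device):

```python
# Hypothetical device properties, as one might obtain via clGetDeviceInfo;
# all values here are invented for illustration.
device = {
    "type": "GPU",
    "local_mem_size": 32 * 1024,  # bytes of fast local (work-group) memory
    "svm_fine_grain": False,      # OpenCL 2.0 fine-grained SVM support
}

def pick_kernel(dev):
    """Choose a kernel variant based on queried capabilities."""
    if dev.get("svm_fine_grain"):
        return "svm_kernel"       # can share pointers directly with the host
    if dev["local_mem_size"] >= 16 * 1024:
        return "tiled_kernel"     # stage tiles in local memory
    return "global_kernel"        # fall back to a global-memory-only path

print(pick_kernel(device))  # tiled_kernel
```

The same dispatch idea applies whether the device behind the queries is a GPU, CPU, or DSP: the host code adapts rather than assuming one memory architecture.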

Hardware optimizations using Qualcomm Snapdragon 800 and Adreno 330

I am developing a real-time computer vision project that runs on an Ubuntu (Linaro) board with an ARM CPU (Snapdragon 800).
Some parts of the software operate on HD images, i.e., huge amounts of data. This slows execution and acts as a bottleneck.
These operations include:
Finding all local minimum and maximum values in a 2D array (image). Currently, this is implemented in the naive, brute-force way.
Building a KD-Tree and performing a K-Nearest-Neighbors search. This is currently done using the FLANN library included in OpenCV.
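For reference, the two operations can be sketched in Python/NumPy. This is a minimal illustration of what the question describes (a naive 3x3 local-extrema scan, and a brute-force nearest-neighbor search standing in for the FLANN KD-tree), not the actual project code:

```python
import numpy as np

def local_extrema(img):
    """Naive scan: report pixels that are strict minima or maxima
    within their 3x3 neighbourhood (image borders are ignored)."""
    h, w = img.shape
    minima, maxima = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            v = img[y, x]
            others = np.delete(patch.ravel(), 4)  # the 8 neighbours
            if v < others.min():
                minima.append((y, x))
            elif v > others.max():
                maxima.append((y, x))
    return minima, maxima

def knn_brute_force(points, query, k):
    """Brute-force stand-in for a FLANN KD-tree search:
    indices of the k points nearest to `query`."""
    d2 = ((points - query) ** 2).sum(axis=1)
    return np.argsort(d2)[:k]

img = np.array([[5, 5, 5],
                [5, 1, 5],
                [5, 5, 5]], dtype=float)
print(local_extrema(img))  # ([(1, 1)], [])

pts = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
print(knn_brute_force(pts, np.array([0.9, 0.1]), 2))  # [1 0]
```

Both loops are embarrassingly parallel per pixel/query, which is exactly why they are candidates for a GPU (OpenCL) or SIMD (NEON) rewrite.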
I am looking for ways to utilize the available Adreno 330 GPU, and accelerate these computations.
I was looking at OpenCL, but I found out that it is supported on the Adreno 330 only as an "embedded profile", and I don't know what that is or how it affects things.
I also heard about NEON in ARM processors, but I do not know how it would be of any use to me.
Any help, tips and links will be appreciated.
Thanks,
Avi

Linear Algebra library using OpenGL ES 2.0 for iOS

Does anyone know of a linear algebra library for iOS that uses OpenGL ES 2.0 under the covers?
Specifically, I am looking for a way to do matrix multiplication on arbitrary-sized matrices (e.g., much larger than 4x4, more like 5,000 x 100,000) using the GPUs on iOS devices.
Is there a specific reason you're asking for "uses OpenGL ES 2.0 under the covers?" Or do you just want a fast, hardware optimized linear algebra library such as BLAS, which is built into iOS?
MetalPerformanceShaders.framework provides some tuned BLAS-like functions. It is not GLES; it is Metal and runs on the GPU. See MetalPerformanceShaders/MPSMatrixMultiplication.h
OpenGL on iOS is probably the wrong way to go. Metal support on iOS would be the better way to go if you're going GPU.
Metal
You could use Apple's support for Metal compute shaders. I've written high-performance code for my PhD in it. An early experiment I made calculating some fractals using Metal might give you some ideas to start.
Ultimately, this question is too broad. What do you intend to use the library for, and how? Is it a one-off multiplication? Have you tested with current libraries and found the performance to be too slow? If so, by how much?
In general, you can run educational or purely informational experiments on performance of algorithm X on CPU vs. GPU vs. specialized hardware, but most often you run up against Amdahl's law and your code vs. a team of experts in the field.
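Before reaching for the GPU, it is worth establishing a CPU baseline, since NumPy dispatches matrix multiplication to an optimized BLAS (Accelerate on Apple platforms). A quick sketch, with sizes chosen small for illustration rather than matching the 5,000 x 100,000 case in the question:

```python
import time
import numpy as np

# Baseline: time a CPU matrix multiply through NumPy, which calls into
# an optimized BLAS. Scale the sizes up to match your real workload.
a = np.random.rand(500, 1000)
b = np.random.rand(1000, 200)

t0 = time.perf_counter()
c = a @ b                      # (500, 1000) @ (1000, 200) -> (500, 200)
dt = time.perf_counter() - t0

print(c.shape, f"{dt * 1e3:.1f} ms")
```

If the BLAS baseline already meets your budget, the GPU path (and the cost of copying matrices to and from it) may not be worth the complexity.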
Accelerate
You can also look into the Accelerate framework which offers BLAS.
Apple, according to the WWDC 2014 talk What's new in the Accelerate Framework, has hand tuned the Linear Algebra libraries targeted at their current generation hardware. They aren't just fast, but energy efficient. There are newer talks as well.

SIFT hardware accelerator for smartphones

I'm a fresh electronics engineering graduate and I have experience in computer vision. I want to ask whether it's feasible to build a hardware accelerator for the SIFT algorithm (or any other OpenCV algorithm) to be used on smartphones instead of the current software implementation.
What are the advantages (much lower computation time, lower power, more complex applications becoming possible, ...) and the disadvantages (no better than the current software implementation, ...)?
Do you have an insight of that?
Thanks
You might be interested to check NEON optimizations, a type of SIMD instruction set for ARM processors (such as the Cortex-A9 cores in Nvidia's Tegra 3). Some OpenCV functions are NEON-optimized.
Start by reading this nice article, Realtime Computer Vision with OpenCV; it has performance comparisons about using NEON, etc.
I also recommend you to start here and here, you will find great insights.
OpenCV supports both CUDA and (experimentally) OpenCL.
There are specific optimizations for Nvidia's Tegra chipset used in a lot of phones/tablets. I don't know if any phones use OpenCL.
