Linear Algebra library using OpenGL ES 2.0 for iOS

Does anyone know of a linear algebra library for iOS that uses OpenGL ES 2.0 under the covers?
Specifically, I am looking for a way to do matrix multiplication on arbitrary-sized matrices (e.g., much larger than 4x4, more like 5,000 x 100,000) using the GPUs on iOS devices.

Is there a specific reason you're asking for something that "uses OpenGL ES 2.0 under the covers"? Or do you just want a fast, hardware-optimized linear algebra library such as BLAS, which is built into iOS?

MetalPerformanceShaders.framework provides some tuned BLAS-like functions. It is not GLES; it is Metal and runs on the GPU. See MetalPerformanceShaders/MPSMatrixMultiplication.h.
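Here's a rough sketch of driving MPSMatrixMultiplication from Swift. The dimensions and buffer contents are placeholders, and note that a 5,000 x 100,000 single-precision matrix is roughly 2 GB, so device memory and buffer limits matter long before the math does:

```swift
import Metal
import MetalPerformanceShaders

// Illustrative dimensions; real sizes and data would come from your app.
let rowsA = 1024, colsA = 512, colsB = 256

let device = MTLCreateSystemDefaultDevice()!
precondition(MPSSupportsMTLDevice(device), "MPS is not supported on this device")
let queue = device.makeCommandQueue()!

// Wrap a zero-filled MTLBuffer in an MPSMatrix. rowBytes is computed by hand
// here; in production you would ask MPSMatrixDescriptor for the preferred
// (aligned) row stride instead.
func makeMatrix(rows: Int, columns: Int) -> MPSMatrix {
    let rowBytes = columns * MemoryLayout<Float>.stride
    let buffer = device.makeBuffer(length: rows * rowBytes, options: .storageModeShared)!
    let descriptor = MPSMatrixDescriptor(rows: rows, columns: columns,
                                         rowBytes: rowBytes, dataType: .float32)
    return MPSMatrix(buffer: buffer, descriptor: descriptor)
}

let a = makeMatrix(rows: rowsA, columns: colsA)
let b = makeMatrix(rows: colsA, columns: colsB)
let c = makeMatrix(rows: rowsA, columns: colsB)

// C = 1.0 * A * B + 0.0 * C
let matMul = MPSMatrixMultiplication(device: device,
                                     transposeLeft: false, transposeRight: false,
                                     resultRows: rowsA, resultColumns: colsB,
                                     interiorColumns: colsA,
                                     alpha: 1.0, beta: 0.0)

let commandBuffer = queue.makeCommandBuffer()!
matMul.encode(commandBuffer: commandBuffer, leftMatrix: a, rightMatrix: b, resultMatrix: c)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
// c.data now holds the product; read it back through that MTLBuffer's contents.
```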

OpenGL on iOS is probably the wrong way to go. Metal would be the better choice if you're going the GPU route.
Metal
You could use Apple's support for Metal compute shaders; I've written high-performance code for my PhD with it. An early experiment I made calculating some fractals with Metal might give you some ideas to get started.
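The host-side boilerplate for dispatching a compute kernel is fairly small. A rough sketch, where "add_arrays" is a hypothetical kernel assumed to live in the app's default Metal library and to take three float buffers at indices 0-2:

```swift
import Metal

// Minimal host-side setup for dispatching a compute kernel.
func runAddArrays() throws {
    let device = MTLCreateSystemDefaultDevice()!
    let queue = device.makeCommandQueue()!
    let library = device.makeDefaultLibrary()!
    let function = library.makeFunction(name: "add_arrays")!   // hypothetical kernel
    let pipeline = try device.makeComputePipelineState(function: function)

    let count = 4096
    let length = count * MemoryLayout<Float>.stride
    let inA = device.makeBuffer(length: length, options: .storageModeShared)!
    let inB = device.makeBuffer(length: length, options: .storageModeShared)!
    let out = device.makeBuffer(length: length, options: .storageModeShared)!

    let commandBuffer = queue.makeCommandBuffer()!
    let encoder = commandBuffer.makeComputeCommandEncoder()!
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(inA, offset: 0, index: 0)
    encoder.setBuffer(inB, offset: 0, index: 1)
    encoder.setBuffer(out, offset: 0, index: 2)

    // One thread per element, 64 threads per threadgroup.
    let threadsPerGroup = MTLSize(width: 64, height: 1, depth: 1)
    let groups = MTLSize(width: (count + 63) / 64, height: 1, depth: 1)
    encoder.dispatchThreadgroups(groups, threadsPerThreadgroup: threadsPerGroup)
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    // Read results from out.contents() on the CPU side.
}
```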
Ultimately, this question is too broad. What do you intend to use the library for, and how do you intend to use it? Is it a one-off multiplication? Have you tested current libraries and found the performance too slow? If so, by how much?
In general, you can run educational or purely informational experiments on the performance of algorithm X on CPU vs. GPU vs. specialized hardware, but most often you run up against Amdahl's law and the fact that your code is competing with code written by a team of experts in the field.
Accelerate
You can also look into the Accelerate framework, which offers BLAS.
Apple, according to the WWDC 2014 talk What's New in the Accelerate Framework, has hand-tuned the linear algebra libraries for their current-generation hardware. They aren't just fast, but also energy efficient. There are newer talks as well.
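As a point of comparison, calling single-precision matrix multiply (GEMM) through Accelerate's BLAS interface from Swift looks roughly like this, with tiny placeholder matrices:

```swift
import Accelerate

// C (m x n) = A (m x k) * B (k x n), all stored row-major as flat [Float] arrays.
let m: Int32 = 2, n: Int32 = 2, k: Int32 = 3
let A: [Float] = [1, 2, 3,
                  4, 5, 6]
let B: [Float] = [ 7,  8,
                   9, 10,
                  11, 12]
var C = [Float](repeating: 0, count: Int(m * n))

cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            m, n, k,
            1.0,      // alpha
            A, k,     // lda: leading dimension of A (row-major => its column count)
            B, n,     // ldb
            0.0,      // beta
            &C, n)    // ldc

// C is now [58, 64, 139, 154]
```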

Related

Which is faster when deploying cnn models by TensorFlow Lite, Caffe2 or OpenCV?

We can deploy MobileNet on a smartphone with TensorFlow Lite, Caffe2, or OpenCV, and I think Caffe2 will provide the best performance, with a higher fps. But why? Is the performance gap between them really that large? Thanks.
You should probably go for TensorFlow Lite. Last I looked, Caffe2 had almost zero smartphone GPU support, while TFLite now supports both iOS and many Android devices (all that have OpenGLES >=3.1). Using the GPU generally makes things several times faster, and you can reduce the inference precision to half-float (FP16) with TFLite for even more speed and not too much of a performance hit.
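For reference, the iOS GPU path through TFLite's Swift API looks roughly like this. The model name is a placeholder, and the precision-loss option spelling is an assumption that may differ between TFLite releases:

```swift
import TensorFlowLite

// Sketch: run a .tflite model on the iOS GPU via the Metal delegate.
func runModel(inputData: Data) throws -> Data {
    // "model.tflite" is a placeholder bundled model.
    let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")!

    var delegateOptions = MetalDelegate.Options()
    delegateOptions.isPrecisionLossAllowed = true   // allow FP16 math; property name may vary by version
    let delegate = MetalDelegate(options: delegateOptions)

    let interpreter = try Interpreter(modelPath: modelPath, delegates: [delegate])
    try interpreter.allocateTensors()
    try interpreter.copy(inputData, toInputAt: 0)
    try interpreter.invoke()
    return try interpreter.output(at: 0).data
}
```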
When you can't use the mobile GPU, you'll probably want to quantize your network to int8, which is easily doable with TensorFlow and TensorFlow Lite, whether during or after training. Caffe2 seems to need QNNPACK for quantization, which is claimed to be as much as 2 times faster. The catch is that it only works with the two pre-trained models they released (https://github.com/pytorch/QNNPACK/issues/12), so you can't convert your own model.
So I can't really think of a reason to use Caffe2 over TFLite.
I'm not sure about OpenCV's DNN module, but I seriously doubt it has mobile GPU support. There's a slight chance it has quantization.
Each framework introduces its own optimizations, so the results may differ significantly between devices.

Fixed-Point vs Floating-Point and Performance

I have an iOS audio app that uses floating point for its processing right now. On newer iOS devices it works flawlessly; however, older devices stall and can't keep up (no sound comes out).
Should I convert my algorithms to a fixed-point system to work around this and improve performance? Or should I just improve the algorithms themselves (as far as I know, some of the algorithms I'm using are already mostly optimized)? Is it worth trying to do fixed-point work on iOS?
Thanks!
EDIT
I'm starting to think it's a processor-speed issue, and now I believe I just have to optimize/improve my algorithms. Should I go with this approach?
On any iOS device that supports iOS 9, using short floats for the multiply-accumulates in most DSP algorithms is as fast as or faster than using 32-bit scaled integers. The NEON vector unit can dispatch 4 per cycle if you can keep the pipeline fed.
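If you stay in floating point, Accelerate's vDSP routines give you those vectorized multiply-accumulates without hand-written NEON. A toy sketch with made-up data:

```swift
import Accelerate

// d = (a * b) + c, element-wise, on single-precision vectors.
let n = 1024
let a = [Float](repeating: 1.5, count: n)
let b = [Float](repeating: 2.0, count: n)
let c = [Float](repeating: 0.25, count: n)
var d = [Float](repeating: 0, count: n)

vDSP_vma(a, 1, b, 1, c, 1, &d, 1, vDSP_Length(n))
// Every element of d is now 3.25.
```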

Does OpenGL ES 2.0 have a steeper learning curve than Metal?

I'm very interested in 3D graphics and heard many developers raving about Metal.
Can someone who has worked with both Metal and OpenGL ES 2.0 comment on how their learning curves compare?
As a beginner who aims to stay loyal to iOS, is Metal easier to learn and master than OpenGL ES 2.0, or is it harder because it is more advanced?
I hope this question will be useful to many as I am trying to figure out where to start.
As a beginner, you might be better served by starting with 3D graphics at a higher level. SceneKit for OS X and iOS lets you describe a 3D scene in terms of its content -- geometry, materials (textures/shading), lights, and cameras -- and load assets created with 3D modeling tools. SceneKit is built on OpenGL (ES), so it uses a lot of the same concepts. As you become familiar with those concepts, you can use SceneKit to work your way into the OpenGL world a bit at a time:
use shader modifiers to write GPU shader code that extends SceneKit's built-in shading (sketched below)
use custom programs to write complete shaders that replace SceneKit's, or techniques to write shaders that postprocess SceneKit's rendering
create custom geometry from your own vertex data with geometry sources & elements
use a node renderer delegate to write your own OpenGL client code that works within a scene
You'll find more info about all of these by watching the SceneKit videos from WWDC: What's New in SceneKit and Building a Game with SceneKit.
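As an example of the first bullet, a fragment-stage shader modifier is only a few lines; this sketch just darkens SceneKit's computed color:

```swift
import SceneKit

// Attach a fragment-stage shader modifier that darkens SceneKit's output color.
// shaderModifiers is available on any SCNShadable (SCNGeometry or SCNMaterial).
let sphere = SCNSphere(radius: 1.0)
sphere.shaderModifiers = [
    .fragment: "_output.color.rgb *= 0.5;"
]
let node = SCNNode(geometry: sphere)
```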
Otherwise... OpenGL (ES) and Metal don't have very different learning curves in and of themselves. In fact, I'd consider Metal more approachable than OpenGL in some ways -- for example, many things you can do in GL have implicit and hard-to-predict performance costs, while the Metal analogues of those tasks are much clearer about what their impact on CPU or GPU time is, and allow you to decide when expensive work gets done.
On the other hand, Metal is brand new -- there aren't yet a lot of third-party resources to help you learn it. And a lot of the hard things about learning 3D graphics are very similar whether you're working in Metal, OpenGL, DirectX, or another platform/API. Once you learn the important stuff -- there are plenty of books and online tutorials for that, but StackOverflow isn't the best place to go looking for them -- getting up to speed with Metal or with OpenGL ES on a specific platform is pretty easy.
Coming from an OpenGL-ES background, I had a good look at the Metal APIs. I believe that the learning curve for Metal is steeper, not because it's a new API, but because it introduces low level constructs which developers previously didn't need to worry about.
If you compare fixed-pipeline OpenGL with the shader-oriented OpenGL flavours (on mobile: ES 1.x compared with ES 2.x/3.x), and finally with Metal, what you have is increasingly powerful, increasingly generic APIs detached from the intuitive constructs (triangles, vertices, lights) that constitute OpenGL's historical foundation.
Bear in mind that creating a more usable API isn't the main goal of Metal. The goal of this framework is to help developers get rid of driver overhead.

SIFT hardware accelerator for smartphones

I'm a fresh electronics engineering graduate and I have experience in computer vision. I want to ask whether it's feasible to build a hardware accelerator for the SIFT algorithm, or any other OpenCV algorithms, to be used on smartphones instead of the current software implementations.
What are the advantages (much lower computation time, lower power, more complex applications becoming possible, ...) and the disadvantages (no better than the current software implementation, ...)?
Do you have any insight on that?
Thanks
You might be interested in checking out NEON optimizations - NEON is ARM's SIMD instruction set, supported for example by the Cortex-A9 cores in Nvidia's Tegra 3. Some OpenCV functions are NEON-optimized.
Start by reading the nice article Realtime Computer Vision with OpenCV; it has performance comparisons about using NEON, etc.
I also recommend starting here and here; you will find great insights.
OpenCV supports both CUDA and (experimentally) OpenCL.
There are specific optimizations for Nvidia's Tegra chipsets, which are used in a lot of phones/tablets. I don't know if any phones use OpenCL.

OpenCL compliant DSP

On the Khronos website, OpenCL is said to be open to DSPs. But when I look on the websites of DSP-making companies, like Texas Instruments, Freescale, NXP or Analog Devices, I can't find any mention of OpenCL.
So does anyone know if an OpenCL-compliant DSP exists?
Edit: Since this question seems surprising, here is the reason why I asked it. From the khronos.org page:
"OpenCL 1.0 at a glance
OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs"
So I think it would be interesting to know if that's true - whether DSPs, which are particularly well suited to some complex calculations, can really be programmed using OpenCL.
The OpenCL spec seems to support using a chip that has one or more programmable GPU shader cores as an expensive DSP. It does not appear that the spec makes many allowances for DSP chips that were not designed to support being used as a programmable GPU shader in a graphics pipeline.
I finally found one: the SNU-Samsung OpenCL Framework is able to use Texas Instruments C64x DSPs. More info here:
http://aces.snu.ac.kr/Center_for_Manycore_Programming/SNU-SAMSUNG_OpenCL_Framework.html
