Fixed-Point vs Floating-Point and Performance - ios

I have an iOS audio app that is using floating point to do it's processing right now. On the newer iOS devices it works flawlessly. However, on older devices it stalls/can't process it (no sound will come out).
Should I convert my algorithms to use a fixed-point system to work around this to improve the performance. Or should I just improve the algorithms I'm using to process them (as far as I know some of the algorithms I am using are mostly optimized). Is it worth trying to do fixed-point work in iOS?
Thanks!
EDIT
I'm starting to think that it's a processor speed issue and now I'm thinking I have to just optimize/improve my algorithms. Should I be going with this approach?

On any iOS device that supports iOS 9, using short floats for DSP computation (multiply-accumulates) for most DSP algorithms is as fast or faster than using 32-bit scaled integers. The NEON vector unit can dispatch 4 per cycle if you can keep the pipeline fed.

Related

How was integer divide pervasively used in Apple programming?

I was reading an interesting interview with Chris Lattner, author of LLVM and Swift, and noticed a very curious claim:
Other things are that Apple does periodically add new instructions [11:00] to its CPUs. One example of this historically was the hilariously named “Swift” chip that it launched which was the first designed in-house 32-bit ARM chip. This was the iPhone 5, if I recall.
In this chip, they added an integer-divide instruction. All of the chips before that didn’t have the ability to integer-divide in hardware: you had to actually open-code it, and there was a library function to do that. [11:30] That, and a few other instructions they added, were a pretty big deal and used pervasively
Now that is surprising. As I understand it, integer divide is almost never needed. The cases I've seen where it could be used, fall into a few categories:
Dividing by a power of two. Shift right instead.
Dividing by an integer constant. Multiply by the reciprocal instead. (It's counterintuitive but true that this works in all cases.)
Fixed point as an approximation of real number arithmetic. Use floating point instead.
Fixed point, for multiplayer games that run peer-to-peer across computers with different CPU architectures and need each computer to agree on the results down to the last bit. Okay, but as I understand it, multiplayer games on iPhone don't use that kind of peer-to-peer design.
Rendering 3D textures. Do that on the GPU instead.
After floating point became available in hardware, I've never seen a workload that needed to do integer divide with significant frequency.
What am I missing? What was integer divide used for on Apple devices, so frequently that it was considered worth adding as a CPU instruction?

How to use OpenCV functions in Metal on iOS?

I have developed the Xcode project that uses OpenCV functions for image processing when the iPhone camera live stream.
It takes some time to process one frame and doesn't look like real time.
Is it possible to accelerate the calculation by integrating OpenCV and Metal?
For example, OpenCV function "grabCut" takes more than 1 second to detect certain foreground objects.
How can I reduce the processing time down to 10ms at least using Metal?
You can't call OpenCV functions from Metal.
If you want to speed up this algorithm, you could try porting it to Metal but that's only an option if the algorithm -- or major parts of it -- are highly parallel.
Now, it looks like grabCut has a CUDA implementation (which I found by googling for "grabcut cuda"), which means that implementing this in Metal might actually be worth doing. If you can find the CUDA source code, it's usually a relatively straightforward port.

Apple Accelerate vDSP fft vs DFT and scaling factors

I am an experienced programmer but I don't have a lot of experience implementing DSP routines.
I've been banging my head against this for weeks if not months. My question is two fold, concerning Apple's Accelerate framework:
1)
In the vDSP.h header there are comments to the effect of: please use vDSP_DFT_XXX instead of the (i guess) older versions vDSP_fft_XXX. However there are zero examples of this outside of Apple's https://developer.apple.com/library/prerelease/mac/samplecode/vDSPExamples/Listings/DemonstrateDFT_c.html#//apple_ref/doc/uid/DTS10004300-DemonstrateDFT_c-DontLinkElementID_6. Maybe it's just that the DFT functions are newer? If so, fine and dandy.
2)
Scaling factors. I can read the documentation (https://developer.apple.com/library/mac/documentation/Performance/Conceptual/vDSP_Programming_Guide/UsingFourierTransforms/UsingFourierTransforms.html#//apple_ref/doc/uid/TP40005147-CH202-16195), it says in the case of an FFT on a real input, like audio that I am working with, the resulting value of each of the Fourier coefficients is 2x the actual, mathematical value.
And yet, in every example, including Apple's own, the scaling factor used for the resulting vsmul() function looks like it is 1/2*N instead of 1/2 as I would expect.
Further, there is no documentation about the scaling factors for the vDSP_DFT_XXX routines, but I assume that they just wrap the older ones?
Any insight into either of these questions would be greatly appreciated! Hopefully I'm just missing something basic about the way that FFT's are implemented in this framework (or in general).
There are at least 3 different FFT scaling options that produce "mathematical" results, and there is no single standard scaling. Energy preserving (see Parseval's theorem) FFT libraries need to be scaled by on the order of 1/N for input magnitude results, since a longer signal of the same magnitude will have proportionally more energy. vDSP uses an energy preserving forward FFT.

Linear Algebra library using OpenGL ES 2.0 for iOS

Does anyone know of a linear algebra library for iOS that uses OpenGL ES 2.0 under the covers?
Specifically, I am looking for a way to do matrix multiplication on arbitrary-sized matrices (e.g., much larger than 4x4, more like 5,000 x 100,000) using the GPUs on iOS devices.
Is there a specific reason you're asking for "uses OpenGL ES 2.0 under the covers?" Or do you just want a fast, hardware optimized linear algebra library such as BLAS, which is built into iOS?
MetalPerformanceShaders.framework provides some tuned BLAS-like functions. It is not GLES. It is metal and runs on the GPU. See MetalPerformanceShaders/MPSMatrixMultiplication.h
OpenGL on iOS is probably the wrong way to go. Metal support on iOS would be the better way to go if you're going GPU.
Metal
You could use Apple's support for Metal Compute shaders. I've written high-performance code for my PhD in it. An early experiment I made calculating some fractals using Metal might give you some ideas to start
Ultimately, this question is too broad. What do you intend to use the library for, or how do you intend to use it? Is it a one off multiplication? Have you tested with current libraries and found the performance to be too slow? If so, by how much?
In general, you can run educational or purely informational experiments on performance of algorithm X on CPU vs. GPU vs. specialized hardware, but most often you run up against Amdahl's law and your code vs. a team of experts in the field.
Accelerate
You can also look into the Accelerate framework which offers BLAS.
Apple, according to the WWDC 2014 talk What's new in the Accelerate Framework, has hand tuned the Linear Algebra libraries targeted at their current generation hardware. They aren't just fast, but energy efficient. There are newer talks as well.

fastest image processing library?

I'm working on robot vision system and its main purpose is to detect objects, i want to choose one of these libraries (CImg , OpenCV) and I have knowledge about both of them.
The robot I'm using has Linux , 1GHz CPU and 1G ram and I'm using C++ the size of image is 320p.
I want to have a real-time image processing near 20 out of 25 frames per seconds.
In your opinion which library is more powerful l although I have tested both and they have the same process time, open cv is slightly better and I think that's because I use pointers with open cv codes.
Please share your idea and your reason.
thanks.
I think you can possibly get best performance when you integrated - OpenCV with IPP.
See this reference, http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-open-source-computer-vision-library-opencv-faq/
Here is another reference http://experienceopencv.blogspot.com/2011/07/speed-up-with-intel-integrated.html
Further, if you freeze the algorithm that works perfectly, usually you can isolate your algorithm and work your way towards doing serious optimization (such as memory optimization, porting to assembly etc.) which might not be ready to use.
It really depends on what you want to do (what kind of objects you want to detect, accuracy, what algorithm you are using etc..) and how much time you have got. If it is for generic computer vision/image processing, I would stick with OpenCV. As Dipan said, do consider further optimization. In my experience with optimization for Computer Vision, the bottleneck usually is in memory interconnect bandwidth (or memory itself) and so you might have to trade in cycles (computation) to save on communication. Do understand the algorithm really well to further optimize the algorithm (which at times can give huge improvements as compared to compilers).

Resources