Does Metal have functionality similar to CUDA's thrust::sort or is one expected to write their own sort function?
Apple's Accelerate framework has useful vector operations in its vDSP library, although these run on the CPU rather than through Metal. The sorting functions, specifically, are documented here:
https://developer.apple.com/documentation/accelerate/vdsp/reversing_and_sorting_functions
Related
I'm working on a library that requires the use of vectors and matrices on the iOS platform. I decided to look into OpenGL ES because the matrix and vector manipulations I plan on doing (mainly transposing, matrix multiplication, and eigendecomposition) could definitely benefit from GPU acceleration.
The issue is that I'm not that familiar with OpenGL ES, and honestly it might not be the best option. If I were to use OpenGL ES, would I have to manually write the algorithms that do the matrix transposition, multiplication, and eigendecomposition? Or is there another Apple or third-party framework that can help me with these tasks?
The main deciding factor, however, is that I want these operations to be GPU accelerated.
I'm going to implement my program using the Accelerate framework and vectorized arithmetic, then test whether it's fast enough for my purposes and, if it isn't, try the GPU implementation.
As combinatorial states, Accelerate uses SIMD to accelerate many of its functions, but it is CPU-based. For smaller data sets, it's definitely the way to go, but operating on the GPU can significantly outclass it for large enough data sets with easily parallelized operations.
To avoid having to write all of the OpenGL ES interaction code yourself, you could take a look at my GPUImage framework, which encapsulates fragment shader operations within Objective-C. In particular, you can use the GPUImageRawDataInput and GPUImageRawDataOutput classes to feed raw byte data into the GPU, then operate over that using a custom fragment shader.
A matrix transpose operation would be quick to implement, since all of the matrix elements are independent of one another. Matrix multiplication by a constant or small matrix would also be reasonably easy to do, but I'm not sure how to scale the multiplication of two large matrices properly. Likewise, I don't have a good implementation of eigendecomposition that I could point to off of the top of my head.
The downside to dealing with fragment shader processing is the fact that by default OpenGL ES takes in and outputs 4-byte RGBA values at each pixel. You can change that to half floats on newer devices, and I know that others have done this with this framework, but I haven't attempted that myself. You can pack individual float values into RGBA bytes and unpack at the end, as another approach to get this data in and out of the GPU.
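The pack-and-unpack idea can be sketched on the CPU side as follows. This is only a minimal illustration, assuming the values have already been normalized to [0, 1) and that a plain fixed-point encoding across the four 8-bit channels is acceptable. The function names are hypothetical, Python is used just to keep the sketch short, and the matching decode would still have to be written in the fragment shader.

```python
def pack_unit_float(value):
    """Encode a value in [0, 1) as four bytes (R, G, B, A), most significant first."""
    if not 0.0 <= value < 1.0:
        raise ValueError("value must be in [0, 1)")
    quantized = int(value * 2**32)  # 32-bit fixed point
    return quantized.to_bytes(4, "big")


def unpack_unit_float(rgba):
    """Decode four RGBA bytes back to a value in [0, 1)."""
    return int.from_bytes(rgba, "big") / 2**32
```

With this encoding the round trip loses less than 2^-32 of precision per value, at the cost of using a whole RGBA pixel for each number.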
The OpenGL ES 3.0 support on the very latest A7 devices provides some other opportunities for working with float data. You can use vertex data instead of texture input, which lets you supply four floats per vertex and extract those floats in the end. Bartosz Ciechanowski has a very detailed writeup of this on his blog. That might be a better general approach for GPGPU operations, but if you can get your operations to run against texture data in a fragment shader, you'll see huge speedups on the latest hardware (the iPhone 5S can be ~100-1000X faster than the iPhone 4 in this regard, where vertex processing and CPU speeds haven't advanced nearly as rapidly).
The Accelerate framework is not accelerated on the GPU, but it is very well optimized and uses NEON SIMD instructions where appropriate.
I am new to the audio frameworks, but after searching for a while I found the Accelerate framework, which iOS provides for digital signal processing. In my project I want to run an FFT on a sound file so that I can compare two sounds using their FFTs. How should I proceed with this? I have gone through Apple's aurioTouch sample app, but it doesn't use the Accelerate framework. Can anybody help me run an FFT on a sound file and then compare the results using correlation?
The FFT is a complex beast, not something that can be comprehensively discussed in a single text box (I know accomplished engineers who have taken multiple semesters of classes studying topics that boil down to Fourier Transform analysis). Because of the nature of the Accelerate framework's tasks, it too is a non-trivial discussion topic.
I would suggest reading Mike Ash's Friday Q&A on FFTs, where he covers some basic use of the vDSP functions to get FFT values, as a starting place.
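For orientation before diving into vDSP, here is a rough sketch of what the FFT output represents. It uses a naive O(n^2) DFT in plain Python rather than any vDSP call, and the function name and test signal are made up for illustration: a sine with 5 full cycles across the buffer should produce its largest magnitude in bin 5.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin up to Nyquist (naive O(n^2) DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(abs(total))
    return magnitudes


# Hypothetical test signal: 5 full cycles of a sine across 64 samples.
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = dft_magnitudes(signal)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

An FFT computes the same bins far more efficiently; the point here is only that each bin measures how much of one frequency is present in the buffer.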
See this DSP Stack Exchange answer for discussion on convolution and cross-correlation.
I am drawing spectrograms using the aurioTouch sample code provided by Apple. Now I want to compare two spectrograms in iOS to see if they are the same. Is it possible to compare the two spectrograms using the Accelerate framework?
If it is possible, does anyone know how to compare two spectrograms? If not, is there any other algorithm or library which can be used in iOS for comparing spectrograms?
What you're looking for is called cross-correlation. It doesn't involve the spectrograms directly, but is based on the same math that allows the spectrograms to be drawn (the Fourier Transform). There's a DSP Stack Exchange answer, "How do I implement cross-correlation to prove two audio files are similar?", that covers the basics of implementing this.
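As a rough idea of what that comparison computes, here is a minimal time-domain sketch in plain Python. It is not the method from the linked answer and makes no vDSP calls; the function names and sample data are hypothetical. It slides one buffer over the other, sums the products at each lag, and scales the peak by the energy of both signals.

```python
import math


def cross_correlate(a, b):
    """Return {lag: sum over i of a[i] * b[i + lag]} for every overlapping lag."""
    result = {}
    for lag in range(-(len(a) - 1), len(b)):
        total = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                total += x * b[j]
        result[lag] = total
    return result


def similarity(a, b):
    """Peak cross-correlation scaled by both signal energies; returns (score, lag)."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0, 0
    scores = cross_correlate(a, b)
    best_lag = max(scores, key=scores.get)
    return scores[best_lag] / norm, best_lag


# Hypothetical data: the same short pulse, delayed by 3 samples in the second buffer.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
score, lag = similarity(pulse, delayed)
```

A score close to 1 at some lag suggests the two buffers contain the same signal shifted by that lag. For real audio the direct loop is too slow, which is why the usual approach does the same computation through the Fourier Transform.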
The Accelerate framework will only help you with low-level things like vector and matrix arithmetic, Fourier transforms, etc. What you need to do is figure out how to compare two spectrograms (whatever you mean by compare) using pencil and paper (or just your head if you're pro) and then implement it in code with the aid of frameworks such as Accelerate.
vDSP has all of the building blocks for cross-correlation and convolution, which is what you would need to implement this:
https://developer.apple.com/library/mac/#documentation/Accelerate/Reference/vDSPRef/Reference/reference.html
Does anyone know of a linear algebra library for iOS that uses OpenGL ES 2.0 under the covers?
Specifically, I am looking for a way to do matrix multiplication on arbitrary-sized matrices (e.g., much larger than 4x4, more like 5,000 x 100,000) using the GPUs on iOS devices.
Is there a specific reason you're asking for "uses OpenGL ES 2.0 under the covers?" Or do you just want a fast, hardware optimized linear algebra library such as BLAS, which is built into iOS?
MetalPerformanceShaders.framework provides some tuned BLAS-like functions. It is not OpenGL ES; it is built on Metal and runs on the GPU. See MetalPerformanceShaders/MPSMatrixMultiplication.h.
OpenGL on iOS is probably the wrong way to go. Metal support on iOS would be the better way to go if you're going GPU.
Metal
You could use Apple's support for Metal compute shaders. I've written high-performance code for my PhD in it. An early experiment I made calculating some fractals using Metal might give you some ideas to start.
Ultimately, this question is too broad. What do you intend to use the library for, or how do you intend to use it? Is it a one-off multiplication? Have you tested with current libraries and found the performance to be too slow? If so, by how much?
In general, you can run educational or purely informational experiments on performance of algorithm X on CPU vs. GPU vs. specialized hardware, but most often you run up against Amdahl's law and your code vs. a team of experts in the field.
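To put a number on the Amdahl's law point, here is a small sketch. The formula is the standard one; the function name and the 80% and 20x figures are made up for illustration.

```python
def amdahl_speedup(parallel_fraction, accelerator_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accelerator_speedup)


# Hypothetical numbers: 80% of the runtime is the matrix multiply,
# and the GPU runs that part 20x faster.
overall = amdahl_speedup(0.80, 20.0)
```

With those numbers the program as a whole gets only about 4.2x faster, and no matter how fast the GPU part becomes, the remaining 20% of serial work caps the overall gain below 5x.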
Accelerate
You can also look into the Accelerate framework which offers BLAS.
Apple, according to the WWDC 2014 talk What's new in the Accelerate Framework, has hand tuned the Linear Algebra libraries targeted at their current generation hardware. They aren't just fast, but energy efficient. There are newer talks as well.
I'm trying to use the Accelerate framework on iOS to work around the fact that Core Image on iOS doesn't support custom filters/kernels. I'm developing an edge detection filter using two convolutions with a Sobel kernel, but starting with a simple Gaussian blur to get the hang of it. I know vImage is geared towards manipulating images as matrices and vDSP focuses on processing digital signals using Fourier transforms. But although I started using the vImage functions (vImageConvolve_XXXX, etc.), I'm hearing a lot of people discussing the use of vDSP's functions (vDSP_conv, vDSP_imgfir, etc.) to do things such as convolutions. So that leads me to the question at hand: when should I use one over the other? What are the differences between them with regard to convolution operations? I've looked everywhere but couldn't find a clear answer. Can someone shed some light on it, or point me in the right direction?
Thanks!
If vImage provides the operation you need, it is usually simplest to use that. vImage does cache blocking and threading for you; vDSP does not. vImage also provides operations on interleaved and integer formats, which are often useful for image processing.
Last time I experimented, neither of these frameworks took advantage of kernel separability, which affords a huge performance boost when convolving in the spatial domain -- a far larger performance boost than vectorized instructions will ever buy you. The Sobel kernel in particular is separable, so if you're using vDSP or vImage (instead of say OpenCV), be sure to separate the kernel yourself.
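To make the separability point concrete, here is a minimal sketch in plain Python (no vImage or vDSP calls; the helper names and the test image are hypothetical). The 3x3 Sobel x-kernel is the outer product of a smoothing vector [1, 2, 1] and a difference vector [-1, 0, 1], so one row pass followed by one column pass gives the same result as the full 2-D pass. The helpers apply the kernels without flipping them, which is all that is needed to show the equivalence.

```python
def convolve_rows(image, kernel):
    """Correlate each row with a 1-D kernel (valid region only, no padding)."""
    k = len(kernel)
    return [
        [sum(row[x + i] * kernel[i] for i in range(k)) for x in range(len(row) - k + 1)]
        for row in image
    ]


def convolve_cols(image, kernel):
    """Correlate each column with a 1-D kernel (valid region only, no padding)."""
    k = len(kernel)
    return [
        [sum(image[y + i][x] * kernel[i] for i in range(k)) for x in range(len(image[0]))]
        for y in range(len(image) - k + 1)
    ]


def convolve_2d(image, kernel):
    """Reference: correlate with a full 2-D kernel (valid region only)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[y + j][x + i] * kernel[j][i] for j in range(kh) for i in range(kw))
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]


# Sobel x-gradient kernel built as the outer product of its two 1-D factors.
SMOOTH = [1, 2, 1]
DIFF = [-1, 0, 1]
SOBEL_X = [[s * d for d in DIFF] for s in SMOOTH]

# Hypothetical 4x5 test image with a vertical edge.
image = [
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
]
full = convolve_2d(image, SOBEL_X)
separated = convolve_cols(convolve_rows(image, DIFF), SMOOTH)
```

Both paths produce the same output here. The separated version needs 2k multiplies per output pixel for a k x k kernel instead of k^2, so the saving grows quickly with kernel size.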