I'm trying to use the Accelerate framework on iOS to work around the fact that Core Image on iOS doesn't support custom filters/kernels. I'm developing an edge detection filter using two convolutions with a Sobel kernel, but I'm starting with a simple Gaussian blur to get the hang of it. I know vImage is geared towards image manipulation as matrices, and vDSP focuses on processing digital signals using Fourier transforms. But although I started with the vImage functions (vImageConvolve_XXXX, etc.), I'm hearing a lot of people discussing the use of vDSP's functions (vDSP_conv, vDSP_imgfir, etc.) for things like convolutions.

So that leads me to the question at hand: when should I use one over the other? What are the differences between them with regard to convolution operations? I've looked everywhere but couldn't find a clear answer. Can someone shed some light on this, or point me in the right direction?
Thanks!
If vImage provides the operation you need, it is usually simplest to use that. vImage does cache blocking and threading for you; vDSP does not. vImage also provides operations on interleaved and integer formats, which are often useful for image processing.
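For the Gaussian-blur starting point, a minimal sketch with vImage might look like the following. This is a hedged example, not anything from Apple's samples: the vImage_Buffer structs are assumed to be filled in from your bitmap elsewhere, and the 3x3 integer kernel is just one common Gaussian approximation.

```c
#include <Accelerate/Accelerate.h>

// Hedged sketch: 3x3 blur over interleaved ARGB8888 data with vImage.
// src/dest are assumed to be set up (data, height, width, rowBytes) elsewhere.
vImage_Error blur3x3(const vImage_Buffer *src, const vImage_Buffer *dest)
{
    // Integer approximation of a small Gaussian; the divisor normalizes it.
    static const int16_t kernel[9] = { 1, 2, 1,
                                       2, 4, 2,
                                       1, 2, 1 };

    return vImageConvolve_ARGB8888(src, dest,
                                   NULL,   /* let vImage allocate its temp buffer */
                                   0, 0,   /* no region-of-interest offset */
                                   kernel, 3, 3,
                                   16,     /* divisor: kernel sums to 16 */
                                   NULL,   /* background color unused with kvImageEdgeExtend */
                                   kvImageEdgeExtend);
}
```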
Last time I experimented, neither of these frameworks took advantage of kernel separability, which affords a huge performance boost when convolving in the spatial domain -- far larger than anything vectorized instructions will ever buy you. The Sobel kernel in particular is separable, so if you're using vDSP or vImage (instead of, say, OpenCV), be sure to separate the kernel yourself.
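As a hedged illustration of that separation (the names and planar single-precision layout are my assumptions, not anything from the thread): Sobel Gx factors into a 3x1 smoothing column [1 2 1] and a 1x3 derivative row [-1 0 1], so two 1-D passes with vDSP_imgfir cost 6 multiplies per pixel instead of 9 for the full 3x3 kernel.

```c
#include <Accelerate/Accelerate.h>

// Hedged sketch: separable Sobel Gx on a planar single-precision image.
// tmp is scratch space the same size as src/dst; the border rows/columns of
// the output are zeroed by vDSP_imgfir, and the sign convention only matters
// if you care about gradient direction rather than edge magnitude.
void sobel_gx_separable(const float *src, float *tmp, float *dst,
                        vDSP_Length rows, vDSP_Length cols)
{
    const float smooth[3] = {  1.0f, 2.0f, 1.0f };  // 3x1 column kernel
    const float deriv[3]  = { -1.0f, 0.0f, 1.0f };  // 1x3 row kernel

    // Vertical pass: smooth along the columns (filter is 3 rows x 1 column).
    vDSP_imgfir(src, rows, cols, smooth, tmp, 3, 1);

    // Horizontal pass: differentiate along the rows (1 row x 3 columns).
    vDSP_imgfir(tmp, rows, cols, deriv, dst, 1, 3);
}
```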
I'm trying to reimplement some OpenCV functions in Halide, and one of the difficulties I've run into is how to write cv::fillPoly in Halide. In OpenCV, this function fills a polygon defined by its vertices. The details are in the OpenCV docs: https://docs.opencv.org/2.4/modules/core/doc/drawing_functions.html
Is it possible to implement this in Halide?
It is possible, but likely difficult and not obviously productive: polygon rendering does not have a regular, static data-parallel pattern, and it is already very well optimized in various places. It would be interesting to see whether it can be done and made performant, but it is a lot of work and isn't obviously going to be as fast as or faster than existing graphics libraries, especially when running on GPU hardware, where it has to compete with hardware rasterization. I'd look into using define_extern to call out to existing rendering routines.
I am an experienced programmer but I don't have a lot of experience implementing DSP routines.
I've been banging my head against this for weeks, if not months. My question is twofold, concerning Apple's Accelerate framework:
1)
In the vDSP.h header there are comments to the effect of: please use vDSP_DFT_XXX instead of the (I guess) older vDSP_fft_XXX versions. However, there are zero examples of this outside of Apple's https://developer.apple.com/library/prerelease/mac/samplecode/vDSPExamples/Listings/DemonstrateDFT_c.html#//apple_ref/doc/uid/DTS10004300-DemonstrateDFT_c-DontLinkElementID_6. Maybe it's just that the DFT functions are newer? If so, fine and dandy.
2)
Scaling factors. I can read the documentation (https://developer.apple.com/library/mac/documentation/Performance/Conceptual/vDSP_Programming_Guide/UsingFourierTransforms/UsingFourierTransforms.html#//apple_ref/doc/uid/TP40005147-CH202-16195); it says that in the case of an FFT on real input, like the audio I am working with, the resulting value of each Fourier coefficient is 2x the actual mathematical value.
And yet, in every example, including Apple's own, the scaling factor passed to vsmul() looks like it is 1/(2*N) instead of the 1/2 I would expect.
Further, there is no documentation about the scaling factors for the vDSP_DFT_XXX routines, but I assume that they just wrap the older ones?
Any insight into either of these questions would be greatly appreciated! Hopefully I'm just missing something basic about the way FFTs are implemented in this framework (or in general).
There are at least 3 different FFT scaling options that produce "mathematical" results, and there is no single standard scaling. Energy-preserving (see Parseval's theorem) FFT libraries need to be scaled by a factor on the order of 1/N to recover input magnitudes, since a longer signal of the same amplitude has proportionally more energy. vDSP uses an energy-preserving forward FFT.
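To make the scaling concrete, here is a hedged sketch of a forward real transform using the newer vDSP_DFT_zrop API. It assumes, as the question does, that the DFT routines follow the same conventions as vDSP_fft_zrip: even/odd split input, packed DC/Nyquist output, and results at 2x the mathematical DFT, which the final 0.5 multiply undoes. The names and the power-of-two length are illustrative.

```c
#include <Accelerate/Accelerate.h>
#include <stdlib.h>

// Hedged sketch: forward real-to-complex transform with vDSP_DFT_zrop.
// n is the number of real samples and is assumed to be a length the setup
// accepts (a sufficiently large power of two works).
void forward_real_dft(const float *signal, float *outReal, float *outImag,
                      vDSP_Length n)
{
    vDSP_DFT_Setup setup = vDSP_DFT_zrop_CreateSetup(NULL, n, vDSP_DFT_FORWARD);
    if (setup == NULL)
        return;

    // The real-input routines want the signal in split form: even-indexed
    // samples in one array, odd-indexed samples in the other.
    float *evens = malloc(sizeof(float) * n / 2);
    float *odds  = malloc(sizeof(float) * n / 2);
    DSPSplitComplex split = { evens, odds };
    vDSP_ctoz((const DSPComplex *)signal, 2, &split, 1, n / 2);

    vDSP_DFT_Execute(setup, split.realp, split.imagp, outReal, outImag);

    // The forward real transform comes back at 2x the mathematical DFT
    // (per the documentation quoted in the question), so scale by 0.5 if
    // you want the textbook coefficients.
    const float half = 0.5f;
    vDSP_vsmul(outReal, 1, &half, outReal, 1, n / 2);
    vDSP_vsmul(outImag, 1, &half, outImag, 1, n / 2);

    free(evens);
    free(odds);
    vDSP_DFT_DestroySetup(setup);
}
```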
I'm working on a library that requires the use of vectors and matrices on the iOS platform. I decided to look into OpenGLES because the matrix and vector manipulations I plan on doing (mainly, transposing, matrix multiplication, and eigendecomposition) could definitely benefit from GPU acceleration.
The issue is that I'm not that familiar with OpenGLES, and honestly it might not be the best option. If I were to use OpenGLES, would I have to manually write the algorithms that do the matrix transposition, multiplication and eigendecomposition? Or is there another Apple or 3rd-party framework that can help me with these tasks?
The main deciding issue, however, is that I want these operations to be GPU accelerated.
I'm going to implement my program using the Accelerate framework and vectorized arithmetic, then test to see if it's fast enough for my purposes; if it isn't, I'll try the GPU implementation.
As combinatorial states, Accelerate uses SIMD to accelerate many of its functions, but it is CPU-based. For smaller data sets, it's definitely the way to go, but operating on the GPU can significantly outclass it for large enough data sets with easily parallelized operations.
To avoid having to write all of the OpenGL ES interaction code yourself, you could take a look at my GPUImage framework, which encapsulates fragment shader operations within Objective-C. In particular, you can use the GPUImageRawDataInput and GPUImageRawDataOutput classes to feed raw byte data into the GPU, then operate over that using a custom fragment shader.
A matrix transpose operation would be quick to implement, since all of the matrix elements are independent of one another. Matrix multiplication by a constant or a small matrix would also be reasonably easy to do, but I'm not sure how to scale the multiplication of two large matrices properly. Likewise, I don't have a good implementation of eigendecomposition that I can point to off the top of my head.
The downside to dealing with fragment shader processing is the fact that by default OpenGL ES takes in and outputs 4-byte RGBA values at each pixel. You can change that to half floats on newer devices, and I know that others have done this with this framework, but I haven't attempted that myself. You can pack individual float values into RGBA bytes and unpack at the end, as another approach to get this data in and out of the GPU.
The OpenGL ES 3.0 support on the very latest A7 devices provides some other opportunities for working with float data. You can use vertex data instead of texture input, which lets you supply four floats per vertex and extract those floats in the end. Bartosz Ciechanowski has a very detailed writeup of this on his blog. That might be a better general approach for GPGPU operations, but if you can get your operations to run against texture data in a fragment shader, you'll see huge speedups on the latest hardware (the iPhone 5S can be ~100-1000X faster than the iPhone 4 in this regard, where vertex processing and CPU speeds haven't advanced nearly as rapidly).
The Accelerate framework is not accelerated on the GPU, but it is very well optimized and uses SIMD (NEON) where appropriate.
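For reference, the CPU path the question settles on can be quite small. Here is a hedged sketch with vDSP (the dimensions are illustrative; for eigendecomposition, Accelerate also exposes the LAPACK routines, e.g. ssyev_ for symmetric matrices).

```c
#include <Accelerate/Accelerate.h>

// Hedged sketch: transpose and multiply small row-major matrices with vDSP.
void transpose_and_multiply(void)
{
    // A is 2x3, row-major.
    float A[6]  = { 1, 2, 3,
                    4, 5, 6 };
    float At[6];                   // 3x2 transpose of A
    float B[6]  = { 7,  8,
                    9, 10,
                   11, 12 };       // B is 3x2
    float C[4];                    // C = A * B is 2x2

    // vDSP_mtrans: the trailing M, N are the row/column counts of the output,
    // so the 3x2 transpose is requested as M = 3, N = 2.
    vDSP_mtrans(A, 1, At, 1, 3, 2);

    // vDSP_mmul: C (M x N) = A (M x P) * B (P x N); here M = 2, N = 2, P = 3.
    vDSP_mmul(A, 1, B, 1, C, 1, 2, 2, 3);
}
```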
I am drawing spectrograms using the aurioTouch sample code provided by Apple. Now I want to compare two spectrograms in iOS to see if they are the same. Is it possible to compare two spectrograms using the Accelerate framework?
If it is possible, does anyone know how to compare two spectrograms? If not, is there any other algorithm or library which can be used in iOS for comparing spectrograms?
What you're looking for is called cross-correlation. It doesn't involve the spectrograms directly, but is based on the same math that allows the spectrograms to be drawn (the Fourier transform). There's a DSP Stack Exchange answer here: "How do I implement cross-correlation to prove two audio files are similar?" that covers the basics of implementing this.
The Accelerate framework will only help you with low-level things like vector and matrix arithmetic, Fourier transforms, etc. What you need to do is figure out how to compare two spectrograms (whatever you mean by compare) using pencil and paper (or just your head, if you're a pro) and then implement it in code with the aid of frameworks such as Accelerate.
vDSP has all of the building blocks to do cross correlation and convolution, which is what you would need to implement this.
https://developer.apple.com/library/mac/#documentation/Accelerate/Reference/vDSPRef/Reference/reference.html
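As a concrete starting point, here is a hedged sketch of a sliding cross-correlation with vDSP_conv (the input must supply N + P - 1 samples to get N output lags with a filter of length P; the names are illustrative). Passing the filter end-first with a stride of -1 turns the same call into convolution.

```c
#include <Accelerate/Accelerate.h>

// Hedged sketch: cross-correlate a signal against a template with vDSP_conv.
// signal must contain numLags + templateLength - 1 valid samples.
void cross_correlate(const float *signal,
                     const float *template_, vDSP_Length templateLength,
                     float *result, vDSP_Length numLags)
{
    // Positive filter stride = correlation.
    vDSP_conv(signal, 1, template_, 1, result, 1, numLags, templateLength);

    // For convolution instead, pass the filter reversed via a -1 stride:
    // vDSP_conv(signal, 1, template_ + templateLength - 1, -1,
    //           result, 1, numLags, templateLength);
}
```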
Can the Hough Transform be used in commercial software?
I mean, it is one of those things that seem research-only and unstable. You would not put it in commercial compositing software, for example, and have the user rely on it at all times. Any opinions?
Thanks
The Hough transform has been in use in commercial and industrial applications all over the world for years, decades even. From the wikipedia page you can see that it was first developed in 1972, based on earlier ideas from 1962. That means it is older than the CCD that you use to capture the images you use in the compositing software.
Given that it "seems research only and unstable" to you, I would suggest you spend some time learning various computer vision and image analysis algorithms and techniques, and get a good mathematical basis in the field in general before you implement the Hough transform in commercial compositing software.
And when you are done studying I'd suggest you use a well tested open source implementation.
Yes. In fact, I've written Hough transform code for a piece of commercial software that wasn't meant to be a research tool like MATLAB. Though I did put a lot of time into making it robust for a specific application, it worked great.
The Hough transform by itself can sometimes be unreliable in applications where you have some level of noise, such as with webcams, or when there are distortions in the shape you need to extract. This may be what you are seeing. In that case you may need to do a little more tuning for your application, or try some basic image preprocessing.
I'm a bit annoyed with the condescending tone in both the comment to the question (by High Performance Mark), as well as the accepted answer here.
Firstly, the fact that programming libraries/frameworks provide an implementation of an algorithm does not mean it is used in, or rather suited for, commercial applications (i.e. getting the job done, robustly, in less pristine conditions). The Hough transform is a well-defined algorithm (with possible uses and limitations) which is simple enough to understand and very commonly taught in introductory image processing courses. Not surprisingly, it has been implemented in general-purpose libraries such as Matlab's, Octave's and OpenCV. I don't believe the question was intended to discuss the robustness of an implementation or the possibility of inclusion in commercial image processing frameworks, but rather whether the algorithm itself is well suited for end-user software (an application that counts circles, or whatnot).
The accepted answer, as it stands, is "The algorithm is very old. Here is a book on image processing, here is a link to an image processing library that has implemented it". The other answer, with zero score, seems to be on topic (i.e. discussing possible applications), though it isn't very specific ("worked for me").
So, why do some people get the impression that the Hough transform is unreliable for shape detection? Here is a good example: Unreliable results with cv2.HoughCircles
The input seems to be very well-defined circles. However, the more robust solution that was suggested and that worked doesn't use the Hough transform. I've had similar experiences with my own projects. Usually, the more robust approach is some kind of object segmentation, distance transform, watershed and peak localization. Have I ever used the Hough transform with good results? No. I think it could be useful in some cases, in particular if the shapes of the imaged objects are perfectly defined and partially occluded.
In other words, I'm also curious about commercial applications that ended up benefiting from the Hough transform. That's how I came across this question, and why I was subsequently disappointed by the "you wouldn't ask that question if you understood the subject better" responses.