Apple Accelerate vDSP fft vs DFT and scaling factors - ios

I am an experienced programmer but I don't have a lot of experience implementing DSP routines.
I've been banging my head against this for weeks if not months. My question is twofold, concerning Apple's Accelerate framework:
In the vDSP.h header there are comments to the effect of: please use vDSP_DFT_XXX instead of the (I guess) older vDSP_fft_XXX versions. However, there are zero examples of this outside of Apple's. Maybe it's just that the DFT functions are newer? If so, fine and dandy.
Scaling factors. I can read the documentation; it says that in the case of an FFT on a real input, like the audio I am working with, the resulting value of each Fourier coefficient is 2x the actual, mathematical value.
And yet, in every example, including Apple's own, the scaling factor used for the resulting vsmul() call looks like it is 1/(2*N) instead of 1/2 as I would expect.
Further, there is no documentation about the scaling factors for the vDSP_DFT_XXX routines, but I assume that they just wrap the older ones?
Any insight into either of these questions would be greatly appreciated! Hopefully I'm just missing something basic about the way that FFTs are implemented in this framework (or in general).

There are at least three different FFT scaling conventions that produce "mathematical" results, and there is no single standard scaling. The output of an energy-preserving FFT library (see Parseval's theorem) needs to be scaled by a factor on the order of 1/N to recover input magnitudes, since a longer signal of the same magnitude will have proportionally more energy. vDSP uses an energy-preserving forward FFT.
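To make the 1/N dependence concrete, here is a minimal sketch in plain Python. It uses a textbook unnormalized DFT, not vDSP itself, and the names (dft, AMPLITUDE, BIN) are made up for illustration. It shows that the raw peak of a cosine grows with N, that scaling by 2/N recovers the amplitude, and that Parseval's relation holds with a 1/N factor. If, as the question says, the real-input vDSP routines return twice the mathematical value, then the 1/(2*N) seen in examples would correspond to dividing this DFT by N.

```python
import cmath
import math

def dft(x):
    """Unnormalized forward DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64
AMPLITUDE = 3.0
BIN = 5  # the cosine completes exactly 5 cycles over the N samples

signal = [AMPLITUDE * math.cos(2 * math.pi * BIN * t / N) for t in range(N)]
spectrum = dft(signal)

raw_peak = abs(spectrum[BIN])   # grows with N: AMPLITUDE * N / 2
recovered = raw_peak * 2.0 / N  # scaling by 2/N gives the amplitude back

# Parseval for the unnormalized DFT: sum |x|^2 == (1/N) * sum |X|^2
time_energy = sum(v * v for v in signal)
freq_energy = sum(abs(c) ** 2 for c in spectrum) / N

print(raw_peak, recovered, time_energy, freq_energy)
```

Doubling N while keeping the amplitude fixed doubles raw_peak, which is why a fixed factor such as 1/2 cannot be enough on its own.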


Can the Hough Transform be used in commercial software?

I mean, it is one of those things that seems research-only and unstable.
You would not put it in commercial compositing software, for example,
and have the user rely on it at all times.
Any opinions?
The Hough transform has been in use in commercial and industrial applications all over the world for years, decades even. From the Wikipedia page you can see that it was first developed in 1972, based on earlier ideas from 1962. That means it is older than the CCD that you use to capture the images you use in the compositing software.
Given that it "seems research only and unstable" to you, I would suggest you spend some time learning various computer vision and image analysis algorithms and techniques, and get a good mathematical basis in the field in general before you implement the Hough transform in commercial compositing software.
And when you are done studying I'd suggest you use a well tested open source implementation.
Yes. In fact, I've written Hough transform code for a piece of commercial software that wasn't meant to be a research tool like MATLAB. Though I put a lot of time into its robustness towards a specific application, it worked great.
The Hough transform by itself can sometimes be unreliable in applications where you have some level of noise, such as with webcams, or when there are some distortions in the shape you need to extract. This may be what you are seeing. In this case you may need to do a little more tuning towards your application, or try some basic image preprocessing.
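For readers who have not seen the voting step, here is a minimal sketch of a line Hough transform in plain Python. It is an illustration under simplifying assumptions (edge points already extracted, 1-degree and 1-pixel bins) and is not the code from any library or from my product. The function name and parameters are made up.

```python
import math
from collections import Counter

def hough_lines(points, theta_step_deg=1, rho_step=1.0):
    """Vote in (rho, theta) space; keys of the result are (rho_bin, theta_deg)."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), theta_deg)] += 1
    return votes

# 20 edge points on the horizontal line y = 5, plus three stray noise points
points = [(x, 5) for x in range(0, 100, 5)] + [(3, 11), (40, 1), (77, 17)]

votes = hough_lines(points)
(best_rho_bin, best_theta), best_count = votes.most_common(1)[0]
print(best_rho_bin, best_theta, best_count)
```

The stray points spread their votes over many bins and barely affect the peak. With heavier noise or distorted shapes the peak gets smeared across neighbouring bins, and that is where bin sizes, thresholds and preprocessing start to matter.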
I'm a bit annoyed with the condescending tone in both the comment to the question (by High Performance Mark), as well as the accepted answer here.
Firstly, the fact that programming libraries/frameworks provide an implementation of an algorithm does not mean it is used in, or rather suited for, commercial applications (i.e. getting the job done, robustly, under less-than-pristine conditions). The Hough transform is a well-defined algorithm (with possible uses and limitations) which is simple enough to understand, and very commonly taught in introductory image processing courses. Not surprisingly, it has been implemented in general-purpose libraries such as Matlab's, Octave's and OpenCV. I don't believe the question was intended to discuss the robustness of an implementation and the possibility of its inclusion in commercial image processing frameworks, but rather whether the algorithm itself is well suited for end-user software (an application that counts circles, or whatnot).
The accepted answer, as it stands, is "The algorithm is very old. Here is a book on image processing, here is a link to an image processing library that has implemented it". The other answer, with zero score, seems to be on topic (i.e. discussing possible applications), though it isn't very specific ("worked for me").
So, why do some people get the impression that the Hough transform is unreliable for shape detection? Here is a good example: Unreliable results with cv2.HoughCircles
The input seems to be very well-defined circles. However, the more robust, suggested working solution doesn't use the Hough transform. I've had similar experiences with my own projects. Usually, the more robust way is some kind of object segmentation, distance transform, watershed and peak localization. Have I ever used the Hough transform with good results? No. I think it could be useful in some cases, in particular if the shapes of the imaged objects are perfectly defined and partially occluded.
In other words, I'm also curious about commercial applications that ended up benefiting from the Hough transform. That's how I came across this question, and I was subsequently disappointed by the "you wouldn't ask that question if you understood the subject better" responses.

Simplex noise vs Perlin noise

I would like to know why Perlin noise is still so popular today after Simplex came out. Simplex noise was made by Ken Perlin himself and was supposed to replace his old algorithm, which was slow in higher dimensions, while also offering better quality (no visible artifacts).
Simplex noise came out in 2001 and over those 10 years I've only seen people talk of Perlin noise when it comes to generating heightmaps for terrains, creating procedural textures, et cetera.
Could anyone help me out, is there some downside of Simplex noise? I heard rumors that Perlin noise is faster when it comes to 1D and 2D noise, but I don't know if it's true or not.
"If it ain't broke, don't fix it."
See if you can find anyone telling you why Simplex is better. "It's faster and extends to multiple dimensions" and "simplex noise attempts to reduce the complexity of higher dimensional noise functions" were what I found. Most of us work in 2 or 3 dimensions, maybe 4 if we're lucky enough to be doing something with time.
I think it's fair to say that there are few real-time uses for which Perlin noise is too slow, so for most purposes standard Perlin noise is sufficient. In pre-rendering (as used in the movie industry) time isn't really important, since renders are slow anyway; and in real-time simulations, we have enough ways to reduce the scope of ongoing processing that it's unlikely you're going to be generating massive noise maps every few nano/milliseconds -- that's just basic real-time optimisation.
I wouldn't be at all surprised if it was simply because of the name. You have to choose between Perlin noise and Simplex noise. The latter is newer and has some advantages. But, you know, it sounds like the 'simple' version of the two. I'll go with the complexer one; noise is supposed to be complex, isn't it?
People tend to be rather irrational.
Ken Perlin patented his simplex noise algorithm. His classic algorithm is not patented to my knowledge.
Some preference for the classic Perlin noise may come from being able to use known values resulting in known visual characteristics, as opposed to investing the time required to find the input parameters needed to get an equivalent output using simplex noise.
[simplex noise] has a slightly different visual character to it, so it’s not always a direct plug-in replacement for classic noise. Applications that depend on the detailed characteristics of classic noise, like the precise feature size, the exact range of values or higher order statistics, might need some modification to look good when using simplex noise instead.
Stefan Gustavson's Simplex noise demystified
Just some anecdotal experience, the reason I used classic Perlin noise was because Ken Perlin had a C implementation of classic Perlin noise, while providing a Java implementation of improved Perlin noise. Silly as it may sound, classic Perlin noise was easier to copy and paste into my program, so that is why I used it. I always intended to get around to porting that Java implementation, but classic Perlin appeared to work well enough, so I never bothered to add it.
Stefan Gustavson has some very good C implementations of simplex noise.
I haven't worked with simplex noise yet, but I can think about a few reasons:
Perhaps because we're used to squares and 90-degree angles? Squares, cubes, ... are much more natural to us than triangles, tetrahedra or hyper-tetrahedra.
Each layer in perlin noise is just a simple bitmap.
The output of perlin noise are easily tileable squares. And textures are often tiled squares.
You usually use low dimensional noise. In my experience 2D and 3D are most common.
Simplex noise is simply harder to understand and implement
Probably the samplers in a graphics card can do the interpolation for orthogonal bitmaps as used in Perlin noise, but not the interpolation on the 60-degree bitmaps used in simplex noise. (This point might be wrong; I haven't worked with graphics cards for a few years.)
To answer the question bluntly, I would say it is because Perlin noise is super simple to get your head around. Simplex noise, on the other hand, is a much more complex and hairier beast. Getting a Perlin implementation up and running is much easier than simplex, and thus it gets more usage. It does not help simplex's case that the two look very similar (especially after you manipulate the noise a bit).
Kenneth Perlin himself designed the simplex algorithm for a hardware-based implementation and thus made design decisions that make this easier. One example of this can be seen in this quote from the patent.
Need for table memory: The original Noise algorithm relied on a number of table lookups, which are quite reasonable in a software implementation, but which in a hardware implementation are expensive and constitute a cost bottleneck, particularly when multiple instances of the Noise function are required in parallel. Ideally, a Noise implementation should not rely on the presence of tables of significant size.
Simplex noise looks worse IMHO, and lots of people think it looks "increasingly bad" in higher dimensions. I'd still recommend it over Perlin for most applications, as most won't be using just raw simplex but octaves of it, which look roughly the same as octaves of Perlin and are significantly faster to compute.
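In case "octaves" is unfamiliar, here is a minimal sketch of octave summation in plain Python. The base function is a simple 1D gradient noise used as a stand-in; it is neither Perlin's reference code nor simplex noise, and all names and defaults (lacunarity, gain, table_size) are illustrative assumptions. Any base noise function could be plugged in instead.

```python
import math
import random

def make_gradient_noise_1d(seed=0, table_size=256):
    """Return a 1D gradient-noise function (stand-in for classic or simplex noise)."""
    rng = random.Random(seed)
    gradients = [rng.uniform(-1.0, 1.0) for _ in range(table_size)]

    def fade(t):
        # Quintic smoothing curve 6t^5 - 15t^4 + 10t^3
        return t * t * t * (t * (t * 6 - 15) + 10)

    def noise(x):
        i0 = math.floor(x)
        t = x - i0
        g0 = gradients[i0 % table_size]
        g1 = gradients[(i0 + 1) % table_size]
        # Each lattice point contributes gradient * distance; blend the two.
        v0 = g0 * t
        v1 = g1 * (t - 1.0)
        return v0 + fade(t) * (v1 - v0)

    return noise

def fractal_noise(noise, x, octaves=4, lacunarity=2.0, gain=0.5):
    """Sum octaves: each one multiplies frequency by lacunarity and amplitude by gain."""
    total, amplitude, frequency, norm = 0.0, 1.0, 1.0, 0.0
    for _ in range(octaves):
        total += amplitude * noise(x * frequency)
        norm += amplitude
        amplitude *= gain
        frequency *= lacunarity
    return total / norm  # normalize by the summed amplitudes

noise = make_gradient_noise_1d(seed=42)
samples = [fractal_noise(noise, i * 0.1) for i in range(100)]
```

Since the base function is called once per octave, its cost is multiplied by the octave count, so any speed difference between the two algorithms adds up quickly.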

Stereovision algorithms

For my project, which is supposed to segment the hand region closest to the camera, I initially tried OpenCV's stereovision example. However, the disparity map looks very bad and it's useless for me.
Is there any other method that is better than the OpenCV implementation and has some sample output (image or video)? Because my time is limited, I must choose one better algorithm and implement it.
Thank you.
OpenCV implements a number of stereo block matching algorithms, some of them pretty cutting edge.
Disparity maps always look bad except in very simple circumstances. The first step is to try to improve the source images, the lighting and the background.
If it was easy then everybody would be doing it and there would be no market for expensive 3D laser scanners.
Try the different block matching algorithms provided by OpenCV. The little bit of experimentation I've done so far seems to indicate that cv::StereoSGBM gives better disparity maps than cv::StereoBM, but is slower.
The performance of the block matching algorithms will depend on the parameters they are initialized with. Have a look at the stereo examples again; notice lines 195-222, where the algorithms are initialized.
I also suggest you use some basic GUI (OpenCV's highgui, for example) to manipulate these parameters in real time when fine-tuning the algorithm.
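To see what parameters such as block size and maximum disparity actually control, here is a minimal sketch of block matching on a single rectified scanline, using a sum of absolute differences (SAD) cost. It is plain Python for illustration only and is not how cv::StereoBM or cv::StereoSGBM are implemented. The function name and defaults are made up.

```python
def disparity_row(left, right, block=3, max_disp=8):
    """Naive SAD block matching along one rectified scanline.

    For each pixel x in `left`, compare the block around x with blocks in
    `right` shifted 0..max_disp pixels to the left, and keep the best shift.
    """
    half = block // 2
    n = len(left)
    result = [0] * n
    for x in range(half, n - half):
        best_cost, best_d = None, 0
        for d in range(0, max_disp + 1):
            if x - d - half < 0:
                break  # the shifted block would leave the image
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-half, half + 1))
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        result[x] = best_d
    return result

# Synthetic pair: the left row is the right row shifted by 2 pixels.
right = [i * i for i in range(20)]
left = [right[0]] * 2 + right[:-2]
disp = disparity_row(left, right)
print(disp)
```

On a flat, textureless row every shift has the same cost, so the winner is arbitrary. That is one reason real disparity maps look noisy, and why block size and texture or uniqueness thresholds are worth tuning.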

Natural feature tracking with openCV- evaluating the options

In brief, what are the available options for tracking a particular image (a photo/graphic/logo) in a webcam feed using OpenCV? In particular I am trying to collate opinions about the following:
Would HaarTraining be overkill (considering that it is not 3D objects but simply images to be tracked), or is it the only way out?
I have tried template matching and color-based detection, but these don't offer reliable tracking under varying illumination/scale/orientation at all.
Would SIFT/SURF feature matching work as reliably in video as with a static image?
I am a relative beginner to OpenCV, as is evident from my previous queries on SO (very helpful replies). Any cues or links to good resources for beginning an NFT implementation with OpenCV?
Can you talk a bit more about your requirements? Namely, what type of appearance variations do you expect/how much control you have over the environment. What type of constraints do you have in terms of speed/power/resource footprint?
Without those, I can only give some general assessment to the 3 paths you are talking about.
Haar would work well and fast, particularly for instance recognition.
Note that Haar doesn't work all that well for 3D unless you train with a full spectrum of templates to cover various perspectives. The poster child application of Haar cascades is the Viola-Jones face detection system, which is largely geared towards frontal faces (it can certainly be trained for many other things).
For a tutorial on doing Haar training using OpenCV, see here.
Try NCC or, better yet, Lucas-Kanade tracking (cvCalcOpticalFlowPyrLK, which is pyramidal, as in coarse-to-fine LK; a 4-level pyramid usually works well) for a template. This is usually good up to 10% scale change or 10 degrees of rotation without template changes. Beyond that, you can have automatically evolving templates, which can drift over time.
For a quick Optical Flow/tracking tutorial, see this.
SIFT/SURF would indeed work very well. I'd suggest some additional geometric verification step to remove spurious matches.
I'd be a bit concerned about the amount of computational time involved. If there isn't significant illumination/scale/in-plane rotation change, then SIFT is probably overkill. If you truly need it, check out Changchang Wu's excellent SIFTGPU implementation. Note: 3rd party, not OpenCV.
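As a rough illustration of the NCC option above, here is a minimal sketch of zero-mean normalized cross-correlation on a 1D strip of pixels, in plain Python. It is not OpenCV code. The helper names are made up, and a real tracker would run this over 2D patches. Subtracting the mean and dividing by the norms is what makes the score insensitive to brightness and contrast changes.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-length pixel lists."""
    n = len(patch)
    mean_p = sum(patch) / n
    mean_t = sum(template) / n
    dp = [p - mean_p for p in patch]
    dt = [t - mean_t for t in template]
    denom = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dt))
    if denom == 0.0:
        return 0.0  # flat patch: correlation is undefined, treat as no match
    return sum(a * b for a, b in zip(dp, dt)) / denom

def best_match(signal, template):
    """Slide the template over a 1D signal; return (offset, score) of the best NCC."""
    m = len(template)
    scores = [(ncc(signal[i:i + m], template), i)
              for i in range(len(signal) - m + 1)]
    score, offset = max(scores)
    return offset, score

template = [10, 50, 20, 80, 30]
# The same pattern, but brighter (gain 2, offset +40), embedded at index 4.
signal = [5, 5, 6, 5] + [2 * v + 40 for v in template] + [7, 6, 5, 5]
offset, score = best_match(signal, template)
print(offset, score)
```

The score does not compensate for scale or rotation, which is consistent with the limits quoted above for template-based tracking.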
It seems that none of these methods, applied alone, would bring reliable results unless it is a hobby project. Some adaptive algorithm would probably be more or less acceptable. For example, see a famous open-source project where they use machine learning.

Fastest method to compute convolution

I have to apply a convolution filter on each row of many images. The classic is 360 images of 1024x1024 pixels. In my use case it is 720 images 560x600 pixels.
The problem is that my code is much slower than what is advertised in articles.
I have implemented the naive convolution, and it takes 2m 30s. I then switched to FFT using FFTW. I used complex-to-complex transforms, filtering two rows in each transform. I'm now around 20s.
The thing is that articles advertise around 10s and even less for the classic condition.
So I'd like to ask the experts here if there could be a faster way to compute the convolution.
Numerical Recipes suggests avoiding the sorting done in the DFT and adapting the frequency-domain filter function accordingly. But there is no code example of how this could be done.
Maybe I lose time copying data. With a real-to-real transform I wouldn't have to copy the data into complex values. But I have to pad with 0 anyway.
EDIT: see my own answer below for progress feedback and further information on solving this issue.
Question (precise reformulation):
I'm looking for an algorithm or piece of code to apply a very fast convolution to a discrete non-periodic function (512 to 2048 values). Apparently the discrete Fourier transform is the way to go. However, I'd like to avoid data copying and conversion to complex, and avoid the butterfly reordering.
FFT is the fastest technique known for convolving signals, and FFTW is the fastest free library available for computing the FFT.
The key for you to get maximum performance (outside of hardware ... the GPU is a good suggestion) will be to pad your signals to a power of two. When using FFTW, use the 'patient' setting when creating your plan to get the best performance. It's highly unlikely that you will hand-roll a faster implementation than what FFTW provides (forget about N.R.). Also be sure to use the real version of the forward 1D FFT rather than the complex version, and use single-precision floating point if you can.
If FFTW is not cutting it for you, then I would look at Intel's (very affordable) IPP library. They have hand-tuned FFTs for Intel processors that have been optimized for images with various bit depths.
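To show the padding logic, here is a minimal sketch of linear convolution through zero-padded FFTs, checked against the direct sum. It is plain Python with a textbook recursive radix-2 FFT, meant only to illustrate the steps. It is not FFTW, it will be far slower than FFTW, and every function name is made up for the example.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. No 1/N scaling."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def fft_convolve(row, kernel):
    """Linear (non-circular) convolution via zero-padded FFTs."""
    out_len = len(row) + len(kernel) - 1
    size = next_pow2(out_len)  # padding avoids circular wrap-around
    fa = fft([complex(v) for v in row] + [0j] * (size - len(row)))
    fb = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    prod = [x * y for x, y in zip(fa, fb)]
    result = fft(prod, inverse=True)
    return [c.real / size for c in result[:out_len]]  # apply the 1/N here

def direct_convolve(row, kernel):
    """Reference O(N*M) convolution."""
    out = [0.0] * (len(row) + len(kernel) - 1)
    for i, a in enumerate(row):
        for j, b in enumerate(kernel):
            out[i + j] += a * b
    return out

row = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
fast = fft_convolve(row, kernel)
slow = direct_convolve(row, kernel)
```

When the same kernel is applied to every row, its transform (fb here) only needs to be computed once and can be reused, which removes one of the three transforms per row.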
CenterSpace Software
You may want to add image processing as a tag.
But this article may be of interest, especially with the assumption that the image size is a power of 2. You can also see where they optimize the FFT. I expect that the articles you are looking at made some assumptions and then optimized the equations for those.
If you want to go faster you may want to use the GPU to actually do the work.
This book may be helpful for you, if you go with the GPU:
This answer is to collect progress report feedback on this issue.
Edit 11 oct.:
The execution time I measured doesn't reflect the effective time of the FFT. I noticed that when my program ends, the CPU is still busy in system time up to 42% for 10s. When I wait until the CPU is back to 0%, before restarting my program I then get the 15.35s execution time which comes from the GPU processing. I get the same time if I comment out the FFT filtering.
So the FFT is in fact currently faster than the GPU and was simply hindered by a competing system task. I don't know yet what this system task is. I suspect it results from the allocation of a huge heap block where I copy the processing result before writing it to disk. For the input data I use a memory map.
I'll now change my code to get an accurate measurement of the FFT processing time. Making it faster is still relevant, because there is room to optimize the GPU processing, for instance by pipelining the transfer of the data to be processed.
