I am drawing spectrograms using the aurioTouch sample code provided by Apple. Now I want to compare two spectrograms in iOS to see whether they are the same. Is it possible to compare two spectrograms using the Accelerate framework?
If it is possible, does anyone know how to compare two spectrograms? If not, is there any other algorithm or library which can be used in iOS for comparing spectrograms?
What you're looking for is called cross-correlation. It doesn't involve the spectrograms directly, but it is based on the same math that allows the spectrograms to be drawn (the Fourier transform). There's a DSP Stack Exchange answer, "How do I implement cross-correlation to prove two audio files are similar?", that covers the basics of implementing this.
The Accelerate framework will only help you with low-level things like vector and matrix arithmetic, Fourier transforms, etc. What you need to do is figure out how to compare two spectrograms (whatever you mean by "compare") using pencil and paper (or just in your head, if you're a pro) and then implement it in code with the aid of frameworks such as Accelerate.
vDSP has all of the building blocks needed to implement cross-correlation and convolution, which is what you would need here.
https://developer.apple.com/library/mac/#documentation/Accelerate/Reference/vDSPRef/Reference/reference.html
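To give a sense of how that maps onto vDSP, here is a minimal sketch of cross-correlating two audio buffers with vDSP_conv. Everything except the vDSP calls (the function name, the buffers, the normalization) is illustrative, and the peak-over-energy normalization is just one reasonable way to turn the correlation peak into a similarity score.

```cpp
// Sketch: cross-correlating two audio buffers with vDSP_conv (Accelerate).
// Assumes both signals are already decoded into float arrays.
#include <Accelerate/Accelerate.h>
#include <cmath>
#include <vector>

// Returns the peak normalized cross-correlation; values near 1 suggest the
// signals are very similar up to a time shift.
float peakCrossCorrelation(const std::vector<float> &a,
                           const std::vector<float> &b) {
    const vDSP_Length P = b.size();          // "filter" length
    const vDSP_Length N = a.size();          // number of output lags
    // vDSP_conv reads N + P - 1 samples from its first input, so zero-pad.
    std::vector<float> padded(a);
    padded.resize(N + P - 1, 0.0f);

    std::vector<float> corr(N, 0.0f);
    // A positive filter stride (+1) selects correlation; -1 would be convolution.
    vDSP_conv(padded.data(), 1, b.data(), 1, corr.data(), 1, N, P);

    // Normalize by the signal energies so the peak is comparable across inputs.
    float energyA = 0.0f, energyB = 0.0f, peak = 0.0f;
    vDSP_svesq(a.data(), 1, &energyA, a.size());
    vDSP_svesq(b.data(), 1, &energyB, b.size());
    vDSP_maxmgv(corr.data(), 1, &peak, N);

    const float norm = std::sqrt(energyA * energyB);
    return norm > 0.0f ? peak / norm : 0.0f;
}
```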
I am new to audio frameworks, but after searching for a while I found the Accelerate framework, which the iOS API provides for digital signal processing. In my project I want to take the FFT of a sound file so that I can compare two sounds. How should I proceed with this? I have gone through Apple's aurioTouch sample app, but it doesn't use the Accelerate framework. Can anybody help me take the FFT of a sound file and then compare the two using correlation?
The FFT is a complex beast, not something that can be comprehensively discussed in a single text box (I know accomplished engineers who have taken multiple semesters of classes studying topics that boil down to Fourier transform analysis). Because of the nature of the Accelerate framework's tasks, it too is a non-trivial discussion topic.
I would suggest reading Mike Ash's Friday Q&A on FFTs, where he covers some basic use of the vDSP functions to get FFT values, as a starting place.
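As a rough sketch of the flow Mike Ash walks through (create a setup, pack the samples into split-complex form, run the forward transform, take magnitudes), assuming a 1024-sample buffer; names other than the vDSP calls are illustrative:

```cpp
// Sketch: computing FFT magnitudes for a buffer of samples with vDSP.
// Assumes samples.size() >= 1024; the FFT length is an arbitrary choice here.
#include <Accelerate/Accelerate.h>
#include <vector>

std::vector<float> fftMagnitudes(const std::vector<float> &samples) {
    const vDSP_Length log2n = 10;            // 1024-point FFT
    const vDSP_Length n = 1 << log2n;
    const vDSP_Length half = n / 2;

    FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);

    // vDSP's real FFT wants the input in split-complex (even/odd) form.
    std::vector<float> real(half, 0.0f), imag(half, 0.0f);
    DSPSplitComplex split = { real.data(), imag.data() };
    vDSP_ctoz(reinterpret_cast<const DSPComplex *>(samples.data()), 2,
              &split, 1, half);

    vDSP_fft_zrip(setup, &split, 1, log2n, kFFTDirection_Forward);

    // Squared magnitude per frequency bin; these per-bin values (or their
    // cross-correlation) are what you would compare between two sounds.
    std::vector<float> mags(half, 0.0f);
    vDSP_zvmags(&split, 1, mags.data(), 1, half);

    vDSP_destroy_fftsetup(setup);
    return mags;
}
```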
See this DSP Stack Exchange answer for discussion on convolution and cross-correlation.
I have a large 3D matrix (let's call it L) and a very small 3D one (let's call it S), and I want to use OpenCV to find the closest match to the small pattern S inside L.
Does OpenCV do this for me? If yes, how should I use it?
Thanks.
What you need is the Point Cloud Library (PCL), an open-source library for working with 3D data. I can tell you from my experience that learning to use this library is very similar to learning OpenCV, because many of its developers work for Willow Garage, the main sponsor of OpenCV.
If you go to the PCL tutorials, you will find three useful sections for solving your problem:
1) Finding features in your 3D point cloud that you can later use for matching
2) 3D object recognition based on correspondence grouping
3) Point cloud registration using methods like iterative closest point and feature matching (see the sketch after this list)
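For a sense of step 3, here is a minimal sketch of PCL's iterative closest point; the cloud names are illustrative and the clouds are assumed to be loaded already:

```cpp
// Sketch: aligning a small pattern cloud against a larger scene cloud with
// PCL's IterativeClosestPoint and reporting how well it fits.
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

void matchPattern(pcl::PointCloud<pcl::PointXYZ>::Ptr pattern,   // the small cloud S
                  pcl::PointCloud<pcl::PointXYZ>::Ptr scene) {   // the large cloud L
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(pattern);
    icp.setInputTarget(scene);

    pcl::PointCloud<pcl::PointXYZ> aligned;
    icp.align(aligned);

    if (icp.hasConverged()) {
        // Lower fitness scores mean a closer match; the transformation tells
        // you where the pattern ended up inside the scene.
        double score = icp.getFitnessScore();
        Eigen::Matrix4f transform = icp.getFinalTransformation();
        // ... use score and transform ...
    }
}
```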
No, OpenCV doesn't have anything for this.
Do you have a sparse point cloud or just a 3-dimensional matrix?
For a 3-dimensional matrix you can use phase correlation via the FFT. A good library for this is FFTW.
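A rough sketch of 3D phase correlation with FFTW (single precision). It assumes both volumes have already been zero-padded to the same nx × ny × nz size, and all names are illustrative:

```cpp
// Sketch: 3D phase correlation with FFTW. The peak of the correlation volume
// gives the most likely offset of the small pattern inside the large volume.
#include <fftw3.h>
#include <cmath>
#include <complex>
#include <vector>

size_t phaseCorrelate3D(const std::vector<float> &volA,
                        const std::vector<float> &volB,
                        int nx, int ny, int nz) {
    const size_t nReal = (size_t)nx * ny * nz;
    const size_t nComplex = (size_t)nx * ny * (nz / 2 + 1);

    std::vector<float> a(volA), b(volB), corr(nReal);
    std::vector<std::complex<float>> A(nComplex), B(nComplex), R(nComplex);

    fftwf_plan fa = fftwf_plan_dft_r2c_3d(nx, ny, nz, a.data(),
        reinterpret_cast<fftwf_complex *>(A.data()), FFTW_ESTIMATE);
    fftwf_plan fb = fftwf_plan_dft_r2c_3d(nx, ny, nz, b.data(),
        reinterpret_cast<fftwf_complex *>(B.data()), FFTW_ESTIMATE);
    fftwf_execute(fa);
    fftwf_execute(fb);

    // Cross-power spectrum, normalized so only the phase information remains.
    for (size_t i = 0; i < nComplex; ++i) {
        const std::complex<float> c = A[i] * std::conj(B[i]);
        const float mag = std::abs(c);
        R[i] = mag > 1e-12f ? c / mag : std::complex<float>(0.0f, 0.0f);
    }

    fftwf_plan fr = fftwf_plan_dft_c2r_3d(nx, ny, nz,
        reinterpret_cast<fftwf_complex *>(R.data()), corr.data(), FFTW_ESTIMATE);
    fftwf_execute(fr);

    // The index of the maximum is the translation of B relative to A.
    size_t peak = 0;
    for (size_t i = 1; i < nReal; ++i)
        if (corr[i] > corr[peak]) peak = i;

    fftwf_destroy_plan(fa);
    fftwf_destroy_plan(fb);
    fftwf_destroy_plan(fr);
    return peak;   // decompose into (x, y, z) using ny and nz if needed
}
```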
OpenCV has added some neat tools to accomplish this kind of task:
Surface Matching https://docs.opencv.org/master/d9/d25/group__surface__matching.html
Silhouette based 3D tracking https://docs.opencv.org/master/d4/dc4/group__rapid.html
Convolutional Neural Network https://docs.opencv.org/master/d9/d02/group__cnn__3dobj.html
I'm trying to use the Accelerate framework on iOS to work around the fact that Core Image on iOS doesn't support custom filters/kernels. I'm developing an edge detection filter using two convolutions with a Sobel kernel, but starting with a simple Gaussian blur to get the hang of it. I know vImage is geared towards image manipulation as matrices and vDSP focuses on processing digital signals using Fourier transforms. But although I started using the vImage functions (vImageConvolve_XXXX, etc.), I'm hearing a lot of people discussing the use of vDSP's functions (vDSP_conv, vDSP_imgfir, etc.) to do such things as convolutions. So that leads me to the question at hand: when should I use one over the other? What are the differences between them with regard to convolution operations? I've looked everywhere but couldn't find a clear answer. Can someone shed some light on this, or point me in the right direction?
Thanks!
If vImage provides the operation you need, it is usually simplest to use that. vImage does cache blocking and threading for you; vDSP does not. vImage also provides operations on interleaved and integer formats, which are often useful for image processing.
Last time I experimented, neither of these frameworks took advantage of kernel separability, which affords a huge performance boost when convolving in the spatial domain -- a far larger performance boost than vectorized instructions will ever buy you. The Sobel kernel in particular is separable, so if you're using vDSP or vImage (instead of say OpenCV), be sure to separate the kernel yourself.
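As an illustration of the separability point, here is a sketch of applying the Sobel x-kernel as two 1D passes with vImageConvolve_PlanarF. The 3x3 kernel [-1 0 1; -2 0 2; -1 0 1] factors into a vertical smoothing [1 2 1]^T and a horizontal derivative [-1 0 1]; allocating and freeing the planar float buffers is assumed to be handled by the caller.

```cpp
// Sketch: separable Sobel x-gradient with two 1D vImage convolutions.
// src, tmp, and dst are assumed to be valid planar float buffers of equal size.
#include <Accelerate/Accelerate.h>

vImage_Error separableSobelX(const vImage_Buffer *src,
                             const vImage_Buffer *tmp,
                             const vImage_Buffer *dst) {
    const float smooth[3]     = { 1.0f, 2.0f, 1.0f };   // 3x1 column pass
    const float derivative[3] = { -1.0f, 0.0f, 1.0f };  // 1x3 row pass

    // Pass 1: smooth vertically (kernel height 3, width 1).
    vImage_Error err = vImageConvolve_PlanarF(src, tmp, NULL, 0, 0,
                                              smooth, 3, 1,
                                              0.0f, kvImageEdgeExtend);
    if (err != kvImageNoError) return err;

    // Pass 2: differentiate horizontally (kernel height 1, width 3).
    return vImageConvolve_PlanarF(tmp, dst, NULL, 0, 0,
                                  derivative, 1, 3,
                                  0.0f, kvImageEdgeExtend);
}
```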
I'm planning on doing the final year project of my degree on augmented reality. It will use markers, and there will also be interaction between virtual objects (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, or osgART for such a project, since they come with all the functions for tracking, detection, calibration, etc.? Will there be much work left from the programmer's point of view?
What do you think about using OpenCV and doing the marker detection, recognition, calibration, and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if you want it to return reliable results. The effort also depends on which kind of markers you plan to use. ARToolkit, for example, compares the marker content detected in the video stream to images you defined earlier as markers. It tries to match the images and returns a probability that a certain part of the video stream is a predefined marker. Depending on the threshold you use and the lighting situation, markers are not always recognized correctly. Then there are other markers, like DataMatrix, QR codes, and framemarkers (used by QCAR), that encode an id optically, so no image matching is required: all the necessary data can be retrieved from the video stream. Finally, there are more complex approaches like natural feature tracking, where you can use predefined images, provided they offer enough contrast and points of interest for the tracker to recognize them later.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
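For a sense of how little code the id-encoding markers mentioned above take when you lean on a library, here is a rough sketch using OpenCV's aruco contrib module (the classic interface; newer OpenCV releases wrap this in a detector class). The frame variable is assumed to be a camera image you already grabbed:

```cpp
// Sketch: detecting id-encoding markers with OpenCV's aruco contrib module.
#include <opencv2/aruco.hpp>
#include <opencv2/opencv.hpp>

void detectFrameMarkers(cv::Mat &frame) {
    cv::Ptr<cv::aruco::Dictionary> dictionary =
        cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);

    std::vector<int> ids;                           // decoded marker ids
    std::vector<std::vector<cv::Point2f>> corners;  // 4 corners per marker
    cv::aruco::detectMarkers(frame, dictionary, corners, ids);

    // Draw the detections for debugging. Estimating the pose of each marker
    // (needed to place virtual objects) additionally requires the camera
    // calibration matrix and distortion coefficients.
    if (!ids.empty())
        cv::aruco::drawDetectedMarkers(frame, corners, ids);
}
```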
I suggest you use OpenCV: you will find high-quality algorithms, and it is fast. New methods are continuously being developed, so soon it will be possible to run it in real time on mobile devices.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.
For my project, which is supposed to segment the closest hand region from the camera, I initially tried OpenCV's stereo vision example. However, the disparity map looks very bad and is useless for me.
Is there any other method that is better than the OpenCV implementation and has some sample output (image or video)? Because my time is limited, I must choose one good algorithm and implement it.
Thank you.
OpenCV implements a number of stereo block matching algorithms, some of them pretty cutting edge.
Disparity maps always look bad except in very simple circumstances - the first step is to try to improve the source images, the lighting, and the background.
If it were easy, then everybody would be doing it and there would be no market for expensive 3D laser scanners.
Try the different block matching algorithms provided by OpenCV. The little bit of experimentation I've done so far seems to indicate that cv::StereoSGBM gives better disparity maps than cv::StereoBM, but is slower.
The performance of the block matching algorithms will depend on what parameters they are initialized with. Have a look at the stereo examples again here; notice lines 195-222, where the algorithms are initialized.
I also suggest using a basic GUI (OpenCV's highgui, for example) to manipulate these parameters in real time while fine-tuning the algorithm.
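For reference, here is a sketch of initializing StereoSGBM with the modern (OpenCV 3/4) interface; the parameter values are only starting points of the kind set in the sample code, and are exactly the knobs worth exposing as highgui trackbars:

```cpp
// Sketch: computing a disparity map with StereoSGBM (OpenCV 3/4 API).
// leftGray and rightGray are assumed to be rectified grayscale images.
#include <opencv2/opencv.hpp>

cv::Mat computeDisparity(const cv::Mat &leftGray, const cv::Mat &rightGray) {
    const int blockSize = 5;
    const int numDisparities = 64;                 // must be divisible by 16
    const int channels = leftGray.channels();

    cv::Ptr<cv::StereoSGBM> sgbm = cv::StereoSGBM::create(
        /*minDisparity*/      0,
        /*numDisparities*/    numDisparities,
        /*blockSize*/         blockSize,
        /*P1*/                8 * channels * blockSize * blockSize,
        /*P2*/                32 * channels * blockSize * blockSize,
        /*disp12MaxDiff*/     1,
        /*preFilterCap*/      63,
        /*uniquenessRatio*/   10,
        /*speckleWindowSize*/ 100,
        /*speckleRange*/      32);

    cv::Mat disp16S, disp8U;
    sgbm->compute(leftGray, rightGray, disp16S);   // fixed-point, scaled by 16
    disp16S.convertTo(disp8U, CV_8U, 255.0 / (16.0 * numDisparities));
    return disp8U;                                 // easy to visualize while tuning
}
```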