Hi I am trying to do nearest neighbor queries on integer data.
It seems that cv::flann does not support this. Is this true?
Yes, it is possible to use FLANN nearest neighbor searches on integer data. You need to use a distance measure for integers. Some distance measures are templates, parameterized on data type (as in the example below), others have hard coded types (e.g. HammingLUT has unsigned char element type and int result (distance) type). You can also implement your own distance measure, see <opencv2/flann/dist.h> for details.
Example - a quote from the code that uses unsigned char data:
// we use euclidean distances on unsigned chars:
typedef cv::flann::L2<unsigned char> Distance_U8;
cv::flann::GenericIndex< Distance_U8 > * m_flann;
// ...
// we have 3d features
cv::Mat features( features_count, 3, CV_8UC1 );
// ... fill the features matrix ...
// ... build the index ...
m_flann = new cv::flann::GenericIndex< Distance_U8 > (features, params);
// ...
// how many neighbours per query?
int knn = 5;
// search params - see documentation
cvflann::SearchParams params;
// prepare the matrices
// query data - unsigned chars, 3d (like features)
cv::Mat input_1( n_pixels, 3, CV_8UC1 ),
// indices into features array - integers
indices_1( n_pixels, knn, CV_32S ),
// distances - floats (even with integer data distances are floats)
dists_1( n_pixels, knn, CV_32F );
// pass knn so that every column of indices_1 / dists_1 gets filled
m_flann->knnSearch( input_1, indices_1, dists_1, knn, params);
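To sanity-check the index on a small data set, a brute-force search over the same unsigned char features can serve as a reference. This is only a sketch in plain C++ without OpenCV: bruteForceKnn is a hypothetical helper, the features are assumed to be stored row-major, and squared Euclidean distance is an assumption about what you want to compare against.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of OpenCV): brute-force k-nearest-neighbour
// search over row-major unsigned char features with `dim` values per row.
// Returns (squared Euclidean distance, feature index) pairs, nearest first.
std::vector<std::pair<float, int>> bruteForceKnn(
    const std::vector<unsigned char>& features,
    const std::vector<unsigned char>& query,
    std::size_t dim, std::size_t knn)
{
    assert(dim > 0 && query.size() == dim && features.size() % dim == 0);
    const std::size_t count = features.size() / dim;

    std::vector<std::pair<float, int>> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        float dist = 0.0f;
        for (std::size_t d = 0; d < dim; ++d) {
            // Widen to float before subtracting so the difference cannot wrap.
            const float diff = static_cast<float>(features[i * dim + d])
                             - static_cast<float>(query[d]);
            dist += diff * diff;
        }
        result.emplace_back(dist, static_cast<int>(i));
    }

    // Only the first `keep` entries need to be in sorted order.
    const std::size_t keep = std::min(knn, count);
    std::partial_sort(result.begin(), result.begin() + keep, result.end());
    result.resize(keep);
    return result;
}
```

The distances come out as floats even though the data is integer, which matches the CV_32F distance matrix used above.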
No, FLANN is for float descriptors only. Although this is poorly documented, the OpenCV matchers and descriptors must be used carefully.
There is a bug report on the ROS Trac that explains this in more detail, but basically each descriptor and matcher only handles certain types of data, and this must be respected. I've included an extract from that page here for reference:
Descriptors:
float descriptors: SIFT, SURF
uchar descriptors: ORB, BRIEF
Matchers:
for float descriptors: FlannBased, BruteForce, BruteForce-L1
for uchar descriptors: BruteForce-Hamming, BruteForce-HammingLUT
Hi, I'm using the program Visual Structure From Motion to recover the structure of a 3D place. However, I've already computed my features and descriptors, so I want to use them in Visual Structure From Motion. I've read that the file containing the descriptor information should have the following layout:
[Header][Location Data][Descriptor Data][EOF]
[Header] = int[5] = {name, version, npoint, 5, 128};
name = ('S'+ ('I'<<8)+('F'<<16)+('T'<<24));
version = ('V'+('4'<<8)+('.'<<16)+('0'<<24)); or ('V'+('5'<<8)+('.'<<16)+('0'<<24)) if containing color info
npoint = number of features.
[Location Data] is a npoint x 5 float matrix and each row is [x, y, color, scale, orientation].
Write color by casting the float to unsigned char[4]
scale & orientation are only used for visualization, so you can simply write 0 for them
Sort features in the order of decreasing importance, since VisualSFM may use only part of those features.
VisualSFM sorts the features in the order of decreasing scales.
[Descriptor Data] is a npoint x 128 unsigned char matrix. Note the feature descriptors are normalized to 512.
[EOF] int eof_marker = (0xff+('E'<<8)+('O'<<16)+('F'<<24));
Could someone write a concrete example of this file? The file should be generated automatically by my application.
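A minimal sketch of a writer for the layout described above, in plain C++. The Feature struct and writeSiftFile are hypothetical names. The sketch assumes the V4.0 variant without colour info, that the features are already sorted by decreasing importance, and that the integers and floats are written in the machine's native byte order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical in-memory representation of one feature.
struct Feature {
    std::array<float, 5> location;             // x, y, color, scale, orientation
    std::array<std::uint8_t, 128> descriptor;  // 128 unsigned char values
};

// Writes [Header][Location Data][Descriptor Data][EOF] to `path`.
// Returns false if the file could not be opened or written.
bool writeSiftFile(const std::string& path, const std::vector<Feature>& features)
{
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;

    const std::int32_t name    = 'S' + ('I' << 8) + ('F' << 16) + ('T' << 24);
    const std::int32_t version = 'V' + ('4' << 8) + ('.' << 16) + ('0' << 24);
    const std::int32_t header[5] = {
        name, version, static_cast<std::int32_t>(features.size()), 5, 128};
    out.write(reinterpret_cast<const char*>(header), sizeof(header));

    // Location block: npoint x 5 floats.
    for (const Feature& f : features)
        out.write(reinterpret_cast<const char*>(f.location.data()),
                  5 * sizeof(float));

    // Descriptor block: npoint x 128 unsigned chars.
    for (const Feature& f : features)
        out.write(reinterpret_cast<const char*>(f.descriptor.data()), 128);

    const std::int32_t eofMarker = 0xff + ('E' << 8) + ('O' << 16) + ('F' << 24);
    out.write(reinterpret_cast<const char*>(&eofMarker), sizeof(eofMarker));
    return out.good();
}
```

For n features the resulting file is 20 + 20n + 128n + 4 bytes long, which is a quick way to check the output.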
I was wondering what this line is used for in the Lucas-Kanade tracker in OpenCV:
DataType<cv::detail::deriv_type>::depth
Can someone explain it to me?
In OpenCV, the depth of a Mat refers to the type of data contained in the Mat's data buffer. They are represented by integer values which correspond to a given data type. These integers are most commonly abstracted by an appropriate macro definition (e.g. uchar data is represented by the macro CV_8U).
cv::DataType is a type-traits class that provides a method to obtain the corresponding integer value without having to memorize which macro means which data type. There are very few cases where user code needs to use DataType::depth. Much more common is DataType::type.
A simple example shows one possible use of DataType::depth:
cv::Mat uchar_data = cv::Mat::ones(3, 3, CV_8UC1);
cv::Mat float_data;
uchar_data.convertTo(float_data, cv::DataType<float>::depth);
// ^^ This could equivalently be replaced
// by CV_32F macro
float_data.at<float>(0,1) += 0.5f;
std::cout << float_data << std::endl;
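To illustrate the type-traits idea itself, here is a simplified stand-in in plain C++. It is not OpenCV's implementation: DepthOf and depthCode are hypothetical names, and only the numeric codes (0 to 6, matching CV_8U to CV_64F) are taken from OpenCV.

```cpp
// Simplified stand-in for the idea behind cv::DataType (hypothetical names).
// The primary template is left undefined, so an unsupported type fails to
// compile instead of silently returning a wrong code.
template <typename T> struct DepthOf;

template <> struct DepthOf<unsigned char>  { static constexpr int value = 0; };  // CV_8U
template <> struct DepthOf<signed char>    { static constexpr int value = 1; };  // CV_8S
template <> struct DepthOf<unsigned short> { static constexpr int value = 2; };  // CV_16U
template <> struct DepthOf<short>          { static constexpr int value = 3; };  // CV_16S
template <> struct DepthOf<int>            { static constexpr int value = 4; };  // CV_32S
template <> struct DepthOf<float>          { static constexpr int value = 5; };  // CV_32F
template <> struct DepthOf<double>         { static constexpr int value = 6; };  // CV_64F

// Generic code can ask for the depth of whatever element type it was given.
template <typename T>
constexpr int depthCode() { return DepthOf<T>::value; }
```

Templated code can then call depthCode<T>() without knowing which macro belongs to T, which is the role DataType<cv::detail::deriv_type>::depth plays in the tracker.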
I'm looking for a way to get the treble and bass data from a song for some incrementation of time (say 0.1 seconds) and in the range of 0.0 to 1.0. I've googled around but haven't been able to find anything remotely close to what I'm looking for. Ultimately I want to be able to represent the treble and bass level while the song is playing.
Thanks!
It's reasonably easy. You need to perform an FFT and then sum up the bins that interest you. How you select the bins depends largely on the sampling rate of your audio.
You then need to choose an appropriate FFT order to get good information in the frequency bins returned.
So if you do an order 8 FFT you will need 256 samples. This will return you 128 complex pairs.
Next you need to convert these to magnitudes. This is actually quite simple. If you are using std::complex you can simply call std::abs on the complex number and you will have its magnitude (sqrt( r^2 + i^2 )).
Interestingly, at this point there is something called Parseval's theorem. It states that after performing a Fourier transform, the sum of the squared magnitudes of the returned bins is equal (up to a constant scale factor that depends on the FFT's normalisation) to the sum of the squares of the input signal.
This means that to get the amplitude of a specific set of bins you can simply add their squared magnitudes together, divide by the number of bins and then take the square root to get the RMS amplitude value of those bins.
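The relation can be checked numerically. The sketch below uses a naive O(N^2) DFT purely to stay free of any FFT library, and all function names are hypothetical. For an unnormalised N-point DFT the sum of |X[k]|^2 equals N times the sum of x[n]^2, so the RMS of the signal can be recovered from the bins alone.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive, unnormalised DFT. Far too slow for real audio, fine for a check.
std::vector<std::complex<double>> naiveDft(const std::vector<double>& x)
{
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> bins(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * double(k) * double(t) / double(n);
            bins[k] += x[t] * std::polar(1.0, angle);
        }
    }
    return bins;
}

// RMS computed directly from the samples.
double rmsFromSamples(const std::vector<double>& x)
{
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x) sum += v * v;
    return std::sqrt(sum / double(x.size()));
}

// RMS computed from all N bins of an unnormalised DFT:
// sum |X[k]|^2 = N * sum x[n]^2, hence the division by N^2.
double rmsFromBins(const std::vector<std::complex<double>>& bins)
{
    assert(!bins.empty());
    double sum = 0.0;
    for (const std::complex<double>& b : bins) sum += std::norm(b);  // |b|^2
    const double n = double(bins.size());
    return std::sqrt(sum / (n * n));
}
```

Note that std::norm returns the squared magnitude, so no extra squaring of std::abs is needed. The scale factor differs if your FFT routine normalises its output.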
So where does this leave you?
Well from here you need to figure out which bins you are adding together.
A treble tone is defined as above 2000Hz.
A bass tone is below 300Hz (if my memory serves me correctly).
Mids are between 300Hz and 2kHz.
Now suppose your sample rate is 8kHz. The Nyquist rate says that the highest frequency you can represent in 8kHz sampling is 4kHz. Each bin thus represents 4000/128 or 31.25Hz.
So the first 10 bins (bins 0 to 9, up to 312.5Hz) are used for the bass frequencies, bins 10 to 63 represent the mids, and bins 64 to 127 are the trebles.
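The same bin arithmetic as a small sketch, so it can be reused for other sample rates and FFT sizes. The function names are hypothetical, and the code follows the simplification used above that bin k covers the range from k times the bin width up to the next bin.

```cpp
#include <cassert>
#include <cstddef>

// Width of one bin in Hz for an FFT over `fftSize` real samples.
double binWidthHz(double sampleRate, std::size_t fftSize)
{
    assert(fftSize > 0);
    return sampleRate / static_cast<double>(fftSize);
}

// Index of the bin whose range [k * width, (k + 1) * width) contains `hz`.
// Only frequencies below the Nyquist limit (sampleRate / 2) are valid.
std::size_t binForFrequency(double hz, double sampleRate, std::size_t fftSize)
{
    assert(hz >= 0.0 && hz < sampleRate / 2.0);
    return static_cast<std::size_t>(hz / binWidthHz(sampleRate, fftSize));
}
```

With an 8kHz sample rate and 256 samples, binForFrequency(300.0, 8000.0, 256) gives 9 and binForFrequency(2000.0, 8000.0, 256) gives 64, matching the split described above.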
You can then calculate the RMS value of each of the three bands as described above.
RMS values can be converted to dBFS values by performing 20.0f * log10f( rmsVal );. This will return you a value from 0dB (max amplitude) down to -infinity dB (min amplitude). Be aware amplitudes do not range from -1 to 1.
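Putting the last steps together, a sketch of the band RMS and the dBFS conversion, plus one possible way to get the 0.0 to 1.0 level asked for. The function names and the -60dB display floor are assumptions for illustration, and the magnitudes are assumed to be scaled already so that full scale corresponds to 1.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS over magnitudes[first, last): square, average, square root.
float bandRms(const std::vector<float>& magnitudes,
              std::size_t first, std::size_t last)
{
    assert(first < last && last <= magnitudes.size());
    double sum = 0.0;
    for (std::size_t i = first; i < last; ++i)
        sum += double(magnitudes[i]) * double(magnitudes[i]);
    return static_cast<float>(std::sqrt(sum / double(last - first)));
}

// 0 dB for an RMS of 1.0, minus infinity for an RMS of 0.
float toDbfs(float rms)
{
    return 20.0f * std::log10(rms);
}

// Hypothetical display mapping: clamp to [floorDb, 0] and rescale to [0, 1].
float dbfsToLevel(float dbfs, float floorDb = -60.0f)
{
    const float clamped = std::max(floorDb, std::min(0.0f, dbfs));
    return 1.0f - clamped / floorDb;
}
```

For the 8kHz example the bass level would be dbfsToLevel(toDbfs(bandRms(mags, 0, 10))) and the treble level dbfsToLevel(toDbfs(bandRms(mags, 64, 128))).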
To help you along, here is a bit of my C++ based FFT class for iPhone (which uses vDSP under the hood):
MacOSFFT::MacOSFFT( unsigned int fftOrder ) :
BaseFFT( fftOrder )
{
mFFTSetup = (void*)vDSP_create_fftsetup( mFFTOrder, 0 );
mImagBuffer.resize( 1 << mFFTOrder );
mRealBufferOut.resize( 1 << mFFTOrder );
mImagBufferOut.resize( 1 << mFFTOrder );
}
MacOSFFT::~MacOSFFT()
{
vDSP_destroy_fftsetup( (FFTSetup)mFFTSetup );
}
bool MacOSFFT::ForwardFFT( std::vector< std::complex< float > >& outVec, const std::vector< float >& inVec )
{
return ForwardFFT( &outVec.front(), &inVec.front(), inVec.size() );
}
bool MacOSFFT::ForwardFFT( std::complex< float >* pOut, const float* pIn, unsigned int num )
{
// Bring in a pre-allocated imaginary buffer that is initialised to 0.
DSPSplitComplex dspscIn;
dspscIn.realp = (float*)pIn;
dspscIn.imagp = &mImagBuffer.front();
DSPSplitComplex dspscOut;
dspscOut.realp = &mRealBufferOut.front();
dspscOut.imagp = &mImagBufferOut.front();
vDSP_fft_zop( (FFTSetup)mFFTSetup, &dspscIn, 1, &dspscOut, 1, mFFTOrder, kFFTDirection_Forward );
vDSP_ztoc( &dspscOut, 1, (DSPComplex*)pOut, 1, num );
return true;
}
It seems that you're looking for Fast Fourier Transform sample code.
It is quite a large topic to cover in an answer.
The tools you will need are already built into iOS: vDSP API
This should help you: vDSP Programming Guide
And there is also a FFT Sample Code available
You might also want to check out iPhoneFFT. Though that code is slightly outdated, it can help you understand the processes "under the hood".
Refer to auriotouch2 example from Apple - it has everything from frequency analysis to UI representation of what you want.
Can anyone help me get the absolute value of a complex matrix? The matrix contains the real values in one channel and the imaginary values in another channel. Please help me.
If possible, please give me an example.
Thanks in advance
Arangarajan
Let's assume you have 2 components: X and Y, two matrices of the same size and type. In your case it can be real/im values.
// n rows, m cols, type float; we assume the following matrices are filled
cv::Mat X(n,m,CV_32F);
cv::Mat Y(n,m,CV_32F);
You can compute the absolute value of each complex number like this:
// create a new matrix for storage
cv::Mat A(n,m,CV_32F,cv::Scalar(0.0));
for(int i=0;i<n;i++){
// pointer to row(i) values
const float* rowi_x = X.ptr<float>(i);
const float* rowi_y = Y.ptr<float>(i);
float* rowi_a = A.ptr<float>(i);
for(int j=0;j<m;j++){ // j < m: valid column indices are 0..m-1
rowi_a[j] = sqrt(rowi_x[j]*rowi_x[j]+rowi_y[j]*rowi_y[j]);
}
}
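The per-element computation can also be sketched without OpenCV, on two plain arrays holding the real and imaginary parts. complexMagnitude is a hypothetical helper name. std::hypot is used instead of a hand-written sqrt(x*x+y*y) because it avoids overflow in the intermediate squares. OpenCV also ships cv::magnitude(x, y, magnitude), which performs this operation on whole matrices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise magnitude of a complex array stored as
// two separate (planar) arrays of equal length.
std::vector<float> complexMagnitude(const std::vector<float>& re,
                                    const std::vector<float>& im)
{
    assert(re.size() == im.size());
    std::vector<float> mag(re.size());
    for (std::size_t i = 0; i < re.size(); ++i)
        mag[i] = std::hypot(re[i], im[i]);
    return mag;
}
```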
If you look in the OpenCV phasecorr.cpp module, there's a function called magSpectrums that does this already and will handle conjugate symmetry-packed DFT results too. I don't think it's exposed by the header file, but it's easy enough to copy it. If you care about speed, make sure you compile with any available SIMD options turned on too because they can make a big difference with this calculation.
I'm a new user to OpenCV. I'm using version 2.3.2 (from the SVN repository).
I have a specific 3-dimensional cv::Mat structure which is 288 x 384 x 10. This represents a 288 x 384 image, and each of the 10 channels represents a disparity value. I want to find the minimum element and its location. There is a minMaxElem function in OpenCV, but it doesn't work with multi-dimensional arrays. Any idea how I can use the channel splitting functions in OpenCV to perform this?
You can use the minMaxIdx function to find the minimum/maximum of a multidimensional array:
void minMaxIdx(InputArray src, double* minVal, double* maxVal,
int* minIdx=0, int* maxIdx=0, InputArray mask=noArray());
If minIdx and maxIdx are non-null, they must point to arrays long enough to store one index per dimension (3 for a 3-dimensional Mat).
minVal and maxVal are used to return single minimum/maximum value. They can be 0 if you don't need the values.
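To make clear what such an index array holds, here is a plain C++ sketch of the same search on a flat buffer, without OpenCV. MinResult and minWithIndex3d are hypothetical names, and the buffer is assumed to be stored row-major with the 10 disparity values of each pixel as the innermost dimension.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct MinResult {
    float value;               // the minimum element
    std::array<int, 3> index;  // {row, col, plane} of that element
};

// Finds the minimum of a rows x cols x planes array stored in one flat
// buffer, where element (r, c, p) lives at (r * cols + c) * planes + p.
MinResult minWithIndex3d(const std::vector<float>& data,
                         int rows, int cols, int planes)
{
    assert(rows > 0 && cols > 0 && planes > 0);
    assert(data.size() == std::size_t(rows) * std::size_t(cols) * std::size_t(planes));

    const auto it = std::min_element(data.begin(), data.end());
    const int flat = static_cast<int>(std::distance(data.begin(), it));

    // Undo the flattening to recover one index per dimension.
    return MinResult{*it, {{flat / (cols * planes),
                            (flat / planes) % cols,
                            flat % planes}}};
}
```

For the 288 x 384 x 10 case the third index is the disparity slot with the smallest value and the first two are the pixel position.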