OpenCV: How do I find the minimum element along a specific dimension?

I'm a new user to OpenCV. I'm using version 2.3.2 (from the SVN repository).
I have a 3-dimensional cv::Mat structure which is 288 x 384 x 10. This represents a 288 x 384 image, and the 10 entries along the third dimension hold disparity values. I want to find the minimum element and its location. There is a minMaxElem function in OpenCV, but it doesn't work with multi-dimensional arrays. Any idea how I can use the channel-splitting functions in OpenCV to do this?

You can use the minMaxIdx function to find the minimum/maximum of a multidimensional array:
void minMaxIdx(InputArray src, double* minVal, double* maxVal,
               int* minIdx=0, int* maxIdx=0, InputArray mask=noArray());
Non-zero minIdx and maxIdx must point to arrays long enough to store an index for each dimension (3 for a 3-dimensional Mat).
minVal and maxVal return the single minimum/maximum value; they can be 0 if you don't need the values.
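A minimal sketch of how this might look for the 288 x 384 x 10 matrix from the question (the planted value and its location are illustrative):

#include <opencv2/core/core.hpp>
#include <cstdio>

int main()
{
    // A 288 x 384 x 10 single-channel 3-D matrix, as in the question.
    int dims[3] = { 288, 384, 10 };
    cv::Mat disparity( 3, dims, CV_32F, cv::Scalar(0) );
    disparity.at<float>( 10, 20, 3 ) = -5.0f;  // plant a known minimum

    double minVal;
    int minIdx[3];  // one index per dimension
    cv::minMaxIdx( disparity, &minVal, 0, minIdx, 0 );

    // Prints: min -5.000000 at (10, 20, 3)
    std::printf( "min %f at (%d, %d, %d)\n",
                 minVal, minIdx[0], minIdx[1], minIdx[2] );
    return 0;
}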

Related

How are floating-point pixel values converted to integer values?

How do image libraries (such as PIL, OpenCV, etc.) convert floating-point values to integer pixel values?
For example:
import numpy as np
from PIL import Image

# Creates a random image and saves it in a file
def get_random_img(m=0, s=1, fname='temp.png'):
    im = m + s * np.random.randn(60, 60, 3)  # e.g. min: -3.8947058634971179, max: 3.6822041760496904
    print(im[0, 0])  # e.g. array([ 0.36234732, 0.96987366, 0.08343])
    imp = Image.fromarray(im, 'RGB')  # (*)
    print(np.array(imp)[0, 0])  # [140, 74, 217]
    imp.save(fname)
    return im, imp
For the above method, an example output is provided in the comments (randomly produced). My question is: how does (*) convert an ndarray (whose values can range from minus infinity to plus infinity) to pixel values between 0 and 255?
I tried to investigate the PIL.Image.fromarray method and eventually ended up at line #798, d.decode(data), within the PIL.Image.Image().frombytes method. I could not find the implementation of the decode method, and was thus unable to learn what computation goes on behind the conversion.
My initial thought was that maybe the method uses the minimum (mapped to 0) and maximum (mapped to 255) values from the array and maps all the other values proportionally in between. But upon investigation, I found that's not what happens. Moreover, how does it handle arrays whose values range between 0 and 1, or any other range?
Some libraries assume that floating-point pixel values are between 0 and 1, and will linearly map that range to 0 and 255 when casting to 8-bit unsigned integer. Some others will find the minimum and maximum values and map those to 0 and 255. You should always explicitly do this conversion if you want to be sure of what happened to your data.
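Sticking with OpenCV for illustration, here is a minimal sketch of doing the conversion explicitly, covering both conventions described above:

#include <opencv2/core/core.hpp>

int main()
{
    // A float image with an arbitrary value range.
    cv::Mat im32f( 60, 60, CV_32FC3 );
    cv::randn( im32f, cv::Scalar::all(0), cv::Scalar::all(1) );

    // Convention 1: assume values already lie in [0, 1] and scale by 255.
    cv::Mat u8_scaled;
    im32f.convertTo( u8_scaled, CV_8U, 255.0 );

    // Convention 2: map the data's actual min/max onto [0, 255].
    cv::Mat u8_minmax;
    cv::normalize( im32f, u8_minmax, 0, 255, cv::NORM_MINMAX, CV_8U );
    return 0;
}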
In general, a pixel does not need to be 8-bit unsigned integer. A pixel can have any numerical type. Usually a pixel intensity represents an amount of light, or a density of some sort, but this is not always the case. Any physical quantity can be sampled in 2 or more dimensions. The range of meaningful values thus depends on what is imaged. Negative values are often also meaningful.
Many cameras have 8-bit precision when converting light intensity to a digital number. Likewise, displays typically have an 8-bit intensity range. This is the reason many image file formats store only 8-bit unsigned integer data. However, some cameras have 12 bits or more, and some processes derive pixel data with a higher precision that one does not want to quantize. Therefore formats such as TIFF and ICS will let you save images in just about any numeric format you can think of.
I'm afraid it has done nothing anywhere near as clever as you hoped! It has merely interpreted the first byte of the first float as a uint8, then the second byte as another uint8...
import numpy as np
from PIL import Image
# Generate repeatable random data, so other folks get the same results
np.random.seed(42)
# Make a single RGB pixel
im = np.random.randn(1, 1, 3)
# Print the floating point values - not that we are interested in them
print(im)
# OUTPUT: [[[ 0.49671415 -0.1382643 0.64768854]]]
# Save that pixel to a file so we can dump it
im.tofile('array.bin')
# Now make a PIL Image from it and print the uint8 RGB values
imp = Image.fromarray(im, 'RGB')
print(imp.getpixel((0,0)))
# OUTPUT: (124, 48, 169)
So, PIL has interpreted our data as RGB=124/48/169
Now look at the hex we dumped. It is 24 bytes long, i.e. 3 float64 (8-byte) values, one for red, one for green and one for blue for the 1 pixel in our image:
xxd array.bin
Output
00000000: 7c30 a928 2aca df3f 2a05 de05 a5b2 c1bf |0.(*..?*.......
00000010: 685e 2450 ddb9 e43f h^$P...?
And the first byte (7c) has become 124, the second byte (30) has become 48 and the third byte (a9) has become 169.
TLDR; PIL has merely taken the first byte of the first float as the Red uint8 channel of the first pixel, then the second byte of the first float as the Green uint8 channel of the first pixel and the third byte of the first float as the Blue uint8 channel of the first pixel.
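A small C++ sketch of the same reinterpretation, using the dumped bytes above (this assumes a little-endian machine, as in the dump):

#include <cstdio>
#include <cstring>

int main()
{
    // The 8 bytes dumped above for the first float64 (little-endian).
    unsigned char bytes[8] = { 0x7c, 0x30, 0xa9, 0x28, 0x2a, 0xca, 0xdf, 0x3f };

    double red;
    std::memcpy( &red, bytes, 8 );
    std::printf( "%.8f\n", red );  // 0.49671415 - the first float in the array

    // PIL read the first three bytes as the R, G, B uint8 channels.
    std::printf( "%u %u %u\n", bytes[0], bytes[1], bytes[2] );  // 124 48 169
    return 0;
}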

Using own descriptor and feature in Visual Structure From Motion

Hi, I'm using the program Visual Structure From Motion (VisualSFM) to recover the structure of a 3-D scene. However, I've already computed my own descriptors and features, so I want to use them in VisualSFM. I've read that the file containing the descriptor information should have the following pattern:
[Header][Location Data][Descriptor Data][EOF]
[Header] = int[5] = {name, version, npoint, 5, 128};
name = ('S'+ ('I'<<8)+('F'<<16)+('T'<<24));
version = ('V'+('4'<<8)+('.'<<16)+('0'<<24)); or ('V'+('5'<<8)+('.'<<16)+('0'<<24)) if containing color info
npoint = number of features.
[Location Data] is a npoint x 5 float matrix and each row is [x, y, color, scale, orientation].
Write color by casting the float to unsigned char[4]
scale & orientation are only used for visualization, so you can simply write 0 for them
Sort features in the order of decreasing importance, since VisualSFM may use only part of those features.
VisualSFM sorts the features in the order of decreasing scales.
[Descriptor Data] is a npoint x 128 unsigned char matrix. Note the feature descriptors are normalized to 512.
[EOF] int eof_marker = (0xff+('E'<<8)+('O'<<16)+('F'<<24));
Has anyone written a concrete example that produces this file? It needs to be generated automatically by my application.
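No answer is recorded here, but a minimal sketch of a writer that follows the layout above might look like this (the Feature struct and its fields are illustrative assumptions):

#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical per-feature record, for illustration only.
struct Feature {
    float x, y;
    unsigned char color[4];         // written as one float via a byte cast
    unsigned char descriptor[128];  // already normalized to 512
};

bool writeSiftFile( const char* path, const std::vector<Feature>& feats )
{
    FILE* f = std::fopen( path, "wb" );
    if( !f ) return false;

    // [Header] = int[5] = {name, version, npoint, 5, 128}
    int name    = 'S' + ('I' << 8) + ('F' << 16) + ('T' << 24);
    int version = 'V' + ('5' << 8) + ('.' << 16) + ('0' << 24);  // V5.0: with color
    int npoint  = (int)feats.size();
    int header[5] = { name, version, npoint, 5, 128 };
    std::fwrite( header, sizeof(int), 5, f );

    // [Location Data]: npoint x 5 floats per row: x, y, color, scale, orientation.
    for( size_t i = 0; i < feats.size(); i++ ) {
        float row[5] = { feats[i].x, feats[i].y, 0.0f, 0.0f, 0.0f };
        std::memcpy( &row[2], feats[i].color, 4 );  // color: uchar[4] cast into a float
        std::fwrite( row, sizeof(float), 5, f );    // scale/orientation left as 0
    }

    // [Descriptor Data]: npoint x 128 unsigned chars.
    for( size_t i = 0; i < feats.size(); i++ )
        std::fwrite( feats[i].descriptor, 1, 128, f );

    // [EOF]
    int eof_marker = 0xff + ('E' << 8) + ('O' << 16) + ('F' << 24);
    std::fwrite( &eof_marker, sizeof(int), 1, f );
    std::fclose( f );
    return true;
}

Remember to sort the features by decreasing importance before writing, as noted above, since VisualSFM may use only part of them.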

Extract Treble and Bass from audio in iOS

I'm looking for a way to get the treble and bass data from a song at some time increment (say every 0.1 seconds), in the range of 0.0 to 1.0. I've googled around but haven't been able to find anything remotely close to what I'm looking for. Ultimately I want to be able to represent the treble and bass levels while the song is playing.
Thanks!
It's reasonably easy. You need to perform an FFT and then sum up the bins that interest you. A lot of how you select them will depend on the sampling rate of your audio.
You then need to choose an appropriate FFT order to get good information in the frequency bins returned.
So if you do an order 8 FFT you will need 256 samples. This will return you 128 complex pairs.
Next you need to convert these to magnitudes. This is actually quite simple: if you are using std::complex, you can simply call std::abs on the complex number and you will have its magnitude (sqrt( r^2 + i^2 )).
Interestingly, at this point there is something called Parseval's theorem. This theorem states that after performing a Fourier transform, the sum of the bins returned is equal to the sum of the mean squares of the input signal.
This means that to get the amplitude of a specific set of bins, you can simply add them together, divide by the number of them, and then take the square root to get the RMS amplitude of those bins.
So where does this leave you?
Well from here you need to figure out which bins you are adding together.
A treble tone is defined as above 2000Hz.
A bass tone is below 300Hz (if my memory serves me correctly).
Mids are between 300Hz and 2kHz.
Now suppose your sample rate is 8kHz. The Nyquist rate says that the highest frequency you can represent in 8kHz sampling is 4kHz. Each bin thus represents 4000/128 or 31.25Hz.
So the first 10 bins (up to 312.5 Hz) are used for the bass frequencies, bins 10 to 63 represent the mids, and finally bins 64 to 127 are the trebles.
You can then calculate the RMS value as described above and you have the RMS values.
RMS values can be converted to dBFS values by computing 20.0f * log10f( rmsVal );. This will return a value from 0 dB (maximum amplitude) down to -infinity dB (minimum amplitude). Be aware this assumes amplitude values lie in the range -1 to 1.
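A minimal sketch of the bin-summing just described, using squared magnitudes so the mean-then-sqrt really is an RMS (the function names are illustrative, and vDSP's FFTs apply their own scaling, so a normalization factor may be needed in practice):

#include <cmath>
#include <complex>
#include <vector>

// RMS amplitude of an inclusive range of FFT bins.
float bandRMS( const std::vector< std::complex<float> >& bins,
               size_t first, size_t last )
{
    float sumSquares = 0.0f;
    for( size_t i = first; i <= last; i++ ) {
        float mag = std::abs( bins[i] );  // sqrt( r^2 + i^2 )
        sumSquares += mag * mag;
    }
    return std::sqrt( sumSquares / float(last - first + 1) );
}

// Convert an RMS amplitude to dBFS (0 dB at full scale).
float toDBFS( float rms )
{
    return 20.0f * std::log10( rms );
}

With the 8 kHz example above, bandRMS( bins, 0, 9 ) would give the bass level and bandRMS( bins, 64, 127 ) the treble.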
To help you along, here is a bit of my C++ based FFT class for iPhone (which uses vDSP under the hood):
MacOSFFT::MacOSFFT( unsigned int fftOrder ) :
    BaseFFT( fftOrder )
{
    // 0 == kFFTRadix2: power-of-two FFT lengths.
    mFFTSetup = (void*)vDSP_create_fftsetup( mFFTOrder, 0 );
    mImagBuffer.resize( 1 << mFFTOrder );
    mRealBufferOut.resize( 1 << mFFTOrder );
    mImagBufferOut.resize( 1 << mFFTOrder );
}

MacOSFFT::~MacOSFFT()
{
    vDSP_destroy_fftsetup( (FFTSetup)mFFTSetup );
}

bool MacOSFFT::ForwardFFT( std::vector< std::complex< float > >& outVec, const std::vector< float >& inVec )
{
    return ForwardFFT( &outVec.front(), &inVec.front(), inVec.size() );
}

bool MacOSFFT::ForwardFFT( std::complex< float >* pOut, const float* pIn, unsigned int num )
{
    // Bring in a pre-allocated imaginary buffer that is initialised to 0.
    DSPSplitComplex dspscIn;
    dspscIn.realp = (float*)pIn;
    dspscIn.imagp = &mImagBuffer.front();

    DSPSplitComplex dspscOut;
    dspscOut.realp = &mRealBufferOut.front();
    dspscOut.imagp = &mImagBufferOut.front();

    // Out-of-place complex FFT on the split-complex buffers.
    vDSP_fft_zop( (FFTSetup)mFFTSetup, &dspscIn, 1, &dspscOut, 1, mFFTOrder, kFFTDirection_Forward );

    // Repack the split-complex result into interleaved std::complex output.
    // (The interleaved stride is counted in floats, so it must be 2.)
    vDSP_ztoc( &dspscOut, 1, (DSPComplex*)pOut, 2, num );
    return true;
}
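A hypothetical use of the class above (assuming BaseFFT just stores the order, and the includes from the class are in scope) might be:

// Order-8 FFT: 256 input samples, 256 complex outputs from vDSP_fft_zop.
MacOSFFT fft( 8 );
std::vector< float > samples( 256, 0.0f );            // fill with audio here
std::vector< std::complex< float > > spectrum( 256 );
fft.ForwardFFT( spectrum, samples );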
It seems that you're looking for Fast Fourier Transform sample code.
It is quite a large topic to cover in an answer.
The tools you will need are already built into iOS: the vDSP API.
This should help you: the vDSP Programming Guide.
And there is also FFT sample code available.
You might also want to check out iPhoneFFT. Though that code is slightly outdated, it can help you understand the processes under the hood.
Refer to the aurioTouch2 example from Apple - it has everything from frequency analysis to UI representation of what you want.

How to calculate the absolute value of complex numbers in OpenCV

Can anyone help me with how to get the absolute value of a complex matrix? The matrix contains the real values in one channel and the imaginary values in another channel. If possible, please give me an example.
Thanks in advance,
Arangarajan
Let's assume you have 2 components: X and Y, two matrices of the same size and type. In your case these hold the real and imaginary values.
// n rows, m cols, type float; we assume the following matrices are filled
cv::Mat X(n,m,CV_32F);
cv::Mat Y(n,m,CV_32F);
You can compute the absolute value of each complex number like this:
// create a new matrix for storage
cv::Mat A(n,m,CV_32F,cv::Scalar(0.0));
for(int i=0;i<n;i++){
    // pointers to the values in row i
    const float* rowi_x = X.ptr<float>(i);
    const float* rowi_y = Y.ptr<float>(i);
    float* rowi_a = A.ptr<float>(i);
    for(int j=0;j<m;j++){ // note: j < m, not j <= m, to stay in bounds
        rowi_a[j] = sqrt(rowi_x[j]*rowi_x[j]+rowi_y[j]*rowi_y[j]);
    }
}
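For CV_32F inputs like these, OpenCV's cv::magnitude computes the same thing in a single call; a minimal sketch:

#include <opencv2/core/core.hpp>

// Per-element sqrt(x^2 + y^2) of the real/imaginary planes.
cv::Mat complexAbs( const cv::Mat& X, const cv::Mat& Y )
{
    cv::Mat A;
    cv::magnitude( X, Y, A );  // same result as the loop above
    return A;
}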
If you look in the OpenCV phasecorr.cpp module, there's a function called magSpectrums that does this already, and it will handle conjugate-symmetry-packed DFT results too. I don't think it's exposed in a header file, but it's easy enough to copy. If you care about speed, make sure you compile with any available SIMD options turned on, because they can make a big difference with this calculation.

does opencv flann library support integer data?

Hi, I am trying to do nearest-neighbor queries on integer data.
It seems that cv::flann does not support this. Is this true?
Yes, it is possible to use FLANN nearest-neighbor searches on integer data. You need to use a distance measure suited to integers. Some distance measures are templates parameterized on the data type (as in the example below); others have hard-coded types (e.g. HammingLUT has an unsigned char element type and an int result (distance) type). You can also implement your own distance measure; see <opencv2/flann/dist.h> for details.
Example - a quote from code that uses unsigned char data:
// we use euclidean distance on unsigned chars:
typedef cv::flann::L2<unsigned char> Distance_U8;
cv::flann::GenericIndex< Distance_U8 >* m_flann;
// ...
// we have 3-d features
cv::Mat features( features_count, 3, CV_8UC1 );
// ... fill the features matrix ...
// ... build the index (index params defined elsewhere) ...
m_flann = new cv::flann::GenericIndex< Distance_U8 >( features, params );
// ...
// how many neighbours per query?
int knn = 5;
// search params - see the documentation
cvflann::SearchParams search_params;
// prepare the matrices
// query data - unsigned chars, 3-d (like the features)
cv::Mat input_1( n_pixels, 3, CV_8UC1 ),
    // indices into the features array - integers
    indices_1( n_pixels, knn, CV_32S ),
    // distances - floats (even with integer data, distances are floats)
    dists_1( n_pixels, knn, CV_32F );
m_flann->knnSearch( input_1, indices_1, dists_1, knn, search_params );
No, FLANN is for float descriptors only. Although poorly documented, the OpenCV set of matchers and descriptors must be used carefully.
There is a bug report on the ROS trac explaining this in more detail, but basically descriptors and matchers only handle certain types of data, and this must be respected. I've included an extract from that page here for reference:
Descriptors:
float descriptors: SIFT, SURF
uchar descriptors: ORB, BRIEF
Matchers:
for float descriptors: FlannBased, BruteForce, BruteForce-L1
for uchar descriptors: BruteForce-Hamming, BruteForce-HammingLUT
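A minimal sketch of picking a matcher to fit the descriptor type, using the factory names listed above:

#include <opencv2/features2d/features2d.hpp>

// Choose a matcher by the descriptor's element type.
cv::Ptr<cv::DescriptorMatcher> makeMatcher( bool floatDescriptors )
{
    // FlannBased only accepts float (CV_32F) descriptors such as SIFT/SURF;
    // uchar (CV_8U) descriptors such as ORB/BRIEF need a Hamming matcher.
    return cv::DescriptorMatcher::create(
        floatDescriptors ? "FlannBased" : "BruteForce-Hamming" );
}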
