Impulse response - low-frequency accuracy

I have a question that is probably more about audio processing than about programming.
Just for fun, and to understand a bit more, I made my own plugin that measures the impulse response of filters, so I can see the curves of various equalisers. It is similar to the Waves QClone plugin, except that QClone can also apply those curves to other signals like a regular EQ, whereas my plugin only measures them (as far as I know, VST Plugin Analyser can do similar things).
But my plugin is inaccurate at low frequencies: somewhere below 150 Hz it starts to show wild curves that don't match the real EQ changes. Above 150 Hz everything is almost OK: it shows the EQ curves almost perfectly, but has trouble showing curves for very narrow Q settings.
I have spent almost a whole week wondering what I'm doing wrong. I tried changing the resolution of the measured frequency range, and also changing the buffer size for the impulse. I don't know what else to do and it is really annoying, so please help me.
My code for measuring the impulse response is essentially this:
float freqResolution = 1000.0f; // sets the resolution of the measured frequency range
float minFreqIndex = log10(20.0f) * freqResolution / log10(wSampleRate);
float maxFreqIndex = log10(20000.0f) * freqResolution / log10(wSampleRate);
for (int sample = (int)minFreqIndex; sample < maxFreqIndex; sample++) {
    logScaleFreq = pow(10.0f, log10(wSampleRate) * (float)sample / (freqResolution - 1.0f));
    _Re = processor.filteredImpulse[0];
    _Im = 0.0f;
    for (int i = 1; i < buffersize; ++i) {
        _Re += processor.filteredImpulse[i] * cosf(-(float)i * 2 * double_Pi * logScaleFreq / wSampleRate);
        _Im += processor.filteredImpulse[i] * sinf(-(float)i * 2 * double_Pi * logScaleFreq / wSampleRate);
    }
    float _Re_2 = pow(_Re, 2.0f);
    float _Im_2 = pow(_Im, 2.0f);
    float _Hf = pow(_Re_2 + _Im_2, 0.5f);
    logScale_dB = 20 * log10(_Hf);
}
That's essentially it; I then plot logScale_dB as a function of logScaleFreq.
Thanks in advance for any help.
Of course.
It's the filtered data from an array containing a single impulse, something like [1, 0, 0, 0, 0, 0, 0, 0, 0…],
whose length depends on buffersize. There is always exactly one 1 followed by zeros, which I think is what an impulse should be.
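One likely cause, offered as an assumption since the thread never states the value of buffersize: a low-frequency EQ band is usually an IIR filter whose impulse response rings for thousands of samples, so a short buffer truncates it. In addition, an N-sample capture cannot resolve detail much finer than about fs/N Hz (roughly 86 Hz for N = 512 at 44.1 kHz), which would also explain trouble with narrow Q. The C++ sketch below evaluates the DFT of a captured response at arbitrary frequencies, finishing the sum before taking the magnitude, and generates log-spaced frequencies; magnitudeDb and logFrequencies are hypothetical helper names, not part of the plugin or JUCE.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude (dB) of the frequency response of a captured impulse response h,
// evaluated at freqHz by a direct DFT sum. Double precision keeps the phase
// argument w * n accurate even for long buffers.
double magnitudeDb(const std::vector<double>& h, double freqHz, double sampleRate) {
    const double pi = 3.14159265358979323846;
    const double w = 2.0 * pi * freqHz / sampleRate;  // radians per sample
    double re = 0.0, im = 0.0;
    for (std::size_t n = 0; n < h.size(); ++n) {
        re += h[n] * std::cos(w * static_cast<double>(n));
        im -= h[n] * std::sin(w * static_cast<double>(n));
    }
    // The magnitude is taken once, after the whole sum has been accumulated.
    return 20.0 * std::log10(std::hypot(re, im));
}

// numPoints frequencies spaced evenly on a log axis from fLow to fHigh (numPoints >= 2).
std::vector<double> logFrequencies(double fLow, double fHigh, int numPoints) {
    std::vector<double> f(numPoints);
    const double decades = std::log10(fHigh / fLow);
    for (int k = 0; k < numPoints; ++k)
        f[k] = fLow * std::pow(10.0, decades * k / (numPoints - 1));
    return f;
}
```

As a check of the truncation effect: a one-pole low-pass with pole r = e^(-2π·20/44100) ≈ 0.99715 (corner near 20 Hz) captured in only 256 samples reads about -5.7 dB at 0 Hz instead of 0 dB, while a 16,384-sample capture of the same filter reads 0 dB.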


Vector Matrix multiplication via ARM NEON

I have a task: multiply a big row vector (10,000 elements) by a big column-major matrix (10,000 rows, 400 columns). I decided to use ARM NEON since I'm curious about this technology and would like to learn more about it.
Here's a working example of vector matrix multiplication I wrote:
//float* vec_ptr - a pointer to vector
//float* mat_ptr - a pointer to matrix
//float* out_ptr - a pointer to output vector
//int matCols - matrix columns
//int vecRows - vector rows, the same as matrix
for (int i = 0, max_i = matCols; i < max_i; i++) {
    for (int j = 0, max_j = vecRows - 3; j < max_j; j += 4, mat_ptr += 4, vec_ptr += 4) {
        float32x4_t mat_val = vld1q_f32(mat_ptr); //get 4 elements from matrix
        float32x4_t vec_val = vld1q_f32(vec_ptr); //get 4 elements from vector
        float32x4_t out_val = vmulq_f32(mat_val, vec_val); //multiply vectors
        float32_t total_sum = vaddvq_f32(out_val); //sum elements of vector together
        out_ptr[i] += total_sum;
    }
    vec_ptr = &myVec[0]; //switch ptr back again to zero element
}
The problem is that it takes a very long time to compute: 30 ms on an iPhone 7+, when my goal is 1 ms or even less if possible. The current execution time is understandable, since the inner loop body runs 400 * (10000 / 4) = 1,000,000 times.
I also tried processing 8 elements at a time instead of 4. It seems to help, but the numbers are still very far from my goal.
I understand that I might be making some horrible mistakes since I'm a newbie with ARM NEON, and I would be happy if someone could give me some tips on how to optimize my code.
Also, is it worth doing big vector-matrix multiplication with ARM NEON? Is this technology a good fit for such a purpose?
Your code is completely flawed: it iterates 16 times assuming both matCols and vecRows are 4. What's the point of SIMD then?
And the major performance problem lies in this line: float32_t total_sum = vaddvq_f32(out_val);
You should never convert a vector to a scalar inside a loop, since it causes a pipeline hazard that costs around 15 cycles every time.
The solution:
float32x4x4_t myMat;
float32x2_t myVecLow, myVecHigh;
myVecLow = vld1_f32(&pVec[0]);
myVecHigh = vld1_f32(&pVec[2]);
myMat = vld4q_f32(pMat);
myMat.val[0] = vmulq_lane_f32(myMat.val[0], myVecLow, 0);
myMat.val[0] = vmlaq_lane_f32(myMat.val[0], myMat.val[1], myVecLow, 1);
myMat.val[0] = vmlaq_lane_f32(myMat.val[0], myMat.val[2], myVecHigh, 0);
myMat.val[0] = vmlaq_lane_f32(myMat.val[0], myMat.val[3], myVecHigh, 1);
vst1q_f32(pDst, myMat.val[0]);
Compute all four rows in a single pass.
Transpose the matrix on the fly with vld4.
Use vector-by-scalar multiply-accumulate instead of a vector-vector multiply followed by a horizontal add, which is what causes the pipeline hazards.
You were asking if SIMD is suitable for matrix operations? A simple "yes" would be a monumental understatement. You don't even need a loop for this.
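The advice about not reducing to a scalar inside the loop applies directly to the 10,000 × 400 case as well: keep the partial sums in vector lanes across the whole inner loop and do one horizontal add per column. Below is a portable C++ sketch of that structure (deliberately not NEON code, so it compiles anywhere); the name vecTimesColMajor is hypothetical. On AArch64 the four acc entries would map to one float32x4_t updated with vmlaq_f32 (or vfmaq_f32) and reduced once with vaddvq_f32 after the inner loop.

```cpp
#include <cstddef>
#include <vector>

// out[c] = sum over r of vec[r] * mat[c * rows + r], for a column-major matrix.
// Four running partial sums stand in for the four lanes of a float32x4_t;
// they are combined only once per column, after the inner loop.
std::vector<float> vecTimesColMajor(const std::vector<float>& vec,
                                    const std::vector<float>& mat,
                                    std::size_t rows, std::size_t cols) {
    std::vector<float> out(cols, 0.0f);
    for (std::size_t c = 0; c < cols; ++c) {
        const float* col = mat.data() + c * rows;
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        std::size_t r = 0;
        for (; r + 4 <= rows; r += 4)
            for (int k = 0; k < 4; ++k)
                acc[k] += col[r + k] * vec[r + k];
        float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);  // one horizontal add per column
        for (; r < rows; ++r)  // leftover rows when rows % 4 != 0
            sum += col[r] * vec[r];
        out[c] = sum;
    }
    return out;
}
```

Note that the matrix alone is 10,000 × 400 × 4 bytes = 16 MB and is streamed from memory on every call, so memory bandwidth rather than arithmetic may end up bounding the runtime.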

Resize organized point cloud

I have an organized point cloud (1280 * 720) captured from a 3D camera. I wonder whether there's a method to resize (cut down) this point cloud to a smaller size (e.g. 128 * 72) while keeping it organized.
(I think this isn't the same as downsampling; by "resize" I mean something like zooming an image.)
I am using Point Cloud Library 1.8.0 but am stuck on this.
Any advice is welcome, thanks in advance!
The answer by Rooscannon is partially correct, but has some bugs in it. A correct uniform subsampling of an organized point cloud is as follows:
// Downsampling or keypoint extraction
int scale = 3;
PointCloud<PointXYZRGB>::Ptr keypoints (new PointCloud<PointXYZRGB>);
keypoints->width = cloud->width / scale;
keypoints->height = cloud->height / scale;
keypoints->points.resize(keypoints->width * keypoints->height);
for (size_t i = 0, ii = 0; i < keypoints->height; ii += scale, i++) {
    for (size_t j = 0, jj = 0; j < keypoints->width; jj += scale, j++) {
        keypoints->at(j, i) = cloud->at(jj, ii); //at(column, row)
    }
}
The differences are in the loop conditions, the indexing, and the initialization of the subsampled point cloud; without them, the subsampled cloud would no longer be organized.
Just keep one point out of every N, where N (times below) is the factor by which you want to reduce your cloud.
Something like this should work:
for (std::size_t i = 0; i < src->size(); i += times)
    dest->push_back((*src)[i]); // note: push_back leaves dest unorganized (height 1)
The only problem is that the cloud might contain some NaN values. To deal with that, set is_dense to false on dest and call removeNaNFromPointCloud on it.
Hope this helps!
Note that removing NaNs from your point cloud makes it unorganized. Quite likely the NaNs are there as dummy points wherever your instrument could not observe a point, just to keep the matrix dimensions correct. Removing them breaks the matrix structure, and you'll end up with a different number of points than your 1280 * 720 matrix would expect.
If you wish to downsample an organized point cloud by, say, a factor of 2, you could try something like
int scale = 2;
pcl::PointCloud<pcl::your_point_type> down_sampled_cloud;
down_sampled_cloud.width = original_cloud.width / scale;
down_sampled_cloud.height = original_cloud.height / scale;
for( int ii = 0; ii < original_cloud.height; ii+=scale){
    for( int jj = 0; jj < original_cloud.width; jj+=scale ){
        // copy original_cloud.at(jj, ii) into down_sampled_cloud here
    }
}
Change scale to whatever you wish.
This method just downsamples the original point cloud; it does not interpolate between existing points. Scaling by a non-integer factor is trickier and might yield unwanted results if the surface is not continuous.
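If "resize like zooming an image" is meant literally, an alternative to picking every scale-th point is to average each scale × scale block while skipping NaNs, which is closer to area-based image downscaling and still keeps the output organized. The sketch below uses hypothetical Pt and Grid stand-ins rather than PCL types so that it compiles on its own; porting it would mean reading with cloud->at(col, row) and writing into a cloud whose width and height are set up front, as in the first answer.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Pt { float x, y, z; };  // hypothetical stand-in for pcl::PointXYZ

// Hypothetical organized cloud, stored row-major: point (col, row) is data[row * width + col].
struct Grid { std::size_t width, height; std::vector<Pt> data; };

// Block-average downsampling (scale >= 1): each output point is the mean of the
// valid (non-NaN) points in a scale x scale block. A block with no valid points
// becomes NaN, so the output stays organized (width/scale x height/scale).
Grid blockAverage(const Grid& in, std::size_t scale) {
    Grid out{in.width / scale, in.height / scale, {}};
    out.data.resize(out.width * out.height);
    const float nan = std::numeric_limits<float>::quiet_NaN();
    for (std::size_t r = 0; r < out.height; ++r)
        for (std::size_t c = 0; c < out.width; ++c) {
            float sx = 0, sy = 0, sz = 0;
            std::size_t n = 0;
            for (std::size_t dr = 0; dr < scale; ++dr)
                for (std::size_t dc = 0; dc < scale; ++dc) {
                    const Pt& p = in.data[(r * scale + dr) * in.width + (c * scale + dc)];
                    if (std::isnan(p.x) || std::isnan(p.y) || std::isnan(p.z)) continue;
                    sx += p.x; sy += p.y; sz += p.z; ++n;
                }
            out.data[r * out.width + c] = n ? Pt{sx / n, sy / n, sz / n} : Pt{nan, nan, nan};
        }
    return out;
}
```

As the answer above warns, averaging across a depth discontinuity produces points that lie between the two surfaces, so plain decimation may be preferable where edges matter.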

More precise frequency from FFT with pure sine tones

I'm currently using FFT code from here:
Here's the code from the 2 relevant methods:
-(void)createFFTWithBufferSize:(float)bufferSize withAudioData:(float*)data {
    // Setup the length
    _log2n = log2f(bufferSize);
    // Calculate the weights array. This is a one-off operation.
    _FFTSetup = vDSP_create_fftsetup(_log2n, FFT_RADIX2);
    // For an FFT, numSamples must be a power of 2, i.e. is always even
    int nOver2 = bufferSize/2;
    // Populate *window with the values for a hamming window function
    float *window = (float *)malloc(sizeof(float)*bufferSize);
    vDSP_hamm_window(window, bufferSize, 0);
    // Window the samples
    vDSP_vmul(data, 1, window, 1, data, 1, bufferSize);
    free(window);
    // Define complex buffer
    _A.realp = (float *) malloc(nOver2*sizeof(float));
    _A.imagp = (float *) malloc(nOver2*sizeof(float));
}
-(void)updateFFTWithBufferSize:(float)bufferSize withAudioData:(float*)data {
    // For an FFT, numSamples must be a power of 2, i.e. is always even
    int nOver2 = bufferSize/2;
    // Pack samples:
    // C(re) -> A[n], C(im) -> A[n+1]
    vDSP_ctoz((COMPLEX*)data, 2, &_A, 1, nOver2);
    // Perform a forward FFT using fftSetup and A
    // Results are returned in A
    vDSP_fft_zrip(_FFTSetup, &_A, 1, _log2n, FFT_FORWARD);
    // Convert COMPLEX_SPLIT A result to magnitudes
    float amp[nOver2];
    float maxMag = 0;
    for(int i=0; i<nOver2; i++) {
        // Calculate the magnitude
        float mag = _A.realp[i]*_A.realp[i]+_A.imagp[i]*_A.imagp[i];
        maxMag = mag > maxMag ? mag : maxMag;
    }
    for(int i=0; i<nOver2; i++) {
        // Calculate the magnitude
        float mag = _A.realp[i]*_A.realp[i]+_A.imagp[i]*_A.imagp[i];
        // Bind the value to be less than 1.0 to fit in the graph
        amp[i] = [EZAudio MAP:mag leftMin:0.0 leftMax:maxMag rightMin:0.0 rightMax:1.0];
    }
}
I've modified the updateFFTWithBufferSize method above so that I can get the frequency in Hz like this:
for(int i=0; i<nOver2; i++) {
    // Calculate the magnitude
    float mag = _A.realp[i]*_A.realp[i]+_A.imagp[i]*_A.imagp[i];
    if(maxMag < mag) {
        _i_max = i;
    }
    maxMag = mag > maxMag ? mag : maxMag;
}
float frequency = _i_max / bufferSize * 44100;
NSLog(@"FREQUENCY: %f", frequency);
I've generated a few pure sine tones at different frequencies with Audacity to test with. The issue is that the code returns the same frequency for two different sine tones that are relatively close in value.
For example:
A sine tone generated at 19255Hz will show up from FFT as 19293.750000Hz. So will a sine tone generated at 19330Hz. Something must be off in the calculations.
Any help on how to modify the above code to get a more precise frequency reading for pure sine tones is greatly appreciated.
Thank you!
You can get a rough frequency estimate by fitting a parabola to the magnitudes of the peak FFT bin and its two neighbours, and then finding the vertex of that parabola.
A better estimate can be created by using the transform of your FFT window as an interpolation kernel, and doing successive approximation to refine an estimate of the maxima of the interpolated points. (Zero padding and using a much longer FFT will give you a similar type of interpolated estimate.)
The easy way for a stationary signal is, if possible, to just use a longer FFT with more samples that span a longer time interval.
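A C++ sketch of the parabolic fit from the first answer, applied to the 19255 Hz vs 19330 Hz case. Assumptions: a 512-point block (consistent with the reported 19293.75 Hz, which is exactly 224 × 44100 / 512), a symmetric Hamming window as in the question, and a naive DFT standing in for vDSP so the example is self-contained; the fit is done on log magnitudes, a common choice. estimateFrequency and binMagnitude are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude of DFT bin k of x (naive O(N) per bin; an FFT gives the same values).
double binMagnitude(const std::vector<double>& x, std::size_t k) {
    const double pi = 3.14159265358979323846;
    const double n = static_cast<double>(x.size());
    double re = 0.0, im = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double ang = 2.0 * pi * static_cast<double>(k) * static_cast<double>(i) / n;
        re += x[i] * std::cos(ang);
        im -= x[i] * std::sin(ang);
    }
    return std::hypot(re, im);
}

// Estimate the frequency of a single sinusoid (block length >= 4): Hamming-window
// the block, find the peak bin (excluding the first and last bins so both
// neighbours exist), fit a parabola through the log magnitudes of the peak and
// its neighbours, and use the vertex as a fractional bin index.
double estimateFrequency(std::vector<double> x, double sampleRate) {
    const double pi = 3.14159265358979323846;
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i)
        x[i] *= 0.54 - 0.46 * std::cos(2.0 * pi * i / (n - 1));  // symmetric Hamming

    std::vector<double> mag(n / 2 + 1);
    std::size_t peak = 1;
    for (std::size_t k = 0; k < mag.size(); ++k) {
        mag[k] = binMagnitude(x, k);
        if (k >= 1 && k + 1 < mag.size() && mag[k] > mag[peak]) peak = k;
    }
    const double a = std::log(mag[peak - 1]);
    const double b = std::log(mag[peak]);
    const double c = std::log(mag[peak + 1]);
    const double offset = 0.5 * (a - c) / (a - 2.0 * b + c);  // vertex offset from peak, in bins
    return (static_cast<double>(peak) + offset) * sampleRate / static_cast<double>(n);
}
```

For these two tones both estimates land within a few hertz of the true frequencies, whereas the raw peak bin reports 19293.75 Hz for both.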
You've got a number of problems going on here:
1) Your frequency-axis spacing is fs/N (the sampling rate divided by the FFT length), which here is about 86 Hz if N = 512 (19293.75 Hz is exactly 224 × 44100 / 512), so you're not going to get a resolution much better than that.
2) Your signal is very close to the Nyquist frequency (i.e., 20 kHz/44.1 kHz is almost 0.5), and this close to the Nyquist limit you need to be very careful if you want accurate results. (At 20 kHz you're only recording about two samples per oscillation cycle.)
3) Since 20 kHz is at the edge of human hearing (and above it for most people), many microphones aren't designed to capture it well. Here's a measurement for the iPhone.
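Point 1) can be checked with a couple of lines of arithmetic. nearestBin and binCentreHz are hypothetical helpers; they assume the magnitude peak lands in the bin nearest the tone, which holds for a clean windowed sine.

```cpp
#include <cmath>
#include <cstddef>

// Index of the FFT bin whose centre is closest to freqHz.
std::size_t nearestBin(double freqHz, double sampleRate, std::size_t fftSize) {
    return static_cast<std::size_t>(std::lround(freqHz * static_cast<double>(fftSize) / sampleRate));
}

// Centre frequency of a bin: bin * fs / N (bin spacing is fs / N).
double binCentreHz(std::size_t bin, double sampleRate, std::size_t fftSize) {
    return static_cast<double>(bin) * sampleRate / static_cast<double>(fftSize);
}
```

With N = 512 both test tones round to bin 224 (19293.75 Hz); with N = 1024 they round to bins 447 and 449, so a longer FFT alone separates them, at the cost of a longer analysis window.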
Perhaps your sampling frequency isn't high enough?
The FFT is a very good method to get a spectrum if you don't know anything about the input. If you know that the input is a pure sine wave, you can do much better. Start by calculating the FFT to get a rough idea of where the sine is. Use the minimum and maximum to estimate the amplitude [or get it from the FFT: square all inputs, add them, take the square root], then get the phase at the beginning and end of the block given the estimated frequency and amplitude.
In general, you'll find that the phases do not match. That's because a frequency error Δf accumulates a phase error of 2π·Δf·N/fs radians over N samples; f - Δf is a better estimate of the frequency. Keep in mind that such a method is very sensitive to noise. It works because the input is a pure sine wave, and noise is everything but that. Applying it iteratively blows up quickly; you even hit rounding errors (which aren't sinusoidal either).
Another similar trick is subtracting the estimated wave. The difference between two sines of nearly equal frequency is a product of two sinusoids, sin(2πf1·t) - sin(2πf2·t) = 2·cos(2π·(f1+f2)/2·t)·sin(2π·(f1-f2)/2·t): a carrier at the mean frequency (about 19.3 kHz here) slowly modulated at half the frequency error (Δf/2, below 50 Hz). See also Heterodyne detection.
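The phase idea in the last answer can be sketched with a closely related, commonly used variant: compare the phase of the same DFT bin in two overlapping blocks a hop apart (the phase-vocoder estimate). This is a substitute for the answer's begin/end phase fit, not a transcription of it. Assumptions: a periodic Hann window, a coarse bin taken from the ordinary peak search, and a true frequency within fs/(2·hop) of that bin's centre; windowedBin and phaseRefinedFrequency are hypothetical names. Like the answer's method, it is sensitive to noise.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hann-windowed DFT coefficient at bin k of the block x[start .. start + n - 1].
std::complex<double> windowedBin(const std::vector<double>& x, std::size_t start,
                                 std::size_t n, std::size_t k) {
    const double pi = 3.14159265358979323846;
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        const double w = 0.5 - 0.5 * std::cos(2.0 * pi * i / n);  // periodic Hann
        acc += w * x[start + i] * std::polar(1.0, -2.0 * pi * k * i / n);
    }
    return acc;
}

// Refine a coarse bin estimate from the phase advance between two blocks `hop` samples
// apart (x must hold at least n + hop samples). The advance measured beyond what the
// bin centre predicts gives the offset from that centre: dev * fs / (2 * pi * hop) Hz.
double phaseRefinedFrequency(const std::vector<double>& x, std::size_t n, std::size_t hop,
                             std::size_t bin, double sampleRate) {
    const double pi = 3.14159265358979323846;
    const std::complex<double> a = windowedBin(x, 0, n, bin);
    const std::complex<double> b = windowedBin(x, hop, n, bin);
    const double measured = std::arg(b * std::conj(a));          // phase advance, (-pi, pi]
    const double expected = 2.0 * pi * bin * hop / n;             // advance at the bin centre
    const double dev = std::remainder(measured - expected, 2.0 * pi);  // wrap to [-pi, pi]
    return (bin + dev / (2.0 * pi) * n / hop) * sampleRate / n;
}
```

With n = 512 and hop = 256 at 44.1 kHz, the unambiguous range is about ±86 Hz around the bin centre, which covers both test tones relative to bin 224.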

Object slowing into target coordinate over specified number of frames

I searched and couldn't find a solution to this (maybe I'm using the wrong terms?), and I feel a bit silly because I believe I'm overlooking something simple. Here's what I'd like to do:
I have an object in my game that needs to travel a specific x distance in a specified number of frames. But, I'd like to have it ease into the target point instead of moving the same distance/velocity every frame.
So rather than just dividing the distance by the number of frames and applying that velocity each frame, I'd like it to slowly ease into the target coordinate over the specified number of frames.
So, for example (I'm terrible at explaining, so perhaps an example will help): say I have a spaceship that needs to travel 32 pixels to the right. I'd like to input a value that specifies he'll travel 32 pixels in, say, 4 frames. In a linear system he'd travel 8 pixels each frame, but I want him to ease into it, so maybe on frame 1 (and I'm using completely arbitrary values) he'd move 16 pixels, on frame 2 he'd move 10, on frame 3 he'd move 4, and on frame 4 he'd move 2, ending up traveling the 32-pixel distance in those 4 frames while slowly easing into the target point.
The first thing that came to mind was using exponent/logarithms somehow, but I'd like some suggestions. Any help would be greatly appreciated, thanks! :D
The general solution is the following:
You have a value (distanceTravelled) which has a range from 0.0 to 1.0.
You have a function (fancyCurve) which takes in a float from 0.0 to 1.0 and remaps it from 0.0 to 1.0, except in a curve.
Every frame, you increase distanceTravelled by a linear amount. But you get the actual value by calling fancyCurve(distanceTravelled).
Here's a pseudo-code example
#include <cstdio>

float linearFunction(float t) { return t; }
float speedUpFunction(float t) { return t*t; }
float slowDownFunction(float t) {
    //do your own research. There are plenty of curves from
    return 1.0f - (1.0f - t) * (1.0f - t); // e.g. quadratic ease-out
}
float easingCurve(float t) {
    //choose one.
    //return linearFunction(t);
    //return speedUpFunction(t);
    return slowDownFunction(t);
}
int main() {
    //setting up a spaceship with starting x coordinate
    float spaceshipX = 2;
    float spaceshipTargetX = 34;
    int animationFrames = 8;
    //Below is the actual algorithm
    float distanceTravelled = 0;
    float startValue = spaceshipX;
    float animRange = spaceshipTargetX - startValue; // 32
    float distancePerFrame = 1.0f / animationFrames; // 0.125, since 8 frames
    while (distanceTravelled < 1.0f) {
        distanceTravelled += distancePerFrame;
        if (distanceTravelled > 1.0f) distanceTravelled = 1.0f; // guard against float drift
        spaceshipX = startValue + (easingCurve(distanceTravelled) * animRange);
        std::printf("%.2f\n", spaceshipX);
    }
    return 0;
}
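The same idea as a small C++ function that returns each frame's position, using the question's own numbers (32 pixels in 4 frames) and a quadratic ease-out, 1 - (1 - t)², as the slow-down curve (one choice among many). Here t is computed from the frame index rather than by repeatedly adding 1/frames, so the last frame lands exactly on the target even for frame counts like 10, where adding 0.1 ten times does not give exactly 1.0. easeOutQuad and easedPositions are hypothetical names.

```cpp
#include <vector>

// Quadratic ease-out: moves fastest at t = 0 and arrives with zero speed at t = 1.
double easeOutQuad(double t) { return 1.0 - (1.0 - t) * (1.0 - t); }

// Position after each of `frames` frames when moving from start to target.
// Progress t = f / frames is derived from the frame index, so the final
// position is exactly the target.
std::vector<double> easedPositions(double start, double target, int frames) {
    std::vector<double> pos;
    for (int f = 1; f <= frames; ++f) {
        const double t = static_cast<double>(f) / frames;  // linear progress in (0, 1]
        pos.push_back(start + (target - start) * easeOutQuad(t));
    }
    return pos;
}
```

Starting at x = 2 with a target of 34 over 4 frames, this gives positions 16, 26, 32 and 34, i.e. steps of 14, 10, 6 and 2 pixels.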

Laplacian of gaussian filter use

This is a formula for LoG filtering:
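For reference, the continuous LoG that the Matlab snippet further down plots (with sigmaSq = σ²) is the following; σ is its only parameter, and a kernel size only appears once the formula is sampled on a grid. Its zero crossing lies at x² + y² = 2σ².

$$\operatorname{LoG}(x,y) = -\frac{1}{\pi\sigma^4}\left(1 - \frac{x^2+y^2}{2\sigma^2}\right)e^{-\frac{x^2+y^2}{2\sigma^2}}$$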
Also, in applications that use LoG filtering I see the function called with only one parameter:
I want to try LoG filtering using that formula (my previous attempt was a Gaussian filter followed by a Laplacian filter, with some filter-window size).
But looking at that formula I can't see how the filter size relates to it. Does it mean that the filter size is fixed?
Can you explain how to use it?
As you've probably figured out by now from the other answers and links, LoG filter detects edges and lines in the image. What is still missing is an explanation of what σ is.
σ is the scale of the filter. Is a one-pixel-wide line a line or noise? Is a line 6 pixels wide a line or an object with two distinct parallel edges? Is a gradient that changes from black to white across 6 or 8 pixels an edge or just a gradient? It's something you have to decide, and the value of σ reflects your decision: the larger σ is, the wider the lines it responds to, the smoother the edges, and the more noise is ignored.
Do not confuse the scale of the filter (σ) with the size of the discrete approximation (usually called the stencil). In Paul's link σ=1.4 and the stencil size is 9. While it is usually reasonable to use a stencil size of 4σ to 6σ, these two quantities are quite independent. A larger stencil provides a better approximation of the filter, but in most cases you don't need a very good approximation.
This was something that confused me too, and it wasn't until I had to do the same as you for a uni project that I understood what you were supposed to do with the formula!
You can use this formula to generate a discrete LoG filter. If you write a bit of code to implement that formula, you can then generate a filter for use in image convolution. To generate, say, a 5x5 template, simply call the code with x and y ranging from -2 to +2.
This will generate the values to use in a LoG template. If you graph the values this produces you should see the "Mexican hat" shape typical of this filter, like so:
You can fine-tune the template by changing how wide it is (the size) and the sigma value (how broad the peak is). The wider and broader the template, the less affected by noise the result will be, because it operates over a wider area.
Once you have the filter, you can apply it to the image by convolving the template with the image. If you've not done this before, check out these few tutorials.
Essentially, at each pixel location, you "place" your convolution template, centred at that pixel. You then multiply the surrounding pixel values by the corresponding "pixel" in the template and add up the result. This is then the new pixel value at that location (typically you also have to normalise (scale) the output to bring it back into the correct value range).
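A C++ sketch of both steps described above: sampling the formula into an odd-sized template and sliding it over the image. Assumptions: images are plain vectors of doubles, borders are handled by clamping to the nearest edge pixel, and the sampled kernel has its mean subtracted so it sums to zero (a common practical adjustment for the truncated filter, not part of the formula). logKernel and convolve are hypothetical names. Because the LoG kernel is symmetric, the multiply-and-sum described above (strictly a correlation) gives the same result as convolution.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<double>>;

// Sample the LoG formula on a size x size grid centred on (0, 0); size should be odd.
// The mean is then subtracted so the kernel sums to zero (flat regions give zero response).
Image logKernel(double sigma, int size) {
    const double pi = 3.14159265358979323846;
    const int half = size / 2;
    Image k(size, std::vector<double>(size));
    double sum = 0.0;
    for (int y = -half; y <= half; ++y)
        for (int x = -half; x <= half; ++x) {
            const double q = (x * x + y * y) / (2.0 * sigma * sigma);
            k[y + half][x + half] = -1.0 / (pi * std::pow(sigma, 4)) * (1.0 - q) * std::exp(-q);
            sum += k[y + half][x + half];
        }
    const double mean = sum / (size * size);
    for (auto& row : k)
        for (double& v : row) v -= mean;
    return k;
}

// Centre the kernel on each pixel, multiply overlapping values and add them up.
// Pixels outside the image are replaced by the nearest edge pixel (clamping).
Image convolve(const Image& img, const Image& k) {
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    const int half = static_cast<int>(k.size()) / 2;
    Image out(h, std::vector<double>(w, 0.0));
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c) {
            double acc = 0.0;
            for (int dr = -half; dr <= half; ++dr)
                for (int dc = -half; dc <= half; ++dc) {
                    const int rr = std::min(std::max(r + dr, 0), h - 1);
                    const int cc = std::min(std::max(c + dc, 0), w - 1);
                    acc += img[rr][cc] * k[dr + half][dc + half];
                }
            out[r][c] = acc;
        }
    return out;
}
```

With this sign convention the centre of the kernel is negative; many implementations negate the kernel so that the centre is positive, which only flips the sign of the response.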
The code below gives a rough idea of how you might implement this. Please forgive any mistakes / typos etc. as it hasn't been tested.
I hope this helps.
private float LoG(float x, float y, float sigma)
{
    // Laplacian of Gaussian: -1/(pi*sigma^4) * (1 - r^2/(2*sigma^2)) * exp(-r^2/(2*sigma^2))
    double q = (x * x + y * y) / (2.0 * sigma * sigma);
    return (float) (-1.0 / (Math.PI * Math.pow(sigma, 4)) * (1.0 - q) * Math.exp(-q));
}

private float[][] GenerateTemplate(int templateSize, float sigma)
{
    // Make sure it's an odd number for convenience
    if (templateSize % 2 == 0)
        templateSize++;
    // Create the data array
    float[][] template = new float[templateSize][templateSize];
    // Work out the "min and max" values. LoG is centered around 0, 0
    // so, for a size 5 template (say) we want to get the values from
    // -2 to +2, ie: -2, -1, 0, +1, +2 and feed those into the formula.
    int max = templateSize / 2;
    int min = -max;
    // We also need a count to index into the data array...
    int xCount = 0;
    for (int x = min; x <= max; ++x, ++xCount)
    {
        int yCount = 0;
        for (int y = min; y <= max; ++y, ++yCount)
        {
            // Get the LoG value for this (x,y) pair
            template[xCount][yCount] = LoG(x, y, sigma);
        }
    }
    return template;
}
Just for visualization purposes, here is a simple Matlab 3D colored plot of the Laplacian of Gaussian (Mexican Hat) wavelet. You can change the sigma(σ) parameter and see its effect on the shape of the graph:
sigmaSq = 0.5 % Square of σ parameter
[x y] = meshgrid(linspace(-3,3), linspace(-3,3));
z = (-1/(pi*(sigmaSq^2))) .* (1-((x.^2+y.^2)/(2*sigmaSq))) .*exp(-(x.^2+y.^2)/(2*sigmaSq));
surf(x, y, z)
You could also compare the effects of the sigma parameter on the Mexican Hat doing the following:
t = -5:0.01:5;
sigma = 0.5;
mexhat05 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 1;
mexhat1 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
sigma = 2;
mexhat2 = exp(-t.*t/(2*sigma*sigma)) * 2 .*(t.*t/(sigma*sigma) - 1) / (pi^(1/4)*sqrt(3*sigma));
plot(t, mexhat05, 'r', ...
t, mexhat1, 'b', ...
t, mexhat2, 'g');
Or simply use the Wavelet toolbox provided by Matlab as follows:
lb = -5; ub = 5; n = 1000;
[psi,x] = mexihat(lb,ub,n);
plot(x,psi), title('Mexican hat wavelet')
I found this useful when implementing this for edge detection in computer vision. Although not the exact answer, hope this helps.
It is a continuous, circularly symmetric filter whose zero crossing lies on a circle of radius sqrt(2) * sigma. If you want to implement this for image processing, you'll need to approximate it on a discrete grid.
There's an example for sigma = 1.4 here:
