Convolution with a bias in Leptonica - image-processing

I want to filter a pix with a convolution kernel, but with a bias, and I don't see how to emulate the bias using the Leptonica API.
So far I have:
PIX* pixs = pixRead("file.png");
L_KERNEL* kel = kernelCreateFromString( 7, 7, 3, 3, "..." );
PIX* pixd = pixConvolve( pixs, kel, 8, 1 );
Any ideas how to emulate the classical "bias"? I tried adding its value to each pixel of the image before or after pixConvolve, but the result is not the one observed with most image-processing software.

By "bias", I am assuming that you want to shift the result so that all pixel values are non-negative.
In the notes for pixConvolve(), it says that the absolute value is taken to avoid negative output. It also says that if you wish to keep the negative values, use fpixConvolve() instead, which operates on an FPix and generates an FPix.
If you want a biased result without clipping, it is in general necessary to do the following:
(1) pixConvertToFpix() -- convert to an FPix
(2) fpixConvolve() -- do the convolution on the FPix, producing an FPix
(3) fpixGetMin() -- determine the bias required to make all values non-negative
(4) fpixAddMultConstant() -- add the bias to the FPix
(5) fpixGetMax() -- find the max value; if > 255, you need a 16 bpp Pix to represent it
(6) fpixConvertToPix() -- convert back to a Pix
Perhaps the leptonica maintainer (me) should bundle this up into a simple interface ;-)
OK, here's a function, following the outline that I wrote above, that should give enough flexibility to do these convolutions.
/*!
 *  pixConvolveWithBias()
 *
 *      Input:  pixs (8 bpp; no colormap)
 *              kel1
 *              kel2 (can be null; use if separable)
 *              force8 (if 1, force output to 8 bpp; otherwise, determine
 *                      output depth by the dynamic range of pixel values)
 *              &bias (<return> applied bias)
 *      Return: pixd (8 or 16 bpp)
 *
 *  Notes:
 *      (1) This does a convolution with either a single kernel or
 *          a pair of separable kernels, and automatically applies whatever
 *          bias (shift) is required so that the resulting pixel values
 *          are non-negative.
 *      (2) If there are no negative values in the kernel, a normalized
 *          convolution is performed, with 8 bpp output.
 *      (3) If there are negative values in the kernel, the pix is
 *          converted to an fpix, the convolution is done on the fpix, and
 *          a bias (shift) may need to be applied.
 *      (4) If force8 == TRUE and the range of values after the convolution
 *          is > 255, the output values will be scaled to fit in
 *          [0 ... 255].  If force8 == FALSE, the output will be either
 *          8 or 16 bpp, to accommodate the dynamic range of output
 *          values without scaling.
 */
PIX *
pixConvolveWithBias(PIX       *pixs,
                    L_KERNEL  *kel1,
                    L_KERNEL  *kel2,
                    l_int32    force8,
                    l_int32   *pbias)
{
l_int32    outdepth;
l_float32  min1, min2, min, minval, maxval, range;
FPIX      *fpix1, *fpix2;
PIX       *pixd;

    PROCNAME("pixConvolveWithBias");

    if (!pixs || pixGetDepth(pixs) != 8)
        return (PIX *)ERROR_PTR("pixs undefined or not 8 bpp", procName, NULL);
    if (pixGetColormap(pixs))
        return (PIX *)ERROR_PTR("pixs has colormap", procName, NULL);
    if (!kel1)
        return (PIX *)ERROR_PTR("kel1 not defined", procName, NULL);

        /* Determine if negative values can be produced in the convolution */
    kernelGetMinMax(kel1, &min1, NULL);
    min2 = 0.0;
    if (kel2)
        kernelGetMinMax(kel2, &min2, NULL);
    min = L_MIN(min1, min2);

    if (min >= 0.0) {
        if (!kel2)
            return pixConvolve(pixs, kel1, 8, 1);
        else
            return pixConvolveSep(pixs, kel1, kel2, 8, 1);
    }

        /* Bias may need to be applied; convert to fpix and convolve */
    fpix1 = pixConvertToFPix(pixs, 1);
    if (!kel2)
        fpix2 = fpixConvolve(fpix1, kel1, 1);
    else
        fpix2 = fpixConvolveSep(fpix1, kel1, kel2, 1);
    fpixDestroy(&fpix1);

        /* Determine the bias and the dynamic range.
         * If the dynamic range is <= 255, just shift the values by the
         * bias, if any.
         * If the dynamic range is > 255, there are two cases:
         *    (1) the output depth is not forced to 8 bpp ==> outdepth = 16
         *    (2) the output depth is forced to 8 ==> linearly map the
         *        pixel values to [0 ... 255]. */
    fpixGetMin(fpix2, &minval, NULL, NULL);
    fpixGetMax(fpix2, &maxval, NULL, NULL);
    range = maxval - minval;
    *pbias = (minval < 0.0) ? -minval : 0.0;
    fpixAddMultConstant(fpix2, *pbias, 1.0);  /* shift: min val ==> 0 */
    if (range <= 255 || !force8) {  /* no scaling of output values */
        outdepth = (range > 255) ? 16 : 8;
    } else {  /* scale output values to fit in 8 bpp */
        fpixAddMultConstant(fpix2, 0.0, (255.0 / range));
        outdepth = 8;
    }

        /* Convert back to pix; it won't do any clipping */
    pixd = fpixConvertToPix(fpix2, outdepth, L_CLIP_TO_ZERO, 0);
    fpixDestroy(&fpix2);

    return pixd;
}
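A minimal usage sketch (the kernel string stays elided, as in the question; any kernel containing negative values will exercise the bias path):
PIX      *pixs, *pixd;
L_KERNEL *kel;
l_int32   bias;

pixs = pixRead("file.png");   /* must be 8 bpp, no colormap */
kel = kernelCreateFromString(7, 7, 3, 3, "...");
pixd = pixConvolveWithBias(pixs, kel, NULL, 1, &bias);
fprintf(stderr, "applied bias = %d\n", bias);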

Here is the solution as I needed it, based on Dan's input.
/*!
 *  pixConvolveWithBias()
 *
 *      Input:  pixs (8 bpp; no colormap)
 *              kel1
 *              kel2 (can be null; use if separable)
 *              outdepth (of pixd: 8, 16 or 32)
 *              normflag (1 to normalize kernel to unit sum; 0 otherwise)
 *              bias
 *      Return: pixd
 *
 *  Notes:
 *      (1) This does a convolution with either a single kernel or
 *          a pair of separable kernels, and applies the given
 *          bias (shift) so that the resulting pixel values
 *          are non-negative.
 *      (2) If there are no negative values in the kernel, the
 *          convolution is performed and the bias is added.
 *      (3) If there are negative values in the kernel, the pix is
 *          converted to an fpix, the convolution is done on the fpix, and
 *          the bias (shift) is applied.
 */
PIX *
pixConvolveWithBias(PIX       *pixs,
                    L_KERNEL  *kel1,
                    L_KERNEL  *kel2,
                    l_int32    outdepth,
                    l_int32    normflag,
                    l_int32    bias)
{
l_float32  min1, min2, min;
FPIX      *fpix1, *fpix2;
PIX       *pixd;

    PROCNAME("pixConvolveWithBias");

    if (!pixs || pixGetDepth(pixs) != 8)
        return (PIX *)ERROR_PTR("pixs undefined or not 8 bpp", procName, NULL);
    if (pixGetColormap(pixs))
        return (PIX *)ERROR_PTR("pixs has colormap", procName, NULL);
    if (!kel1)
        return (PIX *)ERROR_PTR("kel1 not defined", procName, NULL);

        /* Determine if negative values can be produced in the convolution */
    kernelGetMinMax(kel1, &min1, NULL);
    min2 = 0.0;
    if (kel2)
        kernelGetMinMax(kel2, &min2, NULL);
    min = L_MIN(min1, min2);

    if (min >= 0.0) {
        if (!kel2)
            pixd = pixConvolve(pixs, kel1, outdepth, normflag);
        else
            pixd = pixConvolveSep(pixs, kel1, kel2, outdepth, normflag);
        pixAddConstantGray(pixd, bias);
    } else {
            /* Bias may need to be applied; convert to fpix and convolve */
        fpix1 = pixConvertToFPix(pixs, 1);
        if (!kel2)
            fpix2 = fpixConvolve(fpix1, kel1, normflag);
        else
            fpix2 = fpixConvolveSep(fpix1, kel1, kel2, normflag);
        fpixDestroy(&fpix1);
        fpixAddMultConstant(fpix2, bias, 1.0);
        pixd = fpixConvertToPix(fpix2, outdepth, L_CLIP_TO_ZERO, 0);
        fpixDestroy(&fpix2);
    }

    return pixd;
}
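Tying this back to the original snippet, a call might look like the following (the kernel string stays elided, and the bias value of 128 is just an illustration, not from the question):
PIX      *pixs, *pixd;
L_KERNEL *kel;

pixs = pixRead("file.png");
kel = kernelCreateFromString(7, 7, 3, 3, "...");
pixd = pixConvolveWithBias(pixs, kel, NULL, 8, 0, 128);  /* explicit bias of 128 */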

Related

How to correctly manipulate a CV_16SC3 Mat in a CUDA Kernel

I am writing a CUDA program while working with OpenCV. I have an empty Mat of a given size (e.g. 1000x800) which I explicitly converted to a GpuMat with datatype CV_16SC3. It is desired to manipulate the image in this format in the CUDA kernel. However, trying to manipulate the Mat does not seem to work correctly.
I am calling my CUDA kernel as follows:
my_kernel <<< gridDim, blockDim >>>( (unsigned short*)img.data, img.cols, img.rows, img.step);
and my sample kernel looks like this
__global__ void my_kernel( unsigned short* img, int width, int height, int img_step)
{
    int x, y, pixel;
    y = blockIdx.y * blockDim.y + threadIdx.y;
    x = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= height)
        return;
    if (x >= width)
        return;
    pixel = (y * (img_step)) + (3 * x);
    img[pixel] = 255;    //I know 255 is basically an uchar, this is just part of my test
    img[pixel+1] = 255;
    img[pixel+2] = 255;
}
I am expecting this small kernel sample to write all pixels white. However, after downloading the Mat from the GPU again and visualizing it with imshow, not all the pixels are white, and some weird black lines are present, which makes me believe that somehow I am writing to invalid memory addresses.
My guess is the following. The OpenCV documentation states that cv::mat::data returns an uchar pointer. However, my Mat has a data type "16U" (short unsigned to my knowledge). That is why in the kernel launch I am casting the pointer to (unsigned short*). But apparently that is incorrect.
How should I correctly proceed to be able to read and write the Mat data as short in my kernel?
First of all, the input image type should be short instead of unsigned short, because the type of the Mat is 16SC3 (rather than 16UC3).
Now, since the image step is in bytes and the data type is short, the pixel index (or address) should be calculated taking into account the difference in byte width between the two. There are two ways to fix this issue.
Method 1:
__global__ void my_kernel( short* img, int width, int height, int img_step)
{
    int x, y;
    y = blockIdx.y * blockDim.y + threadIdx.y;
    x = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= height)
        return;
    if (x >= width)
        return;

    //Reinterpret the input pointer as char* to allow jumps in bytes instead of shorts
    char* imgBytes = reinterpret_cast<char*>(img);

    //Calculate the row start address using the byte pointer
    char* rowStartBytes = imgBytes + (y * img_step); // Jump in bytes

    //Reinterpret the row start address back to the required data type
    short* rowStartShort = reinterpret_cast<short*>(rowStartBytes);
    short* pixelAddress = rowStartShort + ( 3 * x ); // Jump in shorts

    //Modify the image values
    pixelAddress[0] = 255;
    pixelAddress[1] = 255;
    pixelAddress[2] = 255;
}
Method 2:
Divide the input image step by the size of the required data type (short). This can be done when passing the step as a kernel argument.
my_kernel<<<grid,block>>>( img, width, height, img_step/sizeof(short));
I have used Method 2 for quite a long time. It is a shortcut, but later, when I looked at the source code of certain image-processing libraries, I realized that Method 1 is actually more portable, since the size of a type can vary across platforms.
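For completeness, here is a minimal host-side sketch of Method 2; the GpuMat setup and the launch configuration are assumptions for illustration, not code from the question:
cv::cuda::GpuMat img(1000, 800, CV_16SC3);  // hypothetical image from the question

dim3 block(32, 8);
dim3 grid((img.cols + block.x - 1) / block.x,
          (img.rows + block.y - 1) / block.y);

// Pass the step in shorts rather than bytes (Method 2)
my_kernel<<<grid, block>>>(reinterpret_cast<short*>(img.data),
                           img.cols, img.rows,
                           (int)(img.step / sizeof(short)));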

How to get more precise output out of an FFT?

I am trying to make a colored waveform using the output of the following code. But when I run it, I only get certain numbers as output frequencies (see the freq variable, which uses the bin size, frame rate and index to compute them). I'm no math expert; I cobbled this together from existing code and answers.
//
//  colored_waveform.c
//  MixDJ
//
//  Created by Jonathan Silverman on 3/14/19.
//  Copyright © 2019 Jonathan Silverman. All rights reserved.
//
#include "colored_waveform.h"
#include "fftw3.h"
#include <math.h>
#include "sndfile.h"

//int N = 1024;

// helper function to apply a windowing function to a frame of samples
void calcWindow(double* in, double* out, int size) {
    for (int i = 0; i < size; i++) {
        double multiplier = 0.5 * (1 - cos(2*M_PI*i/(size - 1)));
        out[i] = multiplier * in[i];
    }
}

// helper function to compute FFT
void fft(double* samples, fftw_complex* out, int size) {
    fftw_plan p;
    p = fftw_plan_dft_r2c_1d(size, samples, out, FFTW_ESTIMATE);
    fftw_execute(p);
    fftw_destroy_plan(p);
}

// find the frequency of the array element with the highest absolute value
// probably want to take some kind of moving average of buf[i]^2
// and return the maximum found
double maxFreqIndex(fftw_complex* buf, int size, float fS) {
    double max_freq = 0;
    double last_magnitude = 0;
    for (int i = 0; i < (size / 2) - 1; i++) {
        double freq = i * fS / size;
        // printf("freq: %f\n", freq);
        double magnitude = sqrt(buf[i][0]*buf[i][0] + buf[i][1]*buf[i][1]);
        if (magnitude > last_magnitude)
            max_freq = freq;
        last_magnitude = magnitude;
    }
    return max_freq;
}

//
//// map a frequency to a color, red = lower freq -> violet = high freq
//int freqToColor(int i) {
//
//}

void generateWaveformColors(const char path[]) {
    printf("Generating waveform colors\n");
    SNDFILE *infile = NULL;
    SF_INFO sfinfo;
    infile = sf_open(path, SFM_READ, &sfinfo);
    sf_count_t numSamples = sfinfo.frames;

    // sample rate
    float fS = 44100;

    //    float songLengthSeconds = numSamples / fS;
    //    printf("seconds: %f", songLengthSeconds);

    // size of frame for analysis, you may want to play with this
    float frameMsec = 5;
    // samples in a frame
    int frameSamples = (int)(fS / (frameMsec * 1000));
    // how much overlap each frame, you may want to play with this one too
    int frameOverlap = (frameSamples / 2);
    // color to use for each frame
    //    int outColors[(numSamples / frameOverlap) + 1];

    // scratch buffers
    double* tmpWindow;
    fftw_complex* tmpFFT;
    tmpWindow = (double*) fftw_malloc(sizeof(double) * frameSamples);
    tmpFFT = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * frameSamples);

    printf("Processing waveform for colors\n");
    for (int i = 0, outptr = 0; i < numSamples; i += frameOverlap, outptr++)
    {
        double inSamples[frameSamples];
        sf_read_double(infile, inSamples, frameSamples);

        // window another frame for FFT
        calcWindow(inSamples, tmpWindow, frameSamples);

        // compute the FFT on the next frame
        fft(tmpWindow, tmpFFT, frameSamples);

        // which frequency is the highest?
        double freqIndex = maxFreqIndex(tmpFFT, frameSamples, fS);
        printf("%i: ", i);
        printf("Max freq: %f\n", freqIndex);

        // map to color
        //    outColors[outptr] = freqToColor(freqIndex);
    }
    printf("Done.\n");
    sf_close(infile);
}
Here is some of the output:
2094216: Max freq: 5512.500000
2094220: Max freq: 0.000000
2094224: Max freq: 0.000000
2094228: Max freq: 0.000000
2094232: Max freq: 5512.500000
2094236: Max freq: 5512.500000
It only shows certain numbers, not the wide variety of frequencies it perhaps should. Or am I wrong? Is there anything wrong with my code that you can see? The color stuff is commented out because I haven't done it yet.
The frequency resolution of an FFT is limited by the length of the data sample you have. The more samples you have, the higher the frequency resolution.
In your specific case you chose frames of 5 milliseconds, which is then transformed to a number of samples on the following line:
// samples in a frame
int frameSamples = (int)(fS / (frameMsec * 1000));
This corresponds to only 8 samples at the specified 44100Hz sampling rate. The frequency resolution with such a small frame size can be computed to be
44100 / 8
or 5512.5Hz, a rather poor resolution. Correspondingly, the observed frequencies will always be one of 0, 5512.5, 11025, 16537.5 or 22050Hz.
To get a higher resolution you should increase the number of samples used for analysis by increasing frameMsec (as suggested by the comment "size of frame for analysis, you may want to play with this").
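One caveat worth flagging (an observation on the question's code, not a claim from the original answer): the conversion from milliseconds to samples appears inverted, since it divides by frameMsec; as written, increasing frameMsec would actually shrink the frame. The presumably intended computation is:
// samples in a frame: fS samples/sec * (frameMsec / 1000) sec
int frameSamples = (int)(fS * frameMsec / 1000);   // 44100 * 5 / 1000 = 220 samples
With 220-sample frames the bin spacing becomes 44100 / 220 ≈ 200 Hz, and it improves further as frameMsec grows.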

MSE for two Vec3b images in OpenCV

I have two Vec3b images and I want to find the MSE (Mean Square Error) between them. I know how to do it when you have two uchar images, but when you have two Vec3b images where there are 3 different values stored for each pixel how do you calculate it?
You should compute the Euclidean distance for each pair of pixels:
double MSE = 0;
for (int i = 0; i < width; i++)
    for (int j = 0; j < height; j++)
        MSE += sqrt(pow(img1.at<Vec3b>(j, i)[0] - img2.at<Vec3b>(j, i)[0], 2)
                  + pow(img1.at<Vec3b>(j, i)[1] - img2.at<Vec3b>(j, i)[1], 2)
                  + pow(img1.at<Vec3b>(j, i)[2] - img2.at<Vec3b>(j, i)[2], 2));
MSE /= width * height;
This code can be optimized, and if you convert your images from BGR to HSV you might get better results, depending on what you want to do.
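As one possible optimization (a sketch, not from the original answer): if you want the conventional MSE, i.e. the squared error averaged over all elements as in the function below, rather than a per-pixel Euclidean distance, OpenCV can compute the summed squared difference directly:
// Sum of squared differences over all pixels and channels,
// then divide by the element count to get the MSE.
double sse = cv::norm(img1, img2, cv::NORM_L2SQR);
double mse = sse / (double)(img1.total() * img1.channels());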
To calculate the mean square error for 1-channel and 3-channel images in OpenCV, you can use the function below, which may be faster since it avoids explicit per-pixel scanning.
double getMSE(Mat& I1, Mat& I2)
{
    Mat s1;

    // save the I1 and I2 types before converting to float
    int im1type = I1.type();
    int im2type = I2.type();

    // convert to float so that squaring the differences cannot overflow 8 bits
    I1.convertTo(I1, CV_32F);
    I2.convertTo(I2, CV_32F);

    absdiff(I1, I2, s1);       // |I1 - I2|
    s1 = s1.mul(s1);           // |I1 - I2|^2
    Scalar s = sum(s1);        // sum elements per channel
    double sse = s.val[0] + s.val[1] + s.val[2];  // sum channels

    double mse = 0;
    if (sse > 1e-10)           // for small values return zero
        mse = sse / (double)(I1.channels() * I1.total());
    // Instead of returning MSE, the tutorial code returned PSNR (below).
    //double psnr = 10.0*log10((255*255)/mse);
    //return psnr;

    // return I1 and I2 to their initial types
    I1.convertTo(I1, im1type);
    I2.convertTo(I2, im2type);

    return mse;
}
The above code returns zero for small MSE values (under 1e-10). The terms s.val[1] and s.val[2] are zero for 1-channel images.
If you want to check it for 1-channel input, use the following code to test (with random unsigned numbers):
Mat I1(12, 12, CV_8UC1), I2(12, 12, CV_8UC1);
double low = 0;
double high = 255;
cv::randu(I1, Scalar(low), Scalar(high));
cv::randu(I2, Scalar(low), Scalar(high));
double mse = getMSE(I1, I2);
cout << mse << endl;
If you want to check it for 3-channel input, use the following code to test (with random unsigned numbers):
Mat I1(12, 12, CV_8UC3), I2(12, 12, CV_8UC3);
double low = 0;
double high = 255;
cv::randu(I1, Scalar(low), Scalar(high));
cv::randu(I2, Scalar(low), Scalar(high));
double mse = getMSE(I1, I2);
cout << mse << endl;

Can template matching in OpenCV deal with two same-sized images?

I want to use template matching in OpenCV to get the similarity of two images. As we all know, template matching is usually used to find smaller image parts in a bigger one. Here is my question: I find that when the template image and the source image are the same size, the result matrix obtained from matchTemplate() is always 0, even if the two images are exactly the same.
Can template matching in OpenCV deal with two same-sized images?
Perhaps I should apologize first: the value of the matrix is indeed zero after normalization, as long as the two pictures are of the same size. I was wrong about that :)
Check out this page:
OpenCV - Normalize
Part of the OpenCV source code:
void cv::normalize( InputArray _src, OutputArray _dst, double a, double b,
                    int norm_type, int rtype, InputArray _mask )
{
    Mat src = _src.getMat(), mask = _mask.getMat();
    double scale = 1, shift = 0;
    if( norm_type == CV_MINMAX )
    {
        double smin = 0, smax = 0; //Records the maximum and minimum values in the _src matrix
        double dmin = MIN( a, b ), dmax = MAX( a, b );
        minMaxLoc( _src, &smin, &smax, 0, 0, mask ); //Find the minimum and maximum values
        scale = (dmax - dmin)*(smax - smin > DBL_EPSILON ? 1./(smax - smin) : 0);
        shift = dmin - smin*scale;
    }
    //...
    if( !mask.data )
        src.convertTo( dst, rtype, scale, shift );
    else
    {
        //...
    }
}
Since there is only one element in the result array, smin = smax = result[0][0], so:
scale = (dmax - dmin) * (smax - smin > DBL_EPSILON ? 1./(smax - smin) : 0)
      = (1 - 0) * 0
      = 0
shift = dmin - smin * scale
      = 0 - result[0][0] * 0
      = 0
After that, void Mat::convertTo(OutputArray m, int rtype, double alpha, double beta) uses the following formula (saturate_cast has nothing to do with your problem, so we can ignore it for now):
m(x, y) = saturate_cast<rType>( alpha * (*this)(x, y) + beta )
When you call normalize( result, result, 0, 1, NORM_MINMAX, -1, Mat() ), whatever the element in the matrix is, it will execute src.convertTo( dst, rtype, scale, shift ); with scale = 0, shift = 0.
In this convertTo function,
alpha = 0, beta = 0
result[0][0] = result[0][0] * alpha + beta
             = result[0][0] * 0 + 0
             = 0
So, whatever the value in the result matrix is: as long as the image and the template are the same size, the result matrix will be 1x1, and after normalization it will become [0].
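In practice (a sketch, assuming hypothetical same-sized Mats named image and templ), you can skip the normalize() call and read the single score directly from a normalized matching mode:
cv::Mat result;
cv::matchTemplate(image, templ, result, cv::TM_CCORR_NORMED);
// For same-sized inputs, result is a 1x1 matrix; read the raw score
// instead of min-max normalizing it (normalizing one value always gives 0).
float similarity = result.at<float>(0, 0);  // 1.0 when the images are identical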

iOS FFT Accerelate.framework draw spectrum during playback

UPDATE 2016-03-15
Please take a look at this project: https://github.com/ooper-shlab/aurioTouch2.0-Swift. It has been ported to Swift and contains every answer you're looking for, if you came here.
I did a lot of research and learned a lot about FFT and the Accelerate Framework. But after days of experiments I'm kind of frustrated.
I want to display the frequency spectrum of an audio file during playback in a diagram. For every time interval it should show the magnitude in dB on the Y-axis (displayed by a red bar) for every frequency (in my case 512 values) calculated by an FFT on the X-axis.
The output should look like this:
I fill a buffer with 1024 samples, extracting only the left channel for now. Then I do all this FFT stuff.
Here is my code so far:
Setting up some variables
- (void)setupVars
{
    maxSamples = 1024;
    log2n = log2f(maxSamples);
    n = 1 << log2n;
    stride = 1;
    nOver2 = maxSamples/2;

    A.realp = (float *) malloc(nOver2 * sizeof(float));
    A.imagp = (float *) malloc(nOver2 * sizeof(float));
    memset(A.imagp, 0, nOver2 * sizeof(float));

    obtainedReal = (float *) malloc(n * sizeof(float));
    originalReal = (float *) malloc(n * sizeof(float));

    setupReal = vDSP_create_fftsetup(log2n, FFT_RADIX2);
}
Doing the FFT. FrequencyArray is just a data structure that holds 512 float values.
- (FrequencyArry)performFastFourierTransformForSampleData:(SInt16*)sampleData andSampleRate:(UInt16)sampleRate
{
    NSLog(@"log2n %i n %i, nOver2 %i", log2n, n, nOver2);
    // n = 1024
    // log2n = 10
    // nOver2 = 512

    for (int i = 0; i < n; i++) {
        originalReal[i] = (float) sampleData[i];
    }

    vDSP_ctoz((COMPLEX *) originalReal, 2, &A, 1, nOver2);
    vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_FORWARD);

    float scale = (float) 1.0 / (2 * n);
    vDSP_vsmul(A.realp, 1, &scale, A.realp, 1, nOver2);
    vDSP_vsmul(A.imagp, 1, &scale, A.imagp, 1, nOver2);

    vDSP_ztoc(&A, 1, (COMPLEX *) obtainedReal, 2, nOver2);

    FrequencyArry frequencyArray;
    for (int i = 0; i < nOver2; i++) {
        frequencyArray.frequency[i] = log10f(obtainedReal[i]); // Magnitude in dB???
    }
    return frequencyArray;
}
The output always looks kind of weird, although it somehow seems to move according to the music.
I'm happy that I got this far, thanks to some very good posts here like this one:
Using the apple FFT and accelerate Framework
But now I don't know what to do. What am I missing?
Firstly, you're not applying a window function prior to the FFT; this will result in smearing of the spectrum due to spectral leakage.
Secondly, you're only using the real component of the FFT output bins to calculate the dB magnitude; you need to use the complex magnitude:
magnitude_dB = 10 * log10(re * re + im * im);
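As a rough sketch of both fixes wired into the question's vDSP pipeline (A, n, nOver2, originalReal, and frequencyArray come from the code above; the window and power buffers are new, and this is one plausible arrangement rather than the definitive one):
// 1) Apply a Hann window to the time-domain samples before vDSP_ctoz
float window[1024];
vDSP_hann_window(window, n, vDSP_HANN_NORM);
vDSP_vmul(originalReal, 1, window, 1, originalReal, 1, n);

// ... vDSP_ctoz / vDSP_fft_zrip / scaling as before ...

// 2) Compute the complex power per bin, then convert to dB
float power[512];
vDSP_zvmags(&A, 1, power, 1, nOver2);  // re*re + im*im for each bin
for (int i = 0; i < nOver2; i++) {
    frequencyArray.frequency[i] = 10.0f * log10f(power[i] + 1e-12f);  // epsilon avoids log10(0)
}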
