Since the Corona situation characterizes my studies as self-study, as a Processing-Language newbie I don't have an easy time getting into the subject of image processing , more specifically convolution. Therefore I hope that you can help me.
My lecturer, who unfortunately is nearly never reachable, left me the following conv code. The theory behind convolution is clear to me, but I have many gaps in understanding related to the code. Could someone leave a line comment so that I can get into the code a bit more fluently?
The Code is following
color convolution (int x, int y, float[][] matrix, int matrix_size, PImage img){
float rtotal = 0.0;
float gtotal = 0.0;
float btotal = 0.0;
int offset = matrix_size / 2;
for (int i = 0; i < matrix_size; i++){
for (int j= 0; j < matrix_size; j++){
int xloc = x+i-offset;
int yloc = y+j-offset;
int loc = xloc + img.width*yloc;
rtotal += (red(img.pixels[loc]) * matrix[i][j]);
gtotal += (green(img.pixels[loc]) * matrix[i][j]);
btotal += (blue(img.pixels[loc]) * matrix[i][j]);
}
}
rtotal = constrain(rtotal, 0, 255);
gtotal = constrain(gtotal, 0, 255);
btotal = constrain(btotal, 0, 255);
return color(rtotal, gtotal, btotal);
}
I have to do a bit of guesswork since I'm not positive about all of the functions you're using and I'm not familiar with the Processing 3+ library, but here's my best shot at it.
color convolution (int x, int y, float[][] matrix, int matrix_size, PImage img){
// Note: the 'matrix' parameter here will also frequently be referred to as
// a 'window' or 'kernel' in research
// I'm not certain what your PImage class is from, but I'll assume
// you're using the Processing 3+ library and work off of that assumption
// how much of each color we see within the kernel (matrix) space
float rtotal = 0.0;
float gtotal = 0.0;
float btotal = 0.0;
// this offset is to zero-center our kernel
// the fact that we use matrix_size / 2 sort of implicitly
// alludes to the fact that our matrix_size should be an odd-number
// so that we can have a middle-pixel
int offset = matrix_size / 2;
// looping through the kernel. the fact that we use 'matrix_size'
// as our end-condition for both dimensions means that our 'matrix' kernel
// must always be a square
for (int i = 0; i < matrix_size; i++){
for (int j= 0; j < matrix_size; j++){
// calculating the index conversion from 2D to the 1D format that PImage uses
// refer to: https://processing.org/tutorials/pixels/
// for a better understanding of PImage indexing (about 1/3 of the way down the page)
// WARNING: by subtracting the offset it is possible to hit negative
// x,y values here if you pick an x or y position less than matrix_size / 2.
// the same index-out-of-bounds can occur on the high end.
// When you convolve using a kernel of N x N size (N here would be matrix_size)
// you can only convolve from [N / 2, Width - (N / 2)] for x and y
int xloc = x+i-offset;
int yloc = y+j-offset;
// this is the final 1D PImage index that corresponds to [xloc, yloc] in our 2D image
// really go back up and take a look at the link if this doesn't make sense, it's pretty good
int loc = xloc + img.width*yloc;
// I have to do some speculation again since I'm not certain what red(img.pixels[loc]) does
// I'll assume it returns the red red channel of the pixel
// this section just adds up all of the pixel colors multiplied by the value in the kernel
rtotal += (red(img.pixels[loc]) * matrix[i][j]);
gtotal += (green(img.pixels[loc]) * matrix[i][j]);
btotal += (blue(img.pixels[loc]) * matrix[i][j]);
}
}
// the fact that no further division or averaging happens after the for-loops implies
// that the kernel you feed in should have balanced values for your kernel size
// for example, a kernel that's designed to average out the color over the 3 x 3 area
// it covers (this would be like blurring the image) would be filled with 1/9
// in general: the kernel you're using should have a sum of 1 for all of the numbers inside
// this is just 'in general' you can play around with not doing that, but you'll probably notice a
// darkening effect for when the sum is less than 1, and a brightening effect if it's greater than 1
// for more info on kernels, read this: https://en.wikipedia.org/wiki/Kernel_(image_processing)
// I don't have the code for this constrain function,
// but it's almost certainly just your typical clamp (constrains the values to [0, 255])
// Note: this means that your values saturate at 0 and 255
// if you see a lot of black or white then that means your kernel
// probably isn't balanced as mentioned above
rtotal = constrain(rtotal, 0, 255);
gtotal = constrain(gtotal, 0, 255);
btotal = constrain(btotal, 0, 255);
// Finished!
return color(rtotal, gtotal, btotal);
}
Related
Question is, how to clusterize pairs of some units by their angle? Problem is that, kmeans operates on the notion of Euclidean space distance and does not know about periodic nature of angles. So to make it work, one needs to translate the angle to Euclidean space but hold the following true:
close angles are close values in Euclidean space;
far angles are far in Euclidean space.
Which means, that, 90 and -90 are distant values, 180 and -180 is the same, 170 and -170 are close (angles come from left up and to right: 0 - +180 and from left down to the right: 0 - -180)
I tried to use various sin() functions but they all have issues mentioned in points 1 and 2. Most perspective one is sin(x * 0.5f) but also having the problem that 180 and -180 are distant values in Euclidean space.
The solution I found is to translate angles to points on circle and feed them into kmeans. This way we make it to compare distances between points and this works perfectly.
Important thing to mention. Kmeans #eps in termination criterion is expressed in terms of units of samples that you feed to kmeans. In our example maximal distant points have dist 200 units (2 * radius). This means that having 1.0f is totally fine. If you use cv::normalize(samples, samples, 0.0f, 1.0f) for your samples before calling kmeans(), adjust your #eps appropriately. Something like eps=0.01f plays better here.
Enjoy! Hope this helps someone.
static cv::Point2f angleToPointOnCircle(float angle, float radius, cv::Point2f origin /* center */)
{
float x = radius * cosf(angle * M_PI / 180.0f) + origin.x;
float y = radius * sinf(angle * M_PI / 180.0f) + origin.y;
return cv::Point2f(x, y);
}
static std::vector<std::pair<size_t, int> > biggestKmeansGroup(const std::vector<int> &labels, int count)
{
std::vector<std::pair<size_t, int> > indices;
std::map<int, size_t> l2cm;
for (int i = 0; i < labels.size(); ++i)
l2cm[labels[i]]++;
std::vector<std::pair<size_t, int> > c2lm;
for (std::map<int, size_t>::iterator it = l2cm.begin(); it != l2cm.end(); it++)
c2lm.push_back(std::make_pair(it->second, it->first)); // count, group
std::sort(c2lm.begin(), c2lm.end(), cmp_pair_first_reverse);
for (int i = 0; i < c2lm.size() && count-- > 0; i++)
indices.push_back(c2lm[i]);
return indices;
}
static void sortByAngle(std::vector<boost::shared_ptr<Pair> > &group,
std::vector<boost::shared_ptr<Pair> > &result)
{
std::vector<int> labels;
cv::Mat samples;
/* Radius is not so important here. */
for (int i = 0; i < group.size(); i++)
samples.push_back(angleToPointOnCircle(group[i]->angle, 100, cv::Point2f(0, 0)));
/* 90 degrees per group. May be less if you need it. */
static int PAIR_MAX_FINE_GROUPS = 4;
int groupNr = std::max(std::min((int)group.size(), PAIR_MAX_FINE_GROUPS), 1);
assert(group.size() >= groupNr);
cv::kmeans(samples.reshape(1, (int)group.size()), groupNr, labels,
cvTermCriteria(CV_TERMCRIT_EPS/* | CV_TERMCRIT_ITER*/, 30, 1.0f),
100, cv::KMEANS_RANDOM_CENTERS);
std::vector<std::pair<size_t, int> > biggest = biggestKmeansGroup(labels, groupNr);
for (int g = 0; g < biggest.size(); g++) {
for (int i = 0; i < group.size(); i++) {
if (labels[i] == biggest[g].second)
result.push_back(group[i]);
}
}
}
I'm trying to make a copy of the resizing algorithm of OpenCV with bilinear interpolation in C. What I want to achieve is that the resulting image is exactly the same (pixel value) to that produced by OpenCV. I am particularly interested in shrinking and not in the magnification, and I'm interested to use it on single channel Grayscale images. On the net I read that the bilinear interpolation algorithm is different between shrinkings and enlargements, but I did not find formulas for shrinking-implementations, so it is likely that the code I wrote is totally wrong. What I wrote comes from my knowledge of interpolation acquired in a university course in Computer Graphics and OpenGL. The result of the algorithm that I wrote are images visually identical to those produced by OpenCV but whose pixel values are not perfectly identical (in particular near edges). Can you show me the shrinking algorithm with bilinear interpolation and a possible implementation?
Note: The code attached is as a one-dimensional filter which must be applied first horizontally and then vertically (i.e. with transposed matrix).
Mat rescale(Mat src, float ratio){
float width = src.cols * ratio; //resized width
int i_width = cvRound(width);
float step = (float)src.cols / (float)i_width; //size of new pixels mapped over old image
float center = step / 2; //V1 - center position of new pixel
//float center = step / src.cols; //V2 - other possible center position of new pixel
//float center = 0.099f; //V3 - Lena 512x512 lower difference possible to OpenCV
Mat dst(src.rows, i_width, CV_8UC1);
//cycle through all rows
for(int j = 0; j < src.rows; j++){
//in each row compute new pixels
for(int i = 0; i < i_width; i++){
float pos = (i*step) + center; //position of (the center of) new pixel in old map coordinates
int pred = floor(pos); //predecessor pixel in the original image
int succ = ceil(pos); //successor pixel in the original image
float d_pred = pos - pred; //pred and succ distances from the center of new pixel
float d_succ = succ - pos;
int val_pred = src.at<uchar>(j, pred); //pred and succ values
int val_succ = src.at<uchar>(j, succ);
float val = (val_pred * d_succ) + (val_succ * d_pred); //inverting d_succ and d_pred, supposing "d_succ = 1 - d_pred"...
int i_val = cvRound(val);
if(i_val == 0) //if pos is a perfect int "x.0000", pred and succ are the same pixel
i_val = val_pred;
dst.at<uchar>(j, i) = i_val;
}
}
return dst;
}
Bilinear interpolation is not separable in the sense that you can resize vertically and the resize again vertically. See example here.
You can see OpenCV's resize code here.
I need to calculate the standard deviation on an image I have inside a UIImage object.
I know already how to access all pixels of an image, one at a time, so somehow I can do it.
I'm wondering if there is somewhere in the framework a function to perform this in a better and more efficient way... I can't find it so maybe it doensn't exist.
Do anyone know how to do this?
bye
To further expand on my comment above. I would definitely look into using the Accelerate framework, especially depending on the size of your image. If you image is a few hundred pixels by a few hundred. You will have a ton of data to process and Accelerate along with vDSP will make all of that math a lot faster since it processes everything on the GPU. I will look into this a little more, and possibly put some code in a few minutes.
UPDATE
I will post some code to do standard deviation in a single dimension using vDSP, but this could definitely be extended to 2-D
float *imageR = [0.1,0.2,0.3,0.4,...]; // vector of values
int numValues = 100; // number of values in imageR
float mean = 0; // place holder for mean
vDSP_meanv(imageR,1,&mean,numValues); // find the mean of the vector
mean = -1*mean // Invert mean so when we add it is actually subtraction
float *subMeanVec = (float*)calloc(numValues,sizeof(float)); // placeholder vector
vDSP_vsadd(imageR,1,&mean,subMeanVec,1,numValues) // subtract mean from vector
free(imageR); // free memory
float *squared = (float*)calloc(numValues,sizeof(float)); // placeholder for squared vector
vDSP_vsq(subMeanVec,1,squared,1,numValues); // Square vector element by element
free(subMeanVec); // free some memory
float sum = 0; // place holder for sum
vDSP_sve(squared,1,&sum,numValues); sum entire vector
free(squared); // free squared vector
float stdDev = sqrt(sum/numValues); // calculated std deviation
Please explain your query so that can come up with specific reply.
If I am getting you right then you want to calculate standard deviation of RGB of pixel or HSV of color, you can frame your own method of standard deviation for circular quantities in case of HSV and RGB.
We can do this by wrapping the values.
For example: Average of [358, 2] degrees is (358+2)/2=180 degrees.
But this is not correct because its average or mean should be 0 degrees.
So we wrap 358 into -2.
Now the answer is 0.
So you have to apply wrapping and then you can calculate standard deviation from above link.
UPDATE:
Convert RGB to HSV
// r,g,b values are from 0 to 1 // h = [0,360], s = [0,1], v = [0,1]
// if s == 0, then h = -1 (undefined)
void RGBtoHSV( float r, float g, float b, float *h, float *s, float *v )
{
float min, max, delta;
min = MIN( r, MIN(g, b ));
max = MAX( r, MAX(g, b ));
*v = max;
delta = max - min;
if( max != 0 )
*s = delta / max;
else {
// r = g = b = 0
*s = 0;
*h = -1;
return;
}
if( r == max )
*h = ( g - b ) / delta;
else if( g == max )
*h=2+(b-r)/delta;
else
*h=4+(r-g)/delta;
*h *= 60;
if( *h < 0 )
*h += 360;
}
and then calculate standard deviation for hue value by this:
double calcStddev(ArrayList<Double> angles){
double sin = 0;
double cos = 0;
for(int i = 0; i < angles.size(); i++){
sin += Math.sin(angles.get(i) * (Math.PI/180.0));
cos += Math.cos(angles.get(i) * (Math.PI/180.0));
}
sin /= angles.size();
cos /= angles.size();
double stddev = Math.sqrt(-Math.log(sin*sin+cos*cos));
return stddev;
}
I'm doing project in OpenCV on object detection which consists of matching the object in template image with the reference image. Using SIFT algorithm the features get acurately detected and matched but I want a rectagle around the matched features
My algorithm uses the KD-Tree est ean First technique to get the matches
If you want a rectangle around the detected object, here you have code example with exactly that. You just need to draw a rectangle around the homography H.
Hope it helps. Good luck.
I use the following code, adapted from the SURF algoritm in OpenCV (modules/features2d/src/surf.cpp) to extract a surrounding of a keypoint.
Apart from other examples based on rectangles and ROI, this code returns the patch correctly oriented according to the orientation and scale determined by the feature detection algorithm (both available in the KeyPoint struct).
An example of the results of the detection on several different images:
const int PATCH_SZ = 20;
Mat extractKeyPoint(const Mat& image, KeyPoint kp)
{
int x = (int)kp.pt.x;
int y = (int)kp.pt.y;
float size = kp.size;
float angle = kp.angle;
int win_size = (int)((PATCH_SZ+1)*size*1.2f/9.0);
Mat win(win_size, win_size, CV_8UC3);
float descriptor_dir = angle * (CV_PI/180);
float sin_dir = sin(descriptor_dir);
float cos_dir = cos(descriptor_dir);
float win_offset = -(float)(win_size-1)/2;
float start_x = x + win_offset*cos_dir + win_offset*sin_dir;
float start_y = y - win_offset*sin_dir + win_offset*cos_dir;
uchar* WIN = win.data;
uchar* IMG = image.data;
for( int i = 0; i < win_size; i++, start_x += sin_dir, start_y += cos_dir )
{
float pixel_x = start_x;
float pixel_y = start_y;
for( int j = 0; j < win_size; j++, pixel_x += cos_dir, pixel_y -= sin_dir )
{
int x = std::min(std::max(cvRound(pixel_x), 0), image.cols-1);
int y = std::min(std::max(cvRound(pixel_y), 0), image.rows-1);
for (int c=0; c<3; c++) {
WIN[i*win_size*3 + j*3 + c] = IMG[y*image.step1() + x*3 + c];
}
}
}
return win;
}
I am not sure if the scale is entirely OK, but it is taken from the SURF source and the results look relevant to me.
I'm convoluting an image (512*512) with a FFT filter (kernelsize=10), it looks good.
But when I compare it with an image which I convoluted the normal way the result was horrible.
The PSNR is about 35.
67,187/262,144 Pixel values have a difference of 1 or more(peak at ~8) (having a max pixel value of 255).
My question is, is it normal when convoluting in frequency space or might there be a problem with my convolution/transforming functions? . Because the strange thing is that I should get better results when using double as data-type. But it stays COMPLETELY the same.
When I transform an image into frequency space, DON'T convolute it, then transform it back it's fine and the PSNR is about 140 when using float.
Also, due to the pixel differences being only 1-10 I think I can rule out scaling errors
EDIT: More Details for bored interested people
I use the open source kissFFT library. With real 2dimensional input (kiss_fftndr.h)
My Image Datatype is PixelMatrix. Simply a matrix with alpha, red, green and blue values from 0.0 to 1.0 float
My kernel is also a PixelMatrix.
Here some snippets from the Convolution function
Used datatypes:
#define kiss_fft_scalar float
#define kiss_fft_cpx struct {
kiss_fft_scalar r;
kiss_fft_scalar i,
}
Configuration of the FFT:
//parameters to kiss_fftndr_alloc:
//1st param = array with the size of the 2 dimensions (in my case dim={width, height})
//2nd param = count of the dimensions (in my case 2)
//3rd param = 0 or 1 (forward or inverse FFT)
//4th and 5th params are not relevant
kiss_fftndr_cfg stf = kiss_fftndr_alloc(dim, 2, 0, 0, 0);
kiss_fftndr_cfg sti = kiss_fftndr_alloc(dim, 2, 1, 0, 0);
Padding and transforming the kernel:
I make a new array:
kiss_fft_scalar kernel[width*height];
I fill it with 0 in a loop.
Then I fill the middle of this array with the kernel I want to use.
So if I would use a 2*2 kernel with values 1/4, 1/4, 1/4 and 1/4 it would look like
0 0 0 0 0 0
0 1/4 1/4 0
0 1/4 1/4 0
0 0 0 0 0 0
The zeros are padded until they reach the size of the image.
Then I swap the quadrants of the image diagonally. It looks like:
1/4 0 0 1/4
0 0 0 0
0 0 0 0
1/4 0 0 1/4
now I transform it: kiss_fftndr(stf, floatKernel, outkernel);
outkernel is declarated as
kiss_fft_cpx outkernel= new kiss_fft_cpx[width*height]
Getting the colors into arrays:
kiss_fft_scalar *red = new kiss_fft_scalar[width*height];
kiss_fft_scalar *green = new kiss_fft_scalar[width*height];
kiss_fft-scalar *blue = new kiss_fft_scalar[width*height];
for(int i=0; i<height; i++) {
for(int j=0; i<width; j++) {
red[i*height+j] = input.get(j,i).getRed(); //input is the input image pixel matrix
green[i*height+j] = input.get(j,i).getGreen();
blue{i*height+j] = input.get(j,i).getBlue();
}
}
Then I transform the arrays:
kiss_fftndr(stf, red, outred);
kiss_fftndr(stf, green, outgreen);
kiss_fftndr(stf, blue, outblue); //the out-arrays are type kiss_fft_cpx*
The convolution:
What we have now:
3 transformed color arrays from type kiss_fft_cpx*
1 transformed kernel array from type kiss_fft_cpx*
They are both complex arrays
Now comes the convolution:
for(int m=0; m<til; m++) {
for(int n=0; n<til; n++) {
kiss_fft_scalar real = outcolor[m*til+n].r; //I do that for all 3 arrys in my code!
kiss_fft_scalar imag = outcolor[m*til+n].i; //so I have realred, realgreen, realblue
kiss_fft_scalar realMask = outkernel[m*til+n].r; // and imagred, imaggreen, etc.
kiss_fft_scalar imagMask = outkernel[m*til+n].i;
outcolor[m*til+n].r = real * realMask - imag * imagMask; //Same thing here in my code i
outcolor[m*til+n].i = real * imagMask + imag * realMask; //do it with all 3 colors
}
}
Now I transform them back:
kiss_fftndri(sti, outred, red);
kiss_fftndri(sti, outgreen, green);
kiss_fftndri(sti, outblue, blue);
and I create a new Pixel Matrix with the values from the color-arrays
PixelMatrix output;
for(int i=0; i<height; i++) {
for(int j=0; j<width; j++) {
Pixel p = new Pixel();
p.setRed( red[i*height+j] / (width*height) ); //I divide through (width*height) because of the scaling happening in the FFT;
p.setGreen( green[i*height+j] );
p.setBlue( blue[i*height+j] );
output.set(j , i , p);
}
}
Notes:
I already take care in advance that the image has a size with a power of 2 (256*256), (512*512) etc.
Examples:
kernelsize: 10
Input:
Output:
Output from normal convolution:
my console says :
142519 out of 262144 Pixels have a difference of 1 or more (maxRGB = 255)
PSNR: 32.006027221679688
MSE: 44.116752624511719
though for my eyes they look the same °.°
Maybe one person is bored and goes through the code. It's not urgent, but it's a kind of problem I just want to know what the hell I did wrong ^^
Last, but not least, my PSNR function, though I don't really think that's the problem :D
void calculateThePSNR(const PixelMatrix first, const PixelMatrix second, float* avgpsnr, float* avgmse) {
int height = first.getHeight();
int width = first.getWidth();
BMP firstOutput;
BMP secondOutput;
firstOutput.SetSize(width, height);
secondOutput.SetSize(width, height);
double rsum=0.0, gsum=0.0, bsum=0.0;
int count = 0;
int total = 0;
for(int i=0; i<height; i++) {
for(int j=0; j<width; j++) {
Pixel pixOne = first.get(j,i);
Pixel pixTwo = second.get(j,i);
double redOne = pixOne.getRed()*255;
double greenOne = pixOne.getGreen()*255;
double blueOne = pixOne.getBlue()*255;
double redTwo = pixTwo.getRed()*255;
double greenTwo = pixTwo.getGreen()*255;
double blueTwo = pixTwo.getBlue()*255;
firstOutput(j,i)->Red = redOne;
firstOutput(j,i)->Green = greenOne;
firstOutput(j,i)->Blue = blueOne;
secondOutput(j,i)->Red = redTwo;
secondOutput(j,i)->Green = greenTwo;
secondOutput(j,i)->Blue = blueTwo;
if((redOne-redTwo) > 1.0 || (redOne-redTwo) < -1.0) {
count++;
}
total++;
rsum += (redOne - redTwo) * (redOne - redTwo);
gsum += (greenOne - greenTwo) * (greenOne - greenTwo);
bsum += (blueOne - blueTwo) * (blueOne - blueTwo);
}
}
fprintf(stderr, "%d out of %d Pixels have a difference of 1 or more (maxRGB = 255)", count, total);
double rmse = rsum/(height*width);
double gmse = gsum/(height*width);
double bmse = bsum/(height*width);
double rpsnr = 20 * log10(255/sqrt(rmse));
double gpsnr = 20 * log10(255/sqrt(gmse));
double bpsnr = 20 * log10(255/sqrt(bmse));
firstOutput.WriteToFile("test.bmp");
secondOutput.WriteToFile("test2.bmp");
system("display test.bmp");
system("display test2.bmp");
*avgmse = (rmse + gmse + bmse)/3;
*avgpsnr = (rpsnr + gpsnr + bpsnr)/3;
}
Phonon had the right idea. Your images are shifted. If you shift your image by (1,1), then the MSE will be approximately zero (provided that you mask or crop the images accordingly). I confirmed this using the code (Python + OpenCV) below.
import cv
import sys
import math
def main():
fname1, fname2 = sys.argv[1:]
im1 = cv.LoadImage(fname1)
im2 = cv.LoadImage(fname2)
tmp = cv.CreateImage(cv.GetSize(im1), cv.IPL_DEPTH_8U, im1.nChannels)
cv.AbsDiff(im1, im2, tmp)
cv.Mul(tmp, tmp, tmp)
mse = cv.Avg(tmp)
print 'MSE:', mse
psnr = [ 10*math.log(255**2/m, 10) for m in mse[:-1] ]
print 'PSNR:', psnr
if __name__ == '__main__':
main()
Output:
MSE: (0.027584912741602553, 0.026742391458366047, 0.028147870144492403, 0.0)
PSNR: [63.724087463606452, 63.858801190963192, 63.636348220531396]
My advice for you to try to implement the following code :
A=double(inputS(1:10:length(inputS))); %segmentation
A(:)=-A(:);
%process the image or signal by fast fourior transformation and inverse fft
fresult=fft(inputS);
fresult(1:round(length(inputS)*2/fs))=0;
fresult(end-round(length(fresult)*2/fs):end)=0;
Y=real(ifft(fresult));
that's code help you to obtain the same size image and good for remove DC component ,the you can to convolution.