Image Processing: Image has grid lines after applying filter - opencv

I'm very new to working with image processing at a low level and have just had a go at implementing a gaussian kernel with both GPU and CPU - however both yield the same output, an image which is severely skewed by a grid:
I'm aware I could use OpenCV's pre-built functions to handle the filters, but I wanted to learn the methodology behind it, so I built my own.
Convolution kernel:
// Convolution kernel - this manipulates the given channel and writes out a new blurred channel.
void convoluteChannel_cpu(
const unsigned char* const channel, // Input channel
unsigned char* const channelBlurred, // Output channel
const size_t numRows, const size_t numCols, // Channel width/height (rows, cols)
const float *filter, // The weight of sigma, to convulge
const int filterWidth // This is normally a sample of 9
)
{
// Loop through the images given R, G or B channel
for(int rows = 0; rows < (int)numRows; rows++)
{
for(int cols = 0; cols < (int)numCols; cols++)
{
// Declare new pixel colour value
float newColor = 0.f;
// Loop for every row along the stencil size (3x3 matrix)
for(int filter_x = -filterWidth/2; filter_x <= filterWidth/2; filter_x++)
{
// Loop for every col along the stencil size (3x3 matrix)
for(int filter_y = -filterWidth/2; filter_y <= filterWidth/2; filter_y++)
{
// Clamp to the boundary of the image to ensure we don't access a null index.
int image_x = __min(__max(rows + filter_x, 0), static_cast<int>(numRows -1));
int image_y = __min(__max(cols + filter_y, 0), static_cast<int>(numCols -1));
// Assign the new pixel value to the current pixel, numCols and numRows are both 3, so we only
// need to use one to find the current pixel index (similar to how we find the thread in a block)
float pixel = static_cast<float>(channel[image_x * numCols + image_y]);
// Sigma is the new weight to apply to the image, we perform the equation to get a radnom weighting,
// if we don't do this the image will become choppy.
float sigma = filter[(filter_x + filterWidth / 2) * filterWidth + filter_y + filterWidth/2];
//float sigma = 1 / 81.f;
// Set the new pixel value
newColor += pixel * sigma;
}
}
// Set the value of the next pixel at the current image index with the newly declared color
channelBlurred[rows * numCols + cols] = newColor;
}
}
}
I call this 3 times from another method which splits the image into respective R, G, B channels, but I don't believe this would cause the image to be so severely mutated.
Has anybody encountered a problem similar to this before, and if so how did you solve it?
EDIT Channel Splitting Func:
void gaussian_cpu(
const uchar4* const rgbaImage, // Our input image from the camera
uchar4* const outputImage, // The image we are writing back for display
size_t numRows, size_t numCols, // Width and Height of the input image (rows/cols)
const float* const filter, // The value of sigma
const int filterWidth // The size of the stencil (3x3) 9
)
{
// Build an array to hold each channel for the given image
unsigned char *r_c = new unsigned char[numRows * numCols];
unsigned char *g_c = new unsigned char[numRows * numCols];
unsigned char *b_c = new unsigned char[numRows * numCols];
// Build arrays for each of the output (blurred) channels
unsigned char *r_bc = new unsigned char[numRows * numCols];
unsigned char *g_bc = new unsigned char[numRows * numCols];
unsigned char *b_bc = new unsigned char[numRows * numCols];
// Separate the image into R,G,B channels
for(size_t i = 0; i < numRows * numCols; i++)
{
uchar4 rgba = rgbaImage[i];
r_c[i] = rgba.x;
g_c[i] = rgba.y;
b_c[i] = rgba.z;
}
// Convolute each of the channels using our array
convoluteChannel_cpu(r_c, r_bc, numRows, numCols, filter, filterWidth);
convoluteChannel_cpu(g_c, g_bc, numRows, numCols, filter, filterWidth);
convoluteChannel_cpu(b_c, b_bc, numRows, numCols, filter, filterWidth);
// Recombine the channels to build the output image - 255 for alpha as we want 0 transparency
for(size_t i = 0; i < numRows * numCols; i++)
{
uchar4 rgba = make_uchar4(r_bc[i], g_bc[i], b_bc[i], 255);
outputImage[i] = rgba;
}
}
EDIT Calling the kernel
while(gpu_frames > 0)
{
//cout << gpu_frames << "\n";
camera >> frameIn;
// Allocate I/O Pointers
beginStream(&h_inputFrame, &h_outputFrame, &d_inputFrame, &d_outputFrame, &d_redBlurred, &d_greenBlurred, &d_blueBlurred, &_h_filter, &filterWidth, frameIn);
// Show the source image
imshow("Source", frameIn);
g_timer.Start();
// Allocate mem to GPU
allocateMemoryAndCopyToGPU(numRows(), numCols(), _h_filter, filterWidth);
// Apply the gaussian kernel filter and then free any memory ready for the next iteration
gaussian_gpu(h_inputFrame, d_inputFrame, d_outputFrame, numRows(), numCols(), d_redBlurred, d_greenBlurred, d_blueBlurred, filterWidth);
// Output the blurred image
cudaMemcpy(h_outputFrame, d_frameOut, sizeof(uchar4) * numPixels(), cudaMemcpyDeviceToHost);
g_timer.Stop();
cudaDeviceSynchronize();
gpuTime += g_timer.Elapsed();
cout << "Time for this kernel " << g_timer.Elapsed() << "\n";
Mat outputFrame(Size(numCols(), numRows()), CV_8UC1, h_outputFrame, Mat::AUTO_STEP);
clean_mem();
imshow("Dest", outputFrame);
// 1ms delay to prevent system from being interrupted whilst drawing the new frame
waitKey(1);
gpu_frames--;
}
And then within the beginStream() method, images are converted to uchar4:
// Allocate host variables, casting the frameIn and frameOut vars to uchar4 elements, these will
// later be processed by the kernel
*h_inputFrame = (uchar4 *)frameIn.ptr<unsigned char>(0);
*h_outputFrame = (uchar4 *)frameOut.ptr<unsigned char>(0);

There are many doubts in the problem.
At the start of the code, its mentioned that the filter width is 9, thus making it a 9x9 kernel. But in some other comments its said to be 3. So I am guessing that you are actually using a 9x9 kernel and the filter do have the 81 weights in them.
But the above output can never be due to the above mentioned confusion.
uchar4 is of 4-byte size. Thus in gaussian_cpu while splitting the data by running the loop over rgbaImage[i] on an image that doesnot contain alpha value (it could be inferred from the above mentioned loop that alpha is not present) what actually gets done is that your are copying R1,G2,B3,R5,G6,B7 and so on to the red-channel. Better you initially try the code on a grayscale image and make sure you are using uchar instead of uchar4.
The output image seems exactly 1/3rd the width of the original image, which makes the above assumption to be true.
EDIT 1:
Is the input rgbaImage to guassian_cpu function RGBA or RGB? videoCapture must be giving a 3 channel output. The initialization of *h_inputFrame (to uchar4) itself is wrong as its pointing to 3 channel data.
Similarly the output data is four channel data, but Mat outputFrame is declared as a single channel which points to this four channel data. Try Mat outputFrame as 8UC3 type and see the result.
Also, how is the code working, the guassian_cpu() function has 7 input parameters in the definition, but when you call the function 8 parameters are used. Hope this is just a typo.

Related

How to correctly manipulate a CV_16SC3 Mat in a CUDA Kernel

I am writing a CUDA Program while working with OpenCV. I have an empty Mat of a given size (e.g. 1000x800) which I explicitly converted to GPUMat with dataytpe CV_16SC3. It is desired to manipulate the Image in this format in the CUDA Kernel. However trying to manipulate the Mat does not seem to work correctly.
I am calling my CUDA kernel as follows:
my_kernel <<< gridDim, blockDim >>>( (unsigned short*)img.data, img.cols, img.rows, img.step);
and my sample kernel looks like this
__global__ void my_kernel( unsigned short* img, int width, int height, int img_step)
{
int x, y, pixel;
y = blockIdx.y * blockDim.y + threadIdx.y;
x = blockIdx.x * blockDim.x + threadIdx.x;
if (y >= height)
return;
if (x >= width)
return;
pixel = (y * (img_step)) + (3 * x);
img[pixel] = 255; //I know 255 is basically an uchar, this is just part of my test
img[pixel+1] = 255
img[pixel+2] = 255;
}
I am expecting this small kernel sample to write al pixels to white. However, after downloading the Mat again from the GPU and visualizing it with imshow, not all the pixels are white and some weird black lines are present, which makes me believe that somehow I am writing to invalid memory addresses.
My guess is the following. The OpenCV documentation states that cv::mat::data returns an uchar pointer. However, my Mat has a data type "16U" (short unsigned to my knowledge). That is why in the kernel launch I am casting the pointer to (unsigned short*). But apparently that is incorrect.
How should I correctly proceed to be able to read and write the Mat data as short in my kernel?
First of all, the input image type should be short instead of unsigned short because the type of Mat is 16SC3 ( rather than 16UC3 ).
Now, since the image step is in bytes and the data type is short, the pixel index ( or address ) should be calculated taken into account the difference in byte width of those. There are 2 ways to fix this issue.
Method 1:
__global__ void my_kernel( short* img, int width, int height, int img_step)
{
int x, y, pixel;
y = blockIdx.y * blockDim.y + threadIdx.y;
x = blockIdx.x * blockDim.x + threadIdx.x;
if (y >= height)
return;
if (x >= width)
return;
//Reinterpret the input pointer as char* to allow jump in bytes instead of short
char* imgBytes = reinterpret_cast<char*>(img);
//Calculate row start address using the newly created pointer
char* rowStartBytes = imgBytes + (y * img_step); // Jump in byte
//Reinterpret the row start address back to required data type.
short* rowStartShort = reinterpret_cast<short*>(rowStartBytes);
short* pixelAddress = rowStartShort + ( 3 * x ); // Jump in short
//Modify the image values
pixelAddress[0] = 255;
pixelAddress[1] = 255;
pixelAddress[2] = 255;
}
Method 2:
Divide the input image step by the size of required data type (short). It may be done when passing the step as a kernel argument.
my_kernel<<<grid,block>>>( img, width, height, img_step/sizeof(short));
I have used method 2 for quite a long time. It is a shortcut method, but later on when I got to look at the source code of certain image processing libraries, I realized that actually Method 1 is more portable, since the size of type can vary across different platforms.

How to create OpenCV images that are contiguous in memory?

I am an OpenCV newbie. I create a OpenCV image using cvCreateImage and apply some operations on it. Now, I want to create a series of OpenCV images whose underlying memory is contiguous. This can be helpful to process that memory later as a series of image frames using parallel or CUDA techniques.
How can I create a certain number of OpenCV images that are contiguous in memory?
You can allocate the data yourself:
const int W = 640;
const int H = 480;
const int C = 1; // number of channels (1 for CV_8U)
const int N = 10; // number of images
unsigned char buffer[W*H*C*N];
cv::Mat im0(H, W, CV_8U, buffer);
cv::Mat im1(H, W, CV_8U, buffer + W*H*C);
cv::Mat im2(H, W, CV_8U, buffer + W*H*C*2);
I have used the C++ API because I'm more used to it, but there must exist a similar behaviour in the C api with the cvCreateImage function.
You could use a cv::Mat to manage the storage, then you don't to have remember to delete the storage.
Assuming 3 channel images:
const int W = 640; const int H = 480; const int C = 1;
const int N = 10; // number of images
cv::Mat buffer (N, W * H, CV_8UC3);
cv::Mat im0(H, W, CV_8UC3, buffer.ptr<uchar>(0));
cv::Mat im1(H, W, CV_8UC3, buffer.ptr<uchar>(1));
cv::Mat im2(H, W, CV_8UC3, buffer.ptr<uchar>(2));

Creating a Mat object from a YV12 image buffer

I have a buffer which contains an image in YV12 format. Now I want to either convert this buffer to RGB format or create a Mat object from it directly! Can someone help me? I tried this code :
cv::Mat input(widthOfImg, heightOfImg, CV_8UC1, vy12Buffer);
cv::Mat converted;
cv::cvtColor(input, converted, CV_YUV2RGB_YV12);
That's possible.
cv::Mat picYV12 = cv::Mat(nHeight * 3/2, nWidth, CV_8UC1, yv12DataBuffer);
cv::Mat picBGR;
cv::cvtColor(picYV12, picBGR, CV_YUV2BGR_YV12);
cv::imwrite("test.bmp", picBGR); //only for test
Opencv color conversion flags
The height is multiplied by 3/2 because there are 4 Y samples, and 1 U and 1 V sample stored for every 2x2 square of pixels. This results in a byte sample to pixel ratio of 3/2
4*1+1+1 samples per 2*2 pixels = 6/4 = 3/2
YV12 Format
Correction: In the last version of OpenCV (i use oldest 2.4.13 version) is color conversion code changed to
COLOR_YUV2BGR_YV12
cv::cvtColor(picYV12, picBGR, COLOR_YUV2BGR_YV12);
here is the corresponding version in java (Android)...
This method was faster than other techniques like renderscript or opengl(glReadPixels) for getting bitmap from yuv12/i420 data stream (tested with webrtc i420 ).
long startTimei = SystemClock.uptimeMillis();
Mat picyv12 = new Mat(768,512,CV_8UC1); //(im_height*3/2,im_width), should be even no...
picyv12.put(0,0,return_buff); // buffer - byte array with i420 data
Imgproc.cvtColor(picyv12,picyv12,COLOR_YUV2RGB_YV12);// or use COLOR_YUV2BGR_YV12 depending on output result
long endTimei = SystemClock.uptimeMillis();
Log.d("i420_time", Long.toString(endTimei - startTimei));
Log.d("picyv12_size", picyv12.size().toString()); // Check size
Log.d("picyv12_type", String.valueOf(picyv12.type())); // Check type
Utils.matToBitmap(picyv12,tbmp2); // Convert mat to bitmap (height, width) i.e (512,512) - ARGB_888
save(tbmp2,"itest"); // Save bitmap
That's impossible.
Y'UV420p is a planar format, meaning that the Y', U, and V values are
grouped together instead of interspersed. The reason for this is that
by grouping the U and V values together, the image becomes much more
compressible. When given an array of an image in the Y'UV420p format,
all the Y' values come first, followed by all the U values, followed
finally by all the V values.
but cv::Mat is a RGB color model, and arranged like B0 G0 R0 B1 G1 R1... So,we can't create a Mat object from a YV12 buffer directly.
Here is an example:
cv::Mat Yv12ToRgb( uchar *pBuffer,long bufferSize, int width,int height )
{
cv::Mat result(height,width,CV_8UC3);
uchar y,cb,cr;
long ySize=width*height;
long uSize;
uSize=ySize>>2;
assert(bufferSize==ySize+uSize*2);
uchar *output=result.data;
uchar *pY=pBuffer;
uchar *pU=pY+ySize;
uchar *pV=pU+uSize;
uchar r,g,b;
for (int i=0;i<uSize;++i)
{
for(int j=0;j<4;++j)
{
y=pY[i*4+j];
cb=ucharpU[i];
cr=ucharpV[i];
//ITU-R standard
b=saturate_cast<uchar>(y+1.772*(cb-128));
g=saturate_cast<uchar>(y-0.344*(cb-128)-0.714*(cr-128));
r=saturate_cast<uchar>(y+1.402*(cr-128));
*output++=b;
*output++=g;
*output++=r;
}
}
return result;
}
You can try as YUV_I420 array
char filePath[3000];
int width, height;
cout << "file path = ";
cin >> filePath;
cout << "width = ";
cin >> width;
cout << "height = ";
cin >> height;
FILE *pFile = fopen(filePath, "rb");
unsigned char* buff = new unsigned char[width * height *3 / 2];
fread(buff, 1, width * height* 3 / 2, pFile);
fclose(pFile);
cv::Mat imageRGB;
cv::Mat picI420 = cv::Mat(height * 3 / 2, width, CV_8UC1, buff);
cv::cvtColor(picI420, imageRGB, CV_YUV2BGRA_I420);
imshow("imageRGB", imageRGB);
waitKey(0);

Unknown error when inverting image using cuda

i began to implement some simple image processing using cuda but i have an error in my code
the error happens when i copy pixels from device to host
this is my try
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <opencv2\core\core.hpp>
#include <opencv2\highgui\highgui.hpp>
#include <stdio.h>
using namespace cv;
unsigned char *h_pixels;
unsigned char *d_pixels;
int bufferSize;
int width,height;
const int BLOCK_SIZE = 32;
Mat image;
void get_pixels(const char* fileName)
{
image = imread(fileName);
bufferSize = image.size().width * image.size().height * 3 * sizeof(unsigned char);
width = image.size().width;
height = image.size().height;
h_pixels = new unsigned char[bufferSize];
memcpy(h_pixels,image.data,bufferSize);
}
__global__ void invert_image(unsigned char* pixels,int width,int height)
{
int row = blockIdx.y * BLOCK_SIZE + threadIdx.y;
int col = blockIdx.x * BLOCK_SIZE + threadIdx.x;
int cidx = (row * width + col) * 3;
pixels[cidx] = 255 - pixels[cidx];
pixels[cidx + 1] = 255 - pixels[cidx + 1];
pixels[cidx + 2] = 255 - pixels[cidx + 2];
}
int main()
{
get_pixels("D:\\photos\\z.jpg");
cudaError_t err = cudaMalloc((void**)&d_pixels,bufferSize);
err = cudaMemcpy(d_pixels,h_pixels,bufferSize,cudaMemcpyHostToDevice);
dim3 dimBlock(BLOCK_SIZE,BLOCK_SIZE);
dim3 dimGrid(width/dimBlock.x,height/dimBlock.y);
invert_image<<<dimBlock,dimGrid>>>(d_pixels,width,height);
unsigned char *pixels = new unsigned char[bufferSize];
err= cudaMemcpy(pixels,d_pixels,bufferSize,cudaMemcpyDeviceToHost);// unknown error
const char * errStr = cudaGetErrorString(err);
cudaFree(d_pixels);
image.data = pixels;
namedWindow("display image");
imshow("display image",image);
waitKey();
return 0;
}
also how can i find out error that occurs in cuda device
thanks for your help
OpenCV images are not continuous. Each row is 4 byte or 8 byte aligned. You should also pass the step field of the Mat to the CUDA kernel, so that you can calculate the cidx correctly. The generic formula to calculate the output index is:
cidx = row * (step/elementSize) + (NumberOfChannels * col);
in your case, it will be:
cidx = row * step + (3 * col);
Referring to the alignment of images, you buffer size is equal to image.step * image.size().height.
Next thing is the one pointed out by #phoad in the third point. You should create enough number of thread blocks to cover the whole image.
Here is a generic formula for Grid which will create enough number of blocks for any image size.
dim3 block(BLOCK_SIZE,BLOCK_SIZE);
dim3 grid((width + block.x - 1)/block.x,(height + block.y - 1)/block.y);
First of all be sure that the image file is read correctly.
Check if the device memory is allocated with CUDA_SAFE_CALL(cudaMalloc(..))
Check the dimensions of the image. If the dimension of the image is not multiples of BLOCKSIZE than you might be missing some indices and the image is not fully inverted.
Call cudaDeviceSynchronize after the kernel call and check its return value.
Do you get any error when you run the code without calling the kernel anyway?
You are not freeing the h_pixels and might have a memory leak.
Instead of using BLOCKSIZE in the kernel you might use "blockDim.x". So calculating indices like "blockIdx.x * blockDim.x + threadIdx.x"
Try to do not touch the memory area in the kernel code, namely comment out the memory updates at the kernel (the lines where you access the pixels array) and check if the program continues to fail. If it does not continue to fail you might be accessing out of the bounds.
Use this command immediately after the kernel invocation to print the kernel errors:
printf("error code: %s\n",cudaGetErrorString(cudaGetLastError()))

writing to IplImage imageData

I want to write data directly into the imageData array of an IplImage, but I can't find a lot of information on how it's formatted. One thing that's particularly troubling me is that, despite creating an image with three channels, there are four bytes to each pixel.
The function I'm using to create the image is:
IplImage *frame = cvCreateImage(cvSize(1, 1), IPL_DEPTH_8U, 3);
By all indications, this should create a three channel RGB image, but that doesn't seem to be the case.
How would I, for example, write a single red pixel to that image?
Thanks for any help, it's get me stumped.
If you are looking at frame->imageSize keep in mind that it is frame->height * frame->widthStep, not frame->height * frame->width.
BGR is the native format of OpenCV, not RGB.
Also, if you're just getting started, you should consider using the C++ interface (where Mat replaces IplImage) since that is the future direction and it's a lot easier to work with.
Here's some sample code that accesses pixel data directly:
int main (int argc, const char * argv[]) {
IplImage *frame = cvCreateImage(cvSize(41, 41), IPL_DEPTH_8U, 3);
for( int y=0; y<frame->height; y++ ) {
uchar* ptr = (uchar*) ( frame->imageData + y * frame->widthStep );
for( int x=0; x<frame->width; x++ ) {
ptr[3*x+2] = 255; //Set red to max (BGR format)
}
}
cvNamedWindow("window", CV_WINDOW_AUTOSIZE);
cvShowImage("window", frame);
cvWaitKey(0);
cvReleaseImage(&frame);
cvDestroyWindow("window");
return 0;
}
unsigned char* imageData = [r1, g1, b1, r2, g2, b2, ..., rN, bn, gn]; // n = height*width of image
frame->imageData = imageData.
Take Your image that is a dimensional array of height N and width M and arrange it into a row-wise vector of length N*M. Make this of type unsigned char* for IPL_DEPTH_8U images.
Straight to your answer, painting the pixel red:
IplImage *frame = cvCreateImage(cvSize(1, 1), IPL_DEPTH_8U, 3);
int y,x;
x=0;y=0; //Pixel coordinates. Use this for bigger images than a single pixel.
int C=2; //0 for blue, 1 for green and 2 for red (BGR is the default format).
frame->imageData[y*frame->widthStep+3*x+C]=(uchar)255;

Resources