Is there any way to convert an Eigen::Matrix back to itk::image? - image-processing

I used Eigen library to convert several itk::image images into matrices, and do some dense linear algebra computations on them. Finally, I have the output as a matrix, but I need it in itk::image form. Is there any way to do this?
const unsigned int numberOfPixels = importSize[0] * importSize[1];
float* array1 = inverseU.data();
float* localBuffer = new float[numberOfPixels];
std::memcpy(localBuffer, array1, numberOfPixels);
const bool importImageFilterWillOwnTheBuffer = true;
importFilter->SetImportPointer(localBuffer,numberOfPixels,importImageFilterWillOwnTheBuffer);
importFilter->Update();
inverseU is the Eigen library matrix (float), importSize is the size of this matrix. When I give importFilter->GetOutput(), and write the result to file, the image I get is like this, which is not correct.
This is the matrix inverseU.
https://drive.google.com/file/d/0B3L9EtRhN11QME16SGtfSDJzSWs/view?usp=sharing . It is supposed to give a retinal fundus image in image form, I got the matrix after doing deblurring.

Take a look at the ImportImageFilter of itk. In particular, it may be used to build an itk::Image starting from a C-style array (example).
Someone recently asked how to convert a CImg image to ITK image. My answer might be a starting point...
A way to get the array out of a matrix A from Eigen may be found here :
double* array=A.data();
EDIT : here is a piece of code to turn a matrix of float into a png image saved with ITK. First, the matrix is converted to an itk Image of float. Then, this image is rescaled an cast to a image of unsigned char, using the RescaleIntensityImageFilter as explained here. Finally, the image is saved in png format.
#include <iostream>
#include <itkImage.h>
using namespace itk;
using namespace std;
#include <Eigen/Dense>
using Eigen::MatrixXf;
#include <itkImportImageFilter.h>
#include <itkImageFileWriter.h>
#include "itkRescaleIntensityImageFilter.h"
void eigen_To_ITK (MatrixXf mat)
{
const unsigned int Dimension = 2;
typedef itk::Image<unsigned char, Dimension> UCharImageType;
typedef itk::Image< float, Dimension > FloatImageType;
typedef itk::ImportImageFilter< float, Dimension > ImportFilterType;
ImportFilterType::Pointer importFilter = ImportFilterType::New();
typedef itk::RescaleIntensityImageFilter< FloatImageType, UCharImageType > RescaleFilterType;
RescaleFilterType::Pointer rescaleFilter = RescaleFilterType::New();
typedef itk::ImageFileWriter< UCharImageType > WriterType;
WriterType::Pointer writer = WriterType::New();
FloatImageType::SizeType imsize;
imsize[0] = mat.rows();
imsize[1] = mat.cols();
ImportFilterType::IndexType start;
start.Fill( 0 );
ImportFilterType::RegionType region;
region.SetIndex( start );
region.SetSize( imsize );
importFilter->SetRegion( region );
const itk::SpacePrecisionType origin[ Dimension ] = { 0.0, 0.0 };
importFilter->SetOrigin( origin );
const itk::SpacePrecisionType spacing[ Dimension ] = { 1.0, 1.0 };
importFilter->SetSpacing( spacing );
const unsigned int numberOfPixels = imsize[0] * imsize[1];
const bool importImageFilterWillOwnTheBuffer = true;
float * localBuffer = new float[ numberOfPixels ];
float * it = localBuffer;
memcpy(it, mat.data(), numberOfPixels*sizeof(float));
importFilter->SetImportPointer( localBuffer, numberOfPixels,importImageFilterWillOwnTheBuffer );
rescaleFilter ->SetInput(importFilter->GetOutput());
rescaleFilter->SetOutputMinimum(0);
rescaleFilter->SetOutputMaximum(255);
writer->SetFileName( "output.png" );
writer->SetInput(rescaleFilter->GetOutput() );
writer->Update();
}
int main()
{
const int rows = 42;
const int cols = 90;
MatrixXf mat1(rows, cols);
mat1.topLeftCorner(rows/2, cols/2) = MatrixXf::Zero(rows/2, cols/2);
mat1.topRightCorner(rows/2, cols/2) = MatrixXf::Identity(rows/2, cols/2);
mat1.bottomLeftCorner(rows/2, cols/2) = -MatrixXf::Identity(rows/2, cols/2);
mat1.bottomRightCorner(rows/2, cols/2) = MatrixXf::Zero(rows/2, cols/2);
mat1+=0.1*MatrixXf::Random(rows,cols);
eigen_To_ITK (mat1);
cout<<"running fine"<<endl;
return 0;
}
The program is build using CMake. Here is the CMakeLists.txt :
cmake_minimum_required(VERSION 2.8 FATAL_ERROR)
project(ItkTest)
find_package(ITK REQUIRED)
include(${ITK_USE_FILE})
# to include eigen. This path may need to be changed
include_directories(/usr/local/include/eigen3)
add_executable(MyTest main.cpp)
target_link_libraries(MyTest ${ITK_LIBRARIES})

Related

How to use RealSense's spatial_filter on an OpenCV Mat?

I want to apply RealSense library's depth filtering (rs2::spatial_filter) on an OpenCV Mat, but it seems like the filter is not being applied. The original depth image and the supposedly filtered depth image look exactly the same.
To load raw depth data into rs2::frame, I used modified #Groch88's answer. One of the changes I made was changing the depth format from RS2_FORMAT_Z16 to RS2_FORMAT_DISTANCE (to be able to load a float depth map) and not loading the RGB part of the frame. The whole source is below.
Why do the original and the filtered images look exactly the same? Am I missing something obvious?
main.cpp:
#include <iostream>
#include <opencv2/opencv.hpp>
#include <librealsense2/rs.hpp>
#include "rsImageConverter.h"
int main()
{
cv::Mat rawDepthImg = load_raw_depth("/path/to/depth/image"); // loads float depth image
rsImageConverter ic{rawDepthImg.cols, rawDepthImg.rows, sizeof(float)};
if ( !ic.convertFrame(rawDepthImg.data, rawDepthImg.data) )
{
fprintf(stderr, "Could not load depth.\n");
exit(1);
}
rs2::frame rsDepthFrame = ic.getDepth();
// Filter
// https://dev.intelrealsense.com/docs/post-processing-filters
rs2::spatial_filter spat_filter;
spat_filter.set_option(RS2_OPTION_FILTER_MAGNITUDE, 2.0f);
spat_filter.set_option(RS2_OPTION_FILTER_SMOOTH_ALPHA, 0.5f);
spat_filter.set_option(RS2_OPTION_FILTER_SMOOTH_DELTA, 20.0f);
// Apply filtering
rs2::frame rsFilteredDepthFrame;
rsFilteredDepthFrame = spat_filter.process(rsDepthFrame);
// Copy filtered depth to OpenCV Mat
cv::Mat filteredDepth = cv::Mat::zeros(rawDepthImg.size(), CV_32F);
memcpy(filteredDepth.data, rsFilteredDepthFrame.get_data(), rawDepthImg.cols * rawDepthImg.rows * sizeof(float));
// Display (un)filtered images
cv::imshow("Original depth", rawDepthImg); // Original image is being shown
cv::imshow("Filtered depth", filteredDepth); // A depth image that looks exactly like the original unfiltered depth map is shown
cv::imshow("Diff", filteredDepth - rawDepthImg); // A black image is being shown
cv::waitKey(0);
return 0;
}
rsImageConverter.h (edited version of #Doch88's code):
#include <librealsense2/rs.hpp>
#include <librealsense2/hpp/rs_internal.hpp>
class rsImageConverter
{
public:
rsImageConverter(int w, int h, int bppDepth);
bool convertFrame(uint8_t* depth_data, uint8_t* color_data);
rs2::frame getDepth() const;
private:
int w = 640;
int h = 480;
int bppDepth = sizeof(float);
rs2::software_device dev;
rs2::software_sensor depth_sensor;
rs2::stream_profile depth_stream;
rs2::syncer syncer;
rs2::frame depth;
int ind = 0;
};
rsImageConverter.cpp (edited version of #Doch88's code):
#include "rsImageConverter.h"
rsImageConverter::rsImageConverter(int w, int h, int bppDepth) :
w(w),
h(h),
bppDepth(bppDepth),
depth_sensor(dev.add_sensor("Depth")) // initializing depth sensor
{
rs2_intrinsics depth_intrinsics{ w, h, (float)(w / 2), (float)(h / 2), (float) w , (float) h , RS2_DISTORTION_BROWN_CONRADY ,{ 0,0,0,0,0 } };
depth_stream = depth_sensor.add_video_stream({ RS2_STREAM_DEPTH, 0, 0,
w, h, 60, bppDepth,
RS2_FORMAT_DISTANCE, depth_intrinsics });
depth_sensor.add_read_only_option(RS2_OPTION_DEPTH_UNITS, 1.f); // setting depth units option to the virtual sensor
depth_sensor.open(depth_stream);
depth_sensor.start(syncer);
}
bool rsImageConverter::convertFrame(uint8_t* depth_data, uint8_t* color_data)
{
depth_sensor.on_video_frame({ depth_data, // Frame pixels
[](void*) {}, // Custom deleter (if required)
w*bppDepth, bppDepth, // Stride and Bytes-per-pixel
(rs2_time_t)ind * 16, RS2_TIMESTAMP_DOMAIN_HARDWARE_CLOCK, ind, // Timestamp, Frame# for potential sync services
depth_stream });
ind++;
rs2::frameset fset = syncer.wait_for_frames();
depth = fset.first_or_default(RS2_STREAM_DEPTH);
return depth;
}
rs2::frame rsImageConverter::getDepth() const
{
return depth;
}

histogram kernel memory issue

I am trying to implement an algorithm to process images with more than 256 bins.
The main issue to process histogram in such case comes from the impossibility to allocate more than 32 Kb as local tab in the GPU.
All the algorithms I found for 8 bits per pixel images use a fixed size tab locally.
The histogram is the first process in that tab then a barrier is up and at last an addition is made with the output vector.
I am working with IR image which has more than 32K bins of dynamic.
So I cannot allocate a fixed size tab inside the GPU.
My algorithm use an atomic_add in order to create directly the output histogram.
I am interfacing with OpenCV so, in order to manage the possible case of saturation my bins use floating points. Depending on the ability of the GPU to manage single or double precision.
OpenCV doesn't manage unsigned int, long, and unsigned long data type as matrix type.
I have an error... I do think this error is a kind of segmentation fault.
After several days I still have no idea what can be wrong.
Here is my code :
histogram.cl :
#pragma OPENCL EXTENSION cl_khr_fp64: enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable
static void Atomic_Add_f64(__global double *val, double delta)
{
union {
double f;
ulong i;
} old;
union {
double f;
ulong i;
} new;
do {
old.f = *val;
new.f = old.f + delta;
}
while (atom_cmpxchg ( (volatile __global ulong *)val, old.i, new.i) != old.i);
}
static void Atomic_Add_f32(__global float *val, double delta)
{
union
{
float f;
uint i;
} old;
union
{
float f;
uint i;
} new;
do
{
old.f = *val;
new.f = old.f + delta;
}
while (atom_cmpxchg ( (volatile __global ulong *)val, old.i, new.i) != old.i);
}
__kernel void khist(
__global const uchar* _src,
const int src_steps,
const int src_offset,
const int rows,
const int cols,
__global uchar* _dst,
const int dst_steps,
const int dst_offset)
{
const int gid = get_global_id(0);
// printf("This message has been printed from the OpenCL kernel %d \n",gid);
if(gid < rows)
{
__global const _Sty* src = (__global const _Sty*)_src;
__global _Dty* dst = (__global _Dty*) _dst;
const int src_step1 = src_steps/sizeof(_Sty);
const int dst_step1 = dst_steps/sizeof(_Dty);
src += mad24(gid,src_step1,src_offset);
dst += mad24(gid,dst_step1,dst_offset);
_Dty one = (_Dty)1;
for(int c=0;c<cols;c++)
{
const _Rty idx = (_Rty)(*(src+c+src_offset));
ATOMIC_FUN(dst+idx+dst_offset,one);
}
}
}
The function Atomic_Add_f64 directly come from here and there
main.cpp
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <fstream>
#include <sstream>
#include <chrono>
int main()
{
cv::Mat_<unsigned short> a(480,640);
cv::RNG rng(std::time(nullptr));
std::for_each(a.begin(),a.end(),[&](unsigned short& v){ v = rng.uniform(0,100);});
bool ret = false;
cv::String file_content;
{
std::ifstream file_stream("../test/histogram.cl");
std::ostringstream file_buf;
file_buf<<file_stream.rdbuf();
file_content = file_buf.str();
}
int output_flag = cv::ocl::Device::getDefault().doubleFPConfig() == 0 ? CV_32F : CV_64F;
cv::String atomic_fun = output_flag == CV_32F ? "Atomic_Add_f32" : "Atomic_Add_f64";
cv::ocl::ProgramSource source(file_content);
// std::cout<<source.source()<<std::endl;
cv::ocl::Kernel k;
cv::UMat src;
cv::UMat dst = cv::UMat::zeros(1,65536,output_flag);
a.copyTo(src);
atomic_fun = cv::format("-D _Sty=%s -D _Rty=%s -D _Dty=%s -D ATOMIC_FUN=%s",
cv::ocl::typeToStr(src.depth()),
cv::ocl::typeToStr(src.depth()), // this to manage case like a matrix of usigned short stored as a matrix of float.
cv::ocl::typeToStr(output_flag),
atomic_fun.c_str());
ret = k.create("khist",source,atomic_fun);
std::cout<<"check create : "<<ret<<std::endl;
k.args(cv::ocl::KernelArg::ReadOnly(src),cv::ocl::KernelArg::WriteOnlyNoSize(dst));
std::size_t sz = a.rows;
ret = k.run(1,&sz,nullptr,false);
std::cout<<"check "<<ret<<std::endl;
cv::Mat b;
dst.copyTo(b);
std::copy_n(b.ptr<double>(0),101,std::ostream_iterator<double>(std::cout," "));
std::cout<<std::endl;
return EXIT_SUCCESS;
}
Hello I arrived to fix it.
I don't really know where the issue come from.
But if I suppose the output as a pointer rather than a matrix it work.
The changes I made are these :
histogram.cl :
__kernel void khist(
__global const uchar* _src,
const int src_steps,
const int src_offset,
const int rows,
const int cols,
__global _Dty* _dst)
{
const int gid = get_global_id(0);
if(gid < rows)
{
__global const _Sty* src = (__global const _Sty*)_src;
__global _Dty* dst = _dst;
const int src_step1 = src_steps/sizeof(_Sty);
src += mad24(gid,src_step1,src_offset);
ulong one = 1;
for(int c=0;c<cols;c++)
{
const _Rty idx = (_Rty)(*(src+c+src_offset));
ATOMIC_FUN(dst+idx,one);
}
}
}
main.cpp
k.args(cv::ocl::KernelArg::ReadOnly(src),cv::ocl::KernelArg::PtrWriteOnly(dst));
The rest of the code is the same in the two files.
For me it work fine.
If someone know why it work when the ouput is declared as a pointer rather than a vector (matrix of one row) I am interested.
Nevertheless my issue is fix :).

OpenCV and DAISY descriptors

I am trying to do feature matching between 2 perspectives of the same image using DAISY and the FlannBasedMatcher.
I don't think there is even a single match that is correct.
Note: I also get different results each time I run the program but I think this is expected behaviour as explained here: FlannBasedMatcher returning different results
So what am I doing wrong? Why are these matches so bad?
Input Images
Wrong & non-deterministic results
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/xfeatures2d.hpp>
#include <iostream>
#include <vector>
#include <stdio.h>
using namespace cv;
using std::vector;
const float nn_match_ratio = 0.7f; // Nearest neighbor matching ratio
const float keypoint_diameter = 15.0f;
int main(int argc, char ** argv){
// Load images
Mat img1 = imread(argv[1]);
Mat img2 = imread(argv[2]);
vector<KeyPoint> keypoints1, keypoints2;
// Add every pixel to the list of keypoints for each image
for (float xx = keypoint_diameter; xx < img1.size().width - keypoint_diameter; xx++) {
for (float yy = keypoint_diameter; yy < img1.size().height - keypoint_diameter; yy++) {
keypoints1.push_back(KeyPoint(xx, yy, keypoint_diameter));
keypoints2.push_back(KeyPoint(xx, yy, keypoint_diameter));
}
}
Mat desc1, desc2;
Ptr<cv::xfeatures2d::DAISY> descriptor_extractor = cv::xfeatures2d::DAISY::create();
// Compute DAISY descriptors for both images
descriptor_extractor->compute(img1, keypoints1, desc1);
descriptor_extractor->compute(img2, keypoints2, desc2);
vector <vector<DMatch>> matches;
// For each descriptor in image1, find 2 closest matched in image2 (note: couldn't get BF matcher to work here at all)
FlannBasedMatcher flannmatcher;
flannmatcher.add(desc1);
flannmatcher.train();
flannmatcher.knnMatch(desc2, matches, 2);
// ignore matches with high ambiguity -- i.e. second closest match not much worse than first
// push all remaining matches back into DMatch Vector "good_matches" so we can draw them using DrawMatches
int num_good = 0;
vector<KeyPoint> matched1, matched2;
vector<DMatch> good_matches;
for (int i = 0; i < matches.size(); i++) {
DMatch first = matches[i][0];
DMatch second = matches[i][1];
if (first.distance < nn_match_ratio * second.distance) {
matched1.push_back(keypoints1[first.queryIdx]);
matched2.push_back(keypoints2[first.trainIdx]);
good_matches.push_back(DMatch(num_good, num_good, 0));
num_good++;
}
}
Mat res;
drawMatches(img1, matched1, img2, matched2, good_matches, res);
imwrite("_res.png", res);
return 0;
}
Sorry. I found my bug. I have the Indexes reversed in the lines that read:
matched1.push_back(keypoints1[first.queryIdx]);
matched2.push_back(keypoints2[first.trainIdx]);
how can i get the coordinates of the matches find in the two images,that is the coordinates of the matches in the first image and the coordinates of the matches in the second?

ImageMagick load image into RAM

I have a JPG picture on which I'd like to perform some operations in order to use pattern recognition. The picture is being rotated and also some filters like color inversion, greyscale,.. are applied
The program goes like this
for(i=0;i<360;i++){
rotate(pic,i);
foreach(filter as f){
f(pic);
recognize(pic);
}
}
In order to increase speed I'd like to have the source image loaded in RAM and then read from there. Is it possible?
You can write the image to mpr:, or clone the image instance to a new structure. Regardless if where the original source is in memory, you will still need to copy the data in the first for loop. Here's an example, in C, that holds a wand instance and clones each iteration.
#include <stdio.h>
#include <MagickWand/MagickWand.h>
void rotate(MagickWand * wand, double degree) {
PixelWand * pwand = NewPixelWand();
PixelSetColor(pwand, "white");
MagickRotateImage(wand, pwand, degree);
DestroyPixelWand(pwand);
}
void _f(MagickWand * wand, FilterTypes filter) {
double x,y;
x = y = 0.0;
MagickResampleImage(wand, x, y, filter);
}
void recognize(MagickWand * wand) {
// ???
}
int main(int argc, const char * argv[]) {
MagickWandGenesis();
MagickWand * wand, * copy_wand;
wand = NewMagickWand();
MagickReadImage(wand, "rose:");
for ( int i = 0; i < 360 ; i++ ) {
copy_wand = CloneMagickWand(wand);
for ( FilterTypes f = UndefinedFilter; f < SentinelFilter; f++ ) {
_f(copy_wand, f);
recognize(copy_wand);
}
}
MagickWandTerminus();
return 0;
}
The MPR writes to a specific page in memory, and can be identified by a user defined label.
MagickReadImage(wand, "rose:");
MagickWriteImage(wand, "mpr:original"); // Save image to "original" label
for ( int i = 0; i < 360 ; i++ ) {
copy_wand = NewMagickWand();
MagickReadImage(copy_wand, "mpr:original"); // Read image from "original" label
for ( FilterTypes f = UndefinedFilter; f < SentinelFilter; f++ ) {
_f(copy_wand, f);
recognize(copy_wand);
}
copy_wand = DestroyMagickWand(copy_wand);
}
The last option I can think of is to copy the image pixel-data into memory, and re-reference it with each iteration. This allows some performance improvements, and I'm thinking OpenMP, but you'll loose a lot of helper methods.
MagickReadImage(wand, "rose:");
size_t w = MagickGetImageWidth(wand);
size_t h = MagickGetImageHeight(wand);
size_t data_length = w * h * 4;
char * data = malloc(data_length);
MagickExportImagePixels(wand, 0, 0, w, h, "RGBA", CharPixel, (void *)data);
for ( int i = 0; i < 360; i++ ) {
long * copy_data = malloc(data_length);
memcpy(copy_data, data, data_length);
As you haven't specified a language or an operating system, I'll show you how to do that with Magick++ in C++ in a Linux/OSX environment:
#include <Magick++.h>
#include <iostream>
using namespace std;
using namespace Magick;
int main(int argc,char **argv)
{
InitializeMagick(*argv);
// Create an image object
Image image;
// Read a file into image object
image.read( "input.gif" );
// Crop the image to specified size (width, height, xOffset, yOffset)
image.crop( Geometry(100,100, 0, 0) );
// Repage the image to forget it was part of something bigger
image.repage();
// Write the image to a file
image.write( "result.gif" );
return 0;
}
Compile with:
g++ -o program program.cpp `Magick++-config --cppflags --cxxflags --ldflags --libs`
You will need an image called input.gif for it to read and that should be bigger than 100x100, so create one with:
convert -size 256x256 xc:gray +noise random input.gif

Image Processing: Image has grid lines after applying filter

I'm very new to working with image processing at a low level and have just had a go at implementing a gaussian kernel with both GPU and CPU - however both yield the same output, an image which is severely skewed by a grid:
I'm aware I could use OpenCV's pre-built functions to handle the filters, but I wanted to learn the methodology behind it, so I built my own.
Convolution kernel:
// Convolution kernel - this manipulates the given channel and writes out a new blurred channel.
void convoluteChannel_cpu(
const unsigned char* const channel, // Input channel
unsigned char* const channelBlurred, // Output channel
const size_t numRows, const size_t numCols, // Channel width/height (rows, cols)
const float *filter, // The weight of sigma, to convulge
const int filterWidth // This is normally a sample of 9
)
{
// Loop through the images given R, G or B channel
for(int rows = 0; rows < (int)numRows; rows++)
{
for(int cols = 0; cols < (int)numCols; cols++)
{
// Declare new pixel colour value
float newColor = 0.f;
// Loop for every row along the stencil size (3x3 matrix)
for(int filter_x = -filterWidth/2; filter_x <= filterWidth/2; filter_x++)
{
// Loop for every col along the stencil size (3x3 matrix)
for(int filter_y = -filterWidth/2; filter_y <= filterWidth/2; filter_y++)
{
// Clamp to the boundary of the image to ensure we don't access a null index.
int image_x = __min(__max(rows + filter_x, 0), static_cast<int>(numRows -1));
int image_y = __min(__max(cols + filter_y, 0), static_cast<int>(numCols -1));
// Assign the new pixel value to the current pixel, numCols and numRows are both 3, so we only
// need to use one to find the current pixel index (similar to how we find the thread in a block)
float pixel = static_cast<float>(channel[image_x * numCols + image_y]);
// Sigma is the new weight to apply to the image, we perform the equation to get a radnom weighting,
// if we don't do this the image will become choppy.
float sigma = filter[(filter_x + filterWidth / 2) * filterWidth + filter_y + filterWidth/2];
//float sigma = 1 / 81.f;
// Set the new pixel value
newColor += pixel * sigma;
}
}
// Set the value of the next pixel at the current image index with the newly declared color
channelBlurred[rows * numCols + cols] = newColor;
}
}
}
I call this 3 times from another method which splits the image into respective R, G, B channels, but I don't believe this would cause the image to be so severely mutated.
Has anybody encountered a problem similar to this before, and if so how did you solve it?
EDIT Channel Splitting Func:
void gaussian_cpu(
const uchar4* const rgbaImage, // Our input image from the camera
uchar4* const outputImage, // The image we are writing back for display
size_t numRows, size_t numCols, // Width and Height of the input image (rows/cols)
const float* const filter, // The value of sigma
const int filterWidth // The size of the stencil (3x3) 9
)
{
// Build an array to hold each channel for the given image
unsigned char *r_c = new unsigned char[numRows * numCols];
unsigned char *g_c = new unsigned char[numRows * numCols];
unsigned char *b_c = new unsigned char[numRows * numCols];
// Build arrays for each of the output (blurred) channels
unsigned char *r_bc = new unsigned char[numRows * numCols];
unsigned char *g_bc = new unsigned char[numRows * numCols];
unsigned char *b_bc = new unsigned char[numRows * numCols];
// Separate the image into R,G,B channels
for(size_t i = 0; i < numRows * numCols; i++)
{
uchar4 rgba = rgbaImage[i];
r_c[i] = rgba.x;
g_c[i] = rgba.y;
b_c[i] = rgba.z;
}
// Convolute each of the channels using our array
convoluteChannel_cpu(r_c, r_bc, numRows, numCols, filter, filterWidth);
convoluteChannel_cpu(g_c, g_bc, numRows, numCols, filter, filterWidth);
convoluteChannel_cpu(b_c, b_bc, numRows, numCols, filter, filterWidth);
// Recombine the channels to build the output image - 255 for alpha as we want 0 transparency
for(size_t i = 0; i < numRows * numCols; i++)
{
uchar4 rgba = make_uchar4(r_bc[i], g_bc[i], b_bc[i], 255);
outputImage[i] = rgba;
}
}
EDIT Calling the kernel
while(gpu_frames > 0)
{
//cout << gpu_frames << "\n";
camera >> frameIn;
// Allocate I/O Pointers
beginStream(&h_inputFrame, &h_outputFrame, &d_inputFrame, &d_outputFrame, &d_redBlurred, &d_greenBlurred, &d_blueBlurred, &_h_filter, &filterWidth, frameIn);
// Show the source image
imshow("Source", frameIn);
g_timer.Start();
// Allocate mem to GPU
allocateMemoryAndCopyToGPU(numRows(), numCols(), _h_filter, filterWidth);
// Apply the gaussian kernel filter and then free any memory ready for the next iteration
gaussian_gpu(h_inputFrame, d_inputFrame, d_outputFrame, numRows(), numCols(), d_redBlurred, d_greenBlurred, d_blueBlurred, filterWidth);
// Output the blurred image
cudaMemcpy(h_outputFrame, d_frameOut, sizeof(uchar4) * numPixels(), cudaMemcpyDeviceToHost);
g_timer.Stop();
cudaDeviceSynchronize();
gpuTime += g_timer.Elapsed();
cout << "Time for this kernel " << g_timer.Elapsed() << "\n";
Mat outputFrame(Size(numCols(), numRows()), CV_8UC1, h_outputFrame, Mat::AUTO_STEP);
clean_mem();
imshow("Dest", outputFrame);
// 1ms delay to prevent system from being interrupted whilst drawing the new frame
waitKey(1);
gpu_frames--;
}
And then within the beginStream() method, images are converted to uchar4:
// Allocate host variables, casting the frameIn and frameOut vars to uchar4 elements, these will
// later be processed by the kernel
*h_inputFrame = (uchar4 *)frameIn.ptr<unsigned char>(0);
*h_outputFrame = (uchar4 *)frameOut.ptr<unsigned char>(0);
There are many doubts in the problem.
At the start of the code, its mentioned that the filter width is 9, thus making it a 9x9 kernel. But in some other comments its said to be 3. So I am guessing that you are actually using a 9x9 kernel and the filter do have the 81 weights in them.
But the above output can never be due to the above mentioned confusion.
uchar4 is of 4-byte size. Thus in gaussian_cpu while splitting the data by running the loop over rgbaImage[i] on an image that doesnot contain alpha value (it could be inferred from the above mentioned loop that alpha is not present) what actually gets done is that your are copying R1,G2,B3,R5,G6,B7 and so on to the red-channel. Better you initially try the code on a grayscale image and make sure you are using uchar instead of uchar4.
The output image seems exactly 1/3rd the width of the original image, which makes the above assumption to be true.
EDIT 1:
Is the input rgbaImage to guassian_cpu function RGBA or RGB? videoCapture must be giving a 3 channel output. The initialization of *h_inputFrame (to uchar4) itself is wrong as its pointing to 3 channel data.
Similarly the output data is four channel data, but Mat outputFrame is declared as a single channel which points to this four channel data. Try Mat outputFrame as 8UC3 type and see the result.
Also, how is the code working, the guassian_cpu() function has 7 input parameters in the definition, but when you call the function 8 parameters are used. Hope this is just a typo.

Resources