OpenCV cuda::meanStdDev support for CV_32FC1

I want to find the mean pixel value and standard deviation of a GpuMat, and do this reduction on the GPU rather than downloading the image and computing the mean on the CPU (which would slow my application down considerably). The problem is that the GpuMat images I am dealing with are 32-bit floats; the OpenCV documentation, however, states that
CV_8UC1 matrices are supported for now
I have no trouble compiling the following code:
#include <opencv2/core/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

const int kIWEWidth = 640;  // placeholder values; defined elsewhere in the real application
const int kIWEHeight = 480;

int main(int argc, char** argv)
{
    cv::cuda::GpuMat img = cv::cuda::GpuMat(cv::Mat::zeros(cv::Size(kIWEWidth, kIWEHeight), CV_32FC1));
    cv::Scalar mean, std;
    cv::cuda::meanStdDev(img, mean, std);
}
However, when I try to actually execute this, I'm hit with
error: (-215:Assertion failed) src.type() == CV_8UC1 in function 'meanStdDev'
So, I was wondering if anyone knows whether it's possible to compile OpenCV with 32-bit float support in the meanStdDev method, or whether there are any alternative methods that can be recommended. I realise, for example, that I should be able to find the mean and standard deviation using cuda::sum, cuda::subtract and cuda::sqrSum (see the sketch below), but this requires a bunch of kernel launches, and in my particular case every microsecond counts.
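For reference, here is a minimal sketch of that multi-reduction alternative (assuming the cudaarithm module is available; it uses the identity Var[x] = E[x^2] - E[x]^2, so only two reductions are needed and cuda::subtract can be skipped):

#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>
#include <cmath>

// Mean/stddev of a CV_32FC1 GpuMat via two GPU reductions:
// mean = sum(x)/N, var = sum(x^2)/N - mean^2.
void meanStdDevViaSums(const cv::cuda::GpuMat& src, double& mean, double& stddev)
{
    const double n  = static_cast<double>(src.rows) * src.cols;
    const double s  = cv::cuda::sum(src)[0];     // sum of all pixels
    const double ss = cv::cuda::sqrSum(src)[0];  // sum of squared pixels
    mean   = s / n;
    stddev = std::sqrt(ss / n - mean * mean);
}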
Anyways, thanks in advance for your help!

I find it really weird that the cv::cuda version only supports CV_8UC1, because it literally calls the NPP function nppiMean_StdDev_8u_C1R, and versions for more image types exist. So I wrote a wrapper that calls the masked 32-bit float variant directly:
#include <opencv2/core/cuda.hpp>
#include <cuda_runtime.h>
#include <npp.h>

void meanStdDev_32FC1M(cv::cuda::GpuMat src, cv::cuda::GpuMat mask, double *mean, double *stddev)
{
    CV_Assert(src.type() == CV_32FC1);

    // NPP writes the results to device memory.
    double *mean_dev, *stddev_dev;
    cudaMalloc((void**)&mean_dev, sizeof(double));
    cudaMalloc((void**)&stddev_dev, sizeof(double));

    NppiSize sz;
    sz.width = src.cols;
    sz.height = src.rows;

    // Query the scratch-buffer size for the masked variant
    // (wrap in nppSafeCall to check the return code in production).
    int bufSize;
    nppiMeanStdDevGetBufferHostSize_32f_C1MR(sz, &bufSize);

    cv::cuda::BufferPool pool(cv::cuda::Stream::Null());
    cv::cuda::GpuMat buf = pool.getBuffer(1, bufSize, CV_8UC1);

    nppiMean_StdDev_32f_C1MR(src.ptr<Npp32f>(), static_cast<int>(src.step),
                             mask.ptr<Npp8u>(), static_cast<int>(mask.step),
                             sz, buf.ptr<Npp8u>(), mean_dev, stddev_dev);

    cudaMemcpy(mean, mean_dev, sizeof(double), cudaMemcpyDeviceToHost);
    cudaMemcpy(stddev, stddev_dev, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(mean_dev);
    cudaFree(stddev_dev);
}
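A minimal usage sketch (the full-frame mask here is illustrative; any CV_8UC1 mask with nonzero pixels marking the region to measure works):

cv::cuda::GpuMat img(cv::Mat::zeros(480, 640, CV_32FC1));
cv::cuda::GpuMat mask(img.size(), CV_8UC1);
mask.setTo(cv::Scalar::all(255)); // include every pixel

double mean = 0.0, stddev = 0.0;
meanStdDev_32FC1M(img, mask, &mean, &stddev);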

Related

Adding a scalar to a Mat object

So I'm trying to add a scalar value to all elements of a Mat object in OpenCV; however, for raw_t_ubit8 and raw_t_ubit16 types I get wrong results. Here's the code:
Mat A;
//Initialize Mat A;
A = A + 0.1;
The result of the addition is exactly the same matrix; the values do not change. This problem does not occur when I try to add scalars to raw_t_real types of matrices. By raw_t_ubit8 I mean the depth is CV_8UC1.
If, as you mentioned in the comments, the contained values are scaled in the matrix to fit the integer domain 0..255, then you should also scale the scalar value you sum. Namely:
A = A + cv::Scalar(round(0.1 * 255) );
Or even better:
A += cv::Scalar(round(0.1 * 255) );
Note that cv::Scalar, as pointed out in the comments by Miki, is in any case made of doubles (it's a cv::Scalar_<double>).
The rounding could be omitted, but then you leave the choice of how to convert the double into an integer to the function implementation.
You should also check what happens when the values saturate.
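For instance, a minimal sketch of the saturation behaviour (CV_8U arithmetic clamps at 255 via saturate_cast):

#include <opencv2/opencv.hpp>
using namespace cv;

int main()
{
    Mat1b A = (Mat1b(1, 3) << 250, 128, 5);
    A += Scalar(10); // 250 + 10 = 260 saturates to 255
    // A is now [255, 138, 15]
    return 0;
}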
Documentation for OpenCV matrix expressions.
As stated in the comments and in @Antonio's answer, you can't add 0.1 to an integer matrix.
If you are using CV_8UC1 matrices but want to work with floating-point values, you should multiply by 255:
Mat1b A; // <-- type CV_8UC1
...
A += 0.1 * 255;
If the result of the operation needs to be cast, as in this case, then ultimately saturate_cast is called.
This is equivalent to @Antonio's answer, but it results in cleaner code (at least for me).
The same code will be used whether you sum a double or a Scalar; either way a Scalar object will be created using:
template<typename _Tp> inline
Scalar_<_Tp>::Scalar_(_Tp v0)
{
    this->val[0] = v0;
    this->val[1] = this->val[2] = this->val[3] = 0;
}
However, if you need to add exactly 0.1 to your matrix (and not scale it by 255), you need to convert your matrix to CV_32FC1:
#include <opencv2/opencv.hpp>
using namespace cv;

int main(int, char** argv)
{
    Mat1b A = (Mat1b(3,3) << 1,2,3,4,5,6,7,8,9);
    Mat1f F;
    A.convertTo(F, CV_32FC1);
    F += 0.1;
    return 0;
}

calcHist() doesn't return green histogram as expected

I am using OpenCV on Ubuntu 12.04, no fancy IDEs, just compiling and running from the command line. This is my code for calculating the histogram of a colour image ("lion.jpg" is a colour image). The code borrows heavily from the official OpenCV tutorial for histogram calculation.
I get the red and blue histograms just fine, but the green histogram is all haywire (see the attached histogram image).
My code is:
#include"opencv2/highgui/highgui.hpp"
#include"opencv2/imgproc/imgproc.hpp"
#include<iostream>
#include<stdio.h>
using namespace cv;
using namespace std;
int main(int argc,char *argv[])
{
Mat img,b_hist,g_hist,r_hist;
vector<Mat> channels;
namedWindow("Histogram",CV_WINDOW_NORMAL);
int bins=256;
float range[]={0,255};
const float* histrange={range};
img=imread("lion.jpg",CV_LOAD_IMAGE_COLOR);
split(img,channels);
calcHist(&channels[0],1,0,Mat(),b_hist,1,&bins,&histrange,true,false);
calcHist(&channels[1],1,0,Mat(),g_hist,1,&bins,&histrange,true,false);
calcHist(&channels[2],1,0,Mat(),r_hist,1,&bins,&histrange,true,false);
Mat histimage(600,600,CV_8UC3,Scalar(0,0,0));
normalize(b_hist,b_hist,0,histimage.rows,NORM_MINMAX,-1,Mat());
normalize(g_hist,g_hist,0,histimage.rows,NORM_MINMAX,-1,Mat());
normalize(r_hist,r_hist,0,histimage.rows,NORM_MINMAX,-1,Mat());
for(int i=0;i<bins;i++)
{
line(histimage,Point(2*i,histimage.rows-b_hist.at<float>(i)),Point(2* (i+1),histimage.rows-b_hist.at<float>(i+1)),Scalar(255,0,0));
line(histimage,Point(2*1,histimage.rows-g_hist.at<float>(i)),Point(2*(i+1),histimage.rows-g_hist.at<float>(i+1)),Scalar(0,255,0));
line(histimage,Point(2*i,histimage.rows-r_hist.at<float>(i)),Point(2*(i+1),histimage.rows-r_hist.at<float>(i+1)),Scalar(0,0,255));
}
imshow("Histogram",histimage);
waitKey(0);
destroyWindow("Histogram");
return 1;
}
You have a typo:
line(histimage,Point(2*1,histimage.rows-g_hist.at<float>(i)),Point(2*(i+1),histimage.rows-g_hist.at<float>(i+1)),Scalar(0,255,0));
instead of the correct:
line(histimage,Point(2*i,histimage.rows-g_hist.at<float>(i)),Point(2*(i+1),histimage.rows-g_hist.at<float>(i+1)),Scalar(0,255,0));
With 2*1, the x-coordinate of every green segment's start point is stuck at 2, so the green plot comes out garbled.

palm veins enhancement with OpenCV

I'm trying to implement in OpenCV an algorithm to bring out the details of a palm-vein pattern. I've based my work on a paper called "A Contactless Biometric System Using Palm Print and Palm Vein Features" that I found on the Internet. The part I'm interested in is chapter 3.2, Pre-processing, where the steps involved are described.
I'd like to do the implementation using OpenCV, but so far I'm stuck hard. In particular, they use a Laplacian filter on the response of a low-pass filter to isolate the principal veins, but my result gets very noisy no matter which parameters I try!
Any help would be greatly appreciated!
OK, I finally figured out how to do it by myself. Here is my code:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#define THRESHOLD 150
#define BRIGHT 0.7
#define DARK 0.2
using namespace std;
using namespace cv;
int main()
{
    // Read source image in grayscale mode
    Mat img = imread("roi.png", CV_LOAD_IMAGE_GRAYSCALE);

    // Local contrast normalization (algorithm from https://stackoverflow.com/a/14874992/2501769):
    // subtract the local mean, then divide by the local standard deviation
    Mat enhanced, float_gray, blur, num, den;
    img.convertTo(float_gray, CV_32F, 1.0/255.0);
    cv::GaussianBlur(float_gray, blur, Size(0,0), 10);
    num = float_gray - blur;
    cv::GaussianBlur(num.mul(num), blur, Size(0,0), 20);
    cv::pow(blur, 0.5, den);
    enhanced = num / den;
    cv::normalize(enhanced, enhanced, 0.0, 255.0, NORM_MINMAX, -1);
    enhanced.convertTo(enhanced, CV_8UC1);

    // Low-pass filter
    Mat gaussian;
    cv::GaussianBlur(enhanced, gaussian, Size(0,0), 3);

    // High-pass filter on the computed low-pass image
    Mat laplace;
    Laplacian(gaussian, laplace, CV_32F, 19);
    double lapmin, lapmax;
    minMaxLoc(laplace, &lapmin, &lapmax);
    double scale = 127 / max(-lapmin, lapmax);
    laplace.convertTo(laplace, CV_8U, scale, 128);

    // Thresholding using the empirical value of 150 to create a vein mask
    Mat mask;
    cv::threshold(laplace, mask, THRESHOLD, 255, CV_THRESH_BINARY);

    // Clean up the mask using an open morphological operation
    morphologyEx(mask, mask, cv::MORPH_OPEN,
        getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5,5)));

    // Connect the neighbouring areas using a close morphological operation
    morphologyEx(mask, mask, cv::MORPH_CLOSE,
        getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(11,11)));

    // Blur the mask for a smoother enhancement
    cv::GaussianBlur(mask, mask, Size(15,15), 0);

    // Blur the image a little as well to remove noise
    cv::GaussianBlur(enhanced, enhanced, Size(3,3), 0);

    // The mask is used to amplify the veins
    Mat result = enhanced.clone(); // clone so 'enhanced' is not modified in place
    ushort new_pixel;
    double coeff;
    for (int i = 0; i < mask.rows; i++) {
        for (int j = 0; j < mask.cols; j++) {
            coeff = (1.0 - (mask.at<uchar>(i,j)/255.0))*BRIGHT + (1-DARK);
            new_pixel = coeff * enhanced.at<uchar>(i,j);
            result.at<uchar>(i,j) = (new_pixel > 255) ? 255 : new_pixel;
        }
    }

    // Show results
    imshow("frame", img);
    waitKey();
    imshow("frame", result);
    waitKey();
    return 0;
}
So the main steps of the paper are followed here. Some parts are inspired by code I found elsewhere: the first processing step comes from the Stack Overflow answer linked in the code, and the high-pass (Laplacian) filter is based on the code given in OpenCV 2 Computer Vision Application Programming Cookbook.
Finally, I made some small improvements by making the brightness of the background and the darkness of the veins adjustable (see the BRIGHT and DARK defines). I also decided to blur the mask a bit to get a more "natural" enhancement.
Here are the results (source / paper result / my result):

SIFT implementation with OpenCV 2.2

Does anyone know of a link to an example of a SIFT implementation with OpenCV 2.2?
Regards,
Below is a minimal example:
#include <opencv/cv.h>
#include <opencv/highgui.h>

int main(int argc, const char* argv[])
{
    const cv::Mat input = cv::imread("input.jpg", 0); // Load as grayscale
    cv::SiftFeatureDetector detector;
    std::vector<cv::KeyPoint> keypoints;
    detector.detect(input, keypoints);

    // Add results to image and save.
    cv::Mat output;
    cv::drawKeypoints(input, keypoints, output);
    cv::imwrite("sift_result.jpg", output);
    return 0;
}
Tested on OpenCV 2.3
You can obtain the SIFT detector and SIFT-based extractor in several ways. As others have already suggested the more direct methods, I will provide a more "software engineering" approach that may make your code more flexible to changes (i.e. easier to switch to other detectors and extractors).
Firstly, if you are looking to obtain the detector using built-in parameters, the best way is to use OpenCV's factory methods for creating it. Here's how:
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <vector>
using namespace std;
using namespace cv;
int main(int argc, char *argv[])
{
    Mat image = imread("TestImage.jpg");

    // Create a smart pointer for the SIFT feature detector.
    Ptr<FeatureDetector> featureDetector = FeatureDetector::create("SIFT");
    vector<KeyPoint> keypoints;

    // Detect the keypoints
    featureDetector->detect(image, keypoints); // NOTE: featureDetector is a pointer, hence the '->'.

    // Similarly, we create a smart pointer to the SIFT extractor.
    Ptr<DescriptorExtractor> featureExtractor = DescriptorExtractor::create("SIFT");

    // Compute the 128-dimensional SIFT descriptor at each keypoint.
    // Each row in "descriptors" corresponds to the SIFT descriptor for one keypoint.
    Mat descriptors;
    featureExtractor->compute(image, keypoints, descriptors);

    // If you would like to draw the detected keypoints just to check:
    Mat outputImage;
    Scalar keypointColor = Scalar(255, 0, 0); // Blue keypoints.
    drawKeypoints(image, keypoints, outputImage, keypointColor, DrawMatchesFlags::DEFAULT);

    namedWindow("Output");
    imshow("Output", outputImage);

    char c = ' ';
    while ((c = waitKey(0)) != 'q'); // Keep the window open until the user presses 'q' to quit.
    return 0;
}
The reason for using the factory methods is flexibility: you can now change to a different keypoint detector or feature extractor, e.g. SURF, simply by changing the argument passed to the "create" factory methods, like this:
Ptr<FeatureDetector> featureDetector = FeatureDetector::create("SURF");
Ptr<DescriptorExtractor> featureExtractor = DescriptorExtractor::create("SURF");
For other possible arguments to pass to create other detectors or extractors see:
http://opencv.itseez.com/modules/features2d/doc/common_interfaces_of_feature_detectors.html#featuredetector-create
http://opencv.itseez.com/modules/features2d/doc/common_interfaces_of_descriptor_extractors.html?highlight=descriptorextractor#descriptorextractor-create
Now, using the factory methods means you gain the convenience of not having to guess suitable parameters to pass to each detector or extractor, which can be convenient for people new to using them. However, if you would like to create your own custom SIFT detector, you can create a SiftFeatureDetector object with custom parameters, wrap it in a smart pointer, and refer to it through the featureDetector smart-pointer variable as above.
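A minimal sketch of that wrapping (assuming the OpenCV 2.4 nonfree module; the parameter values are illustrative, not tuned recommendations):

#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp> // SIFT lives in nonfree in 2.4

// Wrap a custom-parameter SIFT detector behind the generic interface,
// so the rest of the code keeps using Ptr<FeatureDetector> unchanged.
cv::Ptr<cv::FeatureDetector> featureDetector =
    new cv::SiftFeatureDetector(0,    // nfeatures: 0 = keep all keypoints
                                3,    // nOctaveLayers
                                0.06, // contrastThreshold (illustrative)
                                10.0, // edgeThreshold
                                1.6); // sigma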
A simple example using the SIFT nonfree feature detector in OpenCV 2.4:
#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/nonfree.hpp>
using namespace cv;
int main(int argc, char** argv)
{
    if (argc < 2)
        return -1;
    Mat img = imread(argv[1]);

    SIFT sift;
    vector<KeyPoint> key_points;
    Mat descriptors;
    sift(img, Mat(), key_points, descriptors);

    Mat output_img;
    drawKeypoints(img, key_points, output_img);

    namedWindow("Image");
    imshow("Image", output_img);
    waitKey(0);
    destroyWindow("Image");
    return 0;
}
OpenCV provides SIFT and SURF (here too) and other feature descriptors out-of-the-box.
Note that the SIFT algorithm is patented, so it may be incompatible with the regular OpenCV use/license.
Another simple example using the SIFT nonfree feature detector in OpenCV 2.4. Be sure to add the opencv_nonfree240.lib dependency:
#include "cv.h"
#include "highgui.h"
#include <opencv2/nonfree/nonfree.hpp>
int main(int argc, char** argv)
{
    cv::Mat img = cv::imread("image.jpg");

    cv::SIFT sift(10); // number of keypoints
    std::vector<cv::KeyPoint> key_points;
    cv::Mat descriptors, mascara;
    sift(img, mascara, key_points, descriptors);

    cv::Mat output_img;
    drawKeypoints(img, key_points, output_img);

    cv::namedWindow("Image");
    cv::imshow("Image", output_img);
    cv::waitKey(0);
    return 0;
}
In case someone is wondering how to do it in Python with two images:
import numpy as np
import cv2

# Load the two images to match (paths are placeholders)
src_img = cv2.imread('source.jpg')
trg_img = cv2.imread('target.jpg')

print('Initiate SIFT detector')
sift = cv2.xfeatures2d.SIFT_create()

print('find the keypoints and descriptors with SIFT')
gcp1, des1 = sift.detectAndCompute(src_img, None)
gcp2, des2 = sift.detectAndCompute(trg_img, None)

# Create BFMatcher object. SIFT descriptors are floating point, so use
# NORM_L2 (NORM_HAMMING is only valid for binary descriptors such as ORB).
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(des1, des2)

# Sort them in the order of their distance.
matches = sorted(matches, key=lambda x: x.distance)

# Draw only the first 100 matches
img3 = cv2.drawMatches(src_img, gcp1, trg_img, gcp2, matches[:100], None)
Note that cv2.xfeatures2d.SIFT_create() lives in the opencv-contrib packages.

how to make a CUDA Histogram kernel?

I am writing a CUDA kernel to compute a histogram of a picture, but I have no idea how to return an array from the kernel safely: the array will change while other threads are reading and writing it. Is there any possible solution?
__global__ void Hist(
    TColor *dst, // input image
    int imageW,
    int imageH,
    int *data
){
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;

    if (ix < imageW && iy < imageH)
    {
        // Assign the RED value of this pixel to 'pixel'
        int pixel = get_red(dst[imageW * iy + ix]);
        data[pixel]++; // ?? problem statement: concurrent increments race here
    }
}
// @param d_dst  input image (TColor is equivalent to float4)
// @param data   the array for the histogram, size [255]
extern "C" void
cuda_Hist(TColor *d_dst, int imageW, int imageH,int* data)
{
dim3 threads(BLOCKDIM_X, BLOCKDIM_Y);
dim3 grid(iDivUp(imageW, BLOCKDIM_X), iDivUp(imageH, BLOCKDIM_Y));
Hist<<<grid, threads>>>(d_dst, imageW, imageH, data);
}
Have you looked at the SDK sample? The "histogram" sample is available in the CUDA SDK (currently version 3.0 on the NVIDIA developer site, version 3.1 beta available for registered developers).
The documentation with the sample explains nicely how to handle your summation, either using global memory atomics on the GPU or by collecting the results for each block separately and then doing a separate reduction (either on the host or the GPU).
Histogramming is not particularly efficient when implemented with CUDA (or with GPGPU in general): typically you need to generate lots of partial histograms in shared memory and then sum them (see the sketch below). You might want to consider keeping this particular task on the CPU.
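A minimal sketch of that partial-histogram pattern (this assumes the red-channel values have already been extracted into a flat 8-bit array and that the global histogram was zeroed with cudaMemset before the launch; the names are illustrative):

__global__ void HistShared(const unsigned char *red, int n, unsigned int *hist)
{
    // Each block accumulates a private 256-bin histogram in shared memory,
    // where atomics are much cheaper than in global memory.
    __shared__ unsigned int localHist[256];
    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        localHist[i] = 0;
    __syncthreads();

    // Grid-stride loop over the red-channel values.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&localHist[red[i]], 1u);
    __syncthreads();

    // Merge this block's partial histogram into the global one:
    // only 256 global atomics per block instead of one per pixel.
    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        atomicAdd(&hist[i], localHist[i]);
}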
You will have to either use atomic functions to stop other threads from writing to the same memory location at the same time, or use partial histograms as sketched above. Either way it is not that efficient unless the input image is very, very large. A sketch of the atomic fix follows:
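Here is a minimal sketch of the atomic variant of the asker's kernel (TColor and get_red are assumed to be defined as in the question): the racy data[pixel]++ becomes an atomicAdd, so concurrent increments of the same bin are serialized by the hardware.

__global__ void HistAtomic(TColor *dst, int imageW, int imageH, int *data)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if (ix < imageW && iy < imageH)
    {
        int pixel = get_red(dst[imageW * iy + ix]);
        atomicAdd(&data[pixel], 1); // atomic read-modify-write: no lost updates
    }
}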
