Template Matching on various sizes - opencv

Right now I am working on an OCR algorithm based on template matching, using the OpenCV library. I am comparing pixel by pixel, and so far I have obtained good results. The problem comes when the area I want to match is of a different size.
Ex: Template size = 70x100 while ROI = 140x200.
Is there any function that I can use in order to adapt the required size and end up with the same number of rows and columns?
Thanks
Robert Grech

Usually one builds an image scale pyramid and then scans with the fixed 70x100 window across all scales, as in the OpenCV HOGDescriptor:
double scale = 1.;
double scale0 = 1.05;
int maxLevels = 64;
int nLevels;
cv::Size templateSize(70, 100);
cv::Mat testImage = cv::imread("test1.jpg");
std::vector<double> levelScale;

// Build the list of scale factors, stopping once the rescaled
// image would become smaller than the template.
for (nLevels = 0; nLevels < maxLevels; nLevels++)
{
    levelScale.push_back(scale);
    if (cvRound(testImage.cols / scale) < templateSize.width ||
        cvRound(testImage.rows / scale) < templateSize.height ||
        scale0 <= 1)
        break;
    scale *= scale0;
}
nLevels = std::max(nLevels, 1);
levelScale.resize(nLevels);

// Match the fixed-size template against the test image at every scale.
for (int level = 0; level < nLevels; level++)
{
    cv::Mat testAtScale;
    cv::Size sz(cvRound(testImage.cols / levelScale[level]),
                cvRound(testImage.rows / levelScale[level]));
    cv::resize(testImage, testAtScale, sz);
    //result = match(template, testAtScale);
    //cv::imshow("scale", testAtScale);
    //cv::waitKey();
}
You would then need to post-process your results back to the original scale. This is simple with a bounding box, but if you have a heat map / response map / probability map, resizing it back up may be somewhat hacky.
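For a box result, that mapping is just a multiplication by the level's scale factor. A minimal sketch (toOriginalScale is a hypothetical helper; matchLoc is assumed to be the top-left corner reported by cv::matchTemplate at the given level):

#include <opencv2/opencv.hpp>

// Map a template-match location found at one pyramid level back to the
// coordinate frame of the original image. 'scale' is the factor the test
// image was divided by at that level (levelScale[level] above).
cv::Rect toOriginalScale(const cv::Point& matchLoc,
                         const cv::Size& templateSize, double scale)
{
    return cv::Rect(cvRound(matchLoc.x * scale),
                    cvRound(matchLoc.y * scale),
                    cvRound(templateSize.width * scale),
                    cvRound(templateSize.height * scale));
}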

Related

Detect Image Problems

I don't really know what it is called (distortion or something else), but I would like to detect lens/camera problems for several different types of images using EmguCV (or OpenCV).
Any ideas about which algorithms to use would be appreciated.
The second image seems to have high noise, but is there any way to measure high noise via OpenCV?
This is very difficult to achieve generically without reference data or a homogeneity sample. However, I have developed a recommendation based on analyzing the average SNR (signal-to-noise ratio) of the image. The algorithm divides the input image into a number of "sub-images" based on a specified kernel size, in order to evaluate each independently for local SNR. The computed SNRs for each sub-image are then averaged to provide an indicator of the global SNR of the image.
You will need to test this approach exhaustively; however, it shows promise on the following three images, producing these AvgSNR values:
Image #1 - AvgSNR = 0.9
Image #2 - AvgSNR = 7.0
Image #3 - AvgSNR = 0.6
NOTE: See how the "clean" control image produces a much higher AvgSNR.
The only variable to consider is the kernel size. I would recommend keeping this at a size that suits even the smallest of your potential input images. 30 pixels square should be appropriate for many images.
I enclose my test code, with annotations:
class Program
{
    static void Main(string[] args)
    {
        // List of file names to load.
        List<string> fileNames = new List<string>()
        {
            "IifXZ.png",
            "o1z7p.jpg",
            "NdQtj.jpg"
        };
        // For each image
        foreach (string fileName in fileNames)
        {
            // Determine the local file path
            string path = Path.Combine(Environment.CurrentDirectory, @"TestImages", fileName);
            // Load the image
            Image<Bgr, byte> inputImage = new Image<Bgr, byte>(path);
            // Compute the AvgSNR with a kernel of 30x30
            Console.WriteLine(ComputeAverageSNR(30, inputImage.Convert<Gray, byte>()));
            // Display the image
            CvInvoke.NamedWindow("Test");
            CvInvoke.Imshow("Test", inputImage);
            while (CvInvoke.WaitKey() != 27) { }
        }
        // Pause for evaluation
        Console.ReadKey();
    }

    static double ComputeAverageSNR(int kernelSize, Image<Gray, byte> image)
    {
        // Calculate the number of sub-divisions given the kernel size
        int widthSubDivisions, heightSubDivisions;
        widthSubDivisions = (int)Math.Floor((double)image.Width / kernelSize);
        heightSubDivisions = (int)Math.Floor((double)image.Height / kernelSize);
        int totalNumberSubDivisions = widthSubDivisions * heightSubDivisions;
        Rectangle ROI = new Rectangle(0, 0, kernelSize, kernelSize);
        double avgSNR = 0;
        // For each sub-division, calculate the SNR and add it to avgSNR
        for (int v = 0; v < heightSubDivisions; v++)
        {
            for (int u = 0; u < widthSubDivisions; u++)
            {
                // Iterate the sub-division position
                ROI.Location = new Point(u * kernelSize, v * kernelSize);
                // Calculate the SNR of this sub-division
                avgSNR += ComputeSNR(image.GetSubRect(ROI));
            }
        }
        avgSNR /= totalNumberSubDivisions;
        return avgSNR;
    }

    static double ComputeSNR(Image<Gray, byte> image)
    {
        // Local variables
        double mean, sigma, snr;
        // Calculate the mean pixel value for the sub-division
        int population = image.Width * image.Height;
        mean = CvInvoke.Sum(image).V0 / population;
        // Calculate the sigma of the sub-division population
        double sumDeltaSqu = 0;
        for (int v = 0; v < image.Height; v++)
        {
            for (int u = 0; u < image.Width; u++)
            {
                sumDeltaSqu += Math.Pow(image.Data[v, u, 0] - mean, 2);
            }
        }
        sumDeltaSqu /= population;
        sigma = Math.Pow(sumDeltaSqu, 0.5);
        // Calculate and return the SNR value
        snr = sigma == 0 ? mean : mean / sigma;
        return snr;
    }
}
NOTE: Without a reference, it is not possible to differentiate between natural variance/fidelity and "noise". For example, a highly textured background, or a scene with few homogeneous regions, will yield a high AvgSNR. This approach performs best when the evaluated scene consists mostly of plain, mono-color surfaces, such as the server room or shop front. Grass, for example, contains a large amount of texture and therefore registers as "noise".
An alternative method is to evaluate your images in the frequency domain following a Fourier transform. In principle, the noise examples you have provided are images containing unwanted, high-frequency content. Conduct an FFT and flag images that violate a threshold on high-frequency energy. Here you will find an example of FFT with Emgu: FFT with Emgu
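For reference, a rough C++ sketch of that idea (the quarter-size corner band used as the "low frequency" region, and any threshold applied to the returned ratio, are assumptions to be tuned, not part of the linked example):

#include <opencv2/opencv.hpp>

// Estimate the fraction of spectral energy lying outside the low-frequency
// band. With OpenCV's default (unshifted) DFT layout, low frequencies sit
// in the four corners of the magnitude image.
double highFrequencyRatio(const cv::Mat& gray)
{
    cv::Mat f, complexImg;
    gray.convertTo(f, CV_32F);
    cv::dft(f, complexImg, cv::DFT_COMPLEX_OUTPUT);
    cv::Mat planes[2];
    cv::split(complexImg, planes);
    cv::Mat mag;
    cv::magnitude(planes[0], planes[1], mag);

    // Treat a quarter-size neighborhood of each corner as the low band.
    int bw = mag.cols / 4, bh = mag.rows / 4;
    double low = cv::sum(mag(cv::Rect(0, 0, bw, bh)))[0]
               + cv::sum(mag(cv::Rect(mag.cols - bw, 0, bw, bh)))[0]
               + cv::sum(mag(cv::Rect(0, mag.rows - bh, bw, bh)))[0]
               + cv::sum(mag(cv::Rect(mag.cols - bw, mag.rows - bh, bw, bh)))[0];
    double total = cv::sum(mag)[0];
    return (total - low) / total;   // fraction of energy in high frequencies
}

A noisy image of an otherwise smooth scene should return a noticeably higher ratio than a clean one; where to set the cutoff would need the same kind of exhaustive testing as the AvgSNR approach.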

Compare multiple Image Histograms with Processing

[Image: picture histogram]
I'm quite new to the Processing language. I am trying to create an image comparison tool.
The idea is to get a histogram of a picture (see screenshot below, size is 600x400), which is then compared to 10 other histograms of similar pictures (all size 600x400). The histogram shows the frequency distribution of the gray levels with the number of pure black values displayed on the left and number of pure white values on the right.
In the end I should get a "winning" picture (the one that has the most similar histogram).
Below you can see the code for the image histogram, similar to the processing tutorial example.
My idea was to create a PImage [] for the 10 other pictures to create histograms and then an if statement, but I'm not sure how to code it.
Does anyone have a tip on how to proceed or where to look? I couldn't find a similar post.
Thanks in advance and sorry if the question is very basic!
size(600, 400);
// Load an image from the data directory
// Load a different image by modifying the comments
PImage img = loadImage("image4.jpg");
image(img, 0, 0);
int[] hist = new int[256];

// Calculate the histogram
for (int i = 0; i < img.width; i++) {
  for (int j = 0; j < img.height; j++) {
    int bright = int(brightness(get(i, j)));
    hist[bright]++;
  }
}

// Find the largest value in the histogram
int histMax = max(hist);
stroke(255);

// Draw half of the histogram (skip every second value)
for (int i = 0; i < img.width; i += 2) {
  // Map i (from 0..img.width) to a location in the histogram (0..255)
  int which = int(map(i, 0, img.width, 0, 255));
  // Convert the histogram value to a location between
  // the bottom and the top of the picture
  int y = int(map(hist[which], 0, histMax, img.height, 0));
  line(i, img.height, i, y);
}
I'm not sure whether your problem is the implementation in Processing or that you don't know how to compare histograms. I assume it is the comparison, as the rest is pretty straightforward: calculate the similarity for every candidate and pick the winner.
Search the web for histogram comparison and among others you will find:
http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html
OpenCV implements four measures for histogram similarity.

Correlation:
d(H_1,H_2) = \frac{\sum_I (H_1(I) - \bar{H}_1)(H_2(I) - \bar{H}_2)}{\sqrt{\sum_I (H_1(I) - \bar{H}_1)^2 \sum_I (H_2(I) - \bar{H}_2)^2}}
where \bar{H}_k = \frac{1}{N} \sum_J H_k(J) and N is the number of histogram bins,

or Chi-Square:
d(H_1,H_2) = \sum_I \frac{(H_1(I) - H_2(I))^2}{H_1(I)}

or Intersection:
d(H_1,H_2) = \sum_I \min(H_1(I), H_2(I))

or Bhattacharyya distance:
d(H_1,H_2) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_1 \bar{H}_2 N^2}} \sum_I \sqrt{H_1(I) \cdot H_2(I)}}
You can use these measures, but I'm sure you'll find something else as well.
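If you can call into OpenCV directly, the linked tutorial boils down to roughly this (the file names are placeholders; cv::HISTCMP_CORREL is named CV_COMP_CORREL in OpenCV 2.4):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Placeholder file names; substitute your own images.
    cv::Mat ref  = cv::imread("reference.jpg", cv::IMREAD_GRAYSCALE);
    cv::Mat cand = cv::imread("candidate.jpg", cv::IMREAD_GRAYSCALE);

    // Build one 256-bin grayscale histogram per image.
    int histSize = 256;
    float range[] = { 0, 256 };
    const float* ranges[] = { range };
    cv::Mat histRef, histCand;
    cv::calcHist(&ref, 1, 0, cv::Mat(), histRef, 1, &histSize, ranges);
    cv::calcHist(&cand, 1, 0, cv::Mat(), histCand, 1, &histSize, ranges);
    cv::normalize(histRef, histRef, 0, 1, cv::NORM_MINMAX);
    cv::normalize(histCand, histCand, 0, 1, cv::NORM_MINMAX);

    // Higher is better for correlation; run this against each of your 10
    // candidates and pick the one with the maximum score.
    double score = cv::compareHist(histRef, histCand, cv::HISTCMP_CORREL);
    std::cout << "similarity: " << score << std::endl;
    return 0;
}

The same logic ports directly to your Processing sketch: compute hist for each of the 10 pictures, evaluate one of the measures above against the reference histogram, and keep the index of the best score.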

Segmentation of perspectively distorted barcodes

There are images with perspectively distorted barcodes in them.
They are located and decoded using ZBar.
Now I need not only the rough location, but the four real corner points of the barcode that define the enclosing 4-point polygon.
I tried different approaches, but did not yet get the desired result.
One of them was:
convert image to grayscale
threshold image
erode image
floodFill beginning with a pixel known to be part of barcode
obtain the contour around the floodFill result
But around this contour I would now need to find the best-fitting 4-point polygon, which seems to be not that easy.
Do you have ideas for better approaches?
You could use the following code and try to reduce your contour to a 4-point polygon via approxPolyDP:
std::vector<cv::Point> approx;
for (size_t i = 0; i < contours.size(); i++)
{
    approxPolyDP(Mat(contours[i]), approx,
                 arcLength(Mat(contours[i]), true) * 0.02, true);
    if (approx.size() == 4 &&
        fabs(contourArea(Mat(approx))) > 1000 &&
        isContourConvex(Mat(approx)))
    {
        // angle() is the corner-cosine helper from OpenCV's squares.cpp sample
        double maxCosine = 0;
        for (int j = 2; j < 5; j++)
        {
            double cosine = fabs(angle(approx[j % 4], approx[j - 2], approx[j - 1]));
            maxCosine = MAX(maxCosine, cosine);
        }
        if (maxCosine < 0.3)
            squares.push_back(approx);
    }
}
http://opencv-code.com/tutorials/detecting-simple-shapes-in-an-image/
You can also try the following methods, maybe they will produce good enough results for you:
http://docs.opencv.org/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?highlight=minarearect#minarearect
or
http://docs.opencv.org/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?highlight=convexhull#convexhull
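For example, cv::minAreaRect gives you a rotated rectangle whose four corners may already be good enough (a minimal sketch, assuming contour is the contour obtained from your floodFill result):

#include <opencv2/opencv.hpp>
#include <vector>

// Approximate the barcode contour by the smallest rotated rectangle and
// read off its four corner points.
std::vector<cv::Point2f> cornersFromContour(const std::vector<cv::Point>& contour)
{
    cv::RotatedRect box = cv::minAreaRect(contour);
    cv::Point2f pts[4];
    box.points(pts);   // the 4 corners of the rotated rectangle
    return std::vector<cv::Point2f>(pts, pts + 4);
}

Note that a rotated rectangle cannot model true perspective distortion, so this is only an approximation; the approxPolyDP route above can return a general quadrilateral.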
OK, I found a solution that works well enough for my use case.
First a scanline is generated from the ZBar result.
Then the first and last black pixels are found in a version of the image produced by cv::adaptiveThreshold with a large enough blockSize.
From there the first and last bar are segmented using cv::findContours.
For both end bars, the two contour points with the greatest distance to each other are then found.
These finally define the enclosing 4-point polygon.
This is not exactly what I asked for in my question, but the additional size due to the elongated guard patterns does not matter in my case.
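A rough sketch of the thresholding and contour steps (the adaptiveThreshold parameters are assumptions to tune; generating the scanline from the ZBar result and selecting the end-bar contours along it are omitted):

#include <opencv2/opencv.hpp>
#include <utility>
#include <vector>

// The two contour points with the greatest distance to each other.
std::pair<cv::Point, cv::Point> farthestPoints(const std::vector<cv::Point>& bar)
{
    std::pair<cv::Point, cv::Point> best;
    double maxDist2 = -1;
    for (size_t i = 0; i < bar.size(); i++)
        for (size_t j = i + 1; j < bar.size(); j++)
        {
            double dx = bar[i].x - bar[j].x, dy = bar[i].y - bar[j].y;
            double d2 = dx * dx + dy * dy;   // compare squared distances
            if (d2 > maxDist2) { maxDist2 = d2; best = std::make_pair(bar[i], bar[j]); }
        }
    return best;
}

void segmentBars(const cv::Mat& gray)
{
    cv::Mat bw;
    cv::adaptiveThreshold(gray, bw, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY_INV, 51, 10);   // large blockSize
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(bw, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    // ...pick the first and last bar along the ZBar scanline, then call
    // farthestPoints() on each; the four points define the polygon.
}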

What is the correct way to apply a filter to an image

I was wondering what the correct way would be to apply a filter to an image. The image processing textbook that I am reading only covers the mathematical and theoretical aspects of filters, but doesn't say much about the programming part!
I came up with this pseudo-code; could someone tell me if it is correct? I applied the Sobel edge filter to an image and I am not satisfied with the output: I think it detected many unnecessary points as edges and missed several points along the edge.
int filter[][] = {{0d,-1d,0d},{-1d,8d,-1d},{0d,-1d,0d}}; // I don't exactly remember the Sobel filter
int total = 0;
for (int i = 2; i < image.getWidth() - 2; i++)
    for (int j = 2; j < image.getHeight() - 2; j++)
    {
        total = 0;
        for (int k = 0; k < 3; k++)
            for (int l = 0; l < 3; l++)
            {
                total += intensity(image.getRGB(i, j)) * filter[i + k][j + l];
            }
        if (total >= threshold) {
            image.setRGB(i, j, WHITE);
        }
    }

int intensity(int color)
{
    return (((color >> 16) & 0xFF) + ((color >> 8) & 0xFF) + (color & 0xFF)) / 3;
}
Two issues:
(1) The Sobel operator comprises an x-direction and a y-direction kernel; they are
int filter[][] = {{1d,0d,-1d},{2d,0d,-2d},{1d,0d,-1d}}; and
int filter[][] = {{1d,2d,1d},{0d,0d,0d},{-1d,-2d,-1d}};
(2) The convolution part:
total += intensity(image.getRGB(i+k,j+l)) * filter[k][l];
Your code doesn't look quite right to me. In order to apply the filter to the image, you must apply the discrete convolution algorithm: http://en.wikipedia.org/wiki/Convolution.
When you do convolution, you want to slide the 3x3 filter over the image, moving it one pixel at a time. At each step you multiply the value of each filter 'pixel' by the corresponding value of the image pixel underneath it (all 9 pixels under the filter contribute). The resulting values should be accumulated into a new result image as you go.
Thresholding is optional...
The following is your code modified with some notes:
int[][] filter = {{0,-1,0},{-1,8,-1},{0,-1,0}};

// Create a new array for the result image (one plane per color channel)
int[][][] newImage = new int[image.getWidth()][image.getHeight()][3];

// Initialize every element in newImage to 0
for (int i = 0; i < image.getWidth(); i++)
    for (int j = 0; j < image.getHeight(); j++)
        for (int k = 0; k < 3; k++)
        {
            newImage[i][j][k] = 0;
        }

// Convolve the filter and the image: each source pixel (i,j) distributes
// its weighted contribution to the 3x3 neighborhood around it
for (int i = 1; i < image.getWidth() - 2; i++)
    for (int j = 1; j < image.getHeight() - 2; j++)
    {
        for (int k = -1; k < 2; k++)
            for (int l = -1; l < 2; l++)
            {
                newImage[i + k][j + l][0] += getRed(image.getRGB(i, j)) * filter[k + 1][l + 1];
                newImage[i + k][j + l][1] += getGreen(image.getRGB(i, j)) * filter[k + 1][l + 1];
                newImage[i + k][j + l][2] += getBlue(image.getRGB(i, j)) * filter[k + 1][l + 1];
            }
    }

int getRed(int color)
{
    return (color >> 16) & 0xFF;
}

int getGreen(int color)
{
    return (color >> 8) & 0xFF;
}

int getBlue(int color)
{
    return color & 0xFF;
}
Please note that the code above does not handle the edges of the image exactly right. If you wanted to make it absolutely perfect, you'd start by sliding the filter mostly off the image (so the first position would apply the lower-right corner of the filter to the (0,0) pixel of the image). Doing this is really a pain though, so usually it's easier just to ignore the 2-pixel border around the edges.
Once you've got that working, you can experiment by sliding the Sobel filter in the horizontal and then the vertical direction. You will notice that the filter responds most strongly to edges perpendicular to its direction of travel. So for the best results, apply the filter in the horizontal and then the vertical direction (accumulating into the same newImage); that way you will detect vertical as well as horizontal lines equally well. :)
You have some serious undefined behavior going on here. The array filter is 3x3, but the subscripts you're using, i+k and j+l, run up to the size of the image. It looks like you've misplaced this addition:
total += intensity(image.getRGB(i+k,j+l)) * filter[k][l];
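As an aside, if a library is an option, OpenCV bundles this whole pipeline, including border padding, the two kernel directions, and gradient combination. A minimal sketch ("input.png" is a placeholder file name):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    if (gray.empty()) return 1;

    // Sobel in x and y, then combine the two gradient images.
    cv::Mat gx, gy, absGx, absGy, edges;
    cv::Sobel(gray, gx, CV_16S, 1, 0);   // d/dx with the default 3x3 kernel
    cv::Sobel(gray, gy, CV_16S, 0, 1);   // d/dy
    cv::convertScaleAbs(gx, absGx);
    cv::convertScaleAbs(gy, absGy);
    cv::addWeighted(absGx, 0.5, absGy, 0.5, 0, edges);

    cv::imwrite("edges.png", edges);
    return 0;
}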
Use GPUImage; it's quite good for this.

Problem assigning values to Mat array in OpenCV 2.3 - seems simple

Using the new API for OpenCV 2.3, I am having trouble assigning values to a Mat array (or rather, image) inside a loop. Here is the code snippet I am using:
int paddedHeight = 256 + 2 * padSize;
int paddedWidth  = 256 + 2 * padSize;
int n = 266; // padded height or width
cv::Mat fx = cv::Mat(paddedHeight, paddedWidth, CV_64FC1);
cv::Mat fy = cv::Mat(paddedHeight, paddedWidth, CV_64FC1);
float value = -n / 2.0f;
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < n; j++)
        fx.at<cv::Vec2d>(i, j) = value++;
    value = -n / 2.0f;
}
value = -n / 2.0f;
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < n; j++)
        fy.at<cv::Vec2d>(i, j) = value;
    value++;
}
Now, in the first loop, as soon as j = 133 I get an exception which seems to be related to the depth of the image. I can't figure out what I am doing wrong here.
Please advise! Thanks!
You are accessing the data as a 2-component double vector (using .at<cv::Vec2d>()), but you created the matrices to contain only 1-component doubles (using CV_64FC1). Either create the matrices with two components per element (CV_64FC2) or, what seems more appropriate for your code, access the values as plain doubles using .at<double>(). This explodes exactly at j=133 because that is half the width of your image: when treated as containing 2-component vectors while it only contains 1, it is only half as wide.
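Concretely, the fix could look like this (a sketch reusing the fx, fy, n, and value definitions from your snippet, with value declared as double to match the element type):

// Access CV_64FC1 elements as plain doubles.
double value = -n / 2.0;
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < n; j++)
        fx.at<double>(i, j) = value++;
    value = -n / 2.0;   // reset for the next row
}

value = -n / 2.0;
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < n; j++)
        fy.at<double>(i, j) = value;
    value++;            // same value across each row, incremented per row
}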
Or maybe you can merge these two matrices into one containing two components per element, but this depends on how you are going to use these matrices later. In that case you can also merge the two loops and really set a 2-component vector:
cv::Mat f = cv::Mat(paddedHeight, paddedWidth, CV_64FC2);
float yValue = -n / 2.0f;
for (int i = 0; i < n; i++)
{
    float xValue = -n / 2.0f;
    for (int j = 0; j < n; j++)
    {
        f.at<cv::Vec2d>(i, j)[0] = xValue++;
        f.at<cv::Vec2d>(i, j)[1] = yValue;
    }
    ++yValue;
}
This might produce a better memory-access pattern if you always need both values, the one from fx and the one from fy, for the same element.
