Is it possible to make two grayscale images stastistically equivalent? - opencv

I have 2 grayscale images say G1 and G2 . I also have the statistics (min ,max ,mean and Standard Deviation). I would like to change G2 such that the statistics of G2 (min ,max,mean and SD)match G1. I have tried arithmetic scaling and got the min and max values of both G1 and G2 to match but mean and SD are still different. I have also tried Histogram fitting of G2 in G1 but that did not do what i wanted either. I am using a software called SPIDER this a question applicable to image-processing which can be performed using different software packages(OpenCV MATLABetc) .Any ideas and suggestions would be greatly appreciated.

The easiest thing to do is to apply histogram equalization to both images (histeq in MATLAB). If you do not want to change both images, then you can do histogram matching, but that's a bit more complicated.

You can generate a mapping of input to output based on a simple curve. Start with the values that don't have any dependencies, min and max - those will set the ends of the curve. Now map the mean values to create a single point in the middle of the curve. To modify the standard deviation, you change the shape of the curve between the mean and the endpoints - a curve that is flatter in the middle will give less deviation, and a curve that is flatter towards the ends but steeper in the middle will magnify it.
Edit: I haven't given this enough thought yet, changing the shape of the curve will also change the mean. But I think it can be worked into something usable.

I marked the histogram equalization answer as right because it gave me the best results however I was unable to make the 2 images exactly statistically equivalent as such

Related

How to convert kernel to matrix notation?

I'm trying to understand the bicubic convolution algorithm and haven't been able to understand how the kernel given as a piece wide function,
is turned into this matrix:
I understand to arrive at the matrix a was set to -0.5. No matter how I look at it I can't arrive at the non-symmetric matrix shown.
I've looked through the paper by Keys, but he does not expand into matrix notation and I've struggled with how to get there.
Any insight would be much appreciated.
Step 1 to see the relation is to multiply the function W(x) with the sampled input data f[n] for a given shift t. This gives 5 weights multiplying to 5 input samples, and added together to form an output sample p(t).
The matrix used to compute p(t) is not symmetric because, for any shift t that is not 0, the weights applied to the samples are not symmetric either. You can see this by writing out W(t+i), which are the weights applied to the 5 samples around the output position t (i in [-2,2]).
I've found and understood where Keys describes the process. You can follow along from top to bottom in the image below, but the most important bit to note is Equation 7.
All of the values within the matrix come from the coefficients of the c-terms. The first row of the matrix corresponds to the coefficients of the constant terms, and the first column corresponds to the c_j-1 terms. This can be seen by comparing the figure below to Equation 7's coefficients:
I was able to use this understanding to implement the cubic convolution method to interpolate a surface for which I was able to tune the value of a in order to see the response. I'm happy to help expand on this if anything is unclear!

How to make the labels of superpixels to be locally consistent in a gray-level map?

I have a bunch of gray-scale images decomposed into superpixels. Each superpixel in these images have a label in the rage of [0-1]. You can see one sample of images below.
Here is the challenge: I want the spatially (locally) neighboring superpixels to have consistent labels (close in value).
I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested. I have also heard about Conditional Random Field (CRF). Is it helpful?
Any suggestion would be welcome.
I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested.
And why is that? Why do you not consider helpful advice of your colleagues, which are actually right. Applying smoothing function is the most reasonable way to go.
I have also heard about Conditional Random Field (CRF). Is it helpful?
This also suggests, that you should rather go with collegues advice, as CRF has nothing to do with your problem. CRF is a classifier, sequence classifier to be exact, requiring labeled examples to learn from and has nothing to do with the setting presented.
What are typical approaches?
The exact thing proposed by your collegues, you should define a smoothing function and apply it to your function values (I will not use a term "labels" as it is missleading, you do have values in [0,1], continuous values, "label" denotes categorical variable in machine learning) and its neighbourhood.
Another approach would be to define some optimization problem, where your current assignment of values is one goal, and the second one is "closeness", for example:
Let us assume that you have points with values {(x_i, y_i)}_{i=1}^N and that n(x) returns indices of neighbouring points of x.
Consequently you are trying to find {a_i}_{i=1}^N such that they minimize
SUM_{i=1}^N (y_i - a_i)^2 + C * SUM_{i=1}^N SUM_{j \in n(x_i)} (a_i - a_j)^2
------------------------- - --------------------------------------------
closeness to current constant to closeness to neighbouring values
values weight each part
You can solve the above optimization problem using many techniques, for example through scipy.optimize.minimize module.
I am not sure that your request makes any sense.
Having close label values for nearby superpixels is trivial: take some smooth function of (X, Y), such as constant or affine, taking values in the range [0,1], and assign the function value to the superpixel centered at (X, Y).
You could also take the distance function from any point in the plane.
But this is of no use as it is unrelated to the image content.

Detecting blobs of uniform colour with openCV

I am working on a program to detect split fields for remote sensing (ie. more than one colour/field type within each image, where the image corresponds to the land owned by one farmer) and have been trying to find a solution by reading in images and posterizing them with a clustering algorithm, then analysing the colours and shapes present to try and 'score' each image and decide if more than one type of field is present. My program works reasonably well although there are still quite a few obvious splits that it fails to detect.
Up until now I have been doing this using only standard libraries in c++, but I think now that I should be using openCV or something and I was wondering which techniques to start with. I see there are some image segmentation and blob detection algorithms, but I'm not sure they are applicable because the boundary between fields tends to be blurred or low in contrast. The following are some sample images that I would expect my program to detect as 'split':
(True colour Landsat)
http://imgur.com/m9qWBcq
http://imgur.com/OwqvUvs
Are there any thoughts on how I could go about solving this problem in a different way? Thanks!
1) Convert to HSV and take H or take gray-scaled form. Apply median filter to smooth the fields :P if images are high-resolution.
2) Extract histogram and find all the peaks. These peaks indicate the different colored fields.
3) (A) Now you can use simple thresholding around these peaks-value and then find canny edges for trapezium or similar shapes.
--OR--
(B) Find canny-edges around the peak value ie for peak with maxima value x, find edge for range of (x - dx) to (x + dx) where dx is a small value to be find experimentally.
4) Now you can extract count of contours at different levels/peaks.
I haven't added code because language is not specified and all these constructs are readily available in OpenCV. Its fun to learn. Feel free to ask further. Happy Coding.
Try the implementations of the MSER algorithm in MserFeatureDetector.
The original algorithm was thought for grayscale pictures, and I don't have good experiences with the color version of it, so try to do some preprocesing of the original frames to generate grayscales according to our needs.

motion reconstruction from a single camera

I have a single calibrated camera (known intrinsic parameters, i.e. camera matrix K is known, as well as the distortion coefficients).
I would like to reconstruct the camera's 3d trajectory. There is no a-priori knowledge about the scene.
simplifying the problem by presenting two images that look on the same scene and extracting two set of corresponding matched feature points from them (SIFT, SURF, ORB, etc.)
My problem is how can I calculate the camera extrinsic parameters (i.e. the rotation matrix R and the translation vector t ) between the to viewpoints?
I have managed to calculate the fundamental matrix, and since K is know, the essential matrix as well. using David Nister's efficient solution to the Five-Point Relative Pose Problem I've managed to get 4 possible solution but:
the constraint on the essential matrix E ~ U * diag (s,s,0) * V' doesn't always apply - causing incorrect results.
[EDIT]: taking the average singular value seems to correct the results :) one down
how can I tell which one of the four is the correct one?
Thanks
Your solution to point 1 is correct: diag( (s1 + s2)/2, (s1 + s2)/2, 0).
As for telling which one of the four solutions is correct, only one will give positive depths for all points with respect to the camera frame. That's the one you want.
Code for checking which solution is correct can be found here: http://cs.gmu.edu/%7Ekosecka/examples-code/essentialDiscrete.m from http://cs.gmu.edu/%7Ekosecka/bookcode.html
They use the determinants of U and V to determine the solution with the correct orientation. Look for the comment "then four possibilities are". Since you're only estimating the essential matrix, it's susceptible to noise and does not behave well at all if all of the points are coplanar.
Also, the translation is only recovered to within a constant scaling factor, so the fact that you're seeing a normalized translation vector of unit magnitude is exactly correct. The reason is that the depth is unknown and estimated to be 1. You'll have to find some way to recover the depth as in the code for the eight-point algorithm + 3d reconstruction (Algorithm 5.1 in the bookcode link.)
The book the sample code above is taken from is also a very good reference. http://vision.ucla.edu/MASKS/ Chapter 5, the one you're interested in, is available on the Sample Chapters link.
Congrats on your hard work, sounds like you've tried hard to learn these techniques.
For actual production-strength code, I'd advise to download libmv and ceres, and re-code your solution using them.
Your two questions are really one: invalid solutions are rejected based on the data you have collected. In particular, Nister's (as well as Stewenius's) algorithm is normally used in the inner loop of a RANSAC-like solver, which selects for the solution with the best fit / max number of inliers.

Scoreboard digit recognition using OpenCV

I am trying to extract numbers from a typical scoreboard that you would find at a high school gym. I have each number in a digital "alarm clock" font and have managed to perspective correct, threshold and extract a given digit from the video feed
Here's a sample of my template input
My problem is that no one classification method will accurately determine all digits 0-9. I have tried several methods
1) Tesseract OCR - this one consistently messes up on 4 and frequently returns weird results. Just using the command line version. If I actually try to train it on an "alarm clock" font, I get unknown character every time.
2) kNearest with OpenCV - I search a database consisting of my template images (0-9) and see which one is nearest. I frequently get confusion between 3/1 and 7/1
3) cvMatchShapes - this one is fairly bad, it usually can't tell the difference between 2 of the digits for each input digit
4) Tangent Distance - This one is the closest, but the smallest tangent distance between the input and my templates ends up mapping "7" to "1" every time
I'm really at a loss to get a classification algorithm for such a simple problem. I feel I have cleaned up the input fairly well and it's a fairly simple case for classification but I can't get anything reliable enough to actually use in practice. Any ideas about where to look for classification algorithms, or how to use them correctly would be appreciated. Am I not cleaning up the input? What about a better input database? I don't know what else I'd use for input, each digit and template looks spot on at this point.
The classical digit recognition, which should work well in this case is to crop the image just around the digit and resize it to 4x4 pixels.
A Discrete Cosine Transform (DCT) can be used to further slim down the search space. You could select the first 4-6 values.
With those values, train a classifier. SVM is a good one, readily available in OpenCV.
It is not as simple as emma's or martin suggestions, but it's more elegant and, I think, more robust.
Given the width/height ratio of your input, you may choose a different resolution, like 3x4. Choose the smallest one that retains readable digits.
Given the highly regular nature of your input, you could define a set of 7 target areas of the image to check. Each area should encompass some significant portion of one of the 7 segments of each digital of the display, but not overlap.
You can then check each area and average the color / brightness of the pixels in to to generate a probability for a given binary state. If your probability is high on all areas you can then easily figure out what the digit is.
It's not as elegant as a pure ML type algorithm, but ML is far more suited to inputs which are not regular, and in this case that does not seem to apply - so you trade elegance for accuracy.
Might sound silly but have you tried simply checking for black bars vertically and then horizontally in the top and bottom halfs - left and right of the centerline ?
If you are trying text recognition with Tesseract, try passing not one digit, but a number of duplicated digits, sometimes it could produce better results, here's the example.
However, if you're planning a business software, you may want to have a look at a commercial OCR SDK. For example, try ABBYY FineReader Engine. It's not affordable for free to use applications, but when it comes to business, it can a good value to your product. As far as i know, ABBYY provides the best OCR quality, for example check out http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison
You want your scorecard image inputs S feeding an algorithm that maps them to {0,1,2,3,4,5,6,7,8,9}.
Let V denote the set of n-tuples of integers.
Construct an algorithm α that maps each image S to a n-tuple
(k1,k2,...,kn)
that can differentiate between two different scoreboard digits.
If you can specify the range of α then you only have to collect the vectors in V that correspond to a digit in order to solve the problem.
I've applied this idea using Martin Beckett's idea and it works. My initial attempt was a simple injection into a 2-tuple by vertical left-to-right summing, with the first integer a image column offset and the second integer was the length of a 'nice' vertical line.
This did not work - images for 6 and 8 would map to the same vectors. So I needed another mini-info-capture for my digit input types (they are not scoreboard) and a 3-tuple info vector does the trick.

Resources