How to crop the roi of the image - opencv

in my project I want to crop the ROI of an image. For this I create a map with the regions of interesst. Now I want to crop the area which has the most important pixels (black is not important, white is important).
Has someone an idea how to realize it? I think this is a maximazion problem
The red border in the image below is an example how I want to crop this image

If I understood your question correctly, you have computed a value at every point in the image. These values suggests the "importance"/"interestingness"/"saliency" of each point. The matrix/image containing these values is the "map" you are referring to. Your goal is to get the bounding box for regions of interests (ROI) with high "importance" score.
The way I think you can go about segmenting the ROIs is to apply Graph Cut based segmentation computing a "score" at each pixel using your importance map. The result of the segmentation is a binary mask that masks the "important" pixels. Next, run OpenCV's findcontours function on this binary mask to get the individual connected components. Then use OpenCV's boundingRect function on the contours returned by findContours(...) to get the bounding boxes.
The good thing about using a Graph Cut based segmentation algorithm in this way is that it will join up fragmented components i.e. the resulting binary mask will tend not to have pockets of small holes even if your "importance" map is noisy.
One Graph Cut based segmentation algorithm already implemented in OpenCV is the GrabCut algorithm. A quick hack would be to apply it on your "importance" map to get the binary mask I mentioned above. A more sophisticated approach would be to build the foreground and background (color perhaps?) model using your "importance" map and passing it as input to the function. More details on GrabCut in OpenCV can be found here: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html?highlight=grabcut#void grabCut(InputArray img, InputOutputArray mask, Rect rect, InputOutputArray bgdModel, InputOutputArray fgdModel, int iterCount, int mode)
If you would like greater flexibility, you can hack your own graphcut based segmentation algorithm using the following MRF library. This library allows you to specify your custom objective function in computing the graph cut: http://vision.middlebury.edu/MRF/code/
To use the MRF library, you will need to specify the "cost" at each point in your image indicating whether that point is "foreground" or "background". You can also think of this dichotomy as "important" or "not important" instead of "foreground" vs "background".
The MRF library's goal is to return you a label at each point such that total cost of assigning those labels is as small as possible. Hence, the game is to come up with a function to compute a small cost for points you consider important and large otherwise.
Specifically, the cost at each point is composed of 2 parts: 1) The data term/function and 2) The smoothness term/function. As mentioned earlier, the smaller the data term at each point, the more likely that point will be selected. If your "importance" score s_ij is in the range [0, 1], then a common way to compute your data term would be -log(s_ij).
The smoothness terms is a way to suggest whether 2 neighboring pixels p, q, should have the same label i.e. both "foreground", "background", or one "foreground" and the other "background". Similar to the data cost, you have to construct it such that the cost is small for neighbor pixels having similar "importance" score so that they will be assigned the same label. This term is responsible for "smoothing" the resulting mask so that you will not have pixels of low "importance" sprinkled within regions of high "importance" and vice versa. If there are such regions, OpenCV's findContours(...) function mentioned above will return contours for these regions, which can be filtered out perhaps by checking their size.
Details on functions to compute the cost can be found in the GrabCut paper: GrabCut
This blog post provides a bit more detail (and code) on creating your own graphcut segmentation algorithm in OpenCV: http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/
Another paper showing how to perform graph cut segmentation on grayscale images (your case), with better notations, and without the complicated image matting part (not implemented in OpenCV's version) in the GrabCut paper is this: Graph Cuts and Efficient N-D Image Segmentation
Hope this helps.

Related

Which feature descriptors to use and why?

I do like to do compute the position and orientation of a camera in a civil aircraft cockpit.
I do use LEDs as fixed points. My plan is to save their X,Y,Z Position associated with the LED.
How can I detect and identify my LEDs on my images? Which feature descriptor and feature point extractor should I use?
How should I modify my image prior to feature detection?
I like to stay efficient.
----Please stop voting this question down----
Now after having found the solution to my problem, I do realize the question might have been too generic.
Anyways to support other people googeling I am going to describe my answer.
With combinations of OpenCVs functions I create masks which contain areas where the LEDs could be in white. The rest of the image is black. These functions are for example Core.range, Imgproc.dilate, and Imgproc.erode. Also with Imgproc.findcontours I am filtering out too large or too small contours. Also used to combine masks is Core.bitwise_and, or Core.bitwise_not.
The masks are computed from an image in the HSV color space as input.
Having these masks with potential LED areas, I do compute color histograms, which of the intensity normalized rgb colors. (Hue did not work well enough for me). These histograms are trained and normalized using a set of annotated input images and represent my descriptor.
I do match the trained descriptor against computed onces in the application using histogram intersection.
So I receive distance measures. Using a threshold for these measures, the measures and the knowledge of the geometric positions of the real-life LEDs I translate the patches to a graph system, which helps me to find the longest chain of potential LEDs.

Given a image, how extract (detect) blurred (or focussed) regions?

Blurred (focussed) region could be arbitrary shape (not just rectangle)
Simple input/output example:
(is valid any algorithm, command, software, ...)
Thank you!
The variance is much higher in focused regions because blurring acts as a spatial low-pass filter.
But if the object shows no variance (uniform object like wall or a sky) you are completely lost and do not see any differences at all between the blurred and focused regions.
Therefore your question cannot be answered for every kind of image. It's impossible.
What I would do:
High pass filter your image. (Edge detection, absolute differences between neighboring pixels, Laplace or local variances, ...) (Matlab)
Smooth and apply threshold.
Apply pre-knowledge about imaging optics if existing. Example would be knowing that the focus must be round and in in the center, ...

Confusion regarding Object recognition and features using SURF

I have some conceptual issues in understanding SURF and SIFT algorithm All about SURF. As far as my understanding goes SURF finds Laplacian of Gaussians and SIFT operates on difference of Gaussians. It then constructs a 64-variable vector around it to extract the features. I have applied this CODE.
(Q1 ) So, what forms the features?
(Q2) We initialize the algorithm using SurfFeatureDetector detector(500). So, does this means that the size of the feature space is 500?
(Q3) The output of SURF Good_Matches gives matches between Keypoint1 and Keypoint2 and by tuning the number of matches we can conclude that if the object has been found/detected or not. What is meant by KeyPoints ? Do these store the features ?
(Q4) I need to do object recognition application. In the code, it appears that the algorithm can recognize the book. So, it can be applied for object recognition. I was under the impression that SURF can be used to differentiate objects based on color and shape. But, SURF and SIFT find the corner edge detection, so there is no point in using color images as training samples since they will be converted to gray scale. There is no option of using colors or HSV in these algorithms, unless I compute the keypoints for each channel separately, which is a different area of research (Evaluating Color Descriptors for Object and Scene Recognition).
So, how can I detect and recognize objects based on their color, shape? I think I can use SURF for differentiating objects based on their shape. Say, for instance I have a 2 books and a bottle. I need to only recognize a single book out of the entire objects. But, as soon as there are other similar shaped objects in the scene, SURF gives lots of false positives. I shall appreciate suggestions on what methods to apply for my application.
The local maxima (response of the DoG which is greater (smaller) than responses of the neighbour pixels about the point, upper and lover image in pyramid -- 3x3x3 neighbourhood) forms the coordinates of the feature (circle) center. The radius of the circle is level of the pyramid.
It is Hessian threshold. It means that you would take only maximas (see 1) with values bigger than threshold. Bigger threshold lead to the less number of features, but stability of features is better and visa versa.
Keypoint == feature. In OpenCV Keypoint is the structure to store features.
No, SURF is good for comparison of the textured objects but not for shape and color. For the shape I recommend to use MSER (but not OpenCV one), Canny edge detector, not local features. This presentation might be useful

How to implement despeckle in OpenCV?

If histogram equalization is done on a poorly-contrasted image then its features become more visible. However there is also a large amount of grains/speckles/noise. using blurring functions already available in OpenCV is not desirable - i'll be doing text-detection on the image later on and the letters will get unrecognizable.
So what are the preprocessing techniques that should be applied?
Standard blur techniques that convolve the image with a kernel (e.g. Gaussian blur, box filter, etc) act as a low-pass filter and distort the high-frequency text. If you have not done so already, try cv::bilateralFilter() or cv::medianBlur(). If neither of these algorithms work, you should look into other edge-preserving smoothing algorithms.
If you imagine the image as a three-dimensional space, traditional filtering replaces the value of each pixel with the weighted average of all filters in a circle centered around the pixel. Bilateral filtering does the same, but uses a three-dimensional sphere centered at the pixel. Since a well-defined edge looks like a plateau, the sphere contains only one point and the pixel value remains unchanged. You can get a more detailed explanation of the bilateral filter and some sample output here.

Gaussian blur and convolution kernels

I do not understand what a convolution kernel is and how I would apply a convolution matrix to pixels in an image (I am talking about doing a Gaussian Blur operation on an image).
Also could I get an explanation on how to create a kernel for a Gaussian Blur operation?
I am reading this article but I cannot seem to understand how things are done...
Thanks to anyone who takes time to explain this to me :),
ExtremeCoder
The basic idea is that the new pixels of the image are created by an weighted average of the pixels close to it (imagine drawing a circle around the pixel).
For each pixel in the image you are going to create a little square around the pixel. Lets say you take the 8 neighbors next to a pixel (including diagonals even though do not matter here), and we perform a weighted average to get the middle pixel.
In the Gaussian blur case it breaks down to two one dimensional operations. For each pixel take the some amount of pixels next to a pixel in the row direction only. Multiply the pixel values time the weights computed from the Gaussian distribution (or if you are doing this for an visual effect and not for a scientific reason, the weights can anything that looks good) and sum them up. Another way to look at it is the pixel make a vector and the weights make a vector and your are taking the dot product. Repeat this process in the column direction as a separate pass.
A convolution kernel is a matrix of values that specify how the neighborhood of a pixel contribute to that pixel's state in the final image. There's a fair description of the basics here. A gaussian blur is a convolution function that uses a really ugly (you've seen the wikipedia page) function to compute a convolution kernel to pass over the image. You'll find an example kernel for a gaussian in that wikipedia page.
The point of all the math in there is to produce a soft blur that resembles the scatter pattern produced by a mesh screen placed between the viewer and the image. You can think of the 'size' (the standard deviation) of the gaussian as being related to the distance between the image and the screen.
Here's an awesome tool, if you don't want to calculate it all by yourself (like me):
http://www.embege.com/gauss/
EDIT
Since the link seems to be broken now, here's a link to archive.org:
http://web.archive.org/web/20150217075657/http://www.embege.com/gauss

Resources