Include the spatial context of pixels during image clustering - image-processing

How can the spatial context (or neighbourhood) of a pixel be taken into account (besides the pixel intensity) when clustering an image?
For the time being, I'm using K-means, GMM and Fuzzy C-means, which cluster the image based only on the distribution of the pixel intensities. However, I need to include information about the spatial context of each pixel in the clustering, to avoid the misclassification caused by speckle noise.

The standard approach for segmentation is to add the X and Y coordinates with appropriate scaling to the color values (in RGB or Lab space).
Examples of these are SLIC (K-means clustering in x-y-Lab space) and Quickshift (an accelerated mean shift in x-y-Lab space).
When spatial distances are also considered, it is often possible to gain a lot of speed, since each cluster centre only needs to be compared with nearby pixels. Check out the implementations in scikit-image, this blog, or my blog.
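A minimal sketch of the idea, assuming a grayscale image and using NumPy only: each pixel becomes a feature vector of its intensity plus scaled coordinates, and plain K-means runs on those vectors. The function names and the `spatial_weight` parameter are made up for this example; SLIC and Quickshift in scikit-image do the same thing far more efficiently and in Lab space.

```python
import numpy as np

def spatial_features(image, spatial_weight=0.5):
    """Return one row per pixel: (intensity, weighted row, weighted column)."""
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Normalise coordinates to [0, 1] so the weight means the same for any image size.
    rows = rows / max(h - 1, 1)
    cols = cols / max(w - 1, 1)
    return np.column_stack([
        image.ravel().astype(float),
        spatial_weight * rows.ravel(),
        spatial_weight * cols.ravel(),
    ])

def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every pixel to every centre.
        dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers

# Two-region test image with additive noise (synthetic data for illustration).
rng = np.random.default_rng(0)
image = np.zeros((32, 32))
image[:, 16:] = 1.0
image += rng.normal(scale=0.3, size=image.shape)

labels, _ = kmeans(spatial_features(image, spatial_weight=0.5), k=2)
segmentation = labels.reshape(image.shape)
```

A larger `spatial_weight` makes clusters more compact in space and less sensitive to isolated noisy pixels; a weight of zero falls back to intensity-only clustering.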

Related

Is it possible to "transfer" image texture via inverse gabor transform

Apologies if this is a naive or foolish question, but I am trying to learn a bit more about image processing techniques. I had an intuition about Gabor filters but can't seem to find an answer.
If I calculate a bank of Gabor filters for a set of images and reduce them to N features that a machine learning algorithm has determined to be indicative of a specific texture, can these N features be applied to a novel image to "transfer" the texture to the novel image? Perhaps via an inverse Gabor transform? For example, if I have 10 Gabor filters that can accurately classify a texture as "brick", can these 10 filters be applied to a "wood" texture image (picture of a 2x4) to approximate the brick texture on the wood surface?
If this is possible, can it be easily implemented in Python?
As far as I understand, this is not directly possible. "When working with Gabor filters, it is common to work with the magnitude response of each filter." https://www.mathworks.com/help/images/texture-segmentation-using-gabor-filters.html
That is, the features only carry information about the magnitude of the signal; the phase information is discarded.
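To illustrate why the magnitude alone is not enough, here is a small 1-D sketch (NumPy only; the kernel parameters and function names are assumptions chosen for the example). Two sinusoids that differ only in phase give practically the same magnitude response to a complex Gabor filter, so the magnitude cannot tell them apart, let alone reconstruct them.

```python
import numpy as np

def gabor_kernel_1d(frequency, sigma):
    """Complex 1-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half_width = int(4 * sigma)
    t = np.arange(-half_width, half_width + 1)
    envelope = np.exp(-t**2 / (2.0 * sigma**2))
    return envelope * np.exp(2j * np.pi * frequency * t)

def gabor_response(signal, frequency=0.1, sigma=8.0):
    """Complex filter response, keeping only fully overlapping samples."""
    return np.convolve(signal, gabor_kernel_1d(frequency, sigma), mode="valid")

x = np.arange(256)
signal_a = np.cos(2 * np.pi * 0.1 * x)              # phase 0
signal_b = np.cos(2 * np.pi * 0.1 * x + np.pi / 2)  # same texture, shifted phase

response_a = gabor_response(signal_a)
response_b = gabor_response(signal_b)

# Largest difference between the two magnitude responses (close to zero).
magnitude_gap = np.max(np.abs(np.abs(response_a) - np.abs(response_b)))
# Largest difference between the real parts (clearly non-zero).
real_gap = np.max(np.abs(response_a.real - response_b.real))
```

The complex responses differ, but once only `np.abs(...)` is kept as the feature, that difference is gone.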

Which machine learning model would be feasible for stripping the background from product photos?

My goal is to be able to have a way to process a product photo through the model, and have it return the same photo with the product against a white background. The product photos will be of varying sizes and product types.
I'd like to feed the model photos of products with backgrounds, and those without. In the future I will also expand on the dataset with partially removed backgrounds.
If you are looking for an easy way of doing this, I'd suggest the K-means clustering algorithm. Assuming that you have a simple plain background and an image (of interest) you can obtain the RGB pixel values and use a K-means clustering algorithm with the number of clusters set to 2.
Let me explain this to you with the help of an example. Suppose you have an image of dimension 28*28 (just another arbitrary dimension). The total number of pixels in the image would be 784. Each pixel is represented as a combination of 3 RGB values ranging from 0-255.
K-means groups the pixel values into K clusters, so that the pixel values within a cluster are more similar to each other than to the pixel values in other clusters. This technique is especially helpful for drawing contours (borders) around objects of interest.
In this example, K-means would see 784 sample points, each represented in a 3-dimensional (R, G, B) space. It will cluster these data points into K (2 in this example) clusters.
Here is a very simple implementation of the K-means clustering algorithm.
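As a rough illustration of the 28 x 28 example (not the linked implementation), the sketch below runs K-means with K = 2 on the RGB values using NumPy only and paints one cluster white. The function names, the deterministic initialisation, and the rule that the cluster dominating the image border is the background are assumptions made for this example.

```python
import numpy as np

def two_means_rgb(image, n_iter=10):
    """Cluster the pixels of an (H, W, 3) image into 2 groups by RGB value."""
    pixels = image.reshape(-1, 3).astype(float)   # e.g. 784 points in RGB space
    # Assumed initialisation: start from the darkest and the brightest pixel.
    brightness = pixels.sum(axis=1)
    centers = np.stack([pixels[brightness.argmin()], pixels[brightness.argmax()]])
    for _ in range(n_iter):
        dist = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(image.shape[:2])

def whiten_background(image, labels):
    """Set the background cluster to white (for a uint8 RGB image)."""
    # Assumption: the cluster covering most of the image border is the background.
    border = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    background = np.bincount(border, minlength=2).argmax()
    result = image.copy()
    result[labels == background] = 255
    return result

# Synthetic product photo: grey background with a red square in the middle.
photo = np.full((28, 28, 3), 100, dtype=np.uint8)
photo[10:18, 10:18] = (200, 30, 30)
cleaned = whiten_background(photo, two_means_rgb(photo))
```

This only works when the background is plain and clearly different in color from the product; textured or multi-colored backgrounds need more clusters or a learned model.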
If you are looking for a more advanced machine learning approach, then I'd suggest you look into deep convolutional neural networks for background removal in images. This technique has been used successfully for background removal.
Read more about it from here, here and here.

Create a feature vector to classify segments in air images

I am working on a project to segment air images and classify each segment. The images are very large and have huge homogeneous areas, so I decided to use a Split and Merge Algorithm for the segmentation.
(On the left the original image and on the right the segmented one, where each segment is represented by its mean RGB value. Thanks to this answer.)
For the classification I want to use a SVM Classifier (I used it a lot in two projects before) with a feature vector.
For the beginning I just want to use five classes: Water, Vegetation, Built up area, Dune and Anomaly
Now I am thinking about what I can put in this feature vector:
The mean RGB Value of the Segment
A texture feature (but can I represent the texture of the segment with just one value?)
The place in the source image (maybe with a value which represents left, right or middle?)
The size of the segment (Water segments should be much larger than built areas)
The mean RGB values of the 4-neighborhood of the segment
So has anyone done something like this and can give me some advice on what useful stuff I can put in the feature vector? And can someone advise me on how I can represent the texture in the segment correctly?
Thank you for your help.
Instead of the Split and Merge algorithm, you could also use superpixels. There are several fast and easy-to-use superpixel algorithms available (some are even implemented in recent OpenCV versions). To name just a few:
Compact Watershed (see here: https://www.tu-chemnitz.de/etit/proaut/forschung/rsrc/cws_pSLIC_ICPR.pdf)
preSLIC and SLIC (see here: https://www.tu-chemnitz.de/etit/proaut/forschung/rsrc/cws_pSLIC_ICPR.pdf and here: http://www.kev-smith.com/papers/SLIC_Superpixels.pdf)
SEEDS (see here: https://arxiv.org/abs/1309.3848)
ERGC (see here: https://sites.google.com/site/pierrebuyssens/code/ergc)
Given the superpixel segmentation, there is a vast set of features you can compute in order to classify them:
In Automatic Photo Pop-Up (Table 1), Hoiem et al. consider, among others, the following features: mean RGB color, mean HSV color, color histograms, saturation histograms, textons, differently oriented Gaussian derivative filters, mean x and y location, area, ...
In Recovering Occlusion Boundaries from a Single Image, Hoiem et al. consider some additional features to the above list in Table 1.
In SuperParsing: Scalable Nonparametric Image Parsing with Superpixels, Tighe et al. additionally consider SIFT histograms, the segment mask reduced to an 8 x 8 image, bounding box shape, and color thumbnails.
In Class Segmentation and Object Localization with Superpixel Neighborhoods, Fulkerson et al. also consider features from neighboring superpixels.
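A few of the simpler features from the lists above (mean RGB color, mean x and y location, area) can be computed per segment as sketched below. This assumes the segmentation is available as an integer label map with the same height and width as the image; the function name and the normalisation to [0, 1] are choices made for this example, not taken from the cited papers.

```python
import numpy as np

def segment_features(image, labels):
    """Return {segment id: [mean R, mean G, mean B, mean x, mean y, area]}."""
    h, w = labels.shape
    rows, cols = np.mgrid[0:h, 0:w]
    features = {}
    for seg_id in np.unique(labels):
        mask = labels == seg_id
        mean_rgb = image[mask].mean(axis=0)          # average color of the segment
        mean_x = cols[mask].mean() / max(w - 1, 1)   # 0 = left edge, 1 = right edge
        mean_y = rows[mask].mean() / max(h - 1, 1)   # 0 = top edge, 1 = bottom edge
        area = mask.sum() / mask.size                # fraction of the image covered
        features[int(seg_id)] = np.concatenate([mean_rgb, [mean_x, mean_y, area]])
    return features

# Tiny synthetic example: two segments, left and right half of a 4 x 4 image.
image = np.zeros((4, 4, 3))
image[:, :2] = (10, 20, 30)
image[:, 2:] = (200, 100, 50)
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1

vectors = segment_features(image, labels)
```

Each vector can then be passed to the SVM; since the features live on very different scales, standardising them first is usually a good idea.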
Based on the superpixels, you can still apply a simple merging-scheme in order to reduce the number of superpixels. Simple merging by color histograms might already be useful for your tasks. Otherwise you can additionally use edge information in between superpixels for merging.
You don't need to restrict yourself to just one feature vector. You could try multiple feature vectors (from the list you already have) and feed them to classifiers based on multiple kernel learning (MKL). MKL has been shown to improve performance over a single-feature approach, and one of my favourite MKL techniques is VBpMKL.
If you have time, I would suggest you try one or more of the following features, which can capture the properties of interest:
Haralick texture features
Histogram of oriented gradients (HOG) features
Gabor filters
SIFT
patch-wise RGB means

Which feature descriptors to use and why?

I would like to compute the position and orientation of a camera in a civil aircraft cockpit.
I do use LEDs as fixed points. My plan is to save their X,Y,Z Position associated with the LED.
How can I detect and identify my LEDs on my images? Which feature descriptor and feature point extractor should I use?
How should I modify my image prior to feature detection?
I like to stay efficient.
----Please stop voting this question down----
Now after having found the solution to my problem, I do realize the question might have been too generic.
Anyway, to support other people googling, I am going to describe my answer.
With combinations of OpenCV's functions I create masks in which the areas where the LEDs could be are white and the rest of the image is black. These functions are, for example, Core.inRange, Imgproc.dilate, and Imgproc.erode. With Imgproc.findContours I also filter out contours that are too large or too small. Core.bitwise_and or Core.bitwise_not are also used to combine masks.
The masks are computed from an image in the HSV color space as input.
Having these masks with potential LED areas, I compute color histograms of the intensity-normalized RGB colors (hue did not work well enough for me). These histograms are trained and normalized using a set of annotated input images and represent my descriptor.
I match the trained descriptor against the ones computed in the application using histogram intersection.
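A minimal NumPy sketch of such a descriptor and its matching, under my own assumptions: "intensity normalized" is taken to mean dividing each pixel by R + G + B, the histogram is built over the resulting (r, g) pair, and the bin count of 8 is arbitrary. The function names are hypothetical and this is not the exact code I used.

```python
import numpy as np

def normalized_rg_histogram(patch, bins=8):
    """Normalised 2-D histogram of intensity-normalised colors of an RGB patch."""
    rgb = patch.reshape(-1, 3).astype(float)
    total = rgb.sum(axis=1)
    valid = total > 0                            # skip black pixels (division by zero)
    chroma = rgb[valid] / total[valid][:, None]  # r + g + b == 1 for every pixel
    # b = 1 - r - g is redundant, so a histogram over (r, g) is enough.
    hist, _, _ = np.histogram2d(
        chroma[:, 0], chroma[:, 1], bins=bins, range=[[0, 1], [0, 1]]
    )
    count = hist.sum()
    return hist / count if count > 0 else hist

def histogram_intersection(hist_a, hist_b):
    """Similarity in [0, 1] for normalised histograms; 1 means identical."""
    return np.minimum(hist_a, hist_b).sum()

# Synthetic patches: a bright red LED, a dimmer red LED, and a green LED.
bright_red = np.zeros((4, 4, 3), dtype=np.uint8)
bright_red[..., 0] = 200
dim_red = np.zeros((4, 4, 3), dtype=np.uint8)
dim_red[..., 0] = 100
green = np.zeros((4, 4, 3), dtype=np.uint8)
green[..., 1] = 200

same_color = histogram_intersection(
    normalized_rg_histogram(bright_red), normalized_rg_histogram(dim_red)
)
other_color = histogram_intersection(
    normalized_rg_histogram(bright_red), normalized_rg_histogram(green)
)
```

Because of the normalisation, the bright and the dim red patch end up with the same histogram, which is the property that makes the descriptor robust to brightness changes.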
So I receive distance measures. Using a threshold for these measures, the measures and the knowledge of the geometric positions of the real-life LEDs I translate the patches to a graph system, which helps me to find the longest chain of potential LEDs.

Confusion regarding Object recognition and features using SURF

I have some conceptual issues in understanding the SURF and SIFT algorithms (All about SURF). As far as my understanding goes, SURF finds Laplacian of Gaussians and SIFT operates on difference of Gaussians. It then constructs a 64-variable vector around it to extract the features. I have applied this CODE.
(Q1 ) So, what forms the features?
(Q2) We initialize the algorithm using SurfFeatureDetector detector(500). So, does this mean that the size of the feature space is 500?
(Q3) The output of SURF Good_Matches gives matches between Keypoint1 and Keypoint2 and by tuning the number of matches we can conclude that if the object has been found/detected or not. What is meant by KeyPoints ? Do these store the features ?
(Q4) I need to do object recognition application. In the code, it appears that the algorithm can recognize the book. So, it can be applied for object recognition. I was under the impression that SURF can be used to differentiate objects based on color and shape. But, SURF and SIFT find the corner edge detection, so there is no point in using color images as training samples since they will be converted to gray scale. There is no option of using colors or HSV in these algorithms, unless I compute the keypoints for each channel separately, which is a different area of research (Evaluating Color Descriptors for Object and Scene Recognition).
So, how can I detect and recognize objects based on their color and shape? I think I can use SURF for differentiating objects based on their shape. Say, for instance, I have 2 books and a bottle. I need to recognize only a single book out of all the objects. But, as soon as there are other similarly shaped objects in the scene, SURF gives lots of false positives. I would appreciate suggestions on what methods to apply for my application.
(Q1) The local maxima (or minima) of the DoG response, i.e. points whose response is greater (or smaller) than that of all neighbouring pixels in the same, upper and lower images of the pyramid (a 3x3x3 neighbourhood), form the coordinates of the feature (circle) centre. The radius of the circle is given by the level of the pyramid.
(Q2) It is the Hessian threshold. It means that you keep only the maxima (see Q1) with values greater than the threshold. A higher threshold leads to fewer features, but the features are more stable, and vice versa.
(Q3) Keypoint == feature. In OpenCV, KeyPoint is the structure used to store features.
(Q4) No, SURF is good for comparing textured objects, but not for shape and color. For shape, I recommend using MSER (but not the OpenCV one) or the Canny edge detector rather than local features. This presentation might be useful.
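The 3x3x3 neighbourhood test and the effect of the threshold can be sketched as follows. This is a simplified illustration, not OpenCV's SURF code: it assumes the responses of all pyramid levels are stored in one array of shape (levels, height, width), and the function name is made up.

```python
import numpy as np

def local_extrema_3x3x3(stack, threshold):
    """Return (x, y, level) of responses that beat all 26 neighbours."""
    n_levels, height, width = stack.shape
    keypoints = []
    for level in range(1, n_levels - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = stack[level, y, x]
                # Weak responses are dropped: a higher threshold gives fewer keypoints.
                if abs(value) <= threshold:
                    continue
                cube = stack[level - 1:level + 2, y - 1:y + 2, x - 1:x + 2]
                neighbours = np.delete(cube.ravel(), 13)  # index 13 is the centre
                if value > neighbours.max() or value < neighbours.min():
                    keypoints.append((x, y, level))
    return keypoints

# Synthetic response stack with a single strong peak in the middle level.
responses = np.zeros((3, 5, 5))
responses[1, 2, 2] = 5.0

found_low = local_extrema_3x3x3(responses, threshold=1.0)
found_high = local_extrema_3x3x3(responses, threshold=10.0)
```

With the low threshold the peak is reported as a keypoint; with the high threshold it is rejected, which is the trade-off between the number of features and their stability.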
