I'm trying to think of ways in which clustering (e.g. k-means) fits into procedures for semantic segmentation or object recognition on images. My understanding is that semantic segmentation is done principally with deep CNNs. K-means works fine for plain segmentation, but semantic segmentation is supervised, which makes clustering by itself insufficient.
My question is: how can such unsupervised techniques fit into the overall pipeline of semantic segmentation? Do other techniques generally dominate them, or are there still practical use cases for problems involving classification/localization? I'm aware of a paper using k-means clustering to generate candidate boxes – are there other relevant use cases of clustering techniques in this pipeline?
They do not dominate, but they are used when labeled data is scarce. Unsupervised methods are used in medical image segmentation, where annotated data is generally hard to obtain.
Example - Hill Climbing Segmentation
Implementation: https://in.mathworks.com/matlabcentral/fileexchange/22274-hill-climbing-color-image-segmentation
Have a look at this paper for a discussion of hill climbing and k-means for segmentation.
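If you want to experiment with clustering-based segmentation yourself, here is a minimal sketch of k-means color segmentation in Python (assuming OpenCV and scikit-learn are installed; `scan.png` is a placeholder file name):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.imread("scan.png")                     # hypothetical input file
pixels = img.reshape(-1, 3).astype(np.float32)   # one row per pixel (BGR)

k = 4                                            # number of segments to find
km = KMeans(n_clusters=k, n_init=10).fit(pixels)

# Replace every pixel with its cluster center to visualize the segments.
segmented = km.cluster_centers_[km.labels_].reshape(img.shape)
cv2.imwrite("segmented.png", segmented.astype(np.uint8))
```

Note that this only groups pixels by color; unlike true semantic segmentation, the resulting clusters carry no class labels.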
Related
Is the gradient descent algorithm ever used during training of unsupervised methods like clustering, collaborative filtering, etc.?
Gradient descent can be used for a whole range of unsupervised learning tasks. In fact, neural networks, which are trained with gradient descent, are widely used for unsupervised learning tasks, such as learning vector-space representations of text or natural language (word2vec).
You can also think of dimensionality reduction techniques like autoencoders, which are trained with gradient descent as well.
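As a concrete illustration, here is a minimal autoencoder sketch trained purely by gradient descent (assuming PyTorch; the data and layer sizes are arbitrary toy choices):

```python
import torch
import torch.nn as nn

# Toy data: 256 samples of 32-dimensional vectors (stand-in for real features).
x = torch.randn(256, 32)

# A small autoencoder: compress 32 dims down to 4 and reconstruct.
model = nn.Sequential(
    nn.Linear(32, 4),   # encoder
    nn.ReLU(),
    nn.Linear(4, 32),   # decoder
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)  # plain gradient descent
loss_fn = nn.MSELoss()

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), x)  # reconstruction error, no labels needed
    loss.backward()              # gradients via backprop
    opt.step()                   # gradient descent update
```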
I am not aware of GD being directly used in clustering, but this link discusses an approach that combines autoencoders (which are trained with GD) with k-means.
Read this link as well, which discusses a similar question.
In many unsupervised algorithms, you don't need gradient descent at all. For example, in k-means, where you are trying to minimize the mean squared error (MSE), you can minimize the error directly at each step given the assignments; no gradients needed.
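To make that concrete, here is a sketch of the k-means alternating minimization in NumPy: both steps solve their subproblem exactly in closed form, so no gradients ever appear (the data and initialization are toy choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                        # toy data
k = 3
centers = X[rng.choice(len(X), k, replace=False)]    # init from data points

for _ in range(10):
    # Assignment step: nearest center for each point (exact minimizer).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each center moves to the mean of its points (exact
    # minimizer); empty clusters keep their previous center.
    centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
```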
I have some questions about SVMs:
1. Why use SVMs? In other words, what motivated their development?
2. What is the state of the art (2017)?
3. What improvements have been made?
SVMs work very well. In many applications, they are still among the best-performing algorithms.
We've seen some progress in particular on linear SVMs, which can be trained much faster than kernel SVMs.
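To get a feel for the speed difference yourself, here is a rough sketch comparing scikit-learn's linear and kernel SVMs on synthetic data (timings will vary by machine; the dataset size is arbitrary):

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Rough timing comparison: linear SVM vs. kernel SVM on the same data.
for clf in (LinearSVC(dual=False), SVC(kernel="rbf")):
    t0 = time.perf_counter()
    clf.fit(X, y)
    print(type(clf).__name__, f"fit in {time.perf_counter() - t0:.2f}s")
```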
Read more of the literature; don't expect an exhaustive answer in this Q&A format, and show more effort on your part.
SVMs are most commonly used for classification problems where labeled data is available (supervised learning), and they are useful for modeling with limited data. For problems with unlabeled data (unsupervised learning), support vector clustering is a commonly employed algorithm. SVMs tend to perform better on binary classification problems, since the decision boundaries will not overlap. Your 2nd and 3rd questions are very ambiguous (and need a lot of work!), but suffice it to say that SVMs have found wide applicability in medical data science. Here's a link to explore more about this: Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
This thread discusses the comparison of different computer vision concepts. Fully Convolutional Networks for Semantic Segmentation is a very popular deep learning approach to semantic segmentation. What are the popular or state-of-the-art deep learning approaches for object detection? It seems to me that these two problems share quite a few similarities. Are there any frameworks or methodologies that study leveraging the result of solving one problem, i.e., semantic segmentation, to solve object detection?
Have a look at this link. It contains a list of networks for different computer vision tasks.
R-FCN: Object Detection via Region-based Fully Convolutional Networks is a new take on object detection that uses an FCN (which is used for semantic segmentation) for object detection.
There are several famous models like YOLO, R-FCN, and SSD, and their derivatives.
According to the Wikipedia article on Feature Extraction, examples of low-level algorithms are edge detection, corner detection, etc.
But what are High-Level algorithms?
I only found this quote from the Wikipedia article Feature Detection (computer vision):
Occasionally, when feature detection is computationally expensive and there are time constraints, a higher level algorithm may be used to guide the feature detection stage, so that only certain parts of the image are searched for features.
Could you give an example of one of these higher level algorithms?
There isn't a clear-cut definition out there, but my understanding is that "high-level" algorithms are more in tune with how we classify objects in real life. Low-level feature detection algorithms are mostly concerned with finding corresponding points between images, or finding those things that qualify as even remotely interesting at the lowest possible level you can think of – things like edges or lines in an image (in addition to interesting points, of course). In addition, anything dealing with pixel intensities or colours directly is what I would consider low-level too.
High-level algorithms are mostly in the machine learning domain. These algorithms are concerned with the interpretation or classification of a scene as a whole: things like body pose classification, face detection, classification of human actions, object detection and recognition, and so on. You train a system to recognize or classify something, then provide it some unknown input it has never seen before, and its job is either to determine what is happening in the scene or to locate a region of interest where it detects an action the system is trained to look for. This latter case is probably what the Wikipedia article is referring to: you would have a pre-processing stage where a high-level system determines salient areas in the scene where something important is happening, and you would then apply low-level feature detection algorithms only in those localized areas.
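As a toy illustration of that guiding idea, here is a sketch where a high-level detector (a pre-trained face cascade) proposes salient regions and a low-level feature detector (ORB) is then run only inside them (assuming the opencv-python package; `scene.jpg` is a placeholder file name):

```python
import cv2

img = cv2.imread("scene.jpg")   # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# High-level stage: a trained face detector proposes salient regions.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Low-level stage: extract corner-like keypoints only inside each region.
orb = cv2.ORB_create()
for (x, y, w, h) in faces:
    roi = gray[y:y + h, x:x + w]
    keypoints = orb.detect(roi, None)
    print(f"face at ({x},{y}): {len(keypoints)} keypoints")
```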
There is a great high-level computer vision workshop that talks about all of this, and you can find the slides and code examples here: https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/teaching/courses/ss-2019-high-level-computer-vision/
Good luck!
High-level features are things we can directly see and recognize, and they are the concern of tasks like object classification, recognition, segmentation, and so on. These are usually the goal of CV research, which is always built on "low-level" features and algorithms.
Both kinds are used in machines, especially X-ray machines: the scene as a whole and the detected edges and lines help the machine's software make good decisions.
I think we should not confuse high-level features with high-level inference. To me, high-level features are things like shape, size, or a combination of low-level features, while classification is the decision made based on those high-level features.
I've been reading about similarity measures and image feature extraction; most of the papers refer to k-means as a good uniform clustering technique. My question is: is there any alternative to k-means clustering that performs better for a specific dataset?
You may want to look at MeanShift clustering, which has several advantages over K-Means:
- It doesn't require a preset number of clusters.
- K-Means clusters converge to an n-dimensional Voronoi tessellation, while MeanShift allows other cluster shapes.
MeanShift is implemented in OpenCV in the form of CAMShift, which is a MeanShift adaptation for tracking objects in a video sequence.
If you need more info, you can read this excellent paper about MeanShift and Computer Vision:
Mean shift: A robust approach toward feature space analysis
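If you want to try it quickly, scikit-learn also ships a MeanShift implementation; here is a minimal sketch on toy data (the blob parameters are arbitrary):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)
# Two blobs of toy 2-D points; MeanShift must discover the count itself.
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(4, 0.5, (100, 2))])

bandwidth = estimate_bandwidth(X, quantile=0.2)      # kernel width from data
ms = MeanShift(bandwidth=bandwidth).fit(X)
print("clusters found:", len(ms.cluster_centers_))   # no k was specified
```

Note that no cluster count is passed in; the bandwidth, estimated from the data, controls the granularity instead.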
As a simple first step, you could generalize k-means to EM (i.e., a Gaussian mixture model). But there are tons of clustering methods available, and the kind of clustering you need depends on your data (features) and the application. In some cases even the distance measure you use matters, and you may have to apply some sort of distance transformation if your data is not in the kind of space you want it to be in.
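For instance, here is a minimal sketch of that k-means-to-EM step using scikit-learn's Gaussian mixture model (the toy covariances are chosen to make the clusters elongated, where plain k-means tends to struggle):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Elongated, overlapping blobs where plain k-means tends to struggle.
X = np.vstack([rng.multivariate_normal([0, 0], [[3, 1], [1, 0.5]], 150),
               rng.multivariate_normal([4, 4], [[0.5, 0], [0, 3]], 150)])

gmm = GaussianMixture(n_components=2, covariance_type="full").fit(X)
labels = gmm.predict(X)          # hard assignments, like k-means labels
probs = gmm.predict_proba(X)     # soft responsibilities, which k-means lacks
```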