semantic segmentation and object detection

This thread discusses the comparison of different computer vision concepts. Fully Convolutional Networks for Semantic Segmentation is a very popular deep learning approach for semantic segmentation. What are the popular or state-of-the-art deep learning approaches for object detection? It seems to me that these two problems share quite a few similarities. Is there any framework or methodology that leverages the result of solving one problem, e.g. semantic segmentation, to solve the other, object detection?

Have a look at this link. It contains a list of networks for different computer vision tasks.
R-FCN: Object Detection via Region-based Fully Convolutional Networks is a new take on object detection that uses an FCN (the kind of network used for semantic segmentation) to detect objects.

There are several well-known models, such as YOLO, R-FCN, SSD, and their derivatives.

Related

Role of clustering algorithms in semantic segmentation pipeline?

I'm trying to think of ways in which clustering (e.g. k-means) fits into procedures for doing semantic segmentation or object recognition on images. My understanding is that semantic segmentation is done principally using deep CNNs. K-means works fine for segmentation, but semantic segmentation is supervised, which makes clustering by itself insufficient.
My question is: how can such unsupervised techniques fit into the overall pipeline of semantic segmentation? Do other techniques generally dominate them, or are there still practical use cases for problems involving classification/localization? I'm aware of a paper using k-means clustering to generate candidate boxes; are there other relevant use cases of clustering techniques in this pipeline?

They do not dominate, but they are used when little labeled data is available.
Unsupervised methods are used in medical image segmentation, where data is generally scarce.
Example: Hill Climbing Segmentation
Implementation: https://in.mathworks.com/matlabcentral/fileexchange/22274-hill-climbing-color-image-segmentation
Have a look at this paper for a discussion of hill climbing and k-means for segmentation.
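
To make the k-means part concrete, here is a minimal sketch of unsupervised colour segmentation in plain Python. It is not the hill-climbing method from the link above, and the function name kmeans_segment and its parameters are made up for this illustration. The point it shows is the one from the question: the clusters come without class names, so a supervised step is still needed to make the segmentation "semantic".

```python
import random


def kmeans_segment(pixels, k, iterations=10, seed=0):
    """Cluster pixel colours with k-means; return (labels, centres).

    pixels: list of (r, g, b) tuples. No ground-truth labels are used, so
    the result is "cluster 0 / cluster 1", not "sky / road".
    """
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # initialise centres from the data
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel goes to its nearest centre.
        for i, p in enumerate(pixels):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each centre moves to the mean of its pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the old centre if a cluster went empty
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers
```

For a real image you would flatten it into a list of pixels first and reshape the labels back into the image grid afterwards; a NumPy or library implementation would be much faster than this loop-based version.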

Building a Tetris AI using Neuroevolution

I am planning to create a Tetris AI using an artificial neural network and train it with a genetic algorithm for a project in my high school computer science class. I have a basic understanding of how an ANN works and how to implement it with a genetic algorithm. I have already written a working neural network based on this tutorial and I'm currently working on the genetic algorithm.
My questions are:
Which GA model is better for this situation (Tetris), and why?
What should I use as input for the neural network? Currently I simply convert the state of the board (the pieces) into a one-dimensional array and feed it into the neural network. Is there a better approach?
What size (number of layers, neurons per layer) should the neural network be?
Are there any good sources of information that can help me?
Thank you!

A similar task has already been solved by Google (DeepMind), and they solved it for all kinds of Atari games: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Read this article carefully, and all of the related articles too.
This is a reinforcement learning task, which is in my opinion the hardest kind of task in the ML domain. So there is no short answer to your questions, except that you probably shouldn't use a GA heuristic at all and should rely on reinforcement learning methods instead.
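
On the input question specifically, the encoding described in the question (board flattened to a one-dimensional array) can be sketched as below. The second function, column_heights, is only a hypothetical alternative added for comparison; it does not come from the question or from the DQN paper, and whether it helps is something you would have to test.

```python
def flatten_board(board):
    """Turn a 2-D board (rows of 0/1 cells) into a 1-D input vector,
    row by row from top to bottom, as described in the question."""
    return [float(cell) for row in board for cell in row]


def column_heights(board):
    """Hypothetical, more compact input: stack height of each column."""
    rows = len(board)
    heights = []
    for col in range(len(board[0])):
        height = 0
        for r in range(rows):
            if board[r][col]:
                height = rows - r  # first filled cell seen from the top
                break
        heights.append(height)
    return heights


# A tiny 3x3 board used only for illustration (1 = filled cell).
board = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
]
print(flatten_board(board))
print(column_heights(board))
```

The flattened vector has one input per cell, so its length fixes the size of the network's input layer.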

Difference between Low-Level and High-Level Feature Detection/ Extraction

According to the Wikipedia article on feature extraction, examples of low-level algorithms are edge detection, corner detection, etc.
But what are High-Level algorithms?
I only found this quote from the Wikipedia article Feature Detection (computer vision):
Occasionally, when feature detection is computationally expensive and there are time constraints, a higher level algorithm may be used to guide the feature detection stage, so that only certain parts of the image are searched for features.
Could you give an example of one of these higher level algorithms?

There isn't a clear-cut definition out there, but as I understand it, "high-level" algorithms are more in tune with how we classify objects in real life. Low-level feature detection algorithms are mostly concerned with finding corresponding points between images, or with finding anything that counts as even remotely interesting at the lowest possible level you can think of: edges or lines in an image, in addition to interest points of course. Anything that deals with pixel intensities or colours directly is also what I would consider low-level.
High-level algorithms are mostly in the machine learning domain. These algorithms are concerned with the interpretation or classification of a scene as a whole: body pose classification, face detection, classification of human actions, object detection and recognition, and so on. They involve training a system to recognize or classify something; you then provide it with some unknown input that it has never seen before, and its job is either to determine what is happening in the scene or to locate a region of interest where it detects an action that the system is trained to look for. This latter point is probably what the Wikipedia article is referring to. You would have some sort of pre-processing stage in which a high-level system determines salient areas in the scene where something important is happening. You would then apply low-level feature detection algorithms in this localized area.
There is a great high-level computer vision workshop that talks about all of this, and you can find the slides and code examples here: https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/teaching/courses/ss-2019-high-level-computer-vision/
Good luck!
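
A rough sketch of that last idea, assuming the high-level stage has already produced a region of interest: a low-level edge detector (here a plain 3x3 Sobel filter) is run only inside that region. The function name and the roi tuple layout are choices made for this example, and the high-level detector itself is not shown.

```python
import math


def sobel_magnitude(image, roi=None):
    """Gradient magnitude from 3x3 Sobel kernels.

    image: list of rows of grey values.
    roi:   (top, left, bottom, right) with bottom/right exclusive, or None
           to process the whole image. Pixels outside the ROI and on the
           image border are left at 0.
    """
    h, w = len(image), len(image[0])
    top, left, bottom, right = roi if roi else (0, 0, h, w)
    out = [[0.0] * w for _ in range(h)]
    for y in range(max(top, 1), min(bottom, h - 1)):
        for x in range(max(left, 1), min(right, w - 1)):
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out
```

Restricting the loops to the region is where the time saving mentioned in the Wikipedia quote comes from: the cost scales with the area of the region rather than the area of the image.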

High-level features are something that we can directly see and recognize, like object classification, recognition, segmentation and so on. These are usually the goal of CV research, which is always based on 'low-level' features and algorithms.

Both of them are used in machines, especially X-ray machines.
Looking at the scene as a whole and at the edges of lines helps the machine's software make good decisions.

I think we should not confuse high-level features with high-level inference. To me, high-level features are things like shape, size, or a combination of low-level features, while classification is the decision made on the basis of those high-level features.

How does HOG feature descriptor training work?

There don't seem to be any implementations of HOG training in OpenCV, and there are few sources on how HOG training works. From what I have gathered, HOG training can be done in real time. But what are the requirements for training? How does the training process actually work?

As with most computer vision algorithms, Google Scholar is your friend :) I would suggest reading a few papers on how it works. Here is one of the most referenced papers on HOG for you to start with.
Another tip when researching in computer vision is to note the authors of the papers you find interesting, and try to find their websites. They will tend to have an implementation of their algorithms as well as rules of thumb on how to use them. Also, look up the references that are cited in the paper about your algorithm. This can be very helpful in acquiring the background knowledge to truly understand how the algorithm works and why.

Your terminology is a bit mixed up. HOG is a feature descriptor. You can train a classifier using HOG, which can in turn be used for object detection. OpenCV includes a people detector that uses HOG features and an SVM classifier. It also includes CascadeClassifier, which can use HOG, and which is typically used for face detection.
There is a program in OpenCV called opencv_traincascade, which lets you train a cascade object detector, and which gives you the option to use HOG. There is a function in the Computer Vision System Toolbox for MATLAB called trainCascadeObjectDetector, which does the same thing.
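
To illustrate the split between descriptor and classifier, here is a deliberately simplified sketch in plain Python. It is not OpenCV's implementation: the descriptor skips the overlapping block normalisation of real HOG, and a nearest-centroid rule stands in for the SVM. All function names are invented for the example.

```python
import math


def hog_descriptor(image, cell=4, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude, concatenated and L2-normalised.
    (Real HOG also normalises over overlapping blocks of cells.)"""
    h, w = len(image), len(image[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
                    gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle / (180.0 / bins)) % bins] += math.hypot(gx, gy)
            feats.extend(hist)
    norm = math.sqrt(sum(v * v for v in feats)) or 1.0
    return [v / norm for v in feats]


def train_centroids(descriptors, labels):
    """Stand-in for SVM training: average the descriptors of each class."""
    sums, counts = {}, {}
    for d, lab in zip(descriptors, labels):
        acc = sums.setdefault(lab, [0.0] * len(d))
        for i, v in enumerate(d):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, descriptor):
    """Return the label whose centroid is closest to the descriptor."""
    return min(centroids, key=lambda lab: sum(
        (a - b) ** 2 for a, b in zip(centroids[lab], descriptor)))
```

So the "training" happens in the classifier, on descriptors computed from labelled example windows; the HOG step itself has nothing to learn.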

Difference between feature detection and object detection

I know that the most common object detection involves Haar cascades and that there are many techniques for feature detection such as SIFT, SURF, STAR, ORB, etc. But if my end goal is to recognize objects, don't both ways end up giving me the same result? I understand using feature techniques on simple shapes and patterns, but these feature algorithms seem to work for complex objects as well.
I don't need to know the difference in how they function but whether or not having one of them is enough to exclude the other. If I use Haar cascading, do I need to bother with SIFT? Why bother?
thanks
EDIT: for my purposes I want to implement object recognition on a broad class of things. Meaning that any cups that are similarly shaped as cups will be picked up as part of class cups. But I also want to specify instances, meaning a NYC cup will be picked up as an instance NYC cup.

Object detection usually consists of two steps: feature detection and classification.
In the feature detection step, the relevant features of the object to be detected are gathered.
These features are input to the second step, classification. (Even Haar cascading can be used
for feature detection, to my knowledge.) Classification involves algorithms
such as neural networks, K-nearest neighbor, and so on. The goal of classification is to find
out whether the detected features correspond to features that the object to be detected
would have. Classification generally belongs to the realm of machine learning.
Face detection is one example of object detection.
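
The two steps can be sketched as follows. The features here (average horizontal and vertical intensity change) are toy features picked for this example only, and k-nearest neighbour is used because it is one of the classifiers named above; a real detector would use far richer features.

```python
from collections import Counter


def extract_features(image):
    """Step 1 (feature detection), with two toy features chosen for this
    sketch: mean horizontal and mean vertical intensity change."""
    h, w = len(image), len(image[0])
    dx = sum(abs(image[y][x + 1] - image[y][x])
             for y in range(h) for x in range(w - 1))
    dy = sum(abs(image[y + 1][x] - image[y][x])
             for y in range(h - 1) for x in range(w))
    return (dx / (h * (w - 1)), dy / ((h - 1) * w))


def knn_classify(train_features, train_labels, query, k=3):
    """Step 2 (classification): majority vote among the k training
    feature vectors nearest to the query (squared Euclidean distance)."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Swapping in a different feature extractor or a different classifier changes only one of the two functions, which is the sense in which the steps are separate.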
EDIT (Jul. 9, 2018):
With the advent of deep learning, neural networks with multiple hidden layers have come into wide use, making it relatively easy to see the difference between feature detection and object detection. A deep learning neural network consists of two or more hidden layers, each of which is specialized for a specific part of the task at hand. For neural networks that detect objects from an image, the earlier layers arrange low-level features into a many-dimensional space (feature detection), and the later layers classify objects according to where those features are found in that many-dimensional space (object detection). A nice introduction to neural networks of this kind is found in the Wolfram Blog article "Launching the Wolfram Neural Net Repository".

Normally objects are collections of features. A feature tends to be a very low-level primitive thing. An object implies moving the understanding of the scene to the next level up.
A feature might be something like a corner, an edge etc. whereas an object might be something like a book, a box, a desk. These objects are all composed of multiple features, some of which may be visible in any given scene.

Invariance, speed, and storage are a few reasons I can think of off the top of my head. The alternative would be to keep the complete image and then check whether the given image is similar to the glass images you have in your database. But if you have a compressed representation of the glass, it needs less computation (and is thus faster), needs less storage, and the features tell you what is invariant across images.
Both methods you mentioned are essentially the same, with slight differences. In the case of Haar, you detect the Haar features and then boost them to increase the confidence. Boosting is nothing but a meta-classifier that smartly chooses which Haar features to include in your final meta-classification, so that it can give a better estimate. The other method more or less does the same, except that you have more "sophisticated" features. The main difference is that you don't use boosting directly. You tend to use some sort of classification or clustering, such as MoG (Mixture of Gaussians), k-means, or some other heuristic, to cluster your data. Your clustering largely depends on your features and application.
What will work in your case is a tough question. If I were you, I would play around with Haar and, if it doesn't work, try the other method (obs :>). Be aware that you might want to segment the image and give it some sort of boundary within which to detect glasses.
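
For reference, this is roughly what a single Haar-like feature is: a difference between the pixel sums of adjacent rectangles, computed cheaply from an integral image. The sketch below is plain Python with names invented for the example, and it shows only one two-rectangle feature; the boosting step that picks among many such features is not shown.

```python
def integral_image(image):
    """ii[y][x] holds the sum of image[0..y-1][0..x-1]; the extra first
    row and column of zeros keep the lookups below free of edge cases."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: right half minus left half."""
    half = width // 2
    return (rect_sum(ii, top, left + half, height, half)
            - rect_sum(ii, top, left, height, half))
```

Each feature is a single number per window, which is why a boosted cascade can evaluate very many of them quickly.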
