Difference between Low-Level and High-Level Feature Detection/Extraction - image-processing

According to the Wikipedia article on Feature Extraction, examples of low-level algorithms are edge detection, corner detection, etc.
But what are High-Level algorithms?
I only found this quote from the Wikipedia article Feature Detection (computer vision):
Occasionally, when feature detection is computationally expensive and there are time constraints, a higher level algorithm may be used to guide the feature detection stage, so that only certain parts of the image are searched for features.
Could you give an example of one of these higher level algorithms?

There isn't a clear-cut definition out there, but my understanding of "high-level" algorithms is that they are more in tune with how we classify objects in real life. Low-level feature detection algorithms are mostly concerned with finding corresponding points between images, or finding those things that qualify as even remotely interesting at the lowest possible level you can think of - things like finding edges or lines in an image (in addition to finding interesting points, of course). In addition, anything dealing with pixel intensities or colours directly is what I would consider low-level too.
High-level algorithms are mostly in the machine learning domain. These algorithms are concerned with the interpretation or classification of a scene as a whole. Things like body pose classification, face detection, classification of human actions, object detection and recognition and so on. These algorithms are concerned with training a system to recognize or classify something, then you provide it some unknown input that it has never seen before and its job is to either determine what is happening in the scene, or locate a region of interest where it detects an action that the system is trained to look for. This latter fact is probably what the Wikipedia article is referring to. You would have some sort of pre-processing stage where you have some high-level system that determines salient areas in the scene where something important is happening. You would then apply low-level feature detection algorithms in this localized area.
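To make that concrete, here is a minimal sketch of the idea (using OpenCV's stock Haar face cascade as the high-level stage and Shi-Tomasi corners as the low-level stage; the image file name is just a placeholder):

    # Sketch: a high-level detector (Haar face cascade) narrows the search area,
    # then a low-level detector (Shi-Tomasi corners) runs only inside that region.
    import cv2

    img = cv2.imread("scene.jpg")                      # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # High-level stage: find faces with a trained classifier shipped with OpenCV.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Low-level stage: detect corners only inside each detected face region.
    for (x, y, w, h) in faces:
        roi = gray[y:y + h, x:x + w]
        corners = cv2.goodFeaturesToTrack(roi, maxCorners=100,
                                          qualityLevel=0.01, minDistance=5)
        if corners is not None:
            for cx, cy in corners.reshape(-1, 2):
                cv2.circle(img, (int(x + cx), int(y + cy)), 2, (0, 255, 0), -1)

    cv2.imwrite("corners_in_faces.jpg", img)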
There is a great high-level computer vision workshop that talks about all of this, and you can find the slides and code examples here: https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/teaching/courses/ss-2019-high-level-computer-vision/
Good luck!

High-level features are something that we can directly see and recognize, like object classification, recognition, segmentation and so on. These are usually the goal of CV research, which is always based on 'low-level' features and algorithms.

Both kinds are used in machines, especially X-ray machines: the high-level part is concerned with the scene as a whole, while the low-level part finds edges and lines, which together help the machine's software make a good decision.

I think we should not confuse high-level features with high-level inference. To me, high-level features are things like shape, size, or combinations of low-level features, while classification is a decision made based on those high-level features.

Related

How to understand Markov Localisation Algorithm?

In my thesis project, I need to implement the Monte Carlo Localisation algorithm (it's based on Markov Localisation). I have exactly one month to understand and implement the algorithm. I understand the basics of probability and Bayes' theorem. Which topics should I get familiar with to understand the Markov algorithm? I have read a couple of research papers 3-4 times, and I still fail to understand everything.
I tried Googling whichever terms I didn't understand, but I couldn't get the essence of the algorithm. I want to understand it systematically. I know what it does, but I don't fully understand how or why it does it.
For example, in one of the research papers it was written that the Markov algorithm can be used for global indoor positioning or when you have a multi-modal Gaussian distribution, whereas the Kalman filter cannot be used in those cases. I completely didn't understand that.
Second example: the Markov algorithm assumes the map is static and makes the Markov assumption that measurements are independent and don't depend on previous measurements. But when the environment is dynamic (objects are moving), the Markov assumption is not valid and we need to modify the algorithm to handle the dynamic environment. I don't understand why.
It would be great if someone could point out which topics I should learn to understand the algorithm. Please keep in mind that I have only one month.
A particle filter is what you are looking for to localize a robot.
To implement a particle filter, you need an understanding of basic probability (mostly Bayes' theorem) and of Gaussian distributions in 2D.
slides, video
Watch these course videos; they are really good.
For example, in one of the research papers it was written that the Markov algorithm can be used for global indoor positioning or when you have a multi-modal Gaussian distribution, whereas the Kalman filter cannot be used in those cases. I completely didn't understand that.
The Kalman filter (or extended Kalman filter) assumes a unimodal distribution, and the initial estimate must be good enough to track from.
The particle filter handles multi-modal distributions and doesn't need an initial guess, but it needs more particles (samples) to converge to a good estimate.
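If it helps, here is a minimal sketch of a single particle filter iteration for 1-D localization; the motion and measurement models and the noise levels are made-up placeholders, not values from any particular paper or course:

    # Sketch of one particle filter iteration for 1-D localization.
    # Motion/measurement models and noise levels here are made-up placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000                                   # number of particles
    particles = rng.uniform(0.0, 10.0, N)      # no initial guess: spread over the whole map
    weights = np.full(N, 1.0 / N)

    def step(particles, weights, control, measurement,
             motion_noise=0.1, meas_noise=0.5):
        # 1. Predict: apply the motion model (here: move by 'control' plus noise).
        particles = particles + control + rng.normal(0.0, motion_noise, len(particles))
        # 2. Update: weight each particle by the measurement likelihood (Gaussian here).
        weights = np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
        weights += 1e-300                      # avoid division by zero
        weights /= weights.sum()
        # 3. Resample: draw particles in proportion to their weights.
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        return particles[idx], np.full(len(particles), 1.0 / len(particles))

    particles, weights = step(particles, weights, control=1.0, measurement=3.0)
    print("estimated position:", particles.mean())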
Second example: the Markov algorithm assumes the map is static and makes the Markov assumption that measurements are independent and don't depend on previous measurements. But when the environment is dynamic (objects are moving), the Markov assumption is not valid and we need to modify the algorithm to handle the dynamic environment. I don't understand why.
If the moving objects are humans, it is not difficult to localize even in a dynamic environment (unless the robot is completely surrounded by humans and cannot see any part of the environment). A simple modification is to consider only the laser rays that are in agreement with the map. The paper below explains this.
Check this paper: Markov Localization for Mobile Robots in Dynamic Environments.

More complex hand-pose estimation algorithms

I am currently looking into hand-pose estimation in Unity without using any expensive plugins! At the moment, I have implemented a simple hand-tracking system by extracting the contours of the hand, like the link below:
https://www.youtube.com/watch?v=4QE5FcUK5ZA
However, it doesn't work brilliantly in all environments and tends not to recognise the hand when other objects are in the frame (like a face!). Does anyone know any more complex algorithms for hand-pose estimation? I've looked at using neural nets, but they tend to use a lot of CPU and/or GPU power, and I need this to be lightweight and not lag in Unity.
Anyone have any suggestions?
A multi-layered random forest is a good lightweight method for real-time hand pose estimation: https://ieeexplore.ieee.org/document/7789644/.
It uses an ensemble of regressors that are specialised on different areas of the angle space, and the first layer learns how to weight the output of each of these specialised regressors.
It achieves state-of-the-art results on hand pose estimation and has been used by the author in real-time AR applications.
The model uses contour features like the ones you have extracted.
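For reference, a rough sketch of extracting that kind of contour feature with OpenCV (assuming you already have a binary hand mask; the file name and threshold are placeholders) could look like this:

    # Sketch: extract simple contour features (hull, convexity defects) from a
    # binary hand mask; the file name and threshold are placeholders.
    import cv2

    mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)   # hypothetical binary mask
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hand = max(contours, key=cv2.contourArea)                   # largest blob = hand

    hull_idx = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull_idx)              # candidate finger valleys

    features = {
        "area": cv2.contourArea(hand),
        "perimeter": cv2.arcLength(hand, True),
        "n_defects": 0 if defects is None else len(defects),
    }
    print(features)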

Implementation of state-of-the-art video shot boundary detection

I am working on a wide project involving object retrieval from videos.
According to "A Survey on Visual Content-Based Video Indexing and Retrieval", most popular methods are divided in:
simple "threshold based approach" (global or adaptative)
supervised learning-based classifiers (with SVM or AdaBoost)
unsupervised learning-based algorithm (mainly K-Means)
For the moment, I have implemented on my own a very simple and old-fashioned method based on differences of color histograms between successive frames.
Nevertheless, I would like to try something more efficient and up to date, without spending too much time, considering that shot boundary detection is not the main topic of my research.
Does anybody know an implementation of an effective algorithm?
I've found the following implementation extremely useful and effective in my own research:
https://github.com/johmathe/Shotdetect
The workhorse of the codebase happens here:
https://github.com/johmathe/Shotdetect/blob/master/src/film.cc#L117-237
It mostly relies on color information for detection of shots.
Computing the chi-square distance between RGB histograms of adjacent frames is a fast, simple and robust method for shot boundary detection. You can see my implementation and usage of this method here.
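A minimal version of that method with OpenCV might look like the following; the video path and the threshold are placeholders you would tune on your own data:

    # Sketch: chi-square distance between RGB histograms of adjacent frames;
    # frames whose distance exceeds a tuned threshold are reported as cuts.
    import cv2

    cap = cv2.VideoCapture("input.mp4")          # hypothetical video file
    THRESHOLD = 0.5                              # placeholder; tune on your data

    def rgb_hist(frame):
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        return cv2.normalize(hist, hist).flatten()

    ok, prev = cap.read()
    prev_hist, frame_idx, cuts = rgb_hist(prev), 0, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        hist = rgb_hist(frame)
        dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CHISQR)
        if dist > THRESHOLD:
            cuts.append(frame_idx)               # shot boundary candidate
        prev_hist = hist

    print("detected shot boundaries at frames:", cuts)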

Difference between feature detection and object detection

I know that most common object detection involves Haar cascades and that there are many techniques for feature detection such as SIFT, SURF, STAR, ORB, etc., but if my end goal is to recognize objects, don't both approaches end up giving me the same result? I understand using feature techniques on simple shapes and patterns, but for complex objects these feature algorithms seem to work as well.
I don't need to know the difference in how they function but whether or not having one of them is enough to exclude the other. If I use Haar cascading, do I need to bother with SIFT? Why bother?
thanks
EDIT: For my purposes, I want to implement object recognition on a broad class of things, meaning that anything shaped like a cup will be picked up as part of the class "cup". But I also want to identify specific instances, meaning an NYC cup will be picked up as the instance "NYC cup".
Object detection usually consists of two steps: feature detection and classification.
In the feature detection step, the relevant features of the object to be detected are gathered. These features are input to the second step, classification. (Even Haar cascading can be used for feature detection, to my knowledge.) Classification involves algorithms such as neural networks, K-nearest neighbours, and so on. The goal of classification is to find out whether the detected features correspond to features that the object to be detected would have. Classification generally belongs to the realm of machine learning.
Face detection is an example of object detection.
EDIT (Jul. 9, 2018):
With the advent of deep learning, neural networks with multiple hidden layers have come into wide use, making it relatively easy to see the difference between feature detection and object detection. A deep learning neural network consists of two or more hidden layers, each of which is specialized for a specific part of the task at hand. For neural networks that detect objects from an image, the earlier layers arrange low-level features into a many-dimensional space (feature detection), and the later layers classify objects according to where those features are found in that many-dimensional space (object detection). A nice introduction to neural networks of this kind is found in the Wolfram Blog article "Launching the Wolfram Neural Net Repository".
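As a rough illustration of the two-step pipeline described above (feature extraction feeding a classifier), here is a sketch using HOG features and an SVM in OpenCV; the image paths and labels are hypothetical placeholders:

    # Sketch of the two-step pipeline: HOG feature extraction feeding an SVM classifier.
    # Image paths and labels are hypothetical placeholders.
    import cv2
    import numpy as np

    hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

    def features(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (64, 64))
        return hog.compute(img).flatten()        # step 1: feature extraction

    train_paths = ["cup_01.jpg", "cup_02.jpg", "not_cup_01.jpg"]   # placeholders
    train_labels = np.array([1, 1, 0], dtype=np.int32).reshape(-1, 1)
    train_data = np.array([features(p) for p in train_paths], dtype=np.float32)

    svm = cv2.ml.SVM_create()                    # step 2: classification
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_LINEAR)
    svm.train(train_data, cv2.ml.ROW_SAMPLE, train_labels)

    test = np.array([features("query.jpg")], dtype=np.float32)
    _, prediction = svm.predict(test)
    print("predicted class:", int(prediction[0, 0]))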
Normally objects are collections of features. A feature tends to be a very low-level primitive thing. An object implies moving the understanding of the scene to the next level up.
A feature might be something like a corner, an edge etc. whereas an object might be something like a book, a box, a desk. These objects are all composed of multiple features, some of which may be visible in any given scene.
Invariance, speed, storage: a few reasons I can think of off the top of my head. The other approach would be to keep the complete image and then check whether the given image is similar to the glass images you have in your database. But if you have a compressed representation of the glass, it will need less computation (thus faster), will need less storage, and the features give you invariance across images.
Both the methods you mentioned are essentially the same, with slight differences. In the case of Haar, you detect the Haar features and then boost them to increase the confidence. Boosting is nothing but a meta-classifier which smartly chooses which Haar features to include in your final meta-classification, so that it can give a better estimate. The other method also more or less does this, except that you have more "sophisticated" features. The main difference is that you don't use boosting directly; you tend to use some sort of classification or clustering, like MoG (Mixture of Gaussians), K-Means, or some other heuristic to cluster your data. Your clustering largely depends on your features and application.
What will work in your case is a tough question. If I were you, I would play around with Haar and, if it doesn't work, would try the other method. Be aware that you might want to segment the image and provide some sort of boundary around the object for it to detect the glasses.

Natural feature tracking with openCV- evaluating the options

In brief, what are the available options for implementing the tracking of a particular image (a photo/graphic/logo) in a webcam feed using OpenCV? In particular, I am trying to collate opinions about the following:
Would Haar training be overkill (considering that it is not 3D objects but simply images to be tracked), or is it the only way out?
I have tried template matching and color-based detection, but these don't offer reliable tracking under varying illumination/scale/orientation at all.
Would SIFT/SURF feature matching work as reliably in video as with static image comparison?
I am a relative beginner to OpenCV, as is evident from my previous queries on SO (very helpful replies). Any cues or links to good resources for beginning an NFT implementation with OpenCV?
Can you talk a bit more about your requirements? Namely, what type of appearance variations do you expect, and how much control do you have over the environment? What type of constraints do you have in terms of speed/power/resource footprint?
Without those, I can only give some general assessment to the 3 paths you are talking about.
1.
Haar would work well and fast, particularly for instance recognition.
Note that Haar doesn't work all that well for 3D unless you train with a full spectrum of templates to cover various perspectives. The poster-child application of Haar cascades is Viola and Jones' face detection system, which is largely geared towards frontal faces (it can certainly be trained for many other things).
For a tutorial on doing Haar training using OpenCV, see here.
2.
Try NCC or, better yet, Lucas-Kanade tracking (cvCalcOpticalFlowPyrLK, which is pyramidal, i.e. coarse-to-fine LK; a 4-level pyramid usually works well) for a template. It is usually good up to about 10% scale change or 10 degrees of rotation without template changes. Beyond that, you can have automatically evolving templates, which can drift over time.
For a quick Optical Flow/tracking tutorial, see this.
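For reference, the modern Python equivalent of that call is cv2.calcOpticalFlowPyrLK; a minimal tracking loop (camera index and parameters are just reasonable defaults, not tuned values) might look like this:

    # Sketch: pyramidal Lucas-Kanade tracking of corner points from a webcam.
    # Camera index and parameters are reasonable defaults, not tuned values.
    import cv2

    cap = cv2.VideoCapture(0)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)

    lk_params = dict(winSize=(21, 21), maxLevel=3,   # maxLevel=3 -> 4-level pyramid
                     criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

    while True:
        ok, frame = cap.read()
        if not ok or pts is None or len(pts) == 0:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None, **lk_params)
        good = new_pts[status.flatten() == 1]        # keep only successfully tracked points
        for x, y in good.reshape(-1, 2):
            cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
        cv2.imshow("LK tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:              # Esc to quit
            break
        prev_gray, pts = gray, good.reshape(-1, 1, 2)

    cap.release()
    cv2.destroyAllWindows()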
3.
SIFT/SURF would indeed work very well. I'd suggest some additional geometric verification step to remove spurious matches.
I'd be a bit concerned about the amount of computation time involved. If there isn't significant illumination/scale/in-plane rotation variation, then SIFT is probably overkill. If you truly need it, check out Changchang Wu's excellent SIFTGPU implementation. Note: third party, not OpenCV.
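As an illustration of that geometric verification step, a common pattern is a ratio test followed by a RANSAC homography; the sketch below uses ORB (which ships with OpenCV) in place of SIFT/SURF, and the file names are placeholders:

    # Sketch: feature matching with a ratio test, then a RANSAC homography as the
    # geometric verification step. ORB is used here; SIFT/SURF would be analogous.
    import cv2
    import numpy as np

    template = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)     # image to track
    frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)       # current video frame

    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(template, None)
    kp2, des2 = orb.detectAndCompute(frame, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [pair[0] for pair in matches                          # Lowe ratio test
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    if len(good) >= 4:
        src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        print("inlier matches after geometric verification:", int(inliers.sum()))
    else:
        print("not enough matches to verify geometrically")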
It seems that none of these methods, when applied alone, brings reliable results unless it is a hobby project. Probably some adaptive algorithm would be more or less acceptable. For example, see a famous open-source project where they use machine learning.
