Implementation of state-of-the-art video shot boundary detection - image-processing

I am working on a wide project involving object retrieval from videos.
According to "A Survey on Visual Content-Based Video Indexing and Retrieval", most popular methods are divided in:
simple "threshold based approach" (global or adaptative)
supervised learning-based classifiers (with SVM or AdaBoost)
unsupervised learning-based algorithm (mainly K-Means)
For the moment I implemented on my own a very simple and old-fashioned method based on differences of color-histograms among successive frames.
Nevertheless, I would like to try something more efficient and up to date, without spending too much time, considering that shot boundary detection is not the main topic of my research.
Does anybody know an implementation of an effective algorithm?

I've found the following implementation extremely useful and effective in my own research:
https://github.com/johmathe/Shotdetect
The workhorse of the codebase happens here:
https://github.com/johmathe/Shotdetect/blob/master/src/film.cc#L117-237
It mostly relies on color information for detection of shots.

Computing Chi-Square distance of RGB histograms of adjacent frames is one of the fast, simple and robust methods for shot boundary detection. You can see my implementation and usage of this method in here.

Related

Image Recognition Techniques for Identifying Logos (Brands)

I started learning Image Recognition a few days back and I would like to do a project in which it need to identify different brand logos in Android.
For Ex: If I take a picture of Nike logo in an Android device then it needs to display "Nike".
Low computational time would be the main criteria for me.
For this, I have done some work and started learning OpenCV sample examples.
What would be the best Image Recognition that would be used for me.
1) I came to know from Template Matching that their applicability is limited mostly by the available computational power, as identification of big and complex templates can be time consuming. (and so I don't want to use it)
2) Feature Based detectors like SIFT/SURF/STAR (As per my knowledge this would be a better option for me)
3) How about Deep Learning and Pattern recognition concepts? (I was digging on this and don't know whether it would be an option for me). Can any of you let me know whether I can use this and why it would be an better choice for me when compared with 1 and 2.
4) Haar caascade classifiers (From one of the posts in SO, I came to know that by using Haar it doesn't work in Rotation and Scale invariant and so I haven't concentrated much on this). Does this been a better Option for me If I focus up on.
I’m now running one of my pet projects and it's required face recognition – detecting the area with face on the photo, if it exists with Raspberry pi, so I’ve done some analysis about that task
And I found this approach. The key idea is in avoiding scanning entire picture to help by scanning windows of different sizes like it was in OpenCV, but by dividing an entire photo into 49 (7x7) squares and train the model not only for detecting of presenting one of classes inside each square, but also for determining the location and size of detecting object
It’s only 49 runs of trained model, so I think it's possible to execute this in less than in a second even on non state-of-the-art smartphones. Anyway, it will be a trade-off between accuracy and performance
About the model
I will use vgg –like model, probably a bit simpler than even vgg11A.
In my case ready dataset already exists. So I can train convolutional network with it
Why deep learning approach is better than 1-3 you mentioned? Because of its higher accuracy for such kind of tasks. It’s practically proven. You could check it in kaggle. Majority of the top models for classification competitions are based on convolutional networks
The only disadvantage for you – probably it would be necessary create your own dataset to train the model
Here is a post that I think can be useful for you: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition. Another one: Logo recognition in images.
2) Feature Based detectors like SIFT/SURF/STAR (As per my knowledge
this would be a better option for me)
Just remember that SIFT and SURF are both patented so you will need a license for any commercial use (free for non-commercial use).
4) Haar caascade classifiers (From one of the posts in SO, I came to know that by using Haar it doesn't work in Rotation and Scale invariant and so I haven't concentrated much on this). Does this been a better Option for me If I focus up on.
It works (if I understand your question right), much of this depends of how you trained your classifier. You could train it to detect all kind of rotations and scales. Anyways, I would discourage you to go for this option as I think the other possible solutions are better meant for the case.

feature matching/detection on brain images

This question is for those who have tried feature detection/matching methods on brain images - it is a broad one, and perhaps a bad one:
How could you tell if the method you used was "good enough?"
What does a successful matching/detection test look like for your data?
EDIT:
As of now, I am not trying to detect any distinct features in particular.
I'm using OpenCV's ORB, SIFT, SURF, etc detection methods, and seeing what they identify for features.
Sometimes, however, the orientation of the brain changes entirely from a
few set of images to the next set, so if I compare two images from these sets,the detection methods won't yield any effective
results (i.e. the matching will be distinctly, completely off). But if I compare images that look similar, but not identical,
the detection seems to work alright. Point is, it seems like detection works for frames that were taken around the same
time, but not over a long interval. I wonder if others have come across this and if they have found that detection methods
are still useful despite the fact.
First of all, you should specify what kind of features or for which purpose, the experiment is going to be performed.
Feature extraction is highly subjective in nature, it all depends on what type of problem you are trying to handle. There is no generic feature extraction scheme which works in all cases.
For example if the features are pointing out to some tumor classification or lesion, then of course there are different softwares you can use to extract and define your features.
There are different methods to detect the relevant features regarding to the application:
SURF algorithm (Speeded Up Robust Features)
PLOFS: It is a fast wrapper approach with a subset evaluation.
ICA or 'PCA
This paper is a very great review about brain MRI data feature extraction for tissue classification:
https://pdfs.semanticscholar.org/fabf/a96897dcb59ad9f04b5ff92bd15e1bd159ef.pdf
I found this paper very good o understand the difference between feature extraction techniques.
https://www.sciencedirect.com/science/article/pii/S1877050918301297

Difference between Low-Level and High-Level Feature Detection/ Extraction

According to this Wikipedia article Feature Extraction examples for Low-Level algorithms are Edge Detection, Corner Detection etc.
But what are High-Level algorithms?
I only found this quote from the Wikipedia article Feature Detection (computer vision):
Occasionally, when feature detection is computationally expensive and there are time constraints, a higher level algorithm may be used to guide the feature detection stage, so that only certain parts of the image are searched for features.
Could you give an example of one of these higher level algorithms?
There isn't a clear cut definition out there, but my understanding of "high-level" algorithms are more in tune with how we classify objects in real life. For low-level feature detection algorithms, these are mostly concerned with finding corresponding points between images, or finding those things that classify as something even remotely interesting at the lowest possible level you can think of - things like finding edges or lines in an image (in addition to finding interesting points of course). In addition, anything dealing with pixel intensities or colours directly is what I would consider low-level too.
High-level algorithms are mostly in the machine learning domain. These algorithms are concerned with the interpretation or classification of a scene as a whole. Things like body pose classification, face detection, classification of human actions, object detection and recognition and so on. These algorithms are concerned with training a system to recognize or classify something, then you provide it some unknown input that it has never seen before and its job is to either determine what is happening in the scene, or locate a region of interest where it detects an action that the system is trained to look for. This latter fact is probably what the Wikipedia article is referring to. You would have some sort of pre-processing stage where you have some high-level system that determines salient areas in the scene where something important is happening. You would then apply low-level feature detection algorithms in this localized area.
There is a great high-level computer vision workshop that talks about all of this, and you can find the slides and code examples here: https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/teaching/courses/ss-2019-high-level-computer-vision/
Good luck!
High-level features are something that we can directly see and recognize, like object classification, recognition, segmentation and so on. These are usually the goal of CV research, which is always based on 'low-level' features and algorithms.
Two of them are used in machine specially x-ray machine
Concerned Scene as a whole and edges of lines to help soft ware of machine to take good decision.
I think we should not confuse with high-level features and high-level inference. To me, high-level features mean shape, size, or a combination of low-level features etc. are the high-level features. While classification is the decision made based on the high-level features.

Stereovision algorithms

For my project, supposed to segment closest hand region from camera, I initially try openCV's stereovision example. However, disparity map looks very bad and its useless for me.
Is there any other method which is better than openCV implementation and have some output(image-video). Because, my time is limited, I must choose one better algorithm and implement this.
Thank you.
OpenCV implements a number of stereo block matching algorithms some of them pretty cutting edge.
Disparity maps always look bad except in very simple circumstances - the first step is to try and improve the source images, the lighting and the background. I
If it was easy then everybody would eb doing it and there would be no market for expensive 3D laser scanners.
Try the different block matching algorithms provided by OpenCV. The little bit of experimentation I've done so far seems to indicate that cv::StereoSGBM gives better disparity maps than cv::StereoBM, but is slower.
The performance of the block matching algorithms will depend on what parameters they are initialized with. Have a look at the stereo examples again here, notice line 195-222 where the algorithms are initialized.
I also suggest you use some basic GUI (OpenCV:s highgui for example) to manipulate these parameters real-time when finetuning the algorithm.

Natural feature tracking with openCV- evaluating the options

In brief, what are the available options for implementing the Tracking of a particular Image(A photo/graphic/logo) in webcam feed using OpenCv?In particular i am trying to collate opinion about the following:
Would HaarTraining be overkill(considering that it is not 3d objects but simply Images to be tracked) or is it the only way out?
Have tried Template Matching, Color-based detection but these don't offer reliable tracking under varying illumination/Scale/Orientation at all.
Would SIFT,SURF feature matching work as reliably in video as with static image
comparison?
Am a relative beginner to OpenCV , as is evident by my previous queries on SO (very helpful replies). Any cues or links to what could be good resources for beginning NFT implementation with OpenCV?
Can you talk a bit more about your requirements? Namely, what type of appearance variations do you expect/how much control you have over the environment. What type of constraints do you have in terms of speed/power/resource footprint?
Without those, I can only give some general assessment to the 3 paths you are talking about.
1.
Haar would work well and fast, particularly for instance recognition.
Note that Haar doesn't work all that well for 3D unless you train with a full spectrum of templates to cover various perspectives. The poster child application of Haar cascades is Viola Jones' face detection system which is largely geared towards frontal faces (can certainly be trained for many other things)
For a tutorial on doing Haar training using OpenCV, see here.
2.
Try NCC or better yet, Lucas Kanade tracking (cvCalcOpticalFlowPyrLK which is a pyramidal as in coarse-to-fine LK - a 4 level pyramid usually works well) for a template. Usually good upto 10% scale or 10 degrees rotation without template changes. Beyond that, you can have automatically evolving templates which can drift over time.
For a quick Optical Flow/tracking tutorial, see this.
3.
SIFT/SURF would indeed work very well. I'd suggest some additional geometric verification step to remove spurious matches.
I'd be a bit concerned about the amount of computational time involved. If there isn't significant illumination/scale/in-plane rotation, then SIFT is probably overkill. If you truly need it, check out Changchang Wu's excellent SIFTGPU implmentation. Note: 3rd party, not OpenCV.
It seems that none of the methods when applied alone could bring reliable results unless it is a hobby project. Probably some adaptive algorithm would be more or less acceptable. For example see a famous opensource project where they use machine learning.

Resources