Normalizing lighting conditions for image recognition - opencv

I am using OpenCV to process a video to identify frames containing objects using a rudimentary algorithm (background subtraction → brightness threshold → dilation → contour detection).
The video is shot over the course of a day, so the lighting conditions change gradually; I expect my results would improve if I added some sort of brightness and/or contrast normalization step first.
Answers to this question suggest using convertScaleAbs, contrast optimization with histogram clipping, or histogram equalization.
Of these techniques or others, what would be the best preprocessing step to normalize the frames?
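For illustration, here is a minimal sketch (not a definitive recommendation) of slotting a per-frame normalization step in front of the existing pipeline, comparing global histogram equalization with CLAHE; the video file name and all parameter values are assumptions:

    import cv2

    cap = cv2.VideoCapture("day_timelapse.mp4")   # hypothetical input video
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    bg_sub = cv2.createBackgroundSubtractorMOG2()

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Option 1: global histogram equalization (stretches the whole histogram).
        eq_global = cv2.equalizeHist(gray)

        # Option 2: CLAHE -- local, clip-limited equalization, usually gentler
        # under slowly drifting daylight.
        eq_local = clahe.apply(gray)

        # Feed the normalized frame into the existing pipeline:
        # background subtraction -> threshold -> dilation -> contours.
        fg = bg_sub.apply(eq_local)
        _, mask = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)
        mask = cv2.dilate(mask, None, iterations=2)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
    cap.release()

CLAHE's clip limit caps how strongly local contrast is amplified, which tends to make it more robust to gradual illumination drift than global equalization.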

Related

Semantic segmentation: how to evaluate the influence of noise on the effectiveness and robustness of medical image segmentation?

I have done some pre-processing on medical 3D MRIs, including N4 bias correction, noise removal and scaling, and I was asked one question:
How do you evaluate the influence of noise on the effectiveness and robustness of the medical image segmentation? When the image structure is affected by various kinds of noise, the extracted features deteriorate. This effect should be taken into account when assessing the method's effectiveness at different noise intensities.
How do you evaluate the effect of the noise, and how do you justify the noise removal method used in the scientific manuscript?
I don't know if this is helpful, but I did something similar once in class with nuclear magnetic resonance.
In that case we used the Shepp-Logan phantom with the FFT, then added noise to the picture (by adding random numbers with a Gaussian distribution).
When you transform the image back to the phantom you can see the effects of the noise and sometimes artifacts (mostly due to the FFT algorithm and the choice of window function).
What I did was check the mean intensity of the image before and after; then, on the edges of the phantom (the skull), you can see how sharp the transition from white to black (and vice versa) remains.
This can easily be tested with MATLAB code and the phantom. Once you reach the accuracy you need, you can apply the algorithm you chose to real images.
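A rough Python equivalent of that MATLAB experiment, assuming scikit-image's shepp_logan_phantom is available; the noise level sigma is an arbitrary choice for illustration:

    import numpy as np
    from skimage.data import shepp_logan_phantom  # assumed available in scikit-image

    phantom = shepp_logan_phantom().astype(np.float64)

    # Simulated acquisition: FFT, additive complex Gaussian noise, inverse FFT.
    kspace = np.fft.fft2(phantom)
    sigma = 0.05 * np.abs(kspace).mean()
    noise = (np.random.normal(0, sigma, kspace.shape)
             + 1j * np.random.normal(0, sigma, kspace.shape))
    recon = np.abs(np.fft.ifft2(kspace + noise))

    # Global statistic: mean intensity before vs. after.
    print("mean before:", phantom.mean(), "mean after:", recon.mean())

    # Edge-sharpness proxy at the "skull": magnitude of the intensity gradient.
    gy, gx = np.gradient(phantom)
    gy_n, gx_n = np.gradient(recon)
    print("max edge strength before:", np.hypot(gx, gy).max(),
          "after:", np.hypot(gx_n, gy_n).max())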

How to handle YUV422 (YUYV) image as input of a CNN?

I want to feed an image stored in the YUV422 (YUYV) format into a CNN. YUV422 means that two pixels are represented by four bytes: the two pixels share their chroma values but have separate luminance values.
I understand that spatial structure plays an important role for convolutional neural networks, i.e. the filters should "see" the luminance pixels together with their corresponding chroma pixels. So how would one approach this problem? Or is it not a problem at all?
I want to avoid an additional preprocessing step for performance reasons.
Convolutional neural networks, as implemented in common frameworks like TensorFlow and PyTorch, store channels in a planar fashion. That is, each channel (R, G, B or Y, U, V) is stored in a contiguous region covering all the pixels in the image (width x height), in contrast to formats where the channel data are interleaved within each pixel. So you will need to upsample the subsampled U and V channels to match the size of the Y channel, and then feed the result to the network in the same way as RGB data.
Others have found it to work OK, though not quite matching the performance of RGB. See https://github.com/ducha-aiki/caffenet-benchmark/blob/master/Colorspace.md
and "Effect of image colourspace on performance of convolution neural networks" by K. Sumanth Reddy, Upasna Singh, and Prakash K. Uttam.
It is unlikely that the YUV to RGB conversion would be a bottleneck. RGB has the distinct advantage that one can reuse many excellent pretrained models (transfer learning).
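As a concrete illustration of the upsampling step, here is a minimal sketch that deinterleaves a raw YUYV buffer into planar Y, U, V and resizes the chroma planes to match the luma; the frame dimensions and buffer source are assumptions:

    import numpy as np
    import cv2

    def yuyv_to_planar_yuv(buf: bytes, width: int, height: int) -> np.ndarray:
        # Each pair of pixels is 4 bytes: Y0 U Y1 V.
        raw = np.frombuffer(buf, dtype=np.uint8).reshape(height, width * 2)
        y = np.ascontiguousarray(raw[:, 0::2])   # full-resolution luma, H x W
        u = np.ascontiguousarray(raw[:, 1::4])   # subsampled chroma, H x W/2
        v = np.ascontiguousarray(raw[:, 3::4])   # subsampled chroma, H x W/2

        # Upsample the chroma horizontally so all three planes are H x W.
        u_full = cv2.resize(u, (width, height), interpolation=cv2.INTER_LINEAR)
        v_full = cv2.resize(v, (width, height), interpolation=cv2.INTER_LINEAR)
        return np.dstack([y, u_full, v_full])    # H x W x 3, channels-last

    # Alternatively, OpenCV can convert straight to RGB, which lets you reuse
    # pretrained models (it expects an H x W x 2 uint8 view of the YUYV data):
    # rgb = cv2.cvtColor(yuyv_view, cv2.COLOR_YUV2RGB_YUY2)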
1. In the paper YUVMultiNet: Real-time YUV multi-task CNN for autonomous driving, the Y and UV channels are fed into separate convolution branches (see the sketch after this list).
2. RGB vs. YUV vs. HSV vs. Lab (1)
As mentioned by Jon Nordby, the benchmark linked above shows a comparison.
Interestingly, the learned RGB2GRAY conversion performs better than the one in OpenCV.
YCbCr seems to underperform RGB. The experiment was conducted on ImageNet-2012.
I will try it on COCO or other datasets later.
3. RGB vs. YUV vs. HSV vs. Lab (2)
In the paper Deep learning approach with colorimetric spaces and vegetation indices for vine diseases detection in UAV images, YUV is generally better than RGB.
RGB and YUV color spaces obtained the best performances in terms of discrimination between the four classes.
RGB:
remains sensitive to the size of the CNN's convolution filters, especially for the 64x64 patch size.
The HSV and LAB color spaces performed worse than the other color spaces.
For HSV:
this is mainly due to the Hue (H) channel, which groups all the color information into a single channel, making it harder for the network to learn good color features from this space. It is the hue that carries most of the relevance for classification; the saturation and value channels did not contribute much.
For the LAB color space:
the classification results were not conclusive. This may be because the a and b channels do not effectively represent the colors related to the diseased vineyard, and the L channel contributes little to the classification because it only represents the quantity of light in the color space.
From the results, YUV is more stable and consistent, which makes it more suitable for symptom detection. These good performances are related to the color information of the green colour corresponding to healthy vegetation and the brown and yellow colours characterising diseased vegetation, which is captured in the U and V channels.
The combination of different color spaces produced lower scores than each space separately, except for a slight improvement at the 16x16 patch size.
This is because the CNN was not able to extract good color features from multiple channels for discriminating the healthy and diseased classes.
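To make the two-branch idea from point 1 concrete, here is a toy PyTorch-style sketch of feeding the full-resolution Y plane and the subsampled UV planes through separate convolutional stems before fusing them; the channel counts and strides are invented, and this is not the actual YUVMultiNet architecture:

    import torch
    import torch.nn as nn

    class TwoBranchYUVStem(nn.Module):
        def __init__(self):
            super().__init__()
            # Luma branch: full-resolution 1-channel input, stride 2 in both axes.
            self.y_branch = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            # Chroma branch: 4:2:2 chroma is H x W/2, so stride only vertically.
            self.uv_branch = nn.Sequential(
                nn.Conv2d(2, 16, kernel_size=3, stride=(2, 1), padding=1),
                nn.ReLU(inplace=True),
            )
            self.fuse = nn.Conv2d(32, 32, kernel_size=1)

        def forward(self, y, uv):
            # y:  N x 1 x H x W      (full-resolution luma)
            # uv: N x 2 x H x W/2    (4:2:2 subsampled chroma)
            fy = self.y_branch(y)      # N x 16 x H/2 x W/2
            fuv = self.uv_branch(uv)   # N x 16 x H/2 x W/2
            return self.fuse(torch.cat([fy, fuv], dim=1))

    stem = TwoBranchYUVStem()
    out = stem(torch.randn(1, 1, 128, 128), torch.randn(1, 2, 128, 64))
    print(out.shape)  # torch.Size([1, 32, 64, 64])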

Effects of 2-D Preprocessing on Feature Extraction

Which of these papers is correct?
"Effects of 2-D Preprocessing on Feature Extraction":
"We show that higher frequencies do not, for the purposes of feature extraction, necessarily represent human-salient features and that the combination of contrast enhancement, decimation, and lowpass filtering achieves more robust feature extraction than simple high-frequency boosting. Our ideal feature extractor therefore incorporates a decimator for reduction to an idealized size, contrast enhancement through stretched dynamic range, and frequency-domain filtering with a Gaussian lowpass filter."
" Fast SIFT algorithm based on Sobel edge detector "
This paper proposes a fast SIFT
algorithm based on Sobel edge detector. Sobel edge detector is
applied to generate an edge group scale space and SIFT detector
detects the extreme point under the constraint of the edge group
scale space. The experimental results show that the proposed
algorithm decreases the redundancy of keypoints and speeds up
the implementation while the matching rate between different
images maintains at a high level. As the threshold of Sobel
detector increases, number of keypoints decreases and matching
rate gets higher.
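For reference, the preprocessing chain the first abstract describes (decimation, contrast stretching, frequency-domain Gaussian lowpass filtering) might look roughly like this with OpenCV and NumPy; the target size and filter width are placeholder values, not the paper's settings:

    import cv2
    import numpy as np

    def preprocess(img_gray: np.ndarray, target=(256, 256), sigma=15.0) -> np.ndarray:
        # 1. Decimate to an "idealized" working size.
        small = cv2.resize(img_gray, target, interpolation=cv2.INTER_AREA)

        # 2. Contrast enhancement by stretching the dynamic range to [0, 255].
        stretched = cv2.normalize(small, None, 0, 255, cv2.NORM_MINMAX).astype(np.float64)

        # 3. Frequency-domain filtering with a Gaussian lowpass filter.
        f = np.fft.fftshift(np.fft.fft2(stretched))
        rows, cols = stretched.shape
        y, x = np.ogrid[:rows, :cols]
        dist2 = (y - rows / 2) ** 2 + (x - cols / 2) ** 2
        gaussian_lp = np.exp(-dist2 / (2 * sigma ** 2))
        filtered = np.fft.ifft2(np.fft.ifftshift(f * gaussian_lp))
        return np.clip(np.abs(filtered), 0, 255).astype(np.uint8)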

Which feature descriptors to use and why?

I would like to compute the position and orientation of a camera in a civil aircraft cockpit.
I use LEDs as fixed points; my plan is to store each LED's X, Y, Z position together with its identity.
How can I detect and identify my LEDs in my images? Which feature descriptor and feature point extractor should I use?
How should I modify my image prior to feature detection?
I would like the approach to stay efficient.
----Please stop voting this question down----
Now, after having found the solution to my problem, I realize the question might have been too generic.
Anyway, to help other people searching for this, I am going to describe my answer.
Using combinations of OpenCV's functions, I create masks that contain, in white, the areas where the LEDs could be; the rest of the image is black. These functions are, for example, Core.inRange, Imgproc.dilate, and Imgproc.erode. With Imgproc.findContours I filter out contours that are too large or too small, and Core.bitwise_and and Core.bitwise_not are used to combine masks.
The masks are computed from an image in the HSV color space.
Having these masks of potential LED areas, I compute color histograms of the intensity-normalized RGB colors (hue did not work well enough for me). These histograms are trained and normalized using a set of annotated input images and form my descriptor.
In the application I match the trained descriptors against the computed ones using histogram intersection.
This gives me distance measures. Using a threshold on these measures, together with the measures themselves and the known geometric positions of the real-life LEDs, I translate the patches into a graph, which helps me find the longest chain of potential LEDs.
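A rough Python/OpenCV sketch of the approach described above (the original answer uses the Java bindings); the HSV bounds, kernel sizes, contour-area limits, and histogram bin count are placeholders rather than the values actually used:

    import cv2
    import numpy as np

    def led_candidate_mask(bgr: np.ndarray) -> np.ndarray:
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
        # Bright, saturated regions are candidate LEDs (placeholder bounds).
        mask = cv2.inRange(hsv, (0, 80, 200), (180, 255, 255))
        mask = cv2.dilate(mask, np.ones((3, 3), np.uint8))
        mask = cv2.erode(mask, np.ones((3, 3), np.uint8))

        # Drop contours that are too small or too large to be an LED.
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cleaned = np.zeros_like(mask)
        for c in contours:
            if 10 < cv2.contourArea(c) < 500:
                cv2.drawContours(cleaned, [c], -1, 255, thickness=cv2.FILLED)
        return cleaned

    def normalized_rgb_histogram(bgr: np.ndarray, mask: np.ndarray, bins=16) -> np.ndarray:
        # Intensity-normalized color (chromaticity) histogram inside the mask.
        img = bgr.astype(np.float32)
        chroma = (img / (img.sum(axis=2, keepdims=True) + 1e-6) * 255).astype(np.uint8)
        hist = cv2.calcHist([chroma], [0, 1], mask, [bins, bins], [0, 256, 0, 256])
        return cv2.normalize(hist, None).flatten()

    # Match a candidate against a trained LED descriptor with histogram intersection:
    # score = cv2.compareHist(trained_hist, candidate_hist, cv2.HISTCMP_INTERSECT)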

Confusion regarding Object recognition and features using SURF

I have some conceptual issues in understanding the SURF and SIFT algorithms (All about SURF). As far as I understand, SURF finds the Laplacian of Gaussians while SIFT operates on the difference of Gaussians. It then constructs a 64-variable vector around the keypoint to extract the features. I have applied this CODE.
(Q1 ) So, what forms the features?
(Q2) We initialize the algorithm using SurfFeatureDetector detector(500). So, does this mean that the size of the feature space is 500?
(Q3) The output of SURF Good_Matches gives matches between Keypoint1 and Keypoint2, and by tuning the number of matches we can conclude whether the object has been found/detected or not. What is meant by KeyPoints? Do these store the features?
(Q4) I need to build an object recognition application. In the code, the algorithm appears able to recognize the book, so it can be applied to object recognition. I was under the impression that SURF can be used to differentiate objects based on color and shape. But SURF and SIFT detect corner- and edge-like structure, so there is no point in using color images as training samples since they will be converted to grayscale. There is no option to use color or HSV in these algorithms, unless I compute the keypoints for each channel separately, which is a different area of research (Evaluating Color Descriptors for Object and Scene Recognition).
So how can I detect and recognize objects based on their color and shape? I think I can use SURF to differentiate objects based on their shape. Say, for instance, I have two books and a bottle, and I need to recognize only a single book among all the objects. But as soon as there are other similarly shaped objects in the scene, SURF gives lots of false positives. I would appreciate suggestions on what methods to apply for my application.
1. The local maxima (where the DoG response is greater (or smaller) than the responses of the neighbouring pixels around the point and in the pyramid levels above and below -- a 3x3x3 neighbourhood) form the coordinates of the feature (circle) centers. The radius of the circle is given by the pyramid level.
2. It is the Hessian threshold. It means that you keep only the maxima (see 1) with values bigger than the threshold. A bigger threshold leads to fewer features, but those features are more stable, and vice versa.
3. Keypoint == feature. In OpenCV, KeyPoint is the structure used to store features.
4. No. SURF is good for comparing textured objects, but not for shape or color. For shape I recommend MSER (though not the OpenCV implementation) or the Canny edge detector rather than local features. This presentation might be useful.
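For reference, a minimal sketch of the workflow these questions and answers refer to: SURF keypoints with a Hessian threshold, descriptor matching, and ratio-test filtering of "good" matches. SURF lives in opencv-contrib's xfeatures2d module and may require a build with non-free algorithms enabled; the image file names here are hypothetical:

    import cv2

    img_obj = cv2.imread("book.png", cv2.IMREAD_GRAYSCALE)      # hypothetical files
    img_scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

    # Same role as SurfFeatureDetector detector(500): the Hessian threshold.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=500)
    kp1, des1 = surf.detectAndCompute(img_obj, None)    # keypoints + 64-dim descriptors
    kp2, des2 = surf.detectAndCompute(img_scene, None)

    # Ratio-test matching; few good matches suggests the object is not in the scene.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    print("good matches:", len(good))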
