OpenCV Image Comparison for Surface Damage Detection

We are planning to create a surface damage detection prototype for ceramic tiles using OpenCV, with surface discoloration as the specific type of damage. We would like to know which method we should consider using. We are new to developing these kinds of object recognition/object tracking programs. We've read about methods such as the histogram method and the one where the hue-saturation-value (HSV) was being tracked, but we are still confused.
Also, we would like to know whether it is possible to detect the Hue saturation value of an object without the use of track bars.
Any relevant and helpful response will be greatly appreciated.

I think you can do it in sequence:
1) Find the tile region. Use a corner detector, Hough lines, etc.
2) Compute SIFT (or other) descriptors and recognize which image should be on this tile (find it in your database of tile images).
3) Align the images carefully. For example, find the homography between the image found in the DB and the image of the tile from the camera (using SIFT features).
4) Compute the color distance between every pixel in the tile image from the camera and the tile image from the database.
5) Threshold the differences by some value -> get the problematic regions.
Also think about lighting. You have to provide equal lighting conditions for your measurements.
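Steps 4 and 5 can be sketched with NumPy alone, assuming the two images are already aligned and have the same size. The function name `damage_mask` and the threshold of 40 are illustrative assumptions, not something from the answer above, and a perceptual color space such as Lab would probably give more meaningful distances than raw BGR values.

```python
import numpy as np

def damage_mask(reference, sample, threshold=40.0):
    """Return a boolean mask of pixels whose color differs from the reference.

    Both inputs are H x W x 3 uint8 arrays that are assumed to be aligned
    already (step 3). The threshold is a hypothetical value to tune.
    """
    if reference.shape != sample.shape:
        raise ValueError("images must be aligned and have the same shape")
    # Work in float so the subtraction cannot wrap around in uint8.
    diff = reference.astype(np.float32) - sample.astype(np.float32)
    # Step 4: Euclidean color distance for every pixel.
    distance = np.sqrt((diff ** 2).sum(axis=2))
    # Step 5: keep only the pixels that differ by more than the threshold.
    return distance > threshold

# Synthetic example: a uniform tile with one discolored 10 x 10 patch.
reference = np.full((100, 100, 3), 200, dtype=np.uint8)
sample = reference.copy()
sample[20:30, 40:50] = (120, 150, 200)
mask = damage_mask(reference, sample)
damaged_fraction = mask.mean()  # share of the tile flagged as damaged
```

The resulting mask can then be cleaned up or grouped into regions before deciding whether the tile counts as damaged.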

Related

Tracking of rotating objects using opencv

I need to track cars on the road from top-view video.
My application contains two main parts:
Detecting cars on the frame (Tensorflow trained network)
Tracking detected cars (opencv trackers)
I have trouble with the OpenCV trackers. Initially I tried different trackers, but only MOSSE is fast enough. This tracker works almost perfectly on a straight road, but I ran into problems with rotating cars. This situation appears at crossroads.
As I understand it, the bounding box of a rotated object is bigger than the bbox of a horizontal or vertical object. As a result, the bbox contains a large part of the static background and the tracker loses the target object.
Are there any alternative trackers which can track contours (not bounding boxes)?
Can I improve the results of the existing OpenCV trackers through any settings or by adjusting the picture?
Schema:
Real image:
If your camera is stationary the following scenario is feasible:
Use background subtraction methods to separate the background image from the foreground blobs.
Improve the foreground results using morphological operations.
Detect car blobs and remove other blobs.
Track foreground blobs in the video, i.e. binary tracking (simply use this or even apply a Kalman filter (KF)).
A very basic but effective approach in this scenario might be to track the center coordinates of the bounding box: if the center coordinates only change along one axis (with a small tolerance for the other axis), it's a linear motion (not a rotation). If both x and y change, the car is moving in the roundabout.
This only has the weakness that it will detect diagonal motion, but since you are looking at a centered roundabout, that shouldn't be an issue.
It will also be very efficient memory-wise.
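The check described above fits in a few lines of plain Python. This is only a sketch: `classify_motion` and the tolerance of 3 pixels are assumptions made for illustration, and the second example shows the diagonal-motion weakness mentioned above.

```python
def classify_motion(centers, tolerance=3.0):
    """Classify a track as 'stationary', 'linear' or 'turning'.

    centers is a list of (x, y) bounding-box centers over time.
    """
    xs = [c[0] for c in centers]
    ys = [c[1] for c in centers]
    # An axis counts as "changed" once its spread exceeds the tolerance.
    moved_x = max(xs) - min(xs) > tolerance
    moved_y = max(ys) - min(ys) > tolerance
    if moved_x and moved_y:
        return "turning"
    if moved_x or moved_y:
        return "linear"
    return "stationary"

# Car driving along x with one pixel of jitter in y.
straight = [(10 + 5 * t, 100 + (t % 2)) for t in range(10)]
# Both coordinates change: reported as turning, even for a pure diagonal.
diagonal = [(10 + 5 * t, 100 + 4 * t) for t in range(10)]

print(classify_motion(straight))  # linear
print(classify_motion(diagonal))  # turning
```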
You should use the PCA method, which can calculate the orientation of a detected object and which way it is facing. You can change the threshold of detection to select objects more like the cars (based upon shape and colour - a HSV conversion which in your case is red) in your picture.
Link to an introduction to Principal Component Analysis (PCA)
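As a rough illustration of the idea, the main axis of a binary blob can be computed with NumPy by taking the eigenvector of the pixel covariance matrix with the largest eigenvalue. `blob_orientation` is a hypothetical helper, and the blob here is a synthetic rectangle rather than a segmented car. Note that this gives the axis of the object, not which end is the front.

```python
import numpy as np

def blob_orientation(mask):
    """Return the main-axis angle of a binary blob in degrees, in [0, 180).

    The angle is measured in image coordinates (x to the right, y down).
    """
    ys, xs = np.nonzero(mask)
    points = np.column_stack((xs, ys)).astype(np.float64)
    points -= points.mean(axis=0)          # center the pixel coordinates
    cov = np.cov(points, rowvar=False)     # 2 x 2 covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh sorts eigenvalues in ascending order, so the last column
    # is the direction of largest variance (the main axis).
    vx, vy = eigenvectors[:, -1]
    return float(np.degrees(np.arctan2(vy, vx)) % 180.0)

# Synthetic "car": a 10 x 60 rectangle standing upright in the image.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:80, 45:55] = 1
angle = blob_orientation(mask)   # close to 90 degrees
```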
Method 1 :
- Detect bounding boxes and subtract the background to get rotated rectangles around the blobs.
Method 2 :
- implement your own version of detector with rotated boxes.
Method 3 :
- Use segmentation instead ... Unet for example.
There are no other trackers than the ones found in the library.
Your best bet is to filter the image and use findContours.
Optical flow and background subtraction will help with this. You can combine optical flow with your car detector to rule out false positives.
https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.html
https://docs.opencv.org/3.4/d1/dc5/tutorial_background_subtraction.html

Detecting balls on a pool table

I'm currently working on a project where I need to be able to very reliably get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
The initial image looks like this (after converting it from 16-bit to 8-bit by throwing away pixels that are not around table level):
Then I subtract a reference image of the empty table from the current image.
After thresholding and equalization it looks like this: image
It's fairly easy to detect the individual balls in a single image; the problem is that I have to do it continuously at 30 fps.
Difficulties:
Low resolution image (512*424), a ball is around 4-5 pixels in diameter
Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different on the depth image, for example the black ball is kind of inverted compared to the others
If they touch each other then they can become one blob on the image, if I try to separate them with depth thresholding (only using the top of the balls) then some of the balls can disappear from the image
It's really important that anything other than balls should not be detected e.g.: cue, hands etc...
My process, which kind of works but is not reliable enough:
16bit to 8bit by thresholding
Subtracting sample image with empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or hand can be detected as a ball, and two balls touching can also cause issues. I also tried Hough circles, but with even less success. (It works nicely if the Kinect is closer, but then it can't cover the whole table.)
Any clues would be much appreciated.
Expanding comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" that using computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table. (the image you posted can be 117% bigger and still cover the pockets)
Align the camera to be perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top down view than a slightly tilted view (which is what the depth gradient shows). (sure the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight)
With a more reliable setup you should be able to threshold based on depth.
You can possibly threshold down to the centre of the balls, since the information below is occluded anyway. The balls do not deform, so if the radius decreases quickly the ball has probably gone into a pocket.
Once you have a clean threshold image, you can use findContours() and minEnclosingCircle(). Additionally, you should constrain the results based on min and max radius values to avoid other objects that may be in view (hands, pool cues, etc.). Also have a look at moments() and be sure to read Adrian's excellent Ball Tracking with OpenCV article.
It's using Python, but you should be able to find OpenCV equivalent call for the language you use.
In terms of tracking:
If you use OpenCV 2.4 you should look into OpenCV 2.4's tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has its own list of contributed tracking algorithms (such as TLD).
I recommend starting with Moments first: use the simplest and least computationally expensive setup initially and see how robust the results are before going into the more complex algorithms (which will take time to understand and to tune until they give the expected results).

What are keypoints in image processing?

When using OpenCV for example, algorithms like SIFT or SURF are often used to detect keypoints. My question is what actually are these keypoints?
I understand that they are some kind of "points of interest" in an image. I also know that they are scale invariant and are circular.
Also, I found out that they have orientation but I couldn't understand what this actually is. Is it an angle, but between the radius and something? Can you give some explanation? I think what I need first is something simpler, and after that it will be easier to understand the papers.
Let's tackle each point one by one:
My question is what actually are these keypoints?
Keypoints are the same thing as interest points. They are spatial locations, or points in the image that define what is interesting or what stands out in the image. Interest point detection is actually a subset of blob detection, which aims to find interesting regions or spatial areas in an image. The reason why keypoints are special is because no matter how the image changes... whether the image rotates, shrinks/expands, is translated (all of these would be an affine transformation by the way...) or is subject to distortion (i.e. a projective transformation or homography), you should be able to find the same keypoints in this modified image when comparing with the original image. Here's an example from a post I wrote a while ago:
Source: 'module' object has no attribute 'drawMatches' opencv python
The image on the right is a rotated version of the left image. I've also only displayed the top 10 matches between the two images. If you take a look at the top 10 matches, these are points that we probably would want to focus on that would allow us to remember what the image was about. We would want to focus on the face of the cameraman as well as the camera, the tripod and some of the interesting textures on the buildings in the background. You see that these same points were found between the two images and these were successfully matched.
Therefore, what you should take away from this is that these are points in the image that are interesting and that they should be found no matter how the image is distorted.
I understand that they are some kind of "points of interest" of an image. I also know that they are scale invariant and I know they are circular.
You are correct. Scale invariant means that no matter how you scale the image, you should still be able to find those points.
Now we are going to venture into the descriptor part. What makes keypoints different between frameworks is the way you describe these keypoints. These are what are known as descriptors. Each keypoint that you detect has an associated descriptor that accompanies it. Some frameworks only do a keypoint detection, while other frameworks are simply a description framework and they don't detect the points. There are also some that do both - they detect and describe the keypoints. SIFT and SURF are examples of frameworks that both detect and describe the keypoints.
Descriptors are primarily concerned with both the scale and the orientation of the keypoint. We've nailed down the concept of keypoints, but we need the descriptor part if our purpose is to match keypoints between different images. Now, what you mean by "circular"... that correlates with the scale that the point was detected at. Take for example this image that is taken from the VLFeat Toolbox tutorial:
You see that any points that are yellow are interest points, but some of these points have a different circle radius. These deal with scale. How interest points work in a general sense is that we decompose the image into multiple scales. We check for interest points at each scale, and we combine all of these interest points together to create the final output. The larger the "circle", the larger the scale was that the point was detected at. Also, there is a line that radiates from the centre of the circle to the edge. This is the orientation of the keypoint, which we will cover next.
Also I found out that they have orientation but I couldn't understand what actually it is. It is an angle but between the radius and something?
Basically, when they talk about the orientation of keypoints, what they really mean is that they search a pixel neighbourhood that surrounds the keypoint and figure out what direction this patch is oriented in. It depends on what descriptor framework you look at, but the general gist is to detect the most dominant orientation of the gradient angles in the patch. This is important for matching so that you can match keypoints together. Take a look at the first figure I have with the two cameramen - one rotated while the other isn't. If you take a look at some of those points, how do we figure out how one point matches with another? We can easily identify that the interest point at the top of the cameraman matches its counterpart in the rotated version, because we take a look at the points that surround the keypoint and see what orientation all of these points are in... and from there, that's how the orientation is computed.
Usually when we want to detect keypoints, we just take a look at the locations. However, if you want to match keypoints between images, then you definitely need the scale and the orientation to facilitate this.
I'm not as familiar with SURF, but I can tell you about SIFT, which SURF is based on. I provided a few notes about SURF at the end, but I don't know all the details.
SIFT aims to find highly-distinctive locations (or keypoints) in an image. The locations are not merely 2D locations on the image, but locations in the image's scale space, meaning they have three coordinates: x, y, and scale. The process for finding SIFT keypoints is:
1) blur and resample the image with different blur widths and sampling rates to create a scale-space
2) use the difference of gaussians method to detect blobs at different scales; the blob centers become our keypoints at a given x, y, and scale
3) assign every keypoint an orientation by calculating a histogram of gradient orientations for every pixel in its neighborhood and picking the orientation bin with the highest number of counts
4) assign every keypoint a 128-dimensional feature vector based on the gradient orientations of pixels in 16 local neighborhoods
Step 2 gives us scale invariance, step 3 gives us rotation invariance, and step 4 gives us a "fingerprint" of sorts that can be used to identify the keypoint. Together they can be used to match occurrences of the same feature at any orientation and scale in multiple images.
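To make the orientation step more concrete, here is a heavily simplified NumPy sketch of picking the dominant gradient orientation of a patch. It is not the actual SIFT implementation: real SIFT also applies a Gaussian weighting window and refines the peak, and `dominant_orientation` and the 36-bin histogram here are assumptions made for illustration.

```python
import numpy as np

def dominant_orientation(patch, bins=36):
    """Return the center (in degrees) of the strongest gradient-orientation bin."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)             # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    # Each pixel votes for its orientation bin, weighted by gradient strength.
    hist, edges = np.histogram(angle, bins=bins, range=(0.0, 360.0),
                               weights=magnitude)
    peak = int(np.argmax(hist))
    return float(0.5 * (edges[peak] + edges[peak + 1]))

# A 16 x 16 intensity ramp, and the same patch with x and y swapped.
rows, cols = np.mgrid[0:16, 0:16]
patch = rows + 2 * cols
swapped = patch.T

theta_a = dominant_orientation(patch)
theta_b = dominant_orientation(swapped)
```

The two patches show the same structure in different orientations and get different dominant angles. A descriptor computed relative to that angle would come out the same for both, which is what makes the matching rotation invariant.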
SURF aims to accomplish the same goals as SIFT but uses some clever tricks in order to increase speed.
For blob detection, it uses the determinant of Hessian method. The dominant orientation is found by examining the horizontal and vertical responses to Haar wavelets. The feature descriptor is similar to SIFT, looking at orientations of pixels in 16 local neighborhoods, but results in a 64-dimensional vector.
SURF features can be calculated up to 3 times faster than SIFT features, yet are just as robust in most situations.
For reference:
A good SIFT tutorial
An introduction to SURF

Filtering out shadows when diffing frames in opencv

I am using OpenCV to process some videos where a user is placing their hands on different parts of a wall. I've selected some regions of interest and I'm currently just using cv2.absdiff on the original image of the wall with no user and the current frame to detect whether the user has their hand in a region of interest by looking at the average pixel difference. If it's above some threshold, I consider that region "activated".
The problem I'm having is that some of the video clips contain lighting and positions that result in the user casting a shadow over certain ROIs, such that they are above the threshold. Is there a good way to filter out shadows when diffing images?
OpenCV has a Mixture of Gaussians (MOG) based background subtractor, which also has an option to account for shadows. You can use this instead of absdiff. MOG can be a bit slow though, compared to absdiff.
Alternatively, you can convert to HSV, and check that the Hue doesn't change.
You could first detect shadow regions in the original images, and exclude them from the difference imaging part. This paper provides a simple but effective method to detect shadows in images. They explore a colour space that is invariant to shadows.

Headcount in opencv

I am new to OpenCV. I have to implement a headcount.
My idea is:
Identification of circular objects
We will start by edge detection to find border line of each shape.
sort through the image matrix pixel by pixel
for each pixel, analyze each of the 8 pixels surrounding it
record the value of the darkest pixel, and the lightest pixel
if (lightest_pixel_value - darkest_pixel_value) > threshold
then rewrite that pixel as 1;
else rewrite that pixel as 0;
Now we detect shapes
count the number of continuous edges
a sharp change in line direction signifies a different line
do this by determining the average vector between adjacent pixels
if one line, then it's a circle
by measuring angles between lines more information can be deduced (rhomboid, equilateral triangle, etc.)
Face detection
This part includes two common approaches based on features and color. The basic idea of the algorithm is to find objects resembling an eye, then on the basis of geometric face characteristics try to join two of the objects into an eye pair.
Steps:
Unimportant colors are eliminated from the image and insignificant colors are replaced with white color.
The image is then converted to grayscale.
The image is filtered with a median filter (unimportant white regions are blurred)
White regions are segmented using a Region growth algorithm.
Hough transform is applied to find circles
For each region the best possible circle is found
Using geometric face characteristics the pair of eyes is found
Is this the right way to proceed, or is there an easier way?
I want to count (estimate) the number of people in a crowd (meetings, gatherings).
Can you help me with the code, please?
Thank you
You can use the OpenCV built-in face detection. See http://opencv.willowgarage.com/wiki/FaceDetection for detailed instructions.
I had a similar project.
You need to get the best image possible, so concentrate on fixing saturation, contrast and intensity.
If you're planning to use color, for example for skin color detection, then you need to fix the white balance.
Don't think of headcount, instead think of people count.
You need good background segmentation: use a Gaussian mixture model combined with another background modeling algorithm.
If this is an outdoor application you need shadow detection.
Get foreground blobs and then determine where people are in those blobs.
If you're counting heads, you need to detect the omega shape of the head and shoulders.
You will need tracking for occlusions and people crossing.
You can also use human body classification; OpenCV has haarcascade_fullbody.xml.
These are just some ideas...
