Removing sunlight reflections from IR camera images in a real-time OpenCV application - opencv

I am developing a speed estimation and vehicle counting application with OpenCV, using an IR camera.
I am facing a problem with sunlight reflections, which cause vertical white regions or lines in the images and hurt my vehicle detections.
I need an approach that is very fast, because this is a real-time application.

The vertical streak defect in those images is called "blooming". It happens when one or a few wells in a CCD saturate to the point that they spill charge over into neighboring wells in the same column. In addition, you have "regular" saturation with no blooming around the area of the reflection.
If you can, the best solution is to control the exposure (a faster shutter time, or closing the lens iris if you have one). This will reduce, but not eliminate, the occurrence of blooming.
Blooming always occurs in a constant direction (vertical or horizontal, depending on your image orientation) and normally fills one or a few contiguous columns entirely. So you can detect it cheaply by heavily subsampling in the opposite dimension and looking for maxima that repeat in the same column. For example, in your images, you could look for saturated maxima in the same column over 10 or so rows spread over the image height.
Once you detect the blooming columns, you can follow them in a small band around them to try to locate the saturated area. Note that saturation does not necessarily mean values at the end of the dynamic range (e.g. 255 for an 8-bit image): your sensor could be completely saturated at a value that the A/D conversion maps to, say, 252. Saturation simply means that the image response becomes constant with respect to the input luminance.
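A minimal sketch of that column check, assuming an 8-bit grayscale frame; the saturation level, the number of sampled rows, and the hit count are placeholders you would tune for your sensor:

import numpy as np

SAT = 250          # assumed saturation plateau; the sensor may clip below 255
N_ROWS = 10        # rows sampled across the image height
MIN_HITS = 8       # a column is suspect if it is saturated in most sampled rows

def find_blooming_columns(gray):
    """gray: single-channel uint8 frame; returns indices of suspect columns."""
    h, w = gray.shape
    rows = np.linspace(0, h - 1, N_ROWS).astype(int)   # heavy vertical subsampling
    samples = gray[rows, :]                            # shape (N_ROWS, w)
    hits = (samples >= SAT).sum(axis=0)                # saturated samples per column
    return np.flatnonzero(hits >= MIN_HITS)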

The easiest solution (to me) is a hardware one. If you can modify the physical camera setup, add a polarizing filter to the camera lens. You don't even need a(n expensive) camera-specific filter; a simple sheet of polarizing film is good enough (a quick web search for "polarizing film" turns up plenty of suppliers). You will have to play with the orientation, but for a fixed mounting position most surfaces are at the same angle and the glare will be polarized close to horizontal, so you should be able to find an orientation that works well in most situations.
I've used this method before, and the best part is that it adds no extra algorithmic complexity or lag, especially for mounted cameras where all surfaces sit at nearly the same angle. It won't help you process the images you currently have, but it will help in acquiring and processing future images.

Related

Check for movement in a small area in video in OpenCV

I've got a video stream from a camera. My goal is to detect and track the position of a moving object (a train).
First of all I tried movement detection (frame differencing, background subtractors), but it gave bad results.
I also tried keying on the color of the object, but under bad lighting or blur it is often the same color as the ground (the railway).
So the current approach is to divide the area the object moves through into n small regions and measure the difference between the stored region (from when there is no object) and the current one.
The main problem here is that the lighting changes: when I use a stored region from a reference frame (with no object), the brightness of the current frame may be different, which breaks the comparison.
The brightness can also change while the object is moving.
Applying a Gaussian blur and histogram equalization helps to make the comparison a bit less sensitive to brightness changes.
I tried comparing the structure of corresponding regions using SSIM, LBPH, and HOG. When I tested the LBPH and HOG approaches on manually cropped regions larger than the real ones they seemed to work, but when I applied them to my small regions they stopped working.
At the moment the most effective approach is simply the difference between grayscale regions measured with RMSE against a fixed threshold, but it is not robust and suffers a lot when the brightness changes.
Now I'm trying to use a high-pass operator to extract the most dominant edges (e.g. the Sobel operator, as in the attached figure), but I'm not sure how to properly compare the high-passed regions other than by taking their difference.
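For reference, here is a rough sketch of the kind of comparison I mean (the threshold value is just a placeholder I would have to tune):

import cv2
import numpy as np

def sobel_mag(region):
    # high-pass the region with the Sobel operator and keep the gradient magnitude
    gx = cv2.Sobel(region, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(region, cv2.CV_32F, 0, 1, ksize=3)
    return cv2.magnitude(gx, gy)

def region_rmse(ref_region, cur_region):
    # RMSE between the high-passed reference region and the current region
    diff = sobel_mag(ref_region) - sobel_mag(cur_region)
    return float(np.sqrt(np.mean(diff ** 2)))

THRESHOLD = 20.0   # placeholder; depends on region size and lighting
# usage: occupied = region_rmse(ref_region, cur_region) > THRESHOLD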
Frame with an empty railway:
A few seconds later a train appears and the luminance changes.
At night the luminance is also different.
So the questions are:
What approaches are there for comparing high-passed images?
Is there any other way you could suggest to determine whether an area is occupied?

Detecting balls on a pool table

I'm currently working on a project where I need to very reliably get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
The initial image looks like this (after converting it from 16-bit to 8-bit by throwing away pixels that are not around table level):
Then I subtract a reference image of the empty table from the current image.
After thresholding and equalization it looks like this:
It's fairly easy to detect the individual balls in a single image; the problem is that I have to do it constantly at 30 fps.
Difficulties:
Low-resolution image (512x424); a ball is around 4-5 pixels in diameter
The Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different in the depth image; for example, the black ball appears kind of inverted compared to the others
If they touch each other they can merge into one blob in the image; if I try to separate them with depth thresholding (using only the tops of the balls), some of the balls can disappear from the image
It's really important that nothing other than the balls is detected, e.g. the cue, hands, etc.
My process, which kind of works but is not reliable enough:
16-bit to 8-bit conversion by thresholding
Subtracting a reference image of the empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or a hand can be detected as a ball, and two balls touching can also cause issues. I also tried Hough circles, but with even less success. (It works nicely if the Kinect is closer, but then it can't cover the whole table.)
Any clues would be much appreciated.
Expanding on the comments above:
I recommend improving the physical setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" it with computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table (the image you posted could be 117% bigger and still cover the pockets).
Align the camera to be perfectly perpendicular to the table (and make sure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top-down view than a slightly tilted one (which is what the depth gradient suggests). (Sure, the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight?)
With a more reliable setup you should be able to threshold based on depth.
You can probably threshold down to the centres of the balls, since the information below them is occluded anyway. The balls do not deform, so if a radius decreases quickly the ball probably went into a pocket.
Once you have a clean thresholded image, you can use findContours() and minEnclosingCircle(). Additionally, you should constrain the results with minimum and maximum radius values to reject other objects that may be in view (hands, pool cues, etc.). Also have a look at moments(), and be sure to read Adrian's excellent Ball Tracking with OpenCV article.
It uses Python, but you should be able to find the equivalent OpenCV calls for the language you use.
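Here is a rough sketch of that contour/enclosing-circle step in Python; the radius limits are assumptions you would tune for your camera distance, and mask stands for your cleaned-up binary image:

import cv2

MIN_R, MAX_R = 2.0, 6.0   # assumed ball radius range in pixels at ~2 m

def detect_balls(mask):
    """mask: 8-bit binary image; returns a list of (x, y, radius) candidates."""
    # [-2] keeps this working across OpenCV 2.4/3.x/4.x return signatures
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    balls = []
    for c in contours:
        (x, y), r = cv2.minEnclosingCircle(c)
        if MIN_R <= r <= MAX_R:        # reject hands, cues, merged blobs, etc.
            balls.append((x, y, r))
    return balls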
In terms of tracking:
If you use OpenCV 2.4, look into its tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has its own list of contributed tracking algorithms (such as TLD).
I recommend starting with moments first: use the simplest and least computationally expensive setup initially, and see how robust the results are before moving on to more complex algorithms (which will take time to understand and to tune before they give the expected results).

OpenCV - calibrate camera using static images in water

I have a camera mounted vertically under water in a tank, looking downwards.
There is a flat grid on the bottom of the tank (approx. 2 m away from the camera).
I want to be able to place markers on the bottom, and use computer vision to know their real life exact position.
So, I need to map from pixels to mm.
If I am not mistaken, cv::calibrateCamera(...) does just this, but it depends on moving a pattern in front of the camera.
I only have static pictures of the scene, and the camera never moves in relation to the grid. Thus, I effectively have a "single" image from which to find the parameters.
How can I do this using the grid?
Thank you.
Interesting problem! The "cute" part is the effect of refraction at the water-glass interface on the intrinsic parameters, namely that it increases the focal length (or, conversely, reduces the field of view) compared to the same lens in air. In theory, you could calibrate in air and then correct for the difference in refraction index, but calibrating directly in water is likely to give you more accurate results.
Do you know your accuracy requirements? And have you verified that your lens/sensor combination is adequate to meet them (with an adequate margin)? To answer that question you need to estimate (either by calculation from the lens and sensor specifications, or experimentally using a resolution chart) whether you can resolve in an image the minimal distances required by your application.
From the wording of your question I gather that you are interested only in measurements on a single plane. So you only need to (a) remove the nonlinear (barrel or pincushion) lens distortion and (b) estimate the homography between the plane of interest and the image plane. Once you have the latter, you can convert directly from undistorted image coordinates to world ones by matrix multiplication. Additionally, if (as I imagine) the plane of interest is roughly parallel to the image plane, you should have no problem keeping the entire field of view in focus.
Of course, for all of this to work as expected, you should make sure that the tank bottom is really flat, within the measurement tolerances of your application. Otherwise you are really dealing with a 3D problem, and need to modify your procedures accordingly.
The actual procedure depends a lot on the size of the tank, which you don't indicate clearly. If it's small enough that it is practical to manufacture a chessboard-like movable calibration target, by all means go for it. You may want to take a look at this other answer for suggestions. In the following I'll discuss the more interesting case in which your tank is large, e.g. the size of a swimming pool.
I'd proceed by sticking calibration markers in a regular grid on the pool bottom. I'd probably choose checker-like markers like these, maybe printing them myself with a good laser printer on plastic with an adhesive backing (assuming you can leave them in place forever). You should plan on having quite a few of them, say an 8x8 or 10x10 grid, covering as much as possible of the field of view of the camera in its operating position and pose. To help with lining up the grid nicely you might use a laser line projector of suitable fan angle, or a laser pointer attached to a rotating support. Note carefully that it is not necessary that they be affixed in a precise X-Y grid (which may be complicated, depending on the size of your pool), only that their positions with respect to any three arbitrarily chosen (but fixed) ones be known. In other words, you can attach them to the bottom approximately in a grid, then measure the distances of three extreme corners from each other as accurately as you can, thus building a base triangle, then measure the distances of all the other corners from the vertices of the triangle, and finally reconstruct their true positions with a bit of trigonometry. It's basically a surveying problem and, depending on your accuracy requirements and budget, you may want to enlist a friendly local professional surveyor (and their tools) to get it done as precisely as necessary.
Once you have your grid, you can fill the pool, get your camera in place, and focus and f-stop the lens as needed for the application. From now on you may not touch the focus and f-stop ever again, under penalty of miscalibrating; exposure can only be controlled by the exposure time, so make sure you have enough light. Disable any and all auto-focus and auto-iris functions, if present. If the camera has a non-rigid lens mount (e.g. a DSLR), you'll need some kind of mechanical rig to ensure that the lens-body pair stays rigid. Stop the lens down as far as you can, given the available lighting and sensor, so as to have a fair bit of depth of field available. Then take several photos (~10) of the grid, moving and rotating the camera and going a bit closer and farther than your expected operating distance from the plane. You'll want to "see" significant perspective foreshortening of the grid in some of the images - this is needed to calibrate the focal length accurately. Avoid JPG and any other lossy compression format when storing the images - use lossless PNG or TIFF.
Once you have the images, you can mark and identify the checker markers in them manually. For a once-off project like this I would not bother with automatic identification; just do it by hand (e.g. in Matlab, or even in Photoshop or Gimp). To help identify the markers, you could, for example, print a number next to each one. Once you have the manual marks, you can refine them automatically to subpixel accuracy, e.g. using cv::cornerSubPix.
You're almost done. Feed the "reference" measured positions of the real corners, and the observed ones in all images, to your favorite camera calibration routine, e.g. cv::calibrateCamera. Use the nominal focal length of the camera (converted to pixels) as the initial estimate, along with zero distortion. If all goes well, you will obtain the camera intrinsic parameters, which you will keep, and the camera poses for all images, which you can throw away.
Now you can mount the camera in your final setup, as needed by your application, and take one further image of the grid. Mark and refine the corner positions as before. Undistort their image positions using the distortion parameters returned by the calibration. Finally, compute the homography between the reference positions of the real markers (in meters) and their undistorted image positions, and you're done.
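A rough sketch of that last step using OpenCV's Python bindings (which mirror the C++ API); it assumes K and dist come from cv::calibrateCamera and that the marker points are float32 arrays:

import cv2
import numpy as np

def build_pixel_to_world(K, dist, img_pts, world_pts):
    """img_pts: Nx2 float32 marker corners in the final image; world_pts: Nx2
    float32 measured positions on the tank bottom (in the units you want out)."""
    # undistort the marker corners, keeping pixel scale (P=K), then fit the homography
    und = cv2.undistortPoints(img_pts.reshape(-1, 1, 2), K, dist, P=K).reshape(-1, 2)
    H, _ = cv2.findHomography(und, world_pts)
    def pixel_to_world(uv):
        p = cv2.undistortPoints(np.float32([[uv]]), K, dist, P=K)
        return cv2.perspectiveTransform(p, H).reshape(2)
    return pixel_to_world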
HTH
To calibrate the camera you do need multiple images of a checkerboard (or one of the other patterns found here). What you can do is calibrate the camera outside of the water, or run a calibration sequence once.
Once you have that information (focal length, lens center, distortion, etc.), you can use the solvePnP function to estimate the pose of a single board. That estimate gives you the distance from the camera to the board.
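A small sketch of that idea, assuming a standard chessboard target; the 9x6 grid and the square size are made-up examples:

import cv2
import numpy as np

def board_distance(K, dist, corners, square_size, cols=9, rows=6):
    """corners: detected inner-corner positions (e.g. from cv2.findChessboardCorners);
    returns the camera-to-board distance in the same units as square_size."""
    obj = np.zeros((cols * rows, 3), np.float32)
    obj[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_size
    ok, rvec, tvec = cv2.solvePnP(obj, corners, K, dist)
    return float(np.linalg.norm(tvec)) if ok else None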
A completely different alternative would be to find out what kind of lens the camera uses and fill in the parameters manually. I haven't tried this, so I'm not sure how well it would work.

Image Processing for recognizing 2D features

I've created an iPhone app that can scan an image of a page of graph paper and then tell me which squares have been blacked out and which are blank.
I do this by scanning from left to right, using the graph paper's lines as guides. When I encounter a graph line, I start to look for black until I hit the graph line again. Then, instead of continuing along the scan line, I go ahead and scan the entire square for black. Then I continue on to the next box. At the end of the line, I skip down a fixed number of pixels before starting to scan a new line (since I have already figured out how tall each box is).
This sort of works, but there are problems. Sometimes I mistake the graph lines for "black". Sometimes, if the image is skewed or I don't have uniform lighting across the page, I don't get good results.
What I'd like to do is to specify a few "alignment" boxes and then resize, rotate, and skew the picture to align with those. Then, I was thinking, once I have the image aligned I would know where all the boxes are and wouldn't have to scan for them; I'd just scan inside the known location of each box to see if it is black. This should be faster and more reliable. And if I were to operate on images coming from the camera, I'd have more flexibility in asking the user to align the picture to the alignment marks, rather than having to align the image myself.
Given that this is my first Image Processing project, I feel like I am reinventing the wheel. I'd like suggestions on how to do this, and whether to utilize libraries like OpenCV.
I am enclosing an image similar to what I would like processed. I am looking for a list of all squares that have a significant amount of black marking, i.e. A8, C4, E7, G4, H1, J9.
Issues to be aware of:
Light coverage of the image may not be ideal, but should be relatively consistent across the image (i.e. no shadows)
All squares may be empty, or all may be dark, and the algorithm needs to be able to determine that
The image may be skewed or rotated about any of the axes. Rotation about the z axis may be easy to fix. There may be rotation around the x or y axis, making one side of the image wider than the other. However, if I scan the image in real time as it comes from the camera, I can ask the user to line up the alignment marks with marks on the screen. What is the best way to verify that alignment and give the user appropriate feedback? Just checking that the four corners are dark could give a false positive when the camera is pointing at a black surface.
Not every square will be equally or consistently blacked out, but I think there will be enough black to make it unquestionable to a human eye.
The blue grid may be useful, but there are cases where the black markings overlap the blue grid, so I think a virtual grid is probably better than relying on the printed one. Using the alignment markers to align the image should then allow a precise virtual grid to be laid out, and the contents of each grid box could be sampled to see whether it is predominantly black, rather than scanning from left to right, no? Here is another image with more markings on the grid. In this image, in addition to the previous markings in A8, C4, E7, G4, H1, and J9, I have marked E2, G8, G9, I4, and J4, and you can see how the blue grid is obscured.
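To make the idea concrete, here is a rough sketch (using OpenCV, which I haven't committed to yet) of the virtual-grid sampling I have in mind; the grid size, the cell size, and the assumption that I can pass in the four alignment-mark centers are all placeholders:

import cv2
import numpy as np

GRID = 10    # squares per side (made up)
CELL = 40    # pixels per square in the rectified image (made up)

def read_grid(image, align_marks):
    """align_marks: the four alignment-mark centers (TL, TR, BR, BL) in the photo."""
    side = GRID * CELL
    dst = np.float32([[0, 0], [side, 0], [side, side], [0, side]])
    M = cv2.getPerspectiveTransform(np.float32(align_marks), dst)
    warped = cv2.warpPerspective(image, M, (side, side))
    gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    filled = []
    for r in range(GRID):
        for c in range(GRID):
            cell = gray[r * CELL:(r + 1) * CELL, c * CELL:(c + 1) * CELL]
            if cell.mean() < 128:      # mostly dark: treat the square as marked
                filled.append((r, c))
    return filled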
This is my first phase of this project. Eventually I'd like to scale this algorithm to be able to process at least a few hundred slots and possibly different colors.
To start with, this problem reminded me a bit of these demos, which might be useful to learn from:
The DNA microarray image processing
The Matlab Sudoku solver
The iPhone Sudoku solver blog post, explaining the image processing
Personally, I think the simplest approach would be to detect the squares in your image.
1) Remove the background and small cruft
% threshold each 128x128 block at its median intensity divided by 1.3
f_makebw = @(I) im2bw(I.data, double(median(I.data(:)))/1.3);
bw = ~blockproc(im, [128 128], f_makebw);
bw = bwareaopen(bw, 30);   % drop connected components smaller than 30 pixels
2) Remove everything but the squares and circles.
se = strel('disk', 5);
bw = imerode(bw, se);   % erosion removes the thin grid lines
% Detect the squares and circles via morphology
[B, L] = bwboundaries(bw, 'noholes');
3) Detect the squares using the 'Extent' measure from regionprops. The 'Extent' metric measures what proportion of the bounding box is filled, which makes it a nice measure for distinguishing between circles and squares:
stats = regionprops(L, 'Extent');
extent = [stats.Extent];
idx1 = find(extent > 0.8);
bw = ismember(L, idx1);
4) This leaves you with your features to synchronize or rectify the image with. An easy and robust way to do this is via the autocorrelation function (ACF).
This gives nice peaks, which are easily detected. These peaks can be matched against the ACF peaks from a template image via the Hungarian algorithm. Once matched, you can correct rotation and scaling, as you now have a linear system that you can solve:
x = A x'
Translation can then be corrected using run-of-the-mill cross-correlation against the same predefined template.
If all goes well, you now have an aligned or synchronized image, which should help considerably in determining the positions of the dots.
I've been starting to do something similar using my GPUImage iOS framework, so that might be an alternative to doing all of this in OpenCV or something else. As its name indicates, GPUImage is entirely GPU-based, so it can have some tremendous performance benefits over CPU-bound processing (up to 180X faster for things like processing live video).
As a first stage, I took your images and ran them through a simple luminance thresholding filter with a threshold of 0.5, and arrived at the following for your two images:
I just added an adaptive thresholding filter, which attempts to correct for local illumination variations and works really well for picking out text. However, on your images it uses too small an averaging radius to handle your blobs well:
It also seems to bring out your grid lines, which it sounds like you wish to ignore.
Maurits provides a more comprehensive description of what you could do, but there might be a way to implement these processing operations as high-performance GPU-based filters instead of relying on slower OpenCV versions of the same calculations. If you could grab rotation and scaling information from this thresholded image, you could construct a transform that could also be applied as a filter to your thresholded image to produce your final aligned image, which could then be downsampled and read out by your application to determine which grid locations were filled in.
These GPU-based thresholding operations run in less than 2 ms for 640x480 frames on an iPhone 4, so it might be possible to chain filters together to analyze incoming video frames as fast as the device's video camera can provide them.

Finding the height above water level of rocks

I am currently helping a friend with a geophysical project. I'm not by any means an image processing pro, but it's fun to play around with these kinds of problems. =)
The aim is to estimate the height of small rocks sticking out of the water, from the surface to the top.
The experimental equipment will be a ~10 MP camera mounted on a distance meter with a built-in laser pointer.
The "operator" will point this at a rock and press a trigger, which will register a distance along with a photo of the rock, which will be in the center of the image.
The equipment can be assumed to always be held at a fixed distance above the water.
As I see it there are a number of problems to overcome:
Lighting conditions
Depending on the time of day etc., the rock might be brighter than the water, or the opposite.
Sometimes the rock will have a color very close to the water.
The position of the shade will move throughout the day.
Depending on how rough the water is, there might sometimes be a reflection of the rock in the water.
Diversity
The rock is not evenly shaped.
Depending on the rock type, lichen growth, etc., the appearance of the rock changes.
Fortunately, there is no shortage of test data; pictures of rocks in water are easy to come by. Here are some sample images:
I've run an edge detector on the images, and especially in the fourth picture the poor contrast makes it hard to see the edges:
Any ideas would be greatly appreciated!
I don't think edge detection is the best approach for detecting the rocks. Other objects, like the mountains or even the reflections in the water, will also produce edges.
I suggest that you try a pixel classification approach to segment the rocks from the background of the image:
For each pixel in the image, extract a set of image descriptors from a NxN neighborhood centered at that pixel.
Select a set of images and manually label the pixels as rock or background.
Use the labeled pixels and the respective image descriptors to train a classifier (e.g. a Naive Bayes classifier)
Since the rocks tend to have a similar texture, I would use texture descriptors to train the classifier. You could try, for example, extracting a few statistical measures from each color channel (R, G, B), like the mean and standard deviation of the intensity values.
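A small sketch of that feature extraction and training step; the neighborhood size and the use of scikit-learn's GaussianNB are my assumptions, and the training mask is something you would create by hand:

import cv2
import numpy as np
from sklearn.naive_bayes import GaussianNB

N = 15   # neighborhood size (assumption)

def pixel_features(img_bgr):
    """Per-pixel mean and std of each color channel over an N x N window."""
    img = img_bgr.astype(np.float32)
    mean = cv2.blur(img, (N, N))
    sq_mean = cv2.blur(img * img, (N, N))
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0))
    return np.dstack([mean, std]).reshape(-1, 6)   # 6 descriptors per pixel

# training, with `labels` a hand-made mask (1 = rock, 0 = background):
#   clf = GaussianNB().fit(pixel_features(train_img), labels.reshape(-1))
# prediction on a new image:
#   rock_mask = clf.predict(pixel_features(test_img)).reshape(test_img.shape[:2])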
Pixel classification might work here, but it will never yield 100% accuracy. The variance in the data is really big: rocks have different colors (which are also "corrupted" by lighting) and different textures. So one must account for global information as well.
The problem you are dealing with is foreground extraction. There are two approaches I am aware of.
Energy minimization via graph cuts, see e.g. http://en.wikipedia.org/wiki/GrabCut (there are links to the paper and an OpenCV implementation). Some initialization ("seeds") is required, either from a user or from prior knowledge (e.g. the rock is in the center while water is on the periphery). Another input variant is an approximate bounding rectangle; this is what the foreground extraction tool in MS Office 2010 implements.
The energy function over possible foreground/background labellings enforces that the foreground be similar to the foreground seeds and that the boundary be smooth, so the minimum of the energy corresponds to a good foreground mask. Note that with the pixel classification approach one has to pre-label many images to learn from, after which segmentation is done automatically, whereas with this approach one has to select seeds on each query image (or they are chosen implicitly).
Active contours, a.k.a. snakes, also require some user interaction. They are more like the Photoshop Magic Wand tool: they also try to find a smooth boundary, but they do not consider the inner area.
Both methods might have problems with the reflections (pixel classification definitely will). If that is the case, you may try to find an approximate vertical symmetry and delete the lower part, if any. You can also ask the user to mark the reflection as background while collecting statistics for the graph cuts.
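A minimal sketch of the bounding-rectangle variant with OpenCV's grabCut; the rectangle is a placeholder for user input or for the prior that the rock sits near the center of the frame:

import cv2
import numpy as np

def segment_rock(img_bgr, rect):
    """rect: (x, y, w, h) roughly enclosing the rock; returns a binary mask."""
    mask = np.zeros(img_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)   # definite + probable foreground
    return fg.astype(np.uint8) * 255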
Color segmentation to find the rock, together with edge detection to find the top.
To find the water level I would try to find all the water-rock boundaries and the horizon (if possible), then fit a plane to the surface of the water.
That way you don't need to worry about reflections of the rock.
It's easier if you know the pitch angle between the camera and the water and if the camera is leveled horizontally (no roll).
P.S. This is a lot harder than I thought: you don't know the distance to all the rocks, so fitting a plane is difficult.
It occurs to me that the reflection is actually an ideal way of finding the level: look for symmetric edge paths in the rock edge detection and pick the vertex?
