Am I missing something with stereo calibration? - opencv

I am trying the stereo_calib example and it fails with garbage output. For instance:
However, it is finding corners in my images...
My xml file and images are all here:
https://drive.google.com/open?id=12-5jBN7FK-LO6SLb4r3YYkrOnP7f_xmG
What am I doing wrong? I first tried printing a pattern on a sheet of paper, then thought ok that must be too wavy or something, so had this printed on foam board. But no dice.

(we chatted on a side channel, so this is to the benefit of the rest of the world)
tl;dr: hold the board very still or get a camera with global shutter.
Rolling shutter (see here and there), an attribute of most webcam sensors, many camcorder sensors, and some industrial image sensors, will distort objects that are moving. If you've moved the board even just a little during a frame capture (visible in files right19/right20), it will be captured with distortion. That will affect everything you do with the picture, starting with intrinsic calibration.
To give a sense of scale for the distortions: assuming a 30 FPS video stream, the worst case rolling shutter lag is 33 ms. A pedestrian travels 40-50 mm in that time. If your hands are moving slightly, you can maybe expect a tenth of that, which is still a lot in proportion to the square sizes most people use.
Another source of trouble is printers. If you've printed your checkerboard pattern, make sure to measure the width and height of your squares. they might be slightly rectangular. It's also a good idea to make sure the pattern is quite flat, not bent.

Related

Setting up a scene for image capturing and image processing

I'm building a box where the user will put their foot and then have measurements of their feet taken.
My 1st tier goal is to take basic measurements and my reach goal is to build a 3d model of the person's foot.
Here are what some images from my first attempts and prototyping.
back of the foot | inside of the foot | outside of the foot | top of the foot
So, my big advantage is that I have a lot of control over the scene.
I want to use this fact to set things up so I can get reliable measurements using pictures.
So my questions are as follows:
1) What is the best way to set the scene up? Right now I'm going to have a blue background, lights, and a contrasting sock to create a consistent internal image. Is there a more 'optimal' contrast to use? As you can see below, it's working decently.
2) What's an easy way for me to get reliable pixel to mm measurements? I can use a patterned sock (to increase feature density) and then two cameras from each viewpoint, but it would be great to minimize the number of cameras I need.
I'm going to leave the questions there as not to overload this post - but if people have any other tips it would be very helpful. Thank you!
My approach to 1) would essentially be a "green screen" or "blue screen".
The idea is to carefully illuminate the background so that there are no shadows. Then, you can apply a color threshold, and everything that is not that specific color is the foreground. So far in your images, there is a quite a bit of shadow, which may be able to be eliminated by careful lighting. You'll have to experiment with how much of that is an issue for you.
2) This a little tougher, but possible. You will need to know the position and direction of your cameras, the lens parameters (such as F/#), the sensor parameters (pixel pitch/spacing). With this information, you should be able to locate the extrema of the foot and get some measurements. Here's a general diagram of how this might work. You could use the top view to locate the mid-line of the foot so you know how far it is from the side cameras. Then, you have all the information you need to solve of pixel to real-space measurements. The top camera is easy; since everything is in a plane (assuming the camera is properly aligned and rectified) all you have to do is put a ruler on the floor and take some pictures of it. Then, you can measure the pixels to real-space conversion directly from the image.
For your 3-d modeling issue, I'd like to point out that you don't actually have to get a full point cloud. You could just get a model of a foot and scale it for display based on the measurements you make. In any case, good luck on your project!

Detecting balls on a pool table

I'm currently working on a project where I need to be able to very reliable get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
Initial image looks like this (after converting it to 8-bit from 16-bit by throwing away pixels which is not around table level):
Then a I subtract a reference image with the empty table from the current image.
After thresholding and equalization it looks like this: image
It's fairly easy to detect the individual balls on a single image, the problem is that I have to do it constantly with 30fps.
Difficulties:
Low resolution image (512*424), a ball is around 4-5 pixel in diameter
Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different on the depth image, for example the black ball is kind of inverted compared to the others
If they touch each other then they can become one blob on the image, if I try to separate them with depth thresholding (only using the top of the balls) then some of the balls can disappear from the image
It's really important that anything other than balls should not be detected e.g.: cue, hands etc...
My process which kind of works but not reliable enough:
16bit to 8bit by thresholding
Subtracting sample image with empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or hand can be detected as a ball and also if two ball touches then it can cause issues. Also tried with hough circles but with even less success. (Works nicely if the Kinect is closer but then it cant cover the whole table)
Any clues would be much appreciated.
Expanding comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" that user computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table. (the image you posted can be 117% bigger and still cover the pockets)
Align the camera to be perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top down view than a slightly tilted view (which is what the depth gradient shows). (sure the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight)
With a more reliable setup you should be able to threshold based on depth.
You can possible threshold to the centre of balls since the information bellow is occluded anyway. The balls do not deform, so it the radius decreases fast the ball probably went in a pocket.
One you have a clear threshold image, you can findContours() and minEnclosingCircle(). Additionally you should contrain the result based on min and max radius values to avoid other objects that may be in the view (hands, pool cues, etc.). Also have a look at moments() and be sure to read Adrian's excellent Ball Tracking with OpenCV article
It's using Python, but you should be able to find OpenCV equivalent call for the language you use.
In terms tracking
If you use OpenCV 2.4 you should look into OpenCV 2.4's tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has it's own list of contributed tracking algorithms (such as TLD).
I recommend starting with Moments first: use the simplest and least computationally expensive setup initially and see how robuts the results are before going into the more complex algorithms (which will take to understand and get the parameters right to get expected results out of)

Removing sun light reflection from images of IR camera in realtime OpenCV application

I am developing speed estimation and vehicle counting application with OpencV and I use IR camera.
I am facing a problem of sun light reflection which causes vertical white region or lines in the images and has bad effect on my vehicle detections.
I want an approach with very high speed, because it is a real-time application.
The vertical streak defect in those images is called "blooming", happens when the one or a few wells in a CCD saturate to the point that they spill charge over neighboring wells in the same column. In addition, you have "regular" saturation with no blooming around the area of the reflection.
If you can, the best solution is to control the exposure (faster shutter time, or close lens iris if you have one). This will reduce but not eliminate blooming occurrence.
Blooming will always occur in a constant direction (vertical or horizontal, depending on your image orientation), and will normally fill entirely one or few contiguous columns. So you can cheaply detect it by heavily subsampling in the opposite dimension and looking for maxima that repeat in the same column. E.g., in you images, you could look for saturated maxima in the same column over 10 rows or so spread over the image height.
Once you detect the blooming columns, you can follow them in a small band around them to try to locate the saturated area. Note that saturation does not necessarily imply values at the end of the dynamic range (e.g. 255 for 8-bit image). Your sensor could be completely saturated at values that the A/D conversion assign at, say, 252. Saturation simply means that the image response becomes constant with respect to the input luminance.
The easiest solution (to me) is a hardware solution. If you can modify the physical camera setup add a polarizing filter to the lens of the camera. You don't even need a(n expensive) camera specific lens, adding a simple sheet of polarized film is good enough Here is one site I just googled "polarizing film" You will have to play with the orientation, but with this mounted position most surfaces are at the same angle and glare will be polarized near horizontal. So you should find a position that works well in most situations.
I've used this method before and the best part is it adds no extra algorithmic complexity or lag. Especially for mounted cameras where all surfaces are at nearly the same angle. This won't help you process the images you currently have but it will help in processing and acquiring future images.

Camera Calibration: How to do it right

I am trying to calibrate a camera using a checkerboard by the well known Zhang's method followed by bundle adjustment, which is available in both Matlab and OpenCV. There are a lot of empirical guidelines but from my personal experience the accuracy is pretty random. It could sometimes be really good but also sometimes really bad. The result actually can vary quite a bit just by simply placing the checkerboard at different locations. Suppose the target camera is rectilinear with 110 degree horizontal FOV.
Does the number of squares in the checkerboard affect the accuracy? Zhang uses 8x8 in his original paper without really explaining why.
Does the length of the square affect the accuracy? Zhang uses 17cm x 17cm without really explaining why.
What is the optimal number of snap shots of different checkerboard position/orientation? Zhang uses 5 images only. I saw people suggesting 20~30 images with checkerboards at various angles, fills the entire field of view, tilted to the left, right, top and bottom, and suggested there should be no checkerboard placed at similar position/orientation otherwise the result will be biased towards that position/orientation. Is this correct?
The goal is to figure out a workflow to get consistent calibration result.
If the accuracy you get is "pretty random" then you are likely not doing it right: with stable optics and a well conducted procedure you should consistently be getting RMS projection errors within a few tenths of a pixel. Whether this corresponds to variances of millimeters or meters in 3D space depends, of course, on your optics and sensor resolution (calibration is not a way around physics).
I wrote time ago a few suggestions in this answer, and I recommend you follow them. In particular, pay attention to locking the focus distance (I have seen & heard countless people trying to calibrate a camera on autofocus, and be sorely disappointed). As for the size of the target, again it depends on your optics and camera resolution, but generally speaking the goals are (1) to fill with measurements both the field of view and the volume of space you'll be working with, and (2) to observe significant perspective foreshortening, because that is what constrains the solution for the FOV. Good luck!
[Ed.to address a comment]
Concerning variations on the parameter values across successive calibrations, the first thing I'd do is calculate the cross RMS errors, i.e. the RMS error on dataset 1 with the camera calibrated on dataset 2, and vice versa. If either is significantly higher than the calibration errors, it's an indication that the camera has changed between the two calibrations and so all odds are off. Do you have auto-{focus,iris,zoom,stabilization} on? Turn them all off: auto-anything is the bane of calibration, with the only exception of exposure time. Otherwise, you need to see if the variations you observe on the parameters are actually meaningful (hint, they often are not). A variation of the focal length in pixels of several parts per thousand is probably irrelevant with today's sensor resolutions - you can verify that by expressing it in mm, and comparing it to the dot pitch of the sensor. Also, variations of the position of the principal point in the order of tens of pixels are common, since it is poorly observed unless your calibration procedure is very carefully designed to estimate it.
Ideally you want to place your checkerboard at roughly the same distance from the the camera, as the distance at which you want to do your measurements. So your checkerboard squares must be large enough to be resolvable from that distance. You also do want to cover the entire field of view with points, especially close to the edges and corners of the frame. Also, the smaller the board is, the more images you should take to cover the entire field of view. So 20-30 images is usually a good rule of thumb.
Another thing is that the checkerboard should be asymmetric. Ideally, you want to have an even number of squares along one side, and an odd number of squares along the other. This way the board's in-plane orientation is unambiguous.
Also, I would suggest that you try the Camera Calibrator app in MATLAB. At the very least, look at the documentation, which has a lot of useful suggestions for calibrating cameras.

People detect using Hog not finding anyone

I have a video of soccer in which the players are relatively far away from the camera and thus represent small portions of the image. I'm using background subtraction to detect the players and the results are fine but I have been asked to try detecting using Hog.
I tried using the detect MultiScale using the default descriptors presented on opencv but i cant get any detection. I dont really understand how can I make it work on this case, because on other sequences where the people are closer to the camera, the detector works fine.
Here is a sample image link
Thanks.
The descriptor you use with HOG determines the minimum size of person you can detect: with the DefaultPeopleDetector the detection window is 128 pixels high x 64 wide, so you can detect people around 90px high. With the Daimler descriptor the size you can detect is a bit smaller.
Your pedestrians are still too small for this, so you may need to magnify the whole image, or just the parts which show up as foreground using background segmentation.
Have a look at the function definition for detectMultiscale http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#cascadeclassifier-detectmultiscale
It might be that you need to reduced the value of minsize so as to detect smaller people or the people might just be too far away.

Resources