Why does Charuco Board detection sometimes fail spectacularly? - opencv

I am trying to find the positions of multiple monitors in 3d space. For that purpose I display charuco boards and take a short video of the whole setup. Sample frame here (cropped down from 4k to preserve privacy):
Code is taken verbatim from the OpenCV charuco tutorials:
https://docs.opencv.org/master/da/d13/tutorial_aruco_calibration.html
https://docs.opencv.org/3.4/df/d4a/tutorial_charuco_detection.html
The only change I made was that I created all six boards from separate parts of a big aruco dictionary so that every single marker is unique.
Calibration reports an error of 3-4 (which I believe is in pixels, so on a 4k video isn't too bad).
90% of time this works fine, but in some frames the detection fails spectacularly for no apparent reason as is visible in the lower right screen in the image above. Note that the actual Aruco markers (red corners, green frames, blue ids) get detected well, but the program fails to detect the pose of the charuco board (green markers, rgb axis). This happens sporadically for one or two frames, but I would prefer it not to happen at all.
Any ideas?

Related

Am I missing something with stereo calibration?

I am trying the stereo_calib example and it fails with garbage output. For instance:
However, it is finding corners in my images...
My xml file and images are all here:
https://drive.google.com/open?id=12-5jBN7FK-LO6SLb4r3YYkrOnP7f_xmG
What am I doing wrong? I first tried printing a pattern on a sheet of paper, then thought ok that must be too wavy or something, so had this printed on foam board. But no dice.
(we chatted on a side channel, so this is to the benefit of the rest of the world)
tl;dr: hold the board very still or get a camera with global shutter.
Rolling shutter (see here and there), an attribute of most webcam sensors, many camcorder sensors, and some industrial image sensors, will distort objects that are moving. If you've moved the board even just a little during a frame capture (visible in files right19/right20), it will be captured with distortion. That will affect everything you do with the picture, starting with intrinsic calibration.
To give a sense of scale for the distortions: assuming a 30 FPS video stream, the worst case rolling shutter lag is 33 ms. A pedestrian travels 40-50 mm in that time. If your hands are moving slightly, you can maybe expect a tenth of that, which is still a lot in proportion to the square sizes most people use.
Another source of trouble is printers. If you've printed your checkerboard pattern, make sure to measure the width and height of your squares. they might be slightly rectangular. It's also a good idea to make sure the pattern is quite flat, not bent.

Detecting balls on a pool table

I'm currently working on a project where I need to be able to very reliable get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
Initial image looks like this (after converting it to 8-bit from 16-bit by throwing away pixels which is not around table level):
Then a I subtract a reference image with the empty table from the current image.
After thresholding and equalization it looks like this: image
It's fairly easy to detect the individual balls on a single image, the problem is that I have to do it constantly with 30fps.
Difficulties:
Low resolution image (512*424), a ball is around 4-5 pixel in diameter
Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different on the depth image, for example the black ball is kind of inverted compared to the others
If they touch each other then they can become one blob on the image, if I try to separate them with depth thresholding (only using the top of the balls) then some of the balls can disappear from the image
It's really important that anything other than balls should not be detected e.g.: cue, hands etc...
My process which kind of works but not reliable enough:
16bit to 8bit by thresholding
Subtracting sample image with empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or hand can be detected as a ball and also if two ball touches then it can cause issues. Also tried with hough circles but with even less success. (Works nicely if the Kinect is closer but then it cant cover the whole table)
Any clues would be much appreciated.
Expanding comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" that user computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table. (the image you posted can be 117% bigger and still cover the pockets)
Align the camera to be perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top down view than a slightly tilted view (which is what the depth gradient shows). (sure the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight)
With a more reliable setup you should be able to threshold based on depth.
You can possible threshold to the centre of balls since the information bellow is occluded anyway. The balls do not deform, so it the radius decreases fast the ball probably went in a pocket.
One you have a clear threshold image, you can findContours() and minEnclosingCircle(). Additionally you should contrain the result based on min and max radius values to avoid other objects that may be in the view (hands, pool cues, etc.). Also have a look at moments() and be sure to read Adrian's excellent Ball Tracking with OpenCV article
It's using Python, but you should be able to find OpenCV equivalent call for the language you use.
In terms tracking
If you use OpenCV 2.4 you should look into OpenCV 2.4's tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has it's own list of contributed tracking algorithms (such as TLD).
I recommend starting with Moments first: use the simplest and least computationally expensive setup initially and see how robuts the results are before going into the more complex algorithms (which will take to understand and get the parameters right to get expected results out of)

Screen detection using opencv / Emgucv

I'm working on computer screen detection using emgucv (a c# opencv wrapper ).
I want to detect my computer screnn and draw a rectangle on it.
To help in this process, I used 3 Infrared Leds on the screen of the computer which I detect firtsly and after the detection, I could find the screen areas below those 3 leds.
Here is the results after the detection of the 3 leds.
The 3 red boxes are the detected leds.
.
And in general I have something like this
Does anyone have an idea about how I can proceed to detect the whole screan area ?
This is just a suggestion but, if you know for a fact that your computer screen is below your LEDs, you could try using OpenCV GrabCut algorithm. Draw a rectangle below the LEDs, large enough to contain the screen (maybe you could guess the size from the space between the LEDs) and use it to initialize the GrabCut.
Let me know what kind of results you get.
You can try to use a camera with no IR filter(This is mostly all night vision cameras) so that you can get a more intense light from the LEDs hence making it stand out than what your display would have then its a simple blob detection to get there position.
Another solution would be using ARUCO markers on the display if the view angle you are tending to use are not very large then its should be a compelling option and even the relative position of the camera with the display can be predicted also if that is what you want. With the detection of ARUCO you can get the angles that the plane of the display is placed at hence making the estimation of the display area with them.

Removing sun light reflection from images of IR camera in realtime OpenCV application

I am developing speed estimation and vehicle counting application with OpencV and I use IR camera.
I am facing a problem of sun light reflection which causes vertical white region or lines in the images and has bad effect on my vehicle detections.
I want an approach with very high speed, because it is a real-time application.
The vertical streak defect in those images is called "blooming", happens when the one or a few wells in a CCD saturate to the point that they spill charge over neighboring wells in the same column. In addition, you have "regular" saturation with no blooming around the area of the reflection.
If you can, the best solution is to control the exposure (faster shutter time, or close lens iris if you have one). This will reduce but not eliminate blooming occurrence.
Blooming will always occur in a constant direction (vertical or horizontal, depending on your image orientation), and will normally fill entirely one or few contiguous columns. So you can cheaply detect it by heavily subsampling in the opposite dimension and looking for maxima that repeat in the same column. E.g., in you images, you could look for saturated maxima in the same column over 10 rows or so spread over the image height.
Once you detect the blooming columns, you can follow them in a small band around them to try to locate the saturated area. Note that saturation does not necessarily imply values at the end of the dynamic range (e.g. 255 for 8-bit image). Your sensor could be completely saturated at values that the A/D conversion assign at, say, 252. Saturation simply means that the image response becomes constant with respect to the input luminance.
The easiest solution (to me) is a hardware solution. If you can modify the physical camera setup add a polarizing filter to the lens of the camera. You don't even need a(n expensive) camera specific lens, adding a simple sheet of polarized film is good enough Here is one site I just googled "polarizing film" You will have to play with the orientation, but with this mounted position most surfaces are at the same angle and glare will be polarized near horizontal. So you should find a position that works well in most situations.
I've used this method before and the best part is it adds no extra algorithmic complexity or lag. Especially for mounted cameras where all surfaces are at nearly the same angle. This won't help you process the images you currently have but it will help in processing and acquiring future images.

Undistorting/rectify images with OpenCV

I took the example of code for calibrating a camera and undistorting images from this book: shop.oreilly.com/product/9780596516130.do
As far as I understood the usual camera calibration methods of OpenCV work perfectly for "normal" cameras.
When it comes to Fisheye-Lenses though we have to use a vector of 8 calibration parameters instead of 5 and also the flag CV_CALIB_RATIONAL_MODEL in the method cvCalibrateCamera2.
At least, that's what it says in the OpenCV documentary
So, when I use this on an array of images like this (Sample images from OCamCalib) I get the following results using cvInitUndistortMap: abload.de/img/rastere4u2w.jpg
Since the resulting images are cut out of the whole undistorted image, I went ahead and used cvInitUndistortRectifyMap (like it's described here stackoverflow.com/questions/8837478/opencv-cvremap-cropping-image). So I got the following results: abload.de/img/rasterxisps.jpg
And now my question is: Why is not the whole image undistorted? In some pics of my later results you can recognize that the laptop for example is still totally distorted. How can I acomplish even better results using the standard OpenCV methods?
I'm new to stackoverflow and I'm new to OpenCV as well, so please excuse any of my shortcomings when it comes to expressing my problems.
All chessboard corners should be visible to be found. The algorithm expect a certain size of chessboard such as 4x3 or 7x6 (for example). The white border around a chess board should be visible too or dark squares may not be defined precisely.
You still have high distortions at the image periphery after undistort() since distortions are radial (that is they increase with the radius) and your found coefficients are wrong. The latter are wrong since a calibration process minimizes the sum of squared errors in pixel coordinates and you did not represent the periphery with enough samples.
TODO: You have to have 20-40 chess board pattern images if you use 8 distCoeff. Slant your boards at different angles, put them at different distances and spread them around, especially at the periphery. Remember, the success of calibration depends on sampling and also on seeing vanishing points clearly from your chess board (hence slanting and tilting).

Resources