ORB - object needs to be very close to camera

I have a program that takes a video feed over RTSP and checks for an object. The only problem is that the object needs to be about 6" from the camera, whereas with a wired webcam the object can be a few feet away. Both cameras are transmitting at the same resolution. What is causing this problem?
Camera transmission specs:
Resolution: 640 * 480
FPS: 20
Bitrate: 500000
Focal Length: 2.8mm
EDIT:
The algorithm I am using is the OpenCV ORB algorithm but I have also seen this behavior when previously using the Haar classifier method in OpenCV.
Below is the limit at which the webcam can no longer detect the object. (approx. 66 pixels)
Below is the limit at which Glass can no longer detect the object. (approx. 68 pixels)
Looking at the images, the object appears to be at a similar distance in both, but in the webcam image it is actually at least twice as far away. That suggests a camera property is causing this issue. If so, which part of the camera is responsible?

As you've recognized yourself, the object sizes are very similar in both images, so the algorithm seems to stop working below a certain object size in pixels.
The difference in distance between both cameras (for the same object size) comes from camera intrinsic parameters like focal length (coming from the lens objective) and the size of the sensor chip.
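To make the effect of focal length concrete, here is a minimal sketch under the pinhole camera model, where the projected width in pixels is f * W / Z. The focal lengths (in pixels) and the object width are made-up illustrative values, not measurements from the question; only the 66 pixel limit is taken from it.

```python
def object_width_px(focal_px, object_width_mm, distance_mm):
    """Projected object width in pixels under the pinhole model: w = f * W / Z."""
    return focal_px * object_width_mm / distance_mm


def max_detection_distance_mm(focal_px, object_width_mm, min_width_px):
    """Largest distance at which the object still spans min_width_px pixels."""
    return focal_px * object_width_mm / min_width_px


# Hypothetical values for illustration only.
WEBCAM_FOCAL_PX = 800.0   # assumed: longer effective focal length
GLASS_FOCAL_PX = 400.0    # assumed: wider field of view
OBJECT_WIDTH_MM = 100.0   # assumed object width
MIN_WIDTH_PX = 66.0       # detector limit reported in the question

webcam_limit = max_detection_distance_mm(WEBCAM_FOCAL_PX, OBJECT_WIDTH_MM, MIN_WIDTH_PX)
glass_limit = max_detection_distance_mm(GLASS_FOCAL_PX, OBJECT_WIDTH_MM, MIN_WIDTH_PX)
print(f"webcam limit: {webcam_limit:.0f} mm, Glass limit: {glass_limit:.0f} mm")
```

With these assumed numbers, halving the focal length in pixels halves the distance at which the object still reaches the minimum pixel size, even though both cameras deliver 640 x 480 images.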
Depending on the method you use to detect the object, you could resize (upscale) the second image, unless this introduces too many interpolation artifacts (which your detection method might not be able to handle).
Upscaling the image is OK for many detectors that have some minimum object size, which comes directly from the training data or the training window size. Note that upscaling can drastically increase the processing time, since the detector has to process more pixels.
If the intrinsic parameters of both cameras are known and the images are already undistorted, you can compute the scale factor between both images. To upscale the 2nd image, with fx1, fy1 the focal lengths of the first camera and fx2, fy2 those of the second:
ratioX = fx1/fx2
ratioY = fy1/fy2
You could crop the upscaled image afterwards, centered around the principal point. After that, both image regions should match quite well.
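The scaling and cropping described above can be sketched as follows. This only computes the numbers (scale factors, upscaled size, crop window); the actual resampling would be done with an image library, for example cv2.resize. The function names and the focal length values are hypothetical.

```python
def scale_factors(fx1, fy1, fx2, fy2):
    """Scale factors that bring camera 2's image to camera 1's focal length."""
    return fx1 / fx2, fy1 / fy2


def upscaled_size(width, height, ratio_x, ratio_y):
    """Size of the second image after scaling by (ratio_x, ratio_y)."""
    return round(width * ratio_x), round(height * ratio_y)


def crop_window(cx2, cy2, ratio_x, ratio_y, out_w, out_h):
    """Crop of out_w x out_h pixels centered on the scaled principal point.

    Returns (x0, y0, width, height) in the upscaled image.
    """
    x0 = round(cx2 * ratio_x - out_w / 2)
    y0 = round(cy2 * ratio_y - out_h / 2)
    return x0, y0, out_w, out_h


# Hypothetical intrinsics: camera 1 has twice the focal length of camera 2.
rx, ry = scale_factors(800.0, 800.0, 400.0, 400.0)
print(upscaled_size(640, 480, rx, ry))
print(crop_window(320.0, 240.0, rx, ry, 640, 480))
```

In practice the crop window should also be clamped to the bounds of the upscaled image, which this sketch leaves out.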
Hope this helps and good luck.
Edit: for testing, you could use the cv::undistort function to make an image look as if it had been taken with a different camera matrix.

Related

ArUco markers, why does the pose change when I change image resolution?

I use this reference https://automaticaddison.com/how-to-perform-pose-estimation-using-an-aruco-marker/ to estimate pose of a marker.
When I obtained the camera matrix and distortion coefficients, I used the full camera resolution.
However, when I change the resolution (image size) before pose estimation, I am getting different results. I am not sure why and which resolution would be correct to use.
Should we always use the same resolution as what was used for camera calibration?
I expected the pose to be somewhat independent from image size other than minor changes. Any thoughts?

Yes, always use the same resolution.
One could recalculate the camera matrix and distortion coefficients to fit a different resolution, but that's a hassle and requires some knowledge of how the camera produced these pictures (binning, cropping). Unless you understand the math behind it, just stick with the same resolution.
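For the simplest case only, here is a sketch of what such a recalculation looks like. It assumes the smaller image is a plain resize of the full sensor image (no cropping, no offset from binning) and ignores skew and the half-pixel shift of the pixel-center convention. Under that assumption the focal lengths and principal point scale with the image, while the distortion coefficients stay the same because they act on normalized coordinates. The matrix values are hypothetical.

```python
def scale_camera_matrix(K, old_size, new_size):
    """Rescale a 3x3 camera matrix for a purely resized image.

    K is a nested list [[fx, 0, cx], [0, fy, cy], [0, 0, 1]];
    old_size and new_size are (width, height) tuples.
    Assumes a plain resize: no cropping, no skew.
    """
    (old_w, old_h), (new_w, new_h) = old_size, new_size
    sx = new_w / old_w
    sy = new_h / old_h
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    return [
        [fx * sx, 0.0, cx * sx],
        [0.0, fy * sy, cy * sy],
        [0.0, 0.0, 1.0],
    ]


# Hypothetical calibration at 1920x1080, reused at 960x540.
K_full = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]]
K_half = scale_camera_matrix(K_full, (1920, 1080), (960, 540))
print(K_half)
```

If the camera crops or bins the sensor differently at the lower resolution, this simple scaling no longer holds, which is why recalibrating or keeping the calibration resolution is the safer option.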

Is an average error of 1.5 on stereo camera calibration bad? (using OpenCV)

I used the OpenCV sample code for stereo camera calibration to get the intrinsics and extrinsics of my stereo camera. I used 149 image pairs, and the program detected the pattern in 114 of them.
Result of my Calibration:
..... 114 pairs have been successfully detected.
Running stereo calibration ...
done with RMS error = 1.60208
average epipolar error = 1.15512
I know the error should be below 1, but I only get an error below 1 with a small number of image pairs, so I'm not sure if my result is good or bad.

You should be able to get an error below 1, but your result is not so bad. I also calibrate with around 100 images, and I often have to discard a few in which the detection was not reliable.
If you decreased the number of images to about 10, the calibration might overfit to those cases, and the reported error would then not be reliable.
In the calibration process, the problems I faced came from the calibration setup. My recommendations are the following:
Check that your calibration pattern is perfectly flat. In my case I printed on adhesive paper and glued it on a piece of glass.
Check that your calibration pattern is not symmetrical in rotation, otherwise the pose estimation could be wrong.
Check the intermediate pattern point detection. There are examples in OpenCV that draw the detected corners or circle centers.
The error can be also displayed for each frame. This can help you to understand for which images you have a problem. If you see that these images actually have a detection problem, you can discard them.
If you acquire videos and not images, both cameras should be synchronized with a hardware connection. In my case I cannot have such a link, therefore I built some kind of holder for the calibration target to keep it still, and I acquired only images, not videos.
This won't reduce your calibration error, but use very different pattern positions to cover the maximum of the field of view.
If your depth of field is small and you have blurry images before/after the focus because of that, change from the chessboard pattern to a circles pattern (functions also available in opencv).
If you don't have a strong distortion in your images (e.g. a photo with an iphone doesn't really show a strong fisheye-like distortion), consider forcing K3=0.
In my case, I fixed the "principal point" in the middle of the image, because the algorithm always found crazy values for these parameters, like for K3.
Hope this helps a bit. Good luck!
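As a small illustration of the per-frame check mentioned above, the sketch below splits image pairs by their individual reprojection error. It assumes the per-view errors have already been obtained somehow; the pair names, error values, and the threshold of 1.0 are hypothetical.

```python
def split_views(per_view_error, max_error=1.0):
    """Split views into those to keep and those to inspect or discard.

    per_view_error maps a view name to its reprojection error in pixels.
    """
    keep = [name for name, err in per_view_error.items() if err <= max_error]
    suspect = [name for name, err in per_view_error.items() if err > max_error]
    return keep, suspect


# Hypothetical per-view errors in pixels.
errors = {"pair_001": 0.4, "pair_002": 2.7, "pair_003": 0.8}
keep, suspect = split_views(errors)
print("keep:", keep)
print("inspect before discarding:", suspect)
```

The flagged pairs should still be looked at by eye, since a high error may come from a bad detection (safe to discard) or from a pattern position the current model does not fit well.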

How to take stereo images using single camera?

I want to find the depth map for stereo images. At present I am working with images from the internet, but I want to take my own stereo images to work with. How do I take the best stereo images without much noise? I have a single camera. Is it necessary to do rectification? How much distance must be kept between the cameras?

Not sure I've understood your problem correctly, but I will try anyway.
I guess you're currently working with images from Middlebury or something similar. If you want to use similar algorithms, you have to rectify your images, because those algorithms assume that corresponding pixels lie on the same line in all images. If you actually want depth images (!= disparity images), you also need the camera extrinsics.
Your setup should have two cameras, and you have to make sure that they don't change their relative position/orientation, otherwise your rectification will break. In the first step you have to calibrate your system to get the intrinsic and extrinsic camera parameters. For that you can either use some tool or roll your own with (for example) OpenCV (calib3d module). Print out a calibration board to calibrate your system. Afterwards you can take images and use the calibration to rectify them.
Regarding color-noise:
You could make your aperture very small and use long exposure times. In my opinion this is pointless, because real-world applications have to deal with noise anyway.

In short, there are plenty of stereo images on the internet that are already rectified. If you want to take your own stereo images you have to follow these three steps:
The relationship between distance to the object z (mm) and disparity in pixels D is inverse: z=fb/D, where f is focal length in pixels and b is camera separation in mm. Select b such that you have at least several pixels of disparity;
If you know camera intrinsic matrix and compensated for radial distortions you still have to rectify your images in order to ensure that matches are located in the same row. For this you need to find a fundamental matrix, recover essential matrix, apply rectifying homographies and update your intrinsic camera parameters... or use stereo pairs from the Internet.
The low level of noise in the camera image is helped by brightly illuminated scenes, large aperture, large pixel size, etc.; however, depending on your set up you still can end up with a very noisy disparity map. The way to reduce this noise is to trade-off with accuracy and use larger correlation windows. Another way to clean up a disparity map is to use various validation techniques such as
error validation;
uniqueness validation or back-and-forth (left-right consistency) validation;
blob-noise suppression, etc.
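The relation z = fb/D from the first point can be sketched as below, both to convert a disparity into a depth and to pick a baseline that gives at least a few pixels of disparity at the farthest distance of interest. The focal length, baseline, and distances are hypothetical values.

```python
def depth_mm(focal_px, baseline_mm, disparity_px):
    """Depth from disparity: z = f * b / D (f in pixels, b in mm, D in pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def min_baseline_mm(focal_px, max_depth_mm, min_disparity_px):
    """Smallest baseline giving at least min_disparity_px at max_depth_mm."""
    return max_depth_mm * min_disparity_px / focal_px


# Hypothetical setup: f = 700 px, b = 60 mm, measured disparity of 14 px.
print(depth_mm(700.0, 60.0, 14.0))           # 3000.0 (mm)
# Baseline needed for 5 px of disparity at 5 m with the same lens.
print(round(min_baseline_mm(700.0, 5000.0, 5.0), 1))
```

Because depth is inversely proportional to disparity, a one-pixel matching error costs far more depth accuracy for distant objects than for near ones, which is one reason to prefer a larger baseline when the scene is far away.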

In my experience:
-I did the rectification myself, so I had to estimate the fundamental matrix, and the estimate may not be correct for some image pairs.
-A higher camera resolution is better for matching. I use OpenCV, which has an implementation of the BRISK descriptor; it was useful for me.
-Try to cover the same area in both images and avoid unnecessary rotations.
-Once you understand the theory, OpenCV is a good friend. Here are some results, but I am still working on it:
Depth map:
Rectified images:

2D FFT of an image shows some invalid values for high frequencies

I uploaded my 2D FFT magnitude image here:
If you take a look at it, at high frequencies (right, left, top and bottom), there are some points with high power (yellow), but only close to the x- and y-axes. These points shouldn't be in the resulting FFT2, since I know the original height image is isotropic, and therefore the 2D FFT should look something like the example below (just note the high frequencies):
Now, the question is, what could be the possible reasons for such behavior at high frequencies?
added:
Here is the magnitude power spectrum before windowing:
https://dl.dropboxusercontent.com/u/82779497/nowin.png
here is the original image, which is a height profile recorded by a profilometer:
https://dl.dropboxusercontent.com/u/82779497/asph5.jpg
By the way, I export data as a .txt file from profilometer software to Matlab.

The profilometer we use for capturing the surface image uses the fringe projection method, which produces some artifacts along the stripes projected onto the surface. So the problem lies with the device we are capturing the images with.
Thanks for comments Eddy.

How would you find the height of objects given an image?

This isn't exactly a programming question. I just want to know what your approach would be to a common problem in digital image processing.
Let's say you have an image of a few trees in, say, JPG format. How would you go about finding the height of each of these trees? The photo is the only input you have.
I want to know your approaches, not code. So it doesn't matter if your answers are vague or non-DIP-ish.
Small correction :
The height need not be the actual height of the tree. It can be expressed in any scale, but the scale should be consistent for all objects in the picture.

Yes, it is possible. What you are describing has an entire industry around it, called photogrammetry.

There is a fair amount of computer vision research in this area. Assuming you don't know the camera constraints, you'll have to make assumptions about the scene and camera to determine the heights up to a scale factor. Note that without camera constraints or a reference height in the image it is impossible to tell the difference between a tall tree photographed from a distance or a short tree photographed up close. A great start is the Single View Metrology work by Criminisi.

It is simple to find the size of an object from images using Photogrammetry.
Photogrammetry is the science of making measurements from photographs.
For this we need to know two things:
the distance from the camera to the object;
the focal length (in mm, together with the pixels per mm on the sensor) or the physical size of the image sensor.
Following are the steps:
Calibrate the Camera
Use OpenCV to calibrate the camera. You can use the OpenCV calibrate.py tool and the chessboard pattern PNG provided in the source code to generate a calibration matrix. Camera calibration is done to find the camera parameters. I took about a dozen photos of the chessboard from as many angles as I could with my webcam (to calibrate my webcam). For more details, check the OpenCV camera calibration documentation.
We will get f_x,f_y,c_x,c_y from calibration matrix.
Checking the details of the photos you took, you will find their native resolution (height x width), and in their EXIF headers you can find the focal length value (f). These items may vary depending on your camera.
Pixels per millimeter
We need to know the pixels per millimeter(px/mm) on the image sensor.
f_x=f*m_x
f_y=f*m_y
Since we know two of the three variables in each formula, we can solve for m_x and m_y. I just averaged f_x and f_y to get f_xy:
m=f_xy/focal_length_of_camera
Insert the image
Load the image in which you want to measure the actual size of an object. You should know the distance between the object and the camera. Find the dimensions of this image (height1 x width1).
Find the Object size in pixels
Determine the size of the object in pixels. I simply use the distance formula to find the length of a selected line. You can adopt any other method.
Convert px/mm in the lower resolution
pxpermm_in_lower_resolution = (width1*m)/width
Size of object in the image sensor
size_of_object_in_image_sensor = object_size_in_pixels/(pxpermm_in_lower_resolution)
Actual size of object
The actual size of object can be found with the above data as,
real_size = (dist*size_of_object_in_image_sensor)/focal_length
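The steps above can be put together in a short worked example. It follows the formulas of this answer as written; all numeric inputs (calibration values, EXIF focal length, image widths, pixel size of the object, distance) are hypothetical.

```python
def real_object_size_mm(f_x, f_y, focal_length_mm, calib_width_px,
                        image_width_px, object_size_px, distance_mm):
    """Estimate the real size of an object following the steps above.

    f_x, f_y:         focal lengths in pixels from the calibration matrix
    focal_length_mm:  focal length from the EXIF header
    calib_width_px:   image width used for calibration (width)
    image_width_px:   width of the image being measured (width1)
    object_size_px:   size of the object in that image, in pixels
    distance_mm:      distance from camera to object
    """
    f_xy = (f_x + f_y) / 2.0                      # averaged focal length in px
    m = f_xy / focal_length_mm                    # px per mm at calibration size
    px_per_mm_low = (image_width_px * m) / calib_width_px
    size_on_sensor_mm = object_size_px / px_per_mm_low
    return (distance_mm * size_on_sensor_mm) / focal_length_mm


# Hypothetical inputs: calibrated at 4000 px width, measuring in a 1000 px
# wide image, object spans 150 px and stands 2 m from the camera.
size = real_object_size_mm(3000.0, 3000.0, 4.0, 4000, 1000, 150.0, 2000.0)
print(f"{size:.1f} mm")
```

With these numbers, m is 750 px/mm, the resized image has 187.5 px/mm, the object covers 0.8 mm on the sensor, and the estimated real size is about 400 mm.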

Assuming they're all the same distance away, all to scale, you'd want to find a single unit of measurement you can guarantee. For example, if there's a person in the photo, again, same scale, and you know they're exactly 6 feet tall, you use that as your measure. You then take that, and count how many stacked make the tree. For example, if you need 3.5 of this person, then:
3.5 * 6 = 21
gives you a 21 foot tall tree.
Without a single point of reference for everything, or if they're all on different scales, you would need a lot more information than you could easily get without having been there.
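The reference-object idea above amounts to a simple ratio of pixel heights. A minimal sketch, assuming the reference and the tree are at the same distance from the camera; the pixel measurements are hypothetical.

```python
def height_from_reference(object_px, reference_px, reference_height):
    """Scale a known reference height by the ratio of pixel heights.

    Only valid if the object and the reference are at the same distance.
    """
    return object_px / reference_px * reference_height


# Hypothetical measurements: the person spans 120 px, the tree 420 px.
print(height_from_reference(420.0, 120.0, 6.0))  # 21.0 (feet)
```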

I would rely on an object of known dimensions to be present in the picture. For instance, a man.
Or perhaps we could use the EXIF data to work out the size of the object from the camera's sensor dimensions and the focal length of the lens used. This again depends on the angle: we should get the most accurate results when the camera was held perpendicular to the subject.

If by size you mean the storage size of the image: a 3x3 image has 3 x 3 = 9 pixels (indexed from 0 to 8). Assuming 1 bit per pixel, that is 9/8 bytes.
To get the size in KB, divide by 1024: (9/8)/1024 KB. Divide by 1024 once more to get the size in MB.
For images with more bits per pixel, multiply the pixel count by the number of bits per pixel first.
