In the current commit, ManipulationStation uses DepthImageToPointCloud to project the point cloud from the color image and depth image inputs. However, the documentation states:
Note that if a color image is provided, it must be in the same frame as the depth image.
As I understand it, both the color and depth image inputs come from RgbdSensor, which is created from the information returned by MakeD415CameraModel. C and D are two different frames.
I think the resulting point cloud has the wrong coloring. I tested this on a similar setup, though not with MakeD415CameraModel exactly. For now I have worked around the issue by forcing C and D to be the same frame in MakeD415CameraModel.
My question: does Drake have a method that maps between depth and color images taken from different frames, similar to the Kinect library? Since this is a simulation after all, maybe that is overkill?
P.S. I am trying to simulate the image from Azure Kinect; hence the question.
The issue https://github.com/RobotLocomotion/drake/issues/12125 discusses this problem, and offers either a work-around (in case your simulation can fudge it and make the depth and color frames identical, even though that's slightly different from the real sensor) or a pointer to a different kind of solution (the cv2.registerDepth algorithm). Eventually it would be nice to have a registerDepth-like feature built into Drake, but as of today it's not there yet.
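To make the registration idea concrete, here is a minimal sketch of what mapping a single depth pixel into the color image involves. It is not Drake's or OpenCV's API: the function name, the (fx, fy, cx, cy) tuple layout, and the R_CD / p_CD arguments are all assumptions made for illustration, and it assumes an undistorted pinhole model for both cameras.

```python
def register_depth_pixel(u, v, z, depth_K, color_K, R_CD, p_CD):
    """Map depth pixel (u, v) with depth z (meters) to color-image pixel coordinates.

    depth_K, color_K: hypothetical (fx, fy, cx, cy) pinhole intrinsics.
    R_CD, p_CD: rotation (3x3 nested lists) and translation (3 values) that
    re-express a point given in the depth frame D in the color frame C.
    Returns None if the point ends up behind the color camera.
    """
    fx_d, fy_d, cx_d, cy_d = depth_K
    fx_c, fy_c, cx_c, cy_c = color_K

    # Back-project the depth pixel to a 3D point expressed in frame D.
    p_D = ((u - cx_d) * z / fx_d, (v - cy_d) * z / fy_d, z)

    # Re-express the point in frame C: p_C = R_CD * p_D + p_CD.
    p_C = [sum(R_CD[i][j] * p_D[j] for j in range(3)) + p_CD[i] for i in range(3)]
    if p_C[2] <= 0:
        return None

    # Project into the color image.
    return (fx_c * p_C[0] / p_C[2] + cx_c, fy_c * p_C[1] / p_C[2] + cy_c)
```

With identical frames the pixel maps to itself. With a sideways offset between C and D the color lookup shifts by roughly fx * offset / z pixels, so the coloring error from ignoring the offset is largest for nearby points.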
I am currently looking for a proper solution to the following problem, which is not directly programming oriented, but I am guessing that the users of opencv might have an idea:
My stereo camera has a sensor of 1/3.2" 752x480 resolution. I am using the two stereo images of this very camera in order to create a point cloud, thanks to the point cloud library (PCL).
The problem is that I would like to reduce the number of points contained by the point cloud, by directly lowering the resolution of the input images (passing from 752x480 to 376x240).
As indicated in the title, I have to adapt the focal length of the camera, in pixels, to this change.
I calculate this parameter with the following formula:
float focal_pixel = (FOCAL_METERS / SENSOR_WIDTH_METERS)*InputImg.cols;
However, SENSOR_WIDTH_METERS is currently a constant corresponding to the 1/3.2" sensor size converted to meters, and I would like to adapt it to the resolution I want: 376x240.
I am absolutely not sure whether I have stated my problem clearly enough to be answered; if not, that probably means I am going in the wrong direction.
Thank you in advance
edit: the function used to process the stereo image (after computing):
getPointCloud(hori_c_pp, vert_c_pp, focal_pixel, BASELINE_METERS, out_stereo_cloud, ref_texture);
where the first two parameters are the coordinates of the image center, BASELINE_METERS is the baseline of my camera, out_stereo_cloud is my output cloud, and ref_texture is the color information. This function is taken from the stereo_matching sub-library.
For some reason, if I just resize the stereo images, this seems to conflict with the focal_pixel parameter, since the dimensions are no longer the same.
I'm very lost on this issue.
As I don't really follow the formulas and method calls you're posting, I advise you to use another approach.
OpenCV already lets you compute 3D points from a stereo disparity image with cv::reprojectImageTo3D. Another question already discusses the conversion to the corresponding PCL datatype.
If you only want to reproject a certain ROI of your image, you should opt for cv::perspectiveTransform, as explained in the documentation I pointed out in the first link.
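On the focal-length part of the question: the physical sensor width does not change when the image is resized, so the formula from the question already gives half the focal length in pixels once the image width is halved. The sketch below checks that with made-up numbers. The 2.8 mm focal length, 4.54 mm sensor width, and 6 cm baseline are assumptions for illustration, not values from the question. It also assumes the disparity map is recomputed from the resized images (so disparities halve too) and that the principal point passed alongside is scaled by the same factor.

```python
def focal_in_pixels(focal_m, sensor_width_m, image_width_px):
    """Same formula as in the question: focal length expressed in pixels."""
    return focal_m / sensor_width_m * image_width_px


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Standard rectified-stereo relation Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


# Hypothetical camera parameters (not taken from the question).
FOCAL_M = 0.0028
SENSOR_WIDTH_M = 0.00454  # stays constant: the sensor itself is not resized
BASELINE_M = 0.06

f_full = focal_in_pixels(FOCAL_M, SENSOR_WIDTH_M, 752)
f_half = focal_in_pixels(FOCAL_M, SENSOR_WIDTH_M, 376)

# A point seen with 20 px disparity at full resolution has 10 px at half
# resolution, and both reconstruct to the same metric depth.
z_full = depth_from_disparity(f_full, BASELINE_M, 20.0)
z_half = depth_from_disparity(f_half, BASELINE_M, 10.0)
```

If the resized images give a wrong cloud, one thing worth checking is whether the image center and the disparities are in the same (resized) pixel units as focal_pixel.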
I start with creating an initial mask of an object in an image. Using this mask, a histogram is created which is then used to process subsequent images.
I use the calcBackProject function to find pixels in the image that belong to the histogram. The problem I am having is that too much of the image is being accepted because certain objects are similar to the color of the initial object. Is there any alternative to calcBackProject? In my application, I can't afford to get objects that do not belong. All of this assumes that I have a perfect initial mask.
There are many ways to track an object, and it can be very difficult. Within OpenCV you may want to try the meanshift/camshift trackers to see whether they do any better. If not, you may have to stray outside the OpenCV world and try tracking-learning-detection frameworks.
Meanshift/Camshift/etc in OpenCV
http://docs.opencv.org/modules/video/doc/video.html
http://docs.opencv.org/trunk/doc/py_tutorials/py_video/py_meanshift/py_meanshift.html
Tracking-Learning-Detection in C++:
STRUCK: http://www.samhare.net/research/struck (uses opencv)
Tracking-Learning-Detection in Matlab:
Predator: http://personal.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html
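For a feel of what mean shift does with a back projection, here is a rough pure-Python sketch. It is not OpenCV's implementation, and the function name and window layout are made up for illustration. Instead of accepting every pixel that matches the histogram, it moves a fixed-size window toward the centroid of the back-projection weights until the window stops moving, so isolated look-alike pixels away from the object have less influence.

```python
import math


def mean_shift(weights, window, max_iter=20):
    """Shift a fixed-size window to the local centroid of a weight map.

    weights: 2D list of non-negative values (e.g. a back projection).
    window: (x, y, w, h) with the window fully inside the map.
    Returns the window after convergence or max_iter iterations.
    """
    x, y, w, h = window
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        # Zeroth and first moments of the weights inside the window.
        m00 = m10 = m01 = 0.0
        for r in range(y, y + h):
            for c in range(x, x + w):
                wt = weights[r][c]
                m00 += wt
                m10 += wt * c
                m01 += wt * r
        if m00 == 0:
            break  # nothing to follow inside the window
        cx, cy = m10 / m00, m01 / m00
        # Re-center the window on the centroid, clamped to the map.
        new_x = int(math.floor(cx - (w - 1) / 2.0 + 0.5))
        new_y = int(math.floor(cy - (h - 1) / 2.0 + 0.5))
        new_x = max(0, min(cols - w, new_x))
        new_y = max(0, min(rows - h, new_y))
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return (x, y, w, h)
```

This only helps when the starting window overlaps the object; a same-colored distractor that is larger than the object or sits next to it will still pull the window away.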
I have posted some questions about this earlier but still cannot find a solution.
My requirement is to create a paper-scanning app.
The camera takes a picture, and I have to detect whether predefined patterns appear in the captured image.
I tried matchTemplate (OpenCV) but could not get it to work.
Since the image is captured by a camera, the pattern in the captured image may be smaller or larger than the pattern image.
Will matchTemplate work properly in this case? If not, what other approach should I try?
Template matching won't work across different scales (sizes). To handle that, you can do a multiscale search: run the template matching at several scales of the input image. Another option is to train an OpenCV Haar cascade to detect the template; it has built-in multiscale detection.
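A multiscale search can be sketched in a few lines. The version below is a toy in pure Python, not OpenCV code: it resizes the template rather than the image (a variant of the same idea), uses nearest-neighbor resizing, and scores with mean squared difference so that templates of different sizes are comparable. All function names are made up for illustration.

```python
def resize_nearest(img, scale):
    """Nearest-neighbor resize of a 2D list by a scale factor."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, int(h * scale + 0.5)), max(1, int(w * scale + 0.5))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(nw)] for r in range(nh)]


def best_match_ssd(image, templ):
    """Exhaustive search; returns (ssd, x, y) of the lowest sum of squared differences."""
    ih, iw, th, tw = len(image), len(image[0]), len(templ), len(templ[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum((image[y + r][x + c] - templ[r][c]) ** 2
                      for r in range(th) for c in range(tw))
            if best is None or ssd < best[0]:
                best = (ssd, x, y)
    return best


def multiscale_match(image, templ, scales):
    """Try each template scale; returns (mean_sq_error, scale, x, y) of the best one."""
    best = None
    for s in scales:
        t = resize_nearest(templ, s)
        if len(t) > len(image) or len(t[0]) > len(image[0]):
            continue  # template no longer fits in the image
        ssd, x, y = best_match_ssd(image, t)
        score = ssd / float(len(t) * len(t[0]))  # normalize by template area
        if best is None or score < best[0]:
            best = (score, s, x, y)
    return best
```

In practice you would use a coarse set of scales first and refine around the best one, since every extra scale costs a full template-matching pass.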
The wrong size of the template is a standard problem in template matching. Since I don't see any example code, it is hard to tell where the real problem lies. Did you try different thresholds in the algorithm?
On the theoretical side, there are two main problems for feature extraction: size (distance) and rotation (object orientation). The generalized Hough transform could be a solution.
I'm trying to track the position of a robot from an overhead webcam. However, as I don't have much access to the robot or the environment, I have been working with snapshots from the webcam.
The robot has 5 bright LEDs positioned strategically, which are different enough in color from the robot and the environment to be isolated easily.
I have been able to do just that using EmguCV, resulting in a binary image like the one below. My question now is: how do I get the positions of the five blobs and use those positions to determine the position and orientation of the robot?
I have been experimenting with the Emgu.CV.VideoSurveillance.BlobTrackerAuto class, but it stubbornly refuses to detect the blobs in the above image. Being a bit of a newbie when it comes to any of this, I'm not sure what I could be doing wrong.
So what would be the best method of obtaining the positions of the blobs in the above image?
I can't tell you how to do it with emgucv in particular, you'd need to translate the calls from opencv to emgucv. You'd use cv::findContours to get the blobs and cv::moments to get the position of the blobs (the formula to get the middle points of the blobs is in the documentation of cv::moments). Then you'd use cv::estimateRigidTransform to get the position and orientation of the robot.
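To show what the contour-plus-moments step computes, here is a small library-free sketch. It uses a flood fill as a stand-in for cv::findContours and then applies the centroid formula from the moments documentation, (m10 / m00, m01 / m00), to each blob. The function name and the 4-connectivity choice are assumptions for illustration.

```python
from collections import deque


def blob_centroids(binary):
    """Return (x, y) centroids of 4-connected blobs in a 2D list of 0/1 values."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill one blob while accumulating its raw moments.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            m00 = m10 = m01 = 0
            while queue:
                r, c = queue.popleft()
                m00 += 1   # area
                m10 += c   # sum of x coordinates
                m01 += r   # sum of y coordinates
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and binary[rr][cc] and not seen[rr][cc]):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            centroids.append((m10 / float(m00), m01 / float(m00)))
    return centroids
```

The five centroids still have to be matched to the known LED layout before a rigid transform can be estimated; that correspondence step is not shown here.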
I use the cvBlob library to work with blobs. Yesterday I used it to detect small blobs and it worked fine.
I wrote a python module to do this very thing.
http://letsmakerobots.com/node/38883#comments
Suppose I have an image file/URL, and I want my software to search it within a set of up to 100 images (or at least in that order of magnitude). The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them (the two images may have been cropped differently, or they were compressed differently).
The question is: is this a feasible task, given that I won't have any of the images before the search takes place (i.e., there won't be any indexing prior to the search)? Is it likely to work in subsecond time (remember that the comparison set is quite small)? And if it is feasible, which tools can I use for this task? This could be software components or even an online service (I can live with that for a proof of concept). Can OpenSURF help me here?
To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service.
"The target image that the software should find should be the "same" image as the given image, but it should still be able to "forgive" slight processing on either of them."
If "slight processing" doesn't involve rotation, but only cropping, then simple cross-correlation should work. If there could be perspective correction, rotation, or lens distortion correction, then things are more complicated.
I think this method is quite forgiving to slight color corrections. Anyway, you can always convert both images to grayscale and compare grayscale versions if you want.
"To focus my question further - I'm not asking which algorithms to use, at this point I would rather use an existing tool/API/service."
You can start from cvMatchTemplate from OpenCV library (the link points to the C version of the API, but it's available also for C++ and Python). Use the cropped image as a template, and look for it in all your images.
If the images you compare have dark features on light backgrounds, you may benefit from using CV_TM_CCOEFF or CV_TM_CCOEFF_NORMED methods. They both subtract the average over the template area from both images. Normalized methods (CV_TM_*_NORMED) generally work better but are slower than their non-normalized counterparts.
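To illustrate what the mean subtraction and normalization buy you, here is a sketch of the zero-mean normalized cross-correlation score at a single location, which is the quantity the CCOEFF_NORMED method is based on. It is a plain-Python illustration on nested lists, not the OpenCV implementation, and the function name is made up.

```python
import math


def ccoeff_normed(image, templ, x, y):
    """Zero-mean normalized cross-correlation of templ with the image window at (x, y).

    Returns a value in [-1, 1]; 0.0 if either patch is perfectly flat.
    """
    th, tw = len(templ), len(templ[0])
    t = [templ[r][c] for r in range(th) for c in range(tw)]
    w = [image[y + r][x + c] for r in range(th) for c in range(tw)]
    mean_t = sum(t) / float(len(t))
    mean_w = sum(w) / float(len(w))
    # Subtract each patch's own mean, then normalize by both energies.
    num = sum((a - mean_t) * (b - mean_w) for a, b in zip(t, w))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t) *
                    sum((b - mean_w) ** 2 for b in w))
    return num / den if den > 0 else 0.0
```

A window that equals the template up to a brightness offset and a positive contrast gain scores 1, which is why this score tolerates mild exposure or compression differences between the two images.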
You may consider doing some preprocessing on the images before the cross-correlation. If you normalize them first, the cross-correlation will be less sensitive to slight brightness/contrast modification. If you detect edges first, as suggested by @misha, you'll lose color/lightness information, but the results for contour overlapping will be much better.
jetxee set you off on the right track. However, if you simply use template matching, you can run into problems where the background interferes with your template matching result. For example, if your template is a building and your background is primarily light (e.g. desert sand), then the template matching will fail because the lighter background will always return a higher cross-correlation than the darker template. Here is an example of this problem.
The way you solve it is the same as what is in the link:
Perform edge-detection on both your template and the target image.
Throw original template and image away
Perform template detection using the edge-detected template and edge-detected target image
As far as forgiving slight processing, the edge detection step will take care of that. As long as the edges in the two images are not modified significantly (blurred, optically distorted), the approach will work.
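A small sketch of why the edge step removes the background-brightness problem. It uses plain central differences as a stand-in for a real edge detector (such as Canny), and the function name is made up for illustration. The resulting edge maps are what you would then feed to template matching in place of the original images.

```python
def edge_map(img):
    """Crude edge strength: |dI/dx| + |dI/dy| by central differences.

    img is a 2D list of numbers; border pixels are left at 0.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = img[r][c + 1] - img[r][c - 1]
            gy = img[r + 1][c] - img[r - 1][c]
            out[r][c] = abs(gx) + abs(gy)
    return out
```

Because only differences between neighbors survive, a uniformly bright region contributes nothing to the edge map, so it can no longer dominate the correlation the way a light background does.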
I know you are not looking specifically for algorithms, but nonetheless, let me suggest the following which can accomplish exactly what you are trying to do, very efficiently...
For cropped versions of the same image, including rotation, the Fourier-Mellin transform or a log-polar transform (watch out for the artsy semi-nude drawing - good source however) will give you the translation, rotation and scale coefficients between the two images, allowing you to determine what operations were needed to go from one to the other.
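The reason log-polar coordinates help can be shown on a single point. This sketch is only the coordinate change, not a Fourier-Mellin implementation, and the function names are made up. In (log r, theta) coordinates about a center, scaling the image becomes a shift along the log r axis and rotating it becomes a shift along the theta axis, and shifts are what correlation is good at finding.

```python
import math


def to_log_polar(x, y, cx, cy):
    """Convert (x, y) to (log r, theta) about the center (cx, cy).

    Undefined at the center itself (r = 0).
    """
    dx, dy = x - cx, y - cy
    return (math.log(math.hypot(dx, dy)), math.atan2(dy, dx))


def scale_and_rotation(p, q, center=(0.0, 0.0)):
    """Recover the scale factor and rotation (radians) taking point p to point q."""
    log_r_p, theta_p = to_log_polar(p[0], p[1], center[0], center[1])
    log_r_q, theta_q = to_log_polar(q[0], q[1], center[0], center[1])
    scale = math.exp(log_r_q - log_r_p)
    # Wrap the angle difference into (-pi, pi].
    rotation = math.atan2(math.sin(theta_q - theta_p), math.cos(theta_q - theta_p))
    return scale, rotation
```

For example, scale_and_rotation((3, 4), (6, 8)) recovers a scale of 2 with no rotation, and scale_and_rotation((3, 4), (-4, 3)) recovers a quarter-turn with no scaling.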