I bought an industrial USB 3.0 camera that has built-in face detection. According to the manual, the detected ROI can be embedded in the pixel data. My biggest challenge is understanding the manual's description of how that data is stored in the pixel data. (Getting an image and reading out single pixels is not a problem.)
Maybe someone can give me a hint on how it works.
I attached some screenshots of the manual.
Thank you very much
[Attachments: an example image, a cut-out of the embedded data from that example, and two excerpts from the manual (part 1 and part 2).]
In the current commit, ManipulationStation uses DepthImageToPointCloud to project a point cloud from the color image and depth image inputs. However, the documentation states:
Note that if a color image is provided, it must be in the same frame as the depth image.
As I understand it, both the color and depth image inputs come from an RgbdSensor created from the info returned by MakeD415CameraModel, and C and D are two different frames.
I think the resulting point cloud therefore has the wrong coloring. I tested this on a similar setup, though not MakeD415CameraModel exactly. I currently work around the issue by forcing C and D to be the same frame in MakeD415CameraModel.
My question: does Drake have a method that maps between depth images and color images in different frames, similar to the Kinect library? Since this is a simulation after all, maybe that is overkill?
P.S. I am trying to simulate the images from an Azure Kinect; hence the question.
The issue https://github.com/RobotLocomotion/drake/issues/12125 discusses this problem, and has either a work-around (in case your simulation can fudge it and have the depth and color frames identical, even though that's slightly different from the real sensor) or a pointer to a different kind of solution (the cv2.registerDepth algorithm). Eventually it would be nice to have a registerDepth-like feature built into Drake, but as of today it's not there yet.
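For reference, this is roughly what such a registration step does: back-project each depth pixel into 3D with the depth intrinsics, transform it into the color camera frame, and re-project it with the color intrinsics. Below is a minimal C++ sketch of that idea only, not Drake or OpenCV API; the function name, the pinhole intrinsics, and the depth-to-color transform X_CD are placeholders.

#include <cmath>
#include <opencv2/core.hpp>

// Reproject a depth image (in meters) from the depth camera frame D into the
// color camera frame C, so that registered(u, v) lines up with color(u, v).
// K_depth / K_color are 3x3 pinhole intrinsics; X_CD is a 4x4 rigid transform
// taking points expressed in D to points expressed in C.
cv::Mat RegisterDepthToColor(const cv::Mat& depth_m,  // CV_32F, meters
                             const cv::Matx33d& K_depth,
                             const cv::Matx33d& K_color,
                             const cv::Matx44d& X_CD,
                             cv::Size color_size) {
  cv::Mat registered(color_size, CV_32F, cv::Scalar(0.f));
  for (int v = 0; v < depth_m.rows; ++v) {
    for (int u = 0; u < depth_m.cols; ++u) {
      const float z = depth_m.at<float>(v, u);
      if (z <= 0.f) continue;  // no depth return at this pixel
      // Back-project (u, v, z) to a 3D point expressed in frame D.
      const double x = (u - K_depth(0, 2)) * z / K_depth(0, 0);
      const double y = (v - K_depth(1, 2)) * z / K_depth(1, 1);
      // Transform into the color camera frame C.
      const cv::Vec4d p_C = X_CD * cv::Vec4d(x, y, z, 1.0);
      if (p_C[2] <= 0.0) continue;  // behind the color camera
      // Project with the color intrinsics.
      const int uc = static_cast<int>(std::lround(
          K_color(0, 0) * p_C[0] / p_C[2] + K_color(0, 2)));
      const int vc = static_cast<int>(std::lround(
          K_color(1, 1) * p_C[1] / p_C[2] + K_color(1, 2)));
      if (uc < 0 || uc >= color_size.width || vc < 0 || vc >= color_size.height)
        continue;
      // Keep the nearest surface if several depth pixels land on the same
      // color pixel (a crude z-buffer).
      float& dst = registered.at<float>(vc, uc);
      if (dst == 0.f || p_C[2] < dst) dst = static_cast<float>(p_C[2]);
    }
  }
  return registered;
}

Once the depth image is expressed in the color frame, looking up the color for each point is a plain per-pixel indexing, which is what DepthImageToPointCloud assumes when it says both images must be in the same frame.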
Overview
I am attempting to build a prototype of a vision system that would apply pattern matching to figure out the orientation of boxes (e.g. soap boxes).
Image sample
Below are real-time captured images of soap boxes in the actual environment, showing two of the four possible orientations (Front_Straight and Back_Inverted).
The real-time images will be very similar to these (approximately 300x200 pixels per image).
The template images will be fed to the system in advance, and it has to determine the orientation of boxes moving on a conveyor. The boxes on the conveyor are guided so that they can take only one of four possible orientations: Front_Straight, Front_Inverted, Back_Straight and Back_Inverted, i.e. the boxes cannot be at an angle. The camera and the conveyor are fixed, so the image size of the real-time boxes is constant at 300x200 px. (I have used a monochrome camera; a colour camera can be used too if needed.)
Some properties of the vision system prototype:
Fixed, constant lighting.
The real-time image of each box will be quite low-res, as attached (300x200 per box).
Minimal motion blur or imaging artefacts.
OpenCV C++ based coding environment.
An Intel Core i5 CPU based PC will be used.
Problem Statement
I am looking for a lightweight yet robust algorithm that can reliably match the template images against the real-time images of boxes on the conveyor to extract the face and orientation. I am new to feature matching, so please guide me as to which feature detector and matcher would be most suitable for this particular case. Also, please let me know whether it is possible to attain 97%+ accuracy using low-res real-time images like the ones attached.
You have a very fortunate case: the images show very little variation. Any feature detector should perform very well in this scenario, and since the OpenCV interface is common to all of them, they are very easy to compare against each other. In my experience, ORB tends to be quite fast with good results, but I expect SIFT/SURF to work in your case too.
I wouldn't expect the resolution to be a problem.
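To make that concrete, here is a minimal C++ sketch of one way to set this up, assuming one template image per orientation; the file names, the ORB feature count, and the ratio-test threshold are placeholders, chosen only to illustrate the common detect/describe/match interface.

#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
  // One template per orientation (hypothetical file names).
  const std::vector<std::string> names = {
      "Front_Straight", "Front_Inverted", "Back_Straight", "Back_Inverted"};

  cv::Ptr<cv::ORB> orb = cv::ORB::create(500);  // up to 500 keypoints
  cv::BFMatcher matcher(cv::NORM_HAMMING);      // Hamming distance for binary ORB

  // Precompute descriptors for the four templates.
  std::vector<cv::Mat> tmpl_desc(names.size());
  for (size_t i = 0; i < names.size(); ++i) {
    cv::Mat tmpl = cv::imread(names[i] + ".png", cv::IMREAD_GRAYSCALE);
    std::vector<cv::KeyPoint> kp;
    orb->detectAndCompute(tmpl, cv::noArray(), kp, tmpl_desc[i]);
  }

  // A live frame from the conveyor camera (here just loaded from disk).
  cv::Mat frame = cv::imread("live_box.png", cv::IMREAD_GRAYSCALE);
  std::vector<cv::KeyPoint> frame_kp;
  cv::Mat frame_desc;
  orb->detectAndCompute(frame, cv::noArray(), frame_kp, frame_desc);

  // Score each template by the number of matches passing Lowe's ratio test.
  size_t best = 0;
  int best_score = -1;
  for (size_t i = 0; i < names.size(); ++i) {
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(tmpl_desc[i], frame_desc, knn, 2);
    int good = 0;
    for (const auto& m : knn)
      if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance) ++good;
    if (good > best_score) { best_score = good; best = i; }
  }
  std::cout << "Best orientation: " << names[best]
            << " (" << best_score << " good matches)" << std::endl;
  return 0;
}

In practice you would also pick a minimum good-match count below which the classification is rejected; swapping ORB for SIFT only means changing the create() call and the matcher norm to NORM_L2.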
I have a project that requires that picture frames and similar objects in an image taken from a smartphone app be measured. The smartphone will likely be able to provide some angle & orientation data.
Is this possible to do to within a quarter of an inch with OpenCV?
Examples would be very helpful; someone who wants cash for an MVP would be outstanding.
No can do from a single image, without additional information: absolute scale is lost in projection. A good survey of what is and isn't possible from a single image is this paper.
I am currently looking for a proper solution to the following problem, which is not directly programming-oriented, but I am guessing that OpenCV users might have an idea:
My stereo camera has a 1/3.2" sensor with 752x480 resolution. I am using the two stereo images from this camera to create a point cloud with the Point Cloud Library (PCL).
The problem is that I would like to reduce the number of points in the point cloud by directly lowering the resolution of the input images (going from 752x480 to 376x240).
As indicated in the title, I have to adapt the focal length of the camera in pixels accordingly:
I calculate this parameter with the following formula:
float focal_pixel = (FOCAL_METERS / SENSOR_WIDTH_METERS)*InputImg.cols;
However, SENSOR_WIDTH_METERS is currently a constant corresponding to the 1/3.2" spec converted to meters, and I would like to adapt this to the resolution I want to use: 376x240.
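To make the numbers concrete, this is what the formula computes at full resolution (the physical constants are placeholders for a 1/3.2" sensor, roughly 4.54 mm wide):

// Physical quantities of the sensor; placeholder values.
const float FOCAL_METERS = 0.0036f;
const float SENSOR_WIDTH_METERS = 0.00454f;  // ~1/3.2" sensor width

// At full resolution, InputImg.cols == 752:
float focal_pixel_752 = (FOCAL_METERS / SENSOR_WIDTH_METERS) * 752.0f;  // ~596 px
// The question: what should change here when the input is 376x240 instead?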
I am absolutely not sure whether I have phrased my problem clearly enough to be answered, which might mean that I am going in the wrong direction.
Thank you in advance
Edit: the function used to process the stereo images (after computation):
getPointCloud(hori_c_pp, vert_c_pp, focal_pixel, BASELINE_METERS, out_stereo_cloud, ref_texture);
where the first two parameters are the coordinates of the image center, BASELINE_METERS is the baseline of my camera, out_stereo_cloud is my output cloud, and finally ref_texture is the color information. This function is taken from the stereo_matching sub-library.
For some reason, if I just resize the stereo images, this seems to conflict with the focal_pixel parameter, since the dimensions are not the same anymore.
I'm very lost on this issue.
As I don't really follow the formulas and method calls you're posting, I advise you to use another approach.
OpenCV already gives you the possibility to create 3D points from stereo images with the method cv::reprojectImageTo3D. Another question also discusses the conversion to the corresponding PCL datatype.
If you only want to reproject a certain ROI of your image, you should opt for cv::perspectiveTransform, as explained in the documentation I pointed to in the first link.
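For illustration, here is a minimal C++ sketch of that route, assuming rectified input images already downscaled to 376x240; the matcher parameters, intrinsics and baseline are placeholders, and normally the Q matrix would come straight from cv::stereoRectify run with the downscaled calibration.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
  // Rectified left/right images, already downscaled to 376x240.
  cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);
  cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);

  // Semi-global block matching; these parameters are only a starting point.
  cv::Ptr<cv::StereoSGBM> sgbm =
      cv::StereoSGBM::create(/*minDisparity=*/0, /*numDisparities=*/64,
                             /*blockSize=*/9);
  cv::Mat disp16, disp;
  sgbm->compute(left, right, disp16);          // fixed point, scaled by 16
  disp16.convertTo(disp, CV_32F, 1.0 / 16.0);  // disparity in pixels

  // Intrinsics for the *downscaled* images (placeholder values): at half
  // resolution, the focal length in pixels and the principal point are
  // also halved.
  const double f = 350.0, cx = 188.0, cy = 120.0;  // pixels
  const double B = 0.12;                           // baseline in meters
  // Simplified disparity-to-depth matrix (assumes identical principal
  // points in both rectified cameras).
  cv::Mat Q = (cv::Mat_<double>(4, 4) <<
      1, 0, 0, -cx,
      0, 1, 0, -cy,
      0, 0, 0,  f,
      0, 0, 1.0 / B, 0);

  cv::Mat xyz;  // CV_32FC3: one (X, Y, Z) point in meters per pixel
  cv::reprojectImageTo3D(disp, xyz, Q, /*handleMissingValues=*/true);
  // xyz can then be copied into a pcl::PointCloud<pcl::PointXYZ>.
  return 0;
}

Because every quantity in Q is expressed in pixels of the images you actually feed in, halving the image resolution simply means halving focal_pixel and the principal point, while the baseline in meters stays the same.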
I have a collection of about 3000 images that were taken from a camera suspended from a weather balloon in flight. The camera is pointing in a different direction in each image but is generally aimed down, so all the images share a significant area (40-50%) with the previous image, but at a slightly different scale and rotated by an arbitrary (and not consistent) amount. The image metadata includes a timestamp, so I do know with certainty the correct order of the images and the elapsed time between each.
I want to process these images into a single video. If I simply string them together it will be great for making people seasick, but won't really capture the amazingness of the set :)
The specific part I need help with is finding the rotation of each image relative to the previous image. Is there a library somewhere that can identify regions of overlap between two images when the images themselves are rotated relative to each other? If I can find 2-3 common points (or more), I can do the remaining calculations to determine the amount of rotation and the offset so I can put them together correctly. Alternatively, if there is a library that calculates both of those things for me, that would be even better.
I can do this in any language, with a slight preference for either Java or Python. The data is in Hadoop, so Java is the most natural language, but I can use scripting languages as well if necessary.
Since I'm new to image processing, I don't even know where to start. Any help is greatly appreciated!
For a problem like this you could look into SIFT. This algorithm detects local features in images. OpenCV has an implementation of it; you can read about it here.
You could also try SURF, which is a similar type of algorithm. OpenCV also has this implemented; you can read about that here.
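For example, here is a minimal C++ sketch of that pipeline for two consecutive frames (SIFT has been in OpenCV's main features2d module since 4.4, and the equivalent calls exist in the Python bindings); the file names and the ratio-test threshold are placeholders. It matches SIFT features between the frames and then fits a similarity transform with RANSAC, which directly gives the rotation and offset you're after.

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
  cv::Mat prev = cv::imread("frame_0001.jpg", cv::IMREAD_GRAYSCALE);
  cv::Mat curr = cv::imread("frame_0002.jpg", cv::IMREAD_GRAYSCALE);

  // Detect and describe local features in both frames.
  cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
  std::vector<cv::KeyPoint> kp_prev, kp_curr;
  cv::Mat d_prev, d_curr;
  sift->detectAndCompute(prev, cv::noArray(), kp_prev, d_prev);
  sift->detectAndCompute(curr, cv::noArray(), kp_curr, d_curr);

  // Match descriptors and keep the ones passing Lowe's ratio test.
  cv::BFMatcher matcher(cv::NORM_L2);
  std::vector<std::vector<cv::DMatch>> knn;
  matcher.knnMatch(d_prev, d_curr, knn, 2);
  std::vector<cv::Point2f> pts_prev, pts_curr;
  for (const auto& m : knn) {
    if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance) {
      pts_prev.push_back(kp_prev[m[0].queryIdx].pt);
      pts_curr.push_back(kp_curr[m[0].trainIdx].pt);
    }
  }
  if (pts_prev.size() < 3) { std::cerr << "not enough matches\n"; return 1; }

  // Fit rotation + scale + translation (a similarity transform) with RANSAC.
  std::vector<uchar> inliers;
  cv::Mat A = cv::estimateAffinePartial2D(pts_prev, pts_curr, inliers,
                                          cv::RANSAC);
  if (A.empty()) { std::cerr << "not enough overlap\n"; return 1; }

  // A = [s*cos(t) -s*sin(t) tx; s*sin(t) s*cos(t) ty]
  const double angle = std::atan2(A.at<double>(1, 0), A.at<double>(0, 0));
  const double scale = std::hypot(A.at<double>(0, 0), A.at<double>(1, 0));
  std::cout << "rotation: " << angle * 180.0 / CV_PI << " deg, "
            << "scale: " << scale << std::endl;
  return 0;
}

Applying this pairwise over the whole sequence and accumulating the transforms gives you the alignment needed to stabilize the video.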