I have a phone camera that’s viewing a planar object. I know the real world measurements of the object. Considering the top left corner of the object as the origin, I calculate the coordinates using the real world measurements. With the object detection algorithm I am able to get the coordinates of the detected object on the image, which is in pixels. (again going by the fact that the image's origin is on the top left hand corner). I obtain the Rotation and Translation matrix using solvepnp(). Now is it possible (with the obtained parameters) to find the distance and the height of the object with respect to the first frame?
Related
I have a fixed camera so I know the distance between the camera to the ground, as well as the distance between the camera and the bottom line in the image (which is the floor).
I have an object in the image and I need to calculate the distance to it. However, the actual dimensions of the object are not available.
In the first image, the distance to the object is 75cm. In the second image the distance is 33cm.
How can I calculate the distances using the fixed camera? I found a few tutorials which used the focal length and the width of the object, however I cannot use it.
I can detect the object and have a bounding box around it.
Thanks
I have a robot-arm here with a camera attached. The camera is fixed to the arm and takes photos in the arms rotation / position.
I use opencv to detect certain points within the image and need to translate my detected coordinates back to the coordinate system of my robot-arm. (To move over them)
I'm struggling to figure out how to transform my points (all the given information is: Origin of my arms coordinate system, arm position and point position inside the image)
Here is an image (hopefully) explaining what I want to achieve:
In Addition I need to subtract X units from my arms length, since the pickup tool is on the tip of the arm and the camera before it.
This however should be possible by transforming the coordinates to angle+length, subtracting X and transforming them back.
What is best strategy to recreate part of a street in iOS SceneKit using .osm XML data?
Please assume part of a street is offered in the OSM XML data and contains the necessary geopoints with latitude and longitude denoting the Nodes to describe the paths/footprints of 6 buildings (i.e. ground floor plans that line the side of a street).
Specifically, what's the best strategy to convert latitude and longitude Nodes in order to locate these building footprints/polygons on the ground floor in a scene within SceneKit iOS? (i.e. running through position 0,0,0)? Thank you.
Very roughly and briefly, based on my own experience with 3D map rendering:
Transform the XML data from lat/long to appropriate coordinates for a 2D map (that is, project it to a plane using a map projection, then apply a 2D affine transform to get it into screen pixel coordinates). Create a 2D map that's wider and taller than the actual screen, because of what's going to happen in step 2:
Using a 3D coordinate system with your map vertical (i.e., set all the Z coordinates to zero), rotate the map so that it reclines at an appropriate shallow angle, as if you're in an aeroplane looking down on it; the angle might be 30 degrees from horizontal. To rotate the map you'll need to create a 3D rotation matrix. The axis of rotation will be the X axis: that is, the horizontal line that is the bottom border of your 2D map. The rotation is exactly the same as what happens when you rotate your laptop screen away from you.
Supply the new 3D coordinates to your rendering system. I haven't used SceneKit but I had a quick look at the documentation and you can use any coordinate system you like, so you will be able to use one that is convenient for the process I have just described: something that uses units the size of a screen pixel at the viewing plane, with Y going upwards, X going right, and Z going away from the viewer.
One final caveat: if you want to add extrusions giving a rough approximation of the 3D building shapes (such data is available in OSM for some areas) note that my scheme requires the tops of buildings, and indeed anything above ground level, to have negative Z coordinates.
Pretty simple. First, convert Your CLLocationCoordinate2D to a MKMapPoint, which is exactly the same as a CGRect. Second, scale down the MKMapPoint by some arbitrary number so it fits in with how you want it on your scene graph, let's say by 200. Since scenekit's coordinate system is centered at (0,0), you'll need to make sure your location is correct. Then just create your scnvector3's with the x/y of he MKMapPoint, and you will be locked to coordinates.
I'm trying to use the function SolvePNP to estimate the relative position of a camera. Mi question is this, when choosing world coordinates, do I need to be careful in choosing them so that there can be no reflections when transforming them to camera coordinates? Or will OpenCV correct that for me?
Details: I'm filming a tennis court and was originally setting the world coordinate origin to be the centre of the court, with the x-axis pointing parallel to the net towards the left, the y-axis pointing forwards vertically on the court, and the z-axis pointing upwards. If I've understood correctly, SolvePNP will transform these coordinates to a system with origin at some point behind the top left corner of an image, with x-axis pointing downwards on the image, y-axis pointing to the right, and z-axis pointing forwards to the scene. However this transformation would definitely involve a reflection, must I swap the x and y axis of my world coordinates to avoid this or is it fine to leave them as they are? (Also, let me know if I'm making a big mistake and SolvePnp actually puts the origin at a point behind the centre of the image rather than one the top left corner...)
Assuming that you have a camera calibration matrix (and that such calibration was done assuming a right hand coordinate system all along), and correct correspondences between the tennis field features in the image and the CAD-features:
You need to select the reference frame in the tennis court such that is a right hand coordinate system, so that your solution from solvePNP provides the pose and position of the tennis field reference frame with respect to the camera coordinate system (by default a right hand coordinate system).
Hope it helps
All i know is that the height and width of an object in video. can someone guide me to calculate distance of an detected object from camera in video using c or c++? is there any algorithm or formula to do that?
thanks in advance
Martin Ch was correct in saying that you need to calibrate your camera, but as vasile pointed out, it is not a linear change. Calibrating your camera means finding this matrix
camera_matrix = [fx,0 ,cx,
0,fy,cy,
0,0, 1];
This matrix operates on a 3 dimensional coordinate (x,y,z) and converts it into a 2 dimensional homogeneous coordinate. To convert to your regular euclidean (x,y) coordinate just divide the first and second component by the third. So now what are those variables doing?
cx/cy: They exist to let you change coordinate systems if you like. For instance you might want the origin in camera space to be in the top left of the image and the origin in world space to be in the center. In that case
cx = -width/2;
cy = -height/2;
If you are not changing coordinate systems just leave these as 0.
fx/fy: These specify your focal length in units of x pixels and y pixels, these are very often close to the same value so you may be able to just give them the same value f. These parameters essentially define how strong perspective effects are. The mapping from a world coordinate to a screen coordinate (as you can work out for yourself from the above matrix) assuming no cx and cy is
xsc = fx*xworld/zworld;
ysc = fy*yworld/zworld;
As you can see the important quantity that makes things bigger closer up and smaller farther away is the ratio f/z. It is not linear, but by using homogenous coordinates we can still use linear transforms.
In short. With a calibrated camera, and a known object size in world coordinates you can calculate its distance from the camera. If you are missing either one of those it is impossible. Without knowing the object size in world coordinates the best you can do is map its screen position to a ray in world coordinates by determining the ration xworld/zworld (knowing fx).
i don´t think it is easy if have to use camera only,
consider about to use 3rd device/sensor like kinect/stereo camera,
then you will get the depth(z) from the data.
https://en.wikipedia.org/wiki/OpenNI