I am trying to figure out how to roughly project the geographic position of an annotated object in an image.
The Setup
A picture with a known object in it, i.e. we know the object's width and height.
A bounding box highlighting where that object is in frame. X,Y,Width,Height.
The precise longitude and latitude of the camera that took the picture. The Origin.
The heading of the camera.
The focal length of the camera.
The camera sensor size.
The height of the camera off the ground.
Can anyone point me toward a solution for roughly projecting the object's location from the image origin location, given those data points?

The solution is simple if you assume an ellipsoidal surface for the Earth. If you need to use a Digital Terrain Model (DTM), things quickly get more complicated. For example, your object may be visible in the image but occluded on the DTM because of various sources of error. In the following I assume you work with the ellipsoid.
Briefly, what you need to do is backproject the vertices of the image bounding box, obtaining four vectors (rays) in camera coordinates. You then transform them into Earth-Centered Earth-Fixed (ECEF) coordinates and solve for the intersection of the rays with the WGS-72 (or WGS-84) ellipsoid as explained here.
I recommend using the nvector library to help with this kind of calculation. You didn't specify the language you work with, but there are ports of nvector to many common languages, including Python and Matlab.
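To make the first and last steps of that recipe concrete, here is a minimal sketch in plain Python (it does not use nvector). It back-projects a pixel to a ray in camera coordinates and intersects a ray, already expressed in ECEF, with the WGS-84 ellipsoid. The function names, the axis convention and the assumption of an ideal pinhole camera with the principal point at the image centre are mine, not part of the answer above. The middle step, rotating the camera ray into ECEF using the heading and the camera's latitude and longitude, is left out.

```python
import math

# WGS-84 semi-major and semi-minor axes, in metres.
WGS84_A = 6378137.0
WGS84_B = 6356752.314245


def pixel_to_camera_ray(u, v, image_w, image_h, focal_mm, sensor_w_mm, sensor_h_mm):
    """Back-project pixel (u, v) to a ray direction in camera coordinates.

    Assumes an ideal pinhole camera with the principal point at the image
    centre. Axes: x right, y down, z forward along the optical axis.
    """
    x_mm = (u - image_w / 2.0) * sensor_w_mm / image_w
    y_mm = (v - image_h / 2.0) * sensor_h_mm / image_h
    return (x_mm, y_mm, focal_mm)


def intersect_ray_ellipsoid(origin, direction, a=WGS84_A, b=WGS84_B):
    """Nearest point where origin + t * direction (t >= 0) meets the
    ellipsoid x^2/a^2 + y^2/a^2 + z^2/b^2 = 1, in ECEF metres.

    Returns None if the ray misses the ellipsoid.
    """
    # Rescale the axes so that the ellipsoid becomes the unit sphere.
    ox, oy, oz = origin[0] / a, origin[1] / a, origin[2] / b
    dx, dy, dz = direction[0] / a, direction[1] / a, direction[2] / b

    # Quadratic qa * t^2 + qb * t + qc = 0 for the sphere intersection.
    qa = dx * dx + dy * dy + dz * dz
    qb = 2.0 * (ox * dx + oy * dy + oz * dz)
    qc = ox * ox + oy * oy + oz * oz - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if qa == 0.0 or disc < 0.0:
        return None

    root = math.sqrt(disc)
    candidates = [t for t in ((-qb - root) / (2.0 * qa),
                              (-qb + root) / (2.0 * qa)) if t >= 0.0]
    if not candidates:
        return None
    t = min(candidates)
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical camera: 1920x1080 image, 4 mm lens, 6.4 x 3.6 mm sensor.
corner_ray = pixel_to_camera_ray(1200, 800, 1920, 1080, 4.0, 6.4, 3.6)

# A ray starting 1 km above the equator and pointing straight down.
hit = intersect_ray_ellipsoid((WGS84_A + 1000.0, 0.0, 0.0), (-1.0, 0.0, 0.0))
```

Running this for each of the four bounding-box corners gives four ground points, which can then be converted from ECEF back to latitude and longitude.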


How to find the location of an object with respect to another object in an image?

We are doing a project in which we detect runway debris (using YOLOv4) with a fixed camera mounted on a pole at the side of the runway. We want to find the position of the object with respect to the runway surface. How can we find the distance of the object from the runway sides?
I would recommend using reliable sensors such as "light curtains" to make sure there is no debris on a runway. AI can fail, especially if things are hard to see.
As for mapping image coordinates to world coordinates on a plane, look into "homography". OpenCV has getPerspectiveTransform. It's easiest if you pick a big rectangle of known dimensions in your scene (the runway), mark its corners, find them in your picture, and note those points too. You then give those four point pairs to getPerspectiveTransform. Now you have a homography matrix that will transform coordinates back and forth (you can invert it).
Check out the tutorials section in OpenCV's documentation for more on homographies.
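As an illustration of what the homography does, here is a small sketch in plain Python that solves for the 3x3 matrix from four point pairs and uses it to map a detection onto the runway plane. In practice you would call OpenCV's getPerspectiveTransform, which returns the same kind of matrix from four pairs. The helper names, pixel coordinates and runway dimensions below are all made up for the example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4_points(src, dst):
    """3x3 homography H (with H[2][2] = 1) mapping each src point to dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, pt):
    """Map a 2D point through H, including the perspective divide."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical numbers: a 45 m x 300 m rectangle of runway and the pixel
# positions of its four corners, listed in the same order.
RUNWAY_WIDTH_M = 45.0
image_corners = [(420, 710), (1510, 690), (1180, 300), (700, 305)]
runway_corners = [(0, 0), (RUNWAY_WIDTH_M, 0), (RUNWAY_WIDTH_M, 300), (0, 300)]

H = homography_from_4_points(image_corners, runway_corners)

# Bottom-centre of a detected bounding box, in pixels (made up).
x_m, y_m = apply_homography(H, (960, 500))
dist_from_left_edge = x_m
dist_from_right_edge = RUNWAY_WIDTH_M - x_m
```

Use the point where the object touches the ground (for example the bottom-centre of the bounding box), since the homography is only valid for points on the runway plane.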

Translating Lat/Lon coordinates to X,Y plane, with control points

I have an arbitrary map image, which may or may not be accurately projected to some standard geographic mapping. Probably not, though, since it's an artist's rendition. Consider this map a 2D image of pixels at 0,0 onward.
I'd like to map lat/lon points in world space to this map. Since the map is not necessarily a known or accurate projection, I've got to come up with some other solution. I figure that establishing control points on the 2D image that correlate to known lat/lon values is step #1. At a minimum, 3, but maybe more, in case it's required to sort out distortion in the map image.
What algorithm or equation would I be looking for to take these control points, and identify the X,Y position on the image from any given lat/lon input?
I expect it to be inaccurate, depending on the number of control points. And I expect, for some weirder images, to have to go and add many control points in certain areas to make it line up right.
When the area depicted is small (e.g. it fits in a square a few km on a side), one thing to try is described below. I'm sorry if you find the description too terse; I wanted to keep it reasonably short.
The idea is to assume the image is in some unknown conformal projection, and to try to approximate it. Of course this may fail if the image cannot, in fact, be reasonably approximated this way.
Given your control points P[], project them into map coordinates Q[] using some conformal projection, and get hold of their image coordinates R[]. To within a metre or so -- given the assumption above -- the R[] can be obtained from the Q[] by a transformation T that is a translation, an (isotropic) scaling and a rotation. You can then find T, say by least squares, using the Q[] and R[]. You have a two stage map from the control point geographic coordinates P[] to their image coordinates R[]: first project using the chosen projection, then apply T. You could use the inverse of this map to go from arbitrary image coordinates to geographical coordinates.
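A minimal sketch of this two-stage map, under a few assumptions of my own: spherical Mercator is used as the conformal projection, points are held as complex numbers so that T becomes z -> a*z + b (|a| is the scale, arg(a) the rotation, b the translation), and pixel coordinates are stored as x - i*y because image y usually points down, which T alone cannot flip. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean spherical radius


def mercator(lat_deg, lon_deg):
    """Spherical Mercator (a conformal projection) as x + iy, in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + lat / 2.0))
    return complex(EARTH_RADIUS_M * lon, y)


def fit_similarity(Q, R):
    """Least-squares a, b such that R[k] is approximately a * Q[k] + b."""
    n = len(Q)
    q_mean = sum(Q) / n
    r_mean = sum(R) / n
    num = sum((q - q_mean).conjugate() * (r - r_mean) for q, r in zip(Q, R))
    den = sum(abs(q - q_mean) ** 2 for q in Q)
    a = num / den
    return a, r_mean - a * q_mean


def fit_map(control_latlon, control_pixels):
    """Fit T from control points: geographic P[] and image pixels R[]."""
    Q = [mercator(lat, lon) for lat, lon in control_latlon]
    R = [complex(x, -y) for x, y in control_pixels]  # flip image y
    return fit_similarity(Q, R)


def latlon_to_pixel(lat_deg, lon_deg, a, b):
    """Forward map: project, then apply T."""
    z = a * mercator(lat_deg, lon_deg) + b
    return (z.real, -z.imag)


def pixel_to_latlon(x, y, a, b):
    """Inverse map: undo T, then invert the Mercator projection."""
    q = (complex(x, -y) - b) / a
    lon = q.real / EARTH_RADIUS_M
    lat = 2.0 * math.atan(math.exp(q.imag / EARTH_RADIUS_M)) - math.pi / 2.0
    return (math.degrees(lat), math.degrees(lon))
```

With at least two control points the fit is determined. With more, the residuals between the fitted and the actual pixel positions give a direct indication of how well the conformal assumption holds for a given image.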
If the image is larger than a few km, you may not get enough accuracy this way. All is not lost. Though a translation, scale and rotation may not suffice, any two conformal projections are related by a (complex) analytic map. So you could try to fit (an approximation to) this map using the control points as above. A suitable approximation might be a complex polynomial, or a complex rational function.
If I were doing this, I think I would first test it on artificial data. For example you could generate images of various sizes, using some projection (differing from the one used above), and see how well you fit the known points in these images.

Measuring distance between objects from a photo, Perspective transform

I have two questions which could be related:
1.) I would like to estimate distances between objects which are positioned in one plane from a photo. The geometrical shape of one object in the photo is rectangular and its dimensions are known, but there is no information on the photo (camera focal length, photo angle, sensor size etc.). For example, say I have the following PCB photo and the dimensions of the rectangular chip are known to be 20x10mm, and all objects lie in a plane. Is it even possible to estimate the distances (in top view) between other PCB components?
In this particular case, maximum distance error of 2-3mm would be acceptable.
2.) Say I have a similar PCB photo to the one above, where I have one feature (object) which I know to be rectangular. I would like to transform the image perspective so that the object looks rectangular. I have tried imageJ (Fiji) and the Interactive Perspective Plugin for this task. First I display a rectangular grid over the image and then manually transform the image using the plugin until the object appears rectangular. But for some photo angles I find it impossible to manually adjust the control points in order to get a rectangular object shape.
Does somebody know an alternative approach using imageJ (Fiji) or Octave? A solution in python would also be ok, although I don’t have much python experience (just recently installed Anaconda with Spyder).
A few years ago, I wrote a piece of software that seems well suited to this. It corrects perspective by transforming a quadrilateral into a rectangle.
Here is the result, on which you can measure distances.

How to determine distance of objects from camera using Epipolar Plane Image?

I am working on converting 2d images to a 3d environment. The images were collected from a video made in a lateral motion. Then the images were placed one behind the other, so it would be easy to find the correspondences between the images. This is called a spatiotemporal volume.
Next I take a slice from the spatiotemporal volume. That slice is called the Epipolar Plane Image.
Using the Epipolar Plane Image, I want to calculate the depth of the objects in the scene and make a 3D environment. I have listed the reference but I have not been able to figure out the math described in the paper. Can someone help me figure this out? Any help is appreciated.
Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion
The math in this situation is easy and straightforward.
First, let's define the two coordinate systems for two overlapping images taken by the same camera with the focal length, using the following schema:
Let us say that first camera position is defined as follows:
While its orientation by using three Euler angles is:
By using this definition the corresponding rotation matrix is the identity matrix.
The second camera position can be defined as follows:
And since the orientation is the same as the first camera, all Euler angles remain zero:
Which also means that the corresponding rotation matrix is the identity matrix.
If the images overlap and the orientation is the same, the situation in the image space looks like this:
Here the image coordinates and their measurement accuracy are defined as follows:
This geometrical situation can be described by using the Intercept Theorem:
As you see, it's not complicated. But be aware that this solution is certainly not the best, since its base assumption that all orientation angles are the same can't be fulfilled in reality.
If you need to be accurate then you have to perform a bundle adjustment. However, these equations are often used to determine an approximate solution for this geometric situation, where the values are used to linearize the collinearity equations.
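The equations of this answer did not survive, so the sketch below is only the standard result for the case it describes (two views with identical orientation, separated by a purely lateral baseline), not necessarily the exact form or notation the answer used. By similar triangles, a point at depth Z seen at image x-coordinates x1 and x2 satisfies Z = f * B / (x1 - x2), with f the focal length in pixels and B the baseline. In an Epipolar Plane Image the same point traces a straight line whose slope, in pixels per metre of camera travel, has magnitude f / Z. The symbol and function names are my own.

```python
def depth_from_parallax(f_px, baseline_m, x1_px, x2_px):
    """Depth of a point seen at x1 in the first view and x2 in the second.

    Assumes both views share the same orientation and the camera moved by
    baseline_m along its own +x axis between them (the "normal case").
    """
    parallax = x1_px - x2_px
    if parallax == 0:
        raise ValueError("zero parallax: the point is at infinity")
    return f_px * baseline_m / parallax


def depth_from_epi_slope(f_px, slope_px_per_m):
    """Depth from the slope of a feature's line in an Epipolar Plane Image.

    slope_px_per_m is the change in image x per metre of lateral camera
    travel; only its magnitude matters here.
    """
    if slope_px_per_m == 0:
        raise ValueError("zero slope: the point is at infinity")
    return f_px / abs(slope_px_per_m)


# Made-up numbers: 1000 px focal length, 0.5 m between the two frames.
z_two_views = depth_from_parallax(1000.0, 0.5, 120.0, 100.0)
z_from_epi = depth_from_epi_slope(1000.0, -40.0)
```

Fitting a line to a feature's track across many frames of the EPI and using its slope averages out per-frame measurement noise, which is the practical advantage over using just two frames.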

Correspondence between a set of 3D model points and their image projections

I have a set of 3-d points and some images with the projections of these points. I also have the focal length of the camera and the principal point of the images with the projections (resulting from previously done camera calibration).
Is there any way to, given these parameters, find the automatic correspondence between the 3-d points and the image projections? I've looked through some OpenCV documentation but I didn't find anything suitable until now. I'm looking for a method that does the automatic labelling of the projections and thus the correspondence between them and the 3-d points.
The question is not very clear, but I think you mean to say that you have the intrinsic calibration of the camera, but not its location and attitude with respect to the scene (the "extrinsic" part of the calibration).
This problem does not have a unique solution for a general 3d point cloud if all you have is one image: just notice that the image does not change if you move the 3d points anywhere along the rays projecting them into the camera.
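To see that ambiguity in numbers, here is a tiny sketch with an ideal pinhole camera at the origin looking along +z. The intrinsics and the point are made up. Scaling a 3D point by any positive factor slides it along its projection ray and leaves its pixel unchanged.

```python
def project(point, f_px, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    X, Y, Z = point
    return (f_px * X / Z + cx, f_px * Y / Z + cy)


# Hypothetical intrinsics: focal length and principal point, in pixels.
F_PX, CX, CY = 800.0, 320.0, 240.0

near_point = (1.0, 2.0, 10.0)
far_point = tuple(2.0 * c for c in near_point)  # same ray, twice as far

pixel_near = project(near_point, F_PX, CX, CY)
pixel_far = project(far_point, F_PX, CX, CY)
# pixel_near and pixel_far coincide, so one image cannot tell them apart.
```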
If you have one or more images, you know everything about the 3D cloud of points (e.g. the points belong to an object of known shape and size, and are at known locations upon it), and you have matched them to their images, then it is a standard "camera resectioning" problem: you just solve for the camera extrinsic parameters that make the 3D points project onto their images.
If you have multiple images and you know that the scene is static while the camera is moving, and you can match "enough" 3d points to their images in each camera position, you can solve for the camera poses up to scale. You may want to start from David Nister's and/or Henrik Stewenius's papers on solvers for calibrated cameras, and then look into "bundle adjustment".
If you really want to learn about this (vast) subject, Zisserman and Hartley's book is as good as any. For code, look into libmv, vxl, and the ceres bundle adjuster.
