How can I drape some 2-D GIS polygons onto a "3-D" street-level image?
As a picture is worth a thousand words, please check this:
So I have the azimuth angle, the view angle, and the position where the image was taken, along with the field limit in 2D, and I would like to have the field plotted on my street-level image (as in the top image).
I start with a grid of evenly spaced XY coordinates, with the distance between them determined by the field of view of the imaging system.
The red dots are the XY coordinates, and the scan area looks like this for a 5x5 grid.
The system takes images at these XY coordinates; from each image, I detect objects and compute their centroids (blue dots).
Now, the problem is I want to recompute the fields of view, which are represented by the red dots, such that I image the blue dots most effectively.
To that end, I tried MeanShift clustering like this:
import numpy as np
from sklearn.cluster import MeanShift


class ObjectClusters:
    def __init__(self, album):
        self.objects_album = album

    def clusters_from_album(self, bandwidth):
        # take object coordinates
        coords = self.objects_album.get_coords()
        # identify clusters
        ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
        ms.fit(coords)
        cluster_centers = np.array(ms.cluster_centers_)
        # return the cluster centers as plain Python lists
        return [center.tolist() for center in cluster_centers]
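For reference, here is a minimal standalone sketch of the same clustering call; the sample coordinates and the field-of-view width below are hypothetical:

import numpy as np
from sklearn.cluster import MeanShift

# hypothetical detected object centroids (x, y)
coords = np.array([[0.10, 0.20], [0.15, 0.22], [3.00, 3.10], [3.05, 2.95]])

fov_width = 1.0  # hypothetical width of one field of view
ms = MeanShift(bandwidth=fov_width / 2, bin_seeding=True)
ms.fit(coords)
print(ms.cluster_centers_)  # candidate new field-of-view centers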
Now, here's what the results look like on real data. The blue dots are the centroids of cells, and the orange dots are the centroids of the fields of view. For MeanShift, I pass half the width of the field of view as the bandwidth parameter.
Although it looks from the plot like this solution more or less works, I was wondering whether there is a better approach. Ultimately, I want to be able to get something like this,
which is to compute the optimal number of fields of view (and their centroids) needed to image my sample.
I am doing object detection to count penguins in a georeferenced UAV dataset, so for practical purposes let's say they appear as dots in the images. After running the object detection model, it returns the inferred images with the corresponding bounding boxes for each detected penguin.
I need to extract the coordinates of the center of each bounding box (something like x, y) so that, since the image is georeferenced, I can convert the bounding-box center coordinates in image space into GPS coordinates.
This picture is a good example. Here, the authors are counting banana plants; after detecting the plants in three differently treated pictures of the same area, they find that up to three boxes appear around some of the plants (left). So, in order to count each plant as one despite some of them having up to three bounding boxes, this is what they do (quoted from the original article):
1. Collect bounding boxes of detection from each ROI tiles.
2. Calculate centroid of each bounding box.
3. Add the tile number information on x and y-value of centroids to overlay them on original ROI image.
And this is exactly what I am looking for, step number 3: how to calculate the centroid of each bounding box and obtain its x, y coordinates, so that I can then transform those coordinates into real-world ones (since the image is georeferenced) and display each real-world coordinate on a mosaic.
Thank you very much in advance.
You could use the Intersection over Union algorithm to select one of the boxes and then use the coordinates of the selected box to plot the output circle or box over detected objects.
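A minimal sketch of that pipeline in plain Python: an IoU test to keep one box per object, the box centroid, and a GDAL-style geotransform to map pixel coordinates to world coordinates. The box coordinates and the geotransform below are hypothetical; for a real georeferenced image you would read the transform from the file (e.g. with rasterio or GDAL).

def iou(a, b):
    # boxes as (xmin, ymin, xmax, ymax) in pixel coordinates
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def dedupe(boxes, thresh=0.5):
    # keep only one box per group of heavily overlapping detections
    kept = []
    for b in boxes:
        if all(iou(b, k) < thresh for k in kept):
            kept.append(b)
    return kept

def centroid(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def pixel_to_world(col, row, gt):
    # gt is a GDAL-style geotransform: (originX, pixelW, rowRot, originY, colRot, -pixelH)
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

boxes = [(100, 50, 140, 90), (105, 55, 142, 88), (300, 200, 330, 240)]  # hypothetical detections
gt = (450000.0, 0.05, 0.0, 7100000.0, 0.0, -0.05)                       # hypothetical geotransform
for b in dedupe(boxes):
    print(pixel_to_world(*centroid(b), gt))

The deduplication here is a simple greedy pass; for detections coming from overlapping tiles you may prefer a proper non-maximum suppression.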
I am trying to measure the precision of my marker tracking algorithm by post-processing a video.
My algorithm: find a printed planar marker in a video stream and place a virtual marker at that position. I am working with AR.
Here are two frames of such a video:
Virtual Marker on top of detected marker
Virtual Marker with offset to actual marker
I want to calculate the Intersection over Union / Jaccard index of the actual marker and the virtual marker. For the first picture it would give me ~98%, and for the second ~1/5th %. This gives me a quality measure for my algorithm: how precisely and how well it works.
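Once the corner points of both markers are known in image space, the Jaccard index itself is straightforward; here is a minimal sketch using shapely, with hypothetical corner coordinates:

from shapely.geometry import Polygon

# hypothetical image-space corners of the actual and the virtual marker
actual = Polygon([(100, 100), (300, 110), (290, 310), (95, 300)])
virtual = Polygon([(110, 105), (305, 115), (295, 315), (100, 305)])

intersection = actual.intersection(virtual).area
union = actual.union(virtual).area
print("Jaccard index:", intersection / union)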
I want to get the position and rotation of both markers in each frame with OpenCV and calculate the Jaccard Index. As you can see though, if I directly place a virtual marker on top of the paper marker, I will make it difficult for myself (with OpenCV) to detect them.
My idea is not to place a white marker on top of the actual marker, but to place an easily detectable "thing" with a specific color or shape at an offset from the marker, let's say 10 cm to the right. Then I subtract the offset, so in the best case the position and rotation of the actual marker and of the "thing" (with the offset subtracted) will be the same.
But what should I use as the easily detectable "thing"? I don't have enough experience with OpenCV to know what (colored?) shape I should use. The augmentation can go in front of, behind, to the left or right of... the actual marker at any time during the video, and it should do two things:
Not hinder the detection of the actual marker, like currently shown in the pictures
Be easily detectable itself
Help would be much appreciated!
Assuming you have enough white background around the visual marker:
You could use colored circles, for example in red, green, blue and black.
Use OpenCV blob detection [1] to detect all blobs and filter for circular ones:
Look up the average color values of the detected blobs and filter for the colors of the circles.
Alternatively you could filter the whole image for each color and do blob detection on the filtered images. But this is slower.
Find the centroid (~ center point) of each blob using moments of the blob contours; see [2], "Center of multiple blobs in an Image".
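A rough sketch of this color-filter, circularity-filter, and centroid step (OpenCV in Python; the frame path, the HSV range, and the thresholds are all assumptions you would tune for your setup):

import cv2
import numpy as np

frame = cv2.imread("frame.png")  # hypothetical video frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# hypothetical HSV range for the red circle; repeat for green, blue and black
mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))

# OpenCV 4.x return signature
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

centers = []
for c in contours:
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, True)
    if area < 20 or perimeter == 0:
        continue
    circularity = 4 * np.pi * area / (perimeter * perimeter)
    if circularity < 0.7:  # keep only roughly circular blobs
        continue
    m = cv2.moments(c)
    centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))  # centroid via moments

print(centers)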
Now you have the four pixel positions of your circles. If you know the world coordinates of your light projected circles you can use solvePnP to get a pose from this.
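A minimal solvePnP sketch for that step (the 3D circle positions, the detected pixel centers, and the camera intrinsics below are hypothetical placeholders):

import cv2
import numpy as np

# hypothetical 3D positions of the four circle centers (meters)
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0],
                          [0.0, 0.1, 0.0]], dtype=np.float32)

# detected pixel centers of the circles, in the same order
image_points = np.array([[320, 240],
                         [400, 242],
                         [398, 318],
                         [318, 316]], dtype=np.float32)

camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0, 0, 1]], dtype=np.float32)  # assumed intrinsics
dist_coeffs = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
print(rvec.ravel(), tvec.ravel())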
Knowing the correct world coordinates is tricky in your case because you project the circle with light on a surface. This involves some 3D geometry. You need to know the transformation from camera coordinate system to pattern projector coordinate system and the projection parameters of your projector.
I guess you send the projected pattern as an image to the projector. I think you can then model the projector as a camera with a certain camera matrix (basically field of view and center point). Naturally, you know the pixel coordinates of the projected circles, so from them you can compute rays in 3D space (in the projector coordinate system); as a starting point, see [3]. Intersecting [4] these rays with the correct surface plane (in the projector coordinate system) gives you the 3D coordinates of the projected circle pattern in the projector coordinate system. Transform these to the camera coordinate system using your known transformation. Now use OpenCV solvePnP to determine the pose of the projected light marker.
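A small sketch of this back-projection and ray/plane intersection, in the projector coordinate system (the projector matrix and the plane parameters below are pure assumptions):

import numpy as np

def backproject_to_plane(pixel, K, plane_n, plane_d):
    # ray through the projector origin for this pixel: X = t * direction
    direction = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    # plane: plane_n . X = plane_d, so t = plane_d / (plane_n . direction)
    t = plane_d / (plane_n @ direction)
    return t * direction  # 3D point on the surface, projector coordinates

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 400.0],
              [0.0, 0.0, 1.0]])        # assumed projector "camera" matrix
plane_n = np.array([0.0, 0.0, 1.0])    # assumed surface plane normal
plane_d = 1.5                          # assumed plane offset along the normal (meters)

print(backproject_to_plane((700, 350), K, plane_n, plane_d))

These surface points, transformed into the camera frame, are what you would feed to solvePnP as object points.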
How to get surface plane?
If your setup is static, you could run visual marker detection on all recorded images and use the mean or median of the marker pose as the surface plane. Not sure what this implies for your evaluation, though.
[1] https://www.learnopencv.com/blob-detection-using-opencv-python-c/
[2] https://www.learnopencv.com/find-center-of-blob-centroid-using-opencv-cpp-python/
[3] https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
[4] https://www.cs.princeton.edu/courses/archive/fall00/cs426/lectures/raycast/sld017.htm
What is the best strategy to recreate part of a street in iOS SceneKit using .osm XML data?
Please assume that part of a street is provided in the OSM XML data and contains the necessary geopoints, with latitude and longitude denoting the nodes that describe the paths/footprints of 6 buildings (i.e. ground-floor plans that line the side of a street).
Specifically, what's the best strategy for converting the latitude and longitude nodes in order to place these building footprints/polygons on the ground plane of a scene within SceneKit on iOS (i.e. running through position 0, 0, 0)? Thank you.
Very roughly and briefly, based on my own experience with 3D map rendering:
Transform the XML data from lat/long to appropriate coordinates for a 2D map (that is, project it to a plane using a map projection, then apply a 2D affine transform to get it into screen pixel coordinates). Create a 2D map that's wider and taller than the actual screen, because of what's going to happen in step 2:
Using a 3D coordinate system with your map vertical (i.e., set all the Z coordinates to zero), rotate the map so that it reclines at an appropriate shallow angle, as if you're in an aeroplane looking down on it; the angle might be 30 degrees from horizontal. To rotate the map you'll need to create a 3D rotation matrix. The axis of rotation will be the X axis: that is, the horizontal line that is the bottom border of your 2D map. The rotation is exactly the same as what happens when you rotate your laptop screen away from you.
Supply the new 3D coordinates to your rendering system. I haven't used SceneKit but I had a quick look at the documentation and you can use any coordinate system you like, so you will be able to use one that is convenient for the process I have just described: something that uses units the size of a screen pixel at the viewing plane, with Y going upwards, X going right, and Z going away from the viewer.
One final caveat: if you want to add extrusions giving a rough approximation of the 3D building shapes (such data is available in OSM for some areas) note that my scheme requires the tops of buildings, and indeed anything above ground level, to have negative Z coordinates.
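To make steps 1 and 2 concrete, here is a small numeric sketch of the map projection and the recline rotation (shown in Python only to illustrate the math; the node coordinates and the 30-degree recline are example values, and the 2D affine to screen pixels from step 1 is omitted):

import math
import numpy as np

def web_mercator(lat, lon):
    # project latitude/longitude (degrees) to planar meters
    R = 6378137.0
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# hypothetical building-footprint node
x, y = web_mercator(48.8584, 2.2945)

# step 2: the map starts out vertical (z = 0); reclining it to 30 degrees
# above horizontal means rotating by 60 degrees about the X axis
theta = math.radians(60)
rot_x = np.array([[1, 0, 0],
                  [0, math.cos(theta), -math.sin(theta)],
                  [0, math.sin(theta),  math.cos(theta)]])
print(rot_x @ np.array([x, y, 0.0]))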
Pretty simple. First, convert your CLLocationCoordinate2D to an MKMapPoint, which is essentially an x/y pair (much like a CGPoint). Second, scale the MKMapPoint down by some arbitrary factor so it fits the way you want it in your scene graph, say 200. Since SceneKit's coordinate system is centered at (0, 0, 0), you'll need to make sure your location is correct. Then just create your SCNVector3s from the x/y of the MKMapPoint, and you will be locked to coordinates.
I have a picture of a checkerboard taken from an arbitrary camera angle. I find the two vanishing points corresponding to the two sets of lines that form the checkerboard grid. From these two vanishing points, I compute a homography from the checkerboard plane to the image plane.
I then apply the inverse homography to re-render the checkerboard from a top view. However, for certain images, the re-rendered top view is very large. That is, due to the camera angle, the inverse homography stretches certain parts of the image (i.e. the regions of the image that are very close to one of the vanishing points) to be very large.
This takes up an unnecessarily large amount of memory, and most of the region that becomes highly stretched is stuff I do not need. So, when applying the inverse homography, I would like to avoid rendering regions of the image that will be highly stretched. What is a good way to do this?
(I am coding in MATLAB)
If you just need to render the checkerboard, without the background, you could just extract the four corners of the checkerboard and compute the homography that maps them to the four corners of a square.
Then you can obtain a rectified image of the checkerboard by warping your input image with this homography, taking care to render only the needed region (i.e. the square onto which you map the checkerboard).
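A short sketch of that idea, illustrated with OpenCV in Python (you mention MATLAB, so treat this just as pseudocode for the same steps; the corner coordinates and output size are hypothetical):

import cv2
import numpy as np

img = cv2.imread("checkerboard.jpg")  # hypothetical input image

# hypothetical outer corners of the checkerboard in the input image (pixels),
# ordered top-left, top-right, bottom-right, bottom-left
src = np.float32([[412, 230], [1530, 310], [1460, 980], [380, 900]])

side = 800  # chosen side length of the rectified square, in pixels
dst = np.float32([[0, 0], [side, 0], [side, side], [0, side]])

H = cv2.getPerspectiveTransform(src, dst)
# only the side x side square is rendered; nothing outside it is computed
rectified = cv2.warpPerspective(img, H, (side, side))
cv2.imwrite("rectified.png", rectified)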