Geopandas spatial join - all points within an x meter radius

I have a collection of Points (lat, long for a collection of buildings) and I want to group them based on whether they are within x meters of each other. I know that I could do this pairwise by first using the Geopandas buffer() function (with an x/2 meter radius) and then using sjoin(). However, I don't want to do this only pairwise. I want to group all buildings whose buffer region (a circle of radius x/2 meters centered on the lat, long) overlaps with ANY OTHER buffer region.
For example, if I have three buildings (denoted A, B and C), with each building 25 meters from its neighbor and I use a 25 meter buffer, then A and B can be grouped with sjoin() and B and C can be grouped, but I would want all THREE to be grouped.
That's in contrast to the case where A and B are 25 meters apart and C is 50 meters from B. In that case, I would want to be able to group A and B together and C in its own group.
In reality, I have potentially 100 or more buildings, so it isn't possible to run all permutations pairwise. I would need a function that groups multiple buildings whenever a building's buffer circle intersects with any other buffer circle.
Is there a simple way to do this with Geopandas?

Thank you for your responses. I wound up doing the following:
1) Buffered each building's lat/long Point by a set distance, in meters (see the note on buffering below).
2) Determined the unary_union (Geopandas) of all of the buffers to get a MultiPolygon (a collection of polygons).
3) Wrote a custom function to identify which polygon in that collection intersects a specific building's Point (using .intersection), and called that polygon's index the "location group".
4) Used pandas' apply with the above function to get the "location group" that each building belongs to.
5) Used pandas' groupby to sum the values by "location group".
For the above, I had to be a bit careful about edge cases where the intersection was not a multipolygon but rather a polygon (or even a Point, in cases where the buffer was zero).
A bigger issue was that I had enormous trouble getting the Geopandas buffer function to work properly. I know that this has to do with the crs/projections, and I tried all sorts of variations, but I could never get the buffer function to work with the buffer distance in meters. I found a nice custom function here (Is there an easy way to create square buffers around point and if they intersect, merge them?) that did work. I then realized that my Postgres setup has the PostGIS functions, so I used ST_Buffer to get the buffer polygons in the same query that fetches the building lat/long data.
Hopefully the above will be helpful to others.
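For readers who only need the grouping itself, here is a minimal sketch of the same idea without any GIS library. It is not the buffer/unary_union approach described above; it swaps in a union-find (connected components) over point pairs, and it assumes the coordinates have already been projected to a CRS whose unit is meters. The function name group_by_proximity and its parameters are made up for illustration.

```python
from math import hypot


def group_by_proximity(points, max_dist):
    """Label points so that any chain of neighbours within max_dist shares a label.

    points: list of (x, y) tuples, assumed to be in a projected CRS in metres.
    Returns one group id per point, numbered in order of first appearance.
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two circular buffers of radius max_dist / 2 touch or overlap exactly
    # when their centres are at most max_dist apart.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if hypot(dx, dy) <= max_dist:
                parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# A, B, C each 25 m from the next: one group.
chain = group_by_proximity([(0, 0), (25, 0), (50, 0)], 25)
# A and B 25 m apart, C 50 m from B: C ends up on its own.
split = group_by_proximity([(0, 0), (25, 0), (75, 0)], 25)
```

The double loop is pairwise, but for 100 buildings that is only 4,950 distance checks; for much larger inputs a spatial index would be the usual next step.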

Related

How can I calculate center of mass after floodfill?

I have an image like this one, with 3 distinct regions. Using a breadth-first, 4-neighbor queue, I have implemented a basic flood fill that distinguishes between the 3.
Now I need to find the center of mass of each of these regions, with each pixel weighing one unit.
What's the best way of going about that?
The simplest way is to keep three arrays, sumx, sumy and count, each with one entry per label (3 in your case), and all initialized to 0. Then run through the image once, and for each labeled pixel add the x coordinate to the corresponding bin in sumx, the y coordinate to the corresponding bin in sumy, and 1 to the corresponding bin in count.
At the end, for each label l you can compute sumx[l]/count[l] and sumy[l]/count[l]. These are the unweighted centers of gravity (centroids).
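A minimal sketch of that single pass, assuming the flood fill has produced a 2D grid of integer labels and that label 0 marks unlabeled background (both are assumptions, as is the function name). Dictionaries stand in for the three arrays so the labels do not need to be consecutive.

```python
def label_centroids(labels, background=0):
    """Return {label: (mean_x, mean_y)} for a 2D grid of integer labels."""
    sumx, sumy, count = {}, {}, {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab == background:
                continue  # skip pixels that belong to no region
            sumx[lab] = sumx.get(lab, 0) + x
            sumy[lab] = sumy.get(lab, 0) + y
            count[lab] = count.get(lab, 0) + 1
    return {lab: (sumx[lab] / count[lab], sumy[lab] / count[lab]) for lab in count}


# Tiny hypothetical label image with three regions.
image = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
    [3, 3, 3, 0],
]
centroids = label_centroids(image)
```

The same sums could also be accumulated inside the flood fill itself, at the moment each pixel is labeled, which avoids the second pass over the image.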

Pathfinding On a huge Map

I am in need of some type of pathfinding, so I searched the Internet and found some algorithms.
It seems they all also need some kind of map.
This map can be represented by:
Grid
Nodes
As my map is currently quite huge (20.000 x 20.000 px), a grid map of 1 x 1 px tiles would lead to 400.000.000 unique points on the grid, which I think would also give the best quality. But that's far too many points for me, so I could either
increase the tile size (e.g. 50 x 50 px = 160.000 unique points)
switch to Nodes
As 160.000 unique points are also too many for me, or rather not the quality I would like to have, since some units are bigger than 50 px, I think Nodes are the better way to go.
I found this on the Internet, "2D Nodal Pathfinding without a Grid", and did some calculations:
local radius = 75 -- this varies between units, so I stick to the biggest value
local DistanceBetweenNodes = radius * 2 -- to pass tiles diagonally
local grids = 166 -- how many cols/rows
local MapSize = grids * DistanceBetweenNodes -- 24900, i.e. around 25.000
local walkable = 0 -- used later
local Map = {}
-- true when a is an even multiple of radius
local function even(a)
    return ((a / radius) % 2 == 0)
end
-- keep only positions where x and y have the same parity (checkerboard pattern)
for x = 0, MapSize, radius do
    Map[x] = {}
    for y = 0, MapSize, radius do
        if (even(x) and even(y)) or (not even(x) and not even(y)) then
            Map[x][y] = walkable
        end
    end
end
Without removing the unpassable Nodes, and with a unit size of 75, I would end up with ~55445 unique Nodes. The number of Nodes will shrink drastically if I remove the unpassable ones, but as my units have different sizes I would need to set the radius to that of the smallest unit I have. I don't know whether this will still work with bigger units later.
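As a quick check of the figures quoted so far, the counts can be reproduced with a few lines of arithmetic. This is only a sketch in Python rather than Lua, and the function names are made up; it mirrors the loop bounds of the Lua snippet above.

```python
def grid_tile_count(map_px, tile_px):
    """Number of square tiles of side tile_px covering a map_px x map_px map."""
    per_axis = map_px // tile_px
    return per_axis * per_axis


def checkerboard_node_count(grids, radius):
    """Nodes kept by the checkerboard loop: positions 0, radius, ..., MapSize
    on each axis, keeping only pairs whose indices have the same parity."""
    map_size = grids * radius * 2
    steps = map_size // radius + 1
    return (steps * steps + 1) // 2


full_grid = grid_tile_count(20000, 1)        # 1 x 1 px tiles
coarse_grid = grid_tile_count(20000, 50)     # 50 x 50 px tiles
nodes = checkerboard_node_count(166, 75)     # values from the Lua snippet
```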
So I searched the Internet again and found Nav Meshes.
In my eyes, this would reduce the Nodes to only "a few" and would work with any unit size.
UPDATE 28.09
I have now created a nodal map of all passable areas: ~30.000 nodes.
Here is a totally random example of a map and the points I have:
Example Map
This calls for some optimization to reduce the number of nodes you have.
Almost any pathfinding algorithm can take a node list that is not a grid. You will need to account for the distance between nodes, though.
You could also increase your grid cell size so that the grid does not have as many squares. You will need to compensate for small, narrow paths in some way, though.
At the end of the day, I would suggest you reduce your node count by simply placing nodes along arranged paths, where you know it is possible to get from point A to B, and specifying each node's neighbors. You will need to manually make a node path for every level, though. Take my test as an example (there are no walls, just the node path):
For your provided map, you would end up with a node path similar to this:
This has around 50 nodes, compared to the hundreds a grid can have.
This can work at any scale, since your node count is dramatically cut compared to the grid approach. You will need to make some adjustments, like calculating the distance between nodes, now that they are not in a grid. For this test I am using Dijkstra's algorithm in Corona SDK (Lua), but you can try any other, such as A-star (A*), which is used in many games and can be faster.
I found a Unity example that takes a similar approach using nodes, and you can see that the approach works in 3D as well:
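To make the "nodes plus neighbors" idea concrete, here is a small sketch of A* over a hand-placed waypoint graph. It is written in Python rather than Lua and is not the code from the test above; the node names, coordinates and the shortest_path function are all hypothetical. Edge cost and heuristic are both the straight-line distance between node positions.

```python
import heapq
from math import hypot


def shortest_path(nodes, neighbours, start, goal):
    """A* over a waypoint graph.

    nodes: {name: (x, y)}, neighbours: {name: [name, ...]}.
    Returns (total_distance, [names along the path]) or (inf, []) if unreachable.
    """
    def dist(a, b):
        (ax, ay), (bx, by) = nodes[a], nodes[b]
        return hypot(ax - bx, ay - by)

    best = {start: 0.0}          # cheapest known cost from start to each node
    came_from = {}               # predecessor on the cheapest known path
    frontier = [(dist(start, goal), start)]
    done = set()
    while frontier:
        _, current = heapq.heappop(frontier)
        if current in done:
            continue
        done.add(current)
        if current == goal:
            path = [current]
            while path[-1] in came_from:
                path.append(came_from[path[-1]])
            return best[goal], path[::-1]
        for nxt in neighbours.get(current, []):
            cost = best[current] + dist(current, nxt)
            if cost < best.get(nxt, float("inf")):
                best[nxt] = cost
                came_from[nxt] = current
                heapq.heappush(frontier, (cost + dist(nxt, goal), nxt))
    return float("inf"), []


# Four connected waypoints plus one isolated node.
nodes = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 10), "E": (50, 50)}
neighbours = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
length, path = shortest_path(nodes, neighbours, "A", "C")
```

Replacing the heuristic term with 0 turns this into plain Dijkstra, which is what the test above uses.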

Calculating the neighborhood distance

What method would you use to compute a distance that represents the number of "jumps" one has to do to go from one area to another area in a given 2D map?
Let's take the following map for instance:
(source: free.fr)
The end result of the computation would be a triangle like this:
    A  B  C  D  E  F
A
B   1
C   2  1
D   2  1  1
E   .  .  .  .
F   3  2  2  1  .
Which means that going from A to D, it takes 2 jumps.
However, to go from anywhere to E, it's impossible because the "gap" is too big, and so the value is "infinite", represented here as a dot for simplification.
As you can see on the example, the polygons may share points, but most often they are simply close together and so a maximum gap should be allowed to consider two polygons to be adjacent.
This, obviously, is a simplified example, but in the real case I'm faced with about 60000 polygons and am only interested in jump values up to 4.
As input data, I have the polygon vertices as an array of coordinates, from which I already know how to calculate the centroid.
My initial approach would be to "paint" the polygons on a white background canvas, each with their own color, and then walk the line between the centroids of two candidate polygons. Counting the colors I encounter could give me the number of jumps.
However, this is not really reliable as it does not take into account concave arrangements where one has to walk around the "notch" to go from one polygon to the other as can be seen when going from A to F.
I have tried looking for reference material on this subject but could not find any because I have a hard time figuring what the proper terms are for describing this kind of problem.
My target language is Delphi XE2, but any example would be most welcome.
You can create an inflated polygon with a small offset for every initial polygon, then check for intersection with neighbouring (inflated) polygons. Offsetting is useful to compensate for small gaps between polygons.
Both the inflating and the intersection problems can be solved with the Clipper library.
How to find potential neighbours depends on the real conditions. One simple method is to divide the plane into square cells and check only polygons that have vertices in the same cell or in adjacent cells.
Every pair of intersecting polygons gives an edge in an (unweighted, undirected) graph. You want to find all paths with length <= 4, so just execute a depth-limited BFS from every vertex (polygon), assuming that the graph is sparse.
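A minimal sketch of that last step, in Python rather than Delphi, assuming the adjacency graph has already been built from the inflated-polygon intersections. The adjacency below is read off the example triangle in the question (every pair with value 1 is an edge); the function name jump_counts is made up.

```python
from collections import deque


def jump_counts(adjacency, max_jumps=4):
    """Return {(a, b): jumps} for every ordered pair reachable within max_jumps."""
    result = {}
    for source in adjacency:
        depth = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if depth[node] == max_jumps:
                continue  # do not expand beyond the jump limit
            for nxt in adjacency[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    result[(source, nxt)] = depth[nxt]
                    queue.append(nxt)
    return result


# Edges taken from the example triangle: B-A, C-B, D-B, D-C, F-D. E is isolated.
adjacency = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C", "F"],
    "E": [],
    "F": ["D"],
}
jumps = jump_counts(adjacency)
```

Pairs that are missing from the result correspond to the dots in the triangle: either unreachable or farther apart than the jump limit.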
You can try single-link clustering or Voronoi diagrams. You can also brute-force it, or try density-based spatial clustering of applications with noise (DBSCAN) or k-means clustering.
I would try this:
1) Do a Delaunay triangulation of all the points of all polygons
2) Remove from the Delaunay graph all triangles that have all 3 points in the same polygon
Two polygons are neighbors by point if at least one triangle has at least one point in each polygon (or, obviously, if the polygons have a common point).
Two polygons are neighbors by side if each polygon has at least two adjacent points in the same quad, i.e. two adjacent triangles (or, obviously, two common and adjacent points).
Once the gaps are filled with new polygons (triangles, possibly combined), use Dijkstra's algorithm weighted by the distance between nearest points (or polygon centroids) to compute the paths.

How to determine area of MKMapRect with greatest concentration of MKAnnotation objects?

Given an MKMapView that contains a variable number of annotations ([mapView annotations]) at various points on the map and the MKMapRect value MKMapRectWorld, how can I determine an area on the map that has the greatest concentration of MKAnnotation objects (perhaps the 5-15 annotations closest to one another)?
Example scenarios:
* Coffee finder: Determine which area of the map has the most Starbucks
* K9 statistics: Determine which area of the map has the most Cocker Spaniels
The "area" could be a set rect size or determined by a block of annotations, I don't necessarily care. Thanks for your help!
You will find this related question helpful.
Also take a look at the K-means algorithm.
If you have N annotations and want to break them into K parts, you can find the center of each of the K parts (chosen to satisfy a criterion, e.g. minimizing the within-cluster sum of squares) with the K-means algorithm. Once you have a center, the distance between it and the annotation farthest from it gives the radius of the region you are interested in. There are several variations of the K-means algorithm; you can choose one based on performance and ease of implementation.
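A rough sketch of that idea, in Python rather than Objective-C and with no MapKit calls. It assumes the annotation coordinates have already been converted to planar (x, y) points, and it takes the initial centers as an argument so the result is reproducible; in practice they would usually be chosen at random. The names kmeans and cluster_radius are made up for illustration.

```python
from math import hypot


def kmeans(points, centers, iterations=20):
    """Plain Lloyd's iteration; returns (centers, clusters), clusters as lists of points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest current centre.
            nearest = min(
                range(len(centers)),
                key=lambda i: hypot(p[0] - centers[i][0], p[1] - centers[i][1]),
            )
            clusters[nearest].append(p)
        # Move each centre to the mean of its members; keep it in place if it has none.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers, clusters


def cluster_radius(center, members):
    """Distance from the centre to its farthest member."""
    return max(hypot(x - center[0], y - center[1]) for x, y in members)


points = [(0, 0), (0, 2), (2, 0), (2, 2),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
centers, clusters = kmeans(points, centers=[points[0], points[4]])
radii = [cluster_radius(c, m) for c, m in zip(centers, clusters)]
```

In this toy data the second cluster holds more points inside a smaller radius, so it would be the "most concentrated" area under this definition.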
EDIT:
I have not implemented the following, but I think it will give one solution.
If you are OK with a range of 5-10, there can be multiple solutions, so we will find one of them.
1- Say you have N = 100 annotations and want to know which P = 15 of them are most densely located.
2- Divide the N annotations randomly into K = N/P groups (rounded up; here K = 7).
3- Use the K-means algorithm so that we end up with K groups that can be distinguished as separate entities.
4- These K groups will have the property of minimum "within-cluster sum of squares".
5- If you want to save computation time, you can relax the definition of the most concentrated group to minimum "within-cluster sum of squares", as opposed to the area bounded by the group.
6- Select the group from the K obtained groups that satisfies your criteria.
7- If you want to stick to the minimum area (greatest concentration) definition, you will need to do a lot of computation:
a. First determine the boundary annotations of a given group, which is itself a huge problem.
b. Calculate the area of each polygon and see which is smallest (not complicated, but computationally demanding).
EDIT2:
I tried whatever I could and finally concluded that this question belongs on a specialist math website. I asked your question here, and from the answer you can get a paper discussing this problem and its solution here. The problem they discuss is: given N points, find the K points whose convex hull has minimum area.

Detecting a certain Latitude / Longitude is in a US State

I know that most people will view this question and point me to Google Geocode - but I'm looking for a mathematical formula that allows someone to take a Lat/Lng point and see if it's inside a US state (or a bounding box). Is there a way, via PHP, to do a calculation to see if a point is in a certain box (such as California)?
Well, there's no formula that'll tell you anything about which state is where (it would have totally been a spoiler as to the outcome of the US-Mexico war if there was!). So you'll need to get that data from somewhere.
This then turns into one of two problems, depending on the degree of accuracy you want.
If you have details of a bounding box that is rectangular when shown on a Mercator or similar projection (that is, it has degrees of latitude for north and south, and of longitude for east and west), then the formula is simply:
inBox = latitude <= north && latitude >= south && longitude >= west && longitude <= east
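The same test as a small runnable sketch, in Python rather than PHP. The California bounds below are rough, illustrative values, not authoritative data, and the function assumes the box does not cross the 180° meridian.

```python
def in_box(lat, lng, south, north, west, east):
    """True if the point lies inside the latitude/longitude box."""
    return south <= lat <= north and west <= lng <= east


# Rough, illustrative bounds for California (assumed values).
CALIFORNIA = {"south": 32.5, "north": 42.0, "west": -124.5, "east": -114.1}

los_angeles = in_box(34.05, -118.24, **CALIFORNIA)
new_york = in_box(40.71, -74.00, **CALIFORNIA)
las_vegas = in_box(36.17, -115.14, **CALIFORNIA)
```

Note that Las Vegas, which is in Nevada, also falls inside this box. That is the accuracy trade-off of a rectangle around a non-rectangular state.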
If you have more detail, and have a series of points that defines the border of the state (obviously, the more points, the more precision), then it becomes a variant of the point-in-polygon problem, with a guarantee of only involving simple polygons (no US state has a border that crosses itself, nor completely surrounds another). One solution is that used in this C code. It's possible that there would be edge cases affected by the fact that this is a 2D-plane algorithm rather than a spherical one, but I imagine you'd need to have some pretty precise data on the boundaries of the states for the imprecision from the algorithm to be greater than that caused by the data.
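For illustration, here is a sketch of the usual even-odd ray-casting test for a simple polygon. It is written in Python and is not the linked C code; it treats latitude and longitude as plain 2D coordinates, with the planar caveat mentioned above. The square border is made-up test data, not a real state outline.

```python
def point_in_polygon(lat, lng, border):
    """Even-odd ray casting; border is a list of (lat, lng) vertices of a simple polygon."""
    inside = False
    for i in range(len(border)):
        lat1, lng1 = border[i]
        lat2, lng2 = border[i - 1]  # previous vertex, wrapping around at i == 0
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (lat1 > lat) != (lat2 > lat):
            crossing = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < crossing:
                inside = not inside
    return inside


square = [(0, 0), (0, 10), (10, 10), (10, 0)]
centre_inside = point_in_polygon(5, 5, square)
north_outside = point_in_polygon(15, 5, square)
east_outside = point_in_polygon(5, 15, square)
```

Checking the bounding box first and running this test only for points that pass it is a cheap way to skip most of the polygon work.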
The simplest way, I would think, is to use a bounding box for each state, which can be found from the Flickr Geo API. An example for CA: https://www.flickr.com/places/info/2347563
