Explicitly find islands in a matrix - breadth-first search

I am given a matrix of 0s and 1s and have to find the islands formed by the 1s. I found a reference:
https://www.careercup.com/question?id=14948781
about how to compute the number of islands, but I don't know at all how to adapt the algorithm so that at the end I obtain the list of islands, each given as a list of the matrix cells belonging to it.

This problem is essentially asking you to find the connected components of an undirected graph. Here, a connected component is a group of 1s surrounded by 0s, none of which is connected to a separate group of 1s.
To solve this, you can start a Depth First Search (DFS) from each element of the 2D matrix:
1. If the algorithm encounters an unvisited 1, increment the count of islands.
2. Recursively perform a DFS on all the adjacent cells (up, down, left, right).
3. Keep track of the visited 1s so that they are not visited again (this can be done using a set or by marking the node in the graph as visited with a boolean flag).
4. As soon as a 0 is found during this recursive DFS, stop exploring in that direction.
5. Continue the algorithm by moving on to the next element in the 2D matrix and repeat steps 1-4.
6. When you reach the end of the 2D matrix, return the count of islands.
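The procedure above only counts islands, while the question asks for the cells of each island. A minimal Python sketch of that adaptation follows. It uses a breadth-first search with a queue (as in the question title) instead of recursion, and the function name find_islands is an illustrative choice, not something from the original answer. It assumes 4-connectivity (up, down, left, right).

```python
from collections import deque

def find_islands(matrix):
    """Return a list of islands, each a list of (row, col) cells holding a 1."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    visited = set()
    islands = []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 1 or (r, c) in visited:
                continue
            # Unvisited 1: start a new island and collect its cells with BFS.
            island = []
            queue = deque([(r, c)])
            visited.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                island.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and matrix[nr][nc] == 1 and (nr, nc) not in visited:
                        visited.add((nr, nc))
                        queue.append((nr, nc))
            islands.append(island)
    return islands
```

The number of islands is then simply len(find_islands(matrix)).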

Related

When calculating Manhattan Distance, should you calculate distance to end point or start point?

I'm trying to learn the A* algorithm (when applied to a grid pattern) and think I have grasped that before you can find the shortest path, you need to calculate the distance away from the start for any given square.
I'm following the guide here: https://medium.com/#nicholas.w.swift/easy-a-star-pathfinding-7e6689c7f7b2 which has the following image showing the Manhattan distances for each square on the grid:
with the start square being the green square and the end being the blue.
However, surely it makes more sense to figure the distance in reverse? The A* chooses the connected square with the shortest distance to the goal right? So surely this (based on the image) would make sense if we started at the end and asked what's the lowest value connected to the start, in this case 17, so go there, then 15 so go there etc etc.
Sub question: The distances in the image away from the start appear to be based on moving through Von Neumann neighbours, so surely on the way back you cannot go diagonally?
It is quite simple actually:
F = G + H
F is the total cost of the node.
G is the distance between the current node and the start node.
H is the heuristic — estimated distance from the current node to the end node.
The numbers in the grid represent G (and not the heuristic). G is the actual distance from the start point.
H should be calculated to the endpoint.
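To make the roles of G and H concrete, here is a small Python sketch of A* on a grid. It is an illustration under assumptions, not the code from the linked guide: the grid encoding (0 = free, 1 = wall), the function names and the 4-neighbour movement with unit step cost are all choices made for this example. G is accumulated from the start, H is the Manhattan distance to the goal.

```python
import heapq

def manhattan(a, b):
    """H: estimated distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(grid, start, goal):
    """Return the length of a shortest path in steps, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                                   # G: real cost from the start
    open_heap = [(manhattan(start, goal), start)]    # entries are (F, node)
    while open_heap:
        f, node = heapq.heappop(open_heap)
        if node == goal:
            return g[node]
        if f > g[node] + manhattan(node, goal):
            continue  # stale entry: a cheaper route to this node was found later
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g[node] + 1
                if new_g < g.get(nxt, float("inf")):
                    g[nxt] = new_g
                    # F = G + H, with H measured to the end point
                    heapq.heappush(open_heap, (new_g + manhattan(nxt, goal), nxt))
    return None
```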

Geopandas spatial join - all points within an x meter radius

I have a collection of Points (lat, long for a collection of buildings) and I want to group them based on whether they are within x meters of each other. I know that I could do this pairwise by first using Geopandas buffer() function (with x/2 meter radius) and then using sjoin(). However, I don't want to just do this pairwise. I want to group all buildings whose buffer region (the lat, long as the center and the buffer being a circle of radius x/2 meters) overlaps with ANY OTHER buffer region.
For example, if I have three buildings (denoted A, B and C), with each building 25 meters from its neighbor and I use a 25 meter buffer, then A and B can be grouped with sjoin() and B and C can be grouped, but I would want all THREE to be grouped.
That's in contrast to the case where A and B are 25 meters apart and C is 50 meters from B. In that case, I would want to be able to group A and B together and C in its own group.
In reality, I have potentially 100 or more buildings, so it isn't possible to run all permutations pairwise. I would need a function that groups multiple buildings whenever the building's buffer circle intersects with any other buffer circle.
Is there a simple way to do this with Geopandas?
Thank you for your responses. I wound up doing the following:
1. Buffered each building's lat/long Point by a set distance, in meters (** see note below **).
2. Determined the geopandas.unary_union of all of the buffers to get a multi-polygon (list of polygons).
3. Wrote a custom function to identify which polygon in the list intersects with a specific building's Point (using GeoPandas' .intersection), and called that list index the "location group".
4. Used pandas' apply with the above function to get the "location group" that each building belonged to.
5. Used pandas' groupby to sum the values based on the "location group".
For the above, I had to be a bit careful about edge cases where the intersection was not a multipolygon but rather a polygon (or even a Point, in cases where the buffer was zero).
A bigger issue I ran into was enormous trouble getting GeoPandas' buffer function to work properly. I know that this has to do with the crs/projections and tried all sorts of variations, but could never get the buffer function to work properly with the buffer distance in meters. I found a nice custom function here (Is there an easy way to create square buffers around point and if they intersect, merge them?) that did work, but then realized that my Postgres setup has the PostGIS functions, so I used ST_Buffer to get the buffer polygons at the same time as I query the database for the building lat/long data.
Hopefully the above will be helpful to others.
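For readers who only need the grouping itself, the same "chained" behaviour can be sketched without buffers. This is a different technique from the GeoPandas steps above: a plain union-find over all pairs of points. It assumes the coordinates have already been projected to a metric system (x, y in meters), and it relies on the fact that two circles of radius x/2 overlap exactly when their centres are at most x apart. The function name group_by_distance is hypothetical.

```python
import math

def group_by_distance(points, x):
    """Assign a group id to each (x, y) point; points closer than or equal to
    x meters are merged, and merges chain transitively (A-B and B-C => A-B-C)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= x:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive group ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]
```

With around 100 buildings the quadratic pair loop is cheap; for much larger data a spatial index would be the usual next step.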

Extracting properties of handwritten digits to speed up nearest neighbour algorithm

I have 1024-bit binary representations of three handwritten digits: 0, 1, 8.
Basically, the rows of a 32x32 bitmap of a digit are concatenated to form a binary vector.
There are 50 binary vectors for each digit.
When we apply nearest neighbour to each digit, we can use the Hamming distance or some other metric, and then apply the algorithm to differentiate between the vectors.
Now I want to use another technique where, instead of looking at each bit of a vector, I analyse fewer bits while comparing the vectors.
For example, I know that when one compares the bitmaps (1024 bits each) of the digits '8' and '0', there must be 1s in the middle of the vector for '8', since the digit 8 visually appears as two zeros stacked in a column.
So our algorithm would look for the intersection of the two zeros (which would be in the middle of the digit).
That's the way I want to work. I want to convert the low-level representation (looking at the 1024-bit vector) to a high-level representation (that consists of two properties extracted from the bitmap).
Any suggestions? I hope the question is reasonably clear.
Idea 1: Flood fill
This idea does not use the 50 patterns you have per digit: it is based on the idea that usually a "1" has all 0-bits connected around that "1" shape, while a "0" separates the 0-bits inside it from those outside it, and an "8" has two such enclosed areas. So counting connected areas of 0-bits would identify which of the three it is.
So you could use a flood fill algorithm, starting at any 0 bit in the vector, and set all those connected 0-bits to 1. In a 1 dimensional array you need to take care to correctly identify connected bits (either horizontally: 1 position apart, but not crossing a 32 boundary, or vertically... 32 positions apart). Of course, this flood-filling will destroy the image - so make sure to use a copy. If after one such flood-fill there are still 0 bits (which were therefore not connected to those you turned into 1), then choose one of those and start a second flood-fill there. Repeat if necessary.
When all bits have been set to 1 in that way, use the number of flood-fills you had to perform, as follows:
One flood-fill? It's a "1", because all 0-bits are connected.
Two flood-fills? It's a "0", because the shape of a zero separates two areas (inside/outside)
Three flood-fills? It's an "8", because this shape separates three areas of connected 0-bits.
Of course, this process assumes that these handwritten digits are well-formed. For example, if an 8-shape would have a small gap, like here:
...then it will not be identified as an "8", but as a "0". This particular problem could be resolved by identifying "loose ends" of 1-bits (a "line" that stops). When you have two of those at a short distance, then increase the number you got from flood-fill counting by 1 (as if those two ends were connected).
Similarly, if a "0" accidentally has a small second loop, like here:
...it will be identified as an "8" instead of a "0". You could prevent this particular problem by requiring that each flood-fill finds a minimum number of 0-bits (like at least 10 0-bits) to count as one.
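A minimal Python sketch of the flood-fill counting described in Idea 1 follows. The function name count_zero_regions and the lookup table DIGIT_BY_FILLS are made up for this example. The width parameter defaults to 32 as in the question, and the fill uses an explicit stack instead of recursion. It does not include the loose-end or minimum-size refinements mentioned above.

```python
def count_zero_regions(bits, width=32):
    """Count connected areas of 0-bits in a row-major bit vector."""
    cells = list(bits)          # work on a copy: the fill overwrites 0s with 1s
    n = len(cells)
    fills = 0
    for start in range(n):
        if cells[start] != 0:
            continue
        fills += 1              # found a 0 not reached by any earlier fill
        cells[start] = 1
        stack = [start]
        while stack:
            i = stack.pop()
            col = i % width
            neighbours = []
            if col > 0:
                neighbours.append(i - 1)        # left, same row only
            if col < width - 1:
                neighbours.append(i + 1)        # right, same row only
            if i - width >= 0:
                neighbours.append(i - width)    # up
            if i + width < n:
                neighbours.append(i + width)    # down
            for j in neighbours:
                if cells[j] == 0:
                    cells[j] = 1
                    stack.append(j)
    return fills

# Mapping from number of flood-fills to digit, as described in the text.
DIGIT_BY_FILLS = {1: "1", 2: "0", 3: "8"}
```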
Idea 2: probability vector
For each digit, add up the 50 example vectors you have, so that for each position you have a count somewhere between 0 to 50. You would have one such "probability" vector per digit, so prob0, prob1 and prob8. If prob8[501] = 45, it means that it is highly probable (45/50) that an "8" vector will have a 1-bit at index 501.
Now transform these 3 probability vectors as follows: instead of storing a count per position, store the positions in order of decreasing count (probability). So if prob8[513] has the highest value (like 49), then that new array should start like [513, ...]. Let's call these new vectors A0, A8 and A1 (for the corresponding digit).
Finally, when you need to match a given input vector, simultaneously go through A0, A1 and A8 (always looking at the same index in the three vectors) and keep 3 scores. When the input vector has a 1 at the position specified in A0[i], then add 1 to score0. If it also has a 1 at the position specified in A1[i] (same i), then add 1 to score1. Same thing for score8. Increment i, and repeat. Stop this iteration as soon as you have a clear winner, i.e. when the highest score among score0, score1 and score8 has crossed a threshold difference with the second highest score among them. At that point you know which digit is being represented.
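Idea 2 could be sketched in Python roughly as follows. The names build_order and classify and the margin parameter (the "threshold difference") are illustrative assumptions. Ties in the counts are broken by position, which the description leaves open.

```python
def build_order(examples):
    """Positions sorted by decreasing count of 1-bits over the example vectors."""
    n = len(examples[0])
    counts = [sum(vec[i] for vec in examples) for i in range(n)]
    return sorted(range(n), key=lambda i: -counts[i])

def classify(vec, orders, margin=3):
    """orders maps a digit label to its position order (A0, A1, A8 in the text).
    Stops early once the best score leads the runner-up by at least margin."""
    scores = {digit: 0 for digit in orders}
    for i in range(len(vec)):
        for digit, order in orders.items():
            scores[digit] += vec[order[i]]   # add 1 if the input has a 1 there
        ranked = sorted(scores.values(), reverse=True)
        if len(ranked) > 1 and ranked[0] - ranked[1] >= margin:
            break                            # clear winner, stop looking at bits
    return max(scores, key=scores.get)
```

A possible usage would be orders = {"0": build_order(zeros), "1": build_order(ones), "8": build_order(eights)}, where zeros, ones and eights are the three lists of 50 example vectors.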

Calculating the neighborhood distance

What method would you use to compute a distance that represents the number of "jumps" one has to do to go from one area to another area in a given 2D map?
Let's take the following map for instance:
The end result of the computation would be a triangle like this:
   A  B  C  D  E  F
A
B  1
C  2  1
D  2  1  1
E  .  .  .  .
F  3  2  2  1  .
Which means that going from A to D, it takes 2 jumps.
However, to go from anywhere to E, it's impossible because the "gap" is too big, and so the value is "infinite", represented here as a dot for simplification.
As you can see in the example, the polygons may share points, but most often they are simply close together, so a maximum gap should be allowed when deciding whether two polygons are adjacent.
This, obviously, is a simplified example; in the real case I'm faced with about 60000 polygons and am only interested in jump values up to 4.
As input data, I have the polygon vertices as an array of coordinates, from which I already know how to calculate the centroid.
My initial approach would be to "paint" the polygons on a white background canvas, each with their own color and then walk the line between two candidate polygons centroid. Counting the colors I encounter could give me the number of jumps.
However, this is not really reliable as it does not take into account concave arrangements where one has to walk around the "notch" to go from one polygon to the other as can be seen when going from A to F.
I have tried looking for reference material on this subject but could not find any because I have a hard time figuring what the proper terms are for describing this kind of problem.
My target language is Delphi XE2, but any example would be most welcome.
You can create an inflated polygon with a small offset for every initial polygon, then check for intersection with the neighbouring (inflated) polygons. Offsetting is useful to compensate for small gaps between polygons.
Both the inflating and the intersection problems can be solved with the Clipper library.
The solution to the potential-neighbours problem depends on the real conditions. A simple method, for example: divide the plane into square cells and check for neighbours that have vertices in the same cell and in the nearest cells.
Every pair of intersecting polygons gives an edge in an (unweighted, undirected) graph. You want to find all paths of length <= 4: just run a depth-limited BFS from every vertex (polygon), assuming that the graph is sparse.
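The last step (depth-limited BFS over the adjacency graph) might look like the Python sketch below. It assumes the list of adjacent polygon pairs has already been produced by the inflate-and-intersect step; the function name jump_counts is hypothetical, and polygons that are not reached within max_jumps are simply absent from the result (the "." in the question's triangle).

```python
from collections import deque

def jump_counts(nodes, edges, max_jumps=4):
    """For every node, map each node reachable within max_jumps to its jump count."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    result = {}
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if dist[current] == max_jumps:
                continue                     # depth limit: do not expand further
            for nxt in adjacency[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        result[start] = dist
    return result
```

Using an adjacency list consistent with the question's example (A-B, B-C, B-D, C-D, D-F, with E isolated), jump_counts gives 2 for A to D and 3 for A to F, and E never appears as reachable from the others.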
You can try single-link clustering or Voronoi diagrams. You can also brute-force it or try density-based spatial clustering of applications with noise (DBSCAN) or K-means clustering.
I would try this:
1) Do a Delaunay triangulation of all the points of all polygons
2) Remove from the Delaunay graph all triangles that have their 3 points in the same polygon
Two polygons are neighbours by point if at least one triangle has at least one point in both polygons (or obviously if the polygons have a common point)
Two polygons are neighbours by side if each polygon has at least two adjacent points in the same quad = two adjacent triangles (or obviously two common and adjacent points)
Once the gaps are filled with new polygons (triangles, possibly combined), use Dijkstra's algorithm weighted by the distance between nearest points (or polygon centroids) to compute the paths.

Extracting only peaks in a distribution

I have a table that has frame numbers in one column and the corresponding color moments in the other column. I computed them using OpenCV.
Some of the frames have extremely high values and the rest very low ones. How can I extract the frames with very high peaks?
This is the plot of the distribution. I tried Gaussian smoothing and then thresholding on the plot below.
I got this result.
Now how should I proceed?
Basically you are looking for a peak finder. MATLAB has a peakfinder function to find peaks.
I did not find any ready-made API in OpenCV for this, so I implemented MATLAB's peakfinder. The algorithm goes this way.
Initial assumptions or prior knowledge can be: a) you can have 'n' peaks in your distribution; b) your peaks are separated by a minimum window 'w', i.e. no two peaks are closer than 'w'.
I can tell you the window implementation:
1. Start at a data point.
2. Mark its position as the current index and check in its left and right neighbourhood of length 'w' whether a value greater than the value at the current index exists.
3. If yes, move to that point, make it the current index and repeat 2.
4. If no, then it is your local maximum. Move your current index by 'w' and repeat 2 until you reach the end of the data set.
Try to implement this and check the MATLAB help for peakfinder. If no luck, I can post the code.
EDIT: after seeing your edited graph, it seems the graph has well-defined maximum peaks, so what you can do is track the sign of dy/dx of the graph. Maximum peaks are points where the sign of dy/dx changes from positive to negative. In code:
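// x is assumed to be a std::vector<double> holding the samples
std::vector<double> array_of_max_peak;
for (std::size_t n = 1; n + 1 < x.size(); ++n)
    // a peak is where the slope changes from positive to negative
    if (x[n] - x[n - 1] > 0 && x[n + 1] - x[n] < 0)
        array_of_max_peak.push_back(x[n]);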
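The same sign-change test can be sketched in Python, returning the indices (frame positions) of the peaks rather than their values. The min_height parameter is an addition for this example, not part of the answer: it is one simple way to keep only the "very high" peaks the question asks about, and its value would have to be chosen from the data.

```python
def find_peaks(values, min_height=None):
    """Indices where the slope changes from positive to negative.
    If min_height is given, peaks below it are dropped."""
    peaks = []
    for n in range(1, len(values) - 1):
        rising = values[n] - values[n - 1] > 0
        falling = values[n + 1] - values[n] < 0
        if rising and falling and (min_height is None or values[n] >= min_height):
            peaks.append(n)
    return peaks
```

Note that a flat-topped peak (two equal neighbouring maxima) is not detected by a strict sign change; smoothing the data first, as already tried in the question, reduces how often that matters.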
