OSMnx cannot find path between nodes in composed graph?

I am trying to use osmnx to find distances between an origin point (lat/lon) and the nearest infrastructure, such as railways, water, or parks. My approach:
1) Get the entire graph from an area with network_type='walk'.
2) Get the needed infrastructure, e.g. railways, for that same area.
3) Compose the two graphs into one.
4) Find the nearest node from the origin point in the original graph.
5) Find the nearest node from the origin point in the infrastructure graph.
6) Find the shortest route length between the two nodes.
If you run the example below, you will see that about 20% of the data is missing because no route can be found between the nodes. For infrastructure='way["leisure"~"park"]' or infrastructure='way["natural"~"wood"]' it is even worse, with 80-90% of the nodes not being connected.
Minimal reproducible example:
import osmnx as ox
import networkx as nx
bbox = [55.5267243, 55.8467243, 12.4100724, 12.7300724]
g = ox.graph_from_bbox(bbox[0], bbox[1], bbox[2], bbox[3],
                       retain_all=True,
                       truncate_by_edge=True,
                       simplify=False,
                       network_type='walk')
points = [(55.6790884456018, 12.568493971506154),
          (55.6790884456018, 12.568493971506154),
          (55.6867418740291, 12.58232314016353),
          (55.6867418740291, 12.58232314016353),
          (55.6867418740291, 12.58232314016353),
          (55.67119624894504, 12.587201455313153),
          (55.677406927839506, 12.57651997656002),
          (55.6856574907879, 12.590500429002823),
          (55.6856574907879, 12.590500429002823),
          (55.68465359365924, 12.585474365063224),
          (55.68153666806675, 12.582594757267945),
          (55.67796979175, 12.583111746311117),
          (55.68767346629932, 12.610040871066179),
          (55.6830855237578, 12.575431380892427),
          (55.68746749645466, 12.589488615911913),
          (55.67514254640597, 12.574308210656602),
          (55.67812748568291, 12.568454119053886),
          (55.67812748568291, 12.568454119053886),
          (55.6701733527419, 12.58989203029166),
          (55.677700136266616, 12.582800629527789)]
railway = ox.graph_from_bbox(bbox[0], bbox[1], bbox[2], bbox[3],
                             retain_all=True,
                             truncate_by_edge=True,
                             simplify=False,
                             network_type='walk',
                             infrastructure='way["railway"]')
g_rail = nx.compose(g, railway)
l_rail = []
for point in points:
    nearest_node = ox.get_nearest_node(g, point)
    rail_nn = ox.get_nearest_node(railway, point)
    if nx.has_path(g_rail, nearest_node, rail_nn):
        l_rail.append(nx.shortest_path_length(g_rail, nearest_node, rail_nn, weight='length'))
    else:
        l_rail.append(-1)

There are two things that caught my attention.
The OSMnx documentation specifies that the ox.graph_from_bbox parameters be given in the order north, south, east, west (https://osmnx.readthedocs.io/en/stable/osmnx.html). I mention this because when I tried to run your code, I was getting empty graphs.
The parameter retain_all=True is the key, as you may already know. When set to True, it retains all nodes in the graph, even those not connected to any other node. Such isolated pieces occur primarily because OpenStreetMap, being voluntarily contributed geographic information, is incomplete. I suggest you set retain_all=False, so that your graph contains only the connected nodes. This way, you get a complete list without any -1.
I hope this helps.
g = ox.graph_from_bbox(bbox[1], bbox[0], bbox[3], bbox[2],
                       retain_all=False,
                       truncate_by_edge=True,
                       simplify=False,
                       network_type='walk')
railway = ox.graph_from_bbox(bbox[1], bbox[0], bbox[3], bbox[2],
                             retain_all=False,
                             truncate_by_edge=True,
                             simplify=False,
                             network_type='walk',
                             infrastructure='way["railway"]')
g_rail = nx.compose(g, railway)
l_rail = []
for point in points:
    nearest_node = ox.get_nearest_node(g, point)
    rail_nn = ox.get_nearest_node(railway, point)
    if nx.has_path(g_rail, nearest_node, rail_nn):
        l_rail.append(nx.shortest_path_length(g_rail, nearest_node, rail_nn, weight='length'))
    else:
        l_rail.append(-1)
print(l_rail)
Out[60]:
[7182.002999999995,
7182.002999999995,
5060.562000000002,
5060.562000000002,
5060.562000000002,
6380.099999999999,
7127.429999999996,
4707.014000000001,
4707.014000000001,
5324.400000000003,
6153.250000000002,
6821.213000000002,
8336.863999999998,
6471.305,
4509.258000000001,
5673.294999999996,
6964.213999999994,
6964.213999999994,
6213.673,
6860.350000000001]
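To see why paths go missing with retain_all=True, a minimal diagnostic sketch (assuming the graph g from the question is still in scope) is to count the graph's weakly connected components with networkx:

import networkx as nx

# g is the walk network built with retain_all=True.
# Each weakly connected component is an isolated island; no path crosses between them.
components = list(nx.weakly_connected_components(g))
print(len(components))                    # number of disconnected pieces
print(sorted(map(len, components))[-3:])  # sizes of the three largest pieces

If a point's nearest railway node lands in a different component than its nearest walk node, nx.has_path returns False, which is exactly the -1 case above.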

Related

Using cv.matchTemplate to find multiple best matches

I am using the function cv.matchTemplate to try to find template matches.
result = cv.matchTemplate(img, templ, match_method)
After I run the function I have a bunch of answers in list result. I want to filter the list to find the best n matches. The data in result is just a large array of numbers, so I don't know what criteria to filter on. Using extremes = cv.minMaxLoc(result, None) filters the result list in an undesired way before converting them to locations.
The match_method is cv.TM_SQDIFF. I want to:
filter the results down to the best matches
Use the results to obtain the locations
How can I achieve this?
You can threshold the result of matchTemplate to find locations with a sufficient match. This tutorial should get you started; read the bottom of the page for finding multiple matches.
import numpy as np
import cv2

# result comes from cv.matchTemplate; w, h are the template's width and height,
# and img_rgb is the image to draw on.
threshold = 0.2
loc = np.where(result <= threshold)  # filter the results (TM_SQDIFF: lower is better)
for pt in zip(*loc[::-1]):  # pt marks the top-left corner of a match
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
Keep in mind that the matching method you use determines how you filter. cv.TM_SQDIFF tends toward zero as match quality increases, so setting the threshold closer to zero filters out worse matches. The opposite is true for the cv.TM_CCORR, cv.TM_CCORR_NORMED, cv.TM_CCOEFF, and cv.TM_CCOEFF_NORMED matching methods (better matches tend toward 1).
The above answer does not find the best N matches as the question asked. It filters answers based on a threshold, leaving open the (likely) possibility that you still have more than N results, or zero results, that beat the threshold.
To find the N best matches we're looking for the N highest numbers in a 2D array and retrieving their indexes so we know the location. We can use numpy.argpartition to find the highest N indexes in a 1D array, and numpy.ndarray.flatten with numpy.unravel_index to go back and forth between a 2D and 1D array, like so:
import numpy as np
import cv2 as cv

# img, templ and match_method as in the question
find_num = 5
result = cv.matchTemplate(img, templ, match_method)
# Indices of the find_num largest values in the flattened result array:
idx_1d = np.argpartition(result.flatten(), -find_num)[-find_num:]
# Convert the flat indices back to (row, col) pairs:
idx_2d = np.unravel_index(idx_1d, result.shape)
From here you have the x,y locations of the top 5 matches. (Since the question uses cv.TM_SQDIFF, where lower is better, take the smallest values instead: np.argpartition(result.flatten(), find_num)[:find_num].)
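For a self-contained end-to-end sketch (the file names and the choice of cv.TM_CCOEFF_NORMED, where higher is better, are my assumptions, not from the question):

import numpy as np
import cv2 as cv

# Hypothetical input files; any grayscale scene/template pair will do.
img = cv.imread("scene.png", cv.IMREAD_GRAYSCALE)
templ = cv.imread("template.png", cv.IMREAD_GRAYSCALE)
h, w = templ.shape

result = cv.matchTemplate(img, templ, cv.TM_CCOEFF_NORMED)  # higher = better match
find_num = 5
idx_1d = np.argpartition(result.flatten(), -find_num)[-find_num:]
rows, cols = np.unravel_index(idx_1d, result.shape)

out = cv.cvtColor(img, cv.COLOR_GRAY2BGR)
for y, x in zip(rows, cols):
    # Each (x, y) is the top-left corner of one matched window.
    cv.rectangle(out, (int(x), int(y)), (int(x) + w, int(y) + h), (0, 0, 255), 2)
cv.imwrite("matches.png", out)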

Compute annual mean using x-arrays

I have a python xarray dataset with time,x,y for its dimensions and value1 as its variable. I'm trying to compute annual mean of value1 for each x,y coordinate pair.
I've run into this function while reading the docs:
ds.groupby('time.year').mean()
This seems to compute a single annual mean for all x,y coordinate pairs in value1 at each given time slice, rather than the annual means of individual x,y coordinate pairs at each given time slice.
While the code snippet above produces the wrong output, I'm very interested in its oversimplified form. I would really like to figure out the "X-arrays trick" to doing annual mean for a given x,y coordinate pair rather than hacking it together myself.
Can someone point me in the right direction? Should I temporarily turn this into a pandas object?
To avoid the default of averaging over all dimensions, you simply need to supply the dimension you want to average over explicitly:
ds.groupby('time.year').mean('time')
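As a minimal sketch with made-up dimensions and random data (just to show the resulting shape), grouping by year and averaging over time keeps one mean per x,y pair:

import numpy as np
import pandas as pd
import xarray as xr

# Two years of daily data on a tiny 2x3 grid of random values.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
ds = xr.Dataset(
    {"value1": (("time", "x", "y"), np.random.rand(len(time), 2, 3))},
    coords={"time": time, "x": [0, 1], "y": [0, 1, 2]},
)

annual = ds.groupby("time.year").mean("time")
print(annual.value1.dims)  # ('year', 'x', 'y'): one annual mean per x,y pair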
Note that calling ds.groupby('time.year').mean('time') will be incorrect if you are working with monthly rather than daily data: taking the plain mean places equal weight on months of different lengths, e.g. February and July, which is wrong.
Instead, use the function below from NCAR:
import numpy as np
import xarray as xr

def weighted_temporal_mean(ds, var):
    """Weight by days in each month."""
    # Determine the month length
    month_length = ds.time.dt.days_in_month
    # Calculate the weights
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()
    # Make sure the weights in each year add up to 1
    np.testing.assert_allclose(wgts.groupby("time.year").sum(xr.ALL_DIMS), 1.0)
    # Subset our dataset for our variable
    obs = ds[var]
    # Set up masking for NaN values
    cond = obs.isnull()
    ones = xr.where(cond, 0.0, 1.0)
    # Calculate the numerator
    obs_sum = (obs * wgts).resample(time="AS").sum(dim="time")
    # Calculate the denominator
    ones_out = (ones * wgts).resample(time="AS").sum(dim="time")
    # Return the weighted average
    return obs_sum / ones_out

average_weighted_temp = weighted_temporal_mean(ds_first_five_years, 'TEMP')

Simple registration algorithm for small sets of 2D points

I am trying to find a simple algorithm to find the correspondence between two sets of 2D points (registration). One set contains the template of an object I'd like to find and the second set mostly contains points that belong to the object of interest, but it can be noisy (missing points as well as additional points that do not belong to the object). Both sets contain roughly 40 points in 2D. The second set is a homography of the first set (translation, rotation and perspective transform).
I am interested in finding an algorithm for registration in order to get the point-correspondence. I will be using this information to find the transform between the two sets (all of this in OpenCV).
Can anyone suggest an algorithm, library or small bit of code that could do the job? As I'm dealing with small sets, it does not have to be super optimized. Currently, my approach is a RANSAC-like algorithm:
Choose 4 random points from set 1 and from set 2.
Compute transform matrix H (using openCV getPerspective())
Warp the 1st set of points using H and test how well they align with the 2nd set of points.
Repeat 1-3 N times and choose best transform according to some metric (e.g. sum of squares).
Any ideas? Thanks for your input.
With Python you can use the Open3D library, which is very easy to install in Anaconda. For your purpose, ICP should work fine, so we'll use classical ICP, which minimizes point-to-point distances between closest points in every iteration. Here is the code to register 2 clouds:
import numpy as np
import open3d as o3d

# Parameters:
initial_T = np.identity(4)  # Initial transformation for ICP
distance = 0.1  # Threshold distance for searching correspondences (closest points between clouds); here 10 cm.

# Read your point clouds:
source = o3d.io.read_point_cloud("point_cloud_1.xyz")
target = o3d.io.read_point_cloud("point_cloud_0.xyz")

# Define the type of registration:
type = o3d.pipelines.registration.TransformationEstimationPointToPoint(False)
# "False" means rigid transformation, scale = 1

# Define the number of iterations (I'll use 100):
iterations = o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=100)

# Do the registration:
result = o3d.pipelines.registration.registration_icp(source, target, distance, initial_T, type, iterations)
result is an object with four things: the transformation T (4x4), two metrics (rmse and fitness), and the set of correspondences.
To access the transformation:
result.transformation
I have used this a lot with 3D clouds obtained from Terrestrial Laser Scanners (TLS) and from robots (Velodyne LIDAR).
With MATLAB:
We'll use point-to-point ICP again, because your data is 2D. Here is a minimal example with two point clouds randomly generated inside a triangle shape:
% Triangle vertices:
V1 = [-20, 0; -10, 10; 0, 0];
V2 = [-10, 0; 0, 10; 10, 0];
% Create clouds and show pair:
points = 5000;
N1 = criar_nuvem_triangulo(V1, points);
N2 = criar_nuvem_triangulo(V2, points);
pcshowpair(N1, N2)
% Register pair N1->N2 and show:
[T, N1_transformed, RMSE] = pcregistericp(N1, N2, 'Metric', 'pointToPoint', 'MaxIterations', 100);
pcshowpair(N1_transformed, N2)
"criar_nuvem_triangulo" is a function to generate random point clouds inside a triangle:
function [cloud] = criar_nuvem_triangulo(V, N)
% Function which creates 2D point clouds in triangle format using random points
% Parameters: V = triangle vertices (3x2 matrix) | N = number of points
t = sqrt(rand(N, 1));
s = rand(N, 1);
P = (1 - t) * V(1, :) + bsxfun(@times, ((1 - s) * V(2, :) + s * V(3, :)), t);
points = [P, zeros(N, 1)];
cloud = pointCloud(points);
end
You may just use cv::findHomography. It is a RANSAC-based approach around cv::getPerspectiveTransform.
auto H = cv::findHomography(srcPoints, dstPoints, CV_RANSAC,3);
Where 3 is the reprojection threshold.
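In Python the equivalent call looks like this (the point arrays below are made-up placeholders for your own data):

import numpy as np
import cv2

# Placeholder corresponding points; replace with your own Nx2 arrays.
src_pts = np.float32([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])
dst_pts = np.float32([[0.1, 0.0], [1.1, 0.1], [1.0, 1.1], [0.0, 1.0], [0.6, 0.5]])

# 3.0 is the RANSAC reprojection threshold in pixels.
H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 3.0)
# mask[i] == 1 marks the i-th pair as a RANSAC inlier.

Note that findHomography still expects the points to be given in corresponding order; it is robust to outlier pairs, not to unknown correspondence.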
One traditional approach to your problem is a point-set registration method, used when you don't have matching-pair information. Point-set registration is similar to the method you are talking about. You can find a MATLAB implementation here.
Thanks

Blockproc in MATLAB with two output variables

I have the following problem: I have to compute dense SIFT interest points in a very high-resolution image (182 MP). When I run the code on the full image, MATLAB always closes suddenly, so I decided to run the code on image patches.
the code
I tried to use blockproc in MATLAB to call the C++ function that performs the dense SIFT interest point detection, this way:
fun = @(block_struct) denseSIFT(block_struct.data, options);
[dsift , infodsift] = blockproc(ndvi,[1000 1000],fun);
where dsift contains the SIFT descriptors (vectors) and infodsift holds the interest point information, such as the x and y coordinates.
the problem
The problem is that blockproc allows just one output, but I want both outputs. The following error is given by MATLAB when I run the code:
Error using blockproc
Too many output arguments.
Is there a way for me doing this?
Would it be a problem for you to "hard code" a version of blockproc?
Assuming for a moment that you can divide your image into NxM smaller images, you could loop around as follows:
bigImage = someFunction();
sz = size(bigImage);
smallSize = sz ./ [N M];
dsift = cell(N, M);
infodsift = cell(N, M);
for ii = 1:N
    for jj = 1:M
        smallImage = bigImage((ii-1)*smallSize(1) + (1:smallSize(1)), (jj-1)*smallSize(2) + (1:smallSize(2)));
        [dsift{ii,jj}, infodsift{ii,jj}] = denseSIFT(smallImage, options);
    end
end
The results will then be in the two cell arrays. No real need to pre-allocate, but it's tidier if you do. If the individual matrices are the same size, you can convert into a single large matrix with
dsiftFull = cell2mat(dsift);
Almost magic. This won't work if your matrices are different sizes - but then, if they are, I'm not sure you would even want to put them all in a single one (unless you decide to horzcat them).
If you do decide you want a list of "all the columns as a giant matrix", then you can do
giantMatrix = [dsift{:}];
This will return a matrix with (in your example) 128 rows, and as many columns as there were "interest points" found. It's shorthand for
giantMatrix = [dsift{1,1} dsift{2,1} dsift{3,1} ... dsift{N,M}];

relationship between density of edges to the number of vertices in graph

I want to understand how to compute big-O for a dense versus sparse graph.
"Algorithms in a nutshell" says that for sparse graph, O(E) is O(V) and for dense graph O(E) is closer to O(V^2). Does anyone know how is that derived?
Assuming the graph is simple: in the worst case every node is connected to all |V|-1 other nodes. In an undirected graph this gives |E| = (|V|-1) + (|V|-2) + ... + 1 = |V|(|V|-1)/2 = O(|V|^2); in a directed graph, |E| = |V|(|V|-1) = O(|V|^2).
A good example of a dense graph is a clique, which has all possible edges.
For a sparse graph, we assume the number of edges connected to each vertex is bounded by a constant. Let this constant be k. Thus |E| <= k|V|, and we get |E| = O(|V|).
A good example of a sparse graph is the internet, where every URL is a node and every link is an edge.
Note that if the graph is not simple, you cannot bound |E| by any function of |V|.
It's not derived, it's a definition. In a fully connected (directed) graph with self-loops, the number of edges |E| = |V|², so the definition of a dense graph is reasonable. The definition of a sparse graph is one where O(|E|) = O(|V|), so the average number of edges per vertex is bounded by a constant.
Note that if the number of edges is much smaller, e.g. O(lg |V|), then it's still O(|V|) as well. One could imagine a "semi-sparse" class of graphs with |E| = O(|V| lg |V|) or something like that, but I personally have never encountered such a class in practice.
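As a quick illustration of both bounds (a sketch using networkx; the graph sizes are arbitrary):

import networkx as nx

# Dense: a complete simple undirected graph on n nodes has n*(n-1)/2 edges.
dense = nx.complete_graph(1000)
print(dense.number_of_edges())   # 499500 = 1000*999/2, i.e. O(|V|^2)

# Sparse: bound every vertex's degree by a constant, e.g. a 4-regular graph.
sparse = nx.random_regular_graph(4, 1000)
print(sparse.number_of_edges())  # 2000 = 4*1000/2, i.e. O(|V|)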
