Geographic points extend beyond expected boundary - geolocation

I have a point geometry of US locations contained in a GeoDataFrame.
I want to plot this as a scatterplot over the US map.
My code is:
import numpy as np
import geopandas as gpd
import libpysal
import contextily as ctx
import matplotlib.pyplot as plt
from shapely.ops import cascaded_union
gdf = gpd.GeoDataFrame(point_geometry, geometry='geometry')
boundary = gpd.read_file(libpysal.examples.get_path('us48.shp'))
fig, ax = plt.subplots(figsize=(50, 50))
boundary.plot(ax=ax, color="gray")
gdf.plot(ax=ax, markersize=3.5, color="black")
ax.axis("off")
plt.axis("equal")
plt.show()
On inspecting the plot, some of the dots fall outside my expected bounds.
Is there something I am missing?
Do I need to create a boundary to limit the scatter of the dots?

The plot looks good. I guess you want to exclude the points outside the conterminous USA. Those points are clearly in Hawaii, Alaska, and Canada.
From your GeoDataFrames, gdf (point geometry) and boundary (polygon geometry), you can build a single boundary polygon and use it to keep only the points that fall inside it.
# cascaded_union is deprecated since Shapely 1.8; unary_union does the same job
from shapely.ops import unary_union
# create the conterminous USA polygon by dissolving all state polygons
poly_union = unary_union([poly for poly in boundary.geometry])
# get a selection from `gdf`, taking points within `poly_union`
points_within = gdf[gdf.geometry.within(poly_union)]
Now, points_within is a geodataframe that you can use to plot instead of gdf.
points_within.plot(ax=ax, markersize=3.5, color="black")
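A minimal, self-contained sketch of the same dissolve-then-filter idea. It uses two hypothetical square "states" in place of us48.shp and plain Shapely geometries instead of GeoDataFrames, and assumes a Shapely version that provides shapely.ops.unary_union (both 1.x and 2.x do).

```python
from shapely.geometry import Point, box
from shapely.ops import unary_union

# Hypothetical stand-ins: two adjacent squares play the role of the state polygons
states = [box(0, 0, 2, 2), box(2, 0, 4, 2)]
points = [Point(1, 1), Point(3, 1.5), Point(5, 5), Point(-1, 0.5)]

# Dissolve all state polygons into one geometry
poly_union = unary_union(states)

# Keep only the points that lie inside the dissolved boundary
points_within = [p for p in points if p.within(poly_union)]
print([(p.x, p.y) for p in points_within])
```

With GeoDataFrames, gdf[gdf.geometry.within(poly_union)] performs the same test as the list comprehension, vectorized over the whole column.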


Difference between example Acrobot plant A matrix and standard form

In section 3.4.1 of the Underactuated Robotics notes (https://underactuated.mit.edu/acrobot.html#section4), the manipulator equations are linearized around a fixed point and the matrix A_lin is derived.
While verifying the linearization of my own attempt at making an acrobot, I used the python notebook provided in Example 3.5 (LQR for the Acrobot and Cart-pole) to obtain the A matrix of the linearized Acrobot (Plant from the Examples module). I did this by simply adding 'print(linearized_acrobot.A())' on line 21 of the LQR for Acrobot block. Interestingly, I noticed that the bottom right 2x2 block is nonzero, which is different from the form derived in the notes. What is the reason behind the difference? For convenience I'll leave the code below:
import matplotlib.pyplot as plt
import mpld3
import numpy as np
from IPython.display import HTML, display
from pydrake.all import (AddMultibodyPlantSceneGraph, ControllabilityMatrix,
                         DiagramBuilder, Linearize, LinearQuadraticRegulator,
                         MeshcatVisualizerCpp, Parser, Saturation, SceneGraph,
                         Simulator, StartMeshcat, WrapToSystem)
from pydrake.examples.acrobot import (AcrobotGeometry, AcrobotInput,
                                      AcrobotPlant, AcrobotState)
from pydrake.solvers.mathematicalprogram import MathematicalProgram, Solve
from underactuated import FindResource, running_as_notebook
from underactuated.meshcat_cpp_utils import MeshcatSliders
from underactuated.quadrotor2d import Quadrotor2D, Quadrotor2DVisualizer
if running_as_notebook:
    mpld3.enable_notebook()
def UprightState():
    state = AcrobotState()
    state.set_theta1(np.pi)
    state.set_theta2(0.)
    state.set_theta1dot(0.)
    state.set_theta2dot(0.)
    return state
def acrobot_controllability():
    acrobot = AcrobotPlant()
    context = acrobot.CreateDefaultContext()
    input = AcrobotInput()
    input.set_tau(0.)
    acrobot.get_input_port(0).FixValue(context, input)
    context.get_mutable_continuous_state_vector()\
        .SetFromVector(UprightState().CopyToVector())
    linearized_acrobot = Linearize(acrobot, context)
    print(linearized_acrobot.A())
    print(
        f"The singular values of the controllability matrix are: {np.linalg.svd(ControllabilityMatrix(linearized_acrobot), compute_uv=False)}"
    )
acrobot_controllability()
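Great question. The AcrobotPlant in Drake has default parameters that include viscous joint damping (the coefficients b1 and b2), and that damping is what produces the non-zero elements in the bottom-right corner of A. If you replace the first two lines of acrobot_controllability() with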
acrobot = AcrobotPlant()
context = acrobot.CreateDefaultContext()
params = acrobot.get_mutable_parameters(context)
print(params)
params.set_b1(0)
params.set_b2(0)
then the bottom-right 2x2 block of the linearized A is zero, matching the form derived in the notes.
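Why damping lands in exactly that block: at the upright fixed point the velocities are zero, and the Coriolis terms are quadratic in velocity, so their derivative with respect to velocity vanishes there. The only remaining velocity-linear term is the damping torque −diag(b1, b2) q̇, so the bottom-right block of A becomes −M⁻¹ diag(b1, b2). The sketch below re-implements the acrobot manipulator equations from the notes in NumPy and linearizes them by central differences. The parameter values are assumed for illustration and are not read from Drake.

```python
import numpy as np

# Illustrative parameters (assumed values, not Drake's AcrobotParams)
m1, m2 = 1.0, 1.0           # link masses
l1 = 1.0                    # length of link 1
lc1, lc2 = 0.5, 1.0         # joint-to-center-of-mass distances
I1 = 0.083 + m1 * lc1**2    # inertia of link 1 about the shoulder
I2 = 0.33 + m2 * lc2**2     # inertia of link 2 about the elbow
g = 9.81


def mass_matrix(th2):
    c2 = np.cos(th2)
    return np.array([[I1 + I2 + m2 * l1**2 + 2 * m2 * l1 * lc2 * c2, I2 + m2 * l1 * lc2 * c2],
                     [I2 + m2 * l1 * lc2 * c2, I2]])


def dynamics(x, b1, b2):
    """xdot for x = [th1, th2, th1dot, th2dot], zero input torque, viscous damping b1, b2."""
    th1, th2, dth1, dth2 = x
    s1, s2, s12 = np.sin(th1), np.sin(th2), np.sin(th1 + th2)
    qdot = np.array([dth1, dth2])
    # Coriolis matrix from the manipulator equations
    C = np.array([[-2 * m2 * l1 * lc2 * s2 * dth2, -m2 * l1 * lc2 * s2 * dth2],
                  [m2 * l1 * lc2 * s2 * dth1, 0.0]])
    # Gravity torques
    tau_g = np.array([-m1 * g * lc1 * s1 - m2 * g * (l1 * s1 + lc2 * s12),
                      -m2 * g * lc2 * s12])
    # Viscous joint damping torques
    damping = np.array([b1 * dth1, b2 * dth2])
    qddot = np.linalg.solve(mass_matrix(th2), tau_g - C @ qdot - damping)
    return np.concatenate([qdot, qddot])


def linearize_upright(b1, b2, h=1e-6):
    """Central-difference Jacobian A = d(xdot)/dx at the upright fixed point."""
    x0 = np.array([np.pi, 0.0, 0.0, 0.0])
    A = np.zeros((4, 4))
    for j in range(4):
        e = np.zeros(4)
        e[j] = h
        A[:, j] = (dynamics(x0 + e, b1, b2) - dynamics(x0 - e, b1, b2)) / (2 * h)
    return A


A_free = linearize_upright(0.0, 0.0)
A_damped = linearize_upright(0.1, 0.1)
print(np.round(A_free[2:, 2:], 6))    # zero block, as in the notes
print(np.round(A_damped[2:, 2:], 6))  # -M^{-1} diag(b1, b2)
```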

How to show kdeplot in a 5*4 subplot?

I am working on a machine learning project and am using seaborn's kdeplot to show the distribution of each feature after standard scaling. However, no matter how large I make the figure, the graphs don't show, and I get the error: AttributeError: 'numpy.ndarray' object has no attribute 'plot'. The figure I want is a grid of subplots with 4 rows and 5 columns that looks like this:
[image: expected subplot layout]
#feature scaling
#since numerical attributes have very different scales,
#we use standardization to get all attributes to have the same scale
import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
matplotlib.style.use('ggplot')
scaler = preprocessing.StandardScaler()
scaled_df = scaler.fit_transform(train_set)
scaled_df = pd.DataFrame(scaled_df, columns=["SaleAmount","SaleCount","ReturnAmount","ReturnCount",
"KeyedAmount","KeyedCount","VoidRejectAmount","VoidRejectCount","RetrievalAmount",
"RetrievalCount","ChargebackAmount","ChargebackCount","DepositAmount","DepositCount",
"NetDeposit","AuthorizationAmount","AuthorizationCount","DeclinedAuthorizationAmount","DeclinedAuthorizationCount"])
fig, axes = plt.subplots(figsize=(20,10), ncols=5, nrows=4)
sns.kdeplot(scaled_df['SaleAmount'], ax=axes[0])
sns.kdeplot(scaled_df['SaleCount'], ax=axes[1])
sns.kdeplot(scaled_df['ReturnAmount'], ax=axes[2])
sns.kdeplot(scaled_df['ReturnCount'], ax=axes[3])
sns.kdeplot(scaled_df['KeyedAmount'], ax=axes[4])
sns.kdeplot(scaled_df['KeyedCount'], ax=axes[5])
sns.kdeplot(scaled_df['VoidRejectAmount'], ax=axes[6])
sns.kdeplot(scaled_df['VoidRejectCount'], ax=axes[7])
sns.kdeplot(scaled_df['RetrievalAmount'], ax=axes[8])
sns.kdeplot(scaled_df['RetrievalCount'], ax=axes[9])
sns.kdeplot(scaled_df['ChargebackAmount'], ax=axes[10])
sns.kdeplot(scaled_df['ChargebackCount'], ax=axes[11])
sns.kdeplot(scaled_df['DepositAmount'], ax=axes[12])
sns.kdeplot(scaled_df['DepositCount'], ax=axes[13])
sns.kdeplot(scaled_df['NetDeposit'], ax=axes[14])
sns.kdeplot(scaled_df['AuthorizationAmount'], ax=axes[15])
sns.kdeplot(scaled_df['AuthorizationCount'], ax=axes[16])
sns.kdeplot(scaled_df['DeclinedAuthorizationAmount'], ax=axes[17])
sns.kdeplot(scaled_df['DeclinedAuthorizationCount'], ax=axes[18])
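plt.subplots(nrows=4, ncols=5) returns axes as a two-dimensional (4, 5) array, so axes[0] is a whole row of five Axes rather than a single Axes, which is what triggers the AttributeError. Index each subplot with a row and a column instead. For example, DeclinedAuthorizationCount is the 19th column (index 18), which goes in row 18 // 5 = 3, column 18 % 5 = 3:
sns.kdeplot(scaled_df['DeclinedAuthorizationCount'], ax=axes[3, 3])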

Scikit-learn PCA .fit_transform shape is inconsistent (n_samples << m_attributes)

I am getting different shapes for my PCA using sklearn. Why doesn't my transformation return an array with the dimensions the docs describe?
fit_transform(X, y=None)
Fit the model with X and apply the dimensionality reduction on X.
Parameters:
X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Returns:
X_new : array-like, shape (n_samples, n_components)
Check this with the iris dataset, which is (150, 4), where I compute 4 PCs:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})
%matplotlib inline
np.random.seed(0)
# Iris dataset
DF_data = pd.DataFrame(load_iris().data,
index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
columns = load_iris().feature_names)
Se_targets = pd.Series(load_iris().target,
index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
name = "Species")
# Scaling mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data),
index = DF_data.index,
columns = DF_data.columns)
# Sklearn for Principal Component Analysis
# Dims
m = DF_standard.shape[1]
K = m
# PCA (How I tend to set it up)
M_PCA = decomposition.PCA()
A_components = M_PCA.fit_transform(DF_standard)
#DF_standard.shape, A_components.shape
#((150, 4), (150, 4))
But when I use exactly the same approach on my actual dataset, which is (76, 1989) (76 samples and 1989 attributes/dimensions), I get a (76, 76) array instead of (76, 1989):
DF_centered = normalize(DF_mydata, method="center", axis=0)
m = DF_centered.shape[1]
# print(m)
# 1989
M_PCA = decomposition.PCA(n_components=m)
A_components = M_PCA.fit_transform(DF_centered)
DF_centered.shape, A_components.shape
# ((76, 1989), (76, 76))
normalize is just a wrapper I made that subtracts the mean from each dimension.
(Note: this answer is adapted from my answer on Cross Validated here: Why are there only n−1 principal components for n data points if the number of dimensions is larger or equal than n?)
PCA (as most typically run) creates a new coordinate system by:
shifting the origin to the centroid of your data,
squeezing and/or stretching the axes to make them equal in length, and
rotating your axes into a new orientation.
(For more details, see this excellent CV thread: Making sense of principal component analysis, eigenvectors & eigenvalues.) However, step 3 rotates your axes in a very specific way. Your new X1 (now called "PC1", i.e., the first principal component) is oriented in your data's direction of maximal variation. The second principal component is oriented in the direction of the next greatest amount of variation that is orthogonal to the first principal component. The remaining principal components are formed likewise.
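The three steps above can be sketched in NumPy with the singular value decomposition. This is a minimal illustration under stated assumptions: the data are randomly generated and hypothetical, and the scaling step mirrors the StandardScaler used in the iris example rather than anything scikit-learn's PCA does by itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 3-D data: 200 samples, 3 features
X = rng.standard_normal((200, 3)) @ np.array([[3.0, 1.0, 0.0],
                                              [0.0, 1.0, 0.5],
                                              [0.0, 0.0, 0.2]])

# (1) shift the origin to the centroid
Xc = X - X.mean(axis=0)
# (2) rescale each axis to unit variance (standardization)
Xs = Xc / Xc.std(axis=0)
# (3) rotate: the right singular vectors are the principal axes,
#     returned in order of decreasing singular value
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt.T  # coordinates in the new (PC1, PC2, PC3) system

variances = scores.var(axis=0)
print(variances)  # non-increasing: PC1 carries the most variance
```

The columns of scores have non-increasing variance and zero pairwise covariance, which is what "direction of maximal variation" and "orthogonal to the first principal component" amount to numerically.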
With this in mind, let's examine a simple example (suggested by @amoeba in a comment). Here is a data matrix with two points in a three-dimensional space:
$$X = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \end{bmatrix}$$
Let's view these points in a (pseudo) three-dimensional scatterplot:
So let's follow the steps listed above. (1) The origin of the new coordinate system will be located at (1.5,1.5,1.5). (2) The axes are already equal. (3) The first principal component will go diagonally from what used to be (0,0,0) to what was originally (3,3,3), which is the direction of greatest variation for these data. Now, the second principal component must be orthogonal to the first, and should go in the direction of the greatest remaining variation. But what direction is that? Is it from (0,0,3) to (3,3,0), or from (0,3,0) to (3,0,3), or something else? There is no remaining variation, so there cannot be any more principal components.
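With N = 2 data points, we can fit at most N − 1 = 1 principal component. The same limit applies to your data: after centering, 76 samples span at most 75 dimensions, so no more than 75 components can carry any variance, however many features there are. scikit-learn returns min(n_samples, n_features) = 76 components, the last with essentially zero variance, which is why you get a (76, 76) array.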
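A quick numerical check of this argument, assuming only NumPy: the number of principal components with non-zero variance equals the rank of the centered data matrix. The random 76 x 1989 matrix is a hypothetical stand-in for the asker's data.

```python
import numpy as np


def centered_rank(X):
    """Number of principal components with non-negligible variance (rows = samples)."""
    Xc = X - X.mean(axis=0)  # PCA centers the data first
    return np.linalg.matrix_rank(Xc)


# The two-point example from the text
X_toy = np.array([[1.0, 1.0, 1.0],
                  [2.0, 2.0, 2.0]])
print(centered_rank(X_toy))

# Random data with the question's shape: 76 samples, 1989 features
rng = np.random.default_rng(0)
X = rng.standard_normal((76, 1989))
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(len(s))            # at most min(n_samples, n_features) singular values
print(centered_rank(X))  # the last singular value is numerically zero
```

The toy matrix has centered rank 1, and the 76 x 1989 matrix yields 76 singular values but rank 75, which matches the N − 1 bound above.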

Healpy plotting: How do i make a figure with subplots using the healpy.mollview projection?

I've just recently started using healpy, and I can't figure out how to make subplots to contain my maps. I have a thermal emission map of a planet as a function of time, and I need to look at it at several moments in time (let's say 9 different times) and superimpose some coordinates, to check that my planet is rotating the right way.
So far, I can do 2 things:
Make 9 different figures with the superimposed coordinates.
Make a figure with 9 subplots containing 9 different maps, but that superimposes all of my coordinates on all of my subplots, instead of just the time-appropriate ones.
I'm not sure if this is a very simple problem, but it's been driving me crazy and I can't find anything that works.
I'll show you what i mean:
OPTION 1:
import numpy as np
import healpy as hp
import matplotlib.pyplot as plt
MAX = 10**(23)
MIN = 10**10
for i in range(9):
    t = 4000+10*i
    hp.visufunc.mollview(Fmap_wvpix[t,:],
                         title = "Map at t="+str(t), min = MIN, max=MAX)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),2],
                         'k*',markersize = 6)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),2],
                         'r*',markersize = 6)
This makes 9 figures that look pretty much like this:
[image: flux map superimposed with some stars at time = t]
But I need a lot of them, so I want to make a single image that contains 9 subplots like this one.
OPTION 2:
fig = plt.figure(figsize = (10,8))
for i in range(9):
    t = 4000+10*i
    hp.visufunc.mollview(Fmap_wvpix[t,:],
                         title = "Map at t="+str(t), min = MIN, max=MAX,
                         sub = int('33'+str(i+1)))
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),2],
                         'k*',markersize = 6)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),2],
                         'r*',markersize = 6)
This gives me subplots, but it draws all the projplot stars on all of my subplots! (see the following image)
[image: subplots with too many stars]
I know that I need a way to select the axes that holds the time = t map and draw the stars for time = t on that map, but everything I've tried so far has failed. I've mostly tried to use projaxes, thinking I could define a matplotlib axes and draw the stars on it, but it doesn't work. Any advice?
Also, I would like to draw some lines on my map as well, but I can't figure out how to do that either. The documentation says to use projplot, but it won't draw anything unless I tell it to use a marker.
PS: This code is probably useless to you as it won't work if you don't have my arrays. Here's a simpler version that should run:
import numpy as np
import healpy as hp
import matplotlib.pyplot as plt
NSIDE = 8
m = np.arange(hp.nside2npix(NSIDE))*1
MAX = 900
MIN = 0
fig = plt.figure(figsize = (10,8))
for i in range(9):
    t = 4000+10*i
    hp.visufunc.mollview(m+100*i, title = "Map at t="+str(t), min = MIN, max=MAX,
                         sub = int('33'+str(i+1)))
    hp.visufunc.projplot(1.5,0+30*i, 'k*',markersize = 16)
So this is supposed to give me one star for each frame and the star is supposed to be moving. But instead it's drawing all the stars on all the frames.
What can I do? I don't understand the documentation.
If you want to have healpy plots in matplotlib subplots, the following would be the way to go. The key is to use plt.sca() to make the target subplot the active axes and to pass hold=True to the healpy functions, so that they replace the active axes with the projected map.
import healpy as hp
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(ncols=2)
plt.sca(ax1)
hp.mollview(np.random.random(hp.nside2npix(32)), hold=True)
plt.sca(ax2)
hp.mollview(np.arange(hp.nside2npix(32)), hold=True)
I have just encountered this question looking for a solution to the same problem, but managed to find it from the documentation of mollview (here).
As you can see there, sub accepts the same syntax as matplotlib's subplot function:
(number of rows, number of columns, index of the current subplot)
So for your 3x3 plot, the value sub needs in each iteration is
sub=(3,3,i)
where i runs from 1 to 9 (3*3). Since your loop uses range(9), which starts at 0, that means passing sub=(3,3,i+1).
This worked for me. I haven't tried it with your code, but it should work.
Hope this helps!

Count number of objects using watershed algorithm - Scikit-image

I am trying to find the number of objects in a given image using watershed segmentation. Consider, for example, the coins image: here I would like to know the number of coins in the image. I implemented the code available in the scikit-image documentation, tweaked it a little, and got results similar to those displayed on the documentation page.
After looking at the functions used in the code in detail, I found out that ndimage.label() also returns the number of unique objects found in the image (mentioned in its documentation), but when I print that value I get 53, which is far higher than the number of coins in the actual image.
Can somebody suggest a method to find the number of objects in an image?
Here is a version of your code that counts the coins in one of two ways: a) by directly segmenting the distance image and b) by doing watershed first and rejecting tiny intersecting regions.
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, color, filters
from scipy import ndimage
from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from skimage.measure import regionprops, label
image = color.rgb2gray(io.imread('water_coins.jpg'))
image = image < filters.threshold_otsu(image)
distance = ndimage.distance_transform_edt(image)
# Here's one way to measure the number of coins directly
# from the distance map
coin_centres = (distance > 0.8 * distance.max())
print('Number of coins (method 1):', np.max(label(coin_centres)))
# Or you can proceed with the watershed labeling.
# peak_local_max returns peak coordinates; turn them into a boolean marker image
coords = peak_local_max(distance, footprint=np.ones((3, 3)), labels=image)
local_maxi = np.zeros(distance.shape, dtype=bool)
local_maxi[tuple(coords.T)] = True
markers, num_features = ndimage.label(local_maxi)
labels = watershed(-distance, markers, mask=image)
# ...but then you have to clean up the tiny intersections between coins
regions = regionprops(labels)
regions = [r for r in regions if r.area > 50]
print('Number of coins (method 2):', len(regions) - 1)
fig, axes = plt.subplots(ncols=3, figsize=(8, 2.7))
ax0, ax1, ax2 = axes
ax0.imshow(image, cmap=plt.cm.gray, interpolation='nearest')
ax0.set_title('Overlapping objects')
ax1.imshow(-distance, cmap=plt.cm.jet, interpolation='nearest')
ax1.set_title('Distances')
ax2.imshow(labels, cmap=plt.cm.nipy_spectral, interpolation='nearest')
ax2.set_title('Separated objects')
for ax in axes:
    ax.axis('off')
fig.subplots_adjust(hspace=0.01, wspace=0.01, top=1, bottom=0, left=0,
                    right=1)
plt.show()
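To see why watershed is needed at all, here is a self-contained sketch on a synthetic image of two overlapping discs, a hypothetical stand-in for touching coins. It assumes a recent scikit-image in which watershed lives in skimage.segmentation and peak_local_max returns coordinates. Plain ndimage.label counts one connected blob, while watershed seeded from the distance-transform peaks separates the two discs.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Synthetic image: two overlapping discs (hypothetical stand-in for coins)
x, y = np.indices((80, 80))
disc1 = (x - 28) ** 2 + (y - 28) ** 2 < 16 ** 2
disc2 = (x - 44) ** 2 + (y - 52) ** 2 < 20 ** 2
image = disc1 | disc2

# Naive count: the discs touch, so label() sees a single connected blob
_, n_blobs = ndi.label(image)

# Watershed: one marker per distance-transform peak, then flood from the markers
distance = ndi.distance_transform_edt(image)
coords = peak_local_max(distance, footprint=np.ones((3, 3)), labels=image)
peaks = np.zeros(distance.shape, dtype=bool)
peaks[tuple(coords.T)] = True
markers, _ = ndi.label(peaks)
labels = watershed(-distance, markers, mask=image)

n_objects = len(np.unique(labels)) - 1  # drop the background label 0
print(n_blobs, n_objects)
```

On a real photo, the distance map usually has many more small local maxima than there are objects, which is one reason a raw marker count such as the 53 in the question overshoots. Filtering tiny regions, as in method 2 above, or raising min_distance in peak_local_max reduces the spurious markers.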
