Error in metpy.calc.wind_direction with ERA5 data (Dask)

I downloaded the ERA5 U and V wind components and I am using xarray to read the .nc files and select several lat/lon points from the data. Afterwards I need to calculate wind speed and direction using the metpy.calc functions:
ds = xr.open_mfdataset('ERA5_u10_*.nc',combine='by_coords').metpy.parse_cf()
dv = xr.open_mfdataset('ERA5_v10_*.nc',combine='by_coords').metpy.parse_cf()
#direction =mpcalc.wind_direction(ds['u10'],dv['v10'])
#dd=mpcalc.wind_direction(ds.u10,dv.v10)
locations=[]
k=0
u10 = []
v10=[]
speed=[]
direction=[]
u100 = []
v100=[]
for i, j in zip(lats, lons):
    stations_u = ds.sel(longitude=j, latitude=i, method='nearest')
    stations_v = dv.sel(longitude=j, latitude=i, method='nearest')
    u10.append(stations_u.u10)  # change variable
    v10.append(stations_v.v10)  # change variable
    speed.append(mpcalc.wind_speed(stations_u.u10, stations_v.v10))
    direction.append(mpcalc.wind_direction(stations_u.u10, stations_v.v10))
Wind speed works with no problem, but wind direction raises an error:
runfile('/mnt/data1/ERA5/wind/untitled0.py', wdir='/mnt/data1/ERA5/wind')
Found valid latitude/longitude coordinates, assuming latitude_longitude for projection grid_mapping variable
Found valid latitude/longitude coordinates, assuming latitude_longitude for projection grid_mapping variable
Traceback (most recent call last):
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/dask/array/core.py", line 1615, in __setitem__
y = where(key, value, self)
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/dask/array/routines.py", line 1382, in where
return elemwise(np.where, condition, x, y)
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/dask/array/core.py", line 4204, in elemwise
broadcast_shapes(*shapes)
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/dask/array/core.py", line 4165, in broadcast_shapes
"shapes {0}".format(" ".join(map(str, shapes)))
ValueError: operands could not be broadcast together with shapes (262968,) (nan,) (262968,)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/data1/ERA5/wind/untitled0.py", line 40, in <module>
direction.append(mpcalc.wind_direction(stations_u.u10,stations_v.v10))
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/metpy/xarray.py", line 1206, in wrapper
result = func(*bound_args.args, **bound_args.kwargs)
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/metpy/units.py", line 246, in wrapper
return func(*args, **kwargs)
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/metpy/calc/basic.py", line 104, in wind_direction
wdir[mask] += 360. * units.deg
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/pint/quantity.py", line 1868, in __setitem__
self._magnitude[key] = factor.magnitude
File "/usr/local/python/anaconda3/envs/my_env/lib/python3.6/site-packages/dask/array/core.py", line 1622, in __setitem__
) from e
ValueError: Boolean index assignment in Dask expects equally shaped arrays.
Example: da1[da2] = da3 where da1.shape == (4,), da2.shape == (4,) and da3.shape == (4,).
If I try to calculate the wind direction with only one element in U and V, it surprisingly works:
mpcalc.wind_direction(stations_u.u10[0],stations_v.v10[0]).values
Out[12]: array(173.41553, dtype=float32)

metpy.calc.wind_direction is unfortunately known not to work with Dask arrays; in fact, many places in MetPy don't work well with Dask, though we definitely want them to. For now, to use wind_direction you'll need to turn stations_u, etc. into standard numpy arrays using e.g. .compute().
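For illustration, a minimal sketch of that workaround using the variable names from the question, plus a lazy alternative (not MetPy's API, just the standard meteorological formula) built from np.arctan2, which Dask evaluates elementwise:
import numpy as np
# materialize the dask-backed DataArrays as numpy before calling MetPy
direction = mpcalc.wind_direction(stations_u.u10.compute(), stations_v.v10.compute())
# or stay lazy: the direction the wind blows *from*, in degrees in [0, 360)
direction_lazy = np.degrees(np.arctan2(-stations_u.u10, -stations_v.v10)) % 360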

Thanks for your reply! Yeah, it makes sense...
The DataArrays I am working with are very big, so I need to keep them as DataArrays: copying them to plain numpy arrays would use more memory than I have.
I eventually tried the calculation with stations_u as both u and v, and surprisingly it works:
mpcalc.wind_direction(stations_u.u10, stations_u.u10)
Out[7]:
<xarray.DataArray 'sub-404afe706bcfce1f36810c70766c0647' (time: 262968)>
<Quantity(dask.array<sub, shape=(262968,), dtype=float32, chunksize=(87672,), chunktype=numpy.ndarray>, 'degree')>
Coordinates:
    longitude  float32 -7.1
    latitude   float32 43.8
  * time       (time) datetime64[ns] 1990-01-01 ... 2019-12-31T23:00:00
Why does it work in this case? Isn't metpy.calc.wind_direction using a dask array here too?
I feel that there's something wrong with the attribute names, units, and Pint, but I can't understand what it is...

Related

Is np.linalg.solve() not working for AutoDiff?

Does np.linalg.solve() not work for AutoDiff? I use it to solve the manipulator equation; the error message is shown below.
A similar "double" (plain float) version of the code runs with no issue. Please tell me how to fix it, thanks!
### here is the error message
vdot_ad = np.linalg.solve(M_,ggg_ad)
File "<__array_function__ internals>", line 5, in solve
File "/usr/local/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 394, in solve
r = gufunc(a, b, signature=signature, extobj=extobj)
TypeError: No loop matching the specified signature and casting was found for ufunc solve1
#### here is the code
plant = MultibodyPlant(time_step= 0.01)
parser = Parser(plant)
parser.AddModelFromFile("double_pendulum.sdf")
plant.Finalize()
plant_autodiff = plant.ToAutoDiffXd()
####### <AutoDiff> get the error message
xu = np.hstack((x, u))
xu_ad = initializeAutoDiff(xu)[:,0]
x_ad = xu_ad[:4]
q_ad = x_ad[:2]
v_ad = x_ad[2:4]
u_ad = xu_ad[4:]
(M_, Cv_, tauG_, B_, tauExt_) = ManipulatorDynamics(plant_autodiff, q_ad, v_ad)
vdot_ad = np.linalg.solve(M_,tauG_ + np.dot(B_,u_ad) - np.dot(Cv_,v_ad))
Note that in pydrake, AutoDiffXd scalars are exposed to NumPy using dtype=object.
There are some drawbacks to this approach, like what you have run into now.
This is not necessarily an issue with Drake, but a limitation of NumPy itself, given the ufuncs implemented in the (super old) version that ships with 18.04.
To illustrate, here is what I see on Ubuntu 18.04, CPython 3.6.9, NumPy 1.13.3:
>>> import numpy as np
>>> A = np.eye(2)
>>> b = np.array([1, 2])
>>> np.linalg.solve(A, b)
array([ 1., 2.])
>>> A = A.astype(object)
>>> b = b.astype(object)
>>> np.linalg.solve(A, b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 375, in solve
r = gufunc(a, b, signature=signature, extobj=extobj)
TypeError: No loop matching the specified signature and casting
was found for ufunc solve1
The most direct solution would be to expose an analogous routine in pydrake and have users leverage that.
That is what we had to do for np.linalg.inv as well:
https://github.com/RobotLocomotion/drake/pull/11173/files
Not the best solution :( However, it's simple enough!
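In the meantime, a pure-Python solve can sidestep the missing ufunc loop entirely. Below is a minimal sketch (solve_object is a hypothetical helper, not pydrake API) of Gaussian elimination without pivoting, which only needs the +, -, *, / operators that AutoDiffXd scalars support; skipping pivoting is assumed acceptable here since a manipulator mass matrix is symmetric positive definite:
import numpy as np
def solve_object(A, b):
    '''Solve A x = b for square dtype=object arrays via Gaussian elimination.'''
    A = np.array(A, dtype=object)  # work on copies
    b = np.array(b, dtype=object)
    n = len(b)
    # forward elimination (no pivoting; assumed fine for an SPD mass matrix)
    for k in range(n):
        for i in range(k + 1, n):
            f = A[i, k] / A[k, k]
            for j in range(k, n):
                A[i, j] = A[i, j] - f * A[k, j]
            b[i] = b[i] - f * b[k]
    # back substitution
    x = np.empty(n, dtype=object)
    for i in range(n - 1, -1, -1):
        s = b[i]
        for j in range(i + 1, n):
            s = s - A[i, j] * x[j]
        x[i] = s / A[i, i]
    return x
# hypothetical usage with the arrays from the question:
# vdot_ad = solve_object(M_, tauG_ + np.dot(B_, u_ad) - np.dot(Cv_, v_ad))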

Apply function along time dimension of XArray

I have an image stack stored in an xarray DataArray with dimensions time, x, y, on which I'd like to apply a custom function along the time axis of each pixel, such that the output is a single image of dimensions x, y.
I have tried apply_ufunc, but the function fails, stating that I need to first load the data into RAM (i.e. it cannot use a Dask array). Ideally, I'd like to keep the DataArray backed by Dask arrays internally, as it isn't possible to load the entire stack into RAM. The exact error message is:
ValueError: apply_ufunc encountered a dask array on an argument, but handling for dask arrays has not been enabled. Either set the dask argument or load your data into memory first with .load() or .compute()
My code currently looks like this:
import numpy as np
import xarray as xr
import pandas as pd
def special_mean(x, drop_min=False):
    s = np.sum(x)
    n = len(x)
    if drop_min:
        s = s - x.min()
        n -= 1
    return s / n
times = pd.date_range('2019-01-01', '2019-01-10', name='time')
data = xr.DataArray(np.random.rand(10, 8, 8), dims=["time", "y", "x"], coords={'time': times})
data = data.chunk({'time':10, 'x':1, 'y':1})
res = xr.apply_ufunc(special_mean, data, input_core_dims=[["time"]], kwargs={'drop_min': True})
If I do load the data into RAM using .compute, I still end up with an error which states:
ValueError: applied function returned data with unexpected number of dimensions: 0 vs 2, for dimensions ('y', 'x')
I'm not entirely sure what I am missing/doing wrong.
def special_mean(x, drop_min=False):
    s = np.sum(x)
    n = len(x)
    if drop_min:
        s = s - x.min()
        n -= 1
    return s / n

times = pd.date_range('2019-01-01', '2019-01-10', name='time')
data = xr.DataArray(np.random.rand(10, 8, 8), dims=["time", "y", "x"], coords={'time': times})
data = data.chunk({'time': 10, 'x': 1, 'y': 1})
res = xr.apply_ufunc(special_mean, data, input_core_dims=[["time"]], kwargs={'drop_min': True}, dask='allowed', vectorize=True)
The code above, with the vectorize argument, should work.
My aim was also to use xarray's apply_ufunc to compute the special mean across x and y.
I enjoyed Ale's example, of course omitting the line with the chunk call; otherwise:
ValueError: applied function returned data with unexpected number of dimensions. Received 0 dimension(s) but expected 2 dimensions with names: ('y', 'x')
Interestingly, I realized that in some situations, to have the output of apply_ufunc be 3D instead of 2D, we need to add output_core_dims=[["time"]] to the apply_ufunc call, as in the sketch below.
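For illustration, a minimal sketch of that (assuming the un-chunked data from the example above, and a hypothetical per-pixel function demean that returns a full time series, so the time dimension survives):
def demean(x):
    # returns a series the same length as its 1-D input
    return x - x.mean()
res3d = xr.apply_ufunc(demean, data, input_core_dims=[["time"]], output_core_dims=[["time"]], vectorize=True)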

Sklearn LabelEncoder throws TypeError in sort

I am learning machine learning using the Titanic dataset from Kaggle. I am using sklearn's LabelEncoder to transform text data to numeric labels. The following code works fine for "Sex" but not for "Embarked":
encoder = preprocessing.LabelEncoder()
features["Sex"] = encoder.fit_transform(features["Sex"])
features["Embarked"] = encoder.fit_transform(features["Embarked"])
This is the error I got:
Traceback (most recent call last):
File "../src/script.py", line 20, in <module>
features["Embarked"] = encoder.fit_transform(features["Embarked"])
File "/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 131, in fit_transform
self.classes_, y = np.unique(y, return_inverse=True)
File "/opt/conda/lib/python3.6/site-packages/numpy/lib/arraysetops.py", line 211, in unique
perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
TypeError: '>' not supported between instances of 'str' and 'float'
I solved it myself. The problem was that this particular feature had NaN values. Replacing them with a numerical value would still throw an error, since the column would then mix data types, so I replaced them with a string value:
features["Embarked"] = encoder.fit_transform(features["Embarked"].fillna('0'))
Try this function; you'll need to pass it a Pandas DataFrame. It will look at the type of each column and encode the object (string) ones, so you won't even need to check the types yourself.
def encoder(data):
    '''Map the categorical variables to numbers to work with scikit-learn'''
    for col in data.columns:
        if data.dtypes[col] == "object":
            le = preprocessing.LabelEncoder()
            le.fit(data[col])
            data[col] = le.transform(data[col])
    return data
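Note that LabelEncoder will still fail on NaNs inside this loop, so it combines naturally with the fillna fix above (a hypothetical usage sketch; 'missing' is an arbitrary placeholder label):
from sklearn import preprocessing
features = encoder(features.fillna({'Embarked': 'missing'}))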

Keeping zeros in data with sklearn

I have a CSV dataset that I'm trying to use with sklearn. The goal is to predict future web traffic. However, my dataset contains zeros on days when there were no visitors, and I'd like to keep those values. There are more days with zero visitors than there are with visitors (it's a tiny, tiny site). Here's a look at the data.
Col1 is the date:
10/1/11
10/2/11
10/3/11
etc....
Col2 is the # of visitors:
12
1
0
0
1
5
0
0
etc....
sklearn seems to interpret the zero values as NaN values, which is understandable. How can I use those zero values in a logistic function (is that even possible)?
Update:
The estimator is https://github.com/facebookincubator/prophet and when I run the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from fbprophet import Prophet

df = pd.read_csv('~/tmp/datafile.csv')
df['y'] = np.log(df['y'])
df.head()
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=365)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
m.plot(forecast)
m.plot_components(forecast)
plt.show()
I get the following:
growthprediction.py:7: RuntimeWarning: divide by zero encountered in log
df['y'] = np.log(df['y'])
/usr/local/lib/python3.6/site-packages/fbprophet/forecaster.py:307: RuntimeWarning: invalid value encountered in double_scalars
k = (df['y_scaled'].ix[i1] - df['y_scaled'].ix[i0]) / T
Traceback (most recent call last):
File "growthprediction.py", line 11, in <module>
m.fit(df);
File "/usr/local/lib/python3.6/site-packages/fbprophet/forecaster.py", line 387, in fit
params = model.optimizing(dat, init=stan_init, iter=1e4)
File "/usr/local/lib/python3.6/site-packages/pystan/model.py", line 508, in optimizing
ret, sample = fit._call_sampler(stan_args)
File "stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.pyx", line 804, in stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.StanFit4Model._call_sampler (/var/folders/ym/m6j7kw0d3kj_0frscrtp58800000gn/T/tmp5wq7qltr/stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.cpp:16585)
File "stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.pyx", line 398, in stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666._call_sampler (/var/folders/ym/m6j7kw0d3kj_0frscrtp58800000gn/T/tmp5wq7qltr/stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.cpp:8818)
RuntimeError: k initialized to invalid value (nan)
In this line of your code:
df['y'] = np.log(df['y'])
you are taking the logarithm of 0 wherever df['y'] is zero, which results in the warnings and NaNs in your dataset, because the logarithm of 0 is not defined.
sklearn itself does NOT interpret zero values as NaNs unless you replace them with NaNs in your preprocessing.
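A common workaround (not part of the answer above, just a standard transform) is to fit on log1p, which is defined at zero, and invert the predictions with expm1:
df['y'] = np.log1p(df['y'])  # log(1 + y) is 0 at y == 0 instead of -inf
m = Prophet()
m.fit(df)
forecast = m.predict(m.make_future_dataframe(periods=365))
forecast['yhat'] = np.expm1(forecast['yhat'])  # back to the original scale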

Facing an error in an image capturing and processing program using OpenCV (cv2) and Python

The following program captures and processes images at runtime, but I am facing a lot of problems with the code. The first problem is that when the camera is initialized for the first time and is unable to detect red colour in the captured frame, it gives the following error.
Traceback (most recent call last):
File "/home/mukund/Desktop/Checknewcod/attempone.py", line 23, in <module>
M=cv2.moments(best_cnt)
NameError: name 'best_cnt' is not defined
Sometimes it gives the following error.
Traceback (most recent call last):
File "/home/mukund/Desktop/New logic/GridTestcal1.py", line 287, in <module>
fun2()
File "/home/mukund/Desktop/New logic/GridTestcal1.py", line 230, in fun2
M=cv2.moments(best_cnt)
UnboundLocalError: local variable 'best_cnt' referenced before assignment
The code is as follows:
import cv2
import numpy as np

capture = cv2.VideoCapture(1)
num = 0
while True:
    flag, img2 = capture.read()
    flag = capture.set(3, 640)
    flag = capture.set(4, 480)
    cv2.imwrite('pic' + str(num) + '.jpg', img2)
    img1 = cv2.imread('pic' + str(num) + '.jpg')
    img = cv2.blur(img1, (3, 3))
    ORANGE_MIN = np.array([170, 160, 60], np.uint8)
    ORANGE_MAX = np.array([180, 255, 255], np.uint8)
    hsv_img = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
    frame_thresh = cv2.inRange(hsv_img, ORANGE_MIN, ORANGE_MAX)
    contours, hierarchy = cv2.findContours(frame_thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    x_area = 0
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > x_area:
            x_area = area
            best_cnt = cnt
    M = cv2.moments(best_cnt)
    cx, cy = int(M['m10'] / M['m00']), int(M['m01'] / M['m00'])
    cv2.circle(frame_thresh, (cx, cy), 5, 255, -1)
    cv2.imwrite('pic1' + str(num) + '.jpg', frame_thresh)
    print(cx, cy)
    if num == 30:
        capture.release()
        break
    num += 1
Is there any more efficient way to implement the above code, such as processing the video stream directly?
For the error, you should check that the number of returned contours is > 0.
In some cases no contours are detected, so best_cnt is never set; the error is telling you that you are referencing the variable before defining it.
for cnt in contours:
    area = cv2.contourArea(cnt)
    if area > x_area:  # I believe your code is indented wrongly here
        x_area = area
        best_cnt = cnt
if len(contours) > 0:  # in some cases no contours are detected, so best_cnt is not set
    M = cv2.moments(best_cnt)
    cx, cy = int(M['m10'] / M['m00']), int(M['m01'] / M['m00'])
    cv2.circle(frame_thresh, (cx, cy), 5, 255, -1)
    cv2.imwrite('pic1' + str(num) + '.jpg', frame_thresh)
    print(cx, cy)
if num == 30:
    capture.release()
    break
num += 1
The bulk of the processing time for your code, I believe, is spent reading and writing the image files. If you are getting the frames from a video camera, things might be slightly better, especially if its frame rate is high. If you write out only every k-th frame, it might help avoid dropping frames.
It would also help if you narrowed down a region of interest (ROI) and only performed the image-processing steps such as cvtColor, inRange, and findContours on it.
A minor tweak would be to move the declaration of ORANGE_MIN/MAX outside of the while loop, as in the sketch below.
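Putting these suggestions together, here is a minimal sketch under the question's assumptions (camera index 1, the same HSV range; note cv2.findContours returns two or three values depending on the OpenCV version, and this matches the question's two-value form):
import cv2
import numpy as np

# constants hoisted out of the loop, no per-frame disk round-trip,
# and a guard before best_cnt is used
ORANGE_MIN = np.array([170, 160, 60], np.uint8)
ORANGE_MAX = np.array([180, 255, 255], np.uint8)

capture = cv2.VideoCapture(1)
capture.set(3, 640)  # frame width
capture.set(4, 480)  # frame height

for num in range(31):
    flag, frame = capture.read()
    if not flag:
        break
    hsv = cv2.cvtColor(cv2.blur(frame, (3, 3)), cv2.COLOR_BGR2HSV)
    frame_thresh = cv2.inRange(hsv, ORANGE_MIN, ORANGE_MAX)
    contours, hierarchy = cv2.findContours(frame_thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        best_cnt = max(contours, key=cv2.contourArea)  # largest contour
        M = cv2.moments(best_cnt)
        if M['m00'] != 0:  # avoid division by zero for degenerate contours
            cx, cy = int(M['m10'] / M['m00']), int(M['m01'] / M['m00'])
            cv2.circle(frame_thresh, (cx, cy), 5, 255, -1)
            print(cx, cy)
    cv2.imwrite('pic1' + str(num) + '.jpg', frame_thresh)
capture.release()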