I would like to use dask.array.map_overlap with a SciPy interpolation function. However, I keep running into errors that I cannot understand, and I am hoping someone can explain them to me.
Here is the error message I receive when I run .compute():
ValueError: could not broadcast input array from shape (1070,0) into shape (1045,0)
To investigate, I used .to_delayed() to check the output of each partition, and this is what I found.
My Python code follows.
Step 1. Load the netCDF file with xarray, then convert to dask arrays with chunk size (400, 400)
df = xr.open_dataset('./Brazil Sentinal2 Tile/' + data_file +'.nc')
lon, lat = df['lon'].data, df['lat'].data
slon = da.from_array(df['lon'], chunks=(400,400))
slat = da.from_array(df['lat'], chunks=(400,400))
data = da.from_array(df.isel(band=0).__xarray_dataarray_variable__.data, chunks=(400,400))
Step 2. Declare a function to pass to da.map_overlap
def sumsum2(lon, lat, data, hex_res=10):
    hex_col = 'hex' + str(hex_res)
    lon_max, lon_min = lon.max(), lon.min()
    lat_max, lat_min = lat.max(), lat.min()
    b = box(lon_min, lat_min, lon_max, lat_max, ccw=True)
    b = transform(lambda x, y: (y, x), b)
    b = mapping(b)
    target_df = pd.DataFrame(h3.polyfill(b, hex_res), columns=[hex_col])
    target_df['lat'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[0])
    target_df['lon'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[1])
    tlon, tlat = target_df[['lon', 'lat']].values.T
    abc = lNDI(points=(lon.ravel(), lat.ravel()),
               values=data.ravel())(tlon, tlat)
    target_df['out'] = abc
    print(np.stack([tlon, tlat, abc], axis=1).shape)
    return np.stack([tlon, tlat, abc], axis=1)
Step 3. Apply da.map_overlap
b = da.map_overlap(sumsum2, slon[:1200,:1200], slat[:1200,:1200], data[:1200,:1200], depth=10, trim=True, boundary=None, align_arrays=False, dtype='float64',
)
Step 4. Use to_delayed() to check the output shapes
print(b.to_delayed().flatten()[0].compute().shape, )
print(b.to_delayed().flatten()[1].compute().shape)
(1065, 3)
(1045, 0)
(1090, 3)
(1070, 0)
This says that the blocks returned by da.map_overlap have no columns (shapes (1045, 0) and (1070, 0)), while the arrays my function returns are 2-D with three columns (shapes (1065, 3) and (1090, 3)).
In addition, if I turn off the trim argument:
c = da.map_overlap(sumsum2,
slon[:1200,:1200],
slat[:1200,:1200],
data[:1200,:1200],
depth=10,
trim=False,
boundary=None,
align_arrays=False,
dtype='float64',
)
print(c.to_delayed().flatten()[0].compute().shape, )
print(c.to_delayed().flatten()[1].compute().shape)
The output becomes
(1065, 3)
(1065, 3)
(1090, 3)
(1090, 3)
Does this mean that with trim=True everything gets cut out?
It seems so, because:
#-- print out the values
b.to_delayed().flatten()[0].compute()[:10,:]
(1065, 3)
array([], shape=(1045, 0), dtype=float64)
while...
#-- print out the values
c.to_delayed().flatten()[0].compute()[:10,:]
array([[ -47.83683837, -18.98359832, 1395.01848583],
[ -47.8482856 , -18.99038681, 2663.68391094],
[ -47.82800624, -18.99207069, 1465.56517187],
[ -47.81897323, -18.97919009, 2769.91556363],
[ -47.82066663, -19.00712956, 1607.85927095],
[ -47.82696896, -18.97167714, 2110.7516765 ],
[ -47.81562653, -18.98302933, 2662.72112163],
[ -47.82176881, -18.98594465, 2201.83205114],
[ -47.84567 , -18.97512514, 1283.20631652],
[ -47.84343568, -18.97270783, 1282.92117225]])
Any thoughts on this?
Thank you.
I think I found the answer. Please let me know if I am wrong.
I cannot use trim=True because my function changes the shape of the output array (after searching online, I learned that the output array should have the same shape as the input array). Because the shape changes, dask does not know how to trim it and returns an empty array (weird).
With trim=False, no buffer zone is cut away, so the return values come through intact. (I still don't know why dask cannot concatenate the chunked arrays, but I believe that is also related to shape.)
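The empty result is consistent with simple arithmetic, assuming that trimming removes depth elements from both ends of every axis of each returned block. That is an assumption which matches the shapes reported in the question, not a statement about dask's actual implementation. The helper trimmed_shape below is hypothetical and only reproduces that arithmetic:

```python
def trimmed_shape(shape, depth):
    """Shape left after cutting `depth` elements off both ends of every axis.

    Hypothetical helper: it mimics the assumed trimming arithmetic only,
    it does not call dask.
    """
    return tuple(max(n - 2 * depth, 0) for n in shape)


# Blocks returned by the function in the question, with depth=10:
first_block = trimmed_shape((1065, 3), 10)
second_block = trimmed_shape((1090, 3), 10)
print(first_block)   # (1045, 0)
print(second_block)  # (1070, 0)

# A block that keeps the padded input shape trims back to the chunk size:
same_shape_block = trimmed_shape((420, 420), 10)
print(same_shape_block)  # (400, 400)
```

With only three columns, cutting 10 from each side of the second axis leaves nothing, which would explain the zero-width arrays.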
The solution is to wrap da.concatenate in a delayed function, where e is the array returned by map_overlap with trim=False:
delayed(da.concatenate)([e.to_delayed().flatten()[idx] for idx in range(len(e.to_delayed().flatten()))])
In this case, we do not rely on the concatenation inside map_overlap but use our own to combine the outputs we want.
I want to return every value up to and including some key.
Whilst I could generate every such key and chuck them all into the Get, I suspect this will inefficiently search for the value of every key.
Inspired by this answer, I have come up with the following:
let getAllUpTo key (frame:Frame<'key,'col>) : Frame<'key, 'col> =
    let endRng = frame.RowIndex.Locate key
    let startRng = frame.RowIndex.KeyRange |> fst |> frame.RowIndex.Locate
    let fixedRange = RangeRestriction.Fixed (startRng, endRng)
    frame.GetAddressRange fixedRange
Is there a built-in method for doing this efficiently?
If you want to access a sub-range of a data frame with a specified starting/ending key, you can do this using the df.Rows.[ ... ] indexer. Say we have some data indexed by (sorted) dates:
let s1 = series [
  let rnd = Random()
  for d in 0 .. 365 ->
    DateTime(2020, 1, 1).AddDays(float d) => rnd.Next() ]
let df = frame [ "S1" => s1 ]
To get a part of the data frame starting/ending on a specific date, you can use:
// Get all rows from 1 June (inclusive)
df.Rows.[DateTime(2020, 6, 1) ..]
// Get all rows until 1 June (inclusive)
df.Rows.[.. DateTime(2020, 6, 1)]
The API you are using is essentially what this does under the covers, but those are very low-level operations that you do not typically need in user code.
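For readers who want to see the idea behind range lookups on a sorted index, here is a minimal Python sketch. It is not Deedle code and does not claim to mirror its internals; rows_up_to is a hypothetical helper that locates the end of the range with a binary search instead of looking up each key individually.

```python
from bisect import bisect_right
from datetime import date, timedelta

# Sorted daily keys for 2020 (a leap year), with placeholder values.
keys = [date(2020, 1, 1) + timedelta(days=d) for d in range(366)]
values = list(range(366))


def rows_up_to(keys, values, key):
    """Return (key, value) pairs up to and including `key`.

    Hypothetical helper: assumes `keys` is sorted ascending.
    """
    end = bisect_right(keys, key)  # one O(log n) search for the range end
    return list(zip(keys[:end], values[:end]))


rows = rows_up_to(keys, values, date(2020, 6, 1))
print(len(rows))    # 153
print(rows[-1][0])  # 2020-06-01
```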
A = [ [1,2,3],[4,5,6]].
B = [ [a,b,c],[d,e,f]].
The output should be:
[ [{1,a},{2,b},{3,c}],[{4,d},{5,e},{6,f}]].
This is what I have got so far.
Input: [ [{Y} || Y<-X ] || X<-A].
Output: [[{1},{2},{3}],[{4},{5},{6}]]
I think this is what you need:
[lists:zip(LA, LB) || {LA, LB} <- lists:zip(A, B)].
You need to zip both lists to be able to work with their elements together.
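For comparison, the same nested-zip idea can be sketched in Python (not Erlang): the outer zip pairs up the sublists, and the inner zip pairs up their elements. The variable names are illustrative only.

```python
a = [[1, 2, 3], [4, 5, 6]]
b = [["a", "b", "c"], ["d", "e", "f"]]

# Outer zip walks the two lists of sublists in step;
# inner zip pairs the elements of each matching sublist.
paired = [list(zip(row_a, row_b)) for row_a, row_b in zip(a, b)]
print(paired)
# [[(1, 'a'), (2, 'b'), (3, 'c')], [(4, 'd'), (5, 'e'), (6, 'f')]]
```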
I am trying to use the rCharts package to create interactive graphs. I followed the tutorial examples but could not get the nPlot method to work. It did not return an error, but I did not get the figure shown in the tutorial.
require(devtools)
install_github('ramnathv/rCharts')
library(rCharts)
a<-as.data.frame(HairEyeColor)
hair_eye_male<-subset(a, Sex=="Male")
is.data.frame(hair_eye_male)
n1<-nPlot(Freq~Hair, group='Eye', data=hair_eye_male, type="multiBarChart")
n1
hair_eye = as.data.frame(HairEyeColor)
p2 <- nPlot(Freq ~ Hair, group = 'Eye',
data = subset(hair_eye, Sex == "Female"),
type = 'multiBarChart'
)
p2$chart(color = c('brown', 'blue', '#594c26', 'green'))
p2
In the Environment panel, n1 and p2 appeared as Environment. I tried rChart and it works.
Best regards
devices :[1.1:Acer C6, 2:Acer C6, 1:Acer C6, 2.2:HTC Magic]
files :[2:Tetris.apk, 1:TheSims3.apk]
I have a mapping of files and devices; as of now it is a one-to-many mapping.
Now I need to implement a many-to-many mapping.
My logic for the one-to-many mapping is:
mapping = params.devices.inject( [:] ) { map, dev ->
// Get the first part of the version (up to the first dot)
def v = dev.key.split( /\./ )[ 0 ]
logger.debug("value of v :"+v)
map << [ (dev.value): files[ v ] ]
}
Current output: mapping :[Acer C6:Tetris.apk, HTC Magic:Tetris.apk]
Expected output: [Acer C6:Tetris.apk, Acer C6:TheSims3.apk, HTC Magic:Tetris.apk]
You are accumulating your results using the device name as a key. When a new value is added to the map, it overwrites the last one with the same key.
You could try accumulating into a Set instead of a map. Example:
def devices = ['1.1': 'Acer C6', '2': 'Acer C6', '1': 'Acer C6', '2.2': 'HTC Magic']
def files = ['2': 'Tetris.apk', '1': 'TheSims3.apk']
def deviceFiles = devices.inject([] as Set) { deviceFiles, device ->
def v = device.key.split( /\./ )[0]
deviceFiles << [ (device.value), files[ v ] ]
}
assert deviceFiles == [
['Acer C6', 'Tetris.apk'],
['Acer C6', 'TheSims3.apk'],
['HTC Magic', 'Tetris.apk']
] as Set
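The same overwrite-versus-accumulate distinction can be sketched in Python (not Groovy), using the data from the question. The names by_name and pairs are illustrative only: a dict keyed by device name keeps one file per device, while a set of (device, file) pairs keeps every distinct combination.

```python
devices = {"1.1": "Acer C6", "2": "Acer C6", "1": "Acer C6", "2.2": "HTC Magic"}
files = {"2": "Tetris.apk", "1": "TheSims3.apk"}

# Keyed by device name: each assignment replaces the previous file
# stored under the same name, so only one file per device survives.
by_name = {}
for version, name in devices.items():
    major = version.split(".")[0]  # part of the version before the first dot
    by_name[name] = files[major]

# Set of (device, file) pairs: duplicates collapse, distinct pairs are kept.
pairs = {(name, files[version.split(".")[0]]) for version, name in devices.items()}

print(len(by_name))  # 2
print(len(pairs))    # 3
```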