I would like to use dask.array.map_overlap to apply a scipy interpolation function. However, I keep running into errors that I cannot understand, and I am hoping someone can explain them to me.
Here is the error message I get when I run .compute().
ValueError: could not broadcast input array from shape (1070,0) into shape (1045,0)
To debug this, I used .to_delayed() to check the output of each block, and this is what I found.
Here is my Python code.
Step 1. Load netCDF file through Xarray, and then output to dask.array with chunk size (400,400)
df = xr.open_dataset('./Brazil Sentinal2 Tile/' + data_file +'.nc')
lon, lat = df['lon'].data, df['lat'].data
slon = da.from_array(df['lon'], chunks=(400,400))
slat = da.from_array(df['lat'], chunks=(400,400))
data = da.from_array(df.isel(band=0).__xarray_dataarray_variable__.data, chunks=(400,400))
Step 2. declare a function for da.map_overlap use
def sumsum2(lon,lat,data, hex_res=10):
    hex_col = 'hex' + str(hex_res)
    lon_max, lon_min = lon.max(), lon.min()
    lat_max, lat_min = lat.max(), lat.min()
    b = box(lon_min, lat_min, lon_max, lat_max, ccw=True)
    b = transform(lambda x, y: (y, x), b)
    b = mapping(b)
    target_df = pd.DataFrame(h3.polyfill( b, hex_res), columns=[hex_col])
    target_df['lat'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[0])
    target_df['lon'] = target_df[hex_col].apply(lambda x: h3.h3_to_geo(x)[1])
    tlon, tlat = target_df[['lon','lat']].values.T
    abc = lNDI(points=(lon.ravel(), lat.ravel()),
               values= data.ravel())(tlon,tlat)
    target_df['out'] = abc
    print(np.stack([tlon, tlat, abc],axis=1).shape)
    return np.stack([tlon, tlat, abc],axis=1)
Step 3. Apply the da.map_overlap
b = da.map_overlap(sumsum2, slon[:1200,:1200], slat[:1200,:1200], data[:1200,:1200], depth=10, trim=True, boundary=None, align_arrays=False, dtype='float64',
)
Step 4. Using to_delayed() to test output shape
print(b.to_delayed().flatten()[0].compute().shape, )
print(b.to_delayed().flatten()[1].compute().shape)
(1065, 3)
(1045, 0)
(1090, 3)
(1070, 0)
This shows that the blocks coming out of da.map_overlap are empty along the second axis (shapes (1045, 0) and (1070, 0)), while the function itself returns arrays of shape (1065, 3) and (1090, 3), as the print inside the function shows.
In addition, if I turn off the trim argument, as in
c = da.map_overlap(sumsum2,
slon[:1200,:1200],
slat[:1200,:1200],
data[:1200,:1200],
depth=10,
trim=False,
boundary=None,
align_arrays=False,
dtype='float64',
)
print(c.to_delayed().flatten()[0].compute().shape, )
print(c.to_delayed().flatten()[1].compute().shape)
The output becomes
(1065, 3)
(1065, 3)
(1090, 3)
(1090, 3)
Does this mean that with trim=True everything gets cut out?
Because...
#-- print out the values
b.to_delayed().flatten()[0].compute()[:10,:]
(1065, 3)
array([], shape=(1045, 0), dtype=float64)
while...
#-- print out the values
c.to_delayed().flatten()[0].compute()[:10,:]
array([[ -47.83683837, -18.98359832, 1395.01848583],
[ -47.8482856 , -18.99038681, 2663.68391094],
[ -47.82800624, -18.99207069, 1465.56517187],
[ -47.81897323, -18.97919009, 2769.91556363],
[ -47.82066663, -19.00712956, 1607.85927095],
[ -47.82696896, -18.97167714, 2110.7516765 ],
[ -47.81562653, -18.98302933, 2662.72112163],
[ -47.82176881, -18.98594465, 2201.83205114],
[ -47.84567 , -18.97512514, 1283.20631652],
[ -47.84343568, -18.97270783, 1282.92117225]])
Any thoughts for this?
Thank You.
I think I found the answer. Please let me know if I am wrong.
I cannot use trim=True because my function changes the shape of the output array (after searching the internet, I noticed that the output of each block should have the same shape as its input). Since I change the shape, dask does not know how to handle it, so it returns an empty array to me (weird).
With trim=False instead, I am not asking dask to cut out the buffer zone, so the return values come through intact. (I still don't know why dask cannot concatenate the chunked arrays, but I believe this is also related to the shape.)
The solution is to wrap da.concatenate in delayed:
delayed(da.concatenate)([e.to_delayed().flatten()[idx] for idx in range(len(e.to_delayed().flatten()))])
In this case, we are not relying on the concat function in map_overlap but use our own concat to combine the outputs we want.
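The empty arrays are consistent with trimming being applied to whatever the function returns. A minimal sketch in plain Python, assuming that trim=True strips depth elements from both ends of every axis of each returned block (trimmed_shape is a hypothetical helper for illustration, not a dask function, and this is not a description of dask's internals):

```python
def trimmed_shape(shape, depth):
    """Shape left after removing `depth` elements from both ends of every axis.

    Assumption: this mirrors what trim=True does to each returned block.
    """
    return tuple(max(size - 2 * depth, 0) for size in shape)


# Shapes returned by the mapped function in the question, with depth=10.
for returned in [(1065, 3), (1090, 3)]:
    print(returned, "->", trimmed_shape(returned, 10))
```

Under that assumption, a (1065, 3) block becomes (1045, 0) and a (1090, 3) block becomes (1070, 0), which matches the shapes in the question and in the ValueError. The three output columns are narrower than twice the depth of 10, so nothing is left on that axis.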
I've been working with RxSwift for a while now, just switched to Combine and I am trying to wrap my head around this specific .filter behaviour. Here's a short playground example:
import Combine
let publisher = [1, 2, 3, 4, 5]
    .publisher
    .share()
let filter1 = publisher
    .filter { $0 == 1 }
    .print("filter1")
let filter2 = publisher
    .filter { $0 == 2 }
    .print("filter2")
Publishers
    .Merge(filter1, filter2)
    .sink {
        print("Result is: \($0)")
    }
the output is
filter1: receive subscription: (Multicast)
filter1: request unlimited
filter1: receive value: (1)
Result is: 1
filter1: receive finished
filter2: receive subscription: (Multicast)
filter2: request unlimited
filter2: receive finished
What surprises me is that "Result is: 2" is never printed, because the stream has already finished. If I remove the .share() operator, I receive both values, as I'd expect:
filter1: receive subscription: ([1])
filter1: request unlimited
filter1: receive value: (1)
Result is: 1
filter1: receive finished
filter2: receive subscription: ([2])
filter2: request unlimited
filter2: receive value: (2)
Result is: 2
filter2: receive finished
But what if my publisher is an API call and I don't want to create a duplicate network request? That is exactly the case I am trying to handle now, and it is why I need the .share() operator.
Can anyone explain why this is happening, and how to handle a case where you want to filter a stream, run separate logic on each branch, and then merge the results back together?
So there are a couple of different things going on here.
First, [1, 2, 3].publisher works differently from Observable.from([1, 2, 3]). The latter emits the values once per cycle, while the former emits all the values back to back. The Publisher example works more like this in Rx:
Observable<Int>.create { observer in
    [1, 2, 3, 4, 5].forEach {
        observer.onNext($0)
    }
    observer.onCompleted()
    return Disposables.create()
}
Because of this, in the Observable.from case, the emissions are not complete by the time the filter2 observable is subscribed to. So even if you omit the share() both "Result is: 1" and "Result is: 2" will be emitted.
Second, the share() operator also works differently. By default, the RxSwift share operator will reset the Observable once all subscriptions are disposed (it's a reference counting share). In the Combine case, the share operator makes the publisher connectable and then connects to it. Essentially, it's the same as the .share(replay: 0, scope: .forever) operator in RxSwift (something I have never needed in Rx BTW).
So the Rx code that is equivalent to the Combine code you posted is actually this:
let observable = emitSequence([1, 2, 3, 4, 5])
    .share(replay: 0, scope: .forever)
let filter1ʹ = observable
    .filter { $0 == 1 }
    .debug("filterʹ1")
let filter2ʹ = observable
    .filter { $0 == 2 }
    .debug("filterʹ2")
Observable.merge(filter1ʹ, filter2ʹ)
    .subscribe(onNext: {
        print("Resultʹ is: \($0)")
    })
func emitSequence<S>(_ sequence: S) -> Observable<S.Element> where S: Sequence {
    Observable.create { observer in
        sequence.forEach {
            observer.onNext($0)
        }
        observer.onCompleted()
        return Disposables.create()
    }
}
All this said, the practical aspect of dealing with an API call is fine. In that case, the assumption is that the call won't return immediately (it will take at least a cycle), and since it's one-shot, the fact that share() doesn't reset isn't a problem as long as you make sure you aren't resubscribing to the Observable.
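The behaviour described in this answer can be mimicked with a small toy model. A minimal sketch in Python, assuming a source that starts on its first subscriber, emits everything synchronously, and then stays finished (as the answer describes for the shared sequence publisher); SharedSource and run_merge_of_filters are hypothetical names for illustration, not Combine or RxSwift API:

```python
class SharedSource:
    """Toy stand-in for a synchronous sequence behind a non-resetting share."""

    def __init__(self, values):
        self._values = list(values)
        self._subscribers = []
        self._finished = False

    def subscribe(self, on_value, on_finished):
        if self._finished:
            # The upstream already completed: a late subscriber only sees completion.
            on_finished()
            return
        self._subscribers.append((on_value, on_finished))
        if len(self._subscribers) == 1:
            # The first subscriber triggers the connection; all values go out at once.
            self._run()

    def _run(self):
        for value in self._values:
            for on_value, _ in list(self._subscribers):
                on_value(value)
        self._finished = True
        for _, on_finished in list(self._subscribers):
            on_finished()


def run_merge_of_filters():
    log = []
    source = SharedSource([1, 2, 3, 4, 5])

    def subscribe_filter(wanted):
        def on_value(value):
            if value == wanted:
                log.append(f"Result is: {value}")

        def on_finished():
            log.append(f"filter{wanted}: receive finished")

        source.subscribe(on_value, on_finished)

    # A merge subscribes to its inputs one after the other.
    subscribe_filter(1)
    subscribe_filter(2)
    return log


print(run_merge_of_filters())
```

In this model the second filter subscribes only after the whole sequence has been delivered to the first, so it never sees the value 2. A source that delivers its value later, such as a network call, would not have finished by the time the second subscription is made.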
Using SWI-Prolog, I have written this simple predicate that relates a time string in hh:mm format to a time term.
time_string(time(H,M), String) :-
    number_string(H,Hour),
    number_string(M,Min),
    string_concat(Hour,":",S),
    string_concat(S,Min,String).
The predicate though can only work in one direction.
time_string(time(10,30),String).
String = "10:30". % This is perfect.
Unfortunately this query fails.
time_string(Time,"10:30").
ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR: [11] number_string(_8690,_8692)
ERROR: [10] time_string(time(_8722,_8724),"10:30") at /tmp/prolcompDJBcEE.pl:74
ERROR: [9] toplevel_call(user:user: ...) at /usr/local/logic/lib/swipl/boot/toplevel.pl:1107
It would be really nice if I didn't have to write a whole new predicate to answer this query. Is there a way I could do this?
Well, going from the structured term time(H,M) to the string String is easier than going from the unstructured String to the term time(H,M).
Your predicate works in the "generation" direction.
For the other direction, you want to parse the String. In this case, this is computationally easy and can be done without search/backtracking, which is nice!
Use Prolog's "Definite Clause Grammar" (DCG) syntax, which is "just" a nice way to write predicates that process a "list of stuff". In this case the list of stuff is a list of characters (atoms of length 1). (See the DCG section of the SWI-Prolog manual for details.)
With some luck, DCG code can run both backwards and forwards, but this is generally not the case. Real code that has to meet demands of efficiency or causality may force you to branch by "processing direction" under the hood of a single predicate, and then run through rather different code structures to deliver the goods.
So it is here. The code immediately "decays" into parse and generate branches. Prolog does not yet manage to behave in a fully constraint-based way. You just have to do some things before others.
Anyway, let's do this:
:- use_module(library(dcg/basics)).
% ---
% "Generate" direction; note that String may be bound to something
% in which case this clause also verifies whether generating "HH:MM"
% from time(H,M) indeed yields (whatever is denoted by) String.
% ---
process_time(time(H,M),String) :-
   integer(H),                    % Demand that H,M are valid integers inside limits
   integer(M),
   between(0,23,H),
   between(0,59,M),
   !,                             % Guard passed, commit to this code branch
   phrase(time_g(H,M),Chars,[]),  % Build Chars from time/2 Term
   string_chars(String,Chars).    % Merge Chars into a string, unify with String
% ---
% "Parse" direction.
% ---
process_time(time(H,M),String) :-
   string(String),                % Demand that String be a valid string; no demands on H,M
   !,                             % Guard passed, commit to this code branch
   string_chars(String,Chars),    % Explode String into characters
   phrase(time_p(H,M),Chars,[]).  % Parse Chars into H and M
% ---
% "Generate" DCG
% ---
time_g(H,M) --> hour_g(H), [':'], minute_g(M).
hour_g(H) --> { divmod(H,10,V1,V2), digit_int(D1,V1), digit_int(D2,V2) }, digit(D1), digit(D2).
minute_g(M) --> { divmod(M,10,V1,V2), digit_int(D1,V1), digit_int(D2,V2) }, digit(D1), digit(D2).
% ---
% "Parse" DCG
% ---
time_p(H,M) --> hour_p(H), [':'], minute_p(M).
hour_p(H) --> digit(D1), digit(D2), { digit_int(D1,V1), digit_int(D2,V2), H is V1*10+V2, between(0,23,H) }.
minute_p(M) --> digit(D1), digit(D2), { digit_int(D1,V1), digit_int(D2,V2), M is V1*10+V2, between(0,59,M) }.
% ---
% Do I really have to code this? Oh well!
% ---
digit_int('0',0).
digit_int('1',1).
digit_int('2',2).
digit_int('3',3).
digit_int('4',4).
digit_int('5',5).
digit_int('6',6).
digit_int('7',7).
digit_int('8',8).
digit_int('9',9).
% ---
% Let's add plunit tests!
% ---
:- begin_tests(hhmm).
test("parse 1", true(T == time(0,0))) :- process_time(T,"00:00").
test("parse 2", true(T == time(12,13))) :- process_time(T,"12:13").
test("parse 3", true(T == time(23,59))) :- process_time(T,"23:59").
test("generate", true(S == "12:13")) :- process_time(time(12,13),S).
test("verify", true) :- process_time(time(12,13),"12:13").
test("complete", true(H == 12)) :- process_time(time(H,13),"12:13").
test("bad parse", fail) :- process_time(_,"66:66").
test("bad generate", fail) :- process_time(time(66,66),_).
:- end_tests(hhmm).
That's a lot of code.
Does it work?
?- run_tests.
% PL-Unit: hhmm ........ done
% All 8 tests passed
true.
Given the simplicity of the pattern, a DCG could be deemed overkill, but it actually gives us easy access to the atomic ingredients, which we can feed into a declarative arithmetic library. For instance:
:- module(hh_mm_bi,
[hh_mm_bi/2
,hh_mm_bi//1
]).
:- use_module(library(dcg/basics)).
:- use_module(library(clpfd)).
hh_mm_bi(T,S) :- phrase(hh_mm_bi(T),S).
hh_mm_bi(time(H,M)) --> n2(H,23),":",n2(M,59).
n2(V,U) --> d(A),d(B), {V#=A*10+B,V#>=0,V#=<U}.
d(V) --> digit(D), {V#=D-0'0}.
Some tests
?- hh_mm_bi(T,`23:30`).
T = time(23, 30).
?- hh_mm_bi(T,`24:30`).
false.
?- phrase(hh_mm_bi(T),S).
T = time(0, 0),
S = [48, 48, 58, 48, 48] ;
T = time(0, 1),
S = [48, 48, 58, 48, 49] ;
...
edit
library(clpfd) is not the only choice we have for declarative arithmetic. Here is another shot using library(clpBNR), which requires installing the corresponding pack with ?- pack_install(clpBNR). Once this is done, another solution, functionally equivalent to the one above, could be:
:- module(hh_mm_bnr,
[hh_mm_bnr/2
,hh_mm_bnr//1
]).
:- use_module(library(dcg/basics)).
:- use_module(library(clpBNR)).
hh_mm_bnr(T,S) :- phrase(hh_mm_bnr(T),S).
hh_mm_bnr(time(H,M)) --> n2(H,23),":",n2(M,59).
n2(V,U) --> d(A),d(B), {V::integer(0,U),{V==A*10+B}}.
d(V) --> digit(D), {{V==D-0'0}}.
edit
The comment (now removed) by @DavidTonhofer has made me think that a far simpler approach is available, moving the 'generation power' into d//1:
:- module(hh_mm,
[hh_mm/2
,hh_mm//1
]).
hh_mm(T,S) :- phrase(hh_mm(T),S).
hh_mm(time(H,M)) --> n2(H,23),":",n2(M,59).
n2(V,U) --> d(A),d(B), { V is A*10+B, V>=0, V=<U }.
d(V) --> [C], { member(V,[0,1,2,3,4,5,6,7,8,9]), C is V+0'0 }.
time_string(time(H,M),String)
:-
   hour(H) ,
   minute(M) ,
   number_string(H,Hs) ,
   number_string(M,Ms) ,
   string_concat(Hs,":",S) ,
   string_concat(S,Ms,String)
.
hour(H) :- between(0,11,H) .
minute(M) :- between(0,59,M) .
/*
?- time_string(time(10,30),B).
B = "10:30".
?- time_string(time(H,M),"10:30").
H = 10,
M = 30 ;
false.
?- time_string(time(H,M),S).
H = M, M = 0,
S = "0:0" ;
H = 0,
M = 1,
S = "0:1" ;
H = 0,
M = 2,
S = "0:2" ;
H = 0,
M = 3,
S = "0:3" %etc.
*/
Yet another answer, avoiding DCGs as overkill for this task. Or rather, the two separate tasks involved here: Not every relation can be expressed in a single Prolog predicate, especially not every relation on something as extra-logical as SWI-Prolog's strings.
So here is the solution for one of the tasks, computing strings from times (this is your code renamed):
time_string_(time(H,M), String) :-
    number_string(H,Hour),
    number_string(M,Min),
    string_concat(Hour,":",S),
    string_concat(S,Min,String).
For example:
?- time_string_(time(11, 59), String).
String = "11:59".
Here is a simple implementation of the opposite transformation:
string_time_(String, time(H, M)) :-
    split_string(String, ":", "", [Hour, Minute]),
    number_string(H, Hour),
    number_string(M, Minute).
For example:
?- string_time_("11:59", Time).
Time = time(11, 59).
And here is a predicate that chooses which of these transformations to use, depending on which arguments are known. The exact condition will depend on the cases that can occur in your application, but it seems reasonable to say that if the string is indeed a string, we want to try to parse it:
time_string(Time, String) :-
    (   string(String)
    ->  % Try to parse the existing string.
        string_time_(String, Time)
    ;   % Hope that Time is a valid time term.
        time_string_(Time, String) ).
This will translate both ways:
?- time_string(time(11, 59), String).
String = "11:59".
?- time_string(Time, "11:59").
Time = time(11, 59).
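The same "pick the direction from what is known" idea can be sketched outside Prolog. A minimal sketch in Python, assuming times are represented as (hour, minute) tuples and, as in the answer above, no zero-padding or range checking; time_to_string, string_to_time and time_string are hypothetical names for illustration:

```python
def time_to_string(time):
    """Generate direction: (hour, minute) -> "h:m"."""
    hour, minute = time
    return f"{hour}:{minute}"


def string_to_time(text):
    """Parse direction: "h:m" -> (hour, minute).

    Raises ValueError if the text is not two integers separated by one colon.
    """
    hour_text, minute_text = text.split(":")
    return (int(hour_text), int(minute_text))


def time_string(time=None, text=None):
    """Return the (time, text) pair, filling in whichever side is missing."""
    if isinstance(text, str):
        # The string is known, so parse it.
        return string_to_time(text), text
    # Otherwise assume time is a valid (hour, minute) tuple.
    return time, time_to_string(time)


print(time_string(time=(11, 59)))
print(time_string(text="11:59"))
```

As in the Prolog version, the dispatch is a design decision: here the string wins whenever it is given, and checking that both sides agree when both are supplied is left out.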
I am trying to take a list of words that I have imported from a text file and build a dictionary in which the value is incremented each time the word is encountered in the loop. However, with my current code, nothing is added, and only the value I set initially is there when I print the dictionary. What am I doing wrong?
import pymysql
from os import path
import re
db = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='', db='db_cc')
cursor = db.cursor()
cursor.execute("SELECT id, needsprocessing, SchoolID, ClassID, TaskID FROM sharedata WHERE needsprocessing = 1")
r = cursor.fetchall()
print(r)
from os import path
import re
noentities = len(r)
a = r[0][1]
b = r[0][2]
c = r[0][3]
d = r[0][4]
filepath = "/codecompare/%s/%s/%s/%s.txt" %(a, b, c, d)
print(filepath)
foo = open(filepath, "r")
steve = foo.read()
rawimport = steve.split(' ')
dictionary = {"for":0}
foo.close()
for word in rawimport:
    if word in dictionary:
        dictionary[word] +=1
    else:
        dictionary[word] = 1
print dictionary
Some rawimport values are as follows:
print rawimport
['Someting', 'something', 'dangerzones', 'omething', 'ghg', 'sdf', 'hgiinsfg', '932wrtioarsjg', 'fghbyghgyug', 'sadiiilglj']
Additionally, when trying to print from the code, it throws
... print dictionary
File "<stdin>", line 3
print dictionary
^
SyntaxError: invalid syntax
However, if I run print dictionary by itself it prints:
{'for': 0}
This suggests that the for loop did nothing.
Any ideas?
Running Python 2.7.2
edit: updated to reflect closing of file and to make loop simpler
edit: added sample rawimport data
I received the same Traceback when working through this in the Python interpreter -- it arose from not leaving the context of the for loop:
>>> for word in rawimport:
...     if word in dictionary:
...         dictionary[word]+=1
...     else:
...         dictionary[word]=1
... print dictionary
File "<stdin>", line 6
print dictionary
^
The interpreter thinks your print statement belongs to the for loop, and errors because it's not appropriately indented. (If you did indent it, of course, it would print the dictionary during each pass). The solution to that (assuming you're doing this in the interpreter, which was how I reproduced your error) is hitting enter again:
>>> for word in rawimport:
...     if word in dictionary:
...         dictionary[word]+=1
...     else:
...         dictionary[word]=1
...
>>> print dictionary
{'for': 1, 'fghbyghgyug': 1, '932wrtioarsjg': 1, 'dangerzones': 1, 'sdf': 1, 'ghg': 1, 'Someting': 1, 'something': 1, 'omething': 1, 'sadiiilglj': 1, 'hgiinsfg': 1}
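For the counting itself, the standard library already has a tool. A minimal sketch, assuming Python 3 (the question uses Python 2.7) and whitespace-separated words; count_words is a hypothetical helper name and the sample text is made up:

```python
from collections import Counter


def count_words(text):
    """Count how often each whitespace-separated word occurs in text."""
    # str.split() with no argument splits on any run of whitespace,
    # so newlines and repeated spaces do not produce empty "words".
    return Counter(text.split())


counts = count_words("for something for ghg")
print(counts["for"], counts["ghg"], counts["missing"])
```

A Counter returns 0 for words it has never seen, so no if/else is needed to initialise a key before incrementing it.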