I'm after a way to iterate on xarray chunks, so something similar to dask.array.blocks but that would give me access to xarray chunks with coordinates and dimensions.
For the record, I'm aware that xarray.map_blocks exists, but what I'm doing maps input chunks to output chunks of unknown shape, so I'd like to write something custom by looping directly on the xarray chunks.
I've tried to look into the xarray.map_blocks source code, since I guess something similar to what I need is in there, but I had a hard time understanding what's going on there.
My use case is that I would like, for each xarray chunk, to get an output xarray chunk of variable length along a new dimension (called foo below), and eventually concatenate them along foo.
This is a mocked scenario that should at least clarify what I'm after.
For now I've solved the problem by constructing an "xarray" chunk from each dask chunk of the DataArray (which looks quite convoluted), and then calling client.map(fn_on_chunk, xarray_chunks).
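import numpy as np
import xarray as xr

n = 1000
x_raster = y_raster = np.arange(n)
time = np.arange(10)
vals_raster = np.arange(n*n*10).reshape(n, n, 10)
# Name the dimensions explicitly so they line up with the coordinates
da_raster = xr.DataArray(vals_raster, dims=("y", "x", "time"), coords={"y": y_raster, "x": x_raster, 'time': time})
da_raster = da_raster.chunk(dict(x=100, y=100))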
def fn_on_chunk(da_chunk):
    # Tried to replicate the fact that I can't know in advance
    # the length of one dimension of the output
    len_range = np.random.randint(10)
    outs = []
    for foo in range(len_range):
        # Do some magic that finds needed coordinates
        # on this particular chunk
        x_chunk, y_chunk = fn_magic(foo)
        out = da_chunk.sel(x=x_chunk, y=y_chunk)
        out['foo'] = foo
        outs.append(out)
    return xr.concat(outs, dim='foo')
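One possible starting point for looping over chunks, sketched here with the standard library only: turn the per-dimension chunk sizes into one tuple of slices per block. The helper name chunk_slices and the example chunk sizes are hypothetical, chosen only for illustration.

```python
from itertools import accumulate, product


def chunk_slices(chunks):
    """Yield one tuple of slices per block, given per-dimension chunk sizes."""
    per_dim = []
    for sizes in chunks:
        stops = list(accumulate(sizes))   # cumulative end index of each chunk
        starts = [0] + stops[:-1]         # each chunk starts where the previous one ended
        per_dim.append([slice(a, b) for a, b in zip(starts, stops)])
    # The cartesian product over dimensions enumerates every block
    yield from product(*per_dim)


# Hypothetical chunk layout: two chunks along each of the first two dimensions
example_chunks = ((100, 100), (100, 100), (10,))
blocks = list(chunk_slices(example_chunks))
print(len(blocks))
```

With a chunked DataArray, the same helper could presumably be fed da_raster.chunks, and each tuple of slices turned into a labelled sub-array with da_raster.isel(dict(zip(da_raster.dims, slices))), which keeps coordinates and dimensions on every piece.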
In Julia, I have a list of neighbors of a location stored in all_neighbors[loc]. This allows me to quickly loop over these neighbors conveniently with the syntax for neighbor in all_neighbors[loc]. This leads to readable code such as the following:
active_neighbors = 0
for neighbor in all_neighbors[loc]
    if cube[neighbor] == ACTIVE
        active_neighbors += 1
    end
end
Astute readers will see that this is nothing more than a reduction. Because I'm just counting active neighbors, I figured I could do this in a one-liner using the count function. However,
# This does not work
active_neighbors = count(x->x==ACTIVE, cube[all_neighbors[loc]])
does not work because the all_neighbors mask doesn't get interpreted correctly as simply a mask over the cube array. Does anyone know the cleanest way to write this reduction? An alternative solution I came up with is:
active_neighbors = count(x->x==ACTIVE, [cube[all_neighbors[loc][k]] for k = 1:length(all_neighbors[loc])])
but I really don't like this because it's even less readable than what I started with. Thanks for any advice!
This should work:
count(x -> cube[x] == ACTIVE, all_neighbors[loc])
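For comparison, the same idea (iterate over the neighbor indices and test the cube value inside the predicate) can be sketched in Python. The toy cube and all_neighbors data below are made up for illustration; only the names mirror the question.

```python
ACTIVE = 1

# Hypothetical toy data: cube maps a location to its state,
# all_neighbors maps a location to the list of neighboring locations.
cube = {(0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 1, (1, 0, 0): 1}
all_neighbors = {(0, 0, 0): [(0, 0, 1), (0, 1, 0), (1, 0, 0)]}
loc = (0, 0, 0)

# Iterate over the neighbor indices and look up the cube value per index,
# rather than indexing the cube with the whole neighbor collection at once.
active_neighbors = sum(cube[n] == ACTIVE for n in all_neighbors[loc])
print(active_neighbors)
```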
I'm new to Julia language and I wanted to improve my understanding by implementing a double linked list.
Unfortunately it seems that there is no good existing library for this purpose.
The only good one is the single linked list (here).
There is one implementation of a double linked list (here). But this is 2 years old and I'm not sure if it is outdated or not. And it does not allow a real empty list. It is just a single element with a default value.
At the moment I would be able to implement the common stuff like push!, pop!, that's not the problem.
But I'm struggling with implementing a double linked list that could be empty.
My current approach uses Nullable for an optional reference and an optional value.
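type ListNode{T}
    prev::Nullable{ListNode{T}}  # fields inferred from the constructors below
    next::Nullable{ListNode{T}}
    value::Nullable{T}
    ListNode(v) = (x=new(); x.prev=Nullable{ListNode{T}}(); x.next=Nullable{ListNode{T}}(); x.value=Nullable(v); x)
    ListNode(p, n, v) = new(p, n, v)
end
type List{T}
    node::Nullable{ListNode{T}}  # field name inferred from the constructors below
    List() = (start=new(Nullable(ListNode{T}())); node=start; start)
    List(v) = (start=new(Nullable(ListNode{T}(v))); node=start; start)
end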
But it seems like this is pretty ugly and inconvenient to work with.
My second approach would be to introduce a boolean variable (inside List{T}) which stores whether the list is empty. Checking this boolean would allow me to handle push! and pop! on empty lists easily.
I tried to google a good solution but I didn't find one.
Can anyone give me a "julia style" solution for the double linked list?
There is now a library containing various data structures, DataStructures.jl. Some initial notes regarding the question: as of this writing, type is deprecated. Instead, mutable struct should be used for Julia 1.0 and beyond. Nullable is also deprecated, and a Union of Nothing and the type in question can be used instead.
There exists a package called DataStructures.jl that provides what you need.
You can find a DoubleLinked list containing the functionality you need here:
Code snippets from the link above, defining a DoubleLinked list in Julia >= v 1.1:
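mutable struct ListNode{T}
    data::T
    prev::ListNode{T}
    next::ListNode{T}
    # Sentinel node: no data, links point back to itself
    function ListNode{T}() where T
        node = new{T}()
        node.next = node
        node.prev = node
        return node
    end
    function ListNode{T}(data) where T
        node = new{T}(data)
        return node
    end
end
mutable struct MutableLinkedList{T}
    len::Int
    node::ListNode{T}
    # An empty list is just a sentinel node linked to itself
    function MutableLinkedList{T}() where T
        l = new{T}()
        l.len = 0
        l.node = ListNode{T}()
        l.node.next = l.node
        l.node.prev = l.node
        return l
    end
end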
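The key idea in the snippet above is the sentinel node: an empty list is represented by a single node whose prev and next point to itself, so push and pop never need a special case for "no nodes yet". A minimal Python sketch of that idea follows; the class and method names are hypothetical and are not the DataStructures.jl API.

```python
class Node:
    """A list node; a fresh node is linked to itself in both directions."""
    __slots__ = ("data", "prev", "next")

    def __init__(self, data=None):
        self.data = data
        self.prev = self
        self.next = self


class LinkedList:
    def __init__(self):
        # The sentinel never holds user data; an empty list is just the sentinel.
        self.sentinel = Node()
        self.length = 0

    def push_back(self, value):
        node = Node(value)
        last = self.sentinel.prev          # the sentinel itself when the list is empty
        node.prev, node.next = last, self.sentinel
        last.next = node
        self.sentinel.prev = node
        self.length += 1

    def pop_back(self):
        if self.length == 0:
            raise IndexError("pop from empty list")
        node = self.sentinel.prev
        node.prev.next = self.sentinel
        self.sentinel.prev = node.prev
        self.length -= 1
        return node.data

    def __iter__(self):
        node = self.sentinel.next
        while node is not self.sentinel:
            yield node.data
            node = node.next


lst = LinkedList()
print(list(lst))   # nothing to iterate: the sentinel points at itself
lst.push_back(1)
lst.push_back(2)
print(list(lst))
```

Because the sentinel is always present, neither a Nullable-style wrapper nor a separate "is empty" flag is needed in this design; the length counter alone tells whether there is anything to pop.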
In addition to the DataStructures package, Chris Rackauckas' LinkedLists.jl is a good resource.
The source code is quite readable and you can always ask questions.
When I use a loop, to access the variables outside of the loop they need to be initialised before you enter the loop. For example:
Y = Array{Int}()
for i = 1:end
    Y = i
end
Since I have initialised Y before entering the loop, I can access it later by typing Y.
If I had not initialised it before entering the loop, typing Y would not have returned anything.
I want to extend this functionality to the output of the 'hist' function. I don't know how to set up the empty hist output before the loop. The only work around I have found is below.
yHistData = [hist(DataSet[1],Bins)]
for j = 2:NumberOfLayers
    yHistData = [yHistData;hist(DataSet[j],Bins)]
end
Now when I access this later on by simply typing yHistData, I get the correct values returned to me.
How can I initialise this hist data before entering the loop without defining it using the first value of the list I'm iterating over?
This can be done with a loop like follows:
yHistData = []
for j = 1:NumberOfLayers
    push!(yHistData, hist(DataSet[j], Bins))
end
push! modifies the array by adding the specified element to the end. This increases code speed because we do not need to create copies of the array all the time. This code is nice and simple, and runs faster than yours. The return type, however, is now Array{Any, 1}, which can be improved.
Here I have typed the array so that the performance when using this array in the future is better. Without typing the array, the performance is sometimes better and sometimes worse than your code, depending on NumberOfLayers.
yHistData = Tuple{FloatRange{Float64},Array{Int64,1}}[]
for j = 1:NumberOfLayers
    push!(yHistData, hist(DataSet[j], Bins))
end
Assuming length(DataSet) == NumberOfLayers, we can use anonymous functions to simplify the code even further:
yHistData = map(data -> hist(data, Bins), DataSet)
This solution is short, easy to read, and very fast on Julia 0.5. However, this version is not yet released. On 0.4, the currently released version, the performance of this version will be slower.
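The two patterns discussed here (start from an empty container and append inside the loop, or build the whole collection with a map) look like this in a Python sketch. The hist function below is a hypothetical stand-in that counts values per half-open bin; it is not Julia's hist, and the sample data is invented.

```python
from bisect import bisect_right


def hist(data, edges):
    """Hypothetical stand-in: count values per bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        i = bisect_right(edges, value) - 1
        if 0 <= i < len(counts):      # values outside the edges are ignored
            counts[i] += 1
    return edges, counts


bins = [0, 5, 10]
data_set = [[1, 2, 7], [5, 6, 9], []]

# Pattern 1: initialise an empty list, then append one result per layer.
y_hist_data = []
for layer in data_set:
    y_hist_data.append(hist(layer, bins))

# Pattern 2: build the same list in one expression.
y_hist_mapped = [hist(layer, bins) for layer in data_set]

print([counts for _, counts in y_hist_data])
```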
For example, if I want to read the middle value from magic(5), I can do so like this:
M = magic(5);
value = M(3,3);
to get value == 13. I'd like to be able to do something like one of these:
value = magic(5)(3,3);
value = (magic(5))(3,3);
to dispense with the intermediate variable. However, MATLAB complains about Unbalanced or unexpected parenthesis or bracket on the first parenthesis before the 3.
Is it possible to read values from an array/matrix without first assigning it to a variable?
It actually is possible to do what you want, but you have to use the functional form of the indexing operator. When you perform an indexing operation using (), you are actually making a call to the subsref function. So, even though you can't do this:
value = magic(5)(3, 3);
You can do this:
value = subsref(magic(5), struct('type', '()', 'subs', {{3, 3}}));
Ugly, but possible. ;)
In general, you just have to change the indexing step to a function call so you don't have two sets of parentheses immediately following one another. Another way to do this would be to define your own anonymous function to do the subscripted indexing. For example:
subindex = @(A, r, c) A(r, c); % An anonymous function for 2-D indexing
value = subindex(magic(5), 3, 3); % Use the function to index the matrix
However, when all is said and done the temporary local variable solution is much more readable, and definitely what I would suggest.
There was a good blog post on Loren on the Art of MATLAB a couple of days ago with a couple of gems that might help. In particular, using helper functions like:
paren = @(x, varargin) x(varargin{:});
curly = @(x, varargin) x{varargin{:}};
where paren() can be used like
paren(magic(5), 3, 3);
would return
ans = 13
I would also surmise that this will be faster than gnovice's answer, but I haven't checked (Use the profiler!!!). That being said, you also have to include these function definitions somewhere. I personally have made them independent functions in my path, because they are super useful.
These functions and others are now available in the Functional Programming Constructs add-on which is available through the MATLAB Add-On Explorer or on the File Exchange.
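The idea behind paren (turn the indexing step into an ordinary function call so it can be applied to any expression) can be sketched in Python as follows. Python itself allows indexing a call result directly, so the helper is purely illustrative; magic5 is a hypothetical stand-in that hard-codes the matrix MATLAB's magic(5) returns.

```python
def paren(x, *indices):
    """Index x with 1-based indices, one per nesting level."""
    for i in indices:
        x = x[i - 1]
    return x


def magic5():
    """Return the 5x5 magic square that MATLAB's magic(5) produces, as nested lists."""
    return [
        [17, 24, 1, 8, 15],
        [23, 5, 7, 14, 16],
        [4, 6, 13, 20, 22],
        [10, 12, 19, 21, 3],
        [11, 18, 25, 2, 9],
    ]


# No intermediate variable: the matrix goes straight into the indexing helper.
value = paren(magic5(), 3, 3)
print(value)  # 13
```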
How do you feel about using undocumented features:
>> builtin('_paren', magic(5), 3, 3) %# M(3,3)
ans = 13
or for cell arrays:
>> builtin('_brace', num2cell(magic(5)), 3, 3) %# C{3,3}
ans = 13
Just like magic :)
Bad news, the above hack doesn't work anymore in R2015b! That's fine, it was undocumented functionality and we cannot rely on it as a supported feature :)
For those wondering where to find this type of thing, look in the folder fullfile(matlabroot,'bin','registry'). There's a bunch of XML files there that list all kinds of goodies. Be warned that calling some of these functions directly can easily crash your MATLAB session.
At least in MATLAB 2013a you can use getfield like:
getfield(a,{1,2}) % etc
to get the element at (1,2)
Unfortunately, syntax like magic(5)(3,3) is not supported by MATLAB. You need to use temporary intermediate variables. You can free up the memory after use, e.g.
tmp = magic(3);
myVar = tmp(3,3);
clear tmp
Note that if you compare running times with the standard way (assign the result and then access entries), they are exactly the same.
subs=@(M,i,j) M(i,j);
>> for nit=1:10;tic;subs(magic(100),1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
>> for nit=1:10,tic;M=magic(100); M(1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
In my opinion, the bottom line is: MATLAB does not have pointers, and you have to live with it.
It could be simpler if you write a new function:
function [ element ] = getElem( matrix, index1, index2 )
    element = matrix(index1, index2);
end
and then use it:
value = getElem(magic(5), 3, 3);
Your initial notation is the most concise way to do this:
M = magic(5); %create
value = M(3,3); % extract useful data
clear M; %free memory
If you are doing this in a loop you can just reassign M every time and ignore the clear statement as well.
To complement Amro's answer, you can use feval instead of builtin. There is no difference, really, unless you try to overload the operator function:
BUILTIN(...) is the same as FEVAL(...) except that it will call the
original built-in version of the function even if an overloaded one
exists (for this to work, you must never overload BUILTIN).
>> feval('_paren', magic(5), 3, 3) % M(3,3)
ans = 13
>> feval('_brace', num2cell(magic(5)), 3, 3) % C{3,3}
ans = 13
What's interesting is that feval seems to be just a tiny bit quicker than builtin (by ~3.5%), at least in Matlab 2013b, which is weird given that feval needs to check if the function is overloaded, unlike builtin:
>> tic; for i=1:1e6, feval('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 49.904117 seconds.
>> tic; for i=1:1e6, builtin('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 51.485339 seconds.