Initialising hist function in Julia for use in a loop - histogram

When I use a loop, any variables I want to access outside of the loop need to be initialised before entering it. For example:
Y = Array{Int}()
for i = 1:10
    Y = i
end
Since I have initialised Y before entering the loop, I can access it later by typing
Y
If I had not initialised it before entering the loop, typing Y afterwards would have given an undefined variable error.
I want to extend this functionality to the output of the 'hist' function. I don't know how to set up the empty hist output before the loop. The only workaround I have found is below.
yHistData = [hist(DataSet[1], Bins)]
for j = 2:NumberOfLayers
    yHistData = [yHistData; hist(DataSet[j], Bins)]
end
Now when I access this later on by simply typing
yHistData
I get the correct values returned to me.
How can I initialise this hist data before entering the loop without defining it using the first value of the list I'm iterating over?

This can be done with a loop as follows:
yHistData = []
for j = 1:NumberOfLayers
    push!(yHistData, hist(DataSet[j], Bins))
end
push! modifies the array by adding the specified element to the end. This increases code speed because we do not need to create copies of the array all the time. This code is nice and simple, and runs faster than yours. The return type, however, is now Array{Any, 1}, which can be improved.
Here I have typed the array so that the performance when using this array in the future is better. Without typing the array, the performance is sometimes better and sometimes worse than your code, depending on NumberOfLayers.
yHistData = Tuple{FloatRange{Float64},Array{Int64,1}}[]
for j = 1:NumberOfLayers
    push!(yHistData, hist(DataSet[j], Bins))
end
Assuming length(DataSet) == NumberOfLayers, we can use anonymous functions to simplify the code even further:
yHistData = map(data -> hist(data, Bins), DataSet)
This solution is short, easy to read, and very fast on Julia 0.5. However, 0.5 is not yet released. On 0.4, the currently released version, this approach will be slower.

Related

F# seq behavior

I'm a little baffled about the inner workings of sequence expressions in F#.
Normally, if we make a sequential file reader with seq, with no intentional caching of data:
seq {
    let mutable current = file.Read()
    while current <> -1 do
        yield current
        current <- file.Read()   // advance to the next byte
}
we will end up with some weird behaviour if we try to re-iterate or backtrack. My idea was that, since Read() reads from some mutable state, we can't expect the output to be correct if we re-iterate. But then the following behaves nicely, even across buffer boundaries?
let Read path =
    seq {
        use fp = System.IO.File.OpenRead path
        let buf = [| for _ in 0 .. 1024 -> 0uy |]
        let mutable pos = 1
        let mutable current = 0
        while pos <> 0 do
            if current = 0 then
                pos <- fp.Read(buf, 0, 1024)
            if pos > 0 && current < pos then
                yield buf.[current]
            current <- (current + 1) % 1024
    }
let content = Read "some path"
We clearly reuse the same buffer to enhance performance, but reading the 1025th byte will trigger an update of the buffer. If we then try to read any byte at a position < 1025 afterwards, we still get the correct output. How can that be, and what is the difference?
Your question is a bit unclear, so I'll try to guess.
When you create a seq { }, you're essentially creating a state machine which will run only as far as it needs to. When you request the very first element from it, it'll start at the top and run until your first yield instruction. Then, when you request another value, it'll run from that point until the next yield, and so on.
Keep in mind that a seq { } produces an IEnumerable<'T>, which is like a "plan of execution". Each time you start to iterate the sequence (for example by calling Seq.head), a call to GetEnumerator is made behind the scenes, which causes a new IEnumerator<'T> to be created. It is the IEnumerator which does the actual providing of values. You can think of it in more classical terms as having an array over which you can iterate (an iterable or enumerable) and many pointers over that array, each of which are at different points in the array (many iterators or enumerators).
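To make the enumerable/enumerator distinction concrete, here is a minimal sketch (the values are made up, not taken from the question) showing that every call to GetEnumerator produces an independent cursor over the same sequence:
let numbers = seq { for i in 1 .. 3 -> i }

let e1 = numbers.GetEnumerator()
let e2 = numbers.GetEnumerator()

e1.MoveNext() |> ignore   // e1 advances to 1
e1.MoveNext() |> ignore   // e1 advances to 2
e2.MoveNext() |> ignore   // e2 independently advances to 1

printfn "e1 = %d, e2 = %d" e1.Current e2.Current   // prints "e1 = 2, e2 = 1"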
In your first code, file is most likely external to the seq block. This means that the file you are reading from is baked into the plan of execution; no matter how many times you start to iterate the sequence, you'll always be reading from the same file. This is obviously going to cause unpredictable behaviour.
However, in your second code, the file is opened as part of the seq block's definition. This means that you'll get a new file handle each time you iterate the sequence or, essentially, a new file handle per enumerator. The reason this code works is that you can't reverse an enumerator or iterate over it multiple times, not with a single thread at least.
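As a small sketch of that point (reusing the question's Read with a hypothetical path), enumerating the same sequence value twice opens the file twice, and both passes see the same bytes:
let content = Read "some path"
let firstPass  = content |> Seq.truncate 10 |> Seq.toList   // enumerator #1: opens the file, reads, disposes it
let secondPass = content |> Seq.truncate 10 |> Seq.toList   // enumerator #2: a brand-new handle, same bytes
// firstPass = secondPass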
(Now, if you were to manually get an enumerator and advance it over multiple threads, you'd probably run into problems very quickly. But that is a different topic.)

How to share the same index among multiple dask arrays

I'm trying to build a dask-based IPython application that holds a meta-class consisting of several sub-dask-arrays (all shaped (n_samples, dim_1, dim_2, ...)) and that should be able to slice the sub-dask-arrays through its getitem operator.
In the getitem method, I call the da.Array.compute method (the code is still in its very early state), so that I am able to iterate over batches of the sub-arrays.
import dask.array as da

class MetaClass(object):
    ...
    def __getitem__(self, inds):
        new_m = MetaClass()
        inds = inds.compute()
        for name, var in vars(self).items():
            if isinstance(var, da.Array):
                try:
                    setattr(new_m, name, var[inds])
                except Exception as e:
                    print(e)
            else:
                setattr(new_m, name, var)
        return new_m
# Here I construct the meta-class to work with some directory.
m = MetaClass('/my/data/...')
# m.type is one of the sub-dask-arrays
m2 = m[m.type==2]
It works as expected and I get the sliced arrays, but it results in huge memory consumption, and I assume that behind the scenes dask is copying the index for each sub-dask-array.
My question is, how do I achieve the same results, without using so much memory?
(I tried not to "compute" the "inds" in getitem, but then I get arrays with NaN shapes, which cannot be iterated, and iteration is a must for the application.)
I have been thinking about three possible solutions, and I'd be happy to be advised which of them is the "right" one for me (or to be given another solution I haven't thought of):
To use a Dask DataFrame, which I'm not sure how to fit multidimensional dask arrays into (I would really appreciate some help, or even a link that explains how to deal with multidimensional arrays in dd).
To forget about the entire MetaClass, and to use one dask-array with a nasty dtype (something like [("type",int,(1,)),("images",np.uint8,(1000,1000))]), again, I'm not familiar with this and would really appreciate some help with that (tried to google it.. it's a bit complicated..)
To share the index as a global inside the calling function (getitem) with property and its get-function-mechanism (https://docs.python.org/2/library/functions.html#property). But the big downside here is that I lose the types of the arrays (big down for representation and everything that needs anything but the data itself).
Thanks in advance!!!
One can use map_blocks on the sub-arrays with a shared function that holds the indices (the boolean mask inds below) in its memory.
Here is an example:
import numpy as np

# `inds` is the shared boolean mask over axis 0 (a NumPy array), held once in
# memory and referenced by both functions below.
def bool_mask(arr, block_info=None):
    from_ind, to_ind = block_info[0]["array-location"][0]
    return arr[inds[from_ind:to_ind]]

def getitem(var):
    original_chunks = var.chunks[0]
    tmp_inds = np.cumsum([0] + list(original_chunks))
    from_inds = tmp_inds[:-1]
    to_inds = tmp_inds[1:]
    # Each new chunk size is the number of True entries of the mask in that chunk.
    new_chunks_0 = np.array(list(map(lambda f, t: inds[f:t].sum(), from_inds, to_inds)))
    new_chunks = tuple([tuple(new_chunks_0.tolist())] + list(var.chunks[1:]))
    return var.map_blocks(bool_mask, dtype=var.dtype, chunks=new_chunks)
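A hypothetical usage sketch to tie this back to the question (m.images is an assumed sub-array name; inds is the shared mask the two functions above refer to):
inds = (m.type == 2).compute()   # the shared mask, computed once and reused by every block
m2_images = getitem(m.images)    # still a lazy dask array; each block applies its slice of the mask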

Can't modify loop-variable in lua [duplicate]

I'm trying this in Lua:
for i = 1, 10, 1 do
    print(i)
    i = i + 2
end
I would expect the following output:
1,4,7,10
However, it seems like i is not affected, so it gives me:
1,2,3,4,5,6,7,8,9,10
Can someone tell me a bit about the background concept, and what is the right way to modify the counter variable?
As Colonel Thirty Two said, there is no way to modify a loop variable in Lua. Or rather more to the point, the loop counter in Lua is hidden from you. The variable i in your case is merely a copy of the counter's current value. So changing it does nothing; it will be overwritten by the actual hidden counter every time the loop cycles.
When you write a for loop in Lua, it always means exactly what it says. This is good, since it makes it abundantly clear when you're doing looping over a fixed sequence (whether a count or a set of data) and when you're doing something more complicated.
for is for fixed loops; if you want dynamic looping, you must use a while loop. That way, the reader of the code is aware that looping is not fixed; that it's under your control.
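For example, a while loop that produces the 1, 4, 7, 10 output the question is after (a small sketch, not part of the original answer):
local i = 1
while i <= 10 do
    print(i)
    i = i + 3   -- the loop's own +1 step plus the extra +2 from the body
end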
When using a numeric for loop, you can change the increment with the third value; in your example you set it to 1.
To see what I mean:
for i = 1, 10, 3 do
    print(i)
end
However, this isn't always a practical solution, because oftentimes you'll only want to modify the loop variable under specific conditions. When you wish to do this, you can use a while loop (or, if you want your code to run at least once, a repeat loop):
local i = 1
while i < 10 do
    print(i)
    i = i + 1
end
Using a while loop you have full control over the condition, and any variables (be they global or upvalues).
All answers / comments so far only suggested while loops; here's two more ways of working around this problem:
If you always have the same step size, which just isn't 1, you can explicitly give the step size as in for i = start, stop, step do … end, e.g. for i = 1, 10, 3 do … or for i = 10, 1, -1 do …. If you need varying step sizes, that won't work.
A "problem" with while-loops is that you always have to manually increment your counter and forgetting this in a sub-branch easily leads to infinite loops. I've seen the following pattern a few times:
local diff = 0
for i = 1, n do
    i = i + diff
    if i > n then break end
    -- code here
    -- and to change i for the next round, do something like
    if some_condition then
        diff = diff + 1 -- skip 1 forward
    end
end
This way, you cannot forget to increment i, and you still have the adjusted i available in your code. The deltas are also kept in a separate variable, so scanning this for bugs is relatively easy (i auto-increments, so that part must work; any assignment to i below the loop body's first line is an error; check where diff is and isn't assigned; check branches; …).

Why do you save $s register into the stack when calling other functions in MIPS?

I've read other questions on Stack Overflow very similar to what I'm asking, but I still do not understand. I understand the basic idea behind a stack and how it works, but I still do not understand why you have to go to the trouble of saving the values that the $s registers hold before calling a function.
Like for example, consider this c code.
int addNumbers(int a, int b);

int main()
{
    int a = 5, b = 5, c = 0;
    c = addNumbers(a, b);
}
In that example, I don't care about the value that c has initially, so why would I care to save it? I'm only concerned with the return value it gets from the function.
One example off the top of my head of why you would save $s registers onto the stack is to preserve the values of variables in main, so that when you pass those variables into a function, their values in main remain the same.
This makes sense because such a variable could be used as an argument to a function and then used in an if statement after returning from the function.
So in that situation you would want to preserve that value.
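As a small C sketch of that situation (addNumbers and the variable names are just for illustration), the caller's value has to survive the call because it is tested afterwards, which is exactly what a callee-saved $s register (or a stack slot) guarantees:
int addNumbers(int a, int b) { return a + b; }

int main()
{
    int limit = 5;                 /* still needed after the call, so it lives in an $s register */
    int c = addNumbers(limit, 5);  /* the callee may freely clobber $t (temporary) registers      */
    if (c > limit)                 /* limit must still be 5 here                                   */
        c = limit;
    return c;
}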
If someone could provide a better example, that would be great.

matlab indexing into nameless matrix [duplicate]

For example, if I want to read the middle value from magic(5), I can do so like this:
M = magic(5);
value = M(3,3);
to get value == 13. I'd like to be able to do something like one of these:
value = magic(5)(3,3);
value = (magic(5))(3,3);
to dispense with the intermediate variable. However, MATLAB complains about Unbalanced or unexpected parenthesis or bracket on the first parenthesis before the 3.
Is it possible to read values from an array/matrix without first assigning it to a variable?
It actually is possible to do what you want, but you have to use the functional form of the indexing operator. When you perform an indexing operation using (), you are actually making a call to the subsref function. So, even though you can't do this:
value = magic(5)(3, 3);
You can do this:
value = subsref(magic(5), struct('type', '()', 'subs', {{3, 3}}));
Ugly, but possible. ;)
In general, you just have to change the indexing step to a function call so you don't have two sets of parentheses immediately following one another. Another way to do this would be to define your own anonymous function to do the subscripted indexing. For example:
subindex = @(A, r, c) A(r, c);     % An anonymous function for 2-D indexing
value = subindex(magic(5), 3, 3);  % Use the function to index the matrix
However, when all is said and done the temporary local variable solution is much more readable, and definitely what I would suggest.
There was a good blog post on Loren on the Art of MATLAB just a couple of days ago with a couple of gems that might help. In particular, using helper functions like:
paren = @(x, varargin) x(varargin{:});
curly = @(x, varargin) x{varargin{:}};
where paren() can be used like
paren(magic(5), 3, 3);
would return
ans = 13
I would also surmise that this will be faster than gnovice's answer, but I haven't checked (Use the profiler!!!). That being said, you also have to include these function definitions somewhere. I personally have made them independent functions in my path, because they are super useful.
These functions and others are now available in the Functional Programming Constructs add-on which is available through the MATLAB Add-On Explorer or on the File Exchange.
How do you feel about using undocumented features:
>> builtin('_paren', magic(5), 3, 3) %# M(3,3)
ans =
13
or for cell arrays:
>> builtin('_brace', num2cell(magic(5)), 3, 3) %# C{3,3}
ans =
13
Just like magic :)
UPDATE:
Bad news, the above hack doesn't work anymore in R2015b! That's fine, it was undocumented functionality and we cannot rely on it as a supported feature :)
For those wondering where to find this type of thing, look in the folder fullfile(matlabroot,'bin','registry'). There's a bunch of XML files there that list all kinds of goodies. Be warned that calling some of these functions directly can easily crash your MATLAB session.
At least in MATLAB 2013a you can use getfield like:
a=rand(5);
getfield(a,{1,2}) % etc
to get the element at (1,2)
Unfortunately, syntax like magic(5)(3,3) is not supported by MATLAB. You need to use temporary intermediate variables, and you can free up the memory after use, e.g.:
tmp = magic(3);
myVar = tmp(3,3);
clear tmp
Note that if you compare running times with the standard way (assign the result and then access entries), they are exactly the same.
subs = @(M,i,j) M(i,j);
>> for nit=1:10;tic;subs(magic(100),1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0103
>> for nit=1:10,tic;M=magic(100); M(1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0101
In my opinion, the bottom line is: MATLAB does not have pointers; you have to live with it.
It could be simpler if you make a new function:
function [ element ] = getElem( matrix, index1, index2 )
    element = matrix(index1, index2);
end
and then use it:
value = getElem(magic(5), 3, 3);
Your initial notation is the most concise way to do this:
M = magic(5);    % create
value = M(3,3);  % extract useful data
clear M;         % free memory
If you are doing this in a loop you can just reassign M every time and ignore the clear statement as well.
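For instance, a minimal sketch of that pattern (the loop bounds are arbitrary):
for k = 1:10
    M = magic(5);     % reassigned on every pass, so no clear is needed
    value = M(3,3);
end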
To complement Amro's answer, you can use feval instead of builtin. There is no difference, really, unless you try to overload the operator function:
BUILTIN(...) is the same as FEVAL(...) except that it will call the
original built-in version of the function even if an overloaded one
exists (for this to work, you must never overload
BUILTIN).
>> feval('_paren', magic(5), 3, 3) % M(3,3)
ans =
13
>> feval('_brace', num2cell(magic(5)), 3, 3) % C{3,3}
ans =
13
What's interesting is that feval seems to be just a tiny bit quicker than builtin (by ~3.5%), at least in MATLAB 2013b, which is weird given that feval needs to check whether the function is overloaded, unlike builtin:
>> tic; for i=1:1e6, feval('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 49.904117 seconds.
>> tic; for i=1:1e6, builtin('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 51.485339 seconds.

Resources