Resultant weight vector has same values in SAS/IML

I'm trying to build a binary perceptron classifier in SAS to develop my skills with the language. The data has been cleaned and split into training and test sets. Due to my inexperience, I expanded the label vector into a table of seven identical columns, one per weight, to make the calculations more straightforward; at least, given my limited experience, this seemed like a workable method. Anyway, I run the following:
PROC IML;
W = {0, 0, 0, 0, 0, 0, 0};
USE Work.X_train;
XVarNames = {"Pclass" "Sex" "Age" "FamSize" "EmbC" "EmbQ" "EmbS"};
READ ALL VAR XVarNames INTO X_trn;
USE Work.y_train;
YVarNames = {"S1" "S2" "S3" "S4" "S5" "S6" "S7"};
READ ALL VAR YVarNames INTO y_trn;
DO i = 1 TO 668;
    IF W`*X_trn[i] > 0 THEN Z = {1, 1, 1, 1, 1, 1, 1};
    ELSE Z = {0, 0, 0, 0, 0, 0, 0};
    W = W + (y_trn[i]` - Z)#X_trn[i]`;
END;
PRINT W;
RUN;
and the result is a column vector with seven entries, each having the value -2.373. The particular value isn't important, but clearly a weight vector composed of identical values is not useful. The question, then, is: what error in the code am I making that produces this result?
My intuition is that the way I am pulling each row of observations from X_trn and y_trn into the update is causing the error. Otherwise, it might be the matrix arithmetic in the W = line, but the orientation of all of the vectors seems appropriate.
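For reference, the update rule I am trying to implement, written as a plain Python/NumPy sketch (hypothetical names; note that it uses one scalar label per row rather than seven copies of it):

import numpy as np

def train_perceptron(X, y):
    # X: (n_obs, 7) feature matrix; y: (n_obs,) binary labels in {0, 1}.
    w = np.zeros(X.shape[1])
    for i in range(X.shape[0]):
        z = 1.0 if w @ X[i] > 0 else 0.0  # predict from the i-th row of X
        w += (y[i] - z) * X[i]            # classic perceptron update
    return w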

Related

How can I combine blocks of the same type into rectangular prisms in an efficient manner?

I'm making a block-based city-building game, and I'm trying to figure out an efficient way of merging multiple blocks of the same type into boxes for optimization purposes. Say there's a wall of a building that is 16x30x16 blocks of brick. Rather than draw the 7,680 blocks individually, I can draw them as one giant rectangular prism with a repeating texture, which would be eons more efficient.
I started on this by creating strips, which I intended to further combine into panes, but it seems that this method is already too slow: it has to loop through every block in a plot (chunk), check whether it can be merged into the current strip, and then merge it if so.
Thanks in advance
local function Draw(plot)
    local blocks = plot.blocks
    local slabs = NewAutotable(3)
    local slabList = {}
    for y = 1, 32 do
        for x = 1, 16 do
            -- Start a new strip at z = 1 for this column.
            local currentSlab = NewSlab(x, y, 1, 1, 1, 1, blocks[x][y][1])
            slabs[x][y][1] = currentSlab
            slabList[#slabList + 1] = currentSlab
            for z = 2, 16 do
                if currentSlab[7] == blocks[x][y][z] then
                    -- Same block type: extend the current strip along z.
                    GrowSlab(currentSlab, 0, 0, 1)
                else
                    -- Different block type: start a new strip containing blocks[x][y][z].
                    currentSlab = NewSlab(x, y, z, 1, 1, 1, blocks[x][y][z])
                    slabs[x][y][z] = currentSlab
                    slabList[#slabList + 1] = currentSlab
                end
            end
        end
    end
end

Can you combine 3+ arbitrarily sized integers and still be able to deconstruct them back?

Say you have 3 integers:
13105
705016
13
I'm wondering if you could combine these into one integer in any way, so that you can still get back to the original 3 integers.
var startingSet = [ 13105, 705016, 13 ]
var combined = combineIntoOneInteger(startingSet)
// 15158958589285958925895292589 perhaps, I have no idea.
var originalIntegers = deconstructInteger(combined, 3)
// [ 13105, 705016, 13 ]
function combineIntoOneInteger(integers) {
    // some sort of hashing-like function...
}

function deconstructInteger(integer, arraySize) {
    // perhaps pass it some other parameters,
    // like how many to deconstruct to, or other params.
}
It doesn't technically need to be an "integer"; it is just a string using only the integer characters, though perhaps I might want to use the hex characters instead. But I ask in terms of integers because underneath I do have integers of a bounded size that will be used to construct the combined object.
Some other notes....
The combined value should be unique, so no matter what values you combine, you will always get a different result. That is, there are absolutely no conflicts. Or if that's not possible, perhaps an explanation why and a potential workaround.
The mathematical "set" containing all possible outputs can be composed of different amounts of components. That is to say, you might have the output/combined set containing [ 100, 200, 300, 400 ] but the input set is these 4 arrays: [ [ 1, 2, 3 ], [ 5 ], [ 91010, 132 ], [ 500, 600, 700 ] ]. That is, the input arrays can be of wildly different lengths and wildly different sized integers.
One way to accomplish this more generically is to just use a "separator" character, which makes it super easy. So it would be like 13105:705016:13. But this is cheating, I want it to only use the characters in the integer set (or perhaps the hex set, or some other arbitrary set, but for this case just the integer set or hex).
Another idea for a potential way to accomplish this is to somehow hide a separator in there by doing some hashing or permutation jiu jitsu so that [ 13105, 705016, 13 ] becomes some integer-looking thing like 95918155193915183, where 155 and 5 are some separator like interpolator values based on the preceding input or some other tricks. A simpler approach to this would be like saying "anything following three zeroes 000 like 410001414 means it's a new integer. So basically 000 is a separator. But this specifically is ugly and brittle. Maybe it could get more tricky and work though, like "if the value is odd and followed by a multiple of 3 of itself, then it's a separator" sort of thing. But I can see that also having brittle edge cases.
But basically, given a set of integers n (of strings of integer characters), how to convert that into a single integer (or single integer-charactered string), and then convert it back into the original set of integers n.
Sure, there are lots of ways to do this.
To start with, it's only necessary to have a reversible function which combines two values into one. (For it to be reversible, there must be another function which takes the output value and recreates the two input values.)
Let's call the function which combines two values combine and the reverse function separate. Then we have:
separate(combine(a, b)) == [a, b]
for any values a and b. That means that combine(a, b) == combine(c, d)
can only be true if both a == c and b == d; in other words, every pair of inputs produces a different output.
Encoding arbitrary vectors
Once we have that function, we can encode arbitrary-length input vectors. The simplest case is when we know in advance what the length of the vector is. For example, we could define:
combine3 = (a, b, c) => combine(combine(a, b), c)
combine4 = (a, b, c, d) => combine(combine(combine(a, b), c), d)
and so on. To reverse that computation, we only have to repeatedly call separate the correct number of times, each time keeping the second returned value. For example, if we previously had computed:
m = combine4(a, b, c, d)
we could get the four input values back as follows:
c3, d = separate(m)
c2, c = separate(c3)
a, b = separate(c2)
But your question asks for a way to combine an arbitrary number of values. To do that, we just need to do one final combine, which mixes in the number of values. That lets us get the original vector back out: first, we call separate to get the value count back out, and then we call separate enough times to extract each successive input value.
combine_n = v => combine(v.reduce(combine), v.length)

function separate_n(m) {
    let [r, n] = separate(m)
    let a = Array(n)
    for (let i = n - 1; i > 0; --i) [r, a[i]] = separate(r);
    a[0] = r;
    return a;
}
Note that the above two functions do not work on the empty vector, which should code to 0. Adding the correct checks for this case is left as an exercise. Also note the warning towards the bottom of this answer, about integer overflow.
A simple combine function: diagonalization
With that done, let's look at how to implement combine. There are actually many solutions, but one pretty simple one is to use the diagonalization function:
diag(a, b) = (a + b)(a + b + 1) / 2 + a
This basically assigns positions in the infinite square by tracing successive diagonals:
      <------ b ------>
 ^    0  1  3  6 10 15 21 ...
 |    2  4  7 11 16 22 ...
 |    5  8 12 17 23 ...
 a    9 13 18 24 ...
 |   14 19 25 ...
 |   20 26 ...
 v   27 ...
(In an earlier version of this answer, I had reversed a and b, but this version seems to have slightly more intuitive output values.)
Note that the top row, where a == 0, is exactly the triangular numbers, which is not surprising because the already enumerated positions are the top left triangle of the square.
To reverse the transformation, we start by solving the equation which defines the triangular numbers, m = s(s + 1)/2, which is the same as
0 = s² + s - 2m
whose solution can be found using the standard quadratic formula, resulting in:
s = floor((-1 + sqrt(1 + 8 * m)) / 2)
(s here is the original a+b; that is, the index of the diagonal.)
I should explain the call to floor which snuck in there. s will only be precisely an integer on the top row of the square, where a is 0. But, of course, a will usually not be 0, and m will usually be a little more than the triangular number we're looking for, so when we solve for s, we'll get some fractional value; floor just discards the fractional part, leaving the diagonal index. For example, with m = 8 we get s = floor((-1 + sqrt(65)) / 2) = floor(3.53) = 3, and then a = 8 - 6 = 2 and b = 3 - 2 = 1, matching the grid above.
Now we just have to recover a and b, which is straight-forward:
a = m - combine(0, s)
b = s - a
So we now have the definitions of combine and separate:
let combine = (a, b) => (a + b) * (a + b + 1) / 2 + a

function separate(m) {
    let s = Math.floor((-1 + Math.sqrt(1 + 8 * m)) / 2);
    let a = m - combine(0, s);
    let b = s - a;
    return [a, b];
}
One cool feature of this particular encoding is that every non-negative integer corresponds to a distinct vector. Many other encoding schemes do not have this property; the possible return values of combine_n are a subset of the set of non-negative integers.
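The same encoding in a Python sketch, this time with the empty-vector guard from the earlier exercise filled in (one way to do it: let the empty vector code to 0, which is safe because combine(r, n) is at least 1 whenever n >= 1):

from functools import reduce
import math

def combine(a, b):
    return (a + b) * (a + b + 1) // 2 + a

def separate(m):
    s = (math.isqrt(8 * m + 1) - 1) // 2  # index of the diagonal
    a = m - combine(0, s)
    return a, s - a

def combine_n(v):
    if not v:                              # empty vector codes to 0
        return 0
    return combine(reduce(combine, v), len(v))

def separate_n(m):                         # assumes m was made by combine_n
    if m == 0:
        return []
    r, n = separate(m)
    out = [0] * n
    for i in range(n - 1, 0, -1):
        r, out[i] = separate(r)
    out[0] = r
    return out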
Example encodings
For reference, here are the first 30 encoded values, and the vectors they represent:
> for (let i = 1; i <= 30; ++i) console.log(i, separate_n(i));
1 [ 0 ]
2 [ 1 ]
3 [ 0, 0 ]
4 [ 1 ]
5 [ 2 ]
6 [ 0, 0, 0 ]
7 [ 0, 1 ]
8 [ 2 ]
9 [ 3 ]
10 [ 0, 0, 0, 0 ]
11 [ 0, 0, 1 ]
12 [ 1, 0 ]
13 [ 3 ]
14 [ 4 ]
15 [ 0, 0, 0, 0, 0 ]
16 [ 0, 0, 0, 1 ]
17 [ 0, 1, 0 ]
18 [ 0, 2 ]
19 [ 4 ]
20 [ 5 ]
21 [ 0, 0, 0, 0, 0, 0 ]
22 [ 0, 0, 0, 0, 1 ]
23 [ 0, 0, 1, 0 ]
24 [ 0, 0, 2 ]
25 [ 1, 1 ]
26 [ 5 ]
27 [ 6 ]
28 [ 0, 0, 0, 0, 0, 0, 0 ]
29 [ 0, 0, 0, 0, 0, 1 ]
30 [ 0, 0, 0, 1, 0 ]
Warning!
Observe that all of the unencoded values are pretty small. The encoded value is similar in size to the concatenation of all the input values, so it grows pretty rapidly; you have to be careful not to exceed JavaScript's limit on exact integer computation (2^53). Once the encoded value exceeds this limit, it will no longer be possible to reverse the encoding. If your input vectors are long and/or the encoded values are large, you'll need to find some kind of bignum support in order to do precise integer computations.
Alternative combine functions
Another possible implementation of combine is:
let combine = (a, b) => 2**a * 3**b
In fact, using powers of primes, we could dispense with the combine_n sequence, and just produce the combination directly:
combine(a, b, c, d, e, ...) = 2^a · 3^b · 5^c · 7^d · 11^e · ...
(That assumes the encoded values are strictly positive; if they could be 0, we'd have no way of knowing how long the sequence was, because the encoded value does not distinguish between a vector and the same vector with a 0 appended. But that's not a big issue: if we needed to deal with 0s, we would just add one to every exponent:
combine(a, b, c, d, e, ...) = 2^(a+1) · 3^(b+1) · 5^(c+1) · 7^(d+1) · 11^(e+1) · ...)
That is certainly correct, and it's very elegant in a theoretical sense; it's the solution you will find in theoretical CS textbooks, because it makes uniqueness and reversibility much easier to prove. In the real world, however, it is really not practical: reversing the combination requires finding the prime factorization of the encoded value, and the encoded values are truly enormous, well out of the range of easily representable numbers.
Another possibility is precisely the one you mention in the question: simply put a separator between successive values. One simple way to do this is to rewrite each value in base 9 (or base 15) and then increment all the digit values, so that the digit 0 is not present in any encoded value. Then we can put 0s between the encoded values and read the result in base 10 (or base 16).
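A minimal sketch of that scheme in Python (assuming non-negative integers and a non-empty input list):

def encode(values):
    # Write each value in base 9 with every digit shifted up by 1 (so the
    # character '0' never appears), then join with '0' and read in base 10.
    def base9_shifted(v):
        digits = '1' if v == 0 else ''
        while v:
            digits = str(v % 9 + 1) + digits
            v //= 9
        return digits
    return int('0'.join(base9_shifted(v) for v in values))

def decode(m):
    # Split on the '0' separators, undo the digit shift, parse in base 9.
    return [int(''.join(str(int(d) - 1) for d in piece), 9)
            for piece in str(m).split('0')]

For example, encode([13105, 705016, 13]) gives 2998202395192025, and decode(2998202395192025) returns [13105, 705016, 13].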
Neither of these solutions has the property that every non-negative integer is the encoding of some vector. (The second one almost has that property, and it's a useful exercise to figure out which integers are not possible encodings, and then fix the encoding algorithm to avoid that problem.)

Accessing ghosted chunks with dask

Using dask, I would like to break up an image array into overlapping tiles, perform a computation (on all the tiles simultaneously), and then stitch the results back into an image.
The following works, but feels clumsy:
from dask import array as da
from dask.array import ghost
import numpy as np
test_data = np.random.random((50, 50))
x = da.from_array(test_data, chunks=(10, 10))
depth = {0: 1, 1: 1}
g = ghost.ghost(x, depth=depth, boundary='reflect')
# Calculate the shape of the array in terms of chunks
chunk_shape = [len(c) for c in g.chunks]
chunk_nr = np.prod(chunk_shape)
# Allocate a list for results (as many entries as there are chunks)
blocks = [None,] * chunk_nr
def pack_block(block, block_id):
    """Store `block` at the correct position in `blocks`,
    according to its `block_id`.

    E.g., with ``block_id == (0, 3)``, the block will be stored at
    ``blocks[3]``.
    """
    idx = np.ravel_multi_index(block_id, chunk_shape)
    blocks[idx] = block
    # We don't really need to return anything, but this will do
    return block
g.map_blocks(pack_block).compute()
# Do some operation on the blocks; this is an over-simplified example.
# Typically, I want to do an operation that considers *all*
# blocks simultaneously, hence the need to first unpack into a list.
blocks = [b**2 for b in blocks]
def retrieve_block(_, block_id):
    """Fetch the correct block from the results set, `blocks`."""
    idx = np.ravel_multi_index(block_id, chunk_shape)
    return blocks[idx]
result = g.map_blocks(retrieve_block)
# Slice off excess from each computed chunk
result = ghost.trim_internal(result, depth)
result = result.compute()
Is there a cleaner way to achieve the same end result?
The user-facing API for this is the map_overlap method:
>>> x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
>>> x = da.from_array(x, chunks=5)
>>> def derivative(x):
... return x - np.roll(x, 1)
>>> y = x.map_overlap(derivative, depth=1, boundary=0)
>>> y.compute()
array([ 1, 0, 1, 1, 0, 0, -1, -1, 0])
Two additional notes for your use case
Avoid hashing costs by supplying name=False to from_array. This saves you about 400 MB/s in hashing costs, assuming you don't have any fancy hashing libraries around.
x = da.from_array(x, name=False)
Be careful about computing in place. Dask doesn't guarantee correct behavior if user functions mutate data in place. In this particular case it's probably fine, since we're copying for ghosting anyway, but it's something to be aware of.
Second answer
Given the comment by @stefan-van-der-walt, we'll try another solution.
Consider using the .to_delayed() method to get an array of chunks as dask.delayed objects
depth = {0: 1, 1: 1}
g = ghost.ghost(x, depth=depth, boundary='reflect')
blocks = g.to_delayed()
This gives you a numpy array of dask.delayed objects, each of which points to a block. You can now perform arbitrary parallel computations on these blocks. If I wanted them all to arrive at the same function, I might call the following:
result = dask.delayed(f)(blocks.tolist())
The function f will then get a list of lists of numpy arrays, each of which corresponds to one block in the dask.array g.
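Putting that together, a minimal end-to-end sketch (using the same ghost API as the question; f is a stand-in for whatever global operation you want to run):

import dask
import numpy as np
from dask import array as da
from dask.array import ghost

test_data = np.random.random((50, 50))
x = da.from_array(test_data, chunks=(10, 10))
g = ghost.ghost(x, depth={0: 1, 1: 1}, boundary='reflect')

blocks = g.to_delayed()  # numpy array of dask.delayed objects, one per chunk

def f(block_lists):
    # Stand-in for an operation that needs to see *all* blocks at once;
    # here we just square each block.
    return [[b ** 2 for b in row] for row in block_lists]

result = dask.delayed(f)(blocks.tolist())
squared_blocks = result.compute()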

Prolog print value as result instead of true

I need to write a program which returns a new list built from a given list, with the following criteria: if a list member is negative or 0, it should add that value 3 times to the new list; if a member is positive, it should add the value 2 times.
For example :
goal: dt([-3,2,0],R).
R = [-3,-3,-3,2,2,0,0,0].
I have written the following code and it works fine for me, but it returns true as the result instead of R = [some_values].
My code :
dt([],R):- write(R). % end: print the new list
dt([X|Tail],R):- X =< 0, addNegative(Tail,X,R). % add 3 negatives or 0
dt([X|Tail],R):- X > 0, addPositive(Tail,X,R). % add 2 positives
addNegative(Tail,X,R):- append([X,X,X],R,Z), dt(Tail, Z).
addPositive(Tail,X,R):- append([X,X],R,Z), dt(Tail, Z).
Maybe someone knows how to make it give R = [some_values] instead of true.
Your code builds the value of R as it goes down the recursion chain top-to-bottom, treating the value passed in as the initial list. Calling dt/2 with an empty list produces the desired output:
:- dt([-3,2,0],[]).
Demo #1 - Note the reversed order
This is, however, an unusual way of doing things in Prolog: typically, R is your return value, produced the other way around, where the base case services the "empty list" situation and the rest of the rules grow the result from that empty list:
dt([],[]). % Base case: empty list produces an empty list
dt([X|Tail],R):- X =< 0, addNegative(Tail,X,R).
dt([X|Tail],R):- X > 0, addPositive(Tail,X,R).
% The two remaining rules do the tail first, then append:
addNegative(Tail,X,R):- dt(Tail, Z), append([X,X,X], Z, R).
addPositive(Tail,X,R):- dt(Tail, Z), append([X,X], Z, R).
Demo #2
Why do you call write inside your clauses?
Better not to have side effects in your clauses:
dt([], []).
dt([N|NS], [N,N,N|MS]) :-
    N =< 0,
    dt(NS, MS).
dt([N|NS], [N,N|MS]) :-
    N > 0,
    dt(NS, MS).
That will work:
?- dt([-3,2,0], R).
R = [-3, -3, -3, 2, 2, 0, 0, 0] .
A further advantage of not invoking side effects in clauses is that the predicate works in reverse, too:
?- dt(R, [-3, -3, -3, 2, 2, 0, 0, 0]).
R = [-3, 2, 0] .
Of course, you can invoke write outside of your clauses:
?- dt([-3,2,0], R), write(R).
[-3,-3,-3,2,2,0,0,0]
R = [-3, -3, -3, 2, 2, 0, 0, 0] .

Backpropagation, all outputs tend to 1

I have this backpropagation implementation in MATLAB and am having trouble training it. Early in the training phase, all of the outputs go to 1. I have normalized the input data (except the desired class, which is used to generate a binary target vector) to the interval [0, 1]. I have been referring to the implementation in Artificial Intelligence: A Modern Approach (Russell and Norvig).
Having checked the pseudocode against my code (and studied the algorithm for some time), I cannot spot the error. I have not been using MATLAB for long, so I have been trying to use the documentation where needed.
I have also tried different amounts of nodes in the hidden layer and different learning rates (ALPHA).
The target encodings are as follows: when the target is, say, class 2, the target vector would be [0,1,0]; were it 1, [1,0,0]; and so on and so forth. I have also tried using different values for the targets, such as [0.5, 0, 0] for class 1.
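For clarity, the encoding is plain one-hot. A quick Python sketch of what the targetVec helper used below is meant to do (the MATLAB helper itself isn't shown):

import numpy as np

def target_vec(label, num_out):
    # One-hot encode a 1-based class label, e.g. target_vec(2, 3) -> [0, 1, 0].
    t = np.zeros(num_out)
    t[int(label) - 1] = 1.0
    return t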
I noticed that some of my weights go above 1, resulting in large net values.
%Topological constants
NUM_HIDDEN = 8+1; %written as n+1 so it is clear a bias is used
NUM_OUT = 3;
%Training constants
ALPHA = 0.01;
TARG_ERR = 0.01;
MAX_EPOCH = 50000;
%Read and normalize data file.
X = normdata(dlmread('iris.data'));
X = shuffle(X);
%X_test = normdata(dlmread('iris2.data'));
%epocherrors = fopen('epocherrors.txt', 'w');
%Weight matrices.
%Features constitute size(X, 2)-1; however, size(X, 2) is used to allow for
%appending the bias.
w_IH = rand(size(X, 2), NUM_HIDDEN)-(0.5*rand(size(X, 2), NUM_HIDDEN));
w_HO = rand(NUM_HIDDEN+1, NUM_OUT)-(0.5*rand(NUM_HIDDEN+1, NUM_OUT)); %+1 for bias
%Layer nets
net_H = zeros(NUM_HIDDEN, 1);
net_O = zeros(NUM_OUT, 1);
%Layer outputs
out_H = zeros(NUM_HIDDEN, 1);
out_O = zeros(NUM_OUT, 1);
%Layer deltas
d_H = zeros(NUM_HIDDEN, 1);
d_O = zeros(NUM_OUT, 1);
%Control variables
error = inf;
epoch = 0;
%Run the algorithm.
while error > TARG_ERR && epoch < MAX_EPOCH
    for n=1:size(X, 1)
        x = [X(n, 1:size(X, 2)-1) 1]'; %Add bias for hiddens & transpose to column vector.
        o = X(n, size(X, 2));
        %Forward propagate.
        net_H = w_IH'*x; %Transposed w.
        out_H = [sigmoid(net_H); 1]; %Append 1 for bias to outputs.
        net_O = w_HO'*out_H; %Again, transposed w.
        out_O = sigmoid(net_O);
        %Calculate output deltas.
        d_O = ((targetVec(o, NUM_OUT)-out_O) .* (out_O .* (1-out_O)));
        %Calculate hidden deltas.
        for i=1:size(w_HO, 1)
            delta_weight = 0;
            for j=1:size(w_HO, 2)
                delta_weight = delta_weight + d_O(j)*w_HO(i, j);
            end
            d_H(i) = (out_H(i)*(1-out_H(i)))*delta_weight;
        end
        %Update hidden-output weights.
        for i=1:size(w_HO, 1)
            for j=1:size(w_HO, 2)
                w_HO(i, j) = w_HO(i, j) + (ALPHA*out_H(i)*d_O(j));
            end
        end
        %Update input-hidden weights.
        for i=1:size(w_IH, 1)
            for j=1:size(w_IH, 2)
                w_IH(i, j) = w_IH(i, j) + (ALPHA*x(i)*d_H(j));
            end
        end
        out_O
        o
        %out_H
        %w_IH
        %w_HO
        %d_O
        %d_H
    end
end
function outs = sigmoid(nets)
    outs = zeros(size(nets, 1), 1);
    for i=1:size(nets, 1)
        if nets(i) < -45
            outs(i) = 0;
        elseif nets(i) > 45
            outs(i) = 1;
        else
            outs(i) = 1/1+exp(-nets(i));
        end
    end
end
From what we've established in the comments, the only thing that comes to mind is the collection of recipes written down in this great NN archive:
ftp://ftp.sas.com/pub/neural/FAQ2.html#questions
First things you could try are:
1) How to avoid overflow in the logistic function? That's probably the problem - many times when I've implemented NNs, the issue was such an overflow.
2) How should categories be encoded?
And more generally:
3) How does ill-conditioning affect NN training?
4) Help! My NN won't learn! What should I do?
After the discussion it turns out the problem lies within the sigmoid function:
function outs = sigmoid(nets)
    %...
    outs(i) = 1/1+exp(-nets(i)); % parenthesis missing!!!!!!
    %...
end
It should be:
function outs = sigmoid(nets)
    %...
    outs(i) = 1/(1+exp(-nets(i)));
    %...
end
The missing parentheses meant the sigmoid output was sometimes larger than 1. That made the gradient calculation incorrect (because it was no longer the gradient of this function) and could make the gradient negative, so the delta for the output layer pointed in the wrong direction most of the time. After the fix (and after correctly maintaining the error variable, which seems to be missing from your code), all seems to work fine.
Besides that, there are two other main problems with this code:
1) No bias. Without a bias, each neuron can only represent a line that crosses the origin. If the data is normalized (i.e. values are between 0 and 1), some configurations will be inseparable.
2) Lack of guarding against high gradient values (point 1 in my previous answer).
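For completeness, a guarded logistic function as a Python/NumPy sketch (the +/-45 cutoff mirrors the cutoff already in the question's code; the essential part is the parenthesized denominator):

import numpy as np

def sigmoid(nets, clip=45.0):
    # Clamp net inputs so exp() cannot overflow; beyond +/-45 the logistic
    # function is 0 or 1 to double precision anyway.
    nets = np.clip(np.asarray(nets, dtype=float), -clip, clip)
    return 1.0 / (1.0 + np.exp(-nets))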
