No weights change, only the bias of the last convnet? - machine-learning

i got a one-file python game, where a pixel in the first array should hunt (on the same postion in his array) a pixel in the second array. I trained it now for hours and hours and the only thing changes in the neural net seemed to be the bias of the last convnet ? I think, mostly the weights should change and not so much the bias, or? The code of this simple game is here: https://github.com/flobotics/flobotics_tensorflow_game/blob/master/pixel_hunter_game/flobotics_game.py
And here i got pictures of the weights and biases in tensorboard

It seemed that batch_norm does not exist anymore, but there is batch_normalization ? would that be an correct implementation in my case ?
h_conv1 = tf.nn.relu(tf.nn.conv2d(input_layer, conv_weights_1, strides=[1, 4, 4, 1], padding="SAME") + conv_biases_1)
#batch normalization
bn_mean, bn_variance = tf.nn.moments(h_conv1,[0,1,2])
bn_scale = tf.Variable(tf.ones([32]))
bn_offset = tf.Variable(tf.zeros([32]))
bn_epsilon = 1e-3
bn_conv1 = tf.nn.batch_normalization(h_conv1, bn_mean, bn_variance, bn_offset, bn_scale, bn_epsilon)
#h_conv1 = tf.nn.relu(tf.nn.conv2d(input_layer, conv_weights_1, strides=[1, 4, 4, 1], padding="SAME") + conv_biases_1)
#h_pool1 = max_pool_2x2(h_conv1)
h_pool1 = max_pool_2x2(bn_conv1)

Related

Pytorch Multiclass Logistic Regression Type Errors

I'm new to ML and even more naive with Pytorch. Here's the problem. (I've skipped certain parts like the random_split() which seem to work just fine)
I've to predict wine quality (red) which from the dataset is the last column with 6 classes
That's what my dataset looks like
The link to the dataset (winequality-red.csv)
features = df.drop(['quality'], axis = 1)
targets = df.iloc[:, -1] # theres 6 classes
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).float())
# here's where I think the error might be, but I might be wrong
batch_size = 8
# Dataloader
train_loader = DataLoader(train_ds, batch_size, shuffle = True)
val_loader = DataLoader(val_ds, batch_size)
test_ds = DataLoader(test_ds, batch_size)
input_size = len(df.columns) - 1
output_size = 6
threshold = .5
class WineModel(nn.Module):
def __init__(self):
super().__init__()
self.linear = nn.Linear(input_size, output_size)
def forward(self, xb):
out = self.linear(xb)
return out
model = WineModel()
n_iters = 2000
num_epochs = n_iters / (len(train_ds) / batch_size)
num_epochs = int(num_epochs)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# the part below returns the error on running
iter = 0
for epoch in range(num_epochs):
for i, (x, y) in enumerate(train_loader):
optimizer.zero_grad()
outputs = model(x)
loss = criterion(outputs, y)
loss.backward()
optimizer.step()
RuntimeError: expected scalar type Long but found Float
Hopefully that is sufficient info
The targets for nn.CrossEntropyLoss are given as the class indices, which are required to be integers, to be precise they need to be of type torch.long, which is equivalent to torch.int64.
You converted the targets to floats, but you should convert them to longs:
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).long())
Since the targets are the indices of the classes, they must be in range [0, num_classes - 1]. As you have 6 classes that would be in range [0, 5]. Having a quick look at your data, the quality uses values in range [3, 8]. Even though you have 6 classes, the values cannot be used directly as the classes. If you list the classes as classes = [3, 4, 5, 6, 7, 8], you can see that the first class is 3, classes[0] == 3, up to the last class being classes[5] == 8.
You need to replace the class values with the indices, just like you would for named classes (e.g. if you had the classes dog and cat, dog would be 0 and cat would be 1), but you can avoid having to look them up, since the values are simply shifted by 3, i.e. index = classes[index] - 3. Therefore you can subtract 3 from the entire target tensor:
torch.Tensor(targets).long() - 3

Training Loss Isnt Decreasing Across Epochs

I am trying to construct a siamese neural network to take in two facial expressions and output the probability that the two images are similar. I have 5 people, with 10 expressions per person, so 50 total images, but with Siamese, I can generate 2500 pairs (with repetition). I have already run dlib's facial landmark detection on each of the 50 images, so each of the two inputs to the siamese net are two flattened 136,1 element arrays. The siamese structure is below:
input_shape = (136,)
left_input = Input(input_shape, name = 'left')
right_input = Input(input_shape, name = 'right')
convnet = Sequential()
convnet.add(Dense(50,activation="relu"))
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)
L1_layer = Lambda(lambda tensors:K.abs(tensors[0] - tensors[1]))
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1,activation='relu')(L1_distance)
siamese_net = Model(inputs=[left_input,right_input],outputs=prediction)
optimizer = Adam()
siamese_net.compile(loss="binary_crossentropy",optimizer=optimizer)
I have an array called x_train, which is 80% of all possible labels and whose elements are a list of lists. Data is a 5x10x64x2 matrix, where there are 5 ppl, 10 expressions per, 64 facial landmarks, and 2 positons (x,y) per landmark
x_train = [ [ [Person, Expression] , [Person, Expression] ], ...]
data = np.load('data.npy')
My train loop goes as follows:
def train(data, labels, network, epochs, batch_size):
track_loss = defaultdict(list)
for i in range(0, epochs):
iterations = len(labels)//batch_size
remain = len(labels)%batch_size
shuffle(labels)
print('Epoch%s-----' %(i + 1))
for j in range(0, iterations):
batch = [j*batch_size, j*batch_size + batch_size]
if(j == iterations - 1):
batch[1] += remain
mini_batch = np.zeros(shape = (batch[1] - batch[0], 2, 136))
for k in range(batch[0], batch[1]):
prepx = data[labels[k][0][0],labels[k][0][1],:,:]
prepy = data[labels[k][1][0],labels[k][1][1],:,:]
mini_batch[k - batch[0]][0] = prepx.flatten()
mini_batch[k - batch[0]][1] = prepy.flatten()
targets = np.array([1 if(labels[i][0][1] == labels[i][1][1]) else 0 for i in range(batch[0], batch[1])])
new_batch = mini_batch.reshape(batch[1] - batch[0], 2, 136, 1)
new_targets = targets.reshape(batch[1] - batch[0], 1)
#print(mini_batch.shape, targets.shape)
loss=siamese_net.train_on_batch(
{
'left': mini_batch[:, 0, :],
'right': mini_batch[:, 1, :]
},targets)
track_loss['Epoch%s'%(i)].append(loss)
return network, track_loss
siamese_net, track_loss = train(data, x_train,siamese_net, 20, 30)
The value of each element in the target array is either a 0 or 1, depending on whether the two expressions inputted into the net are different or the same.
Although, I have seen in the omniglot example there were far more images and test images, my neural net doesn't have any decrease in loss.
EDIT:
Here is the new loss with the fixed target tensor:
Epoch1-----Loss: 1.979214
Epoch2-----Loss: 1.631347
Epoch3-----Loss: 1.628090
Epoch4-----Loss: 1.634603
Epoch5-----Loss: 1.621578
Epoch6-----Loss: 1.631347
Epoch7-----Loss: 1.631347
Epoch8-----Loss: 1.631347
Epoch9-----Loss: 1.621578
Epoch10-----Loss: 1.634603
Epoch11-----Loss: 1.634603
Epoch12-----Loss: 1.621578
Epoch13-----Loss: 1.628090
Epoch14-----Loss: 1.624834
Epoch15-----Loss: 1.631347
Epoch16-----Loss: 1.634603
Epoch17-----Loss: 1.628090
Epoch18-----Loss: 1.631347
Epoch19-----Loss: 1.624834
Epoch20-----Loss: 1.624834
I want to know how to improve my architecture and training procedure, or even data prep, in order to improve the performance of the neural net. I assumed that using dlib's facial landmark detection would simplify complexity of the neural net, but I am starting to doubt that hypothesis.

GAN Converges in Just a Few Epochs

I implemented a genrative adversarial network in Keras. My training data size is about 16,000, where each image is of 32*32 size. All of my training images are the resized versions of the imageds from the imagenet dataset with regard to the object detection task. I fed the image matrix directly into the network without doing the center crop. I used the AdamOptimizer with the learning rate being 1e-4, and beta1 being 0.5 and I also set the dropout rate to be 0.1. I first trained the discrimator on 3000 real images and 3000 fake images and it achieved a 93% accuracy. Then, I trained for 500 epochs with the batch size being 32. However, my model seemed to converge in only a few epochs(<10), and the images it generated were ugly.
Plot of the Loss Function
Random Samples Generated by the Generator
I was wondering whether my training dataset is too small(compared to those in the paper of DCGAN, which are more than 300,000) or my model configuration is not correct. What's more, should I train the SGD on D for k iterations (where k is small, perhaps 1) and then training with SGD on G for one iteration as suggested by Ian Goodfellow in the original paper?(I have just tried to train them one at a time)
Below is the configuration of the generator.
g_input = Input(shape=[100])
H = Dense(1024*4*4, init='glorot_normal')(g_input)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Reshape( [4, 4,1024] )(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(512, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(256, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(3, 3, 3, border_mode='same', init='glorot_uniform')(H)
g_V = Activation('tanh')(H)
generator = Model(g_input,g_V)
generator.compile(loss='binary_crossentropy', optimizer=opt)
generator.summary()
Below is the configuration of the discriminator:
d_input = Input(shape=shp)
H = Convolution2D(64, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(d_input)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Convolution2D(128, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(H)
H = BatchNormalization(mode=2)(H)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Flatten()(H)
H = Dense(256, init='glorot_normal')(H)
H = LeakyReLU(0.2)(H)
d_V = Dense(2,activation='softmax')(H)
discriminator = Model(d_input,d_V)
discriminator.compile(loss='categorical_crossentropy', optimizer=dopt)
discriminator.summary()
Below is the configuration of GAN as a whole:
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
I think problem is with loss function
Try
loss='categorical_crossentropy',
I suspect that your generator is trainable while you training the gan. You can verify by using generator.layers[-1].get_weights() to see if the parameters changed during training process of gan.
You should freeze discriminator before you assemble it to gan:
generator.trainnable = False
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
see this discussion:
https://github.com/fchollet/keras/issues/4674

Stacked RNN model setup in TensorFlow

I'm kind of lost in building up a stacked LSTM model for text classification in TensorFlow.
My input data was something like:
x_train = [[1.,1.,1.],[2.,2.,2.],[3.,3.,3.],...,[0.,0.,0.],[0.,0.,0.],
...... #I trained the network in batch with batch size set to 32.
]
y_train = [[1.,0.],[1.,0.],[0.,1.],...,[1.,0.],[0.,1.]]
# binary classification
The skeleton of my code looks like:
self._input = tf.placeholder(tf.float32, [self.batch_size, self.max_seq_length, self.vocab_dim], name='input')
self._target = tf.placeholder(tf.float32, [self.batch_size, 2], name='target')
lstm_cell = rnn_cell.BasicLSTMCell(self.vocab_dim, forget_bias=1.)
lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_ratio)
self.cells = rnn_cell.MultiRNNCell([lstm_cell] * self.num_layers)
self._initial_state = self.cells.zero_state(self.batch_size, tf.float32)
inputs = tf.nn.dropout(self._input, self.dropout_ratio)
inputs = [tf.reshape(input_, (self.batch_size, self.vocab_dim)) for input_ in
tf.split(1, self.max_seq_length, inputs)]
outputs, states = rnn.rnn(self.cells, inputs, initial_state=self._initial_state)
# We only care about the output of the last RNN cell...
y_pred = tf.nn.xw_plus_b(outputs[-1], tf.get_variable("softmax_w", [self.vocab_dim, 2]), tf.get_variable("softmax_b", [2]))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_pred, self._target))
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(self._target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)
init = tf.initialize_all_variables()
with tf.Session() as sess:
initializer = tf.random_uniform_initializer(-0.04, 0.04)
with tf.variable_scope("model", reuse=True, initializer=initializer):
sess.run(init)
# generate batches here (omitted for clarity)
print sess.run([train_op, loss, accuracy], feed_dict={self._input: batch_x, self._target: batch_y})
The problem is that no matter how large the dataset is, the loss and accuracy has no sign of improvement (looks completely stochastic). Am I doing anything wrong?
Update:
# First, load Word2Vec model in Gensim.
model = Doc2Vec.load(word2vec_path)
# Second, build the dictionary.
gensim_dict = Dictionary()
gensim_dict.doc2bow(model.vocab.keys(), allow_update=True)
w2indx = {v: k + 1 for k, v in gensim_dict.items()}
w2vec = {word: model[word] for word in w2indx.keys()}
# Third, read data from a text file.
for fname in fnames:
i = 0
with codecs.open(fname, 'r', encoding='utf8') as fr:
for line in fr:
tmp = []
for t in line.split():
tmp.append(t)
X_train.append(tmp)
i += 1
if i is samples_count:
break
# Fourth, convert words into vectors, and pad each sentence with ZERO arrays to a fixed length.
result = np.zeros((len(data), self.max_seq_length, self.vocab_dim), dtype=np.float32)
for rowNo in xrange(len(data)):
rowLen = len(data[rowNo])
for colNo in xrange(rowLen):
word = data[rowNo][colNo]
if word in w2vec:
result[rowNo][colNo] = w2vec[word]
else:
result[rowNo][colNo] = [0] * self.vocab_dim
for colPadding in xrange(rowLen, self.max_seq_length):
result[rowNo][colPadding] = [0] * self.vocab_dim
return result
# Fifth, generate batches and feed them to the model.
... Trivias ...
Here are few reasons it may not be training and suggestions to try:
You are not allowing to update word vectors, space of pre-learned vectors may be not working properly.
RNNs really need gradient clipping when trained. You can try adding something like this.
Unit scale initialization seems to work better, as it accounts for the size of the layer and allows gradient to be scaled properly as it goes deeper.
You should try removing dropout and second layer - just to check if your data passing is correct and your loss is going down at all.
I also can recommend trying this example with your data: https://github.com/tensorflow/skflow/blob/master/examples/text_classification.py
It trains word vectors from scratch, already has gradient clipping and uses GRUCells which usually are easier to train. You can also see nice visualizations for loss and other things by running tensorboard logdir=/tmp/tf_examples/word_rnn.

Backpropagation, all outputs tend to 1

I have this Backpropagation implementation in MATLAB, and have an issue with training it. Early on in the training phase, all of the outputs go to 1. I have normalized the input data(except the desired class, which is used to generate a binary target vector) to the interval [0, 1]. I have been referring to the implementation in Artificial Intelligence: A Modern Approach, Norvig et al.
Having checked the pseudocode against my code(and studying the algorithm for some time), I cannot spot the error. I have not been using MATLAB for that long, so have been trying to use the documentation where needed.
I have also tried different amounts of nodes in the hidden layer and different learning rates (ALPHA).
The target data encodings are as follows: when the target is to classify as, say 2, the target vector would be [0,1,0], say it were 1, [1, 0, 0] so on and so forth. I have also tried using different values for the target, such as (for class 1 for example) [0.5, 0, 0].
I noticed that some of my weights go above 1, resulting in large net values.
%Topological constants
NUM_HIDDEN = 8+1;%written as n+1 so is clear bias is used
NUM_OUT = 3;
%Training constants
ALPHA = 0.01;
TARG_ERR = 0.01;
MAX_EPOCH = 50000;
%Read and normalize data file.
X = normdata(dlmread('iris.data'));
X = shuffle(X);
%X_test = normdata(dlmread('iris2.data'));
%epocherrors = fopen('epocherrors.txt', 'w');
%Weight matrices.
%Features constitute size(X, 2)-1, however size is (X, 2) to allow for
%appending bias.
w_IH = rand(size(X, 2), NUM_HIDDEN)-(0.5*rand(size(X, 2), NUM_HIDDEN));
w_HO = rand(NUM_HIDDEN+1, NUM_OUT)-(0.5*rand(NUM_HIDDEN+1, NUM_OUT));%+1 for bias
%Layer nets
net_H = zeros(NUM_HIDDEN, 1);
net_O = zeros(NUM_OUT, 1);
%Layer outputs
out_H = zeros(NUM_HIDDEN, 1);
out_O = zeros(NUM_OUT, 1);
%Layer deltas
d_H = zeros(NUM_HIDDEN, 1);
d_O = zeros(NUM_OUT, 1);
%Control variables
error = inf;
epoch = 0;
%Run the algorithm.
while error > TARG_ERR && epoch < MAX_EPOCH
for n=1:size(X, 1)
x = [X(n, 1:size(X, 2)-1) 1]';%Add bias for hiddens & transpose to column vector.
o = X(n, size(X, 2));
%Forward propagate.
net_H = w_IH'*x;%Transposed w.
out_H = [sigmoid(net_H); 1]; %Append 1 for bias to outputs
net_O = w_HO'*out_H;
out_O = sigmoid(net_O); %Again, transposed w.
%Calculate output deltas.
d_O = ((targetVec(o, NUM_OUT)-out_O) .* (out_O .* (1-out_O)));
%Calculate hidden deltas.
for i=1:size(w_HO, 1);
delta_weight = 0;
for j=1:size(w_HO, 2)
delta_weight = delta_weight + d_O(j)*w_HO(i, j);
end
d_H(i) = (out_H(i)*(1-out_H(i)))*delta_weight;
end
%Update hidden-output weights
for i=1:size(w_HO, 1)
for j=1:size(w_HO, 2)
w_HO(i, j) = w_HO(i, j) + (ALPHA*out_H(i)*d_O(j));
end
end
%Update input-hidden weights.
for i=1:size(w_IH, 1)
for j=1:size(w_IH, 2)
w_IH(i, j) = w_IH(i, j) + (ALPHA*x(i)*d_H(j));
end
end
out_O
o
%out_H
%w_IH
%w_HO
%d_O
%d_H
end
end
function outs = sigmoid(nets)
outs = zeros(size(nets, 1), 1);
for i=1:size(nets, 1)
if nets(i) < -45
outs(i) = 0;
elseif nets(i) > 45
outs(i) = 1;
else
outs(i) = 1/1+exp(-nets(i));
end
end
end
From what we've established in the comments the only thing that comes in my mind are all recipes written down together in this great NN archive:
ftp://ftp.sas.com/pub/neural/FAQ2.html#questions
First things you could try are:
1) How to avoid overflow in the logistic function? Probably that's the problem - many times I've implemented NNs the problem was with such an overflow.
2) How should categories be encoded?
And more general:
3) How does ill-conditioning affect NN training?
4) Help! My NN won't learn! What should I do?
After the discussion it turns out the problem lies within the sigmoid function:
function outs = sigmoid(nets)
%...
outs(i) = 1/1+exp(-nets(i)); % parenthesis missing!!!!!!
%...
end
It should be:
function outs = sigmoid(nets)
%...
outs(i) = 1/(1+exp(-nets(i)));
%...
end
The lack of parenthesis caused that the sigmoid output was larger than 1 sometimes. That made the gradient calculation incorrect (because it wasn't a gradient of this function). This caused the gradient to be negative. And this caused that the delta for the output layer was most of the time in the wrong direction. After the fix (the after correctly maintaining the error variable - this seems to be missing in your code) all seems to work fine.
Beside that, there are two other main problems with this code:
1) No bias. Without the bias each neuron can only represent a line which crosses the origin. If data is normalized (i.e. values are between 0 and 1), some configurations are inseparable.
2) Lack of guarding against high gradient values (point 1 in my previous answer).

Resources