The total error for the network did not change on over 100,000 iterations.
The input is 22 values and the output is a single value. the input array is [195][22] and the output array is [195][1].
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null,true,22));
network.addLayer(new BasicLayer(new ActivationSigmoid(),true,10));
network.addLayer(new BasicLayer(new ActivationSigmoid(),false,1));
network.getStructure().finalizeStructure();
network.reset();
MLDataSet training_data = new BasicMLDataSet(input, target_output);
final Backpropagation train = new Backpropagation(network, training_data);
int epoch = 1;
do {
train.iteration();
System.out.println("Epoch #" + epoch + " Error:" + train.getError());
epoch++;
}
while(train.getError() > 0.01);
{
train.finishTraining();
}
What is wrong with this code?
Depending on what the data you are trying to classify your network may be too small to transform the search space into a linearly separable problem. So try adding more neurons or layers - this will probably take longer to train. Unless it is already linearly separable and then a NN may be an inefficient way to solve this.
Also you don't have a training strategy, if the NN falls into local minima on the error surface it will be stuck there. See the encog user guide https://s3.amazonaws.com/heatonresearch-books/free/Encog3Java-User.pdf pg 166 has a list of training strategy's.
final int strategyCycles = 50;
final double strategyError = 0.25;
train.addStrategy(new ResetStrategy(strategyError,strategyCycles));
Related
I'm currently working on implementing a neural CRF as a project for school and am looking around for repos to reference.
I encountered this one the other day and have been completely stumped by the implementation of the forward algorithm.
T = feats.shape[1]
batch_size = feats.shape[0]
# alpha_recursion,forward, alpha(zt)=p(zt,bar_x_1:t)
log_alpha = torch.Tensor(batch_size, 1, self.num_labels).fill_(-10000.).to(self.device)
# normal_alpha_0 : alpha[0]=Ot[0]*self.PIs
# self.start_label has all of the score. it is log,0 is p=1
log_alpha[:, 0, self.start_label_id] = 0
# feats: sentances -> word embedding -> lstm -> MLP -> feats
# feats is the probability of emission, feat.shape=(1,tag_size)
for t in range(1, T):
log_alpha = (log_sum_exp_batch(self.transitions + log_alpha, axis=-1) + feats[:, t]).unsqueeze(1)
# log_prob of all barX
log_prob_all_barX = log_sum_exp_batch(log_alpha)
return log_prob_all_barX
From my understanding, the forward algorithm and its log alpha should be the log of the following where s_m is the scoring function - transition score from previous to current tag + emission score of neural hidden state/feature:
it seems to me that the code should be something more akin to log_alpha = log_sum_exp(transition_score + feat_score + log_alpha) if the log is applied.
I am writing a program of classification problem using LSTM.
However, I do not know how to calculate cross entropy with all the output of LSTM.
Here is a part of my program.
cell_fw = tf.nn.rnn_cell.LSTMCell(num_hidden)
cell_bw = tf.nn.rnn_cell.LSTMCell(num_hidden)
outputs, _ = tf.nn.bidirectional_dynamic_rnn(cell_fw,cell_bw,inputs = inputs3, dtype=tf.float32,sequence_length= seq_len)
outputs = tf.concat(outputs,axis=2)
#outputs [batch_size,max_timestep,num_features]
outputs = tf.reshape(outputs, [-1, num_hidden*2])
W = tf.Variable(tf.truncated_normal([num_hidden*2,
num_classes],
stddev=0.1))
b = tf.Variable(tf.constant(0., shape=[num_classes]))
logits = tf.matmul(outputs, W) + b
How can I apply crossentropy error to this?
Should I create a vector that represents the same class as the number of max_timestep for each batch and calculate the error with that?
Have you looked at cross_entropy documentation: https://www.tensorflow.org/api_docs/python/tf/losses/softmax_cross_entropy ?
The dimension of onehot_labels should answer your question.
I'm beginner in tensorflow and i'm working on a Model which Colorize Greyscale images and in the last part of the model the paper say :
Once the features are fused, they are processed by a set of
convolutions and upsampling layers, the latter which consist of simply
upsampling the input by using the nearest neighbour technique so that
the output is twice as wide and twice as tall.
when i tried to implement it in tensorflow i used tf.image.resize_nearest_neighbor for upsampling but when i used it i found the cost didn't change in all the epochs except of the 2nd epoch, and without it the cost is optmized and changed
This part of code
def Model(Input_images):
#some code till the following last part
Color_weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,256,128])),'W_conv2':tf.Variable(tf.random_normal([3,3,128,64])),
'W_conv3':tf.Variable(tf.random_normal([3,3,64,64])),
'W_conv4':tf.Variable(tf.random_normal([3,3,64,32])),'W_conv5':tf.Variable(tf.random_normal([3,3,32,2]))}
Color_biases = {'b_conv1':tf.Variable(tf.random_normal([128])),'b_conv2':tf.Variable(tf.random_normal([64])),'b_conv3':tf.Variable(tf.random_normal([64])),
'b_conv4':tf.Variable(tf.random_normal([32])),'b_conv5':tf.Variable(tf.random_normal([2]))}
Color_layer1 = tf.nn.relu(Conv2d(Fuse, Color_weights['W_conv1'], 1) + Color_biases['b_conv1'])
Color_layer1_up = tf.image.resize_nearest_neighbor(Color_layer1,[56,56])
Color_layer2 = tf.nn.relu(Conv2d(Color_layer1_up, Color_weights['W_conv2'], 1) + Color_biases['b_conv2'])
Color_layer3 = tf.nn.relu(Conv2d(Color_layer2, Color_weights['W_conv3'], 1) + Color_biases['b_conv3'])
Color_layer3_up = tf.image.resize_nearest_neighbor(Color_layer3,[112,112])
Color_layer4 = tf.nn.relu(Conv2d(Color_layer3, Color_weights['W_conv4'], 1) + Color_biases['b_conv4'])
return Color_layer4
The Training Code
Prediction = Model(Input_images)
Colorization_MSE = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(Prediction,tf.Variable(tf.random_normal([2,112,112,32]))))
Optmizer = tf.train.AdadeltaOptimizer(learning_rate= 0.05).minimize(Colorization_MSE)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for epoch in range(EpochsNum):
epoch_loss = 0
Batch_indx = 1
for i in range(int(ExamplesNum / Batch_size)):#Over batches
print("Batch Num ",i + 1)
ReadNextBatch()
a, c = sess.run([Optmizer,Colorization_MSE],feed_dict={Input_images:Batch_GreyImages})
epoch_loss += c
print("epoch: ",epoch + 1, ",Los: ",epoch_loss)
So what is wrong with my logic or if the problem is in
tf.image.resize_nearest_neighbor what should i do or what is it's replacement ?
Ok, i solved it, i noticed that tf.random normal was the problem and when i replaced it with tf.truncated normal it is works well
I am trying to implement a custom Keras objective function:
in 'Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression', Narihira et al.
This is the sum of equations (4) and (6) from the previous picture. Y* is the ground truth, Y a prediction map and y = Y* - Y.
This is my code:
def custom_objective(y_true, y_pred):
#Eq. (4) Scale invariant L2 loss
y = y_true - y_pred
h = 0.5 # lambda
term1 = K.mean(K.sum(K.square(y)))
term2 = K.square(K.mean(K.sum(y)))
sca = term1-h*term2
#Eq. (6) Gradient L2 loss
gra = K.mean(K.sum((K.square(K.gradients(K.sum(y[:,1]), y)) + K.square(K.gradients(K.sum(y[1,:]), y)))))
return (sca + gra)
However, I suspect that the equation (6) is not correctly implemented because the results are not good. Am I computing this right?
Thank you!
Edit:
I am trying to approximate (6) convolving with Prewitt filters. It works when my input is a chunk of images i.e. y[batch_size, channels, row, cols], but not with y_true and y_pred (which are of type TensorType(float32, 4D)).
My code:
def cconv(image, g_kernel, batch_size):
g_kernel = theano.shared(g_kernel)
M = T.dtensor3()
conv = theano.function(
inputs=[M],
outputs=conv2d(M, g_kernel, border_mode='full'),
)
accum = 0
for curr_batch in range (batch_size):
accum = accum + conv(image[curr_batch])
return accum/batch_size
def gradient_loss(y_true, y_pred):
y = y_true - y_pred
batch_size = 40
# Direction i
pw_x = np.array([[-1,0,1],[-1,0,1],[-1,0,1]]).astype(np.float64)
g_x = cconv(y, pw_x, batch_size)
# Direction j
pw_y = np.array([[-1,-1,-1],[0,0,0],[1,1,1]]).astype(np.float64)
g_y = cconv(y, pw_y, batch_size)
gra_l2_loss = K.mean(K.square(g_x) + K.square(g_y))
return (gra_l2_loss)
The crash is produced in:
accum = accum + conv(image[curr_batch])
...and error description is the following one:
*** TypeError: ('Bad input argument to theano function with name "custom_models.py:836" at index 0 (0-based)', 'Expected an array-like
object, but found a Variable: maybe you are trying to call a function
on a (possibly shared) variable instead of a numeric array?')
How can I use y (y_true - y_pred) as a numpy array, or how can I solve this issue?
SIL2
term1 = K.mean(K.square(y))
term2 = K.square(K.mean(y))
[...]
One mistake spread across the code was that when you see (1/n * sum()) in the equations, it is a mean. Not the mean of a sum.
Gradient
After reading your comment and giving it more thought, I think there is a confusion about the gradient. At least I got confused.
There are two ways of interpreting the gradient symbol:
The gradient of a vector where y should be differentiated with respect to the parameters of your model (usually the weights of the neural net). In previous edits I started to write in this direction because that's the sort of approach used to trained the model (eg. gradient descent). But I think I was wrong.
The pixel intensity gradient in a picture, as you mentioned in your comment. The diff of each pixel with its neighbor in each direction. In which case I guess you have to translate the example you gave into Keras.
To sum up, K.gradients() and numpy.gradient() are not used in the same way. Because numpy implicitly considers (i, j) (the row and column indices) as the two input variables, while when you feed a 2D image to a neural net, every single pixel is an input variable. Hope I'm clear.
I've been using MATLAB for my time series dataset (for an electricity dataset) as a part of my course. It consists of 40,000+ samples. After the formation of neural network, I wanted to test its accuracy. I've been curious more on MAPE(mean absolute percentage error) and RMS(Root Mean Square) errors. To calculate them, I've used following lines of code.
mape_res = zeros(N_TRAIN);
mse_res = zeros(N_TRAIN);
for i_train = 1:N_TRAIN
Inp = inputs_consumption(i_train );
Actual_Output = targets_consumption( i_train + 1 );
Observed_Output = sim( ann, Inp );
mape_res(i_train) = abs(Observed_Output - Actual_Output)/Actual_Output;
mse_res(i_train) = Observed_Output - Actual_Output;
end
mape = sum(mape_res)/N_TRAIN;
mse = sum(power(mse_res,2))/N_TRAIN;
sprintf( 'The MSE on training is %g', mse )
sprintf( 'The MAPE on training is %g', mape )
The problem with above coding is that, for a large dataset(40K samples), it takes almost 15 minutes to iterate through all those loops and it's quite a long waiting for getting result for the error rate; Isn't there any other efficient way to calculate them?
You could always do a rolling average that gets updated each iteration, as follows:
mape_res = abs(Observed_Output - Actual_Output) / Actual_Output;
mse_res = Observed_Output - Actual_Output;
alpha = 1 / i_train;
mape = mape * (1 - alpha) + mape_res * alpha;
mse = mes * (1 - alpha) + power(mse_res,2) * alpha;
Then you could either display the resulting values each iteration, use them for stopping criteria if the desired error rate is reached, or both. This also has the added benefit of not requiring the initialization and population of the mape_res and mse_res vectors unless they happen to be needed elsewhere...
Edit: Do make sure to initialize the mape and mse values to zero prior to entering the for loop :)