This is similar to a problem I was working through to understand how a neural network computes its output.
My assumption:
At hidden-layer node 1:
1*1+2*1+3*(-5) = -12
At node 2:
1*3+2*(-4)+3*2 = 1
At output layer:
(-12) * 2 + (-12) * (-1) = -12
1 * 2 + 1 *(-1) = 1
This gave the wrong output.
The output layer should be -12*2 + 1*(-1) = -25.
It is a single computation, not two as you wrote in the question.
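The arithmetic above can be checked with a short sketch. It assumes the inputs are 1, 2, 3, the hidden-node weights are (1, 1, -5) and (3, -4, 2), the output weights are (2, -1), and no activation function is applied, which is how the numbers in the sums read. The names `dot`, `hidden_weights`, and `output_weights` are illustrative only.

```python
def dot(weights, values):
    """Weighted sum of one node: sum of weight * value pairs."""
    return sum(w * v for w, v in zip(weights, values))

inputs = [1, 2, 3]
hidden_weights = [[1, 1, -5], [3, -4, 2]]  # assumed: one row per hidden node
output_weights = [2, -1]                   # assumed: one weight per hidden node

# Each hidden node produces one value.
hidden = [dot(w, inputs) for w in hidden_weights]

# The output node combines both hidden values in a single weighted sum.
output = dot(output_weights, hidden)

print(hidden, output)
```

The output node takes one weighted sum over both hidden values rather than one sum per hidden node.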
I'm trying to implement an addition to the loss function of the PPO algorithm in stable-baselines3. For this I collected additional observations for the states s(t-10) and s(t+1), which I can access in the train function of the PPO class in ppo.py as part of the rollout_buffer.
I'm using a 3-layer MLP as my network architecture and need the outputs of the second layer for the triplet (s(t-α), s(t), s(t+1)) to calculate L = max(d(s(t+1), s(t)) − d(s(t+1), s(t−α)) + γ, 0), where d is the L2 distance.
Finally, I want to add this term to the old loss, so loss = loss + 0.3 * L.
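As a plain-Python illustration of the term L defined above (not the stable-baselines3 code), the sketch below evaluates it for one triplet of feature vectors. The function name `triplet_term` and the list-based feature vectors are assumptions made for the example, and the margin γ defaults to 1.0.

```python
import math

def triplet_term(f_prev, f_now, f_next, margin=1.0):
    """L = max(d(f_next, f_now) - d(f_next, f_prev) + margin, 0), d = L2 distance."""
    d_near = math.dist(f_next, f_now)   # distance between s(t+1) and s(t)
    d_far = math.dist(f_next, f_prev)   # distance between s(t+1) and s(t-alpha)
    return max(d_near - d_far + margin, 0.0)

# s(t) already coincides with s(t+1) and s(t-alpha) is far away: the term vanishes.
print(triplet_term([0.0, 0.0], [3.0, 4.0], [3.0, 4.0]))
# s(t) is far from s(t+1) while s(t-alpha) coincides with it: the term is positive.
print(triplet_term([3.0, 4.0], [0.0, 0.0], [3.0, 4.0]))
```

The term is zero whenever s(t+1) is closer to s(t) than to s(t−α) by at least the margin.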
This is my implementation starting with the original loss in line 242:
loss = policy_loss + self.ent_coef * entropy_loss + self.vf_coef * value_loss
###############################
net1 = nn.Sequential(*list(self.policy.mlp_extractor.policy_net.children())[:-1])
L_losses = []
a = 0
obs = rollout_data.observations
obs_alpha = rollout_data.observations_alpha
obs_plusone = rollout_data.observations_plusone
inds = rollout_data.inds
for i in inds:
    if i > alpha: # only use observations for which L can be calculated
        fs_t = net1(obs[a])
        fs_talpha = net1(obs_alpha[a])
        fs_tone = net1(obs_plusone[a])
        L = max(
            th.norm(th.subtract(fs_tone, fs_t)) - th.norm(th.subtract(fs_tone, fs_talpha)) + 1.0, 0.0)
        L_losses.append(L)
    else:
        L_losses.append(0)
    a += 1
L_loss = th.mean(th.FloatTensor(L_losses))
loss += 0.3 * L_loss
With net1 I tried to get a clone of the original network that returns the outputs of the second layer. I am unsure whether this is the right way to do it.
I have some questions about my approach, because the resulting performance is slightly worse than without the added term, although it should be slightly better:
Does my way of getting the outputs of the second layer of the MLP work?
When loss.backward() is called, can the gradient be calculated correctly (with the new term included)?
I wrote a k-means algorithm in TensorFlow and tried to add the minimal distance to the summary:
Why don't the plots have any length in the horizontal dimension? What is on the horizontal axis? What does the m postfix mean?
The code is as follows:
global_count = 0
count = 0
self.report.add_time_stamp(description="Initializing with k-means...")
self.report.add_time_stamp(description="#\tdist")
self.report.add_time_stamp(description="%d\t%.4e " % (count, dist_min_sum_value))
while dist_min_sum_value != dist_min_sum_value_old:
    count += 1
    global_count += 1
    dist_min_sum_value_old = dist_min_sum_value
    mu_value, cluster_indice_value, distance_square_value, dist_min_value, dist_min_sum_value = \
        sess.run([self.mu_assign_new_kmeans, self.cluster_indices_assign_new_kmeans, self.distance_square,
                  self.dist_min, self.dist_min_sum])
    if self.k_means_summary is not None:
        k_means_summary_value = sess.run(self.k_means_summary)
        self.k_means_writer.add_summary(k_means_summary_value, global_count - 1)
    print("%d\t%.4e " % (count, dist_min_sum_value))
The k_means_summary definition looks like this:
self.dist_min_sum_summary = tf.summary.scalar('dist_min_sum_summary', tf.squeeze(self.dist_min_sum))
...
self.k_means_summary = tf.summary.merge([self.dist_min_sum_summary])
Is it correct to merge a single summary?
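To make the loop's stopping rule and its per-step logging concrete without TensorFlow, here is a minimal 1-D sketch. It is not the code from the question: `kmeans_1d` and `history` are hypothetical names, and `history` only stands in for what a summary writer would receive, one (step, value) pair per iteration.

```python
def kmeans_1d(points, centers):
    """Lloyd's algorithm on 1-D data; stops when the summed distance repeats."""
    centers = list(centers)   # work on a copy
    history = []              # (step, sum of squared distances to nearest center)
    previous = None
    step = 0
    while True:
        # Assignment step: index of the nearest center for every point.
        assignment = [
            min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points
        ]
        total = sum((p - centers[a]) ** 2 for p, a in zip(points, assignment))
        history.append((step, total))   # one logged value per step
        if total == previous:           # same stopping test as the question's loop
            break
        previous = total
        # Update step: move each center to the mean of its points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
        step += 1
    return centers, history

centers, history = kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5])
for step, value in history:
    print("%d\t%.4e" % (step, value))
```

A scalar plot needs several such pairs with distinct step values before it has any horizontal extent.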
It seems like the script may only be writing 1 event. What is the output of running
tensorboard --inspect --logdir=[LOGDIR]
?
That should output counts for all event types. The format (but not the data) should look like this:
audio -
graph -
histograms -
images -
scalars
   first_step 0
   last_step 42
   max_step 42
   min_step 0
   num_steps 43
   outoforder_steps []
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -
I'm a beginner in TensorFlow and I'm working on a model that colorizes greyscale images. For the last part of the model, the paper says:
Once the features are fused, they are processed by a set of
convolutions and upsampling layers, the latter which consist of simply
upsampling the input by using the nearest neighbour technique so that
the output is twice as wide and twice as tall.
When I tried to implement it in TensorFlow, I used tf.image.resize_nearest_neighbor for upsampling, but with it the cost did not change in any epoch except the 2nd. Without it, the cost changes and is optimized.
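For reference, the nearest-neighbour upsampling described in the quote (output twice as wide and twice as tall) can be sketched in plain Python on a nested list. This only illustrates the operation and is not the TensorFlow implementation; `upsample_nearest_2x` is a hypothetical name.

```python
def upsample_nearest_2x(image):
    """Repeat every pixel twice horizontally and every row twice vertically."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(2)]  # double the width
        out.append(wide)
        out.append(list(wide))                       # double the height
    return out

small = [[1, 2],
         [3, 4]]
for row in upsample_nearest_2x(small):
    print(row)
```

Each source pixel becomes a 2x2 block, so no new values are created by the upsampling itself.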
This is the relevant part of the code:
def Model(Input_images):
    #some code till the following last part
    Color_weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,256,128])),'W_conv2':tf.Variable(tf.random_normal([3,3,128,64])),
                     'W_conv3':tf.Variable(tf.random_normal([3,3,64,64])),
                     'W_conv4':tf.Variable(tf.random_normal([3,3,64,32])),'W_conv5':tf.Variable(tf.random_normal([3,3,32,2]))}
    Color_biases = {'b_conv1':tf.Variable(tf.random_normal([128])),'b_conv2':tf.Variable(tf.random_normal([64])),'b_conv3':tf.Variable(tf.random_normal([64])),
                    'b_conv4':tf.Variable(tf.random_normal([32])),'b_conv5':tf.Variable(tf.random_normal([2]))}
    Color_layer1 = tf.nn.relu(Conv2d(Fuse, Color_weights['W_conv1'], 1) + Color_biases['b_conv1'])
    Color_layer1_up = tf.image.resize_nearest_neighbor(Color_layer1,[56,56])
    Color_layer2 = tf.nn.relu(Conv2d(Color_layer1_up, Color_weights['W_conv2'], 1) + Color_biases['b_conv2'])
    Color_layer3 = tf.nn.relu(Conv2d(Color_layer2, Color_weights['W_conv3'], 1) + Color_biases['b_conv3'])
    Color_layer3_up = tf.image.resize_nearest_neighbor(Color_layer3,[112,112])
    Color_layer4 = tf.nn.relu(Conv2d(Color_layer3, Color_weights['W_conv4'], 1) + Color_biases['b_conv4'])
    return Color_layer4
The Training Code
Prediction = Model(Input_images)
Colorization_MSE = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(Prediction,tf.Variable(tf.random_normal([2,112,112,32]))))
Optmizer = tf.train.AdadeltaOptimizer(learning_rate= 0.05).minimize(Colorization_MSE)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for epoch in range(EpochsNum):
    epoch_loss = 0
    Batch_indx = 1
    for i in range(int(ExamplesNum / Batch_size)):#Over batches
        print("Batch Num ",i + 1)
        ReadNextBatch()
        a, c = sess.run([Optmizer,Colorization_MSE],feed_dict={Input_images:Batch_GreyImages})
        epoch_loss += c
    print("epoch: ",epoch + 1, ",Los: ",epoch_loss)
So what is wrong with my logic? Or, if the problem is in tf.image.resize_nearest_neighbor, what should I do, or what can I replace it with?
OK, I solved it. I noticed that tf.random_normal was the problem; when I replaced it with tf.truncated_normal, it worked well.
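The difference between the two initializers can be sketched with the standard library. A truncated normal re-draws any sample that lands more than two standard deviations from the mean, so it never produces the occasional very large initial weight that a plain normal can. The function below is an illustration with a hypothetical name, not TensorFlow's implementation.

```python
import random

def truncated_normal(mean=0.0, stddev=1.0, rng=random):
    """Draw from a normal distribution, re-drawing samples beyond 2 stddev."""
    while True:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2.0 * stddev:
            return x

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [truncated_normal(rng=rng) for _ in range(1000)]
print(max(abs(x) for x in samples))  # never exceeds 2.0
```

Whether this alone explains the stalled cost in the question is not established here; the sketch only shows what the replacement changes.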
I am using FANN for function approximation. My code is here:
/*
* File: main.cpp
* Author: johannsebastian
*
* Created on November 26, 2013, 8:50 PM
*/
#include "../FANN-2.2.0-Source/src/include/doublefann.h"
#include "../FANN-2.2.0-Source/src/include/fann_cpp.h"
//#include <doublefann>
//#include <fann/fann_cpp>
#include <cstdlib>
#include <iostream>
using namespace std;
using namespace FANN;
//Remember: fann_type is double!
int main(int argc, char** argv) {
    //create a test network: [1,2,1] MLP
    neural_net * net = new neural_net;
    const unsigned int layers[3] = {1, 2, 1};
    net->create_standard_array(3, layers);
    //net->create_standard(num_layers, num_input, num_hidden, num_output);
    //net->set_learning_rate(0.7f);
    //net->set_activation_steepness_hidden(0.7);
    //net->set_activation_steepness_output(0.7);
    net->set_activation_function_hidden(SIGMOID);
    net->set_activation_function_output(SIGMOID);
    net->set_training_algorithm(TRAIN_RPROP);
    //cout<<net->get_train_error_function()
    //exit(0);
    //test the number 2
    fann_type * testinput = new fann_type;
    *testinput = 2;
    fann_type * testoutput = new fann_type;
    *testoutput = *(net->run(testinput));
    double outputasdouble = (double) *testoutput;
    cout << "Test output: " << outputasdouble << endl;
    //make a training set of x->x^2
    training_data * squaredata = new training_data;
    squaredata->read_train_from_file("trainingdata.txt");
    //cout<<testinput[0]<<endl;
    //cout<<testoutput[0]<<endl;
    cout<<*(squaredata->get_input())[9]<<endl;
    cout<<*(squaredata->get_output())[9]<<endl;
    cout<<squaredata->length_train_data();
    //scale data
    fann_type * scaledinput = new fann_type[squaredata->length_train_data()];
    fann_type * scaledoutput = new fann_type[squaredata->length_train_data()];
    for (unsigned int i = 0; i < squaredata->length_train_data(); i++) {
        scaledinput[i] = *squaredata->get_input()[i]/200;///100;
        scaledoutput[i] = *squaredata->get_output()[i]/200;///100;
        cout<<"In:\t"<<scaledinput[i]<<"\t Out:\t"<<scaledoutput[i]<<endl;
    }
    net->train_on_data(*squaredata, 1000000, 100000, 0.001);
    *testoutput = *(net->run(testinput));
    outputasdouble = (double) *testoutput;
    cout << "Test output: " << outputasdouble << endl;
    cout << endl << "Easy!";
    return 0;
}
Here's trainingdata.txt:
10 1 1
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
When I run I get this:
Test output: 0.491454
10
100
10In: 0.005 Out: 0.005
In: 0.01 Out: 0.02
In: 0.015 Out: 0.045
In: 0.02 Out: 0.08
In: 0.025 Out: 0.125
In: 0.03 Out: 0.18
In: 0.035 Out: 0.245
In: 0.04 Out: 0.32
In: 0.045 Out: 0.405
In: 0.05 Out: 0.5
Max epochs 1000000. Desired error: 0.0010000000.
Epochs 1. Current error: 2493.7961425781. Bit fail 10.
Epochs 100000. Current error: 2457.3000488281. Bit fail 9.
Epochs 200000. Current error: 2457.3000488281. Bit fail 9.
Epochs 300000. Current error: 2457.3000488281. Bit fail 9.
Epochs 400000. Current error: 2457.3000488281. Bit fail 9.
Epochs 500000. Current error: 2457.3000488281. Bit fail 9.
Epochs 600000. Current error: 2457.3000488281. Bit fail 9.
Epochs 700000. Current error: 2457.3000488281. Bit fail 9.
Epochs 800000. Current error: 2457.3000488281. Bit fail 9.
Epochs 900000. Current error: 2457.3000488281. Bit fail 9.
Epochs 1000000. Current error: 2457.3000488281. Bit fail 9.
Test output: 1
Easy!
RUN FINISHED; exit value 0; real time: 9s; user: 10ms; system: 4s
Why is the training not working? After I asked a similar question, I was told to scale the NN's input and output. I have done so. Am I getting some parameter(s) wrong, or do I simply have to train longer?
There are too few nodes in your hidden layer to fit a quadratic function. I would try 10.
Besides, I would recommend a fun applet in which you can simulate the training process by setting the parameters. I tried 10 hidden-layer nodes with a unipolar sigmoid as the activation function for both the hidden and output layers, and the fit was not bad. Randomized initial weights can still cause training to fail to converge, so more hidden-layer nodes are highly recommended. You can try the applet yourself and observe some interesting points.
This may be a bit late, but perhaps a new FANN beginner will see this answer. I hope it helps!
I think your problem comes from the data format in your trainingdata.txt.
See: FANN data format
You have to put a newline after each input and each output.
In your case, you have 10 examples with 1 input and 1 output, so you have to format your file like this:
10 1 1
1
1
2
4
3
9
4
16
5
25
6
36
...
Note: I have noticed that when the data format is wrong, the error computed by the training method is very (very) high. A huge error value can be a hint to check your file format.
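A file in the layout shown above can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of FANN: it writes a header line with the number of examples, inputs, and outputs, and then puts each example's inputs and outputs on separate lines.

```python
def fann_training_text(pairs):
    """Format (inputs, outputs) pairs in the line-per-input, line-per-output layout."""
    num_in = len(pairs[0][0])
    num_out = len(pairs[0][1])
    lines = ["%d %d %d" % (len(pairs), num_in, num_out)]  # header line
    for inputs, outputs in pairs:
        lines.append(" ".join(str(v) for v in inputs))    # one line of inputs
        lines.append(" ".join(str(v) for v in outputs))   # one line of outputs
    return "\n".join(lines) + "\n"

# The x -> x^2 data set from the question.
pairs = [([x], [x * x]) for x in range(1, 11)]
text = fann_training_text(pairs)
print(text)
```

Writing `text` to trainingdata.txt would reproduce the file layout listed above for all 10 examples.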
Thanks in advance for your help in figuring this out. I'm taking an algorithms class and I'm stuck on something. According to the professor, the following holds, where C(1)=1 and n is a power of 2:
C(n) = 2 * C(n/2) + n resolves to C(n) = n * lg(n) + n
C(n) = 2 * C(n/2) + lg(n) resolves to C(n) = 3 * n - lg(n) - 2
The first one I completely grok. As I understand the form, what's stated is that C(n) resolves to two sub-problems, each of which requires n/2 work to solve, and an additional n amount of work to split and merge everything. As such, for every division of the problem, the constant 2 is increased by a factor of ^k (where k is the number of splits), the 2 in n/2 is also increased by a factor of ^k for much the same reason, and the last n is multiplied by a factor of k because each split creates a multiple of k extra work.
My confusion stems from the second relation. Given that the first and second relations are almost identical, why isn't the result of the second something like nlgn+(lgn^2)?
The general result is the Master Theorem
But in this specific case, you can work out the math for a power of 2:
C(2^k)
= 2 * C(2^(k-1)) + lg(2^k)
= 4 * C(2^(k-2)) + lg(2^k) + 2 * lg(2^(k-1))
= ... repeat ...
= 2^k * C(1) + sum (from i=1 to k) 2^(k-i) * lg 2^i
= 2^k + lg(2) * sum (from i=1 to k) 2^(k-i) * i
= 2^k - 2 + 2^(k+1) - k
= 3 * 2^k - k - 2
= 3 * n - lg(n) - 2
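Both closed forms can be checked numerically against their recurrences for small powers of 2. This is a sanity check under the stated conditions (C(1) = 1, n a power of 2), not a proof; the function names are chosen for the example.

```python
def lg(n):
    """Base-2 logarithm of a positive power of 2, as an exact integer."""
    return n.bit_length() - 1

def c_linear(n):
    """C(n) = 2 * C(n/2) + n, with C(1) = 1."""
    return 1 if n == 1 else 2 * c_linear(n // 2) + n

def c_log(n):
    """C(n) = 2 * C(n/2) + lg(n), with C(1) = 1."""
    return 1 if n == 1 else 2 * c_log(n // 2) + lg(n)

for k in range(11):
    n = 2 ** k
    assert c_linear(n) == n * lg(n) + n
    assert c_log(n) == 3 * n - lg(n) - 2
    print(n, c_linear(n), c_log(n))
```

The second recurrence stays linear because the lg terms, weighted by the number of subproblems at each level, sum to 2n - lg(n) - 2 rather than to n * lg(n).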