I am having some trouble setting up a multilayer perceptron for binary classification using TensorFlow.
I have a very large dataset (about 1.5*10^6 examples), each with a binary (0/1) label and 100 features. What I need to do is set up a simple MLP and then try varying the learning rate and the initialization pattern to document the results (it's an assignment).
I am getting strange results, though: my MLP seems to get stuck at a low-but-not-great cost early on and never gets off of it. With fairly low learning rates the cost goes to NaN almost immediately. I don't know whether the problem lies in how I structured the MLP (I made a few attempts; I am going to post the code for the last one) or whether I am missing something in my TensorFlow implementation.
CODE
import tensorflow as tf
import numpy as np
import scipy.io
# Import and transform dataset
print("Importing dataset.")
dataset = scipy.io.mmread('tfidf_tsvd.mtx')
with open('labels.txt') as f:
    all_labels = f.readlines()
all_labels = np.asarray(all_labels)
all_labels = all_labels.reshape((1498271,1))
# Split dataset into training (66%) and test (33%) set
training_set = dataset[0:1000000]
training_labels = all_labels[0:1000000]
test_set = dataset[1000000:1498272]
test_labels = all_labels[1000000:1498272]
print("Dataset ready.")
# Parameters
learning_rate = 0.01 #argv
mini_batch_size = 100
training_epochs = 10000
display_step = 500
# Network Parameters
n_hidden_1 = 64 # 1st hidden layer of neurons
n_hidden_2 = 32 # 2nd hidden layer of neurons
n_hidden_3 = 16 # 3rd hidden layer of neurons
n_input = 100 # number of features after LSA
# Tensorflow Graph input
x = tf.placeholder(tf.float64, shape=[None, n_input], name="x-data")
y = tf.placeholder(tf.float64, shape=[None, 1], name="y-labels")
print("Creating model.")
# Create model
def multilayer_perceptron(x, weights):
    # First hidden layer with SIGMOID activation
    layer_1 = tf.matmul(x, weights['h1'])
    layer_1 = tf.nn.sigmoid(layer_1)
    # Second hidden layer with SIGMOID activation
    layer_2 = tf.matmul(layer_1, weights['h2'])
    layer_2 = tf.nn.sigmoid(layer_2)
    # Third hidden layer with SIGMOID activation
    layer_3 = tf.matmul(layer_2, weights['h3'])
    layer_3 = tf.nn.sigmoid(layer_3)
    # Output layer with SIGMOID activation
    out_layer = tf.matmul(layer_2, weights['out'])
    return out_layer
# Layer weights, should change them to see results
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], dtype=np.float64)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], dtype=np.float64)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], dtype=np.float64)),
    'out': tf.Variable(tf.random_normal([n_hidden_2, 1], dtype=np.float64))
}
# Construct model
pred = multilayer_perceptron(x, weights)
# Define loss and optimizer
cost = tf.nn.l2_loss(pred-y,name="squared_error_cost")
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.initialize_all_variables()
print("Model ready.")
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    print("Starting Training.")
    # Training cycle
    for epoch in range(training_epochs):
        #avg_cost = 0.
        # minibatch loading
        minibatch_x = training_set[mini_batch_size*epoch:mini_batch_size*(epoch+1)]
        minibatch_y = training_labels[mini_batch_size*epoch:mini_batch_size*(epoch+1)]
        # Run optimization op (backprop) and cost op
        _, c = sess.run([optimizer, cost], feed_dict={x: minibatch_x, y: minibatch_y})
        # Compute average loss
        avg_cost = c / (minibatch_x.shape[0])
        # Display logs per epoch
        if (epoch) % display_step == 0:
            print("Epoch:", '%05d' % (epoch), "Training error=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    # Test model
    # Calculate accuracy
    test_error = tf.nn.l2_loss(pred-y, name="squared_error_test_cost")/test_set.shape[0]
    print("Test Error:", test_error.eval({x: test_set, y: test_labels}))
OUTPUT
python nn.py
Importing dataset.
Dataset ready.
Creating model.
Model ready.
Starting Training.
Epoch: 00000 Training error= 0.331874878
Epoch: 00500 Training error= 0.121587482
Epoch: 01000 Training error= 0.112870921
Epoch: 01500 Training error= 0.110293652
Epoch: 02000 Training error= 0.122655269
Epoch: 02500 Training error= 0.124971940
Epoch: 03000 Training error= 0.125407845
Epoch: 03500 Training error= 0.131942481
Epoch: 04000 Training error= 0.121696954
Epoch: 04500 Training error= 0.116669835
Epoch: 05000 Training error= 0.129558477
Epoch: 05500 Training error= 0.122952110
Epoch: 06000 Training error= 0.124655344
Epoch: 06500 Training error= 0.119827300
Epoch: 07000 Training error= 0.125183779
Epoch: 07500 Training error= 0.156429254
Epoch: 08000 Training error= 0.085632880
Epoch: 08500 Training error= 0.133913128
Epoch: 09000 Training error= 0.114762624
Epoch: 09500 Training error= 0.115107805
Optimization Finished!
Test Error: 0.116647016708
This is what MMN advised
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0, dtype=np.float64)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.01, dtype=np.float64)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], stddev=0.01, dtype=np.float64)),
    'out': tf.Variable(tf.random_normal([n_hidden_2, 1], dtype=np.float64))
}
This is the output
Epoch: 00000 Training error= 0.107566668
Epoch: 00500 Training error= 0.289380907
Epoch: 01000 Training error= 0.339091784
Epoch: 01500 Training error= 0.358559815
Epoch: 02000 Training error= 0.122639698
Epoch: 02500 Training error= 0.125160135
Epoch: 03000 Training error= 0.126219718
Epoch: 03500 Training error= 0.132500418
Epoch: 04000 Training error= 0.121795254
Epoch: 04500 Training error= 0.116499476
Epoch: 05000 Training error= 0.124532673
Epoch: 05500 Training error= 0.124484790
Epoch: 06000 Training error= 0.118491177
Epoch: 06500 Training error= 0.119977633
Epoch: 07000 Training error= 0.127532511
Epoch: 07500 Training error= 0.159053519
Epoch: 08000 Training error= 0.083876224
Epoch: 08500 Training error= 0.131488483
Epoch: 09000 Training error= 0.123161189
Epoch: 09500 Training error= 0.125011362
Optimization Finished!
Test Error: 0.129284643093
Connected third hidden layer, thanks to MMN
There was a mistake in my code and I had two hidden layers instead of three. I corrected it by doing:
'out': tf.Variable(tf.random_normal([n_hidden_3, 1], dtype=np.float64))
and
out_layer = tf.matmul(layer_3, weights['out'])
I returned to the old value for stddev, though, as it seems to cause less fluctuation in the cost function.
The output is still troubling:
Epoch: 00000 Training error= 0.477673073
Epoch: 00500 Training error= 0.121848744
Epoch: 01000 Training error= 0.112854530
Epoch: 01500 Training error= 0.110597624
Epoch: 02000 Training error= 0.122603499
Epoch: 02500 Training error= 0.125051472
Epoch: 03000 Training error= 0.125400717
Epoch: 03500 Training error= 0.131999354
Epoch: 04000 Training error= 0.121850889
Epoch: 04500 Training error= 0.116551533
Epoch: 05000 Training error= 0.129749704
Epoch: 05500 Training error= 0.124600464
Epoch: 06000 Training error= 0.121600218
Epoch: 06500 Training error= 0.121249676
Epoch: 07000 Training error= 0.132656938
Epoch: 07500 Training error= 0.161801757
Epoch: 08000 Training error= 0.084197352
Epoch: 08500 Training error= 0.132197409
Epoch: 09000 Training error= 0.123249055
Epoch: 09500 Training error= 0.126602369
Optimization Finished!
Test Error: 0.129230736355
Two more changes thanks to Steven
Steven proposed replacing the sigmoid activation function with ReLU, so I tried that. In the meantime, I noticed I had not set an activation function for the output node, so I did that too (it should be easy to see what I changed).
Starting Training.
Epoch: 00000 Training error= 293.245977809
Epoch: 00500 Training error= 0.290000000
Epoch: 01000 Training error= 0.340000000
Epoch: 01500 Training error= 0.360000000
Epoch: 02000 Training error= 0.285000000
Epoch: 02500 Training error= 0.250000000
Epoch: 03000 Training error= 0.245000000
Epoch: 03500 Training error= 0.260000000
Epoch: 04000 Training error= 0.290000000
Epoch: 04500 Training error= 0.315000000
Epoch: 05000 Training error= 0.285000000
Epoch: 05500 Training error= 0.265000000
Epoch: 06000 Training error= 0.340000000
Epoch: 06500 Training error= 0.180000000
Epoch: 07000 Training error= 0.370000000
Epoch: 07500 Training error= 0.175000000
Epoch: 08000 Training error= 0.105000000
Epoch: 08500 Training error= 0.295000000
Epoch: 09000 Training error= 0.280000000
Epoch: 09500 Training error= 0.285000000
Optimization Finished!
Test Error: 0.220196439287
This is what it does with the Sigmoid activation function on every node, output included
Epoch: 00000 Training error= 0.110878121
Epoch: 00500 Training error= 0.119393080
Epoch: 01000 Training error= 0.109229532
Epoch: 01500 Training error= 0.100436962
Epoch: 02000 Training error= 0.113160662
Epoch: 02500 Training error= 0.114200962
Epoch: 03000 Training error= 0.109777990
Epoch: 03500 Training error= 0.108218725
Epoch: 04000 Training error= 0.103001394
Epoch: 04500 Training error= 0.084145737
Epoch: 05000 Training error= 0.119173495
Epoch: 05500 Training error= 0.095796251
Epoch: 06000 Training error= 0.093336573
Epoch: 06500 Training error= 0.085062860
Epoch: 07000 Training error= 0.104251661
Epoch: 07500 Training error= 0.105910949
Epoch: 08000 Training error= 0.090347288
Epoch: 08500 Training error= 0.124480612
Epoch: 09000 Training error= 0.109250224
Epoch: 09500 Training error= 0.100245836
Optimization Finished!
Test Error: 0.110234139674
I find these numbers very strange. In the first case it is stuck at a higher cost than with sigmoid, even though sigmoid should saturate very early. In the second case it starts with a training error that is almost the final one... so it basically converges after one mini-batch. I'm starting to think that I am not calculating the cost correctly, in this line:
avg_cost = c / (minibatch_x.shape[0])
So it could be a couple of things:
You could be saturating the sigmoid units (as MMN mentioned); I would suggest trying ReLU units instead.
replace:
tf.nn.sigmoid(layer_n)
with:
tf.nn.relu(layer_n)
Your model may not have the expressive power to actually learn your data, i.e. it may need to be deeper.
You can also try a different optimizer, like Adam, as follows.
replace:
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with:
optimizer = tf.train.AdamOptimizer().minimize(cost)
A few other points:
You should add a bias term to your weights
like so:
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], dtype=np.float64)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], dtype=np.float64)),
    'b3': tf.Variable(tf.random_normal([n_hidden_3], dtype=np.float64)),
    'bout': tf.Variable(tf.random_normal([1], dtype=np.float64))
}

def multilayer_perceptron(x, weights):
    # First hidden layer with SIGMOID activation
    layer_1 = tf.matmul(x, weights['h1']) + biases['b1']
    layer_1 = tf.nn.sigmoid(layer_1)
    # Second hidden layer with SIGMOID activation
    layer_2 = tf.matmul(layer_1, weights['h2']) + biases['b2']
    layer_2 = tf.nn.sigmoid(layer_2)
    # Third hidden layer with SIGMOID activation
    layer_3 = tf.matmul(layer_2, weights['h3']) + biases['b3']
    layer_3 = tf.nn.sigmoid(layer_3)
    # Output layer with SIGMOID activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['bout']
    return out_layer
and you can update the learning rate over time
like so:
learning_rate = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                           global_step,
                                           decay_steps,
                                           LEARNING_RATE_DECAY_FACTOR,
                                           staircase=True)
You just need to define decay_steps, i.e. when to decay, and LEARNING_RATE_DECAY_FACTOR, i.e. by how much to decay.
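For concreteness, here is a minimal sketch of how that could be wired up; the initial rate, decay steps and decay factor below are placeholder values, not ones taken from the question:
# Sketch: decaying learning rate driven by a global_step counter
# (0.01, 1000 and 0.96 are placeholder values to be tuned)
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.01,          # INITIAL_LEARNING_RATE
                                           global_step,
                                           1000,           # decay_steps
                                           0.96,           # LEARNING_RATE_DECAY_FACTOR
                                           staircase=True)
# Passing global_step lets the optimizer increment it on every update
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, global_step=global_step)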
Your weights are initialized with a stddev of 1, so the output of layer 1 will have a stddev of 10 or so. This might be saturating the sigmoid functions to the point where most gradients are 0.
Can you try initializing the hidden weights with a stddev of 0.01?
Along with the above answers, I suggest you try the cost function tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None).
Since this is binary classification, sigmoid_cross_entropy_with_logits is the cost function you should try (see the sketch below).
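A minimal sketch of swapping it in for the l2_loss above, assuming pred is left as raw logits (no sigmoid on the output layer); older TensorFlow versions take logits and targets positionally, as in the signature quoted above:
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y, name="xent_cost"))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)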
I also suggest you plot a line graph of training and test accuracy versus the number of epochs, i.e. check whether the model is overfitting (a plotting sketch follows below).
If it is not overfitting, try making your neural net more complex by increasing the number of neurons and the number of layers. You will reach a point beyond which the training accuracy keeps increasing but the validation accuracy does not; that point gives the best model.
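A plotting sketch; train_acc_history and test_acc_history are hypothetical lists you would fill yourself during the training loop (they are not computed anywhere in the code above):
import matplotlib.pyplot as plt

train_acc_history = []   # append training accuracy here, one value per logged epoch
test_acc_history = []    # append test accuracy here, one value per logged epoch

plt.plot(train_acc_history, label="train accuracy")
plt.plot(test_acc_history, label="test accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()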
Related
I am currently trying to implement transfer learning using PyTorch on the NASNet model. I cannot find any other way to import the model than using this:
import timm
model = timm.create_model('nasnetalarge', pretrained=True)
Its last three layers are:
(act): ReLU(inplace=True)
(global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=Flatten(start_dim=1, end_dim=-1))
(last_linear): Linear(in_features=4032, out_features=1000, bias=True)
I am trying to do binary classification and I am trying to fine-tune the network, so I changed the number of output features to 2. But the accuracy remains constant over the epochs.
Is this a correct way to use the NASNet model? Also, should I change all the activation functions or only some of them? And how do I fine-tune the model so that it converges?
Epoch: 0 Train Loss: 0.6450783805564987 Validation loss: 0.6651424169540405 Train Accuracy: tensor(64.5254, device='cuda:0') Validation accuracy: 78.0
Epoch: 1 Train Loss: 0.6464798424893493 Validation loss: 0.6233693957328796 Train Accuracy: tensor(63.9106, device='cuda:0') Validation accuracy: 77.33333333333333
Epoch: 2 Train Loss: 0.6471542623569249 Validation loss: 0.5627642869949341 Train Accuracy: tensor(63.9618, device='cuda:0') Validation accuracy: 77.33333333333333
Epoch: 3 Train Loss: 0.6478866574491537 Validation loss: 0.6459301710128784 Train Accuracy: tensor(64.1540, device='cuda:0') Validation accuracy: 78.66666666666666
Epoch: 4 Train Loss: 0.6494869376566493 Validation loss: 0.6185131072998047 Train Accuracy: tensor(64.1540, device='cuda:0') Validation accuracy: 78.0
Epoch: 5 Train Loss: 0.6495973079446123 Validation loss: 0.6605387926101685 Train Accuracy: tensor(64.3269, device='cuda:0') Validation accuracy: 78.0
Epoch: 6 Train Loss: 0.6508511623683317 Validation loss: 0.7085398435592651 Train Accuracy: tensor(64.1604, device='cuda:0') Validation accuracy: 78.0
Epoch: 7 Train Loss: 0.6518356682885635 Validation loss: 0.6155421137809753 Train Accuracy: tensor(64.3013, device='cuda:0') Validation accuracy: 78.0
Epoch: 8 Train Loss: 0.6525909496022505 Validation loss: 0.6670436859130859 Train Accuracy: tensor(64.0963, device='cuda:0') Validation accuracy: 78.0
I have tried changing the hyperparameters, but that didn't work. I wonder if there is something wrong with my implementation. Can anyone please help?
Here is the training part of my code:
best_accuracy = 0.0
training_loss = []
validation_loss = []
for epoch in range(num_of_epochs):
    # Evaluation and training on the training dataset
    model.train()
    running_loss = 0.0
    running_correct = 0.0
    correct = 0.0
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        with torch.set_grad_enabled(True):
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            loss = loss_function(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        running_loss += loss.item()*images.size(0)
        running_correct += torch.sum(preds == labels.data)
    step_lr_scheduler.step()
    train_accuracy = (running_correct/train_count)*100
    train_loss = running_loss/train_count
    training_loss.append(train_loss)
    # Evaluating on the validation set
    model.eval()
    valid_accuracy = 0.0
    running_validloss = 0.0
    for images, labels in valid_loader:
        images = images.to(device)
        labels = labels.to(device)
        with torch.no_grad():
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
        running_validloss += loss.item()*images.size(0)
        correct += (preds == labels.cuda(device)).sum().item()
    valid_accuracy = 100*(correct/valid_count)
    valid_loss = running_validloss/valid_count
    validation_loss.append(valid_loss)
    print('Epoch: '+str(epoch)+' Train Loss: '+str(train_loss)+' Validation loss: '+str(valid_loss)+' Train Accuracy: '+str(train_accuracy)+' Validation accuracy: '+str(valid_accuracy))
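For reference, a minimal sketch of replacing the classifier head for two classes, based on the last_linear layer printed above; freezing the backbone is an assumption on my part (a common starting point for fine-tuning), not something stated in the question:
import timm
import torch.nn as nn

model = timm.create_model('nasnetalarge', pretrained=True)

# Optionally freeze the pretrained backbone so only the new head is trained at first
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet classifier with a 2-way head
# (in_features is 4032, as shown in the model summary above)
model.last_linear = nn.Linear(model.last_linear.in_features, 2)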
Out of curiosity, I compared a stacked LSTM neural network with a single time step against an MLP with the tanh activation function, thinking they would have the same performance.
The architectures used for the comparison are as follows, and they are trained on an identical regression dataset (the loss function is MSE):
model.add(Dense(50, input_dim=num_features, activation = 'tanh'))
model.add(Dense(100, activation = 'tanh'))
model.add(Dense(150, activation = 'tanh'))
model.add(Dense(100, activation = 'tanh'))
model.add(Dense(50, activation = 'tanh'))
model.add(Dense(1))
model.add(LSTM(50, return_sequences=True, input_shape=(None, num_features)))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(150, return_sequences=True))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
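As an aside, for the single-time-step comparison the feature matrix presumably has to be reshaped from (samples, features) for the MLP to (samples, 1, features) for the LSTM; a sketch of that assumption, with placeholder shapes and stand-in data:
import numpy as np

num_features = 10                          # placeholder
X = np.random.rand(1000, num_features)     # stand-in for the real feature matrix

# MLP input: (samples, features); single-time-step LSTM input: (samples, 1, features)
X_lstm = X.reshape((X.shape[0], 1, num_features))
print(X_lstm.shape)   # (1000, 1, 10)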
Surprisingly, the loss for the LSTM model decreases much faster than the MLP:
MLP loss:
Epoch: 1
Training Loss: 0.011504
Validation Loss: 0.010708
Epoch: 2
Training Loss: 0.010739
Validation Loss: 0.010623
Epoch: 3
Training Loss: 0.010598
Validation Loss: 0.010189
Epoch: 4
Training Loss: 0.010046
Validation Loss: 0.009651
Epoch: 5
Training Loss: 0.009305
Validation Loss: 0.008502
Epoch: 6
Training Loss: 0.007388
Validation Loss: 0.004334
Epoch: 7
Training Loss: 0.002576
Validation Loss: 0.001686
Epoch: 8
Training Loss: 0.001375
Validation Loss: 0.001217
Epoch: 9
Training Loss: 0.000921
Validation Loss: 0.000916
Epoch: 10
Training Loss: 0.000696
Validation Loss: 0.000568
Epoch: 11
Training Loss: 0.000560
Validation Loss: 0.000479
Epoch: 12
Training Loss: 0.000493
Validation Loss: 0.000451
Epoch: 13
Training Loss: 0.000439
Validation Loss: 0.000564
Epoch: 14
Training Loss: 0.000402
Validation Loss: 0.000478
Epoch: 15
Training Loss: 0.000377
Validation Loss: 0.000366
Epoch: 16
Training Loss: 0.000351
Validation Loss: 0.000240
Epoch: 17
Training Loss: 0.000340
Validation Loss: 0.000352
Epoch: 18
Training Loss: 0.000327
Validation Loss: 0.000203
Epoch: 19
Training Loss: 0.000311
Validation Loss: 0.000323
Epoch: 20
Training Loss: 0.000299
Validation Loss: 0.000264
LSTM loss:
Epoch: 1
Training Loss: 0.011345
Validation Loss: 0.010634
Epoch: 2
Training Loss: 0.008128
Validation Loss: 0.003692
Epoch: 3
Training Loss: 0.001488
Validation Loss: 0.000668
Epoch: 4
Training Loss: 0.000440
Validation Loss: 0.000232
Epoch: 5
Training Loss: 0.000260
Validation Loss: 0.000160
Epoch: 6
Training Loss: 0.000200
Validation Loss: 0.000137
Epoch: 7
Training Loss: 0.000165
Validation Loss: 0.000093
Epoch: 8
Training Loss: 0.000140
Validation Loss: 0.000104
Epoch: 9
Training Loss: 0.000127
Validation Loss: 0.000139
Epoch: 10
Training Loss: 0.000116
Validation Loss: 0.000091
Epoch: 11
Training Loss: 0.000106
Validation Loss: 0.000095
Epoch: 12
Training Loss: 0.000099
Validation Loss: 0.000082
Epoch: 13
Training Loss: 0.000091
Validation Loss: 0.000135
Epoch: 14
Training Loss: 0.000085
Validation Loss: 0.000099
Epoch: 15
Training Loss: 0.000082
Validation Loss: 0.000055
Epoch: 16
Training Loss: 0.000079
Validation Loss: 0.000062
Epoch: 17
Training Loss: 0.000075
Validation Loss: 0.000045
Epoch: 18
Training Loss: 0.000073
Validation Loss: 0.000121
Epoch: 19
Training Loss: 0.000069
Validation Loss: 0.000045
Epoch: 20
Training Loss: 0.000065
Validation Loss: 0.000052
After 100 epochs, the validation loss for the MLP decreased to about 1e-4, but the loss for the LSTM decreased to about 1e-5.
It doesn't make much sense to me how these two architectures could differ, since the LSTM cells are not using any memory from previous timesteps. Also, training the MLP is about 3 times faster than the LSTM. Could someone explain the math behind it?
I am trying to learn Keras. I see machine learning code for recognizing handwritten digits here (also given here). It seems to have feedforward, SGD, and backpropagation methods written from scratch. I just want to know whether it is possible to write this program using Keras. A starting step in that direction would be appreciated.
You can use this to understand how the MNIST dataset works for an MLP first: Keras MNIST tutorial. As you proceed, you can look into how a CNN works on the MNIST dataset.
I will describe a bit of the process of the Keras code that you attached to your comment.
# Step 1: Organize Data
batch_size = 128 # This splits the 60k images into batches of 128; normally people use 100. It's up to you
num_classes = 10 # Your final layer. Basically number 0 - 9 (10 classes)
epochs = 20 # 20 'runs'. You can increase or decrease to see the change in accuracy. Normally MNIST accuracy peaks at around 10-20 epochs.
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data() #X_train - Your training images, y_train - training labels; x_test - test images, y_test - test labels. Normally people train on 50k train images, 10k test images.
x_train = x_train.reshape(60000, 784) # Each MNIST image is 28x28 pixels. So you are flattening into a 28x28 = 784 array. 60k train images
x_test = x_test.reshape(10000, 784) # Likewise, 10k test images
x_train = x_train.astype('float32') # For float numbers
x_test = x_test.astype('float32')
x_train /= 255 # For normalization. Each image has a 'degree' of darkness within the range of 0-255, so you want to reduce that range to 0 - 1 for your Neural Network
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes) # One-hot encoding. So when your NN is trained, your prediction for 5(example) will look like this [0000010000] (Final layer).
y_test = keras.utils.to_categorical(y_test, num_classes)
# Step 2: Create MLP model
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,))) #First hidden layer, 512 neurons, activation relu, input 784 array
model.add(Dropout(0.2)) # During the training, layer has 20% probability of 'switching off' certain neurons
model.add(Dense(512, activation='relu')) # Same as above
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax')) # Final layer, 10 neurons, softmax is a probability function to give the best probability of the input image
model.summary()
# Step 3: Create model compilation
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
# 10 classes - categorical_crossentropy. If 2 classes, you can use binary_crossentropy; optimizer - RMSprop, you can change this to ADAM, SGD, etc...; metrics - accuracy
# Step 4: Train model
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
# Training happens here. Train on each batch for 20 runs, then validate the result on the test set.
# Step 5: See results on your test data
score = model.evaluate(x_test, y_test, verbose=0)
# Prints out scores
print('Test loss:', score[0])
print('Test accuracy:', score[1])
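As a small follow-up (my own addition, not part of the original walkthrough), you can inspect an individual prediction after training; argmax undoes the one-hot encoding described above:
import numpy as np

pred = model.predict(x_test[:1])               # probability distribution over the 10 classes
print(np.argmax(pred), np.argmax(y_test[0]))   # predicted digit vs. true digit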
I'm studying machine learning. While studying, I found TensorFlow CNN code that uses the MNIST dataset. Here's the part of the code that I want to understand:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(1000):
    batch = mnist.train.next_batch(100)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
In this code, my question is about batch = mnist.train.next_batch(100). When I searched for this, I found that it is a mini-batch that randomly chooses 100 examples from the MNIST dataset. Now here's my question.
When I want to test this code with a full batch, what should I do? Just change mnist.train.next_batch(100) to mnist.train.next_batch(55000)?
Yes, getting a batch of 55000 will train one epoch on all digits of MNIST.
Note that this is a bad idea: it will likely not fit into your memory. You would have to store the activations for 55000 digits, and the gradients... it is very likely your Python process will crash!
By training 1000 times on a batch of 100 random images you get a great result, and your computer is happy!
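If the intent is simply to make sure every training image is seen, rather than feeding all 55000 at once, one option is to keep the mini-batch size and loop over enough batches to cover the whole set; a sketch reusing the same tensors as above:
batch_size = 100
batches_per_epoch = mnist.train.num_examples // batch_size   # 550 batches cover all 55000 images
for epoch in range(5):                                        # a few full passes over the data
    for _ in range(batches_per_epoch):
        batch = mnist.train.next_batch(batch_size)
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})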
I am training a neural network with Keras which takes 2000 x 1 arrays as input; all of the input data are "0"s and "1"s, and it generates a single output, either 0 or 1.
Here is my model:
def mdl_normal(sq_len, broker_num):
    model = Sequential()
    model.add(Dense(sq_len * (broker_num + 1), input_dim=(sq_len * (broker_num + 1)), activation='relu'))
    model.add(Dense(800, activation='relu'))
    model.add(Dense(400, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='SGD')
    return model
However I am getting the following while training:
Epoch 384/600 0s - loss: 1.4224e-04 - val_loss: 2.6322
The loss is extremely low and I am wondering if I am doing something wrong. Can someone explain what the loss means here?
Thanks!
Louis