I'm studying machine learning. While studying, I found TensorFlow CNN code that uses the MNIST dataset. Here is the code I want to ask about.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(1000):
    batch = mnist.train.next_batch(100)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
My question is about batch = mnist.train.next_batch(100). From what I found when searching, this is a mini-batch: it randomly chooses 100 examples from the MNIST dataset.
If I want to test this code with a full batch, what should I do? Just change mnist.train.next_batch(100) to mnist.train.next_batch(55000)?
Yes, requesting a batch of 55000 makes every training step run over all the MNIST training digits, so each step is one full epoch.
Note that this is a bad idea: it will likely not fit into your memory. You would have to store the activations for all 55000 digits, plus the gradients... it is very likely your Python process will crash!
By training 1000 times on a batch of 100 random images you get a great result, and your computer is happy!
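To make the difference between mini-batch and full-batch concrete, here is a minimal sketch in plain Python. The helper iterate_minibatches is hypothetical and only illustrates the idea of shuffling and slicing; it is not how mnist.train.next_batch is implemented.

```python
import random

def iterate_minibatches(num_examples, batch_size, seed=0):
    """Yield lists of example indices that together cover one shuffled epoch."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]

# Mini-batch: many small steps per epoch.
mini = list(iterate_minibatches(55000, 100))
# Full batch: a single step that has to hold every example at once.
full = list(iterate_minibatches(55000, 55000))

print(len(mini), len(mini[0]))  # 550 100
print(len(full), len(full[0]))  # 1 55000
```

With a batch size of 100 each step only needs memory for 100 examples, while the full-batch variant needs all 55000 at once, which is where the memory concern comes from.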
I am new to PyTorch and I'm trying to build a simple neural net for classification. The problem is that the network doesn't learn at all. I tried various learning rates ranging from 0.3 to 1e-8, and I also tried training for longer. My dataset is small, with only 120 training examples, and the batch size is 16. Here is the code:
Define network
model = nn.Sequential(nn.Linear(4999, 1000),
                      nn.ReLU(),
                      nn.Linear(1000,200),
                      nn.ReLU(),
                      nn.Linear(200,1),
                      nn.Sigmoid())
Loss and optimizer
import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCELoss(reduction="mean")
Training
num_epochs = 100
for epoch in range(num_epochs):
    cumulative_loss = 0
    for i, data in enumerate(batch_gen(X_train, y_train, batch_size=16)):
        inputs, labels = data
        inputs = torch.from_numpy(inputs).float()
        labels = torch.from_numpy(labels).float()
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        cumulative_loss += loss.item()
        if i%5 == 0 and i != 0:
            print(f"epoch {epoch} batch {i} => Loss: {cumulative_loss/5}")
print("Finished Training!!")
Any help is appreciated!
The reason your loss doesn't seem to decrease every epoch is that you're not printing it every epoch. You're actually printing it every 5th batch, and the loss does not decrease much from batch to batch.
Try the following, which prints the loss once per epoch.
num_epochs = 100
for epoch in range(num_epochs):
    cumulative_loss = 0
    for i, data in enumerate(batch_gen(X_train, y_train, batch_size=16)):
        inputs, labels = data
        inputs = torch.from_numpy(inputs).float()
        labels = torch.from_numpy(labels).float()
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        cumulative_loss += loss.item()
    print(f"epoch {epoch} => Loss: {cumulative_loss}")
print("Finished Training!!")
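If you would rather compare epochs on a per-batch scale, one option is to divide the accumulated loss by the number of batches. The sketch below uses plain Python and made-up loss values purely for illustration; epoch_mean_loss is a hypothetical helper, and the 8 batches per epoch assume that 120 examples at batch size 16 keep the final partial batch.

```python
def epoch_mean_loss(batch_losses):
    """Average the per-batch losses collected during one epoch."""
    return sum(batch_losses) / len(batch_losses)

# Hypothetical per-batch losses, NOT measured values from the model above.
epoch_0 = [0.71, 0.69, 0.70, 0.68, 0.72, 0.69, 0.70, 0.67]
epoch_1 = [0.66, 0.68, 0.65, 0.67, 0.64, 0.66, 0.65, 0.63]

for epoch, losses in enumerate([epoch_0, epoch_1]):
    print(f"epoch {epoch} => mean loss: {epoch_mean_loss(losses):.4f}")
```

Individual batches are noisy (some batches in the second list are higher than some in the first), but the epoch means make the downward trend easier to see.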
One reason your loss doesn't decrease could be that your neural net isn't deep enough to learn anything. So, try adding more layers.
# Each layer's input size must match the previous layer's output size
model = nn.Sequential(nn.Linear(4999, 3000),
                      nn.ReLU(),
                      nn.Linear(3000,2000),
                      nn.ReLU(),
                      nn.Linear(2000,500),
                      nn.ReLU(),
                      nn.Linear(500,250),
                      nn.ReLU(),
                      nn.Linear(250,1),
                      nn.Sigmoid())
Also, I just noticed that you're passing data with very high dimensionality. You have 4999 features/columns and only 120 training examples/rows. Getting a model to converge with so little data is next to impossible, given how high-dimensional the data is.
I'd suggest you try to find more rows, or perform dimensionality reduction on your input data (for example PCA) to shrink the feature space to maybe 50-100 features or fewer, and then try again. Chances are that your model still won't converge, but it's worth a try.
I'm trying to create a neural network for image classification. This is my model summary. I have normalized my dataset and shuffled my data.
When I run model.fit, the val_loss is very high, sometimes close to 100, whereas my loss is less than 0.8.
When you don't normalize the test data, the validation loss will be very high compared to the loss on training data that was normalized. I used a simple MNIST model to demonstrate why normalization matters.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# this is to demonstrate the importance of normalizing both training and testing data
x_train, x_test = x_train / 255.0, x_test / 1.
When the test data is not normalized but the training data is, the training loss is 0.0771 whereas the test loss is 13.1599. Please check the complete code here. Thanks!
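One way to avoid this mismatch is to route both splits through the same preprocessing function, so they cannot drift apart. The sketch below is plain Python with tiny made-up "images"; scale_pixels is a hypothetical helper, and with real data you would apply the same idea to the arrays returned by mnist.load_data().

```python
def scale_pixels(images, max_value=255.0):
    """Map raw pixel values in [0, max_value] to the range [0, 1]."""
    return [[pixel / max_value for pixel in image] for image in images]

# Tiny made-up images, each a flat list of pixel intensities.
train_raw = [[0, 128, 255], [64, 32, 16]]
test_raw = [[255, 0, 128]]

# The same function is applied to both splits.
x_train = scale_pixels(train_raw)
x_test = scale_pixels(test_raw)

print(max(max(image) for image in x_train))
print(max(max(image) for image in x_test))
```

Because the model only ever sees inputs in [0, 1] during training, feeding it test inputs in [0, 255] puts it far outside the range it learned on, which is consistent with the large test loss reported above.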
I am trying to learn Keras. I found machine learning code for recognizing handwritten digits here (also given here). It seems to have the feedforward, SGD and backpropagation methods written from scratch. I just want to know whether it is possible to write this program using Keras. A starting step in that direction would be appreciated.
You can start with the Keras MNIST tutorial to understand how an MLP works on the MNIST dataset. As you proceed, you can look into how a CNN works on the same dataset.
I will describe a bit of the process in the Keras code that you attached to your comment.
# Imports needed by the code below
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
# Step 1: Organize Data
batch_size = 128 # Splits the 60k training images into batches of 128; 100 is another common choice. It's up to you
num_classes = 10 # Size of your final layer: the digits 0 - 9 (10 classes)
epochs = 20 # 20 passes over the training data. You can increase or decrease to see the change in accuracy. Normally MNIST accuracy peaks at around 10-20 epochs.
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data() # x_train - your training images, y_train - training labels; x_test - test images, y_test - test labels. MNIST provides 60k training images and 10k test images.
x_train = x_train.reshape(60000, 784) # Each MNIST image is 28x28 pixels, so you are flattening each one into a 784-element vector (28*28 = 784). 60k train images
x_test = x_test.reshape(10000, 784) # Likewise, 10k test images
x_train = x_train.astype('float32') # Convert to floats so the division below works
x_test = x_test.astype('float32')
x_train /= 255 # For normalization. Each pixel has a 'degree' of darkness within the range of 0-255, so you want to reduce that range to 0 - 1 for your Neural Network
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes) # One-hot encoding. The label 5 (for example) becomes [0 0 0 0 0 1 0 0 0 0], matching the 10 outputs of the final layer.
y_test = keras.utils.to_categorical(y_test, num_classes)
# Step 2: Create MLP model
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,))) # First hidden layer: 512 neurons, relu activation, input is a 784-element vector
model.add(Dropout(0.2)) # During training, each unit's output is randomly set to zero with probability 0.2
model.add(Dense(512, activation='relu')) # Second hidden layer, same as above
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax')) # Final layer: 10 neurons; softmax turns the outputs into a probability for each of the 10 classes
model.summary()
# Step 3: Create model compilation
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
# 10 classes - categorical_crossentropy. If 2 classes, you can use binary_crossentropy; optimizer - RMSprop, you can change this to ADAM, SGD, etc...; metrics - accuracy
# Step 4: Train model
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
# Training happens here. Train batch by batch for 20 epochs, then validate your result on the test set after each epoch.
# Step 5: See results on your test data
score = model.evaluate(x_test, y_test, verbose=0)
# Prints out scores
print('Test loss:', score[0])
print('Test accuracy:', score[1])
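The one-hot encoding step mentioned in the comments above can be illustrated without Keras. This is only a plain-Python sketch of the idea; to_one_hot is a hypothetical helper and not the implementation of keras.utils.to_categorical.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows of length num_classes."""
    rows = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1  # a single 1 at the index of the class
        rows.append(row)
    return rows

print(to_one_hot([5, 0], 10))
# [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
```

Each row has the same length as the softmax output of the final layer, which is what lets categorical_crossentropy compare the two position by position.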
I am training a model in Keras as follows:
model.fit(Xtrn, ytrn, batch_size=16, epochs=50, verbose=1, shuffle=True,
          callbacks=[model_checkpoint], validation_data=(Xval, yval))
The fitting output looks as follows:
As shown in the model.fit call, I have a batch size of 16, and there are 8000 training samples in total, as shown in the output. So from my understanding, a training update takes place once per batch of 16 samples, which also means training is run 500 times in a single epoch (i.e., 8000/16 = 500).
So let's take the training accuracy printed in the output for Epoch 1/50, which in this case is 0.9381. I would like to know how is this training accuracy of 0.9381 derived.
Is it:
The mean training accuracy, averaged over the 500 batches trained on during the epoch?
OR,
The best (maximum) training accuracy out of the 500 times the training procedure is run?
Take a look at the BaseLogger callback in Keras, where a running mean is computed.
For each epoch, the reported accuracy is the average over all the batches seen so far in that epoch, weighted by batch size.
class BaseLogger(Callback):
    """Callback that accumulates epoch averages of metrics.
    This callback is automatically applied to every Keras model.
    """
    def on_epoch_begin(self, epoch, logs=None):
        self.seen = 0
        self.totals = {}
    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get('size', 0)
        self.seen += batch_size
        for k, v in logs.items():
            if k in self.totals:
                self.totals[k] += v * batch_size
            else:
                self.totals[k] = v * batch_size
    def on_epoch_end(self, epoch, logs=None):
        if logs is not None:
            for k in self.params['metrics']:
                if k in self.totals:
                    # Make value available to next callbacks.
                    logs[k] = self.totals[k] / self.seen
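The arithmetic in the callback above boils down to a sample-weighted mean. Here is a minimal plain-Python sketch of that calculation; epoch_average is a hypothetical helper and the batch accuracies are made-up numbers, not values from the training run in the question.

```python
def epoch_average(batch_values, batch_sizes):
    """Sample-weighted mean of a per-batch metric, mirroring totals / seen."""
    seen = 0
    total = 0.0
    for value, size in zip(batch_values, batch_sizes):
        seen += size
        total += value * size
    return total / seen

# Equal batch sizes: the result is the plain mean of the batch values.
print(epoch_average([0.5, 1.0], [16, 16]))  # 0.75
# Unequal batch sizes: larger batches count for more.
print(epoch_average([0.5, 1.0], [48, 16]))  # 0.625
```

With 8000 samples and a batch size of 16 all 500 batches have the same size, so the weighted mean coincides with the simple average of the 500 batch accuracies.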
I used the following TensorFlow implementation for a binary classification task and got really bad accuracy. However, when I trained a sklearn.ensemble.GradientBoostingClassifier on the same dataset without any tuning, the result was pretty good. When I took a closer look at the out-of-sample predictions made by the neural network, I realized most of the predictions were the positive class.
             precision    recall  f1-score   support
          0       0.01      1.00      0.02         8
          1       1.00      0.37      0.55      1630
avg / total       1.00      0.38      0.54      1638
The implementation of the 2-layer fully connected network:
import math
import tensorflow as tf
batch_size = 200
feature_size = len(train_features.columns)
graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, feature_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    # Variables.
    weights1 = tf.Variable(tf.truncated_normal([feature_size, 512]))
    biases1 = tf.Variable(tf.zeros([512]))
    weights2 = tf.Variable(tf.truncated_normal([512, 512], stddev=0.005))
    biases2 = tf.Variable(tf.zeros([512]))
    weights = tf.Variable(tf.truncated_normal([512, num_labels], stddev=0.005))
    biases = tf.Variable(tf.zeros([num_labels]))
    hidden_layer1 = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
    hidden_layer2 = tf.nn.relu(tf.matmul(hidden_layer1, weights2) + biases2)
    logits = tf.matmul(hidden_layer2, weights) + biases
    # Named arguments avoid mixing up labels and logits (required since TensorFlow 1.0).
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    # Optimizer.
    optimizer = tf.train.AdamOptimizer(0.0005).minimize(loss)
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_hidden_layer1 = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
    valid_hidden_layer2 = tf.nn.relu(tf.matmul(valid_hidden_layer1, weights2) + biases2)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_hidden_layer2, weights) + biases)
    test_hidden_layer1 = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
    test_hidden_layer2 = tf.nn.relu(tf.matmul(test_hidden_layer1, weights2) + biases2)
    test_prediction = tf.nn.softmax(tf.matmul(test_hidden_layer2, weights) + biases)
Any suggestion on how to debug this?
The sklearn GradientBoostingClassifier is a different algorithm from a neural network. It builds an ensemble of regression trees, which requires less fine-tuning than a neural network to give good performance. This is the trade-off when using neural networks: if you want performance better than alternative algorithms like random forests and SVMs, you need to tune the hyperparameters.
As far as that goes, the first thing you should do is initialize the biases of your ReLU units to a small nonzero (positive) value. This helps prevent them from entering a regime where they 'die' and end up giving 0 output and 0 gradient forever. You should also try different learning rates: a learning rate that is too high will keep the algorithm from learning properly, and one that is too low will waste resources.
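To illustrate the 'dying' ReLU point, here is a minimal plain-Python sketch of a single unit. The weights, inputs and the bias value 0.1 are made-up numbers chosen for illustration, and the helper names are hypothetical; it shows the effect for one input only, whereas a unit is truly dead when this happens for every input it sees.

```python
def relu(z):
    """ReLU activation."""
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU with respect to its input (taken as 0 at z <= 0)."""
    return 1.0 if z > 0 else 0.0

def pre_activation(weights, inputs, bias):
    """Weighted sum of the inputs plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

weights = [-0.02, 0.005]  # hypothetical small initial weights
inputs = [1.0, 2.0]       # hypothetical input

for bias in (0.0, 0.1):
    z = pre_activation(weights, inputs, bias)
    print(f"bias={bias}: output={relu(z)}, gradient={relu_grad(z)}")
```

With a zero bias the pre-activation is slightly negative, so both the output and the gradient are 0 and no learning signal flows through the unit for this input. The small positive bias pushes the pre-activation above zero, so the gradient is 1 and the unit can still be updated.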
You should also experiment with the number of neurons and layers. I see you have 512 neurons in each hidden layer, and this might be too many unless your problem is really that high-dimensional and you have enough data. What are your training and test/cross-validation errors like? You should keep track of these while you train. If you're getting low training error but high validation error, then you are overfitting and should cut down on the number of neurons. You could also try having just one hidden layer and see if that helps.