Related
I am trying to use cnn-lstm model on this dataset. I've stored this dataset in dataframe named as df. there are totally 11 column in this dataset but i am just mentioning 9 columns here. All columns have numerical values only
Area book_hotel votes location hotel_type Total_Price Facilities Dine rate
6 0 0 1 163 400 22 7 4.4
19 1 2 7 122 220 28 11 4.6
X=df.drop(['rate'],axis=1)
Y=df['rate']
x_train, x_test, y_train, y_test = train_test_split(np.asarray(X), np.asarray(Y), test_size=0.33, shuffle= True)
x_train has shape (3350,10) and
x_test has shape (1650, 10)
# The known number of output classes.
num_classes = 10
# Input image dimensions
input_shape = (10,)
# Convert class vectors to binary class matrices. This uses 1 hot encoding.
y_train_binary = keras.utils.to_categorical(y_train, num_classes)
y_test_binary = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.reshape(3350, 10,1)
x_test = x_test.reshape(1650, 10,1)
input_layer = Input(shape=(10, 1))
conv1 = Conv1D(filters=32,
kernel_size=8,
strides=1,
activation='relu',
padding='same')(input_layer)
lstm1 = LSTM(32, return_sequences=True)(conv1)
output_layer = Dense(1, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)
model.summary()
model.compile(loss='mse',optimizer='adam')
Finally when i am trying to fit the model with input
model.fit(x_train,y_train)
ValueError Traceback (most recent call last)
<ipython-input-170-4719cf73997a> in <module>()
----> 1 model.fit(x_train,y_train)
2 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
133 ': expected ' + names[i] + ' to have ' +
134 str(len(shape)) + ' dimensions, but got array '
--> 135 'with shape ' + str(data_shape))
136 if not check_batch_axis:
137 data_shape = data_shape[1:]
ValueError: Error when checking target: expected dense_2 to have 3 dimensions, but got array with shape (3350, 1)
Can someone please help me resolving this error
I see some problem in your code...
the last dimension output must be equal to the number of class and with multiclass tasks you need to apply a softmax activation: Dense(num_classes, activation='softmax')
you must set return_sequences=False in your last lstm cell because you need a 2D output and not a 3D
you must use categorical_crossentropy as loss function with one-hot encoded target
here a complete dummy example...
num_classes = 10
n_sample = 1000
X = np.random.uniform(0,1, (n_sample,10,1))
y = tf.keras.utils.to_categorical(np.random.randint(0,num_classes, n_sample))
input_layer = Input(shape=(10, 1))
conv1 = Conv1D(filters=32,
kernel_size=8,
strides=1,
activation='relu',
padding='same')(input_layer)
lstm1 = LSTM(32, return_sequences=False)(conv1)
output_layer = Dense(num_classes, activation='softmax')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='categorical_crossentropy',optimizer='adam')
model.fit(X,y, epochs=5)
I have a dataset containing 1000 examples where each example has 5 features (a,b,c,d,e). I want to feed 7 examples to an LSTM so it predicts the feature (a) of the 8th day.
Reading Pytorchs documentation of nn.LSTM() I came up with the following:
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
output = fc(out[-1])
output # output's shape is ([7,1])
According to the docs:
The input of the nn.LSTM is "input of shape (seq_len, batch, input_size)" with "input_size – The number of expected features in the input x",
And the output is: "output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t."
In this case, I thought seq_len would be the sequence of 7 examples, batchis 1 and input_size is 5. So the lstm would consume each example containing 5 features refeeding the hidden layer every iteration.
What am I missing?
When I extend your code to a full example -- I also added some comments to may help -- I get the following:
import torch
import torch.nn as nn
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
X = [
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
]
X = torch.tensor(X, dtype=torch.float32)
print(X.shape) # (seq_len, batch_size, input_size) = (7, 1, 5)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
print(out.shape) # (seq_len, batch_size, hidden_size) = (7, 1, 10)
out = out[-1] # Get output of last step
print(out.shape) # (batch, hidden_size) = (1, 10)
out = fc(out) # Push through linear layer
print(out.shape) # (batch_size, output_size) = (1, 1)
This makes sense to me, given your batch_size = 1 and output_size = 1 (I assume, you're doing regression). I don't know where your output.shape = (7, 1) come from.
Are you sure that your X has the correct dimensions? Did you create nn.LSTM maybe with batch_first=True? There are lot of little things that can sneak in.
I am new to pytorch. The following is the basic example of using nn module to train a simple one-layer model with some random data (from here)
import torch
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for t in range(500):
y_pred = model(x)
loss = loss_fn(y_pred, y)
print(t, loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()
As far as I understand, the batch size is equal to 1 in the example, in other words, a single point (out of 64) is used to calculate gradients and update parameters. My question is: how to modify this example to train the model with the batch size greater than one?
In fact N is the batch size. So you just need to modify N currently its set to 64. So you have in every training batch 64 vectors with size / dim D_in.
I checked the link you posted, you can also take a look at the comments - there is some explanation too :)
# -*- coding: utf-8 -*-
import numpy as np
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
learning_rate = 1e-6
for t in range(500):
# Forward pass: compute predicted y
h = x.dot(w1)
h_relu = np.maximum(h, 0)
y_pred = h_relu.dot(w2)
# Compute and print loss
loss = np.square(y_pred - y).sum()
print(t, loss)
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h_relu = grad_y_pred.dot(w2.T)
grad_h = grad_h_relu.copy()
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)
# Update weights
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
To include batch size in PyTorch basic examples, the easiest and cleanest way is to use PyTorch torch.utils.data.DataLoader and torch.utils.data.TensorDataset.
Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
DataLoader will take care of creating batches for you.
Building on your question, there is a complete code snippet, where we iterate over a dataset of 10000 examples for 2 epochs with a batch size of 64:
import torch
from torch.utils.data import DataLoader, TensorDataset
# Create the dataset with N_SAMPLES samples
N_SAMPLES, D_in, H, D_out = 10000, 1000, 100, 10
x = torch.randn(N_SAMPLES, D_in)
y = torch.randn(N_SAMPLES, D_out)
# Define the batch size and the number of epochs
BATCH_SIZE = 64
N_EPOCHS = 2
# Use torch.utils.data to create a DataLoader
# that will take care of creating batches
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
# Define model, loss and optimizer
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Get the dataset size for printing (it is equal to N_SAMPLES)
dataset_size = len(dataloader.dataset)
# Loop over epochs
for epoch in range(N_EPOCHS):
print(f"Epoch {epoch + 1}\n-------------------------------")
# Loop over batches in an epoch using DataLoader
for id_batch, (x_batch, y_batch) in enumerate(dataloader):
y_batch_pred = model(x_batch)
loss = loss_fn(y_batch_pred, y_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Every 100 batches, print the loss for this batch
# as well as the number of examples processed so far
if id_batch % 100 == 0:
loss, current = loss.item(), (id_batch + 1)* len(x_batch)
print(f"loss: {loss:>7f} [{current:>5d}/{dataset_size:>5d}]")
The output should be something like:
Epoch 1
-------------------------------
loss: 643.433716 [ 64/10000]
loss: 648.195435 [ 6464/10000]
Epoch 2
-------------------------------
loss: 613.619873 [ 64/10000]
loss: 625.018555 [ 6464/10000]
I have been trying to implement a CNN on the CIFAR-10 dataset for a few days and my test set accuracy does not seem to go beyond the 10% and the error just hang around 69.07733. I have tweaking the model and few days but in vain. I haven't been able to spot out where I am going wrong. Please help me recognise the fault in the model. Here is the code for it:
import os
import sys
import pickle
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
data_root = './cifar-10-batches-py'
train_data = np.ndarray(shape=(50000,3072), dtype=np.float32)
train_labels = np.ndarray(shape=(50000), dtype=np.float32)
num_images = 0
test_data = np.ndarray(shape=(10000,3072),dtype = np.float32)
test_labels = np.ndarray(shape=(10000),dtype=np.float32)
meta_data = {}
for file in os.listdir(data_root):
file_path = os.path.join(data_root,file)
with open(file_path,'rb') as f:
temp = pickle.load(f,encoding ='bytes')
if file == 'batches.meta':
for i,j in enumerate(temp[b'label_names']):
meta_data[i] = j
if 'data_batch_' in file:
for i in range(10000):
train_data[num_images,:] = temp[b'data'][i]
train_labels[num_images] = temp[b'labels'][i]
num_images += 1
if 'test_batch' in file:
for i in range(10000):
test_data[i,:] = temp[b'data'][i]
test_labels[i] = temp[b'labels'][i]
'''
print('meta: \n',meta_data)
train_data = train_data.reshape(50000,3,32,32).transpose(0,2,3,1)
print('\ntrain data: \n',train_data.shape,'\nLabels: \n',train_labels[0])
print('\ntest data: \n',test_data[0].shape,'\nLabels: \n',train_labels[0])'''
#accuracy function acc = (no. of correct prediction/total attempts) * 100
def accuracy(predictions, labels):
return (100 * (np.sum(np.argmax(predictions,1)== np.argmax(labels, 1))/predictions.shape[0]))
#reformat the data
def reformat(data,labels):
data = data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1).astype(np.float32)
labels = (np.arange(10) == labels[:,None]).astype(np.float32)
return data,labels
train_data, train_labels = reformat(train_data,train_labels)
test_data, test_labels = reformat(test_data, test_labels)
print ('Train ',train_data[0][1])
plt.axis("off")
plt.imshow(train_data[1], interpolation = 'nearest')
plt.savefig("1.png")
plt.show()
'''
print("Train: \n",train_data.shape,test_data[0],"\nLabels: \n",train_labels.shape,train_labels[:11])
print("Test: \n",test_data.shape,test_data[0],"\nLabels: \n",test_labels.shape,test_labels[:11])'''
image_size = 32
num_channels = 3
batch_size = 30
patch_size = 5
depth = 64
num_hidden = 256
num_labels = 10
graph = tf.Graph()
with graph.as_default():
#input data and labels
train_input = tf.placeholder(tf.float32,shape=(batch_size,image_size,image_size,num_channels))
train_output = tf.placeholder(tf.float32,shape=(batch_size,num_labels))
test_input = tf.constant(test_data)
#layer weights and biases
layer_1_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels,depth]))
layer_1_biases = tf.Variable(tf.zeros([depth]))
layer_2_weights = tf.Variable(tf.truncated_normal([patch_size,patch_size,depth,depth]))
layer_2_biases = tf.Variable(tf.constant(0.1, shape=[depth]))
layer_3_weights = tf.Variable(tf.truncated_normal([64*64, num_hidden]))
layer_3_biases = tf.Variable(tf.constant(0.1, shape=[num_hidden]))
layer_4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
layer_4_biases = tf.Variable(tf.constant(0.1, shape=[num_labels]))
def convnet(data):
conv_1 = tf.nn.conv2d(data, layer_1_weights,[1,1,1,1], padding = 'SAME')
hidden_1 = tf.nn.relu(conv_1+layer_1_biases)
norm_1 = tf.nn.lrn(hidden_1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
pool_1 = tf.nn.max_pool(norm_1,[1,2,2,1],[1,2,2,1], padding ='SAME')
conv_2 = tf.nn.conv2d(pool_1,layer_2_weights,[1,1,1,1], padding = 'SAME')
hidden_2 = tf.nn.relu(conv_2+layer_2_biases)
norm_2 = tf.nn.lrn(hidden_2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
pool_2 = tf.nn.max_pool(norm_2,[1,2,2,1],[1,2,2,1], padding ='SAME')
shape = pool_2.get_shape().as_list()
hidd2_trans = tf.reshape(pool_2,[shape[0],shape[1]*shape[2]*shape[3]])
hidden_3 = tf.nn.relu(tf.matmul(hidd2_trans,layer_3_weights) + layer_3_biases)
return tf.nn.relu(tf.matmul(hidden_3,layer_4_weights) + layer_4_biases)
logits = convnet(train_input)
loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=train_output, logits = logits))
optimizer = tf.train.AdamOptimizer(1e-4).minimize(loss)
train_prediction = tf.nn.softmax(logits)
test_prediction = tf.nn.softmax(convnet(test_input))
num_steps = 100000
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
print('Initialized \n')
for step in range(num_steps):
offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
batch = train_data[offset:(offset+batch_size),:,:,:]
batch_labels = train_labels[offset:(offset+batch_size),:]
feed_dict ={train_input: batch, train_output: batch_labels}
_,l,prediction = session.run([optimizer, loss, train_prediction], feed_dict = feed_dict)
if (step % 500 == 0):
print("Loss at step %d: %f" %(step, l))
print("Accuracy: %f" %(accuracy(prediction, batch_labels)))
print("Test accuracy: %f" %(accuracy(session.run(test_prediction), test_labels)))
On a first glance I would say the initialization of the CNN is the culprit. A convnet is an optimization algorithm in a highly non-convex space and therefore depends a lot on careful initialization to not get stuck on local minima or saddle points. Look at xavier initialization for an example on how to fix that.
Example Code:
W = tf.get_variable("W", shape=[784, 256],
initializer=tf.contrib.layers.xavier_initializer())
Problem is your network is having very high depth(number of filters = 64 for both layers). Also, you are training the network from scratch. And your dataset of CIFAR10 (50000 images) is very little. Moreover, each CIFAR10 image is only 32x32x3 size.
Couple of alternatives what I can suggest you is to retrain a pre-trained model, i.e do transfer learning.
Other better alternative is to reduce the number of filters in each layer. In this way, you will be able to train the model from scratch and also it will be faster. (Assuming you don't have GPU).
Next you are making use of local response normalization. I would suggest you to remove this layer and do mean normalization in pre-processing step.
Next, if you feel the learning is not picking up at all, try increasing the learning rate a little and see.
Lastly, just to reduce some operation in your code, you are reshaping your tensor and then doing transpose in many places like this:
data.reshape(data.shape[0],3,32,32).transpose(0,2,3,1)
Why not directly reshape it to something like this?
data.reshape(data.shape[0], 32, 32, 3)
Hope the answer helps you.
So I am training a CNN and compute the training accuracy for each batch. Most of the it gives out 100% batch training accuracy. which I though was okay because I'm testing my model against the data I trained it with. But at some iterations, I get a 90% or 90% batch training accuracy. And worst, sometimes it goes down to 0% real quick and bounces back to 100% batch training accuracy. And I used the algorithm in https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/04_Save_Restore.ipynb and they also computed the batch training accuracy but they don't get the same results I get. They started out with around 80% batch training accuracy and observed a gradual increase until 98%. Why is this?
I was suspecting that my network is overfitting.
Here is my exact code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import tensorflow as tf
import pyfftw
from scipy import signal
import xlrd
from tensorflow.python.tools import freeze_graph
from tensorflow.python.tools import optimize_for_inference_lib
import time
from datetime import timedelta
import math
import os
from sklearn.metrics import confusion_matrix
##matplotlib inline
plt.style.use('ggplot')
## define funtions
def read_data(file_path):
## column_names = ['user-id','activity','timestamp', 'x-axis', 'y-axis', 'z-axis']
column_names = ['activity','timestamp', 'Ax', 'Ay', 'Az', 'Gx', 'Gy', 'Gz', 'Mx', 'My', 'Mz'] ## 3 sensors
data = pd.read_csv(file_path,header = None, names = column_names)
return data
def feature_normalize(dataset):
mu = np.mean(dataset,axis = 0)
sigma = np.std(dataset,axis = 0)
return (dataset - mu)/sigma
def plot_axis(ax, x, y, title):
ax.plot(x, y)
ax.set_title(title)
ax.xaxis.set_visible(False)
ax.set_ylim([min(y) - np.std(y), max(y) + np.std(y)])
ax.set_xlim([min(x), max(x)])
ax.grid(True)
def plot_activity(activity,data):
fig, (ax0, ax1, ax2) = plt.subplots(nrows = 3, figsize = (15, 10), sharex = True)
plot_axis(ax0, data['timestamp'], data['Ax'], 'x-axis')
plot_axis(ax1, data['timestamp'], data['Ay'], 'y-axis')
plot_axis(ax2, data['timestamp'], data['Az'], 'z-axis')
plt.subplots_adjust(hspace=0.2)
fig.suptitle(activity)
plt.subplots_adjust(top=0.90)
plt.show()
def windows(data, size):
start = 0
while start < data.count():
yield start, start + size
start += (size / 2)
def segment_signal(data, window_size = None, num_channels=None): # edited
segments = np.empty((0,window_size,num_channels)) #change from 3 to 9 channels for AGM fusion #use variable num_channels=9
labels = np.empty((0))
for (n_start, n_end) in windows(data['timestamp'], window_size):
## x = data["x-axis"][start:end]
## y = data["y-axis"][start:end]
## z = data["z-axis"][start:end]
n_start = int(n_start)
n_end = int(n_end)
Ax = data["Ax"][n_start:n_end]
Ay = data["Ay"][n_start:n_end]
Az = data["Az"][n_start:n_end]
Gx = data["Gx"][n_start:n_end]
Gy = data["Gy"][n_start:n_end]
Gz = data["Gz"][n_start:n_end]
Mx = data["Mx"][n_start:n_end]
My = data["My"][n_start:n_end]
Mz = data["Mz"][n_start:n_end]
if(len(dataset['timestamp'][n_start:n_end]) == window_size): # include only windows with size of 90
segments = np.vstack([segments,np.dstack([Ax,Ay,Az,Gx,Gy,Gz,Mx,My,Mz])])
labels = np.append(labels,stats.mode(data["activity"][n_start:n_end])[0][0])
return segments, labels
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev = 0.1)
return tf.Variable(initial)
def bias_variable(shape):
initial = tf.constant(0.0, shape = shape)
return tf.Variable(initial)
def depthwise_conv2d(x, W):
return tf.nn.depthwise_conv2d(x,W, [1, 1, 1, 1], padding='VALID')
def apply_depthwise_conv(x,weights,biases):
return tf.nn.relu(tf.add(depthwise_conv2d(x, weights),biases))
def apply_max_pool(x,kernel_size,stride_size):
return tf.nn.max_pool(x, ksize=[1, 1, kernel_size, 1],
strides=[1, 1, stride_size, 1], padding='VALID')
#------------------------get dataset----------------------#
## run shoaib_dataset.py to generate dataset_shoaib_total.txt
## get data from dataset_shoaib_total.txt
dataset = read_data('dataset_shoaib_total.txt')
#--------------------preprocessing------------------------#
dataset['Ax'] = feature_normalize(dataset['Ax'])
dataset['Ay'] = feature_normalize(dataset['Ay'])
dataset['Az'] = feature_normalize(dataset['Az'])
dataset['Gx'] = feature_normalize(dataset['Gx'])
dataset['Gy'] = feature_normalize(dataset['Gy'])
dataset['Gz'] = feature_normalize(dataset['Gz'])
dataset['Mx'] = feature_normalize(dataset['Mx'])
dataset['My'] = feature_normalize(dataset['My'])
dataset['Mz'] = feature_normalize(dataset['Mz'])
###--------------------plot activity data----------------#
##for activity in np.unique(dataset["activity"]):
## subset = dataset[dataset["activity"] == activity][:180]
## plot_activity(activity,subset)
#------------------fixed hyperparameters--------------------#
window_size = 200 #from 90 #FIXED at 4 seconds
#----------------input hyperparameters------------------#
input_height = 1
input_width = window_size
num_labels = 6
num_channels = 9 #from 3 channels #9 channels for AGM
#-------------------sliding time window----------------#
segments, labels = segment_signal(dataset, window_size=window_size, num_channels=num_channels)
labels = np.asarray(pd.get_dummies(labels), dtype = np.int8)
reshaped_segments = segments.reshape(len(segments), (window_size*num_channels)) #use variable num_channels instead of constant 3 channels
#------------divide data into test and training set-----------#
train_test_split = np.random.rand(len(reshaped_segments)) < 0.80
train_x_init = reshaped_segments[train_test_split]
train_y_init = labels[train_test_split]
test_x = reshaped_segments[~train_test_split]
test_y = labels[~train_test_split]
train_validation_split = np.random.rand(len(train_x_init)) < 0.80
train_x = train_x_init[train_validation_split]
train_y = train_y_init[train_validation_split]
validation_x = train_x_init[~train_validation_split]
validation_y = train_y_init[~train_validation_split]
#---------------training hyperparameters----------------#
batch_size = 10
kernel_size = 60 #from 60 #optimal 2
depth = 15 #from 60 #optimal 15
num_hidden = 1000 #from 1000 #optimal 80
learning_rate = 0.0001
training_epochs = 8
total_batches = train_x.shape[0] ##// batch_size
#---------define placeholders for input----------#
X = tf.placeholder(tf.float32, shape=[None,input_width * num_channels], name="input")
X_reshaped = tf.reshape(X,[-1,input_height,input_width,num_channels])
Y = tf.placeholder(tf.float32, shape=[None,num_labels])
#---------------------perform convolution-----------------#
# first convolutional layer
c_weights = weight_variable([1, kernel_size, num_channels, depth])
c_biases = bias_variable([depth * num_channels])
c = apply_depthwise_conv(X_reshaped,c_weights,c_biases)
p = apply_max_pool(c,20,2)
# second convolutional layer
c2_weights = weight_variable([1, 6,depth*num_channels,depth//10])
c2_biases = bias_variable([(depth*num_channels)*(depth//10)])
c = apply_depthwise_conv(p,c2_weights,c2_biases)
#--------------flatten data for fully connected layers----------#
shape = c.get_shape().as_list()
c_flat = tf.reshape(c, [-1, shape[1] * shape[2] * shape[3]])
#------------fully connected layers----------------#
f_weights_l1 = weight_variable([shape[1] * shape[2] * depth * num_channels * (depth//10), num_hidden])
f_biases_l1 = bias_variable([num_hidden])
f = tf.nn.tanh(tf.add(tf.matmul(c_flat, f_weights_l1),f_biases_l1))
#----------------------dropout------------------#
keep_prob = tf.placeholder(tf.float32)
drop_layer = tf.nn.dropout(f, keep_prob)
#----------------------softmax layer----------------#
out_weights = weight_variable([num_hidden, num_labels])
out_biases = bias_variable([num_labels])
y_ = tf.nn.softmax(tf.add(tf.matmul(drop_layer, out_weights),out_biases), name="y_")
#-----------------loss optimization-------------#
loss = -tf.reduce_sum(Y * tf.log(y_))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(loss)
#-----------------compute accuracy---------------#
correct_prediction = tf.equal(tf.argmax(y_,1), tf.argmax(Y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
cost_history = np.empty(shape=[1],dtype=float)
saver = tf.train.Saver()
session = tf.Session()
session.run(tf.global_variables_initializer())
#-------------early stopping-----------------#
# Best validation accuracy seen so far.
best_validation_accuracy = 0.0
# Iteration-number for last improvement to validation accuracy.
last_improvement = 0
# Stop optimization if no improvement found in this many iterations.
require_improvement = 1000
# Counter for total number of iterations performed so far.
total_iterations = 0
def validation_accuracy():
return session.run(accuracy, feed_dict={X: validation_x, Y: validation_y, keep_prob: 1.0})
def next_batch(b, batch_size, train_x, train_y):
##for b in range(total_batches):
offset = (b * batch_size) % (train_y.shape[0] - batch_size)
batch_x = train_x[offset:(offset + batch_size), :]
batch_y = train_y[offset:(offset + batch_size), :]
return batch_x, batch_y
def optimize(num_iterations):
# Ensure we update the global variables rather than local copies.
global total_iterations
global best_validation_accuracy
global last_improvement
# Start-time used for printing time-usage below.
start_time = time.time()
for i in range(num_iterations):
# Increase the total number of iterations performed.
# It is easier to update it in each iteration because
# we need this number several times in the following.
total_iterations += 1
# Get a batch of training examples.
# x_batch now holds a batch of images and
# y_true_batch are the true labels for those images.
##x_batch, y_true_batch = data.train.next_batch(train_batch_size)
x_batch, y_true_batch = next_batch(i, batch_size, train_x, train_y)
# Put the batch into a dict with the proper names
# for placeholder variables in the TensorFlow graph.
feed_dict_train = {X: x_batch,
Y: y_true_batch, keep_prob: 0.5}
# Run the optimizer using this batch of training data.
# TensorFlow assigns the variables in feed_dict_train
# to the placeholder variables and then runs the optimizer.
session.run(optimizer, feed_dict=feed_dict_train)
# Print status every 100 iterations and after last iteration.
if (total_iterations % 100 == 0) or (i == (num_iterations - 1)):
# Calculate the accuracy on the training-batch.
acc_train = session.run(accuracy, feed_dict={X: x_batch,
Y: y_true_batch, keep_prob: 1.0})
# Calculate the accuracy on the validation-set.
# The function returns 2 values but we only need the first.
##acc_validation, _ = validation_accuracy()
acc_validation = validation_accuracy()
# If validation accuracy is an improvement over best-known.
if acc_validation > best_validation_accuracy:
# Update the best-known validation accuracy.
best_validation_accuracy = acc_validation
# Set the iteration for the last improvement to current.
last_improvement = total_iterations
# Save all variables of the TensorFlow graph to file.
saver.save(sess=session, save_path="../shoaib-har_agm_es.ckpt")
# A string to be printed below, shows improvement found.
improved_str = '*'
else:
# An empty string to be printed below.
# Shows that no improvement was found.
improved_str = ''
# Status-message for printing.
msg = "Iter: {0:>6}, Train-Batch Accuracy: {1:>6.1%}, Validation Acc: {2:>6.1%} {3}"
# Print it.
print(msg.format(i + 1, acc_train, acc_validation, improved_str))
# If no improvement found in the required number of iterations.
if total_iterations - last_improvement > require_improvement:
print("No improvement found in a while, stopping optimization.")
# Break out from the for-loop.
break
# Ending time.
end_time = time.time()
# Difference between start and end-times.
time_dif = end_time - start_time
# Print the time-usage.
print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))
optimize(10000)
With the output:
What exactly is training accuracy? Is it even computed? Or do you compute the training accuracy on the entire training data and not just the batch you trained your network with?
Here I printed the results such that it prints out the batch training accuracy and the training accuracy on the entire dataset set for every multiples of 20 iterations.
The data is divided to 3 sets: train, validation and test.
Batch training accuracy is computed on the train set (the difference between the label and the prediction).
Validation accuracy is the accuracy on the validation set.
The batch accuracy can be computed just after a forward pass in the network. The number of samples in one forward pass is the batch size. It is just a way to train models faster (mini-batch gradient descent)
Overfitting is when the model works really good on known data (training set) but performs poorly on new data.
As to the 10% multiples, it is just the printing format you are using.