How can I randomize my image selection to test my CNN model - image-processing

I want to test my CNN emotion-recognition model on still images first, before I apply it to a webcam feed.
Please correct my code if it is wrong.
model = tf.keras.models.load_model("model/model.model")
label_dict = {0 : 'boredom',
              1 : 'confused',
              2 : 'drowsy',
              3 : 'engage',
              4 : 'frustrated',
              5 : 'looking away'}

def img():
    list_dir ='./Images/bored/'
    file_list = os.listdir(list_dir )
    img_path = random.choice(file_list)
    img = load_img(img_path, target_size=(48,48))
    x = img_to_array(img)
    x = np.expand_dims(x, axis=0)
    prediction = np.argmax(model.predict(x))
    print('The predicted emotion is : ' + label_dict[prediction])
    my_image = load_img(img_path)
    return my_image

plt.imshow(img())
When I try to run it, I get this error:
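The traceback is not included here, but one likely cause (an assumption, since the error text is missing) is that `os.listdir` returns bare file names, so `load_img` receives a path relative to the working directory rather than to `./Images/bored/`. A minimal sketch of picking a random image together with its full path; `pick_random_image` and the extension filter are hypothetical helpers, not part of the original code:

```python
import os
import random

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed set of image types

def pick_random_image(directory, rng=random):
    """Return the full path of one randomly chosen image file in `directory`."""
    names = [name for name in sorted(os.listdir(directory))
             if name.lower().endswith(IMAGE_EXTENSIONS)]
    if not names:
        raise FileNotFoundError('no image files found in ' + directory)
    # os.listdir yields bare names; join with the directory to get a usable path
    return os.path.join(directory, rng.choice(names))
```

The returned path could then be passed to `load_img(path, target_size=(48, 48))` as in the code above.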

Related

Using WeightedRandomSampler in PyTorch

I need to implement a multi-label image classification model in PyTorch. However, my data is not balanced, so I used the WeightedRandomSampler in PyTorch to create a custom dataloader. But when I iterate through the custom dataloader, I get the error: IndexError: list index out of range
I implemented the following code based on this link: https://discuss.pytorch.org/t/balanced-sampling-between-classes-with-torchvision-dataloader/2703/3?u=surajsubramanian
def make_weights_for_balanced_classes(images, nclasses):
    count = [0] * nclasses
    for item in images:
        count[item[1]] += 1
    weight_per_class = [0.] * nclasses
    N = float(sum(count))
    for i in range(nclasses):
        weight_per_class[i] = N/float(count[i])
    weight = [0] * len(images)
    for idx, val in enumerate(images):
        weight[idx] = weight_per_class[val[1]]
    return weight

weights = make_weights_for_balanced_classes(train_dataset.imgs, len(full_dataset.classes))
weights = torch.DoubleTensor(weights)
sampler = WeightedRandomSampler(weights, len(weights))
train_loader = DataLoader(train_dataset, batch_size=4,sampler = sampler, pin_memory=True)
Based on the answer in https://stackoverflow.com/a/60813495/10077354, the following is my updated code. But even then, when I create a dataloader with loader = DataLoader(full_dataset, batch_size=4, sampler=sampler), len(loader) returns 1.
class_counts = [1691, 743, 2278, 1271]
num_samples = np.sum(class_counts)
labels = [tag for _,tag in full_dataset.imgs]
class_weights = [num_samples/class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(num_samples)]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), num_samples)
Thanks a lot in advance!
I included a utility function based on the accepted answer below:
def sampler_(dataset):
    dataset_counts = imageCount(dataset)
    num_samples = sum(dataset_counts)
    labels = [tag for _,tag in dataset]
    class_weights = [num_samples/dataset_counts[i] for i in range(n_classes)]
    weights = [class_weights[labels[i]] for i in range(num_samples)]
    sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
    return sampler
The imageCount function finds the number of images of each class in the dataset. Each row in the dataset contains the image and the class, so we take the second element of the tuple into consideration.
def imageCount(dataset):
    image_count = [0]*(n_classes)
    for img in dataset:
        image_count[img[1]] += 1
    return image_count
That code looks a bit complex... You can try the following:
#Let there be 9 samples and 1 sample in class 0 and 1 respectively
class_counts = [9.0, 1.0]
num_samples = sum(class_counts)
labels = [0, 0,..., 0, 1] #corresponding labels of samples
class_weights = [num_samples/class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(int(num_samples))]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
Here is an alternative solution:
import numpy as np
from torch.utils.data.sampler import WeightedRandomSampler
counts = np.bincount(y)
labels_weights = 1. / counts
weights = labels_weights[y]
WeightedRandomSampler(weights, len(weights))
where y is a list of labels corresponding to each sample, has shape (n_samples,), and is encoded as integers in [0, ..., n_classes - 1].
weights won't add up to 1, which is ok according to the official docs.
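To see what these weights achieve, here is a small dependency-free sketch (plain Python standing in for NumPy and PyTorch; the label list is made up). It computes inverse-frequency weights per sample and draws indices with replacement in proportion to them, which is the behaviour `WeightedRandomSampler` provides with its default `replacement=True`:

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """One weight per sample: 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

def sample_indices(weights, num_samples, seed=0):
    """Draw indices with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)

labels = [0] * 9 + [1]          # hypothetical imbalanced data: 9 vs 1
weights = inverse_frequency_weights(labels)
drawn = sample_indices(weights, 10000)
share_of_class_1 = sum(labels[i] == 1 for i in drawn) / len(drawn)
print(round(share_of_class_1, 2))  # close to 0.5 despite the 9:1 imbalance
```

Each class ends up with the same total weight, so the rare class is drawn about as often as the common one.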

keras neural network predicts the same number for every handwritten digit

I am new to machine learning, so as a first project I've tried to build a handwritten digit recognition neural network based on the MNIST dataset. When I test it with the test images provided by the dataset itself, it seems to work pretty well (that's what the function test_predict is for). Now I would like to step it up and have the network recognise some actual handwritten digits that I've taken photos of.
The function partial_img_rec takes an image containing multiple digits and is called by multiple_digits. I know it might seem weird that I use recursion here, and I'm sure there are more efficient ways to do this, but that's not the point. In order to test partial_img_rec, I provide some photos of individual digits that are stored in the folder .\individual_test, and they all look something like this:
The problem is: my neural network's prediction for every single one of my test images is "5". The probability is always about 22%, no matter which digit is actually displayed. I totally get why the results are not as good as those achieved with the MNIST dataset's test images, but I certainly didn't expect this. Do you have any idea why this is happening? Any advice is welcome.
Thank you in advance.
Here's my code (edited, now working):
# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from keras.utils import np_utils
# numpy is necessary since keras uses numpy arrays
import numpy as np
# imports for pictures
from PIL import Image
from PIL import ImageOps
# imports for tests
import random
import os

class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = np_utils.to_categorical(y_train)
        y_test = np_utils.to_categorical(y_test)
        num_classes = y_test.shape[1]
        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # Compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test

    def test_all(self):
        """ evaluates the success rate using all the test data """
        scores = self.model.evaluate(self.test_img, self.test_res, verbose=0)
        print("Baseline Error: %.2f%%" % (100-scores[1]*100))

    def predict_result(self, img, num_pixels = None, show=False):
        """ predicts the number in a picture (vector) """
        assert type(img) == np.ndarray and img.shape == (784,)
        """if show:
            # show the picture!!!! some problem here
            plt.imshow(img, cmap='Greys')
            plt.show()"""
        num_pixels = img.shape[0]
        # the actual number
        res_number = np.argmax(self.model.predict(img.reshape(-1,num_pixels)), axis = 1)
        # the probabilities
        res_probabilities = self.model.predict(img.reshape(-1,num_pixels))
        return (res_number[0], res_probabilities.tolist()[0]) # we only need the first element since they only have one

    def test_predict(self, amount_test = 100):
        """ test some random numbers from the test part of the data set """
        assert type(amount_test) == int and amount_test <= 10000
        cnt_right = 0
        cnt_wrong = 0
        for i in range(amount_test):
            ind = random.randrange(0,10000) # there are 10000 images in the test part of the data set
            """ correct_res is the actual result stored in the data set
            It's represented as a list of 10 elements one of which being 1, the rest 0 """
            correct_list = self.test_res.tolist()
            correct_list = correct_list[ind] # the correct sublist
            correct_res = correct_list.index(1.0)
            predicted_res = self.predict_result(self.test_img[ind])[0]
            if correct_res != predicted_res:
                cnt_wrong += 1
                print("Error in predict ! index = ", ind, " predicted result = ", predicted_res, " correct result = ", correct_res)
            else:
                cnt_right += 1
        print("The machine predicted correctly ",cnt_right," out of ",amount_test," examples. That is a success rate of ", (cnt_right/amount_test)*100,"%.")

    def partial_img_rec(self, image, upper_left, lower_right, results=None):
        """ partial is a part of an image """
        # start from a fresh list instead of a shared mutable default argument
        if results is None:
            results = []
        left_x, left_y = upper_left
        right_x, right_y = lower_right
        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results
        partial = image.crop((left_x, left_y, right_x, right_y))
        # rescale image to 28 *28 dimension
        partial = partial.resize((28,28), Image.ANTIALIAS)
        partial.show()
        # transform to vector
        partial = ImageOps.invert(partial)
        partial = np.asarray(partial, "float32")
        partial = partial / 255.
        partial[partial < 0.5] = 0.
        # flatten image to 28*28 = 784 vector
        num_pixels = partial.shape[0] * partial.shape[1]
        partial = partial.reshape(num_pixels)
        step = height // 10
        # is there a number in this part of the image?
        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is >= 50% sure
        if prop[res] >= 0.5:
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height)
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if there is no number found we take smaller steps
            step = height // 20
        print("step: ", step)
        # recursive call with modified positions ( move on step variables )
        return self.partial_img_rec(image, (left_x+step, left_y), (right_x+step, right_y), results=results)

    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (square shaped images)
        saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0,0
        # escape the backslash so the path is not read as an escape sequence
        folder_content = os.listdir(".\\individual_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            image = Image.open(".\\individual_test\\" + imageName).convert("L")
            # only square images in this test
            if image.size[0] != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size,". It has to be a square.")
                continue
            predicted_res = self.partial_img_rec(image, (0,0), (image.size[0], image.size[1]), results=[])
            if predicted_res == []:
                print("No prediction possible for ", imageName)
            else:
                predicted_res = predicted_res[0]
            if predicted_res != correct_res:
                print("error in partial_img-rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                cnt_wrong += 1
            else:
                cnt_right += 1
                print("correctly predicted ",imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong," digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100," %.")

    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """
        #assert type(img) == myImage
        width, height = img.size
        # start with the first quadratic part of the image
        res_list = self.partial_img_rec(img, (0,0),(height ,height))
        res_str =""
        for elem in res_list:
            res_str += str(elem)
        return res_str

network = mnist_network()
network.test_individual_digits()
EDIT
@Geecode's answer was very helpful, and the network now correctly predicts some of the pictures, including the one shown above. Yet the overall success rate is lower than 50%. Do you have any ideas how to improve this?
Examples for images returning bad results:
There is nothing wrong with the image itself; your model can classify it correctly.
The issue is that you applied floor division to your partial:
partial = partial // 255
which results in 0 for every pixel value below 255, so you always get an (almost) entirely black image.
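The difference between the two operators is easy to check on a few pixel values. A minimal sketch in plain Python (the sample values are arbitrary); NumPy arrays apply the same arithmetic element-wise:

```python
pixels = [0.0, 64.0, 128.0, 254.0, 255.0]  # arbitrary grey values in 0..255

floor_divided = [p // 255 for p in pixels]  # floor division drops the fraction
true_divided = [p / 255 for p in pixels]    # true division keeps it

print(floor_divided)  # every value below 255 collapses to 0.0
print(true_divided)   # values spread over the range 0.0 to 1.0
```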
You have to use "normal" (true) division plus some preparation, because your model was trained on negative images, i.e. images with a black (0-valued) background:
# transform to vector
partial = ImageOps.invert(partial)
partial = np.asarray(partial, "float32")
partial = partial / 255.
partial[partial < 0.5] = 0.
After that, your model classifies the image correctly:
Out:
result: 1 . probabilities: [0.000431705528171733, 0.7594985961914062, 0.0011404436081647873, 0.00018972357793245465, 0.03162384033203125, 0.008697531186044216, 0.0014472954208031297, 0.18429973721504211, 0.006838776171207428, 0.005832481198012829]
found valid result
Note that the image preparation can of course be tuned further, but that was not the purpose of this answer.
Update:
For my detailed answer on how to achieve better performance in this task, see here.

training data using one hot encoding gives results of coefficients more than the actual column

When using one-hot encoding, I get a set of coefficients that is larger than the actual number of columns.
I have added the code as well; please have a look. I am writing it in Spark.
coefficient : [0.0002054800236568163,2.0439310112800845e-06,0.0012587034306473716,0.0003538955306262437,0.0014205218783369504,-0.09556139895866411,-0.01119907246997649,0.0009278595718565514,0.055504033414581995,-0.0060363295643237206,0.1208861923722965,-0.03708163001735046,-0.011924436110750052,0.18739103759110842,-0.06788345901273717,0.24122048812836505,-0.08719840615913002,-0.18789455768956798,0.2881887187896297,-0.13987095144035597,-0.016854358762055686,0.029427863518793968,-0.01918399191298753,0.011116841193397481,0.04191756597743858,-0.04191756597744139,-0.003281743064241399,0.0032817430642382403,-0.007199912662577535,0.007199912662575341,0.011613111115769799,-0.042503873680151225,0.10019922603083396,-0.34485589766428043,0.3756841570542743,-0.019416573355186505,0.37012264711363996]
features column : ['balance', 'day', 'duration', 'campaign', 'age', 'job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'poutcome']
oneHotEncodedFeatures = []
for colm in stringFeatures:
    indexer = StringIndexer(inputCol=colm, outputCol='indexed_' + colm).fit(dataset)
    indexed_features.append('indexed_' + colm)
    dataset = indexer.transform(dataset)
    encoder = OneHotEncoderEstimator(inputCols=['indexed_'+colm], outputCols=['encoded_'+colm], dropLast=True, handleInvalid='keep').fit(dataset)
    oneHotEncodedFeatures.append('encoded_'+colm)
    dataset = encoder.transform(dataset)
dataset.show()
final_features = numericalFeatures + oneHotEncodedFeatures
featureassembler = VectorAssembler(inputCols=final_features,
                                   outputCol="features")
dataset = featureassembler.transform(dataset)
# vectorIndexer = VectorIndexer(inputCol='features', outputCol='vectorIndexedFeatures', maxCategories=4).fit(
#     dataset)
# dataset = vectorIndexer.transform(dataset)
trainDataRatioTransformed = self.trainDataRatio
testDataRatio = 1 - trainDataRatioTransformed
trainingData, testData = dataset.randomSplit([trainDataRatioTransformed, testDataRatio], seed=40)
# applying the model
lr = LinearRegression(featuresCol="features", labelCol=label)
regressor = lr.fit(trainingData)
locationAddress = 'hdfs://10.171.0.181:9000/dev/dmxdeepinsight/datasets/'
modelPersist = 'linearRegressorModel.parquet'
modelStorageLocation = locationAddress + userId + modelPersist
regressor.write().overwrite().save(modelStorageLocation)
# print regressor.featureImportances
# print(dataset.orderBy(feature_colm, ascending=True))
# pred = regressor.transform(testData)
# coefficeint & intercept
# saving the model and test dataset as csv file
print("coefficient : " + str(regressor.coefficients))
coefficient_t = str(regressor.coefficients)
# print("intercept : " + str(regressor.intercept))
intercept_t = str(regressor.intercept)
print('features column :',feature_colm)
There should be only thirteen coefficients but it is showing twenty nine.
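One way to reason about the count: `VectorAssembler` concatenates its inputs, and each one-hot encoded column is a vector with one slot per retained category rather than a single number, so the model gets one coefficient per slot. A rough sketch of the arithmetic in plain Python; the category counts below are invented for illustration, and the `k - 1` width assumes the last category is dropped (the exact width depends on the encoder's `dropLast` and `handleInvalid` settings):

```python
def assembled_width(num_numeric, categories_per_column, drop_last=True):
    """Length of the assembled feature vector, i.e. the number of coefficients."""
    width = num_numeric
    for k in categories_per_column.values():
        # a one-hot vector has k slots, or k - 1 when the last one is dropped
        width += k - 1 if drop_last else k
    return width

# Hypothetical category counts, not taken from the question's dataset
categories = {'job': 12, 'marital': 3, 'education': 4, 'default': 2}
print(assembled_width(num_numeric=5, categories_per_column=categories))
```

Under these assumptions, nine source columns already produce many more than nine coefficients, which matches the kind of mismatch described above.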

value prediction with tensorflow and python

I have a data set which contains a list of stock prices. I need to use TensorFlow and Python to predict the close price.
Q1: I have the following code, which takes the first 2000 records as training data and records 2001 to 20000 as test data, but I don't know how to change the code to predict the close price for today and for one day later. Please advise!
#!/usr/bin/env python2
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

def feature_scaling(input_pd, scaling_meathod):
    if scaling_meathod == 'z-score':
        scaled_pd = (input_pd - input_pd.mean()) / input_pd.std()
    elif scaling_meathod == 'min-max':
        scaled_pd = (input_pd - input_pd.min()) / (input_pd.max() -
                                                   input_pd.min())
    return scaled_pd

def input_reshape(input_pd, start, end, batch_size, batch_shift, n_features):
    temp_pd = input_pd[start-1: end+batch_size-1]
    output_pd = map(lambda y : temp_pd[y:y+batch_size], xrange(0, end-start+1, batch_shift))
    output_temp = map(lambda x : np.array(output_pd[x]).reshape([-1]), xrange(len(output_pd)))
    output = np.reshape(output_temp, [-1, batch_size, n_features])
    return output

def target_reshape(input_pd, start, end, batch_size, batch_shift, n_step_ahead, m_steps_pred):
    temp_pd = input_pd[start+batch_size+n_step_ahead-2: end+batch_size+n_step_ahead+m_steps_pred-2]
    print temp_pd
    output_pd = map(lambda y : temp_pd[y:y+m_steps_pred], xrange(0, end-start+1, batch_shift))
    output_temp = map(lambda x : np.array(output_pd[x]).reshape([-1]), xrange(len(output_pd)))
    output = np.reshape(output_temp, [-1,1])
    return output

def lstm(input, n_inputs, n_steps, n_of_layers, scope_name):
    num_layers = n_of_layers
    input = tf.transpose(input,[1, 0, 2])
    input = tf.reshape(input,[-1, n_inputs])
    input = tf.split(0, n_steps, input)
    with tf.variable_scope(scope_name):
        cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=n_inputs)
        cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers)
        output, state = tf.nn.rnn(cell, input, dtype=tf.float32)
    output = output[-1]
    return output

feature_to_input = ['open price', 'highest price', 'lowest price', 'close price','turnover', 'volume','mean price']
feature_to_predict = ['close price']
feature_to_scale = ['volume']
sacling_meathod = 'min-max'
train_start = 1
train_end = 1000
test_start = 1001
test_end = 20000
batch_size = 100
batch_shift = 1
n_step_ahead = 1
m_steps_pred = 1
n_features = len(feature_to_input)
lstm_scope_name = 'lstm_prediction'
n_lstm_layers = 1
n_pred_class = 1
learning_rate = 0.1
EPOCHS = 1000
PRINT_STEP = 100
# the frame is referred to as input_pd everywhere below
input_pd = pd.read_csv('./stock_price.csv')
temp_pd = feature_scaling(input_pd[feature_to_scale],sacling_meathod)
input_pd[feature_to_scale] = temp_pd
train_input_temp_pd = input_pd[feature_to_input]
train_input_nparr = input_reshape(train_input_temp_pd,
                                  train_start, train_end, batch_size, batch_shift, n_features)
train_target_temp_pd = input_pd[feature_to_predict]
train_target_nparr = target_reshape(train_target_temp_pd, train_start, train_end, batch_size, batch_shift, n_step_ahead, m_steps_pred)
test_input_temp_pd = input_pd[feature_to_input]
test_input_nparr = input_reshape(test_input_temp_pd, test_start, test_end, batch_size, batch_shift, n_features)
test_target_temp_pd = input_pd[feature_to_predict]
test_target_nparr = target_reshape(test_target_temp_pd, test_start, test_end, batch_size, batch_shift, n_step_ahead, m_steps_pred)
tf.reset_default_graph()
x_ = tf.placeholder(tf.float32, [None, batch_size, n_features])
y_ = tf.placeholder(tf.float32, [None, 1])
lstm_output = lstm(x_, n_features, batch_size, n_lstm_layers, lstm_scope_name)
W = tf.Variable(tf.random_normal([n_features, n_pred_class]))
b = tf.Variable(tf.random_normal([n_pred_class]))
y = tf.matmul(lstm_output, W) + b
cost_func = tf.reduce_mean(tf.square(y - y_))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_func)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    for ii in range(EPOCHS):
        sess.run(train_op, feed_dict={x_:train_input_nparr, y_:train_target_nparr})
        if ii % PRINT_STEP == 0:
            cost = sess.run(cost_func, feed_dict={x_:train_input_nparr, y_:train_target_nparr})
            print 'iteration =', ii, 'training cost:', cost
Very simply, prediction (a.k.a. scoring or inference) comes from running the input through only the forward pass, and collecting the score for each input vector. It's the same process flow as testing. The difference is the four stages of model use:
Train: learn from the training data set; adjust weights as needed.
Test: evaluate the model's performance; if accuracy has converged, stop training.
Validate: evaluate the accuracy of the trained model. If it doesn't meet acceptance criteria, change something and start over with the training.
Predict: you've passed validation -- release the model for use by the intended application.
All four steps follow the same forward logic flow; training includes back-propagation; the others do not. Simply follow the forward-only process, and you'll get the result form you need.
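The distinction can be sketched with a one-feature linear model in plain Python (a toy stand-in for the LSTM, with made-up data and hyperparameters): `train_step` runs the forward pass and then updates the weights, while `predict` runs the forward pass only.

```python
def forward(params, x):
    """Forward pass of a toy linear model y = w * x + b."""
    w, b = params
    return w * x + b

def train_step(params, xs, ys, learning_rate=0.05):
    """Forward pass plus one gradient-descent update on mean squared error."""
    w, b = params
    n = len(xs)
    errors = [forward(params, x) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    return (w - learning_rate * grad_w, b - learning_rate * grad_b)

def predict(params, x):
    """Prediction is the forward pass alone: no gradients, no weight updates."""
    return forward(params, x)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]   # toy target: y = 2x + 1
params = (0.0, 0.0)
for _ in range(2000):
    params = train_step(params, xs, ys)
print(predict(params, 5.0))        # approaches 11.0 as training converges
```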
I worry about your data partition: only 10% for training, 90% for testing, and none for validation. A more typical split is 50-30-20, or something in that general area.
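A split along those lines can be sketched as follows. The 50/30/20 proportions come from the suggestion above; keeping the rows in chronological order instead of shuffling is my own assumption for time-series data such as stock prices, and `split_rows` is a hypothetical helper:

```python
def split_rows(rows, train_pct=50, test_pct=30):
    """Split an ordered sequence into train, test and validation parts."""
    n = len(rows)
    train_end = n * train_pct // 100
    test_end = train_end + n * test_pct // 100
    # whatever remains after train and test becomes the validation part
    return rows[:train_end], rows[train_end:test_end], rows[test_end:]

rows = list(range(20000))  # stand-in for 20000 price records
train, test, validation = split_rows(rows)
print(len(train), len(test), len(validation))
```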
Q-1: You should change your LSTM parameters to return a sequence of size two, which will be the predictions for that day and the day after.
Q-2: Your model is clearly underfitting the data, which is not surprising with a 10% train / 90% test split. You should use a more balanced ratio, as suggested in the previous answer.

What problems cause neural networks to stagnate in learning?

I was trying to see how accurately a neural network can approximate simple functions, like a scalar-valued polynomial in several variables. So I had these ideas:
Fix a polynomial of several variables, say, f(x_1,..,x_n).
Generate 50000 vectors of length n using numpy.random, which will serve as training data.
Evaluate f(x) at these points; the values will be used as labels.
Make test data and labels in the same way.
Write a neural network and see how accurately it can approximate f(x) on the test set.
Here is my sample neural network implemented in tensorflow
import tensorflow as tf
import numpy as np

input_vector_length = int(10)
output_vector_length = int(1)
train_data_size = int(50000)
test_data_size = int(10000)
train_input_domain = [-10, 10] #Each component in an input vector is between -10 and 10
test_input_domain = [-10, 10]
iterations = 20000
batch_size = 200
regularizer = 0.01
sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[None, input_vector_length], name="x")
y = tf.placeholder(tf.float32, shape =[None, output_vector_length], name="y")
function = tf.reduce_sum(x, 1) + 0.25*tf.pow(tf.reduce_sum(x,1), 2) + 0.025*tf.pow(tf.reduce_sum(x,1), 3)
#make train data input
train_input = (train_input_domain[1]-train_input_domain[0])*np.random.rand(train_data_size, input_vector_length) + train_input_domain[0]
#make train data label
train_label = sess.run(function, feed_dict = {x : train_input})
train_label = train_label.reshape(train_data_size, output_vector_length)
#make test data input
test_input = (test_input_domain[1]-test_input_domain[0])*np.random.rand(test_data_size, input_vector_length) + test_input_domain[0]
#make test data label
test_label = sess.run(function, feed_dict = {x : test_input})
test_label = test_label.reshape(test_data_size, output_vector_length)

def weight_variables(shape, name):
    initial = 10*tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variables(shape, name):
    initial = 10*tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def take_this_batch(data, batch_index=[]):
    A = []
    for i in range(len(batch_index)):
        A.append(data[i])
    return A

W_0 = weight_variables(shape=[input_vector_length, 10], name="W_0")
B_0 = bias_variables(shape=[10], name="W_0")
y_1 = tf.sigmoid(tf.matmul(x, W_0) + B_0)
W_1 = weight_variables(shape=[10, 20], name="W_1")
B_1 = bias_variables(shape=[20], name="B_1")
y_2 = tf.sigmoid(tf.matmul(y_1, W_1) + B_1)
W_2 = weight_variables(shape=[20,40], name="W_2")
B_2 = bias_variables(shape=[40], name="B_2")
y_3 = tf.sigmoid(tf.matmul(y_2, W_2) + B_2)
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
y_drop = tf.nn.dropout(y_3, keep_prob)
W_output = weight_variables(shape=[40, output_vector_length], name="W_output")
B_output = bias_variables(shape=[output_vector_length], name="B_output")
y_output = tf.matmul(y_drop, W_output) + B_output
# the fourth weight matrix is W_output (no W_3 is defined)
weight_sum = tf.reduce_sum(tf.square(W_0)) + tf.reduce_sum(tf.square(W_1)) + tf.reduce_sum(tf.square(W_2)) + tf.reduce_sum(tf.square(W_output))
cost = tf.reduce_mean(tf.square(y - y_output)) + regularizer*(weight_sum)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
error = cost
sess.run(tf.initialize_all_variables())
with sess.as_default():
    for step in range(iterations):
        batch_index = np.random.randint(low=0, high=train_data_size, size=batch_size)
        batch_input = take_this_batch(train_input, batch_index)
        batch_label = take_this_batch(train_label, batch_index)
        train_step.run(feed_dict = {x : batch_input, y:batch_label, keep_prob:0.5})
        if step % 1000 == 0:
            current_error = error.eval(feed_dict = {x:batch_input, y:batch_label, keep_prob:1.0})
            print("step %d, Current error is %f" % (step,current_error))
    print(error.eval(feed_dict={x:test_input, y:test_label, keep_prob:1.0}))
Simply put, the performance of this neural network is horrifying! My neural network has three hidden layers of size 10, 20 and 40. The input layer is of size 10, and the output layer has size 1. I used a simple L^2 cost function, and I regularized it with the sum of squared weights and a regularizer of 0.01.
During the training stage, I noticed that the error seems to get stuck and refuses to go down. I am wondering what could be going wrong. Thanks a lot for reading this long question. Any suggestion is appreciated.
Since you are using sigmoid as the activation function in the hidden layers, the value at these neurons is reduced to the range of (0,1). Hence, it is a good idea to normalize the input data for this network.
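A minimal sketch of such a normalization, assuming min-max scaling of each component from the known input domain [-10, 10] onto [0, 1] (standardizing to zero mean and unit variance would be another common choice); `min_max_scale` is a hypothetical helper:

```python
def min_max_scale(vector, low=-10.0, high=10.0):
    """Map each component from [low, high] linearly onto [0, 1]."""
    span = high - low
    return [(value - low) / span for value in vector]

sample = [-10.0, -5.0, 0.0, 5.0, 10.0]
print(min_max_scale(sample))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With NumPy arrays the same mapping is `(train_input - low) / (high - low)`, applied to the training and test inputs alike.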
