Improving accuracy of a prediction algorithm - machine-learning

I am new to machine learning and I am currently working on a prediction problem. I have provided a link to an Excel spreadsheet that has a few columns of data.
https://drive.google.com/file/d/1fWf6dX8kOCRB3GpX42AF6UvTmd0g9zXp/view?usp=sharing
I am trying to predict the column F values based on the column A to E values. The code is given below.
import numpy as np
import pandas as pd
from keras.layers import Dense, Activation
from keras.models import Sequential
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import linear_model
import matplotlib.pyplot as plt
dataset = pd.read_excel(r"G:\Machine learning\data\database.xlsx", "Sheet5")
dataset.columns = ['A','B','C','D','E','F']
print (dataset)
#check = dataset.iloc[:, 3:13]
X = dataset.iloc[:, 0:5]
print(X)
Y = dataset.iloc[:, 5:6]
print(Y)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.15, random_state = 0)
print(X_test)
print(Y_test)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
model = Sequential()

# Adding the input layer and the first hidden layer
model.add(Dense(32, activation = 'relu', input_dim = 5, kernel_initializer='normal'))

# Adding the hidden layers
model.add(Dense(units = 16, activation = 'relu'))
model.add(Dense(units = 64, activation = 'relu'))
model.add(Dense(units = 8, activation = 'relu'))
model.add(Dense(units = 16, activation = 'relu'))
#model.add(Dense(units = 8, activation = 'linear'))
#model.add(Dense(units = 16, activation = 'relu'))
#model.add(Dense(units = 16, activation = 'relu'))
#model.add(Dense(units = 16, activation = 'relu'))

# Adding the output layer (note: three Dense(1) layers are stacked here, one after another)
model.add(Dense(units = 1))
model.add(Dense(units = 1))
model.add(Dense(1))
# Compiling the ANN (note: 'accuracy' is not a meaningful metric for regression; watch the MSE loss instead)
model.compile(optimizer = 'nadam', loss = 'mean_squared_error', metrics = ['accuracy'])

# Fitting the ANN to the training set
history = model.fit(X_train, Y_train, epochs=125, batch_size=5, verbose=1, validation_split=0.1)

y_pred = model.predict(X_test)
y_pred1 = model.predict(X_train)
print(y_pred1)
Y_test.reset_index(drop= True, inplace= True)
print (Y_train)
Y_train.reset_index(drop= True, inplace= True)
plt.plot(y_pred1)
plt.plot(Y_train)
plt.show()
plt.plot(y_pred)
plt.plot(Y_test)
plt.show()
print (y_pred)
print (Y_test)
plt.plot((Y_test-y_pred)*100/Y_test)
plt.show()
The fit I get from this code is shown below.
[plot: Fit]
Now when I predict, the error is huge in some cases, as shown below.
[plot: Prediction]
Can anyone guide me on improving the code so as to get a better prediction?


For the small dataset you have provided, it may be that your neural network model is too complicated (4 hidden layers, with up to 64 neurons in a layer).
What you can try is manually decreasing the number of layers to see if the accuracy improves. In the future, if you would like to tune hyperparameters or optimize your model parameters more systematically, you should consider methods such as random/grid search, cross-validation, and regularization (see the sketch below).
Random search: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
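For instance, here is a minimal sketch of a random search over a smaller architecture. It is untested on your data, assumes the old keras.wrappers.scikit_learn wrapper that matches the Keras imports above, and the layer sizes and search ranges below are placeholders rather than recommendations:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import RandomizedSearchCV

def build_model(units=16, n_hidden=1):
    # a smaller network: one input layer plus a configurable number of hidden layers
    model = Sequential()
    model.add(Dense(units, activation='relu', input_dim=5))
    for _ in range(n_hidden):
        model.add(Dense(units, activation='relu'))
    model.add(Dense(1))  # single linear output for regression
    model.compile(optimizer='nadam', loss='mean_squared_error')
    return model

reg = KerasRegressor(build_fn=build_model, epochs=125, batch_size=5, verbose=0)
param_dist = {'units': [4, 8, 16, 32], 'n_hidden': [0, 1, 2]}
search = RandomizedSearchCV(reg, param_distributions=param_dist, n_iter=8, cv=3)
search.fit(X_train, Y_train.values.ravel())  # X_train/Y_train from your split above
print(search.best_params_)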

Related

Difficulties in adding the output layer in python

I am getting this error when I try to run my code:
File "C:\Users\olaku\anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
TypeError: __init__() missing 1 required positional argument: 'units'
My code:
# Importing the relevant libraries.
import numpy as np
import pandas as pd
#importing the dataset
dataset = pd.read_csv ('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categories='auto', drop=None, sparse=True)
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
# Splitting the dataset into training and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 123)
#Feature Scalling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Part 2: Make the Artifical Neural Network (ANN)
#Import the Keras library and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
#Initialising the ANN
classifier = Sequential ()
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
#Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
#Adding the output layer
classifier.add(Dense(output = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
The problem is in your output layer: Dense has no 'output' argument, and its first required positional argument is 'units', which is exactly what the TypeError is telling you. Try these changes:
classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
#Adding the second hidden layer
classifier.add(Dense(6, kernel_initializer = 'uniform', activation = 'relu'))
#Adding the output layer
classifier.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid'))
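For completeness, the fixed model then compiles and fits as usual; these lines mirror the standard Churn_Modelling setup used in the related question further down this page, so the batch size and epoch count are just the usual course defaults:
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)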
Let me know if that works!

xception model incorrect classification over training dataset [duplicate]

I want to predict which of 2 diseases an image shows, but I get binary results (like 1.0 and 0.0). How can I get the probability of the prediction (like 0.7213) instead?
Training code:
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# Intialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Part 2 - Fitting the CNN to the images
import h5py
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
classifier.fit_generator(training_set,
                         steps_per_epoch = 100,
                         epochs = 1,
                         validation_data = test_set,
                         validation_steps = 100)
Single prediction code:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img,image
test_image = image.load_img('path_to_image', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
print(result[0][0]) # Prints 1.0 or 0.0
# I want accuracy rate for this prediction like 0.7213
The file structure is like:
test_set
    benigne
        benigne_images
    melignant
        melignant_images
training_set
The training set structure is the same as the test set.
Update: As you clarified in the comments, you are looking for the probabilities of each class given a single test sample, so you can use the predict method. However, note that you must first preprocess the image the same way you did in the training phase:
test_image /= 255.0
result = classifier.predict(test_image)
The result will be the probability of the given image belonging to class one (i.e. the positive class).
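Since the network ends in a single sigmoid unit, both class probabilities can be recovered from that one value; a minimal example using the result computed above:
prob_positive = result[0][0]         # sigmoid output = P(class 1)
prob_negative = 1.0 - prob_positive  # binary case: the two probabilities sum to 1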
If you have a generator for the test data, then you can use evaluate_generator() to get the loss as well as the accuracy (or any other metric you have set) of the model on the test data.
For example, right after fitting the model with fit_generator, you can call evaluate_generator on your test data generator, i.e. test_set (note that it is a method of the model):
loss, acc = classifier.evaluate_generator(test_set)

ValueError: Error when checking target: expected dense_3 to have shape (1,) but got array with shape (11,)

I'm trying to train my model with Keras while taking an online course from Udemy. Everything works fine until I fit the ANN to the training set: when I execute that last line, it gives the error in the title.
Shouldn't it work without this error, or is there some other way to fit the ANN to the training set?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
from sklearn.model_selection import train_test_split
X_train , y_train , X_test, y_test = train_test_split(X,y, test_size = 0.2 , random_state = 0)
#convert X_test into a numpy array to avoid a value error for a 1D array
X_test = np.reshape(y, (-1,1))
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)
import keras
from keras.models import Sequential
from keras.layers import Dense
#initializing the ANN
classifier = Sequential()
#adding the input layer and the first hidden layer
classifier.add(Dense(units =6, kernel_initializer = 'uniform' , activation = 'relu', input_dim =11 ))
#adding the second layer
classifier.add(Dense(units = 6 , kernel_initializer = 'uniform' , activation = 'relu'))
#adding the output layer
classifier.add(Dense(units = 1 , kernel_initializer = 'uniform' , activation = 'sigmoid'))
#compiling the ANN
classifier.compile(optimizer = 'adam' , loss = 'binary_crossentropy', metrics = ['accuracy'])
# 'optimizer' is the algorithm used for the weight adjustments
#fitting the ann to the trainging set
classifier.fit(X_train , y_train , batch_size =10 , epochs = 100 )
It seems that input_shape is not set correctly.
From the docs:
Input shape: nD tensor with shape (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
In your case, input_shape=(X_train.shape[1],).
Try this:
#initializing the ANN
classifier = Sequential()
#adding the input layer and the first hidden layer
classifier.add(Dense(units=6,
                     kernel_initializer='uniform',
                     activation='relu',
                     input_shape=(X_train.shape[1],)))
...

Error when checking input: expected conv2d_1_input to have shape (1, 28, 28) but got array with shape (28, 28, 1)

I am trying to predict a digit using a Keras example. When I trained my model everything was fine and the accuracy on the test data was very good, but when I try to predict a digit I have a problem with the mismatch of dimensions. I tried changing the dimensions of the digit but I still get the same error.
Here is my code:
main.py:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.optimizers import Adam
from keras.layers.convolutional import MaxPooling2D
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
from keras import backend as K
# Read training and test data files
train = pd.read_csv("C:/Users/GOT/Desktop/Arabic Handwritten Digits Dataset CSV/csvTrainImages 60k x 784.csv").values
test = pd.read_csv("C:/Users/GOT/Desktop/Arabic Handwritten Digits Dataset CSV/csvTestImages 10k x 784.csv").values
train_label = pd.read_csv("C:/Users/GOT/Desktop/Arabic Handwritten Digits Dataset CSV/csvTrainLabel 60k x 1.csv").values
test_label = pd.read_csv("C:/Users/GOT/Desktop/Arabic Handwritten Digits Dataset CSV/csvTestLabel 10k x 1.csv").values
print(train.shape)
#Reshape and normalize training data
trainX = train[:, 0:].reshape(train.shape[0],1,28, 28).astype( 'float32' )
X_train = trainX / 255.0
y_train = train_label[:, 0]
# print(y_train.shape)
#Reshape and normalize test data
testX = test[:,0:].reshape(test.shape[0],1, 28, 28).astype( 'float32' )
X_test = testX / 255.0
y_test = test_label[:,0]
# print(y_test.shape)
#one hot encode
from keras.utils import np_utils
number_of_classes = 10
y_train = np_utils.to_categorical(y_train, number_of_classes)
y_test = np_utils.to_categorical(y_test, number_of_classes)
model = Sequential()
K.set_image_dim_ordering('th')
model.add(Conv2D(30, 5, 5, border_mode= 'valid' , input_shape=(1, 28, 28),activation= 'relu' ))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(15, 3, 3, activation= 'relu' ))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation= 'relu' ))
model.add(Dense(50, activation= 'relu' ))
model.add(Dense(10, activation= 'softmax' ))
# # Compile model
model.compile(loss= 'categorical_crossentropy' , optimizer= 'adam' , metrics=[ 'accuracy' ])
model.fit(X_train, y_train,
          epochs=20,
          batch_size=160)
model.summary()
model.save('modelRasool.h5')
score = model.evaluate(X_test, y_test, batch_size=128)
print("The Accuracy and the Loss :")
print(score)
test_predict.py:
from keras.models import load_model
from PIL import Image
import numpy as np
model = load_model('C:/Users/GOT/PycharmProjects/test/modelRasool.h5')
for index in range(2):
    img = Image.open('data/' + str(index) + '.png').convert("L")
    img = img.resize((28,28))
    im2arr = np.array(img)
    im2arr = im2arr.reshape(1,28,28,1)
    # Predicting the test set results
    y_pred = model.predict(im2arr)
    print(y_pred)
The error:
ValueError: Error when checking input: expected conv2d_1_input to have shape (1, 28, 28) but got array with shape (28, 28, 1)
Please help me.
It's easy: your input shape needs to be (1, 1, 28, 28), i.e. (batch, channels, rows, cols), because the model was built with channels-first ('th') ordering:
im2arr = np.array(img)
im2arr = im2arr.reshape(1,1,28,28)
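Putting it together, a minimal sketch of the corrected prediction loop; note that training scaled the inputs by 1/255 (X_train = trainX / 255.0), so presumably the same scaling should be applied at prediction time as well:
from keras.models import load_model
from PIL import Image
import numpy as np

model = load_model('C:/Users/GOT/PycharmProjects/test/modelRasool.h5')
for index in range(2):
    img = Image.open('data/' + str(index) + '.png').convert("L")
    img = img.resize((28, 28))
    im2arr = np.array(img).astype('float32') / 255.0  # match the training preprocessing
    im2arr = im2arr.reshape(1, 1, 28, 28)             # (batch, channels, rows, cols) for 'th' ordering
    y_pred = model.predict(im2arr)
    print(y_pred)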

How to train a CNN properly to predict well on unseen data?

I have been developing a program for my school project which recognizes numbers. For that I have used Python, Keras and the MNIST Dataset. This is the code I used to train it:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Convolution2D, MaxPooling2D, Activation, AveragePooling2D
from keras import backend as K
import matplotlib.pyplot as plt
import matplotlib
batch_size = 32
num_classes = 10
epochs = 10
img_rows, img_cols = 28, 28
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Convolution2D(6, (5, 5), input_shape=input_shape))
model.add(Activation('sigmoid'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(12, (5, 5)))
model.add(Activation('sigmoid'))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(192))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
hist = model.fit(x_train, y_train,
                 batch_size=batch_size,
                 epochs=epochs,
                 verbose=1,
                 validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
model.save('model3.h5')
train_loss = hist.history['loss']
val_loss = hist.history['val_loss']
train_acc = hist.history['acc']
val_acc = hist.history['val_acc']
xc = range(epochs)
plt.figure(1,figsize=(7,5))
plt.plot(xc,train_loss)
plt.plot(xc,val_loss)
plt.xlabel('num of Epochs')
plt.ylabel('loss')
plt.title('train_loss vs val_loss')
plt.grid(True)
plt.legend(['train','val'])
print(plt.style.available) # use bmh, classic,ggplot for big pictures
plt.style.use(['classic'])
plt.figure(2,figsize=(7,5))
plt.plot(xc,train_acc)
plt.plot(xc,val_acc)
plt.xlabel('num of Epochs')
plt.ylabel('accuracy')
plt.title('train_acc vs val_acc')
plt.grid(True)
plt.legend(['train','val'],loc=4)
#print plt.style.available # use bmh, classic,ggplot for big pictures
plt.style.use(['classic'])
plt.show()
I saved the model under the name model3.h5. However, in another program I wrote, I tried to use the saved model to predict numbers I had drawn in Paint. I had 10 pictures (0-9), and the model predicted that all of them were the number 8, which is of course wrong.
However, during training the accuracy was close to 98.5% and the loss was less than 0.1. Am I doing something wrong?
Here is the code I run for predicting on unseen data. It resizes the picture to 28 columns and 28 rows so it can run through my CNN.
This is my first project with convolutional neural networks and I don't know the "extra techniques" that could help me do better on unseen data.
I tried some different architectures as well (convolutional layers with max pooling and ReLU activations, followed by fully connected layers), but the result was still the same. I also tried 100 or 200 epochs, still no use...
import os, cv2
import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from keras import backend as K
from keras.models import load_model
K.set_image_dim_ordering('tf')
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD,RMSprop,adam
PATH = os.getcwd()
data_path = PATH + '\myNumbers'
data_dir_list = os.listdir(data_path)  # directories inside
img_data = []
for file in data_dir_list:
    test_image = cv2.imread(data_path + "\\" + file)
    test_image = cv2.cvtColor(test_image, cv2.COLOR_RGB2GRAY)
    test_image = cv2.resize(test_image,(28,28))
    test_image = np.array(test_image)
    test_image = test_image.astype('float32')
    test_image /= 255
    print(test_image.shape)
    test_image = np.expand_dims(test_image, axis=3)
    test_image = np.expand_dims(test_image, axis=0)
    print(test_image.shape)
    img_data.append(test_image)
model = load_model("model3.h5")
for img in img_data:
    print(model.predict(img))
    print(model.predict_classes(img))
