DNN binary classifier's accuracy not increasing - machine-learning

My binary classifier DNN's accuracy has been stuck since epoch 1, which I think means that the model is not learning. Any insight into why this is happening?
Problem statement: I would like to classify a given sequence of readings for sensors (ex. [0 1 15 1 0 3]) into either 0 or 1 (0 equivalent to "idle" state, 1 equivalent to "active" state).
About the dataset: Dataset is available here
The "state" column is the target, while the rest of the columns are the features.
I've tried using SGD instead of Adam, tried different kernel initializers, tried changing the number of hidden layers and the number of neurons per layer, and tried using sklearn's StandardScaler instead of the MinMaxScaler. None of these approaches seemed to change the outcome.
This is the code:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
from keras.initializers import he_uniform
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
seed = 7
random_state = np.random.seed(seed)
data = pd.read_csv('Dataset/Reformed/Model0_Dataset.csv')
X = data.drop(['state'], axis=1).values
y = data['state'].values
#min_max_scaler = MinMaxScaler()
std_scaler = StandardScaler()
# X_scaled = min_max_scaler.fit_transform(X)
X_scaled = std_scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=random_state)
# One Hot encode targets
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
enc = OneHotEncoder(categories='auto')
y_train_enc = enc.fit_transform(y_train).toarray()
y_test_enc = enc.fit_transform(y_test).toarray()
epochs = 500
batch_size = 100
model = Sequential()
model.add(Dense(700, input_shape=(X.shape[1],), kernel_initializer=he_uniform(seed)))
model.add(Dropout(0.5))
model.add(Dense(1400, activation='relu', kernel_initializer=he_uniform(seed)))
model.add(Dropout(0.5))
model.add(Dense(700, activation='relu', kernel_initializer=he_uniform(seed)))
model.add(Dropout(0.5))
model.add(Dense(800, activation='relu', kernel_initializer=he_uniform(seed)))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.summary()
early_stopping_monitor = EarlyStopping(patience=25)
# model.compile(SGD(lr=.01, decay=1e-6, momentum=0.9, nesterov=True), loss='binary_crossentropy', metrics=['accuracy'])
model.compile(Adam(lr=.01, decay=1e-6), loss='binary_crossentropy', metrics=['accuracy'], )
history = model.fit(X_train, y_train_enc, validation_split=0.2, batch_size=batch_size,
callbacks=[early_stopping_monitor], epochs=epochs, shuffle=True, verbose=1)
eval = model.evaluate(X_test, y_test_enc, batch_size=batch_size, verbose=1)
Expected results: Accuracy increasing (and loss decreasing) with each epoch (at least for the early epochs).
Actual results: The following values are fixed throughout the entire training process:
loss: 8.0118 - acc: 0.5001 - val_loss: 8.0366 - val_acc: 0.4987

You are using the wrong loss: with a two-output softmax you should use categorical_crossentropy (and one-hot encode your labels). If you want to keep binary_crossentropy, then the output layer should be a single unit with a sigmoid activation.
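For illustration, here is a minimal sketch of the two consistent setups (a deliberately small model, not the asker's full architecture; X_train, y_train and y_train_enc refer to the variables defined in the question's code):
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
# Option A: two-unit softmax output + one-hot targets + categorical cross-entropy
model_a = Sequential()
model_a.add(Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model_a.add(Dense(2, activation='softmax'))
model_a.compile(Adam(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
model_a.fit(X_train, y_train_enc, epochs=10, batch_size=100)
# Option B: single sigmoid unit + raw 0/1 labels + binary cross-entropy
model_b = Sequential()
model_b.add(Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model_b.add(Dense(1, activation='sigmoid'))
model_b.compile(Adam(lr=0.01), loss='binary_crossentropy', metrics=['accuracy'])
model_b.fit(X_train, y_train.ravel(), epochs=10, batch_size=100)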

Related

Troubles with Cross-Validation

I'm having some trouble implementing cross-validation. I understand that after cross-validation I have to re-train the model, but I have the following doubts:
Should I do the train/test split before cross-validation, use X_train and y_train for the cross-validation process, and then re-train the model with X_train and y_train?
Or should I split the data into features (X) and labels (y), use those variables in the cross-validation process, and then do the train/test split and train the model with X_train and y_train?
If I use the feature and label variables, what is the next step after cross-validation?
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = pd.read_csv('../data/pima-indians-diabetes.csv')
data.head()
# All the columns except the one we want to predict
features = data.drop(['Outcome'], axis=1)
# Only the column we want to predict
labels = data['Outcome']
from sklearn.model_selection import train_test_split
test_size = 0.33
seed = 12
X_train, X_test, Y_train, Y_test = train_test_split(features, labels, test_size=test_size,
random_state=seed)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
kfold = KFold(n_splits=10, random_state=1)
# Option 1: cross-validate on the training split, then re-train on it
model = LogisticRegression()
scores = cross_val_score(model, X_train, Y_train, cv=kfold)
model.fit(X_train, Y_train)
# Option 2: cross-validate on all the data, then split and train
model = LogisticRegression()
scores = cross_val_score(model, features, labels, cv=kfold)
X_train, X_test, Y_train, Y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
model.fit(X_train, Y_train)
Which of the two code blocks is correct or is there another way to implement cross-validation correctly?

CNN on tfidf as input

I am working on fake news detection using a CNN, and I am new to coding CNNs in Keras and TensorFlow. I need help creating a CNN that takes statements as input, each in the form of a vector of length 100, and outputs 0 or 1 depending on whether the statement is predicted to be false or true.
X_train.shape
# 10229, 100
X_train = np.expand_dims(X_train, axis=2)
X_train.shape
# 10229,100,1
# actual cnn model here
import tensorflow as tf
from tensorflow.keras import layers
# Conv1D + global max pooling
from keras.layers import Conv1D, MaxPooling1D, Embedding, Dropout, Flatten, Dense
from keras.layers import Input
text_len=100
from keras.models import Model
inp = Input(batch_shape=(None, text_len, 1))
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(inp)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(drop21)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
flat2 = Flatten()(pool2)
out = Dense(1, activation='softmax')(flat2)
model = Model(inp, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, Y_train)
I would really appreciate it if someone could give me working code for this with a little bit of explanation.
In this dummy example, I use a Conv1D with 2D features. A Conv1D accepts input sequences in 3D format (n_samples, time_steps, features), so if you have 2D features you must adapt them to 3D. The usual choice is to keep your features as they are and simply expand the temporal dimension (expand_dims on axis 1); there is no reason to assume a positional/temporal pattern in tf-idf/one-hot features.
When you build your NN you start in 3D and have to get back to 2D. There are many ways to go from 3D to 2D; the simplest is flattening, and with a single temporal dimension a pooling layer is useless. If you use softmax as the last activation, remember to give your final Dense layer a dimensionality equal to the number of classes.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
## define variable
n_sample = 10229
text_len = 100
## create dummy data
X_train = np.random.uniform(0,1, (n_sample,text_len))
y_train = np.random.randint(0,2, n_sample)
## expand train dimension: go from 2D to 3D
X_train = np.expand_dims(X_train, axis=1)
print(X_train.shape, y_train.shape)
## create model
inp = Input(shape=(1,text_len))
conv2 = Conv1D(filters=128, kernel_size=5, activation='relu', padding='same')(inp)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu', padding='same')(drop21)
drop22 = Dropout(0.5)(conv22)
pool2 = Flatten()(drop22) # one option for going from 3D back to 2D
out = Dense(2, activation='softmax')(pool2) # the output dim must equal the number of classes when using softmax
model = Model(inp, out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=5)

Validation accuracy and validation loss remain almost constant in every epoch

I am making an autonomous farming robot for my final year project. I want it to move autonomously in lanes inside the farm. I am just using the Raspberry Pi camera image from the front of my vehicle. I collect my data through the Pi and then send it to my computer for training.
Initially I have just trained it to move in a straight line. Since I have not used encoders on my motors, there is a possibility of it drifting to one side, so I have to constantly give it feedback to stay on the right path.
A sample image is shown below (note that it is a black and white image): [sample image]
I have 836 images for training and 356 for validation. When I try to train the model, its accuracy does not improve much. I have tried different structures, from fully connected layers to various convolutional layers, but my training accuracy does not improve much, and most of the time the validation accuracy and validation loss stay the same.
I am confused about why this is happening: is it my code, or should I apply computer vision techniques to the images so that the features are more prominent? What would be the best approach to tackle this problem?
My code is as follows:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
# fix dimension ordering issue
from keras import backend as K
import numpy as np
import glob
import pandas as pd
from sklearn.model_selection import train_test_split
K.set_image_dim_ordering('th')
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
def load_data(path):
    print("Loading training data...")
    training_data = glob.glob(path)[0]
    data = np.load(training_data)
    a = data['train']
    b = data['train_labels']
    s = np.concatenate((a, b), axis=1)
    data = pd.DataFrame(s)
    data = data.sample(frac=1)
    X = data.iloc[:, :-4]
    y = data.iloc[:, -4:]
    print("Image array shape: ", X.shape)
    print("Label array shape: ", y.shape)
    # normalize data
    # train validation split, 7:3
    return train_test_split(X, y, test_size=0.3)
data_path = "*.npz"
X_train,X_test,y_train,y_test=load_data(data_path)
# reshape to be [samples][channels][width][height]
X_train = X_train.values.reshape(X_train.shape[0], 1, 120, 320).astype('float32')
X_test = X_test.values.reshape(X_test.shape[0], 1, 120, 320).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# one hot encode outputs
num_classes = y_test.shape[1]
# define a simple CNN model
def baseline_model():
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(1, 120, 320), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=10)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))
Sample output (the best result I got with the above code): [screenshot of the training log]
I solved this problem by changing the structure of my algorithm and using NVIDIA's deep learning car algorithm. That approach is very robust and also applies basic computer vision to the image. You can easily find sample implementations for toy cars on Medium/YouTube as well.
This article was really helpful for me:
https://towardsdatascience.com/deeppicar-part-1-102e03c83f2c
Additionally, this resource was also very helpful:
https://zhengludwig.wordpress.com/projects/self-driving-rc-car/
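For reference, here is a rough Keras sketch of a PilotNet-style network of the kind those articles describe. This is my own approximation, not the poster's final code: the original NVIDIA model regresses a steering angle, so the softmax classification head, the channels-last 120x320x1 input shape and the num_classes value are assumptions made to match the question's data.
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, Dropout
num_classes = 4  # assumption: one output per steering command, as in the question's labels
# Five convolutional layers followed by fully connected layers, as in PilotNet
model = Sequential()
model.add(Conv2D(24, (5, 5), strides=(2, 2), activation='relu', input_shape=(120, 320, 1)))
model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='relu'))
model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))  # classification head instead of a steering-angle regressor
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])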

xgboost: Huge logloss despite reasonable accuracy

I train an XGBoost classifier on a binary classification problem. It produces 70% accurate predictions, yet the log loss is very large, at 9.13. I suspect that might be because a few predictions are very far off the target, but I do not understand why that happens: other people report much better log loss (0.55 - 0.6) on the same data with xgboost.
from readCsv import x_train, y_train
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
from xgboost import XGBClassifier
seed=7
test_size=0.09
X_train, X_test, y_train, y_test = train_test_split(
x_train, y_train, test_size=test_size, random_state=seed)
# fit model no training data
model = XGBClassifier(max_depth=5,
learning_rate=0.02,
objective= 'binary:logistic',
n_estimators = 5000)
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
ll = log_loss(y_test, y_pred)
print("Log_loss: %f" % ll)
print(model)
This produces the following output:
Accuracy: 73.54%
Log_loss: 9.139162
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.02, max_delta_step=0, max_depth=5,
min_child_weight=1, missing=None, n_estimators=5000, nthread=-1,
objective='binary:logistic', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
Anyone knows reasons for my high logloss? Thanks!
Solution: use model.predict_proba(), not model.predict().
This reduced the log loss from 7+ to 0.52, which is in the expected range. model.predict() was outputting values of huge magnitude (like 1e18); it seems they needed to go through some function that would turn them into valid probability scores (between 0 and 1).
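A minimal sketch of the fix using the question's variables (only the evaluation changes; the fitted model and the train/test split are assumed to be as above):
# probability of the positive class for each test sample
y_pred_proba = model.predict_proba(X_test)[:, 1]
# log_loss expects probabilities, not hard 0/1 labels
ll = log_loss(y_test, y_pred_proba)
print("Log_loss: %f" % ll)
As a rough sanity check: with the sklearn defaults of that time, log_loss clips hard 0/1 "probabilities" to about 1e-15, so every wrong hard prediction contributes roughly -ln(1e-15) ≈ 34.5; with about 26% of predictions wrong, that averages out to roughly the 9.14 reported above.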

Accuracy does not go up on a keras model

I'm trying to train a model on data from the Higgs Boson challenge on Kaggle. The first thing I decided to do was to create a simple Keras model. I've tried different numbers and widths of layers, different cost functions, different optimizers and different activation functions, but the accuracy on the training set always stays in the 0.65-0.7 range, and I don't really understand why. Here's an example of a model that behaved this strangely:
from keras.layers import Dense, merge, Activation, Dropout
from keras.models import Model
from keras.models import Sequential
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(600, input_shape=(30,),activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(400, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6)
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])
model.fit(train,labels,nb_epoch=1,batch_size=1)
I also tried larger models and got similar accuracy. Please tell me what I am doing wrong.
EDIT
I have tried training this model with 100 epochs and a batch size of 100 and again got a loss of 4.9528 and an accuracy of 0.6924. It also always outputs zero for every example.
The problem arises from the fact that your model always outputs the majority class. The classes are not weighted (one class appears more often than the other), and it seems that your network "learns" to always output the same class.
Try using a different classifier (Random Forest for example) and you'll see that the accuracy is much better.
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
When trying to address the issue with the neural network, I used SMOTE to balance the training dataset. You should use "adam" as the optimizer for this classification. Also, a much smaller network architecture should be enough for this problem.
from keras.layers import Dense, Dropout
from keras.models import Sequential
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
df = pd.read_csv("training.csv")
y = np.array(df['Label'].apply(lambda x: 0 if x=='s' else 1))
X = np.array(df.drop(["EventId","Label"], axis=1))
sm = SMOTE()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
X_res, y_res = sm.fit_sample(X_train, y_train)
model = Sequential()
model.add(Dense(25, input_shape=(31,),activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer="adam",loss='binary_crossentropy',metrics=['accuracy'])
model.fit(X_res, y_res,validation_data=(X_test, y_test),nb_epoch=100,batch_size=100)
An example results:
Epoch 11/100
230546/230546 [==============================] - 5s - loss: 0.5146 - acc: 0.7547 - val_loss: 0.3365 - val_acc: 0.9138
Epoch 12/100
230546/230546 [==============================] - 5s - loss: 0.4740 - acc: 0.7857 - val_loss: 0.3033 - val_acc: 0.9270
Epoch 13/100
230546/230546 [==============================] - 5s - loss: 0.4171 - acc: 0.8295 - val_loss: 0.2821 - val_acc: 0.9195
You are training for far too short a time:
model.fit(train,labels,nb_epoch=1,batch_size=1)
This means you are going through the data only once, with an extremely small batch size. It should be something along the lines of:
model.fit(train, labels, nb_epoch=100, batch_size=100)
