How to Design the Neural Network? - machine-learning

I am trying to build a deep learning model that predicts whether a person is a CKD (chronic kidney disease) patient or not. How can I design a neural network for it? How many neurons should I put in each layer? Or is there another method in Keras to do so? The dataset link: https://github.com/Samar-080301/Python_Project/blob/master/ckd_full.csv
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import os
from matplotlib import pyplot as plt
os.chdir(r'C:\Users\samar\OneDrive\desktop\projects\Chronic_Kidney_Disease')
os.getcwd()
x=pd.read_csv('ckd_full.csv')
y=x[['class']]
y['class']=y['class'].replace(to_replace=(r'ckd',r'notckd'), value=(1,0))
x=x.drop(columns=['class'])
x['rbc']=x['rbc'].replace(to_replace=(r'normal',r'abnormal'), value=(1,0))
x['pcc']=x['pcc'].replace(to_replace=(r'present',r'notpresent'), value=(1,0))
x['ba']=x['ba'].replace(to_replace=(r'present',r'notpresent'), value=(1,0))
x['pc']=x['pc'].replace(to_replace=(r'normal',r'abnormal'), value=(1,0))
x['htn']=x['htn'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['dm']=x['dm'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['cad']=x['cad'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['pe']=x['pe'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['ane']=x['ane'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['appet']=x['appet'].replace(to_replace=(r'good',r'poor'), value=(1,0))
x[x=="?"]=np.nan
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.01)
#begin the model
model=keras.models.Sequential()
model.add(keras.layers.Dense(128,input_dim = 24, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(2,activation=tf.nn.softmax)) # adding a layer with 2 nodes and softmax activation function
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # specifying hyperparameters
model.fit(xtrain,ytrain,epochs=5) # train the model
model.save('Nephrologist') # save the model with a unique name
myModel=tf.keras.models.load_model('Nephrologist') # load the saved model back
prediction=myModel.predict((xtest))
C:\Users\samar\anaconda3\lib\site-packages\ipykernel_launcher.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if sys.path[0] == '':
Epoch 1/5
396/396 [==============================] - 0s 969us/sample - loss: nan - acc: 0.3561
Epoch 2/5
396/396 [==============================] - 0s 343us/sample - loss: nan - acc: 0.3763
Epoch 3/5
396/396 [==============================] - 0s 323us/sample - loss: nan - acc: 0.3763
Epoch 4/5
396/396 [==============================] - 0s 283us/sample - loss: nan - acc: 0.3763
Epoch 5/5
396/396 [==============================] - 0s 303us/sample - loss: nan - acc: 0.3763

Here is the structure with which I achieved 100% test accuracy:
model=keras.models.Sequential()
model.add(keras.layers.Dense(200,input_dim = 24, activation=tf.nn.tanh))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # specifying hyperparameters
xtrain_tensor = tf.convert_to_tensor(xtrain, dtype=tf.float32)
ytrain_tensor = tf.convert_to_tensor(ytrain, dtype=tf.float32)
model.fit(xtrain_tensor , ytrain_tensor , epochs=500, batch_size=128, validation_split = 0.15, shuffle=True, verbose=2) # train the model
results = model.evaluate(xtest, ytest, batch_size=128)
Output:
3/3 - 0s - loss: 0.2560 - accuracy: 0.9412 - val_loss: 0.2227 - val_accuracy: 0.9815
Epoch 500/500
3/3 - 0s - loss: 0.2225 - accuracy: 0.9673 - val_loss: 0.2224 - val_accuracy: 0.9815
1/1 [==============================] - 0s 0s/step - loss: 0.1871 - accuracy: 1.0000
The last line represents the evaluation of the model on the test dataset. Seems like it generalized well :)
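With `test_size=0.01` on this 400-row dataset, `train_test_split` keeps only 4 rows for testing (which is why the training log shows `396/396`), so a single 100% test score is weak evidence of generalization. A sketch of a sturdier check, assuming scikit-learn is available: stratified 5-fold cross-validation of a logistic-regression baseline. The data below is synthetic (`make_classification`) and only stands in for the preprocessed CKD features and 0/1 labels.

```python
# Sketch: evaluate a classifier with stratified k-fold cross-validation
# instead of one tiny test split. Synthetic data stands in for the real CSV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the CKD features: 400 rows, 24 features.
X, y = make_classification(n_samples=400, n_features=24, n_informative=8,
                           random_state=0)

# Scaling + logistic regression: a quick "is there any signal?" baseline.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5 stratified folds: every row is tested exactly once, 80 rows per fold,
# and each fold keeps roughly the same class ratio as the full dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", np.round(scores, 3))
print("mean +/- std: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

The spread across folds is the useful part: a model whose fold accuracies vary widely has not been shown to generalize, whatever one lucky split says. The same `cv` object can be reused to compare the baseline against a wider network.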
------------------------------------------------- Original answer below ---------------------------------------------------
I would go with a logistic regression model first in order to see if there is any predictive value to your dataset.
model=keras.models.Sequential()
model.add(keras.layers.Dense(1,input_dim = 24, activation=tf.nn.sigmoid))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # a single sigmoid output needs binary cross-entropy
model.fit(xtrain,ytrain,epochs=100) # Might require more or fewer epochs, depending on the amount of noise in your dataset.
If this gives you an accuracy score that satisfies you, I would then try adding 1 or 2 dense hidden layers with 10 to 40 nodes each.
It's important to mention that my advice is solely based on my experience.
I HIGHLY(!!!!) recommend transforming the y label into a binary value, where 1 represents the positive class (a record of a CKD patient) and 0 represents the negative class.
Let me know if it works, and if it doesn't I'll also try to play with your dataset.

Apparently you have a problem with your data preprocessing.
You can use
df.ffill()
to forward-fill the missing values, and you can also use feature columns to do those long tasks, for example:
CATEGORICAL_COLUMNS = ['columns', 'which have', 'categorical data', 'like sex']
NUMERIC_COLUMNS = ['columns which have', 'numeric data']
feature_columns = []
for col in CATEGORICAL_COLUMNS:
    # vocabulary = the distinct non-missing values of the column
    vocab = df[col].dropna().unique().tolist()
    cat = tf.feature_column.categorical_column_with_vocabulary_list(col, vocab)
    # wrap in an indicator (one-hot) column so it can feed a dense layer
    feature_columns.append(tf.feature_column.indicator_column(cat))
for col in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(col))
Now you can use these feature columns as the input to your model (for example through tf.keras.layers.DenseFeatures), which should make its predictions more accurate. Much more can be done in data preprocessing; here is the official documentation to help you further: TensorFlow documentation on feature columns
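The `loss: nan` in the question is consistent with NaN inputs reaching the network: the code turns `?` cells into `np.nan` but never fills them. A sketch of one way to clean the features, assuming pandas and NumPy: coerce every column to numbers (so `?` and any other unparsable text become NaN), then impute each column with its median. `clean_features` is a hypothetical helper, and median imputation is only one choice; the mode may suit the 0/1 columns better.

```python
# Sketch: make every feature numeric and free of NaN before training,
# since a single NaN input makes the loss NaN.
import numpy as np
import pandas as pd

def clean_features(df):
    """Return an all-float32 copy of df with unparsable cells imputed."""
    # Coerce each column to numbers; '?' and other text become NaN.
    out = df.apply(pd.to_numeric, errors="coerce")
    # Fill each column's NaNs with that column's median
    # (a column that is entirely NaN would stay NaN).
    out = out.fillna(out.median())
    return out.astype(np.float32)

# Tiny made-up frame in the shape of the CKD data.
raw = pd.DataFrame({"age": ["48", "?", "62"], "htn": [1, 0, "?"]})
clean = clean_features(raw)
print(clean)
```

The medians should be computed on the training split only and then reused for the test split, so that no test information leaks into preprocessing.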

Related

Horrible forecasting KPI scores (RMSE, MAE and MAPE) using RNN with three flavors ("LSTM", "GRU" and "RNN")

My project tries to forecast the total confirmed COVID-19 cases for 2021. This is an overview of the confirmed-case data I use to train my RNN model:
[Figure: confirmed-case data used for training]
The confirmed numbers don't show any repeating pattern, but there have been research studies using RNN and LSTM on the same data (in fact, I use the same data source as they do). This is the research study I drew inspiration from:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8523546/
This is the result I got:
beginning the training of the LSTM RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 33.10it/s, loss=0.115, v_num=logs, train_loss=0.0933, val_loss=0.466]
training of the LSTM RNN completed: 105.23 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 6.48it/s]
LSTM :
MAPE : 199.3029
RMSPE : 0.6659
RMSE : 0.6763
-R squared : 15366746993394475597824.0000
se : 0.0871
beginning the training of the GRU RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 37.86it/s, loss=0.11, v_num=logs, train_loss=0.0911, val_loss=0.474]
training of the GRU RNN completed: 90.99 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 3.91it/s]
GRU :
MAPE : 209.6602
RMSPE : 0.6771
RMSE : 0.6877
-R squared : 467975998695823638528.0000
se : 0.0871
beginning the training of the Vanilla RNN:
Epoch 99: 100%|██████████| 29/29 [00:00<00:00, 43.54it/s, loss=0.114, v_num=logs, train_loss=0.109, val_loss=0.461]
training of the Vanilla RNN completed: 79.32 sec
Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 6.08it/s]
Vanilla :
MAPE : 200.5710
RMSPE : 0.6673
RMSE : 0.6778
-R squared : 115942252717688101559392358367232.0000
se : 0.0871
Also, here are my prediction plots. All three look exactly the same, despite different MAPE scores:
[Figure: prediction plots for the LSTM, GRU and vanilla RNN models]
As for packages, I use darts (installed as u8darts[all]).
For methodology, I followed this article by Heiko Oinen (the detailed code is at the bottom of the page) and set the parameters for my model as follows:
https://medium.com/towards-data-science/temporal-loops-intro-to-recurrent-neural-networks-for-time-series-forecasting-in-python-b0398963dc1f
# Set up the models, run the models, plot and evaluate
from darts.models import RNNModel

EPOCH = 100

def run_RNN(flavor, ts, train, val):
    # set the model up
    model_RNN = RNNModel(
        model = flavor,
        model_name = flavor + " RNN",
        input_chunk_length = 12,
        training_length = 20,
        hidden_dim = 20,
        batch_size = 16,
        n_epochs = EPOCH,
        dropout = 0,
        optimizer_kwargs = {'lr': 1e-3},
        log_tensorboard = True,
        random_state = 42,
        force_reset = True
    )
    if flavor == "RNN": flavor = "Vanilla"
    # fit the model (fit_it, plot_fitted, accuracy_metrics, FC_N and
    # covariates are defined elsewhere, not shown in this question)
    fit_it(model_RNN, train, val, flavor)
    # compute N predictions
    pred = model_RNN.predict(n = FC_N, future_covariates = covariates)
    # plot predictions vs actual
    plot_fitted(pred, ts, flavor)
    # print accuracy metrics
    res_acc = accuracy_metrics(pred, ts)
I have even tried 300 epochs, but neither loss nor train_loss decreased any further during training.
I don't have much experience asking questions here; I'll try to articulate better and provide more detail if you have questions. Thank you so much for your help!
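MAPE divides each error by the actual value, so on a series whose values sit near zero it can reach figures like 200% even when RMSE is small, which is one way the scores above can look "horrible" on one metric and modest on another. A minimal NumPy sketch of the common definitions, for reading the numbers; darts ships its own `darts.metrics.mape` and `darts.metrics.rmse`, and these hand-written versions are illustrative, not the code that produced the results above.

```python
# Sketch of the usual MAPE and RMSE definitions.
import numpy as np

def mape(actual, pred):
    """Mean absolute percentage error, in percent (undefined if any actual is 0)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

def rmse(actual, pred):
    """Root mean squared error, in the units of the series."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.sqrt(np.mean((actual - pred) ** 2))

# The same absolute miss of 0.1, at two different levels of the series:
print(mape([1.0, 1.0], [0.9, 1.1]), rmse([1.0, 1.0], [0.9, 1.1]))
print(mape([0.05, 0.05], [0.15, -0.05]), rmse([0.05, 0.05], [0.15, -0.05]))
```

Both calls have an RMSE of about 0.1, but the first MAPE is about 10% and the second about 200%, purely because the actual values in the second case are small.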

PyTorch Detecto Model: tensor incompatibility in prediction for a pretrained model

I am trying to train a very simple model and run an image prediction with the following code for PyTorch Detecto:
from detecto import core, utils, visualize
dataset = core.Dataset('images/')
model = core.Model(['rect'])
model.fit(dataset)
modelName = 'model_weights_simpleRect.pth'
model.save(modelName)
image = utils.read_image('simple_image_to_test.jpg')
predictions = model.predict(image)
This leads to the following output:
Epoch 1 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:12<00:00, 1.56it/s]
Epoch 2 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.80it/s]
Epoch 3 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.80it/s]
Epoch 4 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.79it/s]
Epoch 5 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.79it/s]
Epoch 6 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.80it/s]
Epoch 7 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.78it/s]
Epoch 8 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.80it/s]
Epoch 9 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.78it/s]
Epoch 10 of 10
Begin iterating over training dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.80it/s]
Traceback (most recent call last):
File "train_simpleRect_and_predict.py", line 15, in <module>
predictions = model.predict(image)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/detecto/core.py", line 338, in predict
preds = self._get_raw_predictions(images)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/detecto/core.py", line 294, in _get_raw_predictions
preds = self._model(images)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 52, in forward
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 550, in forward
boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 474, in postprocess_detections
pred_boxes = self.box_coder.decode(box_regression, proposals)
File "/home/std/anaconda3/envs/dri/lib/python3.7/site-packages/torchvision/models/detection/_utils.py", line 168, in decode
rel_codes.reshape(sum(boxes_per_image), -1), concat_boxes
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
How can I get more detailed information about the model dimensions, i.e. where exactly in the model the tensor incompatibilities occur, and how can I fix it?
Additional info: I used the same code with other data and it worked.
Thank you!
Another problem, which produced a similar tensor-dimension error, was caused by this statement:
model = Model.load(modelName, ['rect'])
The correct version is:
model = Model()
model.load(modelName, ['rect'])
The problem was wrong image dimensions in the XML description files corresponding to each image.
I fixed the XML files and the error did not occur again.
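Since the fix was correcting image sizes in the XML files, a quick consistency check over the annotations can catch this before training. A sketch using only the standard library, assuming the Pascal VOC layout (`size/width`, `size/height`, `object/name`, `object/bndbox/xmin`, `ymin`, `xmax`, `ymax`) that detecto's `core.Dataset` reads; `check_voc_annotation` is a hypothetical helper. The real width and height would come from the loaded image, e.g. `shape[1]` and `shape[0]` of the array returned by `utils.read_image`.

```python
# Sketch: flag VOC-style annotations whose declared size or boxes
# do not match the real image. Standard library only.
import xml.etree.ElementTree as ET

def check_voc_annotation(xml_text, real_width, real_height):
    """Return a list of problems; an empty list means the file looks consistent."""
    root = ET.fromstring(xml_text)
    problems = []
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    if (width, height) != (real_width, real_height):
        problems.append(f"size is {width}x{height}, image is {real_width}x{real_height}")
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(float(box.findtext(k)))
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        # A box must have positive area and lie inside the real image.
        if not (0 <= xmin < xmax <= real_width and 0 <= ymin < ymax <= real_height):
            problems.append(f"box for '{obj.findtext('name')}' "
                            f"{(xmin, ymin, xmax, ymax)} is empty or outside the image")
    return problems

sample = """<annotation>
  <size><width>640</width><height>480</height></size>
  <object><name>rect</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>700</xmax><ymax>100</ymax></bndbox>
  </object>
</annotation>"""

for problem in check_voc_annotation(sample, 800, 600):
    print(problem)
```

Run over every XML/image pair in `images/`, this reports files to fix instead of surfacing the mismatch as a tensor-reshape error deep inside torchvision.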

How to solve ' CUDA out of memory. Tried to allocate xxx MiB' in pytorch?

I am trying to train a CNN in PyTorch, but I've run into some problems.
The RuntimeError:
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 2.00 GiB total capacity; 584.97 MiB already allocated; 13.81 MiB free; 590.00 MiB reserved in total by PyTorch)
This is my code:
import os
import numpy as np
import cv2
import torch as t
import torch.nn as nn
import torchvision.transforms as transforms
from torch.utils.data import DataLoader,Dataset
import time
import matplotlib.pyplot as plt
%matplotlib inline
root_path='C:/Users/60960/Desktop/recet-task/course_LeeML20/course_LeeML20-datasets/hw3/food-11'
training_path=root_path+'/training'
testing_path=root_path+'/testing'
validation_path=root_path+'/validation'
def readfile(path,has_label):
    img_paths=sorted(os.listdir(path))
    x=np.zeros((len(img_paths),128,128,3),dtype=np.uint8)
    y=np.zeros((len(img_paths)),dtype=np.uint8)
    for i,file in enumerate(img_paths):
        img=cv2.imread(path+'/'+file)
        x[i,:,:]=cv2.resize(img,(128,128))
        if has_label:
            y[i]=int(file.split('_')[0])
    if has_label:
        return x,y
    else:
        return x
def show_img(img_from_cv2):
    b,g,r=cv2.split(img_from_cv2)
    img=cv2.merge([r,g,b])
    plt.imshow(img)
    plt.show()
x_train,y_train=readfile(training_path,True)
x_val,y_val=readfile(validation_path,True)
x_test=readfile(testing_path,False)
train_transform=transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor()
])
test_transform=transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor()
])
class ImgDataset(Dataset):
    def __init__(self,x,y=None,transform=None):
        self.x=x
        self.y=y
        if y is not None:
            self.y=t.LongTensor(y)
        self.transform=transform
    def __len__(self):
        return len(self.x)
    def __getitem__(self,idx):
        X=self.x[idx]
        if self.transform is not None:
            X=self.transform(X)
        if self.y is not None:
            Y=self.y[idx]
            return X,Y
        return X
batch_size=128
train_set=ImgDataset(x_train,y_train,transform=train_transform)
val_set=ImgDataset(x_val,y_val,transform=test_transform)
train_loader=DataLoader(train_set,batch_size=batch_size,shuffle=True)
val_loader=DataLoader(val_set,batch_size=batch_size,shuffle=False)
class Classifier(nn.Module):
    def __init__(self):
        super(Classifier,self).__init__()
        self.cnn=nn.Sequential(
            nn.Conv2d(3,64,3,1,1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2,2,0),
            nn.Conv2d(64,128,3,1,1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2,2,0),
            nn.Conv2d(128,256,3,1,1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2,2,0),
            nn.Conv2d(256,512,3,1,1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2,2,0),
            nn.Conv2d(512,512,3,1,1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2,2,0)
        )
        self.fc=nn.Sequential(
            nn.Linear(512*4*4,1024),
            nn.ReLU(),
            nn.Linear(1024,512),
            nn.ReLU(),
            nn.Linear(512,11)
        )
    def forward(self,x):
        out=self.cnn(x)
        out=out.view(out.size()[0],-1)
        return self.fc(out)
model=Classifier().cuda()
loss_fn=nn.CrossEntropyLoss()
optim=t.optim.Adam(model.parameters(),lr=0.001)
epochs=30
for epoch in range(epochs):
    epoch_start_time=time.time()
    train_acc=0.0
    train_loss=0.0
    val_acc=0.0
    val_loss=0.0
    model.train()
    for i,data in enumerate(train_loader):
        optim.zero_grad()
        train_pred=model(data[0].cuda())
        batch_loss=loss_fn(train_pred,data[1].cuda())
        batch_loss.backward()
        optim.step()
        train_acc+=np.sum(np.argmax(train_pred.cpu().data.numpy(),axis=1)==data[1].numpy())
        train_loss+=batch_loss.item()
    model.eval()
    with t.no_grad():
        for i,data in enumerate(val_loader):
            val_pred=model(data[0].cuda())
            batch_loss=loss_fn(val_pred,data[1].cuda())
            val_acc+=np.sum(np.argmax(val_pred.cpu().data.numpy(),axis=1)==data[1].numpy())
            val_loss+=batch_loss.item()
    print('[%03d/%03d] %2.2f sec(s) Train Acc: %3.6f Loss: %3.6f | Val Acc: %3.6f loss: %3.6f' % (epoch + 1, epochs, time.time()-epoch_start_time,train_acc/train_set.__len__(), train_loss/train_set.__len__(), val_acc/val_set.__len__(), val_loss/val_set.__len__()))
x_train_val=np.concatenate((x_train,x_val),axis=0)
y_train_val=np.concatenate((y_train,y_val),axis=0)
train_val_set=ImgDataset(x_train_val,y_train_val,train_transform)  # second argument must be the labels
train_val_loader=DataLoader(train_val_set,batch_size=batch_size,shuffle=True)
model_final=Classifier().cuda()
loss_fn=nn.CrossEntropyLoss()  # nn.CrossEntropy does not exist
optim=t.optim.Adam(model_final.parameters(),lr=0.001)
epochs=30
for epoch in range(epochs):
    epoch_start_time=time.time()
    train_acc=0.0
    train_loss=0.0
    model_final.train()
    for i,data in enumerate(train_val_loader):
        optim.zero_grad()
        train_pred=model_final(data[0].cuda())
        batch_loss=loss_fn(train_pred,data[1].cuda())
        batch_loss.backward()
        optim.step()
        train_acc+=np.sum(np.argmax(train_pred.cpu().data.numpy(),axis=1)==data[1].numpy())
        train_loss+=batch_loss.item()
    print('[%03d/%03d] %2.2f sec(s) Train Acc: %3.6f Loss: %3.6f' % (epoch + 1, epochs, time.time()-epoch_start_time,train_acc/train_val_set.__len__(), train_loss/train_val_set.__len__()))
test_set=ImgDataset(x_test,transform=test_transform)
test_loader=DataLoader(test_set,batch_size=batch_size,shuffle=False)
model_final.eval()
prediction=[]
with t.no_grad():
    for i,data in enumerate(test_loader):
        test_pred=model_final(data.cuda())
        test_label=np.argmax(test_pred.cpu().data.numpy(),axis=1)
        for y in test_label:
            prediction.append(y)
with open('predict.csv','w') as f:
    f.write('Id,Category\n')
    for i,y in enumerate(prediction):
        f.write('{},{}\n'.format(i,y))  # no stray comma after the newline
PyTorch version is 1.4.0, OpenCV version is 4.2.0.
The training dataset consists of pictures like these: [Image: training set]
The error happens at this line:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-770be67177f4> in <module>
119 for i,data in enumerate(train_loader):
120 optim.zero_grad()
--> 121 train_pred=model(data[0].cuda())
122 batch_loss=loss_fn(train_pred,data[1].cuda())
123 batch_loss.backward()
I have already installed: [Image: installation information]
GPU utilization is low, close to zero: [Image: GPU utilization]
Error message says:
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB.
So I want to know how to allocate more memory.
I have also tried reducing the batch size to 1, but that didn't help.
HELP!!!
Try reducing your batch_size (e.g. 32). This can happen because your GPU memory can't hold a whole batch of images, plus the activations computed from them, at once.
Before reducing the batch size, check the status of the GPU memory:
nvidia-smi
Then check which process is eating up the memory, note its PID, and kill that process with
sudo kill -9 PID
or
sudo fuser -v /dev/nvidia*
sudo kill -9 PID
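For a sense of scale: with `batch_size=128`, the output of the first `Conv2d(3,64,3,1,1)` on 128x128 images alone is 128 x 64 x 128 x 128 float32 values, exactly 512 MiB, which matches the size in the error message (a plausible reading, not a confirmed diagnosis). A pure-Python sketch of these rough sizes for the question's `Classifier`; it counts only each conv layer's forward output, so the real footprint (BatchNorm/ReLU outputs, gradients, cuDNN workspace) is larger. `conv_activation_mib` is a hypothetical helper.

```python
# Sketch: rough float32 sizes of each Conv2d output in the question's Classifier.
BYTES_PER_FLOAT32 = 4
MiB = 1024 ** 2

def conv_activation_mib(batch_size, image_size=128,
                        channels=(64, 128, 256, 512, 512)):
    """MiB for the output of each Conv2d(k=3, stride=1, padding=1) layer."""
    sizes = []
    side = image_size
    for c in channels:
        # padding=1 keeps the spatial size; the MaxPool2d(2,2,0) after it halves it.
        sizes.append(batch_size * c * side * side * BYTES_PER_FLOAT32 / MiB)
        side //= 2
    return sizes

for bs in (128, 32, 1):
    print(f"batch {bs:>3}: first conv output ~ {conv_activation_mib(bs)[0]:.2f} MiB")
```

Every activation scales linearly with the batch size, which is why reducing `batch_size` usually helps. In a notebook, the `DataLoader` has to be recreated after changing `batch_size`, and memory still held by earlier models or other processes (visible in `nvidia-smi`) counts against the same 2 GiB.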

Difference between Tensorflow and Scikitlearn log_loss function implementation

Hi, I am trying to get into TensorFlow and feeling a bit dumb.
Does log_loss in TF differ from sklearn's?
Here are the lines of my code where I calculate it:
from sklearn.metrics import log_loss
tmp = np.array(y_test)
y_test_t = np.array([tmp, -(tmp-1)]).T[0]
tf_log_loss = tf.losses.log_loss(predictions=tf.nn.softmax(logits), labels=tf_y)
with tf.Session() as sess:
    # training
    a = sess.run(tf.nn.softmax(logits), feed_dict={tf_x: xtest, keep_prob: 1.})
    print(" sk.log_loss: ", log_loss(y_test, a,eps=1e-7 ))
    print(" tf.log_loss: ", sess.run(tf_log_loss, feed_dict={tf_x: xtest, tf_y: y_test_t, keep_prob: 1.}))
Output I get
Epoch 7, Loss: 0.4875 Validation Accuracy: 0.818981
sk.log_loss: 1.76533018874
tf.log_loss: 0.396557
Epoch 8, Loss: 0.4850 Validation Accuracy: 0.820738
sk.log_loss: 1.77217639627
tf.log_loss: 0.393351
Epoch 9, Loss: 0.4835 Validation Accuracy: 0.823374
sk.log_loss: 1.78479079656
tf.log_loss: 0.390572
Seems like while tf.log_loss converges sk.log_loss diverges.
I had the same problem. After looking up the source code of tf.losses.log_loss, its key lines show what is going on:
losses = (-math_ops.multiply(labels, math_ops.log(predictions + epsilon))
          - math_ops.multiply((1 - labels), math_ops.log(1 - predictions + epsilon)))
It is binary log-loss (i.e. every class is considered non-exclusive) rather than multi-class log-loss.
As I worked with probabilities (rather than logits), I couldn't use tf.nn.softmax_cross_entropy_with_logits (though I could have taken the logarithm first).
My solution was to implement log-loss by hand:
loss = tf.reduce_sum(tf.multiply(- labels, tf.log(probs))) / len(probs)
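A small NumPy sketch that reproduces both formulas on made-up probabilities: the per-element binary loss from the quoted TF lines, averaged over every element, and the multi-class loss that `sklearn.metrics.log_loss` computes. It also shows that with exactly two mutually exclusive classes the two formulas coincide, so for the two-column labels in the question the formula difference alone should not open a gap. One possible cause there is column order: `y_test_t` puts `tmp` (the indicator of class 1) in column 0, while `sklearn.metrics.log_loss` assumes the columns of `a` follow the sorted labels `[0, 1]`; the last line shows how swapped columns inflate the loss.

```python
# Sketch: binary (per-element) vs multi-class log loss on the same probabilities.
import numpy as np

def multiclass_log_loss(labels, probs):
    """Mean over samples of -sum_k y_k * log(p_k)."""
    return float(np.mean(-np.sum(labels * np.log(probs), axis=1)))

def binary_log_loss(labels, probs, eps=1e-7):
    """Mean over all elements of -[y*log(p+eps) + (1-y)*log(1-p+eps)]."""
    per_element = -(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))
    return float(np.mean(per_element))

# Three mutually exclusive classes: the two formulas disagree.
y3 = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
p3 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(multiclass_log_loss(y3, p3), binary_log_loss(y3, p3))

# Two classes (like the question's two-column labels): they coincide.
y2 = np.array([[1, 0], [0, 1]], dtype=float)
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
print(multiclass_log_loss(y2, p2), binary_log_loss(y2, p2))

# Same two-class data with the probability columns read in the wrong order.
print(multiclass_log_loss(y2, p2[:, ::-1]))
```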
See also:
https://github.com/tensorflow/tensorflow/issues/2462
difference between tensorflow tf.nn.softmax and tf.nn.softmax_cross_entropy_with_logits

Keras RNN loss does not decrease over epoch

I built an RNN using Keras. The RNN is used to solve a regression problem:
# Keras 1.x API (output_dim=, lr=)
from keras.models import Sequential
from keras.layers import BatchNormalization, LSTM, TimeDistributed, Dense
from keras.optimizers import RMSprop

def RNN_keras(feat_num, timestep_num=100):
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
    model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=512, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(LSTM(output_dim=128, activation='relu', return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(output_dim=1, activation='relu'))) # sequence labeling
    rmsprop = RMSprop(lr=0.00001, rho=0.9, epsilon=1e-08)
    model.compile(loss='mean_squared_error',
                  optimizer=rmsprop,
                  metrics=['mean_squared_error'])
    return model
The whole process looks fine. But the loss stays the exact same over epochs.
61267 in the training set
6808 in the test set
Building training input vectors ...
888 unique feature names
The length of each vector will be 888
Using TensorFlow backend.
Build model...
# Each batch has 1280 examples
# The training data are shuffled at the beginning of each epoch.
****** Iterating over each batch of the training data ******
Epoch 1/3 : Batch 1/48 | loss = 11011073.000000 | root_mean_squared_error = 3318.232910
Epoch 1/3 : Batch 2/48 | loss = 620.271667 | root_mean_squared_error = 24.904161
Epoch 1/3 : Batch 3/48 | loss = 620.068665 | root_mean_squared_error = 24.900017
......
Epoch 1/3 : Batch 47/48 | loss = 618.046448 | root_mean_squared_error = 24.859678
Epoch 1/3 : Batch 48/48 | loss = 652.977051 | root_mean_squared_error = 25.552946
****** Epoch 1: RMSD(training) = 24.897174
Epoch 2/3 : Batch 1/48 | loss = 607.372620 | root_mean_squared_error = 24.644049
Epoch 2/3 : Batch 2/48 | loss = 599.667786 | root_mean_squared_error = 24.487448
Epoch 2/3 : Batch 3/48 | loss = 621.368103 | root_mean_squared_error = 24.926300
......
Epoch 2/3 : Batch 47/48 | loss = 620.133667 | root_mean_squared_error = 24.901398
Epoch 2/3 : Batch 48/48 | loss = 639.971924 | root_mean_squared_error = 25.297264
****** Epoch 2: RMSD(training) = 24.897174
Epoch 3/3 : Batch 1/48 | loss = 651.519836 | root_mean_squared_error = 25.523636
Epoch 3/3 : Batch 2/48 | loss = 673.582581 | root_mean_squared_error = 25.952084
Epoch 3/3 : Batch 3/48 | loss = 613.930054 | root_mean_squared_error = 24.776562
......
Epoch 3/3 : Batch 47/48 | loss = 624.460327 | root_mean_squared_error = 24.988203
Epoch 3/3 : Batch 48/48 | loss = 629.544250 | root_mean_squared_error = 25.090448
****** Epoch 3: RMSD(training) = 24.897174
I do NOT think this is normal. Am I missing something?
UPDATE:
I found that all predictions are zero after every epoch. That is why all the RMSDs are the same: the predictions are all the same, i.e. 0. I checked the training y; it contains only a few zeros, so this is not due to data imbalance.
So now I wonder whether it is because of the layers and activations I am using.
Your RNN function seems to be OK.
How fast the loss decreases depends on the optimizer and the learning rate.
Your learning rate of 0.00001 is very small (rho=0.9 is RMSprop's decay factor for its moving average of squared gradients, not a learning-rate decay), so try a bigger learning rate.
Try out other optimizers with different learning rates.
Other optimizers available in Keras: https://keras.io/optimizers/
Often, some optimizers work well on some datasets while others fail.
Have you tried changing activation function from relu to softmax?
ReLU activations have a tendency to diverge. However, initializing the weights with an identity matrix may result in better convergence.
Since you are using RNNs for regression problem (not for classification), you should use 'linear' activation at the last layer.
In your code,
model.add(TimeDistributed(Dense(output_dim=1, activation='relu'))) # sequence labeling
change to activation='linear' instead of 'relu'.
If it doesn't work, remove activation='relu' from the second layer.
Also, the learning rate for RMSprop usually ranges from 0.1 to 0.0001.
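The symptom in the update (every prediction is 0 and the loss stops moving) is consistent with a "dead" ReLU output: once the output unit's pre-activation is negative for every input, ReLU outputs 0 and passes back a zero gradient, so the weights feeding it stop changing. The first-batch loss of about 11 million hints at a large early update that could push it there, though that is a guess. A NumPy sketch comparing the MSE gradient through a ReLU output and through a linear output; `mse_grad_wrt_preactivation` and the sample values are hypothetical.

```python
# Sketch: why a ReLU output unit can get stuck predicting 0 in regression.
import numpy as np

def mse_grad_wrt_preactivation(z, target, activation):
    """d/dz of (f(z) - target)^2 for f = ReLU or identity ('linear')."""
    if activation == "relu":
        out = np.maximum(z, 0.0)
        dout_dz = (z > 0).astype(float)  # ReLU derivative: 0 wherever z <= 0
    else:
        out = z
        dout_dz = np.ones_like(z)        # linear derivative: always 1
    return 2.0 * (out - target) * dout_dz

# Pre-activations that have drifted negative, with positive regression targets.
z = np.array([-3.0, -1.5, -0.2])
target = np.array([20.0, 25.0, 30.0])

print("relu  :", mse_grad_wrt_preactivation(z, target, "relu"))    # zero gradient everywhere
print("linear:", mse_grad_wrt_preactivation(z, target, "linear"))  # non-zero, can recover
```

With the linear output the gradient still points toward the targets, which is the reasoning behind switching the last `Dense` layer to `activation='linear'`.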
