Decrease loss in keras training using lstm - machine-learning

I have an input like this:
x_train = [
[0,0,0,1,-1,-1,1,0,1,0,...,0,1,-1],
[-1,0,0,-1,-1,0,1,1,1,...,-1,-1,0],
...
[1,0,0,1,1,0,-1,-1,-1,...,-1,-1,0]
]
where 1 means an increase in a metric, -1 means a decrease, and 0 means no change. Each array has 83 items for 83 fields, and the output (labels) for each array is a categorical array that shows the effect of these metrics on a single target metric:
[[ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 ...
 [ 0.  0.  1.]
 [ 1.  0.  0.]]
I used Keras and an LSTM in the following code:
# imports implied by the snippet
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
from keras import optimizers
from keras.utils import to_categorical

def train(x, y, x_test, y_test):
    x_train = np.array(x)
    y_train = np.array(y)
    y_train = to_categorical(y_train, 3)

    model = Sequential()
    model.add(Embedding(x_train.shape[0], output_dim=256))
    model.add(LSTM(128))
    model.add(Dropout(0.5))
    model.add(Dense(3, activation='softmax'))

    opt = optimizers.SGD(lr=0.001)
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=128, nb_epoch=100)

    y_test = to_categorical(y_test, 3)
    score = model.evaluate(x_test, y_test, batch_size=128)
    prediction = model.predict(x_test, batch_size=128)
    print(score)
    print(prediction)
but the loss after 100 epochs is:
1618/1618 [==============================] - 0s - loss: 0.7328 - acc: 0.5556
How can I decrease this loss?
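As a side note, a Keras Embedding layer expects non-negative integer indices, while these vectors contain -1; one common alternative is to skip the Embedding and feed the ternary features straight into the LSTM as a (samples, 83, 1) sequence. A minimal sketch of that layout with dummy data (the layer sizes and epoch count are illustrative, not a verified fix for the reported loss):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.utils import to_categorical

# dummy stand-in for the 83-field ternary data described above
x_train = np.random.choice([-1, 0, 1], size=(1618, 83)).astype('float32')
y_train = to_categorical(np.random.randint(0, 3, size=1618), 3)

# treat each of the 83 fields as one timestep with a single feature
x_seq = x_train.reshape((-1, 83, 1))

model = Sequential()
model.add(LSTM(128, input_shape=(83, 1)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_seq, y_train, batch_size=128, epochs=5)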

Related

Errors with LSTM input shapes with time series data

I'm trying to predict torque from 8 features with an LSTM layer in my neural network. I'm having trouble with the input shape and have looked around on many sites for a solution. I'm quite new to machine learning and am having trouble understanding the problem and how I can fix this. Here is my code, dataset, and error message.
file = r'/content/drive/MyDrive/only_force_pt1.csv'
df = pd.read_csv(file)
X = df.iloc[:, 1:9]
y = df.iloc[:,9]
print(X)
print(y)
df.head()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, shuffle = True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.1, shuffle = True)
[verbose, epochs, batch_size] = [1, 200, 32]
input_shape = (X_train.shape[0],X_train.shape[1])
model = Sequential()
# LSTM
model.add(LSTM(64, input_shape=input_shape, return_sequences = True))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
#model.add(Dropout(0.2))
#model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1,activation='relu'))
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience = 20, verbose =1, mode = 'auto')
model.summary()
model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.001), metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(X_train, y_train, batch_size = batch_size, epochs = epochs, verbose = verbose, validation_data=(X_val,y_val), callbacks = [earlystopper])
ValueError: Input 0 of layer "sequential_17" is incompatible with the layer: expected shape=(None, 3634, 8), found shape=(None, 8)
dataset: https://drive.google.com/drive/folders/1BQOXffFYioCiPug2VcBZEZVD-u3y9bcl?usp=sharing
As I understand your problem, you are passing the number of data points as an extra dimension in the LSTM layer's input shape. Your data dimensionality is 8, while 3634 (= X_train.shape[0]) is the number of data points; that number corresponds to the first (None) dimension of the input tensors and should not be passed to the LSTM, because it is determined by the batch size.
If that's the case, change the input_shape definition to:
input_shape = (X_train.shape[1],)
and it should work.
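One caveat worth adding as a hedged note: a Keras LSTM layer consumes 3-D input of shape (batch, timesteps, features), so the 2-D feature table also needs a timesteps axis before fitting. A minimal self-contained sketch under the assumption that each row is treated as a single timestep (dummy arrays stand in for the CSV):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# dummy data standing in for the 8-feature torque dataset (shapes are illustrative)
n_samples, n_features = 3634, 8
X = np.random.rand(n_samples, n_features).astype('float32')
y = np.random.rand(n_samples).astype('float32')

# add a timesteps axis so the LSTM sees (batch, timesteps, features)
X_seq = X.reshape(n_samples, 1, n_features)

model = keras.Sequential([
    layers.LSTM(64, input_shape=(1, n_features)),  # timesteps=1, features=8
    layers.Dense(32, activation='relu'),
    layers.Dense(1),                               # linear output for a regression target
])
model.compile(loss='mse', optimizer='adam',
              metrics=[keras.metrics.RootMeanSquaredError()])
model.fit(X_seq, y, epochs=2, batch_size=32, verbose=0)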

Very low performance even after oversampling dataset

I'm using an MLPClassifier for classification of heart diseases. I used imblearn.SMOTE to balance the objects of each class. I was getting very good results (85% balanced acc.), but I was advised not to use SMOTE on the test data, only on the train data. After I made this change, the performance of my classifier dropped too much (~35% balanced accuracy) and I don't know what could be wrong.
Here is a simple benchmark with training data balanced but test data unbalanced:
And this is the code:
def makeOverSamplesSMOTE(X, y):
    from imblearn.over_sampling import SMOTE
    sm = SMOTE(sampling_strategy='all')
    X, y = sm.fit_sample(X, y)
    return X, y

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)

## Normalize data
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.fit_transform(X_test)

## SMOTE only on training data
X_train, y_train = makeOverSamplesSMOTE(X_train, y_train)

clf = MLPClassifier(hidden_layer_sizes=(20), verbose=10,
                    learning_rate_init=0.5, max_iter=2000,
                    activation='logistic', solver='sgd', shuffle=True, random_state=30)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
I'd like to know what I'm doing wrong, since this seems to be the proper way of preparing the data.
The first mistake in your code is in how you standardize the data. You only need to fit the StandardScaler once, on X_train; you shouldn't refit it on X_test, only transform. So the correct code is:
def makeOverSamplesSMOTE(X, y):
    from imblearn.over_sampling import SMOTE
    sm = SMOTE(sampling_strategy='all')
    X, y = sm.fit_sample(X, y)
    return X, y

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)

## Normalize data
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

## SMOTE only on training data
X_train, y_train = makeOverSamplesSMOTE(X_train, y_train)

clf = MLPClassifier(hidden_layer_sizes=(20), verbose=10,
                    learning_rate_init=0.5, max_iter=2000,
                    activation='logistic', solver='sgd', shuffle=True, random_state=30)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
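As a hedged aside, the same train-only preprocessing can be expressed with an imblearn Pipeline, which applies the scaler at both fit and predict time but runs the SMOTE sampler only during fit (X_train, y_train, X_test are the variables from the snippet above):

from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

# the scaler is fit on the training data and reused for prediction,
# while SMOTE resamples only the training data, never the test data
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('smote', SMOTE(sampling_strategy='all')),
    ('mlp', MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                          activation='logistic', solver='sgd', random_state=30)),
])
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)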
For the model itself, try reducing the learning rate; it is too high (the default learning_rate_init in sklearn is 0.001). Try changing the activation function and the number of layers. Also, not every ML model works on every dataset, so you might need to look at your data and choose a model accordingly.
Hopefully you have already got a better result for your model. I tried changing a few parameters and got an accuracy of 65%; when I changed the split to a 90:10 sample I got an accuracy of 70%.
But accuracy can mislead, so I also calculated the F1 score, which gives a better picture of the predictions.
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(hidden_layer_sizes=(1,), verbose=False,
                    learning_rate_init=0.001,
                    max_iter=2000,
                    activation='logistic', solver='sgd', shuffle=True, random_state=50)
clf.fit(X_train_res, y_train_res)
y_pred = clf.predict(X_test)

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
score = accuracy_score(y_test, y_pred)
print(score)
cr = classification_report(y_test, clf.predict(X_test))
print(cr)
Accuracy = 0.65
classification report:
              precision    recall  f1-score   support

           0       0.82      0.97      0.89        33
           1       0.67      0.31      0.42        13
           2       0.00      0.00      0.00         6
           3       0.00      0.00      0.00         4
           4       0.29      0.80      0.42         5

   micro avg       0.66      0.66      0.66        61
   macro avg       0.35      0.42      0.35        61
weighted avg       0.61      0.66      0.61        61
confusion_matrix:
array([[32,  0,  0,  0,  1],
       [ 4,  4,  2,  0,  3],
       [ 1,  1,  0,  0,  4],
       [ 1,  1,  0,  0,  2],
       [ 1,  0,  0,  0,  4]], dtype=int64)
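Since the question measures balanced accuracy and the answer relies on F1, here is a small sketch of computing both with scikit-learn (reusing y_test and y_pred from the snippet above):

from sklearn.metrics import balanced_accuracy_score, f1_score

# balanced accuracy is the mean recall over classes; macro F1 averages per-class F1 equally,
# so both stay informative when the test set is left imbalanced
print('balanced accuracy:', balanced_accuracy_score(y_test, y_pred))
print('macro F1:', f1_score(y_test, y_pred, average='macro'))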

determine the Keras's Mnist input shape

I have xtrain.shape as
(60000, 28, 28)
It means 60000 channels with image size 28 * 28
I want to make a Keras Sequential model.
Specifying the model shape:
model = Sequential()
model.add(Convolution2D(32,3,activation='relu',input_shape=(????)))
model.add(Dense(10, activation='relu'))
model.summary()
What should input_shape look like?
model = Sequential()
model.add(Dense(64,input_shape=(1,28,28)))
When I put this, I got the following error:
Error when checking input: expected dense_31_input to have 4 dimensions, but got array with shape (60000, 28, 28)
Why does this require 4 dimensions, and how do I fix it in the code?
I have xtrain.shape as
(60000, 28, 28)
It means 60000 channels with image size 28 * 28
Well, it certainly does not mean that; it means 60000 samples, not channels (MNIST is a single-channel dataset).
No need to re-invent the wheel in such cases - have a look at the MNIST CNN example in Keras:
from keras import backend as K

# input image dimensions
img_rows, img_cols = 28, 28

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':  # Theano backend
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:  # Tensorflow backend
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# normalise:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# your model:
model = Sequential()
model.add(Convolution2D(32, 3, activation='relu', input_shape=input_shape))
model.add(Dense(10, activation='softmax'))  # change to softmax in the final layer
where you should also change the activation of the final layer to softmax (and most probably add some pooling and flatten layers before the final dense one).
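For completeness, a minimal sketch of what those pooling and flatten layers could look like (layer choices are illustrative, building on the Sequential/Convolution2D imports already used above):

from keras.layers import MaxPooling2D, Flatten

model = Sequential()
model.add(Convolution2D(32, 3, activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))  # downsample the feature maps
model.add(Flatten())                       # flatten to a 1-D vector before the classifier
model.add(Dense(10, activation='softmax'))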
Try to reshape the data to (60000, 28, 28, 1) or (60000, 1, 28, 28).
First one:
model = Sequential()
model.add(Convolution2D(32, 3, activation='relu', input_shape=(28, 28, 1)))
model.add(Dense(10, activation='relu'))
model.summary()
Second one:
model = Sequential()
model.add(Dense(64, input_shape=(1, 28, 28)))

how to choose LSTM 2-d input shape?

I am trying to feed a 1-D signal (1, 2000), which has 22 features (22, 2000), into an LSTM.
(Each 1-D signal is 10 seconds long, sampled at 200 Hz.)
And I have 808 samples: (808, 22, 2000).
I saw that the LSTM receives a 3-D tensor of shape (batch_size, timestep, input_dim).
So is it right to set my input shape as
(batch_size = 808, timestep = 2000, input_dim = 3)?
Here is a sample of my code.
# data shape check
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(727, 22, 2000)
(81, 22, 2000)
(727, 2)
(81, 2)
# Model Config
inputshape = (808, 2000, 2)  # 22 channels, 2000 samples
lstm_1_cell_num = 20
lstm_2_cell_num = 20
inputdrop_ratio = 0.2
celldrop_ratio = 0.2

# define model
model = Sequential()
model.add(LSTM(lstm_1_cell_num, input_shape=inputshape, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(20))
model.add(LSTM(lstm_2_cell_num, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='sigmoid'))
print(model.summary())
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
The input shape must be (22, 2000), and the batch size should be given in the fit function. So try this:
inputshape = (22, 2000)

model.fit(X_train, y_train,
          batch_size=808,
          epochs=epochs,
          validation_data=(X_test, y_test),
          shuffle=True)
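One further caveat, offered as a hedged note rather than part of the original answer: stacking a second LSTM on top of the first, as the question's model does, requires the first LSTM to return sequences. A minimal sketch (the intermediate Dense(20) is dropped for brevity):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# return_sequences=True so the second LSTM receives a sequence, not a single vector
model.add(LSTM(20, input_shape=(22, 2000), return_sequences=True,
               dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(20, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])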

Linear regression model accuracy is always 1.0 in tensorflow

Problem:
I am building a model that will predict housing prices. So, first I decided to build a linear regression model in TensorFlow. But when I start training, I see that my accuracy is always 1.
I am new to machine learning. Please, someone, tell me what's going wrong; I can't figure it out. I searched on Google but didn't find any answer that solves my problem.
Here's my code:
df_train = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea', 'SalePrice']]

df_X = df_train.loc[:, ['OverallQual', 'GrLivArea', 'GarageArea']]
df_Y = df_train.loc[:, ['SalePrice']]

df_yy = get_dummies(df_Y)
print("Shape of df_X: ", df_X.shape)

X_train, X_test, y_train, y_test = train_test_split(df_X, df_yy, test_size=0.15)

X_train = np.asarray(X_train).astype(np.float32)
X_test = np.asarray(X_test).astype(np.float32)
y_train = np.asarray(y_train).astype(np.float32)
y_test = np.asarray(y_test).astype(np.float32)

X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])

W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))

prediction = tf.add(tf.matmul(X, W), b)

num_epochs = 20000

# calculating loss
cost = tf.reduce_mean(tf.losses.softmax_cross_entropy(onehot_labels=y, logits=prediction))
optimizer = tf.train.GradientDescentOptimizer(0.00001).minimize(cost)
correct_prediction = tf.equal(tf.argmax(prediction, axis=1), tf.argmax(y, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        if epoch % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={X: X_train, y: y_train})
            print('step %d, training accuracy %g' % (epoch, train_accuracy))
        optimizer.run(feed_dict={X: X_train, y: y_train})
    print('test accuracy %g' % accuracy.eval(feed_dict={X: X_test, y: y_test}))
Output is:
step 0, training accuracy 1
step 100, training accuracy 1
step 200, training accuracy 1
step 300, training accuracy 1
step 400, training accuracy 1
step 500, training accuracy 1
step 600, training accuracy 1
step 700, training accuracy 1
............................
............................
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 1
EDIT:
I changed my cost function to this
cost = tf.reduce_sum(tf.pow(prediction-y, 2))/(2*1241)
But still my output is always 1.
EDIT 2:
In response to lejlot's comment:
Thanks, lejlot. I changed my accuracy code to this:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    merged_summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter("/tmp/hpp1")
    writer.add_graph(sess.graph)
    for epoch in range(num_epochs):
        if epoch % 5:
            s = sess.run(merged_summary, feed_dict={X: X_train, y: y_train})
            writer.add_summary(s, epoch)
        sess.run(optimizer, feed_dict={X: X_train, y: y_train})
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: X_train, y: y_train})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                  "W=", sess.run(W), "b=", sess.run(b))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: X_train, y: y_train})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
But the output is all nan
Output:
....................................
Epoch: 19900 cost= nan W= nan b= nan
Epoch: 19950 cost= nan W= nan b= nan
Epoch: 20000 cost= nan W= nan b= nan
Optimization Finished!
Training cost= nan W= nan b= nan
You want to do linear regression, but you are actually doing logistic regression. Take a look at tf.losses.softmax_cross_entropy: the softmax it applies turns the logits into a probability distribution, i.e. a vector of numbers that sum to 1. In your case that vector has size 1, so it is always [1]; the argmax of both prediction and label is therefore always 0, which is why the accuracy is always 1.
Here are two examples that will help you see the difference: linear regression and logistic regression.
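A minimal sketch of what the linear-regression side of that difference could look like in the question's TF 1.x style (the MSE loss, RMSE metric, and learning rate are illustrative assumptions, not the linked examples' exact code):

import tensorflow as tf  # TF 1.x API, matching the question

num_of_features = 3
X = tf.placeholder(tf.float32, [None, num_of_features])
y = tf.placeholder(tf.float32, [None, 1])  # the raw price, without get_dummies / one-hot encoding
W = tf.Variable(tf.zeros([num_of_features, 1]))
b = tf.Variable(tf.zeros([1]))

prediction = tf.add(tf.matmul(X, W), b)

# mean squared error is the usual linear-regression loss; classification accuracy has no meaning here
cost = tf.reduce_mean(tf.square(prediction - y))
rmse = tf.sqrt(cost)  # report RMSE instead of accuracy

# with unscaled housing features, a small learning rate (and/or feature scaling)
# helps keep gradient descent from diverging to nan
train_op = tf.train.GradientDescentOptimizer(1e-3).minimize(cost)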
