I am trying to feed a huge sparse matrix to Keras model. As the dataset doesn`t fit into RAM, the way around is to train the model on a data generated batch-by-batch by a generator.
To test this approach and make sure my solution works fine, I slightly modified a Kera`s simple MLP on the Reuters newswire topic classification task. So, the idea is to compare original and edited models. I just convert numpy.ndarray into scipy.sparse.csr.csr_matrix and feed it to the model.
But my model crashes at some point and I need a hand to figure out a reason.
Here is the original model and my additions below
from __future__ import print_function
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.datasets import reuters
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.utils import np_utils
from keras.preprocessing.text import Tokenizer
max_words = 1000
batch_size = 32
nb_epoch = 5
print('Loading data...')
(X_train, y_train), (X_test, y_test) = reuters.load_data(nb_words=max_words, test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
nb_classes = np.max(y_train)+1
print(nb_classes, 'classes')
print('Vectorizing sequence data...')
tokenizer = Tokenizer(nb_words=max_words)
X_train = tokenizer.sequences_to_matrix(X_train, mode='binary')
X_test = tokenizer.sequences_to_matrix(X_test, mode='binary')
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
print('Convert class vector to binary class matrix (for use with categorical_crossentropy)')
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
print('Y_train shape:', Y_train.shape)
print('Y_test shape:', Y_test.shape)
print('Building model...')
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
history = model.fit(X_train, Y_train,
nb_epoch=nb_epoch, batch_size=batch_size,
verbose=1)#, validation_split=0.1)
#score = model.evaluate(X_test, Y_test,
# batch_size=batch_size, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])
It outputs:
Loading data...
8982 train sequences
2246 test sequences
46 classes
Vectorizing sequence data...
X_train shape: (8982, 1000)
X_test shape: (2246, 1000)
Convert class vector to binary class matrix (for use with categorical_crossentropy)
Y_train shape: (8982, 46)
Y_test shape: (2246, 46)
Building model...
Epoch 1/5
8982/8982 [==============================] - 5s - loss: 1.3932 - acc: 0.6906
Epoch 2/5
8982/8982 [==============================] - 4s - loss: 0.7522 - acc: 0.8234
Epoch 3/5
8982/8982 [==============================] - 5s - loss: 0.5407 - acc: 0.8681
Epoch 4/5
8982/8982 [==============================] - 5s - loss: 0.4160 - acc: 0.8980
Epoch 5/5
8982/8982 [==============================] - 5s - loss: 0.3338 - acc: 0.9136
Test score: 1.01453569163
Test accuracy: 0.797417631398
Finally, here is my part
X_train_sparse = sparse.csr_matrix(X_train)
def batch_generator(X, y, batch_size):
n_batches_for_epoch = X.shape[0]//batch_size
for i in range(n_batches_for_epoch):
index_batch = range(X.shape[0])[batch_size*i:batch_size*(i+1)]
X_batch = X[index_batch,:].todense()
y_batch = y[index_batch,:]
yield(np.array(X_batch),y_batch)
model.fit_generator(generator=batch_generator(X_train_sparse, Y_train, batch_size),
nb_epoch=nb_epoch,
samples_per_epoch=X_train_sparse.shape[0])
The crash:
Exception Traceback (most recent call last)
<ipython-input-120-6722a4f77425> in <module>()
1 model.fit_generator(generator=batch_generator(X_trainSparse, Y_train, batch_size),
2 nb_epoch=nb_epoch,
----> 3 samples_per_epoch=X_trainSparse.shape[0])
/home/kk/miniconda2/envs/tensorflow/lib/python2.7/site-packages/keras/models.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, **kwargs)
648 nb_val_samples=nb_val_samples,
649 class_weight=class_weight,
--> 650 max_q_size=max_q_size)
651
652 def evaluate_generator(self, generator, val_samples, max_q_size=10, **kwargs):
/home/kk/miniconda2/envs/tensorflow/lib/python2.7/site-packages/keras/engine/training.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size)
1356 raise Exception('output of generator should be a tuple '
1357 '(x, y, sample_weight) '
-> 1358 'or (x, y). Found: ' + str(generator_output))
1359 if len(generator_output) == 2:
1360 x, y = generator_output
Exception: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
I believe the problem is due to wrong setup of samples_per_epoch. I`d trully appreciate if someone could comment on this.
Here is my solution.
def batch_generator(X, y, batch_size):
number_of_batches = samples_per_epoch/batch_size
counter=0
shuffle_index = np.arange(np.shape(y)[0])
np.random.shuffle(shuffle_index)
X = X[shuffle_index, :]
y = y[shuffle_index]
while 1:
index_batch = shuffle_index[batch_size*counter:batch_size*(counter+1)]
X_batch = X[index_batch,:].todense()
y_batch = y[index_batch]
counter += 1
yield(np.array(X_batch),y_batch)
if (counter < number_of_batches):
np.random.shuffle(shuffle_index)
counter=0
In my case, X - sparse matrix, y - array.
If you can use Lasagne instead of Keras I've written a small MLP class with the following features:
supports both dense and sparse matrices
supports drop-out and hidden layer
Supports complete probability distribution instead of one-hot labels so supports multilabel training.
Supports scikit-learn like API (fit, predict, accuracy, etc.)
Is very easy to configure and modify
Related
I am kind of new to using the mlxtend package and as well as the Keras package so please bear with me. I have been trying to combine predictions of various models, i.e., Random Forest, Logistic Regression, and a Neural Network model, using StackingCVClassifier. I am trying to stack these classifiers that operate on different feature subsets. Kindly see the code as follows.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow import keras
from keras import layers
from keras.constraints import maxnorm
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Input
from mlxtend.classifier import StackingCVClassifier
from mlxtend.feature_selection import ColumnSelector
from sklearn.pipeline import make_pipeline
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.neural_network import MLPClassifier
X, y = make_classification()
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
# defining neural network model
def create_model ():
# create model
model = Sequential()
model.add(Dense(10, input_dim=10, activation='relu'))
model.add(Dropout(0.2))
model.add(Flatten())
optimizer= keras.optimizers.RMSprop(lr=0.001)
model.add(Dense(units = 1, activation = 'sigmoid')) # Compile model
model.compile(loss='binary_crossentropy',
optimizer=optimizer, metrics=[keras.metrics.AUC(), 'accuracy'])
return model
# using KerasClassifier on the neural network model
NN_clf=KerasClassifier(build_fn=create_model, epochs=5, batch_size= 5)
NN_clf._estimator_type = "classifier"
# stacking of classifiers that operate on different feature subsets
pipeline1 = make_pipeline(ColumnSelector(cols=(np.arange(0, 5, 1))), LogisticRegression())
pipeline2 = make_pipeline(ColumnSelector(cols=(np.arange(5, 10, 1))), RandomForestClassifier())
pipeline3 = make_pipeline(ColumnSelector(cols=(np.arange(10, 20, 1))), NN_clf)
# final stacking
clf = StackingCVClassifier(classifiers=[pipeline1, pipeline2, pipeline3], meta_classifier=MLPClassifier())
clf.fit(X_train, y_train)
print("Stacking model score: %.3f" % clf.score(X_val, y_val))
However, I am getting this error:
ValueError Traceback (most recent call last)
<ipython-input-11-ef342536824f> in <module>
42 # final stacking
43 clf = StackingCVClassifier(classifiers=[pipeline1, pipeline2, pipeline3], meta_classifier=MLPClassifier())
---> 44 clf.fit(X_train, y_train)
45
46 print("Stacking model score: %.3f" % clf.score(X_val, y_val))
~\anaconda3\lib\site-packages\mlxtend\classifier\stacking_cv_classification.py in fit(self, X, y, groups, sample_weight)
282 meta_features = prediction
283 else:
--> 284 meta_features = np.column_stack((meta_features, prediction))
285
286 if self.store_train_meta_features:
~\anaconda3\lib\site-packages\numpy\core\overrides.py in column_stack(*args, **kwargs)
~\anaconda3\lib\site-packages\numpy\lib\shape_base.py in column_stack(tup)
654 arr = array(arr, copy=False, subok=True, ndmin=2).T
655 arrays.append(arr)
--> 656 return _nx.concatenate(arrays, 1)
657
658
~\anaconda3\lib\site-packages\numpy\core\overrides.py in concatenate(*args, **kwargs)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)
Please help me. Thanks!
The error is happening because you are combining prediction from traditional ML models and DL model.
ML models are giving predictions in the shape like this (80,1) whereas DL model is predicting in shape like this (80,1,1), so there is mismatch while trying to append all the predictions.
Common workaround for this is to strip the extra dimension of predictions given by DL method to make it (80,1) instead of (80,1,1)
So, open the py file located inside:
anaconda3\lib\site-packages\mlxtend\classifier\stacking_cv_classification.py
In the line 280 and 356, outside of if block, add this:
prediction = prediction.squeeze(axis=1) if len(prediction.shape)>2 else prediction
So, it will look something like this:
...
...
...
if not self.use_probas:
prediction = prediction[:, np.newaxis]
elif self.drop_proba_col == "last":
prediction = prediction[:, :-1]
elif self.drop_proba_col == "first":
prediction = prediction[:, 1:]
prediction = prediction.squeeze(axis=1) if len(prediction.shape)>2 else prediction
if meta_features is None:
meta_features = prediction
else:
meta_features = np.column_stack((meta_features, prediction))
...
...
...
for model in self.clfs_:
if not self.use_probas:
prediction = model.predict(X)[:, np.newaxis]
else:
if self.drop_proba_col == "last":
prediction = model.predict_proba(X)[:, :-1]
elif self.drop_proba_col == "first":
prediction = model.predict_proba(X)[:, 1:]
else:
prediction = model.predict_proba(X)
prediction = prediction.squeeze(axis=1) if len(prediction.shape)>2 else prediction
per_model_preds.append(prediction)
...
...
...
Prakash's answer raises really good points.
If you want to get this running without too many changes, you can roll your own version of a scikit-learn BaseEstimator/ClassifierMixin object, or wrap in the recommended KerasClassifier object.
i.e. You can roll your own estimator like this:
class MyKerasModel(BaseEstimator, ClassifierMixin):
def fit(self, X, y):
model = keras.Sequential()
model.add(layers.Input(shape=X.shape[1]))
model.add(layers.Dense(10, input_dim=10, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Flatten())
model.add(layers.Dense(units = 1, activation = 'sigmoid'))
optimizer= keras.optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
optimizer=optimizer, metrics=[keras.metrics.AUC(), 'accuracy'])
model.fit(X, y)
self.model = model
return self
def predict(self, X):
return (self.model.predict(X) > 0.5).flatten()
And putting all the pieces together allows you to stack the predictions:
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers
from mlxtend.classifier import StackingCVClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
X, y = make_classification()
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
class MyKerasModel(BaseEstimator, ClassifierMixin):
def fit(self, X, y):
model = keras.Sequential()
model.add(layers.Input(shape=X.shape[1]))
model.add(layers.Dense(10, input_dim=10, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Flatten())
model.add(layers.Dense(units = 1, activation = 'sigmoid'))
optimizer= keras.optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
optimizer=optimizer, metrics=[keras.metrics.AUC(), 'accuracy'])
model.fit(X, y)
self.model = model
return self
def predict(self, X):
return (self.model.predict(X) > 0.5).flatten()
clf = StackingCVClassifier(
classifiers=[RandomForestClassifier(), LogisticRegression(), MyKerasModel()],
meta_classifier=MLPClassifier(),
).fit(X_train, y_train)
print("Stacking model score: %.3f" % clf.score(X_val, y_val))
Output:
2/2 [==============================] - 0s 11ms/step - loss: 0.8580 - auc: 0.5050 - accuracy: 0.5500
2/2 [==============================] - 0s 1ms/step
2/2 [==============================] - 0s 4ms/step - loss: 0.6955 - auc_1: 0.5777 - accuracy: 0.5750
2/2 [==============================] - 0s 1ms/step
3/3 [==============================] - 0s 3ms/step - loss: 0.7655 - auc_2: 0.6037 - accuracy: 0.6125
Stacking model score: 1.000
I am trying to build a neural network to forecast the price of an financial asset using 4 input facotors (the price of oil, the USD value, Inflation and an index that is measuring the uncertainty in the economy)
To evaluate the model I want to compare the Mean Absolute Error of my prediction compared to the actual values of the test data.
But whenever I rerun my code I get a different result for the mean absolute error (without changing the model, just reruning the exact same code).
And I have no idea why that is the case.
This is my code:
First I load the data
data = pd.read_excel("data/Data_final.xlsx", index_col= "Date", parse_dates = True).dropna()
data = data["2020":"1985"].sort_values("Date", ascending = True)
Then I split the data into the input variables and the output variable (price of gold) and split the data into a training and a test set - I used shuffle = False to get the same train and test data set each time
x = data[["WTI_Crude_Oil", "US_CPI_all_Urban", "USDX", "Uncertainty_Index"]]
y = data.iloc[:,0]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size =0.20, shuffle = False)
After that I normalize the data
mean_x = x_train.mean(axis = 0)
x_train = x_train - mean_x
std_x = x_train.std(axis = 0)
x_train = x_train / std_x
# normalize test data as well using mean and std of train data to avoid look ahead bias
x_test = x_test - mean_x
x_test = x_test / std_x
Then I am build my model
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(64, activation = "relu", input_shape = (x_train.shape[1],)))
model.add(layers.Dense(16, activation = "relu"))
model.add(layers.Dense(1))
model.compile(optimizer = "rmsprop", loss = "mse", metrics = ["mae"])
Then I fit and evaluate the model
model.fit(x_train, y_train, epochs = 600)
test_mse_score, test_mae_score = model.evaluate(x_test, y_test)
My goal is to reduce the Mean absolute error, but for every time I run the code I get a different Mean Absolute Error. Where is the mistake in my code? Snippet of the data for better understanding
Try this:
from pandas_datareader import data as wb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.preprocessing import MinMaxScaler
start = '2019-06-30'
end = '2020-06-30'
tickers = ['SBUX']
thelen = len(tickers)
price_data = []
for ticker in tickers:
prices = wb.DataReader(ticker, start = start, end = end, data_source='yahoo')[['Open','Adj Close']]
price_data.append(prices.assign(ticker=ticker)[['ticker', 'Open', 'Adj Close']])
#names = np.reshape(price_data, (len(price_data), 1))
df = pd.concat(price_data)
df.reset_index(inplace=True)
for col in df.columns:
print(col)
#used for setting the output figure size
rcParams['figure.figsize'] = 20,10
#to normalize the given input data
scaler = MinMaxScaler(feature_range=(0, 1))
#to read input data set (place the file name inside ' ') as shown below
df.head()
df['Date'] = pd.to_datetime(df.Date,format='%Y-%m-%d')
# df.index = names['Date']
#3plt.figure(figsize=(16,8))
# plt.plot(df['Adj Close'], label='Closing Price')
ntrain = 80
df_train = df.head(int(len(df)*(ntrain/100)))
ntest = -80
df_test = df.tail(int(len(df)*(ntest/100)))
#importing the packages
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
#dataframe creation
seriesdata = df.sort_index(ascending=True, axis=0)
new_seriesdata = pd.DataFrame(index=range(0,len(df)),columns=['Date','Adj Close'])
length_of_data=len(seriesdata)
for i in range(0,length_of_data):
new_seriesdata['Date'][i] = seriesdata['Date'][i]
new_seriesdata['Adj Close'][i] = seriesdata['Adj Close'][i]
#setting the index again
new_seriesdata.index = new_seriesdata.Date
new_seriesdata.drop('Date', axis=1, inplace=True)
#creating train and test sets this comprises the entire data’s present in the dataset
myseriesdataset = new_seriesdata.values
totrain = myseriesdataset[0:255,:]
tovalid = myseriesdataset[255:,:]
#converting dataset into x_train and y_train
scalerdata = MinMaxScaler(feature_range=(0, 1))
scale_data = scalerdata.fit_transform(myseriesdataset)
x_totrain, y_totrain = [], []
length_of_totrain=len(totrain)
for i in range(60,length_of_totrain):
x_totrain.append(scale_data[i-60:i,0])
y_totrain.append(scale_data[i,0])
x_totrain, y_totrain = np.array(x_totrain), np.array(y_totrain)
x_totrain = np.reshape(x_totrain, (x_totrain.shape[0],x_totrain.shape[1],1))
#LSTM neural network
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(x_totrain.shape[1],1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
lstm_model.compile(loss='mean_squared_error', optimizer='adadelta')
lstm_model.fit(x_totrain, y_totrain, epochs=10, batch_size=1, verbose=2)
#predicting next data stock price
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
myinputs = myinputs.reshape(-1,1)
myinputs = scalerdata.transform(myinputs)
tostore_test_result = []
for i in range(60,myinputs.shape[0]):
tostore_test_result.append(myinputs[i-60:i,0])
tostore_test_result = np.array(tostore_test_result)
tostore_test_result = np.reshape(tostore_test_result,(tostore_test_result.shape[0],tostore_test_result.shape[1],1))
myclosing_priceresult = lstm_model.predict(tostore_test_result)
myclosing_priceresult = scalerdata.inverse_transform(myclosing_priceresult)
totrain = df_train
tovalid = df_test
#predicting next data stock price
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
# Printing the next day’s predicted stock price.
print(len(tostore_test_result));
print(myclosing_priceresult);
Result:
Date
ticker
Open
Adj Close
Epoch 1/10
193/193 - 3s - loss: 0.2868
Epoch 2/10
193/193 - 3s - loss: 0.2671
Epoch 3/10
193/193 - 3s - loss: 0.2465
Epoch 4/10
193/193 - 3s - loss: 0.2253
Epoch 5/10
193/193 - 3s - loss: 0.2040
Epoch 6/10
193/193 - 3s - loss: 0.1827
Epoch 7/10
193/193 - 3s - loss: 0.1617
Epoch 8/10
193/193 - 3s - loss: 0.1413
Epoch 9/10
193/193 - 3s - loss: 0.1217
Epoch 10/10
193/193 - 3s - loss: 0.1033
1
[[67.15351]]
I've built a recurrent neural network to predict time series data. I seemed to be getting reasonable results, it would start at 33% accuracy for three classes as expected and would get better up to a point, also as expected. I wanted to test the network just to make sure it's actually working so I created a basic input / output as follows:
in1 in2 in3 out
1 2 3 0
5 6 7 1
9 10 11 2
1 2 3 0
5 6 7 1
9 10 11 2
I copied this pattern through a million lines in a csv. I would have assumed the neural network could easily identify this pattern, as it's always the same. I've tried a leaning rate of .1, .01, .001, .0001, but it always stays at about 33% (34% for the .0001 lr). I'll post my code below, should the neural net easily be able to identify this, is there something very wrong with my set up?
import os
import pandas as pd
from sklearn import preprocessing
from collections import deque
import random
import numpy as np
import time
import random
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, CuDNNLSTM, BatchNormalization
from tensorflow.keras.models import load_model
df = pd.DataFrame()
df = pd.read_csv('/home/drew/Desktop/numbers.csv')
times = sorted(df.index.values)
last_5pct = times[-int(0.1*len(times))]
validation_df = df[(df.index >= last_5pct)]
main_df = df[(df.index < last_5pct)]
epochs = 10
batch_size = 64
train_x = main_df[['in1','in2','in3']]
validation_x = validation_df[['in1','in2','in3']]
train_y = main_df[['out']]
validation_y = validation_df[['out']]
train_x = train_x.values
validation_x = validation_x.values
train_x = train_x.reshape(train_x.shape[0],1,3)
validation_x = validation_x.reshape(validation_x.shape[0],1,3)
model = Sequential()
model.add(LSTM(128, input_shape=(train_x.shape[1:]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(LSTM(128, input_shape=(train_x.shape[1:]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(32, activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(3, activation="softmax"))
opt = tf.keras.optimizers.Adam(lr=0.0001, decay=1e-6)
model.compile(loss='sparse_categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
model.fit(train_x, train_y,
batch_size=batch_size,
epochs=epochs,
validation_data=(validation_x, validation_y))
result:
Train on 943718 samples, validate on 104857 samples
Epoch 1/10
943718/943718 [===] - 125s 132us/sample - loss: 0.0040 - acc: 0.3436 - val_loss: 2.3842e-07 - val_acc: 0.3439
Epoch 2/10
943718/943718 [===] - 111s 118us/sample - loss: 2.1557e-06 - acc: 0.3437 - val_loss: 2.3842e-07 - val_acc: 0.3435
...
Train on 943718 samples, validate on 104857 samples
Epoch 1/10
943718/943718 [==============================] - 125s 132us/sample - loss: 0.0040 - acc: 0.3436 - val_loss: 2.3842e-07 - val_acc: 0.3439
Epoch 2/10
943718/943718 [==============================] - 111s 118us/sample - loss: 2.1557e-06 - acc: 0.3437 - val_loss: 2.3842e-07 - val_acc: 0.3435
Epoch 6/10
719104/943718 [=====================>........] - ETA: 25s - loss: 2.4936e-07 - acc: 0.3436
It seems that you are not predicting time series result, so there is no need to return sequence after the second LSTM layer.
If you change the second LSTM to:
model.add(LSTM(128, input_shape=(train_x.shape[1:])))
Then, you should get high accuracy.
I am trying to train a Lstm model for Text classification on Amazon Fine food review problem , I am using same dataset as provided by kaggle , I am using tokenizer to convert text data into tokens , but while I am training I am getting same accuracy for all the epochs .Like this
Epoch 1/5
55440/55440 [==============================] - 161s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Epoch 2/5
55440/55440 [==============================] - 159s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Epoch 3/5
55440/55440 [==============================] - 160s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Epoch 4/5
55440/55440 [==============================] - 160s 3ms/step - loss: 2.3666 - acc: 0.8516 - val_loss: 2.3741 - val_acc: 0.8511
Moreover when I am plotting my confusion matrix None of the Negative classes are predicted only positive classes are predicted.
I think I am mostly doing wrong when converting Labels i.e. 'Positive' and 'Negative' to some numerical representation for classification.
Please see my code for more details.
I have tried increasing number of Lstm units and epochs also tried increasing length of sequences but none worked , please note that I have done all pre-processing on reviews as required.
# train and test split x is dataframe consisting of amazon fine food review
#dataset
x = df
x1 = pd.DataFrame(x)
y = df['Score']
x1.head()
import math
train_pct_index = int(0.70 * len(df)) #train data size = 70%
X_train, X_test = x1[:train_pct_index], x1[train_pct_index:]
y_train, y_test = y[:train_pct_index], y[train_pct_index:]
#y_test.value_counts()
x1_df = pd.DataFrame(X_train)
x2_df = pd.DataFrame(X_test)
from sklearn import preprocessing
encoder = preprocessing.LabelEncoder()
y_train=encoder.fit_transform(y_train)
y_test=encoder.fit_transform(y_test)
# tokenizing reviews
tokenizer = Tokenizer(num_words = 5000 )
tokenizer.fit_on_texts(x1_df['CleanedText'])
sequences = tokenizer.texts_to_sequences(x1_df['CleanedText'])
test_sequences = tokenizer.texts_to_sequences(x2_df['CleanedText'])
train_data = pad_sequences(sequences, maxlen=500)
test_data = pad_sequences(test_sequences, maxlen=500)
nb_words = (np.max(train_data) + 1)
# building lstm model and compiling it
from keras.layers.recurrent import LSTM, GRU
model = Sequential()
model.add(Embedding(nb_words,50,input_length=500))
model.add(LSTM(20))
model.add(Dropout(0.5))
model.add(Dense(1, activation='softmax'))
model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics = ['accuracy'])
I would like my lstm model to generalise well and predict negative reviews which are minority class in this case as well.
I am trying to build a classifier using deep learning techniques and used cifar-10 dataset to build a classifier.I have tried to build a classifier with 1024 hidden nodes. Each image is of 32*32*3(R-G-B) size.I have loaded data from only 3/5 files of dataset as my computer has low processing power.
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import os
import sys
import tarfile
import random
from IPython.display import display, Image
from scipy import ndimage
from sklearn.linear_model import LogisticRegression
from six.moves.urllib.request import urlretrieve
from six.moves import cPickle as pickle
from sklearn.preprocessing import MultiLabelBinarizer
folder='/home/cifar-10-batches-py/'
training_data=np.ndarray((30000,3072),dtype=np.float32)
training_labels=np.ndarray(30000,dtype=np.int32)
testing_data=np.ndarray((10000,3072),dtype=np.float32)
testing_labels=np.ndarray(10000,dtype=np.int32)
no_of_files=3
begin=0
end=10000
for i in range(no_of_files):
with open(folder+"data_batch_"+str(i+1),'rb') as f:
s=pickle.load(f,encoding='bytes')
training_data[begin:end]=s[b'data']
training_labels[begin:end]=s[b'labels']
begin=begin+10000
end=end+10000
test_path='/home/cifar-10-batches-py/test_batch'
with open(test_path,'rb') as d:
s9=pickle.load(d,encoding='bytes')
tdata=s9[b'data']
testing_data=tdata
tlabels=s9[b'labels']
testing_labels=tlabels
test_data=np.ndarray((5000,3072),dtype=np.float32)
test_labels=np.ndarray(5000,dtype=np.int32)
valid_data=np.ndarray((5000,3072),dtype=np.float32)
valid_labels=np.ndarray(5000,dtype=np.int32)
valid_data[:,:]=testing_data[:5000, :]
valid_labels[:]=testing_labels[:5000]
test_data[:,:]=testing_data[5000:, :]
test_labels[:]=testing_labels[5000:]
onehot_training_labels=np.eye(10)[training_labels.astype(int)]
onehot_test_labels=np.eye(10)[test_labels.astype(int)]
onehot_valid_labels=np.eye(10)[valid_labels.astype(int)]
image_size=32
num_labels=10
train_subset = 10000
def accuracy(predictions, labels):
return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
/ predictions.shape[0])
batch_size = 128
relu_count = 1024 #hidden nodes count
graph = tf.Graph()
with graph.as_default():
tf_train_dataset = tf.placeholder(tf.float32,
shape=(batch_size, image_size * image_size*3))
tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
tf_valid_dataset = tf.constant(valid_data)
tf_test_dataset = tf.constant(test_data)
beta_regul = tf.placeholder(tf.float32)
weights1 = tf.Variable(
tf.truncated_normal([image_size * image_size*3, relu_count]))
biases1 = tf.Variable(tf.zeros([relu_count]))
weights2 = tf.Variable(
tf.truncated_normal([relu_count, num_labels]))
biases2 = tf.Variable(tf.zeros([num_labels]))
preds = tf.matmul( tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1), weights2) + biases2
loss = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=tf_train_labels))+ \
beta_regul * (tf.nn.l2_loss(weights1) + tf.nn.l2_loss(weights2))
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
train_prediction = tf.nn.softmax(preds)
lay1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
valid_prediction = tf.nn.softmax(tf.matmul(lay1_valid, weights2) + biases2)
lay1_test = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
test_prediction = tf.nn.softmax(tf.matmul(lay1_test, weights2) + biases2)
num_steps = 5000
with tf.Session(graph=graph) as session:
tf.initialize_all_variables().run()
print("Initialized")
for step in range(num_steps):
offset = (step * batch_size) % (onehot_training_labels.shape[0] - batch_size)
batch_data = training_data[offset:(offset + batch_size), :]
batch_labels = onehot_training_labels[offset:(offset + batch_size), :]
feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels,beta_regul : 1e-3}
_, l, predictions = session.run(
[optimizer, loss, train_prediction], feed_dict=feed_dict)
if (step % 500 == 0):
print("Minibatch loss at step %d: %f" % (step, l))
print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
print("Validation accuracy: %.1f%%" % accuracy(
valid_prediction.eval(), onehot_valid_labels))
print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), onehot_test_labels))
the output for this code is:
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/util/tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Initialized
Minibatch loss at step 0: 117783.914062
Minibatch accuracy: 14.8%
Validation accuracy: 10.2%
Minibatch loss at step 500: 3632989892247552.000000
Minibatch accuracy: 12.5%
Validation accuracy: 10.1%
Minibatch loss at step 1000: 2203224941527040.000000
Minibatch accuracy: 6.2%
Validation accuracy: 9.9%
Minibatch loss at step 1500: 1336172110413824.000000
Minibatch accuracy: 10.9%
Validation accuracy: 9.8%
Minibatch loss at step 2000: 810328996708352.000000
Minibatch accuracy: 8.6%
Validation accuracy: 10.1%
Minibatch loss at step 2500: 491423044468736.000000
Minibatch accuracy: 9.4%
Validation accuracy: 10.1%
Minibatch loss at step 3000: 298025566076928.000000
Minibatch accuracy: 12.5%
Validation accuracy: 9.8%
Minibatch loss at step 3500: 180741635833856.000000
Minibatch accuracy: 10.9%
Validation accuracy: 9.8%
Minibatch loss at step 4000: 109611013111808.000000
Minibatch accuracy: 15.6%
Validation accuracy: 10.1%
Minibatch loss at step 4500: 66473376612352.000000
Minibatch accuracy: 3.9%
Validation accuracy: 9.9%
Test accuracy: 10.2%
where am i doing it wrong? I see the accuracy is very low.
As far as I can see, you are building a simple 2-layer FNN with Tensorflow. While it is okay, you won't get a really high accuracy. But if you try, you need to carefully tune hyperparameters - learning rate, regularization strength, decay rate, number of neurons in hidden layers.
You are using not all data, so it will surely descrease the quality of predictions. It could still work, but you should check classes distribution in train, val and test sets. It is possible that some classes have too little values in one of datasets. You need at least to stratify your selection.
Are you sure you have a deep understanding of deep learning? It could be a good idea to try cs231n course.