Mel Spectrogram feature extraction to CNN - machine-learning

This question is in line with the question posted here but with a slight nuance of the CNN. Using the feature extraction definition:
import librosa
import numpy as np

max_pad_len = 174
n_mels = 128

def extract_features(file_name):
    try:
        audio, sample_rate = librosa.core.load(file_name, res_type='kaiser_fast')
        mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)
        #pad_width = max_pad_len - mely.shape[1]
        #mely = np.pad(mely, pad_width=((0, 0), (0, pad_width)), mode='constant')
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None
    return mely
How do I get the correct num_rows, num_columns and num_channels for the train and test data?
When constructing the CNN model, how do I determine the correct input shape?
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, input_shape=(num_rows, num_columns, num_channels), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

I don't know if this is exactly your problem, but I also had to use mel spectrograms as input to a CNN.
Short answer:
input_shape = (x_train.shape[1], x_train.shape[2], 1)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
or
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
input_shape = x_train.shape[1:]
Long answer
In my case I have a DataFrame with speakers_id and mel spectrograms (previously calculated with librosa).
Keras CNN models expect image-like input with height, width and color channels (one for grayscale, three for RGB).
The mel spectrograms returned by librosa are 2-D arrays with a height and a width but no channel axis, so you need a reshape to add the channel dimension.
Define the input and expected output
# It looks stupid, but that way I could convert the pandas.Series to a np.array
x = np.array(list(df.mel))
y = df.speaker_id
print('X shape:', x.shape)
X shape: (2204, 128, 24)
2204 Mels, 128x24
Split in train-test
x_train, x_test, y_train, y_test = train_test_split(x, y)
print(f'Train: {len(x_train)}', f'Test: {len(x_test)}')
Train: 1653 Test: 551
Reshape to add the extra dimension
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
print('Shapes:', x_train.shape, x_test.shape)
Shapes: (1653, 128, 24, 1) (551, 128, 24, 1)
Set input_shape
# The input shape is independent of the amount of inputs
input_shape = x_train.shape[1:]
print('Input shape:', input_shape)
Input shape: (128, 24, 1)
Put it into the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D())
# More layers...
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
Run model
model.fit(x_train, y_train, epochs=20, validation_data=(x_test, y_test))
Hope this is helpful.
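The reshape above assumes every spectrogram already has the same number of frames (24 in this answer). If the clips differ in length, as the commented-out padding in the question suggests, something like the following sketch could bring them to a common width before stacking. `fix_width` is a hypothetical helper name, the arrays are stand-ins for librosa output, and 174 is simply the `max_pad_len` from the question.

```python
import numpy as np

def fix_width(mel, max_len):
    """Zero-pad or truncate a (n_mels, frames) array to exactly max_len frames."""
    frames = mel.shape[1]
    if frames >= max_len:
        return mel[:, :max_len]
    pad_width = max_len - frames
    return np.pad(mel, pad_width=((0, 0), (0, pad_width)), mode='constant')

# Stand-ins for librosa output: two clips with different frame counts.
short_clip = np.ones((128, 100))
long_clip = np.ones((128, 200))

batch = np.stack([fix_width(m, 174) for m in (short_clip, long_clip)])
batch = batch[..., np.newaxis]  # add the channel axis expected by Conv2D
print(batch.shape)  # (2, 128, 174, 1)
```

Truncating throws away the end of long clips, so whether a fixed width of 174 is appropriate depends on the data.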

Related

Errors with LSTM input shapes with time series data

I'm trying to predict torque from 8 features with an LSTM layer in my neural network. I'm having trouble with the input shape and have looked around on many sites for a solution. I'm quite new to machine learning and am having trouble understanding the problem and how I can fix this. Here is my code, dataset, and error message.
file = r'/content/drive/MyDrive/only_force_pt1.csv'
df = pd.read_csv(file)
X = df.iloc[:, 1:9]
y = df.iloc[:,9]
print(X)
print(y)
df.head()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, shuffle = True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.1, shuffle = True)
[verbose, epochs, batch_size] = [1, 200, 32]
input_shape = (X_train.shape[0],X_train.shape[1])
model = Sequential()
# LSTM
model.add(LSTM(64, input_shape=input_shape, return_sequences = True))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
#model.add(Dropout(0.2))
#model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1,activation='relu'))
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience = 20, verbose =1, mode = 'auto')
model.summary()
model.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.001), metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(X_train, y_train, batch_size = batch_size, epochs = epochs, verbose = verbose, validation_data=(X_val,y_val), callbacks = [earlystopper])
ValueError: Input 0 of layer "sequential_17" is incompatible with the layer: expected shape=(None, 3634, 8), found shape=(None, 8)
dataset: https://drive.google.com/drive/folders/1BQOXffFYioCiPug2VcBZEZVD-u3y9bcl?usp=sharing
As I understand your problem, you are passing the number of data points as an extra dimension in the input shape of the LSTM layer. Your data has 8 features, and 3634 (= X_train.shape[0]) is the number of data points. That number belongs to the first (None) dimension of the input tensor, which is determined by the batch size, so it should not be part of input_shape.
An LSTM layer expects each sample to have shape (timesteps, features). If every row is treated as a sequence of length 1, change the input_shape definition and reshape the data to match:
input_shape = (1, X_train.shape[1])
X_train = X_train.to_numpy().reshape(-1, 1, X_train.shape[1])
and the shape error should go away. X_val needs the same reshape.
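Keras recurrent layers take 3-D input of the form (samples, timesteps, features), so a 2-D table needs a timestep axis before it reaches the LSTM. Below is a minimal sketch of two ways to add one, using a zero array with the shapes from the error message as a stand-in for the real data. `make_windows` and the window length of 10 are assumptions for illustration only.

```python
import numpy as np

# Stand-in for the feature table: 3634 rows, 8 features (shapes from the question).
X = np.zeros((3634, 8), dtype=np.float32)

# Option 1: treat each row as a sequence of length 1.
X_single_step = X.reshape(X.shape[0], 1, X.shape[1])
input_shape = X_single_step.shape[1:]  # (timesteps, features), no sample count

def make_windows(data, window):
    """Stack overlapping windows of `window` consecutive rows."""
    return np.stack([data[i:i + window] for i in range(len(data) - window + 1)])

# Option 2: sliding windows of 10 consecutive rows.
X_windows = make_windows(X, window=10)

print(X_single_step.shape, input_shape)  # (3634, 1, 8) (1, 8)
print(X_windows.shape)                   # (3625, 10, 8)
```

With option 2 the targets have to be aligned with the windows (for example, one target per window end), and shuffling rows before windowing would destroy the time order.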

Combination of CNN and LSTM for time series data

I'm trying to run a combination of a CNN (Convolutional Neural Network) and an LSTM (Long Short-Term Memory), but I can't find a reshaping of the data that fits both. I thought an LSTM needs [samples, timesteps, features], but that doesn't work here as input.
I'm receiving an error:
ValueError: Negative dimension size caused by subtracting 3 from 1 for '{{node conv1d/conv1d/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](conv1d/conv1d/Reshape, conv1d/conv1d/ExpandDims_1)' with input shapes: [?,1,1,24], [1,3,24,64].
The data is taken from:
https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data
It has the shape of:
date        LandAverageTemperature
1750-01-01  1.2
etc.
The full code is:
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, Dense, Flatten, LSTM, MaxPooling1D,
                                     RepeatVector, TimeDistributed)


def preprocessing(data, n_in=1, n_out=1):
    from sklearn.model_selection import train_test_split

    def series_to_supervised(df, n_in=1, n_out=1, dropnan=True):
        cols = list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
        # put it all together
        agg = pd.concat(cols, axis=1)
        # drop rows with NaN values
        if dropnan:
            agg.dropna(inplace=True)
        return agg.values

    land_temp = pd.DataFrame(data['LandAverageTemperature'].values)
    ma_vals = data['LandAverageTemperature'].expanding(min_periods=12).mean()
    ma_vals_inter = ma_vals.interpolate(limit_direction='both')
    df = series_to_supervised(ma_vals_inter, n_in=n_in, n_out=n_out, dropnan=True)
    df = pd.DataFrame(df)
    X, y = df.iloc[:, :-n_out], df.iloc[:, -n_out:]
    percent = 0.8
    if n_out == 1:
        y = y.iloc[:, 0]
    lim = int(percent * X.shape[0])
    X_train, X_test, y_train, y_test = X[:lim], X[lim:], y[:lim], y[lim:]  # train_test_split(X, y, test_size=0.4, random_state=0)
    return X_train, X_test, y_train, y_test


def lstm_cnn2(X_train, y_train, config, n_in, n_out=1, batch_size=1, epochs=1000, verbose=0, n_features=1):
    input_y = y_train.values.reshape(y_train.shape[0], 1)
    n_timesteps, n_features, n_outputs = X_train.shape[0], X_train.shape[1], input_y.shape[1]
    # reshape output into [samples, timesteps, features]
    input_x = X_train.values.reshape((X_train.shape[0], 1, n_features))
    # define model
    model = Sequential()
    model.add(Conv1D(64, 3, activation='relu', input_shape=(n_timesteps, 1, n_features)))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D())
    model.add(Flatten())
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(input_x, input_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model


if __name__ == '__main__':
    file_location = './GlobalTemperatures.csv'
    data = pd.read_csv(file_location)
    data['dt'] = pd.to_datetime(data['dt'])
    n_out = 1
    n_in = 12 * 2
    X_train, X_test, y_train, y_test = preprocessing(data, n_in, n_out)
    config = 128, 64, 32, 3 * 48, 48, 24, 100, 20  # lstm model configuration
    verbose, epochs, batch_size = 0, 1, 16
    model_lstm = lstm_cnn2(X_train, y_train, config, n_in, batch_size=batch_size)
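One possible reading of the error: the shapes `[?,1,1,24]` suggest the convolution sees a sequence of length 1 with 24 channels, which matches `input_x` being reshaped to `(samples, 1, n_features)`. With 'valid' padding and stride 1, a 1-D convolution produces `length - kernel_size + 1` outputs, which is negative for a length of 1 and a kernel of 3. The sketch below only does that arithmetic. `conv1d_valid_length` is a hypothetical helper, and reshaping to `(samples, 24, 1)` is an assumption about the intent that has not been tested against the full model.

```python
import numpy as np

def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1

# As in the question: one timestep, 24 lagged values treated as features.
print(conv1d_valid_length(1, 3))  # -1, the "negative dimension"

# Treating the 24 lags as timesteps of a single feature leaves room for both convs.
after_conv1 = conv1d_valid_length(24, 3)           # 22
after_conv2 = conv1d_valid_length(after_conv1, 3)  # 20
after_pool = after_conv2 // 2                      # 10, with the default pool size of 2

X = np.zeros((100, 24))                         # stand-in for X_train.values
input_x = X.reshape(X.shape[0], X.shape[1], 1)  # (samples, timesteps, features)
input_shape = input_x.shape[1:]                 # per-sample shape, no sample count
print(input_x.shape, input_shape)  # (100, 24, 1) (24, 1)
```

Under that assumption, the first layer would take `input_shape=(24, 1)` rather than a shape that includes the number of rows in `X_train`.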

Keras wrong accuracy using model.predict

I am using this code for a CNN
train_batches = ImageDataGenerator().flow_from_directory('dice_sklearn/train', target_size=(IMG_WIDTH, IMG_HEIGHT),
classes=['1', '2', '3', '4', '5', '6'],
batch_size=cv_opt['batch'],
color_mode="grayscale")
test_batches = ImageDataGenerator().flow_from_directory('dice_sklearn/test', target_size=(IMG_WIDTH, IMG_HEIGHT),
class_mode='categorical',
batch_size=cv_opt['batch'],
shuffle=False)
train_num = len(train_batches)
test_num = len(test_batches)
model = Sequential([
Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(IMG_WIDTH, IMG_HEIGHT, 1)),
Conv2D(32, (3, 3), activation='relu'),
MaxPooling2D(pool_size=(2, 2)),
Dropout(0.30),
Conv2D(64, (3, 3), padding='same', activation='relu'),
Conv2D(64, (3, 3), activation='relu'),
MaxPooling2D(pool_size=(2, 2)),
Dropout(0.30),
Conv2D(64, (3, 3), padding='same', activation='relu'),
Conv2D(64, (3, 3), activation='relu'),
MaxPooling2D(pool_size=(2, 2)),
Dropout(0.30),
Flatten(),
Dense(512, activation='relu'),
Dropout(0.5),
Dense(6, activation='softmax'),
])
print(model.summary())
model.compile(Adam(lr=cv_opt['lr']), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_batches, steps_per_epoch=train_num,
epochs=cv_opt['epoch'], verbose=2)
model.save('cnn-keras.h5')
test_batches.reset()
prediction = model.predict(test_batches, steps=test_num, verbose=1)
predicted_class = np.argmax(prediction, axis=1)
classes = test_batches.classes[test_batches.index_array]
accuracy = (predicted_class == classes).mean()
print("Final accuracy:", accuracy * 100)
Where
cv_opt['batch'] is set to 50
cv_opt['lr'] is set to 0.0003
cv_opt['epoch'] is set to 50
The output from the training phase (with model.fit) on the last line (last epoch) returns:
192/192 [==============================] - 98s 510ms/step - loss: 0.0514 - accuracy: 0.9818 - val_loss: 0.0369 - val_accuracy: 0.9833
But when I run this part of code:
test_batches.reset()
prediction = model.predict(test_batches, steps=test_num, verbose=1)
predicted_class = np.argmax(prediction, axis=1)
classes = test_batches.classes[test_batches.index_array]
accuracy = (predicted_class == classes).mean()
print("Final accuracy:", accuracy * 100)
I get a very low accuracy score (0.16).
But if I plot the learning curves, I can see that the test/validation curves (whether in testing or in parameter tuning) both reach accuracies near 90%.
Am I using the model.predict in the wrong way?
Your model is not overfitting. Steps 1 and 2 do not have to be implemented at all in order to solve your problem. In fact, it is even more wrong since the author states that in case of overfitting you need to add more layers, which is strongly advised against: when one has an overfitting model, the model needs to be made simpler, not more complex.
The solution to your issue lies in @Dr.Snoopy's answer: the order of the classes does not match.
My recommendation is to iterate manually through the entire test set and, for each image, get the ground truth and the prediction (make sure the exact same preprocessing used on the training images is applied to the test images before you feed them to your model).
Then, calculate your metrics. This will solve your problem.
For example, you could use the idea below:
import os
import cv2
import numpy as np

correctly_predicted = 0
for image in os.scandir(path_to_my_test_directory):
    image_path = image.path
    image = cv2.imread(image_path)
    image = apply_the_same_preprocessing_like_in_training(image)
    # transform from (H,W,3) to (1,H,W,3) because TF + Keras predict only on batches
    image = np.expand_dims(image, axis=0)
    prediction_label = np.argmax(model.predict(image))
    # ground_truth_label is a placeholder: look up the true class index of this image here
    if prediction_label == ground_truth_label:
        correctly_predicted += 1
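To check whether class order really is the culprit, one option is to compare the `class_indices` dictionaries (class name to index) of the two generators and translate the predictions before comparing them with the labels. The sketch below uses made-up dictionaries and predictions rather than the real generators, and both helper names are hypothetical.

```python
def remap_predictions(predicted, train_class_indices, test_class_indices):
    """Translate indices from the training label order into the test label order."""
    index_to_name = {index: name for name, index in train_class_indices.items()}
    return [test_class_indices[index_to_name[p]] for p in predicted]

def accuracy(predicted, truth):
    """Fraction of positions where prediction and ground truth agree."""
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)

# Made-up example in which the two generators disagree about the order.
train_class_indices = {'1': 0, '2': 1, '3': 2}
test_class_indices = {'3': 0, '2': 1, '1': 2}

predicted = [0, 1, 2, 0]  # argmax output, in training order: '1', '2', '3', '1'
truth = [2, 1, 0, 2]      # test labels, in test order:       '1', '2', '3', '1'

remapped = remap_predictions(predicted, train_class_indices, test_class_indices)
print(accuracy(predicted, truth))  # 0.25
print(accuracy(remapped, truth))   # 1.0
```

If the two dictionaries turn out to be identical, the mismatch has to come from somewhere else, for example different preprocessing or a different `color_mode` between the two generators.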

determine the Keras's Mnist input shape

I have xtrain.shape as
(60000, 28, 28)
It means 60000 channels with image size 28 * 28
I want to make a keras Sequential model.
specifying the model shape
model = Sequential()
model.add(Convolution2D(32,3,activation='relu',input_shape=(????)))
model.add(Dense(10, activation='relu'))
model.summary()
What should input_shape look like?
model = Sequential()
model.add(Dense(64,input_shape=(1,28,28)))
When I use this, I get the following error:
Error when checking input: expected dense_31_input to have 4 dimensions, but got array with shape (60000, 28, 28)
Why does this require 4 dimensions, and how can I fix it in the code?
"I have xtrain.shape as (60000, 28, 28). It means 60000 channels with image size 28 * 28"
Well, it certainly does not mean that; it means 60000 samples, not channels (MNIST is a single-channel dataset).
No need to re-invent the wheel in such cases - have a look at the MNIST CNN example in Keras:
import keras
from keras import backend as K
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Convolution2D, Dense

num_classes = 10  # ten digit classes in MNIST
# input image dimensions
img_rows, img_cols = 28, 28
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':  # Theano backend
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:  # Tensorflow backend
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
# normalise:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# your model:
model = Sequential()
model.add(Convolution2D(32,3,activation='relu',input_shape=input_shape))
model.add(Dense(10, activation='softmax'))  # change to softmax in the final layer
where you should also change the activation of the final layer to softmax (and most probably add some pooling and flatten layers before the final dense one).
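To see why pooling and flatten layers matter before the final dense layer, it can help to trace the per-sample shape by hand. A Dense layer acts only on the last axis, so without a flatten step the output would still carry the two spatial axes. The following is a plain-Python sketch of that arithmetic for 'valid' padding and stride 1. The helper names are hypothetical and nothing here calls Keras.

```python
def conv2d_valid(shape, filters, kernel_size):
    """Per-sample output shape of a 'valid' convolution with stride 1."""
    rows, cols, _ = shape
    return (rows - kernel_size + 1, cols - kernel_size + 1, filters)

def max_pool(shape, pool_size=2):
    """Per-sample output shape of non-overlapping max pooling."""
    rows, cols, channels = shape
    return (rows // pool_size, cols // pool_size, channels)

def flatten(shape):
    """Number of values per sample after flattening."""
    rows, cols, channels = shape
    return rows * cols * channels

shape = (28, 28, 1)                                     # one MNIST image, channels last
conv_shape = conv2d_valid(shape, filters=32, kernel_size=3)
pool_shape = max_pool(conv_shape)
units = flatten(pool_shape)

print(conv_shape)  # (26, 26, 32)
print(pool_shape)  # (13, 13, 32)
print(units)       # 5408
```

A dense layer with 10 units placed after the flatten step then yields one vector of 10 class scores per image.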
Try reshaping the data to (60000, 28, 28, 1) or (60000, 1, 28, 28), and pass the per-sample shape, without the 60000, as input_shape.
For the first one (channels last):
model = Sequential()
model.add(Convolution2D(32,3,activation='relu',input_shape=(28,28,1)))
model.add(Dense(10, activation='relu'))
model.summary()
For the second one (channels first):
model = Sequential()
model.add(Dense(64,input_shape=(1,28,28)))

how to choose LSTM 2-d input shape?

I am trying to feed a 1-D signal (1, 2000) with 22 features (22, 2000) into an LSTM.
(Each 1-D signal covers 10 seconds at a 200 Hz sampling rate.)
I have 808 batches: (808, 22, 2000).
I read that the LSTM receives a 3-D tensor of shape (batch_size, timestep, input_dim).
So is this the right input shape?
: (batch_size = 808, timestep = 2000, input_dim = 3)
here is my sample of code.
# data shape check
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(727, 22, 2000)
(81, 22, 2000)
(727, 2)
(81, 2)
# Model Config
inputshape = (808,2000,2) # 22 channels, 2000 samples
lstm_1_cell_num = 20
lstm_2_cell_num = 20
inputdrop_ratio = 0.2
celldrop_ratio = 0.2
# define model
model = Sequential()
model.add(LSTM(lstm_1_cell_num, input_shape=inputshape, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(20))
model.add(LSTM(lstm_2_cell_num, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='sigmoid'))
print(model.summary())
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
First, the input shape must be (22, 2000); the batch size should be given in the fit function. So try this:
inputshape = (22,2000)
model.fit(X_train, y_train,
batch_size=808,
epochs=epochs,
validation_data=(X_test,y_test),
shuffle=True)
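Note that an input shape of (22, 2000) makes the LSTM read 22 timesteps of 2000 features each. If the intent is instead 2000 timesteps of 22 channels, as the question describes, the last two axes would need to be swapped first. Here is a minimal sketch on a small zero array with the question's layout. `X_steps` is a hypothetical name, and the sample count of 4 only keeps the example light.

```python
import numpy as np

# Small stand-in with the question's layout: (samples, channels, time).
X_train = np.zeros((4, 22, 2000), dtype=np.float32)

# Swap the last two axes to get (samples, timesteps, features).
X_steps = np.transpose(X_train, (0, 2, 1))
input_shape = X_steps.shape[1:]  # per-sample shape for the first LSTM layer

print(X_steps.shape)  # (4, 2000, 22)
print(input_shape)    # (2000, 22)
```

A plain `reshape` to the same shape would not be equivalent, because it reinterprets the memory order and mixes values from different channels into one timestep.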
