Text classification using Keras: How to add custom features?

I'm writing a program to classify texts into a few classes. Right now, the program loads the train and test samples of word indices, applies an embedding layer and a convolutional layer, and classifies them into the classes. I'm trying to add handcrafted features for experimentation, as in the following code. features is a list of two elements: the first element holds the features for the training data and the second holds the features for the test data. Each training/test sample has a corresponding feature vector (i.e. these are per-sample features, not word features).
model = Sequential()
model.add(Embedding(params.nb_words,
                    params.embedding_dims,
                    weights=[embedding_matrix],
                    input_length=params.maxlen,
                    trainable=params.trainable))
model.add(Convolution1D(nb_filter=params.nb_filter,
                        filter_length=params.filter_length,
                        border_mode='valid',
                        activation='relu'))
model.add(Dropout(params.dropout_rate))
model.add(GlobalMaxPooling1D())
# Adding hand-picked features
model_features = Sequential()
nb_features = len(features[0][0])
model_features.add(Dense(1,
                         input_shape=(nb_features,),
                         init='uniform',
                         activation='relu'))
model_final = Sequential()
model_final.add(Merge([model, model_features], mode='concat'))
model_final.add(Dense(len(citfunc.funcs), activation='softmax'))
model_final.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])
print model_final.summary()
model_final.fit([x_train, features[0]], y_train,
                nb_epoch=params.nb_epoch,
                batch_size=params.batch_size,
                class_weight=data.get_class_weights(x_train, y_train))
y_pred = model_final.predict([x_test, features[1]])
My question is, is this code correct? Is there any conventional way of adding features to each of the text sequences?

Try:
input = Input(shape=(params.maxlen,))
embedding = Embedding(params.nb_words,
                      params.embedding_dims,
                      weights=[embedding_matrix],
                      input_length=params.maxlen,
                      trainable=params.trainable)(input)
conv = Convolution1D(nb_filter=params.nb_filter,
                     filter_length=params.filter_length,
                     border_mode='valid',
                     activation='relu')(embedding)
drop = Dropout(params.dropout_rate)(conv)
seq_features = GlobalMaxPooling1D()(drop)
# Adding hand-picked features
nb_features = len(features[0][0])
other_features = Input(shape=(nb_features,))
merged = merge([seq_features, other_features], mode='concat')
predictions = Dense(len(citfunc.funcs), activation='softmax')(merged)
model_final = Model([input, other_features], predictions)
model_final.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])
In this case you merge the features coming from the sequence analysis with your custom features directly, without first squashing all of the custom features down to a single value with a Dense(1) layer.
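If you are on a newer Keras 2.x release, where the lowercase merge function and the nb_filter/filter_length/border_mode arguments no longer exist, a roughly equivalent sketch would look like the following (it reuses params.*, embedding_matrix, nb_features and citfunc.funcs from the question, so treat those names as assumptions):
from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, Dropout, GlobalMaxPooling1D, Dense, concatenate
seq_input = Input(shape=(params.maxlen,))
x = Embedding(params.nb_words, params.embedding_dims,
              weights=[embedding_matrix],
              input_length=params.maxlen,
              trainable=params.trainable)(seq_input)
x = Conv1D(filters=params.nb_filter, kernel_size=params.filter_length,
           padding='valid', activation='relu')(x)
x = Dropout(params.dropout_rate)(x)
seq_features = GlobalMaxPooling1D()(x)
other_features = Input(shape=(nb_features,))
# concatenate the pooled sequence features with the per-sample handcrafted features
merged = concatenate([seq_features, other_features])
predictions = Dense(len(citfunc.funcs), activation='softmax')(merged)
model_final = Model(inputs=[seq_input, other_features], outputs=predictions)
model_final.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])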

Related

Combining Classification and Regression in a sequential way using MLP

I am looking for a way to do classification and regression sequentially.
For example, assume samples have 3 input values and 1 output value. The model should first classify using the 3 input values and then do the regression task using the classification output as an additional input (i.e. classification takes the 3 input values from the original samples, and regression takes 4 input values: the 3 original values plus the classification output).
Below is the architecture I drew up. However, I am not really sure about the part where the second input layer occurs. Could someone give advice or a working example for this application?
input1_classification = Input(shape=(3,))
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
# classification
out_classification = Dense(2, activation='softmax')(hidden1)
# regression input
input1_regression = Input(shape=(5,))
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(out_classification)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy', 'mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class, y_train_reg], epochs=150, batch_size=32, verbose=2)
All you need to do is concatenate your original input with the output of the classification part and apply your regression model there; you do not need to specify any "extra" inputs.
So it will become something along the lines of:
input1_classification = Input(shape=(3,))
# classification
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(input1_classification)
out_classification = Dense(2, activation='softmax')(hidden1)
# regression: concatenate the original input with the classification output
new_input = Concatenate(axis=1)([input1_classification, out_classification])
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(new_input)
out_reg_final = Dense(1)(hidden2)
# define model
model = Model(inputs=input1_classification, outputs=[out_classification, out_reg_final])
# compile the keras model
model.compile(loss=['sparse_categorical_crossentropy', 'mse'], optimizer='adam')
# fit the keras model on the dataset
model.fit(X_train, [y_train_class, y_train_reg], epochs=150, batch_size=32, verbose=2)
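If one of the two tasks ends up dominating training, the two losses can also be weighted against each other; loss_weights is a standard argument of Model.compile, and the values below are purely illustrative:
# Optional: weight the classification and regression losses (1.0 / 0.5 are illustrative, not tuned)
model.compile(loss=['sparse_categorical_crossentropy', 'mse'],
              loss_weights=[1.0, 0.5],
              optimizer='adam')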

Properly declaring input_shape for neural network in Keras?

I am attempting to write code to identify data types after loading the data from CSV files. There are 5 possible labels, and the feature vector is a list of lists, where each inner list has the following layout:
[slash_count, dash_count, colon_count, letters, dot_count, digits]
I then split my feature and label vectors into training, testing, and validation sets. I found some code on Stack Overflow that someone wrote to do this and I have used the same:
X_train, X_test, y_train, y_test = train_test_split(ml_list, labels, test_size=0.3, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=1)
After doing this, I normalize the features to the [0, 1] range and then create the categorical variables for the labels:
min_max_scaler = preprocessing.MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)
X_test_minmax = min_max_scaler.fit_transform(X_test)
X_val_minmax = min_max_scaler.fit_transform(X_val)
from keras.utils import to_categorical
y_train_minmax = to_categorical(y_train)
y_test_minmax = to_categorical(y_test)
y_val_minmax = to_categorical(y_val)
Next, I attempt to find the shape of the newly recoded variables:
print(X_train_minmax.shape) #(91366, 6)
print(X_test_minmax.shape) #(55939, 6)
print(X_val_minmax.shape) #(39157, 6)
print(y_train_minmax.shape) #(91366, 4)
print(y_test_minmax.shape) #(55939, 4)
print(y_val_minmax.shape) #(39157, 4)
Finally, I build the model and attempt to fit it:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(91366, 6)))
model.add(layers.Dense(3, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train_minmax, y_train_minmax, epochs=5, batch_size=128)
I get this message when I run the code:
ValueError: Error when checking input: expected dense_1_input to have 3 dimensions, but got array with shape (91366, 6)
I believe the error is in how I specify the input shape when I create the neural network. I am having a hard time understanding where I went wrong. Any help would be great!
You should change this line:
model.add(layers.Dense(512, activation='relu', input_shape=(6,)))
In Keras you don't need to specify the number of examples in your dataset; as input_shape you only provide the shape of a single data point.
Another potential error which I spotted in your code snippet is that you should set:
model.add(layers.Dense(4, activation='softmax'))
since a single output data point has a shape of (4,). This is not consistent with the 5 possible labels you mentioned, so I'd also advise rechecking your data.
Another possible mistake I spotted is that you are fitting a separate scaler on each of the train, test and validation sets. You should fit a single scaler on the training set only, and then use that fitted scaler to transform the other datasets.
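A minimal sketch of that scaler usage, reusing the variable names from the question:
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
# fit the scaler on the training data only...
X_train_minmax = min_max_scaler.fit_transform(X_train)
# ...then reuse the fitted scaler to transform the other splits
X_test_minmax = min_max_scaler.transform(X_test)
X_val_minmax = min_max_scaler.transform(X_val)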

Classification with Keras Autoencoders

I'm trying to train a vanilla autoencoder using Keras (with a TensorFlow backend) and stop it when the loss value converges to a specific value. After the last epoch, I want to use a sigmoid function to perform classification. Would you know how to go about doing this (or at least point me in the right direction)?
The below code is quite similar to the vanilla autoencoder at http://wiseodd.github.io/techblog/2016/12/03/autoencoders/. (I'm using my own data, but feel free to use the MNIST example in the link to demonstrate what you are talking about.)
NUM_ROWS = len(x_train)
NUM_COLS = len(x_train[0])
inputs = Input(shape=(NUM_COLS, ))
h = Dense(64, activation='sigmoid')(inputs)
outputs = Dense(NUM_COLS)(h)
# trying to add last sigmoid layer
outputs = Dense(1)
outputs = Activation('sigmoid')
model = Model(input=inputs, output=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch,
          epochs=epochs,
          validation_data=(x_test, y_test))
I have an interpretation of what you are aiming at; however, you don't seem to have a very clear picture of it yourself.
I guess you can clarify this once you prepare the necessary dataset yourself.
One possible solution would be as below:
NUM_ROWS = len(x_train)
NUM_COLS = len(x_train[0])
inputs = Input(shape=(NUM_COLS, ))
h = Dense(64, activation='sigmoid')(inputs)
outputs = Dense(NUM_COLS)(h)
model = Model(input=inputs, output=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, x_train,
batch_size=batch,
epochs=epochs,
validation_data=(x_test, y_test))
h.trainable=False
# trying to add last sigmoid layer
outputs = Dense(1)(h)
outputs = Activation('sigmoid')
model2.fit(x_train, y_train,
batch_size=batch,
epochs=epochs,
validation_data=(x_test, y_test))
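As for stopping once the loss converges to a specific value: one way to do that is a small custom callback (a sketch; the 0.05 threshold below is just an assumed example), which you pass to fit via callbacks=[...]:
from keras.callbacks import Callback
class StopAtLoss(Callback):
    """Stop training once the training loss drops below a chosen threshold."""
    def __init__(self, threshold):
        super(StopAtLoss, self).__init__()
        self.threshold = threshold
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        loss = logs.get('loss')
        if loss is not None and loss <= self.threshold:
            self.model.stop_training = True
# e.g. model.fit(x_train, x_train, epochs=epochs, callbacks=[StopAtLoss(0.05)])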

How to calculate prediction uncertainty using Keras?

I would like to calculate NN model certainty/confidence (see What my deep model doesn't know) - when the NN tells me an image represents "8", I would like to know how certain it is. Is my model 99% certain it is "8", or is it 51% it is "8" but it could also be "6"? Some digits are quite ambiguous and I would like to know for which images the model is just "flipping a coin".
I have found some theoretical writings about this, but I have trouble putting them into code. If I understand correctly, I should evaluate a testing image multiple times while "killing off" different neurons (using dropout) and then...?
Working on the MNIST dataset, I am running the following model:
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, Flatten, Dropout
model = Sequential()
model.add(Conv2D(128, kernel_size=(7, 7),
                 activation='relu',
                 input_shape=(28, 28, 1,)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Dropout(0.20))
model.add(Flatten())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(units=10, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(train_data, train_labels, batch_size=100, epochs=30, validation_data=(test_data, test_labels,))
How should I predict with this model so that I get its certainty about predictions too? I would appreciate some practical examples (preferably in Keras, but any will do).
To clarify, I am looking for an example of how to get certainty using the method outlined by Yarin Gal (or an explanation of why some other method yields better results).
If you want to implement the dropout approach to measuring uncertainty, you should do the following:
Implement a function which applies dropout during test time as well:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])
Use this function as an uncertainty predictor, e.g. in the following manner:
def predict_with_uncertainty(f, x, n_iter=10):
    result = numpy.zeros((n_iter,) + x.shape)
    for iter in range(n_iter):
        result[iter] = f(x, 1)
    prediction = result.mean(axis=0)
    uncertainty = result.var(axis=0)
    return prediction, uncertainty
Of course, you may use any other function to compute the uncertainty.
Made a few changes to the top voted answer. Now it works for me.
It's a way to estimate model uncertainty. For other sources of uncertainty, I found https://eng.uber.com/neural-networks-uncertainty-estimation/ helpful.
import numpy as np
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])
def predict_with_uncertainty(f, x, n_iter=10):
    result = []
    for i in range(n_iter):
        result.append(f([x, 1]))
    result = np.array(result)
    prediction = result.mean(axis=0)
    uncertainty = result.var(axis=0)
    return prediction, uncertainty
Your model uses a softmax activation, so the simplest way to obtain some kind of uncertainty measure is to look at the output softmax probabilities:
probs = model.predict(some input data)[0]
The probs array will then be a 10-element vector of numbers in the [0, 1] range that sum to 1.0, so they can be interpreted as probabilities. For example, the probability for digit 7 is just probs[7].
Then with this information you can do some post-processing; typically the predicted class is the one with the highest probability, but you can also look at the class with the second-highest probability, etc.
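As a small illustration of that post-processing (a sketch: some_input_data stands in for whatever input you feed the model, and the margin and entropy measures are just two common heuristics, not something prescribed above):
import numpy as np
probs = model.predict(some_input_data)[0]         # 10 softmax probabilities
predicted_class = int(np.argmax(probs))           # most likely digit
top2 = np.sort(probs)[-2:]                        # two highest probabilities
margin = top2[1] - top2[0]                        # small margin -> "flipping a coin"
entropy = -np.sum(probs * np.log(probs + 1e-12))  # high entropy -> uncertain prediction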
A simpler way is to set training=True on any dropout layers you want to run during inference as well (this essentially tells the layer to operate as if it were always in training mode, so dropout is applied during both training and inference).
import keras
inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)
model = keras.Model(inputs, outputs)
The code above is from this issue.
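A short usage sketch of such a model (assuming an input array x of shape (n_samples, 10) to match the toy model above): running prediction several times and looking at the spread gives a dropout-based uncertainty estimate, much like the functions in the earlier answers:
import numpy as np
def mc_dropout_predict(model, x, n_iter=10):
    # each call is stochastic because the Dropout layer runs with training=True
    preds = np.stack([model.predict(x) for _ in range(n_iter)])
    return preds.mean(axis=0), preds.var(axis=0)
mean_pred, uncertainty = mc_dropout_predict(model, x)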

Keras: How to feed input directly into other hidden layers of the neural net than the first?

I have a question about Keras, which I'm rather new to. I'm using a convolutional neural net that feeds its results into a standard perceptron layer, which generates my output. This CNN is fed with a series of images. This is so far quite normal.
Now I would like to pass a short non-image input vector directly into the last perceptron layer without sending it through all the CNN layers. How can this be done in Keras?
My code looks like this:
# last CNN layer before perceptron layer
model.add(Convolution2D(200, 2, 2, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))
# perceptron layer
model.add(Flatten())
# here I would like to add an additional vector directly to the input coming from the CNN
model.add(Dense(1500, W_regularizer=l2(1e-3)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
Any answers are greatly appreciated, thanks!
You didn't show me which kind of model you use, but I assume you initialized it as Sequential. In a Sequential model you can only stack one layer after another, so adding a "shortcut" connection is not possible.
For this reason the authors of Keras added the option of building "graph" models, in which you build a graph (DAG) of your computations. It's a bit more complicated than designing a stack of layers, but still quite easy.
Check the documentation site for more details.
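A minimal functional-API sketch of such a shortcut connection (the layer sizes mirror the question's snippet, while the image shape and the length of the extra vector are assumptions):
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Concatenate
from keras.regularizers import l2
image_input = Input(shape=(64, 64, 3))  # assumed image shape
aux_input = Input(shape=(10,))          # the short non-image vector (assumed length)
x = Conv2D(200, (2, 2), padding='same', activation='relu')(image_input)
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
# concatenate the CNN features with the extra vector before the perceptron layer
x = Concatenate()([x, aux_input])
x = Dense(1500, kernel_regularizer=l2(1e-3), activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(1)(x)
model = Model(inputs=[image_input, aux_input], outputs=output)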
Provided your Keras backend is Theano, you can do the following:
import theano
import numpy as np
# I've joined the activation and dense layers, based on the assumption
# that you might be interested in post-activation values
d = Dense(1500, W_regularizer=l2(1e-3), activation='relu')
model.add(d)
model.add(Dropout(0.5))
model.add(Dense(1))
c = theano.function([d.get_input(train=False)], d.get_output(train=False))
# refer to d.input_shape to get the proper dimensions of the layer's input;
# in my case it was (None, 20000)
layer_input_data = np.random.random((1, 20000)).astype('float32')
o = c(layer_input_data)
The answer here works. It is more high-level and also works with the TensorFlow backend:
input_1 = Input(input_shape)
input_2 = Input(input_shape)
merged = merge([input_1, input_2], mode="concat")  # could also be "sum", "dot", etc.
hidden = Dense(hidden_dims)(merged)
classify = Dense(output_dims, activation="softmax")(hidden)
model = Model(input=[input_1, input_2], output=classify)
