Related
I am new in Machine learning, and I want to detect emotions from the face.
Preprocessing: I used equalizeHist to equalizes the histogram of grayscale images (JAFFE database with 213 images), in the goal to normalizes the brightness and increases the contrast of the image.
Feature extraction: I extract features with Gabors filters from images, and I get a matrix of 213x120. I split data: 60% train data and 40% test data, and I normalize it.
Train and test: for training the model, I use SVM classifier with an RBF kernel. With grid-search, I select the best couple C, gamma (using 10-fold cross-validation). Then, I test the performance of the model on the test data (unseen data), and I get 89% accuracy.
The problem is when I want to predict emotion from a new input face, I get a false result.. Is it an overfitting problem?
UPDATE 1: feature extraction
gf = GaborFilter(ksize=(11, 11), freq_nbr=5, or_nbr=8, lambd=4, sigma=8, gamma=5, psi=5*np.pi/6) #Gabor filter with *cv2.getGaborKernel*
data = []
for j in range(len(roi1_images)): # len(roi1_images) = 213
vect1 = feature_extraction_roi_avr(roi1_images[j], gf.kernels) # feature vect from left eye
vect2 = feature_extraction_roi_avr(roi2_images[j], gf.kernels) # feature vect from right eye (40 features)
vect3 = feature_extraction_roi_avr(roi3_images[j], gf.kernels) # feature vect from mouth (40 features)
vect = np.concatenate((vect1, vect2, vect3), axis=None) # feature vect of one face (40 features)
data.append(vect)
data = np.array(data) # Data matrix (213, 120)
UPDATE 2: Learn and test model
clf = GridSearchCV(estimator=SVC(kernel='rbf'), param_grid=svm_parameters, cv=10, n_jobs=-1) # grid search with 10fold cross validation
scaler = StandardScaler()
# Split data
X_train_, X_test_, y_train, y_test = train_test_split(data, data_labels, random_state=0, test_size=0.4, stratify=data_labels)
X_train = scaler.fit_transform(X_train) # Normalize train data
clf.fit(X_train, y_train) # fit SVM model
X_test = scaler.transform(X_test) # Normalize test data
score = clf.score(X_test, y_test) # calculate mean accuracy
print(score) # score accuracy = 0.8953488372093024
UPDATE 3: predict new input
landmarks, gs_image = detect_landmarks(path_image)
roi1_image = extract_roi(gs_image, landmarks, 1) #extract region 1 (eye 1)
roi2_image = extract_roi(gs_image, landmarks, 2) #extract region 2 (eye 2)
roi3_image = extract_roi(gs_image, landmarks, 3) #extract region 3 (eye 3)
vect1 = feature_extraction_roi_avr(roi1_image, gf.kernels)
vect2 = feature_extraction_roi_avr(roi2_image, gf.kernels)
vect3 = feature_extraction_roi_avr(roi3_image, gf.kernels)
vect = np.concatenate((vect1, vect2, vect3), axis=0)
vect = np.reshape(vect, (1,-1))
vect = scaler.transform(vect)
class_ = clf.predict(vect)[0]
print(class_,end=" ")
I am using the keras model with the following layers to predict a label of input (out of 4 labels)
embedding_layer = keras.layers.Embedding(MAX_NB_WORDS,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=False)
sequence_input = keras.layers.Input(shape = (MAX_SEQUENCE_LENGTH,),
dtype = 'int32')
embedded_sequences = embedding_layer(sequence_input)
hidden_layer = keras.layers.Dense(50, activation='relu')(embedded_sequences)
flat = keras.layers.Flatten()(hidden_layer)
preds = keras.layers.Dense(4, activation='softmax')(flat)
model = keras.models.Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(X_train, Y_train, batch_size=32, epochs=100)
However, the softmax function returns a number of outputs of 4 (because I have 4 labels)
When I'm using the predict function to get the predicted Y using the same model, I am getting an array of 4 for each X rather than one single label deciding the label for the input.
model.predict(X_test, batch_size = None, verbose = 0, steps = None)
How do I make the output layer of the keras model, or the model.predict function, decide on one single label, rather than output weights for each label?
The following is a common function to sample from a probability vector
def sample(preds, temperature=1.0):
# helper function to sample an index from a probability array
preds = np.asarray(preds).astype('float64')
preds = np.log(preds) / temperature
exp_preds = np.exp(preds)
preds = exp_preds / np.sum(exp_preds)
probas = np.random.multinomial(1, preds, 1)
return np.argmax(probas)
Taken from here.
The temperature parameter decides how much the differences between the probability weights are weightd. A temperature of 1 is considering each weight "as it is", a temperature larger than 1 reduces the differences between the weights, a temperature smaller than 1 increases them.
Here an example using a probability vector on 3 labels:
p = np.array([0.1, 0.7, 0.2]) # The first label has a probability of 10% of being chosen, the second 70%, the third 20%
print(sample(p, 1)) # sample using the input probabilities, unchanged
print(sample(p, 0.1)) # the new vector of probabilities from which to sample is [ 3.54012033e-09, 9.99996371e-01, 3.62508322e-06]
print(sample(p, 10)) # the new vector of probabilities from which to sample is [ 0.30426696, 0.36962778, 0.32610526]
To see the new vector make sample return preds.
I have trained an LSTM RNN classification model on Tensorflow. I was saving and restoring checkpoints to retrain and use the model for testing. Now I want to use Tensorflow serving so that I can use the model in production.
Initially, I would parse through a corpus to create my dictionary which is then used to map words in a string to integers. I would then store this dictionary in a pickle file which could be reloaded when restoring a checkpoint and retraining on a data set or just for using the model so that the mapping is consistent. How do I store this dictionary when saving the model using SavedModelBuilder?
My code for the neural network is as follows. The code for saving the model is towards the end (I am including an overview of the whole structure for context):
...
# Read files and store them in variables
with open('./someReview.txt', 'r') as f:
reviews = f.read()
with open('./someLabels.txt', 'r') as f:
labels = f.read()
...
#Pre-processing functions
#Parse through dataset and create a vocabulary
vocab_to_int, reviews = RnnPreprocessing.map_vocab_to_int(reviews)
with open(pickle_path, 'wb') as handle:
pickle.dump(vocab_to_int, handle, protocol=pickle.HIGHEST_PROTOCOL)
#More preprocessing functions
...
# Building the graph
lstm_size = 256
lstm_layers = 2
batch_size = 1000
learning_rate = 0.01
n_words = len(vocab_to_int) + 1
# Create the graph object
tf.reset_default_graph()
with tf.name_scope('inputs'):
inputs_ = tf.placeholder(tf.int32, [None, None], name="inputs")
labels_ = tf.placeholder(tf.int32, [None, None], name="labels")
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
#Create embedding layer LSTM cell, LSTM Layers
...
# Forward pass
with tf.name_scope("RNN_forward"):
outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state)
# Output. We are only interested in the latest output of the lstm cell
with tf.name_scope('predictions'):
predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
tf.summary.histogram('predictions', predictions)
#More functions for cost, accuracy, optimizer initialization
...
# Training
epochs = 1
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
iteration = 1
for e in range(epochs):
state = sess.run(initial_state)
for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
feed = {inputs_: x,
labels_: y[:, None],
keep_prob: 0.5,
initial_state: state}
summary, loss, state, _ = sess.run([merged, cost, final_state, optimizer], feed_dict=feed)
train_writer.add_summary(summary, iteration)
if iteration%1==0:
print("Epoch: {}/{}".format(e, epochs),
"Iteration: {}".format(iteration),
"Train loss: {:.3f}".format(loss))
if iteration%2==0:
val_acc = []
val_state = sess.run(cell.zero_state(batch_size, tf.float32))
for x, y in get_batches(val_x, val_y, batch_size):
feed = {inputs_: x,
labels_: y[:, None],
keep_prob: 1,
initial_state: val_state}
summary, batch_acc, val_state = sess.run([merged, accuracy, final_state], feed_dict=feed)
val_acc.append(batch_acc)
print("Val acc: {:.3f}".format(np.mean(val_acc)))
iteration +=1
test_writer.add_summary(summary, iteration)
#Saving the model
export_path = './SavedModel'
print ('Exporting trained model to %s'%(export_path))
builder = saved_model_builder.SavedModelBuilder(export_path)
# Build the signature_def_map.
classification_inputs = utils.build_tensor_info(inputs_)
classification_outputs_classes = utils.build_tensor_info(labels_)
classification_signature = signature_def_utils.build_signature_def(
inputs={signature_constants.CLASSIFY_INPUTS: classification_inputs},
outputs={
signature_constants.CLASSIFY_OUTPUT_CLASSES:
classification_outputs_classes,
},
method_name=signature_constants.CLASSIFY_METHOD_NAME)
legacy_init_op = tf.group(
tf.tables_initializer(), name='legacy_init_op')
#add the sigs to the servable
builder.add_meta_graph_and_variables(
sess, [tag_constants.SERVING],
signature_def_map={
signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
classification_signature
},
legacy_init_op=legacy_init_op)
print ("added meta graph and variables")
#save it!
builder.save()
print("model saved")
I am not entirely sure if this is the correct way to save a model such as this but this is the only implementation I have found in the documentation and online tutorials.
I haven't found any example or any explicit guide to saving the dictionary or how to use it when restoring a savedModel in the documentation.
When using checkpoints, I would just load the pickle file before running the session. How do I restore this savedModel so that I can use the same word to int mapping using the dictionary? Is there any specific way I should be saving the model or loading it?
I have also added inputs_ as the input for the input signature. This is a sequence of integeres 'after' the words have been mapped. I can't specify a string as input because I get an AttributeError: 'str' object has no attribute 'dtype' . In such cases, how exactly are words mapped to integers in models that are in production?
Implement your preprocessing using the utilities in tf.feature_column and it'll be straightforward to use the same mapping to integers in serving.
One approach to this is storing the vocabulary in the model's graph. This will then be shipped with the model.
...
vocab_table = lookup.index_table_from_file(vocabulary_file='data/vocab.csv', num_oov_buckets=1, default_value=-1)
text = features[commons.FEATURE_COL]
words = tf.string_split(text)
dense_words = tf.sparse_tensor_to_dense(words, default_value=commons.PAD_WORD)
word_ids = vocab_table.lookup(dense_words)
padding = tf.constant([[0, 0], [0, commons.MAX_DOCUMENT_LENGTH]])
# Pad all the word_ids entries to the maximum document length
word_ids_padded = tf.pad(word_ids, padding)
word_id_vector = tf.slice(word_ids_padded, [0, 0], [-1, commons.MAX_DOCUMENT_LENGTH])
Source: https://github.com/KishoreKarunakaran/CloudML-Serving/blob/master/text/imdb_cnn/model/cnn_model.py#L83
I'm trying to use keras model.fit_generator() to fit a model, below is my definition of the generator:
from sklearn.utils import shuffle
IMG_PATH_PREFIX = "./data/IMG/"
def generator(samples, batch_size=64):
num_samples = len(samples)
while 1: # Loop forever so the generator never terminates
shuffle(samples)
for offset in range(0, num_samples, batch_size):
batch_samples = samples[offset:offset+batch_size]
images = []
angles = []
for batch_sample in batch_samples:
name = IMG_PATH_PREFIX + batch_sample[0].split('/')[-1]
center_image = cv2.imread(name)
center_angle = float(batch_sample[3])
images.append(center_image)
angles.append(center_angle)
X_train = np.array(images)
y_train = np.array(angles)
#X_train = np.expand_dims(X_train, axis=0)
#y_train = np.expand_dims(y_train, axis=1)
print("X_train shape: ", X_train.shape, " y_train shape:", y_train.shape)
#print("X train: ", X_train)
yield X_train, y_train
train_generator = generator(train_samples, batch_size = 32)
validation_generator = generator(validation_samples, batch_size = 32)
Here the output shape is:
X_train shape: (32, 160, 320, 3) y_train shape: (32,)
The model fit code is:
model = Sequential()
#cropping layer
model.add(Cropping2D(cropping=((50,20), (1,1)), input_shape=(160,320,3), dim_ordering='tf'))
model.compile(loss = "mse", optimizer="adam")
model.fit_generator(train_generator, samples_per_epoch= len(train_samples), validation_data=validation_generator, nb_val_samples=len(validation_samples), nb_epoch=3)
Then I get the error message:
ValueError: Error when checking model target: expected cropping2d_6 to have 4 dimensions, but got array with shape (32, 1)
Could someone help let me know what's the issue?
The big question here is : do you know what you are trying to do ?
1) If you read here, the input is a 4D tensor and the output is ALSO a 4D tensor. Your target is a 2D tensor of shape (batch_size,1). So of course, when keras tries to compute the error between the output which has 3D (without batch dimension) and the target which has 1D (without batch dimension), it can not make sense out of that. Outputs and targets must have the same dimensions.
2) Do you know what cropping2D is actually doing ? It is cropping your images... So removing values at the beginning and end of your cropping dimensions. In your case you are outputing images of shape (90, 218, 3). This is not a prediction, there is no weight to train on this layer so no reason to fit the "model". Your model is just cropping images. No training needed for that.
I'm kind of lost in building up a stacked LSTM model for text classification in TensorFlow.
My input data was something like:
x_train = [[1.,1.,1.],[2.,2.,2.],[3.,3.,3.],...,[0.,0.,0.],[0.,0.,0.],
...... #I trained the network in batch with batch size set to 32.
]
y_train = [[1.,0.],[1.,0.],[0.,1.],...,[1.,0.],[0.,1.]]
# binary classification
The skeleton of my code looks like:
self._input = tf.placeholder(tf.float32, [self.batch_size, self.max_seq_length, self.vocab_dim], name='input')
self._target = tf.placeholder(tf.float32, [self.batch_size, 2], name='target')
lstm_cell = rnn_cell.BasicLSTMCell(self.vocab_dim, forget_bias=1.)
lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_ratio)
self.cells = rnn_cell.MultiRNNCell([lstm_cell] * self.num_layers)
self._initial_state = self.cells.zero_state(self.batch_size, tf.float32)
inputs = tf.nn.dropout(self._input, self.dropout_ratio)
inputs = [tf.reshape(input_, (self.batch_size, self.vocab_dim)) for input_ in
tf.split(1, self.max_seq_length, inputs)]
outputs, states = rnn.rnn(self.cells, inputs, initial_state=self._initial_state)
# We only care about the output of the last RNN cell...
y_pred = tf.nn.xw_plus_b(outputs[-1], tf.get_variable("softmax_w", [self.vocab_dim, 2]), tf.get_variable("softmax_b", [2]))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_pred, self._target))
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(self._target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)
init = tf.initialize_all_variables()
with tf.Session() as sess:
initializer = tf.random_uniform_initializer(-0.04, 0.04)
with tf.variable_scope("model", reuse=True, initializer=initializer):
sess.run(init)
# generate batches here (omitted for clarity)
print sess.run([train_op, loss, accuracy], feed_dict={self._input: batch_x, self._target: batch_y})
The problem is that no matter how large the dataset is, the loss and accuracy has no sign of improvement (looks completely stochastic). Am I doing anything wrong?
Update:
# First, load Word2Vec model in Gensim.
model = Doc2Vec.load(word2vec_path)
# Second, build the dictionary.
gensim_dict = Dictionary()
gensim_dict.doc2bow(model.vocab.keys(), allow_update=True)
w2indx = {v: k + 1 for k, v in gensim_dict.items()}
w2vec = {word: model[word] for word in w2indx.keys()}
# Third, read data from a text file.
for fname in fnames:
i = 0
with codecs.open(fname, 'r', encoding='utf8') as fr:
for line in fr:
tmp = []
for t in line.split():
tmp.append(t)
X_train.append(tmp)
i += 1
if i is samples_count:
break
# Fourth, convert words into vectors, and pad each sentence with ZERO arrays to a fixed length.
result = np.zeros((len(data), self.max_seq_length, self.vocab_dim), dtype=np.float32)
for rowNo in xrange(len(data)):
rowLen = len(data[rowNo])
for colNo in xrange(rowLen):
word = data[rowNo][colNo]
if word in w2vec:
result[rowNo][colNo] = w2vec[word]
else:
result[rowNo][colNo] = [0] * self.vocab_dim
for colPadding in xrange(rowLen, self.max_seq_length):
result[rowNo][colPadding] = [0] * self.vocab_dim
return result
# Fifth, generate batches and feed them to the model.
... Trivias ...
Here are few reasons it may not be training and suggestions to try:
You are not allowing to update word vectors, space of pre-learned vectors may be not working properly.
RNNs really need gradient clipping when trained. You can try adding something like this.
Unit scale initialization seems to work better, as it accounts for the size of the layer and allows gradient to be scaled properly as it goes deeper.
You should try removing dropout and second layer - just to check if your data passing is correct and your loss is going down at all.
I also can recommend trying this example with your data: https://github.com/tensorflow/skflow/blob/master/examples/text_classification.py
It trains word vectors from scratch, already has gradient clipping and uses GRUCells which usually are easier to train. You can also see nice visualizations for loss and other things by running tensorboard logdir=/tmp/tf_examples/word_rnn.