Unable to Restore Graph

Problem:
Attempting to restore meta_graph via tf.train.import_meta_graph("saved_models/model.meta") gives the following error:
InvalidArgumentError (see above for traceback): Shape [-1] has negative dimensions
[[Node: Placeholder_2 = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
The shapes of placeholder and the data passed are as follows:
placeholder: (?, ?, 50)
data: (1, 2, 50)
Code Involved:
Placeholder involved: self.x_placeholder_input = tf.placeholder(tf.float32, shape=[None, None, n_inputs])
The other placeholder (label): self.y_placeholder_label = tf.placeholder(tf.int32, shape=[None, self.num_of_classes])
Predict method:
def predict(self):
    with tf.name_scope("predict"):
        with tf.Session(graph=tf.Graph()) as sess:
            saver = tf.train.import_meta_graph("saved_models/model.meta")
            saver.restore(sess, "saved_models/model")
            graph = tf.get_default_graph()
            output = graph.get_tensor_by_name("optimize/cal_loss/model_network/model_network_NN_network/output/BiasAdd:0")
            x_placeholder = graph.get_tensor_by_name("Placeholder:0")
            print x_placeholder.shape
            print np.array(self.data_x).shape
            print sess.run(output, feed_dict={x_placeholder: self.data_x})
Train Method:
def train(self):
    writer = tf.summary.FileWriter("mygraph/logs", tf.get_default_graph())
    num_of_epoch = 10
    with tf.Session() as sess:
        for epoch in range(num_of_epoch):
            # initialise all variables
            optimize = self.optimize
            sess.run(tf.global_variables_initializer())
            sess.run(optimize,
                     feed_dict={self.x_placeholder_input: np.array(self.data_x),
                                self.y_placeholder_label: np.array(self.data_y),
                                self.sq_placeholder_seq_length: np.array(self.seq_length)})
            if num_of_epoch % 10 == 0:
                # Create Saver to save model
                print "Cycle " + str(epoch) + " out of " + str(num_of_epoch) + " done"
                saver = tf.train.Saver()
                location = saver.save(sess, "saved_models/model")
                print "Model saved to : " + str(location)
Question: Is the problem due to the placeholder having two None dimensions in its shape? Training works fine, though.
Full Code (if it helps):
(https://gist.github.com/duemaster/660208e6cd7856af2522c2efa67911da)

In my experience, you get this error because you are not feeding the value of that placeholder in the restored model. You'd normally expect the error "You must feed a value for placeholder tensor XX" when you forget to feed one, but I've noticed that when placeholders have None in their shape vector (with restored models), the error is the one you are getting about negative dimensions. I've hit this error even with a single None in the placeholder shape, and properly feeding its value solved the problem.
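For reference, here is a minimal sketch of what that looks like for the predict method above. The tensor names and the dummy feed values are assumptions based on the question's error message and shapes; the key point is to enumerate the placeholders in the restored graph and feed every one of them:

import numpy as np
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    saver = tf.train.import_meta_graph("saved_models/model.meta")
    saver.restore(sess, "saved_models/model")
    graph = tf.get_default_graph()

    # List every placeholder so none is silently left unfed.
    placeholders = [op.name for op in graph.get_operations() if op.type == "Placeholder"]
    print(placeholders)

    output = graph.get_tensor_by_name(
        "optimize/cal_loss/model_network/model_network_NN_network/output/BiasAdd:0")
    x_placeholder = graph.get_tensor_by_name("Placeholder:0")
    # "Placeholder_2" is the int32, shape [?] placeholder from the error message;
    # in the question's model it is presumably the sequence-length input.
    seq_length_placeholder = graph.get_tensor_by_name("Placeholder_2:0")

    data_x = np.random.rand(1, 2, 50).astype(np.float32)  # shape (1, 2, 50) as in the question
    seq_length = np.array([2], dtype=np.int32)            # dummy value, one entry per batch element

    print(sess.run(output, feed_dict={x_placeholder: data_x,
                                      seq_length_placeholder: seq_length}))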

Related

keras neural network predicts the same number for every handwritten digit

I am new to machine learning, so as a first project I've tried to build a handwritten digit recognition neural network based on the MNIST dataset, and when I test it with the test images provided by the dataset itself, it seems to work pretty well (that's what the function test_predict is for). Now I would like to step it up and have the network recognise some actual handwritten digits that I've taken photos of.
The function partial_img_rec takes an image containing multiple digits, and it is called by multiple_digits. I know it might seem weird that I use recursion here, and I'm sure there are more efficient ways to do this, but that's not the matter. In order to test partial_img_rec I provide some photos of individual digits that are stored in the folder .\individual_test, and they all look something like this:
The problem is: my neural network's prediction for every single one of my test images is "5". The probability is always about 22%, no matter the actual digit displayed. I totally get why the results are not as great as those achieved with the MNIST dataset's test images, but I certainly didn't expect this. Do you have any idea why this is happening? Any advice is welcome.
Thank you in advance.
Here's my code (edited, now working):
# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from keras.utils import np_utils
# numpy is necessary since keras uses numpy arrays
import numpy as np
# imports for pictures
from PIL import Image
from PIL import ImageOps
# imports for tests
import random
import os

class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = np_utils.to_categorical(y_train)
        y_test = np_utils.to_categorical(y_test)
        num_classes = y_test.shape[1]
        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # Compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test

    def test_all(self):
        """ evaluates the success rate using all the test data """
        scores = self.model.evaluate(self.test_img, self.test_res, verbose=0)
        print("Baseline Error: %.2f%%" % (100 - scores[1] * 100))

    def predict_result(self, img, num_pixels=None, show=False):
        """ predicts the number in a picture (vector) """
        assert type(img) == np.ndarray and img.shape == (784,)
        """if show:
            # show the picture!!!! some problem here
            plt.imshow(img, cmap='Greys')
            plt.show()"""
        num_pixels = img.shape[0]
        # the actual number
        res_number = np.argmax(self.model.predict(img.reshape(-1, num_pixels)), axis=1)
        # the probabilities
        res_probabilities = self.model.predict(img.reshape(-1, num_pixels))
        return (res_number[0], res_probabilities.tolist()[0])  # we only need the first element since they only have one

    def test_predict(self, amount_test=100):
        """ test some random numbers from the test part of the data set """
        assert type(amount_test) == int and amount_test <= 10000
        cnt_right = 0
        cnt_wrong = 0
        for i in range(amount_test):
            ind = random.randrange(0, 10000)  # there are 10000 images in the test part of the data set
            """ correct_res is the actual result stored in the data set
                It's represented as a list of 10 elements one of which being 1, the rest 0 """
            correct_list = self.test_res.tolist()
            correct_list = correct_list[ind]  # the correct sublist
            correct_res = correct_list.index(1.0)
            predicted_res = self.predict_result(self.test_img[ind])[0]
            if correct_res != predicted_res:
                cnt_wrong += 1
                print("Error in predict ! \
                      index = ", ind, " predicted result = ", predicted_res, " correct result = ", correct_res)
            else:
                cnt_right += 1
        print("The machine predicted correctly ", cnt_right, " out of ", amount_test, " examples. That is a success rate of ", (cnt_right / amount_test) * 100, "%.")

    def partial_img_rec(self, image, upper_left, lower_right, results=[]):
        """ partial is a part of an image """
        left_x, left_y = upper_left
        right_x, right_y = lower_right
        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results
        partial = image.crop((left_x, left_y, right_x, right_y))
        # rescale image to 28 * 28 dimension
        partial = partial.resize((28, 28), Image.ANTIALIAS)
        partial.show()
        # transform to vector
        partial = ImageOps.invert(partial)
        partial = np.asarray(partial, "float32")
        partial = partial / 255.
        partial[partial < 0.5] = 0.
        # flatten image to 28*28 = 784 vector
        num_pixels = partial.shape[0] * partial.shape[1]
        partial = partial.reshape(num_pixels)
        step = height // 10
        # is there a number in this part of the image?
        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is >= 50% sure
        if prop[res] >= 0.5:
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height)
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if there is no number found we take smaller steps
            step = height // 20
        print("step: ", step)
        # recursive call with modified positions ( move on step variables )
        return self.partial_img_rec(image, (left_x + step, left_y), (right_x + step, right_y), results=results)

    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (square shaped images)
            saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0, 0
        folder_content = os.listdir(".\individual_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            image = Image.open(".\\individual_test\\" + imageName).convert("L")
            # only square images in this test
            if image.size[0] != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size, ". It has to be a square.")
                continue
            predicted_res = self.partial_img_rec(image, (0, 0), (image.size[0], image.size[1]), results=[])
            if predicted_res == []:
                print("No prediction possible for ", imageName)
            else:
                predicted_res = predicted_res[0]
                if predicted_res != correct_res:
                    print("error in partial_img-rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                    cnt_wrong += 1
                else:
                    cnt_right += 1
                    print("correctly predicted ", imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong, " digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100, " %.")

    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """
        # assert type(img) == myImage
        width, height = img.size
        # start with the first quadratic part of the image
        res_list = self.partial_img_rec(img, (0, 0), (height, height))
        res_str = ""
        for elem in res_list:
            res_str += str(elem)
        return res_str

network = mnist_network()
network.test_individual_digits()
EDIT
@Geecode's answer was very helpful, and the network now correctly predicts some of the pictures, including the one shown above. Yet the overall success rate is lower than 50%. Do you have any ideas how to improve this?
Examples for images returning bad results:
Nothing is wrong with your image in itself; your model can classify it correctly.
The issue is that you performed a floor division on your partial:
partial = partial // 255
which results in 0 for every pixel value below 255, so you always get an (almost entirely) black image.
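Here is a tiny standalone illustration of the difference (the sample pixel values are arbitrary):

import numpy as np

pixels = np.array([0., 128., 254., 255.], dtype="float32")
print(pixels // 255)   # [0. 0. 0. 1.] -> everything below 255 floors to 0 (a black image)
print(pixels / 255.)   # [0. 0.50196078 0.99607843 1.] -> values scaled into [0, 1]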
You have to do a "normal" division and some preparation, because your model was trained on negative images with black (i.e. 0-valued) pixel backgrounds:
# transform to vector
partial = ImageOps.invert(partial)
partial = np.asarray(partial, "float32")
partial = partial / 255.
partial[partial < 0.5] = 0.
After that, your model will classify it correctly:
Out:
result: 1 . probabilities: [0.000431705528171733, 0.7594985961914062, 0.0011404436081647873, 0.00018972357793245465, 0.03162384033203125, 0.008697531186044216, 0.0014472954208031297, 0.18429973721504211, 0.006838776171207428, 0.005832481198012829]
found valid result
Note that you can of course experiment further with the image preparation; that was not the purpose of this answer.
Update:
For my detailed answer on how to achieve better performance in this task, see here.

Unable to restore variables of Adam Optimizer while using tf.train.Saver

I get following errors when I try to restore a saved model in tensorflow:
W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key out_w/Adam_5 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key b1/Adam not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key b1/Adam_4 not found in checkpoint
I guess I am unable to save Variables of Adam Optimizer.
Any fix?
Consider this small experiment:
import tensorflow as tf

def simple_model(X):
    with tf.variable_scope('Layer1'):
        w1 = tf.get_variable('w1', initializer=tf.truncated_normal((5, 2)))
        b1 = tf.get_variable('b1', initializer=tf.ones((2)))
        layer1 = tf.matmul(X, w1) + b1
        return layer1

def simple_model2(X):
    with tf.variable_scope('Layer1'):
        w1 = tf.get_variable('w1_x', initializer=tf.truncated_normal((5, 2)))
        b1 = tf.get_variable('b1_x', initializer=tf.ones((2)))
        layer1 = tf.matmul(X, w1) + b1
        return layer1

tf.reset_default_graph()
X = tf.placeholder(tf.float32, shape=(None, 5))
model = simple_model(X)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, './Checkpoint', global_step=0)

tf.reset_default_graph()
X = tf.placeholder(tf.float32, shape=(None, 5))
model = simple_model(X)    # Case 1
# model = simple_model2(X) # Case 2
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.Saver().restore(sess, tf.train.latest_checkpoint('.'))
In Case 1, everything works fine. But in Case 2, you will get errors like Key Layer1/b1_x not found in checkpoint, which is because the variable names in the model are different (though the shapes and datatypes of both variables are the same). Ensure that the variables have the same names in the model into which you are restoring.
To check the names of the variables present in the checkpoint, check this answer.
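For convenience, here is a minimal sketch using tf.train.NewCheckpointReader, assuming the checkpoint produced by the experiment above:

import tensorflow as tf

reader = tf.train.NewCheckpointReader(tf.train.latest_checkpoint('.'))
# Maps each stored variable name to its shape, e.g. {'Layer1/w1': [5, 2], 'Layer1/b1': [2]}
print(reader.get_variable_to_shape_map())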
This can also happen when you are not training every variable simultaneously, since then only some of the Adam parameters are available in a checkpoint.
One possible fix is to "reset" Adam after loading the checkpoint. To do this, filter out the Adam-related variables when creating the saver:
vl = [v for v in tf.global_variables() if "Adam" not in v.name]
saver = tf.train.Saver(var_list=vl)
Make sure to initialize global variables afterwards.
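Putting it together, the restore-then-reset flow might look like the following sketch (the checkpoint path is an assumption; Adam's beta1_power/beta2_power counters may need the same treatment if they are also missing from the checkpoint):

import tensorflow as tf

# Restore only the non-Adam variables...
vl = [v for v in tf.global_variables() if "Adam" not in v.name]
saver = tf.train.Saver(var_list=vl)

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    # ...then initialize the Adam slot variables that were filtered out.
    adam_vars = [v for v in tf.global_variables() if "Adam" in v.name]
    sess.run(tf.variables_initializer(adam_vars))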

Is there any difference between the two codes?

I am currently still relatively new to Tensorflow. I am having some trouble with these two pieces of code.
Code A:
self.h1_layer = tf.layers.dense(self.x, self.n_nodes_hl1, activation=tf.nn.relu, name="h1")
self.h2_layer = tf.layers.dense(self.h1_layer, self.n_nodes_hl2, activation=tf.nn.relu, name="h2")
self.h3_layer = tf.layers.dense(self.h2_layer, self.n_nodes_hl3, activation=tf.nn.relu, name="h3")
self.logits = tf.layers.dense(self.h3_layer, self.num_of_classes, name="output")
Code B:
self.hidden_1_layer = {
    'weights': tf.Variable(tf.random_normal([self.num_of_words, self.h1])),
    'biases': tf.Variable(tf.random_normal([self.h1]))
}
self.hidden_2_layer = {
    'weights': tf.Variable(tf.random_normal([self.h1, self.h2])),
    'biases': tf.Variable(tf.random_normal([self.h2]))
}
self.hidden_3_layer = {
    'weights': tf.Variable(tf.random_normal([self.h2, self.h3])),
    'biases': tf.Variable(tf.random_normal([self.h3]))
}
self.final_output_layer = {
    'weights': tf.Variable(tf.random_normal([self.h3, self.num_of_classes])),
    'biases': tf.Variable(tf.random_normal([self.num_of_classes]))
}
layer1 = tf.add(tf.matmul(data, self.hidden_1_layer['weights']), self.hidden_1_layer['biases'])
layer1 = tf.nn.relu(layer1)
layer2 = tf.add(tf.matmul(layer1, self.hidden_2_layer['weights']), self.hidden_2_layer['biases'])
layer2 = tf.nn.relu(layer2)
layer3 = tf.add(tf.matmul(layer2, self.hidden_3_layer['weights']), self.hidden_3_layer['biases'])
layer3 = tf.nn.relu(layer3)
output = tf.matmul(layer3, self.final_output_layer['weights']) + self.final_output_layer['biases']
Are they the same thing? Can the weights and biases of both Code A and Code B be saved with tf.train.Saver()?
Thanks
Edit:
I am facing issues using Code A to generate predictions. It seems that the logits of Code A are always changing.
The full code:
import tensorflow as tf
import os
from utils import Utils as utils

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

class Neural_Network:
    # Neural Network Setup
    num_of_epoch = 50
    n_nodes_hl1 = 500
    n_nodes_hl2 = 500
    n_nodes_hl3 = 500

    def __init__(self):
        self.num_of_classes = utils.get_num_of_classes()
        self.num_of_words = utils.get_num_of_words()
        # placeholders
        self.x = tf.placeholder(tf.float32, [None, self.num_of_words])
        self.y = tf.placeholder(tf.int32, [None, self.num_of_classes])
        with tf.name_scope("model"):
            self.h1_layer = tf.layers.dense(self.x, self.n_nodes_hl1, activation=tf.nn.relu, name="h1")
            self.h2_layer = tf.layers.dense(self.h1_layer, self.n_nodes_hl2, activation=tf.nn.relu, name="h2")
            self.h3_layer = tf.layers.dense(self.h2_layer, self.n_nodes_hl3, activation=tf.nn.relu, name="h3")
            self.logits = tf.layers.dense(self.h3_layer, self.num_of_classes, name="output")

    def predict(self):
        return self.logits

    def make_prediction(self, query):
        result = None
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            saver = tf.train.import_meta_graph('saved_models/testing.meta')
            saver.restore(sess, 'saved_models/testing')
            # for variable in tf.trainable_variables():
            #     print sess.run(variable)
            prediction = self.predict()
            pre, prediction = sess.run([self.logits, prediction], feed_dict={self.x: query})
            print pre
            prediction = prediction.tolist()
            prediction = tf.nn.softmax(prediction)
            prediction = sess.run(prediction)
            print prediction
            return utils.get_label_from_encoding(prediction[0])

    def train(self, data):
        prediction = self.predict()
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=self.y))
        optimizer = tf.train.AdamOptimizer().minimize(cost)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            writer = tf.summary.FileWriter("mygraph/logs", tf.get_default_graph())
            for epoch in range(self.num_of_epoch):
                optimised, loss = sess.run([optimizer, cost],
                                           feed_dict={self.x: data['values'], self.y: data['labels']})
                if epoch % 1 == 0:
                    print("Completed Training Cycle: " + str(epoch) + " out of " + str(self.num_of_epoch))
                    print("Current Loss: " + str(loss))
                    saver = tf.train.Saver()
                    saver.save(sess, 'saved_models/testing')
                    print("Model saved")
TLDR: The operations are essentially the same, but the variable creation and initialization methods are different.
If you trace the code from here, you will eventually reach the point where the code calls tf.get_variable to create the variables. In your example above, since kernel_initializer and bias_initializer are not set, they default to None and tf.zeros_initializer() respectively (see the Dense API). When None is passed to tf.get_variable as an initializer, a glorot_uniform_initializer will be used:
If initializer is None (the default), the default initializer passed in the variable scope will be used. If that one is None too, a glorot_uniform_initializer will be used. The initializer can also be a Tensor, in which case the variable is initialized to this value and shape.
More on tf.get_variable can be found here.
In one case, you used a tf.random_normal initializer for both the kernel weights and the bias weights, while in the other you used tf.layers.dense, which results in a glorot_uniform_initializer for the kernel weights and a zeros_initializer for the bias weights, since no initializers were passed to tf.layers.dense.
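If you want both snippets to initialize identically, one option (a sketch, not the asker's code; the layer sizes are illustrative) is to pass explicit initializers to tf.layers.dense so Code A matches Code B:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 100])  # 100 input features, for illustration
# Code A's first layer, but initialized like Code B (TF 1.x).
h1 = tf.layers.dense(
    x, 500,
    activation=tf.nn.relu,
    kernel_initializer=tf.random_normal_initializer(),
    bias_initializer=tf.random_normal_initializer(),
    name="h1")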
To your second question on whether they can be saved, yes they can.
As a last note, you have to be careful when using tf.Variable as it might complicate things when the scopes are not properly set.

Why can't I restore this model?

I am currently having trouble restoring this model to make a prediction.
Code:
def neural_network(data):
    with tf.name_scope("network"):
        layer1 = tf.layers.dense(data, 1000, activation=tf.nn.relu, name="hidden_layer1")
        layer2 = tf.layers.dense(layer1, 1000, activation=tf.nn.relu, name="hidden_layer2")
        output = tf.layers.dense(layer2, 2, name="output_layer")
        return output

def evaluate():
    with tf.name_scope("loss"):
        global x
        xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=neural_network(x))
        loss = tf.reduce_mean(xentropy, name="loss")
    with tf.name_scope("train"):
        optimizer = tf.train.AdamOptimizer()
        training_op = optimizer.minimize(loss)
    with tf.name_scope("exec"):
        with tf.Session() as sess:
            for i in range(1, 10):
                sess.run(tf.global_variables_initializer())
                sess.run(training_op, feed_dict={x: np.array(train_data).reshape([-1, 1]), y: label})
                print "Training " + str(i)
                saver = tf.train.Saver()
                saver.save(sess, "saved_models/testing")
                print "Model Saved."

def predict():
    with tf.name_scope("predict"):
        output = neural_network(x)
        output = tf.nn.softmax(output)
        with tf.Session() as sess:
            saver = tf.train.import_meta_graph("saved_models/testing.meta")
            # saver = tf.train.Saver()
            saver.restore(sess, "saved_models/testing")
            print sess.run(output, feed_dict={x: np.array([12003]).reshape([-1, 1])})
I have tried using tf.train.Saver() to restore instead, but it gives the same error.
The error given is ValueError: Variable hidden_layer1/kernel already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
I have tried setting reuse=True for tf.layers.dense(), but that leaves me unable to train the graph (it gives the same ValueError as above, but asking to set reuse=None).
I am guessing it has to do with the graph still existing in the session, so when I try to restore it, it detects a duplicate graph. However, I thought this should not happen, as the session has already closed.
link to entire code: gistlink
I think you are loading the variables into the same graph. For testing, try to create a new graph and load into it. Do something like this:
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load the graph with the trained states
    saver = tf.train.import_meta_graph("saved_models/testing.meta")
    saver.restore(sess, "saved_models/testing")

Stacked RNN model setup in TensorFlow

I'm kind of lost in building up a stacked LSTM model for text classification in TensorFlow.
My input data was something like:
x_train = [[1.,1.,1.],[2.,2.,2.],[3.,3.,3.],...,[0.,0.,0.],[0.,0.,0.],
...... #I trained the network in batch with batch size set to 32.
]
y_train = [[1.,0.],[1.,0.],[0.,1.],...,[1.,0.],[0.,1.]]
# binary classification
The skeleton of my code looks like:
self._input = tf.placeholder(tf.float32, [self.batch_size, self.max_seq_length, self.vocab_dim], name='input')
self._target = tf.placeholder(tf.float32, [self.batch_size, 2], name='target')

lstm_cell = rnn_cell.BasicLSTMCell(self.vocab_dim, forget_bias=1.)
lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_ratio)
self.cells = rnn_cell.MultiRNNCell([lstm_cell] * self.num_layers)
self._initial_state = self.cells.zero_state(self.batch_size, tf.float32)

inputs = tf.nn.dropout(self._input, self.dropout_ratio)
inputs = [tf.reshape(input_, (self.batch_size, self.vocab_dim)) for input_ in
          tf.split(1, self.max_seq_length, inputs)]
outputs, states = rnn.rnn(self.cells, inputs, initial_state=self._initial_state)

# We only care about the output of the last RNN cell...
y_pred = tf.nn.xw_plus_b(outputs[-1], tf.get_variable("softmax_w", [self.vocab_dim, 2]), tf.get_variable("softmax_b", [2]))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_pred, self._target))
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(self._target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)

init = tf.initialize_all_variables()

with tf.Session() as sess:
    initializer = tf.random_uniform_initializer(-0.04, 0.04)
    with tf.variable_scope("model", reuse=True, initializer=initializer):
        sess.run(init)
        # generate batches here (omitted for clarity)
        print sess.run([train_op, loss, accuracy], feed_dict={self._input: batch_x, self._target: batch_y})
The problem is that no matter how large the dataset is, the loss and accuracy show no sign of improvement (they look completely stochastic). Am I doing anything wrong?
Update:
# First, load the Word2Vec model in Gensim.
model = Doc2Vec.load(word2vec_path)

# Second, build the dictionary.
gensim_dict = Dictionary()
gensim_dict.doc2bow(model.vocab.keys(), allow_update=True)
w2indx = {v: k + 1 for k, v in gensim_dict.items()}
w2vec = {word: model[word] for word in w2indx.keys()}

# Third, read data from a text file.
for fname in fnames:
    i = 0
    with codecs.open(fname, 'r', encoding='utf8') as fr:
        for line in fr:
            tmp = []
            for t in line.split():
                tmp.append(t)
            X_train.append(tmp)
            i += 1
            if i == samples_count:  # was 'if i is samples_count'; 'is' tests identity, '==' is what is meant
                break

# Fourth, convert words into vectors, and pad each sentence with ZERO arrays to a fixed length.
result = np.zeros((len(data), self.max_seq_length, self.vocab_dim), dtype=np.float32)
for rowNo in xrange(len(data)):
    rowLen = len(data[rowNo])
    for colNo in xrange(rowLen):
        word = data[rowNo][colNo]
        if word in w2vec:
            result[rowNo][colNo] = w2vec[word]
        else:
            result[rowNo][colNo] = [0] * self.vocab_dim
    for colPadding in xrange(rowLen, self.max_seq_length):
        result[rowNo][colPadding] = [0] * self.vocab_dim
return result

# Fifth, generate batches and feed them to the model.
# ... (trivial details omitted) ...
Here are a few reasons it may not be training, and some suggestions to try:
You are not allowing the word vectors to be updated, so the space of the pre-learned vectors may not be working properly.
RNNs really need gradient clipping when trained; you can try adding something like the sketch after this list.
Unit-scale initialization seems to work better, as it accounts for the size of the layer and allows the gradient to be scaled properly as it goes deeper.
You should try removing the dropout and the second layer, just to check whether your data passing is correct and your loss is going down at all.
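As for the gradient clipping point above, a minimal sketch (assuming loss and self.lr exist as in the question's skeleton):

optimizer = tf.train.AdamOptimizer(self.lr)
grads_and_vars = optimizer.compute_gradients(loss)
# Clip each gradient into [-1, 1] before applying it.
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)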
I can also recommend trying this example with your data: https://github.com/tensorflow/skflow/blob/master/examples/text_classification.py
It trains word vectors from scratch, already has gradient clipping, and uses GRUCells, which are usually easier to train. You can also see nice visualizations of the loss and other metrics by running tensorboard --logdir=/tmp/tf_examples/word_rnn.
