RNN Encoder Decoder using keras - machine-learning

I am trying to build an architecture for machine translation (from English to French):
model = Sequential()
model.add(LSTM(256, input_shape =(15,1)))
model.add(RepeatVector(output_sequence_length))
model.add(LSTM(21,return_sequences=True))
model.add(TimeDistributed(Dense(french_vocab_size, activation='sigmoid')))
Max length of English sentence is 15 and that of French is 21. Max number of English words is 199 and that of French is 399. output_sequence_length is 21.
This model throws me an error
Error when checking input: expected lstm_40_input to have shape (None, 15, 1) but got array with shape (137861, 21, 1)
I am struggling to understand the LSTM layer in Keras.
1. According to the documentation, the first argument is the 'dimensionality of the output space'. I do not understand what that means.
2. What exactly happens when return_sequences is set to True?
Please let me know.

What kind of data are you trying to feed your network? It seems to me that you did not convert your words to vectors (binary vectors or encoded vectors).
In any case, an LSTM network needs a 3-dimensional input, with dimensions (samples, timesteps, features).
In your case, samples is the number of sentences, I guess 137861. Timesteps is the length of each sequence, which in your case is 15, and features is the size of each encoded word (depending on which type of encoding you choose; with one-hot encoding it would be 199).
The error you got shows that you fed your network sequences with 21 timesteps instead of 15.
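To make the (samples, timesteps, features) layout concrete, here is a minimal NumPy sketch of one-hot encoding. It is only an illustration: the helper name one_hot_sequences and the toy word indices are made up, and the sizes 15 and 199 are taken from the question.

```python
import numpy as np

# Sizes taken from the question: 15 timesteps, 199 English words.
max_len, vocab_size = 15, 199

def one_hot_sequences(sequences, max_len, vocab_size):
    """Turn lists of word indices into an array of shape (samples, max_len, vocab_size)."""
    out = np.zeros((len(sequences), max_len, vocab_size), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for t, idx in enumerate(seq[:max_len]):
            out[i, t, idx] = 1.0  # timesteps beyond len(seq) stay all-zero (padding)
    return out

# Two toy sentences as word indices (made-up values, not real data).
sentences = [[4, 17, 3], [8, 2, 150, 42, 9]]
x = one_hot_sequences(sentences, max_len, vocab_size)
print(x.shape)  # (2, 15, 199)
```

An array built this way would match input_shape=(15, 199) in the first LSTM layer, rather than (15, 1).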
For your second question: when return_sequences is set to False, the LSTM layer returns only the output of the last timestep, which for your first LSTM layer has shape (256,) per sample. When it is set to True, it returns one output per timestep, giving an overall output of shape (15, 256) per sample. This also covers your first question: the first argument (256 here) is the size of the output vector the layer produces at each timestep. When you stack two or more LSTM layers, every LSTM layer that feeds another LSTM layer must use return_sequences=True.
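The shape difference can be sketched without Keras. The toy layer below is a plain tanh RNN, not an LSTM, and the function simple_rnn is a hypothetical stand-in written only to show which shapes come out in each mode.

```python
import numpy as np

def simple_rnn(x, units, return_sequences=False, seed=0):
    """Toy tanh RNN (not an LSTM), used only to illustrate output shapes."""
    rng = np.random.default_rng(seed)
    samples, timesteps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, units))
    w_rec = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((samples, units))
    outputs = []
    for t in range(timesteps):
        # One hidden state of size `units` is produced at every timestep.
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (samples, timesteps, units)
    return h                              # (samples, units): last timestep only

x = np.ones((4, 15, 199))
last_only = simple_rnn(x, 256)
all_steps = simple_rnn(x, 256, return_sequences=True)
print(last_only.shape)  # (4, 256)
print(all_steps.shape)  # (4, 15, 256)
```

The last timestep of the full sequence is the same vector that the return_sequences=False mode hands back, which is why a following recurrent layer needs the full sequence instead.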
Also, what you are building is called a many-to-many architecture, with different timestep lengths for the input and the output (15 vs 21). As far as I know, it is not that easy to implement in Keras.

Related

How to interpret patterns occurring in convolutional layer after training?

I am pretty sure I understand the principle of CNNs and why they are preferred over plain fully connected neural networks. What I am trying to comprehend is how to interpret the patterns that occur after training the model.
So let's assume I want to recognize the number "1" written on a 256x256 image plane (1-bit image, black/white), which is then forwarded to an output that says either "is a one" or "is not a one".
If the model is untrained and the first handwritten "1" is forwarded, the result could be [0.28, 0.72], which is obviously wrong. I then calculate the error between [0.28, 0.72] and [1, 0] (for example based on the mean squared error), take its derivative and try to find a local minimum (backpropagation). Then I calculate the delta values for each weight (using the chain rule and partial derivatives) until I finally reach the convolutional layer, for which delta values are also calculated.
But my question now is: what exactly do the patterns mean that emerge from adding up a bunch of delta values to the convolutional layer "weights"? Why do they find certain features characteristic of the number "1"? Or is it more that the network does not find any specific features per se, but rather "encodes" the relationship between handwritten "1"s and the desired output [1, 0] into the convolutional layers?

What is Sequence length in LSTM?

The dimensions for the input data for LSTM are [Batch Size, Sequence Length, Input Dimension] in tensorflow.
What is the meaning of Sequence Length and Input Dimension?
How do we assign the values to them if my input data is of the form :
[[[1.23] [2.24] [5.68] [9.54] [6.90] [7.74] [3.26]]] ?
LSTMs are a subclass of recurrent neural networks. Recurrent neural nets are by definition applied to sequential data, which without loss of generality means data samples that change over a time axis. A full history of a data sample is then described by the sample values over a finite time window, i.e. if your data live in an N-dimensional space and evolve over t time steps, your input representation must be of shape (num_samples, t, N).
Your data does not fit the above description. I assume, however, that this representation means you have a scalar value x which evolves over 7 time instances, such that x[0] = 1.23, x[1] = 2.24, etc.
If that is the case, you need to reshape your input so that instead of a list of 7 elements you have an array of shape (7, 1). Your full data can then be described by a 3rd-order tensor of shape (num_samples, 7, 1), which can be accepted by an LSTM.
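As a small sketch of that reshape, assuming the seven values from the question are a single sample stored as a flat list (the variable names are only illustrative):

```python
import numpy as np

# The seven scalar readings from the question, as one flat sequence.
x = np.array([1.23, 2.24, 5.68, 9.54, 6.90, 7.74, 3.26])

sample = x.reshape(7, 1)         # (timesteps, features) = (7, 1)
batch = sample[np.newaxis, ...]  # add the samples axis: (1, 7, 1)

print(sample.shape)  # (7, 1)
print(batch.shape)   # (1, 7, 1)
```

Here the sequence length is 7 and the input dimension is 1, because each time step carries a single scalar.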
Simply put, seq_len is the number of time steps that will be fed into the LSTM network. Let's understand this by example.
Suppose you are doing sentiment classification using an LSTM.
Your input sentence to the network is ["I hate to eat apples"]. Each token is fed as input at one timestep, so here seq_len is the total number of tokens in the sentence, which is 5.
Coming to input_dim: as you might know, we can't feed words directly to the network, so you need to encode those words as numbers. In PyTorch/TensorFlow, embedding layers are used for this, and we have to specify an embedding dimension.
Suppose your embedding dimension is 50. That means the embedding layer takes the index of each token and converts it into a vector representation of size 50. So the input dim of the LSTM network becomes 50.
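The example above can be sketched with a plain NumPy lookup table standing in for an embedding layer. The vocabulary and the random embedding matrix are made up for illustration; a real embedding layer would learn these values during training.

```python
import numpy as np

tokens = "I hate to eat apples".split()

# Hypothetical vocabulary: one integer index per distinct token.
vocab = {word: i for i, word in enumerate(sorted(set(tokens)))}
indices = np.array([[vocab[w] for w in tokens]])  # shape (batch, seq_len) = (1, 5)

embedding_dim = 50
rng = np.random.default_rng(0)
# Stand-in for a trained embedding matrix: one row of size 50 per vocabulary entry.
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

# Row lookup: each token index is replaced by its 50-dimensional vector.
embedded = embedding_matrix[indices]
print(indices.shape)   # (1, 5)
print(embedded.shape)  # (1, 5, 50)
```

The result has seq_len 5 and input_dim 50, which is the shape the LSTM would receive.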

Keras simple RNN implementation

I found problems when trying to compile a network with one recurrent layer. It seems there is some issue with the dimensionality of the first layer and thus my understanding of how RNN layers work in Keras.
My code sample is:
model.add(Dense(8,
                input_dim = 2,
                activation = "tanh",
                use_bias = False))
model.add(SimpleRNN(2,
                    activation = "tanh",
                    use_bias = False))
model.add(Dense(1,
                activation = "tanh",
                use_bias = False))
The error is
ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2
This error is returned regardless of input_dim value. What am I missing ?
That message means: the input going into the RNN has 2 dimensions, but an RNN layer expects 3 dimensions.
For an RNN layer, you need inputs shaped like (BatchSize, TimeSteps, FeaturesPerStep). These are the 3 dimensions expected.
A Dense layer (in Keras 2) can work with either 2 or 3 dimensions. We can see that you're working with 2 because you passed an input_dim instead of passing an input_shape=(Steps,Features).
There are many possible ways to solve this, but the most meaningful and logical would be a case where your input data is a sequence with time steps.
Solution 1 - Your training data is a sequence:
If your training data is a sequence, you shape it like (NumberOfSamples, TimeSteps, Features) and pass it to your model. Make sure you use input_shape=(TimeSteps,Features) in the first layer instead of using input_dim.
Solution 2 - You reshape the output of the first dense layer so it has the additional dimension:
model.add(Reshape((TimeSteps,Features)))
Make sure that the product TimeSteps*Features is equal to 8, the output of your first dense layer.
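To illustrate the constraint in Solution 2, here is a NumPy sketch of the same reshape applied to a fake batch of Dense outputs. The split into 4 time steps of 2 features is just one assumed choice; any pair whose product is 8 would work.

```python
import numpy as np

# Fake output of the first Dense layer: 3 samples, 8 units each.
dense_out = np.arange(3 * 8, dtype=np.float32).reshape(3, 8)

# Assumed split of the 8 units: 4 time steps with 2 features per step.
time_steps, features = 4, 2
assert time_steps * features == dense_out.shape[1]

# Same effect as Reshape((time_steps, features)) on each sample.
rnn_in = dense_out.reshape(-1, time_steps, features)
print(dense_out.ndim, rnn_in.ndim)  # 2 3
print(rnn_in.shape)                 # (3, 4, 2)
```

After the reshape the data has the 3 dimensions the RNN layer expects, although whether those artificial "time steps" mean anything depends on the data.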

Input of LSTM seq2seq network - Tensorflow

Using the Tensorflow seq2seq tutorial code I am creating a character-based chatbot. I don't use word embeddings. I have an array of characters (the alphabet and some punctuation marks) and special symbols like the GO, EOS and UNK symbol.
Because I'm not using word embeddings, I use the standard tf.nn.seq2seq.basic_rnn_seq2seq() seq2seq model. However, I am confused about what shape encoder_inputs and decoder_inputs should have. Should they be an array of integers, corresponding to the index of the characters in the alphabet-array, or should I turn those integers into one-hot vectors first?
How many input nodes does one LSTM cell have? Can you specify that? Because I guess in my case an LSTM cell should have an input neuron for each letter in the alphabet (therefore the one-hot vectors?).
Also, what is the LSTM "size" you have to pass in the constructor tf.nn.rnn_cell.BasicLSTMCell(size)?
Thank you.
Appendix: these are the bugs I am trying to fix.
When I use the following code, according to the tutorial:
for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
    self.encoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i)))
for i in xrange(buckets[-1][1] + 1):
    self.decoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i)))
    self.target_weights.append(tf.placeholder(dtype, shape=[None], name="weight{0}".format(i)))
And run the self_test() function, I get the error:
ValueError: Linear is expecting 2D arguments: [[None], [None, 32]]
Then, when I change the shapes in the above code to shape=[None, 32] I get this error:
TypeError: Expected int32, got -0.21650635094610965 of type 'float' instead.
The number of inputs of an LSTM cell is the dimension of whatever tensor you pass as inputs to the tf.rnn function when instantiating things.
The size argument is the number of hidden units in your LSTM (so a bigger number is slower but can lead to more accurate models).
I'd need a bigger stack trace to understand these errors.
It turns out the size argument passed to BasicLSTMCell represents both the size of the hidden state of the LSTM and the size of the input layer. So if you want a different hidden size than input size, you can first propagate your inputs through an additional projection layer or use the built-in seq2seq word embeddings function.
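A rough NumPy sketch of the idea, not the TensorFlow API: integer character indices are first turned into one-hot float vectors and then passed through a projection matrix so that their width matches the cell size. The alphabet size of 30, the cell size of 32 and the random projection weights are assumptions made for illustration.

```python
import numpy as np

alphabet_size, hidden_size = 30, 32  # assumed: 30 symbols, cell size 32

# A batch of integer character indices, shape (4,), like the int32 placeholders.
char_ids = np.array([3, 0, 29, 7])

# One-hot encode: integers become 2D float rows of shape (batch, alphabet_size).
one_hot = np.eye(alphabet_size, dtype=np.float32)[char_ids]

# Stand-in projection layer mapping alphabet_size -> hidden_size (random weights).
rng = np.random.default_rng(0)
projection = rng.normal(scale=0.1, size=(alphabet_size, hidden_size)).astype(np.float32)

cell_input = one_hot @ projection
print(one_hot.shape)     # (4, 30)
print(cell_input.shape)  # (4, 32)
```

Multiplying a one-hot row by the projection matrix just selects one row of that matrix, which is the same lookup an embedding layer performs.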

How to input the unknown size of time-step to TimeDistributed Keras layer

I have my input state with shape = (84,84,4)
state = Input(shape=(84,84,4), dtype="float")
I want to pass this to a TimeDistributed layer with a number of time steps somewhere between 1 and 5, and I don't know in advance exactly how many there will be.
My next layer is something like this:
conv1 = TimeDistributed(Convolution2D(16, 8, 8, subsample=(4, 4), border_mode='valid',
activation='relu', dim_ordering='tf'))(state)
And I've got an error at this layer:
IndexError: tuple index out of range
I just want to pass an unknown time-series size to TimeDistributed and then to LSTM also.
Basically, in Keras you need to provide the sequence length, because during computation Keras layers accept as input a numpy array with a specified shape, which makes it compulsory for all inputs (at least within one batch) to have a fixed length. But you can still deal with varying input sizes by zero-padding (making all sequences the same size by adding all-zero dummy timesteps at the beginning) and then masking, which makes your network equivalent to a variable-length input network.
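The padding step can be sketched in NumPy as follows. The helper pad_sequences_pre is hypothetical (it is not a Keras function); it pads at the beginning, as described above, and also returns a boolean mask marking the real timesteps.

```python
import numpy as np

def pad_sequences_pre(sequences, max_len, feature_dim):
    """Left-pad variable-length sequences with zeros and build a mask of real steps."""
    batch = np.zeros((len(sequences), max_len, feature_dim), dtype=np.float32)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype=np.float32)[-max_len:]  # keep the last max_len steps
        n = len(seq)
        if n:
            batch[i, max_len - n:] = seq  # real data goes at the end
            mask[i, max_len - n:] = True  # True marks non-padded timesteps
    return batch, mask

# Three toy sequences with 2, 5 and 1 timesteps of 3 features each.
seqs = [np.ones((2, 3)), np.ones((5, 3)), np.ones((1, 3))]
batch, mask = pad_sequences_pre(seqs, max_len=5, feature_dim=3)
print(batch.shape)       # (3, 5, 3)
print(mask.sum(axis=1))  # [2 5 1]
```

A masking-aware network would then skip the timesteps where the mask is False, so the padded zeros do not influence the result.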
You can give a variable sequence length, like this:
classifier.add(TimeDistributed(Convolution2D(64,(3,3)),input_shape=(None,None,None,3)))
But now you will have to adjust the length of the vector when it is flattened or unrolled at prediction time.
