What's the difference between LSTM() and LSTMCell()? - machine-learning

I've checked the source code for both, and it seems that LSTM() builds the recurrent layer as a whole, while LSTMCell() only returns a single cell.
However, in most cases people only use one LSTM cell in their program. Does this mean that when you have only one LSTM cell (e.g., in a simple Seq2Seq model), calling LSTMCell() and LSTM() would make no difference?

LSTM is a recurrent layer
LSTMCell is an object (which happens to be a layer too) used by the LSTM layer; it contains the calculation logic for one step.
A recurrent layer contains a cell object. The cell contains the core code for the calculations of each step, while the recurrent layer commands the cell and performs the actual recurrent calculations.
Usually, people use LSTM layers in their code.
Or they use RNN layers containing LSTMCell.
Both things are almost the same. An LSTM layer is an RNN layer using an LSTMCell, as you can see in the source code.
About the number of cells:
Although its name suggests that LSTMCell is a single cell, it is actually an object that manages all of the units/cells. In the same source code, you can see that the units argument is used when creating an instance of LSTMCell.
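As a minimal sketch of this equivalence (assuming the TensorFlow/Keras API):

import numpy as np
from tensorflow.keras.layers import LSTM, LSTMCell, RNN

x = np.zeros((2, 10, 8), dtype="float32")  # batch of 2 sequences, 10 steps, 8 features

lstm_layer = LSTM(64)          # recurrent layer with 64 units
rnn_layer = RNN(LSTMCell(64))  # RNN layer driving an LSTMCell with 64 units

print(lstm_layer(x).shape)  # (2, 64)
print(rnn_layer(x).shape)   # (2, 64): same computation, same output shape

One practical difference: in TensorFlow 2 the dedicated LSTM layer can dispatch to a fused cuDNN kernel on GPU, while RNN(LSTMCell(...)) runs the step logic in a generic loop, so the dedicated layer is usually faster.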

Related

How can a recurrent layer have several neurons?

If I understand correctly, one neuron per layer should be enough, since the layer will just be unrolled through time to accommodate a long sequence.
How can a recurrent layer contain several neurons?
Aren't the neurons in one layer essentially the same once the layer is unrolled through time?
Neural networks (MLP, CNN, RNN) are expected to have multiple neurons across multiple layers. One neuron per layer is hardly enough: such an architecture is far too small to deal with any real-life problem and will most likely reduce to little more than a linear predictor. Note that unrolling through time reuses the same weights at every timestep; it adds no representational width, so the layer still needs several neurons.
In Brandon Rohrer's video Recurrent Neural Networks and LSTM you can see a very simple structure containing multiple neurons (dots) in a single layer. Imagine this simple model working with only one neuron per layer: it would perform very poorly.
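To make the width concrete, here is a small sketch (assuming the Keras API): the units argument sets the number of neurons, and those same weights are reused at every timestep when the layer is unrolled.

import numpy as np
from tensorflow.keras.layers import SimpleRNN

# A recurrent layer with 8 neurons, fed sequences of length 20 with 5 features.
layer = SimpleRNN(units=8)
out = layer(np.zeros((1, 20, 5), dtype="float32"))

print(out.shape)  # (1, 8): one output per neuron
# The input kernel maps 5 features to 8 neurons; the recurrent kernel is 8x8.
# These same weights are shared across all 20 timesteps.
print(layer.weights[0].shape, layer.weights[1].shape)  # (5, 8) (8, 8)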

LSTM connections between cells/units (not timesteps)

My question is regarding how an LSTM layer is built, for example in keras:
keras.layers.LSTM(units,... other options)
Are these units individual cells or the dimensions of the cell state?
I've read conflicting comments on the subject. Could someone clarify whether the LSTM units/blocks are distinct units interconnected with a delay of one timestep, or whether an LSTM layer is just a single cell with 'units' dimensions for the cell state?
I've made three diagrams. The first is the normal LSTM cell as it is usually shown (feel free to check it for errors); the other two are, as far as I understand them, the other options for the 'many cell' layer:
LSTM normal diagram
LSTM with each cell connected to the next in the layer
LSTM with all cells connected
Units are the number of cells in your LSTM layer.
model.add(LSTM(32))
implies that you are adding an LSTM layer with 32 LSTM cells connected to the previous and next layers. This will result in an output shape of (batch_size, 32), since units also determines the dimensionality of the output (when return_sequences is False).
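A quick shape check (a sketch assuming the Keras API) shows both output modes:

import numpy as np
from tensorflow.keras.layers import LSTM

x = np.zeros((4, 10, 16), dtype="float32")  # batch of 4 sequences, 10 steps, 16 features

print(LSTM(32)(x).shape)                         # (4, 32): last hidden state only
print(LSTM(32, return_sequences=True)(x).shape)  # (4, 10, 32): hidden state at every step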

Good approach for training a neural network

I am training a neural network model to differentiate between oranges and pomegranates.
In the training dataset, the background of the object (for both the orange and the pomegranate) is the same and constant, but at test time the background of the object is different from the one I trained with.
So my first doubt is:
Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
Second, I trained with the object at different positions against the same background. The theory says that position doesn't matter for convolution: the network should be able to recognise the object placed anywhere, because after convolution the spatial dimensions of the activation maps decrease while the depth increases.
So my second doubt is:
Is it necessary, or a good approach, to keep the object at different positions while training the model?
Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
When training a neural network, it is important to shuffle the dataset and to split it into training and test sets. You shuffle the data so that your model sees all types of samples during training; the moment it is exposed to new, unseen data, it can relate it to previously seen data. In the example you mention, shuffling is important because the different background colours can affect the model's predictions, so both the training and the test set need to contain both background colours for the model to give good predictions.
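As a minimal sketch of that shuffle-and-split step (assuming scikit-learn; the images and labels arrays are hypothetical stand-ins):

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: 100 small RGB images and their labels (0 = orange, 1 = pomegranate).
images = np.random.rand(100, 64, 64, 3)
labels = np.random.randint(0, 2, size=100)

# shuffle=True mixes the samples before splitting; stratify keeps the class
# proportions (and, if encoded in the labels, the backgrounds) equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, shuffle=True, stratify=labels, random_state=42
)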
Is it necessary, or a good approach, to keep the object at different positions while training the model?
It is indeed better to train your model with the object in different positions, because it helps the model recognise the many ways an orange or a pomegranate can appear. Note that if you use different positions for the object you are trying to predict, you need a sufficient amount of data for the model to give good predictions on the test set.
I hope this short explanation helped; if something isn't clear, please let me know and I'll edit the post.
Is it a good approach to train a model with one background (say, a white background) and test with another background (say, a grey background)?
The background is a property of an image that is not needed for distinguishing the object, and you want your network to learn that. Consider two cases:
First, you give your network images with a single background. Let's see what can possibly go wrong here.
Assume that the background is completely black, so a feature map (kernel) outputs 0 wherever it is applied to the background. Your network can then learn arbitrarily high weights for these features and still do well during training, as long as those weights successfully extract features of the classes.
During testing, however, the background is white. The same feature maps with high weights now produce very high outputs. These high outputs can saturate the non-linear units, and all categories may end up classified as one category.
In the second case, during training you show images with different backgrounds.
Here the network has to learn which feature maps respond to the background and to cancel out their contribution, whatever the background is.
In short, there is an extra piece of information the network needs to learn: the background is not important for deciding the category. When you provide only one background colour, your network cannot learn this behaviour and can give garbage results on the test dataset.
Is it necessary, or a good approach, to keep the object at different positions while training the model?
You are right that convolutional neural networks are translation-equivariant. But to build a classifier, you pass the output of the convolutional layers through a fully-connected layer. If you put the object at different positions, different inputs reach the fully-connected layer, yet the output for all of these images must be the same category. So you are forcing your neural network to learn that the position of the object is not required for classifying its category.
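For reference, a minimal sketch of that architecture (assuming the Keras API): the convolutional part is translation-equivariant, and the fully-connected head is what has to learn that position does not matter.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),                        # feature maps -> vector
    layers.Dense(2, activation="softmax"),   # orange vs pomegranate
])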
Regarding your first doubt: it is not much of an issue as long as the target object is present in the images. Shuffle the data before feeding it to the network.
As for the second doubt: yes, it is always a good idea to have the target object at different positions. One more thing to take care of is that the source of your data stays the same and of roughly the same quality; otherwise performance issues will arise.
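One common way to get the object at different positions without collecting more photos is data augmentation. A minimal sketch, assuming the Keras ImageDataGenerator API and the hypothetical X_train/y_train arrays from the split above:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly shift, zoom, and flip the training images so the object appears
# at different positions and scales; the labels stay unchanged.
augmenter = ImageDataGenerator(
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

train_generator = augmenter.flow(X_train, y_train, batch_size=32)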

What is the most efficient way to implement multi-layer RNNs in TensorFlow?

I'm trying to figure out whether it is more efficient to run an RNN over the whole input sequence and then run another RNN on those outputs, repeatedly (one horizontal layer at a time), or to run one timestep at a time through all layers (one vertical slice at a time).
I know TensorFlow's MultiRNNCell class does the latter. Why is this method chosen over the former? Is the former equally efficient? Are there cases where going one timestep at a time through all layers is preferable?
See http://karpathy.github.io/2015/05/21/rnn-effectiveness/ for reference on multi-layer RNNs.
1: How to easily implement an RNN
Use an LSTM cell; they generally work better (they mitigate the vanishing-gradient problem), and TensorFlow makes it very easy to use them:
import tensorflow as tf  # TF 1.x API
...
# Create one cell per layer: reusing a single cell object ([cell] * num_layers)
# would share weights across layers and raises an error in later TF 1.x versions.
cells = [tf.nn.rnn_cell.BasicLSTMCell(state_dim) for _ in range(num_layers)]
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
find out more on the tensorflow website: https://www.tensorflow.org/tutorials/recurrent/
2: Horizontal or Deep?
Just like you can have a multi-layer neural network, you can have a multi-layer RNN. Think of the RNN cell as a layer within your neural network, a special layer that allows you to remember sequential inputs. In my experience you will still have linear transforms (depth) within your network, but whether to stack multiple layers of LSTM cells depends on your network topology, your preference, and your computational budget (the more the merrier). The number of inputs and outputs depends on your problem, and as far as I can remember there is no such thing as multiple horizontal RNN cells, just depth.
All computation is done depth-wise, one input at a time.
The multi-layer function you referenced is great: it handles all the computation for you under the hood; just tell it how many cells you want and it does the rest.
Good luck!
If you run everything sequentially, there should not be much of a performance difference between the two approaches (unless I am overlooking something with cache locality here). The main advantage of the latter approach is that you can parallelise the computation across layers.
E.g. instead of waiting for the inputs to propagate through 2 layers, you can already start the computation of the next time step in the first layer while the result from the current time step is propagating through the second layer.
Disclaimer: I would not consider myself a performance expert.

Mood classification using libsvm

I want to apply an SVM to an audio dataset. I am extracting different features from the speech signal. After reducing the dimensionality, I am still left with each sample's features in matrix form. Can anyone help me with the data formatting?
Should I convert each feature matrix into a row vector? Can I assign the same label to every row of one feature matrix and a different label to the rows of the other matrix?
The question is a bit ambiguous, but let me try to address your problem. For feature selection you can use filter methods, wrapper methods, etc.; one popular dimensionality-reduction method is principal component analysis (PCA). Once you have selected your features, you can feed them directly to the classifier. In your case, I guess you are getting a lower-dimensional representation of your training data (for example, if you used SVD). That is fine; you can now use it for SVM classification.
What did you mean by adding a label to the feature matrix? You attach labels to training instances, not to features. I guess you are talking about a separate matrix for each class label. If that is the case, then yes, you can stack the rows of each matrix and give every row the label of its source matrix, but remember that this depends on the model design.
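A minimal sketch of that setup, assuming scikit-learn and hypothetical per-class feature matrices feats_class_a and feats_class_b (one row per instance, one column per feature):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical feature matrices, one per mood class.
feats_class_a = np.random.rand(50, 40)
feats_class_b = np.random.rand(50, 40)

# Stack the rows and give every row the label of its source matrix.
X = np.vstack([feats_class_a, feats_class_b])
y = np.array([0] * len(feats_class_a) + [1] * len(feats_class_b))

# Scale, reduce the dimensionality with PCA, then classify with an SVM.
clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X, y)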
