TensorFlow Combining Dense Layer with LSTM Cell - machine-learning

How do I combine a TensorFlow Dense layer with an LSTM that follows it?
Given a sequence of variable length, I want to backpropagate through both layers, since I will be using this for RL.
How do I format my input sequence and define my layers to be consistent with the size requirements?
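No answer was recorded for this question, but a minimal tf.keras sketch of the idea might look as follows; the layer sizes, the masking scheme for variable-length sequences, and the final value head are assumptions for illustration, not a definitive implementation:

```python
import tensorflow as tf

# A Dense layer applied per timestep, followed by an LSTM.
# Variable-length sequences are handled by zero-padding plus a Masking layer,
# so backpropagation flows through both layers end to end.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 8)),          # (time, features); time is variable
    tf.keras.layers.Masking(mask_value=0.0),  # skip zero-padded timesteps
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(16, activation="relu")),  # Dense on every timestep
    tf.keras.layers.LSTM(32),                 # consumes the projected sequence
    tf.keras.layers.Dense(1),                 # e.g. a value head for RL
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```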

Related

Why can't you use a 3D volume input for LSTM?

In the paper CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning, which I recently read, it is specified that the 3D volume output from a CNN layer must be reduced into a 2-dimensional sequence before entering the LSTM layer. Why is that? What's wrong with using the 3-dimensional format?
The standard LSTM neural network assumes input of the following size:
[batch size] × [sequence length] × [feature dim]
The LSTM first multiplies each vector of size [feature dim] by a matrix, and then combines them in a fancy way. What's important here is that there's a vector for each example (the batch dimension) and each timestep (the sequence-length dimension). In a sense, this vector is first transformed by matrix multiplications (possibly involving some pointwise non-linearities, which don't change the shape, so I don't mention them) into a hidden-state update, which is also a vector, and the updated hidden-state vector is then used to produce the output (also a vector).
As you can see, the LSTM is designed to operate on vectors. You could design a Matrix-LSTM – an LSTM counterpart that assumes any or all of the following are matrices: the input, the hidden state, the output. That would require you to replace the matrix-vector multiplications that process the input (or the state) with a generalized linear operation able to turn any matrix into any other, which would be given by a rank-4 tensor, I believe. However, it would be equivalent to just reshaping the input matrix into a vector, reshaping the rank-4 tensor into a matrix, doing a matrix-vector product, and then reshaping the output back into a matrix, so it makes little sense to devise such Matrix-LSTMs instead of just reshaping your inputs.
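To make the reshaping concrete, here is a minimal tf.keras sketch (all shapes are made up for illustration): each per-timestep CNN volume is flattened into a vector before entering the LSTM.

```python
import tensorflow as tf

# CNN output per timestep: an (H, W, C) volume, here 8x8x32 (made-up shape).
# The LSTM wants (batch, time, feature_dim), so flatten each volume to a vector.
x = tf.keras.Input(shape=(None, 8, 8, 32))        # (batch, time, H, W, C)
flat = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Flatten())(x)                 # (batch, time, 8*8*32)
out = tf.keras.layers.LSTM(64)(flat)
model = tf.keras.Model(x, out)
model.summary()
```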
That said, it might still make sense to design a generalized LSTM that takes something other than a vector as input if you know something about the input structure that suggests a more specific linear operator than a general rank-4 tensor. For example, images are known to have local structure (nearby pixels are more related than those far apart), hence using convolutions is more "reasonable" than reshaping images to vectors and then performing a general matrix multiplication. In a similar fashion, you could replace all the matrix-vector multiplications in the LSTM with convolutions, which would allow for image-like inputs, states and outputs.
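Such a convolutional LSTM is available off the shelf as tf.keras.layers.ConvLSTM2D; a minimal sketch, with made-up shapes:

```python
import tensorflow as tf

# ConvLSTM2D replaces the LSTM's matrix-vector products with convolutions,
# so the input, hidden state and output are all image-like (rows, cols, channels).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 64, 64, 1)),      # (batch, time, H, W, C)
    tf.keras.layers.ConvLSTM2D(filters=16, kernel_size=3, padding="same"),
])
model.summary()                                   # final state: (batch, 64, 64, 16)
```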

Sharing weights in parallel Convolutional Layer

I am currently developing a new network using NiftyNet and need some help.
I am trying to implement an Autofocus Layer [1] as proposed in the paper. At a certain point, the Autofocus Layer needs to compute K (K=4) parallel convolutions, each using the same weights (w), and concatenate the four outputs afterwards.
Is there a way to create four parallel convolutional layers, each having the same weights, in NiftyNet?
Thank you in advance.
[1] https://arxiv.org/pdf/1805.08403.pdf
The solution to this problem is as follows.
Nothing prevents you from using the same convolutional layer multiple times, each time with a different input. This simulates the desired parallelism and solves the weight-sharing issue, because there is only one convolutional layer.
However, this approach alone doesn't give each parallel branch a different dilation rate, since we still have only the one shared convolutional layer mentioned above.
Note: applying a convolutional layer with dilation rate = 2 to a given tensor is the same operation as applying a convolutional layer with dilation rate = 1 to a dilated version of that tensor (rate = 2).
Therefore, creating K dilated tensors, each with a different dilation rate, and then using each of them as input for a single convolutional layer with dilation rate = 1 solves the problem of having parallel layers, each with a different dilation rate.
NiftyNet provides a class to create dilated tensors.
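Outside NiftyNet, the same idea (one set of weights shared across K parallel branches with different dilation rates) can be sketched in plain TensorFlow by reusing a single kernel variable. This is an illustration of the concept, not NiftyNet's API, and the 2-D shapes are assumptions (the layer in the paper is 3-D):

```python
import tensorflow as tf

# One shared kernel, applied K times in parallel with different dilation rates.
kernel = tf.Variable(tf.random.normal([3, 3, 16, 16]))  # (kh, kw, in_ch, out_ch)

def parallel_shared_convs(x, rates=(1, 2, 4, 8)):
    # Every branch uses the *same* `kernel`, so the weights are shared.
    outs = [tf.nn.conv2d(x, kernel, strides=1, padding="SAME", dilations=r)
            for r in rates]
    return tf.concat(outs, axis=-1)                      # concatenate the K outputs

x = tf.random.normal([2, 32, 32, 16])                    # (batch, H, W, C)
y = parallel_shared_convs(x)
print(y.shape)                                           # (2, 32, 32, 64)
```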

How to apply different size kernel filters on a single convolutional layer

I am interested in building a ConvNet in which there are filters of different sizes at the same convolutional layer. How can I do this using TensorFlow?
You need to split your layer into several parallel convolutions with different filter sizes. Pad each convolution according to its filter size so the outputs have the same spatial dimensions, then concatenate the outputs.
Look at GoogLeNet for an example of such a configuration.
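A minimal tf.keras sketch of this pattern (the branch widths and kernel sizes are illustrative):

```python
import tensorflow as tf

# Parallel convolutions with different kernel sizes on the same input.
# padding="same" keeps the spatial dimensions equal so the outputs can be
# concatenated along the channel axis, as in GoogLeNet's Inception modules.
inputs = tf.keras.Input(shape=(32, 32, 3))
branches = [
    tf.keras.layers.Conv2D(16, kernel_size=k, padding="same",
                           activation="relu")(inputs)
    for k in (1, 3, 5)
]
outputs = tf.keras.layers.Concatenate()(branches)   # (batch, 32, 32, 48)
model = tf.keras.Model(inputs, outputs)
model.summary()
```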

Can't understand how filters in a Conv net are calculated

I've been studying machine learning for 4 months, and I understand the concepts behind the MLP. The problem came when I started reading about Convolutional Neural Networks. Let me tell you what I know and then ask what I'm having trouble with.
The core parts of a CNN are:
Convolutional layer: you have "n" filters that you use to generate "n" feature maps.
ReLU layer: you use it to apply a non-linearity to the output of the convolutional layer.
Sub-sampling layer: used for "generating" a new feature map that represents more abstract concepts.
Repeat the first 3 layers several times; the last part is a common classifier, such as an MLP.
My doubts are the following:
How do I create the filters used in the Convolutional Layer? Do I have to create a filter, train it, and then put it in the Conv Layer, or do I train it with the backpropagation algorithm?
Imagine I have a conv layer with 3 filters, then it will output 3 feature maps. After applying the RELU and Sub-sampling layer, I will still have 3 feature maps (smaller ones). When passing again through the Conv Layer, how do I calculate the output? Do I have to apply the filter in each feature map separately, or do some kind of operation over the 3 feature maps and then make the sum? I don't have any idea of how to calculate the output of this second Conv Layer, and how many feature maps it will output.
How do I pass the data from the Conv layers to the MLP (for classification in the last part of the NN)?
If someone knows of a simple implementation of a CNN without using a framework, I would appreciate it. I think the best way of learning how stuff works is by doing it yourself. Later, when you already know how stuff works, you can use frameworks, because they save you a lot of time.
You train the filters with the backpropagation algorithm, the same way you train an MLP.
You apply each filter across all feature maps. For example, if you have 10 feature maps in the first layer and a filter in the second layer has shape 3×3, then you apply the 3×3 filter to each of the ten feature maps in the first layer, with different weights for each feature map, and sum the results into a single output map; in this case one filter has 3×3×10 weights.
To understand it more easily, keep in mind that a pixel of a non-grayscale image has three values (red, green and blue), so if you're passing images to a convolutional neural network, then in the input layer you already have 3 feature maps (for RGB), and one value in the next layer is connected to all 3 feature maps in the first layer.
You should flatten the convolutional feature maps. For example, if you have 10 feature maps of size 5×5, then you get a layer with 250 values, and from there it's nothing different from an MLP: you connect all of these artificial neurons to all of the artificial neurons in the next layer by weights.
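Since the question asks for a framework-free version, here is a minimal NumPy sketch of points 2 and 3: one convolutional layer over multi-channel input, where each filter multiplies every input map and the results are summed, followed by flattening. All shapes are made up for illustration.

```python
import numpy as np

def conv_forward(x, w):
    """Naive valid convolution (really cross-correlation, as in most CNNs).
    x: input feature maps, shape (in_ch, H, W)
    w: one filter per output map, shape (out_ch, in_ch, kh, kw)
    returns: output feature maps, shape (out_ch, H-kh+1, W-kw+1)
    """
    out_ch, in_ch, kh, kw = w.shape
    H, W = x.shape[1], x.shape[2]
    out = np.zeros((out_ch, H - kh + 1, W - kw + 1))
    for o in range(out_ch):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                # multiply the filter with the patch in *every* input channel
                # and sum everything into a single output value
                out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * w[o])
    return out

x = np.random.randn(10, 7, 7)       # 10 feature maps from the previous layer
w = np.random.randn(3, 10, 3, 3)    # 3 filters, each with 3*3*10 weights
maps = conv_forward(x, w)           # (3, 5, 5)
flat = maps.reshape(-1)             # flatten before the MLP: 75 values
print(maps.shape, flat.shape)
```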
Here someone has implemented a convolutional neural network without frameworks.
I would also recommend these lectures.

Conclusion from PCA of dataset

I have a set of data for sequence labeling.
I did PCA (with 2 principal components on the x and y axes) on the dataset, and it turns out as below:
After using an LSTM network to classify the dataset above, I decided to extract the activations from the LSTM's hidden layer. What I obtain looks like the figure below:
My question is, what conclusion can I draw by comparing both the results?
Is it fair to say that the features of the original dataset are now self-organized after running it through an LSTM classifier?
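No answer was recorded, but for reference, extracting the hidden activations and projecting them with PCA can be sketched as follows; the model architecture, layer name, and data are stand-ins for illustration, not from the original post:

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

# Toy stand-in for the setup in the question: an LSTM classifier whose hidden
# activations we extract and project onto 2 principal components.
inputs = tf.keras.Input(shape=(20, 5))                  # (time, features), made up
hidden = tf.keras.layers.LSTM(32, name="lstm")(inputs)
outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)

# Sub-model that exposes the LSTM layer's output
feature_model = tf.keras.Model(inputs, model.get_layer("lstm").output)

x_data = np.random.randn(100, 20, 5).astype("float32")  # dummy sequences
activations = feature_model.predict(x_data)             # (100, 32)
coords = PCA(n_components=2).fit_transform(activations)
print(coords.shape)                                     # (100, 2), ready to scatter-plot
```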
