Understanding Temporal Convolution in Torch

I am trying to parameterise a 1D conv net via Torch.
Let's say I have a Tensor called data of dimensions 10 x 512, i.e. 10 rows and 512 columns. I want to implement a single 3-layer stack consisting of a TemporalConvolution layer, followed by ReLU, followed by TemporalMaxPooling. My classification problem is binary, and there is a corresponding labels tensor of size 10 x 1. Assume that a feval function has already been written to iterate through each row of both data and labels.
The problem, then, is to construct a net that can map the 512 columns down to 1 column.
Adapted from the documentation:
...
model = nn.Sequential()
model:add(nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW]))
model:add(nn.ReLU())
model:add(nn.TemporalMaxPooling(kW2, [dW2]))
...
criterion = nn.BCECriterion()
...
I have parameterised it as follows, but it doesn't work:
TemporalConvolution(512,1,3,1)
ReLU()
TemporalMaxPooling(3, 1)
It throws the error: 2D or 3D(batch mode) tensor expected. As a result I tried to reshape data before passing it to the net:
data = data:resize(1, 10, 512)
But this throws the error: invalid input frame size.
I can see that the error concerns the shape of the data coming into the conv net, and of course the parameterisation too. I am further confused by this post, which seems to suggest that the inputFrameSize of TemporalConvolution should be set to 10, not 512.
Any guidance would be appreciated, as to how to build a 1D conv net.
P.S. I have tested the script with a logisticRegression model, and that runs, so the issue is purely with the conv net architecture / the shape of the data coming into it.

I guess you misunderstand the meaning of inputFrameSize: it is not the sequence length (seqlen) of your input but the number of channels (e.g. for 512 x 512 RGB images in a 2D convolution, the inputFrameSize should be 3, not 512).
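To make that concrete, here is a minimal sketch of the corrected parameterisation, written in PyTorch for clarity rather than Torch7 (note that Torch7's TemporalConvolution instead expects input of shape seqLen x inputFrameSize), and assuming the 10 x 512 tensor means 10 samples, each a length-512 sequence with a single channel:

import torch
import torch.nn as nn

# Assumed interpretation: 10 samples, each a length-512, single-channel sequence.
# PyTorch's Conv1d expects (batch, channels, seq_len).
data = torch.randn(10, 1, 512)

model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=1),  # 512 -> 510
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=3, stride=1),                              # 510 -> 508
)
print(model(data).shape)  # torch.Size([10, 1, 508])

The key point is that the channel count (1 here, the analogue of inputFrameSize) is separate from the sequence length (512).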

Related

Reshaping numpy array as an input to CNN

I have seen multiple posts on reshaping numpy arrays as inputs to CNNs; however, I haven't been able to successfully reshape my array as an input to my CNN!
I have a CNN that merges with another model further downstream. The input shape of the CNN is (4,4,1). It is bigger in reality, but I have purposefully made it smaller to establish the pipeline and get it running before I put in the proper size.
The format will be the same, however: a 1-channel n x n np.array. I am getting errors when reshaping, which I will mention after the code. The input dimensions are passed to the model as follows:
cnn_branch_input = tf.keras.layers.Input(shape=(4,4,1))
cnn_branch_two = tf.keras.layers.Conv2D(etc....)(cnn_branch_input)
The np.array (which is originally a pandas dataframe) characteristics and reshaping are as follows:
np.array(array).shape
(4,4)
input = np.array(array).reshape(-1,1,4,4)
input.shape
(1,1,4,4)
the input to my merged model is as follows:
model.fit([cnn_input,gnn_input, gnn_node_feat], y,
#sample_weight=train_mask,
#validation_data=validation_data,
batch_size=4,
shuffle=False)
This causes an error, which makes sense to me:
ValueError: Data cardinality is ambiguous:
x sizes: 1, 4, 4 -- Please provide data which shares the same first dimension.
So now I reshape to intentionally get a 4x4-plus-1-channel shape, as follows:
input = np.array(array).reshape(-1,4,4,1)
input.shape
(1,4,4,1)
Two things: the array reshapes to four 1x1 arrays, so it seems the structure of the original array is lost, and I get the same error!
Notice that in both reshape methods the shape is either (1,4,4,1) or (1,1,4,4). The -1 entry simply becomes a 1, making the CNN think the first dimension has size 1. I thought the -1 would allow me to successfully add the sample dimension as 'any number of samples'.
Simply passing in the original (4,4) array, I receive the error that the CNN received a 2-dimensional array while a 4-dimensional array is required.
I'm really confused as to how to correctly reshape this array! I would appreciate any help!
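For reference, a minimal numpy sketch (assuming a single 4x4 sample, as above) of what the two reshapes produce; because there is only one sample's worth of data, the -1 can only resolve to 1:

import numpy as np

array = np.arange(16).reshape(4, 4)       # a single 4x4 sample

print(array.reshape(-1, 1, 4, 4).shape)   # (1, 1, 4, 4)  channels-first
print(array.reshape(-1, 4, 4, 1).shape)   # (1, 4, 4, 1)  channels-last

# model.fit requires every input in [cnn_input, gnn_input, gnn_node_feat]
# to share the same first (sample) dimension, hence the cardinality error
# when the other inputs have 4 samples but this one has 1.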

What input shapes are required for a classifier that is using a pre-trained network (Pytorch)?

I'm fairly new to deep learning, Python, and PyTorch, so please bear with me!
I'm trying to understand Transfer Learning in Pytorch using two different Pretrained Networks: Vgg11 and Densenet121.
I've run data of shape (3 x 224 x 224) through the "features" part of the above networks, and the output shapes are as follows:
Vgg11 features output shape: 512 x 7 x 7
Densenet121 features output shape: 1024 x 7 x 7
Now, I'm trying to make my own Classifier to use instead of the Pre-trained one. Upon checking both pre-trained classifiers, I see the Vgg11 classifier has in the first layer:
(0): Linear(in_features=25088, out_features=4096, bias=True)
While the Densenet121 has in the first layer:
(classifier): Linear(in_features=1024, out_features=1000, bias=True)
The Vgg one makes sense, since if you flatten the output of the "features" part, you get 512 x 7 x 7 = 25,088.
How does the Densenet one have only 1024 input dimensions? If you flatten the output of its "features" part, you get 1024 x 7 x 7 = 50,176.
Are there steps that I am missing for either of them? Are there ways to check the input and output shapes of each layer and find out exactly what's happening?
Thank you.
As mentioned in Table 1 of the DenseNet paper, DenseNet-121 uses something called Global Average Pooling, an extreme way of pooling in which a tensor of dimensions d x h x w is reduced to d x 1 x 1. That is why the classifier's in_features is 1024 rather than 1024 x 7 x 7.
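As a sketch of how to check this yourself (assuming a recent torchvision; in torchvision's implementation the pooling happens inside DenseNet.forward via F.adaptive_avg_pool2d, not inside .features):

import torch
import torch.nn.functional as F
from torchvision import models

densenet = models.densenet121(weights=None)
x = torch.randn(1, 3, 224, 224)

features = densenet.features(x)                   # (1, 1024, 7, 7)
pooled = F.adaptive_avg_pool2d(features, (1, 1))  # (1, 1024, 1, 1)
flat = torch.flatten(pooled, 1)                   # (1, 1024), matching in_features=1024
print(features.shape, pooled.shape, flat.shape)

Printing intermediate shapes like this (or registering forward hooks) is the usual way to find out exactly what is happening between layers.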

Keras CNN too few parameters

I am trying to recreate the following tutorial CNN with 3 inputs and sigmoid activation functions in keras:
So the number of parameters should be 7 (assuming one filter of size 2 convolved over two locations, either the top two inputs or the bottom two, plus 2 shared weights (shown as 1.0s on the synapses) and no padding in the Conv1D layer). When I write the following in Keras:
I only get 5 parameters when I check it in model.summary():
What do I need to do to get the correct number of parameters? There are probably several things that are wrong in my code since I'm new to Keras.
All convolutional parameters are shared spatially (in the case of 1D, this means across the input sequence). Precisely, the convolutional filter of length 2 is applied twice, to inputs (x[0], x[1]) and (x[1], x[2]), but it is the same filter in both cases, and correspondingly the trainable parameters are the same too.
This explains the parameter count you are getting right now: the Conv1D layer has 3 parameters (a weight vector of length 2 and a bias), and the dense layer has 2 parameters (1 weight and 1 bias) because the output of Conv1D is (?, 2, 1).
Finally, I can't comment on the network you're trying to implement. Probably they mean 2 filters (but then the layer would have 6 parameters)... But I'm not aware of any implementation in which the convolutional layer has separate parameters for each patch.
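A minimal sketch of a model matching the counts described above (an assumed reconstruction, since the original code is not shown in the question):

import tensorflow as tf
from tensorflow.keras import layers

# Assumed reconstruction: one filter of size 2, no padding, three inputs.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3, 1)),
    layers.Conv1D(1, 2, activation='sigmoid'),  # 2 weights + 1 bias = 3 params
    layers.Dense(1, activation='sigmoid'),      # acts on the last axis (size 1): 1 weight + 1 bias = 2 params
])
model.summary()  # Total params: 5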

RNN Encoder Decoder using keras

I am trying to build an architecture which will be used for machine translation (from English to French):
model = Sequential()
model.add(LSTM(256, input_shape =(15,1)))
model.add(RepeatVector(output_sequence_length))
model.add(LSTM(21,return_sequences=True))
model.add(TimeDistributed(Dense(french_vocab_size, activation='sigmoid')))
Max length of English sentence is 15 and that of French is 21. Max number of English words is 199 and that of French is 399. output_sequence_length is 21.
This model throws me an error
Error when checking input: expected lstm_40_input to have shape (None, 15, 1) but got array with shape (137861, 21, 1)
I am stuck with the understanding of the LSTM in keras.
1. The first argument, according to the documentation, is the 'dimensionality of the output space'. I did not understand what that means.
2. What exactly happens when return_sequences is set to True?
Please let me know.
What kind of data are you trying to feed your network? It seems to me that you didn't convert your words to vectors (binary vectors or encoded vectors).
Anyway, an LSTM network needs a 3-dimensional input, where the dimensions correspond to (samples, timesteps, features).
In your case, samples corresponds to the number of sentences, I guess 137861. Timesteps corresponds to the length of each sequence, which in your case is 15, and features is the size of each encoded word (depending on which type of encoding you choose; with one-hot encoding it will be 199).
The error that you got shows that you fed your network sequences with 21 timesteps instead of 15.
For your second question: when return_sequences is set to False, the LSTM layer returns only its last output, which in your case will be of shape (256,) for your first LSTM layer. When it's set to True, it returns one output per timestep, giving you an overall output of shape (15, 256). When you want to stack two or more LSTM layers, you always have to set the earlier layers to return_sequences = True.
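To make the return_sequences behaviour concrete, a minimal sketch (assuming one-hot encoded words of size 199, as discussed above):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(2, 15, 199).astype("float32")  # (samples, timesteps, features)

last_only = keras.Sequential([layers.LSTM(256, input_shape=(15, 199))])
per_step = keras.Sequential([layers.LSTM(256, return_sequences=True, input_shape=(15, 199))])

print(last_only(x).shape)  # (2, 256): one output per sample
print(per_step(x).shape)   # (2, 15, 256): one output per timestep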
Also, what you are building is called a many-to-many architecture, with different timestep lengths for the input and the output (15 vs 21). As far as I know, it's not that easy to implement in Keras.

Torch 'Gather' Issue

I have two tensors as follows:
Normalised Tensor :
1
10
94
[torch.LongStorage of size 3]
and
Batch :
1
10
[torch.LongStorage of size 2]
I would like to use 'Batch' to select indices in the 3rd dimension of 'Normalised Tensor'. So far I have used gather as follows:
normalised:long():gather(1, batch:long())
Unfortunately, it returns this error:
"bad argument #1 to 'gather' (Input tensor must have same dimensions as output"
Any help would be much appreciated! Thanks
This answer assumes the following: you have a three-dimensional tensor of size x,y,z and you want a three-dimensional tensor of size x,y,10, where the x,y slices are chosen based on the indices listed in another tensor of size 1,10.
I, personally, have spent much time pondering what the possible use of the gather method would be. The only conclusion I've come to is: it is not for the problem described above.
The described problem is solvable by use of the index function:
local slice = normalised:index(3, batch[1]:long())
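For comparison, a minimal sketch of the same selection in PyTorch (assuming the shapes from the question), using index_select along the third dimension:

import torch

normalised = torch.randn(1, 10, 94)
batch = torch.randint(0, 94, (1, 10))  # 10 indices into the third dimension

# Select along dim 2 using the 10 indices in batch[0]; the result is (1, 10, 10).
slice_ = normalised.index_select(2, batch[0])
print(slice_.shape)  # torch.Size([1, 10, 10])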
