I am having a problem with data shape when I try to feed it into my ConvLSTM network.
I already know that the training input for a ConvLSTM should be 5-dimensional: (number of sequences, number of samples in each sequence, rows, columns, channels). I also know that the labels for the same network should be four-dimensional, as Keras expects. So my training data set has the shape (105, 15, 30, 40, 3), and for the corresponding label set I created an array of shape (105*15, 30, 40, 3). But I am getting the following error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 105 input samples and 1575 target samples.
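A minimal sketch of why Keras raises this, assuming each timestep really does have its own label frame (the single ConvLSTM2D layer and its filter count below are placeholders, not the asker's actual model):

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import ConvLSTM2D

X = np.zeros((105, 15, 30, 40, 3))   # (sequences, timesteps, rows, cols, channels)
y = np.zeros((105 * 15, 30, 40, 3))  # 1575 label frames -> first axis no longer matches the 105 inputs

# Keras only compares the first axis of inputs and targets, so the labels must
# also start with 105. If every timestep has its own label frame, keep the
# timestep axis and let the ConvLSTM return one output per timestep:
y = y.reshape(105, 15, 30, 40, 3)

model = Sequential([
    ConvLSTM2D(3, kernel_size=(3, 3), padding='same',
               return_sequences=True, input_shape=(15, 30, 40, 3)),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=1, batch_size=8)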
ValueError: One of the dimensions in the output is <= 0 due to downsampling in conv3d_15. Consider increasing the input size. Received input shape [None, 1, 1, 1, 1904211] which would produce output shape with a zero or negative value in a dimension.
Can anyone explain to me what this error means? I got it while trying to build my 3D convolutional neural network.
A 3D convolution layer expects the input to be in a shape similar to (4, 28, 28, 28, 1), i.e. a batch of 28x28x28 volumes with a single channel. More info here - https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv3D
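A small runnable sketch of that layout (the filter count and kernel size below are arbitrary illustrations, not values from the question):

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv3D

# A batch of 4 volumes of 28x28x28 voxels with one channel:
# (batch, depth, height, width, channels)
volumes = np.random.rand(4, 28, 28, 28, 1).astype('float32')

model = Sequential([
    Conv3D(8, kernel_size=(3, 3, 3), activation='relu', input_shape=(28, 28, 28, 1)),
])
print(model.predict(volumes).shape)  # (4, 26, 26, 26, 8) with the default 'valid' padding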
Can someone please explain to me the inputs and outputs, along with the working of the layer mentioned below?
model.add(Embedding(total_words, 64, input_length=max_sequence_len-1))
total_words = 263
max_sequence_len=11
Is 64 the number of dimensions?
And why is the output of this layer (None, 10, 64)?
Shouldn't it be a 64-dimensional vector for each word, i.e. (None, 263, 64)?
You can find all the information about the Embedding layer of TensorFlow here.
The first two parameters are input_dimension and output_dimension.
The input dimension basically represents the vocabulary size of your model. You can find this out by using the word_index attribute of the Tokenizer() object.
The output dimension is going to be the dimensionality of the input to the next Dense layer.
The output of the Embedding layer is of the form (batch_size, input_length, output_dim). Since you specified the input_length parameter, your layer's input will be of the form (batch, input_length). That's why the output is of the form (None, 10, 64).
Hope that clears up your doubt ☺️
In the Embedding layer the first argument is the input dimension (which is typically of considerable size, namely the vocabulary size). The second argument is the output dimension, i.e. the dimensionality of the reduced embedding vector. The third argument is the sequence length. In essence, an Embedding layer is simply learning a lookup table of shape (input dim, output dim), and the weights of this layer have that shape. The output of the layer, however, will of course be of shape (batch, seq length, output dim): one dimensionality-reduced embedding vector for each element in the input sequence. The shape you were expecting is actually the shape of the weights of the embedding layer.
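A short sketch that makes the distinction concrete, using the numbers from the question (total_words = 263, max_sequence_len = 11); the dummy batch is only there to build the layer:

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding

total_words = 263
max_sequence_len = 11

model = Sequential([
    Embedding(total_words, 64, input_length=max_sequence_len - 1),
])

dummy_batch = np.zeros((1, max_sequence_len - 1), dtype='int32')
print(model(dummy_batch).shape)                # (1, 10, 64): one 64-d vector per token in the sequence
print(model.layers[0].get_weights()[0].shape)  # (263, 64): the learned lookup table (the shape you expected)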
I'm trying to fit data of the following shape to the pretrained Keras VGG16 model.
image input shape is (32383, 96, 96, 3)
label shape is (32383, 17)
and I got this error
expected block5_pool to have 4 dimensions, but got array with shape (32383, 17)
at this line
model.fit(x=X_train, y=Y_train, validation_data=(X_valid, Y_valid),
          batch_size=64, verbose=2, epochs=epochs, callbacks=callbacks, shuffle=True)
Here's how I define my model
model = VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(96,96,3),classes=17)
How did maxpool give me a 2D tensor and not a 4D tensor? I'm using the original model from keras.applications.vgg16. How can I fix this error?
Your problem comes from VGG16(include_top=False, ...), which makes your model load only the convolutional part of VGG. This is why Keras complains that it got a 2-dimensional array instead of a 4-dimensional one (the 4 dimensions come from the fact that the convolutional output has shape (nb_of_examples, width, height, channels)). To overcome this issue you need to either set include_top=True or add additional layers which squash the convolutional part down to a 2D one (e.g. using Flatten, GlobalMaxPooling2D or GlobalAveragePooling2D, followed by a set of Dense layers, including a final Dense of size 17 with a softmax activation function).
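A hedged sketch of the second option (keeping include_top=False and adding a small classification head); apart from the final Dense(17, activation='softmax'), the layer sizes are illustrative choices rather than something prescribed by the question:

from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

base = VGG16(include_top=False, weights='imagenet', input_shape=(96, 96, 3))

# Squash the 4D convolutional output (batch, height, width, channels) down to 2D,
# then map it onto the 17 classes encoded in the labels.
x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
outputs = Dense(17, activation='softmax')(x)

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train, Y_train, ...) now receives targets of shape (n, 17), as expected.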
So, I'm trying to learn fixed vector representations for segments of about 200 songs (~ 3-5 minutes per song) and wanted to use an LSTM-based Sequence-to-sequence Autoencoder for it.
I'm preprocessing the audio (using librosa) as follows:
I'm first just getting a raw audio signal time series of shape around (1500000,) - (2500000,) per song.
I'm then slicing each raw time series into segments and getting a lower-level mel spectrogram matrix of shape (512, 3000) - (512, 6000) per song. Each of these (512,) vectors can be referred to as 'mini-songs' as they represent parts of the song.
I vertically stack all these mini-songs of all the songs together to create the training data (let's call this X). X turns out to be (512, 600000) in size, where the first dimension (512) is the window size and the second dimension (600000) is the total number of 'mini-songs' in the dataset.
Which is to say, there are about 600000 mini-songs in X - each column in X represents a mini-song of length (512,).
Each of these (512,) mini-song vectors should be encoded into a (50,) vector per mini-song i.e. we will have 600000 (50,) vectors at the end of the process.
In more standard terminology, I have 600000 training samples, each of length 512. [Think of this as being similar to an image dataset - 600000 images, each of length 784, where the images are of resolution 28x28. Except in my case I want to treat the 512-length samples as sequences that have temporal properties.]
I read the example here and was looking to extend that for my use case. I was wondering what the timesteps and input_dim parameters to the Input layer should be set to.
I'm setting timesteps = X.shape[0] (i.e. 512 in this case) and input_dim = X.shape[1] (i.e 600000). Is this the correct way to go about it?
Edit: Added clarifications above.
Your input is actually a 1D sequence, not a 2D image.
The input tensor will be (600000, 512, 1), and you need to set input_dim to 1 and timesteps to 512.
The input shape does not include the first (sample) dimension of the tensor (i.e. 600000 in your case).
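A minimal sketch of that reshaping plus the matching Input layer, following the seq2seq autoencoder pattern from the linked example; the latent size of 50 comes from the question, while the small stand-in array and single-layer encoder/decoder are placeholders:

import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, LSTM, RepeatVector

timesteps, input_dim, latent_dim = 512, 1, 50

# X arrives as (512, n_mini_songs): transpose to (samples, timesteps) and add a feature axis.
X = np.random.rand(512, 1000).astype('float32')  # small stand-in for the real (512, 600000) array
X = X.T.reshape(-1, timesteps, input_dim)         # -> (1000, 512, 1)

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)                # (batch, 50): the fixed-size representation per mini-song
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X, X, epochs=1, batch_size=32)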
The Keras tutorial gives the following code example (with comments):
# apply a convolution 1d of length 3 to a sequence with 10 timesteps,
# with 64 output filters
model = Sequential()
model.add(Convolution1D(64, 3, border_mode='same', input_shape=(10, 32)))
# now model.output_shape == (None, 10, 64)
I am confused about the output size. Shouldn't it create 10 timesteps with a depth of 64 and a width of 32 (stride defaults to 1, no padding)? So (10, 32, 64) instead of (None, 10, 64)?
In k-dimensional convolution you have filters which preserve the structure of the first k dimensions and squash the information from every other dimension by convolving it with the filter weights. So basically every filter in your network has shape (3 x 32), and all the information from the last dimension (the one with size 32) is squashed into a single real number, while the first dimension is preserved. This is why you get a shape like this.
You can imagine a similar situation in the 2D case, when you have a colour image. Your input then has a 3-dimensional structure (picture_length, picture_width, colour). When you apply a 2D convolution with respect to the first two dimensions, all the information about colours is squashed by your filter and is not preserved as a separate axis in the output structure. The same happens here.
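A quick way to check this with the current Keras spelling (Conv1D with padding='same' replaces the older Convolution1D/border_mode arguments):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D

model = Sequential([
    # Each of the 64 filters has shape (3, 32): it spans 3 timesteps and all 32 input
    # channels, collapsing the channel axis to a single number per position.
    Conv1D(64, 3, padding='same', input_shape=(10, 32)),
])
print(model.output_shape)  # (None, 10, 64): 10 timesteps preserved, 32 channels squashed into 64 filter outputs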