Embedded-RNN with Keras - Issue with Concatenate

For audio classification, I would like to try a kind of embedded RNN. Based on the MFCC or the FFT of a 30-second sample, I would like to create a first output for every 1-second subsample and then feed the 30 outputs to another RNN to get the final prediction. The idea is to fight the vanishing gradient problem by dividing the problem into multiple pieces (your opinion on this idea is also welcome; it comes from a visualisation I saw of WaveNet).
This is a representation of the model with only 4 timesteps and 1 layer of LSTM for every level:
In the following code, I have an issue with the dimension of Concatenate. Each input iX is (None, 30, 84) and each first-level output is (None, 32). After concatenation on axis 0, I would like a (None, 30, 32).
i1 = Input((30, 84))
l1 = CuDNNLSTM(units=32, return_sequences=False) (i1)
i2 = Input((30, 84))
l2 = CuDNNLSTM(units=32, return_sequences=False) (i2)
i3 = Input((30, 84))
l3 = CuDNNLSTM(units=32, return_sequences=False) (i3)
i4 = Input((30, 84))
l4 = CuDNNLSTM(units=32, return_sequences=False) (i4)
i5 = Input((30, 84))
l5 = CuDNNLSTM(units=32, return_sequences=False) (i5)
i6 = Input((30, 84))
l6 = CuDNNLSTM(units=32, return_sequences=False) (i6)
i7 = Input((30, 84))
l7 = CuDNNLSTM(units=32, return_sequences=False) (i7)
i8 = Input((30, 84))
l8 = CuDNNLSTM(units=32, return_sequences=False) (i8)
i9 = Input((30, 84))
l9 = CuDNNLSTM(units=32, return_sequences=False) (i9)
i10 = Input((30, 84))
l10 = CuDNNLSTM(units=32, return_sequences=False) (i10)
i11 = Input((30, 84))
l11 = CuDNNLSTM(units=32, return_sequences=False) (i11)
i12 = Input((30, 84))
l12 = CuDNNLSTM(units=32, return_sequences=False) (i12)
# ... up to 30
input_layer = [i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12]
first_layer = [l1, l2, l3, l4, l5, l6, l7, l8, l9, l10, l11, l12]
# f = Concatenate(axis=0)(first_layer) # Sequential format
f = concatenate(first_layer, axis=0) # Functional API version
o1 = CuDNNLSTM(units=32, return_sequences=False) (f)
outputs = Dense(16, activation='softmax') (o1)
model = Model(inputs=input_layer, outputs=outputs)
model.summary()
The error is logical, because a shape of (None, 32) is not compatible with an LSTM, which expects 3D input.
ValueError: Input 0 is incompatible with layer cu_dnnlstm_13: expected ndim=3, found ndim=2
Second thing: is there a way to train the model with the same "cell" for the whole first layer? For example, on the image, I would like to have red cells = blue cells = yellow cells = green cells in terms of cell state. This is because I would like a time-invariant output for a given sound: a specific sound at 0 seconds should produce the same output as the same sound at 10 seconds. But as it is now, the output will be different, based on every cell state.
If this is not possible in Keras, is there a way to do it with TensorFlow?
Many thanks for your support,
Nicolas

Regarding your error: it seems you want to stack your tensors (joining them along a new dimension), not concatenate them (joining them along an existing dimension).
Using K.stack():
import keras.backend as K
from keras.models import Model
from keras.layers import Lambda, Input, CuDNNLSTM, Dense
import numpy as np
# Demonstrating K.stack() on simple tensors:
list_l = [K.variable(np.random.rand(32)) for i in range(30)]
f = K.stack(list_l, axis=0)
print(f)
# > Tensor("stack:0", shape=(30, 32), dtype=float32)
# Actual usage, in your model:
input_layer = [Input(shape=(30, 84)) for n in range(30)]
first_layer = [CuDNNLSTM(units=32, return_sequences=False)(i) for i in input_layer]
f = Lambda(lambda tensors: K.stack(tensors, axis=1))(first_layer)
print(f)
# > Tensor("lambda_1/stack:0", shape=(?, 30, 32), dtype=float32)
o1 = CuDNNLSTM(units=32, return_sequences=False)(f)
outputs = Dense(16, activation='softmax') (o1)
model = Model(inputs=input_layer, outputs=outputs)
model.summary()
It isn't completely clear to me what you mean in your supplementary question... Weight sharing for your first CuDNNLSTM layers, maybe (see the Keras documentation on shared layers)?
If so, you could define your first layers as:
cudnn_lstm_first = CuDNNLSTM(units=32, return_sequences=False)
first_layer = [cudnn_lstm_first(i) for i in input_layer]
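Putting the two pieces together, here is a minimal sketch of the full model (same 30-input setup as above): the single shared CuDNNLSTM encodes every 1-second input with identical weights, and the stacked outputs feed the second-level LSTM.
import keras.backend as K
from keras.models import Model
from keras.layers import Lambda, Input, CuDNNLSTM, Dense

# One shared first-level LSTM: a given sound yields the same encoding
# regardless of where it occurs in the 30 seconds.
cudnn_lstm_first = CuDNNLSTM(units=32, return_sequences=False)
input_layer = [Input(shape=(30, 84)) for n in range(30)]
first_layer = [cudnn_lstm_first(i) for i in input_layer]

# Stack the 30 (None, 32) encodings into a (None, 30, 32) sequence.
f = Lambda(lambda tensors: K.stack(tensors, axis=1))(first_layer)
o1 = CuDNNLSTM(units=32, return_sequences=False)(f)
outputs = Dense(16, activation='softmax')(o1)
model = Model(inputs=input_layer, outputs=outputs)
model.summary()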

Related

Getting an error while training a DANN model -> CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

First I want to say that I am a newbie to machine learning, so I apologize if there is any mistake in my code.
I am training a DANN (Domain-Adversarial Neural Network) on the CIFAR10 and CIFAR100 datasets, and I am getting the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-25-f44419d6a132> in <module>
27 D.zero_grad()
28
---> 29 Ltot.backward()
30
31 C_opt.step()
1 frames
/usr/local/lib/python3.8/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
486 inputs=inputs,
487 )
--> 488 torch.autograd.backward(
489 self, gradient, retain_graph, create_graph, inputs=inputs
490 )
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
195 # some Python versions print out the first line of a multi-line function
196 # calls in the traceback and some print out the last line
--> 197 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
198 tensors, grad_tensors_, retain_graph, create_graph, inputs,
199 allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
I couldn't figure out the problem in the code. I am attaching the whole code for your reference.
# (Imports and DEVICE inferred from the usage below.)
import datetime
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class FeatureExtractor(nn.Module):
    """
    Feature Extractor
    """
    def __init__(self, in_channel=3, hidden_dims=512):
        super(FeatureExtractor, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channel, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),  # Add max pooling layer
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),  # Add max pooling layer
            nn.Conv2d(128, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),  # Add max pooling layer
            nn.Conv2d(256, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),  # Add max pooling layer
            nn.Conv2d(256, hidden_dims, 3, padding=1),
            nn.BatchNorm2d(hidden_dims),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)),
        )

    def forward(self, x):
        h = self.conv(x).squeeze()  # (N, hidden_dims)
        return h
class Classifier(nn.Module):
    """
    Classifier
    """
    def __init__(self, input_size=512, num_classes=10):
        super(Classifier, self).__init__()
        self.layer = nn.Sequential(
            nn.Linear(input_size, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, h):
        c = self.layer(h)
        return c
class Discriminator(nn.Module):
    """
    Simple Discriminator w/ MLP
    """
    def __init__(self, input_size=512, num_classes=1):
        super(Discriminator, self).__init__()
        self.layer = nn.Sequential(
            nn.Linear(input_size, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.5),
            nn.Linear(256, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, num_classes),
            nn.Sigmoid(),
        )

    def forward(self, h):
        y = self.layer(h)
        return y
F = FeatureExtractor().to(DEVICE)
C = Classifier().to(DEVICE)
D = Discriminator().to(DEVICE)
transform = transforms.Compose([
    transforms.Resize(28),
    transforms.CenterCrop(28),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
mnist_train = datasets.CIFAR10(root='../data/', train=True, transform=transform, download=True)
mnist_test = datasets.CIFAR10(root='../data/', train=False, transform=transform, download=True)
svhn_train = datasets.CIFAR100(root='../data/', train=True,transform=transform, download=True)
svhn_test = datasets.CIFAR100(root='../data/', train=False,transform=transform, download=True)
batch_size = 64
svhn_train.data.shape
svhn_loader = DataLoader(dataset=svhn_train, batch_size=batch_size, shuffle=True, drop_last=True,num_workers=3, pin_memory=True)
mnist_loader = DataLoader(dataset=mnist_train, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=3, pin_memory=True )
eval_loader = DataLoader(dataset=svhn_test, batch_size=batch_size, shuffle=False, drop_last=False,num_workers=3, pin_memory=True)
test_loader = DataLoader(dataset=mnist_test, batch_size=batch_size, shuffle=False, drop_last=False,num_workers=3, pin_memory=True)
bce = nn.BCELoss()
xe = nn.CrossEntropyLoss()
F_opt = torch.optim.Adam(F.parameters())
C_opt = torch.optim.Adam(C.parameters())
D_opt = torch.optim.Adam(D.parameters())
max_epoch = 50
step = 0
n_critic = 1 # for training the Discriminator k more steps
n_batches = len(mnist_train)//batch_size
# lamda = 0.01
D_src = torch.ones(batch_size, 1).to(DEVICE) # Discriminator Label to real
D_tgt = torch.zeros(batch_size, 1).to(DEVICE) # Discriminator Label to fake
D_labels = torch.cat([D_src, D_tgt], dim=0)
"""### Training Code
"""
def get_lambda(epoch, max_epoch):
    p = epoch / max_epoch
    return 2. / (1 + np.exp(-10. * p)) - 1.

mnist_set = iter(mnist_loader)

def sample_mnist(step, n_batches):
    global mnist_set
    if step % n_batches == 0:
        mnist_set = iter(mnist_loader)
    return next(mnist_set)
ll_c, ll_d = [], []
acc_lst = []
torch.cuda.empty_cache()
for epoch in range(1, max_epoch + 1):
    for idx, (src_images, labels) in enumerate(svhn_loader):
        tgt_images, _ = sample_mnist(step, n_batches)

        # Training Discriminator
        src, labels, tgt = src_images.to(DEVICE), labels.to(DEVICE), tgt_images.to(DEVICE)
        x = torch.cat([src, tgt], dim=0)
        h = F(x)
        y = D(h.detach())
        Ld = bce(y, D_labels)
        D.zero_grad()
        Ld.backward()
        D_opt.step()

        # Training Classifier and Feature Extractor
        c = C(h[:batch_size])
        y = D(h)
        Lc = xe(c, labels)
        Ld = bce(y, D_labels)
        lamda = 0.1 * get_lambda(epoch, max_epoch)
        Ltot = Lc - lamda * Ld
        F.zero_grad()
        C.zero_grad()
        D.zero_grad()
        Ltot.backward()
        C_opt.step()
        F_opt.step()

        if step % 100 == 0:
            dt = datetime.datetime.now().strftime('%H:%M:%S')
            print('Epoch: {}/{}, Step: {}, D Loss: {:.4f}, C Loss: {:.4f}, lambda: {:.4f} ---- {}'.format(epoch, max_epoch, step, Ld.item(), Lc.item(), lamda, dt))
            ll_c.append(Lc.detach().cpu().numpy())
            ll_d.append(Ld.detach().cpu().numpy())

        if step % 500 == 0:
            F.eval()
            C.eval()
            with torch.no_grad():
                corrects = torch.zeros(1).to(DEVICE)
                for idx, (src, labels) in enumerate(eval_loader):
                    src, labels = src.to(DEVICE), labels.to(DEVICE)
                    c = C(F(src))
                    _, preds = torch.max(c, 1)
                    corrects += (preds == labels).sum()
                acc = corrects.item() / len(eval_loader.dataset)
                print('***** Eval Result: {:.4f}, Step: {}'.format(acc, step))

                corrects = torch.zeros(1).to(DEVICE)
                for idx, (tgt, labels) in enumerate(test_loader):
                    tgt, labels = tgt.to(DEVICE), labels.to(DEVICE)
                    c = C(F(tgt))
                    _, preds = torch.max(c, 1)
                    corrects += (preds == labels).sum()
                acc = corrects.item() / len(test_loader.dataset)
                print('***** Test Result: {:.4f}, Step: {}'.format(acc, step))
                acc_lst.append(acc)
            F.train()
            C.train()

        step += 1

Why my variational autoencoder can't learn

I am using 187 samples as the train set, each with 68 features,
and would like to extract 10 features, then use PCA to plot in 2D.
My original data is right-skewed, but the latent space becomes normal.
Even though the loss decreases well, the model doesn't seem to be learning.
[model] variational autoencoder
Layers are 68 - 30 - 10 - 30 - 68, using leaky_relu as the activation function and tanh in the final layer.
I added L1 regularization to the loss function, and dropout in the encoder.
# (Imports and device inferred from the usage below.)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(68, 30)
        self.fc21 = nn.Linear(30, 10)
        self.fc22 = nn.Linear(30, 10)
        self.fc3 = nn.Linear(10, 30)
        self.fc4 = nn.Linear(30, 68)
        self.dropout = nn.Dropout(0.5)

    def encode(self, x):
        h1 = F.leaky_relu(self.fc1(x))
        h1 = self.dropout(h1)
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = logvar.mul(0.5).exp_()
        if torch.cuda.is_available():
            eps = torch.cuda.FloatTensor(std.size()).normal_()
        else:
            eps = torch.FloatTensor(std.size()).normal_()
        eps = Variable(eps)
        return eps.mul(std).add_(mu)

    def decode(self, z):
        h3 = F.leaky_relu(self.fc3(z))
        return torch.tanh(self.fc4(h3))  # sigmoid -> tanh

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 68))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), z, mu, logvar

model = VAE().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # this will help for L2 regularization

def loss_function(recon_x, x, mu, logvar):
    loss = nn.MSELoss(reduction='sum')
    BCE = loss(recon_x, x)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    regularization_loss = 0  # L1 regularization
    for param in model.parameters():
        regularization_loss += torch.sum(torch.abs(param))
    return BCE + KLD + regularization_loss
I have no clue why this is not working.
This is the desired output: [image]
This is the one I get -- very random: [image]

Deep CNN doesn't learn and accuracy just stays at the same value

I have a deep CNN based on ResNet and a dataset (10000, 50, 50, 1) for classifying digits. When I run it and it starts learning, the accuracy just stops at some value and gently oscillates around it (around 0.2). I am wondering whether it is overfitting or whether there is another issue involved.
Here is the identity block:
# (Imports inferred from the usage below.)
import tensorflow
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation, Add,
                                     ZeroPadding2D, MaxPooling2D, AveragePooling2D,
                                     Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras import initializers
from tensorflow.keras.callbacks import EarlyStopping

def identity_block(X, f, filters, stage, block):
    # defining name basics
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # retrieve filters
    F1, F2, F3 = filters

    # save the shortcut
    X_shortcut = X

    # first component
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2a',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # second component
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # third component
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # final component
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X
and the convolutional block:
def conv_block(X, f, filters, stage, block, s=2):
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve filters
    F1, F2, F3 = filters

    # Save shortcut
    X_shortcut = X

    # First component
    X = Conv2D(F1, kernel_size=(1, 1), strides=(s, s), name=conv_name_base + '2a',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # Second component
    X = Conv2D(F2, kernel_size=(f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component
    X = Conv2D(F3, kernel_size=(1, 1), strides=(1, 1), name=conv_name_base + '2c', padding='valid',
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Shortcut
    X_shortcut = Conv2D(F3, kernel_size=(1, 1), strides=(s, s), name=conv_name_base + '1',
                        kernel_initializer=initializers.glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)

    # Finally
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X
and finally the ResNet:
def ResNet(input_shape=(50, 50, 1), classes=10):
    inp = Input(shape=(50, 50, 1))

    # zero padding
    X = ZeroPadding2D((3, 3), name='pad0')(inp)

    # stage 1
    X = Conv2D(32, (5, 5), name='conv1', input_shape=input_shape,
               kernel_initializer=initializers.glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name='bn1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((2, 2), name='pool1')(X)

    # Stage 2
    stage2_filtersize = 32
    X = conv_block(X, 3, filters=[stage2_filtersize, stage2_filtersize, stage2_filtersize], stage=2, block='a', s=1)
    X = identity_block(X, 3, [stage2_filtersize, stage2_filtersize, stage2_filtersize], stage=2, block='b')
    X = identity_block(X, 3, [stage2_filtersize, stage2_filtersize, stage2_filtersize], stage=2, block='c')

    # Stage 3
    stage3_filtersize = 64
    X = conv_block(X, 3, filters=[stage3_filtersize, stage3_filtersize, stage3_filtersize], stage=3, block='a', s=1)
    X = identity_block(X, 3, [stage3_filtersize, stage3_filtersize, stage3_filtersize], stage=3, block='b')
    X = identity_block(X, 3, [stage3_filtersize, stage3_filtersize, stage3_filtersize], stage=3, block='c')

    # Stage 4
    stage4_filtersize = 128
    X = conv_block(X, 3, filters=[stage4_filtersize, stage4_filtersize, stage4_filtersize], stage=4, block='a', s=1)
    X = identity_block(X, 3, [stage4_filtersize, stage4_filtersize, stage4_filtersize], stage=4, block='b')
    X = identity_block(X, 3, [stage4_filtersize, stage4_filtersize, stage4_filtersize], stage=4, block='c')

    # final pooling
    X = AveragePooling2D((2, 2), padding='same', name='Pool0')(X)

    # FC
    X = Flatten(name='D0')(X)
    X = Dense(classes, activation='softmax', kernel_initializer=initializers.glorot_uniform(seed=0), name='D2')(X)

    # create model
    model = Model(inputs=inp, outputs=X)
    return model
Update 1: here are the compile and fit calls:
model.compile(optimizer='adam',
              loss=tensorflow.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
print("model compiled settings imported successfully")
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(X_train, Y_train, validation_split=0.2, callbacks=[early_stopping], epochs=10)
test_loss, test_acc = model.evaluate(X_test, Y_test, verbose=2)
First, try normalizing the values of the digit images (50x50).
Then also consider how a neural network learns its weights. A convolutional neural network learns by continually adding gradient error vectors, multiplied by a learning rate computed through backpropagation, to the various weight matrices throughout the network as training examples are passed through.
The most important thing to consider is the multiplication by the learning rate: if the training inputs are not scaled, the ranges of the feature-value distributions will likely differ from feature to feature, so the learning-rate correction in each dimension would differ from the others. The machine could then be overcompensating the correction in one weight dimension while undercompensating in another. This is very non-ideal, as it might result in an oscillating state or a very slow training state.
Oscillating means that the model is unable to locate the center of a better optimum in the weights.
Slow training means moving too slowly to reach a better optimum.
This is why it is common practice to normalize images before using them as input to a neural network, or to any gradient-based model.
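As a minimal sketch of that normalization (assuming X_train / X_test hold raw 8-bit grayscale pixel values, which is typical for digit images):
import numpy as np

# Assumption: raw pixel values in [0, 255]; scale them into [0, 1].
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0
Standardizing with the training set's mean and standard deviation is an equally common choice; either way, compute the statistics on the training set only.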
TF_Support's answer:
Provide a few samples of the dataset, the loss curve, and the accuracy plot, so we can clearly understand what you're trying to learn; that is more important than the code you provided.
My guess is that you are trying to learn very hard samples; 50x50 grayscale is not much. Is your network overfitting? (We could only figure that out after looking at some plots of the validation metrics.) (Is 0.2 your training accuracy?)
First do a sanity check on the dataset by training a very simple CNN. I see you have 10 classes (not sure, just guessing from the function's default value), so random-chance accuracy is 10%; set a baseline with a simple CNN first and then try to improve with ResNet (see the sketch below).
Increase the learning rate and see how the accuracy fluctuates. After a few epochs, reduce the learning rate once the accuracy is better than the baseline.
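If it helps, a minimal baseline of that kind could look like the following sketch (assuming 50x50 grayscale inputs, 10 classes, and integer labels; the layer sizes are purely illustrative):
from tensorflow.keras import layers, models

baseline = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(50, 50, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
baseline.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
# If even this simple model plateaus near chance level, suspect the data or
# labels rather than the ResNet architecture.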

Trying to understand Pytorch's implementation of LSTM

I have a dataset containing 1000 examples where each example has 5 features (a, b, c, d, e). I want to feed 7 examples to an LSTM so it predicts feature (a) of the 8th day.
Reading PyTorch's documentation of nn.LSTM(), I came up with the following:
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
output = fc(out[-1])
output # output's shape is ([7,1])
According to the docs:
The input of the nn.LSTM is "input of shape (seq_len, batch, input_size)" with "input_size – The number of expected features in the input x",
And the output is: "output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t."
In this case, I thought seq_len would be the sequence of 7 examples, batch is 1, and input_size is 5. So the LSTM would consume each example containing 5 features, re-feeding the hidden state at every iteration.
What am I missing?
When I extend your code to a full example -- I also added some comments that may help -- I get the following:
import torch
import torch.nn as nn
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
X = [
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
]
X = torch.tensor(X, dtype=torch.float32)
print(X.shape) # (seq_len, batch_size, input_size) = (7, 1, 5)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
print(out.shape) # (seq_len, batch_size, hidden_size) = (7, 1, 10)
out = out[-1] # Get output of last step
print(out.shape) # (batch, hidden_size) = (1, 10)
out = fc(out) # Push through linear layer
print(out.shape) # (batch_size, output_size) = (1, 1)
This makes sense to me, given your batch_size = 1 and output_size = 1 (I assume you're doing regression). I don't know where your output.shape = (7, 1) comes from.
Are you sure that your X has the correct dimensions? Did you maybe create nn.LSTM with batch_first=True? There are a lot of little things that can sneak in.
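For reference, here is a sketch of how batch_first=True changes the interpretation of the very same data (an assumption about what might have happened; the toy setup is the one from above):
import torch
import torch.nn as nn

lstm_bf = nn.LSTM(input_size=5, hidden_size=10, num_layers=1, batch_first=True)

# With batch_first=True the expected layout is (batch, seq_len, input_size),
# so one sample with 7 steps must be shaped (1, 7, 5) instead of (7, 1, 5).
X_bf = torch.rand(1, 7, 5)  # one sample, 7 steps, 5 features
out, hidden = lstm_bf(X_bf)
print(out.shape)  # (batch, seq_len, hidden_size) = (1, 7, 10)

# out[-1] now selects the (only) batch element, not the last time step:
print(out[-1].shape)  # (seq_len, hidden_size) = (7, 10)
# Pushing that through nn.Linear(10, 1) gives (7, 1) -- one value per time step,
# which would explain an output shape of (7, 1).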

Has anyone successfully trained Squeezenet with residual connections?

I have trained the two versions of SqueezeNet, both with success, thanks #forresti!
When training the one with residual connections, I am stuck. Whatever learning policy I take, the one shipped in this repo or a plain step policy, I cannot train it to the results given in the paper. The accuracy is a bit lower than SqueezeNet v1.0...
I know that I should post this in that repo, but I can't find an issues tab there...
Could anyone shed some light? Thanks in advance!
====================EDIT=============================
I first adopted the solver hyperparameters shipped with SqueezeNet v1.0. Then I changed the learning policy from poly to step, keeping the remaining parameters untouched, and closely monitored the loss and accuracy; when they became apparently flat, I reduced the learning rate by a factor of 0.4. In both cases I got top-5 accuracies of 81.9x% and 79.8x%, lower than the benchmark provided in the paper, which seems rather weird...
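(Side note: the manual "drop the learning rate by 0.4 when the metrics flatten" schedule described above can be automated in Keras, the framework used in the answer below, with ReduceLROnPlateau; the patience and min_lr values here are illustrative choices.)
from keras.callbacks import ReduceLROnPlateau

# Reduce the learning rate by a factor of 0.4 once the monitored metric
# stops improving, mirroring the manual schedule described above.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.4,
                              patience=3, verbose=1, min_lr=1e-6)
# model.fit(..., callbacks=[reduce_lr])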
You can use the newest version, SqueezeNet v1.1, from: https://github.com/rcmalli/keras-squeezenet
Model Definition:
from keras import backend as K
from keras.layers import Input, Convolution2D, MaxPooling2D, Activation, concatenate, Dropout
from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D
from keras.models import Model
from keras.utils.layer_utils import get_source_inputs #https://stackoverflow.com/questions/68862735/keras-vggface-no-module-named-keras-engine-topology
from tensorflow.keras.utils import get_file
from keras.utils import layer_utils
sq1x1 = "squeeze1x1"
exp1x1 = "expand1x1"
exp3x3 = "expand3x3"
relu = "relu_"
WEIGHTS_PATH = "https://github.com/rcmalli/keras-squeezenet/releases/download/v1.0/squeezenet_weights_tf_dim_ordering_tf_kernels.h5"
WEIGHTS_PATH_NO_TOP = "https://github.com/rcmalli/keras-squeezenet/releases/download/v1.0/squeezenet_weights_tf_dim_ordering_tf_kernels_notop.h5"
# Modular function for Fire Node
def fire_module(x, fire_id, squeeze=16, expand=64):
    s_id = 'fire' + str(fire_id) + '/'
    if K.image_data_format() == 'channels_first':
        channel_axis = 1
    else:
        channel_axis = 3

    x = Convolution2D(squeeze, (1, 1), padding='valid', name=s_id + sq1x1)(x)
    x = Activation('relu', name=s_id + relu + sq1x1)(x)

    left = Convolution2D(expand, (1, 1), padding='valid', name=s_id + exp1x1)(x)
    left = Activation('relu', name=s_id + relu + exp1x1)(left)

    right = Convolution2D(expand, (3, 3), padding='same', name=s_id + exp3x3)(x)
    right = Activation('relu', name=s_id + relu + exp3x3)(right)

    x = concatenate([left, right], axis=channel_axis, name=s_id + 'concat')
    return x
# Original SqueezeNet from paper.
def SqueezeNet(include_top=True, weights='imagenet',
               input_tensor=None, input_shape=None,
               pooling=None,
               classes=1000):
    """Instantiates the SqueezeNet architecture."""
    if weights not in {'imagenet', None}:
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization) or `imagenet` '
                         '(pre-training on ImageNet).')

    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

    x = Convolution2D(64, (3, 3), strides=(2, 2), padding='valid', name='conv1')(img_input)
    x = Activation('relu', name='relu_conv1')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool1')(x)

    x = fire_module(x, fire_id=2, squeeze=16, expand=64)
    x = fire_module(x, fire_id=3, squeeze=16, expand=64)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool3')(x)

    x = fire_module(x, fire_id=4, squeeze=32, expand=128)
    x = fire_module(x, fire_id=5, squeeze=32, expand=128)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool5')(x)

    x = fire_module(x, fire_id=6, squeeze=48, expand=192)
    x = fire_module(x, fire_id=7, squeeze=48, expand=192)
    x = fire_module(x, fire_id=8, squeeze=64, expand=256)
    x = fire_module(x, fire_id=9, squeeze=64, expand=256)

    if include_top:
        # It's not obvious where to cut the network...
        # Could do the 8th or 9th layer... some work recommends cutting earlier layers.
        x = Dropout(0.5, name='drop9')(x)
        x = Convolution2D(classes, (1, 1), padding='valid', name='conv10')(x)
        x = Activation('relu', name='relu_conv10')(x)
        x = GlobalAveragePooling2D()(x)
        x = Activation('softmax', name='loss')(x)
    else:
        if pooling == 'avg':
            x = GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = GlobalMaxPooling2D()(x)
        elif pooling is None:
            pass
        else:
            raise ValueError("Unknown argument for 'pooling'=" + pooling)

    # x = Dense(10, activation='softmax')(x)

    # Ensure that the model takes into account
    # any potential predecessors of `input_tensor`.
    if input_tensor is not None:
        inputs = get_source_inputs(input_tensor)
    else:
        inputs = img_input

    model = Model(inputs, x, name='squeezenet')

    # load weights
    if weights == 'imagenet':
        if include_top:
            weights_path = get_file('squeezenet_weights_tf_dim_ordering_tf_kernels.h5',
                                    WEIGHTS_PATH,
                                    cache_subdir='models')
        else:
            weights_path = get_file('squeezenet_weights_tf_dim_ordering_tf_kernels_notop.h5',
                                    WEIGHTS_PATH_NO_TOP,
                                    cache_subdir='models')
        model.load_weights(weights_path)

        if K.backend() == 'theano':
            layer_utils.convert_all_kernels_in_model(model)

    return model
Example Usage:
import numpy as np
from keras_squeezenet import SqueezeNet
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
from keras.preprocessing import image
model = SqueezeNet()
img = image.load_img('../images/cat.jpeg', target_size=(227, 227))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
print('Predicted:', decode_predictions(preds))
