TFF: 'trainable=True' causes accuracy to decrease - tensorflow-federated

I work with TFF; here is a part of my code:
def create_keras_model():
    baseModel = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                                input_tensor=tf.keras.Input(shape=(224, 224, 3)))
    for layer in baseModel.layers:
        layer.trainable = False
    return model  # the classification head that builds `model` on top of baseModel is omitted here
With this model I find a test accuracy of 0.8.
Now, when I change layer.trainable = True, the test accuracy decreases to 0.2 and the loss becomes 12, which is not normal. Can anyone tell me why?

Related

How to register a dynamic backward hook on tensors in Pytorch?

I'm trying to register a backward hook on each neuron's weights in a network. By dynamic I mean that it will take a value and multiply the associated gradients by that value.
From here it seems it's possible to register a hook on a tensor with a fixed value (though note that I need it to take a value that will change). From here it also seems it's possible to register a hook on all of the parameters -- they use it to do gradient clipping (though note that I'm trying to do it only on each neuron's weights).
If my network is as follows:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(3, 5)
        self.fc2 = nn.Linear(5, 10)
        self.fc3 = nn.Linear(10, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        return x
The first layer has 5 neurons, each with 3 associated weights. Hence, this layer should have 5 hooks that modify (i.e. change the current gradient by multiplying it) their 3 associated weight gradients during the backward step.
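For reference, nn.Linear stores its weight as an (out_features, in_features) matrix, so fc1 holds one row of 3 weights per neuron; a quick check of that shape (illustration only, not part of the question's code):
import torch.nn as nn

fc1 = nn.Linear(3, 5)
print(fc1.weight.shape)  # torch.Size([5, 3]): one row of 3 weights per neuron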
Training pseudo-code example:
net = Model()
for epoch in epochs:
    out = net(data)
    loss = criterion(out, target)
    optimizer.zero_grad()
    loss.backward()
    for hook in list_of_hooks:  # not sure if there's a more "pytorch" way of doing this without a for loop
        hook(random_value)
    optimizer.step()
What about exploiting a lambda's closure over names?
A short example:
import torch

net_params = torch.rand(5, 3, requires_grad=True)
msg = "Hello!"
net_params.register_hook(lambda g: print(msg))

out1 = net_params * 2.
loss = out1.sum()
loss.backward()  # Activates the hook and prints "Hello!"

msg = "How are you?"  # The lambda is affected by this change
out2 = net_params ** 4.
loss2 = out2.sum()
loss2.backward()  # Activates the hook again and prints "How are you?"
So a possible solution to your problem:
net = Model()

# Replace it with your computed values
rand_values = torch.rand(net.fc1.out_features, net.fc1.in_features)
net.fc1.weight.register_hook(lambda g: g * rand_values)

for epoch in epochs:
    out = net(data)
    loss = criterion(out, target)
    optimizer.zero_grad()
    loss.backward()  # fc1 gradients are multiplied by rand_values
    optimizer.step()
    # Update rand_values. The lambda computation will change accordingly
    rand_values = torch.rand(net.fc1.out_features, net.fc1.in_features)
Edit
To make things clearer: if you specifically want to multiply each set of weights i by a single value vi, you can exploit broadcasting semantics and define values = torch.tensor([v0, v1, v2, v3, v4]).reshape(5, 1); the lambda then becomes lambda g: g * values.
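A minimal sketch of that broadcasting, with hypothetical per-neuron values standing in for v0..v4:
import torch

# Hypothetical per-neuron scaling values (one per output neuron of fc1)
values = torch.tensor([0.5, 1.0, 2.0, 0.1, 3.0]).reshape(5, 1)
g = torch.ones(5, 3)   # stand-in for fc1's weight gradient (out_features x in_features)
print(g * values)      # row i is scaled by values[i] via broadcasting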

How to get a shifting window output on CNN-LSTM time-series forecasting? ([t-120:t] sequence to predict [t+1:t+40])

I am trying to use a CNN+LSTM model for reliable stock price forecasting.
For every day, I am hoping to get the model's predicted values for the next 40 days.
I have successfully concatenated the Conv2D layers as the LSTM input layer.
In short, I use the [t-120 : t] sequence to predict [t+1 : t+40].
Now I am facing an issue where the model output prints very similar values (almost constant) during the 60 days of the test period.
(It is not exactly the same, but the 40-day trends are almost the same.)
I am expecting a daily result of 40 days with a shifted window.
(see image below for better understanding)
Here are my questions:
Why am I getting similar output on my test set? (not shifted window)
Is there a problem with my activation function (maybe in the Dense layer)?
def create_model_cnn_lstm(params, numOfFeat, input_shape):
    input_layers = []
    channel_list = []
    for i in range(0, numOfFeat):
        layer = Input(shape=input_shape)
        input_layers.append(layer)
        layer = TimeDistributed(Conv2D(32, (3, 3), strides=1, kernel_regularizer=regularizers.l2(0.01), padding='same',
                                       activation="relu", use_bias=True, kernel_initializer='glorot_uniform'))(layer)
        layer = TimeDistributed(Conv2D(64, (5, 5), strides=1, kernel_regularizer=regularizers.l2(0.01), padding='same',
                                       activation="relu", use_bias=True, kernel_initializer='glorot_uniform'))(layer)
        layer = TimeDistributed(MaxPool2D(pool_size=2))(layer)
        layer = TimeDistributed(Dropout(0.3))(layer)
        layer = TimeDistributed(Flatten())(layer)
        layer = TimeDistributed(Dense(10))(layer)
        channel_list.append(layer)
    layer = Concatenate(axis=-1)(channel_list)
    layer = LSTM(units=512, activation='tanh', return_sequences=False, dropout=0.2,
                 recurrent_dropout=0.1, kernel_regularizer=regularizers.l2(0.01))(layer)
    layer = Dense(512, activation='tanh')(layer)
    layer = Dropout(0.5)(layer)
    layer = Dense(pred_len)(layer)
    model = Model(input_layers, layer)
    optimizer = optimizers.Adam(learning_rate=params["lr"], beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mse'])
    return model
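For reference on the [t-120 : t] → [t+1 : t+40] windowing described above, here is a minimal sketch of how such shifted input/target windows are typically built (the 1-D series and names are assumptions for illustration, not the asker's actual pipeline):
import numpy as np

series = np.arange(1000, dtype=np.float32)  # placeholder for the actual price/feature series
lookback, horizon = 120, 40

X, y = [], []
for t in range(lookback, len(series) - horizon + 1):
    X.append(series[t - lookback:t])   # inputs: the 120 steps up to (and excluding) t
    y.append(series[t:t + horizon])    # targets: the next 40 steps
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)  # (841, 120), (841, 40)
Each successive row is the previous window shifted forward by one step, which is the behaviour the question expects from the daily 40-day predictions.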

Why does ResNet50 with TFF not give good results?

I use TFF 0.12.0 and an image dataset of dogs and cats (2 labels). If I test with VGG16, I find an accuracy of 0.9, but if I change to ResNet50, the accuracy decreases to 0.4. Here is what I write:
def create_compiled_keras_model():
    baseModel = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                                input_tensor=tf.keras.Input(shape=(224, 224, 3)))
    resnet_output = baseModel.output
    layer1 = tf.keras.layers.GlobalAveragePooling2D()(resnet_output)
    layer2 = tf.keras.layers.Flatten(name="flatten")(layer1)
    layer2 = tf.keras.layers.Dense(units=256, name='nonlinear', activation="relu")(layer2)
    dropout_layer = tf.keras.layers.Dropout(0.5)(layer2)
    model_output = tf.keras.layers.Dense(2, activation="softmax")(dropout_layer)
    model = tf.keras.Model(inputs=baseModel.input, outputs=model_output)
    for layer in baseModel.layers:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.SGD(lr=0.001, momentum=0.9),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.CategoricalAccuracy()])
    return model

def model_fn():
    keras_model = create_compiled_keras_model()
    return tff.learning.from_compiled_keras_model(keras_model, sample_batch)

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    client_weight_fn=None)
state = iterative_process.initialize()
evaluation = tff.learning.build_federated_evaluation(model_fn)
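The training rounds below are presumably driven by a loop like this sketch (federated_train_data being a hypothetical list of client tf.data.Dataset objects, not shown in the question):
for round_num in range(1, 101):
    state, metrics = iterative_process.next(state, federated_train_data)
    print('round {}, metrics={}'.format(round_num, metrics))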
The accuracy does not exceed 0.46 even after 100 rounds, though. Here is a part of the result:
round 1, metrics=<categorical_accuracy=0.500249981880188,loss=0.7735000252723694,keras_training_time_client_sum_sec=0.0>
round 2, metrics=<categorical_accuracy=0.47187501192092896,loss=0.7735000252723694,keras_training_time_client_sum_sec=0.0>
....
round 99, metrics=<categorical_accuracy=0.4632812440395355,loss=0.7622881531715393,keras_training_time_client_sum_sec=0.0>
round 100, metrics=<categorical_accuracy=0.46015626192092896,loss=0.7622881531748393,keras_training_time_client_sum_sec=0.0>
Help Please!!!

Pytorch model stuck at 0.5 though loss decreases consistently

This is using PyTorch.
I have been trying to implement a UNet model on my images; however, my model accuracy is always exactly 0.5, even though the loss does decrease.
I have also checked for class imbalance, and I have tried playing with the learning rate. The learning rate affects the loss but not the accuracy.
My architecture is below (from here):
""" `UNet` class is based on https://arxiv.org/abs/1505.04597
The U-Net is a convolutional encoder-decoder neural network.
Contextual spatial information (from the decoding,
expansive pathway) about an input tensor is merged with
information representing the localization of details
(from the encoding, compressive pathway).
Modifications to the original paper:
(1) padding is used in 3x3 convolutions to prevent loss
of border pixels
(2) merging outputs does not require cropping due to (1)
(3) residual connections can be used by specifying
UNet(merge_mode='add')
(4) if non-parametric upsampling is used in the decoder
pathway (specified by upmode='upsample'), then an
additional 1x1 2d convolution occurs after upsampling
to reduce channel dimensionality by a factor of 2.
This channel halving happens with the convolution in
the tranpose convolution (specified by upmode='transpose')
Arguments:
in_channels: int, number of channels in the input tensor.
Default is 3 for RGB images. Our SPARCS dataset is 13 channel.
depth: int, number of MaxPools in the U-Net. During training, input size needs to be
(depth-1) times divisible by 2
start_filts: int, number of convolutional filters for the first conv.
up_mode: string, type of upconvolution. Choices: 'transpose' for transpose convolution
"""
class UNet(nn.Module):
def __init__(self, num_classes, depth, in_channels, start_filts=16, up_mode='transpose', merge_mode='concat'):
super(UNet, self).__init__()
if up_mode in ('transpose', 'upsample'):
self.up_mode = up_mode
else:
raise ValueError("\"{}\" is not a valid mode for upsampling. Only \"transpose\" and \"upsample\" are allowed.".format(up_mode))
if merge_mode in ('concat', 'add'):
self.merge_mode = merge_mode
else:
raise ValueError("\"{}\" is not a valid mode for merging up and down paths.Only \"concat\" and \"add\" are allowed.".format(up_mode))
# NOTE: up_mode 'upsample' is incompatible with merge_mode 'add'
if self.up_mode == 'upsample' and self.merge_mode == 'add':
raise ValueError("up_mode \"upsample\" is incompatible with merge_mode \"add\" at the moment "
"because it doesn't make sense to use nearest neighbour to reduce depth channels (by half).")
self.num_classes = num_classes
self.in_channels = in_channels
self.start_filts = start_filts
self.depth = depth
self.down_convs = []
self.up_convs = []
# create the encoder pathway and add to a list
for i in range(depth):
ins = self.in_channels if i == 0 else outs
outs = self.start_filts*(2**i)
pooling = True if i < depth-1 else False
down_conv = DownConv(ins, outs, pooling=pooling)
self.down_convs.append(down_conv)
# create the decoder pathway and add to a list
# - careful! decoding only requires depth-1 blocks
for i in range(depth-1):
ins = outs
outs = ins // 2
up_conv = UpConv(ins, outs, up_mode=up_mode, merge_mode=merge_mode)
self.up_convs.append(up_conv)
self.conv_final = conv1x1(outs, self.num_classes)
# add the list of modules to current module
self.down_convs = nn.ModuleList(self.down_convs)
self.up_convs = nn.ModuleList(self.up_convs)
self.reset_params()
#staticmethod
def weight_init(m):
if isinstance(m, nn.Conv2d):
#https://prateekvjoshi.com/2016/03/29/understanding-xavier-initialization-in-deep-neural-networks/
##Doc: https://pytorch.org/docs/stable/nn.init.html?highlight=xavier#torch.nn.init.xavier_normal_
init.xavier_normal_(m.weight)
init.constant_(m.bias, 0)
def reset_params(self):
for i, m in enumerate(self.modules()):
self.weight_init(m)
def forward(self, x):
encoder_outs = []
# encoder pathway, save outputs for merging
for i, module in enumerate(self.down_convs):
x, before_pool = module(x)
encoder_outs.append(before_pool)
for i, module in enumerate(self.up_convs):
before_pool = encoder_outs[-(i+2)]
x = module(before_pool, x)
# No softmax is used. This means we need to use
# nn.CrossEntropyLoss is your training script,
# as this module includes a softmax already.
x = self.conv_final(x)
return x
Parameters are:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x,y = train_sequence[0] ; batch_size = x.shape[0]
model = UNet(num_classes = 2, depth=5, in_channels=5, merge_mode='concat').to(device)
optim = torch.optim.Adam(model.parameters(),lr=0.01, weight_decay=1e-3)
criterion = nn.BCEWithLogitsLoss() #has sigmoid internally
epochs = 1000
The function for training is :
import torch.nn.functional as f

def train_model(epoch, train_sequence):
    """Train the model and report validation error with training error
    Args:
        model: the model to be trained
        criterion: loss function
        data_train (DataLoader): training dataset
    """
    model.train()
    for idx in range(len(train_sequence)):
        X, y = train_sequence[idx]
        images = Variable(torch.from_numpy(X)).to(device)  # [batch, channel, H, W]
        masks = Variable(torch.from_numpy(y)).to(device)
        outputs = model(images)
        print(masks.shape, outputs.shape)
        loss = criterion(outputs, masks)
        optim.zero_grad()
        loss.backward()
        # Update weights
        optim.step()
    # total_loss = get_loss_train(model, data_train, criterion)
My function for calculating loss and accuracy is below:
def get_loss_train(model, train_sequence):
    """
    Calculate loss over train set
    """
    model.eval()
    total_acc = 0
    total_loss = 0
    for idx in range(len(train_sequence)):
        with torch.no_grad():
            X, y = train_sequence[idx]
            images = Variable(torch.from_numpy(X)).to(device)  # [batch, channel, H, W]
            masks = Variable(torch.from_numpy(y)).to(device)
            outputs = model(images)
            loss = criterion(outputs, masks)
            preds = torch.argmax(outputs, dim=1).float()
            acc = accuracy_check_for_batch(masks.cpu(), preds.cpu(), images.size()[0])
            total_acc = total_acc + acc
            total_loss = total_loss + loss.cpu().item()
    return total_acc / (len(train_sequence)), total_loss / (len(train_sequence))
Edit: the code which runs (calls) the functions:
for epoch in range(epochs):
    train_model(epoch, train_sequence)
    train_acc, train_loss = get_loss_train(model, train_sequence)
    print("Train Acc:", train_acc)
    print("Train loss:", train_loss)
Can someone help me identify why the accuracy is always exactly 0.5?
Edit-2:
As asked, the accuracy_check_for_batch function is here:
def accuracy_check_for_batch(masks, predictions, batch_size):
    total_acc = 0
    for index in range(batch_size):
        total_acc += accuracy_check(masks[index], predictions[index])
    return total_acc / batch_size
and
def accuracy_check(mask, prediction):
    ims = [mask, prediction]
    np_ims = []
    for item in ims:
        if 'str' in str(type(item)):
            item = np.array(Image.open(item))
        elif 'PIL' in str(type(item)):
            item = np.array(item)
        elif 'torch' in str(type(item)):
            item = item.numpy()
        np_ims.append(item)
    compare = np.equal(np_ims[0], np_ims[1])
    accuracy = np.sum(compare)
    return accuracy / len(np_ims[0].flatten())
I found the mistake.
model = UNet(num_classes = 2, depth=5, in_channels=5, merge_mode='concat').to(device)
should be
model = UNet(num_classes = 1, depth=5, in_channels=5, merge_mode='concat').to(device)
because I am using BCEWithLogitsLoss.
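A minimal sketch of the shape expectations behind that fix (the shapes here are hypothetical, not the asker's actual data):
import torch
import torch.nn as nn

# BCEWithLogitsLoss expects logits and targets of the same shape,
# so a binary mask pairs with a single output channel (num_classes=1):
logits = torch.randn(4, 1, 64, 64)                    # (batch, 1, H, W)
masks = torch.randint(0, 2, (4, 1, 64, 64)).float()
print(nn.BCEWithLogitsLoss()(logits, masks).item())

# A 2-channel output (num_classes=2) would instead pair with
# CrossEntropyLoss and an integer class-index mask:
logits2 = torch.randn(4, 2, 64, 64)                   # (batch, 2, H, W)
masks2 = torch.randint(0, 2, (4, 64, 64))             # (batch, H, W)
print(nn.CrossEntropyLoss()(logits2, masks2).item())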

PyTorch: CrossEntropyLoss, changing class weight does not change the computed loss

According to the documentation for cross entropy loss, the weighted loss is calculated by multiplying the weight for each class by the original loss.
However, in the PyTorch implementation, the class weight seems to have no effect on the final loss value unless it is set to zero. Following is the code:
from torch import nn
import torch
logits = torch.FloatTensor([
[0.1, 0.9],
])
label = torch.LongTensor([0])
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([1, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 1.1711
# Change class weight for the first class to 0.1
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([0.1, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 1.1711, should be 0.11711
# Change weight for first class to 0
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([0, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 0
As illustrated in the code, the class weight seems to have no effect unless it is set to 0; this behavior contradicts the documentation.
Updates
I implemented a version of weighted cross entropy which is, in my eyes, the "correct" way to do it.
import torch
from torch import nn
def weighted_cross_entropy(logits, label, weight=None):
    assert len(logits.size()) == 2
    batch_size, label_num = logits.size()
    assert (batch_size == label.size(0))
    if weight is None:
        weight = torch.ones(label_num).float()
    assert (label_num == weight.size(0))
    x_terms = -torch.gather(logits, 1, label.unsqueeze(1)).squeeze()
    log_terms = torch.log(torch.sum(torch.exp(logits), dim=1))
    weights = torch.gather(weight, 0, label).float()
    return torch.mean((x_terms + log_terms) * weights)
logits = torch.FloatTensor([
[0.1, 0.9],
[0.0, 0.1],
])
label = torch.LongTensor([0, 1])
neg_weight = 0.1
weight = torch.FloatTensor([neg_weight, 1])
criterion = nn.CrossEntropyLoss(weight=weight)
loss = criterion(logits, label)
print(loss.item()) # results: 0.69227
print(weighted_cross_entropy(logits, label, weight).item()) # results: 0.38075
What I did is multiply each instance in the batch by its associated class weight. The result is still different from the original PyTorch implementation, which makes me wonder how PyTorch actually implements this.
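For reference, nn.CrossEntropyLoss with reduction='mean' is documented to divide the weighted sum by the sum of the selected class weights rather than by the batch size; a minimal sketch checking both numbers above against that formula:
import torch
from torch import nn

logits = torch.FloatTensor([[0.1, 0.9], [0.0, 0.1]])
label = torch.LongTensor([0, 1])
weight = torch.FloatTensor([0.1, 1.0])

per_sample = nn.CrossEntropyLoss(reduction='none')(logits, label)  # unweighted per-sample losses
w = weight[label]                                                  # weight of each sample's target class

print(((per_sample * w).sum() / w.sum()).item())  # ~0.69227, matches nn.CrossEntropyLoss(weight=weight)
print((per_sample * w).mean().item())             # ~0.38075, matches weighted_cross_entropy above
The single-sample case at the top of the question behaves the same way: 0.1 * 1.1711 / 0.1 = 1.1711, which is why the weight appears to have no effect there.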
