I use TFF 0.12.0 and I am running federated learning code for image classification with VGG16. Here is part of my code:
def create_compiled_keras_model():
    layer1 = tf.keras.layers.GlobalAveragePooling2D()(output)
    layer1 = tf.keras.layers.Dense(units=256)(output)
    model_output = tf.keras.layers.Dense(units=2, activation='relu')(layer1)
    model = tf.keras.Model(model.input, model_output)
    return model
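The snippet above leaves out the base model; `output` and `model.input` come from a VGG16 base created earlier, roughly along these lines (a sketch of the assumed missing lines; the 224x224x3 input shape is an assumption):

# Sketch of the earlier lines (not shown above): the VGG16 base whose
# `output` tensor and `input` the head above builds on.
model = tf.keras.applications.VGG16(include_top=False,
                                    weights="imagenet",
                                    input_tensor=tf.keras.Input(shape=(224, 224, 3)))
output = model.output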
def model_fn():
    keras_model = create_compiled_keras_model()
    return tff.learning.from_keras_model(
        keras_model,
        sample_batch,
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=[tf.keras.metrics.CategoricalAccuracy()])
After running, I see that the accuracy does not increase, even though I train for 100 rounds.
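The round output below comes from the standard federated averaging loop (a sketch; `iterative_process` is built with tff.learning.build_federated_averaging_process(model_fn) and `federated_train_data` stands for my per-client datasets):

state = iterative_process.initialize()
for round_num in range(1, 101):
    # federated_train_data: a list of tf.data.Dataset objects, one per client
    state, metrics = iterative_process.next(state, federated_train_data)
    print('round {:2d}, metrics={}'.format(round_num, metrics))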
round 1, metrics=<categorical_accuracy=0.5,loss=8.059043884277344,keras_training_time_client_sum_sec=0.0>
round 2, metrics=<categorical_accuracy=0.5,loss=8.059045791625977,keras_training_time_client_sum_sec=0.0>
round 3, metrics=<categorical_accuracy=0.5,loss=8.05904769897461,keras_training_time_client_sum_sec=0.0>
round 4, metrics=<categorical_accuracy=0.5,loss=8.059043884277344,keras_training_time_client_sum_sec=0.0>
round 5, metrics=<categorical_accuracy=0.5,loss=8.059045791625977,keras_training_time_client_sum_sec=0.0>
round 6, metrics=<categorical_accuracy=0.5,loss=8.059045791625977,keras_training_time_client_sum_sec=0.0>
What I am trying to do
I want to create a model that smooths my predictions. My predictions have the shape [num samples, 4, 7], where 4 is the sequence length and 7 is the number of classes. The class values at each time step sum to 100.
However, my predictions often fluctuate, predicting for example a value of 50 for class 5 at time step 1 and 89 at time step 2. In reality, a class rarely makes such extreme jumps, so I want to smooth my predictions.
I have training data with a similar shape [num samples, 4, 7]. I want to create a model that learns the behavior of the classes from this data and then applies that to my predictions, hopefully smoothing my results.
I understand that I could simply average the results to smooth them, but I am curious whether a deep learning model could learn the underlying probabilities and not only smooth but also indirectly correct the predictions.
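For reference, the plain averaging I mean would be something like a moving average over the time dimension, rescaled so each step sums to 100 again (a sketch of that baseline, not what I am after):

import torch
import torch.nn.functional as F

def moving_average_smooth(preds, window=3):
    # preds: [num samples, 4, 7]; average over the time axis with a length-`window` kernel
    x = preds.permute(0, 2, 1)  # [num samples, 7, 4] so the pooling runs over time
    x = F.avg_pool1d(x, kernel_size=window, stride=1,
                     padding=window // 2, count_include_pad=False)
    smoothed = x.permute(0, 2, 1)  # back to [num samples, 4, 7]
    # rescale each time step so the class values sum to 100 again
    return smoothed / smoothed.sum(dim=-1, keepdim=True) * 100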
What I have tried
However, I am struggling to understand how to create such an architecture. I have tried working with learnable matrices as well as with an LSTM:
import torch
from torch import nn

class SmoothModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(SmoothModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        # Initialize the cooccurrence matrix as a learnable parameter
        self.cooccurrence = nn.Parameter(torch.randn(input_size, output_size))
        # Initialize the transition matrix as a learnable parameter
        self.transition = nn.Parameter(torch.randn(input_size, output_size))
        # Softmax layer
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        sequences_updated = []
        # Update sequence based on transition and cooccurrence matrix
        for i in range(x.shape[0]):
            # Cooccurrence multiplication
            seq_list = []
            for j in range(x.shape[1]):
                predicted_cooc = x[i, j, :].unsqueeze(0)  # shape [1, 7]
                updated_cooc = torch.matmul(predicted_cooc, self.cooccurrence)
                seq_list.append(updated_cooc)
            # Recreate a sequence of 4, where the cooccurrence is updated
            seq = torch.cat(seq_list, dim=0)  # shape [4, 7]
            # Transition multiplication
            updated_seq = torch.matmul(seq, self.transition)  # shape [4, 7]
            # Append the updated sequence
            sequences_updated.append(updated_seq.unsqueeze(0))  # append shape [1, 4, 7]
        # Create tensor with all updated sequences
        updated_tensor = torch.cat(sequences_updated, dim=0)  # dim = 0 is the number of samples
        # Output should sum to 100
        updated_tensor = self.softmax(updated_tensor) * 100
        return updated_tensor
My idea behind this model was that it would update my predictions based on learned cooccurrence and transition probabilities.
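To give an idea of the kind of training loop I use (a sketch; `train_preds` stands for my noisy [num samples, 4, 7] predictions and `train_targets` for the corresponding ground-truth distributions):

model = SmoothModel(input_size=7, output_size=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # outputs and targets both live on the 0-100 scale

for epoch in range(100):
    optimizer.zero_grad()
    smoothed = model(train_preds)            # [num samples, 4, 7]
    loss = loss_fn(smoothed, train_targets)  # compare smoothed output to ground truth
    loss.backward()
    optimizer.step()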
Another model I tried, this time with an LSTM:
class SmoothModel(nn.Module):
    def __init__(self, input_size, output_size, hidden_size=64):
        super(SmoothModel, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        # Initialize the cooccurrence as a learnable parameter
        self.cooccurrence = nn.Linear(input_size, output_size)
        # Initialize the transition probability as a learnable parameter
        self.transition = nn.LSTM(input_size, hidden_size)
        self.transition_probability = nn.Linear(hidden_size, output_size)
        # Softmax layer
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        sequences_updated = []
        # Update sequence based on transition and cooccurrence matrix
        for i in range(x.shape[0]):
            # Cooccurrence multiplication
            seq_list = []
            for j in range(x.shape[1]):
                predicted_cooc = x[i, j, :].unsqueeze(0)  # shape [1, 7]
                updated_cooc = self.cooccurrence(predicted_cooc)
                seq_list.append(updated_cooc)
            # Recreate a sequence of 4, where the cooccurrence is updated
            seq = torch.cat(seq_list, dim=0)  # shape [4, 7]
            # Transition probability
            _, (hidden, _) = self.transition(seq.unsqueeze(0))  # hidden shape [1, 4, hidden_size]
            updated_seq = self.transition_probability(hidden[-1, :, :].unsqueeze(0))  # shape [1, 4, 7]
            # Append the updated sequence
            sequences_updated.append(updated_seq)  # append shape [1, 4, 7]
        # Create tensor with all updated sequences
        updated_tensor = torch.cat(sequences_updated, dim=0)  # dim = 0 is the number of samples
        # Output should sum to 100
        updated_tensor = self.softmax(updated_tensor) * 100
        return updated_tensor
I also tried some variants of this, for example updating only one time step at a time, and a sort of Markov-chain approach, but the current models do not improve the results.
Question
Does anyone have experience with this, or know what theory or architecture I could use? Or should I look at it in a completely different way?
I am happy to provide further (data) information if necessary!
I use TFF 0.12.0 with an image dataset of dogs and cats (2 labels). If I test with VGG16, I get an accuracy of 0.9, but if I change to ResNet50, the accuracy decreases to 0.4. Here is what I wrote:
def create_compiled_keras_model():
    baseModel = tf.keras.applications.ResNet50(include_top=False,
                                               weights="imagenet",
                                               input_tensor=tf.keras.Input(shape=(224, 224, 3)))
    resnet_output = baseModel.output
    layer1 = tf.keras.layers.GlobalAveragePooling2D()(resnet_output)
    layer2 = tf.keras.layers.Flatten(name="flatten")(layer1)
    layer2 = tf.keras.layers.Dense(units=256, name='nonlinear', activation="relu")(layer2)
    dropout_layer = tf.keras.layers.Dropout(0.5)(layer2)
    model_output = tf.keras.layers.Dense(2, activation="softmax")(dropout_layer)
    model = tf.keras.Model(inputs=baseModel.input, outputs=model_output)
    for layer in baseModel.layers:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.SGD(lr=0.001, momentum=0.9),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.CategoricalAccuracy()])
    return model
def model_fn():
    keras_model = create_compiled_keras_model()
    return tff.learning.from_compiled_keras_model(keras_model, sample_batch)

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    client_weight_fn=None)
state = iterative_process.initialize()
evaluation = tff.learning.build_federated_evaluation(model_fn)
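The round output below comes from the usual loop calling iterative_process.next(state, federated_train_data); the evaluation computation is then applied to the trained model along these lines (a sketch, with `federated_test_data` standing for my test client datasets):

test_metrics = evaluation(state.model, federated_test_data)
print(str(test_metrics))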
However, the accuracy does not exceed 0.46 even after 100 rounds. Here is part of the result:
round 1, metrics=<categorical_accuracy=0.500249981880188,loss=0.7735000252723694,keras_training_time_client_sum_sec=0.0>
round 2, metrics=<categorical_accuracy=0.47187501192092896,loss=0.7735000252723694,keras_training_time_client_sum_sec=0.0>
....
round 99, metrics=<categorical_accuracy=0.4632812440395355,loss=0.7622881531715393,keras_training_time_client_sum_sec=0.0>
round 100, metrics=<categorical_accuracy=0.46015626192092896,loss=0.7622881531748393,keras_training_time_client_sum_sec=0.0>
Help Please!!!
I have a dataset containing 1000 examples where each example has 5 features (a,b,c,d,e). I want to feed 7 examples to an LSTM so it predicts the feature (a) of the 8th day.
Reading PyTorch's documentation of nn.LSTM(), I came up with the following:
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
output = fc(out[-1])
output # output's shape is ([7,1])
According to the docs:
The input of the nn.LSTM is "input of shape (seq_len, batch, input_size)" with "input_size – The number of expected features in the input x",
And the output is: "output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t."
In this case, I thought seq_len would be the sequence of 7 examples, batch is 1, and input_size is 5. So the LSTM would consume each example containing 5 features, re-feeding the hidden state at every iteration.
What am I missing?
When I extend your code to a full example -- I also added some comments that may help -- I get the following:
import torch
import torch.nn as nn
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
X = [
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
]
X = torch.tensor(X, dtype=torch.float32)
print(X.shape) # (seq_len, batch_size, input_size) = (7, 1, 5)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
print(out.shape) # (seq_len, batch_size, hidden_size) = (7, 1, 10)
out = out[-1] # Get output of last step
print(out.shape) # (batch, hidden_size) = (1, 10)
out = fc(out) # Push through linear layer
print(out.shape) # (batch_size, output_size) = (1, 1)
This makes sense to me, given your batch_size = 1 and output_size = 1 (I assume you're doing regression). I don't know where your output.shape = (7, 1) comes from.
Are you sure that your X has the correct dimensions? Did you maybe create the nn.LSTM with batch_first=True? There are a lot of little things that can sneak in.
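For comparison, here is how the shapes change if the LSTM is created with batch_first=True (note that the last time step is then indexed along dimension 1), which would, for example, yield a (7, 1) output:

# With batch_first=True the LSTM expects (batch, seq_len, input_size),
# so the same X of shape (7, 1, 5) is read as 7 sequences of length 1.
lstm_bf = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
out_bf, hidden_bf = lstm_bf(X)
print(out_bf.shape)             # (batch, seq_len, hidden_size) = (7, 1, 10)
print(fc(out_bf[:, -1]).shape)  # (batch, output_size) = (7, 1)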
According to the documentation for cross entropy loss, the weighted loss is calculated by multiplying the weight for each class by the original loss.
However, in the PyTorch implementation, the class weight seems to have no effect on the final loss value unless it is set to zero. Here is the code:
from torch import nn
import torch
logits = torch.FloatTensor([
[0.1, 0.9],
])
label = torch.LongTensor([0])
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([1, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 1.1711
# Change class weight for the first class to 0.1
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([0.1, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 1.1711, should be 0.11711
# Change weight for first class to 0
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([0, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 0
As illustrated in the code, the class weight seems to have no effect unless it is set to 0, which contradicts the documentation.
Updates
I implemented a version of weighted cross entropy which is in my eyes the "correct" way to do it.
import torch
from torch import nn
def weighted_cross_entropy(logits, label, weight=None):
    assert len(logits.size()) == 2
    batch_size, label_num = logits.size()
    assert batch_size == label.size(0)
    if weight is None:
        weight = torch.ones(label_num).float()
    assert label_num == weight.size(0)
    x_terms = -torch.gather(logits, 1, label.unsqueeze(1)).squeeze()
    log_terms = torch.log(torch.sum(torch.exp(logits), dim=1))
    weights = torch.gather(weight, 0, label).float()
    return torch.mean((x_terms + log_terms) * weights)
logits = torch.FloatTensor([
[0.1, 0.9],
[0.0, 0.1],
])
label = torch.LongTensor([0, 1])
neg_weight = 0.1
weight = torch.FloatTensor([neg_weight, 1])
criterion = nn.CrossEntropyLoss(weight=weight)
loss = criterion(logits, label)
print(loss.item()) # results: 0.69227
print(weighted_cross_entropy(logits, label, weight).item()) # results: 0.38075
What I did was multiply each instance in the batch by its associated class weight. The result is still different from the original PyTorch implementation, which makes me wonder how PyTorch actually implements this.
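Digging a bit further (sketch below): with reduction='none' the per-sample class weights are clearly applied, so the remaining difference seems to come only from the reduction step, where the 'mean' apparently divides by the sum of the applied weights rather than by the batch size.

# Per-sample losses with the weights applied (no reduction)
criterion_none = nn.CrossEntropyLoss(weight=weight, reduction='none')
per_sample = criterion_none(logits, label)
print(per_sample)                                       # ~[0.1171, 0.6444]

# Dividing by the sum of the applied weights reproduces the built-in result,
# while dividing by the batch size reproduces my implementation above.
print((per_sample.sum() / weight[label].sum()).item())  # ~0.69227
print((per_sample.sum() / len(label)).item())           # ~0.38075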
I have a one-file Python game, where a pixel in the first array should hunt (at the same position in its array) a pixel in the second array. I have now trained it for hours and hours, and the only thing that changes in the neural net seems to be the bias of the last conv layer. I would think that mostly the weights should change and not so much the bias, right? The code of this simple game is here: https://github.com/flobotics/flobotics_tensorflow_game/blob/master/pixel_hunter_game/flobotics_game.py
Here are pictures of the weights and biases in TensorBoard.
It seems that batch_norm does not exist anymore, but there is batch_normalization. Would the following be a correct implementation in my case?
h_conv1 = tf.nn.relu(tf.nn.conv2d(input_layer, conv_weights_1, strides=[1, 4, 4, 1], padding="SAME") + conv_biases_1)
#batch normalization
bn_mean, bn_variance = tf.nn.moments(h_conv1,[0,1,2])
bn_scale = tf.Variable(tf.ones([32]))
bn_offset = tf.Variable(tf.zeros([32]))
bn_epsilon = 1e-3
bn_conv1 = tf.nn.batch_normalization(h_conv1, bn_mean, bn_variance, bn_offset, bn_scale, bn_epsilon)
#h_conv1 = tf.nn.relu(tf.nn.conv2d(input_layer, conv_weights_1, strides=[1, 4, 4, 1], padding="SAME") + conv_biases_1)
#h_pool1 = max_pool_2x2(h_conv1)
h_pool1 = max_pool_2x2(bn_conv1)
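Alternatively, still on the TF 1.x API, I was wondering whether the higher-level tf.layers.batch_normalization would be the cleaner option. A sketch of what I mean (here `is_training`, `optimizer`, and `loss` are placeholders from the rest of my graph, and the batch norm is applied before the ReLU rather than after it as above):

# conv -> batch norm -> ReLU, using the TF 1.x layers API
conv1 = tf.nn.conv2d(input_layer, conv_weights_1, strides=[1, 4, 4, 1], padding="SAME") + conv_biases_1
bn_conv1 = tf.layers.batch_normalization(conv1, training=is_training)  # is_training: tf.placeholder(tf.bool)
h_conv1 = tf.nn.relu(bn_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# The moving-average update ops must run together with the training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)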