How to add learning rate while calculating backward propagation in PyTorch - machine-learning

I heard that during backward propagation the weights are updated
using the learning rate and the partial derivatives.
But I don't know where to put the learning rate parameter in the backward propagation code. I'm also wondering: if I don't set a learning rate,
what is the default learning rate?
Here is the code I'm trying to understand:
import torch
import torch.nn as nn
import torch.optim as optim
class MyNeuralNetwork(nn.Module):
    def __init__(self):
        super(MyNeuralNetwork, self).__init__()
        layer_1 = nn.Linear(in_features=2, out_features=2, bias=False)
        weight_1 = torch.tensor([[.3, .25], [.4, .35]])
        layer_1.weight = nn.Parameter(weight_1)
        self.layer1 = nn.Sequential(
            layer_1,
            nn.Sigmoid()
        )
        layer_2 = nn.Linear(in_features=2, out_features=2, bias=False)
        weight_2 = torch.tensor([[.45, .4], [.7, .6]])
        layer_2.weight = nn.Parameter(weight_2)
        self.layer2 = nn.Sequential(
            layer_2,
            nn.Sigmoid()
        )
    def forward(self, input):
        output = self.layer1(input)
        output = self.layer2(output)
        return output
model = MyNeuralNetwork().to("cpu")
print(model)
input = torch.tensor([0.1, 0.2]).reshape(1, -1)
target = torch.tensor([0.4, 0.6]).reshape(1, -1)
out = model(input)
print(f"output value : {out}")
criterion = nn.MSELoss()
loss = criterion(out, target)
print(f"loss value : {loss}")
model.zero_grad()
print('↓ layer1.weight before backward propagation ↓')
print(model._modules['layer1']._modules['0'].weight)
print(model._modules['layer2']._modules['0'].weight)
print()
loss.backward()  # where can I put the learning rate in backpropagation?
print('↓ layer1.weight after backward propagation ↓')
print(model._modules['layer1']._modules['0'].weight)
print(model._modules['layer2']._modules['0'].weight)
The point of my question is how to set the learning rate I want
when training this model.

loss.backward() only computes the gradients and stores them in each parameter's .grad; it does not update the weights, so there is no learning rate to pass to it. The learning rate belongs to the optimizer, which applies the update when you call its step function. When you create the optimizer, specify the learning rate like this:
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
Then, in the training loop, call the optimizer's step function after the backward pass:
optimizer.step()
This updates the model's weights using the learning rate you specified. There is no single default learning rate: each optimizer class defines its own (for example, optim.Adam defaults to lr=1e-3), and older PyTorch releases require lr to be passed explicitly to optim.SGD, so it is safest to always set it yourself.
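To make the division of labour concrete, here is a minimal sketch of one full update. It assumes plain SGD (no momentum or weight decay) and an arbitrary learning rate of 0.5; the stand-in model and the variable names are illustrative and not taken from the question. It checks that the weights are unchanged after loss.backward() and that optimizer.step() moves them by exactly lr * grad.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in model: the layer sizes mirror the question, the rest is illustrative.
model = nn.Sequential(
    nn.Linear(2, 2, bias=False), nn.Sigmoid(),
    nn.Linear(2, 2, bias=False), nn.Sigmoid(),
)

learning_rate = 0.5  # assumed value, chosen only for this demonstration
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss()

x = torch.tensor([[0.1, 0.2]])
target = torch.tensor([[0.4, 0.6]])

weight = model[0].weight
before = weight.detach().clone()

optimizer.zero_grad()                 # clear gradients left over from earlier steps
loss = criterion(model(x), target)
loss.backward()                       # computes gradients only; weights are untouched

unchanged_after_backward = torch.equal(weight.detach(), before)
grad = weight.grad.detach().clone()

optimizer.step()                      # plain SGD: w <- w - lr * grad

expected = before - learning_rate * grad
matches_sgd_rule = torch.allclose(weight.detach(), expected)

print(unchanged_after_backward, matches_sgd_rule)
```

If the learning rate needs to change later, it can be overwritten per parameter group, e.g. by assigning to group['lr'] for each group in optimizer.param_groups.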

Related

How to add multiple layers to an RNN module for sentiment analysis? Pytorch

I am trying to create a sentiment analysis model with PyTorch (I'm a newbie):
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, dropout):
        super().__init__()  # to call the functions in the superclass
        self.embedding = nn.Embedding(input_dim, embedding_dim)  # Embedding layer to create dense vector instead of sparse matrix
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
    def forward(self, text):
        embedded = self.embedding(text)
        output, hidden = self.rnn(embedded)
        hidden = self.dropout(hidden[-1,:,:])
        nn.Sigmoid()
        return self.fc(hidden)
However, the accuracy is below 50%, and I would like to add an extra layer, maybe another linear layer, before the last linear layer that produces the prediction. What kinds of layers can I add after the RNN and before the last Linear, and what should I feed them?
I tried simply adding another
output, hidden = self.fc(hidden)
but I get
ValueError: too many values to unpack (expected 2)
which I believe is because the output of the previous layer, with activation and dropout, is different. Any help is greatly appreciated.
Thanks
You were very close; just add a hidden linear layer and change your forward call to:
import torch.nn as nn
import torch.nn.functional as F
class model_RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, dropout):
        super().__init__()  # to call the functions in the superclass
        self.embedding = nn.Embedding(input_dim, embedding_dim)  # Embedding layer to create dense vector instead of sparse matrix
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.hidden_fc = nn.Linear(hidden_dim, hidden_dim)
        self.out_fc = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
    def forward(self, text):
        embedded = self.embedding(text)
        output, hidden = self.rnn(embedded)
        hidden = self.dropout(hidden[-1,:,:])
        hidden = F.relu(self.hidden_fc(hidden))  # a Linear layer returns a single tensor, so nothing to unpack
        return self.out_fc(hidden)
Just a note: calling nn.Sigmoid() on its own line does nothing to your model output, because it only creates a sigmoid layer and never applies it to your data. What you want is probably torch.sigmoid(self.fc(hidden)). That said, applying an activation to the output is not recommended, because some common loss functions (for example nn.BCEWithLogitsLoss) expect the raw logits. In that case, make sure you apply the sigmoid yourself after the model call at evaluation time!
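As a rough illustration of the logits point, the sketch below assumes a binary sentiment label and nn.BCEWithLogitsLoss as the loss; the question does not say which loss is actually used, and the random tensor merely stands in for the model's raw outputs. It shows that the loss applies the sigmoid internally during training, while at inference the sigmoid is applied by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(4, 1)                       # stand-in for raw model outputs
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

# Training: feed raw logits to the loss; the sigmoid is applied inside it.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, labels)

# Same value, but numerically less stable: explicit sigmoid followed by BCELoss.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), labels)

# Inference: apply the sigmoid yourself to turn logits into probabilities.
with torch.no_grad():
    probs = torch.sigmoid(logits)
    preds = (probs > 0.5).long()

print(loss_from_logits.item(), loss_from_probs.item())
```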

Why does the difference in network architecture make a huge difference in name classification

I tried to build an RNN myself following this tutorial: https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial. I built my own version with the following network architecture, which is different from the tutorial's (a stands for the input layer, h for the hidden layer, o for the output layer). Here's my code:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, initial_hidden):
        super(RNN, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, hidden_size, bias=False)
        self.linear3 = nn.Linear(hidden_size, output_size)
        self.prev_hidden = initial_hidden
    def forward(self, X):
        input = torch.add(self.linear1(X).view(1,-1), self.linear2(self.prev_hidden.to(device)))
        hidden = nn.ReLU()(input)
        self.prev_hidden = hidden.detach()
        output = self.linear3(hidden)
        return output
This model stalls at a loss of 12000 over all samples and doesn't really drop any further. However, after switching to the model described in the tutorial, in which the hidden and input layers share the same weights, the loss drops to 4000 with the same hyperparameters. Here's the code:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)
    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden
    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
Why does the model architecture in the tutorial outperform my version by so much?
This line
self.prev_hidden = hidden.detach()
means you never backpropagate through time in your RNN: the gradient stops at the previous hidden state, so each step is trained as if its incoming hidden state were a constant. That is a fairly non-standard way to train a recurrent network, and it definitely limits its ability to learn.
Another obvious difference is that your implementation does not output probabilities (it lacks a projection onto the simplex, e.g. a softmax). It is hard to tell how much of an issue that is, since the full training code is missing.
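The effect of detach() on backpropagation through time can be seen in a small sketch. The toy cell below is hypothetical (a single nn.Linear over the concatenated input and hidden state, with made-up sizes) and is not either of the models above; it only shows that, once the hidden state is detached, no gradient reaches the first timestep's input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.Linear(3 + 4, 4)            # toy recurrent cell: [input, hidden] -> hidden
x1 = torch.randn(1, 3, requires_grad=True)
x2 = torch.randn(1, 3)
h0 = torch.zeros(1, 4)


def grad_wrt_first_input(detach_hidden):
    """Run two timesteps and return d(loss)/d(x1), or None if no gradient arrives."""
    x1.grad = None                    # reset any gradient from a previous call
    h1 = torch.tanh(cell(torch.cat((x1, h0), dim=1)))
    if detach_hidden:
        h1 = h1.detach()              # cuts the graph between step 1 and step 2
    h2 = torch.tanh(cell(torch.cat((x2, h1), dim=1)))
    h2.sum().backward()
    return None if x1.grad is None else x1.grad.clone()


grad_full = grad_wrt_first_input(detach_hidden=False)   # gradient flows through time
grad_cut = grad_wrt_first_input(detach_hidden=True)     # gradient is blocked

print(grad_full)
print(grad_cut)
```

Detaching between batches or between long chunks (truncated backpropagation through time) is common; detaching at every single step, as in the question, removes the recurrence from the gradient entirely.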

Save the best performing model (but not regarding validation acc) in Pytorch

Is there a way to replicate the behavior of the Keras EarlyStopping callback in PyTorch?
For example, if I want the best model with regard to F1 score rather than validation accuracy, is there an equivalent to Keras
earlyStopping = EarlyStopping(monitor='f1_score', patience=10, verbose=0, mode='min')?
PyTorch has no built-in callback mechanism like Keras does, so you have to take care of this manually.
Here's a rough pseudo-code outline:
NUM_EPOCHS = 100
train_data, val_data = ...
model = ...
loss_fn = ...
optimizer = ...
# for early stopping
curr_best_f1 = 0.0
patience = 10
for epoch in range(NUM_EPOCHS):
    # training step
    model.train()
    train_step(model, loss_fn, optimizer, train_data)
    # validation step
    model.eval()
    val_preds, val_labels = val_step(model, loss_fn, val_data)
    # check early stopping criterion
    f1_score = F1_Score(val_preds, val_labels)
    if f1_score >= curr_best_f1:
        curr_best_f1 = f1_score
        patience = 10
    else:
        patience -= 1
        if patience == 0:
            break
Change F1_Score to whatever metric you like, and flip the >= accordingly if you want to minimize instead of maximize. You may also want to put the early-stopping check in a separate function or small class.
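One possible shape for such a small class is sketched below. EarlyStopper is a hypothetical helper, not a PyTorch API, and the F1 values are made up so the example runs without a dataset. Unlike the outline above, it also keeps a copy of the best weights, and it treats a tie as "no improvement" (strict comparison), which is a design choice rather than a requirement.

```python
import copy

import torch.nn as nn


class EarlyStopper:
    """Tracks a validation metric and keeps a copy of the best weights (hypothetical helper)."""

    def __init__(self, patience=10, mode="max"):
        self.patience = patience
        self.mode = mode              # "max" for F1/accuracy, "min" for a loss
        self.best_score = None
        self.best_state = None
        self.bad_epochs = 0

    def step(self, score, model):
        """Record this epoch's score; return True when training should stop."""
        improved = (
            self.best_score is None
            or (self.mode == "max" and score > self.best_score)
            or (self.mode == "min" and score < self.best_score)
        )
        if improved:
            self.best_score = score
            self.best_state = copy.deepcopy(model.state_dict())
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


model = nn.Linear(2, 1)                            # stand-in model
stopper = EarlyStopper(patience=2, mode="max")
fake_f1_scores = [0.50, 0.60, 0.55, 0.58, 0.70]    # made-up values for illustration

stopped_at = None
for epoch, f1 in enumerate(fake_f1_scores):
    # a real loop would train and validate here, then compute f1
    if stopper.step(f1, model):
        stopped_at = epoch
        break

model.load_state_dict(stopper.best_state)          # restore the best weights
print(stopped_at, stopper.best_score)
```

To persist the best model rather than only keeping it in memory, the stored state dict can be written out with torch.save.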

How can I use an LSTM to classify a series of vectors into two categories in Pytorch

I have a series of vectors representing a signal over time. I'd like to classify parts of the signal into two categories: 1 or 0. The reason for using LSTM is that I believe the network will need knowledge of the entire signal to classify.
My problem is developing the PyTorch model. Below is the class I've come up with.
class LSTMClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, label_size, batch_size):
        self.lstm = nn.LSTM(input_dim, hidden_dim)
        self.hidden2label = nn.Linear(hidden_dim, label_size)
        self.hidden = self.init_hidden()
    def init_hidden(self):
        return (torch.zeros(1, self.batch_size, self.hidden_dim),
                torch.zeros(1, self.batch_size, self.hidden_dim))
    def forward(self, x):
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        y = self.hidden2label(lstm_out[-1])
        log_probs = F.log_softmax(y)
        return log_probs
However this model is giving a bunch of shape errors, and I'm having trouble understanding everything going on. I looked at this SO question first.
You should always follow the PyTorch documentation, especially the inputs and outputs sections.
This is how the classifier should look:
import torch
import torch.nn as nn
class LSTMClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, label_size):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.hidden2label = nn.Linear(hidden_dim, label_size)
    def forward(self, x):
        _, (h_n, _) = self.lstm(x)
        return self.hidden2label(h_n.reshape(x.shape[0], -1))
clf = LSTMClassifier(100, 200, 1)
inputs = torch.randn(64, 10, 100)
clf(inputs)
Points to consider:
Always call super().__init__(): it sets up the machinery that registers the modules in your network, allows for hooks, etc.
Use batch_first=True so you can pass inputs of shape (batch, timesteps, n_features).
There is no need for init_hidden with zeros; zeros are the default when no initial state is passed.
There is no need to pass self.hidden to the LSTM on each call. Moreover, you should not do that: it treats each new batch as a continuation of the previous one, whereas batch elements should be disjoint, so you probably do not want it.
_, (h_n, _) extracts h_n, the hidden state from the last timestep, of shape (num_layers * num_directions, batch, hidden_size). In our case num_layers and num_directions are both 1, so we get a (1, batch, hidden_size) tensor.
Reshape it to (batch, hidden_size) so it can be passed through the linear layer.
Return logits without an activation; a single output is enough in the binary case. Use torch.nn.BCEWithLogitsLoss as the loss for the binary case and torch.nn.CrossEntropyLoss for the multiclass case. Correspondingly, sigmoid is the proper activation for the binary case, while softmax or log_softmax is appropriate for multiclass.
In the binary case, any output below 0 (the model returns unnormalized scores, i.e. logits, here) is considered negative and anything above 0 positive.
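The shape and threshold points above can be checked with a short sketch. The sizes and the random labels are assumptions made only so the snippet runs on its own; it uses a bare nn.LSTM plus nn.Linear rather than the class above. For a single-layer, unidirectional LSTM, h_n[-1] is the same tensor as the output at the last timestep, and thresholding the logit at 0 matches thresholding the sigmoid at 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, timesteps, n_features, hidden_dim = 8, 10, 100, 200   # assumed sizes

lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, 1)               # one logit per sequence (binary case)

x = torch.randn(batch, timesteps, n_features)
labels = torch.randint(0, 2, (batch, 1)).float()   # random stand-in labels

out, (h_n, c_n) = lstm(x)                     # initial state defaults to zeros
logits = head(h_n[-1])                        # h_n[-1] has shape (batch, hidden_dim)

loss = nn.BCEWithLogitsLoss()(logits, labels)  # expects raw logits, not probabilities
loss.backward()

# Thresholding the logit at 0 is the same as thresholding sigmoid(logit) at 0.5.
preds_from_logits = logits > 0
preds_from_probs = torch.sigmoid(logits) > 0.5

print(out.shape, h_n.shape, logits.shape)
```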

Using a stateful Keras model in pure TensorFlow

I have a stateful RNN model with several GRU layers that was created in Keras.
I have to run this model now from Java, so I dumped the model as protobuf, and I'm loading it from Java TensorFlow.
This model must be stateful because features will be fed in one timestep at a time.
As far as I understand, in order to achieve statefulness in a TensorFlow model, I must somehow feed in the last state every time I execute the session runner, and also that the run would return the state after the execution.
Is there a way to output the state in the Keras model?
Is there a simpler way altogether to get a stateful Keras model to work as such using TensorFlow?
Many thanks
An alternative solution is to use the model.state_updates property of the Keras model and add it to the session.run call.
Here is a full example that illustrates this solution with two LSTMs:
import tensorflow as tf  # TensorFlow 1.x graph-mode API (placeholders and sessions)
class SimpleLstmModel(tf.keras.Model):
    """ Simple lstm model with two lstm """
    def __init__(self, units=10, stateful=True):
        super(SimpleLstmModel, self).__init__()
        self.lstm_0 = tf.keras.layers.LSTM(units=units, stateful=stateful, return_sequences=True)
        self.lstm_1 = tf.keras.layers.LSTM(units=units, stateful=stateful, return_sequences=True)
    def call(self, inputs):
        """
        :param inputs: [batch_size, seq_len, 1]
        :return: output tensor
        """
        x = self.lstm_0(inputs)
        x = self.lstm_1(x)
        return x
def main():
    model = SimpleLstmModel(units=1, stateful=True)
    x = tf.placeholder(shape=[1, 1, 1], dtype=tf.float32)
    output = model(x)
    sess = tf.Session()
    # tf.initialize_all_variables() is deprecated in favor of this call
    sess.run(tf.global_variables_initializer())
    # running model.state_updates alongside the output writes the new state back
    res_at_step_1, _ = sess.run([output, model.state_updates], feed_dict={x: [[[0.1]]]})
    print(res_at_step_1)
    res_at_step_2, _ = sess.run([output, model.state_updates], feed_dict={x: [[[0.1]]]})
    print(res_at_step_2)
if __name__ == "__main__":
    main()
Which produces the following output:
[[[0.00168626]]]
[[[0.00434444]]]
and shows that the lstm state is preserved between batches.
If we set stateful to False, the output becomes:
[[[0.00033928]]]
[[[0.00033928]]]
Showing that the state is not reused.
OK, so I managed to solve this problem!
What worked for me was creating tf.identity tensors not only for the outputs, as is standard, but also for the state tensors.
In the Keras model, the state tensors can be found with:
model.updates
which gives something like this:
[(<tf.Variable 'gru_1_1/Variable:0' shape=(1, 70) dtype=float32_ref>,
  <tf.Tensor 'gru_1_1/while/Exit_2:0' shape=(1, 70) dtype=float32>),
 (<tf.Variable 'gru_2_1/Variable:0' shape=(1, 70) dtype=float32_ref>,
  <tf.Tensor 'gru_2_1/while/Exit_2:0' shape=(1, 70) dtype=float32>),
 (<tf.Variable 'gru_3_1/Variable:0' shape=(1, 4) dtype=float32_ref>,
  <tf.Tensor 'gru_3_1/while/Exit_2:0' shape=(1, 4) dtype=float32>)]
In each pair, the 'Variable' is used for feeding in the state, and the 'Exit' tensor holds the new state that comes out.
So I created tf.identity tensors out of the 'Exit' tensors and gave them meaningful names, e.g.:
tf.identity(state_variables[j], name='state'+str(j))
where state_variables contains only the 'Exit' tensors.
Then I used the input variables (e.g. gru_1_1/Variable:0) to feed the model state from TensorFlow, and the identity tensors I created from the 'Exit' tensors to extract the new states after feeding the model at each timestep.

Resources