Below is the model I am using for multi-class classification:
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

model = Sequential()
model.add(Dense(6, input_dim=input_dim, activation='relu'))
model.add(Dense(3, activation='softmax'))
adam = optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])  # pass the configured optimizer object, not the string 'adam'
I want to range-normalize the output produced by the first layer (the layer with relu activation) so that it lies between 0 and 1.
I have checked the normalization layers available in Keras, but as far as I have read they produce output with mean 0 and stddev 1.
I am not sure what the procedure is for performing custom normalization on a layer in Keras.
I would recommend using a sigmoid activation instead, as it maps the output into (0, 1), whereas relu maps into [0, infinity).
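If you specifically want to keep the relu and then rescale its output into [0, 1], one rough sketch of that idea is a Lambda layer that applies per-sample min-max scaling (this is my own illustration, not part of the original answer; the per-sample axis and the small epsilon guard are assumptions):
from keras.models import Sequential
from keras.layers import Dense, Lambda
import keras.backend as K

def min_max_scale(x):
    # rescale each sample's activations into [0, 1]; 1e-7 avoids division by zero
    x_min = K.min(x, axis=-1, keepdims=True)
    x_max = K.max(x, axis=-1, keepdims=True)
    return (x - x_min) / (x_max - x_min + 1e-7)

model = Sequential()
model.add(Dense(6, input_dim=input_dim, activation='relu'))
model.add(Lambda(min_max_scale))  # range-normalized relu output
model.add(Dense(3, activation='softmax'))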
I am using Python 3.8 and VSCode.
I tried to create a basic neural network without activations and biases, but because of the error below I'm not able to update the weights with their gradients.
Matrix Details:
Layer Shape: (1, No. of Neurons)
Weight Layer Shape: (No. of Neurons in the previous layer, No. of Neurons in the next layer)
Here's my code:
# sample neural network
# importing libraries
import torch
import numpy as np
torch.manual_seed(0)
# hyperparameters
epochs = 100
lr = 0.01 # learning rate
# data
X_train = torch.tensor([[1]], dtype = torch.float)
y_train = torch.tensor([[2]], dtype = torch.float)
'''
Network Architecture:
1 neuron in the first layer
4 neurons in the second layer
4 neurons in the third layer
1 neuron in the last layer
* I haven't added bias and activation.
'''
# initializing the weights
weights = []
weights.append(torch.rand((1, 4), requires_grad = True))
weights.append(torch.rand((4, 4), requires_grad = True))
weights.append(torch.rand((4, 1), requires_grad = True))
# calculating y_pred
y_pred = torch.matmul(torch.matmul(torch.matmul(X_train, weights[0]), weights[1]), weights[2])
# calculating loss
loss = (y_pred - y_train)**2
# calculating the partial derivatives
loss.backward()
# updating the weights and zeroing the gradients
with torch.no_grad():
    for i in range(len(weights)):
        weights[i] = weights[i] - weights[i].grad
        weights[i].grad.zero_()
It is showing this error:
File "test3.py", line 43, in <module>
weights[i].grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'
I don't understand why it is showing this error. Can someone please explain?
Your model doesn't have any trainable parameters for the gradients to be calculated for. Use torch's Parameter; see the PyTorch documentation on creating a module with learnable parameters.
torch.nn.parameter.Parameter
A kind of Tensor that is to be considered a module parameter.
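As a rough sketch of what that could look like for the snippet above (the in-place update and the learning-rate factor are my own additions, not spelled out in the original answer): wrap the weight tensors in nn.Parameter and update them in place, so they stay leaf tensors and their .grad attribute is preserved across iterations.
import torch
from torch import nn

torch.manual_seed(0)

X_train = torch.tensor([[1.]])
y_train = torch.tensor([[2.]])

# wrapping the weights in nn.Parameter makes them proper leaf tensors with gradients
weights = [nn.Parameter(torch.rand(1, 4)),
           nn.Parameter(torch.rand(4, 4)),
           nn.Parameter(torch.rand(4, 1))]

for epoch in range(100):
    y_pred = X_train @ weights[0] @ weights[1] @ weights[2]
    loss = ((y_pred - y_train) ** 2).sum()
    loss.backward()
    with torch.no_grad():
        for w in weights:
            w -= 0.01 * w.grad   # in-place update keeps w a leaf, so w.grad stays attached
            w.grad.zero_()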
I am unsure how PyTorch manages to link the loss function to the model I want it computed for. There is never an explicit reference between the loss and the model, such as the one between the model's parameters and the optimizer.
Say, for example, I want to train 2 networks on the same dataset, so I want to use a single pass through the dataset. How would PyTorch link the appropriate loss functions to the appropriate models? Here's code for reference:
import torch
from torch import nn, optim
import torch.nn.functional as F
from torchvision import datasets, transforms
import shap
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                                ])
# Download and load the training data
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
model2 = nn.Sequential(nn.Linear(784, 128),
                       nn.ReLU(),
                       nn.Linear(128, 10),
                       nn.LogSoftmax(dim=1))
# Define the loss
criterion = nn.NLLLoss()
criterion2 = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003)
optimizer2 = optim.SGD(model2.parameters(), lr=0.003)
epochs = 5
for e in range(epochs):
    running_loss = 0
    running_loss_2 = 0
    for images, labels in trainloader:
        # Flatten MNIST images into a 784 long vector
        images = images.view(images.shape[0], -1)  # batch_size x total_pixels

        # Training pass
        optimizer.zero_grad()
        optimizer2.zero_grad()

        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

        output2 = model2(images)
        loss2 = criterion2(output2, labels)
        loss2.backward()
        optimizer2.step()

        running_loss += loss.item()
        running_loss_2 += loss2.item()

    print(f"Training loss 1: {running_loss/len(trainloader)}")
    print(f"Training loss 2: {running_loss_2/len(trainloader)}")
    print()
So once again, how does pytorch know how to compute the appropriate gradients for the appropriate models when loss.backward() and loss2.backward() are called?
Whenever you perform forward operations using one of your model parameters (or any torch.Tensor with requires_grad=True), pytorch builds a computational graph, and when you operate on descendants of tensors in this graph, the graph is extended. In your case, you have an nn.Module called model with trainable model.parameters(), so pytorch builds a graph from your model.parameters() all the way to the loss as you perform the forward operations. The graph is then traversed in reverse during the backward pass to propagate the gradients back to the parameters. For loss in your code above, the graph looks something like
model.parameters() --> [intermediate variables in model] --> output --> loss
                                     ^                                    ^
                                     |                                    |
                                   images                              labels
When you call loss.backward(), pytorch traverses this graph in reverse to reach all trainable parameters (only the model.parameters() in this case) and accumulates the gradient into param.grad for each of them. The optimizer then relies on this information gathered during the backward pass to update the parameters.
For loss2 the story is similar.
The official pytorch tutorials are a good resource for more in-depth information on this.
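As a quick way to see this in action, here is a small check I added (not part of the original answer): build two fresh models, backpropagate a loss computed only from the first one, and inspect which parameters actually received gradients.
import torch
from torch import nn

model_a = nn.Linear(784, 10)
model_b = nn.Linear(784, 10)

x = torch.randn(4, 784)
y = torch.randint(0, 10, (4,))

# the loss is computed only through model_a, so only model_a is in its graph
loss = nn.functional.cross_entropy(model_a(x), y)
loss.backward()

print(all(p.grad is not None for p in model_a.parameters()))  # True: reached by the backward pass
print(all(p.grad is None for p in model_b.parameters()))      # True: model_b was never involved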
Hi, I am developing a neural network model using Keras.
Code:
def base_model():
    # Initialising the ANN
    regressor = Sequential()
    # Adding the input layer and the first hidden layer
    regressor.add(Dense(units = 4, kernel_initializer = 'he_normal', activation = 'relu', input_dim = 7))
    # Adding the second hidden layer
    regressor.add(Dense(units = 2, kernel_initializer = 'he_normal', activation = 'relu'))
    # Adding the output layer
    regressor.add(Dense(units = 1, kernel_initializer = 'he_normal'))
    # Compiling the ANN
    regressor.compile(optimizer = 'adam', loss = 'mse', metrics = ['mae'])
    return regressor
I have been reading about which kernel_initializer to use and came across this link: https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404
It talks about Glorot and He initializations. I have tried different initializations for the weights, but all of them give the same results. I want to understand how important it is to do proper initialization.
Thanks
I'll give you an explanation of why weight initialisation matters.
Let's suppose our NN has an input layer with 1000 neurons, and suppose we initialise the weights as normally distributed with mean 0 and variance 1, i.e. w ~ N(0, 1).
At the second layer, assume that 500 of the first layer's neurons are activated (output 1) while the other 500 are not (output 0).
The input of a neuron in the second layer will then be the sum z = Σ_j w_j x_j, i.e. a sum of 500 of these weights,
so z will again be normally distributed, but with variance 500 (standard deviation sqrt(500) ≈ 22.4).
This means |z| will very likely be much larger than 1, so the neurons will saturate and the network will learn very slowly, if at all.
A solution is to initialise the weights as w ~ N(0, 1/n_in), where n_in is the number of inputs to the layer. In this way z will have a standard deviation close to 1, so it is much less spread out, and consequently the neurons are less prone to saturate.
This trick can help as a start, but in deep neural networks, with many hidden layers, the initialisation has to be handled at every layer; one method that helps here is batch normalization.
Besides this, from your code I can see you've chosen MSE, a quadratic cost function. I don't know if your problem is a classification one, but if it is, I suggest you use a cross-entropy cost function instead, which speeds up learning.
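To make the variance argument above concrete, here is a small numerical sketch I added (not from the original answer): it compares the spread of z for standard-normal weights versus weights scaled by 1/sqrt(n_in).
import numpy as np

rng = np.random.default_rng(0)
n_in = 1000
x = np.zeros(n_in)
x[:500] = 1.0                      # 500 active inputs, 500 inactive

# standard-normal initialisation: z is widely spread
w_std = rng.normal(0.0, 1.0, size=(n_in, 10000))
print(np.std(x @ w_std))           # ~ sqrt(500) ≈ 22.4

# scaled initialisation N(0, 1/n_in): z stays close to unit scale
w_scaled = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, 10000))
print(np.std(x @ w_scaled))        # ~ 0.7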
I have the following CNN model code:
classifier = Sequential()
classifier.add(Convolution2D(32, 3, 3, input_shape = (256, 256, 3), activation = "relu"))
classifier.add(MaxPooling2D(pool_size = (2,2)))
So now I need to find what values the 32 filters were initialized with. Is there any code that helps in printing the values of the filters?
Here is the default Keras Conv2D initialization: kernel_initializer='glorot_uniform' (or init='glorot_uniform' for older versions of Keras).
You can look at what this initializer does here: Keras initializers
Finally, here is one way to access the weights of your first layer :
classifier = Sequential()
classifier.add(Convolution2D(32, 3, 3, input_shape = (256, 256, 3), activation = "relu"))
classifier.add(MaxPooling2D(pool_size = (2,2)))
first_layer = classifier.layers[0]
print(first_layer.get_weights()) # You may need to process this output tensor to get a readable output and not just a raw tensor
Get the corresponding layer from the model
layer = classifier.layers[0] # 0th layer is the convolution in your architecture
There will be two variables for each convolution layer (Filter kernels and Bias). Get the corresponding one
filters = layer.weights[0] # kernel is the 0th index
Now filters contains the values you are looking for, and it is a tensor. To get the values of the tensor, use the get_value() function of the Keras backend:
import keras.backend as K
print(K.get_value(filters))
This will print an array of shape (3, 3, 3, 32) which translates to 32 filters of kernel size 3x3 for 3 channels.
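As a side note (my addition, not part of the original answer), layer.get_weights(), mentioned in the previous answer, returns plain NumPy arrays directly, which avoids going through the backend:
filters, biases = classifier.layers[0].get_weights()
print(filters.shape)          # (3, 3, 3, 32): 32 filters of size 3x3 over 3 input channels
print(filters[:, :, :, 0])    # values of the first filter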
I am trying to make a sequence to sequence encoder decoder model and need to softmax the last layer to use categorical cross entropy.
I've tried setting activation of the last LSTM layer to 'softmax' but that doesn't seem to do the trick. Adding another dense layer and setting the activation to softmax doesn't help either. What is the correct way to do a softmax when your last LSTM outputs a sequence?
inputs = Input(batch_shape=(batch_size, timesteps, input_dim), name='hella')
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(inputs)
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(encoded)
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(encoded)
encoded = LSTM(latent_dim, return_sequences=False)(encoded)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
# do softmax here
sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(loss='categorical_crossentropy', optimizer='adam')
Figured it out:
As of Keras 2, you can simply add:
TimeDistributed(Dense(input_dim, activation='softmax'))
TimeDistributed allows you to apply a Dense layer to each time step of the sequence. Documentation can be found here: https://keras.io/layers/wrappers/
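For the model in the question, that would look roughly like this (a sketch of where the wrapper goes, assuming the same variable names as above):
from keras.layers import TimeDistributed, Dense

decoded = LSTM(input_dim, return_sequences=True)(decoded)
decoded = TimeDistributed(Dense(input_dim, activation='softmax'))(decoded)  # per-timestep softmax

sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(loss='categorical_crossentropy', optimizer='adam')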