ZCA whitening in Python for machine learning

I am training on 1000 images of size 28x28. Before training, I perform ZCA whitening on my data, using the code from "How to implement ZCA Whitening? Python" as a reference.
Since I have 1000 images of size 28x28, the data becomes 1000x784 after flattening.
In the code below, is X my image dataset of shape 1000x784?
If so, the ZCAMatrix would be 1000x1000.
In that case, at prediction time I have a single image of size 28x28, i.e. 1x784, so it doesn't make sense to multiply it by the ZCAMatrix.
So I think X is the transpose of the image dataset. Am I right?
If I am right, then the size of ZCAMatrix is 784x784.
Now, how should I calculate the ZCA-whitened image: should I use np.dot(ZCAMatrix, transpose_of_image_to_be_predict) or np.dot(image_to_be_predict, ZCAMatrix)?
Any suggestions would be greatly appreciated.
import numpy as np

def zca_whitening_matrix(X):
    """
    Function to compute ZCA whitening matrix (aka Mahalanobis whitening).
    INPUT:  X: [M x N] matrix.
        Rows: Variables
        Columns: Observations
    OUTPUT: ZCAMatrix: [M x M] matrix
    """
    # Covariance matrix [row-wise variables]: Sigma = (X-mu) * (X-mu)' / (N-1)
    sigma = np.cov(X, rowvar=True) # [M x M]
    # Singular Value Decomposition. sigma = U * np.diag(S) * V
    U,S,V = np.linalg.svd(sigma)
        # U: [M x M] eigenvectors of sigma.
        # S: [M x 1] eigenvalues of sigma.
        # V: [M x M] transpose of U
    # Whitening constant: prevents division by zero
    epsilon = 1e-5
    # ZCA Whitening matrix: U * Lambda * U'
    ZCAMatrix = np.dot(U, np.dot(np.diag(1.0/np.sqrt(S + epsilon)), U.T)) # [M x M]
    return ZCAMatrix
And an example of the usage:
X = np.array([[0, 2, 2], [1, 1, 0], [2, 0, 1], [1, 3, 5], [10, 10, 10] ]) # Input: X [5 x 3] matrix
ZCAMatrix = zca_whitening_matrix(X) # get ZCAMatrix
ZCAMatrix # [5 x 5] matrix
xZCAMatrix = np.dot(ZCAMatrix, X) # project X onto the ZCAMatrix
xZCAMatrix # [5 x 3] matrix

I found the answer in the Keras code available here.
In my case the covariance matrix is clearly 784x784, and Singular Value Decomposition is performed on it. The SVD returns three matrices that are used to calculate principal_components, and principal_components is then used to compute the ZCA-whitened data.
My question was:
how should I calculate the ZCA whitened image, whether I should use np.dot(ZCAMatrix, transpose_of_image_to_be_predict) or np.dot(image_to_be_predict, ZCAMatrix)?
For this I used the reference from here.
I need to use np.dot(image_to_be_predict, ZCAMatrix) to calculate the ZCA-whitened image. Because the ZCA matrix U * Lambda * U' is symmetric, this is simply the transpose of np.dot(ZCAMatrix, transpose_of_image_to_be_predict).
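To make the shapes concrete, here is a minimal sketch for data stored with one flattened image per row. The function name zca_whitening_matrix_rows, the random stand-in data, and the choice to subtract the training mean before whitening are assumptions for illustration, not part of the referenced code.

```python
import numpy as np

def zca_whitening_matrix_rows(X, epsilon=1e-5):
    """X: [n_samples x n_features]; returns the feature mean and a
    [n_features x n_features] ZCA matrix (hypothetical helper)."""
    mean = X.mean(axis=0)
    # rowvar=False: each column is a variable (pixel), each row an observation
    sigma = np.cov(X - mean, rowvar=False)
    U, S, _ = np.linalg.svd(sigma)
    zca = U @ np.diag(1.0 / np.sqrt(S + epsilon)) @ U.T
    return mean, zca

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 784))   # stand-in for 1000 flattened 28x28 images
mean, zca = zca_whitening_matrix_rows(X_train)

image = rng.normal(size=(1, 784))        # one flattened image to predict on
whitened = np.dot(image - mean, zca)     # row vector times ZCA matrix
print(zca.shape, whitened.shape)
```

Since the matrix is symmetric, multiplying the row vector on the left gives the same numbers as multiplying the column vector on the right and transposing the result.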

Related

How to keep input and output shape consistent after applying conv2d and convtranspose2d to image data?

I'm using PyTorch to experiment with an image segmentation task. I found that the input and output shapes are often inconsistent after applying Conv2d() and ConvTranspose2d() to my image data of shape [1,1,height,width]. How can I fix this issue for arbitrary height and width?
Best regards
import torch
data = torch.rand(1,1,16,26)
a = torch.nn.Conv2d(1,1,kernel_size=3, stride=2)
b = a(data)
print(b.shape)
c = torch.nn.ConvTranspose2d(1,1,kernel_size=3, stride=2)
d = c(b)
print(d.shape) # torch.Size([1, 1, 15, 25])
TL;DR: given the same parameters, nn.ConvTranspose2d is not the inverse operation of nn.Conv2d in terms of shape conservation.
From an input with spatial dimension x_in, nn.Conv2d will output a tensor with respective spatial dimension x_out:
x_out = [(x_in + 2p - d*(k-1) - 1)/s + 1]
where [.] is the floor function, p the padding, d the dilation, k the kernel size, and s the stride.
In your case: k=3, s=2, while other parameters default to p=0 and d=1. In other words x_out = [(x_in - 3)/2 + 1]. So given x_in=16, you get x_out = [7.5] = 7. Because of the floor, x_in=15 also gives x_out = 7, so the original size cannot be recovered from the output size alone.
On the other hand, for nn.ConvTranspose2d we have:
x_out = (x_in-1)*s - 2p + d*(k-1) + op + 1
where p is the padding, d the dilation, k the kernel size, s the stride, and op the output padding.
In your case: k=3, s=2, while other parameters default to p=0, d=1, and op=0. You get x_out = (x_in-1)*2 + 3. So given x_in=7, you get x_out = 15.
However, if you apply an output padding on your transpose convolution, you will get the desired shape:
>>> conv = nn.Conv2d(1,1, kernel_size=3, stride=2)
>>> convT = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
>>> convT(conv(data)).shape
torch.Size([1, 1, 16, 26])
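The two shape formulas can be checked without building any layers. The sketch below is plain Python; the helper names conv2d_out, conv_transpose2d_out and required_output_padding are made up for illustration. It shows that the output padding needed to round-trip depends on the input size, which is why a fixed output_padding=1 only suits some heights and widths.

```python
def conv2d_out(x_in, k, s=1, p=0, d=1):
    # floor((x_in + 2p - d*(k-1) - 1) / s) + 1
    return (x_in + 2 * p - d * (k - 1) - 1) // s + 1

def conv_transpose2d_out(x_in, k, s=1, p=0, d=1, op=0):
    # (x_in - 1)*s - 2p + d*(k-1) + op + 1
    return (x_in - 1) * s - 2 * p + d * (k - 1) + op + 1

def required_output_padding(size, k, s=1, p=0, d=1):
    # output padding that makes the transposed convolution restore `size`
    down = conv2d_out(size, k, s, p, d)
    return size - conv_transpose2d_out(down, k, s, p, d)

for size in (15, 16, 25, 26):
    print(size, conv2d_out(size, k=3, s=2), required_output_padding(size, k=3, s=2))
```

With k=3 and s=2, odd sizes need no output padding and even sizes need 1. For arbitrary inputs, the forward call of nn.ConvTranspose2d also accepts an output_size argument, which may be a more convenient way to request the target shape per input.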

Fitting an image with a 2D Gaussian in Stan

I'm trying to fit an image (an NxN matrix), but I can't figure out how to do it. I suppose my model has to be y ~ multi_normal(mu, Sigma), with mu a vector of the values mu_x and mu_y and Sigma a covariance matrix.
I'm writing some simple Stan code, but I don't know how to pass the image to multi_normal. An example of the data could be:
np.array([[1, 4, 7, 6],
          [2, 11, 9, 4],
          [3, 6, 9, 4],
          [1, 2, 3, 1]])
This code is bad, but how can I implement it?
data {
  int<lower=0> n; // dimensions of gaussian
  int<lower=0> N; // dimensions of the image
  matrix[N,N] x; // image
}
parameters {
  vector[2] mu;
  matrix <lower=0> cov;
}
model {
  y ~ multi_normal( mu, cov);
  mu ~ normal(0., 100.); // prior
  cov ~ cauchy(0., 100.);
}

Loss for Multi-label Classification

I am working on a multi-label classification problem. My gt labels are of shape 14 x 10 x 128, where 14 is the batch_size, 10 is the sequence_length, and 128 is the vector with values 1 if the item in sequence belongs to the object and 0 otherwise.
My output also has the same shape: 14 x 10 x 128. Since my input sequences were of varying length, I had to pad them to a fixed length of 10. I'm trying to compute the loss of the model as follows:
total_loss = 0.0
unpadded_seq_lengths = [3, 4, 5, 7, 9, 3, 2, 8, 5, 3, 5, 7, 7, ...] # true lengths of sequences
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
for data in training_dataloader:
    optimizer.zero_grad()
    # shape of input 14 x 10 x 128
    output = model(data)
    batch_loss = 0.0
    for batch_idx, sequence in enumerate(output):
        # sequence shape is 10 x 128
        true_seq_len = unpadded_seq_lengths[batch_idx]
        # only keep unpadded gt and predicted labels since we don't want loss to be influenced by padded values
        predicted_labels = sequence[:true_seq_len, :] # for example, 3 x 128
        gt_labels = gt_labels_padded[batch_idx, :true_seq_len, :] # same shape as above, gt_labels_padded has shape 14 x 10 x 128
        # loop through unpadded predicted and gt labels and calculate loss
        for item_idx, predicted_labels_seq_item in enumerate(predicted_labels):
            # predicted_labels_seq_item and gt_labels_seq_item are 1D vectors of length 128
            gt_labels_seq_item = gt_labels[item_idx]
            current_loss = criterion(predicted_labels_seq_item, gt_labels_seq_item)
            total_loss += current_loss
            batch_loss += current_loss
    batch_loss.backward()
    optimizer.step()
Can anybody please check whether I'm calculating the loss correctly? Thanks.
Update:
Is this the correct approach for calculating accuracy metrics?
# batch size: 14
# seq length: 10
for epoch in range(10):
    TP = FP = TN = FN = 0.
    for x, y, mask in tr_dl:
        # mask shape: (10,)
        out = model(x) # out shape: (14, 10, 128)
        y_pred = (torch.sigmoid(out) >= 0.5).float().type(torch.int64) # consider all predictions above 0.5 as 1, rest 0
        y_pred = y_pred[mask] # y_pred shape: (14, 10, 10, 128)
        y_labels = y[mask] # y_labels shape: (14, 10, 10, 128)
        # do I flatten y_pred and y_labels?
        y_pred = y_pred.flatten()
        y_labels = y_labels.flatten()
        for idx, prediction in enumerate(y_pred):
            if prediction == 1 and y_labels[idx] == 1:
                # calculate IOU (overlap of prediction and gt bounding box)
                iou = 0.78 # assume we get this iou value for objects at idx
                if iou >= 0.5:
                    TP += 1
                else:
                    FP += 1
            elif prediction == 1 and y_labels[idx] == 0:
                FP += 1
            elif prediction == 0 and y_labels[idx] == 1:
                FN += 1
            else:
                TN += 1
    EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
It is usually recommended to stick with batch-wise operations and avoid single-element processing steps in the main training loop. One way to handle this case is to make your dataset return padded inputs and labels together with a mask that will be useful for the loss computation. In other words, to compute the loss with sequences of varying sizes, we use a mask instead of individual slices.
Dataset
The way to proceed is to make sure you build the mask in the dataset and not in the inference loop. Here I am showing a minimal implementation that you should be able to transfer to your dataset without much hassle:
import random
import torch
from torch.utils import data

SEQ_LEN, EMB_SIZE, N_CLASSES = 10, 128, 128 # sizes assumed from the question

class Dataset(data.Dataset):
    def __init__(self):
        super().__init__()

    def __len__(self):
        return 100

    def __getitem__(self, index):
        i = random.randint(5, SEQ_LEN) # for demo purposes, generate x with random length
        x = torch.rand(i, EMB_SIZE)
        y = torch.randint(0, 2, (i, N_CLASSES)).float() # multi-hot binary targets
        # pad data to fit in batch
        n_pad = SEQ_LEN - len(x)
        x_padded = torch.cat((torch.zeros(n_pad, EMB_SIZE), x))
        y_padded = torch.cat((torch.zeros(n_pad, N_CLASSES), y))
        # construct tensor to mask loss: 0 for padded positions, 1 for real ones
        mask = torch.cat((torch.zeros(n_pad), torch.ones(len(x))))
        return x_padded, y_padded, mask
Essentially in the __getitem__, we not only pad the input x and target y with zero values, we also construct a simple mask containing the positions of the padded values in the currently processed element.
Notice how:
x_padded, shaped (SEQ_LEN, EMB_SIZE)
y_padded, shaped (SEQ_LEN, N_CLASSES)
mask, shaped (SEQ_LEN,)
are all three tensors which are shape invariant across the dataset, yet mask contains the padding information necessary for us to compute the loss function appropriately.
Inference
The loss you've used, nn.BCEWithLogitsLoss, is the correct one, since it applies an independent binary classification loss to every element. In other words, you can use it for this multi-label classification task by treating each of the 128 logits as an individual binary prediction. Do not use nn.CrossEntropyLoss as suggested elsewhere: its softmax makes the logits compete so that a single logit (i.e. class) dominates, which is the behaviour required for single-label classification tasks.
Therefore, in the training loop, we simply have to apply the mask to our loss.
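bce = nn.BCEWithLogitsLoss(reduction='none') # keep per-element losses so they can be masked
for x, y, mask in dl:
    y_pred = model(x)
    # mask is (batch, SEQ_LEN); broadcast it over the class dimension and average over unpadded elements
    loss = (mask.unsqueeze(-1) * bce(y_pred, y)).sum() / (mask.sum() * y.size(-1))
    # backpropagation, loss postprocessing, logs, etc.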
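As a sanity check on the masking idea, the following sketch compares a masked, batch-wise loss against the slice-by-slice loss over the unpadded positions. The tensor sizes, the random logits standing in for a model output, and padding at the end of each sequence are all assumptions made for the demonstration. It relies on reduction="none" so the per-element losses can be masked before averaging.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch, seq_len, n_classes = 4, 10, 128
lengths = torch.tensor([3, 10, 5, 7])          # true (unpadded) sequence lengths

logits = torch.randn(batch, seq_len, n_classes)            # stand-in for model(x)
targets = torch.randint(0, 2, (batch, seq_len, n_classes)).float()
# 1 for real time steps, 0 for padding (padding assumed at the end here)
mask = (torch.arange(seq_len)[None, :] < lengths[:, None]).float()

bce = nn.BCEWithLogitsLoss(reduction="none")
per_element = bce(logits, targets)                          # (batch, seq_len, n_classes)
masked_loss = (per_element * mask.unsqueeze(-1)).sum() / (mask.sum() * n_classes)

# Reference: slice every sequence to its true length, then take the mean
kept_logits = torch.cat([logits[i, :n] for i, n in enumerate(lengths.tolist())])
kept_targets = torch.cat([targets[i, :n] for i, n in enumerate(lengths.tolist())])
reference = nn.BCEWithLogitsLoss()(kept_logits, kept_targets)

print(masked_loss.item(), reference.item())
```

Both values are the mean binary cross-entropy over the same unpadded elements, so they should agree up to floating-point error.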
This is what you need for the first part of the question. There are already loss functions implemented in TensorFlow: https://medium.com/@aadityaura_26777/the-loss-function-for-multi-label-and-multi-class-f68f95cae525. Yours is just tf.nn.weighted_cross_entropy_with_logits, but you need to set the weight.
The second part of the question is not straightforward, because there is conditioning on the IOU. Generally, when you do machine learning, you should rely heavily on vectorized matrix operations. In your case, you probably need to pre-calculate the IOU condition as a vector of 1s and 0s, then multiply it element-wise with y_pred to get the modified y_pred. After that, you can use any available accuracy function to calculate the final result.
If you can use CrossEntropyLoss instead of BCEWithLogitsLoss, it has an argument called ignore_index that you can use to exclude your padded sequences. The difference between the two losses is the activation function used (softmax vs. sigmoid), but I think you can still use CrossEntropyLoss for binary classification as well.

How to compute the cosine_similarity in pytorch for all rows in a matrix with respect to all rows in another matrix

In PyTorch, given two matrices, how would I compute the cosine similarity of all rows in each with all rows in the other?
For example
Given the input =
matrix_1 = [a b]
[c d]
matrix_2 = [e f]
[g h]
I would like the output to be
output =
[cosine_sim([a b] [e f]) cosine_sim([a b] [g h])]
[cosine_sim([c d] [e f]) cosine_sim([c d] [g h])]
At the moment I am using torch.nn.functional.cosine_similarity(matrix_1, matrix_2) which returns the cosine of the row with only that corresponding row in the other matrix.
In my example I have only 2 rows, but I would like a solution that works for many rows. I would even like to handle the case where the number of rows in each matrix is different.
I realize that I could use the expand, however I want to do it without using such a large memory footprint.
By manually computing the similarity and playing with matrix multiplication + transposition:
import torch
from scipy import spatial
import numpy as np
a = torch.randn(2, 2)
b = torch.randn(3, 2) # different row number, for the fun
# Given that cos_sim(u, v) = dot(u, v) / (norm(u) * norm(v))
#                          = dot(u / norm(u), v / norm(v))
# We first normalize the rows, before computing their dot products via transposition:
a_norm = a / a.norm(dim=1)[:, None]
b_norm = b / b.norm(dim=1)[:, None]
res = torch.mm(a_norm, b_norm.transpose(0,1))
print(res)
# 0.9978 -0.9986 -0.9985
# -0.8629 0.9172 0.9172
# -------
# Let's verify with numpy/scipy if our computations are correct:
a_n = a.numpy()
b_n = b.numpy()
res_n = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        # cos_sim(u, v) = 1 - cos_dist(u, v)
        res_n[i, j] = 1 - spatial.distance.cosine(a_n[i], b_n[j])
print(res_n)
# [[ 0.9978022 -0.99855876 -0.99854881]
# [-0.86285472 0.91716063 0.9172349 ]]
Adding eps for numerical stability, based on benjaminplanche's answer:
def sim_matrix(a, b, eps=1e-8):
    """
    added eps for numerical stability
    """
    a_n, b_n = a.norm(dim=1)[:, None], b.norm(dim=1)[:, None]
    a_norm = a / torch.max(a_n, eps * torch.ones_like(a_n))
    b_norm = b / torch.max(b_n, eps * torch.ones_like(b_n))
    sim_mt = torch.mm(a_norm, b_norm.transpose(0, 1))
    return sim_mt
This is the same as Zhang Yu's answer, but using clamp instead of max and without creating a new tensor. I did a small test with timeit, which indicated that clamp was faster, though I am not proficient in using that tool.
def sim_matrix(a, b, eps=1e-8):
    """
    added eps for numerical stability
    """
    a_n, b_n = a.norm(dim=1)[:, None], b.norm(dim=1)[:, None]
    a_norm = a / torch.clamp(a_n, min=eps)
    b_norm = b / torch.clamp(b_n, min=eps)
    sim_mt = torch.mm(a_norm, b_norm.transpose(0, 1))
    return sim_mt
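The same normalize-then-multiply idea can also be sketched with torch.nn.functional.normalize, which divides each row by max(norm, eps). The function name sim_matrix_normalize and the test tensors are made up for this example; the row-by-row reference is only there to check the result.

```python
import torch
import torch.nn.functional as F

def sim_matrix_normalize(a, b, eps=1e-8):
    # F.normalize divides each row by max(||row||_2, eps)
    a_norm = F.normalize(a, p=2, dim=1, eps=eps)
    b_norm = F.normalize(b, p=2, dim=1, eps=eps)
    return a_norm @ b_norm.T

torch.manual_seed(0)
a = torch.randn(2, 4)
b = torch.randn(3, 4)   # a different number of rows is fine
sim = sim_matrix_normalize(a, b)

# Reference: one row of `a` at a time against every row of `b`
reference = torch.stack(
    [F.cosine_similarity(a[i].expand_as(b), b, dim=1) for i in range(a.size(0))]
)
print(sim.shape)
```

The only intermediate tensors are the two normalized copies of the inputs, so no (rows_a x rows_b x features) expansion is materialized.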
You could use pairwise_cosine_similarity from TorchMetrics (from torchmetrics.functional import pairwise_cosine_similarity) to calculate the cosine similarity between two matrices with different numbers of rows. Refer to https://torchmetrics.readthedocs.io/en/stable/pairwise/cosine_similarity.html
>>> import torch
>>> from torchmetrics.functional import pairwise_cosine_similarity
>>> x = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
>>> y = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> pairwise_cosine_similarity(x, y)
tensor([[0.5547, 0.8682],
[0.5145, 0.8437],
[0.5300, 0.8533]])
>>> pairwise_cosine_similarity(x)
tensor([[0.0000, 0.9989, 0.9996],
[0.9989, 0.0000, 0.9998],
[0.9996, 0.9998, 0.0000]])
It is unnecessary to use a loop to calculate the similarity between the row vectors of a matrix. Here is an example.
import torch as t
a = t.randn(2,4)
print(a)
# step 1. compute the lengths (norms) of the row vectors
len_a = t.sqrt(t.sum(a**2,dim=-1))
print(len_a)
b = len_a.unsqueeze(1).expand(-1,2)
c = len_a.expand(2,-1)
# print(b)
# print(c)
# step 2. compute the dot products
x = a @ a.T
print(x)
# step 3. compute the final result
res = x/(b*c)
print(res)
You can expand the 2 input batches, perform the pairwise cosine similarity operation, then transpose it:
Non-cloning equivalents of torch.repeat_interleave and torch.repeat are used.
def distance_matrix(x, y, distance_function):
    return distance_function(
        x.view(x.size(0), 1, x.size(1)).expand(x.size(0), y.size(0), x.size(1)).contiguous().view(-1, x.size(1)),
        y.expand(x.size(0), y.size(0), y.size(1)).flatten(end_dim=1),
    ).view(x.size(0), y.size(0))
from torch.nn import functional as F
distance_matrix(x, y, F.cosine_similarity)

Reshaping tensor after max pooling ValueError: Shapes are not compatible

I am building a CNN to fit my own data, based on this example.
Basically, my data has 3640 features; I have a convolution layer followed by a pooling layer that pools every other feature, so I end up with dimensions (?, 1, 1819, 1), because 3638 features after the conv layer / 2 == 1819.
When I try to reshape my data after pooling to get it into the form [n_samples, n_features]:
print("pool_shape", pool_shape) #pool (?, 1, 1819, 10)
print("y_shape", y_shape) #y (?,)
pool.set_shape([pool_shape[0], pool_shape[2]*pool_shape[3]])
y.set_shape([y_shape[0], 1])
I get an error:
ValueError: Shapes (?, 1, 1819, 10) and (?, 18190) are not compatible
My code:
N_FEATURES = 140*26
N_FILTERS = 1
WINDOW_SIZE = 3
def my_conv_model(x, y):
    x = tf.cast(x, tf.float32)
    y = tf.cast(y, tf.float32)
    print("x ", x.get_shape())
    print("y ", y.get_shape())
    # to form a 4d tensor of shape batch_size x 1 x N_FEATURES x 1
    x = tf.reshape(x, [-1, 1, N_FEATURES, 1])
    # this will give you sliding window of 1 x WINDOW_SIZE convolution.
    features = tf.contrib.layers.convolution2d(inputs=x,
                                               num_outputs=N_FILTERS,
                                               kernel_size=[1, WINDOW_SIZE],
                                               padding='VALID')
    print("features ", features.get_shape()) #features (?, 1, 3638, 10)
    # Max pooling across output of Convolution+Relu.
    pool = tf.nn.max_pool(features, ksize=[1, 1, 2, 1],
                          strides=[1, 1, 2, 1], padding='SAME')
    pool_shape = pool.get_shape()
    y_shape = y.get_shape()
    print("pool_shape", pool_shape) #pool (?, 1, 1819, 10)
    print("y_shape", y_shape) #y (?,)
    ### here comes the error ###
    pool.set_shape([pool_shape[0], pool_shape[2]*pool_shape[3]])
    y.set_shape([y_shape[0], 1])
    pool_shape = pool.get_shape()
    y_shape = y.get_shape()
    print("pool_shape", pool_shape) #pool (?, 1, 1819, 10)
    print("y_shape", y_shape) #y (?,)
    prediction, loss = learn.models.logistic_regression(pool, y)
    return prediction, loss
How can I reshape the data into a meaningful representation so that I can later pass it to the logistic regression layer?
This looks like a confusion between the Tensor.set_shape() method and the tf.reshape() operator. In this case, you should use tf.reshape() because you are changing the shape of the pool and y tensors:
The tf.reshape(tensor, shape) operator takes a tensor of any shape, and returns a tensor with the given shape, as long as they have the same number of elements. This operator should be used to change the shape of the input tensor.
The tensor.set_shape(shape) method takes a tensor that might have a partially known or unknown shape, and asserts to TensorFlow that it actually has the given shape. This method should be used to provide more information about the shape of a particular tensor.
It can be used, e.g., when you take the output of an operator that has a data-dependent output shape (such as tf.image.decode_jpeg()) and assert that it has a static shape (e.g. based on knowledge about the sizes of images in your dataset).
In your program, you should replace the calls to set_shape() with something like the following:
pool_shape = tf.shape(pool)
pool = tf.reshape(pool, [pool_shape[0], pool_shape[2] * pool_shape[3]])
y_shape = tf.shape(y)
y = tf.reshape(y, [y_shape[0], 1])
# Or, more straightforwardly:
y = tf.expand_dims(y, 1)
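The shape arithmetic behind the reshape can be illustrated with NumPy as a stand-in. This is only an analogy, not the TensorFlow API, and the batch size of 5 is an arbitrary assumption. Reshaping works because the number of elements per sample is preserved (1 * 1819 * 10 = 18190), whereas merely asserting a different rank, as set_shape() does, cannot change the layout.

```python
import numpy as np

batch = 5  # arbitrary stand-in for the unknown batch dimension "?"
pool = np.zeros((batch, 1, 1819, 10), dtype=np.float32)
y = np.zeros((batch,), dtype=np.float32)

# Flatten everything except the batch dimension: (5, 1, 1819, 10) -> (5, 18190)
flat = pool.reshape(pool.shape[0], pool.shape[2] * pool.shape[3])
# Add a trailing axis to the labels: (5,) -> (5, 1)
y_col = np.expand_dims(y, 1)

print(flat.shape, y_col.shape)
```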
