I am wondering how can I make use of torch.scatter_() or any other built-in masking function for this one-hot masking task-
I have two tensors-
X = [batch, 100], label = [batch], and num_classes = 10
so each label has 10 tensors out of those 100 tensors in 'X'.
For instance X of shape [1x100],
X = ([ 0.0468, -1.7434, -1.0217, -0.0724, -0.5169, -1.7318, -0.1207, -0.8377,
-0.8055, 0.7438, 0.1139, 1.2162, -1.7950, 1.7416, -1.2031, -1.4833,
-0.5454, 0.2466, -1.2303, -0.4257, 0.9873, -1.5905, -1.3950, 0.4013,
-1.0523, 1.4450, 0.6574, 1.5239, -0.3503, -0.1114, 1.8192, -1.7425,
0.4678, 0.4074, 1.7606, -1.0502, 0.0724, 0.1721, 0.1108, 0.4453,
0.2278, -1.5352, -0.1232, 1.1052, 0.2496, 1.2898, -0.4167, -0.8211,
0.2340, -0.3829, -0.1328, 0.1033, 2.8693, -0.8802, -0.0433, 0.5335,
0.0662, 0.4250, 0.2353, -0.1590, 0.0865, 0.6519, -0.2242, 1.5300,
1.7021, -0.9451, 0.5845, -0.7309, 0.7124, 0.6544, -1.4426, -0.1859,
-1.5313, -1.5391, -0.2138, -1.0203, 0.6678, 1.3445, -1.3453, 0.5222,
0.9510, 0.0969, -0.5437, -0.2727, -0.6090, -2.9624, 0.4578, 0.5257,
-0.2866, 0.0818, -1.2454, 1.6511, 0.1634, 1.3720, -0.4222, 0.5347,
0.3586, -0.3506, 2.6866, 0.5084])
label = [3]
I would like to do one-hot masking of "1" to tensors 30-40 and rest all the tensors as "0" on the tensor 'X'.
so for,
label = 1 -> mask (0 to 10) as ‘1’ and rest as ‘0’
label = 2 -> mask (10 to 20) as ‘1’ and rest as ‘0’
… so on.
Related
I would like to resize, and specifically shrink, a mask (2D array of 1s and 0s) so that any pixel in the low-resolution-mask that maps to a group of pixels in the high-resolution-mask (original) containing at least one value of 1 will be set to 1 itself (example at bottom).
I've tried using cv2.resize() using cv2.INTER_MAX but it returned an error:
error: OpenCV(4.6.0) /io/opencv/modules/imgproc/src/resize.cpp:3927: error: (-5:Bad argument) Unknown interpolation method in function 'resize'
It doesn't seem that Pillow Image or scipy have an interpolation method to do so.
I'm looking for a solution for the defined shrink_max()
>>> orig_mask = [[1,0,0],[0,0,0],[0,0,0]]
>>> orig_mask
[[1,0,0]
,[0,0,0]
,[0,0,0]]
>>> mini_mask = shrink_max(orig_mask, (2,2))
>>> mini_mask
[[1,0]
,[0,0]]
>>> mini_mask = shrink_max(orig_mask, (1,1))
>>> mini_mask
[[1]]
I'm not aware of a direct method but try this for shrinking the mask to half-size, i.e. each low-res pixel maps to 4 original pixels (modify to any ratio as per your needs):
import numpy as np
orig_mask = np.array([[1,0,0],[0,0,0],[0,0,0]])
# first make the original mask divisible by 2
pad_row = orig_mask.shape[0] % 2
pad_col = orig_mask.shape[1] % 2
# i.e. pad the right and bottom of the mask with zeros
orig_mask_padded = np.pad(orig_mask, ((0,pad_row), (0,pad_col)))
# get the new shape
new_rows = orig_mask_padded.shape[0] // 2
new_cols = orig_mask_padded.shape[1] // 2
# group the original pixels by fours and max each group
shrunk_mask = orig_mask_padded.reshape(new_rows, 2, new_cols, 2).max(axis=(1,3))
print(shrunk_mask)
Check working with submatrixes here: Numpy: efficiently sum sub matrix m of M
Here's the complete function for shrinking to any desired shape:
def shrink_max(mask, shrink_to_shape):
r, c = shrink_to_shape
m, n = mask.shape
padded_mask = np.pad(mask, ((0, -m % r), (0, -n % c)))
pr, pc = padded_mask.shape
return padded_mask.reshape(r, pr // r, c, pc // c).max(axis=(1, 3))
For example print(shrink_max(orig_mask, (2,1))) returns:
[[1]
[0]]
I am working on a multi-label classification problem. My gt labels are of shape 14 x 10 x 128, where 14 is the batch_size, 10 is the sequence_length, and 128 is the vector with values 1 if the item in sequence belongs to the object and 0 otherwise.
My output is also of same shape: 14 x 10 x 128. Since, my input sequence was of varying length I had to pad it to make it of fixed length 10. I'm trying to find the loss of the model as follows:
total_loss = 0.0
unpadded_seq_lengths = [3, 4, 5, 7, 9, 3, 2, 8, 5, 3, 5, 7, 7, ...] # true lengths of sequences
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
for data in training_dataloader:
optimizer.zero_grad()
# shape of input 14 x 10 x 128
output = model(data)
batch_loss = 0.0
for batch_idx, sequence in enumerate(output):
# sequence shape is 10 x 128
true_seq_len = unpadded_seq_lengths[batch_idx]
# only keep unpadded gt and predicted labels since we don't want loss to be influenced by padded values
predicted_labels = sequence[:true_seq_len, :] # for example, 3 x 128
gt_labels = gt_labels_padded[batch_idx, :true_seq_len, :] # same shape as above, gt_labels_padded has shape 14 x 10 x 128
# loop through unpadded predicted and gt labels and calculate loss
for item_idx, predicted_labels_seq_item in enumerate(predicted_labels):
# predicted_labels_seq_item and gt_labels_seq_item are 1D vectors of length 128
gt_labels_seq_item = gt_labels[item_idx]
current_loss = criterion(predicted_labels_seq_item, gt_labels_seq_item)
total_loss += current_loss
batch_loss += current_loss
batch_loss.backward()
optimizer.step()
Can anybody please check to see if I'm calculating loss correctly. Thanks
Update:
Is this the correct approach for calculating accuracy metrics?
# batch size: 14
# seq length: 10
for epoch in range(10):
TP = FP = TN = FN = 0.
for x, y, mask in tr_dl:
# mask shape: (10,)
out = model(x) # out shape: (14, 10, 128)
y_pred = (torch.sigmoid(out) >= 0.5).float().type(torch.int64) # consider all predictions above 0.5 as 1, rest 0
y_pred = y_pred[mask] # y_pred shape: (14, 10, 10, 128)
y_labels = y[mask] # y_labels shape: (14, 10, 10, 128)
# do I flatten y_pred and y_labels?
y_pred = y_pred.flatten()
y_labels = y_labels.flatten()
for idx, prediction in enumerate(y_pred):
if prediction == 1 and y_labels[idx] == 1:
# calculate IOU (overlap of prediction and gt bounding box)
iou = 0.78 # assume we get this iou value for objects at idx
if iou >= 0.5:
TP += 1
else:
FP += 1
elif prediction == 1 and y_labels[idx] == 0:
FP += 1
elif prediction == 0 and y_labels[idx] == 1:
FN += 1
else:
TN += 1
EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
It is usually recommended to stick with batch-wise operations and avoid going into single-element processing steps while in the main training loop. One way to handle this case is to make your dataset return padded inputs and labels with additionally a mask that will come useful for loss computation. In other words, to compute the loss term with sequences of varying sizes, we will use a mask instead of doing individual slices.
Dataset
The way to proceed is to make sure you build the mask in the dataset and not in the inference loop. Here I am showing a minimal implementation that you should be able to transfer to your dataset without much hassle:
class Dataset(data.Dataset):
def __init__(self):
super().__init__()
def __len__(self):
return 100
def __getitem__(self, index):
i = random.randint(5, SEQ_LEN) # for demo puporse, generate x with random length
x = torch.rand(i, EMB_SIZE)
y = torch.randint(0, N_CLASSES, (i, EMB_SIZE))
# pad data to fit in batch
pad = torch.zeros(SEQ_LEN-len(x), EMB_SIZE)
x_padded = torch.cat((pad, x))
y_padded = torch.cat((pad, y))
# construct tensor to mask loss
mask = torch.cat((torch.zeros(SEQ_LEN-len(x)), torch.ones(len(x))))
return x_padded, y_padded, mask
Essentially in the __getitem__, we not only pad the input x and target y with zero values, we also construct a simple mask containing the positions of the padded values in the currently processed element.
Notice how:
x_padded, shaped (SEQ_LEN, EMB_SIZE)
y_padded, shaped (SEQ_LEN, N_CLASSES)
mask, shaped (SEQ_LEN,)
are all three tensors which are shape invariant across the dataset, yet mask contains the padding information necessary for us to compute the loss function appropriately.
Inference
The loss you've used nn.BCEWithLogitsLoss, is the correct one since it's a multi-dimensional loss used for binary classification. In other words, you can use it here in this multi-label classification task, considering each one of the 128 logits as an individual binary prediction. Do not use nn.CrossEntropyLoss) as suggested elsewhere, since the softmax will push a single logit (i.e. class), which is the behaviour required for single-label classification tasks.
Therefore, in the training loop, we simply have to apply the mask to our loss.
for x, y, mask in dl:
y_pred = model(x)
loss = mask*bce(y_pred, y)
# backpropagation, loss postprocessing, logs, etc.
This is what you need for the first part of the question, there are already loss functions implemented in tensorflow: https://medium.com/#aadityaura_26777/the-loss-function-for-multi-label-and-multi-class-f68f95cae525. Yours is just tf.nn.weighted_cross_entropy_with_logits, but you need to set the weight.
The second part of the question is not straightforward, because there's conditioning on the IOU, generally, when you do machine learning, you should heavily depend on matrix multiplication, in your case, you probably need to pre-calculate the IOU -> 1 or 0 as a vector, then multiply with the y_pred , element-wise, this will give you the modified y_pred . After that, you can use any accuracy available function to calculate the final result.
if you can use the CROSSENTROPYLOSS instead of BCEWithLogitsLoss there is something called ignore_index. you can use it to exclude your padded sequences. the difference between the 2 losses is the activation function used (softmax vs sigmoid). but I think you can still use the CROSSENTROPYLOSSfor binary classification as well.
I have a (244, 108) numpy array. It contains percentage change of close value of a trade for each minute in one day ie 108 values and like that for 244 days. Basically its a 1D vector. So in order to do 1D CNN how should I shape my data?
What i have done:
x.shape = (244, 108)
x = np.expand_dims(x, axis=2)
x.shape = (243, 108, 1)
y.shape = (243,)
Model:
class Net(torch.nn.Module):
def __init__(self):
super(Net, self).__init__()
self.layer1 = torch.nn.Conv1d(in_channels=108, out_channels=1, kernel_size=1, stride=1)
self.act1 = torch.nn.ReLU()
self.act2 = torch.nn.MaxPool1d(kernel_size=1, stride=1)
self.layer2 = torch.nn.Conv1d(in_channels=1, out_channels=1, kernel_size=1, stride=1)
self.act3 = torch.nn.ReLU()
self.act4 = torch.nn.MaxPool1d(kernel_size=1, stride=1)
self.linear_layers = nn.Linear(1, 1)
# Defining the forward pass
def forward(self, x):
x = self.layer1(x)
x = self.act1(x)
x = self.act2(x)
x = self.layer2(x)
x = self.act3(x)
x = self.act4(x)
x = self.linear_layers(x)
return x
If each day should be separate instance for convolution your data should have the shape (248, 1, 108). This seems more reasonable.
If you want all your days and minutes to be a continuum for network to learn it should be of shape (1, 1, 248*108).
Basically first dimension is batch (how many training samples), second is the number of channels or features of sample (only one in your case) and last is the number of timesteps.
Edit
Your pooling layer should be torch.nn.AdaptiveMaxPool1d(1). You should also reshape output from this layer like this: pooled.reshape(x.shape[0], -1) before pushing it through torch.nn.Linear layer.
I'm trying to get a basic LSTM working in TensorFlow. I'm receiving the following error:
TypeError: 'Tensor' object is not iterable.
The offending line is:
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)`
I'm using version 1.0.1 on windows 7. My inputs and label have the following shapes
x_shape = (50, 40, 18), y_shape = (50, 40)
Where:
batch size = 50
sequence length = 40
input vector length at each step = 18
I'm building my graph as follows
def build_graph(learn_rate, seq_len, state_size=32, batch_size=5):
# use a fixed sequence length
seqlen = tf.constant(seq_len, shape=[batch_size],dtype=tf.int32)
# Placeholders
x = tf.placeholder(tf.float32, [batch_size, None, 18])
y = tf.placeholder(tf.float32, [batch_size, None])
keep_prob = tf.constant(1.0)
# RNN
cell = tf.contrib.rnn.LSTMCell(state_size)
init_state = tf.get_variable('init_state', [1, state_size],
initializer=tf.constant_initializer(0.0))
init_state = tf.tile(init_state, [batch_size, 1])
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)
# Add dropout, as the model otherwise quickly overfits
rnn_outputs = tf.nn.dropout(rnn_outputs, keep_prob)
# Prediction layer
with tf.variable_scope('prediction'):
W = tf.get_variable('W', [state_size, num_classes])
b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
preds = tf.tanh(tf.matmul(rnn_outputs, W) + b)
# MSE
loss = tf.square(tf.subtract(y, preds))
# loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits, y))
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Can anyone tell me what I am missing?
Sequence length should be iterable e.g. a list or tensor, not a scalar. In your case specifically, you need to replace sequence length = 40 with a list of the lengths of each input. For instance, if your first sequence has 10 steps, the second 13 and the third 18, you would pass in [10, 13, 18]. This lets TensorFlow's dynamic RNN know how many steps to unroll for (I believe it uses a while loop internally).
I have a TensorFlow model with some of these characteristics:
state_size = 800,
num_classes = 14313,
batch_size = 10,
num_steps = 16, # width of the tensor
num_layers = 3
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in
tf.split(x_one_hot, num_steps, 1)] # still a list of tensors (batch_size, num_classes)
...
logits = tf.matmul(rnn_outputs, W) + b
predictions = tf.nn.softmax(logits)
Now I want to feed it a np.array (shape = batch_size x num_steps, so 10 x 16) and I get a predictions tensor back.
Weirdly, its shape is 160 x 14313. The latter is the number of classes. But where does 160 come from? I don't understand that. I would like to have a probability for each of my classes, for each of the elements of the batch (which is 10). How did the num_steps become involved, how to I read from this pred. tensor which is the expected element after those 16 numbers?
In this case the 160 comes from the shape as you suspected.
which means that for each batch of 10, has 16 timesteps, this is technically flattened when you do your shape variable.
At this point you have logits of shape 160 * classes. so you can do predictions[i] for each batch which then will have the probability of each class being the desired class.
which is why to get the chosen class you would do something like tf.argmax(predictions, 1) to get a tensor with the classification
this will have a shape of 160 in your case, so it will be the predicted
class for each one of the batches.
In order to get the probabilities, you could use the logits:
def prob(logit):
return 1/(1 + np.exp(-logit)