How to add an additional label to a huggingface model? - machine-learning

I'm following the multiple choice QA tutorial and trying to modify it slightly to fit my data. My data is exactly the same, except that I have 5 labels instead of 4:
# original data:
from datasets import load_dataset
swag = load_dataset("swag", "regular")
set(swag["train"]['label'])
>>> {0, 1, 2, 3}
# my data:
set(train_dataset["train"]['label'])
>>> {0, 1, 2, 3, 4}
I'm running the code in the tutorial and getting the error:
nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
I found (here and here) that this error occurs when the target values are out of bounds, which can happen with nn.CrossEntropyLoss: it expects a torch.LongTensor of targets with values in the range [0, nb_classes-1].
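To make that bounds requirement concrete, here is a small pure-Python re-implementation of cross-entropy with the same check. It is only an illustration, not PyTorch's code, and `cross_entropy` is a hypothetical helper: with `n_classes` logits per row, every target must lie in `[0, n_classes - 1]`.

```python
import math


def cross_entropy(logits, targets):
    """Mean cross-entropy over a batch, with the target bounds check nll_loss performs."""
    total = 0.0
    for row, t in zip(logits, targets):
        n_classes = len(row)
        if not 0 <= t < n_classes:
            # On CUDA, PyTorch reports this case as the failed `t >= 0 && t < n_classes` assertion
            raise IndexError(f"Target {t} is out of bounds for {n_classes} classes")
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[t]
    return total / len(targets)


print(cross_entropy([[0.0, 0.0, 0.0, 0.0]], [3]))  # log(4), a valid target
```

Calling `cross_entropy([[0.0, 0.0, 0.0, 0.0]], [4])` raises, because index 4 does not exist among four logits.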
I will not copy the entire script from the tutorial since it's in the link above, but I found that the error can be replicated by modifying the DataCollatorForMultipleChoice class to add an extra label value, as follows:
import random
from dataclasses import dataclass
from typing import Optional, Union

import torch
from transformers.tokenization_utils_base import PaddingStrategy, PreTrainedTokenizerBase

@dataclass
class DataCollatorForMultipleChoice:
    """
    Data collator that will dynamically pad the inputs for multiple choice received.
    """
    tokenizer: PreTrainedTokenizerBase
    padding: Union[bool, str, PaddingStrategy] = True
    max_length: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None

    def __call__(self, features):
        label_name = "label" if "label" in features[0].keys() else "labels"
        labels = [feature.pop(label_name) for feature in features]
        labels = [random.choice(range(5)) for _ in range(16)]  # <<<---ADDING EXTRA LABEL HERE. INSTEAD OF 0-3 THIS IS BETWEEN 0-4
        print(len(labels))
        print(labels)
        batch_size = len(features)
        num_choices = len(features[0]["input_ids"])
        # one dict per (example, choice) pair so the tokenizer can pad them together
        flattened_features = [
            [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
        ]
        flattened_features = sum(flattened_features, [])
        batch = self.tokenizer.pad(
            flattened_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors="pt",
        )
        # restore the (batch_size, num_choices, seq_len) layout
        batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
        batch["labels"] = torch.tensor(labels, dtype=torch.int64)
        return batch
Then when I run the trainer I get:
16 # batch
[0, 0, 2, 1, 1, 1, 0, 4, 0, 4, 3, 0, 0, 0, 1, 1] # labels
... nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
I tried changing the number of labels in the model:
# original:
# model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
# my modification:
model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased", num_labels=5)
but I got the same error.
The script runs just fine with my data if I change the added line above to:
labels = [random.choice(range(4)) for _ in range(16)] # note that now it's from 0-3 and not from 0-4
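For BERT-style multiple-choice heads such as BertForMultipleChoice, the model scores each (context, ending) pair with a single logit and reshapes those scores to (batch_size, num_choices) before the cross-entropy loss. The number of classes is therefore the number of choices per example in the input, and `num_labels` does not change it. If the preprocessing still builds four endings per example, as the tutorial's preprocessing does, a label of 4 has no matching choice. A plain-Python sketch of that reshape and bounds check, using the labels printed above; `group_scores` and `out_of_range_labels` are hypothetical helpers:

```python
def group_scores(per_choice_scores, num_choices):
    """Group flat per-choice scores into rows of num_choices (like logits.view(-1, num_choices))."""
    if len(per_choice_scores) % num_choices != 0:
        raise ValueError("score count must be a multiple of num_choices")
    return [per_choice_scores[i:i + num_choices]
            for i in range(0, len(per_choice_scores), num_choices)]


def out_of_range_labels(labels, num_choices):
    """Labels the loss would reject: valid targets are 0 .. num_choices - 1."""
    return [y for y in labels if not 0 <= y < num_choices]


labels = [0, 0, 2, 1, 1, 1, 0, 4, 0, 4, 3, 0, 0, 0, 1, 1]  # printed by the collator above

# 16 examples with 4 endings each -> 64 scores -> 16 rows of 4 logits
logits = group_scores([0.0] * (16 * 4), num_choices=4)
print(len(logits), len(logits[0]))     # 16 4
print(out_of_range_labels(labels, 4))  # [4, 4]
print(out_of_range_labels(labels, 5))  # []
```

Under this reading, labels 0-4 would need five endings per example, so the fix belongs in the preprocessing rather than in `num_labels`.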

Related

how to set condition in objective function in cvxpy

I have a brute force optimization algorithm with the objective function of the form:
np.clip(x @ M, a_min=0, a_max=1) @ P
where x is a Boolean decision vector, M is a Boolean matrix/tensor and P is a probability vector. As you can guess, the inner product x @ M can have values higher than 1, which is not allowed, as the objective value should be a probability scalar (or a vector, if M is a tensor) between 0 and 1. So I have used numpy.clip to clamp x @ M to the range [0, 1]. How can I set up a mechanism like clip in cvxpy to achieve the same result? I have spent hours searching the internet with no luck, so I'd appreciate any hint. I have been trying to replicate clip with the code below, but it raises Exception: Cannot evaluate the truth value of a constraint or chain constraints, e.g., 1 >= x >= 0. As a side note, since cvxpy cannot handle tensors, I loop through the tensor slices M[s].
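import cvxpy as cp
import numpy as np

n = M.shape[0]
m = M.shape[1]
w = M.shape[2]
max_budget_of_decision_variable = 7
x = cp.Variable(n, boolean=True)
obj = 0
for s in range(m):
    for k in range(w):  # loop variable renamed so it no longer overwrites w
        # this truth test on a cvxpy expression is what raises the exception
        if (x @ M[s])[k] >= 1:
            (x @ M[s])[k] = 1
    obj += x @ M[s] @ P
objective = cp.Maximize(obj)
cst = []
cst += [cp.sum(x) <= max_budget_of_decision_variable]  # was cp.sum(y); y is undefined
prob = cp.Problem(objective, constraints=cst)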
As an example, consider M = np.array([ [1, 0, 0, 1, 1, 0], [0, 0, 1, 0, 1, 0], [1, 1, 1, 0, 1, 0]]) and P = np.array([0.05, 0.15, 0.1, 0.15, 0.5, 0.05]).

Dask - custom aggregation

I just learned about Dask yesterday and am now porting my work from pandas, but I'm stuck trying to translate simple custom aggregations.
I don't fully (or probably at all) understand how the Series are represented inside those internal lambda functions; normally I would step in with breakpoint() to inspect them, but that isn't an option here. When I try to get an element of x by index, I get a "Meta" error.
Any help/pointers would be appreciated.
import dask.dataframe as dd
import pandas as pd
#toy df
df = dd.from_pandas(pd.DataFrame(dict(a = [1, 1, 2, 2], \
b = [100, 100, 200, 250])), npartitions=2)
df.compute()
a b
0 1 100
1 1 100
2 2 200
3 2 250
# PART 1 - for conceptual understanding - replicating trivial list
# intended result
df.groupby('a').agg(list).compute()
b
a
1 [100, 100]
2 [200, 250]
# replicate manually with custom aggregation
list_manual = dd.Aggregation('list_manual', lambda x: list(x), \
lambda x1: list(x1))
res = df.groupby('a').agg(list_manual).compute()
res
b
0 (0, [(1, 0 100\n1 100\nName: b, dtype: i...
res.b[0]
(0,
0 (1, [100, 100])
0 (2, [200, 250])
Name: list_manual-b-6785578d38a71d6dbe0d1ac6515538f7, dtype: object)
# looks like the grouping tuple wasn't even unpacked (1, 2 groups are there)
# ... instead it all got collapsed into one thing
# PART 2 - custom function
# with pandas - intended behavior
collect_uniq = lambda x: list(set(x))
dfp = df.compute()
dfp.groupby('a').agg(collect_uniq)
b
a
1 [100]
2 [200, 250]
#now trying the same in Dask
collect_uniq_dask = dd.Aggregation('collect_uniq_dask', \
lambda x: list(set(x)), lambda x1: list(x1))
res = df.groupby('a').agg(collect_uniq_dask).compute()
# gives TypeError("unhashable type: 'Series'")

Uploading and labeling pairs of photos

I created a ResNet18 to detect if 2 individuals are siblings or not, by giving an image of each one (the model has input_size = 2).
I need to create my dataset, in which I specify which pairs are siblings and which are not.
I tried:
training_set = train_datagen.flow_from_directory('training',
target_size=(28,28),
batch_size=32,
class_mode='binary')
And I got training_set.classes == array([0, 0, 0, 0, 1, 1, 1, 1])
for these training_set.filenames:
'false\\false1\\_DSC5763.jpg',
'false\\false2\\_DSC5751.jpg',
'false\\false2\\_DSC5760.jpg',
'siblings\\siblings1\\_DSC5751.jpg',
'siblings\\siblings1\\_DSC5755_1.jpg',
'siblings\\siblings2\\_DSC5760.jpg',
'siblings\\siblings2\\_DSC5763.jpg'
The training_set.classes should be array([0, 0, 1, 1]), for my purposes.
How can I do this?
I finished my project and thought I'd come back to post the solution I found. I am trying to classify whether 2 individuals are siblings or not.
import os
import pandas as pd

# Assumption: all pair images live in one directory (the one passed to flow_from_dataframe below);
# sorting keeps the visiting order deterministic
filenames = sorted(os.listdir("../train/input/"))
# Lists used for creating the dataset
categories = []
first_img = []
second_img = []
# Parse through the images and build two lists: first and second image of each pair
for filename in filenames:
    category = filename.split('.')[0]
    # Each file is named <sibling/false><nr_of_pair>_<0/1>; the character after '_' tells which member of the pair it is
    if 'sibling' in category:
        if filename.split('_')[1][0] == '0':
            first_img.append(filename)
            categories.append(1)
        else:
            second_img.append(filename)
    else:
        if filename.split('_')[1][0] == '0':
            first_img.append(filename)
            categories.append(0)
        else:
            second_img.append(filename)
# dataset of the first individual of each pair and its label
df1 = pd.DataFrame({
    'filename': first_img,
    'category': categories
}).astype('str')
# dataset of the second individual of each pair and its label
df2 = pd.DataFrame({
    'filename': second_img,
    'category': categories
}).astype('str')
And for fit_generator I used this function:
def generate_generator_multiple(datagen):
    # Both generators use the same seed so their shuffling stays in sync and the pairs stay aligned
    train_generator1 = datagen.flow_from_dataframe(df1,
                                                   "../train/input/",
                                                   x_col='filename',
                                                   y_col='category',
                                                   class_mode='binary',
                                                   target_size=(image_size1, image_size2),
                                                   batch_size=batch_size,
                                                   seed=42)
    train_generator2 = datagen.flow_from_dataframe(df2,
                                                   "../train/input/",
                                                   x_col='filename',
                                                   y_col='category',
                                                   class_mode='binary',
                                                   target_size=(image_size1, image_size2),
                                                   batch_size=batch_size,
                                                   seed=42)
    while True:
        X1i = next(train_generator1)
        X2i = next(train_generator2)
        yield [X1i[0], X2i[0]], X2i[1]  # yield both images and their shared label
datagen is an ImageDataGenerator object.
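The loop above relies on first_img and second_img ending up in the same pair order, which only holds if the files are visited in a consistent order. A sketch that instead matches the two members of each pair by their pair number; the naming scheme `<sibling|false><pair number>_<0|1>.<ext>` is inferred from the parsing code above, and `pair_files` is a hypothetical helper:

```python
import re
from collections import defaultdict

# Assumed naming scheme, e.g. "sibling3_0.jpg" / "sibling3_1.jpg" or "false7_0.jpg"
PATTERN = re.compile(r"^(siblings?|false)(\d+)_([01])\.\w+$")


def pair_files(filenames):
    """Return (first_img, second_img, categories) with members matched by pair number."""
    pairs = defaultdict(dict)
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            continue  # skip files that do not follow the naming scheme
        kind, pair_nr, member = m.groups()
        pairs[(kind, int(pair_nr))][member] = name
    first, second, labels = [], [], []
    for (kind, _), members in sorted(pairs.items()):
        if "0" in members and "1" in members:  # keep complete pairs only
            first.append(members["0"])
            second.append(members["1"])
            labels.append(1 if kind.startswith("sibling") else 0)
    return first, second, labels


first, second, labels = pair_files(
    ["sibling1_1.jpg", "false1_0.jpg", "sibling1_0.jpg", "false1_1.jpg"])
print(first)   # ['false1_0.jpg', 'sibling1_0.jpg']
print(second)  # ['false1_1.jpg', 'sibling1_1.jpg']
print(labels)  # [0, 1]
```

The three returned lists can then feed the `filename` and `category` columns of df1 and df2.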

Conversion between keypoints Coco and open pose?

Hi, I am currently struggling to convert between two popular 2D keypoint output formats, from COCO keypoints to OpenPose. My COCO keypoints are in the order x1,y1,c1, ..., x17,y17,c17, where x, y are the coordinates and c is the confidence score of the detected joint. Has anyone successfully mapped between COCO and OpenPose?
import numpy as np

MISSING_VALUE = -1  # placeholder for absent joints; the original snippet leaves this constant undefined

def convert_coco_to_openpose_cords(coco_keypoints_list):
    # coco keypoints: [x1,y1,v1,...,xk,yk,vk] (k=17)
    # ['Nose', 'Leye', 'Reye', 'Lear', 'Rear', 'Lsho', 'Rsho', 'Lelb',
    #  'Relb', 'Lwri', 'Rwri', 'Lhip', 'Rhip', 'Lkne', 'Rkne', 'Lank', 'Rank']
    # openpose keypoints: [y1,...,yk], [x1,...xk] (k=18, with Neck)
    # ['Nose', *'Neck'*, 'Rsho', 'Relb', 'Rwri', 'Lsho', 'Lelb', 'Lwri', 'Rhip',
    #  'Rkne', 'Rank', 'Lhip', 'Lkne', 'Lank', 'Reye', 'Leye', 'Rear', 'Lear']
    # COCO index for every OpenPose slot except the neck, which is inserted below
    indices = [0, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]
    y_cords = []
    x_cords = []
    for i in indices:
        # vi is COCO's visibility flag (0, 1 or 2); for detector confidence scores,
        # replace these checks with a threshold on the score
        xi, yi, vi = coco_keypoints_list[i*3:(i+1)*3]
        if vi == 0:  # not labeled
            y_cords.append(MISSING_VALUE)
            x_cords.append(MISSING_VALUE)
        elif vi in (1, 2):  # labeled (1: not visible, 2: visible)
            y_cords.append(yi)
            x_cords.append(xi)
        else:
            raise ValueError("vi value: {}".format(vi))
    # Get 'Neck' keypoint by interpolating between 'Lsho' and 'Rsho' keypoints
    l_shoulder_index = 5
    r_shoulder_index = 6
    l_shoulder_keypoint = coco_keypoints_list[l_shoulder_index*3:(l_shoulder_index+1)*3]
    r_shoulder_keypoint = coco_keypoints_list[r_shoulder_index*3:(r_shoulder_index+1)*3]
    if l_shoulder_keypoint[2] > 0 and r_shoulder_keypoint[2] > 0:
        neck_keypoint_y = int((l_shoulder_keypoint[1]+r_shoulder_keypoint[1])/2.)
        neck_keypoint_x = int((l_shoulder_keypoint[0]+r_shoulder_keypoint[0])/2.)
    else:
        neck_keypoint_y = neck_keypoint_x = MISSING_VALUE
    open_pose_neck_index = 1
    y_cords.insert(open_pose_neck_index, neck_keypoint_y)
    x_cords.insert(open_pose_neck_index, neck_keypoint_x)
    # shape (18, 2): one [y, x] row per OpenPose keypoint
    return np.concatenate([np.expand_dims(y_cords, -1),
                           np.expand_dims(x_cords, -1)], axis=1)
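If the goal is OpenPose's own per-joint layout (x, y, confidence for 18 joints in OpenPose's COCO order: Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist, RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar), the NumPy sketch below keeps the confidence scores from the question instead of visibility flags. Assumptions: the neck is the shoulder midpoint, as in the answer above, and gets the smaller of the two shoulder confidences; missing joints are written as (0, 0, 0); `coco17_to_openpose18` and `min_conf` are hypothetical names.

```python
import numpy as np

# COCO index for each OpenPose COCO-18 slot; slot 1 (Neck) has no COCO counterpart
COCO_FOR_OPENPOSE = [0, None, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]


def coco17_to_openpose18(coco_flat, min_conf=0.0):
    """[x1, y1, c1, ..., x17, y17, c17] -> (18, 3) array of (x, y, c) rows in OpenPose order."""
    coco = np.asarray(coco_flat, dtype=float).reshape(17, 3)
    out = np.zeros((18, 3))
    for slot, idx in enumerate(COCO_FOR_OPENPOSE):
        if idx is not None:
            out[slot] = coco[idx]
    l_sho, r_sho = coco[5], coco[6]
    if l_sho[2] > min_conf and r_sho[2] > min_conf:
        out[1, :2] = (l_sho[:2] + r_sho[:2]) / 2.0  # neck = shoulder midpoint
        out[1, 2] = min(l_sho[2], r_sho[2])  # assumption: neck is as reliable as the weaker shoulder
    # otherwise the neck row stays (0, 0, 0)
    return out


# toy input: COCO joint i sits at (i, 10 * i) with confidence 1.0
coco = []
for i in range(17):
    coco += [float(i), 10.0 * i, 1.0]
print(coco17_to_openpose18(coco)[:3])
```

`out.ravel().tolist()` gives a flat [x, y, c, ...] list if a flat layout is needed.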

Weird error in Theano when running 3D CNN

I am modifying the tutorial here, http://deeplearning.net/tutorial/lenet.html, to a 3D convolutional neural network.
However, I get the following error when I run it:
Traceback (most recent call last):
File "cnnProgram.py", line 191, in <module>
minibatch_avg_cost = train_model(minibatch_index)
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/Volumes/TONY/anaconda/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
ValueError: total size of new array must be unchanged
Apply node that caused the error: Reshape{4}(InplaceDimShuffle{0,4,1,2,3}.0, TensorConstant{[5000 5.. 12 5]})
Toposort index: 102
Inputs types: [TensorType(float64, 5D), TensorType(int64, vector)]
Inputs shapes: [(200, 10, 5, 12, 5), (4,)]
Inputs strides: [(24000, 8, 4800, 400, 80), (8,)]
Inputs values: ['not shown', array([5000, 5, 12, 5])]
Outputs clients: [[CorrMM_gradWeights{valid, (1, 1)}(Reshape{4}.0, Reshape{4}.0), CorrMM{valid, (1, 1)}(Reshape{4}.0, Subtensor{::, ::, ::int64, ::int64}.0)]]
It seems the problem occurs when I change the dimension order with .dimshuffle, but I can't figure out why.
Here is my code.
from __future__ import print_function
import scipy.io as sio
import numpy as np
import theano.tensor as T
import theano
from theano import shared
from lasagne.layers import InputLayer, DenseLayer
import os
import sys
import timeit
from mlp import LogRegr, HiddenLayer, DropoutLayer
from convnet3d import ConvLayer, NormLayer, PoolLayer, RectLayer
from activations import relu, tanh, sigmoid, softplus
# Get data
dataReadyForCNN_withValid = sio.loadmat("DataReadyForCNN_withValid.mat")
xTrain = dataReadyForCNN_withValid["xTrain"]
xTrain = xTrain.astype("float64")
yTrainCond = dataReadyForCNN_withValid["yTrainCond"]
yTrainCond = yTrainCond.astype("int32")
yTrainWord = dataReadyForCNN_withValid["yTrainWord"]
yTrainWord = yTrainWord.astype("int32")
xValidate = dataReadyForCNN_withValid["xValidate"]  # assumed key; was "xTrain", which validates on the training inputs
xValidate = xValidate.astype("float64")
yValidateCond = dataReadyForCNN_withValid["yValidateCond"]
yValidateCond = yValidateCond.astype("int32")
yValidateWord = dataReadyForCNN_withValid["yValidateWord"]
yValidateWord = yValidateWord.astype("int32")
xTest = dataReadyForCNN_withValid["xTest"]
xTest = xTest.astype("float64")
yTestCond = dataReadyForCNN_withValid["yTestCond"]
yTestCond = yTestCond.astype("int32")
yTestWord = dataReadyForCNN_withValid["yTestWord"]
yTestWord = yTestWord.astype("int32")
##################################
# Build Model
#################################
# xTrain = np.random.rand(500, 1, 51, 61, 23).astype('float64')
dtensor5 = T.TensorType('float64', (False,)*5)
x = dtensor5('x') # the input data
y = T.ivector()
# allocate symbolic variables for the data
index = T.lscalar() # index to a [mini]batch
# input = (nImages, nChannel(nFeatureMaps), nDim1, nDim2, nDim3)
# layer1 (500, 5, 47, 56, 22)
# layer2 (500, 5, 10, 12, 5)
# layer3 (500, 3, 9, 11, 4)
# layer4 (500, 3, 5, 6, 2)
kernel_shape = (5,6,2)
fMRI_shape = (51, 61, 23)
n_in_maps = 1 # channel
n_out_maps = 5 # num of feature maps, aka the depth of the neurons
batch_size = 200
# 1st: Convolution Layer
layer1_input = x
layer1 = ConvLayer(layer1_input, 1, 5, (5, 6, 2), fMRI_shape,
batch_size, tanh)
# print layer1.output.eval({x:xTrain[:500]}).shape
# 2nd: Pool layer
poolShape = (5, 5, 5)
layer2 = PoolLayer(layer1.output, poolShape)
# print layer2.output.eval({x:xTrain}).shape
# 3rd: Convolution Layer
layer3 = ConvLayer(layer2.output, 5, 3, (2, 2, 2), (10, 12, 5),
                   batch_size, tanh)  # was 500; should match the minibatch size used in the givens
# print layer3.output.eval({x:xTrain[:500]}).shape
# 4th: Pool layer
layer4 = PoolLayer(layer3.output, (2, 2, 2))
# print layer4.output.eval({x:xTrain[:500]}).shape
# 5th: Dense layer
layer5_input = T.flatten(layer4.output, outdim=2)
layer5 = HiddenLayer(layer5_input, n_in=180, n_out=500, activation=tanh)
# layer5.output.eval({x:xTrain[:500]}).shape
# 6th: Logistic layer
layer6 = LogRegr(layer5.output, 500, 12, tanh)
cost = layer6.negative_log_likelihood(y)
# create a function to compute the mistakes that are made by the model
test_model = theano.function(
[index],
layer6.errors(y),
givens={
x: shared(xTest)[index * batch_size: (index + 1) * batch_size],
y: shared(yTestCond[0])[index * batch_size: (index + 1) * batch_size]
}
)
validate_model = theano.function(
[index],
layer6.errors(y),
givens={
x: shared(xValidate)[index * batch_size: (index + 1) * batch_size],
y: shared(yValidateCond[0])[index * batch_size: (index + 1) * batch_size]
}
)
# create a list of all model parameters to be fit by gradient descent
params = layer5.params + layer3.params + layer1.params + layer6.params
# create a list of gradients for all model parameters
grads = T.grad(cost, params)
# train_model is a function that updates the model parameters by
# SGD. Since this model has many parameters, it would be tedious to
# manually create an update rule for each model parameter. We thus
# create the updates list by automatically looping over all
# (params[i], grads[i]) pairs.
learning_rate=0.1
updates = [
(param_i, param_i - learning_rate * grad_i)
for param_i, grad_i in zip(params, grads)
]
train_model = theano.function(
[index],
cost,
updates=updates,
givens={
x: shared(xTrain)[index * batch_size: (index + 1) * batch_size],
y: shared(yTrainCond[0])[index * batch_size: (index + 1) * batch_size]
}
)
###############
# TRAIN MODEL #
###############
import timeit
print('... training')
n_train_batches = 10
n_test_batches = 10
n_validate_batches = 10
n_epochs=200
# early-stopping parameters
patience = 10000 # look as this many examples regardless
patience_increase = 2 # wait this much longer when a new best is
# found
improvement_threshold = 0.995 # a relative improvement of this much is
# considered significant
validation_frequency = min(n_train_batches, patience // 2)
# go through this many
# minibatches before checking the network
# on the validation set; in this case we
# check every epoch
best_validation_loss = np.inf
best_iter = 0
test_score = 0.
start_time = timeit.default_timer()
epoch = 0
done_looping = False
while (epoch < n_epochs) and (not done_looping):
    epoch = epoch + 1
    for minibatch_index in range(n_train_batches):
        minibatch_avg_cost = train_model(minibatch_index)
        # iteration number
        iter = (epoch - 1) * n_train_batches + minibatch_index
        if (iter + 1) % validation_frequency == 0:
            # compute zero-one loss on validation set
            validation_losses = [validate_model(i) for i
                                 in range(n_validate_batches)]
            this_validation_loss = np.mean(validation_losses)
            print(
                'epoch %i, minibatch %i/%i, validation error %f %%' %
                (
                    epoch,
                    minibatch_index + 1,
                    n_train_batches,
                    this_validation_loss * 100.
                )
            )
            # if we got the best validation score until now
            if this_validation_loss < best_validation_loss:
                # improve patience if loss improvement is good enough
                if (
                    this_validation_loss < best_validation_loss *
                    improvement_threshold
                ):
                    patience = max(patience, iter * patience_increase)
                best_validation_loss = this_validation_loss
                best_iter = iter
                # test it on the test set
                test_losses = [test_model(i) for i
                               in range(n_test_batches)]
                test_score = np.mean(test_losses)
                print((' epoch %i, minibatch %i/%i, test error of '
                       'best model %f %%') %
                      (epoch, minibatch_index + 1, n_train_batches,
                       test_score * 100.))
        if patience <= iter:
            done_looping = True
            break
end_time = timeit.default_timer()
print(('Optimization complete. Best validation score of %f %% '
'obtained at iteration %i, with test performance %f %%') %
(best_validation_loss * 100., best_iter + 1, test_score * 100.))
I don't have your data, so I replaced the data-loading part of your code and then tested it; I did not get any error.
Make sure your input data is in the right form (with the shape of (50, 1, 51, 61, 23) in your case).
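The numbers in the traceback can be checked directly with NumPy. The Reshape input (200, 10, 5, 12, 5) has 600,000 elements, while the target [5000, 5, 12, 5] needs 1,500,000; 5000 = 500 × 10, which matches the batch size of 500 passed to the layer3 ConvLayer rather than batch_size = 200. This is an inference from the shapes, not a confirmed diagnosis. A minimal reproduction of the failing reshape:

```python
import numpy as np

# Shapes copied from the traceback: the Reshape input and its target shape
x = np.zeros((200, 10, 5, 12, 5))
target = (5000, 5, 12, 5)  # 5000 = 500 * 10, i.e. built for a batch size of 500

print(x.size, int(np.prod(target)))  # 600000 1500000

try:
    x.reshape(target)
except ValueError as err:
    print("reshape failed:", err)

# Using the actual minibatch size (200) for the leading dimension works
ok = x.reshape(200 * 10, 5, 12, 5)
print(ok.shape)  # (2000, 5, 12, 5)
```

NumPy raises the same kind of size-mismatch ValueError that Theano reports as "total size of new array must be unchanged".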
