Distinguishing categories that contain subcategories - machine-learning

I am working on distinguishing two categories A and B, where category B contains subcategories B1, B2, B3, ...
Sometimes the classification result is better when I explicitly label the subcategories B1, B2, B3, but sometimes the result is better when I gather the subcategories together and just label them all B.
In other words, sometimes
y = [A, A, A, ..., B1, B1, ..., B2, B2, ..., B3, B3, ...]
is better, but sometimes
y = [A, A, A, ..., B, B, B, ...]
is better.
Naively, I think there are two effects that impact the result:
case 1 includes more information;
case 2 lets the algorithm focus more on distinguishing A from B.
But I am not sure my assumption is right. Does anyone know about this? And when there are subcategories, what is your way to get the best result?

In this situation you can use a two-level classifier, which first predicts the top-level class and then selects the most likely second-level class falling underneath it. Here is some code that I have used:
from sklearn.base import BaseEstimator, ClassifierMixin
import pandas as pd
import numpy as np

class TwoLevelClassifier(BaseEstimator, ClassifierMixin):
    """
    A two-level classifier intended to be used with labels taken from a 2-level taxonomy.
    The labels are pipe-separated class-subclass pairs, as in "class|subclass".

    >>> from sklearn.linear_model import LogisticRegression
    >>> X = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
    >>> y1 = pd.Series(['A','A','B'])
    >>> y2 = pd.Series(['A1','A2','B1'])
    >>> y = y1 + "|" + y2
    >>> clf = TwoLevelClassifier(LogisticRegression(), LogisticRegression())
    >>> clf.fit(X, y)
    """
    def __init__(self, classifier1, classifier2):
        self.classifier1 = classifier1
        self.classifier2 = classifier2

    def fit(self, X, y):
        y1 = y.str.split('|').str[0]  # top-level class
        self.classifier1.fit(X, y1)   # predicts the top-level class
        self.classifier2.fit(X, y)    # predicts the full "class|subclass" label
        self.classes_ = self.classifier2.classes_
        return self

    def predict_proba(self, X):
        level1_pred = pd.Series(self.classifier1.predict(X))
        probs = pd.DataFrame(self.classifier2.predict_proba(X),
                             columns=self.classifier2.classes_)
        classes_ = self.classifier2.classes_
        # Zero out every subclass that does not fall under the predicted top-level class
        mask = np.array(level1_pred.map(
            lambda x: [c.split('|')[0] == x for c in classes_]).tolist())
        probs_filtered = probs.where(mask, 0)
        return probs_filtered.values

    def predict(self, X):
        probs = self.predict_proba(X)
        # predict_proba returns a numpy array, so use argmax and map back to labels
        return self.classes_[np.argmax(probs, axis=1)]
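A quick usage sketch with made-up labels (hypothetical data, just to illustrate the "class|subclass" format):

from sklearn.linear_model import LogisticRegression

X = pd.DataFrame(np.random.rand(20, 3))
y = pd.Series(np.random.choice(['A|A1', 'A|A2', 'B|B1'], size=20))
clf = TwoLevelClassifier(LogisticRegression(), LogisticRegression())
clf.fit(X, y)
print(clf.predict(X))  # pipe-separated "class|subclass" labels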

Related

How to compute a scikit-learn pairwise custom affinity of a DNA sequence list

I want to compute a pairwise similarity score for a list of DNA sequences, i.e. X = ['AACCT', 'TTCCAAT', 'GGGGACTT'], as input to a scikit-learn clustering algorithm.

from sklearn.metrics.pairwise import pairwise_distances
from Bio import pairwise2  # pairwise2 computes alignment scores between DNA sequences
from Bio.Seq import Seq

First I define a custom distance function:

def blast_dist(x, y):
    try:
        x = Seq(x)
        y = Seq(y)
        # pairwise2.align.globalms calculates a BLAST-like identity score
        # between two DNA sequences
        return pairwise2.align.globalms(x, y, 5, -5, -2, -0.5)[0].score
    except:
        return 0

and then use the sklearn.metrics.pairwise_distances function to generate a distance matrix:

m = pairwise_distances(X, X, metric=blast_dist)

which returns an error:
ValueError: could not convert string to float: 'AACCT'
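One common workaround (a sketch, not part of the original post): pairwise_distances validates its input as a numeric array, so it cannot take a list of strings directly. You can pass indices into the sequence list instead, and look the strings up inside the metric:

import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

X = ['AACCT', 'TTCCAAT', 'GGGGACTT']
# Each "sample" is just an index; the metric maps indices back to sequences.
idx = np.arange(len(X)).reshape(-1, 1)
m = pairwise_distances(idx, metric=lambda i, j: blast_dist(X[int(i[0])], X[int(j[0])]))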

Using a quadratic function with unknown coefficients, how can I find these coefficients using gradient descent?

Hello everyone.
I am a beginner in machine learning and have just started learning about gradient descent. However, I have one big problem. The question is as follows:
given the points [0,0], [1,1], [1,2], [2,1] and
the equation f = (a2)*x^2 + (a1)*x + a0,
solving by hand I got the answer [-1, 5/2, 0],
but it is hard to find the solution by writing Python code that applies gradient descent to these data.
In my case, I tried to write the gradient descent code in the easiest and fastest way, like:
learningRate = 0.1
make a series of numbers for x
initialize a2, a1, a0 to 1, 1, 1
partial derivatives for a2, a1, a0 (a2_p: 2x, a1_p: x, a0_p: 1)
gradient descent update: (ex) a2 = a2 - (learningRate)*( y - [(a2)*x^2 + (a1)*x + a0] )*(a2_p)
P.S. Honestly, I do not know what I should put in for 'x' and 'y', or for a2, a1, a0.
I get a wrong answer, with a different result each time.
So, I want a hint for the correct equations or code sequence.
Thank you for reading my very basic question.
There are a few errors in your equations.
For the function f(x) = a2*x^2 + a1*x + a0, the partial derivatives with respect to a2, a1 and a0 are x^2, x and 1, respectively (the derivative with respect to a2 is x^2, not 2x).
Suppose the cost function is (1/2)*(y - f(x))^2.
The partial derivative of the cost with respect to ai is then -(y - f(x)) * (partial derivative of f(x) with respect to ai), where i belongs to [0, 2].
So the gradient descent update is:
ai = ai + learning_rate * (y - f(x)) * (partial derivative of f(x) with respect to ai), where i belongs to [0, 2]
I hope this code helps:
# Training sample: (x, y) pairs
sample = [(0,0),(1,1),(1,2),(2,1)]

# Our function => a2*x^2 + a1*x + a0
class Function():
    def __init__(self, a2, a1, a0):
        self.a2 = a2
        self.a1 = a1
        self.a0 = a0
    def eval(self, x):
        return self.a2*x**2 + self.a1*x + self.a0
    def partial_a2(self, x):
        return x**2
    def partial_a1(self, x):
        return x
    def partial_a0(self, x):
        return 1

# Initialise function
f = Function(1, 1, 1)

# Mean squared loss over the sample
def loss(sample, f):
    return sum([(y - f.eval(x))**2 for x, y in sample]) / len(sample)

epochs = 100000
lr = 0.0005

# To record the best values
best_values = (0, 0, 0)
for epoch in range(epochs):
    min_loss = 100
    for x, y in sample:
        # Gradient descent update for each coefficient
        f.a2 = f.a2 + lr*(y - f.eval(x))*f.partial_a2(x)
        f.a1 = f.a1 + lr*(y - f.eval(x))*f.partial_a1(x)
        f.a0 = f.a0 + lr*(y - f.eval(x))*f.partial_a0(x)
        # Storing the best values
        epoch_loss = loss(sample, f)
        if min_loss > epoch_loss:
            min_loss = epoch_loss
            best_values = (f.a2, f.a1, f.a0)

print("Loss:", min_loss)
print("Best values (a2,a1,a0):", best_values)
Output:
Loss: 0.12500004789165717
Best values (a2,a1,a0): (-1.0001922562970325, 2.5003368582261487, 0.00014521557599919338)
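As a sanity check (not part of the original answer), the least-squares solution can also be computed in closed form with numpy.polyfit, which agrees with the hand-solved [-1, 5/2, 0]:

import numpy as np

xs = np.array([0, 1, 1, 2])
ys = np.array([0, 1, 2, 1])
# Fit a degree-2 polynomial by least squares; coefficients come back as (a2, a1, a0)
print(np.polyfit(xs, ys, 2))  # approximately [-1.   2.5  0. ]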

Changing the model in tensorflow-federated does not work

I am trying to change the model (just add a hidden layer) in the tutorial Federated Learning for Image Classification. But the result shows that w1 and b1 do not change and retain their initial value of 0 after multiple iterations; only w2 and b2 are trained. Here is my code:
MnistVariables = collections.namedtuple(
    'MnistVariables', 'w1 w2 b1 b2 num_examples loss_sum accuracy_sum')

def create_mnist_variables():
    return MnistVariables(
        w1=tf.Variable(
            lambda: tf.zeros(dtype=tf.float32, shape=(784, 128)),
            name='w1',
            trainable=True),
        w2=tf.Variable(
            lambda: tf.zeros(dtype=tf.float32, shape=(128, 10)),
            name='w2',
            trainable=True),
        b1=tf.Variable(
            lambda: tf.zeros(dtype=tf.float32, shape=(128,)),
            name='b1',
            trainable=True),
        b2=tf.Variable(
            lambda: tf.zeros(dtype=tf.float32, shape=(10,)),
            name='b2',
            trainable=True),
        num_examples=tf.Variable(0.0, name='num_examples', trainable=False),
        loss_sum=tf.Variable(0.0, name='loss_sum', trainable=False),
        accuracy_sum=tf.Variable(0.0, name='accuracy_sum', trainable=False))

def mnist_forward_pass(variables, batch):
    a = tf.add(tf.matmul(batch['x'], variables.w1), variables.b1)
    a = tf.nn.relu(a)
    y = tf.nn.softmax(tf.add(tf.matmul(a, variables.w2), variables.b2))
    predictions = tf.cast(tf.argmax(y, 1), tf.int32)
    flat_labels = tf.reshape(batch['y'], [-1])
    loss = -tf.reduce_mean(tf.reduce_sum(
        tf.one_hot(flat_labels, 10) * tf.log(y), reduction_indices=[1]))
    accuracy = tf.reduce_mean(
        tf.cast(tf.equal(predictions, flat_labels), tf.float32))
    num_examples = tf.to_float(tf.size(batch['y']))
    tf.assign_add(variables.num_examples, num_examples)
    tf.assign_add(variables.loss_sum, loss * num_examples)
    tf.assign_add(variables.accuracy_sum, accuracy * num_examples)
    return loss, predictions

def get_local_mnist_metrics(variables):
    return collections.OrderedDict([
        ('w1', variables.w1),
        ('w2', variables.w2),
        ('b1', variables.b1),
        ('b2', variables.b2),
        ('num_examples', variables.num_examples),
        ('loss', variables.loss_sum / variables.num_examples),
        ('accuracy', variables.accuracy_sum / variables.num_examples)
    ])

class MnistModel(tff.learning.Model):
    def __init__(self):
        self._variables = create_mnist_variables()

    @property
    def trainable_variables(self):
        return [self._variables.w1, self._variables.w2,
                self._variables.b1, self._variables.b2]
I also added w1 and b1 to the trainable variables, but it seems that they are not trained in the training process, and I don't know why. Does anyone have successful experience changing the model in this tutorial?
I suspect the ReLU activations with zero initialisations of w1 and b1 are problematic, and this may be a case of "dying ReLU" (see What is the "dying ReLU" problem in neural networks?).
Since w1 and b1 are initialized to zero, the output of the first layer will also be 0 after the matrix multiply and addition, and the gradient of the ReLU at 0 is 0 as well, so w1 and b1 never receive a non-zero update.
Possible options to try: use a non-zero initializer, use an alternative activation function, or don't have an activation after the first layer.
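For instance, a minimal sketch of the first option (the initializer and its parameters are my own choices, not taken from the tutorial):

# Replace the zero initializers of w1/b1 with small non-zero values so the
# ReLU units receive non-zero inputs and gradients from the start.
w1 = tf.Variable(
    lambda: tf.random.normal(shape=(784, 128), stddev=0.1, dtype=tf.float32),
    name='w1',
    trainable=True)
b1 = tf.Variable(
    lambda: tf.constant(0.1, shape=(128,), dtype=tf.float32),
    name='b1',
    trainable=True)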

How to compute the cosine_similarity in pytorch for all rows in a matrix with respect to all rows in another matrix

In PyTorch, given two matrices, how would I compute the cosine similarity of all rows in one matrix with respect to all rows in the other?
For example
Given the input:

matrix_1 = [a b]
           [c d]

matrix_2 = [e f]
           [g h]

I would like the output to be:

output = [cosine_sim([a b], [e f])  cosine_sim([a b], [g h])]
         [cosine_sim([c d], [e f])  cosine_sim([c d], [g h])]
At the moment I am using torch.nn.functional.cosine_similarity(matrix_1, matrix_2), which returns the cosine similarity of each row with only the corresponding row in the other matrix.
In my example I have only 2 rows, but I would like a solution that works for many rows, and I would even like to handle the case where the number of rows in each matrix is different.
I realize that I could use expand; however, I want to do it without such a large memory footprint.
By manually computing the similarity and playing with matrix multiplication + transposition:
import torch
from scipy import spatial
import numpy as np

a = torch.randn(2, 2)
b = torch.randn(3, 2)  # different row number, for the fun

# Given that cos_sim(u, v) = dot(u, v) / (norm(u) * norm(v))
#                          = dot(u / norm(u), v / norm(v))
# we first normalize the rows, before computing their dot products via transposition:
a_norm = a / a.norm(dim=1)[:, None]
b_norm = b / b.norm(dim=1)[:, None]
res = torch.mm(a_norm, b_norm.transpose(0, 1))
print(res)
#  0.9978 -0.9986 -0.9985
# -0.8629  0.9172  0.9172

# -------
# Let's verify with numpy/scipy that our computations are correct:
a_n = a.numpy()
b_n = b.numpy()
res_n = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        # cos_sim(u, v) = 1 - cos_dist(u, v)
        res_n[i, j] = 1 - spatial.distance.cosine(a_n[i], b_n[j])
print(res_n)
# [[ 0.9978022  -0.99855876 -0.99854881]
#  [-0.86285472  0.91716063  0.9172349 ]]
Adding eps for numerical stability, based on benjaminplanche's answer:
def sim_matrix(a, b, eps=1e-8):
    """
    Added eps for numerical stability.
    """
    a_n, b_n = a.norm(dim=1)[:, None], b.norm(dim=1)[:, None]
    a_norm = a / torch.max(a_n, eps * torch.ones_like(a_n))
    b_norm = b / torch.max(b_n, eps * torch.ones_like(b_n))
    sim_mt = torch.mm(a_norm, b_norm.transpose(0, 1))
    return sim_mt
Same as Zhang Yu's answer, but using clamp instead of max and without creating a new tensor. I did a small test with timeit, which indicated that clamp was faster, though I am not proficient in using that tool.
def sim_matrix(a, b, eps=1e-8):
    """
    Added eps for numerical stability.
    """
    a_n, b_n = a.norm(dim=1)[:, None], b.norm(dim=1)[:, None]
    a_norm = a / torch.clamp(a_n, min=eps)
    b_norm = b / torch.clamp(b_n, min=eps)
    sim_mt = torch.mm(a_norm, b_norm.transpose(0, 1))
    return sim_mt
You could use TorchMetrics' pairwise_cosine_similarity (from torchmetrics.functional import pairwise_cosine_similarity) to calculate the cosine similarity between two matrices with different shapes. Refer to https://torchmetrics.readthedocs.io/en/stable/pairwise/cosine_similarity.html
>>> import torch
>>> from torchmetrics.functional import pairwise_cosine_similarity
>>> x = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
>>> y = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> pairwise_cosine_similarity(x, y)
tensor([[0.5547, 0.8682],
        [0.5145, 0.8437],
        [0.5300, 0.8533]])
>>> pairwise_cosine_similarity(x)
tensor([[0.0000, 0.9989, 0.9996],
        [0.9989, 0.0000, 0.9998],
        [0.9996, 0.9998, 0.0000]])
It is unnecessary to use a loop to calculate the similarity between the row vectors of a matrix. Here is an example.
import torch as t

a = t.randn(2, 4)
print(a)

# Step 1: compute the length (L2 norm) of each row vector
len_a = t.sqrt(t.sum(a**2, dim=-1))
print(len_a)
b = len_a.unsqueeze(1).expand(-1, 2)
c = len_a.expand(2, -1)
# print(b)
# print(c)

# Step 2: compute the dot products between rows
x = a @ a.T
print(x)

# Step 3: normalize by the product of norms to get the final result
res = x / (b * c)
print(res)
You can expand the two input batches, perform the pairwise cosine similarity operation, then reshape the result back into a matrix. Non-cloning equivalents of torch.repeat_interleave and torch.repeat are used:
from torch.nn import functional as F

def distance_matrix(x, y, distance_function):
    return distance_function(
        x.view(x.size(0), 1, x.size(1)).expand(x.size(0), y.size(0), x.size(1)).contiguous().view(-1, x.size(1)),
        y.expand(x.size(0), y.size(0), y.size(1)).flatten(end_dim=1),
    ).view(x.size(0), y.size(0))

distance_matrix(x, y, F.cosine_similarity)

Implementing a custom objective function in Keras

I am trying to implement a custom Keras objective function from 'Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression' (Narihira et al.): the sum of equations (4) and (6) from the paper. Y* is the ground truth, Y a prediction map, and y = Y* - Y.
This is my code:
def custom_objective(y_true, y_pred):
    # Eq. (4): scale-invariant L2 loss
    y = y_true - y_pred
    h = 0.5  # lambda
    term1 = K.mean(K.sum(K.square(y)))
    term2 = K.square(K.mean(K.sum(y)))
    sca = term1 - h*term2
    # Eq. (6): gradient L2 loss
    gra = K.mean(K.sum((K.square(K.gradients(K.sum(y[:,1]), y))
                        + K.square(K.gradients(K.sum(y[1,:]), y)))))
    return (sca + gra)
However, I suspect that equation (6) is not correctly implemented, because the results are not good. Am I computing this right?
Thank you!
Edit:
I am trying to approximate (6) by convolving with Prewitt filters. It works when my input is a chunk of images, i.e. y[batch_size, channels, rows, cols], but not with y_true and y_pred (which are of type TensorType(float32, 4D)).
My code:
def cconv(image, g_kernel, batch_size):
    g_kernel = theano.shared(g_kernel)
    M = T.dtensor3()
    conv = theano.function(
        inputs=[M],
        outputs=conv2d(M, g_kernel, border_mode='full'),
    )
    accum = 0
    for curr_batch in range(batch_size):
        accum = accum + conv(image[curr_batch])
    return accum / batch_size

def gradient_loss(y_true, y_pred):
    y = y_true - y_pred
    batch_size = 40
    # Direction i
    pw_x = np.array([[-1,0,1],[-1,0,1],[-1,0,1]]).astype(np.float64)
    g_x = cconv(y, pw_x, batch_size)
    # Direction j
    pw_y = np.array([[-1,-1,-1],[0,0,0],[1,1,1]]).astype(np.float64)
    g_y = cconv(y, pw_y, batch_size)
    gra_l2_loss = K.mean(K.square(g_x) + K.square(g_y))
    return gra_l2_loss
The crash occurs at:
accum = accum + conv(image[curr_batch])
...and the error description is the following:
TypeError: ('Bad input argument to theano function with name "custom_models.py:836" at index 0 (0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')
How can I use y (y_true - y_pred) as a numpy array, or how can I solve this issue?
SIL2
term1 = K.mean(K.square(y))
term2 = K.square(K.mean(y))
[...]
One mistake spread across the code: whenever you see (1/n) * sum() in the equations, it is a plain mean, not the mean of a sum.
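Putting the corrected terms together, Eq. (4) would look like this (my consolidation of the fix above, not code from the original answer):

from keras import backend as K

def scale_invariant_l2(y_true, y_pred, h=0.5):
    # Eq. (4) with plain means wherever the equations have (1/n) * sum()
    y = y_true - y_pred
    term1 = K.mean(K.square(y))
    term2 = K.square(K.mean(y))
    return term1 - h * term2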
Gradient
After reading your comment and giving it more thought, I think there is some confusion about the gradient. At least I got confused.
There are two ways of interpreting the gradient symbol:
The gradient of a vector, where y would be differentiated with respect to the parameters of your model (usually the weights of the neural net). In previous edits I started to write in this direction, because that is the sort of approach used to train the model (e.g. gradient descent). But I think that was wrong.
The pixel intensity gradient in a picture, as you mentioned in your comment: the difference of each pixel with its neighbor in each direction. In that case, I guess you have to translate the example you gave into Keras; see the sketch below.
To sum up, K.gradients() and numpy.gradient() are not used in the same way, because numpy implicitly considers (i, j) (the row and column indices) as the two input variables, while when you feed a 2D image to a neural net, every single pixel is an input variable. I hope that's clear.
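Here is a sketch of the second interpretation in Keras, using finite differences between neighboring pixels instead of K.gradients() (I am assuming tensors shaped (batch, channels, rows, cols), as in the question):

from keras import backend as K

def gradient_l2_loss(y_true, y_pred):
    y = y_true - y_pred
    di = y[:, :, 1:, :] - y[:, :, :-1, :]  # differences along the row axis
    dj = y[:, :, :, 1:] - y[:, :, :, :-1]  # differences along the column axis
    return K.mean(K.square(di)) + K.mean(K.square(dj))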
