My intention was to solve the l1-error fitting problem using scipy.optimize, as suggested in "L1 norm instead of L2 norm for cost function in regression model", but I kept getting the wrong solution. So I debugged using least squares, for which we know the closed-form expression:
import numpy as np
from scipy.optimize import minimize

n = 10
d = 2
A = np.random.rand(n, d)
x = np.random.rand(d, 1)
b = np.dot(A, x)

x_hat = np.linalg.lstsq(A, b)[0]
print(np.linalg.norm(np.dot(A, x_hat) - b))
def fit(X, params):
    return X.dot(params)

def cost_function(params, X, y):
    return np.linalg.norm(y - fit(X, params))

x0 = np.zeros((d, 1))
output = minimize(cost_function, x0, args=(A, b))
y_hat = fit(A, output.x)
print(np.linalg.norm(y_hat - b))
print(output)
The output is:
4.726604209672303e-16
2.2714597315189407
      fun: 2.2714597315189407
 hess_inv: array([[ 0.19434496, -0.1424377 ],
                  [-0.1424377 ,  0.16718703]])
      jac: array([ 3.57627869e-07, -2.98023224e-08])
  message: 'Optimization terminated successfully.'
     nfev: 32
      nit: 4
     njev: 8
   status: 0
  success: True
        x: array([0.372247  , 0.32633966])
This looks really odd, because it apparently cannot even solve the l2 regression. Am I making a stupid mistake here?
The error is due to mismatching shapes. minimize flattens x0, so the iterates have shape (d,). When params has shape (d,), fit(X, params) has shape (n,), while your y has shape (n, 1). Thus the expression y - fit(X, params) has shape (n, n) due to NumPy's implicit broadcasting, and in consequence np.linalg.norm returns the Frobenius matrix norm instead of the Euclidean vector norm.
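You can watch the broadcasting do this with a minimal check (a throwaway snippet of mine, shapes chosen for illustration):

import numpy as np

y = np.zeros((5, 1))     # column vector, like b
pred = np.zeros(5)       # 1-D array, like fit(X, params)
print((y - pred).shape)  # (5, 5): the residual silently became a matrix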
Solution: change your cost function to:
def cost_function(params, X, y):
    return np.linalg.norm(y - fit(X, params.reshape(d, 1)))
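An equivalent fix that avoids reaching for the outer-scope d (my variation on the same idea) is to flatten the target instead, so both operands stay 1-D:

def cost_function(params, X, y):
    # y.ravel() has shape (n,), matching fit(X, params), so no broadcasting occurs
    return np.linalg.norm(y.ravel() - fit(X, params))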
Hello everyone.
I am a beginner in machine learning and have just started learning about gradient descent, but I have run into one big problem. The question is as follows:
Given the points [0,0], [1,1], [1,2], [2,1] and the model f = (a2)*x^2 + (a1)*x + a0, fit the coefficients.
Solving by hand, I got the answer [a2, a1, a0] = [-1, 5/2, 0], but I am finding it hard to reproduce that solution in Python code using gradient descent on these data.
In my case, I tried to write the gradient descent code in the easiest and fastest way I could think of, like this:
learningRate = 0.1
make a series of x values
initialize a2, a1, a0 to 1, 1, 1
partial derivatives for a2, a1, a0 (a2_p: 2x, a1_p: x, a0_p: 1)
gradient descent update, e.g. a2 = a2 - (learningRate)*( y - [(a2)*x^2 + (a1)*x + a0] )*(a2_p)
P.S. Honestly, I do not know what I should plug in for x and y, or for a2, a1, a0.
However, I get a wrong answer, with a different result each time.
So I would like a hint towards the correct equations or code sequence.
Thank you for reading this very basic question.
There are a few errors in your equations.
For the function f(x) = a2*x^2 + a1*x + a0, the partial derivatives with respect to a2, a1 and a0 are x^2, x and 1, respectively.
Suppose the cost function is (1/2)*(y - f(x))^2.
The partial derivative of the cost function with respect to ai is then -(y - f(x)) * (the partial derivative of f(x) with respect to ai), where i belongs to [0, 2].
So the gradient descent update is:
ai = ai + learning_rate*(y - f(x)) * (partial derivative of f(x) with respect to ai), where i belongs to [0, 2]
I hope this code helps:
# Training sample
sample = [(0, 0), (1, 1), (1, 2), (2, 1)]

# Our function => a2*x^2 + a1*x + a0
class Function():
    def __init__(self, a2, a1, a0):
        self.a2 = a2
        self.a1 = a1
        self.a0 = a0

    def eval(self, x):
        return self.a2*x**2 + self.a1*x + self.a0

    def partial_a2(self, x):
        return x**2

    def partial_a1(self, x):
        return x

    def partial_a0(self, x):
        return 1

# Initialise function
f = Function(1, 1, 1)

# Mean squared loss over the sample
def loss(sample, f):
    return sum([(y - f.eval(x))**2 for x, y in sample]) / len(sample)

epochs = 100000
lr = 0.0005

# To record the best values (min_loss must live outside the epoch loop,
# otherwise it resets every epoch and "best" just means "latest")
min_loss = float("inf")
best_values = (0, 0, 0)

for epoch in range(epochs):
    for x, y in sample:
        # Gradient descent step for each coefficient
        f.a2 = f.a2 + lr*(y - f.eval(x))*f.partial_a2(x)
        f.a1 = f.a1 + lr*(y - f.eval(x))*f.partial_a1(x)
        f.a0 = f.a0 + lr*(y - f.eval(x))*f.partial_a0(x)
    # Storing the best values
    epoch_loss = loss(sample, f)
    if min_loss > epoch_loss:
        min_loss = epoch_loss
        best_values = (f.a2, f.a1, f.a0)

print("Loss:", min_loss)
print("Best values (a2,a1,a0):", best_values)
Output:
Loss: 0.12500004789165717
Best values (a2,a1,a0): (-1.0001922562970325, 2.5003368582261487, 0.00014521557599919338)
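As a quick sanity check (my addition, not part of the original answer): the hand-solved coefficients are exactly the least-squares solution, which np.polyfit recovers in one call. No quadratic can pass through both (1,1) and (1,2), so the best compromise puts f(1) = 1.5, leaving a mean squared loss of (0.5^2 + 0.5^2)/4 = 0.125, which matches the printed Loss above.

import numpy as np

xs = [0, 1, 1, 2]
ys = [0, 1, 2, 1]
print(np.polyfit(xs, ys, 2))  # ~[-1.   2.5  0. ], i.e. (a2, a1, a0)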
I am learning to implement a Factorization Machine in PyTorch, which requires some feature-crossing operations.
For example, given three features [A, B, C] whose embeddings are [vA, vB, vC], the feature crossings are [vA·vB], [vA·vC], [vB·vC].
I know this operation can be simplified by the formula below and implemented with matrix operations, but that only gives the final result, i.e. a single value.
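(For reference, the simplification usually meant here is the standard FM identity sum_{i<j} <v_i, v_j> = 0.5*(||sum_i v_i||^2 - sum_i ||v_i||^2); a sketch of it in PyTorch, assuming feature_emb has the shape noted below:

sum_then_square = feature_emb.sum(dim=1) ** 2                      # [batch, emb]
square_then_sum = (feature_emb ** 2).sum(dim=1)                    # [batch, emb]
g_feature = 0.5 * (sum_then_square - square_then_sum).sum(dim=1)   # [batch]

This computes the same g_feature as the double loop below without materialising the pairs.)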
The question is: how can I get all of the cross_vec values in the following without doing a FOR loop?
note: size of "feature_emb" is [batch_size x feature_len x embedding_size]
g_feature = 0
for i in range(self.feature_len):
    for j in range(self.feature_len):
        if j <= i:
            continue
        cross_vec = feature_emb[:, i, :] * feature_emb[:, j, :]
        g_feature += torch.sum(cross_vec, dim=1)
You can do
cross_vec = (feature_emb[:, None, ...] * feature_emb[..., None, :]).sum(dim=-1)
This should give you cross_vec of shape (batch_size, feature_len, feature_len).
Alternatively, you can use torch.bmm
cross_vec = torch.bmm(feature_emb, feature_emb.transpose(1, 2))
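If you then need the scalar g_feature from your loop (the sum over unordered pairs i < j), you can mask the upper triangle of that matrix; a sketch of the idea, assuming feature_len is in scope:

# keep only entries with j > i, then sum them per batch element
mask = torch.ones(feature_len, feature_len).triu(diagonal=1).bool()
g_feature = cross_vec[:, mask].sum(dim=1)  # shape (batch_size,)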
For a classification problem using BernoulliNB, how do I calculate the joint log-likelihood? It is to be calculated by the formula below, where y(d) is the array of actual outputs (not predicted values) and x(d) is the data set of features:
JLL = sum over d of log P(x(d), y(d))
I read this answer and read the documentation, but neither exactly served my purpose. Can somebody please help?
Looking at the code, it appears there is a hidden, undocumented ._joint_log_likelihood(self, X) method in BernoulliNB which computes the joint log-likelihood.
Its implementation is somewhat consistent with what you ask.
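For intuition, here is my sketch of what that method returns, based on reading the scikit-learn source (it is not a public API): for every sample it evaluates log P(x, y=c) = log P(y=c) + sum_f log P(x_f | y=c) for each class c, so the result has shape (n_samples, n_classes). Roughly:

import numpy as np

def bernoulli_jll(X, model):
    # log P(x_f = 1 | c) and log P(x_f = 0 | c) from the fitted model
    log_p = model.feature_log_prob_
    log_not_p = np.log(1.0 - np.exp(log_p))
    # (n_samples, n_classes): log prior plus per-feature Bernoulli log-likelihoods
    return model.class_log_prior_ + X @ (log_p - log_not_p).T + log_not_p.sum(axis=1)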
The solution is then to pick out, for each sample, the entry matching its actual label y(d):
if the label is True, take the [1] column, i.e. data[idx][1];
otherwise take the [0] column, i.e. data[idx][0], and sum these up.
The first block of code below calls the _joint_log_likelihood function.
The second block of code defines the counting helper.
The third block of code applies them to a Bernoulli Naive Bayes dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Xs, ys, r, alphas, train_jil and test_jil come from the asker's setup
train, test, train_labels, test_labels = train_test_split(Xs[0], ys[0],
                                                          test_size=1./3, random_state=r)
naive = BernoulliNB(alpha=10**-7)
model = naive.fit(train, train_labels)
joint_log_train = model._joint_log_likelihood(train)
l = [np.append(x, y) for x, y in zip(train, train_labels)]

def count(data, label):
    # sum the joint log-likelihood entry matching each sample's true label
    x = 0
    for idx, l in enumerate(label):
        if l == True:
            x += data[idx][1]
        else:
            x += data[idx][0]
    return x
# Write your code below this line.
for i, (x, y) in enumerate(zip(Xs, ys)):
    train, test, train_labels, test_labels = train_test_split(x, y, test_size=1./3, random_state=r)
    for j, a in enumerate(alphas):
        naive = BernoulliNB(alpha=a)
        model = naive.fit(train, train_labels)
        joint_log_train = model._joint_log_likelihood(train)
        joint_log_test = model._joint_log_likelihood(test)
        train_jil[i][j] = count(joint_log_train, train_labels)
        test_jil[i][j] = count(joint_log_test, test_labels)
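As an aside (my addition), the count helper can be vectorized: it just selects the joint log-likelihood of the true class for each sample and sums, assuming the labels are 0/1 or boolean:

total_jll = joint_log_train[np.arange(len(train_labels)), train_labels.astype(int)].sum()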
Here is part of the get_updates code from Keras's SGD (source):
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))
Observation:
Since the moments variable is a list of zero tensors, each m in the for loop is a zero tensor with the shape of p. Then self.momentum * m, in the first line of the loop, is just a scalar multiplied by a zero tensor, which results in a zero tensor.
Question:
What am I missing here?
Yes, during the first training step m is equal to 0. But m is then updated with the current v value by this line:
self.updates.append(K.update(m, v))
So at the next step you'll have:
v = self.momentum * old_velocity - lr * g # velocity
where old_velocity is the previous value of v.
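To see the accumulation concretely, here is a minimal NumPy re-enactment of those two lines over a few steps (my sketch, with a made-up constant gradient):

import numpy as np

momentum, lr = 0.9, 0.1
m = np.zeros(1)  # the moments slot, starts at zero like K.zeros(shape)
g = np.ones(1)   # pretend the gradient is constant

for step in range(4):
    v = momentum * m - lr * g  # velocity
    m = v                      # K.update(m, v): m carries v into the next step
    print(step, v)             # -0.1, -0.19, -0.271, -0.3439

The velocity grows in magnitude towards -lr*g/(1 - momentum) = -1.0, so it is only zero on the very first step.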
I am trying to implement a custom Keras objective function from 'Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression', Narihira et al.
It is the sum of equations (4) and (6) from the paper: a scale-invariant L2 loss (4) plus a gradient L2 loss (6), matching the comments in the code below. Y* is the ground truth, Y a prediction map, and y = Y* - Y.
This is my code:
def custom_objective(y_true, y_pred):
    # Eq. (4): scale-invariant L2 loss
    y = y_true - y_pred
    h = 0.5  # lambda
    term1 = K.mean(K.sum(K.square(y)))
    term2 = K.square(K.mean(K.sum(y)))
    sca = term1 - h*term2
    # Eq. (6): gradient L2 loss
    gra = K.mean(K.sum((K.square(K.gradients(K.sum(y[:,1]), y)) + K.square(K.gradients(K.sum(y[1,:]), y)))))
    return (sca + gra)
However, I suspect that equation (6) is not correctly implemented, because the results are not good. Am I computing this right?
Thank you!
Edit:
I am trying to approximate (6) by convolving with Prewitt filters. It works when my input is a chunk of images, i.e. y[batch_size, channels, rows, cols], but not with y_true and y_pred (which are of type TensorType(float32, 4D)).
My code:
def cconv(image, g_kernel, batch_size):
    g_kernel = theano.shared(g_kernel)
    M = T.dtensor3()
    conv = theano.function(
        inputs=[M],
        outputs=conv2d(M, g_kernel, border_mode='full'),
    )
    accum = 0
    for curr_batch in range(batch_size):
        accum = accum + conv(image[curr_batch])
    return accum / batch_size

def gradient_loss(y_true, y_pred):
    y = y_true - y_pred
    batch_size = 40
    # Direction i
    pw_x = np.array([[-1,0,1],[-1,0,1],[-1,0,1]]).astype(np.float64)
    g_x = cconv(y, pw_x, batch_size)
    # Direction j
    pw_y = np.array([[-1,-1,-1],[0,0,0],[1,1,1]]).astype(np.float64)
    g_y = cconv(y, pw_y, batch_size)
    gra_l2_loss = K.mean(K.square(g_x) + K.square(g_y))
    return (gra_l2_loss)
The crash occurs at:
accum = accum + conv(image[curr_batch])
...and the error description is the following:
*** TypeError: ('Bad input argument to theano function with name "custom_models.py:836" at index 0 (0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')
How can I use y (y_true - y_pred) as a numpy array, or how can I solve this issue?
SIL2
term1 = K.mean(K.square(y))
term2 = K.square(K.mean(y))
[...]
One mistake spread across the code: whenever you see (1/n * sum(...)) in the equations, it is a mean, not the mean of a sum.
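Putting that together, the scale-invariant term would read something like this (my consolidation of the two corrected terms, keeping your h = 0.5 for lambda):

def scale_invariant_l2(y_true, y_pred, h=0.5):
    y = y_true - y_pred
    # (1/n)*sum(y^2) is a mean; (1/n * sum(y))^2 is a squared mean
    return K.mean(K.square(y)) - h * K.square(K.mean(y))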
Gradient
After reading your comment and giving it more thought, I think there is a confusion about the gradient. At least I got confused.
There are two ways of interpreting the gradient symbol:
1. The gradient of y with respect to the parameters of your model (usually the weights of the neural net). In previous edits I started to write in this direction, because that's the sort of gradient used to train the model (e.g. gradient descent). But I think I was wrong.
2. The pixel-intensity gradient of a picture, as you mentioned in your comment: the difference between each pixel and its neighbour in each direction. In that case I guess you have to translate the example you gave into Keras, as sketched after this summary.
To sum up, K.gradients() and numpy.gradient() are not used in the same way, because numpy implicitly considers (i, j) (the row and column indices) as the two input variables, while when you feed a 2D image to a neural net, every single pixel is an input variable. Hope I'm clear.
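If it's the second interpretation you want inside a Keras loss, a common trick is to take finite differences by slicing shifted copies of the tensor, which avoids both K.gradients() and the Theano-function crash above; a minimal sketch (mine, assuming the channels-first layout y[batch_size, channels, rows, cols] from the question):

def gradient_l2(y_true, y_pred):
    y = y_true - y_pred
    # differences between neighbouring pixels, one direction at a time
    d_rows = y[:, :, 1:, :] - y[:, :, :-1, :]
    d_cols = y[:, :, :, 1:] - y[:, :, :, :-1]
    return K.mean(K.square(d_rows)) + K.mean(K.square(d_cols))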