How to minimize lasso loss function with scipy.minimize? - machine-learning

Main issue: why are the coefficients of Lasso regression not shrunk to zero when the minimization is done with scipy.minimize?
I am trying to build a Lasso model using scipy.minimize. However, it only works when alpha is zero (i.e. it behaves like plain squared error). When alpha is non-zero, it returns a worse result (higher loss) and still none of the coefficients is zero.
I know that the Lasso penalty is not differentiable, but I tried the Powell optimizer, which should handle a non-differentiable loss (I also tried BFGS, which is supposed to cope with non-smooth functions). Neither optimizer worked.
To test this, I created a dataset where y is random (provided here so it is reproducible), the first feature of X is exactly y*.5, and the other four features are random (also provided here for reproducibility). I would expect the algorithm to shrink the random coefficients to zero and keep only the first one, but that is not happening.
For the lasso loss function I am using the formula from this paper (figure 1, first page).
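(For reference, as implemented in LossLasso below, that formula amounts to loss(w) = ||y - Xw||^2 + alpha * sum(|w_i|), i.e. the squared error plus an L1 penalty on the weights.)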
My code is the following:
from scipy.optimize import minimize
import numpy as np
class Lasso:
    def _pred(self, X, w):
        return np.dot(X, w)

    def LossLasso(self, weights, X, y, alpha):
        w = weights
        yp = self._pred(X, w)
        loss = np.linalg.norm(y - yp)**2 + alpha * np.sum(abs(w))
        return loss

    def fit(self, X, y, alpha=0.0):
        initw = np.random.rand(X.shape[1])  # initial weights
        res = minimize(self.LossLasso,
                       initw,
                       args=(X, y, alpha),
                       method='Powell')
        return res
if __name__ == '__main__':
    y = np.array([1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 1.,
                  1., 1., 0.])
    X_informative = y.reshape(20, 1) * .5
    X_noninformative = np.array([[0.94741352, 0.892991  , 0.29387455, 0.30517762],
                                 [0.22743465, 0.66042825, 0.2231239 , 0.16946974],
                                 [0.21918747, 0.94606854, 0.1050368 , 0.13710866],
                                 [0.5236064 , 0.55479259, 0.47711427, 0.59215551],
                                 [0.07061579, 0.80542011, 0.87565747, 0.193524  ],
                                 [0.25345866, 0.78401146, 0.40316495, 0.78759134],
                                 [0.85351906, 0.39682136, 0.74959904, 0.71950502],
                                 [0.383305  , 0.32597392, 0.05472551, 0.16073454],
                                 [0.1151415 , 0.71683239, 0.69560523, 0.89810466],
                                 [0.48769347, 0.58225877, 0.31199272, 0.37562258],
                                 [0.99447288, 0.14605177, 0.61914979, 0.85600544],
                                 [0.78071238, 0.63040498, 0.79964659, 0.97343972],
                                 [0.39570225, 0.15668933, 0.65247826, 0.78343458],
                                 [0.49527699, 0.35968554, 0.6281051 , 0.35479879],
                                 [0.13036737, 0.66529989, 0.38607805, 0.0124732 ],
                                 [0.04186019, 0.13181696, 0.10475994, 0.06046115],
                                 [0.50747742, 0.5022839 , 0.37147486, 0.21679859],
                                 [0.93715221, 0.36066077, 0.72510501, 0.48292022],
                                 [0.47952644, 0.40818585, 0.89012395, 0.20286356],
                                 [0.30201193, 0.07573086, 0.3152038 , 0.49004217]])
    X = np.concatenate([X_informative, X_noninformative], axis=1)

    # alpha zero
    clf = Lasso()
    print(clf.fit(X, y, alpha=0.0))

    # alpha non-zero
    clf = Lasso()
    print(clf.fit(X, y, alpha=0.5))
While the output for alpha zero is correct:
fun: 2.1923913945084075e-24
message: 'Optimization terminated successfully.'
nfev: 632
nit: 12
status: 0
success: True
x: array([ 2.00000000e+00, -1.49737205e-13, -5.49916821e-13, 8.87767676e-13,
1.75335824e-13])
the output for non-zero alpha has a much higher loss, and none of the coefficients is shrunk to zero as I expected:
fun: 0.9714385008821652
message: 'Optimization terminated successfully.'
nfev: 527
nit: 6
status: 0
success: True
x: array([ 1.86644474e+00, 1.63986381e-02, 2.99944361e-03, 1.64568796e-12,
-6.72908469e-09])
Why are the coefficients of the random features not shrunk to zero, and why is the loss so high?
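(For comparison, not in the original post: sklearn's coordinate-descent Lasso can be run on the same data. Note that sklearn minimizes (1/(2*n_samples))*||y - Xw||^2 + alpha*||w||_1, so the alpha used above has to be rescaled, and the intercept disabled, to match the objective in LossLasso. A sketch, assuming X and y from the script above are in scope:)
from sklearn.linear_model import Lasso as SkLasso

n_samples = X.shape[0]                      # 20
sk = SkLasso(alpha=0.5 / (2 * n_samples),   # alpha 0.5 in the scipy objective ~ 0.0125 here
             fit_intercept=False)
sk.fit(X, y)
print(sk.coef_)                             # coefficients under the equivalent objective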

Is this a viable option:
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV
y = np.array([1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 1., 1., 1., 0.])
X_informative = y.reshape(20, 1) * .5
X_noninformative = np.array([[0.94741352, 0.892991 , 0.29387455, 0.30517762],
[0.22743465, 0.66042825, 0.2231239 , 0.16946974],
[0.21918747, 0.94606854, 0.1050368 , 0.13710866],
[0.5236064 , 0.55479259, 0.47711427, 0.59215551],
[0.07061579, 0.80542011, 0.87565747, 0.193524 ],
[0.25345866, 0.78401146, 0.40316495, 0.78759134],
[0.85351906, 0.39682136, 0.74959904, 0.71950502],
[0.383305 , 0.32597392, 0.05472551, 0.16073454],
[0.1151415 , 0.71683239, 0.69560523, 0.89810466],
[0.48769347, 0.58225877, 0.31199272, 0.37562258],
[0.99447288, 0.14605177, 0.61914979, 0.85600544],
[0.78071238, 0.63040498, 0.79964659, 0.97343972],
[0.39570225, 0.15668933, 0.65247826, 0.78343458],
[0.49527699, 0.35968554, 0.6281051 , 0.35479879],
[0.13036737, 0.66529989, 0.38607805, 0.0124732 ],
[0.04186019, 0.13181696, 0.10475994, 0.06046115],
[0.50747742, 0.5022839 , 0.37147486, 0.21679859],
[0.93715221, 0.36066077, 0.72510501, 0.48292022],
[0.47952644, 0.40818585, 0.89012395, 0.20286356],
[0.30201193, 0.07573086, 0.3152038 , 0.49004217]])
X = np.concatenate([X_informative,X_noninformative], axis=1)
_lasso = Lasso()
_lasso_parms = {'alpha': [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
_lasso_regressor = GridSearchCV(_lasso, _lasso_parms, scoring='neg_mean_squared_error', cv=5)
print('_lasso_regressor.fit(X, y)')
print(_lasso_regressor.fit(X, y))
print("\n=========================================\n")
print('lasso_regressor.best_params_: ')
print(_lasso_regressor.best_params_)
print("\n")
print('lasso_regressor.best_score_: ')
print(_lasso_regressor.best_score_)
print("\n=========================================\n")
_ridge = Ridge()
_ridge_parms = {'alpha': [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
_ridge_regressor = GridSearchCV(_ridge, _ridge_parms, scoring='neg_mean_squared_error', cv=5)
print('_ridge_regressor.fit(X, y)')
print(_ridge_regressor.fit(X, y))
print("\n=========================================\n")
print('_ridge_regressor.best_params_: ')
print(_ridge_regressor.best_params_)
print("\n")
print('_ridge_regressor.best_score_: ')
print(_ridge_regressor.best_score_)
print("\n=========================================\n")
and the output:

Have you tried running the lasso loss minimization with other data sets? With the data you've provided, the regularization (the L1 penalty) accounts for almost the entirety of the loss function's value. As you increase alpha, you increase the magnitude of the loss function many orders of magnitude above what it returns at the true optimal coefficient of 2.0.
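(To make that concrete: evaluating the loss at the alpha=0 solution w = [2, 0, 0, 0, 0] with alpha=0.5 gives a data term of 0 but a penalty of 0.5 * 2 = 1.0, which is already larger than the 0.9714 the optimizer reports. So the non-zero-alpha result is consistent with the lasso objective: under a penalty this large relative to the data term, shrinking the first coefficient below 2 lowers the total loss. A quick check, assuming the Lasso class and the X, y data from the question are in scope:)
clf = Lasso()
w_true = np.array([2., 0., 0., 0., 0.])        # exact alpha=0 solution
print(clf.LossLasso(w_true, X, y, alpha=0.5))  # data term 0 + penalty 1.0 = 1.0
res = clf.fit(X, y, alpha=0.5)
print(res.fun, res.x)                          # ~0.97: lower loss, first coefficient shrunk below 2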

Related

The data (tensor) on the GPU is automatically changed to zero in multiprocessing; how to keep the value of the data on the GPU unchanged in multiprocessing?

In the code, self.x is set to [1., 1., 1., 1., 1.] and allocated on the GPU. However, it changes to [0., 0., 0., 0., 0.] when it is printed in the subprocess. If the line 'self.x = self.x.to(device)' is deleted, the printed value is correct. Why does this happen, and how can the value of the data on the GPU be kept unchanged in multiprocessing?
import torch
import torch.multiprocessing as mp
class Set:
    def __init__(self, device):
        self.x = torch.ones(5)
        self.x = self.x.to(device)

def solve(data):
    print(data.x)

def func():
    jobs = []
    data = Set(torch.device('cuda:0'))
    for i in range(3):
        p = mp.Process(target=solve, args=(data,))
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()

if __name__ == '__main__':
    func()
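(No answer is recorded here, but one documented constraint seems relevant: PyTorch's multiprocessing notes state that the CUDA runtime does not support the fork start method, so subprocesses that touch CUDA tensors need the spawn or forkserver start method. Below is a sketch of the same script with the start method set explicitly; whether it resolves this exact symptom is untested here.)
import torch
import torch.multiprocessing as mp

class Set:
    def __init__(self, device):
        self.x = torch.ones(5).to(device)

def solve(data):
    print(data.x)  # expected: tensor([1., 1., 1., 1., 1.], device='cuda:0')

def func():
    data = Set(torch.device('cuda:0'))
    jobs = [mp.Process(target=solve, args=(data,)) for _ in range(3)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()

if __name__ == '__main__':
    # CUDA requires the 'spawn' (or 'forkserver') start method in child processes.
    mp.set_start_method('spawn')
    func()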

Is torchmetrics mean average precision behaving as expected? Two images assessed individually each score higher than the combined result

I'm trying to evaluate the performance of an object detection model using torchmetrics mean average precision. However, I'm getting this odd result:
When I evaluate the metric for image A, I get 'map': 0.7891, 'map_50': 1, 'map_75': 1
When I evaluate the metric for image B, I get 'map': 0.7611, 'map_50': 1, 'map_75': 1
When I run the metric for a list containing information about both image A and image B, I expect the performance to be somewhere between 0.7891 and 0.7611. Instead, I get 'map':0.7197, 'map_50': 0.9293,'map_75': 0.9293.
Is my reasoning about how mean average precision for multiple images works flawed, or is something going wrong here?
Here are the specific values I'm using:
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision
include_A = True
include_B = True
truth_eval = []
pred_eval = []
if include_A:
    truth_eval.append({'boxes': torch.tensor([[562., 180., 694., 227.],
                                              [322., 189., 467., 251.],
                                              [770., 167., 954., 240.]], dtype=torch.float64),
                       'labels': torch.tensor([1., 1., 1.], dtype=torch.float64)})
    pred_eval.append({'boxes': torch.tensor([[ 769.4067,  163.4746,  949.1498,  240.5295],
                                             [ 574.9888,  178.3064,  697.7145,  226.6653],
                                             [ 333.8886,  188.9244,  471.6525,  250.9485],
                                             [1037.9913,   20.9808, 1110.0175,  118.0723]]),
                      'scores': torch.tensor([1., 1., 1., 1.]),
                      'labels': torch.tensor([1., 1., 1., 1.])})
if include_B:
    truth_eval.append({'boxes': torch.tensor([[554., 180., 688., 228.],
                                              [309., 190., 457., 253.],
                                              [810., 166., 999., 239.]], dtype=torch.float64),
                       'labels': torch.tensor([1., 1., 1.], dtype=torch.float64)})
    pred_eval.append({'boxes': torch.tensor([[ 567.4130,  177.8370,  691.1442,  227.1231],
                                             [ 810.1462,  163.8314,  994.3475,  241.7314],
                                             [ 321.8360,  190.0476,  464.3474,  251.8763],
                                             [1046.0631,   18.7516, 1111.9734,  104.2167],
                                             [ 540.7168,   76.8780,  564.6191,  133.8904]]),
                      'scores': torch.tensor([1., 1., 1., 1., 1.]),
                      'labels': torch.tensor([1., 1., 1., 1., 1.])})
metric = MeanAveragePrecision()
metric.update(pred_eval, truth_eval)
metric.compute()
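(A quick way to reproduce the comparison with the data above is to feed the metric one image at a time and then both together. As far as I understand, COCO-style mAP pools all detections across the evaluated images when building each class's precision-recall curve, so the combined value is not an average of per-image mAPs and need not lie between them. Sketch, assuming pred_eval/truth_eval above contain both images:)
for name, idxs in [('A only', [0]), ('B only', [1]), ('A + B', [0, 1])]:
    m = MeanAveragePrecision()
    m.update([pred_eval[i] for i in idxs], [truth_eval[i] for i in idxs])
    print(name, m.compute()['map'])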

Assign custom weight in pytorch

I'm trying to assign custom weights to my PyTorch model, but it doesn't work correctly.
import torch
import torch.nn as nn

class Mod(nn.Module):
    def __init__(self):
        super(Mod, self).__init__()
        self.linear = nn.Sequential(
            nn.Linear(1, 5)
        )

    def forward(self, x):
        x = self.linear(x)
        return x

mod = Mod()
mod.linear.weight = torch.tensor([1., 2., 3., 4., 5.], requires_grad=True)
mod.linear.bias = torch.nn.Parameter(torch.tensor(0., requires_grad=True))
print(mod.linear.weight)
>>> tensor([1., 2., 3., 4., 5.], requires_grad=True)
output = mod(torch.ones(1))
print(output)
>>> tensor([ 0.2657, 0.3220, -0.0726, -1.6987, 0.3945], grad_fn=<AddBackward0>)
I expected the output to be [1., 2., 3., 4., 5.], but it doesn't work as expected. What am I missing here?
You are not updating the weights in the right place. Your self.linear is not an nn.Linear layer, but rather an nn.Sequential container. Your nn.Linear is the first layer in the Sequential. To access it you need to index into self.linear:
with torch.no_grad():
    mod.linear[0].weight.data = torch.tensor([1., 2., 3., 4., 5.], requires_grad=True)[:, None]
    mod.linear[0].bias.data = torch.zeros((5, ), requires_grad=True)  # bias is not a scalar here
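(With that change, the forward pass on a ones input should reproduce the assigned weights; a quick check, reusing the mod instance from the question:)
output = mod(torch.ones(1))
print(output)  # should now print tensor([1., 2., 3., 4., 5.], ...) since the bias is zero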

Drake: Why are there multiple geometryId for the same link in a MultibodyPlant?

I have 6 real links (named left distal, left proximal, right distal, right proximal, palm, and ball) plus 2 dummy links (so the ball can move freely in the x-y plane) in my URDFs. I want to know which GeometryId corresponds to which link so I can check for contact. However, there are far more GeometryIds registered than links in the system. So I tried printing the body names through the GeometryIds and found multiple GeometryIds with the same body name. Then, from Drake's render_multibody_plant.ipynb tutorial, I saw the line geometry_label = inspector.GetPerceptionProperties(geometry_id).GetProperty("label", "id"), so I printed the same for my GeometryIds. However, some of them return None. When I print int(geometry_label) for those that are not None, I get exactly 6 distinct numbers (but the same number repeated for some of the links!).
I don't understand what those extra GeometryIds are for, or how to find the GeometryIds that I actually care about.
Here is relevant code:
builder = DiagramBuilder()
# plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.00001)
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.)
file_name = FindResource("models/myhand.urdf")
gripper_model = Parser(plant).AddModelFromFile(file_name)
file_name = FindResource("models/sphere.urdf")
object_model = Parser(plant).AddModelFromFile(file_name)
scene_graph_context = scene_graph.AllocateContext()
plant.Finalize()
plant.set_name('hand')
controller = ConstantVectorSource([-7, 7])
torque_system = builder.AddSystem(controller)
builder.Connect(torque_system.get_output_port(0), plant.get_actuation_input_port())
# Setup visualization
T_xy = np.array([[ 1., 0., 0., 0.], [ 0., 1., 0., 0.], [ 0., 0., 0., 1.]])
T_xz = np.array([[ 1., 0., 0., 0.], [ 0., 0., 1., 0.], [ 0., 0., 0., 1.]])
visualizer = builder.AddSystem(
PlanarSceneGraphVisualizer(scene_graph, T_VW=T_xy, xlim=[-0.3, 0.3], ylim=[-0.3, 0.3], show=plt_is_interactive))
builder.Connect(scene_graph.get_pose_bundle_output_port(),
visualizer.get_input_port(0))
diagram = builder.Build()
diagram_context = diagram.CreateDefaultContext()
scene_graph_context = scene_graph.GetMyContextFromRoot(diagram_context)
plant_context = plant.GetMyContextFromRoot(diagram_context)
# Set up a simulator to run this diagram
simulator = Simulator(diagram)
context = simulator.get_mutable_context()
# simulate from zero to duration
simulator.Initialize()
# Simulate
duration = 0.5 if get_ipython() else 0.1 # sets a shorter duration during testing
context.SetTime(0.0)
AdvanceToAndVisualize(simulator, visualizer, duration)
query_object = scene_graph.get_query_output_port().Eval(scene_graph_context)
inspector = query_object.inspector()
for geometry_id in inspector.GetAllGeometryIds():
    body = plant.GetBodyFromFrameId(inspector.GetFrameId(geometry_id))
    print(body.name())
    geometry_label = inspector.GetPerceptionProperties(geometry_id)
    if geometry_label is not None:
        print(int(geometry_label.GetProperty("label", "id")))
Here's the output of the print statements:
ball
ball
6
right_distal
right_distal
5
right_distal
5
left_distal
left_distal
4
left_distal
4
right_proximal
right_proximal
3
right_proximal
3
left_proximal
left_proximal
2
left_proximal
2
palm
palm
1
Note: I am running PyDrake in a Jupyter notebook.
Specifically, each link can register collision and visual geometry. Even if the two geometries specify shapes with identical parameters, those shapes are not consolidated to a single geometry. So, you'll get one GeometryId for each collision and visual geometry associated with a link.
Those declared as visual in your URDF will have non-None PerceptionProperties (and IllustrationProperties), but their ProximityProperties will be None. Conversely, those declared as collision geometries will have non-None ProximityProperties but None for the other two types.
I would expect to have a separate geometry ID for each visual element in your body (not one per body). Does your URDF have multiple visual elements in each body?
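(Building on that distinction, one way to keep only the collision geometries and map them back to bodies is sketched below, reusing the inspector and plant from the code above; it is a sketch, not a snippet from the answers:)
# Collision geometries have non-None ProximityProperties; visual-only geometries do not.
for geometry_id in inspector.GetAllGeometryIds():
    if inspector.GetProximityProperties(geometry_id) is not None:
        body = plant.GetBodyFromFrameId(inspector.GetFrameId(geometry_id))
        print(geometry_id, body.name())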
The "Body frames and SceneGraph frames" documentation for MultibodyPlant appears to have the answer you're looking for:
Given a GeometryId, SceneGraph cannot report what body it is affixed to. It can only report the SceneGraph alias frame F. But the following idiom can report the body:
const MultibodyPlant<T>& plant = ...;
const SceneGraphInspector<T>& inspector = ...;
const GeometryId g_id = id_from_some_query;
const FrameId f_id = inspector.GetFrameId(g_id);
const Body<T>* body = plant.GetBodyFromFrameId(f_id);
See documentation of geometry::SceneGraphInspector on where to get an inspector.

Probability prediction method of KNeighborsClassifier returns only 0 and 1

Can anyone tell me what's the problem with my code?
Why can I predict probabilities on the iris dataset using LogisticRegression, while KNeighborsClassifier gives me only 0 or 1 instead of graded results like the ones LogisticRegression yields?
from sklearn.datasets import load_iris
from sklearn import metrics
iris = load_iris()
X = iris.data
y = iris.target
# skf is a cross-validation splitter (e.g. StratifiedKFold) defined elsewhere;
# X_total, y_total correspond to X, y above.
for train_index, test_index in skf:
    X_train, X_test = X_total[train_index], X_total[test_index]
    y_train, y_test = y_total[train_index], y_total[test_index]
from sklearn.linear_model import LogisticRegression
ln = LogisticRegression()
ln.fit(X_train,y_train)
ln.predict_proba(X_test)[:,1]
array([ 0.18075722,  0.08906078,  0.14693156,  0.10467766,  0.14823032,
        0.70361962,  0.65733216,  0.77864636,  0.67203114,  0.68655163,
        0.25219798,  0.3863194 ,  0.30735105,  0.13963637,  0.28017798])
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, algorithm='ball_tree', metric='euclidean')
knn.fit(X_train, y_train)
knn.predict_proba(X_test)[0:10,1]
array([ 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])
Because KNN has a very limited concept of probability. Its estimate is simply the fraction of votes among the nearest neighbours. Increase the number of neighbours to 15 or 100, or query a point near the decision boundary, and you will see more diverse results. Currently your points simply always have 5 neighbours of the same label (hence probability 0 or 1).
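(A self-contained illustration of that point on iris, not the asker's exact split; with larger k the estimates are vote fractions in multiples of 1/k, so you typically see intermediate values rather than only 0 and 1:)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (5, 15, 50):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.predict_proba(X_test)[:3])  # probabilities are multiples of 1/k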
Here I have a KNN model, model_knn, built with sklearn:
result = {}
model_classes = model_knn.classes_
predicted = model_knn.predict(word_average)     # word_average: feature vector(s) from the author's own pipeline
score = model_knn.predict_proba(word_average)
index = np.where(model_classes == predicted[0])[0][0]
result["predicted"] = predicted[0]
result["score"] = score[0][index]               # probability assigned to the predicted class
