How to linearize a quadratic objective function

I have an optimization problem. The objective is defined as follows:
\max \sum_{k=1} \sum_{t=1} (r_k(t))^2
The constraints are linear.
How to linearize this objective function?

This is a non-convex QP: maximizing a sum of squares means maximizing a convex function. A linearization based on the KKT conditions is possible, and it gives you a linear MIP. See link
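As a rough sketch of how such a linearization can go, assume the constraints are written as $Ax \le b$ with a bounded feasible region, and let $x$ stack all the $r_k(t)$. The symbols $A$, $b$, $\lambda$, $z$ and the constant $M$ are introduced here for illustration and do not come from the question; $M$ is assumed to be a valid upper bound on the multipliers and on the constraint slacks.

```latex
\begin{align*}
&\text{Problem:} && \max_x \; x^\top x \quad \text{s.t. } Ax \le b \\
&\text{KKT conditions:} && 2x = A^\top \lambda, \quad \lambda \ge 0, \quad Ax \le b, \quad \lambda_i \, (b - Ax)_i = 0 \\
&\text{At any KKT point:} && x^\top x = \tfrac{1}{2}\, x^\top A^\top \lambda
      = \tfrac{1}{2}\, \lambda^\top (Ax) = \tfrac{1}{2}\, b^\top \lambda \\
&\text{Linear MIP:} && \max_{x,\lambda,z} \; \tfrac{1}{2}\, b^\top \lambda \\
& && \text{s.t. } 2x = A^\top \lambda, \quad Ax \le b, \\
& && \phantom{\text{s.t. }} 0 \le \lambda_i \le M z_i, \quad (b - Ax)_i \le M (1 - z_i), \quad z_i \in \{0, 1\}
\end{align*}
```

The last equality in the third row uses complementarity, $\lambda^\top (b - Ax) = 0$. Each binary $z_i$ decides whether constraint $i$ may be active ($z_i = 1$) or must have a zero multiplier ($z_i = 0$), so the complementarity products disappear and both the objective and the constraints are linear.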

Related

Why does the Silhouette_score require labels as input?

Why is it wrong to think that it only needs the data, given that it "outputs a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)"?
However, I also need to input the labels (which the function itself computes); so, why are the labels necessary to input?

how similar an object is to its own cluster
In order to compute the silhouette, you need to know to which cluster your samples belong.
Also:
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of.
You need the labels to know what "intra-cluster" and "nearest-cluster" mean.
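To make the dependence on labels concrete, here is a minimal sketch that computes the per-sample value (b - a) / max(a, b) by hand with Euclidean distances. It is not scikit-learn's implementation; the function name, the toy points, and the two labelings are made up for illustration, and it assumes at least two clusters.

```python
import math

def silhouette_samples_sketch(points, labels):
    """Per-sample silhouette (b - a) / max(a, b) with Euclidean distance."""
    # Group sample indices by cluster label: this is where labels are needed.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: use 0 by convention.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
sensible_labels = [0, 0, 1, 1]   # groups nearby points together
mixed_labels = [0, 1, 0, 1]      # pairs each point with a far-away one

mean_sensible = sum(silhouette_samples_sketch(points, sensible_labels)) / len(points)
mean_mixed = sum(silhouette_samples_sketch(points, mixed_labels)) / len(points)
```

The same four points get a high mean score under one labeling and a negative one under the other, so the data alone cannot determine the result.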

Silhouette_score is a metric for clustering quality, not a clustering algorithm. It considers both the inter-class and intra-class distance.
For that calculation to happen, you need to supply both the data and target labels (estimated by unsupervised methods like K-means).

support vector machines - solving for alphas

In SVM, this is our dual optimization objective, with the following constraints and with the alphas between 0 and C. How do I find the alphas given this objective and these constraints?
Also, please correct me if I am wrong somewhere.

Well, you want to maximize the Lagrangian of this optimization problem, right?
To build the dual, you set the partial derivatives of the Lagrangian with respect to the primal variables to zero, because they vanish at the optimum. In general this is only a necessary condition for a stationary point, but because your original problem is convex, it also gives the optimal solution.
To maximize over the alphas, you can use the SMO algorithm, as described here:
https://en.m.wikipedia.org/wiki/Sequential_minimal_optimization
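For orientation, here is a minimal sketch of a simplified SMO variant with a linear kernel. It picks the second multiplier at random instead of using the selection heuristics of the full algorithm, so it is a teaching sketch rather than a production solver. The function names, the toy data, and all default parameters are assumptions made for this example.

```python
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, max_sweeps=1000, seed=0):
    """Approximately maximize the SVM dual (linear kernel).

    Every update keeps 0 <= alpha_i <= C and sum_i alpha_i * y_i = 0.
    """
    rng = random.Random(seed)
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0

    def f(i):
        # Decision value for training sample i under the current alphas.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = 0
    for _ in range(max_sweeps):
        if passes >= max_passes:
            break
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # Skip alpha_i unless it violates the KKT conditions.
            violates = (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)
            if not violates:
                continue
            j = rng.choice([k for k in range(n) if k != i])
            E_j = f(j) - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Bounds that keep both alphas in [0, C] on the equality-constraint line.
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
            if L >= H or eta >= 0.0:
                continue
            # Analytic optimum for alpha_j along the line, clipped to [L, H].
            new_j = min(H, max(L, a_j - y[j] * (E_i - E_j) / eta))
            if abs(new_j - a_j) < 1e-5:
                continue
            new_i = a_i + y[i] * y[j] * (a_j - new_j)
            b1 = b - E_i - y[i] * (new_i - a_i) * K[i][i] - y[j] * (new_j - a_j) * K[i][j]
            b2 = b - E_j - y[i] * (new_i - a_i) * K[i][j] - y[j] * (new_j - a_j) * K[j][j]
            if 0.0 < new_i < C:
                b = b1
            elif 0.0 < new_j < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            alpha[i], alpha[j] = new_i, new_j
            changed += 1
        # Count consecutive sweeps in which no multiplier moved.
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# Toy, linearly separable data (made up for this example).
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
alpha, b = simplified_smo(X, y)
w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(2)]
predictions = [1 if dot(w, x) + b >= 0 else -1 for x in X]
```

The key idea is that the equality constraint ties the multipliers together, so SMO changes two of them at a time and solves that two-variable subproblem in closed form.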
Good luck!

How does scikit-learn implement line search?

In this section of the documentation on gradient boosting, it says
Gradient Boosting attempts to solve this minimization problem numerically via steepest descent: The steepest descent direction is the negative gradient of the loss function evaluated at the current model F_{m-1} which can be calculated for any differentiable loss function:
Where the step length \gamma_m is chosen using line search:
I understand the purpose of the line search, but I don't understand the algorithm itself. I read through the source code, but it's still not clicking. An explanation would be much appreciated.

The implementation depends on which loss function you choose when you initialize a GradientBoostingClassifier instance (I use the classifier as an example; the regression part should be similar). The default loss function is 'deviance', and the corresponding optimization algorithm is implemented here. In the _update_terminal_region function, a simple Newton iteration with only one step is implemented.
Is this the answer you want?

I suspect the thing you find confusing is this: you can see where scikit-learn computes the negative gradient of the loss function and fits a base estimator to that negative gradient. It looks like the _update_terminal_region method is responsible for figuring out the step size, but you can't see anywhere it might be solving the line search minimization problem as written in the documentation.
The reason you can't find a line search happening is that, for the special case of decision tree regressors, which are just piecewise constant functions, the optimal solution is usually known. For example, if you look at the _update_terminal_region method of the LeastAbsoluteError loss function, you see that the leaves of the tree are given the value of the weighted median of the difference between y and the predicted value for the examples for which that leaf is relevant. This median is the known optimal solution.
To summarize what's happening, for each gradient descent iteration the following steps are taken:
Compute the negative gradient of the loss function at the current prediction.
Fit a DecisionTreeRegressor to the negative gradient. This fitting produces a tree with good splits for decreasing the loss.
Replace the values at the leaves of the DecisionTreeRegressor with values that minimize loss. These are usually computed from some simple known formula that takes advantage of the fact that the decision tree is just a piecewise constant function.
This method should be at least as good as what is described in the docs, but I think in some cases might not be identical to it.
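The three steps above can be sketched for the least-absolute-deviation loss as follows. This is not scikit-learn's code: the tree fit is skipped and replaced by a hypothetical leaf assignment (`leaf_of`), sample weights and the learning rate are left out, and the toy numbers are made up.

```python
from statistics import mean, median

def lad_loss(y, pred):
    """Least-absolute-deviation loss: sum of |y_i - pred_i|."""
    return sum(abs(yi - pi) for yi, pi in zip(y, pred))

def sign(v):
    return (v > 0) - (v < 0)

y = [1.0, 2.0, 10.0, 3.0, 4.0]
current = [3.0] * len(y)     # F_{m-1}(x_i): here a constant model
leaf_of = [0, 0, 1, 1, 1]    # hypothetical leaf index of each sample

# Step 1: the negative gradient of |y - F| with respect to F is sign(y - F).
neg_grad = [sign(yi - fi) for yi, fi in zip(y, current)]

def leaf_values(targets, reducer):
    """Reduce the targets that fall into each leaf to a single leaf value."""
    return {
        leaf: reducer([t for t, l in zip(targets, leaf_of) if l == leaf])
        for leaf in sorted(set(leaf_of))
    }

# Step 2 (assumed already done): a regression tree fitted to neg_grad would
# store the mean of its targets in each leaf.
raw_values = leaf_values(neg_grad, mean)

# Step 3: overwrite each leaf with the loss-minimizing value, which for the
# absolute loss is the median of the residuals y - F in that leaf.
residuals = [yi - fi for yi, fi in zip(y, current)]
optimal_values = leaf_values(residuals, median)

pred_raw = [fi + raw_values[l] for fi, l in zip(current, leaf_of)]
pred_optimal = [fi + optimal_values[l] for fi, l in zip(current, leaf_of)]

loss_before = lad_loss(y, current)
loss_raw = lad_loss(y, pred_raw)
loss_optimal = lad_loss(y, pred_optimal)
```

No iterative line search appears anywhere: because the tree is piecewise constant, the best value per leaf can be written down directly.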

From your comments it seems that it is the algorithm itself that is unclear, not the way scikit-learn implements it.
The notation in the Wikipedia article is slightly sloppy: one does not simply differentiate by a function evaluated at a point. Once you replace F_{m-1}(x_i) with \hat{y_i} and replace the partial derivative with a partial derivative evaluated at \hat{y}=F_{m-1}(x), things become clearer:
This would also remove x_{i} (sort of) from the minimization problem and shows the intent of line search - to optimize depending on the current prediction and not depending on the training set. Now, notice that:
Hence you're just minimizing:
So line search simply optimizes one degree of freedom you have (once you've found the right gradient direction) - the step size.
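As a small illustration of the step size being a one-dimensional problem, the sketch below finds gamma for the squared loss, where the minimizer of sum_i L(y_i, \hat{y_i} + gamma * h_i) has a closed form. The names `h` (the base learner's outputs) and `line_search_step`, and the toy numbers, are assumptions for this example; for other losses a numerical one-dimensional minimizer would take the place of the formula.

```python
def squared_loss(y, pred):
    """0.5 * sum of squared errors."""
    return 0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, pred))

def line_search_step(y, y_hat, h):
    """Step size gamma minimizing squared_loss(y, y_hat + gamma * h).

    Setting the derivative with respect to gamma to zero gives
    gamma = sum(r_i * h_i) / sum(h_i ** 2), where r_i = y_i - y_hat_i.
    """
    numerator = sum((yi - pi) * hi for yi, pi, hi in zip(y, y_hat, h))
    denominator = sum(hi * hi for hi in h)
    return numerator / denominator

y = [3.0, 5.0, 7.0]
y_hat = [2.0, 2.0, 2.0]   # current predictions F_{m-1}(x_i)
h = [1.0, 2.0, 4.0]       # outputs of the newly fitted base learner

gamma = line_search_step(y, y_hat, h)

def loss_at(g):
    return squared_loss(y, [pi + g * hi for pi, hi in zip(y_hat, h)])
```

Only gamma varies here; the training inputs enter solely through the fixed numbers `y_hat` and `h`.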

How to make the labels of superpixels to be locally consistent in a gray-level map?

I have a bunch of gray-scale images decomposed into superpixels. Each superpixel in these images has a label in the range [0, 1]. You can see one sample image below.
Here is the challenge: I want the spatially (locally) neighboring superpixels to have consistent labels (close in value).
I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested. I have also heard about Conditional Random Field (CRF). Is it helpful?
Any suggestion would be welcome.

I'm kind of interested in smoothing local labels but do not want to apply Gaussian smoothing functions or whatever, as some colleagues suggested.
And why is that? Why do you dismiss the helpful advice of your colleagues, who are actually right? Applying a smoothing function is the most reasonable way to go.
I have also heard about Conditional Random Field (CRF). Is it helpful?
This also suggests that you should go with your colleagues' advice, as CRF has nothing to do with your problem. CRF is a classifier, a sequence classifier to be exact; it requires labeled examples to learn from and has nothing to do with the setting presented.
What are typical approaches?
Exactly what your colleagues proposed: define a smoothing function and apply it to each value and its neighbourhood. (I will not use the term "labels", as it is misleading: you have continuous values in [0,1], whereas "label" denotes a categorical variable in machine learning.)
Another approach would be to define some optimization problem, where your current assignment of values is one goal, and the second one is "closeness", for example:
Let us assume that you have points with values {(x_i, y_i)}_{i=1}^N and that n(x) returns indices of neighbouring points of x.
Consequently you are trying to find {a_i}_{i=1}^N such that they minimize
\sum_{i=1}^N (y_i - a_i)^2 + C \sum_{i=1}^N \sum_{j \in n(x_i)} (a_i - a_j)^2
Here the first sum measures closeness to the current values, the double sum measures closeness to neighbouring values, and C is a constant that weights the two parts.
You can solve the above optimization problem using many techniques, for example with scipy.optimize.minimize.
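Here is a minimal sketch of that objective on a toy chain of five superpixels. To stay dependency-free it uses plain coordinate descent instead of scipy.optimize.minimize: the objective is quadratic, so each a_i has a closed-form minimizer when the others are held fixed. The update formula assumes symmetric neighbourhoods (if j is a neighbour of i, then i is a neighbour of j); the function names, the value of C, and the data are made up for illustration.

```python
def objective(a, y, neighbors, C):
    """Fit term plus C times the neighbour-closeness term from the formula."""
    fit = sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    closeness = sum((a[i] - a[j]) ** 2 for i in range(len(a)) for j in neighbors[i])
    return fit + C * closeness

def smooth_values(y, neighbors, C, sweeps=200):
    """Coordinate descent on the objective, assuming symmetric neighbourhoods.

    Setting the derivative with respect to a_i to zero gives
    a_i = (y_i + 2*C*sum_j a_j) / (1 + 2*C*deg_i), a weighted average of the
    original value and the current neighbour values.
    """
    a = list(y)
    for _ in range(sweeps):
        for i in range(len(a)):
            nb = neighbors[i]
            a[i] = (y[i] + 2.0 * C * sum(a[j] for j in nb)) / (1.0 + 2.0 * C * len(nb))
    return a

def max_neighbor_gap(values, neighbors):
    return max(abs(values[i] - values[j]) for i in range(len(values)) for j in neighbors[i])

# Five superpixels in a row; each is adjacent to the previous and next one.
y = [0.1, 0.9, 0.2, 0.8, 0.15]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
C = 1.0

a = smooth_values(y, neighbors, C)
gap_before = max_neighbor_gap(y, neighbors)
gap_after = max_neighbor_gap(a, neighbors)
```

A larger C pulls neighbouring values closer together, while C = 0 returns the original values unchanged.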

I am not sure that your request makes any sense.
Having close label values for nearby superpixels is trivial: take some smooth function of (X, Y), such as constant or affine, taking values in the range [0,1], and assign the function value to the superpixel centered at (X, Y).
You could also take the distance function from any point in the plane.
But this is of no use as it is unrelated to the image content.

curve fitting in OpenCV

Is there any opencv function for curve fitting?
I have a set of points (cv::points) and my aim is to fit these points to a closed/open curve.
Right now I am taking a pair of points and drawing lines with them, effectively forming a curve.

It's not quite clear from your question whether you want to smooth the curve by adding more points or to summarise it by using fewer points. If it's the latter, perhaps you should consider cv::approxPolyDP, which is documented here.
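cv::approxPolyDP is based on the Douglas-Peucker algorithm. To show what that simplification does, here is a minimal pure-Python sketch of the algorithm for an open curve. It is not OpenCV's implementation; it measures distance to the segment between the endpoints, and the function names and sample points are made up. The `epsilon` argument plays the same role as the epsilon parameter of approxPolyDP, the maximum allowed deviation from the original curve.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Simplify an open polyline, keeping deviations within epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first -> last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every interior point is close enough: keep only the endpoints.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

curve = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.05), (3.0, 0.0), (3.0, 1.0), (3.0, 2.0)]
simplified = douglas_peucker(curve, epsilon=0.1)
```

On this L-shaped input the small wiggles along the bottom edge are dropped and only the corner and the two endpoints remain.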

I think you are talking about function approximation and interpolation.
As far as I know, there is no OpenCV function that does curve fitting directly.
If you just want the fitting result, you can use MATLAB's Curve Fitting Toolbox, which includes a GUI tool named cftool. In cftool you specify the input points and the interpolation method, and you get the resulting formula.
