I wonder if the triangle inequality is necessary for the distance measure used in k-means.
k-means is designed for the Euclidean distance, which happens to satisfy the triangle inequality.
Using other distance functions is risky, because the algorithm may stop converging. The reason, however, is not the triangle inequality, but that the mean might not minimize your distance function. (The arithmetic mean minimizes the sum of squared Euclidean distances, not arbitrary distances!)
There are faster methods for k-means that exploit the triangle inequality to avoid recomputations. But if you stick to classic MacQueen or Lloyd k-means, then you do not need the triangle inequality.
Just be careful when using other distance functions not to run into an infinite loop. You need to prove that the mean minimizes your distance function within each cluster. If you cannot prove this, the algorithm may fail to converge, because the objective function no longer decreases monotonically! So you really should try to prove convergence for your distance function.
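To make the failure mode concrete, here is a minimal sketch of Lloyd's k-means in NumPy (the function name and structure are illustrative) that tracks the within-cluster sum of squares, so you can check that the objective really does decrease monotonically. If you swap the squared Euclidean distance for something the mean does not minimize, that check is exactly what can break.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    prev_obj = np.inf
    for _ in range(n_iters):
        # Assignment step: squared Euclidean distance to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Objective (within-cluster sum of squares) must not increase.
        obj = d2[np.arange(len(X)), labels].sum()
        assert obj <= prev_obj + 1e-9, "objective increased; the mean/SSQ pairing prevents this"
        prev_obj = obj
        # Update step: the arithmetic mean minimizes the SSQ within each cluster.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers
```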
Well, classical k-means is defined on Euclidean space with the L2 distance, so you get the triangle inequality automatically (the triangle inequality is part of how a distance/metric is defined). If you are using a non-Euclidean metric, you would need to define what the "mean" means, amongst other things.
If you don't have the triangle inequality, it means that two points could be very far from each other, yet both can be close to a third point. You need to think about how you would like to interpret this case.
Having said all that, I have in the past used average-linkage hierarchical clustering with a distance measure that did not fulfill the triangle inequality, among other things, and it worked great for my needs.
If I want to optimize a function with respect to some constrained value, I can find a bijective map between an unconstrained space and the constrained space, then optimize the composition of the original function and the bijective map with respect to the unconstrained value.
Does optimizing in a different space affect the performance or accuracy of optimization? And does it vary between bijective maps?
My use case is training constrained Gaussian process model hyperparameters in GPflow using TensorFlow Probability's bijectors.
If I understand you correctly, you might have for example some variable that is constrained to be positive and want to optimize it. And for that you train the variable in the unconstrained space?
That would be pretty common in machine learning, where you, for example, enforce a variance (of, let's say, the likelihood) to be positive by taking the exponential of the unconstrained value.
I guess the effect on the optimization very much depends on how you optimize it. For gradient based methods it does have an effect, and sometimes small tricks are helpful to improve those issues (e.g. shifting, so that your transformation is tf.exp(shift_val + unconstrained_variable) ).
And yes, AFAIK it varies between different mappings. In my example, the softplus and exponential transformations result in different gradients, though I'm not sure there is a consensus on which one is preferable.
I'd just try a few different ones. As long as it doesn't lead to numerical issues, either transformation/bijection should be fine.
EDIT: just to clarify. The bijection should not affect the solution space, just the optimization path itself.
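For what it's worth, here is a minimal sketch (not the GPflow API; the target value, learning rate and iteration count are made up) of optimizing a positive parameter through a TensorFlow Probability bijector. Swapping tfp.bijectors.Softplus() for tfp.bijectors.Exp() is exactly the comparison mentioned above.

```python
import tensorflow as tf
import tensorflow_probability as tfp

target = 2.0                                  # illustrative target for a variance-like parameter
unconstrained = tf.Variable(0.0)              # lives on the whole real line
bijector = tfp.bijectors.Softplus()           # try tfp.bijectors.Exp() to compare gradients

for _ in range(500):
    with tf.GradientTape() as tape:
        variance = bijector.forward(unconstrained)   # always positive
        loss = (variance - target) ** 2              # stand-in for the real objective
    grad = tape.gradient(loss, unconstrained)
    unconstrained.assign_sub(0.05 * grad)            # plain gradient step

print(float(bijector.forward(unconstrained)))        # should be close to 2.0
```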
I am still new to the idea of A* search. I understand some of the heuristics that A* search uses, such as straight-line distance (Euclidean distance), Manhattan distance and misplaced tiles (for the 8-puzzle game).
For the 2-d grid world,
Which admissible heuristic is better than straight-line distance? I have Manhattan distance in mind. Any other suggestions?
When using A* there are two properties that must hold for the heuristic, in order for the search to be optimal (finding the best solution).
The heuristic must be admissible
The heuristic must be monotonic (consistent)
In reality it's pretty hard to come up with a non-monotonic (also called inconsistent) heuristic, so let's stick with the first requirement.
A heuristic is admissible if it never overestimates the distance between two nodes (in this case points). As such, the Manhattan-distance heuristic is not admissible if diagonal movements are allowed, simply because of the Pythagorean theorem (the combined length of the two catheti is longer than the hypotenuse), so in that case the straight-line distance heuristic is the better choice, since it is admissible.
However, if diagonal movements are not allowed in the 2D grid, then both heuristics are admissible, since neither will overestimate the distance, but the Manhattan distance heuristic is preferred, because it makes better estimates, i.e. estimates closer to the actual distance.
Use a heuristic that agrees with the allowed movement (a small sketch of each follows this list):
For 4-directions, use Manhattan distance (L1)
For 8-directions, use Chebyshev distance (L-Infinity)
For any direction, you can use Euclidean distance, but an alternative map representation may be better (e.g. using Waypoints)
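For reference, a small sketch of the three heuristics above in plain Python; coordinates are (x, y) tuples and the function names are just illustrative.

```python
import math

def manhattan(a, b):           # 4-directional movement (L1)
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chebyshev(a, b):           # 8-directional movement (L-infinity), diagonals cost 1
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def euclidean(a, b):           # any-angle movement (straight-line distance)
    return math.hypot(a[0] - b[0], a[1] - b[1])
```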
Amit Patel has produced fantastic reference material for this subject. See his page at RedBlobGames.com for an introduction to A* and his page on Stanford's Game Programming Page for a description of several grid-world heuristics. His Stanford page also describes several methods for reducing the size of the open set when optimality is not required.
There are also extensions of A* that take advantage of symmetry in grids with constant movement cost. Daniel Harabor introduced two in his doctoral thesis: Jump Point Search (JPS) and Rectangular Symmetry Reduction (RSR). He describes these in an article he posted on AiGameDev.com.
I'm trying to write code that performs a projective transformation, but with more than 4 key points. I found this helpful guide, but it uses 4 points of reference:
https://math.stackexchange.com/questions/296794/finding-the-transform-matrix-from-4-projected-points-with-javascript
I know that MATLAB has a function, cp2tform, that handles this, but I haven't found a way to do it myself so far.
Can anyone give me some guidance on how to do so? I can solve the equations using least squares, but I'm stuck since I have a matrix that is larger than 3x3 and I can't multiply the homogeneous coordinates.
Thanks
If you have more than four control points, you have an overdetermined system of equations. There are two possible scenarios. Either your points are all compatible with the same transformation. In that case, any four points can be used, and the rest will match the transformation exactly. At least in theory. For the sake of numeric stability you'd probably want to choose your points so that they are far from being collinear.
Or your points are not all compatible with a single projective transformation. In this case, all you can hope for is an approximation. If you want the best approximation, you'll have to be more specific about what “best” means, i.e. some kind of error measure. Measuring things in a projective setup is inherently tricky, since there are usually a lot of arbitrary decisions involved.
What you can try is fixing one matrix entry (e.g. the lower right one to 1), then writing the conditions for the remaining 8 coordinates as a system of linear equations, and performing a least squares approximation. But the choice of matrix representative (i.e. fixing one entry here) affects the least squares error measure while it has no effect on the geometric meaning, so this is a pretty arbitrary choice. If the lower right entry of the desired matrix should happen to be zero, your computation will run into numeric problems due to overflow.
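As a rough sketch of that approach (assuming NumPy; the function name is illustrative): fix the lower-right entry to 1 and solve the remaining 8 unknowns by linear least squares over all N >= 4 correspondences.

```python
import numpy as np

def fit_homography(src, dst):
    """src, dst: (N, 2) arrays of corresponding points, N >= 4."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear in h
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)   # put the fixed lower-right entry back
```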
There are similar questions on SO, but I didn't find the answer I wanted. I need to implement a robust optical flow in order to track features on a (detected) face. I use goodFeaturesToTrack/SURF (I haven't yet decided which is best) to get the initial features.
My question is how can I remove the outliers generated from optical flow? Is RANSAC a valid option for this and if so, how can you combine it with calcOpticalFlowPyrLK?
I also thought of rejecting the features for which the displacement is bigger than a threshold, but it's just an idea and I don't really know how to implement it (how to choose the threshold, should I compute the mean displacement, etc.). So, which approach is best?
RANSAC is a good and robust option if you have a model that you expect your motion to conform to.
In general, LK is a local flow method and does not have to conform to any (global) motion model, so in many cases RANSAC is inappropriate.
For general flow you might consider:
Symmetric flow: LK flow from A to B should give the same results as an independent LK flow from B to A (see the sketch after this list).
Motion bounds: use domain-specific knowledge to, e.g., remove motions that are too big, too sparse, too different from their neighbors, etc.
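Here is a minimal sketch of the symmetric (forward-backward) check with OpenCV. It assumes `prev` and `curr` are consecutive grayscale frames and `pts` is the tracked feature array of shape (N, 1, 2), float32; the error threshold is illustrative.

```python
import cv2
import numpy as np

def forward_backward_filter(prev, curr, pts, max_fb_error=1.0):
    """Keep only points whose backward track returns close to where they started."""
    fwd, st1, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)
    bwd, st2, _ = cv2.calcOpticalFlowPyrLK(curr, prev, fwd, None)
    fb_error = np.linalg.norm(pts - bwd, axis=2).ravel()
    good = (st1.ravel() == 1) & (st2.ravel() == 1) & (fb_error < max_fb_error)
    return pts[good], fwd[good]
```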
If you use a grid of flow points instead of feature detection, then you could assess each flow point by comparing its result with the surrounding flow points. If the distance to the surrounding vectors is too big, you could eliminate it. But doing this with irregular features is rather expensive.
If you do continuous tracking (of the same features) over several frames, you could also add a temporal smoothness assumption, e.g. a tracking vector from frame N to N+1 is likely to be very similar to the vectors from N-1 to N and from N+1 to N+2.
Generally, it always makes sense to eliminate suspicious vectors using the criteria already mentioned above (a small sketch follows at the end of this answer):
- vectors which are very long
- vectors with high error
- tracking points with poor gradient (already excluded, if you use corner detection for the features)
RANSAC would only work if you are particularly interested in one rather global motion, e.g. the movement of the head. But I guess that's not what you are interested in (otherwise you could probably also just take the mean of all vectors).
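One possible way to realize those filters, as a sketch only: the median-based thresholds and the factor 3 are arbitrary choices, and `prev`, `curr`, `pts` come from your own pipeline (two consecutive grayscale frames and an (N, 1, 2) float32 feature array).

```python
import cv2
import numpy as np

def filter_tracks(prev, curr, pts, factor=3.0):
    """Reject suspicious LK tracks: failed status, very long vectors, high error."""
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)
    ok = status.ravel() == 1
    disp = np.linalg.norm(new_pts - pts, axis=2).ravel()
    ok &= disp < factor * np.median(disp[ok])                # drop very long vectors
    ok &= err.ravel() < factor * np.median(err.ravel()[ok])  # drop high-error tracks
    return pts[ok], new_pts[ok]
```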
I am having quite a bit of trouble understanding the workings of plane-to-plane homography. In particular, I would like to know how the OpenCV method works.
Is it like ray tracing? How does a homogeneous coordinate differ from a scale*vector?
Everything I read talks like you already know what they're talking about, so it's hard to grasp!
Googling homography estimation returns this as the first link (at least to me):
http://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf. This is definitely a poor description, and a lot has been omitted. If you want to learn these concepts, reading a good book like Multiple View Geometry in Computer Vision would be far better than reading some short articles. Often these short articles contain several serious mistakes, so be careful.
In short, a cost function is defined, and the parameters (the elements of the homography matrix) that minimize this cost function are the answer we are looking for. A meaningful cost function is geometric, that is, it has a geometric interpretation. For the homography case, we want to find H such that, when transforming points from one image to the other, the distance between all the points and their correspondences is minimal. This geometric cost function is nonlinear, which means: (1) in general, an iterative method should be used to solve it, and (2) the iterative method requires an initial starting point.

Here is where algebraic cost functions enter. These cost functions have no meaningful/geometric interpretation. Designing them is often more of an art, and for a given problem you can usually find several algebraic cost functions with different properties. The benefit of algebraic costs is that they lead to linear optimization problems, hence a closed-form solution exists (that is, a one-shot, non-iterative method). The downside is that the found solution is not optimal. Therefore, the general approach is to first optimize an algebraic cost and then use the found solution as the starting point for an iterative geometric optimization. Now if you google these cost functions for homography you will find how they are usually defined.
In case you want to know which method is used in OpenCV, you simply need to have a look at the code:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L81
This is the algebraic cost function, the DLT, defined in the book mentioned above; if you google "homography DLT" you should find some relevant documents. And then here:
http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/calib3d/src/fundam.cpp#L165
An iterative procedure minimizes the geometric cost function. It seems the Gauss-Newton method is implemented:
http://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
All the above discussion assumes you have correspondences between two images. If some points are matched to incorrect points in the other image, then you have outliers, and the results of the mentioned methods would be completely off. This is where robust (against outliers) methods enter. OpenCV gives you two options: 1. RANSAC, 2. LMedS. Google is your friend here.
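If you just want the robust estimation in practice, cv2.findHomography combines the DLT, the robust outlier rejection and the refinement for you. A small usage sketch (the point arrays here are placeholders for your real matches, and the reprojection threshold of 3.0 pixels is illustrative):

```python
import cv2
import numpy as np

# Stand-in data: in practice these are your matched keypoint coordinates, shape (N, 1, 2).
src_pts = (np.random.rand(20, 1, 2) * 100).astype(np.float32)
dst_pts = src_pts + 5.0   # a pure translation, just for the example

H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 3.0)
inliers = mask.ravel().astype(bool)   # which correspondences RANSAC kept
```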
Hope that helps.
To answer your question we need to address 4 different questions:
1. Define homography.
2. See what happens when noise or outliers are present.
3. Find an approximate solution.
4. Refine it.
A homography is a 3x3 matrix that maps 2D points. The mapping is linear in homogeneous coordinates: [x2, y2, 1]' ~ H * [x1, y1, 1]', where ' means transpose (to write column vectors as rows) and ~ means that the mapping is defined only up to scale. It is easier to see in Cartesian coordinates (multiplying the numerator and denominator by the same factor doesn't change the result):
x2 = (h11*x1 + h12*y1 + h13)/(h31*x1 + h32*y1 + h33)
y2 = (h21*x1 + h22*y1 + h23)/(h31*x1 + h32*y1 + h33)
You can see that in Cartesian coordinates the mapping is non-linear, but for now just keep this in mind.
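A tiny sketch (assuming NumPy) of applying H through homogeneous coordinates; the division by the third coordinate is exactly what makes the Cartesian form above non-linear:

```python
import numpy as np

def apply_homography(H, pt):
    """Map a 2D point (x, y) through a 3x3 homography H."""
    x, y = pt
    u, v, w = H @ np.array([x, y, 1.0])   # homogeneous result, defined up to scale
    return u / w, v / w                   # back to Cartesian coordinates
```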
We can easily solve the former set of linear equations in homogeneous coordinates using least-squares linear algebra methods (see DLT - Direct Linear Transform), but this unfortunately only minimizes an algebraic error in the homography parameters. People care more about another kind of error, namely the error that shifts points around in Cartesian coordinate systems. If there is no noise and no outliers, the two errors can be identical. However, the presence of noise requires us to minimize the residuals in Cartesian coordinates (the residuals are just the squared differences between the left- and right-hand sides of the Cartesian equations). On top of that, the presence of outliers requires us to use a robust method such as RANSAC. It selects the best set of inliers and rejects the outliers to make sure they don't contaminate our solution.

Since RANSAC finds the correct inliers by random trial and error over many iterations, we need a really fast way to compute a homography, and this is the linear approximation that minimizes the error in the parameters (the wrong metric) but is otherwise close enough to the final solution (which minimizes the squared point-coordinate residuals, the right metric). We use the linear solution as a guess for further non-linear optimization.

The final step is to use our initial guess (the solution of the linear system that minimized the error in the homography parameters) to solve the non-linear equations (which minimize a sum of squared pixel errors). The reason to use squared residuals instead of, for example, their absolute values is that the Gaussian formula (which describes the noise) has a squared exponent, exp(-(x-mu)^2), so (skipping some probability formulas) the maximum-likelihood solution requires squared residuals.
In order to perform a non-linear optimization one typically employs the Levenberg-Marquardt method. But as a first approximation one can just use gradient descent (note that the gradient points uphill, but we are looking for a minimum, thus we go against it, hence the minus sign below). In a nutshell, we go through a set of iterations 1..t..N, selecting the homography parameters at iteration t as param(t) = param(t-1) - k * gradient, where gradient = d_cost/d_param.
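As a bare-bones illustration of that update rule (the cost gradient `grad_fn`, the step size `k` and the iteration count are placeholders; in the real problem the cost is the sum of squared pixel residuals):

```python
import numpy as np

def refine(params, grad_fn, k=1e-3, n_iters=100):
    """Plain gradient descent: param(t) = param(t-1) - k * gradient."""
    for _ in range(n_iters):
        params = params - k * grad_fn(params)   # step against the gradient
    return params
```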
Bonus material: to further reduce the noise in your homography you can try a few tricks: reduce the search space for points (start tracking your points); use different features (lines, conics, etc., which are also transformed by a homography but possibly have a higher SNR); reject impossible homographies to speed up RANSAC (e.g. those that correspond to 'impossible' point movements); use a low-pass filter for small changes in the homographies that may be attributed to noise.