How does fitEllipse work in OpenCV?

I am working with OpenCV and I need to understand exactly how the function fitEllipse works. I looked at the source code and I know it uses least squares to determine the likely ellipse. I also looked at the paper cited in the documentation (Andrew W. Fitzgibbon, R. B. Fisher. A Buyer's Guide to Conic Fitting. Proc. 5th British Machine Vision Conference, Birmingham, pp. 513-522, 1995).
But I cannot follow the algorithm exactly. For example, why does it need to solve the least-squares problem three times? Why is bd initialized to 10000 before the first SVD (I guess it is just an arbitrary initial value, but why can it be arbitrary)? Why do the values in Ad need to be negative before the first SVD?
Thank you!

Here is some MATLAB code; it might help:
function [Q,a]=fit_ellipse_fitzgibbon(data)
% function [Q,a]=fit_ellipse_fitzgibbon(data)
% Ellipse specific fit, according to:
% Direct Least Square Fitting of Ellipses,
% A. Fitzgibbon, M. Pilu and R. Fisher. PAMI 1996
% See Also:
[m,n] = size(data);
x = data(1,:)';
y = data(2,:)';
D = [x.^2 x.*y y.^2 x y ones(size(x))]; % design matrix
S = D'*D; % scatter matrix
C(6,6)=0; C(1,3)=-2; C(2,2)=1; C(3,1)=-2; % constraints matrix
% solve the generalized eigensystem
[V,D] = eig(S, C);
% find the only negative eigenvalue
[n_r, n_c] = find(D<0 & ~isinf(D));
if isempty(n_c),
    warning('Error getting the ellipse parameters, will do LS');
    [Q,a] = fit_ellipse_ls(data); % fall back to a plain least-squares fit (helper not shown here)
    return;
end
% the parameters
a = V(:, n_c);
[A B C D E F] = deal(a(1),a(2),a(3),a(4),a(5),a(6)); % deal is slow!
Q = [A B/2 D/2; B/2 C E/2; D/2 E/2 F];
end % fit_ellipse_fitzgibbon
Fitzgibbon's solution has some numerical stability issues, though. See the work of Halir for a solution to this.
It is essentially a least-squares solution, but specifically designed so that it produces a valid ellipse, not just any conic.
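To make "a valid ellipse, not just any conic" concrete: for conic coefficients [A B C D E F] in A*x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0, the conic is of ellipse type only when B^2 - 4*A*C < 0. The Python sketch below checks just that condition. The function name is made up for illustration and is not part of OpenCV or of the MATLAB code above, and degenerate cases (a single point, or no real points) are not handled.

```python
def is_ellipse(coeffs, tol=1e-12):
    """Return True if the conic A*x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0
    satisfies the ellipse-type condition B^2 - 4*A*C < 0.

    Hypothetical helper for illustration; coeffs is [A, B, C, D, E, F].
    """
    A, B, C = coeffs[0], coeffs[1], coeffs[2]
    return B * B - 4.0 * A * C < -tol


unit_circle = [1.0, 0.0, 1.0, 0.0, 0.0, -1.0]  # x^2 + y^2 - 1 = 0
hyperbola = [1.0, 0.0, -1.0, 0.0, 0.0, -1.0]   # x^2 - y^2 - 1 = 0
parabola = [1.0, 0.0, 0.0, 0.0, -1.0, 0.0]     # x^2 - y = 0

print(is_ellipse(unit_circle), is_ellipse(hyperbola), is_ellipse(parabola))
```

An unconstrained least-squares conic fit can land on any of these three types; the ellipse-specific fit builds the condition into the problem, so the check always passes for its result.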


Vectorizing distance to several points on Octave (Matlab)

I'm writing a k-means algorithm. At each step, I want to compute the distance of my n points to k centroids, without a for loop, and for d dimensions.
The problem is I have a hard time splitting on my number of dimensions with the Matlab functions I know. Here is my current code, with x being my n 2D-points and y my k centroids (also 2D-points of course), and with the points distributed along dimension 1, and the spatial coordinates along the dimension 2:
dist = @(a,b) (a - b).^2;
dx = bsxfun(dist, x(:,1), y(:,1)'); % x is (n,1) and y is (1,k)
dy = bsxfun(dist, x(:,2), y(:,2)'); % so the result is (n,k)
dists = dx + dy; % contains the squared distance of each point to the k centroids
[~,l] = min(dists, [], 2); % we then argmin on the 2nd dimension
How can I vectorize this further?
First edit 3 days later, searching on my own
Since asking this question I made progress on my own towards vectorizing this piece of code.
The code above runs in approximately 0.7 ms on my example.
I first used repmat to make it easy to do broadcasting:
dists = permute(permute(repmat(x,1,1,k), [3,2,1]) - y, [3,2,1]).^2;
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);
As expected, it is slightly slower since we replicate the matrix: it runs in 0.85 ms.
From this example it was pretty easy to use bsxfun for the whole thing, but it turned out to be extremely slow, running in 150 ms, so more than 150 times slower than the repmat version:
dist = @(a, b) (a - b).^2;
dists = permute(bsxfun(dist, permute(x, [3, 2, 1]), y), [3, 2, 1]);
dists = sum(dists, 2);
[~,l] = min(dists, [], 3);
Why is it so slow? Isn't vectorizing always a speed improvement, since it uses vector instructions on the CPU? Of course simple for loops could be optimized to use them as well, but how can vectorizing make the code slower? Did I do it wrong?
Using a for loop
For the sake of completeness, here's the for-loop version of my code, surprisingly the fastest, running in 0.4 ms. I'm not sure why.
for i=1:k
    dists(:,i) = sum((x - y(i,:)).^2, 2);
end
[~,l] = min(dists, [], 2);
Note: This answer was written when the question was also tagged MATLAB. Links to Octave documentation added after the MATLAB tag was removed.
You can use the pdist2 function (MATLAB/Octave) to calculate pairwise distances between two sets of observations.
This way, you offload the bother of vectorization to the people who wrote MATLAB/Octave (and they have done a pretty good job of it).
X = rand(10,3);
Y = rand(5,3);
D = pdist2(X, Y);
D is now a 10x5 matrix where the (i, j)th element is the distance between the ith X point and the jth Y point.
You can pass it the kind of distance you want as the third argument (e.g. 'euclidean', 'minkowski', etc.), or you could pass a function handle to your custom function. The custom function receives one observation a (a row vector) and a matrix b of observations, and must return one distance per row of b:
dist = @(a,b) sum(bsxfun(@minus, a, b).^2, 2); % squared Euclidean distance
D = pdist2(X, Y, dist);
As saastn mentions, pdist2(..., 'smallest', k) makes things easier in k-means. This returns just the smallest k values from each column of pdist2's result. Octave doesn't have this functionality, but it's easily replicated using sort() (MATLAB/Octave).
D_smallest = sort(D);
D_smallest = D_smallest(1:k, :);
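For readers who want to see what these calls compute, here is a rough Python sketch using only the standard library. The function names are invented for illustration and this is not how pdist2 is implemented. It builds the n-by-k distance matrix, assigns each point to its nearest centroid (the k-means step from the question), and takes the smallest values per column in the same way as the sort() snippet above.

```python
import math


def pairwise_distances(X, Y):
    """Return D where D[i][j] is the Euclidean distance from X[i] to Y[j]."""
    return [[math.dist(x, y) for y in Y] for x in X]


def nearest_index(D):
    """For every row of D, return the column index of the smallest entry."""
    return [min(range(len(row)), key=row.__getitem__) for row in D]


def smallest_per_column(D, k):
    """Return the k smallest values of each column of D, in ascending order."""
    return [sorted(col)[:k] for col in zip(*D)]


points = [(0.0, 0.0), (0.9, 1.1), (5.0, 5.2)]
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

D = pairwise_distances(points, centroids)
labels = nearest_index(D)           # index of the closest centroid per point
closest = smallest_per_column(D, 1)  # smallest distance to each centroid
print(labels, closest)
```

The explicit loops here are for readability only; in MATLAB/Octave the built-in functions remain the better choice.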

"Z" Variable is undefined when used to represent a matrix for sigmoid function

I'm a high school student and I just started getting into machine learning to further my knowledge of coding. I tried out Octave and have been working with neural networks, or at least trying to. In my first program, however, I already found myself at an impasse with my sigmoid gradient function. I want the function to work on each value within a matrix, but I have no idea how to do so. I tried passing z as the parameter of the function, but Octave says that "z" itself is undefined. I have no knowledge of C or C++, and I'm still an amateur in this area, so sorry if I take some time to understand. Thanks to anyone who offers to help!
I'm running Octave 4.4.1, and I haven't tried any other solution yet, as I don't really have any.
% Main Code
g = sigGrad([-2 -1 0 1 2]);
% G is supposed to be my sigmoid Gradient for each value of Theta, which is the matrix within it's parameters.
% Sigmoid Gradient function
function g = sigGrad(z)
g = zeros(size(z));
% This is where the code tells me that z is undefined
g = sigmoid(z).*(1.-sigmoid(z));
% I began by initializing a matrix of zeroes with the size of z
% It should later do the Gradient Equation, but it marks z as undefined before that
% Sigmoid function
g = sigmoid(z)
g = 1.0 ./ (1.0 + exp(-z));
From what I can see, you are making simple syntax mistakes. I'd recommend getting the gist of Octave first rather than diving into the code head-on. That said, you have to declare your functions with the proper syntax and use them as shown below:
function g = sigmoid(z)
% SIGMOID Compute sigmoid function
% g = SIGMOID(z) computes the sigmoid of z.
g = 1.0 ./ (1.0 + exp(-z));
end
And the other piece of code should be
function g = sigGrad(z)
% sigGrad returns the gradient of the sigmoid function evaluated at z
% g = sigGrad(z) computes the gradient of the sigmoid function evaluated at z.
% This should work regardless of whether z is a matrix or a vector.
% In particular, if z is a vector or matrix, the gradient is returned for each element.
g = zeros(size(z));
g = sigmoid(z).*(1 - sigmoid(z));
end
And then finally call the functions implemented above using:
g = sigGrad([1 -0.5 0 0.5 1]);
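As a cross-check of the formula, here is the same computation as a small Python sketch using only the standard library (the names mirror the Octave functions but are otherwise just illustrative). The gradient of the sigmoid is sigmoid(z) * (1 - sigmoid(z)), applied to each element separately.

```python
import math


def sigmoid(z):
    """Sigmoid of a single number."""
    return 1.0 / (1.0 + math.exp(-z))


def sig_grad(values):
    """Sigmoid gradient s(z) * (1 - s(z)) for every element of a list."""
    return [sigmoid(z) * (1.0 - sigmoid(z)) for z in values]


g = sig_grad([1, -0.5, 0, 0.5, 1])
print(g)
```

The gradient peaks at 0.25 for z = 0 and is symmetric around it, so the entries for -0.5 and 0.5 agree.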

Dealing with NaN (missing) values for Logistic Regression- Best practices?

I am working with a data-set of patient information and trying to calculate the Propensity Score from the data using MATLAB. After removing features with many missing values, I am still left with several missing (NaN) values.
I get errors due to these missing values, as the values of my cost-function and gradient vector become NaN, when I try to perform logistic regression using the following Matlab code (from Andrew Ng's Coursera Machine Learning class) :
[m, n] = size(X);
X = [ones(m, 1) X];
initial_theta = ones(n+1, 1);
[cost, grad] = costFunction(initial_theta, X, y);
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
Note: sigmoid and costFunction are working functions I created for overall ease of use.
The calculations can be performed smoothly if I replace all NaN values with 1 or 0. However I am not sure if that is the best way to deal with this issue, and I was also wondering what replacement value I should pick (in general) to get the best results for performing logistic regression with missing data. Are there any benefits/drawbacks to using a particular number (0 or 1 or something else) for replacing the said missing values in my data?
Note: I have also normalized all feature values to be in the range of 0-1.
Any insight on this issue will be highly appreciated. Thank you
As pointed out earlier, this is a generic problem people deal with regardless of the programming platform. It is called "missing data imputation".
Forcing all missing values to a particular number certainly has drawbacks. Depending on the distribution of your data the effect can be drastic, for example when setting all missing values to 1 in sparse binary data that has more zeroes than ones.
Fortunately, MATLAB has a function called knnimpute that estimates a missing data point from its closest neighbor.
In my experience, knnimpute is often useful. However, it may fall short when there are too many missing sites, as in your data; the neighbors of a missing site may be incomplete as well, leading to inaccurate estimates. Below is a workaround I came up with for that: it begins by imputing the least incomplete columns and (optionally) imposes a safe predefined distance for the neighbors. I hope this helps.
function data = dnnimpute(data,distCutoff,option,distMetric)
% data = dnnimpute(data,distCutoff,option,distMetric)
% Distance-based nearest neighbor imputation that impose a distance
% cutoff to determine nearest neighbors, i.e., avoids those samples
% that are more distant than the distCutoff argument.
% Imputes missing data coded by "NaN" starting from the covariates
% (columns) with the least number of missing data. Then it continues by
% including more (complete) covariates in the calculation of pair-wise
% distances.
% option,
% 'median' - Median of the nearest neighboring values
% 'weighted' - Weighted average of the nearest neighboring values
% 'mean' - Unweighted average of the nearest neighboring values (default)
% distMetric,
% 'euclidean' - Euclidean distance (default)
% 'seuclidean' - Standardized Euclidean distance. Each coordinate
% difference between rows in X is scaled by dividing
% by the corresponding element of the standard
% deviation S=NANSTD(X). To specify another value for
% S, use D=pdist(X,'seuclidean',S).
% 'cityblock' - City Block distance
% 'minkowski' - Minkowski distance. The default exponent is 2. To
% specify a different exponent, use
% D = pdist(X,'minkowski',P), where the exponent P is
% a scalar positive value.
% 'chebychev' - Chebychev distance (maximum coordinate difference)
% 'mahalanobis' - Mahalanobis distance, using the sample covariance
% of X as computed by NANCOV. To compute the distance
% with a different covariance, use
% D = pdist(X,'mahalanobis',C), where the matrix C
% is symmetric and positive definite.
% 'cosine' - One minus the cosine of the included angle
% between observations (treated as vectors)
% 'correlation' - One minus the sample linear correlation between
% observations (treated as sequences of values).
% 'spearman' - One minus the sample Spearman's rank correlation
% between observations (treated as sequences of values).
% 'hamming' - Hamming distance, percentage of coordinates
% that differ
% 'jaccard' - One minus the Jaccard coefficient, the
% percentage of nonzero coordinates that differ
% function - A distance function specified using @, for
% example @DISTFUN.
if nargin < 3
    option = 'mean';
end
if nargin < 4
    distMetric = 'euclidean';
end
nanVals = isnan(data);
nanValsPerCov = sum(nanVals,1);
noNansCov = nanValsPerCov == 0;
if isempty(find(noNansCov, 1))
    % No complete column: fill the least incomplete one with row means
    [~,leastNans] = min(nanValsPerCov);
    noNansCov(leastNans) = true;
    first = data(nanVals(:,noNansCov),:);
    nanRows = find(nanVals(:,noNansCov)==true); i = 1;
    for row = first'
        data(nanRows(i),noNansCov) = mean(row(~isnan(row)));
        i = i+1;
    end
end
nSamples = size(data,1);
if nargin < 2
    dataNoNans = data(:,noNansCov);
    distances = pdist(dataNoNans);
    distCutoff = min(distances);
end
[stdCovMissDat,idxCovMissDat] = sort(nanValsPerCov,'ascend');
imputeCols = idxCovMissDat(stdCovMissDat>0);
% Impute starting from the cols (covariates) with the least number of
% missing data.
for c = reshape(imputeCols,1,length(imputeCols))
    imputeRows = 1:nSamples;
    imputeRows = imputeRows(nanVals(:,c));
    for r = reshape(imputeRows,1,length(imputeRows))
        % Calculate distances
        distR = inf(nSamples,1);
        noNansCov_r = find(isnan(data(r,:))==0);
        noNansCov_r = noNansCov_r(sum(isnan(data(nanVals(:,c)'==false,~isnan(data(r,:)))),1)==0);
        for i = find(nanVals(:,c)'==false)
            distR(i) = pdist([data(r,noNansCov_r); data(i,noNansCov_r)],distMetric);
        end
        tmp = min(distR(distR>0));
        % Impute the missing data at sample r of covariate c
        switch option
            case 'weighted'
                data(r,c) = (1./distR(distR<=max(distCutoff,tmp)))' * data(distR<=max(distCutoff,tmp),c) / sum(1./distR(distR<=max(distCutoff,tmp)));
            case 'median'
                data(r,c) = median(data(distR<=max(distCutoff,tmp),c),1);
            case 'mean'
                data(r,c) = mean(data(distR<=max(distCutoff,tmp),c),1);
        end
        % The missing data in sample r is imputed. Update the sample
        % indices of c which are imputed.
        nanVals(r,c) = false;
    end
    fprintf('%u/%u of the covariates are imputed.\n',find(c==imputeCols),length(imputeCols));
end
end
To deal with missing data you can use one of the following three options:
1. If there are not many instances with missing values, you can just delete the ones with missing values.
2. If you have many features and can afford to lose some information, delete the entire feature that has missing values.
3. Fill in some value (mean, median) in place of each missing value; this is usually the best option. You can calculate the mean of the remaining training examples for that feature and fill all the missing values with it. This works well because the mean stays within the distribution of your data.
Note: When you replace the missing values with the mean, calculate the mean using only the training set. Also, store that value and use it to replace the missing values in the test set.
If you use 0 or 1 to replace all the missing values, the data may get skewed, so it is better to replace the missing values with an average of all the other values.
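A minimal sketch of the mean-imputation recipe, in plain Python with hypothetical helper names: the column means are computed from the training rows only (ignoring NaN), stored, and then reused to fill the gaps in both the training and the test rows. It assumes every column has at least one observed value in the training set.

```python
import math


def column_means(rows):
    """Mean of each column, ignoring NaN entries.

    Assumes every column has at least one non-NaN value.
    """
    means = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if not math.isnan(r[j])]
        means.append(sum(present) / len(present))
    return means


def impute(rows, means):
    """Return a copy of rows with each NaN replaced by its column's mean."""
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]


nan = float("nan")
train = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]
test = [[nan, 2.0]]

means = column_means(train)          # learned from the training set only
train_filled = impute(train, means)
test_filled = impute(test, means)    # test set reuses the stored training means
print(means, train_filled, test_filled)
```

Swapping the mean for the median only changes column_means; the rule about learning the fill value on the training set stays the same.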

OpenCV: Essential Matrix Decomposition

I am trying to extract Rotation matrix and Translation vector from the essential matrix.
SVD svd(E);
Mat svd_u = svd.u;
Mat svd_vt = svd.vt;
Mat svd_w = svd.w;
Matx33d W(0,-1,0,
          1, 0,0,
          0, 0,1);
Mat_<double> R = svd_u * Mat(W).t() * svd_vt; //or svd_u * Mat(W) * svd_vt;
Mat_<double> t = svd_u.col(2); //or -svd_u.col(2)
However, when I use R and t (e.g. to obtain rectified images), the result does not seem right (black images or obviously wrong outputs), even though I tried the different possible combinations of R and t.
I suspect E is the problem. According to the textbooks, my calculation is right if we have:
E = U*diag(1, 1, 0)*Vt
In my case svd.w, which is supposed to be diag(1, 1, 0) (at least up to scale), is not. Here is an example of my output:
svd.w = [21.47903827647813; 20.28555196246256; 5.167099204708699e-010]
Also, two of the eigenvalues of E should be equal and the third one should be zero. In the same case the result is:
eigenvalues of E = 0.0000 + 0.0000i, 0.3143 +20.8610i, 0.3143 -20.8610i
As you see, two of them are complex conjugates.
Now, the questions are:
Is the decomposition of E and the calculation of R and t done correctly?
If the calculation is right, why do the results not satisfy the internal constraints of the essential matrix?
If everything about E, R, and t is fine, why are the rectified images obtained from them not correct?
I get E from the fundamental matrix, which I believe to be right. I drew epipolar lines on both the left and right images and they all pass through the corresponding points (for all 16 points used to calculate the fundamental matrix).
Any help would be appreciated.
I see two issues.
First, discounting the negligible value of the third diagonal term, your E is about 6% off the ideal one: err_percent = (21.48 - 20.29) / 20.29 * 100. That sounds small, but translated into pixel error it may be a much larger amount.
So I'd start by replacing E with the ideal one after the SVD: Er = U * diag(1,1,0) * Vt.
Second, the textbook decomposition admits 4 solutions, only one of which is physically plausible (i.e. with the 3D points in front of the cameras). You may be hitting one of the non-physical ones.
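To illustrate the second point, here is a small Python sketch that enumerates the four candidate (R, t) pairs from U and Vt. It uses plain lists rather than OpenCV, the helper names are invented, and the sign flip for det(R) < 0 is a common convention that I am assuming here rather than something taken from the question. Picking the one physically valid pair still requires triangulating a point and checking that it lies in front of both cameras, which is not shown.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


W = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]


def candidate_poses(U, Vt):
    """Return the four (R, t) candidates: R in {U W Vt, U W^T Vt}, t = +/- U[:, 2]."""
    rotations = []
    for Wx in (W, transpose(W)):
        R = matmul(matmul(U, Wx), Vt)
        if det3(R) < 0:  # keep a proper rotation (assumed convention)
            R = [[-v for v in row] for row in R]
        rotations.append(R)
    t = [U[0][2], U[1][2], U[2][2]]
    t_neg = [-v for v in t]
    return [(R, tv) for R in rotations for tv in (t, t_neg)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = candidate_poses(identity, identity)  # toy input, not a real E
print(len(poses))
```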

Cascaded Hough Transform in OpenCV

Is it possible to perform a Cascaded Hough Transform in OpenCV? I understand it's just one HT followed by another. The problem I'm facing is that the values returned are always rho and theta and never in slope/y-intercept form.
Is it possible to convert these values back to y-intercept and split them into sub-spaces so I can detect vanishing points?
Or is it just better to program an implementation of HT myself in, say, Python?
You could try to populate the Hough domain with m and c parameters instead: y = mx + c can be rewritten as c = y - mx, so instead of the usual rho = x cos(theta) + y sin(theta), you have c = y - mx.
Normally, you'd go through the thetas and calculate rho, then increment the accumulator value for that pair of rho and theta. Here, you'd go through the values of m and calculate the values of c, then increment that (m, c) element in the accumulator. The bin with the most votes would be the right (m, c).
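// going through the image looking for edge pixels
for (i = 0; i < numrows; i++) {
    for (j = 0; j < numcols; j++) {
        if (img[i*numcols + j] > 1) {
            // edge pixel at (x = j, y = i): try every candidate slope m = n
            for (n = first_m; n < last_m; n++) {
                index = i - n * j; // c = y - m*x
                // increment the accumulator bin for (m = n, c = index) here
            }
        }
    }
}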
I guess where this becomes ineffective is that it's hard to define the step size for going through m: the slopes should technically range from -infinity to infinity, so you'd have trouble. So much for the Hough transform in terms of m and c.
I guess you could go the other way and isolate m, so it would be m = (y-c)/x. Now you cycle through a bunch of c values that make sense, which is much more manageable, though it's still hard to define your accumulator matrix because m still has no limit. You could limit the values of m to the range you are interested in.
So it makes much more sense to go with rho and theta, convert them into y = mx + c, and then even make a brand new image and re-run the Hough transform on it.
I don't think OpenCV can perform cascaded Hough transforms. You should convert the lines to xy space yourself.
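The conversion itself is short. Starting from rho = x cos(theta) + y sin(theta) and solving for y gives m = -cos(theta) / sin(theta) and c = rho / sin(theta), which breaks down for vertical lines where sin(theta) is zero. A Python sketch (the function name and the eps threshold are my own choices, not an OpenCV API):

```python
import math


def polar_to_slope_intercept(rho, theta, eps=1e-9):
    """Convert a (rho, theta) Hough line to (m, c) with y = m*x + c.

    Returns None for (near-)vertical lines, which have no finite slope;
    those are x = rho / cos(theta) instead.
    """
    s = math.sin(theta)
    if abs(s) < eps:
        return None
    return -math.cos(theta) / s, rho / s


# The line x + y = 2 has rho = sqrt(2) and theta = pi/4.
m, c = polar_to_slope_intercept(math.sqrt(2.0), math.pi / 4.0)
vertical = polar_to_slope_intercept(3.0, 0.0)
print(m, c, vertical)
```

The None case is the same unbounded-slope problem described above, which is why the (rho, theta) parameterization is usually preferred for the accumulator.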
