Determinant calculation with SIMD (SSE)

Does there exist an approach for calculating the determinant of low-dimensional matrices (around 4x4) that works well with SIMD (NEON, SSE, SSE2)? I am using a hand-expanded formula, which does not work so well. I have SSE up through SSE3 and NEON available, both under Linux. The matrix elements are all floats.

Here are my two cents.
Determinant of a 2x2 matrix:
that's an exercise for the reader and should be simple to implement; a minimal SIMD sketch follows.
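For illustration, here is a minimal SSE sketch; it is my addition, not part of the original answer, and assumes the matrix is stored row-major as four contiguous floats.

#include <xmmintrin.h>

// det([a b; c d]) = a*d - b*c for a row-major 2x2 float matrix m = [a, b, c, d].
inline float det2x2_sse(const float* m)
{
    __m128 r = _mm_loadu_ps(m);                               // [a b c d]
    __m128 s = _mm_shuffle_ps(r, r, _MM_SHUFFLE(0, 1, 2, 3)); // [d c b a]
    __m128 p = _mm_mul_ps(r, s);                              // [a*d b*c c*b d*a]
    float lane[4];
    _mm_storeu_ps(lane, p);
    return lane[0] - lane[1];                                 // a*d - b*c
}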
Determinant of a 3x3 matrix:
use the scalar triple product. This will require smart cross() and dot() implementations; the recipes for these are widely available. A sketch follows.
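As a sketch, assuming hypothetical vec3<T>, cross(), and dot() helpers (they are not part of this answer), the 3x3 case reduces to a one-liner:

// The determinant of a 3x3 matrix with rows r0, r1, r2 is the scalar
// triple product r0 . (r1 x r2); both cross() and dot() map well onto
// SSE/NEON shuffles and multiplies.
template <class T>
inline T det3x3(vec3<T> const& r0, vec3<T> const& r1, vec3<T> const& r2)
{
    return dot(r0, cross(r1, r2));
}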
Determinant of a 4x4 matrix:
use a blockwise trick. Partition the matrix into 2x2 blocks A, B, C, D; then det(M) = det(A - B * inv(D) * C) * det(D), which holds whenever the lower-right block D is invertible. My code:
template <class T>
inline T det(matrix<T, 4, 4> const& m) noexcept
{
    auto const A(make_matrix<T, 2, 2>(m(0, 0), m(0, 1), m(1, 0), m(1, 1)));
    auto const B(make_matrix<T, 2, 2>(m(0, 2), m(0, 3), m(1, 2), m(1, 3)));
    auto const C(make_matrix<T, 2, 2>(m(2, 0), m(2, 1), m(3, 0), m(3, 1)));
    auto const D(make_matrix<T, 2, 2>(m(2, 2), m(2, 3), m(3, 2), m(3, 3)));
    return det(A - B * inv(D) * C) * det(D);
}
Determinant of a 5x5 or larger matrix:
probably combine the tricks above, blocking down to the small cases; a scalar fallback sketch follows.
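As a fallback, here is a minimal scalar sketch of Laplace expansion along the first row; it is my addition, requires C++17, and is not SIMD, so treat it as a reference implementation for small N only.

#include <array>
#include <cstddef>

template <class T, std::size_t N>
T det_laplace(const std::array<std::array<T, N>, N>& m)
{
    if constexpr (N == 1) {
        return m[0][0];
    } else {
        T result{};
        for (std::size_t j = 0; j < N; ++j) {
            // Build the (N-1)x(N-1) minor that deletes row 0 and column j.
            std::array<std::array<T, N - 1>, N - 1> sub{};
            for (std::size_t r = 1; r < N; ++r)
                for (std::size_t c = 0, cc = 0; c < N; ++c)
                    if (c != j)
                        sub[r - 1][cc++] = m[r][c];
            T const term = m[0][j] * det_laplace(sub);
            result += (j % 2 == 0) ? term : -term; // alternating cofactor signs
        }
        return result;
    }
}

This is factorial-time and only sensible for very small N; for anything performance-critical, block down to the 4x4 and 2x2 routines above.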

Related

Technique to introduce normalisation/consistency to std dev comparison?

I am implementing a very simple segmentation algorithm for single-channel images. The algorithm works like so:
For a single-channel image:
Calculate the standard deviation, i.e., measure how much the luminosity varies across the image.
If the stddev > 15 (aka the threshold):
Divide the image into 4 cells/images
For each cell:
Repeat step 1 and step 2 (go recursive)
Else:
Draw a rectangle on the source image to signify a segment lies in these bounds.
My problem occurs because my threshold is constant, and when I go recursive, 15 is no longer a good indicator of whether an image is homogeneous or not. How can I introduce consistency/normalisation to my homogeneity check?
Should I resize each image to the same size (100x100)? Should my threshold be a formula, say 15 / (img.rows * img.cols) or 15 / MAX_HISTOGRAM_PEAK?
Edit (code):
void split_mat(const Mat& src, Mat& split1, Mat& split2, Mat& split3, Mat& split4) {
    split1 = Mat(src, Rect(Point(0, 0), Point(src.cols / 2, src.rows / 2)));
    split2 = Mat(src, Rect(Point(src.cols / 2, 0), Point(src.cols, src.rows / 2)));
    split3 = Mat(src, Rect(Point(0, src.rows / 2), Point(src.cols / 2, src.rows)));
    split4 = Mat(src, Rect(Point(src.cols / 2, src.rows / 2), Point(src.cols, src.rows)));
}
void segment_by_homogeny(const Mat& src, double threshold) {
    Scalar mean, stddev;
    meanStdDev(src, mean, stddev);
    double dev = stddev[0]; // / (src.rows * src.cols) * 100.0;
    if (dev >= threshold) {
        Mat s1, s2, s3, s4;
        split_mat(src, s1, s2, s3, s4);
        // Go recursive and segment each sub-segment where necessary
        segment_by_homogeny(s1, threshold);
        segment_by_homogeny(s2, threshold);
        segment_by_homogeny(s3, threshold);
        segment_by_homogeny(s4, threshold);
    }
    else {
        // Store 'segment' in global vector 'images'
        // and write std dev on it
        char d[255];
        sprintf_s(d, "Std Dev: %f", stddev[0]);
        putText(src, d, cvPoint(30, 60),
                FONT_HERSHEY_COMPLEX_SMALL, 0.7, cvScalar(200, 200, 250), 1, CV_AA);
        images.push_back(src);
    }
}
// Current usage for the example image results in infinite recursion:
// the green and red segment never has a std dev < 25.
segment_by_homogeny(img, 25);
I am expecting my algorithm to produce five segments for the example image.
You can simplify your algorithm. Because you want to divide the given region into 4 subregions anyway, first divide it into the 4 subregions, then calculate the average luminosity of each, and put your threshold on the difference between neighboring values. A minimal sketch follows.
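Here is a minimal sketch of that check (the function name and threshold parameter are illustrative, not from the answer):

#include <algorithm>
#include <opencv2/opencv.hpp>
using namespace cv;

// Split the region into quadrants, compare their mean luminosities,
// and report whether any two quadrants disagree by more than maxMeanDiff.
bool quadrants_disagree(const Mat& src, double maxMeanDiff)
{
    int hw = src.cols / 2, hh = src.rows / 2;
    double m[4] = {
        mean(src(Rect(0,  0,  hw,            hh)))[0],
        mean(src(Rect(hw, 0,  src.cols - hw, hh)))[0],
        mean(src(Rect(0,  hh, hw,            src.rows - hh)))[0],
        mean(src(Rect(hw, hh, src.cols - hw, src.rows - hh)))[0]
    };
    double lo = *std::min_element(m, m + 4);
    double hi = *std::max_element(m, m + 4);
    return (hi - lo) > maxMeanDiff; // recurse only while quadrants still differ
}

Because this compares the quadrants against each other rather than against an absolute standard-deviation threshold, it behaves consistently at every recursion depth.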

Training convolutional nets on multi-channel image data sets

I am trying to implement a convolutional neural network from scratch, and I am not able to figure out how to perform (vectorized) operations on multi-channel images like RGB, which have 3 dimensions. Following articles and tutorials such as this CS231n tutorial, it is clear how to implement a network for a single input, since the input layer will be a 3D matrix, but there are always multiple data points in a dataset, so I cannot figure out how to implement these networks with vectorized operations over an entire dataset.
I have implemented a network which takes a 3D matrix as input, but now I have realized that it will not work on an entire dataset; I would have to propagate one input at a time. I don't really know whether conv nets are vectorized over the entire dataset or not, but if they are, how can I vectorize my convolutional network for multi-channel images?
If I got your question right, you're basically asking how to implement a convolutional layer for a mini-batch, which will be a 4-D tensor.
To put it simply, you want to treat each input in a batch independently and apply convolution to each one. It's fairly straightforward to code without vectorization using a loop.
A vectorized implementation is often based on the im2col technique, which transforms the 4-D input tensor into a giant matrix and performs a matrix multiplication. Here's an implementation of the forward pass using numpy.lib.stride_tricks in Python:
import numpy as np

def conv_forward(x, w, b, stride, pad):
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape

    # Check dimensions
    assert (W + 2 * pad - WW) % stride == 0, 'width does not work'
    assert (H + 2 * pad - HH) % stride == 0, 'height does not work'

    # Pad the input
    p = pad
    x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    # Figure out output dimensions
    H += 2 * pad
    W += 2 * pad
    out_h = (H - HH) // stride + 1  # integer division, so this works under Python 3
    out_w = (W - WW) // stride + 1

    # Perform an im2col operation by picking clever strides
    shape = (C, HH, WW, N, out_h, out_w)
    strides = (H * W, W, 1, C * H * W, stride * W, stride)
    strides = x.itemsize * np.array(strides)
    x_stride = np.lib.stride_tricks.as_strided(x_padded,
                                               shape=shape, strides=strides)
    x_cols = np.ascontiguousarray(x_stride)
    x_cols.shape = (C * HH * WW, N * out_h * out_w)

    # Now all our convolutions are a big matrix multiply
    res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1)

    # Reshape the output
    res.shape = (F, N, out_h, out_w)
    out = res.transpose(1, 0, 2, 3)
    out = np.ascontiguousarray(out)
    return out
Note that it uses some non-trivial NumPy features (strided views and the reshaping tricks around them), which may not be available in your library of choice.
BTW, you generally don't want to push the entire data set through as one batch; split it into several mini-batches.

Convert several 1D mat to a single 2D mat in OpenCV [duplicate]

I have three matrices, each of size 4x1. I want to copy all of these matrices to another matrix of size 4x3 and call it R. Is there a smart way to do it?
You can just use hconcat for horizontal concatenation. You can apply it to a pair of matrices, e.g. hconcat( mat1, mat2, R ), or directly to a vector or array of matrices.
Here's some sample code:
vector<Mat> matrices = {
    Mat(4, 1, CV_8UC1, Scalar(1)),
    Mat(4, 1, CV_8UC1, Scalar(2)),
    Mat(4, 1, CV_8UC1, Scalar(3)),
};
Mat R;
hconcat( matrices, R );
cout << R << endl;
Here's the result:
[1, 2, 3;
1, 2, 3;
1, 2, 3;
1, 2, 3]
Similarly, if you want to do it vertically (stack by rows), use vconcat.
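For instance (my addition, reusing the matrices vector from above), stacking the three 4x1 columns by rows:

Mat V;
vconcat( matrices, V ); // V is 12 rows x 1 col

This works because vconcat only requires the inputs to have the same number of columns.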
You can use
Mat R(3, 4, CV_32F); // [3 rows x 4 cols] with float values
Mat(mat1.t()).copyTo(R.row(0)); // transpose each 4x1 column into a 1x4 row
Mat(mat2.t()).copyTo(R.row(1));
Mat(mat3.t()).copyTo(R.row(2));
or
Mat R(4, 3, CV_32F); // [4 rows x 3 cols] with float values
mat1.copyTo(R.col(0));
mat2.copyTo(R.col(1));
mat3.copyTo(R.col(2));
Alternatively, as @sub_o suggested, you can also use hconcat()/vconcat() to concatenate matrices.
For those using OpenCV in Python: if you have arrays A, B, and C, and want an array D that is the horizontal concatenation of the others:
D = cv2.hconcat((A, B, C))
There is also a vconcat method.

OpenCV multiply scalar and matrix

I have been trying to achieve something which should be pretty trivial, and is trivial in Matlab.
Using OpenCV methods, I want to simply achieve something such as:
cv::Mat sample = [4 5 6; 4 2 5; 1 4 2];
sample = 5*sample;
After which sample should just be:
[20 25 30; 20 10 25; 5 20 10]
I have tried scaleAdd, mul, and multiply; none of them allows a scalar multiplier, and all require a matrix of the same "size and type". In this scenario I could create a matrix of ones and then use the scale parameter, but that seems very extraneous.
Any direct, simple method would be great!
OpenCV does in fact support multiplication by a scalar value with overloaded operator*. You might need to initialize the matrix correctly, though.
float data[] = { 1, 2, 3,
                 4, 5, 6,
                 7, 8, 9 };
cv::Mat m(3, 3, CV_32FC1, data);
m = 3*m; // This works just fine
If you are mainly interested in mathematical operations, cv::Matx is a little easier to work with:
cv::Matx33f mx(1,2,3,
4,5,6,
7,8,9);
mx *= 4; // This works too
For Java there is no operator overloading, but the Mat object provides the functionality with its convertTo method.
Mat dst = new Mat(src.rows(), src.cols(), src.type());
src.convertTo(dst, -1, scale, offset);
The documentation for this method is in the OpenCV reference.
For big Mats you should use forEach.
If C++11 is available:
m.forEach<double>([&](double& element, const int position[]) -> void
{
    element *= 5;
});
Note that the element type passed to forEach must match the Mat type (here, a CV_64F Mat for double).
Something like this:
Mat m = (Mat_<float>(3, 3) <<
         1, 2, 3,
         4, 5, 6,
         7, 8, 9) * 5;
Mat A = ...; // source matrix (your data)
Mat B = new Mat(); // destination matrix
Scalar alpha = new Scalar(5); // factor
Core.multiply(A, alpha, B);

OpenCV Error: Bad argument <Unknown array type> in unknown function, file ..\..\..\modules\core\src\matrix.cpp, line 697

I'm currently trying to rectify stereo cameras to create a disparity map. Unfortunately, I'm having trouble getting past the stereo rectification step because I keep receiving the error
"OpenCV Error: Bad argument in unknown function, file ..\..\..\modules\core\src\matrix.cpp, line 697."
The process is complicated by the fact that I'm not the one who calibrated the cameras, nor do I have access to the cameras used to record the videos. I was given all of the calibration parameters (intrinsics, distortion coefficients, rotation matrix, and translation vector). As you can see, I've tried to turn these directly into CvMats and use them that way, but I get an error when I try to actually use them.
Thanks in advance.
CvMat li, lm, ri, rm, r, t, Rl, Rr, Pl, Pr;
double init_li[3][3] =
{ {477.984984743, 0, 316.17458671},
{0, 476.861945645, 253.45073026},
{0, 0 ,1} };
double init_lm[5] = {-0.117798518453, 0.147554949385, -0.0549082041898, 0, 0};
double init_ri[3][3] =
{{478.640315323, 0, 299.957994781},
{0, 477.898896505, 251.665771947},
{0, 0, 1}};
double init_rm[5] = {-0.10884732532, 0.12118405303, -0.0322073237741, 0, 0};
double init_r[3][3] =
{{0.999973709051976, 0.00129700728791757, -0.00713435189275776},
{-0.00132096594266573, 0.999993501087837, -0.00335452397041856},
{0.00712995468519435, 0.00336386001267643, 0.99996892361313}};
double init_t[3] = {-0.0830973040641153, -0.00062704210860633, 1.4287643345188e-005};
cvInitMatHeader(&li, 3, 3, CV_64FC1, init_li);
cvInitMatHeader(&lm, 5, 1, CV_64FC1, init_lm);
cvInitMatHeader(&ri, 3, 3, CV_64FC1, init_ri);
cvInitMatHeader(&rm, 5, 1, CV_64FC1, init_rm);
cvInitMatHeader(&r, 3, 3, CV_64FC1, init_r);
cvInitMatHeader(&t, 3, 1, CV_64FC1, init_t);
cvInitMatHeader(&Rl, 3,3, CV_64FC1);
cvInitMatHeader(&Rr, 3,3, CV_64FC1);
cvInitMatHeader(&Pl, 3,4, CV_64FC1);
cvInitMatHeader(&Pr, 3,4, CV_64FC1);
// frame is a cv::Mat holding the first frame of the video.
CvSize imageSize = frame.size();
imageSize.width /= 2;
//IT BREAKS HERE
cvStereoRectify(&li, &ri, &lm, &rm, imageSize, &r, &t, &Rl, &Rr, &Pl, &Pr);
So, you've been bitten by the C API? Why don't you just turn your back on it?
Use the C++ API whenever possible; please don't start learning OpenCV with the old (1.0), deprecated API!
double init_li[9] =
{ 477.984984743, 0, 316.17458671,
0, 476.861945645, 253.45073026,
0, 0 ,1 };
double init_lm[5] = {-0.117798518453, 0.147554949385, -0.0549082041898, 0, 0};
double init_ri[9] =
{ 478.640315323, 0, 299.957994781,
0, 477.898896505, 251.665771947,
0, 0, 1};
double init_rm[5] = {-0.10884732532, 0.12118405303, -0.0322073237741, 0, 0};
double init_r[9] =
{ 0.999973709051976, 0.00129700728791757, -0.00713435189275776,
-0.00132096594266573, 0.999993501087837, -0.00335452397041856,
0.00712995468519435, 0.00336386001267643, 0.99996892361313};
double init_t[3] = {-0.0830973040641153, -0.00062704210860633, 1.4287643345188e-005};
cv::Mat li(3, 3, CV_64FC1, init_li);
cv::Mat lm(5, 1, CV_64FC1, init_lm);
cv::Mat ri(3, 3, CV_64FC1, init_ri);
cv::Mat rm(5, 1, CV_64FC1, init_rm);
cv::Mat r(3, 3, CV_64FC1, init_r);
cv::Mat t(3, 1, CV_64FC1, init_t);
cv::Mat Rl, Rr, Pl, Pr, Q; // note: the output Mats need no initialization
// frame is a cv::Mat holding the first frame of the video.
cv::Size imageSize = frame.size();
imageSize.width /= 2;
// It won't break here. Note the argument order: each camera matrix is
// immediately followed by its distortion coefficients.
cv::stereoRectify(li, lm, ri, rm, imageSize, r, t, Rl, Rr, Pl, Pr, Q);
// no need ever to release or care about anything
Ok, so I figured out the answer. The problem was that I had only initialized headers for Rl, Rr, Pl, and Pr, but no memory was allocated for the data itself. I was able to fix it as follows:
double init_Rl[3][3];
double init_Rr[3][3];
double init_Pl[3][4];
double init_Pr[3][4];
cvInitMatHeader(&Rl, 3,3, CV_64FC1, init_Rl);
cvInitMatHeader(&Rr, 3,3, CV_64FC1, init_Rr);
cvInitMatHeader(&Pl, 3,4, CV_64FC1, init_Pl);
cvInitMatHeader(&Pr, 3,4, CV_64FC1, init_Pr);
Although, I have a theory that I might have been able to use cv::stereoRectify with cv::Mats as parameters, which would have made life much easier. I don't know if cv::stereoRectify exists, but it seems that versions of many of the other C functions are in the cv namespace. In case it's hard to tell, I'm very new to OpenCV.
