How to apply a kernel to a raster image - image-processing

I'm trying to apply a sharpen kernel to a raster image. Here is my kernel:
{ 0.0f,-1.0f,0.0f,
-1.0f,5.0f,-1.0f,
0.0f,-1.0f,0.0f }
And here is my code:
struct Pixel{
GLubyte R, G, B;
float x, y;
};
. . .
for (unsigned i = 1; i < iWidth - 1; i++){
for (unsigned j = 1; j < iHeight - 1; j++){
float r = 0, g = 0, b = 0;
r += -(float)pixels[i + 1][j].R;
g += -(float)pixels[i + 1][j].G;
b += -(float)pixels[i + 1][j].B;
r += -(float)pixels[i - 1][j].R;
g += -(float)pixels[i - 1][j].G;
b += -(float)pixels[i - 1][j].B;
r += -(float)pixels[i][j + 1].R;
g += -(float)pixels[i][j + 1].G;
b += -(float)pixels[i][j + 1].B;
r += -(float)pixels[i][j - 1].R;
g += -(float)pixels[i][j - 1].G;
b += -(float)pixels[i][j - 1].B;
pixels[i][j].R = (GLubyte)((pixels[i][j].R * 5) + r);
pixels[i][j].G = (GLubyte)((pixels[i][j].G * 5) + g);
pixels[i][j].B = (GLubyte)((pixels[i][j].B * 5) + b);
}
}
But the colors get mixed up when I apply this kernel. Here is an example:
What am I doing wrong?
NOTE: I know that OpenGL can do this quickly and easily, but I just wanted to experiment with this kind of mask.
EDIT: The first version of the code had a copy-paste bug:
pixels[i][j].R = (GLubyte)((pixels[i][j].R * 5) + r);
pixels[i][j].G = (GLubyte)((pixels[i][j].R/*G*/ * 5) + g);
pixels[i][j].B = (GLubyte)((pixels[i][j].R/*B*/ * 5) + b);
I fixed it, but I still have the problem.
I've changed the last three lines to this:
r = (float)((pixels[i][j].R * 5) + r);
g = (float)((pixels[i][j].G * 5) + g);
b = (float)((pixels[i][j].B * 5) + b);
if (r < 0) r = 0;
if (g < 0) g = 0;
if (b < 0) b = 0;
if (r > 255) r = 255;
if (g > 255) g = 255;
if (b > 255) b = 255;
pixels[i][j].R = r;
pixels[i][j].G = g;
pixels[i][j].B = b;
And now the output looks like this:

You have a copy-paste bug here:
pixels[i][j].R = (GLubyte)((pixels[i][j].R * 5) + r);
pixels[i][j].G = (GLubyte)((pixels[i][j].R * 5) + g);
pixels[i][j].B = (GLubyte)((pixels[i][j].R * 5) + b);
(Note the .R on the right-hand side of the G and B lines.)
This should be:
pixels[i][j].R = (GLubyte)((pixels[i][j].R * 5) + r);
pixels[i][j].G = (GLubyte)((pixels[i][j].G * 5) + g);
pixels[i][j].B = (GLubyte)((pixels[i][j].B * 5) + b);
Also it looks like you may have iWidth/iHeight transposed, but it's hard to say without seeing the rest of the code. Typically though the outer loop iterates over rows, so the upper bound would be the number of rows, i.e. the image height.
Most importantly though you have a fundamental problem in that you're trying to perform a neighbourhood operation in-place. Each output pixel depends on its neighbours, but you're modifying these neighbours as you iterate through the image. You need to do this kind of operation out-of-place, i.e. have a separate output image:
out_pixels[i][j].R = r;
out_pixels[i][j].G = g;
out_pixels[i][j].B = b;
so that the input image does not get modified. (Note also that you'll want to copy the edge pixels over from the input image to the output image.)
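As a rough illustration of the out-of-place approach, here is a minimal sketch in plain Python rather than C++. The function name sharpen and the list-of-tuples image layout are assumptions made for the example, not anything from the question's code. It always reads from the input, writes to a separate output, clamps each channel to 0..255, and leaves the border pixels as copies of the input.

```python
def sharpen(image):
    """Apply the 3x3 sharpen kernel out-of-place to a grid of (R, G, B) tuples."""
    kernel = [[0, -1, 0],
              [-1, 5, -1],
              [0, -1, 0]]
    height, width = len(image), len(image[0])
    # Start from a copy so the untouched border pixels carry over.
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = [0, 0, 0]
            for ky in range(3):
                for kx in range(3):
                    weight = kernel[ky][kx]
                    if weight == 0:
                        continue
                    # Always read from the unmodified input image.
                    pixel = image[y + ky - 1][x + kx - 1]
                    for c in range(3):
                        acc[c] += weight * pixel[c]
            # Clamp each channel to 0..255 before storing it.
            out[y][x] = tuple(max(0, min(255, v)) for v in acc)
    return out


# Hypothetical 4x4 test image: uniform grey with one brighter pixel.
picture = [[(100, 100, 100)] * 4 for _ in range(4)]
picture[1][1] = (140, 140, 140)
sharpened = sharpen(picture)
```

Because the kernel weights sum to 1, flat regions come out unchanged, which is a quick sanity check for any implementation.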

Is the Sharpness filter available in Konvajs, if it is there how to use that?

https://konvajs.org/api/Konva.Filters.html
The sharpness filter is not listed at this link.
Konva doesn't have such a filter in its core. You have to implement it manually.
For that use case, you can write your own custom filter. See the custom filters docs.
I tried this sharpen implementation: https://gist.github.com/mikecao/65d9fc92dc7197cb8a7c
// noprotect
const stage = new Konva.Stage({
container: 'container',
width: window.innerWidth,
height: window.innerHeight
});
const layer = new Konva.Layer();
stage.add(layer);
function Sharpen(srcData) {
const mix = 1;
const w = srcData.width;
const h = srcData.height;
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
var x, sx, sy, r, g, b, a, dstOff, srcOff, wt, cx, cy, scy, scx,
weights = [0, -1, 0, -1, 5, -1, 0, -1, 0],
katet = Math.round(Math.sqrt(weights.length)),
half = (katet * 0.5) | 0,
dstData = ctx.createImageData(w, h),
dstBuff = dstData.data,
srcBuff = srcData.data,
y = h;
while (y--) {
x = w;
while (x--) {
sy = y;
sx = x;
dstOff = (y * w + x) * 4;
r = 0;
g = 0;
b = 0;
a = 0;
for (cy = 0; cy < katet; cy++) {
for (cx = 0; cx < katet; cx++) {
scy = sy + cy - half;
scx = sx + cx - half;
if (scy >= 0 && scy < h && scx >= 0 && scx < w) {
srcOff = (scy * w + scx) * 4;
wt = weights[cy * katet + cx];
r += srcBuff[srcOff] * wt;
g += srcBuff[srcOff + 1] * wt;
b += srcBuff[srcOff + 2] * wt;
a += srcBuff[srcOff + 3] * wt;
}
}
}
dstBuff[dstOff] = r * mix + srcBuff[dstOff] * (1 - mix);
dstBuff[dstOff + 1] = g * mix + srcBuff[dstOff + 1] * (1 - mix);
dstBuff[dstOff + 2] = b * mix + srcBuff[dstOff + 2] * (1 - mix);
dstBuff[dstOff + 3] = srcBuff[dstOff + 3];
}
}
for(var i = 0; i < dstData.data.length; i++) {
srcData.data[i] = dstData.data[i];
}
}
Konva.Image.fromURL('https://i.imgur.com/ktWThtZ.png', img => {
img.setAttrs({filters: [Sharpen]});
img.cache();
layer.add(img);
layer.draw();
});
Demo: https://jsbin.com/tejalusano/1/edit?html,js,output

Separable gaussian blur - optimize vertical pass

I have implemented a separable Gaussian blur. The horizontal pass was relatively easy to optimize with SIMD processing. However, I am not sure how to optimize the vertical pass.
Accessing elements is not very cache friendly, and filling a SIMD lane would mean reading many different pixels. I was thinking about transposing the image, running the horizontal pass, and then transposing the image back; however, I am not sure whether that would gain anything because of the two transpose operations.
I have quite large images (16k resolution) and the kernel size is 19, so the gain from vectorizing the vertical pass was about 15%.
My vertical pass is as follows (it is inside a generic class typed on T, which can be uint8_t or float):
int yStart = kernelHalfSize;
int xStart = kernelHalfSize;
int yEnd = input.GetWidth() - kernelHalfSize;
int xEnd = input.GetHeigh() - kernelHalfSize;
const T * inData = input.GetData().data();
V * outData = output.GetData().data();
int kn = kernelHalfSize * 2 + 1;
int kn4 = kn - kn % 4;
for (int y = yStart; y < yEnd; y++)
{
size_t yW = size_t(y) * output.GetWidth();
size_t outX = size_t(xStart) + yW;
size_t xEndSimd = xStart;
int len = xEnd - xStart;
len = len - len % 4;
xEndSimd = xStart + len;
for (int x = xStart; x < xEndSimd; x += 4)
{
size_t inYW = size_t(y) * input.GetWidth();
size_t x0 = ((x + 0) - kernelHalfSize) + inYW;
size_t x1 = x0 + 1;
size_t x2 = x0 + 2;
size_t x3 = x0 + 3;
__m128 sumDot = _mm_setzero_ps();
int i = 0;
for (; i < kn4; i += 4)
{
__m128 kx = _mm_set_ps1(kernelDataX[i + 0]);
__m128 ky = _mm_set_ps1(kernelDataX[i + 1]);
__m128 kz = _mm_set_ps1(kernelDataX[i + 2]);
__m128 kw = _mm_set_ps1(kernelDataX[i + 3]);
__m128 dx, dy, dz, dw;
if constexpr (std::is_same<T, uint8_t>::value)
{
//we need to convert uint8_t inputs to float
__m128i u8_0 = _mm_loadu_si128((const __m128i*)(inData + x0));
__m128i u8_1 = _mm_loadu_si128((const __m128i*)(inData + x1));
__m128i u8_2 = _mm_loadu_si128((const __m128i*)(inData + x2));
__m128i u8_3 = _mm_loadu_si128((const __m128i*)(inData + x3));
__m128i u32_0 = _mm_unpacklo_epi16(
_mm_unpacklo_epi8(u8_0, _mm_setzero_si128()),
_mm_setzero_si128());
__m128i u32_1 = _mm_unpacklo_epi16(
_mm_unpacklo_epi8(u8_1, _mm_setzero_si128()),
_mm_setzero_si128());
__m128i u32_2 = _mm_unpacklo_epi16(
_mm_unpacklo_epi8(u8_2, _mm_setzero_si128()),
_mm_setzero_si128());
__m128i u32_3 = _mm_unpacklo_epi16(
_mm_unpacklo_epi8(u8_3, _mm_setzero_si128()),
_mm_setzero_si128());
dx = _mm_cvtepi32_ps(u32_0);
dy = _mm_cvtepi32_ps(u32_1);
dz = _mm_cvtepi32_ps(u32_2);
dw = _mm_cvtepi32_ps(u32_3);
}
else
{
/*
//load 8 consecutive values
auto dd = _mm256_loadu_ps(inData + x0);
//extract parts by shifting and casting to 4 values float
dx = _mm256_castps256_ps128(dd);
dy = _mm256_castps256_ps128(_mm256_permutevar8x32_ps(dd, _mm256_set_epi32(0, 0, 0, 0, 4, 3, 2, 1)));
dz = _mm256_castps256_ps128(_mm256_permutevar8x32_ps(dd, _mm256_set_epi32(0, 0, 0, 0, 5, 4, 3, 2)));
dw = _mm256_castps256_ps128(_mm256_permutevar8x32_ps(dd, _mm256_set_epi32(0, 0, 0, 0, 6, 5, 4, 3)));
*/
dx = _mm_loadu_ps(inData + x0);
dy = _mm_loadu_ps(inData + x1);
dz = _mm_loadu_ps(inData + x2);
dw = _mm_loadu_ps(inData + x3);
}
//calculate 4 dots at once
//[dx, dy, dz, dw] <dot> [kx, ky, kz, kw]
auto mx = _mm_mul_ps(dx, kx); //dx * kx
auto my = _mm_fmadd_ps(dy, ky, mx); //mx + dy * ky
auto mz = _mm_fmadd_ps(dz, kz, my); //my + dz * kz
auto res = _mm_fmadd_ps(dw, kw, mz); //mz + dw * kw
sumDot = _mm_add_ps(sumDot, res);
x0 += 4;
x1 += 4;
x2 += 4;
x3 += 4;
}
for (; i < kn; i++)
{
auto v = _mm_set_ps1(kernelDataX[i]);
auto v2 = _mm_set_ps(
*(inData + x3), *(inData + x2),
*(inData + x1), *(inData + x0)
);
sumDot = _mm_add_ps(sumDot, _mm_mul_ps(v, v2));
x0++;
x1++;
x2++;
x3++;
}
sumDot = _mm_mul_ps(sumDot, _mm_set_ps1(weightX));
if constexpr (std::is_same<V, uint8_t>::value)
{
__m128i asInt = _mm_cvtps_epi32(sumDot);
asInt = _mm_packus_epi32(asInt, asInt);
asInt = _mm_packus_epi16(asInt, asInt);
uint32_t res = _mm_cvtsi128_si32(asInt);
((uint32_t *)(outData + outX))[0] = res;
outX += 4;
}
else
{
float tmpRes[4];
_mm_store_ps(tmpRes, sumDot);
outData[outX + 0] = tmpRes[0];
outData[outX + 1] = tmpRes[1];
outData[outX + 2] = tmpRes[2];
outData[outX + 3] = tmpRes[3];
outX += 4;
}
}
for (int x = xEndSimd; x < xEnd; x++)
{
int kn = kernelHalfSize * 2 + 1;
const T * v = input.GetPixelStart(x - kernelHalfSize, y);
float tmp = 0;
for (int i = 0; i < kn; i++)
{
tmp += kernelDataX[i] * v[i];
}
tmp *= weightX;
outData[outX] = ImageUtils::clamp_cast<V>(tmp);
outX++;
}
}
There’s a well-known trick for that.
In both passes, read the input sequentially and use SIMD to compute, but write the result into another buffer, transposed, using scalar stores. Pro tip: SSE 4.1 has _mm_extract_ps; just don’t forget to cast your destination image pointer from float* to int*. I would also recommend using _mm_stream_si32 for these stores, since you want as much cache space as possible used by your input data. When you compute the second pass, you’ll be reading sequential memory addresses again, and the hardware prefetcher will deal with the latency.
This way both passes are identical; I usually call the same function twice, with different buffers.
The two transposes caused by your two passes cancel each other. Here’s an HLSL version, BTW.
There’s more. If your kernel size is only 19, it fits in 3 AVX registers. I think shuffle/permute/blend instructions are still faster than even L1 cache loads, i.e. it might be better to load the kernel outside the loop.
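A rough sketch of the transposed-write idea in plain Python. There is no SIMD here, so it only illustrates the memory access pattern and not the speed-up. The function names and the clamped border handling are my own assumptions, not part of the answer above.

```python
def blur_rows_transposed(src, kernel):
    """Blur each row of src with a 1-D kernel and write the result transposed.

    Borders are handled by clamping indices (an assumption of this sketch).
    """
    height, width = len(src), len(src[0])
    half = len(kernel) // 2
    dst = [[0.0] * height for _ in range(width)]  # transposed shape
    for y in range(height):
        row = src[y]  # sequential reads along one row
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xi = min(max(x + k - half, 0), width - 1)
                acc += weight * row[xi]
            dst[x][y] = acc  # scattered, transposed write
    return dst


def separable_blur(image, kernel):
    # Pass 1 blurs horizontally and transposes. Pass 2 blurs what used to be
    # the columns (now rows) and transposes back to the original orientation.
    return blur_rows_transposed(blur_rows_transposed(image, kernel), kernel)


# Hypothetical 5x5 impulse image and a small 3-tap kernel.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 16.0
blurred = separable_blur(impulse, [0.25, 0.5, 0.25])
```

The same function serves both passes, which is the point of the trick: only the reads need to be sequential, and the two transposes cancel.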

Converting from RGB to Lαβ Color spaces and converting it back to RGB using OpenCV

I am currently trying to convert colors between the RGB (red, green, blue) color space and the Lαβ color space, based on the details in this paper.
My difficulty is in reversing the conversion: the result is not the same as the initial RGB Mat. I think I am missing something in the type casts between Mats, but I can't tell what it is!
Here is my code:
Mat DetectTrackFace::RGB2LAlphBeta(Mat &src)
{
Mat dest;
Mat L_AlphBeta(src.rows, src.cols, CV_32FC3);
//cvtColor(src,dest,CV_BGR2XYZ);
float X,Y,Z,L,M,S,_L,Alph,Beta;
int R,G,B;
for(int i = 0; i < src.rows; i++)
{
for(int j = 0; j < src.cols; j++)
{
B = src.at<Vec3b>(i, j)[0];
G = src.at<Vec3b>(i, j)[1];
R = src.at<Vec3b>(i, j)[2];
X = ( 0.4124 * R ) + ( 0.3576 * G ) + ( 0.1805 * B);
Y = ( 0.2126 * R ) + ( 0.7152 * G ) + ( 0.0722 * B);
Z = ( 0.0193 * R ) + ( 0.1192 * G ) + ( 0.9505 * B);
L = (0.3897 * X) + (0.6890 * Y) + (-0.0787 * Z);
M = (-0.2298 * X) + (1.1834* Y) + (0.0464 * Z);
S = (0.0000 * X) + (0.0000 * Y) + (1.0000 * Z);
//for handling log
if(L == 0.0000) L=1.0000;
if(M == 0.0000) M = 1.0000;
if( S == 0.0000) S = 1.0000;
//LMS to Lab
_L = (1.0 / sqrt(3.0)) *((1.0000 * log10(L)) + (1.0000 * log10(M)) + (1.0000 * log10(S)));
Alph =(1.0 / sqrt(6.0)) * ((1.0000 * log10(L)) + (1.0000 * log10(M)) + (-2.0000 * log10(S)));
Beta = (1.0 / sqrt(2.0)) * ((1.0000 * log10(L)) + (-1.0000 * log10(M)) + (-0.0000 * log10(S)));
L_AlphBeta.at<Vec3f>(i, j)[0] = _L;
L_AlphBeta.at<Vec3f>(i, j)[1] = Alph;
L_AlphBeta.at<Vec3f>(i, j)[2] = Beta;
}
}
return L_AlphBeta;
}
Mat DetectTrackFace::LAlphBeta2RGB(Mat &src)
{
Mat XYZ(src.rows, src.cols, src.type());
Mat BGR(src.rows, src.cols, CV_8UC3);
float X,Y,Z,L,M,S,_L,Alph,Beta, B,G,R;
for(int i = 0; i < src.rows; i++)
{
for(int j = 0; j < src.cols; j++)
{
_L = src.at<Vec3f>(i, j)[0]*1.7321;
Alph = src.at<Vec3f>(i, j)[1]*2.4495;
Beta = src.at<Vec3f>(i, j)[2]*1.4142;
/*Inv_Transform_logLMS2lab =
0.33333 0.16667 0.50000
0.33333 0.16667 -0.50000
0.33333 -0.33333 0.00000*/
L = (0.33333*_L) + (0.16667 * Alph) + (0.50000 * Beta);
M = (0.33333 * _L) + (0.16667 * Alph) + (-0.50000 * Beta);
S = (0.33333 * _L) + (-0.33333 * Alph) + (0.00000* Beta);
L = pow(10 , L);
if(L == 1) L=0;
M = pow(10 , M);
if(M == 1) M=0;
S = pow(10 , S);
if(S == 1) S=0;
/*Inv_Transform_XYZ2LMS
1.91024 -1.11218 0.20194
0.37094 0.62905 0.00001
0.00000 0.00000 1.00000*/
X = (1.91024 *L ) + (-1.11218 * M ) +(0.20194 * S);
Y = (0.37094 * L ) + (0.62905 * M ) +(0.00001 * S);
Z = (0.00000 * L) + (0.00000 * M ) +(1.00000 * S);
/*Inv_Transform_RGB2XYZ
3.240625 -1.537208 -0.498629
-0.968931 1.875756 0.041518
0.055710 -0.204021 1.056996*/
R = ( 3.240625 * X) + ( -1.537208 * Y) + ( -0.498629 * Z);
G = ( -0.968931 * X) + ( 1.875756 * Y) + ( 0.041518 * Z);
B = ( 0.055710 * X) + ( -0.204021 * Y) + ( 1.056996 * Z);
if(R>255) R = 255;
if(G>255) G = 255;
if(B>255) B = 255;
if(R<0) R = 0;
if(G<0) G = 0;
if(B<0) B = 0;
if(R > 255 || G > 255 || B > 255 || R < 0 || G < 0 || B<0)
cout<<"R = "<<R<<" G = "<<G <<" B = "<<B<<endl;
BGR.at<Vec3b>(i, j)[0] = (uchar)B;
BGR.at<Vec3b>(i, j)[1] = (uchar)G;
BGR.at<Vec3b>(i, j)[2] = (uchar)R;
}
}
//normalize(BGR,BGR, 255, 0, NORM_MINMAX, CV_8UC3 );
return BGR;
}
You have float to uchar truncation errors in the function LAlphBeta2RGB here:
BGR.at<Vec3b>(i, j)[0] = (uchar)B;
BGR.at<Vec3b>(i, j)[1] = (uchar)G;
BGR.at<Vec3b>(i, j)[2] = (uchar)R;
You can solve this using:
BGR(i, j)[0] = uchar(cvRound(B));
BGR(i, j)[1] = uchar(cvRound(G));
BGR(i, j)[2] = uchar(cvRound(R));
However, you don't need to handle the conversion explicitly; saturate_cast can do it for you. You can declare the R, G, B variables as uchar:
uchar B, G, R;
and perform the conversion as:
R = saturate_cast<uchar>((3.240625 * X) + (-1.537208 * Y) + (-0.498629 * Z));
G = saturate_cast<uchar>((-0.968931 * X) + (1.875756 * Y) + (0.041518 * Z));
B = saturate_cast<uchar>((0.055710 * X) + (-0.204021 * Y) + (1.056996 * Z));
and then assign as:
BGR(i, j)[0] = B;
BGR(i, j)[1] = G;
BGR(i, j)[2] = R;
Or avoid using R,G,B entirely using:
BGR(i, j)[2] = saturate_cast<uchar>((3.240625 * X) + (-1.537208 * Y) + (-0.498629 * Z));
BGR(i, j)[1] = saturate_cast<uchar>((-0.968931 * X) + (1.875756 * Y) + (0.041518 * Z));
BGR(i, j)[0] = saturate_cast<uchar>((0.055710 * X) + (-0.204021 * Y) + (1.056996 * Z));
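To see why truncation matters here, a small sketch in plain Python that mimics the two behaviours. The helper names truncate_u8 and saturate_u8 are made up for this example. saturate_u8 only approximates what saturate_cast<uchar> does on a float: round to the nearest integer, then clamp to 0..255.

```python
def truncate_u8(value):
    # Mimics a plain (uchar) cast for in-range values: the fraction is dropped.
    return int(value)


def saturate_u8(value):
    # Round to the nearest integer, then clamp to the 0..255 range.
    return max(0, min(255, round(value)))


# Hypothetical channel value after the inverse transform: the original was 200,
# but floating-point error left it slightly below.
restored = 199.99997
as_truncated = truncate_u8(restored)
as_saturated = saturate_u8(restored)
```

A round trip through log10 and pow rarely lands exactly on an integer, so a truncating cast can come out one level too low.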
Here is the full code. I took the liberty of using Mat_ instead of Mat as function arguments, to avoid using at<type>() to access pixel values. In fact, you are already assuming that the inputs of your functions are CV_8UC3 and CV_32FC3, respectively.
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;
Mat RGB2LAlphBeta(Mat3b &src)
{
Mat3f L_AlphBeta(src.rows, src.cols);
//cvtColor(src,dest,CV_BGR2XYZ);
float X, Y, Z, L, M, S, _L, Alph, Beta;
int R, G, B;
for (int i = 0; i < src.rows; i++)
{
for (int j = 0; j < src.cols; j++)
{
B = src(i, j)[0];
G = src(i, j)[1];
R = src(i, j)[2];
X = (0.4124 * R) + (0.3576 * G) + (0.1805 * B);
Y = (0.2126 * R) + (0.7152 * G) + (0.0722 * B);
Z = (0.0193 * R) + (0.1192 * G) + (0.9505 * B);
L = (0.3897 * X) + (0.6890 * Y) + (-0.0787 * Z);
M = (-0.2298 * X) + (1.1834* Y) + (0.0464 * Z);
S = (0.0000 * X) + (0.0000 * Y) + (1.0000 * Z);
//for handling log
if (L == 0.0000) L = 1.0000;
if (M == 0.0000) M = 1.0000;
if (S == 0.0000) S = 1.0000;
//LMS to Lab
_L = (1.0 / sqrt(3.0)) *((1.0000 * log10(L)) + (1.0000 * log10(M)) + (1.0000 * log10(S)));
Alph = (1.0 / sqrt(6.0)) * ((1.0000 * log10(L)) + (1.0000 * log10(M)) + (-2.0000 * log10(S)));
Beta = (1.0 / sqrt(2.0)) * ((1.0000 * log10(L)) + (-1.0000 * log10(M)) + (-0.0000 * log10(S)));
L_AlphBeta(i, j)[0] = _L;
L_AlphBeta(i, j)[1] = Alph;
L_AlphBeta(i, j)[2] = Beta;
}
}
return L_AlphBeta;
}
Mat LAlphBeta2RGB(Mat3f &src)
{
Mat3f XYZ(src.rows, src.cols);
Mat3b BGR(src.rows, src.cols);
float X, Y, Z, L, M, S, _L, Alph, Beta;
for (int i = 0; i < src.rows; i++)
{
for (int j = 0; j < src.cols; j++)
{
_L = src(i, j)[0] * 1.7321;
Alph = src(i, j)[1] * 2.4495;
Beta = src(i, j)[2] * 1.4142;
/*Inv_Transform_logLMS2lab =
0.33333 0.16667 0.50000
0.33333 0.16667 -0.50000
0.33333 -0.33333 0.00000*/
L = (0.33333*_L) + (0.16667 * Alph) + (0.50000 * Beta);
M = (0.33333 * _L) + (0.16667 * Alph) + (-0.50000 * Beta);
S = (0.33333 * _L) + (-0.33333 * Alph) + (0.00000* Beta);
L = pow(10, L);
if (L == 1) L = 0;
M = pow(10, M);
if (M == 1) M = 0;
S = pow(10, S);
if (S == 1) S = 0;
/*Inv_Transform_XYZ2LMS
1.91024 -1.11218 0.20194
0.37094 0.62905 0.00001
0.00000 0.00000 1.00000*/
X = (1.91024 *L) + (-1.11218 * M) + (0.20194 * S);
Y = (0.37094 * L) + (0.62905 * M) + (0.00001 * S);
Z = (0.00000 * L) + (0.00000 * M) + (1.00000 * S);
/*Inv_Transform_RGB2XYZ
3.240625 -1.537208 -0.498629
-0.968931 1.875756 0.041518
0.055710 -0.204021 1.056996*/
BGR(i, j)[2] = saturate_cast<uchar>((3.240625 * X) + (-1.537208 * Y) + (-0.498629 * Z));
BGR(i, j)[1] = saturate_cast<uchar>((-0.968931 * X) + (1.875756 * Y) + (0.041518 * Z));
BGR(i, j)[0] = saturate_cast<uchar>((0.055710 * X) + (-0.204021 * Y) + (1.056996 * Z));
}
}
//normalize(BGR,BGR, 255, 0, NORM_MINMAX, CV_8UC3 );
return BGR;
}
int main()
{
Mat3b img = imread("path_to_image");
Mat3f labb = RGB2LAlphBeta(img);
Mat3b rgb = LAlphBeta2RGB(labb);
Mat3b diff;
absdiff(img, rgb, diff);
// Check if all pixels are equals
cout << ((sum(diff) == Scalar(0, 0, 0, 0)) ? "Equals" : "Different");
return 0;
}

How can I get ellipse coefficient from fitEllipse function of OpenCV?

I want to extract the red ball from a picture and get the matrix of the detected ellipse.
Here is my example:
I threshold the picture, find the contour of the red ball using the findContours() function, and use fitEllipse() to fit an ellipse.
But what I want is the coefficients of this ellipse. Because fitEllipse() returns a rotated rectangle (RotatedRect), I think I need to rewrite this function.
An ellipse can be expressed as Ax^2 + By^2 + Cxy + Dx + Ey + F = 0, so I want to get u = (A,B,C,D,E,F), or u = (A,B,C,D,E) if F is 1 (to construct an ellipse matrix).
I read the source code of fitEllipse(). There are three SVD solves in total, and I think I can get the above coefficients from their results. But I am quite confused about what each result (the variable cv::Mat x) represents, and why there are three SVD solves here.
Here is the function:
cv::RotatedRect cv::fitEllipse( InputArray _points )
{
Mat points = _points.getMat();
int i, n = points.checkVector(2);
int depth = points.depth();
CV_Assert( n >= 0 && (depth == CV_32F || depth == CV_32S));
RotatedRect box;
if( n < 5 )
CV_Error( CV_StsBadSize, "There should be at least 5 points to fit the ellipse" );
// New fitellipse algorithm, contributed by Dr. Daniel Weiss
Point2f c(0,0);
double gfp[5], rp[5], t;
const double min_eps = 1e-8;
bool is_float = depth == CV_32F;
const Point* ptsi = points.ptr<Point>();
const Point2f* ptsf = points.ptr<Point2f>();
AutoBuffer<double> _Ad(n*5), _bd(n);
double *Ad = _Ad, *bd = _bd;
// first fit for parameters A - E
Mat A( n, 5, CV_64F, Ad );
Mat b( n, 1, CV_64F, bd );
Mat x( 5, 1, CV_64F, gfp );
for( i = 0; i < n; i++ )
{
Point2f p = is_float ? ptsf[i] : Point2f((float)ptsi[i].x, (float)ptsi[i].y);
c += p;
}
c.x /= n;
c.y /= n;
for( i = 0; i < n; i++ )
{
Point2f p = is_float ? ptsf[i] : Point2f((float)ptsi[i].x, (float)ptsi[i].y);
p -= c;
bd[i] = 10000.0; // 1.0?
Ad[i*5] = -(double)p.x * p.x; // A - C signs inverted as proposed by APP
Ad[i*5 + 1] = -(double)p.y * p.y;
Ad[i*5 + 2] = -(double)p.x * p.y;
Ad[i*5 + 3] = p.x;
Ad[i*5 + 4] = p.y;
}
solve(A, b, x, DECOMP_SVD);
// now use general-form parameters A - E to find the ellipse center:
// differentiate general form wrt x/y to get two equations for cx and cy
A = Mat( 2, 2, CV_64F, Ad );
b = Mat( 2, 1, CV_64F, bd );
x = Mat( 2, 1, CV_64F, rp );
Ad[0] = 2 * gfp[0];
Ad[1] = Ad[2] = gfp[2];
Ad[3] = 2 * gfp[1];
bd[0] = gfp[3];
bd[1] = gfp[4];
solve( A, b, x, DECOMP_SVD );
// re-fit for parameters A - C with those center coordinates
A = Mat( n, 3, CV_64F, Ad );
b = Mat( n, 1, CV_64F, bd );
x = Mat( 3, 1, CV_64F, gfp );
for( i = 0; i < n; i++ )
{
Point2f p = is_float ? ptsf[i] : Point2f((float)ptsi[i].x, (float)ptsi[i].y);
p -= c;
bd[i] = 1.0;
Ad[i * 3] = (p.x - rp[0]) * (p.x - rp[0]);
Ad[i * 3 + 1] = (p.y - rp[1]) * (p.y - rp[1]);
Ad[i * 3 + 2] = (p.x - rp[0]) * (p.y - rp[1]);
}
solve(A, b, x, DECOMP_SVD);
// store angle and radii
rp[4] = -0.5 * atan2(gfp[2], gfp[1] - gfp[0]); // convert from APP angle usage
if( fabs(gfp[2]) > min_eps )
t = gfp[2]/sin(-2.0 * rp[4]);
else // ellipse is rotated by an integer multiple of pi/2
t = gfp[1] - gfp[0];
rp[2] = fabs(gfp[0] + gfp[1] - t);
if( rp[2] > min_eps )
rp[2] = std::sqrt(2.0 / rp[2]);
rp[3] = fabs(gfp[0] + gfp[1] + t);
if( rp[3] > min_eps )
rp[3] = std::sqrt(2.0 / rp[3]);
box.center.x = (float)rp[0] + c.x;
box.center.y = (float)rp[1] + c.y;
box.size.width = (float)(rp[2]*2);
box.size.height = (float)(rp[3]*2);
if( box.size.width > box.size.height )
{
float tmp;
CV_SWAP( box.size.width, box.size.height, tmp );
box.angle = (float)(90 + rp[4]*180/CV_PI);
}
if( box.angle < -180 )
box.angle += 360;
if( box.angle > 360 )
box.angle -= 360;
return box;
}
The source code link: https://github.com/Itseez/opencv/blob/master/modules/imgproc/src/shapedescr.cpp
The function fitEllipse returns a RotatedRect that contains all the parameters of the ellipse.
An ellipse is defined by 5 parameters:
xc : x coordinate of the center
yc : y coordinate of the center
a : semi-axis along the box width
b : semi-axis along the box height
theta : rotation angle
You can obtain these parameters like:
RotatedRect e = fitEllipse(points);
float xc = e.center.x;
float yc = e.center.y;
float a = e.size.width / 2; // the fitEllipse source above swaps the sizes so that width <= height
float b = e.size.height / 2;
float theta = e.angle; // in degrees
You can draw an ellipse with the function ellipse using the RotatedRect:
ellipse(image, e, Scalar(0,255,0));
or, equivalently using the ellipse parameters:
ellipse(res, Point(xc, yc), Size(a, b), theta, 0.0, 360.0, Scalar(0,255,0));
If you need the coefficients of the implicit equation, you can compute them like this (from Wikipedia):
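For reference, these are the general-form coefficients as given on Wikipedia's Ellipse page, written for the equation Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0. Note that the term order differs from the one used in the question. Here (x_c, y_c) is the center, a and b are the semi-axes, and theta is the angle of the a axis; the symbols match the parameter list above.

```latex
\begin{aligned}
A &= a^2 \sin^2\theta + b^2 \cos^2\theta \\
B &= 2\,(b^2 - a^2)\,\sin\theta \cos\theta \\
C &= a^2 \cos^2\theta + b^2 \sin^2\theta \\
D &= -2A\,x_c - B\,y_c \\
E &= -B\,x_c - 2C\,y_c \\
F &= A\,x_c^2 + B\,x_c y_c + C\,y_c^2 - a^2 b^2
\end{aligned}
```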
So, you can get the parameters you need from the RotatedRect, and you don't need to change the function fitEllipse.
The solve function is used to solve linear systems or least-squares problems. Using the SVD decomposition method the system can be over-defined and/or the matrix src1 can be singular.
For more details on the algorithm, you can see the paper of Fitzgibbon that proposed this fit ellipse method.
Here is some code that worked for me, based on the other responses in this thread.
import math
import numpy as np

def getConicCoeffFromEllipse(e):
    # ellipse(Point(xc, yc),Size(a, b), theta)
    xc = e[0][0]
    yc = e[0][1]
    a = e[1][0]/2
    b = e[1][1]/2
    theta = math.radians(e[2])
    # See https://en.wikipedia.org/wiki/Ellipse
    # Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 is the equation
    A = a*a*math.pow(math.sin(theta),2) + b*b*math.pow(math.cos(theta),2)
    B = 2*(b*b - a*a)*math.sin(theta)*math.cos(theta)
    C = a*a*math.pow(math.cos(theta),2) + b*b*math.pow(math.sin(theta),2)
    D = -2*A*xc - B*yc
    E = -B*xc - 2*C*yc
    F = A*xc*xc + B*xc*yc + C*yc*yc - a*a*b*b
    # Normalise so that the constant term F becomes 1
    coef = np.array([A,B,C,D,E,F]) / F
    return coef

def getConicMatrixFromCoeff(c):
    C = np.array([[c[0], c[1]/2, c[3]/2],   # [ a,  b/2, d/2 ]
                  [c[1]/2, c[2], c[4]/2],   # [b/2,  c,  e/2 ]
                  [c[3]/2, c[4]/2, c[5]]])  # [d/2, e/2,  f  ]
    return C
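A quick way to convince yourself the coefficients are right is to sample points on the ellipse and check that they satisfy the implicit equation. The sketch below uses only the math module. The function names and the sample ellipse are made up for the check, and it assumes theta is measured from the x axis toward the y axis in the same coordinate frame as the points.

```python
import math


def conic_from_ellipse(xc, yc, a, b, theta_deg):
    """Return (A, B, C, D, E, F) of A x^2 + B x y + C y^2 + D x + E y + F = 0."""
    t = math.radians(theta_deg)
    s, c = math.sin(t), math.cos(t)
    A = a * a * s * s + b * b * c * c
    B = 2 * (b * b - a * a) * s * c
    C = a * a * c * c + b * b * s * s
    D = -2 * A * xc - B * yc
    E = -B * xc - 2 * C * yc
    F = A * xc * xc + B * xc * yc + C * yc * yc - a * a * b * b
    return A, B, C, D, E, F


def point_on_ellipse(xc, yc, a, b, theta_deg, phi):
    """Point at parameter phi on the ellipse whose semi-axis a lies along theta."""
    t = math.radians(theta_deg)
    x = xc + a * math.cos(phi) * math.cos(t) - b * math.sin(phi) * math.sin(t)
    y = yc + a * math.cos(phi) * math.sin(t) + b * math.sin(phi) * math.cos(t)
    return x, y


def conic_residual(coeffs, x, y):
    """Left-hand side of the implicit equation; close to 0 on the ellipse."""
    A, B, C, D, E, F = coeffs
    return A * x * x + B * x * y + C * y * y + D * x + E * y + F


# Hypothetical ellipse: center (10, -4), semi-axes 5 and 3, rotated 30 degrees.
coeffs = conic_from_ellipse(10.0, -4.0, 5.0, 3.0, 30.0)
residuals = [
    conic_residual(coeffs, *point_on_ellipse(10.0, -4.0, 5.0, 3.0, 30.0, k * math.pi / 4))
    for k in range(8)
]
```

The residuals should all be zero up to floating-point error, and the residual at the center is negative (it equals -a^2 b^2), which tells you which side of the curve is the inside.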

Multi otsu(multi-thresholding) with openCV

I am trying to carry out multi-thresholding with Otsu's method. The method I am currently using maximises the between-class variance, and I have managed to get the same threshold value as the one given by the OpenCV library. However, that is from running Otsu's method just once.
Documentation on how to do multi-level thresholding, or rather recursive thresholding, is rather limited. Where do I go after obtaining the original Otsu value? I would appreciate some hints. I have been playing around with the code, adding one external for loop, but the next value calculated is always 254 for any given image :(
My code, if need be:
//compute histogram first
cv::Mat imageh; //image edited to grayscale for histogram purpose
//imageh=image; //to delete and uncomment below;
cv::cvtColor(image, imageh, CV_BGR2GRAY);
int histSize[1] = {256}; // number of bins
float hranges[2] = {0.0, 256.0}; // min and max pixel value
const float* ranges[1] = {hranges};
int channels[1] = {0}; // only 1 channel used
cv::MatND hist;
// Compute histogram
calcHist(&imageh, 1, channels, cv::Mat(), hist, 1, histSize, ranges);
IplImage* im = new IplImage(imageh);//assign the image to an IplImage pointer
IplImage* finalIm = cvCreateImage(cvSize(im->width, im->height), IPL_DEPTH_8U, 1);
double otsuThreshold= cvThreshold(im, finalIm, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU );
cout<<"opencv otsu gives "<<otsuThreshold<<endl;
int totalNumberOfPixels= imageh.total();
cout<<"total number of Pixels is " <<totalNumberOfPixels<< endl;
float sum = 0;
for (int t=0 ; t<256 ; t++)
{
sum += t * hist.at<float>(t);
}
cout<<"sum is "<<sum<<endl;
float sumB = 0; //sum of background
int wB = 0; // weight of background
int wF = 0; //weight of foreground
float varMax = 0;
int threshold = 0;
//iterate to find the maximum of the between-class variance (the between-class variance should be maximised)
for (int t=0 ; t<256 ; t++)
{
wB += hist.at<float>(t); // Weight Background
if (wB == 0) continue;
wF = totalNumberOfPixels - wB; // Weight Foreground
if (wF == 0) break;
sumB += (float) (t * hist.at<float>(t));
float mB = sumB / wB; // Mean Background
float mF = (sum - sumB) / wF; // Mean Foreground
// Calculate Between Class Variance
float varBetween = (float)wB * (float)wF * (mB - mF) * (mB - mF);
// Check if new maximum found
if (varBetween > varMax) {
varMax = varBetween;
threshold = t;
}
}
cout<<"threshold value is: "<<threshold;
To extend Otsu's thresholding method to multi-level thresholding the between class variance equation becomes:
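For reference, the usual multi-class form of the between-class variance, written to match the currVarB expression in the C# code further down. The symbols are my own choice: M is the number of classes (so M - 1 thresholds), C_k is the set of gray levels in class k, p_i is the normalised histogram, and mu_T is the mean of the whole image.

```latex
\sigma_B^2 = \sum_{k=0}^{M-1} \omega_k \,(\mu_k - \mu_T)^2,
\qquad
\omega_k = \sum_{i \in C_k} p_i,
\qquad
\mu_k = \frac{1}{\omega_k} \sum_{i \in C_k} i \, p_i
```

The thresholds are chosen to maximise this sum, and the class weights always satisfy \sum_k \omega_k = 1.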
Please check out Deng-Yuan Huang, Ta-Wei Lin, Wu-Chih Hu, Automatic
Multilevel Thresholding Based on Two-Stage Otsu's Method with Cluster
Determination by Valley Estimation, Int. Journal of Innovative
Computing, 2011, 7:5631-5644 for more information.
http://www.ijicic.org/ijicic-10-05033.pdf
Here is my C# implementation of Otsu Multi for 2 thresholds:
/* Otsu (1979) - multi */
Tuple < int, int > otsuMulti(object sender, EventArgs e) {
//image histogram
int[] histogram = new int[256];
//total number of pixels
int N = 0;
//accumulate image histogram and total number of pixels
foreach(int intensity in image.Data) {
if (intensity != 0) {
histogram[intensity] += 1;
N++;
}
}
double W0K, W1K, W2K, M0, M1, M2, currVarB, optimalThresh1, optimalThresh2, maxBetweenVar, M0K, M1K, M2K, MT;
optimalThresh1 = 0;
optimalThresh2 = 0;
W0K = 0;
W1K = 0;
M0K = 0;
M1K = 0;
MT = 0;
maxBetweenVar = 0;
for (int k = 0; k <= 255; k++) {
MT += k * (histogram[k] / (double) N);
}
for (int t1 = 0; t1 <= 255; t1++) {
W0K += histogram[t1] / (double) N; //Pi
M0K += t1 * (histogram[t1] / (double) N); //i * Pi
M0 = M0K / W0K; //(i * Pi)/Pi
W1K = 0;
M1K = 0;
for (int t2 = t1 + 1; t2 <= 255; t2++) {
W1K += histogram[t2] / (double) N; //Pi
M1K += t2 * (histogram[t2] / (double) N); //i * Pi
M1 = M1K / W1K; //(i * Pi)/Pi
W2K = 1 - (W0K + W1K);
M2K = MT - (M0K + M1K);
if (W2K <= 0) break;
M2 = M2K / W2K;
currVarB = W0K * (M0 - MT) * (M0 - MT) + W1K * (M1 - MT) * (M1 - MT) + W2K * (M2 - MT) * (M2 - MT);
if (maxBetweenVar < currVarB) {
maxBetweenVar = currVarB;
optimalThresh1 = t1;
optimalThresh2 = t2;
}
}
}
return new Tuple<int, int>((int) optimalThresh1, (int) optimalThresh2);
}
And this is the result I got by thresholding an image scan of soil with the above code:
(T1 = 110, T2 = 147).
Otsu's original paper: "Nobuyuki Otsu, A Threshold Selection Method
from Gray-Level Histogram, IEEE Transactions on Systems, Man, and
Cybernetics, 1979, 9:62-66" also briefly mentions the extension to
Multithresholding.
https://engineering.purdue.edu/kak/computervision/ECE661.08/OTSU_paper.pdf
Hope this helps.
Here is a simple general approach for 'n' thresholds in python (>3.0) :
# developed by- SUJOY KUMAR GOSWAMI
# source paper- https://people.ece.cornell.edu/acharya/papers/mlt_thr_img.pdf
import cv2
import numpy as np
import math

img = cv2.imread('path-to-image')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
a = 0
b = 255
n = 6 # number of thresholds (better choose even value)
k = 0.7 # free variable to take any positive value
T = [] # list which will contain 'n' thresholds

def sujoy(img, a, b):
    if a>b:
        s=-1
        m=-1
        return m,s
    img = np.array(img)
    t1 = (img>=a)
    t2 = (img<=b)
    X = np.multiply(t1,t2)
    Y = np.multiply(img,X)
    s = np.sum(X)
    m = np.sum(Y)/s
    return m,s

for i in range(int(n/2-1)):
    img = np.array(img)
    t1 = (img>=a)
    t2 = (img<=b)
    X = np.multiply(t1,t2)
    Y = np.multiply(img,X)
    mu = np.sum(Y)/np.sum(X)
    Z = Y - mu
    Z = np.multiply(Z,X)
    W = np.multiply(Z,Z)
    sigma = math.sqrt(np.sum(W)/np.sum(X))
    T1 = mu - k*sigma
    T2 = mu + k*sigma
    x, y = sujoy(img, a, T1)
    w, z = sujoy(img, T2, b)
    T.append(x)
    T.append(w)
    a = T1+1
    b = T2-1
    k = k*(i+1)

T1 = mu
T2 = mu+1
x, y = sujoy(img, a, T1)
w, z = sujoy(img, T2, b)
T.append(x)
T.append(w)
T.sort()
print(T)
For the full paper and more information, visit this link.
I've written an example of how Otsu thresholding works in Python before. You can see the source code here: https://github.com/subokita/Sandbox/blob/master/otsu.py
The example has two variants: otsu2(), the optimised version as seen on the Wikipedia page, and otsu(), a more naive implementation based on the algorithm description itself.
If you are comfortable reading Python code (in this case it is pretty simple, almost pseudocode), you might want to look at otsu() in the example and modify it. Porting it to C++ is not hard either.
@Antoni4 gives the best answer in my opinion, and it's very straightforward to increase the number of levels.
This is for three-level thresholding:
#include "Shadow01-1.cuh"
void multiThresh(double &optimalThresh1, double &optimalThresh2, double &optimalThresh3, cv::Mat &imgHist, cv::Mat &src)
{
double W0K, W1K, W2K, W3K, M0, M1, M2, M3, currVarB, maxBetweenVar, M0K, M1K, M2K, M3K, MT;
unsigned char *histogram = (unsigned char*)(imgHist.data);
int N = src.rows*src.cols;
W0K = 0;
W1K = 0;
M0K = 0;
M1K = 0;
MT = 0;
maxBetweenVar = 0;
for (int k = 0; k <= 255; k++) {
MT += k * (histogram[k] / (double) N);
}
for (int t1 = 0; t1 <= 255; t1++)
{
W0K += histogram[t1] / (double) N; //Pi
M0K += t1 * (histogram[t1] / (double) N); //i * Pi
M0 = M0K / W0K; //(i * Pi)/Pi
W1K = 0;
M1K = 0;
for (int t2 = t1 + 1; t2 <= 255; t2++)
{
W1K += histogram[t2] / (double) N; //Pi
M1K += t2 * (histogram[t2] / (double) N); //i * Pi
M1 = M1K / W1K; //(i * Pi)/Pi
W2K = 1 - (W0K + W1K);
M2K = MT - (M0K + M1K);
if (W2K <= 0) break;
M2 = M2K / W2K;
W3K = 0;
M3K = 0;
for (int t3 = t2 + 1; t3 <= 255; t3++)
{
W2K += histogram[t3] / (double) N; //Pi
M2K += t3 * (histogram[t3] / (double) N); // i*Pi
M2 = M2K / W2K; //(i*Pi)/Pi
W3K = 1 - (W1K + W2K);
M3K = MT - (M1K + M2K);
M3 = M3K / W3K;
currVarB = W0K * (M0 - MT) * (M0 - MT) + W1K * (M1 - MT) * (M1 - MT) + W2K * (M2 - MT) * (M2 - MT) + W3K * (M3 - MT) * (M3 - MT);
if (maxBetweenVar < currVarB)
{
maxBetweenVar = currVarB;
optimalThresh1 = t1;
optimalThresh2 = t2;
optimalThresh3 = t3;
}
}
}
}
}
@Guilherme Silva
Your code has a bug.
You must replace:
W3K = 0;
M3K = 0;
with
W2K = 0;
M2K = 0;
and
W3K = 1 - (W1K + W2K);
M3K = MT - (M1K + M2K);
with
W3K = 1 - (W0K + W1K + W2K);
M3K = MT - (M0K + M1K + M2K);
;-)
Regards
EDIT(1): [Toby Speight]
I discovered this bug by applying the effect to the same picture at different resolutions (sizes) and seeing that the outputs differed too much from each other (even when the resolution changed only slightly).
W3K and M3K must be the totals minus all the previous WKs and MKs.
(I arrived at this by analogy with the version that has one level fewer.)
At the moment, because of my limited English, I cannot explain the how and why any better.
To be honest, I'm still not 100% sure that this is correct, even though my outputs suggest that it gives better results (even with one more level, i.e. 5 shades of gray).
You could try it yourself ;-)
Sorry
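The point of the fix is that the four class weights have to add up to 1 and the four partial means have to add up to MT. One way to make that hard to get wrong is to derive every class from prefix sums of the histogram, so the classes partition the bins by construction. A minimal sketch, assuming a 256-bin histogram of raw counts passed as a std::vector<double>; the name otsu3 and its helpers are made up for illustration and are not part of the code above:

```cpp
#include <array>
#include <vector>

// Exhaustive three-threshold Otsu on a 256-bin histogram of raw counts.
// Classes are [0,t1], (t1,t2], (t2,t3], (t3,255]. Returns {t1, t2, t3}.
std::array<int, 3> otsu3(const std::vector<double>& hist)
{
    std::array<int, 3> best{0, 0, 0};
    double n = 0.0;
    for (double h : hist) n += h;
    if (hist.size() != 256 || n <= 0.0) return best;

    // Prefix sums: P[k] = sum of p_i for i < k, S[k] = sum of i * p_i for i < k.
    std::vector<double> P(257, 0.0), S(257, 0.0);
    for (int i = 0; i < 256; ++i) {
        P[i + 1] = P[i] + hist[i] / n;
        S[i + 1] = S[i] + i * hist[i] / n;
    }
    const double mt = S[256];                    // total mean

    // Between-class variance contribution of the class covering bins [lo, hi).
    auto term = [&](int lo, int hi) {
        const double w = P[hi] - P[lo];
        if (w <= 0.0) return 0.0;                // empty class contributes nothing
        const double m = (S[hi] - S[lo]) / w;
        return w * (m - mt) * (m - mt);
    };

    double bestVar = -1.0;
    for (int t1 = 0; t1 <= 253; ++t1)
        for (int t2 = t1 + 1; t2 <= 254; ++t2)
            for (int t3 = t2 + 1; t3 <= 255; ++t3) {
                const double v = term(0, t1 + 1) + term(t1 + 1, t2 + 1)
                               + term(t2 + 1, t3 + 1) + term(t3 + 1, 256);
                if (v > bestVar) { bestVar = v; best = {t1, t2, t3}; }
            }
    return best;
}
```

Because each weight is a difference of the same prefix array, the last class is automatically 1 - (W0 + W1 + W2), which is what the corrected W3K line computes by hand. The triple loop visits roughly 2.8 million combinations, so it is fine for a one-off but not for real-time use.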
My outputs (images not included): 3 thresholds and 4 thresholds.
I found a useful piece of code in this thread. I was looking for a multi-level Otsu implementation for double/float images, so I tried to generalize the example to N levels with a double/float matrix as input. The code below depends on the Armadillo library, but it can easily be adapted to standard C++ arrays: replace the vec and uvec objects with one-dimensional double and integer arrays, and mat and umat with two-dimensional ones. The two other Armadillo functions used here are vectorise and hist.
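// Input parameters:
// map - input image (double matrix)
// nBins - number of histogram bins
// nLevels - number of Otsu thresholds
// Returns a matrix of class labels (0..nLevels), one per pixel.
#include <armadillo>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
using namespace arma;
using namespace std;
int nCombinations(int n, int r); // defined below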
mat OtsuFilterMulti(mat map, int nBins, int nLevels) {
mat mapr; // output thresholded image
mapr = zeros<mat>(map.n_rows, map.n_cols);
unsigned int numElem = 0;
vec threshold = zeros<vec>(nLevels);
vec q = zeros<vec>(nLevels + 1);
vec mu = zeros<vec>(nLevels + 1);
vec muk = zeros<vec>(nLevels + 1);
uvec binv = zeros<uvec>(nLevels);
if (nLevels <= 1) return mapr;
numElem = map.n_rows*map.n_cols;
uvec histogram = hist(vectorise(map), nBins);
double maxval = map.max();
double minval = map.min();
double odelta = (maxval - minval) / nBins; // width of one histogram bin
vec oval = zeros<vec>(nBins);
double mt = 0, variance = 0.0, bestVariance = 0.0;
for (int ii = 0; ii < nBins; ii++) {
oval(ii) = minval + odelta*(ii + 0.5); // centers of histogram bins, in image value units
mt += (double)ii*((double)histogram(ii)) / (double)numElem; // total mean, in bin-index units
}
for (int ii = 0; ii < nLevels; ii++) {
binv(ii) = ii;
}
double sq, smuk;
int nComb;
nComb = nCombinations(nBins,nLevels);
std::vector<bool> v(nBins);
std::fill(v.begin(), v.begin() + nLevels, true);
umat ibin = zeros<umat>(nComb, nLevels); // indices from combinations will be stored here
int cc = 0;
int ci = 0;
do {
for (int i = 0; i < nBins; ++i) {
if(ci==nLevels) ci=0;
if (v[i]) {
ibin(cc,ci) = i;
ci++;
}
}
cc++;
} while (std::prev_permutation(v.begin(), v.end()));
uvec lastIndex = zeros<uvec>(nLevels);
// Perform operations on pre-calculated indices
for (int ii = 0; ii < nComb; ii++) {
for (int jj = 0; jj < nLevels; jj++) {
smuk = 0;
sq = 0;
if (lastIndex(jj) != ibin(ii, jj) || ii == 0) {
q(jj) += double(histogram(ibin(ii, jj))) / (double)numElem;
muk(jj) += ibin(ii, jj)*(double(histogram(ibin(ii, jj)))) / (double)numElem;
mu(jj) = muk(jj) / q(jj);
q(jj + 1) = 0.0;
muk(jj + 1) = 0.0;
if (jj>0) {
for (int kk = 0; kk <= jj; kk++) {
sq += q(kk);
smuk += muk(kk);
}
q(jj + 1) = 1 - sq;
muk(jj + 1) = mt - smuk;
mu(jj + 1) = muk(jj + 1) / q(jj + 1);
}
if (jj>0 && jj<(nLevels - 1)) {
q(jj + 1) = 0.0;
muk(jj + 1) = 0.0;
}
lastIndex(jj) = ibin(ii, jj);
}
}
variance = 0.0;
for (int jj = 0; jj <= nLevels; jj++) {
variance += q(jj)*(mu(jj) - mt)*(mu(jj) - mt);
}
if (variance > bestVariance) {
bestVariance = variance;
for (int jj = 0; jj<nLevels; jj++) {
threshold(jj) = oval(ibin(ii, jj));
}
}
}
cout << "Optimized thresholds: ";
for (int jj = 0; jj<nLevels; jj++) {
cout << threshold(jj) << " ";
}
cout << endl;
for (unsigned int jj = 0; jj<map.n_rows; jj++) {
for (unsigned int kk = 0; kk<map.n_cols; kk++) {
for (int ll = 0; ll<nLevels; ll++) {
if (map(jj, kk) >= threshold(ll)) {
mapr(jj, kk) = ll+1;
}
}
}
}
return mapr;
}
int nCombinations(int n, int r) {
if (r>n) return 0;
if (r*2 > n) r = n-r;
if (r == 0) return 1;
int ret = n;
for( int i = 2; i <= r; ++i ) {
ret *= (n-i+1);
ret /= i;
}
return ret;
}
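If you want to adapt this to plain standard containers, the combination enumeration is the part that is easiest to test on its own. A minimal sketch of the same std::prev_permutation idiom without Armadillo; the name thresholdCombinations is hypothetical, and it returns the combinations as nested vectors instead of filling a umat:

```cpp
#include <algorithm>
#include <vector>

// Enumerate every way of picking nLevels threshold bins out of nBins.
// Each combination lists its bin indices in ascending order.
std::vector<std::vector<int>> thresholdCombinations(int nBins, int nLevels)
{
    std::vector<std::vector<int>> combos;
    if (nLevels <= 0 || nLevels > nBins) return combos;

    // Selection mask: the first nLevels entries start as true (1100...0).
    std::vector<bool> pick(nBins, false);
    std::fill(pick.begin(), pick.begin() + nLevels, true);

    do {
        std::vector<int> c;
        for (int i = 0; i < nBins; ++i)
            if (pick[i]) c.push_back(i);         // collect the selected bins
        combos.push_back(c);
    } while (std::prev_permutation(pick.begin(), pick.end()));
    return combos;
}
```

For example, thresholdCombinations(4, 2) yields the six pairs from {0,1} to {2,3}. Keep in mind that the count grows quickly: 256 bins with 3 thresholds already gives 2,763,520 combinations, so storing them all up front costs a fair amount of memory.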

Resources