I am looking to align images for a focus stacking application using a smartphone. Links to images:
First in stack: 1, Last in stack: 2, Final stacked images: 3
I.e. images are nominally the same, BUT contain:
Systematic change in FOCUS as the focal plane shifts between images
Magnification changes slightly (smartphone feature as focus changes!)
Camera moves slightly due to random vibrations.
Images need to be aligned for the focus-stacking APP to work.
Progress to date:
I use OpenCV's findTransformECC() to get the alignment. It works well after some experimentation; the question "cv2.MOTION_EUCLIDEAN for the warp_mode in ECC image alignment method" was useful for improving the initialization of the warp matrix. Results so far:
Images aligned at pixel level
60 s to process an 8 Mpix image (1 s for a 0.5 Mpix image) on a 3-year-old portable PC with the OpenCV release libraries
See stacked image link above.
I briefly investigated a feature detector (SIFT). It did not align the images well, presumably due to the change in focus between images.
int scale = 1;
int scaleSmall = 4;
float scaleDiff = static_cast<float>(scaleSmall) / scale; // cast first to avoid integer division
for (i = 0; i < numImages; i++) {
    file = dir + image + to_string(i) + ".jpg";
    col[i] = imread(file);
    resize(col[i], z[i], Size(col[i].cols / scale, col[i].rows / scale));
    cvtColor(z[i], zg[i], CV_BGR2GRAY);
    resize(zg[i], zgSmall[i], Size(col[i].cols / scaleSmall, col[i].rows / scaleSmall));
}
// Set a 2x3 or 3x3 warp matrix depending on the motion model.
// See
// Define the motion model
const int warp_mode = MOTION_HOMOGRAPHY;
// Initialize the matrix to identity
if (warp_mode == MOTION_HOMOGRAPHY) {
    warp_init = Mat::eye(3, 3, CV_32F);
    warp_matrix = Mat::eye(3, 3, CV_32F);
    warp_matrix_prev = Mat::eye(3, 3, CV_32F);
    scaleTX = (Mat_<float>(3, 3) << 1, 1, scaleDiff, 1, 1, scaleDiff, 1 / scaleDiff, 1 / scaleDiff, 1);
}
else {
    warp_init = Mat::eye(2, 3, CV_32F);
    scaleTX = Mat::eye(2, 3, CV_32F);
    warp_matrix = Mat::eye(2, 3, CV_32F);
    warp_matrix_prev = Mat::eye(2, 3, CV_32F);
    scaleTX = (Mat_<float>(2, 3) << 1, 1, scaleDiff, 1, 1, scaleDiff);
}
// Specify the number of iterations.
int number_of_iterations = 5000;
// Specify the threshold of the increment
// in the correlation coefficient between two iterations
double termination_eps = 1e-8;
// Define termination criteria
TermCriteria criteria(TermCriteria::COUNT + TermCriteria::EPS, number_of_iterations, termination_eps);
for (i = 1; i < numImages; i++) {
// Check images right size
if (zg[0].rows < 10 || zg[1].rows < 10)
    // Run the ECC algorithm at start to get an initial guess. The results are stored in warp_matrix.
    if (i == 1) {
        findTransformECC(zgSmall[0], zgSmall[i], warp_init, warp_mode, criteria);
        // See
        // Element-wise product: rescales the translation (and perspective) terms to full resolution
        warp_matrix = warp_init.mul(scaleTX);
    }
    // Warp Matrix from previous iteration is used as initialisation
    findTransformECC(zg[0], zg[i], warp_matrix, warp_mode, criteria);
    if (warp_mode != MOTION_HOMOGRAPHY) {
        warpAffine(zg[i], ag[i], warp_matrix, zg[i].size(), INTER_LINEAR + WARP_INVERSE_MAP);
        warpAffine(z[i], acol[i], warp_matrix, zg[i].size(), INTER_LINEAR + WARP_INVERSE_MAP);
    }
    else {
        // Use warpPerspective for Homography
        warpPerspective(z[i], acol[i], warp_matrix, z[i].size(), INTER_LINEAR + WARP_INVERSE_MAP);
        warpPerspective(zg[i], ag[i], warp_matrix, zg[i].size(), INTER_LINEAR + WARP_INVERSE_MAP);
    }
}
Can the image registration speed be significantly improved (using the same hardware)?

There are at least three improvements you can make:
5000 iterations may be unnecessary. Try to limit it to 500. Moreover transforming images to gradient domain may help. See GetGradient function from this tutorial.
You can assume that perspective effects are negligible so you can change warp_mode to MOTION_AFFINE to limit the degrees of freedom from 8 to 6.
You can also try another, much faster approach that is based on phase correlation (frequency domain). In its standard form it only estimates the translation between images, but you can transform the images to log-polar space to recover rotation and scale as well. This code implements the third approach.


Technique to introduce normalisation/consistency to std dev comparison?

I am implementing a very simple segmentation algorithm for single channel images. The algorithm works like so:
For a single channel image:
1. Calculate the standard deviation, i.e. measure how much the luminosity varies across the image.
2. If the stddev > 15 (aka threshold):
   - Divide the image into 4 cells/images
   - For each cell: repeat step 1 and step 2 (go recursive)
3. Otherwise, draw a rectangle on the source image to signify a segment lies in these bounds.
My problem occurs because my threshold is constant, and when I go recursive 15 is no longer a good signifier of whether that image is homogeneous or not. How can I introduce consistency/normalisation to my homogeneity check?
Should I resize each image to the same size (100x100)? Should my threshold be a formula? Say 15 / img.rows * img.cols or 15 / MAX_HISTOGRAM_PEAK?
Edit Code:
void split_mat(const Mat& src, Mat& split1, Mat& split2, Mat& split3, Mat& split4) {
    split1 = Mat(src, Rect(Point(0, 0), Point(src.cols / 2, src.rows / 2)));
    split2 = Mat(src, Rect(Point(src.cols / 2, 0), Point(src.cols, src.rows / 2)));
    split3 = Mat(src, Rect(Point(0, src.rows / 2), Point(src.cols / 2, src.rows)));
    split4 = Mat(src, Rect(Point(src.cols / 2, src.rows / 2), Point(src.cols, src.rows)));
}
void segment_by_homogeny(const Mat& src, double threshold) {
    Scalar mean, stddev;
    meanStdDev(src, mean, stddev);
    double dev = stddev[0]; // / (src.rows * src.cols) * 100.0;
    if (dev >= threshold) {
        Mat s1, s2, s3, s4;
        split_mat(src, s1, s2, s3, s4);
        // Go recursive and segment each sub-segment where necessary
        segment_by_homogeny(s1, threshold);
        segment_by_homogeny(s2, threshold);
        segment_by_homogeny(s3, threshold);
        segment_by_homogeny(s4, threshold);
    }
    else {
        // Store 'segment' in global vector 'images'
        // and write std dev on it
        char d[255];
        sprintf_s(d, "Std Dev: %f", stddev[0]);
        putText(src, d, cvPoint(30, 60),
            FONT_HERSHEY_COMPLEX_SMALL, 0.7, cvScalar(200, 200, 250), 1, CV_AA);
    }
}
// Current usage for the example image results in infinite recursion.
// The green and red segment never has a std dev < 25
segment_by_homogeny(img, 25);
I am expecting my algorithm to produce the following 5 segments:
You can simplify your algorithm. Because you want to divide the given region into 4 subregions, you can first divide it into the 4 subregions, then calculate the average luminosity value for each, and have your threshold on the difference between these neighbor values.
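A minimal sketch of that split test, using only the standard library so it does not depend on a particular OpenCV version. `GrayImage`, `regionMean` and `shouldSplit` are hypothetical names. Two assumptions go beyond the answer: the "difference between neighbours" is taken as the spread (max minus min) of the four quadrant means, and regions smaller than 2x2 are never split, which also bounds the recursion.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Row-major 8-bit single-channel image (hypothetical minimal container).
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Mean luminosity of the rectangle [x0, x1) x [y0, y1).
double regionMean(const GrayImage& img, int x0, int y0, int x1, int y1) {
    double sum = 0.0;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            sum += img.pixels[y * img.width + x];
    return sum / (static_cast<double>(x1 - x0) * (y1 - y0));
}

// True if the four quadrant means of the region differ by more than `threshold`.
bool shouldSplit(const GrayImage& img, int x0, int y0, int x1, int y1, double threshold) {
    if (x1 - x0 < 2 || y1 - y0 < 2) return false;  // too small to split further
    const int mx = (x0 + x1) / 2;
    const int my = (y0 + y1) / 2;
    const std::array<double, 4> means = {
        regionMean(img, x0, y0, mx, my),   // top-left
        regionMean(img, mx, y0, x1, my),   // top-right
        regionMean(img, x0, my, mx, y1),   // bottom-left
        regionMean(img, mx, my, x1, y1)};  // bottom-right
    const auto mm = std::minmax_element(means.begin(), means.end());
    return *mm.second - *mm.first > threshold;
}
```

Because the test compares means rather than a standard deviation, the threshold keeps the same meaning (a luminosity difference on the 0 to 255 scale) at every recursion depth.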

Using opencv matchtemplate for blister pack inspection

I am doing a project in which I have to inspect pharmaceutical blister pack for missing tablets.
I am trying to use opencv's matchTemplate function. Let me show the code and then some results.
int match(string filename, string templatename) {
    Mat ref = cv::imread(filename + ".jpg");
    Mat tpl = cv::imread(templatename + ".jpg");
    if (ref.empty() || tpl.empty()) {
        cout << "Error reading file(s)!" << endl;
        return -1;
    }
    imshow("file", ref);
    imshow("template", tpl);
    Mat res_32f(ref.rows - tpl.rows + 1, ref.cols - tpl.cols + 1, CV_32FC1);
    matchTemplate(ref, tpl, res_32f, CV_TM_CCOEFF_NORMED);
    Mat res;
    res_32f.convertTo(res, CV_8U, 255.0);
    imshow("result", res);
    int size = ((tpl.cols + tpl.rows) / 4) * 2 + 1; //force size to be odd
    adaptiveThreshold(res, res, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, size, -128);
    imshow("result_thresh", res);
    while (true) {
        double minval, maxval, threshold = 0.8;
        Point minloc, maxloc;
        minMaxLoc(res, &minval, &maxval, &minloc, &maxloc);
        if (maxval >= threshold) {
            rectangle(ref, maxloc, Point(maxloc.x + tpl.cols, maxloc.y + tpl.rows), CV_RGB(0,255,0), 2);
            floodFill(res, maxloc, 0); //mark drawn blob
        }
        else
            break; // no remaining match above the threshold
    }
    imshow("final", ref);
    return 0;
}
And here are some pictures.
The "sample" image of a good blister pack:
The template cropped from "sample" image:
Result with "sample" image:
Missing tablet from this pack is detected:
But here are the problems:
I currently don't have any idea why this happens. Any suggestion and/or help is appreciated.
The original code that I followed and modified is here:
I found a solution for my own question. I just need to apply Canny edge detector on both image and template before throwing them to matchTemplate function. The full working code:
int match(string filename, string templatename) {
    Mat ref = cv::imread(filename + ".jpg");
    Mat tpl = cv::imread(templatename + ".jpg");
    if (ref.empty() || tpl.empty()) {
        cout << "Error reading file(s)!" << endl;
        return -1;
    }
    Mat gref, gtpl;
    cvtColor(ref, gref, CV_BGR2GRAY);
    cvtColor(tpl, gtpl, CV_BGR2GRAY);
    const int low_canny = 110;
    Canny(gref, gref, low_canny, low_canny*3);
    Canny(gtpl, gtpl, low_canny, low_canny*3);
    imshow("file", gref);
    imshow("template", gtpl);
    Mat res_32f(ref.rows - tpl.rows + 1, ref.cols - tpl.cols + 1, CV_32FC1);
    matchTemplate(gref, gtpl, res_32f, CV_TM_CCOEFF_NORMED);
    Mat res;
    res_32f.convertTo(res, CV_8U, 255.0);
    imshow("result", res);
    int size = ((tpl.cols + tpl.rows) / 4) * 2 + 1; //force size to be odd
    adaptiveThreshold(res, res, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, size, -64);
    imshow("result_thresh", res);
    double minval, maxval;
    Point minloc, maxloc;
    minMaxLoc(res, &minval, &maxval, &minloc, &maxloc);
    if (maxval > 0) {
        rectangle(ref, maxloc, Point(maxloc.x + tpl.cols, maxloc.y + tpl.rows), Scalar(0,255,0), 2);
        floodFill(res, maxloc, 0); //mark drawn blob
    }
    imshow("final", ref);
    return 0;
}
Any suggestion for improvement is appreciated. I am strongly concerned about performance and robustness of my code, so I am looking for all ideas.
Two things still bother me: the lower Canny threshold and the negative constant passed to the adaptiveThreshold function.
Edit: Here is the result, as you asked :)
Test image, missing 2 tablets:
Canny results of template and test image:
matchTemplate result (converted to CV_8U):
After adaptiveThreshold:
Final result:
I don't think the adaptive threshold is a good choice.
What you need to do here is called non-maximum suppression. You have an image with multiple local maxima, and you want to remove all pixels that are not local maxima.
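cv::Mat res_dilated, not_local_max;
cv::dilate(res_32f, res_dilated, cv::Mat(), cv::Point(-1, -1), 5); // each pixel becomes the maximum of its neighbourhood
cv::compare(res_32f, res_dilated, not_local_max, cv::CMP_LT);      // pixels strictly below their neighbourhood maximum
res_32f.setTo(0, not_local_max);                                    // zero everything that is not a local maximum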
Now all pixels in the res_32f image that are not local maxima are set to zero. All the maximum pixels are still at their original value, so you can adjust the threshold later in the line
double minval, maxval, threshold = 0.8;
All local maxima should also now be surrounded by enough zeroes that the floodfill will not extend too far.
Now I think you should be able to adjust the threshold to exclude all false positives.
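To make the idea concrete without depending on OpenCV, here is a minimal sketch of the same non-maximum suppression on a plain row-major score map. The function name `suppressNonMaxima` and the `radius` parameter are hypothetical; the square neighbourhood stands in for the dilation above.

```cpp
#include <algorithm>
#include <vector>

// Keep only the local maxima of a row-major score map; every other pixel becomes 0.
// `radius` is the half-width of the square neighbourhood that is searched.
std::vector<float> suppressNonMaxima(const std::vector<float>& scores,
                                     int width, int height, int radius) {
    std::vector<float> out(scores.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float centre = scores[y * width + x];
            float neighbourhoodMax = centre;
            // Equivalent of a dilation: maximum over the clipped neighbourhood.
            for (int ny = std::max(0, y - radius); ny <= std::min(height - 1, y + radius); ++ny)
                for (int nx = std::max(0, x - radius); nx <= std::min(width - 1, x + radius); ++nx)
                    neighbourhoodMax = std::max(neighbourhoodMax, scores[ny * width + nx]);
            // Equivalent of the compare step: keep the pixel only if it equals that maximum.
            if (centre >= neighbourhoodMax) out[y * width + x] = centre;
        }
    }
    return out;
}
```

The surviving values are the original correlation scores, so a single global threshold such as 0.8 can be applied afterwards.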
If this is not enough, here is another suggestion:
Instead of just one template, I would run the search with multiple templates: your current template, plus one with a tablet from the right side and one from the left side of the pack. Due to perspective these tablets look quite a bit different. Keep track of the found tablets so you do not detect the same tablet multiple times.
With these multiple templates you can raise the threshold even higher.
One further refinement: if the detection is still too erratic, try blurring your template and search image with a Gaussian blur. This will remove fine details and noise that may throw off the matchTemplate function, while leaving the larger structures intact.
Using a canny filter instead seems unreliable to me: It seems to rely on the fact that a removed tablet region will have more edges at the center. But I am not sure if this will always be the case; and you discard a lot of information about color and brightness with the canny filter, so I would expect worse results.
(that said, if it works for you, it works)
Have you tried the SURF algorithm in order to get more detailed descriptors? You could try to collect descriptors for both the full and the empty sample image, and perform a different action for each of the objects detected.

Detect basket ball Hoops and ball tracking

Detect the hoop (basket). See the samples of "hoop".
Count the number of successful attempts (shots) and failed attempts.
I am using opencv.
Camera position will be static.
Videos are in portrait mode, from any mobile device.
What I have tried:
I am able to track the basketball, but I am still looking for a better solution.
My code:
int main () {
VideoCapture vid(path);
if (!vid.isOpened())
int i_frame_height = vid.get(CV_CAP_PROP_FRAME_HEIGHT);
i_height_basketball = i_height_basketball * I_HEIGHT / i_frame_height;
int fps = vid.get(CV_CAP_PROP_FPS);
Mat mat_black(640, 480, CV_8UC3, Scalar(0, 0, 0));
vector <Mat> vec_frames;
for (int i_push = 0; i_push < I_NO_FRAMES_STORE; i_push++)
vector <Mat> vec_mat_result;
for (int i_push = 0; i_push < I_RESULT_STORE; i_push++)
int count_frame = 0;
while (true) {
int clk_start = clock();
Mat image, result;
vid >> image;
if (image.empty())
resize(image, image, Size(I_WIDTH, I_HEIGHT));
image.copyTo(vec_mat_result[count_frame % I_RESULT_STORE]);
if (count_frame >= 1)
vec_mat_result[(count_frame - 1) % I_RESULT_STORE].copyTo(result);
GaussianBlur(image, image, Size(9, 9), 2, 2);
image.copyTo(vec_frames[count_frame % I_NO_FRAMES_STORE]);
if (count_frame >= I_NO_FRAMES_STORE - 1) {
Mat mat_diff_temp(I_HEIGHT, I_WIDTH, CV_32S, Scalar(0));
for (int i_diff = 0; i_diff < I_NO_FRAMES_STORE; i_diff++) {
Mat mat_rgb_diff_temp = abs(vec_frames[ (count_frame - 1) % I_NO_FRAMES_STORE ] - vec_frames[ (count_frame - i_diff) % I_NO_FRAMES_STORE ]);
cvtColor(mat_rgb_diff_temp, mat_rgb_diff_temp, CV_BGR2GRAY);
mat_rgb_diff_temp = mat_rgb_diff_temp > I_THRESHOLD;
mat_rgb_diff_temp.convertTo(mat_rgb_diff_temp, CV_32S);
mat_diff_temp = mat_diff_temp + mat_rgb_diff_temp;
mat_diff_temp = mat_diff_temp > I_THRESHOLD_2;
// mat_diff_temp.convertTo(mat_diff_temp, CV_8U);
Mat mat_roi = mat_diff_temp.rowRange(0, i_height_basketball);
// imshow("ROI", mat_roi);
Moments mm = cv::moments(mat_roi, true);
Point p_center = Point(mm.m10 / mm.m00, mm.m01 / mm.m00);
circle(result, p_center, 3, CV_RGB(0, 255, 0), -1);
line(result, Point(0, i_height_basketball), Point(result.cols, i_height_basketball), Scalar(225, 0, 0), 1);
count_frame = count_frame + 1;
int clk_processing_time = (clock() - clk_start);
if (count_frame > 1)
imshow("image", result);
// waitKey(0);
int delay = (1000 / fps) - clk_processing_time;
if (delay <= 0)
delay = 2;
if (waitKey(delay) >= 27)
return 0;
How can I detect the hoop? I thought of using square detection to find the square regions around the hoop.
What is the best way of counting the successful shots?
I have what I suspect will be a fairly strong baseline: once the ball has commenced its downward arc, if the ball demonstrates significant upward movement again, it's a miss. Otherwise, it's a basket. This won't catch airballs, but I suspect they're relatively few anyway.
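A minimal sketch of that baseline, assuming the tracker already yields one ball-centre y-coordinate per frame in image coordinates (y grows downwards). `ShotResult`, `classifyShot` and `bounceThreshold` are hypothetical names; the threshold for "significant" upward movement would need tuning on real footage.

```cpp
#include <cstddef>
#include <vector>

enum class ShotResult { Undecided, Basket, Miss };

// ys: ball-centre y-coordinate per frame, image coordinates (smaller y = higher up).
// bounceThreshold: upward movement in pixels that counts as "significant".
ShotResult classifyShot(const std::vector<double>& ys, double bounceThreshold) {
    if (ys.size() < 2) return ShotResult::Undecided;

    // The apex of the arc is the frame with the smallest y.
    std::size_t apex = 0;
    for (std::size_t i = 1; i < ys.size(); ++i)
        if (ys[i] < ys[apex]) apex = i;
    if (apex + 1 >= ys.size()) return ShotResult::Undecided;  // ball has not started descending

    // After the apex, track the lowest point reached so far (largest y).
    double lowest = ys[apex];
    for (std::size_t i = apex + 1; i < ys.size(); ++i) {
        if (ys[i] > lowest) {
            lowest = ys[i];                       // still falling
        } else if (lowest - ys[i] >= bounceThreshold) {
            return ShotResult::Miss;              // bounced back up noticeably
        }
    }
    return ShotResult::Basket;
}
```

As the answer notes, an airball descends without bouncing and would be counted as a basket by this rule.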
I think you could get a whole lot of mileage out of learning the ball trajectory of a successful shot and not worry too much about the hoop. Furthermore, didn't you say the camera was fixed-position? Doesn't that mean the hoop's always in the same place, and so you could just specify its location?
If you absolutely did have to find the hoop, I'd look for an object (sub-region of the image) of about the same size as the ball (which you say you can track) that's orange. More generally, you could learn a classifier for the hoop based on the training images you linked to, and apply it at a mixture of locations and scales, searching for the best match. You should know its approximate location, i.e. that it's in the upper portion of the image and likely to be to one side or the other. Then you could use proximity features to this identified region in addition to trajectory features to build a classifier for whether the shot succeeded or not.

OpenCV (Emgu.CV) -- compositing images with alpha

I'm using Emgu.CV to perform some basic image manipulation and composition. My images are loaded as Image<Bgra,Byte>.
Question #1: When I use the Image<,>.Add() method, the images are always blended together, regardless of the alpha value. Instead I'd like them to be composited one atop the other, and use the included alpha channel to determine how the images should be blended. So if I call image1.Add(image2) any fully opaque pixels in image2 would completely cover the pixels from image1, while semi-transparent pixels would be blended based on the alpha value.
Here's what I'm trying to do in visual form. There's a city image with some "transparent holes" cut out, and a frog behind. This is what it should look like:
And this is what openCV produces.
How can I get this effect with OpenCV? And will it be as fast as calling Add()?
Question #2: is there a way to perform this composition in-place instead of creating a new image with each call to Add()? (e.g. image1.AddImageInPlace(image2) modifies the bytes of image1?)
NOTE: Looking for answers within Emgu.CV, which I'm using because of how well it handles perspective warping.
Before OpenCV 2.4 there was no support of PNGs with alpha channel.
To verify whether your current version supports it, print the number of channels after loading an image that you are certain is RGBA. If it is supported, the application will output the number 4; otherwise it will output the number 3 (RGB). Using the C API you would do:
IplImage* t_img = cvLoadImage(argv[1], CV_LOAD_IMAGE_UNCHANGED);
if (!t_img)
{
    printf("!!! Unable to load transparent image.\n");
    return -1;
}
printf("Channels: %d\n", t_img->nChannels);
If you can't update OpenCV:
There are some posts around that try to bypass this limitation but I haven't tested them myself;
The easiest solution would be to use another API to load the image and blend it, check blImageBlending;
Another alternative, not as lightweight, is to use Qt.
If your version already supports PNGs with RGBA:
Take a look at Emulating photoshop’s blending modes in OpenCV. It implements several Photoshop blending modes and I imagine you are capable of converting that code to .Net.
I had to deal with this problem recently and I've demonstrated how to deal with it on this answer.
You'll have to iterate through each pixel. I'm assuming image 1 is the frog image, and image 2 is the city image, with image1 always being bigger than image2.
//to simulate image1.AddInPlace(image2)
int w = image2.Width;
int h = image2.Height;
for (int i = 0; i < w; i++)
{
    for (int j = 0; j < h; j++)
    {
        Bgra top = image2[j, i];
        Bgra bottom = image1[j, i];
        //alpha=255 is opaque > image2 should be used
        double alpha = top.Alpha / 255.0;
        image1[j, i] = new Bgra(
            top.Blue * alpha + bottom.Blue * (1 - alpha),
            top.Green * alpha + bottom.Green * (1 - alpha),
            top.Red * alpha + bottom.Red * (1 - alpha),
            bottom.Alpha); // keep the background's own alpha
    }
}
Using Osiris's suggestion as a starting point, and having checked out alpha compositing on Wikipedia, I ended up with the following, which worked really nicely for my purposes.
This was used with Emgu CV. I was hoping that the OpenCV gpu::AlphaComposite methods were available in Emgu CV, which I believe would have done the following for me, but alas the version I am using didn't appear to have them implemented.
static public Image<Bgra, Byte> Overlay( Image<Bgra, Byte> image1, Image<Bgra, Byte> image2 )
{
    Image<Bgra, Byte> result = image1.Copy();
    Image<Bgra, Byte> src = image2;
    Image<Bgra, Byte> dst = image1;
    int rows = result.Rows;
    int cols = result.Cols;
    for (int y = 0; y < rows; ++y)
    {
        for (int x = 0; x < cols; ++x)
        {
            double srcA = 1.0/255 * src.Data[y, x, 3];
            double dstA = 1.0/255 * dst.Data[y, x, 3];
            double outA = (srcA + (dstA - dstA * srcA));
            result.Data[y, x, 0] = (Byte)(((src.Data[y, x, 0] * srcA) + (dst.Data[y, x, 0] * (1 - srcA))) / outA); // Blue
            result.Data[y, x, 1] = (Byte)(((src.Data[y, x, 1] * srcA) + (dst.Data[y, x, 1] * (1 - srcA))) / outA); // Green
            result.Data[y, x, 2] = (Byte)(((src.Data[y, x, 2] * srcA) + (dst.Data[y, x, 2] * (1 - srcA))) / outA); // Red
            result.Data[y, x, 3] = (Byte)(outA*255);
        }
    }
    return result;
}
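For reference, the per-pixel arithmetic can be checked in isolation. The sketch below is in C++ with the standard library only (not Emgu CV) and implements the standard "over" operator for straight, non-premultiplied alpha as described in the Wikipedia article on alpha compositing. Note that it weights the destination colour by its own alpha as well, so it differs slightly from the function above whenever the destination is not fully opaque, and it guards against both pixels being fully transparent. `Bgra` and `over` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

using Bgra = std::array<std::uint8_t, 4>;  // blue, green, red, alpha

// Composite `src` on top of `dst` with the "over" operator (straight alpha).
Bgra over(const Bgra& src, const Bgra& dst) {
    const double srcA = src[3] / 255.0;
    const double dstA = dst[3] / 255.0;
    const double outA = srcA + dstA * (1.0 - srcA);
    Bgra out = {0, 0, 0, 0};
    if (outA <= 0.0) return out;  // both pixels fully transparent
    for (int c = 0; c < 3; ++c) {
        const double v = (src[c] * srcA + dst[c] * dstA * (1.0 - srcA)) / outA;
        out[c] = static_cast<std::uint8_t>(v + 0.5);  // round to nearest
    }
    out[3] = static_cast<std::uint8_t>(outA * 255.0 + 0.5);
    return out;
}
```

An opaque source pixel replaces the destination entirely, and a fully transparent one leaves it untouched, which is the behaviour asked for in Question #1.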
A newer version, using Emgu CV methods rather than a loop. I am not sure it improves performance.
double unit = 1.0 / 255.0;
Image[] dstS = dst.Split();
Image[] srcS = src.Split();
Image[] rs = result.Split();
Image<Gray, double> srcA = srcS[3] * unit;
Image<Gray, double> dstA = dstS[3] * unit;
Image<Gray, double> outA = srcA.Add(dstA.Sub(dstA.Mul(srcA)));// (srcA + (dstA - dstA * srcA));
// Red.
rs[0] = srcS[0].Mul(srcA).Add(dstS[0].Mul(1 - srcA)).Mul(outA.Pow(-1.0)); // Mul.Pow is divide.
rs[1] = srcS[1].Mul(srcA).Add(dstS[1].Mul(1 - srcA)).Mul(outA.Pow(-1.0));
rs[2] = srcS[2].Mul(srcA).Add(dstS[2].Mul(1 - srcA)).Mul(outA.Pow(-1.0));
rs[3] = outA.Mul(255);
// Merge image back together.
CvInvoke.cvMerge(rs[0], rs[1], rs[2], rs[3], result);
return result.Convert<Bgra, Byte>();
I found an interesting blog post on internet, which I think is related to what you are trying to do.
Please have a look at the Creating Overlays Method ( link). You can use this idea to implement your own function to add two images in the way you mentioned above, making some particular areas in the image transparent while leaving the rest as it is.

Sum of each column opencv

In Matlab, If A is a matrix, sum(A) treats the columns of A as vectors, returning a row vector of the sums of each column.
sum(Image); How could it be done with OpenCV?
Using cvReduce has worked for me. For example, if you need to store the column-wise sum of a matrix as a row matrix you could do this:
CvMat * MyMat = cvCreateMat(height, width, CV_64FC1);
// Fill in MyMat with some data...
CvMat * ColSum = cvCreateMat(1, MyMat->width, CV_64FC1);
cvReduce(MyMat, ColSum, 0, CV_REDUCE_SUM);
More information is available in the OpenCV documentation.
EDIT after 3 years:
The proper function for this is cv::reduce.
Reduces a matrix to a vector.
The function reduce reduces the matrix to a vector by treating the
matrix rows/columns as a set of 1D vectors and performing the
specified operation on the vectors until a single row/column is
obtained. For example, the function can be used to compute horizontal
and vertical projections of a raster image. In case of REDUCE_MAX and
REDUCE_MIN , the output image should have the same type as the source
one. In case of REDUCE_SUM and REDUCE_AVG , the output may have a
larger element bit-depth to preserve accuracy. And multi-channel
arrays are also supported in these two reduction modes.
I've used ROI method: move ROI of height of the image and width 1 from left to right and calculate means.
Mat src = imread(filename, 0);
vector<int> graph( src.cols );
for (int c=0; c<src.cols; c++)
{
    Mat roi = src( Rect( c,0,1,src.rows ) );
    graph[c] = int(mean(roi)[0]);
}
Mat mgraph( 260, src.cols+10, CV_8UC3, Scalar(0,0,0) );
for (int c=0; c<src.cols; c++)
{
    line( mgraph, Point(c+5,0), Point(c+5,graph[c]), Scalar(255,0,0), 1, CV_AA);
}
imshow("mgraph", mgraph);
imshow("source", src);
Just out of curiosity, I've tried resize to height 1 and the result was almost the same:
Mat test;
cv::resize(src,test,Size( src.cols,1 ));
Mat mgraph1( 260, src.cols+10, CV_8UC3, Scalar(0,0,0) );
for(int c=0; c<test.cols; c++)
{
    graph[c] = test.at<uchar>(0,c);
}
for (int c=0; c<test.cols; c++)
{
    line( mgraph1, Point(c+5,0), Point(c+5,graph[c]), Scalar(255,255,0), 1, CV_AA);
}
imshow("mgraph1", mgraph1);
cvSum respects ROI, so if you move a 1 px wide window over the whole image, you can calculate the sum of each column.
My c++ got a little rusty so I won't provide a code example, though the last time I did this I used OpenCVSharp and it worked fine. However, I'm not sure how efficient this method is.
My math skills are getting rusty too, but shouldn't it be possible to sum all elements in columns in a matrix by multiplying it by a vector of 1s?
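It is: multiplying a 1 x rows vector of ones by a rows x cols matrix yields a 1 x cols row vector whose entries are the column sums. A minimal sketch with the standard library only, assuming a rectangular matrix stored as a vector of rows (`Matrix` and `columnSums` are hypothetical names):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // a[row][col], assumed rectangular

// Computes ones(1, rows) * a: a row vector holding the sum of each column.
std::vector<double> columnSums(const Matrix& a) {
    if (a.empty()) return {};
    std::vector<double> sums(a[0].size(), 0.0);
    for (const auto& row : a)                        // one term per entry of the ones vector
        for (std::size_t c = 0; c < row.size(); ++c)
            sums[c] += 1.0 * row[c];                 // weight 1 from the ones vector
    return sums;
}
```

For the matrix with rows (1, 2, 3) and (4, 5, 6) this gives (5, 7, 9), the same as MATLAB's sum(A).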
For an 8 bit greyscale image, the following should work (I think).
It shouldn't be too hard to expand to different image types.
int imgStep = image->widthStep;
uchar* imageData = (uchar*)image->imageData;
std::vector<unsigned int> result(image->width, 0); // one zero-initialised accumulator per column
for (int col = 0; col < image->width; col++) {
    for (int row = 0; row < image->height; row++) {
        result[col] += imageData[row * imgStep + col];
    }
}
// your desired vector is in result
