I am trying to implement a convolutional neural network from scratch, and I cannot figure out how to perform vectorized operations on multi-channel images such as RGB, which have 3 dimensions. Following articles and tutorials such as this CS231n tutorial, it is clear enough how to implement a network for a single input, since the input layer is then a 3D matrix, but a dataset always contains multiple data points, and I cannot figure out how to implement these networks with vectorized operations over an entire dataset.
I have implemented a network which takes a 3D matrix as input, but I have now realized that it will not work on an entire dataset; I would have to propagate one input at a time. I don't really know whether conv nets are vectorized over an entire dataset or not. But if they are, how can I vectorize my convolutional network for multi-channel images?
If I understand your question correctly, you're basically asking how to implement a convolutional layer for a mini-batch, which is a 4-D tensor.
To put it simply, you want to treat each input in the batch independently and apply convolution to each one. It's fairly straightforward to code without vectorization using a loop, as in the sketch below.
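For reference, here is a minimal unvectorized sketch in numpy (my own illustration, not from the tutorial; it assumes NCHW layout, stride 1 and no padding):

import numpy as np

def conv_forward_naive(x, w, b):
    # x: (N, C, H, W) input batch, w: (F, C, HH, WW) filters, b: (F,) biases
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    out = np.zeros((N, F, H - HH + 1, W - WW + 1))
    for n in range(N):                       # each input in the batch
        for f in range(F):                   # each filter
            for i in range(H - HH + 1):      # each output row
                for j in range(W - WW + 1):  # each output column
                    out[n, f, i, j] = np.sum(x[n, :, i:i+HH, j:j+WW] * w[f]) + b[f]
    return out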
A vectorized implementation is often based on the im2col technique, which basically transforms the 4-D input tensor into a giant matrix and performs a matrix multiplication. Here's an implementation of a forward pass using numpy.lib.stride_tricks in Python:
import numpy as np
def conv_forward(x, w, b, stride, pad):
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape

    # Check dimensions
    assert (W + 2 * pad - WW) % stride == 0, 'width does not work'
    assert (H + 2 * pad - HH) % stride == 0, 'height does not work'

    # Pad the input
    p = pad
    x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    # Figure out output dimensions (integer division keeps them ints in Python 3)
    H += 2 * pad
    W += 2 * pad
    out_h = (H - HH) // stride + 1
    out_w = (W - WW) // stride + 1

    # Perform an im2col operation by picking clever strides
    shape = (C, HH, WW, N, out_h, out_w)
    strides = (H * W, W, 1, C * H * W, stride * W, stride)
    strides = x.itemsize * np.array(strides)
    x_stride = np.lib.stride_tricks.as_strided(x_padded,
                                               shape=shape, strides=strides)
    x_cols = np.ascontiguousarray(x_stride)
    x_cols.shape = (C * HH * WW, N * out_h * out_w)

    # Now all our convolutions are a big matrix multiply
    res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1)

    # Reshape the output
    res.shape = (F, N, out_h, out_w)
    out = res.transpose(1, 0, 2, 3)
    out = np.ascontiguousarray(out)
    return out
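A quick usage sketch (the shapes here are my own example; any sizes that satisfy the asserts work):

x = np.random.randn(2, 3, 32, 32)             # batch of 2 RGB 32x32 images (N, C, H, W)
w = np.random.randn(8, 3, 3, 3)               # 8 filters of size 3x3 (F, C, HH, WW)
b = np.zeros(8)
out = conv_forward(x, w, b, stride=1, pad=1)
print(out.shape)                              # (2, 8, 32, 32)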
Note that it uses some non-trivial features of the linear algebra library, which are implemented in numpy but may not be available in your library.
BTW, you generally don't want to push the entire dataset through as one batch - split it into several mini-batches.
There is a lot of information on the internet about the differences between the YUV 4:4:4 and YUV 4:2:2 formats; however, I cannot find anything that explains how to convert YUV 4:4:4 to YUV 4:2:2. Since such a conversion is performed in software, I was hoping that there are developers who have done it and could direct me to sources that describe the conversion algorithm. Of course, having the software code would be nice, but access to the theory would be sufficient to write my own software. Specifically, I would like to know the pixel structure and how the bytes are managed during conversion.
I found several similar questions like this and this, but could not get my question answered. I also posted this question on the Photography forum, and they considered it a software question.
The reason you can't find a specific description is that there are many ways to do it.
Let's start with Wikipedia: https://en.wikipedia.org/wiki/Chroma_subsampling#4:2:2
4:4:4:
Each of the three Y'CbCr components have the same sample rate, thus there is no chroma subsampling. This scheme is sometimes used in high-end film scanners and cinematic post production.
and
4:2:2:
The two chroma components are sampled at half the sample rate of luma: the horizontal chroma resolution is halved. This reduces the bandwidth of an uncompressed video signal by one-third with little to no visual difference.
Note: The terms YCbCr and YUV are used interchangeably.
https://en.wikipedia.org/wiki/YCbCr
Y′CbCr is often confused with the YUV color space, and typically the terms YCbCr and YUV are used interchangeably, leading to some confusion; when referring to signals in video or digital form, the term "YUV" mostly means "Y′CbCr".
Data memory ordering:
Again, there is more than one format.
Intel IPP documentation defines two main categories: "Pixel-Order Image Formats" and "Planar Image Formats".
There is nice documentation here: https://software.intel.com/en-us/node/503876
Refer here for YUV pixel arrangement formats: http://www.fourcc.org/yuv.php#NV12
Refer here for a description of downsampling: http://scc.ustc.edu.cn/zlsc/sugon/intel/ipp/ipp_manual/IPPI/ippi_ch6/ch6_image_downsampling.htm#ch6_image_downsampling
Let's assume "Pixel-Order" format:
YUV 4:4:4 data order: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
YUV 4:2:2 data order: Y0 U0 Y1 V0 Y2 U1 Y3 V1
Each element is a single byte, and Y0 is the lowest byte in memory.
The 4:2:2 data order described above is known as the YUY2 (YUYV) pixel format.
Conversion algorithms:
"Naive sub-sampling":
"Throw away" every second U/V component:
Take U0 and throw away U1, take V0 and throw away V1...
Source: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
Destination: Y0 U0 Y1 V0 Y2 U2 Y3 V2
I can't recommend it, since it causes aliasing artifacts.
Average each U/V pair:
Take destination U0 to be the source (U0+U1)/2, and the same for V0...
Source: Y0 U0 V0 Y1 U1 V1 Y2 U2 V2 Y3 U3 V3
Destination: Y0 (U0+U1)/2 Y1 (V0+V1)/2 Y2 (U2+U3)/2 Y3 (V2+V3)/2
Use another interpolation method for down-sampling U and V (cubic interpolation, for example).
Usually you will not be able to see any difference compared to simple averaging.
C implementation:
The question is not tagged as C, but I think the following C implementation may be helpful.
The following code converts pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2 by averaging each U/V pair:
//Convert single row I0 from pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2.
//Save the result in J0.
//I0 size in bytes is image_width*3
//J0 size in bytes is image_width*2
static void ConvertRowYUV444ToYUV422(const unsigned char I0[],
                                     const int image_width,
                                     unsigned char J0[])
{
    int x;

    //Process two Y,U,V triples per iteration:
    for (x = 0; x < image_width; x += 2)
    {
        //Load source elements
        unsigned char y0 = I0[x*3];                //Load source Y element
        unsigned int u0 = (unsigned int)I0[x*3+1]; //Load source U element (and convert from uint8 to uint32).
        unsigned int v0 = (unsigned int)I0[x*3+2]; //Load source V element (and convert from uint8 to uint32).

        //Load next source elements
        unsigned char y1 = I0[x*3+3];              //Load source Y element
        unsigned int u1 = (unsigned int)I0[x*3+4]; //Load source U element (and convert from uint8 to uint32).
        unsigned int v1 = (unsigned int)I0[x*3+5]; //Load source V element (and convert from uint8 to uint32).

        //Calculate destination U and V elements.
        //Use shift right by 1 for dividing by 2.
        //Add 1 before shifting - round operation instead of floor operation.
        unsigned int u01 = (u0 + u1 + 1) >> 1;     //Destination U element equals average of two source U elements.
        unsigned int v01 = (v0 + v1 + 1) >> 1;     //Destination V element equals average of two source V elements.

        J0[x*2]   = y0;                  //Store Y element (unmodified).
        J0[x*2+1] = (unsigned char)u01;  //Store destination U element (and cast uint32 to uint8).
        J0[x*2+2] = y1;                  //Store Y element (unmodified).
        J0[x*2+3] = (unsigned char)v01;  //Store destination V element (and cast uint32 to uint8).
    }
}
//Convert image I from pixel-ordered YUV 4:4:4 to pixel-ordered YUV 4:2:2.
//I - Input image in pixel-order data YUV 4:4:4 format.
//image_width - Number of columns of image I.
//image_height - Number of rows of image I.
//J - Destination "image" in pixel-order data YUV 4:2:2 format.
//Note: The term "YUV" refers to "Y'CbCr".
//I is pixel ordered YUV 4:4:4 format (size in bytes is image_width*image_height*3):
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//YUVYUVYUVYUV
//
//J is pixel ordered YUV 4:2:2 format (size in bytes is image_width*image_height*2):
//YUYVYUYV
//YUYVYUYV
//YUYVYUYV
//YUYVYUYV
//
//Conversion algorithm:
//Each element of destination U is average of 2 original U horizontal elements
//Each element of destination V is average of 2 original V horizontal elements
//
//Limitations:
//1. image_width must be a multiple of 2.
//2. I and J must be two separate arrays (in place computation is not supported).
static void ConvertYUV444ToYUV422(const unsigned char I[],
                                  const int image_width,
                                  const int image_height,
                                  unsigned char J[])
{
    //I0 points to the current source row.
    const unsigned char *I0;  //I0 -> YUVYUVYUV...

    //J0 points to the current destination row.
    unsigned char *J0;        //J0 -> YUYVYUYV...

    int y;  //Row index

    //Process a single row per iteration.
    for (y = 0; y < image_height; y++)
    {
        I0 = &I[y*image_width*3]; //Input row width is image_width*3 bytes (each pixel is Y,U,V).
        J0 = &J[y*image_width*2]; //Output row width is image_width*2 bytes (each two pixels are Y,U,Y,V).

        //Process a single source row into a single destination row.
        ConvertRowYUV444ToYUV422(I0, image_width, J0);
    }
}
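If you happen to be working in Python rather than C, the same pixel-ordered averaging conversion can be sketched with numpy slicing (this is my own illustration; the packed H x W x 3 uint8 input layout and the function name are assumptions):

import numpy as np

def yuv444_to_yuy2(yuv444):
    # yuv444: (H, W, 3) uint8 array, packed Y,U,V per pixel; W must be even.
    y = yuv444[:, :, 0]
    u = yuv444[:, :, 1].astype(np.uint16)
    v = yuv444[:, :, 2].astype(np.uint16)
    u2 = ((u[:, 0::2] + u[:, 1::2] + 1) >> 1).astype(np.uint8)  # average each U pair (rounded)
    v2 = ((v[:, 0::2] + v[:, 1::2] + 1) >> 1).astype(np.uint8)  # average each V pair (rounded)
    h, w = y.shape
    out = np.empty((h, w * 2), np.uint8)  # YUY2 row is 2 bytes per pixel
    out[:, 0::4] = y[:, 0::2]  # Y0
    out[:, 1::4] = u2          # averaged U
    out[:, 2::4] = y[:, 1::2]  # Y1
    out[:, 3::4] = v2          # averaged V
    return out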
Planar representation of YUV 4:2:2
A planar representation may be more intuitive than the "Pixel-Order" format.
In planar representation each color channel is represented as a separate matrix, which can be displayed as an image.
Example:
Original image in RGB format (before converting to YUV):
Image channels in YUV 4:4:4 format:
(Left YUV triple is represented in gray levels, and right YUV triple is represented using false colors).
Image channels in YUV 4:2:2 format (after horizontal Chroma subsampling):
(Left YUV triple is represented in gray levels, and right YUV triple is represented using "false colors").
As you can see, in the 4:2:2 format the U and V channels are down-sampled (shrunk) along the horizontal axis.
Remark:
The "false colors" representation of the U and V channels is used to emphasize that Y is the Luma channel and U and V are the Chrominance channels.
Higher order interpolation and Anti-Aliasing filter:
The following MATLAB code sample shows how to perform down-sampling with higher-order interpolation and an Anti-Aliasing filter.
The sample also shows the down-sampling method used by FFMPEG.
Note: you don't need to know MATLAB programming in order to understand the samples.
You do need some knowledge of image filtering by convolution between a kernel and an image.
%Prepare the input:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
load('mandrill.mat', 'X', 'map'); %Load input image
RGB = im2uint8(ind2rgb(X, map)); %Convert to RGB (the mandrill sample image is an indexed image)
YUV = rgb2ycbcr(RGB); %Convert from RGB to YUV (MATLAB function rgb2ycbcr uses BT.601 conversion formula)
%Separate YUV to 3 planes (Y plane, U plane and V plane)
Y = YUV(:, :, 1);
U = YUV(:, :, 2);
V = YUV(:, :, 3);
U = double(U); %Work in double precision instead of uint8.
[M, N] = size(Y); %Image size is N columns by M rows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Linear interpolation without Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Linear interpolation (without Anti-Aliasing filter).
%Simple averaging is equivalent to linear interpolation.
U2 = (U(:, 1:2:end) + U(:, 2:2:end))/2;
refU2 = imresize(U, [M, N/2], 'bilinear', 'Antialiasing', false); %Use MATLAB imresize function as reference
disp(['Linear interpolation max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Cubic interpolation without Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Cubic interpolation (without Anti-Aliasing filter).
%Following operations are equivalent to cubic interpolation:
%1. Convolution with filter kernel [-0.125, 1.25, -0.125]
%2. Averaging pair elements
fU = imfilter(U, [-0.125, 1.25, -0.125], 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements (valid range of U elements in uint8 format is [16, 240])
refU2 = imresize(U, [M, N/2], 'cubic', 'Antialiasing', false); %Use MATLAB imresize function as reference
refU2 = max(min(refU2, 240), 16); %Limit to valid range of U elements
disp(['Cubic interpolation max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Linear interpolation with Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Linear interpolation with Anti-Aliasing filter.
%Remark: The Anti-Aliasing filter is the filter used by MATLAB specific implementation of 'bilinear' imresize.
%Following operations are equivalent to Linear interpolation with Anti-Aliasing filter:
%1. Convolution with filter kernel [0.25, 0.5, 0.25]
%2. Averaging pair elements
fU = imfilter(U, [0.25, 0.5, 0.25], 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
refU2 = imresize(U, [M, N/2], 'bilinear', 'Antialiasing', true); %Use MATLAB imresize function as reference
disp(['Linear interpolation with Anti-Aliasing max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Cubic interpolation with Anti-Aliasing filter:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Horizontal down-sampling U plane using Cubic interpolation with Anti-Aliasing filter.
%Remark: The Anti-Aliasing filter is the filter used by MATLAB specific implementation of 'cubic' imresize.
%Following operations are equivalent to Cubic interpolation with Anti-Aliasing filter:
%1. Convolution with filter kernel [-0.0234375, -0.046875, 0.2734375, 0.59375, 0.2734375, -0.046875, -0.0234375]
%2. Averaging pair elements
h = [-0.0234375, -0.046875, 0.2734375, 0.59375, 0.2734375, -0.046875, -0.0234375];
fU = imfilter(U, h, 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements
refU2 = imresize(U, [M, N/2], 'cubic', 'Antialiasing', true); %Use MATLAB imresize function as reference
refU2 = max(min(refU2, 240), 16); %Limit to valid range of U elements
disp(['Cubic interpolation with Anti-Aliasing max diff = ' num2str(max(abs(double(U2(:)) - double(refU2(:)))))]); %Print maximum difference.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%FFMPEG implementation of horizontal down-sampling U plane.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%FFMPEG uses cubic interpolation with Anti-Aliasing filter (different filter kernel):
%Remark: I didn't check the source code of FFMPEG to verify the values of the filter kernel.
%I can't tell how FFMPEG actually implements the conversion.
%Following operations are equivalent to FFMPEG implementation (with minor differences):
%1. Convolution with filter kernel [-115, -231, 1217, 2354, 1217, -231, -115]/4096
%2. Averaging pair elements
h = [-115, -231, 1217, 2354, 1217, -231, -115]/4096;
fU = imfilter(U, h, 'symmetric');
U2 = (fU(:, 1:2:end) + fU(:, 2:2:end))/2;
U2 = max(min(U2, 240), 16); %Limit to valid range of U elements (FFMPEG actually doesn't limit the result)
%Save Y,U,V planes to file in format supported by FFMPEG
f = fopen('yuv444.yuv', 'w');
fwrite(f, Y', 'uint8');
fwrite(f, U', 'uint8');
fwrite(f, V', 'uint8');
fclose(f);
%For executing FFMPEG within MATLAB, download FFMPEG and place the executable in working directory (ffmpeg.exe for Windows)
%FFMPEG converts source file in YUV444 format to destination file in YUV422 format.
if isunix
[status, cmdout] = system(['./ffmpeg -y -s ', num2str(N), 'x', num2str(M), ' -pix_fmt yuv444p -i yuv444.yuv -pix_fmt yuv422p yuv422.yuv']);
else
[status, cmdout] = system(['ffmpeg.exe -y -s ', num2str(N), 'x', num2str(M), ' -pix_fmt yuv444p -i yuv444.yuv -pix_fmt yuv422p yuv422.yuv']);
end
f = fopen('yuv422.yuv', 'r');
refY = (fread(f, [N, M], '*uint8'))';
refU2 = (fread(f, [N/2, M], '*uint8'))'; %Read down-sampled U plane (FFMPEG result from file).
refV2 = (fread(f, [N/2, M], '*uint8'))';
fclose(f);
%Limit to valid range of U elements.
%In FFMPEG down-sampled U and V may exceed valid range (there is probably a way to tell FFMPEG to limit the result).
refU2 = max(min(refU2, 240), 16);
%Difference excludes first column and last column (FFMPEG treats the margins differently from MATLAB)
%Remark: There are minor differences due to rounding (I guess).
disp(['FFMPEG Cubic interpolation with Anti-Aliasing max diff = ' num2str(max(max(abs(double(U2(:, 2:end-1)) - double(refU2(:, 2:end-1))))))]);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Examples of different down-sampling methods.
Linear interpolation versus Cubic interpolation with Anti-Aliasing filter:
In the first example (mandrill) there are no visible differences.
In the second example (circle and rectangle) there are minor visible differences.
The third example (lines) demonstrates aliasing artifacts.
Remark: the displayed images were up-sampled from YUV422 to YUV444 using Cubic interpolation and converted from YUV444 to RGB.
Linear interpolation versus Cubic with Anti-Aliasing (mandrill):
Linear interpolation versus Cubic with Anti-Aliasing (circle and rectangle):
Linear interpolation versus Cubic with Anti-Aliasing (demonstrates Aliasing artifacts):
I've noticed that using convertTo to convert a matrix from 32-bit to 16-bit saturates values to the upper bound: values bigger than 0x0000FFFF in the source matrix are set to 0xFFFF in the destination matrix.
What I want for my application instead is to mask the values, keeping just the 2 least significant bytes (the low 16 bits) of each value in the destination.
Here is an example:
Mat mat32;
Mat mat16;
mat32 = Mat(2,2,CV_32SC1);
for (int y = 0; y < 2; y++)
    for (int x = 0; x < 2; x++)
        mat32.at<unsigned int>(cv::Point(x, y)) = 0x0000FFFE + (y * 2 + x);
mat32.convertTo(mat16, CV_16UC1);
The matrices have these values:
32 bits matrix:
0000FFFE 0000FFFF
00010000 00010001
16 bits matrix:
0000FFFE 0000FFFF
0000FFFF 0000FFFF
In the second row of the 16-bit matrix I want to have
00000000 00000001
I can do this by scanning the source matrix value-by-value and masking the values, but the performances are low.
Is there an OpenCV function that does this?
Thanks to everyone!
MIX
This can be done, but it requires a somewhat dirty trick, so it is up to you whether to use this approach or not. Here is how it can be done:
For this example, let's create a 1000x1000 32-bit matrix and set all its values to 65541 (= 256*256 + 5). After the conversion we expect to have a matrix filled with fives.
Mat M1(1000, 1000, CV_32S, Scalar(65541));
And here is the trick:
Mat M2(1000, 1000, CV_16SC2, M1.data);
We created matrix M2 over the same memory buffer as M1, but M2 'thinks' that this buffer holds a 2-channel 16-bit image. Now the last thing to do is to copy the channel you need to the place you need. This can be done with the split() or mixChannels() functions. For example:
Mat M3(1000, 1000, CV_16S);
int fromto[] = {0,0};
Mat inpu[] = {M2}, outpu[] = {M3};
mixChannels(inpu, 1, outpu, 1, fromto, 1);
cout << M3.at<short>(10,10) << endl;
Yes, I know that the signature of mixChannels looks weird and makes the code even less readable, but it works... If you prefer the split() function:
vector<Mat> v;
split(M2,v);
cout << v[0].at<short>(10,10) << " " << v[1].at<short>(10,10) << endl;
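For completeness, the same reinterpretation is a one-liner in Python with numpy (my own sketch; it assumes a little-endian machine, where channel 0 holds the low 16 bits):

import numpy as np

M1 = np.full((1000, 1000), 65541, np.int32)  # 65541 = 256*256 + 5
M3 = M1.view(np.uint16)[:, 0::2]             # low 16 bits of each 32-bit element
print(M3[10, 10])                            # prints 5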
There is no OpenCV function (that I know of) which does the conversion the way you want, so either you code it yourself, or, as you said, you go through a masking step first to remove the 16 high bits.
The mask can be applied using bitwise_and in C++ or cvAndS in C.
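In numpy terms, the mask-then-narrow step looks like this (a rough sketch of the idea, with made-up values; cv2.bitwise_and performs the same masking):

import numpy as np

mat32 = np.array([[0xFFFE,  0xFFFF],
                  [0x10000, 0x10001]], np.int32)
mat16 = (mat32 & 0xFFFF).astype(np.uint16)  # keep the low 16 bits, then narrow
# mat16 is now [[0xFFFE, 0xFFFF], [0x0000, 0x0001]]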
You could also make your hand-written code more efficient. In general, you should avoid the OpenCV pixel accessors in loops because they have bad performance. I don't have an OpenCV install at hand, so this could be slightly off -- the idea is to use the data field directly, together with step, which is the number of bytes per row:
for (int y = 0; y < mat32.rows; ++y) {
    int* row = (int*)(mat32.data + y * mat32.step);
    for (int x = 0; x < mat32.cols; ++x)
        row[x] &= 0xFFFF;
}
Then, once the mask is applied, all values fit in 16 bits, and convertTo will just truncate the 16 upper bits.
The other solution is to code the conversion by hand:
mat16.create(mat32.size(), CV_16UC1);
for (int y = 0; y < mat32.rows; ++y) {
    const int* row32 = (const int*)(mat32.data + y * mat32.step);
    unsigned short* row16 = (unsigned short*)(mat16.data + y * mat16.step);
    for (int x = 0; x < mat32.cols; ++x)
        row16[x] = (unsigned short)row32[x];
}
I'm trying to create a PCA model in OpenCV to hold pixel coordinates. As an experiment, I have two sets of pixel coordinates that map out two approximate circles. Each set of coordinates has 48 x,y pairs. I was experimenting with the following code, which reads the coordinates from a file and stores them in a Mat structure. However, I don't think it is right, and PCA in OpenCV seems very poorly covered on the Internet.
Mat m(2, 48, CV_32FC2); // matrix with 2 rows of 48 cols of floats held in two channels
char c;
FILE* pFile = fopen("data.txt", "r");
for (int i = 0; i < 48; i++){
    int x, y;
    fscanf(pFile, "%d%c%c%d%c", &x, &c, &c, &y, &c);
    m.at<Vec2f>( 0, i )[0] = (float)x; // store x in row 0, col i in channel 0
    m.at<Vec2f>( 0, i )[1] = (float)y; // store y in row 0, col i in channel 1
}
for (int i = 0; i < 48; i++){
    int x, y;
    fscanf(pFile, "%d%c%c%d%c", &x, &c, &c, &y, &c);
    m.at<Vec2f>( 1, i )[0] = (float)x; // store x in row 1, col i in channel 0
    m.at<Vec2f>( 1, i )[1] = (float)y; // store y in row 1, col i in channel 1
}
PCA pca(m, Mat(), CV_PCA_DATA_AS_ROW, 2); // 2 principal components??? Not sure what to put here, e.g. is it 2 for two data sets or 48 for the number of elements?
for (int i = 0; i < 48; i++){
    float x = pca.mean.at<Vec2f>(i,0)[0]; //get average x
    float y = pca.mean.at<Vec2f>(i,0)[1]; //get average y
    printf("\n x=%f, y=%f", x, y);
}
However, this crashes when creating the pca object. I know this is a very basic question, but I am a bit lost and was hoping that someone could get me started with PCA in OpenCV.
Perhaps it would be helpful if you described in further detail what you need to use PCA for and what you hope to achieve (output?).
I am fairly sure that the reason your program crashes is that the input Mat is CV_32FC2 when it should be CV_32FC1. You need to reshape your data into 1-dimensional row vectors before using PCA; not knowing what you need it for, I can't say how to reshape your data. (The common application with images is eigenfaces, which requires an image to be reshaped into a row vector.) Additionally, you will need to normalize your input data between 0 and 1.
As a further aside, usually you would choose to keep 1 less principal component than the number of input samples because the last principal component is simply orthogonal to the others.
I have worked with opencv PCA before and would like to help further. I would also refer you to this blog: http://www.bytefish.de/blog/pca_in_opencv which helped me get started with PCA in openCV.
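As a loose sketch of that reshaping with the Python bindings (the shapes follow your question - 2 samples of 48 (x, y) pairs flattened to rows of 96 values - everything else here is my own assumption):

import numpy as np
import cv2

data = np.random.rand(2, 96).astype(np.float32)  # each row: x0, y0, ..., x47, y47
mean, eigenvectors = cv2.PCACompute(data, mean=None, maxComponents=1)
print(mean.shape, eigenvectors.shape)            # (1, 96) (1, 96)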
In MATLAB, if A is a matrix, sum(A) treats the columns of A as vectors, returning a row vector of the sums of each column.
How could sum(Image) be done with OpenCV?
Using cvReduce has worked for me. For example, if you need to store the column-wise sum of a matrix as a row matrix you could do this:
CvMat * MyMat = cvCreateMat(height, width, CV_64FC1);
// Fill in MyMat with some data...
CvMat * ColSum = cvCreateMat(1, MyMat->width, CV_64FC1);
cvReduce(MyMat, ColSum, 0, CV_REDUCE_SUM);
More information is available in the OpenCV documentation.
EDIT after 3 years:
The proper function for this is cv::reduce.
Reduces a matrix to a vector.
The function reduce reduces the matrix to a vector by treating the matrix rows/columns as a set of 1D vectors and performing the specified operation on the vectors until a single row/column is obtained. For example, the function can be used to compute horizontal and vertical projections of a raster image. In case of REDUCE_MAX and REDUCE_MIN, the output image should have the same type as the source one. In case of REDUCE_SUM and REDUCE_AVG, the output may have a larger element bit-depth to preserve accuracy. And multi-channel arrays are also supported in these two reduction modes.
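In the Python bindings the same call looks like this (a quick sketch with made-up sizes):

import cv2
import numpy as np

img = np.random.randint(0, 256, (480, 640), np.uint8)
col_sums = cv2.reduce(img, 0, cv2.REDUCE_SUM, dtype=cv2.CV_64F)  # 1 x 640 row of column sums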
OLD:
I've used the ROI method: move an ROI with the height of the image and a width of 1 from left to right and calculate the mean of each column.
Mat src = imread(filename, 0);
vector<int> graph( src.cols );
for (int c = 0; c < src.cols; c++)
{
    Mat roi = src( Rect( c, 0, 1, src.rows ) );
    graph[c] = int(mean(roi)[0]);
}
Mat mgraph( 260, src.cols+10, CV_8UC3);
for (int c = 0; c < src.cols; c++)
{
    line( mgraph, Point(c+5,0), Point(c+5,graph[c]), Scalar(255,0,0), 1, CV_AA);
}
imshow("mgraph", mgraph);
imshow("source", src);
EDIT:
Just out of curiosity, I tried resizing to height 1, and the result was almost the same:
Mat test;
cv::resize(src,test,Size( src.cols,1 ));
Mat mgraph1( 260, src.cols+10, CV_8UC3);
for (int c = 0; c < test.cols; c++)
{
    graph[c] = test.at<uchar>(0,c);
}
for (int c = 0; c < src.cols; c++)
{
    line( mgraph1, Point(c+5,0), Point(c+5,graph[c]), Scalar(255,255,0), 1, CV_AA);
}
imshow("mgraph1", mgraph1);
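In the Python bindings the same experiment is a one-liner (my own sketch, not the code above; with INTER_AREA the shrink behaves like a per-column average):

import cv2
import numpy as np

src = np.random.randint(0, 256, (480, 640), np.uint8)
col_means = cv2.resize(src, (src.shape[1], 1), interpolation=cv2.INTER_AREA)  # 1 x 640 of column means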
cvSum respects ROI, so if you move a 1-px-wide window over the whole image, you can calculate the sum of each column.
My C++ has gotten a little rusty, so I won't provide a code example, though the last time I did this I used OpenCVSharp and it worked fine. However, I'm not sure how efficient this method is.
My math skills are getting rusty too, but shouldn't it be possible to sum all the elements in each column of a matrix by multiplying it by a row vector of ones?
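In numpy terms, a tiny sketch to confirm the idea (a 1 x H row of ones times the H x W image gives the 1 x W column sums):

import numpy as np

img = np.random.randint(0, 256, (4, 5)).astype(np.float64)
ones = np.ones((1, img.shape[0]))
col_sums = ones @ img                          # (1 x H) * (H x W) -> (1 x W)
assert np.allclose(col_sums, img.sum(axis=0))  # matches summing each column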
For an 8-bit greyscale image, the following should work (I think).
It shouldn't be too hard to extend it to different image types.
int imgStep = image->widthStep;
uchar* imageData = (uchar*)image->imageData;
uint result[image->width];
memset(result, 0, sizeof(uint) * image->width); //zero the whole sums array (uint elements, not uchar)
for (int col = 0; col < image->width; col++) {
    for (int row = 0; row < image->height; row++) {
        result[col] += imageData[row * imgStep + col];
    }
}
// your desired vector is in result