Extrema detection in difference of gaussian images in SIFT - image-processing

I have a question about the workings of the SIFT algorithm. Say I have a scale-space representation of an image across several octaves, built by convolving the image with Gaussian filters of increasing size. Furthermore, I have computed the difference of Gaussian (DoG) images for each of these octaves.
Let us assume I have 7 DoG images for a given octave. My question is about the maxima finding in these DoG images. According to the literature, one compares each pixel against its 8 neighbours in the same DoG image and the 9 neighbours in each of the two adjacent DoG images.
So now, say I am processing these 7 DoG images: I start from index 1 and go up to index 5, so that each scale has a neighbouring DoG image above and below. Something like:
for (int i = 1; i <= 5; ++i)
{
    for (int y = 1; y < image_height - 1; ++y)
    {
        for (int x = 1; x < image_width - 1; ++x)
        {
            current_pixel = image[x, y, i];
            // Compare with the neighbours: 8 at this scale, 9 in each adjacent DoG image
            // Check if it is a maximum at location (x, y, i)
        }
    }
}
So here I am iterating through the images and checking whether each location is a maximum. My question is: I will end up with maxima locations at each of these scales (from 1 to 5 in my case), so for a given (x, y) location there can be multiple maxima (for example at scales 1, 3 and 5). Is that a problem, or can there be multiple keypoints associated with the same spatial location (x, y)? Can someone explain how the algorithm proceeds to refine these keypoints?

You will want to find the extrema across scale as well.
Scale-space extrema detection means finding the extremum for every pixel across "scale" and across "space." Space is the xy-plane of the image; scale is the index into the pyramid.
Why do you want to do this?
The idea of scale-space extrema detection is to find the scale at which a feature has the highest response. For example, if you have a small blob in the image, its extremum will be at a fine scale; at a coarse scale that small blob will be washed out.
For a large blob, computing the response at a fine scale does not produce an extremum. But if the scale is coarse enough, the blob will stand out: at coarser levels of the pyramid the smaller structures around the large blob are washed out, and the large blob itself stands out.
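As a minimal sketch of the 26-neighbour test described above (plain C++; the dog layout, the helper name and the Keypoint struct are my own illustration, not code from the question):

#include <vector>

// Hypothetical layout: dog[s][y * width + x] holds the DoG value at scale index s.
struct Keypoint { int x, y, s; };

std::vector<Keypoint> findExtrema(const std::vector<std::vector<float> >& dog,
                                  int width, int height)
{
    std::vector<Keypoint> keypoints;
    for (int s = 1; s + 1 < (int)dog.size(); ++s) {
        for (int y = 1; y < height - 1; ++y) {
            for (int x = 1; x < width - 1; ++x) {
                const float v = dog[s][y * width + x];
                bool isMax = true, isMin = true;
                // Compare against the 26 neighbours: 8 at this scale, 9 above, 9 below.
                for (int ds = -1; ds <= 1; ++ds)
                    for (int dy = -1; dy <= 1; ++dy)
                        for (int dx = -1; dx <= 1; ++dx) {
                            if (ds == 0 && dy == 0 && dx == 0) continue;
                            const float n = dog[s + ds][(y + dy) * width + (x + dx)];
                            if (n >= v) isMax = false;
                            if (n <= v) isMin = false;
                        }
                if (isMax || isMin) {
                    Keypoint kp = { x, y, s };
                    keypoints.push_back(kp);
                }
            }
        }
    }
    return keypoints;
}

Two detections at the same (x, y) but different scales are simply two distinct keypoints, each carrying its own scale; in Lowe's paper each candidate is then refined by fitting a quadratic to the DoG values around (x, y, s) and rejecting low-contrast and edge-like responses.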

Related

Does image compression reduce actual size of image?

I am performing SVD image compression. SVD allows you to approximate the actual image matrix by a lower-rank matrix of rank v, thus achieving compression (link).
Here's the pseudo-code:
load image_mat                           % load the image as a matrix
[U, S, V] = svd(image_mat);              % U and V are square, S is diagonal (m x n)
S(v+1:end, :) = 0; S(:, v+1:end) = 0;    % zero out all singular values after rank v
new_image = U * S * V';
The problem I am facing is this: once I perform the lower-rank approximation, the old and the new matrix are of the same size (m x n). Both images contain the same number of pixels (since U and V do not change), so the file size does not (read: cannot!) change. However, I see the image quality changing drastically for different values of v.
What am I missing?
EDIT: Further explanation:
Below is the result of the SVD reconstruction after rank reduction:
My question is: if the number of pixels in both pictures remains the same, how do I get a file size reduction (compression)? Apart from the fact that the matrix of singular values (S) is truncated, everything else pretty much remains the same (despite the obvious drop in image quality), i.e. the matrix reconstructed after decompression has the same size, 512 x 512, as the original image.
You are not missing anything. The original image has m*n values, while the compressed representation has k + k*m + k*n values, where k is the rank: you only need to store the first k columns of U (m*k values), the first k columns of V (n*k values) and the k non-zero singular values. The saving is in what you store on disk, not in the size of the reconstructed image.
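A minimal sketch of storing only the rank-k factors, in C++ with OpenCV's cv::SVD (my own illustration, not the asker's MATLAB; the file name and the rank k are placeholders):

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>

int main()
{
    // Load a grayscale image and convert it to double precision for the SVD.
    cv::Mat img = cv::imread("image.png", 0);
    cv::Mat A;
    img.convertTo(A, CV_64F);

    cv::Mat w, u, vt;
    cv::SVD::compute(A, w, u, vt);                    // A = u * diag(w) * vt

    const int k = 50;                                 // chosen rank (placeholder value)

    // These three pieces are all that needs to be stored:
    // k*(1 + m + n) numbers instead of m*n.
    cv::Mat uk  = u.colRange(0, k);                   // m x k
    cv::Mat wk  = cv::Mat::diag(w.rowRange(0, k));    // k x k
    cv::Mat vtk = vt.rowRange(0, k);                  // k x n

    // The decoded image is m x n again, which is why it has exactly
    // as many pixels as the original.
    cv::Mat approx = uk * wk * vtk;
    return 0;
}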

optical flow for moving object: few points

I'm trying to do something like this:
http://www.youtube.com/watch?feature=player_embedded&v=MIYt1yNwoZU
and I'm on the right track; it works reasonably well after a couple of hours of coding. But I have some questions:
I'm using OpenCV 2.4 and there are several options available (see here). Which one is the best? Lucas-Kanade with some automatic feature detection? Or is a simple global orientation estimate enough? Or even a Kalman filter? For now I'm using the dense Farnebäck algorithm, which I think is the simplest option, but maybe it is not the best one.
After calculating the optical flow on the image (scaled down by a factor of 2, because computing the flow is expensive), I take the average of the vectors: a plain average, summing all of them and dividing by the number of vectors, with a nested for loop over the flow Mat. Is there a better way?
Point2f average_motion(0, 0);
float n = 1;
for (int y = 0; y < flow.rows; y += step) {
    for (int x = 0; x < flow.cols; x += step) {
        const Point2f& fxy = flow.at<Point2f>(y, x);
        if (std::abs(fxy.x) > threshold || std::abs(fxy.y) > threshold) {
            average_motion += fxy;
            n++;
        }
    }
}
average_motion *= 1 / n;
cout << average_motion << endl;
I'm moving the rects, BUT the right/left movement seems a bit weird, while the up/down works really nicely. Can someone explain why?
Translating is OK, but I'm stuck on rotating: given the average vector, how can I get the angle? I've tried the angle between the vector and the X axis, but it does not work well. Any hint?
For now I'm drawing with the OpenCV drawing API, but since 2.4 there is also OpenGL support, which should be nice, but I can't find examples of it.
The best approach for optical flow is to use a Kalman filter to predict the movement, so you can project the patches in that direction and reduce the search area for the next frame, increasing computational speed.
The bad news is that it is a difficult task to make a Kalman filter track properly.
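As a minimal sketch of such a predict/correct loop with OpenCV's cv::KalmanFilter (my own illustration, not the answerer's code; the constant-velocity model, the noise values and the measurement are placeholders):

#include <opencv2/video/tracking.hpp>

int main()
{
    // Constant-velocity model: state = [x, y, vx, vy], measurement = [x, y].
    cv::KalmanFilter KF(4, 2, 0);
    KF.transitionMatrix = (cv::Mat_<float>(4, 4) << 1, 0, 1, 0,
                                                    0, 1, 0, 1,
                                                    0, 0, 1, 0,
                                                    0, 0, 0, 1);
    cv::setIdentity(KF.measurementMatrix);
    cv::setIdentity(KF.processNoiseCov, cv::Scalar::all(1e-4));
    cv::setIdentity(KF.measurementNoiseCov, cv::Scalar::all(1e-1));
    cv::setIdentity(KF.errorCovPost, cv::Scalar::all(1));
    KF.statePost = (cv::Mat_<float>(4, 1) << 100.f, 80.f, 0.f, 0.f);  // initial guess

    // Each frame: predict where the patch will be, search only around that
    // prediction, then correct the filter with the measured position.
    cv::Mat prediction  = KF.predict();
    cv::Mat measurement = (cv::Mat_<float>(2, 1) << 120.f, 82.f);     // e.g. from optical flow
    KF.correct(measurement);
    return 0;
}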
I would propose using the Lucas-Kanade method because it is quite fast. Or you could use the GPU implementation of RLOF, which is similar to Lucas-Kanade. Do not estimate a dense motion vector field; just estimate motion vectors on a grid (e.g. every 5th pixel), which saves a lot of runtime, or seed the features to track from the rectangles you want to move.
To move your rectangle it would be more elegant to estimate a transformation matrix, e.g. affine or perspective, with cv::getAffineTransform or cv::getPerspectiveTransform. The affine transformation covers translation, rotation, scaling and shear; the perspective transformation additionally models projective distortion. (For both, RANSAC is a good estimator.) The new positions of the rectangle points can then be easily computed by a matrix operation:
[x, y, 1]' = Matrix * [x_old, y_old, 1]', see the OpenCV documentation.
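A minimal sketch of that last step (my own illustration; the tracked points and the rectangle corners are made-up values):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <cmath>
#include <vector>

int main()
{
    // Three tracked points: where they were, and where the optical flow moved them.
    cv::Point2f src[3] = { cv::Point2f(100, 100), cv::Point2f(200, 100), cv::Point2f(100, 200) };
    cv::Point2f dst[3] = { cv::Point2f(105, 102), cv::Point2f(204, 103), cv::Point2f(103, 202) };

    cv::Mat A = cv::getAffineTransform(src, dst);     // 2 x 3 affine matrix

    // Apply the same transform to the rectangle corners.
    std::vector<cv::Point2f> corners, moved;
    corners.push_back(cv::Point2f( 90,  90));
    corners.push_back(cv::Point2f(210,  90));
    corners.push_back(cv::Point2f(210, 210));
    corners.push_back(cv::Point2f( 90, 210));
    cv::transform(corners, moved, A);

    // The rotation angle (in degrees) falls out of the linear part of the affine matrix.
    double angle = std::atan2(A.at<double>(1, 0), A.at<double>(0, 0)) * 180.0 / CV_PI;
    (void)angle;
    return 0;
}

Note that the average flow vector alone only describes a translation, which is why taking its angle with the X axis does not give you a rotation; the rotation comes out of the affine fit instead.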

Dealing with Boundary conditions / Halo regions in CUDA

I'm working on image processing with CUDA and I have a question about pixel processing.
What is often done with the boundary pixels of an image when applying a m x m convolution filter?
With a 3 x 3 convolution kernel, ignoring the 1-pixel boundary of the image is the easier option, especially when the code is improved with shared memory. Indeed, in this case one does not need to check whether a given pixel has all its neighbours available (e.g. the pixel at coordinates (0, 0) has no left, upper-left or upper neighbours). However, removing the 1-pixel boundary of the original image produces only a partial result.
In contrast, I'd like to process all the pixels within the image, also when using shared-memory improvements, i.e., for example, loading 16 x 16 pixels but computing only the inner 14 x 14. Also in this case, ignoring the boundary pixels generates clearer code.
What is usually done in this case?
Does anyone usually use my approach ignoring the boundary pixels?
Of course, I'm aware the answer depends on the type of problem, i.e. adding two images pixel-wise has not this problem.
Thanks in advance.
A common approach to dealing with border effects is to pad the original image with extra rows and columns based on your filter size. Some common choices for the padded values are (a small sketch of the index remapping follows the list):
A constant (e.g. zero)
Replicate the first and last row / column as many times as needed
Reflect the image at the borders (e.g. column[-1] = column[1], column[-2] = column[2])
Wrap the image values (e.g. column[-1] = column[width-1], column[-2] = column[width-2])
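As a minimal illustration, the last three options can be written as index remapping (my own helper names, plain C++, for a 1-D index; apply the same per dimension for 2-D images, and it assumes the overrun is smaller than the image size):

#include <algorithm>

// Replicate the edge value: column[-1] = column[0], column[n] = column[n-1].
inline int clampIndex(int i, int n)   { return std::min(std::max(i, 0), n - 1); }

// Reflect at the border: column[-1] = column[1], column[-2] = column[2].
inline int reflectIndex(int i, int n) { if (i < 0) i = -i; if (i >= n) i = 2 * n - 2 - i; return i; }

// Wrap around: column[-1] = column[n-1], column[-2] = column[n-2].
inline int wrapIndex(int i, int n)    { return ((i % n) + n) % n; }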
tl;dr: It depends on the problem you're trying to solve -- there is no solution for this that applies to all problems. In fact, mathematically speaking, I suspect there may be no "solution" at all since I believe it's an ill-posed problem you're forced to deal with.
(Apologies in advance for my reckless abuse of mathematics)
To demonstrate, let's consider a situation where all pixel components and kernel values are assumed to be positive. To get an idea of how some of these answers could lead us astray, let's further think about a simple averaging ("box") filter. If we set values outside the boundary of the image to zero then this will clearly drag down the average at every pixel within ceil(n/2) (Manhattan distance) of the boundary. So you'll get a "dark" border on your filtered image (assuming a single intensity component or RGB colorspace -- your results will vary by colorspace!). Note that similar arguments can be made if we set the values outside the boundary to any arbitrary constant -- the average will tend towards that constant. A constant of zero might be appropriate if the edges of your typical image tend towards 0 anyway. This is also true if we consider more complex filter kernels like a Gaussian, although the problem will be less pronounced because the kernel values tend to decrease quickly with distance from the center.
Now suppose that instead of using a constant we choose to repeat the edge values. This is the same as making a border around the image and copying rows, columns, or corners enough times to ensure the filter stays "inside" the new image. You could also think of it as clamping/saturating the sample coordinates. This has problems with our simple box filter because it overemphasizes the values of the edge pixels. A set of edge pixels will appear more than once yet they all receive the same weight w=(1/(n*n)).
Suppose we sample an edge pixel with value K 3 times. That means its contribution to the average is:
K*w + K*w + K*w = K*3*w
So effectively that one pixel has a higher weight in the average. Note that since this is an average filter the weight is a constant over the kernel. However, this argument applies to kernels with weights that vary by position too (again: think of the Gaussian kernel).
Suppose we wrap or reflect the sampling coordinates so that we're still using values from within the boundary of the image. This has some valuable advantages over using a constant but isn't necessarily "correct" either. For instance, how many photos do you take where the objects at the upper border are similar to those at the bottom? Unless you're taking pictures of mirror-smooth lakes I doubt this is true. If you're taking pictures of rocks to use as textures in games, wrapping or reflecting could be appropriate. I'm sure there are significant points to be made here about how wrapping and reflecting will likely reduce any artifacts that result from using a Fourier transform. However this comes back to the same idea: that you have a periodic signal which you do not wish to distort by introducing spurious new frequencies or overestimating the amplitude of existing frequencies.
So what can you do if you're filtering photos of bright red rocks beneath a blue sky? Clearly you don't want to add orange-ish haze in the blue sky and blue-ish fuzz on the red rocks. Reflecting the sample coordinate works because we expect similar colors to those pixels found at the reflected coordinates... unless, just for the sake of argument, we imagine the filter kernel is so big that the reflected coordinate would extend past the horizon.
Let's go back to the box filter example. An alternative with this filter is to stop thinking about using a static kernel and think back to what this kernel was meant to do. An averaging/box filter is designed to sum the pixel components then divide by the number of pixels summed. The idea is that this smooths out noise. If we're willing to trade a reduced effectiveness in suppressing noise near the boundary, we can simply sum fewer pixels and divide by a correspondingly smaller number. This can be extended to filters with similar what-I-will-call-"normalizing" terms -- terms that are related to the area or volume of the filter. For "area" terms you count the number of kernel weights that are within the boundary and ignore those weights that are not. Then use this count as the "area" (which might involve an extra multiplication). For volume (again: assuming positive weights!) simply sum the kernel weights. This idea is probably awful for derivative filters because there are fewer pixels to compete with the noisy pixels and differentials are notoriously sensitive to noise. Also, some filters have been derived by numeric optimization and/or empirical data rather than from ab-initio/analytic methods and thus may lack a readily apparent "normalizing" factor.
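A minimal sketch of that renormalized box filter in plain C++ (my own illustration, not code from the question; it assumes a single-channel image stored row-major in a vector):

#include <vector>

// Box filter that renormalizes near the boundary: instead of padding, divide by
// the number of samples that actually fell inside the image.
std::vector<float> boxFilterRenormalized(const std::vector<float>& img,
                                         int width, int height, int radius)
{
    std::vector<float> out(img.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            int count = 0;                                   // the "area" actually covered
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    int xx = x + dx, yy = y + dy;
                    if (xx < 0 || xx >= width || yy < 0 || yy >= height) continue;
                    sum += img[yy * width + xx];
                    ++count;
                }
            }
            out[y * width + x] = sum / count;                // divide by samples used, not (2r+1)^2
        }
    }
    return out;
}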
Your question is somewhat broad and I believe it mixes two problems:
dealing with boundary conditions;
dealing with halo regions.
The first problem (boundary conditions) is encountered, for example, when computing the convolution between an image and a 3 x 3 kernel. When the convolution window runs across the boundary, one has the problem of extending the image outside of its boundaries.
The second problem (halo regions) is encountered, for example, when loading a 16 x 16 tile within shared memory and one has to process the internal 14 x 14 tile to compute second order derivatives.
For the second issue, I think a useful question is the following: Analyzing memory access coalescing of my CUDA kernel.
Concerning the extension of a signal outside of its boundaries, a useful tool is provided in this case by texture memory, thanks to the different addressing modes it offers; see The different addressing modes of CUDA textures.
Below, I'm providing an example of how a simple 3-point filter (each sample is averaged with its two neighbours) can be implemented with periodic boundary conditions using texture memory.
#include <stdio.h>

#include "TimingGPU.cuh"
#include "Utilities.cuh"

texture<float, 1, cudaReadModeElementType> signal_texture;

#define BLOCKSIZE 32

/******************************************************************/
/* KERNEL: 3-POINT AVERAGING WITH PERIODIC BOUNDARY CONDITIONS    */
/******************************************************************/
__global__ void median_filter_periodic_boundary(float * __restrict__ d_vec, const unsigned int N)
{
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < N) {

        // --- The texture addressing mode handles the out-of-range fetches.
        float signal_center = tex1D(signal_texture, tid - 0);
        float signal_before = tex1D(signal_texture, tid - 1);
        float signal_after  = tex1D(signal_texture, tid + 1);

        printf("%i %f %f %f\n", tid, signal_before, signal_center, signal_after);

        d_vec[tid] = (signal_center + signal_before + signal_after) / 3.f;
    }
}

/********/
/* MAIN */
/********/
int main() {

    const int N = 10;

    // --- Input host array declaration and initialization
    float *h_arr = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; i++) h_arr[i] = (float)i;

    // --- Output host and device array vectors
    float *h_vec = (float *)malloc(N * sizeof(float));
    float *d_vec;   gpuErrchk(cudaMalloc(&d_vec, N * sizeof(float)));

    // --- CUDA array declaration and texture memory binding; CUDA array initialization
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
    // Alternatively
    // cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

    cudaArray *d_arr;   gpuErrchk(cudaMallocArray(&d_arr, &channelDesc, N, 1));
    gpuErrchk(cudaMemcpyToArray(d_arr, 0, 0, h_arr, N * sizeof(float), cudaMemcpyHostToDevice));

    cudaBindTextureToArray(signal_texture, d_arr);
    signal_texture.normalized = false;
    signal_texture.addressMode[0] = cudaAddressModeWrap;

    // --- Kernel execution
    median_filter_periodic_boundary<<<iDivUp(N, BLOCKSIZE), BLOCKSIZE>>>(d_vec, N);
    gpuErrchk(cudaPeekAtLastError());
    gpuErrchk(cudaDeviceSynchronize());

    gpuErrchk(cudaMemcpy(h_vec, d_vec, N * sizeof(float), cudaMemcpyDeviceToHost));

    for (int i = 0; i < N; i++) printf("h_vec[%i] = %f\n", i, h_vec[i]);

    printf("Test finished\n");

    return 0;
}

Optimal sigma for Gaussian filtering of an image?

When applying a Gaussian blur to an image, typically the sigma is a parameter (examples include Matlab and ImageJ).
How does one know what sigma should be? Is there a mathematical way to figure out an optimal sigma? In my case, I have some objects in images that are bright compared to the background, and I need to find them computationally. I am going to apply a Gaussian filter to make the centers of these objects even brighter, which hopefully facilitates finding them. How can I determine the optimal sigma for this?
There's no formula to determine it for you; the optimal sigma will depend on image factors - primarily the resolution of the image and the size of your objects in it (in pixels).
Also, note that Gaussian filters aren't actually meant to brighten anything; you might want to look into contrast maximization techniques - sounds like something as simple as histogram stretching could work well for you.
edit: More explanation - sigma basically controls how "fat" your kernel function is going to be; higher sigma values blur over a wider radius. Since you're working with images, bigger sigma also forces you to use a larger kernel matrix to capture enough of the function's energy. For your specific case, you want your kernel to be big enough to cover most of the object (so that it's blurred enough), but not so large that it starts overlapping multiple neighboring objects at a time - so actually, object separation is also a factor along with size.
Since you mentioned MATLAB - you can take a look at various gaussian kernels with different parameters using the fspecial('gaussian', hsize, sigma) function, where hsize is the size of the kernel and sigma is, well, sigma. Try varying the parameters to see how it changes.
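The same experiment can be done from C++ with OpenCV's getGaussianKernel (a small sketch of my own, not part of the original answer; the kernel size and sigma values are arbitrary):

#include <opencv2/imgproc/imgproc.hpp>
#include <iostream>

int main()
{
    // Print 1-D Gaussian kernels of the same size but different sigmas to see how
    // sigma controls how "fat" the kernel is (the 2-D kernel is the outer product
    // of two such 1-D kernels).
    const int ksize = 9;
    double sigmas[] = { 0.5, 1.0, 2.0 };
    for (int i = 0; i < 3; ++i) {
        cv::Mat k = cv::getGaussianKernel(ksize, sigmas[i], CV_64F);
        cv::Mat row = k.t();                 // transpose just to print on one line
        std::cout << "sigma = " << sigmas[i] << ": " << row << std::endl;
    }
    return 0;
}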
I use this convention as a rule of thumb: if k is the size of the kernel, then sigma = (k - 1) / 6. This is because roughly 99% of the mass of a Gaussian pdf lies within an interval 6 sigma wide.
You have to find the min/max of a function G(X, sigma), where X is the set of your observations (in your case, your image grayscale values). This function can be anything that maintains the "order" of the intensities of the image; for example, it can be the first derivative of the image (as G):
fil = fspecial('sobel');
im = imfilter(I, fil);
imagesc(im);
colormap gray;
This gives you the result of the first derivative of the image. Now you find the best sigma by maximizing G(X, sigma): you try a few sigmas (say, in increasing order) until you reach the one that makes G maximal. The same can be done with the second derivative.
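A minimal sketch of that sigma sweep in C++ with OpenCV (my own translation of the idea, not the answerer's MATLAB; the sigma range, the gradient-energy score and the file name are illustrative choices):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>

int main()
{
    cv::Mat I = cv::imread("image.png", 0);                    // grayscale input (placeholder name)

    double bestSigma = 0.0, bestScore = -1.0;
    for (double sigma = 0.5; sigma <= 5.0; sigma += 0.5) {
        cv::Mat blurred, gx, gy;
        cv::GaussianBlur(I, blurred, cv::Size(0, 0), sigma);   // kernel size derived from sigma

        // Score G(X, sigma): total first-derivative magnitude of the smoothed image.
        cv::Sobel(blurred, gx, CV_32F, 1, 0);
        cv::Sobel(blurred, gy, CV_32F, 0, 1);
        cv::Mat mag = cv::abs(gx) + cv::abs(gy);
        double score = cv::sum(mag)[0];

        if (score > bestScore) { bestScore = score; bestSigma = sigma; }
    }
    // bestSigma now holds the sigma that maximized the chosen score.
    return 0;
}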
Assuming the central value of the (unnormalized) kernel equals 1, the kernel size that guarantees the outermost value is less than a given limit (e.g. 1/100) is as follows:
double limit = 1.0 / 100.0;
int size = static_cast<int>(2 * std::ceil(std::sqrt(-2.0 * sigma * sigma * std::log(limit))));
if (size % 2 == 0)
{
    size++;    // keep the kernel size odd so it has a central element
}

Basic Complexity Question - Convolution

I'm trying to evaluate the complexity of some basic image filtering algorithms. I was wondering if you could verify this theory:
For a basic pixel-by-pixel filter like Inverse, the number of operations grows linearly with the size of the input (in pixels).
Let S = Length of the side of the image
Let M = # pixels input
Inverse is of order O(M) or O(S^2).
A convolution filter, on the other hand, has a parameter R which determines the size of the neighbourhood convolved to establish the next value of each pixel.
Let R = Radius of convolution filter
Convolution is of order O(M * (2R + 1)^2) = O(M * R^2)
Or should I let N = the size of the convolution filter (neighbourhood) in pixels?
O(M * N) = O(MN)
Ultimately, a convolution filter is linearly dependent on the product of the number of pixels in the image and the number of pixels in the neighbourhood.
If you have any links to a paper where this has been documented it would be greatly appreciated.
Kind regards,
Gavin
O(MN) seems right if I understand that for each pixel in the image the convolution is the adjustment of pixel values in the neighbourhood N, regardless of N being square. N could be a best-fit triangle ... but provided the pixels in the neighbourhood are adjusted for each pixel in the image, then O(MN) makes sense, because the work depends on the number of pixels adjusted per pixel in the source image.
Interestingly, in a non-regular neighbourhood some pixels may be adjusted by the neighbourhood mask more than others, but O(MN) will still stand.
If the neighbourhood is centred on a pixel P and then moved to the next P which was not in the neighbourhood (meaning each pixel is transformed once), then this doesn't hold.
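A minimal sketch of the counting argument in plain C++ (my own illustration; a square kernel of radius R over a single-channel row-major image, so the loop nest makes the O(M * N) cost explicit):

#include <vector>

// Naive 2-D convolution: for each of the M = width*height pixels we visit all
// N = (2R+1)^2 neighbourhood positions, hence O(M * N) operations overall.
std::vector<float> convolve(const std::vector<float>& img, int width, int height,
                            const std::vector<float>& kernel, int R)
{
    std::vector<float> out(img.size(), 0.0f);
    const int side = 2 * R + 1;
    for (int y = 0; y < height; ++y) {                 // M iterations over the image ...
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int dy = -R; dy <= R; ++dy) {         // ... times N iterations over the neighbourhood
                for (int dx = -R; dx <= R; ++dx) {
                    int xx = x + dx, yy = y + dy;
                    if (xx < 0 || xx >= width || yy < 0 || yy >= height) continue; // ignore outside pixels
                    acc += img[yy * width + xx] * kernel[(dy + R) * side + (dx + R)];
                }
            }
            out[y * width + x] = acc;
        }
    }
    return out;
}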
