Why is cv::resize so slow? - ios

I'm doing some edge detection on a live video feed:
- (void)processImage:(Mat&)image
{
    cv::resize(image, smallImage, cv::Size(288, 352), 0, 0, CV_INTER_CUBIC);
    edgeDetection(smallImage);
    cv::resize(smallImage, image, image.size(), 0, 0, CV_INTER_LINEAR);
}
edgeDetection does some fairly heavy lifting, and was running at quite a low framerate with the video frame size of 1280x720. Adding in the resize calls dramatically decreased the framerate, quite the reverse of what I was expecting. Is this just because a resize operation is slow, or because I'm doing something wrong?
smallImage is declared in the header thus:
@interface CameraController : UIViewController
<CvVideoCameraDelegate>
{
Mat smallImage;
}
There is no initialisation of it, and it works ok.

Resizing an image is slow, and you are doing it twice for each processed frame. There are several ways to improve your solution, but you would have to provide more details about the problem you are trying to solve.
To begin with, resizing an image before detecting edges means the edge detector is fed less information, so fewer edges will be detected - or at least they will be harder to detect.
The interpolation algorithm also affects speed: CV_INTER_LINEAR is the fastest option for cv::resize, if my memory does not fail me - and you are using CV_INTER_CUBIC for the first resize.
One alternative to resizing the image is to process a smaller region of the original image instead. For that, take a look at OpenCV image ROIs (regions of interest). It is quite easy to do, and there are lots of questions on this site about them. The downside is that you will only be detecting edges in a region, not in the whole image; that might be fine, depending on the problem.
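For illustration, a minimal C++ sketch of the ROI idea, assuming the full-resolution frame from the question and the same edgeDetection() helper (the region size here is arbitrary):
// Process only a centred region of the frame instead of resizing the whole image.
cv::Rect roi(image.cols / 4, image.rows / 4,    // top-left corner of the region
             image.cols / 2, image.rows / 2);   // width and height of the region
cv::Mat region = image(roi);                    // no copy: region shares data with image
edgeDetection(region);                          // anything drawn into region appears in image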
If you really want to resize the images, OpenCV developers usually use the pyrDown and pyrUp functions when they want to process smaller images, instead of resize. I think it is because it is faster, but you can test it to be sure. More information about pyrDown and pyrUp is in this link.
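For example, a rough sketch (assuming the 1280x720 frame from the question; each pyrDown call halves both dimensions):
cv::Mat half, quarter;
cv::pyrDown(image, half);                 // 1280x720 -> 640x360
cv::pyrDown(half, quarter);               //  640x360 -> 320x180
edgeDetection(quarter);                   // the helper from the question
cv::pyrUp(quarter, half);                 //  320x180 -> 640x360
cv::pyrUp(half, image, image.size());     // back to the original 1280x720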
About cv::resize algorithms, here is the list:
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
Can't say for sure if INTER_LINEAR is the fastest of them all but it is for sure faster than INTER_CUBIC.

INTER_NEAREST is the fastest and has the worst quality results. In the downscale, for each pixel, it just uses the pixel nearest to the hypothetical place.
INTER_LINEAR is a good compromise of performance and quality, but it is slower than INTER_NEAREST.
INTER_CUBIC is slower than INTER_LINEAR because it interpolates over a larger pixel neighborhood.
INTER_LANCZOS4 is the algorithm with the best quality result, but it is slower than the others.
Here you can find a good comparison article: http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/

Time trial on 4 core CPU (not GPU).
From: (1440, 2560, 3)
To: (300, 300, 3)
Fastest to slowest:
INTER_NEAREST resize: Time Taken: 0:00:00.001024
INTER_LINEAR resize: Time Taken: 0:00:00.004321
INTER_CUBIC resize: Time Taken: 0:00:00.007929
INTER_LANCZOS4 resize: Time Taken: 0:00:00.021042
INTER_AREA resize: Time Taken: 0:00:00.065569
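If you want to reproduce this kind of measurement in C++, here is a rough sketch using cv::getTickCount (the synthetic test frame and the target size are just placeholders matching the numbers above):
#include <cstdio>
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src(1440, 2560, CV_8UC3, cv::Scalar(127, 127, 127));  // synthetic test frame
    cv::Mat dst;
    const int flags[] = { cv::INTER_NEAREST, cv::INTER_LINEAR, cv::INTER_CUBIC,
                          cv::INTER_LANCZOS4, cv::INTER_AREA };
    const char *names[] = { "INTER_NEAREST", "INTER_LINEAR", "INTER_CUBIC",
                            "INTER_LANCZOS4", "INTER_AREA" };
    for (int i = 0; i < 5; i++)
    {
        double t0 = (double)cv::getTickCount();
        cv::resize(src, dst, cv::Size(300, 300), 0, 0, flags[i]);
        double seconds = ((double)cv::getTickCount() - t0) / cv::getTickFrequency();
        std::printf("%-15s %.6f s\n", names[i], seconds);   // time for one resize call
    }
    return 0;
}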

Related

Detect objects similar to circles

I'm trying to detect objects that are similar to circles using OpenCV's HoughCircles. The problem is: HoughCircles fails to detect such objects in some cases.
Does anyone know any alternative way to detect objects similar to circles like these ones?
Update
Hello folks, I'm adding a GIF of the result of my detection method.
It's easier to use a GIF to explain the problem. The undesired effect that I want to remove is the circle size variation. Even for a static shape like the one on the right, the result on the left is imprecise. Does anyone know a solution for that?
Update
All that I need from this object is its diameter. I've done it using findContours, but now I can't use findContours because it is too slow when combining OpenCV with OpenMP. Does anyone know a fast alternative to findContours?
Update
Here is the code that I'm using to detect these shapes:
for (int j = 0; j <= NUM_THREADS - 1; j++)
{
    capture >> frame[j];
}

// One frame is processed per thread.
#pragma omp parallel shared(frame, processOutput, circles, diameterArray, diameter)
{
    int n = omp_get_thread_num();
    cvtColor(frame[n], processOutput[n], CV_BGR2GRAY);
    GaussianBlur(processOutput[n], processOutput[n], Size(9, 9), 2, 2);
    threshold(processOutput[n], processOutput[n], 21, 250, CV_THRESH_BINARY);
    dilate(processOutput[n], processOutput[n], Mat(), Point(-1, -1), 2, 1, 1);
    erode(processOutput[n], processOutput[n], Mat(), Point(-1, -1), 2, 1, 1);
    Canny(processOutput[n], processOutput[n], 20, 20 * 2, 3);
    HoughCircles(processOutput[n], circles[n], CV_HOUGH_GRADIENT, 1, frame[n].rows / 8, 100, 21, 50, 100);
}

#pragma omp parallel private(m, n) shared(circles)
{
    #pragma omp for
    for (n = 0; n <= NUM_THREADS - 1; n++)
    {
        for (m = 0; m < circles[n].size(); m++)
        {
            // Each detected circle is (x, y, radius).
            Point center(cvRound(circles[n][m][0]), cvRound(circles[n][m][1]));
            int radius = cvRound(circles[n][m][2]);
            diameter = 2 * radius;
            diameterArray[n] = diameter;
            circle(frame[0], center, 3, Scalar(0, 255, 0), -1, 8, 0);
            circle(frame[0], center, radius, Scalar(0, 0, 255), 3, 8, 0);
        }
    }
}
Edited based on new description and additional performance and accuracy requirements.
This is getting beyond the scope of an "OpenCV sample project", and getting into the realm of actual application development. Both performance and accuracy become requirements.
This requires a combination of techniques. So, don't just pick one approach. You will have to try all combinations of approaches, as well as fine-tune the parameters to find an acceptable combination.
#1. overall approach for continuous video frame recognition tasks
Use a slow but accurate method to acquire an initial detection result.
Once a positive detection is found on one frame, the next frame should switch to a fast local search algorithm using the position detected on the most recent frame.
As a reminder, don't forget to update the "most recent position" for use by the next frame.
#2. suggestion for initial object acquisition.
Stay with your current approach, and incorporate the suggestions.
You can still fine-tune the balance between speed and precision, because a correct but imprecise result (off by tens of pixels) will be updated and refined when the next frame is processed with the local search approach.
Try my suggestion of increasing the dp parameter.
A large value of dp reduces the resolution at which Hough Gradient Transform is performed. This reduces the precision of the center coordinates, but will improve the chance of detecting a dented circle because the dent will become less significant when the transform is performed at a lower resolution.
An added benefit is that reduced resolution should run faster.
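As an illustration only (the other parameters are copied from the HoughCircles call in the question and would still need tuning), increasing dp from 1 to 2 might look like this:
// dp = 2 builds the Hough accumulator at half resolution: dented circles become
// easier to detect and the call runs faster, at the cost of less precise centres.
HoughCircles(processOutput[n], circles[n], CV_HOUGH_GRADIENT,
             2 /* dp */, frame[n].rows / 8, 100, 21, 50, 100);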
#3. suggestion for fast local search around a previously detected position
Because of the limited search space and amount of data needed, it is possible to make local search both fast and precise.
For tracking the movement of the boundary of the iris through video frames, I suggest using a family of algorithms called the Snakes model.
The focus is on tracking the movement of edges through intensity profiles. There are many algorithms that implement the Snakes model. Unfortunately, most implementations are tailored to very complex shape recognition, which would be overkill and too slow for your project.
Basic idea: (assuming that the previous result is a curve)
Choose some sampling points on the curve.
Scan the edge profile (perpendicular to the curve) at each sampling point on the new frame, using the position from the old frame. Look for the sharpest change.
Remember the new edge position for this sampling point.
After all of the sampling points have been updated, create a new curve by joining all of the updated sampling point positions.
There are many varieties, and different levels of sophistication of implementations, which you can find on the Internet. Unfortunately, it was reported that the one packaged with OpenCV might not work very well. You may have to try different open-source implementations, and ultimately you may have to implement one that is simple but well-tuned to your project's needs.
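To make the idea concrete, here is a very simplified C++ sketch of the profile-scan step - not a full Snakes implementation. The function name and the sampling/profile lengths are made up for illustration; it assumes the previous result is a circle (centre and radius) and the new frame is already grayscale:
#include <opencv2/opencv.hpp>
#include <vector>
#include <cmath>

// Refine a previously detected circle on a new frame: for each sample point,
// scan a short intensity profile along the outward radial direction and keep
// the position with the strongest intensity change.
static void refineCircle(const cv::Mat &gray, cv::Point2f &center, float &radius,
                         int samples = 64, int halfProfile = 10)
{
    std::vector<cv::Point2f> updated;
    cv::Rect bounds(0, 0, gray.cols, gray.rows);
    for (int i = 0; i < samples; i++)
    {
        double a = 2.0 * CV_PI * i / samples;
        cv::Point2f dir((float)std::cos(a), (float)std::sin(a));   // radial direction
        float bestOffset = 0.f, bestGrad = 0.f;
        for (int t = -halfProfile; t < halfProfile; t++)
        {
            cv::Point p0 = center + dir * (radius + (float)t);
            cv::Point p1 = center + dir * (radius + (float)t + 1.f);
            if (!bounds.contains(p0) || !bounds.contains(p1))
                continue;
            float grad = std::abs((float)gray.at<uchar>(p1) - (float)gray.at<uchar>(p0));
            if (grad > bestGrad) { bestGrad = grad; bestOffset = (float)t; }
        }
        updated.push_back(center + dir * (radius + bestOffset));
    }
    // Re-fit a circle to the updated points (here simply by averaging).
    cv::Point2f c(0.f, 0.f);
    for (size_t i = 0; i < updated.size(); i++)
        c += updated[i] * (1.f / (float)updated.size());
    float r = 0.f;
    for (size_t i = 0; i < updated.size(); i++)
    {
        float dx = updated[i].x - c.x, dy = updated[i].y - c.y;
        r += std::sqrt(dx * dx + dy * dy) / (float)updated.size();
    }
    center = c;
    radius = r;
}
On each new frame you would call this with the circle found on the previous frame, and fall back to the full HoughCircles detection only when the refined result looks implausible (i.e. tracking was lost).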
#4. Seek advice for your speed optimization attempts.
Use a software performance profiler.
Add some timing and logging code around each OpenCV function call to print out the time spent on each step. You will be surprised. The reason is that some OpenCV functions are more heavily vectorized and parallelized than others, perhaps as a result of a labor of love.
Unfortunately, for the slowest step - initial object acquisition, there is not much you can parallelize (by multithread).
This is perhaps already obvious to you since you did not put #pragma omp for around the first block of code. (It would not help anyway.)
Vectorization (SIMD) would only benefit pixel-level processing. If OpenCV implements it, great; if not, there is not much you can do.
My guess is that cvtColor, GaussianBlur, threshold, dilate, erode could have been vectorized, but the others might not be.
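For instance, a bare-bones way to add that per-step timing (cv::getTickCount is available everywhere; the macro name is arbitrary and the wrapped calls are the ones from the question):
#include <cstdio>
#include <opencv2/opencv.hpp>

// Wrap a single call with a timer and log the elapsed milliseconds.
#define TIME_STEP(label, ...)                                                        \
    do {                                                                             \
        double _t0 = (double)cv::getTickCount();                                     \
        __VA_ARGS__;                                                                 \
        double _ms = 1000.0 * ((double)cv::getTickCount() - _t0) / cv::getTickFrequency(); \
        std::printf("%-14s %8.3f ms\n", label, _ms);                                 \
    } while (0)

// Example usage with some of the steps from the question:
TIME_STEP("cvtColor",     cvtColor(frame[n], processOutput[n], CV_BGR2GRAY));
TIME_STEP("GaussianBlur", GaussianBlur(processOutput[n], processOutput[n], Size(9, 9), 2, 2));
TIME_STEP("HoughCircles", HoughCircles(processOutput[n], circles[n], CV_HOUGH_GRADIENT,
                                       1, frame[n].rows / 8, 100, 21, 50, 100));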
Give the following approach a try (a rough sketch is shown below):
Find the contours in the source image.
Find the minimum enclosing circle for each contour.
Draw the contour, filled, to a new Mat.
Similarly, draw the enclosing circle, filled, to another new Mat.
XOR the two Mats and count the non-zero pixels.
You can decide whether a contour is close to a circle by comparing the non-zero pixel count between the contour and its enclosing circle against a threshold. You can choose the threshold as a percentage of the enclosing circle's area.
The idea is simple: the area between a contour and its enclosing circle decreases as the contour approaches a circle.
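A rough C++ sketch of those steps (the function name and the 15% mismatch threshold are just illustrative choices):
#include <opencv2/opencv.hpp>
#include <vector>

// Decide whether a contour is roughly circular by XOR-ing a filled drawing of
// the contour with a filled drawing of its minimum enclosing circle and
// comparing the mismatching area with a fraction of the circle's area.
bool isRoughlyCircular(const std::vector<cv::Point> &contour, cv::Size imgSize,
                       double maxMismatch = 0.15)
{
    cv::Point2f center;
    float radius = 0.f;
    cv::minEnclosingCircle(contour, center, radius);

    cv::Mat contourMask = cv::Mat::zeros(imgSize, CV_8UC1);
    cv::Mat circleMask  = cv::Mat::zeros(imgSize, CV_8UC1);

    std::vector<std::vector<cv::Point> > contours(1, contour);
    cv::drawContours(contourMask, contours, 0, cv::Scalar(255), -1);   // -1 = filled
    cv::circle(circleMask, center, (int)radius, cv::Scalar(255), -1);  // -1 = filled

    cv::Mat mismatch;
    cv::bitwise_xor(contourMask, circleMask, mismatch);

    double circleArea = CV_PI * radius * radius;
    return cv::countNonZero(mismatch) < maxMismatch * circleArea;
}
You would call this for each contour returned by findContours on your binary image.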

Simple way to check if an image bitmap is blurry

I am looking for a "very" simple way to check if an image bitmap is blurry. I do not need an accurate and complicated algorithm involving FFT, wavelets, etc. Just a very simple idea, even if it is not accurate.
I've thought of computing the average Euclidean distance between pixel (x,y) and pixel (x+1,y) over their RGB components and then using a threshold, but it works very badly. Any other ideas?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
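To make the formula concrete, here is a simplified single-channel C++ sketch of the idea (only horizontal differences along every 10th row are checked, and the 9×9 window is the local search area mentioned below; treat it as a starting point rather than a port of the original PHP code):
#include <opencv2/opencv.hpp>
#include <algorithm>

// Estimate sharpness of a greyscale image from the maximum difference between
// horizontally adjacent pixels, corrected by the local value range.
double estimateSharpness(const cv::Mat &grey, int offset = 15)
{
    int maxDiff = 0;
    cv::Point maxLoc(0, 0);
    for (int y = 0; y < grey.rows; y += 10)          // sample every 10th row
        for (int x = 1; x < grey.cols; x++)
        {
            int d = std::abs(grey.at<uchar>(y, x) - grey.at<uchar>(y, x - 1));
            if (d > maxDiff) { maxDiff = d; maxLoc = cv::Point(x, y); }
        }

    // Range of pixel values in a 9x9 window centred on the sharpest edge.
    cv::Rect win(maxLoc.x - 4, maxLoc.y - 4, 9, 9);
    win &= cv::Rect(0, 0, grey.cols, grey.rows);     // clip to the image
    double minVal, maxVal;
    cv::minMaxLoc(grey(win), &minVal, &maxVal);
    double range = maxVal - minVal;

    return (maxDiff / (offset + range)) * (1.0 + offset / 255.0) * 100.0;
}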
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. In my tests the local search area was a 9×9 window centred on the location of the maximum pixel difference.
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value.
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus.
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest that you instead notify the user if an image appears blurry instead of rejecting it altogether.
I suppose that, philosophically speaking, all natural images are blurry... How blurry, and to what extent, depends on your application. Broadly speaking, the blurriness or sharpness of an image can be measured in various ways. As a first easy attempt I would check the energy of the image, defined as the normalised sum of the squared pixel values:
E = 1/N * Σ I², where I is the image and N is the number of pixels (defined for grayscale)
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with gaussian noise
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E_original = 1265,  E_blurred = 88
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
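The same check can be sketched outside MATLAB; here is a rough C++/OpenCV version (the Gaussian and Laplacian kernel sizes are assumptions):
#include <opencv2/opencv.hpp>

// Energy of the Laplacian-of-Gaussian response: mean of the squared filter
// output. A blurry image should give a considerably lower value.
double logEnergy(const cv::Mat &grey)
{
    cv::Mat blurred, response;
    cv::GaussianBlur(grey, blurred, cv::Size(5, 5), 0);
    cv::Laplacian(blurred, response, CV_32F, 3);
    cv::multiply(response, response, response);   // square every pixel
    return cv::mean(response)[0];                 // E = (1/N) * sum(I^2)
}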
calculate the average L1-distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images; for color you do this for each channel separately.
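For reference, a direct C++/OpenCV translation of these two formulas for a grayscale image (a sketch, not tuned code):
#include <opencv2/opencv.hpp>

// Blurriness measure: N2 / (N1*N1), where N1 is the average absolute
// difference and N2 the average squared difference of adjacent pixels.
double blurriness(const cv::Mat &grey)
{
    cv::Mat g;
    grey.convertTo(g, CV_64F);

    // Differences to the left and upper neighbours.
    cv::Mat dx = g(cv::Rect(1, 0, g.cols - 1, g.rows)) - g(cv::Rect(0, 0, g.cols - 1, g.rows));
    cv::Mat dy = g(cv::Rect(0, 1, g.cols, g.rows - 1)) - g(cv::Rect(0, 0, g.cols, g.rows - 1));

    cv::Mat adx = cv::abs(dx), ady = cv::abs(dy);
    cv::Mat dx2 = dx.mul(dx),  dy2 = dy.mul(dy);

    double nPix = 2.0 * (double)g.total();
    double n1 = (cv::sum(adx)[0] + cv::sum(ady)[0]) / nPix;   // average L1 distance
    double n2 = (cv::sum(dx2)[0] + cv::sum(dy2)[0]) / nPix;   // average L2 distance
    return n2 / (n1 * n1);
}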

Fast image thresholding

What is a fast and reliable way to threshold images with possible blurring and non-uniform brightness?
Example (blurring but uniform brightness):
Because the image is not guaranteed to have uniform brightness, it's not feasible to use a fixed threshold. An adaptive threshold works alright, but because of the blurriness it creates breaks and distortions in the features (here, the important features are the Sudoku digits):
I've also tried using Histogram Equalization (using OpenCV's equalizeHist function). It increases contrast without reducing differences in brightness.
The best solution I've found is to divide the image by its morphological closing (credit to this post) to make the brightness uniform, then renormalize, then use a fixed threshold (using Otsu's algorithm to pick the optimal threshold level):
Here is code for this in OpenCV for Android:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
Mat closed = new Mat(); // closed will have type CV_32F
Imgproc.morphologyEx(image, closed, Imgproc.MORPH_CLOSE, kernel);
Core.divide(image, closed, closed, 1, CvType.CV_32F);
Core.normalize(closed, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
Imgproc.threshold(image, image, -1, 255, Imgproc.THRESH_BINARY_INV + Imgproc.THRESH_OTSU);
This works great but the closing operation is very slow. Reducing the size of the structuring element increases speed but reduces accuracy.
Edit: based on DCS's suggestion I tried using a high-pass filter. I chose the Laplacian filter, but I would expect similar results with Sobel and Scharr filters. The filter picks up high-frequency noise in the areas which do not contain features, and suffers from similar distortion to the adaptive threshold due to blurring. It also takes about as long as the closing operation. Here is an example with a 15x15 filter:
Edit 2: Based on AruniRC's answer, I used Canny edge detection on the image with the suggested parameters:
double mean = Core.mean(image).val[0];
Imgproc.Canny(image, image, 0.66*mean, 1.33*mean);
I'm not sure how to reliably automatically fine-tune the parameters to get connected digits.
Using Vaughn Cato and Theraot's suggestions, I scaled down the image before closing it, then scaled the closed image up to regular size. I also reduced the kernel size proportionately.
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(5,5));
Mat temp = new Mat();
Imgproc.resize(image, temp, new Size(image.cols()/4, image.rows()/4));
Imgproc.morphologyEx(temp, temp, Imgproc.MORPH_CLOSE, kernel);
Imgproc.resize(temp, temp, new Size(image.cols(), image.rows()));
Core.divide(image, temp, temp, 1, CvType.CV_32F); // temp will now have type CV_32F
Core.normalize(temp, image, 0, 255, Core.NORM_MINMAX, CvType.CV_8U);
Imgproc.threshold(image, image, -1, 255,
Imgproc.THRESH_BINARY_INV+Imgproc.THRESH_OTSU);
The image below shows the results side-by-side for 3 different methods:
Left - regular size closing (432 pixels), size 19 kernel
Middle - half-size closing (216 pixels), size 9 kernel
Right - quarter-size closing (108 pixels), size 5 kernel
The image quality deteriorates as the size of the image used for closing gets smaller, but the deterioration isn't significant enough to affect feature recognition algorithms. The speed increases slightly more than 16-fold for the quarter-size closing, even with the resizing, which suggests that closing time is roughly proportional to the number of pixels in the image.
Any suggestions on how to further improve upon this idea (either by further reducing the processing time, or by reducing the deterioration in image quality) are very welcome.
Alternative approach:
Assuming your intention is to have the numerals clearly binarized... shift your focus to components instead of the whole image.
Here's a pretty easy approach:
Do a Canny edge map on the image. First try the Canny function with the low threshold set to 0.66*[mean value] and the high threshold set to 1.33*[mean value], where the mean is taken over the greylevel values.
You would need to fiddle with the parameters a bit to get an image where the major components/numerals are visible clearly as separate components. Near perfect would be good enough at this stage.
Considering each Canny edge as a connected component (i.e. use cvFindContours() or its C++ counterpart, whichever you prefer), one can estimate the foreground and background greylevels and derive a threshold.
For the last bit, do take a look at sections 2. and 3. of this paper. Skipping most of the non-essential theoretical parts it shouldn't be too difficult to have it implemented in OpenCV.
Hope this helped!
Edit 1:
Based on the Canny edge thresholds here's a very rough idea just sufficient to fine-tune the values. The high_threshold controls how strong an edge must be before it is detected. Basically, an edge must have gradient magnitude greater than high_threshold to be detected in the first place. So this does the initial detection of edges.
Now, the low_threshold deals with connecting nearby edges. It controls how much nearby disconnected edges will get combined together into a single edge. For a better idea, read "Step 6" of this webpage. Try setting a very small low_threshold and see how things come about. You could discard that 0.66*[mean value] thing if it doesn't work on these images - it's just a rule of thumb anyway.
We use Bradley's algorithm for a very similar problem (segmenting letters from a background with uneven light and uneven background colour), described here: http://people.scs.carleton.ca:8008/~roth/iit-publications-iti/docs/gerh-50002.pdf, with C# code here: http://code.google.com/p/aforge/source/browse/trunk/Sources/Imaging/Filters/Adaptive+Binarization/BradleyLocalThresholding.cs?r=1360. It works on an integral image, which can be calculated using OpenCV's integral function. It is very reliable and fast, but it is not itself implemented in OpenCV; it is easy to port, though.
Another option is the adaptiveThreshold method in OpenCV, but we did not give it a try: http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#adaptivethreshold. The MEAN version is the same as Bradley's, except that it uses a constant to modify the mean value instead of a percentage, which I think is better.
Also, good article is here: https://dsp.stackexchange.com/a/2504
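If you want to try the Bradley approach without porting the C# code, here is a rough OpenCV (C++) sketch of the same idea: compare each pixel with the mean of a large surrounding window. The window size (about 1/8 of the image width) and the 15% margin are the commonly quoted defaults, so treat them as assumptions to tune:
#include <algorithm>
#include <opencv2/opencv.hpp>

// Bradley-style adaptive threshold: a pixel becomes black (0) when it is more
// than `margin` darker than the local mean of a large window around it.
cv::Mat bradleyThreshold(const cv::Mat &grey, double margin = 0.15)
{
    int window = std::max(3, grey.cols / 8) | 1;           // odd window, ~1/8 of the width

    cv::Mat localMean;                                     // boxFilter = windowed mean,
    cv::boxFilter(grey, localMean, CV_32F,                 // internally about as cheap as the
                  cv::Size(window, window));               // integral-image formulation

    cv::Mat grey32f, limit, binary;
    grey.convertTo(grey32f, CV_32F);
    limit = localMean * (1.0 - margin);
    cv::compare(grey32f, limit, binary, cv::CMP_GT);       // 255 = background, 0 = ink
    return binary;
}
The adaptiveThreshold MEAN variant mentioned above does essentially the same thing, with an additive constant instead of a percentage.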
You could try working on a per-tile basis if you know you have a good crop of the grid. Working on 9 sub-images rather than the whole picture will most likely lead to more uniform brightness on each sub-image. If your cropping is perfect you could even try going for each digit cell individually; but it all depends on how reliable your crop is.
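A quick sketch of what that per-tile processing could look like, assuming a reasonably tight crop of the grid and Otsu applied to each of the 3x3 tiles independently:
#include <opencv2/opencv.hpp>

// Threshold each sub-image of a cropped grid independently with Otsu, so that
// uneven brightness across the whole grid matters less.
void thresholdPerTile(const cv::Mat &grey, cv::Mat &binary, int rows = 3, int cols = 3)
{
    binary.create(grey.size(), CV_8UC1);
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
        {
            int x = c * grey.cols / cols,  y = r * grey.rows / rows;
            int w = (c + 1) * grey.cols / cols - x;
            int h = (r + 1) * grey.rows / rows - y;
            cv::Rect tile(x, y, w, h);
            cv::threshold(grey(tile), binary(tile), 0, 255,
                          cv::THRESH_BINARY_INV + cv::THRESH_OTSU);
        }
}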
An elliptical structuring element is more expensive to compute than a rectangular one.
Try to change:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(19,19));
to:
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(19,19));
This can speed up your solution enough, with little impact on accuracy.

Image analysis through GPU (with OpenGL ES) or CPU?

I'd like to analyze the constantly updating image feed that comes from an iPhone camera to determine a general "lightness coefficient". Meaning: if the coefficient returns 0.0, the image is completely black, if it returns 1.0 the image is completely white. Of course all values in between are the ones that I care about the most (background info: I'm using this coefficient to calculate the intensity of some blending effects in my fragment shader).
So I'm wondering whether I should run a for loop over my pixel buffer and analyze the image every frame (30 fps), sending the coefficient as a uniform to my fragment shader, or whether there is a way to analyze the image in OpenGL. If so, how should I do that?
There is more than one answer, each with its own strengths and weaknesses.
On the CPU it's fairly simple: cycle through the pixels, sum them up, divide, and that's it. It's five minutes of work, and it will take a few milliseconds with a good implementation.
int i;
long long sum = 0;                         // wide accumulator to avoid overflow on large frames
int count = width * height * channels;
for (i = 0; i < count; i++)
    sum += buffer[i];
double avg = double(sum) / double(count);
On GPU, it is likely to be much faster, but there are a few drawbacks: First one is the amount of work needed just to put everything in place. The GPUImage framework will save you some work, but it will also add a lot of code. If all you want to do is to sum the pixels, it may be a waste. Second problem is that sending pixels to GPU may take more than summing them up in the CPU. Actually, the GPU will justify the work only if you really need serious processing.
A third option, using the CPU with a library, has the drawback that you add a heavy dependency for something you could do in 10 lines. But the result will be beautiful. Again, it is justified if you also use the library for other tasks. Here is an example in OpenCV:
cv::Mat frame(height, width, type, buffer);   // e.g. type = CV_8UC4 for a BGRA pixel buffer
cv::Scalar channelSum = cv::sum(frame);
double avgLuminance = (channelSum[0] + channelSum[1] + channelSum[2] + channelSum[3])
                      / double(frame.total() * frame.channels());
There is of course OpenCL, which allows you to use your GPU for general processing.

OpenCV GPU convolve function and the missing border

I have a question about the Convolve function in OpenCV using GPU acceleration.
The speed of the convolution is roughly 3.5x faster when using the GPU
when running:
convolve(src_32F, kernel, cresult, false, cbuffer);
However, the image borders are missing in cresult.
The result is otherwise excellent, though (the kernel size is 60x60).
thanks
This is the way convolution works.
It calculates the value of every pixel as a weighted average of the surrounding ones. So if the kernel takes into account 30 pixels on each side, convolution is not defined for any pixel closer than 30 pixels to the image border.
In the CPU implementation of filtering functions, those missing pixels are supplemented with bogus values based on a given strategy (copy, mirror, blank, etc).
What you can do is manually pad your matrix with the desired values into a bigger matrix, filter the big one, and then crop it back. For that you can use the gpu::copyMakeBorder() function.
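For illustration, the pad-filter-crop pattern with the CPU cv::copyMakeBorder looks like this (the gpu:: version takes the same padding arguments; cv::filter2D stands in for the GPU convolve call just to keep the sketch self-contained, and src_32F / kernel are the Mats from the question):
// Pad by half the kernel size so the convolution is defined right up to the
// original border, then crop the result back to the original geometry.
int pad = kernel.rows / 2;                               // 30 for a 60x60 kernel
cv::Mat padded, fullResult;
cv::copyMakeBorder(src_32F, padded, pad, pad, pad, pad, cv::BORDER_REFLECT_101);
cv::filter2D(padded, fullResult, -1, kernel);            // gpu::convolve on `padded` in your case
cv::Mat cresult = fullResult(cv::Rect(pad, pad, src_32F.cols, src_32F.rows)).clone();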
