Fast, reliable focus score for camera frames - iOS

I'm doing real-time frame-by-frame analysis of a video stream in iOS.
I need to assign a score to each frame for how in focus it is. The method must be very fast to calculate on a mobile device and should be fairly reliable.
I've tried simple things like summing after using an edge detector, but haven't been impressed by the results. I've also tried using the focus scores provided in the frame's metadata dictionary, but they're significantly affected by the brightness of the image, and much more device-specific.
What are good ways to calculate a fast, reliable focus score?

Poor focus means that edges are not very sharp and small details are lost. Heavy JPEG compression produces very similar distortions.
Compress a copy of your image heavily, decompress it, and compute the difference with the original. A large difference, even at just a few spots, means the source image had sharp details that were lost in compression. If the difference is relatively small everywhere, the source was already fuzzy.
The method can easily be tried in an image editor. (No, I have not tried it yet.) Hopefully the iPhone already ships with an optimized JPEG compressor.
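Not iOS code, but a quick way to prototype the idea with OpenCV in Python; the quality setting and the plain mean-of-absolute-difference score are arbitrary choices, so treat this as a sketch rather than a tuned measure:

```python
import cv2
import numpy as np

def jpeg_focus_score(gray, quality=20):
    """Heavily JPEG-compress a frame and measure how much detail was lost.

    A higher score suggests the frame had sharp detail that the compressor
    destroyed, i.e. the frame was in focus."""
    ok, buf = cv2.imencode(".jpg", gray, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        return 0.0
    recompressed = cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)
    diff = cv2.absdiff(gray, recompressed)
    return float(diff.mean())

# Usage sketch: a sharp frame should score higher than a blurred copy of itself.
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
print(jpeg_focus_score(frame), jpeg_focus_score(cv2.GaussianBlur(frame, (9, 9), 0)))
```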

A simple answer, which the human visual system probably uses, is to implement focusing on top of edge tracking. If a set of edges can be tracked across the visual sequence, you can work with the intensity profile of those edges alone and determine when it is steepest.

From a theoretical point of view, blur manifests as a loss of high-frequency content. Thus, you can simply do an FFT and check the relative frequency distribution. The iPhone uses ARM Cortex chips, whose NEON instructions can be used for an efficient FFT implementation.
#9000's suggestion of heavy JPEG compression has a related effect: keeping only a small number of the largest transform coefficients usually results in what is, in essence, a low-pass filter.
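A rough sketch of the FFT-based score in Python/NumPy; the cutoff radius is an arbitrary choice, and on the device you would want an optimized FFT rather than NumPy:

```python
import numpy as np

def fft_focus_score(gray, cutoff_ratio=0.25):
    """Fraction of spectral energy above a cutoff radius (larger -> sharper)."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float32)))
    power = np.abs(f) ** 2
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    cutoff = cutoff_ratio * min(h, w) / 2          # assumed cutoff, tune per device
    high = power[radius > cutoff].sum()
    return float(high / power.sum())
```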

Consider different kinds of edges, e.g. peaks versus step edges. The latter will still be present regardless of focus. To isolate the former, use non-maximum suppression in the direction of the gradient. As a focus score, use the ratio of suppressed edges at two different resolutions.
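One possible reading of this, sketched in Python with OpenCV (Canny already performs non-maximum suppression along the gradient direction); the thresholds and the exact ratio are my own guesses, not a proven recipe:

```python
import cv2
import numpy as np

def edge_ratio_focus_score(gray, lo=50, hi=150):
    """Compare how many NMS-surviving edge pixels exist at two resolutions."""
    edges_full = cv2.Canny(gray, lo, hi)
    half = cv2.resize(gray, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    edges_half = cv2.Canny(half, lo, hi)
    n_full = max(int(np.count_nonzero(edges_full)), 1)
    n_half = max(int(np.count_nonzero(edges_half)), 1)
    # Heuristic: in-focus frames lose many fine edges at half resolution,
    # so the full/half ratio tends to grow; the 4x accounts for pixel count.
    return n_full / (4.0 * n_half)
```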

Related

Removing sunlight reflection from IR camera images in a real-time OpenCV application

I am developing a speed estimation and vehicle counting application with OpenCV, and I use an IR camera.
I am facing a problem with sunlight reflections, which cause vertical white regions or lines in the images and hurt my vehicle detection.
I want an approach that is very fast, because this is a real-time application.
The vertical streak defect in those images is called "blooming". It happens when one or a few wells in a CCD saturate to the point that they spill charge into neighboring wells in the same column. In addition, you have "regular" saturation with no blooming around the area of the reflection.
If you can, the best solution is to control the exposure (faster shutter time, or close lens iris if you have one). This will reduce but not eliminate blooming occurrence.
Blooming always occurs in a constant direction (vertical or horizontal, depending on your image orientation), and will normally fill one or a few contiguous columns entirely. So you can detect it cheaply by heavily subsampling in the opposite dimension and looking for maxima that repeat in the same column. E.g., in your images, you could look for saturated maxima in the same column over 10 or so rows spread over the image height.
Once you detect the blooming columns, you can follow them in a small band around them to try to locate the saturated area. Note that saturation does not necessarily imply values at the end of the dynamic range (e.g. 255 for an 8-bit image). Your sensor could be completely saturated at values that the A/D conversion maps to, say, 252. Saturation simply means that the image response becomes constant with respect to the input luminance.
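A sketch of that cheap column check in Python/NumPy; the saturation level and the vote count are assumptions for an 8-bit image and would need tuning:

```python
import numpy as np

def find_blooming_columns(gray, n_rows=10, sat_level=250, min_hits=8):
    """Flag columns that are near-saturated in most of a handful of sampled rows."""
    h, _ = gray.shape
    rows = gray[np.linspace(0, h - 1, n_rows).astype(int), :]  # subsample rows over the height
    hits = (rows >= sat_level).sum(axis=0)                     # saturated votes per column
    return np.flatnonzero(hits >= min_hits)                    # candidate blooming columns
```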
The easiest solution (to me) is a hardware one. If you can modify the physical camera setup, add a polarizing filter to the camera lens. You don't even need an (expensive) camera-specific filter; a simple sheet of polarizing film is good enough (search for "polarizing film"). You will have to play with the orientation, but with a mounted camera most surfaces are at roughly the same angle and the glare will be polarized near horizontal, so you should be able to find a position that works well in most situations.
I've used this method before, and the best part is that it adds no extra algorithmic complexity or lag, especially for mounted cameras where all surfaces are at nearly the same angle. It won't help you process the images you currently have, but it will help in acquiring and processing future images.

Is it possible to take a low-resolution image from a street camera, enlarge it and see image details

I would like to know if it is possible to take a low-resolution image from a street camera, enlarge it,
and see image details (for example a face, or a car's plate number). Is there any software that is able to do it?
Thank you.
example of image: http://imgur.com/9Jv7Wid
Possible? Yes. In existence? Not to my knowledge.
What you are referring to is called super-resolution. The way it works, in theory, is that you take multiple low-resolution images and combine them to create a high-resolution image.
The way this works is that you essentially map each image onto all the others to form a stack, where the target portion of the image is all the same. This gets extremely complicated extremely fast, as any distortion (e.g. movement of the target) will cause the images to differ dramatically at the pixel level.
But let's say you have the images stacked and have removed the non-relevant pixels from the stack of images. You are hopefully left with a movie/stack of images that all show the exact same scene, but with sub-pixel distortions. A sub-pixel distortion simply means that the target has moved somewhere inside the pixel, or has moved partially into the neighboring pixel.
You can't measure if the target has moved within the pixel, but you can detect if the target has moved partially into a neighboring pixel. You can do this by knowing that the target is going to give off X amount of photons, so if you see 1/4 of the photons in one pixel and 3/4 of the photons in the neighboring pixel you know it's approximate location, which is 3/4 in one pixel and 1/4 in the other. You then construct an image that has a resolution of these sub-pixels and place these sub-pixels in their proper place.
All of this gets very computationally intensive, and sometimes the images are just too low-resolution and have too much distortion from image to image to even create a meaningful stack. I did read a paper about a lab at a university being able to create high-resolution images from low-resolution images, but it was a very tightly controlled experiment, where they moved the target precisely X amount from image to image and had a very precise camera (probably scientific grade, which is far more sensitive than any commercial-grade security camera).
In essence, to do this reliably in the real world you need to set up cameras in a very precise way, and they need to be very accurate in a particular way, which is going to be expensive, so you are better off just putting in a better camera than relying on this very imprecise technique.
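For reference, a toy shift-and-add sketch in Python/OpenCV of the multi-image idea described above. It assumes the sub-pixel shift of each frame relative to the first is already known (from some registration step) and that the motion is pure translation; it is nowhere near a production super-resolution pipeline:

```python
import cv2
import numpy as np

def shift_and_add(frames, shifts, scale=2):
    """Upsample each frame, undo its known (dx, dy) shift on the fine grid, average."""
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale), np.float32)
    for f, (dx, dy) in zip(frames, shifts):
        up = cv2.resize(f.astype(np.float32), (w * scale, h * scale),
                        interpolation=cv2.INTER_LINEAR)
        # Shift compensation so every frame lands on the same spot of the fine grid.
        m = np.float32([[1, 0, -dx * scale], [0, 1, -dy * scale]])
        acc += cv2.warpAffine(up, m, (w * scale, h * scale))
    return (acc / len(frames)).clip(0, 255).astype(np.uint8)
```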
Actually, it is possible to do super-resolution (SR) from even a single low-resolution (LR) image! So you don't have to hassle with taking many LR images with sub-pixel shifts to achieve it. The intuition behind such techniques is that natural scenes are full of repetitive patterns that can be used to enhance the frequency content of similar patches (e.g. you can use dictionary learning in your SR reconstruction technique to generate the high-resolution version). Sure, the enhancement may not be as good as using many LR images, but such a technique is simpler and more practical.
Photoshop would be your best bet. But know that you cannot reliably increase the size of an image without making the quality even worse.

Why is supersampling not widely used for image scaling?

I'm looking for an appropriate image scaling algorithm and wondered why supersampling is not as popular as bicubic, bilinear or even Lanczos.
By supersampling I mean a method that divides the source image into equal rectangles, each rectangle corresponding to a pixel in the destination image. In my opinion, this is the most natural and accurate method. It takes into account all pixels of the source image, while bilinear might skip some pixels. As far as I can see, the quality is also very high, comparable with Lanczos.
Why do popular image libraries (such as GraphicsMagic, GD or PIL) not implement this algorithm? I found implementations only in the Intel IPP and AMD Framewave projects. I know of at least one disadvantage: it can only be used for downscaling. Am I missing something else?
For comparison, this is a 4.26x scaled down image. From left to right: GraphicsMagic Sinc filter (910ms), Framewave Super method (350ms), GraphicsMagic Triangle filter (320ms):
Now I know the answer: because a pixel is not a little square. And that is why supersampling gives an aliased result when resizing. This can be seen in the thin water jets in the sample image. It is not fatal, though, and supersampling can be used for scaling down by 2x, 3x and so on to dramatically reduce the picture size before resizing to the exact dimensions with another method. This technique is used in jpeglib to open images at a smaller size.
Of course, we can still think of pixels as squares, and the GD library actually does: its imagecopyresampled is true supersampling.
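For the record, a literal "average equal rectangles" downscale in Python/NumPy, for integer factors only (OpenCV's INTER_AREA does essentially the same thing and also handles non-integer factors):

```python
import numpy as np

def box_downscale(img, k):
    """Average each k x k block of the source into one destination pixel."""
    h = img.shape[0] - img.shape[0] % k          # crop so k divides evenly
    w = img.shape[1] - img.shape[1] % k
    img = img[:h, :w]
    blocks = img.reshape(h // k, k, w // k, k, *img.shape[2:])
    return blocks.mean(axis=(1, 3)).astype(img.dtype)
```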
You are a bit mistaken when saying that linear rescaling misses pixels. Assuming you are rescaling the image by at most a factor of 2, bilinear interpolation takes into account all the pixels of the source image. If you smooth the image a bit and then use bilinear interpolation, this gives you high-quality results; for most practical cases even bicubic interpolation is not needed.
Since bilinear interpolation is extremely fast (it can easily be executed in fixed-point arithmetic), it is by far the best image rescaling algorithm for real-time processing.
If you intend to shrink the image by more than a factor of 2, then bilinear interpolation is mathematically wrong, and with larger factors even bicubic starts to make mistakes. That is why image processing software (like Photoshop) uses better, but much more CPU-demanding, algorithms.
The answer to your question is speed considerations.
Given the speed of your CPU/GPU, the image size and the desired frame rate, you can easily compute how many operations you can afford per pixel. For example, with a 2 GHz CPU and a 1 Gpix image, you can only do a few operations per pixel each second.
Given the amount of allowed calculations, you select the best algorithm you can afford. So the decision is usually driven not by image quality but by speed considerations.
Another point about supersampling: sometimes it works much better if you do it in the frequency domain. This is called frequency interpolation. But you will not want to calculate an FFT just to rescale an image.
Moreover, I don't know if you are familiar with back projection. This is a way to interpolate the image from destination to source instead of from source to destination. Using back projection you can enlarge the image by a factor of 10, use bilinear interpolation and still be mathematically correct.
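A small sketch of that back-projection idea in Python/OpenCV: for each destination pixel, compute where it lands in the source and sample there with bilinear interpolation (the half-pixel center offset is ignored for simplicity):

```python
import cv2
import numpy as np

def enlarge_backward(img, factor):
    """Destination-to-source (backward) mapping with bilinear sampling via cv2.remap."""
    h, w = img.shape[:2]
    new_h, new_w = int(h * factor), int(w * factor)
    ys, xs = np.meshgrid(np.arange(new_h), np.arange(new_w), indexing="ij")
    map_x = (xs / factor).astype(np.float32)   # destination pixel -> source x
    map_y = (ys / factor).astype(np.float32)   # destination pixel -> source y
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
```

This is essentially what cv2.resize already does internally for enlargement; the point of the sketch is only to show the destination-to-source direction of the lookup.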
Computational burden and increased memory demand are most likely the answer you are looking for. That's why adaptive supersampling was introduced, which compromises between burden/memory demand and effectiveness.
I guess supersampling is still too heavy even for today's hardware.
Short answer: they are supersampling. I think the problem is terminology.
In your example, you are scaling down. This means decimating, not interpolating. Decimation will produce aliasing if no supersampling is used. I don't see aliasing in the images you posted.
A sinc filter involves supersampling. It is especially good for decimation because it specifically cuts off frequencies above those that can be represented in the final image. Judging from the name, I suspect the triangle filter is also a form of supersampling. The second image you show is blurry, but I see no aliasing, so my guess is that it also uses some form of supersampling.
Personally, I have always been confused by Adobe Photoshop, which asks me if I want "bicubic" or "bilinear" when I am scaling. But bilinear, bicubic, and Lanczos are interpolation methods, not decimation methods.
I can also tell you that modern video games use supersampling as well. Mipmapping is a commonly used shortcut to real-time decimation: it pre-decimates individual images by powers of two.

EMGU OpenCV disparity only on certain pixels

I'm using the EMGU OpenCV wrapper for C#. I've got a disparity map being created nicely. However, for my specific application I only need the disparity values of very few pixels, and I need them in real time. The calculation is taking about 100 ms now; I imagine that by getting disparity for hundreds of pixels rather than thousands, things would speed up considerably. I don't know much about what's going on "under the hood" of the stereo solver code. Is there a way to speed things up by only calculating the disparity for the pixels that I need?
First of all, you don't mention what you are really trying to accomplish, or which algorithm you are using. E.g. StereoGC is really slow (i.e. not real-time) but usually far more accurate than both StereoSGBM and StereoBM. Those last two can be used in real time, provided a few conditions are met:
The size of the input images is reasonably small;
You are not using an extravagant set of parameters (for instance, a larger value for numberOfDisparities will increase computation time).
Don't expect miracles when it comes to accuracy though.
Apart from that, there is the issue of "just a few pixels". As far as I understand, the algorithms implemented in OpenCV usually rely on information from more than one pixel to determine the disparity value. E.g. they need a neighborhood to detect which pixel from image A maps to which pixel in image B. As a result, in general it is not possible to just discard every other pixel of the image (by the way, if you already knew the locations in both images, you would not need the stereo methods at all). So unless you can discard a large border of your input images for which you know that you'll never find your pixels of interest there, I'd say the answer to this part of your question is "no".
If you happen to know that your pixels of interest will always be within a certain rectangle of the input images, you can restrict the input image ROIs (regions of interest) to that rectangle. Assuming OpenCV does not contain a bug here, this should speed up the computation a little.
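A sketch of the ROI idea, in plain OpenCV/Python for brevity (EmguCV wraps the same classes); the ROI coordinates and SGBM parameters are made-up examples:

```python
import cv2

# Hypothetical ROI in the rectified images; pad it on the left by roughly
# numDisparities pixels so the pixels you care about still get valid matches.
x, y, w, h = 100, 80, 320, 240
left  = cv2.imread("left_rect.png",  cv2.IMREAD_GRAYSCALE)[y:y+h, x:x+w]
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)[y:y+h, x:x+w]

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disparity = matcher.compute(left, right).astype(float) / 16.0  # SGBM output is fixed-point (x16)

print(disparity[120, 160])  # disparity at one pixel of interest inside the ROI
```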
With a bit of googling you can find real-time examples of finding stereo correspondences using EmguCV (or plain OpenCV) on the GPU on YouTube. Maybe that could help you.
Disclaimer: this may have been a more complete answer if your question contained more detail.

Feature Detection in Noisy Images

I've built an imaging system with a webcam and feature matching such that, as I move the camera around, I can track the camera's motion. I am doing something similar to here, except with the webcam frames as the input.
It works really well for "good" images, but when taking images in really low light, lots of noise appears (high camera gain), and that messes with the feature detection and matching. Basically, it doesn't detect any good features, and when it does, it cannot match them correctly between frames.
Does anyone know a good solution for this? What other methods are used for finding and matching features?
Here are two example images with very few features:
I think phase correlation is going to be your best bet here. It is designed to tell you the phase shift (i.e., translation) between two images. It is much more resilient (but not immune) to noise than feature detection because it operates in frequency space, whereas feature detectors operate spatially. Another benefit is that it is very fast compared with feature detection methods. I have an implementation available in the OpenCV trunk that is sub-pixel accurate, located here.
However, your images are pretty much "featureless" with the exception of the crease in the middle, so even phase correlation may have some trouble with it. Think of it like trying to detect translation in a snow storm. If all you can see is white, you can't tell that you have translated at all, thus the term whiteout. In your case, the algorithm might suffer from "greenout" :)
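For illustration, phase correlation between two consecutive frames with plain OpenCV in Python (the file names are placeholders; the inputs must be single-channel float):

```python
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

window = cv2.createHanningWindow(prev.shape[::-1], cv2.CV_32F)  # reduces edge effects
(dx, dy), response = cv2.phaseCorrelate(prev, curr, window)
print("shift:", dx, dy, "peak response:", response)             # a low response suggests an unreliable estimate
```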
Can you adjust the camera settings to work better in low-light conditions? Have you fully opened the iris? Can you live with lower frame rates? Setting a longer exposure time will allow the camera to gather more light, thus giving you more features, at the cost of adding motion blur. Or, if low light is your default environment, you probably want something designed for it like an IR camera, but those can be expensive. Other than that, a big lens and long exposures are your friend :)
Histogram equalization may be of interest in improving the image contrast. But, sometimes it can just enhance the noise. OpenCV has a global histogram equalization function called equalizeHist. For a more localized implementation, you'll want to look at Contrast Limited Adaptive Histogram Equalization or CLAHE for short. Here is a good article on it. This page has some nice examples, and some code.
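A minimal example of both options with OpenCV in Python (the clip limit and tile size are arbitrary starting values, not tuned settings):

```python
import cv2

gray = cv2.imread("dark_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name

global_eq = cv2.equalizeHist(gray)                           # global histogram equalization
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # contrast-limited adaptive version
local_eq = clahe.apply(gray)
```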
