How to deal with noise in images captured with a camera - image-processing

Assume there is a black box with no light in it, and a camera inside the box. The camera starts capturing, and what it captures is nothing: everything in the box is pure black. But the sizes of the captured frames will differ because of various types of noise (thermal noise, quantization noise, etc.). I want to reduce or eliminate the effects of this noise in software so that, in a completely isolated black box, all captured frames are exactly the same. Resolution, depth, color: none of these properties matters after processing, and the accuracy/quality of the final frames doesn't matter. Any kind of filtering, downsampling, etc. is acceptable. The reference is the black box; frames should be as identical as possible.
Any suggestions?

There are many approaches to removing noise; the noise you are talking about is probably Gaussian noise.
The simplest thing you can do to remove it is to run a Gaussian blur on the image and then use a threshold to zero out every pixel whose value is below "a", where "a" is a parameter you should experiment with to find the most suitable value.

OK, this is a bit of a funny answer.
Since you wrote that the accuracy/quality of the captured frames in the end doesn't matter, I'd say simply return a software-generated black image. You could do this without a camera ;)
Hm, more seriously, I think you want to calculate a running average of the near-black frames.
Each extra sample will have less effect, and over time the average builds up into a noise pattern for your camera.
Then subtract that image from new captures.
Or return a statistical result, so that you return values and the probability of each value.
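The averaging-and-subtracting idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: frames are small grayscale images stored as lists of rows, and the names `average_frames`, `subtract_dark_frame` and the sample values are made up for the example.

```python
def average_frames(frames):
    """Per-pixel mean of equally sized grayscale frames (lists of rows)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


def subtract_dark_frame(frame, dark, floor=0):
    """Subtract the averaged dark frame and clamp the result at `floor`."""
    return [[max(floor, round(p - d)) for p, d in zip(frame_row, dark_row)]
            for frame_row, dark_row in zip(frame, dark)]


# Three hypothetical 2x3 captures from the closed box (values are made up).
dark_captures = [
    [[2, 0, 5], [1, 3, 0]],
    [[4, 1, 3], [0, 2, 1]],
    [[3, 2, 4], [2, 1, 2]],
]
dark = average_frames(dark_captures)

# A new capture with the averaged noise pattern removed.
cleaned = subtract_dark_frame([[3, 1, 4], [1, 2, 1]], dark)
print(cleaned)
```

Subtraction only removes the repeatable part of the noise. Random fluctuations above the average survive, so a threshold afterwards is probably still needed if the frames must be bit-identical.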

What exactly is the need for gamma correction?

I have problems to fully understand the need for gamma correction. I hope you guys can help me.
Let’s assume we want to display 256 neighboring pixels. These pixels should be a smooth gradient from black to white. To denote their colors, we use linear gray values from 0..255. Due to the non-linearity of the human eye, the monitor must not just turn these values into linear luminance values. If the neighboring pixels had the luminance values (1/256)*I_max, (2/256)*I_max, et cetera, we would perceive differences in brightness between two pixels that are too large in the darker area (the gradient would not be smooth).
Fortunately, a monitor has the reciprocal non-linearity to the human eye. That means, if we put linear gray values 0..255 into the frame buffer, then the monitor turns them into non-linear luminance values x^gamma. However, as our eye is non-linear the other way round, we perceive a smooth linear gradient. The non-linearity of the monitor and the one of our eye cancel each other out.
So, why do we need the gamma correction? I have read in books that we always want the monitor to produce linear luminance values. According to them, the non-linearity of the monitor must be compensated before writing the gray values to the frame buffer. That is done by the gamma correction. However, my problem here is that - as far as I understand it - we would not perceive linear brightness values (i.e. we would not perceive a smooth, steady gradient) when the monitor produces linear luminance values.
As far as I see it, it would be just perfect, if we put linear gray values into the frame buffer. The monitor turns these values into non-linear luminance values and our eye perceives linear brightness values again, because the eye is reciprocal non-linear. There would be no need to gamma correct the gray values in the frame buffer and no need to force the monitor to produce linear luminance values.
What is wrong with my way of looking at these things?
Thanks
Allow me to ‘resurrect’ this question, since I am struggling with similar questions right now and I think I have found the answer; it may be useful for someone else. Or I might be wrong and someone could tell me :)
I think there is nothing wrong with your way of thinking. Thing is, you don‘t need to gamma-correct all the time, if you know what you are doing. It depends on what you want to achieve. Let‘s see two different cases.
A) Light simulation (AKA rendering). You have a diffuse surface with a light pointing towards it. Then, the light's intensity is doubled.
Well. Let’s see what happens in the real world in such a situation. Assuming a purely diffuse surface, the intensity of the reflected light is going to be the surface's albedo multiplied by the incoming light intensity and the cosine of the angle between the incoming light and the normal. Whatever. The thing is, when the incoming light intensity is doubled, the reflected light intensity will be doubled too. This is why light transport is said to be a linear process. Funnily enough, you will not perceive the surface as twice as bright, because our perception is nonlinear (this is modelled by the so-called Stevens' power law). Put another way: in the real world the reflected light is doubled, but you do not perceive it as twice as bright.
Now, how would we simulate this? Well, if we have an sRGB texture with the surface's albedo, we would need to linearize it (by de-correcting it, which means applying the 2.2 gamma). Now that it is linear, and we have the light intensity, we can use the formula I mentioned before to compute the reflected light intensity. Since we are in a linear space, by doubling the intensity we will double the output, like in the real world. Now we gamma-correct our results. Because of this, when the screen displays the rendered image, it will apply its gamma, and so the overall response will be linear, meaning that the intensity of the light emitted by the screen will be twice as high when we simulate the twice-as-powerful light as when we simulate the first one. So the light that arrives at your eyes from your screen will have double the intensity, exactly as would happen if you were looking at the real surface with real lights affecting it. You will not perceive the second render as twice as bright, of course, but again, as we said earlier, this is exactly what would happen in the real situation. Same behavior in the real world and in the simulation means that the simulation (the render) was correct :)
B) A different case is precisely if you want a gradient that you want to 'look' (AKA being perceived) as linear.
Since you want the nonlinear response of the screen to cancel out our nonlinear visual perception, you can skip gamma correction altogether (as you suggest). Or, more accurately, keep operating in linear space and gamma-correcting, but create your gradient not with consecutive pixel values (1, 2, 3, ..., 255), which would be perceived nonlinearly (because of Stevens' law), but with values transformed by the inverse of our perceptual brightness response (that is, applying an exponent of 1/0.5=2 to the normalized values. This is applying the reciprocal of Stevens' exponent for brightness).
As a matter of fact, if you look at a gamma-corrected linear gradient such as the one in http://scanline.ca/gradients/ you do not perceive it as linear at all: you see far more variation in the lower intensities than in the higher ones (as expected).
Well, at least this is my current understanding of the topic. I hope it helps anyone. And again, please, please, if it is wrong I would be really grateful if someone could point it out...
The problem is really when doing color calculations. For example, if you are blending two colors, you need to use the linear intensities to do the calculations. To actually display the proper result, you then have to convert the linear intensities back to the gamma-corrected intensities.
How your eyes perceive the intensities isn't relevant. To do color calculations correctly, they have to be done based on the physical principles of optics, which relies on linear luminance values. Once you have calculated a color, you want those luminance values to be output by your monitor, regardless of how it is perceived, so you have to compensate for the fact that the monitor doesn't directly produce the colors that you want.
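To make the blending point concrete, here is a small sketch comparing a naive mix of encoded values with a mix done on linear intensities. It assumes a pure power-law gamma of 2.2 (real sRGB uses a slightly different piecewise curve), and the function names are invented for the example.

```python
GAMMA = 2.2  # assumed display gamma; real sRGB uses a piecewise curve


def decode(value_8bit):
    """Gamma-encoded 8-bit value -> linear intensity in [0, 1]."""
    return (value_8bit / 255.0) ** GAMMA


def encode(linear):
    """Linear intensity in [0, 1] -> gamma-encoded 8-bit value."""
    return round(255.0 * linear ** (1.0 / GAMMA))


def blend_linear(a, b, t=0.5):
    """Mix two encoded gray values by interpolating their linear intensities."""
    return encode((1.0 - t) * decode(a) + t * decode(b))


def blend_naive(a, b, t=0.5):
    """Mix the encoded values directly (not physically meaningful)."""
    return round((1.0 - t) * a + t * b)


# A 50/50 mix of black and white: the linear blend is noticeably brighter.
print(blend_naive(0, 255), blend_linear(0, 255))
```

The two results differ because half of the light from a white pixel is a linear intensity of 0.5, which has to be re-encoded before it goes into the frame buffer.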
To actually answer the question of what is wrong with your way of looking at this: there is nothing really wrong with it. It WOULD be great to have a linear framebuffer, but as you say, it's definitely not great to have an 8-bit linear frame buffer.
The fact that 8 bits are so easy to handle is pretty much the only justification for gamma-compressed frame buffers and color notations (think HTML's #888: wouldn't it be uncool to use #333 for middle gray instead of #888?).
About the monitor: you want to be able to predict its response to your input, and you know from sRGB what it should be. Normally that's all you need to know. Some people think it's "correct" or something if the monitor produces "linear" output, which can be simulated if you compensate for the monitor's gamma. I advise steering clear of such a setup, which breaks all the apps that (correctly and sanely) assume standard gamma in favour of un-breaking ill-conceived linearity-assuming apps. Don't do that. Instead, fix the apps or dump them.

Background removal using Kinect: noise suppression around body shape

The objective is to display the person on a different background (aka background removal).
I'm using the Kinect with Microsoft's Beta Kinect SDK to do so. With help of the depth, the background is filtered and we get only the image of the person.
This is pretty simple to do, and we can find the code that does that everywhere on the Internet. However, the depth signal is noisy, and we get pixels which do not belong to the person that are displayed.
I applied an edge detector to see if it was useful, and I currently get this:
Here's another without edge detection:
My question is: Which way can I get rid of these noisy white pixels around the person?
I tried morphological operations, but some parts of the body are erased and still leave white pixels behind.
The algorithm doesn't need to be real-time, I can just apply it when I press a 'Save image' button.
Edit 1:
I just tried to do background subtraction with the closest frames on the shape border. The single pixels you see are flickering, which means they are noise and I can easily get rid of them.
Edit 2:
The project is now over, and here's what we did: manual calibration of the Kinect by using the OpenNI driver, which provides directly the infrared image. The result is really good, but each calibration is specific to each Kinect.
Then, we applied a little transparency on the borders, and the result looks really nice! I can't provide pictures, however.
Your problem isn't just the noisy white pixels. You're missing significant parts of the person as well, e.g. part of his right hand. I'd recommend being more conservative with your thresholding of the depth data (allow more false positives). This would give you more noisy pixels, but at least you'd have the person in their entirety.
To get rid of the noisy pixels, I can think of a couple of things:
Feather the outer pixels (reduce them in intensity/increase their transparency if you're using an alpha channel)
Smooth the image, perform the edge detection on the smoothed image, then use these edges with your original sharp image.
Do some skin region detection to mark parts that definitely belong to a person. See skin detection in the YUV color space? and Skin Color Detection
For clothes, work with the hue and saturation image. If you know the color of the t-shirt (or that at least that it's not a neutral color), then this will stand out easily. If you don't know this information, then it may be worth building up a model of the person using the other frames (if there's a big gray blob that's moving around in your video, chances are that your subject is wearing a gray shirt)
The approaches aren't mutually exclusive so it may be worth trying to do them in combination. If I think of anything else, I'll post back here.
If there is no other way of resolving the jitter on the edges you could always try anti-alias as post-process.

How to compensate for uneven illumination in a photograph of a printed page?

I am trying to teach my camera to be a scanner: I take pictures of printed text and then convert them to bitmaps (and then to djvu and OCR'ed). I need to compute a threshold for which pixels should be white and which black, but I'm stymied by uneven illumination. For example if the pixels in the center are dark enough, I'm likely to wind up with a bunch of black pixels in the corners.
What I would like to do, under relatively simple assumptions, is compensate for uneven illumination before thresholding. More precisely:
Assume one or two light sources, maybe one with gradual change in light intensity across the surface (ambient light) and another with an inverse square (direct light).
Assume that the white parts of the paper all have the same reflectivity/albedo/whatever.
Find some algorithm to estimate degree of illumination at each pixel, and from that recover the reflectivity of each pixel.
From a pixel's reflectivity, classify it white or black
I have no idea how to write an algorithm to do this. I don't want to fall back on least-squares fitting since I'd somehow like to ignore the dark pixels when estimating illumination. I also don't know if the algorithm will work.
All helpful advice will be upvoted!
EDIT: I've definitely considered chopping the image into pieces that are large enough so they still look like "text on a white background" but small enough so that illumination of a single piece is more or less even. I think if I then interpolate the thresholds so that there's no discontinuity across sub-image boundaries, I will probably get something halfway decent. This is a good suggestion, and I will have to give it a try, but it still leaves me with the problem of where to draw the line between white and black. More thoughts?
EDIT: Here are some screen dumps from GIMP showing different histograms and the "best" threshold value (chosen by hand) for each histogram. In two of the three a single threshold for the whole image is good enough. In the third, however, the upper left corner really needs a different threshold:
I'm not sure if you still need a solution after all this time, but in case you do: a few years ago my team and I photographed about 250,000 pages with a camera and converted them to (almost black and white) grey scale images, which we then DjVued (and also made PDFs of).
(See The catalogue and complete collection of photographic facsimiles of the 1144 paper transcripts of the French Institute of Pondicherry.)
We also ran into the problem of uneven illumination. We came up with a simple unsophisticated solution which worked very well in practice. This solution should also work to create black and white images rather than grey scale (as I'll describe).
The camera and lighting setup
a) We taped an empty picture frame to the top of a table to keep our pages in the exact same position.
b) We put a camera on a tripod, also on top of the table, above and pointing down at the taped picture frame. On a bar about a foot wide, attached to the external flash holder on top of the camera, we attached two "modelling lights". These can be purchased at any good camera shop. They are designed to provide even illumination. The camera was shaded from the lights by putting a small cardboard box around each modelling light. We photographed in greyscale, which we then further processed. (Our pages were old browned paper with blue ink writing, so your case should be simpler.)
Processing of the images
We used the free software package irfanview.
This software has a batch mode which can simultaneously do color correction, change the bit depth and crop the images. We would take the photograph of a page and then in interactive mode adjust the brightness, contrast and gamma settings till it was close to black and white. (We used greyscale but by setting the bit depth to 2 you will get black and white when you batch process all the pages.)
After determining the best color correction we then interactively cropped a single image and noted the cropping settings. We then set all these settings in the batch mode window and processed the pages for one book.
Creating DjVu images.
We used the free DjVu Solo 3.1 to create the DjVu images. This has several modes to create the DjVu images. The mode which creates black and white images didn't work well for us for photographs, but the "photo" mode did.
We didn't OCR (since the images were handwritten Sanskrit), but as long as the letters are evenly illuminated I think your OCR software should ignore big black areas such as the gap between the pages of a two-page spread. But you can always get rid of the black between a two-page spread or at the edges by cropping the pages twice, once for the left-hand pages and once for the right-hand pages, and the irfanview software will allow you to cleverly number your pages so you can then remerge the pages in the correct order. I.e., rename your pages something like page-xxxA for left-hand pages and page-xxxB for right-hand pages, and the pages will then sort correctly by name.
If you still need a solution I hope some of the above is useful to you.
i would recommend calibrating the camera. considering that your lighting setup is fixed (that is the lights do not move between pictures), and your camera is grayscale (not color).
take a picture of a white sheet of paper which covers the whole workable area of your "scanner". store this picture, it tells what is white paper for each pixel. now, when you take a picture of a document to scan, you can reload your "white reference picture" and even out the illumination before performing a threshold.
let's call the white reference REF, the picture DOC, the even illumination picture EVEN, and the maximum value of a pixel MAX (for 8bit imaging, it is 255). for each pixel:
EVEN = DOC * (MAX/REF)
notes:
beware of the parentheses: most image processing libraries use the image pixel type for performing computation on pixel values, and a simple multiplication will overflow your pixel. if necessary, write the loop yourself and use a 32 bit integer for intermediate computations.
the white reference image can be smoothed before being used in the process. any smoothing or blurring filter will do, and don't hesitate to apply it aggressively.
the MAX value in the formula above represents the target pixel value in the resulting image. using the maximum pixel value targets a bright white, but you can adjust this value to target a lighter gray.
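The formula and notes above could be sketched like this in plain Python. The function name `flatten_illumination` and the sample values are made up for illustration. Python integers do not overflow, so the overflow warning does not bite here, but the multiply-first, divide-second order is the same one you would use with a 32 bit intermediate in another language.

```python
def flatten_illumination(doc, ref, target=255):
    """Apply EVEN = DOC * target / REF per pixel, using integer arithmetic."""
    even = []
    for doc_row, ref_row in zip(doc, ref):
        row = []
        for d, r in zip(doc_row, ref_row):
            r = max(r, 1)                        # avoid division by zero
            value = (d * target + r // 2) // r   # multiply first, then rounded division
            row.append(min(value, target))       # clip pixels brighter than the reference
        even.append(row)
    return even


# Hypothetical 1x4 strip: the paper gets darker toward the right,
# and the last pixel is ink.
ref = [[250, 200, 150, 100]]
doc = [[250, 200, 150, 20]]
print(flatten_illumination(doc, ref))
```

After the correction all paper pixels land on the target white regardless of how dim they were, so a single global threshold separates them from the ink.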
Well. Usually the image processing I do is highly time sensitive, so a complex algorithm like the one you're seeking wouldn't work. But . . . have you considered chopping the image up into smaller pieces, and re-scaling each sub-image? That should make the 'dark' pixels stand out fairly well even in an image of variable lighting conditions (I am assuming here that you are talking about a standard mostly-white page with dark text.)
It's a cheat, but a lot easier than the 'right' way you're suggesting.
This might be horrendously slow, but what I'd recommend is to break the scanned surface into quarters/16ths and re-color them so that the average grayscale level is similar across the page. (Might break if you have pages with large margins though)
I assume that you are taking images of (relatively) small black letters on a white background.
One approach could be to "remove" the small black objects, while keeping the illumination variations of the background. This gives an estimate of how the image is illuminated, which can be used for normalizing the original image. It is often enough to subtract the illumination estimate from the original image and then do a threshold based segmentation.
This approach is based on gray scale morphological filters, and could be implemented in matlab like below:
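img = imread('filename.png');                    % grayscale image, dark text on light paper
illumination = imclose(img, strel('disk', 10));  % closing removes the small dark objects
imgCorrected = illumination - img;               % text becomes bright (img - illumination would saturate to 0 for uint8)
thresholdValue = graythresh(imgCorrected);       % Otsu level, normalized to [0, 1]
bw = im2bw(imgCorrected, thresholdValue);        % true where there is text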
For an example with real images take a look at this guide from mathworks. For further reading about the use of morphological image analysis this book by Pierre Soille can be recommended.
Two algorithms come to my mind:
High-pass to alleviate the low-frequency illumination gradient
Local threshold with an appropriate radius
Adaptive thresholding is the keyword. Quote from a 2003 article by R. Fisher, S. Perkins, A. Walker, and E. Wolfart: “This more sophisticated version of thresholding can accommodate changing lighting conditions in the image, e.g. those occurring as a result of a strong illumination gradient or shadows.”
ImageMagick's -lat option can do it, for example:
convert -lat 50x50-2000 input.jpg output.jpg
input.jpg
output.jpg
You could try using an edge detection filter, then a floodfill algorithm, to distinguish the background from the foreground. Interpolate the floodfilled region to determine the local illumination; you may also be able to modify the floodfill algorithm to use the local background value to jump across lines and fill boxes and so forth.
You could also try a Threshold Hysteresis with a rate of change control. Here is the link to the normal Threshold Hysteresis. Set the first threshold to a typical white value. Set the second threshold to less than the lowest white value in the corners.
The difference is that you want to check the difference between pixels for all values in between the first and second threshold. Ideally if the difference is positive, then act normally. But if it is negative, you only want to threshold if the difference is small.
This will be able to compensate for lighting variations, but will ignore the large changes between the background and the text.
Why don't you use simple opening and closing operations?
Try this and just look at the results:
src - source image
src - open(src)
close(src) - src
and look at the close(src) - src result
By using different window sizes, you will get the background of the image.
I think this helps.

Adaptive threshold Binarization's bad effects

I implemented some adaptive binarization methods; they use a small window, and at each pixel the threshold value is calculated. There are problems with these methods:
If we select a window size that is too small, we get this effect (I think the reason is that the window size is small)
(source: piccy.info)
At the left upper corner there is an original image, right upper corner - global threshold result. Bottom left - example of dividing image to some parts (but I am talking about analyzing image's pixel small surrounding, for example window of size 10X10).
So you can see the result of such algorithms at the bottom right picture, we got a black area, but it must be white.
Does anybody know how to improve an algorithm to solve this problem?
There should be quite a lot of research going on in this area, but unfortunately I have no good links to give.
An idea, which might work but I have not tested, is to try to estimate the lighting variations and then remove that before thresholding (which is a better term than "binarization").
The problem is then moved from adaptive thresholding to finding a good lighting model.
If you know anything about the light sources then you could of course build a model from that.
Otherwise a quick hack that might work is to apply a really heavy low pass filter to your image (blur it) and then use that as your lighting model. Then create a difference image between the original and the blurred version, and threshold that.
EDIT: After quick testing, it appears that my "quick hack" is not really going to work at all. After thinking about it I am not very surprised either :)
I = someImage
Ib = blur(I, 'a lot!')
Idiff = I - Ib
It = threshold(Idiff, 'some global threshold')
EDIT 2
Got one other idea which could work depending on how your images are generated.
Try estimating the lighting model from the first few rows in the image:
Take the first N rows in the image
Create a mean row from the N collected rows. You now have one row as your background model.
For each row in the image subtract the background model row (the mean row).
Threshold the resulting image.
Unfortunately I am at home without any good tools to test this.
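The mean-row steps above might look like this as a rough, untuned sketch in plain Python. It assumes the first N rows contain only background and that the lighting varies only from left to right. The function names and the sample image are invented for the example.

```python
def subtract_mean_row(image, n_rows):
    """Subtract the column-wise mean of the first n_rows from every row."""
    width = len(image[0])
    background = [sum(row[x] for row in image[:n_rows]) / n_rows
                  for x in range(width)]
    return [[p - b for p, b in zip(row, background)] for row in image]


def threshold_dark(image, level):
    """1 where the pixel is below `level` (darker than the background), else 0."""
    return [[1 if p < level else 0 for p in row] for row in image]


# Hypothetical image: brightness falls off from left to right,
# and the last row contains one dark mark.
image = [
    [200, 150, 100],
    [200, 150, 100],
    [200,  60, 100],
]
diff = subtract_mean_row(image, n_rows=2)
print(threshold_dark(diff, level=-40))
```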
It looks like you're doing adaptive thresholding wrong. Your images look as if you divided your image into small blocks, calculated a threshold for each block and applied that threshold to the whole block. That would explain the "box" artifacts. Usually, adaptive thresholding means finding a threshold for each pixel separately, with a separate window centered around the pixel.
Another suggestion would be to build a global model for your lighting: In your sample image, I'm pretty sure you could fit a plane (in X/Y/Brightness space) to the image using least-squares, then separate the pixels into pixels brighter (foreground) and darker than that plane (background). You can then fit separate planes to the background and foreground pixels, threshold using the mean between these planes again and improve the segmentation iteratively. How well that would work in practice depends on how well your lighting can be modeled with a linear model.
If the actual objects you try to segment are "thinner" (you said something about barcodes in a comment), you could try a simple opening/closing operation to get a lighting model (i.e. close the image to remove the foreground pixels, then use [closed image+X] as threshold).
Or, you could try mean-shift filtering to get the foreground and background pixels to the same brightness. (Personally, I'd try that one first)
You have very non-uniform illumination and fairly large objects (thus, there is no universal easy way to extract the background and correct the non-uniformity). This basically means you cannot use global thresholding at all; you need adaptive thresholding.
You want to try Niblack binarization. Matlab code is available here
http://www.uio.no/studier/emner/matnat/ifi/INF3300/h06/undervisningsmateriale/week-36-2006-solution.pdf (page 4).
There are two parameters you'll have to tune by hand: window size (N in the above code) and weight.
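For readers who cannot open the linked PDF, here is a minimal sketch of Niblack-style thresholding in plain Python, where each pixel is compared against mean + k * std of its neighbourhood. This is not the code from the link; the function name, the border handling and the default values `window=3` and `k=-0.2` are assumptions made for illustration.

```python
from math import sqrt


def niblack(image, window=3, k=-0.2):
    """Return a binary image: 1 where pixel > mean + k * std of its window."""
    height, width = len(image), len(image[0])
    half = window // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, truncated at the image border.
            values = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            out[y][x] = 1 if image[y][x] > mean + k * sqrt(var) else 0
    return out


# Hypothetical 3x3 patch: bright paper with one dark pixel in the middle.
patch = [
    [200, 200, 200],
    [200,  50, 200],
    [200, 200, 200],
]
print(niblack(patch))
```

In completely flat regions the standard deviation is zero, so the decision there hinges on tiny differences; that is why the window size and the weight need tuning by hand.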
Try to apply a local adaptive threshold using this procedure:
convolve the image with a mean or median filter
subtract the original image from the convolved one
threshold the difference image
The local adaptive threshold method selects an individual threshold for each pixel.
I'm using this approach extensively and it's working fine with images having non uniform background.
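The three steps above can be sketched as follows in plain Python, using a mean filter. The function names, the window radius and the `offset` value are placeholders to be tuned, not values from the answer.

```python
def box_mean(image, radius):
    """Mean filter with a (2*radius+1)^2 window, truncated at the borders."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(sum(values) / len(values))
        out.append(row)
    return out


def local_adaptive_threshold(image, radius=1, offset=10):
    """1 where the pixel is more than `offset` darker than its local mean."""
    blurred = box_mean(image, radius)
    return [[1 if m - p > offset else 0 for p, m in zip(image_row, mean_row)]
            for image_row, mean_row in zip(image, blurred)]


# Hypothetical patch: a left-to-right brightness gradient with one dark pixel.
patch = [
    [100, 120, 140],
    [100,  30, 140],
    [100, 120, 140],
]
print(local_adaptive_threshold(patch))
```

Because each pixel is compared with its own neighbourhood, the gradient in the background does not end up in the result, only the pixel that is clearly darker than its surroundings.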

OpenCV: Detect blinking lights in a video feed

I have a video feed. This video feed contains several lights blinking at different rates. All lights are the same color (they are all infrared LEDs). How can I detect the position and frequency of these blinking lights?
Disclaimer: I am extremely new to OpenCV. I do have a copy of Learning OpenCV, but I am finding it a bit overwhelming. If anyone could explain a solution in OpenCV terminology, it would be greatly appreciated. I am not expecting code to be written for me.
Threshold each image in the sequence with a threshold that makes the LEDs visible. If you can find a threshold that keeps only the LEDs and removes the background, then you are more or less finished, since all you need to do now is keep track of each position where an LED has been seen and count how often it occurs.
As an intermediate step, if there is "background noise" in the thresholded image, you could use erosion to remove small mistakes, and then maybe dilation to "close holes" in the blobs you are actually interested in.
If the scene is static you could also make a simple background model by taking the median of a few frames and removing the resulting median image from any frame and threshold that. Stuff that has changed (your LEDs) will appear stronger.
If the scene is moving I see no other (easy) solution than making sure the LED are bright enough to be able to use the threshold approach given above.
As for OpenCV: if you know what you want to do, it is not very hard to find a function that does it. The hard part is coming up with a method to solve the problem, not the actual coding.
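The median background model mentioned above can be illustrated without OpenCV. This sketch uses only the standard library and tiny made-up frames. It assumes a static scene and that each LED is off in more than half of the frames used for the model; otherwise the median would contain the LED itself.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold):
    """1 where the frame differs from the background by more than `threshold`."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


# Five hypothetical 1x3 frames; the middle pixel is an LED lit in two of them.
frames = [
    [[10, 12, 11]],
    [[11, 200, 10]],
    [[9, 13, 12]],
    [[10, 210, 11]],
    [[12, 11, 10]],
]
background = median_background(frames)
masks = [foreground_mask(f, background, threshold=50) for f in frames]
print(background, masks)
```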
If the LEDs are stationary, the problem is far simpler than when they are moving. Assuming they are stationary, a solution to find the frequency could simply be to keep a vector or an array for each pixel location in which you store the values of that pixel, preferably after the preprocessing described by kigurai, over some timeframe. You can then compute the 1D Fourier transform of those value vectors and find the fundamental frequency as the first significant component after the DC peak. If the DC peak is too low, it means there is no LED there.
Hope this problem is still somewhat relevant, and that my solution makes sense.
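As a rough illustration of the per-pixel Fourier idea, the sketch below runs a naive DFT over one pixel's brightness history and reports the strongest non-DC component. Picking the strongest bin rather than the first significant one is a simplification, and the function name, frame rate and signal are made up. A real implementation would more likely use an FFT routine from a numerical library.

```python
import cmath


def dominant_frequency(samples, fps):
    """Frequency in Hz of the strongest non-DC component of a pixel's time series."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]        # remove the DC component
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):                # positive frequencies only
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * fps / n                     # 0.0 if the pixel never changes


# Hypothetical pixel history: 2 seconds at 16 fps, LED on for 4 frames, off for 4.
samples = [255 if (t // 4) % 2 == 0 else 0 for t in range(32)]
print(dominant_frequency(samples, fps=16))
```

The frame rate limits what can be measured: blink rates above half the frame rate cannot be recovered this way.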
