I have the following rather simple problem and unfortunately I am not making any progress.
Imagine a simple 2D image with pixels and a unique value for each pixel of the image.
For example, let the image be 512x512 pixels in size and 10 nm x 10 nm in dimension. Since I want to view the image in frequency space, I calculate the Fourier transform of the image.
Of course, the image still has 512x512 pixels, but what about the units? I would say the physical units are now 1/m, but what happens to the 10 nm?
So what would be the dimensions of my image after the Fourier transform? I would be very grateful for any help.
The first frequency bin (after the DC bin) stands for "1 cycle" across the whole domain, so that means 1 / (10 nm) in your case, i.e. a wavelength of 10 nm. The pattern continues: the next bin stands for 2 cycles, or 2 / (10 nm), i.e. a wavelength of 5 nm.
I'm not sure it makes sense to translate the length units in this process.
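If it helps, here is a minimal NumPy sketch of that bookkeeping (assuming a 512-pixel axis spanning 10 nm); np.fft.fftfreq returns the per-bin spatial frequency in 1/m once you hand it the pixel pitch:

    import numpy as np

    n = 512                    # pixels along one axis
    pixel_size = 10e-9 / n     # 10 nm field of view -> pixel pitch in metres

    # Spatial-frequency axis for one dimension, in cycles per metre (1/m).
    # The first non-zero bin is 1 / (10 nm) = 1e8 1/m, the next one 2e8 1/m, etc.
    freqs = np.fft.fftfreq(n, d=pixel_size)

    print(freqs[1])        # 1e8   -> one cycle across the 10 nm field
    print(1.0 / freqs[1])  # 1e-8  -> the corresponding wavelength, 10 nm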
I need to implement dimension inspection of an object with a tolerance of 20 microns using image processing. To measure the dimension in mm, I need the mm-per-pixel value for pixel-to-mm conversion.
Camera and lens Specifications:
5 MP Matrix vision camera (2592 x 1944)
25 mm lens
How I tried to do it:
I used a 30 cm ruler to get the actual field of view in mm covered by the camera. I got a plot of the image using a Matplotlib function with OpenCV, as shown in the figure.
Image for scaling
From the image I got 31 mm as the actual width covered by the camera, and the camera resolution is 2592 x 1944. So I obtained mm/pixel = 31/2592 = 0.011959876.
But I want to know whether this is the correct way to find the mm/pixel value using a centimeter scale, especially when a tolerance of 20 microns is needed for dimension inspection. If it is not, a procedure for finding the mm/pixel value would be really helpful.
I believe what you are doing is really borderline. First of all, to be as precise as possible I would use the right (or left) edge of the leftmost and rightmost ruler ticks, like I sketched here:
and then use this distance in pixels to calculate the mm/pixel calibration value. Even with this method, 20 µm is really tough to achieve. Let's say we can determine the ruler tick edge position with a precision of 2 pixels (very optimistic); then you would have an error of about 31 mm / 2580 × 2, which is about 25 µm.
If you really need 20 µm calibration precision I would go for a microscope calibration target. I have always used one of those for this kind of calibration task.
20 microns over a field of view of 31 mm = 31000 µm corresponds to 1.7 pixel, so your measurement error must be smaller than that. This is a stringent requirement. Your ruler and manual operation are not appropriate.
In the first place, you should check the magnitude of the lens distortion, which could very well exceed these 1.7 pixels. You will need a precise calibration procedure that can fit a deformation model to the image. For this purpose you should use a certified calibration target such as a grid of dots or a chessboard pattern.
At the same time as the calibration software measures and compensates the distortion, it will provide the scale factor between physical units (knowing the grid spacing) and pixels. You can measure feature location on the target by blob analysis or gauging techniques, then use least-squares fitting of a model.
Software packages made for machine vision applications do contain such tools.
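To make that workflow concrete, here is a hedged OpenCV sketch under assumed conditions: the file pattern, the 9x6 board geometry and the 2.0 mm square pitch are placeholders, and in practice you would use a certified target and many views. It fits the camera model (intrinsics plus distortion) and then derives mm/pixel from the known square spacing:

    import glob
    import cv2
    import numpy as np

    # Hypothetical certified target: a 9x6 chessboard with 2.0 mm squares,
    # photographed several times across the field of view.
    pattern, square_mm = (9, 6), 2.0
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

    obj_pts, img_pts = [], []
    for path in glob.glob("calib_*.png"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if not found:
            continue
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1),
                                   (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

    # Least-squares fit of the camera model: intrinsics plus lens distortion.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, gray.shape[::-1], None, None)

    # Scale factor from one view: mean corner spacing (in undistorted pixels)
    # along a board row versus the known 2.0 mm pitch.
    und = cv2.undistortPoints(img_pts[-1], K, dist, P=K).reshape(-1, 2)
    px_per_square = np.mean(np.linalg.norm(np.diff(und[:pattern[0]], axis=0), axis=1))
    print("mm/pixel:", square_mm / px_per_square, " reprojection RMS (px):", rms)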
Also be aware that there can be a bias in the dimensional measurement of the object due to mis-location of the edges. Simply moving the light source can result in variations of the measured size.
If your objects are always the same and at the same place in the field of view, a cheap solution is to establish a repeatable measurement procedure in pixels, and physically measure one of the parts. This will give you a scale factor valid in the same conditions.
But simply moving the object will have a noticeable effect, both by changing the light reflection/shadows on edges and by having a different distortion.
Somehow, detecting ChArUco diamonds does not work with bigger images for me. With my original images of 1920x1080, it does not recognize the IDs reliably (the elements of the diamond ID switch places every time). In the first image, you can see it recognizes (7, 9, 45, 2).
Then I tried downsampling the images to 960x540 and halving the calibration parameters f and c, and it works! The ID is correctly recognized as (2, 7, 45, 9) and the pose estimation is accurate.
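For reference, a minimal sketch of that workaround (the file name, the intrinsics and the square/marker length ratio are placeholders, and the aruco calls are from the contrib module of the OpenCV version I was using):

    import cv2
    import numpy as np

    img = cv2.imread("frame_1920x1080.png", cv2.IMREAD_GRAYSCALE)

    # Placeholder intrinsics for the full-resolution image.
    K = np.array([[1400.0, 0.0, 960.0],
                  [0.0, 1400.0, 540.0],
                  [0.0, 0.0, 1.0]])
    dist = np.zeros(5)

    # Workaround: halve the image and the linear camera parameters (fx, fy, cx, cy).
    scale = 0.5
    small = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    K_small = K.copy()
    K_small[:2] *= scale

    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_1000)
    corners, ids, _ = cv2.aruco.detectMarkers(small, dictionary)

    # 1.5 is a placeholder for squareLength / markerLength of the printed diamond.
    diamond_corners, diamond_ids = cv2.aruco.detectCharucoDiamond(small, corners, ids, 1.5)
    rvecs, tvecs = cv2.aruco.estimatePoseSingleMarkers(diamond_corners, 0.04, K_small, dist)[:2]
    print(diamond_ids)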
How to make it work for bigger images? I tried changing the detection parameters that depend on absolute pixel units (not relative to the image size). Here is a list of my current parameters. I realized that increasing the window size for thresholding helps with recognizing the squares, but not with ID or pose estimation.
nmarkers: 1024
adaptiveThreshWinSizeMin: 13
adaptiveThreshWinSizeMax: 113
adaptiveThreshWinSizeStep: 10
adaptiveThreshWinSize: 42
adaptiveThreshConstant: 7
minMarkerPerimeterRate: 0.1
maxMarkerPerimeterRate: 4.0
polygonalApproxAccuracyRate: 0.05
minCornerDistance: 10.0
minDistanceToBorder: 10
minMarkerDistance: 10.0
minMarkerDistanceRate: 0.05
doCornerRefinement: false
cornerRefinementWinSize: 5
cornerRefinementMaxIterations: 30
cornerRefinementMinAccuracy: 0.1
markerBorderBits: 1
perspectiveRemovePixelPerCell: 8
perspectiveRemoveIgnoredMarginPerCell: 0.13
maxErroneousBitsInBorderRate: 0.04
minOtsuStdDev: 5.0
errorCorrectionRate: 0.6
Any hints?
thank you!
In the end I needed to patch the OpenCV aruco module. It was a matter of a certain threshold scaling too quickly (with the 4th power) with the image size (closestCandidateDistance in refineDetectedMarkers). The solution was to make minRepDistance in detectCharucoDiamond scale only linearly with the image size.
Full answer and patch are in the OpenCV forum.
I am implementing Lowe's method, "SIFT", for finding and describing features in an image.
I have found interest points, and now I have to describe them. Using Lowe's method, I have calculated the gradient magnitude and orientation in an area around the keypoint and created a Gaussian-weighted histogram with 36 bins, each corresponding to an orientation range of 10 degrees. There is one histogram per keypoint. Each bin is the sum of the weighted magnitudes in that direction. An example taken from aishack.in: http://www.aishack.in/static/img/tut/sift-orientation-histogram.jpg
Any bin within 80% of the size of the maximum bin is made into a new keypoint. After describing this, the paper says: "Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy". I am not sure I get this.
In my understanding, it means a parabola is fitted to the peak and the values to its left and right, like this (be warned! drawn freehand):
http://i.stack.imgur.com/7V8pb.jpg
and the orientation of the keypoint will be where the extremum of the parabola is. For instance: if the parabola fitted to 10-19, 20-29, and 30-39 (with 20-29 being the histogram peak) had its extremum in the 30-39 bin, then that would be the orientation of that keypoint. Am I understanding this correctly? In this way, the orientation of the keypoint can only take one of 36 values.
Another option: the same idea as above, only the orientation is no longer restricted to the discrete bins: the extremum of the parabola is thus a continuous value, and this value is assigned to the keypoint.
The idea of the parabola fitting is to find the peak with better than bin resolution. As you see in your example, the peak is at 20-29 (average 24.5) but the 10-19 bin is higher than the 30-39 bin. It's therefore likely that the precise peak should be below 24.5.
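A small sketch of that refinement, assuming 36 bins of 10° with circular wraparound and taking 24.5° as the centre of the 20-29 bin (the function name is mine):

    import numpy as np

    def refine_orientation(hist, i):
        """Fit a parabola to the peak bin i and its two neighbours and
        return the interpolated orientation in degrees."""
        left, centre, right = hist[(i - 1) % 36], hist[i], hist[(i + 1) % 36]
        # Vertex of the parabola through (-1, left), (0, centre), (1, right),
        # measured in bins relative to the peak bin.
        offset = 0.5 * (left - right) / (left - 2.0 * centre + right)
        return (i * 10.0 + 4.5 + offset * 10.0) % 360.0

    # Peak in bin 2 (20-29 degrees) with the left neighbour higher than the
    # right one, so the refined orientation lands below the 24.5 degree centre.
    hist = np.zeros(36)
    hist[1], hist[2], hist[3] = 20.0, 30.0, 10.0
    print(refine_orientation(hist, 2))   # about 22.8 degrees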
You can't have a non-discrete histogram; that defeats the point of a histogram. What you can have is overlapping bins: create a bin for 20-29, but also a bin for 21-30, 22-31, etc. So the value 24 would map to 10 bins, from 15-24 to 24-33.
And when you increment a bin, you don't necessarily need to increment it by 1. You can also increment a bin by a variable amount, e.g. the distance from the given value to the nearest edge of the bin. So 24 would add 1 to bin 16-25 but 4 to bin 20-29.
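Taken literally, that overlapping-bin scheme could look like this (hypothetical helper; bins of width 10 start at every integer degree, and each vote is weighted by the distance from the value to the nearest edge of that bin):

    import numpy as np

    def overlapping_histogram(values, nbins=360, width=10):
        """One bin starts at every integer degree; a value falls into `width`
        of them and contributes the distance to the nearest bin edge."""
        hist = np.zeros(nbins)
        for v in values:
            for start in range(v - width + 1, v + 1):
                lo, hi = start, start + width - 1        # e.g. the bin 20-29
                hist[start % nbins] += min(v - lo, hi - v)
        return hist

    hist = overlapping_histogram([24])
    print(hist[16], hist[20])   # 1.0 and 4.0, as in the example above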
I am looking for a "very" simple way to check if an image bitmap is blurred. I do not need an accurate and complicated algorithm involving FFTs, wavelets, etc. Just a very simple idea, even if it is not accurate.
I thought of computing the average Euclidean distance between pixel (x,y) and pixel (x+1,y), considering their RGB components, and then using a threshold, but it works very badly. Any other ideas?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
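A hedged NumPy version of that idea (grayscale input assumed; sample every 10th row and column and keep the largest adjacent-pixel difference):

    import numpy as np

    def sharpness_percent(gray, step=10):
        """Maximum adjacent-pixel difference along a sparse grid of rows and
        columns, as a percentage of the full 0-255 range."""
        g = gray.astype(np.int16)
        max_h = np.abs(np.diff(g[::step, :], axis=1)).max()   # along sampled rows
        max_v = np.abs(np.diff(g[:, ::step], axis=0)).max()   # along sampled columns
        return 100.0 * max(max_h, max_v) / 255.0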
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
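A sketch of the updated formula (the 9x9 window and the offset of 15 follow the description above; the function name is mine):

    import numpy as np

    def sharpness_v2(gray, offset=15, half=4):
        """Largest adjacent-pixel difference, rescaled by the local pixel-value
        range in a 9x9 window centred on that location."""
        g = gray.astype(np.int16)
        dh = np.abs(np.diff(g, axis=1))       # horizontal differences
        dv = np.abs(np.diff(g, axis=0))       # vertical differences
        if dh.max() >= dv.max():
            maxdiff, (y, x) = dh.max(), np.unravel_index(dh.argmax(), dh.shape)
        else:
            maxdiff, (y, x) = dv.max(), np.unravel_index(dv.argmax(), dv.shape)
        win = g[max(0, y - half):y + half + 1, max(0, x - half):x + half + 1]
        rng = int(win.max()) - int(win.min())
        return (maxdiff / (offset + rng)) * (1.0 + offset / 255.0) * 100.0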
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. Here are some examples (the locations of the maximum pixel difference and the 9×9 local search areas are also shown for reference):
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value:
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus:
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest notifying the user when an image appears blurry rather than rejecting it outright.
I suppose that, philosophically speaking, all natural images are blurry... How blurry, and to what extent that matters, depends on your application. Broadly speaking, the blurriness or sharpness of an image can be measured in various ways. As a first easy attempt I would check the energy of the image, defined as the normalised sum of the squared pixel values:
E = 1/N * sum( I(x,y)^2 ), where I is the image and N the number of pixels (defined for grayscale)
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with a Gaussian filter
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E_or = 1265 (original image),  E_bl = 88 (blurry image)
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
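The example above is in MATLAB; a Python sketch of the same idea (scipy's gaussian_laplace stands in for the LoG filter, and sigma is a placeholder) would be:

    import numpy as np
    from scipy.ndimage import gaussian_laplace

    def log_energy(gray, sigma=2.0):
        """Energy E = (1/N) * sum(I^2) of the LoG-filtered image."""
        log_img = gaussian_laplace(gray.astype(np.float64), sigma=sigma)
        return np.mean(log_img ** 2)

    # A sharp image gives a much larger value than a blurred copy of it;
    # pick the decision threshold between the two for your application.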
calculate the average L1-distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images; for color, do this for each channel separately.
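A straight transcription of those two sums for a grayscale image:

    import numpy as np

    def blurriness(gray):
        """Ratio N2 / N1^2 of the mean squared and mean absolute differences
        between horizontally and vertically adjacent pixels."""
        p = gray.astype(np.float64)
        dx = p[:, 1:] - p[:, :-1]
        dy = p[1:, :] - p[:-1, :]
        n = dx.size + dy.size                 # ~ 2 * N_pixel
        n1 = (np.abs(dx).sum() + np.abs(dy).sum()) / n
        n2 = ((dx ** 2).sum() + (dy ** 2).sum()) / n
        return n2 / (n1 * n1)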
Given an image (like the one below), I need to convert it into a binary image (black and white pixels only). This sounds easy enough, and I have tried two thresholding functions. The problem is that I can't get perfect edges with either of them. Any help would be greatly appreciated.
The filters I have tried are the Euclidean distance in the RGB and HSV spaces.
Sample image:
Here it is after running an RGB threshold filter (at 40%; it shows more artefacts beyond this).
Here it is after running an HSV threshold filter. (at 30% the paths become barely visible but clearly unusable because of the noise)
The code I am using is pretty straightforward: convert the input image to the appropriate color space and check the Euclidean distance from black.
sqrt(R*R + G*G + B*B)
since I am comparing with black (0, 0, 0)
Your problem appears to be the variation in lighting over the scanned image which suggests that a locally adaptive thresholding method would give you better results.
The Sauvola method calculates the value of a binarized pixel based on the mean and standard deviation of pixels in a window of the original image. This means that if an area of the image is generally darker (or lighter) the threshold will be adjusted for that area and (likely) give you fewer dark splotches or washed-out lines in the binarized image.
http://www.mediateam.oulu.fi/publications/pdf/24.p
I also found a method by Shafait et al. that implements the Sauvola method with greater time efficiency. The drawback is that you have to compute two integral images of the original, one at 8 bits per pixel and the other potentially at 64 bits per pixel, which might present a problem with memory constraints.
http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf
I haven't tried either of these methods, but they do look promising. I found Java implementations of both with a cursory Google search.
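If Python is also an option, scikit-image ships a Sauvola implementation, so a minimal sketch (the file name, window size and k are placeholders to tune) looks like:

    import cv2
    from skimage.filters import threshold_sauvola

    gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

    # Per-pixel Sauvola threshold from the local mean and standard deviation.
    t = threshold_sauvola(gray, window_size=25, k=0.2)
    binary = (gray > t).astype("uint8") * 255
    cv2.imwrite("binary.png", binary)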
Running an adaptive threshold over the V channel in the HSV color space should produce brilliant results. Best results would come with a window larger than 11x11; don't forget to choose a negative value for the threshold constant.
Adaptive thresholding basically is:
if (Pixel value + constant > Average pixel value in the window around the pixel )
Pixel_Binary = 1;
else
Pixel_Binary = 0;
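In OpenCV that boils down to something like the following; the block size and the constant C are placeholders to tune, with the negative C playing the role of the constant in the pseudocode above:

    import cv2

    img = cv2.imread("scan.png")                        # placeholder file name
    v = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 2]   # V channel of HSV

    # Mean-based adaptive threshold with a window larger than 11x11 and a
    # negative constant, as suggested above.
    binary = cv2.adaptiveThreshold(v, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 15, -5)
    cv2.imwrite("binary.png", binary)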
Due to the noise and the illumination variation, you may need adaptive local thresholding; thanks to Beaker for his answer too.
Therefore, I tried the following steps:
Convert it to grayscale.
Do mean or median local thresholding; I used 10 for the window size and 10 for the intercept constant and got this image (smaller values might also work):
Please refer to http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm if you need more information on this technique.
To make sure the thresholding was working fine, I skeletonized it to see if there were any line breaks. This skeleton may be the one needed for further processing.
To get rid of the remaining noise you can just find the longest connected component in the skeletonized image.
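A hedged scikit-image sketch of those last two steps (the input file name is a placeholder for the locally thresholded image):

    import cv2
    import numpy as np
    from skimage.measure import label
    from skimage.morphology import skeletonize

    # Placeholder input: the locally thresholded image, foreground as white.
    binary = cv2.imread("thresholded.png", cv2.IMREAD_GRAYSCALE) > 0

    skeleton = skeletonize(binary)

    # Keep only the largest connected component to drop the remaining noise.
    labels = label(skeleton, connectivity=2)
    counts = np.bincount(labels.ravel())
    counts[0] = 0                              # ignore the background label
    longest = labels == counts.argmax()
    cv2.imwrite("skeleton_clean.png", longest.astype(np.uint8) * 255)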
Thank you.
You probably want to do this as a three-step operation.
use leveling, not just thresholding: take the input and scale the intensities (gamma correction) with parameters that simply dull the midtones without removing the darks or the lights (your RGB threshold is too strong, for instance; you lost some of your lines).
edge-detect the resulting image using a small kernel convolution (5x5 for binary images should be more than enough). Use a simple [1 2 3 2 1 ; 2 3 4 3 2 ; 3 4 5 4 3 ; 2 3 4 3 2 ; 1 2 3 2 1] kernel (normalised)
threshold the resulting image. You should now have a much better binary image.
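A literal sketch of those three steps (the gamma exponent and the threshold value are placeholders; the 5x5 kernel is the one listed, normalised):

    import cv2
    import numpy as np

    img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

    # Step 1: leveling - a gamma-style curve that pushes the midtones down.
    leveled = np.power(img, 1.5)

    # Step 2: convolve with the 5x5 kernel given above, normalised to sum 1.
    k = np.array([[1, 2, 3, 2, 1],
                  [2, 3, 4, 3, 2],
                  [3, 4, 5, 4, 3],
                  [2, 3, 4, 3, 2],
                  [1, 2, 3, 2, 1]], np.float32)
    k /= k.sum()
    filtered = cv2.filter2D(leveled, -1, k)

    # Step 3: threshold the result to get the binary image.
    _, binary = cv2.threshold((filtered * 255).astype(np.uint8), 128, 255, cv2.THRESH_BINARY)
    cv2.imwrite("binary.png", binary)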
You could try a black top-hat transform. This involves subtracting the image from the closing of the image. I used a structuring element of size 11 and a constant threshold of 0.1 (25.5 on a 0-255 scale).
You should get something like:
Which you can then easily threshold:
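In OpenCV terms the whole thing is roughly (MORPH_BLACKHAT is exactly closing minus image; the 11x11 element and the 0.1 / 25.5 threshold follow the values above, and the file name is a placeholder):

    import cv2

    gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)   # closing(I) - I

    # Constant threshold of 0.1, i.e. about 25.5 on a 0-255 scale.
    _, binary = cv2.threshold(blackhat, 25, 255, cv2.THRESH_BINARY)
    cv2.imwrite("binary.png", binary)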
Best of luck.