SIFT parabola fitting of histogram - image-processing

I am implementing Lowe's method, "SIFT", for finding and describing features in an image.
I have found interest points, and now I have to describe them: Using Lowe's method, I have calculated the magnitude and gradient in an area around the keypoint, and created a Gauss weighted histogram, with 36 bins, each corresponding to an orientation of 10 degrees. For each keypoint, there is a histogram. Each bin is the sum of the weighted magnitude, in that direction. An example taken from aishack.in: http://www.aishack.in/static/img/tut/sift-orientation-histogram.jpg
Any bin within 80% of the height of the maximum bin is made into a new keypoint. After describing this, the paper says: "Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy". I am not sure I get this.
In my understanding, it means that a parabola is fit to the peak and to its left and right neighbours, like this (be warned, drawn freehand!):
http://i.stack.imgur.com/7V8pb.jpg
and the orientation of the keypoint will be where the extremum of the parabola is. For instance: if the parabola fitted to 10-19, 20-29, and 30-39 (with 20-29 being the histogram peak) had its extremum at a point that fell within 30-39, then this would be the orientation of that keypoint. Am I understanding this correctly? In this way, the orientation of the keypoint can only take one of 36 values.
Another option: same idea as above, only the histogram is no longer discrete: the extremum of the parabola will thus be a continuous value, and this value is assigned to the keypoint.

The idea of the parabola fitting is to find the peak with better than bin resolution. As you see in your example, the peak is at 20-29 (average 24.5) but the 10-19 bin is higher than the 30-39 bin. It's therefore likely that the precise peak should be below 24.5.
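A minimal sketch of that interpolation, assuming a 36-bin histogram in which bin i covers [10i, 10i+10) degrees and is centred at 10i+5. The function name `interpolate_peak` and the example values are illustrative and not taken from Lowe's paper.

```python
def interpolate_peak(hist, i, bin_width=10.0):
    """Refine the peak at bin i by fitting a parabola through bins i-1, i, i+1.

    Assumes bin i is centred at (i + 0.5) * bin_width and that the histogram
    wraps around (orientation is circular).
    """
    n = len(hist)
    left = hist[(i - 1) % n]
    center = hist[i]
    right = hist[(i + 1) % n]
    denom = left - 2.0 * center + right
    # Vertex of the parabola through (-1, left), (0, center), (1, right),
    # expressed as an offset in bin units; it lies in [-0.5, 0.5] at a true peak.
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return ((i + 0.5 + offset) * bin_width) % (n * bin_width)


hist = [0.0] * 36
hist[1], hist[2], hist[3] = 3.0, 5.0, 2.0   # peak in the 20-29 bin, left neighbour higher
angle = interpolate_peak(hist, 2)           # roughly 24 degrees, below the bin centre of 25
```

The result is a continuous angle, so the keypoint orientation is not restricted to 36 values even though the histogram itself stays discrete.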
You can't have a non-discrete histogram; that defeats the point of a histogram. What you can have is overlapping bins: create a bin for 20-29, but also a bin for 21-30, 22-31, etc. The value 24 would then map to 10 bins, from 15-24 to 24-33.
And when you increment a bin, you don't necessarily need to increment it by 1. You can also increment a bin by a variable amount, e.g. the distance from the given value to the nearest edge of the bin. So 24 would add 1 to bin 16-25 but 4 to bin 20-29.

Related

Required tolerance for camera calibration target

In reading about and experimenting with camera calibration I haven't seen any mention of the required tolerance for the placement of calibration targets. For example say I have a field of view of 200mm x 30mm and I want to be able to measure the position of objects in this field to within 1mm. I will calibrate my camera using a grid pattern and the OpenCV calibrateCamera flow. Say my calibration target is a printed chessboard grid with 5mm pitch. What is the tolerance on that 5mm spacing between corners on my target? Does a tighter tolerance result in more accurate pixel to real-world transformation? Does a tighter tolerance result in better distortion removal?
Note I'm measuring objects on a 2D plane, no depth measurement, and unfortunately I don't have the ability to move the calibration targets around and take multiple views of it. So I'm talking specifically about calibrating using a single view.
Calibration using a single view is a poor idea, generally speaking, because of the small number of independent samples it entails, so the manufacturing tolerance of the calibration grid may well be the least of your worries. But if you must...
The controlling factor here is the sensor's dot pitch. Given the nominal focal length of your lens, and that you want your calibration RMSE to be order of a few tenths of pixel, you can work out the angle spanned by, say, 1/10 of a pixel along the sensor's horizontal axis. Back projecting that at the nominal distance between the lens's exit pupil and the target will give you a length in 3D world that measures the uncertainty in a target's corner location at the calibration optimum. Your physical target points should be known at least as accurately, and normally better.
Example:
Setup: Dot pitch 5um, 16mm focal lens, 200mm working distance to target.
Backprojected 1/10 pixel: 200/16*0.5um =~ 6um.
Backprojected 1/2 pixel : 200/16*2.5um =~ 31um.
You can loosen that if you assume perfect Chi-square scaling of the errors with the square root of the number of data points. If you have, say, 100 corners, you can multiply that by 10, i.e. ~300um for 1/2 pixel.
Note that with tolerances of this kind, temperature control (for camera and target) may become a factor to take into account.
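The arithmetic in the example can be reproduced with a short sketch. The function name `target_tolerance_um` and its parameters are hypothetical, and the square-root loosening is only as valid as the idealized error-scaling assumption stated above.

```python
import math


def target_tolerance_um(dot_pitch_um, focal_mm, distance_mm,
                        pixel_fraction, n_points=1):
    """Back-project a fraction of a pixel onto the target plane.

    Returns the corresponding length on the target in micrometres,
    optionally loosened by sqrt(n_points) under ideal error averaging.
    """
    magnification = distance_mm / focal_mm          # 200 / 16 = 12.5 in the example
    single_point = magnification * dot_pitch_um * pixel_fraction
    return single_point * math.sqrt(n_points)


tenth_pixel = target_tolerance_um(5.0, 16.0, 200.0, 0.1)               # about 6 um
half_pixel = target_tolerance_um(5.0, 16.0, 200.0, 0.5)                # about 31 um
half_pixel_100 = target_tolerance_um(5.0, 16.0, 200.0, 0.5, n_points=100)  # about 300 um
```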

Is there a way to find mm per pixel value for a camera?

I need to implement dimension inspection of an object with a tolerance of 20 microns using image processing. To measure the dimension in mm, i need the mm per pixel value for pixel to mm conversion.
Camera and lens Specifications:
5 MP Matrix vision camera (2592 x 1944)
25 mm lens
How i tried to do it:
I used a 30 cm ruler to get the actual field of view in mm covered by the camera. I plotted the image with Matplotlib, as shown in the figure.
Image for scaling
From the image I got 31 mm as the actual width covered by the camera, and the camera resolution is 2592 x 1944. So I obtained mm/pixel = 31/2592 = 0.011959876.
But i want to know if it is the correct way to find the mm/pixel value using a centimeter scale specially when tolerance of 20 micron is needed in dimension inspection. If this is not the correct way, then a solution procedure for finding mm/pixel value would be really helpful.
I believe what you are doing is really borderline. First of all, to be as precise as possible I would use the right (or left) edge of the leftmost and rightmost ruler ticks, like I sketched here:
and then use this distance in pixels to calculate the mm/pixel calibration value. Even with this method, 20 µm is really tough to achieve. Let's say we can determine the ruler tick edge position with a precision of 2 pixels (very optimistic); then you would have an error of about 31 mm/2580 * 2, which is about 25 µm.
If you really need 20 µm calibration precision, I would go for a microscope calibration target. I have always used one of those for this kind of calibration task.
20 microns over a field of view of 31 mm = 31000 µm corresponds to 1.7 pixel, so your measurement error must be smaller than that. This is a stringent requirement. Your ruler and manual operation are not appropriate.
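As a quick check of these numbers, here is a small sketch using the field of view and sensor width from the question; the function names are illustrative.

```python
def mm_per_pixel(fov_mm, width_px):
    """Scale factor, assuming the field of view spans the full image width."""
    return fov_mm / width_px


def tolerance_in_pixels(tolerance_um, fov_mm, width_px):
    """Express a physical tolerance as a number of pixels."""
    return (tolerance_um / 1000.0) / mm_per_pixel(fov_mm, width_px)


scale = mm_per_pixel(31.0, 2592)                  # about 0.01196 mm per pixel
tol_px = tolerance_in_pixels(20.0, 31.0, 2592)    # about 1.7 pixels
```

This ignores lens distortion, so the single scale factor is only a first approximation.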
In the first place, you should check the magnitude of the lens distortion, which could very well exceed these 1.7 pixels. You will need a precise calibration procedure that can fit a deformation model to the image. For this purpose you should use a certified calibration target such as grid of dots or a chessboard pattern.
At the same time as the calibration software measures and compensates the distortion, it will provide the scale factor between physical units (knowing the grid spacing) and pixels. You can measure feature location on the target by blob analysis or gauging techniques, then use least-squares fitting of a model.
Software packages made for machine vision applications do contain such tools.
Also be aware that there can be a bias in the dimensional measurement of the object due to mis-location of the edges. Simply moving the light source can result in variations of the measured size.
If your objects are always the same and at the same place in the field of view, a cheap solution is to establish a repeatable measurement procedure in pixels, and physically measure one of the parts. This will give you a scale factor valid in the same conditions.
But simply moving the object will have a noticeable effect, both by changing the light reflection/shadows on edges and by having a different distortion.

Meaning of Histogram on Tensorboard

I am working on Google Tensorboard, and I'm feeling confused about the meaning of Histogram Plot. I read the tutorial, but it seems unclear to me. I really appreciate if anyone could help me figure out the meaning of each axis for Tensorboard Histogram Plot.
Sample histogram from TensorBoard
I came across this question earlier, while also seeking information on how to interpret the histogram plots in TensorBoard. For me, the answer came from experiments of plotting known distributions.
So, the conventional normal distribution with mean = 0 and sigma = 1 can be produced in TensorFlow with the following code:
import tensorflow as tf  # TensorFlow 1.x API

cwd = "test_logs"  # log directory that TensorBoard will read
# Two weight tensors drawn from normal distributions with different spreads
W1 = tf.Variable(tf.random_normal([200, 10], stddev=1.0))
W2 = tf.Variable(tf.random_normal([200, 10], stddev=0.13))
w1_hist = tf.summary.histogram("weights-stdev_1.0", W1)
w2_hist = tf.summary.histogram("weights-stdev_0.13", W2)
summary_op = tf.summary.merge_all()
init = tf.global_variables_initializer()
sess = tf.Session()
writer = tf.summary.FileWriter(cwd, sess.graph)
sess.run(init)
# Write the summaries for two steps so the plot has a horizontal extent
for i in range(2):
    writer.add_summary(sess.run(summary_op), i)
writer.flush()
writer.close()
sess.close()
Here is what the result looks like:
The horizontal axis represents time steps.
The plot is a contour plot and has contour lines at the vertical axis values of -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, and 1.5.
Since the plot represents a normal distribution with mean = 0 and sigma = 1 (and remember that sigma means standard deviation), the contour line at 0 represents the mean value of the samples.
The area between the contour lines at -0.5 and +0.5 represents the area under a normal distribution curve captured within +/- 0.5 standard deviations from the mean, suggesting that it is 38.3% of the sampling.
The area between the contour lines at -1.0 and +1.0 represents the area under a normal distribution curve captured within +/- 1.0 standard deviations from the mean, suggesting that it is 68.3% of the sampling.
The area between the contour lines at -1.5 and +1.5 represents the area under a normal distribution curve captured within +/- 1.5 standard deviations from the mean, suggesting that it is 86.6% of the sampling.
The palest region extends a little beyond +/- 4.0 standard deviations from the mean, and only about 60 per 1,000,000 samples will be outside of this range.
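These coverage figures follow from the normal distribution and can be checked with the standard library alone. The helper name `fraction_within` is illustrative.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


within_half = fraction_within(0.5)       # about 0.383
within_one = fraction_within(1.0)        # about 0.683
within_one_half = fraction_within(1.5)   # about 0.866
# Expected number of samples per million falling outside +/- 4 sigma
outside_four_per_million = (1.0 - fraction_within(4.0)) * 1e6   # roughly 60
```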
While Wikipedia has a very thorough explanation, you can get the most relevant nuggets here.
Actual histogram plots will show several things. The plot regions will grow and shrink in vertical width as the variation of the monitored values increases or decreases. The plots may also shift up or down as the mean of the monitored values increases or decreases.
(You may have noted that the code actually produces a second histogram with a standard deviation of 0.13. I did this to clear up any confusion between the plot contour lines and the vertical axis tick marks.)
@marc_alain, you're a star for making such a simple script for TB; those are hard to find.
To add to what he said, the histograms show 1, 2, and 3 sigma of the distribution of weights, which for a normal distribution cover roughly 68%, 95%, and 99.7% of the values. So if your model has 784 weights, the histogram shows how the values of those weights change with training.
These histograms are probably not that interesting for shallow models, but you could imagine that with deep networks, weights in high layers might take a while to grow because of the logistic function being saturated. Of course I'm just mindlessly parroting this paper by Glorot and Bengio, in which they study the weight distribution through training and show how the logistic function is saturated for the higher layers for quite a while.
When plotting histograms, we put the bin limits on the x-axis and the count on the y-axis. However, the whole point of the histogram is to show how a tensor changes over time. Hence, as you may have already guessed, the depth axis (z-axis), containing the numbers 100 and 300, shows the epoch numbers.
The default histogram mode is Offset mode. Here the histogram for each epoch is offset along the z-axis by a certain value (to fit all epochs in the graph). This is like seeing all the histograms placed one after the other, viewed from the ceiling of the room (from the midpoint of the front ceiling edge, to be precise).
In the Overlay mode, the z-axis is collapsed, and the histograms become transparent, so you can move and hover over to highlight the one corresponding to a particular epoch. This is more like the front view of the Offset mode, with only outlines of histograms.
As explained in the documentation here:
tf.summary.histogram
takes an arbitrarily sized and shaped Tensor, and compresses it into a
histogram data structure consisting of many bins with widths and
counts. For example, let's say we want to organize the numbers [0.5,
1.1, 1.3, 2.2, 2.9, 2.99] into bins. We could make three bins:
a bin containing everything from 0 to 1 (it would contain one element, 0.5),
a bin containing everything from 1-2 (it would contain two elements, 1.1 and 1.3),
a bin containing everything from 2-3 (it would contain three elements: 2.2, 2.9 and 2.99).
TensorFlow uses a similar approach to create bins, but unlike in our
example, it doesn't create integer bins. For large, sparse datasets,
that might result in many thousands of bins. Instead, the bins are
exponentially distributed, with many bins close to 0 and comparatively
few bins for very large numbers. However, visualizing
exponentially-distributed bins is tricky; if height is used to encode
count, then wider bins take more space, even if they have the same
number of elements. Conversely, encoding count in the area makes
height comparisons impossible. Instead, the histograms resample the
data into uniform bins. This can lead to unfortunate artifacts in
some cases.
Please read the documentation further to get the full knowledge of plots displayed in the histogram tab.
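The three-bin example from the quoted documentation can be written out as a short sketch. This illustrates only the simple fixed-edge binning, not TensorFlow's exponentially distributed bins; the function name `bin_counts` is illustrative.

```python
def bin_counts(values, edges):
    """Count how many values fall in each half-open bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(edges) - 1):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts


counts = bin_counts([0.5, 1.1, 1.3, 2.2, 2.9, 2.99], [0, 1, 2, 3])   # [1, 2, 3]
```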
Roufan,
The histogram plot allows you to plot variables from your graph.
w1 = tf.Variable(tf.zeros([1]),name="a",trainable=True)
tf.summary.histogram("firstLayerWeight", w1)
For the example above the vertical axis would have the units of my w1 variable. The horizontal axis would have units of the step which I think is captured here:
summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)
It may be useful to see this on how to make summaries for the tensorboard.
Don
Each line on the chart represents a percentile in the distribution over the data: for example, the bottom line shows how the minimum value has changed over time, and the line in the middle shows how the median has changed. Reading from top to bottom, the lines have the following meaning: [maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, minimum]
These percentiles can also be viewed as standard deviation boundaries on a normal distribution: [maximum, μ+1.5σ, μ+σ, μ+0.5σ, μ, μ-0.5σ, μ-σ, μ-1.5σ, minimum] so that the colored regions, read from inside to outside, have widths [σ, 2σ, 3σ] respectively.
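The correspondence between the two lists can be checked with the standard normal CDF. This sketch assumes Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Offsets from the mean, in standard deviations, for the seven inner lines
# (top to bottom); the outermost two lines are the maximum and minimum.
sigma_offsets = [1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5]

standard_normal = NormalDist(mu=0.0, sigma=1.0)
percentiles = [round(100 * standard_normal.cdf(k)) for k in sigma_offsets]
# percentiles == [93, 84, 69, 50, 31, 16, 7]
```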

soft binning in SIFT

According to "Lowe, David G. "Distinctive image features from scale-invariant keypoints." International journal of
computer vision 60.2 (2004): 91-110 "
"It is important to avoid all boundary affects in which the descriptor
abruptly changes as a sample shifts smoothly from being within one
histogram to another or from one orientation to another. Therefore,
trilinear interpolation is used to distribute the value of each
gradient sample into adjacent histogram bins. In other words, each
entry into a bin is multiplied by a weight of 1−d for each dimension,
where d is the distance of the sample from the central value of the
bin as measured in units of the histogram bin spacing."
I am calculating the orientation t and the location (x, y) of each gradient sample, all of which are floating-point values. Currently, I just add the gradient magnitude to the 3D histogram entry values[t][x][y], where the indices are the floor of the floating-point values of t, x and y. But according to the paper, I have to distribute the gradient magnitude into adjacent bins, and I am not sure how to distribute it.
I got my answer on following link:
HOG Trilinear Interpolation of Histogram Bins
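For reference, a minimal sketch of the "weight of 1−d for each dimension" rule from the quoted passage. It assumes coordinates are already expressed in bin units with bin centres at integer values, that orientation wraps around, and that the histogram is stored in a dictionary; the function name `trilinear_vote` and the storage layout are hypothetical, and spatial bins outside the descriptor grid are not filtered out here.

```python
import math
from collections import defaultdict


def trilinear_vote(hist, x, y, t, magnitude, n_orient=8):
    """Spread one gradient sample over the 8 surrounding (x, y, t) bins.

    Each bin receives magnitude * (1 - dx) * (1 - dy) * (1 - dt), where
    dx, dy, dt are the distances to that bin's centre in bin units.
    """
    x0, y0, t0 = math.floor(x), math.floor(y), math.floor(t)
    dx, dy, dt = x - x0, y - y0, t - t0
    for ix, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
        for iy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
            for it, wt in ((t0, 1.0 - dt), (t0 + 1, dt)):
                # Orientation is circular, so the last bin neighbours the first.
                hist[(ix, iy, it % n_orient)] += magnitude * wx * wy * wt


hist = defaultdict(float)
trilinear_vote(hist, x=1.25, y=2.0, t=7.5, magnitude=2.0)
total = sum(hist.values())   # the weights sum to 1, so the total equals the magnitude
```

A real descriptor would additionally discard contributions whose spatial bin index falls outside the grid (for example outside the 4x4 array used by Lowe).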

Computing HOG features

I have one problem in the second step which is to accumulate weighted votes for gradient orientation over spatial cells.
Assume the cell is 8*8. Let me use two matrices, GO[8][8] (with values in [1, 9]) and GM[8][8], to represent the gradient orientation and gradient magnitude respectively.
The gradient orientation ranges from 0 - 180 and there are 9 orientation bins.
According to my understanding of HOG, for every pixel in a cell, we add its gradient magnitude to its corresponding orientation bin. In this way, we get the histogram for every cell.
But there is one sentence that's confusing me.
"To reduce aliasing, votes(gradient magnitude) are interpolated
trilinearly between the neighbouring bin centers in both orientation
and position."1
Why interpolate? How is the interpolation done? Can someone explain in more detail? No reducing aliasing.
Thanks in advance.
1 This sentence is in Navneet Dalal's PHD thesis, p38, line 4.
Interpolation is a standard technique for computing histograms. The idea here is that each value is not simply placed into one bin, but is distributed between two neighboring bins (assuming a 1d histogram), based on how far away it is from the center of the original bin.
The purpose of this is to deal with situations when a small error in your measurement can cause a value to be placed into a different bin. This is a very good thing to do for any type of histogram, not just for HOGs, assuming you have the CPU cycles.
There is also bi-linear and tri-linear interpolation for 2d and 3d histograms, where each value is distributed between 4 and 8 neighboring bins respectively.
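A minimal sketch of the 1D case for the orientation histogram, assuming 9 bins over 0-180 degrees with bin centres at 10, 30, ..., 170 and wrap-around between the last and first bin. The function name `vote_orientation` is illustrative, and this is not claimed to match Dalal's implementation exactly.

```python
import math


def vote_orientation(hist, angle_deg, magnitude, span=180.0):
    """Split one vote linearly between the two nearest orientation bin centres."""
    n_bins = len(hist)
    bin_width = span / n_bins
    # Position in bin units, shifted so that bin centres sit at integer values.
    pos = angle_deg / bin_width - 0.5
    lower = math.floor(pos)
    frac = pos - lower
    hist[lower % n_bins] += magnitude * (1.0 - frac)
    hist[(lower + 1) % n_bins] += magnitude * frac


hist = [0.0] * 9
vote_orientation(hist, 20.0, 1.0)   # on the boundary: half to bin 0, half to bin 1
vote_orientation(hist, 10.0, 1.0)   # exactly at the centre of bin 0: all to bin 0
vote_orientation(hist, 5.0, 1.0)    # below the first centre: shared with the last bin
```

With this scheme a small change in angle near a bin boundary moves the vote gradually between bins instead of flipping it from one bin to the other.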
