What does "same quarter" mean for a histogram?

In the paper "Adaptive document image binarization" (link: http://www.mediateam.oulu.fi/publications/pdf/24.p) I found the SDM and TBM algorithms for text/image segmentation,
but I can't understand what "same quarter" means in the following paragraph:
If the average is high and a global histogram peak is in the same quarter of the histogram and transient difference is transient, then use SDM.
If the average is medium and a global histogram peak is not in the same quarter of the histogram and transient difference is uniform, then use TBM.
I know that a quarter means 1/4, but I think "quarter" has a different meaning here. Is that right?

After skimming the paper very quickly, I found two possible ways to interpret this.
1. From the current bin, choose a quarter of the histogram by looking 1/8th to the left and 1/8th to the right. For example, if your histogram has 256 bins and you are at bin 50, the quarter you are looking for is [18, 81]. So if the average is high and the peak lies in [18, 81], use SDM.
2. Divide the entire histogram into quarters, and check which quarter your current bin lies in. For example, if your histogram has 256 bins, divide it into [0, 63], [64, 127], [128, 191], [192, 255]. If your current bin is 50, you are in quarter 1, and so if the average is medium and the peak lies anywhere outside quarter 1, use TBM.
Based on intuition and mathematical sense, option 1 is more likely. But I would try both and see which implementation gives better results.
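The two interpretations can be compared side by side with a small sketch. The function names are hypothetical, a 256-bin histogram is assumed, and the window for option 1 is chosen so that bin 50 gives [18, 81] as in the example above; none of this is taken from the paper itself.

```python
def in_same_quarter_window(current_bin, peak_bin, n_bins=256):
    """Option 1: a quarter-wide window centred on the current bin."""
    half = n_bins // 8                       # 1/8 of the histogram on each side
    low = max(0, current_bin - half)
    high = min(n_bins - 1, current_bin + half - 1)
    return low <= peak_bin <= high


def in_same_quarter_fixed(current_bin, peak_bin, n_bins=256):
    """Option 2: the histogram is split into four fixed quarters."""
    quarter = n_bins // 4
    return current_bin // quarter == peak_bin // quarter


# Bin 50 with a peak at bin 70: inside the window, but in a different fixed quarter.
print(in_same_quarter_window(50, 70))  # True
print(in_same_quarter_fixed(50, 70))   # False
```

Running both variants on the same images is a cheap way to see which reading reproduces the paper's results more closely.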

Related

GLCM Texture analysis in Sentinel-1 SNAP toolbox outputs texture with min and max pixel values not between 0 and 1

I have implemented GLCM Texture analysis on the Sentinel-1 SAR imagery. The imagery is high resolution. The parameters for the GLCM texture analysis are:
Window size: 5x5
Quantizer: Probabilistic Quantizer
Quantization: 64 bit
Angle: 0 degree
Displacement: 1
The output is 10 different texture images. However the range of pixel values is not between 0 and 1. The range for every texture is between different min and max values. I believe this should be between 0 and 1 as it is a probabilistic analysis with GLCM that is being calculated for every pixel.
Am I missing a step?
I guess you are getting 10 different images because for each image pixel you are performing the following operations:
Define a neighbourhood of 5×5 centered at the considered pixel.
Compute the GLCM corresponding to displacement=1 and angle=0 of that neighbourhood.
Extract 10 features from the local GLCM.
This results in a stack of 10 images, one image for each feature extracted from the local GLCMs.
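The problem is that Haralick features are not normalized to 1. Consider for example the standard definition of entropy:
$$H = -\sum_{i=1}^{N}\sum_{j=1}^{N} p(i,j)\,\log_2 p(i,j)$$
If you wish to obtain an entropy value in the range [0, 1], you should divide the equation above by the maximum entropy (measured in bits), like this:
$$H_{norm} = \frac{H}{2\log_2 N}$$
where $N$ is the number of different grey levels and $p(i,j)$ is the normalized GLCM entry for the grey-level pair $(i, j)$.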
This paper explains how to normalize contrast, correlation, energy, entropy and homogeneity features extracted from GLCM so that they have range [0, 1].
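As a rough illustration of the steps above, here is a pure-Python sketch that builds the GLCM of one neighbourhood and normalizes its entropy. It assumes a non-symmetric GLCM, displacement 1 at angle 0 and a patch already quantized to integer grey levels; the helper names are made up for this example and are not the SNAP toolbox API.

```python
import math


def glcm(patch, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of a patch for displacement (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(patch), len(patch[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[patch[r][c]][patch[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def entropy_bits(p):
    """Entropy of the GLCM in bits; empty cells contribute nothing."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


def normalized_entropy(p):
    """Entropy divided by its maximum, log2(N^2) = 2 * log2(N)."""
    return entropy_bits(p) / (2 * math.log2(len(p)))


# Hypothetical 5x5 neighbourhood quantized to 4 grey levels.
patch = [
    [0, 1, 2, 3, 0],
    [1, 2, 3, 0, 1],
    [2, 3, 0, 1, 2],
    [3, 0, 1, 2, 3],
    [0, 1, 2, 3, 0],
]
p = glcm(patch, levels=4)
print(entropy_bits(p), normalized_entropy(p))  # 2.0 0.5
```

The raw entropy here is 2 bits, which is outside [0, 1], while the normalized value is 0.5.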

Meaning of Histogram on Tensorboard

I am working with Google TensorBoard, and I'm confused about the meaning of the histogram plot. I read the tutorial, but it is still unclear to me. I would really appreciate it if anyone could help me figure out the meaning of each axis of the TensorBoard histogram plot.
Sample histogram from TensorBoard
I came across this question earlier, while also seeking information on how to interpret the histogram plots in TensorBoard. For me, the answer came from experiments of plotting known distributions.
So, the conventional normal distribution with mean = 0 and sigma = 1 can be produced in TensorFlow with the following code:
# TensorFlow 1.x API (graph mode with an explicit session).
import tensorflow as tf

cwd = "test_logs"  # log directory that TensorBoard will read
W1 = tf.Variable(tf.random_normal([200, 10], stddev=1.0))
W2 = tf.Variable(tf.random_normal([200, 10], stddev=0.13))
w1_hist = tf.summary.histogram("weights-stdev_1.0", W1)
w2_hist = tf.summary.histogram("weights-stdev_0.13", W2)
summary_op = tf.summary.merge_all()
init = tf.global_variables_initializer()
sess = tf.Session()
writer = tf.summary.FileWriter(cwd, sess.graph)
sess.run(init)
# Record both histograms at steps 0 and 1.
for i in range(2):
    writer.add_summary(sess.run(summary_op), i)
writer.flush()
writer.close()
sess.close()
Here is what the result looks like:
The horizontal axis represents time steps.
The plot is a contour plot and has contour lines at the vertical axis values of -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, and 1.5.
Since the plot represents a normal distribution with mean = 0 and sigma = 1 (and remember that sigma means standard deviation), the contour line at 0 represents the mean value of the samples.
The area between the contour lines at -0.5 and +0.5 represents the area under a normal distribution curve captured within +/- 0.5 standard deviations from the mean, suggesting that it is 38.3% of the sampling.
The area between the contour lines at -1.0 and +1.0 represents the area under a normal distribution curve captured within +/- 1.0 standard deviations from the mean, suggesting that it is 68.3% of the sampling.
The area between the contour lines at -1.5 and +1.5 represents the area under a normal distribution curve captured within +/- 1.5 standard deviations from the mean, suggesting that it is 86.6% of the sampling.
The palest region extends a little beyond +/- 4.0 standard deviations from the mean, and only about 60 per 1,000,000 samples will be outside of this range.
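The percentages quoted above can be checked with the error function from Python's standard library. This is only a sanity check of the normal-distribution arithmetic, not anything TensorBoard computes; the function name is made up for the example.

```python
import math


def coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (0.5, 1.0, 1.5):
    print(f"+/- {k} sigma: {100 * coverage(k):.1f}%")

# Expected number of samples per million falling outside +/- 4 sigma.
outside_4_sigma = (1 - coverage(4.0)) * 1_000_000
print(f"outside +/- 4 sigma: about {outside_4_sigma:.0f} per million")
```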
While Wikipedia has a very thorough explanation, you can get the most relevant nuggets here.
Actual histogram plots will show several things. The plot regions will grow and shrink in vertical width as the variation of the monitored values increases or decreases. The plots may also shift up or down as the mean of the monitored values increases or decreases.
(You may have noted that the code actually produces a second histogram with a standard deviation of 0.13. I did this to clear up any confusion between the plot contour lines and the vertical axis tick marks.)
@marc_alain, you're a star for making such a simple script for TB; those are hard to find.
To add to what he said: the histogram bands are 1, 2, and 3 sigma wide, that is, they span ±0.5σ, ±1σ, and ±1.5σ around the mean, which for a normal distribution covers roughly 38%, 68%, and 87% of the weights. So if your model has 784 weights, the histogram shows how the values of those weights change with training.
These histograms are probably not that interesting for shallow models, but you could imagine that with deep networks, weights in the higher layers might take a while to grow because the logistic function is saturated. Of course, I'm just mindlessly parroting this paper by Glorot and Bengio, in which they study the weight distributions through training and show how the logistic function stays saturated in the higher layers for quite a while.
When plotting histograms, we put the bin limits on the x-axis and the count on the y-axis. However, the whole point of the histogram tab is to show how a tensor changes over time. Hence, as you may have already guessed, the depth axis (z-axis), containing the numbers 100 and 300, shows the epoch numbers.
The default histogram mode is Offset mode. Here the histogram for each epoch is offset along the z-axis by a certain value (to fit all epochs in the graph). This is like seeing all the histograms placed one after the other, viewed from one corner of the ceiling of the room (from the midpoint of the front ceiling edge, to be precise).
In the Overlay mode, the z-axis is collapsed, and the histograms become transparent, so you can move and hover over to highlight the one corresponding to a particular epoch. This is more like the front view of the Offset mode, with only outlines of histograms.
As explained in the documentation here:
tf.summary.histogram
takes an arbitrarily sized and shaped Tensor, and compresses it into a
histogram data structure consisting of many bins with widths and
counts. For example, let's say we want to organize the numbers [0.5,
1.1, 1.3, 2.2, 2.9, 2.99] into bins. We could make three bins:
a bin containing everything from 0 to 1 (it would contain one element, 0.5),
a bin containing everything from 1-2 (it would contain two elements, 1.1 and 1.3),
a bin containing everything from 2-3 (it would contain three elements: 2.2, 2.9 and 2.99).
TensorFlow uses a similar approach to create bins, but unlike in our
example, it doesn't create integer bins. For large, sparse datasets,
that might result in many thousands of bins. Instead, the bins are
exponentially distributed, with many bins close to 0 and comparatively
few bins for very large numbers. However, visualizing
exponentially-distributed bins is tricky; if height is used to encode
count, then wider bins take more space, even if they have the same
number of elements. Conversely, encoding count in the area makes
height comparisons impossible. Instead, the histograms resample the
data into uniform bins. This can lead to unfortunate artifacts in
some cases.
Please read the documentation further to get the full knowledge of plots displayed in the histogram tab.
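The three-bin example from the quoted documentation can be reproduced in a few lines. This sketch uses integer bin edges purely to mirror that example; it does not reproduce TensorFlow's exponentially distributed bins, and it assumes every value is below the last edge.

```python
from bisect import bisect_right

values = [0.5, 1.1, 1.3, 2.2, 2.9, 2.99]
edges = [0, 1, 2, 3]  # three bins: [0, 1), [1, 2), [2, 3)

counts = [0] * (len(edges) - 1)
for v in values:
    # bisect_right finds the first edge greater than v; the bin is one before it.
    counts[bisect_right(edges, v) - 1] += 1

print(counts)  # [1, 2, 3]
```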
Roufan,
The histogram plot allows you to plot variables from your graph.
w1 = tf.Variable(tf.zeros([1]), name="a", trainable=True)
tf.summary.histogram("firstLayerWeight", w1)  # named tf.histogram_summary before TensorFlow 1.0
For the example above the vertical axis would have the units of my w1 variable. The horizontal axis would have units of the step which I think is captured here:
summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)
It may be useful to see this on how to make summaries for the tensorboard.
Don
Each line on the chart represents a percentile in the distribution over the data: for example, the bottom line shows how the minimum value has changed over time, and the line in the middle shows how the median has changed. Reading from top to bottom, the lines have the following meaning: [maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, minimum]
These percentiles can also be viewed as standard deviation boundaries on a normal distribution: [maximum, μ+1.5σ, μ+σ, μ+0.5σ, μ, μ-0.5σ, μ-σ, μ-1.5σ, minimum] so that the colored regions, read from inside to outside, have widths [σ, 2σ, 3σ] respectively.
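The correspondence between the percentile list and the standard-deviation boundaries can be verified with `statistics.NormalDist` (Python 3.8+). The helper name is hypothetical; this checks only the arithmetic for an ideal normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def percentile_at(k):
    """Percentile of the value mu + k * sigma for a normal distribution."""
    return 100 * standard_normal.cdf(k)


for k in (1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5):
    print(f"mu {k:+.1f} sigma -> {percentile_at(k):.0f}%")
```

The rounded values come out as 93, 84, 69, 50, 31, 16 and 7, matching the lines between maximum and minimum.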

SIFT parabola fitting of histogram

I am implementing Lowe's method, "SIFT", for finding and describing features in an image.
I have found interest points, and now I have to describe them. Using Lowe's method, I have calculated the gradient magnitude and orientation in an area around the keypoint, and created a Gaussian-weighted histogram with 36 bins, each covering 10 degrees of orientation. There is one histogram per keypoint, and each bin is the sum of the weighted magnitudes in that direction. An example taken from aishack.in: http://www.aishack.in/static/img/tut/sift-orientation-histogram.jpg
Any bin within 80% of the height of the maximum bin creates a new keypoint. After describing this, the paper says: "Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy". I am not sure I get this.
In my understanding, it means the peak, the left, and the right value of that peak, will have a parabola fit, like this(be warned! Drawn free hand)
http://i.stack.imgur.com/7V8pb.jpg
and the orientation of the keypoint will be where the extremum of the parabola is. For instance, if the parabola fitted to 10-19, 20-29, and 30-39 (with 20-29 being the histogram peak) had its extremum at a point that fell in the 30-39 bin, then this would be the orientation of that keypoint. Am I understanding this correctly? In this way, the orientation of the keypoint can only take one of 36 values.
Another option: same idea as above, only the histogram is no longer discrete: the extremum of the parabola will thus be a continuous value, and this value is assigned to the keypoint.
The idea of the parabola fitting is to find the peak with better than bin resolution. As you see in your example, the peak is at 20-29 (average 24.5) but the 10-19 bin is higher than the 30-39 bin. It's therefore likely that the precise peak should be below 24.5.
You can't have a non-discrete histogram; that defeats the point of a histogram. What you can have is overlapping bins: create a bin for 20-29, but also a bin for 21-30, 22-31, and so on. So the value 24 would map to 10 bins, from 15-24 to 24-33.
And when you increment a bin, you don't necessarily need to increment it by 1. You can also increment a bin by a variable amount, e.g. the distance from the given value to the edge of the bin. So 24 would add 1 to bin 16-25 but 4 to bin 20-29.
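A minimal sketch of the parabola fit, assuming 36 bins of 10 degrees where bin i covers [10i, 10i + 10) and has its centre at 10i + 5. The function names are illustrative and not taken from Lowe's paper or any library. The offset formula is the vertex of the parabola through the points (-1, left), (0, centre), (1, right).

```python
def parabolic_peak_offset(left, centre, right):
    """Vertex position, in bins relative to the peak bin, of the fitted parabola."""
    denom = left - 2.0 * centre + right
    if denom == 0:
        return 0.0  # the three values are collinear, so there is nothing to refine
    return 0.5 * (left - right) / denom


def refined_orientation(hist, peak_index, bin_width=10.0):
    """Continuous orientation in degrees; the histogram wraps around at 360."""
    n = len(hist)
    left = hist[(peak_index - 1) % n]
    right = hist[(peak_index + 1) % n]
    offset = parabolic_peak_offset(left, hist[peak_index], right)
    return ((peak_index + 0.5 + offset) * bin_width) % 360.0


# Made-up histogram: peak in bin 2 (20-30 degrees), left neighbour higher than right.
hist = [0.0] * 36
hist[1], hist[2], hist[3] = 8.0, 10.0, 6.0
print(refined_orientation(hist, 2))  # a little below the bin centre of 25 degrees
```

Because the left neighbour is taller, the refined peak moves toward it, which is the sub-bin correction described in the answer.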

Normalization or Standardization? When to use what?

I've calculated the cosine similarity between two vectors. For instance, each vector can have x elements, V = {v[0], v[1], ...}, such as {age, height, ...}
Currently, I do not normalize on each element. In other words, elements that have higher absolute values tend to matter more in the similarity computation. e.g. if you have a person who is 180 cm tall and is only 10 years old, height is going to affect the similarity more than age.
I'm considering three variation of feature scaling, borrowed from wiki (http://en.wikipedia.org/wiki/Feature_scaling):
Rescaling (subtract the min and divide by the range)
Standardization (subtracting the mean and dividing by standard deviation)
Using Percentiles (get the distribution of all values for a specific element and compute the percentiles the absolute value falls in)
It would be helpful if someone could explain the benefits of each and how I would go about determining which method of normalization is the right one to use. Having done all three, the sample results I get are, for instance:
none: 1.0
standardized: 0.963
scaled: 0.981
quantile: 0.878
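For reference, the three scaling variants listed above could be sketched as follows for a single feature. The data are invented, the population standard deviation is an arbitrary choice, and the percentile rank is defined here as the fraction of values less than or equal to the given one; other definitions exist.

```python
from statistics import mean, pstdev


def rescale(values):
    """Min-max rescaling to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def percentile_rank(values):
    """Fraction of values that are less than or equal to each value."""
    n = len(values)
    return [sum(other <= v for other in values) / n for v in values]


heights_cm = [150, 160, 170, 180, 190]  # hypothetical sample of one feature
print(rescale(heights_cm))
print(standardize(heights_cm))
print(percentile_rank(heights_cm))
```

Each feature (age, height, and so on) would be scaled separately before the cosine similarity is computed.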

Where to center the kernel when using FFTW for image convolution?

I am trying to use FFTW for image convolution.
At first just to test if the system is working properly, I performed the fft, then the inverse fft, and could get the exact same image returned.
Then, as a small step forward, I used the identity kernel (i.e., kernel[0][0] = 1 whereas all the other components equal 0). I took the component-wise product of the image and the kernel (both in the frequency domain), then did the inverse FFT. Theoretically I should get the identical image back, but the result I got is not even close to the original image. I suspect this has something to do with where I center my kernel before I transform it into the frequency domain (since I put the "1" at kernel[0][0], it basically means that I centered the positive part at the top left). Could anyone enlighten me about what goes wrong here?
For each dimension, the indexes of samples should be from -n/2 ... 0 ... n/2 -1, so if the dimension is odd, center around the middle. If the dimension is even, center so that before the new 0 you have one sample more than after the new 0.
E.g. -4, -3, -2, -1, 0, 1, 2, 3 for a width/height of 8 or -3, -2, -1, 0, 1, 2, 3 for a width/height of 7.
The FFT is relative to the middle, in its scale there are negative points.
In the memory the points are 0...n-1, but the FFT treats them as -ceil(n/2)...floor(n/2), where 0 is -ceil(n/2) and n-1 is floor(n/2)
The identity matrix is a matrix of zeros with 1 in the 0,0 location (the center - according to above numbering). (In the spatial domain.)
In the frequency domain the identity matrix should be a constant (all real values 1 or 1/(N*M) and all imaginary values 0).
If you do not receive this result, then the identity matrix might need padding differently (to the left and down instead of around all sides) - this may depend on the FFT implementation.
Center each dimension separately (this is an index centering, no change in actual memory).
You will probably need to pad the image (after centering) to a whole power of 2 in each dimension (2^n * 2^m where n doesn't have to equal m).
Pad relative to FFT's 0,0 location (to center, not corner) by copying existing pixels into a new larger image, using center-based-indexes in both source and destination images (e.g. (0,0) to (0,0), (0,1) to (0,1), (1,-2) to (1,-2))
Assuming your FFT uses regular floating point cells and not complex cells, the complex image has to be of size 2*ceil(n/2) * 2*ceil(m/2) even if you don't need a whole power of 2 (since it has half the samples, but the samples are complex).
If your image has more than one color channel, you will first have to reshape it, so that the channel are the most significant in the sub-pixel ordering, instead of the least significant. You can reshape and pad in one go to save time and space.
Don't forget the FFTSHIFT after the IFFT. (To swap the quadrants.)
The result of the IFFT is 0...n-1. You have to take pixels floor(n/2)+1..n-1 and move them before 0...floor(n/2).
This is done by copying pixels to a new image, copying floor(n/2)+1 to memory-location 0, floor(n/2)+2 to memory-location 1, ..., n-1 to memory-location floor(n/2), then 0 to memory-location ceil(n/2), 1 to memory-location ceil(n/2)+1, ..., floor(n/2) to memory-location n-1.
When you multiply in the frequency domain, remember that the samples are complex (one cell real then one cell imaginary) so you have to use a complex multiplication.
The result might need dividing by N^2*M^2 where N is the size of n after padding (and likewise for M and m). - You can tell this by (a. looking at the frequency domain's values of the identity matrix, b. comparing result to input.)
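The effect of where the kernel's "1" sits can be tried out in one dimension with a naive DFT. This is a plain-Python sketch, not FFTW: the inverse here divides by n, whereas FFTW's transforms are unnormalized and the caller has to do that division. In this sketch the DFT treats index 0 as the origin, so a kernel whose centre sits in the middle of the array produces a circularly shifted result unless it is rotated first or the output is shifted back afterwards. The helper names are made up for the example.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse divides by n."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def circular_convolve(signal, kernel):
    """Multiply the spectra element-wise (complex product), then invert."""
    product = [s * k for s, k in zip(dft(signal), dft(kernel))]
    return [round(v.real, 6) for v in dft(product, inverse=True)]


def move_centre_to_origin(kernel):
    """Rotate a centred kernel so that its centre sample lands on index 0."""
    centre = len(kernel) // 2
    return kernel[centre:] + kernel[:centre]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
delta_at_origin = [1.0, 0, 0, 0, 0, 0, 0, 0]
delta_at_centre = [0, 0, 0, 0, 1.0, 0, 0, 0]

print(circular_convolve(signal, delta_at_origin))  # the signal itself
print(circular_convolve(signal, delta_at_centre))  # the signal rotated by 4 samples
print(circular_convolve(signal, move_centre_to_origin(delta_at_centre)))
```

In two dimensions the same rotation would be applied along each axis separately.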
I think that your understanding of the identity kernel may be off. An identity kernel should have the 1 at the center of the 2D kernel, not at the 0, 0 position.
For example, for a 3 x 3 kernel, you have yours set up as follows:
1, 0, 0
0, 0, 0
0, 0, 0
It should be
0, 0, 0
0, 1, 0
0, 0, 0
Check this out also
What is the "do-nothing" convolution kernel
also look here, at the bottom of page 3.
http://www.fmwconcepts.com/imagemagick/digital_image_filtering.pdf
I took the component-wise product between the image and kernel in frequency domain, then did the inverse fft. Theoretically I should be able to get the identical image back.
I don't think that doing a forward transform with a non-fft kernel, and then an inverse fft transform should lead to any expectation of getting the original image back, but perhaps I'm just misunderstanding what you were trying to say there...
