Yes, a similar question has been asked here NetLogo: histogram relative frequency, but as far as I know, no answer has been given. Am I allowed to re-ask it again? I would have added a comment under the question but I am not allowed to.
I would like to plot the relative frequency of patches with a specific colour against increasing distance from a turtle. What I've tried so far:
ask turtle [
set-current-plot "plot 1"
set-plot-y-range 0 1
set-plot-pen-mode 1
histogram [distance myself] of patches with [pcolor = red]]
This only gives me the absolute frequency. I want to plot the relative frequency of each patch colour against increasing distance from a turtle. So at distance = 1 away from turtle, how many patches out of the total no. of patches are red. I tried adding a
histogram [distance myself ] of patches with [ ] / count patches with [distance myself = 1] ; when x-axis = 1, and so for increasing x
but there were certain syntax problems since the primitive histogram expects a list. I would think there is a simpler way to set the y-axis to (absolute occurrences/total no. of patches at distance x) so I looked through the Netlogo dictionary but could not find something that sets up the y-axis under plot set-up commands.
Would appreciate any advice with regards to this issue! Thank you for your time.
It sounds like you want to plot the proportion of red to total patches vs the distance. You wouldn't use a histogram for this...
Instead, you'd plot it.
You'll need to get the distance of the farthest patch. Then, for each distance, calculate the proportion of red patches to the total number of patches at that distance.
To make your life simpler, you may want to round the distances since a turtle could be off-center from the patch (i.e. if the turtle has xcor .5 rather than 0 or 1.)
to setup
clear-all
crt 1 [ setxy random-xcor random-ycor]
ask patches [ set pcolor ifelse-value (random 100 < 30) [red][black]]
ask turtle 0
[
let max-distance round max [distance myself] of patches
set-current-plot "example-plot"
set-plot-x-range 0 max-distance
set-plot-y-range 0 1
foreach n-values max-distance [?]
[
let percent-red (count patches with [ round (distance myself) = ? and pcolor = red]) / (count patches with [ round (distance myself) = ?])
plotxy ? percent-red
]
]
end
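For readers who don't use NetLogo, the same per-distance proportion can be sketched in Python. This is only an illustration of the idea above, not a translation of the NetLogo model: the `patches` dictionary, the `red_fraction_by_distance` helper and the 11x11 grid are all made up for the example.

```python
import math
import random

def red_fraction_by_distance(patches, origin):
    """Map each rounded distance from origin to (red patches / all patches) there."""
    totals, reds = {}, {}
    ox, oy = origin
    for (x, y), color in patches.items():
        d = round(math.hypot(x - ox, y - oy))  # same rounding trick as above
        totals[d] = totals.get(d, 0) + 1
        if color == "red":
            reds[d] = reds.get(d, 0) + 1
    # Only distances that actually occur are kept, so there is no division by zero.
    return {d: reds.get(d, 0) / totals[d] for d in sorted(totals)}

# Hypothetical 11x11 world in which roughly 30% of the patches are red.
random.seed(0)
patches = {(x, y): ("red" if random.random() < 0.3 else "black")
           for x in range(-5, 6) for y in range(-5, 6)}

for d, frac in red_fraction_by_distance(patches, origin=(0.0, 0.0)).items():
    print(d, round(frac, 2))
```

Each printed pair is one (x, y) point of the plot: the rounded distance and the share of red patches at that distance.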
I am working on Google Tensorboard, and I'm feeling confused about the meaning of Histogram Plot. I read the tutorial, but it seems unclear to me. I really appreciate if anyone could help me figure out the meaning of each axis for Tensorboard Histogram Plot.
Sample histogram from TensorBoard
I came across this question earlier, while also seeking information on how to interpret the histogram plots in TensorBoard. For me, the answer came from experiments of plotting known distributions.
So, the conventional normal distribution with mean = 0 and sigma = 1 can be produced in TensorFlow with the following code:
import tensorflow as tf
cwd = "test_logs"
W1 = tf.Variable(tf.random_normal([200, 10], stddev=1.0))
W2 = tf.Variable(tf.random_normal([200, 10], stddev=0.13))
w1_hist = tf.summary.histogram("weights-stdev_1.0", W1)
w2_hist = tf.summary.histogram("weights-stdev_0.13", W2)
summary_op = tf.summary.merge_all()
init = tf.initialize_all_variables()
sess = tf.Session()
writer = tf.summary.FileWriter(cwd, sess.graph)
sess.run(init)
for i in range(2):
    writer.add_summary(sess.run(summary_op), i)
writer.flush()
writer.close()
sess.close()
Here is what the result looks like:
The horizontal axis represents time steps.
The plot is a contour plot and has contour lines at the vertical axis values of -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, and 1.5.
Since the plot represents a normal distribution with mean = 0 and sigma = 1 (and remember that sigma means standard deviation), the contour line at 0 represents the mean value of the samples.
The area between the contour lines at -0.5 and +0.5 represents the area under a normal distribution curve captured within +/- 0.5 standard deviations from the mean, suggesting that it is 38.3% of the sampling.
The area between the contour lines at -1.0 and +1.0 represents the area under a normal distribution curve captured within +/- 1.0 standard deviations from the mean, suggesting that it is 68.3% of the sampling.
The area between the contour lines at -1.5 and +1.5 represents the area under a normal distribution curve captured within +/- 1.5 standard deviations from the mean, suggesting that it is 86.6% of the sampling.
The palest region extends a little beyond +/- 4.0 standard deviations from the mean, and only about 60 per 1,000,000 samples will be outside of this range.
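If you want to check these percentages yourself, here is a small sketch using only the standard library. It relies on the identity that the share of a normal distribution within +/- k standard deviations is erf(k / sqrt(2)); the helper name `fraction_within` is just something I made up for the example.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (0.5, 1.0, 1.5):
    print(f"+/-{k} sigma: {100 * fraction_within(k):.1f}%")

# Samples expected outside +/- 4 sigma, per million.
outside_4 = 1 - fraction_within(4.0)
print(f"outside +/-4 sigma: about {outside_4 * 1e6:.0f} per million")
```

The first three lines of output should match the 38.3%, 68.3% and 86.6% figures quoted above.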
While Wikipedia has a very thorough explanation, you can get the most relevant nuggets here.
Actual histogram plots will show several things. The plot regions will grow and shrink in vertical width as the variation of the monitored values increases or decreases. The plots may also shift up or down as the mean of the monitored values increases or decreases.
(You may have noted that the code actually produces a second histogram with a standard deviation of 0.13. I did this to clear up any confusion between the plot contour lines and the vertical axis tick marks.)
@marc_alain, you're a star for making such a simple script for TB, which are hard to find.
To add to what he said: the histograms show the 1, 2 and 3 sigma bands of the distribution of weights, which for a normal distribution cover roughly 68%, 95% and 99.7% of the values. So if your model has 784 weights, the histogram shows how the values of those weights change with training.
These histograms are probably not that interesting for shallow models, but you could imagine that with deep networks, weights in the higher layers might take a while to grow because the logistic function is saturated. Of course I'm just mindlessly parroting this paper by Glorot and Bengio, in which they study the weight distributions through training and show how the logistic function stays saturated for the higher layers for quite a while.
When plotting histograms, we put the bin limits on the x-axis and the count on the y-axis. However, the whole point of the histogram tab is to show how a tensor changes over time. Hence, as you may have already guessed, the depth axis (z-axis), containing the numbers 100 and 300, shows the epoch numbers.
The default histogram mode is Offset mode. Here the histogram for each epoch is offset in the z-axis by a certain value (to fit all epochs in the graph). This is like seeing all the histograms placed one after the other, viewed from one corner of the ceiling of the room (from the midpoint of the front ceiling edge, to be precise).
In the Overlay mode, the z-axis is collapsed, and the histograms become transparent, so you can move and hover over to highlight the one corresponding to a particular epoch. This is more like the front view of the Offset mode, with only outlines of histograms.
As explained in the documentation here:
tf.summary.histogram
takes an arbitrarily sized and shaped Tensor, and compresses it into a
histogram data structure consisting of many bins with widths and
counts. For example, let's say we want to organize the numbers [0.5,
1.1, 1.3, 2.2, 2.9, 2.99] into bins. We could make three bins:
a bin containing everything from 0 to 1 (it would contain one element, 0.5),
a bin containing everything from 1-2 (it would contain two elements, 1.1 and 1.3),
a bin containing everything from 2-3 (it would contain three elements: 2.2, 2.9 and 2.99).
TensorFlow uses a similar approach to create bins, but unlike in our
example, it doesn't create integer bins. For large, sparse datasets,
that might result in many thousands of bins. Instead, the bins are
exponentially distributed, with many bins close to 0 and comparatively
few bins for very large numbers. However, visualizing
exponentially-distributed bins is tricky; if height is used to encode
count, then wider bins take more space, even if they have the same
number of elements. Conversely, encoding count in the area makes
height comparisons impossible. Instead, the histograms resample the
data into uniform bins. This can lead to unfortunate artifacts in
some cases.
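The three-bin example from the quoted documentation is easy to reproduce. This is a plain-Python sketch of integer-width binning only; it does not imitate TensorFlow's exponentially distributed bins, and `bin_counts` is a hypothetical helper, not a TensorFlow function.

```python
def bin_counts(values, edges):
    """Count how many values fall in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

values = [0.5, 1.1, 1.3, 2.2, 2.9, 2.99]
print(bin_counts(values, [0, 1, 2, 3]))  # [1, 2, 3]
```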
Please read the documentation further to get the full knowledge of plots displayed in the histogram tab.
Roufan,
The histogram plot allows you to plot variables from your graph.
w1 = tf.Variable(tf.zeros([1]),name="a",trainable=True)
tf.histogram_summary("firstLayerWeight",w1)
For the example above the vertical axis would have the units of my w1 variable. The horizontal axis would have units of the step which I think is captured here:
summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)
It may be useful to see this on how to make summaries for the tensorboard.
Don
Each line on the chart represents a percentile in the distribution over the data: for example, the bottom line shows how the minimum value has changed over time, and the line in the middle shows how the median has changed. Reading from top to bottom, the lines have the following meaning: [maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, minimum]
These percentiles can also be viewed as standard deviation boundaries on a normal distribution: [maximum, μ+1.5σ, μ+σ, μ+0.5σ, μ, μ-0.5σ, μ-σ, μ-1.5σ, minimum] so that the colored regions, read from inside to outside, have widths [σ, 2σ, 3σ] respectively.
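The correspondence between the percentile list and the standard deviation boundaries can be checked with the standard library. A minimal sketch, assuming a standard normal distribution (mean 0, sigma 1):

```python
from statistics import NormalDist

# Offsets from the mean in units of sigma, read from top to bottom.
offsets = [1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5]
percentiles = [round(100 * NormalDist().cdf(k)) for k in offsets]
print(percentiles)  # [93, 84, 69, 50, 31, 16, 7]
```

These are the seven inner percentiles from the list above; the outermost two lines are simply the maximum and minimum of the data.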
My boss and I disagree as to what is going on with the CV_TM_CCORR_NORMED method for matchTemplate(); in openCV.
Can you please explain what is happening here especially the square root aspect of this equation.
Correlation measures the similarity of two signals, vectors, etc. Suppose you have the vectors
template=[0 1 0 0 1 0 ] A=[0 1 1 1 0 0] B =[ 1 0 0 0 0 1]
If you correlate each vector with the template to find out which one is more similar, you will see that A is more similar to the template than B, because one of its 1's sits at the same index as a 1 in the template. In other words, the more nonzero elements line up, the higher the correlation between the vectors.
In grayscale images the values are in the range 0-255. Let's try that:
template=[10 250 36 30] A=[10 250 36 30] B=[220 251 240 210] .
Here it is clear that A is the same as the template, yet the correlation between B and the template is bigger than the one between A and the template. In normalized cross correlation the denominator of the formula solves this problem. If you check the formula below, you can see that the denominator for B(x)template will be much bigger than for A(x)template.
Formula as stated in opencv documentation :
In practice, if you use plain cross correlation and part of the image is bright, the correlation between that part and your template will be larger. If you use normalized cross correlation instead, you will get a better result.
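To make the effect of the denominator concrete, here is a small sketch with the grayscale vectors from above. It is plain Python on 1D lists rather than a call to OpenCV, and the two function names are made up for the example; the normalization is the one used by CV_TM_CCORR_NORMED, i.e. dividing by the square root of (sum of squared template values times sum of squared image values).

```python
import math

def cross_correlation(a, b):
    """Plain cross correlation: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def normalized_cross_correlation(a, b):
    """Cross correlation divided by sqrt(sum(a^2) * sum(b^2))."""
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return cross_correlation(a, b) / denom

template = [10, 250, 36, 30]
A = [10, 250, 36, 30]
B = [220, 251, 240, 210]

# Unnormalized: the brighter vector B wins although A is identical to the template.
print(cross_correlation(template, A), cross_correlation(template, B))  # 64796 79890

# Normalized: A scores 1.0 and B scores clearly lower.
print(normalized_cross_correlation(template, A))
print(normalized_cross_correlation(template, B))
```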
Think formula is this :
Before multiplying element by element, you are normalizing the two matrices. By dividing by the root of the sum of squares of all elements in the matrix, you are removing the gain: if all elements are large, then the divisor is large.
Think of it as dividing by the sum of all elements in the matrix. If a pixel is in a brighter area, then its neighbours' pixel values will be high as well. By dividing by the sum over its neighbourhood, you are removing the illumination effect. This is for image processing, where pixel values are always positive. For a general 2D matrix there may be some negative values, so squaring is used to ignore the sign.
It's just a simple thing that I need to clarify.
I need a little refresh in mathematics:
In a circle the length of the gradient should be the radius?
Or do we use the gradient only to get the orientation?
I got to this question after I read about gradient in image processing:
I've read this answer and this about how to get the image gradient and of course here.
I don't understand whether the magnitude stands for a number of pixels, or whether it just stands for the strength of the intensity change at a specific point.
The following image is the magnitude of the gradient:
I ran the code and looked at the magnitude values, and the numbers are clearly not in the range of the image width\height.
I'm hoping for a simple clarification.
Thanks!
Mathematically speaking, the gradient magnitude, or in other words the norm of the gradient vector, represents the derivative (i.e. the slope) of a 2D signal. This is quite clear in the definition given by Wikipedia:
gradient f = (∂f/∂x) . x^ + (∂f/∂y) . y^
Here, f is the 2D signal and x^, y^ (this is ugly, I'll note them ux and uy in the following) are respectively unit vectors in the horizontal and vertical direction.
In the context of images, the 2D signal (i.e. the image) is discrete instead of being continuous, hence the derivative is approximated by the difference between the intensity of the current pixel and the intensity of the previous pixel, in the considered direction (actually, there are several ways to approximate the derivative, but let's keep it simple). Hence, we can approximate the gradient by the following quantity:
gradient f (u,v) = [ f(u,v)-f(u-1,v) ] . ux + [ f(u,v)-f(u,v-1) ] . uy
In this case, the gradient magnitude is the following:
|| gradient f (u,v) || = square_root { [ f(u,v)-f(u-1,v) ]² + [ f(u,v)-f(u,v-1) ]² }
To summarize, the gradient magnitude is a measure of the local intensity change at a given point and has not much to do with a radius, nor the width/height of the image.
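As a quick illustration of the formula above, here is a minimal sketch on a toy image. The 4x4 image and the function name are made up for the example, and it uses the simple backward difference from this answer, not Sobel or any library routine.

```python
import math

def gradient_magnitude(img, u, v):
    """Gradient magnitude at column u, row v using backward differences.

    img is indexed as img[v][u]; u and v must be at least 1.
    """
    du = img[v][u] - img[v][u - 1]   # f(u,v) - f(u-1,v)
    dv = img[v][u] - img[v - 1][u]   # f(u,v) - f(u,v-1)
    return math.hypot(du, dv)

# Toy 4x4 image: dark left half, bright right half (a vertical edge).
img = [[0, 0, 100, 100] for _ in range(4)]

print(gradient_magnitude(img, 1, 1))  # 0.0, flat region
print(gradient_magnitude(img, 2, 1))  # 100.0, on the edge
```

The value on the edge is 100 because the intensity jumps by 100 there, even though the image is only 4 pixels wide: the magnitude is measured in intensity units, not in pixels.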
In the "Adaptive document image binarization" paper (link: http://www.mediateam.oulu.fi/publications/pdf/24.p) I found the SDM and TBM algorithms for text/image segmentation,
but I can't understand what "same quarter" means in the following paragraph.
If the average is high and a global histogram peak is in
the same quarter of the histogram and transient difference
is transient, then use SDM.
If the average is medium and a global histogram peak
is not in the same quarter of the histogram and transient
difference is uniform, then use TBM.
I know that a quarter means 1/4, but I think "quarter" has a different meaning here, right?
After skimming the paper very quickly, I found two possible ways to interpret this.
From the current bin, choose a quarter of the histogram by looking 1/8th to the left and 1/8th to the right. i.e. if your histogram has 256 bins, and you are at bin 50, the quarter you are looking for is [18, 81]. So if the average is high and the peak lies in [18,81], use SDM.
Divide the entire histogram into quarters, and check which quarter your current bin lies in. i.e. if your histogram has 256 bins, divide it into [0, 63], [64, 127], [128, 191], [192, 255]. If your current bin is 50, you are in quarter 1, and so if the average is medium and the peak lies anywhere outside quarter 1, use TBM.
Based on intuition and mathematical sense, option 1 is more likely. But I would try both and see which implementation gives better results.
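Since I suggest trying both, here is how the two readings could be coded. Both function names are mine, and neither is taken from the paper; this is only a sketch of the two interpretations described above, assuming a 256-bin histogram.

```python
def peak_in_centered_quarter(current_bin, peak_bin, n_bins=256):
    """Option 1: the peak lies within n_bins/8 bins on either side of current_bin."""
    half = n_bins // 8
    return current_bin - half <= peak_bin < current_bin + half

def peak_in_fixed_quarter(current_bin, peak_bin, n_bins=256):
    """Option 2: both bins fall into the same one of four fixed, equal quarters."""
    quarter = n_bins // 4
    return current_bin // quarter == peak_bin // quarter

# Current bin 50, global peak at bin 70: the two readings disagree.
print(peak_in_centered_quarter(50, 70))  # True, 70 lies in [18, 81]
print(peak_in_fixed_quarter(50, 70))     # False, 50 is in [0, 63] but 70 is in [64, 127]
```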
For image derivative computation, Sobel operator looks this way:
[-1 0 1]
[-2 0 2]
[-1 0 1]
I don't quite understand two things about it:
1. Why is the centre pixel 0? Can't I just use an operator like the one below?
[-1 1]
[-1 1]
[-1 1]
2. Why is the centre row 2 times the other rows?
I googled my questions, didn't find any answer which can convince me. Please help me.
In computer vision, there's very often no perfect, universal way of doing something. Most often, we just try an operator, see its results and check whether they fit our needs. It's true for gradient computation too: Sobel operator is one of many ways of computing an image gradient, which has proved its usefulness in many usecases.
In fact, the simplest gradient operator we could think of is even simpler than the one you suggest above:
[-1 1]
Despite its simplicity, this operator has a first problem: when you use it, you compute the gradient between two positions and not at one position. If you apply it to 2 pixels (x,y) and (x+1,y), have you computed the gradient at position (x,y) or (x+1,y)? In fact, what you have computed is the gradient at position (x+0.5,y), and working with half pixels is not very handy. That's why we add a zero in the middle:
[-1 0 1]
Applying this one to pixels (x-1,y), (x,y) and (x+1,y) will clearly give you a gradient for the center pixel (x,y).
This one can also be seen as the sum of two shifted [-1 1] filters: [-1 1 0], which computes the gradient at position (x-0.5,y), at the left of the pixel, and [0 -1 1], which computes the gradient at the right of the pixel.
Now this filter still has another disadvantage: it's very sensitive to noise. That's why we decide not to apply it on a single row of pixels, but on 3 rows: this allows to get an average gradient on these 3 rows, that will soften possible noise:
[-1 0 1]
[-1 0 1]
[-1 0 1]
But this one tends to average things a little too much: when applied to one specific row, we lose much of what makes the detail of this specific row. To fix that, we want to give a little more weight to the center row, which will allow us to get rid of possible noise by taking into account what happens in the previous and next rows, but still keeping the specificity of that very row. That's what gives the Sobel filter:
[-1 0 1]
[-2 0 2]
[-1 0 1]
Tampering with the coefficients can lead to other gradient operators such as the Scharr operator, which gives just a little more weight to the center row:
[-3 0 3 ]
[-10 0 10]
[-3 0 3 ]
There are also mathematical reasons to this, such as the separability of these filters... but I prefer seeing it as an experimental discovery which proved to have interesting mathematical properties, as experiment is in my opinion at the heart of computer vision.
Only your imagination is the limit to create new ones, as long as it fits your needs...
EDIT The true reason that the Sobel operator looks that way can be
found by reading an interesting article by Sobel himself. My
quick reading of this article indicates Sobel's idea was to get an
improved estimate of the gradient by averaging the horizontal,
vertical and diagonal central differences. Now when you break the
gradient into vertical and horizontal components, the diagonal central
differences are included in both, while the vertical and horizontal
central differences are only included in one. To avoid double
counting the diagonals should therefore have half the weights of the
vertical and horizontal. The actual weights of 1 and 2 are just
convenient for fixed point arithmetic (and actually include a scale
factor of 16).
I agree with @mbrenon mostly, but there are a couple of points too hard to make in a comment.
Firstly in computer vision, the "Most often, we just try an operator" approach just wastes time and gives poor results compared to what might have been achieved. (That said, I like to experiment too.)
It is true that a good reason to use [-1 0 1] is that it centres the derivative estimate at the pixel. But another good reason is that it is the central difference formula, and you can prove mathematically that it gives a lower error in its estimate of the true derivative than [-1 1].
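The error claim is easy to check numerically. Here is a small sketch on sin(x), whose true derivative is cos(x); the step size h = 0.1 and the point x = 1.0 are arbitrary choices for the example.

```python
import math

def forward_diff(f, x, h):
    """One-sided difference, the [-1 1] filter scaled by 1/h."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Central difference, the [-1 0 1] filter scaled by 1/(2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, h = 1.0, 0.1
true_derivative = math.cos(x)
err_forward = abs(forward_diff(math.sin, x, h) - true_derivative)
err_central = abs(central_diff(math.sin, x, h) - true_derivative)

print(err_forward)  # error shrinks proportionally to h
print(err_central)  # error shrinks proportionally to h squared, so it is much smaller
```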
[1 2 1] is used to filter noise, as mbrenon said. The reason these particular numbers work well is that they are an approximation of a Gaussian, which is the only filter that does not introduce artifacts (although from Sobel's article, this seems to be a coincidence). Now if you want to reduce noise and you are finding a horizontal derivative, you want to filter in the vertical direction so as to least affect the derivative estimate. Convolving transpose([1 2 1]) with [-1 0 1] we get the Sobel operator, i.e.:
[1]                [-1 0 1]
[2] * [-1 0 1]  =  [-2 0 2]
[1]                [-1 0 1]
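The same product written out in code, as a plain-Python sketch without any image library: convolving a 3x1 column with a 1x3 row amounts to an outer product, so each row of the result is the derivative filter scaled by one smoothing weight.

```python
smooth = [1, 2, 1]    # vertical smoothing weights, i.e. transpose([1 2 1])
diff = [-1, 0, 1]     # horizontal central difference

# Outer product: row i is diff scaled by smooth[i].
sobel_x = [[s * d for d in diff] for s in smooth]

for row in sobel_x:
    print(row)
# [-1, 0, 1]
# [-2, 0, 2]
# [-1, 0, 1]
```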
For 2D image you need a mask. Say this mask is:
[ a11 a12 a13;
a21 a22 a23;
a31 a32 a33 ]
Df_x (gradient along x) should be produced from Df_y (gradient along y) by a rotation of 90°, i.e. the mask should be:
[ a11 a12 a11;
a21 a22 a21;
a31 a32 a31 ]
Now if we want to subtract the signal in front of the middle pixel (that's what differentiation is in the discrete case: subtraction), we want to allocate the same weights to both sides of the subtraction, i.e. our mask becomes:
[ a11 a12 a11;
a21 a22 a21;
-a11 -a12 -a11 ]
Next, the sum of the weights should be zero, because when we have a smooth image (e.g. all 255s) we want to have a zero response. The first and third rows already cancel each other, so the middle row must sum to zero, i.e. we get:
[ a11 a12 a11;
a21 -2a21 a21;
-a11 -a12 -a11 ]
In case of a smooth image we expect the differentiation along X-axis to produce zero, i.e.:
[ a11 a12 a11;
0 0 0;
-a11 -a12 -a11 ]
Finally if we normalize we get:
[ 1 A 1;
0 0 0;
-1 -A -1 ]
and you can set A to anything you want experimentally. A factor of 2 gives the original Sobel filter.
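Here is a short sketch of that final mask as a function of A, so you can experiment with it. The function name is made up for the example, and the check on a flat patch only confirms the zero-response property argued above.

```python
def derivative_mask(A):
    """Normalized 3x3 mask [1 A 1; 0 0 0; -1 -A -1] for a given centre weight A."""
    return [[1, A, 1],
            [0, 0, 0],
            [-1, -A, -1]]

mask = derivative_mask(2)  # A = 2 gives the Sobel weights

# The weights sum to zero, so a flat patch (all 255s) gives a zero response.
weight_sum = sum(sum(row) for row in mask)
flat_response = sum(w * 255 for row in mask for w in row)
print(mask)
print(weight_sum, flat_response)  # 0 0
```

With A = 2 this is the Sobel kernel for the gradient across rows; other values of A (for example 10/3, which matches the Scharr ratio of 3:10:3) only change how strongly the centre column is weighted.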