I am learning about HOG from here, a well-explained page with an example. However, I do not understand how the following step works:
A 16×16 block has 4 histograms which can be concatenated to form a 36
x 1 element vector and it can be normalized just the way a 3×1 vector
is normalized.
Where does this 36 x 1 vector come from, and how is it calculated? Also, is it compulsory to always use a 9-bin vector? Is that a fixed size for HOG?
Is it compulsory to always use a 9-bin vector?
Not necessarily. Dalal and Triggs reported in their original HOG paper that accuracy for their application (pedestrian detection) improved as the number of orientation bins increased up to about 9 and made little difference beyond that, which is why 9 bins are commonly used.
Where does this 36 x 1 vector come from, and how is it calculated?
As already pointed out in the comments:
You have 9 bins per histogram (each bin becomes one scalar value in your feature vector). In your example, each histogram is calculated over an 8 x 8 cell, so a 16 x 16 block contains 4 cells and therefore 4 histograms. Each of those histograms yields a 9 x 1 vector, so:
4 (histograms) * 9 (bins) = 36 x 1 feature vector.
You basically just concatenate your results into one vector.
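The concatenation can be sketched in a few lines of Python. The bin values below are made-up placeholders rather than real gradient data, and `concatenate_block` is a hypothetical helper name, not part of any HOG library.

```python
def concatenate_block(cell_histograms):
    """Flatten the per-cell histograms of one block into a single vector."""
    block_vector = []
    for histogram in cell_histograms:
        block_vector.extend(histogram)
    return block_vector


# Four 8 x 8 cells inside one 16 x 16 block, each with a 9-bin histogram.
# The bin values are arbitrary placeholders, not real gradient data.
cells = [[float(cell_index + bin_index) for bin_index in range(9)]
         for cell_index in range(4)]

block_vector = concatenate_block(cells)
print(len(block_vector))  # 4 histograms * 9 bins = 36
```

With a different number of bins per histogram, the same code gives 4 times that number, so the 36 is a consequence of the 9-bin choice and not a fixed property of HOG.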
I want to implement the Ncuts algorithm for an image of size 1248 x 378 x 1, but the adjacency matrix will be (1248 x 378) x (1248 x 378), which needs about 800 GB of RAM. Even if most of it is zero, it still needs too much memory. I do need this matrix, though, to compute the normalized cut. Is there any way to find the eigenvalues without actually calculating the whole matrix?
If most of the matrix is zero, then don't use a dense format.
Use a sparse matrix instead.
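To get a feel for the saving, here is a rough count in Python. It assumes, purely for illustration, that each pixel is connected only to its 4 direct neighbours; Ncuts implementations often use a larger neighbourhood, which raises the nonzero count but keeps it proportional to the number of pixels. The function names are hypothetical.

```python
def dense_entry_count(width, height):
    """Entries in a dense (pixels x pixels) adjacency matrix."""
    pixels = width * height
    return pixels * pixels


def grid_nonzero_count(width, height):
    """Nonzeros of a symmetric adjacency matrix for a 4-connected pixel grid.

    Assumption: only direct horizontal and vertical neighbours are linked.
    """
    horizontal_edges = height * (width - 1)
    vertical_edges = width * (height - 1)
    # Each undirected edge is stored twice (i, j) and (j, i).
    return 2 * (horizontal_edges + vertical_edges)


width, height = 1248, 378
dense = dense_entry_count(width, height)
sparse = grid_nonzero_count(width, height)
print(f"dense entries:   {dense:,}")
print(f"sparse nonzeros: {sparse:,}")
print(f"dense storage at 4 bytes per entry: {dense * 4 / 1e9:.0f} GB")
```

Once the matrix is stored sparsely, an iterative sparse eigensolver (for example `eigs` in Matlab or `scipy.sparse.linalg.eigsh` in SciPy) can compute just the few eigenvectors that the normalized cut needs.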
The dimension of the image is 64 x 128. That is 8192 gradient magnitude and direction values. After the binning stage, we are left with 1152 values, as each 8 x 8 cell of 64 pixels is converted into 9 bins based on orientation. Can you please explain how, after L2 normalization, we get a vector of 3780 values?
Assumption: You have the gradients of the 64 x 128 patch.
Calculate Histogram of Gradients in 8x8 cells
This is where it starts to get interesting. The image is divided into 8x8 cells and a histogram of gradients is calculated for each 8x8 cell. One of the reasons why we use 8x8 cells is that it provides a compact representation. An 8x8 image patch contains 8x8x3 = 192 pixel values (color image). The gradient of this patch contains 2 values (magnitude and direction) per pixel, which adds up to 8x8x2 = 128 values. These 128 numbers are represented using a 9-bin histogram, which can be stored as an array of 9 numbers. This makes the representation more compact, and calculating histograms over a patch makes it more robust to noise.
The histogram is essentially a vector of 9 bins corresponding to the angles 0, 20, 40, 60 ... 160 of unsigned gradients (180 degrees wraps around to 0).
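As a rough sketch of the binning for one cell, the Python below builds a 9-bin histogram from gradient magnitudes and unsigned angles. It assumes that each pixel's magnitude is split between the two nearest bins in proportion to the angular distance, which is one common voting scheme; simpler hard assignment to the nearest bin is also possible. The function name and the sample values are made up for illustration.

```python
def cell_histogram(magnitudes, angles_deg, n_bins=9):
    """Accumulate gradient magnitudes into orientation bins (unsigned, 0-180)."""
    bin_width = 180.0 / n_bins  # 20 degrees for 9 bins
    histogram = [0.0] * n_bins
    for magnitude, angle in zip(magnitudes, angles_deg):
        position = (angle % 180.0) / bin_width  # fractional bin index
        lower = int(position) % n_bins
        upper = (lower + 1) % n_bins            # the last bin wraps to bin 0
        weight_upper = position - int(position)
        histogram[lower] += magnitude * (1.0 - weight_upper)
        histogram[upper] += magnitude * weight_upper
    return histogram


# Three example pixels: (magnitude, angle) = (2, 80), (4, 10), (1, 170)
hist = cell_histogram([2.0, 4.0, 1.0], [80.0, 10.0, 170.0])
print(hist)
```

The pixel at 80 degrees falls exactly on a bin, the one at 10 degrees is shared equally between the 0 and 20 degree bins, and the one at 170 degrees is shared between the 160 degree bin and the 0 degree bin because of the wrap-around.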
16 x 16 Block Normalization
After creating the histogram based on the gradient of the image, we want our descriptor to be independent of lighting variations. Hence, we normalize the histogram. The vector norm for an RGB color [128, 64, 32] is sqrt(128*128 + 64*64 + 32*32) = 146.64, which is the infamous L2-norm. Dividing each element of this vector by 146.64 gives us a normalized vector [0.87, 0.44, 0.22]. If we were to multiply each element of the original vector by 2, the normalized vector would remain the same as before.
Although simply normalizing the 9 x 1 histogram is appealing, normalizing a bigger block of 16 x 16 is better. A 16 x 16 block has 4 histograms, which can be concatenated to form a 36 x 1 element vector, and it can be normalized the same way as the 3 x 1 vector in the example. The window is then moved by 8 pixels, a normalized 36 x 1 vector is calculated over this window, and the process is repeated.
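The RGB example can be checked with a short Python sketch; `l2_normalize` is a hypothetical helper written for this illustration.

```python
import math


def l2_normalize(vector):
    """Divide a vector by its L2 norm (left unchanged if the norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        return list(vector)
    return [v / norm for v in vector]


rgb = [128, 64, 32]
doubled = [2 * v for v in rgb]  # the same color under twice the brightness

normalized = l2_normalize(rgb)
normalized_doubled = l2_normalize(doubled)

print([round(v, 2) for v in normalized])
print([round(v, 2) for v in normalized_doubled])
```

Both lines print the same vector, which is the point of the normalization: a uniform scaling of the values cancels out.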
Calculate the HOG feature vector
This is where your question comes in.
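To calculate the final feature vector for the entire image patch, the 36 x 1 vectors are concatenated into one giant vector. Let us calculate the size:
How many positions of the 16 x 16 block do we have? There are 7 horizontal and 15 vertical positions, which gives 7 x 15 = 105 positions.
Each 16 x 16 block is represented by a 36 x 1 vector. So when we concatenate them all into one giant vector, we obtain a 36 x 105 = 3780-dimensional vector. This is more than the 1152 values from the binning stage because the blocks overlap (they move by 8 pixels, not 16), so most cell histograms appear in up to four different blocks, each time with a different normalization.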
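The same count can be written as a small Python function. The defaults are the values used in this answer (8-pixel cells, 16-pixel blocks, 8-pixel stride, 9 bins); the function name is hypothetical.

```python
def hog_descriptor_length(width, height, cell=8, block=16, stride=8, bins=9):
    """Length of the HOG vector for a width x height window."""
    blocks_x = (width - block) // stride + 1   # horizontal block positions
    blocks_y = (height - block) // stride + 1  # vertical block positions
    cells_per_block = (block // cell) ** 2     # 2 x 2 = 4 cells
    return blocks_x * blocks_y * cells_per_block * bins


# Values before block normalization: one 9-bin histogram per 8 x 8 cell.
cell_values = (64 // 8) * (128 // 8) * 9

print(cell_values)                     # 8 * 16 * 9 = 1152
print(hog_descriptor_length(64, 128))  # 7 * 15 * 36 = 3780
```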
For more details, look at the tutorial where I learned.
Hope it helps!
I have some background in machine learning, and I also just completed a face-identification exercise using a support vector machine. I am in the process of trying to convert this exercise to an HMM, but I am having problems understanding the notation and how to use it (I am using Kevin Murphy’s HMM package).
I am given about 50 grayscale images of 6 different people (numbered 1-6). Each image is 10 pixels by 10 pixels, and each pixel can have a value between 0 and 255 (8-bit grayscale). The goal is to be able to classify a new image as one of the 6 faces.
My approach is to turn each image into a long vector of 100 elements, each of which is a pixel value. Now I am getting to the confusing part. The notation I am using is as follows:
N: Number of hidden states. I understand that the hidden state is the person’s face (i.e. 1-6), therefore there are 6 hidden states, so N = 6.
T: Length of the observation sequence. Is this equal to 50? I am not sure what this represents.
M: Number of observation symbols. Is this equal to 100? Does the term “observation symbol” refer to the number of elements in the vector representing the observation?
O: Number of observations. What does this represent? In every example they use a single binary observed value and set this to 2 (i.e. on or off). What would this be in my case?
I greatly appreciate the help
The dataset has 440 objects and 8 attributes (it is the wholesale customers data, taken from the UCI machine learning repository). How do we calculate centroids for such a dataset?
https://archive.ics.uci.edu/ml/datasets/Wholesale+customers
If I calculate the mean of the values of each row, will that be the centroid?
And how do I plot the resulting clusters in Matlab?
OK, first of all: in the dataset, 1 row corresponds to a single example. You have 440 rows, which means the dataset consists of 440 examples. Each column contains the values for one specific feature (or attribute, as you call it), e.g. column 1 in your dataset contains the values for the feature Channel, column 2 the values for the feature Region, and so on.
K-Means
Now for K-Means Clustering, you need to specify the number of clusters (the K in K-Means). Say you want K=3 clusters, then the simplest way to initialise K-Means is to randomly choose 3 examples from your dataset (that is 3 rows, randomly drawn from the 440 rows you have) as your centroids. Now these 3 examples are your centroids.
You can think of your centroids as 3 bins, and you want to put every example from the dataset into the closest bin (closeness is usually measured by the Euclidean distance; check the function norm in Matlab).
After the first round of putting all examples into the closest bin, you recalculate the centroids by calculating the mean of all examples in their respective bins. You repeat the process of putting all the examples into the closest bin until no example in your dataset moves to another bin.
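The loop described above can be sketched as follows. This is a minimal illustration in plain Python, not Matlab, run on a made-up 2-feature toy dataset; the initial centroids are passed in explicitly so the run is repeatable, whereas the random initialisation described above would draw them with something like `random.sample(examples, k)`. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two examples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Feature-wise (column-wise) mean of a list of examples."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(examples, initial_centroids, max_iterations=100):
    centroids = [list(c) for c in initial_centroids]  # do not modify the input
    assignments = None
    for _ in range(max_iterations):
        # Put every example into the bin of its closest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: euclidean(example, centroids[k]))
            for example in examples
        ]
        if new_assignments == assignments:  # nothing moved: converged
            break
        assignments = new_assignments
        # Recalculate each centroid as the mean of the examples in its bin.
        for k in range(len(centroids)):
            members = [e for e, a in zip(examples, assignments) if a == k]
            if members:  # keep the old centroid if a bin ends up empty
                centroids[k] = mean_point(members)
    return centroids, assignments


# Toy data with 2 features and 2 obvious groups (placeholder values).
examples = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
            [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
centroids, assignments = k_means(examples, [examples[0], examples[3]])
print(assignments)
print(centroids)
```

Note that the centroid is the mean taken per feature over all examples in a bin, so for the 440x8 data each centroid has 8 values; it is not the mean of the values within one row.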
Some Matlab starting points
You load the data by X = load('path/to/the/dataset', '-ascii');
In your case X will be a 440x8 matrix.
You can calculate the Euclidean distance from an example to a centroid by
distance = norm(example - centroid1);
where both example and centroid1 have dimensionality 1x8.
Recalculating the centroids works as follows. Suppose you have done 1 iteration of K-Means and have put all examples into their respective closest bins. Say Bin1 now contains all examples that are closest to centroid1 and has dimensionality 127x8, which means that 127 of the 440 examples are in this bin. To calculate the centroid position for the next iteration, you can then do centroid1 = mean(Bin1); (mean works column-wise on a matrix, so the result is 1x8), and likewise for the other bins.
As for plotting, note that your dataset contains 8 features, i.e. 8 dimensions, which cannot be visualised directly. I'd suggest you create or look for a (dummy) dataset which consists of only 2 features and can therefore be visualised using Matlab's plot() function.
Suppose that a given 3-bit image (L = 8) of size 64 x 64 pixels (M*N = 4096) has the intensity distribution shown below. How do I obtain the histogram equalization transformation function
and then compute the equalized histogram of the image?
rk nk
0 800
1 520
2 970
3 660
4 330
5 450
6 260
7 106
"Histogram Equalization is the process of obtaining transformation function automatically. So you need not have to worry about shape and nature of transformation function"
So in histogram equalization, the transformation function is calculated from the cumulative frequencies, and this process is automatic. From the histogram of the image, we determine the cumulative histogram, c, rescaling the values as we go so that they occupy the full output range (0 to L-1, i.e. 0 to 255 for an 8-bit image and 0 to 7 in this example). In this way, c becomes a look-up table that can subsequently be applied to the image in order to carry out equalization.
rk   nk    c      sk = c/MN   (L-1)sk   rounded value
0    800   800    0.195       1.365     1
1    520   1320   0.322       2.254     2
2    970   2290   0.559       3.913     4
3    660   2950   0.720       5.04      5
4    330   3280   0.801       5.601     6
5    450   3730   0.911       6.377     6
6    260   3990   0.974       6.818     7
7    106   4096   1.000       7.0       7
The equalized histogram (pixel count per output level sk) is therefore:
sk nk
0 0
1 800
2 520
3 0
4 970
5 660
6 330 + 450 = 780
7 260 + 106 = 366
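The two tables can be reproduced with a short Python sketch. The function names are hypothetical, and rounding is done half-up with `int(x + 0.5)`.

```python
def equalization_mapping(counts):
    """Map each input level rk to round((L - 1) * cumulative / total)."""
    levels = len(counts)
    total = sum(counts)
    mapping = []
    cumulative = 0
    for count in counts:
        cumulative += count
        # int(x + 0.5) rounds half up for the non-negative values used here.
        mapping.append(int((levels - 1) * cumulative / total + 0.5))
    return mapping


def equalized_histogram(counts, mapping):
    """Add up the pixel counts of all input levels that share an output level."""
    result = [0] * len(counts)
    for level, count in enumerate(counts):
        result[mapping[level]] += count
    return result


counts = [800, 520, 970, 660, 330, 450, 260, 106]  # nk for rk = 0..7
mapping = equalization_mapping(counts)
print(mapping)                               # output level for each rk
print(equalized_histogram(counts, mapping))  # pixel count for each output level
```

Input levels 4 and 5 both map to 6, and levels 6 and 7 both map to 7, which is why their counts are added together, while output levels 0 and 3 stay empty.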
The algorithm for equalization can be given as:
1. Compute a scaling factor, α = 255 / number of pixels (for an 8-bit image; in general, (L-1) / number of pixels)
2. Calculate the histogram of the image
3. Create a look-up table c with
     c[0] = α * histogram[0]
     for all remaining grey levels, i, do
         c[i] = c[i-1] + α * histogram[i]
     end for
4. for all pixel coordinates, x and y, do
       g(x, y) = c[f(x, y)]
   end for
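The pseudocode above can be sketched in Python as follows. The image is assumed to be a list of rows of integer grey levels, the look-up values are rounded half-up to integers at the end (the pseudocode leaves the rounding implicit), and `equalize_image` is a hypothetical name. The 3 x 3 sample image is made up.

```python
def equalize_image(image, levels=256):
    """Histogram-equalize a grey image given as a list of rows of ints."""
    pixel_count = sum(len(row) for row in image)
    alpha = (levels - 1) / pixel_count  # scaling factor

    # Histogram of the input image.
    histogram = [0] * levels
    for row in image:
        for value in row:
            histogram[value] += 1

    # Look-up table c: the rescaled cumulative histogram.
    lookup = [0.0] * levels
    lookup[0] = alpha * histogram[0]
    for i in range(1, levels):
        lookup[i] = lookup[i - 1] + alpha * histogram[i]

    # g(x, y) = c[f(x, y)], rounded to the nearest integer level.
    return [[int(lookup[value] + 0.5) for value in row] for row in image]


image = [[52, 52, 60],
         [60, 60, 70],
         [70, 90, 90]]
equalized = equalize_image(image)
for row in equalized:
    print(row)
```

For the 3-bit example in the question, the same function would be called with `levels=8`.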
There is, however, a problem with histogram equalization, and it stems mainly from the fact that it is a completely automatic technique with no parameters to set. At times, it can improve our ability to interpret an image dramatically. However, it is difficult to predict how beneficial equalization will be for any given image; in fact, it may not be of any use at all. This is because the improvement in contrast is optimal statistically, rather than perceptually. In images with narrow histograms and relatively few grey levels, a massive increase in contrast due to histogram equalisation can have the adverse effect of reducing perceived image quality. In particular, sampling or quantisation artefacts and image noise may become more prominent.
The alternative to obtaining the transformation (mapping) function automatically is histogram specification. In histogram specification, instead of requiring a flat histogram, we specify a particular shape explicitly. We might wish to do this in cases where it is desirable for a set of related images to have the same histogram, perhaps so that a particular operation produces the same results for all images.
Histogram specification can be visualised as a two-stage process. First, we transform the input image by equalisation into a temporary image with a flat histogram. Then we transform this equalised, temporary image into an output image possessing the desired histogram. The mapping function for the second stage is easily obtained. Since a rescaled version of the cumulative histogram can be used to transform a histogram with any shape into a flat histogram, it follows that the inverse of the cumulative histogram will perform the inverse transformation from a flat histogram to one with a specified shape.
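One common discrete way to combine the two stages is to match cumulative histograms directly: each input level is sent to the lowest output level whose target cumulative fraction is at least as large as the input's. The Python below is a sketch under that assumption, not the only possible formulation; the function names and the target shape are made up for illustration.

```python
def cumulative(counts):
    """Running totals of a histogram."""
    totals, running = [], 0
    for count in counts:
        running += count
        totals.append(running)
    return totals


def specification_mapping(source_counts, target_counts):
    """For each source level, find the lowest target level whose cumulative
    fraction reaches the source level's cumulative fraction."""
    source_cdf = cumulative(source_counts)
    target_cdf = cumulative(target_counts)
    source_total = source_cdf[-1]
    target_total = target_cdf[-1]
    mapping = []
    for s in source_cdf:
        level = 0
        # Compare s / source_total with t / target_total using integers only.
        while target_cdf[level] * source_total < s * target_total:
            level += 1
        mapping.append(level)
    return mapping


source = [800, 520, 970, 660, 330, 450, 260, 106]  # histogram from the question
target = [1, 2, 3, 4, 4, 3, 2, 1]                  # made-up desired shape

print(specification_mapping(source, target))
print(specification_mapping(source, source))  # same shape: nothing changes
```

When the target histogram equals the source histogram, the mapping is the identity, which is a quick sanity check for any implementation of this idea.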
For more details about histogram equalization and mapping functions, with C and C++ code, see:
https://programming-technique.blogspot.com/2013/01/histogram-equalization-using-c-image.html