I have 1024-bit binary representations of three handwritten digits: 0, 1 and 8.
Basically, the rows of each digit's 32x32 bitmap are concatenated to form one binary vector.
There are 50 binary vectors for each digit.
When we apply nearest neighbour to the digits, we can use the Hamming distance or some other metric to differentiate between the vectors.
Now I want to use another technique: instead of looking at every bit of a vector, I would like to compare vectors by analysing fewer bits.
For example, I know that when one compares the bitmaps (1024 bits each) of the digits '8' and '0', the vector of an '8' must have 1s in its middle, because the digit 8 visually appears as two zeros stacked in a column.
So our algorithm would look for the intersection of the two zeros (which would be the middle of the digit).
That's the way I want to work: I want to convert the low-level representation (the 1024-bit bitmap vector) into a high-level representation (consisting of two properties extracted from the bitmap).
Any suggestions? I hope the question is reasonably clear.
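For comparison, the Hamming-distance nearest-neighbour baseline described in the question can be sketched in a few lines of Python. This assumes each vector is a list of 0/1 ints and that the training set is a list of (label, vector) pairs; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour(query, training):
    """Return the label of the training vector closest to `query`.

    `training` is a list of (label, bits) pairs, e.g. 50 pairs per digit,
    where `bits` is a 1024-element list of 0/1 values.
    """
    label, _ = min(training, key=lambda item: hamming(query, item[1]))
    return label
```

Every comparison touches all 1024 bits, which is exactly the cost the question wants to avoid by working with a few higher-level properties instead.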
Idea 1: Flood fill
This idea does not use the 50 patterns you have per digit: it is based on the idea that usually a "1" has all 0-bits connected around that "1" shape, while a "0" separates the 0-bits inside it from those outside it, and an "8" has two such enclosed areas. So counting connected areas of 0-bits would identify which of the three it is.
So you could use a flood fill algorithm: start at any 0-bit in the vector and set all the 0-bits connected to it to 1. In a one-dimensional array you need to take care to identify connected bits correctly: either horizontally (1 position apart, but not crossing a row boundary, i.e. a multiple of 32) or vertically (32 positions apart). Of course, this flood-filling destroys the image, so make sure to work on a copy. If after one such flood-fill there are still 0-bits (which were therefore not connected to those you turned into 1), choose one of those and start a second flood-fill there. Repeat if necessary.
When all bits have been set to 1 in that way, use the number of flood-fills you had to perform, as follows:
One flood-fill? It's a "1", because all 0-bits are connected.
Two flood-fills? It's a "0", because the shape of a zero separates two areas (inside and outside).
Three flood-fills? It's an "8", because this shape separates three areas of connected 0-bits.
Of course, this process assumes that these handwritten digits are well-formed. For example, if an 8-shape has a small gap, like here:
...then it will be identified as a "0" rather than an "8". This particular problem could be resolved by identifying "loose ends" of 1-bits (a "line" that stops). When you find two of those a short distance apart, increase the number you got from flood-fill counting by 1 (as if those two ends were connected).
Similarly, if a "0" accidentally has a small second loop, like here:
...it will be identified as an "8" instead of a "0". You could prevent this particular problem by requiring that each flood-fill finds a minimum number of 0-bits (like at least 10 0-bits) to count as one.
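Idea 1 could be sketched in Python as follows, using a breadth-first flood fill over the flat 1024-element vector. It assumes 4-connectivity between 0-bits (left, right, up, down), and `min_size` implements the minimum-region rule from the last paragraph with the suggested value of 10; the loose-end refinement for gapped 8s is not included. The function names are hypothetical.

```python
from collections import deque

SIZE = 32  # the bitmap is SIZE x SIZE; its rows are concatenated into one vector


def count_zero_regions(bits, min_size=1):
    """Count 4-connected regions of 0-bits in a flat SIZE*SIZE bit vector.

    Regions with fewer than `min_size` zeros are not counted (the guard against
    accidental tiny loops). Works on a copy, so `bits` itself is not modified.
    """
    grid = list(bits)  # copy: the flood fill overwrites 0s with 1s
    regions = 0
    for start in range(len(grid)):
        if grid[start] != 0:
            continue
        # Breadth-first flood fill from this not-yet-filled 0-bit.
        grid[start] = 1
        queue = deque([start])
        filled = 0
        while queue:
            i = queue.popleft()
            filled += 1
            row, col = divmod(i, SIZE)
            candidates = (
                (i - 1, col > 0),            # left neighbour, same row only
                (i + 1, col < SIZE - 1),     # right neighbour, same row only
                (i - SIZE, row > 0),         # pixel above: 32 positions back
                (i + SIZE, row < SIZE - 1),  # pixel below: 32 positions ahead
            )
            for j, inside in candidates:
                if inside and grid[j] == 0:
                    grid[j] = 1
                    queue.append(j)
        if filled >= min_size:
            regions += 1
    return regions


def classify_by_regions(bits, min_size=10):
    """Map the region count to a digit: 1 -> "1", 2 -> "0", 3 -> "8", else None."""
    return {1: "1", 2: "0", 3: "8"}.get(count_zero_regions(bits, min_size))
```

Using 4-connectivity for the 0-bits means a diagonal one-pixel stroke of 1-bits still separates regions, which is the usual pairing with 8-connected strokes.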
Idea 2: probability vector
For each digit, add up the 50 example vectors you have, so that for each position you have a count between 0 and 50. You would have one such "probability" vector per digit: prob0, prob1 and prob8. If prob8[501] = 45, it means that it is highly probable (45/50) that an "8" vector will have a 1-bit at index 501.
Now transform these 3 probability vectors as follows: instead of storing a count per position, store the positions in order of decreasing count (probability). So if prob8[513] has the highest value (like 49), then that new array should start like [513, ...]. Let's call these new vectors A0, A8 and A1 (for the corresponding digit).
Finally, when you need to match a given input vector, go through A0, A1 and A8 simultaneously (always looking at the same index i in the three vectors) and keep 3 scores. When the input vector has a 1 at the position specified in A0[i], add 1 to score0. If it also has a 1 at the position specified in A1[i] (same i), add 1 to score1. Do the same with A8[i] for score8. Increment i and repeat. Stop iterating as soon as you have a clear winner, i.e. when the highest of score0, score1 and score8 exceeds the second highest by some threshold difference. At that point you know which digit is being represented.
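A minimal Python sketch of Idea 2, assuming each example is a list of 0/1 ints. The function names and the `margin` threshold are hypothetical; the answer leaves the threshold value open.

```python
def probability_vector(examples):
    """Per-position count of 1-bits over all example vectors of one digit."""
    return [sum(column) for column in zip(*examples)]


def positions_by_probability(prob):
    """Positions sorted by decreasing count, e.g. A8 built from prob8."""
    return sorted(range(len(prob)), key=lambda i: prob[i], reverse=True)


def classify_by_scores(query, ordered, margin=20):
    """Walk the ordered position lists in lockstep and score each digit.

    `ordered` maps a digit label to its position list (A0, A1, A8).
    Stops as soon as the leader is `margin` points ahead of the runner-up;
    if that never happens, the highest final score wins.
    """
    scores = {label: 0 for label in ordered}
    length = min(len(positions) for positions in ordered.values())
    for i in range(length):
        for label, positions in ordered.items():
            if query[positions[i]] == 1:
                scores[label] += 1
        best, second = sorted(scores.values(), reverse=True)[:2]
        if best - second >= margin:
            break
    return max(scores, key=scores.get)
```

In use, one would build `ordered = {"0": positions_by_probability(prob0), "1": ..., "8": ...}` once from the 50 examples per digit, then call `classify_by_scores` for each input vector. A larger `margin` reads more bits before deciding but makes fewer early mistakes.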
Some research authors write:
"First of all, the mean values of the three color components R, G, and B are removed to reduce the internal precision requirement of subsequent operations. Then, the YCbCr transform is used to concentrate most of the image energy into the Y component and reduce the correlation among R, G, and B components. Therefore, the Y component can be precisely quantified, while the Cb and Cr components can be roughly quantified, so as to achieve the purpose of compression without too much impact on the quality of reconstructed images."
So can someone explain the mean-removal part?
Removing the mean value of the R component means finding the mean (average) value of the R component and subtracting that from each R value. So if, for example, the R values were
204 204 192 200
then the mean would be 200. So you would adjust the values by subtracting 200 from each, yielding
4, 4, -8, 0
These values are smaller in magnitude than the original numbers, so the internal precision required to represent them is less.
(nb: this only helps if the values are not uniformly distributed across the available range already. But it doesn't hurt in any event, and most real world images don't have values that are uniformly distributed across the available range).
By removing the mean, you reduce the range of magnitudes needed.
To take an extreme example: if all pixels have the same value, whatever it is, removing the mean will convert everything to 0.
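The arithmetic in the answers above can be checked with a few lines of NumPy. The sketch also applies the YCbCr step from the quoted passage; the matrix used is the JPEG/JFIF (ITU-R BT.601) one without its +128 offsets, which is an assumption, since the quote does not say which YCbCr variant the authors use. The function names are hypothetical.

```python
import numpy as np


def remove_mean(channel):
    """Subtract the channel's average from every value; return (centred, mean)."""
    channel = np.asarray(channel, dtype=float)
    mean = channel.mean()
    return channel - mean, mean  # keep the mean so the decoder can add it back


# JPEG/JFIF (BT.601) RGB -> YCbCr matrix, without the +128 offsets because the
# inputs are already mean-removed. The paper's exact matrix is an assumption.
RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])


def to_ycbcr(r, g, b):
    """Convert mean-removed R, G, B arrays (same shape) to Y, Cb, Cr arrays."""
    rgb = np.stack([np.asarray(c, dtype=float) for c in (r, g, b)])
    ycc = np.tensordot(RGB_TO_YCBCR, rgb, axes=1)  # apply the 3x3 matrix per pixel
    return ycc[0], ycc[1], ycc[2]


centred, mean = remove_mean([204, 204, 192, 200])
print(mean, centred)  # 200.0 [ 4.  4. -8.  0.]
```

Because the Cb and Cr rows of the matrix each sum to zero, a perfectly gray input (R = G = B) puts all of its energy into Y. That is the decorrelation effect the quote describes; real images are close enough to this that Cb and Cr stay small and can be quantized coarsely.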
I don't understand part of this (quora: How does the last layer of a ConvNet connects to the first fully connected layer):
"Make an one hot representation of feature maps. So we would have 64 * 7 * 7 = 3136 input features which is again processed by a 3136 neurons reducing it to 1024 features. The matrix multiplication this layer would be (1x3136) * (3136x1024) => 1x1024"
I mean, what is the process to reduce 3136 inputs using 3136 neurons to 1024 features?
I'll explain how I understand it, in layman's terms.
One-hot representation is a way to represent categorical values as a matrix of 1s and 0s. This is a way for machines to read and process the data (in your example, an image). The network then performs its computations using matrix algebra.
The computation here multiplies a matrix of 1 row and 3136 columns by another matrix of 3136 rows and 1024 columns. When you multiply these two matrices, the result has 1 row and 1024 columns. This 1x1024 matrix is now the representation of your image that the next layer works with.
Hope I understood your question correctly.
You need to understand matrix multiplication. (1x3136) * (3136x1024) is a matrix multiplication, which requires the number of columns of the first factor (1x3136) to equal the number of rows of the second factor (3136x1024). The result is 1x1024 because the first factor's row count becomes the result's row count, while the second factor's column count becomes the result's column count.
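In code, the layer from the quote is a flatten followed by one matrix product. This NumPy sketch uses random numbers in place of trained weights (hypothetical values) and assumes a ReLU activation, which the quote does not mention. Each of the 1024 outputs is a weighted sum of all 3136 inputs, so the layer can be read as 1024 neurons holding 3136 weights each, and the activations are real numbers rather than only 0s and 1s.

```python
import numpy as np

rng = np.random.default_rng(0)

# Output of the last conv/pool layer: 64 feature maps of 7x7 (shape from the quote).
feature_maps = rng.standard_normal((64, 7, 7))

# Flatten ("unroll") the maps into one row vector of 64 * 7 * 7 = 3136 values.
x = feature_maps.reshape(1, -1)               # shape (1, 3136)

# Fully connected layer: one weight column per output neuron, plus a bias.
# Random values stand in for trained weights (hypothetical).
W = rng.standard_normal((3136, 1024)) * 0.01  # shape (3136, 1024)
b = np.zeros(1024)

y = x @ W + b                                 # (1x3136) @ (3136x1024) -> (1x1024)
y = np.maximum(y, 0)                          # ReLU, an assumed activation
print(x.shape, W.shape, y.shape)              # (1, 3136) (3136, 1024) (1, 1024)
```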
Also, check this:
https://www.khanacademy.org/math/precalculus/precalc-matrices/multiplying-matrices-by-matrices/v/multiplying-a-matrix-by-a-matrix
I have an image I.
I am trying to do automatic object extraction using quantum mechanics.
Each pixel in the image is treated as a potential field V(x, y), and each wave (eigen)function then represents a meaningful region.
2D time-independent Schrödinger equation:

$$-\frac{\hbar^2}{2m}\nabla^2\psi(x,y) + V(x,y)\,\psi(x,y) = E\,\psi(x,y)$$
Multiplying both sides by
We get,
Rewriting the Laplacian using a finite-difference approximation:
where $N_i$ is the set of neighbours of pixel $i$, and $|N_i|$ is the cardinality of $N_i$, i.e. the number of elements in it.
Combining the above two equations, we get:
where M is the number of elements in
Now, the left-hand side of the equation is a measure of how similar the labels in a neighbourhood are, i.e. a measure of spatial coherence.
To apply this to images, the potential V is given by the pixel intensities.
The right-hand side is a measure of how close the pixel values in a segment are to a constant value E.
Now, the wave functions can be computed numerically as the eigenvectors of the Hamiltonian operator in matrix form, which is
for i = j
for
and elsewhere 0
The paper says that we first have to find the maximum and minimum eigenvalues, and then compute the eigenvectors whose eigenvalues are closest to a number of values selected at regular intervals between the minimum and maximum eigenvalues. That number is 300.
I have calculated the 300 eigenvectors.
Then the absolute square of each eigenvector is thresholded to obtain the segments.
Fine up to this part.
Now, how do I reconstruct the eigenvectors into a 2D image so as to get the potential segments in the image?
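The reconstruction step is a reshape: if the Hamiltonian was built by numbering pixels in row-major order (i = row * cols + col), each eigenvector has one entry per pixel in that same order, so reshaping it to the image's (rows, cols) restores the 2-D layout. Because the matrix entries did not survive in the question, this NumPy sketch assumes the common finite-difference form H(i, i) = V(i) + |N_i|, H(i, j) = -1 for j in N_i, and 0 elsewhere, with a 4-neighbourhood and units chosen so that ħ²/2m = 1; the paper may scale these terms differently. It uses dense `np.linalg.eigh`, which only suits tiny images; for real images a sparse H with `scipy.sparse.linalg.eigsh` (whose `sigma` argument targets eigenvalues near a given value) would be the usual route. Function names are hypothetical.

```python
import numpy as np


def hamiltonian(V):
    """Dense Hamiltonian for image V (assumed form: -Laplacian + diag(V))."""
    rows, cols = V.shape
    H = np.zeros((rows * cols, rows * cols))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c  # row-major pixel index
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            H[i, i] = V[r, c] + len(neighbours)  # diagonal: potential + |N_i|
            for nr, nc in neighbours:
                H[i, nr * cols + nc] = -1.0      # off-diagonal: neighbours
    return H


def eigen_images(V, k):
    """Return k |psi|^2 maps, each reshaped back to V's 2-D shape."""
    V = np.asarray(V, dtype=float)
    vals, vecs = np.linalg.eigh(hamiltonian(V))  # ascending; columns are eigenvectors
    targets = np.linspace(vals[0], vals[-1], k)  # regularly spaced, min to max
    idx = [int(np.argmin(np.abs(vals - t))) for t in targets]
    # Same row-major order as in hamiltonian(), so reshape restores the layout.
    return [(vecs[:, j] ** 2).reshape(V.shape) for j in idx]
```

A segment mask is then, for example, `eigen_images(V, 300)[k] > threshold`, where the threshold value is a free choice.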
I want to determine image sharpness from the amount of high frequencies in the image. As far as I understand, the dft() function from OpenCV returns the real and imaginary parts of the result (as two matrices, or as one two-channel matrix).
This is where I am stuck. How can I determine the amount of high frequencies from this data?
I would be thankful for any hint or link that could give me a better understanding.
Compute the Fourier transform.
Calculate the magnitude of the result.
Now you have a 2D matrix. Consider the upper-left quadrant (for a real source the magnitude spectrum is point-symmetric, Magn[u][v] = Magn[n-u][n-v] with indices taken modulo n, so the lower-right quadrant mirrors it).
Here the Magn[0][0] entry corresponds to zero frequency, and the Magn[(n-1)/2][(n-1)/2] entry corresponds to the highest frequency.
The upper-left part of this submatrix contains the low-frequency samples, so you can compute the sum of values in this part and in the rest of the submatrix and compare these sums. For example (pseudocode):
cvIntegral(Magn, Rect(0..n/4, 0..n/4)) compare with
cvIntegral(Magn, Rect(0..n/2, 0..n/2)) - cvIntegral(Magn, Rect(0..n/4, 0..n/4))
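The answer's recipe can be sketched with NumPy's `np.fft.fft2` standing in for OpenCV's dft() (a swap made so the example needs only NumPy). It assumes a square grayscale image and, like the pseudocode, compares the n/4 x n/4 low-frequency corner of the upper-left quadrant with the rest of that quadrant. The zero-frequency term is included, so the ratio is mainly useful for comparing images with similar content. The function name is hypothetical.

```python
import numpy as np


def high_frequency_ratio(img):
    """Share of spectral magnitude outside the low-frequency corner (square image)."""
    img = np.asarray(img, dtype=float)
    n = img.shape[0]
    magn = np.abs(np.fft.fft2(img))          # magnitude of the 2-D DFT
    quad = magn[: n // 2 + 1, : n // 2 + 1]  # upper-left quadrant; [0, 0] is zero frequency
    low = quad[: n // 4, : n // 4].sum()     # low-frequency block
    high = quad.sum() - low                  # rest of the quadrant
    return high / (low + high)
```

A higher value means relatively more high-frequency energy, which usually indicates a sharper image; a blurred copy of the same image should typically score lower.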
I have some background in machine learning, and I just completed a face-identification exercise using a support vector machine. I am now trying to convert this exercise to use an HMM, but I am having problems understanding the notation and how to use it (I am using Kevin Murphy’s HMM package).
I am given about 50 grayscale images of 6 different people (numbered 1-6). Each image is 10 by 10 pixels, and each pixel has a value from 0 to 255 (8-bit grayscale). The goal is to be able to classify a new image as one of the 6 faces.
My approach is to turn each image into a vector of 100 elements, each a pixel value. Now I am getting to the confusing part. The notation I am using is as follows:
N: Number of hidden states. I understand that the hidden state is the person’s face (i.e. 1-6); therefore there are 6 hidden states, so N=6.
T: Length of the observation sequence. Is this equal to 50? I am not sure what this represents.
M: Number of observation symbols. Is this equal to 100? Does the term “observation symbol” refer to the number of elements in the vector representing the observation?
O: Number of observations. What does this represent? In every example they use a single binary observed value and set this to 2 (i.e. on or off). What would this be in my case?
I greatly appreciate the help
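To make the symbols concrete, here is a minimal discrete-HMM forward algorithm in Rabiner-style notation: N hidden states, M observation symbols, and one observation sequence of length T. It is not Kevin Murphy's toolbox API; the function name and argument layout are hypothetical, and whether the package's O plays the role of M here is an assumption worth checking in its documentation. Under one possible modelling choice, reading a 10x10 image pixel by pixel as a sequence, T would be 100 and M would be 256 (one symbol per gray level).

```python
import numpy as np


def forward_likelihood(pi, A, B, obs):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    Assumed notation:
      N hidden states, M observation symbols, T = len(obs)
      pi:  (N,)   initial state probabilities
      A:   (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
      B:   (N, M) emissions,   B[i, k] = P(symbol k | state i)
      obs: sequence of T symbol indices, each in range(M)
    """
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    alpha = pi * B[:, obs[0]]               # alpha_1(i) = pi_i * b_i(o_1)
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]  # alpha_{t+1}(j) = sum_i alpha_t(i) A[i, j] * b_j(o_{t+1})
    return alpha.sum()
```

With one such model trained per person, a new image would be assigned to the person whose model gives it the highest likelihood; this is one common design, not the only one.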