I was wondering , what happens to Huffman coding when the pixels are similar, so basically Huffman uses probability of each symbol and worth through it.
what happens if the image was like this:
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
ect.
does Huffman coding fails here?
No, it doesn't fail at all. If your images were stored with eight bits per pixel, now they will be stored with, on average, 2.67 bits per pixel. Compressed by a factor of three.
While the symbols all have equal probability, there are only six of them. That permits Huffman coding to use fewer bits per symbol.
Related
This is just basic theoretical question. so I read that a bit consist of 0 or 1. and a byte consists of 8 bits. and in 8 bit we can store 2^8 nos.
similarly in 10 bits we store 2^10 (1024). but then why do we say that 1024 is 1 kilo bytes, its actually 10 bits which just 1.25 byte to be exact.
please share some knowledge on it
just a concrete explanation.
Bit means like there are 8 bits in 1 byte, bit is the smallest unit of any storage or you can say the system and 8 bits sums up to 1 byte.
A bit, short for binary digit, is the smallest unit of measurement used in computers for information storage. A bit is represented by a 1 or a 0 with the value true or false, also known as on or off. A single byte of information, also known as an octet, is made up of eight bits. The size, or amount of information stored, distinguishes a bit from a byte.
A kilobit is 1,000 bits, but it is designated as 1024 bits in the binary system due to the amount of space required to store a kilobit using common operating systems and storage schemes. Most people, however, think of kilo as referring to 1,000 in order to remember what a kilobit is. A kilobyte then, would be 1,000 bytes.
Assume the 16 bit no. to be 256.
So,
byte 1 = Some binary no.
byte 2 = Some binary no.
But byte 1 also represents a 8 bit no.(Which could be an independent decimal number) and so does byte 2..
So how does the processor know that bytes 1,2 represent a single no. 256 and not two separate numbers
The processor would need to have another long type for that. I guess you could implement a software equivalent, but for the processor, these two bytes would still have individual values.
The processor could also have a special integer representation and machine instructions that handle these numbers. For example, most modern machines nowadays use twos-complement integers to represent negative numbers. In twos-complement, the most significant bit is used to differentiate negative numbers. So a twos-complement 8-bit integer can have a range of -128 (1000 0000) to 127 (0111 111).
You could easily have the most significant bit mean something else, so for example, when MSB is 0 we have integers from 0 (0000 0000) to 127 (0111 1111); when MSB is 1 we have integers from 256 (1000 0000) to 256 + 127 (1111 1111). Whether this is efficient or good architecture is another history.
I am trying to go through and understand some of VLFeat code to see how they generate the SIFT feature points. One thing that has me baffled early on is how they compute the number of octaves in their SIFT computation.
So according to the documentation, if one provides a negative value for the initial number of octaves, it will compute the maximum which is given by log2(min(width, height)). The code for the corresponding bit is:
if (noctaves < 0) {
noctaves = VL_MAX (floor (log2 (VL_MIN(width, height))) - o_min - 3, 1) ;
}
This code is in the function is in the vl_sift_new function. Here o_min is supposed to be the index of the first octave (I guess one does not need to start with the full resolution image). I am assuming this can be set to 0 in most use cases.
So, still I do not understand why they subtract 3 from this value. This seems very confusing. I am sure there is a good reason but I have not been able to figure it out.
The reason why they subtract by 3 is to ensure a minimum size of the patch you're looking at to get some appreciable output. In addition, when analyzing patches and extracting out features, depending on what algorithm you're looking at, there is a minimum size patch that the feature detection needs to get a good output and so subtracting by 3 ensures that this minimum patch size is met once you get to the lowest octave.
Let's take a numerical example. Let's say we have a 64 x 64 patch. We know that at each octave, the sizes of each dimension are divided by 2. Therefore, taking the log2 of the smallest of the rows and columns will theoretically give you the total number of possible octaves... as you have noticed in the above code. In our case, either the rows and columns are the minimum value, and taking the log2 of either the rows or columns gives us 7 octaves theoretically (log2(64) = 7). The octaves are arranged like so:
Octave | Size
--------------------
1 | 64 x 64
2 | 32 x 32
3 | 16 x 16
4 | 8 x 8
5 | 4 x 4
6 | 2 x 2
7 | 1 x 1
However, looking at octaves 5, 6 and 7 will probably not give you anything useful and so there's actually no point in analyzing those octaves. Therefore by subtracting by 3 from the total number of octaves, we will stop analyzing things at octave 4, and so the smallest patch to analyze is 8 x 8.
As such, this subtraction is commonly performed when looking at scale-spaces in images because this enforces that the last octave is of a good size to analyze features. The number 3 is arbitrary. I've seen people subtract by 4 and even 5. From all of the feature detection code that I have seen, 3 seems to be the most widely used number. So with what I said, it wouldn't really make much sense to look at an octave whose size is 1 x 1, right?
I am currently trying to implement a data path which processes an image data expressed in gray scale between unsigned integer 0 - 255. (Just for your information, my goal is to implement a Discrete Wavelet Transform in FPGA)
During the data processing, intermediate values will have negative numbers as well. As an example process, one of the calculation is
result = 48 - floor((66+39)/2)
The floor function is used to guarantee the integer data processing. For the above case, the result is -4, which is a number out of range between 0~255.
Having mentioned above case, I have a series of basic questions.
To deal with the negative intermediate numbers, do I need to represent all the data as 'equivalent unsigned number' in 2's complement for the hardware design? e.g. -4 d = 1111 1100 b.
If I represent the data as 2's complement for the signed numbers, will I need 9 bits opposed to 8 bits? Or, how many bits will I need to process the data properly? (With 8 bits, I cannot represent any number above 128 in 2's complement.)
How does the negative number division works if I use bit wise shifting? If I want to divide the result, -4, with 4, by shifting it to right by 2 bits, the result becomes 63 in decimal, 0011 1111 in binary, instead of -1. How can I resolve this problem?
Any help would be appreciated!
If you can choose to use VHDL, then you can use the fixed point library to represent your numbers and choose your rounding mode, as well as allowing bit extensions etc.
In Verilog, well, I'd think twice. I'm not a Verilogger, but the arithmetic rules for mixing signed and unsigned datatypes seem fraught with foot-shooting opportunities.
Another option to consider might be MyHDL as that gives you a very powerful verification environment and allows you to spit out VHDL or Verilog at the back end as you choose.
I want to compress many 32bit number using huffman compression.
Each number may appear multiple times, and I know that every number will be replaced with some bit sequences:
111
010
110
1010
1000
etc...
Now, the question: How many different numbers can be added to the huffman tree before the length of the binary sequence exceeds 32bits?
The rule of generating sequences (for those who don't know) is that every time a new number is added you must assign it the smallest binary sequence possible that is not the prefix of another.
You seem to understand the principle of prefix codes.
Many people (confusingly) refer to all prefix codes as "Huffman codes".
There are many other kinds of prefix codes -- none of them compress data into any fewer bits than Huffman compression (if we neglect the overhead of transmitting the frequency table), but many of them get pretty close (with some kinds of data) and have other advantages, such as running much faster or guaranteeing some maximum code length ("length-limited prefix codes").
If you have large numbers of unique symbols, the overhead of the Huffman frequency table becomes large -- perhaps some other prefix code can give better net compression.
Many people doing compression and decompression in hardware have fixed limits for the maximum codeword size -- many image and video compression algorithms specify a "length-limited Huffman code".
The fastest prefix codes -- universal codes -- do, in fact, involve a series of bit sequences that can be pre-generated without regard to the actual symbol frequencies. Compression programs that use these codes, as you mentioned, associate the most-frequent input symbol to the shortest bit sequence, the next-most-frequent input symbol to the next-shorted bit sequence, and so on.
For example, some compression programs use Fibonacci codes (a kind of universal code), and always associate the most-frequent symbol to the bit sequence "11", the next-most-frequent symbol to the bit sequence "011", the next to "0011", the next to "1011", and so on.
The Huffman algorithm produces a code that is similar in many ways to a universal code -- both are prefix codes.
But, as Cyan points out, the Huffman algorithm is slightly different than those universal codes.
If you have 5 different symbols, the Huffman tree will contain 5 different bit sequences -- however, the exact bit sequences generated by the Huffman algorithm depend on the exact frequencies.
One document may have symbol counts of { 10, 10, 20, 40, 80 }, leading to Huffman bit sequences { 0000 0001 001 01 1 }.
Another document may have symbol counts of { 40, 40, 79, 79, 80 }, leading to Huffman bit sequences { 000 001 01 10 11 }.
Even though both situations have exactly 5 unique symbols, the actual Huffman code for the most-frequent symbol is very different in these two compressed documents -- the Huffman code "1" in one document, the Huffman code "11" in another document.
If, however, you compressed those documents with the Fibonacci code, the Fibonacci code for the most-frequent symbol is always the same -- "11" in every document.
For Fibonacci in particular, the first 33-bit Fibonacci code is "31 zero bits followed by 2 one bits", representing the value F(33) = 3,524,578 .
And so 3,524,577 unique symbols can be represented by Fibonacci codes of 32 bits or less.
One of the more counter-intuitive features of prefix codes is that some symbols (the rare symbols) are "compressed" into much longer bit sequences.
If you actually have 2^32 unique symbols (all possible 32 bit numbers), it is not possible to gain any compression if you force the compressor to use prefix codes limited to 32 bits or less.
If you actually have 2^8 unique symbols (all possible 8 bit numbers), it is not possible to gain any compression if you force the compressor to use prefix codes limited to 8 bits or less.
By allowing the compressor to expand rare values -- to use more than 8 bits to store a rare symbol that we know can be stored in 8 bits -- or use more than 32 bits to store a rare symbol that we know can be stored in 32 bits -- that frees up the compressor to use less than 8 bits -- or less than 32 bits -- to store the more-frequent symbols.
In particular, if I use Fibonacci codes to compress a table of values,
where the values include all possible 32 bit numbers,
one must use Fibonacci codes up to N bits long, where F(N) = 2^32 -- solving for N I get
N = 47 bits for the least-frequently-used 32-bit symbol.
Huffman is about compression, and compression requires a "skewed" distribution to work (assuming we are talking about normal, order-0, entropy).
The worst situation regarding Huffman tree depth is when the algorithm creates a degenerated tree, i.e. with only one leaf per level. This situation can happen if the distribution looks like a Fibonacci serie.
Therefore, the worst distribution sequence looks like this : 1, 1, 1, 2, 3, 5, 8, 13, ....
In this case, you fill the full 32-bit tree with only 33 different elements.
Note, however, that to reach a 32 bit-depth with only 33 elements, the most numerous element must appear 3 524 578 times.
Therefore, since suming all Fibonacci numbers get you 5 702 886, you need to compress at least 5 702 887 numbers to start having a risk of not being able to represent them with a 32-bit huffman tree.
That being said, using an Huffman tree to represent 32-bits numbers requires a considerable amount of memory to calculate and maintain the tree.
[Edit] A simpler format, called "logarithm approximation", gives almost the same weight to all symbols. In this case, only the total number of symbols is required.
It computes very fast : say for 300 symbols, you will have some using 8 bits, and others using 9 bits. The formula to decide how many of each type :
9 bits : (300-256)*2 = 44*2 = 88 ;
8 bits : 300 - 88 = 212
Then you can distribute the numbers as you wish (preferably the most frequent ones using 8 bits, but that's not important).
This version scales up to 32 bits, meaning basically no restriction.