Huffman vs adaptive huffman - huffman-code

I know that adaptive Huffman performs better than the static Huffman algorithm, but I can't figure out why.
In Huffman, when you build a tree and encode your text, you must send the frequency of each letter along with the encoded text. The decoder then builds the same tree the encoder built and uses it to decode the message.
But in adaptive Huffman, when you build a tree and encode text, I guess you must send the built Huffman tree with the message? I may be wrong, but it seems easier to send a table of letter frequencies than a whole tree.
Where am I wrong?

No, you don't send the code. An adaptive Huffman code is adjusted incrementally using the data already received. That process is replicated on the receiving end.
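To make the "replicated on the receiving end" part concrete, here is a minimal sketch. Note that it is not the FGK or Vitter algorithm that real adaptive Huffman coders use to update the tree incrementally; it simply rebuilds a Huffman code from the running counts before every symbol, which is slow but shows the same idea. The function names, the byte alphabet, the initial count of 1 per symbol and passing the symbol count to the decoder are all assumptions made for the illustration.

```python
import heapq
from itertools import count


def build_code(freqs):
    """Return {symbol: bitstring} for a dict of symbol -> positive count."""
    tiebreak = count()  # unique counter so heap entries never compare nodes
    heap = [(f, next(tiebreak), sym) for sym, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:                        # leaf: a byte value
            codes[node] = prefix
    return codes


def encode(data: bytes) -> str:
    counts = {s: 1 for s in range(256)}  # both sides start from the same state
    out = []
    for s in data:
        out.append(build_code(counts)[s])  # code depends only on earlier symbols
        counts[s] += 1                     # update AFTER emitting the symbol
    return "".join(out)


def decode(bits: str, n: int) -> bytes:
    counts = {s: 1 for s in range(256)}  # same starting state as the encoder
    out = bytearray()
    pos = 0
    for _ in range(n):
        inverse = {c: s for s, c in build_code(counts).items()}
        end = pos + 1
        while bits[pos:end] not in inverse:  # prefix-free: first match is right
            end += 1
        s = inverse[bits[pos:end]]
        out.append(s)
        counts[s] += 1                       # same update as the encoder
        pos = end
    return bytes(out)
```

Calling `decode(encode(msg), len(msg))` gives back `msg`, and at no point is a tree or a frequency table transmitted: the decoder derives each code from the symbols it has already decoded.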

Related

Label dimension is too big: looking for an alternative to one-hot encoding

I am a beginner learning machine learning.
I am trying to build a model (FNN), and it has too many output labels to use one-hot encoding.
Could you help me?
I want to solve this problem:
The label data describes fruits:
Type (Apple, Grapes, Peach), Quality (Good, Normal, Bad), Price (Expensive, Normal, Cheap), Size (Big, Normal, Small)
So if I use one-hot encoding, the label size goes up to 3*3*3*3 = 81.
I think the label data looks like a sequence of 4 one-hot encodings.
Is there any way to represent the labels in a smaller dimension than an 81-dimensional one-hot encoding?
I think binary encoding could also be used, but I have read that it has some shortcomings in neural networks.
Thanks :D
If you one hot encode your 4 variables you will have 3+3+3+3=12 variables, not 81.
The concept is that you need to create a binary variable for every category in a categorical feature, not one for every possible combination of categories in the four features.
Nevertheless, other possible approaches are Numerical Encoding, Binary Encoding (as you mentioned), or Frequency Encoding (replace every category with its frequency in the dataset). The results often depend on the problem, so try different approaches and see which fits yours best!
But even if you use One-Hot-Encoding, as #DavideDn pointed out, you will have 12 features, not 81, which isn't a concerning number.
However, let's say the number was indeed 81, you could still use dimensionality reduction techniques (like Principal Component Analysis) to solve the problem.
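The 12-versus-81 count can be checked with a small sketch. The `FEATURES` dictionary and the `one_hot` helper are names made up for this illustration; the categories are the ones from the question.

```python
# Categories taken from the question; the dictionary layout is an assumption.
FEATURES = {
    "type": ["Apple", "Grapes", "Peach"],
    "quality": ["Good", "Normal", "Bad"],
    "price": ["Expensive", "Normal", "Cheap"],
    "size": ["Big", "Normal", "Small"],
}


def one_hot(sample):
    """Concatenate one 3-way one-hot block per feature (3 + 3 + 3 + 3 values)."""
    vector = []
    for name, categories in FEATURES.items():
        vector.extend(1 if sample[name] == c else 0 for c in categories)
    return vector


label = one_hot(
    {"type": "Grapes", "quality": "Good", "price": "Cheap", "size": "Big"}
)
```

Each feature gets its own block of three positions with exactly one 1 in it, so the vector has 12 entries. The figure 81 would only come up if every combination of the four features were treated as a single category.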

Applying neural network algorithms to encrypted data

I have an encrypted text dataset and I want to classify it using a neural network. I know that there is a pattern in the encrypted data.
Example of my input data:
diss%^ghghE(t dffd$#KL*vb xod##:n>did ....
My question is: should I treat the encrypted data as if it were normal text, create a vocabulary, and transform my data into sequences of indices?
Should I first clean all the special characters out of my data?
What I tried: I removed all special characters, created a vocabulary, and transformed my data into sequences, but I am getting very low accuracy. My model works well when the data is in natural language.
Any help is appreciated.
By definition, a good encryption algorithm will not allow you to learn anything[*] from the encrypted data.
So, unless you suspect that the encryption algorithm is weak, I suggest you abandon this idea.
[*] apart from the approximate size of the original text

How to use a fixed Huffman tree

I am using Huffman encoding to compress radar data. The data arrives at a rate of 30 fps. Each frame is divided into 9x64 data chunks, and each chunk is compressed in one go.
I do not want to transfer the Huffman tree along with the compressed data for decoding. Is there any way the tree can be fixed in advance?
Thank you!
You can simply take a large amount of your data, generate a Huffman code for that, and ... that's it. Just use that code on both sides.
If you want to get fancier, you can see if your data clusters statistically, and generate a handful of Huffman codes, one for each cluster. Then just send a few bits at the front of the data to select the Huffman code to use.
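A minimal sketch of the first suggestion, assuming the radar samples can be treated as bytes (the question does not say what the sample format is). The function names are invented for the example, and the `+ 1` on every count is an added assumption so that byte values missing from the training data still get a codeword.

```python
import heapq
from collections import Counter
from itertools import count


def build_fixed_code(training: bytes) -> dict:
    """Build one Huffman code from representative data, to be shared offline."""
    freqs = Counter(training)
    tiebreak = count()  # unique counter so heap entries never compare nodes
    # +1 so byte values absent from the training data remain encodable.
    heap = [(freqs[b] + 1, next(tiebreak), b) for b in range(256)]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:                        # leaf: a byte value
            codes[node] = prefix
    return codes


def encode(chunk: bytes, codes: dict) -> str:
    return "".join(codes[b] for b in chunk)


def decode(bits: str, codes: dict) -> bytes:
    inverse = {c: s for s, c in codes.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is a symbol
            out.append(inverse[current])
            current = ""
    return bytes(out)
```

The table returned by `build_fixed_code` is computed once and stored on both the sender and the receiver, so only the output of `encode` travels with each chunk. For the clustered variant, you would keep a short list of such tables and prepend a few selector bits to each chunk.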

How to obtain the decomposition of a Chinese character

I'm a complete beginner in character recognition as well as machine learning in general.
I want to write a program which is able to process the following input:
A Chinese character (in either pixel or vector format), for example:
The decomposition of the previous character, i.e. for the example above:
and and the information that they are aligned horizontally.
The decomposition of a Chinese character always consists of 3 things: 2 other characters and the pattern describing how the 2 characters form the initial character (this is called the composition kind). In the example above the composition kind is "aligned horizontally".
Given such an input, I want my program to tell which pixels or which contours in the initial character belong to which subcharacter in its decomposition.
Where to start?
Well, I can't say that I can provide a full answer, but think about:
1) Reading the papers on how the Google Translate app works. You know, when you point your iPhone's camera at text and it instantly translates the text (even preserving the fonts!). It supports Chinese, so it would be interesting for you to see whether they solved a similar task and how they did it.
2) Another big question to answer: how to prepare your input data. You will need to provide at least some input data, i.e. the decomposition of at least some characters. Try to do this manually for a couple of characters and try to formalize exactly what you are doing; this will help you formulate more precisely what you want your algorithm to do.
3) Try to use some deep neural net with your data from #2. Use something with convolution layers. Pre-train it with an RBM (restricted Boltzmann machine). After that, just take a really close look at the resulting neural network. Don't expect to get any good results, but looking into the ANN layers will help you understand what the net has learned from the data and might provide some insight into where to move next.

Java code for drawing a Huffman tree for encoding a sentence

I am able to Huffman encode and decode the sentence, but how do I draw the Huffman tree for it? Please help with Java code.
Also, after having constructed the tree, we should traverse the tree and create a dictionary of codewords.
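The traversal mentioned above can be sketched as follows. This is written in Python rather than Java to keep it short, but the same two recursive functions carry over to a Java node class directly. The tuple-based tree, the function names and the indented text "drawing" are all assumptions made for the illustration, not a particular library's API.

```python
import heapq
from collections import Counter
from itertools import count


def build_tree(text):
    """Return a Huffman tree: a leaf is a character, an internal node a pair."""
    tiebreak = count()  # unique counter so heap entries never compare nodes
    heap = [(f, next(tiebreak), ch) for ch, f in sorted(Counter(text).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]


def codewords(node, prefix="", table=None):
    """Walk the tree, appending 0 for left and 1 for right, and fill a dict."""
    if table is None:
        table = {}
    if isinstance(node, tuple):
        codewords(node[0], prefix + "0", table)
        codewords(node[1], prefix + "1", table)
    else:
        table[node] = prefix or "0"  # a one-symbol text still needs one bit
    return table


def render(node, prefix="", depth=0):
    """Return the tree as indented text lines, one line per node."""
    pad = "  " * depth
    if isinstance(node, tuple):
        lines = [f"{pad}* {prefix}"]
        lines += render(node[0], prefix + "0", depth + 1)
        lines += render(node[1], prefix + "1", depth + 1)
        return lines
    return [f"{pad}{node!r} -> {prefix}"]


tree = build_tree("abracadabra")
table = codewords(tree)
drawing = "\n".join(render(tree))
```

Printing `drawing` shows the tree with children indented under their parent and each leaf labelled with its codeword. For a graphical drawing the same recursion applies: draw the node, then recurse into the left and right children with shifted coordinates.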
