Java code for drawing a Huffman tree for encoding a sentence - huffman-code

I am able to Huffman encode and decode a sentence, but how do I draw the Huffman tree for it? Please help with Java code.
Also, after constructing the tree, I need to traverse it and create a dictionary of codewords.
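One way to approach this (a sketch, not an authoritative answer — `HuffmanNode`, `buildTree`, `draw`, and `codes` are names made up for illustration): build the tree with a priority queue, "draw" it by printing it sideways with an indented reverse in-order traversal, and collect the codeword dictionary with a pre-order walk that appends 0 for left edges and 1 for right edges.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Huffman tree node; ch is meaningful only for leaves.
class HuffmanNode implements Comparable<HuffmanNode> {
    final char ch;
    final int freq;
    final HuffmanNode left, right;

    HuffmanNode(char ch, int freq) { this(ch, freq, null, null); }
    HuffmanNode(char ch, int freq, HuffmanNode l, HuffmanNode r) {
        this.ch = ch; this.freq = freq; this.left = l; this.right = r;
    }
    boolean isLeaf() { return left == null && right == null; }
    public int compareTo(HuffmanNode o) { return Integer.compare(freq, o.freq); }
}

public class HuffmanDemo {
    // Standard Huffman construction: repeatedly merge the two lowest-frequency nodes.
    static HuffmanNode buildTree(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);
        PriorityQueue<HuffmanNode> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            pq.add(new HuffmanNode(e.getKey(), e.getValue()));
        while (pq.size() > 1) {
            HuffmanNode a = pq.poll(), b = pq.poll();
            pq.add(new HuffmanNode('\0', a.freq + b.freq, a, b));
        }
        return pq.poll();
    }

    // Prints the tree rotated 90 degrees: right subtree on top, root at the left margin.
    static void draw(HuffmanNode n, String indent, StringBuilder out) {
        if (n == null) return;
        draw(n.right, indent + "    ", out);
        out.append(indent)
           .append(n.isLeaf() ? "'" + n.ch + "':" + n.freq : "*:" + n.freq)
           .append('\n');
        draw(n.left, indent + "    ", out);
    }

    // Pre-order walk: 0 for a left edge, 1 for a right edge; leaves get their codeword.
    static void codes(HuffmanNode n, String prefix, Map<Character, String> dict) {
        if (n == null) return;
        if (n.isLeaf()) { dict.put(n.ch, prefix.isEmpty() ? "0" : prefix); return; }
        codes(n.left, prefix + "0", dict);
        codes(n.right, prefix + "1", dict);
    }
}
```

The sideways printout is the simplest console "drawing"; for a graphical version the same recursive traversal can position nodes on a `JPanel` instead of appending strings.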

Related

Constituency parse tree after processing

I use Stanford CoreNLP to get a constituency parse tree. I am wondering whether I should do this before or after pre-processing. In pre-processing I lowercase the characters, remove punctuation, remove stopwords (e.g., the, you're, ...), remove numbers, keep just alphabetic characters, and so on.
My task is to get a vector representation for each constituency parse tree by treating each leaf (i.e., token) as a vector embedding.
I am wondering how big a difference it makes if I get the constituency parse tree after pre-processing.
I would run the full pipeline without doing your custom processing. The parser is trained on data that has not had your pre-processing applied to it.

How can we use dependency parser output for text embeddings, or for feature extraction from text?

Knowing the dependencies between various parts of a sentence
can add information beyond what we get from the raw text. The question is how we can use this to get a good feature representation that can be fed into a classifier such as logistic regression, SVM, etc., just as TfidfVectorizer gives us a vector representation for text documents. I'd like to know what methods exist for building this kind of representation from dependency parser output.
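One simple baseline (a sketch of my own, not from the question): treat each (relation, head, dependent) triple emitted by the parser as a discrete feature and build a bag-of-dependencies count vector, analogous to what TfidfVectorizer does with words. The triples below are hand-written stand-ins for real parser output, not an actual CoreNLP API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Bag-of-dependencies: each triple becomes a string feature like "nsubj(barks,dog)",
// and a document is the count vector of its features over a fixed vocabulary.
public class DependencyFeatures {
    // Build the feature vocabulary from all triples seen in the corpus.
    static Map<String, Integer> vocabulary(List<String[]> corpusTriples) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String[] t : corpusTriples) {
            String feat = t[0] + "(" + t[1] + "," + t[2] + ")";
            vocab.putIfAbsent(feat, vocab.size());
        }
        return vocab;
    }

    // Count-vectorize one document's triples; unseen features are ignored.
    static int[] vectorize(List<String[]> docTriples, Map<String, Integer> vocab) {
        int[] vec = new int[vocab.size()];
        for (String[] t : docTriples) {
            Integer idx = vocab.get(t[0] + "(" + t[1] + "," + t[2] + ")");
            if (idx != null) vec[idx]++;
        }
        return vec;
    }
}
```

Backing off to coarser features (relation only, or relation plus dependent) reduces sparsity, and the resulting counts can be TF-IDF weighted exactly like word counts before being fed to the classifier.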

How to fix huffman tree

I am using Huffman encoding to compress radar data. The data arrives at a rate of 30 fps. Each frame is divided into 9x64 data chunks, and each chunk is compressed on its own.
I do not want to transfer the Huffman tree along with the compressed data for decoding. Is there any way the tree can be fixed?
Thank you!
You can simply take a large amount of your data, generate a Huffman code for that, and ... that's it. Just use that code on both sides.
If you want to get fancier, you can see if your data clusters statistically, and generate a handful of Huffman codes, one for each cluster. Then just send a few bits at the front of the data to select the Huffman code to use.
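A sketch of the first suggestion, assuming the symbols are small non-negative integers (the class and method names are made up): run `trainCodeTable` once, offline, on a large representative sample, then compile the resulting table into both the encoder and the decoder so the tree never travels with the data.

```java
import java.util.PriorityQueue;

// "Fixed" Huffman: derive one code table from representative training
// frequencies and share it between encoder and decoder at build time.
public class FixedHuffman {
    private static class Node implements Comparable<Node> {
        final int sym, freq;
        final Node l, r;
        Node(int sym, int freq, Node l, Node r) {
            this.sym = sym; this.freq = freq; this.l = l; this.r = r;
        }
        public int compareTo(Node o) { return Integer.compare(freq, o.freq); }
    }

    // Run this offline on symbol counts gathered from a large data sample.
    static String[] trainCodeTable(int[] sampleFreqs) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int s = 0; s < sampleFreqs.length; s++)
            if (sampleFreqs[s] > 0) pq.add(new Node(s, sampleFreqs[s], null, null));
        while (pq.size() > 1) {
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node(-1, a.freq + b.freq, a, b));
        }
        String[] table = new String[sampleFreqs.length];
        assign(pq.poll(), "", table);
        return table;
    }

    private static void assign(Node n, String prefix, String[] table) {
        if (n == null) return;
        if (n.l == null && n.r == null) {          // leaf: record its codeword
            table[n.sym] = prefix.isEmpty() ? "0" : prefix;
            return;
        }
        assign(n.l, prefix + "0", table);
        assign(n.r, prefix + "1", table);
    }
}
```

One caveat to plan for: any symbol that can ever occur must have a nonzero count in the training sample (or a pseudo-count of 1), otherwise it has no codeword at decode time. This is the same idea as the fixed Huffman codes built into DEFLATE.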

Huffman vs. adaptive Huffman

I know that adaptive Huffman performs better than the plain Huffman algorithm, but I can't figure out why.
In Huffman, when you build a tree and encode your text, you must send the frequency of each letter along with the encoded text. When decoding, you build the same tree you built while encoding, and then decode the message.
But in adaptive Huffman, when you build a tree and encode text, I guess you must send the built Huffman tree with the message? I may be wrong, but it seems easier to send a table of letter frequencies than a whole tree.
Where am I wrong?
No, you don't send the code. An adaptive Huffman code is adjusted incrementally using the data already received. That process is replicated on the receiving end.
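To make that symmetry concrete, here is a deliberately simplified sketch. It is not the real FGK/Vitter algorithm (those update the tree incrementally instead of rebuilding it per symbol), but it shows the key point: both sides start from identical counts and apply identical updates after each symbol, so no tree or table is ever transmitted.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Simplified adaptive coding: rebuild the Huffman code from the current
// counts before each symbol, then update the counts. The decoder runs
// the exact same steps, so its code table always matches the encoder's.
public class AdaptiveSketch {
    private static class Node implements Comparable<Node> {
        final int sym, freq;
        final Node l, r;
        Node(int sym, int freq, Node l, Node r) {
            this.sym = sym; this.freq = freq; this.l = l; this.r = r;
        }
        // Tie-break on sym so tree construction is fully deterministic.
        public int compareTo(Node o) {
            return freq != o.freq ? Integer.compare(freq, o.freq)
                                  : Integer.compare(sym, o.sym);
        }
    }

    final int[] counts;   // shared starting model: every symbol has count 1

    AdaptiveSketch(int alphabetSize) {
        counts = new int[alphabetSize];
        Arrays.fill(counts, 1);
    }

    String[] currentTable() {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int s = 0; s < counts.length; s++) pq.add(new Node(s, counts[s], null, null));
        while (pq.size() > 1) {
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node(Math.min(a.sym, b.sym), a.freq + b.freq, a, b));
        }
        String[] table = new String[counts.length];
        fill(pq.poll(), "", table);
        return table;
    }

    private static void fill(Node n, String prefix, String[] table) {
        if (n.l == null) { table[n.sym] = prefix.isEmpty() ? "0" : prefix; return; }
        fill(n.l, prefix + "0", table);
        fill(n.r, prefix + "1", table);
    }

    String encode(int sym) {
        String bits = currentTable()[sym];
        counts[sym]++;                      // update AFTER coding; decoder mirrors this
        return bits;
    }

    int decode(String bits) {
        String[] t = currentTable();
        for (int s = 0; s < t.length; s++)
            if (bits.equals(t[s])) { counts[s]++; return s; }
        throw new IllegalArgumentException("no matching codeword");
    }
}
```

Because the update uses only symbols already received, the decoder's counts never get ahead of what it has decoded, which is exactly why nothing extra needs to be sent.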

How to parse CFGs with arbitrary numbers of neighbors?

I'm working on a project that is trying to use context-free grammars for parsing images. We are trying to construct trees of image segments, then use machine learning to parse images using these visual grammars.
I have found SVM-CFG which looks ideal, the trouble is that it is designed for string parsing, where each terminal in the string has at most two neighbors (the words before and after). In our visual grammar, each segment can be next to an arbitrary number of other segments.
What is the best way to parse these visual grammars? Specifically, can I encode my data to use SVM-CFG? Or am I going to have to write my own Kernel/parsing library?
SVM-CFG is a specific implementation of the cutting plane optimization algorithm used in SVM-struct (described here http://www.cs.cornell.edu/People/tj/publications/tsochantaridis_etal_04a.pdf, Section 4).
At each step, the cutting plane algorithm calls a function to find the highest scoring structured output assignment (in SVM-CFG this is the highest scoring parse).
For one-dimensional strings, SVM-CFG runs a dynamic programming algorithm to find the highest scoring parse in polynomial time.
You could extend SVM-struct to return the highest scoring parse for an image, but no polynomial-time algorithm exists to do this!
Here is a reference for a state-of-the-art technique that parses images: http://www.socher.org/uploads/Main/SocherLinNgManning_ICML2011.pdf. They run into the same problem for finding the highest scoring parse of an image segmentation, so they use a greedy algorithm to find an approximate solution (see section 4.2). You might be able to incorporate a similar greedy algorithm into SVM-struct.
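For reference, the polynomial-time dynamic program mentioned above for one-dimensional strings is weighted CKY. A minimal sketch over log-scores, assuming a toy grammar in Chomsky normal form (the `Rule` class, the lexicon shape, and all scores here are made up for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Weighted CKY: for every span [i..j] and nonterminal, keep the best
// score of any parse of that span, combining adjacent sub-spans.
public class WeightedCky {
    static class Rule {                    // binary rule: lhs -> b c, with a score
        final String lhs, b, c;
        final double score;
        Rule(String lhs, String b, String c, double score) {
            this.lhs = lhs; this.b = b; this.c = c; this.score = score;
        }
    }

    // Returns the best log-score of a parse of `words` rooted at `root`,
    // or NEGATIVE_INFINITY if no parse exists.
    static double bestScore(String[] words,
                            Map<String, Map<String, Double>> lexicon, // word -> {tag: score}
                            List<Rule> rules, String root) {
        int n = words.length;
        // chart[i][j]: nonterminal -> best score over the span words[i..j]
        Map<String, Double>[][] chart = new HashMap[n][n];
        for (int i = 0; i < n; i++) {
            chart[i][i] = new HashMap<>(lexicon.getOrDefault(words[i], Map.of()));
            for (int j = i + 1; j < n; j++) chart[i][j] = new HashMap<>();
        }
        for (int len = 2; len <= n; len++)
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                for (int k = i; k < j; k++)            // split point
                    for (Rule r : rules) {
                        Double sb = chart[i][k].get(r.b);
                        Double sc = chart[k + 1][j].get(r.c);
                        if (sb == null || sc == null) continue;
                        chart[i][j].merge(r.lhs, sb + sc + r.score, Math::max);
                    }
            }
        return chart[0][n - 1].getOrDefault(root, Double.NEGATIVE_INFINITY);
    }
}
```

The cubic-time structure depends on every span having exactly two substrings at each split point; with image segments adjacent to arbitrarily many neighbors, there is no such linear span structure, which is why the greedy approximation above becomes necessary.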
