Is it possible to encode different blocks using different Huffman codes (fixed and dynamic) in the deflate algorithm? - huffman-code

Is it possible to encode different data blocks in a file using different Huffman codes (some blocks using fixed and some using dynamic)? If so, how does the deflate decompressor detect these different blocks?

Yes.
Each deflate block starts with a three-bit header indicating whether this is the last block (one bit) and which of three possible block types it is (two bits): 00 for stored (uncompressed), 01 for fixed Huffman codes, and 10 for dynamic Huffman codes. The decompressor reads this header before every block, so block types can be freely mixed within a single stream.
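As a minimal sketch in C of how a decoder reads that header, assuming the compressed stream is already in memory (the bit reader and the sample byte are invented for illustration; deflate packs bits starting from the least significant bit of each byte):

    #include <stdio.h>

    /* Hypothetical bit reader over an in-memory deflate stream. */
    typedef struct {
        const unsigned char *data;
        size_t pos;  /* absolute bit position in the stream */
    } BitReader;

    static unsigned read_bit(BitReader *br) {
        unsigned bit = (br->data[br->pos >> 3] >> (br->pos & 7)) & 1;
        br->pos++;
        return bit;
    }

    int main(void) {
        const unsigned char stream[] = { 0x05 };  /* sample first byte */
        BitReader br = { stream, 0 };

        unsigned bfinal = read_bit(&br);  /* 1 = this is the last block */
        unsigned btype  = read_bit(&br) | (read_bit(&br) << 1);

        /* 00 = stored, 01 = fixed Huffman, 10 = dynamic Huffman, 11 = reserved */
        printf("BFINAL=%u BTYPE=%u\n", bfinal, btype);
        return 0;
    }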

Related

xterm.js intercepting and replacing ANSI sequences

I need to intercept ANSI sequences received by xterm.js, mostly CSI sequences, in order to modify/customize them. What would be the best method to do so?
Is there an xterm.js API that could help me?
Examples of what I need to achieve:
<esc>[2m (SGR faint) replaced by a vivid color like cyan: <esc>[36m
<esc>[2J (erase screen) preceded by a sequence of newlines before the erase sequence, because the xterm.js implementation does not send anything erased to the scrollback buffer
Note: I can't modify the software generating the CSI sequences, and I am obliged to modify the behavior of the above sequences to match the exact behavior of an old client I am replacing.
Since I'm using WebSSH as a framework, I evaluated the opportunity to do this earlier, in the Python backend, in the on_read ssh channel.recv() handler. But that is challenging because the data is split over multiple messages, so an ANSI sequence can be split across two messages, and cleanly detecting incomplete ANSI sequences seems difficult to achieve.
Moreover, it sounds dirty to me to try to patch this text buffer on the backend.
That's why I think it would be much more efficient if I could do it directly from xterm.js, between its ANSI sequence detection and rendering.
I gave the xterm.js CSI hook registerCsiHandler a try, but it doesn't seem to allow me to change the whole sequence, much less add more data; I think it's not designed to suit my needs.
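For reference, a minimal sketch of that registerCsiHandler attempt (assuming the xterm.js parser API exposed on term.parser; returning true tells xterm.js the sequence has been handled):

    import { Terminal } from "xterm";

    const term = new Terminal();

    // Sketch: intercept SGR ('m') and substitute cyan for faint (param 2).
    term.parser.registerCsiHandler({ final: "m" }, (params) => {
      if (params.includes(2)) {
        term.write("\x1b[36m"); // emit the replacement sequence ourselves
        return true;            // claim the sequence, skip default handling
      }
      return false;             // let xterm.js handle everything else
    });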

When using stream's read-byte what kind of byte am I reading

If I understood this correctly, Common Lisp was standardized at a time when there were many different architectures with different opinions on the size of a byte. To that end, Common Lisp allows us to define the size of a byte.
For example, I can create an array of 8-bit bytes like this:
(make-array 10 :element-type '(unsigned-byte 8))
This works great and so far this knowledge has been enough for whatever I've been doing.
Today, though, I've been getting into using binary streams, and the read-byte function confuses me.
The CLHS says that read-byte "reads and returns one byte from stream",
but what kind of byte is this? The default platform byte? Can I specify this in any way?
Thanks, folks
For example, OPEN has an :element-type argument, which is implementation-defined; your Common Lisp implementation has more information about it. As said in the comments, (unsigned-byte 8) describes a stream of octets, which happens to be the size of a byte in most (all?) implementations. Thanks @Xach.
See also flexi-streams, which has make-external-format, and binary-types for custom binary encodings.
It is whatever the element type of the stream you read from indicates.
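A minimal sketch, assuming a hypothetical file data.bin: open it with an explicit :element-type, and READ-BYTE then returns exactly that type, here integers from 0 to 255:

    ;; Open the file as a stream of octets; READ-BYTE returns integers
    ;; in the range 0-255, as dictated by the stream's element type.
    (with-open-file (in "data.bin"
                        :direction :input
                        :element-type '(unsigned-byte 8))
      (loop for byte = (read-byte in nil nil)  ; NIL at end of file
            while byte
            collect byte))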

How to write and read float data fast, not using string?

I have a lot of float data generated from an image. I want to store it in a file, like XX.dat (as is usual in C), and then read it again to do further processing.
I have a method that represents each float as an NSString and writes it to a .txt file, but it is too slow. Is there some function that works like fwrite(*data, *pfile) and fread(*buf, *pfile) in C? Or some new idea?
Many thanks!
In iOS you can still make use of the standard low-level file (and socket, among other things) APIs, so you can use fopen(), fwrite(), fread(), etc. just as you would in any other C program.
This question has some examples of using the low-level file API on iOS: iPhone Unzip code
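A minimal sketch of that approach (the file name and count are placeholders):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const size_t count = 1024;  /* number of floats, placeholder */
        float *data = malloc(count * sizeof(float));
        for (size_t i = 0; i < count; i++)
            data[i] = (float)i * 0.5f;

        /* Write the raw bytes in a single call. */
        FILE *out = fopen("image.dat", "wb");
        if (out) {
            fwrite(data, sizeof(float), count, out);
            fclose(out);
        }

        /* Read them straight back into a float buffer. */
        float *buf = malloc(count * sizeof(float));
        FILE *in = fopen("image.dat", "rb");
        size_t got = in ? fread(buf, sizeof(float), count, in) : 0;
        if (in) fclose(in);

        printf("read %zu floats\n", got);
        free(data);
        free(buf);
        return 0;
    }

Since the file holds raw bytes, it is only portable across machines that share the same float size and endianness.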
Another option to consider is writing your floats into something like an NSMutableData instance and then writing that to a file. That will be faster than converting everything to strings (you'll get a binary file instead of a text one), though probably still not as fast as using the low-level APIs. And you'd probably have to use something like this to convert between floats and byte arrays.
If you are familiar with lower-level access, you could mmap your file and then access the data directly, just as you would any allocated memory.
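A minimal sketch of the mmap route (same hypothetical image.dat; iOS exposes these POSIX calls):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("image.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* The mapping can be indexed like an ordinary float array. */
        float *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        size_t count = st.st_size / sizeof(float);
        printf("%zu floats, first = %f\n", count, count ? data[0] : 0.0f);

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }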

Delphi TStringList wrapper to implement on-the-fly compression

I have an application that stores many strings in a TStringList. The strings will be largely similar to one another, and it occurs to me that one could compress them on the fly - i.e. store a given string as a mixture of unique text fragments plus references to previously stored fragments. String lists such as lists of fully-qualified paths and file names should compress greatly.
Does anyone know of a TStringList descendant that implements this - i.e. provides read and write access to the uncompressed strings but stores them internally compressed, so that TStringList.SaveToFile produces a compressed file?
While you could implement this by uncompressing the entire string list before each access and re-compressing it afterwards, that would be unnecessarily slow. I'm after something that is efficient for incremental operations and random "seeks" and reads.
TIA
Ross
I don't think there's any freely available implementation around for this (none that I know of, anyway, although I've written at least three similar constructs in commercial code), so you'd have to roll your own.
The remark Marcelo made about adding items in order is very relevant, as I suppose you'll probably want to compress the data at addition time - having quick access to entries already similar to the one being added gives much better performance than having to look up a "best fit" entry (needed for similarity compression) over the entire set.
Another thing you might want to read up about, are 'ropes' - a conceptually different type than strings, which I already suggested to Marco Cantu a while back. At the cost of a next-pointer per 'twine' (for lack of a better word) you can concatenate parts of a string without keeping any duplicate data around. The main problem is how to retrieve the parts that can be combined into a new 'rope', representing your original string. Once that problem is solved, you can reconstruct the data as a string at any time, while still having compact storage.
If you don't want to go the 'rope' route, you could also try something called 'prefix reduction', which is a simple form of compression - just start each string with the index of a previous string and the number of characters that should be treated as a prefix for the new string. Be aware that you should not recurse this too far back, or access speed will suffer greatly. In one simple implementation, I did a mod 16 on the index to establish the entry at which prefix reduction started, which gave me on average about 40% memory savings (this number is completely data-dependent, of course).
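A minimal Delphi sketch of prefix reduction (the type and function names are mine; a real implementation would also handle the mod-16 restart described above):

    type
      TPrefixEntry = record
        PrefixLen: Integer; // characters shared with the previous entry
        Suffix: string;     // only the differing tail is stored
      end;

    // Length of the common prefix of two strings.
    function CommonPrefixLen(const A, B: string): Integer;
    var
      Max: Integer;
    begin
      Result := 0;
      Max := Length(A);
      if Length(B) < Max then
        Max := Length(B);
      while (Result < Max) and (A[Result + 1] = B[Result + 1]) do
        Inc(Result);
    end;

    // Encode S against the previous (uncompressed) string.
    function Encode(const Prev, S: string): TPrefixEntry;
    begin
      Result.PrefixLen := CommonPrefixLen(Prev, S);
      Result.Suffix := Copy(S, Result.PrefixLen + 1, MaxInt);
    end;

    // Decode an entry by rebuilding from the previous decoded string.
    function Decode(const Prev: string; const Entry: TPrefixEntry): string;
    begin
      Result := Copy(Prev, 1, Entry.PrefixLen) + Entry.Suffix;
    end;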
You could try to wrap a Delphi or COM API around Judy arrays. The JudySL type would do the trick, and has a fairly simple interface.
EDIT: I assume you are storing unique strings and want to (or are happy to) store them in lexicographical order. If these constraints aren't acceptable, then Judy arrays are not for you. Mind you, any compression system will suffer if you don't sort your strings.
I suppose you expect general flexibility from the list (including the delete operation); in that case I don't know of any out-of-the-box solution, but I'd suggest one of two approaches:
You split your strings into words and keep a separate, growing dictionary to reference the words, so each string is stored internally as a list of indexes.
You implement something based on the zlib stream available in Delphi, but operating on blocks that each contain, for example, 10-100 strings. In this case you still have to recompress/decompress a complete block, but the "price" you pay is lower.
I don't think you really want to compress TStrings items in memory, because it is terribly inefficient. I suggest you look at the TStream implementations in the ZLib unit. Just wrap a regular stream in TDecompressionStream on load and TCompressionStream on save (you can even emit a gzip header there).
Hint: you will want to override LoadFromStream/SaveToStream instead of LoadFromFile/SaveToFile.
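A minimal sketch of that approach (standard Classes and ZLib units; the class name is mine). Since TDecompressionStream cannot report its size, the loader inflates into a memory stream first:

    uses
      Classes, ZLib;

    type
      TCompressedStringList = class(TStringList)
      public
        procedure SaveToStream(Stream: TStream); override;
        procedure LoadFromStream(Stream: TStream); override;
      end;

    procedure TCompressedStringList.SaveToStream(Stream: TStream);
    var
      Z: TCompressionStream;
    begin
      Z := TCompressionStream.Create(clDefault, Stream);
      try
        inherited SaveToStream(Z); // compressed on the way out
      finally
        Z.Free;
      end;
    end;

    procedure TCompressedStringList.LoadFromStream(Stream: TStream);
    var
      Z: TDecompressionStream;
      Buffer: TMemoryStream;
      Chunk: array[0..4095] of Byte;
      N: Integer;
    begin
      Z := TDecompressionStream.Create(Stream);
      Buffer := TMemoryStream.Create;
      try
        // Inflate in chunks, then let TStringList parse the plain text.
        repeat
          N := Z.Read(Chunk, SizeOf(Chunk));
          if N > 0 then
            Buffer.WriteBuffer(Chunk, N);
        until N = 0;
        Buffer.Position := 0;
        inherited LoadFromStream(Buffer);
      finally
        Buffer.Free;
        Z.Free;
      end;
    end;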

Using Haskell's Parsec to parse binary files?

Parsec is designed to parse textual information, but it occurs to me that Parsec could also be suitable for binary file format parsing, for complex formats that involve conditional segments, out-of-order segments, etc.
Does Parsec have the ability to do this, or is there a similar, alternative package that does? If not, what is the best way in Haskell to parse binary file formats?
The key tools for parsing binary files are:
Data.Binary
cereal
attoparsec
Binary is the most general solution, cereal can be great for limited data sizes, and attoparsec is perfectly fine for, e.g., packet parsing. All of these are aimed at very high performance, unlike Parsec. There are many examples on Hackage as well.
You might be interested in attoparsec, which was designed for this purpose, I think.
I've used Data.Binary successfully.
It works fine, though you might want to use Parsec 3, attoparsec, or iteratees. Parsec's reliance on String as its intermediate representation may bloat your memory footprint quite a bit, whereas the others can be configured to use ByteStrings.
Iteratees are particularly attractive because it is easier to ensure they won't hold onto the beginning of your input, and they can be fed chunks of data incrementally as they become available. This prevents you from having to read the entire input into memory in advance and lets you avoid other nasty workarounds like lazy IO.
The best approach depends on the format of the binary file.
Many binary formats are designed to make parsing easy (unlike text formats, which are primarily meant to be read by humans). So any union data type will be preceded by a discriminator that tells you what type to expect, all fields are either fixed-length or preceded by a length field, and so on. For this kind of data I would recommend Data.Binary; typically you create a matching Haskell data type for each type in the file, and then make each of those types an instance of Binary. Define the "get" method for reading; it returns a "Get" monad action which is basically a very simple parser. You will also need to define a "put" method.
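A minimal sketch of that pattern, using a hypothetical format in which a one-byte discriminator precedes each record (the type and its layout are invented for illustration):

    import Data.Binary (Binary (..), decode, encode)
    import Data.Binary.Get (getWord32be, getWord8)
    import Data.Binary.Put (putWord32be, putWord8)
    import Data.Word (Word32)

    -- Hypothetical union type: the on-disk tag byte says which
    -- variant follows, mirroring the discriminator idea above.
    data Record
      = Timestamp Word32       -- tag 0: one 32-bit big-endian field
      | Pair Word32 Word32     -- tag 1: two 32-bit big-endian fields
      deriving (Show)

    instance Binary Record where
      put (Timestamp t) = putWord8 0 >> putWord32be t
      put (Pair a b)    = putWord8 1 >> putWord32be a >> putWord32be b
      get = do
        tag <- getWord8
        case tag of
          0 -> Timestamp <$> getWord32be
          1 -> Pair <$> getWord32be <*> getWord32be
          _ -> fail ("unknown tag: " ++ show tag)

    -- Round-trip check: decode (encode (Pair 1 2)) :: Record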
On the other hand, if your binary data doesn't fit into this kind of world, then you will need attoparsec. I've never used that, so I can't comment further, but this blog post is very positive.
