Reed-Solomon encoder for embedded application (memory-efficient)

Reed-Solomon encoder for embedded application (memory-efficient) - memory

I am looking for a very memory-efficient (like max. 500 bytes of memory for lookup tables etc.) implementation of a Reed-Solomon encoder for use in an embedded application?
I am interested in coding blocks of 10 bytes with 5 bytes of parity. Speed is of little importance.
Do you know any freely available implementations that I can use for this purpose?
Thanks in advance.

Starting here:
http://www.eccpage.com/rs.c
You can pre-compute alpha_to, index_of, and gg
For the case in the example program that is 16+16+7 ints (do they need to be ints or will bytes work?) or 156 bytes
That example has 9 ints of data and 6 ints of ecc or 15 total, if these are 4 byte ints that is another 60 bytes, 216 total.
Or 54 bytes if this could be done with bytes only. I seem to remember it works with bytes.
The encoder routine itself has a modulo but you can probably replace that with an and depending on your lengths. If your embedded processor has a divide then that is probably not going to hurt you anyway. Otherwise the encoder routine is quite simple. I am thinking that you may approach 500 bytes with the tables, data, and code.
I dont remember how to get from the 9 data and 6 ecc of the example to the 10 and 5 you are looking for. Hopefully the code in the link above will give you a head start to what you are looking for.

Related

how to tell how many memory addresses a processor can generate

Let's say a computer can hold a word size of 26 bits, I'm curious to know how many memory addresses can the processor generate?
I'm thinking that the maximum number it can hold would be 2^26 - 1 and can have 2^26 unique memory addresses.
I'm also curious to know that if let's say that each cell in the memory has a size of 12 bits then how many bytes of memory can this processor address?
My understanding is that in most cases a processor can hold up to 32 bits which is 4 bytes and each byte is 8 bits. However, in this case, each byte would be 12 bits and the processor would be able to address 2^26/12 bytes of memory. Is that safe to say?

I'm thinking that the maximum number it can hold would be 2^26 - 1 and can have 2^26 unique memory addresses.
I agree.  We usually refer to this as the size of the address space.
As for the next question:
These days, the term byte is generally agreed to means 8 bits, so 12 bits would mean 1.5 bytes.  It is a matter of terminology, though, which has varied in the long past.
So, I would say 226 12-bit words is capable of holding/storing 226 * 1.5 bytes, though they are not individually addressable, and would have to be packed & unpacked to access the separate bytes.
The DEC PDP-8 computer was a 12 bit computer and word addressable, so there were multiple schemes for storing characters: two 6 bit characters in a 12 bit word, and also 1 & 1/2 8-bit characters in a 12-bit word, so three 8-bit characters in two 12-bit words.
Similar issues occur when storing packed booleans in a memory, where each boolean takes only a single bit, yet the processor can access a minimum of 8 bits at a time, so must extract a single bit from a larger datum.

Deflate and fixed Huffman codes

I'm trying to implement a deflate compressor and I have to decide whether to
compress a block using the static huffman code or create a dynamic one.
What is the rationale behind the length associated with the static code?
(this is the table included in the rfc)
Lit Value Bits
--------- ----
0 - 143 8
144 - 255 9
256 - 279 7
280 - 287 8
I thought static code was more biased towards plain ascii text, instead it
looks like it prefers by a tiny bit the compression of the rle length
What is a good heuristic to decide whether to use static code?
I was thinking to build a distribution of probabilities from a sample of the
input data and calculate a distance (maybe EMD?) from the probabilities derived
from the static code.

I would guess that the creator of the code took a large sample of literals and lengths from compressed data, likely including executables along with text, and found typical code lengths over the large set. They were then approximated with the table shown. However the author passed away many years ago, so we'll never know for sure.
You don't need a heuristic. Once you have done the work to find matching strings, it is comparatively very fast to compute the number of bits in the block for both a dynamic and static representation. Then simply pick the smaller one. Or the static one if equal (decodes faster).

I don't know about rationale, but there was a small amount of irrationale in choosing the static code lengths:
In the table in your question, the maximum static code number there is 287, but the DEFLATE specification only allows up to code 285, meaning code lengths have wastefully been assigned to two invalid codes. (And not even the longest ones either!) It's a similar story with the table for distance codes, with 32 codes having lengths assigned, but only 30 valid.
So there are some easy improvements that could have been made, but that said, without some prior knowledge of the data, it's not really possible to produce anything that's massively more efficient generally. The "flatness" of the table (no code longer than 9 bits) reduces the worst-case performance to 1 extra bit per byte of uncompressable data.
I think the main rationale behind the groupings is that by keeping group sizes to a multiple of 8, it's possible to tell which group a code belongs to by looking at the 5 most significant bits, which also tells you its length, along with what value to add to immediately get the code value itself
00000 00 .. 00101 11 7 bits + 256 -> (256..279)
00110 000 .. 10111 111 8 bits - 48 -> ( 0..144)
11000 000 .. 11000 111 8 bits + 78 -> (280..287)
11001 0000 .. 11111 1111 9 bits - 256 -> (144..255)
So in theory you could set up a lookup table with 32 entries to quickly read in the codes, but it's an uncommon case and probably not worth optimising for.
There are only really two cases (with some overlap) where Fixed Huffman blocks are likely to be the most efficient:
where the input size in bytes is very small, Static Huffman can be more efficient than Uncompressed, because Uncompressed uses a 32-bit header, while Fixed Huffman needs only a 7-bit footer, plus 1 bit potential overhead per byte.
where the output size is very small (ie. small-ish, highly compressible data), Static Huffman can be more efficient than Dynamic Huffman - again because Dynamic Huffman uses a certain amount of space for an additional header. (A practical minimum header size is difficult to calculate, but I'd say at least 64 bits, probably more.)
That said, I've found they are actually helpful from a developer's perspective, because it's very easy to implement a Deflate-compatible function using Static Huffman blocks, and to iteratively improve from there to get more efficient algorithms working.

JVM 64-bit different memory usages?

I've done some reading but I'm not entirely sure about one thing, for example how much memory would this use in JVM 64 bit(sorry if stupid question, but I'm a bit confused and don't know much about this):
MyObject[] myArray; - I know an array takes up 24 bytes, but how much will each element in this array take? is every element an object reference, meaning 8 byte per element? If not, how do I know how many bytes each element in this array needs?

Normally, that is when using heap sizes of less than 32 GB, the 64-bit JVM uses compressed oops which store object pointers as a 32-bit integer (scaled by three bits when used, since all objects are aligned to 8 bytes; see the link for details), so each element would actually only use 4 bytes.
If you use more than 32 GB of heap or otherwise turn off compressed oops, however, then each element will indeed use 8 bytes.
Also, I suspect that your statement on the array header being 24 bytes is wrong. To begin with, when compressing oops, the class reference in the header is also compressed, and the identity-hash-code and array length fields are 32-bit to begin with, so I suspect it is more likely to use 12 bytes. Even when using full-length oops, it should still only take 16 bytes. I can't find any hard source verifying either, however. In general, however, it should be said that Hotspot does not even use a fixed-size object header but one that varies in size depending on various circumstances of the object. This article describes some of those circumstances.
That is on the Hotspot JVM, at least. Since the JLS doesn't specify any primitive sizes, it could, theoretically, be anything on any given JVM, though 8 bytes are, of course, the most likely implementation choice.

Here is good information on how to calculate the memory usage of a Java array
For Example
let's consider a 10x10 int array. Firstly, the "outer" array has its 12-byte object header followed by space for the 10 elements. Those elements are object references to the 10 arrays making up the rows. That comes to 12+4*10=52 bytes, which must then be rounded up to the next multiple of 8, giving 56. Then, each of the 10 rows has its own 12-byte object header, 4*10=40 bytes for the actual row of ints, and again, 4 bytes of padding to bring the total for that row to a multiple of 8. So in total, that gives 11*56=616 bytes. That's a bit bigger than if you'd just counted on 10*10*4=400 bytes for the hundred "raw" ints themselves.

Max length for a dynamic array in Delphi 64?

My current question is related to Max length for a dynamic array in Delphi?. That question was asked in 2009 when the 64 bit compiler was not available. I am preparing migration to Delphi XE2 (or whatever version is available for purchase not) or to Lazarus because I need 64 bit support.
I would like to know what changed (related to dynamic array max length) in Delphi 64bit. Can I create bigger arrays now?

Dynamic array lengths are, in modern Delphi, NativeInt.
This means that dynamic arrays are limited in theory to 32 bit lengths in 32 bit code, and 64 bit length in 64 bit code. Of course, practical considerations mean that the limits are somewhat lower. However it is possible to allocate dynamic arrays with more than 232 elements in 64 bit code.
On the other hand, strings are subject to a 32 bit limit on their length for all architectures. As I understand it the reasoning is that strings are simply not expected to hold such large amounts of text. And many of the text support library functions that strings rely on use 32 bit lengths. Whereas arrays are used for more general purpose computing and a 32 bit limit would greatly reduce their utility under 64 bits.

Reading a bit from memory

I'm looking into reading single bits from memory (RAM, harddisk). My understanding was, one can not read less than a byte.
However I read someone telling it can be done with assembly.
I wan't the bandwidth usage to be as low as possible and the to be retrieved data is not sequential, so I can not read a byte and convert it to 8 bits.

I don't think the CPU will read less than the size of a cache line from RAM (64 bytes on recent Intel chips). From disk, the minimum is typically 4 kiB.
Reading a single bit at a time is neither possible nor necessary, since the data bus is much wider than that.

You cannot read less than a byte from any PC or hard disk that I know of. Even if you could, it would be extremely inefficient.
Some machines do memory mapped port io that can read/write less than a byte to the port, but it still shows up when you get it as at least a byte.
Use the bitwise operators to pick off specific bits as in:
char someByte = 0x3D; // In binary, 111101
bool flag = someByte & 1; // Get the first bit, 1
flag = someByte & 2; // Get the second bit, 0
// And so on. The number after the & operator is a power of 2 if you want to isolate one bit.
// You can also pick off several bits like so:
int value = someByte & 3; // Assume the lower 2 bits are interesting for some reason

It used to be, say 386/486 days, where a memory was a bit wide, 1 meg by 1 bit, but you will have 8 or some multiple number of chips, one for each bit lane on the bus, and you could only read in widths of the bus. today the memories are a byte wide and you can only read in units of 32 or 64 or multiples of those. Even when you read a byte, most designs fill in the whole byte. it adds unnecessarily complication/cost, to isolate the bus all the way to the memory, a byte read looks to most of the system as a 32 or 64 bit read, as it approaches the edge of the processor (sometimes physical pins, sometimes the edge of the core inside the chip) is when the individual byte lane is separated out and the other bits are discarded. Having the cache on changes the smallest divisible read size from the memory, you will see a burst or block of reads.
it is possible to design a memory system that is 8 bits wide and read 8 bits at a time, but why would you? unless it is an 8 bit processor which you probably couldnt take advantage of a 8bit by 2 gig memory. dram is pretty slow anyway, something like 133 mhz (even your 1600mhz memory is only short burst as you read from slow parts, memory has not gotten faster in over 10 years).
Hard disks are similar but different, I think sectors are the smallest divisible unit, you have to read or write in those units. so when reading you have a memory cycle on the processor, no different that going to a memory, and depending on the controller either before you do the read or as a result, a sector is read of the disk, into a buffer, not unlike a cache line read, then your memory cycle to the buffer in the disk controller either causes a bus width read and the processor divides it up or if the bus adds complexity to isolate byte lanes then you isolate a byte, but nobody isolates bit lanes. (I say the word nobody and someone will come back with an exception...)
most of this is well documented, not hard to find. For arm platforms look for the amba and/or axi specifications, freely downloaded. the number of bridges, pcie controllers, disk controller documents are all available for PCs and other platforms. it still boils down to an address and data bus or one goesouta and one goesinta data bus and some control signals that indicate the access type. some busses have byte lane enables, which is generally for a write not a read. If I want to write only a byte to a dram in a modern 64 bit system, I DO have to tell everyone almost all the way out to the dram what I want to write. To write a byte on a memory module which must be accessed 64 bits at a time, at a minimum a 64 bit read happens into a temporary place either the cache or the memory controller, then the byte to be written modifies the specific byte within the 64 bit word, then that 64 bit quantity, eventually, is written back to the memory module itself. You can do this using a combination of the address bits and a few control signals or you can just put 8 byte lane enables and the lower address bits can be ignored. Hard disk, same deal, have to read a sector, modify one byte, then eventually write the whole sector at a time. with flash and eeprom, you can only write zeros (from the programmers perspective), you erase to ones (from the programmers perspective, is actually a zero in the logic, there is an inversion) and a write has to be a sector at a time, sectors can be 64 bytes, 128 bytes, 256 bytes typically.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart