How do you convert 8-bit bytes to 6-bit characters? - character-encoding

I have a specific requirement to convert a stream of bytes into a character encoding that happens to be 6-bits per character.
Here's an example:
Input: 0x50 0x11 0xa0
Character Table:
010100 T
000001 A
000110 F
100000 SPACE
Output: "TAF "
Logically I can understand how this works:
Taking 0x50 0x11 0xa0 and showing as binary:
01010000 00010001 10100000
Which is "TAF ".
What's the best way to do this programmatically (pseudo code or c++). Thank you!

Well, every 3 bytes, you end up with four characters. So for one thing, you need to work out what to do if the input isn't a multiple of three bytes. (Does it have padding of some kind, like base64?)
Then I'd probably take each 3 bytes in turn. In C#, which is close enough to pseudo-code for C :)
for (int i = 0; i < array.Length; i += 3)
{
// Top 6 bits of byte i
int value1 = array[i] >> 2;
// Bottom 2 bits of byte i, top 4 bits of byte i+1
int value2 = ((array[i] & 0x3) << 4) | (array[i + 1] >> 4);
// Bottom 4 bits of byte i+1, top 2 bits of byte i+2
int value3 = ((array[i + 1] & 0xf) << 2) | (array[i + 2] >> 6);
// Bottom 6 bits of byte i+2
int value4 = array[i + 2] & 0x3f;
// Now use value1...value4, e.g. putting them into a char array.
// You'll need to decode from the 6-bit number (0-63) to the character.
}

Just in case if someone is interested - another variant that extracts 6-bit numbers from the stream as soon as they appear there. That is, results can be obtained even if less then 3 bytes are currently read. Would be useful for unpadded streams.
The code saves the state of the accumulator a in variable n which stores the number of bits left in accumulator from the previous read.
int n = 0;
unsigned char a = 0;
unsigned char b = 0;
while (read_byte(&byte)) {
// save (6-n) most significant bits of input byte to proper position
// in accumulator
a |= (b >> (n + 2)) & (077 >> n);
store_6bit(a);
a = 0;
// save remaining least significant bits of input byte to proper
// position in accumulator
a |= (b << (4 - n)) & ((077 << (4 - n)) & 077);
if (n == 4) {
store_6bit(a);
a = 0;
}
n = (n + 2) % 6;
}

Related

How can I cast a value into two bytes in Dafny?

I want to convert an integer from 0 to 65355 and for that I need a two byte representation. I'm trying to divide it by 2, 8 times, and sum the powers of 2 when the rest is one, and then cast that integer as a byte but I'm having problems meeting the restrictions of a byte (256). The second byte will be the rest of the 8th division and I'm having problems casting that as a byte too.
The following is my code for the previously described function method:
method convertBin(i:int) returns (b:seq<byte>)
requires 0<=i<=65535;
{
var b1:=0;
var q:=i;
var j:=0;
while j<8
invariant 0<=j<=8 && (b1 as int)< power(2,j)
decreases 8-j
{
var p:int;
if(q%2==1){
p:=power(2, j);
b1:=b1 + p;
q:=q/2;
}
j:=j+1;
}
b1:=b1 as byte;
b:=[b1]+[q as byte];
}
To complete your example, you need stronger loop invariants. But you don't need a loop at all, since there's no reason to divide only by 2.
Here's doing it with byte as a subset type:
type byte = x | 0 <= x < 256
method convertBin(i: int) returns (b1: byte, b0: byte)
requires 0 <= i < 0x1_0000
ensures i == 256 * b1 + b0
{
b1, b0 := i / 256, i % 256;
}
And here's the same program, but with byte being a newtype:
newtype byte = x | 0 <= x < 256
method convertBin(i: int) returns (b1: byte, b0: byte)
requires 0 <= i < 0x1_0000
ensures i == 256 * b1 as int + b0 as int
{
b1, b0 := (i / 256) as byte, (i % 256) as byte;
}
Rustan

Break up 32-bit hex value into 4 bytes [QB64]

I would want to ask, how do you break up a 32-bit hex (for example: CEED6644) into 4 bytes (var1 = CE, var2 = ED, var3 = 66, var4 = 44). In QB64 or QBasic. I would use this to store several data bytes into one array address.
Something like this:
DIM Array(&HFFFF&) AS _UNSIGNED LONG
Array(&HAA00&) = &HCEED6644&
addr = &HAA00&
SUB PrintChar
SHARED addr
IF var1 = &HAA& THEN PRINT "A"
IF var1 = &HBB& THEN PRINT "B"
IF var1 = &HCC& THEN PRINT "C"
IF var1 = &HDD& THEN PRINT "D"
IF var1 = &HEE& THEN PRINT "E"
IF var1 = &HFF& THEN PRINT "F"
IF var1 = &H00& THEN PRINT "G"
IF var1 = &H11& THEN PRINT "H"
And so on...
You could use integer division (\) and bitwise AND (AND) to accomplish this.
DIM x(0 TO 3) AS _UNSIGNED _BYTE
a& = &HCEED6644&
x(0) = (a& AND &HFF000000&) \ 2^24
x(1) = (a& AND &H00FF0000&) \ 2^16
x(2) = (a& AND &H0000FF00&) \ 2^8
x(3) = a& AND &HFF&
PRINT HEX$(x(0)); HEX$(x(1)); HEX$(x(2)); HEX$(x(3))
Note that you could alternatively use a generic RShift~& function instead of raw integer division since what you're really doing is shifting bits:
x(0) = RShift~&(a& AND &HFF000000&, 18)
...
FUNCTION RShift~& (value AS _UNSIGNED LONG, shiftCount AS _UNSIGNED BYTE)
' Raise illegal function call if the shift count is greater than the width of the type.
' If shiftCount is not _UNSIGNED, then you must also check that it isn't less than 0.
IF shiftCount > 32 THEN ERROR 5
RShift~& = value / 2^shiftCount
END FUNCTION
Building upon that, you might create another function:
FUNCTION ByteAt~%% (value AS _UNSIGNED LONG, position AS _UNSIGNED BYTE)
'position must be in the range [0, 3].
IF (position AND 3) <> position THEN ERROR 5
ByteAt~%% = RShift~&(value AND LShift~&(&HFF&, 8*position), 8*position)
END FUNCTION
Note that an LShift~& function was used that shifts bits to the left (multiplication by a power of 2). A potentially better alternative would be to perform the right-shift first and just mask the lower 8 bits, eliminating the need for LShift~&:
FUNCTION ByteAt~%% (value AS _UNSIGNED LONG, position AS _UNSIGNED BYTE)
'position must be in the range [0, 3].
IF (position AND 3) <> position THEN ERROR 5
ByteAt~%% = RShift~&(value, 8*position) AND 255
END FUNCTION
Incidentally, another QB-like implementation known as FreeBASIC has an actual SHR operator, used like MOD or AND, to perform a shift operation directly instead of using division, which is potentially faster.
You could also use QB64's DECLARE LIBRARY facility to create functions in C++ that will perform the shift operations:
/*
* Place in a separate "shift.h" file or something.
*/
unsigned int LShift (unsigned int n, unsigned char count)
{
return n << count;
}
unsigned int RShift (unsigned int n, unsigned char count)
{
return n >> count;
}
Here's the full corresponding QB64 code:
DECLARE LIBRARY "shift"
FUNCTION LShift~& (value AS _UNSIGNED LONG, shiftCount AS _UNSIGNED _BYTE)
FUNCTION RShift~& (value AS _UNSIGNED LONG, shiftCount AS _UNSIGNED _BYTE)
END DECLARE
x(0) = ByteAt~%%(a&, 0)
x(1) = ByteAt~%%(a&, 1)
x(2) = ByteAt~%%(a&, 2)
x(3) = ByteAt~%%(a&, 3)
END
FUNCTION ByteAt~%% (value AS _UNSIGNED LONG, position AS _UNSIGNED BYTE)
'position must be in the range [0, 3].
IF (position AND 3) <> position THEN ERROR 5
ByteAt~%% = RShift~&(value, 8*position) AND 255
END FUNCTION
If QB64 had a documented API, it might be possible to raise a QB64 error from the C++ code when the shift count is too high, rather than relying on the behavior of C++ to essentially ignore shift counts that are too high. Unfortunately, this isn't the case, and it might actually cause more problems than it's worth.
This snip gets the byte pairs of a hexidecimal value:
DIM Value AS _UNSIGNED LONG
Value = &HCEED6644&
S$ = RIGHT$("00000000" + HEX$(Value), 8)
PRINT "Byte#1: "; MID$(S$, 1, 2)
PRINT "Byte#2: "; MID$(S$, 3, 2)
PRINT "Byte#3: "; MID$(S$, 5, 2)
PRINT "Byte#4: "; MID$(S$, 7, 2)

How to set 5 bits to value 3 at offset 387 bit in byte data sequence?

I need set some bits in ByteData at position counted in bits.
How I can do this?
Eg.
var byteData = new ByteData(1024);
var bitData = new BitData(byteData);
// Offset in bits: 387
// Number of bits: 5
// Value: 3
bitData.setBits(387, 5, 3);
Yes it is quite complicated. I dont know dart, but these are the general steps you need to take. I will label each variable as a letter and also use a more complicated example to show you what happens when the bits overflow.
1. Construct the BitData object with a ByteData object (A)
2. Call setBits(offset (B), bits (C), value (D));
I will use example values of:
A: 11111111 11111111 11111111 11111111
B: 7
C: 10
D: 00000000 11111111
3. Rather than using an integer with a fixed length of bits, you could
use another ByteData object (D) containing your bits you want to write.
Also create a mask (E) containing the significant bits.
e.g.
A: 11111111 11111111 11111111 11111111
D: 00000000 11111111
E: 00000011 11111111 (2^C - 1)
4. As an extra bonus step, we can make sure the insignificant
bits are really zero by ANDing with the bitmask.
D = D & E
D 00000000 11111111
E 00000011 11111111
5. Make sure D and E contain at least one full zero byte since we want
to shift them.
D 00000000 00000000 11111111
E 00000000 00000011 11111111
6. Work out these two integer values:
F = The extra bit offset for the start byte: B mod 8 (e.g. 7)
G = The insignificant bits: size(D) - C (e.g. 14)
7. H = G-F which should not be negative here. (e.g. 14-7 = 7)
8. Shift both D and E left by H bits.
D 00000000 01111111 10000000
E 00000001 11111111 10000000
9. Work out first byte number (J) floor(B / 8) e.g. 0
10. Read the value of A at this index out and let this be K
K = 11111111 11111111 11111111
11. AND the current (K) with NOT E to set zeros for the new bits.
Then you can OR the new bits over the top.
L = (K & !E) | D
K & !E = 11111110 00000000 01111111
L = 11111110 01111111 11111111
12. Write L to the same place you read it from.
There is no BitData class, so you'll have to do some of the bit-pushing yourself.
Find the corresponding byte offset, read in some bytes, mask out the existing bits and set the new ones at the correct bit offset, then write it back.
The real complexity comes when you need to store more bits than you can read/write in a single operation.
For endianness, if you are treating the memory as a sequence of bits with arbitrary width, I'd go for little-endian. Endianness only really makes sense for full-sized (2^n-bit, n > 3) integers. A 5 bit integer as the one you are storing can't have any endianness, and a 37 bit integer also won't have any natural way of expressing an endianness.
You can try something like this code (which can definitely be optimized more):
import "dart:typed_data";
void setBitData(ByteBuffer buffer, int offset, int length, int value) {
assert(value < (1 << length));
assert(offset + length < buffer.lengthInBytes * 8);
int byteOffset = offset >> 3;
int bitOffset = offset & 7;
if (length + bitOffset <= 32) {
ByteData data = new ByteData.view(buffer);
// Can update it one read/modify/write operation.
int mask = ((1 << length) - 1) << bitOffset;
int bits = data.getUint32(byteOffset, Endianness.LITTLE_ENDIAN);
bits = (bits & ~mask) | (value << bitOffset);
data.setUint32(byteOffset, bits, Endianness.LITTLE_ENDIAN);
return;
}
// Split the value into chunks of no more than 32 bits, aligned.
do {
int bits = (length > 32 ? 32 : length) - bitOffset;
setBitData(buffer, offset, bits, value & ((1 << bits) - 1));
offset += bits;
length -= bits;
value >>= bits;
bitOffset = 0;
} while (length > 0);
}
Example use:
main() {
var b = new Uint8List(32);
setBitData(b.buffer, 3, 8, 255);
print(b.map((v)=>v.toRadixString(16)));
setBitData(b.buffer, 13, 6*4, 0xffffff);
print(b.map((v)=>v.toRadixString(16)));
setBitData(b.buffer, 47, 21*4, 0xaaaaaaaaaaaaaaaaaaaaa);
print(b.map((v)=>v.toRadixString(16)));
}

Huffman's encoding and decoding

I have to build a compressor based on the Huffman algorithm.
So far, I managed to create the tree with the frequencies of each character and generate a representation with a smaller number of bits for each character.
Is something like this, for the phrase "good this sugarplum":
'o' 000, '' 001, 't' 0100, 'r' 0101, 'p' 0110, 'm' 0111, 'l' 1000, 'i' 1001, 'h' 1010, 'd' 1011, 'a'1100, 'u' 1101, 'g' 1110, 's' 1111
The problem I'm having now is finding a way to save the tree in the archive, so I can rebuild it and then decompress the file.
Any suggestions?
I did some research but found it difficult to understand, so if you can explain in detail, I would appreciate it.
The code I used to read the frequencies from file is:
int main (int argc, char *argv[])
{
int i;
TipoSentinela *sentinela;
TipoLista *no = NULL;
Arv *arvore, *arvore2, *arvore3;
int *repete = (int *) calloc (256, sizeof(int));
if(argc == 2)
{
in = load_base(argv[1]);
le_dados_arquivo (repete); //read the frequencies from the file
sentinela = cria_lista (); //create a marker for the tree node list
for (i = 0; i < 256; i++)
{
if(repete[i] > 0 && i != 0)
{
arvore = arv_cria (Cria_info (i, repete[i])); //create a tree node with the character i and the frequence of it in the file
no = inicia_lista (arvore, no, sentinela); //create the list of tree nodes
}
}
Ordena (sentinela); //sort the tree nodes list by the frequencies
for(Seta_primeiro(sentinela); Tamanho_lista(sentinela) != 1; Move_marcador(sentinela))
{
Seta_primeiro(sentinela); //put the marker in the first element of the list
no = Retorna_marcador(sentinela);
arvore2 = Retorna_arvore (no); //return the tree represented by the list marker
Move_marcador(sentinela); //put the marker to the next element
arvore3 = Retorna_arvore (Retorna_marcador (sentinela)); //return the tree represented by the list marker
arvore = Cria_pai (arvore2, arvore3); //create a tree node that will contain the both arvore2 and arvore3
Insere_arvoreFinal (sentinela, arvore); //insert the node at the end of the list
Remove_arvore (sentinela); //remove the node arvore2 from the list
Remove_arvore (sentinela); //remove the node arvore3 from the lsit
Ordena (sentinela); //sort the list again
}
out = load_out(argv[1]); //open the output file
Codificacao (arvore); //generate the code from each node of the tree
rewind(in);
char c;
while(!feof(in))
{
c = fgetc(in);
if(c != EOF)
arvore2 = Procura_info (arvore, c); //search the character c in the tree
if(arvore2 != NULL)
imprimebit(Retorna_codigo(arvore2), out); //write the code in the file
}
fclose(in);
fclose(out);
free(repete);
arvore = arv_libera (arvore);
Libera_Lista(sentinela);
}
return 0;
}
//bit_counter and cur_byte are global variables
void write_bit (unsigned char bit, FILE *f)
{
static k = 0;
if(k != 0)
{
if(++bit_counter == 8)
{
fwrite(&cur_byte,1,1,f);
bit_counter = 0;
cur_byte = 0;
}
}
k = 1;
cur_byte <<= 1;
cur_byte |= ('0' != bit);
}
//aux is the code of a character in the tree
void imprimebit(char *aux, FILE *f)
{
int i, j;
if(aux == NULL)
return;
for(i = 0; i < strlen(aux); i++)
{
write_bit(aux[i], f); //write the bits of the code in the file
}
}
With this, I can write the code of all characters in the output file, but I can't see a way to store the tree too.
You don't need to send the tree. Just send the lengths. Then establish a consistent algorithm to convert the lengths to codes on both ends. The consistency is called a "canonical" Huffman code. You sort the codes by length, and within each length, sort by the symbol. Then assign codes starting at 0. So you would end up with (_ means space):
_ 000
o 001
a 0100
d 0101
g 0110
h 0111
i 1000
l 1001
m 1010
p 1011
r 1100
s 1101
t 1110
u 1111
I did found a way to store the code of each character.
For example:
I write the tree, starting by the root and going down to the left, then right.
So, if my tree was something like
0
/ \
0 1
/ \ / \
'a' 'b' 'c' 'd'
The header of my file would be someting like this:
001[8 bits from 'a']1[8 bits from b]01[8 bits from c]1[8 bits from d]
With this, I would be able to rebuild my tree.
My problem now is in read bit-by-bit of the header of file to know in wich direction I have to create a new node.

PGMidi changing pitch sendBytes example

I'm trying the second day to send a midi signal. I'm using following code:
int pitchValue = 8191 //or -8192;
int msb = ?;
int lsb = ?;
UInt8 midiData[] = { 0xe0, msb, lsb};
[midi sendBytes:midiData size:sizeof(midiData)];
I don't understand how to calculate msb and lsb. I tried pitchValue << 8. But it's working incorrect, When I'm looking to events using midi tool I see min -8192 and +8064 max. I want to get -8192 and +8191.
Sorry if question is simple.
Pitch bend data is offset to avoid any sign bit concerns. The maximum negative deviation is sent as a value of zero, not -8192, so you have to compensate for that, something like this Python code:
def EncodePitchBend(value):
''' return a 2-tuple containing (msb, lsb) '''
if (value < -8192) or (value > 8191):
raise ValueError
value += 8192
return (((value >> 7) & 0x7F), (value & 0x7f))
Since MIDI data bytes are limited to 7 bits, you need to split pitchValue into two 7-bit values:
int msb = (pitchValue + 8192) >> 7 & 0x7F;
int lsb = (pitchValue + 8192) & 0x7F;
Edit: as #bgporter pointed out, pitch wheel values are offset by 8192 so that "zero" (i.e. the center position) is at 8192 (0x2000) so I edited my answer to offset pitchValue by 8192.

Resources