Writing UInt16List via IOSink.Add, what's the result? - dart

Trying to write audio samples to a file.
I have a list of 16-bit ints:
Uint16List _samples = new Uint16List(0);
I add elements to this list as samples come in.
Then I can write to an IOSink like so:
IOSink _ios = ...
List<int> _toWrite = [];
_toWrite.addAll(_samples);
_ios.add(_toWrite);
or
_ios.add(_samples);
Both just work, with no type errors, even though add is declared to take a List<int> and not a Uint16List.
As I read, in Dart the 'int' type is 64 bit.
Are both writes above identical? Do they produce packed 16-bit ints in this file?

A Uint16List is-a List<int>. It's a list of integers which truncates writes to 16-bits, and always reads out 16-bit integers, but it is a list of integers.
If you copy those integers to a plain growable List<int>, it will contain the same integer values.
So, doing ios.add(_samples) will do the same as ios.add(_toWrite), and most likely neither does what you want.
The IOSink's add method expects a list of bytes. So, it will take a list of integers and assume that they are bytes. That means that it will only use the low 8 bits of each integer, which will likely sound awful if you try to play that back as a 16-bit audio sample.
If you want to store all 16 bits, you need to figure out how to store each 16-bit value in two bytes. The easy choice is to just assume that the platform byte order is fine, and do ios.add(_samples.buffer.asUint8List(_samples.offsetInBytes, _samples.lengthInBytes)). This will make a view of the 16-bit data as twice as many bytes, then write those bytes.
The endianness of those bytes (is the high byte first or last) depends on the platform, so if you want to be safe, you can convert the bytes to a fixed byte order first:
if (Endian.host == Endian.little) {
  // Host order is already little-endian: write a byte view of the samples.
  ios.add(
      _samples.buffer.asUint8List(_samples.offsetInBytes, _samples.lengthInBytes));
} else {
  // Big-endian host: copy each sample into a buffer in little-endian order.
  var byteData = ByteData(_samples.length * 2);
  for (int i = 0; i < _samples.length; i++) {
    byteData.setUint16(i * 2, _samples[i], Endian.little);
  }
  var littleEndianData = byteData.buffer.asUint8List(0, _samples.length * 2);
  ios.add(littleEndianData);
}
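To make the difference concrete outside Dart, here is a minimal Python sketch of the two behaviours described above: a sink that keeps only the low 8 bits of each integer, versus packing each sample into two little-endian bytes. The function names `low_bytes_only` and `pack_u16_le` are hypothetical and chosen only for illustration.

```python
import struct

def low_bytes_only(samples):
    """Mimic a byte sink that keeps only the low 8 bits of each integer."""
    return bytes(s & 0xFF for s in samples)

def pack_u16_le(samples):
    """Pack each sample as two bytes, least significant byte first."""
    return struct.pack("<%dH" % len(samples), *samples)

samples = [0x0001, 0x1234, 0xFFFE]
truncated = low_bytes_only(samples)  # one byte per sample, high halves lost
packed = pack_u16_le(samples)        # two bytes per sample, fixed byte order
```

The `<` prefix in the format string fixes the byte order regardless of the host, which plays the same role as passing `Endian.little` to `setUint16`.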

Related

Dart: split an arbitrarily precised number onto a sequence of bytes

Assuming I have a declaration like this: final int input = 0xA55AA9D2;, I'd like to get a list of [0xA5, 0x5A, 0xA9, 0xD2]. It is easily achievable in Java by just right shifting the input by 24, 16, 8 and 0 respectively with subsequent cast to byte in order to cut precision to 8-bit value.
But how to do the same with Dart? I can't find sufficient information about numbers encoding (e.g. in Java front 1 means minus, but how is minus encoded here?) and transformations (e.g. how to cut precision) in order to solve this task.
P.S.: I solved this for 32-bit numbers using out.add([value >> 24, (value & 0x00FFFFFF) >> 16, (value & 0x0000FFFF) >> 8, value & 0X000000FF]); but it feels incredibly ugly. I suspect the SDK provides a more convenient way to split an arbitrary-precision number into bytes.
The biggest issue here is that a Dart int is not the same type on the VM and in a browser.
On the native VM, an int is a 64-bit two's complement number.
In a browser, when compiled to JavaScript, an int is just a non-fractional double because JavaScript only has doubles as numbers.
If your code is only running on the VM, then getting the bytes (least significant byte first) is as simple as:
int number;
List<int> bytes = List.generate(8, (n) => (number >> (8 * n)) & 0xFF);
In JavaScript, bitwise operations only work on 32-bit integers, so you could do:
List<int> bytes = List.generate(4, (n) => (number >> (8 * n)) & 0xFF);
and get the byte representation of number.toSigned(32).
If you want a number larger than that, I'd probably use BigInt:
var bigNumber = BigInt.from(number).toSigned(64);
var b255 = BigInt.from(255);
List<int> bytes = List.generate(8, (n) => ((bigNumber >> (8 * n)) & b255).toInt());
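For comparison, the same shift-and-mask idea can be sketched in Python, whose integers are arbitrary precision, so the 64-bit versus 32-bit distinction does not arise. This sketch emits the most significant byte first, which is the order the question asks for. The helper names `to_bytes_be` and `to_bytes_twos_complement` are hypothetical.

```python
def to_bytes_be(value, width):
    """Split a non-negative integer into `width` bytes, most significant first."""
    return [(value >> (8 * (width - 1 - i))) & 0xFF for i in range(width)]

def to_bytes_twos_complement(value, width):
    """Bytes of a possibly negative integer in two's complement, big-endian."""
    return list(value.to_bytes(width, byteorder="big", signed=True))

parts = to_bytes_be(0xA55AA9D2, 4)
negative = to_bytes_twos_complement(-2, 4)
```

Masking with `0xFF` after the shift is what "cuts the precision" to 8 bits; no cast is needed.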
From the documentation to the int class:
The default implementation of int is 64-bit two's complement integers with operations that wrap to that range on overflow.
Note: When compiling to JavaScript, integers are restricted to values that can be represented exactly by double-precision floating point values. The available integer values include all integers between -2^53 and 2^53 ...
(Most modern systems use two's complement for signed integers.)
If you need your Dart code to work portably for both web and for VMs, you can use package:fixnum to use fixed-width 32- or 64-bit integers.
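The 2^53 limit quoted from the documentation can be checked with any IEEE 754 double. The short Python sketch below (Python floats are doubles) shows that integers stay distinct up to 2^53 and start to collide just above it. The variable names are illustrative only.

```python
LIMIT = 2 ** 53

# Below the limit, consecutive integers stay distinct as doubles.
exact_below = float(LIMIT - 1) != float(LIMIT)

# Just above it, LIMIT + 1 rounds to the same double as LIMIT.
collides_above = float(LIMIT + 1) == float(LIMIT)
```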

Parse array of unsigned integers in Julia 1.x.x

I am trying to open a binary file whose internal structure I partly know, and reinterpret it correctly in Julia. Let us say that I can load it already via:
arx=open("../axonbinaryfile.abf", "r")
databin=read(arx)
close(arx)
The data is loaded as an Array of UInt8, which I guess are bytes.
In the first 4 I can perform a simple Char conversion and it works:
head=databin[1:4]
map(Char, head)
4-element Array{Char,1}:
'A'
'B'
'F'
' '
Then, positions 13-16 hold a 32-bit integer waiting to be interpreted. How should I do that?
I have tried reinterpret() and Int32 as function, but to no avail.
You can use reinterpret(Int32, databin[13:16])[1]. The final [1] is needed because reinterpret returns a view rather than a single value.
Now note that read supports type passing. So if you first read 12 bytes of data from your file e.g. like this read(arx, 12) and then run read(arx, Int32) you will get the desired number without having to do any conversions or vector allocation.
Finally observe that what conversion to Char does in your code is converting a Unicode number to a character. I am not sure if this is exactly what you want (maybe it is). For example if the first byte read in has value 200 you will get:
julia> Char(200)
'È': Unicode U+00c8 (category Lu: Letter, uppercase)
EDIT one more comment is that when you do a conversion to Int32 of 4 bytes you should be sure to check if it should be encoded as big-endian or little-endian (see ENDIAN_BOM constant and ntoh, hton, ltoh, htol functions)
Here it is. Use view to avoid copying the data.
julia> dat = UInt8[65,66,67,68,0,0,2,40];
julia> Char.(view(dat,1:4))
4-element Array{Char,1}:
'A'
'B'
'C'
'D'
julia> reinterpret(Int32, view(dat,5:8))
1-element reinterpret(Int32, view(::Array{UInt8,1}, 5:8)):
671219712
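The byte-order caveat mentioned in the EDIT can be illustrated with the same eight bytes. This Python sketch, using only the standard `struct` module, reads the last four bytes once as little-endian and once as big-endian; the little-endian result matches the Julia output above, which is what a little-endian host would produce. The variable names are hypothetical.

```python
import struct

data = bytes([65, 66, 67, 68, 0, 0, 2, 40])

magic = data[0:4].decode("ascii")               # the four header characters
little = struct.unpack_from("<i", data, 4)[0]   # bytes 5-8 as little-endian Int32
big = struct.unpack_from(">i", data, 4)[0]      # the same bytes as big-endian Int32
```

If the file format specifies big-endian fields, the second reading is the one to use, whatever the host byte order is.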

Preparing data in TLV8

I'm writing a HomeKit (so perhaps Bluetooth) characteristic in TLV8 format. Apple doc says
The value is an NSData object containing a set of one or more TLV8's,
which are packed type-length-value items with an 8-bit type, 8-bit
length, and N-byte value.
According to Wikipedia, a type-length-value is
Type
A binary code, often simply alphanumeric, which indicates the kind of field that this part of the message represents;
Length
The size of the value field (typically in bytes);
Value
Variable-sized series of bytes which contains data for this part of the message.
I have no idea how to pack one. I suppose I can write raw bytes to NSData, but what do I write for pad, if I need any padding, etc. So is there an example of how to do that?
Oh I figured it out.
TLV8 consists of three sections: "Tag", "Length", and "Value". I don't know what 8 means.
Both tag and length are UInt8. I believe what the tag may be depends on where the TLV8 is used. Length is the length of the value. Value is the content itself.
So when I want to send a simple 1 as a value, I use:
import Foundation

let tag: UInt8 = 0x02 // For example
let length: UInt8 = 0x01 // Number of bytes in the value
let value: UInt8 = 0x01
let data = Data([tag, length, value]) // Bridges to NSData
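As a rough illustration of the layout described above, here is a minimal Python sketch that packs and unpacks a sequence of tag/length/value items, each with a one-byte tag and a one-byte length, concatenated without padding. The function names `encode_tlv8` and `decode_tlv8` and the tag values are hypothetical, and the sketch deliberately rejects values longer than 255 bytes rather than guessing how a given protocol splits them.

```python
def encode_tlv8(items):
    """Concatenate (tag, value) pairs as tag byte, length byte, value bytes."""
    out = bytearray()
    for tag, value in items:
        if not 0 <= tag <= 0xFF:
            raise ValueError("tag must fit in one byte")
        if len(value) > 0xFF:
            raise ValueError("values over 255 bytes are not handled in this sketch")
        out.append(tag)
        out.append(len(value))
        out.extend(value)
    return bytes(out)

def decode_tlv8(data):
    """Split a byte string back into a list of (tag, value) pairs."""
    items = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated header")
        tag, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated value")
        items.append((tag, bytes(value)))
        pos += 2 + length
    return items

packet = encode_tlv8([(0x02, b"\x01"), (0x03, b"hi")])
```

The first item reproduces the three bytes built in the Swift snippet: tag 0x02, length 0x01, value 0x01.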

Convert first two bytes of Lua string (in bigendian format) to unsigned short number

I want a Lua function that takes a string argument. The string has N+2 bytes of data: the first two bytes hold the length in big-endian format, and the remaining N bytes contain the data.
Say the data is "abcd". Then the string is 0x00 0x04 a b c d.
In the Lua function, this string is the input argument.
How can I calculate the length in the optimal way?
So far I have tried the code below:
function calculate_length(s)
  local len = string.len(s)
  if len >= 2 then
    local first_byte = s:byte(1)
    local second_byte = s:byte(2)
    -- len = ((first_byte & 0xFF) << 8) or (second_byte & 0xFF)
    len = second_byte
  else
    len = 0
  end
  return len
end
See the commented line (how I would have done in C).
In Lua how do I achieve the commented line.
The number of data bytes in your string s is #s-2 (assuming even a string with no data has a length of two bytes, each with a value of 0). If you really need to use those header bytes, you could compute:
len = first_byte * 256 + second_byte
When it comes to strings in Lua, a byte is a byte as this excerpt about strings from the Reference Manual makes clear:
The type string represents immutable sequences of bytes. Lua is 8-bit clean: strings can contain any 8-bit value, including embedded zeros ('\0'). Lua is also encoding-agnostic; it makes no assumptions about the contents of a string.
This is important if using the string.* library:
The string library assumes one-byte character encodings.
If the internal representation in Lua of your number is important, the following excerpt from the Lua Reference Manual may be of interest:
The type number uses two internal representations, or two subtypes, one called integer and the other called float. Lua has explicit rules about when each representation is used, but it also converts between them automatically as needed.... Therefore, the programmer may choose to mostly ignore the difference between integers and floats or to assume complete control over the representation of each number. Standard Lua uses 64-bit integers and double-precision (64-bit) floats, but you can also compile Lua so that it uses 32-bit integers and/or single-precision (32-bit) floats.
In other words, the 2 byte "unsigned short" C data type does not exist in Lua. Integers are stored using the "long long" type (8 byte signed).
Lastly, as lhf pointed out in the comments, bitwise operations were added to Lua in version 5.3, and if lhf is the lhf, he should know ;-)
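The arithmetic behind first_byte * 256 + second_byte can be sketched in Python as follows. The function name `split_length_prefixed` is hypothetical; it reads the two-byte big-endian header and then checks that the payload really has that many bytes.

```python
def split_length_prefixed(message):
    """Return (length, payload) for a 2-byte big-endian length prefix."""
    if len(message) < 2:
        raise ValueError("need at least two header bytes")
    # High byte times 256 plus low byte; equivalent to (hi << 8) | lo.
    length = message[0] * 256 + message[1]
    payload = message[2 : 2 + length]
    if len(payload) != length:
        raise ValueError("payload shorter than the declared length")
    return length, payload

length, payload = split_length_prefixed(b"\x00\x04abcd")
```

Multiplying by 256 and adding gives the same result as the shift-and-or in the commented C-style line, because the two bytes never overlap.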

Decoding Huffman file from canonical form

I am writing a Huffman file where I store the code lengths of the canonical codes in the header of the file. During decoding, I am able to regenerate the canonical codes and store them in a std::map<std::uint8_t, std::vector<bool>>. The actual data is read into a single std::vector<bool>. Before anyone suggests that I use std::bitset, let me clarify that Huffman codes have variable bit length, which is why I am using std::vector<bool>. So, given that I have my symbols and their corresponding canonical codes, how do I decode my file? I don't know where to go from here. Can someone explain how I would decode this file? I couldn't find anything suitable when searching.
You do not need to create the codes or the tree in order to decode canonical codes. All you need is the list of symbols in order and the count of symbols in each code length. By "in order", I mean sorted by code length from shortest to longest, and within each code length, sorted by the symbol value.
Since the canonical codes within a code length are sequential binary integers, you can simply do integer comparisons to see if the bits you have fall within that code range, and if it is, an integer subtraction to determine which symbol it is.
Below is code from puff.c (with minor changes) to show explicitly how this is done. bits(s, 1) returns the next bit from the stream. (This assumes that there is always a next bit.) h->count[len] is the number of symbols that are coded by length len codes, where len is in 0..MAXBITS. If you add up h->count[1], h->count[2], ..., h->count[MAXBITS], that is the total number of symbols coded, and is the length of the h->symbol[] array. The first h->count[1] symbols in h->symbol[] have length 1. The next h->count[2] symbols in h->symbol[] have length 2. And so on.
The values in the h->count[] array, if correct, are constrained to not oversubscribe the possible number of codes that can be coded in len bits. It can be further constrained to represent a complete code, i.e. there is no bit sequence that remains undefined, in which case decode() cannot return an error (-1). For a code to be complete and not oversubscribed, the sum of h->count[len] << (MAXBITS - len) over all len must equal 1 << MAXBITS.
Simple example: if we are coding e with one bit, t with two bits, and a and o with three bits, then h->count[] is {0, 1, 1, 2} (the first value, h->count[0], is not used), and h->symbol[] is {'e','t','a','o'}. Then the code for e is 0, the code for t is 10, a is 110, and o is 111.
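The simple example can be checked with a short Python sketch (this is not from puff.c; the names `canonical_codes` and `is_complete` are hypothetical). It assigns sequential codes within each length and evaluates the completeness condition, using the length of the count list minus one in place of MAXBITS.

```python
def canonical_codes(count, symbols):
    """Return {symbol: bit string}; count[n] is the number of codes of length n."""
    codes = {}
    code = 0
    index = 0
    for length in range(1, len(count)):
        for _ in range(count[length]):
            codes[symbols[index]] = format(code, "0%db" % length)
            code += 1
            index += 1
        code <<= 1  # move on to the next code length
    return codes

def is_complete(count):
    """True if the sum of count[n] << (max_bits - n) equals 1 << max_bits."""
    max_bits = len(count) - 1
    total = sum(count[n] << (max_bits - n) for n in range(1, max_bits + 1))
    return total == 1 << max_bits

count = [0, 1, 1, 2]
codes = canonical_codes(count, "etao")
complete = is_complete(count)
```

For this count array the sum is 4 + 2 + 2 = 8, which equals 1 << 3, so the code is complete.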
#define MAXBITS 15              /* maximum bits in a code */

struct huffman {
    short *count;       /* number of symbols of each length */
    short *symbol;      /* canonically ordered symbols */
};

int decode(struct state *s, const struct huffman *h)
{
    int len;            /* current number of bits in code */
    int code;           /* len bits being decoded */
    int first;          /* first code of length len */
    int count;          /* number of codes of length len */
    int index;          /* index of first code of length len in symbol table */

    code = first = index = 0;
    for (len = 1; len <= MAXBITS; len++) {
        code |= bits(s, 1);             /* get next bit */
        count = h->count[len];
        if (code - count < first)       /* if length len, return symbol */
            return h->symbol[index + (code - first)];
        index += count;                 /* else update for next length */
        first += count;
        first <<= 1;
        code <<= 1;
    }
    return -1;                          /* ran out of codes */
}
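To see the loop run on the e/t/a/o example, here is a Python transcription of the same logic, offered as a sketch rather than as part of puff.c. The bit stream is a plain list of 0/1 integers indexed by position instead of the bits(s, 1) call, and the names `decode_symbol` and `decode_all` are hypothetical. Like the C version, it assumes the input ends on a code boundary.

```python
def decode_symbol(bits, pos, count, symbols):
    """Decode one symbol starting at bits[pos]; return (symbol, next_pos)."""
    code = first = index = 0
    for length in range(1, len(count)):
        code |= bits[pos]            # get next bit
        pos += 1
        n = count[length]
        if code - n < first:         # code has this length: look up the symbol
            return symbols[index + (code - first)], pos
        index += n                   # else update for the next length
        first += n
        first <<= 1
        code <<= 1
    raise ValueError("ran out of codes")

def decode_all(bits, count, symbols):
    """Decode symbols until the bit list is exhausted."""
    out = []
    pos = 0
    while pos < len(bits):
        symbol, pos = decode_symbol(bits, pos, count, symbols)
        out.append(symbol)
    return "".join(out)

count = [0, 1, 1, 2]
symbols = "etao"
bits = [int(ch) for ch in "0" "10" "110" "111" "0"]
text = decode_all(bits, count, symbols)
```

No code table or tree is built: only the counts per length and the ordered symbol list are consulted.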
Your map contains the relevant information, but it maps symbols to codes.
Yet, the data you are trying to decode comprises codes.
Thus your map can't be used to efficiently get the symbols corresponding to the codes you read, since the lookup method expects a symbol. Searching for codes and retrieving the corresponding symbol would be a linear search.
Instead you should reconstruct the Huffman tree you constructed for the compression step.
The frequency values of the inner nodes are irrelevant here, but you will need the leaf nodes at the correct positions.
You can create the tree on the fly as you read your file header. Create an empty tree initially. For each symbol to code mapping you read, create the corresponding nodes in the tree.
E.g. if the symbol 'D' has been mapped to the code 101, then make sure there is a right child node at the root, which has a left child node, which has a right child node, which contains the symbol 'D', creating the nodes if they were missing.
Using that tree you can then decode the stream as follows (pseudo-code, assuming taking a right child corresponds to adding a 1 to the code):
// use a node variable to remember the position in the tree while reading bits
node n = tree.root
while (stream not fully read) {
    read next bit into boolean b
    if (b == true) {
        n = n.rightChild
    } else {
        n = n.leftChild
    }
    // check whether we are in a leaf node now
    if (n.leftChild == null && n.rightChild == null) {
        // n is a leaf node, thus we have read a complete code
        // add the corresponding symbol to the decoded output
        decoded.add(n.getSymbol())
        // reset the search
        n = tree.root
    }
}
Note that inverting your map to get the lookup into the correct direction will still result in suboptimal performance (compared to binary tree traversal) since it can't exploit the restriction to a smaller search space as the traversal does.
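The tree approach described in this answer can be sketched in runnable form as follows. This Python version builds the tree from a symbol-to-code map (codes given as strings of "0" and "1", with "1" meaning the right child) and then walks it bit by bit. The class and function names, and the example code table, are hypothetical.

```python
class Node:
    """A tree node; leaves carry a symbol, inner nodes carry children."""
    def __init__(self):
        self.left = None
        self.right = None
        self.symbol = None

def build_tree(codes):
    """Insert every code into the tree, creating missing nodes on the way."""
    root = Node()
    for symbol, code in codes.items():
        node = root
        for bit in code:
            if bit == "1":
                if node.right is None:
                    node.right = Node()
                node = node.right
            else:
                if node.left is None:
                    node.left = Node()
                node = node.left
        node.symbol = symbol
    return root

def decode(bit_string, root):
    """Walk the tree one bit at a time, emitting a symbol at each leaf."""
    out = []
    node = root
    for bit in bit_string:
        node = node.right if bit == "1" else node.left
        if node is None:
            raise ValueError("bit sequence is not a valid code")
        if node.left is None and node.right is None:
            out.append(node.symbol)  # complete code read
            node = root              # reset the search
    return "".join(out)

codes = {"A": "0", "B": "100", "D": "101", "C": "11"}
root = build_tree(codes)
text = decode("101" "0" "11" "100", root)
```

Here 'D' sits at right, left, right from the root, matching the 101 example in the text.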
