How to encode to UTF16 Little Endian in Dart?

I am attempting to manipulate some system variables used by a program using Dart. Dart's utf package has been discontinued, and I have not found any way to encode to UTF-16 Little Endian for a File.write. Is there a library that can convert a String to UTF-16 LE bytes in Dart? I would use the utf package anyway, but it is not null safe. I may end up trying to use the utf package source code, but I am checking here to see if there is a native (or pub) implementation I have missed, as I am new to the world of UTF and byte conversions.
My goal:
encodeAsUtf16le(String s);
I do not need to write a BOM.

Dart Strings internally use UTF-16. You can use String.codeUnits to get the UTF-16 code units and then write them in little-endian form:
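import 'dart:io';
import 'dart:typed_data';

var s = '\u{1F4A9}';
var codeUnits = s.codeUnits;
// Two bytes per UTF-16 code unit.
var byteData = ByteData(codeUnits.length * 2);
for (var i = 0; i < codeUnits.length; i += 1) {
  // Write each code unit low byte first.
  byteData.setUint16(i * 2, codeUnits[i], Endian.little);
}
var bytes = byteData.buffer.asUint8List();
// The await must run inside an async function.
await File('output').writeAsBytes(bytes);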
or assume that you're running on a little-endian system:
var s = '\u{1F4A9}';
var codeUnits = s.codeUnits;
var bytes = Uint16List.fromList(codeUnits).buffer.asUint8List();
await File('output').writeAsBytes(bytes);
Also see https://stackoverflow.com/a/67802971/, which is about the reverse direction: decoding UTF-16LE bytes to Strings.
I also feel compelled to advise against writing UTF-16 to disk unless you're forced to by external requirements.
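To sanity-check the bytes that either Dart snippet writes, it can help to reproduce the encoding in another language. The following is a minimal Python sketch, not part of the original answer; the function name encode_utf16le is made up for illustration. It splits each code point into UTF-16 code units (a surrogate pair for anything above U+FFFF) and writes each unit low byte first, without a BOM.

```python
import struct


def encode_utf16le(s: str) -> bytes:
    """Encode a string as UTF-16LE without a BOM (illustrative helper)."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp >= 0x10000:
            # Code points above the BMP become a high/low surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        else:
            units.append(cp)
    # "<H" packs one unsigned 16-bit value in little-endian order.
    return b"".join(struct.pack("<H", u) for u in units)


data = encode_utf16le("\U0001F4A9")
print(data.hex(" "))  # 3d d8 a9 dc
```

For well-formed strings this should match Python's built-in s.encode("utf-16-le"), so the output file from the Dart code can be compared against either.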

Related

Wireshark Lua dissector utf16 string

I am writing a custom Wireshark Lua dissector. One field in the dissector is a UTF16 string. I tried to specify this field with
msg_f = ProtoField.string("mydissector.msg", "msg", base.UNICODE)
local getMsg = buffer(13) -- starting on byte 13
subtree:add_le(m.msg_f, getMsg)
However, this only adds the first character rather than the whole string. It also raises an Expert Info warning: "Undecoded trailing/stray characters".
What is the correct way to parse a UTF16 string?
You haven't specified the range of bytes that comprises the string. This is typically determined by either an explicit length field or by a NULL-terminator. The exact method of determining the range is dependent upon the particular protocol and field in question.
An example of each type:
If there's a length field, say of 1 byte in length that precedes the string, then you can use something like:
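-- Read the 1-byte length field, then add exactly that byte as the length item.
local str_len = buffer(13, 1):le_uint()
subtree:add_le(m.msg_len_f, buffer(13, 1))
if str_len > 0 then
    -- The string occupies the str_len bytes that follow the length field.
    subtree:add_le(m.msg_f, buffer(14, str_len))
end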
And if the string is NULL-terminated, you can use something like:
local str = buffer(13):stringz()
local str_len = str:len()
subtree:add_le(m.msg_f, buffer(13, str_len + 1))
These are just pseudo-examples, so you'll need to apply whatever method, possibly none of these, to fit your data.
Refer to Wireshark's Lua API Reference Manual for more details, or to the Wireshark Lua API wiki pages.
The solution I came up with is simply:
msg_f = ProtoField.string("mydissector.msg", "msg")
local getMsg = buffer(13) -- starting on byte 13
local msg = getMsg:le_ustring()
subtree:add(msg_f, getMsg, msg)
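The two ways of delimiting a string mentioned above (an explicit length field or a terminator) can be sketched outside Wireshark as well. This is a minimal Python illustration under stated assumptions, not dissector code: the function names are hypothetical, the length field is assumed to be one byte counting bytes rather than characters, and the terminator is assumed to be a 16-bit zero, since a single zero byte appears inside most UTF-16 text.

```python
def read_length_prefixed(buf: bytes, offset: int) -> str:
    """Assumes buf[offset] is a 1-byte length (in bytes) followed by UTF-16LE data."""
    n = buf[offset]
    return buf[offset + 1 : offset + 1 + n].decode("utf-16-le")


def read_null_terminated(buf: bytes, offset: int) -> str:
    """Assumes UTF-16LE data ending in a two-byte 0x0000 terminator."""
    end = offset
    # Step two bytes at a time so a lone 0x00 inside a code unit is not
    # mistaken for the terminator.
    while end + 1 < len(buf) and buf[end:end + 2] != b"\x00\x00":
        end += 2
    return buf[offset:end].decode("utf-16-le")


header = b"\xff" * 13  # 13 bytes of unrelated data before the field
payload = "hi".encode("utf-16-le")
print(read_length_prefixed(header + bytes([len(payload)]) + payload, 13))
print(read_null_terminated(header + payload + b"\x00\x00", 13))
```

Either way, the point is the same as in the answers: the decoder needs to know exactly which bytes belong to the string before it can decode them as UTF-16.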

Writing UInt16List via IOSink.Add, what's the result?

Trying to write audio samples to a file.
I have a list of 16-bit ints:
Uint16List _samples = Uint16List(0);
I add elements to this list as samples come in.
Then I can write to an IOSink like so:
IOSink _ios = ...
List<int> _toWrite = [];
_toWrite.addAll(_samples);
_ios.add(_toWrite);
or
_ios.add(_samples);
Both just work, with no type issues, even though the signature of add takes a List<int> and not a Uint16List.
As I read, in Dart the 'int' type is 64 bit.
Are both writes above identical? Do they produce packed 16-bit ints in this file?
A Uint16List is-a List<int>. It's a list of integers which truncates writes to 16 bits and always reads out 16-bit integers, but it is still a list of integers.
If you copy those integers to a plain growable List<int>, it will contain the same integer values.
So, doing ios.add(_samples) will do the same as ios.add(_toWrite), and most likely neither does what you want.
The IOSink's add method expects a list of bytes. So, it will take a list of integers and assume that they are bytes. That means that it will only use the low 8 bits of each integer, which will likely sound awful if you try to play that back as a 16-bit audio sample.
If you want to store all 16 bits, you need to figure out how to store each 16-bit value in two bytes. The easy choice is to just assume that the platform byte order is fine, and do ios.add(_samples.buffer.asUint8List(_samples.offsetInBytes, _samples.lengthInBytes)). This will make a view of the 16-bit data as twice as many bytes, then write those bytes.
The endianness of those bytes (is the high byte first or last) depends on the platform, so if you want to be safe, you can convert the bytes to a fixed byte order first:
if (Endian.host == Endian.little) {
  // Host order is already little-endian: write a byte view of the samples.
  ios.add(
      _samples.buffer.asUint8List(_samples.offsetInBytes, _samples.lengthInBytes));
} else {
  // Otherwise, re-encode each sample explicitly as little-endian.
  var byteData = ByteData(_samples.length * 2);
  for (int i = 0; i < _samples.length; i++) {
    byteData.setUint16(i * 2, _samples[i], Endian.little);
  }
  var littleEndianData = byteData.buffer.asUint8List(0, _samples.length * 2);
  ios.add(littleEndianData);
}
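The difference between the two outcomes described above (keeping only the low 8 bits of each value versus storing both bytes) is easy to see on a few sample values. This is a small Python sketch for illustration only; the sample values and variable names are made up, and the masking line merely mimics the truncation the answer describes rather than calling any Dart API.

```python
import struct

samples = [0x0102, 0xABCD, 0xFFFF]  # hypothetical 16-bit audio samples

# Treating each value as a single byte keeps only its low 8 bits.
truncated = bytes(v & 0xFF for v in samples)

# Packing each value as an unsigned 16-bit little-endian integer keeps
# both bytes, low byte first.
packed = struct.pack(f"<{len(samples)}H", *samples)

print(truncated.hex(" "))  # 02 cd ff
print(packed.hex(" "))     # 02 01 cd ab ff ff
```

The packed form is twice as long as the truncated one, which matches the "twice as many bytes" view that asUint8List produces over the 16-bit buffer.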

Convert first two bytes of Lua string (in bigendian format) to unsigned short number

I want to have a Lua function that takes a string argument. The string has N+2 bytes of data. The first two bytes hold the length in big-endian format, and the remaining N bytes contain the data.
Say the data is "abcd". Then the string is 0x00 0x04 a b c d.
In the Lua function this string is an input argument.
How can I calculate the length in an optimal way?
So far I have tried below code
function calculate_length(s)
    len = string.len(s)
    if (len >= 2) then
        first_byte = s:byte(1);
        second_byte = s:byte(2);
        -- len = ((first_byte & 0xFF) << 8) | (second_byte & 0xFF)
        len = second_byte
    else
        len = 0
    end
    return len
end
See the commented line (how I would have done in C).
In Lua how do I achieve the commented line.
The number of data bytes in your string s is #s-2 (assuming even a string with no data has a length of two bytes, each with a value of 0). If you really need to use those header bytes, you could compute:
len = first_byte * 256 + second_byte
When it comes to strings in Lua, a byte is a byte as this excerpt about strings from the Reference Manual makes clear:
The type string represents immutable sequences of bytes. Lua is 8-bit clean: strings can contain any 8-bit value, including embedded zeros ('\0'). Lua is also encoding-agnostic; it makes no assumptions about the contents of a string.
This is important if using the string.* library:
The string library assumes one-byte character encodings.
If the internal representation in Lua of your number is important, the following excerpt from the Lua Reference Manual may be of interest:
The type number uses two internal representations, or two subtypes, one called integer and the other called float. Lua has explicit rules about when each representation is used, but it also converts between them automatically as needed.... Therefore, the programmer may choose to mostly ignore the difference between integers and floats or to assume complete control over the representation of each number. Standard Lua uses 64-bit integers and double-precision (64-bit) floats, but you can also compile Lua so that it uses 32-bit integers and/or single-precision (32-bit) floats.
In other words, the 2 byte "unsigned short" C data type does not exist in Lua. Integers are stored using the "long long" type (8 byte signed).
Lastly, as lhf pointed out in the comments, bitwise operations were added to Lua in version 5.3, and if lhf is the lhf, he should know ;-)
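As a quick check of the arithmetic, multiplying the first byte by 256 and adding the second gives the same result as the shift-and-or form from C. The following Python sketch is only an illustration of that equivalence; the function name payload_length is hypothetical and it mirrors the question's behaviour of returning 0 when fewer than two bytes are present.

```python
def payload_length(s: bytes) -> int:
    """Read a 2-byte big-endian length header (illustrative helper)."""
    if len(s) < 2:
        return 0
    # The high byte comes first in big-endian order, so it is weighted by 256.
    return s[0] * 256 + s[1]


msg = b"\x00\x04abcd"
print(payload_length(msg))                 # 4
print((msg[0] << 8) | msg[1])              # 4, the shift-and-or form
print(int.from_bytes(msg[:2], "big"))      # 4, using the built-in conversion
```

Because each byte is already in the range 0 to 255, the & 0xFF masks from the C version are not needed here.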

Read first bytes of lrange results using Lua scripting

I want to read and filter data from a list in Redis. I want to inspect the first 4 bytes (an int32) of data in a blob to compare to an int32 I will pass in as an ARG.
I have a script started, but how can I check the first 4 bytes?
local updates = redis.call('LRANGE', KEYS[1], 0, -1)
local ret = {}
for i=1,#updates do
-- read int32 header
-- if header > ARGV[1]
ret[#ret+1] = updates[i]
end
return ret
Also, I see there is a limited set of libraries: http://redis.io/commands/EVAL#available-libraries
EDIT: After some more poking around, I'm running into issues due to how Lua stores numbers: ARGV[1] is an 8-byte string and cannot safely be converted into a 64-bit number. I think this is because Lua stores everything as doubles, which only have 53 bits of precision.
EDIT: I'm accepting the answer below, but changing the question to int32. The int64 part of the problem I put into another question: Comparing signed 64 bit number using 32 bit bitwise operations in Lua
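The Redis Lua interpreter loads the struct library, so try unpacking the header of each element inside the loop:
if struct.unpack("i4", updates[i]) > tonumber(ARGV[1]) then
Here "i4" reads a signed 4-byte integer in native byte order (prefix the format with "<" or ">" to force little or big endian; use "I8" for an unsigned 8-byte header), and tonumber assumes ARGV[1] is passed as a decimal string.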
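The filtering logic itself can be tried out away from Redis. Below is a minimal Python sketch of the same idea, offered as an illustration rather than as the script to deploy: the function name filter_updates is hypothetical, and the header is assumed to be a signed little-endian int32, which the question does not specify.

```python
import struct


def filter_updates(updates, threshold):
    """Keep blobs whose leading int32 header is greater than threshold."""
    kept = []
    for blob in updates:
        if len(blob) < 4:
            continue  # too short to contain a header
        # "<i" is a signed 32-bit little-endian integer (an assumption here).
        (header,) = struct.unpack_from("<i", blob, 0)
        if header > threshold:
            kept.append(blob)
    return kept


blobs = [
    struct.pack("<i", 5) + b"a",
    struct.pack("<i", 10) + b"b",
    struct.pack("<i", -1) + b"c",
]
print(filter_updates(blobs, 7))  # only the blob whose header is 10
```

If the producer writes big-endian headers, the format string would be ">i" instead; the byte order has to match whatever wrote the blobs.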

How to write a double value byte by byte

I have to communicate with a DLL from Lua, and this is the function I use to write strings byte by byte:
writeString = function(pid, process, address, value)
local i = 1
while i <= String.Length(value) do
local byte = string.byte(value, i, i)
DLL.CallFunction("hook.dll", "writeMemByte", pid..','..process..','..address + (i-1)..','..byte, DLL_RETURN_TYPE_INTEGER, DLL_CALL_CDECL)
i = i + 1
end
DLL.CallFunction("hook.dll", "writeMemByte", pid..','..process..','..address + (i-1)..',0', DLL_RETURN_TYPE_INTEGER, DLL_CALL_CDECL)
end
I basically need to adapt this to write a double value byte by byte.
I just can't think how to make the memory.writeDouble function.
EDIT: this is my readString function:
readString = function(pid, process, address)
local i, str = 0, ""
repeat
local curByte = DLL.CallFunction("hook.dll", "readMemByte", pid..','..process..','..(address + i), DLL_RETURN_TYPE_INTEGER, DLL_CALL_CDECL)
if curByte == "" then curByte = 0 end
curByte = tonumber(curByte)
str = str .. string.char(curByte)
i = i + 1
until (curByte == 0)
return str
end,
My first recommendation would be: try to find a function that accepts strings representing doubles instead of doubles. Implementing the lua side of that would be incredibly easy, since you already have a writeString - it could be something very similar to this:
writeDouble = function(pid, process, address, value)
writeString(pid, process, address, tostring(value))
end
If you don't have that function, but you have access to the dll source, you can try to add that function yourself; it shouldn't be much more complicated than getting the string and then calling atof on it.
If you really can't modify the dll, then you need to figure out the exact double format that the lib is expecting. Many factors can change that format: the language and compiler used, the operating system, and the compiler flags, to cite some.
If the dll uses a standard format, like IEEE 754, the format will usually have well-documented "translations" to and from bytes. Otherwise, it's possible that you'll have to develop them yourself.
Regards and good luck!
There are many libraries available for Lua that do just this.
If you need the resulting byte array (string), string.pack should do it; you can find precompiled binaries for Windows included with Lua for Windows.
If you are more interested in using the double to interface with foreign code, I would recommend taking a different approach using alien, a Foreign Function Interface library that lets you directly call C functions.
If you are able to, I even more highly recommend switching to LuaJIT, a Just-In-Time compiler for Lua that provides the power, speed and reach of C and assembly, but with the comfort and flexibility of Lua.
If none of these solutions are viable, I can supply some code to serialise doubles (not accessible at the moment).
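For reference, this is roughly what serialising a double into bytes looks like when a pack function is available. The sketch below is in Python rather than Lua and rests on two assumptions that the question does not confirm: the target expects an IEEE 754 64-bit double, and it expects little-endian byte order. The names double_to_bytes and write_double are hypothetical, and write_byte stands in for whatever per-byte write call is available (such as the writeMemByte call in the question).

```python
import struct


def double_to_bytes(value: float) -> list:
    """Return the 8 bytes of an IEEE 754 double, little-endian (assumed layout)."""
    return list(struct.pack("<d", value))


def write_double(write_byte, address: int, value: float) -> None:
    """Write the double one byte at a time through a caller-supplied function."""
    for i, b in enumerate(double_to_bytes(value)):
        write_byte(address + i, b)


memory = {}  # stand-in for the target process memory
write_double(memory.__setitem__, 100, 1.0)
print([memory[100 + i] for i in range(8)])  # [0, 0, 0, 0, 0, 0, 240, 63]
```

Unlike the string writer in the question, no terminating zero byte is written: a double is always exactly 8 bytes.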

Resources