Read first bytes of LRANGE results using Lua scripting

I want to read and filter data from a list in Redis. I want to inspect the first 4 bytes (an int32) of each blob and compare that value to an int32 I will pass in as an ARGV.
I have a script started, but how can I check the first 4 bytes?
local updates = redis.call('LRANGE', KEYS[1], 0, -1)
local ret = {}
for i=1,#updates do
  -- read int32 header
  -- if header > ARGV[1]
  ret[#ret+1] = updates[i]
end
return ret
Also, I see there is a limited set of libraries: http://redis.io/commands/EVAL#available-libraries
EDIT: After some more poking around, I'm running into issues due to how Lua stores numbers: ARGV[1] is an 8-byte string and cannot safely be converted into a 64-bit number. I think this is because Lua stores every number as a double, which has only 53 bits of precision (52 stored fraction bits plus an implicit leading bit).
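To make the precision limit concrete, here is a small Python check. Python's float is also an IEEE 754 double, the same number type Lua 5.1 (the version Redis embeds) uses; this is an illustration only, not Redis or Lua code, and survives_double is a hypothetical helper name:

```python
# Python floats are IEEE 754 doubles, the same number type Lua 5.1 uses for all numbers.
def survives_double(n):
    """Return True if n converts to a double and back without losing information."""
    return int(float(n)) == n

print(survives_double(2**31 - 1))  # True: any int32 fits easily
print(survives_double(2**53))      # True
print(survives_double(2**53 + 1))  # False: rounds to 2**53
print(survives_double(2**63 - 1))  # False: largest signed 64-bit value
```

Every int32 round-trips exactly, which is why narrowing the question to int32 sidesteps the problem.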
EDIT: I'm accepting the answer below, but changing the question to int32. I've moved the int64 part of the problem into another question: Comparing signed 64 bit number using 32 bit bitwise operations in Lua

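The Redis Lua interpreter loads the struct library, so try something like this inside the loop. Note that ARGV values arrive as strings, so convert with tonumber, and that "<i4" reads a little-endian signed 4-byte integer (use ">i4" if your header is big-endian):
if struct.unpack("<i4", updates[i]) > tonumber(ARGV[1]) then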
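For comparison, the same filtering logic can be sketched in Python with the standard struct module. This mirrors the Lua loop rather than running inside Redis; it assumes each blob starts with a little-endian signed int32 header, and the name filter_by_header plus the skip-short-blobs rule are hypothetical choices:

```python
import struct

def filter_by_header(items, threshold, fmt="<i"):
    """Keep blobs whose leading 4-byte signed int is greater than threshold.

    fmt "<i" assumes a little-endian int32 header; use ">i" for big-endian.
    """
    kept = []
    for item in items:
        if len(item) < 4:
            continue  # too short to carry a header (assumed: skip it)
        (header,) = struct.unpack_from(fmt, item, 0)
        if header > threshold:
            kept.append(item)
    return kept

updates = [
    struct.pack("<i", 5) + b"payload-a",
    struct.pack("<i", -3) + b"payload-b",
    struct.pack("<i", 42) + b"payload-c",
]
print([blob[4:] for blob in filter_by_header(updates, 4)])  # [b'payload-a', b'payload-c']
```

The threshold is compared as a number, not as the raw ARGV string, which is the same reason the Lua version needs tonumber.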

Related

How to generate a 32 bit big-endian number in the format 0x00000001 in erlang

I need to generate a variable with the following properties:
a 32-bit, big-endian integer, initialized to 0x00000001 (I'm going to increment that number one by one). Is there a syntax in Erlang for this?
In Erlang, normally you'd keep such numbers as plain integers inside the program:
X = 1.
or equivalently, if you want to use a hexadecimal literal:
X = 16#00000001.
And when it's time to convert the number to a binary representation in order to send it somewhere else, use bit syntax:
<<X:32/big>>
This returns a binary containing four bytes:
<<0,0,0,1>>
(That's a 32-bit big-endian integer. In fact, big-endian is the default, so you could just write <<X:32>>. <<X:64/little>> would be a 64-bit little-endian integer.)
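As a cross-language sketch (Python, not Erlang), the struct module's ">" and "<" prefixes play the role of /big and /little. The "I" and "Q" codes are unsigned, matching the fact that Erlang bit-syntax integers are unsigned unless you write /signed:

```python
import struct

x = 1
big32 = struct.pack(">I", x)     # unsigned 32-bit big-endian, like <<X:32/big>>
little64 = struct.pack("<Q", x)  # unsigned 64-bit little-endian, like <<X:64/little>>

print(list(big32))     # [0, 0, 0, 1]
print(list(little64))  # [1, 0, 0, 0, 0, 0, 0, 0]

# int.to_bytes does the same without a format string.
print(x.to_bytes(4, "big") == big32)  # True
```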
On the other hand, if you just want to print the number in 0x00000001 format, use io:format with this format specifier:
io:format("0x~8.16.0b~n", [X]).
The 8 tells it to use a field width of 8 characters, the 16 tells it to use radix 16 (i.e. hexadecimal), and the 0 is the padding character, used for filling the number up to the field width.
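For readers coming from other languages, the same width/radix/padding idea maps onto a Python format spec: 08 is the zero-padded width and x selects lowercase hexadecimal (lowercase, like ~b; ~B would give uppercase). The helper name hex32 is hypothetical:

```python
def hex32(n):
    """Zero-padded 8-digit hex, intended to match io:format("0x~8.16.0b~n", [X])."""
    return f"0x{n:08x}"

counter = 0x00000001
for _ in range(3):
    print(hex32(counter))  # 0x00000001, then 0x00000002, then 0x00000003
    counter += 1  # rebinding is fine in Python; Erlang would recurse instead
```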
Note that incrementing a variable works differently in Erlang compared to other languages. Once a variable has been assigned a value, you can't change it, so you'd end up making a recursive call, passing the new value as an argument to the function. This answer has an example.
According to the documentation [1], the following snippet should generate a 32-bit signed integer in little-endian byte order:
1> I = 258.
258
2> B = <<I:4/little-signed-integer-unit:8>>.
<<2,1,0,0>>
And the following should produce big endian numbers:
1> I = 258.
258
2> B = <<I:4/big-signed-integer-unit:8>>.
<<0,0,1,2>>
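The same two encodings of 258 can be reproduced with Python's struct module as a cross-check ("i" is a signed 4-byte integer, the counterpart of 4/signed-integer-unit:8). This is an analogue, not Erlang:

```python
import struct

i = 258
print(list(struct.pack("<i", i)))   # [2, 1, 0, 0], like <<I:4/little-signed-integer-unit:8>>
print(list(struct.pack(">i", i)))   # [0, 0, 1, 2], like <<I:4/big-signed-integer-unit:8>>
print(list(struct.pack(">i", -1)))  # [255, 255, 255, 255]: two's complement for negatives
```

Signedness only changes the result for negative values; for 258 signed and unsigned encodings are identical.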
[1] http://erlang.org/doc/programming_examples/bit_syntax.html

Parse array of unsigned integers in Julia 1.x.x

I am trying to open a binary file whose internal structure I partly know, and reinterpret it correctly in Julia. Let us say that I can already load it via:
arx=open("../axonbinaryfile.abf", "r")
databin=read(arx)
close(arx)
The data is loaded as an Array of UInt8, which I guess are bytes.
For the first 4 bytes I can perform a simple Char conversion, and it works:
head=databin[1:4]
map(Char, head)
4-element Array{Char,1}:
'A'
'B'
'F'
' '
Then, positions 13-16 hold a 32-bit integer waiting to be interpreted. How should I do that?
I have tried reinterpret() and Int32 as a function, but to no avail.
You can use reinterpret(Int32, databin[13:16])[1]. The trailing [1] is needed because reinterpret returns an array (a reinterpreted view of the bytes), not a single number.
Note also that read accepts a type. So if you first read the 12 bytes that precede the integer from the still-open stream, e.g. with read(arx, 12), and then call read(arx, Int32), you get the desired number without any conversion or vector allocation.
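The same skip-then-read-a-typed-value approach can be sketched in Python with a file-like object. An in-memory io.BytesIO stands in for the real .abf file, the 16-byte header contents are made up for the demo, and little-endian is an assumption you would need to confirm for your format. Note that Julia's 1-based positions 13-16 correspond to the 0-based offset 12:

```python
import io
import struct

def read_int32_at(stream, offset, byteorder="<"):
    """Skip to offset and read one signed 32-bit integer.

    byteorder "<" assumes little-endian; pass ">" for big-endian.
    """
    stream.seek(offset)
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("fewer than 4 bytes available at offset")
    (value,) = struct.unpack(byteorder + "i", raw)
    return value

# Hypothetical header: "ABF " magic, 8 filler bytes, then an int32 at bytes 13-16 (1-based).
header = b"ABF " + bytes(8) + struct.pack("<i", 2024)
print(read_int32_at(io.BytesIO(header), 12))  # 2024
```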
Finally, observe that the conversion to Char in your code interprets each byte value as a Unicode code point. I am not sure this is exactly what you want (maybe it is). For example, if the first byte read has value 200, you will get:
julia> Char(200)
'È': Unicode U+00c8 (category Lu: Letter, uppercase)
EDIT: One more comment: when you convert 4 bytes to an Int32, make sure to check whether the value is encoded as big-endian or little-endian (see the ENDIAN_BOM constant and the ntoh, hton, ltoh, htol functions).
Here it is. Use view to avoid copying the data.
julia> dat = UInt8[65,66,67,68,0,0,2,40];
julia> Char.(view(dat,1:4))
4-element Array{Char,1}:
'A'
'B'
'C'
'D'
julia> reinterpret(Int32, view(dat,5:8))
1-element reinterpret(Int32, view(::Array{UInt8,1}, 5:8)):
671219712
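The endianness caveat above matters for exactly this example. A Python sketch reading the same eight bytes shows that 671219712 is the little-endian interpretation (what a little-endian machine produces); read big-endian, the same four bytes are 552. memoryview plays the role of Julia's view here:

```python
dat = bytes([65, 66, 67, 68, 0, 0, 2, 40])
view = memoryview(dat)  # slicing a memoryview avoids copying, like Julia's view

print(bytes(view[0:4]).decode("ascii"))                  # ABCD
print(int.from_bytes(view[4:8], "little", signed=True))  # 671219712
print(int.from_bytes(view[4:8], "big", signed=True))     # 552
```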

Convert first two bytes of Lua string (in bigendian format) to unsigned short number

I want a Lua function that takes a string argument. The string has N+2 bytes: the first two bytes hold the length in big-endian format, and the remaining N bytes contain the data.
Say the data is "abcd". Then the string is 0x00 0x04 a b c d.
This string is passed to my Lua function as an input argument.
How can I calculate the length in an optimal way?
So far I have tried the code below:
function calculate_length(s)
  local len = string.len(s)
  if len >= 2 then
    local first_byte = s:byte(1)
    local second_byte = s:byte(2)
    -- len = ((first_byte & 0xFF) << 8) | (second_byte & 0xFF)
    len = second_byte
  else
    len = 0
  end
  return len
end
See the commented line (how I would have done it in C).
How do I achieve the commented line in Lua?
The number of data bytes in your string s is #s-2 (assuming even a string with no data has a length of two bytes, each with a value of 0). If you really need to use those header bytes, you could compute:
len = first_byte * 256 + second_byte
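As a language-neutral check that these forms agree, here is a Python sketch (header_length is a hypothetical name). Multiplying by 256 is the same as shifting left by 8, and struct's ">H" decodes a big-endian unsigned short directly; the & 0xFF masks from the C version are unnecessary because each byte is already in 0..255:

```python
import struct

def header_length(s):
    """Big-endian 16-bit length from the first two bytes of s; 0 if s is too short."""
    if len(s) < 2:
        return 0
    return s[0] * 256 + s[1]  # same formula as len = first_byte * 256 + second_byte

msg = b"\x00\x04abcd"
print(header_length(msg))                # 4
print((msg[0] << 8) | msg[1])            # 4, the shift/or form from the C comment
print(struct.unpack_from(">H", msg)[0])  # 4, ">H" = big-endian unsigned short
print(len(msg) - 2)                      # 4, number of data bytes actually present
```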
When it comes to strings in Lua, a byte is a byte, as this excerpt about strings from the Reference Manual makes clear:
The type string represents immutable sequences of bytes. Lua is 8-bit clean: strings can contain any 8-bit value, including embedded zeros ('\0'). Lua is also encoding-agnostic; it makes no assumptions about the contents of a string.
This is important if using the string.* library:
The string library assumes one-byte character encodings.
If the internal representation in Lua of your number is important, the following excerpt from the Lua Reference Manual may be of interest:
The type number uses two internal representations, or two subtypes, one called integer and the other called float. Lua has explicit rules about when each representation is used, but it also converts between them automatically as needed.... Therefore, the programmer may choose to mostly ignore the difference between integers and floats or to assume complete control over the representation of each number. Standard Lua uses 64-bit integers and double-precision (64-bit) floats, but you can also compile Lua so that it uses 32-bit integers and/or single-precision (32-bit) floats.
In other words, the 2-byte "unsigned short" C data type does not exist in Lua. In standard Lua 5.3, integers are stored using the "long long" type (8-byte signed).
Lastly, as lhf pointed out in the comments, bitwise operations were added to Lua in version 5.3, and if lhf is the lhf, he should know ;-)

What does 16#4000000 mean in Erlang?

I'm reading ejabberd source, specifically ejabberd_http.erl.
In the code below,
...
case (State#state.sockmod):recv(State#state.socket,
min(Len, 16#4000000), 300000)
of
{ok, Data} ->
recv_data(State, Len - byte_size(Data), <<Acc/binary, Data/binary>>);
...
What does 16#4000000 mean?
I've tested this in the Erlang shell.
%%erlang shell
...
7>16#4000000.
67108864
8>is_integer(16#4000000).
true
I know it's just an integer value.
Is there any advantage to writing 16#4000000 instead of 67108864?
In Erlang, the number before the # is the integer base. In your example, 16#4000000 means the hexadecimal representation of 67108864. In other languages it is often represented as 0x4000000.
One reason for using the hex representation is that each hex digit represents exactly 4 bits; for example, 16#F is 15 in decimal, or 1111 in binary. When working with binary data, base 16 is easier for a human reader to handle and understand: 16#4000000 is visibly a 4 followed by six zero hex digits, i.e. 2^26 (64 MiB), which is hard to see from 67108864.
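The arithmetic behind that reading can be checked in Python, where 0x4000000 is the spelling of Erlang's 16#4000000. This is just a calculator-style sketch, not ejabberd code:

```python
value = 0x4000000  # Python's spelling of Erlang's 16#4000000

print(value)                   # 67108864
print(value == 1 << 26)        # True: 4 * 16**6 = 2**2 * 2**24
print(value // (1024 * 1024))  # 64, i.e. 64 MiB
print(int("4000000", 16))      # 67108864, parsing base-16 text like Base#Digits
print(0xF)                     # 15
```

So min(Len, 16#4000000) caps each recv call at 64 MiB.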

TCP/IP Client / Server commands data

I have a client/server architecture (C# .NET 4.0) that sends command packets of data as byte arrays. Each command has a variable number of parameters, and each parameter has a variable length. Because of this, I use delimiters to mark the end of each parameter and the end of the command as a whole. The operand is always 2 bytes, and both types of delimiter are 1 byte. The last parameter_delmiter is redundant, as command_delmiter provides the same functionality.
The command structure is as follows (x = variable size):
FIELD                 SIZE (BYTES)
operand               2
parameter1            x
parameter_delmiter    1
parameter2            x
parameter_delmiter    1
parameterN            x
...                   ...
command_delmiter      1
Parameters come from many different types (ints, strings, etc.), all encoded into byte arrays.
The problem is that a parameter, once encoded into a byte array, sometimes contains bytes with the same value as a delimiter. For example, command_delmiter is 255, and a parameter may contain that byte.
There are 3 ways I can think of to fix this:
1) Encode the parameters differently so that they can never contain a delimiter value (255 or 254), e.g. using a modulus? This would make parameters larger, i.e. an Int16 would take more than 2 bytes, etc.
2) Do not use delimiters at all, use count and length values at the start of the command structure.
3) Use something else.
To my knowledge, because of the way TCP/IP buffers work, SOME SORT of delimiter has to be used to separate 'commands' or 'bundles of data', since a buffer may contain multiple commands, or a command may span multiple buffers. So this
BinaryReader/Writer seems like an obvious candidate; the only issue is that the byte array may contain multiple commands (with parameters inside), so it would still have to be chopped up in order to feed it into the BinaryReader.
Suggestions?
Thanks.
The standard way to do this is to put the length of each message in its (fixed-size) first few bytes. For example, the first 4 bytes could hold the length of the message; you then read that many bytes for the message content. The next 4 bytes would hold the length of the next message. A length of 0 could indicate the end of messages, or you could use a header with a message count.
Also, remember that TCP is a byte stream, so don't expect a complete message to be available every time you read data from a socket. You could receive an arbitrary number of bytes on every read.
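A minimal sketch of length-prefixed framing, written in Python rather than C# for illustration. It assumes a 4-byte big-endian length prefix (both ends just have to agree on the byte order), and the names encode_frame and FrameDecoder are hypothetical. The decoder buffers partial reads and only emits complete payloads, so bytes like 0xFF or 0xFE inside a parameter need no escaping:

```python
import struct

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix (assumed byte order)

def encode_frame(payload):
    """Prefix payload with its length so the receiver knows where it ends."""
    return HEADER.pack(len(payload)) + payload

class FrameDecoder:
    """Accumulates arbitrary TCP chunks and returns complete payloads."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        self._buffer.extend(chunk)
        frames = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer, 0)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # body not fully received yet; wait for more data
            frames.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return frames

stream = encode_frame(b"\x01\x00\xff\xfe") + encode_frame(b"second command")
decoder = FrameDecoder()
received = []
for i in range(0, len(stream), 3):  # deliver 3 bytes at a time to mimic partial reads
    received.extend(decoder.feed(stream[i:i + 3]))
print(received)  # [b'\x01\x00\xff\xfe', b'second command']
```

In this sketch a zero-length frame simply decodes to an empty payload; treating it as an end-of-messages marker, as suggested above, would be a protocol rule layered on top.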
