What does 16#4000000 mean in Erlang? - erlang

I'm reading ejabberd source, specifically ejabberd_http.erl.
In the code below,
...
case (State#state.sockmod):recv(State#state.socket,
min(Len, 16#4000000), 300000)
of
{ok, Data} ->
recv_data(State, Len - byte_size(Data), <<Acc/binary, Data/binary>>);
...
What does 16#4000000 mean?
I've tested this in the Erlang shell.
%%erlang shell
...
7>16#4000000.
67108864
8>is_integer(16#4000000).
true
I know it's just an integer value.
Is there any advantage to writing 16#4000000 instead of 67108864?

In Erlang, the number before the # is the integer base. In your example, 16#4000000 means the hexadecimal representation of 67108864. In other languages it is often represented as 0x4000000.
One reason for using the hex representation is because each digit represents 4 bits, for example 16#F is 16 (in decimal), or 1111 in binary. When working with binary processing, using base 16 makes it easier to handle and understand for the human reader.

Related

How to generate a 32 bit big-endian number in the format 0x00000001 in erlang

I need to generate a variable which has the following properties -
32 bit, big-endian integer, initialized with 0x00000001 (I'm going to increment that number one by one). Is there a syntax in erlang for this?
In Erlang, normally you'd keep such numbers as plain integers inside the program:
X = 1.
or equivalently, if you want to use a hexadecimal literal:
X = 16#00000001.
And when it's time to convert the number to a binary representation in order to send it somewhere else, use bit syntax:
<<X:32/big>>
This returns a binary containing four bytes:
<<0,0,0,1>>
(That's a 32-bit big-endian integer. In fact, big-endian is the default, so you could just write <<X:32>>. <<X:64/little>> would be a 64-bit little-endian integer.)
On the other hand, if you just want to print the number in 0x00000001 format, use io:format with this format specifier:
io:format("0x~8.16.0b~n", [X]).
The 8 tells it to use a field width of 8 characters, the 16 tells it to use radix 16 (i.e. hexadecimal), and the 0 is the padding character, used for filling the number up to the field width.
Note that incrementing a variable works differently in Erlang compared to other languages. Once a variable has been assigned a value, you can't change it, so you'd end up making a recursive call, passing the new value as an argument to the function. This answer has an example.
According to the documentation[1] the following snippet should generate a 32-bit signed integer in little endian.
1> I = 258.
258
2> B = <<I:4/little-signed-integer-unit:8>>.
<<2,1,0,0>>
And the following should produce big endian numbers:
1> I = 258.
258
2> B = <<I:4/big-signed-integer-unit:8>>.
<<0,0,1,2>>
[1] http://erlang.org/doc/programming_examples/bit_syntax.html

Convert first two bytes of Lua string (in bigendian format) to unsigned short number

I want to have a lua function that takes a string argument. String has N+2 bytes of data. First two bytes has length in bigendian format, and rest N bytes contain data.
Say data is "abcd" So the string is 0x00 0x04 a b c d
In Lua function this string is an input argument to me.
How can I calculate length optimal way.
So far I have tried below code
function calculate_length(s)
len = string.len(s)
if(len >= 2) then
first_byte = s:byte(1);
second_byte = s:byte(2);
//len = ((first_byte & 0xFF) << 8) or (second_byte & 0xFF)
len = second_byte
else
len = 0
end
return len
end
See the commented line (how I would have done in C).
In Lua how do I achieve the commented line.
The number of data bytes in your string s is #s-2 (assuming even a string with no data has a length of two bytes, each with a value of 0). If you really need to use those header bytes, you could compute:
len = first_byte * 256 + second_byte
When it comes to strings in Lua, a byte is a byte as this excerpt about strings from the Reference Manual makes clear:
The type string represents immutable sequences of bytes. Lua is 8-bit clean: strings can contain any 8-bit value, including embedded zeros ('\0'). Lua is also encoding-agnostic; it makes no assumptions about the contents of a string.
This is important if using the string.* library:
The string library assumes one-byte character encodings.
If the internal representation in Lua of your number is important, the following excerpt from the Lua Reference Manual may be of interest:
The type number uses two internal representations, or two subtypes, one called integer and the other called float. Lua has explicit rules about when each representation is used, but it also converts between them automatically as needed.... Therefore, the programmer may choose to mostly ignore the difference between integers and floats or to assume complete control over the representation of each number. Standard Lua uses 64-bit integers and double-precision (64-bit) floats, but you can also compile Lua so that it uses 32-bit integers and/or single-precision (32-bit) floats.
In other words, the 2 byte "unsigned short" C data type does not exist in Lua. Integers are stored using the "long long" type (8 byte signed).
Lastly, as lhf pointed out in the comments, bitwise operations were added to Lua in version 5.3, and if lhf is the lhf, he should know ;-)

Read first bytes of lrange results using Lua scripting

I'm want to read and filter data from a list in redis. I want to inspect the first 4 bytes (an int32) of data in a blob to compare to an int32 I will pass in as an ARG.
I have a script started, but how can I check the first 4 bytes?
local updates = redis.call('LRANGE', KEYS[1], 0, -1)
local ret = {}
for i=1,#updates do
-- read int32 header
-- if header > ARGV[1]
ret[#ret+1] = updates[i]
end
return ret
Also, I see there is a limited set of libraries: http://redis.io/commands/EVAL#available-libraries
EDIT: Some more poking around and I'm running into issues due to how LUA stores numbers - ARGV[1] is a 8 byte string, and cannot be safely be converted into a 64 bit number. I think this is due to LUA storing everything as doubles, which only have 52 bits of precision.
EDIT: I'm accepting the answer below, but changing the question to int32. The int64 part of the problem I put into another question: Comparing signed 64 bit number using 32 bit bitwise operations in Lua
The Redis Lua interpreter loads struct library, so try
if struct.unpack("I8",updates) > ARGV[1] then

Can I get a list of all currently-registered atoms?

My project has blown through the max 1M atoms, we've cranked up the limit, but I need to apply some sanity to the code that people are submitting with regard to list_to_atom and its friends. I'd like to start by getting a list of all the registered atoms so I can see where the largest offenders are. Is there any way to do this. I'll have to be creative about how I do it so I don't end up trying to dump 1-2M lines in a live console.
You can get hold of all atoms by using an undocumented feature of the external term format.
TL;DR: Paste the following line into the Erlang shell of your running node. Read on for explanation and a non-terse version of the code.
(fun F(N)->try binary_to_term(<<131,75,N:24>>) of A->[A]++F(N+1) catch error:badarg->[]end end)(0).
Elixir version by Ivar Vong:
for i <- 0..:erlang.system_info(:atom_count)-1, do: :erlang.binary_to_term(<<131,75,i::24>>)
An Erlang term encoded in the external term format starts with the byte 131, then a byte identifying the type, and then the actual data. I found that EEP-43 mentions all the possible types, including ATOM_INTERNAL_REF3 with type byte 75, which isn't mentioned in the official documentation of the external term format.
For ATOM_INTERNAL_REF3, the data is an index into the atom table, encoded as a 24-bit integer. We can easily create such a binary: <<131,75,N:24>>
For example, in my Erlang VM, false seems to be the zeroth atom in the atom table:
> binary_to_term(<<131,75,0:24>>).
false
There's no simple way to find the number of atoms currently in the atom table*, but we can keep increasing the number until we get a badarg error.
So this little module gives you a list of all atoms:
-module(all_atoms).
-export([all_atoms/0]).
atom_by_number(N) ->
binary_to_term(<<131,75,N:24>>).
all_atoms() ->
atoms_starting_at(0).
atoms_starting_at(N) ->
try atom_by_number(N) of
Atom ->
[Atom] ++ atoms_starting_at(N + 1)
catch
error:badarg ->
[]
end.
The output looks like:
> all_atoms:all_atoms().
[false,true,'_',nonode#nohost,'$end_of_table','','fun',
infinity,timeout,normal,call,return,throw,error,exit,
undefined,nocatch,undefined_function,undefined_lambda,
'DOWN','UP','EXIT',aborted,abs_path,absoluteURI,ac,accessor,
active,all|...]
> length(v(-1)).
9821
* In Erlang/OTP 20.0, you can call erlang:system_info(atom_count):
> length(all_atoms:all_atoms()) == erlang:system_info(atom_count).
true
I'm not sure if there's a way to do it on a live system, but if you can run it in a test environment you should be able to get a list via crash dump. The atom table is near the end of the crash dump format. You can create a crash dump via erlang:halt/1, but that will bring down the whole runtime system.
I dare say that if you use more than 1M atoms, then you are doing something wrong. Atoms are intended to be static as soon as the application runs or at least upper bounded by some small number, 3000 or so for a medium sized application.
Be very careful when an enemy can generate atoms in your vm. especially calls like list_to_atom/1 is somewhat dangerous.
EDITED (wrong answer..)
You can adjust number of atoms with +t
http://www.erlang.org/doc/efficiency_guide/advanced.html
..but I know very few use cases when it is necessary.
You can track atom stats with erlang:memory()

integer division

By definition the integer division returns the quotient.
Why 4613.9145 div 100. gives an error ("bad argument") ?
For div the arguments need to be integers. / accepts arbitrary numbers as arguments, especially floats. So for your example, the following would work:
1> 4613.9145 / 100.
46.139145
To contrast the difference, try:
2> 10 / 10.
1.0
3> 10 div 10.
1
Documentation: http://www.erlang.org/doc/reference_manual/expressions.html
Update: Integer division, sometimes denoted \, can be defined as:
a \ b = floor(a / b)
So you'll need a floor function, which isn't in the standard lib.
% intdiv.erl
-module(intdiv).
-export([floor/1, idiv/2]).
floor(X) when X < 0 ->
T = trunc(X),
case X - T == 0 of
true -> T;
false -> T - 1
end;
floor(X) ->
trunc(X) .
idiv(A, B) ->
floor(A / B) .
Usage:
$ erl
...
Eshell V5.7.5 (abort with ^G)
> c(intdiv).
{ok,intdiv}
> intdiv:idiv(4613.9145, 100).
46
Integer division in Erlang, div, is defined to take two integers as input and return an integer. The link you give in an earlier comment, http://mathworld.wolfram.com/IntegerDivision.html, only uses integers in its examples so is not really useful in this discussion. Using trunc and round will allow you use any arguments you wish.
I don't know quite what you mean by "definition." Language designers are free to define operators however they wish. In Erlang, they have defined div to accept only integer arguments.
If it is the design decisions of Erlang's creators that you are interested in knowing, you could email them. Also, if you are curious enough to sift through the (remarkably short) grammar, you can find it here. Best luck!
Not sure what you're looking for, #Bertaud. Regardless of how it's defined elsewhere, Erlang's div only works on integers. You can convert the arguments to integers before calling div:
trunc(4613.9145) div 100.
or you can use / instead of div and convert the quotient to an integer afterward:
trunc(4613.9145 / 100).
And trunc may or may not be what you want- you may want round, or floor or ceiling (which are not defined in Erlang's standard library, but aren't hard to define yourself, as miku did with floor above). That's part of the reason Erlang doesn't assume something and do the conversion for you. But in any case, if you want an integer quotient from two non-integers in Erlang, you have to have some sort of explicit conversion step somewhere.

Resources