In the register-based Lua virtual machine, are the registers a fixed size?
Or are they a dynamic structure?
I found a bytecode example here, on page 17, where the constant string "hello" is loaded into a register, so it must be dynamic? Isn't this uncommon for registers?
http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf
Each register contains a Lua value. Lua values are implemented in C as tagged unions. See also: The Implementation Of Lua 5.0. This tagged union stores small types (booleans, numbers) by value and everything else (strings, tables, functions, etc.) as a pointer. So the size of a register is constant, though larger than one native machine word.
Records are compile-time structures. record_info and is_record recognise compiled records and their structure. Is there a way to ask the VM which records have been defined and are available to the process? I am interested in getting the internal tuple representation for every record definition.
What I want to do is something like:
-record(car,{make=honda}).
get_record(Car) ->
    %% Some magic here to end up with something like
    %% {car,{make,honda}}, or even better #car{}, when Car = 'car'
As you said, records are only a compile-time construct; once compiled, records are just tuples. This suggests that no record information is left at run time, but since you mentioned those two functions I was curious and checked how they work.
According to this, record_info/2 is a pseudo-function made available only during compilation, so it does not need any run-time information about records.
On the other hand the description of is_record(Term, RecordTag) states that this BIF (built-in function) only "returns true if Term is a tuple and its first element is RecordTag, false otherwise", so it is actually only checking the structure and first element of the tuple.
Based on this, I would guess that there is no record information made available during runtime. This thread confirms the unavailability of record_info/2 during runtime.
I have used Dynarec (https://github.com/dieswaytoofast/dynarec.git) successfully in a data mapping module for one of the apps I am currently working on. It is a parse transform, though, not a run-time VM tool. It compiles information on each defined record, as well as information about the fields of each record. In my case, I use it to dynamically map incoming data to record data. This module may get you what you need. YMMV. Good luck.
As others have said, records are purely a compile-time construct and there is no run-time information about them. Erlang just sees tuples. For example, calls to the record_info/2 pseudo-function are expanded to data at compile time: a list of atoms for the fields argument and an integer for the size argument.
I want to construct a string in Java that represents a DICT term and that will be passed to an Erlang process to be converted back into an Erlang term (string-to-term).
I can achieve this easily for ORDDICTs, since they are structured as a simple sorted list of key/value tuples such as: [ {field1 , "value1"} , {field2 , "value2"} ]
But DICTs are compiled into a specific term that I would like to reverse-engineer. I am aware this structure can change in new releases, but the benefits for performance and ease of integration with Java would outweigh this. Unfortunately, Erlang's JInterface is based on simple data structures. An efficient DICT type would be of great use.
A simple dict gets defined as follows:
D1 = dict:store("field1","AAA",dict:new()).
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],
[["field1",65,65,65]],
[],[],[],[],[],[],[]}}}
As can be seen above, there are some values whose meaning I do not understand: the numbers 1,16,16,8,80,48 and a set of empty lists, which likely represent something as well.
Adding two other rows (key-value pairs) causes the data to look like:
D3 = dict:store("field3","CCC",D2).
{dict,3,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],
[["field3",67,67,67]],
[],[],[],[],[],
[["field1",65,65,65]],
[],[],[],[],
[["field2",66,66,66]],
[],[]}}}
From the above I can notice that:
the first number (3) represents the number of items in the DICT.
the second number (16) shows the number of list slots in the first tuple of lists.
the third number (16) shows the number of list slots in the second tuple of lists, which is where the values ended up being placed (in the middle).
the fourth number (8) appears to be the number of slots in the second row of tuples from where the values are placed (a sort of index pointer).
the remaining numbers (80 and 48)... no idea...
adding a key "field0" places it not at the end but just after "field1"'s data. This hints at the indexing approach.
So the question is: is there a way (an algorithm) to reliably create a DICT string directly from outside Erlang?
The most complete specification of how dict is implemented is the dict.erl source code itself.
But I'm not sure replicating dict.erl's implementation in Java is worthwhile. It would only make sense if you want a fast dict-like data structure that you need to pass often between Java and Erlang code. It might make more sense to use a key-value store from both Erlang and Java without passing the structure around directly. Depending on your application this could be e.g. Riak, or you could even connect your different language worlds with RabbitMQ. Both are implemented in Erlang and are easily accessible from both worlds.
I heard that an atom table can fill up in Erlang, leaving the system open for DDoS unless you increase the number of atoms that can be created. It looks like binary_to_existing_atom/2 is the solution to this.
Can anyone explain exactly how binary_to_atom/2 is a security implication and how binary_to_existing_atom/2 solves this problem?
When an atom is first used, it is given an internal number and put in an array in the VM. This array is allocated statically and can fill up if enough different atoms are used. binary_to_existing_atom will only convert a binary string to an atom that already exists in the array; if it does not exist, the call fails.
If you convert input data directly to atoms without any sanity checks, an external client can keep sending new names such as <<"a">>, <<"b">> and so on until the array is full, at which point the VM will crash.
Another way to avoid this is to simply not use binary_to_atom and instead pattern match on different binaries and return the desired atom.
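The failure mode described above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the Erlang VM's implementation: AtomTable, AtomTableFull, the tiny limit of 4 and the two handler functions are all hypothetical names chosen for the illustration.

```python
class AtomTableFull(Exception):
    """Stands in for the VM crash that happens when the real table is full."""


class AtomTable:
    def __init__(self, limit):
        self.limit = limit
        self.index_by_name = {}

    def to_atom(self, name):
        # Models binary_to_atom: a new entry is created on first use.
        if name not in self.index_by_name:
            if len(self.index_by_name) >= self.limit:
                raise AtomTableFull(name)
            self.index_by_name[name] = len(self.index_by_name)
        return self.index_by_name[name]

    def to_existing_atom(self, name):
        # Models binary_to_existing_atom: the table never grows here.
        if name not in self.index_by_name:
            raise ValueError("badarg: " + name)
        return self.index_by_name[name]


table = AtomTable(limit=4)  # unrealistically small, to keep the demo short
for known in ("ok", "error"):
    table.to_atom(known)


def handle_safe(name):
    # Unknown names are rejected instead of being added to the table.
    try:
        return table.to_existing_atom(name)
    except ValueError:
        return None


def handle_unsafe(name):
    # Every previously unseen name consumes a slot until the table is full.
    return table.to_atom(name)
```

With handle_safe, an attacker can send any number of distinct names and the table stays at two entries; with handle_unsafe, the third new name raises AtomTableFull in this model.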
Calling list_to_atom/1 or binary_to_atom/2 on unchecked input is a very serious bug in Erlang code.
Always create a helper function like this:
to_atom(X) when is_list(X) ->
    try list_to_existing_atom(X) of
        Atom -> Atom
    catch
        _Error:_ErrorReason -> list_to_atom(X)
    end.
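In this way, if the atom already exists in the atom table, the try body returns it without creating anything. A new atom is created only the first time a given string is passed to this function. Note that the fallback to list_to_atom/1 still creates an atom for every previously unseen string, so this wrapper on its own does not bound the atom table against untrusted input.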
I am able to understand immutability with Python (surprisingly simple, too). Let's say I assign a number to x:
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is: does the Python interpreter allot ids to all the numbers, letters and True/False in memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it, please?
What you're seeing is an implementation detail (an internal optimization) called interning. This is a technique (used by implementations of a number of languages, including Java and Lua) in which names or variables that refer to equal values are made references to a single object instance, where that is possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
The CPython implementation does keep a set of small integers (currently -5 through 256) as preallocated immutable objects. I suspect that the runtimes of other very high-level languages include similar optimizations, since small integers are used very frequently by most non-trivial fragments of code.
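One way to observe this on your own interpreter is to build the same small integer twice at run time and compare identities. This is a minimal sketch: same_object is a hypothetical helper, and the result is a CPython implementation detail that other implementations or versions need not share.

```python
import platform


def same_object(n):
    # Build the integer twice at run time so the compiler cannot fold the
    # two occurrences into one shared constant.
    first = int(str(n))
    second = int(str(n))
    return first is second


# On CPython a small value such as 42 comes back as the same cached object.
print(platform.python_implementation(), same_object(42))
```

Trying larger values with the same helper shows where the sharing stops on a given interpreter, which is one more reason not to rely on it.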
In terms of how such things are implemented: for strings and larger integers it would make sense for Python to maintain them in dictionaries. Any expression yielding an integer (and perhaps even a float) or a string (at least a sufficiently short one) would then be hashed, looked up in the appropriate internal object dictionary, added if necessary, and returned as a reference to the resulting object.
You can do your own similar interning of any sort of custom object by wrapping instantiation in calls to a dictionary held on your own class.
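A minimal sketch of that idea, assuming a hypothetical Point class whose instances are treated as immutable: a class-level dictionary maps constructor arguments to the single shared instance.

```python
class Point:
    _instances = {}  # class-level cache: (x, y) -> the one shared Point

    def __new__(cls, x, y):
        key = (x, y)
        existing = cls._instances.get(key)
        if existing is None:
            # First request for these coordinates: create and remember it.
            existing = super().__new__(cls)
            existing.x = x
            existing.y = y
            cls._instances[key] = existing
        return existing


a = Point(1, 2)
b = Point(1, 2)
c = Point(3, 4)

print(a is b, a is c)
```

Because equal arguments always return the same object, mutating one "copy" would affect all of them, so this pattern only makes sense for objects that are never modified after creation. The cache also keeps every instance alive; a weakref.WeakValueDictionary is one option if that matters.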