Difference between variable.functionName and variable["functionName"] - lua

I know that you can get variables and call functions both by using the name directly
variable.functionName
or using the name as a string
variable["functionName"] or variable[functionNameString]
Now my question is:
Is there any practical difference between these forms, or are they completely interchangeable?
I am mostly interested in performance here, but any enlightenment is welcome.

The PUC-Rio Lua 5.1 byte code for
print(variable.functionName)
print(variable["functionName"])
print(variable[functionNameString])
is
main <var.lua:0,0> (14 instructions, 56 bytes at 0xafe530)
0+ params, 3 slots, 0 upvalues, 0 locals, 4 constants, 0 functions
1 [1] GETGLOBAL 0 -1 ; print
2 [1] GETGLOBAL 1 -2 ; variable
3 [1] GETTABLE 1 1 -3 ; "functionName"
4 [1] CALL 0 2 1
5 [2] GETGLOBAL 0 -1 ; print
6 [2] GETGLOBAL 1 -2 ; variable
7 [2] GETTABLE 1 1 -3 ; "functionName"
8 [2] CALL 0 2 1
9 [3] GETGLOBAL 0 -1 ; print
10 [3] GETGLOBAL 1 -2 ; variable
11 [3] GETGLOBAL 2 -4 ; functionNameString
12 [3] GETTABLE 1 1 2
13 [3] CALL 0 2 1
14 [3] RETURN 0 1
As you can see, the first two lines generate exactly the same byte code (and thus take the same amount of time), while the third line needs one additional instruction to fetch the global variable functionNameString.
The first line only works because "functionName" is a valid Lua identifier and not a reserved word. Lines 2 and 3 place no restrictions on the format of the string key.
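A small sketch of that equivalence and of the identifier restriction. The table contents and the keys "my key" and "end" are made up for illustration:

```lua
-- Hypothetical table used only for illustration.
local variable = {
  functionName = function() return "called" end,
  ["my key"] = "has a space",   -- not a valid identifier
  ["end"] = "reserved word",    -- reserved word, so variable.end is a syntax error
}
local functionNameString = "functionName"

-- All three forms reach the same field.
assert(variable.functionName == variable["functionName"])
assert(variable.functionName == variable[functionNameString])
print(variable.functionName())   --> called

-- Keys that are not valid identifiers need the bracket form.
print(variable["my key"], variable["end"])
```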

They are the same. From the manual:
... To represent records, Lua uses the field name as an index. The language supports this representation by providing a.name as syntactic sugar for a["name"].

Related

Can I avoid 2 lookups by using _G?

I've read that when you use a function like table.insert, Lua first tries to look up the variable in the local scope and then in the global scope. Can I bypass the local lookup by using _G.table.insert instead?
Here's the output of luac -l:
without _G
main <main.lua:0,0> (7 instructions at 0x55581ae50c60)
0+ params, 4 slots, 1 upvalue, 1 local, 3 constants, 0 functions
1 [1] NEWTABLE 0 0 0
2 [2] GETTABUP 1 0 -1 ; _ENV "table"
3 [2] GETTABLE 1 1 -2 ; "insert"
4 [2] MOVE 2 0
5 [2] LOADK 3 -3 ; 5
6 [2] CALL 1 3 1
7 [2] RETURN 0 1
with _G
main <main.lua:0,0> (8 instructions at 0x5562b3d6dc60)
0+ params, 4 slots, 1 upvalue, 1 local, 4 constants, 0 functions
1 [1] NEWTABLE 0 0 0
2 [2] GETTABUP 1 0 -1 ; _ENV "_G"
3 [2] GETTABLE 1 1 -2 ; "table"
4 [2] GETTABLE 1 1 -3 ; "insert"
5 [2] MOVE 2 0
6 [2] LOADK 3 -4 ; 5
7 [2] CALL 1 3 1
8 [2] RETURN 0 1
I'm not sure what the numbers mean.
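_G is not a reserved word; it is an ordinary global variable, so it has to be looked up like any other global. The compiler output shows the result: the version with _G needs one more GETTABLE instruction, so it is likely slower, not faster.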
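If the goal is fewer lookups per call, a common idiom is to cache the function in a local once instead of going through _G. A minimal sketch, assuming a standard environment where _G is available (the names insert and t are only for illustration):

```lua
-- Cache the function once: one global lookup plus one field lookup in total.
local insert = table.insert

local t = {}
for i = 1, 3 do
  insert(t, i * 10)   -- calls through the local, no table lookups here
end

-- In a standard environment _G.table is the very same table as table,
-- so going through _G only adds an indexing step.
assert(_G.table == table)
print(#t, t[1], t[3])   --> 3 10 30
```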

Is there a more efficient way to write this if (or) statement?

I'm looking to make sure that it is/isn't possible to write this statement more efficiently in Lua:
if (value == 1 or value == 2) then
something like this for an example (nonworking I assume):
if (value == (1 or 2)) then
or
if value == (1;2) then
Let's look at the produced bytecode as a proxy for speed. (Microbenchmarks are not reliable. Caching, pipelining, branch prediction, … can have really weird effects that can make code that should be slower in principle perform better in practice in the context where you're actually using it. Bytecode size also isn't a very good indicator (the same problems apply), but at least it's easy to produce, deterministic, and easy to interpret.)
(To follow along, throw your test files at luac -p -l, which will only parse (not write a compiled file) and list the resulting bytecode as a side-effect. If you want to understand the bytecode, have a look at the unofficial bytecode reference as initially created by Kein-Hong Man and kindly updated by Dibyendu Majumdar. But you don't have to.)
If value is a global variable, you'll get this:
1 [1] GETTABUP 0 0 -1 ; _ENV "value"
2 [1] EQ 1 0 -2 ; - 1 (fall through to next comparison)
3 [1] JMP 0 3 ; to 7 (true branch)
4 [1] GETTABUP 0 0 -1 ; _ENV "value"
5 [1] EQ 0 0 -3 ; - 2 (fall through into true branch)
6 [1] JMP 0 1 ; to 8 (beyond true branch)
Translating back into "pseudo-Lua", this is roughly
local r0 = _ENV["value"]
if r0 == 1 then goto true_branch end
r0 = _ENV["value"]
if r0 ~= 2 then goto fin end
::true_branch::
-- stuff here
::fin::
If value is local (or a function argument) in the function where you use this, then you'll get something like this instead:
1 [1] EQ 1 0 -1 ; - 1
2 [1] JMP 0 2 ; to 5
3 [1] EQ 0 0 -2 ; - 2
4 [1] JMP 0 1 ; to 6
or roughly
if r0 == 1 then goto true_branch end
if r0 ~= 2 then goto fin end
::true_branch::
-- stuff here
::fin::
"Much" better!
So if value is a global variable (or an upvalue), doing
local value = value
if value == 1 or value == 2 then
-- stuff
end
will give you
1 [1] GETTABUP 0 0 -1 ; _ENV "value"
2 [1] EQ 1 0 -2 ; - 1
3 [1] JMP 0 2 ; to 6
4 [1] EQ 0 0 -3 ; - 2
5 [1] JMP 0 1 ; to 7
or
local r0 = _ENV["value"]
if r0 == 1 then goto true_branch end
if r0 == 2 then goto true_branch end
goto fin
::true_branch::
-- stuff here
::fin::
which saves one lookup. (While microbenchmarks will show a clear difference, in practice you'll almost never notice it. If you're doing a deeper lookup, such as if foo.bar.baz == 1 or foo.bar.baz == 2 then, it makes sense to cache the value in a local first, and it will probably also improve readability.)
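A sketch of that last suggestion, assuming a hypothetical nested table foo. It also shows why the value == (1 or 2) spelling from the question does not do what was intended: 1 or 2 evaluates to 1 before the comparison happens.

```lua
-- Hypothetical nested data, for illustration only.
local foo = { bar = { baz = 2 } }

local function is_one_or_two()
  local baz = foo.bar.baz          -- do the chained lookup once
  return baz == 1 or baz == 2
end

print(is_one_or_two())             --> true

-- `1 or 2` is just 1, so this only ever tests for 1.
local value = 2
print(value == (1 or 2))           --> false
```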

Lua more than one locals in one line

Assuming we have the following code:
local x = 1
local x, y = 2, 3
I know x will become 2 after the second line. However, does the local on that line create a new x, or does it reuse the one from before?
They will be two different local variables: the first one is shadowed and no longer accessible, because the second one is declared with the same name in the same block. Here is the information that luac -l -l (Lua 5.3) shows for this script:
main <local.lua:0,0> (4 instructions at 00697ae8)
0+ params, 3 slots, 1 upvalue, 3 locals, 3 constants, 0 functions
1 [1] LOADK 0 -1 ; 1
2 [2] LOADK 1 -2 ; 2
3 [2] LOADK 2 -3 ; 3
4 [2] RETURN 0 1
constants (3) for 00697ae8:
1 1
2 2
3 3
locals (3) for 00697ae8:
0 x 2 5
1 x 4 5
2 y 4 5
upvalues (1) for 00697ae8:
0 _ENV 1 0
The locals section shows three variables, including two separate entries named x that have the same end-of-scope location.
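The same thing can be observed at run time. In this small sketch, the helper first_x is a name made up for illustration; it closes over the first x before the second one is declared:

```lua
local x = 1
local function first_x() return x end   -- closes over the first x

local x, y = 2, 3                       -- declares a second, separate x

-- The first x still holds 1; it is only hidden by name.
print(first_x(), x, y)                  --> 1 2 3
```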

Does the Lua compiler optimize local vars?

Is the current Lua compiler smart enough to optimize away local variables that are used for clarity?
local top = x - y
local bottom = x + y
someCall(top, bottom)
Or does inlining things by hand run faster?
someCall(x - y, x + y)
Since Lua often compiles source code into byte code on the fly, it is designed to be a fast single-pass compiler. It does do some constant folding, but other than that there are not many optimizations. You can usually check what the compiler does by executing luac -l -l -p file.lua and looking at the generated (disassembled) byte code.
In your case the Lua code
function a( x, y )
local top = x - y
local bottom = x + y
someCall(top, bottom)
end
function b( x, y )
someCall(x - y, x + y)
end
results in the following byte code listing when run through luac5.3 -l -l -p file.lua (some irrelevant parts skipped):
function <file.lua:1,5> (7 instructions at 0xcd7d30)
2 params, 7 slots, 1 upvalue, 4 locals, 1 constant, 0 functions
1 [2] SUB 2 0 1
2 [3] ADD 3 0 1
3 [4] GETTABUP 4 0 -1 ; _ENV "someCall"
4 [4] MOVE 5 2
5 [4] MOVE 6 3
6 [4] CALL 4 3 1
7 [5] RETURN 0 1
constants (1) for 0xcd7d30:
1 "someCall"
locals (4) for 0xcd7d30:
0 x 1 8
1 y 1 8
2 top 2 8
3 bottom 3 8
upvalues (1) for 0xcd7d30:
0 _ENV 0 0
function <file.lua:7,9> (5 instructions at 0xcd7f10)
2 params, 5 slots, 1 upvalue, 2 locals, 1 constant, 0 functions
1 [8] GETTABUP 2 0 -1 ; _ENV "someCall"
2 [8] SUB 3 0 1
3 [8] ADD 4 0 1
4 [8] CALL 2 3 1
5 [9] RETURN 0 1
constants (1) for 0xcd7f10:
1 "someCall"
locals (2) for 0xcd7f10:
0 x 1 6
1 y 1 6
upvalues (1) for 0xcd7f10:
0 _ENV 0 0
As you can see, the first variant (the a function) has two additional MOVE instructions, and two additional locals.
If you are interested in the details of the opcodes, you can check the comments for the OpCode enum in lopcodes.h.
E.g. the opcode format for OP_ADD is:
OP_ADD,/* A B C R(A) := RK(B) + RK(C) */
So the 2 [3] ADD 3 0 1 from above takes the values from registers 0 and 1 (the locals x and y in this case), adds them together, and stores the result in register 3. It is the second opcode in this function and the corresponding source code is on line 3.
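To check whether the two extra MOVE instructions matter on your machine, a rough timing sketch follows. The someCall defined here is a made-up stand-in, and because it is a local rather than a global, the call site compiles slightly differently from the listing above. Timings depend heavily on the interpreter version and on what the called function does, so treat the output as indicative only.

```lua
-- Stand-in for the real function; hypothetical, it only accumulates a sum.
local sum = 0
local function someCall(top, bottom) sum = sum + top * bottom end

local function a(x, y)
  local top = x - y
  local bottom = x + y
  someCall(top, bottom)
end

local function b(x, y)
  someCall(x - y, x + y)
end

-- Time n calls of f using CPU time from os.clock.
local function time(f, n)
  local start = os.clock()
  for i = 1, n do f(i, 1) end
  return os.clock() - start
end

local n = 200000
print("a:", time(a, n), "b:", time(b, n))
```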

Automatically learning clusters

Hi, complete newbie question here. I have a table consisting of two columns. The first column holds "bins" that are coded by where the fruit flies live. The second column is either 0 or 1: neutral vs. really likes sugar, respectively. I have two questions:
1) I suspect that there is a single variable, something about where they live, that determines how much they like sugar. Is there a way to have the computer group the bins into just 2 clusters, the bins that like sugar vs. the neutral ones? That way we can do further experiments to determine what it is about the bins.
2) Is there a way to automatically determine how many clusters there might be driving this behavior? For example, maybe there are 4 variables (4 clusters) that determine the outcome of sugar preference.
Apologies if this is trivial. The table is listed below. Thanks!
Bin sugar
1 1
1 1
1 0
1 0
2 1
2 0
2 0
3 1
3 0
3 1
3 1
4 1
4 1
4 1
5 1
5 0
5 1
6 0
6 0
6 0
7 0
7 1
7 1
8 1
8 0
8 1
9 1
9 0
9 0
9 0
10 0
10 0
10 0
11 1
11 1
11 1
12 0
12 0
12 0
12 0
13 0
13 0
13 1
13 0
13 0
14 0
14 0
14 0
14 0
15 1
15 0
15 0
16 1
16 1
17 1
17 1
18 0
18 1
18 1
17 1
19 1
20 1
20 0
20 0
20 1
21 0
21 0
21 1
21 0
22 1
22 0
22 1
22 1
23 1
23 1
24 1
24 0
25 0
25 1
25 0
26 1
26 1
27 1
27 1
Okay, assuming I understood what you meant, problem 1) can be approached using Bayes' theorem.
Say event L is "a fly likes sugar", event B is "a fly is in bin B".
So what you have is:
number of flies = 84
size of each bin = (e.g. size of bin 1: 4)
probability that a fly likes sugar:
P(L) = flies that like sugar / total number of flies = 43/84
probability that a fly doesn't like sugar:
P(notL) = 1 - P(L) = 41/84
probability that a fly is in a given bin:
P(B) = size of the bin / sum of the sizes of all bins = 4/84 (for bin 1)
probability that a fly isn't in a given bin:
P(notB) = 1 - P(B) = 80/84 (for bin 1)
probability that a fly likes sugar, knowing that's in bin B:
P(L|B) = flies that like sugar in a bin / size of the bin
(eg for bin 1 is 2/4 = 1/2)
probability that a fly likes sugar, knowing that it's not in bin B:
P(L|notB) = (total flies that like sugar - flies that like sugar in the bin) / (sum of the sizes of all bins - size of the bin) = 41/80 (for bin 1)
You want to know the probability that a fly is in a given bin B, knowing that it likes sugar, which you can obtain with:
P(B|L) = (P(L|B) * P(B)) / (P(L|B) * P(B) + P(L|notB) * P(notB))
If you compute P(B|L) and P(B|notL) for each bin, then you know which of the bins have the highest probability of containing flies that like sugar. Then you can further study those bins.
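As a worked example, here is a short Lua sketch that evaluates the formula above for a single bin. The totals 84 and 43 are the counts given earlier; the function name p_bin_given_likes is made up for illustration.

```lua
-- Counts taken from the table above.
local total_flies = 84
local total_likes = 43

-- P(B|L) for one bin, following the Bayes formula as written above.
local function p_bin_given_likes(bin_size, bin_likes)
  local p_b = bin_size / total_flies
  local p_not_b = 1 - p_b
  local p_l_given_b = bin_likes / bin_size
  local p_l_given_not_b = (total_likes - bin_likes) / (total_flies - bin_size)
  return (p_l_given_b * p_b) / (p_l_given_b * p_b + p_l_given_not_b * p_not_b)
end

-- Bin 1 has 4 flies, 2 of which like sugar.
print(p_bin_given_likes(4, 2))   -- equals 2/43, roughly 0.0465
```

Note that the denominator always works out to P(L) = 43/84, so P(B|L) reduces to (flies that like sugar in the bin) / (all flies that like sugar).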
I hope I was clear; my statistics is a bit rusty and I'm not even sure I am doing everything correctly. Take it as a hint to point you in the right direction.
You can refer here to get more accurate reasoning and results.
As for problem 2)... I have to think about it a bit more.
