Does the Lua compiler optimize local vars? - lua

Is the current Lua compiler smart enough to optimize away local variables that are used for clarity?
local top = x - y
local bottom = x + y
someCall(top, bottom)
Or does inlining things by hand run faster?
someCall(x - y, x + y)

Since Lua often compiles source code into byte code on the fly, it is designed to be a fast single-pass compiler. It does do some constant folding, but other than that there are not many optimizations. You can usually check what the compiler does by executing luac -l -l -p file.lua and looking at the generated (disassembled) byte code.
In your case the Lua code
function a( x, y )
local top = x - y
local bottom = x + y
someCall(top, bottom)
end
function b( x, y )
someCall(x - y, x + y)
end
results int the following byte code listing when run through luac5.3 -l -l -p file.lua (some irrelevant parts skipped):
function <file.lua:1,5> (7 instructions at 0xcd7d30)
2 params, 7 slots, 1 upvalue, 4 locals, 1 constant, 0 functions
1 [2] SUB 2 0 1
2 [3] ADD 3 0 1
3 [4] GETTABUP 4 0 -1 ; _ENV "someCall"
4 [4] MOVE 5 2
5 [4] MOVE 6 3
6 [4] CALL 4 3 1
7 [5] RETURN 0 1
constants (1) for 0xcd7d30:
1 "someCall"
locals (4) for 0xcd7d30:
0 x 1 8
1 y 1 8
2 top 2 8
3 bottom 3 8
upvalues (1) for 0xcd7d30:
0 _ENV 0 0
function <file.lua:7,9> (5 instructions at 0xcd7f10)
2 params, 5 slots, 1 upvalue, 2 locals, 1 constant, 0 functions
1 [8] GETTABUP 2 0 -1 ; _ENV "someCall"
2 [8] SUB 3 0 1
3 [8] ADD 4 0 1
4 [8] CALL 2 3 1
5 [9] RETURN 0 1
constants (1) for 0xcd7f10:
1 "someCall"
locals (2) for 0xcd7f10:
0 x 1 6
1 y 1 6
upvalues (1) for 0xcd7f10:
0 _ENV 0 0
As you can see, the first variant (the a function) has two additional MOVE instructions, and two additional locals.
If you are interested in the details of the opcodes, you can check the comments for the OpCode enum in lopcodes.h.
E.g. the opcode format for OP_ADD is:
OP_ADD,/* A B C R(A) := RK(B) + RK(C) */
So the 2 [3] ADD 3 0 1 from above takes the values from registers 0 and 1 (the locals x and y in this case), adds them together, and stores the result in register 3. It is the second opcode in this function and the corresponding source code is on line 3.

Related

Can i avoid 2 lookups by using _G?

I've read that if you use some functions like table.insert, lua will first try to lookup the variable in the local scope, then in global scope. Can i bypass the local lookup by using _G.table.insert instead?
Here's the output of luac -l:
without _G
main <main.lua:0,0> (7 instructions at 0x55581ae50c60)
0+ params, 4 slots, 1 upvalue, 1 local, 3 constants, 0 functions
1 [1] NEWTABLE 0 0 0
2 [2] GETTABUP 1 0 -1 ; _ENV "table"
3 [2] GETTABLE 1 1 -2 ; "insert"
4 [2] MOVE 2 0
5 [2] LOADK 3 -3 ; 5
6 [2] CALL 1 3 1
7 [2] RETURN 0 1
with _G
main <main.lua:0,0> (8 instructions at 0x5562b3d6dc60)
0+ params, 4 slots, 1 upvalue, 1 local, 4 constants, 0 functions
1 [1] NEWTABLE 0 0 0
2 [2] GETTABUP 1 0 -1 ; _ENV "_G"
3 [2] GETTABLE 1 1 -2 ; "table"
4 [2] GETTABLE 1 1 -3 ; "insert"
5 [2] MOVE 2 0
6 [2] LOADK 3 -4 ; 5
7 [2] CALL 1 3 1
8 [2] RETURN 0 1
I'm not sure what the numbers mean.
_G is not reserved and it's probably even worse with it, as i can see from the compiler output. I think doing it with _G is even slower.

Is there a more efficient way to write this if (or) statement?

I'm looking to make sure that it is/isn't possible to write this statement more efficiently in Lua:
if (value == 1 or value == 2) then
something like this for an example (nonworking I assume):
if (value == (1 or 2)) then
or
if value == (1;2) then
Let's look at the produced bytecode as a proxy for speed. (Microbenchmarks are not reliable. Caching, pipelinging, branch prediction, … can have really weird effects that can make code that should be slower in principle perform better in practice in the context where you're actually using it. Bytecode size also isn't a very good indicator (same problems apply), but at least it's easy to produce, deterministic, and easy to interpret.)
(To follow along, throw your test files at luac -p -l, which will only parse (not write a compiled file) and list the resulting bytecode as a side-effect. If you want to understand the bytecode, have a look at the unofficial bytecode reference as initially created by Kein-Hong Man and kindly updated by Dibyendu Majumdar. But you don't have to.)
If value is a global variable, you'll get this:
1 [1] GETTABUP 0 0 -1 ; _ENV "value"
2 [1] EQ 1 0 -2 ; - 1 (fall through to next comparison)
3 [1] JMP 0 3 ; to 7 (true branch)
4 [1] GETTABUP 0 0 -1 ; _ENV "value"
5 [1] EQ 0 0 -3 ; - 2 (fall through into true branch)
6 [1] JMP 0 1 ; to 8 (beyond true branch)
Translating back into "pseudo-Lua", this is rougly
local r0 = _ENV["value"]
if r0 == 1 then goto true_branch end
local r0 = _ENV["value"]
if r0 ~= 2 then goto fin end
::true_branch::
-- stuff here
::fin::
If value is local (or a function argument) in the function where you use this, then you'll get something like this instead:
1 [1] EQ 1 0 -1 ; - 1
2 [1] JMP 0 2 ; to 5
3 [1] EQ 0 0 -2 ; - 2
4 [1] JMP 0 1 ; to 6
or roughly
if r0 == 1 then goto true_branch end
if r0 ~= 2 then goto fin end
::true_branch::
-- stuff here
::fin::
"Much" better!
So if value is a global variable (or an upvalue), doing
local value = value
if value == 1 or value == 2 then
-- stuff
end
will give you
1 [1] GETTABUP 0 0 -1 ; _ENV "value"
2 [1] EQ 1 0 -2 ; - 1
3 [1] JMP 0 2 ; to 6
4 [1] EQ 0 0 -3 ; - 2
5 [1] JMP 0 1 ; to 7
or
local r0 = _ENV["value"]
if r0 == 1 then goto true_branch end
if r0 == 2 then goto true_branch end
goto fin
::true_branch::
-- stuff here
::fin::
which saves one lookup. (While microbenchmarks will show a clear difference, in practice you'll almost never notice a difference. If you're doing a deeper lookup if foo.bar.baz == 1 or foo.bar.baz == 2 then it makes sense to local first, and it will probably also increase readability.)

Lua more than one locals in one line

Assuming we have the following code:
local x = 1
local x, y = 2, 3
I know x will become 2 after the second line, however, does the local on the that line create a new x, or use the one before?
They will be two different local values: the first one will be shadowed and not accessible as the second one is created with the same name in the same block. Here is the information that luac -l -l (Lua 5.3) shows for this script:
main <local.lua:0,0> (4 instructions at 00697ae8)
0+ params, 3 slots, 1 upvalue, 3 locals, 3 constants, 0 functions
1 [1] LOADK 0 -1 ; 1
2 [2] LOADK 1 -2 ; 2
3 [2] LOADK 2 -3 ; 3
4 [2] RETURN 0 1
constants (3) for 00697ae8:
1 1
2 2
3 3
locals (3) for 00697ae8:
0 x 2 5
1 x 4 5
2 y 4 5
upvalues (1) for 00697ae8:
0 _ENV 1 0
The locals section shows three variables with two x that have the same end-of-scope location.

Difference between variable.functionName and variable["functionName"]

I know that you can get variables and call functions both by using the name directly
variable.functionName
or using the name as a string
variable["functionName"] or variable[functionNameString]
Now my question is:
Is there any resulting difference in these different ways or are they completely interchangeable?
I am mostly interested about performance here, but any enlightenment is welcome.
The PUC-Rio Lua 5.1 byte code for
print(variable.functionName)
print(variable["functionName"])
print(variable[functionNameString])
is
main <var.lua:0,0> (14 instructions, 56 bytes at 0xafe530)
0+ params, 3 slots, 0 upvalues, 0 locals, 4 constants, 0 functions
1 [1] GETGLOBAL 0 -1 ; print
2 [1] GETGLOBAL 1 -2 ; variable
3 [1] GETTABLE 1 1 -3 ; "functionName"
4 [1] CALL 0 2 1
5 [2] GETGLOBAL 0 -1 ; print
6 [2] GETGLOBAL 1 -2 ; variable
7 [2] GETTABLE 1 1 -3 ; "functionName"
8 [2] CALL 0 2 1
9 [3] GETGLOBAL 0 -1 ; print
10 [3] GETGLOBAL 1 -2 ; variable
11 [3] GETGLOBAL 2 -4 ; functionNameString
12 [3] GETTABLE 1 1 2
13 [3] CALL 0 2 1
14 [3] RETURN 0 1
As you can see the first two lines generate exactly the same byte code (and thus take the same amount of time), while the third line has an additional (global) variable access.
The first line only works since "functionName" is a valid Lua identifier and not a reserved word. Lines 2 and 3 don't have restrictions about the format of the string key.
They are the same. From the manual:
... To represent records, Lua uses the field name as an index. The language supports this representation by providing a.name as syntactic sugar for a["name"].

Quick way to determine the number of k-length paths from A to B in a dense complete graph

Given a complete dense graph (over 250.000 nodes) , what is the quickest way to determine the number of k-length paths from node A to B ?
I understand this is an old post, but I had the exact same question and could not find the answer.
I like to think of this problem as a "permutation without repetition", as the order of the nodes visited matters (permutation) and we aren't backtracking (no repetitions). The number of permutations without repetition is: n!/(n-r)!
For a complete graph with N nodes, there are N - 2 remaining nodes to choose from when creating a path between a given A and B. To create a path of length K, K-1 nodes must be chosen from the remaining nodes after A and B are excluded. Therefore, in this context, n = N - 2, and r = k - 1.
Plugging into the above formula yields:
(N-2)!/(N-K-1)!
Example: for N = 5, with nodes 0,1,2,3,4 the following paths are possible from 0 to 1:
0 1
0 2 1
0 2 3 1
0 2 3 4 1
0 2 4 1
0 2 4 3 1
0 3 1
0 3 2 1
0 3 2 4 1
0 3 4 1
0 3 4 2 1
0 4 1
0 4 2 1
0 4 2 3 1
0 4 3 1
0 4 3 2 1
This yields 1 path of length 1, 3 paths of length 2, 6 paths of length 3, and 6 paths of length 4.
This appears to work for any N>=2 and K<=N-1.
You can use basically dynamic programming: For each node Y and path length k, you can compute the number of paths from A to Y of length k if you know the number of paths from A to X of path length k-1 for all nodes X. Total complexity is O(KV), where K is the total path length you are trying to compute for and V is the number of vertices.

Resources