Why do we need to call Lua's collectgarbage() twice? - lua

I have encountered several places where people call collectgarbage() twice to finalize all unused objects.
Why is that? Why isn't a single call enough? Why not three calls?
When I try the following code (on Lua 5.2), the object gets finalized (meaning: its __gc gets called) with just a single call to collectgarbage:
do
local x = setmetatable({},{
__gc = function() print("works") end
})
end
collectgarbage()
os.exit()
Does this mean one call is enough?

It's explained in Programming in Lua 3rd edition §17.6 Finalizers. In short, it's because of resurrection.
A finalizer is a function associated with an object that is called when that object is about to be collected. Lua implements finalizers with the __gc metamethod.
The problem is that the object must still be alive while its finalizer runs. PiL explains it with this example:
A = {x = "this is A"}
B = {f = A}
setmetatable(B, {__gc = function (o) print(o.f.x) end})
A, B = nil
collectgarbage() --> this is A
The finalizer for B accesses A, so A cannot be collected before the finalization of B. Lua must resurrect both B and A before running that finalizer.
Resurrection is the reason for calling collectgarbage twice:
Because of resurrection, objects with finalizers are collected in two phases.
The first time the collector detects that an object with a finalizer is not reachable, the collector resurrects the object and queues it to be finalized. Once its finalizer runs, Lua marks the object as finalized. The next time the collector detects that the object is not reachable, it deletes the object. If you want to ensure that all garbage in your program has been actually released, you must call collectgarbage twice; the second call will delete the objects that were finalized during the first call.
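The two phases can be observed from Lua itself. The following is only a sketch, assuming Lua 5.2 or later (where tables can have __gc finalizers) and relying on the documented rule that resurrected objects are removed from weak keys only in the collection after their finalizer has run. The names registry, make_garbage and finalizer_runs are made up for this illustration.

```lua
-- Assumption: Lua 5.2+ (tables may carry a __gc finalizer).
-- A weak-keyed table lets us see whether the object still exists
-- without keeping it alive ourselves.
local registry = setmetatable({}, {__mode = "k"})
local finalizer_runs = 0

local function make_garbage()
  local obj = setmetatable({}, {
    __gc = function() finalizer_runs = finalizer_runs + 1 end
  })
  registry[obj] = true
end
make_garbage()  -- obj is unreachable once this returns

collectgarbage()  -- phase 1: obj is resurrected and its finalizer runs
local runs_after_first = finalizer_runs
local present_after_first = next(registry) ~= nil

collectgarbage()  -- phase 2: the finalized obj is actually freed
local present_after_second = next(registry) ~= nil

print(runs_after_first, present_after_first, present_after_second)
```

On the reference implementation this should print 1, true, false: the finalizer has already run after the first call, but the object's memory is only reclaimed by the second.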

Related

Distinguish function vs closure

Lua will write the code of a function out as bytes using string.dump, but warns that this does not work if there are any upvalues. Various snippets online describe hacking around this with debug. It looks like 'closed over variables' are called 'upvalues', which seems clear enough. Code is not data etc.
I'd like to serialise functions and don't need them to have any upvalues. The serialised function can take a table as an argument that gets serialised separately.
How do I detect attempts to pass closures to string.dump, before calling the broken result later?
Current thought is debug.getupvalue at index 1 and treat nil as meaning function, as opposed to closure, but I'd rather not call into the debug interface if there's an alternative.
Thanks!
Even with debug library it's very difficult to say whether a function has a non-trivial upvalue.
"Non-trivial" means "upvalue except _ENV".
When debug info is stripped from your program, all upvalues look almost the same :-)
local function test()
local function f1()
-- usual function without upvalues (except _ENV for accessing globals)
print("Hello")
end
local upv = {}
local function f2()
-- this function does have an upvalue
upv[#upv+1] = ""
end
-- display first upvalues
print(debug.getupvalue (f1, 1))
print(debug.getupvalue (f2, 1))
end
test()
local test_stripped = load(string.dump(test, true))
test_stripped()
Output:
_ENV table: 00000242bf521a80 -- test f1
upv table: 00000242bf529490 -- test f2
(no name) table: 00000242bf521a80 -- test_stripped f1
(no name) table: 00000242bf528e90 -- test_stripped f2
The first two lines of the output are printed by test(), the last two lines by test_stripped().
As you can see, inside test_stripped the functions f1 and f2 are almost indistinguishable.
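Since names are unreliable, one heuristic is to compare upvalue values instead: treat a function as safe to dump if every upvalue it has is the globals table. This is only a sketch, assuming Lua 5.2+ (where global access goes through _ENV); has_nontrivial_upvalue is a hypothetical helper, and it will misclassify a closure whose only captured value happens to be the globals table itself.

```lua
-- Assumption: Lua 5.2+; _ENV here is the environment of this chunk.
-- Hypothetical helper: true if f captures anything other than _ENV.
local function has_nontrivial_upvalue(f)
  local nups = debug.getinfo(f, "u").nups  -- number of upvalues of f
  for i = 1, nups do
    local _, value = debug.getupvalue(f, i)
    if value ~= _ENV then
      return true
    end
  end
  return false
end

local counter = 0
local function pure(t) return t.x + 1 end           -- no upvalues at all
local function uses_global() return tostring(1) end -- only _ENV
local function closure()                            -- captures counter
  counter = counter + 1
  return counter
end

print(has_nontrivial_upvalue(pure),
      has_nontrivial_upvalue(uses_global),
      has_nontrivial_upvalue(closure))  --> false  false  true
```

This still goes through the debug library, so it does not remove that dependency; it only avoids relying on upvalue names, which are lost when debug info is stripped.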

Strange behavior caused by debug.getinfo(1, "n").name

I learned how to get the function name inside a function by using debug.getinfo(1, "n").name.
Using this feature, I ran into some strange behavior in Lua.
Here's my code:
function myFunc()
local name = debug.getinfo(1, "n").name
return name
end
function foo()
return myFunc()
end
function boo()
local name = myFunc()
return name
end
print(foo())
print(boo())
Result:
nil
myFunc
As you can see, the functions foo() and boo() call the same function myFunc(), but they return different results.
If I replace debug.getinfo(1, "n").name with some other string, both return the same result as expected, so I don't understand the behavior caused by debug.getinfo().
Is it possible to correct myFunc() function so calling both foo() and boo() functions return the same result?
Expected result:
myFunc
myFunc
In Lua, any return statement of the form return <expression_yielding_a_function>(...) is a "tail call". Tail calls essentially don't exist in the call stack, so they take up no additional space or resources. The function you call effectively gets erased from the debug information.
Is it possible to correct myFunc() function so calling both foo() and boo() functions return the same result?
Um... yes, but before I tell you how, allow me to try to convince you not to do this.
As previously mentioned, tail calls are part of the Lua language. The removal of tail calls from the stack is not an "optimization" any more than it is an "optimization" for a for loop to exit when you use break. It is a part of Lua's grammar, and Lua programmers have just as much a right to expect a tail call to be a tail call as they have the right to expect break to exit loops.
Lua, as a language, specifically states that this:
local function recursive(...)
--some terminating condition
return recursive(modified_args)
end
will never, ever, run out of stack space. It will be just as stack space efficient as performing a loop. This is a part of the Lua language, just as much a part of it as the behavior of for and while.
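As a small illustration of that guarantee, here is a sketch that runs a million tail calls and compares it with a version that is not a tail call. The function names are made up for the example, and the depth of one million is chosen on the assumption that it exceeds the interpreter's stack limit for ordinary recursion, as it does in the reference implementation.

```lua
-- Proper tail call: the recursive call is the whole return expression.
local function count_up(n, acc)
  if n == 0 then return acc end
  return count_up(n - 1, acc + 1)
end

-- Not a tail call: the addition still has to happen after the call returns.
local function count_up_no_tail(n)
  if n == 0 then return 0 end
  return 1 + count_up_no_tail(n - 1)
end

local total = count_up(1000000, 0)           -- runs in constant stack space
local ok = pcall(count_up_no_tail, 1000000)  -- expected to fail: stack overflow

print(total, ok)  --> 1000000  false
```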
If a user wants to call your function via a tail call, that is their right as the user of a language that makes tail calls a thing. Denying users of a language the right to use the features of that language is rude.
So don't do it.
Furthermore, your code suggests that you are attempting to rely on functions having names. That you're doing something significant and meaningful with those names.
Well, Lua is not Python; Lua functions do not have to have names, period. As such, you should not write code that meaningfully relies upon the name of a function. For debugging or logging purposes, fine. But you should not break user expectations just for debugging and logging. So if the user made a tail call, just accept that's what the user wanted and that your debugging/logging will suffer slightly.
OK, so, do we agree that you shouldn't do this? That Lua users have the right to tail calls, and you don't have the right to deny them? That Lua functions are not named and you shouldn't write code that requires them to maintain a name? OK?
What follows is terrible code that you should never use! (in Lua 5.3):
function bypass_tail_call(Func)
local function tail_call_bypass(...)
local rets = table.pack(Func(...))
return table.unpack(rets, 1, rets.n)
end
return tail_call_bypass
end
Then, simply replace your real function with the return of the bypass:
function myFunc()
local name = debug.getinfo(1, "n").name
return name
end
myFunc = bypass_tail_call(myFunc)
Note that the bypass function has to build an array to hold the return values, then unpack them into the final return statement. This obviously requires additional memory allocations that don't have to happen in regular code.
So there's another reason not to do this.
You can run your code through luac -l -p
...
function <stdin:6,8> (4 instructions at 0x555f561592a0)
0 params, 2 slots, 1 upvalue, 0 locals, 1 constant, 0 functions
1 [7] GETTABUP 0 0 -1 ; _ENV "myFunc"
2 [7] TAILCALL 0 1 0
3 [7] RETURN 0 0
4 [8] RETURN 0 1
function <stdin:10,13> (4 instructions at 0x555f561593b0)
0 params, 2 slots, 1 upvalue, 1 local, 1 constant, 0 functions
1 [11] GETTABUP 0 0 -1 ; _ENV "myFunc"
2 [11] CALL 0 1 2
3 [12] RETURN 0 2
4 [13] RETURN 0 1
Those are the two functions that are of interest to us: foo and boo.
As you can see, when boo calls myFunc, it's just a normal CALL, so nothing interesting there.
foo, however, does something called a tail call. That is, the return value of foo is the return value of myFunc.
What makes this kind of call special is that there is no need for the program to jump back into foo; once foo calls myFunc it can just hand over the keys and say "You know what to do"; myFunc then returns its results directly to where foo was called. This has two advantages:
- The stack frame of foo can be cleaned up before myFunc is called.
- Once myFunc returns, it doesn't need two jumps to return to the main thread; only one.
Both of those are insignificant in examples like yours, but once you have a chain of lots and lots of tail calls, it becomes significant.
The downside of this is that, once the stack of foo gets cleaned up, Lua also forgets all the debugging information associated with it; it only remembers that myFunc was called as a tail call, but not from where.
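What Lua does remember can be queried directly. A minimal sketch, assuming Lua 5.2 or later, where debug.getinfo accepts the "t" option and fills in the istailcall field; the function names here are made up for the example.

```lua
-- Assumption: Lua 5.2+ (the "t" option and the istailcall field).
local function callee()
  local info = debug.getinfo(1, "t")
  return info.istailcall  -- true if this invocation came from a tail call
end

local function via_tail_call()
  return callee()          -- compiled as TAILCALL
end

local function via_plain_call()
  local result = callee()  -- compiled as a normal CALL
  return result
end

print(via_tail_call(), via_plain_call())  --> true  false
```

So a function can at least detect that its caller's frame is gone, even though it cannot recover the name it was called by.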
An interesting side note is that boo is almost a tail call too. If Lua didn't have multiple return values, it would be exactly identical to foo, and a smarter compiler like LuaJIT might compile it to a tail call. PUC Lua won't, though, since it needs a literal return some_function() to recognize the tail call.
The difference is that boo only returns the first value returned by myFunc, and while in your example there will only ever be one, the interpreter can't make that assumption (LuaJIT might make that assumption during JIT compilation, but that's beyond my understanding).
Also note that, technically, the word tail call just describes a function A directly returning the return value of another function B.
It often gets used interchangeably with tail call optimization, which is what the compiler does when it re-uses the stack frame and turns the function call into a jump.
Strictly speaking, C (for example) has tail calls, but it has no tail call optimization, meaning something like
int recursive(int n) { return recursive(n+1); }
is valid C code, but will eventually cause a stack overflow, while in Lua
local function recursive(n) return recursive(n+1) end
will just run forever. Both are tail calls, but only the second gets optimized.
EDIT: As always with C, some compilers may, on their own, implement tail call optimization, so don't go around telling everyone that "C never ever does it"; it's just not a required part of the language, while in Lua it's actually defined in the language specification, so it's not Lua until it has TCO.
This is a result of tail call optimisation, which Lua does.
In this case, Lua translates the function call into a "goto" statement, and does not use any extra stack frame to perform the tail call.
You can add a traceback call to check this:
function myFunc()
local name = debug.getinfo(1, "n").name
print(debug.traceback("Stack trace"))
return name
end
Tail call optimisation happens in Lua when you return with a function call:
-- Optimized
function good1()
return test()
end
-- Optimized
function good2()
return test(foo(), bar(5 + baz()))
end
-- Not optimised
function bad1()
return test() + 1
end
-- Not optimised
function bad2()
return test()[2] + foo()
end
You can refer to the following links for more information:
- Programming in Lua - 6.3: Proper Tail Calls
- What is tail call optimisation? - Stack Overflow

Lua: Skip variable declaration

I am trying to "Skip" a variable, by either never declaring it or just having it garbage collected immediately, but I don't know if it's possible.
Example:
function TestFunc()
return 1, 2
end
function SecondFunction()
local nodeclare, var = TestFunc()
end
Basically what I wanted was for "nodeclare" to not even exist. So if I did print(nodeclare, var) it would print nil, 2.
The same would apply if I was doing a pairs loop and didn't need the key.
Is there some special thing I can put as the variable name for this to happen? If say I was doing a pairs loop over 100 values, would that even have a significant impact?
First of all, variables are not garbage collected, objects are. In this case, there's nothing to garbage collect.
However, let's say that TestFunc was creating objects (say, tables):
function TestFunc()
return {1}, {2}
end
function SecondFunction()
local nodeclare, var = TestFunc()
end
Now nodeclare is referencing a table returned by TestFunc. That's an object, allocated on the heap, that we don't want hanging around forever.
That object will eventually be collected if there is nothing left referring to it. In your case, as soon as SecondFunction returns, the local nodeclare goes out of scope and goes away. As long as there's nothing else referencing that table, the table will be collected (during next collection cycle).
You can avoid declaring nodeclare entirely by skipping the first return value of TestFunc like this:
local var = select(2, TestFunc())
However, when you're talking about a temporary local variable, as in your example, you normally just create the temporary variable then ignore it. This avoids the overhead of the call to select. Sometimes you use a variable name that indicates it's trash:
local _, var = TestFunc()
If say I was doing a pairs loop over 100 values, would that even have a significant impact?
None whatsoever. You're just continually overwriting the value of a local variable.
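To put the two options side by side, here is a short sketch; test_func and the sample values are made up for the illustration. Note that select(2, ...) returns every value from the second one onward, so the number of variables on the left (or a pair of parentheses) decides how many are kept.

```lua
local function test_func()
  return "first", "second", "third"
end

-- select(2, ...) skips the first value and returns the rest.
local a, b = select(2, test_func())          -- "second", "third"
local only_second = (select(2, test_func())) -- parentheses keep one value

-- The conventional throwaway name: _ is an ordinary local that is ignored.
local _, var = test_func()

-- Same idea in a loop when the key is not needed.
local sum = 0
for _, value in pairs({10, 20, 30}) do
  sum = sum + value
end

print(a, b, only_second, var, sum)  --> second  third  second  second  60
```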
What impact do you mean exactly? Memory? Performance?
According to the Programming in Lua book, you can sort of skip the second return value, but not ignore the first and use the second:
x,y = foo2() -- x='a', y='b'
x = foo2() -- x='a', 'b' is discarded
x,y,z = 10,foo2() -- x=10, y='a', z='b'

Lua table memory leak?

I have a memory leak issue involving a Lua table. The code is below:
function workerProc()
-- a table holds some objects (userdata, the __gc is implemented correctly)
local objs = {createObj(), createObj(), ...}
while isWorking() do
-- ...
local query = {unpack(objs)}
repeat
-- ...
table.remove(query, queryIndex)
until #query == 0
sleep(1000)
end
end
The table objs is initialized with some userdata objects, and these objects stay reachable throughout the while loop, so no gc will be performed on them. Inside the while loop, the query table is initialized with all the elements from objs (using the unpack function). While running the script I found that memory usage keeps increasing, but when I comment out local query = {unpack(objs)} the growth disappears.
I don't think this piece of code has a memory leak, because query is a local variable and should become unreachable after each iteration of the while loop, yet memory still grows. Does anybody know why the memory is swallowed by that table?
Judging from your code example, the likely explanation is that the gc doesn't get a chance to perform a full collection cycle while inside the loop.
You can force a collection right after the inner loop using collectgarbage() and see if that resolves the memory issue:
while isWorking() do
-- ..
local query = {unpack(objs)}
repeat
-- ..
table.remove(query, queryIndex)
until #query == 0
collectgarbage()
sleep(1000)
end
Another possibility is to move local query outside the loop and create the table once, instead of creating a new table on every iteration of the outer loop.
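To tell uncollected garbage apart from a real leak, you can watch collectgarbage("count") around a forced collection. The sketch below is an illustration only: plain tables stand in for the userdata objects, churn is a made-up helper that mimics the loop body, and the collector is stopped on purpose so the effect is easy to see.

```lua
local unpack = table.unpack or unpack  -- Lua 5.1 vs 5.2+

-- Stand-ins for the long-lived userdata objects.
local objs = {}
for i = 1, 100 do objs[i] = {} end

-- Hypothetical helper mimicking the loop body: build a temporary
-- query table from objs, then drain it.
local function churn(rounds)
  for _ = 1, rounds do
    local query = {unpack(objs)}
    while #query > 0 do
      table.remove(query)
    end
  end
end

collectgarbage()
collectgarbage("stop")     -- keep the collector from running during churn
churn(1000)
local before = collectgarbage("count")  -- KB in use, garbage included
collectgarbage("restart")
collectgarbage()           -- full cycle reclaims the temporary tables
local after = collectgarbage("count")

print(before > after)  --> true
```

If memory drops back after the forced collection, the temporary tables were garbage waiting to be collected and not a leak.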

Lua: lua_resume and lua_yield argument purposes

What is the purpose of passing arguments to lua_resume and lua_yield?
I understand that on the first call to lua_resume the arguments are passed to the lua function that is being resumed. This makes sense. However I'd expect that all subsequent calls to lua_resume would "update" the arguments in the coroutine's function. However that's not the case.
What is the purpose of passing arguments to lua_resume for lua_yield to return? Can the lua function running under the coroutine have access to the arguments passed by lua_resume?
What Nicol said. You can still preserve the values from the first resume call if you want:
do
local firstcall
function willyield(a)
firstcall = a
while a do
print(a, firstcall)
a = coroutine.yield()
end
end
end
local coro = coroutine.create(willyield)
coroutine.resume(coro, 1)
coroutine.resume(coro, 10)
coroutine.resume(coro, 100)
coroutine.resume(coro)
will print
1 1
10 1
100 1
Lua cannot magically give the original arguments new values. They might not even be on the stack anymore, depending on optimizations. Furthermore, there's no indication where the code was when it yielded, so it may not be able to see those arguments anymore. For example, if the coroutine called a function, that new function can't see the arguments passed into the old one.
coroutine.yield() returns the arguments passed to the resume call that continues the coroutine, so that the site of the yield call can handle parameters as it so desires. It allows the code doing the resuming to communicate with the specific code doing the yielding. yield() passes its arguments as return values from resume, and resume passes its arguments as return values to yield. This sets up a pathway of communication.
You can't do that in any other way. Certainly not by modifying arguments that may not be visible from the yield site. It's simple, elegant, and makes sense.
Also, it's considered exceedingly rude to go poking at someone's values. Especially a function already in operation. Remember: arguments are just local variables filled with values. The user shouldn't expect the contents of those variables to change unless it changes them itself. They're local variables, after all. They can only be changed locally; hence the name.
A simple example:
co = coroutine.create (function (a, b)
print("First args: ", a, b)
coroutine.yield(a+10, b+10)
print("Second args: ", a, b)
coroutine.yield(a+10, b+10)
end)
print(coroutine.resume(co, 1, 2))
print(coroutine.resume(co, 3, 4))
Prints:
First args: 1 2
true 11 12
Second args: 1 2
true 11 12
This shows that the original values of the arguments a and b did not change.
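The values 3 and 4 passed to the second resume in the example above are not lost; they simply arrive as the return values of the first coroutine.yield, which that example discards. A small variation that captures them (the variable names are made up for the illustration):

```lua
local c, d  -- will receive the arguments of the second resume

local co = coroutine.create(function(a, b)
  -- yield hands a+10, b+10 to the resumer and later returns
  -- whatever the next resume passes in.
  c, d = coroutine.yield(a + 10, b + 10)
  return a, b  -- the original arguments are unchanged
end)

local ok1, x, y = coroutine.resume(co, 1, 2)  -- starts the body: a=1, b=2
local ok2, a2, b2 = coroutine.resume(co, 3, 4) -- yield returns 3, 4

print(x, y)    --> 11  12
print(c, d)    --> 3   4
print(a2, b2)  --> 1   2
```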
