How do I unit test Lua modules without OO? - dependency-injection

In Lua I can write a simple module like so
local database = require 'database'
local M = {}
function M:GetData()
return database:GetData()
end
return M
When required, this loads once, and every later require returns the same cached copy.
If I wanted to take an object-oriented approach I could do something like:
local M = {}
M.__index = M
function M:GetData()
return self.database:GetData()
end
return function(database)
local newM = setmetatable({}, M)
newM.database = database
return newM
end
Where M is only loaded once, and each copy of newM just holds its own data and uses the methods of the original M.
When it comes to testing, with the OO approach I can just pass in a fake version of 'database' and check it gets called, but with the first approach I can't.
So my question is how can I make the first approach support DI/testing without making it class-like?
My thought was to wrap it in a closure something like this:
local mClosure = function(database)
local M = {}
function M:GetData()
return database:GetData()
end
return M
end
return mClosure
but then every time it is called it will create a new copy of M, so it will lose the benefits of both of the previous approaches.

That's clearly a use case for the Lua debug library. With it you can modify the upvalues of your function and inject dependencies. Also consider that you can use require for this: require your database module once, create a small table that collects data and then redirects to the original module, and put it in package.loaded so that the next require call returns the modified version of the module. The OO approach is how you would do this kind of thing in a language like Ruby, but in Lua we have way nicer ways of tapping into a module or function without it being specifically designed for that purpose.
local real_db = require 'db'
local fake_db = setmetatable({}, {__index=real_db}) -- anything not overridden falls through to the real module
function fake_db.exec(query) print('running query: '..query) end -- dummy function
function fake_db.something(...) print('doing something'); real_db.something(...) end
package.loaded.db = fake_db
require 'my_tests' -- this in turn requires 'db', but gets the fake one
package.loaded.db = real_db
-- After this point, `require 'db'` will return the original module
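The debug-library route mentioned above could look roughly like the following. This is a minimal sketch: `set_upvalue` is a hypothetical helper (not a standard function), and the module and its `database` dependency are inlined stand-ins so the snippet runs on its own. It uses only `debug.getupvalue` and `debug.setupvalue`, and it assumes the chunk still carries debug information, since upvalue names are needed.

```lua
-- Stand-in for a module whose function captured `database` as an upvalue.
local database = { GetData = function() return "real" end }
local M = {}
function M.GetData() return database.GetData() end

-- Hypothetical helper: replace the upvalue called `name` in `fn`.
-- Returns true if the upvalue was found, false otherwise.
local function set_upvalue(fn, name, value)
  local i = 1
  while true do
    local upname = debug.getupvalue(fn, i)
    if upname == nil then return false end
    if upname == name then
      debug.setupvalue(fn, i, value)
      return true
    end
    i = i + 1
  end
end

local fake = { GetData = function() return "fake" end }
local replaced = set_upvalue(M.GetData, "database", fake)
local result = M.GetData() -- now served by the fake
```

Because closures share upvalues, every function that captured the same `database` local sees the fake afterwards, so a test would normally put the original value back when it is done.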

Related

Lua require unintentionally prints the name of the required file

When require is called in testt.lua (one of the two files), the output is movee and movee.lua.
movee is for the most part a class to be required, but it should also accept being called directly with parameters.
movee.lua
local lib = {} --this is class array
function lib.moveAround( ... )
for i,direction in ipairs(arg) do
print(direction)
end
end
function lib.hello()
print("Hello water jump")
end
lib.moveAround(...)
return lib
testt.lua
local move = require("movee")
Expected result is not to call lib.moveAround or print of file name when require is called.
Your expectations are incorrect. Lua, and most scripting languages for that matter, does not recognize much of a distinction between including a module and executing the Lua file which provides that module. Every function statement is a statement whose execution creates a function object. Until those statements are executed, those functions don't exist. Same goes for your local lib = {}. And so on.
Now, if you want to make a distinction between when a user tries to require your script as a module and when a user tries to execute your script on the command line (or via just loadfile or similar), then I would suggest doing the following.
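Check the arguments the script was given. If there are none, nothing was passed on the command line, so skip the work you don't want done. Keep in mind that require itself passes the module name to the chunk as its first argument (and, from Lua 5.2 on, the file path as the second), so that case has to be excluded as well:
local nargs = select("#", ...)
-- require("movee") calls this chunk with "movee" as the first argument
if nargs > 0 and (...) ~= "movee" then
lib.moveAround(...)
end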
Solved by replacing
lib.moveAround(...)
with
local argument = {...}
if argument[1] ~= "movee" and argument[2] ~= "movee" then
lib.moveAround(...)
end
require("movee")
will execute the code within movee.lua
lib.moveAround(...)
is part of that code. Hence if you require "movee" you call lib.moveAround
If the expected result is not to call it, remove that line from your code or don't require that file.

Lua singleton when re-requiring

Let's say I have some wrapper for some class:
local wrapper = {}
local some_library = require('some-library')
wrapper._library = some_library.new(...)
return wrapper
If I require my wrapper in different files, will a new instance of some_library be created every time? i.e
-- file1
local file1 = {}
local wrapper = require('wrapper')
-- add methods to file1
return file1
-- file2
local file1 = require('file1')
local wrapper = require('wrapper')
And then I do lua file2
In this case, wrapper is included twice; once inside file1 which file2 is requiring, and once by file2 itself. Will there be two instances of some-library now? How do I create a singleton if I only want one instance?
Short answer:
no
Long answer:
When you require a file, Lua first checks whether the module has already been loaded. If not, it loads (executes) the file and caches its return value in the package.loaded table. This means that a) some-library is only required once and b) the same happens for the wrapper, so you only have one wrapper for one library; as opposed to several wrappers to one library or several wrappers with individual libraries.
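The caching behaviour can be checked with a short sketch. To keep it self-contained, the module is registered through `package.preload` instead of living in its own file; the name `demo_wrapper` and the `load_count` counter are made up for illustration.

```lua
local load_count = 0

-- Stand-in for wrapper.lua: the loader body plays the role of the file's code.
package.preload["demo_wrapper"] = function(modname)
  load_count = load_count + 1
  return { name = modname }
end

local first = require("demo_wrapper")  -- runs the loader
local second = require("demo_wrapper") -- served from package.loaded
```

Both calls hand back the very same table, and the loader body runs only once.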
Keep in mind though: if your module returns not the library code, but a function that then creates the library code, only the function is cached, but every time you run it it returns a different copy of your library object. For example:
-- lib.lua
return function()
return { foo = function(bar) print("fooing a bar...") end }
end
-- program.lua
factory_1 = require "lib"
factory_2 = require "lib"
print(factory_1 == factory_2) --> prints "true"
lib_1 = factory_1()
lib_2 = factory_1() -- not a typo, it's factory_1 called twice
print(lib_1 == lib_2) --> prints "false"
And if you're really bored, you can create a library (as a table) that, when called as if it were a function (using the __call metamethod) returns a new instance of the library just like the first one.
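A minimal sketch of that `__call` idea, assuming a shallow copy is good enough for a "new instance" (the field names are placeholders):

```lua
local lib = { greeting = "hello" }
function lib.foo() return "fooing a bar..." end

setmetatable(lib, {
  -- Calling the library table like a function returns a fresh shallow copy.
  __call = function(self)
    local copy = {}
    for key, value in pairs(self) do copy[key] = value end
    return copy
  end,
})

local a = lib()
local b = lib()
```

Note that the copies here do not carry the metatable themselves, so only the original `lib` can be called; whether that matters depends on the use case.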

Create suite of interdependent Lua files without affecting the global namespace

tl;dr: What design pattern allows you to split Lua code over multiple files that need to share some information without affecting the global table?
Background
It is considered bad form to create a library in Lua where requiring the library affects the global namespace:
--> somelib.lua <--
SomeLib = { ... }
--> usercode.lua <--
require 'somelib'
print(SomeLib) -- global key created == bad
Instead, it is considered a best practice to create a library that uses local variables and then returns them for the user to assign as they see fit:
--> somelib.lua <--
local SomeLib = { ... }
return SomeLib
--> usercode.lua <--
local theLib = require 'somelib' -- consumers name lib as they wish == good
The above pattern works fine when using a single file. However, this becomes considerably harder when you have multiple files that reference each other.
Concrete Example
How can you rewrite the following suite of files so that the assertions all pass? Ideally the rewrites will leave the same files on disk and responsibilities for each file. (Rewriting by merging all code into a single file is effective, but not helpful ;)
--> test_usage.lua <--
require 'master'
assert(MASTER.Simple)
assert(MASTER.simple)
assert(MASTER.Shared)
assert(MASTER.Shared.go1)
assert(MASTER.Shared.go2)
assert(MASTER.Simple.ref1()==MASTER.Multi1)
assert(pcall(MASTER.Simple.ref2))
assert(_G.MASTER == nil) -- Does not currently pass
--> master.lua <--
MASTER = {}
require 'simple'
require 'multi'
require 'shared1'
require 'shared2'
require 'shared3'
require 'reference'
--> simple.lua <--
MASTER.Simple = {}
function MASTER:simple() end
--> multi.lua <--
MASTER.Multi1 = {}
MASTER.Multi2 = {}
--> shared1.lua <--
MASTER.Shared = {}
--> shared2.lua <--
function MASTER.Shared:go1() end
--> shared3.lua <--
function MASTER.Shared:go2() end
--> reference.lua <--
function MASTER.Simple:ref1() return MASTER.Multi1 end
function MASTER.Simple:ref2() MASTER:simple() end
Failure: Setting the Environment
I thought to solve the problem by setting the environment to my master table with a self-reference. This does not work when calling functions like require however, as they change the environment back:
--> master.lua <--
foo = "original"
local MASTER = setmetatable({foo="captured"},{__index=_G})
MASTER.MASTER = MASTER
setfenv(1,MASTER)
require 'simple'
--> simple.lua <--
print(foo) --> "original"
MASTER.Simple = {} --> attempt to index global 'MASTER' (a nil value)
You are giving master.lua two responsibilities:
1. It defines the common module table.
2. It imports all of the submodules.
Instead you should create a separate module for (1) and import it in all of the submodules:
--> common.lua <--
return {}
--> master.lua <--
require 'simple'
require 'multi'
require 'shared1'
require 'shared2'
require 'shared3'
require 'reference'
return require'common' -- return the common table
--> simple.lua <--
local MASTER = require'common' -- import the common table
MASTER.Simple = {}
function MASTER:simple() end
etc.
Finally, change the first line of test_usage.lua to use a local variable:
--> test_usage.lua <--
local MASTER = require'master'
...
The tests should now pass.
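The layout above can be tried out in a single file. In this sketch each "file" is simulated with a `package.preload` entry, and the `demo.` prefix on the module names is only there to avoid clashing with real modules; both are assumptions for illustration, not part of the original suite.

```lua
-- common.lua: owns the shared table
package.preload["demo.common"] = function() return {} end

-- simple.lua, multi.lua, reference.lua: each imports the shared table
package.preload["demo.simple"] = function()
  local MASTER = require("demo.common")
  MASTER.Simple = {}
  function MASTER:simple() end
end
package.preload["demo.multi"] = function()
  local MASTER = require("demo.common")
  MASTER.Multi1 = {}
end
package.preload["demo.reference"] = function()
  local MASTER = require("demo.common")
  function MASTER.Simple:ref1() return MASTER.Multi1 end
end

-- master.lua: only pulls in the submodules and returns the shared table
package.preload["demo.master"] = function()
  require("demo.simple")
  require("demo.multi")
  require("demo.reference")
  return require("demo.common")
end

local MASTER = require("demo.master")
```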
I have a systematic way to solve that problem. I have refactored your module in a Git repository to show you how it works: https://github.com/catwell/dont-touch-global-namespace/commit/34b390fa34931464c1dc6f32a26dc4b27d5ebd69
The idea is that you should have the sub-parts return a function that takes the main module as an argument.
If you cheat by opening the source files in master.lua, append a header and a footer and use loadstring, you can even use them unmodified (only master.lua has to be modified, but it is more complex). Personally, I prefer to keep it explicit, which is what I have done here. I don't like magic :)
EDIT: it is very close to Andrew Stark's first solution, except I patch the MASTER table directly in the sub-modules. The advantage is that you can define several things at once, like in your simple.lua, multi.lua and reference.lua files.
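As a rough sketch of the "sub-parts return a function that takes the main module" idea (not necessarily identical to the linked commit): the sub-modules are simulated here with `package.preload`, and the `parts_demo.` names are invented for the example.

```lua
-- Each sub-module returns a function that patches the table it is given.
package.preload["parts_demo.simple"] = function()
  return function(MASTER)
    MASTER.Simple = {}
    function MASTER:simple() end
  end
end
package.preload["parts_demo.multi"] = function()
  return function(MASTER)
    MASTER.Multi1 = {}
    MASTER.Multi2 = {}
  end
end
package.preload["parts_demo.reference"] = function()
  return function(MASTER)
    function MASTER.Simple:ref1() return MASTER.Multi1 end
  end
end

-- master.lua: create the table locally and hand it to every part in order.
local MASTER = {}
for _, part in ipairs({ "simple", "multi", "reference" }) do
  require("parts_demo." .. part)(MASTER)
end
```

The order of the list matters, because `reference` expects `MASTER.Simple` to exist already.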
We can solve the problem by changing the master file to modify the environment in which all required code is run:
--> master.lua <--
local m = {} -- The actual master table
local env = getfenv(0) -- The current environment
local sandbox = { MASTER=m } -- Environment for all requires
setmetatable(sandbox,{__index=env}) -- ...also exposes read access to real env
setfenv(0,sandbox) -- Use the sandbox as the environment
-- require all files as before
setfenv(0,env) -- Restore the original environment
return m
The sandbox is an empty table that inherits values from _G but that also has a reference to the MASTER table, simulating a global from the perspective of later code. Using this sandbox as the environment causes all later requires to evaluate their "global" code in this context.
We save the real environment for later restoration, so that we don't mess with any later code that might want to actually set a global variable.
The question concerns:
Not polluting the global space when making modules.
Making modules in such a way that they might be split into multiple files, for maintenance reasons, among others.
My solution to the above problem lies in tweaking the "return as table" idiom in Lua such that instead of returning a table, you return a function that returns a table, when state needs to be passed between sub-modules.
This works well for sub-modules that are entirely dependent upon some root-module. If they are loaded independently, then they require the user to know that they need to call the module before they can use it. This is unlike every other module that has a collection of methods, ready to go from local a = require('a').
At any rate, this works like so:
-- callbacks.lua -- a sub-module
return function(self)
  local callbacks = {}
  callbacks.StartElement = function(parser, elementName, attributes)
    local res = {}
    local stack = self.stack
    ---awesome stuff for about 150 lines...
  end
  return callbacks
end
To use it, you can...
local make_callbacks = require'callbacks'
self.callbacks = make_callbacks(self)
Or, better yet, simply call the return value of require when assigning the callback table to the parent module, like so:
self.callbacks = require'trms.xml.callbacks'(self)
Most often, I try not to do this. If I'm passing state or self between submodules, I find that I'm often doing it wrong. My internal policy is that if I'm doing something that is highly-related to another file, I might be okay. More likely, I'm putting something in the wrong spot and there is a way to do it without passing anything between modules.
The reason I don't like this is that the table I pass in has methods and properties that are not visible in the file I am working in. I'm not free to refactor the internal implementation of one of my files without horking the others. So, I humbly suggest that this idiom is a yellow flag, but probably not a red one. :)
While this solves the problem of state-sharing without globals, it doesn't really protect the user from the accidental omission of local. If I may speak to that implied question...
The first thing that I do is remove access to the global environment from my module. Since the global environment is only available as long as I don't reset _ENV, resetting it is the first thing that I do. This is done by packing only what is needed into a new _ENV table.
_ENV = {print = print,
pairs = pairs, --etc
}
However, constantly re-typing all of the things that I need from lua into each file is a giant, error-prone pain. To avoid this, I make one file in my module's base directory and use it as the home for all of my modules' and sub-modules' common environments. I call it _ENV.lua.
Note: I cannot use "init.lua" or any other root-module for this purpose, because I need to be able to load it from the sub-modules, which are being loaded by the root-module, which loads the sub-modules, which are...
My abbreviated _ENV.lua file looks something like the following:
--_ENV.lua
_ENV = {
type = type, pairs = pairs, ipairs = ipairs, next = next, print = print,
require = require, io = io, table = table, string = string,
lxp = require"lxp", lfs = require"lfs",
socket = require("socket"), lpeg = require'lpeg', --etc..
}
return _ENV
With this file, I now have a common base from which to work.
All of my other modules load this first, using the following command:
_ENV = require'root_mod._ENV' --where root_mod is the base of my module.
This facility was critical for me, for two reasons. First, it keeps me out of global space. If I see that I am missing something from the global environment _G (it took me a surprisingly long time before I saw that I didn't have tostring!), I can go back into my _ENV.lua file and add it. As a required file, this only gets loaded one time, so having it applied to all of my submodules is 0 calories.
Second, I find that it gives me everything that I really needed for using the "return module as table" protocol, with only a few exceptions where "return a function that returns a table" is needed.
TL;DR: Don't return the module, set package.loaded[...] = your_module as early as possible (can still be empty), then just require the module in submodules and it will be properly shared.
The clean way to do this is to explicitly register the module and not rely on require to implicitly register it at the end. The documentation says:
require (modname)
Loads the given module. The function starts by looking into the
package.loaded table to determine whether modname is already loaded.
If it is, then require returns the value stored at
package.loaded[modname]. [This gets you the caching behavior that
every file is run only once.] Otherwise, it tries to find a loader for
the module. [And one of the searchers is looking for Lua files to run,
which gets you the usual file loading behavior.]
[…]
Once a loader is found, require calls the loader with two arguments:
modname and an extra value dependent on how it got the loader. (If the
loader came from a file, this extra value is the file name.) If the loader
returns any non-nil value [e.g. your file returns the module table],
require assigns the returned value to package.loaded[modname]. If the
loader does not return a non-nil value and has not assigned any value to
package.loaded[modname], then require assigns true to this entry.
In any case, require returns the final value of
package.loaded[modname].
(emphasis, [comments] added by me.)
With the return mymodule idiom, the caching behavior fails if you have a loop in your dependencies – the cache is updated too late. (As a result, files may be loaded several times (you may even get endless loops!) and sharing will fail.) But explicitly saying
local _M = { } -- your module, however you define / name it
package.loaded[...] = _M -- recall: require calls loader( modname, something )
-- so `...` is `modname, something` which is shortened
-- to just `modname` because only one value is used
immediately updates the cache, so that other modules can already require your module before its main chunk returned. (Of course, at that time they can only actually use what's already been defined. But that's not usually a problem.)
The package.loaded[...] = mymodule approach works in 5.1–5.3 (incl. LuaJIT).
For your example, you would adjust the start of master.lua to
1c1,2
< MASTER = {}
---
> local MASTER = {}
> package.loaded[...] = MASTER
and for all other files
0a1
> local MASTER = require "master"
and you're done.
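The early-registration trick can be exercised without separate files. In this sketch the two modules are simulated through `package.preload`, and the `cycle_demo.` names are placeholders; the loader receives the module name as its first argument, which is what `...` refers to inside a real file.

```lua
package.preload["cycle_demo.master"] = function(modname)
  local MASTER = {}
  package.loaded[modname] = MASTER -- register before requiring submodules
  require("cycle_demo.simple")     -- this submodule requires the master again
  return MASTER
end

package.preload["cycle_demo.simple"] = function()
  -- Gets the partially built table from the cache instead of re-running master.
  local MASTER = require("cycle_demo.master")
  MASTER.Simple = {}
end

local MASTER = require("cycle_demo.master")
```

Without the `package.loaded[modname] = MASTER` line, the nested require would either re-enter the master loader or raise a loop error, depending on the Lua version.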

Globals are bad, does this increase performance in any way?

I'm working in LuaJIT and have all my libraries and whatnot stored inside "foo", like this:
foo = {}; -- The only global variable
foo.print = {};
foo.print.say = function(msg) print(msg) end;
foo.print.say("test")
Now I was wondering, would using metatables and keeping all libraries local help at all? Or would it not matter. What I thought of is this:
foo = {};
local libraries = {};
setmetatable(foo, {
__index = function(t, key)
return libraries[key];
end
});
-- A function to create a new library.
function foo.NewLibrary(name)
libraries[name] = {};
return libraries[name];
end;
local printLib = foo.NewLibrary("print");
printLib.say = function(msg) print(msg) end;
-- Other file:
foo.print.say("test")
I don't really have the tools to benchmark this right now, but would keeping the actual contents of the libraries in a local table increase performance at all? Even the slightest?
I hope I made myself clear on this. Basically all I want to know is: performance-wise, is the second method better?
If someone could link to or give a detailed explanation of how global variables are processed in Lua, which could explain this, that would be great too.
don't really have the tools to benchmark this right now
Sure you do.
local start = os.clock()
for i=1,100000 do -- adjust iterations to taste
-- the thing you want to test
end
print(os.clock() - start)
With performance, you pretty much always want to benchmark.
would keeping the actual contents of the libraries in a local table increase performance at all?
Compared to the first version of the code? Theoretically no.
Your first example (stripping out the unnecessary cruft):
foo = {}
foo.print = {}
function foo.print.say(msg)
print(msg)
end
To get to your print function requires three table lookups:
index _ENV with "foo"
index foo table with "print"
index foo.print table with "say".
Your second example:
local libraries = {}
libraries.print = {}
function libraries.print.say(msg)
print(msg)
end
foo = {}
setmetatable(foo, {
__index = function(t, key)
return libraries[key];
end
});
To get to your print function now requires five table lookups along with other additional work:
index _ENV with "foo"
index foo table with "print"
Lua finds the result is nil, checks to see if foo has a metatable, finds one
index metatable with "__index"
check whether the result is a table or a function; Lua finds it's a function, so it calls it with the key
index libraries with "print"
index the print table with "say"
Some of this extra work is done in the C code, so it's going to be faster than if this was all implemented in Lua, but it's definitely going to take more time.
Benchmarking using the loop I showed above, the first version is roughly twice as fast as the second in vanilla Lua. In LuaJIT, both are exactly the same speed. Obviously the difference gets optimized away at runtime in LuaJIT (which is pretty impressive). Just goes to show how important benchmarking is.
Side note: Lua allows you to supply a table for __index, which will result in a lookup equivalent to your code:
setmetatable(foo, { __index = function(t, key) return libraries[key] end } )
So you could just write:
setmetatable(foo, { __index = libraries })
This also happens to be a lot faster.
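A small sketch showing that the two `__index` forms resolve to the same entries (the `say` function returns its text instead of printing, purely so the result is easy to inspect):

```lua
local libraries = {
  print = { say = function(msg) return "say: " .. msg end },
}

-- Function form: Lua calls the function with the table and the missing key.
local via_function = setmetatable({}, {
  __index = function(_, key) return libraries[key] end,
})

-- Table form: Lua indexes `libraries` directly, with no function call.
local via_table = setmetatable({}, { __index = libraries })

local said = via_table.print.say("test")
```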
Here is how I write my modules:
-- foo.lua
local MyLib = {}
function MyLib.foo()
...
end
return MyLib
-- bar.lua
local MyLib = require("foo") -- module name, without the .lua extension
MyLib.foo()
Note that the return MyLib is not in a function. require captures this return value and uses it as the library. This way, there are no globals.

Recreating setfenv() in Lua 5.2

How can I recreate the functionality of setfenv in Lua 5.2? I'm having some trouble understanding exactly how you are supposed to use the new _ENV environment variable.
In Lua 5.1 you can use setfenv to sandbox any function quite easily.
--# Lua 5.1
print('_G', _G) -- address of _G
local foo = function()
print('env', _G) -- address of sandbox _G
bar = 1
end
-- create a simple sandbox
local env = { print = print }
env._G = env
-- set the environment and call the function
setfenv(foo, env)
foo()
-- we should have global in our environment table but not in _G
print(bar, env.bar)
Running this example shows an output:
_G table: 0x62d6b0
env table: 0x635d00
nil 1
I would like to recreate this simple example in Lua 5.2. Below is my attempt, but it does not work like the above example.
--# Lua 5.2
local function setfenv(f, env)
local _ENV = env or {} -- create the _ENV upvalue
return function(...)
print('upvalue', _ENV) -- address of _ENV upvalue
return f(...)
end
end
local foo = function()
print('_ENV', _ENV) -- address of function _ENV
bar = 1
end
-- create a simple sandbox
local env = { print = print }
env._G = env
-- set the environment and call the function
foo_env = setfenv(foo, env)
foo_env()
-- we should have global in our environment table but not in _G
print(bar, env.bar)
Running this example shows the output:
upvalue table: 0x637e90
_ENV table: 0x6305f0
1 nil
I am aware of several other questions on this subject, but they mostly seem to be dealing with loading dynamic code (files or string) which work quite well using the new load function provided in Lua 5.2. Here I am specifically asking for a solution to run arbitrary functions in a sandbox. I would like to do this without using the debug library. According to the Lua documentation we should not have to rely on it.
You cannot change the environment of a function in Lua 5.2 without using the debug library. Once a function has been created, that is the environment it has. The only way to modify this environment is by modifying its _ENV upvalue, which requires the debug library.
The general idea with environments in Lua 5.2 is that the environment should be considered immutable outside of trickery (i.e. the debug library). You create a function in an environment; once created there, that's the environment it has. Forever.
This is how environments were often used in Lua 5.1, but it was easy and sanctioned to modify the environment of anything with a casual function call. And if your Lua interpreter removed setfenv (to prevent users from breaking the sandbox), then the user code can't set the environment for their own functions internally. So the outside world gets a sandbox, but the inside world can't have a sandbox within the sandbox.
The Lua 5.2 mechanism makes it harder to modify the environment post function-creation, but it does allow you to set the environment during creation. Which lets you sandbox inside the sandbox.
So what you really want is to just rearrange your code like this:
local foo;
do
local _ENV = { print = print }
function foo()
print('env', _ENV)
bar = 1
end
end
foo is now sandboxed. And now, it's much harder for someone to break the sandbox.
As you can imagine, this has caused some contention among Lua developers.
It's a bit expensive, but if it's that important to you...
Why not use string.dump, and re-load the function into the right environment?
function setfenv(f, env)
return load(string.dump(f), nil, nil, env)
end
function foo()
herp(derp)
end
setfenv(foo, {herp = print, derp = "Hello, world!"})()
To recreate setfenv/getfenv in Lua 5.2 you can do the following:
if not setfenv then -- Lua 5.2
-- based on http://lua-users.org/lists/lua-l/2010-06/msg00314.html
-- this assumes f is a function
local function findenv(f)
local level = 1
repeat
local name, value = debug.getupvalue(f, level)
if name == '_ENV' then return level, value end
level = level + 1
until name == nil
return nil end
getfenv = function (f) return(select(2, findenv(f)) or _G) end
setfenv = function (f, t)
local level = findenv(f)
if level then debug.setupvalue(f, level, t) end
return f end
end
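One caveat with `debug.setupvalue`, as far as I can tell: closures created in the same chunk usually share a single `_ENV` upvalue, so overwriting it can change the environment of neighbouring functions (and of the chunk itself) as well. A variant that gives the target function its own private upvalue through `debug.upvaluejoin` avoids that. The sketch below is an assumption-laden illustration rather than a drop-in replacement; `set_env` is a made-up name, and on Lua 5.1 or LuaJIT it simply falls back to the built-in `setfenv`.

```lua
-- Use the built-in on Lua 5.1 / LuaJIT, otherwise emulate it.
local set_env = setfenv or function(f, env)
  local i = 1
  while true do
    local name = debug.getupvalue(f, i)
    if name == nil then break end
    if name == "_ENV" then
      -- Point this slot at a fresh upvalue that holds `env`,
      -- leaving the shared _ENV of other functions untouched.
      debug.upvaluejoin(f, i, function() return env end, 1)
      break
    end
    i = i + 1
  end
  return f
end

local function foo()
  bar = 1 -- "global" assignment, lands in whatever environment foo has
end

local env = {}
set_env(foo, env)
foo()
```

After the call, `bar` lives in `env`, while the real global table and the surrounding code keep their original environment.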
RPFeltz's answer (load(string.dump(f)...)) is a clever one and may work for you, but it doesn't deal with functions that have upvalues (other than _ENV).
There is also compat-env module that implements Lua 5.1 functions in Lua 5.2 and vice versa.
In Lua 5.2 a sandboxable function needs to opt in to that itself. One simple pattern is to have it receive _ENV as an argument:
function(_ENV)
...
end
Or wrap it inside something that defines the env
local function mk_func(_ENV)
return function()
...
end
end
local f = mk_func({print = print})
However, this explicit use of _ENV is less useful for sandboxing, since you can't always assume the other function will cooperate by having an _ENV variable. In that case, it depends on what you do. If you just want to load code from some other file then functions such as load and loadfile usually receive an optional environment parameter that you can use for sandboxing. Additionally, if the code you are trying to load is in string format you can use string manipulation to add _ENV variables yourself (say, by wrapping a function with an env parameter around it)
local code = 'return function(_ENV) return ' .. their_code .. ' end'
Finally, if you really need dynamic function environment manipulation, you can use the debug library to change the function's internal upvalue for _ENV. While using the debug library is not usually encouraged, I think it is acceptable if all the other alternatives didn't apply (I feel that in this case changing the function's environment is deep voodoo magic already so using the debug library is not much worse)
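For the load-based sandboxing mentioned above, a minimal sketch might look like this. `run_sandboxed` is a hypothetical helper; it uses the environment argument of `load` on Lua 5.2+ and falls back to `loadstring` plus `setfenv` where those exist (Lua 5.1, LuaJIT).

```lua
-- Hypothetical helper: compile `code` so that its globals resolve in `env`,
-- then run it in protected mode. Returns pcall's results, or nil plus a
-- message if the code does not compile.
local function run_sandboxed(code, env)
  local chunk, err
  if setfenv then
    -- Lua 5.1 / LuaJIT: compile first, then attach the environment
    chunk, err = loadstring(code, "sandbox")
    if chunk then setfenv(chunk, env) end
  else
    -- Lua 5.2+: load takes the environment as its fourth argument
    chunk, err = load(code, "sandbox", "t", env)
  end
  if not chunk then return nil, err end
  return pcall(chunk)
end

local env = { tostring = tostring }
local ok, result = run_sandboxed("bar = 1; return tostring(bar)", env)
local blocked = run_sandboxed("return os.time()", env) -- `os` is not in env
```

The first call succeeds and stores `bar` in `env`; the second fails inside `pcall` because the sandbox table exposes no `os`.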

Resources