Redis Lua Script Unpack Returning Different Results - lua

Set up by running sadd a b c.
When I execute the code below against the set a, keystoclear1 ends up with a single value, "b", while keystoclear2 has both values:
local keystoclear = unpack(redis.call('smembers', KEYS[1]))
redis.call('sadd', 'keystoclear1', keystoclear)
redis.call('sadd', 'keystoclear2', unpack(redis.call('smembers', KEYS[1])))
I am by no means a Lua expert, so this may just be some Lua behavior I don't know about, but I would like to understand what is causing it.
I tested this on both the Windows and Linux versions of Redis, with redis-cli and the StackExchange.Redis client, and saw the same behavior in all cases. This is a trivial example; I actually want to store the results of the unpack because I need to perform several operations with them.
UPDATE: I understand the issue now (see the question "table.unpack() only returns the first element"):
Lua always adjusts the number of results from a function to the circumstances of the call. When we call a function as a statement, Lua discards all of its results. When we use a call as an expression, Lua keeps only the first result. We get all results only when the call is the last (or the only) expression in a list of expressions.
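The adjustment rules quoted above can be seen in plain Lua, without Redis. A minimal sketch; three, count and the other names are hypothetical:

```lua
-- A function that returns three values.
local function three() return "a", "b", "c" end

-- Returns how many arguments it actually received.
local function count(...) return select("#", ...) end

local x = three()                 -- single target: only "a" is kept
local p, q = three()              -- two targets: "a" and "b" are kept
local t = {three()}               -- last (only) item in a constructor: all three kept
local n_last = count(1, three())  -- call is the last argument: 1 + 3 = 4 arguments
local n_mid = count(three(), 1)   -- call is not last: cut to one value, so 2 arguments

print(x, p, q, #t, n_last, n_mid)  --> a  a  b  3  4  2  (tab-separated)
```

The first line of the asker's script is the single-target case: local keystoclear = unpack(...) keeps only the first member.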

This case is slightly different from the one you referenced in your update. Here unpack may return several elements, but you only store one and discard the rest. You can get the other elements with local keytoclear1, keytoclear2 = ..., but it's much easier to store the table itself and unpack it as needed:
local keystoclear = redis.call('smembers', KEYS[1])
redis.call('sadd', 'keystoclear1', unpack(keystoclear))
As long as unpack is the last parameter, you'll get all the elements that are present in the table being unpacked.
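The "last parameter" rule can be checked in plain Lua. In this sketch, members is a hypothetical stand-in for the reply of redis.call('smembers', KEYS[1]), and the hypothetical collect function simply records the arguments a call would receive:

```lua
-- unpack is a global in Lua 5.1 (the version Redis embeds); 5.2+ has table.unpack.
local unpack = table.unpack or unpack

-- Hypothetical stand-in for the reply of redis.call('smembers', KEYS[1]).
local members = {"b", "c"}

-- Returns the arguments it received, i.e. what a call such as
-- redis.call('sadd', key, ...) would be given.
local function collect(...) return {...} end

local all = collect("keystoclear1", unpack(members))           -- {"keystoclear1", "b", "c"}
local cut = collect("keystoclear1", unpack(members), "extra")  -- {"keystoclear1", "b", "extra"}
local again = collect("keystoclear2", unpack(members))         -- the stored table can be reused
```

Because the table itself is kept in a local, it can be unpacked for as many follow-up calls as needed.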

Related

Garbage collection and the memoization of a string-to-string function

The following exercise comes from p. 234 of Ierusalimschy's Programming in Lua (4th edition). (NB: Earlier in the book, the author explicitly rejects the word memoization, and insists on using memorization instead. Keep this in mind as you read the excerpt below.)
Exercise 23.3: Imagine you have to implement a memorizing table for a function from strings to strings. Making the table weak will not do the removal of entries, because weak tables do not consider strings as collectable objects. How can you implement memorization in that case?
I am stumped!
Part of my problem is that I have not been able to devise a way to bring about the (garbage) collection of a string.
In contrast, with a table, I can equip it with a finalizer that reports when the table is about to be collected. Is there a way to confirm that a given string (and only that string) has been garbage-collected?
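The premise of the exercise can be checked directly: a weak-keyed table drops a table key once nothing else refers to it, but keeps a string key even when it is otherwise unreferenced. A minimal sketch (weak, fill and kinds are hypothetical names), relying on collectgarbage() performing a full collection:

```lua
-- Weak keys: an entry disappears once its key is collected.
local weak = setmetatable({}, {__mode = "k"})

-- Fill the table inside a function, so that no local or temporary of the
-- main chunk keeps the table key alive.
local function fill()
  weak[{}] = "table key"                      -- nothing else references this table
  weak["key" .. tostring(42)] = "string key"  -- string built at run time
end
fill()

collectgarbage()  -- full collections
collectgarbage()

-- Record the type of every key that survived.
local kinds = {}
for k in pairs(weak) do kinds[#kinds + 1] = type(k) end
print(#kinds, kinds[1])  --> 1  string  (tab-separated)
```

So a weak table cannot be used to observe a string being collected: from the weak table's point of view, strings are values and are never cleared.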
Another difficulty is simply figuring out what the desired function's specification is. The best I can do is to figure out what it isn't. Earlier in the book (p. 225), the author gave the following example of a "memorizing" function:
Imagine a generic server that takes requests in the form of strings with Lua code. Each time it gets a request, it runs load on the string, and then calls the resulting function. However, load is an expensive function, and some commands to the server may be quite frequent. Instead of calling load repeatedly each time it receives a common command like "closeconnection()", the server can memorize the results from load using an auxiliary table. Before calling load, the server checks in the table whether the given string already has a translation. If it cannot find a match then (and only then) the server calls load and stores the result into the table. We can pack this behavior in a new function:
[standard memo(r)ized implementation omitted; see variant using a weak-value table below]
The savings with this scheme can be huge. However, it may also cause unsuspected waste. Although some commands repeat over and over, many other commands happen only once. Gradually, the ["memorizing"] table results accumulates all commands the server has ever received plus their respective codes; after enough time, this behavior will exhaust the server's memory.
A weak table provides a simple solution to this problem. If the results table has weak values, each garbage-collection cycle will remove all translations not in use at that moment (which means virtually all of them)[1]:
local results = {}
setmetatable(results, {__mode = "v"})  -- make values weak
function mem_loadstring (s)
  local res = results[s]
  if res == nil then          -- results not available?
    res = assert(load(s))     -- compute new results
    results[s] = res          -- save for later reuse
  end
  return res
end
As the original problem statement notes, this scheme won't work when the function to be memo(r)ized returns strings, because weak tables do not treat strings as "collectable" objects.
Of course, if one is allowed to change the desired function's interface so that instead of returning a string, it returns a singleton table whose sole item is the real result string, then the problem becomes almost trivial, but I find it hard to believe that the author had such a crude "solution" in mind[2].
In case it matters, I am using Lua 5.3.
[1] As an aside, if the rationale for memo(r)ization is to avoid invoking load more often than necessary, the scheme proposed by the author does not make sense to me. It seems to me that this scheme is based on the assumption (a heuristic, really) that a translation that is used frequently, and thus would pay to memo(r)ize, is also one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.
[2] One may be able to put lipstick on this pig in the form of a __tostring method that would allow the table (the one returned by the memo(r)ized function) to masquerade as a string in certain contexts; it's still a pig, though.
Your idea is correct: wrap the string in a table (because tables are collectable).
function memormoize (func_from_string_to_string)
  local cached = {}
  setmetatable(cached, {__mode = "v"})
  return function(s)
    local c = cached[s] or {func_from_string_to_string(s)}
    cached[s] = c
    return c[1]
  end
end
And I see no pigs in this solution :-)
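A short demonstration of the answer's behavior, with the memoizer copied in so the sketch runs on its own; shout and calls are hypothetical names. A repeated call is a cache hit, but after a full collection the wrapper table is gone and the value is recomputed:

```lua
-- The answer's memoizer, reproduced so this sketch is self-contained.
local function memormoize(func_from_string_to_string)
  local cached = setmetatable({}, {__mode = "v"})
  return function(s)
    local c = cached[s] or {func_from_string_to_string(s)}
    cached[s] = c
    return c[1]
  end
end

-- Hypothetical string -> string function that counts how often it really runs.
local calls = 0
local shout = memormoize(function(s)
  calls = calls + 1
  return s:upper() .. "!"
end)

local r1 = shout("hello")  -- miss: computed
local r2 = shout("hello")  -- hit: the wrapper table is still in the weak cache
local before_gc = calls    -- 1

collectgarbage()           -- full collection: the wrapper table is only weakly
collectgarbage()           -- referenced, so its cache entry is removed
local r3 = shout("hello")  -- recomputed: calls is now 2
```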
Regarding footnote 1 ("...one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case."):
There will be no "always reachable" items in a weak table, but the most frequent items will be recalculated only once per GC cycle.
The ideal solution (never collect frequently used items) would require a more complex implementation.
For example, you can move items from a normal (strong) cache to the weak cache when an item's "inactivity timer" reaches some threshold.
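One possible shape for that idea, as a sketch rather than a tested design: a strong "hot" table for recently used keys and a weak-valued "cold" table for idle ones. Inactivity is measured here in calls (a logical clock), and memoize_two_level and max_idle are hypothetical names:

```lua
-- Two-level memo cache for a string -> string function (a sketch).
-- hot:  strong table, keeps recently used results alive.
-- cold: weak-valued table of wrapper tables; entries there survive only
--       until a garbage-collection cycle clears them.
-- Assumption: "inactivity" is counted in calls, via a logical clock.
local function memoize_two_level(f, max_idle)
  local hot = {}                                 -- key -> {value = ..., last_used = ...}
  local cold = setmetatable({}, {__mode = "v"})  -- key -> {value}
  local clock = 0

  -- Move entries idle for more than max_idle calls from hot to cold.
  local function demote_idle()
    for k, entry in pairs(hot) do
      if clock - entry.last_used > max_idle then
        hot[k] = nil             -- clearing a field during pairs() is allowed
        cold[k] = {entry.value}  -- wrapper is now only weakly referenced
      end
    end
  end

  return function(s)
    clock = clock + 1
    local entry = hot[s]
    if entry == nil then
      local wrapper = cold[s]
      if wrapper ~= nil then
        entry = {value = wrapper[1]}  -- still cached: promote back to hot
      else
        entry = {value = f(s)}        -- real miss: compute the result
      end
      hot[s] = entry
    end
    entry.last_used = clock
    if clock % max_idle == 0 then     -- sweep periodically, not on every call
      demote_idle()
    end
    return entry.value
  end
end

-- Usage with a hypothetical function that counts real computations.
local calls = 0
local up = memoize_two_level(function(s) calls = calls + 1; return s:upper() end, 3)

up("a"); up("a")                    -- "a" computed once, then served from hot
for _ = 1, 6 do up("b") end         -- "a" goes idle and is demoted to cold
collectgarbage(); collectgarbage()  -- the weakly held wrapper for "a" is collected
local result_a = up("a")            -- so "a" is recomputed (calls becomes 3)
up("b")                             -- "b" is still hot: no recomputation
```

Frequently used keys never leave the strong table, so they are never recomputed; only keys that go idle become eligible for collection.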

Does String.to_atom("some-known-string") create a new atom in the atom-table each time?

If NO, then what is the point of String.to_existing_atom/1?
If YES, then why? String.to_atom("some-known-string") will always give the same result, and the atom table is never garbage-collected.
Assuming you always use the same string, a new atom is created at most once, the first time the call runs. After that, the existing atom is returned and no new atoms are created.
The reason to_existing_atom also exists is to help prevent filling the atom table with atoms created from unknown (for example, user-supplied) input.
iex(1)> String.to_existing_atom("foo")
** (ArgumentError) argument error
:erlang.binary_to_existing_atom("foo", :utf8)
iex(1)> String.to_atom("foo")
:foo
iex(2)> String.to_existing_atom("foo")
:foo
As you can see, when I first try to call to_existing_atom, the process actually crashes because that atom is not in the atom table. However, if I use to_atom to ensure it exists, I can now call to_existing_atom and I do not get a crash.
An example use-case:
For process isolation, I need to dynamically generate a series of ets tables by partition number. I will have a fixed number of partitions -- but I can't name ets tables using anything but an atom, so {:my_table, num} is not an option.
Therefore, each process with a partition creates an atom based on a {name, number} combo.
String.to_atom("my_table" <> Integer.to_string(i))
Creating atoms from a source outside your direct control is dangerous, though: atoms are not garbage-collected and the atom table has a fixed size limit, so exhausting it can crash the entire BEAM VM. Thus, to_existing_atom is a nice way to sanitize incoming data.
In Elixir, atoms are immutable.
field(q, ^(String.to_existing_atom k))
In this example we use to_existing_atom because the field name comes from external data and is used in a database query; to_existing_atom raises if no such atom exists, which helps make sure the field is valid. It is useful in scenarios like this.

Am I basically doing extra work by making my local functions global?

I read that it is faster and better to keep most of your functions local instead of global.
So I'm doing this:
input = require("input")
draw = require("draw")
And then in input.lua for example:
local tableOfFunctions = {isLetter = isLetter, numpadCheck = numpadCheck, isDigit = isDigit, toUpper = toUpper}
return tableOfFunctions
Where isLetter, numpadCheck, etc. are local functions in that file.
Then I call the functions like so:
input.isLetter(key)
Now, my question is: am I reinventing the wheel with this? Aren't global functions stored in a Lua table anyway? I do like the way the input. prefix before the function name keeps things nice and tidy, so I may keep it if it's not bad coding practice.
Reinventing wheels tailored to your personal needs is a centerpiece of Lua.
The method you describe is presented as a valid one by one of Lua's creators in his book, here.
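A self-contained sketch of the pattern from the question. package.preload stands in for a separate input.lua file so the example runs as one chunk; the bodies of isLetter, isDigit and toUpper are hypothetical (numpadCheck is omitted because the question doesn't describe it):

```lua
-- Normally this function body would be the contents of input.lua.
package.preload["input"] = function()
  -- Hypothetical implementations; only the names come from the question.
  local function isLetter(c) return c:match("^%a$") ~= nil end
  local function isDigit(c) return c:match("^%d$") ~= nil end
  local function toUpper(c) return c:upper() end

  -- Export the locals through a table, as in the question.
  return {isLetter = isLetter, isDigit = isDigit, toUpper = toUpper}
end

local input = require("input")  -- a local, unlike the question's global `input`
print(input.isLetter("k"), input.isDigit("k"), input.toUpper("k"))  --> true  false  K
```

The local functions must be defined above the table constructor that exports them; otherwise the names inside the constructor resolve to (nil) globals.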
Yes, global functions are stored in a table (the global table). The "faster" local functions (as well as faster local variables) come from the difference in how globals, upvalues and locals are looked up.
Below the line is a quote of the relevant part of a more detailed explanation of speed that happened to come up on a game's forum.
Apart from that, locals are recommended for code cleanliness and error-proofing.
In Lua a table is created with {}; this operator reserves a certain amount of memory in RAM for the table. That reserved space stays in place (Lua's collector does not move objects); exceptions, such as the table's internal arrays being reallocated as it grows, are implementation details a script writer need not worry about.
Any variable you assign a table to
a={};
b={ c={a} }
is just a pointer to the table in memory. A pointer takes up either 32 or 64 bits and that's it.
Whenever you pass a table around, only that pointer is copied.
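A small check of the pointer semantics described above, using the same a and b (alias is a hypothetical extra name):

```lua
local a = {}
local b = {c = {a}}
local alias = b   -- copies the reference, not the table
alias.c[1].x = 1  -- follows alias -> b's table -> c -> a's table, then writes
print(a.x, rawequal(alias, b), b.c[1] == a)  --> 1  true  true  (tab-separated)
```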
Whenever you query a table in a table:
return b.c[1]
The computer follows the pointer stored in b, finds a table object in RAM, queries it for the key "c", takes the pointer to another table, queries that for the key 1, and then returns the pointer to the table a. It's just simple pointer hopping, a workload on par with arithmetic.
In Lua 5.2 and later, every function accesses non-local names through _ENV, an upvalue that by default refers to the global table. So any lookup of a global variable
return a
is actually a query to that table:
return _ENV.a
Local variables are not stored in _ENV at all: they live in the function's registers (or, for locals of an enclosing function, in upvalues), so reading one needs no table lookup.
By default _ENV is the same table as _G, the global table (which contains a field _G pointing to itself). So the access to a global variable
return b
means: fetch the _ENV upvalue, follow it to the global table, then look up the key "b" in that table.
Thus it is about 3 pointer jumps instead of 1.
Here is a contrived example that should give you some insight into the amount of work that implies:
-- RUN THIS IN A STANDALONE LUA INTERPRETER
local depth=100--how many pointers will be in a chain
local q={};--a table
local a={};--a start of pointer chain
local b=a; -- intermediate variable
for i=1,depth do b.a={} b=b.a end; --setup chain
local t=os.clock();
print(q)
print(os.clock()-t);--time of previous line execution
t=os.clock(); --start of pointer chain traversal
b=a
while b.a do b=b.a end
print(b)
print(os.clock()-t)--time of pointer traversal
With a chain of about 100 pointers, system load fluctuations may even make the second time the smaller one. Direct access gets notably faster only when you raise depth to thousands of intermediate pointers or more.
Note that whenever you query an uninitialized variable, all 3 jumps are still taken.
Globals are stored in the reserved table _G (the contents of which you can examine at any time), but it is good programming practice to avoid the use of globals.
Unless there is a very good reason not to, your table input should be local as well.
From Programming in Lua:
It is good programming style to use local variables whenever possible. Local variables help you avoid cluttering the global environment with unnecessary names. Moreover, the access to local variables is faster than to global ones.
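The usual way to apply this in hot code is to copy a global (or a field of a global table, like math.sqrt) into a local once. A rough timing sketch using only the standard library; the function names are hypothetical and the measured difference depends on the Lua version and machine:

```lua
-- Looks up the global `math`, then its field `sqrt`, on every iteration.
local function sum_global(n)
  local s = 0
  for i = 1, n do s = s + math.sqrt(i) end
  return s
end

-- Copies math.sqrt into a local once; the loop then uses the local directly.
local function sum_local(n)
  local sqrt = math.sqrt
  local s = 0
  for i = 1, n do s = s + sqrt(i) end
  return s
end

local n = 1000000
local t0 = os.clock()
local r1 = sum_global(n)
local t1 = os.clock()
local r2 = sum_local(n)
local t2 = os.clock()
print(string.format("global: %.3fs  local: %.3fs", t1 - t0, t2 - t1))
```

Both functions perform the same arithmetic in the same order, so they return identical results; only the lookup cost differs.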

aerospike udf -- how lua gets executed? how to run a function only once?

We have a Lua script which filters the records and returns a map. I have two questions:
Does Aerospike execute the Lua script like an independent script (similar to running it with the standalone lua interpreter) on every query?
There is a need to read a file and cache it using a function; I want this function to be called only once. How can this be achieved?
Aerospike executes Lua scripts in a sandboxed environment, and the context is reset across calls. So you cannot read a file and cache values to use during the next invocation. If you need to pass some information to each call, consider passing it via arguments. Needless to say, it's better not to pass huge data structures as arguments, because the overhead of encoding and decoding them will be high.

Lua tables: performance hit for starting array indexing at 0?

I'm porting FFT code from Java to Lua, and I'm starting to worry a bit about the fact that in Lua the array part of a table starts indexing at 1 while in Java array indexing starts at 0.
For the input array this causes no problem because the Java code is set up to handle the possibility that the data under consideration is not located at the start of the array. However, all of the working arrays internal to the code are assumed to start indexing at 0. I know that the code will work as written (Lua tables are awesome like that), but I have no sense at all of the performance hit I might incur by having the "0" element of the array go into the hash part of the underlying C structure (or indeed, whether that is what will happen).
My question: is this something worth worrying about? Should I be planning to profile and hand-optimize the code? (The code will eventually be used to transform many relatively small (> 100 time points) signals of varying lengths not known in advance.)
I made a small, probably not very reliable, test:
local arr = {}
for i=0,10000000 do
arr[i] = i*2
end
for k, v in pairs(arr) do
arr[k] = v*v
end
And a similar version with 1 as the first index. On my system:
$ time lua example0.lua
real 2.003s
$ time lua example1.lua
real 2.014s
I was also interested in how table.insert would perform:
for i=1,10000000 do
table.insert(arr, 2*i)
...
and, surprisingly:
$ time lua example2.lua
real 6.012s
Results:
Of course, it depends on the system you run it on, and probably also on the Lua version, but it seems there is little to no difference between starting at index 0 and starting at index 1. A bigger difference comes from the way you insert elements into the array.
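To check the insertion observation on your own system, a rough sketch comparing three ways of filling a 1-based array. The function names are hypothetical and timings will vary by machine and Lua version; a plausible reason table.insert is slower is that each element costs a library call plus a length computation:

```lua
local N = 1000000

local function fill_index(n)   -- plain index assignment
  local t = {}
  for i = 1, n do t[i] = 2 * i end
  return t
end

local function fill_insert(n)  -- table.insert: a library call per element
  local t = {}
  for i = 1, n do table.insert(t, 2 * i) end
  return t
end

local function fill_length(n)  -- t[#t + 1]: computes the length each iteration
  local t = {}
  for i = 1, n do t[#t + 1] = 2 * i end
  return t
end

for _, case in ipairs({{"index", fill_index}, {"insert", fill_insert}, {"length", fill_length}}) do
  local name, fill = case[1], case[2]
  local t0 = os.clock()
  local t = fill(N)
  print(name, #t, string.format("%.3fs", os.clock() - t0))
end
```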
I think the correct answer in this case is to change the algorithm so that everything is indexed from 1, and to consider that part of the conversion.
Your FFT will be less surprising to another Lua user (like me), given that all "array-like" tables are indexed by one.
It might not be as painful as you think, given the way numeric loops work in Lua (both the start and the end values are inclusive). You would be exchanging this:
for i=0,n-1 do -- n = number of elements; #array would give n-1 here, since index 0 is not counted
... (do stuff with i)
end
By this:
for i=1,#array do
... (do stuff with i)
end
The non-numeric loops would remain unchanged (except that you will be able to use ipairs too, if you so desire).
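If you do switch to 1-based tables, the index arithmetic inherited from the Java code does not all have to be rewritten: one option is to keep the loop variables 0-based and add 1 only where the table is accessed. A sketch using the bit-reversal step of an iterative radix-2 FFT, assuming the length is a power of two; bit_reverse_permute is a hypothetical name and this is not taken from the asker's code:

```lua
-- In-place bit-reversal permutation of a 1-based Lua array whose length n is
-- a power of two (the first step of an iterative radix-2 FFT).
-- i and j stay 0-based, as in typical Java/C code; only table accesses add 1.
local function bit_reverse_permute(a)
  local n = #a
  local j = 0
  for i = 0, n - 2 do
    if i < j then
      a[i + 1], a[j + 1] = a[j + 1], a[i + 1]  -- 0-based i, j -> 1-based slots
    end
    -- Advance j as a bit-reversed counter: clear leading 1 bits, set the next.
    local k = math.floor(n / 2)
    while k <= j do
      j = j - k
      k = math.floor(k / 2)
    end
    j = j + k
  end
end

local data = {"x0", "x1", "x2", "x3"}
bit_reverse_permute(data)
print(table.concat(data, " "))  --> x0 x2 x1 x3
```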
