How to remove a Lua table entry by its key?

I have a Lua table that I use as a hashmap, i.e. with string keys:
local map = { foo = 1, bar = 2 }
I would like to "pop" an element of this table identified by its key. There is a table.remove() function, but it only takes the index of the element to remove (i.e. a number) and not a generic key. I would like to be able to do table.remove(map, 'foo'), and here is how I implemented it:
-- Removes the entry for `key` and returns its previous value (or nil).
function table.removekey(t, key)
  local element = t[key]
  t[key] = nil
  return element
end
Is there a better way to do that ?

No, setting the key's value to nil is the accepted way of removing an item in the hashmap portion of a table. What you're doing is standard. However, I'd recommend not overriding table.remove() - for the array portion of a table, the default table.remove() functionality includes renumbering the indices, which your override would not do. If you do want to add your function to the table function set, then I'd probably name it something like table.removekey() or some such.
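A small illustration of the difference described above, assuming a standard Lua 5.1+ interpreter: table.remove on a sequence shifts the later elements down, while assigning nil to a map key only drops that one association.

```lua
-- table.remove on a sequence renumbers the elements after the removed one
local arr = { "a", "b", "c" }
local removed = table.remove(arr, 1)
assert(removed == "a")
assert(arr[1] == "b" and arr[2] == "c" and #arr == 2)

-- On a map, assigning nil is all there is; no renumbering happens
local map = { foo = 1, bar = 2 }
map.foo = nil
assert(map.foo == nil and map.bar == 2)
```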

TLDR
(because you're only damn trying to remove a thing from a map, how hard can that be)
Setting a key to nil (e.g. t[k] = nil) is not only accepted and standard, but it's the only way of removing the entry from the table's "content" in general. And that makes sense. Also, the array portion and the hashmap portion are implementation details and shouldn't ever have been mentioned in this Q/A.
Understanding Lua's tables
(and why you can't remove from an infinite)
Lua tables don't literally have a concept of "removing an entry from a table", and they hardly have a concept of "inserting an entry into a table". This is different from many data structures in other programming languages.
In Lua, tables are modelling a perfect associative structure of infinite size.
Tables in Lua are inherently mutable, and there's only one fundamental way to construct a new table: the {} expression. Constructing a table with initial content (e.g. t = {10, 20, ["foo"] = 123, 30} or anything alike) is actually syntactic sugar equivalent to first constructing a new table (e.g. t = {}) and then setting the "initial" entries one by one (e.g. t[1] = 10, t[2] = 20, t["foo"] = 123, t[3] = 30). The details of how the de-sugaring works don't help with understanding the discussed matter, so I will avoid the table construction sugar in this answer.
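A quick sketch of that equivalence, assuming the constructor has no repeated keys (the reference manual leaves the order of assignments inside a constructor unspecified, which only matters when keys repeat): both tables end up with exactly the same content.

```lua
-- Constructor sugar
local t1 = { 10, 20, ["foo"] = 123, 30 }

-- Step-by-step construction with the same resulting content
local t2 = {}
t2[1] = 10
t2[2] = 20
t2["foo"] = 123
t2[3] = 30

-- Every entry of each table is present, with the same value, in the other
for k, v in pairs(t1) do assert(t2[k] == v) end
for k, v in pairs(t2) do assert(t1[k] == v) end
```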
In Lua, a freshly-constructed table initially associates all possible keys with nil. That means that for a table t = {}, t[2] will evaluate to nil, t[100] will evaluate to nil, t["foo"] will evaluate to nil, t[{}] will evaluate to nil, etc.
After construction, you can mutate the table by setting a value at some key. That key will then be associated with that value. For example, after setting t["foo"] = "bar", the key "foo" is associated with the value "bar". In consequence, t["foo"] will now evaluate to "bar".
Setting nil at some key associates that key with nil. For example, after setting t["foo"] = nil, "foo" will (again) be associated with nil. In consequence, t["foo"] will (again) evaluate to nil.
While any key can be associated with nil (and initially all possible keys are), such entries (key/value pairs) are not considered a part of the table (i.e. aren't considered part of the table content).
Functions pairs and ipairs (and multiple others) operate on a table's content, i.e. the set of associations whose value isn't nil. The number of such associations is always finite.
Having everything said above in mind, associating a key with nil will probably do everything you could expect when saying "removing an entry from a table", because t[k] will evaluate to nil (like it did after constructing the table) and functions like pairs and ipairs will ignore such entries, as entries (associations) with value nil aren't considered "a part of the table".
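The behavior described above can be checked directly. This minimal sketch walks one key through the "absent, set, cleared" cycle and then confirms that pairs finds no entries left.

```lua
local t = {}
assert(t.foo == nil)      -- fresh table: every key evaluates to nil
t.foo = "bar"
assert(t.foo == "bar")
t.foo = nil               -- "remove" the entry
assert(t.foo == nil)

-- pairs sees nothing, because nil-valued keys are not part of the table
local count = 0
for _ in pairs(t) do count = count + 1 end
assert(count == 0)
```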
Sequences
(if tables weren't already tricky)
In this answer, I'm talking about tables in general, i.e. without any assumption about their keys. In Lua, tables with a particular set of keys can be called a sequence, and you can use table.remove to remove an integer key from such a table. But, first, this function is effectively undefined for non-sequences (i.e. tables in general) and, second, there's no reason to assume that it's more than a utility, i.e. something that could be directly implemented in Lua using primitive operations.
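To make the "just a utility" point concrete, here is a simplified sketch of sequence removal written with nothing but indexing and assignment. The name seqRemove is hypothetical, and unlike the real table.remove it does no bounds or argument checking.

```lua
-- Hypothetical, simplified re-implementation of table.remove for sequences.
-- Removes t[pos] (default: the last element), shifts later elements down,
-- and returns the removed value. No bounds checking is performed.
local function seqRemove(t, pos)
  local n = #t
  pos = pos or n
  local removed = t[pos]
  for i = pos, n - 1 do
    t[i] = t[i + 1]   -- shift each following element one slot down
  end
  t[n] = nil          -- the old last slot is now a duplicate; clear it
  return removed
end

local s = { 10, 20, 30 }
assert(seqRemove(s, 1) == 10)   -- s is now { 20, 30 }
assert(seqRemove(s) == 30)      -- s is now { 20 }
```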
Which tables are or aren't a sequence is another hairy topic and I won't get into details here.
Referencing the reference
(I really didn't make up all that)
Everything said above is based on the official language reference manual. The interesting parts are mostly chapter 2.1 – Values and Types...
Tables can be heterogeneous; that is, they can contain values of all types (except nil). Any key with value nil is not considered part of the table. Conversely, any key that is not part of a table has an associated value nil.
This part is not worded perfectly. First, I find the phrase "tables can be heterogeneous" confusing. It's the only use of this term in the reference, and the "can be" part makes it non-obvious whether "being heterogeneous" is a possible property of a table or whether tables are defined that way. The second sentence makes the first explanation more reasonable, because if "any key with value nil is not considered part of the table", then "a key with value nil" is not a contradiction. Also, the specification of the rawset function, which (indirectly) gives semantics to the t[k] = v syntax, in the 6.1 – Basic Functions chapter says...
Sets the real value of table[index] to value, without invoking any metamethod. table must be a table, index any value different from nil and NaN, and value any Lua value.
As nil values are not special-cased here, that means that you can set t[k] to nil. The only way to understand that is to accept that from now on, that key will be "a key with value nil", and in consequence "will not be considered part of the table" (pairs will ignore it, etc.), and as "any key that is not part of a table has an associated value nil", t[k] will evaluate to nil.
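A tiny check of that reading of rawset, assuming a table without a metatable: passing nil as the value is accepted, and afterwards the key is no longer part of the table's content.

```lua
local t = { foo = 1 }
rawset(t, "foo", nil)   -- nil is an allowed value for rawset
assert(t.foo == nil)    -- the key now evaluates to nil
assert(next(t) == nil)  -- and the table has no content left
```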
The whole reference also doesn't mention "removing" a key or an entry from tables in any other place.
Another perspective on tables
(if you hate infinities)
While I personally find the perspective from the reference elegant, I also understand that the fact that it's different from other popular models might make it more difficult to reason about.
I believe that the following perspective is effectively equivalent to the previous one.
You can think that...
{} returns an empty table.
t[k] evaluates to v if t contains an entry (k, v), and nil otherwise
Setting t[k] = v inserts a new entry (k, v) into the table if it doesn't contain key k, updates that entry if t already contains key k, and finally, as a special case, removes the entry for key k if v is nil
The content of the table (i.e. what's considered "a part of the table") is the set of all entries from the table
In this model, tables aren't capable of "containing" nil values.
This is not how the language reference defines things, but to the best of my understanding, such model is observably equivalent.
Don't talk implementation details
(unless you're sure that that's what you mean)
The so-called "hashmap portion" and "array portion" of a table are implementation details, and talking about them, unless we are discussing performance or explaining specific undefined or implementation-defined behaviors, is in my opinion confusing in the best case and plain wrong in the worst.
For example, in a table constructed like this: t = {}, t[1] = 10, t[2] = 20, t[3] = 30, the array portion will (probably!) be [10, 20, 30], and setting t[2] = nil will "remove" the entry (2, 20) "from the array part", possibly also resizing it or moving 3 -> 30 to the hashmap part. I'm not really sure. I'm only saying this to show that discussing implementation details is not what we want to do here.

Related

Why does assigning a table as a value to another table cause problems?

How come we can't intuitively copy tables around in Lua like so:
a = {
  a = {},
  b = {},
}
b = {}
b = a.b
I've run into some weird bugs doing this. If I use a table clone function like the following, it will work fine, I just don't understand why having to use a clone function is needed/best practice in the first place.
It's hard to describe the bug I've run into when trying to do the first method, but basically, if I try to add additional key-values inside the a.b part of b = a.b, then the additional key-values don't always become what I set them to.
function deepCopy(object)
  local lookup_table = {}
  local function _copy(object)
    if type(object) ~= "table" then
      return object
    elseif lookup_table[object] then
      return lookup_table[object]
    end
    local new_table = {}
    lookup_table[object] = new_table
    for index, value in pairs(object) do
      new_table[_copy(index)] = _copy(value)
    end
    return setmetatable(new_table, getmetatable(object))
  end
  return _copy(object)
end
and then doing the following removes any bugs
b = deepCopy(a.b)
In Lua, a table is a value, and each distinct table has a distinct value. The value of a table is used to identify its contents, but the contents of a table are not conceptually the value of the table. That is, to access the contents of a table, you need the table's value, but the table's value is not the same thing as its contents.
The table's value can be stored in any variable. And again, that value is used to identify that table and to access that table's contents, but that is not the same thing as the value logically being the table's contents.
Consider the following:
tbl1 = { 1, 2, 3 }
tbl2 = tbl1
tbl3 = { 1, 2, 3 }
The value of tbl1 and tbl2 is the same; this means that they both refer to the same table and thus you can access the contents of that table through either variable. So tbl1[2] and tbl2[2] don't simply return 2; they both access the same table.
tbl3 is not the same table as tbl1. They might have contents which are logically identical, but as far as Lua is concerned, they are different tables. Manipulating the contents of the table stored in tbl3 will not affect anyone looking at the tables stored in tbl1 or tbl2.
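A short demonstration of the aliasing just described, reusing the tbl1/tbl2/tbl3 example: a write through tbl2 is visible through tbl1, while tbl3 is untouched.

```lua
local tbl1 = { 1, 2, 3 }
local tbl2 = tbl1          -- same table, second name
local tbl3 = { 1, 2, 3 }   -- equal contents, different table

tbl2[2] = 20
assert(tbl1[2] == 20)      -- visible through tbl1: it is the same table
assert(tbl3[2] == 2)       -- tbl3 is unaffected
assert(tbl1 == tbl2 and tbl1 ~= tbl3)  -- == compares table identity
```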
So, why does storing a table into a variable not copy the table's contents? Several reasons.
Deep copies are expensive. If all copies were deep, you wouldn't even be able to execute a simple return {1, 2, 3} without performing a copy. A pointless copy, because there are no other variables that can talk to that table (since it was created in-situ). Why waste performance? Same goes for passing a table as a parameter to a function or any number of other things.
Deep-copying-only prevents useful things like accessing the same table from different locations. If every table copy were deep, how could you have something as simple as a local copy of a module table? You couldn't have a table "member function" return a table internal to an object that you could use to manipulate data in that object, because that return would have to copy the table. Thus, the table object would only be mutable through direct member functions.
Deep copying is a useful tool. But it isn't the default because it shouldn't be. Most cases of copying tables don't need it, and users need a way to access a table from multiple locations.
There is no standard function or mechanism for deep copying either. The reason for that is simple: there are many ways to do a deep copy, ranging from the simple to the complex. A naive deep copy that recurses into every table without remembering which tables it has already copied breaks on a table that (recursively) stores itself:
me = { a = 4, other = {} }
me.other.me = me
That is 100% valid, and such a naive copy will recurse forever on it. Handling this requires tracking the tables that were already copied (which is what lookup_table does in the function from the question), and that adds complexity and cost. Most users don't need a deepCopy that can handle recursive objects.
If Lua's standard library had a deep copy function, then either it would handle every such case (and thus be expensive) or it would be a simpler one that could break on any number of corner cases (having multiple references to the same table in a table, etc.).
So it's best to make any potential user of a deep copy sit down and decide exactly which cases they want to handle and which they do not.
Variables hold references, not entire tables.
It's far more efficient to copy a reference than an entire table.
A function call effectively assigns the arguments to that function's parameters, so if assignment did a full copy, it would be impossible to write a function that modifies a table.
Usually, when we assign a table to something, we either (a) don't plan on modifying the table, or (b) explicitly intend to use at least one of the variables to modify the underlying table. See the previous point on functions. This means that doing a full copy by default would be a waste of resources.
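The point about function parameters can be seen in a couple of lines. The helper name append is hypothetical; it works only because the parameter receives a reference to the caller's table rather than a copy.

```lua
-- Hypothetical helper: appends a value to the caller's sequence in place.
local function append(list, value)
  list[#list + 1] = value
end

local items = {}
append(items, "x")
append(items, "y")
assert(#items == 2 and items[1] == "x" and items[2] == "y")
```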
My advice is to only copy tables when you really need to, and prefer a shallow copy unless you really need a deep copy. In fact, when I need to copy tables, I usually write a specialized copy function so I don't copy any more than I need to.
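For reference, a shallow copy is just one level of key/value copying. This is a minimal sketch (the name shallowCopy is hypothetical); nested tables are shared between the original and the copy, and metatables are not copied.

```lua
-- Hypothetical helper: copies the top-level entries of t into a new table.
-- Nested tables are NOT copied; both tables will reference the same ones.
local function shallowCopy(t)
  local copy = {}
  for k, v in pairs(t) do
    copy[k] = v
  end
  return copy
end

local original = { x = 1, inner = {} }
local c = shallowCopy(original)
c.x = 2
assert(original.x == 1)            -- top-level fields are independent
assert(c.inner == original.inner)  -- nested table is shared
```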

Table remove with non-integer keys

I wanted to make my removeIf(aTable, unaryPredicate) function that removes elements that satisfy the predicate.
I've written the following code on a hunch, and surprisingly for me it works:
for k, v in pairs(aTable) do
  if unaryPredicate(v) then
    aTable[k] = nil
  end
end
What is the magic behind next or pairs that allows this code to work? As far as I can see, it iterates exactly sizeof(aTable) times.
Lua tables are implemented essentially as hashtables. The hashtable stores an array of (key, value) pairs.
next uses a hash to quickly skip to where the key should be in the table.
However, notice that there is a nil check in the implementation of next:
if (!ttisnil(&t->array[i])) { /* a non-nil value? */
This is because when nil is assigned to a key of a table, it updates the (key, value) pair inside the hashtable, but does not actually delete that entry. Thus you're left with a (key, nil) entry in the hashtable. This design allows iteration via next to continue unaffected when existing keys are assigned values, including when assigning to nil.
However, this is an implementation detail. Whether or not there is a nil entry in the hashtable is entirely invisible in the API exposed by the table implementation. Every function externally treats these nil keys in exactly the same way as absence.
next depends only on the keys in the table. The loop removes values but not keys (in the current implementation of Lua). The documentation explicitly says that you may remove values from tables in a loop like yours. It also says that you cannot add new entries with new keys, exactly because this would confuse next.
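Wrapped up as the removeIf function from the question, the loop looks like this. It relies only on the documented guarantee that existing fields may be cleared during a traversal; it never assigns to a key that wasn't already present.

```lua
-- Removes every entry of aTable whose value satisfies unaryPredicate.
-- Clearing existing fields during a pairs/next traversal is allowed.
local function removeIf(aTable, unaryPredicate)
  for k, v in pairs(aTable) do
    if unaryPredicate(v) then
      aTable[k] = nil
    end
  end
end

local t = { a = 1, b = 2, c = 3, d = 4 }
removeIf(t, function(v) return v % 2 == 0 end)  -- drop even values
```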

Initialisation order in Lua Table Constructor

So, a table constructor has two components, list-like and record-like. Do list-like entries always take precedence over record-like ones? I mean, consider the following scenario:
a = {[1]=1, [2]=2, 3}
print(a[1]) -- 3
a = {1, [2]=2, [1]=3}
print(a[1]) -- 1
Is the index 1 always associated with the first list-like entry, 2 with the second, and so on? Or is there something else?
Lua has a single table type, but tables are commonly used in two ways: as arrays and as dictionaries, which correspond to what you call "lists" and "records". An array contains values in an order, which gives you a few advantages, like faster iteration or inserting/removing values. A dictionary works like a giant lookup table; it has no order, and its advantages are that you can use almost any value as a key and you are not as restricted.
When you construct a table, you have 2 syntaxes: you can separate the values with commas, e.g. {2,4,6,8}, thereby creating an array with keys 1 through n, or you can define key-value pairs, e.g. {[1]=2,[58]=4,[368]=6,[48983]=8}, creating a dictionary. You can often mix these syntaxes without running into any problems, but this is not the case in your scenario.
What you are doing is defining the same key twice during the table's initial construction. The reference manual states that the order of the assignments in a constructor is undefined, and that this order only matters when there are repeated keys. So which value ends up stored is unspecified and may differ between Lua versions and implementations.
As such, you should not use this in any commercial projects, or any code you'll be sharing with other people. When in doubt, construct an empty table and define the key-value pairs afterward.
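Following that advice, the ambiguous constructor from the question can be rewritten as explicit, ordered assignments, so the last assignment to a key always wins.

```lua
-- Explicit assignments run in order, so the result is well defined
local a = {}
a[1] = 1
a[2] = 2
a[1] = 3          -- deliberately overwrite key 1
assert(a[1] == 3 and a[2] == 2)
```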

metamethods shadowing problems with luaL_ref key

I have an empty table, whose __newindex and __index metamethods are implemented from the C side. The table is going to be used as an array (t[1]=3, print(t[2])...), with C catching all the accesses.
Now, I want to use luaL_ref to add a reference of another object into this table, just to prevent the second from being thrown away by gc. But I think that the returned reference could shadow the "virtual" indexes that I'm going to use with this table:
For example, I expect t[1]=3 to call __newindex, but if luaL_ref returned 1, then my table would really have an element at '1', and __newindex wouldn't be called anymore.
I know that luaL_ref is guaranteed to return a key not already used in the table, but since the table is empty (so that my metamethods are always called), I think it actually can return low values, which I'm likely to use.
Are there flaws in this reasoning? If not, how can I workaround this?
I would advise not using luaL_ref at all, at least not on the empty table you're putting your metatable on. Maybe you should reference the object in the metatable itself, or in some other internal table that you store in the registry.
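A Lua-side analogue of that advice, under the assumption that the real metamethods live in C: the hypothetical backing table stands in for the C storage, and the hypothetical anchors table (reachable from the metatable) is where references would be kept instead of in the proxy table. The proxy stays empty, so every access still hits the metamethods.

```lua
-- Stand-in for the C-side storage that the real metamethods would use
local backing = {}
-- Separate table used only to keep objects alive (the luaL_ref target)
local anchors = {}

local mt = {
  __index = function(_, k) return backing[k] end,
  __newindex = function(_, k, v) backing[k] = v end,
  -- reachable from the metatable, so anchored objects stay alive
  -- for as long as the metatable itself is reachable
  anchors = anchors,
}
local t = setmetatable({}, mt)

-- Anchor an object without adding any key to t itself
anchors[#anchors + 1] = { name = "kept alive" }

t[1] = 3                  -- still routed through __newindex
assert(rawget(t, 1) == nil)
assert(t[1] == 3)         -- read back through __index
```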

reading mnesia tables where key is a tuple and the search criteria contains '_' underscores

I have a 3rd or 4th normal form mnesia database and the table in question should be a key/value hash, however, the architect put the keys and values in the key portion of the record.
It looks something like:
-record(idx,{key,value}).
...
Invoice = 1,
Date = now(),
K1 = {?NORMAL_TYPE1,Invoice,Date},
mnesia:write(#idx{key=K1}).
...
So the question is... if I only know the Invoice number can I still get the data from the DB in the same time as if the Invoice was the only key instead of the tuple?
K2 = {?NORMAL_TYPE1,Invoice,'_'},
Rec = mnesia:read(#idx{key=K2}).
The short answer: No.
The longer answer: You may have a chance if the table is an ordered_set or something similar, but I would not count on it.
The Aside: mnesia is usually not too effective w.r.t. normalized data. It is usually better to use standard RDBMS systems for that.
You can use mnesia:match_object/1/3 and mnesia:select/2/3/4/1 where you provide a pattern which can contain '_' as a don't care variable. I assume that is what you meant.
