Table remove with non-integer keys - lua

I wanted to make my removeIf(aTable, unaryPredicate) function that removes elements that satisfy the predicate.
I've written the following code on a hunch, and surprisingly for me it works:
for k, v in pairs(aTable) do
  if unaryPredicate(v) then
    aTable[k] = nil
  end
end
What is the magic behind next or pairs that allows this code to work? As far as I can see, it iterates exactly sizeof(aTable) times.

Lua tables are implemented essentially as hashtables. The hashtable stores an array of (key, value) pairs.
next uses a hash to quickly skip to where the key should be in the table.
However, notice that there is a nil check in the implementation of next:
if (!ttisnil(&t->array[i])) { /* a non-nil value? */
This is because when nil is assigned to a key of a table, it updates the (key, value) pair inside the hashtable, but does not actually delete that entry. Thus you're left with a (key, nil) entry in the hashtable. This design allows iteration via next to continue unaffected when existing keys are assigned values, including when they are assigned nil.
However, this is an implementation detail. Whether or not there is a nil entry in the hashtable is entirely invisible in the API exposed by the table implementation. Externally, every function treats keys with nil values in exactly the same way as absent keys.

next depends only on the keys in the table. The loop removes values but not keys (in the current implementation of Lua). The documentation explicitly says that you may remove values from tables in a loop like yours. It also says that you cannot add new entries with new keys, exactly because this would confuse next.
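A minimal sketch of the loop from the question wrapped as a function. The sample table (scores) and the predicate are made up for illustration; only pairs and assignment of nil to existing keys are used, which is the case the manual allows.

```lua
-- removeIf is the name used in the question; the data below is hypothetical.
local function removeIf(t, pred)
  for k, v in pairs(t) do
    if pred(v) then
      t[k] = nil  -- clearing an existing field during traversal is permitted
    end
  end
  return t
end

local scores = { alice = 3, bob = 8, carol = 5, dave = 10 }
removeIf(scores, function(v) return v > 4 end)

-- Count what is left; only alice = 3 fails the predicate.
local remaining = 0
for _ in pairs(scores) do
  remaining = remaining + 1
end
```

Assigning to a key that does not yet exist inside the same loop is the case the manual leaves undefined, so a variant that also inserts entries should collect them first and add them after the traversal.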

Related

How to check if child occurs multiple times in a dictionary

I am querying some data and saving it to a dictionary in Swift like this:
[route1: [date, destination, description],
 route2: [date, destination, description]]
I want to check whether the 'destination' occurs multiple times, and then save those entries to a separate dictionary. I can do the first part of checking, but I cannot figure out how to then save each recurring element of [route: [date, destination, description]] to a separate dictionary. The only way I have found is to add them once there is more than one, but then I miss the first one.
Can someone tell me how to do this?
Create a new dictionary (let's call it reverseMap).
Loop over each key-value pair in your original dictionary.
For each pair:
In your reverseMap, see if you already have a key equal to your original dictionary's value (normalized if necessary so that values you want to treat as equal collapse together, e.g., arrays in different orders, or only the destination if that is all you care about). If you don’t have one yet, create it, setting the value to an array with one element: the current key from your original dictionary.
If you do already have such a key, you have found a duplicate; at this point, append the current key from your original dictionary to the array at reverseMap[key].
This should provide all the information you need. When you are done, reverseMap will contain, for each value in your original dictionary, a complete list of every key in your original dictionary that mapped to that value.
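The steps above can be sketched as follows. This is written in Lua rather than Swift, and the route data is hypothetical; the same two passes carry over to a Swift dictionary keyed by destination.

```lua
-- Hypothetical data: route name -> { date, destination, description }
local routes = {
  route1 = { "2015-01-01", "Paris", "morning train" },
  route2 = { "2015-01-02", "Berlin", "night bus" },
  route3 = { "2015-01-03", "Paris", "flight" },
}

-- Pass 1: build reverseMap, destination -> list of route names.
local reverseMap = {}
for route, info in pairs(routes) do
  local destination = info[2]
  local names = reverseMap[destination]
  if names == nil then
    reverseMap[destination] = { route }  -- first time this destination is seen
  else
    names[#names + 1] = route            -- duplicate: append to the existing list
  end
end

-- Pass 2: keep only destinations reached by more than one route.
local duplicates = {}
for destination, names in pairs(reverseMap) do
  if #names > 1 then
    table.sort(names)  -- pairs has no defined order, so sort for stable output
    duplicates[destination] = names
  end
end
```

Because every route is recorded in reverseMap before any filtering happens, the first occurrence is not lost, which was the problem described in the question.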

Erlang ets:insert_new for bag

In my code I want to take advantage of ETS's bag type that can store multiple values for single key. However, it would be very useful to know if insertion actually inserts a new value or not (i.e. if the inserted key with value was or was not present in the bag).
With the set type of ETS I could use ets:insert_new, but the semantics are different for bag (emphasis mine):
This function works exactly like insert/2, with the exception that instead of overwriting objects with the same key (in the case of set or ordered_set) or adding more objects with keys already existing in the table (in the case of bag and duplicate_bag), it simply returns false.
Is there a way to achieve such functionality with one call? I understand it can be achieved by a lookup followed by an optional insert, but I am afraid it might hurt performance of concurrent access.

What is the purpose of a DictionaryIndex in Swift?

Per the header documentation on Dictionary in Swift:
A hash-based mapping from Key to Value instances. Also a
collection of key-value pairs with no defined ordering.
Note in particular: no defined ordering.
With this in mind, I'm having trouble fully understanding these computed variables (and the related methods that take these types):
// The position of the first element in a non-empty dictionary.
var startIndex: DictionaryIndex<Key, Value> { get }
// The collection's "past the end" position.
var endIndex: DictionaryIndex<Key, Value> { get }
The "index" here is a DictionaryIndex.
However, the documentation on DictionaryIndex is kinda circular here:
Used to access the key-value pairs in an instance of
Dictionary<Key, Value>.
What actually is the purpose of DictionaryIndex?
We know that a Dictionary is composed of keys and values. Every key is mapped to a value based on some internal calculations. Here the mechanism used for this purpose is Hashing.
From wikipedia:
A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
Consider that a Dictionary is a hash table that uses some hash function and returns an object of type DictionaryIndex, with which you can access a particular object in the Dictionary directly.
Correct me if I am wrong!

Availability of bidictionary structure?

I'm facing a case in my application where I need a bidirectional dictionary data structure, meaning a kind of NSDictionary where you can retrieve a key from a value and a value from a key (all values and keys are unique).
Is there such a data structure in C / Objective-C?
You can do it with a NSDictionary:
allKeysForObject:
Returns a new array containing the keys corresponding to all occurrences of a given object in the dictionary.
- (NSArray *)allKeysForObject:(id)anObject
Parameters
anObject: The value to look for in the dictionary.
Return Value
A new array containing the keys corresponding to all occurrences of anObject in the dictionary. If no object matching anObject is found, returns an empty array.
Discussion
Each object in the dictionary is sent an isEqual: message to determine if it’s equal to anObject.
And:
objectForKey:
Returns the value associated with a given key.
- (id)objectForKey:(id)aKey
Parameters
aKey: The key for which to return the corresponding value.
Return Value
The value associated with aKey, or nil if no value is associated with aKey.
Literally, the answer is No.
As a workaround you may create a helper class which manages two dictionaries.
Another approach is to create a thin wrapper around a C++ container that implements this, such as Boost's Bimap.
When using ARC and Objective-C objects as values or keys in C++ containers, they will handle NSObjects quite nicely. That is, they take care of memory management as you would expect - and you even get "exception safety" for free. Additionally, C++ standard containers are also a tad faster, use less memory, and provide more options to optimize (e.g. custom allocators).
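The two-dictionary helper suggested above can be sketched as follows. This is an illustration in Lua, not Objective-C, and BiMap with its method names is hypothetical; an Objective-C version would hold two NSMutableDictionary instances in the same way.

```lua
-- Hypothetical BiMap: two plain tables kept in sync, one per direction.
local BiMap = {}
BiMap.__index = BiMap

function BiMap.new()
  return setmetatable({ forward = {}, backward = {} }, BiMap)
end

function BiMap:set(key, value)
  -- Drop stale pairings first so both directions stay one-to-one.
  local oldValue = self.forward[key]
  if oldValue ~= nil then self.backward[oldValue] = nil end
  local oldKey = self.backward[value]
  if oldKey ~= nil then self.forward[oldKey] = nil end
  self.forward[key] = value
  self.backward[value] = key
end

function BiMap:valueFor(key) return self.forward[key] end
function BiMap:keyFor(value) return self.backward[value] end

local m = BiMap.new()
m:set("one", 1)
m:set("two", 2)
m:set("uno", 1)  -- replaces the "one" <-> 1 pairing
```

Both lookups are single hash accesses, whereas allKeysForObject: has to compare against every value in the dictionary.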

How to remove a lua table entry by its key?

I have a Lua table that I use as a hashmap, i.e. with string keys:
local map = { foo = 1, bar = 2 }
I would like to "pop" an element of this table identified by its key. There is a table.remove() method, but it only takes the index of the element to remove (i.e. a number) and not a generic key. I would like to be able to do table.remove(map, 'foo'), and here is how I implemented it:
function table.removekey(t, key)
  -- Parameter renamed from "table" so it no longer shadows the table library.
  local element = t[key]
  t[key] = nil  -- assigning nil removes the entry
  return element
end
Is there a better way to do that?
No, setting the key's value to nil is the accepted way of removing an item in the hashmap portion of a table. What you're doing is standard. However, I'd recommend not overriding table.remove() - for the array portion of a table, the default table.remove() functionality includes renumbering the indices, which your override would not do. If you do want to add your function to the table function set, then I'd probably name it something like table.removekey() or some such.
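A short sketch of the difference described above, using made-up sample tables: table.remove shifts the later elements of a list down, while assigning nil to a string key simply clears that one entry.

```lua
-- List-style table: table.remove renumbers the remaining elements.
local list = { "x", "y", "z" }
local removed = table.remove(list, 1)  -- "y" and "z" move down to 1 and 2

-- Map-style table: assigning nil removes the entry, nothing else moves.
local map = { foo = 1, bar = 2 }
local popped = map.foo
map.foo = nil

print(removed, list[1], list[2], list[3], #list)
print(popped, map.foo, map.bar)
```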
TLDR
(because you're only damn trying to remove a thing from a map, how hard can that be)
Setting a key to nil (e.g. t[k] = nil) is not only accepted and standard, it's the only way of removing an entry from the table's "content" in general. And that makes sense. Also, the array portion and the hashmap portion are implementation details and should never have been mentioned in this Q/A.
Understanding Lua's tables
(and why you can't remove from an infinite)
Lua tables don't literally have a concept of "removing an entry from a table", and they hardly have a concept of "inserting an entry into a table". This is different from many data structures in other programming languages.
In Lua, tables are modelling a perfect associative structure of infinite size.
Tables in Lua are inherently mutable and there's only one fundamental way to construct a new table: using the {} expression. Constructing a table with initial content (e.g. t = {10, 20, ["foo"] = 123, 30} or anything alike) is actually syntactic sugar equivalent to first constructing a new table (e.g. t = {}) and then setting the "initial" entries one by one (e.g. t[1] = 10, t[2] = 20, t["foo"] = 123, t[3] = 30). The details of how the de-sugaring works don't help with understanding the matter under discussion, so I will be avoiding the table construction sugar in this answer.
In Lua, a freshly-constructed table initially associates all possible keys with nil. That means that for a table t = {}, t[2] will evaluate to nil, t[100] will evaluate to nil, t["foo"] will evaluate to nil, t[{}] will evaluate to nil, etc.
After construction, you can mutate the table by setting a value at some key. Then that key will be now associated with that value. For example, after setting t["foo"] = "bar", the key "foo" will now be associated with the value "bar". In consequence, t["foo"] will now evaluate to "bar".
Setting nil at some key will associate that key to nil. For example, after setting t["foo"] = nil, "foo" will (again) be associated with nil. In consequence, t["foo"] will (again) evaluate to nil.
While any key can be associated to nil (and initially all possible keys are), such entries (key/value pairs) are not considered a part of the table (i.e. aren't considered part of the table content).
Functions pairs and ipairs (and several others) operate on the table's content, i.e. the set of associations in which the value isn't nil. The number of such associations is always finite.
With all of the above in mind, associating a key with nil will probably do everything you could expect when saying "removing an entry from a table", because t[k] will evaluate to nil (like it did after constructing the table) and functions like pairs and ipairs will ignore such entries, as entries (associations) with value nil aren't considered "a part of the table".
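A small sketch of that claim, with a made-up key: after a key is set and then set back to nil, the table is indistinguishable (through indexing, next, and pairs) from one where the key was never set.

```lua
local t = {}
local before = t.foo        -- nil: every key starts out associated with nil

t.foo = "bar"
local during = t.foo        -- "bar"

t.foo = nil
local after = t.foo         -- nil again

-- pairs only visits associations whose value is not nil.
local visited = 0
for _ in pairs(t) do
  visited = visited + 1
end

local isEmpty = (next(t) == nil)  -- next returns nil for a table with no content
```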
Sequences
(if tables weren't already tricky)
In this answer, I'm talking about tables in general, i.e. without any assumption about their keys. In Lua, a table with a particular set of keys can be called a sequence, and you can use table.remove to remove an integer key from such a table. But, first, this function is effectively undefined for non-sequences (i.e. tables in general) and, second, there's no reason to assume that it's more than a utility, i.e. something that could be implemented directly in Lua using primitive operations.
Which tables are or aren't a sequence is another hairy topic and I won't get into details here.
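To illustrate the point that table.remove is only a utility, here is a rough sketch of an equivalent written with indexing, assignment, and the length operator. The name removeAt is hypothetical, and the sketch assumes its argument is a proper sequence and that pos, when given, lies between 1 and #seq; it does not reproduce the library function's argument checking.

```lua
-- Hypothetical stand-in for table.remove, valid only for sequences.
local function removeAt(seq, pos)
  local n = #seq
  if n == 0 then return nil end
  pos = pos or n               -- default: remove the last element
  local removed = seq[pos]
  for i = pos, n - 1 do
    seq[i] = seq[i + 1]        -- shift later elements down by one
  end
  seq[n] = nil                 -- clear the now-duplicated last slot
  return removed
end

local s = { "a", "b", "c" }
local middle = removeAt(s, 2)  -- s is now { "a", "c" }
local last = removeAt(s)       -- s is now { "a" }
```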
Referencing the reference
(I really didn't make up all that)
Everything said above is based on the official language reference manual. The interesting parts are mostly chapter 2.1 – Values and Types...
Tables can be heterogeneous; that is, they can contain values of all types (except nil). Any key with value nil is not considered part of the table. Conversely, any key that is not part of a table has an associated value nil.
This part is not worded perfectly. First, I find the phrase "tables can be heterogeneous" confusing. It's the only use of this term in the reference, and the "can be" part makes it non-obvious whether "being heterogeneous" is a possible property of a table or whether tables are defined that way. The second sentence makes the first explanation more reasonable, because if "any key with value nil is not considered part of the table", then it means that "a key with value nil" is not a contradiction. Also, the specification of the rawset function, which (indirectly) gives semantics to the t[k] = v syntax, in the 6.1 – Basic Functions chapter says...
Sets the real value of table[index] to value, without invoking any metamethod. table must be a table, index any value different from nil and NaN, and value any Lua value.
As nil values are not special-cased here, that means that you can set t[k] to nil. The only way to understand that is to accept that from now on, that key will be "a key with value nil", and in consequence "will not be considered part of the table" (pairs will ignore it, etc.), and as "any key that is not part of a table has an associated value nil", t[k] will evaluate to nil.
The whole reference also doesn't mention "removing" a key or an entry from tables in any other place.
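The quoted rawset rules can be checked with a short sketch (the key name is made up): nil is accepted as a value, while nil and NaN are rejected as keys. pcall is used so that the rejected calls report failure instead of aborting the script.

```lua
local t = {}

rawset(t, "k", 1)
local set = rawget(t, "k")          -- 1

rawset(t, "k", nil)                 -- nil is a legal value
local cleared = rawget(t, "k")      -- nil: "k" is no longer part of the table

-- nil and NaN are not legal keys; both calls raise an error.
local okNilKey = pcall(rawset, t, nil, 1)
local okNaNKey = pcall(rawset, t, 0 / 0, 1)

local isEmpty = (next(t) == nil)
```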
Another perspective on tables
(if you hate infinities)
While I personally find the perspective from the reference elegant, I also understand that the fact that it's different from other popular models might make it more difficult to reason about.
I believe that the following perspective is effectively equivalent to the previous one.
You can think that...
{} returns an empty table.
t[k] evaluates to v if t contains key k, and nil otherwise
Setting t[k] = v inserts a new entry (k, v) to the table if it doesn't contain key k, updates such entry if t already contains key k, and finally, as a special case, removes the entry for the key k if v is nil
The content of the table (i.e. what's considered "a part of the table") is the set of all entries from the table
In this model, tables aren't capable of "containing" nil values.
This is not how the language reference defines things, but to the best of my understanding, such model is observably equivalent.
Don't talk implementation details
(unless you're sure that that's what you mean)
The so-called "hashmap portion" of the table and the so-called "array portion" that it supplements are implementation details, and talking about them, unless we are discussing performance or explaining specific undefined or implementation-defined behaviors, is in my opinion confusing in the best case and plain wrong in the worst.
For example, in a table constructed like this... t = {}, t[1] = 10, t[2] = 20, t[3] = 30, the array portion will (probably!) be [10, 20, 30] and setting t[2] = nil will "remove" the entry (2, 20) "from the array part", possibly also resizing it or moving 3 -> 30 to the hashmap part. I'm not really sure. I'm just saying this to prove that discussing implementation details is not what we want to do here.
