lua - table.concat with string keys - lua

I'm having a problem with lua's table.concat, and suspect it is just my ignorance, but cannot find a verbose answer to why I'm getting this behavior.
> t1 = {"foo", "bar", "nod"}
> t2 = {["foo"]="one", ["bar"]="two", ["nod"]="yes"}
> table.concat(t1)
foobarnod
> table.concat(t2)
The table.concat run on t2 provides no results. I suspect this is because the keys are strings instead of integers (index values), but I'm not sure why that matters.
I'm looking for A) why table.concat doesn't accept string keys, and/or B) a workaround that would allow me to concatenate a variable number of table values in a handful of lines, without specifying the key names.

Because that's what table.concat is documented as doing.
Given an array where all elements are strings or numbers, returns table[i]..sep..table[i+1] ··· sep..table[j]. The default value for sep is the empty string, the default for i is 1, and the default for j is the length of the table. If i is greater than j, returns the empty string.
Non-array tables have no defined order so table.concat wouldn't be all that helpful then anyway.
You can write your own, inefficient, table concat function easily enough.
function pconcat(tab)
local ctab, n = {}, 1
for _, v in pairs(tab) do
ctab[n] = v
n = n + 1
end
return table.concat(ctab)
end
You could also use next manually to do the concat, etc. yourself if you wanted to construct the string yourself (though that's probably less efficient than the above version).
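As a rough illustration of the next-based approach mentioned above, here is a minimal sketch. The name next_concat and its sep parameter are made up for this example, and the order of the values in the result is unspecified, just as with pairs.

```lua
-- Hypothetical helper: concatenate all values of a table by walking it
-- with next directly. The traversal order is not specified by Lua.
local function next_concat(tab, sep)
  sep = sep or ""
  local result = ""
  local k, v = next(tab)            -- first key/value pair, or nil if empty
  while k ~= nil do
    result = result .. tostring(v)
    k, v = next(tab, k)             -- advance to the following pair
    if k ~= nil then
      result = result .. sep        -- separator only between values
    end
  end
  return result
end

local t2 = {foo = "one", bar = "two", nod = "yes"}
print(next_concat(t2, ","))         -- some ordering of one, two, yes
```

Repeated string concatenation creates a new string on every step, which is why collecting the values into an array and calling table.concat is usually the better choice.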

Related

What is the fastest way to go through an array / table with numeric indices?

If i have an array with numbered indices in lua and need to go through every entry at least once, is it faster to use a numeric for loop or a generic for loop?
Semantical Difference
for i = 1, #t do end
is not the same as
for i, v in ipairs(t) do end
the latter does not rely on #t and respects the __ipairs metamethod (although this is deprecated and the __index and __newindex metamethods should be used instead).
Assuming no metamethods are set, ipairs will simply loop until it encounters a nil value. It is thus roughly equivalent to the following loop:
local i, v = 1, t[1]
while v ~= nil do --[[loop body here]] i = i + 1; v = t[i] end
This means there are two things it doesn't have to do:
It does not determine the length. It won't call a __len metamethod if set. This might theoretically result in a better performance for lists which reside in the hash part (where Lua has to determine the length through a search). It could also improve performance in cases where the __len metamethod does costly counting.
It does not have to loop over nil values. The numeric for loop on the other hand might loop over arbitrarily many nil values due to how the length operator is defined: For a table {[1] = 1, [1e9] = 1}, both 1 and 1e9 are valid values for the length. This also means it's unclear what it does, as the exact length value is unspecified.
The latter point in particular means that in pathological cases, the numeric for loop could be arbitrarily slower. It also allows for mistakes, such as looping over (possibly long) strings instead of tables, and won't trigger an error:
local str = "hello world"
for i = 1, #str do local v = str[i] end
will loop over only nil values (as it indexes the string metatable) but throw no error.
I also consider ipairs to be more readable as it makes the intent clear.
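The two behaviours described above can be checked with a small sketch. The table contents are arbitrary example values; the numeric loop is deliberately not shown for the table with a hole, because the value of # is unspecified there.

```lua
-- A list with a "hole" at index 3.
local t = {10, 20, nil, 40}

-- ipairs stops at the first nil, so only indices 1 and 2 are visited.
local visited = 0
for i, v in ipairs(t) do
  visited = visited + 1
end
print(visited)   --> 2

-- Indexing a string with a number goes through the string metatable,
-- whose __index is the string library table, so this yields nil
-- instead of raising an error.
local str = "hello world"
print(str[1])    --> nil
```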
Performance Difference
For non-pathological cases (lists residing in the list part, no "holes" (nil values), no odd metatables), the numeric for loop can be expected to run marginally faster, as it does not incur the call overhead of the generic for loop you'd be using with ipairs. This ought to be benchmarked on different Lua implementations though:
PUC Lua 5.1 to 5.4
LuaJIT 2.1.0
In practice, the costs of looping will often be negligible compared to the cost of the operations performed within the loop body. Results may vary depending on other factors such as operating system or hardware.
Rudimentary Benchmarks
print(jit and jit.version or _VERSION)
-- Fill list with 100 million consecutive integer values
a = {}
for i = 1, 1e8 do a[i] = i end
local function bench(name, func)
func() -- "warmup"
local t = os.clock()
for _ = 1, 10 do func() end
print(name, os.clock() - t, "s")
end
bench("numeric", function()
for i = 1, #a do
local v = a[i]
end
end)
bench("ipairs", function()
for i, v in ipairs(a) do end
end)
Conducted on a Linux machine.
Lua 5.1
numeric 54.312082 s
ipairs 63.579478 s
Lua 5.2
numeric 20.482682 s
ipairs 32.757554 s
Lua 5.3
numeric 14.81573 s
ipairs 23.121844 s
Lua 5.4
numeric 11.684143 s
ipairs 24.455616 s
Finally, LuaJIT:
LuaJIT 2.1.0-beta3
numeric 0.567874 s
ipairs 0.70047 s
Conclusion: Use LuaJIT if possible and stop worrying about micro-optimizations such as ipairs vs. numeric for (even though the latter may be slightly faster). Something as simple as an assert(i == v) will already cost as much as the loop itself (even if assert is local).
In this exact case it would be faster to use a numeric for loop. But not by much and in my testing it is more prone to load differences on my system.
Numeric Loop
a = {} -- new array
for i = 1, 10000000 do
a[i] = 10000000 + i
end
local startNumLoop = os.time(os.date("!*t"))
for i = 1, #a, 1 do
local value = a[i]
end
local stopNumLoop = os.time(os.date("!*t"))
local numloop = stopNumLoop - startNumLoop
print(os.clock())
Result: 1.379 - 1.499
Generic Loop
a = {} -- new array
for i = 1, 10000000 do
a[i] = 10000000 + i
end
local startGenLoop = os.time(os.date("!*t"))
for index, value in ipairs(a) do
end
local stopGenLoop = os.time(os.date("!*t"))
local genLoop = stopGenLoop- startGenLoop
print(os.clock())
Result: 1.568 - 1.662
Tested with the Lua 5.3.6 win32 binaries from lua.org
This was just a question I had where I didn't find the answer fast enough. If it is in fact a duplicate feel free to mark it so. :)
Especially for loops where it is necessary to remove/change the current key/value pair, I prefer the fast countdown method...
for i = #tab, 1, -1 do
if (tab[i].x > x) then
table.remove(tab, i) -- See comments
end
end
...for example in LÖVE [love2d] between two frames.
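To show why counting down is safe when removing, here is a small self-contained sketch. The helper name remove_if and the sample numbers are invented for illustration. Because table.remove shifts the later elements down, iterating from the end means the shifted elements have already been visited.

```lua
-- Hypothetical helper: remove every element for which pred returns true.
-- Iterating backwards keeps not-yet-visited indices stable after a removal.
local function remove_if(list, pred)
  for i = #list, 1, -1 do
    if pred(list[i]) then
      table.remove(list, i)
    end
  end
  return list
end

local nums = {1, 8, 3, 9, 10, 2}
remove_if(nums, function(v) return v > 5 end)
print(table.concat(nums, " "))   --> 1 3 2
```

A forward loop over the same data would skip the element that slides into a removed slot (here 10, right after 9 is removed).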

Non-numeric indices and #: are they never counted?

Given a table with mixed indexes like:
table = {
foo = 'bar',
[1] = 'foobar'
}
My question is about the # operator, which gives the last index reachable from 1 without hitting a gap when iterating through the table.
print(#table)
will give the output 1.
table = {
foo = 'bar',
lol = 'rofl',
[1] = 'some',
[2] = 'thing',
[3] = 'anything',
[4] = 'else'
}
print(#table)
should print 4
Can I be 100% sure that the # will never be distracted by non-numeric indexes?
Are those indexes really ignored every time?
Yes, you can count on that (in lua 5.1).
From the lua reference manual:
The length operator is denoted by the unary operator #. The length of
a string is its number of bytes (that is, the usual meaning of string
length when each character is one byte).
The length of a table t is defined to be any integer index n such that
t[n] is not nil and t[n+1] is nil; moreover, if t[1] is nil, n can be
zero. For a regular array, with non-nil values from 1 to a given n,
its length is exactly that n, the index of its last value. If the
array has "holes" (that is, nil values between other non-nil values),
then #t can be any of the indices that directly precedes a nil value
(that is, it may consider any such nil value as the end of the array).
Lua 5.2 allows for the __len metamethod to operate on tables, which means # can do other things. See @kikito's answer for some examples.
Etan's answer is correct, but not complete.
In Lua, if a table's metatable has a __len function, it will control what the # operator spits out. One can define it so that it takes into account the non-array keys.
local mt = {__len = function(tbl)
local len = 0
for _ in pairs(tbl) do len = len + 1 end
return len
end}
This demonstrates the thing:
local t = {1,2,3,4,foo='bar',baz='qux'}
print(#t) -- 4
setmetatable(t, mt)
print(#t) -- 6
If you really want to make sure that you get the "proper" array-like length, you must use rawlen instead:
print(rawlen(t)) -- 4, even with the metatable set
Edit: Note that __len does not work as I mention on Lua 5.1
The only way is to iterate through the entries and count them. Iterate over the table with pairs, increment a counter, then return the result.
function tablelength(T)
local count = 0
for _ in pairs(T) do
count = count + 1
end
return count
end
The # operator only works for array-like tables (sequences).
See: How to get number of entries in a Lua table?

Why does Lua's length (#) operator return unexpected values?

Lua has the # operator to compute the "length" of a table being used as an array.
I checked this operator and I am surprised.
This is code, that I let run under Lua 5.2.3:
t = {};
t[0] = 1;
t[1] = 2;
print(#t); -- 1 aha lua counts from one
t[2] = 3;
print(#t); -- 2 three values, but only two are counted
t[4] = 3;
print(#t); -- 4 but 3 is missing?
t[400] = 400;
t[401] = 401;
print(#t); -- still 4, now I am confused?
t2 = {10, 20, nil, 40}
print(#t2); -- 4 but documentations says this is not a sequence?
Can someone explain the rules?
About tables in general
(oh, can't you just give me an array)
In Lua, a table is the single general-purpose data structure. Table keys can be of any type, like number, string, boolean. Only nil keys aren't allowed.
Whether tables can or can't contain nil values is a surprisingly difficult question which I tried to answer in depth here. Let's just assume that setting t[k] = nil should be observably the same as never setting k at all.
Table construction syntax (like t2 = {10, 20, nil, 40}) is a syntactic sugar for creating a table and then setting its values one by one (in this case: t2 = {}, t2[1] = 10, t2[2] = 20, t2[3] = nil, t2[4] = 40).
Tables as arrays
(oh, from this angle it really looks quite arrayish)
As tables are the only complex data structure in Lua, the language (for convenience) provides some ways for manipulating tables as if they were arrays.
Notably, this includes the length operator (#t) and many standard functions, like table.insert, table.remove, and more.
The behavior of the length operator (and, in consequence, the mentioned utility functions) is only defined for array-like tables with a particular set of keys, so-called sequences.
Quoting the Lua 5.2 Reference manual:
the length of a table t is only defined if the table is a sequence, that is, the set of its positive numeric keys is equal to {1..n} for some integer n
As a result, the behavior of calling #t on a table not being a sequence at that time, is undefined.
It means that any result could be expected, including 0, -1, or false, or an error being raised (unrealistic for the sake of backwards compatibility), or even Lua crashing (quite unrealistic).
Indirectly, this means that the behavior of utility functions that expect a sequence is undefined if called with a non-sequence.
Sequences and non-sequences
(it's really not obvious)
So far, we know that using the length operator on tables that are not sequences is a bad idea. That means we should either use it only in programs written in a way that guarantees those tables will always be sequences in practice, or, when we are handed a table without any guarantees about its content, check dynamically that it is indeed a sequence.
Let's practice. Remember: positive numeric keys have to be in the form {1..n}, e.g. {1}, {1, 2, 3}, {1, 2, 3, 4, 5}, etc.
t = {}
t[1] = 123
t[2] = "bar"
t[3] = 456
Sequence. Easy.
t = {}
t[1] = 123
t[2] = "bar"
t[3] = 456
t[5] = false
Not a sequence. {1, 2, 3, 5} is missing 4.
t = {}
t[1] = 123
t[2] = "bar"
t[3] = 456
t[4] = nil
t[5] = false
Not a sequence. nil values aren't considered part of the table, so again we're missing 4.
t = {}
t[1] = 123
t[2] = "bar"
t[3.14] = 456
t[4] = nil
t[5] = false
Not a sequence. 3.14 is positive, but isn't an integer.
t = {}
t[0] = "foo"
t[1] = 123
t[2] = "bar"
Sequence. 0 isn't counted for the length and utility functions will ignore it, but this is a valid sequence. The definition only gives requirements about positive number keys.
t = {}
t[-1] = "foo"
t[1] = 123
t[2] = "bar"
Sequence. Similar.
t = {}
t[1] = 123
t["bar"] = "foo"
t[2] = "bar"
t[false] = 1
t[3] = 0
Sequence. We don't care about non-numeric keys.
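The dynamic check mentioned above could look roughly like the following sketch, which follows the Lua 5.2 wording quoted earlier (the set of positive numeric keys must equal {1..n}). The function name is_sequence is an assumption for this example, not a standard library function, and the check costs a full traversal of the table.

```lua
-- Hypothetical helper: true if the positive numeric keys of t are exactly
-- 1..n for some n >= 0. Non-numeric, zero and negative keys are ignored.
local function is_sequence(t)
  local count, max = 0, 0
  for k in pairs(t) do
    if type(k) == "number" and k > 0 then
      if k ~= math.floor(k) then
        return false              -- positive but not an integer, e.g. 3.14
      end
      count = count + 1
      if k > max then max = k end
    end
  end
  -- count distinct positive integer keys with largest key max:
  -- they cover 1..max exactly when count == max.
  return count == max
end

print(is_sequence({123, "bar", 456}))                  --> true
print(is_sequence({123, "bar", 456, [5] = false}))     --> false
print(is_sequence({[0] = "foo", 123, "bar"}))          --> true
print(is_sequence({123, "bar", [3.14] = 456}))         --> false
```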
Diving into the implementation
(if you really have to know)
But what happens in the C implementation of Lua when we call # on a non-sequence?
Background: Tables in Lua are internally divided into array part and hash part. That's an optimization. Lua tries to avoid allocating memory often, so it preallocates for the next power of two. That's another optimization.
When the last item in the array part is nil, the result of # is the length of the shortest valid sequence found by binsearching the array part for the first nil-followed key.
When the last item in the array part is not nil AND the hash part is empty, the result of # is the physical length of the array part.
When the last item in the array part is not nil AND the hash part is NOT empty, the result of # is the length of the shortest valid sequence found by binsearching the hash part for the first nil-followed key (that is such positive integer i that t[i] ~= nil and t[i+1] == nil), assuming that the array part is full of non-nils(!).
So the result of # is almost always the (desired) length of the shortest valid sequence, unless the last element in the array part representing a non-sequence is non-nil. Then, the result is bigger than desired.
Why is that? It seems like yet another optimization (for power-of-two sized arrays). The complexity of # on such tables is O(1), while other variants are O(log(n)).
In Lua only specially formed tables are considered an array. They are not really an array such as what one might consider as an array in the C language. The items are still in a hash table. But the keys are numeric and contiguous from 1 to N. Lua arrays are unit offset, not zero offset.
The bottom line is that if you do not know if the table you have formed meets the Lua criteria for an array then you must count up the items in the table to know the length of the table. That is the only way. Here is a function to do it:
function table_count(T)
local count = 0
for _ in pairs(T) do count = count + 1 end
return count
end
If you populate a table with the "insert" function used in the manner of the following example, then you will be guaranteed of making an "array" table.
s={}
table.insert(s, value) -- value is whatever you want to store
table.insert could be in a loop or called from other places in your code. The point is, if you put items in your table in this way then it will be an array table and you can use the # operator to know how many items are in the table, otherwise you have to count the items.

Reading lua table with word index gives random order

The following Lua code reads a table with word indexes.
Reading it into another table and printing that gives a different order every time it is run.
earthquakes = {
date8 = "1992/01/17",
date7 = "1971/02/09",
date6 = "2010/04/04",
date5 = "1987/10/19"
}
sf = string.format
earthquake_num ={}
for k, v in pairs(earthquakes) do
table.insert(earthquake_num, {key=k,value=v})
end
for i, v in pairs (earthquake_num) do
print(sf(" row %d key = %s", i, v.value))
end
OUTPUT :
in a different order every time
This is a special feature of Lua 5.2.1 :-)
But why was this feature introduced?
Anyway, you should not rely on ordering generated by pairs function.
EDIT :
This feature was introduced to fight hash collision attacks on web servers that are using Lua.
Randomized hash algorithm prevents easy generating of strings with equal hashes.
The ordering of table keys generated by the pairs function depends on the hashes of strings for string-type keys, so string keys end up mixed differently on every program run.
From Lua PiL on iterators:
The pairs function, which iterates over all elements in a table, is
similar, except that the iterator function is the next function, which
is a primitive function in Lua:
function pairs (t)
return next, t, nil
end
The call next(t, k), where k is a key of the table t, returns a next key
in the table, in an arbitrary order. (It returns also the
value associated with that key, as a second return value.) The call
next(t, nil) returns a first pair. When there are no more pairs, next
returns nil.
And the enumeration for next states:
next (table [, index])
The order in which the indices are enumerated is not specified,
even for numeric indices. (To traverse a table in numeric order, use a numerical for or the ipairs function.)
As Egor says, the pairs iterator returns table values in an arbitrary order. To get the data in a defined order, sort it with table.sort and then traverse it with ipairs, for example:
earthquakes = {
date8 = "1992/01/17",
date7 = "1971/02/09",
date6 = "2010/04/04",
date5 = "1987/10/19"
}
sf = string.format
earthquake_num ={}
for k, v in pairs(earthquakes) do
table.insert(earthquake_num, {key=k,value=v})
end
table.sort(earthquake_num,function(a, b) return a.value < b.value end)
for i, v in ipairs (earthquake_num) do
print(sf(" row %d key = %s", i, v.value))
end
See "lua: iterate through all pairs in table" for more information.
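If ordered traversal by key is needed in several places, the sort step can be wrapped in a reusable iterator. This is only a sketch: sorted_pairs is a made-up name, and it assumes all keys can be compared with each other (for example, all strings).

```lua
-- Hypothetical iterator: like pairs, but visits keys in sorted order.
-- Assumes the keys are mutually comparable (e.g. all strings).
local function sorted_pairs(t)
  local keys = {}
  for k in pairs(t) do
    keys[#keys + 1] = k
  end
  table.sort(keys)
  local i = 0
  return function()
    i = i + 1
    local k = keys[i]
    if k ~= nil then
      return k, t[k]
    end
    -- returning nothing ends the generic for loop
  end
end

local earthquakes = {
  date8 = "1992/01/17",
  date7 = "1971/02/09",
  date6 = "2010/04/04",
  date5 = "1987/10/19",
}

for k, v in sorted_pairs(earthquakes) do
  print(k, v)   -- date5 first, date8 last, on every run
end
```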

What's the difference between table.insert(t, i) and t[#t+1] = i?

In Lua, there seem to be two ways of appending an element to an array:
table.insert(t, i)
and
t[#t+1] = i
Which should I use, and why?
Which to use is a matter of preference and circumstance: as the # length operator was introduced in version 5.1, t[#t+1] = i will not work in Lua 5.0, whereas table.insert has been present since 5.0 and will work in both. On the other hand, t[#t+1] = i uses exclusively language-level operators, whereas table.insert involves a function (which has a slight amount of overhead to look up and call and depends on the table module in the environment).
In the second edition of Programming in Lua (an update of the Lua 5.0-oriented first edition), Roberto Ierusalimschy (the designer of Lua) states that he prefers t[#t+1] = i, as it's more visible.
Also, depending on your use case, the answer may be "neither". See the manual entry on the behavior of the length operator:
If the array has "holes" (that is, nil values between other non-nil values), then #t can be any of the indices that directly precedes a nil value (that is, it may consider any such nil value as the end of the array).
As such, if you're dealing with an array with holes, using either one (table.insert uses the length operator) may "append" your value to a lower index in the array than you want. How you define the size of your array in this scenario is up to you, and, again, depends on preference and circumstance: you can use table.maxn (disappearing in 5.2 but trivial to write), you can keep an n field in the table and update it when necessary, you can wrap the table in a metatable, or you could use another solution that better fits your situation (in a loop, a local tsize in the scope immediately outside the loop will often suffice).
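One of the options listed above, keeping an n field in the table, might be sketched as follows. The names new_list and append are invented for this example. Since the size is tracked explicitly, nil values can be stored without confusing the length.

```lua
-- Hypothetical list type that tracks its own size in an n field,
-- so it does not depend on the # operator and tolerates nil values.
local function new_list()
  return {n = 0}
end

local function append(list, value)
  list.n = list.n + 1
  list[list.n] = value
end

local list = new_list()
append(list, "a")
append(list, nil)     -- a deliberate hole
append(list, "c")

print(list.n)         --> 3
for i = 1, list.n do
  print(i, list[i])   -- visits all three slots, including the nil
end
```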
The following is slightly on the amusing side but possibly with a grain of aesthetics. Even though there are obvious reasons that mytable:operation() is not supplied like mystring:operation(), one can easily roll one's own variant, and get a third notation if desired.
Table = {}
Table.__index = table
function Table.new()
local t = {}
setmetatable(t, Table)
return t
end
mytable = Table.new()
mytable:insert('Hello')
mytable:insert('World')
for _, s in ipairs(mytable) do
print(s)
end
insert can insert at an arbitrary position (as its name states); it only defaults to #t + 1, whereas t[#t + 1] = i will always append to the end of the table. See section 5.5 in the Lua manual.
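A short sketch of the difference, using arbitrary example values:

```lua
local t = {"a", "c"}

-- With a position argument, table.insert shifts later elements up.
table.insert(t, 2, "b")        -- t is now {"a", "b", "c"}

-- Without a position, it appends, just like t[#t + 1] = value.
table.insert(t, "d")
t[#t + 1] = "e"

print(table.concat(t, ","))    --> a,b,c,d,e
```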
The '#' operator only counts the integer-indexed (array) part of a table.
t = {1, 2, 3, 4, 5, x=1, y=2}
With the code above,
print(#t) --> prints 5, not 7
So do not rely on the '#' operator in every case.
If you want to use '#', first check what kind of keys the table holds.
table.insert accepts values of any type, but it is somewhat slower than using '#' directly.
