Lua: Sort table of numbers with multiple dots

I have a table of strings like this:
{
"1",
"1.5",
"3.13",
"1.2.5.7",
"2.5",
"1.3.5",
"2.2.5.7.10",
"1.17",
"1.10.5",
"2.3.14.9",
"3.5.21.9.3",
"4"
}
And would like to sort that like this:
{
"1",
"1.2.5.7",
"1.3.5",
"1.5",
"1.10.5",
"1.17",
"2.2.5.7.10",
"2.3.14.9",
"2.5",
"3.5.21.9.3",
"3.13",
"4"
}
How do I sort this in Lua? I know that table.sort() will be used; I just don't know which comparison function (the second parameter) to pass.

Given your requirements, you probably want something like natural sort order. I described several possible solutions, as well as their impact on the results, in a blog post.
The simplest solution may look like the following; the blog post lists five solutions of varying complexity, together with their results:
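function alphanumsort(o)
  -- Replace every run of digits with its length (zero-padded to 3 digits)
  -- followed by the digits themselves, e.g. "10" -> "00210", so that plain
  -- string comparison orders the embedded numbers numerically.
  local function padnum(d) return ("%03d%s"):format(#d, d) end
  table.sort(o, function(a, b)
    return tostring(a):gsub("%d+", padnum) < tostring(b):gsub("%d+", padnum)
  end)
  return o
end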
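To see why the padding trick works, here is a small sketch of what the gsub step produces for two of the question's strings, reusing the same padnum helper:

```lua
-- Same helper as in alphanumsort: prefix each digit run with its length
local function padnum(d) return ("%03d%s"):format(#d, d) end

for _, s in ipairs({ "1.5", "1.10.5" }) do
  -- extra parentheses keep only the first return value of gsub
  print((s:gsub("%d+", padnum)))
end
--> 0011.0015
--> 0011.00210.0015
```

Because "0011.001..." compares less than "0011.002...", "1.5" sorts before "1.10.5". The three-digit length prefix assumes no run of digits is longer than 999 characters.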

table.sort sorts in ascending order by default, so for a plain ascending sort you don't need a second parameter. But since you're sorting strings, Lua compares them character by character, not by numeric value. Hence you must implement a comparison function that tells Lua which element comes first.
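A minimal sketch of the character-by-character behaviour, sorting a few of the question's strings with the default < (assuming the default C locale, where string comparison is byte-wise):

```lua
local t = { "1.5", "1.17", "1.10.5" }

-- No comparator: table.sort falls back to <, which compares strings
-- character by character, not segment by segment
table.sort(t)

print(table.concat(t, " ")) --> 1.10.5 1.17 1.5
```

"1.10.5" lands before "1.5" because the third characters are '1' and '5', even though 10 > 5 numerically.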
I just don't know the function (second parameter) to use for comparison.
That's what the Lua Reference Manual is for:
table.sort (list [, comp])
Sorts the list elements in a given order, in-place, from list[1] to list[#list]. If comp is given, then it must be a function that receives two list elements and returns true when the first element must come before the second in the final order, so that, after the sort, i <= j implies not comp(list[j],list[i]). If comp is not given, then the standard Lua operator < is used instead.
The comp function must define a consistent order; more formally, the function must define a strict weak order. (A weak order is similar to a total order, but it can equate different elements for comparison purposes.)
The sort algorithm is not stable: Different elements considered equal by the given order may have their relative positions changed by the sort.
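As a small illustration of that contract, here is a comparator that returns true when its first argument must come first; with a > b the result is a descending sort:

```lua
local nums = { 3, 13, 1, 4 }

-- comp(a, b) returns true when a must come before b in the final order.
-- A strict comparison (>) rather than >= keeps it a strict weak order.
table.sort(nums, function(a, b) return a > b end)

print(table.concat(nums, " ")) --> 13 4 3 1
```

Using >= instead would violate the strict-weak-order requirement; depending on the data, Lua may then raise an "invalid order function for sorting" error or produce a wrong order.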
Think about how you would do it with pen and paper: you would compare the numbers segment by segment. As soon as one segment is smaller than the corresponding segment of the other string, you know that string comes first (and if one string runs out of segments while all earlier ones were equal, the shorter one comes first).
So a solution needs to split each string into its dot-separated segments and convert them to numbers so you can compare their values.
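The pen-and-paper approach can be sketched as a comparator for table.sort. The helper names segments and dottedLess are hypothetical, and the sketch assumes the strings contain only digits and dots:

```lua
-- Hypothetical helper: split "1.10.5" into the numbers {1, 10, 5}
local function segments(s)
  local parts = {}
  for digits in s:gmatch("%d+") do
    parts[#parts + 1] = tonumber(digits)
  end
  return parts
end

-- Hypothetical comparator: compare segment by segment; if one string is a
-- prefix of the other ("1" vs "1.2"), the shorter one comes first
local function dottedLess(a, b)
  local sa, sb = segments(a), segments(b)
  for i = 1, math.min(#sa, #sb) do
    if sa[i] ~= sb[i] then
      return sa[i] < sb[i]
    end
  end
  return #sa < #sb
end

local t = {
  "1", "1.5", "3.13", "1.2.5.7", "2.5", "1.3.5",
  "2.2.5.7.10", "1.17", "1.10.5", "2.3.14.9", "3.5.21.9.3", "4",
}
table.sort(t, dottedLess)
print(table.concat(t, ", "))
--> 1, 1.2.5.7, 1.3.5, 1.5, 1.10.5, 1.17, 2.2.5.7.10, 2.3.14.9, 2.5, 3.5.21.9.3, 3.13, 4
```

This re-splits both strings on every comparison; for large tables it could be worth splitting each string once up front and sorting the pre-split records instead.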

Related

Performance of accessing table via reference vs ipairs loop

I'm modding a game. I'd like to optimize my code if possible for a frequently called function. The function will look into a dictionary table (consisting of estimated 10-100 entries). I'm considering 2 patterns a) direct reference and b) lookup with ipairs:
PATTERN A
tableA = { ["moduleName.propertyName"] = { --[[ some stuff ]] } } -- the key is a string with a dot inside, hence the quotation marks and brackets
result = tableA["moduleName.propertyName"]
PATTERN B
function lookup(key)
  -- linear scan: return the first entry whose type field matches key
  for _, obj in ipairs(tableB) do
    if obj.type == key then
      return obj
    end
  end
  return nil
end

tableB = {
  [1] = {
    type = "moduleName.propertyName",
    -- ... some stuff ...
  }
}
result = lookup("moduleName.propertyName")
Which pattern should be faster on average? I'd expect the 'native' referencing to be faster (it is certainly much neater), but maybe this is a silly assumption? I'm able to sort tableB (to some extent) in order of lookup frequency, whereas (as I understand it) Lua stores tableA's keys in an arbitrary internal order, even if I declare them in the proper order.
A lookup table will always be faster than searching a table every time.
For 100 elements that's one indexing operation compared to up to 100 loop cycles, iterator calls, conditional statements...
It is questionable, though, whether you would notice a difference in your application with so few elements.
So if you build that data structure for this purpose only, go with a lookup table right away.
If you already have this data structure for other purposes and you just want to look something up once, traverse the table with a loop.
If you have this structure already and you need to look values up more than once, build a lookup table for that purpose.
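Building a lookup table once from an existing list, as in the last case, might look like this sketch. The helper name buildIndex and the value fields are hypothetical:

```lua
-- Hypothetical helper: index a list of records by their type field, so
-- repeated lookups become a single table access instead of a scan
local function buildIndex(list)
  local index = {}
  for _, obj in ipairs(list) do
    -- keep the first record per type, matching what a linear scan returns
    if index[obj.type] == nil then
      index[obj.type] = obj
    end
  end
  return index
end

local tableB = {
  { type = "moduleName.propertyName", value = 1 },
  { type = "otherModule.otherProperty", value = 2 },
}

local byType = buildIndex(tableB)
print(byType["otherModule.otherProperty"].value) --> 2
```

The index must be rebuilt (or updated) whenever tableB changes, which is the price paid for constant-time lookups.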

What exactly does table.move do, and when would I use it?

The reference manual has this to say about the table.move function, introduced in Lua 5.3:
table.move (a1, f, e, t [,a2])
Moves elements from table a1 to table a2, performing the equivalent to the following multiple assignment: a2[t],··· = a1[f],···,a1[e]. The default for a2 is a1. The destination range can overlap with the source range. The number of elements to be moved must fit in a Lua integer.
This description leaves a lot to be desired. I'm hoping for a general, canonical explanation of the function that goes into more detail than the reference manual. (Oddly, I could not find such an explanation anywhere on the web, perhaps because the function is fairly new.)
Particular points I am still confused on after reading the reference manual's explanation a few times:
When it says "move", that means the items are being removed from their original location, correct? Do the indices of items above the removed items shift down to fill the gaps? If so, and we're moving within the same table, does t point to the original location before anything starts moving?
Is there some significance to the choice of index letters f, e, and t?
There is no similar function in any other language I know. What's an example of how I might use this? Since it's one of only seven table functions, I presume it's quite useful.
Moves elements from table a1 to table a2, performing the equivalent to the following multiple assignment a2[t],··· = a1[f],···,a1[e]
Perhaps they could have added that this is done over consecutive integer keys from f to e.
If you know Lua a bit better, you'll know that a Lua table has no inherent key order, so the only way to make that code work is to use consecutive integer keys, especially as the documentation mentions a source range.
Giving the equivalent syntax is the most unambiguous way of describing a function.
If you know the very basic concept of multiple assignment in Lua (see 3.3.3, Assignment), you know what this function does.
table.move(a1, 1, 4, 6, a2) would copy a1[1], a1[2], a1[3], a1[4] into a2[6], a2[7], a2[8], a2[9]
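A quick sketch of that call (Lua 5.3 or later), which also shows that "move" copies: the source table keeps its elements.

```lua
-- Requires Lua 5.3+ (table.move was added in 5.3)
local a1 = { "a", "b", "c", "d" }
local a2 = {}

-- Equivalent to: a2[6], a2[7], a2[8], a2[9] = a1[1], a1[2], a1[3], a1[4]
table.move(a1, 1, 4, 6, a2)

print(table.concat({ a2[6], a2[7], a2[8], a2[9] }, " ")) --> a b c d
print(#a1) --> 4 (the source is not emptied)
```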
The most common use case is probably getting a subset of a list.
local values = {1,45,1,44,123,2354,321,745,1231}
Old approach:
local subset = {}
for i = 3, 7 do
  table.insert(subset, values[i])
end
With table.move:
local subset = table.move(values, 3, 7, 1, {})
Or maybe you quickly want to remove the last 3 values from a table?
local a = {1,2,3,4,5,6,7}
table.move({}, 1, 3, #a - 2, a) -- copies three nils from an empty table into a[5], a[6], a[7]
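Since the destination range may overlap the source, table.move can also shift elements within one table, for example to open a gap. A sketch (Lua 5.3+):

```lua
-- Requires Lua 5.3+. Source and destination are the same table and overlap.
local list = { 10, 20, 30, 40 }

-- Shift elements 2..4 up by one:
-- list[3], list[4], list[5] = list[2], list[3], list[4]
table.move(list, 2, 4, 3)

-- Slot 2 still holds its old value (20); nothing was removed, so overwrite it
list[2] = 15
print(table.concat(list, " ")) --> 10 15 20 30 40
```

Because the result matches the multiple assignment, overlapping ranges behave as if all source values were read before any were written.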

table size difference. are both examples identical?

tNum={[2]=true , [3]=true,[4]=true, [5]=true ,[6]=true }
#tNum-->0
tNum={}
tNum[2]=true
tNum[3]=true
tNum[4]=true
tNum[5]=true
tNum[6]=true
#tNum-->6
why such a difference in size?
are both examples identical?
Your two tables are semantically identical, but using # on them is ambiguous. Both 0 and 6 are correct lengths. Here's an abridged version of the docs:
The length operator applied on a table returns a border in that table. A border in a table t is any natural number that satisfies the following condition:
(border == 0 or t[border] ~= nil) and t[border + 1] == nil
A table with exactly one border is called a sequence.
When t is not a sequence, #t can return any of its borders. (The exact one depends on details of the internal representation of the table, which in turn can depend on how the table was populated and the memory addresses of its non-numeric keys.)
This is an example of undefined behavior (UB). (That may not be the right word, because the behavior is partially defined. UB in Lua can't launch nuclear weapons, as it can in C.) Undefined behavior is important, because it gives the devs the freedom to choose the fastest possible algorithm without worrying about what happens when a user violates their assumptions.
To find a length, Lua makes, at most, log n guesses instead of looking at every element to find an unambiguous length. For large arrays, this speeds things up a lot.
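The border definition and the binary-search idea can be sketched as follows. The names isBorder and findBorder are hypothetical, and this is only an illustration, not Lua's actual length implementation:

```lua
-- A border n satisfies: (n == 0 or t[n] ~= nil) and t[n + 1] == nil
local function isBorder(t, n)
  return (n == 0 or t[n] ~= nil) and t[n + 1] == nil
end

-- Hypothetical binary search for *a* border between i and j.
-- Invariant: (i == 0 or t[i] ~= nil) and t[j] == nil.
local function findBorder(t, i, j)
  while j - i > 1 do
    local m = math.floor((i + j) / 2)
    if t[m] == nil then
      j = m
    else
      i = m
    end
  end
  return i -- now j == i + 1, so i is a border
end

local tNum = { [2] = true, [3] = true, [4] = true, [5] = true, [6] = true }

print(isBorder(tNum, 0)) --> true
print(isBorder(tNum, 6)) --> true
print(isBorder(tNum, 3)) --> false
print(findBorder(tNum, 0, 2)) --> 0
print(findBorder(tNum, 0, 8)) --> 6
```

Both answers are valid borders; which one a search finds depends only on where it starts probing, which is why # can legitimately return 0 or 6 here.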
The issue is that when you define a table as starting at index [2], the length operator breaks because it assumes that tables start at index [1].
The following code works as intended:
tNum = {[1]=false, [2]=true, [3]=true, [4]=true, [5]=true, [6]=true}
#tNum => 6
The odd behaviour occurs because when you initialize an array with tNum={}, it initializes by assigning every index to nil, and the first index is [1] (it doesn't actually initialize every value to nil, but it's easier to explain that way).
Conversely, when you initialize an array with tNum={[2]=true} you are explicitly telling the array that tNum[1] does not exist and the array begins at index 2. The length calculation breaks when you do this.
For a more thorough explanation, see this section of the Lua wiki near the bottom, where it explains:
For those that really want their arrays starting at 0, it is not difficult to write the following:
days = {[0]="Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday"}
Now, the first value, "Sunday", is at index 0. That zero does not affect the other fields, but "Monday" naturally goes to index 1, because it is the first list value in the constructor; the other values follow it. Despite this facility, I do not recommend the use of arrays starting at 0 in Lua. Remember that most functions assume that arrays start at index 1, and therefore will not handle such arrays correctly.
The length operator assumes your array begins at index [1], and since this one does not, it doesn't work correctly.
I hope this was helpful, good luck with your code!

Is there a way to tell `next` to start at specific key?

My understanding is that pairs(t) simply returns next, t, nil.
If I change that to next, t, someKey (where someKey is a valid key in my table) will next start at/after that key?
I tried this on the Lua Demo page:
t = { foo = "foo", bar = "bar", goo = "goo" }
for k,v in next, t, t.bar do
print(k);
end
And got varying results each time I ran the code. So specifying a starting key has an effect, unfortunately the effect seems somewhat random. Any suggestions?
Every time you run a program that traverses a Lua table, the order may be different, because Lua internally uses a random salt for its hash tables.
This was introduced in Lua 5.2. See luai_makeseed.
From the Lua documentation:
The order in which the indices are enumerated is not specified, even for numeric indices. (To traverse a table in numeric order, use a numerical for.)
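Passing a key as the initial control value does make next continue after that key, but "after" is relative to the traversal order of the current run, which is what varied between runs. A sketch that records that order and checks the suffix (note that t.bar in the question evaluates to the value "bar", which only works because it happens to equal the key):

```lua
local t = { foo = "foo", bar = "bar", goo = "goo" }

-- Record the traversal order that pairs (i.e. next) uses in this run
local order = {}
for k in pairs(t) do
  order[#order + 1] = k
end

-- Start the generic for with the key "bar": next(t, "bar") yields the key
-- that follows "bar" in this run's traversal order
local after = {}
for k in next, t, "bar" do
  after[#after + 1] = k
end

print(table.concat(order, " "))
print(table.concat(after, " ")) -- the keys that follow "bar" in the line above
```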

Ruby on Rails method to calculate percentiles - can it be refactored?

I have written a method to calculate a given percentile for a set of numbers for use in an application I am building. Typically the user needs to know the 25th percentile of a given set of numbers and the 75th percentile.
My method is as follows:
def calculate_percentile(array,percentile)
  #get number of items in array
  return nil if array.empty?
  #sort the array
  array.sort!
  #get the array length
  arr_length = array.length
  #multiply items in the array by the required percentile (e.g. 0.75 for 75th percentile)
  #round the result up to the next whole number
  #then subtract one to get the array item we need to return
  arr_item = ((array.length * percentile).ceil)-1
  #return the matching number from the array
  return array[arr_item]
end
This looks to provide the results I was expecting but can anybody refactor this or offer an improved method to return specific percentiles for a set of numbers?
Some remarks:
If a particular index of an Array does not exist, [] will return nil, so your initial check for an empty Array is unnecessary.
You should not sort! the Array argument, because you are affecting the order of the items in the Array in the code that called your method. Use sort (without !) instead.
You don't actually use arr_length after assignment.
A return statement on the last line is unnecessary in Ruby.
There is no standard definition for the percentile function (there can be a lot of subtleties with rounding), so I'll just assume that how you implemented it is how you want it to behave. Therefore I can't really comment on the logic.
That said, the function that you wrote can be written much more tersely while still being readable.
def calculate_percentile(array, percentile)
  array.sort[(percentile * array.length).ceil - 1]
end
Here's the same refactored into a one-liner. You don't need an explicit return on the last line in Ruby: the value of the last expression evaluated in the method is what's returned.
def calculate_percentile(array=[],percentile=0.0)
  # multiply items in the array by the required percentile
  # (e.g. 0.75 for 75th percentile)
  # round the result up to the next whole number
  # then subtract one to get the array item we need to return
  array ? array.sort[((array.length * percentile).ceil)-1] : nil
end
Not sure if it's worth it, but here is how I did it for the quartiles:
# Assumes list is already sorted; dividing by 2.0 avoids integer division truncating the median
def median(list)
  (list[(list.size - 1) / 2] + list[list.size / 2]) / 2.0
end
numbers = [1, 2, 3, 4, 5, 6]
if numbers.size % 2 == 0
  puts median(numbers[0...(numbers.size / 2)])
  puts median(numbers)
  puts median(numbers[(numbers.size / 2)..-1])
else
  # position of the middle element (Array#index would pick the wrong one if values repeat)
  median_index = numbers.size / 2
  puts median(numbers[0..(median_index - 1)])
  puts median(numbers)
  puts median(numbers[(median_index + 1)..-1])
end
If you're calculating both quartiles, you might want to move the "sort" outside the function, so that it only needs to be done once. This also means you aren't modifying your caller's data (sort!), nor making a copy every time the function is called (sort).
I know, premature optimisation and all that. And it's a bit awkward for the function to say, "the array must be sorted before calling this function". So it's reasonable to leave it as it is.
But sorting already-sorted data is going to take considerably longer than the whole rest of the function put together(*). It also has higher algorithmic complexity: O(N) at best, when the function could be O(1) for the second quartile (although O(N log N) for the first one if the data is not already sorted, of course). So it's worth avoiding if performance might ever be an issue for this function.
There are slightly faster ways of finding the two quartiles than a full sort (look up "selection algorithms"). For instance if you're familiar with the way qsort uses pivots, observe that if you need to know the 25th and 75th items out of 100, and your pivot at some stage ends up in position 80, then there's absolutely no point recursing into the block above the pivot. You really don't care what order those elements are in, just that they're in the top quartile. But this will considerably increase the complexity of the code compared with just calling a library to sort for you. Unless you really need a minor performance boost, I think you're good as you are.
(*) Unless ruby arrays have a flag to remember they're already sorted and haven't been modified since. I don't know whether they do, but if so then using sort! a second time is of course free.
