How do I gather statistics about internal Lua performance?

How can I measure Lua's internal performance, i.e. gather statistics on table count, reference count, function call count and so on?
I suspect that the performance problem in my Lua scenario comes from table manipulation (it creates a lot of tables, >= 1200).
I would like to confirm exactly where the bottleneck is and avoid redesigning the scenario.

Have a look at the debug library, that should help you quite a bit already.
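For example, since the question asks about function call counts, a "call" hook can tally those. Here is a minimal sketch (the counter table and key format are just illustrative):
-- Count how often each function is called, using a "call" hook.
local counts = {}

debug.sethook(function()
  local info = debug.getinfo(2, "Sn")  -- info about the function that was just called
  local key = (info.name or "?") .. " (" .. info.short_src .. ":" .. tostring(info.linedefined) .. ")"
  counts[key] = (counts[key] or 0) + 1
end, "c")

-- ... run the code you want to measure here ...

debug.sethook()  -- remove the hook
for key, n in pairs(counts) do print(n, key) end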
Using os.clock() to time execution times of blocks of code can also help you identify roughly where your bottlenecks are.
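A rough sketch of that approach; collectgarbage("count"), which reports the Lua heap size in kilobytes, can be sampled at the same points to see what your tables cost in memory (make_tables is a hypothetical stand-in for the code you suspect is slow):
local function make_tables(n)  -- hypothetical stand-in for the suspect code
  local t = {}
  for i = 1, n do t[i] = { value = i } end
  return t
end

collectgarbage()                          -- start from a clean heap
local mem0 = collectgarbage("count")      -- heap size in KB
local t0 = os.clock()

make_tables(1200)

print(string.format("elapsed: %.6f s", os.clock() - t0))
print(string.format("heap growth: %.1f KB", collectgarbage("count") - mem0))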
Looking at the instructions generated from your program using luac -p -l can also give you plenty of insight into how things work and what might be making your code run slowly.
Here's a short example of how you can even output the instructions of a function at runtime:
local function map(f, a, ...) -- Just a simple example function :)
  if a then
    return f(a), map(f, ...)
  end
end

-- This is where the magic happens ;)
local f = io.popen('luac -l -p -', 'w')  -- pipe the dumped bytecode to luac's stdin
f:write(string.dump(map))
f:close()

Related

Why do I always fail on loading big files in Lua? [duplicate]

The overview is I am prototyping code to understand my problem space, and I am running into 'PANIC: unprotected error in call to Lua API (not enough memory)' errors. I am looking for ways to get around this limit.
The environment bottom line is Torch, a scientific computing framework that runs on LuaJIT, and LuaJIT runs on Lua. I need Torch because I eventually want to hammer on my problem with neural nets on a GPU, but to get there I need a good representation of the problem to feed to the nets. I am (stuck) on CentOS Linux, and I suspect that trying to rebuild all the pieces from source in 32-bit mode (this is reported to extend the LuaJIT memory limit to 4 GB) will be a nightmare, if it works at all for all of the libraries.
The problem space itself is probably not particularly relevant, but in overview I have datafiles of points that I calculate distances between and then bin (i.e. make histograms of) these distances to try and work out the most useful ranges. Conveniently I can create complicated Lua tables with various sets of bins and torch.save() the mess of counts out, then pick it up later and inspect with different normalisations etc. -- so after one month of playing I am finding this to be really easy and powerful.
I can make it work looking at up to 3 distances with 15 bins each (15x15x15 plus overhead), but only by adding explicit collectgarbage() calls and using fork()/wait() for each datafile, so that the outer loop will keep running if one datafile (of several thousand) still blows the memory limit and crashes the child. This gets extra painful as each successful child process now has to read, modify and write the current set of bin counts -- and my largest files for this are currently 36 MB. I would like to go larger (more bins), and would really prefer to just hold the counts in the 15 GB of RAM I can't seem to access.
So, here are some paths I have thought of; please do comment if you can confirm/deny that any of them will/won't get me outside of the 1gb boundary, or will just improve my efficiency within it. Please do comment if you can suggest another approach that I have not thought of.
Am I missing a way to fire off a Lua process that I can read an arbitrary table back in from? No doubt I can break my problem into smaller pieces, but parsing a return table from stdio (as from a system call to another Lua script) seems error prone, and writing/reading small intermediate files will be a lot of disk I/O.
Am I missing a stash-and-access-table-in-high-memory module? This seems like what I really want, but I have not found one yet.
Can FFI C data structures be put outside the 1 GB? It doesn't seem like that would be the case, but certainly I lack a full understanding of what is causing the limit in the first place. I suspect that this will just get me an efficiency improvement over generic Lua tables for the few pieces that have moved beyond prototyping (unless I do a bunch of coding for each change)?
Surely I can get out by writing an extension in C (Torch appears to support nets that should go outside of the limit), but my brief investigation there turns up references to 'lightuserdata' pointers -- does this mean that a more normal extension won't get outside 1 GB either? This also seems to carry a heavy development cost for what should be a prototyping exercise.
I know C well so going the FFI or extension route doesn't bother me - but I know from experience that encapsulating algorithms in this way can be both really elegant and really painful with two places to hide bugs. Working through data structures containing tables within tables on the stack doesn't seem great either. Before I make this effort I would like to be certain that the end result really will solve my problem.
Thanks for reading the long post.
Only objects allocated by LuaJIT itself are limited to the first 2 GB of memory. This means that tables, strings, full userdata (i.e. not lightuserdata), and FFI objects allocated with ffi.new will count towards the limit, but objects allocated with malloc, mmap, etc. are not subject to this limit (regardless of whether they are allocated by a C module or through the FFI).
An example for allocating a structure with malloc:
local ffi = require("ffi")

ffi.cdef[[
typedef struct { int bar; } foo;
void* malloc(size_t);
void free(void*);
]]

local foo_t = ffi.typeof("foo")
local foo_p = ffi.typeof("foo*")

function alloc_foo()
  local obj = ffi.C.malloc(ffi.sizeof(foo_t))
  return ffi.cast(foo_p, obj)
end

function free_foo(obj)
  ffi.C.free(obj)
end
The new GC to be implemented in LuaJIT 3.0 will, IIRC, not have this limit, but I haven't heard any news on its development recently.
Source: http://lua-users.org/lists/lua-l/2012-04/msg00729.html
Here is some follow-up information for those who find this question later:
The key information is as posted by Colonel Thirty Two: C module extensions and FFI code can easily get outside of the limit. (And the referenced lua-l post is a reminder that plain Lua tables that grow toward the limit will be very slow to garbage collect.)
It took me some time to pull the pieces together to both access and save/load my objects, so here it is in one place:
I used lds at https://github.com/neomantra/lds as a starting point, in particular the 1-D Array code.
This broke using torch.save(), as it doesn't know how to write the new objects. For each object I added the code below (using Array as the example):
function Array:load(inp)
  for i = 1, #inp do
    self._data[i-1] = tonumber(inp[i])
  end
  return self
end

function Array:serialize()
  local siz = tonumber(self._size)
  io.write(' lds.ArrayT( ffi.typeof("double"), lds.MallocAllocator )( ', siz, '):load({')
  for i = 0, siz-1 do
    io.write(string.format("%a,", self._data[i]))
  end
  io.write("})")
end
Note that my application specifically uses doubles and malloc(), so a better implementation would store and use these in self rather than hard-coding them as above.
Then as discussed in PiL and elsewhere, I needed a serializer that would handle the object:
function serialize(o)
  if type(o) == "number" then
    io.write(o)
  elseif type(o) == "string" then
    io.write(string.format("%q", o))
  elseif type(o) == "table" then
    io.write("{\n")
    for k, v in pairs(o) do
      io.write(" ["); serialize(k); io.write("] = ")
      serialize(v)
      io.write(",\n")
    end
    io.write("}\n")
  elseif o.serialize then
    o:serialize()
  else
    error("cannot serialize a " .. type(o))
  end
end
and this needs to be wrapped with:
io.write('do local _ = ')
serialize( myWeirdTable )
io.write('; return _; end')
and then the output from that can be loaded back in with
local myWeirdTableReloaded = dofile('myWeirdTableSaveFile')
See PiL (Programming in Lua book) for dofile()
Hope that helps someone!
You can use the torch tds module. From the README:
Data structures which do not rely on Lua memory allocator, nor being limited by Lua garbage collector.
Only C types can be stored: supported types are currently number, strings, the data structures themselves (see nesting: e.g. it is possible to have a Hash containing a Hash or a Vec), and torch tensors and storages. All data structures can store heterogeneous objects, and support torch serialization.
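A minimal sketch of what using tds might look like (the key and field names here are just illustrative):
local tds = require 'tds'

local h = tds.Hash()            -- allocated outside the Lua heap, so not bound by the GC limit
h.bins = tds.Vec()              -- nested tds structures are supported
for i = 1, 15 * 15 * 15 do
  h.bins:insert(0)              -- one counter per bin
end
h.bins[1] = h.bins[1] + 1       -- update a bin
print(#h.bins)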

Lua tables: performance hit for starting array indexing at 0?

I'm porting FFT code from Java to Lua, and I'm starting to worry a bit about the fact that in Lua the array part of a table starts indexing at 1 while in Java array indexing starts at 0.
For the input array this causes no problem because the Java code is set up to handle the possibility that the data under consideration is not located at the start of the array. However, all of the working arrays internal to the code are assumed to start indexing at 0. I know that the code will work as written -- Lua tables are awesome like that -- but I have no sense at all of the performance hit I might incur by having the "0" element of the array going into the hash part of the underlying C structure (or indeed, whether that is what will happen).
My question: is this something worth worrying about? Should I be planning to profile and hand-optimize the code? (The code will eventually be used to transform many relatively small (> 100 time points) signals of varying lengths not known in advance.)
I have made a small, probably not that reliable, test:
local arr = {}
for i = 0, 10000000 do
  arr[i] = i * 2
end
for k, v in pairs(arr) do
  arr[k] = v * v
end
And a similar version with 1 as the first index. On my system:
$ time lua example0.lua
real 2.003s
$ time lua example1.lua
real 2.014s
I was also interested in how table.insert would perform:
for i = 1, 10000000 do
  table.insert(arr, 2*i)
...
and, surprisingly:
$ time lua example2.lua
real 6.012s
Results:
Of course, it depends on what system you're running it on, and probably also which Lua version, but it seems to make little to no difference whether you start at zero or one. A bigger difference is caused by the way you insert things into the array.
I think the correct answer in this case is changing the algorithm so that everything is indexed with 1. And consider that part of the conversion.
Your FFT will be less surprising to another Lua user (like me), given that all "array-like" tables are indexed by one.
The conversion might not be as painful as you think, given the way numeric for loops are structured in Lua (where the "start" and the "end" are both inclusive). You would be exchanging this:
for i = 0, #array - 1 do
  ... (do stuff with i)
end
for this:
for i = 1, #array do
  ... (do stuff with i)
end
The non-numeric loops would remain unchanged (except that you will be able to use ipairs too, if you so desire).

Can I profile Lua scripts running in Redis?

I have a cluster app that uses a distributed Redis back-end, with dynamically generated Lua scripts dispatched to the redis instances. The Lua component scripts can get fairly complex and have a significant runtime, and I'd like to be able to profile them to find the hot spots.
SLOWLOG is useful for telling me that my scripts are slow, and exactly how slow they are, but that's not my problem. I know how slow they are, I'd like to figure out which parts of them are slow.
The Redis EVAL docs are clear that Redis does not export any timekeeping functions to Lua, which makes it seem like this might be a lost cause.
So, short a custom fork of Redis, is there any way to tell which parts of my Lua script are slower than others?
EDIT
I took Doug's suggestion and used debug.sethook - here's the hook routine I inserted at the top of my script:
redis.call('del', 'line_sample_count')

local function profile()
  local line = debug.getinfo(2)['currentline']   -- line currently executing in the script
  redis.call('zincrby', 'line_sample_count', 1, line)
end

debug.sethook(profile, '', 100)   -- sample every 100 VM instructions
Then, to see the hottest 10 lines of my script:
ZREVRANGE line_sample_count 0 9 WITHSCORES
If your scripts are processing bound (not I/O bound), then you may be able to use the debug.sethook function with a count hook:
The count hook: is called after the interpreter executes every count instructions. (This event only happens while Lua is executing a Lua function.)
You'll have to build a profiler based on the counts you receive in your callback.
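Outside Redis, the skeleton of such a counting profiler might look like this minimal sketch (the sampling interval of 1000 instructions is arbitrary; the Redis version in the question's edit replaces the local table with a sorted set):
local samples = {}

debug.sethook(function()
  local info = debug.getinfo(2, "Sl")   -- source and current line of the running function
  if info then
    local key = info.short_src .. ":" .. info.currentline
    samples[key] = (samples[key] or 0) + 1
  end
end, "", 1000)

-- ... run the workload being profiled ...

debug.sethook()                         -- stop sampling
for line, n in pairs(samples) do print(n, line) end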
The PepperfishProfiler would be a good place to start. It uses os.clock, which you don't have, but you could just use hook counts for a very crude approximation.
This is also covered in PiL 23.3 – Profiles
In standard Lua, you can't: there is no built-in millisecond clock, and os.time() only returns whole seconds. So there are two options available: you either write your own Lua extension DLL to return the time in msec, or:
You can do a basic benchmark using a millisecond-resolution time, which you can get from LuaSocket. Though this adds a dependency to your project, it's an effective way to do trivial benchmarking.
require "socket"
t = socket.gettime();

Doing efficient mathematical calculations in Redis

Looking around the web for information on doing maths in Redis, I haven't actually found much. I'm using the Redis-RB gem in Rails, and caching lists of results:
e = [1738738.0, 2019461.0, 1488842.0, 2272588.0, 1506046.0, 2448701.0, 3554207.0, 1659395.0, ...]
$redis.lpush "analytics:math_test", e
Currently, our lists of numbers max in the thousands to tens of thousands per list per day, with number of lists likely in the thousands per day. (This is not actually that much; however, we're growing, and expect much larger sample sizes very soon.)
For each of these lists, I'd like to be able to run stats. I currently do this in-app
def basic_stats(arr)
  return nil if arr.nil? or arr.empty?
  min = arr.min.to_f
  max = arr.max.to_f
  total = arr.inject(:+)
  len = arr.length
  mean = total.to_f / len # to_f so we don't get an integer result
  sorted = arr.sort
  median = len % 2 == 1 ? sorted[len/2] : (sorted[len/2 - 1] + sorted[len/2]).to_f / 2
  sum = arr.inject(0) { |accum, i| accum + (i - mean)**2 }
  variance = sum / (arr.length - 1).to_f
  std_dev = Math.sqrt(variance).nan? ? 0 : Math.sqrt(variance)
  {min: min, max: max, mean: mean, median: median, std_dev: std_dev, size: len}
end
and, while I could simply store the stats, I will often have to aggregate lists together to run stats on the aggregated list. Thus, it makes sense to store the raw numbers rather than every possible aggregated set. Because of this, I need the math to be fast, and have been exploring ways to do this. The simplest way is just doing it in-app; with 150k items in a list, this isn't actually too terrible:
$redis_analytics.llen "analytics:math_test"
=> 156954
Benchmark.measure do
  basic_stats $redis_analytics.lrange("analytics:math_test", 0, -1).map(&:to_f)
end
=> 2.650000 0.060000 2.710000 ( 2.732993)
While I'd rather not spend nearly 3 seconds on a single calculation, it's not terrible, given that this is about 10x the sample size of my current use-case. What if we were working with a sample size of one million or so?
$redis_analytics.llen("analytics:math_test")
=> 1063454
Benchmark.measure do
  basic_stats $redis_analytics.lrange("analytics:math_test", 0, -1).map(&:to_f)
end
=> 21.360000 0.340000 21.700000 ( 21.847734)
Options
Use the SORT method on the list, then you can instantaneously get min/max/length in Redis. Unfortunately, it seems that you still have to go in-app for things like median, mean, std_dev. Unless we can calculate these in Redis.
Use Lua scripting to do the calculations. (I haven't learned any Lua yet, so I can't say exactly what this would look like; a sketch follows this list. If it's likely faster, I'd like to know so I can try it.)
Some more efficient way to utilize Ruby, which seems a wee bit unlikely, since using what seems like a fairly decent stats gem gives analogous results.
Use a different database.
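For reference, a server-side Lua script for option 2 might look roughly like the sketch below. It scans the list once and returns the basic aggregates. (This is a sketch only: the key name matches the example above, a median would still require a sort, and LRANGE over a million-element list blocks Redis while the script runs.)
-- stats.lua: single-pass min/max/sum/mean over a Redis list.
-- KEYS[1] is the list key, e.g. "analytics:math_test".
local vals = redis.call('LRANGE', KEYS[1], 0, -1)
local n = #vals
if n == 0 then return {} end
local min, max, sum = math.huge, -math.huge, 0
for i = 1, n do
  local v = tonumber(vals[i])
  if v < min then min = v end
  if v > max then max = v end
  sum = sum + v
end
-- Redis truncates Lua numbers to integers in replies, so format floats as strings.
return { tostring(min), tostring(max), tostring(sum / n), n }
It could be invoked with, e.g.: redis-cli EVAL "$(cat stats.lua)" 1 analytics:math_test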
Example using StatsSample gem
Using a gem seems to gain me nothing. In Python, I'd probably write a C module; I'm not sure whether many Ruby stats gems are implemented in C.
require 'statsample'

def basic_stats(stats)
  return nil if stats.nil? or stats.empty?
  arr = stats.to_scale
  {min: arr.min, max: arr.max, mean: arr.mean, median: arr.median, std_dev: arr.sd, size: stats.length}
end
Benchmark.measure do
  basic_stats $redis_analytics.lrange("analytics:math_test", 0, -1).map(&:to_f)
end
=> 20.860000 0.440000 21.300000 ( 21.436437)
Coda
It's quite possible, of course, that such large stats calculations will simply take a long time and that I should offload them to a queue. However, given that much of this math is actually happening inside Ruby/Rails, rather than in the database, I thought I might have other options.
I want to keep this open in case anyone has any input that could help others doing the same thing. For me, however, I've just realized that I'm spending too much time trying to force Redis to do something that SQL does quite well. If I simply dump this into Postgres, I can do really efficient aggregation AND math directly in the database. I think I was just stuck using Redis for something that, when it started, was a good idea, but scaled out to something bad.
Lua scripting is probably the best way to solve this problem, if you can switch to Redis 2.6. By the way, testing the speed should be pretty straightforward, so given the small time investment needed I strongly suggest trying Lua scripting to see what results you get.
Another thing you could do is use Lua to set the data, making sure the script also updates a related hash per list that directly retains the min/max/average stats, so you don't have to compute those stats every time: they are incrementally updated as values are pushed. This isn't always possible, by the way; it depends on your specific use case.
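A sketch of that incremental idea (the key and field names here are hypothetical): the script pushes a value and keeps a companion hash in sync, so reads never have to scan the list:
-- push.lua: LPUSH a value and incrementally maintain running stats
-- in a companion hash. KEYS[1] = list, KEYS[2] = stats hash, ARGV[1] = value.
local v = tonumber(ARGV[1])
redis.call('LPUSH', KEYS[1], ARGV[1])
local min = tonumber(redis.call('HGET', KEYS[2], 'min'))
local max = tonumber(redis.call('HGET', KEYS[2], 'max'))
if not min or v < min then redis.call('HSET', KEYS[2], 'min', ARGV[1]) end
if not max or v > max then redis.call('HSET', KEYS[2], 'max', ARGV[1]) end
redis.call('HINCRBY', KEYS[2], 'count', 1)
redis.call('HINCRBYFLOAT', KEYS[2], 'sum', ARGV[1])
-- mean = sum / count, computed at read time from the hash.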
I would take a look at NArray. From their homepage:
This extension library incorporates fast calculation and easy manipulation of large numerical arrays into the Ruby language.
It looks like their array class has most of the functions you need built in. Cmd-F "Statistics" on that page.

(Secure) Random string?

In Lua, one would usually generate random values and/or strings by using math.random & math.randomseed, where os.time is used for math.randomseed.
This method, however, has one major weakness: the returned number is always just as random as the current time, AND the interval for each random number is one second, which is way too long if one needs many random values in a very short time.
This issue is even pointed out by the Lua-users wiki: http://lua-users.org/wiki/MathLibraryTutorial, and the corresponding RandomStrings recipe: http://lua-users.org/wiki/RandomStrings.
So I've sat down and written a different algorithm (if it can even be called that), which generates random numbers by (mis-)using the memory addresses of tables:
math.randomseed(os.time())

function realrandom(maxlen)
  local tbl = {}
  local num = tonumber(string.sub(tostring(tbl), 8))
  if maxlen ~= nil then
    num = num % maxlen
  end
  return num
end
function string.random(length, pattern)
  local length = length or 11
  local pattern = pattern or '%a%d'
  local rand = ""
  local allchars = ""
  for loop = 0, 255 do
    allchars = allchars .. string.char(loop)
  end
  local str = string.gsub(allchars, '[^' .. pattern .. ']', '')
  while string.len(rand) ~= length do
    local randidx = realrandom(string.len(str))
    local randbyte = string.byte(str, randidx)
    rand = rand .. string.char(randbyte)
  end
  return rand
end
At first, everything seems perfectly random, and I'm sure it is... at least for the current program.
So my question is, how random are these numbers returned by realrandom really?
Or is there an even better way to generate random numbers at a shorter interval than one second (which kind of implies that os.time shouldn't be used, as explained above), without relying on external libraries, AND, if possible, in an entirely cross-platform manner?
EDIT:
There seems to be a major misunderstanding regarding the way the RNG is seeded; in production code, the call to math.randomseed() happens just once -- this was just a badly chosen example here.
What I mean by "the random value is only random once per second" is easily demonstrated by this paste: http://codepad.org/4cDsTpcD
As this question will get downvoted regardless of my edits, I have also un-accepted my previously accepted answer, in hope of a better one, even if it just offers better opinions. I understand that issues regarding random values/numbers have been discussed many times before, but I have not found such a question that is relevant to Lua -- please keep that in mind!
You should not call seed each time you call random; you ought to call it only once, at program initialization (unless you get the seed from somewhere, for example, to replicate some previous "random" behaviour).
The standard Lua random generator is of poor quality in the statistical sense (it is, in fact, the standard C random generator); do not use it if you care about that. Use, for example, the lrandom module (available in LuaRocks).
If you need a more secure random, read from /dev/random on Linux. (I think that Windows should have something along the same lines, but you may need to code something in C to use it.)
Relying on table pointer values is a bad idea. Think about alternate Lua implementations, in Java, for example: there is no telling what they would return. (Also, the pointer values may be predictable, and they may be, under certain circumstances, the same each time the program is invoked.)
If you want finer precision for the seed (and you will want this only if you're launching the program more often than once per second), you should use a timer with better resolution. For example, socket.gettime() from LuaSocket. Multiply it by some value, since math.randomseed works with the integer part only, and socket.gettime() returns time in (floating point) seconds.
require 'socket'

math.randomseed(socket.gettime() * 1e6)

for i = 1, 1e3 do
  print(math.random())
end
This method however has one major weakness; The returned number is always just as random as the current time, AND the interval for each random number is one second, which is way too long if one needs many random values in a very short time.
It has those weaknesses only if you implement it incorrectly.
math.randomseed is supposed to be called sparingly - usually just once at the beginning of your program, and it is usually seeded with os.time. Once the seed is set, you can use math.random many times, and it will yield random values.
See what happens on this sample:
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
> math.randomseed(2)
> return math.random(), math.random(), math.random()
0.70097636929759 0.80967634907443 0.088795455214007
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
When I change the seed from 1 to 2, I get different random results. But when I go back to 1, the "random sequence" is reset. I obtain the same values as before.
os.time() returns an ever-increasing number. Using it as a seed is appropriate; then you can invoke math.random forever and have different random numbers every time you invoke it.
The only scenario where you have to be a bit worried about non-randomness is when your program is supposed to be executed more than once per second. In that case, as the others are saying, the simplest solution is using a clock with higher resolution.
In other words:
Invoke math.randomseed with an appropriate seed (os.time() is ok in 99% of the cases) at the beginning of your program
Invoke math.random every time you need a random number.
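Putting those two points together, a minimal sketch of the questioner's use case (the character set here is arbitrary):
math.randomseed(os.time())          -- once, at program start

local function random_string(len)
  local chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  local out = {}
  for i = 1, len do
    local idx = math.random(#chars)  -- uniform integer in [1, #chars]
    out[i] = chars:sub(idx, idx)
  end
  return table.concat(out)
end

print(random_string(11))            -- callable many times per second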
Regards!
Some thoughts on the first part of your question:
So my question is, how random are these numbers returned by realrandom really?
Your function is attempting to discover the address of a table by using a quirk of its default implementation of tostring(). I don't believe that the string returned by tostring{} has a specified format, or that the value included in that string has any documented meaning. In practice, it is derived from the address of something related to the specific table, and so distinct tables convert to distinct strings. However, the next version of Lua is free to change that to anything that is convenient. Worse, the format it takes will be highly platform dependent because it appears to use the %p format specifier to sprintf() which is only specified as being a sensible representation of a pointer.
There's also a much bigger issue. While the address of the nth table created in a process might seem random on your platform, it might not be random at all, or it might vary in only a few bits. For example, on my Win7 box only a few bits vary, and not very randomly:
C:...>for /L %i in (1,1,20) do @lua -e "print{}"
table: 0042E5D8
table: 0061E5D8
table: 0024E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0064E5D8
table: 0042E5D8
table: 002FE5D8
table: 0042E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0024E5D8
table: 0042E5D8
table: 0042E5D8
table: 0061E5D8
table: 0042E5D8
Other platforms will vary, of course. I'd even expect there to be platforms where the address of the first allocated table is completely deterministic, and hence identical on every run of the program.
In short, the address of an arbitrary object in your process image is not a very good source of randomness.
Edit: For completeness, I'd like to add a couple of other thoughts that came to mind overnight.
The stock tostring() function is supplied by the base library and implemented by the function luaB_tostring(). The relevant bit is this fragment:
switch (lua_type(L, 1)) {
  ...
  default:
    lua_pushfstring(L, "%s: %p", luaL_typename(L, 1), lua_topointer(L, 1));
    break;
}
If you really are calling this function, then the end of the string will be an address, represented by standard C sprintf() format %p, strongly related to the specific table. One observation is that I've seen several distinct implementations for %p. Windows MSVCR80.DLL (the version of the C library used by the current release of Lua for Windows) makes it equivalent to %08X. My Ubuntu Karmic Koala box appears to make it equivalent to %#x which notably drops leading zeros. If you are going to parse out that part of the string, then you should do it in a way that is more flexible in the face of variation of the meaning of %p.
Note, also, that doing anything like this in library code may expose you to a couple of surprises.
First, if the table passed to tostring() has a metatable that provides the function __tostring(), then that function will be called, and the fragment quoted above will never be executed at all. In your case, that issue cannot arise because tables have individual metatables, and you didn't accidentally apply a metatable to your local table.
Second, by the time your module loads, some other module or user-supplied code might have replaced the stock tostring() with something else. If the replacement is benign, (such as a memoization wrapper) then it likely doesn't matter to the code as written. However, this would be a source of attack, and is entirely outside the control of your module. That doesn't strike me as a good idea if the goal is some kind of improved security for your random seed material.
Third, you might not be loaded in a stock Lua interpreter at all, and the larger application (Lightroom, WoW, Wireshark, ...) may choose to replace the base library functions with their own implementations. This is a much less likely issue for tostring(), but note that the base library's print() is a frequent target for replacement or removal in alternate implementations and there are modules (Lua Lanes, for one) that break if print is not the implementation in the base library.
A few important things come to mind:
In most other languages you typically call the random 'seed' function only once, at the beginning of the program, or perhaps at a limited number of points during its execution. You generally do not want to call it each time you generate a random number/sequence. If you call it once when the program starts, you get around the "once per second" limitation. By calling it each time, you may actually end up with less randomness in your results.
Your realrandom() function relies on a private implementation detail of Lua. What happens in the next major release if this detail changes so that it always returns the same number, or only even numbers, etc.? Just because it works for now is not a strong enough guarantee, especially in the case of wanting a secure RNG.
When you say "everything seems perfectly random", how are you measuring this? We humans are terrible at determining whether a sequence is random, and it is virtually impossible to tell whether a series of numbers is truly random just by looking at it. There are many ways to quantify the "randomness" of a series, including frequency distribution, autocorrelation, compression, and many more far beyond my understanding.
If you are writing a true "secure PRNG" for production, do not write your own! Investigate and use a library or algorithm by experts who have spent years/decades studying, designing, and trying to break it. True secure random number generation is hard.
If you need more info start on the PRNG article on Wikipedia and use the references/links there as needed.
