LuaJIT - lint option - lua

I have been looking at the lint kind of utils for Lua, and read about LuaInspect, LuaLint and MetaLlint. I am using LuaJIT-2.0.2, and my needs are quite simple, I need to be able to only inspect accesses of the sort of MODNAME:function(). Other than MetaLint (which needs MetaLua runtime), the rest seem to be doing bytecode dump and parse, check for global accesses.
So, if I want to implement a simple utility for my source, on similar lines, I should be looking for GGET and TGETS opcodes in the output of luajit -b -l source.lua, e.g:
0019 GGET 4 1 ; "MODNAME"
0020 TGETS 4 4 8 ; "function"
Are there better alternatives/approaches?
Thanks.

Related

How to get this sqrt inline assembly working for iOS

I am trying to follow another SO post and implement sqrt14 within my iOS app:
double inline __declspec (naked) __fastcall sqrt14(double n)
{
_asm fld qword ptr [esp+4]
_asm fsqrt
_asm ret 8
}
I have modified this to the following in my code:
double inline __declspec (naked) sqrt14(double n)
{
__asm__("fld qword ptr [esp+4]");
__asm__("fsqrt");
__asm__("ret 8");
}
Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:
Unexpected token in argument list
Invalid instruction
Invalid instruction
I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.
Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.
If it matters: I'm building the app to ideally be compatibly with any current iOS device that can run iOS 7.1 Again, many thanks for any help.
The compiler is perfectly capable of generating fsqrt instruction, you don't need inline asm for that. You might get some extra speed if you use -ffast-math.
For completeness' sake, here is the inline asm version:
__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));
The fsqrt instruction has no explicit operands, it uses the top of the stack implicitly. The =t constraint tells the compiler to expect the output on the top of the fpu stack and the 0 constraint instructs the compiler to place the input in the same place as output #0 (ie. the top of the fpu stack again).
Note that fsqrt is of course x86-only, meaning it wont work for example on ARM cpus.

Understanding call x"91"

Can someone please help me understand call x"91" function 11 and function 12 with simple example. I have tried to search and couldn't understand it. Right now I an using this code in COBOL under UNIX environment,Does this call works in windows environment as well?
http://opencobol.add1tocobol.com/#what-are-the-xf4-xf5-and-x91-routines
The CALL's X"F4", X"F5", X"91" are from MF.
You can find them in the online MF doc under
Library Routines.
F4/F5 are for packing/unpacking bits from/to bytes.
91 is a multi-use call. Implemented are the subfunctions
get/set cobol switches (11, 12) and get number of call params (16).
Use
CALL X"F4" USING
BYTE-VAR
ARRAY-VAR
RETURNING STATUS-VAR
to pack the last bit of each byte in the 8 byte ARRAY-VAR into corresponding bits of the 1 byte BYTE-VAR.
The X”F5” routine takes the eight bits of byte and moves them to the corresponding occurrence within array.
X”91” is a multi-function routine.
CALL X"91" USING
RESULT-VAR
FUNCTION-NUM
PARAMETER-VAR
RETURNING STATUS-VAR
As mentioned by Roger, OpenCOBOL supports FUNCTION-NUM of 11, 12 and 16.
11 and 12 get and set the on off status of the 8 (eight) run-time OpenCOBOL switches definable in the SPECIAL-NAMES paragraph. 16 returns the number of call parameters given to the current module.
x'91' is a general library routine, for a complete list of those see the MF documentation.
This documentation also specifies what its function 11 and function 12 do: they set/read the COBOL runtime switches 0-7 and the internal debugging mode switch.
Other than these library routines you can also read them one by one from COBOL and set "some" switches via the SET statement.

How to implement multi-user safe Linear Congruential Generator in Redis?

I use Linear Congruential Generators (http://en.wikipedia.org/wiki/Linear_congruential_generator) to generate IDs exposed to users.
nextID = (a * LastID + c) % m
Now I want to implement LCGs in Redis. Here's the problem:
Getting the current ID and generating the next ID outside Redis is not multi-user safe. Redis has 2 commands which can be used for simple counters: INCRBY and INCRBYFLOAT, but unfortunately Redis doesn't support modulo operation natively. At the moment the only way I see is using EVAL command and writing some lua script.
UPDATE1:
Some lua analog of
INCRBY LCG_Value ((LCG_Value*a+c)%m)-LCG_Value
seems to be a neat way to achieve this.
A server-side Lua script is probably the easiest and more efficient way to do it.
Now it can also be done using Redis primitive operations using a multi/exec block. Here it is in pseudo-code:
while not successful:
WATCH LCG_Value
$LastID = GET LCG_value
$NextID = (a*$LastID+c)%m
MULTI
SET LCG_value $NextID
EXEC
Of course, it is less efficient than the following Lua script:
# Set the initial value
SET LCG_value 1
# Execute Lua script with on LCG_value with a, c, and m parameters
EVAL "local last = tonumber(redis.call( 'GET', KEYS[1]));
local ret = (last*ARGV[1] + ARGV[2])%ARGV[3];
redis.call('SET',KEYS[1], ret);
return ret;
" 1 LCG_value 1103515245 12345 2147483648
Note: the whole script execution is atomic. See the EVAL documentation.

Can I get a list of all currently-registered atoms?

My project has blown through the max 1M atoms, we've cranked up the limit, but I need to apply some sanity to the code that people are submitting with regard to list_to_atom and its friends. I'd like to start by getting a list of all the registered atoms so I can see where the largest offenders are. Is there any way to do this. I'll have to be creative about how I do it so I don't end up trying to dump 1-2M lines in a live console.
You can get hold of all atoms by using an undocumented feature of the external term format.
TL;DR: Paste the following line into the Erlang shell of your running node. Read on for explanation and a non-terse version of the code.
(fun F(N)->try binary_to_term(<<131,75,N:24>>) of A->[A]++F(N+1) catch error:badarg->[]end end)(0).
Elixir version by Ivar Vong:
for i <- 0..:erlang.system_info(:atom_count)-1, do: :erlang.binary_to_term(<<131,75,i::24>>)
An Erlang term encoded in the external term format starts with the byte 131, then a byte identifying the type, and then the actual data. I found that EEP-43 mentions all the possible types, including ATOM_INTERNAL_REF3 with type byte 75, which isn't mentioned in the official documentation of the external term format.
For ATOM_INTERNAL_REF3, the data is an index into the atom table, encoded as a 24-bit integer. We can easily create such a binary: <<131,75,N:24>>
For example, in my Erlang VM, false seems to be the zeroth atom in the atom table:
> binary_to_term(<<131,75,0:24>>).
false
There's no simple way to find the number of atoms currently in the atom table*, but we can keep increasing the number until we get a badarg error.
So this little module gives you a list of all atoms:
-module(all_atoms).
-export([all_atoms/0]).
atom_by_number(N) ->
binary_to_term(<<131,75,N:24>>).
all_atoms() ->
atoms_starting_at(0).
atoms_starting_at(N) ->
try atom_by_number(N) of
Atom ->
[Atom] ++ atoms_starting_at(N + 1)
catch
error:badarg ->
[]
end.
The output looks like:
> all_atoms:all_atoms().
[false,true,'_',nonode#nohost,'$end_of_table','','fun',
infinity,timeout,normal,call,return,throw,error,exit,
undefined,nocatch,undefined_function,undefined_lambda,
'DOWN','UP','EXIT',aborted,abs_path,absoluteURI,ac,accessor,
active,all|...]
> length(v(-1)).
9821
* In Erlang/OTP 20.0, you can call erlang:system_info(atom_count):
> length(all_atoms:all_atoms()) == erlang:system_info(atom_count).
true
I'm not sure if there's a way to do it on a live system, but if you can run it in a test environment you should be able to get a list via crash dump. The atom table is near the end of the crash dump format. You can create a crash dump via erlang:halt/1, but that will bring down the whole runtime system.
I dare say that if you use more than 1M atoms, then you are doing something wrong. Atoms are intended to be static as soon as the application runs or at least upper bounded by some small number, 3000 or so for a medium sized application.
Be very careful when an enemy can generate atoms in your vm. especially calls like list_to_atom/1 is somewhat dangerous.
EDITED (wrong answer..)
You can adjust number of atoms with +t
http://www.erlang.org/doc/efficiency_guide/advanced.html
..but I know very few use cases when it is necessary.
You can track atom stats with erlang:memory()

Erlang/OTP - Timing Applications

I am interested in bench-marking different parts of my program for speed. I having tried using info(statistics) and erlang:now()
I need to know down to the microsecond what the average speed is. I don't know why I am having trouble with a script I wrote.
It should be able to start anywhere and end anywhere. I ran into a problem when I tried starting it on a process that may be running up to four times in parallel.
Is there anyone who already has a solution to this issue?
EDIT:
Willing to give a bounty if someone can provide a script to do it. It needs to spawn though multiple process'. I cannot accept a function like timer.. at least in the implementations I have seen. IT only traverses one process and even then some major editing is necessary for a full test of a full program. Hope I made it clear enough.
Here's how to use eprof, likely the easiest solution for you:
First you need to start it, like most applications out there:
23> eprof:start().
{ok,<0.95.0>}
Eprof supports two profiling mode. You can call it and ask to profile a certain function, but we can't use that because other processes will mess everything up. We need to manually start it profiling and tell it when to stop (this is why you won't have an easy script, by the way).
24> eprof:start_profiling([self()]).
profiling
This tells eprof to profile everything that will be run and spawned from the shell. New processes will be included here. I will run some arbitrary multiprocessing function I have, which spawns about 4 processes communicating with each other for a few seconds:
25> trade_calls:main_ab().
Spawned Carl: <0.99.0>
Spawned Jim: <0.101.0>
<0.100.0>
Jim: asking user <0.99.0> for a trade
Carl: <0.101.0> asked for a trade negotiation
Carl: accepting negotiation
Jim: starting negotiation
... <snip> ...
We can now tell eprof to stop profiling once the function is done running.
26> eprof:stop_profiling().
profiling_stopped
And we want the logs. Eprof will print them to screen by default. You can ask it to also log to a file with eprof:log(File). Then you can tell it to analyze the results. We tell it to collapse the run time from all processes into a single table with the option total (see the manual for more options):
27> eprof:analyze(total).
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- --- ---- [----------]
io:o_request/3 46 0.00 0 [ 0.00]
io:columns/0 2 0.00 0 [ 0.00]
io:columns/1 2 0.00 0 [ 0.00]
io:format/1 4 0.00 0 [ 0.00]
io:format/2 46 0.00 0 [ 0.00]
io:request/2 48 0.00 0 [ 0.00]
...
erlang:atom_to_list/1 5 0.00 0 [ 0.00]
io:format/3 46 16.67 1000 [ 21.74]
erl_eval:bindings/1 4 16.67 1000 [ 250.00]
dict:store_bkt_val/3 400 16.67 1000 [ 2.50]
dict:store/3 114 50.00 3000 [ 26.32]
And you can see that most of the time (50%) is spent in dict:store/3. 16.67% is taken in outputting the result, another 16.67% is taken by erl_eval (this is why you get by running short functions in the shell -- parsing them becomes longer than running them).
You can then start going from there. That's the basics of profiling run times with Erlang. Handle with care, eprof can be quite a load on a production system or for functions that run for too long. Especially on a production system.
You can use eprof or fprof.
The normal way to do this is with timer:tc. Here is a good explanation.
I can recommend you this tool: https://github.com/virtan/eep
You will get something like this https://raw.github.com/virtan/eep/master/doc/sshot1.png as a result.
Step by step instruction for profiling all processes on running system:
On target system:
1> eep:start_file_tracing("file_name"), timer:sleep(20000), eep:stop_tracing().
$ scp -C $PWD/file_name.trace desktop:
On desktop:
1> eep:convert_tracing("file_name").
$ kcachegrind callgrind.out.file_name

Resources