How to obtain the offset of a MASM local variable at assembly time - local

I want to obtain the offset into the stack frame of a MASM procedure's local variable at assembly time (not runtime, the LEA instruction isn't useful here). E.g., if the local variable is at RBP-8 at run time, I want to obtain the constant (-8) during assembly.
Is this possible?
OFFSET operator only works on global (static) variables. Every suggestion I've found on the internet talks about using LEA, which is not what I want.
someProc proc
local a:byte, b:word, c:dword
.
.
. someConst = MagicOffset(a) ;Magically sets constant "someConst" to the offset of "a" (probably something like -8).
someProc endp

Related

Memory allocation when a pointer is assigned

I am trying to avoid memory allocation and local copy in my code. Below is a small example :
module test
implicit none
public
integer, parameter :: nb = 1000
type :: info
integer n(nb)
double precision d(nb)
end type info
type(info), save :: abc
type(info), target, save :: def
contains
subroutine test_copy(inf)
implicit none
type(info), optional :: inf
type(info) :: local
if (present(inf)) then
local = inf
else
local = abc
endif
local%n = 1
local%d = 1.d0
end subroutine test_copy
subroutine test_assoc(inf)
implicit none
type(info), target, optional :: inf
type(info), pointer :: local
if (present(inf)) then
local => inf
else
local => def
endif
local%n = 1
local%d = 1.d0
end subroutine test_assoc
end module test
program run
use test
use caliper_mod
implicit none
type(ConfigManager), save :: mgr
abc%n = 0
abc%d = 0.d0
def%n = 0
def%d = 0.d0
! Init caliper profiling
mgr = ConfigManager_new()
call mgr%add("runtime-report(mem.highwatermark,output=stdout)")
call mgr%start
! Call subroutine with copy
call cali_begin_region("test_copy")
call test_copy()
call cali_end_region("test_copy")
! Call subroutine with pointer
call cali_begin_region("test_assoc")
call test_assoc()
call cali_end_region("test_assoc")
! End caliper profiling
call mgr%flush()
call mgr%stop()
call mgr%delete()
end program run
As far as i understand, the subroutine test_copy should produce a local copy while the subroutine test_assoc should only assign a pointer to some existing object. However, memory profiling with caliper leads to :
$ ./a.out
Path Min time/rank Max time/rank Avg time/rank Time % Allocated MB
test_assoc 0.000026 0.000026 0.000026 0.493827 0.000021
test_copy 0.000120 0.000120 0.000120 2.279202 0.000019
What looks odd is that Caliper shows the exact same amount of memory allocated whatever the value of the parameter nb. Am I using the right tool the right way to track memory allocation and local copy ?
Test performed with gfortran 11.2.0 and Caliper 2.8.0.
The local object is just a simple small scalar, albeit containing an array component, and will be very likely placed on the stack.
A stack is a pre-allocated part of memory of fixed size. Stack "allocation" is actually just a change of value of the stack pointer which is just one integer value. No actual memory allocation from the operating system takes place during stack allocation. There will be no change in the memory the process occupies.

local variable cannot be seen in a closure across the file?

Suppose I have the following two Lua files:
In a.lua:
local x = 5
f = dofile'b.lua'
f()
In b.lua:
local fun = function()
print(x)
end
return fun
Then if I run luajit a.lua in shell it prints nil since x cannot be seen in the function defined in b.lua. The expected printing should be 5. However if I put everything in a single file then it's exactly what I want:
In aa.lua:
local x = 5
local f = function()
print(x)
end
f()
Run luajit aa.lua it prints 5.
So why x cannot be seen in the first case?
As their name suggests, local variables are local to the chunk.
dofile() loads the chunk from another file. Since it's another chunk, it makes sense that the local variable x in the first chunk isn't seen by it.
I agree that it is somewhat unintuitive that this doesn't work.
You'd like to say, at any point in the code there is a clear set of variables that are 'visible' -- some may be local, some may be global, but there is some map that the interpreter can use to resolve names of either kind.
When you load a chunk using dofile, then it can see whatever global variables currently exist, but apparently it can't see any local variables. We know that 'dofile' is not like C/C++ inclusion macros, which would give exactly the behavior you describe for local variables, but still you might reasonably expect that this part of it would work the same.
Ultimately there's no answer but "that's just not how they specified the language". The only satisfying answer is probably along the lines 'because otherwise it would cause non-obvious problem X' or 'because then use-case Y would go slower'.
I think the best answer is that, if all names were dynamically rebound according to the scope in which they are loaded when you use loadfile / dofile, that would inhibit a lot of optimization and such when compiling chunks into bytecode. In the lua system, name resolution works like 'either it is local in this scope, and then it binds to that (known) object, or, it is a lookup in the (unique) global table.' This system is pretty simple, there are only a few options and not a lot of room for complexity.
I don't think that running byte code even keeps track of the names of local variables, it discards them after the chunk is compiled. They would have to undo that optimization if they wanted to allow dynamic name resolution at chunk loading time like you suggest.
If your question is not really why but how can I make it work anyways, then one way you can do it is, in the host script, put any local variables that you want to be visible in the environment of the script that is called. When you do this you need to split dofile into a few calls. It's slightly different in lua 5.1 vs lua 5.2.
In lua 5.1:
In a.lua:
local shared = { x = 5 }
temp = loadfile('b.lua')
setfenv(temp, shared)
f = temp()
f()
In lua 5.2:
In a.lua:
local shared = { x = 5 }
temp = loadfile('b.lua', 't', shared)
f = temp()
f()
The x variable defined in module a.lua cannot be seen from b.lua because it was declared as local. The scope of a local variable is its own module.
If you want x to be visible from b.lua, just need to declare it global. A variable is either local or global. To declare a variable as global, just simply do not declare it as local.
a.lua
x = 5
f = dofile'b.lua'
f()
b.lua
local fun = function()
print(x)
end
return fun
This will work.
Global variables live within the global namespace, which can be accessed at any given time via the _G table. When Lua cannot solve a variable, because it's not defined inside the module where is being used, Lua searches that variable in the global namespace. In conclusion, it's also possible to write b.lua as:
local fun = function()
print(_G["x"])
end
return fun

Explanation about “local foo = foo” idiom in Lua

In Programming in Lua (3rd Ed.) by Roberto Ierusalimschy it is stated that
A common idiom in Lua is
local foo = foo
This code creates a local
variable, foo, and initializes it with the value of the global
variable foo. (The local foo becomes visible only after its
declaration.) This idiom is useful when the chunk needs to preserve
the original value of foo even if later some other function changes
the value of the global foo; it also speeds up the access to foo.
Could someone explain this more in detail and provide a simple example?
At the moment, the only use I can think of for this idiom is to manage local variables (in a given block) that have the same names as global variables, so that the global variable is left unchanged after the block.
An example:
foo = 10
do
local foo = foo
foo = math.log10(foo)
print(foo)
end
print(foo)
this gives:
1
10
But the same could be accomplished without using the idiom at all:
bar = 10
do
local bar = math.log10(bar)
print(bar)
end
print(bar)
that gives the same result. So my explanation doesn't seem to hold.
I've seen this used more often as a optimization technique than as a way to preserve original values. With the standard Lua interpreter, every global variable access and module access requires a table lookup. Local variables, on the other hand, have statically-known locations at bytecode-compile time and can be placed in VM registers.
In more depth: Why are local variables accessed faster than global variables in lua?
The explanation is correct; I'm not sure why you are not satisfied with your example. To give you a real example:
local setfenv = setfenv
if not setfenv then -- Lua 5.2+
setfenv = function() ..... end
end
Another reason is to preserve the value as it is at this moment, so that other functions that use that value (in a file or a module) would have the same expectations about that value.
Wrapping a global:
do
local setmetatable = setmetatable
function _ENV.setmetatable(...)
-- Do your thing
return setmetatable(...)
end
end
Reducing overhead by using a local instead of doing a lookup in the globals-table (which is a local btw.):
local type = type
for k, v in next, bigtable do
if type(v) == "string" then
-- Do one thing
else
-- Do other thing
end
end
I think you're splitting hairs, unintentionally.
local bar = math.log10(bar)
is essentially the same as local bar = bar in spirit, but we it would be less useful to claim that the idiom is local bar = a(bar), because we may want to handle the local in some way other than passing it to a function first -- e.g. appending it to something.
This point is that we want to refer to the local bar, just as you say, not exactly how the conversion from global to local is done.

Why are local variables accessed faster than global variables in lua?

So I was reading Programing in Lua 2nd Ed and I came across this paragraph here:
It is good programming style to use local variables whenever
possible. Local variables help you avoid cluttering the global
environment with unnecessary names. Moreover, the access to local
variables is faster than to global ones.
Could anyone explain why this is? And is this "feature" only in Lua, or is it in other languages aswell? (e.g. C, C++, Java)
The difference in running time is due to the difference between hash table lookup and array lookup. An interpreter might be able to place a local variable in a CPU register, but even without such cleverness local variables are faster to access.
Global variables in Lua are stored in tables. Generally, anyone can modify these tables, and therefore the interpreter has to lookup a value anew every time it is being accessed. Local variables on the other hand disappear only when they go out of scope. Therefore they can have fixed locations in an array.
The benchmark program below calls a dummy function in a loop. The benchmark shows how the running time goes up the more tables the program has to jump through.
Other dynamic languages should be expected to have similar characteristics; see for example the Python benchmark at the very end.
Some relevant links:
Optimising Using Local Variables (Lua)
Local Variables (Python performance tips)
Optimizing Global Variable/Attribute Access. A (withdrawn) Python proposal on the lookup of global vs. local objects.
File demo.lua:
local M = {}
_G.demo = M
function M.op(x) return x end
return M
File main.lua:
local M = require "demo"
op = demo.op
local outer_op = demo.op
function iter_op(n)
local inner_op = demo.op
for i = 1, n do
-- Example running times for n = 100,000,000 (Lua 5.2.0):
-- Lookup a table (demo or _G), then lookup 'op'
-- within that table:
--
-- demo.op(i) --> 0:40
-- _G.op(i) --> 0:39
-- Lookup 'op' within a known local table (M or the table of
-- globals):
--
-- M.op(i) --> 0:30
-- op(i) --> 0:30
-- Dereference a local variable declared inside or outside
-- of this iter_op() function:
--
-- inner_op(i) --> 0:23
-- outer_op(i) --> 0:22
end
end
iter_op(100000000)
File main.py:
import demo # Contains 'def op(x): return x'.
global_op = demo.op
def iter_op(n):
local_op = demo.op
for i in xrange(n):
# Example running times for n = 50,000,000 (Python 2.6.5):
# demo.op(i) # 0:50
# global_op(i) # 0:41
local_op(i) # 0:36
iter_op(50000000)
In any language local variables will be faster. You will need to understand what a register is and what the thread stack is to understand my explanation. Most local variables are implemented as a register variable or pushed near the top of the local stack, so they are generally accessed much more quickly. Global variables are stored further up the stack (if they are not on the heap) so computing their address to access them is slower.
I'm making some assumptions here about the inner workings of Lua, but this makes sense from a computer architecture standpoint.

The CPU and Memory (value, register)

When a value is copied from one register to another, what happens to the value
in the source register? What happens to the value in the destination register.
I'll show how it works in simple processors, like DLX or RISC, which are used to study CPU-architecture.
When (AT&T syntax, or copy $R1 to $R2)
mov $R1, $R2
or even (for RISC-styled architecture)
add $R1, 0, $R2
instruction works, CPU will read source operands: R1 from register file and zero from... may be immediate operand or zero-generator; pass both inputs into Arithmetic Logic Unit (ALU). ALU will do an operation which will just pass first source operand to destination (because A+0 = A) and after ALU, destination will be written back to register file (but to R2 slot).
So, Data in source register is only readed and not changed in this operation; data in destination register will be overwritten with copy of source register data. (old state of destination register will be lost with generating of heat.)
At physical level, any register in register file is set of SRAM cells, each of them is the two inverters (bi-stable flip-flop, based on M1,M2,M3,M4) and additional gates for writing and reading:
When we want to overwrite value stored in SRAM cell, we will set BL and -BL according to our data (To store bit 0 - set BL and unset -BL; to store bit 1 - set -BL and unset BL); then the write is enabled for current set (line) of cells (WL is on; it will open M5 and M6). After opening of M5 and M6, BL and -BL will change state of bistable flip-flop (like in SR-latch). So, new value is written and old value is discarded (by leaking charge into BL and -BL).

Resources