Lightuserdata is different from userdata, so what can I do with it? I mean, what operations does lightuserdata support in Lua? It looks like I cannot convert it to any other data type.
One of my cases:
My C library returns a C pointer named 'c_pointer' (that is, a lightuserdata) to Lua, and then I want:
my_pointer = c_pointer +4
and then pass 'my_pointer' back to the C library. Since I cannot do anything with 'c_pointer', the expression 'c_pointer + 4' is invalid.
Are there any practical solutions to this?
Lightuserdata are created by C libraries. They are simply C pointers.
For example, you can use them to refer to data you allocate with malloc, or statically allocate in your module. Your C library can transfer these pointers to the Lua side as a lightuserdata using lua_pushlightuserdata, and later Lua can give it back to your library (or other C code) on the stack. Lua code can use the lightuserdata as any other value, storing it in a table, for example, even as a table key.
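For instance, here is a minimal sketch (the function names l_makebuffer and l_freebuffer are made up) of a module handing a malloc'd buffer to Lua as a lightuserdata and accepting it back later:
#include <stdlib.h>
#include "lua.h"
#include "lauxlib.h"

/* Hypothetical binding: allocate a buffer and hand the raw pointer
   to Lua as a lightuserdata (no metatable, no GC involvement). */
static int l_makebuffer(lua_State *L) {
    void *buf = malloc(256);
    lua_pushlightuserdata(L, buf);
    return 1;
}

/* Lua later passes the same pointer back on the stack. */
static int l_freebuffer(lua_State *L) {
    void *buf = lua_touserdata(L, 1);
    free(buf);
    return 0;
}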
ADDENDUM
To answer your revised question: if you want to add an offset to the pointer, do it on the C side. Pass the lightuserdata and the integer offset to C, and let C do the offset using ptr[n]:
void * ptr = lua_touserdata(L, idx1);
lua_Integer n = lua_tointeger(L, idx2);
// do something with
((char *)ptr)[n];
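Wrapped up as a complete Lua C function, that might look like this (a sketch; l_offset_read is a hypothetical name for the binding):
/* Sketch: return the byte at (pointer + offset) to Lua. */
static int l_offset_read(lua_State *L) {
    char *ptr = (char *)lua_touserdata(L, 1);  /* the lightuserdata */
    lua_Integer n = lua_tointeger(L, 2);       /* the integer offset */
    luaL_argcheck(L, ptr != NULL, 1, "lightuserdata expected");
    lua_pushinteger(L, ptr[n]);
    return 1;
}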
Plain Lua has no pointer arithmetic, so, as Doug Currie stated, you would need to do the pointer arithmetic on the C side.
LuaJIT on the other hand can do pointer arithmetic (via the FFI library), so consider using that instead.
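For example, a minimal sketch (the char * cast is an assumption about what c_pointer actually points at):
local ffi = require("ffi")
-- Cast the lightuserdata to a byte pointer; arithmetic then works as in C.
local p = ffi.cast("char *", c_pointer)
local my_pointer = p + 4  -- the 'c_pointer + 4' from the question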
Are the following two examples equivalent?
Example 1:
let x = String::new();
let y = &x[..];
Example 2:
let x = String::new();
let y = &*x;
Is one more efficient than the other or are they basically the same?
In the case of String and Vec, they do the same thing. In general, however, they aren't quite equivalent.
First, you have to understand Deref. This trait is implemented in cases where a type is logically "wrapping" some lower-level, simpler value. For example, all of the "smart pointer" types (Box, Rc, Arc) implement Deref to give you access to their contents.
It is also implemented for String and Vec: String "derefs" to the simpler str, Vec<T> derefs to the simpler [T].
Writing *s is just manually invoking Deref::deref to turn s into its "simpler form". It is almost always written &*s, however: although the Deref::deref signature says it returns a borrowed pointer (&Target), the compiler inserts a second automatic deref. This is so that, for example, { let x = Box::new(42i32); *x } results in an i32 rather than a &i32.
So &*s is really just shorthand for Deref::deref(&s).
s[..] is syntactic sugar for s.index(RangeFull), implemented by the Index trait. This means to slice the "whole range" of the thing being indexed; for both String and Vec, this gives you a slice of the entire contents. Again, the result is technically a borrowed pointer, but Rust auto-derefs this one as well, so it's also almost always written &s[..].
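Concretely, for a String both spellings produce the same &str:
fn main() {
    let s = String::from("hello");
    let a: &str = &*s;    // Deref::deref(&s)
    let b: &str = &s[..]; // s.index(RangeFull)
    assert_eq!(a, b);
}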
So what's the difference? Hold that thought; let's talk about Deref chaining.
To take a specific example, because you can view a String as a str, it would be really helpful to have all the methods available on strs automatically available on Strings as well. Rather than inheritance, Rust does this by Deref chaining.
The way it works is that when you ask for a particular method on a value, Rust first looks at the methods defined for that specific type. Let's say it doesn't find the method you asked for; before giving up, Rust will check for a Deref implementation. If it finds one, it invokes it and then tries again.
This means that when you call s.chars() where s is a String, what's actually happening is that you're calling s.deref().chars(), because String doesn't have a method called chars, but str does (scroll up to see that String only gets this method because it implements Deref<Target=str>).
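A small illustration:
fn main() {
    let s = String::from("ab");
    // String has no chars method of its own; the call chains through
    // Deref<Target = str>, i.e. it is effectively (&*s).chars().
    assert_eq!(s.chars().next(), Some('a'));
}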
Getting back to the original question, the difference between &*s and &s[..] is in what happens when s is not just String or Vec<T>. Let's take a few examples:
s: String; &*s: &str, &s[..]: &str.
s: &String; &*s: &String, &s[..]: &str.
s: Box<String>; &*s: &String, &s[..]: &str.
s: Box<Rc<&String>>; &*s: &Rc<&String>, &s[..]: &str.
&*s only ever peels away one layer of indirection. &s[..] peels away all of them. This is because none of Box, Rc, &, etc. implement the Index trait, so Deref chaining causes the call to s.index(RangeFull) to chain through all those intermediate layers.
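You can check this with a nested type (a simpler nesting than the last row above, but the same principle):
use std::rc::Rc;

fn main() {
    let s: Box<Rc<String>> = Box::new(Rc::new(String::from("hi")));
    let one: &Rc<String> = &*s; // one layer peeled
    let all: &str = &s[..];     // Index chains through Box, Rc and String
    assert_eq!(&one[..], all);
}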
Which one should you use? Whichever you want. Use &*s (or &**s, or &***s) if you want to control exactly how many layers of indirection you want to strip off. Use &s[..] if you want to strip them all off and just get at the innermost representation of the value.
Or, you can do what I do and use &*s because it reads left-to-right, whereas &s[..] reads left-to-right-to-left-again and that annoys me. :)
Addendum
There's the related concept of Deref coercions.
There's also DerefMut and IndexMut which do all of the above, but for &mut instead of &.
They are completely the same for String and Vec.
The [..] syntax results in a call to Index<RangeFull>::index(), and it's not just sugar for [0..collection.len()]; the latter would introduce the cost of bounds checking. Thankfully this is not the case in Rust, so both are equally fast.
Relevant code:
index of String
deref of String
index of Vec (just returns self, which triggers a deref coercion and thus executes exactly the same code as deref)
deref of Vec
LuaJIT knows the C types it defines, and the lengths of the arrays, but it doesn't check the bounds:
ffi = require("ffi")
ten_ints = ffi.typeof("int [10]")
p1 = ten_ints()
print(ffi.sizeof(p1)) -- 40
var_ints = ffi.typeof("int [?]")
p2 = ffi.new(var_ints, 10)
print(ffi.sizeof(p2)) -- 40
p1[1000000] = 1 -- segfault
p2[1000000] = 1 -- segfault
Is there a way to make it check the bounds, or is my only choice to write wrappers?
Short answer: there is no way; you'll have to write or find your own wrapper.
Here is the explanation from luajit.org:
No Hand-holding!
[...] The FFI library provides no memory safety, unlike regular
Lua code. It will happily allow you to dereference a NULL pointer, to
access arrays out of bounds or to misdeclare C functions. If you make
a mistake, your application might crash, just like equivalent C code
would. This behavior is inevitable, since the goal is to provide full
interoperability with C code. Adding extra safety measures, like
bounds checks, would be futile. [...] Likewise there's no way to
infer the valid range of indexes for a returned pointer. Again: the
FFI library is a low-level library.
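For completeness, a minimal sketch of such a wrapper in plain Lua (a hypothetical helper; every access pays for the metamethod dispatch and the assert):
local ffi = require("ffi")

local function checked_ints(n)
  local arr = ffi.new("int[?]", n)
  return setmetatable({}, {
    __index = function(_, i)
      assert(i >= 0 and i < n, "index out of bounds")
      return arr[i]
    end,
    __newindex = function(_, i, v)
      assert(i >= 0 and i < n, "index out of bounds")
      arr[i] = v
    end,
  })
end

local p = checked_ints(10)
p[3] = 42          -- ok
print(p[3])        -- 42
-- p[1000000] = 1  -- raises "index out of bounds" instead of crashing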
I'm experimenting with enumeration sorts in Z3, as described in How to use enumerated constants after calling of some tactic in Z3?, and I noticed that I might have some misunderstanding of how to use the C and C++ APIs properly. Let's consider the following example.
context z3_cont;
Z3_symbol e_names[2];
Z3_func_decl e_consts[2];
Z3_func_decl e_testers[2];
e_names[0] = Z3_mk_string_symbol(z3_cont, "x1");
e_names[1] = Z3_mk_string_symbol(z3_cont, "x2");
Z3_symbol e_name = Z3_mk_string_symbol(z3_cont, "enum_type");
Z3_sort new_enum_sort = Z3_mk_enumeration_sort(z3_cont, e_name, 2, e_names, e_consts, e_testers);
sort enum_sort = to_sort(z3_cont, new_enum_sort);
expr e_const0(z3_cont), e_const1(z3_cont);
/* WORKS!
func_decl a_decl = to_func_decl(z3_cont, e_consts[0]);
func_decl b_decl = to_func_decl(z3_cont, e_consts[1]);
e_const0 = a_decl(0, 0);
e_const1 = b_decl(0, 0);
*/
// SEGFAULT when doing cout
e_const0 = to_func_decl(z3_cont, e_consts[0])(0, 0);
e_const1 = to_func_decl(z3_cont, e_consts[1])(0, 0);
cout << e_const0 << " " << e_const1 << endl;
I expected both variants of the code to nicely wrap the C entity Z3_func_decl in a smart pointer so I can use it with the C++ API; however, only the first variant seems to be correct. So my questions are:
Is it correct behavior that the second way doesn't work? If so, how can I better understand why?
What happens with unwrapped C entities, for example Z3_symbol e_name? Here I don't wrap it and I don't increment any references. Will the memory be managed properly? Is it safe to use? When will the object be destroyed?
A minor question: I didn't see a to_symbol() function in the C++ API. Is it just unnecessary?
Thank you.
Whenever we create a new Z3 AST, Z3 may garbage collect an AST n if the reference counter of n is 0. In the piece of code that works, we wrap e_consts[0] and e_consts[1] before we create any new AST. When we wrap them, the smart pointer will bump their reference counter.
This is why it works. In the piece of code that crashes, we wrap e_consts[0], and then create e_const0 before we wrap e_consts[1]. Thus, the AST referenced by e_consts[1] is deleted before we have the chance to create e_const1.
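In other words, the safe pattern is exactly the commented-out variant from the question: wrap every raw handle first, then build the new ASTs.
// Wrap both handles before creating any new AST; the smart pointers
// bump the reference counters, so nothing is collected underneath us.
func_decl a_decl = to_func_decl(z3_cont, e_consts[0]);
func_decl b_decl = to_func_decl(z3_cont, e_consts[1]);
e_const0 = a_decl(0, 0);
e_const1 = b_decl(0, 0);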
BTW, in the next official release, we will have support for creating enumeration types in the C++ API: http://z3.codeplex.com/SourceControl/changeset/b2810592e6bb
This change is already available in the nightly builds.
Z3_symbol is not a reference-counted object. Symbols are persistent; Z3 maintains a global table of all symbols created. We should view symbols as unique strings.
Note that we can use the class symbol and the constructor symbol::symbol(context & c, Z3_symbol s). The to_* functions are used to wrap objects created using the C API in smart pointers. We usually have a function to_A if there is a C API function that returns an A object and there is no equivalent function/method in the C++ API.
I have to communicate with a DLL from Lua, and this is the function I use to write strings byte by byte:
writeString = function(pid, process, address, value)
    local i = 1
    while i <= String.Length(value) do
        local byte = string.byte(value, i, i)
        DLL.CallFunction("hook.dll", "writeMemByte", pid..','..process..','..address + (i-1)..','..byte, DLL_RETURN_TYPE_INTEGER, DLL_CALL_CDECL)
        i = i + 1
    end
    DLL.CallFunction("hook.dll", "writeMemByte", pid..','..process..','..address + (i-1)..',0', DLL_RETURN_TYPE_INTEGER, DLL_CALL_CDECL)
end
I basically need to adapt this to write a double value byte by byte.
I just can't figure out how to make the memory.writeDouble function.
EDIT: this is my readString function:
readString = function(pid, process, address)
    local i, str = 0, ""
    repeat
        local curByte = DLL.CallFunction("hook.dll", "readMemByte", pid..','..process..','..(address + i), DLL_RETURN_TYPE_INTEGER, DLL_CALL_CDECL)
        if curByte == "" then curByte = 0 end
        curByte = tonumber(curByte)
        str = str .. string.char(curByte)
        i = i + 1
    until (curByte == 0)
    return str
end,
My first recommendation would be: try to find a function that accepts strings representing doubles instead of doubles. Implementing the Lua side of that would be incredibly easy, since you already have writeString; it could be something very similar to this:
writeDouble = function(pid, process, address, value)
    writeString(pid, process, address, tostring(value))
end
If you don't have that function, but you have access to the dll source, you can try to add that function yourself; it shouldn't be much more complicated than getting the string and then calling atof on it.
If you really can't modify the dll, then you need to figure out the exact double format that the lib is expecting; there are lots of factors that can change that format: the language and compiler used, the operating system, and the compiler flags, to cite some.
If the dll uses a standard format, like IEEE-754, the format will usually have well-documented "translations" from/to bytes. Otherwise, it's possible that you'll have to develop them yourself.
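For instance, here is a minimal sketch of that last option, assuming the dll expects the 8 little-endian bytes of an IEEE-754 double (doubleToBytes and writeDouble are made-up names, and negative zero loses its sign in this sketch):
local function doubleToBytes(value)
    local sign = 0
    if value < 0 then sign, value = 1, -value end
    local biasedExp, fraction
    if value ~= value then                  -- NaN
        biasedExp, fraction = 0x7FF, 2^51
    elseif value == math.huge then          -- infinity
        biasedExp, fraction = 0x7FF, 0
    elseif value == 0 then
        biasedExp, fraction = 0, 0
    else
        local m, e = math.frexp(value)      -- value = m * 2^e, 0.5 <= m < 1
        biasedExp = e + 1022                -- IEEE bias 1023, minus 1 for frexp's scaling
        if biasedExp <= 0 then              -- subnormal number
            fraction, biasedExp = math.ldexp(value, 1074), 0
        else
            fraction = (m * 2 - 1) * 2^52   -- 52 explicit fraction bits
        end
    end
    local bytes = {}
    for i = 1, 6 do                         -- low 48 fraction bits, little-endian
        bytes[i] = fraction % 256
        fraction = math.floor(fraction / 256)
    end
    bytes[7] = (biasedExp % 16) * 16 + fraction        -- 4 exp bits + top 4 fraction bits
    bytes[8] = sign * 128 + math.floor(biasedExp / 16) -- sign bit + high 7 exp bits
    return bytes
end

writeDouble = function(pid, process, address, value)
    local bytes = doubleToBytes(value)
    for i = 1, 8 do
        DLL.CallFunction("hook.dll", "writeMemByte", pid..','..process..','..(address + i - 1)..','..bytes[i], DLL_RETURN_TYPE_INTEGER, DLL_CALL_CDECL)
    end
end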
Regards and good luck!
There are many libraries available for Lua that do just this.
If you need the resulting byte array (string), string.pack should do it; you can find precompiled binaries for Windows included with Lua for Windows.
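For example (assuming a string.pack that supports the "d" format, as the one built into Lua 5.3+ does):
local bytes = string.pack("d", 3.14159) -- 8-byte string, native endianness
print(#bytes)                           -- 8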
If you are more interested in using the double to interface with foreign code, I would recommend taking a different approach using alien, a Foreign Function Interface library that lets you directly call C functions.
If you are able to, I even more highly recommend switching to LuaJIT, a just-in-time compiler for Lua that provides the power, speed and reach of C and assembly, but with the comfort and flexibility of Lua.
If none of these solutions are viable, I can supply some code to serialise doubles (not accessible at the moment).
I'm learning compilers and creating a code generator for a simple language that deals with two types: characters and integers.
After the user input has been scanned by the scanner and then parsed by the parser, I get an AST representation of the input. I have already written a code generator for an even simpler language, which only processes expressions with integers, operators and variables.
However with this new language I sometimes get a subtree for a type declaration, like this:
(IS TYPE (x) (INT))
which says x is of type INT.
Should there be a case in my code generator which deals with these type declarations? Or is this simply for the semantic analyzer to type-check, so I should just assume the types have been checked, ignore this part of the tree, and simply assign the value for x?
Both situations are possible; you need to describe more about your language to see whether you really need to add that feature to your code generator or can skip it as unnecessary, avoiding extra work in this difficult and interesting topic of designing a programming language.
Is your "code generator" a program that receives code in one (maybe small) programming language as input and outputs code in another (maybe small) programming language?
This tool is usually called a "translator".
Is your "code generator" a program that receives a programming language as input and outputs an assembler-like or bytecode-like language?
This tool is usually called a "compiler".
Note: "pile" is a synonym for "stack".
Usually an A.S.T. stores the type of an operation or function call. Example, in C:
...
int a = 3;
int b = 5;
float c = (float)(a * b);
...
The last line generates an A.S.T. similar to this (skipping the A.S.T. for the other lines):
                 +--------------+
                 |    [root]    |
                 | (no type) =  |
                 +------+-------+
                        |
            +-----------+------------+
            |                        |
      +-----+------+   +-------------+-------------+
      |  (float) c |   |  (float) (cast operation) |
      +------------+   +-------------+-------------+
                                     |
                               +-----+-----+
                               | (int) ()  |
                               +-----+-----+
                                     |
                               +-----+-----+
                               | (int) *   |
                               +-----+-----+
                                     |
                         +-----------+-----------+
                         |                       |
                   +-----+-----+           +-----+-----+
                   | (int) a   |           | (int) b   |
                   +-----------+           +-----------+
Note that the "(float)" cast acts like an operator or a function, similar to your question.
Good Luck.
If this is a declaration
(IS TYPE (x) (INT))
then x should be laid out in memory. In the case of C, local auto variables are allocated on the stack. To allocate the needed amount of stack you should know the sizes of all local variables, and the sizes come from the types.
If the variable is stored in a register, you should select a register of the needed size (think of x86 with AL, AX, EAX, RAX: the same register at different sizes), if your target has such registers.
Also, the type is needed when there is an ambiguous operation in the AST which can operate on different data sizes (e.g. char, short, int, i.e. 8-bit, 16-bit, 32-bit, etc.). For some assemblers, the size of the data is encoded into the instruction itself, so the code generator should remember the sizes of variables.
Or, if the type of the operation was not recorded in the AST, the ADD:
(ADD (x) (y))
may mean either a float or an int addition (ADD or FADD instructions), so the types of x and y are needed in codegen to select the right variant.
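A minimal sketch in C of that last point (all names are made up): the types recorded by the (IS TYPE ...) nodes feed a symbol table, which then drives both stack layout and instruction selection:
#include <stdio.h>

typedef enum { TY_INT, TY_FLOAT } Type;

typedef struct {
    const char *name;
    Type type;
    int offset;   /* stack slot, computed from the type's size */
} Var;

static int size_of(Type t) { return t == TY_FLOAT ? 8 : 4; }

/* Select the instruction variant from the operand type. */
static void emit_add(Type t) {
    printf(t == TY_FLOAT ? "    fadd\n" : "    add\n");
}

int main(void) {
    Var x = { "x", TY_INT, 0 };
    Var y = { "y", TY_INT, size_of(TY_INT) }; /* next free stack slot */
    printf("; (ADD (x) (y)) with x:int, y:int\n");
    emit_add(x.type); /* emits the integer add */
    (void)y;
    return 0;
}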