How can I automatically rewrite the signature of bindgen-created FFI functions?

I'm writing a binding for a C library with the help of rust-bindgen, which automatically generates the function signatures into a bindings.rs like this:
#[repr(C)]
struct A {
//...
}
struct B {
//...
}
extern "C" {
pub fn foo(x: *mut A, y: *mut B);
//...
}
I'm not very happy with this signature of foo because I know that x is a pointer to a constant struct. Moreover, I want to apply this idea to improve this signature into something like
extern "C" {
pub fn foo(x: &'_ A, y: &'_ mut B);
}
But binding.rs has a bunch of functions like foo, and rewriting them by hand is very time-consuming, so I think macros (or something else) should help. For example, there might exist one (or several) magic macros, say rewrite!:
// hide
mod ffi {
include!("binding.rs"); // so bunch of functions: foo, bar
}
// re-exports
extern "C" {
rewrite!(foo); // should expand to: pub fn foo(x: &'_ A, y: &'_ mut B)
rewrite!(bar);
}
I'm at a very early stage of this work. I don't even know if such a problem can be solved by a macro or anything else, so I'm looking for any entry point.
I've cross-posted this question to the Rust user forum.

A declarative macro can't accomplish this but a procedural macro might be able to. With proc_macro2, you can modify the token stream of a function declaration by placing your rewrite attribute on it, e.g.
extern "C" {
#[rustify]
pub fn foo(x: *mut A, y: *mut B);
}
And your rustify macro would substitute *mut Typename with Option<&mut Typename>.
I don't know how you'd change the mut borrow offhand without replacing the original declaration with *const.
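The substitution described above relies on Option<&mut T> having the same size and representation as a nullable *mut T. Without writing the procedural macro itself, the kind of signature such a rewrite aims for can be sketched by hand as a safe wrapper. This is only an illustration: raw_foo is a hypothetical Rust stand-in for the C function (so the example links without the C library), and the fields of A and B are invented.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the bindgen-generated structs.
#[repr(C)]
pub struct A {
    value: i32,
}

#[repr(C)]
pub struct B {
    total: i32,
}

// Hypothetical stand-in for the C function. In a real binding this would be
// declared in an `extern "C"` block generated by bindgen.
unsafe extern "C" fn raw_foo(x: *mut A, y: *mut B) {
    // Only reads through `x` and writes through `y`.
    unsafe {
        (*y).total += (*x).value;
    }
}

// Hand-written safe wrapper with the signature the question asks for.
pub fn foo(x: &A, y: &mut B) {
    // SAFETY: both pointers come from live references, and `raw_foo`
    // never writes through `x`.
    unsafe { raw_foo(x as *const A as *mut A, y as *mut B) }
}

fn main() {
    // The niche optimisation that makes `Option<&mut T>` pointer-sized.
    assert_eq!(size_of::<Option<&mut A>>(), size_of::<*mut A>());

    let a = A { value: 3 };
    let mut b = B { total: 4 };
    foo(&a, &mut b);
    println!("{}", b.total); // prints 7
}
```

The wrapper is only sound if the C function really treats x as read-only, which is the knowledge the question says bindgen cannot infer on its own.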

Related

How does one create a struct that holds a type parameterized function in Rust?

I am a beginner in Rust and am trying to create a Parser Combinator library in order to learn the ropes of the language. Very early on in this project I've gotten stuck. I want to have a Parser struct that holds the function used to parse data. Here is my attempt at implementing this.
struct Parser<I, O> {
parse: impl Fn(&Vec<I>) -> Option<(&Vec<I>, O)>
}
Unfortunately, as the compiler informs me, I cannot use the "impl Trait" notation in this way. Another way I've tried is by defining a separate type variable for the type of the function itself, as below.
struct Parser<I, O, F>
where
F: impl Fn(&Vec<I>) -> Option<(&Vec<I>, O)>
{
parse: F
}
However, it seems redundant and unnecessary to have to provide the input, output, and function type, as the function type can be derived from the input and output. Also, the compiler gives me an error because neither I nor O is used.
I also considered Parser may have to be a trait rather than a struct. However I can't really wrap my head around what that would look like, and it seems like you would run into the same issue trying to define a struct that implemented the Parser trait.
Not a lot of context, but I would try doing it this way:
struct Parser<I, O> {
parse: Box<dyn Fn(&Vec<I>) -> Option<(&Vec<I>, O)>>,
}
fn main() {
let parser = Parser {
parse: Box::new(|x| {
Some((x, x.iter().sum::<i32>()))
})
};
let v = vec![1, 2, 3, 4];
let result = (parser.parse)(&v).unwrap();
println!("{:?}", result);
}
For more suggestions, I would look here: How do I store a closure in a struct in Rust?
I think all you need is std::marker::PhantomData to silence the error about unused generics. You can also make the code a bit more DRY with some type aliases. (I've replaced &Vec<I> with &[I] as the latter is a strict superset of the former.)
use std::marker::PhantomData;
type Input<'a,I> = &'a [I];
type Output<'a,I,O> = Option<(&'a [I], O)>;
struct Parser<I, O, F>
where
F: Fn(Input<'_,I>) -> Output<'_, I, O>,
{
parse: F,
_phantom: PhantomData<(I, O)>,
}
impl<I, O, F> Parser<I, O, F>
where
F: Fn(Input<'_, I>) -> Output<'_, I, O>,
{
fn new(parse: F) -> Self {
Self {
parse,
_phantom: PhantomData,
}
}
fn parse_it<'a>(&'a self, input: Input<'a, I>) -> Output<'a, I, O> {
(self.parse)(input)
}
}
fn main() {
let parser = Parser::new(|v: &[i32]| Some((v, v.iter().fold(0, |acc, x| acc + x))));
println!("{:?}", parser.parse_it(&[1, 2, 3]));
// ^ Some(([1, 2, 3], 6))
}
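The question also wonders whether Parser should be a trait rather than a struct. One possible shape, as a sketch only: the trait name Parse and the helper first_item are hypothetical, and a blanket implementation makes any suitable function or closure a parser without a wrapper struct or PhantomData.

```rust
// Hypothetical trait: anything that can consume a slice of `I` and
// produce the remaining input plus a value of type `O`.
trait Parse<I, O> {
    fn parse<'a>(&self, input: &'a [I]) -> Option<(&'a [I], O)>;
}

// Blanket implementation: every function or closure with the right
// shape is a parser.
impl<I, O, F> Parse<I, O> for F
where
    F: for<'a> Fn(&'a [I]) -> Option<(&'a [I], O)>,
{
    fn parse<'a>(&self, input: &'a [I]) -> Option<(&'a [I], O)> {
        self(input)
    }
}

// A tiny example parser: take the first item and return the rest.
fn first_item(input: &[i32]) -> Option<(&[i32], i32)> {
    let (head, rest) = input.split_first()?;
    Some((rest, *head))
}

fn main() {
    let data = [1, 2, 3];
    let result = first_item.parse(&data[..]);
    println!("{:?}", result); // prints Some(([2, 3], 1))
}
```

Combinators could then be written as generic functions over impl Parse<I, O>, which avoids naming the closure type at all.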

Are dependency injection containers that return references to values allocated in a local function fundamentally impossible in Rust?

In my mind one of the ideal traits for a dependency injection container would look like:
pub trait ResolveOwn<T> {
fn resolve(&self) -> T;
}
I don't know how to implement this for certain T. I keep stubbing my toes on variations and cousins of this error:
error[E0515]: cannot return value referencing local variable `X`
I'm used to dependency injection in C# where returning values referencing local variables is precisely how you implement the equivalent of that resolve function.
Here's an illustration that focuses on this aspect of dependency injection:
struct ComplexThing<'a>(&'a i32);
struct Module();
impl Module {
fn resolve_foo(&self) -> i32 {
todo!()
}
pub fn resolve_complex_thing_1(&self) -> ComplexThing {
let foo = self.resolve_foo();
ComplexThing(&foo)
}
}
error[E0515]: cannot return value referencing local variable `foo`
--> src/lib.rs:12:9
|
12 | ComplexThing(&foo)
| ^^^^^^^^^^^^^----^
| | |
| | `foo` is borrowed here
| returns a value referencing data owned by the current function
See? There's that error.
My first instinct (again, coming from C#) is to give the local variable a place to live in the returned value, because the local variable is created here but it needs to live at least as long as the returned value. Hmm... that sounds sort of like returning a closure. Let's see how that goes...
pub fn resolve_complex_thing_2<'a>(&'a self) -> impl FnOnce() -> ComplexThing<'a> {
let foo = self.resolve_foo();
move || ComplexThing(&foo)
}
error[E0515]: cannot return value referencing local data `foo`
--> src/lib.rs:12:17
|
12 | move || ComplexThing(&foo)
| ^^^^^^^^^^^^^----^
| | |
| | `foo` is borrowed here
| returns a value referencing data owned by the current function
No joy. It doesn't work to package this closure up into a prettier type (like some impl of Into<ComplexThing<'a>>) because it's fundamentally about returning a value referencing local data.
My next instinct is to somehow jam the local data into some kind of weak cache inside my Module and then get a reference from there (undoubtedly unsafely). And then the weak cache will need to solve half of the hard problems in Computer Science (hint: the other hard problem is naming things). That's starting to sound an awful lot like... oh no. Garbage collection!
I also thought about inverting the flow of control. It's hideous and still doesn't work:
impl Module {
pub fn use_foo<T>(&self, f: impl FnOnce(i32) -> T) -> T {
(f)(42)
}
pub fn use_complex_thing<'a, T>(&'a self, f: impl FnOnce(ComplexThing<'a>) -> T) -> T {
self.use_foo(
|foo| (f)(ComplexThing(&foo)),
)
}
}
error[E0597]: `foo` does not live long enough
--> src/lib.rs:12:36
|
10 | pub fn use_complex_thing<'a, T>(&'a self, f: impl FnOnce(ComplexThing<'a>) -> T) -> T {
| -- lifetime `'a` defined here
11 | self.use_foo(
12 | |foo| (f)(ComplexThing(&foo)),
| -----------------^^^^--
| | | |
| | | `foo` dropped here while still borrowed
| | borrowed value does not live long enough
| argument requires that `foo` is borrowed for `'a`
My last instinct is to hack around the restriction against moving a value with active borrows, because then I could trick the compiler. My attempts at implementing that resulted in a type that's impossible to use correctly — it ended up requiring knowledge that only the compiler has and seemed to introduce undefined behavior at every turn. I won't bother reproducing that code here.
It seems like it's impossible to return any owned instances of types containing (non-singleton) references.
Assuming that's true, that means there are entire classes of types that simply cannot be created with a dependency injection container in Rust.
Surely I'm missing something?
You can try making a drop guard along with Box::leak to leak a reference to live long enough, then have custom behavior on Drop to reclaim the leaked memory. Note that this will require you to do everything through the drop guard:
use std::marker::PhantomData;
use std::mem::ManuallyDrop;
struct ComplexThing<'a>(&'a i32);
struct Module;
pub struct DropGuard<'a, T: 'a, V: 'a> {
// do NOT make these fields pub
// direct manipulation of these is very unsafe
container: ManuallyDrop<T>,
value: *mut V,
// I'm not sure this is needed but better safe than sorry
_value: PhantomData<&'a mut V>,
}
impl<'a, T: 'a, V: 'a> DropGuard<'a, T, V> {
pub fn new<F: FnOnce(&'a mut V) -> T>(value: Box<V>, build: F) -> Self {
// (the parameter is named `build` because `gen` is a reserved keyword in the 2024 edition)
// leak the value so it lives long enough
let leaked = Box::leak(value);
// get a pointer to know what to drop
let leaked_ptr: *mut _ = leaked;
DropGuard {
container: ManuallyDrop::new(build(leaked)),
value: leaked_ptr,
_value: PhantomData,
}
}
}
// so you can actually use it
// no DerefMut since dropping the container without dropping the guard is weird
impl<'a, T: 'a, V: 'a> std::ops::Deref for DropGuard<'a, T, V> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.container
}
}
impl<'a, T: 'a, V: 'a> Drop for DropGuard<'a, T, V> {
fn drop(&mut self) {
// drop the container first
// this should be safe since self.container is never referenced again
// the value it's borrowing is still valid (due to not being dropped yet)
// and there should be no references to it (due to this struct being dropped)
unsafe {
ManuallyDrop::drop(&mut self.container);
}
// now drop the pointer
// this should be safe since it was created with Box::leak
// and the container borrowing it has already been dropped
// and no more references should have survived
std::mem::drop(unsafe { Box::from_raw(self.value) });
}
}
impl Module {
pub fn resolve_foo(&self) -> i32 {
5
}
pub fn resolve_complex_thing_1(&self) -> DropGuard<ComplexThing, i32> {
DropGuard::new(Box::new(self.resolve_foo()), |i32_ref| {
ComplexThing(i32_ref)
})
}
}
fn main() {
let module = Module;
let guard = module.resolve_complex_thing_1();
println!("{:?}", guard.0);
}
Another way that also cleans up the typing is to use a trait:
use std::marker::PhantomData;
use std::mem::ManuallyDrop;
struct ComplexThing<'a>(&'a i32);
struct Module;
// not sure if this trait should be unsafe
// but again, better safe than sorry
pub unsafe trait Guardable {
type Value;
}
unsafe impl Guardable for ComplexThing<'_> {
type Value = i32;
}
pub struct DropGuard<'a, T: 'a + Guardable> {
// do NOT make these fields pub
// direct manipulation of these is very unsafe
container: ManuallyDrop<T>,
value: *mut T::Value,
// I'm not sure this is needed but better safe than sorry
_value: PhantomData<&'a mut T::Value>,
}
impl<'a, T: 'a + Guardable> DropGuard<'a, T> {
pub fn new<F: FnOnce(&'a mut T::Value) -> T>(value: Box<T::Value>, build: F) -> Self {
// (the parameter is named `build` because `gen` is a reserved keyword in the 2024 edition)
// leak the value so it lives long enough
let leaked = Box::leak(value);
// get a pointer to know what to drop
let leaked_ptr: *mut _ = leaked;
DropGuard {
container: ManuallyDrop::new(build(leaked)),
value: leaked_ptr,
_value: PhantomData,
}
}
}
// so you can actually use it
// no DerefMut since dropping the container without dropping the guard is weird
impl<'a, T: 'a + Guardable> std::ops::Deref for DropGuard<'a, T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.container
}
}
impl<'a, T: 'a + Guardable> Drop for DropGuard<'a, T> {
fn drop(&mut self) {
// drop the container first
// this should be safe since self.container is never referenced again
// the value it's borrowing is still valid (due to not being dropped yet)
// and there should be no references to it (due to this struct being dropped)
unsafe {
ManuallyDrop::drop(&mut self.container);
}
// now drop the pointer
// this should be safe since it was created with Box::leak
// and the container borrowing it has already been dropped
// and no more references should have survived
std::mem::drop(unsafe { Box::from_raw(self.value) });
}
}
impl Module {
pub fn resolve_foo(&self) -> i32 {
5
}
pub fn resolve_complex_thing_1(&self) -> DropGuard<ComplexThing> {
DropGuard::new(Box::new(self.resolve_foo()), |i32_ref| {
ComplexThing(i32_ref)
})
}
}
fn main() {
let module = Module;
let guard = module.resolve_complex_thing_1();
println!("{:?}", guard.0);
}
Since every container should only have one valid DropGuard value type, you can put that in an associated type in a trait, so now you can work with DropGuard<ComplexThing> instead of DropGuard<ComplexThing, i32>, and this also prevents you from having bogus values in the DropGuard.
Disclaimer: I'm fairly new to Rust; this answer is based on limited experience and may be un-nuanced.
As a general principle of Rust program design — not specific to dependency injection — you should plan not to use references except for things that are one of:
temporary, i.e. confined to the life of some stack frame (or technically longer than that in the case of async functions, but you get the idea, I hope)
compile-time constants or lazily initialized singletons, i.e. &'static references
The reason is that Rust does not — without various trickery — support lifetimes that are not one of those two cases.
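To illustrate the "temporary" case: the callback-style attempt from the question does compile once the callback is required to accept a ComplexThing of any lifetime, instead of the caller-chosen 'a. This is a sketch of that variation, not the asker's final design. The reference then never outlives the stack frame that owns foo.

```rust
struct ComplexThing<'a>(&'a i32);

struct Module;

impl Module {
    pub fn use_foo<T>(&self, f: impl FnOnce(i32) -> T) -> T {
        f(42)
    }

    // `for<'b>` means the callback must work for every lifetime, so it can
    // be handed a reference to a value that lives only inside this call.
    pub fn use_complex_thing<T>(
        &self,
        f: impl for<'b> FnOnce(ComplexThing<'b>) -> T,
    ) -> T {
        self.use_foo(|foo| f(ComplexThing(&foo)))
    }
}

fn main() {
    let module = Module;
    let doubled = module.use_complex_thing(|thing| *thing.0 * 2);
    println!("{}", doubled); // prints 84
}
```

The trade-off is that the ComplexThing cannot escape the callback, since T is not allowed to mention 'b.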
Any structures which are needed for longer durations than that should be designed to not contain non-'static references — and instead use owned values. In other words, let your DI be like
pub trait ResolveOwn<T: 'static> {
// ^^^^^^^^^^
fn resolve(&self) -> T;
}
Don't actually add that lifetime constraint: it doesn't buy you anything, and might be inconvenient (for example, in a test that wants to inject things referring to the test — which will work fine since they live longer than the entire DI container — or if the application's actual main() has something to share, similarly). But plan as if it were there.
Given this constraint, how can you implement things that seem to want references?
In the simplest cases, just use owned values and don't worry about any extra cloning required unless it proves to be a performance issue.
Use Rc or Arc for reference-counted smart pointers that keep things alive as long as necessary.
If some T really requires references, but only into its own data, use a safe self-referential struct helper like ouroboros. (I believe this is similar to but more general than the suggestion in Aplet123's answer to this question.)
All of these strategies are independent of the DI container: they're ways to make a type satisfy the 'static lifetime bound ("contains no references except 'static ones").
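As a rough sketch of the Rc strategy applied to the example from the question: here ComplexThing holds an Rc<i32> instead of a &i32, so it owns a share of the value and needs no lifetime parameter. The foo field on Module is an assumption made for the illustration.

```rust
use std::rc::Rc;

pub trait ResolveOwn<T> {
    fn resolve(&self) -> T;
}

// Owns a share of the i32 instead of borrowing it.
struct ComplexThing(Rc<i32>);

// Assumed layout: the module keeps its own handle to the shared value.
struct Module {
    foo: Rc<i32>,
}

impl ResolveOwn<Rc<i32>> for Module {
    fn resolve(&self) -> Rc<i32> {
        // Cloning an Rc only bumps the reference count.
        Rc::clone(&self.foo)
    }
}

impl ResolveOwn<ComplexThing> for Module {
    fn resolve(&self) -> ComplexThing {
        // The annotation selects the `Rc<i32>` implementation above.
        let foo: Rc<i32> = self.resolve();
        ComplexThing(foo)
    }
}

fn main() {
    let module = Module { foo: Rc::new(5) };
    let thing: ComplexThing = module.resolve();
    println!("{}", thing.0); // prints 5
}
```

Arc would be the drop-in replacement if the resolved values need to cross threads.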

Is there any way to explicitly write the type of a closure?

I started reading the Rust guide on closures. From the guide:
That is because in Rust each closure has its own unique type. So, not only do closures with different signatures have different types, but different closures with the same signature have different types, as well.
Is there a way to explicitly write the type signature of a closure? Is there any compiler flag that expands the type of inferred closure?
No. The real type of a closure is only known to the compiler, and it's not actually that useful to be able to know the concrete type of a given closure. You can specify certain "shapes" that a closure must fit, however:
fn call_it<F>(f: F)
where
F: Fn(u8) -> u8, // <--- HERE
{
println!("The result is {}", f(42))
}
fn main() {
call_it(|a| a + 1);
}
In this case, we say that call_it accepts any type that implements the trait Fn with one argument of type u8 and a return type of u8. Many different closures and free functions can implement that trait, however.
As of Rust 1.26.0, you can also use the impl Trait syntax to accept or return a closure (or any other type that implements a trait):
fn make_it() -> impl Fn(u8) -> u8 {
|a| a + 1
}
fn call_it(f: impl Fn(u8) -> u8) {
println!("The result is {}", f(42))
}
fn main() {
call_it(make_it());
}
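Although the type cannot be written, the claim that two closures with the same signature have different types can be observed at run time. A small sketch using std::any::TypeId follows; the helper type_id_of is hypothetical and only works for closures that capture no borrowed data, because TypeId requires 'static types.

```rust
use std::any::TypeId;

// Returns the TypeId of whatever concrete type the compiler inferred.
fn type_id_of<T: 'static>(_value: &T) -> TypeId {
    TypeId::of::<T>()
}

fn main() {
    // Same signature, same body, but two separate closure expressions.
    let add_one = |a: u8| a + 1;
    let also_add_one = |a: u8| a + 1;

    let same = type_id_of(&add_one) == type_id_of(&also_add_one);
    println!("{}", same); // prints false
}
```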
Quoting the reference, "A closure expression produces a closure value with a unique, anonymous type that cannot be written out".
However, under conditions defined by RFC 1558, a closure can be coerced to a function pointer.
let trim_lines: fn((usize, &str)) -> (usize, &str) = |(i, line)| (i, line.trim());
Function pointers can be passed to .map(), .filter(), etc. just like regular functions. The resulting adapter types will be different, but they still implement the Iterator trait.
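For instance, a non-capturing closure coerced to a function pointer can be handed to .map() directly. This is a minimal sketch; the wrapper function trimmed_lines is a made-up name for the example.

```rust
// Hypothetical helper: number the lines of `text` and trim each one.
fn trimmed_lines(text: &str) -> Vec<(usize, &str)> {
    // The closure captures nothing, so it coerces to a plain fn pointer,
    // whose type can be written out in full.
    let trim_line: fn((usize, &str)) -> (usize, &str) = |(i, line)| (i, line.trim());

    text.lines().enumerate().map(trim_line).collect()
}

fn main() {
    println!("{:?}", trimmed_lines("  a \n b  ")); // prints [(0, "a"), (1, "b")]
}
```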

Moving and returning a mutable pointer

I am working through this Rust tutorial, and I'm trying to solve this problem:
Implement a function, incrementMut that takes as input a vector of integers and modifies the values of the original list by incrementing each value by one.
This seems like a fairly simple problem, yes?
I have been trying to get a solution to compile for a while now, and I'm beginning to lose hope. This is what I have so far:
fn main() {
let mut p = vec![1i, 2i, 3i];
increment_mut(p);
for &x in p.iter() {
print!("{} ", x);
}
println!("");
}
fn increment_mut(mut x: Vec<int>) {
for &mut i in x.iter() {
i += 1;
}
}
This is what the compiler says when I try to compile:
Compiling tut2 v0.0.1 (file:///home/nate/git/rust/tut2)
/home/nate/git/rust/tut2/src/main.rs:5:12: 5:13 error: use of moved value: `p`
/home/nate/git/rust/tut2/src/main.rs:5 for &x in p.iter() {
^
/home/nate/git/rust/tut2/src/main.rs:3:16: 3:17 note: `p` moved here because it has type `collections::vec::Vec<int>`, which is non-copyable
/home/nate/git/rust/tut2/src/main.rs:3 increment_mut(p);
^
error: aborting due to previous error
Could not compile `tut2`.
To learn more, run the command again with --verbose.
I also tried a version with references:
fn main() {
let mut p = vec![1i, 2i, 3i];
increment_mut(&p);
for &x in p.iter() {
print!("{} ", x);
}
println!("");
}
fn increment_mut(x: &mut Vec<int>) {
for &mut i in x.iter() {
i += 1i;
}
}
And the error:
Compiling tut2 v0.0.1 (file:///home/nate/git/rust/tut2)
/home/nate/git/rust/tut2/src/main.rs:3:16: 3:18 error: cannot borrow immutable dereference of `&`-pointer as mutable
/home/nate/git/rust/tut2/src/main.rs:3 increment_mut(&p);
^~
error: aborting due to previous error
Could not compile `tut2`.
To learn more, run the command again with --verbose.
I feel like I'm missing some core idea about memory ownership in Rust, and it's making solving trivial problems like this very difficult, could someone shed some light on this?
There are a few mistakes in your code.
increment_mut(&p), given a p that is Vec<int>, would require the function increment_mut(&Vec<int>); &-references and &mut-references are completely distinct things syntactically, and if you want a &mut-reference you must write &mut p, not &p.
You need to understand patterns and how they operate; for &mut i in x.iter() will not do what you intend it to. What it will do is take the &int that each iteration of x.iter() produces, dereference it (the &), copying the value (because int satisfies Copy; if you tried it with a non-Copy type like String it would not compile), and place it in the mutable variable i (mut i). That is, it is equivalent to for i in x.iter() { let mut i = *i; … }. The effect of this is that i += 1 is actually just incrementing a local variable and has no effect on the vector. You can fix this by using iter_mut, which produces &mut int rather than &int, and changing the &mut i pattern to just i and the i += 1 to *i += 1, meaning “change the int inside the &mut int”.
You can also switch from using &mut Vec<int> to using &mut [int] by calling .as_mut_slice() on your vector. This is a better practice; you should practically never need a reference to a vector as that is taking two levels of indirection where only one is needed. Ditto for &String—it’s exceedingly rare, you should in such cases work with &str.
So then:
fn main() {
let mut p = vec![1i, 2i, 3i];
increment_mut(p.as_mut_slice());
for &x in p.iter() {
print!("{} ", x);
}
println!("");
}
fn increment_mut(x: &mut [int]) {
for i in x.iter_mut() {
*i += 1;
}
}
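The code in this thread uses pre-1.0 syntax: the int type and the 1i literal suffix no longer exist in current Rust. A sketch of the same fix in today's syntax, assuming i32 as the element type, might look like this. The increment_copies function is added only to show why the original loop had no effect.

```rust
// Mirrors the original mistake: each element is copied out of the slice,
// so incrementing the copy leaves the slice untouched.
fn increment_copies(values: &[i32]) {
    for &value in values.iter() {
        let mut copy = value;
        copy += 1;
        let _ = copy; // the incremented copy is simply discarded
    }
}

// The fix: iterate over mutable references and write through them.
fn increment_mut(values: &mut [i32]) {
    for value in values.iter_mut() {
        *value += 1;
    }
}

fn main() {
    let mut p = vec![1, 2, 3];

    increment_copies(&p);
    println!("{:?}", p); // prints [1, 2, 3]

    // `&mut p` coerces from `&mut Vec<i32>` to `&mut [i32]`.
    increment_mut(&mut p);
    println!("{:?}", p); // prints [2, 3, 4]
}
```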

D callbacks in C functions

I am writing D2 bindings for Lua. This is in one of the Lua header files.
typedef int (*lua_CFunction) (lua_State *L);
I assume the equivalent D2 statement would be:
extern(C) alias int function( lua_State* L ) lua_CFunction;
Lua also provides an api function:
void lua_pushcfunction( lua_State* L, string name, lua_CFunction func );
If I want to push a D2 function does it have to be extern(C) or can I just use the function?
int dfunc( lua_State* L )
{
std.stdio.writeln("dfunc");
}
extern(C) int cfunc( lua_State* L )
{
std.stdio.writeln("cfunc");
}
lua_State* L = lua_newstate();
lua_pushcfunction(L, "cfunc", &cfunc); //This will definitely work.
lua_pushcfunction(L, "dfunc", &dfunc); //Will this work?
If I can only use cfunc, why? I don't need to do anything like that in C++. I can just pass the address of a C++ function to C and everything just works.
Yes, the function must be declared as extern (C).
The calling conventions of C and D functions are different, so you must tell the compiler to use the C convention with extern (C). I don't know why you don't have to do this in C++.
See here for more information on interfacing with C.
It's also worth noting that you can use the C style for declaring function arguments.
Yes, your typedef translation is correct. OTOH have you looked at the htod tool?
