With closures as parameter and return values, is Fn or FnMut more idiomatic?

Continuing from How do I write combinators for my own parsers in Rust?, I stumbled into this question concerning bounds of functions that consume and/or yield functions/closures.
From these slides, I learned that, to be convenient for consumers, you should take functions as FnOnce and return them as Fn where possible. This gives the caller the most freedom in what to pass and in what to do with the returned function.
In my example, FnOnce is not possible because I need to call that function multiple times. While trying to make it compile I arrived at two possibilities:
pub enum Parsed<'a, T> {
Some(T, &'a str),
None(&'a str),
}
impl<'a, T> Parsed<'a, T> {
pub fn unwrap(self) -> (T, &'a str) {
match self {
Parsed::Some(head, tail) => (head, &tail),
_ => panic!("Called unwrap on nothing."),
}
}
pub fn is_none(&self) -> bool {
match self {
Parsed::None(_) => true,
_ => false,
}
}
}
pub fn achar(character: char) -> impl Fn(&str) -> Parsed<char> {
move |input|
match input.chars().next() {
Some(c) if c == character => Parsed::Some(c, &input[c.len_utf8()..]), // skip the full UTF-8 width of c, not just one byte
_ => Parsed::None(input),
}
}
pub fn some_v1<T>(parser: impl Fn(&str) -> Parsed<T>) -> impl Fn(&str) -> Parsed<Vec<T>> {
move |input| {
let mut re = Vec::new();
let mut pos = input;
loop {
match parser(pos) {
Parsed::Some(head, tail) => {
re.push(head);
pos = tail;
}
Parsed::None(_) => break,
}
}
Parsed::Some(re, pos)
}
}
pub fn some_v2<T>(mut parser: impl FnMut(&str) -> Parsed<T>) -> impl FnMut(&str) -> Parsed<Vec<T>> {
move |input| {
let mut re = Vec::new();
let mut pos = input;
loop {
match parser(pos) {
Parsed::Some(head, tail) => {
re.push(head);
pos = tail;
}
Parsed::None(_) => break,
}
}
Parsed::Some(re, pos)
}
}
#[test]
fn try_it() {
assert_eq!(some_v1(achar('#'))("##comment").unwrap(), (vec!['#', '#'], "comment"));
assert_eq!(some_v2(achar('#'))("##comment").unwrap(), (vec!['#', '#'], "comment"));
}
Now I don't know which version is preferable. Version 1 takes Fn, which is less general, but version 2 needs its parameter to be mutable.
Which one is more idiomatic and should be used, and what is the rationale behind it?
Update: Thanks jplatte for the suggestion on version one. I updated the code here, that case I find even more interesting.

Comparing some_v1 and some_v2 as you wrote them I would say version 2 should definitely be preferred because it is more general. I can't think of a good example for a parsing closure that would implement FnMut but not Fn, but there's really no disadvantage to parser being mut - as noted in the first comment on your question this doesn't constrain the caller in any way.
However, there is a way in which you can make version 1 more general (not strictly more general, just partially) than version 2, and that is by returning impl Fn(&str) -> … instead of impl FnMut(&str) -> …. By doing that, you get two functions that each are less constrained than the other in some way, so it might even make sense to keep both:
Version 1 with the return type change would be more restrictive in its argument (the callable can't mutate its associated data) but less restrictive in its return type (you guarantee that the returned callable doesn't mutate its associated data)
Version 2 would be less restrictive in its argument (the callable is allowed to mutate its associated data) but more restrictive in its return type (the returned callable might mutate its associated data)
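To make the trade-off concrete, here is a minimal sketch. It uses a simplified parser shape (an Option of value and remaining input instead of the question's Parsed enum), and the names many_fn, many_fn_mut, at_most and call_twice are hypothetical, introduced only for this illustration.

```rust
// Simplified stand-ins for the question's parsers: a parser maps the input
// to an optional (value, remaining input) pair. All names are hypothetical.

/// Parses one `expected` character. It holds no mutable state, so it is `Fn`.
fn achar(expected: char) -> impl Fn(&str) -> Option<(char, &str)> {
    move |input| {
        let c = input.chars().next()?;
        if c == expected {
            Some((c, &input[c.len_utf8()..]))
        } else {
            None
        }
    }
}

/// Parses `expected`, but succeeds at most `limit` times over its lifetime.
/// The counter is mutated on every success, so this is only `FnMut`.
fn at_most(limit: usize, expected: char) -> impl FnMut(&str) -> Option<(char, &str)> {
    let mut seen = 0;
    move |input| {
        if seen == limit {
            return None;
        }
        let c = input.chars().next()?;
        if c != expected {
            return None;
        }
        seen += 1;
        Some((c, &input[c.len_utf8()..]))
    }
}

/// Version 1 style: stricter argument (`Fn`), more useful result (`Fn`).
fn many_fn<T>(parser: impl Fn(&str) -> Option<(T, &str)>) -> impl Fn(&str) -> (Vec<T>, &str) {
    move |input| {
        let mut items = Vec::new();
        let mut rest = input;
        while let Some((item, tail)) = parser(rest) {
            items.push(item);
            rest = tail;
        }
        (items, rest)
    }
}

/// Version 2 style: looser argument (`FnMut`), weaker result (`FnMut`).
fn many_fn_mut<T>(
    mut parser: impl FnMut(&str) -> Option<(T, &str)>,
) -> impl FnMut(&str) -> (Vec<T>, &str) {
    move |input| {
        let mut items = Vec::new();
        let mut rest = input;
        while let Some((item, tail)) = parser(rest) {
            items.push(item);
            rest = tail;
        }
        (items, rest)
    }
}

/// Only an `Fn` result can be called through a shared reference like this.
fn call_twice<F>(parser: &F, input: &str) -> usize
where
    F: Fn(&str) -> (Vec<char>, &str),
{
    parser(input).0.len() + parser(input).0.len()
}

fn main() {
    let hashes = many_fn(achar('#'));
    assert_eq!(call_twice(&hashes, "##comment"), 4);

    // `many_fn(at_most(3, '#'))` would be rejected: `at_most` is only `FnMut`.
    let mut limited = many_fn_mut(at_most(3, '#'));
    assert_eq!(limited("#####"), (vec!['#', '#', '#'], "##"));
    // The budget is used up, so the second call matches nothing.
    assert_eq!(limited("#####"), (Vec::<char>::new(), "#####"));
}
```

The stateful at_most parser is the kind of argument only the FnMut version accepts, while call_twice is the kind of consumer only the Fn-returning version can serve.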

Related

How does one create a struct that holds a type parameterized function in Rust?

I am a beginner in Rust and am trying to create a Parser Combinator library in order to learn the ropes of the language. Very early on in this project I've gotten stuck. I want to have a Parser struct that holds the function used to parse data. Here is my attempt at implementing this.
struct Parser<I, O> {
parse: impl Fn(&Vec<I>) -> Option<(&Vec<I>, O)>
}
Unfortunately, as the compiler informs me, I cannot use the "impl Trait" notation in this way. Another way I've tried is by defining a separate type variable for the type of the function itself, as below.
struct Parser<I, O, F>
where
F: Fn(&Vec<I>) -> Option<(&Vec<I>, O)>
{
parse: F
}
However, it seems redundant and unnecessary to have to provide the input, output, and function type, as the function type can be derived from the input and output. Also, the compiler gives me an error because neither I nor O is used.
I also considered Parser may have to be a trait rather than a struct. However I can't really wrap my head around what that would look like, and it seems like you would run into the same issue trying to define a struct that implemented the Parser trait.
There isn't a lot of context, but I would try doing it this way:
struct Parser<I, O> {
parse: Box<dyn Fn(&Vec<I>) -> Option<(&Vec<I>, O)>>,
}
fn main() {
let parser = Parser {
parse: Box::new(|x| {
Some((x, x.iter().sum::<i32>()))
})
};
let v = vec![1, 2, 3, 4];
let result = (parser.parse)(&v).unwrap();
println!("{:?}", result);
}
For some more suggestion I would look here: How do I store a closure in a struct in Rust?
I think all you need is std::marker::PhantomData to silence the error about unused generics. You can also make the code a bit more DRY with some type aliases. (I've replaced &Vec<I> with &[I] as the latter is a strict superset of the former.)
use std::marker::PhantomData;
type Input<'a,I> = &'a [I];
type Output<'a,I,O> = Option<(&'a [I], O)>;
struct Parser<I, O, F>
where
F: Fn(Input<'_,I>) -> Output<'_, I, O>,
{
parse: F,
_phantom: PhantomData<(I, O)>,
}
impl<I, O, F> Parser<I, O, F>
where
F: Fn(Input<'_, I>) -> Output<'_, I, O>,
{
fn new(parse: F) -> Self {
Self {
parse,
_phantom: PhantomData,
}
}
fn parse_it<'a>(&'a self, input: Input<'a, I>) -> Output<'a, I, O> {
(self.parse)(input)
}
}
fn main() {
let parser = Parser::new(|v: &[i32]| Some((v, v.iter().fold(0, |acc, x| acc + x))));
println!("{:?}", parser.parse_it(&[1, 2, 3]));
// ^ Some(([1, 2, 3], 6))
}
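The question also wonders what Parser would look like as a trait. One possible shape, sketched here under my own assumptions (the blanket impl, the Pair combinator and the first function are hypothetical and not taken from either answer), is a trait that is implemented automatically for every callable with the right signature, plus ordinary structs for combinators:

```rust
/// Hypothetical trait: something that parses a slice of `I` into an `O`.
trait Parser<I, O> {
    fn parse<'a>(&self, input: &'a [I]) -> Option<(&'a [I], O)>;
}

// Blanket impl: any `Fn` with the matching signature is a `Parser`,
// so no wrapper struct and no PhantomData are needed for it.
impl<I, O, F> Parser<I, O> for F
where
    F: for<'a> Fn(&'a [I]) -> Option<(&'a [I], O)>,
{
    fn parse<'a>(&self, input: &'a [I]) -> Option<(&'a [I], O)> {
        self(input)
    }
}

/// A combinator is a plain struct holding its sub-parsers.
struct Pair<P, Q>(P, Q);

impl<I, A, B, P, Q> Parser<I, (A, B)> for Pair<P, Q>
where
    P: Parser<I, A>,
    Q: Parser<I, B>,
{
    fn parse<'a>(&self, input: &'a [I]) -> Option<(&'a [I], (A, B))> {
        // Run the first parser, then the second on what is left.
        let (rest, a) = self.0.parse(input)?;
        let (rest, b) = self.1.parse(rest)?;
        Some((rest, (a, b)))
    }
}

/// A plain function with the right signature already is a parser.
fn first(input: &[i32]) -> Option<(&[i32], i32)> {
    let (head, rest) = input.split_first()?;
    Some((rest, *head))
}

fn main() {
    let data = [1, 2, 3];
    let parser = Pair(first, first);
    let result = parser.parse(&data[..]);
    assert_eq!(result, Some((&data[2..], (1, 2))));
    println!("{:?}", result);
}
```

With this layout the input and output types appear as trait parameters, so the struct definitions need only the type parameters they actually store.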

Why does returning early not finish outstanding borrows?

I'm trying to write a function which pushes an element onto the end of a sorted vector only if the element is larger than the last element already in the vector, and otherwise returns an error with a reference to the largest element. This doesn't seem to violate any borrowing rules as far as I can tell, but the borrow checker doesn't like it. I don't understand why.
struct MyArray<K, V>(Vec<(K, V)>);
impl<K: Ord, V> MyArray<K, V> {
pub fn insert_largest(&mut self, k: K, v: V) -> Result<(), &K> {
{
match self.0.iter().next_back() {
None => (),
Some(&(ref lk, _)) => {
if lk > &k {
return Err(lk);
}
}
};
}
self.0.push((k, v));
Ok(())
}
}
error[E0502]: cannot borrow `self.0` as mutable because it is also borrowed as immutable
--> src/main.rs:15:9
|
6 | match self.0.iter().next_back() {
| ------ immutable borrow occurs here
...
15 | self.0.push((k, v));
| ^^^^^^ mutable borrow occurs here
16 | Ok(())
17 | }
| - immutable borrow ends here
Why doesn't this work?
In response to Paolo Falabella's answer.
We can translate any function with a return statement into one without a return statement as follows:
fn my_func() -> &MyType {
'inner: {
// Do some stuff
return &x;
}
// And some more stuff
}
Into
fn my_func() -> &MyType {
let res;
'outer: {
'inner: {
// Do some stuff
res = &x;
break 'outer;
}
// And some more stuff
}
res
}
From this, it becomes clear that the borrow outlives the scope of 'inner.
Is there any problem with instead using the following rewrite for the purpose of borrow-checking?
fn my_func() -> &MyType {
'outer: {
'inner: {
// Do some stuff
break 'outer;
}
// And some more stuff
}
panic!()
}
Considering that return statements preclude anything from happening afterwards which might otherwise violate the borrowing rules.
If we name lifetimes explicitly, the signature of insert_largest becomes fn insert_largest<'a>(&'a mut self, k: K, v: V) -> Result<(), &'a K>. So, when you create your return type &K, its lifetime will be the same as the &mut self.
And, in fact, you are taking and returning lk from inside self.
The compiler is seeing that the reference to lk escapes the scope of the match (as it is assigned to the return value of the function, so it must outlive the function itself) and it can't let the borrow end when the match is over.
I think you're saying that the compiler should be smarter and realize that the self.0.push can only ever be reached if lk was not returned. But it is not. And I'm not even sure how hard it would be to teach it that sort of analysis, as it's a bit more sophisticated than the way I understand the borrow checker reasons today.
Today, the compiler sees a reference and basically tries to answer one question ("how long does this live?"). When it sees that your return value is lk, it assigns lk the lifetime it expects for the return value from the fn's signature ('a with the explicit name we gave it above) and calls it a day.
So, in short:
should an early return end the mutable borrow on self? No. As said the borrow should extend outside of the function and follow its return value
is the borrow checker a bit too strict in the code that goes from the early return to the end of the function? Yes, I think so. The part after the early return and before the end of the function is only reachable if the function has NOT returned early, so I think you have a point that the borrow checker might be less strict with borrows in that specific area of code
do I think it's feasible/desirable to change the compiler to enable that pattern? I have no clue. The borrow checker is one of the most complex pieces of the Rust compiler and I'm not qualified to give you an answer on that. This seems related to (and might even be a subset of) the discussion on non-lexical borrow scopes, so I encourage you to look into it and possibly contribute if you're interested in this topic.
For the time being I'd suggest just returning a clone instead of a reference, if possible. I assume returning an Err is not the typical case, so performance should not be a particular worry, but I'm not sure how the K:Clone bound might work with the types you're using.
impl <K, V> MyArray<K, V> where K:Clone + Ord { // 1. now K is also Clone
pub fn insert_largest(&mut self, k: K, v: V) ->
Result<(), K> { // 2. returning K (not &K)
match self.0.iter().next_back() {
None => (),
Some(&(ref lk, _)) => {
if lk > &k {
return Err(lk.clone()); // 3. returning a clone
}
}
};
self.0.push((k, v));
Ok(())
}
}
Why does returning early not finish outstanding borrows?
Because the current implementation of the borrow checker is overly conservative.
Your code works as-is once non-lexical lifetimes are enabled, but only with the experimental "Polonius" implementation. Polonius is what enables conditional tracking of borrows.
I've also simplified your code a bit:
#![feature(nll)]
struct MyArray<K, V>(Vec<(K, V)>);
impl<K: Ord, V> MyArray<K, V> {
pub fn insert_largest(&mut self, k: K, v: V) -> Result<(), &K> {
if let Some((lk, _)) = self.0.iter().next_back() {
if lk > &k {
return Err(lk);
}
}
self.0.push((k, v));
Ok(())
}
}
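If neither cloning nor a nightly compiler is an option, one workaround I would try is to split the check from the borrow that gets returned. This is only a sketch: it looks at the last element twice, and the name rejected is mine. The first borrow ends once the bool is computed, and the second one is created on a path that always returns, so it never overlaps with the push.

```rust
struct MyArray<K, V>(Vec<(K, V)>);

impl<K: Ord, V> MyArray<K, V> {
    pub fn insert_largest(&mut self, k: K, v: V) -> Result<(), &K> {
        // Short-lived borrow: it is finished as soon as the bool exists.
        let rejected = matches!(self.0.last(), Some((lk, _)) if *lk > k);
        if rejected {
            // Fresh borrow that only exists on the returning path.
            return Err(&self.0.last().expect("checked above").0);
        }
        self.0.push((k, v));
        Ok(())
    }
}

fn main() {
    let mut arr = MyArray(Vec::new());
    assert_eq!(arr.insert_largest(2, "two"), Ok(()));
    // 1 is smaller than the current largest key, so a reference to 2 comes back.
    assert_eq!(arr.insert_largest(1, "one"), Err(&2));
    assert_eq!(arr.insert_largest(3, "three"), Ok(()));
    assert_eq!(arr.0.len(), 2);
}
```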

Conflicting lifetime requirements when storing closure capturing returned value

EDIT:
I'm trying to create a vector of closures inside a function, add a standard closure to the vector, and then return the vector from the function. I'm getting an error about conflicting lifetimes.
Code can be executed here.
fn vec_with_closure<'a, T>(f: Box<FnMut(T) + 'a>) -> Vec<Box<FnMut(T) + 'a>>
{
let mut v = Vec::<Box<FnMut(T)>>::new();
v.push(Box::new(|&mut: t: T| {
f(t);
}));
v
}
fn main() {
let v = vec_with_closure(Box::new(|t: usize| {
println!("{}", t);
}));
for c in v.iter_mut() {
c(10);
}
}
EDIT 2:
Using Rc<RefCell<...>> together with move || and the Fn() trait as opposed to FnMut(), as suggested by Shepmaster, helped me produce a working version of the above code. Rust playpen version here.
Here's my understanding of the problem, slightly slimmed down:
fn filter<F>(&mut self, f: F) -> Keeper
where F: Fn() -> bool + 'static //'
{
let mut k = Keeper::new();
self.subscribe(|| {
if f() { k.publish() }
});
k
}
In this method, f is a value that has been passed in by-value, which means that filter owns it. Then, we create another closure that captures f by-reference. We are then trying to save that closure somewhere, so all the references in the closure need to outlive the lifetime of our struct (I picked 'static for convenience).
However, f only lives until the end of the method, so it definitely won't live long enough. We need to make the closure own f. It would be ideal if we could use the move keyword, but that causes the closure to also move in k, so we wouldn't be able to return it from the function.
Trying to solve that led to this version:
fn filter<F>(&mut self, f: F) -> Keeper
where F: Fn() -> bool + 'static //'
{
let mut k = Keeper::new();
let k2 = &mut k;
self.subscribe(move || {
if f() { k2.publish() }
});
k
}
which has a useful error message:
error: `k` does not live long enough
let k2 = &mut k;
^
note: reference must be valid for the static lifetime...
...but borrowed value is only valid for the block
Which leads to another problem: you are trying to keep a reference to k in the closure, but that reference will become invalid as soon as k is returned from the function. When items are moved by-value, their address will change, so references are no longer valid.
One potential solution is to use Rc and RefCell:
fn filter<F>(&mut self, f: F) -> Rc<RefCell<Keeper>>
where F: Fn() -> bool + 'static //'
{
let mut k = Rc::new(RefCell::new(Keeper::new()));
let k2 = k.clone();
self.subscribe(move || {
if f() { k2.borrow_mut().publish() }
});
k
}
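Since the surrounding types are not shown, here is a self-contained sketch of the same idea. Keeper, Bus, subscribe and emit are hypothetical stand-ins that I made up so the snippet can run; only the body of filter follows the code above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in: counts how often `publish` was called.
#[derive(Default)]
struct Keeper {
    published: u32,
}

impl Keeper {
    fn publish(&mut self) {
        self.published += 1;
    }
}

/// Hypothetical stand-in for the type that owns the subscriptions.
#[derive(Default)]
struct Bus {
    subscribers: Vec<Box<dyn Fn()>>,
}

impl Bus {
    fn subscribe(&mut self, callback: impl Fn() + 'static) {
        self.subscribers.push(Box::new(callback));
    }

    /// Calls every stored callback once.
    fn emit(&self) {
        for callback in &self.subscribers {
            callback();
        }
    }

    fn filter<F>(&mut self, f: F) -> Rc<RefCell<Keeper>>
    where
        F: Fn() -> bool + 'static,
    {
        let k = Rc::new(RefCell::new(Keeper::default()));
        // The closure owns `f` and its own handle to the shared Keeper.
        let k2 = Rc::clone(&k);
        self.subscribe(move || {
            if f() {
                k2.borrow_mut().publish();
            }
        });
        k
    }
}

fn main() {
    let mut bus = Bus::default();
    let kept = bus.filter(|| true);
    let dropped = bus.filter(|| false);
    bus.emit();
    bus.emit();
    assert_eq!(kept.borrow().published, 2);
    assert_eq!(dropped.borrow().published, 0);
}
```

Because the mutation goes through RefCell, the stored closure only needs to be Fn, which matches what the edit to the question reports.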

Can you clone a closure?

A FnMut closure cannot be cloned, for obvious reasons, but a Fn closure has an immutable scope; is there some way to create a "duplicate" of a Fn closure?
Trying to clone it results in:
error[E0599]: no method named `clone` found for type `std::boxed::Box<std::ops::Fn(i8, i8) -> i8 + std::marker::Send + 'static>` in the current scope
--> src/main.rs:22:25
|
22 | fp: self.fp.clone(),
| ^^^^^
|
= note: self.fp is a function, perhaps you wish to call it
= note: the method `clone` exists but the following trait bounds were not satisfied:
`std::boxed::Box<std::ops::Fn(i8, i8) -> i8 + std::marker::Send> : std::clone::Clone`
Is it safe to somehow pass a raw pointer to a Fn around, like:
let func_pnt = &mut Box<Fn<...> + Send> as *mut Box<Fn<...>>
Technically, the above works, but it seems quite weird.
Here's an example of what I'm trying to do:
use std::thread;
struct WithCall {
fp: Box<Fn(i8, i8) -> i8 + Send>,
}
impl WithCall {
pub fn new(fp: Box<Fn(i8, i8) -> i8 + Send>) -> WithCall {
WithCall { fp: fp }
}
pub fn run(&self, a: i8, b: i8) -> i8 {
(self.fp)(a, b)
}
}
impl Clone for WithCall {
fn clone(&self) -> WithCall {
WithCall {
fp: self.fp.clone(),
}
}
}
fn main() {
let adder = WithCall::new(Box::new(|a, b| a + b));
println!("{}", adder.run(1, 2));
let add_a = adder.clone();
let add_b = adder.clone();
let a = thread::spawn(move || {
println!("In remote thread: {}", add_a.run(10, 10));
});
let b = thread::spawn(move || {
println!("In remote thread: {}", add_b.run(10, 10));
});
a.join().expect("Thread A panicked");
b.join().expect("Thread B panicked");
}
I have a struct with a boxed closure in it, and I need to pass that struct to a number of threads. I can't, but I also can't clone it, because you can't clone a Box<Fn<>> and you can't clone a &Fn<...>.
What you are trying to do is call a closure from multiple threads. That is, share the closure across multiple threads. As soon as the phrase "share across multiple threads" crosses my mind, my first thought is to reach for Arc (at least until RFC 458 is implemented in some form, when & will become usable across threads).
This allows for safe shared memory (it implements Clone without requiring its internal type to be Clone, since Clone just creates a new pointer to the same memory), and so you can have a single Fn object that gets used in multiple threads, no need to duplicate it.
In summary, put your WithCall in an Arc and clone that.
use std::sync::Arc;
use std::thread;
type Fp = Box<dyn Fn(i8, i8) -> i8 + Send + Sync>; // `dyn` is required for trait objects in current editions
struct WithCall {
fp: Fp,
}
impl WithCall {
pub fn new(fp: Fp) -> WithCall {
WithCall { fp }
}
pub fn run(&self, a: i8, b: i8) -> i8 {
(self.fp)(a, b)
}
}
fn main() {
let adder = WithCall::new(Box::new(|a, b| a + b));
println!("{}", adder.run(1, 2));
let add_a = Arc::new(adder);
let add_b = add_a.clone();
let a = thread::spawn(move || {
println!("In remote thread: {}", add_a.run(10, 10));
});
let b = thread::spawn(move || {
println!("In remote thread: {}", add_b.run(10, 10));
});
a.join().expect("thread a panicked");
b.join().expect("thread b panicked");
}
Old answer (this is still relevant): It is quite unusual to have a &mut Fn trait object, since Fn::call takes &self. The mut is not necessary, and I think it adds literally zero extra functionality. Having a &mut Box<Fn()> does add some functionality, but it is also unusual.
If you change to a & pointer instead of an &mut things will work more naturally (with both &Fn and &Box<Fn>). Without seeing the actual code you're using, it's extremely hard to tell exactly what you're doing, but
fn call_it(f: &Fn()) {
(*f)();
(*f)();
}
fn use_closure(f: &Fn()) {
call_it(f);
call_it(f);
}
fn main() {
let x = 1i32;
use_closure(&|| println!("x is {}", x));
}
(This is partly due to &T being Copy and also partly due to reborrowing; it works with &mut as well.)
Alternatively, you can close-over the closure, which likely works in more situations:
fn foo(f: &Fn()) {
something_else(|| f())
}
A FnMut closure cannot be cloned, for obvious reasons.
There's no inherent reason a FnMut can't be cloned, it's just a struct with some fields (and a method that takes &mut self, rather than &self or self as for Fn and FnOnce respectively). If you create a struct and implement FnMut manually, you can still implement Clone for it.
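As a small illustration, assuming a compiler new enough that closures implement Clone when their captures do (Rust 1.26 or later, as far as I know), a stateful FnMut closure can be cloned and the clone carries its own copy of the state. The name make_counter is hypothetical.

```rust
/// Returns a counter closure that owns its state. The state is mutated on
/// every call, so the closure is `FnMut`, and it is still `Clone`.
fn make_counter() -> impl FnMut() -> u32 + Clone {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}

fn main() {
    let mut original = make_counter();
    original();
    original();

    // The clone starts from the state at the moment of cloning...
    let mut copy = original.clone();
    assert_eq!(copy(), 3);
    // ...and advancing it does not affect the original.
    assert_eq!(original(), 3);
}
```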
Or is it safe to somehow pass a raw pointer to a Fn around, like:
let func_pnt = &mut Box<Fn<...> + Send> as *mut Box<Fn<...>>
Technically the above works, but it seems quite weird.
Technically it works if you're careful to ensure the aliasing and lifetime requirements of Rust are satisfied... but by opting in to unsafe pointers you're putting that burden on yourself, not letting the compiler help you. It is relatively rare that the correct response to a compiler error is to use unsafe code, rather than delving in to the error and tweaking the code to make it make more sense (to the compiler, which often results in it making more sense to humans).
Rust 1.26
Closures implement both Copy and Clone if all of the captured variables do. You can rewrite your code to use generics instead of a boxed trait object to be able to clone it:
use std::thread;
#[derive(Clone)]
struct WithCall<F> {
fp: F,
}
impl<F> WithCall<F>
where
F: Fn(i8, i8) -> i8,
{
pub fn new(fp: F) -> Self {
WithCall { fp }
}
pub fn run(&self, a: i8, b: i8) -> i8 {
(self.fp)(a, b)
}
}
fn main() {
let adder = WithCall::new(|a, b| a + b);
println!("{}", adder.run(1, 2));
let add_a = adder.clone();
let add_b = adder;
let a = thread::spawn(move || {
println!("In remote thread: {}", add_a.run(10, 10));
});
let b = thread::spawn(move || {
println!("In remote thread: {}", add_b.run(10, 10));
});
a.join().expect("Thread A panicked");
b.join().expect("Thread B panicked");
}
Before Rust 1.26
Remember that closures capture their environment, so they have a lifetime of their own, based on the environment. However, you can take references to the Fn* and pass those around further, or store them in a struct:
fn do_more<F>(f: &F) -> u8
where
F: Fn(u8) -> u8,
{
f(0)
}
fn do_things<F>(f: F) -> u8
where
F: Fn(u8) -> u8,
{
// We can pass the reference to our closure around,
// effectively allowing us to use it multiple times.
f(do_more(&f))
}
fn main() {
let val = 2;
// The closure captures `val`, so it cannot live beyond that.
println!("{:?}", do_things(|x| (x + 1) * val));
}
I would say that it is not universally safe to convert the Fn* to a raw pointer and pass it around, due to the lifetime concerns.
Here is the working code in 1.22.1
The intent is to make this work.
let x = |x| { println!("----{}",x)};
let mut y = Box::new(x);
y.clone();
The original code as suggested at the top was used.
I started with cloning a Fn closure.
type Fp = Box<Fn(i8, i8) -> i8 + Send + Sync>;
Ended up adding Arc around Fp in the struct WithCall
Play rust : working code
Gist : working code in 1.22.1

Return a closure from a function

Note that this question pertains to a version of Rust before 1.0 was released
Do I understand correctly that it is now impossible to return a closure from a function, unless it was provided to the function in its arguments? It is a very useful approach, for example, when I need the same block of code, parameterized differently, in different parts of a program. Currently the compiler does not allow something like this, naturally:
fn make_adder(i: int) -> |int| -> int {
|j| i + j
}
The closure is allocated on the stack and is freed upon returning from a function, so it is impossible to return it.
Will it be possible to make this work in future? I heard that dynamically-sized types would allow this.
This can't ever work for a stack closure; it needs to either have no environment or own its environment. The DST proposals do include the possibility of reintroducing a closure type with an owned environment (~Fn), which would satisfy your need, but it is not clear yet whether that will happen or not.
In practice, there are other ways of doing this. For example, you might do this:
pub struct Adder {
n: int,
}
impl Add<int, int> for Adder {
#[inline]
fn add(&self, rhs: &int) -> int {
self.n + *rhs
}
}
fn make_adder(i: int) -> Adder {
Adder {
n: i,
}
}
Then, instead of make_adder(3)(4) == 7, it would be make_adder(3) + 4 == 7, or make_adder(3).add(&4) == 7. (It implements Add<int, int>, rather than just impl Adder { fn add(&self, other: int) -> int { self.n + other } }, merely to allow you the convenience of the + operator.)
This is a fairly silly example, as the Adder might just as well be an int in all probability, but it has its possibilities.
Let us say that you want to return a counter; you might wish to have it as a function which returns (0, func), the latter element being a function which will return (1, func), &c. But this can be better modelled with an iterator:
use std::num::{Zero, One};
struct Counter<T> {
value: T,
}
impl<T: Add<T, T> + Zero + One + Clone> Counter<T> {
fn new() -> Counter<T> {
Counter { value: Zero::zero() }
}
}
impl<T: Add<T, T> + Zero + One + Clone> Iterator<T> for Counter<T> {
#[inline]
fn next(&mut self) -> Option<T> {
let mut value = self.value.clone();
self.value += One::one();
Some(value)
}
// Optional, just for a modicum of efficiency in some places
#[inline]
fn size_hint(&self) -> (uint, Option<uint>) {
(uint::max_value, None)
}
}
Again, you see the notion of having an object upon which you call a method to mutate its state and return the desired value, rather than creating a new callable. And that's how it is: for the moment, where you might like to be able to call object(), you need to call object.method(). I'm sure you can live with that minor inconvenience that exists just at present.
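For readers on a post-1.0 compiler: the answer above describes the language as it was at the time. As far as I know, current Rust lets a function return a closure that owns its environment, either as an opaque impl Fn type or boxed as a trait object. A minimal sketch, with make_boxed_adder being a name I made up:

```rust
/// Returns a closure that owns `i`, with no allocation.
fn make_adder(i: i32) -> impl Fn(i32) -> i32 {
    move |j| i + j
}

/// Boxed variant, handy when different closures must share one type.
fn make_boxed_adder(i: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |j| i + j)
}

fn main() {
    assert_eq!(make_adder(3)(4), 7);

    // Boxed closures of the same signature can live in one collection.
    let adders = vec![make_boxed_adder(1), make_boxed_adder(10)];
    let sums: Vec<i32> = adders.iter().map(|add| add(5)).collect();
    assert_eq!(sums, vec![6, 15]);
}
```

The move keyword is what makes the closure own i instead of borrowing from the stack frame that is about to go away.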
