How to match any of a Vector of strings with nom? - parsing

I'm trying to create a parser with nom that will parse some text that could be one of many options. Nom has alt! for when the values are known at compile-time, but my values won't be.
This has been my attempt to create my own parser that can take a Vec<String> to match against, and I'm running into a couple issues.
#[macro_use]
extern crate nom;
use nom::IResult;
fn alternative_wrapper<'a>(input: &'a [u8], alternatives: Vec<String>) -> IResult<&'a [u8], &'a [u8]> {
for alternative in alternatives {
// tag!("alternative");
println!("{}", alternative);
}
return IResult::Done(input, "test".as_bytes());
}
#[test]
fn test_date() {
let input = "May";
named!(alternative, call!(alternative_wrapper));
let months = vec!(
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December"
).iter().map(|s| s.to_string()).collect();
println!("{:?}", alternative("May".as_bytes(), months));
}
I'm aware that my alternative_wrapper function doesn't actually do anything useful, but that's not the problem. This is what Rust complains about for this snippet:
error[E0061]: this function takes 1 parameter but 2 parameters were supplied
--> src/parser.rs:32:34
|
17 | named!(alternative, call!(alternative_wrapper));
| ------------------------------------------------ defined here
...
32 | println!("{:?}", alternative("May".as_bytes(), months));
| ^^^^^^^^^^^^^^^^^^^^^^^^ expected 1 parameter
|
= note: this error originates in a macro outside of the current crate
error[E0061]: this function takes 2 parameters but 1 parameter was supplied
--> src/parser.rs:17:5
|
6 | / fn alternative_wrapper<'a>(input: &'a [u8], alternatives: Vec<String>) -> IResult<&'a [u8], &'a
[u8]> {
7 | | for alternative in alternatives {
8 | | // tag!("alternative");
9 | | println!("{}", alternative);
10 | | }
11 | | return IResult::Done(input, "test".as_bytes());
12 | | }
| |_- defined here
...
17 | named!(alternative, call!(alternative_wrapper));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected 2 parameters
|
= note: this error originates in a macro outside of the current crate
How I can create a parser out of my function? And how I can use existing parsers such as tag! from within alternative_wrapper?

Starting with the errors, the first error is due to named! only taking a single argument, namely the input string. named! will declare a function for you, in this case with the signature fn(&[u8]) -> IResult<&[u8],&[u8]>. There's no magic regarding any other arguments, so trying to pass your months vector as a second argument will not work. There's a variant of named! called named_args! that can be used to declare functions with more arguments than just the input that should sort that out.
The second error is similar but reversed. You're calling the alternative_wrapper with only the input and no vector via call!. The call! macro can actually pass arguments, but you must do it explicitly, i.e call!(myparser, monts).
With the reasons to the errors sorted out, you're asking about how to create a parser. Well, actually, alternative_wrapper is already a nom parser by signature, but since you didn't declare it via a nom macro, none of the magic input passing happens, which is why tag! doesn't work in the function body when you've tried.
In order to use other combinators in a function that you have declared yourself, you have to pass the input manually to the outermost macro. In this case it's only tag!, but if you were to use, say, do_parse! and then multiple macros within that, you'd only need to pass the input to do_parse!. I'll provide a working version with a few additional tweaks here:
#[macro_use]
extern crate nom;
use std::str;
use nom::IResult;
fn alternative<'a>(input: &'a [u8], alternatives: &Vec<String>) -> IResult<&'a [u8], &'a [u8]> {
for alternative in alternatives {
match tag!(input, alternative.as_bytes()) {
done#IResult::Done(..) => return done,
_ => () // continue
}
}
IResult::Error(nom::ErrorKind::Tag) // nothing found.
}
fn main() {
let months: Vec<String> = vec![
"January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December"
].into_iter().map(String::from).collect();
fn print_res(r: IResult<&[u8],&[u8]>) {
println!("{:?}", r);
println!("{:?}\n", str::from_utf8(r.unwrap().1).unwrap());
}
print_res(alternative(b"May", &months));
print_res(alternative(b"August", &months));
print_res(alternative(b"NoGood", &months));
}
You can check it out in the rust playground.

I'm not really familiar with nom, and still learning Rust, but I have used parser combinators in the past.
Caveats aside, it looks like the named! macro generates a function that only takes one parameter, the string to parse.
To satisfy nom's expectations, I think I'd look at writing alternative_wrapper as a function that returns a function instead. The test would end up looking like this:
#[test]
fn test_date() {
let months = vec!(
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December"
).iter().map(|s| s.to_string()).collect();
let parser = generate_alternative_parser(months);
named!(alternative, call!(parser));
println!("{:?}", alternative("May".as_bytes()));
}
It looks like you'd need to construct an alt! expression from tag!s, but it's not immediately obvious to me from the docs how you'd do that.
Where does your list of options ultimately come from?
Depending on exactly what you are trying to accomplish, there might be some other ways to accomplish what you're trying to do as well. For example, you might be able to parse any word and then validate it against one of your options afterwards.

With Nom 4, here is a fully generic input that works no matter what your parser operates on:
/// Dynamic version of `alt` that takes a slice of strings
fn alternative<T>(input: T, alternatives: &[&'static str]) -> IResult<T, T>
where
T: InputTake,
T: Compare<&'static str>,
T: InputLength,
T: AtEof,
T: Clone,
{
let mut last_err = None;
for alternative in alternatives {
let inp = input.clone();
match tag!(inp, &**alternative) {
done # Ok(..) => return done,
err # Err(..) => last_err = Some(err), // continue
}
}
last_err.unwrap()
}
/// Usage
named!(test<Span, Span>,
call!(alternative, &["a", "b", "c"])
);

Related

Are dependency injection containers that return references to values allocated in a local function fundamentally impossible in Rust?

In my mind one of the ideal traits for a dependency injection container would look like:
pub trait ResolveOwn<T> {
fn resolve(&self) -> T;
}
I don't know how to implement this for certain T. I keep stubbing my toes on variations and cousins of this error:
error[E0515]: cannot return value referencing local variable `X`
I'm used to dependency injection in C# where returning values referencing local variables is precisely how you implement the equivalent of that resolve function.
Here's an illustration that focuses on this aspect of dependency injection:
struct ComplexThing<'a>(&'a i32);
struct Module();
impl Module {
fn resolve_foo(&self) -> i32 {
todo!()
}
pub fn resolve_complex_thing_1(&self) -> ComplexThing {
let foo = self.resolve_foo();
ComplexThing(&foo)
}
}
error[E0515]: cannot return value referencing local variable `foo`
--> src/lib.rs:12:9
|
12 | ComplexThing(&foo)
| ^^^^^^^^^^^^^----^
| | |
| | `foo` is borrowed here
| returns a value referencing data owned by the current function
See? There's that error.
My first instinct (again, coming from C#) is to give the local variable a place to live in the returned value, because the local variable is created here but it needs to live at least as long as the returned value. Hmm... that sounds sort of like returning a closure. Let's see how that goes...
pub fn resolve_complex_thing_2<'a>(&'a self) -> impl FnOnce() -> ComplexThing<'a> {
let foo = self.resolve_foo();
move || ComplexThing(&foo)
}
error[E0515]: cannot return value referencing local data `foo`
--> src/lib.rs:12:17
|
12 | move || ComplexThing(&foo)
| ^^^^^^^^^^^^^----^
| | |
| | `foo` is borrowed here
| returns a value referencing data owned by the current function
No joy. It doesn't work to package this closure up into a prettier type (like some impl of Into<ComplexThing<'a>>) because it's fundamentally about returning a value referencing local data.
My next instinct is to somehow jam the local data into some kind of weak cache inside my Module and then get a reference from there (undoubtedly unsafely). And then the weak cache will need to solve half of the hard problems in Computer Science (hint: the other hard problem is naming things). That's starting to sound an awful lot like... oh no. Garbage collection!
I also thought about inverting the flow of control. It's hideous and still doesn't work:
impl Module {
pub fn use_foo<T>(&self, f: impl FnOnce(i32) -> T) -> T {
(f)(42)
}
pub fn use_complex_thing<'a, T>(&'a self, f: impl FnOnce(ComplexThing<'a>) -> T) -> T {
self.use_foo(
|foo| (f)(ComplexThing(&foo)),
)
}
}
error[E0597]: `foo` does not live long enough
--> src/lib.rs:12:36
|
10 | pub fn use_complex_thing<'a, T>(&'a self, f: impl FnOnce(ComplexThing<'a>) -> T) -> T {
| -- lifetime `'a` defined here
11 | self.use_foo(
12 | |foo| (f)(ComplexThing(&foo)),
| -----------------^^^^--
| | | |
| | | `foo` dropped here while still borrowed
| | borrowed value does not live long enough
| argument requires that `foo` is borrowed for `'a`
My last instinct is to hack around the restriction against moving a value with active borrows, because then I could trick the compiler. My attempts at implementing that resulted in a type that's impossible to use correctly — it ended up requiring knowledge that only the compiler has and seemed to introduce undefined behavior at every turn. I won't bother reproducing that code here.
It seems like it's impossible to return any owned instances of types containing (non-singleton) references.
Assuming that's true, that means there are entire classes of types that simply cannot be created with a dependency injection container in Rust.
Surely I'm missing something?
You can try making a drop guard along with Box::leak to leak a reference to live long enough, then have custom behavior on Drop to reclaim the leaked memory. Note that this will require you to do everything through the drop guard:
use std::marker::PhantomData;
use std::mem::ManuallyDrop;
struct ComplexThing<'a>(&'a i32);
struct Module;
pub struct DropGuard<'a, T: 'a, V: 'a> {
// do NOT make these fields pub
// direct manipulation of these is very unsafe
container: ManuallyDrop<T>,
value: *mut V,
// I'm not sure this is needed but better safe than sorry
_value: PhantomData<&'a mut V>,
}
impl<'a, T: 'a, V: 'a> DropGuard<'a, T, V> {
pub fn new<F: FnOnce(&'a mut V) -> T>(value: Box<V>, gen: F) -> Self {
// leak the value so it lives long enough
let leaked = Box::leak(value);
// get a pointer to know what to drop
let leaked_ptr: *mut _ = leaked;
DropGuard {
container: ManuallyDrop::new(gen(leaked)),
value: leaked_ptr,
_value: PhantomData,
}
}
}
// so you can actually use it
// no DerefMut since dropping the container without dropping the guard is weird
impl<'a, T: 'a, V: 'a> std::ops::Deref for DropGuard<'a, T, V> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.container
}
}
impl<'a, T: 'a, V: 'a> Drop for DropGuard<'a, T, V> {
fn drop(&mut self) {
// drop the container first
// this should be safe since self.container is never referenced again
// the value its borrowing is still valid (due to not being dropped yet)
// and there should be no references to it (due to this struct being dropped)
unsafe {
ManuallyDrop::drop(&mut self.container);
}
// now drop the pointer
// this should be safe since it was created with Box::leak
// and the container borrowing it has already been dropped
// and no more references should have survived
std::mem::drop(unsafe { Box::from_raw(self.value) });
}
}
impl Module {
pub fn resolve_foo(&self) -> i32 {
5
}
pub fn resolve_complex_thing_1(&self) -> DropGuard<ComplexThing, i32> {
DropGuard::new(Box::new(self.resolve_foo()), |i32_ref| {
ComplexThing(i32_ref)
})
}
}
fn main() {
let module = Module;
let guard = module.resolve_complex_thing_1();
println!("{:?}", guard.0);
}
Playground link
Another way that also cleans up the typing is to use a trait:
use std::marker::PhantomData;
use std::mem::ManuallyDrop;
struct ComplexThing<'a>(&'a i32);
struct Module;
// not sure if this trait should be unsafe
// but again, better safe than sorry
pub unsafe trait Guardable {
type Value;
}
unsafe impl Guardable for ComplexThing<'_> {
type Value = i32;
}
pub struct DropGuard<'a, T: 'a + Guardable> {
// do NOT make these fields pub
// direct manipulation of these is very unsafe
container: ManuallyDrop<T>,
value: *mut T::Value,
// I'm not sure this is needed but better safe than sorry
_value: PhantomData<&'a mut T::Value>,
}
impl<'a, T: 'a + Guardable> DropGuard<'a, T> {
pub fn new<F: FnOnce(&'a mut T::Value) -> T>(value: Box<T::Value>, gen: F) -> Self {
// leak the value so it lives long enough
let leaked = Box::leak(value);
// get a pointer to know what to drop
let leaked_ptr: *mut _ = leaked;
DropGuard {
container: ManuallyDrop::new(gen(leaked)),
value: leaked_ptr,
_value: PhantomData,
}
}
}
// so you can actually use it
// no DerefMut since dropping the container without dropping the guard is weird
impl<'a, T: 'a + Guardable> std::ops::Deref for DropGuard<'a, T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.container
}
}
impl<'a, T: 'a + Guardable> Drop for DropGuard<'a, T> {
fn drop(&mut self) {
// drop the container first
// this should be safe since self.container is never referenced again
// the value its borrowing is still valid (due to not being dropped yet)
// and there should be no references to it (due to this struct being dropped)
unsafe {
ManuallyDrop::drop(&mut self.container);
}
// now drop the pointer
// this should be safe since it was created with Box::leak
// and the container borrowing it has already been dropped
// and no more references should have survived
std::mem::drop(unsafe { Box::from_raw(self.value) });
}
}
impl Module {
pub fn resolve_foo(&self) -> i32 {
5
}
pub fn resolve_complex_thing_1(&self) -> DropGuard<ComplexThing> {
DropGuard::new(Box::new(self.resolve_foo()), |i32_ref| {
ComplexThing(i32_ref)
})
}
}
fn main() {
let module = Module;
let guard = module.resolve_complex_thing_1();
println!("{:?}", guard.0);
}
Playground link
Since every container should only have one valid DropGuard value type, you can put that in an associated type in a trait, so now you can work with DropGuard<ComplexThing> instead of DropGuard<ComplexThing, i32>, and this also prevents you from having bogus values in the DropGuard.
Disclaimer: I'm fairly new to Rust; this answer is based on limited experience and may be un-nuanced.
As a general principle of Rust program design — not specific to dependency injection — you should plan not to use references except for things that are one of:
temporary, i.e. confined to the life of some stack frame (or technically longer than that in the case of async functions, but you get the idea, I hope)
compile-time constants or lazily initialized singletons, i.e. &'static references
The reason is that Rust does not — without various trickery — support lifetimes that are not one of those two cases.
Any structures which are needed for longer durations than that should be designed to not contain non-'static references — and instead use owned values. In other words, let your DI be like
pub trait ResolveOwn<T: 'static> {
// ^^^^^^^^^^
fn resolve(&self) -> T;
}
Don't actually add that lifetime constraint: it doesn't buy you anything, and might be inconvenient (for example, in a test that wants to inject things referring to the test — which will work fine since they live longer than the entire DI container — or if the application's actual main() has something to share, similarly). But plan as if it were there.
Given this constraint, how can you implement things that seem to want references?
In the simplest cases, just use owned values and don't worry about any extra cloning required unless it proves to be a performance issue.
Use Rc or Arc for reference-counted smart pointers that keep things alive as long as necessary.
If some T really requires references, but only into its own data, use a safe self-referential struct helper like ouroboros. (I believe this is similar to but more general than the suggestion in Aplet123's answer to this question.)
All of these strategies are independent of the DI container: they're ways to make a type satisfy the 'static lifetime bound ("contains no references except 'static ones").

What is a macro for concatenating an arbitrary number of components to build a path in Rust?

In Python, a function called os.path.join() allows concatenating multiple strings into one path using the path separator of the operating system. In Rust, there is only a function join() that appends a string or a path to an existing path. This problem can't be solved with a normal function as a normal function needs to have a fixed number of arguments.
I'm looking for a macro that takes an arbitrary number of strings and paths and returns the joined path.
There's a reasonably simple example in the documentation for PathBuf:
use std::path::PathBuf;
let path: PathBuf = [r"C:\", "windows", "system32.dll"].iter().collect();
Once you read past the macro syntax, it's not too bad. Basically, we take require at least two arguments, and the first one needs to be convertible to a PathBuf via Into. Each subsequent argument is pushed on the end, which accepts anything that can be turned into a reference to a Path.
macro_rules! build_from_paths {
($base:expr, $($segment:expr),+) => {{
let mut base: ::std::path::PathBuf = $base.into();
$(
base.push($segment);
)*
base
}}
}
fn main() {
use std::{
ffi::OsStr,
path::{Path, PathBuf},
};
let a = build_from_paths!("a", "b", "c");
println!("{:?}", a);
let b = build_from_paths!(PathBuf::from("z"), OsStr::new("x"), Path::new("y"));
println!("{:?}", b);
}
A normal function which takes an iterable (e.g. a slice) can solve the problem in many contexts:
use std::path::{Path, PathBuf};
fn join_all<P, Ps>(parts: Ps) -> PathBuf
where
Ps: IntoIterator<Item = P>,
P: AsRef<Path>,
{
parts.into_iter().fold(PathBuf::new(), |mut acc, p| {
acc.push(p);
acc
})
}
fn main() {
let parts = vec!["/usr", "bin", "man"];
println!("{:?}", join_all(&parts));
println!("{:?}", join_all(&["/etc", "passwd"]));
}
Playground

How to get the output for several sequential nom parsers when the input is a &str?

This question is almost identical to Capture the entire contiguous matched input with nom, but I have to parse UTF-8 text as input (&str) not just bytes (&[u8]). I am trying to get the whole match for several parsers:
named!(parse <&str, &str>,
recognize!(
chain!(
is_not_s!(".") ~
tag_s!(".") ~
is_not_s!( "./ \r\n\t" ),
|| {}
)
)
);
And it causes this error:
no method named "offset" found for type "&str" in the current scope
Is the only way to do this to switch to &[u8] as input and then do map_res!?
there's an Offset trait implementation for &str that will be available in the next version of nom. There is no planned release date yet for nom 2.0, so in the meantime, you can copy the implementation in your code:
use nom::Offset;
impl Offset for str {
fn offset(&self, second: &Self) -> usize {
let fst = self.as_ptr();
let snd = second.as_ptr();
snd as usize - fst as usize
}
}

Return something that's allocated on the stack

I have the following simplified code, where a struct A contains a certain attribute. I'd like to create new instances of A from an existing version of that attribute, but how do I make the lifetime of the attribute's new value last past the function call?
pub struct A<'a> {
some_attr: &'a str,
}
impl<'a> A<'a> {
fn combine(orig: &'a str) -> A<'a> {
let attr = &*(orig.to_string() + "suffix");
A { some_attr: attr }
}
}
fn main() {
println!("{}", A::combine("blah").some_attr);
}
The above code produces
error[E0597]: borrowed value does not live long enough
--> src/main.rs:7:22
|
7 | let attr = &*(orig.to_string() + "suffix");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ does not live long enough
8 | A { some_attr: attr }
9 | }
| - temporary value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the impl at 5:1...
--> src/main.rs:5:1
|
5 | / impl<'a> A<'a> {
6 | | fn combine(orig: &'a str) -> A<'a> {
7 | | let attr = &*(orig.to_string() + "suffix");
8 | | A { some_attr: attr }
9 | | }
10| | }
| |_^
This question most certainly was answered before, but I'm not closing it as a duplicate because the code here is somewhat different and I think it is important.
Note how you defined your function:
fn combine(orig: &'a str) -> A<'a>
It says that it will return a value of type A whose insides live exactly as long as the provided string. However, the body of the function violates this declaration:
let attr = &*(orig.to_string() + "suffix");
A {
some_attr: attr
}
Here you construct a new String obtained from orig, take a slice of it and try to return it inside A. However, the lifetime of the implicit variable created for orig.to_string() + "suffix" is strictly smaller than the lifetime of the input parameter. Therefore, your program is rejected.
Another, more practical way to look at this is consider that the string created by to_string() and concatenation has to live somewhere. However, you only return a borrowed slice of it. Thus when the function exits, the string is destroyed, and the returned slice becomes invalid. This is exactly the situation which Rust prevents.
To overcome this you can either store a String inside A:
pub struct A {
some_attr: String
}
or you can use std::borrow::Cow to store either a slice or an owned string:
pub struct A<'a> {
some_attr: Cow<'a, str>
}
In the last case your function could look like this:
fn combine(orig: &str) -> A<'static> {
let attr = orig.to_owned() + "suffix";
A {
some_attr: attr.into()
}
}
Note that because you construct the string inside the function, it is represented as an owned variant of Cow and so you can use 'static lifetime parameter for the resulting value. Tying it to orig is also possible but there is no reason to do so.
With Cow it is also possible to create values of A directly out of slices without allocations:
fn new(orig: &str) -> A {
A { some_attr: orig.into() }
}
Here the lifetime parameter of A will be tied (through lifetime elision) to the lifetime of the input string slice. In this case the borrowed variant of Cow is used, and no allocation is done.
Also note that it is better to use to_owned() or into() to convert string slices to Strings because these methods do not require formatting code to run and so they are more efficient.
how can you return an A of lifetime 'static when you're creating it on the fly? Not sure what "owned variant of Cow" means and why that makes 'static possible.
Here is the definition of Cow:
pub enum Cow<'a, B> where B: 'a + ToOwned + ?Sized {
Borrowed(&'a B),
Owned(B::Owned),
}
It looks complex but it is in fact simple. An instance of Cow may either contain a reference to some type B or an owned value which could be derived from B via the ToOwned trait. Because str implements ToOwned where Owned associated type equals to String (written as ToOwned<Owned = String>, when this enum is specialized for str, it looks like this:
pub enum Cow<'a, str> {
Borrowed(&'a str),
Owned(String)
}
Therefore, Cow<str> may represent either a string slice or an owned string - and while Cow does indeed provide methods for clone-on-write functionality, it is just as often used to hold a value which can be either borrowed or owned in order to avoid extra allocations. Because Cow<'a, B> implements Deref<Target = B>, you can get &B from Cow<'a, B> with simple reborrowing: if x is Cow<str>, then &*x is &str, regardless of what is contained inside of x - naturally, you can get a slice out of both variants of Cow.
You can see that the Cow::Owned variant does not contain any references inside it, only String. Therefore, when a value of Cow is created using Owned variant, you can choose any lifetime you want (remember, lifetime parameters are much like generic type parameters; in particular, it is the caller who gets to choose them) - there are no restrictions on it. So it makes sense to choose 'static as the greatest lifetime possible.
Does orig.to_owned remove ownership from whoever's calling this function? That sounds like it would be inconvenient.
The to_owned() method belongs to ToOwned trait:
pub trait ToOwned {
type Owned: Borrow<Self>;
fn to_owned(&self) -> Self::Owned;
}
This trait is implemented by str with Owned equal to String. to_owned() method returns an owned variant of whatever value it is called on. In this particular case, it creates a String out of &str, effectively copying contents of the string slice into a new allocation. Therefore no, to_owned() does not imply ownership transfer, it's more like it implies a "smart" clone.
As far as I can tell String implements Into<Vec<u8>> but not str, so how can we call into() in the 2nd example?
The Into trait is very versatile and it is implemented for lots of types in the standard library. Into is usually implemented through the From trait: if T: From<U>, then U: Into<T>. There are two important implementations of From in the standard library:
impl<'a> From<&'a str> for Cow<'a, str>
impl<'a> From<String> for Cow<'a, str>
These implementations are very simple - they just return Cow::Borrowed(value) if value is &str and Cow::Owned(value) if value is String.
This means that &'a str and String implement Into<Cow<'a, str>>, and so they can be converted to Cow with into() method. That's exactly what happens in my example - I'm using into() to convert String or &str to Cow<str>. Without this explicit conversion you will get an error about mismatched types.

How can I move a captured variable into a closure within a closure?

This code is an inefficient way of producing a unique set of items from an iterator. To accomplish this, I am attempting to use a Vec to keep track of values I've seen. I believe that this Vec needs to be owned by the innermost closure:
fn main() {
let mut seen = vec![];
let items = vec![vec![1i32, 2], vec![3], vec![1]];
let a: Vec<_> = items
.iter()
.flat_map(move |inner_numbers| {
inner_numbers.iter().filter_map(move |&number| {
if !seen.contains(&number) {
seen.push(number);
Some(number)
} else {
None
}
})
})
.collect();
println!("{:?}", a);
}
However, compilation fails with:
error[E0507]: cannot move out of captured outer variable in an `FnMut` closure
--> src/main.rs:8:45
|
2 | let mut seen = vec![];
| -------- captured outer variable
...
8 | inner_numbers.iter().filter_map(move |&number| {
| ^^^^^^^^^^^^^^ cannot move out of captured outer variable in an `FnMut` closure
This is a little surprising, but isn't a bug.
flat_map takes a FnMut as it needs to call the closure multiple times. The code with move on the inner closure fails because that closure is created multiple times, once for each inner_numbers. If I write the closures in explicit form (i.e. a struct that stores the captures and an implementation of one of the closure traits) your code looks (a bit) like
struct OuterClosure {
seen: Vec<i32>
}
struct InnerClosure {
seen: Vec<i32>
}
impl FnMut(&Vec<i32>) -> iter::FilterMap<..., InnerClosure> for OuterClosure {
fn call_mut(&mut self, (inner_numbers,): &Vec<i32>) -> iter::FilterMap<..., InnerClosure> {
let inner = InnerClosure {
seen: self.seen // uh oh! a move out of a &mut pointer
};
inner_numbers.iter().filter_map(inner)
}
}
impl FnMut(&i32) -> Option<i32> for InnerClosure { ... }
Which makes the illegality clearer: attempting to move out of the &mut OuterClosure variable.
Theoretically, just capturing a mutable reference is sufficient, since the seen is only being modified (not moved) inside the closure. However things are too lazy for this to work...
error: lifetime of `seen` is too short to guarantee its contents can be safely reborrowed
--> src/main.rs:9:45
|
9 | inner_numbers.iter().filter_map(|&number| {
| ^^^^^^^^^
|
note: `seen` would have to be valid for the method call at 7:20...
--> src/main.rs:7:21
|
7 | let a: Vec<_> = items.iter()
| _____________________^
8 | | .flat_map(|inner_numbers| {
9 | | inner_numbers.iter().filter_map(|&number| {
10| | if !seen.contains(&number) {
... |
17| | })
18| | .collect();
| |__________________^
note: ...but `seen` is only valid for the lifetime as defined on the body at 8:34
--> src/main.rs:8:35
|
8 | .flat_map(|inner_numbers| {
| ___________________________________^
9 | | inner_numbers.iter().filter_map(|&number| {
10| | if !seen.contains(&number) {
11| | seen.push(number);
... |
16| | })
17| | })
| |_________^
Removing the moves makes the closure captures work like
struct OuterClosure<'a> {
seen: &'a mut Vec<i32>
}
struct InnerClosure<'a> {
seen: &'a mut Vec<i32>
}
impl<'a> FnMut(&Vec<i32>) -> iter::FilterMap<..., InnerClosure<??>> for OuterClosure<'a> {
fn call_mut<'b>(&'b mut self, inner_numbers: &Vec<i32>) -> iter::FilterMap<..., InnerClosure<??>> {
let inner = InnerClosure {
seen: &mut *self.seen // can't move out, so must be a reborrow
};
inner_numbers.iter().filter_map(inner)
}
}
impl<'a> FnMut(&i32) -> Option<i32> for InnerClosure<'a> { ... }
(I've named the &mut self lifetime in this one, for pedagogical purposes.)
This case is definitely more subtle. The FilterMap iterator stores the closure internally, meaning any references in the closure value (that is, any references it captures) have to be valid as long as the FilterMap values are being thrown around, and, for &mut references, any references have to be careful to be non-aliased.
The compiler can't be sure flat_map won't, e.g. store all the returned iterators in a Vec<FilterMap<...>> which would result in a pile of aliased &muts... very bad! I think this specific use of flat_map happens to be safe, but I'm not sure it is in general, and there's certainly functions with the same style of signature as flat_map (e.g. map) would definitely be unsafe. (In fact, replacing flat_map with map in the code gives the Vec situation I just described.)
For the error message: self is effectively (ignoring the struct wrapper) &'b mut (&'a mut Vec<i32>) where 'b is the lifetime of &mut self reference and 'a is the lifetime of the reference in the struct. Moving the inner &mut out is illegal: can't move an affine type like &mut out of a reference (it would work with &Vec<i32>, though), so the only choice is to reborrow. A reborrow is going through the outer reference and so cannot outlive it, that is, the &mut *self.seen reborrow is a &'b mut Vec<i32>, not a &'a mut Vec<i32>.
This makes the inner closure have type InnerClosure<'b>, and hence the call_mut method is trying to return a FilterMap<..., InnerClosure<'b>>. Unfortunately, the FnMut trait defines call_mut as just
pub trait FnMut<Args>: FnOnce<Args> {
extern "rust-call" fn call_mut(&mut self, args: Args) -> Self::Output;
}
In particular, there's no connection between the lifetime of the self reference itself and the returned value, and so it is illegal to try to return InnerClosure<'b> which has that link. This is why the compiler is complaining that the lifetime is too short to be able to reborrow.
This is extremely similar to the Iterator::next method, and the code here is failing for basically the same reason that one cannot have an iterator over references into memory that the iterator itself owns. (I imagine a "streaming iterator" (iterators with a link between &mut self and the return value in next) library would be able to provide a flat_map that works with the code nearly written: would need "closure" traits with a similar link.)
Work-arounds include:
the RefCell suggested by Renato Zannon, which allows seen to be borrowed as a shared &. The desugared closure code is basically the same other than changing the &mut Vec<i32> to &Vec<i32>. This change means "reborrow" of the &'b mut &'a RefCell<Vec<i32>> can just be a copy of the &'a ... out of the &mut. It's a literal copy, so the lifetime is retained.
avoiding the laziness of iterators, to avoid returning the inner closure, specifically.collect::<Vec<_>>()ing inside the loop to run through the whole filter_map before returning.
fn main() {
let mut seen = vec![];
let items = vec![vec![1i32, 2], vec![3], vec![1]];
let a: Vec<_> = items
.iter()
.flat_map(|inner_numbers| {
inner_numbers
.iter()
.filter_map(|&number| if !seen.contains(&number) {
seen.push(number);
Some(number)
} else {
None
})
.collect::<Vec<_>>()
.into_iter()
})
.collect();
println!("{:?}", a);
}
I imagine the RefCell version is more efficient.
It seems that the borrow checker is getting confused at the nested closures + mutable borrow. It might be worth filing an issue. Edit: See huon's answer for why this isn't a bug.
As a workaround, it's possible to resort to RefCell here:
use std::cell::RefCell;
fn main() {
let seen = vec![];
let items = vec![vec![1i32, 2], vec![3], vec![1]];
let seen_cell = RefCell::new(seen);
let a: Vec<_> = items
.iter()
.flat_map(|inner_numbers| {
inner_numbers.iter().filter_map(|&number| {
let mut borrowed = seen_cell.borrow_mut();
if !borrowed.contains(&number) {
borrowed.push(number);
Some(number)
} else {
None
}
})
})
.collect();
println!("{:?}", a);
}
I came across this question after I ran into a similar issue, using flat_map and filter_map. I solved it by moving the filter_map outside the flat_map closure.
Using your example:
let a: Vec<_> = items
.iter()
.flat_map(|inner_numbers| inner_numbers.iter())
.filter_map(move |&number| {
if !seen.contains(&number) {
seen.push(number);
Some(number)
} else {
None
}
})
.collect();

Resources