Why is the value moved into the closure here rather than borrowed? - closures

The Error Handling chapter of the Rust Book contains an example on how to use the combinators of Option and Result. A file is read and through application of a series of combinators the contents are parsed as an i32 and returned in a Result<i32, String>.
I got confused when I looked at the code. Inside the closure passed to one and_then, a local String value is created and subsequently passed as a return value to another combinator.
Here is the code example:
use std::fs::File;
use std::io::Read;
use std::path::Path;

fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, String> {
    File::open(file_path)
        .map_err(|err| err.to_string())
        .and_then(|mut file| {
            let mut contents = String::new(); // local value
            file.read_to_string(&mut contents)
                .map_err(|err| err.to_string())
                .map(|_| contents) // moved without 'move'
        })
        .and_then(|contents| {
            contents.trim().parse::<i32>()
                .map_err(|err| err.to_string())
        })
        .map(|n| 2 * n)
}

fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {}", err),
    }
}
The value I am referring to is contents. It is created and later referenced in the map combinator applied to the std::io::Result<usize> return value of Read::read_to_string.
The question: I thought that not marking the closure with move would borrow any referenced value by default, which would result in the borrow checker complaining that contents does not live long enough. However, this code compiles just fine. That means the String contents is moved into, and subsequently out of, the closure. Why is this done without the explicit move?

I thought that not marking the closure with move would borrow any referenced value by default,
Not quite. The compiler does a bit of inspection on the code within the closure body and tracks how the closed-over variables are used.
When the compiler sees that a method is called on a variable, then it looks to see what type the receiver is (self, &self, &mut self). When a variable is used as a parameter, the compiler also tracks if it is by value, reference, or mutable reference. Whatever the most restrictive requirement is will be what is used by default.
Occasionally, this analysis is not complete enough — even though the variable is only used as a reference, we intend for the closure to own the variable. This usually occurs when returning a closure or handing it off to another thread.
In this case, the variable is returned from the closure, which must mean that it is used by value. Thus the variable will be moved into the closure automatically.
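To make the inference concrete, here is a minimal sketch of the three capture modes. The function and variable names (capture_modes, by_ref, by_mut, by_value) are made up for illustration and do not come from the Book example.

```rust
// Returns what each closure produced so the behaviour can be checked.
fn capture_modes() -> (usize, i32, String) {
    let contents = String::from("42\n");

    // `len` takes &self, so `contents` is only borrowed immutably.
    let by_ref = || contents.len();
    let len = by_ref();

    let mut count = 0;
    // `+=` needs mutable access, so `count` is borrowed mutably.
    let mut by_mut = || count += 1;
    by_mut();
    by_mut();

    // Returning `contents` uses it by value, so it is moved into the
    // closure even though there is no `move` keyword.
    let by_value = || contents;
    let moved = by_value();
    // `contents` can no longer be used here; it now lives in `moved`.

    (len, count, moved)
}

fn main() {
    let (len, count, moved) = capture_modes();
    println!("{} {} {:?}", len, count, moved);
}
```

The last closure mirrors .map(|_| contents) from the question: the body hands the String back by value, so the compiler picks a by-value capture on its own.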
Occasionally the move keyword is too big of a hammer as it moves all of the referenced variables in. Sometimes you may want to just force one variable to be moved in but not others. In that case, the best solution I know of is to make an explicit reference and move the reference in:
fn main() {
    let a = 1;
    let b = 2;
    {
        let b = &b;
        needs_to_own_a(move || a_function(a, b));
    }
}
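The snippet above leaves needs_to_own_a and a_function undefined. Here is a self-contained variant of the same trick, as a rough sketch: describe and run_selective_move are hypothetical names, and a String is used for a so that the move is observable (an integer would just be copied).

```rust
// Hypothetical helper: takes `a` by value and `b` by reference.
fn describe(a: String, b: &[i32]) -> String {
    format!("{} has {} items", a, b.len())
}

fn run_selective_move() -> (String, Vec<i32>) {
    let a = String::from("list");
    let b = vec![1, 2, 3];

    let result = {
        // Shadow `b` with a reference. `move` then moves the reference
        // into the closure, not the Vec itself.
        let b = &b;
        let closure = move || describe(a, b);
        closure()
    };

    // `a` was moved into the closure, but the Vec `b` is still owned here.
    (result, b)
}

fn main() {
    let (text, b) = run_selective_move();
    println!("{} / {:?}", text, b);
}
```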

Why does the `set` method defined on `Cell<T>` explicitly drop the old value? (Rust)

I'm interested in why the set method defined on Cell explicitly drops the old value on its last line.
Shouldn't it be implicitly dropped (and its memory freed) anyway when the function returns?
use std::mem;
use std::cell::UnsafeCell;

pub struct Cell<T> {
    value: UnsafeCell<T>
}

impl<T> Cell<T> {
    pub fn set(&self, val: T) {
        let old = self.replace(val);
        drop(old); // Is this needed?
    } // old would drop here anyways?

    pub fn replace(&self, val: T) -> T {
        mem::replace(unsafe { &mut *self.value.get() }, val)
    }
}
So why not have set do this only:
pub fn set(&self, val: T) {
    self.replace(val);
}
Or does std::ptr::read do something I don't understand?
It is not needed, but calling drop explicitly can help make code easier to read in some cases. If we only wrote it as a call to replace, it would look like a wrapper function for replace and a reader might lose the context that it does an additional action on top of calling the replace method (dropping the previous value). At the end of the day though it is somewhat subjective on which version to use and it makes no functional difference.
That being said, the real reason is that it did not always drop the previous value when set. Cell<T> previously implemented set to overwrite the existing value via unsafe pointer operations. It was later modified in rust-lang/rust#39264: Extend Cell to non-Copy types so that the previous value would always be dropped. The writer (wesleywiser) likely wanted to more explicitly show that the previous value was being dropped when a new value is written to the cell so the pull request would be easier to review.
Personally, I think this is a good usage of drop since it helps to convey what we intend to do with the result of the replace method.
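If you want to convince yourself that the two spellings behave the same, here is a small sketch using the standard library's Cell. Noisy, set_explicit, set_implicit and drops_seen are hypothetical helpers written for this illustration; they are not part of std.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

// Hypothetical type that records its id in a shared log when dropped.
struct Noisy {
    id: u32,
    log: Rc<RefCell<Vec<u32>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.id);
    }
}

// Same shape as the `set` in the question: bind the old value, drop it.
fn set_explicit<T>(cell: &Cell<T>, val: T) {
    let old = cell.replace(val);
    drop(old);
}

// The shorter variant: the returned old value is a temporary that is
// dropped at the end of the statement.
fn set_implicit<T>(cell: &Cell<T>, val: T) {
    cell.replace(val);
}

// Runs one of the setters and reports which ids were dropped by the
// time the setter returned.
fn drops_seen(set: fn(&Cell<Noisy>, Noisy)) -> Vec<u32> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let cell = Cell::new(Noisy { id: 1, log: Rc::clone(&log) });
    set(&cell, Noisy { id: 2, log: Rc::clone(&log) });
    let seen = log.borrow().clone();
    seen
}

fn main() {
    println!("{:?}", drops_seen(set_explicit));
    println!("{:?}", drops_seen(set_implicit));
}
```

In both cases only the old value (id 1) has been dropped when the setter returns, which matches the point that the explicit drop is about readability rather than behavior.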

A few questions about Dart generics and type safety

I have the following Dart 2 code with null-safety.
extension Foo<T> on List<T> {
  List<U> bar<U>({
    U Function(T)? transform,
  }) {
    final t = transform ?? _identityTransform;
    return map(t).toList();
  }
}

U _identityTransform<T, U>(T t) => t as U; // #1, #2

void main() {
  final strings = ['a', 'b', 'c'].bar<String>(); // #3
  final ints = ['1', '2', '3'].bar(transform: int.parse);
  print(strings);
  print(ints);
}
It is an extension method on List<T> with a custom method that is basically a map, with the difference that it can return a new list of the same type if no transform is specified. (My real code is more complex than this, but this example is enough to present my questions.)
I want to be able to call bar() on a List with transform or without; if called without it, _identityTransform should be used.
The code above works, but I have a few reservations as to its quality, and questions, as I haven't really come to terms with Dart generics yet:
In the line marked #1 - the _identityTransform takes two generic parameters as I need access to them, but when the function is used the generic types are not used because I don't think it is possible to write something like _identityTransform<T, U> there. Is there a better way of defining _identityTransform? Am I losing any type safety with my current code?
In the line marked #2 I need a cast as U for the code to compile; I haven't managed to make the code work without it. Is there a way to do it without the cast?
In the line marked #3, when I call the extension method without any transform (i.e. I want the identity transform to kick in), I need to pass the generic type explicitly. Otherwise the compiler complains about a missing generic type (in strong mode) or infers strings to be List<dynamic> (strong mode turned off). Is there some generics magic that would let me call .bar() and still have strings inferred as List<String>?
I would make _identityTransform a nested function of bar so that you can remove its type arguments and instead use the same T and U as bar:
extension Foo<T> on List<T> {
  List<U> bar<U>({
    U Function(T)? transform,
  }) {
    U _identityTransform(T t) => t as U;
    final t = transform ?? _identityTransform;
    return map(t).toList();
  }
}
Alternatively if you want to explicitly use _identityTransform<T, U>, then you could use a closure: t = transform ?? (arg) => _identityTransform<T, U>(arg), but that seems like overkill.
You need the cast. T and U are independent/unrelated types. Since you don't know that you want T and U to be the same until bar checks its argument at runtime, you will need the explicit cast to satisfy static type checking.
If the caller passes nothing for the transform argument, there is nothing to infer U from, so it will be dynamic. I can't think of any magical way to make U default to T in such a case (again, that would be known only at runtime, but generics must satisfy static analysis).

Can I create an "unsafe closure"?

I have some code that, when simplified, looks like:
fn foo() -> Vec<u8> {
    unsafe {
        unsafe_iterator().map(|n| wrap_element(n)).collect()
    }
}
The iterator returns items that would be invalidated if the underlying data changed. Sadly, I'm unable to rely on the normal Rust mechanism of mut here (I'm doing some... odd things).
To rectify the unsafe-ness, I traverse the iterator all at once and make copies of each item (via wrap_element) and then throw it all into a Vec. This works because nothing else has a chance to come in and modify the underlying data.
The code works as-is now, but since I use this idiom a few times, I wanted to DRY up my code a bit:
fn zap<F>(f: F) -> Vec<u8>
    where F: FnOnce() -> UnsafeIter
{
    f().map(|n| wrap_element(n)).collect()
}

fn foo() -> Vec<u8> {
    zap(|| unsafe { unsafe_iterator() }) // Unsafe block
}
My problem with this solution is that the call to unsafe_iterator is unsafe, and it's the wrap_element / collect that makes it safe again. The way that the code is structured does not convey that at all.
I'd like to somehow mark my closure as being unsafe, and then it's zap's responsibility to make it safe again.
It's not possible to create an unsafe closure in the same vein as an unsafe fn, since closures are just anonymous types with implementations of the Fn, FnMut, and/or FnOnce family of traits. Since those traits do not have unsafe methods, it's not possible to create a closure which is unsafe to call.
You could create a second set of closure traits with unsafe methods, then write implementations for those, but you would lose much of the closure sugar.
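As a rough sketch of that second approach: UnsafeFnOnce, call_unsafe and MakeIter below are hypothetical names, and unsafe_iterator and wrap_element are stand-ins for the functions in the question, whose real definitions are not shown. The "closure" becomes a hand-written struct, which is exactly the sugar you lose.

```rust
// Hypothetical trait: like FnOnce() -> Output, but unsafe to call.
trait UnsafeFnOnce {
    type Output;
    unsafe fn call_unsafe(self) -> Self::Output;
}

// Stand-in for the question's unsafe iterator source (assumption).
unsafe fn unsafe_iterator() -> std::vec::IntoIter<u8> {
    vec![1, 2, 3].into_iter()
}

// Stand-in for the question's copying step (assumption).
fn wrap_element(n: u8) -> u8 {
    n
}

// Plays the role of the closure `|| unsafe_iterator()`.
struct MakeIter;

impl UnsafeFnOnce for MakeIter {
    type Output = std::vec::IntoIter<u8>;

    unsafe fn call_unsafe(self) -> Self::Output {
        unsafe { unsafe_iterator() }
    }
}

fn zap<F>(f: F) -> Vec<u8>
where
    F: UnsafeFnOnce,
    F::Output: Iterator<Item = u8>,
{
    // The unsafe call now lives inside zap, next to the code that is
    // supposed to make it safe again by draining the iterator at once.
    let iter = unsafe { f.call_unsafe() };
    iter.map(wrap_element).collect()
}

fn foo() -> Vec<u8> {
    zap(MakeIter)
}

fn main() {
    println!("{:?}", foo());
}
```

With this shape the call site in foo contains no unsafe block; whether that trade is worth the extra boilerplate is a judgment call.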

Why does Dart create closures when referencing a method?

void main() {
  A one = new A(1);
  A two = new A(2);
  var fnRef = one.getMyId; // A closure created here
  var anotherFnRef = two.getMyId; // Another closure created here
}

class A {
  int _id;
  A(this._id);
  int getMyId() {
    return _id;
  }
}
According to the Dart language tour, referencing a method like this creates a new closure each time. Does anyone know why? I can understand creating a closure when defining a function body, since the body can use variables from an outer scope. But when just referencing a method as above, why create a closure? The method body isn't changing, so it can't use any of the variables available in that scope, can it? I noticed in a previous question I asked that referencing a method like this effectively binds it to the object it was referenced from. So in the above example, calling fnRef() behaves like one.getMyId(). Is the closure used just for binding the calling context? I'm confused.
UPDATE
In response to Ladicek. So does that mean that:
void main() {
  var fnRef = useLotsOfMemory();
  // Did the closure created in the return statement close on just 'aVeryLargeObj',
  // or did it close on all of the 'veryLargeObjects', thus keeping them all in memory
  // at this point where they aren't needed?
}

useLotsOfMemory() {
  // create lots of 'veryLarge' objects
  return aVeryLargeObj.doStuff;
}
Ladicek is right: accessing a method as a getter will automatically bind the method.
In response to the updated question:
No. It shouldn't keep the scope alive. Binding closures are normally implemented as if you invoked a getter of the same name:
class A {
  int _id;
  A(this._id);
  int getMyId() => _id;
  // The implicit getter for getMyId. This is not valid
  // code but explains how dart2js implements it. The VM has
  // probably a similar mechanism.
  Function get getMyId { return () => this.getMyId(); }
}
When implemented this way you will not capture any variable that is alive in your useLotsOfMemory function.
Even if it really was allocating the closure inside the useLotsOfMemory function, it wouldn't be clear if it kept lots of memory alive.
Dart does not specify how much (or how little) is captured when a closure is created. Clearly it needs to capture at least the free variables of itself. This is the minimum. The question is thus: "how much more does it capture"?
The general consensus seems to be to capture every variable that is free in some closure. All local variables that are captured by some closure are moved into a context object and every closure that is created will just store a link to that object.
Example:
foo() {
  var x = new List(1000);
  var y = new List(100);
  var z = new List(10);
  var f = () => y; // y is free here.
  // The variables y and z are free in some closure.
  // The returned closure will keep both alive.
  // The local x will be garbage collected.
  return () => z; // z is free here.
}
I have seen Scheme implementations that only captured their own free variables (splitting the context object into independent pieces), so less is possible. However, in Dart this is not a requirement and I wouldn't rely on it. For safety I would always assume that all captured variables (independent of who captures them) are kept alive. I would also assume that bound closures are implemented similarly to what I showed above and that they keep a strict minimum of memory alive.
That's exactly right -- the closure captures the object on which the method will be invoked.

How am I meant to use filepath.Walk in Go?

The filepath.Walk function takes a function callback. This is a plain function with no context pointer. Surely a major use case for Walk is to walk a directory and take some action based on it, with reference to a wider context (e.g. entering each file into a table).
If I were writing this in C# I would use an object (with fields that could point back to the objects in the context) as a callback (with a given callback method) on it so the object can encapsulate the context that Walk is called from.
(EDIT: user "usr" suggests that the closure method occurs in C# too)
If I were writing this in C I'd ask for a function and a context pointer as a void * so the function has a context pointer that it can pass into the Walk function and get that passed through to the callback function.
But Go only has the function argument and no obvious context pointer argument.
(If I'd designed this function I would have taken an object as a callback rather than a function, conforming to the interface FileWalkerCallback or whatever, and put a callback(...) method on that interface. The consumer could then attach whatever context to the object before passing it to Walk.)
The only way I can think of doing it is by capturing the closure of the outer function in the callback function. Here is how I am using it:
func ScanAllFiles(location string, myStorageThing *StorageThing) (err error) {
	numScanned := 0
	// Wrap this up in this function's closure to capture the
	// numScanned and myStorageThing bindings.
	var scan = func(path string, fileInfo os.FileInfo, inpErr error) (err error) {
		numScanned++
		myStorageThing.DoSomething(path)
		return nil
	}
	fmt.Println("Scan All")
	err = filepath.Walk(location, scan)
	fmt.Println("Total scanned", numScanned)
	return
}
In this example I create the callback function so its closure contains the variables numScanned and myStorageThing.
This feels wrong to me. Am I right to think it feels weird, or am I just getting used to writing Go? How is it intended for the filepath.Walk method to be used in such a way that the callback has a reference to a wider context?
You're doing it about right. There are two little variations you might consider. One is that you can replace the name of an unused parameter with an underbar. So, in your example where you only used the path, the signature could read
func(path string, _ os.FileInfo, _ error) error
It saves a little typing, cleans up the code a little, and makes it clear that you are not using the parameter. Also, for small functions especially, it's common to skip assigning the function literal to a variable and just use it directly as the argument. Your code ends up reading,
err = filepath.Walk(location, func(path string, _ os.FileInfo, _ error) error {
	numScanned++
	myStorageThing.DoSomething(path)
	return nil
})
This cleans up scoping a little, making it clear that you are using the closure just once.
As a C# programmer I can say that this is exactly how such an API in .NET would be meant to be used. You would be encouraged to use closures and discouraged from creating an explicit class with fields, because it just wastes your time.
As Go supports closures I'd say this is the right way to use this API. I don't see anything wrong with it.
