I have some code that, when simplified, looks like:
fn foo() -> Vec<u8> {
unsafe {
unsafe_iterator().map(|n| wrap_element(n)).collect()
}
}
The iterator returns items that would be invalidated if the underlying data changed. Sadly, I'm unable to rely on the normal Rust mechanism of mut here (I'm doing some... odd things).
To rectify the unsafe-ness, I traverse the iterator all at once and make copies of each item (via wrap_element) and then throw it all into a Vec. This works because nothing else has a chance to come in and modify the underlying data.
The code works as-is now, but since I use this idiom a few times, I wanted to DRY up my code a bit:
fn zap<F>(f: F) -> Vec<u8>
where F: FnOnce() -> UnsafeIter
{
f().map(|n| wrap_element(n)).collect()
}
fn foo() -> Vec<u8> {
zap(|| unsafe { unsafe_iterator() }) // Unsafe block
}
My problem with this solution is that the call to unsafe_iterator is unsafe, and it's the wrap_element / collect that makes it safe again. The way that the code is structured does not convey that at all.
I'd like to somehow mark my closure as being unsafe and then it's zaps responsibility to make it safe again.
It's not possible to create an unsafe closure in the same vein as an unsafe fn, since closures are just anonymous types with implementations of the Fn, FnMut, and/or FnOnce family of traits. Since those traits do not have unsafe methods, it's not possible to create a closure which is unsafe to call.
You could create a second set of closure traits with unsafe methods, then write implementations for those, but you would lose much of the closure sugar.
Related
Interested why does set method defined on Cell, on the last line explicitly drops old value.
Shouldn't it be implicitly dropped (memory freed) anyways when the function returns?
use std::mem;
use std::cell::UnsafeCell;
pub struct Cell<T> {
value: UnsafeCell<T>
}
impl<T> Cell<T> {
pub fn set(&self, val: T) {
let old = self.replace(val);
drop(old); // Is this needed?
} // old would drop here anyways?
pub fn replace(&self, val: T) -> T {
mem::replace(unsafe { &mut *self.value.get() }, val)
}
}
So why not have set do this only:
pub fn set(&self, val: T) {
self.replace(val);
}
or std::ptr::read does something I don't understand.
It is not needed, but calling drop explicitly can help make code easier to read in some cases. If we only wrote it as a call to replace, it would look like a wrapper function for replace and a reader might lose the context that it does an additional action on top of calling the replace method (dropping the previous value). At the end of the day though it is somewhat subjective on which version to use and it makes no functional difference.
That being said, the real reason is that it did not always drop the previous value when set. Cell<T> previously implemented set to overwrite the existing value via unsafe pointer operations. It was later modified in rust-lang/rust#39264: Extend Cell to non-Copy types so that the previous value would always be dropped. The writer (wesleywiser) likely wanted to more explicitly show that the previous value was being dropped when a new value is written to the cell so the pull request would be easier to review.
Personally, I think this is a good usage of drop since it helps to convey what we intend to do with the result of the replace method.
I have the following Dart 2 code with null-safety.
extension Foo<T> on List<T> {
List<U> bar<U>({
U Function(T)? transform,
}) {
final t = transform ?? _identityTransform;
return map(t).toList();
}
}
U _identityTransform<T, U>(T t) => t as U; // #1, #2
void main() {
final strings = ['a', 'b', 'c'].bar<String>(); // #3
final ints = ['1', '2', '3'].bar(transform: int.parse);
print(strings);
print(ints);
}
It is an extension method on List<T> with a custom method that is basically a map with the
difference that it can return a new list of the same type if no transform is specified. (My real code is more complex than this, but this example is enough to present my questions.)
I want to be able to call bar() on a List with transform or without; if called without it, _identityTransform should be used.
The code above works, but I have a few reservations as to its quality, and questions, as I haven't really come to terms with Dart generics yet:
In the line marked #1 - the _identityTransform takes two generic parameters as I need access to them, but when the function is used the generic types are not used because I don't think it is possible to write something like _identityTransform<T, U> there. Is there a better way of defining _identityTransform? Am I losing any type safety with my current code?
In the line marked #2 I need a cast as U for the code to compile, I haven't managed to make the code work without it. Is there a way to do it without the cast?
In the line marked #3, when I call the extension method without any transform (i.e. I want the identity transform to kick in) I need to explicitly pass the generic type, otherwise the compiler complains about missing generic type (in strong mode) or infers strings to be List<dynamic> (strong mode turned off). Is some generics magic possible to be able to call .bar() and still have strings be inferred to List<String>?
I would make _identityTransform a nested function of bar so that you can remove its type arguments and instead use the same T and U as bar:
extension Foo<T> on List<T> {
List<U> bar<U>({
U Function(T)? transform,
}) {
U _identityTransform(T t) => t as U;
final t = transform ?? _identityTransform;
return map(t).toList();
}
}
Alternatively if you want to explicitly use _identityTransform<T, U>, then you could use a closure: t = transform ?? (arg) => _identityTransform<T, U>(arg), but that seems like overkill.
You need the cast. T and U are independent/unrelated types. Since you don't know that you want T and U to be the same until bar checks its argument at runtime, you will need the explicit cast to satisfy static type checking.
If the caller passes nothing for the transform argument, there is nothing to infer U from, so it will be dynamic. I can't think of any magical way make U default to T in such a case (again, that would be known only at runtime, but generics must satisfy static analysis).
The Error Handling chapter of the Rust Book contains an example on how to use the combinators of Option and Result. A file is read and through application of a series of combinators the contents are parsed as an i32 and returned in a Result<i32, String>.
Now, I got confused when I looked at the code. There, in one closure to an and_then a local String value is created an subsequently passed as a return value to another combinator.
Here is the code example:
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, String> {
File::open(file_path)
.map_err(|err| err.to_string())
.and_then(|mut file| {
let mut contents = String::new(); // local value
file.read_to_string(&mut contents)
.map_err(|err| err.to_string())
.map(|_| contents) // moved without 'move'
})
.and_then(|contents| {
contents.trim().parse::<i32>()
.map_err(|err| err.to_string())
})
.map(|n| 2 * n)
}
fn main() {
match file_double("foobar") {
Ok(n) => println!("{}", n),
Err(err) => println!("Error: {}", err),
}
}
The value I am referring to is contents. It is created and later referenced in the map combinator applied to the std::io::Result<usize> return value of Read::read_to_string.
The question: I thought that not marking the closure with move would borrow any referenced value by default, which would result in the borrow checker complaining, that contents does not live long enough. However, this code compiles just fine. That means, the String contents is moved into, and subequently out of, the closure. Why is this done without the explicit move?
I thought that not marking the closure with move would borrow any referenced value by default,
Not quite. The compiler does a bit of inspection on the code within the closure body and tracks how the closed-over variables are used.
When the compiler sees that a method is called on a variable, then it looks to see what type the receiver is (self, &self, &mut self). When a variable is used as a parameter, the compiler also tracks if it is by value, reference, or mutable reference. Whatever the most restrictive requirement is will be what is used by default.
Occasionally, this analysis is not complete enough — even though the variable is only used as a reference, we intend for the closure to own the variable. This usually occurs when returning a closure or handing it off to another thread.
In this case, the variable is returned from the closure, which must mean that it is used by value. Thus the variable will be moved into the closure automatically.
Occasionally the move keyword is too big of a hammer as it moves all of the referenced variables in. Sometimes you may want to just force one variable to be moved in but not others. In that case, the best solution I know of is to make an explicit reference and move the reference in:
fn main() {
let a = 1;
let b = 2;
{
let b = &b;
needs_to_own_a(move || a_function(a, b));
}
}
In Rust the main function is defined like this:
fn main() {
}
This function does not allow for a return value though. Why would a language not allow for a return value and is there a way to return something anyway? Would I be able to safely use the C exit(int) function, or will this cause leaks and whatnot?
As of Rust 1.26, main can return a Result:
use std::fs::File;
fn main() -> Result<(), std::io::Error> {
let f = File::open("bar.txt")?;
Ok(())
}
The returned error code in this case is 1 in case of an error. With File::open("bar.txt").expect("file not found"); instead, an error value of 101 is returned (at least on my machine).
Also, if you want to return a more generic error, use:
use std::error::Error;
...
fn main() -> Result<(), Box<dyn Error>> {
...
}
std::process::exit(code: i32) is the way to exit with a code.
Rust does it this way so that there is a consistent explicit interface for returning a value from a program, wherever it is set from. If main starts a series of tasks then any of these can set the return value, even if main has exited.
Rust does have a way to write a main function that returns a value, however it is normally abstracted within stdlib. See the documentation on writing an executable without stdlib for details.
As was noted by others, std::process::exit(code: i32) is the way to go here
More information about why is given in RFC 1011: Process Exit. Discussion about the RFC is in the pull request of the RFC.
The reddit thread on this has a "why" explanation:
Rust certainly could be designed to do this. It used to, in fact.
But because of the task model Rust uses, the fn main task could start a bunch of other tasks and then exit! But one of those other tasks may want to set the OS exit code after main has gone away.
Calling set_exit_status is explicit, easy, and doesn't require you to always put a 0 at the bottom of main when you otherwise don't care.
Try:
use std::process::ExitCode;
fn main() -> ExitCode {
ExitCode::from(2)
}
Take a look in doc
or:
use std::process::{ExitCode, Termination};
pub enum LinuxExitCode { E_OK, E_ERR(u8) }
impl Termination for LinuxExitCode {
fn report(self) -> ExitCode {
match self {
LinuxExitCode::E_OK => ExitCode::SUCCESS,
LinuxExitCode::E_ERR(v) => ExitCode::from(v)
}
}
}
fn main() -> LinuxExitCode {
LinuxExitCode::E_ERR(3)
}
You can set the return value with std::os::set_exit_status.
The filepath.Walk function takes a function callback. This is straight function with no context pointer. Surely a major use case for Walk is to walk a directory and take some action based on it, with reference to a wider context (e.g. entering each file into a table).
If I were writing this in C# I would use an object (with fields that could point back to the objects in the context) as a callback (with a given callback method) on it so the object can encapsulate the context that Walk is called from.
(EDIT: user "usr" suggests that the closure method occurs in C# too)
If I were writing this in C I'd ask for a function and a context pointer as a void * so the function has a context pointer that it can pass into the Walk function and get that passed through to the callback function.
But Go only has the function argument and no obvious context pointer argument.
(If I'd designed this function I would have taken an object as a callback rather than a function, conforming to the interface FileWalkerCallback or whatever, and put a callback(...) method on that interface. The consumer could then attach whatever context to the object before passing it to Walk.)
The only way I can think of doing it is by capturing the closure of the outer function in the callback function. Here is how I am using it:
func ScanAllFiles(location string, myStorageThing *StorageThing) (err error) {
numScanned = 0
// Wrap this up in this function's closure to capture the `corpus` binding.
var scan = func(path string, fileInfo os.FileInfo, inpErr error) (err error) {
numScanned ++
myStorageThing.DoSomething(path)
}
fmt.Println("Scan All")
err = filepath.Walk(location, scan)
fmt.Println("Total scanned", numScanned)
return
}
In this example I create the callback function so its closure contains the variables numScanned and myStorageThing.
This feels wrong to me. Am I right to think it feels weird, or am I just getting used to writing Go? How is it intended for the filepath.Walk method to be used in such a way that the callback has a reference to a wider context?
You're doing it about right. There are two little variations you might consider. One is that you can replace the name of an unused parameter with an underbar. So, in your example where you only used the path, the signature could read
func(path string, _ os.FileInfo, _ error) error
It saves a little typing, cleans up the code a little, and makes it clear that you are not using the parameter. Also, for small functions especially, it's common skip assigning the function literal to a variable, and just use it directly as the argument. Your code ends up reading,
err = filepath.Walk(location, func(path string, _ os.FileInfo, _ error) error {
numScanned ++
myStorageThing.DoSomething(path)
})
This cleans up scoping a little, making it clear that you are using the closure just once.
As a C# programmer I can say that this is exactly how such an API in .NET would be meant to be used. You would be encouraged to use closures and discouraged to create an explicit class with fields because it just wastes your time.
As Go supports closures I'd say this is the right way to use this API. I don't see anything wrong with it.