mem::replace in Rust - memory

The Rust by Example guide shows the following code for a Fibonacci series with iterators:
fn next(&mut self) -> Option<u32> {
    let new_next = self.curr + self.next;
    let new_curr = mem::replace(&mut self.next, new_next);
    // 'Some' is always returned, this is an infinite value generator
    Some(mem::replace(&mut self.curr, new_curr))
}
I would like to understand the advantage of this over the more intuitive version (if you come from other languages):
fn next(&mut self) -> Option<u32> {
    let tmp = self.next;
    self.next = self.curr + self.next;
    self.curr = tmp;
    Some(self.curr)
}

It's not always possible to write the direct code, due to Rust's ownership. If self.next is storing a non-Copy type (e.g. Vec<T> for any type T) then let tmp = self.next; is taking that value out of self by-value, that is, moving ownership, so the source should not be usable. But the source is behind a reference and references must always point to valid data, so the compiler cannot allow moving out of &mut: you get errors like cannot move out of dereference of `&mut`-pointer.
replace gets around these issues via unsafe code, by internally making the guarantee that any invalidation is entirely made valid by the time replace returns.
You can see this answer for more info about moves in general, and this question for a related issue about a swap function (replace is implemented using the standard library's swap internally).
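To make the non-Copy case concrete, here is a minimal sketch. The Batcher type, its pending field and the flush method are hypothetical names invented for this illustration; only std::mem::replace comes from the standard library.

```rust
use std::mem;

// Hypothetical type for illustration: it holds a non-Copy field.
struct Batcher {
    pending: Vec<String>,
}

impl Batcher {
    // `let old = self.pending;` would not compile here: it would move the
    // Vec out from behind `&mut self` and leave the field invalid.
    // `mem::replace` writes a new value into the field and hands back the
    // old one, so the field holds a valid Vec before and after the call.
    fn flush(&mut self) -> Vec<String> {
        mem::replace(&mut self.pending, Vec::new())
    }
}

fn main() {
    let mut b = Batcher {
        pending: vec!["a".to_string(), "b".to_string()],
    };
    let flushed = b.flush();
    println!("flushed {:?}, remaining {:?}", flushed, b.pending);
}
```

When the replacement is simply the type's default value, std::mem::take(&mut self.pending) expresses the same thing more briefly.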

Related

How can I change the value of a field of a boxed struct while making sure the pointer to the box stays valid?

I have the following rust code:
pub struct Settings {
    pub vec: Vec<usize>,
    pub index: usize,
}
pub fn add_to_settings_vec(ptr: *mut Settings) {
    // create a box from the pointer
    let boxed_settings = unsafe { Box::from_raw(ptr) };
    // get the value in the box
    let mut settings = *boxed_settings;
    // mutate the value
    settings.vec.push(123);
    // create a new box and get a raw pointer to it so the value isn't deallocated
    Box::into_raw(Box::new(settings));
}
I want to make sure that the pointer that was passed as a function parameter stays valid and points to the modified Settings value, even after adding an element to the vector of the unboxed Settings value.
I think I'll need the following steps to achieve my goal:
Receive a pointer to Settings
Unbox the value
Modify the settings in place
Make sure that the Settings don't get deallocated
Now I need the passed in pointer to still be valid and point to the in place modified Settings
This should probably be possible because, as far as I have understood it, the Settings value has a fixed size even with the vector, since the vector grows its memory at a different location.
I have looked at the docs over at https://doc.rust-lang.org/std/boxed/struct.Box.html but I haven't been able to figure this out.
I have also read about pins, but I do not know how or when to use them: https://doc.rust-lang.org/std/pin/index.html.
Does the above code always leave the passed-in pointer intact? I'd appreciate an explanation of why or why not.
If not, how can I make sure that the passed in pointer stays valid and keeps pointing to the modified Settings value, at the same place in memory?
Unbox the value
Modify the settings in place
Make sure that the Settings don't get deallocated
It's already been deallocated at step (1), the unboxing: when you dereferenced the box, you moved the Settings struct out of it, and in doing so deallocated the box.
I don't understand why you go through all this mess (or why you pass unsafe pointers to a safe non-extern function).
// create a box from the pointer
let mut settings = unsafe { Box::from_raw(ptr) };
// mutate the value
settings.vec.push(123);
// Box::leak is an associated function, not a method
Box::leak(settings);
would work fine and not deallocate anything, though it's still risky and unnecessary to get a handle on a Box if the pointer is only borrowed (which it seems to be here, since the settings are never returned).
A much better way would be to just create a borrow of the settings vec:
let v = unsafe { &mut (*ptr).vec };
v.push(123);
Or possibly the settings struct if you also need to update the index:
let settings = unsafe { &mut (*ptr) };
settings.vec.push(123);
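Putting the borrow-based approach together with an owner that actually heap-allocates the struct, a sketch might look like this. It assumes the caller owns the allocation and frees it exactly once; marking the function unsafe is my choice here, not something the original code does.

```rust
pub struct Settings {
    pub vec: Vec<usize>,
    pub index: usize,
}

// Borrows through the raw pointer and never takes ownership of the
// allocation, so the pointer stays valid after the call.
// Safety: `ptr` must point to a live `Settings` with no other active borrows.
pub unsafe fn add_to_settings_vec(ptr: *mut Settings) {
    let settings = unsafe { &mut *ptr };
    settings.vec.push(123);
}

fn main() {
    // The owner gives up the Box and keeps only the raw pointer.
    let ptr = Box::into_raw(Box::new(Settings { vec: vec![], index: 0 }));

    unsafe { add_to_settings_vec(ptr) };

    // Re-wrap the pointer once, at the very end, so the Settings is
    // dropped and deallocated exactly once.
    let settings = unsafe { Box::from_raw(ptr) };
    println!("{:?} (index {})", settings.vec, settings.index);
}
```

The Vec may reallocate its own buffer when it grows, but the Settings struct that holds the Vec's pointer, length and capacity stays at the same address throughout.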
You can achieve the mutation by just dereferencing the pointer in an unsafe block:
pub struct Settings {
    pub vec: Vec<usize>,
    pub index: usize,
}
pub fn add_to_settings_vec(ptr: *mut Settings) {
    // mutate the value
    unsafe { (*ptr).vec.push(123) };
}
fn main() {
    let mut s = Settings {
        vec: vec![],
        index: 10,
    };
    add_to_settings_vec(&mut s as *mut Settings);
    println!("{:?}", s.vec);
}

How do I iterate over a HashSet while simultaneously updating it in Rust?

In most languages, iterating over a container while simultaneously mutating it is a glaring anti-pattern. In Rust, the borrow checker makes it impossible.
But there are cases where this is needed. An example is Earley's algorithm for parsing. I'm new to Rust and I'm not aware of a known good way to iterate over a HashSet (or in fact any container) while extending it. I came up with the following (generalized from the Earley use case):
use std::collections::HashSet;
fn extend_from_within<T, F>(original: &mut HashSet<T>, process: F)
where T: std::cmp::Eq,
      T: std::hash::Hash,
      F: Fn(&T) -> Set<T>
{
    let mut curr: HashSet<T> = HashSet::new(); // Items currently being processed
    let mut next: HashSet<T> = HashSet::new(); // New items
    // go over the original set once
    let hack: &HashSet<T> = original; // &mut HashSet is not an iterator
    for x in hack {
        for y in process(x) {
            if !original.contains(&y) {
                curr.insert(y);
            }
        }
    }
    // Process and extend until no new items emerge
    while !curr.is_empty() {
        for x in &curr {
            for y in process(x) {
                // make sure that no item is processed twice
                // the check on original is redundant, but might save space
                if !curr.contains(&y) && !original.contains(&y) {
                    next.insert(y);
                }
            }
        }
        original.extend(curr.drain());
        std::mem::swap(&mut curr, &mut next);
    }
}
As you see, any call to process can yield multiple items. They are all added to the set and all of them have to be processed as well, but only if they haven't been seen. This works well enough. But keeping up to 4 sets, doing 3 checks for membership on each item (one of them twice on the original array) seems ridiculous for this problem. Is there a better way?
I'm not aware of a known good way to iterate over a HashSet (or in fact any Container) while extending it
I think there are mostly three patterns for handling modifications while iterating over HashMaps/HashSets:
Collecting the modifications into a Vec and applying them after the iteration
draining the HashSet and collecting into a new one
retain, but that's for deletions only
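As a rough illustration of the three patterns, here is a small sketch on a HashSet<u32>. The function names and the transformations (doubling, incrementing, keeping even numbers) are made up for the example.

```rust
use std::collections::HashSet;

// Pattern 1: collect the additions into a Vec, then apply them after the loop.
fn add_doubles(set: &mut HashSet<u32>) {
    let additions: Vec<u32> = set.iter().map(|x| x * 2).collect();
    set.extend(additions);
}

// Pattern 2: drain the set and collect the transformed items into a new one.
fn increment_all(set: &mut HashSet<u32>) {
    let rebuilt: HashSet<u32> = set.drain().map(|x| x + 1).collect();
    *set = rebuilt;
}

// Pattern 3: `retain` filters in place, so it only covers deletions.
fn keep_even(set: &mut HashSet<u32>) {
    set.retain(|x| x % 2 == 0);
}

fn main() {
    let mut set: HashSet<u32> = vec![1, 2, 3].into_iter().collect();
    add_doubles(&mut set); // now {1, 2, 3, 4, 6}
    increment_all(&mut set); // now {2, 3, 4, 5, 7}
    keep_even(&mut set); // now {2, 4}
    println!("{:?}", set);
}
```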
But your case is special anyway, you want to saturate your set with process, right?
Is there a better way?
In this case, I might go for
let mut todo = original.iter().cloned().collect::<VecDeque<T>>();
while let Some(x) = todo.pop_front() {
    for y in process(&x) {
        if original.insert(y.clone()) {
            todo.push_back(y);
        }
    }
}
The VecDeque is probably not necessary (a normal Vec would do), unless you have some requirement on the order of processing elements. Cache-wise, a Vec may be better.
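For reference, the worklist loop can be written out as a complete function. This is only a sketch under a few assumptions: T must be Clone, a plain Vec serves as the worklist, and because the question's Set type is unspecified, process is allowed to return anything iterable. The name saturate is invented here.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Repeatedly applies `process` to every item, adding new results to `set`,
// until no call produces an item that is not already in the set.
fn saturate<T, F, I>(set: &mut HashSet<T>, process: F)
where
    T: Eq + Hash + Clone,
    F: Fn(&T) -> I,
    I: IntoIterator<Item = T>,
{
    // Worklist of items whose results have not been examined yet.
    let mut todo: Vec<T> = set.iter().cloned().collect();
    while let Some(x) = todo.pop() {
        for y in process(&x) {
            // `insert` returns true only for unseen items, so each item
            // is pushed onto the worklist at most once.
            if set.insert(y.clone()) {
                todo.push(y);
            }
        }
    }
}

fn main() {
    let mut set: HashSet<u32> = HashSet::new();
    set.insert(1);
    // Toy rule: every number below 5 produces its successor.
    saturate(&mut set, |x| if *x < 5 { vec![*x + 1] } else { vec![] });
    println!("{} items", set.len());
}
```

This keeps one set and one worklist, and performs a single hash operation per produced item, since the membership check and the insertion are the same call.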
One could avoid the clones by instead keeping the Set result of process(x) in todo. Without knowing what Set and T are, I can't say which is better. If the result of process is often empty, this would also allow filtering out the empty results before pushing them onto todo.
[Edit:] Another variant may be to
let mut todo = original
    .iter()
    .flat_map(&process)
    .filter(|x| !original.contains(x))
    .collect::<VecDeque<T>>();
todo.iter().for_each(|x| {
    original.insert(x.clone());
});
// while let …
This might allocate less, but cause more hash map accesses / cache misses. Finding which variant is more efficient really requires benchmarking.

Why is "capture by reference" equivalent to "capture a reference by value" in Rust?

Excerpt from Huon Wilson's Finding Closure in Rust:
Capturing entirely by value is also strictly more general than capturing by reference: the reference types are first-class in Rust, so "capture by reference" is the same as "capture a reference by value". Thus, unlike C++, there’s little fundamental distinction between capture by reference and by value, and the analysis Rust does is not actually necessary: it just makes programmers’ lives easier.
I struggle to wrap my head around this. If you capture a reference "by value", don't you capture the data that is stored on the heap? Or does it refer to the pointer value of the reference, which is found on the stack?
I think the article means to say that the effect is almost equivalent. Shared (immutable) references implement the Copy trait, so when you capture a reference by value, it is copied. In effect, you have just created one more shared reference to the same data.
fn as_ref(_x: &String) {}
fn as_mut(_x: &mut String) {}
fn main() {
    let mut x = String::from("hello world");
    let y = &x;
    let _ = move || {
        as_ref(y); // here you moved y but it
                   // basically created a copy
    };
    let z = y; // can be used later
    // The same cannot be done by mutable
    // refs because they don't
    // implement Copy trait
    let y = &mut x;
    let _ = move || {
        as_mut(y); // here you moved y and
                   // it cannot be used outside
    };
    // ERROR! Cannot be used again
    //let z = y;
}
Or does it refer to the pointer value of the reference, which is found on the stack?
Yes, ish.
In Rust, references are reified: they're an actual thing you manipulate. So when you capture a reference by value, you're capturing the reference itself (the pointer, which is what a Rust reference really is), not the referent (the pointee).
Capturing by reference basically just implicitly creates a reference and captures it by value.
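A small sketch of that equivalence, with made-up names: the first closure lets the compiler borrow data implicitly, the second creates the reference by hand and captures it by value with move.

```rust
// Returns the lengths seen through both closures and through `data` itself.
fn capture_lengths() -> (usize, usize, usize) {
    let data = vec![1, 2, 3];

    // Capture by reference: the compiler implicitly borrows `data`.
    let implicit = || data.len();

    // The same thing spelled out: create the reference yourself, then
    // capture that reference (not the Vec) by value.
    let r = &data;
    let explicit = move || r.len();

    // Neither closure owns the Vec, so `data` is still usable here.
    (implicit(), explicit(), data.len())
}

fn main() {
    println!("{:?}", capture_lengths());
}
```

In both cases what the closure stores is a pointer to data; the move keyword only changes whether that pointer is created implicitly or copied from r.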

What does it mean if a mutating function sets self equal to the result of another function

I was going through Apple's ARKit sample project, trying to understand the code, since I am still learning. I saw a function setting self equal to the result of another function. Can someone please explain in detail what exactly these functions are doing? In the code, "mutating func normalize()" sets self to self.normalized(). Why is that, and what is this code doing? Can't we simply call "func normalized()"? It seems like we are re-creating the same function.
mutating func normalize() {
    self = self.normalized()
}
func normalized() -> SCNVector3 {
    if self.length() == 0 {
        return self
    }
    return self / self.length()
}
func length() -> CGFloat {
    return sqrt(self.x * self.x + self.y * self.y)
}
Value types in Swift can be mutable or immutable. When you create a struct (or any other value type) and assign it to a variable (var), it is mutable, and you can call normalize() on it. The struct is then not copied to another piece of memory but updated in place (it acts like a reference type). When you assign it to a constant (let), it can't be mutated, so the only way to get updated values is to create a new struct, as the normalized() method does. Regarding your question: normalize() just reuses the normalization logic from normalized(), so this is a completely fine solution. Assigning to self is only permitted in mutating methods; it basically overwrites the value of the struct with a new one.
I'm assuming that this fragment is in a struct rather than a class.
self.normalized() makes a copy of self and divides the copy's components by its length and then returns the copy. self is not affected.
self.normalize() gets a normalised version of self and then replaces self by the copy. So it changes in place.
Under the hood, every member function passes self as an implicit argument. i.e. to the compiler the declaration looks like this:
func normalized(self: SCNVector3) -> SCNVector3
Putting mutating on the front of a function definition makes the hidden argument inout:
func normalize(self: inout SCNVector3)
So, if you have
var a = SCNVector3(3, 4, 0)
let b = SCNVector3(4, 3, 0)
let c = b.normalized()
a.normalize()
After that code, c would be (0.8, 0.6, 0) and a would be (0.6, 0.8, 0). b would be unchanged.
Note that a has to be declared with var because it is changed in place by normalize().
Edit
In the comments khan asks:
What i am not able to understand is why do we have to create a func again we can use "func normalized"
The point being made is why can't we do something like this:
var a = SCNVector3(3, 4, 0)
a = a.normalized()
and not have the normalize() function at all?
The above will have exactly the same effect as a.normalize() and, in my opinion[1], is better style, being more "functional".
I think a.normalize() exists only because it is common in Swift to provide both forms of the function. For example, with sets you have both union() and formUnion(): the first returns the union of one set with another, the second replaces a set with the union of itself and another set. In some cases, the in place version of the function may be more efficient, but not, I think, with this normalize function.
Which you choose to use is a matter of your preference.
[1] Actually, the better style is
let a = SCNVector3(3, 4, 0).normalized()

iOS Swift: why can't two types of closure be added to an array

let sum1 = {(a:Int, b:Int) -> Int in return a + b}
let sum2 = {(a:Float, b:Float) -> Float in return a + b}
var cl = [sum1, sum2]
Why can't sum1 and sum2 be added to an array?
I know sum1 takes two Ints and returns an Int, and sum2 takes two Floats and returns a Float, but an array can hold two different types of object; for example, let foo = [12, "12"] is valid.
The suggestions that you use [Any] for this will compile, but if you need Any (or AnyObject) in a property, you almost always have a design mistake. Any exists for rare cases where you really must circumvent the type system, but you will typically run into numerous headaches with it.
There are numerous type-safe solutions to these kinds of problems depending on what your underlying goal really is. The most likely solution, if your goal is to keep Int-methods separate from Float-methods, is to use an enum:
enum Operator {
    case int((Int, Int) -> Int)
    case float((Float, Float) -> Float)
}
let sum1 = Operator.int {(a, b) in return a + b }
let sum2 = Operator.float {(a, b) in return a + b}
let cl = [sum1, sum2]
Any means you can throw absolutely anything in it and it's your problem to figure out what to do with the random things you find in it. The compiler can't help you, and in some cases will actively fight you. You'll need to accept undefined behavior or add precondition all over the place to check at runtime that you didn't make a mistake.
Operator is an "OR" type. [Operator] is an array of functions that operate either on Ints or on Floats. This seems to be what you mean, so let the compiler help you by telling it that's what you mean. The compiler will detect mistakes at compile time rather than crashing at runtime. Many unit tests become unnecessary because whole classes of bugs are impossible. With a proper type and the compiler's help, you can simplify cl to:
let cl: [Operator] = [.int(+), .float(+)]
That's pretty nice IMO.
On the other hand, if the goal is to accept both Ints and Floats, then you should likely wrap them up in NSNumber which can work on both. If you want to keep track of which were Ints and which were Floats so you can apply your math more carefully, you can create a struct:
struct Number {
    enum Kind {
        case Int
        case Float
    }
    let value: NSNumber
    let type: Kind
}
The reason is that these two closures have different function types, and unless your array is of type [Any] (or [AnyObject]), it can only hold elements of a single type. That's why you can't place them in the same array directly. For closures you need Any rather than AnyObject, because a closure is not a class instance.
