What's the difference between filter(|x|) and filter(|&x|)? - closures

In the Rust Book there is an example of calling filter() on an iterator:
for i in (1..100).filter(|&x| x % 2 == 0) {
println!("{}", i);
}
There is an explanation below but I'm having trouble understanding it:
This will print all of the even numbers between one and a hundred. (Note that because filter doesn't consume the elements that are being iterated over, it is passed a reference to each element, and thus the filter predicate uses the &x pattern to extract the integer itself.)
This, however does not work:
for i in (1..100).filter(|&x| *x % 2 == 0) {
println!("i={}", i);
}
Why is the argument to the closure a reference |&x|, instead of |x|? And what's the difference between using |&x| and |x|?
I understand that using a reference |&x| is more efficient, but I'm puzzled by the fact that I didn't have to dereference the x pointer by using *x.

When used as a pattern match (and closure and function arguments are also pattern matches), the & binds to a reference, making the variable the dereferenced value.
fn main() {
let an_int: u8 = 42;
// Note that the `&` is on the right side of the `:`
let ref_to_int: &u8 = &an_int;
// Note that the `&` is on the left side of the `:`
let &another_int = ref_to_int;
let () = another_int;
}
Has the error:
error: mismatched types:
expected `u8`,
found `()`
If you look at your error message for your case, it indicates that you can't dereference it, because it isn't a reference:
error: type `_` cannot be dereferenced
I didn't have to dereference the x pointer by using *x.
That's because you implicitly dereferenced it in the pattern match.
I understand that using a reference |&x| is more efficient
If this were true, then there would be no reason to use anything but references! Namely, references require extra indirection to get to the real data. There's some measurable cut-off point where passing items by value is more efficient than passing references to them.
If so, why does using |x| not throw an error? From my experience with C, I would expect to receive a pointer here.
And you do, in the form of a reference. x is a reference to (in this example) an i32. However, the % operator is provided by the trait Rem, which is implemented for all pairs of reference / value:
impl Rem<i32> for i32
impl<'a> Rem<i32> for &'a i32
impl<'a> Rem<&'a i32> for i32
impl<'a, 'b> Rem<&'a i32> for &'b i32
This allows you to not need to explicitly dereference it.
Or does Rust implicitly allocate a copy of value of the original x on the stack here?
It emphatically does not do that. In fact, it would be unsafe to do that, unless the iterated items implemented Copy (or potentially Clone, in which case it could also be expensive). This is why references are used as the closure argument.

Related

I don't understand this map tuple key compilation error, in F#

Here is a function:
let newPositions : PositionData list =
positions
|> List.filter (fun x ->
let key = (x.Instrument, x.Side)
match brain.Positions.TryGetValue key with
| false, _ ->
// if we don't know the position, it's new
true
| true, p when x.UpdateTime > p.UpdateTime ->
// it's newer than the version we have, it's new
true
| _ ->
false
)
it compiles at expected.
let's focus on two lines:
let key = (x.Instrument, x.Side)
match brain.Positions.TryGetValue key with
brain.Positions is a Map<Instrument * Side, PositionData> type
if I modify the second line to:
match brain.Positions.TryGetValue (x.Instrument, x.Side) with
then the code will not compile, with error:
[FS0001] This expression was expected to have type
'Instrument * Side'
but here has type
'Instrument'
but:
match brain.Positions.TryGetValue ((x.Instrument, x.Side)) with
will compile...
why is that?
This is due to method call syntax.
TryGetValue is not a function, but a method. A very different thing, and a much worse thing in general. And subject to some special syntactic rules.
This method, you see, actually has two parameters, not one. The first parameter is a key, as you expect. And the second parameter is what's known in C# as out parameter - i.e. kind of a second return value. The way it was originally meant to be called in C# is something like this:
Dictionary<int, string> map = ...
string val;
if (map.TryGetValue(42, out val)) { ... }
The "regular" return value of TryGetValue is a boolean signifying whether the key was even found. And the "extra" return value, denoted here out val, is the value corresponding to the key.
This is, of course, extremely awkward, but it did not stop the early .NET libraries from using this pattern very widely. So F# has special syntactic sugar for this pattern: if you pass just one parameter, then the result becomes a tuple consisting of the "actual" return value and the out parameter. Which is what you're matching against in your code.
But of course, F# cannot prevent you from using the method exactly as designed, so you're free to pass two parameters as well - the first one being the key and the second one being a byref cell (which is F# equivalent of out).
And here is where this clashes with the method call syntax. You see, in .NET all methods are uncurried, meaning their arguments are all effectively tupled. So when you call a method, you're passing a tuple.
And this is what happens in this case: as soon as you add parentheses, the compiler interprets that as an attempt to call a .NET method with tupled arguments:
brain.Positions.TryGetValue (x.Instrument, x.Side)
^ ^
first arg |
second arg
And in this case it expects the first argument to be of type Instrument * Side, but you're clearly passing just an Instrument. Which is exactly what the error message tells you: "expected to have type 'Instrument * Side'
but here has type 'Instrument'".
But when you add a second pair of parens, the meaning changes: now the outer parens are interpreted as "method call syntax", and the inner parens are interpreted as "denoting a tuple". So now the compiler interprets the whole thing as just a single argument, and all works as before.
Incidentally, the following will also work:
brain.Positions.TryGetValue <| (x.Instrument, x.Side)
This works because now it's no longer a "method call" syntax, because the parens do not immediately follow the method name.
But a much better solution is, as always, do not use methods, use functions instead!
In this particular example, instead of .TryGetValue, use Map.tryFind. It's the same thing, but in proper function form. Not a method. A function.
brain.Positions |> Map.tryFind (x.Instrument, x.Side)
Q: But why does this confusing method even exist?
Compatibility. As always with awkward and nonsensical things, the answer is: compatibility.
The standard .NET library has this interface System.Collections.Generic.IDictionary, and it's on that interface that the TryGetValue method is defined. And every dictionary-like type, including Map, is generally expected to implement that interface. So here you go.
In future, please consider the Stack Overflow guidelines provided under How to create a Minimal, Reproducible Example. Well, minimal and reproducible the code in your question is, but it shall also be complete...
…Complete – Provide all parts someone else needs to reproduce your
problem in the question itself
That being said, when given the following definitions, your code will compile:
type Instrument() = class end
type Side() = class end
type PositionData = { Instrument : Instrument; Side : Side; }
with member __.UpdateTime = 0
module brain =
let Positions = dict[(Instrument(), Side()), {Instrument = Instrument(); Side = Side()}]
let positions = []
Now, why is that? Technically, it is because of the mechanism described in the F# 4.1 Language Specification under §14.4 Method Application Resolution, 4. c., 2nd bullet point:
If all formal parameters in the suffix are “out” arguments with byref
type, remove the suffix from UnnamedFormalArgs and call it
ImplicitlyReturnedFormalArgs.
This is supported by the signature of the method call in question:
System.Collections.Generic.IDictionary.TryGetValue(key: Instrument * Side, value: byref<PositionData>)
Here, if the second argument is not provided, the compiler does the implicit conversion to a tuple return type as described in §14.4 5. g.
You are obviously familiar with this behaviour, but maybe not with the fact that if you specify two arguments, the compiler will see the second of them as the explicit byref "out" argument, and complains accordingly with its next error message:
Error 2 This expression was expected to have type
PositionData ref
but here has type
Side
This misunderstanding changes the return type of the method call from bool * PositionData to bool, which consequently elicits a third error:
Error 3 This expression was expected to have type
bool
but here has type
'a * 'b
In short, your self-discovered workaround with double parentheses is indeed the way to tell the compiler: No, I am giving you only one argument (a tuple), so that you can implicitly convert the byref "out" argument to a tuple return type.

Why does Rust reuse memory with same value

Example code:
fn main() {
let mut y = &5; // 1
println!("{:p}", y);
{
let x = &2; // 2
println!("{:p}", x);
y = x;
}
y = &3; // 3
println!("{:p}", y);
}
If third assignment contains &3 then code output:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a8
If third assignment contains &2 (same value with second assignment) then code output:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a4
If third assignment contains &5 (same value with first assignment) then code output:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a0
Why does rust not free memory but reuse it if the assignment value is the same or allocate a new block of memory otherwise?
Two occurrences of the same literal number are indistinguishable. You cannot expect the address of two literals to be identical, and neither can you expect them to be different.
This allows the compiler (but in fact it is free to do otherwise) to emit one 5 data in the executable code, and have all &5 refer to it. Constants may (see comment) also have a static lifetime, in which case they are not allocated/deallocated during program execution, they always are allocated.
There are lots of tricks an optimizing compiler can use to determine if a variable can be assigned a constant value. Your findings are consistent with this, no need to run duplicate code if it is not needed.

How to capture mutable reference into move closure contained in iterator returned from a closure

I have a PRNG that I would like to allow a closure to access by mutable reference. The lifetimes of everything should theoretically be able to work out, here is what it looks like:
fn someFunction<F, I>(mut crossover_point_iter_generator: F)
where F: FnMut(usize) -> I, I: Iterator<Item=usize>;
let mut rng = Isaac64Rng::from_seed(&[1, 2, 3, 4]);
someFunction(|x| (0..3).map(move |_| rng.gen::<usize>() % x));
Here, a closure is creating an iterator that wraps PRNG generated values. This iterator contains a map with a closure that has the wrap range x cloned into it, but the problem is that it unintentionally clones rng as well, which I have verified. It is necessary to make it a move closure because the value of x must be captured, otherwise the closure will outlive x.
I attempted to add this line to force it to move the reference into the closure:
let rng = &mut rng;
However, Rust complains with this error:
error: cannot move out of captured outer variable in an `FnMut` closure
Can I mutably access the PRNG from inside the move closure, and if not, since the PRNG clearly outlives the function call, is there an alternative solution (aside from redesigning the API)?
Edit:
I have rewritten it to remove the copy issue and the call looks like this:
someFunction(|x| rng.gen_iter::<usize>().map(move |y| y % x).take(3));
This results in a new error:
error: cannot infer an appropriate lifetime for autoref due to conflicting requirements
You're got a situation that requires multiple conflicting mutable borrows, and rustc is denying this as it should. It's just up to us to understand how & why this happens!
A note that will be important:
Isaac64Rng implements Copy, which means that it implicitly copies instead of just moving. I'm assuming is is a legacy / backwards compatibility thing.
I wrote this version of the code to get it straight:
extern crate rand;
use rand::Isaac64Rng;
use rand::{Rng, SeedableRng};
fn someFunction<F, I>(crossover_point_iter_generator: F)
where F: FnMut(usize) -> I, I: Iterator<Item=usize>
{
panic!()
}
fn main() {
let mut rng = Isaac64Rng::from_seed(&[1, 2, 3, 4]);
let rng = &mut rng; /* (##) Rust does not allow. */
someFunction(|x| {
(0..3).map(move |_| rng.gen::<usize>() % x)
});
}
Let me put this in points:
someFunction wants a closure it can call, that returns an iterator each time it's called. The closure is mutable and can be called many times (FnMut).
We must assume all the returned iterators can be used at the same time, and not in sequence (one at a time).
We would like to borrow the Rng into the iterator, but mutable borrows are exclusive. So borrowing rules do not allow more than one iterator at a time.
FnOnce instead of FnMut would be one example of a closure protocol to help us here. It would make rustc see that there can be only one iterator.
In the working version, without the line (##), you have several iterators active at the same time, what's happening there? It's the implicit copying kicking in, so each iterator will use an identical copy of the original Rng (sounds undesirable).
Your second version of the code runs into essentially the same limitation.
If you want to work around the exclusivity of borrowing, you can use special containers like RefCell or Mutex to serialize access to the Rng.

How does Objective-C initialize C struct as property?

Consider below struct:
typedef struct _Index {
NSInteger category;
NSInteger item;
} Index;
If I use this struct as a property:
#property (nonatomic, assign) Index aIndex;
When I access it without any initialization right after a view controller alloc init, LLDB print it as:
(lldb) po vc.aIndex
(category = 0, item = 0)
(lldb) po &_aIndex
0x000000014e2bcf70
I am a little confused, the struct already has valid memory address, even before I want to allocate one. Does Objective-C initialize struct automatically? If it is a NSObject, I have to do alloc init to get a valid object, but for C struct, I get a valid struct even before I tried to initialize it.
Could somebody explains, and is it ok like this, not manually initializing it?
To answer the subquestion, why you cannot assign to a structure component returned from a getter:
(As a motivation this is, because I have read this Q several times.)
A. This has nothing to do with Cbjective-C. It is a behavior stated in the C standard. You can check it for simple C code:
NSMakeSize( 1.0, 2.0 ).width = 3.0; // Error
B. No, it is not an improvement of the compiler. If it would be so, a warning would be the result, not an error. A compiler developer does not have the liberty to decide what an error is. (There are some cases, in which they have the liberty, but this are explicitly mentioned.)
C. The reason for this error is quite easy:
An assignment to the expression
NSMakeSize( 1.0, 2.0 ).width
would be legal, if that expression is a l-value. A . operator's result is an l-value, if the structure is an l-value:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,82) and is an lvalue if the first expression is an lvalue.
ISO/IEC 9899:TC3, 6.5.2.3
Therefore it would be assignable, if the expression
NSMakeSize( 1.0, 2.0 )
is an l-value. It is not. The reason is a little bit more complex. To understand that you have to know the links between ., -> and &:
In contrast to ., -> always is an l-value.
A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. 83)
Therefore - that is what footnote 83 explains – ->, &, and . has a link:
If you can calculate the address of a structure S having a component C with the & operator, the expression (&S)->C is equivalent to S.C. This requires that you can calculate the address of S. But you can never do that with a return value, even it is a simple integer …
int f(void)
{
return 1;
}
f()=5; // Error
… or a pointer …
int *f(void)
{
return NULL;
}
f()=NULL; // Error
You always get the same error: It is not assignable. Because it is a r-value. This is obvious, because it is not clear,
a) whether the way the compiler returns a value, esp. whether he does it in address space.
b) when the time the life time of the returned value is over
Going back to the structure that means that the return value is a r-value. Therefore the result of the . operator on that is a r-value. You are not allowed to assign a value to a r-value.
D. The solution
There is a solution to assign to a "returned structure". One might decide, whether it is good or not. Since -> always is an l-value, you can return a pointer to the structure. Dereferencing this pointer with the -> operator has always an l-value as result, so you can assign a value to it:
// obj.aIndex returns a pointer
obj.aIndex->category = 1;
You do not need #public for that. (What really is a bad idea.)
The semantics of the property are to copy the struct, so it doesn't need to be allocated and initialized like an Objective-C object would. It's given its own space like a primitive type is.
You will need to be careful updating it, as this won't work:
obj.aIndex.category = 1;
Instead you will need to do this:
Index index = obj.aIndex;
index.category = 1;
obj.aIndex = index;
This is because the property getter will return a copy of the struct and not a reference to it (the first snippet is like the second snippet, without the last line that assigns the copy back to the object).
So you might be better off making it a first class object, depending on how it will be used.

How do nested functions get compiled?

When I have 2 functions in D like this:
void func() {
void innerFunc() {
import std.stdio;
writeln(x);
}
int x = 5;
innerFunc();
}
When I call func this will print 5. How does it work? Where in memory does the 5 get stored? How does innerFunc know it has to print 5?
I attempt to answer this in broad terms. This type of issue arises in a number of languages that permit nested function definitions (including Ada and Pascal).
Normally, a variable like "x" is allocated on the processor stack. That's the normal process in any language that permits recursion.
When a nested function is called, a descriptor for the enclosing function's stack frame gets passed as hidden argument.
funct() then knows that x is located at some offset specified by the base pointer register.
innerFunct () knows the offset of x but has to derive the base from the hidden argument. It can't use its own base pointer value because it will be different from funct(). And, if innerFunct () called itself, the base pointer value would be different in each invocation.

Resources