I was trying to implement vector algebra with generic algorithms and ended up playing with iterators. I have found two examples of not obvious and unexpected behaviour:
if I have pointer p to a struct (instance) with field fi, I can access the field as simply as p.fi (rather than p.*.fi)
if I have a "member" function fun(this: *Self) (where Self = #This()) and an instance s of the struct, I can call the function as simply as s.fun() (rather than (&s).fun())
My questions are:
is it documented (or in any way mentioned) somewhere? I've looked through both language reference and guide from ziglearn.org and didn't find anything
what is it that we observe in these examples? syntactic sugar for two particular cases or are there more general rules from which such behavior can be deduced?
are there more examples of weird pointers' behaviour?
For 1 and 2, you are correct. In Zig the dot works for both struct values and struct pointers transparently. Similarly, namespaced functions also do the right thing when invoked.
The only other similar behavior that I can think of is [] syntax used on arrays. You can use both directly on an array value and an array pointer interchangeably. This is somewhat equivalent to how the dot operates on structs.
const std = #import("std");
pub fn main() !void {
const arr = [_]u8{1,2,3};
const foo = &arr;
std.debug.print("{}", .{arr[2]});
std.debug.print("{}", .{foo[2]});
}
AFAIK these are the only three instances of this behavior. In all other cases if something asks for a pointer you have to explicitly provide it. Even when you pass an array to a function that accepts a slice, you will have to take the array's pointer explicitly.
The authoritative source of information is the language reference but checking it quickly, it doesn't seem to have a dedicated paragraph. Maybe there's some example that I missed though.
https://ziglang.org/documentation/0.8.0/
I first learned this syntax by going through the ziglings course, which is linked to on ziglang.org.
in exercise 43 (https://github.com/ratfactor/ziglings/blob/main/exercises/043_pointers5.zig)
// Note that you don't need to dereference the "pv" pointer to access
// the struct's fields:
//
// YES: pv.x
// NO: pv.*.x
//
// We can write functions that take pointer arguments:
//
// fn foo(v: *Vertex) void {
// v.x += 2;
// v.y += 3;
// v.z += 7;
// }
//
// And pass references to them:
//
// foo(&v1);
The ziglings course goes quite in-depth on a few language topics, so it's definitely work checking out if you're interested.
With regards to other syntax: as the previous answer mentioned, you don't need to dereference array pointers. I'm not sure about anything else (I thought function pointers worked the same, but I just ran some tests and they do not.)
Related
Example code:
fn main() {
let mut y = &5; // 1
println!("{:p}", y);
{
let x = &2; // 2
println!("{:p}", x);
y = x;
}
y = &3; // 3
println!("{:p}", y);
}
If third assignment contains &3 then code output:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a8
If third assignment contains &2 (same value with second assignment) then code output:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a4
If third assignment contains &5 (same value with first assignment) then code output:
0x558e7da926a0
0x558e7da926a4
0x558e7da926a0
Why does rust not free memory but reuse it if the assignment value is the same or allocate a new block of memory otherwise?
Two occurrences of the same literal number are indistinguishable. You cannot expect the address of two literals to be identical, and neither can you expect them to be different.
This allows the compiler (but in fact it is free to do otherwise) to emit one 5 data in the executable code, and have all &5 refer to it. Constants may (see comment) also have a static lifetime, in which case they are not allocated/deallocated during program execution, they always are allocated.
There are lots of tricks an optimizing compiler can use to determine if a variable can be assigned a constant value. Your findings are consistent with this, no need to run duplicate code if it is not needed.
Are the following two examples equivalent?
Example 1:
let x = String::new();
let y = &x[..];
Example 2:
let x = String::new();
let y = &*x;
Is one more efficient than the other or are they basically the same?
In the case of String and Vec, they do the same thing. In general, however, they aren't quite equivalent.
First, you have to understand Deref. This trait is implemented in cases where a type is logically "wrapping" some lower-level, simpler value. For example, all of the "smart pointer" types (Box, Rc, Arc) implement Deref to give you access to their contents.
It is also implemented for String and Vec: String "derefs" to the simpler str, Vec<T> derefs to the simpler [T].
Writing *s is just manually invoking Deref::deref to turn s into its "simpler form". It is almost always written &*s, however: although the Deref::deref signature says it returns a borrowed pointer (&Target), the compiler inserts a second automatic deref. This is so that, for example, { let x = Box::new(42i32); *x } results in an i32 rather than a &i32.
So &*s is really just shorthand for Deref::deref(&s).
s[..] is syntactic sugar for s.index(RangeFull), implemented by the Index trait. This means to slice the "whole range" of the thing being indexed; for both String and Vec, this gives you a slice of the entire contents. Again, the result is technically a borrowed pointer, but Rust auto-derefs this one as well, so it's also almost always written &s[..].
So what's the difference? Hold that thought; let's talk about Deref chaining.
To take a specific example, because you can view a String as a str, it would be really helpful to have all the methods available on strs automatically available on Strings as well. Rather than inheritance, Rust does this by Deref chaining.
The way it works is that when you ask for a particular method on a value, Rust first looks at the methods defined for that specific type. Let's say it doesn't find the method you asked for; before giving up, Rust will check for a Deref implementation. If it finds one, it invokes it and then tries again.
This means that when you call s.chars() where s is a String, what's actually happening is that you're calling s.deref().chars(), because String doesn't have a method called chars, but str does (scroll up to see that String only gets this method because it implements Deref<Target=str>).
Getting back to the original question, the difference between &*s and &s[..] is in what happens when s is not just String or Vec<T>. Let's take a few examples:
s: String; &*s: &str, &s[..]: &str.
s: &String: &*s: &String, &s[..]: &str.
s: Box<String>: &*s: &String, &s[..]: &str.
s: Box<Rc<&String>>: &*s: &Rc<&String>, &s[..]: &str.
&*s only ever peels away one layer of indirection. &s[..] peels away all of them. This is because none of Box, Rc, &, etc. implement the Index trait, so Deref chaining causes the call to s.index(RangeFull) to chain through all those intermediate layers.
Which one should you use? Whichever you want. Use &*s (or &**s, or &***s) if you want to control exactly how many layers of indirection you want to strip off. Use &s[..] if you want to strip them all off and just get at the innermost representation of the value.
Or, you can do what I do and use &*s because it reads left-to-right, whereas &s[..] reads left-to-right-to-left-again and that annoys me. :)
Addendum
There's the related concept of Deref coercions.
There's also DerefMut and IndexMut which do all of the above, but for &mut instead of &.
They are completely the same for String and Vec.
The [..] syntax results in a call to Index<RangeFull>::index() and it's not just sugar for [0..collection.len()]. The latter would introduce the cost of bound checking. Gladly this is not the case in Rust so they both are equally fast.
Relevant code:
index of String
deref of String
index of Vec (just returns self which triggers the deref coercion thus executes exactly the same code as just deref)
deref of Vec
Consider below struct:
typedef struct _Index {
NSInteger category;
NSInteger item;
} Index;
If I use this struct as a property:
#property (nonatomic, assign) Index aIndex;
When I access it without any initialization right after a view controller alloc init, LLDB print it as:
(lldb) po vc.aIndex
(category = 0, item = 0)
(lldb) po &_aIndex
0x000000014e2bcf70
I am a little confused, the struct already has valid memory address, even before I want to allocate one. Does Objective-C initialize struct automatically? If it is a NSObject, I have to do alloc init to get a valid object, but for C struct, I get a valid struct even before I tried to initialize it.
Could somebody explains, and is it ok like this, not manually initializing it?
To answer the subquestion, why you cannot assign to a structure component returned from a getter:
(As a motivation this is, because I have read this Q several times.)
A. This has nothing to do with Cbjective-C. It is a behavior stated in the C standard. You can check it for simple C code:
NSMakeSize( 1.0, 2.0 ).width = 3.0; // Error
B. No, it is not an improvement of the compiler. If it would be so, a warning would be the result, not an error. A compiler developer does not have the liberty to decide what an error is. (There are some cases, in which they have the liberty, but this are explicitly mentioned.)
C. The reason for this error is quite easy:
An assignment to the expression
NSMakeSize( 1.0, 2.0 ).width
would be legal, if that expression is a l-value. A . operator's result is an l-value, if the structure is an l-value:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,82) and is an lvalue if the first expression is an lvalue.
ISO/IEC 9899:TC3, 6.5.2.3
Therefore it would be assignable, if the expression
NSMakeSize( 1.0, 2.0 )
is an l-value. It is not. The reason is a little bit more complex. To understand that you have to know the links between ., -> and &:
In contrast to ., -> always is an l-value.
A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. 83)
Therefore - that is what footnote 83 explains – ->, &, and . has a link:
If you can calculate the address of a structure S having a component C with the & operator, the expression (&S)->C is equivalent to S.C. This requires that you can calculate the address of S. But you can never do that with a return value, even it is a simple integer …
int f(void)
{
return 1;
}
f()=5; // Error
… or a pointer …
int *f(void)
{
return NULL;
}
f()=NULL; // Error
You always get the same error: It is not assignable. Because it is a r-value. This is obvious, because it is not clear,
a) whether the way the compiler returns a value, esp. whether he does it in address space.
b) when the time the life time of the returned value is over
Going back to the structure that means that the return value is a r-value. Therefore the result of the . operator on that is a r-value. You are not allowed to assign a value to a r-value.
D. The solution
There is a solution to assign to a "returned structure". One might decide, whether it is good or not. Since -> always is an l-value, you can return a pointer to the structure. Dereferencing this pointer with the -> operator has always an l-value as result, so you can assign a value to it:
// obj.aIndex returns a pointer
obj.aIndex->category = 1;
You do not need #public for that. (What really is a bad idea.)
The semantics of the property are to copy the struct, so it doesn't need to be allocated and initialized like an Objective-C object would. It's given its own space like a primitive type is.
You will need to be careful updating it, as this won't work:
obj.aIndex.category = 1;
Instead you will need to do this:
Index index = obj.aIndex;
index.category = 1;
obj.aIndex = index;
This is because the property getter will return a copy of the struct and not a reference to it (the first snippet is like the second snippet, without the last line that assigns the copy back to the object).
So you might be better off making it a first class object, depending on how it will be used.
I've noticed that strings, numbers, bool and nil data seem to be straight forward to work with. But when it comes to functions, tables, etc. you get a reference instead of the actual object.
Is there a name for this phenomenon? Is there terminology that describes the distinction between the way these 2 sets of types are handled?
a = "hi"
b = 1
c = true
d = nil
e = {"joe", "mike"}
f = function () end
g = coroutine.create(function () print("hi") end)
print(a) --> hi
print(b) --> 1
print(c) --> true
print(d) --> nil
print(e) --> table: 0x103350
print(f) --> function: 0x1035a0
print(g) --> thread: 0x103d30
What you're seeing here is an attempt by the compiler to return a string representation of the object. For simple object types the __tostring implementation is provided already, but for other more complex types there is no intuitive way of returning a string representation.
See Lua: give custom userdata a tostring method for more information which might help!
.Net (Microsoft Visual Basic, Visual C++ and C#) would describe them as value types and reference types, where reference types refer to a value by reference and value types hold the actual values.
I don't think lua puts too much thought into it given that it's supposed to be a simpler interpreted language and ultimately it doesn't matter as much because lua is a fairly weakly typed language (ie it doesn't enforce type safety beyond throwing an error when you try to use operations on types they can't be used on).
Either way, most programmers in my experience understand them as 'value types' and 'reference types', so I'd say they're the two terms it's best to stick with.
In Lua, numbers are values, everything else is accessible by reference only. But the different behavior on print is just because there's no way to actually print functions (and while tables could have a default behavior for print, they don't - possibly because they're allowed to have cyclic references).
What you are seeing is the behavior of the print function. It will its arguments by using tostring on them. print could be implemented by using io.write like this (simplified a bit):
function print(...)
local args = {n = select('#',...), ...}
for i=1,args.n do
io.write(tostring(args[i]), '\t')
end
io.write('\n')
end
You should notice the call to tostring. By default it returns the representation of numbers, booleans and strings. Since there is no sane default way to convert other types to a string, it only displays the type and a useless internal pointer to the object (so that you can differentiate instances). You can view the source here.
You will be surprised, but there is no value/reference distinction in Lua. :-)
Please read here and here.
Any way to declare a new variable in F# without assigning a value to it?
See Aidan's comment.
If you insist, you can do this:
let mutable x = Unchecked.defaultof<int>
This will assign the absolute zero value (0 for numeric types, null for reference types, struct-zero for value types).
It would be interesting to know why the author needs this in F# (simple example of intended use would suffice).
But I guess one of the common cases when you may use uninitialised variable in C# is when you call a function with out parameter:
TResult Foo<TKey, TResult>(IDictionary<TKey, TResult> dictionary, TKey key)
{
TResult value;
if (dictionary.TryGetValue(key, out value))
{
return value;
}
else
{
throw new ApplicationException("Not found");
}
}
Luckily in F# you can handle this situation using much nicer syntax:
let foo (dict : IDictionary<_,_>) key =
match dict.TryGetValue(key) with
| (true, value) -> value
| (false, _) -> raise <| ApplicationException("Not Found")
You can also use explicit field syntax:
type T =
val mutable x : int
I agree with everyone who has said "don't do it". However, if you are convinced that you are in a case where it really is necessary, you can do this:
let mutable naughty : int option = None
...then later to assign a value.
naughty <- Some(1)
But bear in mind that everyone who has said 'change your approach instead' is probably right. I code in F# full time and I've never had to declare an unassigned 'variable'.
Another point: although you say it wasn't your choice to use F#, I predict you'll soon consider yourself lucky to be using it!
F# variables are by default immutable, so you can't assign a value later. Therefore declaring them without an initial value makes them quite useless, and as such there is no mechanism to do so.
Arguably, a mutable variable declaration could be declared without an initial value and still be useful (it could acquire an initial default like C# variables do), but F#'s syntax does not support this. I would guess this is for consistency and because mutable variable slots are not idiomatic F# so there's little incentive to make special cases to support them.