Is there a difference between slicing and an explicit reborrow when converting Strings to &strs?

Are the following two examples equivalent?
Example 1:
let x = String::new();
let y = &x[..];
Example 2:
let x = String::new();
let y = &*x;
Is one more efficient than the other or are they basically the same?

In the case of String and Vec, they do the same thing. In general, however, they aren't quite equivalent.
First, you have to understand Deref. This trait is implemented in cases where a type is logically "wrapping" some lower-level, simpler value. For example, all of the "smart pointer" types (Box, Rc, Arc) implement Deref to give you access to their contents.
It is also implemented for String and Vec: String "derefs" to the simpler str, Vec<T> derefs to the simpler [T].
Writing *s is just manually invoking Deref::deref to turn s into its "simpler form". It is almost always written &*s, however: although the Deref::deref signature says it returns a borrowed pointer (&Target), the compiler inserts a second automatic deref. This is so that, for example, { let x = Box::new(42i32); *x } results in an i32 rather than a &i32.
So &*s is really just shorthand for Deref::deref(&s).
s[..] is syntactic sugar for s.index(RangeFull), implemented by the Index trait. This means to slice the "whole range" of the thing being indexed; for both String and Vec, this gives you a slice of the entire contents. Again, the result is technically a borrowed pointer, but Rust auto-derefs this one as well, so it's also almost always written &s[..].
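As a quick sketch (assuming a plain String named s with made-up contents), all three spellings below yield the same &str:
use std::ops::Deref;

fn main() {
    let s = String::from("hello");

    let a: &str = &*s;                // manual deref, then re-borrow
    let b: &str = Deref::deref(&s);   // what &*s boils down to
    let c: &str = &s[..];             // Index over the whole RangeFull

    assert_eq!(a, b);
    assert_eq!(b, c);
}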
So what's the difference? Hold that thought; let's talk about Deref chaining.
To take a specific example, because you can view a String as a str, it would be really helpful to have all the methods available on strs automatically available on Strings as well. Rather than inheritance, Rust does this by Deref chaining.
The way it works is that when you ask for a particular method on a value, Rust first looks at the methods defined for that specific type. Let's say it doesn't find the method you asked for; before giving up, Rust will check for a Deref implementation. If it finds one, it invokes it and then tries again.
This means that when you call s.chars() where s is a String, what's actually happening is that you're calling s.deref().chars(): String doesn't have a method called chars, but str does, and String only gets access to it because it implements Deref<Target=str>.
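A minimal sketch of that lookup (the string contents are made up; the explicit form needs the Deref trait in scope):
use std::ops::Deref;

fn main() {
    let s = String::from("abc");

    // String has no chars method; the compiler inserts a deref and
    // finds str::chars through String's Deref<Target = str> impl.
    let implicit = s.chars().count();

    // The same lookup written out by hand.
    let explicit = s.deref().chars().count();

    assert_eq!(implicit, explicit);
}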
Getting back to the original question, the difference between &*s and &s[..] is in what happens when s is not just String or Vec<T>. Let's take a few examples:
s: String; &*s: &str, &s[..]: &str.
s: &String; &*s: &String, &s[..]: &str.
s: Box<String>; &*s: &String, &s[..]: &str.
s: Box<Rc<&String>>; &*s: &Rc<&String>, &s[..]: &str.
&*s only ever peels away one layer of indirection. &s[..] peels away all of them. This is because none of Box, Rc, &, etc. implement the Index trait, so Deref chaining causes the call to s.index(RangeFull) to chain through all those intermediate layers.
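To make the difference concrete, here is a sketch of the Box<String> row from the list above (the variable names and contents are made up):
fn main() {
    let s: Box<String> = Box::new(String::from("hi"));

    // One explicit deref peels only the Box, leaving a &String.
    let one_layer: &String = &*s;

    // Indexing keeps auto-dereffing until it finds an Index impl,
    // so it reaches all the way down to the str.
    let all_layers: &str = &s[..];

    assert_eq!(one_layer.as_str(), all_layers);
}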
Which one should you use? Whichever you want. Use &*s (or &**s, or &***s) if you want to control exactly how many layers of indirection you want to strip off. Use &s[..] if you want to strip them all off and just get at the innermost representation of the value.
Or, you can do what I do and use &*s because it reads left-to-right, whereas &s[..] reads left-to-right-to-left-again and that annoys me. :)
Addendum
There's the related concept of Deref coercions.
There's also DerefMut and IndexMut which do all of the above, but for &mut instead of &.
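A minimal sketch of those mutable counterparts (the string contents and the method calls are just illustrative):
fn main() {
    let mut s = String::from("hello");

    // DerefMut: one explicit mutable reborrow gives a &mut str.
    let upper: &mut str = &mut *s;
    upper.make_ascii_uppercase();

    // IndexMut with RangeFull reaches the same &mut str through indexing.
    let lower: &mut str = &mut s[..];
    lower.make_ascii_lowercase();

    assert_eq!(s, "hello");
}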

They are completely the same for String and Vec.
The [..] syntax results in a call to Index<RangeFull>::index(); it's not just sugar for [0..collection.len()], which would introduce the cost of bounds checking. Happily, that is not the case in Rust, so both forms are equally fast.
Relevant code:
index of String
deref of String
index of Vec (just returns self, which triggers the deref coercion and thus executes exactly the same code as deref)
deref of Vec

Armadillo subview issue

The following fragment produces a compilation error:
arma::Mat<double> a(10,10,arma::fill::zeros);
arma::ucolvec w = whatever1;
whatever2 = a.rows(w).each_col() + another-col-vector;
The error is that arma::subview_elem2 has no member named each_col.
In a number of cases in Armadillo, the standard array functions are not always available on expressions or results of other function calls. Clearly the rows() function does not return a Mat object, but a subview_elem2 object, presumably for optimization. Another way to do this would be to declare all the array functions in interfaces/pure abstract classes that Mat and other internal classes, such as subviews, implement. It seems it should be possible to make all Armadillo array expressions interchangeable with array objects aside from write operations for expressions that only generate r-values.
So... I could wish for the following
a) An explanation of which methods are not available for which results.
b) Preferably, enabling all combinations of array methods that make sense.
Absent the above, how can I accomplish the desired result, which is to evaluate the expression:
a.rows(w).each_col()
??
Some prior information about armadillo
The armadillo library uses templates heavily, and most operations return expression templates. Only when you assign the result to a variable is the actual computation performed. This is why you should not store the result of an armadillo computation using auto.
For instance, given some matrices A, B and C, something like
auto D = A * B + C;
will not perform the computation and only the expression template is stored in D. On the other hand, using
arma::mat D = A * B + C;
will force the computation to happen and the result is stored in D.
Solution to your problem
Specific to your question: something like a.rows(w) returns an expression template of type subview_elem2 (defined in the source file armadillo_bits/subview_elem2_bones.hpp). This "temporary type" does not have an .each_col method, which results in the error you got. One way around this is to store the result of a.rows(w) in a variable, but since you are not interested in the variable itself, you can use the .eval() method. The .eval() method forces the expression template to perform the actual computation up to that point, so the subsequent call to .each_col will work. That is, replace
a.rows(w).each_col() + another-col-vector;
with
a.rows(w).eval().each_col() + another-col-vector;
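Putting it together, here is a minimal self-contained sketch of the fix; the matrix sizes, the selected rows in w, and the added column vector v are all made up for the example:
#include <armadillo>

int main() {
    arma::mat a(10, 10, arma::fill::zeros);   // example matrix
    arma::ucolvec w = {1, 3, 5};              // example row indices
    arma::colvec v(3, arma::fill::ones);      // must match the number of selected rows

    // a.rows(w) is a subview_elem2 expression template with no .each_col();
    // .eval() materialises it into a Mat first, so .each_col() is available.
    arma::mat result = a.rows(w).eval().each_col() + v;

    result.print("result:");
    return 0;
}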

Performant, idiomatic way to concatenate two chars that are not in a list into a string

I've done most of my development in C# and am just learning F#. Here's what I want to do in C#:
string AddChars(char char1, char char2) => char1.ToString() + char2.ToString();
EDIT: added ToString() method to the C# example.
I want to write the same method in F# and I don't know how to do it other than this:
let addChars char1 char2 = Char.ToString(char1) + Char.ToString(char2)
Is there a way to concatenate these chars into a string without converting both into strings first?
Sidenote:
I also have considered making a char array and converting that into a string, but that seems similarly wasteful.
let addChars (char1:char) (char2: char) = string([|char1; char2|])
As I said in my comment, your C# code is not going to do what you want (i.e. concatenate the characters into a string). In C#, adding a char and a char will result in an int. The reason for this is that the char type doesn't define a + operator, so C# reverts to the nearest compatible type that does, which happens to be int. (Source)
So to accomplish this behavior, you will need to do something similar to what you are already trying to do in F#:
char a = 'a';
char b = 'b';
// This is the wrong way to concatenate chars, because the
// chars will be treated as ints and the result will be 195.
Console.WriteLine(a + b);
// These are the correct ways to concatenate characters into
// a single string. The result of all of these will be "ab".
// The third way is the recommended way as it is concise and
// involves creating the fewest temporary objects.
Console.WriteLine(a.ToString() + b.ToString());
Console.WriteLine(Char.ToString(a) + Char.ToString(b));
Console.WriteLine(new String(new[] { a, b }));
(See https://dotnetfiddle.net/aEh1FI)
F# is the same way in that concatenating two or more chars doesn't result in a String. Unlike C#, it results instead in another char, but the process is the same - the char values are treated like int and added together, and the result is the char representation of the sum.
So really, the way to concatenate chars into a String in F# is what you already have, and is the direct translation of the C# equivalent:
let a = 'a'
let b = 'b'
// This is still the wrong way (prints 'Ã')
printfn "%O" (a + b)
// These are still the right ways (prints "ab")
printfn "%O" (a.ToString() + b.ToString())
printfn "%O" (Char.ToString(a) + Char.ToString(b))
printfn "%O" (String [| a;b |]) // This is still the best way
(See https://dotnetfiddle.net/ALwI3V)
The reason the "String from char array" approach is the best way is two-fold. First, it is the most concise, since you can see that that approach offers the shortest line of code in both languages (and the difference only increases as you add more and more chars together). And second, only one temporary object is created (the array) before the final String, whereas the other two methods involve making two separate temporary String objects to feed into the final result.
(Also, I'm not sure if it works this way, as the String constructors are hidden in external sources, but I imagine that the array passed into the constructor would be used as the String's backing data, so it wouldn't end up getting wasted at all.) Strings are immutable, but using the passed array directly as the created String's backing data could result in a situation where a reference to the array is held elsewhere in the program and jeopardizes the String's immutability, so this speculation wouldn't fly in practice. (Credit: @CaringDev)
Another option you could use in F# that could be more idiomatic is the sprintf function to combine the two characters (Credit: @rmunn):
let a = 'a'
let b = 'b'
let s = sprintf "%c%c" a b
printfn "%O" s
// Prints "ab"
(See https://dotnetfiddle.net/Pp9Tee)
A note of warning about this method, however: it is almost certainly going to be much slower than any of the other three methods listed above. That's because instead of processing array or String data directly, sprintf performs more advanced formatting logic on the output. (I'm not in a position where I could benchmark this myself at the moment, but plugged into @TomasPetricek's benchmarking code below, I wouldn't be surprised if you got performance hits of 10x or more.)
This might not be a big deal as for a single conversion it will still be far faster than any end-user could possibly notice, but be careful if this is going to be used in any performance-critical code.
The answer by @Abion47 already lists all the possible sensible methods I can think of. If you are interested in performance, then you can run a quick experiment using the F# Interactive #time feature:
#time
open System
open System.Text
let a = 'a'
let b = 'b'
Comparing the three methods, the one with String [| a; b |] turns out to be about twice as fast as the methods involving ToString. In practice, that's probably not a big deal unless you are doing millions of such operations (as my experiment does), but it's an interesting fact to know:
// 432ms, 468ms, 472ms
for i in 0 .. 10000000 do
    let s = a.ToString() + b.ToString()
    ignore s

// 396ms, 440ms, 458ms
for i in 0 .. 10000000 do
    let s = Char.ToString(a) + Char.ToString(b)
    ignore s

// 201ms, 171ms, 170ms
for i in 0 .. 10000000 do
    let s = String [| a;b |]
    ignore s

When to use parentheses when calling a function in F#?

I'm learning about f# and I understand you don't need to use parentheses when calling a function.
Ex
let addOne arg1 =
    arg1 + 1

addOne 1
vs
this.GetType()
Why do I have to use parentheses on the second function?
There is a bit of a mismatch between working with .NET libraries and working with F# libraries when it comes to parameters, but you can generally see () not as parentheses, but as a special value of type unit that means "no useful information".
This means that when you say:
addOne 1
You are calling addOne with a value - number 1 - as a parameter. Now, when you apply the same reading to the second example:
this.GetType()
You can read this as calling this.GetType with a value - the special () unit value as a parameter. If you wanted to be consistent, you could write this with space too:
this.GetType ()
In practice, most people will omit the space when calling .NET libraries. When you do not write the space, F# also supports method chaining so you can write e.g. foo().bar().
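As a small sketch of this reading (the function name sayHello is made up for the example), a function whose only parameter is unit is declared and called with ():
// A function whose only parameter is the unit value ().
let sayHello () = "Hello"

// Both calls pass the single unit value as the argument.
let g1 = sayHello ()   // with a space, like any other argument
let g2 = sayHello()    // without the space, the usual style for .NET-like method calls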
Many F# functions taking multiple parameters will use the "curried" form, which means that the parameters need to be separated by spaces. For example:
let add a b = a + b
let mul a b = a * b
add 10 (mul 20 3)
Here, you need parentheses around the second expression, so that the compiler knows how to parse the code. This is in contrast with typical .NET methods, which take parameters as a tuple. F# tuples are written as (10, "hello") and so you can see a method call as an ordinary call accepting a tuple:
some.Operation (10, "Hello")
Again, typically you wouldn't write the space here, because you know this is actually a .NET method call, rather than "passing tuple to a function", but conceptually, you can think of it in both ways.
This is the summary - there are a few corner cases where method calls do not really behave like tuples (e.g. when it comes to named parameters), but this way of thinking about it should give you an idea about how things work.

Is there a name for expressions that return what they are, instead of a reference?

I've noticed that strings, numbers, bool and nil data seem to be straight forward to work with. But when it comes to functions, tables, etc. you get a reference instead of the actual object.
Is there a name for this phenomenon? Is there terminology that describes the distinction between the way these 2 sets of types are handled?
a = "hi"
b = 1
c = true
d = nil
e = {"joe", "mike"}
f = function () end
g = coroutine.create(function () print("hi") end)
print(a) --> hi
print(b) --> 1
print(c) --> true
print(d) --> nil
print(e) --> table: 0x103350
print(f) --> function: 0x1035a0
print(g) --> thread: 0x103d30
What you're seeing here is an attempt by the runtime to produce a string representation of the object. For simple object types the __tostring implementation is provided already, but for other, more complex types there is no intuitive way of returning a string representation.
See Lua: give custom userdata a tostring method for more information which might help!
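For a table (rather than userdata), the same idea can be sketched directly; the point table and its fields are made up for the example:
-- Without a __tostring metamethod, print would show "table: 0x...".
local point = setmetatable({ x = 1, y = 2 }, {
  __tostring = function(self)
    return string.format("point(%d, %d)", self.x, self.y)
  end
})

print(point)  --> point(1, 2)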
.Net (Microsoft Visual Basic, Visual C++ and C#) would describe them as value types and reference types, where reference types refer to a value by reference and value types hold the actual values.
I don't think Lua puts too much thought into it, given that it's supposed to be a simpler interpreted language, and ultimately it doesn't matter as much because Lua is a fairly weakly typed language (i.e. it doesn't enforce type safety beyond throwing an error when you try to use operations on types they can't be used on).
Either way, most programmers in my experience understand them as 'value types' and 'reference types', so I'd say they're the two terms it's best to stick with.
In Lua, numbers are values, everything else is accessible by reference only. But the different behavior on print is just because there's no way to actually print functions (and while tables could have a default behavior for print, they don't - possibly because they're allowed to have cyclic references).
What you are seeing is the behavior of the print function. It converts its arguments by calling tostring on them. print could be implemented using io.write like this (simplified a bit):
function print(...)
  local args = {n = select('#', ...), ...}
  for i = 1, args.n do
    io.write(tostring(args[i]), '\t')
  end
  io.write('\n')
end
You should notice the call to tostring. By default it returns the representation of numbers, booleans and strings. Since there is no sane default way to convert other types to a string, it only displays the type and a useless internal pointer to the object (so that you can differentiate instances). You can view the source here.
You will be surprised, but there is no value/reference distinction in Lua. :-)
Please read here and here.

F# Functions vs. Values

This is a pretty simple question, and I just wanted to check that what I'm doing and how I'm interpreting the F# makes sense. If I have the statement
let printRandom =
    let x = MyApplication.getRandom()
    printfn "%d" x
    x
Instead of creating printRandom as a function, F# runs it once and then assigns it a value. So, now, when I call printRandom, instead of getting a new random value and printing it, I simply get whatever was returned the first time. I can get around this my defining it as such:
let printRandom() =
    let x = MyApplication.getRandom()
    printfn "%d" x
    x
Is this the proper way to draw this distinction between parameter-less functions and values? This seems less than ideal to me. Does it have consequences in currying, composition, etc?
The right way to look at this is that F# has no such thing as parameter-less functions. All functions have to take a parameter, but sometimes you don't care what it is, so you use () (the singleton value of type unit). You could also make a function like this:
let printRandom unused =
    let x = MyApplication.getRandom()
    printfn "%d" x
    x
or this:
let printRandom _ =
    let x = MyApplication.getRandom()
    printfn "%d" x
    x
But () is the default way to express that you don't use the parameter. It expresses that fact to the caller, because the type is unit -> int not 'a -> int; as well as to the reader, because the call site is printRandom () not printRandom "unused".
Currying and composition do in fact rely on the fact that all functions take one parameter and return one value.
The most common way to write calls with unit, by the way, is with a space, especially in the non .NET relatives of F# like Caml, SML and Haskell. That's because () is a singleton value, not a syntactic thing like it is in C#.
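A small sketch of that last point (42 stands in for MyApplication.getRandom(), which isn't available here): the unit value can be bound to a name and passed like any other value.
// () is an ordinary value of type unit, so it can be bound and passed around.
let u : unit = ()

let printRandom () =
    let x = 42            // stand-in for MyApplication.getRandom()
    printfn "%d" x
    x

let r1 = printRandom ()   // passing the unit literal
let r2 = printRandom u    // passing the same unit value through a binding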
Your analysis is correct.
The first instance defines a value and not a function. I admit this caught me a few times when I started with F# as well. Coming from C# it seems very natural that an assignment expression which contains multiple statements must be a lambda and hence delay evaluated.
This is just not the case in F#. Statements can be almost arbitrarily nested (and it rocks for having locally scoped functions and values). Once you get comfortable with this you start to see it as an advantage as you can create functions and continuations which are inaccessible to the rest of the function.
The second approach is the standard way for creating a function which logically takes no arguments. I don't know the precise terminology the F# team would use for this declaration though (perhaps a function taking a single argument of type unit). So I can't really comment on how it would affect currying.
Is this the proper way to draw this distinction between parameter-less functions and values? This seems less than ideal to me. Does it have consequences in currying, composition, etc?
Yes, what you describe is correct.
For what it's worth, it has a very interesting consequence: the ability to partially evaluate functions on declaration. Compare these two functions:
// val contains : string -> bool
let contains =
let people = set ["Juliet"; "Joe"; "Bob"; "Jack"]
fun person -> people.Contains(person)
// val contains2 : string -> bool
let contains2 person =
let people = set ["Juliet"; "Joe"; "Bob"; "Jack"]
people.Contains(person)
Both functions produce identical results, but contains creates its people set on declaration and reuses it, whereas contains2 creates its people set every time you call the function. End result: contains is slightly faster. So knowing the distinction here can help you write faster code.
Assignment bodies looking like function bodies have caught a few programmers unaware. You can make things even more interesting by having the assignment return a function:
let foo =
    printfn "This runs at startup"
    (fun () -> printfn "This runs every time you call foo ()")
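Called a couple of times, the value above behaves like this (a small usage sketch, assuming foo is defined as shown):
// "This runs at startup" was already printed once, when foo was bound.
foo ()   // prints "This runs every time you call foo ()"
foo ()   // prints it again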
I just wrote a blog post about it at http://blog.wezeku.com/2010/08/23/values-functions-and-a-bit-of-both/.
