Returning a constructed recursive data structure in Rust - parsing

I've just started learning Rust and am coming from a functional programming background. I'm trying to create a parser in Rust and have defined this recursive data structure:
enum SimpleExpression<'a> {
Number(u64),
FunctionCall(Function, &'a SimpleExpression<'a>, &'a SimpleExpression<'a>)
}
I initially had defined it as
enum SimpleExpression {
Number(u64),
FunctionCall(Function, SimpleExpression, SimpleExpression)
}
but got a complaint from the compiler saying this type had an infinite size. I'm not used to having to worry about managing memory so this confused me for a second but now it makes a lot of sense. Rust cannot allocate memory for a data structure where the size is not defined. So changing the SimpleExpression to &'a SimpleExpression<'a> makes sense.
The second problem I came across was when implementing the parsing function (excuse the verbosity):
fn parse_simple_expr(input: &str) -> Option<(SimpleExpression, usize)> {
match parse_simple_expr(&input) {
Some((left, consumed1)) => match parse_function_with_whitespace(&input[consumed1..]) {
Some((func, consumed2)) => match parse_simple_expr(&input[consumed1+consumed2..]) {
Some((right, consumed3)) => Some((
SimpleExpression::FunctionCall(func, &left.clone(), &right.clone()),
consumed1 + consumed2 + consumed3
)),
None => None
},
None => None
},
None => None
}
}
The problem with this function is that I am creating SimpleExpression objects inside it and then trying to return references to them. Those objects are dropped when the function returns, and Rust does not allow dangling references, so I get the error "cannot return value referencing temporary value" on &left.clone() and &right.clone().
It makes sense to me why this does not work, but I am wondering whether there is another way to build a recursive object and return it from a function. Or is there some fundamental reason this will never work, and if so, are there any good alternatives? Since my code is confusing, here is a simpler example of a recursive structure with the same limitation, in case it helps to understand the issue:
enum List<'a> {
End,
Next((char, &'a List<'a>))
}
fn create_linked(input: &str) -> List {
match input.chars().next() {
Some(c) => List::Next((c, &create_linked(&input[1..]))),
None => List::End
}
}
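One common way around both the infinite-size error and the dangling reference is to make the recursive fields owned heap pointers with Box, so the enum has a known size and the function can return the whole value by move. The following is only a minimal sketch under that assumption: the Function enum is a hypothetical stand-in for the one in the question, and the derives are added just so the values can be compared.

```rust
// Hypothetical stand-in for the question's `Function` type.
#[derive(Debug, PartialEq)]
enum Function {
    Add,
}

// Each recursive field is a Box (one pointer wide), so the enum has a
// finite, known size and owns its children instead of borrowing them.
#[derive(Debug, PartialEq)]
enum SimpleExpression {
    Number(u64),
    FunctionCall(Function, Box<SimpleExpression>, Box<SimpleExpression>),
}

// The simpler list example from the question, rewritten the same way.
#[derive(Debug, PartialEq)]
enum List {
    End,
    Next(char, Box<List>),
}

fn create_linked(input: &str) -> List {
    let mut chars = input.chars();
    match chars.next() {
        // `chars.as_str()` is the rest of the input after the first char,
        // which also avoids slicing in the middle of a multi-byte character.
        Some(c) => List::Next(c, Box::new(create_linked(chars.as_str()))),
        None => List::End,
    }
}

// Builds `left <func> right` from already-parsed, owned parts.
fn make_call(func: Function, left: SimpleExpression, right: SimpleExpression) -> SimpleExpression {
    SimpleExpression::FunctionCall(func, Box::new(left), Box::new(right))
}
```

With this shape, the parser in the question would return SimpleExpression::FunctionCall(func, Box::new(left), Box::new(right)) without the lifetime parameter or the clone calls, since left and right are moved into the boxes.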

Does Dart have a comma operator?

Consider the following line of code, which doesn't compile in Dart because the language lacks a comma operator, although comparable code is fine in JavaScript or C++:
final foo = (ArgumentError.checkNotNull(value), value) * 2;
The closest I could get with an ugly workaround is
final foo = last(ArgumentError.checkNotNull(value), value) * 2;
with function
T last<T>(void op, T ret) => ret;
Is there a better solution?
Dart does not have a comma operator similar to the one in JavaScript.
There is no obviously better solution than what you already have.
The work-around operation you introduced is how I would solve it. I usually call it seq for "sequence" if I write it.
There is sadly no good way to use an extension operator because you need to be generic on the second operand and operators cannot be generic. You could use an extension method like:
extension Seq on void {
T seq<T>(T next) => next;
}
Then you can write ArgumentError.checkNotNull(value).seq(value).
(For what it's worth, the ArgumentError.checkNotNull function has been changed to return its value, but that change was made after releasing Dart 2.7, so it will only be available in the next release after that.)
If the overhead doesn't matter, you can use closures without arguments for a similar effect (and also more complex operations than just a sequence of expressions).
final foo = () {
ArgumentError.checkNotNull(value);
return value;
} ();
This is not great for hot paths due to the overhead incurred by creating and calling a closure, but can work reasonably well outside those.
If you need this kind of test-plus-initialization pattern more than once, the cleanest way would arguably be to put it in a function of its own, anyway.
T ensureNotNull<T>(T value) {
ArgumentError.checkNotNull(value);
return value;
}
final foo = ensureNotNull(value);

Type inference behaves differently for similar cases

Running the following code (Dart 2.3) throws the exception:
type 'List<dynamic>' is not a subtype of type 'List<int>'
bar() => 0;
foo() => [bar()];
main() {
var l = [1, 2, 3];
l = foo();
}
However, this slightly altered example runs correctly:
main() {
bar() => 0;
var l = [1, 2, 3];
l = [bar()];
}
As does this:
main() {
bar() => 0;
foo() => [bar()];
var l = [1, 2, 3];
l = foo();
}
What is it about Dart's type inference algorithm that makes these cases behave differently? Seems like the types of the functions foo and bar should be pretty easy to infer, since they always return the same value. It also isn't obvious to me why moving around the site of the function declaration would change type inference in these cases.
Anyone know what's going on here?
Leaf Petersen explains it in a comment to dart-lang/sdk issue #33137: Type inference of function return value:
This is by design. We do infer return types of non-recursive local functions (functions declared inside of the scope of another function or method), but for top level functions and methods, we do not infer return types (except via override inference). The reasons are as follows:
Methods and top level functions are usually part of the API of a program, and it's valuable to be able to quickly read off the API of a piece of code. Doing method body based return type inference means that understanding the signature of the API requires reading through the method body.
Methods and top level functions can be arbitrarily mutually recursive, which makes the inference problem much harder and more expensive.
For primarily these reasons, we do not infer return types for top level functions and methods. Leaving off the return type is just another way of saying dynamic.
If you set
analyzer:
  strong-mode:
    implicit-dynamic: false
in your analysis_options.yaml file, then dartanalyzer will generate errors when top-level functions have an implicit dynamic return type:
error • Missing return type for 'bar' at example.dart:1:1 • strong_mode_implicit_dynamic_return
error • Missing return type for 'foo' at example.dart:2:1 • strong_mode_implicit_dynamic_return
It looks like nested functions are treated differently than top-level functions. It is probably a bug. I get the following from Dartpad on Dart 2.3.1.
foo() => 0;
bar() => [foo()];
main() {
baz() => 0;
qux() => [baz()];
print(foo.runtimeType);
print(bar.runtimeType);
print(baz.runtimeType);
print(qux.runtimeType);
}
// () => dynamic
// () => dynamic
// () => int
// () => List<int>
Explanation here:
This is expected behavior.
Local functions use type inference to deduce their return type, but top-level/class-level functions do not.
The primary reason for the distinction is that top-level and class level functions exist at the same level as type declarations. Solving cyclic dependencies between types and functions gets even harder if we have to also analyze function bodies at a time where we don't even know the signature of classes yet.
When top-level inference has completed, we do know the type hierarchies, and where top-level functions are unordered, they can refer to each other in arbitrary ways, local functions are linear and can only depend on global functions or prior local functions. That means that we can analyze the function body locally to find the return type, without needing to look at anything except the body itself, and things we have already analyzed.

Swift `in` keyword meaning?

I am trying to implement some code from parse.com and I noticed the keyword in after the Void.
I am stumped: what is this? On the second line you can see the Void in:
PFUser.logInWithUsernameInBackground("myname", password:"mypass") {
(user: PFUser?, error: NSError?) -> Void in
if user != nil {
// Do stuff after successful login.
} else {
// The login failed. Check error to see why.
}
}
The docs don't document this. I know the in keyword is used in for loops.
Anyone confirm?
In a named function, we declare the parameters and return type in the func declaration line.
func say(s:String)->() {
// body
}
In an anonymous function, there is no func declaration line - it's anonymous! So we do it with an in line at the start of the body instead.
{
(s:String)->() in
// body
}
(That is the full form of an anonymous function. But then Swift has a series of rules allowing the return type, the parameter types, and even the parameter names and the whole in line to be omitted under certain circumstances.)
Closure expression syntax has the following general form, as given in The Swift Programming Language:
{ (parameters) -> return type in
    statements
}
The question of what purpose in serves has been well-answered by other users here; in summary: in is a keyword defined in the Swift closure syntax as a separator between the function type and the function body in a closure:
{ /*parameters and type*/ in /*function body*/ }
But for those who might be wondering "but why specifically the keyword in?", here's a bit of history shared by Joe Groff, Senior Swift Compiler Engineer at Apple, on the Swift forums:
It's my fault, sorry. In the early days of Swift, we had a closure
syntax that was very similar to traditional Javascript:
func (arg: Type, arg: Type) -> Return { ... }
While this is nice and regular syntax, it is of course also very bulky
and awkward if you're trying to support expressive functional APIs,
such as map/filter on collections, or if you want libraries to be able
to provide closure-based APIs that feel like extensions of the
language.
Our earliest adopters at Apple complained about this, and mandated
that we support Ruby-style trailing closure syntax. This is tricky to
fit into a C-style syntax like Swift's, and we tried many different
iterations, including literally Ruby's {|args| } syntax, but many of
them suffered from ambiguities or simply distaste and revolt from our
early adopters. We wanted something that still looked like other parts
of the language, but which could be parsed unambiguously and could
span the breadth of use cases from a fully explicit function signature
to extremely compact.
We had already taken in as a keyword, we couldn't use -> like Java
does because it's already used to denote the return type, and we were
concerned that using => like C# would be too visually confusing. in
made xs.map { x in f(x) } look vaguely like for x in xs { f(x) },
and people hated it less than the alternatives.
*Formatting and emphasis mine. And thanks to Nikita Belov's post on the Swift forums for helping my own understanding.

Can I create an "unsafe closure"?

I have some code that, when simplified, looks like:
fn foo() -> Vec<u8> {
unsafe {
unsafe_iterator().map(|n| wrap_element(n)).collect()
}
}
The iterator returns items that would be invalidated if the underlying data changed. Sadly, I'm unable to rely on the normal Rust mechanism of mut here (I'm doing some... odd things).
To rectify the unsafe-ness, I traverse the iterator all at once and make copies of each item (via wrap_element) and then throw it all into a Vec. This works because nothing else has a chance to come in and modify the underlying data.
The code works as-is now, but since I use this idiom a few times, I wanted to DRY up my code a bit:
fn zap<F>(f: F) -> Vec<u8>
where F: FnOnce() -> UnsafeIter
{
f().map(|n| wrap_element(n)).collect()
}
fn foo() -> Vec<u8> {
zap(|| unsafe { unsafe_iterator() }) // Unsafe block
}
My problem with this solution is that the call to unsafe_iterator is unsafe, and it's the wrap_element / collect that makes it safe again. The way that the code is structured does not convey that at all.
I'd like to somehow mark my closure as being unsafe, and then it's zap's responsibility to make it safe again.
It's not possible to create an unsafe closure in the same vein as an unsafe fn, since closures are just anonymous types with implementations of the Fn, FnMut, and/or FnOnce family of traits. Since those traits do not have unsafe methods, it's not possible to create a closure which is unsafe to call.
You could create a second set of closure traits with unsafe methods, then write implementations for those, but you would lose much of the closure sugar.
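As a rough illustration of that second approach, here is a minimal sketch of a hand-written trait with an unsafe call method. Everything named here is an assumption made for the example: UnsafeFnOnce and MakeIter are hypothetical, and unsafe_iterator and wrap_element are simple stand-ins for the functions in the question. Because closure syntax cannot implement a custom trait, the "closure" has to be written as a struct, which is the loss of sugar mentioned above.

```rust
/// Hypothetical counterpart to `FnOnce() -> Output` whose call is unsafe.
trait UnsafeFnOnce {
    type Output;

    /// # Safety
    /// The caller must consume the result before the underlying data can change.
    unsafe fn call(self) -> Self::Output;
}

/// Stand-in for the question's `unsafe_iterator`.
unsafe fn unsafe_iterator() -> std::vec::IntoIter<u8> {
    vec![1u8, 2, 3].into_iter()
}

/// Stand-in for the question's `wrap_element`; here it only copies the item.
fn wrap_element(n: u8) -> u8 {
    n
}

/// Plays the role of the closure `|| unsafe_iterator()`.
struct MakeIter;

impl UnsafeFnOnce for MakeIter {
    type Output = std::vec::IntoIter<u8>;

    unsafe fn call(self) -> Self::Output {
        unsafe { unsafe_iterator() }
    }
}

/// `zap` owns the unsafe block, so it is the place that restores safety.
fn zap<F>(f: F) -> Vec<u8>
where
    F: UnsafeFnOnce,
    F::Output: Iterator<Item = u8>,
{
    // SAFETY: the iterator is drained immediately into owned copies, so
    // nothing else gets a chance to modify the underlying data.
    let iter = unsafe { f.call() };
    iter.map(wrap_element).collect()
}

fn foo() -> Vec<u8> {
    zap(MakeIter)
}
```

The call site in foo no longer needs an unsafe block, and the safety argument lives next to the collect call that justifies it.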

Why does Rust not have a return value in the main function, and how to return a value anyway?

In Rust the main function is defined like this:
fn main() {
}
This function does not allow for a return value though. Why would a language not allow for a return value and is there a way to return something anyway? Would I be able to safely use the C exit(int) function, or will this cause leaks and whatnot?
As of Rust 1.26, main can return a Result:
use std::fs::File;
fn main() -> Result<(), std::io::Error> {
let f = File::open("bar.txt")?;
Ok(())
}
In this case the process exits with code 1 if an error occurs. With File::open("bar.txt").expect("file not found"); instead, an exit code of 101 is returned (at least on my machine).
Also, if you want to return a more generic error, use:
use std::error::Error;
...
fn main() -> Result<(), Box<dyn Error>> {
...
}
std::process::exit(code: i32) is the way to exit with a code.
Rust does it this way so that there is a consistent explicit interface for returning a value from a program, wherever it is set from. If main starts a series of tasks then any of these can set the return value, even if main has exited.
Rust does have a way to write a main function that returns a value, however it is normally abstracted within stdlib. See the documentation on writing an executable without stdlib for details.
As was noted by others, std::process::exit(code: i32) is the way to go here.
More information about why is given in RFC 1011: Process Exit. Discussion about the RFC is in the pull request of the RFC.
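On the question of leaks: std::process::exit never returns and does not run destructors for values that are still alive on the stack, so a common pattern is to do the real work in a function that returns normally and call exit as the very last step. The sketch below assumes that pattern. The names run, exit_code and run_and_exit are hypothetical, and the exit codes 0 and 2 are only illustrative.

```rust
use std::process;

/// Hypothetical application logic. It returns normally, so every local
/// value it creates is dropped before the process exits.
fn run(path: &str) -> Result<(), String> {
    if path.is_empty() {
        return Err(String::from("no input path given"));
    }
    Ok(())
}

/// Maps the outcome to an exit code (the values are illustrative).
fn exit_code(outcome: &Result<(), String>) -> i32 {
    match outcome {
        Ok(()) => 0,
        Err(_) => 2,
    }
}

/// Intended to be the last call in `main`.
fn run_and_exit(path: &str) -> ! {
    let outcome = run(path);
    if let Err(message) = &outcome {
        eprintln!("error: {message}");
    }
    let code = exit_code(&outcome);
    // Drop explicitly: `process::exit` will not run this destructor for us.
    drop(outcome);
    process::exit(code)
}
```

A main function would then consist of a single call such as run_and_exit("bar.txt").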
The reddit thread on this has a "why" explanation:
Rust certainly could be designed to do this. It used to, in fact.
But because of the task model Rust uses, the fn main task could start a bunch of other tasks and then exit! But one of those other tasks may want to set the OS exit code after main has gone away.
Calling set_exit_status is explicit, easy, and doesn't require you to always put a 0 at the bottom of main when you otherwise don't care.
Try:
use std::process::ExitCode;
fn main() -> ExitCode {
ExitCode::from(2)
}
See the documentation for std::process::ExitCode.
or:
use std::process::{ExitCode, Termination};
pub enum LinuxExitCode { E_OK, E_ERR(u8) }
impl Termination for LinuxExitCode {
fn report(self) -> ExitCode {
match self {
LinuxExitCode::E_OK => ExitCode::SUCCESS,
LinuxExitCode::E_ERR(v) => ExitCode::from(v)
}
}
}
fn main() -> LinuxExitCode {
LinuxExitCode::E_ERR(3)
}
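In early, pre-1.0 versions of Rust you could set the return value with std::os::set_exit_status. That API is not available in current stable Rust, where std::process::exit or returning an ExitCode from main takes its place.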
