Am I using ArrayLists wrong in Zig when a simple variable assignment changes function behaviour? - zig

I have been doing Advent of Code this year, to learn Zig, and I discovered something during Day 5 that really confused me. So: mild spoilers for Day 5 of Advent of Code 2022, I guess?
I decided to implement my solution to Day 5 as an ArrayList of ArrayLists of u8s, which has ended up working well. My full solution file is here (probably terribly un-idiomatic Zig, but we all have to start somewhere).
As part of my solution, I have a function, which I call moveCrates, on a struct which wraps my arraylist of arraylists.
The relevant part of the struct declaration looks like this:
const BunchOfStacks = struct {
    stacks: ArrayList(ArrayList(u8)),
    ...
This function is here, and looks like this:
fn moveCrates(self: *BunchOfStacks, amount: usize, source: usize, dest: usize) !void {
    const source_height = self.stacks.items[source - 1].items.len;
    const crate_slice = self.stacks.items[source - 1].items[(source_height - amount)..];
    try self.stacks.items[dest - 1].appendSlice(crate_slice);
    self.stacks.items[source - 1].shrinkRetainingCapacity(source_height - amount);
}
You can note that I refer 3 times to the source list by the very verbose reference self.stacks.items[source - 1]. This is not how I first wrote this function. I first wrote it like below:
fn moveCrates(self: *BunchOfStacks, amount: usize, source: usize, dest: usize) !void {
    var source_list: ArrayList(u8) = self.stacks.items[source - 1];
    const source_height = source_list.items.len;
    const crate_slice = source_list.items[(source_height - amount)..];
    try self.stacks.items[dest - 1].appendSlice(crate_slice);
    source_list.shrinkRetainingCapacity(source_height - amount);
}
But this second form, where I make a local variable for my convenience, DOES NOT GIVE THE CORRECT RESULTS! It compiles fine, but seems to always point source_list to the same internal ArrayList(u8) (whichever one it first picks) regardless of what the value of source is. This means that the test example produces incorrect output.
This function is called within a loop, like so:
while (instructions.next()) |_| {
    // First part is the verb, this is always "move" so skip it
    // Get the amount next
    const amount: usize = try std.fmt.parseInt(usize, instructions.next().?, 10);
    // now skip _from_
    _ = instructions.next();
    // Now get source
    const source: usize = try std.fmt.parseInt(usize, instructions.next().?, 10);
    // Now skip _to_
    _ = instructions.next();
    // Now get dest
    const dest: usize = try std.fmt.parseInt(usize, instructions.next().?, 10);
    var crates_moved: usize = 0;
    while (crates_moved < amount) : (crates_moved += 1) {
        try stacks_part1.moveCrates(1, source, dest);
    }
    try stacks_part2.moveCrates(amount, source, dest);
}
Ultimately, as you can see, I have just avoided making variable assignments in the function and this passes the test (and the puzzle).
I have checked for issues in the zig repo that might be related, and cannot find anything immediately obvious (search used is this). I've looked on StackOverflow, and found this question, which does have some similarity to my issue (pointers seem a bit whack in while loops), but it's not the same.
I've scoured the zig documentation on loops and assignment, on the site, but don't see anything calling out this behaviour specifically.
I'm assuming I've either completely misunderstood something (or missed something that isn't well documented), or this is a bug — the language is under heavy active dev, after all.
I'm expecting that an assignment like I perform should work as expected — being a simple shorthand to avoid having to write out the repeated self.stacks.items[source - 1], so I'm hopeful that this is something that I'm just doing wrong. Zig version is v0.11.0-dev.537+36da3000c

An array list stores items as a slice (a pointer to the first item plus length).
When you do
var source_list: ArrayList(u8) = self.stacks.items[source - 1];
you make a shallow copy of the array list. You might be confused by your knowledge of some higher-level languages where this would've copied a reference to an object of the array list, but in Zig this isn't the case. In Zig everything is a "value object".
When you call shrinkRetainingCapacity, it changes the length of the items slice, but that change is made to the local copy of the array list. The "real" array list, the one stored inside the outer array list, remains unaffected.
TL;DR You need to use a pointer:
var source_list: *ArrayList(u8) = &self.stacks.items[source - 1];
This fixes the failing test.
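To see the same effect outside Zig, here is a minimal sketch in Rust (chosen only for illustration). The names Stack, shrink_copy and shrink_in_place are hypothetical and not part of the original solution; Stack stands in for the array list struct, whose length lives inside the struct itself.

```rust
// Hypothetical stand-in for the array list struct: the length is a plain
// field, so copying the struct also copies the length.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Stack {
    len: usize,
}

impl Stack {
    fn shrink(&mut self, new_len: usize) {
        self.len = new_len;
    }
}

// Mirrors `var source_list: ArrayList(u8) = self.stacks.items[i];`
// The element is copied, so only the local copy is shrunk.
fn shrink_copy(stacks: &[Stack], i: usize, new_len: usize) {
    let mut local = stacks[i]; // by-value copy of the struct
    local.shrink(new_len); // the stored element is untouched
}

// Mirrors `var source_list: *ArrayList(u8) = &self.stacks.items[i];`
// The stored element is modified through a pointer to it.
fn shrink_in_place(stacks: &mut [Stack], i: usize, new_len: usize) {
    let stored = &mut stacks[i]; // reference to the stored element
    stored.shrink(new_len);
}
```

Calling shrink_copy on a stored stack of length 5 leaves its length at 5, while shrink_in_place actually changes it. That is the same difference as between the two versions of moveCrates.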

Related

Why does the `set` method defined on `Cell<T>` explicitly drop the old value? (Rust)

I'm curious why the set method defined on Cell explicitly drops the old value on its last line.
Shouldn't it be implicitly dropped (memory freed) anyways when the function returns?
use std::mem;
use std::cell::UnsafeCell;
pub struct Cell<T> {
    value: UnsafeCell<T>
}
impl<T> Cell<T> {
    pub fn set(&self, val: T) {
        let old = self.replace(val);
        drop(old); // Is this needed?
    } // old would drop here anyways?
    pub fn replace(&self, val: T) -> T {
        mem::replace(unsafe { &mut *self.value.get() }, val)
    }
}
So why not have set do this only:
pub fn set(&self, val: T) {
    self.replace(val);
}
Or does std::ptr::read do something I don't understand?
It is not needed, but calling drop explicitly can help make code easier to read in some cases. If we only wrote it as a call to replace, it would look like a wrapper function for replace and a reader might lose the context that it does an additional action on top of calling the replace method (dropping the previous value). At the end of the day though it is somewhat subjective on which version to use and it makes no functional difference.
That being said, the real reason is that it did not always drop the previous value when set. Cell<T> previously implemented set to overwrite the existing value via unsafe pointer operations. It was later modified in rust-lang/rust#39264: Extend Cell to non-Copy types so that the previous value would always be dropped. The writer (wesleywiser) likely wanted to more explicitly show that the previous value was being dropped when a new value is written to the cell so the pull request would be easier to review.
Personally, I think this is a good usage of drop since it helps to convey what we intend to do with the result of the replace method.
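A small experiment can confirm that the two forms behave the same. This is only a sketch: Noisy, set_explicit, set_implicit and dropped_by are hypothetical names, and it uses std::cell::Cell directly instead of the simplified Cell from the question.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

// Hypothetical helper type: records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Same shape as the `set` quoted in the question: explicit drop.
fn set_explicit<T>(cell: &Cell<T>, val: T) {
    let old = cell.replace(val);
    drop(old);
}

// The shorter form: the value returned by `replace` is an unused temporary,
// which is dropped at the end of the statement.
fn set_implicit<T>(cell: &Cell<T>, val: T) {
    cell.replace(val);
}

// Runs one of the setters and reports which values had been dropped by the
// time it returned.
fn dropped_by(set: fn(&Cell<Noisy>, Noisy)) -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let cell = Cell::new(Noisy { name: "old", log: Rc::clone(&log) });
    set(&cell, Noisy { name: "new", log: Rc::clone(&log) });
    // Take the snapshot in its own statement so the borrow of the log ends
    // before `cell` (which still holds "new") is dropped.
    let snapshot = log.borrow().clone();
    snapshot
}
```

Both dropped_by(set_explicit) and dropped_by(set_implicit) report that only the old value has been dropped when the setter returns, which matches the claim that the explicit drop makes no functional difference.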

vector<reference_wrapper> .. things going out of scope? how does it work?

Use case: I am converting data from a very old program of mine to a database friendly format. There are parts where I have to do multiple passes over the old data, because in particular the keys have to first exist before I can reference them in relationships. So I thought why not put the incomplete parts in a vector of references during the first pass and return it from the working function, so I can easily use that vector to make the second pass over whatever is still incomplete. I like to avoid pointers when possible so I looked into std::reference_wrapper<T>, which seems like exactly what I need .. except I don't understand its behavior at all.
I have both vector<OldData> old_data and vector<NewData> new_data as member of my conversion class. The converting member function essentially does:
//...
vector<reference_wrapper<NewData>> incomplete;
for(const auto& old_elem : old_data) {
    auto& new_ref = *new_data.insert(new_data.end(), convert(old_elem));
    if(is_incomplete(new_ref)) incomplete.push_back(ref(new_ref));
}
return incomplete;
However, incomplete is already broken immediately after the for loop. The program compiles, but crashes and produces gibberish. Now I don't know if I placed ref correctly, but this is only one of many tries where I tried to put it somewhere else, use push_back or emplace_back instead, etc. ..
Something seems to be going out of scope, but what? Both new_data and old_data are class members, incomplete also lives outside the loop, and according to the documentation, reference_wrapper is copyable.
Here's a simplified MWE that compiles, crashes, and produces gibberish:
// includes ..
using namespace std;
int main() {
    int N = 2; // works correctly for N = 1 without any other changes ... ???
    vector<string> strs;
    vector<reference_wrapper<string>> refs;
    for(int i = 0; i < N; ++i) {
        string& sref = ref(strs.emplace_back("a"));
        refs.push_back(sref);
    }
    for (const auto& r : refs) cout << r.get(); // crash & gibberish
}
This is g++ 10.2.0 with -std=c++17 if it means anything. Now I will probably just use pointers and be done, but I would like to understand what is going on here, documentation / search does not seem to help..
The problem is that std::vector may reallocate its entire storage whenever you add an element, which invalidates every reference previously taken into it. That is also why N = 1 appears to work: nothing is added after the single reference is taken. You can resolve the problem by using std::list instead of std::vector, since inserting into a list never moves the existing elements.

Why is the value moved into the closure here rather than borrowed?

The Error Handling chapter of the Rust Book contains an example on how to use the combinators of Option and Result. A file is read and through application of a series of combinators the contents are parsed as an i32 and returned in a Result<i32, String>.
Now, I got confused when I looked at the code. There, in one closure passed to an and_then, a local String value is created and subsequently passed as a return value to another combinator.
Here is the code example:
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn file_double<P: AsRef<Path>>(file_path: P) -> Result<i32, String> {
    File::open(file_path)
        .map_err(|err| err.to_string())
        .and_then(|mut file| {
            let mut contents = String::new(); // local value
            file.read_to_string(&mut contents)
                .map_err(|err| err.to_string())
                .map(|_| contents) // moved without 'move'
        })
        .and_then(|contents| {
            contents.trim().parse::<i32>()
                .map_err(|err| err.to_string())
        })
        .map(|n| 2 * n)
}
fn main() {
    match file_double("foobar") {
        Ok(n) => println!("{}", n),
        Err(err) => println!("Error: {}", err),
    }
}
The value I am referring to is contents. It is created and later referenced in the map combinator applied to the std::io::Result<usize> return value of Read::read_to_string.
The question: I thought that not marking the closure with move would borrow any referenced value by default, which would result in the borrow checker complaining that contents does not live long enough. However, this code compiles just fine. That means the String contents is moved into, and subsequently out of, the closure. Why is this done without the explicit move?
I thought that not marking the closure with move would borrow any referenced value by default,
Not quite. The compiler does a bit of inspection on the code within the closure body and tracks how the closed-over variables are used.
When the compiler sees that a method is called on a variable, then it looks to see what type the receiver is (self, &self, &mut self). When a variable is used as a parameter, the compiler also tracks if it is by value, reference, or mutable reference. Whatever the most restrictive requirement is will be what is used by default.
Occasionally, this analysis is not complete enough — even though the variable is only used as a reference, we intend for the closure to own the variable. This usually occurs when returning a closure or handing it off to another thread.
In this case, the variable is returned from the closure, which must mean that it is used by value. Thus the variable will be moved into the closure automatically.
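The inference described above can be seen in a small sketch. The function names borrow_only and moved_by_value are hypothetical and exist only for this illustration; neither closure uses the move keyword.

```rust
// The closure only calls `len`, which takes `&self`, so `contents` is
// captured by shared reference and remains usable afterwards.
fn borrow_only() -> (usize, String) {
    let contents = String::from("42");
    let measure = || contents.len();
    let n = measure();
    (n, contents) // still owned here: the closure merely borrowed it
}

// The closure returns `contents` by value, so the compiler infers that it
// must capture `contents` by move, just like `.map(|_| contents)` does.
fn moved_by_value() -> String {
    let contents = String::from("42");
    let take = || contents;
    // Using `contents` after this point would be a compile error, because
    // ownership has moved into the closure.
    take()
}
```

In both cases the capture mode follows from how the variable is used inside the closure body, not from any annotation on the closure.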
Occasionally the move keyword is too big of a hammer as it moves all of the referenced variables in. Sometimes you may want to just force one variable to be moved in but not others. In that case, the best solution I know of is to make an explicit reference and move the reference in:
fn main() {
    let a = 1;
    let b = 2;
    {
        let b = &b;
        needs_to_own_a(move || a_function(a, b));
    }
}
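For a version of this pattern that compiles on its own, here is a sketch in which call_once is a hypothetical stand-in for needs_to_own_a and the closure body replaces a_function. Vectors are used instead of integers so that moving and borrowing are visibly different.

```rust
// Hypothetical consumer: takes ownership of the closure and calls it once.
fn call_once<F: FnOnce() -> usize>(f: F) -> usize {
    f()
}

fn selective_move() -> (usize, Vec<u8>) {
    let a = vec![1u8, 2, 3]; // should be owned by the closure
    let b = vec![10u8, 20]; // should only be borrowed
    let total = {
        let b = &b; // shadow `b` with a reference to it
        // `move` takes ownership of `a` and of the reference `b`,
        // not of the vector that `b` points to.
        call_once(move || a.len() + b.len())
    };
    // `a` has been moved away, but the original `b` is still owned here.
    (total, b)
}
```

The inner block keeps the shadowing reference short-lived, so the original b can be used (or moved) again once the closure has been consumed.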

Can record constructors make record constants more concise?

I have some tabular data:
Foo Bar
-------------
fooes 42
bars 666
...
So, I declare the entity structure:
type TFoo = record
  Foo: string;
  Bar: Integer
end;
and the table of entities:
const FOOES: array [M..N] of TFoo = (
  // Have to specify the field names for each record...
  (Foo: 'fooes'; Bar: 42),
  (Foo: 'bars'; Bar: 666)
  { so on }
);
As you see, this looks quite verbose and redundant, and it is because I initialize all of the fields for all of the records. And there is a lot of editing if I copy tabular data prepared elsewhere. I'd prefer to not enumerate all of the fields and stick to the more laconic C style, that is, constants only. And here comes the record constructor...
Can record constructors help me in this case?
Here's an example in C. You'll notice that we don't have to specify the field names in each declaration:
#include <stdio.h>
typedef struct {
    char foo[10];
    int bar;
} foo;
int main(void) {
    /* Look here */
    foo FOOES[2] = {{"foo", 42}, {"bar", 666}};
    int i = 0;
    for (; i < 2; i++) {
        printf("%s\t%d\n", FOOES[i].foo, FOOES[i].bar);
    }
    return 0;
}
A const is just a read-only var which is loaded/mapped within the code, when the executable is launched.
You can declare a var array of records (or a typed const with the writable typed constants option, {$J+}, enabled), then fill it in the initialization section of the unit.
var FOOES: array [M..N] of TFoo;
....
initialization
SetFooArray(FOOES,['fooes',42,'bar',230]);
...
end.
The custom SetFooArray() function will put all array of const parameters into FOOES.
I use this technique sometimes to initialize computable arrays (e.g. conversion or lookup tables). Sometimes, it does make sense to compute once at startup a huge array, saving some KB of const in the source code, with a few lines of code.
But I'm not sure it will be worth it in your case. The default const declaration is a bit verbose, but that is not a problem if you use Ctrl+C/Ctrl+V or find and replace. It is the most standard form, stays safe if you later change the record layout (whereas the C construction may still compile without error), and creates a true constant.
Record constructors are runtime only and so for constants your current solution is the only option.
If you want it done in source then what you have already typed is your answer. You could, of course, put the data in separate arrays and initialize them that way, but that can make your code look messy.
You could also store them in a text file (Foo=Bar format) and read them into a TStringList at run-time (SL.LoadFromFile()). But even with a sorted TStringList, lookups will be far less efficient (MyVariable := SL.Values['Foo1']; for example).
There are a million ways to solve this problem outside of source code. Taking it from the other direction, put the data into Excel and create an Excel macro to build the source and put it into the clipboard to paste into your PAS file. This wouldn't be too difficult and probably easier than formatting the Delphi code within the IDE.

How can I load an unnamed function in Lua?

I want users of my C++ application to be able to provide anonymous functions to perform small chunks of work.
Small fragments like this would be ideal.
function(arg) return arg*5 end
Now I'd like to be able to write something as simple as this for my C code,
// Push the function onto the lua stack
lua_xxx(L, "function(arg) return arg*5 end" )
// Store it away for later
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
However, I don't think luaL_loadstring will do "the right thing".
Am I left with what feels to me like a horrible hack?
void push_lua_function_from_string( lua_State * L, std::string code )
{
    // Wrap our string so that we can get something useful for luaL_loadstring
    std::string wrapped_code = "return "+code;
    luaL_loadstring(L, wrapped_code.c_str());
    lua_pcall( L, 0, 1, 0 );
}
push_lua_function_from_string(L, "function(arg) return arg*5 end" );
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
Is there a better solution?
If you need access to parameters, the way you have written it is correct. luaL_loadstring pushes a function that represents the chunk of code you are compiling. If you want to actually get a function back from the code, you have to return it. I also do this (in Lua) for little "expression evaluators", and I don't consider it a "horrible hack" :)
If you only need some callbacks, without any parameters, you can write the code directly and use the function that luaL_loadstring leaves on the stack. You can even pass parameters to this chunk; they will be accessible through the ... expression. Then you can get the parameters as:
local arg1, arg2 = ...
-- rest of code
You decide what is better for you - "ugly code" inside your library codebase, or "ugly code" in your Lua functions.
Have a look at my ae. It caches functions from expressions so you can simply say ae_eval("a*x^2+b*x+c") and it'll only compile it once.
