vector<reference_wrapper> .. things going out of scope? how does it work? - c++17

Use case: I am converting data from a very old program of mine to a database-friendly format. There are parts where I have to do multiple passes over the old data, because the keys in particular have to exist before I can reference them in relationships. So I thought, why not put the incomplete parts in a vector of references during the first pass and return it from the working function, so I can easily use that vector to make the second pass over whatever is still incomplete. I like to avoid pointers when possible, so I looked into std::reference_wrapper<T>, which seems like exactly what I need .. except I don't understand its behavior at all.
I have both vector<OldData> old_data and vector<NewData> new_data as members of my conversion class. The converting member function essentially does:
//...
vector<reference_wrapper<NewData>> incomplete;
for(const auto& old_elem : old_data) {
auto& new_ref = *new_data.insert(new_data.end(), convert(old_elem));
if(is_incomplete(new_ref)) incomplete.push_back(ref(new_ref));
}
return incomplete;
However, incomplete is already broken immediately after the for loop. The program compiles, but crashes and produces gibberish. Now I don't know if I placed ref correctly, but this is only one of many tries where I tried to put it somewhere else, use push_back or emplace_back instead, etc. ..
Something seems to be going out of scope, but what? Both new_data and old_data are class members, incomplete also lives outside the loop, and according to the documentation, reference_wrapper is copyable.
Here's a simplified MWE that compiles, crashes, and produces gibberish:
// includes ..
using namespace std;
int main() {
int N = 2; // works correctly for N = 1 without any other changes ... ???
vector<string> strs;
vector<reference_wrapper<string>> refs;
for(int i = 0; i < N; ++i) {
string& sref = ref(strs.emplace_back("a"));
refs.push_back(sref);
}
for (const auto& r : refs) cout << r.get(); // crash & gibberish
}
This is g++ 10.2.0 with -std=c++17 if it means anything. Now I will probably just use pointers and be done, but I would like to understand what is going on here, documentation / search does not seem to help..

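The problem is that std::vector may reallocate the memory for the entire vector any time you add an element. When it does, every reference, pointer, and iterator to the existing elements is invalidated, so the reference_wrappers you stored earlier dangle. This is also why N = 1 appears to work: no element is added after the first reference is taken. You can resolve the problem by using std::list instead of vector, because inserting into a list never invalidates references to existing elements. If the final size is known in advance, calling reserve on the vector before the loop also prevents reallocation.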


Am I using ArrayLists wrong in Zig when performing simple variable assignment changes function behaviour?

I have been doing Advent of Code this year, to learn Zig, and I discovered something during Day 5 that really confused me. So: mild spoilers for Day 5 of Advent of Code 2022, I guess?
I decided to implement my solution to Day 5 as an ArrayList of ArrayLists of U8s, which has ended up working well. My full solution file is here (probably terribly un-idiomatic Zig, but we all have to start somewhere).
As part of my solution, I have a function, which I call moveCrates, on a struct which wraps my arraylist of arraylists.
The relevant part of the struct declaration looks as so:
const BunchOfStacks = struct {
stacks: ArrayList(ArrayList(u8)),
...
This function is here, and looks like this:
fn moveCrates(self: *BunchOfStacks, amount: usize, source: usize, dest: usize) !void {
const source_height = self.stacks.items[source - 1].items.len;
const crate_slice = self.stacks.items[source - 1].items[(source_height - amount)..];
try self.stacks.items[dest - 1].appendSlice(crate_slice);
self.stacks.items[source - 1].shrinkRetainingCapacity(source_height - amount);
}
You can note that I refer 3 times to the source list by the very verbose reference self.stacks.items[source - 1]. This is not how I first wrote this function. I first wrote it like below:
fn moveCrates(self: *BunchOfStacks, amount: usize, source: usize, dest: usize) !void {
var source_list: ArrayList(u8) = self.stacks.items[source - 1];
const source_height = source_list.items.len;
const crate_slice = source_list.items[(source_height - amount)..];
try self.stacks.items[dest - 1].appendSlice(crate_slice);
source_list.shrinkRetainingCapacity(source_height - amount);
}
But this second form, where I make a local variable for my convenience, DOES NOT GIVE THE CORRECT RESULTS! It compiles fine, but seems to always point source_list to the same internal ArrayList(u8) (whichever one it first picks) regardless of what the value of source is. This means that the test example produces incorrect output.
This function is called within a loop, like so:
while (instructions.next()) |_| {
// First part is the verb, this is always "move" so skip it
// Get the amount next
const amount: usize = try std.fmt.parseInt(usize, instructions.next().?, 10);
// now skip _from_
_ = instructions.next();
// Now get source
const source: usize = try std.fmt.parseInt(usize, instructions.next().?, 10);
// Now skip _to_
_ = instructions.next();
// Now get dest
const dest: usize = try std.fmt.parseInt(usize, instructions.next().?, 10);
var crates_moved: usize = 0;
while (crates_moved < amount) : (crates_moved += 1) {
try stacks_part1.moveCrates(1, source, dest);
}
try stacks_part2.moveCrates(amount, source, dest);
}
Ultimately, as you can see, I have just avoided making variable assignments in the function and this passes the test (and the puzzle).
I have checked for issues in the zig repo that might be related, and cannot find anything immediately obvious (search used is this). I've looked on StackOverflow, and found this question, which does have some similarity to my issue (pointers seem a bit whack in while loops), but it's not the same.
I've scoured the zig documentation on loops and assignment, on the site, but don't see anything calling out this behaviour specifically.
I'm assuming I've either completely misunderstood something (or missed something that isn't well documented), or this is a bug — the language is under heavy active dev, after all.
I'm expecting that an assignment like I perform should work as expected — being a simple shorthand to avoid having to write out the repeated self.stacks.items[source - 1], so I'm hopeful that this is something that I'm just doing wrong. Zig version is v0.11.0-dev.537+36da3000c
An array list stores items as a slice (a pointer to the first item plus length).
When you do
var source_list: ArrayList(u8) = self.stacks.items[source - 1];
you make a shallow copy of the array list. You might be misled by higher-level languages, where this assignment would copy a reference to the array list object, but that isn't the case in Zig. In Zig everything is a "value object".
When you call shrinkRetainingCapacity it changes the length of the items slice, but the change is made to the local copy of the array list. The "real" array list, the one stored in the outer array list, remains unaffected.
TL;DR You need to use a pointer:
var source_list: *ArrayList(u8) = &self.stacks.items[source - 1];
This fixes the failing test.

LLVM, Get first usage of a global variable

I'm new to LLVM and I'm stuck on something that might seem basic.
I'm writing an LLVM pass that applies some transformations to global variables before they are used.
I would like to detect the first use of a global variable so that I apply the transformation only there, and not at every place the global is used. It must be the first use, otherwise the program crashes.
I have been reading about the AnalysisManager, and I think I want something similar to DominatorTree, which is used for basic blocks in a function.
So the idea is to get the DominatorTree of a GlobalVariable, find the first place it is used in the code, and apply my transformation there.
Given the following example
#include <stdio.h>
int MyGlobal = 30;
void foo()
{
    printf("%d\n", MyGlobal);
}
int main()
{
    printf("%d\n", MyGlobal);
    foo();
}
In the example above, I only want to apply the transformation just before the first printf in the main function.
Given this second example
#include <stdio.h>
int MyGlobal = 30;
void foo()
{
    printf("%d\n", MyGlobal);
}
int main()
{
    foo();
    printf("%d\n", MyGlobal);
}
For the example above I would like to apply the transformation inside the foo function.
I want to avoid creating a stub function at the beginning of the program that processes all globals before the program starts running (this is what I'm actually doing now).
Does LLVM provide something that can help me do this? Or what would be the best approach to implement it?

Possible runtime error with while loop-Polyspace

I am working in embedded C and recently ran the MathWorks Polyspace Code Prover (Dynamic analysis) on the whole project to check for critical run-time errors. It found one bug (red warning) at a while loop where I am copying some ROM data into RAM via memory registers.
The code is working fine and as expected but I would like to ask if there is any solution to safely remove this warning. Please find the code example below:
register int32 const *source;
uint32 i=0;
uint32 *dest;
source= (int32*)&ADDR_SWR4_BEGIN;
dest = (uint32*)&ADDR_ARAM_BEGIN;
if ( source != NULL )
{
while ( i < 2048 )
{
dest[i] = (uint32)source[i];
i++;
}
}
My guess is that ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN are defined in the linker script, and Polyspace didn't compile and link the project, which is why it complains about a possible run-time error or infinite loop.
ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN are defined as extern in the respective header file.
extern uint32_t ADDR_SWR4_BEGIN;
extern uint32_t ADDR_ARAM_BEGIN;
The warning is red, and the exact warning is as follows:
Check: Non-terminating Loop
Detail: The Loop is infinite or contains a run-time error
Severity: Unset
Any suggestions would be appreciated.
The code is overall quite fishy.
Bugs
if ( source != NULL ). You just set this pointer to point at an address, so it will obviously not point at NULL. This line is superfluous.
You aren't using volatile when accessing registers/memory, so if this code is executed multiple times, the compiler might make all kinds of strange assumptions. This might be the cause of the diagnostic message.
Bad style/code smell (should be fixed)
Using the register keyword is fishy. This was once a thing in the 1980s when compilers were horrible and couldn't optimize code properly. Nowadays they can do this, and far better than the programmer, so any presence of register in new source code is fishy.
Accessing a register or memory location as int32 and then casting this to unsigned type doesn't make any sense at all. If the data isn't signed, then why are you using a signed type in the first place.
Using home-brewed uint32 types instead of stdint.h is poor style.
Nit-picks (minor remarks)
The (int32*) cast should be const qualified.
The loop is needlessly ugly, could be replaced with a for loop:
for(uint32_t i=0; i<2048; i++)
{
dest[i] = source[i];
}
If Polyspace does not know the value of ADDR_ARAM_BEGIN, it will assume it could be NULL (or any other value valid for its type). While you explicitly test for source being NULL, you do not do the same for dest.
Since both source and dest are assigned from linker constants, neither should be NULL in normal circumstances, so testing for NULL in the control flow is unnecessary and an assert() is preferable. Polyspace recognises assertions and applies the constraint in subsequent analysis, while assert() expands to nothing when NDEBUG is defined (normally in release builds), so it imposes no overhead there:
const uint32_t* source = (const uint32_t*)&ADDR_SWR4_BEGIN ;
uint32_t* dest = (uint32_t*)&ADDR_ARAM_BEGIN;
// PolySpace constraints asserted
assert( source != NULL ) ;
assert( dest != NULL ) ;
for( int i = 0; i < 2048; i++ )
{
dest[i] = source[i] ;
}
An alternative is to provide PolySpace with a "forced-include" (-include option) to provide explicit definitions so that PolySpace will not consider all possible values to be valid in its analysis. That will probably have the effect of speeding analysis also.
The reason Polyspace gives a red error here is that source and dest each point to a single uint32. Indeed, when you write:
source= (int32*)&ADDR_SWR4_BEGIN
you take the address of the variable ADDR_SWR4_BEGIN and assign it to source.
Hence both pointers are pointing to a buffer of 4 bytes only.
It is then not possible to use these pointers like arrays of 2048 elements.
You should also see an orange check on source[i] giving you information on what's happening with the pointer source.
It seems that ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN actually contain addresses.
And in this case, the code should be:
source = (uint32*)ADDR_SWR4_BEGIN;
dest = (uint32*)ADDR_ARAM_BEGIN;
If you do this change in the code, the red error disappears.

ANTLR Parse tree modification

I'm using ANTLR4 to create a parse tree for my grammar. What I want to do is modify certain nodes in the tree, which will include removing certain nodes and inserting new ones. The purpose behind this is optimization for the language I am writing. I have yet to find a solution to this problem. What would be the best way to go about this?
While there is currently no real support or tools for tree rewriting, it is very possible to do. It's not even that painful.
The ParseTreeListener or your MyBaseListener can be used with a ParseTreeWalker to walk your parse tree.
From here, you can remove nodes with ParserRuleContext.removeLastChild(), however when doing this, you have to watch out for ParseTreeWalker.walk:
public void walk(ParseTreeListener listener, ParseTree t) {
if ( t instanceof ErrorNode) {
listener.visitErrorNode((ErrorNode)t);
return;
}
else if ( t instanceof TerminalNode) {
listener.visitTerminal((TerminalNode)t);
return;
}
RuleNode r = (RuleNode)t;
enterRule(listener, r);
int n = r.getChildCount();
for (int i = 0; i<n; i++) {
walk(listener, r.getChild(i));
}
exitRule(listener, r);
}
You must replace removed nodes with something if the walker has already visited the parents of those nodes; I usually pick empty ParserRuleContext objects (this is because of the cached value of n in the method above). This prevents the ParseTreeWalker from throwing a NullPointerException.
When adding nodes, make sure to set the mutable parent on the ParserRuleContext to the new parent. Also, because of the cached n in the method above, a good strategy is to detect where the changes need to go before the walk reaches that point, so the ParseTreeWalker will walk over them in the same pass (otherwise you might need multiple passes...).
Your pseudo code should look like this:
public void enterRewriteTarget(@NotNull MyParser.RewriteTargetContext ctx){
if(shouldRewrite(ctx)){
ArrayList<ParseTree> nodesReplaced = replaceNodes(ctx);
addChildTo(ctx, createNewParentFor(nodesReplaced));
}
}
I've used this method to write a transpiler that compiled a synchronous internal language into asynchronous javascript. It was pretty painful.
Another approach would be to write a ParseTreeVisitor that converts the tree back to a string. (This can be trivial in some cases, because you are only calling TerminalNode.getText() and concatenating in aggregateResult(..).)
You then add the modifications to this visitor so that the resulting string representation contains the modifications you try to achieve.
Then parse the string and you get a parse tree with the desired modifications.
This is certainly hackish in some ways, since you parse the string twice. On the other hand the solution does not rely on antlr implementation details.
I needed something similar for simple transformations. I ended up using a ParseTreeWalker and a custom ...BaseListener where I overrode the enter... methods. Inside such a method, ParserRuleContext.children is available and can be manipulated.
class MyListener extends ...BaseListener {
@Override
public void enter...(...Context ctx) {
super.enter...(ctx);
ctx.children.add(...);
}
}
new ParseTreeWalker().walk(new MyListener(), parseTree);

How can I load an unnamed function in Lua?

I want users of my C++ application to be able to provide anonymous functions to perform small chunks of work.
Small fragments like this would be ideal.
function(arg) return arg*5 end
Now I'd like to be able to write something as simple as this for my C code,
// Push the function onto the lua stack
lua_xxx(L, "function(arg) return arg*5 end" )
// Store it away for later
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
However, I don't think luaL_loadstring will do "the right thing".
Am I left with what feels to me like a horrible hack?
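// Returns true and leaves the function on top of the stack on success;
// on failure returns false and leaves the error message there instead.
bool push_lua_function_from_string( lua_State * L, const std::string & code )
{
    // Wrap our string so that we can get something useful for luaL_loadstring
    const std::string wrapped_code = "return " + code;
    if ( luaL_loadstring(L, wrapped_code.c_str()) != 0 )
        return false; // the wrapped code did not compile
    return lua_pcall( L, 0, 1, 0 ) == 0; // run the chunk, which returns the function
}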
push_lua_function_from_string(L, "function(arg) return arg*5 end" );
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
Is there a better solution?
If you need access to parameters, the way you have written it is correct. luaL_loadstring pushes a function that represents the chunk you are compiling. If you want to actually get a function back from that code, the chunk has to return it. I also do this (in Lua) for little "expression evaluators", and I don't consider it a "horrible hack" :)
If you only need some callbacks, without any parameters, you can write the code directly and use the function that luaL_loadstring pushes. You can even pass parameters to this chunk; they are accessible through the ... expression. Then you can get the parameters as:
local arg1, arg2 = ...
-- rest of code
You decide what is better for you - "ugly code" inside your library codebase, or "ugly code" in your Lua functions.
Have a look at my ae. It caches functions from expressions so you can simply say ae_eval("a*x^2+b*x+c") and it'll only compile it once.
