Choosing between buffers in a Metal shader - metal

I'm struggling with porting my OpenGL application to Metal. In my old app, I used to bind two buffers, one with vertices and respective colours and one with vertices and respective textures, and switch between the two based on some app logic. Now in Metal I've started with the Hello Triangle example where I tried running this vertex shader
vertex RasterizerData
vertexShader(uint vertexID [[vertex_id]],
constant AAPLVertex1 *vertices1 [[buffer(AAPLVertexInputIndexVertices1)]],
constant AAPLVertex2 *vertices2 [[buffer(AAPLVertexInputIndexVertices2)]],
constant bool &useFirstBuffer [[buffer(AAPLVertexInputIndexUseFirstBuffer)]])
{
float2 pixelSpacePosition;
if (useFirstBuffer) {
pixelSpacePosition = vertices1[vertexID].position.xy;
} else {
pixelSpacePosition = vertices2[vertexID].position.xy;
}
...
and this Objective-C code
bool useFirstBuffer = true;
[renderEncoder setVertexBytes:&useFirstBuffer
length:sizeof(bool)
atIndex:AAPLVertexInputIndexUseFirstBuffer];
[renderEncoder setVertexBytes:triangleVertices
length:sizeof(triangleVertices)
atIndex:AAPLVertexInputIndexVertices1];
(where AAPLVertexInputIndexVertices1 = 0, AAPLVertexInputIndexVertices2 = 1 and AAPLVertexInputIndexUseFirstBuffer = 3), which should result in vertices2 never getting accessed, but still I get the error: failed assertion 'Vertex Function(vertexShader): missing buffer binding at index 1 for vertices2[0].'
Everything works if I replace if (useFirstBuffer) with if (true) in the Metal code. What is wrong?

When you're hard-coding the conditional, the compiler is smart enough to eliminate the branch that references the absent buffer (via dead-code elimination), but when the conditional must be evaluated at runtime, the compiler doesn't know that the branch is never taken.
Since all declared buffer parameters must be bound, leaving the unreferenced buffer unbound trips the validation layer. You could bind a few "dummy" bytes at the Vertices2 slot (using -setVertexBytes:length:atIndex:) when not following that path to get around this. It's not important that the buffers have the same length, since, after all, the dummy buffer will never actually be accessed.

In the atIndex argument, you call the code with the values AAPLVertexInputIndexUseFirstBuffer and AAPLVertexInputIndexVertices1 but in the Metal code the values AAPLVertexInputIndexVertices1 and AAPLVertexInputIndexVertices2 appear in the buffer() spec. It looks like you need to use AAPLVertexInputIndexVertices1 instead of AAPLVertexInputIndexUseFirstBuffer in your calling code.

Related

ClearUnorderedAccessViewFloat on a Buffer

On a RGBA Texture this works as expected. Each component of the FLOAT[4] argument gets casted to the corresponding component of the DXGI_FORMAT of the texture.
However, with a Buffer this doesn't work and some rubbish is assigned the buffer based on the first component of the FLOAT[4] argument.
Although, this makes sense since a buffer UAV has no DXGI_FORMAT to specify what cast happens.
The docs said
If you want to clear the UAV to a specific bit pattern, consider using ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint.
so you can just use as follows:
float fill_value = ...;
auto Values = std::array<UINT, 4>{ *((UINT*)&fill_value),0,0,0};
commandList->ClearUnorderedAccessViewUint(
ViewGPUHandleInCurrentHeap,
ViewCPUHandle,
pResource,
Values.data(),
0,
nullptr
);
I believe the debug layer should raise a error when using ClearUnorderedAccessViewFloat on a Buffer.
Edit: it actually does I just missed it.
D3D12 ERROR: ID3D12CommandList::ClearUnorderedAccessViewUint: ClearUnorderedAccessView* methods are not compatible with Structured Buffers. StructuredByteStride is set to 4 for resource 0x0000023899A09A40'. [ RESOURCE_MANIPULATION ERROR #1156: CLEARUNORDEREDACCESSVIEW_INCOMPATIBLE_WITH_STRUCTURED_BUFFERS]

D3D12 Dynamic Constant Buffer Creation and Binding

I'm writing a shader tool app where I can create nodes and link them to generate a texture:
I used D3D12 shader reflection to get the constant buffer variables and now I'm trying to figure out how to pass/bind these vars in runtime. At the moment I have the constant buffers defined in code like this:
struct ObjectConstants
{
DirectX::XMFLOAT4X4 World{ D3DUtil::Identity4x4() };
DirectX::XMFLOAT3 Color{ 0.f, 0.f, 0.f };
};
How can I create a constant buffer, bind it and copy data from CPU to GPU in runtime?
PS: The root signature and PSO are being created in runtime.

vector<reference_wrapper> .. things going out of scope? how does it work?

Use case: I am converting data from a very old program of mine to a database friendly format. There are parts where I have to do multiple passes over the old data, because in particular the keys have to first exist before I can reference them in relationships. So I thought why not put the incomplete parts in a vector of references during the first pass and return it from the working function, so I can easily use that vector to make the second pass over whatever is still incomplete. I like to avoid pointers when possible so I looked into std::reference_wrapper<T> which seemes like exactly what I need .. except I don't understand it's behavior at all.
I have both vector<OldData> old_data and vector<NewData> new_data as member of my conversion class. The converting member function essentially does:
//...
vector<reference_wrapper<NewData>> incomplete;
for(const auto& old_elem : old_data) {
auto& new_ref = *new_data.insert(new_data.end(), convert(old_elem));
if(is_incomplete(new_ref)) incomplete.push_back(ref(new_ref));
}
return incomplete;
However, incomplete is already broken immediately after the for loop. The program compiles, but crashes and produces gibberish. Now I don't know if I placed ref correctly, but this is only one of many tries where I tried to put it somewhere else, use push_back or emplace_back instead, etc. ..
Something seems to be going out of scope, but what? both new_data and old_data are class members, incomplete also lives outside the loop, and according to the documentation, reference_wrapper is copyable.
Here's a simplified MWE that compiles, crashes, and produces gibberish:
// includes ..
using namespace std;
int main() {
int N = 2; // works correctly for N = 1 without any other changes ... ???
vector<string> strs;
vector<reference_wrapper<string>> refs;
for(int i = 0; i < N; ++i) {
string& sref = ref(strs.emplace_back("a"));
refs.push_back(sref);
}
for (const auto& r : refs) cout << r.get(); // crash & gibberish
}
This is g++ 10.2.0 with -std=c++17 if it means anything. Now I will probably just use pointers and be done, but I would like to understand what is going on here, documentation / search does not seem to help..
The problem here is that you are using vector data structure which might re-allocate memory for the entire vector any time that you add an element, so all previous references on that vector most probably get invalidated, you can resolve your problem by using list instead of vector.

Possible runtime error with while loop-Polyspace

I am working with Embedded C language and recently run the MathWorks Polyspace Code Prover (Dynamic analysis) for the whole project to check for critical runtime errors. It found one bug (Red warning) at While loop where I am copying some ROM data into RAM via memory registers.
The code is working fine and as expected but I would like to ask if there is any solution to safely remove this warning. Please find the code example below:
register int32 const *source;
uint32 i=0;
uint32 *dest;
source= (int32*)&ADDR_SWR4_BEGIN;
dest = (uint32*)&ADDR_ARAM_BEGIN;
if ( source != NULL )
{
while ( i < 2048 )
{
dest[i] = (uint32)source[i];
i++;
}
}
My guess is that ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN is defined in linker script and polyspace didn't compile and link the project that is why it is complaining about the possible run time error or infinite loop.
ADDR_SWR4_BEGIN and ADDR_ARAM_BEGIN are defined as extern in the respective header file.
extern uint32_t ADDR_SWR4_BEGIN;
extern uint32_t ADDR_ARAM_BEGIN;
The warning is red and exact warning is as follow:
Check: Non-terminating Loop
Detail: The Loop is infinite or contains a run-time error
Severity: Unset
Any suggestions would be appreciated.
The code is overall quite fishy.
Bugs
if ( source != NULL ). You just set this pointer to point at an address, so it will obviously not point at NULL. This line is superfluous.
You aren't using volatile when accessing registers/memory, so if this code is executed multiple times, the compiler might make all kinds of strange assumptions. This might be the cause of the diagnostic message.
Bad style/code smell (should be fixed)
Using the register keyword is fishy. This was once a thing in the 1980s when compilers were horrible and couldn't optimize code properly. Nowadays they can do this, and far better than the programmer, so any presence of register in new source code is fishy.
Accessing a register or memory location as int32 and then casting this to unsigned type doesn't make any sense at all. If the data isn't signed, then why are you using a signed type in the first place.
Using home-brewed uint32 types instead of stdint.h is poor style.
Nit-picks (minor remarks)
The (int32*) cast should be const qualified.
The loop is needlessly ugly, could be replaced with a for loop:
for(uint32_t i=0; i<2048; i++)
{
dest[i] = source[i];
}
If PolySpace does not know the value ADDR_ARAM_BEGIN it will assume it could be NULL (or any other value value for its type). While you explicitly test for source being NULL, you do not do the same for dest.
Since both source and dest are assigned from linker constants and in normal circumstances neither should be NULL it is unnecessary to explicitly test for NULL in the control flow and an assert() would be preferable - PolySPace recognises assertions, and will apply the constraint in subsequent analysis, but assert() resolves to nothing when NDEBUG is defined (normally in release builds), so does not impose unnecessary overhead:
const uint32_t* source = (const uint32_t*)&ADDR_SWR4_BEGIN ;
uint32_t* dest = (uint32_t*)&ADDR_ARAM_BEGIN;
// PolySpace constraints asserted
assert( source != NULL ) ;
assert( dest != NULL ) ;
for( int i = 0; i < 2048; i++ )
{
dest[i] = source[i] ;
}
An alternative is to provide PolySpace with a "forced-include" (-include option) to provide explicit definitions so that PolySpace will not consider all possible values to be valid in its analysis. That will probably have the effect of speeding analysis also.
the reason why Polyspace is giving a red error here is that source and dest are pointers to a uint32. Indeed, when you write:
source= (int32*)&ADDR_SWR4_BEGIN
you take the address of the variable ADDR_SWR4_BEGIN and assign it to source.
Hence both pointers are pointing to a buffer of 4 bytes only.
It is then not possible to use these pointers like arrays of 2048 elements.
You should also see an orange check on source[i] giving you information on what's happening with the pointer source.
It seems that ADDR_SWR4_BEGIN and ADDR_SWR4_BEGIN are actually containing addresses.
And in this case, the code should be:
source = (uint32*)ADDR_SWR4_BEGIN;
dest = (uint32*)ADDR_ARAM_BEGIN;
If you do this change in the code, the red error disappears.

Linux module: being notified about task creation and destruction

for Mach kernel API emulation on Linux, I need for my kernel module to get called when a task has been just created or is being terminated.
In my kernel module, this could most nicely be done via Linux Security Modules, but a couple of years ago, they prevented external modules from acting as a LSM by unexporting the needed symbols.
The only other way I could find was to make my module act like a rootkit. Find the syscall table and hook it in there.
Patching the kernel is out of the question. I need my app to be installed easily. Is there any other way?
You can use Kprobes, which enables you to dynamically hook into code in the kernel. You will need to find the right function among the ones involves in creating and destroying processes that give you the information you need. For instance, for tasks created, do_fork() in fork.c would be a good place to start. For tasks destroyed, do_exit. You would want to write a retprobe, which is a kind of kprobe that additionally gives you control at the end of the execution of the function, before it returns. The reason you want control before the function returns is to check if it succeeded in creating the process by checking the return value. If there was an error, then the function will return a negative value or in some cases possibly 0.
You would do this by creating a kretprobe struct:
static struct kretprobe do_fork_probe = {
.entry_handler = (kprobe_opcode_t *) my_do_fork_entry,
.handler = (kprobe_opcode_t *) my_do_fork_ret,
.maxactive = 20,
.data_size = sizeof(struct do_fork_ctx)
};
my_do_fork_entry gets executed when control enters the hooked function, and my_do_fork_ret gets executed just before it returns. You would hook it in as follows:
do_fork_probe.kp.addr =
(kprobe_opcode_t *) kallsyms_lookup_name("do_fork");
if ((ret = register_kretprobe(&do_fork_probe)) <0) {
// handle error
}
In the implementation of your hooks, it's a bit unwieldy to get the arguments and return value. You get these via the saved registers pt_regs data structure. Let's look at the return hook, where on x86 you get the return value via regs->ax.
static int my_do_fork_ret(struct kretprobe_instance *ri, struct pt_regs *regs)
{
struct do_fork_ctx *ctx = (struct do_fork_ctx *) ri->data;
int ret = regs->ax; // This is on x86
if (ret > 0) {
// It's not an error, probably a valid process
}
}
In the entry point, you can get access to the arguments via the registers. e.g. on x86, regs->di is the first argument, regs->si is the second etc. You can google to get the full list. Note that you shouldn't rely on these registers for the arguments in the return hook as the registers may have been overwritten for other computations.
You will surely have to jump many hoops in getting this working, but hopefully this note should set you off in the right direction.

Resources