Creating a command queue with events in OpenCV

I am new to OpenCV and OpenCL.
OpenCV provides wrappers for calling the OpenCL utility functions, so I do not need to do much: ocl::Context::getContext() gets me the context, and I can pass it to all OpenCL-related execution. I do not need a command queue for that. But I want to measure the performance of the kernels using OpenCL's profiling events, and for that I need to create a custom command queue. How can I create a command queue with the same context that I used for executing the kernel? Note that I created this context using OpenCV's function ocl::Context::getContext().
I do not want to create the command queue from scratch (by getting the platform id, device id and context one by one), as that would mean changing a lot of places. I want to reuse OpenCV's context to create a command queue with event capability.

You are in a tricky situation, since the OpenCV code is missing an interface to the underlying OpenCL options:
void CommandQueue::create(ContextImpl* context)
{
    release();
    cl_int status = 0;
    // TODO add CL_QUEUE_PROFILING_ENABLE
    cl_command_queue clCmdQueue = clCreateCommandQueue(context->clContext, context->clDeviceID, 0, &status);
    openCLVerifyCall(status);
    context_ = context;
    clQueue_ = clCmdQueue;
}
I think you should either release and re-create the internal queue:
cl_int status = 0;
// Create a new queue with the same context and device, but with profiling enabled
// (dereference if these accessors return pointers rather than raw handles)
cl_command_queue Queue = clCreateCommandQueue(*ocl::Context::getOpenCLContextPtr(),
                                              *ocl::Context::getOpenCLDeviceIDPtr(),
                                              CL_QUEUE_PROFILING_ENABLE, &status);
ocl::CommandQueue::Release();        // to release the old queue
ocl::CommandQueue::clQueue_ = Queue; // to overwrite it internally with the new one
or do everything yourself (creating all the devices and using them manually).
But beware! This is unsafe (and untested). However, the docs say that those classes have public attributes and they can be written from the outside.
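For reference, once you have a queue created with CL_QUEUE_PROFILING_ENABLE, timing a kernel with a profiling event looks like the sketch below. This is plain OpenCL, independent of OpenCV; queue, kernel and globalSize are placeholders for your own objects.
#include <CL/cl.h>

// Sketch: time one kernel launch with a profiling event. Assumes 'queue' was
// created with CL_QUEUE_PROFILING_ENABLE. Returns elapsed time in ms, or -1.
double timeKernelMs(cl_command_queue queue, cl_kernel kernel, size_t globalSize)
{
    cl_event evt;
    cl_int status = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                           &globalSize, NULL, 0, NULL, &evt);
    if (status != CL_SUCCESS) return -1.0;
    clWaitForEvents(1, &evt);

    cl_ulong start = 0, end = 0; // both in nanoseconds
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    clReleaseEvent(evt);
    return (end - start) * 1e-6; // ns -> ms
}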


How to pass native void pointers to a Dart Isolate - without copying?

I am working on exposing an audio library (a C library) for Dart. To start the audio engine, it requires a few initialization steps (non-blocking for the UI); then audio processing is triggered with a perform function, which is blocking (audio processing is a heavy task). That is why I came to read about Dart isolates.
My first thought was that I only needed to call the perform method in the isolate, but that doesn't seem possible, since the perform function takes the engine state as its first argument - this engine state is an opaque pointer (Pointer<Void> in dart:ffi). When trying to pass the engine state to a new isolate with the compute function, the Dart VM returns an error - it cannot pass C pointers to an isolate.
I could not find a way to pass this data to the isolate; I assume this is due to the separate memory of the main isolate and the one I'm creating.
So, I should probably manage the entire engine state in the isolate, which means:
Create the engine state
Initialize it with some options (strings)
Trigger the perform function
Control audio at runtime
I couldn't find any example of how to perform these actions in the isolate while triggering them from the main thread/isolate, nor of how to manage isolate memory (keep the engine state and use it).
Here is a non-isolated example of what I want to do:
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parametersString);
startEngine(engineState);
perform(engineState);
And at runtime, triggered by UI actions (like a slider value changing or a button being clicked):
setEngineControl(engineState, valueToSet);
double controlValue = getEngineControl(engineState);
The engine state could be encapsulated in a class; I don't think it really matters here.
Whether it is a class or an opaque datatype, I can't find how to manage and keep this state, and trigger perform from the main thread (processed in the isolate). Any idea?
Thanks in advance.
PS: I notice, while writing, that my question/explanation may not be precise. I have to say I'm a bit lost here, since I have never used Dart isolates. Please tell me if some information is missing.
EDIT April 24th:
It seems to work when creating and managing the object state inside the isolate. But the main problem isn't solved: because the perform method blocks until it completes, there is no way to still receive messages in the isolate.
An option I thought of first was to use the performBlock method, which only performs a block of audio samples, like this:
while (performBlock(engineState)) {
    // listen for messages, and do something
}
But this doesn't seem to work; the process is still blocked until the audio performance finishes. Even if this loop is called in an async method in the isolate, it blocks, and no messages are read.
I am now thinking about passing the Pointer<Void> managed in the main isolate to another one, which would then be the worker (for the perform method only), so that I could trigger some control methods from the main isolate.
The isolate Dart package provides a registry sub-library to manage some shared memory. But it is still impossible to pass a void pointer between isolates:
[ERROR:flutter/lib/ui/ui_dart_state.cc(157)] Unhandled Exception: Invalid argument(s): Native objects (from dart:ffi) such as Pointers and Structs cannot be passed between isolates.
Has anyone already met this kind of situation?
It is possible to get the address that a Pointer points to as a number and to construct a new Pointer from that address (see Pointer.address and Pointer.fromAddress()). Since numbers can freely be passed between isolates, this can be used to pass native pointers between them.
In your case that could be done, for example, like this (I used Flutter's compute to make the example a bit simpler, but it would work with explicitly using Send/ReceivePorts as well):
import 'dart:ffi';

import 'package:flutter/foundation.dart';

// Callback to be used in a background isolate.
// Returns the address of the new engine.
// (Named so it does not collide with the native initEngine binding.)
int createAndStartEngine(String parameters) {
  Pointer<Void> engineState = createEngineState();
  initEngine(engineState, parameters);
  startEngine(engineState);
  return engineState.address;
}

// Callback to be used in a background isolate.
// Does whichever processing is needed using the given engine.
void processWithEngine(int engineStateAddress) {
  final engineState = Pointer<Void>.fromAddress(engineStateAddress);
  process(engineState);
}

Future<void> main() async {
  // Initialize the engine in a background isolate.
  final address = await compute(createAndStartEngine, "parameters");
  final engineState = Pointer<Void>.fromAddress(address);
  // Do some heavy computation in a background isolate using the engine.
  await compute(processWithEngine, engineState.address);
}
I ended up doing the processing of callbacks inside the audio loop itself.
while (performAudio()) {
    tasks.forEach((String key, List<int> value) {
        double val = getCallback(key);
        value.forEach((int element) {
            callbackPort.send([element, val]);
        });
    });
}
Here 'val' is the value you want to send to the callback, and the List<int> 'value' is a list of callback indices.
Let's say your audio loop performs with a vector size of 512 samples: you will be able to post your callbacks after every 512 audio samples are processed, which means 48000 / 512 = 93.75 times per second (assuming your sample rate is 48000). This method is not the best one, but it works; I still have to see whether it holds up in a very intensive processing context. Here it has been designed for realtime audio, but it could work the same for audio rendering.
You can see the full code here : https://framagit.org/johannphilippe/csounddart/-/blob/master/lib/csoundnative.dart

What is ID3D12GraphicsCommandList::DiscardResource?

What exactly should I expect to happen when using DiscardResource?
What's the difference between discarding and destroying/deleting a resource?
When is a good time/use-case to discard a resource?
Unfortunately Microsoft doesn't seem to say much about it other than it "discards a resource".
TL;DR: It is a rarely used function that provides a driver hint related to handling clear compression structures. You are unlikely to use it except based on specific performance advice.
DiscardResource is the DirectX 12 version of the Direct3D 11.1 DiscardView method; see Microsoft Docs.
The primary use of these methods is to optimize the performance of tile-based deferred rasterizer graphics parts by discarding the render target after present. This is a hint to the driver that the contents of the render target are no longer relevant to the operation of the program, so it can avoid some internal clearing operations on the next use.
For DirectX 11, the DirectX 11 App template uses DiscardView because it makes use of DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL:
void DX::DeviceResources::Present()
{
    // The first argument instructs DXGI to block until VSync, putting the application
    // to sleep until the next VSync. This ensures we don't waste any cycles rendering
    // frames that will never be displayed to the screen.
    DXGI_PRESENT_PARAMETERS parameters = { 0 };
    HRESULT hr = m_swapChain->Present1(1, 0, &parameters);

    // Discard the contents of the render target.
    // This is a valid operation only when the existing contents will be entirely
    // overwritten. If dirty or scroll rects are used, this call should be removed.
    m_d3dContext->DiscardView1(m_d3dRenderTargetView.Get(), nullptr, 0);

    // Discard the contents of the depth stencil.
    m_d3dContext->DiscardView1(m_d3dDepthStencilView.Get(), nullptr, 0);

    // If the device was removed either by a disconnection or a driver upgrade, we
    // must recreate all device resources.
    if (hr == DXGI_ERROR_DEVICE_REMOVED || hr == DXGI_ERROR_DEVICE_RESET)
    {
        HandleDeviceLost();
    }
    else
    {
        DX::ThrowIfFailed(hr);
    }
}
The DirectX 12 App template doesn't need those explicit calls because it uses DXGI_SWAP_EFFECT_FLIP_DISCARD.
If you are wondering why the DirectX 11 app doesn't just use DXGI_SWAP_EFFECT_FLIP_DISCARD, it probably should. The DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL swap effect was the only one supported by Windows 8.x for Windows Store apps, which is when DiscardView was introduced. For Windows 10 / DirectX 12 / UWP, it's probably better to always use DXGI_SWAP_EFFECT_FLIP_DISCARD unless you specifically don't want the backbuffer discarded.
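For reference, opting into the discard behavior happens at swap-chain creation time. A minimal fragment (the field values here are typical choices, not taken from the template):
DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {};
swapChainDesc.BufferCount = 2;                            // flip model requires >= 2 buffers
swapChainDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc.SampleDesc.Count = 1;                       // flip model disallows MSAA swap chains
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD; // backbuffer contents discarded after Present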
It is also useful for multi-GPU SLI / Crossfire configurations, since the clearing operation can require synchronization between the GPUs; see this GDC 2015 talk.
There are also other scenario-specific usages. For example, if doing deferred rendering for the G-buffer where you know every single pixel will be overwritten, you can use DiscardResource instead of doing ClearRenderTargetView / ClearDepthStencilView.
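A rough sketch of that G-buffer case in DirectX 12 (the command list and resource names are hypothetical; the resources must be in a state that allows discarding, e.g. render target / depth write):
// Tell the driver the previous contents are irrelevant, instead of clearing;
// passing nullptr for the region discards the entire resource.
m_commandList->DiscardResource(m_gbufferAlbedo.Get(), nullptr);
m_commandList->DiscardResource(m_gbufferNormals.Get(), nullptr);
m_commandList->DiscardResource(m_depthStencil.Get(), nullptr);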

Thread creation using pthread_create with SCHED_RR scheduling fails

I am trying to write some code to create a pthread with SCHED_RR:
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&attr, SCHED_RR);
struct sched_param params;
params.sched_priority = 10;
pthread_attr_setschedparam(&attr, &params);
pthread_create(&m_thread, &attr, &startThread, NULL);
pthread_attr_destroy(&attr);
But the thread doesn't run. Do I need to set more parameters?
A thread can only set the scheduling to SCHED_OTHER without the CAP_SYS_NICE capability. From sched(7):
In Linux kernels before 2.6.12, only privileged (CAP_SYS_NICE)
threads can set a nonzero static priority (i.e., set a real-time
scheduling policy). The only change that an unprivileged thread can
make is to set the SCHED_OTHER policy, and this can be done only if
the effective user ID of the caller matches the real or effective
user ID of the target thread (i.e., the thread specified by pid)
whose policy is being changed.
That means that when you set the scheduling policy to round-robin (SCHED_RR) using pthread_attr_setschedpolicy(), it in all likelihood failed (unless you have enabled this capability for the user you are running as, or are running the program as the sysadmin/root user, who can override CAP_SYS_NICE).
You can set the capability using the setcap program:
$ sudo setcap cap_sys_nice=+ep ./a.out
(assuming a.out is your program name).
You'd have figured this out if you had done error checking. You should check the return value of all the pthread functions (and generally all library functions) for failure.
Since you haven't posted the full code, it might also be an issue that you haven't joined the thread you create (the main thread could exit before m_thread got to run, and thus exit the whole process). So, you might want to join:
pthread_join(m_thread, NULL);
or you could exit the main thread without joining, if the main thread is no longer needed, by calling pthread_exit(NULL); in main().
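Putting the advice together, here is a hedged sketch of the snippet with error checking everywhere (startThread is a stand-in for your real thread function):
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void* startThread(void* arg) { // stand-in for your thread function
    (void)arg;
    puts("thread is running");
    return NULL;
}

// pthread functions return 0 on success and an error number on failure.
static void check(int ret, const char* what) {
    if (ret != 0)
        fprintf(stderr, "%s failed: %s\n", what, strerror(ret));
}

int main(void) {
    pthread_t m_thread;
    pthread_attr_t attr;
    check(pthread_attr_init(&attr), "pthread_attr_init");
    check(pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED),
          "pthread_attr_setinheritsched");
    check(pthread_attr_setschedpolicy(&attr, SCHED_RR), "pthread_attr_setschedpolicy");
    struct sched_param params;
    params.sched_priority = 10;
    check(pthread_attr_setschedparam(&attr, &params), "pthread_attr_setschedparam");
    // Without CAP_SYS_NICE, this is the call that fails (with EPERM).
    int rc = pthread_create(&m_thread, &attr, &startThread, NULL);
    check(rc, "pthread_create");
    check(pthread_attr_destroy(&attr), "pthread_attr_destroy");
    if (rc == 0)
        pthread_join(m_thread, NULL); // join so main doesn't exit first
    return 0;
}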

pthreads SIGEV_THREAD and async-safe function calls

Having trouble tracking down an answer about the usage of SIGEV_THREAD...
When one sets SIGEV_THREAD as the notify method in sigevent struct, is it correct to assume that async-signal-safe functions must still be used within the notify_function to be invoked as the handler?
Also - is it correct to assume the thread is run as "detached"?
For example
notify thread:
void my_thread(union sigval my_data)
{
    // is this ok or not (two non-async-signal-safe functions)?
    printf("in the notify function\n");
    mq_send(/* ... */);
}
main function:
(...)
se.sigev_notify = SIGEV_THREAD;
se.sigev_value.sival_ptr = &my_data;
se.sigev_notify_function = my_thread;
se.sigev_notify_attributes = NULL;
(...)
Please provide a reference if possible.
No, you don't need to use only async-signal-safe functions, because POSIX does not place any such limitation on the SIGEV_THREAD function. (The whole point of SIGEV_THREAD is that it lets you handle asynchronous notifications in a less constrained environment than a signal handler.)
As for the thread being detached, POSIX says:
The function shall be executed in an environment as if it were the
start_routine for a newly created thread with thread attributes
specified by sigev_notify_attributes. If sigev_notify_attributes
is NULL, the behavior shall be as if the thread were created with
the detachstate attribute set to PTHREAD_CREATE_DETACHED. Supplying
an attributes structure with a detachstate attribute of
PTHREAD_CREATE_JOINABLE results in undefined behavior. The signal
mask of this thread is implementation-defined.
This means: you must either leave sigev_notify_attributes as NULL, or set it to an attributes structure with the detachstate set to PTHREAD_CREATE_DETACHED - in both cases the thread will be created detached.
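For illustration, here is a minimal sketch built around a POSIX timer (the data value and one-second interval are arbitrary choices): the notify function runs in a new detached thread, not in signal context, so an ordinary call like printf is permitted.
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
// Link with -lrt on older glibc for timer_create.

static void notify_func(union sigval sv) {
    // Runs in a freshly created, detached thread, so
    // non-async-signal-safe calls such as printf are allowed here.
    printf("timer fired, data = %d\n", *(int*)sv.sival_ptr);
}

int main(void) {
    static int my_data = 42;
    struct sigevent se = {0};
    se.sigev_notify = SIGEV_THREAD;
    se.sigev_value.sival_ptr = &my_data;
    se.sigev_notify_function = notify_func;
    se.sigev_notify_attributes = NULL; // thread is created detached

    timer_t timerid;
    if (timer_create(CLOCK_REALTIME, &se, &timerid) != 0) {
        perror("timer_create");
        return 1;
    }
    struct itimerspec its = {0};
    its.it_value.tv_sec = 1;    // first expiry after one second
    its.it_interval.tv_sec = 1; // then every second
    timer_settime(timerid, 0, &its, NULL);
    sleep(3); // let the timer fire a few times
    return 0;
}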

Is it ok to create EAGLContext for each thread?

I want to do some work in my OpenGL ES project on concurrent GCD queues. Is it OK to create an EAGLContext for each thread? I'm going to do it this way:
queue_ = dispatch_queue_create("test.queue", DISPATCH_QUEUE_CONCURRENT);
dispatch_async(queue_, ^{
    NSMutableDictionary* threadDictionary = [[NSThread currentThread] threadDictionary];
    EAGLContext* context = threadDictionary[@"context"];
    if (!context) {
        context = /* creating EAGLContext with sharegroup */;
        threadDictionary[@"context"] = context;
    }
    if ([EAGLContext setCurrentContext:context]) {
        // rendering
        [EAGLContext setCurrentContext:nil];
    }
});
If it is not correct, what is the best practice to parallelize OpenGL rendering?
Not only is it okay, this is the only way you can share OpenGL resources between multiple threads. Note that shareable resources are typically limited to resources that allocate memory (e.g. buffer objects, textures, shaders). They do not include objects that merely store state (e.g. the global state machine, Framebuffer Objects or Vertex Array Objects). But if you are considering modifying data that you are using for rendering, I would strongly advise against this.
Whenever GL has a command in the pipeline that has not finished, any attempt to modify a resource used by that command will block until the command finishes. A better solution would be to double-buffer your resources, have a copy you use for rendering and a separate copy you use for updating. When you finish updating, the next time your drawing thread uses that resource, have it swap the buffers used for updating and drawing. This will reduce the amount of time the driver has to synchronize your worker threads with the drawing thread.
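A minimal sketch of that double-buffering idea (plain GL ES calls; the names are illustrative, and cross-thread synchronization of drawIndex is omitted for brevity):
// Two buffer objects holding the same vertex data: one is drawn from
// while the other is being updated by the worker thread/context.
GLuint vbos[2];
int drawIndex = 0; // the draw thread reads vbos[drawIndex]

void initBuffers(const void* data, GLsizeiptr size) {
    glGenBuffers(2, vbos);
    for (int i = 0; i < 2; ++i) {
        glBindBuffer(GL_ARRAY_BUFFER, vbos[i]);
        glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);
    }
}

// Worker thread/context: always writes the buffer the draw thread is NOT using.
void updateBuffer(const void* data, GLsizeiptr size) {
    glBindBuffer(GL_ARRAY_BUFFER, vbos[1 - drawIndex]);
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);
    glFlush(); // make the update visible to the other contexts in the sharegroup
}

// Draw thread: once the update is known to be complete, swap which copy is drawn.
void swapBuffersForDrawing(void) {
    drawIndex = 1 - drawIndex;
}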
Now, if you are suggesting here that you want to draw from multiple threads, then you should re-think your strategy. OpenGL generally does not benefit from issuing draw commands from multiple threads, it just creates a synchronization nightmare. Multi-threading is useful mostly for controlling VSYNC on multiple windows (probably not something you will ever encounter in ES) or streaming resource data in the background.
