I would like to utilize OpenCV's integration of OpenGL/OpenCL to achieve fast distortion of images directly on the GPU while avoiding GPU/CPU image transfers. I can create a cv::ogl::Buffer from an OpenGL buffer object in Qt:
// m_pub is of type QOpenGLBuffer
cv::ogl::Buffer b(512, 512, CV_8UC4, m_pub.bufferId());
But the next line throws an exception:
cv::UMat m = cv::ogl::mapGLBuffer(b);
The error originally reported by OpenCV was:
OpenCV(4.5.5) Error: Unknown error code -220 (OpenCL:
clCreateFromGLBuffer failed) in cv::ogl::mapGLBuffer, file
D:\OpenCV\opencv-4.5.5\modules\core\src\opengl.cpp, line 1886
To get more information, I called cv::ocl::getOpenCLErrorString(status); in opengl.cpp, rebuilt, and found that the error is CL_INVALID_CONTEXT.
I've checked cv::ocl::Context, cv::ocl::Device, and cv::ocl::Platform, and added cv::ocl::attachContext, but none of it works. I'm stuck here and don't know how to go forward.
Any suggestions are really appreciated. Thanks.
The fix is to call cv::ogl::ocl::initializeContextFromGL() during initialization. My project is based on Qt and the window inherits from QOpenGLWindow, so I call that function in initializeGL().
My guess is that without this call OpenCV creates an OpenCL context automatically, but that context is not the one derived from the OpenGL context, which is why the CL_INVALID_CONTEXT error occurred.
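For reference, a minimal sketch of where the call goes, assuming a QOpenGLWindow subclass (the class name and the commented-out buffer code are placeholders based on the snippets above):

#include <QOpenGLWindow>
#include <opencv2/core/opengl.hpp>

class GpuWindow : public QOpenGLWindow
{
protected:
    void initializeGL() override
    {
        // The Qt GL context is current here, so OpenCV can derive its
        // OpenCL context from it. This must happen before mapGLBuffer().
        cv::ogl::ocl::initializeContextFromGL();
    }

    void paintGL() override
    {
        // Later, wrapping and mapping the buffer works without copying:
        // cv::ogl::Buffer b(512, 512, CV_8UC4, m_pub.bufferId());
        // cv::UMat m = cv::ogl::mapGLBuffer(b);
        // ... process m with OpenCL-backed functions ...
        // cv::ogl::unmapGLBuffer(m);
    }
};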
As a side note, I used cv::remap to process a 512x512 image and found that processing on the GPU via OpenCL was not much faster than on the CPU; it still takes around 12 ms on average. That is still too slow if the application requires a high frame rate and needs to leave time for other processing.
Related
I verified that the video stream displays correctly in a QML video surface. Now I want to get the video frame data to do something with it (nothing bad), but so far it isn't working. To keep the test focused, I made a simple pipeline like the one below.
nvarguscamerasrc - appsink
I used QGst::Utils::ApplicationSink to get the frame data, following the "appsink-src" example.
/* making pipeline */
QGst::ElementPtr source, sink;
SubClassApplicationSink *appsink;
source = QGst::ElementFactory::make("nvarguscamerasrc");
sink = QGst::ElementFactory::make("appsink");
appsink = new SubClassApplicationSink();
// configure elements
source->setProperty("sensor-id", n);
appsink->setElement(sink);
appsink->enableDrop(true);
appsink->setMaxBuffers(7654321);
m_pipeline->add(source, sink);
source->link(sink);
The subclass of ApplicationSink implements the eos, preroll and sample callbacks, and I log some values from the buffer obtained from each new sample.
The same output is repeated every time the callback is called.
result: [start-end offsets are -1, no flags, memory count 1, memory size 1008]
I don't know why. What do you think?
I solved the issue. The problem was the pipeline's composition: after putting an "nvvidconv" element between "nvarguscamerasrc" and "appsink", I could get video frames successfully.
I don't know exactly why the nvvidconv element is needed, but it seems to be because of the source's video type, "video/x-raw(memory:NVMM)", which means the frames live in DMA buffers for performance reasons; nvvidconv converts them into regular system memory that appsink can read. A sketch of the corrected pipeline setup follows below.
https://forums.developer.nvidia.com/t/what-is-the-meaning-of-memory-nvmm/180522
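For completeness, a sketch of the fixed pipeline construction based on the code in the question (untested; only the added nvvidconv element is new):

QGst::ElementPtr source, conv, sink;
source = QGst::ElementFactory::make("nvarguscamerasrc");
conv   = QGst::ElementFactory::make("nvvidconv");   // converts NVMM frames into regular system memory
sink   = QGst::ElementFactory::make("appsink");

source->setProperty("sensor-id", n);
appsink->setElement(sink);

m_pipeline->add(source, conv);
m_pipeline->add(sink);
source->link(conv);
conv->link(sink);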
I have a Kuka arm and some objects set up in my simulation (very similar to the manipulation station example), and I have been running into the core dump error below whenever there is contact between the robot and the objects.
"abort: Failure at multibody/plant/multibody_plant.cc:1640 in CalcImplicitStribeckResults(): condition 'info == ImplicitStribeckSolverResult::kSuccess' failed.
Aborted (core dumped)"
Decreasing the integration step size for the simulator did not help, so I ended up tracing the error back and commenting out the condition that causes it ("DRAKE_DEMAND(info == ImplicitStribeckSolverResult::kSuccess);"), which seems to core dump a lot less often.
However, I am guessing that condition is there for a reason, so would commenting the line out cause any other issues in the simulation? What is the proper way to fix the core dump problem?
In Drake PR #12503, the ImplicitStribeck code was refactored to reflect the notation in the TAMSI arXiv paper, and in #12361 it was changed to provide a more helpful exception with troubleshooting tips:
https://github.com/RobotLocomotion/drake/blob/v0.15.0/multibody/plant/multibody_plant.cc#L1866-L1878
Can you try out a later version (e.g. 0.15.0), and then try out the troubleshooting instructions there? (You've already tried changing step size in the simulator, but you may want to check on the stiffness of your overall system, etc.)
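If stiffness does turn out to be the culprit, one knob you could experiment with, as a sketch only (the values are illustrative, not tuned for your model), is softening the contact model on the plant before Finalize():

#include "drake/multibody/plant/multibody_plant.h"

// Illustrative values only: a softer contact model and a looser stiction
// tolerance give the implicit Stribeck / TAMSI solver an easier problem.
void SoftenContact(drake::multibody::MultibodyPlant<double>* plant) {
  plant->set_penetration_allowance(1e-3);  // meters; larger => softer contact
  plant->set_stiction_tolerance(1e-3);     // m/s; larger => easier convergence
}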
We have a native C++ add-on running in an Electron renderer process providing Uint8Array bitmap data to JavaScript, where it is painted into a canvas via texImage2D in a webgl context or putImageData in a 2d context.
The Uint8Array is allocated in the native addon and passed via a callback to JS. It is not deallocated immediately after the callback ends, and is kept in a memory pool that holds the last 10 frames sent to be available for async painting.
If the array is passed as is to putImageData or texImage2D, the renderer completely freezes. If it is copied into a new TypedArray beforehand, there is no problem, but we would like to avoid the extra copy operation, hence the memory pool.
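For reference, the copy workaround that does work looks roughly like this (a sketch; frame, width and height come from the addon callback, ctx is the canvas 2d context):

function paintFrame(ctx, frame, width, height) {
  // No-copy variant (this is what freezes the renderer for us):
  //   const view = new Uint8ClampedArray(frame.buffer, frame.byteOffset, frame.byteLength);
  //   ctx.putImageData(new ImageData(view, width, height), 0, 0);

  // Copy variant (works, but costs an extra memcpy per frame):
  const copy = new Uint8ClampedArray(frame);   // copies the addon-owned data into V8-owned memory
  ctx.putImageData(new ImageData(copy, width, height), 0, 0);
}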
I have a feeling the freezing is related to the way Chromium handles GL commands via its command buffer.
I've tried the following Chromium command line args in an attempt to isolate the issue, with no luck (the renderer process still freezes when the array is not copied beforehand):
--use-passthrough-cmd-decoder results in longer render times for 2d, and a null context for webgl
--disable-gpu-sandbox does nothing
--in-process-gpu does nothing
Any idea what is happening?
I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?
In general, I would take the following approach:
Boilerplate reduction: Try to narrow the OOM down so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into a separate binary, just for the investigation.
Problem size reduction: Shrink the problem from an OOM to a plain "uses too much memory", so that you can tell that some part wastes memory without it actually hitting the OOM. If it is too hard to tell whether you see the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
Information gathering: If your problem is small enough, you are ready to collect information which has a lower noise level. There are multiple ways you can try. Just remember to compile your program with debug symbols. It can also be an advantage to turn off optimization, since that usually leads to information loss. Both can be achieved by NOT using the --release flag during compilation.
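For example (binary name and arguments are placeholders):

cargo build                      # debug profile: symbols kept, optimizations off
./target/debug/your-exe --foo    # instead of ./target/release/your-exe

If you do need release-level speed, setting debug = true under [profile.release] in Cargo.toml keeps the symbols while still optimizing.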
Heap profiling: One way is to use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph which symbolizes which parts of your program eat which amount of memory. See official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, especially after you created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn a GDB with superpowers. The difference from a normal debug session is that you can not only continue but also reverse-continue. Basically, your program is executed from a recording where you can jump back and forth as you want. This wiki page provides some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You should now get a normal debugging session where you can examine the current state of the program. GDB can also be launched with coredumps. The general problem with this approach is that you cannot go back in time and you cannot continue the execution. So you only see the current state, including all stack frames and variables. Here you could also use LLDB if you want.
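For a core dump that would be, for example (paths are placeholders):

gdb ./path/to/exe ./path/to/core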
(Potential) fix + repeat: Once you have a clue about what might be going wrong, you can try to change your code. Then try again. If it's still not working, go back to step 3 and repeat.
Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
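On those older toolchains the switch is only a few lines; a minimal sketch (the #[global_allocator] attribute has been stable since Rust 1.28):

use std::alloc::System;

// Route all allocations through the system allocator (malloc/free),
// which Valgrind, Massif, etc. know how to intercept.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let v = vec![0u8; 1024 * 1024];
    println!("allocated {} bytes", v.len());
}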
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};
fn main() {
let mut held_forever = Vec::new();
loop {
held_forever.push("x".repeat(1024 * 1024));
println!("Allocated another");
thread::sleep(Duration::from_secs(3));
}
}
Running this under the profiler, you can see the memory growth over time, as well as the exact stack trace that allocated the memory.
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};
struct Leaked {
data: String,
me: RefCell<Option<Rc<Leaked>>>,
}
fn main() {
let data = "x".repeat(5 * 1024 * 1024);
let leaked = Rc::new(Leaked {
data,
me: RefCell::new(None),
});
let me = leaked.clone();
*leaked.me.borrow_mut() = Some(me);
}
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle
In general, to debug, you can use either a log-based approach (either by inserting the logs yourself, or by having a tool such as ltrace, ptrace, ... generate the logs for you) or you can use a debugger.
Note that ltrace, ptrace or debugger-based approaches require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to create the reproducer scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and, if logging is insufficient, to fire up a debugger for a more fine-combed inspection. In this case I would recommend going straight for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and memory state at the moment where things go awry.
Launch your program with the debugger, set a breakpoint on the panic hook, run the program, profit.
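Concretely, with GDB that can look like this (rust-gdb is the wrapper shipped with the Rust toolchain; rust_panic is an unmangled function in the standard library intended as a breakpoint target):

rust-gdb --args ./path/to/exe --foo
(gdb) break rust_panic
(gdb) run
# ... runs until the panic fires ...
(gdb) backtrace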
I am trying to create a Gtk widget that you can pass an OpenCV image to, and that will then show it. I have created a class inherited from Gtk.Image that is used to show the image. You pass the OpenCV image to this class using the show_frame method, which then updates the Gtk.Image so it shows that image.
I have tested this and it works fine, i.e. the image is correctly shown and updated when the show_frame method is called. However, every time the image is updated, the memory usage increases, until there is not enough memory and the program crashes.
I believe this is because the memory for the image is not being freed correctly. I cannot, however, work out how to fix this. I have tried unreferencing the gbytes once a new frame is received, but this does not help. The memory only builds up when the set_from_pixbuf function is called; if this is commented out, the memory usage stays at a constant level.
import cv2
import gi
gi.require_version("Gtk", "3.0")
from gi.repository import Gtk, GLib, GdkPixbuf


class OpenCVImageViewer(Gtk.Image):
    def __init__(self):
        Gtk.Image.__init__(self)

    def show_frame(self, frame):
        # Convert from OpenCV BGR to Gtk RGB
        rgb_image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Get details about the frame in order to set up the pixbuf
        height = rgb_image.shape[0]
        width = rgb_image.shape[1]
        nChannels = rgb_image.shape[2]
        gbytes = GLib.Bytes.new(rgb_image.tobytes())
        pixbuf = GdkPixbuf.Pixbuf.new_from_bytes(gbytes, GdkPixbuf.Colorspace.RGB, False,
                                                 8, width, height, width * nChannels)
        # Update the Gtk.Image on the main thread loop for thread safety
        GLib.idle_add(self.set_from_pixbuf, pixbuf)
        GLib.idle_add(self.queue_draw)
Well, I found a solution, but I do not understand why it works: set the image with a copy of the pixbuf.
imageWidget.set_from_pixbuf(pixbuffer.copy())
I came to this solution after observing that the memory leak disappeared for scaled pixbufs (i.e. the result of pixbuffer.scale_simple).
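Applied to the class from the question, it is a one-line change in show_frame:

GLib.idle_add(self.set_from_pixbuf, pixbuf.copy())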
Excerpt from the PyGTK FAQ, section 5.17:
There is a reference cycle between the python wrapper and its underlying C object; this means that the object will not be automatically deallocated when there are no more user references, and you will need the garbage collector to kick in (which may take a few cycles). This occasionally causes the odd problem, such as with pixbufs described in FAQ 8.4
And from section 8.4:
The answer is "Interesting GC behaviour" in Python. Apparently finalizers are not necessarily called as soon as an object goes out of scope. My guess is that the python memory manager doesn't directly know about the storage allocated for the image buffer (since it's allocated by the gdk) and that it therefore doesn't know how fast memory is being consumed.
The solution is to call gc.collect() at some appropriate place.
For example, I had some code that looked like this:
for image_path in images:
    pb = gtk.gdk.pixbuf_new_from_file(image_path)
    pb = pb.scale_simple(thumb_width, thumb_height, gtk.gdk.INTERP_BILINEAR)
    thumb_list_model.set_value(thumb_list_model.append(None), 0, pb)
This chewed up an unacceptably large amount of memory for any reasonable image set. Changing the code to look like this fixed the problem:
import gc

for image_path in images:
    pb = gtk.gdk.pixbuf_new_from_file(image_path)
    pb = pb.scale_simple(thumb_width, thumb_height, gtk.gdk.INTERP_BILINEAR)
    thumb_list_model.set_value(thumb_list_model.append(None), 0, pb)
    del pb
    gc.collect()
I am not exactly sure where you should call the garbage collector in your code (since I don't really know that much Python), but I believe this is the way to solve it.