We have a native C++ add-on running in an Electron renderer process that provides Uint8Array bitmap data to JavaScript, where it is painted into a canvas via texImage2D in a WebGL context or putImageData in a 2D context.
The Uint8Array is allocated in the native add-on and passed to JS via a callback. It is not deallocated immediately after the callback ends; instead it is kept in a memory pool that holds the last 10 frames sent, so they remain available for asynchronous painting.
If the array is passed as-is to putImageData or texImage2D, the renderer freezes completely. If it is first copied into a new TypedArray, there is no problem, but we want to avoid that extra copy, which is the reason for the memory pool.
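For reference, a minimal sketch of a fixed pool of 10 reusable frame buffers like the one described above. It is written in Python purely for illustration (the real pool lives in the C++ add-on), and FramePool and acquire() are hypothetical names. The property it makes explicit is that each frame handed to the consumer is a view into memory that will be overwritten 10 frames later, unlike a copied TypedArray.

```python
class FramePool:
    """Hypothetical ring of preallocated RGBA frame buffers reused round-robin.

    The buffer handed out for frame N is overwritten when frame
    N + capacity is acquired, so a consumer may hold at most
    `capacity` frames before their data is reused.
    """

    def __init__(self, width: int, height: int, capacity: int = 10) -> None:
        self.frame_size = width * height * 4  # 4 bytes per RGBA pixel
        self._buffers = [bytearray(self.frame_size) for _ in range(capacity)]
        self._next = 0

    def acquire(self) -> memoryview:
        """Return a zero-copy view of the next buffer to fill."""
        buf = self._buffers[self._next]
        self._next = (self._next + 1) % len(self._buffers)
        return memoryview(buf)


pool = FramePool(width=2, height=2)
frames = [pool.acquire() for _ in range(11)]
print(len(frames[0]))                    # 16 bytes for a 2x2 RGBA frame
print(frames[10].obj is frames[0].obj)   # True: the 11th frame reuses buffer 0
```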
I suspect the freeze is related to the way Chromium handles GL commands via its command buffer.
I've tried the following Chromium command-line switches to isolate the issue, with no luck (the renderer process still freezes when the array is not copied first):
--use-passthrough-cmd-decoder: longer render times for 2D, and a null context for WebGL
--disable-gpu-sandbox: no effect
--in-process-gpu: no effect
Any idea what is happening?
I would like to utilize OpenCV's integration of OpenGL/OpenCL to achieve fast distortion of images directly on the GPU while avoiding GPU/CPU image transfers. I can create a cv::ogl::Buffer from an OpenGL buffer object in Qt:
// m_pub is of type QOpenGLBuffer
cv::ogl::Buffer b(512, 512, CV_8UC4, m_pub.bufferId());
But the next line throws an exception:
cv::UMat m = cv::ogl::mapGLBuffer(b);
The error reported by OpenCV was originally:
OpenCV(4.5.5) Error: Unknown error code -220 (OpenCL: clCreateFromGLBuffer failed) in cv::ogl::mapGLBuffer, file D:\OpenCV\opencv-4.5.5\modules\core\src\opengl.cpp, line 1886
To get more information, I added a call to cv::ocl::getOpenCLErrorString(status) in opengl.cpp, rebuilt OpenCV, and found that the error is CL_INVALID_CONTEXT.
I've checked cv::ocl::Context, cv::ocl::Device and cv::ocl::Platform, and added cv::ocl::attachContext, but none of it works. I'm stuck here and don't know how to go forward.
Any suggestions are really appreciated. Thanks.
You need to call cv::ogl::ocl::initializeContextFromGL() during initialization, while the OpenGL context is current. My project is based on Qt and the window derives from QOpenGLWindow, so I call that function in initializeGL().
My guess is that without this call OpenCV still creates an OpenCL context automatically, but that context is not the one created from OpenGL, which is why the CL_INVALID_CONTEXT error occurred.
As a side note, I used cv::remap to process a 512x512 image and found that processing on the GPU via OpenCL was not much faster than on the CPU: it still takes around 12 ms on average. That is too slow if the application needs a high frame rate and has to leave time for other processing.
I am working on exposing an audio library (a C library) to Dart. Starting the audio engine requires a few initialization steps (non-blocking for the UI); audio processing is then triggered by a perform function, which blocks (audio processing is a heavy task). That is why I started reading about Dart isolates.
My first thought was that I only needed to call the perform function in the isolate, but that doesn't seem possible, since perform takes the engine state as its first argument, and this engine state is an opaque pointer (Pointer in dart:ffi). When I try to pass the engine state to a new isolate with the compute function, the Dart VM returns an error: it cannot pass C pointers to an isolate.
I could not find a way to pass this data to the isolate; I assume this is because the main isolate and the one I'm creating have separate memory.
So I should probably manage the entire engine state inside the isolate, which means:
Create the engine state
Initialize it with some options (strings)
Trigger the perform function
Control audio at runtime
I couldn't find any example of how to perform these actions in the isolate while triggering them from the main thread/isolate, nor of how to manage isolate memory (keep the engine state and use it). Of course I could do
Here is a non-isolated example of what I want to do:
Pointer<Void> engineState = createEngineState();
initEngine(engineState, parametersString);
startEngine(engineState);
perform(engineState);
And at runtime, triggered by UI actions (such as a slider value changing or a button being clicked):
setEngineControl(engineState, valueToSet);
double controlValue = getEngineControl(engineState);
The engine state could be encapsulated in a class, I don't think it really matters here.
Whether it is a class or an opaque data type, I can't work out how to manage and keep this state, and trigger actions from the main thread that are processed in the isolate. Any ideas?
Thanks in advance.
PS: I notice, while writing, that my question and explanation may not be precise; I have to admit I'm a bit lost here, since I have never used Dart isolates. Please tell me if any information is missing.
EDIT, April 24th:
Creating and managing the engine state inside the isolate seems to work, but the main problem isn't solved: because perform blocks until it completes, the isolate cannot receive any messages in the meantime.
My first idea was to use the performBlock function, which performs only one block of audio samples, like this:
while (performBlock(engineState)) {
  // listen for messages and do something
}
But this doesn't work either: the isolate is still blocked until the audio performance finishes. Even if this loop runs inside an async method in the isolate, it blocks, and no messages are read.
I am now considering passing the Pointer<Void> managed in the main isolate to another isolate, which would act as the worker (for the perform function only), so that control functions could still be triggered from the main isolate.
The isolate Dart package provides a registry sub-library to manage some shared memory, but it still cannot pass a void pointer between isolates:
[ERROR:flutter/lib/ui/ui_dart_state.cc(157)] Unhandled Exception: Invalid argument(s): Native objects (from dart:ffi) such as Pointers and Structs cannot be passed between isolates.
Has anyone run into this kind of situation?
You can get the address this Pointer points to as an integer and construct a new Pointer from that address (see Pointer.address and Pointer.fromAddress()). Since integers can be passed freely between isolates, this lets you pass native pointers between them; the address stays valid because all isolates run in the same process and share its native memory.
In your case this could be done, for example, like this (I used Flutter's compute to keep the example simple, but the same approach should work with explicit SendPort/ReceivePort as well):
import 'dart:ffi';

import 'package:flutter/foundation.dart'; // for compute()

// createEngineState, initEngine, startEngine and perform are the
// dart:ffi bindings from the question.

// Callback to be used in a background isolate.
// Creates and starts a new engine and returns its address.
int initEngineInIsolate(String parameters) {
  final Pointer<Void> engineState = createEngineState();
  initEngine(engineState, parameters);
  startEngine(engineState);
  return engineState.address;
}

// Callback to be used in a background isolate.
// Runs the (blocking) perform function on the engine at the given address.
void performWithEngine(int engineStateAddress) {
  final engineState = Pointer<Void>.fromAddress(engineStateAddress);
  perform(engineState);
}

Future<void> main() async {
  // Initialize the engine in a background isolate; compute() returns a Future.
  final int address = await compute(initEngineInIsolate, "parameters");
  final engineState = Pointer<Void>.fromAddress(address);
  // Do the heavy computation in a background isolate using the engine.
  // Not awaited, so the main isolate stays free to handle UI events.
  compute(performWithEngine, engineState.address);
}
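The same idea can be illustrated outside Dart with Python's ctypes as a stand-in for dart:ffi: only a plain integer crosses the thread boundary, and the receiver rebuilds a typed view of the same native memory. This is an analogue, not Dart code, and native_state is a hypothetical stand-in for the engine state. It relies on both sides sharing one address space, as isolates of the same process do; an address sent to another process would be meaningless.

```python
import ctypes
import queue
import threading


def worker(addresses: "queue.Queue[int]", results: "queue.Queue[int]") -> None:
    # Only a plain integer arrives; rebuild a typed view of the same memory.
    address = addresses.get()
    value = ctypes.c_int.from_address(address)
    value.value += 1  # mutates the shared native memory in place
    results.put(value.value)


native_state = ctypes.c_int(41)  # hypothetical stand-in for the engine state
addresses: "queue.Queue[int]" = queue.Queue()
results: "queue.Queue[int]" = queue.Queue()

t = threading.Thread(target=worker, args=(addresses, results))
t.start()
addresses.put(ctypes.addressof(native_state))  # send the address as an int
t.join()
print(results.get(), native_state.value)  # 42 42
```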
I ended up doing the processing of callbacks inside the audio loop itself.
while (performAudio()) {
  tasks.forEach((String key, List<int> value) {
    double val = getCallback(key);
    value.forEach((int element) {
      callbackPort.send([element, val]);
    });
  });
}
Here val is the value you want to send to the callback, and value (a List<int>) holds the indices of the callbacks registered for that key.
Say your audio loop runs with a vector size of 512 samples: you can then send your callbacks after every 512 processed samples, i.e. 48000 / 512 = 93.75 times per second (roughly every 10.7 ms), assuming a sample rate of 48 kHz. This method is not the best one, but it works; I still have to check whether it holds up under very intensive processing. It was designed for real-time audio, but it should work the same way for audio rendering.
You can see the full code here : https://framagit.org/johannphilippe/csounddart/-/blob/master/lib/csoundnative.dart
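A language-neutral restatement of the loop above, written in Python so it can be tested: perform_block, get_value, tasks and send correspond to performAudio, getCallback, tasks and callbackPort.send, and these parameter names are hypothetical. It makes explicit that values reach the UI only at block boundaries, and that each channel is read once per block no matter how many callbacks are registered on it.

```python
from typing import Callable, Dict, List


def run_audio_loop(
    perform_block: Callable[[], bool],
    get_value: Callable[[str], float],
    tasks: Dict[str, List[int]],
    send: Callable[[list], None],
) -> int:
    """Process blocks until perform_block() returns False.

    After each block, read every registered channel once and send
    [callback_index, value] for each callback bound to that channel.
    Returns the number of blocks processed.
    """
    blocks = 0
    while perform_block():
        blocks += 1
        for key, callback_indices in tasks.items():
            val = get_value(key)  # one engine read per channel per block
            for index in callback_indices:
                send([index, val])
    return blocks
```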
I am trying to create a Gtk widget that displays an OpenCV image passed to it. I have created a class derived from Gtk.Image that is used to show the image. You pass the OpenCV image to this class using the show_frame method, which then updates the Gtk.Image so it shows that image.
I have tested this and it works fine, i.e. the image is correctly shown and updated when show_frame is called. However, every time the image is updated the memory usage increases, until there is not enough memory and the program crashes.
I believe this is because the image memory is not being freed correctly, but I cannot work out how to fix it. I have tried unreferencing the gbytes once a new frame is received, but this does not help. The memory only builds up when set_from_pixbuf is called; if that call is commented out, memory usage stays constant.
import cv2
import gi

gi.require_version("Gtk", "3.0")  # assumption: GTK 3
from gi.repository import GdkPixbuf, GLib, Gtk


class OpenCVImageViewer(Gtk.Image):
    def __init__(self):
        Gtk.Image.__init__(self)

    def show_frame(self, frame):
        # Convert from OpenCV's BGR channel order to the RGB order Gtk expects
        rgb_image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Get details about the frame in order to set up the pixbuf
        height = rgb_image.shape[0]
        width = rgb_image.shape[1]
        nChannels = rgb_image.shape[2]
        # tobytes() is the non-deprecated name for ndarray.tostring()
        gbytes = GLib.Bytes.new(rgb_image.tobytes())
        pixbuf = GdkPixbuf.Pixbuf.new_from_bytes(gbytes, GdkPixbuf.Colorspace.RGB, False,
                                                 8, width, height, width * nChannels)
        # Hand the widget update to the GTK main loop for thread safety
        GLib.idle_add(self.set_from_pixbuf, pixbuf)
        GLib.idle_add(self.queue_draw)
Well, I found a solution, but I do not understand why it works: set the image from a copy of the pixbuf.
imageWidget.set_from_pixbuf(pixbuffer.copy())
I came to this solution after observing that the memory leak disappeared for scaled pixbufs (i.e. the result of pixbuffer.scale_simple).
Excerpt from the PyGTK FAQ, section 5.17:
There is a reference cycle between the python wrapper and its underlying C object; this means that the object will not be automatically deallocated when there are no more user references, and you will need the garbage collector to kick in (which may take a few cycles). This occasionally causes the odd problem, such as with pixbufs described in FAQ 8.4
And from section 8.4:
The answer is "Interesting GC behaviour" in Python. Apparently finalizers are not necessarily called as soon as an object goes out of scope. My guess is that the python memory manager doesn't directly know about the storage allocated for the image buffer (since it's allocated by the gdk) and that it therefore doesn't know how fast memory is being consumed.
The solution is to call gc.collect() at some appropriate place.
For example, I had some code that looked like this:
for image_path in images:
    pb = gtk.gdk.pixbuf_new_from_file(image_path)
    pb = pb.scale_simple(thumb_width, thumb_height, gtk.gdk.INTERP_BILINEAR)
    thumb_list_model.set_value(thumb_list_model.append(None), 0, pb)
This chewed up an unacceptably large amount of memory for any reasonable image set. Changing the code to look like this fixed the problem:
import gc
for image_path in images:
    pb = gtk.gdk.pixbuf_new_from_file(image_path)
    pb = pb.scale_simple(thumb_width, thumb_height, gtk.gdk.INTERP_BILINEAR)
    thumb_list_model.set_value(thumb_list_model.append(None), 0, pb)
    del pb
    gc.collect()
I am not exactly sure where you should call the garbage collector in your code (since I don't really know that much Python), but I believe this is the way to solve it.
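The FAQ's point about reference cycles can be reproduced without GTK. In this minimal Python sketch a hypothetical Wrapper object refers to itself, standing in for the wrapper/C-object cycle the FAQ describes, and its bytearray stands in for the pixel buffer. Automatic collection is disabled only to make the demonstration deterministic: dropping the last user reference does not free the object, and only an explicit gc.collect() does.

```python
import gc
import weakref


class Wrapper:
    """Hypothetical stand-in for a wrapper that takes part in a reference cycle."""

    def __init__(self, payload_size: int) -> None:
        self.payload = bytearray(payload_size)  # stands in for the pixel buffer
        self.self_ref = self  # reference cycle: refcount never drops to zero


freed = []
gc.disable()  # only so that the demonstration is deterministic
obj = Wrapper(1024 * 1024)
weakref.finalize(obj, freed.append, "payload released")
del obj        # no user references are left...
print(freed)   # [] ...but the cycle keeps the object and its buffer alive
gc.collect()   # the cycle collector finds and frees the cycle
print(freed)   # ['payload released']
gc.enable()
```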
I have a background thread which loads images (either from disk or a server), with the goal of eventually passing them to the main thread to draw. When this second thread is loading GIF images using the VCL's TGIFImage class, this program sometimes leaks several handles each time the following line executes in the thread:
m_poBitmap32->Assign(poGIFImage);
That is, the just-opened GIF image is being assigned to a bitmap owned by the thread. None of these are shared with any other threads, i.e. are entirely localised to the thread. It is timing-dependent, so doesn't occur every time the line is executed, but when it does occur it happens only on that line. Each leak is one DC, one palette, and one bitmap. (I use GDIView, which gives more detailed GDI information than Process Explorer.) m_poBitmap32 here is a Graphics32 TBitmap32 object, but I have reproduced this using plain VCL-only classes, i.e. using Graphics::TBitmap::Assign.
Eventually I get an EOutOfResources exception, probably indicating the desktop heap is full:
:7671b9bc KERNELBASE.RaiseException + 0x58
:40837f2f ; C:\Windows\SysWOW64\vclimg140.bpl
:40837f68 ; C:\Windows\SysWOW64\vclimg140.bpl
:4084459f ; C:\Windows\SysWOW64\vclimg140.bpl
:4084441a vclimg140.#Gifimg#TGIFFrame#Draw$qqrp16Graphics#TCanvasrx11Types#TRectoo + 0x4a
:408495e2 ; C:\Windows\SysWOW64\vclimg140.bpl
:50065465 rtl140.#Classes#TPersistent#Assign$qqrp19Classes#TPersistent + 0x9
:00401C0E TLoadingThread::Execute(this=:00A44970)
How do I solve this and safely use TGIFImage in a background thread?
And secondly, will I encounter this same problem with the PNG, JPEG or BMP classes? I haven't so far, but given it's a threading / timing issue that doesn't mean I won't if they use similar code to TGIFImage.
I am using C++ Builder 2010 (part of RAD Studio.)
More details
Some research showed I'm not the only person to encounter this. To quote from one thread,
Help (2007) says:
In multi-threaded applications that use Lock to protect a canvas, all calls that use the canvas must be protected by a call to Lock. Any thread that does not lock the canvas before using it will introduce potential bugs.
[...]
But this statement is absolute false: you MUST lock the canvas in secondary thread even if other threads don't touch it. Otherwise the canvas's GDI handle can be freed in main thread as unused at any moment (asynchronously).
Another reply indicates something similar, that it may be to do with the GDI object cache in graphics.pas.
That's scary: an object created and used entirely in one thread can have some of its resources freed asynchronously in the main thread. Unfortunately, I don't know how to apply the Lock advice to TGIFImage. TGIFImage has no Canvas, although it does have a Bitmap which has a canvas. Locking that has no effect. I suspect that the problem is actually in TGIFFrame, an internal class. I also do not know if or how I should lock any TBitmap32 resources. I did try assigning a TMemoryBackend to the bitmap, which avoids using GDI, but it had no effect.
Reproduction
You can reproduce this very easily. Create a new VCL app, and make a new unit which contains a thread. In the thread's Execute method, place this code:
while (!Terminated) {
    TGraphic* poGraphic = new TGIFImage();
    TBitmap32* poBMP32 = new TBitmap32();
    __try {
        poGraphic->LoadFromFile(L"test.gif");
        poBMP32->Assign(poGraphic);
    } __finally {
        delete poBMP32;
        delete poGraphic;
    }
}
You can use Graphics::TBitmap if you don't have Graphics32 installed.
In the app's main form, add a button which creates and starts the thread. Add another button which executes similar code to the above (once only, no need to loop. Mine also stores the TBitmap32 as a member variable instead of creating it there, and invalidates so it will eventually paint it to the form.) Run the program and click the button to start the thread. You will probably see GDI objects leak already, but if not press the second button which runs the similar code once in the main thread - once is enough, it seems to trigger something - and it will leak. You will see memory usage rise, and that it leaks GDI handles at the rate of several dozen per second.
Unfortunately, the fix is very, very ugly. The basic idea is that the background thread must acquire a lock that the main thread holds when it's between messages.
The naive implementation looks like this:
1. Lock canvas mutex.
2. Spawn background thread.
3. Wait for message.
4. Release canvas mutex.
5. Process message.
6. Lock canvas mutex.
7. Go to step 3.
Note that this means the background thread can only access GDI objects while the main thread is busy, not while it is waiting for a message. It also means the background thread cannot own any canvases while it does not hold the mutex. These two requirements tend to be too painful, so you may need to refine the algorithm.
One refinement is to have the background thread send the main thread a message when it needs to use a canvas. This will cause the main thread to more quickly release the canvas mutex so the background thread can get it.
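The numbered steps plus this refinement can be sketched in Python as follows. It illustrates the locking protocol only: a threading.Lock stands in for the canvas mutex, appending to log stands in for GDI work, and the "need canvas" message, the canvas_done event and the other names are hypothetical. The event is an extra assumption: while handling "need canvas" the main thread waits until the worker is finished, because with a plain (unfair) mutex it could otherwise re-lock in step 6 before the worker gets its turn, and both threads would then wait forever.

```python
import queue
import threading

canvas_mutex = threading.Lock()   # held by the main thread while it waits
canvas_done = threading.Event()   # hypothetical: worker signals it is finished
messages: "queue.Queue[str]" = queue.Queue()
log: list = []


def background() -> None:
    # Refinement: ask the main thread to process a message, which makes it
    # release the mutex; then use the (simulated) canvas while holding it.
    messages.put("need canvas")
    with canvas_mutex:
        log.append("background used canvas")
    canvas_done.set()
    messages.put("quit")


def handle(msg: str) -> None:
    if msg == "need canvas":
        canvas_done.wait()  # stay unlocked until the worker is done
    log.append(f"main handled {msg}")


def main_loop() -> None:
    canvas_mutex.acquire()                        # 1. Lock canvas mutex.
    worker = threading.Thread(target=background)
    worker.start()                                # 2. Spawn background thread.
    while True:
        msg = messages.get()                      # 3. Wait for message.
        canvas_mutex.release()                    # 4. Release canvas mutex.
        handle(msg)                               # 5. Process message.
        canvas_mutex.acquire()                    # 6. Lock canvas mutex.
        if msg == "quit":                         # 7. Otherwise go to step 3.
            break
    canvas_mutex.release()
    worker.join()


main_loop()
print(log)
```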
I think this will be enough to make you give up this idea. Instead, perhaps, read the file from the background thread but process it in the main thread.
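A minimal sketch of that alternative, again in Python for illustration: the worker thread does only file I/O and hands raw bytes to the main thread through a queue, and all decoding happens on the main thread. decode_on_main_thread is a hypothetical placeholder for building the TGIFImage from a memory stream; in a real GUI the main thread would be notified through its message loop rather than blocking on the queue.

```python
import queue
import threading
from pathlib import Path
from typing import Dict, List, Optional, Tuple

Item = Optional[Tuple[str, bytes]]


def loader(paths: List[str], out: "queue.Queue[Item]") -> None:
    # Background thread: plain file I/O only, no graphics objects.
    for path in paths:
        out.put((path, Path(path).read_bytes()))
    out.put(None)  # sentinel: nothing more to load


def decode_on_main_thread(raw: bytes) -> int:
    # Hypothetical placeholder for the GDI-dependent decoding step.
    return len(raw)


def main_thread(paths: List[str]) -> Dict[str, int]:
    out: "queue.Queue[Item]" = queue.Queue()
    threading.Thread(target=loader, args=(paths, out), daemon=True).start()
    results: Dict[str, int] = {}
    # Blocking here for simplicity; a GUI would react to a posted message.
    while (item := out.get()) is not None:
        path, raw = item
        results[path] = decode_on_main_thread(raw)
    return results
```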
I'm developing an application with Adobe AIR 3 for iOS and frequently getting low-memory errors.
Since the iOS 5 update, the OS has started killing my app after a few low-memory warnings.
But the thing is, the profiler says the app uses only 4 to 9 MB of memory.
The app performs a lot of bitmap copy operations and sometimes instantiates new bitmaps from embedded bitmaps.
I have optimized everything heavily and looked for leaks.
I watch the profiler's memory status and the GC seems to clear everything. Everything looks perfect, but the app keeps getting low-memory errors and gets killed by the OS.
Is there anything wrong with the code below? My assumption is that this ClassReference is never released from memory, even though the profiler says memory is cleared.
I used the clone method to pass a value instead of a reference, so I guess the GC can collect that local variable. I tried with and without clone; nothing changes.
If the code below runs 10-15 times with different tile IDs, the app crashes, but with the same IDs it keeps working.
Is anyone familiar with this kind of thing?
Here tmp is a BitmapData:
if (isMoving)
{
    tmp = getProxyImage(x, y); // low resolution tile image
}
else
{
    strTmp = "main_TILE" + getTileID(x, y);
    var ClassReference:Class = getDefinitionByName(strTmp) as Class; // full resolution tile image // something wrong here
    tmp = new ClassReference().bitmapData.clone(); // something wrong here
    ClassReference = null;
}
return tmp.clone();
Thanks for reading. I hope someone has a solution for this.
You are creating three copies of your BitmapData with this code. They will likely get garbage collected eventually, but you probably run out of memory before that happens.
(Here I assume you have embedded your BitmapData using the [Embed] tag.)
// allocates no new memory, class reference already exists
var ClassReference:Class = getDefinitionByName(strTmp) as Class;
// creates a new BitmapAsset from the class reference, including its BitmapData,
// then you clone this BitmapData, giving you two copies
tmp = new ClassReference().bitmapData.clone();
// not really necessary since ClassReference goes out of scope anyway, but no harm done
ClassReference = null;
// makes a third copy of your second copy and returns it
return tmp.clone();
I would recommend this instead (assuming you need a unique BitmapData for each tile):
var ClassReference:Class = getDefinitionByName(strTmp) as Class;
return new ClassReference().bitmapData.clone();
If you don't need unique BitmapData instances, keep them in static properties on some class and reuse the same ones everywhere. That will minimize memory usage.
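The static-property idea amounts to a per-tile cache. It is sketched here in Python for illustration (in the ActionScript app the dictionary would be a static property); TileCache and load are hypothetical names, and load stands in for new ClassReference().bitmapData. With a large tile set the cache itself grows without bound, so some eviction policy may be needed.

```python
from typing import Callable, Dict


class TileCache:
    """Hypothetical cache: one shared bitmap per tile ID, created on first use."""

    def __init__(self, load: Callable[[str], bytearray]) -> None:
        self._load = load  # stands in for instantiating the embedded asset
        self._tiles: Dict[str, bytearray] = {}

    def get(self, tile_id: str) -> bytearray:
        bitmap = self._tiles.get(tile_id)
        if bitmap is None:
            bitmap = self._load("main_TILE" + tile_id)  # decoded only once
            self._tiles[tile_id] = bitmap
        return bitmap  # shared instance: callers must treat it as read-only
```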