I have written an image processing application using Visual C++ forms and OpenCV on a Windows machine. Everything seems to work OK, but displaying the images is very slow: only a few fps. I would like to get to 30 or so. I am currently using the standard imshow(...) followed by waitKey(1).
My question is: is there a better (i.e. faster) way to get an image from memory to the monitor?
The Mat structure used by OpenCV is essentially a fancy header pointing to a contiguous block of unsigned char values.
Edit:
I tested my code with the VS2013 profiler and it claims that I am spending 50% of the execution time in imshow/waitKey.
I've seen several discussions on this in the OpenCV Q&A forum and they always end with "you shouldn't be using imshow except for debugging", but nobody suggests anything else to use, so I thought I'd try here.
guy
Without seeing what you have, here is the approach I would take to achieve what you want.
Have a dedicated thread for frame acquisition from the camera. Insert the acquired frames into a synchronized queue that is consumed by the:
Image processing thread. It takes frames from the queue and processes them into images suitable for display. It updates a synchronized output image and notifies the GUI about it.
Main (GUI) thread, dedicated only to display. When it is notified of an image update, it swaps the synchronized output image with its current working image. (To avoid copying and extra allocations, we just reuse those two image buffers.) Then it invalidates the window. In the WM_PAINT handler, it then displays the image using BitBlt.
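As a rough sketch of that WM_PAINT path: the code below uses StretchDIBits straight from the Mat's pixel block rather than an explicit BitBlt of a DIB section, assumes the output image has already been converted to BGRA (CV_8UC4, so every row is DWORD-aligned) and matches the window size, and all names are placeholders.

    #include <windows.h>
    #include <opencv2/core.hpp>

    // Hypothetical helper called from the WM_PAINT handler with the
    // synchronized output image (already swapped in by the GUI thread).
    void PaintFrame(HWND hwnd, const cv::Mat& frame)   // frame: continuous CV_8UC4
    {
        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hwnd, &ps);

        BITMAPINFO bmi = {};
        bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
        bmi.bmiHeader.biWidth       = frame.cols;
        bmi.bmiHeader.biHeight      = -frame.rows;   // negative height = top-down rows
        bmi.bmiHeader.biPlanes      = 1;
        bmi.bmiHeader.biBitCount    = 32;            // BGRA
        bmi.bmiHeader.biCompression = BI_RGB;

        // Copy the pixel block straight from the Mat to the window DC.
        StretchDIBits(hdc,
                      0, 0, frame.cols, frame.rows,   // destination rectangle
                      0, 0, frame.cols, frame.rows,   // source rectangle
                      frame.data, &bmi, DIB_RGB_COLORS, SRCCOPY);

        EndPaint(hwnd, &ps);
    }

In a C++/CLI Windows Forms app the same idea applies if you draw into the HDC of a Panel obtained from its window handle.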
Some notes:
Minimize allocation/deallocation of buffers. For acquisition, you could have a pre-allocated pool of buffers to cycle through.
Prepare the output images in a format and size that suit the display.
Keep track of the number of frames in the queue and set some upper limit. Define an algorithm for dropping excess frames, so that you don't run out of memory and don't lag too much (a sketch of such a queue follows these notes).
If you just want to ditch the sleep in waitKey and want something simpler, have a look at this question.
Instrument your code -- add timing of the crucial parts using a high-resolution timer. Log the timings, and/or keep statistics and history.
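To make the queue concrete, here is a minimal C++11 sketch of a bounded, synchronized frame queue with a drop-oldest policy; the class and member names are placeholders, not code from the question.

    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <opencv2/core.hpp>

    // The acquisition thread pushes, the processing thread pops. When the queue
    // is full, the oldest frame is dropped so memory use and latency stay bounded.
    class FrameQueue
    {
    public:
        explicit FrameQueue(std::size_t maxFrames) : maxFrames_(maxFrames) {}

        void push(cv::Mat frame)
        {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (frames_.size() >= maxFrames_)
                    frames_.pop_front();              // drop the oldest frame
                frames_.push_back(std::move(frame));
            }
            notEmpty_.notify_one();
        }

        cv::Mat pop()                                 // blocks until a frame arrives
        {
            std::unique_lock<std::mutex> lock(mutex_);
            notEmpty_.wait(lock, [this] { return !frames_.empty(); });
            cv::Mat frame = std::move(frames_.front());
            frames_.pop_front();
            return frame;
        }

    private:
        std::mutex mutex_;
        std::condition_variable notEmpty_;
        std::deque<cv::Mat> frames_;
        std::size_t maxFrames_;
    };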
Related
To avoid writing to a constant buffer from both the GPU and CPU at the same time, Apple recommends using a triple-buffered system with the help of a semaphore to prevent the CPU getting too far ahead of the GPU (this is fine and covered in at least three Metal videos now at this stage).
However, when the constant resource is an MTLTexture and the AVCaptureVideoDataOutput delegate runs separately from the rendering loop (CADisplayLink), how can a similar triple-buffered system (as used in Apple's sample code MetalVideoCapture) guarantee synchronization? Screen tearing (texture tearing) can be observed if you take the MetalVideoCapture code, simply render a full-screen quad, and change the preset to AVCaptureSessionPresetHigh (at the moment the tearing is obscured by the rotating quad and the low-quality preset).
I realize that the rendering loop and the captureOutput delegate method (in this case) are both on the main thread and that the semaphore (in the rendering loop) keeps the _constantDataBufferIndex integer in check (which indexes into the MTLTexture for creation and encoding), but screen tearing can still be observed, which is puzzling to me (it would make sense if the GPU wrote the texture not on the next frame after encoding but 2 or 3 frames later, but I don't believe this to be the case). Also, just a minor point: shouldn't the rendering loop and captureOutput have the same frame rate for a buffered texture system, so old frames aren't rendered interleaved with recent ones?
Any thoughts or clarification on this matter would be greatly appreciated; there is another example from McZonk, which doesn't use the triple-buffered system, but I also observed tearing with this approach (though less so). Obviously, no tearing is observed if I use waitUntilCompleted (equivalent to OpenGL's glFinish), but that's like playing an accordion with one arm tied behind your back!
I am creating a photo slide show with complex transitions between images on iOS. Core Animation doesn't suit the purpose, as the possible transitions are limited, so I resorted to OpenGL ES 2.0. The problem is that uploading images to the GPU and creating a texture is a time-consuming operation and takes roughly 200 ms even for a 960x640 image, which is not suitable for a real-time playback scenario. And it's not feasible to pre-create all the textures beforehand, as there could be hundreds of them. I wonder how Core Animation deals with this problem and stays smooth no matter how many CGImages you assign in animations (as long as the images are presented at different times and not together)?
Texture loading is time-consuming, and most applications dealing with a large number of textures load them during some initialisation step. That is the simplest approach, but surely the most resource-consuming. You must understand that what goes on in the background is reading an image file, decompressing it, creating raw RGB(A) data on the CPU, allocating memory on the GPU, and sending the raw data to the GPU...
The best approach to dealing with a large number of textures is loading them in the background, preferably even before you need them. In your case, as already mentioned in the comment, you will need to create some smart cache of these textures. That will still not be enough, since the loading itself might make your thread unresponsive. You will need to add a background task to handle those images.
What I suggest is creating 2 additional threads. The first should load the image data on the CPU, while the second pushes the data to the GPU. The first thread is pretty straightforward, while the second needs a bit of additional GL code. Each thread needs its own OpenGL context to be able to communicate with the GPU, so once you create this thread you also need to create an extra context. These contexts are not aware of each other's resources, which means a texture created in one context is unusable in the other. For this you need an extra parameter called a share group. So first you create the share group, then create both contexts with the same share group, and the textures become accessible from both. Do note that a context is preferably created on the thread you intend to use it on (it might be enough to simply set it as current there, though).
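Roughly, the GL side of the second (upload) thread could look like the sketch below. Creating the extra context with the shared share group (EAGLSharegroup/EAGLContext on iOS) is Objective-C and is not shown; assume it has already been made current on this thread. The function and parameter names are placeholders.

    #include <OpenGLES/ES2/gl.h>

    // Runs on the background upload thread. `pixels` is the decompressed RGBA
    // data produced by the CPU-side loading thread. The returned texture name
    // is usable from the rendering context because both contexts were created
    // with the same share group.
    GLuint UploadTexture(const void* pixels, int width, int height)
    {
        GLuint tex = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);

        // Flush so the texture is fully defined before the other context binds
        // it -- one of the synchronisation rules for shared objects.
        glFlush();
        return tex;
    }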
I can already hear the wrenching guts of a thousand iOS developers.
No, I am not a noob.
Why is -drawRect faster for UITableView performance than having multiple views?
I understand that compositing operations take place on the GPU. But compositing is a one-time operation; once the layers are committed to memory, it is no different from a cached buffer that, from the point of view of the GPU, gets translated in and out of view. Compare this to using Core Graphics in drawRect, which employs an unknown number of operations on the CPU to produce pixels that end up getting cached in CALayers anyway. What's the difference if it all ends up cached and flattened anyway?
Also, if you're handling cell reuse properly, you shouldn't need to regenerate views on each call to -cellForRowAtIndexPath. In fact, there may be a performance benefit to having the state data (font, font size, text color, attributes, etc.) cached by UIView/CALayer objects rather than having it constantly recreated during -drawRect.
Why the craze for drawRect? Can someone give me pointers?
When you're talking about optimization, you need to provide specific situations, conditions, and limitations, because optimization is all about micro-management. Otherwise, it's meaningless.
What's the basis of your "faster"? How did you measure it? What are the numbers?
For example, a no-op or very simple -drawRect: can be faster, but that doesn't mean it always is.
I don't know the internal design of CA either, so here are my guesses.
In the case of static content
It's weird that your drawing code is being called constantly, because CALayer caches the drawing result and won't draw again until you send a setNeedsDisplay message. If you don't update a cell's content, it's just the same as a single bitmap layer, and it should be faster than multiple composited layers because it avoids the compositing cost. If you're using only a small number of cells, few enough that they can all exist in the pool at the same time, they don't need to be updated at all. As RAM gets larger in recent models, this is more and more likely to be the case.
In the case of dynamic content
If the content is being updated constantly, it means you're actually updating it yourself, so your layer-composited version would also be updated constantly, which means it gets composited again on every frame. The cost grows with how complex and large it is; if it's complex and large with a lot of overlapping areas, it could be slower. I guess CA will draw everything strictly if it can't determine which areas are safe to ignore, whereas in your own drawing you can choose what to draw and what not to.
In the case where the actual drawing is done on the CPU
Even if you configure your view as a pure composition of many layers, each sublayer still has to be drawn eventually, and the drawing of its content is not guaranteed to happen on the GPU. For example, I believe CATextLayer draws itself on the CPU (because drawing text with polygons on current mobile GPUs doesn't make sense from a performance perspective), and some filtering effects do too. In that case the overall cost would be similar, plus it requires the compositing cost.
In the case of a well-balanced load between CPU and GPU
If your GPU is very busy under a heavy load because there are too many layers or direct OpenGL drawings, your CPU may be idle. If your CG drawing can be done within that idle CPU time, it could be faster than putting even more load on the GPU.
None of these is your case?
If your case is none of the situations I listed above, I really want to see and check the CG code that draws faster than CA composition. I wish you would attach some source code.
Well, your program could easily end up moving and converting a lot of pixel data if it goes back and forth between GPU- and CPU-based renderers.
As well, many layers can consume a lot of memory.
I'm only seeing half the conversation here, so I might have misunderstood. Based on my recent experiences optimizing CALayer rendering, and investigating the ways Apple does(n't) optimize stuff you'd expect to be optimized...
What's the difference if it all ends up cached and flattened anyway?
Apple ends up creating a separate GPU element per layer. If you have lots of layers, you have lots of GPU elements. If you have one drawRect, you only have one element. Apple often does NOT flatten those, even where they could (and possibly "should").
In many cases, "lots of elements" is no issue. But if they get to be large ... or there are enough of them ... or they're bad sizes for OpenGL ... AND (see below) they get stored in CPU memory instead of on the GPU, then things start to get nasty. NB: in my experience:
"enough": 40+ in memory
"large": 100x100 points (200x200 retina pixels)
Apple's code for GPU elements / buffers is well optimized in MOST places, but in a few places it's very POORLY optimized. The performance drop is like going off a cliff.
Also, if you're handling cell reuse properly, you shouldn't need to regenerate views on each call to -cellForRowAtIndexPath
You say "properly", except ... IIRC Apple's docs tell people not to do it that way, they go for a simpler approach (IMHO: weak docs), and instead re-populate all the subviews on every call. At which point ... how much are you saving?
FINALLY:
...doesn't all this change with iOS 6, where the cost of creating a UIView is greatly reduced? (I haven't profiled it yet, just been hearing about it from other devs)
I'm developing an iPad app that uses large textures in OpenGL ES. When the scene first loads I get a large black artifact on the ceiling for a few frames, as seen in the picture below. It's as if higher levels of the mipmap have not yet been filled in. On subsequent frames, the ceiling displays correctly.
This problem only began showing up when I started using mipmapping. One possible explanation is that the glGenerateMipmap() call does its work asynchronously, spawning some mipmap creation worker (in a separate process, or perhaps in the GPU) and returning.
Is this possible, or am I barking up the wrong tree?
Within a single context, all operations will appear to execute strictly in order. However, in your most recent reply, you mentioned using a second thread. To do that, you must have created a second shared context: it is always illegal to re-enter an OpenGL context. If already using a shared context, there are still some synchronization rules you must follow, documented at http://developer.apple.com/library/ios/ipad/#DOCUMENTATION/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/WorkingwithOpenGLESContexts/WorkingwithOpenGLESContexts.html
It should be synchronous; OpenGL does not in itself have any real concept of threading (excepting the implicit asynchronous dialogue between CPU and GPU).
A good way to diagnose would be to switch to GL_LINEAR_MIPMAP_LINEAR. If it's genuinely a problem with lower resolution mip maps not arriving until later then you'll see the troublesome areas on the ceiling blend into one another rather than the current black-or-correct effect.
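In code, that diagnostic is just a change to the texture's minification filter; a trivial sketch with a placeholder texture name:

    #include <OpenGLES/ES2/gl.h>

    // Trilinear filtering blends adjacent mip levels, so if lower-resolution
    // levels really do arrive late, the affected areas fade between black and
    // correct instead of flipping hard between the two.
    void UseTrilinearFiltering(GLuint ceilingTexture)
    {
        glBindTexture(GL_TEXTURE_2D, ceilingTexture);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    }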
A second guess, based on the output, would be some sort of depth buffer clearing issue.
I followed @Tommy's suggestion and switched to GL_LINEAR_MIPMAP_LINEAR. The black-or-correct effect then changed to a fade between correct and black.
I guess that although we all know that OpenGL is a pipeline (and therefore asynchronous unless you are retrieving state or explicitly synchronizing), we tend to forget it. I certainly did in this case, where I was not drawing but loading and setting up textures.
Once I confirmed the nature of the problem, I added a glFinish() after loading all my textures, and the problem went away. (By the way, my draw loop is in the foreground and my texture-loading loop, because it is so time-consuming and would impair interactivity, is in the background. Also, since this may vary between platforms: I'm using iOS 5 on an iPad 2.)
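Roughly, the loading loop now looks like the sketch below (simplified, with placeholder names rather than my actual code):

    #include <OpenGLES/ES2/gl.h>

    struct TextureJob {           // one decoded image waiting to be uploaded
        GLuint      name;         // texture name from glGenTextures
        int         width, height;
        const void* pixels;       // decoded RGBA data
    };

    // Background texture-loading loop. The uploads are pipelined, so without an
    // explicit sync the foreground draw loop can sample a texture whose mip
    // levels are not all resident yet. glFinish() blocks until every queued
    // upload and glGenerateMipmap has actually completed.
    void LoadAllTextures(const TextureJob* jobs, int count)
    {
        for (int i = 0; i < count; ++i) {
            glBindTexture(GL_TEXTURE_2D, jobs[i].name);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA,
                         jobs[i].width, jobs[i].height, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, jobs[i].pixels);
            glGenerateMipmap(GL_TEXTURE_2D);
        }
        glFinish();   // only now tell the draw loop that the textures are ready
    }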
We have a project that is up and coming that will require us to push texture image information to the EAGLView of an iPad app. Being green to OpenGL in general: are there implications to having a surface wait for texture information? What will OpenGL do while it's waiting for the image data? Does OpenGL require constant updates to its textures, or will it retain the data until we update the texture again? We're not going to have a loop or anything in the view, but more like an observer pattern.
When you upload a texture, you hand it off to the GPU — so a copy is made, in memory you don't have direct access to. It's then available to be drawn with as many times as you want. So there's no need for constant updates.
OpenGL won't do anything else while waiting for the image data, it's a synchronous API. The call to upload the data will take as long as it takes, the texture object will have no graphic associated with it beforehand and will have whatever you uploaded associated with it afterwards.
In the general case, OpenGL objects, including texture objects, belong to a specific context and contexts belong to a specific thread. However, iOS implements share groups, which allow you to put several contexts into a share group, allowing objects to be shared between them subject to you having to be a tiny bit careful about synchronisation.
iOS provides a specific subclass of CALayer, CAEAGLLayer, that you can use to draw to from OpenGL. It's up to you when you draw and how often. So your approach is the more native one, if anything. A lot of the samples wrap their drawing in a continuously running loop, but that's a convention that suits games rather than a requirement.
Obviously try the simplest approach of 'everything on the main thread' first. If you're not doing all that much then it'll likely be fast enough and save you code maintenance. However, uploading can cost more than you expect, since the OpenGL way of working is that you specify the data and the format it's in, leaving OpenGL to rearrange it as necessary for the particular GPU you're on. We're talking delays of the 0.3-of-a-second variety rather than 30 seconds, but enough that there'll be an obvious pause if the user taps a button or tries to move a slider at the same time.
So if keeping the main thread responsive proves an issue, I'd imagine that you'd want to hop onto a background thread, create a new context within the same share group as the one on the main thread, upload, then hop back to do the actual drawing. In which case it'll remain up to you how you communicate to the user that data has been received and is being processed as distinct from no data having been received yet, if the gap is large enough to justify doing so.