I have an issue programming for the graphics card.
I wrote a shader for image processing (I use DirectX 11 in combination with SharpDX), but the transfer from the CPU to the GPU is really slow (sometimes around 0.5 seconds). I think there should be a faster way than using Texture2D.FromStream, because when a video is played, every frame of the video also has to be transferred to the GPU (or am I wrong on this point?).
How can I speed up this process?
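To make the question concrete: the faster path I keep reading about is to create the texture once as a dynamic resource and then map and copy new pixel data into it every frame, instead of rebuilding it with Texture2D.FromStream. Below is a rough, untested sketch of that idea against the native Direct3D 11 API; as far as I can tell the SharpDX equivalents are the Texture2D constructor taking a Texture2DDescription and DeviceContext.MapSubresource. The device, context and pixel buffer are assumed to exist. Is this the right direction?

    // Illustrative sketch only (native Direct3D 11).
    // Assumes `device`, `context` and a width*height RGBA pixel buffer already exist.
    #include <d3d11.h>
    #include <cstring>

    ID3D11Texture2D* CreateDynamicTexture(ID3D11Device* device, UINT width, UINT height)
    {
        D3D11_TEXTURE2D_DESC desc = {};
        desc.Width            = width;
        desc.Height           = height;
        desc.MipLevels        = 1;
        desc.ArraySize        = 1;
        desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.Usage            = D3D11_USAGE_DYNAMIC;          // CPU-writable, GPU-readable
        desc.BindFlags        = D3D11_BIND_SHADER_RESOURCE;
        desc.CPUAccessFlags   = D3D11_CPU_ACCESS_WRITE;

        ID3D11Texture2D* tex = nullptr;
        device->CreateTexture2D(&desc, nullptr, &tex);         // created once, reused every frame
        return tex;
    }

    // Called once per frame: copy the new image into the existing texture.
    void UploadFrame(ID3D11DeviceContext* context, ID3D11Texture2D* tex,
                     const unsigned char* pixels, UINT width, UINT height)
    {
        D3D11_MAPPED_SUBRESOURCE mapped = {};
        if (SUCCEEDED(context->Map(tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
        {
            for (UINT row = 0; row < height; ++row)            // respect the driver's row pitch
                std::memcpy(static_cast<unsigned char*>(mapped.pData) + row * mapped.RowPitch,
                            pixels + row * width * 4,
                            width * 4);
            context->Unmap(tex, 0);
        }
    }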
I am using an implementation of YOLOv4 to detect traffic from a stream.
However, because the hardware is a little slower than the 25 FPS of the stream, it falls about 2 seconds behind. This means I get two-second jumps repeatedly throughout the detection process.
Is there a way for OpenCV to read the stream sequentially, even when it is not synced?
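To illustrate what I mean, here is a rough sketch (C++ with OpenCV; the stream URL and the detector call are placeholders) of the behaviour I am after: a capture thread that always keeps only the newest frame, so the slower detector never processes stale frames. I am not sure this is the intended way to do it.

    // Rough sketch: a capture thread keeps overwriting a single "latest frame" slot,
    // so the (slower) detection loop always works on the newest frame and never
    // falls behind the live stream. Names and the stream URL are placeholders.
    #include <opencv2/opencv.hpp>
    #include <atomic>
    #include <chrono>
    #include <mutex>
    #include <thread>

    int main()
    {
        cv::VideoCapture cap("rtsp://example-stream");       // placeholder stream URL
        std::mutex frameMutex;
        cv::Mat latestFrame;
        std::atomic<bool> running{true};

        // Capture thread: read as fast as the stream delivers, keep only the newest frame.
        std::thread reader([&] {
            cv::Mat frame;
            while (running && cap.read(frame)) {
                std::lock_guard<std::mutex> lock(frameMutex);
                frame.copyTo(latestFrame);
            }
            running = false;                                  // stream ended or failed
        });

        // Detection loop: grab whatever is newest, even if frames were skipped.
        while (running) {
            cv::Mat frame;
            {
                std::lock_guard<std::mutex> lock(frameMutex);
                if (!latestFrame.empty())
                    latestFrame.copyTo(frame);
            }
            if (frame.empty()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
                continue;
            }
            // runYoloDetection(frame);                       // placeholder for the actual detector
        }

        reader.join();
        return 0;
    }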
Thanks
In reading about the PowerVR alpha drivers for Vulkan, they note that multibuffering needs to be performed explicitly. Since Vulkan and Metal are so similar, can I actually turn off multibuffering altogether? I am willing to sacrifice throughput for low latency.
http://blog.imgtec.com/powervr/trying-out-the-new-vulkan-graphics-api-on-powervr-gpus
As a bonus, is it possible to avoid double buffering? I know racing the beam is coming back into style on the desktop, but I don't know whether mobile display tech supports simple single-buffering.
Double buffering is not about throughput, and in fact, for most modern GPUs latency increases in single-buffered operation.
I know racing the beam is coming back into style on the desktop
Most certainly not, because racing the beam only works if you can build your image scanline by scanline. Real-time graphics these days, however, works by sending triangles to the GPU to rasterize, and the GPU has some leeway in the order in which it touches the pixels. The order of the triangles relative to the screen also changes continuously.
Last but not least, all modern graphics systems these days are composited, which runs contrary to racing the beam.
In my application I need to play video in an unusual way.
Something like an interactive player for special purposes.
Main issues here:
video resolution can be from 200*200 px up to 1024*1024 px
I need the ability to change the speed from -60 FPS to 60 FPS (the video should play slower or faster depending on the selected speed; a negative value means the video plays backwards)
I need to draw lines and objects over the video and scale them with the image
I need the ability to zoom the image and pan it if its content is larger than the screen
I need the ability to change brightness and contrast and to invert the colors of the video
Here is what I am doing now:
I split my video into JPG frames
I created a timer that fires N times per second (play speed control)
on each timer tick I draw a new texture (the next JPG frame) with OpenGL
for zoom and pan I use OpenGL ES transformations (translate, scale)
Everything looks fine as long as I use 320*240 px, but with 512*512 px my play rate drops. Maybe it is a timer behaviour problem, maybe OpenGL. Sometimes, if I try to open big textures at a high play rate (more than 10-15 FPS), the application just crashes with memory warnings.
What is the best practice to solve this issue? In what direction should I dig? Would Cocos2D or another game engine help me? Maybe JPG is not the best format for textures and I should use PNG or PVR or something else?
Keep the video data as a video and use AVAssetReader to get the raw frames. Use kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange as the colorspace, and do YUV->RGB colorspace conversion in GLES. It will mean keeping less data in memory, and make much of your image processing somewhat simpler (since you'll be working with luma and chroma data rather than RGB values).
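For reference, the YUV->RGB step is only a few lines of fragment shader. Here is a rough sketch of a BT.601 video-range conversion (GLSL ES kept as a C string; the sampler names are placeholders for the luma and chroma textures, the chroma plane is assumed to be uploaded as GL_LUMINANCE_ALPHA, and v_texCoord comes from a pass-through vertex shader). The exact coefficients depend on your source's color matrix and range, so treat this as a starting point rather than the exact shader to use.

    // Rough sketch of a BT.601 video-range YUV -> RGB fragment shader.
    static const char* kYuvToRgbFragmentShader =
        "precision mediump float;                                        \n"
        "varying vec2 v_texCoord;                                        \n"
        "uniform sampler2D lumaTexture;    // Y plane                    \n"
        "uniform sampler2D chromaTexture;  // interleaved CbCr plane     \n"
        "void main()                                                     \n"
        "{                                                               \n"
        "    float y  = texture2D(lumaTexture,   v_texCoord).r;          \n"
        "    vec2  uv = texture2D(chromaTexture, v_texCoord).ra - 0.5;   \n"
        "    y = 1.1643 * (y - 0.0625);        // video-range expansion  \n"
        "    gl_FragColor = vec4(y + 1.5958 * uv.y,                      \n"
        "                        y - 0.3917 * uv.x - 0.8129 * uv.y,      \n"
        "                        y + 2.017  * uv.x,                      \n"
        "                        1.0);                                   \n"
        "}                                                               \n";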
You don't need to bother with Cocos 2d or any game engine for this. I strongly recommend doing a little bit of experimenting with OpenGL ES 2.0 and shaders. Using OpenGL for video is very simple and straightforward, adding a game engine to the mix is unnecessary overhead and abstraction.
When you upload image data to the textures, do not create a new texture every frame. Instead, create two textures: one for luma, and one for chroma data, and simply reuse those textures every frame. I suspect your memory issues are arising from using many images and new textures every frame and probably not deleting old textures.
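A minimal sketch of that reuse pattern with OpenGL ES 2.0 calls follows; the sizes, plane pointers and 4:2:0 layout are assumptions, so adapt it to your actual pixel format.

    // Sketch: create the luma/chroma textures once, then update them in place every
    // frame with glTexSubImage2D. Assumes 4:2:0 data with the chroma plane uploaded
    // as GL_LUMINANCE_ALPHA; sizes and plane pointers are placeholders.
    #include <OpenGLES/ES2/gl.h>

    static GLuint lumaTex, chromaTex;

    void createTexturesOnce(int width, int height)
    {
        glGenTextures(1, &lumaTex);
        glBindTexture(GL_TEXTURE_2D, lumaTex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, height, 0,
                     GL_LUMINANCE, GL_UNSIGNED_BYTE, NULL);         /* allocate, no data yet */

        glGenTextures(1, &chromaTex);
        glBindTexture(GL_TEXTURE_2D, chromaTex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE_ALPHA, width / 2, height / 2, 0,
                     GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE, NULL);   /* 4:2:0 chroma plane */
    }

    /* Called once per frame: reuse the existing textures instead of creating new ones. */
    void uploadFrame(const void *lumaPlane, const void *chromaPlane, int width, int height)
    {
        glBindTexture(GL_TEXTURE_2D, lumaTex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                        GL_LUMINANCE, GL_UNSIGNED_BYTE, lumaPlane);
        glBindTexture(GL_TEXTURE_2D, chromaTex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width / 2, height / 2,
                        GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE, chromaPlane);
    }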
JPEG frames will be incredibly expensive to uncompress. First step: use PNG.
But wait! There's more.
Cocos2D could help you mostly through its great support for sprite sheets.
The biggest help, however, may come from packed textures a la TexturePacker. Using PVR.CCZ compression can speed things up by insane amounts, enough for you to get better frame rates at bigger video sizes.
Vlad, the short answer is that you will likely never be able to get all of the features you have listed working at the same time. Playing 1024 x 1024 video at 60 FPS is really going to be a stretch; I highly doubt that iOS hardware will be able to keep up with that kind of data transfer rate at 60 FPS. Even the h.264 hardware on the device can only do 30 FPS at 1080p. It might be possible, but then layering graphics rendering over the video and also expecting to be able to edit the brightness/contrast at the same time is simply too many things at once.
You should focus on what is actually possible instead of attempting to do every feature. If you want to see an example Xcode app that pushes iPad hardware right to the limits, please have a look at my Fireworks example project. This code displays multiple already-decoded h.264 videos on screen at the same time. The implementation is built around CoreGraphics APIs, but the key thing is that Apple's implementation of texture uploading to OpenGL is very fast because of a zero-copy optimization. With this approach, a lot of video can be streamed to the device.
Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent framerate (but it was okay for non-Retina devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much left for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage).
But, the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to be able to swap out enough pages to make it possible to stream 2x full screen size images from mapped memory. This used to work great with 1x full screen sizes, but now the 2x size screens are actually 4x the amount of memory and the hardware just cannot keep up.
You could also try to store frames on disk in a more compressed format, like PNG. But, then decoding the compressed format changes the problem from IO bound to CPU bound and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach.
If you have a very specific format that you can limit the input image data to (like an 8 bit table), then you could use the GPU to blit 8 bit data as 32BPP pixels via a shader, as shown in this example xcode project opengl_color_cycle. But, my advice would be to look at how you could make use of the h.264 decoder since it is actually able to decode that much data in hardware and no other approaches are likely to give you the kind of results you are looking for.
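To give an idea of the 8 bit table approach, here is a rough fragment-shader sketch (names are mine, not the actual code from opengl_color_cycle): the index texture holds one byte per pixel and a 256x1 texture holds the RGBA palette.

    // Rough sketch of blitting 8-bit indexed data as 32BPP via a shader. Both
    // textures should use GL_NEAREST filtering so the palette lookup stays exact.
    static const char *kPaletteLookupFragmentShader =
        "precision mediump float;                                        \n"
        "varying vec2 v_texCoord;                                        \n"
        "uniform sampler2D indexTexture;    // 8-bit indices             \n"
        "uniform sampler2D paletteTexture;  // 256x1 RGBA palette        \n"
        "void main()                                                     \n"
        "{                                                               \n"
        "    float index = texture2D(indexTexture, v_texCoord).r;        \n"
        "    gl_FragColor = texture2D(paletteTexture, vec2(index, 0.5)); \n"
        "}                                                               \n";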
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL), and achieves excellent performance on even older iOS devices such as iPhone 5 or the original iPad Air. On those devices it achieves 60 FPS on all pixel formats except for 24-bpp formats, where it achieves around 30-50 FPS (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
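As an illustration of what that path can look like (this is a sketch, not drop-in code: the EAGL context and a kCVPixelFormatType_32BGRA CVPixelBufferRef from your decoder are assumed to exist, it needs to be compiled as Objective-C on iOS, and error handling plus releasing/flushing the cache are omitted), the CVOpenGLESTextureCache API lets OpenGL sample a pixel buffer without an extra copy:

    // Illustration only: wrap a CVPixelBufferRef as a GL texture via the texture cache.
    #include <CoreVideo/CVOpenGLESTextureCache.h>
    #include <OpenGLES/ES2/gl.h>
    #include <OpenGLES/ES2/glext.h>

    GLuint TextureFromPixelBuffer(CVEAGLContext eaglContext, CVPixelBufferRef pixelBuffer)
    {
        static CVOpenGLESTextureCacheRef cache = NULL;
        if (cache == NULL)
            CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &cache);

        CVOpenGLESTextureRef texture = NULL;
        CVOpenGLESTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, pixelBuffer, NULL,
            GL_TEXTURE_2D, GL_RGBA,
            (GLsizei)CVPixelBufferGetWidth(pixelBuffer),
            (GLsizei)CVPixelBufferGetHeight(pixelBuffer),
            GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);

        GLuint name = CVOpenGLESTextureGetName(texture);   // usable as a normal GL texture
        glBindTexture(GL_TEXTURE_2D, name);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        return name;
    }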
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render-to-texture, or RTT. It is much quicker; search for pbuffer or FBO in the EGL documentation.
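For reference, a minimal FBO render-to-texture setup in OpenGL ES 2.0 looks roughly like this (sizes are placeholders):

    /* Minimal render-to-texture (RTT) sketch with an FBO in OpenGL ES 2.0. */
    #include <GLES2/gl2.h>

    static GLuint colorTex, fbo;

    void setupRenderToTexture(int width, int height)
    {
        /* Texture that will receive the rendering. */
        glGenTextures(1, &colorTex);
        glBindTexture(GL_TEXTURE_2D, colorTex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);

        /* Framebuffer object that redirects rendering into that texture. */
        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, colorTex, 0);

        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
            /* handle incomplete framebuffer */
        }
    }

    /* Each frame: bind `fbo`, draw the scene, then bind the default framebuffer
       and sample `colorTex` in the next pass. */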
Does anyone know of an easy way to add a constant latency (about 30 ms) to the graphical output of an XNA 4 application?
I want to keep my graphical output in sync with a real-time buffered audio stream which inherently has a constant latency.
Thanks for any ideas on this!
Max
If you really need to delay your graphics, then what you could do is render your game to a cycling series of render-targets. So on frame n you display the frame you rendered at frame n-2. This will only work for small latencies, and requires a large amount of additional graphics memory and a small amount of extra GPU time.
A far better method is not to delay the graphical output at all, but to delay the audio that is being used to generate the graphical output, either by buffering it or by having two read positions in your audio buffer, with the "audio" read X ms (the latency) ahead of the "game" read.
So if your computer's audio hardware has 100 ms of latency (not uncommon), and your graphics hardware has a latency of 16 ms: as you are feeding the sample at 100 ms into the audio system, you are feeding the audio sample at 16 ms into your graphics calculation. At the same time, the audio from 0 ms is hitting the speakers, and the matching graphic is hitting the screen.
Obviously this won't work if the thing generating the graphical output is also generating the audio. But the general principle of both these methods is that you have to buffer the input somewhere along your graphics chain, in order to introduce a delay that corresponds to the one you are experiencing for audio. Where along that chain it is easiest to insert a buffer is up to you.
For latencies of <100ms, I wouldn't worry about it for most games. You only really care about this kind of latency for audio programs and rhythm games.
I might not understand the question, but couldn't you keep track of how many times Update is called and mod it by 2? 60 FPS mod 2 is 30...