iOS video playback

In my application I need to play video in an unusual way: something like an interactive player for special purposes.
The main requirements:
video resolution can range from 200×200 px up to 1024×1024 px
I need the ability to change the playback speed from -60 FPS to 60 FPS (the video plays slower or faster depending on the selected speed; negative values mean the video plays backwards)
I need to draw lines and objects over the video and scale them together with the image
I need the ability to zoom and pan the image if its content is larger than the screen
I need the ability to change the brightness and contrast of the video and to invert its colors
Here is what I am doing now:
I split my video into JPG frames
I created a timer that fires N times per second (playback speed control)
on each timer tick I draw a new texture (the next JPG frame) with OpenGL
for zoom and pan I use OpenGL ES transformations (translate, scale)
Everything looks fine at 320×240 px, but at 512×512 px the frame rate drops. It may be a timer problem, or it may be OpenGL. Sometimes, when I try to load big textures at a high play rate (more than 10-15 FPS), the application simply crashes with memory warnings.
What is the best practice for solving this? In which direction should I dig? Would cocos2d or another game engine help me? Maybe JPG is not the best format for textures and I should use PNG, PVR, or something else?

Keep the video data as a video and use AVAssetReader to get the raw frames. Use kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange as the pixel format, and do the YUV->RGB colorspace conversion in GLES. It will mean keeping less data in memory, and it makes much of your image processing somewhat simpler (since you'll be working with luma and chroma data rather than RGB values).
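A minimal sketch of that setup (assuming `videoURL` points at your movie file; error handling and threading omitted):

```objc
#import <AVFoundation/AVFoundation.h>

AVURLAsset *asset = [AVURLAsset URLAssetWithURL:videoURL options:nil];
AVAssetTrack *track = [[asset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0];

NSError *error = nil;
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
NSDictionary *settings = @{ (id)kCVPixelBufferPixelFormatTypeKey :
                                @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange) };
AVAssetReaderTrackOutput *output =
    [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
                                               outputSettings:settings];
[reader addOutput:output];
[reader startReading];

// Pull frames; in a real player you would pace this off your playback timer.
CMSampleBufferRef sampleBuffer;
while ((sampleBuffer = [output copyNextSampleBuffer])) {
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    // upload the Y and CbCr planes to GL textures here (see below)
    CFRelease(sampleBuffer);
}
```

Note that AVAssetReader only reads forward; for the negative-speed requirement you would need to seek and re-create readers, or decode ahead and buffer frames.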
You don't need to bother with Cocos 2D or any game engine for this. I strongly recommend doing a little experimenting with OpenGL ES 2.0 and shaders. Using OpenGL for video is very simple and straightforward; adding a game engine to the mix is unnecessary overhead and abstraction.
When you upload image data to the textures, do not create a new texture every frame. Instead, create two textures, one for luma and one for chroma data, and simply reuse those textures every frame. I suspect your memory issues arise from using many images and creating new textures every frame, probably without deleting the old ones.
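That pattern, sketched against the bi-planar pixel buffers from the reader above (assumes GL ES 2.0; texture parameters and shader setup omitted):

```objc
// One-time setup: allocate storage for the two textures at a fixed size.
GLuint lumaTexture, chromaTexture;
glGenTextures(1, &lumaTexture);
glBindTexture(GL_TEXTURE_2D, lumaTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, height, 0,
             GL_LUMINANCE, GL_UNSIGNED_BYTE, NULL);
glGenTextures(1, &chromaTexture);
glBindTexture(GL_TEXTURE_2D, chromaTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE_ALPHA, width / 2, height / 2, 0,
             GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE, NULL);

// Per frame: overwrite the existing storage rather than allocating new textures.
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
glBindTexture(GL_TEXTURE_2D, lumaTexture);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_LUMINANCE, GL_UNSIGNED_BYTE,
                CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0));
glBindTexture(GL_TEXTURE_2D, chromaTexture);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width / 2, height / 2,
                GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE,
                CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1));
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
```

Real code should check CVPixelBufferGetBytesPerRowOfPlane, since planes are often padded beyond the visible width. The fragment shader then samples both textures and applies the YUV->RGB matrix, which is also a convenient place to hang the brightness/contrast/invert controls.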

JPEG frames will be incredibly expensive to uncompress. First step: use PNG.
But wait! There's more.
Cocos2D could help you mostly through its great support for sprite sheets.
The biggest help, however, may come from packed textures a la TexturePacker. Using PVR.CCZ compression can speed things up by insane amounts, enough for you to get better frame rates at bigger video sizes.
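For instance, a hedged cocos2d sketch (assuming cocos2d 2.x and a TexturePacker-generated frames.plist / frames.pvr.ccz pair; the file and frame names are placeholders):

```objc
// Load the packed sheet once; every frame is a sub-rect of a single PVR texture.
[[CCSpriteFrameCache sharedSpriteFrameCache] addSpriteFramesWithFile:@"frames.plist"];

NSMutableArray *frames = [NSMutableArray array];
for (int i = 1; i <= frameCount; i++) {
    NSString *name = [NSString stringWithFormat:@"frame_%04d.png", i];
    [frames addObject:[[CCSpriteFrameCache sharedSpriteFrameCache]
                          spriteFrameByName:name]];
}

CCSprite *sprite = [CCSprite spriteWithSpriteFrame:[frames objectAtIndex:0]];
CCAnimation *animation = [CCAnimation animationWithSpriteFrames:frames
                                                          delay:1.0f / 30.0f];
[sprite runAction:[CCRepeatForever actionWithAction:
                      [CCAnimate actionWithAnimation:animation]]];
// add `sprite` to your scene/layer as usual
```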

Vlad, the short answer is that you will likely never get all of the features you have listed working at the same time. Playing 1024 × 1024 video at 60 FPS is really going to be a stretch; I highly doubt that iOS hardware can keep up with that kind of data transfer rate at 60 FPS. Even the h.264 hardware on the device can only do 30 FPS at 1080p. It might be possible, but layering graphics rendering over the video while also editing the brightness/contrast at the same time is just too many things at once.
You should focus on what is actually possible instead of attempting every feature. If you want to see an example Xcode app that pushes iPad hardware right to the limits, have a look at my Fireworks example project. This code displays multiple already-decoded h.264 videos on screen at the same time. The implementation is built around CoreGraphics APIs, but the key thing is that Apple's implementation of texture uploading to OpenGL is very fast because of a zero-copy optimization. With this approach, a lot of video can be streamed to the device.

Related

Taking Frame from Video vs Taking a Photo

My specific question is: What are the drawbacks to using a snipped frame from a video vs taking a photo?
Details:
I want to use frames from live video streams to replace taking pictures because it is faster. I have already researched and considered:
Videos need faster shutter speed, leading to higher possibility of blurring
Faster shutter speed also means less exposure to light, leading to potentially darker images
A snipped frame from a video will probably be lower resolution (although perhaps we can turn up the resolution to compensate?)
Video might take up more memory -- I am still exploring the details with another post (What is being stored and where when you use cv2.VideoCapture()?)
Anything else?
I will reword my question to make it (possibly) easier to answer: What changes must I make to a "snip frame from video" process to make the result equivalent to taking a photo? Are these changes worth it?
The maximum resolution in picamera is 2592x1944 for still photos and 1920x1080 for video recording. Another issue to take into account is that you cannot receive all formats from VideoCapture, so converting the YUV frame to JPG becomes your responsibility. OpenCV can handle this, but it takes considerable CPU time and memory.

What is the most efficient way to search for faces in all 4 orientations of a video with OpenCV using a GPU?

I am new to GPU programming and I have started by passing haarcascade_frontalface_alt.xml and a video file to this compiled example:
https://github.com/Itseez/opencv/blob/master/samples/gpu/cascadeclassifier.cpp
It seems to take about 3 seconds to load the video into the GPU and then another 2 seconds to search for faces. This works well, but the videos could have been recorded at any orientation, so if no faces are found, I rotate the video by 90 degrees and try again. The problem is that this approach takes at least 20 seconds to determine whether any faces were found in all 4 orientations, and hence the correct orientation of the video.
Is it possible to use a rotation-invariant cascade classifier to determine the orientation of the video? Or is it possible to transpose the video on the GPU without having to reload a rotated version? Or is it possible to apply a rotated version of the cascade classifier? How can I search for faces in all 4 orientations without having to load 4 versions of the video into the GPU?
Many things are possible in the world of computer vision, but few are robust/reliable :). Rotation invariance is not the way to go (since effectively rotation invariance means that rotation-information is somehow dropped).
The simplest approach: Image rotation on a GPU is quite fast, so you could try rotating each image after having uploaded it to the device, using gpu::rotate.
A faster approach: The typical approach would be to learn four different detectors and apply all of them. Detection scales quite well in the number of detectors with some recent advances.
But I am still not sure what you want to achieve. If you do not want to find all faces, but rather estimate the orientation of the video (as it sounds from parts of your question), you only need to process a subsample of all frames and infer from those (as head rotations do not tend to be randomly distributed :) )

Converting raw data to displayable video for iOS

I have an interesting problem I need to research related to very low level video streaming.
Has anyone had any experience converting a raw stream of bytes (separated into per-pixel information, but not a standard video format) into a low-resolution video stream? I believe that I can map the data into RGB values per pixel, as the color values that correspond to the values in the raw data will be determined by us. I'm not sure where to go from there, or what the RGB format needs to be per pixel.
I've looked at FFmpeg, but its documentation is massive and I don't know where to start.
Specific questions I have include: is it possible to create a CVPixelBuffer with that pixel data? If I were to do that, what format would I need to convert the per-pixel data to?
Also, should I be looking deeper into OpenGL, and if so, where would be the best place to look for information on this topic?
What about CGBitmapContextCreate? For example, if I went with something like this, what would a typical pixel byte need to look like? Would this be fast enough to keep the frame rate above 20 FPS?
EDIT:
I think with the excellent help of you two, and some more research of my own, I've put together a plan: construct the raw RGBA data, then construct a CGImage from that data, and in turn create a CVPixelBuffer from that CGImage, following CVPixelBuffer from CGImage.
However, to then play that back live as the data comes in, I'm not sure what kind of FPS I would be looking at. Do I paint the frames to a CALayer, or is there some class similar to AVAssetWriter that I could use to play them as I append CVPixelBuffers? My experience is with using AVAssetWriter to export constructed CoreAnimation hierarchies to video, so the videos are always constructed before they begin playing, and not displayed as live video.
I've done this before, and I know that you found my GPUImage project a little while ago. As I replied on the issues there, GPUImageRawDataInput is what you want, because it does a fast upload of RGBA, BGRA, or RGB data directly into an OpenGL ES texture. From there, the frame data can be filtered, displayed to the screen, or recorded into a movie file.
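A hedged usage sketch (assuming `frameBytes` points at tightly packed BGRA data of `width` × `height` pixels, and `filterView` is a GPUImageView already in the view hierarchy):

```objc
#import "GPUImage.h"

// One-time setup: the raw-data input uploads the bytes straight into a GLES texture.
GPUImageRawDataInput *rawDataInput =
    [[GPUImageRawDataInput alloc] initWithBytes:frameBytes
                                           size:CGSizeMake(width, height)];
[rawDataInput addTarget:filterView]; // or route through a chain of filters first

// Per frame: push the new bytes through the pipeline.
[rawDataInput updateDataFromBytes:frameBytes size:CGSizeMake(width, height)];
[rawDataInput processData];
```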
Your proposed path of going through a CGImage to a CVPixelBuffer is not going to yield very good performance, based on my personal experience. There's too much overhead when passing through Core Graphics for realtime video. You want to go directly to OpenGL ES for the fastest display speed here.
I might even be able to improve my code to make it faster than it is right now. I currently use glTexImage2D() to update texture data from local bytes, but it would probably be even faster to use the texture caches introduced in iOS 5.0 to speed up refreshing data within a texture that maintains its size. There's some overhead in setting up the caches that makes them a little slower for one-off uploads, but rapidly updating data should be faster with them.
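For reference, the texture-cache path looks roughly like this (a sketch, assuming iOS 5+, an IOSurface-backed CVPixelBuffer, and `eaglContext` being your rendering context):

```objc
#import <CoreVideo/CoreVideo.h>
#import <OpenGLES/ES2/glext.h>

// One-time setup.
CVOpenGLESTextureCacheRef textureCache = NULL;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL,
                             eaglContext, NULL, &textureCache);

// Per frame: wrap the pixel buffer as a GL texture without an extra memcpy.
CVOpenGLESTextureRef texture = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
                                             pixelBuffer, NULL, GL_TEXTURE_2D,
                                             GL_RGBA, width, height,
                                             GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);
glBindTexture(CVOpenGLESTextureGetTarget(texture),
              CVOpenGLESTextureGetName(texture));
// ... draw with the texture ...
CFRelease(texture);
CVOpenGLESTextureCacheFlush(textureCache, 0);
```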
My 2 cents:
I made an OpenGL game which lets the user record a 3D scene. Playback was done by replaying the scene (instead of playing a video, because realtime encoding did not yield a comfortable FPS).
There is a technique which could help out, unfortunately I didn't have time to implement it:
http://allmybrain.com/2011/12/08/rendering-to-a-texture-with-ios-5-texture-cache-api/
This technique should cut down the time spent getting pixels back from OpenGL. You might get an acceptable video encoding rate.
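The gist of that technique, sketched (assuming iOS 5+, a current EAGLContext, and a texture cache created as in the earlier answer; error checking omitted): render into a texture whose backing store is a CVPixelBuffer, then read the result through the buffer's base address instead of glReadPixels.

```objc
// An IOSurface-backed pixel buffer that the texture cache can share with GL.
NSDictionary *attrs = @{ (id)kCVPixelBufferIOSurfacePropertiesKey : @{} };
CVPixelBufferRef renderTarget = NULL;
CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                    kCVPixelFormatType_32BGRA,
                    (__bridge CFDictionaryRef)attrs, &renderTarget);

// Wrap it as a texture and attach that texture to an FBO.
CVOpenGLESTextureRef rttTexture = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
                                             renderTarget, NULL, GL_TEXTURE_2D,
                                             GL_RGBA, width, height,
                                             GL_BGRA, GL_UNSIGNED_BYTE, 0, &rttTexture);
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, CVOpenGLESTextureGetName(rttTexture), 0);

// ... render the scene, then:
glFinish();
CVPixelBufferLockBaseAddress(renderTarget, kCVPixelBufferLock_ReadOnly);
void *pixels = CVPixelBufferGetBaseAddress(renderTarget); // feed these bytes to the encoder
CVPixelBufferUnlockBaseAddress(renderTarget, kCVPixelBufferLock_ReadOnly);
```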

Is Direct3D feasible for zoom and pan of large images?

I want to make an image viewer for large images (up to 2000 x 8000 pixels) with very responsive zooming and panning, say 30 FPS or more. One option I came up with is to create a 3D scene with the image as a sort of fixed billboard, then move the camera forward/back to zoom and up/down/left/right to pan.
That basic idea seems reasonable to me, but I don't have experience with 3D graphics. Is there some fundamental thing I'm missing that will make my idea difficult or impossible? What might cause problems or be challenging to implement well? What part of this design will limit the maximum image size? Any guesses as to what framerate I might achieve?
I also welcome any guidance or suggestions on how to approach this task for someone brand new to Direct3D.
That seems pretty doable to me; 30 FPS even seems quite low, and you can certainly achieve a solid 60 (minimum).
One image at 8k × 2k resolution is about 100 MB of VRAM (with mipmaps), so with today's graphics cards it's not much of an issue, though you'll of course face challenges if you need to load several at the same time.
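As a rough sanity check on that figure: padding up to an 8192 × 2048 power-of-two texture at 4 bytes per pixel gives 8192 × 2048 × 4 ≈ 67 MB for the base level, and a full mipmap chain adds about one third on top, so roughly 90 MB.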
DirectX 11 supports 16k × 16k textures, so for maximum size you should be sorted.
If you just want to show your image flat, you should not even need any 3D transformations; 2D scaling/translations will do just fine.

High-performance copying of RGB pixel data to the screen in iOS

Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent framerate (but it was okay for non-Retina devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much headroom for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area, and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage). But the big problem is not how to move the pixels from live memory onto the screen; the real issue is how to move the pixels from the encoded form on disk into live memory.
One can use file-mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to swap in enough pages to make it possible to stream 2x full-screen-size images from mapped memory. This used to work great with 1x full-screen sizes, but the 2x screens are actually 4x the amount of memory and the hardware just cannot keep up. You could also try to store frames on disk in a more compressed format, like PNG, but then decoding the compressed format changes the problem from IO-bound to CPU-bound and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach.
If you have a very specific format that you can limit the input image data to (like an 8-bit table), then you could use the GPU to blit 8-bit data as 32BPP pixels via a shader, as shown in the example Xcode project opengl_color_cycle. But my advice would be to look at how you could make use of the h.264 decoder, since it is actually able to decode that much data in hardware, and no other approach is likely to give you the kind of results you are looking for.
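To illustrate the 8-bit-table idea, here is a minimal GLES 2.0 fragment shader sketch (names are illustrative, not taken from the linked project; the indices are assumed to be uploaded as a GL_LUMINANCE texture and the palette as a 256 × 1 RGBA texture):

```objc
// Palette lookup: one texture fetch for the 8-bit index, one for the color.
// For exact results the index should be remapped to texel centers
// (i * 255/256 + 0.5/256); omitted here for brevity.
static NSString *const kPaletteFragmentShader =
    @"varying highp vec2 textureCoordinate;\n"
    @"uniform sampler2D indexTexture;    // 8-bit palette indices\n"
    @"uniform sampler2D paletteTexture;  // 256x1 RGBA lookup table\n"
    @"void main() {\n"
    @"    highp float index = texture2D(indexTexture, textureCoordinate).r;\n"
    @"    gl_FragColor = texture2D(paletteTexture, vec2(index, 0.5));\n"
    @"}";
```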
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL) and achieves excellent performance, even on older iOS devices such as the iPhone 5 or the original iPad Air. On those devices it achieves 60 FPS on all pixel formats except for 24-bpp formats, where it achieves around 30-50 FPS (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than an iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
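For the hand-off itself, CoreVideo can adopt an existing buffer without copying it; a sketch (32BGRA is typically the fast path on iOS, and with a NULL release callback `frameBytes` must outlive the buffer):

```objc
#import <CoreVideo/CoreVideo.h>

CVPixelBufferRef pixelBuffer = NULL;
CVReturn result = CVPixelBufferCreateWithBytes(kCFAllocatorDefault,
                                               width, height,
                                               kCVPixelFormatType_32BGRA,
                                               frameBytes,   // your decoder's output
                                               width * 4,    // bytes per row
                                               NULL, NULL,   // release callback + refcon
                                               NULL,         // buffer attributes
                                               &pixelBuffer);
if (result == kCVReturnSuccess) {
    // hand the buffer to the display path of your choice, then release it
    CVPixelBufferRelease(pixelBuffer);
}
```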
A pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render-to-texture (RTT), and it's much quicker. Search for pbuffer or FBO in the EGL documentation.
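A minimal FBO render-to-texture sketch in GLES 2.0 (filtering parameters included because an FBO-attached texture must not require mipmaps):

```objc
// Create the texture that will receive the rendering.
GLuint colorTexture;
glGenTextures(1, &colorTexture);
glBindTexture(GL_TEXTURE_2D, colorTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);

// Attach it to a framebuffer object and check completeness.
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, colorTexture, 0);
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    NSLog(@"incomplete FBO");
}

// Render here; the output lands in colorTexture, which a later pass can sample.
```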
