Converting raw data to displayable video for iOS

I have an interesting problem I need to research, related to very low-level video streaming.
Has anyone had any experience converting a raw stream of bytes (separated into per-pixel information, but not a standard video format) into a low-resolution video stream? I believe that I can map the data into RGB values per pixel, as the color values that correspond to the values in the raw data will be determined by us. I'm not sure where to go from there, or what the RGB format needs to be per pixel.
I've looked at FFmpeg, but its documentation is massive and I don't know where to start.
Specific questions I have include: is it possible to create a CVPixelBuffer from that pixel data? If I were to do that, what format would the per-pixel data need to be converted to?
Also, should I be looking deeper into OpenGL, and if so, where would be the best place to look for information on this topic?
What about CGBitmapContextCreate? For example, if I went with something like that, what would a typical pixel byte need to look like? Would this be fast enough to keep the frame rate above 20 FPS?
EDIT:
I think with the excellent help of you two, and some more research on my own, I've put together a plan: construct the raw RGBA data, then construct a CGImage from that data, and in turn create a CVPixelBuffer from that CGImage, following CVPixelBuffer from CGImage.
However, to then play that live as the data comes in, I'm not sure what kind of FPS I would be looking at. Do I paint the frames to a CALayer, or is there some class similar to AVAssetWriter that I could use to play them as I append CVPixelBuffers? The experience I have is using AVAssetWriter to export constructed Core Animation hierarchies to video, so the videos are always fully constructed before they begin playing, and not displayed as live video.
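For reference, if the CGImage detour proves unnecessary, it looks like a CVPixelBuffer can be wrapped straight around the raw bytes. A minimal sketch, assuming 32-bit RGBA data and illustrative dimensions (note that CVPixelBufferCreateWithBytes wraps the memory rather than copying it, so the backing allocation must outlive the buffer):

```swift
import CoreVideo

// Illustrative dimensions; frameBytes is assumed to hold width*height*4 RGBA
// bytes and must stay alive as long as the pixel buffer is in use.
let width = 320, height = 240
let frameBytes = UnsafeMutableRawPointer.allocate(byteCount: width * height * 4,
                                                  alignment: 1)

var pixelBuffer: CVPixelBuffer?
let status = CVPixelBufferCreateWithBytes(
    kCFAllocatorDefault,
    width, height,
    kCVPixelFormatType_32RGBA,   // or kCVPixelFormatType_32BGRA
    frameBytes,
    width * 4,                   // bytes per row
    nil, nil,                    // release callback and its context
    nil,                         // pixel buffer attributes
    &pixelBuffer)
assert(status == kCVReturnSuccess)
```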

I've done this before, and I know that you found my GPUImage project a little while ago. As I replied on the issues there, the GPUImageRawDataInput is what you want for this, because it does a fast upload of RGBA, BGRA, or RGB data directly into an OpenGL ES texture. From there, the frame data can be filtered, displayed to the screen, or recorded into a movie file.
Your proposed path of going through a CGImage to a CVPixelBuffer is not going to yield very good performance, based on my personal experience. There's too much overhead when passing through Core Graphics for realtime video. You want to go directly to OpenGL ES for the fastest display speed here.
I might even be able to improve my code to make it faster than it is right now. I currently use glTexImage2D() to update texture data from local bytes, but it would probably be even faster to use the texture caches introduced in iOS 5.0 to speed up refreshing data within a texture that maintains its size. There's some overhead in setting up the caches that makes them a little slower for one-off uploads, but rapidly updating data should be faster with them.
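For anyone following along, the wiring for this is short. A hedged sketch, assuming GPUImage is linked and using the method names from the GPUImageRawDataInput header as they bridge into Swift (check them against your copy of the framework):

```swift
import GPUImage

// Illustrative BGRA frame buffer; real data would come from the raw stream.
let size = CGSize(width: 320, height: 240)
var frameBytes = [GLubyte](repeating: 0, count: 320 * 240 * 4)

let rawInput = GPUImageRawDataInput(bytes: &frameBytes, size: size)
let liveView = GPUImageView(frame: CGRect(origin: .zero, size: size))
rawInput.addTarget(liveView)   // or a filter, or a movie writer

// Each time a new frame of bytes arrives:
rawInput.updateDataFromBytes(&frameBytes, size: size)
rawInput.processData()
```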

My 2 cents:
I made an OpenGL game which lets the user record a 3D scene. Playback was done by replaying the scene (instead of playing a video, because realtime encoding did not yield a comfortable FPS).
There is a technique which could help out; unfortunately, I didn't have time to implement it:
http://allmybrain.com/2011/12/08/rendering-to-a-texture-with-ios-5-texture-cache-api/
This technique should cut down the time spent getting pixels back from OpenGL, so you might get an acceptable video encoding rate.
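The gist of the linked technique, as I understand it: back an OpenGL ES texture with an IOSurface-backed CVPixelBuffer through the iOS 5 texture cache, render into it via an FBO, and then read the pixels straight out of the buffer with no glReadPixels() round trip. A rough sketch (error handling omitted; `context` is assumed to be your existing EAGLContext):

```swift
import CoreVideo
import OpenGLES

let width = 512, height = 512

var textureCache: CVOpenGLESTextureCache?
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, nil, context, nil, &textureCache)

// An IOSurface-backed pixel buffer lets the CPU and GPU share the same memory.
let attrs = [kCVPixelBufferIOSurfacePropertiesKey as String: [:] as CFDictionary] as CFDictionary
var renderTarget: CVPixelBuffer?
CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                    kCVPixelFormatType_32BGRA, attrs, &renderTarget)

// Wrap the pixel buffer in a texture and attach that texture to an FBO.
var texture: CVOpenGLESTexture?
CVOpenGLESTextureCacheCreateTextureFromImage(
    kCFAllocatorDefault, textureCache!, renderTarget!, nil,
    GLenum(GL_TEXTURE_2D), GL_RGBA, GLsizei(width), GLsizei(height),
    GLenum(GL_BGRA), GLenum(GL_UNSIGNED_BYTE), 0, &texture)

var fbo: GLuint = 0
glGenFramebuffers(1, &fbo)
glBindFramebuffer(GLenum(GL_FRAMEBUFFER), fbo)
glFramebufferTexture2D(GLenum(GL_FRAMEBUFFER), GLenum(GL_COLOR_ATTACHMENT0),
                       GLenum(GL_TEXTURE_2D), CVOpenGLESTextureGetName(texture!), 0)

// ... draw the scene, glFinish(), then read the rendered pixels directly:
CVPixelBufferLockBaseAddress(renderTarget!, .readOnly)
let pixels = CVPixelBufferGetBaseAddress(renderTarget!)   // hand to the encoder
CVPixelBufferUnlockBaseAddress(renderTarget!, .readOnly)
```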

Related

Taking Frame from Video vs Taking a Photo

My specific question is: What are the drawbacks to using a snipped frame from a video vs taking a photo?
Details:
I want to use frames from live video streams to replace taking pictures because it is faster. I have already researched and considered:
Videos need faster shutter speed, leading to higher possibility of blurring
Faster shutter speed also means less exposure to light, leading to potentially darker images
A snipped frame from a video will probably be lower resolution (although perhaps we can turn up the video resolution to compensate for this?)
Video might take up more memory -- I am still exploring the details with another post (What is being stored and where when you use cv2.VideoCapture()?)
Anything else?
I will reword my question to make it (possibly) easier to answer: What changes must I make to a "snip frame from video" process to make the result equivalent to taking a photo? Are these changes worth it?
The maximum resolution in picamera is 2592×1944 for still photos and 1920×1080 for video recording. Another issue to take into account is that you cannot receive all formats from VideoCapture, so conversion of the YUV frame to JPEG will be your responsibility. OK, OpenCV can handle this, but it takes considerable CPU time and memory.

Implementing audio waveform view and audio timeline view in iOS?

I am working on an app that will allow users to record from the mic, and I am using audio units for the purpose. I have the audio backend figured out and working, and I am starting to work on the views/controls etc.
There are two things I have yet to implement:
1) I will be using OpenGL ES to draw the waveform of the audio input; there seems to be no easier way to do it for real-time drawing. I will be drawing inside a GLKView. After something is recorded, the user should be able to scroll back and forth and see the waveform without glitches. I know it's doable, but I'm having a hard time understanding how it can be implemented. Suppose the user is scrolling: would I need to re-read the recorded audio every time and re-draw everything? I obviously don't want to store the whole recording in memory, and reading from disk is slow.
2) For the scrolling etc., the user should see a timeline, and even if I solve the first question, I don't know how to implement the timeline.
All the functionality I'm describing is doable, since it can be seen in the Voice Memos app. Any help is always appreciated.
I have done just this. The way I did it was to create a data structure that holds data for different "zoom levels" of the audio. Unless you are displaying the audio at a resolution that shows one sample per pixel, you don't need every sample to be read from disk, so you downsample your samples ahead of time into a much smaller array that can be stored in memory.
A naive example: suppose your waveform displays audio at a ratio of 64 samples per pixel. Let's say you have an array of 65536 stereo samples; you would average each L/R pair of samples into a positive mono value, then average 64 of those positive mono values into one float. Your array of 65536 audio samples can then be visualized with an array of 512 "visual samples".
My real-world implementation became much more complicated than this, as I have ways to display all zoom levels, with live resampling and such, but this is the basic idea. It's essentially a mipmap for audio.
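A minimal sketch of that downsampling step, assuming interleaved stereo Float samples (function and parameter names are illustrative):

```swift
// Fold each L/R pair into one positive mono value, then average
// `samplesPerPixel` mono values into a single visual sample.
func buildVisualSamples(fromStereo samples: [Float],
                        samplesPerPixel: Int = 64) -> [Float] {
    var visual: [Float] = []
    var bucketSum: Float = 0
    var bucketCount = 0
    var i = 0
    while i + 1 < samples.count {
        bucketSum += abs((samples[i] + samples[i + 1]) / 2)   // positive mono
        bucketCount += 1
        if bucketCount == samplesPerPixel {
            visual.append(bucketSum / Float(samplesPerPixel))
            bucketSum = 0
            bucketCount = 0
        }
        i += 2
    }
    return visual
}

// 65536 interleaved stereo samples at 64 mono samples per pixel -> 512 floats.
```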

iOS video playback

In my application I need to play video in an unusual way.
Something like an interactive player for special purposes.
Main issues here:
video resolution can be from 200×200 px up to 1024×1024 px
I need the ability to change speed from -60 FPS to 60 FPS (the video should play slower or faster depending on the selected speed; negative means the video should play backwards)
I need to draw lines and objects over the video and scale them with the image
I need the ability to zoom the image and pan it if its content is larger than the screen
I need the ability to change brightness and contrast, and to invert the colors of the video
Currently I am doing the following:
I split my video into JPG frames
I created a timer that fires N times per second (play speed control)
on each timer tick I draw a new texture (the next JPG frame) with OpenGL
for zoom and pan I use OpenGL ES transformations (translate, scale)
Everything looks fine while I use 320×240 px frames, but with 512×512 px frames my play rate drops. Maybe it's a timer behaviour problem, maybe OpenGL. Sometimes, if I try to open big textures at a high play rate (more than 10-15 FPS), the application just crashes with memory warnings.
What is the best practice to solve this issue? What direction should I dig in? Would cocos2d or other game engines help me? Maybe JPG is not the best solution for textures and I should use PNG or PVR or something else?
Keep the video data as a video and use AVAssetReader to get the raw frames. Use kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange as the colorspace, and do YUV->RGB colorspace conversion in GLES. It will mean keeping less data in memory, and make much of your image processing somewhat simpler (since you'll be working with luma and chroma data rather than RGB values).
You don't need to bother with Cocos2D or any game engine for this. I strongly recommend doing a little bit of experimenting with OpenGL ES 2.0 and shaders. Using OpenGL for video is very simple and straightforward; adding a game engine to the mix is unnecessary overhead and abstraction.
When you upload image data to the textures, do not create a new texture every frame. Instead, create two textures: one for luma, and one for chroma data, and simply reuse those textures every frame. I suspect your memory issues are arising from using many images and new textures every frame and probably not deleting old textures.
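For reference, a minimal sketch of the AVAssetReader setup described above (asset URL, function name, and error handling are illustrative):

```swift
import AVFoundation

// Minimal sketch: read raw 4:2:0 YUV frames out of a movie file.
func makeFrameReader(url: URL) throws -> (AVAssetReader, AVAssetReaderTrackOutput)? {
    let asset = AVAsset(url: url)
    guard let track = asset.tracks(withMediaType: .video).first else { return nil }

    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(
        track: track,
        outputSettings: [kCVPixelBufferPixelFormatTypeKey as String:
                         kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange])
    reader.add(output)
    reader.startReading()
    return (reader, output)
}

// Per display tick: pull the next frame, then upload its luma plane and its
// chroma plane into the two textures you reuse every frame, e.g.
// while let sample = output.copyNextSampleBuffer(),
//       let buffer = CMSampleBufferGetImageBuffer(sample) { ... }
```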
JPEG frames will be incredibly expensive to uncompress. First step: use PNG.
But wait! There's more.
Cocos2D could help you mostly through its great support for sprite sheets.
The biggest help, however, may come from packed textures a la TexturePacker. Using PVR.CCZ compression can speed things up by insane amounts, enough for you to get better frame rates at bigger video sizes.
Vlad, the short answer is that you will likely never be able to get all of the features you have listed working at the same time. Playing 1024×1024 video at 60 FPS is really going to be a stretch; I highly doubt that iOS hardware will be able to keep up with that kind of data transfer rate at 60 FPS. Even the h.264 hardware on the device can only do 30 FPS at 1080p. It might be possible, but to then layer graphics rendering over the video, and also expect to be able to edit the brightness/contrast at the same time, is just too many things at once.
You should focus on what is actually possible instead of attempting to do every feature. If you want to see an example Xcode app that pushes iPad hardware right to the limits, please have a look at my Fireworks example project. That code displays multiple already-decoded h.264 videos on screen at the same time. The implementation is built around CoreGraphics APIs, but the key thing is that Apple's implementation of texture uploading to OpenGL is very fast because of a zero-copy optimization. With this approach, a lot of video can be streamed to the device.

High-performance copying of RGB pixel data to the screen in iOS

Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly; then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent framerate (but it was okay for non-Retina devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much headroom for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA data, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area, and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage). But the big problem is not how to move the pixels from live memory onto the screen; the real issue is how to move the pixels from the encoded form on disk into live memory.
One can use file-mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to page in enough data to make it possible to stream 2x full-screen-size images from mapped memory. This used to work great with 1x full-screen sizes, but the 2x screens are actually 4x the amount of memory, and the hardware just cannot keep up. You could also try to store frames on disk in a more compressed format, like PNG, but then decoding the compressed format changes the problem from IO-bound to CPU-bound and you are still stuck.
Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach. If you have a very specific format that you can limit the input image data to (like an 8 bit table), then you could use the GPU to blit 8 bit data as 32BPP pixels via a shader, as shown in the example Xcode project opengl_color_cycle.
But my advice would be to look at how you could make use of the h.264 decoder, since it is actually able to decode that much data in hardware, and no other approach is likely to give you the kind of results you are looking for.
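To make the 8-bit-table idea concrete, the fragment shader for that kind of blit is tiny: sample the 8-bit index texture, then use the result as the x coordinate into a 256×1 palette texture. A sketch (uniform names are illustrative, not taken from the linked project):

```swift
// GLSL ES 2.0 fragment shader for an 8-bit palette blit; `indexTexture`
// holds the 8-bit data (uploaded as GL_LUMINANCE) and `palette` is a
// 256x1 RGBA lookup texture.
let paletteFragmentShader = """
varying highp vec2 textureCoordinate;
uniform sampler2D indexTexture;
uniform sampler2D palette;

void main() {
    highp float index = texture2D(indexTexture, textureCoordinate).r;
    gl_FragColor = texture2D(palette, vec2(index, 0.5));
}
"""
```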
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* modes for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL) and achieves excellent performance even on older iOS devices such as the iPhone 5 or the original iPad Air. On those devices it achieves 60 FPS for all pixel formats except 24bpp formats, where it achieves around 30-50 FPS (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than an iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render-to-texture (RTT), and it is much quicker; search for pbuffer or FBO in the EGL documentation.

How to read pixel values of a video?

I recently wrote C programs for image processing of BMP images. I had to read the pixel values and process them; it was very simple, as I followed the BMP header contents and could get most of the information about the BMP image.
Now the challenge is to process videos (frame by frame). How can I do it? How will I be able to read the headers of the continuous stream of image frames in a video clip? Or does, for example, the MPEG format also have a universal header, upon reading which I can get the information about the entire video, after which all the data are only pixels?
I hope I could convey my question.
Has anyone got experience with processing videos?
Any books or links to tutorials will be very helpful.
A video stream, like MPEG, is composed of a number of frames that depends (obviously) on its duration and frame rate. To read a pixel you must start from what is called an intra frame, which does not depend on a previous frame in the stream. Each successive frame depends temporally on its previous frame, so to obtain its pixels you have to decode the stream from the intra frame up to the frame you want.
Note that, typically, an intra frame is inserted periodically to give the decoder a way to synchronize with the stream. This is very useful in contexts where errors can occur.
What you want to do isn't an easy job. You have to use an MPEG decoder and then modify each frame before displaying it if you want to do post-processing, like a filter or something else.
I suggest you study video coding; you can find a lot of material on that, starting from the MPEG standard.
I would recommend looking into FFmpeg. It has a command-line utility that can grab frames from a movie and dump them to an image format like JPG. You can then modify your existing reader to handle JPG files (just use something like libjpeg to decode the JPG to a raw pixel buffer).
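By way of illustration (file names are placeholders), a single invocation like this dumps every frame of a movie to numbered JPG files:

```
ffmpeg -i input.mp4 frames/frame_%04d.jpg
```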
Alternatively, you can use the FFmpeg APIs (C, Python, others) and do the frame capture programmatically, looking at the pixels as you move through the video. Video formats are complex, so unless you want to start understanding all of the different codecs, you might want to grab a library that does the decode to a raw pixel buffer for you.
MPEG 1/2/4 videos are much more difficult to handle than bitmaps because they are compressed. With bitmap data, you have actual color values stored directly in the file. In MPEG, or JPEG for that matter, the color information goes through a number of transformations before being written to the file. These include:
RGB -> YUV 420P (sub-sampled chrominance)
Discrete Cosine Transform
Weighted quantization
zig-zag ordering
differential encoding
variable-length encoding (Huffman-like)
All of this means that there is no simple way to parse pixel data out of the file. You either have to study every minute detail of the standard and write your own decoder, or use a video decoding library like ffmpeg to do the work for you. ffmpeg can convert your video to still images (see the answers to this recent question). Also, you could interface directly with the ffmpeg libraries (libavformat and libavcodec). See the answers to this question for good tutorials.
